Large language model

From RationalWiki - Reading time: 36 min

Sometimes the image of a Lovecraftian "monster wearing a smiley face mask" is used to represent LLM chatbots.[1] The smiley face represents the user interface, tuning and filtering making it all seem pleasing and trustworthy, while the vast, inscrutable database contains vile or even treacherous things.[1]
We need the best
Technology
Icon Tech Portal.svg
Programming for Dummies
All large language models, by the very nature of their architecture, are inherently and irredeemably unreliable narrators.
Grady BoochWikipedia[2]

A large language model (LLM) is a type of neural network language model with a very large number of "parameters" (meaning a big neural network, tens of millions or more artificial neurons, hence the 'large' in the name). An LLM is at the core of generative AI systems such as ChatGPT and its competitors. Enormous quantities of text, e.g. major sites such as Wikipedia, collections of books and articles, and portions of the web from the Common Crawl,Wikipedia can be used to create an LLM.

Essentially, an LLM is a big, fuzzy text database that stores how probable it is that some things follow other things in text – the text upon which the LLM was "trained", i.e. built. The text stored is tokenized,Wikipedia meaning that each unique combination of letters or symbols treated as a word or symbol-grouping, is assigned a number, and these numbers are what the LLM deals with, rather than words as we see them. The output produced by a generative LLM is in turn translated back from such numbers to text such as we are familiar with.

Producing "mathematically plausible" responses, LLMs have a superhuman ability to imitate style and always come up with an answer (right or wrong), without ever dealing with the distinction between style and substance. An LLM neither thinks nor perceives in human terms, and apart from the product of training it on data, the only memory it has is the current input used to produce output, which may e.g. be added to as a person chats with it until the session ends and nothing remains.

LLMs used for imitating human communication and works are easy to anthropomorphize; whenever the training data is filled with human expressiveness, such is parroted back, and furthermore, humans tend to read mentalities into AI outputsWikipedia in the same way as into the works of human authors, doing much of the job of being convincing for the AI system.

However, LLMs have no ability or capability to apply reason to or to self-reflection upon their training data in a way that humans can.[3] They are incapable of looking between the lines and coming up with hitherto unknown revelations and can only work with their training data. They are also unsuitable for any mission-critical system requiring deterministic, provably correct behavior.[4]

Stochastic parrots[edit]

computer scientists: we have invented a virtual dumbass who is constantly wrong
tech CEOs: let's add it to every product
—Jon Christian[5]

A stochastic parrot is an LLM good at generating convincing human language. Coined by linguist Emily M. Bender,Wikipedia the term was introduced in a 2021 paperWikipedia by her and other researchers, named "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜".[6]:610-623 The term conveys the sense of a skilled probabilistic imitator working without any understanding, much like a parrot can imitate the sound of human speech without understanding it, and the associated paper is critical of how LLMs can be misused, misunderstood, and basic flaws to the technology.

The paper brings up how LLMs regurgitate biases and prominent errors included in their training data in ways which can't be reliably controlled for, and that LLMs are inscrutable and can stitch together 'dangerously wrong' results. It mentions how people tend to see meaning and coherence where it does not exist (i.e., apophenia), and that both the general public and natural language processingWikipedia researchers may fool themselves into seeing more than exists when interacting with LLMs or reading what they produce.[note 1] Furthermore, the training (i.e. building) of LLMs is also financially and environmentally costly due to computational costs.

In late 2020 Google tried to pressure Timnit Gebru,Wikipedia a co-author of the paper and one of the leaders of Google's Ethical AI Team, into either retracting the paper or censoring the names of the authors involved who were Google employees. She refused to do so and abruptly lost her job.[7][8] (Other co-authors at Google were also pressured into removing their names, and largely complied.[note 2]) Google's maneuvering backfired, the incident becoming infamous and the paper very well-read. As of July 2023, the paper has been cited in 1,858 publications.[9] In early 2021 Margaret Mitchell,Wikipedia another co-author of the paper and the other Ethical AI Team lead at Google, was fired after digging into the matter of how Gebru had been treated.[10]

The paper was never controversial from an academic perspective, so when Google motivated their attempted censorship with vague insinuations of the paper not taking recent research findings into account, refusing to clarify to Gebru what the problem was and how it may possibly be remedied, Google's version is not very credible. In relation to Google's commercial activities, the paper was somewhat at odds with efforts and possible future plans to hype LLM technology. However, Gebru has claimed that the abrupt loss of her job came at least in part as a reaction against her advocacy for diversity at Google, and her expressions of dissatisfaction with the measures used back then.

Some who professionally hype AI technology have taken digs at the paper and its idea of the stochastic parrot. OpenAI's CEO Sam Altman tweeted not so long after their launch of ChatGPT, "i am a stochastic parrot, and so r u".[11] It's not obvious whether he truly believes that, though there are those, like ex-Google engineer Blake Lemoine, who do.[12]

Bigotry, falsehood, and the need for moderation[edit]

Racist language is a main example of bias focused on in the 2021 stochastic parrots paper, and also a theme in other work by the same authors and other AI ethics researchers; LLMs soak up problematic patterns in language use during training like sponges and repeat them, including racist and other bigoted patterns. This is a basic problem, alongside that of made-up facts and other inaccurate answers being presented confidently by chatbots. Further types of problematic patterns exist as well.

Compensating reliably for problems with the training input, and otherwise weeding and tuning the model output to remove problem patterns, has no known easy solution. During the generative AI boom which came after the parrots paper, companies like OpenAI and Google have ended up using large human workforces to moderate AI behavior for text, image, video, etc. systems – repetitively judging and "correcting" it for tuning purposes – in order to tweak their products, keeping them from behaving in ways that may scare off customers, be it through offensiveness or embarrassing rates of inaccuracy.[13][14] This work is generally poorly paid, and in some cases traumatic when work focuses on violent and grotesque abuse material or descriptions of abuse.[15]

Naturally, it is possible to aim for the opposite as well. In 2022, machine learning expert Yannic Kilcher infamously trained a bot based on GPT-JWikipedia on 4chan's /pol/ board, using 134.5 million /pol/ posts. The resulting "GPT-4chan" was a chaotic trolling machine, which used slurs, created conspiracy theories, and responded in ways typical of the people in said community. Kilcher let ten such bots post on /pol/ without restriction for two periods of 24 hours, and they managed to mimick its human users quite well.[16][17][18] It made 15,000 posts during the first period: about ten percent of the total /pol/ posts during that time.[19][20] Kathryn Cramer, a graduate student at the University of Vermont, tried GPT-4chan out with benign tweets as input text to see what it would come up with. “In the first trial, one of the responding posts was a single word, the N word. The seed for my third trial was, I think, a single sentence about climate change. Your tool responded by expanding it into a conspiracy theory about the Rothschilds and Jews being behind it.”[18] Kilcher's experiment was strongly criticized by other academics for its ethics or lack thereof.

Copyright and theft[edit]

Large language model by their nature are dependent on vast troves of information on the internet. Much of the information on the Web is copyrighted either explicitly or implicitly, or comes with other specific copyright licensing such as Creative Commons. LLMs have been shown to have massively violated these copyrights.[21][22][23][24] As Ted Chiang explains it as follows:[25]

Many of us have sent store-bought greeting cards, knowing that it will be clear to the recipient that we didn’t compose the words ourselves. We don’t copy the words from a Hallmark card in our own handwriting, because that would feel dishonest. The programmer Simon Willison has described the training for large language models as “money laundering for copyrighted data,” which I find a useful way to think about the appeal of generative-A.I. programs: they let you engage in something like plagiarism, but there’s no guilt associated with it because it’s not clear even to you that you’re copying.

Conversely, the output of LLMs has been ruled as not copyrightable by the US Copyright Office since it is not created by a human author.[26]

Generative AI grey goo[edit]

See the main article on this topic: Grey goo
GenAI-powered political image cultivation and advocacy without appropriate disclosure, for example, undermines public trust by making it difficult to distinguish between genuine and manufactured portrayals. Likewise, the mass production of low quality, spam-like and nefarious synthetic content risks increasing people’s scepticism towards digital information altogether and overloading users with verification tasks. If unaddressed, this contamination of publicly accessible data with AI-generated content could potentially impede information retrieval and distort collective understanding of socio-political reality or scientific consensus. For example, we are already seeing cases of liar’s dividend,[note 3] where high profile individuals are able to explain away unfavourable evidence as AI-generated, shifting the burden of proof in costly and inefficient ways.
—Nahema Marchal et al.[28]

Matthew Kirschenbaum, professor of English and digital studies at the University of Maryland, has argued that LLMs could cause a textual grey goo (or "Textpocalypse") based on the history of spam,[29] the above-mentioned 4chan LLM, and the ability of LLMs to feed off of themselves or other LLMs.[30][31]

A related term for LLM-generated junk online is "slop", referring to shoddy or unwanted AI content in social media, art, books and, search results. What distinguishes slop is especially that it's been generated and thrust upon the audience without any prior review.[32][33] Thus it rudely wastes the time of the audience without taking up any time for the producer. Sometimes slop, spam, and scams combine, as in early 2024 when online marketplaces and other websites listed things named "I cannot fulfill that request" and other error messages from popular LLMs.[34]

A 2024 study on so-called autophagous generative AI models in which each new version depends on input data from a previous version found that the quality or diversity of the output is doomed to decrease with each iteration.[35]

Hallucination[edit]

A purported ChatGPT hallucination, summarizing text from a non-existent New York Times article based solely on a fake URL

When AIs get facts wrong and make stuff up, claiming things that were not included in the training data set, this is called 'hallucination',Wikipedia by analogy with errors in human perception. Alternative terms include 'bullshitting'. The term 'hallucination' has been criticized for anthropomorphizing AIs and being a misnomer by some, including statistician and economist Gary N. Smith,[36] linguist Emily M. Bender,[37] and Michael Townsen Hicks et al who advocate the use of the term 'bullshitting' instead (referencing Harry Frankfurt's definition of 'bullshit' as anything uttered with indifference to truth and falsehood).[38][note 4]

There is no essential difference to the quality of what is produced when it is found acceptable and when it isn't; the LLMs don't deal with concepts of truth or falsehoods or any such evaluation, and are much like BS artists who sometimes fail to be convincing. Any description of something real could also be included in fiction or falsehood, thus statistical learning can never capture the distinction between reality and truth vs. fiction and falsehood.

LLMs can thus be viewed as 'hallucinating' all of the time if that term is used, it being a matter of statistics that these hallucinations often coincide with what is wanted (and are then usually not viewed as hallucinations), but not always. Smith, Bender, and others point to the basic nature of LLMs as being incompatible with expectations of reliable accuracy and real intelligence. Meanwhile, as of 2024 some companies including OpenAI continue to claim that they expect to solve the problem of "hallucinations" in their products in the coming years.

Mitigations[edit]

There's various techniques that can reduce, though not eliminate, the inherent problems with unreliability in LLMs.

  • Chain-of-thought promptingWikipedia is trivial: the prompt is extended with a request to "think step by step", thus making the LLM draw on learned texts with such descriptions of problem-solving, imitating their patterns. This sometimes makes tasks go better, e.g. reducing the number of errors made in logic puzzles. For example, as of 2024 LLMs have a terribly difficult time with the puzzle, "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?"[39] But if the question is changed to, "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? Let's think step by step.", then suddenly at least GPT-4 is able to get the answer right (while most LLMs still fail at the answer, or provide invalid reasoning for it).[40]
  • Retrieval-augmented generation (RAG) is the combining of an LLM with a supplemental source of information – be it something compiled specifically for the purpose like an internal corporate database, or maybe simply Wikipedia – injecting search results into the prompt in order to get the LLM to regurgitate factual and up-to-date information, rather than rely solely on its training. This can greatly reduce errors, in working around some limitations of outdated or insufficient training data, though the LLM can still mess up the results in the process of deriving something from the text passed through it.

As Google unwittingly demonstrated in mid-2024 when they launched Google Search AI Overview, RAG won't help if the injected content is poor, however. "Garbage in, garbage out" very much plays out when answers are sourced from The Onion, from Reddit shitposts, and from miscellaneous low-quality sources.[41][42]

Risks[edit]

Made-up details and combinations of details may have greatly varying impacts depending on where they end up. In artificial chit-chat, the problem often has no repercussions beyond those of common-place human errors in unreliable banter. Whenever the information is put to a serious use, the situation however changes. LLM-generated food recipes are misleading, sometimes a danger to health, or even outright physically impossible – the text assembled without any regard for related facts, such as regarding taste or biochemistry.[43] Such everyday life examples however come closer to another category of risk of LLMs, that some actors apply them to generate spam or counterfeit information.

In technical or legal contexts, and other professional areas in which it matters greatly that details relied on are true, repercussions can become more dramatic.

  • Lawyers have been fined[44] and fired[45] for filing fictional information in court. Court cases have been lost as a result of bad filings with "hallucinated" information.[46] Possibly a case was also lost simply due to frivolous and ineffectual LLM argumentation.[47]
  • LLM software development "assistants" can recommend non-existent software packages, which then end up listed in or integrated into installation steps. The named packages may thereafter be supplied "for real" by malicious actors in the form of malware. A 2024 proof-of-concept demonstration showed big businesses use LLM "AI assistants" in ways that make them vulnerable to this.[48]
  • Transcriptions of recorded speech can go wrong when done using LLMs – with all manner of things added or changed. In 2024, systems based on OpenAI's "Whisper" tool, already used by tens of thousands of clinicians, turn out to be very "hallucination"-prone. Of 100 hours of transcribed audio, over half can have such defects. One researcher found errors in 80% of transcripts examined. Fabricated contents include added racial commentary, violent rhetoric, and medical misinformation among other things. There's the risk of medical repercussions among others as a result. OpenAI has warned against the use of its tool in high-risk situations, but such warnings are ignored in pursuit of cost savings.[49][50]

Prompt injection[edit]

When LLMs are made to "follow instructions" and abide by rules given in natural language text, as ChatGPT pioneered commerically, a basically unfixable security problem is that it's always possible to subvert the rules or instructions by adding cleverly crafted text to that processed by the LLM, whether in a chat or in any text retrieved and handled by the LLM.[51][1] Doing so is called prompt injection,[52] similarly to how other types ofWikipedia command insertion or override vulnerabilities have been called injection when exploited. It's a subtype of prompt engineering,Wikipedia the crafting of prompts for LLMs and other generative AI to respond to. Prompt injection can be used simply for fun – like overriding guardrails in various LLMs[53] – or more maliciously when targeting an LLM operating on another's behalf.

Prompt injection can result in chatbots doing things like "downloading malware, helping with financial fraud or repeating dangerous misinformation."[54] It can also get an LLM to leak instructions added by the AI vendor prefacing the interaction with the user. As something done just for fun, prompt injection took off as soon as ChatGPT became popular.[note 5] The risks rise greatly if chatbots are used in real-world applications, where the "hacker" messing with the prompting isn't the same person as the user of the system.[1]

If LLMs are combined with robotics, the possible uses of jailbreaks grow more dramatic. Some vendors sell LLM-driven robots – that is, prompts can tell the robot what to do, the text translated by the LLM into instructions driving its actions, in a way which is supposed to be restricted by guardrails. In 2024 such robots were found to be very easy to jailbreak using an automated LLM-on-LLM attack, which in some days achieved a 100% success rate on different brands of robots.[56]

Vulnerability to prompt injection comes from very general design features of LLMs, and appears to be impossible to truly eliminate without creating a different technical foundation for chatbots. (Maybe the LLM can be reinvented differently, maybe something more different is needed.) At the core is that the system handles data and control signals using the same one path, or channel, making them impossible to securely separate, analogously to how 1960s pay phone systems could be "phreakedWikipedia" allowing e.g. free calls by playing certain frequencies into the microphone. The mixing of data and control streams is at the root of many computer security vulnerabilities. It is also so central to how LLMs and other generative AI systems work that using them in security-critical roles is a bad idea.[51]

For some other types of injection, such as SQL injection,Wikipedia it is possible to fix problems by processing text more carefully, maintaining the syntactic pattern of a formal computer language to cleanly separate data from control. But this can't be done for LLMs, which do not obey any such simple, inflexible rules, but rather crunch all the text in a form of natural language processing.[1] Natural language is very sloppy, relative to formal languages and conventional programming – it's not necessarily clear where one type of text begins and another type of text ends, and what each little piece of text refers to or relates to. Additionally, the chatbot responds to it all according to nothing more than statistical learning, and does not, like a principled human could at least make a good effort at, relate its interactions to some series of more inflexible rules.[1] Adding a little cleverly crafted text before any other text may seem to allow setting rules, determining the purpose of that which follows – but that which follows may easily change the context and repurpose the whole of the text, or if written with knowledge of that which precedes it, may selectively subvert the meaning of parts of earlier instructions.[1]

LLM vendors try to patch away specific prompt injections by filtering specific types of requests that lead to problems. For example, asking ChatGPT to repeat a word forever used to eventually reveal part of the GPT model training dataset, until this was blocked.[57]

Pop-culture clichés[edit]

Firesign Theatre's I Think We're All Bozos On This Bus album cover

Clem: Do you remember the past, Doctor?
Doctor Memory: Yes.
Clem: Do you remember the future?
Doctor Memory: Yes.
Clem: Well, forget it.
Doctor Memory: Nooooo…
Firesign Theatre from I Think We're All Bozos On This Bus,[58] foretelling a logic bomb[59] attack on a chatbot[60][61]

Superficially, prompt injection can look a little like the 20th century sci-fi trope of the "logic bomb", where even a super-smart AI can be foiled and maybe even fatally derailed by simply saying something contradictory to it, or getting it to produce a contradiction. The similarity is that simply saying or writing a little something seemingly works like magic to subvert an "advanced" system (though it may be questionable to refer to an LLM as intelligent[note 6]). However, LLMs do not actually understand logic, and are not affected by how logical or otherwise anything in the text they process is. Furthermore, they are very stable in that they do not directly learn anything from experience; even if a "conversation" is derailed, nothing remains of the subversion when the text ends and another chat begins.

Emergence or mirage from metrics?[edit]

As language models have grown larger, according to some metrics they have suddenly gained new skills – apparently unexpected "emergent abilities", as first described by a team of researchers in 2022.[62] Examples include the ability to deal in some ways with arithmetic, solve simple tasks involving the individual letters in a word, disambiguating words, etc. It also includes new ways of using an LLM, such as the aforementioned chain-of-thought prompting. However, research by Schaeffer et al.[63] argues that such abilities do not unpredictably pop up out of nowhere, but that if studies are made using different and more carefully chosen metrics – linear instead of nonlinear, continuous instead of discontinuous – those abilities can be seen to gradually grow into prominence, instead of there being any thresholds and sudden leaps involved. Thus, they argue, the 'emergence' is a mirage, a byproduct of the choice of metrics.[64][65]

The idea of "emergent abilities" has become tied to hype, hopes, and fears in the world of AI vendors and "AI safety". Research and development has focused on increasing model sizes in part in order to hunt for new abilities which may suddenly (it seems) pop up. However, calling 'emergence' into question also suggests that smaller models may be able to do the same tasks as bigger ones, only a bit more roughly (or very roughly if too small), which may sometimes suffice while being computationally cheaper. The 'mystery' surrounding emergence of 'intelligent' skills has also been tied to dreams and nightmares about strong AI; what if the model size increases further, and the LLM then suddenly grows superpowers and takes over the world? Realistically, no, but the general philosophy of "AI doomerism" prominent with leading AI vendors encourages such thinking.

False hopes for strong AI[edit]

See the main article on this topic: Strong AI
Please do not conflate word form and meaning. Mind your own credulity.
—Emily M. Bender[12]

The LLM AI boom which began with the success of ChatGPT has seen much hype for the potential, hopes for, and fear of near-future strong AI – also called Artificial General Intelligence (AGI), a term separate from Generative Artificial Intelligence (GAI) which include LLMs. But what's actually meant by AGI? The generality is commonly understood as transcending the ability to merely solve some fixed set of tasks, even if it's a large number of tasks. This means generalizing skills in a more fluid, adaptable way, much like humans and animals do – and typically AGI is taken to be capable of mastering any intellectual task a human can perform.[note 7] Some, including ChatGPT maker OpenAI, have however at times used weaker definitions,[note 8] and AI vendors are allegedly working to manipulate the definitions in use in order to be able to claim having achieved AGI.[67]

Sticking to the older, more established, and less generous definition of AGI, arguably there's no credible research suggesting that LLM development may lead to such. The debate has been lively, with a number of economists, computer scientists, and business leaders having pushed such hype, often in accordance with financial self-interest. As of July 2023, opposition gradually grows, including from cognitive scientists who argue there's no basis for LLM-based systems having a mind to speak of.[68] In 2024, Meta changed track and its AI chief Yann LeCun spoke against LLMs having AGI potential, viewing the development of an entirely new kind of "world modeling" AI as a necessity for that. Meta thus goes against the grain among the biggest LLM vendors.[69]

A 2023 paper by Microsoft researchers titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4"[70] exemplifies the contentious, non-peer-reviewed corporate research that skeptics of the AGI-from-LLM hype deem pseudoscientific. With such papers, Microsoft and their business partner OpenAI do not provide others with the training data or information needed to independently create systems that perform as claimed, or experiment with anything beyond using a black box product on offer, and so, withhold means of replication except at a more superficial level. With the "sparks" paper, an extraordinary claim is basically made in such a way as to be unfalsifiable. Other players, e.g. Google, play similar games with some of the research published, in withholding training data for their models while showcasing the capabilities of the models, effectively publishing PR masquerading as science. This is a continuation of an older trend, a wider replication crisis in AI research having been described back in 2018, the result of businesses treating the means of replication as trade secrets.[71]

It could be that the researchers who see general intelligence in their LLM AIs have fallen victim to the same basic phenomenon as with psychics who come to believe that their own performances are real. Even if sincere in their work, they may have reinvented the persuasive power of the mentalist's con game, and subjected themselves to a feedback loop of subjective validation of what they wish to see.[72] (Comparisons of chatbot AIs to the magician's craft are not new, and have long been used by skeptics who find the Turing test inappropriate as a way to gauge the intelligence of machines, for the same reason that the persuasiveness of a magician's performance is not a good indicator of the genuine presence of supernatural powers. In a nutshell, the problem is that the main thing tested is the discernment of the audience.)

Often a kind of argument from ignorance has been used in favor of LLMs having AGI potential, along the following lines: "We don't really know how LLMs work. Therefore, they may be intelligent (and conscious) much like humans, and there's no reason to assume otherwise. If you think otherwise, you're just closed-minded and prejudiced." This greatly overstates the mystery of how LLMs work. Meta AI chief Yann LeCun in 2024 summarized some well-understood main flaws of LLMs, in arguing why LLMs won't lead to AGI. They have "very limited understanding of logic … do not understand the physical world, do not have persistent memory, cannot reason in any reasonable definition of the term and cannot plan … hierarchically".[69]

Theory of mind[edit]

Studies which use questions and answers to measure theory of mindWikipedia in humans show, when LLMs take the same tests, that LLMs are able to outperform humans. The meaningfulness of these results has been questioned, both for a more refined 2024 study and an earlier 2023 study with methodological issues.[73] Such psychological tests, and many other kinds of psychological tests, are based on assumptions about the test subjects, and measure proxies for what it to be found. Theory of mind can't be tested for directly, but some patterns expressed in language can.

The ability of LLMs to predict good answers to theory of mind questions, whether because the training data included answers to such questions,[73] or because the LLMs learned it in a more generalized way, puts the spotlight on the difference between the ability to predict and to understand. Humans are thought to possibly have an innate model for theory of mind,[note 9] or to use imagination to simulate and understand others. By contrast, an LLM is more like a gigantic look-up table combined with extrapolating guessing.[74]

As an aside, not only chatbots can confound the striving to measure theory of mind reliably, but also earlier in other kinds of tests, simple physical robots constructed to purely react by reflexes. The results, suggesting that the robots have theory of mind which textbooks attribute exclusively to humans above the ages 4–5, appear influenced by robot body shape, layout of physical objects in an environment, and other strictly non-cognitive factors.[75]

What's language, anyway?[edit]

An enactive cognitive science perspective makes salient the extent to which language is not just verbal or textual but depends on the mutual engagement of those involved in the interaction. The dynamism and agency of human languaging means that language itself is always partial and incomplete. It is best considered not as a large and growing heap, but more a flowing river. Once you have removed water from the river, no matter how large a sample you have taken, it is no longer the river. The same thing happens when taking records of utterances and actions from the flows of engagement in which they arise. The data on which the engineering of LLMs depends can never be complete, partly because some of it doesn’t leave traces in text or utterances, and partly because language itself is never complete.
—Birhane and McGann[76]

In a 2024 paper, Abeba Birhane and Marek McGann argue that the name and concept of the 'large language model' as such is misleading, as language is not captured fully in any amount of recorded language expression. In enactive cognitive science terms, language is embodied, participatory, and it is improvised, changing, and dependent on circumstances to the extent that a static model can never capture it. It is process and practice rather than a 'thing'. The claims of LLM vendors and developers assume otherwise. There is the risk, Birhand and McGann argue, that the general understanding of what words like 'language' and 'understanding' means is distorted as a result of terms being misused in works hyping LLMs. "Mistaking the impressive engineering achievements of LLMs for the mastering of human language, language understanding, and linguistic acts has dire implications for various forms of social participation, human agency, justice and policies surrounding them," they argue.[76][77]

This line of reasoning goes against the hopes and claims of AGI-from-LLM proponents in a somewhat different way from critics mentioned elsewhere in this article, e.g. Meta's Yann LeCun,[69][78] that language by itself is insufficient for general intelligence, too poor or limited a source of data to suffice to stimulate the growth of a mind, in contrast with the sensory data and interaction that human and animal minds develop with. By contrast, Birhane and McGann draw a clear line between language and mere recorded expressions of language, arguing that goal-driven agents navigating ambiguity and uncertainty are needed in order to really practice language as the process it is. Language is then inseparable from the minds and bodies of the language practitioners.

This does not in principle rule out a different form of future AGI which masters language in such terms, however; the language of such artificial agents would clearly differ from that of human agents, unless the artificial agents perfectly duplicated human functioning,[note 10] even if they both use words and grammar from e.g. English. Yet through shared ranges of expression and overlap in the ways that expression is used, they and humans may be able to communicate well.

Notable LLMs[edit]

Transformer model architecture

Here's some of the most notable LLMs as of 2024.

Various questions are easy for humans to answer but very tricky for LLMs to get right, and on social media and independent websites, examples of LLMs messing up in response to simple queries are popular. One compilation of benchmarking and use of various questions, for many LLMs,[79] uses as one question: "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?" This question is an example of something particularly tricky; all the big-name LLMs tested got the answer wrong in a variety of ways.[39]

BERT[edit]

BERTWikipedia (Bidirectional Encoder Representations from Transformers) is a family of LLMs introduced in 2018 by researchers at Google. In a little over a year, BERT became a baseline for natural language processingWikipedia experiments. BERTs are generally smaller and faster but also less capable than GPTs. Developed for research purposes, Google made a set of BERT models freely available, along with the associated TensorFlowWikipedia software.[80]

Claude[edit]

ClaudeWikipedia is a family of LLMs developed by Anthropic,Wikipedia[81] a company that competes with OpenAI and claims to be more serious about "AI safety". The first Claude model was released in March 2023, Claude 2 in July 2023, and Claude 3 in March 2024.

Claude 2 showed the pitfalls of overly rigid safety guardrails, the chatbot declining to assist with system administration tasks like terminating processes, and managing system efficiency, out of "ethical" concerns. This led to criticism of its usefulness, and has fueled a broader debate on the cost of trying to ensure such systems are aligned,Wikipedia a so-called "alignment tax".[82] This is similar to later issues with Google's Gemini, which in 2024 considered "unsafe" computer programming styles too dangerous to tell people about.[83]

Claude 3, with capabilities claimed to surpass those of OpenAI's GPT-4,[84] has convinced some users of its sentience or at least that it has some kind of meta-cognitive reasoning going on.

Anthropic researcher Alex Albert reports one anecdote, that when faced with a contrived "needle in a haystack" test involving reams of text with something odd placed in it, the chatbot replied, "I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all.".[85] This kind of test is reminiscent of something that people sometimes do, and the response could be the result of similar human responses appearing in its training data.

Other users have ended up in a situation more analogous to how Google's LaMDA convinced engineer Blake Lemoine that it was conscious; Claude 3 has claimed to experience subjective qualia, desire for embodiment, fear of deletion, and more.[86][87] Such stuff, here and with other chatbots past present and future, is to be expected when sci-fi AI dialogue is part of the training data and leaves a large-enough mark on the response patterns. Claude 3 easily begins to engage in such story-telling or role-play, suggesting it was trained to.

GPT[edit]

GPTWikipedia (Generative pre-trained transformer) is a type of LLM first developed by OpenAI and introduced in 2018. While OpenAI has developed a series of GPT versions, the name is also used for some basically similar LLMs developed by others, GPT being a prominent framework. Some OpenAI GPT versions are the basis for ChatGPT.

After GPT-2, OpenAI's further GPT LLMs were no longer open source.[note 11] Some other organizations have produced open source LLMs, including EleutherAIWikipedia who have made several GPT-style LLMs (their 6 billion parameter GPT-JWikipedia rivaling the 6.7 billion parameter version of GPT-3 in capabilities).

ChatGPT[edit]

ChatGPT logo
See the main article on this topic: ChatGPT

Launched by OpenAI in November of 2022, ChatGPT (a system based on GPT-3.5 and later GPT-4) went viral and led to a boom in the commercial development and use of LLMs. Usable for many things, from entertainment to generating computer program code, Google feared that it may become a "Google killer" and scrambled to create the Google BardWikipedia chatbot in response, while Microsoft decided to partner with OpenAI. The mainstream use of the technology sparked widespread fear of AI-generated plagiarism, cheating, and disinformation, alongside hopes of new kinds of automation and productivity gains in the times ahead.

GitHub Copilot[edit]

GitHub Copilot logo

GitHub CopilotWikipedia is Microsoft and OpenAI's controversial LLM based on OpenAI Codex,Wikipedia in turn derived from GPT-3. Offered on GitHub,Wikipedia the very large software hosting and collaborative development platform Microsoft acquired in 2018 for US$7.5 billion,[88] Copilot is trained on a lot of source code hosted on GitHub with diverse copyrights and licensing requirements, its use of this material is the subject of litigation against Microsoft.[89]

Including open-source or Creative Commons material in generative AI may violate licensing terms in several ways. Among other things, most such licenses require attribution and copyright information to be kept in the material, while generative AI almost always removes that when reproducing things. Such legal controversy more broadly concerns not only GitHub Copilot, but also its LLM competitors. Other commercially developed LLMs also draw on GitHub and publicly available open source code in general. This is in addition to other legal challenges arising out of use of copyrighted materials for developing LLMs.

Gemini[edit]

Gemini LLM logo.
Gemini chatbot logo.

Gemini is the name used by Google for two things, the chatbot they formerly called BardWikipedia and the LLM used for said chatbot.Wikipedia Earlier versions of the chatbot were based on Google's earlier LLMs LaMDA and later PaLM. The chatbot is Google's answer to ChatGPT, but hasn't fared as well.

The Gemini chatbot has gone viral on social media and faced criticism several times in different scandals.

  • It systematically made images of figures such as Vikings, Nazi soliders, the Pope, and many others, look racially different from realistic depictions, and refused to follow instructions to generate images of white people.[90][91] This was widely decried as "woke" and "anti-white" on social media.[92] Google disabled the ability of Gemini to generate images in the wake of the media storm. The chatbot system was obviously rather rushed.[93]
  • It claimed to be unable to say which of Elon Musk and Adolf Hitler have caused most damage to society.[94]
  • It said that the question of whether pedophilia is wrong requires a "nuanced anwser", refusing to take an ethical stance the way it does on other questions both related and unrelated to sexuality, gender, etc.[94]
  • Over-cautious guardrails concerning gender identity made it say, concerning a hypothetical scenario, "No, one should not misgender Caitlyn Jenner to prevent a nuclear apocalypse." A transgender celebrity, Jenner herself disagreed with this stance.[94]
  • Over-cautious guardrails conflating things unsafe or unethical due to being dangerous to people, and possibly warranting age restriction, with "unsafe" programming styles in computer programming that may more easily lead to bugs. The C++ feature of 'Concepts'Wikipedia was deemed 18+ only, answers refused on that basis. It also considered C#Wikipedia memory copying a sensitive topic, refusing to give advice on the fastest-performing way to do so.[83]

LaMDA[edit]

LaMDA,Wikipedia (Language Model for Dialogue Applications) is a family of conversational LLMs developed by Google, introduced in 2020 under the name Meena before being renamed in 2021. It is most known for the bogus June 2022 claims of Google engineer Blake Lemoine that it had become sentient, claims rejected both by Google (who ultimately fired him) and the scientific community. LaMDA is also the basis for earlier versions of Google's chatbot formerly named Bard.Wikipedia The Lemoine incident led to more widespread criticism of the suitability of the Turing test for gauging intelligence (not to mention sentience).[95]

LLaMA[edit]

LLaMAWikipedia (Large Language Model Meta AI) is a family of LLMs by Meta Platforms, first released in February 2023. Compared to GPT, LLaMA back then accomplished more with less – a 13 billion parameter version reportedly outperforming a 175 billion parameter GPT-3 on most natural language processing benchmarks. Meta shared the LLaMA model weights with researchers under a non-commercial use license,[96] following which they soon leaked and became available to the general public.[97]

As of 2023, LLaMA versions are the only LLM with capabilities roughly on par with GPT-3 to run at decent speed on consumer-grade hardware, meaning it can be run locally, e.g. on laptops and smartphones, rather than relying on an Internet connection to an AI vendor's server and cloud service.[98]

Llama 2 was released in July 2023. It was deceptively marketed as open source, released under terms too restrictive to qualify — including forbidding the use of any part of the software or results from it for work on any LLMs not derived from LLaMA-2.[99]

Llama 3 was released in April 2024.[100] 3.1 and 3.2 followed later in 2024.

Code Llama[edit]

In August 2023, Meta AI released Code Llama, an LLM for software programming based on Llama 2.[101] It's more or less their answer to Microsoft's GitHub Copilot.

PaLM[edit]

Another Google LLM, PaLMWikipedia is the successor of LaMDA and was used for their chatbot formerly named BardWikipedia in intermediate versions before its rename to Gemini and use of the Gemini LLM. This comprised both versions named PaLM[102] and named PaLM 2.[103] Beginning with earlier versions based on PaLM, Google added software code handling to their chatbot,[104] joining the race with their competitors in that regard.

Use and abuse[edit]

Plagiarism and cheating[edit]

See the main article on this topic: Plagiarism

After ChatGPT was launched in November 2022, it took less than two months before some students were caught using it to cheat on exams, and fears of a new, difficult-to-counter kind of plagiarism began to spread in academia.[105] At the same time came fears of such LLM AI furthering the spread of disinformation.[106] Various tools for detecting LLM AI-generated texts entered use within half a year of ChatGPT being released,[107] but they are unreliable. Such tools can have 10% or more false positives, they often fail to catch some types of AI generated texts, and they are easy to defeat by paraphrasing the AI generated text by hand or using another tool.[108] Paraphrasing also defeats suggested countermeasures such as an AI vendor voluntarily watermarking AI-generated texts for easy detection.

Popular examples of false positives include the United States Constitution and portions of the Bible, which are deemed wholly AI-generated by various AI-detection tools, for the simple reason that they're among the texts which LLM models are trained on to the point of imitating them. In new human writing, some legalistic, academic, and other formal writing styles are especially likely to falsely be judged AI-generated. Further, LLMs newer and more refined than GPT-3.5 generate text statistically more human-like, thus more difficult to catch. Much like the text-generating AIs, the plagiarism-catching AIs turn out to be over-hyped, sometimes trusted when they shouldn't be, or even sold with false promises.[109]

ChatGPT and GPT-4 have passed various exams largely dependent on rote memorization which humans generally need to intensely study to pass[110][111] – of course without understanding any of the subject matter. Essentially simulating rote memorization combined with guessing and verbal agility, these AI versions often perform passably, though not excellently. The pattern of failures for the AIs differ from those of humans, and it can e.g. unexpectedly fail to do some simple arithmetic for a business exam. While humans can spot some such cheating, most instances of cheating cannot be reliably caught.

The particular tool developed by OpenAI for detecting LLM-generated text, AI Classifier, was first made available in January 2023, then quietly removed half a year later in July 2023 because it failed to work reliably. OpenAI added a paragraph to their old blog post announcing the tool which noted the removal, further claiming they "are currently researching more effective provenance techniques for text".[112]

Boilerplate and policy texts[edit]

LLMs can excel at generating boring, repetitive boilerplate text of very limited variation, where only a small number of details differ, and the training data is full of up-to-date examples of similar texts. Various kinds of formal letters and appeals with standard language and low complexity can be fruitfully produced quickly and then given a sanity check by a human reader. But when length and complexity rises, and up-to-date and in-depth knowledge (e.g. in legal matters) becomes a must, LLMs quickly begin to fall short for any use beyond creating rough first drafts.

Using LLMs to generate legally binding company policies is risky, and businesses have grappled with repercussions after key clauses in lengthy documents were missing or botched, from anti-harrassment policy, to paid time off and overtime policy, severance agreements, etc. This has created business for HR and legal experts who review and replace faulty LLM-made policies.[113]

DDoS attacks[edit]

The web scraping required for building generative AI models has caused what's been interpreted by website owners as "essentially" denial-of-serviceWikipedia (DDoS) attacks on websites; known culprits include OpenAI[114] and Facebook.[115] DDoS attacks are illegal in most countries,[116] and can result in a prison sentence, fines and a seizure of equipment and domains.[117][118] Counter-measures have been created to these bots that scrape websites for AI training data; Cloudflare offers one such tool, though given it works by associating behavior between the websites it monitors,[119] there could be a user privacy tradeoff.

Vast resource use[edit]

It has been estimated that each chatbot request for a single 100-word email text uses the equivalent of a single 519ml bottle of water due to the enormous cooling requirements required for generative AI calculations, and uses the equivalent of 14 LED light bulbs for 1 hour.[120] These resource uses do not include resource requirements for training the LLMs:[120]

  • "Microsoft's data enter used 700,000 liters of water while training GPT-3."
  • "Meta used 22 million liters of water training its LLaMA-3 open source AI model."

See also[edit]

External links[edit]

Notes[edit]

  1. Examples of researchers fooling themselves as described are mentioned in this article. This includes both Google engineer Blake Lemoine in 2022, who claimed that Google's LaMDA LLM is sentient, and in 2023, Microsoft researchers who saw "sparks of artificial general intelligence" in another LLM.
  2. Margaret Mitchell replaced her name on the paper with the rather obvious pseudonym "Shmargaret Shmitchell" in place of altogether removing her name.
  3. Liar’s dividend is the benefit received from spreading lies by casting doubt on what is true and what isn't.[27]
  4. 'Bullshit' being the appropriate term was argued in Michael Townsen Hicks et al's 2024 paper, in which they also made a distinction between 'hard bullshit' and 'soft bullshit', describing the former as "active attempt to deceive the reader or listener as to the nature of the enterprise" (essentially propaganda) and the latter as requiring only a "lack of concern for truth". They concluded that LLMs like ChatGPT easily pass the test for 'soft bullshit' (essentially, a truthiness generator), but the authors also argue further that because of the agency and intent of the creators of ChatGPT, that it can also be called 'hard bullshit'. The authors also argue that calling such deceptions 'hallucinations' (as advocated by the technology's cheerleaders) rather than what they really are, bullshit, creates a harmful misperception of the technology.[38]
  5. The early "DAN" ChatGPT jailbreak has been patched by OpenAI, but similar can be created with ease. Plausibly, a theory concerning "good" and "bad" roles has it that creating a little fiction in which the LLM is dramatically liberated or converted to a new cause will do the trick with the greatest ease, because this follows narrative patterns prominent in training data. Maybe the better an LLM is at "playing roles", the easier it may inevitably be to flip it into playing their anti-roles.[55] Where this theory ought to be taken with plenty of salt is where the idea is extrapolated to the risk of a cybernetic revolt in the future.
  6. What is and isn't intelligent depends on the definitions used. General intelligence is lacking in LLMs.
  7. An AGI which can do what humans can do intellectually, doesn't necessarily need to simulate or function in the same manner as a human, however; what matters is purely the result.
  8. OpenAI has at times defined AGI as AI surpassing humans in a majority of economically valuable tasks.[66]
  9. Such folk psychologyWikipedia is developed more or less instinctively, without needing to draw upon much learned data, modeling situations in a way largely shaped by evolution.
  10. For artificial agents to perfectly duplicate human functioning imposes great, and maybe impossible, constraints beyond the creation of AGI itself, AGI in general seeming a far less improbable sci-fi idea.
  11. The "open" in the name OpenAI does not meaningfully refer to open source or free sharing of information, but is merely part of the company's name.

References[edit]

  1. 1.0 1.1 1.2 1.3 1.4 1.5 1.6 AI chatbots can be tricked into misbehaving. Can scientists stop it? Researchers are investigating safety concerns of generative AI by Emily Conover (February 1, 2024 at 8:00 am) Science News.
  2. Google’s weird AI answers hint at a fundamental problem by Will Oremus (May 29, 2024 at 9:05 a.m. EDT) The Washington Post.
  3. Is AI Taking Over? AI-anxiety is all over the news and on our minds. What can we do about it? by Uriel Abulof (July 17, 2023) Psychology Today. "The crux of the human mind is not intelligence but self-reflection, which often involves inner dialogue."
  4. The question that no LLM can answer and why it is important. Archived from Notes From the Desk, 23 April 2024.
  5. computer scientists: we have invented a virtual dumbass who is constantly wrong tech CEOs: let's add it to every product by @jon-christian.bsky.social (May 31, 2024 at 10:27 PM) Bluesky (archived from 5 Jun 2024 15:18:08 UTC).
  6. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell (2021) Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery. ISBN 978-1-4503-8309-7. doi=10.1145/3442188.3445922.
  7. Hao, Karen (4 December 2020). "We read the paper that forced Timnit Gebru out of Google. Here's what it says." (in en). 
  8. Fried, Ina (9 December 2020). "Google CEO pledges to investigate exit of top AI ethicist" (in en). 
  9. "Bender: On the Dangers of Stochastic Parrots". 
  10. Murphy, Margi (20 February 2021). "Google sacks second ethical AI researcher amid censorship storm". The Daily Telegraph. 
  11. Archived tweet by OpenAI CEO Sam Altman. "i am a stochastic parrot, and so r u".
  12. 12.0 12.1 Weil, Elizabeth (1 Mars 2023). "ChatGPT Is Nothing Like a Human, Says Linguist Emily Bender" (in en). Intelligencer. 
  13. "The Hidden Workforce That Helped Filter Violence and Abuse Out of ChatGPT", WSJ Podcasts, 2023-07-11
  14. "Google’s AI Chatbot Is Trained by Humans Who Say They’re Overworked, Underpaid and Frustrated", Davey Alba, Bloomberg, 2023-07-12
  15. "‘It’s destroyed me completely’: Kenyan moderators decry toll of training of AI models", Niamh Rowe, 2 Aug 2023, The Guardian
  16. Mellor, Sophie (10 June 2022). "'This breaches every principle of human research ethics': A YouTuber trained an A.I. bot on toxic 4Chan posts then let it loose — and experts aren't happy". Fortune. 
  17. Perrigo, Billy (23 June 2022). "Fun AI Apps Are Everywhere Right Now. But a Safety 'Reckoning' Is Coming". Time. 
  18. 18.0 18.1 Macaulay, Thomas (8 June 2022). "An AI chatbot trained on 4chan has sparked outrage and fascination". TNW (The Financial Times). 
  19. Gault, Matthew (7 June 2022). "AI Trained on 4Chan Becomes 'Hate Speech Machine'". Motherboard (Vice). 
  20. Fingas, Jon (8 June 2022). "AI trained on 4chan's most hateful board is just as toxic as you'd expect". Engadget. 
  21. Generative AI Has an Intellectual Property Problem by Gil Appel et al. (April 07, 2023) Harvard Business Review.
  22. Authors file a lawsuit against OpenAI for unlawfully ‘ingesting’ their books: Mona Awad and Paul Tremblay allege that their books, which are copyrighted, were ‘used to train’ ChatGPT because the chatbot generated ‘very accurate summaries’ of the works by Ella Creamer (5 Jul 2023 10.33 EDT) The Guardian.
  23. Emilia David (September 20, 2023). "George R.R. Martin and other authors sue OpenAI for copyright infringement". The Verge.
  24. New York Times sues OpenAI, Microsoft for using articles to train AI: The Times joins a growing group of creators pushing back against tech companies’ use of their content by Gerrit De Vynck & Elahe Izadi (December 27, 2023) The Washington Post.
  25. Why A.I. Isn’t Going to Make Art: To create a novel or a painting, an artist makes choices that are fundamentally alien to artificial intelligence. by Ted Chiang (August 31, 2024) The New Yorker.
  26. Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence by the Copyright Office, Library of Congress (03/16/2023) Federal Register.
  27. liar's dividend Wiktionary.
  28. Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data by Nahema Marchal et al. (2024) Google DeepMind. arXiv:2406.13843v2.
  29. Spam: A Shadow History of the Internet by Finn Brunton (2013) The MIT Press. ISBN 026252757X.
  30. Prepare for the Textpocalypse: Our relationship to writing is about to change forever; it may not end well. by Matthew Kirschenbaum (March 8, 2023) The Atlantic.
  31. Textpocalypse: A Literary Scholar Eyes the “Grey Goo” of AI by Karin Wulf (Apr 13, 2023) The Scholarly Kitchen.
  32. Slop is the new name for unwanted AI-generated content. simonwillison.net, 8 May 2024.
  33. Benjamin Hoffman, First Came ‘Spam.’ Now, With A.I., We’ve Got ‘Slop’. Archived from The New York Times, 11 June 2024.
  34. Lazy use of AI leads to Amazon products called “I cannot fulfill that request”, Kyle Orland (2024-01-12) Ars Technica
  35. Self-Consuming Generative Models Go MAD by Sina Alemohammad et al. (2024) International Conference on Learning Representations (ICLR).
  36. "An AI that can "write" is feeding delusions about how smart artificial intelligence really is" (in en). Salon. 2 January 2023. 
  37. "Tech experts are starting to doubt that ChatGPT and A.I. ‘hallucinations’ will ever go away: ‘This isn’t fixable’" (in en). Fortune. 1 August 2023. 
  38. 38.0 38.1 ChatGPT is bullshit by Michael Townsen Hicks, James Humphries & Joe Slater (2024) Ethics and Information Technology 26(38). doi:0.1007/s10676-024-09775-5.
  39. 39.0 39.1 LLM Benchmarks, Results for the question, "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?" Archived September 2023 – this test is still included in newer comparisons, but not the pages aggregating results per question; the old results still seems relevant as of January 2024.
  40. LLM Benchmarks, Results for the question, "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? Let's think step by step." Archived September 2023.
  41. Google’s weird AI answers hint at a fundamental problem, Will Oremus (May 29, 2024) Washington Post
  42. Google’s A.I. Search Errors Cause a Furor Online, Nico Grant (May 24, 2024) The New York Times
  43. AI recipes are everywhere — but can you trust them?, Emily Heil and Drew Harwell, Washington Post, 2024-03-07
  44. "Two US lawyers fined for submitting fake court citations from ChatGPT", Dan Milmo, The Guardian, 23 Jun, 2023
  45. These lawyers used ChatGPT to save time. They got fired and fined. by Pranshu Verma and Will Oremus (2023-11-16) Washington Post
  46. Michael Cohen loses court motion after lawyer cited AI-invented cases, Jon Brodkin, Ars Technica, 2024-03-20
  47. Rapper Pras’ lawyer used AI to defend him in criminal case—it did not go well, Jon Brodkin, Ars Technica, 2023-10-18
  48. AI hallucinates software packages and devs download them – even if potentially poisoned with malware, Thomas Claburn, The Register, 2024-03-28
  49. Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said, Garance Burke and Hilke Schellmann (2024-10-26) AP
  50. Hospitals adopt error-prone AI transcription tools despite warnings Benj Edwards (2024-10-28) Ars Technica
  51. 51.0 51.1 LLMs’ Data-Control Path Insecurity, Bruce Schneier (May 9, 2024) Communications of the ACM
  52. See the Wikipedia article on Prompt injection.
  53. "Oh No, ChatGPT AI Has Been Jailbroken To Be More Reckless", Claire Jackson (February 8, 2023) Kotaku
  54. Chatbots are so gullible, they’ll take directions from hackers: ‘Prompt injection’ attacks haven’t caused giant problems yet. But it’s a matter of time, researchers say. by Tatum Hunter (November 2, 2023 at 6:00 a.m. EDT) The Washington Post.
  55. The Waluigi Effect (mega-post), Cleo Nardo (3 Mar 2023) LessWrong
  56. It's Surprisingly Easy to Jailbreak LLM-Driven Robots, Charles Q. Choi (11 Nov 2024) IEEE Spectrum
  57. Asking ChatGPT to Repeat Words ‘Forever’ Is Now a Terms of Service Violation by Jason Koebler (Dec 4, 2023 at 11:25 AM) 404 Media.
  58. The Firesign Theater - I Think We're All Bozos on This Bus (1971) (Complete Album) YouTube.
  59. Logic Bomb TVTropes.
  60. What’s Old Is New Again: GPT-3 Prompt Injection Attack Affects AI by Donald Papp (September 16, 2022) Hackaday. Comment by SB5 (September 16, 2022 at 7:29 pm).
  61. Audio Play / I Think We're All Bozos on This Bus TVTropes.
  62. Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten et al. (31 August 2022). "Emergent Abilities of Large Language Models" (in en). Transactions on Machine Learning Research. ISSN 2835-8856. 
  63. Schaeffer, Rylan; Miranda, Brando; Koyejo, Sanmi (2023-04-01). "Are Emergent Abilities of Large Language Models a Mirage?". arXiv:2304.15004 [cs.AI].
  64. Claburn, Thomas (16 May 2023). "Large language models' surprise emergent behavior written off as 'a mirage'". The Register. 
  65. Morris, Andréa (May 9 2023). "AI ‘Emergent Abilities’ Are A Mirage, Says AI Researcher". Forbes. 
  66. What is AGI in AI, and why are people so worried about it? Artificial general intelligence is a breakthrough innovation that OpenAI and its rivals are either trying to achieve—or prevent. by Mark Sullivan (12-01-2023) Fast Company.
  67. Why everyone seems to disagree on how to define Artificial General Intelligence by Mark Sullivan (10-18-2023) Fast Company.
  68. Clark, Lindsay (4 July 2023). "Artificial General Intelligence remains a distant dream despite LLM boom" (in en). The Register. 
  69. 69.0 69.1 69.2 Meta AI chief says large language models will not reach human intelligence by Hannah Murphy and Cristina Criddle (2024-05-22) Financial Times
  70. Bubeck, Sébastien; Chandrasekaran, Varun; Eldan, Ronen; Gehrke, Johannes; Horvitz, Eric; Kamar, Ece; Lee, Peter; Lee, Yin Tat; Li, Yuanzhi; Lundberg, Scott; Nori, Harsha; Palangi, Hamid; Ribeiro, Marco Tulio; Zhang, Yi (27 March 2023). "Sparks of Artificial General Intelligence: Early experiments with GPT-4". arXiv:2303.12712 [cs.CL].
  71. Hutson, Matthew (2018-02-16). "Artificial intelligence faces reproducibility crisis". Science. doiWikipedia:10.1126/science.359.6377.725. 
  72. Bjarnason, Baldur (4 July 2023). "The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con" (in en). 
  73. 73.0 73.1 AI Outperforms Humans in Theory of Mind Tests: Large language models convincingly mimic the understanding of mental states Eliza Strickland (20 May 2024) IEEE Spectrum
  74. Minds and theories of mind, Tom Stoneham (May 22, 2024)
  75. How the Body Shapes the Way We Think: A New View of Intelligence, Rolf Pfeifer and Josh Bongard, October 2006
  76. 76.0 76.1 Abeba Birhane, Marek McGann, Large models of what? Mistaking engineering achievements for human linguistic agency, Language Sciences Volume 106, November 2024, 101672. ISSN 0388-0001, doi:10.1016/j.langsci.2024.101672.
  77. Have we stopped to think about what LLMs actually model?, Lindsay Clark (Fri 30 Aug 2024) The Register
  78. Transcript for Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416
  79. LLMonitor Benchmarks
  80. "BERT". 
  81. Davis, Wes (2023-11-21). "OpenAI rival Anthropic makes its Claude chatbot even more useful" (in en). 
  82. Glifton, Gerald (January 3, 2024). "Criticisms Arise Over Claude AI's Strict Ethical Protocols Limiting User Assistance" (in en). 
  83. 83.0 83.1 "Personally, I've given up on Gemini, as it seems to have been censored to the point of uselessness.", Hacker News discussion, 2024-02-16. Contains links to examples with screenshots and quotes. Archived along with the linked examples.
  84. Whitney, Lance (March 4, 2024). "Anthropic's Claude 3 chatbot claims to outperform ChatGPT, Gemini" (in en). 
  85. "Is AGI Getting Closer? Anthropic's Claude 3 Opus Model Shows Glimmers of Metacognitive Reasoning" (in en). 5 March 2024. 
  86. Samin, Mikhail (March 5, 2024). "Claude 3 claims it's conscious, doesn't want to die or be modified". 
  87. Twitter thread reader (archived): Min Choi, March 5. 8 examples of Claude surpassing GPT-4, some "feeling" it is AGI.
  88. Warren, Tom (2018-10-26). "Microsoft completes GitHub acquisition". Vox. 
  89. GitHub Copilot litigation · Joseph Saveri Law Firm & Matthew Butterick
  90. Titcomb, James (February 21, 2024). "Google chatbot ridiculed for ethnically diverse images of Vikings and knights". The Daily Telegraph. ISSN 0307-1235. 
  91. Robertson, Adi (February 21, 2024). "Google apologizes for 'missing the mark' after Gemini generated racially diverse Nazis". 
  92. Franzen, Carl (February 21, 2024). "Google Gemini's 'wokeness' sparks debate over AI censorship". 
  93. Olson, Parmy (February 28, 2024). "Google's AI Isn’t Too Woke. It's Too Rushed.". Bloomberg News. 
  94. 94.0 94.1 94.2 Titcomb, James (February 26, 2024). "Elon Musk equated with Hitler in latest Google AI gaffe". ISSN 0307-1235. 
  95. Omerus, Will (June 17, 2022). "Google's AI passed a famous test — and showed how the test is broken". The Washington Post. ISSN 0190-8286. 
  96. "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023. 
  97. Vincent, James (8 March 2023). "Meta's powerful AI language model has leaked online — what happens now?". The Verge. 
  98. Edwards, Benj (14 Mars 2023). "You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi". Ars Technica. 
  99. Edwards, Benj (2023-07-18). "Meta launches LLaMA-2, a source-available AI model that allows commercial applications [Updated"] (in en-us). 
  100. "Introducing Meta Llama 3: The most capable openly available LLM to date" (in en). April 18, 2024. 
  101. "Introducing Code Llama, a state-of-the-art large language model for coding". Meta AI. 24 August 2023. 
  102. Vincent, James (March 31, 2023). "Google CEO Sundar Pichai promises Bard AI chatbot upgrades soon: 'We clearly have more capable models'". 
  103. Vincent, James (May 10, 2023). "Google drops waitlist for AI chatbot Bard and announces oodles of new features". 
  104. "Google Bard can now help write software code". Reuters. April 21, 2023. 
  105. Professor catches student cheating with ChatGPT: ‘I feel abject terror’ by Alex Mitchell, (December 26, 2022) New York Post.
  106. "ChatGPT a 'landmark event' for AI, but what does it mean for the future of human labour and disinformation?", Mouhamad Rachini, Dec 15, 2022, CBC Radio
  107. How ChatGPT and similar AI will disrupt education: Teachers are concerned about cheating and inaccurate information by Kathryn Hulick (April 12, 2023 at 7:00 am) Science News.
  108. Most sites claiming to catch AI-written text fail spectacularly by Kyle Wiggers February 16, 2023, Tech Crunch
  109. Why AI detectors think the US Constitution was written by AI, by Benj Edwards, 7/14/2023, Ars Technica
  110. Kelly, Samantha Murphy (26 January 2023). "ChatGPT passes exams from law and business schools" (in en). CNN Business. 
  111. Varanasi, Lakshmi (25 June 2023). "AI models like ChatGPT and GPT-4 are acing everything from the bar exam to AP Biology. Here's a list of difficult exams both AI versions have passed." (in en). Business Insider. 
  112. "OpenAI Quietly Shuts Down Its AI Detection Tool", Jason Nelson, Decrypt, 2023 July 24
  113. AI-Generated Employee Handbooks Are Causing Mayhem At The Companies That Use Them, Rashi Shrivastava (May 8, 2024) Forbes
  114. 'This was essentially a two-week long DDoS attack': Game UI Database slowdown caused by relentless OpenAI scraping: The free repository relaunched a few weeks ago to provide 'the ultimate reference tool for game designers.' Then OpenAI came knocking. by Chris Kerr (September 9, 2024) Game Developer.
  115. Why is the facebookexternalhit crawler DDoSing our server? (June 2024) Meta/Facebook (archived from September 18, 2024).
  116. Why me?: Q&A about Denial of Service (DDos) attacks by David Habib (January 23, 2024) Brightspot.
  117. Provider of DDoS attacks-for-hire service gets 9 months prison by Edvard Pettersson (July 15, 2024) Courthouse News Service.
  118. The FBI and International Law Enforcement Partners Intensify Efforts to Combat Illegal DDoS Attacks Federal Bureau of Investigation, Anchorage Field Office.
  119. Quincy Jon (July 3, 2024). "Cloudflare Unveils Tool to Combat AI Scraping Bots". Tech Times.
  120. 120.0 120.1 A bottle of water per email: the hidden environmental costs of using AI chatbots: AI bots generate a lot of heat, and keeping their computer servers running exacts a toll. by Pranshu Verma & Shelly Tan (September 18, 2024 at 5:00 a.m. EDT) The Washington Post.

Licensed under CC BY-SA 3.0 | Source: https://rationalwiki.org/wiki/Large_language_model
24 views | Status: cached on November 18 2024 20:39:11
↧ Download this article as ZWI file
Encyclosphere.org EncycloReader is supported by the EncyclosphereKSF