In natural language processing, in-context learning (also known as few-shot learning or few-shot prompting) is a prompting technique that allows a model to process examples of a task before attempting the task itself.[1][2] The method was popularized after the advent of GPT-3[3] and is considered an emergent property of large language models.[4]
A few-shot prompt normally includes n examples of (problem, solution) pairs known as "shots", with the overall usage of such a prompt being known as n-shot prompting.[5][6] For instance, the following is a one-shot prompt for review sentiment classification:

    Review: This movie sucks. Sentiment: negative.
    Review: I love this movie. Sentiment:

If the model outputs "positive", then it has correctly solved the task.[4]
The term zero-shot prompting is often used to signify that no examples are provided.[7][8][9] An example of a zero-shot prompt for a question-answering task would be "Who wrote the book On the Origin of Species?".
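Assembling such a prompt is plain string construction. The following minimal Python sketch (the build_prompt function and its label arguments are illustrative, not taken from any particular library) builds an n-shot prompt from a list of (problem, solution) pairs; with an empty list it degenerates to a zero-shot prompt:

    def build_prompt(examples, query, problem_label="Review", solution_label="Sentiment"):
        """Assemble an n-shot prompt from (problem, solution) pairs.

        An empty examples list yields a zero-shot prompt: just the bare query.
        """
        lines = [f"{problem_label}: {p} {solution_label}: {s}." for p, s in examples]
        lines.append(f"{problem_label}: {query} {solution_label}:")
        return "\n".join(lines)

    # One-shot prompt for the sentiment task above:
    print(build_prompt([("This movie sucks.", "negative")], "I love this movie."))
    # Review: This movie sucks. Sentiment: negative.
    # Review: I love this movie. Sentiment: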
In-context learning was initially proposed as an alternative to fine-tuning a pre-trained language model on a task-specific dataset.[3] Since none of the model's parameters are changed, the model itself does not learn in the usual sense; rather, the prompt primes the model for subsequent inference within a particular conversation or context. The main advantages of in-context learning over fine-tuning are that it requires far less task-specific data and that it avoids the risk of overfitting to the narrow distribution of a fine-tuning dataset.[3] Few-shot performance of large language models has been shown to achieve competitive results on NLP tasks, sometimes surpassing prior state-of-the-art fine-tuning approaches.[3][10] Examples of such NLP tasks include translation, question answering, cloze tasks, unscrambling words, and using a novel word in a sentence. The creation and optimization of such few-shot prompts is part of the now-active field of prompt engineering.[11][12]
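To make the "no parameters are changed" point concrete, here is a sketch using the Hugging Face transformers library: the weights are loaded once and never updated, and only the prompt varies between calls. (GPT-2 is chosen purely for brevity; a model this small may not actually solve the task, since few-shot ability is associated with much larger models.)

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")  # weights fixed from here on

    prompt = ("Review: This movie sucks. Sentiment: negative.\n"
              "Review: I love this movie. Sentiment:")
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generation conditions on the in-context example; no gradient step is taken.
    output = model.generate(**inputs, max_new_tokens=2,
                            pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))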
While few-shot prompting performs competitively compared to fine-tuned models, it has its own drawbacks. For example, the order in which the shots are listed can make the difference between state-of-the-art and random-guess performance, and a set of few-shot examples that works well in one specific order with one model may not work at all with a different model.[13] Despite these shortcomings, the widely used Transformer architecture has been shown to be capable of encoding principled, gradient-descent-based learning algorithms in its weights, enabling mesa-optimization:[14] in effect, it learns to fit small models to the data given in context when making predictions.[15][16][17][18]
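The order sensitivity reported by Lu et al. can be probed directly by enumerating every permutation of a small shot set and scoring each resulting prompt on a labelled development set. The sketch below shows only the enumeration; evaluate_on_dev_set is a hypothetical placeholder for whatever scoring procedure one uses:

    from itertools import permutations

    shots = [("This movie sucks.", "negative"),
             ("I love this movie.", "positive"),
             ("It was fine, I guess.", "neutral")]

    def prompt_for(ordered_shots, query):
        """Build a 3-shot prompt with the shots in the given order."""
        body = "".join(f"Review: {r} Sentiment: {s}.\n" for r, s in ordered_shots)
        return body + f"Review: {query} Sentiment:"

    # 3! = 6 orderings; reported accuracies across such orderings can range
    # from state-of-the-art down to random guessing, depending on the model.
    for order in permutations(shots):
        prompt = prompt_for(order, "An instant classic.")
        # score = evaluate_on_dev_set(prompt)  # hypothetical evaluator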
A common example of in-context learning is chain-of-thought prompting, in which few-shot examples are given to teach the model to output a string of reasoning before attempting to answer a question.[19] This technique has been shown to improve the performance of models on tasks that require logical thinking and reasoning.[20]
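For illustration, here is a one-shot chain-of-thought prompt, paraphrasing the well-known tennis-ball example from Wei et al.;[20] the worked reasoning in the shot's answer is what teaches the model to reason aloud before answering:

    cot_prompt = (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls, "
        "each with 3 balls. How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
        "5 + 6 = 11. The answer is 11.\n"
        "Q: The cafeteria had 23 apples. It used 20 to make lunch and bought "
        "6 more. How many apples does it have?\n"
        "A:"
    )
    # A capable model is expected to emit its reasoning first
    # ("23 - 20 = 3. 3 + 6 = 9.") and only then the final answer, 9.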
See also
Prompt engineering
Fine-tuning (machine learning)
Chain-of-thought prompting
References
Logan IV, Robert; Balazevic, Ivana; Wallace, Eric; Petroni, Fabio; Singh, Sameer; Riedel, Sebastian (2022). "Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models". Findings of the Association for Computational Linguistics: ACL 2022: 2824–2835. doi:10.18653/v1/2022.findings-acl.222.
Bragg, Jonathan; Cohan, Arman; Lo, Kyle; Beltagy, Iz (9 November 2021). "FLEX: Unifying Evaluation for Few-Shot NLP". arXiv:2107.07170 [cs.CL].
Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].
Beltagy, Iz; Cohan, Arman; Logan IV, Robert; Min, Sewon; Singh, Sameer (2022). "Zero- and Few-Shot NLP with Pretrained Language Models". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts: 32–37. doi:10.18653/v1/2022.acl-tutorials.6.
Ke, Zixuan; Lin, Haowei; Shao, Yijia; Xu, Hu; Shu, Lei; Liu, Bing (2022). "Continual Training of Language Models for Few-Shot Learning". arXiv:2210.05549 [cs.CL].
Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. https://techcrunch.com/2022/04/28/the-emerging-types-of-language-models-and-why-they-matter/.
Wei, Jason; Bosma, Maarten; Zhao, Vincent Y.; Guu, Kelvin; Yu, Adams Wei; Lester, Brian; Du, Nan; Dai, Andrew M.; Le, Quoc V. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv:2109.01652 [cs.CL].
Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; et al. (December 2020). "Language Models are Few-Shot Learners". In Larochelle, H.; Ranzato, M.; Hadsell, R.; et al. (eds.). Advances in Neural Information Processing Systems. 33. Curran Associates, Inc. pp. 1877–1901. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
Schick, Timo; Schütze, Hinrich (2021). "It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners". Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: 2339–2352. doi:10.18653/v1/2021.naacl-main.185.
Mok, Aaron. "'Prompt engineering' is one of the hottest jobs in generative AI. Here's how it works". Business Insider. https://www.businessinsider.com/prompt-engineering-ai-chatgpt-jobs-explained-2023-3.
Harwell, Drew (25 February 2023). "Tech's hottest new job: AI whisperer. No coding required". Washington Post. https://www.washingtonpost.com/technology/2023/02/25/prompt-engineers-techs-next-big-job/.
Lu, Yao; Bartolo, Max; Moore, Alastair; Riedel, Sebastian; Stenetorp, Pontus (2022). "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 8086–8098. doi:10.18653/v1/2022.acl-long.556.
von Oswald, Johannes; Niklasson, Eyvind; Randazzo, Ettore; Sacramento, João; Mordvintsev, Alexander; Zhmoginov, Andrey; Vladymyrov, Max (2022). "Transformers learn in-context by gradient descent". arXiv:2212.07677 [cs.LG].
Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL].
Akyürek, Ekin; Schuurmans, Dale; Andreas, Jacob; Ma, Tengyu; Zhou, Denny (2022). "What learning algorithm is in-context learning? Investigations with linear models". arXiv:2211.15661 [cs.LG].
Musser, George. "How AI Knows Things No One Told It". Scientific American. https://www.scientificamerican.com/article/how-ai-knows-things-no-one-told-it/.
Wei, Jason; Zhou, Denny. "Language Models Perform Reasoning via Chain of Thought". Google AI Blog. https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html.
Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed H.; Le, Quoc V.; Zhou, Denny (31 October 2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". arXiv:2201.11903 [cs.CL].
Original source: https://en.wikipedia.org/wiki/In-context learning (natural language processing).