In natural language processing, in-context learning, few-shot learning or few-shot prompting is a prompting technique that allows a model to process examples before attempting a task.[1][2] The method was popularized after the advent of GPT-3[3] and is considered to be an emergent property of large language models.[4]
A few-shot prompt normally includes n examples of (problem, solution) pairs known as "shots", with the overall usage of such a prompt being known as n-shot prompting.[5][6] For instance, the following is a one-shot prompt for review sentiment classification:

Review: This movie sucks.
Sentiment: negative
Review: I love this movie.
Sentiment:

If the model outputs "positive", then it has correctly solved the task.[4]
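The structure of an n-shot prompt can be sketched as plain string concatenation; the helper below is illustrative, not a standard API, and the shots are the example pairs from the text:

```python
def build_prompt(shots, query):
    """Concatenate n solved (review, sentiment) shots, then append
    the unsolved query for the model to complete."""
    lines = []
    for review, sentiment in shots:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {sentiment}")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model's completion supplies the label
    return "\n".join(lines)

# One-shot prompt (n = 1): a single solved example precedes the query.
one_shot = build_prompt([("This movie sucks.", "negative")],
                        "I love this movie.")
print(one_shot)
```

With an empty shot list the same helper produces a zero-shot prompt, since no solved examples precede the query.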
The term zero-shot prompting is often used to signify that no examples are provided.[7][8][9] An example of a zero-shot prompt for a question-answering task would be "Who wrote the book On the Origin of Species?".
In-context learning was initially proposed as an alternative to fine-tuning a pre-trained language model on a task-specific dataset.[3] Since no model parameters are changed, the model itself does not learn anything; rather, the prompt primes the model for subsequent inference within a particular conversation or context. The main advantages of in-context learning over fine-tuning are a reduction in the amount of task-specific data needed and a reduced risk of overfitting to the overly narrow distribution of a task-specific fine-tuning dataset.[3] Few-shot performance of large language models has been shown to achieve competitive results on NLP tasks, sometimes surpassing prior state-of-the-art fine-tuning approaches.[3][10] Examples of such NLP tasks are translation, question answering, cloze tasks, unscrambling words, and using a novel word in a sentence. The creation and optimization of such few-shot prompts is part of the now active field of study of prompt engineering.[11][12]
While few-shot prompting has performed competitively when compared to fine-tuned models, it has its own drawbacks. For example, it has been shown that the order in which the shots are listed can make the difference between state-of-the-art and random-guess performance, and a set of few-shot examples that works well in one specific order with one model may not work at all with a different model.[13] Despite these shortcomings, commonly used Transformer models have been shown to encode principled learning algorithms based on gradient descent inside their weights, enabling mesa-optimization,[14] i.e. learning small models from the data given in context when making predictions.[15][16][17][18]
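The order sensitivity described above can be probed by enumerating every permutation of a fixed shot set, since each ordering yields a distinct prompt string. A minimal sketch (the shots and the prompt-building helper are illustrative; actually scoring each ordering would require running a model, which is omitted here):

```python
from itertools import permutations

# Illustrative shot set for sentiment classification.
SHOTS = [
    ("This movie sucks.", "negative"),
    ("A heartfelt, beautifully acted film.", "positive"),
    ("I fell asleep halfway through.", "negative"),
]

def build_prompt(shots, query):
    """Concatenate the shots in the given order, then append the query."""
    parts = [f"Review: {r}\nSentiment: {s}" for r, s in shots]
    parts.append(f"Review: {query}\nSentiment:")
    return "\n".join(parts)

# 3 shots give 3! = 6 orderings, each producing a different prompt; the
# cited study found such reorderings alone can swing accuracy between
# near state-of-the-art and near chance.
prompts = {build_prompt(order, "I love this movie.")
           for order in permutations(SHOTS)}
print(len(prompts))
```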
A common example of in-context learning is chain-of-thought prompting, where few-shot examples are given to teach the model to output a string of reasoning before attempting to answer a question.[19] This technique has been shown to improve performance of models in tasks that require logical thinking and reasoning.[20]
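A one-shot chain-of-thought prompt can be sketched the same way as an ordinary shot, except the "solution" half of the pair spells out intermediate reasoning before the final answer. The worked arithmetic example below is in the style of those used in the cited work; the helper name is illustrative:

```python
# Worked example whose answer includes step-by-step reasoning, cueing the
# model to produce a reasoning chain before its own final answer.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)

def cot_prompt(question):
    """Prepend the worked chain-of-thought example to a new question."""
    return f"{COT_EXAMPLE}\nQ: {question}\nA:"

print(cot_prompt("A juggler has 16 balls and half of them are golf "
                 "balls. How many golf balls are there?"))
```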
Logan IV, Robert; Balazevic, Ivana; Wallace, Eric; Petroni, Fabio; Singh, Sameer; Riedel, Sebastian (2022). "Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models". Findings of the Association for Computational Linguistics: ACL 2022: 2824–2835. doi:10.18653/v1/2022.findings-acl.222.
Bragg, Jonathan; Cohan, Arman; Lo, Kyle; Beltagy, Iz (9 November 2021). "FLEX: Unifying Evaluation for Few-Shot NLP". arXiv:2107.07170 [cs.CL].
Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].
Beltagy, Iz; Cohan, Arman; Logan IV, Robert; Min, Sewon; Singh, Sameer (2022). "Zero- and Few-Shot NLP with Pretrained Language Models". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts: 32–37. doi:10.18653/v1/2022.acl-tutorials.6.
Ke, Zixuan; Lin, Haowei; Shao, Yijia; Xu, Hu; Shu, Lei; Liu, Bing (2022). "Continual Training of Language Models for Few-Shot Learning". arXiv:2210.05549 [cs.CL].
Wei, Jason; Bosma, Maarten; Zhao, Vincent Y.; Guu, Kelvin; Yu, Adams Wei; Lester, Brian; Du, Nan; Dai, Andrew M.; Le, Quoc V. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv:2109.01652 [cs.CL].
Schick, Timo; Schütze, Hinrich (2021). "It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners". Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: 2339–2352. doi:10.18653/v1/2021.naacl-main.185.
Lu, Yao; Bartolo, Max; Moore, Alastair; Riedel, Sebastian; Stenetorp, Pontus (2022). "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 8086–8098. doi:10.18653/v1/2022.acl-long.556.
Johannes von Oswald; Niklasson, Eyvind; Randazzo, Ettore; Sacramento, João; Mordvintsev, Alexander; Zhmoginov, Andrey; Vladymyrov, Max (2022). "Transformers learn in-context by gradient descent". arXiv:2212.07677 [cs.LG].
Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL].
Akyürek, Ekin; Schuurmans, Dale; Andreas, Jacob; Ma, Tengyu; Zhou, Denny (2022). "What learning algorithm is in-context learning? Investigations with linear models". arXiv:2211.15661 [cs.LG].