In-context learning (natural language processing)



In natural language processing, in-context learning (also called few-shot learning or few-shot prompting) is a prompting technique in which a model is shown worked examples of a task as part of the prompt before it is asked to perform that task.[1][2] The method was popularized after the advent of GPT-3[3] and is considered to be an emergent property of large language models.[4]

A few-shot prompt normally includes n examples of (problem, solution) pairs, known as "shots"; the use of such a prompt is known as n-shot prompting.[5][6] For instance, the following is a one-shot prompt for review sentiment classification:

Review: This movie sucks.
Sentiment: negative
Review: I love this movie.
Sentiment:

If the model outputs "positive", it has correctly solved the task.[4]
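For illustration, the following Python sketch assembles such an n-shot prompt from a list of (problem, solution) pairs; the function name and formatting are illustrative choices, not part of any standard API.

# Minimal sketch: building an n-shot sentiment-classification prompt from
# (review, sentiment) pairs. The "Review:" / "Sentiment:" fields follow the
# example above; everything else is an arbitrary illustrative choice.
def make_n_shot_prompt(examples, query):
    shots = [f"Review: {review}\nSentiment: {sentiment}" for review, sentiment in examples]
    shots.append(f"Review: {query}\nSentiment:")  # left open for the model to complete
    return "\n\n".join(shots)

one_shot = make_n_shot_prompt(
    examples=[("This movie sucks.", "negative")],
    query="I love this movie.",
)
print(one_shot)
# Review: This movie sucks.
# Sentiment: negative
#
# Review: I love this movie.
# Sentiment: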

The term zero-shot prompting is often used to signify that no examples are provided.[7][8][9] An example of a zero-shot prompt for a question-answering task would be "Who wrote the book On the Origin of Species?".
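Using the hypothetical helper sketched above, a zero-shot prompt is simply the n = 0 case: no worked examples, only the query itself.

# Zero-shot: no examples at all, only the task to be solved.
zero_shot = make_n_shot_prompt(examples=[], query="I love this movie.")
print(zero_shot)
# Review: I love this movie.
# Sentiment:

# For the question-answering example above, the zero-shot prompt is just the question:
qa_zero_shot = "Who wrote the book On the Origin of Species?"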

In-context learning was initially proposed as an alternative to fine-tuning a pre-trained language model on a task-specific dataset.[3] Because no model parameters are updated, the model itself does not learn in the conventional sense; instead, the prompt primes the model for subsequent inference within a particular conversation or context. The main advantages of in-context learning over fine-tuning are that far less task-specific data is needed and that there is less risk of overfitting to the narrow distribution of a large but task-specific fine-tuning dataset.[3] Few-shot performance of large language models has been shown to achieve competitive results on NLP tasks, sometimes surpassing prior state-of-the-art fine-tuning approaches.[3][10] Examples of such NLP tasks are translation, question answering, cloze tasks, unscrambling words, and using a novel word in a sentence. The creation and optimization of such few-shot prompts is part of the now active field of prompt engineering.[11][12]
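The contrast can be sketched in code, reusing the make_n_shot_prompt helper from the sketch above: fine-tuning produces new weights by training on a task-specific dataset, while in-context learning leaves the pretrained weights untouched and expresses the task entirely through the prompt. The query_llm function below is a hypothetical placeholder for a text-completion API, not a real library call.

# Illustrative sketch only. query_llm is a hypothetical stand-in for a call
# to an actual language model (e.g. an HTTP request to a hosted completion API).
def query_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a call to a real language model")

# Fine-tuning (conceptually): parameters are updated on task-specific data.
#   task_weights = train(pretrained_weights, task_dataset)

# In-context learning: the pretrained weights stay frozen; the task is
# specified only through the prompt assembled at inference time.
prompt = make_n_shot_prompt(
    examples=[("This movie sucks.", "negative"), ("A delightful film.", "positive")],
    query="I love this movie.",
)
# answer = query_llm(prompt)   # expected completion: "positive"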

While few-shot prompting performs competitively compared to fine-tuned models, it has its own drawbacks. For example, the order in which the shots are listed can make the difference between state-of-the-art and random-guess performance, and a set of few-shot examples that works well in a specific order with one model may not work at all with a different model.[13] Despite these shortcomings, the commonly used Transformer architecture can encode principled learning algorithms based on gradient descent inside its weights, enabling mesa-optimization:[14] when making predictions, the model in effect learns to learn small models from the data given in context.[15][16][17][18]

A common example of in-context learning is chain-of-thought prompting, where few-shot examples are used to teach the model to output a chain of intermediate reasoning steps before attempting to answer a question.[19] This technique has been shown to improve the performance of models on tasks that require logical thinking and reasoning.[20]
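A minimal one-shot chain-of-thought prompt might look like the following; the arithmetic problem and its worked reasoning are invented here purely for illustration and are not taken from the cited papers.

# One illustrative chain-of-thought "shot": the example answer spells out the
# intermediate reasoning before the final answer, priming the model to do the same.
cot_prompt = """Q: A box holds 4 red pens and 7 blue pens. How many pens are in 3 such boxes?
A: One box holds 4 + 7 = 11 pens, so 3 boxes hold 3 * 11 = 33 pens. The answer is 33.

Q: A bookcase has 5 shelves with 8 books each. After 6 books are removed, how many books remain?
A:"""
print(cot_prompt)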

References

  1. Logan IV, Robert; Balazevic, Ivana; Wallace, Eric; Petroni, Fabio; Singh, Sameer; Riedel, Sebastian (2022). "Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models". Findings of the Association for Computational Linguistics: ACL 2022: 2824–2835. doi:10.18653/v1/2022.findings-acl.222. 
  2. Bragg, Jonathan; Cohan, Arman; Lo, Kyle; Beltagy, Iz (9 November 2021). "FLEX: Unifying Evaluation for Few-Shot NLP". arXiv:2107.07170 [cs.CL].
  3. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
  4. Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].
  5. Beltagy, Iz; Cohan, Arman; Logan IV, Robert; Min, Sewon; Singh, Sameer (2022). "Zero- and Few-Shot NLP with Pretrained Language Models". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts: 32–37. doi:10.18653/v1/2022.acl-tutorials.6. 
  6. Ke, Zixuan; Lin, Haowei; Shao, Yijia; Xu, Hu; Shu, Lei; Liu, Bing (2022). "Continual Training of Language Models for Few-Shot Learning". arXiv:2210.05549 [cs.CL].
  7. Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. https://techcrunch.com/2022/04/28/the-emerging-types-of-language-models-and-why-they-matter/. 
  8. Wei, Jason; Bosma, Maarten; Zhao, Vincent Y.; Guu, Kelvin; Yu, Adams Wei; Lester, Brian; Du, Nan; Dai, Andrew M.; Le, Quoc V. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv:2109.01652 [cs.CL].
  9. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav et al. (Dec 2020). "Language Models are Few-Shot Learners". In Larochelle, H.; Ranzato, M.; Hadsell, R. et al. (eds.). Advances in Neural Information Processing Systems (Curran Associates, Inc.) 33: 1877–1901. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
  10. Schick, Timo; Schütze, Hinrich (2021). "It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners". Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: 2339–2352. doi:10.18653/v1/2021.naacl-main.185. 
  11. Mok, Aaron. "'Prompt engineering' is one of the hottest jobs in generative AI. Here's how it works.". https://www.businessinsider.com/prompt-engineering-ai-chatgpt-jobs-explained-2023-3. 
  12. Harwell, Drew (25 February 2023). "Tech's hottest new job: AI whisperer. No coding required.". Washington Post. https://www.washingtonpost.com/technology/2023/02/25/prompt-engineers-techs-next-big-job/.
  13. Lu, Yao; Bartolo, Max; Moore, Alastair; Riedel, Sebastian; Stenetorp, Pontus (2022). "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 8086–8098. doi:10.18653/v1/2022.acl-long.556. 
  14. "Mesa-Optimization". https://www.alignmentforum.org/tag/mesa-optimization. 
  15. Johannes von Oswald; Niklasson, Eyvind; Randazzo, Ettore; Sacramento, João; Mordvintsev, Alexander; Zhmoginov, Andrey; Vladymyrov, Max (2022). "Transformers learn in-context by gradient descent". arXiv:2212.07677 [cs.LG].
  16. Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL].
  17. Akyürek, Ekin; Schuurmans, Dale; Andreas, Jacob; Ma, Tengyu; Zhou, Denny (2022). "What learning algorithm is in-context learning? Investigations with linear models". arXiv:2211.15661 [cs.LG].
  18. Musser, George. "How AI Knows Things No One Told It". https://www.scientificamerican.com/article/how-ai-knows-things-no-one-told-it/. 
  19. Wei, Jason; Zhou, Denny (May 2022). "Language Models Perform Reasoning via Chain of Thought". Google AI Blog. https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html.
  20. Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed H.; Le, Quoc V.; Zhou, Denny (31 October 2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". arXiv:2201.11903 [cs.CL].



