In natural language processing, in-context learning, few-shot learning or few-shot prompting is a prompting technique that allows a model to process examples before attempting a task.[1][2] The method was popularized after the advent of GPT-3[3] and is considered to be an emergent property of large language models.[4]
A few-shot prompt normally includes n examples of (problem, solution) pairs known as "shots", with the overall usage of such a prompt being known as n-shot prompting.[5][6] For instance, the following is a one-shot prompt for review sentiment classification:

Review: This movie sucks.
Sentiment: negative
Review: I love this movie.
Sentiment:

If the model outputs "positive", then it has correctly solved the task.[4]
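The structure of an n-shot prompt can be sketched as plain string concatenation; the helper below is illustrative, not a standard API, and the shots are the example pairs from the text:

```python
def build_prompt(shots, query):
    """Concatenate n solved (review, sentiment) shots, then append
    the unsolved query for the model to complete."""
    lines = []
    for review, sentiment in shots:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {sentiment}")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model's completion supplies the label
    return "\n".join(lines)

# One-shot prompt (n = 1): a single solved example precedes the query.
one_shot = build_prompt([("This movie sucks.", "negative")],
                        "I love this movie.")
print(one_shot)
```

With an empty shot list the same helper produces a zero-shot prompt, since no solved examples precede the query.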
The term zero-shot prompting is often used to signify that no examples are provided.[7][8][9] An example of a zero-shot prompt for a question-answering task would be "Who wrote the book On the Origin of Species?".
In-context learning was initially proposed as an alternative to fine-tuning a pre-trained language model on a task-specific dataset.[3] Since no model parameters are changed, the model itself does not learn anything; rather, the prompt primes the model for subsequent inference within a particular conversation or context. The main advantages of in-context learning over fine-tuning are a reduction in the amount of task-specific data needed and a reduced risk of overfitting to the overly narrow distribution of a task-specific fine-tuning dataset.[3] Few-shot performance of large language models has been shown to achieve competitive results on NLP tasks, sometimes surpassing prior state-of-the-art fine-tuning approaches.[3][10] Examples of such NLP tasks are translation, question answering, cloze tasks, unscrambling words, and using a novel word in a sentence. The creation and optimization of such few-shot prompts is part of the now active field of study of prompt engineering.[11][12]
While few-shot prompting has performed competitively when compared to fine-tuned models, it has its own drawbacks. For example, it has been shown that the order in which the shots are listed can make the difference between state-of-the-art and random-guess performance, and a set of few-shot examples that works well in one specific order with one model may not work at all with a different model.[13] Despite these shortcomings, commonly used Transformer models have been shown to encode principled learning algorithms based on gradient descent inside their weights, enabling mesa-optimization,[14] i.e. learning small models from the data given in context when making predictions.[15][16][17][18]
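The order sensitivity described above can be probed by enumerating every permutation of a fixed shot set, since each ordering yields a distinct prompt string. A minimal sketch (the shots and the prompt-building helper are illustrative; actually scoring each ordering would require running a model, which is omitted here):

```python
from itertools import permutations

# Illustrative shot set for sentiment classification.
SHOTS = [
    ("This movie sucks.", "negative"),
    ("A heartfelt, beautifully acted film.", "positive"),
    ("I fell asleep halfway through.", "negative"),
]

def build_prompt(shots, query):
    """Concatenate the shots in the given order, then append the query."""
    parts = [f"Review: {r}\nSentiment: {s}" for r, s in shots]
    parts.append(f"Review: {query}\nSentiment:")
    return "\n".join(parts)

# 3 shots give 3! = 6 orderings, each producing a different prompt; the
# cited study found such reorderings alone can swing accuracy between
# near state-of-the-art and near chance.
prompts = {build_prompt(order, "I love this movie.")
           for order in permutations(SHOTS)}
print(len(prompts))
```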
A common example of in-context learning is chain-of-thought prompting, where few-shot examples are given to teach the model to output a string of reasoning before attempting to answer a question.[19] This technique has been shown to improve performance of models in tasks that require logical thinking and reasoning.[20]
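A one-shot chain-of-thought prompt can be sketched the same way as an ordinary shot, except the "solution" half of the pair spells out intermediate reasoning before the final answer. The worked arithmetic example below is in the style of those used in the cited work; the helper name is illustrative:

```python
# Worked example whose answer includes step-by-step reasoning, cueing the
# model to produce a reasoning chain before its own final answer.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)

def cot_prompt(question):
    """Prepend the worked chain-of-thought example to a new question."""
    return f"{COT_EXAMPLE}\nQ: {question}\nA:"

print(cot_prompt("A juggler has 16 balls and half of them are golf "
                 "balls. How many golf balls are there?"))
```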
Logan IV, Robert; Balazevic, Ivana; Wallace, Eric; Petroni, Fabio; Singh, Sameer; Riedel, Sebastian (2022). "Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models". Findings of the Association for Computational Linguistics: ACL 2022: 2824–2835. doi:10.18653/v1/2022.findings-acl.222.
Bragg, Jonathan; Cohan, Arman; Lo, Kyle; Beltagy, Iz (9 November 2021). "FLEX: Unifying Evaluation for Few-Shot NLP". arXiv:2107.07170 [cs.CL].
Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].
Beltagy, Iz; Cohan, Arman; Logan IV, Robert; Min, Sewon; Singh, Sameer (2022). "Zero- and Few-Shot NLP with Pretrained Language Models". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts: 32–37. doi:10.18653/v1/2022.acl-tutorials.6.
Ke, Zixuan; Lin, Haowei; Shao, Yijia; Xu, Hu; Shu, Lei; Liu, Bing (2022). "Continual Training of Language Models for Few-Shot Learning". arXiv:2210.05549 [cs.CL].
Wei, Jason; Bosma, Maarten; Zhao, Vincent Y.; Guu, Kelvin; Yu, Adams Wei; Lester, Brian; Du, Nan; Dai, Andrew M.; Le, Quoc V. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv:2109.01652 [cs.CL].
Schick, Timo; Schütze, Hinrich (2021). "It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners". Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: 2339–2352. doi:10.18653/v1/2021.naacl-main.185.
Lu, Yao; Bartolo, Max; Moore, Alastair; Riedel, Sebastian; Stenetorp, Pontus (2022). "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 8086–8098. doi:10.18653/v1/2022.acl-long.556.
Johannes von Oswald; Niklasson, Eyvind; Randazzo, Ettore; Sacramento, João; Mordvintsev, Alexander; Zhmoginov, Andrey; Vladymyrov, Max (2022). "Transformers learn in-context by gradient descent". arXiv:2212.07677 [cs.LG].
Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL].
Akyürek, Ekin; Schuurmans, Dale; Andreas, Jacob; Ma, Tengyu; Zhou, Denny (2022). "What learning algorithm is in-context learning? Investigations with linear models". arXiv:2211.15661 [cs.LG].