Self-Play (Reinforcement Learning Technique)


Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, an agent improves by playing "against itself".

Definition and motivation

In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more other agents. These agents learn by trial and error, and researchers may choose to have the same learning algorithm play the role of two or more of the agents. When executed successfully, this technique has two advantages:

It provides a straightforward way to determine the actions of the other agents, resulting in a meaningful challenge.
It increases the amount of experience available for improving the policy by a factor of two or more, since the viewpoint of each agent can be used for learning.
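
As a minimal sketch of these two advantages, the following Python example trains a single tabular value function by self-play on a small Nim variant (players alternately take 1 to 3 stones, and whoever takes the last stone wins). The game, the Monte Carlo style update, and all names and hyperparameters (`train`, `self_play_episode`, `greedy_move`, `epsilon`, `lr`) are illustrative assumptions and are not taken from the systems discussed in this article. One shared table chooses the moves for both players, and the moves of both players are then used as training data.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # assumed rule: a move removes 1 to 3 stones


def legal_moves(stones):
    return [m for m in range(1, MAX_TAKE + 1) if m <= stones]


def greedy_move(q, stones):
    """Move with the highest learned value for the player about to move."""
    return max(legal_moves(stones), key=lambda m: q[(stones, m)])


def self_play_episode(q, start, epsilon, rng):
    """Play one game in which the same table `q` moves for both players."""
    history = []  # (player, stones_before_move, move) for BOTH players
    stones, player = start, 0
    while stones > 0:
        if rng.random() < epsilon:
            move = rng.choice(legal_moves(stones))  # explore
        else:
            move = greedy_move(q, stones)  # exploit the shared policy
        history.append((player, stones, move))
        stones -= move
        player = 1 - player
    winner = 1 - player  # the player who took the last stone
    return history, winner


def train(episodes=30000, max_start=12, epsilon=0.2, lr=0.05, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # value of (stones, move) from the mover's viewpoint
    for _ in range(episodes):
        start = rng.randint(1, max_start)
        history, winner = self_play_episode(q, start, epsilon, rng)
        # Every move is a training sample, whichever player made it:
        # +1 if that player went on to win, -1 otherwise.
        for player, stones, move in history:
            outcome = 1.0 if player == winner else -1.0
            q[(stones, move)] += lr * (outcome - q[(stones, move)])
    return q


if __name__ == "__main__":
    q = train()
    for stones in range(1, 13):
        print(stones, greedy_move(q, stones))
```

In this game the known winning strategy is to leave the opponent a multiple of four stones, so for a pile that is not a multiple of four the learned greedy move should be `stones % 4`. Because `history` holds the moves of both players, each game yields roughly twice as many updates as learning from one side only.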

Usage

Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi, and Go.^[1]

Self-play is also used to train the Cicero AI system, which outperformed human players at the game of Diplomacy, and the DeepNash system, which plays the game Stratego.^[2]^[3]

Connections to other disciplines

Self-play has been compared to the epistemological concept of tabula rasa, which describes how humans acquire knowledge starting from a "blank slate".^[4]

References

[1] Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
[2] Snyder, Alison (1 December 2022). "Two new AI systems beat humans at complex games". Axios. https://www.axios.com/2022/12/01/ai-beats-humans-complex-games.
[3] Erich_Grunewald. "Notes on Meta's Diplomacy-Playing AI". LessWrong. https://www.lesswrong.com/posts/oT8fmwWddGwnZbbym/notes-on-meta-s-diplomacy-playing-ai.
[4] Laterre, Alexandre; et al. (2018). "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization". arXiv:1807.01672.



