Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves".
In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more other agents. These agents learn by trial and error, and researchers may choose to have the same learning algorithm play the role of two or more of the agents. When successfully executed, this technique has a double advantage: the learning agent always faces an opponent whose skill matches its own, which provides a naturally graded curriculum, and no human gameplay data or hand-crafted opponents are needed for training.
Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi and go.[1]
Self-play is also used to train the Cicero AI system to outperform humans at the game of Diplomacy. The technique is also used in training the DeepNash system to play the game Stratego.[2][3]
Self-play has been compared to the epistemological concept of tabula rasa, which describes how humans acquire knowledge from a "blank slate".[4]