In game theory, a mean payoff game is a zero-sum game played on the vertices of a weighted directed graph. The game is played as follows: at the start of the game, a token is placed on one of the vertices of the graph. Each vertex is assigned to either the Maximizer of the Minimizer. The player that controls the current vertex the token is on, may choose one outgoing edge along which the token moves next. In doing so, the Minimizer pays the maximizer the number that is on the edge. Then, again, the player controlling the next vertex the token gets can choose where it goes, and this continues indefinitely. The objective for the Maximizer is to maximize their long term average payoff, and the Minimizer has the opposite objective.
A mean payoff game consists of a graph , and a function where is the set of vertices, which are partitioned between the players, and where is the weight of an edge. Often, the graph is assumed to be sinkless, which means that every vertex has at least one outgoing edge. A play is a possible outcome of the game, which is an inifinite walk on the graph, we could write this as a sequence of edges: where the head of equals the tail of . The objective value of the game can then be written as follows:
A strategy for the Maximizer is a function , where is the set of finite walks that start at the initial vertex and end at some vertex , which returns an outgoing edge of the end vertex . A strategy for the Minimizer can be defined analogously. If both players fix a strategy, say they pick strategies and , then the outcome of the game is fixed, and the resulting play is the path .
One of the fundamental results for mean payoff games is that they are positionally determined.[1] This means in our case that the game has a unique value, and that each player has a strategy that can attain the value, and that strategy is positional, e.g. it only depends on the current vertex the token is on. In formulas, the following equation holds for the value :
Solving a mean payoff game can mean several things, although in practice finding one often also yields the other:
Determining the value of one or all starting vertices
Determining the optimal strategy for one or both players
Determining the set of starting vertices from which the Maximizer can guarantee a value of at least 0. Doing so is called finding the zero-mean partition (and is also related to solving energy games)
It is a major open problem in computer science whether there exists a polynomial time algorithm for solving any of the above problems. These problems are one of the few to be contained in both the classes NP and coNP[2] but not known to be in P. Currently, the fastest algorithm is a randomized strategy improvement algorithm, which runs in time ,[3] where is the number of Maximizer vertices. The best deterministic algorithms run in time ,[4] where now is the number of edges and the total number of vertices.
Three of the most well-known algorithms for solving mean payoff games are the following (each of which has their own slight variants):
The GKK algorithm.[5] Its main idea is to add a potential to every vertex in the graph, and slowly increase the potential on some nodes until we find the zero-mean partition. This also gives us the energy values in an energy game.
Value iteration.[6] Its main idea is to give each vertex a value, and update the values via fixed point iteration. When the fixed point is reached, the zero-mean partition and energy values can be found.
Strategy improvement.[3] Its main idea is to start with an arbitrary Maximizer strategy, and assign it a valuation. Then, by repeatedly making changes to the strategy that improve its valuation, we end up with a strategy for the Maximizer than guarantees nonzero payoff wherever it is possible.
The problem of solving parity games can be polynomial-time reduced to solving mean payoff games.[7] Solving mean payoff games can be shown to be polynomial-time equivalent to many core problems concerning tropical linear programming.[8] Another closely related game to the mean payoff game is the energy game, in which the Maximizer tries to maximize the smallest cumulative sum within the play instead of the long-term average.