A very short intro to evolutionary game theory

 

Game theory developed to study the strategic interaction among rational self-regarding players (players seeking to maximize their own payoffs).  By the early 1970s, however, the theory underwent a transformation: part of it morphed into evolutionary game theory, which increases our understanding of dynamical systems, especially in biology and, more recently, in psychology and the social sciences, with significant ramifications for philosophy.  The players are not required to be rational at all, but only to have (perhaps hardwired) strategies that are passed on to their progeny.  In short, the notion of player is displaced by that of strategy, and consequently the notion of a player's knowledge, complete or incomplete, is dispensed with.  What drives systems is not the rationality of the players but the differential success of the strategies.

As before, we consider only two-player games.  A game with strategies s1, …, sn available to both players is strategy symmetric (symmetric, in brief) if:

  1. when i=j the payoffs for the two identical strategies are the same, which means that along the main diagonal (top left to bottom right) the payoffs in each box are the same
  2. the payoffs in the remaining boxes on one side of the main diagonal are mirror images of their counterparts on the other side. 

For example, The Prisoners' Dilemma is a symmetric game.  Along the main diagonal the payoffs in each box are the same, that is, (1,1) and (-6,-6); moreover, we have (10,-10) in the top right box and (-10,10) in the bottom left, which are mirror images of each other.

 

 

           S           C
S        1, 1      10, -10
C     -10, 10       -6, -6

 

A symmetric matrix can be simplified by writing only the payoffs of the row player, as those of the column player can be easily obtained by exploiting the symmetry of the game.  So, the previous matrix can be simplified as

 

 

        S      C
S       1     10
C     -10     -6

 

 

Evolutionarily Stable Strategies (ESS)

An important concept of evolutionary game theory is that of evolutionarily stable strategy (ESS).  To understand it, we need some new notions. 

Imagine now that we keep repeating a symmetric game (each round is called a 'stage game') with random pairing in an infinite population in which the only relevant consideration is that successful players get to multiply more rapidly than unsuccessful ones.  (The demand that the population be theoretically infinite excludes random drift.)  Suppose that all the players (the incumbents) play strategy X, which can be a pure or a mixed strategy.  If X is stable in the sense that a mutant playing a different strategy Y (pure or mixed) cannot successfully invade, then X is an ESS.  More precisely, X is an ESS if either

  1. E(X,X)>E(Y,X),

that is, the payoff for playing X against (another playing) X is greater than that for playing any other strategy Y against X

or

  2. E(X,X)=E(Y,X) and E(X,Y)>E(Y,Y)

that is, the payoff of playing X against itself is equal to that of playing Y against X but the payoff of playing Y against Y is less than that of playing X against Y.

Note that either (1) or (2) will do.

Obviously, if (1) obtains, the Y invader typically loses against X, and therefore it cannot persist.  If (2) obtains, the Y invader does as well against X as X itself, but it loses to X against other Y invaders, and therefore it cannot multiply.  In short, Y players cannot successfully invade a population of X players. 
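To see the definition at work, here is a minimal Python sketch (the function and variable names are mine) that tests the two ESS conditions for the pure strategies of a symmetric two-strategy game:

    # Minimal sketch: test the ESS conditions for a pure strategy in a
    # symmetric two-strategy game. E[x][y] is the payoff of x against y.
    def is_ess(E, x, others):
        # x is an ESS if, for every other strategy y, either
        # (1) E(x,x) > E(y,x), or
        # (2) E(x,x) = E(y,x) and E(x,y) > E(y,y).
        for y in others:
            cond1 = E[x][x] > E[y][x]
            cond2 = E[x][x] == E[y][x] and E[x][y] > E[y][y]
            if not (cond1 or cond2):
                return False
        return True

    # The simplified Prisoners' Dilemma matrix above (row payoffs only):
    E = {'S': {'S': 1, 'C': 10}, 'C': {'S': -10, 'C': -6}}
    print(is_ess(E, 'S', ['C']))   # True: S cannot be invaded
    print(is_ess(E, 'C', ['S']))   # False: C can be invaded by S

(The sketch checks only pure invaders; the full definition also quantifies over mixed strategies Y.)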

It is possible to introduce a strategy that is stronger than an ESS, namely, an unbeatable strategy.  Strategy X is unbeatable if, given any other strategy Y

 

E(X,X)>E(Y,X) and E(X,Y)>E(Y,Y).

 

An unbeatable strategy is the most powerful strategy there is because it strictly dominates any other strategy; however, it is also rare, and therefore of very limited use.

Given a strategy X and any other strategy Y, let us call X

 

Nash if E(X,X)≥E(Y,X)

(X is a best reply to itself)

and

Strict Nash if E(X,X)>E(Y,X)

(X is a strict best reply to itself)

 

Then the following relations obtain, with the arrow indicating entailment:

 

Unbeatable → Strict Nash → ESS → Nash

 

In sum, an unbeatable strategy is also strict Nash, and so on.

 

A few final points about ESS should be noted:

 

  • The population has to be infinite.
  • The notion of ESS, as we defined it, applies only to repeated symmetric games.
  • Every ESS is a strategy in a Nash equilibrium, although the reverse is not true (not all Nash equilibria are made up of ESS).
  • A strict Nash equilibrium in which both players follow the same strategy is made up of ESS.  So, for example, in The Prisoners’ Dilemma the strict Nash equilibrium (which is also a dominance equilibrium) is constituted by ESS.  Squealing does better against squealing than keeping silent: in a population of squealers keeping silent is a losing strategy, and therefore it cannot invade.
  • If E(X,X)=E(Y,X) and E(X,Y)>E(Y,Y) obtains, one Y invader would do as well as any X, which means that an ESS need not guarantee the highest payoff.   
  • An ESS can be defeated since it can have a lower payoff than one of two or more simultaneously invading strategies.
  • One can relax the requirements for an ESS.  For example, X may be stable against Y if Y is pure but not if Y is mixed, in which case X may be said, confusingly, to be an ESS (of sorts) in pure strategies.  

 

 

Often, ESS are associated with mixed strategy equilibria.  For example, Chicken (Snowdrift is essentially the same game) has two pure strategy Nash equilibria, neither of which rests on an ESS.  However, the mixed strategy resulting in a Nash equilibrium is an ESS.  (Note that in this context mixed strategies are understood in terms of frequencies of players in a population each playing a pure strategy.)  A very famous version of ESS is the mixed strategy resulting in a Nash equilibrium in Hawk-Dove, a biology-oriented version of Chicken.

We now turn to a more general approach to evolutionary games.

 

Evolutionary Dynamics

We just saw that in a population in which an ESS has already taken over, invasion does not occur successfully.  However, under which conditions does a strategy take over in a population?  What happens if a game in an infinite population is repeated indefinitely?  The answer comes from evolutionary dynamics, which studies the behavior of systems evolving under some specific evolutionary rule.  The basic idea here is that of the replicator, an entity capable of reproducing, that is, of making (relevantly) accurate copies of itself.  Examples of replicators are living organisms, genes, strategies in a game, ideas (silly or not), as well as political, moral, religious, or economic customs (silly or not).  A replicator system is a set of replicators in a given environment together with a given pattern of interactions among them.  An evolutionary dynamics of a replicator system is the process of change of the frequency of the replicators brought about by the fact that replicators which are more successful reproduce more quickly than those which are less successful.  Crucially, this process must take into account the fact that the success, and therefore the reproduction rate, of a replicator is due in part to its distribution (proportion) in the population.  For example, when playing Chicken, although drivers do well against swervers, in a population of drivers a single swerver does better than anybody else.  So, it will reproduce more quickly than the others.  However, at some point or other, there will be enough swervers that the drivers will start to do better again.  It would be nice to know whether there is some equilibrium point, and if so what it is.

Since the differential rate of reproduction determines the dynamics of the system, we need to be more precise and specify what we mean by 'more quickly'.  This is determined by the dynamics of the system; the one we are going to study is replicator dynamics.  There are other models that plausibly apply to evolution, but replicator dynamics is the easiest and the most often used, at least at a first approach.  Replicator dynamics makes three crucial assumptions:

  • The population is infinite.  (In practice, this means that the population is sufficiently large, that is, large enough for the level of accuracy we seek). 
  • There is no random drift; in other words, there are no random events interfering with the results of differential fitness, the fact that some individuals are favored in relation to others.  In small populations, random drift is a significant dynamical force, an issue we shall have to deal with later. 
  • The interactions among players (strategies) are random; that is, the probability of strategy S meeting strategy H is the frequency of H.  For example, if 1/3 of the population is H, the probability of an individual playing an H is 1/3 and that of playing an S is 2/3.

In addition, we shall restrict ourselves to studying repeated games whose stage games are symmetric and have only two players, so that the math remains easy.  

To understand replicator dynamics, we need to introduce a few notions.  The first is that of rate of change.  Experience teaches that things often change at different rates.  For example, sometimes accidentally introduced species multiply more quickly in a population than native species: in other words, their rate of growth is greater than that of native species, and this typically results in a decline of the natives.  So, if at the moment of introduction the frequency of the non-native species was p and that of the native species was q = 1-p, with q >> p, after some time the situation may change and become p > q.  This means that p has a positive rate of change (it increases) while q has a negative rate of change (it decreases).  Mathematically, we express this by writing

D(p) > 0 and D(q) < 0,

where  D(p) (the derivative of p with respect to time) means ‘the rate of change of p’, and similarly for q.   

So, suppose a population P increases by 1/3 every second; then if in the beginning p = 100, after one second p1 = 100 + (1/3)100 ≈ 133.33, after two seconds p2 = p1 + (1/3)p1 ≈ 177.78, and so on.
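As a quick check of this arithmetic, a few lines of Python (illustrative only):

    # A population growing by 1/3 every second: p(t+1) = p(t) + (1/3)p(t).
    p = 100.0
    for t in range(1, 4):
        p += p / 3
        print(f"after {t} s: p = {p:.2f}")
    # after 1 s: 133.33; after 2 s: 177.78; after 3 s: 237.04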

The second notion, which we have already met, is that of expected payoff of a pure strategy.   Suppose a strategy s is played against strategies S1, …,Sn, and that Pr(Si) is the probability that Si is played.   Then,

EP(s) = E(s,S1)Pr(S1) + … + E(s,Sn)Pr(Sn).

That is, if by Si we denote a generic S, the expected payoff of s is the sum of the payoffs of s against each of the Si times the probability that Si is played.  

The third notion is that of average payoff of a set of strategies, and to understand this, we need to consider the notion of the mean.  Suppose that in a group of boxes, 1/3 weigh 30 kilos, 1/2 weigh 20, and 1/6 weigh 60.  Then the average weight ĀW (notice the bar above 'A') is:

ĀW = 30×(1/3) + 20×(1/2) + 60×(1/6) = 10 + 10 + 10 = 30.

In words, the average weight is the sum of all the weights, each multiplied by its frequency.  (Since 1/3 of the boxes weigh 30 kilos, we multiply 30 by 1/3, and so on).

Similarly, if S(tag) and H(are) are the two available strategies, the average payoff is

ĀEP= EP(S)Pr(S) + EP(H) Pr(H).

For example, consider the following Stag Hunt matrix, and suppose that Pr(S)=p, so that Pr(H)=1-p.

 

 

        S       H
S     3, 3    0, 2
H     2, 0    1, 1

 

Then the EP of S, when it plays against another S with probability p and against an H with probability 1-p, is:

EP(S)=3p+0(1-p)=3p.

Analogously, the EP of H, when it plays against an S with probability p and against another H with probability 1-p, is

EP(H)=2p+1(1-p)=p+1.

So, the average expected payoff is EP(S) times the probability that S is played plus EP(H) times the probability that H is played:

ĀEP = EP(S) × Pr(S) + EP(H) × Pr(H) = 3p² + (p+1)(1-p) = 2p² + 1.

In replicator dynamics, if Pr(S)=p, the dynamical equation (the equation ruling the behavior of the system through time) is:

D(p) = [EP(S) − ĀEP]p

In other words, the rate of change of the frequency of a strategy (in this case S) is determined by the difference between the expected payoff of S and the average payoff.  Consequently, when S's expected payoff is greater than the average payoff the frequency of S increases, and when it is smaller the frequency of S decreases.  Hence, in our example we have:

D(p) = [EP(S) − ĀEP]p = [−2p² + 3p − 1]p.

Obviously, when D(p)=0 the frequency of S (that is, p) does not change.  The values of p for which the frequency of S does not change are called “fixed points”.  So, let us find the fixed points in our example, that is, let us find when

[−2p² + 3p − 1]p = 0.

Obviously, one fixed point is p=0.  For the other two, we need to solve

−2p² + 3p − 1 = 0,

which gives p=1 and p=1/2.  So, when p=0, or p=1, or p=1/2, the frequency of S does not change.  But what happens when p is not equal to any of the three fixed points?   Let us study the plot of

D(p) = [−2p² + 3p − 1]p.

As we can verify by substituting 1/3 for p, when 0 < p < 1/2 the rate of growth of p is negative, and when 1/2 < p < 1 the rate of growth of p is positive (just substitute 2/3 for p, for example).  Since for p = 1/2 the growth rate of p is zero, the plot looks (more or less) like this:

[Plot of D(p) = [−2p² + 3p − 1]p for 0 ≤ p ≤ 1: the curve is zero at p = 0, 1/2, and 1, negative on (0, 1/2), and positive on (1/2, 1).]

If at some time p<1/2, as the growth rate is negative p will eventually become zero, that is, the strategy S will disappear; by contrast, if at some time p>1/2, then S will become fixated, that is it will remain the only strategy (H will disappear).  If p=1/2, then exactly half of the population will play S and half H.  However, this equilibrium is not stable in the sense that even a minor deviation from it will push one of H or S to extinction and the other to fixation, both of which are stable.

The interval (0, 1/2) is the basin of attraction whose attractor is p = 0 (all H), and the interval (1/2, 1) the basin whose attractor is p = 1 (all S).  The fixed points p = 0 and p = 1 are asymptotically stable because each is the attractor of a basin of attraction.  If the basin of attraction of an attractor contains the whole interval in which p is defined, or at least all the interior points, then the attractor is globally stable.  In our example, no fixed point is globally stable.  1/2 is the interior fixed point and, as we saw, it is unstable.
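The whole analysis is easy to reproduce numerically.  Here is a short Euler-integration sketch in Python (the step size, horizon, and starting points are my own choices):

    # Replicator dynamics for the Stag Hunt above: D(p) = [-2p^2 + 3p - 1]p.
    def D(p):
        return (-2 * p**2 + 3 * p - 1) * p

    for p0 in (0.4, 0.5, 0.6):
        p, dt = p0, 0.01
        for _ in range(100_000):      # integrate long enough to converge
            p += dt * D(p)
        print(f"p0 = {p0}: p -> {p:.3f}")
    # p0 = 0.4 -> 0.000 (H fixates); p0 = 0.5 stays at the unstable
    # interior point; p0 = 0.6 -> 1.000 (S fixates)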

 

Replicator dynamics of a generalized 2×2 symmetric game

A symmetric game with two strategies, A and B, can be represented by the following payoff matrix, where the payoffs are those of the row strategies:

 

 

      A    B
A     a    b
B     c    d

 

It turns out that the replicator dynamics does not change if in any column we add or subtract the same quantity from all the boxes.  (For examples, see the exercises).  Hence, we can reduce the matrix by subtracting c from the first column and d from the second, obtaining

 

 

        A      B
A     a-c    b-d
B       0      0

 

Let us now determine the dynamics, with Pr(A)=p.

EP(A)=(a-c)p+(b-d)(1-p).

Because of our matrix manipulation EP(B)=0, and consequently the average expected payoff is simply

ĀEP = p[(a-c)p+(b-d)(1-p)]+0.

Hence,

D(p) = p[(a−c)p + (b−d)(1−p) − p²(a−c) − p(b−d)(1−p)].

After a bit of algebra, we get

D(p) = p(1-p)[(a-c)p+(b-d)(1-p)],

that is,

D(p)=p(1-p)[EP(A)].

Hence, D(p) = 0 when p=0, or p=1, or EP(A) = 0.  Note that solving EP(A) = 0 gives the interior fixed point.

So, to find the interior point:

• Reduce the matrix by setting strategy B's payoffs equal to zero and modifying the other payoffs accordingly.

• Solve EP(A) = 0.  (A sketch implementing this recipe follows the five cases below.)

 

There are five cases in a game:

  1. A dominates B: a≥c and b≥d, where at least one of the inequalities is strict.  It turns out that replicator dynamics wipes out dominated strategies; hence, A will reach fixation.
  2. B dominates A: c≥a and d≥b, where at least one of the inequalities is strict.  Since A is dominated, B reaches fixation.
  3. A is the best response to A and B to B.  (In other words, your best bet is to mimic the moves of your opponent).  Then, a>c and d>b, so that in the reduced matrix a-c > 0 and b-d < 0.  The interior point determines an unstable equilibrium.  The system is bistable, meaning that one of the two strategies will reach fixation, but which one does depends on the initial strategy distribution, that is, the initial value of p.  The strategy that has the larger basin of attraction is risk dominant: if the initial distribution is random, on average the risk dominant strategy will reach fixation more often.
  4. A is the best response to B and B the best response to A, that is, b>d and c>a, so that in the reduced matrix a-c < 0 and b-d > 0.  In other words, your best bet is to do the opposite of what your opponent does.  Then the interior fixed point determines a globally stable equilibrium: the system will converge to it independently of the original distribution.  A and B will coexist in a predetermined fixed ratio.  The system is ergodic, meaning that the final state is independent of the initial conditions.
  5. A and B are neutral: a=c and b=d.  Then selection is neutral, D(p)=0 at all times, and the original strategy distribution p will be preserved. 
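The recipe and the five cases can be wrapped into a few lines of Python (a sketch; the function name and the handling of ties are mine):

    # Classify the replicator dynamics of a symmetric 2x2 game with row
    # payoffs a, b (for A) and c, d (for B), following the five cases.
    def classify(a, b, c, d):
        A, B = a - c, b - d               # reduced matrix (B's row = 0)
        if A == 0 and B == 0:
            return "neutral: D(p) = 0 for all p"
        if A >= 0 and B >= 0:
            return "A dominates B: A reaches fixation"
        if A <= 0 and B <= 0:
            return "B dominates A: B reaches fixation"
        p_star = B / (B - A)              # interior point: solve EP(A) = 0
        if A > 0:                         # a > c and d > b: bistable
            return f"bistable; unstable interior point p* = {p_star:.3f}"
        return f"coexistence; globally stable interior point p* = {p_star:.3f}"

    print(classify(3, 0, 2, 1))   # the Stag Hunt above: bistable, p* = 0.500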

 

There are some interesting connections between dominance, Nash equilibria, ESS, and replicator dynamics.

  1. Nash equilibria determine fixed points.  However, the converse fails: fixed points such as p = 1 need not be associated with a Nash equilibrium.
  2. If the fixed point p is associated with an ESS, then p is asymptotically stable.  The converse need not be true.
  3. The fixed point p is associated with an ESS which uses every strategy with positive probability (that is, greater than zero) if and only if p is globally stable.
  4. No strongly dominated strategy survives replicator dynamics.

 

 

The quasi-replicator dynamics of the iterated Prisoners' Dilemma

In replicator dynamics, two players meet randomly, play a one-shot game and then separate, as each randomly meets a player again.  As dominated strategies do not survive replicator dynamics, defecting reaches fixation in The Prisoners' Dilemma.  What happens if we play The Prisoners' Dilemma with the evolution equation of replicator dynamics but with direct reciprocity, namely by having the same two players repeat the game more than once, with random drift, with the presence of mutations, and with occasional strategy execution errors?  To remove the temptation of using backward induction to justify defecting every time, let us suppose that the players do not know how many times they are playing each other; all they know is that after each round they have a probability p of playing again, so that the average play lasts 1/(1-p) rounds.  We may consider a general matrix for cooperation and defection in which only the row payoffs are given:

 

 

      C    D
C     R    S
D     T    P

 

One gets R(eward) for mutual cooperation, P(unishment) for mutual defection, S(ucker) for cooperating against a defector, and T(emptation) for defecting against a cooperator.  The Prisoners’ Dilemma obtains if T>R>P>S.  We could think of the game as follows.  The cooperator helps at a cost c and the receiver of the help gets a benefit b.  Defectors do not help and therefore incur no costs.  Then: R=b-c; S= -c; T=b; P=0.

To make things more interesting, in addition to ALLC (always cooperate) and ALLD (always defect), let us consider some reactive strategies that act on the basis of what happened in the previous stage. 

TFT (tit-for-tat) acts as follows: it starts by cooperating and then it considers the opponent’s last move; if the opponent cooperated TFT cooperates, and if the opponent defected, it defects.

GTFT (generous tit-for-tat) acts as follows: it’s like TFT with one difference: every so many moves (say, 1/3 of the times) it cooperates even if in the previous stage the opponent defected.

WSLS (win-stay; lose-shift) acts as follows: WSLS looks at its own payoff in the last stage; if it is equal to T or R, it considers its payoff a success and repeats its previous move; if not, it shifts strategy.  In short, if the previous payoff was one of the two highest, it keeps doing the same thing; if it wasn't, it switches.

Martin Nowak has run programs modeling the following scenario.  The matrix has R=3, T=5, P=1, and S=0.  There is a large number (100 in the run) of randomly chosen and uniformly distributed strategies.  There is direct reciprocity; occasionally, the strategies make mistakes, simulating human behavior; new strategies are put into play, simulating mutations, and neutral drift is allowed.  What typically happens (M. Nowak, Evolutionary Dynamics, ch. 5) can be visualized as follows:

[Figure: the typical cycle: ALLD nearly takes over; TFT displaces ALLD; GTFT displaces TFT; ALLC spreads by neutral drift; ALLD explodes again, unless WSLS arises and stabilizes cooperation.]

In a random mix of strategies, ALLD does very well, almost taking over.  At that point, even a small cluster of TFT, already present or introduced by mutation, will start expanding because it will defect against defectors (ALLD) but cooperate with cooperators (other TFTs mostly).  Once TFT becomes abundant, its unforgiving nature makes it succumb to GTFT.  The reason is that since mistakes sometimes occur, a TFT playing another TFT might defect instead of cooperating.  This will prompt the latter to defect as well in the next stage, thus starting a cycle of cooperation/defection.  By contrast, GTFT will at some point try to cooperate again, breaking the cycle and receiving a higher payoff.  In short, GTFT quickly recovers from mistakes while TFT does not.  What happens now depends on how generous GTFT is.  If it is sufficiently vindictive, it takes over and becomes stable.  However, if it is too generous, once it has taken over it will do no better than an ALLC, which may arise by mutation.  If the game is played long enough, ALLC will take over by neutral drift.  (In a population of N individuals with equal fitness, the probability that eventually all the population will be the descendants of a given individual A is 1/N.  Hence, if the game is played long enough, this eventuality will come about.)  At this point, an ALLD mutation will result in an ALLD explosion.  The cycle will start again.  However, if when the frequency of ALLCs is high WSLS arises as a mutant, the cycle is broken.  When two WSLS players A and B play each other, if they cooperated in the previous stage, they'll keep cooperating.  If A makes a mistake and defects, here's what happens:

A: CCCDDCCC…

B: CCCCDCCC…

Cooperation, in other words, will resume after two stages.

If A is a WSLS and B an ALLC, if they cooperated in the previous stage they’ll keep cooperating.  However, if A makes a mistake and defects, it will keep defecting against the much too nice B:

A: CCCDDDD…

B: CCCCCCC…

In short, WSLS exploits ALLC’s goodness while cooperating with other WSLS’s.

When a WSLS meets an ALLD, the ALLD is better off, as we have

WSLS: CDCD….

ALLD: DDDD…

Note that ALLD averages (P+T)/2 per game.  Hence, as long as R>(P+T)/2 , the WSLS playing each other will average more than an ALLD playing a WSLS.  In other words,

E(WSLS,WSLS)>E(ALLD,WSLS),

a strict Nash inequality that comes close to delivering an ESS.  (It just comes close because the population is finite and random events, random drift for example, are allowed.)  In other words, one (or just a few) ALLD mutants will not invade.  If R≤(P+T)/2, then a stochastic variant of WSLS that cooperates after mutual defection only with probability less than 1 will take over.  Occasionally, the system cycles back to ALLD, but the mechanism is unclear.
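These head-to-head comparisons are easy to explore by simulation.  The sketch below (Python; the 5% error rate and the number of rounds are my own choices, while the payoffs are Nowak's R=3, T=5, P=1, S=0) pits the reactive strategies against WSLS:

    import random

    # Repeated Prisoners' Dilemma with occasional execution errors.
    PAY = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
    ERR = 0.05   # probability that an intended move is executed wrongly

    def next_move(name, my_last, opp_last, my_pay):
        if name == 'ALLC':
            m = 'C'
        elif name == 'ALLD':
            m = 'D'
        elif name == 'TFT':    # copy the opponent's last move; open with C
            m = opp_last or 'C'
        elif name == 'GTFT':   # like TFT, but forgive 1/3 of defections
            m = 'C' if (opp_last != 'D' or random.random() < 1 / 3) else 'D'
        else:                  # WSLS: repeat after T or R, switch otherwise
            m = (my_last if my_pay in (5, 3)
                 else ('D' if my_last == 'C' else 'C'))
        return m if random.random() > ERR else ('D' if m == 'C' else 'C')

    def avg_payoff(s1, s2, rounds=100_000):
        a = b = pa = pb = None
        total = 0
        for _ in range(rounds):
            a, b = next_move(s1, a, b, pa), next_move(s2, b, a, pb)
            pa, pb = PAY[(a, b)], PAY[(b, a)]
            total += pa
        return total / rounds

    for s in ('ALLD', 'TFT', 'GTFT', 'WSLS'):
        print(f"{s} vs WSLS: {avg_payoff(s, 'WSLS'):.2f}")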

 

The limits of replicator dynamics

Replicator dynamics has some features that limit its application. 

The dynamics we considered has no mutation, in the sense that every replicator produces an identical copy of itself.  A side consequence of this is that if a strategy has disappeared, it will never reappear again.  (By the way, this is why p=1 and p=0 are always fixed points, even when they are not stable.)  There are ways of dealing with this.  For example, one could modify the replicator equation by introducing a term that makes, say, S turn into H with some probability q>0, the simplest case of mutation; alternatively, one can use Markov chains.  However, as this complicates things we shall not do it.  Even so, we can note the following.  In our Stag Hunt example, p=1/2 is an unstable interior point, which means that random mutations will move the system into either of the two attraction basins with equal probability.  However, if we change the payoff matrix to (5,5) when S meets another S, the interior point becomes p=1/4 (check it out!), which means that, given an initial uniform distribution, on average random mutations will push systems more often into the basin of attraction with p=1 as the attractor; hence, on average systems will spend most of their lives in that basin.  Of course, if there are n individuals in the population, there is a probability q^n that in the transition from one generation to another, all the S's (or enough of them) become H, thus bringing H to fixation; however, even for relatively small n, that probability is likely to be negligible.  For example, if q=10% and n=17, the resulting probability is 10^-17, which is phenomenally low, as 10^17 is roughly the age of the universe in seconds.

In addition, the population must be infinite (in practice, very large); with finite populations, random drift is unavoidable.  Consequently, if we are modeling a population that is not very large we need to use a different, and more complex, procedure than replicator dynamics.

 

The Dynamics of Finite Populations

In replicator dynamics, the population must be infinite.  Of course, this requirement is unrealistic, but replicator dynamics has the advantage of being straightforward and easy to work with.  Hence, as long as the population is large enough it is typically the first modeling approach. 

However, often populations are not large enough, and this requires a different approach that involves stochasticity; in other words, realistic finite populations entail jettisoning deterministic evolutionary rules: in finite populations chance matters.  The easiest way to see this is to consider the Moran process in the case of neutral drift, where selection plays no role. 

Consider a population of N individuals reproducing at the very same rate, which means that selection does not favor one over another: they are neutral variants.  At each time step, one individual is randomly chosen for reproduction and one for elimination.  The two individuals may be the same individual X, in which case X produces another X and dies in the same step.  (Note that this requirement makes the process of choosing the same as drawing with replacement).  It follows that N remains constant.  Suppose now that there are i individuals of type A and consequently N-i individuals of type B.  If we indicate with XD the fact that X is chosen for death and with XR the fact that X is chosen for reproduction, there are 4 cases:

  1. AD & AR, in which case Pr(AD & AR) = (i/N)².  Note that the result of this event is that the number of A's remains i.
  2. BD & BR, in which case Pr(BD & BR) = [(N-i)/N]².  Note that the result of this event is that the number of A's remains i.
  3. AR & BD, in which case Pr(AR & BD) = (i/N)[(N-i)/N] = [i(N-i)]/N².  Note that the result of this event is that the number of A's increases to i+1.
  4. AD & BR, in which case Pr(AD & BR) = (i/N)[(N-i)/N] = [i(N-i)]/N².  Note that the result of this event is that the number of A's decreases to i-1.

Now let us indicate with pi,i+1 the probability that i increases to i+1, and similarly for i decreasing to i-1, and i remaining the same.  Then we obtain the following state transition rules:

    • pi,i+1 = [i(N-i)]/N²
    • pi,i-1 = [i(N-i)]/N²
    • pi,i = 1 - pi,i+1 - pi,i-1 = 1 - 2[i(N-i)]/N²
    • p0,0 = pN,N = 1

(The last is true because when i=0 there are only B's and when i=N there are only A's.)  States i=0 and i=N are absorbing states because when the system reaches one of them it remains there forever; the other states are transient states.  If we wait long enough, the system will end up with all A's or all B's; that is, all the population will descend from the very same individual.  Hence, although there is no selection, eventually A or B will become fixated.  There are techniques, Markov chains for example, to determine interesting facts such as in how many steps the system will reach the fixation of A.  However, we shall leave that aside and ask instead a different question we can answer directly: if there are i A's, what is the probability that A will become fixated?  As there is no selection, each individual has the same chance of reproducing and leaving a lineage, namely 1/N; hence, that probability is i/N, as i is the number of A's.
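A direct simulation of the neutral Moran process (a Python sketch; names and sample sizes are mine) confirms the i/N fixation probability:

    import random

    # One run of the neutral Moran process; returns True if A fixates.
    def moran_neutral(N, i):
        while 0 < i < N:
            if random.random() < i / N:              # an A reproduces...
                i += random.random() < (N - i) / N   # ...replacing a B
            else:                                    # a B reproduces...
                i -= random.random() < i / N         # ...replacing an A
        return i == N

    N, i, trials = 20, 5, 20_000
    print(sum(moran_neutral(N, i) for _ in range(trials)) / trials)
    # close to i/N = 0.25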

Suppose now that we add selection to the Moran process as follows.  Let A’s fitness be r and B’s fitness be 1.  Obviously, if r>1 A is favored by selection; if r<1 B is favored, and if r=1 we have neutral drift.   We may work the new quantity r into the previous formulas as follows:

  1. Pr(AR) = ri/(ri+N-i)
  2. Pr(BR) = (N-i)/(ri+N-i)
  3. Pr(AD)= i/N
  4. Pr(BD)= (N-i)/N

The reason for ri in the numerator of the first formula, Pr(AR), is obvious: we simulate selection by making the number of A's count for more or less than it actually does by making it sensitive to r.  The numerator of Pr(BR) is given by the fact that B's fitness is 1 by definition.  The rationale for the denominator being ri+N-i rather than merely N is normalization.  Since Pr(AR)+Pr(BR)=1, the numerators determine the denominators.  Note that when r=1 the formulas revert to those for neutral drift.  From (1)-(4) we can obtain the state transition rules, just as before.  (What are they?)  It turns out that if A's fitness is r and the number of A's is i, A's fixation probability is:

P = [1 − 1/r^i]/[1 − 1/r^N].

If the population is large and r>1, r^N will be very large, 1/r^N will be very small, and therefore the denominator will become very close to 1.  Hence, in a large B population the fixation probability of a single A mutant with fitness r>1 will be approximately

ρA = 1- (1/r).

For example, if N=100, r=1.1 and i=1, the numerator becomes .091.  In the denominator, r^N ≈ 13780, so that the denominator becomes 13779/13780, which is very close to 1.  Hence, ρA = .091 is a good approximation.
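In code (a sketch; the helper name is mine):

    # Fixation probability of A with relative fitness r, starting from
    # i copies of A in a population of N, per the formula above.
    def fixation(r, N, i=1):
        if r == 1:
            return i / N                      # neutral drift
        return (1 - 1 / r**i) / (1 - 1 / r**N)

    print(round(fixation(1.1, 100), 3))   # 0.091, as in the example
    print(round(1 - 1 / 1.1, 3))          # large-N approximation: 0.091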

 

The generic 2×2 symmetric game in finite populations

What happens if we play a game in a finite population?  Let us consider the generic 2x2 symmetric game

 

 

      A    B
A     a    b
B     c    d

 

in a population of size N, with i A’s and therefore N-i  B’s.  For any A, there are i-1 other A’s, and for any B, there are N-i-1 other B’s.  Hence, given any A, the probability it interacts with another A is (i-1)/(N-1); the probability it interacts with a B is (N-i)/(N-1), and so on.  So, A’s expected value is

EP(A)= a[(i-1)/(N-1)] + b[(N-i)/(N-1)],

and B’s is

EP(B)= c[i/(N-1)] + d[(N-i-1)/(N-1)].

In replicator dynamics, fitness is totally determined by EP, and were we to apply it to this finite population, we would determine the average payoff and set up the replicator equation.  Here, however, fitness is given by a modification of EP.  Let us introduce a variable s measuring the intensity of selection, with s=0 indicating that selection is absent and s=1 indicating that fitness is completely given by EP.  When there are i A’s, let us indicate the fitness of an A with Fi and that of a B with Gi and define the two as:

 Fi = 1 – s + sEP(A)

and

Gi =  1 – s + sEP(B).

Note that when s=0, the fitness becomes 1 for every individual, which means that we have neutral drift.  By contrast, when s=1, EP totally determines fitness.  For 0<s<1, there will be a part of fitness determined by EP and a part by drift.   A’s reproduction chances must depend on

  • Fi
  • A’s frequency (the proportion of A’s in the population)
  • The average fitness.

(The same, of course applies to B’s.)

Suppose now that we superimpose a Moran process on our population, so that at each step one individual is chosen for reproduction and one for elimination.  The state transition rule for an A to be added is given by the probability that an A reproduces times the probability that a B is chosen for elimination:

pi,i+1 = [(i/N)Fi / F̄] × [(N-i)/N] = [iFi/(iFi+(N-i)Gi)] × [(N-i)/N],

where F̄ = (i/N)Fi + [(N-i)/N]Gi is the average fitness.  Here (i/N)Fi is A's fitness times A's frequency; dividing by F̄ normalizes; (N-i)/N is the probability a B is chosen for elimination.

Analogously:

pi,i-1 = [((N-i)/N)Gi / F̄] × [i/N].

As usual,

pi,i = 1 - pi,i+1 - pi,i-1.

As before, i=0 and i=N determine the two absorption states, which means that in the long run, A or B will become fixated.
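As a Python sketch (the function name and the sample values in the last line are mine):

    # Transition probabilities of the Moran process with frequency-
    # dependent fitness in a 2x2 game (a, b, c, d), selection intensity s.
    def transition(i, N, a, b, c, d, s):
        ep_a = (a * (i - 1) + b * (N - i)) / (N - 1)
        ep_b = (c * i + d * (N - i - 1)) / (N - 1)
        F = 1 - s + s * ep_a                        # fitness of an A
        G = 1 - s + s * ep_b                        # fitness of a B
        avg = (i * F + (N - i) * G) / N             # average fitness
        up = (i / N) * (F / avg) * (N - i) / N      # p_{i,i+1}
        down = ((N - i) / N) * (G / avg) * (i / N)  # p_{i,i-1}
        return up, down, 1 - up - down

    print(transition(10, 100, 3, 1, 2, 4, 0.1))     # illustrative values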

There are some general rules determining the behavior of this system when:

  1. s→0, (selection is very weak)
  2. A and B are best replies to themselves, (a>c and d>b)
  3. N is large (in practice, about 100 is enough).

The first rule is that if A is risk dominant (A's basin is larger than B's, or p* < 1/2) then ρA > ρB, where ρA is the probability that the offspring of a single A in a B-population will achieve fixation, and analogously for ρB.  In short, the probability of A replacing B through a single mutant is greater than that of B replacing A.

However, note that ρA > 1/N does not entail ρB < 1/N: both ρA  and ρB  can be greater or smaller than 1/N.  In the first case, selection favors both the fixation of an A in a B-population and of a B in an A-population; in the second case, selection contrasts replacement in either direction.  

So, can we determine when, say, ρA > 1/N, namely when a single A in a B-population has a greater probability of becoming fixated (its descendants taking over) than under neutral drift?  It turns out that when (1)-(3) above apply, unexpectedly the "1/3 Law" holds.  Determine the basin of attraction for A and B by using replicator dynamics.  Then,

if the basin of attraction of B is less than 1/3, then ρA > 1/N.

In other words, if in replicator dynamics A would become fixated for some initial p<1/3,  then ρA > 1/N, that is, strategy A can be deemed advantageous under conditions (1)-(3).  For example, if

 

 

      A    B
A     5    1
B     2    2

 

is played, B’s basin is ¼, and therefore A is advantageous in that ρA > 1/N.  The 1/3 law applies to many (possibly all) processes in addition to the Moran process.  The intuitive rationale, which we shall not prove, for this is that an A invader in a B-population plays on average 2/3 of the times with a B and 1/3 of the times with an A. 

One can easily see that if A dominates B, then in replicator dynamics there is no interior point p*: the solution of EP(A) = 0 falls outside the interval (0,1), and since p is a probability it is discarded; p has only two relevant values, 0 and 1, and A or B will reach fixation every time in replicator dynamics.  In this case, B's basin of attraction is trivially smaller than 1/3, and therefore A is always advantageous, which is not unexpected.  However, there is a surprise: even if A dominates B, as long as c>b (B does better against A than A against B) there will be a critical Nc such that if N < Nc then ρB > 1/N: the probability that a single B mutant will reach fixation is greater than under neutral drift!  This is in sharp contrast with replicator dynamics, in which dominated strategies necessarily disappear.

One can define something analogous to an ESS in finite population of size N.  (Remember that the notion of ESS involves infinite populations).  For a large N, B is an ESSN against an A invader if

  1. EP(B,B)>EP(A,B), that is, d>b
  2. The basin of attraction of B is larger than 1/3.

Condition (1) entails that selection is against A as B is a strict best answer to itself, and condition (2) entails that ρA, the probability of fixation of a single A in a B-population, is smaller than 1/N, which means that selection favors B.  Of course, since the system is stochastic (1)-(2) do not guarantee that A will not invade as ρA need not be zero.

For example, consider the following game:

 

 

      A    B
A     3    1
B     2    4

 

Both A and B are best replies to themselves.  The unstable interior point is p=3/4.  Hence, B’s basin is larger than 1/3, which entails that A’s fixation probability is less than 1/N.  Hence, B is an ESSN .

Note that if we increase EP(A,A) to 5 and diminish EP(B,B) to 2 then p=1/4, and B will not be an ESSN any longer.   
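Both verdicts can be checked mechanically.  A Python sketch, assuming the bistable case (a > c and d > b) so that B's basin of attraction is the interval (0, p*):

    # ESS_N test for B in a 2x2 symmetric game under weak selection and
    # large N: (1) d > b, and (2) B's basin of attraction exceeds 1/3.
    def is_ess_n(a, b, c, d):
        if not d > b:
            return False
        p_star = (d - b) / ((a - c) + (d - b))   # interior fixed point
        return p_star > 1 / 3                    # B's basin is (0, p*)

    print(is_ess_n(3, 1, 2, 4))   # True:  p* = 3/4 > 1/3
    print(is_ess_n(5, 1, 2, 2))   # False: p* = 1/4 < 1/3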

 

Spatial Games

Replicator dynamics assumes random interactions among strategies.  But this, as we noted, is unrealistic in many contexts.  In Stag Hunt, for example, we can think of groups of S's in an S-structure interacting with H's grouped together in an H-structure.  Then things can be dramatically different from random interaction situations, as the increase or decrease of S depends not on what happens inside the S-structure but on what happens at the border between the S's and the H's.

In such contexts, types of evolutionary dynamics different from replicator dynamics suggest themselves; for example, a simple dynamics could be:

Every individual (remember that only those at the border matter) looks at its payoffs and at those of its immediate neighbors, and in the next round all individuals simultaneously adopt the strategy that produced the highest payoff. 

The way  to make these ideas more precise is to look at spatial games.

Consider a spatial grid in which each individual occupies a position and interacts with all of its neighbors.  The payoffs of each interaction are summed and in the next round:

  • each player adopts the strategy with the highest payoff in the neighborhood
  • all the updatings occur at the same time, which means that generations are not overlapping

 

 

 

 

 

 

 

 

        D1   D2   D3   D4
        D12  C1   C2   D5
   D13  D11  C3   C4   D6
        D10  D9   D8   D7

 

 

 

 

 

 

 

 

Here we have 4 cooperators from Stag Hunt surrounded by 12 defectors.  We may imagine this grid as a small part of a larger one that is wrapped around so that there are no boundary effects.  A cell's neighborhood is the von Neumann neighborhood, constituted by the 4 cells sharing a side with it.  For example, C3's neighborhood is constituted by D11, C1, C4, and D9.  Hence, the fate of C3 depends on its strategy, those of its neighbors D11, C1, C4, D9, and those of its neighbors' neighbors.  Let us look at C3's fate.  It will obtain a payoff of 6 from cooperating with C1 and C4 and a payoff of 0 from attempting, and failing, to cooperate with D9 and D11.  In short, its payoff will be 6.  The same is true for the remaining 3 cooperating cells.  Consider now D11.  It will have a payoff of 3 from its interactions with D13, D12, and D10, and a payoff of 2 from its interaction with C3, for a total of 5.  The same holds for the other defectors sharing a side with a cooperator.  Hence, in the next round the eight defecting cells that border a cooperator (D2, D3, D5, D6, D8, D9, D11, D12) will turn into cooperators, and the cooperator square will expand, eventually taking over.  Note that cooperation is more successful in this spatial game than under replicator dynamics.  This is true of most, but by no means all, interesting spatial games.

A standard way to study an evolutionary game is to consider the conditions for invasion.  So, imagine that cooperators have taken over and that one mutates into a defector.  Its payoff will be 8, while that of each of its cooperating neighbors will be 9, which means that the defector will vanish in the next round.  Two neighboring defectors will have a payoff of 7, while their neighboring cooperators will get 9; hence, the defectors will vanish.  With 3 neighboring defectors, the defectors bordering with three cooperators get a payoff of 7, against one of 9 for the cooperators.  More defectors will fare even worse.  So, a community of cooperators is immune from invasion from defectors.   

Imagine now a community of defectors with one mutant cooperator.  The cooperator will have a payoff of 0 while each defecting neighbor will have one of 5, with the result that the cooperator will vanish in the next round.  Two neighboring cooperators will each obtain a payoff of 3, while each defecting neighbor will get one of 5; consequently, the two cooperators will not survive to the next round.  Three neighboring cooperators will evolve into a group of four cooperators with the shape of a Latin cross ('+'), with the central cooperator getting 9; this group will hold its own.  As we saw, a square of four cooperators will take over.  In short, in the spatial version of Stag Hunt we considered, cooperation is much more successful than defection when compared with replicator dynamics.
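A compact simulation (Python with NumPy; the torus size, the initial layout, and all names are my own) reproduces the expansion of the cooperator square under the synchronous imitate-the-best rule:

    import numpy as np

    # Spatial Stag Hunt with von Neumann neighborhoods on a torus.
    # 1 = cooperator (S), 0 = defector (H); payoffs from the matrix above.
    PAY = {(1, 1): 3, (1, 0): 0, (0, 1): 2, (0, 0): 1}

    def step(grid):
        n = grid.shape[0]
        pay = np.zeros_like(grid, dtype=float)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = np.roll(np.roll(grid, dr, 0), dc, 1)
            for (s, t), v in PAY.items():
                pay += v * ((grid == s) & (nb == t))
        new = grid.copy()
        for r in range(n):
            for c in range(n):
                best, best_pay = grid[r, c], pay[r, c]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = (r + dr) % n, (c + dc) % n
                    if pay[rr, cc] > best_pay:
                        best, best_pay = grid[rr, cc], pay[rr, cc]
                new[r, c] = best
        return new

    grid = np.zeros((12, 12), dtype=int)
    grid[5:7, 5:7] = 1                    # a 2x2 square of cooperators
    for _ in range(10):
        grid = step(grid)
    print(grid.sum(), "cooperators after 10 rounds")   # the square expands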

In addition to von Neumann neighborhoods, one can use Moore neighborhoods, in which a cell's neighbors are all the cells touching it, so that the neighbors are all those reachable from the cell by a king move in chess.  Grids may become n-dimensional, or, given an appropriate metric, one might define a distance r from a cell such that only and all the cells within r are neighbors, with the result that the number of neighbors may vary from cell to cell.  Analogously, the transition rules may be changed.  For example, a cell might become a cooperator (defector) if the average payoff of all the neighborhood cooperators (defectors) is higher than that of the neighborhood defectors (cooperators).  Or, the probability of change might be linked to the payoff of the most successful neighbors.  Analogously, one can overlay a Moran process, thus obtaining randomly asynchronous updating: a cell is chosen at random, the relevant payoffs are determined, and then only that cell is updated, thus mimicking overlapping generations.  There is, then, a great variety of very different dynamics producing very different types of evolution.  Interestingly, games with stochastic (probabilistic) updates usually (but not always!) allow for the coexistence of cooperators and defectors (often with a large majority of cooperators) for very long times, even in The Prisoners' Dilemma.