Possible models for the development of human prosociality
A model of the evolution of human cooperation should mirror basic features of the hunter-gatherer societies within which we spent 95% of our existence and within which our social evolution originally took place. A good case can be made that such features are similar to those present in most hunter-gatherer societies for which we have anthropological records. Hence, the model should take into account crucial features of late Pleistocene Homo sapiens societies.
Finally, the model must describe as its outcome an equilibrium that is both attainable and stable.
With this in mind, let us look at
some proposed models of the emergence and persistence of cooperation.
THE RATIONAL MODEL
This model is based on the strategic
interaction among players who are not merely self-regarding but rational as well, as classical game
theory assumes. The basic idea is to appeal to repeated games and to Folk
Theorems, to which we now turn.
Suppose we repeat a game G, the Prisoners’ Dilemma for example, an indefinite number of times. Each round of G is called a “stage” of the repeated game. It turns out that the repeated game has different properties than G; for example, there are Nash equilibria of the repeated game that are not Nash equilibria of G. (A Nash equilibrium obtains when each player’s strategy is a best reply to the others’. For example, consider driving on the right side or on the left side of the road. If you drive on the right side, my best strategy is to do the same instead of driving on the left. The same applies to you. So, when all players drive on the right side we have a Nash equilibrium.) To understand the import of this we need the
notions of discount factor and of signaling.
The discount factor δ is the equivalent
in present units of one unit of value to be received one time unit from now.
So, in general, to you $1 received one year from now is worth δ of a
present dollar; when δ=1 one values goods to be received one time unit
from now exactly as much as present goods. Hence, when life is uncertain or the
future looks grim the discount factor is low; in general, patient agents act on the basis of a high discount factor, impatient ones on the basis of a low one. In monetary terms, when inflation is high, the discount rate is high and the discount factor small.
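A quick worked illustration (the numbers are ours): under a constant discount factor δ, a perpetual stream of one unit per period, starting now, is worth 1 + δ + δ² + δ³ + … = 1/(1 − δ) present units. So an agent with δ = 0.9 values the stream at 10, while an agent with δ = 0.5 values it at only 2. This geometric sum is what gives long-run punishment threats their force in the repeated games below.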
In determining the expected payoff of a strategy in a repeated game in which defection matters, it is important to have reliable signals telling one whether other players have defected or not. A
signal is public if all the players receive it (otherwise
it’s private), and it is perfect if
it correctly reports whether a player has defected or not (otherwise it’s
imperfect).
Consider now the following Prisoners’ Dilemma (payoffs listed as Player 1’s; Player 2’s):

                    Player 2
                 S            T
Player 1   S   +5; +5      -10; +10
           T   +10; -10    -5; -5
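As a minimal sketch (the encoding and the brute-force check are ours), one can verify that mutual defection (T, T) is the stage game’s only Nash equilibrium in pure strategies:

```python
# Stage game above: (row move, column move) -> (row payoff, column payoff).
PAYOFFS = {
    ("S", "S"): (5, 5),
    ("S", "T"): (-10, 10),
    ("T", "S"): (10, -10),
    ("T", "T"): (-5, -5),
}
MOVES = ("S", "T")

def is_nash(row, col):
    """True if neither player gains by unilaterally switching moves."""
    row_pay, col_pay = PAYOFFS[(row, col)]
    no_row_gain = all(PAYOFFS[(r, col)][0] <= row_pay for r in MOVES)
    no_col_gain = all(PAYOFFS[(row, c)][1] <= col_pay for c in MOVES)
    return no_row_gain and no_col_gain

print([cell for cell in PAYOFFS if is_nash(*cell)])  # -> [('T', 'T')]
```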
If the players can use mixed strategies (for example, Player 1 might use S 30% of the time and T 70% of the time), it turns out that any point in the quadrilateral below, where the abscissa represents player 1’s payoff and the ordinate player 2’s, corresponds to a possible payoff outcome. However, since by defecting each player can guarantee himself a payoff of at least -5, only the points in the quadrilateral ABCD represent strategies with payoffs greater than those resulting from universal defection. As the players are rational, they’ll never settle for any payoff smaller than -5, and consequently only the points in ABCD represent feasible outcomes.
Suppose now that each player is given a list of moves that will result, if both follow it accurately, in an average payoff of a for player 1 and b for player 2, corresponding to point P in the graph. Then, player 1 could adopt the following strategy: “As long as player 2 follows the list, follow the list as well; however, if 2 deviates, then maximally punish him (in this specific case, defect) forever after.” (This type of strategy is called a “trigger strategy.”)
Imagine now that player 2 follows the equivalent strategy. Clearly, the mixed strategies leading to P constitute a Nash equilibrium (they are best replies to each other), as any deviation from them will result in the application of the trigger strategy, which will assure that the deviator (and everybody else once the deviator retaliates) will get -5 ever after. Importantly, there are three assumptions at work: (1) the game is repeated indefinitely; (2) the players’ discount factor is high enough that the future losses from punishment outweigh the one-time gain from deviating; (3) signals are public and perfect, so that deviations are reliably detected.
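To see what assumption (2) demands in this game, here is a minimal sketch (our own calculation, using the payoff matrix above):

```python
# Value of cooperating forever (5 per round): 5 / (1 - delta).
def comply(delta, reward=5):
    return reward / (1 - delta)

# Value of a one-shot deviation against a trigger strategy:
# 10 now, then -5 in every later round.
def deviate(delta, temptation=10, punishment=-5):
    return temptation + delta * punishment / (1 - delta)

for delta in (0.2, 1/3, 0.5, 0.9):
    print(f"delta={delta:.2f}  comply={comply(delta):7.2f}  "
          f"deviate={deviate(delta):7.2f}")

# Solving comply(d) >= deviate(d) gives 15d >= 5, i.e. d >= 1/3: the
# trigger equilibrium holds only for sufficiently patient players.
```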
Two things are worth noting. First, in any game a player can always hold an opponent down to the worst payoff that opponent cannot avoid, his minimax value (here, -5 is the minimum to which the original deviator may be pushed); hence, the above argument is general. Second, all of this applies to games with many players as well. Since P can be any point whatever in ABCD, we have the most basic version of the Folk Theorem:
If (1)-(3) obtain, any point in ABCD (any payoff outcome in ABCD) can be reached by a Nash equilibrium in the repeated game.
So, when the Folk Theorem applies,
saying that a certain outcome is a Nash equilibrium is not saying much, as just
about any outcome can be a Nash equilibrium. The theorem can be extended to
many cases of public but imperfect information, and even to cases of private,
but almost public, information.
In providing a genesis of cooperation one must show that the relevant equilibrium is both attainable and stable. At first inspection, Folk Theorems seem to do this straightforwardly: any feasible payoff above the maximin of -5 is a Nash equilibrium, and a Nash equilibrium is, in a way, self-fulfilling, in the sense that if others stick to it, then it is not to one’s advantage to deviate. Such an equilibrium is thus, in principle, self-perpetuating, and therefore stable. However, there are serious problems.
· Attainability. Since there is an infinite number of Nash equilibria, how do separate individuals coordinate to settle on one? The easiest way is to assume that coordination rules are already present as social rules. In our example, the players were given a list of moves. But obviously this will not do, as it posits what needs to be explained. Moreover, social rules are often broken when contrary to immediate self-interest, and therefore they are discretionary unless enforceable, which already presupposes social coordination. Of course, one might argue that the enforcement (through punishment) is carried out by individual members without any need to coordinate because it is advantageous to the enforcer, for example by withdrawing cooperation (a form of punishment) and thereby reducing the cost associated with cooperation. However, this solution is problematic: if punishment is advantageous to the punisher, then why punish only rule breakers? And if it is disadvantageous, why punish at all? It seems we are back in the swamps of the Prisoners’ Dilemma, where defection (free-riding) dominates. A solution might be the introduction of bargaining, but this seems already to presuppose coordination rules.
· Cognitive requirements. It is doubtful that the cognitive requirements of the Folk Theorems can be realistically satisfied. Public information can be achieved either in very small groups where everyone sees what everyone else does, or with an information distribution system in larger groups. But the presence of such a system already presupposes considerable levels of coordination.
· Homo economicus. Evidence from behavioral game theory shows that the assumption that humans are merely self-interested is mistaken; this, in turn, eliminates the need to use purely self-interested individuals in models for the development of altruism.
POSITIVE ASSORTMENT MODELS
In standard replicator dynamics,
interactions are random. For example, if 20% of the group members are
self-regarding, then any given individual will interact with a self-regarding
member 20% of the time. However, when it comes to human interaction, such a requirement seems implausible. Hence, many models try to explain the emergence of other-regarding behavior by appealing to positive assortment: those with a tendency to be other-regarding interact with each other more frequently than by mere chance. Here are some interesting cases.
Kin altruism
If I increase my identical twin’s fitness at a cost to me, my altruistic genes will be transmitted, through my twin, to the next generation. So, if I behave altruistically towards my kin, we have a case of positive assortment. The key equation here is Hamilton’s rule,

r > c/b,

where r is the degree of relatedness between actor and recipient, c the fitness cost to the altruist, and b the fitness benefit to the recipient.
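As a worked illustration (standard relatedness values, not from the text): for full siblings r = 1/2, so altruism spreads only if the act delivers more than twice as much benefit as it costs (b > 2c); for first cousins r = 1/8, requiring b > 8c. The more distant the kin, the steeper the benefit-to-cost requirement.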
Problem:
The value of r is unlikely to be high in a population that is not highly inbred.
In short, although there is no question that kin altruism is a force in evolution, it cannot explain the simple fact that among primates, and especially human hunter-gatherers, altruistic cooperation typically extends well beyond kin, at times even trumping it.
Reciprocal Altruism
If two individuals are randomly paired to play the Prisoners’ Dilemma for many rounds, then cooperation becomes probable if a strategy of reciprocal altruism is introduced. The idea here is that X cooperates with Y if Y has cooperated with X, and vice versa. This is an example of positive assortment in that cooperators tend to cooperate with other cooperators more than with the generic player. The simplest of these strategies is Tit-For-Tat (TFT), which says to cooperate if the other player cooperated in the previous round and to defect if the other player defected in the previous round: TFT has a short memory. Although the evidence for reciprocal altruism outside humans is largely confined to other primates, there is no doubt that it played an important role in human interaction, as food sharing in our ancestral past was probably network-based rather than common-pot-based; in other words, one primarily shared with the individual who had previously shared with one. It turns out that if a few TFTs are present, TFT beats “Always Defect”, and therefore produces an accessible equilibrium. There is evidence that it is not stable, however, as TFT is itself supplanted by “Generous TFT” (GTFT), a strategy involving some degree of forgiveness towards shirkers, which becomes stable if it is not too generous. (Of course, what counts as too generous depends on the payoff matrix.)
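The invasion claim can be checked with a minimal simulation sketch (payoffs from the matrix above; the round count and TFT fraction are hypothetical choices of ours):

```python
PAYOFFS = {("S", "S"): (5, 5), ("S", "T"): (-10, 10),
           ("T", "S"): (10, -10), ("T", "T"): (-5, -5)}

def match(strat1, strat2, rounds):
    """Total payoffs when two strategies play the stage game repeatedly."""
    last1 = last2 = "S"  # both open as if the opponent had cooperated
    total1 = total2 = 0
    for _ in range(rounds):
        move1, move2 = strat1(last2), strat2(last1)
        pay1, pay2 = PAYOFFS[(move1, move2)]
        total1 += pay1
        total2 += pay2
        last1, last2 = move1, move2
    return total1, total2

def tft(opp_last):
    """Tit-For-Tat: copy the opponent's previous move."""
    return opp_last

def alld(_opp_last):
    """Always Defect, regardless of history."""
    return "T"

rounds, p = 11, 0.06  # p: fraction of TFT players in the population
tft_vs_tft, _ = match(tft, tft, rounds)
tft_vs_alld, alld_vs_tft = match(tft, alld, rounds)
alld_vs_alld, _ = match(alld, alld, rounds)

exp_tft = p * tft_vs_tft + (1 - p) * tft_vs_alld
exp_alld = p * alld_vs_tft + (1 - p) * alld_vs_alld
print(exp_tft > exp_alld)  # True: with these payoffs TFT wins once p > 1/20
```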
Problem:
Computer models show that TFT is not
accessible and/or not stable when interactions involve more than two players (typically, six players are enough to reduce
cooperation drastically) unless there are no mistakes and information is public
and accurate, each of which is an
unrealistic requirement.
Indirect Reciprocity
Indirect reciprocity is based on reputation. If X benefits Y, then X has a greater chance of being benefited by Z than if he benefits nobody. Positive assortment comes about because those with a good reputation will cooperate with each other more than those without it. A strategy embodying indirect reciprocity is the good standing strategy. Players who cooperated with others in the past are in good standing; otherwise, they are in bad standing. The strategy is to cooperate only with those who are in good standing if one is in good standing, and to cooperate unconditionally if one is in bad standing due to a previous mistake, so as to reacquire good standing. In incarnations of this model involving random interaction, indirect reciprocity will succeed if the probability p of knowing the score of another player exceeds the ratio between cost and benefit:

p > c/b.
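For instance (the numbers are ours): if helping costs the donor 1 unit and confers 3 units on the recipient, reputation-based cooperation pays only if players know a random partner’s standing more than one time in three (p > 1/3); the larger and more anonymous the group, the harder this threshold is to meet.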
Problem:
This information requirement is
high, and exceedingly difficult to satisfy if interactions involve more than a
few players. Although the use of language may facilitate the attainment of the
relevant information, the strategy-based equilibrium is unstable if errors are
allowed and information is imperfect, as it is bound to be given the incentive
to convey false information.
Possible solution:
The information problem in a large group may be overcome if one engages in costly signaling, namely signaling that cannot be faked. For example, an act of bravery, a public sharing of food, or some ritual scarring may increase one’s reputation. So, indirect reciprocity can produce a stable cooperative equilibrium in small groups where everyone knows all the relevant information about everyone else (a fact that can be plausibly modeled with various types of spatial games), or when costly signaling is present.
Multi-Level (Group) Selection Models
The population is divided into
groups. The idea here is that although
cooperation lowers one’s fitness within one’s group, it sufficiently increases
the group’s average fitness with respect to the population to render one
reproductively successful. So, in general, in a group selection model, there
are two forces:
· within-group selection, which typically disadvantages altruists because they tend to do worse than the selfish members of their group;
· between-group selection, which advantages altruists because the average member of a group with many cooperators will do better than the average member of other groups with little cooperation.
When cooperators suffer no
disadvantage within their group (this may happen because of strong leveling
factors), the model is one of weak
multi-level (or group) selection; when cooperators do suffer a disadvantage,
the model is one of strong
multi-level (group) selection.
There are different types of multi-level selection; some work and some don’t.
Consider a purely gene-based model. Groups are reproductively isolated, but individuals replicate in proportion to how much better or worse their payoffs are relative to the population average. Individuals interact only with members of their group. Given the presence of cooperators, the more internally homogeneous groups with more cooperators will increase; for example, individuals in a group made entirely of cooperators will have payoffs much above the population average. The destiny of cooperators depends on whether
1. Pr(C|C) – Pr(C|~C) > c/b,
2. Pr(C|C) – Pr(C|~C) = c/b,
3. Pr(C|C) – Pr(C|~C) < c/b,
where Pr(C|C) stands for the probability that one interacts with a cooperator given that one is a cooperator, and Pr(C|~C) is the probability that one interacts with a cooperator given that one is not a cooperator. In (1), cooperation will increase; in (2) it will remain stationary; and in (3) it will decrease. The quantity

F = Pr(C|C) – Pr(C|~C)

is Wright’s inbreeding coefficient, which measures the level of genetic differentiation among the groups and also the degree of positive assortment. It turns out that the evidence from foraging populations indicates that F is quite low, on the order of 1/12, which would require the benefit b to be at least 12 times greater than the cost c, a condition too stringent to make the emergence of cooperation likely. Ethnographic evidence points to high levels of migration among groups, resulting in low genetic differentiation between groups and hence insufficient disparity in average payoff levels among them. So, such group selection mechanisms are unlikely.
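The arithmetic can be checked with a minimal sketch (the F value is the one cited above; the cost and benefit figures are hypothetical):

```python
def cooperation_increases(F, c, b):
    """Condition (1) above: F = Pr(C|C) - Pr(C|~C) must exceed c/b."""
    return F > c / b

F = 1 / 12  # Wright's coefficient for foraging populations, as cited above
print(cooperation_increases(F, c=1, b=13))  # True: b just over 12x the cost
print(cooperation_increases(F, c=1, b=10))  # False: b only 10x the cost
```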
Networks
In human societies, individuals are part of networks, so that they interact only with members to whom they are directly linked. In other words, each individual interacts only with a group of neighbors. After each stage of the game, individuals update their strategy, adopting the strategy C of neighboring cooperators with probability equal to the sum of the payoffs of all the neighboring cooperators divided by the sum of the payoffs of all the neighbors. Nowak and others have shown that cooperation will increase if

1/k > c/b,

where k is the average number of neighbors an individual has. This obtains because the smaller the neighborhoods, the more likely they are to differ from each other, thus effecting positive assortment. This means that the smaller the average neighborhood, the greater the chance of an expansion of cooperators.
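Rearranging (our restatement): 1/k > c/b is equivalent to b > kc, so the benefit of cooperation must exceed k times its cost. With k = 4 neighbors this means b > 4c; with a neighborhood of 45, b > 45c. The condition thus becomes rapidly more demanding as neighborhoods grow.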
Problem: in foraging societies the whole group, typically about 30
to 60 or so individuals, often constitutes the neighborhood, which requires c/b
to be too small to be realistic.