Game Theory 7 – chap10. Infinitely repeated games

Infinitely Repeated Prisoners’ Dilemma

Imagine that the prisoners’ dilemma is played infinitely many times

To introduce discounting of future payoffs, we denote by \(\delta \in [0, 1)\) the players’ common discount factor (we need \(\delta < 1\) for the discounted sums below to converge)

Suppose that the player obtains a payoff of \(\nu\) every period

Then the sum of the discounted payoff stream, or simply the discounted payoff, is

\(\nu + \nu \delta + \nu \delta^{2} + \nu \delta^{3} + \cdots = \frac{\nu}{1-\delta}\)

A more convenient way to express payoffs in repeated games is

\(\frac{\nu}{1-\delta} \cdot (1 - \delta) = \nu\)

This is referred to as the average discounted payoff and denoted \(\pi\)

e.g., \(10 + 5 \delta + 10 \delta^{2} + 5 \delta^{3} + \cdots = \frac{10}{1-\delta^{2}} + \frac{5\delta}{1-\delta^{2}}\)

average discounted payoff: \(\left(\frac{10}{1-\delta^{2}} + \frac{5\delta}{1-\delta^{2}}\right) \cdot (1-\delta) = \frac{10+5\delta}{1+\delta}\), using \(1-\delta^{2} = (1-\delta)(1+\delta)\)
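The closed form above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `average_discounted_payoff`, the truncation horizon `periods`, and the choice \(\delta = 0.9\) are all assumptions made for illustration.

```python
def average_discounted_payoff(stream, delta, periods=2000):
    """Approximate (1 - delta) * sum_t delta**t * stream(t).

    The infinite sum is truncated after `periods` terms (an assumption;
    the neglected tail is negligible when delta is well below 1).
    """
    total = sum(delta**t * stream(t) for t in range(periods))
    return (1 - delta) * total


def alternating(t):
    """Payoff stream 10, 5, 10, 5, ... starting at t = 0."""
    return 10 if t % 2 == 0 else 5


delta = 0.9  # illustrative discount factor
approx = average_discounted_payoff(alternating, delta)
closed_form = (10 + 5 * delta) / (1 + delta)
print(approx, closed_form)  # the two values should agree up to rounding
```

The same helper works for any payoff stream given as a function of the period index, so it can also be used to sanity-check the deviation payoffs computed later.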

To formalize the idea of reputation in infinitely repeated games, we consider the following simple strategy:

Cooperate so long as no one has ever defected; otherwise defect

Hence (D, D) is used in the punishment phase and (C, C) in the cooperation phase.
This kind of strategy is called a grim-trigger strategy

To see whether this strategy constitutes a subgame perfect equilibrium (SPE), we exploit the symmetric payoff structure and focus on player 1’s incentives to deviate

e.g., symmetric: \(u_{1}(s_{1}, s_{2}) = u_{2}(s_{1}^{\prime}, s_{2}^{\prime})\) whenever \(s_{1} = s_{2}^{\prime}\) and \(s_{2} = s_{1}^{\prime}\), i.e., swapping the players’ actions swaps their payoffs

1 \ 2 | C    | D
C     | 2, 2 | 0, 3
D     | 3, 0 | 1, 1

\(u_{1}(C, D) = u_{2}(D, C) = 0\)

(1) Cooperation Phase

Eqbm: (C, C), (C, C), (C, C), … → \(\pi_{1}\) = 2

Deviation: (D, C), (D, D), (D, D), … → \(\pi_{1}\) = \((1-\delta)(3+\sum_{t=1}^{\infty}\delta^{t})\)

= \((3 + \delta + \delta^{2} + \cdots) \cdot (1-\delta) = (3 + \frac{\delta}{1-\delta}) \cdot (1-\delta) = 3(1-\delta) + \delta = 3 - 2\delta\)

\(2 \geq 3 - 2\delta \Leftrightarrow \delta \geq 1/2\)
So, if \(\delta\) is at least 1/2, deviating in the cooperation phase is unprofitable

(2) Punishment Phase

Eqbm: (D, D), (D, D), (D, D), … → \(\pi_{1}\) = 1

Deviation: (C, D), (D, D), (D, D), … → \(\pi_{1}\) = \((1-\delta)(0+\sum_{t=1}^{\infty}\delta^{t} \cdot 1) = \delta\)

Since \(1 \geq \delta\) for every discount factor, the deviation is never profitable. Hence defecting forever is the best response for player 1

Therefore, the grim-trigger strategy profile can be supported as an SPE if \(\delta\) ≥ \(\frac{1}{2}\)

Remark: The one-shot deviation principle states that a player has no profitable deviation in any subgame if and only if she has no profitable one-shot deviation. Therefore, to determine whether a player’s behavior is optimal, it is enough to check that the player cannot benefit from deviating in the current period only and then returning to the prescribed strategy.
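The two one-shot deviation checks for grim trigger can be sketched in code. This is an illustrative sketch under assumptions, not the notes' own method: the dictionary layout `PD1`, the function name `grim_trigger_is_spe`, and the use of `Fraction` for exact arithmetic are all choices made here. Payoffs are player 1's stage payoffs from the table above, keyed by (player 1's action, player 2's action).

```python
from fractions import Fraction

# Player 1's stage payoffs in the first prisoners' dilemma (hypothetical layout).
PD1 = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def grim_trigger_is_spe(u, delta):
    """Check both one-shot deviations in average discounted payoffs.

    After any deviation, play moves to (D, D) forever, so the continuation
    is worth delta * u[D, D] in average discounted terms.
    """
    punish = u[("D", "D")]
    # Cooperation phase: follow (C, C) forever vs. defect once, then punishment.
    coop_ok = u[("C", "C")] >= (1 - delta) * u[("D", "C")] + delta * punish
    # Punishment phase: follow (D, D) forever vs. cooperate once, then punishment.
    punish_ok = punish >= (1 - delta) * u[("C", "D")] + delta * punish
    return coop_ok and punish_ok


print(grim_trigger_is_spe(PD1, Fraction(1, 2)))  # True: exactly at the threshold
print(grim_trigger_is_spe(PD1, Fraction(2, 5)))  # False: players are too impatient
```

Using `Fraction` keeps the comparison exact at the boundary value \(\delta = 1/2\), where floating-point rounding could otherwise flip the weak inequality.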

Another Prisoners’ Dilemma

1 \ 2 | C     | D
C     | 4, 4  | -2, 6
D     | 6, -2 | 0, 0

(1) Cooperation phase

Eqbm: \(\pi_{1}\) = 4.
Deviation: \(\pi_{1} = 6(1-\delta) + 0 \cdot \delta = 6(1-\delta)\), thus, 4 ≥ 6(1-\(\delta\)) ⇔ 6\(\delta\) ≥ 2 ⇔ \(\delta\) ≥ 1/3

(2) Punishment phase

Eqbm: \(\pi_{1}\) = 0.
Deviation: \(\pi_{1} = -2(1-\delta)\), thus, -2(1-\(\delta\)) < 0 ⇔ \(\delta\) < 1 (Always true)

So, if \(\delta\) ≥ 1/3, cooperation (C, C) in every period can be supported as an SPE outcome by the grim-trigger strategy
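
Both examples follow the same pattern, which can be written once in general form. The notation here is an assumption introduced for illustration and does not appear in the notes: let \(R\) be the payoff from (C, C), \(T\) the payoff from defecting against a cooperator, \(P\) the payoff from (D, D), and \(S\) the payoff from cooperating against a defector, with \(T > R > P > S\).

Cooperation phase: \(R \geq (1-\delta)T + \delta P \Leftrightarrow \delta (T - P) \geq T - R \Leftrightarrow \delta \geq \frac{T-R}{T-P}\)

Punishment phase: \(P \geq (1-\delta)S + \delta P \Leftrightarrow (1-\delta)P \geq (1-\delta)S \Leftrightarrow P \geq S\), which always holds for \(\delta < 1\)

Check: the first game has \(T = 3, R = 2, P = 1\), so \(\frac{T-R}{T-P} = \frac{1}{2}\); the second has \(T = 6, R = 4, P = 0\), so \(\frac{T-R}{T-P} = \frac{2}{6} = \frac{1}{3}\).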
