Infinitely Repeated Prisoners’ Dilemma
Imagine that the prisoners’ dilemma is played infinitely many times
To introduce discounting of future payoffs, we denote by \(\delta \in [0, 1)\) the players’ common discount factor
Suppose that the player obtains a payoff of \(\nu\) every period
Then the sum of the discounted payoff stream, or simply the discounted payoff, is
\(\nu + \nu \delta + \nu \delta^{2} + \nu \delta^{3} + \cdots = \frac{\nu}{1-\delta}\)
A more convenient way to express payoffs in repeated games is
\(\frac{\nu}{1-\delta} \cdot (1 - \delta) = \nu\)
This is referred to as the average discounted payoff and denoted \(\pi\)
e.g., \(10 + 5 \delta + 10 \delta^{2} + 5 \delta^{3} + \cdots = \frac{10}{1-\delta^{2}} + \frac{5\delta}{1-\delta^{2}}\)
average discounted payoff: \((\frac{10}{1-\delta^{2}} + \frac{5\delta}{1-\delta^{2}}) \cdot (1-\delta)\) = \(\frac{10+5\delta}{1+\delta}\)
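As a numerical sanity check on the example above, the following minimal sketch truncates the infinite sum and compares it with the closed form \(\frac{10+5\delta}{1+\delta}\). The function name `average_discounted_payoff`, the truncation length `periods`, and the value \(\delta = 0.9\) are illustrative assumptions, not part of the lecture.

```python
def average_discounted_payoff(stream, delta, periods=2000):
    """Approximate (1 - delta) * sum_{t>=0} delta**t * stream(t) by truncating the sum."""
    total = sum(delta**t * stream(t) for t in range(periods))
    return (1 - delta) * total


delta = 0.9  # assumed value, chosen only for illustration


def alternating(t):
    """Payoff stream 10, 5, 10, 5, ... from the example."""
    return 10 if t % 2 == 0 else 5


approx = average_discounted_payoff(alternating, delta)
closed_form = (10 + 5 * delta) / (1 + delta)
print(approx, closed_form)  # the two values should agree up to truncation error
```

The truncation error is of order \(\delta^{\text{periods}}\), so it is negligible for any \(\delta\) not extremely close to 1.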
To formalize the idea of reputation in infinitely repeated games, we consider the following simple strategy:
Cooperate so long as no one has ever defected; otherwise defect
Hence (D, D) is used in the punishment phase and (C, C) in the cooperation phase.
This kind of strategy is called a grim-trigger strategy
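The strategy described above can be sketched as a function of the history of play. This is only an illustration: the representation of a history as a list of action pairs, and the helper names `play` and `defect_once_then_grim`, are assumptions introduced here.

```python
def grim_trigger(history):
    """Cooperate if no one has ever defected in `history`; otherwise defect."""
    return "C" if all(a == "C" and b == "C" for a, b in history) else "D"


def defect_once_then_grim(history):
    """Hypothetical deviator: defect in the first period, then follow grim trigger."""
    return "D" if not history else grim_trigger(history)


def play(strategy1, strategy2, periods):
    """Return the list of action pairs (player 1, player 2) over `periods` rounds."""
    history = []
    for _ in range(periods):
        history.append((strategy1(history), strategy2(history)))
    return history


print(play(grim_trigger, grim_trigger, 4))
# [('C', 'C'), ('C', 'C'), ('C', 'C'), ('C', 'C')]
print(play(defect_once_then_grim, grim_trigger, 4))
# [('D', 'C'), ('D', 'D'), ('D', 'D'), ('D', 'D')]
```

A single defection moves both players to (D, D) permanently, which is why the punishment is called grim.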
To see whether this strategy constitutes a SPE, we utilize the symmetric payoff structure and focus on player 1’s incentives to deviate
e.g., symmetric: \(u_{1}(s_{1}, s_{2}) = u_{2}(s_{2}, s_{1})\), i.e., \(u_{1}(s_{1}, s_{2}) = u_{2}(s_{1}^{\prime}, s_{2}^{\prime})\) whenever \(s_{1}^{\prime} = s_{2}\) and \(s_{2}^{\prime} = s_{1}\)
1 \ 2 | C | D |
C | 2, 2 | 0, 3 |
D | 3, 0 | 1, 1 |
\(u_{1}(C, D) = u_{2}(D, C) = 0\)
(1) Cooperation Phase
Eqbm: (C, C), (C, C), (C, C), … → \(\pi_{1}\) = 2
Deviation: (D, C), (D, D), (D, D), … → \(\pi_{1}\) = \((1-\delta)(3+\sum_{t=1}^{\infty}\delta^{t} \cdot 1)\)
= \((3 + \delta + \delta^{2} + \cdots) \cdot (1-\delta) = (3 + \frac{\delta}{1-\delta}) \cdot (1-\delta) = 3(1-\delta) + \delta = 3 - 2\delta\)
2 ≥ 3 - 2\(\delta\) ⇔ \(\delta\) ≥ 1/2
So, if \(\delta\) is at least 1/2, it is unprofitable to deviate
(2) Punishment Phase
Eqbm: (D, D), (D, D), (D, D), … → \(\pi_{1}\) = 1
Deviation: (C, D), (D, D), (D, D), … → \(\pi_{1}\) = \((1-\delta)(0+\sum_{t=1}^{\infty}\delta^{t} \cdot 1) = \delta\)
Since 1 ≥ \(\delta\), deviating is never profitable; hence defecting forever is a best response for player 1
Therefore, the grim-trigger strategy can be supported as a SPE if \(\delta\) ≥ \(\frac{1}{2}\)
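The two one-shot deviation checks can be reproduced numerically. In the sketch below the payoff labels `R` (both cooperate), `S` (cooperate against a defector), `T` (defect against a cooperator), and `P` (both defect) are conventional names assumed here for the entries 2, 0, 3, 1 of the table; the function names are likewise illustrative.

```python
def grim_trigger_payoffs(delta, R=2, S=0, T=3, P=1):
    """Average discounted payoffs for player 1 in both phases, with and without a one-shot deviation."""
    coop_eq = R                              # (C, C) forever
    coop_dev = (1 - delta) * T + delta * P   # (D, C) once, then (D, D) forever
    pun_eq = P                               # (D, D) forever
    pun_dev = (1 - delta) * S + delta * P    # (C, D) once, then (D, D) forever
    return coop_eq, coop_dev, pun_eq, pun_dev


def is_spe(delta, **payoffs):
    """True if no one-shot deviation is profitable in either phase."""
    coop_eq, coop_dev, pun_eq, pun_dev = grim_trigger_payoffs(delta, **payoffs)
    return coop_eq >= coop_dev and pun_eq >= pun_dev


for delta in (0.25, 0.5, 0.75):
    print(delta, grim_trigger_payoffs(delta), is_spe(delta))
```

For these payoffs the check fails at \(\delta = 0.25\) and passes at \(\delta = 0.5\) and \(\delta = 0.75\), matching the threshold \(\delta\) ≥ 1/2.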
Remark: The one-shot deviation principle states that a player has no profitable deviation in any subgame if and only if she has no profitable one-shot deviation. Therefore, to determine whether a player’s behavior is optimal, it is enough to check that she cannot gain by deviating in the current period only and then returning to the prescribed strategy.
Another Prisoners’ Dilemma
1 \ 2 | C | D |
C | 4, 4 | -2, 6 |
D | 6, -2 | 0, 0 |
(1) Cooperation phase
Eqbm: \(\pi_{1}\) = 4.
Deviation: \(\pi_{1} = (1-\delta)(6 + 0 + 0 + \cdots) = 6(1-\delta)\); thus, 4 ≥ 6(1-\(\delta\)) ⇔ 6\(\delta\) ≥ 2 ⇔ \(\delta\) ≥ 1/3
(2) Punishment phase
Eqbm: \(\pi_{1}\) = 0.
Deviation: \(\pi_{1} = (1-\delta)(-2 + 0 + 0 + \cdots) = -2(1-\delta)\); thus, -2(1-\(\delta\)) < 0 ⇔ \(\delta\) < 1 (always true)
So, if \(\delta\) ≥ 1/3, (C, C) can be supported as an equilibrium
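Both examples follow the same pattern: the cooperation-phase condition \(R \geq (1-\delta)T + \delta P\) rearranges to \(\delta \geq \frac{T-R}{T-P}\). The sketch below computes this threshold exactly; the labels `R`, `T`, `P` (payoffs from (C, C), from defecting against a cooperator, and from (D, D)) and the function name `critical_delta` are assumptions introduced for illustration.

```python
from fractions import Fraction


def critical_delta(R, T, P):
    """Smallest delta satisfying R >= (1 - delta) * T + delta * P, assuming T > R > P."""
    return Fraction(T - R, T - P)


print(critical_delta(R=2, T=3, P=1))  # 1/2
print(critical_delta(R=4, T=6, P=0))  # 1/3
```

Using `Fraction` keeps the thresholds exact rather than as floating-point approximations.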
- Reference: Chang-Koo Chi, (26/50) Game Theory and Applications 7 – Finitely repeated game, Jul 8, 2020, https://youtu.be/fN6L9RL5IPk