### Infinitely Repeated Prisoners’ Dilemma

Imagine that the prisoners’ dilemma is played infinitely many times

To introduce *discounting* of future payoffs, we denote by \(\delta \in [0, 1)\) the players’ common discount factor

Suppose that a player obtains a payoff of \(\nu\) in every period

Then the sum of the discounted payoff stream, or simply the **discounted payoff**, is

\(\nu + \nu \delta + \nu \delta^{2} + \nu \delta^{3} + \cdots = \frac{\nu}{1-\delta}\)

A more convenient way to express payoffs in repeated games is to normalize the discounted payoff by the factor \((1-\delta)\):

\(\frac{\nu}{1-\delta} \cdot (1 - \delta) = \nu\)

This is referred to as the **average discounted payoff** and denoted \(\pi\)

e.g., \(10 + 5 \delta + 10 \delta^{2} + 5 \delta^{3} + \cdots = \frac{10}{1-\delta^{2}} + \frac{5\delta}{1-\delta^{2}}\)

average discounted payoff: \((\frac{10}{1-\delta^{2}} + \frac{5\delta}{1-\delta^{2}}) \cdot (1-\delta)\) = \(\frac{10+5\delta}{1+\delta}\)
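The closed form for the alternating stream can be checked numerically. The following is a minimal sketch, not part of the lecture: the function name `average_discounted_payoff`, the truncation length `periods`, and the choice \(\delta = 0.9\) are all assumptions made for illustration. It truncates the infinite sum at a large horizon, which is harmless because \(\delta^{t}\) becomes negligible.

```python
def average_discounted_payoff(cycle, delta, periods=2000):
    """Approximate (1 - delta) * sum_t delta**t * v_t for a stream that repeats `cycle`.

    `periods` is a hypothetical truncation horizon; the tail beyond it is ignored.
    """
    total = 0.0
    for t in range(periods):
        total += (delta ** t) * cycle[t % len(cycle)]
    return (1 - delta) * total


delta = 0.9  # illustrative value, not taken from the notes
approx = average_discounted_payoff([10, 5], delta)
closed_form = (10 + 5 * delta) / (1 + delta)
print(approx, closed_form)
```

Passing a one-element cycle such as `[7]` should return approximately 7, matching the statement that a constant stream \(\nu\) has average discounted payoff \(\nu\).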

To formalize the idea of reputation in infinitely repeated games, we consider the following simple strategy:

*Cooperate so long as no one has ever defected; otherwise defect*

Hence (D, D) is used in the **punishment** phase and (C, C) in the **cooperation** phase.

This kind of strategy is called a *grim-trigger* strategy

To see whether this strategy constitutes a subgame-perfect equilibrium (SPE), we exploit the symmetric payoff structure and focus on player 1’s incentives to deviate

e.g., symmetric: \(u_{1}(s_{1}, s_{2}) = u_{2}(s_{1}^{\prime}, s_{2}^{\prime})\) whenever \(s_{1}^{\prime} = s_{2}\) and \(s_{2}^{\prime} = s_{1}\), i.e., swapping the players’ actions swaps their payoffs

| 1 \ 2 | C | D |
|---|---|---|
| C | 2, 2 | 0, 3 |
| D | 3, 0 | 1, 1 |

\(u_{1}(C, D) = u_{2}(D, C) = 0\)

##### (1) Cooperation Phase

Eqbm: (C, C), (C, C), (C, C), … → \(\pi_{1}\) = 2

Deviation: (D, C), (D, D), (D, D), … → \(\pi_{1} = (1-\delta)\left(3+\sum_{t=1}^{\infty}\delta^{t} \cdot 1\right)\)

\(= (1-\delta)(3 + \delta + \delta^{2} + \cdots) = (1-\delta)\left(3 + \frac{\delta}{1-\delta}\right) = 3(1-\delta) + \delta = 3 - 2\delta\)

\(2 \geq 3 - 2\delta \Leftrightarrow \delta \geq 1/2\)

So, if \(\delta\) is at least 1/2, deviating in the cooperation phase is unprofitable
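The comparison between conforming and deviating can also be reproduced by simulating the play path. This is a rough sketch under stated assumptions: the helper names (`PAYOFF_1`, `average_payoff_player1`), the truncation horizon, and the two sample discount factors are hypothetical and chosen only for illustration. Both players follow the grim-trigger rule, except that player 1 may be forced to defect in one chosen period.

```python
# Player 1's stage payoffs, keyed by (action of player 1, action of player 2).
PAYOFF_1 = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def average_payoff_player1(delta, deviate_at=None, periods=2000):
    """Average discounted payoff of player 1 when both play grim trigger.

    If `deviate_at` is given, player 1 plays D in that period regardless of history.
    """
    triggered = False  # becomes True once anyone has defected
    weight = 1.0       # running value of delta**t
    total = 0.0
    for t in range(periods):
        a1 = "D" if triggered else "C"
        a2 = "D" if triggered else "C"
        if t == deviate_at:
            a1 = "D"
        total += weight * PAYOFF_1[(a1, a2)]
        if (a1, a2) != ("C", "C"):
            triggered = True
        weight *= delta
    return (1 - delta) * total


for delta in (0.4, 0.6):
    conform = average_payoff_player1(delta)
    deviate = average_payoff_player1(delta, deviate_at=0)
    print(delta, conform, deviate, conform >= deviate)
```

With \(\delta = 0.4\) the deviation payoff \(3 - 2\delta = 2.2\) exceeds 2, while with \(\delta = 0.6\) it falls to 1.8, in line with the threshold \(\delta \geq 1/2\).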

##### (2) Punishment Phase

Eqbm: (D, D), (D, D), (D, D), … → \(\pi_{1} = 1\)

Deviation: (C, D), (D, D), (D, D), … → \(\pi_{1} = (1-\delta)\left(0+\sum_{t=1}^{\infty}\delta^{t} \cdot 1\right) = \delta\)

Since \(1 \geq \delta\) for every discount factor, deviating is never profitable; hence defecting forever is the best response for player 1

Therefore, the grim-trigger strategy can be supported as an SPE if \(\delta\) ≥ \(\frac{1}{2}\)

Remark: The **one-shot deviation principle** states that a player has no profitable deviation in any subgame if and only if she has no profitable one-shot deviation. Therefore, to determine whether a player’s behavior is optimal, it is enough to check that the player cannot benefit from deviating only in the current period.
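The two one-shot deviation checks can be written as a small function. This is a sketch only: the parameter names `R` (payoff from (C, C)), `T` (payoff from defecting against C), `S` (payoff from cooperating against D), and `P` (payoff from (D, D)) are labels assumed here for illustration and do not appear in the lecture. Exact fractions are used so that the boundary case \(\delta = 1/2\) is not affected by rounding.

```python
from fractions import Fraction


def grim_trigger_is_spe(R, T, S, P, delta):
    """One-shot deviation checks for grim trigger, in average discounted payoffs."""
    # Cooperation phase: conforming yields R forever;
    # deviating yields T once and then P forever.
    coop_ok = R >= (1 - delta) * T + delta * P
    # Punishment phase: conforming yields P forever;
    # deviating yields S once and then P forever.
    punish_ok = P >= (1 - delta) * S + delta * P
    return coop_ok and punish_ok


# Payoffs of the first game: (C, C) -> 2, (D, C) -> 3, (C, D) -> 0, (D, D) -> 1.
for delta in (Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)):
    print(delta, grim_trigger_is_spe(R=2, T=3, S=0, P=1, delta=delta))
```

For the first game this prints `False` at \(\delta = 1/3\) and `True` at \(\delta = 1/2\) and \(\delta = 3/4\).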

### Another Prisoners’ Dilemma

| 1 \ 2 | C | D |
|---|---|---|
| C | 4, 4 | -2, 6 |
| D | 6, -2 | 0, 0 |

(1) **Cooperation** phase

Eqbm: \(\pi_{1}\) = 4.

Deviation: \(\pi_{1} = (1-\delta)(6 + 0 \cdot \delta + 0 \cdot \delta^{2} + \cdots) = 6(1-\delta)\); thus \(4 \geq 6(1-\delta) \Leftrightarrow 6\delta \geq 2 \Leftrightarrow \delta \geq 1/3\)

(2) **Punishment** phase

Eqbm: \(\pi_{1}\) = 0.

Deviation: \(\pi_{1} = -2(1-\delta)\); thus \(-2(1-\delta) < 0 \Leftrightarrow \delta < 1\) (always true)

So, if \(\delta \geq 1/3\), (C, C) can be supported as an equilibrium
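Both examples follow one pattern, which can be sketched in general form. The symbols below are assumptions introduced for illustration, not notation from the lecture: \(R\) is the payoff from (C, C), \(T\) the payoff from defecting against C, \(S\) the payoff from cooperating against D, and \(P\) the payoff from (D, D), with \(T > R > P > S\).

Cooperation phase: \(R \geq (1-\delta)T + \delta P \Leftrightarrow \delta (T - P) \geq T - R \Leftrightarrow \delta \geq \frac{T-R}{T-P}\)

Punishment phase: \(P \geq (1-\delta)S + \delta P \Leftrightarrow (1-\delta)P \geq (1-\delta)S\), which holds because \(P > S\)

For this game, \(\frac{T-R}{T-P} = \frac{6-4}{6-0} = \frac{1}{3}\); for the game with payoffs 2, 3, 0, 1, \(\frac{T-R}{T-P} = \frac{3-2}{3-1} = \frac{1}{2}\)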

- Reference: Chang-Koo Chi, (26/50) Game Theory and Applications 7 – Finitely repeated game, Jul 8, 2020, https://youtu.be/fN6L9RL5IPk