囚徒困境和重复博弈
The Prisoners’ Dilemma
and Repeated Games
第11章
Chapter 11
Slide 2
囚徒困境
Prisoners’ Dilemma
囚徒困境是这样一个博弈:每个参与者有一个
优势策略,但是,当所有参与者使用他们这一
优势策略时,所产生的均衡对于每个人的结果,
比他们都使用劣势策略反而还要差。
The prisoners’ dilemma is a game in
which each player has a dominant
strategy, but the equilibrium that arises
when all players use their dominant
strategies provides a worse outcome for
every player than would arise if they all
used their dominated strategies instead.
Slide 3
囚徒困境
Prisoners’ Dilemma
本章考虑囚徒困境中的参与者是否以及如何获得和保
持对他们都有利的合作结果,克服为了自身利益而背
叛的个人激励。
In this chapter, we consider whether and how
the players in a prisoners’ dilemma can attain
and sustain their mutually beneficial
cooperative outcome, overcoming their
separate incentives to defect for individual
gain.
三种解 Three categories of solutions:
重复 Repetition
惩罚和奖励 Penalty and reward
领导 Leadership
Slide 4
内容提要
Outline
基本博弈(回顾)
The Basic Game (Review)
解之一:重复
Solutions I: Repetition
解之二:惩罚和奖励
Solutions II: Penalties and Rewards
解之三:领导
Solutions III: Leadership
实验证据
Experimental Evidence
真实世界中的囚徒困境
Real-world Dilemmas
Slide 5
基本博弈(回顾)
The Basic Game (Review)
Payoffs listed as (Husband, Wife):
HUSBAND \ WIFE | Deny (Cooperate) | Confess (Defect)
Deny (Cooperate) | 3 yr, 3 yr | 25 yr, 1 yr
Confess (Defect) | 1 yr, 25 yr | 10 yr, 10 yr
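The dominance argument behind this table can be checked mechanically. The following is a minimal Python sketch, assuming the jail terms shown on this slide; the names `YEARS`, `jail_term`, and `dominant_action` are illustrative and not part of the original slides.

```python
# Jail terms in years as (husband, wife); lower is better for each player.
YEARS = {
    ("Deny", "Deny"): (3, 3),
    ("Deny", "Confess"): (25, 1),
    ("Confess", "Deny"): (1, 25),
    ("Confess", "Confess"): (10, 10),
}
ACTIONS = ("Deny", "Confess")


def jail_term(player, own, other):
    """Years served by `player` (0 = husband, 1 = wife) for the given actions."""
    profile = (own, other) if player == 0 else (other, own)
    return YEARS[profile][player]


def dominant_action(player):
    """Return the player's strictly dominant action, or None if there is none."""
    for a in ACTIONS:
        if all(
            jail_term(player, a, o) < jail_term(player, b, o)
            for b in ACTIONS if b != a
            for o in ACTIONS
        ):
            return a
    return None


equilibrium = (dominant_action(0), dominant_action(1))
print("dominant-strategy equilibrium:", equilibrium, YEARS[equilibrium])
print("mutual denial:", YEARS[("Deny", "Deny")])
```

Both players end up at Confess even though mutual denial would give each a shorter sentence, which is the dilemma itself.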
Slide 6
基本博弈(回顾)
The Basic Game (Review)
在任何一个囚徒困境中,总会有一个合作策略
和一个欺骗或背叛策略。
In any prisoners’ dilemma, there is
always a cooperative strategy and a
cheating or defecting strategy.
参与者总是可以根据其策略选择,被称作背叛
者或合作者。
Players can always be labeled,
according to their choice of strategy, as
either defectors or cooperators.
Slide 7
基本博弈(回顾)
The Basic Game (Review)
当参与者之间不进行相互合作,他们就选择背叛,希
望以对手的损失为代价,获得个人的利益。
When the players do not cooperate with each
other, they choose to defect in the hope of
attaining individual gain at the rival’s expense.
囚徒困境能否以及如何解决,问题的实质在于通过非
合作(个人)的行动去实现合作(共同偏好)的结果。
The essence of the question of whether, when,
and how a prisoners’ dilemma can be resolved
is the difficulty of achieving a cooperative
(jointly preferred) outcome through
noncooperative (individual) actions.
Slide 8
解之一:重复
Solutions I: Repetition
在一个囚徒困境的重复博弈中,每个参与者担心一次
背叛会导致未来合作的崩溃。
In a repeated play of the prisoners’ dilemma,
each player fears that one instance of
defecting will lead to a collapse of cooperation
for the future.
如果未来合作的价值很大,超过了短期内通过背叛所
获得的,那么参与者的长期个人利益自动地消除了背
叛,并不需要任何额外惩罚或第三方强制。
If the value of future cooperation is large and
exceeds what can be gained in the short term
by defecting, then the long-run individual
interests of the players can automatically and
tacitly keep them from defecting, without the
need for any additional punishments or
enforcement by third parties.
Slide 9
定价中的囚徒困境
Prisoners’ dilemma of Pricing
Payoffs listed as (Xavier’s, Yvonne’s):
XAVIER’S TAPAS \ YVONNE’S BISTRO | 26 (Cooperate) | 20 (Defect)
26 (Cooperate) | 324, 324 | 216, 360
20 (Defect) | 360, 216 | 288, 288
Slide 10
定价中的囚徒困境
Prisoners’ dilemma of Pricing
假定两个餐馆开始处于合作状态,每个人收取高价格
$26。
Suppose that the two restaurants are initially
in the cooperative mode, each charging the
higher price of $26.
如果他们正常地竞争至少3个月,按照一次博弈的理论,
我们似乎就应该看到合作行为(高价格)而不是背叛
行为(低价格)。
If they competed on a regular basis for at least
3 months, it seems that we might see
cooperative behavior (high prices) rather than
the defecting behavior (low prices) predicted
by theory for the one-shot game.
但是解实际上没有那么简单。
But the solution is not actually that simple.
Slide 11
有限次重复
Finite Repetition
只要两个参与者之间的关系持续的时间长度固定和已知,在最后
阶段的博弈中,优势策略均衡(背叛)就会出现。
As long as the relationship between the two players
lasts a fixed and known length of time, the dominant-
strategy equilibrium with defecting should prevail in the
last period of play.
参与者到达博弈终点时,继续合作就毫无价值,于是他们选择背
叛。
When the players arrive at the end of the game, there is
never any value to continued cooperation, and so they
defect.
按照反转的预测,相互背叛就会一直倒回到最开始的博弈。
Then rollback predicts mutual defecting all the way back
to the very first play.
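The rollback argument can be illustrated with a short Python sketch. It assumes the monthly profits of the restaurant pricing game, ignores discounting, and relies on the stage game having a unique equilibrium; the function names are hypothetical.

```python
# Monthly profits as (Xavier's, Yvonne's); C = price 26, D = price 20.
PROFIT = {
    ("C", "C"): (324, 324),
    ("C", "D"): (216, 360),
    ("D", "C"): (360, 216),
    ("D", "D"): (288, 288),
}
ACTIONS = ("C", "D")


def stage_equilibrium(game):
    """Return the unique pure-strategy Nash equilibrium of a 2x2 game."""
    equilibria = [
        (x, y)
        for x in ACTIONS
        for y in ACTIONS
        if all(game[(x, y)][0] >= game[(ax, y)][0] for ax in ACTIONS)
        and all(game[(x, y)][1] >= game[(x, ay)][1] for ay in ACTIONS)
    ]
    if len(equilibria) != 1:
        raise ValueError("this sketch assumes a unique stage equilibrium")
    return equilibria[0]


def rollback(months):
    """Solve the finitely repeated game backward from the last month."""
    continuation = (0, 0)  # nothing is earned after the final month
    plan = []
    for _ in range(months):
        # Future profits do not depend on today's actions, so they enter
        # as a constant added to every cell of the stage game.
        augmented = {
            profile: (pay[0] + continuation[0], pay[1] + continuation[1])
            for profile, pay in PROFIT.items()
        }
        eq = stage_equilibrium(augmented)
        plan.append(eq)
        continuation = augmented[eq]
    plan.reverse()
    return plan, continuation


print(rollback(3))
```

Because the continuation value is the same in every cell, each month reduces to the one-shot game, so the sketch returns mutual defection in every month.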
Slide 12
无限次重复
Infinite Repetition
无论在怎样的重复博弈中,相互关系的序贯性
质意味着参与者可以采取的策略依赖于前面回
合的博弈中的行为。
In repeated games of any kind, the
sequential nature of the relationship
means that players can adopt strategies
that depend on behavior in preceding
plays of the game.
这样的策略被称为或然策略。
Such strategies are known as
contingent strategies.
Slide 13
无限次重复
Infinite Repetition
大多数或然策略都是触发策略:只要对手合作,该参
与者也合作;但对方任何背叛就会“触发”规定时间长
度的惩罚期,其间以非合作来回击。
Most contingent strategies are trigger
strategies, where a player plays cooperatively
as long as her rival(s) do so, but any defection
on their part “triggers” a period of punishment,
of specified length, in which she plays
noncooperatively in response.
最有名的两个触发策略是严厉策略和以牙还牙。
Two of the best-known trigger strategies are
the grim strategy and tit-for-tat.
Slide 14
无限次重复
Infinite Repetition
严厉策略要求与你的对手合作,直到你们当中任何一
人背叛;一旦背叛发生,所有的参与者在此后博弈的
每一回合都选择背叛策略。
The grim strategy entails cooperating with
your rival until such time as any of the players
defects from cooperation; once a defection has
occurred, all the players choose the Defect
strategy on every play for the rest of the game.
以牙还牙的策略要求在任何一个回合中,都选择你的
对手在上一回合中的行动。
Tit-for-tat (TFT) is a strategy of choosing, in
any specified period of play, the action chosen
by your rival in the preceding period of play.
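The two strategies can be written as small functions of the history of play. This is a minimal Python sketch using the profits of the restaurant pricing game; it assumes that tit-for-tat opens with Cooperate (the definition above leaves the first move unspecified), and `defect_once` is a hypothetical deviator added only for illustration.

```python
# Monthly profits as (Xavier's, Yvonne's); C = price 26, D = price 20.
PROFIT = {
    ("C", "C"): (324, 324),
    ("C", "D"): (216, 360),
    ("D", "C"): (360, 216),
    ("D", "D"): (288, 288),
}


def grim(own_history, rival_history):
    """Cooperate until anyone has defected, then defect forever."""
    return "D" if "D" in own_history or "D" in rival_history else "C"


def tit_for_tat(own_history, rival_history):
    """Cooperate first (an assumption), then copy the rival's last action."""
    return rival_history[-1] if rival_history else "C"


def defect_once(month):
    """Hypothetical deviator: plays grim but defects in the given month."""
    def strategy(own_history, rival_history):
        if len(own_history) == month:
            return "D"
        return grim(own_history, rival_history)
    return strategy


def play(strategy_x, strategy_y, months):
    """Return the list of (Xavier's action, Yvonne's action) per month."""
    hx, hy = [], []
    for _ in range(months):
        ax = strategy_x(hx, hy)
        ay = strategy_y(hy, hx)
        hx.append(ax)
        hy.append(ay)
    return list(zip(hx, hy))


def totals(path):
    """Undiscounted total profit of each restaurant along a path of play."""
    return (
        sum(PROFIT[p][0] for p in path),
        sum(PROFIT[p][1] for p in path),
    )


path = play(defect_once(1), grim, 4)
print(path, totals(path))
```

One defection against a grim opponent produces a single profitable month followed by mutual defection for the rest of the game.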
Slide 15
使用严厉策略来确保合作
Use Grim Strategy to Guarantee
Cooperation
假定双方参与者在这一重复的餐馆定价博弈中都使用
严厉策略。
Suppose both players use the grim strategy in
the repeated restaurant pricing game.
如果参与者都不偏离这一策略,我们可以预期一个合
作结果,利润各为324。
Without any deviation from such a strategy
from any player, we would expect a
cooperative outcome, with a profit of 324 for
each.
对于某一参与者来说,给定对手盯住这一策略,选择
偏离是否值得?
Is it worthwhile to deviate from such a
strategy for some player, given the other sticks
to it?
Slide 16
使用严厉策略来确保合作
Use Grim Strategy to Guarantee
Cooperation
假定开始时双方都采取合作行动。
Suppose at the beginning, both are playing
Cooperate.
如果X偏离仅一次(在一个月里出背叛),他会多得
36的利润(总利润360而不是324)。
If Xavier’s Tapas deviates only once (playing
Defect in one month), it adds 36 to its
profit that month (360 instead of 324).
在背叛后的第一个月,遵照严厉策略,双方都会被锁
定在背叛定价上,获得每月288的利润,损失36。
From the first month after Xavier’s defection
onward, under the grim strategy both restaurants
are locked into the defecting price and earn
288 each month, a loss of 36 per month relative
to cooperation.
Slide 17
使用严厉策略来确保合作
Use Grim Strategy to Guarantee
Cooperation
X必须比较36的得益和从第二时期开始并持续下去的
36的损失的现值。
Xavier’s must compare the one-time gain of 36 with
the present value (PV) of the loss of 36 per month
from the second period on.
使用符号r表示月的总回报率,PV计算为:
Using the symbol r to denote the (monthly)
total rate of return yields a solution for PV:
PV = 36/(1+r) + 36/(1+r)^2 + … = 36/r
偏离一次(然后永远背叛)是不值得的,当且仅当:
To deviate once (and then be locked into mutual
defection forever) is NOT worthwhile if and only if
36 < 36/r, or r < 1.
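The present-value comparison can be checked numerically. This is a small Python sketch assuming the gain and loss of 36 from the pricing game; the function names are illustrative.

```python
def pv_of_perpetual_loss(loss, r, months=None):
    """Present value of losing `loss` every month, starting next month.

    With months=None the closed form loss / r is used; otherwise only
    the first `months` terms of the series are summed.
    """
    if months is None:
        return loss / r
    return sum(loss / (1 + r) ** t for t in range(1, months + 1))


def defection_pays(gain, loss, r):
    """True if the one-time gain exceeds the PV of all future losses."""
    return gain > pv_of_perpetual_loss(loss, r)


# The truncated sum approaches the closed form 36 / r.
print(pv_of_perpetual_loss(36, 0.1, months=2000), pv_of_perpetual_loss(36, 0.1))

for r in (0.5, 2.0):
    print(r, defection_pays(36, 36, r))
```

At a monthly rate of 0.5 the future losses outweigh the gain, while at 2.0 they do not, which matches the threshold r < 1.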
Slide 18
使用严厉策略来确保合作
Use Grim Strategy to Guarantee
Cooperation
假定开始时,双方都采取背叛行动。
Suppose at the beginning, both are playing
Defect.
X偏离一次(出招为合作)是否值得?
Is it worthwhile for Xavier’s to deviate (by
playing Cooperate) once?
否 No!
如果此前X出了背叛,对手出了合作,或者相反,X是
否值得偏离一次(出招为合作)?
What if, in the previous round, one played Defect
and the other played Cooperate, or vice versa? Is it
worthwhile for Xavier’s to deviate once (by playing Cooperate)?
否 No!
Slide 19
使用严厉策略来确保合作
Use Grim Strategy to Guarantee
Cooperation
所有可能的子博弈(阶段博弈)必然从四类节
点中的一个开始:参与者在上一回合(分别)
出(C, C), (C, D), (D, C) 或(D, D)。
All the possible subgames (stage games)
must begin from one of four kinds of
nodes, according to whether the two players
played (C, C), (C, D), (D, C), or (D, D) in the
previous stage.
这样我们已经证明,在任何一个阶段博弈中,
单独一次的偏离不能使偏离者受益。
So we have proved that in any stage
game, a single deviation cannot make
the deviating player better off if r < 1.
Slide 20
使用严厉策略来确保合作
Use Grim Strategy to Guarantee
Cooperation
那么,偏离不只一次呢?
What about deviating more than once?
一阶段偏离原理(对无限范围博弈)表明,两个参与者的策略组
合构成子博弈完美均衡,当且仅当对任何一个参与者,不存在任
何的单独一个阶段(无论是否在均衡路径上)的策略偏离,可以
使得她更好,给定该阶段已经到达。
The one-stage deviation principle (for infinite-horizon
games) states that two players’ strategy combination is
a subgame-perfect equilibrium (SPE) if and only if no
player can make herself better off by deviating at any
single stage (on or off the equilibrium path), given
that the stage has been reached.
这自然意味着,任何超过一次的偏离不可能使该参与者更好,如果
没有任何一次性偏离能够做到。
This naturally means that no deviation at more than one stage
can make a player better off if no single-stage deviation can.
Slide 21
使用严厉策略来确保合作
Use Grim Strategy to Guarantee
Cooperation
在我们的重复餐馆定价博弈中,严厉策略组合构
成了子博弈完美均衡,如果r<1。
In our repeated restaurant pricing game,
the grim strategy combination is an SPE if
r < 1.
或者,贴现因子 or the discount factor
δ≡1/(1+r)>1/2.
均衡的结果是合作的。
The equilibrium outcome is cooperative.
这样,使用严厉策略解决了两个餐馆之间的囚徒
困境。
Thus use of this grim strategy solves the
prisoners’ dilemma for the two restaurants.
Slide 22
TFT策略?
How about TFT Strategy?
双方参与者出以牙还牙能够构成子博弈完美均
衡吗?
Can both players playing TFT be an
SPE?
不能!NO! The textbook is wrong!
提示:使用一阶段偏离原理,讨论与严厉策略
的情形相类似的四种情况。
Hint: Use the one-stage deviation
principle and check the same four cases
as for the grim strategy.
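The hint can be explored numerically. This is a rough Python sketch, not a proof: it assumes Xavier’s profits from the pricing game, a discount factor δ = 1/(1+r), and a long but finite horizon as a stand-in for infinite play; all function names are hypothetical. A state is the pair of actions (Xavier’s, Yvonne’s) played in the previous month.

```python
from itertools import product

# Xavier's monthly profit for (Xavier's action, Yvonne's action).
PROFIT_X = {("C", "C"): 324, ("C", "D"): 216, ("D", "C"): 360, ("D", "D"): 288}


def value_x(first_x, first_y, delta, horizon=2000):
    """Xavier's discounted profit when this month's actions are
    (first_x, first_y) and both follow tit-for-tat in every later month."""
    x, y = first_x, first_y
    total = 0.0
    for t in range(horizon):
        total += delta ** t * PROFIT_X[(x, y)]
        x, y = y, x  # under TFT each player copies the other's last action
    return total


def deviation_pays(prev_x, prev_y, delta):
    """True if a one-month deviation from TFT raises Xavier's payoff
    after last month's actions (prev_x, prev_y)."""
    tft_x, tft_y = prev_y, prev_x  # what TFT prescribes this month
    other = "D" if tft_x == "C" else "C"
    return value_x(other, tft_y, delta) > value_x(tft_x, tft_y, delta) + 1e-6


def bad_states(delta):
    """States after which Xavier gains from a one-stage deviation."""
    return [s for s in product("CD", repeat=2) if deviation_pays(*s, delta)]


for delta in (0.2, 0.9):
    print(delta, bad_states(delta))
```

For both a patient and an impatient player the sketch finds at least one state with a profitable one-stage deviation. With these profits the state (D, C) is a problem for every δ < 1, since punishing forever (288 per month) beats alternating between 216 and 360.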
Slide 23
博弈长度未知
Games of Unknown Length
下一期的某一金额的现值为δ=1/(1+r)乘以该金额。
An amount received next month has a present
value of only δ=1/(1+r) times that amount.
如果除此之外,仅在概率p(小于1)下,博弈关系才
会持续到下一期,那么下一期的该金额只值pδ乘以这
一金额。
If in addition there is only a probability p (less
than 1) that the relationship will actually
continue to the next month, then next month’s
amount is worth only pδ times the amount.
有效回报率R,满足1/(1+R)= pδ,则:
The effective rate of return R, defined by
1/(1+R) = pδ, is
R = (1 - pδ)/(pδ).
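As a quick numerical illustration, the effective rate can be computed from r and p. This is a minimal Python sketch; `effective_rate` is an illustrative name and the sample values of r and p are assumptions.

```python
def effective_rate(r, p):
    """Effective rate of return R defined by 1 / (1 + R) = p * delta,
    where delta = 1 / (1 + r) and p is the continuation probability."""
    delta = 1 / (1 + r)
    return (1 - p * delta) / (p * delta)


# With certain continuation (p = 1) the effective rate is just r.
print(effective_rate(0.1, 1.0))

# A 50% chance that the relationship ends raises the effective rate sharply.
print(effective_rate(0.1, 0.5))
```

Equivalently R = (1 + r)/p - 1, so a lower continuation probability acts like a higher rate of return: a monthly rate of 0.1 with p = 0.5 behaves like a rate of 1.2.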
Slide 24
一般理论
General Theory
Payoffs listed as (Row, Column):
ROW \ COLUMN | Cooperate | Defect
Cooperate | C, C | L, H
Defect | H, L | D, D
(H>C>D>L)
Slide 25
一般理论
General Theory
合作结果可以为严厉策略所支持,当且仅当:
A cooperative outcome can be
sustained by the grim strategy
combination if and only if,
R<(C-D)/(H-C)
合作破裂的可能性越大,如果:
The collapse of cooperation is more
likely if:
R越大(或者pδ越小)
the larger R (or the smaller pδ ) is
(C-D)越小 the smaller (C-D) is
(H-C)越大 the larger (H-C) is
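The condition and its comparative statics can be tried out directly. This is a minimal Python sketch, using the pricing-game profits (H = 360, C = 324, D = 288, L = 216) as sample values; the function names are illustrative.

```python
def critical_rate(H, C, D, L):
    """Largest effective rate below which grim strategies sustain cooperation."""
    if not H > C > D > L:
        raise ValueError("a prisoners' dilemma requires H > C > D > L")
    return (C - D) / (H - C)


def grim_sustains_cooperation(H, C, D, L, R):
    """True if R < (C - D) / (H - C)."""
    return R < critical_rate(H, C, D, L)


# Restaurant pricing game: the threshold is 36 / 36 = 1.
print(critical_rate(360, 324, 288, 216))
print(grim_sustains_cooperation(360, 324, 288, 216, R=0.5))
print(grim_sustains_cooperation(360, 324, 288, 216, R=1.2))

# A larger temptation payoff H lowers the threshold.
print(critical_rate(400, 324, 288, 216))
```

Raising H from 360 to 400 while holding the other payoffs fixed lowers the threshold from 1 to about 0.47, so cooperation then survives only at lower effective rates.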
Slide 26
一般理论
General Theory
也就是说,合作破裂的可能性越大,当……
That is, the collapse of cooperation is
more likely when ……
参与者缺乏耐心,或者预期博弈会很快结束
players are impatient, or the game is
expected to end quickly
惩罚不够严厉
punishment is not very severe
背叛为参与者在很短时间内积攒了大量收益。
defecting garners a player large and
immediate benefits.
Slide 27
解之二:惩罚和奖励
Solutions II: Penalties and Rewards
Payoffs listed as (Husband, Wife):
HUSBAND \ WIFE | Deny (Cooperate) | Confess (Defect)
Deny (Cooperate) | 3 yr, 3 yr | 25 yr, 21 yr
Confess (Defect) | 21 yr, 25 yr | 10 yr, 10 yr
The game has changed from being a prisoners’
dilemma to an assurance game.
Slide 28
解之二:惩罚和奖励
Solutions II: Penalties and Rewards
Payoffs listed as (Husband, Wife):
HUSBAND \ WIFE | Deny (Cooperate) | Confess (Defect)
Deny (Cooperate) | 3 yr, 3 yr | 25 yr, 21 yr
Confess (Defect) | 21 yr, 25 yr | 30 yr, 30 yr
Each player has a dominant strategy and (Deny,
Deny) becomes the unique Nash equilibrium.
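The effect of the two penalty schemes can be verified by listing the pure-strategy Nash equilibria of each table. This is a minimal Python sketch, assuming the jail terms from the two Penalties and Rewards slides; the names `pure_nash` and `penalized_game` are hypothetical.

```python
from itertools import product

ACTIONS = ("Confess", "Deny")


def pure_nash(years):
    """Pure-strategy Nash equilibria when payoffs are jail years
    (husband, wife) and fewer years are better."""
    equilibria = []
    for h, w in product(ACTIONS, repeat=2):
        husband_ok = all(years[(h, w)][0] <= years[(alt, w)][0] for alt in ACTIONS)
        wife_ok = all(years[(h, w)][1] <= years[(h, alt)][1] for alt in ACTIONS)
        if husband_ok and wife_ok:
            equilibria.append((h, w))
    return equilibria


def penalized_game(lone_confessor_years, mutual_confess_years):
    """Husband-wife game with adjustable terms for confessing."""
    return {
        ("Deny", "Deny"): (3, 3),
        ("Deny", "Confess"): (25, lone_confessor_years),
        ("Confess", "Deny"): (lone_confessor_years, 25),
        ("Confess", "Confess"): (mutual_confess_years,) * 2,
    }


original = penalized_game(1, 10)      # basic prisoners' dilemma
assurance = penalized_game(21, 10)    # penalty on a lone confessor only
full_penalty = penalized_game(21, 30) # penalty on every confession

for name, game in [("original", original), ("assurance", assurance),
                   ("full penalty", full_penalty)]:
    print(name, pure_nash(game))
```

Penalizing only the lone confessor leaves two equilibria (an assurance game), while penalizing every confession leaves mutual denial as the only one.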
Slide 29
解之三:领导
Solutions III: Leadership
在囚徒困境的许多例子中,博弈都假定为对称。
In most examples of the prisoners’ dilemma,
the game is assumed to be symmetric.
然而,在实际的策略情况下,一个参与者可能相对较
“大”(领导者),另一个相对较“小”。
However, in actual strategic situations, one
player may be relatively “large” (a leader) and
the other “small”.
如果收益的规模相当不对等,则来自背叛的损害会如
此多地落在较大参与者身上,以致她明知对手会背叛,
依然选择合作行动。
If the size of the payoffs is unequal enough, so
much of the harm from defecting may fall on
the larger player that she acts cooperatively,
even while knowing that the other will defect.
Slide 30
解之三:领导
Solutions III: Leadership
Equal-Population SANE Research Game
Payoffs listed as (Dominica, Soporia):
DOMINICA \ SOPORIA | No Research | Research
No Research | -1.6, -1.6 | 0, -2
Research | -2, 0 | -1, -1
This game is a prisoners’ dilemma where each player
has a dominant strategy to do no research.
Slide 31
解之三:领导
Solutions III: Leadership
Unequal-Population SANE Research Game
Payoffs listed as (Dominica, Soporia):
DOMINICA \ SOPORIA | No Research | Research
No Research | -2.4, -0.8 | 0, -2
Research | -2, 0 | -1, -1
No Research is still the dominant strategy for Soporia,
but Dominica’s best response to No Research is now Research.
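The change in best responses between the two SANE tables can be checked with a few lines of Python. This is a minimal sketch assuming the payoffs on the two slides (higher is better); the function names are illustrative.

```python
ACTIONS = ("Research", "No Research")

# Payoffs as (Dominica, Soporia), keyed by (Dominica's action, Soporia's action).
EQUAL = {
    ("Research", "Research"): (-1, -1),
    ("Research", "No Research"): (-2, 0),
    ("No Research", "Research"): (0, -2),
    ("No Research", "No Research"): (-1.6, -1.6),
}
UNEQUAL = dict(EQUAL)
UNEQUAL[("No Research", "No Research")] = (-2.4, -0.8)


def best_response_dominica(game, soporia_action):
    """Dominica's payoff-maximizing action against a fixed Soporia action."""
    return max(ACTIONS, key=lambda a: game[(a, soporia_action)][0])


def best_response_soporia(game, dominica_action):
    """Soporia's payoff-maximizing action against a fixed Dominica action."""
    return max(ACTIONS, key=lambda a: game[(dominica_action, a)][1])


for name, game in [("equal", EQUAL), ("unequal", UNEQUAL)]:
    print(name,
          {s: best_response_dominica(game, s) for s in ACTIONS},
          {d: best_response_soporia(game, d) for d in ACTIONS})
```

In the unequal game Soporia never does research, and Dominica, who bears most of the loss when no one does, responds by doing the research herself.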
Slide 32
解之三:领导
Solutions III: Leadership
从某种意义上说,囚徒困境通过大小不对称得
到了解决。
The prisoners’ dilemma has, in a sense,
been “solved” by the size asymmetry.
较大的国家选择承担领导者的角色,为整个世
界提供利益。
The larger country chooses to take on a
leadership role and provide the benefit
for the whole world.
Slide 33
解之三:领导
Solutions III: Leadership
“弱者利用强者”
“The exploitation of the great by
the small”
沙特在欧佩克中充当平衡器
Saudi Arabia as the “swing producer”
in OPEC
北约中的美国 US in NATO
中超里的七大俱乐部 The seven big clubs in the Chinese Super League
Slide 34
实验证据
Experimental Evidence
实验表明,在长度已知和有限的重复博弈中,依然可
以看到合作。
Experiments show that cooperation occurs
even in repeated versions of known and finite
length.
只在有限博弈的最后几个回合,背叛才会发生。
Only in the last few plays of a finite game does
defecting seem to creep in.
结果还显示,合作的基于反转的瓦解,随着时间推移,
可以被参与者从博弈的经历中学习到。
Results also suggest that the unraveling of
cooperation, based on the use of rollback, is
being learned from experience of the play
itself over time.
Slide 35
实验证据
Experimental Evidence
如果参与者发现自己处于合作状态,而且意识
到博弈关系即将结束,合作的破裂必定会涉及
到不确定性,如双方都出混合策略。
If players find themselves in a
cooperative mode with the known end
of the relationship approaching, the
unwinding of cooperation must include
some uncertainty, such as mixed
strategies, for both players.
Slide 36
实验证据
Experimental Evidence
计算机模拟实验表明,“善意”的程序比“恶意”的程序
表现更好。但不包括那些总是善意和合作的。
Computer simulation experiments show that
“nice” programs did better than “nasty”
programs. The exception was programs that
are always nice and cooperative.
获胜策略是一个最简单的程序:以牙还牙。原因可能
是,它是立即原谅的、善意的、具有警示性的和清晰
的。
The winning strategy turned out to be the
simplest program: tit-for-tat. The reason
might be that it is at once forgiving, nice,
provocable, and clear.
Slide 37
真实世界中的囚徒困境
Real-World Dilemmas
政府竞争以吸引产业
Governments competing to attract
Business
劳动仲裁
Labor arbitration
演化生物学
Evolutionary biology
价格匹配
Price matching
Slide 38
政府竞争以吸引产业
Governments Competing to Attract
Business
Payoffs listed as (Su Zhou, Chang Zhou):
Su Zhou (苏州) \ Chang Zhou (常州) | None | Incentives
None | 3, 3 | 1, 4
Incentives | 4, 1 | 2, 2
Slide 39
劳动仲裁
Labor Arbitration
Predicted Percentage of Employer “Wins” in Arbitration Cases
EMPLOYER \ UNION | No Lawyer | Lawyer
No Lawyer | 44% | 23%
Lawyer | 73% | 46%
Slide 40
演化生物学:营巢鸟的困境
Evolutionary Biology: Bowerbird’s
Dilemma
Payoffs listed as (Bird 1, Bird 2):
BIRD 1 \ BIRD 2 | Guard | Maraud
Guard | GG, GG | GM, MG
Maraud | MG, GM | MM, MM
(MG>GG>MM>GM)
Slide 41
价格匹配
Price Matching
Payoffs listed as (Toys “R” Us, Kmart):
TOYS “R” US \ KMART | High | Low
High | 3,000, 3,000 | 0, 4,000
Low | 4,000, 0 | 2,000, 2,000
Slide 42
价格匹配
Price Matching
Payoffs listed as (Toys “R” Us, Kmart):
TOYS “R” US \ KMART | High | Low | Match
High | 3,000, 3,000 | 0, 4,000 | 3,000, 3,000
Low | 4,000, 0 | 2,000, 2,000 | 2,000, 2,000
Match | 3,000, 3,000 | 2,000, 2,000 | 3,000, 3,000
Slide 43
Summary
在囚徒困境中,每个参与者都有一个优势策略
(背叛),但是其均衡结果对于所有参与者来
讲,都比每个人使用劣势策略(合作)时更坏。
In the prisoners’ dilemma, each player
has a dominant strategy (to defect), but
the equilibrium outcome is worse for all
players than when each uses her
dominated strategy (to cooperate).
Slide 44
总结
Summary
困境的一种解决方法是重复博弈。
One of the solutions to the dilemma is
repetition of play.
在有限次重复博弈中,未来合作的现值最终为零。反
转导致一个没有合作行为的均衡。
In a finitely played game, the present value of
future cooperation is eventually zero and
rollback yields an equilibrium with no
cooperative behavior.
在无限次博弈(或期限不确定)时,通过使用合适的
或然策略,如严厉策略,可以达成合作。
With infinite play (or an uncertain end date),
cooperation can be achieved with the use of an
appropriate contingent strategy such as the grim
strategy.
Slide 45
总结
Summary
在这一情形下,合作是可能的,仅当合作的现
值超过背叛的现值。
In this case, cooperation is possible only
if the present value of cooperation
exceeds the present value of defecting.
更为一般地,“没有明天”或者短期关系的前景
都会导致参与者减少他们之间的合作。
More generally, the prospects of “no
tomorrow” or of short-term
relationships lead to decreased
cooperation among players.
Slide 46
总结
Summary
困境也可以通过惩罚方法来“解决”。在对手合作或也
背叛时,惩罚改变了背叛合作者的收益。
The dilemma can also be “solved” with penalty
schemes that alter the payoffs for players who
defect from cooperation when their rivals are
cooperating or when others are also defecting.
如果就其自身而言,强大参与者来自背叛的损失,大
于他选择合作行为的可能收益,就会有第三种解决方
法(领导)。
A third solution method arises if a large or
strong player’s loss from defecting is larger
than the available gain from cooperative
behavior on that player’s part.
Slide 47
总结
Summary
实验证据表明,参与者比理论预计的更为持久地合作。
Experimental evidence suggests that players often
cooperate longer than theory might predict.
这样的行为可以根据参与者对博弈的不完全知识,或其对合作收
益的看法来解释。
Such behavior can be explained by incomplete
knowledge of the game on the part of the players or by
their views regarding the benefits of cooperation.
可以观察到,以牙还牙是一个简单、善意、具有警示性和原谅的
策略,在重复囚徒困境中总体表现相当好。
Tit-for-tat has been observed to be a simple, nice,
provocable, and forgiving strategy that performs very
well on the average in repeated prisoners’ dilemmas.