Economic Game Theory 11

2010-07-28, 24 pages, PDF, 223 KB

The Prisoners’ Dilemma and Repeated Games
Chapter 11

Slide 2 Prisoners’ Dilemma
The prisoners’ dilemma is a game in which each player has a dominant strategy, but the equilibrium that arises when all players use their dominant strategies provides a worse outcome for every player than would arise if they all used their dominated strategies instead.

Slide 3 Prisoners’ Dilemma
In this chapter, we consider whether and how the players in a prisoners’ dilemma can attain and sustain their mutually beneficial cooperative outcome, overcoming their separate incentives to defect for individual gain.
Three categories of solutions:
- Repetition
- Penalty and reward
- Leadership

Slide 4 Outline
- The Basic Game (Review)
- Solutions I: Repetition
- Solutions II: Penalties and Rewards
- Solutions III: Leadership
- Experimental Evidence
- Real-World Dilemmas

Slide 5 The Basic Game (Review)
                                         WIFE
                             Deny (Cooperate)   Confess (Defect)
HUSBAND   Deny (Cooperate)   3 yr, 3 yr         25 yr, 1 yr
          Confess (Defect)   1 yr, 25 yr        10 yr, 10 yr

Slide 6 The Basic Game (Review)
In any prisoners’ dilemma, there is always a cooperative strategy and a cheating or defecting strategy.
Players can always be labeled, according to their choice of strategy, as either defectors or cooperators.

Slide 7 The Basic Game (Review)
When the players do not cooperate with each other, they choose to defect in the hope of attaining individual gain at the rival’s expense.
The essence of the question of whether, when, and how a prisoners’ dilemma can be resolved is the difficulty of achieving a cooperative (jointly preferred) outcome through noncooperative (individual) actions.

Slide 8 Solutions I: Repetition
In repeated play of the prisoners’ dilemma, each player fears that one instance of defecting will lead to a collapse of cooperation in the future.
If the value of future cooperation is large and exceeds what can be gained in the short term by defecting, then the long-run individual interests of the players can automatically and tacitly keep them from defecting, without the need for any additional punishments or enforcement by third parties.

Slide 9 Prisoners’ Dilemma of Pricing
                                     YVONNE’S BISTRO
                                 26 (Cooperate)   20 (Defect)
XAVIER’S TAPAS   26 (Cooperate)  324, 324         216, 360
                 20 (Defect)     360, 216         288, 288

Slide 10 Prisoners’ Dilemma of Pricing
Suppose that the two restaurants are initially in the cooperative mode, each charging the higher price of $26.
If they competed on a regular basis for at least 3 months, it seems that we might see cooperative behavior (high prices) rather than the defecting behavior (low prices) predicted by theory for the one-shot game.
But the solution is not actually that simple.

Slide 11 Finite Repetition
As long as the relationship between the two players lasts a fixed and known length of time, the dominant-strategy equilibrium with defecting should prevail in the last period of play.
When the players arrive at the end of the game, there is no longer any value to continued cooperation, and so they defect.
Rollback then predicts mutual defecting all the way back to the very first play.

Slide 12 Infinite Repetition
In repeated games of any kind, the sequential nature of the relationship means that players can adopt strategies that depend on behavior in preceding plays of the game.
Such strategies are known as contingent strategies.

Slide 13 Infinite Repetition
Most contingent strategies are trigger strategies, where a player plays cooperatively as long as her rival(s) do so, but any defection on their part “triggers” a period of punishment, of specified length, in which she plays noncooperatively in response.
Two of the best-known trigger strategies are the grim strategy and tit-for-tat.

Slide 14 Infinite Repetition
The grim strategy entails cooperating with your rival until such time as any of the players defects from cooperation; once a defection has occurred, all the players choose the Defect strategy on every play for the rest of the game.
Tit-for-tat (TFT) is a strategy of choosing, in any specified period of play, the action chosen by your rival in the preceding period of play.

Slide 15 Use Grim Strategy to Guarantee Cooperation
Suppose both players use the grim strategy in the repeated restaurant pricing game.
If neither player deviates from this strategy, we would expect a cooperative outcome, with a profit of 324 for each.
Is it worthwhile for one player to deviate from this strategy, given that the other sticks to it?
Slide 16 Use Grim Strategy to Guarantee Cooperation
Suppose at the beginning, both are playing Cooperate.
If Xavier’s Tapas deviates only once (playing Defect in one month), it adds 36 to its profit for that month (360 instead of 324).
From the first month after Xavier’s defection onward, by following the grim strategy, both restaurants are locked in at the defecting price and earn 288 each month, so each suffers a loss of 36 per month relative to cooperation.

Slide 17 Use Grim Strategy to Guarantee Cooperation
Xavier’s must compare the gain of 36 with the present value (PV) of the loss of 36 in every period from the second period on.
Using the symbol r to denote the (monthly) total rate of return, the PV is:
$PV = \frac{36}{1+r} + \frac{36}{(1+r)^2} + \cdots = \frac{36}{r}$
To deviate once (and then fall into defecting forever) is NOT worthwhile if and only if $36 < 36/r$, that is, $r < 1$.

Slide 18 Use Grim Strategy to Guarantee Cooperation
Suppose at the beginning, both are playing Defect.
Is it worthwhile for Xavier’s to deviate (by playing Cooperate) once? No!
What if, in the previous stage, one was playing Defect and the other Cooperate, or vice versa? Is a one-time deviation to Cooperate worthwhile then? No!

Slide 19 Use Grim Strategy to Guarantee Cooperation
All the possible subgames (stage games) must begin from one of four kinds of nodes, resulting from the two players having played (C, C), (C, D), (D, C), or (D, D) in the previous stage.
So we have shown that in any stage game, a single deviation cannot make the deviating player better off if r < 1.

Slide 20 Use Grim Strategy to Guarantee Cooperation
What about deviating more than once?
The one-stage deviation principle (for infinite-horizon games) states that the two players’ strategy combination is a subgame-perfect equilibrium (SPE) if and only if no player can become better off by deviating at any single stage (on or off the equilibrium path), given that the stage has been reached.
- This implies that deviating at more than one stage cannot make a player better off if no single-stage deviation can.

Slide 21 Use Grim Strategy to Guarantee Cooperation
In our repeated restaurant pricing game, the grim strategy combination is an SPE if r < 1.
- Equivalently, if the discount factor $\delta \equiv 1/(1+r) > 1/2$.
The equilibrium outcome is cooperative.
Thus use of the grim strategy solves the prisoners’ dilemma for the two restaurants.

Slide 22 How about TFT Strategy?
Can both players playing TFT be an SPE?
No! The textbook is wrong!
Hint: Use the one-stage deviation principle on four cases similar to those for the grim strategy.

Slide 23 Games of Unknown Length
The present value of an amount received next month is only $\delta = 1/(1+r)$ times the amount.
If in addition there is only a probability p (less than 1) that the relationship will actually continue to the next month, then next month’s amount is worth only $p\delta$ times the amount.
The effective rate of return R, defined by $1/(1+R) = p\delta$, is $R = (1 - p\delta)/(p\delta)$.
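The comparison from Slides 16 and 17 and the effective rate of return from Slide 23 can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names and the sample values r = 0.1 and p = 0.5 are illustrative assumptions.

```python
def pv_of_perpetual_loss(loss, rate):
    """Present value of losing `loss` in every period from the next one on:
    loss/(1+rate) + loss/(1+rate)**2 + ... = loss/rate."""
    return loss / rate


def effective_rate(p, r):
    """Effective rate of return R defined by 1/(1+R) = p*delta, with delta = 1/(1+r)."""
    delta = 1.0 / (1.0 + r)
    return (1.0 - p * delta) / (p * delta)


def defection_pays(gain, loss, rate):
    """True if the one-time gain exceeds the PV of the perpetual loss that follows."""
    return gain > pv_of_perpetual_loss(loss, rate)


# Restaurant pricing game: one-time gain 360 - 324 = 36, later loss 324 - 288 = 36 per month.
# The monthly rate r = 0.1 and continuation probability p = 0.5 are assumed values.
r = 0.1
partial_sum = sum(36 / (1 + r) ** t for t in range(1, 2001))  # approximates 36/r
R = effective_rate(0.5, r)

print(pv_of_perpetual_loss(36, r), partial_sum)
print(defection_pays(36, 36, r))   # r < 1, so deviating does not pay
print(R, defection_pays(36, 36, R))  # R > 1, so deviating pays
```

With these assumed numbers, a certain future at r = 0.1 sustains cooperation, while a 50% chance that the relationship ends each month pushes the effective rate above 1 and makes a one-time defection attractive.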
Slide 24 General Theory
                          COLUMN
                   Cooperate   Defect
ROW   Cooperate    C, C        L, H
      Defect       H, L        D, D
(H > C > D > L)

Slide 25 General Theory
A cooperative outcome can be sustained by the grim strategy combination if and only if
$R < (C - D)/(H - C)$
The collapse of cooperation is more likely:
- the larger R (or the smaller $p\delta$) is
- the smaller (C - D) is
- the larger (H - C) is

Slide 26 General Theory
That is, the collapse of cooperation is more likely when:
- players are impatient, or the game is expected to end quickly
- the punishment is not very severe
- defecting garners a player large and immediate benefits

Slide 27 Solutions II: Penalties and Rewards
                                         WIFE
                             Deny (Cooperate)   Confess (Defect)
HUSBAND   Deny (Cooperate)   3 yr, 3 yr         25 yr, 21 yr
          Confess (Defect)   21 yr, 25 yr       10 yr, 10 yr
The game has changed from being a prisoners’ dilemma to an assurance game.

Slide 28 Solutions II: Penalties and Rewards
                                         WIFE
                             Deny (Cooperate)   Confess (Defect)
HUSBAND   Deny (Cooperate)   3 yr, 3 yr         25 yr, 21 yr
          Confess (Defect)   21 yr, 25 yr       30 yr, 30 yr
Each player has a dominant strategy (Deny), and (Deny, Deny) becomes the unique Nash equilibrium.

Slide 29 Solutions III: Leadership
In most examples of the prisoners’ dilemma, the game is assumed to be symmetric.
However, in actual strategic situations, one player may be relatively “large” (a leader) and the other “small”.
If the size of the payoffs is unequal enough, so much of the harm from defecting may fall on the larger player that she acts cooperatively, even while knowing that the other will defect.
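The claims attached to the tables on Slides 27 and 28 can be verified by a brute-force best-response check. The sketch below is an illustration, not part of the original slides; the helper names `pure_nash` and `from_years` are hypothetical, and prison years are negated so that a higher payoff is better.

```python
from itertools import product


def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_action, col_action) to (row_payoff, col_payoff),
    where higher payoffs are better.
    """
    rows = sorted({a for a, _ in payoffs})
    cols = sorted({b for _, b in payoffs})
    equilibria = []
    for a, b in product(rows, cols):
        # Neither player can gain by changing only her own action.
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(a2, b)][0] for a2 in rows)
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, b2)][1] for b2 in cols)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


def from_years(table):
    """Convert (husband_years, wife_years) into payoffs: fewer years is better."""
    return {cell: (-h, -w) for cell, (h, w) in table.items()}


# Keys are (husband's action, wife's action); values are prison years from the slides.
basic = {("Deny", "Deny"): (3, 3), ("Deny", "Confess"): (25, 1),
         ("Confess", "Deny"): (1, 25), ("Confess", "Confess"): (10, 10)}
slide27 = {("Deny", "Deny"): (3, 3), ("Deny", "Confess"): (25, 21),
           ("Confess", "Deny"): (21, 25), ("Confess", "Confess"): (10, 10)}
slide28 = {("Deny", "Deny"): (3, 3), ("Deny", "Confess"): (25, 21),
           ("Confess", "Deny"): (21, 25), ("Confess", "Confess"): (30, 30)}

print(pure_nash(from_years(basic)))    # basic dilemma
print(pure_nash(from_years(slide27)))  # penalty on a lone confessor
print(pure_nash(from_years(slide28)))  # penalty on any confessor
```

The basic game has mutual confession as its only equilibrium, the Slide 27 table has two equilibria (the assurance-game pattern), and the Slide 28 table leaves (Deny, Deny) as the only one.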
Slide 30 Solutions III: Leadership
Equal-Population SANE Research Game
                                  SOPORIA
                          No Research    Research
DORMINICA   No Research   -1.6, -1.6     0, -2
            Research      -2, 0          -1, -1
This game is a prisoners’ dilemma where each player has a dominant strategy to do no research.

Slide 31 Solutions III: Leadership
Unequal-Population SANE Research Game
                                  SOPORIA
                          No Research    Research
DORMINICA   No Research   -2.4, -0.8     0, -2
            Research      -2, 0          -1, -1
No Research is still the dominant strategy for Soporia, but Dorminica’s best response to it is now Research.

Slide 32 Solutions III: Leadership
The prisoners’ dilemma has, in a sense, been “solved” by the size asymmetry.
The larger country chooses to take on a leadership role and provide the benefit for the whole world.

Slide 33 Solutions III: Leadership
“The exploitation of the great by the small”
- Saudi Arabia as the “swing producer” in OPEC
- The US in NATO
- The seven big clubs in the Chinese Super League

Slide 34 Experimental Evidence
Experiments show that cooperation occurs even in repeated versions of known and finite length.
Only in the last few plays of a finite game does defecting seem to creep in.
Results also suggest that the unraveling of cooperation, based on the use of rollback, is being learned from experience of the play itself over time.

Slide 35 Experimental Evidence
If players find themselves in a cooperative mode with the known end of the relationship approaching, the unwinding of cooperation must include some uncertainty, such as mixed strategies, for both players.

Slide 36 Experimental Evidence
Computer simulation experiments show that “nice” programs did better than “nasty” programs, though not the programs that were always nice and cooperative.
The winning strategy turned out to be the simplest program: tit-for-tat. The reason might be that it is at once forgiving, nice, provocable, and clear.

Slide 37 Real-World Dilemmas
- Governments competing to attract business
- Labor arbitration
- Evolutionary biology
- Price matching

Slide 38 Governments Competing to Attract Business
                               Changzhou (常州)
                             None      Incentives
Suzhou (苏州)   None         3, 3      1, 4
                Incentives   4, 1      2, 2

Slide 39 Labor Arbitration
Predicted Percentage of Employer “Wins” in Arbitration Cases
                               UNION
                         No Lawyer   Lawyer
EMPLOYER   No Lawyer     44%         23%
           Lawyer        73%         46%

Slide 40 Evolutionary Biology: Bowerbird’s Dilemma
                        BIRD 2
                    Guard      Maraud
BIRD 1   Guard      GG, GG     GM, MG
         Maraud     MG, GM     MM, MM
(MG > GG > MM > GM)

Slide 41 Price Matching
                                 KMART
                         High            Low
TOYS “R” US   High       3,000, 3,000    0, 4,000
              Low        4,000, 0        2,000, 2,000

Slide 42 Price Matching
                                         KMART
                         High            Low             Match
TOYS “R” US   High       3,000, 3,000    0, 4,000        3,000, 3,000
              Low        4,000, 0        2,000, 2,000    2,000, 2,000
              Match      3,000, 3,000    2,000, 2,000    3,000, 3,000

Slide 43 Summary
In the prisoners’ dilemma, each player has a dominant strategy (to defect), but the equilibrium outcome is worse for all players than when each uses her dominated strategy (to cooperate).

Slide 44 Summary
One of the solutions to the dilemma is repetition of play.
In a finitely played game, the present value of future cooperation is eventually zero, and rollback yields an equilibrium with no cooperative behavior.
With infinite play (or an uncertain end date), cooperation can be achieved with the use of an appropriate contingent strategy such as the grim strategy.
Slide 45 Summary
In this case, cooperation is possible only if the present value of cooperation exceeds the present value of defecting.
More generally, the prospects of “no tomorrow” or of short-term relationships lead to decreased cooperation among players.

Slide 46 Summary
The dilemma can also be “solved” with penalty schemes that alter the payoffs for players who defect from cooperation, whether their rivals are cooperating or also defecting.
A third solution method (leadership) arises if a large or strong player’s loss from defecting is larger than the available gain from cooperative behavior on that player’s part.

Slide 47 Summary
Experimental evidence suggests that players often cooperate longer than theory might predict.
Such behavior can be explained by incomplete knowledge of the game on the part of the players or by their views regarding the benefits of cooperation.
Tit-for-tat has been observed to be a simple, nice, provocable, and forgiving strategy that performs very well on average in repeated prisoners’ dilemmas.
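The contingent strategies summarized above can be tried out in a short simulation of the repeated restaurant pricing game. This is a minimal Python sketch, not part of the original slides; the function names are hypothetical, and it assumes that tit-for-tat opens with Cooperate, which the slides do not state explicitly.

```python
# Monthly profits (Xavier's, Yvonne's) from the pricing table; "C" = price 26, "D" = price 20.
PROFIT = {("C", "C"): (324, 324), ("C", "D"): (216, 360),
          ("D", "C"): (360, 216), ("D", "D"): (288, 288)}


def grim(my_history, rival_history):
    """Cooperate until anyone has defected, then defect forever."""
    return "D" if "D" in my_history or "D" in rival_history else "C"


def tit_for_tat(my_history, rival_history):
    """Copy the rival's previous action; cooperating first is an assumption."""
    return rival_history[-1] if rival_history else "C"


def always_defect(my_history, rival_history):
    """Benchmark strategy that defects in every month."""
    return "D"


def play(strategy_x, strategy_y, months):
    """Play the repeated game and return the monthly profit lists of X and Y."""
    hist_x, hist_y, profit_x, profit_y = [], [], [], []
    for _ in range(months):
        # Both players choose simultaneously, seeing only past play.
        action_x = strategy_x(hist_x, hist_y)
        action_y = strategy_y(hist_y, hist_x)
        hist_x.append(action_x)
        hist_y.append(action_y)
        px, py = PROFIT[(action_x, action_y)]
        profit_x.append(px)
        profit_y.append(py)
    return profit_x, profit_y


print(play(grim, grim, 4))
print(play(always_defect, grim, 4))
print(play(always_defect, tit_for_tat, 4))
```

Over four months without discounting, mutual grim play earns each restaurant 1,296, while a defector facing grim or tit-for-tat earns 360 once and 288 afterwards, for a total of 1,224.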