º£ÆÃ¡¤°ÔÀÓ | Cases & Studies in Betting & Gaming | 赌ÚÏ & ÷áÏõ
- Article] Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game
-
DocNo of ILP: 643
Doc. Type: Article
Title: Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game
Authors: Granmo, OC; Glimsdal, S
Full Name of Authors: Granmo, Ole-Christoffer; Glimsdal, Sondre
Keywords by Author: Bandit problems; Goore Game; Bayesian learning; Decentralized decision making; Quality of service control; Wireless sensor networks
Keywords Plus:
Abstract: The two-armed bandit problem is a classical optimization problem where a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Bandit problems are particularly fascinating because a large class of real world problems, including routing, Quality of Service (QoS) control, game playing, and resource allocation, can be solved in a decentralized manner when modeled as a system of interacting gambling machines. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel scheme for decentralized decision making based on the Goore Game in which each decision maker is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyper parameters of sibling conjugate priors, and on random sampling from these posteriors. We further report theoretical results on the variance of the random rewards experienced by each individual decision maker. Based on these theoretical results, each decision maker is able to accelerate its own learning by taking advantage of the increasingly more reliable feedback that is obtained as exploration gradually turns into exploitation in bandit problem based learning. Extensive experiments, involving QoS control in simulated wireless sensor networks, demonstrate that the accelerated learning allows us to combine the benefits of conservative learning, which is high accuracy, with the benefits of hurried learning, which is fast convergence. In this manner, our scheme outperforms recently proposed Goore Game solution schemes, where one has to trade off accuracy with speed. As an additional benefit, performance also becomes more stable. We thus believe that our methodology opens avenues for improved performance in a number of applications of bandit based decentralized decision making.
Cate of OECD: Computer and information sciences
Year of Publication: 2013
Business Area: game
Detail Business: game
Country: Netherlands
Study Area:
Name of Journal: APPLIED INTELLIGENCE
Language: English
Country of Authors: [Granmo, Ole-Christoffer; Glimsdal, Sondre] Univ Agder, Dept Informat & Commun Technol, N-4604 Kristiansand, Norway
Press Adress: Granmo, OC (reprint author), Univ Agder, Dept Informat & Commun Technol, POB 422, N-4604 Kristiansand, Norway.
Email Address: ole.granmo@uia.no; sondre.glimsdal@uia.no
Citaion:
Funding:
Lists of Citation: Bouhmala N, 2010, ENG APPL ARTIF INTEL, V23, P715, DOI 10.1016/j.engappai.2010.01.009; Cao YU, 1997, AUTON ROBOT, V4, P7, DOI 10.1023/A:1008855018923; Chen D, 2004, 2004 INT C WIR NETW; Dimitrakakis C, 2006, LECT NOTES COMPUT SC, V4131, P850; Gelly S, 2006, P NIPS 2006 NIPS; Google, GOOGL WEBS OPT; Graepel T, 2010, P 27 INT C MACH LEAR, P1320; Granmo OC, 2010, IEEE T COMPUT, V59, P545, DOI 10.1109/TC.2009.189; Granmo OC, 2010, LECT NOTES ARTIF INT, V6098, P199, DOI 10.1007/978-3-642-13033-5_21; Granmo Ole-Christoffer, 2010, International Journal of Intelligent Computing & Cybernetics, V3, DOI 10.1108/17563781011049179; Granmo O-C, 2007, INT J COMPUTER SCI A, V4, P15; Granmo OC, 2007, IEEE T SYST MAN CY B, V37, P166, DOI 10.1109/TSMCB.2006.879012; Gupta N, 2011, P 10 INT C MACH LEAR; Gupta N, 2011, P 31 SGAI INT C ART; Iyer R., 2003, IEEE INT C COMM, P517, DOI 10.1109/ICC.2003.1204230; May BC, 2011, TECHNICAL REPORT; Oommen BJ, 2007, IEEE T COMPUT, V56, P959, DOI 10.1109/TC.2007.1045; Narendra K.S., 1989, LEARNING AUTOMATA IN; Oommen BJ, 2008, GAME THEORY STRATEGI; Scott SL, 2010, APPL STOCH MODEL BUS, V26, P639, DOI 10.1002/asmb.874; Thompson WR, 1933, BIOMETRIKA, V25, P285, DOI 10.2307/2332286; Tsetlin M. L., 1973, AUTOMATION THEORY MO; Tung B., 1996, IEEE T PARALL DISTR, V7, P47; Wang T., 2005, P 22 INT C MACH LEAR, P956, DOI DOI 10.1145/1102351.1102472
Number of Citaion: 24
Publication: SPRINGER
City of Publication: DORDRECHT
Address of Publication: VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS
ISSN: 0924-669X
29-Character Source Abbreviation: APPL INTELL
ISO Source Abbreviation: Appl. Intell.
Volume: 38
Version: 4
Start of File: 479
End of File: 488
DOI: 10.1007/s10489-012-0346-z
Number of Pages: 10
Web of Science Category: Computer Science, Artificial Intelligence
Subject Category: Computer Science
Document Delivery Number: 140IE
Unique Article Identifier: WOS:000318646900001
[ÀÌ °Ô½Ã¹°Àº HyeJung Mo¡¦´Ô¿¡ ÀÇÇØ 2015-05-20 14:37:45 GAMBLING¿¡¼ À̵¿ µÊ]
- reply : 0
-
- list
-
- prev
- next