• HOME
    KOREAN
    CHINESS
    SITE MAP
    JOIN
  • Username (Site Login ID)
  • Password
  • Forgot your password?

  • º£ÆÃ¡¤°ÔÀÓ | Cases & Studies in Betting & Gaming | 赌ÚÏ & ÷áÏõ

    date : 2015-05-20 01:10|hit : 2536
    Article] Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game
    DocNo of ILP: 643

    Doc. Type: Article

    Title: Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game

    Authors: Granmo, OC; Glimsdal, S

    Full Name of Authors: Granmo, Ole-Christoffer; Glimsdal, Sondre

    Keywords by Author: Bandit problems; Goore Game; Bayesian learning; Decentralized decision making; Quality of service control; Wireless sensor networks

    Keywords Plus:

    Abstract: The two-armed bandit problem is a classical optimization problem where a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Bandit problems are particularly fascinating because a large class of real world problems, including routing, Quality of Service (QoS) control, game playing, and resource allocation, can be solved in a decentralized manner when modeled as a system of interacting gambling machines. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel scheme for decentralized decision making based on the Goore Game in which each decision maker is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyper parameters of sibling conjugate priors, and on random sampling from these posteriors. We further report theoretical results on the variance of the random rewards experienced by each individual decision maker. Based on these theoretical results, each decision maker is able to accelerate its own learning by taking advantage of the increasingly more reliable feedback that is obtained as exploration gradually turns into exploitation in bandit problem based learning. Extensive experiments, involving QoS control in simulated wireless sensor networks, demonstrate that the accelerated learning allows us to combine the benefits of conservative learning, which is high accuracy, with the benefits of hurried learning, which is fast convergence. In this manner, our scheme outperforms recently proposed Goore Game solution schemes, where one has to trade off accuracy with speed. As an additional benefit, performance also becomes more stable. We thus believe that our methodology opens avenues for improved performance in a number of applications of bandit based decentralized decision making.

    Cate of OECD: Computer and information sciences

    Year of Publication: 2013

    Business Area: game

    Detail Business: game

    Country: Netherlands

    Study Area:

    Name of Journal: APPLIED INTELLIGENCE

    Language: English

    Country of Authors: [Granmo, Ole-Christoffer; Glimsdal, Sondre] Univ Agder, Dept Informat & Commun Technol, N-4604 Kristiansand, Norway

    Press Adress: Granmo, OC (reprint author), Univ Agder, Dept Informat & Commun Technol, POB 422, N-4604 Kristiansand, Norway.

    Email Address: ole.granmo@uia.no; sondre.glimsdal@uia.no

    Citaion:

    Funding:

    Lists of Citation: Bouhmala N, 2010, ENG APPL ARTIF INTEL, V23, P715, DOI 10.1016/j.engappai.2010.01.009; Cao YU, 1997, AUTON ROBOT, V4, P7, DOI 10.1023/A:1008855018923; Chen D, 2004, 2004 INT C WIR NETW; Dimitrakakis C, 2006, LECT NOTES COMPUT SC, V4131, P850; Gelly S, 2006, P NIPS 2006 NIPS; Google, GOOGL WEBS OPT; Graepel T, 2010, P 27 INT C MACH LEAR, P1320; Granmo OC, 2010, IEEE T COMPUT, V59, P545, DOI 10.1109/TC.2009.189; Granmo OC, 2010, LECT NOTES ARTIF INT, V6098, P199, DOI 10.1007/978-3-642-13033-5_21; Granmo Ole-Christoffer, 2010, International Journal of Intelligent Computing & Cybernetics, V3, DOI 10.1108/17563781011049179; Granmo O-C, 2007, INT J COMPUTER SCI A, V4, P15; Granmo OC, 2007, IEEE T SYST MAN CY B, V37, P166, DOI 10.1109/TSMCB.2006.879012; Gupta N, 2011, P 10 INT C MACH LEAR; Gupta N, 2011, P 31 SGAI INT C ART; Iyer R., 2003, IEEE INT C COMM, P517, DOI 10.1109/ICC.2003.1204230; May BC, 2011, TECHNICAL REPORT; Oommen BJ, 2007, IEEE T COMPUT, V56, P959, DOI 10.1109/TC.2007.1045; Narendra K.S., 1989, LEARNING AUTOMATA IN; Oommen BJ, 2008, GAME THEORY STRATEGI; Scott SL, 2010, APPL STOCH MODEL BUS, V26, P639, DOI 10.1002/asmb.874; Thompson WR, 1933, BIOMETRIKA, V25, P285, DOI 10.2307/2332286; Tsetlin M. L., 1973, AUTOMATION THEORY MO; Tung B., 1996, IEEE T PARALL DISTR, V7, P47; Wang T., 2005, P 22 INT C MACH LEAR, P956, DOI DOI 10.1145/1102351.1102472

    Number of Citaion: 24

    Publication: SPRINGER

    City of Publication: DORDRECHT

    Address of Publication: VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS

    ISSN: 0924-669X

    29-Character Source Abbreviation: APPL INTELL

    ISO Source Abbreviation: Appl. Intell.

    Volume: 38

    Version: 4

    Start of File: 479

    End of File: 488

    DOI: 10.1007/s10489-012-0346-z

    Number of Pages: 10

    Web of Science Category: Computer Science, Artificial Intelligence

    Subject Category: Computer Science

    Document Delivery Number: 140IE

    Unique Article Identifier: WOS:000318646900001

    [ÀÌ °Ô½Ã¹°Àº HyeJung Mo¡¦´Ô¿¡ ÀÇÇØ 2015-05-20 14:37:45 GAMBLING¿¡¼­ À̵¿ µÊ]
    reply : 0
  • list
  • prev
  • next