Search results: 1-2 of 2 records found for "Science and Technology Statistics Regret Bounds" (query time: 0.008 seconds)
Further Optimal Regret Bounds for Thompson Sampling
2012/11/23
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several s...
We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ step...