美添一樹 @東京大学湊離散構造処理系プロジェクトセミナー 2012/2/24

by user

on 28 марта 2017

Category: Documents

>> Downloads: 3

views

Report

Comments

Description

Download 美添一樹 @東京大学湊離散構造処理系プロジェクトセミナー 2012/2/24

Transcript

美添一樹 @東京大学湊離散構造処理系プロジェクトセミナー 2012/2/24

超並列
分散モンテカルロ木探索
美添一樹 @東京大学
湊離散構造処理系プロジェクト
セミナー 2012/2/24
1
1
自己紹介
過去の研究内容
2
2
投機並列実行
• ループ等を複数のCPUに割当てて強引に並列実行
し、後から依存関係を解決
•
依存関係の原因：分岐、レジスタ、メモリ
• メモリアクセスを介した依存関係が一番難しい
•
キャッシュ同期プロトコルの拡張で対処する手法を提案
• 問題点: 性能向上が小さい
•
そもそも投機並列実行自体が、ハードウェア資源を要求
する割に性能向上が少ない (4コアで2割程度)
•
当時はマルチコアがまだ一般的でなかったので、レジス
タ間の依存関係も対処が難しい
3
3
デジタル無線通信
•
(株) 富士通研究所
•
モバイルコミュニケーション開発研究所
•
•
現在はワイヤレス研究所
携帯電話無線基地局関連の開発研究
•
無線通信シミュレータの開発
•
•
Adaptive Array Antenna 方式の開発研究
•
•
性能検証のため
富士通研究所
@YRP5番館
複数のアンテナを利用した指向性のある通信(の制御)
Digital Signal Processor のプログラミング
•
VLIWのアセンブリを手書き
4
4
証明数探索によるAND/OR木探索
•
グラフ(木)の形状を手がかりに探索する手法
•
枝分かれの少ない方を優先して探索する
•
•
詰将棋には理想的な性質
•
•
相手の選択肢を狭めるように探索する
[長井2002]
ほとんど全ての詰将棋を解く(df-pn)
その他、多くのゲームの終盤にも有効
•
•
五目並べの先手必勝証明
[Allis1992]
checkers の引き分け証明 (Scienceに掲載)
[Schaeffer, 岸本 et al. 2007]
人間にも読み切れる問題は
コンピュータ圧倒的に優勢だが例外あり
5
5
証明数探索の弱点
熟練した人間に負ける問題
開いた
詰碁
詰みのない
将棋の終盤
閉じた
詰碁は
解ける
難しい
倉庫番
6
6
領域の開いた問題を解いた
[Yoshizoe, Kishimoto, Müller 2007] [Yoshizoe 2009]
既存のアルゴリズムが苦手とする
領域の開いた問題を解いた
「分岐因子が大きくかつ一様な探
索空間にも有効」
脅威度 (λorder) の概念と深さ優
先証明数探索 (df-pn) の組合せ
おまけ
これだと
別解あり
7
7
静脈認証の安全性評価
背景: FAR と FRR 及びその問題点
•
FAR, FRR が標準的に用いられてきた
•
•
•
FAR: 他人受入率
0.8"
FRR: 本人拒否率
0.6"
人工物によるなりすましは考慮外
•
•
•
•
1"
0.4"
FRR"
0.2"
指紋認証グミ指
0"
虹彩認証眼の写真
0"
10"
20"
30"
40"
50"
静脈認証プラスチック, (大根)
「ウルフ攻撃確率の提案」[宇根 et al. 2006]
•
•
FAR"
FAR: False Acceptance Rate
FRR: False Reject Rate
人間由来でないデータを考慮した基準
ウルフ: 人間ではないのに複数の人間と
誤一致するデータ
•
cf. 赤ずきんちゃん、音声認識
8
8
ウルフ攻撃確率 (WAP)
false2acceptance2rate
50%
50%
0.03%
0.03%
0.01%
FAR=
0.05%
100%
100%
0.01%
0.05%
特に、攻撃者が
アルゴリズムの
性質を知っている
場合を想定
WAP=
安易に本人拒否率(FRR) を下げようとすると、
ウルフが発生する可能性がある
しかし、ウルフ攻撃確率の推定は難しい
9
9
ウルフ攻撃確率(WAP)の推定
目的
認証アルゴリズムが与えられ
たときに、WAPを推定する
ある静脈認証方式の万能鍵
(ユニバーサルウルフ) を
探索アルゴリズムで発見
[田辺,美添,今井2008]
SCIS2008論文賞
[Tanabe,Yoshizoe, Imai 2009]
アルゴリズムは
単純だったが、
結果が面白かった
このパターンを
敷き詰めた画像を
提示すると、
どんな画像にも
マッチする
10
10
量子計算機シミュレータ
•
•
量子bitを表現するには膨大なメモリが必要
•
36量子bit を倍精度で表現するには 1TB 必要
SSDを記憶領域として利用し、量子演算をエミュレー
ションする
•
32GB SSD x 4 を使用 Intel X-25E Extreme
•
•
•
Read 984MB / sec, Write 685MB / sec
PC全部で約30万円
Hadamard 変換の時間を測定
•
•
•
規則的な行列乗算を複数回行う
SSDの性能に合わせてブロック化などを調整
32 qubit Hadamard変換に約380秒 (メモリ64GB)
•
参考: BlueGene/L (256 MPI process) で約70秒
11
11
背景: SAT solver
(¬x1
¬x2
¬x3 )
(x2 )
(x1
x2 )
(x1
¬x2
x3 )
(¬x1
x3 )
• SAT はもちろんNP完全
• しかし実際には多くの問題が実用的な時間
で解ける
•
変数が多いだけでは難しくならない
• 毎年SAT競技会が開催されている
• キーテクニック
•
DPLL, Conflict-Driven Clause Learning (CDCL)
12
12
DPLL+CDCL
(¬x1
(¬x1
¬x2
¬x3 )
¬x3 )
(x1
(x1
x2 )
(¬x1
x3 )
(x1
x2 = 1
x1 = 1
(¬x3 )
(x2 )
DPLL = Davis, Putnum, Logemann, Loveland
M. Davis, G. Logemann, and D. Loveland.
“A machine program for theorem proving”
Comm. ACM,Vol. 5, No. 7, pp. 394-397, 1962.
(x3 )
x3 )
x1 = 0
(x3 )
¬x2
x3 )
(¬x1
x3 )
近年のSAT solverは
DPLL に学習説
x3 = 1
充足不能なので、
充足
1, back track するか、
2, 学習節を生成し追加、restart
CDCL = Conflict-Driven Clause Learning
(¬x2 ¬x1 )
cf. MiniSAT
13
13
分散並列 SAT solver
1, Master-Worker
3, TDS方式 (詳細略)
分散ハッシュテーブル
を用いた並列化手法を
SAT solver に適用
負荷分散が問題
•1, 2, は既存方式
2, Portfolio
•最新のクラスタ上で実
個別に複数のソルバを実行
装、性能測定
•3, は新提案
学習節だけ交換
•まだ実装が不安定
14
14
Image provided by Tokyo Institute of Technology
Photographer: Takahiro Uozumi
Scalable Distributed
Monte Carlo Tree Search
Kazuki Yoshizoe ([email protected])
Dept. of Computer Science
The University of Tokyo
Joint work with A. Kishimoto, H. Yoshimoto, T. Kaneko
and Y. Ishikawa
15
Graph search
Root
Branch (Edge)
Node
Path
Leaf
Main component of many
AI (Artificial Intelligence) applications
16
16
背景 1
背景 2
モンテカルロ木探索 (MCTS)
ランダムサンプリング+ 木探索
分散ハッシュテーブルを
用いた並列探索 (TDS)
[Romain et al. 1999]
[Coulom 2006][Kocsis, Szepesvári 2006]
分散並列モンテカルロ木探索
TDS-‐df-‐UCT: 1,200core で600-‐750倍高速化
ただしまだ仮想ゲーム木での性能
(Transposition table Driven Scheduling, depth-first, Upper Confidence bound for Trees)
17
17
背景その１
モンテカルロ木探索
とコンピュータ囲碁
あるいは二人ゼロ和完全確定情報ゲームの進歩
18
18
普通の二人ゼロ和完全情報ゲーム
min-max探索 + αβ枝刈り (αβ探索)
αβカットにより探索が
省略される。理想的には
探索節点数が sqrt になる
[Knuth, Moore 1975]
Max node
Min node
50
24
70
25
47
15
67
65
19
19
普通の二人ゼロ和完全情報ゲーム
min-max探索 + αβ枝刈り (αβ探索)
αβカットにより探索が
省略される。理想的には
探索節点数が sqrt になる
50
50
47
50
[Knuth, Moore 1975]
70 47
67
Max node
Min node
50
24
70
25
47
15
67
65
19
19
普通の二人ゼロ和完全情報ゲーム
min-max探索 + αβ枝刈り (αβ探索)
αβカットにより探索が
省略される。理想的には
探索節点数が sqrt になる
50
50
47
50
[Knuth, Moore 1975]
70 47
67
Max node
Min node
50
24
70
25
47
15
67
65
19
19
普通の二人ゼロ和完全情報ゲーム
min-max探索 + αβ枝刈り (αβ探索)
αβカットにより探索が
省略される。理想的には
探索節点数が sqrt になる
50
50
47
50
[Knuth, Moore 1975]
70 47
67
Max node
Min node
50
24
70
25
47
15
67
65
19
19
αβ探索によるAIの強さ
チェッカー
オセロ
チェス
1994年に世界チャンピオンに勝利
2007年に引き分け証明
1997年に世界チャンピオンに完勝
1997年に世界チャンピオンに勝利
(IBM DeepBlue vs Kasparov)
将棋
そろそろ羽生さんと良い勝負
囲碁
アマ3級くらい？
明らかに囲碁だけ突出して弱い
囲碁も二人ゼロ和完全情報ゲームなのになぜか？
20
20
囲碁の難しさ
探索空間が巨大
チェッカー
1020
良い評価関数が作れる
オセロ
1028
チェス
1050
手生成、機械学習などによる
高速、正確な評価関数
将棋
71
10
囲碁(9路)
1038
囲碁(19路)
10171
μ秒オーダー
9割方当たる
(この表現の意味は置いておいて)
囲碁には、良い
評価関数が作れない
21
21
Monte Carlo Simulation
とゲームAI
•
確率的なゲームに使うのは自然なアイデア
•
•
•
•
Poker (Texas hold em)
Scrabble
Back Gammon
完全情報ゲームに使うのは…
•
•
一見奇抜なアイデア
「原始モンテカルロ碁」
•
1993年頃に提案
[Brügmann 1993][Bouzy][Cazenave]
22
22
Playout とは
• 囲碁は終盤に近づくにつれ
て合法手が減少する
• 合法手の中からランダムに
選んで打つだけのプレイ
ヤーでも終局可能
•
ただし、少し制約が必要「眼に
打たない」
• playoutは造語
•
既存の用語は simulation,
episode 等
23
23
原始モンテカルロ囲碁
playout による局面評価
黒の手番
白の手番
playout
黒勝ち
白勝ち
24
24
もちろん、
原始モンテカルロ囲碁は弱い
• playout の回数が多ければ強くなるという期待
• しかし、深さが2段以上の木に対しては、最善
手を返す保証がない
•
•
相手のミスに期待した手を打つ特徴がある
正しい応手の数が少ない場合は特にその傾向が強い
• どのくらいが強さの限界か調べた論文あり
•
GNU Go 相手の勝率は1割くらいだったようです
H.Yoshimoto, K.Yoshizoe, T. Kaneko, A. Kishimoto, and K. Taura
Monte Carlo Go Has a Way to Go, AAAI06,
25
25
[Coulom 2006]
CrazyStone の登場
• 2006年 Computer Olympiad (@Torino)
囲碁9路盤部門優勝プログラム
•
作者は Rémi Coulom (フランス)
•
•
using Monte-Carlo method !
優勢だと手堅く逃げ切り、劣勢だと無理な手を
打つ
• CrazyStone が使っていたのが、原始モンテ
カルロを改良したアルゴリズム
•
これがモンテカルロ木探索の発明とされる
26
26
モンテカルロ木探索
Monte Carlo Tree Search
有望な手で多くの
playout を実行
playout の回数が
閾値を超えたら
木が成長する
27
27
Normal search
algorithm
Monte Carlo Tree Search
(MCTS)
Leaves are evaluated
using evaluation function
MCTS dramatically improved
Go programs
[R. Coulom 2006, L. Kocsis and C. Szepezvári 2006]
Do random samplings at leaves
So-called Monte Carlo Tree Search
(sampling is called playout in game AI)
28
28
UCT algorithm
[L. Kocsis and C. Szepezvári 2006]
Upper Confidence bound applied to Trees
Upper Confidence Bound is a way of comparing
probability distributions
w
2 ln t
s
+
mean
s
[P. Auer, N. Cesa-Bianchi and P. Fischer 2002]
bias
Multi-Armed Bandit Problem
UCT algorithm applied
UCB to tree search
Proof: UCT converges to
the optimal node with
infinite playouts
29
29
MCTS applications
[H. Nakhost and M. Müller]
“Monte-Carlo Exploration for
Deterministic Planning”, IJCAI’09
2 player
games
Multi player
games
Planning
Constraints
Satisfaction Problem
[S. Baba,Y. Joe, A. Iwasaki and M.Yokoo]
“Real-time Solving of Quantified CSPs based on
Monte-Carlo Game Tree Search”, IJCAI’11
Canadian Traveler
Problem
Single player
game (puzzle)
[Z. Bnaya, A. Felner, D. Fried, O. Maksin,
S. E. Shimony:] “Repeated-Task Canadian
Traveler Problem”, SOCS 2011
30
30
特に囲碁の進歩は
•
•
2005年以前はアマ3級程度で停滞
2006年末、MCTSの発明によりアマ初段
•
•
2012年2月現在、最強のプログラムはアマ7段格
•
•
•
•
kgs 1d を達成
kgs 5d-6d
あと少しで都道府県代表を狙える (7dあれば狙える)
1年で1段程度の進歩
プロは kgs 9d 以上トッププロは 11d-12d と推測
31
31
背景その2
分散ハッシュテーブル
によるIDA*探索
またはクラスタでルービックキューブを解くには
32
32
A*探索
• A*探索は評価関数付きダイクストラ
• 反復深化と組み合わせたIDA*探索はよく使われる
•
Iterative Deepening A*
• 様々な問題に適用可能
•
•
普通の最短経路、何らかの最適化、パズルの解法
ルービックキューブの20手証明にもおそらくIDA*探索
が用いられている
[Rokicki, Kociemba, Davidson and Dethridge 2010]
God’s Number is 20 ( http://cube20.org/ )
33
33
Hash table and Search
何らかの hash 関数
によって、節点の
hash 値を計算
(cf. Zobrist hash)
01001011
Hash Table
Hash Key
11010001
10001110
違う手順で同一局面に至ることを transposition
と呼ぶので、transposition table と言う。
合流の検知、履歴を用いた高速化 (反復深化など)
で利用される。
探索アルゴリズムでは標準的なテクニック。
34
34
Straightforward parallel search
•
•
Simple approach
•
•
Divide tree into subtrees
Do Work Stealing if there are
idle CPUs
Difficulties
•
Communication overhead for
Work Stealing
•
Needs accurate workload
prediction for uniform load
balancing
•
Transpositions (DAG)
35
35
Hash table based parallelization
Transposition table Driven Scheduling [Romein et al. 1999]
CPU ID
11 01001011
3
CPU0
Hash Table
CPU1
Hash Table
Hash Key
2
1
10 11010001
01 10001110
CPU2
Hash Table
CPU3
Hash Table
cf. Distributed Hash Table (DHT)
•Disadvantage: frequent inter-‐CPU communicaJons
•Advantage 1: all communicaJons are 1 to 1 (no broadcast)
•Advantage 2: communicaJon and computaJon can overlap
36
36
Parallel IDA* by TDS
[Romein et al. 1999]
•
IDA* search (depth first A*)
•
•
Iterative Deepening A*
最短経路を発見する
• 並列化の準備
•
通常の A* 探索
•
•
結果を上の節点に返す
上に結果を返さないA*探索
•
•
ボトルネックを回避
少しのオーバーヘッドで可能
37
37
TDS (TDS-IDA*) mechanism
Evaluation
Search
Search
Search
Search
Search
Search
Search
Search
Search
Search
job
queue
Search
Search
Search
Search
Search
hash
table
CPU0
CPU1
CPU2
38
38
TDS performs extremely well
The secret is elimination of upward communication
Speedup
Speedup
(a) 15-puzzles
Speedup
Parallel IDA* search speedup
Fig. 5 [Romein et al. 1999]
(b) 14-puzzles
(c) Rubik’s Cube
39
39
TDS の適用例
•
A* search
•
original TDS [Romein et al. 1999]
•
•
•
パズル、大規模グラフの最短経路問題
Alpha-beta search
•
TDSAB [Kishimoto 2002]
•
•
•
TDS = IDA* + TDS
TDSAB = MTD(f) + YBWC + TDS
二人ゼロ和完全情報ゲーム
並列モンテカルロ木探索
•
TDS-df-UCT [Yoshizoe et al. 2011]
•
•
TDS-df-UCT = df-UCT + TDS
これから話します
40
40
Distributed Parallel
Monte-Carlo Tree Search
スーパーコンピュータでモンテカルロ木探索
41
41
Parallel computing
All computers will be parallel
Shared memory
CPU
CPU
Main
Memory
CPU
Distributed memory
CPU
CPU
CPU
Main
Memory
Main
Memory
Main
Memory
CPU
CPU
CPU
CPU
Fast communication,
few CPUs, easy to use
Network
Slow communication,
many CPUs, difficult to use
42
42
Existing parallel approach
Root parallelization (Portfolio)
Many trees with
different random seeds
Achieves 32 to 64
fold speedup
Similar approach is
effective for other problems.
e. g. SAT solver, CSP solver
Portfolio based
parallel SAT solver
cf. SATzilla
Share important
information
43
43
r
TDS-UCT
Original UCB formula
w
+
s
a
add virtual loss to UCB
b
How to distribute
CPUs?
UCT: UCB applied to Trees
UCB: Upper Confidence Bound
2 ln t
s
w
+
s+d
2 ln(t + D)
s+d
award a penalty if other CPU(s)
are searching the node
TDS: Transposition table Driven Scheduling
44
44
TDS-UCT mechanism
Playout
Report
Search
Search
Search
Search
Search
Search
Report
Report
Report
job
queue
Search
Search
Report
Report
Search
hash
table
CPU0
CPU1
CPU2
45
45
Total number of jobs
•
Number of jobs has to be greater than number of
CPUs
•
Appropriate number of jobs are not known (We used
20 x no. CPU, but maybe too many)
•
•
•
3 x no. CPU seems better
Many jobs
•
Less idle CPUs, more strength degradation
Fewer jobs
•
More idle CPUs, less strength degradation
46
46
Experiments
6 combinatorial problems
3 synthesized game trees
(branches 8, 40, 150)
2 playout Jmes
(0.1 ms, 1.0 ms)
Pure speedup is measured
(Jme for 6M, 20M samplings)
(not strength speedup)
Image provided by Tokyo Institute of Technology
Photographer: Takahiro Uozumi
TSUBAME2 supercomputer
(Tokyo InsJtute of Technology)
Used 100 compute nodes,
1200 CPU cores (Xeon 2.93 GHz),
QDR InﬁniBand x 2 (80 Gbps)
47
47
Naïve TDS-UCT speedup
0.1-‐ms playout
14 12 Branch 40
10 Branch 8
8 6 4 2 0 120 100 Branch 40
Branch 150
80 Speedup
Speedup
200
400
600
800 1000 1200
Branch 8
60 Branch 150
40 20 0
1.0-‐ms playout
0 0
Number of CPU cores
200
400
600
800 1000 1200
Number of CPU cores
Slower sampling yields be`er speedup
Too slow
48
48
TDS-UCT with ad hoc enhancement
14 12 10 Speedup
8 6 4 2 0 120 100 1.0-‐ms playout
Branch 40
Branch 8
Branch 150
60 40 20 0
200
400
600
800 1000 1200
Branch 40
80 Speedup
0.1-‐ms playout
Branch 8
Branch 150
0 0
Number of CPU cores
200
400
600
800 1000 1200
Number of CPU cores
Using CPU0 only for communication
results in a slight improvement
49
49
The bottleneck
Number of received jobs
per CPU (20M playouts)
Max
Avg.
Ret. lv.
21.7M
685K
20.6
and avg. levels of returns
Communications
concentrate on
one CPU
Ad hoc solution was
not enough
50
50
Solving the bottleneck
r
f
c
a
d
r
g
f
e
b
Normal UCT
unnecessarily returns
to the root node
c
a
d
g
e
df-UCT also
sends records of
branch comparison
b
Depth First UCT
skips unnecessary returns
[H.Yoshimoto, A. Kishimoto, T. Kaneko, K.Yoshizoe* 2007]
Published only in Japanese
51
51
Depth First UCT
r
f
c
a
d
b
g
e
w
+
s
2 ln t
s
Depth
0
1
2
Best
wf sf
2nd
wg sg
wc sc
wa sa
wd sd
wb sb
Best と 2nd best の値を保持したスタックを持つ
(正確には virtual loss の値も持つ)
比較はスタック中の値で行い、余計なリターンを回避
ただし、jobあたりの通信サイズは増大する
(96bytes for naive TDS-‐UCT, 2K bytes for TDS-‐df-‐UCT)
52
52
df-UCT vs UCT on a single core
Ignoring 3rd child did not
affect the strength
r
f
c
a
d
b
g
e
w
+
s
2 ln t
s
df-UCT vs UCT in P-Game
with 100K po. / move
3rd child can become the
best (with prob. 1.0-‐2.4%)
but ignored because table
only has 2 entries
Br.
Dp.
先手後手 Win Loss Draw
8
30
UCT
df
80
119
1
8
30
df
UCT
84
116
0
40
20
UCT
df
157
39
4
40
20
df
UCT
159
36
5
150
10
UCT
df
58
140
2
150
10
df
UCT
65
130
5
53
53
TDS-df-UCT speedup
0.1-‐ms playout
400 Branch 8
300 1.0-‐ms playout
800 Branch 8
600 200 100 Speedup
Speedup
Branch 40
Branch 150
400 200 Branch 40
Branch 150
Naïve TDS-UCT results
Naïve TDS-UCT results
0 0
200
400
600
800 1000 1200
0 0
Number of CPU cores
200
400
600
800 1000 1200
Number of CPU cores
250-‐330 fold speedup (0.1-‐ms)
600-‐740 fold speedup (1.0-‐ms)
Max Avg. Ret. lv.
Naïve 21.7M 685K 20.6
df
100K 82K 2.64
Number of received jobs
54
54
Number of upward jobs
Branch 8
Branch 40
Branch 150
max
avg.
RP/pl.
max
avg.
RP/pl.
max
avg.
RP/pl.
UCT1
21.7M
685K
20.6
20.4M
243K
7.28
20.0M
18.3K
5.50
UCT2
21.8M
697K
20.9
20.0M
240K
7.21
20.0M
18.7K
5.61
df-‐UCT
100K
82.0K
2.64
118K
66.9K
2.01
158K
58.0K
1.74
Number of report jobs received at each CPU
0.1ms playout, 600cores, 20M playouts
There is a big gap between max and avg. for naïve TDS-UCT.
TDS-df-UCT almost equally distributes report jobs.
RP/pl. shows the number of report jobs per playout.
It shows that df-UCT jobs only returns 2 levels in average.
55
55
Job gathering overhead
•
Naïve TDS-UCT は毎回 root 節点に返る
•
•
オーバーヘッドはあるが、常に最新の情報が root にある
TDS-df-UCT は、木の末端近くにとどまる
•
•
•
root 節点の情報がそのままではなかなか更新されない
終了時に job を root 節点に「呼び戻す」必要がある
このオーバーヘッドが大きい (job gathering overhead)
•
•
•
1,200 core だと、実行時間 8秒で overhead 2秒など
オーバーヘッドの大きさは job 総数と木の深さで決まる
job の総数を減らして、探索時間を長くすれば軽減で
きるが、さらに高速化を目指す場合には問題となる
56
56
TDS-df-UCT w/o overhead
400 300 200 100 Branch 8
Branch 150
Branch 40
0 800 600 400 200 0
200
400
600
800 1000 1200
1.0-‐ms sampling
1,000 Speedup
500 Speedup
0.1-‐ms sampling
Branch 8
Branch 150
Branch 40
0 0
Number of CPU cores
200
400
600
800 1000 1200
Number of CPU cores
Execution time includes 1-‐3 sec. fixed time
overhead (for 1,200 cores). Speedup without
the overhead reaches 1000 fold.
57
57
Conclusions and future work
•
Solved bottleneck in parallel MCTS and achieved 740 fold
speedup (existing results are less than 64 fold speedup)
•
•
•
Proposed promising approach for other search algorithms
•
•
Provide new applications for supercomputers
Improve speedup further, hopefully up to 50,000 fold
Apply parallel MCTS to problems other than game tree
(Constraints Satisfaction Problems, Planning, ...)
Beat human Go champion using a supercomputer
(reproducing IBM Deep Blue vs Kasparov)
58
58
Conclusions and future work
•
Solved bottleneck in parallel MCTS and achieved 740 fold
speedup (existing results are less than 64 fold speedup)
•
•
•
Proposed promising approach for other search algorithms
•
•
Provide new applications for supercomputers
Improve speedup further, hopefully up to 50,000 fold
Apply parallel MCTS to problems other than game tree
(Constraints Satisfaction Problems, Planning, ...)
Beat human Go champion using a supercomputer
(reproducing IBM Deep Blue vs Kasparov)
58
58
今後の方向性
• (未発表の) 4,800 core の実験によれば、探索
終了時の Job gathering overhead はかなり
本質的である
•
これを解決する必要がある
• 並列化したUCTの収束の証明
• (並列化を離れて) UCBを置き換える
•
理論的には良いが計算時間がかかるものを実用化
• 最適なパラメータを自動調整
• 囲碁プログラムをこの上で動作させて…
59
59

美添 一樹 @東京大学 湊離散構造処理系プロジェクト セミナー 2012/2/24

Comments

Description

Transcript

美添一樹 @東京大学湊離散構造処理系プロジェクトセミナー 2012/2/24