forked from pasky/pachi
-
Notifications
You must be signed in to change notification settings - Fork 0
/
HACKING
307 lines (233 loc) · 12.3 KB
/
HACKING
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
This is brief developer-oriented overview in Pachi structure.
Pachi is completely Go-specific (c.f. Fuego; though e.g. atari go support
should be easy to add), but fairly modular. It has been built with focus
on MonteCarlo-based play, but it can in principle be used for other
play engines as well.
Basic architecture
==================
Pachi consists of the following components:
+------+ +--------+ +---------+ +------------------+
| core | -- | engine | -- | playout | -- | tactical library |
+------+ +--------+ +---------+ +------------------+
| |
+-------------+
| aux library |
+-------------+
* "core" takes care of the program's lifetime, GTP interface and basic
fast Go board implementation
pachi.c global initialization and the main loop
version.h current version information
debug.h debugging infrastructure
random.[ch] fast random number generator
gtp.[ch] GTP protocol interface
network.[ch] Network interface (useful for distributed engine)
timeinfo.[ch] Time-keeping information
stone.[ch] one board point coloring definition
move.[ch] one board move definition
board.[ch] board definition and basic interface
* "aux library" provides extra functions like static tactical evaluation
and pattern matching; it is somewhat interwound with "core" component
mq.h "move queue" data structure
stats.h "move statistics" data structure
probdist.[ch] "probability distribution" data structure
ownermap.[ch] simulation-based finalpos. "owner map" data structure
pattern3.[ch] fast 3x3 spatial pattern matcher
pattern.[ch] general multi-feature pattern matcher
* "tactical library" provides extended interfaces for the go board,
most important non-trivial tactical information
tactics/
* "engine" receives notifications about opponent moves and is asked
to generate a move to play on given board
engine.h abstract engine interface
random/ example "random move generator" engine
replay/ example "playout move generator" engine
montecarlo/ simple treeless Monte Carlo engine, quite bitrotten
uct/ the main UCT-player engine, see below
distributed/ "meta-engine" for distributed play by orchestrating
several UCT engines on different computers
patternscan/ auxiliary engine for harvesting patterns from
existing games
joseki/ auxiliary engine for generating joseki patterns
* "playout" policy is asked to generate moves to play during the Monte Carlo
simulations, and to provide rough evaluation of moves feasibility for
the engine
playout.[ch] abstract playout policy interface,
Monte Carlo simulation execution
playout/light uniformly random playout policy
playout/moggy rule-based "Mogo-like" playout policy
* Also, several ways of testing Pachi are provided:
t-unit/ interface for writing unit-tests for specific
functionality, mainly tactics
t-play/ interface for testing performance by playing games
against a fixed opponent (e.g. GNUGo)
t-predict/ test prediction rates of various components
UCT architecture
================
The UCT engine (the proper name should be MCTS as it does not have that
much common with classic UCT now) has non-trivial structure by itself:
+-------------+ +-----+ +-------------------+
| node policy | -- | UCT | --- | node prior-hinter |
+-------------+ +-----+ +-------------------+
| |
+---------+ |
| playout | -----'
+---------+
* "UCT" is the core of the engine
uct.[ch] engine initialization, public interface
internal.h internal state and data structures
tree.[ch] minimax move tree with success statistics
walk.[ch] filling the tree by walking it many times
and running MC simulations from leaves
slave.[ch] engine interface for the distributed engine
* "node prior-hinter" assigns newly created nodes preliminary success
statistics ("prior values") to focus the search better
prior.[ch] variety of methods for setting the priors
* "node policy" mainly chooses the current node's child to descend
through during the tree walk, based on the already recorded statistics;
it must balance exploration and exploitation well during the selection
policy/ucb1 the old-school original simple policy
policy/ucb1amaf the AMAF/RAVE-based policy gathering statistics rapidly
* "dynkomi driver" dynamically determines self-imposed extra virtual komi
dynkomi.[ch]
Board Implementation
====================
The infrastructure is optimized for speed to make it well suited
for bruteforce engines, however tradeoffs are made to make it useful
for heavier MonteCarlo playouts as well (e.g. real liberties are
tracked instead of pseudoliberties). If you are looking for raw
light playout speed, libEGO is better choice.
In general, arbitrary board sizes are supported; however, board sizes
smaller than 9x9 have not been tested and board sizes larger than 25x25
are not supported by GTP - also, in theory some buffer overflows might
happen with board sizes larger than 19x19. The engine parameters are
primarily tuned for 19x19 play.
Ruleset
-------
While the Pachi engines generally play according to Chinese rules,
internally, Pachi uses Tromp-Taylor rules because they are simple,
fast and universal; they are very close to the New Zealand rules.
That means, it simply counts the number of stones and one-point eyes
of each color on the board, plus komi and handicap correction.
Tromp-Taylor rules also mean that multi-stone suicide is allowed! If you
do not like that (basically if you want to pretend it plays according
to Chinese rules), you need to rule that out in your engine, currently.
The provided engines DO avoid multi-stone suicide, though it is allowed
in the playouts for performance reasons (perhaps we should re-visit that
decision in light of heavy playouts).
Tromp-Taylor rules have positional superko; the board implementation
will set a flag if it is violated, but play the move anyway. You need
to enforce the superko rule in your engine.
GTP Implementation
==================
...is a very sad hack. ENSURE that only trusted parties talk to Pachi's
GTP interface, as it is totally non-resilient to any kind of overflow
or bad input attacks and allowing arbitrary input to be entered within
is a major security hole. Yes, this needs to be cleaned up. Also, currently
engines cannot plug in their own commands and there is no GoGui interface.
Pachi supports only few GTP commands now. Most importantly, it does not
support the undo command. The final_status_list command requires engine
support.
General Pattern Matcher
=======================
Pachi has in-development general pattern matcher that can match various
sets of features (spatial and others), inspired by the CrazyStone and
van der Werf - de Groot - Stern pattern model. Please see pattern.h
for detailed description of the pattern concept and recognized features.
To harvest patterns, use 'pachi -e patternscan' (see patternscan/patternscan.c
for available options). The output of the pattern scanner are two data
structures: The matched patterns
(feature1:payload feature2:payload ...)
and spatial dictionary. "Spatial" feature represents a particular
configuration of stones in a circle around the move-to-play; each
configuration has its own record in the dictionary and the spatial
feature references only the id in the dictionary; so you need to keep
both the patterns and the "patterns.spat" file. Normally, 'patternscan'
will only match already existing dictionary entries, but you
can pass it "gen_spat_dict" so that it appends all newly found spatial
features to the dictionary - use "spat_threshold" to limit recording
only to frequently occuring spatial features; to start the dictionary
from scratch, simply remove any existing "patterns.spat" file.
There are few pre-made scripts to make the initialization of the pattern
matcher easy:
* tools/pattern_byplayer.sh: Sorts out patterns from given SGF collection
by player names, one player per file in a dedicated directory. This is
useful if you want to use the patterns to e.g. recognize games of a
player by characteristic patterns. Spatial dictionary is autogenerated
in full.
* tools/pattern_spatial_gen.sh: Initializes spatial dictionary by spatial
features found at least N times in given SGF collection. This is useful
for further gathering of general pattern statistics while keeping
the amount of spatial features manageable.
* tools/pattern_spatial_show.pl ID: Shows spatial pattern of given id
in 2D plane.
* tools/pattern_bayes_gen.sh: Generates a Bayesian probability table
for each pattern in a SGF collection. The computed number is the empiric
probability of playing a given pattern when it is available on board.
* tools/pattern_bayes_merge.sh: Merges Bayesian probability tables
of patterns. This is useful for parallel pattern mining - run multiple
pattern_bayes_gen (perhaps remove the counts >= 2 test) and then merge
the generated tables.
* tools/pattern_getdrops.pl: Processes game logs to determine moves that
lead to large drop in Pachi's evaluation and extracts patterns that
represent these moves. This might allow Pachi to "learn from its past
mistakes" and recognize these moves in the future. A full pipeline
for this might be:
cat kgsGtp.log | tools/kgslog2gtp.pl |
tools/pattern_getdrops.pl | tools/pattern_bayes_gen.sh - |
tools/pattern_bayes_merge.sh patterns.prob - >patterns.prob2
Plugin API
==========
The UCT engine allows external plugins to be loaded and provide external
knowledge for various heuristics - currently just biasing the MCTS priors,
but more can be added easily. The plugins should follow the API in
<uct/plugin.h> - see that file for details.
Joseki Patterns
===============
Joseki patterns can be generated from variations of a SGF file. Josekis
are matched per-quadrant, no other move must be within the quadrant. See
joseki/README for details on usage of the auxiliary engine and a full
recipe for making Pachi play according to joseki sequences extracted from
the Kogo dictionary.
Opening Book
============
The UCT engine can "pre-read" the starting board position and
dump the core of the built tree to a file, loading it later. This is
called a 'tbook' (as in "tree book") and can be generated using the
tools/gentbook.sh script. The newly generated file is automatically
used by the UCT engine when found.
Alternatively, there is a support for directly used opening book
(so-called fbook, a.k.a. "forced book" or "fuseki book"). The book
is stored in a text file in Fuego-compatible format and can be loaded
using the ./pachi -f parameter. A naive way to build such a book
based on shell-based, twogtp-based UCT is available through the
tools/autobook/ framework.
Local Trees
===========
This is a mostly unique idea in Pachi, currently in development;
it does not work too well yet, but Pasky has great hopes for it in
the future.
A local tree is a tree comprising of (CFG-topologically) local
sequences, built in parallel with the real game tree; there is
a separate tree for black-first and white-first sequences. E.g if
the game tree (on empty board) consists of D4-C6-D6-Q16-O17-D7,
the coresponding local tree sequences are (black) D4-C6-D6, (white)
Q16-O17 and (white) D7. The simulation result is then recorded as
usual in the game tree, but also in the corresponding local trees.
The goal of this is to dynamically build a cache of various
resolutions of local situations. This is to overcome cases where
the proper resolution of a given situation is easily found in
the tree, but improper heuristical bisaes keep muddying the
evaluation down.
Exactly what kind of values to store in the local tree nodes is
still an open question. The original approach has been to simply
use simulation results directly or biased on base "value temperature";
now, we are experimenting with survival rate of the "base stone"
of the branch (the first move in the branch sequence). We also intend
to integrate criticality information with local tree data.
The local trees are descended in parallel with the main tree.
The values stored in local trees are added to the RAVE term
from the main tree during node selection. We also wish to use
the information from local trees in the simulations, but this is
still subject to research.
To enable local trees usage, pass local_tree=1 to the UCT engine.
There are many further knobs to twiddle, too.