MTD(f) Experiments
October 2018
1. Introduction
This page contains the MTD(f) experiments referenced in the ICGA Journal paper Move selection in MTD(f). See also my earlier online paper Handling Search Inconsistencies in MTD(f) and the discussion on the World Draughts Forum.
Our implementation of the MTD(f) algorithm is given in Algorithm 1 (MT) and Algorithm 2 (MTD) below. MTD is called from within an iterative deepening framework, starting with f = 0 at d = 1; the value g returned at each depth is the input f for the next depth (a driver sketch follows Algorithm 2). MTD replaces AlphaBetaTT at the root node of an existing program.
Algorithm 1. Memory-enhanced Test with move selection code (Fail-Soft Negamax version):
int MT(Board board, int gamma, int depth) {
    if (depth == 0 || terminal(board))
        return negaEval(board);        // negaEval returns the evaluation relative to the color to move
    ttMove = NO_MOVE;
    entry = tt.lookup(board);
    if (entry != null) {
        if (entry.depth >= depth) {
            if (entry.bound == LOWER && entry.value >= gamma) return entry.value;
            if (entry.bound == UPPER && entry.value < gamma) return entry.value;
        }
        ttMove = entry.move;
    }
    generateMoves(board, moves);       // at least one move (else terminal)
    orderMoves(moves, ttMove);         // apply move ordering, ttMove (if any) first
    g = -INFINITY;
    for (i = 0; i < moves.size; i++) {
        value = -MT(board.makeMove(moves[i]), -gamma + 1, depth - 1);
        if (value > g) {
            g = value;
            move = moves[i];
            if (g >= gamma) break;
        }
    }
    tt.store(board, g, g < gamma ? UPPER : LOWER, depth, <NEW_TT_MOVE>);
    return g;
}
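For reference, the TT interface assumed by this pseudocode can be sketched as follows. The field and method names are taken from Algorithm 1; the replacement policy is an implementation choice the paper does not prescribe.

// A TT entry as used by MT: a value with a bound type, the depth it was
// searched to, and the stored move.
class TTEntry {
    int value;   // search value
    int bound;   // LOWER: value is a lower bound on the true value; UPPER: an upper bound
    int depth;   // depth (draft) of the search that produced the entry
    Move move;   // the <NEW_TT_MOVE> stored by MT
}

class TT {
    TTEntry lookup(Board board);   // entry for this position, or null
    void store(Board board, int value, int bound, int depth, Move move);
}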
Algorithm 2. MTD(f) with move selection code (marked with an asterisk):
int, Move MTD(Board root, int f, int d) {
    g = f;
    low = -INFINITY; upp = +INFINITY;
*   bestMove = NO_MOVE;
    do {
        gamma = g == low ? g + 1 : g;  // MT tests the null window (gamma - 1, gamma)
        g = MT(root, gamma, d);
*       bestMove = <NEW_BEST_MOVE>;
        if (g < gamma) upp = g; else low = g;
    } while (low < upp);               // not: while (low != upp); with search
                                       // inconsistencies low may exceed upp
*   return g, bestMove;
}
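As mentioned in the introduction, MTD is called from an iterative deepening framework. A minimal sketch of such a driver, where the maxDepth parameter and the outOfTime() helper are assumptions for illustration:

// Start with f = 0 at d = 1; the g returned at depth d is the first
// guess f for depth d + 1. On time-out, the best move of the last
// completed iteration is used.
Move search(Board root, int maxDepth) {
    f = 0;
    bestMove = NO_MOVE;
    for (d = 1; d <= maxDepth && !outOfTime(); d++) {
        g, move = MTD(root, f, d);
        f = g; bestMove = move;
    }
    return bestMove;
}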
There are three viable options for the new TT move in function MT:
OLD:  <NEW_TT_MOVE> := move
NEW:  <NEW_TT_MOVE> := g >= gamma ? move : NO_MOVE
HIGH: <NEW_TT_MOVE> := g >= gamma ? move : ttMove
We consider two options for the new best move in MTD:

LAST: <NEW_BEST_MOVE> := tt.lookup(root).move
SAFE: <NEW_BEST_MOVE> := g >= gamma ? tt.lookup(root).move : bestMove
In the Note it is shown that option SAFE is the safe way to select the best move in MTD(f) in the presence of search inconsistencies, regardless of how the new TT move is implemented (option OLD, NEW or HIGH). With option LAST, in combination with option OLD, the algorithm can blunder in the presence of search inconsistencies: after a fail low, the root TT entry holds the best-scoring move of the failed search, a move that was never proven best, whereas SAFE only updates the best move after a fail high. The experiments in Handling Search Inconsistencies in MTD(f) tentatively confirm this.
2. Experiments
AlphaBetaTT and all options for MTD(f) were implemented in the developing 10x10 draughts engine Max, successor to Maximus (Van Horssen 2012). It has a machine-learned evaluation function (Buro 1997; Letouzey Online a) and a 6-piece endgame database. The search is still sequential; apart from extending forced moves and resolving leaf captures (quiescence), no search extensions or forward pruning are used yet. Move ordering consists of trying the TT move first and sorting the remaining moves with the History Heuristic (Schaeffer 1983), except at shallow depths. In this engine, which uses centipawns as its evaluation unit, MTD(f) makes on average 5 calls to MT per search.
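As an illustration of this move ordering, a sketch assuming a history table indexed by the from- and to-squares of a move (the table, its indexing and its update rule are assumptions, not taken from the engine):

// ttMove (if any) first, the remaining moves in descending order of their
// History Heuristic counters.
void orderMoves(MoveList moves, Move ttMove) {
    for (i = 0; i < moves.size; i++)
        key[i] = (moves[i] == ttMove) ? +INFINITY : history[from(moves[i])][to(moves[i])];
    sortDescending(moves, key);    // sort moves by descending key
}

On a cutoff, MT would then increment the history counter of the move that failed high, e.g. by depth * depth (a common scheme; the increment used in Max is not specified here).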
2.1. Performance of the TT move options
We investigated the performance of options OLD, NEW and HIGH (i.e., their effect on move ordering). A tournament was played between versions of Max using different options; see Table 1. The search was interrupted after 0.1 second per move, and the best move of the last completed iteration was played. No pondering was used.
|                    | vs. OLD               | vs. NEW               | vs. HIGH             | games | score |
|--------------------|-----------------------|-----------------------|----------------------|-------|-------|
| MTD(f) option OLD  |                       | +106 −104 =778, 50.1% | +92 −89 =807, 50.2%  | 1976  | 50.1% |
| MTD(f) option NEW  | +104 −106 =778, 49.9% |                       | +90 −102 =796, 49.4% | 1976  | 49.6% |
| MTD(f) option HIGH | +89 −92 =807, 49.8%   | +102 −90 =796, 50.6%  |                      | 1976  | 50.2% |
Table 1. 10x10 draughts tournament of 988-game matches (the 3-move ballots: 494 3-ply openings, played with both colors) at 0.1 sec/move. We use '+' for wins, '−' for losses and '=' for draws.
We found no significant difference between the three options, except perhaps that option NEW is slightly worse than the other two. This would indicate that storing some TT move is slightly better than storing no TT move when the search fails low.
2.2. Ponder move and TT move options
We also investigated the effectiveness of options OLD and HIGH for providing a ponder move, retrieved from the TT. (Option NEW often provides no ponder move.) This was done by using the ponder move to predict the next move in self-play. The TT was cleared between moves to ensure that no entries from a previous search by the opponent were used. Of course, the percentage of ponder hits is biased in self-play, which is why we compare the results to those of AlphaBetaTT, which always provides a good ponder move. We only examine positions where a ponder move should be available and the move to play is not forced; this gives an average branching factor of about 10.0 for 10x10 draughts. AlphaBetaTT also returns the best move of the last completed iteration. We performed three series of experiments: (1) a fixed nominal search depth of 10 ply per move; (2) 988 games (3-move ballots) at 0.1 sec/move; and (3) 158 games (2-move ballots) at 1 sec/move. The ponder move is retrieved from the TT as sketched below; Table 2 shows the results.
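A minimal sketch of this retrieval, assuming the tt interface of Algorithm 1 (the helper name ponderMove is ours for illustration):

// Look up the position after the best move in the TT; the stored move of
// that entry, if any, is the predicted reply (the ponder move).
Move ponderMove(Board root, Move bestMove) {
    entry = tt.lookup(root.makeMove(bestMove));
    if (entry == null || entry.move == NO_MOVE) return NO_MOVE;   // no ponder move available
    return entry.move;
}

A ponder hit is then counted when the opponent's actual move equals this predicted reply.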
|                    | 10 ply: ponder hit % | 10 ply: no ponder % | 0.1 s: ponder hit % | 0.1 s: no ponder % | 1 s: ponder hit % | 1 s: no ponder % |
|--------------------|----------------------|---------------------|---------------------|--------------------|-------------------|------------------|
| AlphaBetaTT        | 71.3                 | 0.000               | 74.6                | 0.000              | 77.3              | 0.000            |
| MTD(f) option OLD  | 69.7                 | 0.025               | 71.9                | 0.031              | 74.4              | 0.030            |
| MTD(f) option HIGH | 69.1                 | 0.023               | 71.9                | 0.043              | 74.4              | 0.008            |

Table 2. Percentages of ponder hits and of no ponder move in 10x10 draughts self-play. Columns: 10 ply nominal (988 games, > 82,000 moves), 0.1 sec/move (988 games, > 83,000 moves) and 1 sec/move (158 games, > 13,000 moves).
No significant difference was found between the two options, but both differ significantly from AlphaBetaTT. Taking the AlphaBetaTT ponder hit percentage as a baseline of 100%, MTD(f) achieves about 96%. So with MTD(f) we have to settle for a somewhat less reliable ponder move, and in rare cases no ponder move at all.
3. Conclusions
In conclusion:
1. To select the best move (the move to play) we should always use option SAFE. It is the safe way to select a move in the presence of search inconsistencies, which are a fact of life if we use a TT efficiently.
2. For selecting the move to store in the TT there are three viable options: OLD, NEW and HIGH. We found no significant difference in performance between these options.
3. MTD(f) cannot provide the real principal variation, but for practical purposes a semi-PV can be efficiently retrieved from the TT (see the sketch after this list).
4. To select the ponder move, we can look up the position after the best move in the TT. This way, options OLD and HIGH (but not NEW) almost always provide a ponder move. However, the quality of this ponder move cannot quite match that of the ponder move provided by alpha-beta based algorithms.
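As a sketch of conclusions 3 and 4, assuming the tt interface of Algorithm 1 (the helper name semiPV, the MoveList type and the loop bound are ours for illustration):

// Walk the TT from the root, following stored moves; a missing entry or
// missing move ends the walk, so the result is a semi-PV rather than a
// guaranteed principal variation. Its second move is the ponder move of
// Section 2.2.
MoveList semiPV(Board board, int d) {
    pv = new MoveList();
    for (i = 0; i < d; i++) {          // bound the walk by the search depth
        entry = tt.lookup(board);
        if (entry == null || entry.move == NO_MOVE) break;
        pv.add(entry.move);
        board = board.makeMove(entry.move);
    }
    return pv;
}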
References
Buro, M. (1997). Experiments with Multi-ProbCut and a new high-quality evaluation function for Othello. Games in AI Research, 77-96.
Van Horssen, J. J. (2012). Schwarzman vs. MAXIMUS: A Man-Machine Match in International Draughts. ICGA Journal, 35(2), 106-119.
Letouzey, F. (Online a). Scan. World Draughts Forum. http://laatste.info/bb3/viewtopic.php?t=6786.
Schaeffer, J. (1983). The history heuristic. ICCA Journal, 6(3), 16-19.