@AllieTheChessBot progress update and stronger configurations

yimingz3 | 1 Jun 2024 | 10,234 views | English (US)

Challenge new Allie configurations!

What's Allie and how does it work again?

@AllieTheChessBot is a humanlike chess bot that learns to play, think and resign like humans. See our previous post for details.

Is Allie humanlike?

Offline evaluation

How do we evaluate if Allie plays chess like humans? We focus on three aspects: whether Allie plays moves like humans, thinks like humans and resigns like humans, evaluated on a test set of ~20k Lichess blitz games.

Move-matching: Using Maia as a baseline, we compute Allie's move-matching accuracy, defined as how often the model plays the move that the human actually played. Since Maia is a family of models trained for different rating levels, we define an adaptive Maia* model by choosing, for each game, the Maia model whose Elo rating is closest to the players' ratings. Note that a move-matching accuracy of 100% is unattainable, because people play different moves in the same positions.

Across strength levels, Allie outperforms Maia* by an average of 2-5% in matching human moves. Notably, move-matching accuracy increases for both models up to 2300 Elo, and then decreases steadily as rating rises further. Neither model uses search to improve the quality of the moves it plays, which perhaps points towards a need for search to match human moves at stronger levels.
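The move-matching metric and the Maia* selection described above can be sketched in a few lines. This is only an illustration, not our evaluation code: the list of rating levels (1100 to 1900 in steps of 100) and the choice to average the two players' ratings are assumptions made for the example, and the function names are hypothetical.

```python
# Assumed rating levels for the Maia family; treat this list as illustrative.
MAIA_LEVELS = [1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900]


def pick_maia_star(white_elo, black_elo, levels=MAIA_LEVELS):
    """Return the rating level closest to the mean of the two players' ratings.

    Averaging the two ratings is an assumption; the text only says "closest".
    """
    target = (white_elo + black_elo) / 2
    return min(levels, key=lambda level: abs(level - target))


def move_matching_accuracy(model_moves, human_moves):
    """Fraction of positions where the model's move equals the human's move."""
    if len(model_moves) != len(human_moves):
        raise ValueError("need one model move per human move")
    if not human_moves:
        raise ValueError("need at least one position")
    matches = sum(m == h for m, h in zip(model_moves, human_moves))
    return matches / len(human_moves)
```

For example, a game between players rated 1480 and 1540 would be scored against the 1500 level, and any game between players rated above 1900 falls back to the strongest level available.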

Think time: We find a strong correlation between Allie's think time and human think time, with Kendall’s τ = 0.697.
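For readers unfamiliar with Kendall's τ, here is a minimal sketch of the simplest variant (often called τ-a), which counts how many pairs of positions are ordered the same way by both think-time series. Which variant and tie handling the reported number uses is not stated here, so treat this as an assumption; the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value of 1 means the model and the human always agree on which of two positions deserves more time, 0 means no relationship, and -1 means they always disagree.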

Resignation: We look at how often Allie resigns when humans do (true positive rate, TPR) and when they don't (false positive rate, FPR). Allie usually resigns at positions where humans resign (TPR = 86.4%) and virtually never resigns when humans don't (FPR = 0.1%).
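The two resignation rates above follow the usual confusion-matrix definitions. The sketch below assumes one boolean per evaluated position for the model and one for the human; that unit of evaluation, and the function name, are assumptions made for the example.

```python
def resignation_rates(model_resigns, human_resigns):
    """Return (TPR, FPR) for paired boolean sequences.

    TPR = model resigns / positions where the human resigned.
    FPR = model resigns / positions where the human did not resign.
    """
    tp = fp = fn = tn = 0
    for model, human in zip(model_resigns, human_resigns):
        if human and model:
            tp += 1
        elif human and not model:
            fn += 1
        elif not human and model:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes rather than dividing by zero.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr
```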

Online results from 4k online games

Offline evaluation results are promising, but we ultimately want Allie to play like humans online, and humans to enjoy playing against Allie.

In 4k online games, players are randomly assigned one of the following three bots/configurations:

Allie-Adaptive: Allie that tries to adapt to the opponent's strength.
Allie-Strong: Allie that targets 2500-strength gameplay.
Maia*

Both Allie-Strong and Maia* use argmax sampling, meaning they play the top move under the model. Allie-Adaptive instead uses softmax sampling, picking a random move under the model distribution, to introduce variation in moves and (hopefully) blunder just the right amount.
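The difference between the two move-selection modes can be sketched as follows. This is a toy illustration over a handful of candidate moves, not the deployed code; the `temperature` parameter and all function names are hypothetical additions for the example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw model scores into a probability distribution over moves."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - top) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_argmax(moves, probs):
    """Always play the single most likely move (deterministic)."""
    return max(zip(moves, probs), key=lambda pair: pair[1])[0]


def pick_sampled(moves, probs, rng=random):
    """Draw a move at random in proportion to its probability."""
    return rng.choices(moves, weights=probs, k=1)[0]
```

With argmax the bot plays the same move every time it sees a position, while sampling occasionally picks a lower-probability move, which is where both the variety and the occasional blunder come from.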

Allie loses too much against strong players

A perfectly adaptive model would have an expected score of 0.5 against players of all strengths. However, all three bots lose more games to stronger players. Again, it seems like search is a fundamental aspect of strong human chess play, and it's difficult to play like humans at a strong level without search.
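To make "expected score" concrete: a win counts 1, a draw 0.5 and a loss 0, and the score is the average over games. The sketch below groups games by opponent rating so the 0.5 target can be checked per strength band. The bucket width of 200 points and the function name are arbitrary choices for illustration.

```python
from collections import defaultdict


def score_by_rating_bucket(games, bucket_size=200):
    """Average bot score per opponent-rating bucket.

    `games` is an iterable of (opponent_rating, result) pairs, where result
    is 1.0 for a bot win, 0.5 for a draw and 0.0 for a bot loss.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [score sum, game count]
    for rating, result in games:
        bucket = (rating // bucket_size) * bucket_size
        totals[bucket][0] += result
        totals[bucket][1] += 1
    return {bucket: s / n for bucket, (s, n) in sorted(totals.items())}
```

A perfectly adaptive bot would show values near 0.5 in every bucket; a bot that is too weak for strong opponents shows scores falling below 0.5 in the higher buckets.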

Play against Allie with search!

Most chess bots use search, which means looking ahead at potential moves and future positions to evaluate the current position. Existing Allie variants play without any search. We've deployed new variants of Allie with Monte Carlo tree search (MCTS) enabled to make Allie stronger. In offline evaluations, we find that Allie with search consistently plays stronger moves, without a drop in humanlikeness.

Please help improve Allie by playing @AllieTheChessBot. Since we want to test the upper bound of Allie's strength, we would love more games from strong players. There is also an optional postgame survey collecting subjective feedback (e.g., whether Allie plays like a human). For questions, DM @yimingz3!
