
Rating Your Moves: Guess the Elo, Move by Move


When a chess game ends, the outcome is simple: your rating goes up or down. The thoughtful buildup, rushed decisions, small inaccuracies, and big blunders are all condensed into one number. However, chess players know that strength shows itself long before checkmate or resignation. The question is whether an AI could spot that strength, move by move.

That’s precisely the question that Jim Yang, a graduate student at Columbia University, is looking to answer with his project Rating Your Moves. This work earned him one of the 2025 Chessable Research Awards. By analyzing millions of online games, the model treats every move as evidence of the Elo hidden in a single decision. As the game progresses, the model’s estimate sharpens, revealing patterns of play, control of the board, and other signals of the player’s strength. Ultimately, chess serves as a way for AI to understand human decision-making. In this guest post, Yang provides a compelling overview of these efforts and how they could be used in the future to help chess improvement.


Rating Your Moves: Guess the Elo, Move by Move

“The mistakes are all there on the board, waiting to be made.” – Savielly Tartakower

When we play online chess, we usually only see our rating at the end of the game: a number goes up or down, and we move on. The Rating Your Moves project turns that idea on its head. Instead of asking, “What was the result?”, it asks:

If an AI watched your game move by move, how quickly could it figure out how strong you are?

To answer this, we take millions of online blitz games, turn every half-move (ply) into a structured representation a neural network can understand, and train models to guess a player’s rating band from that stream of moves. As the game progresses, the model updates a whole probability distribution over rating levels, like a commentator who becomes more and more certain about who’s sitting at the board.

Learning from many games

“Chess is the art of analysis.” – Mikhail Botvinnik

Earlier work on large online databases has already shown that you can get surprisingly far just by looking at what moves people play, game after game. When models take entire games as input, they can correctly predict a player’s rating band a significant fraction of the time, and they’re usually not far off even when they miss. Performance improves sharply as the model sees more games per player, then flattens out: more data leads to more learning, but in a distinctly non-linear way.

These studies also show that different kinds of information each contribute their own “slice” of predictability. Features related to chess and strategy (material balance, king safety, central control), time usage (time trouble, time spent on move), and engine-based move quality (an engine’s opinion on the move) all add something, but in different ways. No single metric fully explains playing strength; it’s the combination of these factors that reveals the fingerprint.

We build on those insights, but zoom in from whole games to individual decisions. Rather than asking “How strong is this player overall?”, the focus is: “How much rating information is hidden in this move, on this board, right now?”

Thinking like Maia

“Every chess master was once a beginner.” – Irving Chernev

The most famous “human-like” chess engines today come from the Maia project. The original Maia models took the AlphaZero / Leela style of neural network and trained it purely on human games, with one separate network for each rating band, from 1100 up to 1900. Each model tried to answer the question: “Given this position and this rating level, what move would a typical player at that level play?” Maia-2 uses a unified model that works across all skill levels at once. A key innovation is a “skill-aware attention” mechanism that explicitly combines the encoded board position with information about the player’s strength, so the model can smoothly track how style and accuracy change as people improve. Maia and Maia-2 answer: “What would a human of this strength play here?” Yet, consider the complementary question: “Given these human moves, what strength is most likely behind them?”

Where Maia focuses on imitating human moves, we focus on inferring human strength. Both are examples of human-aligned AI in chess, but they look at the board from opposite ends.

From online archives to rating bands

“You may learn much more from a game you lose than from a game you win.” – José Raúl Capablanca

Under the hood, this research draws on extensive online chess databases that contain tens of millions of games. For the current experiments, the project concentrates on 5+0 blitz games from a recent window of time, with players between roughly 400 and 2400 Elo.

The raw PGN files are processed into a clean dataset: only standard rated games with sensible ratings are kept, and obvious oddities are filtered out. Because online ratings cluster heavily in the middle, the project builds a balanced dataset for training and validation, with roughly equal numbers of games in each rating band, alongside a “real-world” test set that keeps the naturally skewed distribution. That way, models are not rewarded for always guessing “1400,” but they are still evaluated against a realistic rating distribution.
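To make the binning and balancing steps concrete, here is a minimal Python sketch. The 200-point band width, the field names, and the helper names are illustrative assumptions, not the project's actual code:

```python
import random

# Hypothetical 200-point rating bands spanning the 400-2400 range studied here.
BAND_WIDTH = 200
MIN_ELO, MAX_ELO = 400, 2400

def elo_to_band(elo: float) -> int:
    """Map an Elo rating to a band index (0 = 400-599, 1 = 600-799, ...)."""
    clamped = max(MIN_ELO, min(elo, MAX_ELO - 1))
    return int((clamped - MIN_ELO) // BAND_WIDTH)

def balance_by_band(games, seed=0):
    """Downsample so every rating band contributes equally many games."""
    rng = random.Random(seed)
    by_band = {}
    for game in games:
        by_band.setdefault(elo_to_band(game["avg_elo"]), []).append(game)
    n = min(len(band_games) for band_games in by_band.values())
    balanced = []
    for band_games in by_band.values():
        balanced.extend(rng.sample(band_games, n))
    return balanced
```

The skewed test set would simply skip the `balance_by_band` step.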

Each remaining game is then exploded into a sequence of ply-level records: one data point per half-move, tagged with the average rating of the two players. That’s the raw material from which we teach an AI to “listen” to your moves.
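The “explosion” step can be sketched as follows. The input format here (a list of SAN move strings plus the two ratings) and every name are hypothetical simplifications; the real pipeline parses PGN:

```python
def explode_game(game_id, moves, white_elo, black_elo):
    """Produce one record per half-move (ply), tagged with the average rating."""
    avg_elo = (white_elo + black_elo) / 2
    records = []
    for ply, san in enumerate(moves, start=1):
        records.append({
            "game_id": game_id,
            "ply": ply,                       # half-move number, 1-based
            "side": "w" if ply % 2 else "b",  # odd plies are White's moves
            "san": san,
            "avg_elo": avg_elo,
        })
    return records
```

A 60-move blitz game thus yields roughly 120 training points, each carrying the same rating label but a different position.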

Sixty-four layers of board vision

“Pawns are the soul of chess.” – François-André Danican Philidor

To us, a position is a single 8×8 board. To our models, it looks more like a stack of 64 transparent boards, each one highlighting a different aspect of the position and the move just played.

Some of those layers mark where pieces stand: one board for White pawns, another for White knights, and so on, mirrored for Black. Others capture something closer to board vision: squares a piece could reach, squares controlled by a certain piece, and global summaries such as “all squares White controls” or “net control” showing which side dominates which squares.

A few critical layers encode the change created by the current move. One layer lights up the from-square, another the to-square; others show how overall control patterns look after the move, side by side with the patterns from before. If a pawn storm suddenly opens lines against a king, or a quiet prophylactic move shuts down an attack, these layers make that visible to the network. Put together, the 64 layers form a high-dimensional X-ray of the position: material, square control, and the shockwave of each decision.
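To make the layer idea concrete, here is a toy encoder that builds a handful of binary 8×8 planes: a few piece-placement planes plus the from-square and to-square planes. The board format (a dict from (rank, file) to piece codes) and the tiny piece subset are illustrative assumptions; the real encoder produces all 64 layers, including the control planes:

```python
# Toy plane encoder: a subset of piece-placement planes, plus the two
# move planes. Piece codes like "wP" (White pawn) are a made-up convention.
PIECES = ["wP", "wN", "bP", "bN"]  # small subset for illustration

def empty_plane():
    return [[0] * 8 for _ in range(8)]

def encode(board, move_from, move_to):
    """Stack binary 8x8 planes for a position and the move just played."""
    planes = []
    for code in PIECES:                  # one placement plane per piece code
        plane = empty_plane()
        for (rank, file), piece in board.items():
            if piece == code:
                plane[rank][file] = 1
        planes.append(plane)
    for square in (move_from, move_to):  # from-square and to-square planes
        plane = empty_plane()
        plane[square[0]][square[1]] = 1
        planes.append(plane)
    return planes                        # shape: (len(PIECES) + 2) x 8 x 8
```

The “before vs. after” control planes described above would be built the same way, by recomputing control maps on the position before and after the move.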

Watching the AI watch your game

“On the chessboard, lies and hypocrisy do not survive long.” – Emanuel Lasker

Once each move has been turned into a 64-layer snapshot, we train a family of neural networks, some simple, some more sophisticated, to recognize patterns in these snapshots and associate them with rating bands. For each move, the models output a probability distribution over rating levels: perhaps a broad spread early on, then a sharper peak as the game progresses.
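However the network is built, its per-move output is a probability distribution over rating bands, and a standard softmax over raw scores (logits) produces exactly that. This is a generic sketch of the output layer, not the project's specific architecture:

```python
import math

def softmax(logits):
    """Turn raw per-band scores into a probability distribution."""
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Early in the game, the logits tend to be close together and the distribution broad; confident predictions correspond to one logit dominating the rest.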

The really fun part comes when these predictions are stitched together. Instead of just looking at the final “best guess,” we apply simple Bayesian updates: each new move is treated as a piece of evidence that shifts the model’s belief about the player’s strength. If several moves in a row look very typical of 1700s, the distribution drifts toward that band; if later decisions look more like novice blunders or master-level precision, the belief moves accordingly.
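The stitching step can be sketched as a sequence of Bayesian updates over a discrete set of rating bands: start from a flat prior, multiply in each move's distribution as a likelihood, and renormalize. The function names and the three-band example below are illustrative:

```python
def bayes_update(prior, likelihood):
    """One Bayesian step: multiply the prior by the per-move likelihood
    over rating bands, then renormalize so the belief sums to 1."""
    posterior = [p * l for p, l in zip(prior, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]

def track_rating(per_move_distributions, n_bands):
    """Start from a flat prior and fold in one distribution per move,
    returning the full history of beliefs (one snapshot per move)."""
    belief = [1.0 / n_bands] * n_bands
    history = [belief]
    for dist in per_move_distributions:
        belief = bayes_update(belief, dist)
        history.append(belief)
    return history
```

For example, three moves in a row that each mildly favor the middle of three bands (say, 0.2 / 0.6 / 0.2) compound into a belief of over 90% on that band, which is exactly the "drifting" behavior described above.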

This process can be turned into a short animated GIF. On top, the game plays out on a normal board. On the bottom, smooth curves show the current rating distribution, changing with every move. At the start, the curves are flat and uncertain. After a few opening moves, they begin to favour a range. A burst of accurate tactics might send them climbing; a sudden collapse in time trouble might drag them back down. By the end, there’s usually a narrow peak near the true rating.

Beyond the board – time, engines, and style

“The beauty of a move lies not in its appearance, but in the thought behind it.” – Siegbert Tarrasch

The current pipeline deliberately focuses on board-only information: pieces, square control, and the pattern of moves. That makes it easier to understand how much rating information is contained in the structure of decisions alone.

But earlier work already suggests that time usage and engine-measured move quality each add their own, independent slices of information. How long a player thinks in quiet vs. sharp positions, how often they burn their clock in already-lost positions, how frequently their moves swing the engine’s evaluation; these all tell us something about style, risk appetite, and practical strength that the board position alone might not.

Future versions of the study will add those channels back in, alongside the 64 board layers, to ask richer questions: Are there players whose positions look stronger than their rating, but whose time management drags them down? Are some styles, such as wild attackers or ultra-solid grinders, harder for the model to “rate” fairly? How does the information from the board interact with the information from the clock?

The long-term goal is not just a clever rating predictor. It’s a set of tools that can tell you, in concrete terms, what your moves say about you: where you consistently overperform your rating, where you fall short under pressure, and how your decisions change as you improve.

In that sense, we fit neatly into the same landscape as Maia-2: both projects treat chess as a laboratory for understanding human decision-making, and both point toward AI partners that don’t just play strong moves, but understand the humans on the other side of the board.

References

Tang, Z., McIlroy-Young, R., Sen, S., Kleinberg, J., & Anderson, A. (2024). Maia-2: A unified model for human–AI alignment in chess. In Advances in Neural Information Processing Systems (NeurIPS 2024).

Hamade, K., McIlroy-Young, R., Sen, S., Kleinberg, J., & Anderson, A. (2024). Designing skill-compatible AI: Methodologies and frameworks in chess. In Proceedings of the International Conference on Learning Representations (ICLR 2024).

McIlroy-Young, R., Sen, S., Kleinberg, J., & Anderson, A. (2020). Aligning superhuman AI with human behavior: Chess as a model system. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1677–1687).

McIlroy-Young, R., Wang, R., Sen, S., Kleinberg, J., & Anderson, A. (2021). Detecting individual decision-making style: Exploring behavioral stylometry in chess. In Advances in Neural Information Processing Systems, 34.

McIlroy-Young, R., Wang, R., Sen, S., Kleinberg, J., & Anderson, A. (2022). Learning models of individual behavior in chess. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22) (pp. 1153–1163). https://doi.org/10.1145/3534678.3539367

McIlroy-Young, R., Kleinberg, J., Sen, S., Barocas, S., & Anderson, A. (2022). Mimetic models: Ethical implications of AI that acts like you. In Proceedings of the 5th AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES).

Backus, M., Blake, T., Larsen, B., & Tadelis, S. (2016). Sequential bargaining in the field: The pricing of players in online games. Unpublished manuscript.

Bendre, S., Maharaj, S., Polson, N., & Sokolov, V. (2023). On the probability of Magnus Carlsen reaching 2900. Applied Stochastic Models in Business and Industry, 39(3), 372–381. https://doi.org/10.1002/asmb.2745

Carow, A., & Witzig, M. (2024). Time pressure and strategic risk-taking in professional chess (Gutenberg School of Management and Economics Working Paper No. 2404). Johannes Gutenberg University Mainz.

Dilmaghani, M. (2021). The gender gap in competitive chess across countries. Women’s Studies International Forum, 87, 102493.

Gupta, A., Maharaj, S., Polson, N., & Sokolov, V. (2023). On the value of chess squares. Entropy, 25(10), 1374. https://doi.org/10.3390/e25101374

Künn, S., Seel, C., & Zegners, D. (2022). Cognitive performance in remote work: Evidence from professional chess. Economic Journal, 132(643), 1218–1255.

Levitt, S. D., List, J. A., & Sadoff, S. (2011). Checkmate: Exploring backward induction among chess players. American Economic Review, 101(2), 625–631.

Maharaj, S., Polson, N., & Sokolov, V. (2022). Chess AI: Competing paradigms for machine intelligence. arXiv preprint arXiv:2109.08149.

Maharaj, S., Polson, N., & Turk, C. (2022). Gambits: Theory and evidence. Applied Stochastic Models in Business and Industry, 38(3), 429–446.

Maharaj, S., Polson, N., & Sokolov, V. (2023). Kramnik vs. Nakamura or Bayes vs. p-value. SSRN Working Paper No. 4648621.

Regan, K. W., & Biswas, T. (2013). Psychometric modeling of decision making via game play. In Proceedings of the IEEE Conference on Computational Intelligence in Games (CIG).

Regan, K. W., Di Fatta, G., & Haworth, G. (2009). Skill rating by Bayesian inference. In 2009 IEEE Symposium on Computational Intelligence and Data Mining (pp. 89–94).

Regan, K. W., & Haworth, G. (2011). Intrinsic chess ratings. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI-11).

Regan, K. W., Biswas, T., & Zhou, J. (2014). Human and computer preferences at chess. In Proceedings of the 8th Multidisciplinary Workshop on Advances in Preference Handling (MPREF).

Regan, K. W., Macieja, B., & Haworth, G. (2011). Understanding distributions of chess performances. In H. J. van den Herik et al. (Eds.), Advances in Computer Games 13 (pp. 116–130).

Salant, Y. (2011). Complexity and choice. Quarterly Journal of Economics, 126(3), 1459–1492.

Salant, Y., & Spenkuch, J. L. (Forthcoming). Complexity and satisficing: Theory with evidence from chess. Review of Economic Studies.

Salant, Y., Spenkuch, J. L., & Almog, D. (n.d.). The memory premium (Working paper).

Smerdon, D., Hu, A., McLennan, A., & von Hippel, W. (2020). Female chess players show typical stereotype-threat effects in a field experiment. Psychological Science, 31(10), 1193–1204. https://doi.org/10.1177/0956797620924051

Smerdon, D., Meyer, C. B., Reizniece-Ozola, D., Rodrigo-Yanguas, M., & Sorokina, A. (2023). Report: 2023 FIDE Gender Equality in Chess Index (GECI). FIDE & The University of Queensland. https://doi.org/10.14264/9bb291f
