Expected Score of Grandmasters based on Evaluation

Perhaps the win expectancy formula could consider player ratings and time pressure.

@Toadofsky said in #2:

> Perhaps the win expectancy formula could consider player ratings and time pressure.

This could be an interesting project, using the Lichess database to get enough games from lower-rated players and at different time controls.

Neat post. I do have some suggestions though.

It would be neat to see what the error bars are for the points you are fitting to. That would also give some insight into which function you should use. Both functions in the post work pretty well for the first 6 points, but not the 7th, so I wonder if maybe the 7th point has more uncertainty? Also, I'm assuming the points are calculated from a histogram, but the centipawn bins don't look the same (there is more horizontal difference between points 6 and 7 than between 5 and 6). If, for example, you have more games on the left side of that last bin, you could be underestimating the win % where the point is shown.
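The error bars asked about here can be estimated straight from the per-bin game counts. A minimal sketch, treating the expected score as a binomial proportion with a Wald standard error (the counts below are made up for illustration; a rare +5 bin would have few games and hence a wide bar):

```python
import math

def win_pct_with_error(wins, draws, losses):
    """Expected score in a centipawn bin, with a Wald standard error.

    Draws count as half a point; treating the expected score as a
    plain proportion is a rough but common approximation.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical sparse +5 bin: only 48 games, so the error bar is wide.
p, se = win_pct_with_error(wins=40, draws=6, losses=2)
print(round(p, 3), round(se, 3))
```

Plotting these as vertical bars on the fit would make it obvious whether the 7th point's deviation is within its own uncertainty.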

Inspiring stuff, and pleasing in terms of methodology as well. Thanks a lot for that!

One more thought: in principle, the use of centipawns is an indirection inherited from the early days of computer chess. Back then, engines could count material (leading to centipawns directly) and started to add functional knowledge bit by bit, converting these insights to centipawns.
However, the function you really want to optimize is score percentage (or in some cases, driven by tournament situation etc., win percentage). LC0, if I get this right, actually uses this approach directly, deriving WDL percentages from learning. It would be quite interesting IMHO to check those results against human outcomes as well - wouldn't it?

This is good; then you could also recalibrate the inaccuracy, mistake, and blunder tagging for GM games.

@jk_182 Great insights! You mentioned that you analyzed around 4,500 games from top tournaments, big open tournaments, and the Chess Olympiad over the past two years.
Did you use game data from Lichess for this analysis (or some other place)? Also, do you know if this data includes the clock time for each move?

@gardnec said in #4:

> Neat post. I do have some suggestions though.
>
> It would be neat to see what the error bars are for the points you are fitting to. That would also give some insight into which function you should use. Both functions in the post work pretty well for the first 6 points, but not the 7th, so I wonder if maybe the 7th point has more uncertainty? Also, I'm assuming the points are calculated from a histogram, but the centipawn bins don't look the same (there is more horizontal difference between points 6 and 7 than between 5 and 6). If, for example, you have more games on the left side of that last bin, you could be underestimating the win % where the point is shown.

I decided to add points for evaluations of 0.5 to 3 in steps of size 0.5 and then add one more point for an advantage of +5. I was unsure how to deal with higher evaluations since they appear less often and I think that the formula should be geared towards the 0-3 range because the win percentage for high evaluations is less interesting. But I'm unsure if this is the best approach.
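The fitting setup described here - six evenly spaced points from +0.5 to +3.0 plus one at +5 - can be sketched with a one-parameter logistic fit. The expected-score values below are hypothetical, and a plain grid search stands in for a proper optimizer (scipy.optimize.curve_fit would do the same job) to keep the sketch dependency-free:

```python
import math

# Hypothetical (evaluation in pawns, expected score) points: 0.5 to 3.0
# in steps of 0.5, plus one point at +5, as described above.
points = [(0.5, 0.58), (1.0, 0.66), (1.5, 0.73), (2.0, 0.79),
          (2.5, 0.84), (3.0, 0.88), (5.0, 0.95)]

def logistic(k, ev):
    # One-parameter logistic mapping evaluation -> expected score.
    return 1.0 / (1.0 + math.exp(-k * ev))

def sse(k):
    # Sum of squared residuals over all fitted points.
    return sum((logistic(k, ev) - p) ** 2 for ev, p in points)

# Grid search over k; an unweighted fit lets the lone +5 point pull
# the curve, which is exactly the weighting question raised above.
best_k = min((k / 1000 for k in range(1, 2001)), key=sse)
print(round(best_k, 3))
```

Down-weighting the +5 point (or weighting each point by its bin's game count) would be one way to gear the formula towards the 0-3 range without dropping high evaluations entirely.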

@Periastron said in #5:

> Inspiring stuff, and pleasing in terms of methodology as well. Thanks a lot for that!
>
> One more thought: in principle, the use of centipawns is an indirection inherited from the early days of computer chess. Back then, engines could count material (leading to centipawns directly) and started to add functional knowledge bit by bit, converting these insights to centipawns.
> However, the function you really want to optimize is score percentage (or in some cases, driven by tournament situation etc., win percentage). LC0, if I get this right, actually uses this approach directly, deriving WDL percentages from learning. It would be quite interesting IMHO to check those results against human outcomes as well - wouldn't it?

Yes, I intend to take a deeper dive into the comparison between the LC0 WDL and how humans do in these positions. For now, I wanted to keep it based on the Stockfish evaluation, since this is still more commonly used, and converting centipawns into an expected score has value because, as you said, centipawn evaluations are far less intuitive than their name would suggest.
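For comparison, Lichess already ships one such centipawn-to-win% mapping for its accuracy metric; a minimal sketch of its shape (the constant 0.00368208 is the one published in the Lichess source - treat it as an assumption here, and note it was fitted across Lichess games generally, not to GM games, which is exactly why a GM-specific refit is interesting):

```python
import math

def lichess_win_percent(centipawns: float) -> float:
    """Logistic centipawn -> win% mapping of the Lichess accuracy metric.

    Assumption: constant taken from the published Lichess source;
    fitted to Lichess games at large, not specifically GM games.
    """
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)

print(round(lichess_win_percent(100), 1))  # +1.00 pawn
print(round(lichess_win_percent(300), 1))  # +3.00 pawns
```

Overlaying this curve on the GM-fitted one would show directly how much flatter or steeper elite conversion rates are.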

@kinglovesqueens said in #7:

> @jk_182 Great insights! You mentioned that you analyzed around 4,500 games from top tournaments, big open tournaments, and the Chess Olympiad over the past two years.
> Did you use game data from Lichess for this analysis (or some other place)? Also, do you know if this data includes the clock time for each move?

I downloaded the games from The Week in Chess, and they didn't include clock times. AFAIK the broadcasts on Lichess save the clock times in the PGN, so one can use these games, but the downloading is a bit more cumbersome.
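Lichess broadcast PGNs carry the remaining time after each move in `[%clk]` comments. A minimal stdlib sketch for extracting them (the PGN fragment below is made up; a library like python-chess can parse these annotations too):

```python
import re

# Hypothetical fragment of a Lichess broadcast PGN; each [%clk]
# comment records the mover's remaining time after the move.
pgn_moves = ('1. e4 { [%clk 1:30:55] } 1... c5 { [%clk 1:30:40] } '
             '2. Nf3 { [%clk 1:29:58] }')

def clock_seconds(pgn: str) -> list:
    """Extract per-move remaining clock times, in seconds."""
    times = []
    for h, m, s in re.findall(r'\[%clk (\d+):(\d+):(\d+)\]', pgn):
        times.append(int(h) * 3600 + int(m) * 60 + int(s))
    return times

print(clock_seconds(pgn_moves))
```

Differencing consecutive times for the same side gives per-move thinking time, which is what a time-pressure-aware win expectancy formula would need.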
