
Why Opening Statistics Are Wrong

@D2D4C2C4 said in #17:

It turns out that using the subsample is enough to remove the bias of the explorer statistics, and running my proposed methodology on the subsample yields very marginal improvement (at least when it comes to the first move by White and Black; I'm now looking into the second moves). I believe that this is because the subsample does not include blitz games played in tournaments. This results in fewer "type-A" players entering the sample and distorting the statistics.

Wow, I didn't anticipate that. Seems like regular matchmaking pool games control for rating automatically. It feels like a pretty substantial and easy improvement to make to the analysis board, a checkbox for "matchmaking games only" shouldn't be too hard since "Event" is already a field in the database.

Nice work!
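A "matchmaking games only" checkbox could be backed by a one-line predicate on the game headers. A minimal sketch in Python, assuming the Event header of a Lichess PGN dump reads e.g. "Rated Blitz game" for pool games and mentions "tournament" for arena games (the exact labels here are an assumption, not a confirmed database format):

```python
def is_pool_game(headers):
    """Return True for regular matchmaking-pool games, False for
    tournament games, judging only by the Event header (assumed format)."""
    return "tournament" not in headers.get("Event", "").lower()

# Illustrative headers, not real database rows:
print(is_pool_game({"Event": "Rated Blitz game"}))        # pool game
print(is_pool_game({"Event": "Rated Blitz tournament"}))  # arena game
```

Running the explorer tally only over games passing this predicate would implement the subsample discussed above.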


Initially I had wondered the same thing as this blog post, but quickly assumed that only I had this concern and it would have at most a marginal effect.

Upon further reflection, maybe this effect is why I observe a larger than expected disparity between the master games opening explorer and the Lichess games opening explorer. Interesting...


This is an incredibly interesting post, especially as someone who has dealt extensively with opening data from Lichess (I run chessbook.com). I'm going to use the proposed method here in our next iteration of our openings database.

Thank you for writing this up, and especially for validating it with real data; I was inclined to wave it off as a small effect, but the numbers are hugely different. One small tweak to add might be to use only games where the rating difference was small enough, since you can't really attribute the win-rate difference to the opening if someone rated 2000 plays a 1200-rated player in an open tournament.
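That tweak amounts to a small predicate on the two Elo headers. A sketch using the standard WhiteElo/BlackElo PGN tags, with a hypothetical max_diff cutoff:

```python
def within_rating_band(headers, max_diff=100):
    """Keep a game only when the players' ratings differ by at most
    max_diff points (100 is just an illustrative cutoff)."""
    try:
        white = int(headers["WhiteElo"])
        black = int(headers["BlackElo"])
    except (KeyError, ValueError):
        return False  # drop games with missing or non-numeric ratings
    return abs(white - black) <= max_diff
```

The 2000-vs-1200 pairing from the example above would be filtered out, while closely matched pool games survive.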


I find it very interesting that the 2000-2200 black players have a significantly greater than 50% win rate with Black. This probably means that on average they play more games against weaker opposition (we'd expect this, as the rating distribution peaks below 2000).
Something else to consider - could you try where both players are in a certain rating range?


@marcusbuffett said in #23:

This is an incredibly interesting post, especially as someone who has dealt extensively with opening data from Lichess (I run chessbook.com). I'm going to use the proposed method here in our next iteration of our openings database.

Thank you for writing this up, and especially for validating it with real data; I was inclined to wave it off as a small effect, but the numbers are hugely different. One small tweak to add might be to use only games where the rating difference was small enough, since you can't really attribute the win-rate difference to the opening if someone rated 2000 plays a 1200-rated player in an open tournament.

I like Chessbook! I have used it to construct repertoires, and it makes the process easy and fun. I'd be glad if my idea turns out to be useful. I assume that running this analysis on the whole sample is feasible for you, so I'd be curious to see whether the picture is similar in the full dataset. I used only the January 2017 sample because I'm running this on a mediocre laptop and my internet speed is 3 Mbps, lol. Some preliminary analysis I'm running on the subsample (mentioned in the comments) seems to suggest that my methodology matters less when big rating differences are less common. Anyway, keep us updated!


@D2D4C2C4 said in #20:

Bigger is better, but the difference matters too, because I may be willing to make some modification in my opening repertoire to chase a +5% advantage, but a +1% advantage is not worth the effort (for me at least).

Yes, my point is that the gap between a good move and a bad one will remain big regardless of whether the 8% of games biasing results toward a 50% win rate are included or excluded.

Another form of bias comes from games where a player decides to berserk, halving their clock time, which also pushes results toward 50-50.


@Simpo137 said in #24:

I find it very interesting that the 2000-2200 black players have a significantly greater than 50% win rate with Black. This probably means that on average they play more games against weaker opposition (we'd expect this, as the rating distribution peaks below 2000).
Something else to consider - could you try where both players are in a certain rating range?

As expected, win rates for Black go down significantly (towards 48%) when I restrict the opponents' ratings to also be within this same 2000-2200 range, but the difference between the relative effectiveness of various moves does not change much (one exception: 1...d5 in response to 1.d4 looks better after this adjustment). I need to think more about this and about what type of rating adjustment should be done. Should we keep only opponents who are in the same range, or opponents within a fixed distance of the individual player's rating? The more I look at the data, the more I tend to agree that some restrictions on rating differences would be useful too.
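The "both players in the range" restriction amounts to one extra filter before the tally. A sketch of that tally over hypothetical per-game records (the field names are made up for illustration, not taken from any real schema):

```python
from collections import Counter

def black_score_by_move(games, lo=2000, hi=2200):
    """Average score for Black per first reply, counting only games
    where BOTH players' ratings fall inside [lo, hi]."""
    points, counts = Counter(), Counter()
    for g in games:
        # The restriction discussed above: skip games unless both
        # players are inside the rating band.
        if not (lo <= g["white_elo"] <= hi and lo <= g["black_elo"] <= hi):
            continue
        move = g["black_move1"]
        counts[move] += 1
        points[move] += {"0-1": 1.0, "1/2-1/2": 0.5, "1-0": 0.0}[g["result"]]
    return {m: points[m] / counts[m] for m in counts}
```

Swapping the band check for an `abs(white - black) <= d` condition would give the per-player variant mentioned at the end of the comment.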


@Awesome-Days said in #26:

Yes, my point is that the gap between a good move and a bad one will remain big regardless of whether the 8% of games biasing results toward a 50% win rate are included or excluded.

Another form of bias comes from games where a player decides to berserk, halving their clock time, which also pushes results toward 50-50.

Good point, berserk games are indeed another reason in favor of excluding tournament games.

The gaps between good and bad moves are generally huge, but this is more about the gaps between the many good moves that players can choose among in the very first moves of the game. It's a comparison between e4, d4, c4, Nf3, and f4 for White, and the top 5 responses to e4 and d4 for Black. The gaps between the good moves are those that may change depending on the calculation methodology used.


Thanks for the response @D2D4C2C4. It depends on what the person looking at the data wants, but maybe ±100. Another feature (on Chess Tempo) is to compare a move's average performance rating against the average rating. That is probably best.


Funny I never really used the opening explorer that way (still reading), but now I might try it.

I have been using it during correspondence games, to learn: to see the positions without getting distracted by being bombarded with names, and without being restricted by my ability (or lack thereof) to memorize moves, looking instead at what is on the board for its mechanistic logic.

And I would try to learn from the positions themselves among the popularity statistics, but not in my rating range; just above it, as I seek the struggle near the edge of a win for myself. I prefer chess study to chess performance. (Not that such a clean split exists; the dichotomy is about what motivates us, since over the long time scales of learning the two are entangled, a question of proportions and fluctuations.)

But I also have the idea that always using the ceiling as a measure of progress might be a saturated, uninformative way of aiming while learning, compared to aiming from our own levels. Going to read on for the actual point of the blog about the methodology.

I just found this difference of usage surprising, new information to me. As there are many walks of chess on Lichess and in chess generally, I get to discover fast-chess habits.
