lichess.org

King's Gambit vs. Queen's Gambit: A Statistical Breakdown by Elo

"I was surprised by how potent the Queen's Gambit is across all levels, even though it's a very popular opening. The King's Gambit, on the other hand, showed a more curious story: its effectiveness is low at beginner Elos, peaks for intermediate players, and then drops off at higher levels."

Well, this might just be because when you're a beginner, you see an opening you've never really played and try it because you saw a YouTube video about it. And honestly, the King's Gambit is risky for White: you're opening up the squares around your own king, which makes it easier for a beginner to blunder. Advanced players, meanwhile, have seen plenty of offbeat lines, and the King's Gambit is just one more whose weaknesses they already know how to exploit. But intermediate players hardly ever see that line (I don't think I've ever played against the King's Gambit!). Because it's so rare, they haven't calculated the variations, so whatever traps or tactics the King's Gambit contains, they fall for them, not blundering outright but slowly losing without knowing why!

Well, thanks for the heads up! I'll make sure to actually study the line so that I'm not one of those intermediate players! ;)


TIL surprise = reachability / popularity


An expected Elo gain of 80 seems a little high. I looked at the GitHub repo to see how you are calculating this metric. This is what you wrote:

Reachability ("If Wants %")
This metric calculates the probability of reaching a position assuming one player actively tries to steer the game towards it, while their opponent's moves follow the database's overall frequencies. It answers, "How often can I realistically get this position on the board?"

Expected Value (EV)
A metric to judge a position's statistical value, calculated from the win/draw/loss percentages. A positive EV favors White; a negative EV favors Black.

Expected ELO Gain / 100 Games

This is the ultimate metric for practicality. It combines a move's reachability with its statistical impact (ΔEV) to estimate the concrete rating point gain you could expect over 100 games by learning and playing this opportunity. It is calculated as: Reachability % * |ΔEV| * ELO_Factor
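If I'm reading the Reachability definition right, it's essentially a product of the opponent's database move frequencies along the line, since the steering player's own moves are taken as certain. A minimal Python sketch of that reading (the function name, data layout, and frequencies are my own illustration, not from the repo):

```python
def reachability(line, db_freq):
    """Probability of reaching the position at the end of `line`, assuming
    I always play my own moves (probability 1) while my opponent's moves
    follow the database frequencies in `db_freq`."""
    prob = 1.0
    for move, is_my_move in line:
        if not is_my_move:
            prob *= db_freq[move]  # opponent move: weight by database frequency
    return prob

# Illustrative (made-up) frequencies: the opponent answers 1.e4 with 1...e5
# half the time and accepts 2.f4 with 2...exf4 60% of the time.
freqs = {"e5": 0.5, "exf4": 0.6}
king_gambit_accepted = [("e4", True), ("e5", False), ("f4", True), ("exf4", False)]
print(reachability(king_gambit_accepted, freqs))  # 0.5 * 0.6 = 0.3
```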

When you judge a position's statistical value from win/draw/loss percentages, are you taking into account the Elo ratings of the two sides? There was a discussion about this some time ago.

For instance, 1.Nf3 seems to score well according to the Lichess database, if you look purely at win/draw/loss percentages. But that seems to be because 1.Nf3 is more often played by strong White players against weaker Black opposition in the Lichess database.

When you take the rating differential into account, or look only at games played between players of similar rating, the advantage of 1.Nf3 disappears.

I'll link to the blog article that was posted on Lichess about this if I can find it.


Cool graphs! I wonder if the uptick in the King's Gambit Elo gain at 2500+ rating could be due to low sample size. Hard to say without error bars and sample counts.

Also, like another commenter mentioned, I wonder if the analysis is restricted to games between similarly rated opponents, and what time controls are taken into account. I'd expect 2500 Elo versus 1500 Elo bullet games to be very different from 1500 Elo vs 1500 Elo rapid games.


@cyqsimon said in #3:

TIL surprise = reachability / popularity

This is a custom metric I came up with for this project to try and quantify the value of preparation.

The intuition is this: if you play an uncommon move early on, you take most of your opponents out of their comfort zone and into your specific preparation. At first, I considered making the metric a ratio of reachability from one move to the next, but that had some pitfalls.
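As a toy illustration of the intuition (the function name and numbers are mine, not the project's API):

```python
def surprise(reachability: float, popularity: float) -> float:
    """Surprise = reachability / popularity: a move that is rarely played
    (low popularity) but easy to steer into (high reachability) scores high,
    which is exactly where preparation pays off."""
    if popularity <= 0:
        raise ValueError("popularity must be positive")
    return reachability / popularity

# A line reachable in 20% of games but actually played in only 2% of them
# is far more "surprising" than one reachable in 40% and played in 30%.
print(surprise(0.2, 0.02))  # high surprise
print(surprise(0.4, 0.30))  # low surprise
```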


@AnlamK said in #4:

...

Hello! Thank you for the high-value feedback!

  1. On the "Expected Elo Gain" metric:

There is a difference in the calculation depending on the mode:

  • In the hunt and line modes, the goal is to find high-impact moves. The formula incorporates reachability to discount the value of lines that are hard to get on the board.
  • In the plot mode, the goal is to show the raw performance of a line if you get it. The formula is simpler: (Win % - Loss %) * Elo_Factor * 100, which gives the expected Elo change after 100 games.

I will clarify this in the README.

This metric does not predict that you will gain that much Elo. It simply measures how the average player at that level performs with that opening. To achieve those results, you'd need to match their proficiency, and as you gain Elo you'd face stronger opponents, which changes the calculation.
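To make the distinction between the two modes concrete, here is a rough Python sketch of the formulas (`ELO_FACTOR` is a placeholder constant here, not the value actually used in the repo):

```python
ELO_FACTOR = 4.0  # placeholder scaling constant; see the repo for the real value

def plot_mode_gain(win_pct: float, loss_pct: float) -> float:
    """Plot mode: raw performance of a line if you reach it,
    (Win % - Loss %) * Elo_Factor * 100, i.e. expected Elo change over 100 games."""
    return (win_pct - loss_pct) * ELO_FACTOR * 100

def hunt_mode_gain(reachability: float, delta_ev: float) -> float:
    """Hunt/line modes: the same statistical impact |ΔEV|, discounted by
    how often you can actually steer the game into the line."""
    return reachability * abs(delta_ev) * ELO_FACTOR
```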

  2. On the Opening Explorer Data Bias:

The blog post you linked is excellent.
Unfortunately, since my tool uses the same Lichess API as the website's explorer, it inherits the same potential biases.

The core problem, as the author points out, is that stronger players might self-select into certain openings, skewing the stats for a given Elo bucket.

The author's proposed fix is interesting, but it seems to introduce a different kind of bias (and I can't implement it with the opening API alone; the code would need to download the entire list of games). I've considered trying to "correct" the data using popularity stats, but that feels like a risky way of manipulating the data, where I might have to hand-tune correction factors.

For now, I'm moving forward with the current method because it is at least transparent and consistent with the source data on Lichess. It provides a broad picture for comparing openings, even with this known caveat. I have to say that I was surprised by how much the win rates can change depending on how the games are grouped.

I wish the opening API had an option to filter out games where the opponents were more than, say, 50 rating points apart.


@VeganOpenings said in #6:

Cool graphs! I wonder if the uptick in the King's Gambit Elo gain at 2500+ rating could be due to low sample size. Hard to say without error bars and sample counts.

Also, like another commenter mentioned, I wonder if the analysis is restricted to games between similarly rated opponents, and what time controls are taken into account. I'd expect 2500 Elo versus 1500 Elo bullet games to be very different from 1500 Elo vs 1500 Elo rapid games.

If you look at the popularity graph, you'll see that King's Gambit occurrence is very low at 2500+. I didn't run a statistical significance test on that data point, but I should have. It's likely that these games were played by:

  • Hyper-specialists in the opening
  • Players with a much higher rating than their opponents (or "smurfs") who were confident they could crush the game in a rich tactical position

So I would not conclude that the King's Gambit becomes strong again at very high Elo.


Finally. Some useful ducking statistics.
