@LokiBrot said in #9:
> Very well written article! I'm not pretending to be a chess connoisseur, I started playing in 2020 (as many of us did, I imagine). But I really dug into it back then, and after reading and partly understanding how ratings work, I remember asking myself why FIDE is still using the totally outdated (and outmatched) Elo system while there are far superior (albeit still flawed) systems available.
>
> The job of a rating system like that is (to me) a very simple one: predict the outcome of chess games. And the Elo system has been sub-par at this since far before I started playing.
>
> But here's what I don't quite understand in your article: You are talking about rating deflation and how it is bad. But my naive mind thinks: How is that bad? If everybody is affected by the deflation, everyone's number gets smaller. But that doesn't change anything. It doesn't change how good or bad anyone is, it just changes what the number _means_. How you interpret it. To me, that just sounds like chess players being obsessed with a number, instead of performance. Don't get me wrong, I don't think that is what is actually happening, I'm sure there is something I don't understand, but that is how it sounds to me, an amateur.
Deflation mostly affects players who aspire to an international title, and becomes less of a factor among hobby players. If today's CMs are actually stronger than the FMs (or IMs!) of old, then this dilution is inequitable. Higher-titled players can charge more for lessons and are generally more in demand for creating educational material such as Chessable courses or for teaching at prestigious academies. If this unfairness propagates, soon enough 2000-rated players will have to introduce themselves as "CM strength", but as victims of deflation, not of diminished skill.
Fantastic article, very comprehensive. A lot of food for thought.
Best article ever written on this damn site! The last changes FIDE made barely corrected the damage that was already done. If they don't make some structural improvements to the rating formula, in 10 years we will be in the same spot or worse.
@Vlad_G92 said in #11:
> Deflation mostly affects players who aspire to an international title, and becomes less of a factor among hobby players. If today's CMs are actually stronger than the FMs (or IMs!) of old, then this dilution is inequitable. Higher-titled players can charge more for lessons and are generally more in demand for creating educational material such as Chessable courses or for teaching at prestigious academies. If this unfairness propagates, soon enough 2000-rated players will have to introduce themselves as "CM strength", but as victims of deflation, not of diminished skill.
Thank you for clarifying!
@Sinego_s_progiba said in #3:
> I guess FIDE just doesn't see it as an issue which needs to be fixed.
They may well see it as an issue, but one they have no good way to address:
- There is huge resistance to any change. People like predictability, which is not a feature of the URS, for example.
- FIDE does not want to introduce anything with "hidden variables", as it would reduce transparency. For example, players would find it hard to estimate what result is required for a norm.
So things will not change any time soon. Maximum-likelihood rating algorithms (like the URS) have existed for at least 30 years, and any of them outperforms Elo, Glicko-x, and similar algorithms, which take only the latest evidence into account. So I guess it will take at least another 30 years before they are adopted.
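The maximum-likelihood idea mentioned above can be sketched in a few lines. This is a toy Bradley-Terry-style fit by gradient ascent on an invented mini game set, not the actual URS algorithm: instead of updating ratings game by game, all ratings are fitted at once so that every recorded result is as likely as possible.

```python
# Toy sketch of a maximum-likelihood rating fit (Bradley-Terry style).
# All games and players are invented for illustration; this is not URS.

# (white, black, score for white) for three fictional players
games = [
    (0, 1, 1.0), (0, 1, 1.0), (1, 0, 1.0),  # player 0 leads player 1, 2-1
    (1, 2, 1.0), (1, 2, 1.0), (2, 1, 1.0),  # player 1 leads player 2, 2-1
    (0, 2, 1.0), (0, 2, 1.0), (2, 0, 1.0),  # player 0 leads player 2, 2-1
]
ratings = [1500.0, 1500.0, 1500.0]

def expected(r_a, r_b):
    """Elo-style win expectancy for the player rated r_a."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

for _ in range(2000):                  # gradient ascent on the log-likelihood
    grads = [0.0, 0.0, 0.0]
    for white, black, score in games:
        residual = score - expected(ratings[white], ratings[black])
        grads[white] += residual       # over-performance pushes the rating up
        grads[black] -= residual
    for i in range(3):
        ratings[i] += 5.0 * grads[i]   # small fixed step size

# The fitted ratings reproduce the observed ordering 0 > 1 > 2, and the
# pool total stays at 3 * 1500 because the updates cancel pairwise.
print([round(r) for r in ratings])
```

Real systems use far better numerics and richer modeling (draws, colour, time decay), but the core idea, one global fit over all available evidence, is the same.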
I have been thinking about a system that would be easy to implement and wouldn't change much. The main idea: when a player overperforms, we should use his performance rating, not his actual rating, when calculating the rating changes of his opponents. That way the up-and-coming player gets the rating points he deserves, while his opponents lose rating according to his performance rating. What do you think?
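To make the proposal concrete, here is a toy sketch using the standard Elo formulas. The improver's performance rating of 2100 and K = 20 are invented numbers for illustration, not anything from FIDE's tables.

```python
# Sketch of the proposal: charge the opponent of an overperformer as if he
# had lost to the overperformer's tournament performance rating (TPR).
# The 2100 TPR and K = 20 are assumed values for this example.

def expected_score(rating, opp_rating):
    """Standard Elo win expectancy."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))

def elo_change(rating, opp_rating, score, k=20):
    """Rating points gained (negative: lost) from one game."""
    return k * (score - expected_score(rating, opp_rating))

improver, opponent = 1500, 2000       # an underrated improver beats a 2000

# Standard update: the opponent is charged for losing to a nominal 1500.
standard_loss = elo_change(opponent, improver, 0)

# Proposed update: the opponent is charged as if he had lost to the
# improver's performance rating for the event instead.
performance_rating = 2100
proposed_loss = elo_change(opponent, performance_rating, 0)

print(round(standard_loss, 1))  # → -18.9
print(round(proposed_loss, 1))  # → -7.2, a much smaller penalty
```

Note the asymmetry: the improver would still gain points computed against his opponents' full ratings, so each such game injects points into the pool, which is exactly the anti-deflation effect being aimed at.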
That could lead to more problems. A performance rating is based on only 5-9 games, and at the lower end, where the rise begins, it is 5. It would be a very random number at best. Hard to say, but my mere "mathematical feeling" about the matter is that it would not work. And again it introduces a "hidden variable": how many points are at risk in a tournament is known only after it is over. Not important for the ratings themselves, but important for how acceptable the system is to players.
@petri999 said in #17:
> That could lead to more problems. A performance rating is based on only 5-9 games, and at the lower end, where the rise begins, it is 5. It would be a very random number at best. Hard to say, but my mere "mathematical feeling" about the matter is that it would not work. And again it introduces a "hidden variable": how many points are at risk in a tournament is known only after it is over. Not important for the ratings themselves, but important for how acceptable the system is to players.
I am not sure you read and understood what I propose. It is just a mitigating tool for "regular" players: you can never lose more rating points, and the "improving" player gets the same rating points as before.
Actually, in my mind deflation is the biggest issue, and I'll use the following assumptions for how I think it arises.
1. On average, for every point a player gains, another player loses a point.
2. A new player enters the pool rated around 1500 on average.
3. A player who stops playing chess (through death or other pastimes) is rated around 1900 on average.
From this I would assume the following applies.
1. If a player starts off at 1500 and finishes at 1900, then they have deflated the rating system by 400 points. In general, people enter the rating pool as beginners, gain points against others, and leave the pool as stronger players, taking the points they earned with them.
2. If point gains and losses are matched, then in the long run the average rating should drift toward the average rating of new players entering the system.
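The bookkeeping behind these assumptions fits in a few lines. Elo is (roughly) zero-sum in play, so the pool's total points only change when players enter or leave; all the numbers below (pool size, 1700 starting average, turnover of 10 per generation) are invented for illustration, only the 1500 entry and 1900 exit figures come from the assumptions above.

```python
# Toy bookkeeping for deflation through pool turnover. Games are assumed
# zero-sum, so only entries and exits move the pool total.

ENTRY, EXIT = 1500, 1900               # assumed entry/exit ratings

pool_size = 100
pool_total = pool_size * 1700          # assumed starting average of 1700

averages = []
for _ in range(3):                     # three "generations" of turnover
    pool_total += 10 * ENTRY           # 10 newcomers join at 1500
    pool_size += 10
    pool_total -= 10 * EXIT            # 10 veterans retire at 1900,
    pool_size -= 10                    # carrying their points out
    averages.append(pool_total / pool_size)

print(averages)  # the average drops 40 points per generation
```

Each retiring player removes the 400 points they gained from the rest of the pool, which is exactly the 400-point deflation described in point 1.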
Added to that, you have the issue of rating lag. Should a player be rated 1500 with a playing strength of 2000, then if they play against other players with both a rating and a strength of 2000, everyone is pulled down. It can occur in the reverse direction with a 2000 who has weakened to 1500, but generally underrated players are keener to play rated games than overrated players are.
Ideally it would be nice to have some constants to calibrate ratings against, but unlike computers, I don't think humans play with constant strength.
Generally I've found that many OTB events aren't FIDE-rated, since the cost element usually makes it unappealing. Until recently, only events where people could gain norms were FIDE-rated. Even now, with an effort to get more games FIDE-rated, less than 1/3 of the rated games I played OTB were FIDE-rated.
I'll also add that FIDE measures like handing out extra rating points are pretty much a band-aid on the problem; if the fundamental issues I mentioned above are unchanged, the same situation should repeat itself.
Just an observation: rating gains and losses are not matched point for point. Young and lower-rated players have a bigger K-factor, which means they gain more points than the points lost by their opponents.
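That asymmetry is easy to see with the standard Elo update. Here K = 40 (the junior/new-player factor) meets K = 20; the 1600 ratings are made up for the example.

```python
# A single game between equally rated players with unequal K-factors is
# not zero-sum: the K = 40 winner gains more than the K = 20 loser loses.

def expected(r, opp):
    """Standard Elo win expectancy."""
    return 1 / (1 + 10 ** ((opp - r) / 400))

def update(r, opp, score, k):
    """Rating change for one game with the given K-factor."""
    return k * (score - expected(r, opp))

young, veteran = 1600, 1600

gain = update(young, veteran, 1, k=40)   # the junior wins with K = 40
loss = update(veteran, young, 0, k=20)   # the veteran loses with K = 20

print(gain, loss, gain + loss)  # → 20.0 -10.0 10.0: ten points created
```

So K-factor asymmetry actually injects points into the pool, partially offsetting the deflation from turnover described above.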