Lichess ratings are not Glicko-2

@MFXX said in #10:

Is it because players' ratings are not calculated in batches assumed to be simultaneous, as Glickman's paper suggests? In that case, couldn't one argue that it is still Glicko-2, but with rating periods on the order of a single second, or a single server tick?

  1. Yes...
  2. Are multiple games for a single player rated each second, with those games being treated as if played concurrently?

I think Glickman's observation holds true of any rating system where upsets are possible. A new player who wins their first game against a world champion then loses every other game disrupts ratings (slows convergence) more if games are rated sequentially rather than concurrently.

@MFXX said in #10:

A (hopefully interesting) note to add to the discussion: Mark Glickman once told me that one disadvantage of updating ratings after each game, instead of using rating periods of several games, is that it decreases the algorithm's stability over time, as some of the approximations made in deriving the formulas converge faster with a few games per player per rating period.
That's true in general, but there are some corner cases where it actually works in the opposite direction. E.g. when a significantly underrated player (possibly with high RD/volatility) suddenly plays a larger batch of games, a batch-based system can amplify the resulting jump and overshoot the actual playing strength. A system updating the rating after each game (but otherwise based on the same formula) would have a moderating (stabilizing) effect here because, as the rating grows, the update for each game becomes smaller.

This problem may be even more pronounced with longer rating periods. E.g. our national classical rating has a rating period of 4 months and, unlike FIDE, it does not have the rule that long-running competitions (spanning multiple periods) have to submit results for each period. Thus the updates from a whole season running from October to March/April are based on the rating as of October 1st.
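
The corner case described above can be sketched numerically. This is only an illustration under stated assumptions: it uses the simpler Glicko-1 update (no volatility term) rather than Glicko-2 or whatever Lichess actually runs, and the player and opponent numbers (1500 with RD 200, ten wins against opponents rated 1700 with RD 50) are made up.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Attenuation factor for an opponent's rating deviation."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score against a single opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko1_update(r, rd, games):
    """Rate one period; games is a list of (opp_rating, opp_rd, score)."""
    inv_d2 = 0.0  # accumulates 1/d^2
    delta = 0.0   # accumulates g * (score - expected)
    for r_opp, rd_opp, score in games:
        e = expected(r, r_opp, rd_opp)
        inv_d2 += Q**2 * g(rd_opp) ** 2 * e * (1 - e)
        delta += g(rd_opp) * (score - e)
    precision = 1 / rd**2 + inv_d2
    return r + Q / precision * delta, math.sqrt(1 / precision)


# Hypothetical underrated player: ten straight wins over 1700-rated opponents.
games = [(1700, 50, 1.0)] * 10

# Batch: every expected score is computed from the pre-period rating.
batch_r, batch_rd = glicko1_update(1500, 200, games)

# Sequential: the rating and RD are refreshed after every game.
seq_r, seq_rd = 1500, 200
for game in games:
    seq_r, seq_rd = glicko1_update(seq_r, seq_rd, [game])

print(f"batch:      {batch_r:.0f} (RD {batch_rd:.0f})")
print(f"sequential: {seq_r:.0f} (RD {seq_rd:.0f})")
```

With these inputs the batch update produces the larger jump, because all ten wins are scored as upsets against the starting rating, whereas in the sequential run each win counts for less as the rating climbs.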

TrUe GlICkO-2 HaS nEvEr BeEn TrIeD

To the best of my knowledge, the Australian Chess Federation uses the Glicko-2 rating system, though you cannot easily find any information about it. In the rating list, the RD seems to be indicated by the following symbols after the numeric rating:
A !! indicates a very reliable rating.
A ! indicates a reliable rating.
A blank indicates the rating is unreliable.
A ? indicates the rating is very unreliable.
A ?? indicates the rating is extremely unreliable.
A g following a number indicates the player needs that many more games before he will get a rating.

So it has been tried.

OP's opinions are completely harsh and super biased
I love how "peak sitting" is abuse, and "cheater" often appears in the same or neighboring sentences. This quickly went from a discussion of rating differential to a projection of some admin-related PTSD.

The single account rule should be enough to thwart any motivation for "peak sitting" "abuse", aside from legitimately not being around or not having the desire to jump back into many/regular games.

A more objective tone on the pluses and minuses of RD formats, instead of a tone of "humans are evil, so this is why this RD system was chosen", would be much more digestible.

@icytease said in #15:

OP's opinions are completely harsh and super biased

It might be fair to say that I think https://lichess.org/terms-of-service could be improved considerably by removing about half of it.

@icytease said in #15:

A more objective tone on the pluses and minuses of RD formats

Are there other RD formats? How does the Lichess rating system work, or rather: does it work?

@lollycopter said in #7:

Edit: I just realised an idea - what if there were a hover-over option to display Simple Moving Averages for any given rating?

I agree. I certainly think SMAs are a better indicator of strength and performance.

I row indoors and face a similar conundrum there. An improving SMA is a much better indicator of rowing fitness than a personal best.
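
For readers unfamiliar with the term, a simple moving average just averages the last few values of a series. A minimal sketch follows; the function name, the window size and the rating history are all made up for illustration, and nothing here reflects an existing Lichess feature.

```python
def simple_moving_average(ratings, window):
    """Return the average of each full window of consecutive ratings."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, rating in enumerate(ratings):
        running += rating
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= ratings[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages


history = [1500, 1540, 1520, 1580, 1600]  # made-up post-game ratings
print(simple_moving_average(history, 3))
```

A peak rating only records the single best value in `history`, while the moving average rises only if recent results are consistently better.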

Thank you for an interesting and informative article!

In my opinion, the way Lichess calculates ratings has some advantages and disadvantages. Lichess allows players to reach an adequate rating relatively quickly, which is very good. There are relevant and up-to-date leaderboards for active players across the world. (Obviously if we forget about problems with cheating, which causes many problems, the leaderboards being a minor one.)
Lichess leaderboards discourage high-ranked players from "sitting on their ratings", which is also good.
That said, the system also has its drawbacks. Say, I play bullet very rarely, and mostly not "true bullet", but something like 1+1. When I return after a longer break, my rating often jumps a lot up and down, so I once got to a peak rating of 3082 and a spot in Top 50 without being much of a bullet player. (Too slow...) Not a big problem in terms of th

I play a lot of Chess960 here. The local ratings work much better than on the bigger platform, where I have several times faced strong GMs rated around 1600. (I am active on both platforms and find both of them attractive, though in different ways.)
That said, Titled Arenas are heavily underrated compared to the pool, by 150-200 points. Some strong players enter them with provisional ratings around 1500 and many titled players only play rated Chess960 games within these arenas. Other prize events might also attract an underrated field, at least near the top, given that players who berserk might be underrated but have higher chances to win a prize.
It is nice that one is encouraged to play similarly rated opponents to keep the rating deviation below 65, but not so easy to achieve in variants, given that many of the highest-rated players mostly play bullet or hyperbullet, whereas I am mostly playing slow bullet, superblitz or blitz. Once I returned after a month-long break and needed to play around 28 rated Chess960 games to push my rating deviation below 65 again. In the Monthly Arenas the average rating of my opponents is probably more often below 2000 than above, and in the Shields it is better, but not that much.
There are other pitfalls with the rating deviation as well. Say, on July 12 I was rated 2654 with a rating deviation around 55, obviously overrated compared to many other titled players. Then I played the Titled Arena with some berserking, some good games and some worse games, dropped to 2560 and my rating deviation dropped to 45, making it far harder to get back. It took me over 2 months to get back over 2600, although I was displaying very good play, similar to when I got to 2650. Then I played another prize tournament, tilted and lost over 50 points again. Then I was in poor form and lost another 40 points. If I stabilize my form, I might have 2600 strength in terms of local Chess960 ratings (compared to other high-rated players in the pool), but I would need to win around 30 games in a row against 2300-rated opponents to get there. Not very realistic.
By the way, I have not played a single Chess960 game on Lichess against 8 out of the top 10 players from the current leaderboard, although I play a lot of Chess960 and set challenges for the 3+0 time control here and there. Perhaps we prefer different time controls, different playing times, different events or different opponents.
I understand that many players are comfortable with bullet and hyperbullet for various reasons. I do not understand all the details of the Lichess rating system, but to me it also seems that playing hyperbullet might help one to decrease the rating deviation faster, thus being able to get more "cups" without spending plenty of time playing. That said, in hyperbullet just a higher speed of moving (with the same speed of thinking) alone might mean 200 extra rating points. It is also a skill, but in my opinion it influences the variant leaderboards more than it should.
I play Chess960 for fun and for training purposes, not for ratings. That said, at least in some variants ratings do not work that well.
I am sorry for writing such a long comment; at least it is not "off-topic".

You're welcome! I held off from writing it for years since it seems to mostly plagiarize Glickman's work, but in a way that might be even more accessible than his papers, which are already quite accessible (so much so that developers have produced multiple reference implementations).

@RealDavidNavara said in #18:

Lichess leaderboards discourage high-ranked players from "sitting on their ratings", which is also good.
That said, the system also has its drawbacks. Say, I play bullet very rarely, and mostly not "true bullet", but something like 1+1. When I return after a longer break, my rating often jumps a lot up and down...

I'll defer to others on what behavior they'd like to see on the leaderboard, but personally I think if there's a leaderboard at all it might as well only show players with well-established ratings.
Precisely, after this period of a rating jumping up and down (which shouldn't last as long as it does) it's not supposed to be an ordeal to make it increase again when you are in top form. With the Glicko-2 rating system, upsets (whether players are overrated or underrated) should result in corrections not only to the base rating "r" but also to the volatility factor, which affects how much future ratings can deviate (for a player who is overrated or underrated; i.e. one whose rating doesn't predict actual game outcomes well).

@RealDavidNavara said in #18:

It is nice that one is encouraged to play similarly rated opponents to keep the rating deviation below 65, but not so easy to achieve in variants, given that many of the highest-rated players mostly play bullet or hyperbullet, whereas I am mostly playing slow bullet, superblitz or blitz...
That said, in hyperbullet just a higher speed of moving (with the same speed of thinking) alone might mean 200 extra rating points. It is also a skill, but in my opinion it influences the variant leaderboards more than it should.

That does sound problematic, and I'm not seeing an easy solution. I don't think even the restrictions (must have played X rated games) which make these events less popular help much with the disparity in ratings (how some players on the leaderboard enter and may be overrated, while others who play different time controls or don't go berserk may be underrated).

Since I play a variety of time controls (but not ultrabullet), there was a different time control-related concern that other players and I observed, where lag compensation was unpredictably different by time control. There's some discussion about how to further improve that, in order to allow high-latency players to play faster TCs (without heavily favoring such players)...
https://github.com/lichess-org/lila/issues/12097#issuecomment-1364683497

@Toadofsky said in #11:

  1. Are multiple games for a single player rated each second, with those games being treated as if played concurrently?

What I meant is that players can have one or even zero games per rating period, and the system will still be pure Glicko-2, just not following the suggested parameters given by Glickman. Consider a rating period with a length of a single server tick: since it's impossible for players to play a network of correlated games in a single tick, I think you could conclude that, per player, ratings are being updated per batch of a single game, but it is nonetheless a batch.
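
One consequence of the zero-games case can be sketched. In Glickman's Glicko-2 description, a player who does not compete in a rating period keeps their rating and volatility, and only the RD grows, via phi' = sqrt(phi^2 + sigma^2) on the internal scale. The snippet below applies that rule over several idle periods; the function name and the example values (RD 50, sigma 0.06) are illustrative assumptions and are not Lichess's actual parameters.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between RD and phi


def rd_after_idle_periods(rd, sigma, periods):
    """RD after `periods` consecutive rating periods with no games.

    Applying phi' = sqrt(phi^2 + sigma^2) repeatedly with a constant
    sigma is the same as adding periods * sigma^2 under the root.
    """
    phi = rd / SCALE
    phi = math.sqrt(phi**2 + periods * sigma**2)
    return phi * SCALE


rd_one = rd_after_idle_periods(50, 0.06, 1)
rd_many = rd_after_idle_periods(50, 0.06, 100)
print(f"after 1 idle period:    RD {rd_one:.1f}")
print(f"after 100 idle periods: RD {rd_many:.1f}")
```

Because the RD grows once per period, shrinking the period to a single tick only keeps the same real-time behaviour if the volatility is scaled down accordingly, which is one way to read "not following the suggested parameters".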

This topic is now closed.