Lichess ratings are not Glicko-2

Comments on https://lichess.org/@/toadofsky/blog/lichess-ratings-are-not-glicko-2/x1bpwn38

The major reason for calculating ratings after each game is that this is how people want it: the Internet is a world of instant gratification. The second reason is that this is how all online systems do it. The end result would be about the same, except that people would perhaps be slightly less obsessed with the results. Publishing ratings once a week would not be hard to implement, nor would it cost extra computing resources.

John_Hamilton

The source code has a variable named "ratingPeriodsPerDay" here:

https://github.com/lichess-org/lila/blob/e1e8640513237b862791857348662807b17256eb/modules/rating/src/main/Glicko.scala#L65

I haven't studied the source closely enough to see how it's being used, but they could be updating the rating for the current rating period as new games are finished, basically by just redoing the calculation for the period. I don't know how they came up with that period length.
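
For anyone wondering what a "rating periods per day" constant could be used for, here is a minimal Python sketch of one possibility: converting elapsed time into a (fractional) number of rating periods and applying the standard Glicko-2 rule that RD grows with the player's volatility for each period without games. This is not taken from the Lichess source. The function name `rd_after_inactivity` and the value `0.2` periods per day are hypothetical, chosen only for illustration.

```python
import math

# Glicko-2 scaling constant between the public rating scale and the internal scale.
SCALE = 173.7178


def rd_after_inactivity(rd, volatility, days, rating_periods_per_day):
    """Grow RD over `days` without games (hypothetical helper).

    Uses phi' = sqrt(phi^2 + t * sigma^2), where t is the number of elapsed
    rating periods. Treating t as fractional is an assumption of this sketch.
    """
    periods = days * rating_periods_per_day
    phi = rd / SCALE
    phi_new = math.sqrt(phi * phi + periods * volatility * volatility)
    return SCALE * phi_new


# Example: RD 60, volatility 0.06, one year idle, 0.2 periods/day (made-up value).
one_year = rd_after_inactivity(60.0, 0.06, 365, 0.2)
print(round(one_year, 1))
```

The longer the period length (fewer periods per day), the more slowly RD drifts upward for an inactive player.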

FranciscoMw

The author just came up with the abbreviation "RD" in the middle of the blog, and some of us didn't understand what it is.
Am I the only one?

Toadofsky

@FranciscoMw said in #4:

The author just came up with the abbreviation "RD" in the middle of the blog, and some of us didn't understand what it is.
Am I the only one?

Thanks, I've now updated the post, adding:

As with the original Glicko system, it is usually informative to summarize a player’s strength in the form of an interval (rather than merely report a rating). One way to do this is to report a 95% confidence interval. The lowest value in the interval is the player’s rating minus twice the RD, and the highest value is the player’s rating plus twice the RD. So, for example, if a player’s rating is 1850 and the RD is 50, the interval would go from 1750 to 1950. We would then say that we’re 95% confident that the player’s actual strength is between 1750 and 1950.

I do wish Lichess would show the 95% confidence interval, instead of indicating some ratings with question marks and otherwise not reporting this information.
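
The interval described in the quoted passage is simple enough to sketch in a few lines of Python. The function name `rating_interval` is hypothetical; the "twice the RD" rule and the 1850/50 example come straight from the quote.

```python
def rating_interval(rating, rd):
    """Approximate 95% confidence interval: rating minus/plus twice the RD."""
    return (rating - 2 * rd, rating + 2 * rd)


# The example from the quoted passage: rating 1850, RD 50.
low, high = rating_interval(1850, 50)
print(low, high)  # 1750 1950
```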

Flatlander

There's beauty in the elegance of simplicity. There's also beauty in rules being transparently posted in a user-friendly manner, instead of merely hidden in obscure code that changes unexpectedly on capricious whims. Competent coders are mature enough to accept the discipline of compliance with posted policies. Sloppy coders revel in the ease of no worries about potential conflict with posted policies. Long I have suffered under the tyranny of obscure rating methods. I therefore now hearken unto the Magna Carta, by which I suggest that developers should not act as if above the law, and that some ratings law should be set forth clearly.

I prefer an Elo system that would be about as simple as possible, but with a k-factor that is higher for truly new ratings, then adjusted for rating brackets somewhat as FIDE does. Complexity based on frequency of play is a cultural consideration, absurdly foreign to the pure logic of chess. I don't care if a highly skilled statistician has wise words about expected deviation. That should give no right to rig the rating system.
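
For reference, the kind of system described here can be sketched in a few lines of Python. The expected-score formula is the standard Elo one. The thresholds in `k_factor` (40 for the first 30 games, 20 below 2400, 10 from 2400 up) are a simplification loosely modelled on FIDE's rules and should be read as assumptions of this sketch, as should all function names.

```python
def k_factor(rating, games_played):
    """Simplified, FIDE-like K-factor (thresholds are illustrative assumptions)."""
    if games_played < 30:
        return 40  # new rating: move quickly
    if rating < 2400:
        return 20
    return 10


def expected_score(rating, opponent_rating):
    """Standard Elo expected score for the player against the opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating, opponent_rating, score, games_played):
    """New rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    k = k_factor(rating, games_played)
    return rating + k * (score - expected_score(rating, opponent_rating))


# A new player and an established player both beat an equally rated opponent.
print(elo_update(1500, 1500, 1.0, games_played=5))    # 1520.0
print(elo_update(1500, 1500, 1.0, games_played=100))  # 1510.0
```

Note that such a system has no notion of rating uncertainty beyond the game count, which is exactly the trade-off being debated in this thread.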

lollycopter

I've never had an issue with Lichess's implementation of Glicko-2 (Glicko-2.1?) and have always been able to clearly explain it to new users who aren't familiar with concepts like Rating Deviation and provisional ratings, as well as the value of being paired against someone of a similar rating.

It's much better than some other major chess website which quietly removed RD information from member profiles some years ago. That information is still available via their API, but I have a strong suspicion it was removed to allow their largely less-active user base (on average) to chase rating peaks after previously overshooting their true rating.

Edit: I just had an idea - what if there were a hover-over option to display Simple Moving Averages for any given rating?

airfloo

I only understood 50% of that extraordinarily well-written article, yet every ounce I got out of it was quite informative.

Thanks and cheers!

Bhaaradwaaj

me not no math
anyway great article!

MFXX

Thanks for the post @Toadofsky :), it really surprised me.

Also, I'm a bit lost. Mathematically, why do you say that Lichess does not use true Glicko-2? Is it because players' ratings are not calculated in batches assumed to be simultaneous, as Glickman's paper suggests? In that case, wouldn't one argue that it still is Glicko-2, but with rating periods on the order of a single second, or a single server tick?

A (hopefully interesting) note to add to the discussion: Mark Glickman once told me that one disadvantage of updating ratings after each game instead of having periods of several games is that it decreases the algorithm's stability over time, as some of the approximations made in the derivation of the formulas converge faster with a few games per player per rating period.
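
To make the batch-versus-per-game question concrete, here is a minimal Python sketch of the Glicko-2 update as described in Glickman's paper, applied once to a three-game rating period and then game by game. It is not Lichess's code. The function names (`update`, `new_volatility`) and the system constant `tau=0.5` are assumptions chosen for illustration, and the sample player and opponents are the worked example from the Glicko-2 paper.

```python
import math

SCALE = 173.7178  # conversion between the public scale and the Glicko-2 scale


def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def expected(mu, mu_j, phi_j):
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def new_volatility(phi, sigma, v, delta, tau, eps=1e-6):
    """Solve for the new volatility with the iterative procedure from the paper."""
    a = math.log(sigma * sigma)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta * delta - phi * phi - v - ex)
        den = 2.0 * (phi * phi + v + ex) ** 2
        return num / den - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    return math.exp(A / 2.0)


def update(player, games, tau=0.5):
    """One rating period. player = (rating, rd, volatility);
    games = non-empty list of (opp_rating, opp_rd, score)."""
    r, rd, sigma = player
    mu, phi = (r - 1500.0) / SCALE, rd / SCALE
    v_inv, total = 0.0, 0.0
    for opp_r, opp_rd, score in games:
        mu_j, phi_j = (opp_r - 1500.0) / SCALE, opp_rd / SCALE
        e = expected(mu, mu_j, phi_j)
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        total += g(phi_j) * (score - e)
    v = 1.0 / v_inv
    delta = v * total
    sigma_new = new_volatility(phi, sigma, v, delta, tau)
    phi_star = math.sqrt(phi * phi + sigma_new ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * total
    return (SCALE * mu_new + 1500.0, SCALE * phi_new, sigma_new)


start = (1500.0, 200.0, 0.06)
games = [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)]

batch = update(start, games)  # all three games in one rating period
sequential = start
for game in games:            # one rating period per game
    sequential = update(sequential, [game])

print("batch:     ", batch)
print("sequential:", sequential)
```

Both runs use exactly the same formulas, which supports the "still Glicko-2, just with very short periods" reading. They do not produce the same numbers, though, since in the per-game run the rating and RD move between games, and the volatility step is applied three times instead of once.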
