Lichess ratings are not Glicko-2

@MFXX said in #20:

> I think you could conclude that, per player, ratings are being updated per batch of a single game, but it is nonetheless a batch.

Such a system would have more in common with Elo than with Glicko. Glickman's very first paragraph about application of Glicko explains at length (and this is Glickman's only summary of the algorithm, immediately followed by 4 pages of math and nothing else, so even if I hadn't studied statistics and didn't fully agree with it, I'd still find it important to carefully consider):

> To apply the rating algorithm, we treat a collection of games within a “rating period” to have occurred simultaneously. A rating period could be as long as several months, or could be as short as one minute. In the former case, players would have ratings and RD’s at the beginning of the rating period, game outcomes would be observed, and then updated ratings and RD’s would be computed at the end of the rating period (which would then be used as the pre-period ratings and RD’s for the subsequent rating period). In the latter case, ratings and RD’s would be updated on a game-by-game basis (this is currently the system used by FICS). The Glicko system works best when the number of games in a rating period is moderate, say an average of 5-10 games per player in a rating period. The length of time for a rating period is at the discretion of the administrator.
http://www.glicko.net/glicko/glicko.pdf
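To make the batch versus game-by-game distinction concrete, here is a minimal Python sketch of the original Glicko update as given in the linked paper. It is an illustration under stated assumptions, not Lichess's implementation: the function names are made up for this example, and the pre-period RD inflation step (the paper's c² t term) is left out. The three sample games are, as far as I recall, the worked example from the same paper.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant q


def g(rd: float) -> float:
    """Attenuation factor: the less certain the opponent's rating, the less a game counts."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score against a single opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko_update(r, rd, games):
    """Update (r, rd) from one rating period.

    games: list of (opponent_rating, opponent_rd, score), score in {1, 0.5, 0}.
    All games are treated as simultaneous: every term uses the pre-period r and rd.
    Assumption: no RD inflation for elapsed time is applied here.
    """
    if not games:
        return r, rd
    d2_inv = 0.0  # 1 / d^2
    delta = 0.0   # sum of g_j * (s_j - E_j)
    for r_opp, rd_opp, score in games:
        e = expected(r, r_opp, rd_opp)
        d2_inv += Q**2 * g(rd_opp) ** 2 * e * (1 - e)
        delta += g(rd_opp) * (score - e)
    denom = 1 / rd**2 + d2_inv
    return r + Q / denom * delta, math.sqrt(1 / denom)


games = [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]

# One rating period containing all three games.
batch = glicko_update(1500, 200, games)

# Three rating periods of one game each (game-by-game updating).
seq = (1500, 200)
for game in games:
    seq = glicko_update(seq[0], seq[1], [game])

print("batch:", batch)
print("game-by-game:", seq)
```

The two results differ because the game-by-game variant feeds each intermediate rating and RD into the next update, while the batch variant evaluates every game against the same pre-period values.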

nadjarostowa

I am a bit skeptical with batches in online play. No matter if they are batched by number of games or a certain time period, players will make use of it and try to play the system.

I think it will incentivise bad actions on the later/last games in each batch.

I fail to see how "peak sitting" is a problem. It's just an ordinary break like every other. Not including inactive players on the leaderboard is an obvious fix to keep things meaningful - and that "problem" doesn't seem to be related to the details of the rating system.

GnocchiPup

I'm a peak sitter.

Once one gets on a peak, it's all downhill...

petri999

@nadjarostowa said in #22:

> I am a bit skeptical with batches in online play. No matter if they are batched by number of games or a certain time period, players will make use of it and try to play the system.

True, apart from new accounts: the current 1700+, 1500, 1300- kind of trip is quite unnecessary. It would make sense to calculate the rating only after, say, 10 games. How to handle the opponent's rating in the meantime is an issue I don't know how to solve. It would be best to ignore those games, but given the obsession with every rating point that a good part of Lichess players have, it could lead to games being aborted.

FIDE ratings are published only after 10 games, and only if they contain at least a draw. The rules for how it actually happens have changed; I have not bothered to look. But it is based on the performance rating in tournaments.

Toadofsky

@nadjarostowa said in #22:

> I fail to see how "peak sitting" is a problem. It's just an ordinary break like every other. Not including inactive players on the leaderboard is an obvious fix to keep things meaningful - and that "problem" doesn't seem to be related to the details of the rating system.

To some extent I actually agree here... it would be my preference not to have a rating leaderboard, not to publish rating statistics, like https://lichess.org/stat/rating/distribution/blitz , and not to show ratings at all. But since we have a leaderboard and since players frequently commented both in the Lichess Feedback forum and the Lichess Discord server, we might as well minimize incentives for silly behavior (with side effects of the leaderboard working better and new cheaters being easier to detect before they make it onto the leaderboard).

mkubecek

@petri999 said in #24:

> FIDE ratings are published only after 10 games

The minimum is 5 games (since 2014, at least). And a zero score is only discarded from the first event (tournament).

> How to handle the opponent's rating in the meantime is an issue I don't know how to solve. It would be best to ignore those games, but given the obsession with every rating point that a good part of Lichess players have, it could lead to games being aborted.

People would complain if their games against new players were not rated - and now they complain that they are rated (when they lose to new players, that is). From this point of view, either option is fine. :-) What I would be more worried about is the design of the rating system which assumes that each game is either rated for both players or for neither. It would IMHO require a thorough analysis and simulations to make sure rating these games only for one player wouldn't affect the rating distribution. Perhaps just not showing the early ratings until RD gets below some threshold (probably higher than 110) might help a bit without unwanted side effects.
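A rough Python sketch of that last idea, for illustration only: the rating is computed as usual but hidden while the RD is still at or above a threshold. The function name and the default of 150 are hypothetical; the post only suggests the threshold should probably be higher than 110.

```python
from typing import Optional


def visible_rating(rating: float, rd: float, rd_threshold: float = 150.0) -> Optional[int]:
    """Return the rounded rating, or None while the RD is at or above the threshold.

    rd_threshold is a hypothetical value chosen for this example.
    """
    if rd >= rd_threshold:
        return None  # rating still too uncertain to display
    return round(rating)


print(visible_rating(1700.0, 300.0))  # new account, rating hidden
print(visible_rating(1612.4, 90.0))   # established account, rating shown
```

Since both players are still rated internally, this only changes what is displayed and leaves the rating calculation itself untouched.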

patriots_12

Wow.

Toadofsky

@mkubecek said in #26:

> Perhaps just not showing the early ratings until RD gets below some threshold (probably higher than 110) might help a bit without unwanted side effects.

Back in the Lichess beta days (before the 1.0 stable release) Lichess properly displayed ratings as r ± 2*RD. Some players disliked it, but nowhere near as much as players hate the "?" symbol these days. Nobody cared if their opponent had a high RD or not, since this precise presentation of ratings was boring. Ratings converged differently in those days too.
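For readers unfamiliar with that presentation, r ± 2*RD can be sketched in Python as below. The function names are made up for this example and the formatting is an assumption, not a reconstruction of the beta-era display code. In Glicko, the band r ± 2*RD is roughly a 95% interval for the player's strength.

```python
def rating_interval(r: float, rd: float) -> tuple:
    """Lower and upper ends of the r ± 2*RD band, rounded to whole points."""
    return round(r - 2 * rd), round(r + 2 * rd)


def format_rating(r: float, rd: float) -> str:
    """Render a rating together with its uncertainty, e.g. for a profile page."""
    return f"{round(r)} ± {round(2 * rd)}"


print(format_rating(1500, 60))
print(rating_interval(1500, 60))
```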

Perhaps for players on the leaderboard, Lichess should start showing the leaderboard ranks next to ratings, then eventually in place of ratings.

Actually, come to think of it I should make userscripts/extensions for changes I want to see, since seeing such changes in action might actually be persuasive.

biscuitfiend

@RealDavidNavara: Very insightful comment, of course. The bit where you describe your frustration returning to your "usual" rating is very relatable.

In my opinion, the problem is built into the design: the lichess rating system assumes that form is transient enough that your rating needs to be updated every single game - in other words, that if your rating is updated between individual games, then the rating will be "more accurate" in the second game than if it wasn't updated - but at the same time, also assumes that form is so intransient that the rating deviation (RD) only ever goes *down* when you play more games. We know empirically that this is a contradiction, because humans tilt, or get distracted, or whatever else, so that you might lose 10 games in a row one day, then the next day you're back to "normal".

Of course, the response from the designer of the system is always going to be "well, no system is perfect, and the rating is only meant to be an approximation anyway!". I'm not alone in thinking this is a bit of a cheap cop-out, because

1) If you accept that no system is perfect, then why have you chosen to over-engineer yours to such an extent?
2) If rating is only meant to be an approximation anyway, then why does your system care so much about "rating deviation", and why have you chosen the minimum RD to be so low that win/loss means 5 points difference?
3) If the amount of points is that small to account for volume of games, then in my view you must accept that volume of games is a big enough factor that the rating system should be slightly different between time controls. It's obviously much harder to play 20 rapid games than it is to play 20 bullet games.

Would be interesting, as ever, to hear your thoughts, David and Toad.

mkubecek

@biscuitfiend said in #29:

> the lichess rating system assumes that form is transient enough that your rating needs to be updated every single game - in other words, that if your rating is updated between individual games, then the rating will be "more accurate" in the second game than if it wasn't updated

I don't think this is the actual reason, or at least not the primary one. IMHO it's mostly because most online players don't have patience and would find it unacceptable if they did not get immediate feedback and had to wait a week or even a month for an updated rating. Actually, some are even unhappy that they have to wait until the game is over: one of the repeated requests in the forum is to show in advance "what are they playing for", i.e. what the update would be in case of a win/draw/loss.

In the OTB world, people learned to live with regular batch updates, partly because it always worked like that. And yet, the site showing "live ratings" of top players, while unofficial, is very popular and many fans seem to take it more seriously than actual FRLs. I don't dare to imagine the outrage if lichess decided to switch to batch updates of its ratings.

This topic is now closed.
