
Introducing a universal rating converter for 2024

@F-35_Raptor said in #15:

ok, but that's still inaccurate, I could easily reach 2400 classical if I wanted to, but let's leave that

And at 2400 you would be around 2200/2300 FIDE....


Interesting conversions. Mine was quite inaccurate, but I think the problem is that different countries have different metrics. For example, in India, a 2300 on lichess could easily be only a 1500-1700 player. It would be cool to make the comparison using different countries.


My lichess rapid rating is 1300, but my USCF rating is 919, not 400 as indicated by the graph.


@DarkChessWizard @PeterRazin As mentioned in the blog, this only applies to CLASSICAL lichess ratings. Please don't conflate bullet, blitz, and classical.


@RookyBeach said in #17:

The same happens here, but to a smaller degree, when we use piecewise fits for a non-piecewise relation

Well, a piecewise approximation is not necessarily more flexible (and so more at risk of overfitting) than a family of functions with an equivalent number of parameters.

The risk of overfitting the training set arises when one is not doing the double optimization process of validating on some held-out sub-sample. So we agree there: it's about prediction and generalization from the sample to the "reality" of unseen new data, and that sort of thing.

With many-parameter models, such as the NN you mention, one can also handle this a bit more automatically by adding regularization terms to the objective function. In regression one typically already has some functional distance to minimize; a regularization term additionally considers the distance between the fitted function and the zero function, so it acts as an under-fitting pressure while the other distance is pressure toward the observed training data.

Back to piecewise fits. If the training data also determines the number of breakpoints during optimization, that might differ from having an out-of-data assumption of some function family with a fixed number of parameters.

Why did you think piecewise fitting would be prone to overfitting, if I understood correctly? I made some arguments, but they might be missing your point.

Afterthoughts, optional.
I also wonder about the notion of a hyperparameter here. In some way the out-of-data assumptions can themselves be adjusted, and then the generalization optimization would be tethered to that adjustment not being at risk of fitting the sampling errors of the data of interest (which we think is informative about the target phenomenon). This is why I make a distinction between "out of data" and within-data. It might be more about this kind of modeling flow. But if this cross-over method (my words) is not having any framework, that question is just a musing of mine, I guess.
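To make the piecewise idea above concrete: with a fixed breakpoint, a continuous piecewise-linear model is still linear in its parameters, so ordinary least squares fits it directly. This is only a sketch; the data, breakpoint, and slopes below are made up for illustration, not taken from the blog's rating data.

```python
import numpy as np

# Continuous piecewise-linear fit with one fixed breakpoint ("knot").
# The model y = a + b*x + c*max(0, x - knot) is linear in (a, b, c),
# so ordinary least squares solves it directly -- no special machinery.
rng = np.random.default_rng(0)
x = np.linspace(1000, 2600, 200)                            # pretend online ratings
true_y = np.where(x < 1800, 0.9 * x - 400, 0.6 * x + 140)   # kinked "OTB" curve
y = true_y + rng.normal(0, 30, x.size)                      # noisy observations

knot = 1800.0
X = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - knot)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

rmse = np.sqrt(np.mean((X @ coef - y) ** 2))
print(coef, rmse)   # slope below the knot ~0.9, rmse near the noise level
```

Letting the data choose the knot position (or the number of knots) would turn this into the double optimization discussed above, where validation on held-out data becomes important.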


@F-35_Raptor said in #10:

this is just funny lol, you consider a 2470 on lichess 2400 FIDE, I know 1600s with a 2600 rating on lichess lol
In classical?


Looks like the data isn't of particularly high quality, probably because classical's popularity is about 1/100th of bullet's and blitz's in online chess.


But if this cross-over method (my words) is not having any framework that question is just a musing of mine I guess.

I meant "any framework such as I was speculating about". Bad typo; a phrase fragment went missing!


@dboing said in #25:

Why did you think piece-wise fitting would be prone to overfitting? if I understood. I made some arguments but they might not be getting the points.

It's basically just a matter of expressiveness: the more expressive the model, the more prone to overfitting it is (e.g. see this image: https://analystprep.com/study-notes/wp-content/uploads/2021/03/Img_13.jpg)
Least expressive would be just fitting a constant; linear fitting is more expressive, and piecewise-linear fitting is more expressive still, since you have more degrees of freedom: each piece can have its own slope and intercept, and using the same slope and intercept for all pieces gets you back to a plain linear fit, so it is a more expressive generalization of that. The smaller you make the pieces, the more expressive it gets, since arbitrarily small linear pieces can model any nonlinearity. A fully nonlinear model sits at the most expressive end and is therefore the most prone to overfitting.
There are a lot of regularization techniques to prevent overfitting (e.g. a penalty on the magnitude of the parameters, as in the ridge regression you mentioned, or dropout in NNs), and I was curious how this was done here. Especially with small sample sizes you can overfit easily without noticing it if you don't have a validation set.
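The ridge penalty mentioned here can be sketched in a few lines: adding lam * ||w||^2 to the squared error yields the closed form w = (X^T X + lam*I)^{-1} X^T y and shrinks the coefficients toward zero. Everything below (the data, the lam value) is synthetic and purely illustrative.

```python
import numpy as np

# Ridge regression: minimize ||y - Xw||^2 + lam * ||w||^2.
# Closed form: w = (X^T X + lam * I)^{-1} X^T y.
rng = np.random.default_rng(1)
n, d = 30, 10                          # few samples, many features: easy to overfit
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[0] = 2.0                        # only one feature actually matters
y = X @ w_true + rng.normal(0, 0.5, n)

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, 0.0)               # no penalty: plain least squares
w_reg = ridge(X, y, 5.0)               # penalized: coefficients shrink toward zero
print(np.linalg.norm(w_reg) < np.linalg.norm(w_ols))   # shrinkage: prints True
```

The penalty acts exactly as the "under-fitting pressure" described earlier in the thread: it pulls the fit toward the zero function while the data term pulls it toward the observations.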

I was just curious why the piecewise fit was chosen, since points can be made both for something less expressive, like a linear fit, and for something more expressive, like a NN. Normally you just test different models and check which generalizes best on a holdout set.
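The holdout workflow described above can be sketched as: fit models of increasing expressiveness on a training split, then keep whichever has the lowest error on the held-out points. The models (polynomials of varying degree), data, and split below are hypothetical stand-ins, not the actual rating data.

```python
import numpy as np

# Model selection on a holdout set: fit polynomials of increasing degree
# on a training split, keep the degree with the lowest holdout error.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.1, x.size)   # nonlinear truth plus noise

train, hold = np.arange(40), np.arange(40, 60)   # simple 40/20 split

def holdout_rmse(deg):
    coef = np.polyfit(x[train], y[train], deg)   # fit on the training split only
    resid = np.polyval(coef, x[hold]) - y[hold]  # evaluate on held-out points
    return np.sqrt(np.mean(resid ** 2))

errors = {deg: holdout_rmse(deg) for deg in (1, 3, 5, 9)}
best = min(errors, key=errors.get)
print(best, errors)   # degree 1 underfits the sine; a higher degree wins
```

With small samples like this, a single split is noisy; cross-validation (rotating which points are held out) gives a more stable comparison.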


man I wish I was 2400 FIDE... like 300 points away
