Comments on https://lichess.org/@/noseknowsall/blog/introducing-a-universal-rating-converter-for-2024/X2QAH27t
Welcome to the Dojo!
Can you provide confidence intervals for the different conversions in the table? Some of them are surely far less reliable than others, and intervals would make those easy to identify.
@RookyBeach I cannot provide confidence intervals because I already pruned out data that wasn't reasonable and hand-modified results that popped up in one comparison but didn't line up well with more fundamentally reliable results from another comparison. That concept was discussed a bit under Conclusions. Feel free to imagine your own confidence intervals based on the plots I showed; I think those plots show reasonably clearly when the results can and can't be trusted.
Could you include pool information in the conversion reasoning, if it carries information beyond the rating itself?
For example, information about the games that contributed to each pool's native rating system.
For now I assume all such information, while not explicitly included, is implicitly absorbed across the data points (unmonitored variables that may still contribute).
If the fit holds up well despite those floating variables, I wonder what that says about the pools as part of your overall data sample.
I did not really read about the construction or the reasoning; I just did not see any mention of the context of the rating systems that I could recognize as part of that dependency. I did look at the figures.
OK, more: from a fast read I suspect it is about finding common sub-pools, pairwise across systems. Might it be that when adding more systems, the sub-pooling quirks have a smaller effect overall?
Edit: more than just pairwise common sub-pools. I will need to read it more carefully at some point. Well, interesting.
@NoseKnowsAll But even after filtering and pruning, you can use the remaining sample sizes per conversion to provide confidence intervals, right? Even if they are slightly biased because of the pruning and filtering. It's very hard to extract any sample-size numbers from the plots alone.
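For what it's worth, even a rough bootstrap would give the kind of interval being asked for: refit the conversion on resampled pairs and read off the spread of the converted rating. A minimal sketch on invented paired ratings (the slope, offset, noise level, and sample size below are all made up, not the blog's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired ratings for one conversion (e.g. Lichess vs. FIDE);
# a stand-in for the real dataset, which is not public here.
lichess = rng.normal(2000, 300, size=500)
fide = 0.8 * lichess + 250 + rng.normal(0, 80, size=500)

# Bootstrap: refit a linear conversion on resampled pairs and collect
# the predicted converted rating for one fixed query rating.
query = 2470.0
preds = []
for _ in range(2000):
    idx = rng.integers(0, len(lichess), size=len(lichess))
    slope, intercept = np.polyfit(lichess[idx], fide[idx], 1)
    preds.append(slope * query + intercept)

lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"95% bootstrap CI for the converted rating at {query:.0f}: [{lo:.0f}, {hi:.0f}]")
```

The interval naturally widens where data is sparse, which is exactly the per-conversion reliability signal the comment asks for.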
Unrelated to that: which model did you use? It looks like a stepwise linear fit, but in reality the relation should be perfectly linear (since rating differences basically express expected scores, which would be the same no matter the system), right? So there is probably some overfitting.
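The linearity argument can be made concrete with the standard Elo expected-score formula, which depends only on the rating difference: if two systems share that curve, shifting one pool by a constant offset leaves every matchup's expected score unchanged (the 220-point offset and the two ratings below are just illustrative numbers):

```python
# Standard Elo expected score for a rating difference `diff`.
def expected_score(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# If system B is system A shifted by a constant offset (same scale),
# rating *differences* are identical, so expected scores agree exactly.
offset = 220.0          # hypothetical constant offset between two pools
a1, a2 = 2230.0, 2010.0
b1, b2 = a1 + offset, a2 + offset
assert expected_score(a1 - a2) == expected_score(b1 - b2)
print(expected_score(a1 - a2))  # ≈ 0.78 for the stronger player
```

So under the shared-curve assumption any genuine conversion between two Elo-style pools would be affine, and wiggles in a stepwise fit would indeed be noise.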
so this means i have a legitimate chance at being a CM soon right??
Your estimate fits my Elo quite well: I am 2230 Elo.
By contrast, https://ethanlebowitz.github.io/RatingConverter/index.html thinks I am just 2010 Elo. So add 220 points to that tool's guess and you get your real Elo :-)
Can we see the big model hypothesis and its free parameters, the one that handles all the system-pair figures of data?
Or a description; I might have missed a reference to the model type while reading the text.
Figure-wise, the first table seems to be the resulting cross-system result. So there is an underlying, more symbolic model that can be projected, slice-wise, onto each figure's point clouds for each pair of systems.
Is that a roughly correct, if blurred, description?
And another question, maybe related to the post about confidence notions: can one turn this fitting exercise into a prediction one, by sub-sampling the data used to fit the global model (with its given number of adjustable parameters) and then using the remaining data to test its predictive value?
I always wondered about the relation between, on one hand, a few-parameter model with an assumption about the population distribution, singly optimized in such a context (perhaps with an extra assumption about the sampling distribution as well), where confidence is estimated from a null hypothesis about the regression model's extrapolation (or interpolation) outside the data points used; and, on the other hand, many-parameter machine-learning models, doubly optimized as function approximators on partitions of the full sample.
I might have forgotten some things; it has been a while since I did the former type of statistics. But since we are definitely outside the guarantees of each rating system's statistical basis, it might be worth exploring.
A pure fit lacks a sense of how the data sample itself could be a factor; as you mentioned, the density of high-level data points might influence the dispersion or error of the extrapolation, or alternatively the prediction from one slice of data to another. Correct me if anything sounds off; I do not vouch for this being free of misconceptions.
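The sub-sampling idea described above is essentially an ordinary hold-out evaluation. A minimal sketch on synthetic paired ratings (the linear relation, noise level, and split fraction are invented stand-ins for the real cross-system data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired ratings between two systems; a stand-in for the
# real cross-system dataset, which is not reproduced here.
x = rng.normal(1800, 400, size=1000)
y = 0.9 * x + 300 + rng.normal(0, 100, size=1000)

# Hold out 20% of the pairs, fit the conversion on the remaining 80%,
# then measure out-of-sample prediction error on the held-out pairs.
idx = rng.permutation(len(x))
test_idx, train_idx = idx[:200], idx[200:]
slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)
pred = slope * x[test_idx] + intercept
rmse = np.sqrt(np.mean((pred - y[test_idx]) ** 2))
print(f"held-out RMSE: {rmse:.1f} rating points")
```

Repeating the split several times (or slicing by rating range instead of at random) would directly expose where the conversion extrapolates poorly, which is the confidence question raised earlier in the thread.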
This is just funny lol. You consider a 2470 on Lichess to be 2400 FIDE, but I know 1600s with a 2600 rating on Lichess lol.