@BugMeLater said in #10:
> what it seems you're finding is low policy moves, usually strong moves are difficult to find, and generally you'll find low policy on historical brilliant moves, because leela, similarly wouldn't see the point at 1 node, but perhaps at 1000
Since Leela is very strong even when playing on one node, my hope was that "obvious" sacrifices and "natural" strong moves would also have a higher policy, and that only moves that really stand out would have a policy under 2%.
I think it worked quite well: only 9 moves in the 1959 Candidates Tournament (112 games) were flagged as "brilliant", so many strong moves evidently do have a higher policy.
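For concreteness, the selection rule described above can be sketched as follows. Only the 2% policy ceiling is from the post; the function name, the centipawn-loss check, and its threshold are my own illustrative stand-ins for the Stockfish quality test:

```python
def is_brilliant_candidate(policy: float, cp_loss: int,
                           policy_ceiling: float = 0.02,
                           max_cp_loss: int = 20) -> bool:
    """Flag a move as a brilliancy candidate when LC0's one-node
    policy finds it surprising (below the ceiling) AND Stockfish
    says it is objectively good (loses at most `max_cp_loss`
    centipawns versus SF's best move)."""
    return policy < policy_ceiling and cp_loss <= max_cp_loss

# 1.e4 e5 2.f4: surprising (policy under 1%) but objectively dubious
print(is_brilliant_candidate(policy=0.009, cp_loss=60))   # False
# a genuinely strong move that the policy head does not expect
print(is_brilliant_candidate(policy=0.012, cp_loss=0))    # True
```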
@BugMeLater said in #10:
> what it seems you're finding is low policy moves, usually strong moves are difficult to find, and generally you'll find low policy on historical brilliant moves, because Leela, similarly wouldn't see the point at 1 node, but perhaps at 1000
I'm not sure I understand now.
So the idea is using LC0's own failure to go for the right move (in a sharp landscape) as a measure of surprise, and SF as an absolute oracle for the best move in all positions.
It is like searching for moves that SF and LC0 don't agree on, and among those, the ones that previous human annotation declared brilliant. Was only the policy used, not the current position evaluation?
What is the nesting order of the variables being explored? I gave an example: a maximal set over some database of games with positions available for sampling (actually position-successor pairs), such that LC0's relative weight on the moves played was below a certain ceiling, and among that subset a (nested) filtering of those with a high SF score on those moves. Or does the order matter? I have no clue about the resource cost, so I keep it abstract.
Did I get some of the idea right?
I think you could actually combine the sharpness notion, as it affects the landscape (it is really a statement about the move profile, or landscape, in my current understanding of how people use the word).
This was mentioned in the blog, though: the prevalence of "good" moves, same thing. In the end, why not use the whole distribution of move probabilities? Again, I'm ignorant of the resource load.
So the current position odds might also be a factor: not only the best move, but the amplitude of the odds differential across the pair. Do I make sense?
@dboing said in #12:
> I'm not sure I understand now.
>
> So the idea is using LC0's own failure to go for the right move (in a sharp landscape) as a measure of surprise, and SF as an absolute oracle for the best move in all positions.
>
> It is like searching for moves that SF and LC0 don't agree on, and among those, the ones that previous human annotation declared brilliant. Was only the policy used, not the current position evaluation?
>
> What is the nesting order of the variables being explored? I gave an example: a maximal set over some database of games with positions available for sampling (actually position-successor pairs), such that LC0's relative weight on the moves played was below a certain ceiling, and among that subset a (nested) filtering of those with a high SF score on those moves. Or does the order matter? I have no clue about the resource cost, so I keep it abstract.
>
> Did I get some of the idea right?
>
> I think you could actually combine the sharpness notion, as it affects the landscape (it is really a statement about the move profile, or landscape, in my current understanding of how people use the word).
>
> This was mentioned in the blog, though: the prevalence of "good" moves, same thing. In the end, why not use the whole distribution of move probabilities? Again, I'm ignorant of the resource load.
>
> So the current position odds might also be a factor: not only the best move, but the amplitude of the odds differential across the pair. Do I make sense?
It's not quite finding moves where SF and LC0 disagree, since only LC0's policy is used. The idea is that LC0 looks at the position without any calculation to determine which moves are surprising. What I didn't want to happen is that a basic queen sacrifice leading to a back-rank mate gets called brilliant (which in reality it isn't, since every relatively strong player sees the idea instantly), and LC0 on one node is strong enough to find common combinations, so they get a higher policy. SF is only used to determine the objective quality of the move, since many surprising moves are simply bad (for example, 1.e4 e5 2.f4 has a policy of less than 1 percent).
I didn't test it too deeply since it took a lot of time, but I started with known brilliant moves to see if the idea works for those examples. Then I looked at some more games to check that there weren't too many brilliant moves. I hope this answers your question.
I didn't use sharpness, since I was also interested in great positional moves (I think Ra2 in the Petrosian-Olafsson game is a good example of that). But using sharpness as an additional indicator can give you more information about the type of position.
I haven't considered using the whole probability distribution, but I'm not quite sure whether a move is more brilliant if there is one "obvious" alternative which is worse, or if there are many "obvious" moves but none as strong as the strongest move with a low policy. I think the best way to overcome the issue of many good options is to require that a brilliant move be clearly stronger than any other move. I haven't added this, since it would have doubled the computing time and I could only run it on my laptop.
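A rough sketch of the combined filter, including the "clearly stronger than any other move" condition discussed above. The thresholds, the dictionary format, and the function name are illustrative assumptions, not the blog's actual code:

```python
def brilliancy_filter(moves, played,
                      policy_ceiling=0.02, margin_cp=50):
    """moves: {move: (lc0_policy, sf_cp_eval)} for the side to move,
    assumed to contain at least one alternative to `played`.
    The played move qualifies when (a) its one-node policy is below
    the ceiling, and (b) its SF eval beats every alternative by at
    least `margin_cp` centipawns (the 'clearly stronger' condition)."""
    policy, score = moves[played]
    if policy >= policy_ceiling:
        return False  # too obvious: LC0 already expects this move
    best_alt = max(s for m, (p, s) in moves.items() if m != played)
    return score - best_alt >= margin_cp

position = {
    "Qxf7+": (0.010, 350),   # low policy, objectively crushing
    "Nf3":   (0.400, 80),
    "d4":    (0.300, 60),
}
print(brilliancy_filter(position, "Qxf7+"))  # True
print(brilliancy_filter(position, "Nf3"))    # False (policy too high)
```

Note that condition (b) roughly doubles the engine work per position, matching the computing-time concern above: SF must score the alternatives, not just the played move.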
What's the most common method for measuring sharpness?
Sharpness is the presence of immediate tactical chances in a position, so a sharpness metric must measure the impact of a move: something like one player walking on an edge with more possibilities to make a bad move, while the other has less chance of going wrong because of all the good candidate moves. "Stay sharp" is like "remain on your guard". I believe sharpness is subjective (influenced by personal choices or algorithms), and it's probably biased by engine depth or human strategies.
Lucas Chess shows percentages after an analysis. Can someone explain narrowness compared to sharpness?
@Toscani said in #14:
> What's the most common method for measuring sharpness?
> Sharpness is the presence of immediate tactical chances in a position, so a sharpness metric must measure the impact of a move: something like one player walking on an edge with more possibilities to make a bad move, while the other has less chance of going wrong because of all the good candidate moves. "Stay sharp" is like "remain on your guard". I believe sharpness is subjective (influenced by personal choices or algorithms), and it's probably biased by engine depth or human strategies.
>
> Lucas Chess shows percentages after an analysis. Can someone explain narrowness compared to sharpness?
I don't think sharpness has a concrete definition; it's certainly a human concept that engines don't really understand. I tried in the past to determine sharpness using LC0's WDL output, the probabilities for a win, a draw, and a loss. I interpreted a high win and a high loss percentage at the same time as a sharp position: the position is somewhat balanced, but both sides can win. You can read more here: https://www.chess-journal.com/evaluatingSharpness1.html
I don't know what narrowness is, but I would say an important point about sharpness is that both sides have chances to win; it doesn't say much about the objective evaluation.
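One simple way to turn "high win and high loss percentage at the same time" into a single number (an illustration, not necessarily the formula the blog actually uses) is the harmonic mean of the win and loss probabilities:

```python
def sharpness(w: int, d: int, l: int) -> float:
    """w, d, l: LC0 WDL output in per-mille (w + d + l == 1000).
    Harmonic mean of the win and loss probabilities: high only
    when BOTH sides have real winning chances."""
    pw, pl = w / 1000, l / 1000
    if pw + pl == 0:
        return 0.0
    return 2 * pw * pl / (pw + pl)

print(round(sharpness(450, 100, 450), 3))  # 0.45  (balanced, double-edged)
print(round(sharpness(900, 50, 50), 3))    # 0.095 (one-sided, not sharp)
```

The harmonic mean is dragged toward the smaller of the two probabilities, which matches the intuition that a position where only one side can realistically win is not sharp, however large that side's advantage.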
I have thought about this before: using machine learning to assign exclams automatically.
My idea was to train a model on a lot of games annotated by grandmasters and let it come up with patterns as to what constitutes a brilliant move.
I think this would work very well and give human-looking brilliant moves.
A brilliant move may just be a very good move; I don't think it has to be the best move. It all depends on who won the game with the brilliant move. As soon as two top engines don't agree on the same best move, then which one is brilliant? The engine that wins with the brilliant move. So to determine a brilliant move, I think games need to be analysed by different engines. As soon as a move is not agreed on by the top engines, it might very well fall under brilliancies.
Found some stuff about brilliancies and narrowness.
https://github.com/lukasmonk/lucaschessR2/blob/main/bin/Code/Analysis/AnalysisIndexes.py
https://lucaschess.blogspot.com/2022/10/setting-analysis-parameters.html#:~:text=be%20an%20inaccuracy.-,very_good_depth,-Default%20value%3A%208
https://chessionate.com/lucaswiki/index.php?title=Analyse_games
NARROWNESS: page 7 "... how crowded and/or narrow is a position ..."
https://lucaschess.pythonanywhere.com/static/pdf/english/advanced_info_%20in_LC8.pdf
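The engine-disagreement test described above is straightforward to sketch. The engine names and moves below are illustrative placeholders; in practice each entry would come from a separate engine's analysis of the same position:

```python
def engines_disagree(best_moves: dict) -> bool:
    """best_moves maps engine name -> its preferred move (UCI).
    When the engines split, the position is a candidate spot for a
    brilliancy in the sense above: at least one strong engine does
    not see the played idea as clearly best."""
    return len(set(best_moves.values())) > 1

print(engines_disagree({"stockfish": "e2e4", "lc0": "d2d4"}))  # True
print(engines_disagree({"stockfish": "e2e4", "lc0": "e2e4"}))  # False
```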
@PondSnail said in #16:
Part of that idea is also something I think should be experimented with. However, in the past we often considered (wrongly, it turns out) that learning had to be mostly imitation learning. Chess is a rational and logical game, even beyond our calculation abilities, arising from a microscopically small finite ruleset (micro compared to its combinatorial depth).
The hard part is the "logic" beyond calculation. That is where we might have thought only the expert had some magic touch that could only be approached by imitation, hence learning from masters' games only, etc.
Hence, I think, why ML took a long while coming to chess: because of the databases being used under the conceptual constraint above.
I would revisit the BadGyal experiment from LC0 under that point of view: it improved on an already-trained LC0 network by doing some supervised learning (an approach which used to fail before AlphaZero's RL from almost zero knowledge, if we exclude the finite mobility and legality ruleset). I may misremember the details, but it did show, if A0 (the "zero" in it) had not already shown, that the intuition part of learning (which is likely behind expert observers' annotations of other experts' moves) needs bad chess too, in order to build itself and generalize to unseen games.
So, looping back: yes, good question, but perhaps one needs to be careful about the constituents of the training database. Obviously you need the games with labels (brilliancies), but then you also need a lot of other games, not just by masters. I think the blog is, or could be, developing more than one way to build such a database characterization, if it stays a bit explorative and is not too hasty in attributing names before agreement becomes clear (meaning: keep the assumptions explicit, and the flow of information too, so that the antecedents and the consequences can be shared and reasoned with by others). I might be wrong about that not being the case; if so, keep it up.
I raise the latter concern because, for learning from experts' calls of brilliancy (the good idea here), the question is how we are going to build the ambient context database and the learning framework (unsupervised, supervised, and in between). Can we use other statistical tools, the way the blog has used two types of engines (engines not doing statistics over the same chess world of moves and outcomes, their inputs and outputs not being "trained" the same way, yet each having its own set of goggles)?
But good idea; I was wondering the same. Why not add this variable: the other point of view, what brilliancy is as attributed by humans, characterized over many games and their positions (the nature of which might be part of the research, as I argued, or tried to). Then the most constructive measure vector from the blog (a ceiling on LC0 policy, a floor on SF score, and whatever else I did not get yet) would have an alternate point of view to help the discussion about notions of brilliancy. I find it early to give the two-engine measure exactly the same name as human-attributed brilliancy.
Sorry it takes me so long to make my points. At least all my thinking, right or wrong, is laid out and open to the reader's criticism by being there.
@jk_182 I am writing here about both brilliancy and sharpness, because Twitter wants me to get verified in order to message you on your blog entry.
So first, brilliancy. Very interesting approach. I've also thought a lot about the subject, and I've found it difficult to quantify the "obviousness" of a move. That seems to be what it takes to separate simply good moves from good moves that are also hard to find. But hard to find by whom? SF brute-forces everything. Leela does whatever it thinks works based on previous games. Maia does whatever it thinks people would do. There is also the Lichess database, which could say exactly how many people played a move, but that would only cover openings. And there is the question of level: what is obvious to Carlsen would not be obvious to me.
I would have thought Maia is the obvious choice for the metric, but your approach seems very interesting, perhaps "objectively" better, since Leela seems to use nodes to calculate obviousness, which implies some deeper thought about what someone would do. Perhaps Maia does this too, I don't know. But both are NN-based engines, so would the same system apply to both, just with a different model?
And there is the idea of SF, which we like to think covers all moves, but which is so fast because it prunes the vast majority of them. Could SF's pruning decision tree be used to see how "prunable" a move is, and get a similar result?
Now back to sharpness; I will be quick. It feels to me that sharpness is the opposite of drawing chances plus clear winning chances. If there is no chance of a draw, but White has a 100% chance of winning, it's not sharp. So my formula would be:
1 - (probability of a draw + one side's winning edge) = 1 - (d + |w - l|)
So for your blog's example of WDL [181, 146, 673], sharpness would be 0.362; multiplied by 10 to match your scale: 3.62.
Thus sharpness is a value between 0 and 1, and its maximum is achieved at exactly 50% winning chances for each side, with 0% draws.
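The proposed formula is easy to check against the blog's example (WDL values assumed to be in per-mille, as LC0 reports them):

```python
def sharpness(w: int, d: int, l: int) -> float:
    """Proposed metric: 1 - (d + |w - l|), with WDL in per-mille
    (w + d + l == 1000). High draw odds or a lopsided win/loss
    split both push the value toward 0."""
    return 1 - (d + abs(w - l)) / 1000

# the blog's WDL example [181, 146, 673]:
print(round(sharpness(181, 146, 673), 3))  # 0.362 (x10 on the blog's scale: 3.62)
print(sharpness(500, 0, 500))              # 1.0, the maximum
```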
Yes, SF, show us your entrails, so we can see through your goggles instead of gobbling your "42"-style answer as if from a divine oracle, without knowing what the question was exactly.