[ASoT] Some ways ELK could still be solvable in practice

https://www.lesswrong.com/posts/SbxWdhhwJWCpifTst/asot-some-ways-elk-could-still-be-solvable-in-practice

Editor’s note: I’m experimenting with a lower quality threshold for posting things even while I’m still confused and unconfident about my conclusions, with this disclaimer at the top. This post is a followup to my earlier post. If ELK is impossible in full generality, how could we solve it in practice? Two main ways I can think of:

Comment

https://www.lesswrong.com/posts/SbxWdhhwJWCpifTst/asot-some-ways-elk-could-still-be-solvable-in-practice?commentId=jehbSTiz65CCqJic9

“My intuition about language models is that despite communicating in language, they learn and think in pretty alien ways.” Could you elaborate on why you think this? Recent interpretability work such as Locating and Editing Factual Knowledge in GPT and In-context Learning and Induction Heads has generally made me think language models have more interpretable/understandable internal computations than I’d initially assumed.