[Question] Any work on honeypots (to detect treacherous turn attempts)?

https://www.lesswrong.com/posts/mY7aZSXHpehrfwKn5/any-work-on-honeypots-to-detect-treacherous-turn-attempts

I know the idea of building a "honeypot" to detect whether an AI system would attempt a treacherous turn if given the opportunity has been discussed (e.g., in Superintelligence, IIRC). But is anyone actually working on this? Or has any work on it been published?

Comment

https://www.lesswrong.com/posts/mY7aZSXHpehrfwKn5/any-work-on-honeypots-to-detect-treacherous-turn-attempts?commentId=4T8YgjrHsuDRhqMSn

I don’t know of any serious work on it. I did post an idea regarding honeypots a little while ago here.