[Personal Experiment] Training YouTube’s Algorithm

https://www.lesswrong.com/posts/TGAN2MZXAufvx8ksg/personal-experiment-training-youtube-s-algorithm

I listen to YouTube music every day. I created a new Google account three weeks ago. I've watched nothing but music and dancing on it. Whenever it recommends a non-music, non-dancing video I immediately click "Don't recommend channel"[1].

If you opened my YouTube account right now you'd see 509 recommendations on the home page. 487 would be for music and dancing; 22 would be things I'm not interested in. Below are YouTube's 22 non-musical recommendations:

Most of these are understandable personalization errors. The violin unboxing makes sense because I listen to lots of violin music. The Star Wars and anime stuff makes sense because I watch "Star Wars Anime Opening" videos. "不要随便让老师唱歌跳舞..." ("Don't casually ask the teacher to sing and dance...") is dancing-related. I think the movie trailers come from the fact that some of the music videos I watch take the form of movie clips. I was surprised by the science recommendations until I realized it's not difficult to infer that someone who listens to AIVA-generated music over and over again (among similar things) might be interested in science. The wedding highlight film could be related to the wedding dance videos I enjoy.

The videos in Chinese, Spanish and Japanese make sense because I listen to music in those languages. When I first created my account it recommended a lot of news, video games and comedy (among other things). The foreign-language news, video games and comedy are generic recommendations for speakers of those languages. If I click "Don't recommend channel" enough times they'll probably go away, like they (kinda) do in English. I don't know exactly[2] where the French and Filipino videos came from but I expect the same goes for them too.

This leaves 5 truly impersonal recommendations[3].
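Putting the counts together (a quick tally in Python, using only the numbers above):

```python
# Tally of the home-page recommendations described above.
total = 509                   # recommendations on the home page
music = 487                   # music and dancing (on target)
off_target = total - music    # 22 non-music recommendations
explainable = 17              # off-target but traceable to my watch history
impersonal = off_target - explainable  # 5 truly impersonal recommendations

print(f"on-target rate:  {music / total:.1%}")       # 95.7%
print(f"impersonal rate: {impersonal / total:.1%}")  # 1.0%
```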

To summarize:

Comment

https://www.lesswrong.com/posts/TGAN2MZXAufvx8ksg/personal-experiment-training-youtube-s-algorithm?commentId=37c7RrpDMSgvnDAYk

It seems likely that the algorithms are coded (probably outside the ML portions) to always give some amount of random-appearing stuff, in order to avoid becoming too narrow in their suggestions.
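If that's right, the mechanism could be as simple as epsilon-greedy mixing: each home-page slot gets a random video with some small probability. A minimal sketch of the idea (illustrative only; the function and pool names are invented, not YouTube's actual pipeline):

```python
import random

def fill_homepage(personalized, random_pool, n_slots=509, epsilon=0.01):
    """Fill recommendation slots mostly from the personalized ranker,
    but with probability `epsilon` per slot insert a random video so
    the feed never narrows to a single topic. Assumes `personalized`
    holds at least `n_slots` candidates.
    """
    page = []
    for _ in range(n_slots):
        if random_pool and random.random() < epsilon:
            page.append(random.choice(random_pool))  # exploration slot
        else:
            page.append(personalized.pop(0))         # exploitation slot
    return page
```

With epsilon around 0.01 and 509 slots you'd expect roughly five random slots, which is in the same ballpark as the five impersonal recommendations above.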

Comment

https://www.lesswrong.com/posts/TGAN2MZXAufvx8ksg/personal-experiment-training-youtube-s-algorithm?commentId=wGsRi6YRqFAs7vXir

Most of the weirder suggestions happen later in the recommendations (lower down on the page). I think the algorithm thinks to itself "The user appears tired of music and probably wants to watch something else".
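If so, one way that could be implemented is a per-slot exploration probability that ramps up with position, keeping the top of the page safe and letting the tail get experimental. A sketch of that variant (again purely illustrative, not YouTube's actual logic):

```python
import random

def slot_is_exploratory(rank, n_slots=509, base=0.002, max_p=0.05):
    """Decide whether the slot at `rank` (0 = top of page) should show
    an off-profile video. The probability ramps linearly from `base`
    at the top to `max_p` at the bottom, so the weird suggestions
    cluster lower down the page.
    """
    p = base + (max_p - base) * rank / (n_slots - 1)
    return random.random() < p
```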

https://www.lesswrong.com/posts/TGAN2MZXAufvx8ksg/personal-experiment-training-youtube-s-algorithm?commentId=uG7tWDpPHzMs3EnEg

That seems silly, given the money on the line and that you can have your ML architecture take this into account.
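For what it's worth, "take this into account" in the architecture usually means letting the ranker own its exploration, e.g. scoring items by predicted engagement plus an uncertainty bonus (UCB-style) rather than bolting on hard-coded randomness. A toy sketch (illustrative; a production system would use learned uncertainty estimates, not per-video counts):

```python
import math

def ucb_score(mean_reward, times_shown, total_shown, c=1.0):
    """Upper-confidence-bound score: predicted engagement plus an
    uncertainty bonus. Rarely shown videos get a larger bonus, so
    exploration emerges from the ranking itself instead of from
    injected randomness.
    """
    if times_shown == 0:
        return float("inf")  # always try a video at least once
    bonus = c * math.sqrt(math.log(total_shown) / times_shown)
    return mean_reward + bonus
```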