The idea of treating recommendation as reasoning over a similarity graph is quite widespread. Broadly speaking, you can start by drawing a set of circles (for users) on the left and a set of circles (for “items”: songs, movies, and so on) on the right; whenever a user rates or listens to an item, you draw an arrow from the corresponding left circle to the right circle, which gives you a bipartite graph. What collaborative filtering algorithms can do is project this two-sided graph onto two one-sided graphs: one where users are linked to other users, and one where items are linked to other items, with each link weighted by how similar the two ends are.
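To make the projection step concrete, here is a minimal sketch in Python. The toy `ratings` data and the helper names `invert` and `project` are made up for illustration, and cosine similarity over binary "has rated" sets is just one assumed choice of measure; real collaborative filtering systems use many others.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each user maps to the set of items they rated.
ratings = {
    "alice": {"song_a", "song_b"},
    "bob": {"song_b", "song_c"},
    "carol": {"song_a", "song_b", "song_c"},
}


def invert(edges):
    """Flip the bipartite edges so that each item maps to its set of users."""
    flipped = {}
    for left, rights in edges.items():
        for right in rights:
            flipped.setdefault(right, set()).add(left)
    return flipped


def project(edges):
    """Project one side of a bipartite graph onto itself.

    Two nodes are linked when they share at least one neighbour. The link
    weight is the cosine similarity of their binary neighbour sets
    (an assumed measure, chosen here for simplicity).
    """
    graph = {}
    for a, b in combinations(sorted(edges), 2):
        shared = len(edges[a] & edges[b])
        if shared:
            graph[(a, b)] = shared / sqrt(len(edges[a]) * len(edges[b]))
    return graph


user_graph = project(ratings)          # users linked to users
item_graph = project(invert(ratings))  # items linked to items

for pair, weight in sorted(user_graph.items()):
    print(pair, round(weight, 3))
```

Swapping the weighting inside `project` for a different measure would generally produce a different graph from the same ratings.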
There are a bunch of places where this kind of abstraction has been used; for example, Oscar Celma used graphs to guide users as they discover music in the long tail, and Paul Lamere posted graphs made with the EchoNest API on his blog. I’ve also dabbled in this area a bit, though not with music listening data; I was using the (more traditional) MovieLens and Netflix datasets. The question that comes to mind when reading about techniques that operate on the graph, though, is: are the underlying graphs real representations of similarity between content? What if the graphs are wrong?