Archive for the ‘methodology’ Category

Similarity Graphs

Thursday, February 26th, 2009

Thinking about what content to recommend in terms of a similarity graph is a widespread idea. Broadly speaking, you can start by drawing a set of circles on the left for users and a set of circles on the right for “items” (songs, movies, ...). Whenever a user rates, listens to, or otherwise interacts with an item, you draw an arrow from that user's circle to the item's circle. The result is a bipartite graph. What collaborative filtering algorithms can do is project this two-sided graph onto two one-sided representations: one where users are linked to other users, and one where items are linked to other items, based on how similar they are.
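To make the projection concrete, here is a minimal sketch of turning the user-to-item graph into an item-to-item similarity graph. It assumes ratings are stored as a plain dict of `user -> {item: rating}` and uses cosine similarity over co-rating users as the edge weight. Both the data layout and the choice of cosine similarity are illustrative assumptions, and `item_similarity_graph` is a hypothetical helper, not a method from any of the work mentioned here.

```python
from collections import defaultdict
from math import sqrt


def item_similarity_graph(ratings, threshold=0.0):
    """Project a user->item bipartite graph onto an item->item graph.

    ratings: dict mapping user -> {item: rating} (assumed layout).
    Returns: dict mapping item -> {neighbour_item: cosine_similarity},
    keeping only edges whose similarity exceeds `threshold`.
    """
    # Invert the edges: item -> {user: rating} (the right-hand circles).
    item_users = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            item_users[item][user] = rating

    # Vector length of each item's rating column, used by cosine similarity.
    norms = {
        item: sqrt(sum(r * r for r in users.values()))
        for item, users in item_users.items()
    }

    graph = defaultdict(dict)
    items = sorted(item_users)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            # Two items are only connected through users who rated both.
            common = item_users[a].keys() & item_users[b].keys()
            if not common or norms[a] == 0 or norms[b] == 0:
                continue
            dot = sum(item_users[a][u] * item_users[b][u] for u in common)
            sim = dot / (norms[a] * norms[b])
            if sim > threshold:
                # Similarity is symmetric, so store the edge both ways.
                graph[a][b] = sim
                graph[b][a] = sim
    return dict(graph)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    ratings = {
        "alice": {"song_a": 5, "song_b": 3},
        "bob": {"song_a": 4, "song_b": 4, "song_c": 1},
        "carol": {"song_c": 5},
    }
    print(item_similarity_graph(ratings, threshold=0.5))
```

Because the function only assumes "left-hand circles map to right-hand circles", passing it the inverted dict (`item -> {user: rating}`) yields the other projection, a user-to-user graph. The `threshold` parameter is one simple way to prune weak edges; keeping only the top-k neighbours per item is another common choice.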

There are a number of places where this kind of abstraction has been used; for example, Oscar Celma used graphs to help users navigate the long tail when discovering music, and Paul Lamere posted graphs made with the EchoNest API on his blog. I’ve also dabbled in this area a bit, though not with music listening data; I used the (more traditional) MovieLens and Netflix datasets. The question that comes to mind when reading about techniques that operate on these graphs, though, is: are the underlying graphs real representations of similarity between content? What if the graphs are wrong?

Media Slant

Friday, November 28th, 2008

Say that you have to answer this research question: “Does the market discourage biased reporting (media slant), or does it encourage it?” Matthew Gentzkow and Jesse Shapiro, two economists at the University of Chicago’s business school, set out to test this, and The Economist reported on their research in just two pages. Gentzkow and Shapiro’s methodology is smart and imaginative! Here you go:
