Private Collaborative Filtering

Recently, I submitted my first paper to RecSys 07, “Private Distributed Collaborative Filtering using Estimated Concordance Measures.” Even though it is not particularly about mobile-stuff, here’s a quick run through the main ideas:

Collaborative filtering is a means of using a community’s behavior, within a certain domain (movies, music), to support reducing the amount of information that each individual needs to looks through to find their items of interest. It is the dominant method behind recommender systems (such as Amazon, etc), and is based on a simple idea: people with previous shared interests will, most likely, share common likes and dislikes in the future. So, to predict how much I will like a certain item, the system first compares my rating history to all the other users to produce similarity measures, and then uses these similarity measures to compute a weighted average of how much they enjoyed the item in question.

The problem is that this method allows for no privacy. Particularly in a distributed environment, where I do not know how much to trust unknown neighbors, I do not want to have to share my entire rating history (i.e. my profile) with them, to find a similarity measure- thus discouraging cooperation, which is harmful to collaborative filtering. The actual similarity measures (the most famous being the Pearson correlation coefficient), simply cannot be found without full profile disclosure.

Therefore, in the paper we proposed a new similarity measure, based on concordance. You and I rate a movie concordantly if we both rate it above or below our mean rating (in other words, we agree about whether to give it thumbs up/thumbs down). If you hated the movie (and rated it below your mean), and I loved it (rating it above my mean), then we disagree- the ratings are discordant. If one of us has no opinion about the movie, we just say it is tied. The new similarity measure is derived from the number of concordant (C), discordant (D), tied (T) pairs between our rating sets, and the size of the set N, and this similarity measure works just as effectively (in terms of generating recommendations) as the Pearson correlation coefficient.

So how can this be used to include privacy in collaborative filtering? If you and I share a common randomly-generated set, and report to each other the number of C, D, and T pairs we have with the random set to each other, these values can be used to place bounds on the actual values of C, D, and T pairs we have with each other: we can estimate our similarity without ever sharing any profile-specific information, only sharing abstracted profile information derived from a comparison with a random set. Privacy is not breached, and, along with an incremental learning technique (future work) about how to evolve the similarity measure between recommenders, we can start collaborating with each other!

If you’re interested in the details (and the evaluation), I’ll post the paper on my web site soon (when I hear back from RecSys!)

8 Responses to “Private Collaborative Filtering”

  1. Last week, IEEE Intelligent Systems published a special issue on recommender system:

    Recommender Systems
    Acquiring and Revising Preferences in a Critique-Based Mobile Recommender System
    Collaborative Filtering Using Dual Information Sources
    Informed Recommender: Basing Recommendations on Consumer Product Reviews
    New Recommendation Techniques for Multicriteria Rating Systems
    Attacks and Remedies in Collaborative Recommendation
    Attacking Recommender Systems: A Cost-Benefit Analysis
    Comparing Recommendation Strategies in a Commercial Context
    MarCol: A Market-Based Recommender System

  2. Forgot to say: papers can be found through ieeexplorer ;-)

  3. mike says:

    That’s fantastic – like the dining cryptographers protocol for recommender systems!

  4. [...] and will also attempt to suggest new research directions (such as private collaborative filtering – post & [...]

  5. [...] dataset. The work was a direct follow up to the new similarity measure I wrote about in my previous post on privacy in distributed recommender systems, and begins by reshaping the way we think about (and [...]

  6. mr-yellow says:

    Why do all the academic types that deal with semantic web and collaborative filtering type topics all post their information in the information vault at IEEE where the world can neither access or learn from their discoveries and thoughts. Why is this information not shared with the world?

  7. CS academics don’t usually have any choice other than publishing in conferences sponsored by IEEE or ACM. Other publishers (eg, Elsevier)? It just gets worse:

    However, many academics publish their papers as technical reports. For example, UCL academics do so using eprints

  8. Neal Lathia says:

    Mr. Yellow- you have a good point. It also affects us academic types sometimes.. I’ve had trouble accessing papers that I would like to read. However, I’ve found that (regardless of the “vault” that the paper is locked in) if you write to the author they will most likely send you a copy of their work..

    (ps. the full paper for this post is on my web site)