Archive for the ‘recommendations’ Category

Private Collaborative Filtering

Thursday, June 7th, 2007

Recently, I submitted my first paper to RecSys 07, “Private Distributed Collaborative Filtering using Estimated Concordance Measures.” Even though it is not particularly about mobile-stuff, here’s a quick run through the main ideas:

Collaborative filtering is a means of using a community’s behavior, within a certain domain (movies, music), to support reducing the amount of information that each individual needs to looks through to find their items of interest. It is the dominant method behind recommender systems (such as Amazon, etc), and is based on a simple idea: people with previous shared interests will, most likely, share common likes and dislikes in the future. So, to predict how much I will like a certain item, the system first compares my rating history to all the other users to produce similarity measures, and then uses these similarity measures to compute a weighted average of how much they enjoyed the item in question.

The problem is that this method allows for no privacy. Particularly in a distributed environment, where I do not know how much to trust unknown neighbors, I do not want to have to share my entire rating history (i.e. my profile) with them, to find a similarity measure- thus discouraging cooperation, which is harmful to collaborative filtering. The actual similarity measures (the most famous being the Pearson correlation coefficient), simply cannot be found without full profile disclosure.

Therefore, in the paper we proposed a new similarity measure, based on concordance. You and I rate a movie concordantly if we both rate it above or below our mean rating (in other words, we agree about whether to give it thumbs up/thumbs down). If you hated the movie (and rated it below your mean), and I loved it (rating it above my mean), then we disagree- the ratings are discordant. If one of us has no opinion about the movie, we just say it is tied. The new similarity measure is derived from the number of concordant (C), discordant (D), tied (T) pairs between our rating sets, and the size of the set N, and this similarity measure works just as effectively (in terms of generating recommendations) as the Pearson correlation coefficient.

So how can this be used to include privacy in collaborative filtering? If you and I share a common randomly-generated set, and report to each other the number of C, D, and T pairs we have with the random set to each other, these values can be used to place bounds on the actual values of C, D, and T pairs we have with each other: we can estimate our similarity without ever sharing any profile-specific information, only sharing abstracted profile information derived from a comparison with a random set. Privacy is not breached, and, along with an incremental learning technique (future work) about how to evolve the similarity measure between recommenders, we can start collaborating with each other!

If you’re interested in the details (and the evaluation), I’ll post the paper on my web site soon (when I hear back from RecSys!)

Vehicles and pervasive computing

Sunday, May 27th, 2007

As you might now I attended PerCom a couple of months ago.

During the panel discussion, there was an interesting conclusion that vehicles are the most pervasive devices available today.
If you think about it vehicles contain a large number of sensors:

  • Speed
  • Acceleration
  • Yaw, G sensors (for ESP)
  • Temperature (environment, engine, tires)
  • Light sensors (automatic lights etc)
  • Fuel consumption, oxygen level, CO2 levels
  • GPS, Navigation system, maps etc.
  • Noise (to automatically regulate car radio volume)

Furthermore, modern vehicles have a number of ways to communicate (FM Radio (RDS,TMC), Bluetooth, GSM, soon 802.11n (WAVE) ). In my opinion, all these features already constitute vehicles as mobile sensor platforms. By exploiting already available sensors we can design numerous applications:

  • Road traffic monitoring (using acceleration and speed sensors)
  • Distributed pollution and temperature monitoring
  • Parking information
  • Formation of platoons of vehicles (e.g. maximize road capacity)
  • Dissemination of warning information
  • Accident avoidance (e.g. break when approaching a red light fast)
  • Visual enhancement (e.g. provide information on the windshield about traffic ahead, red light warning).
  • Landmarks/advertisement
  • Communication between vehicles (voice/file sharing etc)

All these application can be implemented either centrally (e.g. using GSM) or in an Ad-Hoc manner. Although the first approach is more reliable and fast it has some disadvantages:

  • Centralized data may be outdated and the response time may not meet the real-time requirements. Especially if we want local information (like “is the traffic light ahead red?”, “is there a parking spot within 100m?”)
  • Current centralized communication solutions (GSM, WiMAX) may not be able to handle the burden of real-time monitoring of hundreds of thousands of vehicles. These services are allready congested with million of mobile phone users.
  • Infrastructure could be quite expensive, especially if the area to be covered is large. Furthermore, infrastructure may not be available, especially in remote and isolated areas (you haven’t been on mountains in Greece ;) ).
  • Ad-hoc service is free and it can provide more concentrated local information (e.g. advertisements etc)

However, there are a lot of research issues in order to implement all these Ad-Hoc applications:

  • We need robust routing protocols that work both in dense urban areas and sparser areas (e.g. DTN).
  • We need dissemination protocols that take into account the interest of the vehicles/drivers to avoid flooding the drivers with unnecessary information and cause congestion to the network
  • We need MAC protocols that are able to deliver information to high-speed moving vehicles.
  • We need extremely robust trust and security mechanisms because there are human lives at risk!!

My last year’s research was manly focused on two areas:
How to exploit the navigation system of the car to route and disseminate messages, and how to use the Publish/Subscribe communication paradigm to achieve that.

Navigation systems are becoming more and more popular. A part from navigation suggestions, navigation systems provide valuable information that is currently not generally used. First of all, the GPS unit provides the vehicle’s geographical location. Furthermore, the NS map provides various geographical information: street names and numbers, location of points of interest (like fuel stations, train stations, etc), kilometer ranges, etc. This information may be extended to include the location of known infostations so that the vehicle can geographically route information to them.

At the same time, the most important information that the NS provides is the suggested route of the vehicle: This information 1)makes the mobility patterns of the vehicles more predictable. Additionally, 2)it can be exploited to extract interests, which can be used to design efficient routing and dissemination protocols and to filter information that is only relevant to the driver without his/her intervention (for example an accident warning affects only vehicles that will drive near the accident).

Therefore, in the last months I examined how we exploit the navigation systems (suggested routes to predict mobility patterns and to extract subscriptions) in order to design a vehicular routing algorithm and a Publish/Subscribe system.

You can find the results here http://www.cs.ucl.ac.uk/staff/I.Leontiadis/pubs.html
There is also some additional work on Pub/sub issues but we have to wait the reviewers first :)

Ilias Leontiadis

Recommendation Software

Tuesday, November 14th, 2006

US online DVD rental service Netflix Inc has announced a version of the Longitude prize for film geeks – a $1m (£529.6m) bounty to the first person to develop software to improve its movie recommendation system by 10%.
The full story is posted in the Guardian and on the NetFlix Prize site

The current system works by making predictions based on correlations between user feedback (as described in this interview).