Archive for the ‘prediction’ Category

Enhancing Mobile Recommender Systems with Activity Inference

Thursday, July 2nd, 2009

Daniele briefly blogged here about this interesting paper by Kurt Partridge and Bob Price, for which I will give a longer review. Some of the techniques used in the paper could be useful for further research, and even its limitations are an interesting subject of analysis.

Given that today's Mobile Leisure Guide Systems require a large amount of user interaction (for configuration and preferences), this paper proposes to integrate current sensor data, models built from historical sensor data, and user studies into a framework able to infer high-level user activities, in order to improve recommendations and reduce the number of tasks the user has to perform.

The authors claim to address the lack of situational user preferences by interpreting multidimensional contextual data through a categorical variable that represents high-level user activities such as "EAT", "SHOP", "SEE", "DO", and "READ". A prediction is, of course, a probability distribution over the possible activity types.

Recommendations are provided through a thin client supported by a backend server. The following techniques are employed to produce a prediction:

  • Static prior models
    • PopulationPriorModel: based on time of day, day of the week, and current weather, according to studies of typical activities from the Japan Statistics Bureau.
    • PlaceTimeModel: based on time and location, using hand-constructed data collected from a user study.
    • UserCalendarModel: provides a likely activity based on the user’s appointment calendar.
  • Learning models
    • LearnedVisitModel: tries to predict the user’s intended activities from time of day, learning from the user’s contextual data history. A Bayesian network is employed to calculate the activity probability given location and time.
    • LearnedInteractionModel: constructs a model of the user’s typical activities at specific times, by looking for patterns in the user’s interaction with his/her mobile device.

Activity inferences are made by combining the predictions from all five models, using a geometric combination of their probability distributions.
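
To make the combination step concrete, here is a minimal sketch (in Java) of a geometric combination of several models' probability distributions; the activity labels and the assumption that every model reports a probability for every label are mine, not the authors'.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A minimal sketch (not the authors' code) of combining several activity predictors
 *  by taking the geometric mean of their probability distributions. */
public class GeometricCombiner {

    /** Each map is one model's distribution over activity labels (e.g. EAT, SHOP, SEE, DO, READ).
     *  Assumes every model reports a probability for every label. */
    public static Map<String, Double> combine(List<Map<String, Double>> modelOutputs) {
        Map<String, Double> logSums = new HashMap<>();
        for (Map<String, Double> dist : modelOutputs) {
            for (Map.Entry<String, Double> e : dist.entrySet()) {
                // Multiply probabilities across models; work in log space for numerical stability.
                logSums.merge(e.getKey(), Math.log(Math.max(e.getValue(), 1e-12)), Double::sum);
            }
        }
        // Geometric mean, then renormalise so the result is a probability distribution again.
        double n = modelOutputs.size();
        Map<String, Double> combined = new HashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> e : logSums.entrySet()) {
            double p = Math.exp(e.getValue() / n);
            combined.put(e.getKey(), p);
            total += p;
        }
        for (Map.Entry<String, Double> e : combined.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return combined;
    }
}
```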

A query context module feeds the activity prediction module with a prediction of the context the user is actually interested in. For example, the user could be at work when searching for a restaurant, while the relevant context is the downtown area where he/she plans to go for dinner.

The authors carried out a user study evaluating the ability of each model to provide accurate predictions. Eleven participants carried the device for two days and were rewarded with cash discounts for leisure activities they engaged in while using the device. The Query Context Prediction module was not enabled because of the study's short duration. Results show high accuracy (62% for the baseline "always predict EAT", 77% for the PlaceTimeModel).

Some strengths and weaknesses of this paper

  • the prediction techniques used are interesting and could be applied to other domains; moreover, I think it is useful to combine data from user studies with learning techniques, as user profiling helps developers (and applications) understand users in general before applying this knowledge to a specific user
  • the sample size makes the user study flawed: 11 participants carrying devices for 2 days approaches statistical insignificance; the weekday/weekend distinction is the first issue that comes to mind, just to mention one
  • offering cash discounts for leisure activities is presumably not the right form of reward for this kind of study, as it makes users more willing to engage in activities that require spending money rather than free ones (e.g. EAT vs. SEE)
  • the authors admit that their RS database contains mostly restaurants, which I think is not taken sufficiently into account when claiming high accuracy. Given that the baseline predictor reaches 62% accuracy by always predicting EAT, a deeper analysis would have made the paper more scientific
  • one of the most interesting contributions of the paper is the definition of the query context module, which is unfortunately not employed in the study for reasons related to its short duration. Again, a better-designed user study would have avoided this problem. I question whether it is worth carrying out user studies when resources are so limited that statistical significance becomes questionable. However, there is some attempt to discuss expected context vs. actual context which is potentially very interesting: e.g., a user wants to SHOP but the shops are closed, so he/she EATs. It would be interesting to discuss how a RS should react to such situations
  • user-interaction issues: the goal of the presented system is to reduce the user's tasks on the mobile device; yet interaction is needed to tune the system and correct its mistakes, and one of the predictors uses exactly the user's interaction with the device as a parameter. It looks like there is some confusion about the role of user interaction in this kind of system (imho, an HCI approach could improve RS usability and, consequently, accuracy)
  • the system is not well suited to multi-purpose trips (e.g. one EATs whilst DOing, or alternatively SHOPs and EATs), and in such cases predictions are mostly incorrect.

Temporal Collaborative Filtering

Tuesday, April 28th, 2009

As part of my recent work on collaborative filtering (CF), I’ve been examining the role that time plays in recommender systems. To date, the most notable use of temporal information (if you’re familiar with the Netflix prize) is that researchers are using time(stamps) to inch their way closer to the million dollar reward. The idea is to use how user ratings vary according to, for example, the day of the week on which they were entered, in order to better predict the probe (and, more importantly, the qualifying) datasets. I suppose my only criticism here is that once the million dollars has been won, nobody is going to implement and deploy this aspect of the algorithm (unless you are prepared to update your recommendations every day?), since, in practice, we do not know when users are going to rate items.
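
To make the day-of-week idea concrete, here is a toy sketch (certainly not any prize team's actual model) of learning a per-day-of-week rating bias from timestamped ratings; the Rating record is an assumed input format.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** A toy illustration of a temporal effect: how far ratings entered on each
 *  day of the week deviate, on average, from the global mean rating. */
public class DayOfWeekBias {

    /** Assumed input format: a rating value plus the Unix timestamp at which it was entered. */
    public record Rating(double value, long timestampSeconds) {}

    public static Map<DayOfWeek, Double> learnBiases(List<Rating> ratings) {
        double globalMean = ratings.stream().mapToDouble(Rating::value).average().orElse(0.0);
        Map<DayOfWeek, Double> sum = new EnumMap<>(DayOfWeek.class);
        Map<DayOfWeek, Integer> count = new EnumMap<>(DayOfWeek.class);
        for (Rating r : ratings) {
            DayOfWeek day = Instant.ofEpochSecond(r.timestampSeconds())
                                   .atZone(ZoneOffset.UTC).getDayOfWeek();
            sum.merge(day, r.value() - globalMean, Double::sum);
            count.merge(day, 1, Integer::sum);
        }
        Map<DayOfWeek, Double> bias = new EnumMap<>(DayOfWeek.class);
        sum.forEach((day, s) -> bias.put(day, s / count.get(day)));
        // A baseline predictor could add bias.get(day) for ratings expected on that day,
        // which of course only helps if you know when the rating will be made.
        return bias;
    }
}
```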

In the context of my work, I’ve been looking at two areas: the effect of time on (1) the similarity between users (RecSys ’08), and (2) the recommender system itself. Here’s a brief summary of (2): (more…)

Tutorials at RecSys 2008

Friday, October 24th, 2008

Yesterday was the first day of RecSys 2008, and was dedicated to three very interesting tutorials:

1. Robust Recommender Systems. Robin Burke introduced the wide range of attacks that typical collaborative filtering algorithms are vulnerable to; scenarios that arise when people attempt to force, rather than express, opinions. An attack was strictly defined as a set of profiles intended to obtain excessive influence over others, which can be aimed at pushing (making an item more likely to be recommended) or nuking (making it less likely) items. His talk was an interesting blend of attack strategies, the knowledge that attackers need to have, and a high-level description of approaches aimed at preventing attacks or fixing the system once attacked. Of course, there are strong overlaps between this work and work in other areas (p2p trust, adversarial information retrieval, search engine spam..); I particularly like this area as it pushes the point that recommender systems are about people and dynamic datasets, and not just prediction.

2. Recent Progress in Collaborative Filtering. Yehuda Koren (who has recently moved from AT&T to Yahoo! Research) gave a tutorial about the leading approaches in the Netflix prize competition. The techniques he described blend matrix factorisation and neighbourhood models, and include a number of other factors (such as user biases and time), resulting in techniques with multiple billions of parameters (and the current ranking of team BellKor in the competition). His work is remarkable and worthy of the progress prizes he has been awarded thus far. He also explored alternative ways of evaluating recommender systems, explaining his take on evaluating top-N recommendation lists.

3. Context-Aware Recommendations. Gedas Adomavicius and Alex Tuzhilin introduced their work on incorporating context into recommender systems, including pre-, post-, and hybrid-filtering of recommendation algorithm results based on user context. A running example repeated throughout the tutorial was going to the theatre with your girlfriend on the weekend: if you always watch comedy, then your recommendations can be filtered to match what you did in previous instances of the same context (i.e. you can be recommended comedy). They have done a lot of cool work on multi-dimensional recommenders, extending the common rating scales into cubes of ratings, and stressed more than once that this is virgin territory. Their work is also impressive, but it raised a few questions. For example, should context be described by a well-enumerated taxonomy? Moreover, if you always watch comedy at the theatre with your girlfriend on weekends, then why would you need a recommender system in the first place (especially a collaborative one: what happened to serendipity or diversity)? They have a number of papers that are worth reading before trying to answer these questions!

Filtering By Trust

Thursday, February 28th, 2008

A lot of the posts on this blog discuss the idea of computers using trust in different scenarios. Since starting as a research student, and reading a lot of good papers about trust, I still find the idea slightly strange: is trust something that can actually be quantified? Are these computers actually reasoning about trust? (what is trust?)

Trust, however, seems to have a place in our research. It gives us a metaphor to describe how interactions between “strangers” should happen, a platform to build applications that feed off these ideas, and a language to describe and analyse the emergent properties that we observe in these systems. So, even if a trust relationship may be a concept that cannot be boiled down to a deterministic protocol, it gives researchers a model of the world they are exploring.

I decided to apply this metaphor to collaborative filtering, an algorithm that aims to generate recommendations by assigning neighbours to each user. Usually this assignment is done by computing similarity, which has its downfalls: how similar are two people who have never rated any items in common? What is the best way of measuring similarity? Applying the metaphor of trust instead aims at capturing the value that each neighbour adds to a user’s recommendations over time, where value is derived not only from agreement but also from a neighbour’s ability to provide information. While similarity-based algorithms use neighbour ratings directly, interactions based on trust acknowledge that information received from others is subject to interpretation, as there may be a strong semantic distance between the ways two users apply ratings to items.
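
To make the metaphor slightly more concrete, here is a rough sketch of a per-neighbour trust value that grows or shrinks with observed agreement over time; the update rule, learning rate, and rating range are illustrative assumptions of mine, not the formulation from the paper mentioned below.

```java
import java.util.HashMap;
import java.util.Map;

/** A loose illustration of the trust metaphor: value a neighbour by how useful
 *  their information has been over time. Not the paper's actual model. */
public class NeighbourTrust {

    private final Map<String, Double> trust = new HashMap<>();  // neighbourId -> trust in [0, 1]
    private static final double LEARNING_RATE = 0.1;            // assumed constant
    private static final double RATING_RANGE = 4.0;             // e.g. a 1..5 star scale

    public double trustOf(String neighbourId) {
        return trust.getOrDefault(neighbourId, 0.5);             // start from indifference
    }

    /** After observing the user's true rating of an item, reward neighbours whose
     *  (interpreted) rating was close to it and penalise those who were far off. */
    public void update(String neighbourId, double neighbourRating, double actualRating) {
        double agreement = 1.0 - Math.abs(neighbourRating - actualRating) / RATING_RANGE;
        double updated = (1 - LEARNING_RATE) * trustOf(neighbourId) + LEARNING_RATE * agreement;
        trust.put(neighbourId, updated);
    }
}
```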

Trust is a richer concept than similarity: it favours neighbours who can give you information about the items you seek, and it offers a means of learning to interpret the information received from them. Evaluation shows this technique improves on the basic similarity-driven algorithms in terms of both accuracy and coverage: modelling the problem as an instance of a trust-management scenario seems to offer an escape from the traditional pitfalls of similarity-based approaches. I leave the details to the paper, “Trust-based Collaborative Filtering,” which is due to appear at IFIPTM 2008 (thanks to Daniele for his great comments and feedback).

Netflix: Winners and Losers

Friday, December 14th, 2007

By now, the news has spread around that team BellKor has won this year’s Netflix progress prize ($50,000) by achieving an 8.43% improvement over Cinematch (the grand prize is less than 2% away). Their current solution is described here; and perhaps the most interesting thing about it is the first sentence. “Our final solution (RMSE=0.8712) consists of blending 107 individual results.” It is also interesting to note that the second place on the leader board is team KorBell; I assume that this is because Netflix has restricted each team to one submission per day.

A natural question to ask, therefore (other than the one about how many teams may have multiple names and may be trying to infer what the qualifying ratings actually are), is whether this race for accuracy is developing methods that are perfectly suited to the qualifying data but not necessarily to the rest: it becomes a problem of overfitting. To quote Wikipedia, a brute-force approach aimed at accuracy could develop a method that “reduces or destroys the ability of the model to generalize beyond the fitting data!” In other words, once they unleash the winning algorithm on the rest of their data, will they maintain the 10% improvement over Cinematch?

My work in recent weeks has been following up on a previous paper by exploring the (lack of) information that a lonely RMSE or MAE can give us about how well collaborative filtering is performing: we know nothing about how much the predictions are dispersed around the mean or how the error evolves over time, and we are not considering a number of other aspects that should be close to our hearts. More coming up on that soon. But in the meantime, I made my first submission to the Netflix prize site to see how well the Java random number generator would perform. My incredible predictions were made using nextInt(5)+1. I achieved an RMSE of 1.93, and hopefully no team has performed worse than me.

Just out of curiosity, I got RMSE 1.92 on the probe set using the same technique; I haven’t read anywhere about the extent to which the probe set gives a good indication of qualifying performance. Further predictions on the probe set, based on a random number between 3 and 5, or (nextDouble()*2) + 3 (since the rating distribution is skewed towards the positive end in these datasets), improved my losing streak to RMSE 1.31. Lastly, simply returning the average rating for each movie gets RMSE 1.13. So if anyone out there is doing this well with crazy matrix operations, you might want to rethink your strategy :)
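
For anyone who wants to reproduce this losing streak, here is roughly what these throwaway baselines amount to (a sketch only: the RMSE helper assumes you hold the true ratings, which you obviously do not for the qualifying set).

```java
import java.util.List;
import java.util.Random;

/** A sketch of the throwaway baselines described above, plus the RMSE they are judged by. */
public class RandomBaseline {

    public static double rmse(List<Double> predicted, List<Double> actual) {
        double sumSquaredError = 0.0;
        for (int i = 0; i < predicted.size(); i++) {
            double err = predicted.get(i) - actual.get(i);
            sumSquaredError += err * err;
        }
        return Math.sqrt(sumSquaredError / predicted.size());
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Uniform integer rating in 1..5, as in the first submission (RMSE around 1.9).
        double uniformGuess = rng.nextInt(5) + 1;
        // Skewed guess in [3, 5), exploiting the positive skew of the rating distribution.
        double skewedGuess = rng.nextDouble() * 2 + 3;
        System.out.println(uniformGuess + " " + skewedGuess);
    }
}
```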

Measurement and Analysis of Online Social Networks

Wednesday, November 14th, 2007

At IMC, the first study to examine multiple online social networks at scale was presented. The paper analyzes “data gathered from four popular online social networks: Flickr, YouTube, LiveJournal, and Orkut”.

Findings

  • “the indegree of user nodes tends to match the outdegree;
  • the networks contain a densely connected core of high-degree nodes;
  • this core links small groups of strongly clustered, low-degree nodes at the fringes of the network”.

Implications on info dissemination and search

  • “The existence of a small, well-connected core implies that information seeded via a core node will rapidly spread through the entire network.”
  • “Similarly, searches that proceed along social network links will quickly reach the core. This suggests that simple unstructured search algorithms could be designed if the core users were to store some state about other users.”

Implications on trust
“In a social network, the underlying user graph can potentially be used as a means to infer some level of trust in an unknown user, to check the validity of a public key certificate, and to classify potential spam”.

  • “The tight core coupled with link reciprocity implies that users in the core appear on a large number of short paths. Thus, if malicious users are able to penetrate the core, they can skew many trust paths (or appear highly trustworthy to a large fraction of the network).”
  • “However, these two properties also lead to small path lengths and many disjoint paths, so the trust inference algorithms should be adjusted to account for this observation. In particular, given our data, an unknown user should be highly trusted only if multiple short disjoint paths to the user can be discovered.”
  • “The correlation in link degrees implies that users in the fringe will not be highly trusted unless they form direct links to other users. The “social” aspect of these networks is self-reinforcing: in order to be trusted, one must make many “friends”, and create many links that will slowly pull the user into the core.”

Collaborative Filtering is Strange

Friday, September 21st, 2007

I just submitted a paper that includes some very strange results that I got when playing around with the different collaborative filtering techniques on the MovieLens dataset. The work was a direct follow up to the new similarity measure I wrote about in my previous post on privacy in distributed recommender systems, and begins by reshaping the way we think about (and try to visualise) collaborative filtering. (more…)

Private Collaborative Filtering

Thursday, June 7th, 2007

Recently, I submitted my first paper to RecSys 07, “Private Distributed Collaborative Filtering using Estimated Concordance Measures.” Even though it is not particularly about mobile-stuff, here’s a quick run through the main ideas:

Collaborative filtering is a means of using a community’s behavior, within a certain domain (movies, music), to reduce the amount of information that each individual needs to look through to find their items of interest. It is the dominant method behind recommender systems (such as Amazon’s), and is based on a simple idea: people who have shared interests in the past will, most likely, share common likes and dislikes in the future. So, to predict how much I will like a certain item, the system first compares my rating history with every other user’s to produce similarity measures, and then uses these similarities to compute a weighted average of how much those users enjoyed the item in question.
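
A minimal sketch of that prediction step, assuming the common mean-centred weighted-average formulation (the method and parameter names here are illustrative):

```java
import java.util.Map;

/** A minimal sketch of similarity-weighted prediction: a weighted average of
 *  neighbours' mean-centred ratings for the target item, added to my own mean. */
public class WeightedAveragePredictor {

    /**
     * @param myMean           the active user's mean rating
     * @param neighbourRatings neighbourId -> that neighbour's rating of the target item
     * @param neighbourMeans   neighbourId -> that neighbour's mean rating
     * @param similarities     neighbourId -> similarity with the active user (e.g. Pearson)
     */
    public static double predict(double myMean,
                                 Map<String, Double> neighbourRatings,
                                 Map<String, Double> neighbourMeans,
                                 Map<String, Double> similarities) {
        double weightedSum = 0.0;
        double normaliser = 0.0;
        for (Map.Entry<String, Double> e : neighbourRatings.entrySet()) {
            double sim = similarities.getOrDefault(e.getKey(), 0.0);
            weightedSum += sim * (e.getValue() - neighbourMeans.get(e.getKey()));
            normaliser += Math.abs(sim);
        }
        return normaliser == 0.0 ? myMean : myMean + weightedSum / normaliser;
    }
}
```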

The problem is that this method allows for no privacy. Particularly in a distributed environment, where I do not know how much to trust unknown neighbors, I do not want to have to share my entire rating history (i.e. my profile) with them just to compute a similarity measure; this discourages cooperation, which is harmful to collaborative filtering. The usual similarity measures (the most famous being the Pearson correlation coefficient) simply cannot be computed without full profile disclosure.

Therefore, in the paper we proposed a new similarity measure based on concordance. You and I rate a movie concordantly if we both rate it above, or both rate it below, our respective mean ratings (in other words, we agree about whether to give it a thumbs up or thumbs down). If you hated the movie (and rated it below your mean) and I loved it (rating it above my mean), then we disagree: the ratings are discordant. If one of us has no opinion about the movie, we just say the pair is tied. The new similarity measure is derived from the number of concordant (C), discordant (D), and tied (T) pairs between our rating sets, together with the size of the set N, and it works just as effectively (in terms of generating recommendations) as the Pearson correlation coefficient.
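
A sketch of counting C, D, and T pairs between two users' ratings of the same items follows; the way the counts are folded into a single similarity score at the end is just one plausible form, not necessarily the exact measure proposed in the paper.

```java
/** Counting concordant (C), discordant (D), and tied (T) pairs between two users
 *  who have rated the same items, aligned by index. */
public class ConcordanceSimilarity {

    public record Counts(int concordant, int discordant, int tied) {}

    /** ratingsA[i] and ratingsB[i] are the two users' ratings of the same item i;
     *  a rating exactly equal to the user's mean is treated as "no opinion" here. */
    public static Counts count(double[] ratingsA, double meanA, double[] ratingsB, double meanB) {
        int c = 0, d = 0, t = 0;
        for (int i = 0; i < ratingsA.length; i++) {
            double a = ratingsA[i] - meanA;   // thumbs up if positive, thumbs down if negative
            double b = ratingsB[i] - meanB;
            if (a == 0 || b == 0) t++;        // one of us has no opinion: tied
            else if (a * b > 0)   c++;        // same side of our means: concordant
            else                  d++;        // opposite sides: discordant
        }
        return new Counts(c, d, t);
    }

    /** One plausible way to turn the counts into a similarity in [-1, 1]; illustrative only. */
    public static double similarity(Counts k) {
        int n = k.concordant() + k.discordant() + k.tied();
        return n == 0 ? 0.0 : (k.concordant() - k.discordant()) / (double) n;
    }
}
```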

So how can this be used to add privacy to collaborative filtering? If you and I share a common randomly-generated set, and report to each other the number of C, D, and T pairs we each have with that random set, these values can be used to place bounds on the actual number of C, D, and T pairs we have with each other: we can estimate our similarity without ever sharing any profile-specific information, only abstracted information derived from a comparison with a random set. Privacy is not breached, and, along with an incremental learning technique (future work) for evolving the similarity measure between recommenders, we can start collaborating with each other!
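
Building on the counting sketch above, the only message each party would need to disclose might look like the following; the shared seed, profile length, and rating scale are assumptions, and deriving the bounds on the true pairwise counts from these summaries is left to the paper.

```java
import java.util.Random;

/** Each party computes its C/D/T counts against a commonly seeded random profile
 *  and shares only those counts, never the profile itself. */
public class PrivateProfileSummary {

    /** Both parties call this with the same agreed seed, so they compare against the same random set. */
    public static double[] sharedRandomProfile(long seed, int length) {
        Random rng = new Random(seed);
        double[] profile = new double[length];
        for (int i = 0; i < length; i++) {
            profile[i] = rng.nextInt(5) + 1;   // random ratings on an assumed 1..5 scale
        }
        return profile;
    }

    /** The only information disclosed: my C/D/T counts against the shared random profile. */
    public static ConcordanceSimilarity.Counts summarise(double[] myRatings, double myMean,
                                                         double[] randomProfile) {
        double randomMean = 3.0;               // the mean of a uniform 1..5 rating
        return ConcordanceSimilarity.count(myRatings, myMean, randomProfile, randomMean);
    }
}
```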

If you’re interested in the details (and the evaluation), I’ll post the paper on my web site soon (when I hear back from RecSys!)

Learning and Inferring Transportation Routines

Wednesday, January 31st, 2007

This paper (whose extended journal version has been recently accepted) proposes a hierarchical Markov model that learns and infers a user’s daily movements.

A playlist for every moment

Wednesday, November 22nd, 2006

The XPod project at UMBC introduces a “smart” music player that learns its user’s preferences and activity, and tailors its music selections accordingly (NewsDay article).