Archive for the ‘recommender systems’ Category

ACM RecSys 2009 Keynote (in 140 character chunks)

Friday, October 23rd, 2009

The third ACM RecSys conference started today in New York; unfortunately I could not make it. However, a number of people whom I follow on Twitter are there (@xamat, @danielequercia, @barrysmyth)… and they are tweeting away as the conference unfolds. You can follow the stream of #recsys09 tweets here. Although I’m sure there are many details that do not make it into the 140-character tweets, they provide a real-time snapshot of what is going on at the conference.

For example, the first keynote has just ended. Francisco Martin, founder and CEO of Strands, gave a talk about the “Top 10 Lessons Learned Developing, Deploying, and Operating Real-World Recommender Systems.” Here’s the Twitter summary (note: copied/pasted and lightly edited to merge similar tweets).

Lesson 1 – Make sure a recommender is really needed! Do you have lots of recommendable items? Many diverse customers?… also think about return on investment… a more sophisticated recommender may not deliver a better ROI.

Lesson 2 – Make sure the recommendations make strategic sense. Is the best recommendation for the customer also the best for the business? What is the difference between a good recommendation and a useful one? Obvious recommendations may not be useful; risky recommendations may deliver better long-term value.

Lesson 3 – Choose the right partner! Weigh selecting the right recommender vendor against hiring some #recsys09 students. If you are a big company, the best you can do is organise a contest.

Lesson 4 – Forget about cold-start problems (!) …. just be creative. The internet has the data you need (somewhere…)

Lesson 5 – Get the right balance between data and algorithms. 70% of the success of a #recsys lies in the data, the other 30% in the algorithm.

Lesson 6 – Finding correlated items is easy, but deciding what, how, and when to present to the user is hard… or: don’t just recommend for the sake of it. Remember that user attention is a scarce and valuable resource. Use it wisely! … don’t make recommendations to a customer who is just about to pay for items at the checkout! The user interface should get at least 50% of your attention.

Lesson 7 – Don’t waste time computing nearest neighbours (use social connections)… just mine the social graph. Might this miss useful connections?

Lesson 8 – Don’t wait to scale.

Lesson 9 – Choose the right feedback mechanism. Stars vs thumbs …. the YouTube problem. More research on implicit and other feedback mechanisms is needed. The perfect rating system is no rating system! … focus on the interface. Seems to me this is one of the gaps in current research… algorithms > data > interface

Lesson 10 – Measure everything! … business control and analytics are a big opportunity here.

Keynote takeaway – Think about the application context; focus on the interface as much as the algorithms; be creative with startup data. … the UI needs to get the lion’s share of the effort (50%) compared to algorithms (5%), knowledge (20%), and analytics (25%).

Netflix Prize – Round 2

Monday, September 21st, 2009

The Netflix Prize winners have been announced, as has the next $1 million competition. From here:

“The new challenge focuses on predicting the movie preferences of people who rarely or never rate the movies they rent. This will be deduced from more than 100 million data points, including information about renters’ ages, genders, ZIP codes, genre ratings and previously chosen movies.

Instead of a single $1 million prize, this new challenge will be split into one $500,000 award to the team judged to be leading after six months and an additional $500,000 to the team in the lead at the 18-month mark, when the contest is wrapped up.”

Interestingly, our previous discussion on the viability of the winner’s results now has an answer. From here:

The team’s 10 percent achievement will not be immediately incorporated into Netflix.com, said Neil Hunt, chief product officer.

“There are several hundred algorithms that contribute to the overall 10 percent improvement – all blended together,” Hunt said. “In order to make the computation feasible to generate the kinds of volumes of predictions that we needed for a real system – we’ve selected just a small number – two or three of those algorithms for direct implementation.”

Recommender Systems @ SIGIR 2009

Friday, July 24th, 2009

There were two sessions on recommender systems at this year’s ACM SIGIR (held in Boston). Overall, it was a good conference – well organised and smoothly run. It became apparent to me very quickly (as a first-timer at SIGIR) that this is a tight community of researchers; there were many hugs at the opening drinks. Here is a quick summary of the recommender system papers and a couple of other noteworthy papers/events.

(more…)

Using Data Mining and Recommender Systems to Scale up the Requirements Process

Monday, July 13th, 2009

Paper by Jane Cleland-Huang and Bamshad Mobasher

Summary:

Ultra-large-scale (ULS) software projects involve hundreds or thousands of stakeholders. The requirements may not be fully knowable upfront and instead emerge over time as stakeholders interact with the system. As a result, the requirements process needs to scale up to the large number of stakeholders and be conducted in increments so that it can respond quickly to changing needs.

Existing requirements engineering methods are not designed to scale for ULS projects:

  • Waterfall and iterative approaches assume requirements are knowable upfront and are elicited during the early phases of the project
  • Agile processes are suitable for small-scale projects
  • Stakeholder identification methods only identify a subset of stakeholders

This position paper makes two proposals:

  • Using data-mining techniques (i.e., unsupervised clustering) to identify themes from stakeholders’ statements of needs
  • Using recommender systems to facilitate broad stakeholder participation in the requirements elicitation and prioritisation process

Early evaluations show promise in the proposals:

  • Clustering algorithms (e.g., bisecting k-means) generated reasonably cohesive requirements clusters, although a significant number contained requirements that were only loosely coupled. Probabilistic Latent Semantic Analysis (pLSA) was then used, and early results showed improved cluster quality (a clustering sketch follows this list).
  • Their prototype recommender systems generated discussion forums that were more cohesive than ad-hoc ones created by users, and were able to recommend a significant number of relevant forums to stakeholders.
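
As a rough illustration of the clustering step mentioned in the first bullet above, here is a minimal sketch (my own, not the authors’ implementation): it groups free-text requirement statements into candidate themes using TF-IDF features and plain k-means via scikit-learn. The requirement strings are invented, and the paper’s bisecting k-means and pLSA variants are not reproduced.

```python
# Minimal sketch: cluster free-text stakeholder requirement statements
# into candidate "themes" with TF-IDF + k-means (the paper itself uses
# bisecting k-means and probabilistic LSA; this is just an illustration).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

requirements = [  # invented examples of stakeholder statements
    "The system shall let stakeholders submit feature requests online",
    "Users must be able to vote on proposed requirements",
    "The portal should send email notifications about new forum posts",
    "Stakeholders need a dashboard summarising open discussion forums",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(requirements)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

for text, label in zip(requirements, labels):
    print(label, text)
```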

The proposed open elicitation framework introduces some challenges:

  • unsupervised group collaborations
  • decentralised prioritisation of requirements
  • malicious stakeholders manipulating the system for personal gain

IJCAI ’09 Workshop on Intelligent Techniques for Web Personalization & Recommender Systems

Sunday, July 12th, 2009

Yesterday, I attended the 7th IJCAI Workshop on Intelligent Techniques for Web Personalization & Recommender Systems, in Pasadena. The IJCAI-2009 technical program will start on Tuesday. Here’s a summary of the sessions during the day: (more…)

Enhancing Mobile Recommender Systems with Activity Inference

Thursday, July 2nd, 2009

Daniele had briefly blogged here about this interesting paper by Kurt Partridge and Bob Price, which I will review at greater length. Some of the techniques used in the paper could be useful for further research, and even its limitations are an interesting subject of analysis.

Given that today’s mobile leisure guide systems require a large amount of user interaction (for configuration and preferences), this paper proposes to integrate current sensor data, models built from historical sensor data, and user studies into a framework that can infer high-level user activities, in order to improve recommendations and reduce the number of user tasks.

The authors claim to address the lack of situational user preferences by interpreting multidimensional contextual data through a categorical variable representing high-level user activities such as “EAT”, “SHOP”, “SEE”, “DO”, and “READ”. A prediction is then a probability distribution over the possible activity types.

Recommendations are provided through a thin client supported by a backend server. The following techniques are employed to produce a prediction:

  • Static prior models
    • PopulationPriorModel: based on the time of day, day of the week, and current weather, according to studies of typical activities from the Japan Statistics Bureau.
    • PlaceTimeModel: based on time and location, using hand-constructed data collected from a user study.
    • UserCalendarModel: provides a likely activity based on the user’s appointment calendar.
  • Learning models
    • LearnedVisitModel: tries to predict the user’s intended activity from the time of day, learning from observations of their contextual data history. A Bayesian network is employed to calculate the activity probability given location and time (a simplified sketch follows this list).
    • LearnedInteractionModel: constructs a model of the user’s typical activities at specific times, by looking for patterns in the user’s interaction with his/her mobile device.
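
The structure of the LearnedVisitModel’s Bayesian network is not spelled out in this review, so the following is a deliberately simplified, hypothetical stand-in (the sketch referenced in the list above): it estimates P(activity | location, time slot) from a counted visit history under a naive conditional-independence assumption, with Laplace smoothing. All locations, time slots, and counts are invented.

```python
from collections import Counter, defaultdict

# Invented contextual history: (location, time_slot, activity) observations.
history = [
    ("downtown", "evening", "EAT"),
    ("downtown", "evening", "EAT"),
    ("downtown", "afternoon", "SHOP"),
    ("museum_district", "afternoon", "SEE"),
    ("museum_district", "evening", "EAT"),
]

locations = {loc for loc, _, _ in history}
time_slots = {slot for _, slot, _ in history}

activity_counts = Counter(act for _, _, act in history)
loc_given_act = defaultdict(Counter)   # location counts per activity
slot_given_act = defaultdict(Counter)  # time-slot counts per activity
for loc, slot, act in history:
    loc_given_act[act][loc] += 1
    slot_given_act[act][slot] += 1

def activity_distribution(location, time_slot, alpha=1.0):
    """P(activity | location, time slot), assuming location and time are
    conditionally independent given the activity (a naive-Bayes
    simplification of whatever network the paper actually uses).
    Laplace smoothing (alpha) avoids zero probabilities."""
    scores = {}
    for act, n in activity_counts.items():
        p_act = n / len(history)
        p_loc = (loc_given_act[act][location] + alpha) / (n + alpha * len(locations))
        p_slot = (slot_given_act[act][time_slot] + alpha) / (n + alpha * len(time_slots))
        scores[act] = p_act * p_loc * p_slot
    total = sum(scores.values())
    return {act: score / total for act, score in scores.items()}

print(activity_distribution("downtown", "evening"))
```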

Activity inferences are made by combining the predictions from all five models through a geometric combination of their probability distributions, sketched below.
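
The paper’s exact combination formula is not reproduced here; the sketch below shows one natural reading of “geometric combination”: multiply the five per-model distributions element-wise (equivalently, average their log-probabilities) and renormalise. The per-model numbers are invented, not values from the paper.

```python
import numpy as np

ACTIVITIES = ["EAT", "SHOP", "SEE", "DO", "READ"]

# Invented per-model distributions over the five activity types
# (each row sums to 1); in the paper these would come from the static
# prior models and the learned models listed above.
model_predictions = {
    "PopulationPriorModel":    [0.40, 0.20, 0.15, 0.15, 0.10],
    "PlaceTimeModel":          [0.55, 0.10, 0.15, 0.10, 0.10],
    "UserCalendarModel":       [0.25, 0.25, 0.20, 0.20, 0.10],
    "LearnedVisitModel":       [0.50, 0.15, 0.15, 0.10, 0.10],
    "LearnedInteractionModel": [0.35, 0.20, 0.20, 0.15, 0.10],
}

def geometric_combination(distributions):
    """Element-wise product of the distributions, renormalised.

    Equivalent to exponentiating the mean of the log-probabilities,
    which is one straightforward reading of 'geometric combination'."""
    combined = np.ones(len(ACTIVITIES))
    for dist in distributions:
        combined *= np.asarray(dist)
    return combined / combined.sum()

combined = geometric_combination(model_predictions.values())
for activity, p in zip(ACTIVITIES, combined):
    print(f"{activity}: {p:.3f}")
```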

A query context module supplies the activity prediction module with the context the user is actually interested in, which may differ from the current one. For example, the user could be at work when searching for a restaurant, while the relevant context is the downtown area where he/she plans to go for dinner.

The authors carried out a user study evaluating each model’s ability to provide accurate predictions. Eleven participants carried the device for two days and were rewarded with cash discounts for leisure activities they engaged in while using the device. The query context prediction module was not enabled because of the study’s short duration. The reported accuracy is high (62% for the “always predict EAT” baseline, 77% for the PlaceTimeModel).

Some good points and problematic issues with this paper:

  • the prediction techniques are interesting and could be applied to other domains; moreover, I think it is useful to combine data from user studies with learning techniques, as user profiling helps developers (and applications) understand users in general before applying this knowledge to a specific user
  • the sample size makes the user study flawed: 11 participants carrying devices for 2 days borders on statistical insignificance; the weekday/weekend distinction is the first issue that comes to mind, to mention just one
  • offering cash discounts for leisure activities is probably not the right form of reward for this kind of study, as it makes users more willing to engage in activities that require spending money rather than free ones (e.g., EAT vs. SEE)
  • the authors admit that their recommender base consists mostly of restaurants, which I think is not taken sufficiently into account when claiming high accuracy. Given that the baseline predictor reaches 62% accuracy by always predicting EAT, a deeper analysis would have made the paper more rigorous
  • one of the most interesting contributions of the paper is the definition of the query context module, which unfortunately is not employed in the study for usability reasons related to its duration. Again, a better-defined user study would have solved this problem. I question whether it is worth carrying out user studies when resources are so limited that statistical significance becomes questionable. However, there is some attempt to discuss expected context vs. actual context, which is potentially very interesting: e.g., a user wants to SHOP but the shops are closed, so he/she EATs. It would be interesting to discuss how a recommender system should react to such situations
  • user-interaction issues: the goal of the presented system is to reduce user tasks on the mobile device, yet interaction is needed to tune the system and correct its mistakes, and one of the predictors uses precisely the user’s interaction with the device as a parameter. There seems to be some confusion about the role of user interaction in this kind of system (imho, an HCI approach could improve the recommender’s usability and, consequently, its accuracy)
  • the system is not well suited to multi-purpose trips (e.g., one EATs whilst DOing, or alternates between SHOPping and EATing), in which case predictions are mostly incorrect.

Discussing the Netflix Prize

Tuesday, June 30th, 2009

After my last blog post, I was contacted by a journalist who wanted to discuss the effects of the Netflix Prize. Now that the competition is winding down, one of the real questions that emerges is whether it was worth it. Below, I’m pasting part of my side of the dialogue; other blogs are hosting similar discussions, and I’m curious what you fellow researchers have to say.

(more…)

Temporal Collaborative Filtering

Tuesday, April 28th, 2009

As part of my recent work on collaborative filtering (CF), I’ve been examining the role that time plays in recommender systems. To date, the most notable use of temporal information (if you’re familiar with the Netflix Prize) is that researchers are using time(stamps) to inch their way closer to the million-dollar reward. The idea is to use how user ratings vary according to, for example, the day of the week on which they were entered, in order to better predict the probe (and, more importantly, the qualifying) dataset. I suppose my only criticism here is that once the million dollars has been won, nobody is going to implement and deploy this aspect of the algorithm (unless you are prepared to update your recommendations every day?), since, in practice, we do not know when users are going to rate items.
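
To make the day-of-week idea concrete, here is a toy sketch (my own illustration, not any competing team’s method): compute per-day-of-week offsets from the global mean rating and use them as a simple temporal baseline. The ratings and dates are made up.

```python
from collections import defaultdict
from datetime import datetime

# Invented (user, item, rating, timestamp) tuples; in the Netflix data
# the timestamp is the date on which the rating was entered.
ratings = [
    ("u1", "i1", 4.0, datetime(2005, 3, 5)),  # Saturday
    ("u1", "i2", 3.0, datetime(2005, 3, 7)),  # Monday
    ("u2", "i1", 5.0, datetime(2005, 3, 6)),  # Sunday
    ("u2", "i3", 2.0, datetime(2005, 3, 9)),  # Wednesday
]

global_mean = sum(r for _, _, r, _ in ratings) / len(ratings)

# Per-day-of-week offsets from the global mean: do people rate more
# generously at the weekend? (Real Netflix Prize models capture far
# richer temporal effects than this.)
offsets_by_day = defaultdict(list)
for _, _, r, ts in ratings:
    offsets_by_day[ts.weekday()].append(r - global_mean)

day_bias = {day: sum(v) / len(v) for day, v in offsets_by_day.items()}

def predict(day_of_week):
    """Baseline prediction: global mean plus the day-of-week offset."""
    return global_mean + day_bias.get(day_of_week, 0.0)

print(predict(5))  # a rating entered on a Saturday
```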

In the context of my work, I’ve been looking at two areas: the effect of time on (1) the similarity between users (RecSys ’08), and (2) the recommender system itself. Here’s a brief summary of (2): (more…)

Putting a Price on Social Connections

Wednesday, April 8th, 2009

From today’s Business Week:

Why weak ties aren’t always strong. “Researchers at IBM and MIT have found that certain e-mail connections and patterns at work correlate with higher revenue production … they used mathematical formulas to analyze the e-mail traffic, address books, and buddy lists of 2,600 IBM consultants over the course of a year. … They compared the communication patterns with performance, as measured by billable hours. They found that consultants with weak ties to a number of managers produced $98 per month less than average. Why? Those employees may move more slowly as they process “conflicting demands from different managers,” the study’s authors write. They suffer from “too many cooks in the kitchen.”

How to introduce people (matchmaking). They also analyzed methods to introduce employees to colleagues they haven’t yet met (to incent people to participate). … “Geyer and his team are digging for signs of shared interests and behaviors among their colleagues. …In their matchmaking efforts, the IBM team tried a variety of approaches. One used a tool favored by Facebook, recommending friends of common friends. Others analyzed the subjects and themes of employees’ postings on Beehive, words they use, and patents they’ve filed. As expected, some of the systems lined up workers with colleagues they already knew. Others were better at unearthing unknowns. But fewer of them turned out to be good matches. To the frustration of the researchers, some of the workers noted that recommendations looked good, yet they didn’t bother contacting the people. “They put them aside for future reference,” Geyer says. “

Rich RDF Data for building Music RecSys (by BBC Music)

Friday, March 27th, 2009

The new BBC Music website was launched yesterday, exposing a lot of RDF data.

More in this post. Anyone up for building a recommender system from this data?

WikiRank and Car Traffic Data

Thursday, March 19th, 2009

Wikirank uses Wikipedia’s traffic data to see what’s interesting on the web. One could use car traffic data to spot what’s interesting on our streets. The question of course is how ;-)

Why do I blog this? It’s relevant to Ilias’ research. Plus, Licia will start a cool project on using mobile data for navigating cities – she currently has an opening on that – check her website! For more, stay tuned.

blind review

Wednesday, March 18th, 2009

There are plenty of conferences that dropped blind reviewing and consequently became cliquey (the same people always publish in them; more specifically, the overlap between “program committee” and “conference program” is surprising). The result: slack conferences. So I don’t really understand why a conference that was growing successfully should do this:

“RecSys ’09 will not use blind review”

Go figure!

Social networks infantilising humankind? Greenfield and Sigman are infantilising social research

Monday, March 16th, 2009

Dear Greenfield and Sigman:

Please join mm-sing, start a blog, or rage! You may have fun and, in the process, your research agenda will start to reflect reality and British taxpayers will finally get value for money.

(*) “And then there’s the discussion of Lady Greenfield’s claims that social network sites are “infantilising” the human mind. She made a speech to the House of Lords to encourage people to research her hypothesis. There is NO EVIDENCE to prove her claims. Listening to her talk, it is very clear to me that she has no idea how social network sites work.” (danah)

Similarity Graphs

Thursday, February 26th, 2009

The idea of reasoning about content to recommend as a similarity graph is quite widespread. Broadly speaking, you start by drawing a set of circles (for users) on the left and a set of circles (for “items” – songs, movies…) on the right; when users rate, listen to, or otherwise interact with items, you draw an arrow from the corresponding left circle to the right circle (i.e., a bipartite graph). What collaborative filtering algorithms can do is project the two-sided graph onto two equivalent representations, in which users are linked to other users, and items are linked to other items, based on how similar they are.
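
As a minimal sketch of this projection (toy users and songs, all invented): two items become linked whenever at least one user connects to both, with the number of shared users as the edge weight; a real collaborative filtering algorithm would use a proper similarity measure (cosine, Pearson, etc.) rather than raw co-occurrence counts.

```python
from collections import defaultdict
from itertools import combinations

# Toy bipartite graph: which users rated/listened to which items.
user_items = {
    "alice": {"songA", "songB", "songC"},
    "bob":   {"songB", "songC"},
    "carol": {"songC", "songD"},
}

# Project onto the item side: link two items whenever a user connects to
# both, weighting the edge by the number of shared users.
item_links = defaultdict(int)
for items in user_items.values():
    for a, b in combinations(sorted(items), 2):
        item_links[(a, b)] += 1

for (a, b), weight in sorted(item_links.items()):
    print(f"{a} -- {b}: shared users = {weight}")
```

The user-user projection works the same way, linking two users by the items they have in common.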

There are a bunch of places where this kind of abstraction has been used; for example, Oscar Celma used graphs to guide users discovering music in the long tail, and Paul Lamere posted graphs made with the EchoNest API on his blog. I’ve also dabbled in this area a bit, though not with music-listening data; I was using the (more traditional) MovieLens and Netflix datasets. The question that comes to mind when reading about techniques that operate on the graph, though, is: are the underlying graphs real representations of similarity between content? What if the graphs are wrong? (more…)

Research is the New Music

Monday, February 23rd, 2009

I’ve started trying out a new service called Mendeley. The quickest way to describe it is a “last.fm for research”: they have a desktop client that can monitor the PDF files you are reading, and an online presence where each user has a profile. (Read about them on their blog; my profile is here.) They seem to be at a very early stage so far, but the basic functionality (seeing/tagging/searching the papers you read) is quite nice. An obvious difficulty, on the other hand, is extracting accurate metadata from research PDF files.

The similarity between research papers and songs is quite striking. Think of it this way: songs (research papers) are made by musicians (authored by researchers), have a name (title), and are collected in albums (journals/conference proceedings). Both have a time of release; both can be tagged/described/loved/hated; both are blogged and talked about. Sometimes artists make music videos, sometimes researchers make presentations or demos. (more…)