Archive for the ‘search’ Category

SIGIR: Research vs. Reality

Monday, July 26th, 2010

Last week, I attended SIGIR 2010 in Geneva, where I presented a paper. The conference has left its traces online: a steady stream of tweets and some great blog posts (e.g., the Noisy Channel). You can even read comments about the incredible (and, for my part, unexpected) temperatures of some of the conference rooms.

It is always interesting to attend conferences, meet people, and see what research others do to fill their time. However, it was also interesting to attend SIGIR for another reason. Back when the notifications were released, there was a noticeable outcry about the poor quality of the reviewing process. Some authors chose to publish public rebuttals to the reviewers on their blogs; others wrote about the unending cycle of complaining that the IR community has spiraled into. “Not Relevant” was born in the wake of all this discussion to give SIGIR rejects an alternative venue to publish work. I wanted to see how this outcry would be reflected in the conference itself: would the “complainers” not attend? Would the attendees be the subset of authors who are happy with SIGIR? In short: no. In the coffee breaks, I heard the same conversations replay themselves over and over, echoing a variety of problems in the IR research community. In trying to make sense of the flurry of comments, it seems there are two general areas that need attention:

1. The divergence between reality and research. As the web grows (and goes mobile) the breadth and types of information that people want to access have both grown and changed. However, image, mobile/contextual, and real-time (see tweet) search were largely underrepresented at the conference. A quick look at the conference program shows that the conference still focuses primarily on text and document retrieval. Can people’s information needs be fully captured in a text/document-oriented conference? Is the published research ignoring the latest trends in information access?

2. The divergence between research and reality. The flip side of the coin: SIGIR research has fallen into a methodological rut; the conference is “trapped by a very successful paradigm [...where] people can do complex work, the quality of that work can be measured, and progress made.” There are two problems here. First, the community has been hypnotised by its metrics. The current research paradigm encourages researchers to produce “minor-delta” papers (i.e., “we propose an algorithm that improves a baseline by x%”) rather than look at novel problems (see #1 above). Yet, despite all these incremental improvements, there is no evidence of long-term, cumulative progress in decades of publications. Second, I continue to miss the link between these metrics and the users that they are meant to serve (similar discussions often arise between recommender system researchers). Yes, there are lengthy arguments to be had here: the most important point, now, is that this discussion needs to happen (and happen more frequently).

Lastly, a more general note, based on a question I was asked that is worth pondering and that relates to all the research we do. Why do we have presentation-based conferences? We take turns standing up, giving a 20-minute summary of our paper, and relegate all meaningful conversation to short coffee breaks. How does this affect the research that we produce?

…will last week become the last SIGIR that I ever attend?

Microsoft’s Robust Location Search

Sunday, June 14th, 2009

The primary goal of this project is to explore novel and effective ways to search geo-spatial data and leverage multi-lingual technologies within maps.

Solving Research Problems

Tuesday, September 9th, 2008

There’s an interesting post on TechCrunch based on a comment by Google’s Marissa Mayer, who apparently said that search is “90-95% solved.” Regardless of the context in which this comment was made, the post’s response is that the problem is not solved: it then outlines a number of areas where information retrieval has yet to succeed. However (reminiscent of a short question that appeared in the panels of RecSys 2007), there are other questions to ask: when is ANY research question “solved?” What does it mean for a problem (like information retrieval or collaborative filtering) to be “solved?”


Efficient Search Not Good for Research?

Friday, July 18th, 2008

I read a curious article posted on Wired: based on a recent study of journal citation patterns between ’98 and ’05 (that is to appear in Science), the authors claim that as the Internet provides researchers with efficient search of journal papers, “the breadth of scholarship” is being lost. Here is a quote:

“As more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of the citations were to fewer journals and articles.”

So, is this Google Scholar’s fault? Is this a new trend in research? Or maybe this means that as the wealth of published research explodes, the truly cite-able papers are still few (i.e., is citation breadth a measure of quality, or not?).

What do you think?

Social Systems

Monday, June 30th, 2008

This month’s Data Engineering Bulletin is about Recommendation and Search in Social Systems. It sports thoughts on robustness and user experience.

Spam Dataset

Monday, January 21st, 2008

“WEBSPAM-UK2007 is a large collection of annotated spam/nonspam hosts labeled by a group of volunteers. The base data is a set of 105,896,555 pages in 114,529 hosts in the .UK domain downloaded by the Laboratory of Web Algorithmics of the University of Milano. The assessment was done by a group of volunteers.

For the purpose of the Web Spam Challenge 2008, the labels are being released in two sets. SET1, containing roughly 2/3 of the assessed hosts will be given for training, while SET2 containing the remaining 1/3, will be held for testing. More information about the Web Spam Challenge 2008, co-located with AIRWeb 2008 will be available soon” here and here.