BlindSearch is a simple, neat site that collects "objective" opinions on search quality: it shows results for a query from Google, Yahoo, and Bing side by side, without identifying which is which, and invites you to pick the best.
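The evaluation idea behind the site can be sketched in a few lines. This is hypothetical code, not BlindSearch's actual implementation: each engine's results are shown in a randomly shuffled column order, the user picks the best column, and votes are tallied per engine.

```python
import random

ENGINES = ["Google", "Yahoo", "Bing"]

def make_trial(rng=random):
    """Return the engines in a shuffled presentation order for one user."""
    order = ENGINES[:]
    rng.shuffle(order)
    return order

def tally(trials):
    """trials: list of (presentation_order, chosen_column_index) pairs."""
    votes = {engine: 0 for engine in ENGINES}
    for order, chosen in trials:
        votes[order[chosen]] += 1
    return votes

# Two recorded trials: each tuple is (column order shown, column picked).
votes = tally([(["Yahoo", "Google", "Bing"], 0),
               (["Bing", "Google", "Yahoo"], 2)])
# Both of these users happened to prefer Yahoo's column.
```

The shuffle is the whole point: because the user never knows which column belongs to which engine, brand bias is removed from the preference judgment.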
Archive for the ‘evaluation’ Category
Yahoo has recently made publicly available a huge catalog of datasets (covering ratings, language, graphs, and advertising).
I don’t know how it works, but it seems to be a nice mobile testbed. Does anyone know more?
“An award-winning, ground-breaking product designed by Mobile Complete, DeviceAnywhere™ provides developers real-time interaction with handsets that are connected to live global networks. Built on Mobile Complete’s innovative device interaction technology, Direct-To-Device™, DeviceAnywhere enables you to connect to and control mobile devices around the world – using just the Internet. Through DeviceAnywhere’s original, non-simulated, real-time platform, you can remotely press buttons, view LCD displays, listen to ringers and tones, and play videos … just as if you were holding the device in your hands!”
Say that you have to answer this research question: “Does the market discourage biased reporting (media slant), or does it encourage it?” Matthew Gentzkow and Jesse Shapiro, two economists at the University of Chicago’s business school, set out to test this proposition, and The Economist reported on their research in just two pages. Gentzkow and Shapiro’s methodology is smart and imaginative! Here you go:
There’s an interesting post on TechCrunch based on a comment by Google’s Marissa Mayer, who apparently said that search is “90–95% solved.” Regardless of the context in which the comment was made, the post’s response is that the problem is not solved, and it outlines a number of areas where information retrieval has yet to succeed. However (reminiscent of a short question that came up in the panels of RecSys 2007), there are other questions to ask: when is ANY research question “solved”? What does it mean for a problem (like information retrieval or collaborative filtering) to be “solved”?
Flickr Places “is a method of exploring Flickr with geo-specific pages. The page shows the most interesting photos for a location (iconic photos they call them), the most recent and common tags for the photos and the most prolific photo groups. It creates a separate page for each geographic location with a unique human-readable URL. Places go down to the city level so San Francisco, Seattle, and London will each have their own page and unique URL. In time they will go deeper. Places will be accessible via the Flickr API.” More here and here.
This project could yield data useful for evaluation!
To evaluate new mobile content discovery approaches, one needs to understand:
1) What mobile users query for:
- Deciphering Mobile Search Patterns: A Study of Yahoo! Mobile Search Queries
- How People Use the Web on Mobile Devices
2) How interests distribute across mobile users (who befriend each other):
I’ve just read an article by Anita Elberse titled “Should We Invest in the Long Tail?”, published in the Harvard Business Review (no direct link; google for it). Based on what appears to be a very rigorous and extensive study, the author reaches conclusions that seem to run in the opposite direction of what is stated in Chris Anderson’s famous book “The Long Tail”.
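The disagreement is ultimately about how revenue concentrates across the popularity ranking: does the “head” of hit products still dominate, or does the long tail of niche titles add up? A minimal sketch of the measurement (with made-up catalog numbers, not figures from either source):

```python
def head_share(sales, head_frac=0.2):
    """Fraction of total revenue earned by the top `head_frac` of titles."""
    ranked = sorted(sales, reverse=True)
    n_head = max(1, int(len(ranked) * head_frac))
    return sum(ranked[:n_head]) / sum(ranked)

# Illustrative catalog: a couple of hits and many niche titles.
sales = [1000, 800, 50, 40, 30, 20, 10, 5, 3, 2]
share = head_share(sales)  # revenue share of the top 20% of titles
```

A head share near 1.0 supports Elberse’s reading (hits still dominate); a share that keeps shrinking as the catalog grows would support Anderson’s.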
Many of us are designing smart algorithms and are often supposed to evaluate them by carrying out well-designed user studies.
Problem: Such studies are expensive, so we tend to trade off among sample size, time requirements, and monetary costs.
Proposal (by PARC researchers): To collect user measurements from micro-task markets (such as Amazon’s Mechanical Turk). Here is their blog post (which comments on their upcoming HCI paper titled “Crowdsourcing User Studies With Mechanical Turk”).
Note: Often, users are irrational.
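Since individual micro-task workers can be noisy or careless (the “irrational users” caveat), a standard remedy is redundancy: assign each item to several workers and aggregate their judgments, for instance by majority vote. A minimal sketch of that aggregation step, not taken from the PARC paper:

```python
from collections import Counter

def majority_vote(labels_per_item):
    """labels_per_item: dict mapping item -> list of worker labels.
    Returns dict mapping item -> most common label; redundancy across
    workers smooths over individual noisy judgments."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Three workers judged each query's result page (hypothetical labels).
answers = majority_vote({
    "query-1": ["relevant", "relevant", "not relevant"],
    "query-2": ["not relevant", "not relevant", "relevant"],
})
```

More elaborate schemes weight workers by their agreement with gold-standard items, but even simple majority voting goes a long way toward making cheap crowdsourced judgments usable for evaluation.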