Archive for the ‘dataset’ Category
- 30 Resources to Find the Data You Need
- New Reality Mining Data Available. From Nathan Eagle: “I am currently releasing the full Reality Mining dataset. It’s got loads of additional information – especially related to survey responses (friendships, recent illness, satisfaction, etc). The new ReadMe has a complete description. If you’d like access, just drop me an email. As I’m now involved in other projects, I haven’t had much time to look at this new data – so have at it.”
“The new challenge focuses on predicting the movie preferences of people who rarely or never rate the movies they rent. This will be deduced from more than 100 million data points, including information about renters’ ages, genders, ZIP codes, genre ratings and previously chosen movies.
Instead of a single $1 million prize, this new challenge will be split into one $500,000 award to the team judged to be leading after six months and an additional $500,000 to the team in the lead at the 18-month mark, when the contest is wrapped up.”
The team’s 10 percent achievement will not be immediately incorporated into Netflix.com, said Neil Hunt, chief product officer.
“There are several hundred algorithms that contribute to the overall 10 percent improvement – all blended together,” Hunt said. “In order to make the computation feasible to generate the kinds of volumes of predictions that we needed for a real system – we’ve selected just a small number – two or three of those algorithms for direct implementation.”
Yahoo has recently made publicly available a huge catalog of datasets (data on ratings, language, graphs, and advertising).
by Alvin Chin: I believe the best way to get at Facebook data is to take a subset, get consent from people within a Facebook group. …
On the first day, I attended the Automated Journeys workshop organized by Arianna Bassoli (who gave a talk at UCL a while back), Johanna Brewer (whose recent work has been covered here; for more, check her blog), and Alex Taylor. The workshop’s format was not traditional. As part of the workshop, we went out and had lunch and, while doing so, observed how people in Seoul use technologies. Then we came back and, through group discussions and hands-on design brainstorming sessions, produced 4 envisagements that critically reflected on technological futures. It was very engaging! I hope other workshops will replicate/mutate this format. I wish I could have attended at least two of the other workshops on offer: Ubiquitous Systems Evaluation, partly organized by Chris Kray (I am in his debt, and he knows why), and Devices that Alter Perception, partly organized by Carson Reynolds.
At Ubicomp, the speakers did not suffer from PowerPoint karaoke syndrome, and their slides were generally well designed – less text, more images. That is largely because the Ubicomp community is made up of design-conscious (CHI) researchers. A few talks are already available on SlideShare.
Here are a few papers I personally found intriguing because of their algorithms, their evaluation, or their interesting ideas. At the end of this post, I’ll point to a few datasets that have been used and may be of interest.
Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior. Brian D. Ziebart showed a new way of making route predictions. He used a probabilistic model presented at AAAI, “Maximum Entropy Inverse Reinforcement Learning”. Interestingly, he showed that the model works on data that is noisy and imperfect.
Pedestrian Localisation for Indoor Environments. Oliver Woodman proposed a way of tracking people indoors. Oliver and Robert showed how to combine a foot-mounted unit, a building model, and a particle filter to track people in a building. They experimentally showed that users can be effectively tracked to within 1 m without knowing their initial positions. Great results! It’s a paper well worth reading!
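To make the idea concrete, here is a minimal sketch of how a building model and a particle filter can pin down a walker whose starting point is unknown. It is not the authors' implementation: the L-shaped floor plan, the step measurements, the noise level, and all function names are assumptions made up for illustration, and a real system would take its steps from the foot-mounted unit rather than from a scripted walk.

```python
import random

# Hypothetical floor plan (not from the paper): an L-shaped corridor described
# as two axis-aligned rectangles (x_min, y_min, x_max, y_max), in metres.
FREE_SPACE = [(0.0, 0.0, 10.0, 2.0), (8.0, 0.0, 10.0, 10.0)]


def is_free(x, y):
    """True if (x, y) lies inside the walkable part of the floor plan."""
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in FREE_SPACE)


def init_particles(n, rng):
    """Unknown start position: spread particles uniformly over the free space."""
    particles = []
    while len(particles) < n:
        x, y = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
        if is_free(x, y):
            particles.append((x, y))
    return particles


def update(particles, dx, dy, rng, noise=0.1):
    """Move every particle by the measured step, discard the ones that end up
    inside a wall, and resample back to the original particle count.
    Simplification: only the end point of each step is checked."""
    moved = [(x + dx + rng.gauss(0.0, noise), y + dy + rng.gauss(0.0, noise))
             for x, y in particles]
    survivors = [p for p in moved if is_free(*p)]
    if not survivors:  # every hypothesis was ruled out: keep the old cloud
        return particles
    return [rng.choice(survivors) for _ in particles]


def estimate(particles):
    """Position estimate: the mean of the particle cloud."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)


def run_demo(seed=0, n_particles=500):
    """Simulated walk: 8 one-metre steps east, then 8 one-metre steps north,
    starting at (1, 1). The filter is never told the start position."""
    rng = random.Random(seed)
    particles = init_particles(n_particles, rng)
    steps = [(1.0, 0.0)] * 8 + [(0.0, 1.0)] * 8
    for dx, dy in steps:
        particles = update(particles, dx, dy, rng)
    return estimate(particles), particles
```

Calling `run_demo()` returns the estimated position and the final particle cloud. In this toy setting, the walls do the work: only the hypotheses that could have walked eight metres east and then eight metres north without hitting a wall survive, so the cloud collapses towards the far end of the corridor even though it started spread over the whole floor.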
Discovery of Activity Patterns using Topic Models. Bernt Schiele presented a new method for recognizing a person’s activities from wearable sensors. This method adapts probabilistic topic models and has been shown to recognize daily routines without user annotation. One of Bernt’s students had an interesting poster on detecting location transitions using sensor data (pdf).
A couple of papers (including the great work done by Matthew Lee) used a method called the Wizard of Oz evaluation. The general idea is to simulate those parts of the system (e.g., speech recognition) that require the most development effort, or to assess the suitability of your interface (see “Wizard of Oz studies – why and how” (pdf) for more).
Flowers or a Robot Army? Encouraging Awareness & Activity with Personal, Mobile Displays by Sunny Consolvo et al. They designed a system that makes it possible for mobile users to self-monitor their physical activities and conducted a well-designed 3-month field experiment.
Reflecting on the Invisible: Understanding End-User Perceptions of Ubiquitous Computing (pdf). Erika Shehan Poole detailed end-user perceptions of RFID technology using an interesting qualitative method that combines structured interviews and photo elicitation exercises. Erika and her mates show that, by using this method, one is able to uncover perceptions that are often difficult for study participants to verbalize. One of her findings: many people believed that RFID can be used to remotely track the location of tagged objects, people, or animals!
3. Interesting Ideas
Bookisheet: Bendable Device for Browsing Content Using the Metaphor of Leafing Through the Pages. Trash your mouse. Jun-ichiro Watanabe presented a VERY promising interface (a book made of two thin plastic sheets and bend sensors) with which a user can easily scroll digital content such as photos. The user does so by simply bending one side of the sheet or the other.
Towards the Automated Social Analysis of Situated Speech Data. To automatically understand individual and group behavior, Danny Wyatt et al. recorded the conversational dynamics of 24 people over 6 months. They did so using privacy-sensitive techniques. With this type of study, researchers may well gain broad sociological insights.
The Potential for Location-Aware Power Management. Robert Harle showed how to dynamically optimize the energy consumption of an office. Very interesting problem-driven research!
Accessible Contextual Information for Urban Orientation. Jason Stewart presented a prototype of a location-based service with which mobile users share content (see their project’s website).
Enhanced Shopping: A Dynamic Map in a Retail Store. Alexander Meschtscherjakov presented a prototype for mobile phones that displays customer activities (e.g., customer flow) inside a shopping mall.
Spyn: Augmenting Knitting to Support Storytelling and Reflection (pdf). Daniela K. Rosner’s presentation was masterfully designed! She walked us through her experience of designing Spyn – a system for knitters to record, play back, and share information involved in the creation of their hand-knit artifacts. She showed how her system enriches the knitter’s craft.
Picture This! Film assembly using toy gestures. Cati Vaucelle (who keeps a cool blog) presented a new input device embedded in children’s toys for video composition. As they play with the toys to act out a story, children conduct film assembly.
Understanding Mobility Based on GPS Data by Yu Zheng et al. used GPS logs of 65 people over 10 months (the largest dataset in the community!) to evaluate a new way of inferring people’s motion modes from their GPS logs.
Accurate Activity Recognition in a Home Setting (pdf) by Tim van Kasteren et al. used 28 days of sensor data about one person at home and corresponding annotations of his activities (e.g., toileting, showering) to evaluate a new method for recognizing activities from sensor data.
Discovery of Activity Patterns using Topic Models by Tam Huynh et al. used 16 days of sensor data from a man who was carrying 2 wearable sensors to test their method for automatically recognizing activities (e.g., dinner, commuting, lunch, office work) from sensor data.
On Using Existing Time-Use Study Data for Ubiquitous Computing Applications by Kurt Partridge and Philippe Golle showed how to use data (e.g., people’s activities and locations) that has been collected by governments and commercial institutions to evaluate ubicomp systems.
The Potential for Location-Aware Power Management by Robert Harle tested his proposed strategies for dynamically optimizing the energy consumption of an office on location data of 40 people in a 50-room office building over 60 working days.
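For readers who want to play with GPS logs like those in Understanding Mobility Based on GPS Data, here is a rough sketch of the first step any motion-mode inference needs: turning consecutive GPS fixes into speeds. The threshold classifier at the end is a deliberately naive stand-in, not the method of Yu Zheng et al.; the function names, the `(timestamp, lat, lon)` record layout, and the speed thresholds are all assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds(fixes):
    """fixes: list of (timestamp_s, lat, lon), sorted by time.
    Returns the speed in m/s between each pair of consecutive fixes."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicate or out-of-order timestamps
            speeds.append(haversine_m(la0, lo0, la1, lo1) / dt)
    return speeds


def guess_mode(speeds, walk_max=2.5, bike_max=7.0):
    """Naive guess from the median speed; thresholds (m/s) are hypothetical."""
    if not speeds:
        return "unknown"
    median = sorted(speeds)[len(speeds) // 2]
    if median <= walk_max:
        return "walk"
    if median <= bike_max:
        return "bike"
    return "vehicle"


# Usage: six fixes, 10 seconds apart, creeping east along the equator.
track = [(10 * i, 0.0, 0.0001 * i) for i in range(6)]
print(guess_mode(segment_speeds(track)))
```

A median is used rather than a mean so that a single GPS glitch (a fix that jumps a few hundred metres) does not flip the guess. Anything beyond this, such as handling stops or telling a bus from a car, would need richer features and labelled data like the dataset described above.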
Flickr Places “is a method of exploring Flickr with geo-specific pages. The page shows the most interesting photos for a location (iconic photos they call them), the most recent and common tags for the photos and the most prolific photo groups. It creates a separate page for each geographic location with a unique human-readable URL. Places go down to the city level so San Francisco, Seattle, and London will each have their own page and unique URL. In time they will go deeper. Places will be accessible via the Flickr API.” More here and here. Data useful for evaluation could come out of this project!
To evaluate new mobile content discovery approaches, one needs to understand:
1) What mobile users query for:
- Deciphering Mobile Search Patterns: A Study of Yahoo! Mobile Search Queries
- How People Use the Web on Mobile Devices
2) How interests distribute across mobile users (who befriend each other):