- datakind.org connects non-profits facing data analysis challenges with pro-bono data scientists
- Google’s and USAID’s support of the World Resources Institute’s Global Forest Watch 2.0 monitoring tool
- Red Cross Relies on OpenStreetMap in Haiyan Relief Efforts
- shareable.net launched the Sharing Cities Network to support innovators and activists working to make cities around the world more sharing
Interesting list. Two innovators in the area of “smart cities”:
“To understand human development, you have to understand the contexts that affect it. Sampson argues that neighborhood environments—not just the characteristics of people living in them—influence phenomena like crime, health, and learning.
Researchers also filmed Chicago block by block in what Sampson describes as a proto-Google Street View. Cruising in an SUV with tinted windows, at three to five miles an hour, his crew videotaped 23,816 street segments. They captured and logged both physical and social details: housing structures, garbage in the street, abandoned cars, broken windows, unsupervised kids, public drinking, prostitution, graffiti, drugs, arguments, police presence, even condoms on the street.
To examine altruism, Sampson fielded a “Lost Letter Experiment” in which researchers scattered stamped, addressed letters across neighborhoods and measured rates of return. To evaluate civic engagement, his team mined newspaper archives for events like protests or community breakfasts, amassing a database that spanned more than 30 years and 4,000 public gatherings.
Sampson took the social temperature of communities. One concept key to Sampson’s thinking is “collective efficacy,” a measure of how much people trust their neighbors and are willing to help them. In explaining a neighborhood’s level of criminal violence, Sampson has argued, collective efficacy is as important as—or even more important than—other characteristics, like poverty or physical disorder. … No longer could we simply clean up the streets to get rid of the criminal element. We needed to promote, well, neighborliness.”
But the question of neighborhood effects remains “hotly debated.” One knock on neighborhood-effects research is that it fails to account for a problem that scholars call “self-selection bias”: the effect of similar people clustering together.
Anyhow, “neighborhoods profoundly matter… the difference between living in a very poor neighborhood and a moderately middle-class neighborhood is as large as doubling your income in terms of happiness and well-being.”
A very well-organized conference. Attendees especially liked the “Brazilian typical dinner”.
What did I do in Rio? Here I shall focus on the professional part of my visit. On Monday, I gave a keynote at the workshop on “Making Sense of Microposts”. I summarized the work I have been doing for the last year or so: Urban*: Crowdsourcing for the Good of London (slides). I also briefly mentioned what I then presented on Friday: Psychological Maps 2.0 (slides). The idea behind this work is that planners and social psychologists have suggested that the recognizability of the urban environment is linked to people’s socio-economic well-being. We built a web game that puts the recognizability of London’s streets to the test. It follows as closely as possible an experiment done by Stanley Milgram in 1972. We found strong correlations suggesting that the more recognizable a neighborhood, the more socio-economically comfortable it is (less crime, better living environment). This has interesting implications for urban planning, for location-based services (profiling & personalization of mobile services), and for web engagement studies (stay tuned for our future work!). With Jisun and Jon, I also had a poster on Fragmented Social Media: A Look into Selective Exposure to Political News.
As for other Yahoo!ers, Gianmarco gave a keynote at the Real-Time Analysis and Mining of Social Streams workshop. He introduced SAMOA, a platform for mining big data streams: really high-impact work! Mounia enjoyed an impressive buzz on Twitter with her tutorial on measuring user engagement (the slides are detailed enough to read on their own). Bart presented his work with Adam on segmenting space (e.g., a map of the USA) based solely on geo-referenced picture tags. The work featured an impressive demo, which I really hope will be publicly available soon.
Papers I found interesting include:
Trade Area Analysis using User Generated Mobile Location Data
This is about “identifying the activity center of a mobile user, profiling users based on their location history, and modeling users’ preference probability.” Interesting applications of this work include determining trade-area boundaries from check-ins and location-based user profiling.
Do Social Explanations Work? Studying and Modeling the Effects of Social Explanations in Recommender Systems
A very interesting topic in recsys: explainability! “Recommender systems associated with social networks often use social explanations (e.g. ‘X, Y and 2 friends like this’) to support the recommendations. We present a study of the effects of these social explanations in a music recommendation context.”
Hierarchical Geographical Modeling of User Locations from Social Media Posts
Google folks proposed an integrated generative model of location and message content (tweets). This model can predict location from tweets alone. More specifically, they are able to “obtain accurate estimates of the location of a user based on his tweets and to obtain a detailed estimate of a geographical language model.”
MSR friends show how user demographic traits such as age and gender, and even political and religious views, can be efficiently and accurately inferred from users’ search query histories. “This is accomplished in two steps; we first train predictive models based on the publically available myPersonality dataset containing users’ Facebook Likes and their demographic information. We then match Facebook Likes with search queries using Open Directory Project categories. Finally, we apply the model trained on Facebook Likes to large-scale query logs of a commercial search engine while explicitly taking into account the difference between the traits distribution in both datasets.”
Aggregating Crowdsourced Binary Ratings
“In this paper we analyze a crowdsourcing system consisting of a set of users and a set of binary choice questions. Each user has an unknown, fixed, reliability that determines the user’s error rate in answering questions. The problem is to determine the truth values of the questions solely based on the user answers.”
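To make the problem setup concrete, here is a minimal sketch of one simple baseline, assuming answers are coded as +1/-1: alternate between labelling each question by a reliability-weighted majority vote and re-estimating each user's reliability as their agreement with the current labels. This is a generic iterative weighted-majority heuristic, not the algorithm proposed in the paper; the function name `aggregate` and the starting reliability of 0.7 are hypothetical choices for illustration.

```python
import math


def aggregate(answers, n_iters=10, prior=0.7):
    """Estimate question labels and user reliabilities from binary answers.

    answers: dict mapping (user, question) -> +1 or -1.
    Returns (truth, reliability).
    """
    users = {u for u, _ in answers}
    questions = {q for _, q in answers}
    # Assumed starting point: every user is somewhat better than chance.
    reliability = {u: prior for u in users}
    truth = {}
    for _ in range(n_iters):
        # Weighted majority vote: each answer counts with the log-odds of its
        # author's reliability (clipped to avoid infinite weights).
        score = {q: 0.0 for q in questions}
        for (u, q), ans in answers.items():
            p = min(max(reliability[u], 1e-3), 1 - 1e-3)
            score[q] += ans * math.log(p / (1 - p))
        truth = {q: 1 if s >= 0 else -1 for q, s in score.items()}
        # Re-estimate reliability as agreement with the current labels.
        agree = {u: 0 for u in users}
        total = {u: 0 for u in users}
        for (u, q), ans in answers.items():
            total[u] += 1
            agree[u] += int(ans == truth[q])
        reliability = {u: agree[u] / total[u] for u in users}
    return truth, reliability
```

With log-odds weights, a user who is consistently wrong (reliability below 0.5) gets a negative weight, so their answers are effectively flipped rather than simply ignored.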
No Country for Old Members: User Lifecycle and Linguistic Change in Online Communities
This won the best paper award. Users of online communities follow a “two-stage lifecycle with respect to their susceptibility to linguistic change: a linguistically innovative learning phase in which users adopt the language of the community followed by a conservative phase in which users stop changing and the evolving community norms pass them by. Building on this observation, we show how this framework can be used to detect, early in a user’s career, how long she will stay active in the community.”
From Amateurs to Connoisseurs: Modeling the Evolution of User Expertise Through Online Reviews
In a vein similar to the previous paper, they “model how tastes change due to the very act of consuming more products—in other words, as users become more experienced.” It’s a very nice addition to the recsys literature.
Timespent Based Models for Predicting User Retention
Another way of predicting a user’s lifetime. The authors “attempt to address the problem of predicting user retention based on the user’s previous sessions. The paper first explores the different user and content features that are helpful in predicting user retention.”
WTF: The Who to Follow Service at Twitter
Directly from the Twitter folks. For me, the paper gets interesting in Section 5. I didn’t know that Twitter used SALSA.
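For readers who, like me, had not met SALSA before: it is Lempel and Moran's Stochastic Approach for Link-Structure Analysis, a HITS-style ranking computed by random walks on a bipartite graph of "hubs" and "authorities". Below is a minimal power-iteration sketch on a toy hub-to-authority edge list. The graph and node names are hypothetical, and the sketch ignores the production details described in the paper.

```python
from collections import defaultdict


def salsa(edges, n_iters=50):
    """edges: list of (hub, authority) pairs without duplicates.
    Returns (hub_scores, authority_scores), each summing to 1."""
    out_links = defaultdict(list)
    in_links = defaultdict(list)
    for h, a in edges:
        out_links[h].append(a)
        in_links[a].append(h)
    hubs = {h: 1.0 / len(out_links) for h in out_links}
    auths = {}
    for _ in range(n_iters):
        # Forward half-step: each hub spreads its mass evenly over its authorities.
        auths = defaultdict(float)
        for h, score in hubs.items():
            for a in out_links[h]:
                auths[a] += score / len(out_links[h])
        # Backward half-step: each authority spreads its mass evenly over its hubs.
        hubs = defaultdict(float)
        for a, score in auths.items():
            for h in in_links[a]:
                hubs[h] += score / len(in_links[a])
    return dict(hubs), dict(auths)


hubs, auths = salsa([("u1", "x"), ("u1", "y"), ("u2", "y"), ("u3", "y"), ("u3", "z")])
```

On a connected graph like this one, SALSA's authority scores end up proportional to in-degree (y gets 0.6, x and z get 0.2 each), and hub scores proportional to out-degree.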
Organizational Overlap on Social Networks and its Applications
Another way of doing link prediction, this time by LinkedIn: “computing the probability of connection between two people based on organizational overlap (based on the users belonging to organizations such as companies, schools, and online groups)”.
Spatio-Temporal Dynamics of Online Memes: A Study of Geo-Tagged Tweets
James presented this cool paper. It has very interesting hashtag metrics: focus, entropy, and spread. They found that the majority of hashtags have a small spread. They also measured a city’s capability of spreading ideas (i.e., hashtags) globally and found the spray & diffuse patterns seen in a previous paper (Scellato’s paper on YouTube videos at WWW’12). These findings might result in interesting applications, e.g., predicting the popularity of videos based on the country; given an idea, predicting its popularity; and learning spatial granularities for spatio-temporal DBs. It would be cool to expand the work by looking at different types of boundaries: language boundaries, ideological boundaries (west vs. east coasts), and cultural boundaries.
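The three hashtag metrics can be sketched as follows, assuming these definitions (my reading, check the paper for the exact ones): focus is the share of a hashtag's occurrences at its most frequent location, entropy is the Shannon entropy (in bits) of its distribution over locations, and spread is the mean great-circle distance of its occurrences from their geographic midpoint. As a further simplification, the midpoint here is just the mean of the coordinates, which is only reasonable for small regions. The function names are hypothetical.

```python
import math
from collections import Counter


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def hashtag_metrics(occurrences):
    """occurrences: list of (location_name, (lat, lon)) pairs for one hashtag."""
    n = len(occurrences)
    counts = Counter(loc for loc, _ in occurrences)
    focus_loc, focus_count = counts.most_common(1)[0]
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Simplification: midpoint as the plain mean of the coordinates.
    mid = (sum(c[0] for _, c in occurrences) / n,
           sum(c[1] for _, c in occurrences) / n)
    spread = sum(haversine_km(c, mid) for _, c in occurrences) / n
    return {"focus": (focus_loc, focus_count / n),
            "entropy": entropy,
            "spread_km": spread}
```

A hashtag seen only in London gets focus 1.0, entropy 0 and spread 0; one split evenly between New York and Los Angeles gets focus 0.5, entropy 1 bit and a spread of well over a thousand kilometres.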
Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms
“The diversification problem is usually addressed as a bicriteria objective optimization problem of relevance and diversity… we propose a novel measure called expanded relevance which combines both relevance and diversity into a single function in order to measure the coverage of the relevant part of the graph.”
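The idea behind expanded relevance can be sketched like this, assuming (my reading, not a verified transcription of the paper) that it sums node relevance over the set of nodes within a few hops of the recommended items, so that recommending several near-duplicate neighbours earns no extra credit. The graph, relevance values and function names below are hypothetical.

```python
from collections import deque


def expansion_set(graph, seeds, hops):
    """Nodes within `hops` steps of any seed (multi-source BFS).
    graph: adjacency dict mapping node -> iterable of neighbours."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen


def expanded_relevance(graph, relevance, result_set, hops=1):
    """Total relevance of the nodes covered by the result set's neighbourhood."""
    covered = expansion_set(graph, result_set, hops)
    return sum(relevance.get(v, 0.0) for v in covered)


graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": ["e"], "e": ["d"]}
rel = {"a": 0.3, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.1}
```

Here {a, b} picks the two most relevant nodes but covers only the a-b-c cluster (0.75), while the more diverse {a, d} covers the whole graph (1.0), which is the behaviour a single relevance-plus-diversity measure is meant to reward.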
Wisdom in the Social Crowd: an Analysis of Quora
The first analysis of Quora I’ve seen. Very nice.
Voices of Victory: A Computational Focus Group Framework for Tracking Opinion Shift in Real Time
They propose a nice way of tracking opinion shifts in social media in real time. “Our approach uses prior user behaviors to detect users’ biases, then groups users with similar biases together. We track the behavior streams from these like-minded subgroups and present time-dependent collective measures of their opinions. These measures control for the response rate and base attitudes of the users, making shifts in opinion both easier to detect and easier to interpret.”
Gender Swapping and User Behaviors in Online Social Games
If you are into changing gender (virtually, I mean), you might be interested in this paper. It studies “‘gender swapping’ in multiplayer games, which refers to players choosing avatars of genders opposite to their natural ones.” Unfortunately, I missed the presentation and could not enjoy Juyong in cross-dressing attire.
Predicting Group Stability in Online Social Networks
“We build models to predict if a group is going to remain stable or is likely to shrink over a period of time. We observe that both the level of member diversity and social activities are critical in maintaining the stability of groups. We also find that certain ‘prolific’ members play a more important role in maintaining the group stability.”
Cute cross-platform work. “We specifically aim at understanding if the user’s profile information in a social network (for example Facebook) can be leveraged to predict what categories of products the user will buy from (for example eBay Electronics).”
“… Google has asked us to build our lives around it, and we have responded… Encyclopaedias? Antiques. Book shelves and file cabinets? Who needs them? And once we all become comfortable with that, we begin rearranging our mental architecture. We stop memorising key data points and start learning how to ask the right questions. We begin to think differently. About lots of things. We stop keeping a mental model of the physical geography of the world around us, because why bother? We can call up an incredibly detailed and accurate map of the world, complete with satellite and street-level images, whenever we want. … The bottom line is that the more we all participate in this world, the more we come to depend on it. The more it becomes the world. … That’s a lot of power to put in the hands of a company … But in the long run that’s a problem for Google. Because we tend not to entrust this sort of critical public infrastructure to the private sector. Network externalities are all fine and good to ignore so long as they mainly apply to the sharing of news and pics from a weekend trip with college friends. Once they concern large swathes of economic output and the cognitive activity of millions of people, it is difficult to keep the government out.” (From “Google’s Google problem”)
On March 16 and 17 in Washington DC (#data4good): “People who rarely work together — coders, quants, data visualizers, procurement experts, economists, lawyers, students, senior managers, open data evangelists — ended up at the same table for 36 hours of intense work, united by their love of data. The goals were attractive. How can we measure poverty more often and more accurately? Can we detect fraud by looking at the data?”
Photographer Paths: Sequence Alignment of Geotagged Photos for Exploration-based Route Planning (pdf, blog, slides)
problem: how can we build city route planners that ‘automatically’ compute route plans based not on efficiency, but on the trails people leave as they experience the city?
proposal: use a sequence alignment technique from biology
evaluation: lab + web survey + interviews (well done)
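To illustrate what "sequence alignment from biology" means for photo trails, here is a minimal sketch of classic Needleman-Wunsch global alignment applied to two hypothetical sequences of visited places. The paper's exact alignment variant and scoring may well differ; the match, mismatch and gap scores below are arbitrary assumptions.

```python
def align(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two sequences of place IDs.
    Returns (score, aligned_a, aligned_b), with None marking gaps."""
    n, m = len(seq_a), len(seq_b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(seq_a[i - 1], seq_b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]
```

For example, aligning ["museum", "park", "cafe"] with ["museum", "cafe"] lines up the shared stops and puts a gap opposite "park" (score 3), which is the kind of shared sub-route a planner could then stitch into new routes.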
Using Facebook after losing a job: Differential benefits of strong and weak ties (ACM pdf)
problem: @grammarnerd presents awesome work pairing surveys with Facebook log data to see which ties predict social support and finding a new job
results: social support and lower stress both increase with strong-tie communication. Surprisingly, bridging social capital increases not only with weak-tie communication but also with strong-tie communication (which is not about reading their posts but about talking to them). Talking with strong ties increases stress for people who are looking for jobs, while it decreases stress for people who have jobs. BUT talking more to strong, not weak, ties was twice as likely to lead to a new job.
Trend Makers and Trend Spotters in a Mobile Application (pdf, slides)
questions: WHO creates trends in a mobile sharing app? Accidentals or influentials?
answer: influentials DO exist, yet they are not few but many!
application: identify trends early on (recsys paper pdf)
Finger On The Pulse: Identifying Deprivation Using Transit Flow Analysis (pdf, blog, slides)
problem: can we assess a city’s health by monitoring the flow of people, just like a nurse takes your heart-rate and blood pressure during a health check?
answer: yes! Using passenger flow, the diversity of passengers’ geographic connections, and the use of transport modalities, one can effectively do so!
Ubiquitous Crowd-sourcing into Context (pdf)
problem: “investigate what contextual factors correlate with coverage of OSM information in urban settings”
results: “although there is a direct correlation between population density and information coverage, other socio-economic factors also play an important role. We discuss the implications of these findings with respect to the design of urban crowd-sourcing applications.”
Major Life Changes and Behavioral Markers in Social Media: Case of Childbirth
Very interesting work by @munmun10, looking at linguistic markers before and after childbirth. Also, see the great work on this to be published at CHI 2013.
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Recommender Algorithm (pdf)
problem: instead of using KNN (k-nearest neighbors) for recommending stuff, they came up with KFN (k-furthest neighbors)!
KNN: recommend movies that are liked by people similar to you
KFN: recommend movies that are disliked by people dissimilar to you
results: KNN recommends movies that users have already seen; both KNN and KFN recommend movies that users like
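The KNN/KFN contrast above can be sketched in a few lines. This is a toy version under my own assumptions, not the paper's implementation: similarity is cosine over mean-centred ratings on co-rated items (Pearson-like), and KFN takes the k least similar users and flips the sign of their centred ratings, so what they disliked scores high. The function names and ratings are hypothetical.

```python
import math


def centred(ratings):
    """Subtract the user's mean rating so likes are positive and dislikes negative."""
    mean = sum(ratings.values()) / len(ratings)
    return {item: r - mean for item, r in ratings.items()}


def similarity(u, v):
    """Cosine of mean-centred ratings over co-rated items."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(ratings, target, k=2, furthest=False, n=3):
    """KNN (furthest=False): score unseen items by the nearest users' centred ratings.
    KFN (furthest=True): use the least similar users and invert their ratings."""
    c = {user: centred(r) for user, r in ratings.items()}
    others = [u for u in ratings if u != target]
    ranked = sorted(others, key=lambda u: similarity(c[target], c[u]), reverse=not furthest)
    sign = -1.0 if furthest else 1.0
    scores = {}
    for u in ranked[:k]:
        for item, r in c[u].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sign * r
    return sorted(scores, key=scores.get, reverse=True)[:n]


ratings = {
    "me":   {"A": 5, "B": 1, "C": 5},
    "twin": {"A": 5, "B": 1, "C": 4, "D": 5, "E": 1},
    "anti": {"A": 1, "B": 5, "C": 1, "D": 1, "E": 5, "F": 1},
}
```

With k=1, KNN follows "twin" and puts D first; KFN follows "anti", whose dislikes (D and F) rise to the top, so in this toy case both approaches still surface D.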
Digital Neighborhood Watch: Investigating the Sharing of Camera Data Amongst Neighbors
idea: neighborhood watch supported by webcams.
comment: the privacy angle is of great importance.
Representation and Communication: Challenges in Interpreting Large Social Media Datasets (pdf)
idea: a study of “four features of Foursquare’s use: the relationship between attendance and check-ins, event check-ins, commercial incentives to check-in, and lastly humorous check-ins. These points show how large data analysis is affected by the end user uses to which social networks are put.”
Hollaback!: The Role of Collective Storytelling Online in a Social Movement Organization (pdf)
idea: can sharing a story of experienced harassment really make a difference to an individual or a community?
Doodle Around the World: Online Scheduling Behavior Reflects Cultural Differences in Time Perception and Group Decision-Making (pdf, blog, data)
question: “Does (national) culture determine how we schedule events online?”
answer: yes, it does, big time! Individualists strategically respond late but are less likely to find consensus, while collectivists seem to make a larger effort to reach mutual agreement
Also interesting: the keynote talk by Ron Burt on the serial closure hypothesis (pdf) and the special session dedicated to the conference’s most cited paper, where “The authors will re-present the original papers using their original slides, and then discuss developments in the field since then.” The paper is “GroupLens: an open architecture for collaborative filtering of netnews” (CSCW 1994).
Freedom-loving yellow is the symbol of vastness and openness and is regarded as the color of intellect: whoever loves yellow has a great desire for freedom. Harmonious green is the primary color of nature; it symbolizes growth, healing and harmony. Those who love green are reliable, have a lot of compassion and have great social skills. In Islam and Judaism, it is the color of compassion. Loyal blue corresponds to the element of water and symbolizes peace; people who love blue are often admired because of their solid character and deep loyalty. They often appear very distant and reserved. Powerful red is the symbol of love, sex and excitement. People who love red are power types, always one step ahead of all others. Motto: You can, if you want. In love they are very sensual.
Falling Walls is a German government-led initiative aimed at encouraging breakthrough research in science, industry and politics. Started as a one-off event to celebrate the 20th anniversary of the fall of the Berlin Wall, the Falling Walls Conference brings together the best of the world’s researchers and thinkers for one day, 9 November. One day prior to the conference, 100 outstanding young scientists, professionals, and entrepreneurs present their groundbreaking ideas in 3 minutes at the Falling Walls Lab (urbangems.org was one of the 100, and that’s why I attended the conference and the lab).
random notes & thoughts
From Sunday’s workshops, I remember the paper “Dating Sites and the Split-complex Numbers”. It uses split-complex numbers to represent dating preferences in an elegant way. It seems promising. It would be great to connect this work to previous papers on trust and distrust and on structural balance theory… I also heard that two presentations were quite good: 1) Content, Connections, and Context; 2) Joseph Konstan’s talk about the different decision strategies people have in different contexts.
On Thursday, we ran a workshop on mobile recommender systems. Francesco Calabrese of IBM Smart Cities gave an interesting invited talk about current projects on transportation systems. Then, we had a set of really good talks & one outdoor activity. What did I learn? Well, most existing mobile systems assume that the recommendation process unfolds in a single step: get restaurant recommendations & choose one of them. In reality, recommendations in the built environment should go beyond that. For example,
- To mimic humans, the task of recommending restaurants should return at least three different recommendations (or facets): the closest restaurant, the best restaurant, and a trade-off between the two.
- One should understand WHY people visit certain places. How did they make those decisions? Which criteria did they employ?
- Recommender systems need to tap into established findings in the area of urban studies. For example, in our RecSys paper “Ads & the City“, we exploited the fact that people are boring – they generally do not travel very far – unless what they are looking for is not readily available where they are.
- Temporal patterns in recommender systems have not been widely studied. They have been studied on Web platforms only recently (and Neal Lathia has done great work on that!) and have been neglected in mobile platforms. That is why we had another paper in the conference titled “Spotting Trends: The Wisdom of the Few”.
- Finally, and most importantly, we need far more user studies of how these systems are ACTUALLY used! Recommendations do not matter much; the experience counts.
And this is just scratching the surface.
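The first point in the list above (return the closest place, the best place, and a trade-off between the two) is easy to sketch. The trade-off rule here, equal weights on normalised rating and normalised distance, is purely an assumption for illustration, as are the function name and the restaurant data.

```python
def three_facets(places):
    """places: list of (name, distance_km, rating) tuples.
    Returns the closest place, the best-rated place and a trade-off pick."""
    closest = min(places, key=lambda p: p[1])
    best = max(places, key=lambda p: p[2])
    d_max = max(p[1] for p in places) or 1.0
    r_max = max(p[2] for p in places) or 1.0

    def tradeoff(p):
        # Assumed utility: normalised rating minus normalised distance.
        return p[2] / r_max - p[1] / d_max

    # Prefer a third, different option; fall back to all places if none is left.
    rest = [p for p in places if p not in (closest, best)] or places
    balanced = max(rest, key=tradeoff)
    return {"closest": closest[0], "best": best[0], "trade-off": balanced[0]}


places = [("Corner Diner", 0.2, 3.1), ("Chef's Table", 4.0, 4.9),
          ("Bistro", 1.0, 4.4), ("Far Grill", 3.0, 3.5)]
```

Here the diner wins on distance, Chef's Table on rating, and the Bistro, nearly as good and a quarter of the distance, is the trade-off facet.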
I remember only a few things from the conference (the industry track was pretty good):
- Multiple Objective Optimization in Recommendation Systems (LinkedIn). Nice example of A/B testing.
- Towards Personality-Based Personalization (Thore Graepel of Microsoft Research). Nice talk about how easy it is to predict personal attributes of Facebook users based on their likes. If you are interested in personality and social media, you should check out our work on Facebook and Twitter (we can predict the personality traits of Twitter users based only on their follower, following, and listed counts).
- Building Industrial-scale Real-world Recommender Systems (Xavier Amatriain of Netflix). Brilliant (& fully packed) tutorial. Check this out for a summary.
- Controlled experiments at Microsoft Bing (very
- Pareto-efficient hybridization for multi-objective recommender systems (UFMG). Here the question is how to combine different types of algorithms (hybridization).
- User Effort vs. Accuracy in Rating-based Elicitation (PoliMI). What’s the optimal number of user ratings for movie recommendations? It seems to be between 5 and 20.
- TasteWeights: A Visual Interactive Hybrid Recommender System (UCSB). Visualization platform for your social media stream
- Learning to rank optimizing MRR for recommendations. Very cool work. It taps into the less-is-more concept, which I’m a big fan of.
- Thumbs up to real-world stuff: Beyond Lists: Studying the Effect of Different Recommendation Visualizations; Yokie – Explorations in Curated Real-Time Search & Discovery Using Twitter; A System for Twitter User List Curation; The Demonstration of the Reviewer’s Assistant; CubeThat: News Article Recommender (browser extension for Chrome displays recommended additional news stories related to the same topic as the current news story)
- Challenges in music recommendation (@plamere from @echonest). A couple of interesting insights: “Understanding the specifics of your domain is critical to building a good recommender”; and recommending down-tail is OK, while recommending up-tail (Britney to someone who likes Tom Waits) is risky: it might offend one’s musical identity. So make your recommendations hipster-friendly.
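For the "Learning to rank optimizing MRR" item in the list above, it helps to see what MRR (mean reciprocal rank) actually measures: for each user, one over the rank of the first relevant item in the list, averaged over users. Below is a minimal sketch of the metric itself (not of the paper's learning method); the function names and data are hypothetical.

```python
def reciprocal_rank(ranked_items, relevant):
    """1/rank of the first relevant item in a ranked list (0 if none is relevant)."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, relevant_by_user):
    """Average reciprocal rank over users; rankings maps user -> ranked item list."""
    rrs = [reciprocal_rank(items, relevant_by_user.get(u, set()))
           for u, items in rankings.items()]
    return sum(rrs) / len(rrs)


rankings = {"u1": ["a", "b", "c"], "u2": ["x", "y"], "u3": ["p", "q"]}
relevant = {"u1": {"b"}, "u2": {"x"}, "u3": {"z"}}
```

Here u1 contributes 1/2, u2 contributes 1 and u3 contributes 0, so MRR is 0.5. Only the first relevant hit counts, which is why optimizing MRR fits the less-is-more idea: one good item near the top beats many decent ones further down.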
Six degrees of mobilisation. Technology and society: To what extent can social networking make it easier to find people and solve real-world problems?
Look, no hands. Automotive technology: Driverless cars promise to reduce road accidents, ease congestion and revolutionise transport… Assuming that autonomous vehicles make journeys quicker and use road space more efficiently, how should planners exploit the benefits of automation? On the one hand it would allow cities to get bigger, by reducing the time and stress associated with commuting. On the other, it could allow cities to become denser, by reducing the amount
A knight in digital armour. Chris Soghoian, the most prominent of a new breed of activist technology researchers, delights in exposing security flaws and privacy violations
Personal data. Shameless self-promotion. Britain wants to lead the world in exploiting consumer data…. Transactional data helps businesses make money, and the government thinks consumers should profit from it too.
Changing London. Brixton is now a black shopping destination.
Radio Ga Ga. A small radio station in Sierra Leone offers big lessons for the UN.
From an FT article:
About 2.5bn people in emerging markets have mobile phones. … Consider what happened two-and-a-half years ago when the Haitian earthquake struck. …researchers at Columbia University and the Karolinska Institute took a different tack: they tracked Sim cards inside Haitians’ mobile phones. That helped them to “analyse the destination of more than 600,000 people who were displaced from Port au Prince”
Twitter and Facebook. These are strikingly popular in emerging markets; Indonesia, for example, has one of the world’s most Twitter-addicted populations. Thus a sudden increase in certain keywords can provide early warning of distress. References to food or ethnic strife may indicate incipient famine or unrest.