Highlights from the Horizon Crowdsourcing for Transport Lab Talk, Nottingham, 25.01.2012

January 26th, 2012


Rob Houghton, Horizon on The Role of Social Media in Transportation

  • Transport providers still use social media primarily for marketing purposes
  • Users signal each other about disruptions and cause back-channel effects for transportation providers
  • Extracting the “rail” lexicon from Twitter to detect backchanneling

Louise Crow, Lead Developer, www.FixMyTransport.com

  • Bother transport providers with transport issues reported by users
  • TfL has embraced the platform and uses it to improve its image and instill trust in passengers
  • They began crowdsourcing the relevant contact emails of responsible providers through a Google Spreadsheet

Matt Watkins, Tech Director – Mudlark – www.Chromaroma.com

  • http://vimeo.com/22023369
  • They get permission from users to “scrape” their data from TfL
  • They have permission to do so on some level but TfL is not very happy so they are looking to take the game overground
  • They have lots of TfL data that they are willing to share

Tracy Ross & Chris Parker, Loughborough Design School – “Ideas in Transit”

  • “Grassroot collaboration” – mass collaboration for fixing problems
  • They do feasibility studies in experienced utility and travel behaviour: experience sampling + crowdsourcing
  • Explored the effects of presenting map information vs verbal information to users traveling with public transport and did accessibility studies for disabled travelers

[readings] cities, leadership, adoption

December 10th, 2011

Building a Science of Cities by Mike Batty [pdf]

What Are Leaders Really For? by Duncan Watts on HBR [web]

An Experimental Study of Homophily in the Adoption of Health Behavior by Damon Centola on Science [web]

Happiness: measure of economic & social progress?

May 17th, 2011

This week, The Economist’s debate is…

Motion: “This house believes that new measures of economic and social progress are needed for the 21st century economy.”

Pro: Richard Layard
“Quality of life, as people experience it, has got to be a key measure of progress and a central objective for any government.” Read more.

Con: Paul Ormerod
“The real danger is the belief that by measuring happiness, it can then be predicted and controlled.” Read more.

 

Join debate

sentiment analysis

April 12th, 2011

if you are building sentiment analysis algorithms, test the results for the following examples:

  • It is not in doubt that Radiohead’s new album is excellent.
  • Not many albums are as good as the new Radiohead album.
  • Few people would claim that Radiohead’s new album is not excellent.
  • Radiohead’s new album is hardly a successful example of the genre.
  • Radiohead’s new album is anything but good.
  • Nobody considers Radiohead’s new album to be good.

To deal with wit, sarcasm, and complex emotions, one should resort to crowdsourcing.
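One way to use the six sentences above is as a tiny regression suite. The sketch below is only an illustration: the expected labels are my own reading of each sentence, and `naive_classify` is a made-up keyword baseline (not any real library) that shows how quickly simple negation handling breaks down.

```python
import re

# Expected labels are an assumption: one reader's interpretation of each sentence.
CASES = [
    ("It is not in doubt that Radiohead's new album is excellent.", "positive"),
    ("Not many albums are as good as the new Radiohead album.", "positive"),
    ("Few people would claim that Radiohead's new album is not excellent.", "positive"),
    ("Radiohead's new album is hardly a successful example of the genre.", "negative"),
    ("Radiohead's new album is anything but good.", "negative"),
    ("Nobody considers Radiohead's new album to be good.", "negative"),
]

POSITIVE_WORDS = {"excellent", "good", "successful"}
NEGATION_WORDS = {"not", "hardly", "nobody"}


def naive_classify(text):
    """Hypothetical baseline: any negation cue makes the sentence negative."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & NEGATION_WORDS:
        return "negative"
    return "positive" if words & POSITIVE_WORDS else "neutral"


def evaluate(classify, cases=CASES):
    """Return (sentence, expected, predicted) for every case the classifier gets wrong."""
    failures = []
    for sentence, expected in cases:
        predicted = classify(sentence)
        if predicted != expected:
            failures.append((sentence, expected, predicted))
    return failures


if __name__ == "__main__":
    for sentence, expected, predicted in evaluate(naive_classify):
        print(f"expected {expected}, got {predicted}: {sentence}")
```

Swap `naive_classify` for your own function (anything that maps a string to a label) and check which of the six it still gets wrong.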

[Bad Science] Mistaking friends for foes

March 30th, 2011

The long tail holds not only for music and movies but also for papers published in Computer Science: a few are good (mostly those in tier-one conferences), while most are rubbish. That raises the question of whether CS publishing as it is will perish. I’ll try to answer that question using a running example of a really bad paper published recently.

Mistaking friends for foes: An analysis of a social network-based Sybil defense in mobile networks

united colors of social computing

March 5th, 2011

this guy has pasted huge photos of people’s faces on street walls around the world, and he’s done so for a reason – to turn the world inside out! which social media technologies will turn the world inside out? one of the roles of social computing should be to build systems that unite people who are now divided by political, social, and religious differences

what we geeks don’t get about social media privacy

March 4th, 2011

A few months ago i worked with cambridge folks on the problem of mobile location privacy, and yesterday i presented the resulting paper (with the cool name of spotme). during my presentation, i realised that we computer scientists often ignore the difference between your social media data being public and being publicised. we do so because technically there isn’t any difference, but socially the difference is dramatic.

To see what I mean, consider services that combine pieces of public information shared by individuals from different sources; they are often called data mashups. Data mashups have caused public outcries over privacy and are expected to create more problems in the future. To see why, consider that when people willingly share information about themselves, they do so in specific social contexts. When they make a piece of information publicly available, they implicitly guess who is more or less likely to come across it. When different pieces of public information are integrated (when they are publicised), the social expectations people had when disclosing the single pieces may be completely disrupted (danah docet).

Recent privacy failures are telling stories of disrupted social expectations. A few years ago, Facebook aggregated content in ways that made it more visible to users who could already access it. When a Facebook user switched to an “it’s complicated” relationship, the user thought that only the few social contacts regularly visiting his profile would notice the change. Suddenly, that was not true anymore: a variety of contacts would learn of the switch just from their streams of updates. This change caused a big outcry, but Facebook did not have to back off – the users did. Facebook founder Mark Zuckerberg recently contributed to the discussion and claimed that the rise of social networking online means that people no longer have an expectation of privacy, adding “we decided that these would be the social norms now and we just went for it” (MZ sharing his wisdom). The result is that Facebook “users are now so hooked that they are unlikely to revolt against a gradual loosening of privacy safeguards”.

Another example comes from the data mashup performed by the site pleaserobme.com of Twitter (a microblogging service) and Foursquare (a service that lets people publicise their location so their social contacts can see where they are). This site publishes Foursquare location posts that appear on Twitter. The problem is that, when a user shares her location on Foursquare, she thinks that only her social contacts on Foursquare or Twitter will notice it. But that has now changed – pleaserobme exposes whether users are somewhere other than their home to the entire Internet community, including burglars. Again, when sharing location data, one has specific social expectations, but those expectations are disrupted by data mashups. The aim of pleaserobme was not malicious; it was simply to make users of location-based services reflect upon whether they are over-sharing. Sharing decisions might be rational in the short term, but they underestimate what might happen to that information as it is remixed and reshared.

- daniele
web @danielequercia

icdm 2010

January 27th, 2011

last month i went to icdm (i gave a talk on rethinking mobile recommendations & neal on personalised public transport). a few interesting contributions follow:

// A. Christos Faloutsos (Carnegie Mellon University) gave a keynote talk titled ‘Mining Billion-node Graphs: Patterns, Generators and Tools’. He & his colleagues (=they, henceforth) studied several network measures such as:

  1. node degree. this measure might be useful to answer the question of, for example, whether an epidemic will die out or not (one may look at average degree, max degree, degree variance). However, it turns out that only the first eigenvalue of the adjacency matrix is needed to understand whether the epidemic will take over or not. [see Prakash on arXiv]
  2. network diameter. they found a surprising result: the diameter shrinks over time. this is surprising because the theory (see science papers by barabasi et al.) predicts the opposite, ie, that the diameter increases, and it does so according to log(n), where n is the number of nodes in the network. in a paper at sdm10, they proposed a way of computing the diameter efficiently and computed the diameter of the whole (Yahoo) Web: of course, its distribution was multi-modal, as one would expect - there are parts of the Web that are separated by, for example, language
  3. eigenvalues. eigenvalues might be useful for fraud detection (see KDD09 paper on belief propagation) and for characterising and spotting anomalies in networks (they built a tool for doing that. it’s called Oddball and characterises, eg, egonetworks by studying very simple quantities like number of nodes, number of edges, number of triangles, total weight, and principal eigenvalues)
  4. triangles. in this paper [Tsourakakis ICDM 2008], they have shown that if one has n friends, then he/she would have n^1.6 triangles on average. computing the number of triangles in a network is computationally expensive. fortunately, they proposed two ways to make this computation tractable:
  • the 1st way computes only a few top eigenvalues (lambda_i); the number of triangles is then approximately 1/6 of sum(lambda_i^3) [Tsourakakis ICDM 2008]
  • the 2nd way relies on SVD (see EigenSpokes at PKDD 2010)
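The 1/6 factor in the eigenvalue formula comes from a standard identity: for a symmetric 0/1 adjacency matrix A, trace(A^3) equals the sum of lambda_i^3 over all eigenvalues, and every triangle is counted six times (three starting nodes, two directions). The sketch below only checks that identity on a made-up toy graph, using the exact trace in plain Python rather than the truncated top-eigenvalue sum the paper relies on for speed; all function names are mine.

```python
from itertools import combinations


def adjacency(edges, n):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph on n nodes."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


def matmul(a, b):
    """Plain-Python product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def triangles_by_trace(adj):
    """trace(A^3) / 6, which equals sum(lambda_i^3) / 6 taken over all eigenvalues."""
    a3 = matmul(matmul(adj, adj), adj)
    return sum(a3[i][i] for i in range(len(adj))) // 6


def triangles_brute_force(adj):
    """Reference count: test every triple of nodes for three mutual edges."""
    n = len(adj)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if adj[i][j] and adj[j][k] and adj[i][k])


# Toy graph (made up): triangles 0-1-2 and 1-2-3, plus a pendant edge 3-4.
TOY = adjacency([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)], 5)

if __name__ == "__main__":
    print(triangles_by_trace(TOY), triangles_brute_force(TOY))
```

Keeping only the few largest-magnitude eigenvalues turns the exact sum into an approximation, which is the trade-off the paper exploits on big graphs.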

they also studied network-related quantities over time. for instance, they looked at:

  1. popularity of posts over time. they studied the number of links to a post over “lag” days. that is, given a post at time t, they looked after which time t+lag a link to the post would start to appear. what’s the distribution of lag? it’s power law with exponent -1.6
  2. duration of phone calls. this quantity is often used to compute link weights in networks. in their research, they found that phone call duration fits a log-normal distribution in an OKish way but is best described by a newly introduced distribution called TLAC (Truncated LAzy Contractor): the longer a call has taken, the longer it will take

// B. [don't remember title] this paper tries to predict age and gender of web-page creators based on the page’s text, title, or structure.

// C. Modeling Information Diffusion in Social Media by Jaewon Yang and Jure Leskovec. the goal of this paper is to look at the process of info diffusion in networks and predict the number of infected nodes at a given point in time without knowing the network structure. to do so, they use a linear influence model that accounts for only which nodes got infected in the past. each node has an influence function that is estimated using past data. a node’s influence function is modelled in discrete time units, so no assumption is made about its shape. more on http://snap.stanford.edu/
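To make the linear influence idea concrete, here is a minimal sketch of the prediction step only. It assumes the per-node influence functions have already been estimated somehow (the estimation from past data is not shown), and the indexing convention, the function name `predict_volume`, and the toy numbers are all mine, not taken from the paper.

```python
def predict_volume(infection_times, influence, t):
    """Predicted number of newly infected nodes at time step t.

    infection_times: node -> time step at which that node got infected
    influence: node -> list I_u, where I_u[l] is the number of new infections
               attributed to u exactly l + 1 steps after u got infected
    The network structure is never consulted: only who got infected, and when.
    """
    total = 0.0
    for node, t_u in infection_times.items():
        lag = t - t_u - 1
        curve = influence.get(node, [])
        if 0 <= lag < len(curve):
            total += curve[lag]
    return total


# Toy data (made up): two nodes with different, arbitrary-shaped influence curves.
INFECTED_AT = {"a": 0, "b": 1}
INFLUENCE = {"a": [3.0, 1.0, 0.5], "b": [2.0, 0.5]}

if __name__ == "__main__":
    for step in range(1, 5):
        print(step, predict_volume(INFECTED_AT, INFLUENCE, step))
```

Because each influence function is just a list with one value per time step, it can take any shape, which is what "no assumption is made about its shape" amounts to.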

// D. MoodCast: Emotion Prediction via Dynamic Continuous Factor Graph Model. i would also briefly check this paper ;-)

Morozov talk at the LSE

January 12th, 2011

Professional cyber-curmudgeon Evgeny Morozov will be speaking at the LSE on the 19th of January.

The war of words

January 5th, 2011

Evgeny Morozov’s latest skeptical article (of many) about the liberatory potential of communication technology describes a new trend in internet censorship: in addition to building national firewalls and knocking websites offline, some governments are trying to win the war of words on the internet by engaging their citizens in debate.

The Chinese government’s practice of paying citizens — the so-called “50 cent party” — to make pro-government points in online debates is well documented, and the propaganda department reported last year that “we have used the Internet to vigorously organize and launch positive propaganda, and actively strengthen our abilities to guide public opinion.” In Russia, Morozov claims that the Kremlin is engaging in “comment warfare” with its opponents. Yet what’s rarely asked in Western criticisms of such developments is how they compare to the operation of media power in our own societies. To the extent that authoritarian governments are engaging in public debate about their policies, rather than trying to silence such debate, are they not in fact moving towards a different form of governance — one in which winning the argument, rather than preventing the argument, is the foundation of legitimacy?

I’m not of course trying to claim that the internet in Russia or China is some kind of ideal Habermasian public sphere — but neither is the internet in Britain. “Public relations” and “spin” are accepted facts of life in Western politics, though few are honest enough to call them “propaganda”. We accept that a few hands control the news agenda, and that while anyone is in principle free to speak their mind, most will not be heard — yet we regard our society as essentially democratic, and Chinese or Russian society as essentially authoritarian.

I want to argue that such essentialism is mistaken, and that the mode of governance towards which China is rapidly moving — described by Rebecca MacKinnon as networked authoritarianism, and by Min Jiang as authoritarian deliberation — resembles in many ways the society in which we already live. That is not, of course, to say that they are identical; only that in both systems, winning the war of words is of paramount political importance.

This raises a difficult question for those who would like to quantify the democratising influence of communication technology: can we distinguish between propaganda and spin, between “comment warfare” and genuine debate — and if not, what does that mean for our understanding of democracy?

Magazines, marketers, middlemen and micropayments

January 4th, 2011

Fortune Tech is reporting that digital magazine subscriptions are already falling, before the new medium has even managed to reach the mainstream. Some in the publishing world would like to blame the problem on the difficulty of marketing digital magazines when Apple refuses to share customer data with publishers. Such complaints may be put to the test if Google’s prospective subscription service for Android gives publishers access to the customer data they’ve been asking for, as the Fortune article speculates.

How might the industry evolve if Apple, Google and the like have to compete to attract content creators by offering them customers’ data? This is an issue that goes beyond the new medium of digital magazines and calls into question the role of the device vendor as information gatekeeper.

One interesting possibility is that once creators can contact their customers they’ll deal with them directly, cutting device vendors out of future transactions and reducing their power as gatekeepers – and thus their bargaining power when trying to withhold further customer data. Given the potential for companies such as Apple and Google to abuse their position as Holders of the Holy Code-Signing Keys, I see this as a good thing; it might even be a step towards the kind of ‘peer-to-peer economics’ copyright reformers have long advocated, where artists and audiences benefit by cutting out rent-seeking middlemen. But for that to happen, there needs to be an easy way for creators to squeeze the occasional drop of cash from their newly-identified customers.

Perhaps Flattr, a click-based micropayment system, can fill that niche. But I wonder whether we can’t take its core insight – minimising mental transaction costs – even further. Why do I need to click when my device already knows what I’m reading, playing, watching and listening to?

What I want to suggest isn’t a new idea – it’s just a combination of two existing ideas, Flattr and scrobbling. Creators would tag their content to indicate ownership, and the mobile device would keep track of how much time was spent on each creator’s content during the month, dividing a fixed budget among the creators at the end of the month. Mental transaction cost: zero.
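The end-of-month split could be as simple as the sketch below. Everything in it is an assumption for illustration: the function name, tracking time in seconds, keeping money in whole pence, and handing leftover pence to the creators with the largest rounding remainders.

```python
def split_budget(seconds_by_creator, budget_pence):
    """Divide a fixed monthly budget among creators in proportion to time spent.

    seconds_by_creator: creator tag -> seconds of attention logged this month
    budget_pence: the user's fixed monthly budget, in whole pence
    Returns creator tag -> payout in whole pence; payouts sum to the budget
    whenever any time was logged.
    """
    total = sum(seconds_by_creator.values())
    if total == 0 or budget_pence == 0:
        return {creator: 0 for creator in seconds_by_creator}

    # Integer share for each creator, rounded down.
    shares = {creator: budget_pence * seconds // total
              for creator, seconds in seconds_by_creator.items()}

    # Hand out the pence lost to rounding, largest remainder first (ties by name).
    leftover = budget_pence - sum(shares.values())
    by_remainder = sorted(
        seconds_by_creator,
        key=lambda c: (-(budget_pence * seconds_by_creator[c] % total), c),
    )
    for creator in by_remainder[:leftover]:
        shares[creator] += 1
    return shares


if __name__ == "__main__":
    usage = {"magazine": 3600, "band": 1800, "blog": 600}  # made-up usage log
    print(split_budget(usage, 500))
```

The user never makes a per-item decision; the only number they choose is the monthly budget.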

The problem, of course, is that if users set their monthly budgets to zero, nobody gets paid. Will creators be willing to risk giving their work away for free in return for circumventing the middlemen and earning 100% of whatever people are willing to pay? If sales of digital magazines are anything to go by, they might not have much to lose…

Londoners! How do you get around town?

December 20th, 2010

Following on from some recent research about how individuals move about town, we are calling out to all Londoners to participate in a survey that we recently put together. The survey asks about two things: your travel habits and how you fund those travel habits. But a good place to start is: why should anyone care about these things?

At face value, topping up your Oyster card with credit or buying a travel card seems simple and mundane. However, we all know that the cost of travel in London is not only always growing – it also depends on who you are (which determines which discounts you are eligible for), where you travel to and from (i.e. what zones), when you travel (e.g., rush-hour or day time) and how frequently you tend to move between places over time periods that span from single days to an entire year (anyone out there ever bought an annual travel card? Not me!).

In other words, there isn’t really a transparent link between how you travel and what the cheapest fare for you to be paying is. Yes, I know about the daily capping on pay as you go – but if you are going to be travelling every day for seven days in a row, there is no “cap” on your weekly spend, so maybe you should have bought that 7-day pass! How do Londoners make these decisions?
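The arithmetic behind that decision is easy to sketch, even if nobody does it at the ticket machine. The fares below are placeholders I made up, not real TfL prices, and the model ignores zones, peak times and discounts; it only compares capped pay as you go against a flat 7-day pass.

```python
def payg_cost(daily_trip_counts, single_fare, daily_cap):
    """Total pay-as-you-go spend: each day's spend is capped, the week is not."""
    return sum(min(trips * single_fare, daily_cap) for trips in daily_trip_counts)


def cheapest_option(daily_trip_counts, single_fare, daily_cap, weekly_pass):
    """Return (option name, cost in pence) for the cheaper of the two choices."""
    payg = payg_cost(daily_trip_counts, single_fare, daily_cap)
    if payg <= weekly_pass:
        return ("pay as you go", payg)
    return ("7-day pass", weekly_pass)


# Placeholder prices in pence; hypothetical values, not actual fares.
SINGLE_FARE = 190
DAILY_CAP = 720
WEEKLY_PASS = 2580

if __name__ == "__main__":
    commuter = [4, 4, 4, 4, 4, 4, 4]    # four trips a day, every day
    occasional = [2, 0, 2, 0, 0, 1, 0]  # a handful of trips in the week
    print(cheapest_option(commuter, SINGLE_FARE, DAILY_CAP, WEEKLY_PASS))
    print(cheapest_option(occasional, SINGLE_FARE, DAILY_CAP, WEEKLY_PASS))
```

The catch is that the right answer depends on trips you have not made yet, which is exactly why travel histories are interesting.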

There are some fascinating numbers relating to money and the tube. Over £40,000 was refunded to travellers between January and August 2010 as a result of complaints regarding overcharging. TfL itself estimates that over £300,000 is wasted per day by passengers buying paper tickets instead of opting for the electronic equivalent (see here), and other investigations have revealed that approximately £30 million of travel credit is sitting in the system, idle and unused. These vast sums of wasted money all point to the fact that the decision at the point of purchase is not only uninformed and lacking in transparency, but also incredibly difficult for travellers to reason about when trying to buy the cheapest fare.

The survey has three parts:

  1. Questions about your travel habits! Where do you start/end your days? How often do you travel? What times do you travel? How consistent are your commutes?
  2. Questions about your topping up/travel card purchase habits! How much do you top-up by? Why and when do you use pay as you go? What travel cards do you buy? Why do you buy them?
  3. An opportunity for you to really help our research and enter a prize draw for a new Apple iPad! All you have to do is give us your Oyster card number and allow us to get your 8-week travel history from Transport for London. How will we use this? Your travel history will give us a direct insight into how groups of Londoners navigate our city. Keep in mind that we don’t want or ask for your name, telephone number, age, gender, or occupation. You are, to that extent, very anonymous (we ask for your email address for the prize draw). We just care about your Oyster card number and what kind of Oyster card it is. Your travel history data will be stored safely and anonymously and will only be used for this research project. If you have any concerns or need clarification, get in touch with me (email or twitter).

So, have I linked to the survey enough already? Please help us and fill it out!

Personalised Public Transport

December 20th, 2010

I’m just on my way back from beautiful Sydney, where I presented a paper called “Mining Public Transport Usage for Personalised Intelligent Transport Systems” (by me, Jon Froehlich, and Licia Capra) at the IEEE 2010 International Conference on Data Mining. The abstract of the paper reads as follows:

Traveller information, route planning, and service updates have become essential components of public transport systems: they help people navigate built environments by providing access to information regarding delays and service disruptions. However, one aspect that these systems invariably lack is a way of tailoring the information they offer in order to provide personalised trip time estimates and relevant notifications to each traveller. Mining each user’s travel history, collected by automated ticketing systems, has the potential to address this gap. In this work, we analyse one such dataset of travel history on the London underground. We then propose and evaluate methods to (a) predict personalised trip times for the system users and (b) rank stations based on future mobility patterns, in order to identify the subset of stations that are of greatest interest to each user and thus provide useful travel updates.

This roughly translates to:

Public transport in a large city like London can be chaotic; the information services that were built to support it do not take into consideration who you are when they spit out updates. At the same time, most Londoners now use Oyster cards, which record detailed traces of each person’s movements around the city. The research question we address in the paper is: can Oyster card records be leveraged to build personalised travel info services? Much like the way Amazon says “recommended especially for you” – can we do similar things with travel data? Short answer: yes. Long answer: read the paper. Medium answer: look at slides below.

foursquare was down

October 21st, 2010

salvo told me that foursquare had one of its databases overloaded with check-ins and consequently experienced a downtime of 11 hours! so central services are unscalable (unreliable) and they need to be decentralized. partial decentralization is the solution in the industry, and full decentralization still remains in the academic circles ($$ reasons). a few years ago, i asked the question: what if mobile social-networking services were to be decentralized? it was an academic exercise that resulted in:

social activism and weak ties

October 19th, 2010

In The New Yorker, Malcolm Gladwell argued that online social networks such as Twitter aren’t good for “real” social activism, not least because they support only weak ties. The assumption here is that social activism needs strong ties. In reality, the opposite is true. Mark Granovetter’s classic 1973 paper titled “The Strength of Weak Ties” discussed the relationship between tie strength and social activism. Granovetter considered the redevelopment project of the Italian neighbourhood in Boston in the 60s. The project was widely opposed by the community but went forward. Why? The problem was the absence of weak ties within the Italian neighbourhood. Social life revolved around members and unchanging groups of friends, and the density of strong ties (but relative lack of weak ones) inhibited any political change. Gladwell cited Granovetter’s article but didn’t read it. Gladwell titled his article “Why the revolution will not be tweeted”. Perhaps revolution is not what we need. We might just need people who read what they cite and don’t fall into the trap of “the old dismissing the new” (substitute “telephone” for “twitter”/“facebook” and see how the article reads). #fail