Web Science: A Royal Society Meeting (day 2)

following this post, notes for day 2. featured: luis von ahn, jonathan zittrain, noshir contractor, and manuel castells.

luis von ahn's talk was one of the most well-received [we were about to cry and go for a standing ovation ;-)]. in the first part of the talk, luis covered his well-known work on recaptcha. in the second part, he covered a new project called duolingo – a collaborative human effort to translate the entire web! the project has been considering the following problem: "how do we get 100 million people translating the web into every major language *for free*?" a tough problem, not least because of the following two main issues:

  1. lack of bilinguals
  2. how do you motivate people to do it for free?

here comes the brilliant ‘aha’ moment:

there are 1.2 billion people around the world who are learning a foreign language (5 million of whom live in the usa and have paid over $500 for educational software). so why not build a web platform with which people learn a foreign language for free while simultaneously translating text? luis' group has built one such platform and is now testing it. simply GENIUS! the result is that people learn as well as they would with professional educational software while they translate text (not word-for-word translation but whole sentences sequentially extracted from real books), and they do so really fast! luis estimated that translating all of Wikipedia into, for example, Spanish would only take 80 hours with 1m users!

of course, there are major details not covered in the presentation (for luis' grad students' peace of mind), but here is the big picture. if, say, you want to learn spanish (you want to learn reading, writing, listening, and speaking it), then the software will do the following – for

  1. reading: you are given “el zorro corre” and you should translate it (hopefully into something like “the fox runs”);
  2. writing: [the reverse] you are given "the fox runs" and you are supposed to come up with "el zorro corre". the software then combines 1) and 2) and infers the semantic link between "el zorro corre" and "the fox runs" (a toy sketch of this aggregation follows the list).
  3. listening: you need to subtitle videos in spanish
  4. speaking: you need to train a speech recogniser
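
here is a minimal toy sketch (my own, not from the talk) of the aggregation idea behind 1) and 2): if many learners translate the same sentence, even a naive majority vote over their answers yields a plausible consensus translation – duolingo's actual algorithm is surely more sophisticated than this.

```python
from collections import Counter

def consensus_translation(candidates):
    """Pick the most frequent candidate translation for one source sentence.

    `candidates` holds translations submitted by different learners for the
    same sentence. This is only an illustrative majority vote -- the talk did
    not describe how duolingo actually combines learners' answers.
    """
    normalised = [c.strip().lower() for c in candidates]
    best, count = Counter(normalised).most_common(1)[0]
    return best, count / len(normalised)   # consensus + agreement ratio

# eg, several learners translating "el zorro corre"
submissions = ["The fox runs", "the fox runs", "the fox is running"]
print(consensus_translation(submissions))  # ('the fox runs', 0.666...)
```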

this kind of research could be summarised in one sentence as follows: "take activities people do in the everyday world and re-purpose them to do something useful"!

jonathan zittrain made a very interesting point in his talk "Will the web break?" – the web is not isotropic and that's bad. he was referring to the variety of restrictions that challenge the notion of an open web – eg, one isn't able to watch BBC iplayer outside the UK, and google shows different results depending on where you are. in one sentence – the web isn't flat and changes depending on where you are! he proposed a set of properties that might preserve the web. the ideal web should be:

  • isotropic (wherever you are, a link takes you to the same place);
  • time symmetric and persistent (inference only works if this holds);
  • generative (any third party can produce something new);
  • if many universes, then wormholes (a different wikipedia for each language, linked by bridges); entropic?

noshir contractor (site) introduced his recent work in the new field of computational social science. this field taps into the opportunities created by having: i) social science theories; ii) data on web 2.0; iii) methods for analysing the data; and iv) computational infrastructure for doing the analysis. in one of their projects, they apply p* models to the problem of link prediction on different types of networks: i) social networks: who you know; ii) cognitive social networks: who they think you know; iii) knowledge networks: what they think you know. as a practical application, they consider recommender systems.
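
to make the link-prediction task concrete, here is a toy sketch (my own, not from the talk): contractor's group fits p* (exponential random graph) models, but even a simple jaccard score over common neighbours shows the shape of the task – score unconnected pairs and recommend the top ones.

```python
import networkx as nx

# toy "who you know" network; names are made up for illustration
G = nx.Graph()
G.add_edges_from([("ana", "bob"), ("bob", "cat"), ("ana", "cat"),
                  ("cat", "dan"), ("dan", "eve")])

# a jaccard coefficient over common neighbours stands in for the p* models
# used in the talk; it scores every pair of nodes not yet connected
scores = nx.jaccard_coefficient(G)
for u, v, score in sorted(scores, key=lambda t: -t[2])[:3]:
    print(f"recommend {u} -- {v}: {score:.2f}")
```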

the main idea behind ramesh jain's talk is that there are many sources of (what jain calls) "events" that come from, eg, facebook, twitter, cameras, and webcams. the question is – can we gather and analyse those events to offer users a personalised experience of info consumption? we need a personalised experience because, he claims, mainstream media outlets only cover events of general interest and do not cover all the news that is important to us (eg, from our family members). the good news is that with social networking all this information is there… the tech question then becomes – how do we translate events (eg, a spike of tweets in town) into meaningful real-life situations (eg, a protest in town)? jain's group tried to answer this question by plotting events into a "social picture" (where each pixel represents a micro-event) and by then using image processing algorithms to find hot spots (see this and kleinberg's work). for more, check jain's recent publications.
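
a rough sketch of the "social picture" idea (my own reconstruction, not code from jain's group): bin geotagged micro-events into a grid of pixels, smooth the resulting image, and keep local maxima that stand well above the background as candidate hot spots.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def social_picture(points, bins=50):
    """Bin (lat, lon) micro-events into a 2D histogram -- one 'pixel' per cell."""
    lats, lons = zip(*points)
    hist, _, _ = np.histogram2d(lats, lons, bins=bins)
    return hist

def hot_spots(picture, sigma=1.5):
    """Smooth the picture, then keep local maxima well above the background."""
    smoothed = gaussian_filter(picture, sigma=sigma)
    local_max = smoothed == maximum_filter(smoothed, size=5)
    unusual = smoothed > smoothed.mean() + 3 * smoothed.std()
    return np.argwhere(local_max & unusual)  # pixel coords of candidate hot spots

# synthetic example: a burst of geotagged tweets around one spot plus uniform noise
rng = np.random.default_rng(0)
burst = rng.normal(loc=(51.5, -0.1), scale=0.005, size=(300, 2))
noise = rng.uniform(low=(51.3, -0.3), high=(51.7, 0.1), size=(700, 2))
tweets = [tuple(p) for p in np.vstack([burst, noise])]
print(hot_spots(social_picture(tweets)))  # pixel(s) where the burst happened
```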

manuel castells gave a talk on what social research knows about social networks. he claims that network technologies rarely create social isolation; by contrast, they are often the medium for organising offline communication in new ways – the more social individuals are, the more they use the internet (to strengthen relationships with family, friends and local community). research shows that's true for nearly all cultures (see a study by Michael Wilmott on women). interestingly, manuel castells says that social-networking sites cannot control how people interact – "if they do, someone will create a new site that does what they want and everyone will migrate there. if facebook tried to go nasty, it would disappear, as AOL did, as seen when facebook tried to charge and retracted this three days later as people went away." [my opinion: i'm not sure i entirely agree with that - i feel that facebook is gradually eroding our privacy]. nice talk! during the q&a, wendy hall asked whether traditional sociology is relegated to small-scale studies. castells answered 'yes', of course. but this needs to be changed. i feel that we need more initiatives related to computational social science in the uk!

those are only a few of the talks given. more here.
