Archive for April, 2008

CINCO: supporting trust decisions for inter-enterprise collaboration

Wednesday, April 30th, 2008

The CINCO (Collaborative and Interoperable Computing) research group at University of Helsinki aims to automate some of the routine tasks in inter-enterprise collaboration management. The vision is that one day, enterprises can trust an automatic system to 1) figure out which service provider to use for a task, e.g. a logistics service to deliver a set of goods, 2) ensure that the collaborating services are interoperable, and 3) gather and share experience on how the collaboration went. And all this should be achievable without first spending a few years getting to know (and integrating their systems with) every single service provider whose offers they might wish to choose between.

Experience sharing for this kind of system has special needs. A major difference from, e.g., eBay and the various recommender systems for consumers is that it must be possible both to understand the information and to evaluate its credibility automatically. While the average concerned web user can google around for hoaxes, or browse through the profiles and activities of the users behind eBay ratings until convinced, our automatic decision-maker needs an explicit model of what counts as “suspicious” or “sensible” reputation information in order to judge the credibility of what is available. When a decision to commit real-world resources is made automatically, we need to be able to measure the certainty behind the reasoning.

A few of the interesting research questions we’re working on are how to represent the different factors of trust for these decisions and how to combine them into a decision, how to model shared experiences or reputation, how to evaluate the credibility of information and its sources, and how to make different reputation systems interoperate. See the group’s selected reading for more information and three surveys.

KDD 2008 Workshops

Tuesday, April 15th, 2008

The 2008 ACM SIGKDD conference (Knowledge Discovery and Data Mining) has just announced its accepted workshops, listed below. More information can be found here.

Full Day Workshops

  • The 2nd International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD’08)
  • The 9th Intl. Workshop on Multimedia Data Mining
  • WEBKDD’08: 10 Years of Knowledge Discovery on the Web
  • The 2nd International Workshop on Knowledge Discovery from Sensor Data (Sensor-KDD, 2008)
  • The 2nd ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD (PinKDD’08)
  • The 2nd SNA-KDD Workshop on Social Network Mining and Analysis

Half-day Workshops

  • The 2nd International Workshop on Mining Multiple Information Sources
  • The 2nd KDD Workshop on Large Scale Recommender Systems and the Netflix Prize
  • Data Mining with Constraints
  • Data Mining using Matrices and Tensors
  • The 8th International Workshop on Data Mining in Bioinformatics (BIOKDD08)
  • Data Mining for Business Applications

Online Applications

Tuesday, April 15th, 2008

Although in these posts we tend to focus on the research aspect of trust, social networking, recommender systems, and mobile applications, it is always interesting to keep an eye on what is going on in the “real” world – what is drawing the investment and attention of entrepreneurs.

Recommender Systems: the list is near-infinite now (and is proportional to the problem of data portability?), and everyone knows about the Amazons and Last.fms. However, a number of new names are appearing that merge recommendations with social interactions, moving away from neat algorithms and towards human-driven reviews and recommendations: names like Reevoo, Boxedup, LouderVoice, Crowdstorm, and RecommendBox.

Location-based Services: Although there is a wide range of potential applications for mobile phones, many of the early names seem to be focusing on mobile social networking: sites like Imity, Mobiluck, Loopt, Hyphen-8, and MeetMoi. Some of them use Bluetooth, while others only require their users to send their location by SMS. One day the “familiar stranger” will not be a stranger for long!

The Culture of the Amateur

Thursday, April 10th, 2008

If you are running particularly long experiments like me, or are looking for something to watch for 45 minutes, then I suggest this video on YouTube: a documentary about truth and Wikipedia. It features interviews with big pro- and anti-Web 2.0 names, and discusses the extent to which sites like Wikipedia encourage truth, freedom, and democracy (or mob rule, lies, and social fragmentation).

Underground Aesthetics: Rethinking Urban Computing

Tuesday, April 8th, 2008

Yesterday I came across this terrific piece of research (pdf).

Situation: We usually see mobility as a (research) problem. So we design applications:

  • For accessing info “anytime, anywhere” (When we view mobility as disconnection)
  • For helping users to find interesting nearby restaurants (When mobility involves being “out of place” or lost)
  • That respond to contextual cues. For example, a mobile that sets “itself automatically to vibrate mode in a theatre”. (When we view mobility as disruption)

Proposal: Some local folk (Arianna Bassoli of LSE and Karen Martin of UCL) and some folk on the other side of the pond (Johanna Brewer and Paul Dourish of UCI and Scott Mainwaring of Intel) propose to depart from our habit of viewing mobility as a problem. By contrast, they encourage designers of mobile applications to profit from movement and space. To prove the point, they have designed undersound – a music application that consists of three parts:
“1) A mobile phone client lets both emerging musicians and audiophiles wirelessly upload their tracks at upload points inside the Underground station ticket halls.
2) This same phone application lets users download tracks from download points on the train platforms as well as from other users in proximity.
3) The phone application stores metadata from each music exchange, which the upload and download access points throughout the undersound network collect and use to drive large visualizations in the ticket halls, which reflect the music’s movement through the network.”
For example, emerging musicians can get some free publicity by uploading their latest track and by adding the date of their next gig as a note to the track.
Their ethnographic study in the Tube is also well worth reading. It reminded me of what Francine Prose once wrote: “Travelers compare notes on how best to prevent their seatmates from making casual conversation. Perversely, it’s more likely that someone might “share” a confession with a national TV audience…”

Trust and Security in Virtual Communities

Tuesday, April 8th, 2008

What: Second Workshop: Usability and Interoperability in AuthN/AuthZ

Where: Oxford

When: 8th & 9th May 2009

Why: To take a snapshot of work being done in this area, particularly in the UK, to identify and disseminate the most promising solutions and best practice, and to inform and develop proposals for future research. Anyone wishing to offer a talk should contact Andrew Martin.

‘Ruthlessness gene’ discovered

Monday, April 7th, 2008

Researchers at the Hebrew University in Jerusalem found a link between a gene called AVPR1a and ruthless behaviour in an economic exercise called the ‘Dictator Game’.

WorldWide Buzz

Wednesday, April 2nd, 2008

A new paper, written while the author was an intern at Microsoft, analyses “the largest social network analyzed to date.” Here is the abstract:

We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. We investigate on a planetary-scale the oft-cited report that people are separated by “six degrees of separation” and find that the average path length among Messenger users is 6.6. We also find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.
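To make the 6.6 figure concrete: the average path length is the mean number of hops along a shortest path between pairs of users. Below is a minimal sketch on a toy undirected graph, running a breadth-first search from every node. The function names and the toy edge list are illustrative only, not taken from the paper, and an exact all-pairs BFS like this would be far too slow for 180 million nodes, where one would presumably estimate the average from BFS runs out of a random sample of source nodes instead.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_path_length(edges):
    """Mean shortest-path length over all connected pairs of distinct nodes.

    `edges` is an iterable of (u, v) pairs; the graph is treated as undirected.
    Pairs that are not connected at all are left out of the average.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total_hops = 0
    pairs = 0
    for source in adj:  # one BFS per node: fine for toys, hopeless at 180M nodes
        for target, hops in bfs_distances(adj, source).items():
            if target != source:
                total_hops += hops
                pairs += 1
    return total_hops / pairs if pairs else 0.0


# Toy path a-b-c-d: pair distances 1, 2, 3, 1, 2, 1, so the mean is 10/6
print(average_path_length([("a", "b"), ("b", "c"), ("c", "d")]))
```

A sampled estimate would only change the outer loop, iterating over a random subset of sources rather than over every node.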

YouTube Behavior

Wednesday, April 2nd, 2008

I read an interesting paper, “Identifying User Behavior in Online Social Networks,” by Marcelo Maia, Jussara Almeida and Virgílio Almeida. It was presented yesterday at the First International Workshop on Social Networks (co-located with EuroSys 2008).

The paper uses an interesting dataset: a social network built from user subscriptions on YouTube. In other words, if I subscribe to your video uploads, then I link to you in the network. Here is a very brief summary. How can users be classified according to their behavior? An answer would help specialists design their sites for the target audience; however, trying to identify groups of similarly behaving users from individual attributes does not produce useful results. So what can be done? More informative traits can be used: social-interaction attributes. In the YouTube subscription network, considering each user’s in-degree (the number of people who subscribe to that user’s content), out-degree (the number of subscriptions), and reciprocity (mutual subscriptions), together with the number of uploads, watches, and channel views, allows user behavior to be classified into five groups. The three main groups are content producers, content consumers, and mixed producer/consumers. The remaining two are older, possibly inactive users, and users with small degree and a high clustering coefficient (the cliques).
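The three social-interaction attributes named above are straightforward to derive from a list of subscription edges. Here is a minimal sketch, assuming each edge is a (subscriber, channel) pair and taking reciprocity as the raw count of mutual subscriptions; the paper may normalise or define it differently, and the function name `social_features` is hypothetical.

```python
from collections import defaultdict


def social_features(subscriptions):
    """Per-user in-degree, out-degree and mutual-subscription count.

    `subscriptions` is an iterable of (subscriber, channel) pairs, meaning
    `subscriber` subscribes to `channel`'s uploads (edge subscriber -> channel).
    """
    out_links = defaultdict(set)  # whom each user subscribes to
    in_links = defaultdict(set)   # who subscribes to each user
    for subscriber, channel in subscriptions:
        if subscriber == channel:
            continue  # ignore self-subscriptions
        out_links[subscriber].add(channel)
        in_links[channel].add(subscriber)

    features = {}
    for user in set(out_links) | set(in_links):
        outs = out_links.get(user, set())
        ins = in_links.get(user, set())
        features[user] = {
            "in_degree": len(ins),
            "out_degree": len(outs),
            "reciprocity": len(outs & ins),  # users linked in both directions
        }
    return features


edges = [("ann", "bob"), ("bob", "ann"), ("cat", "bob"), ("ann", "cat")]
for user, f in sorted(social_features(edges).items()):
    print(user, f)
```

In this toy network ann has in-degree 1, out-degree 2 and one mutual subscription (with bob), while cat has no mutual subscriptions. Vectors like these, plus the activity counts, are what a clustering step would then group.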

The users in this dataset were classified using k-means, which relies on a pre-defined value of k. Another interesting contribution is a method for finding which k to use, based on properly balancing inter- and intra-cluster distances (details in the paper). Of course, beyond needing to specify k, a more general weakness of these techniques seems to be that you need to know what you are looking for before you can find any structure. In other words, if the authors had decided to cluster on different social-interaction attributes (or social-network graph properties), might their results have been remarkably different?
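A rough sketch of the general idea of picking k by trading compactness against separation follows. This is not the criterion from the paper: as a stand-in, it scores each k by the mean point-to-centroid distance divided by the smallest distance between two centroids, and keeps the k with the lowest score. The clustering itself is plain Lloyd's k-means with a deterministic farthest-point initialization; all names and the toy data are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math


def assign(points, centers):
    """Group points by their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        # next seed: the point farthest from its nearest existing seed
        centers.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    for _ in range(max_iter):
        clusters = assign(points, centers)
        new_centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, assign(points, centers)


def separation_score(centers, clusters):
    """Hypothetical criterion: mean intra-cluster distance / min centroid gap (lower is better)."""
    n = sum(len(cl) for cl in clusters)
    intra = sum(math.dist(p, c) for c, cl in zip(centers, clusters) for p in cl) / n
    inter = min(math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra / inter if inter > 0 else math.inf


def choose_k(points, k_values):
    """Run k-means for each candidate k (each >= 2) and keep the lowest-scoring k."""
    scores = {k: separation_score(*kmeans(points, k)) for k in k_values}
    return min(scores, key=scores.get), scores


points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # group near the origin
    (10, 10), (10, 11), (11, 10), (11, 11),  # group near (10, 10)
    (20, 0), (20, 1), (21, 0), (21, 1),      # group near (20, 0)
]
best_k, scores = choose_k(points, range(2, 6))
print(best_k, scores)
```

On these three well-separated groups, k = 3 gets the lowest score: fewer clusters inflate the within-cluster distances, while extra clusters split a group and place two centroids close together. The sketch also illustrates the caveat above, since the answer depends entirely on which features were put into `points` in the first place.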

There are many other interesting papers that use YouTube datasets, including this one, which looks at how content popularity on the site fluctuates.