Archive for the ‘paper’ Category


Tuesday, July 28th, 2009

Placing Flickr Photos on a Map. The authors place photos on a map based only on the tags of those photos, exploiting both information from nearby locations and spatial ambiguity.

When More Is Less: The Paradox of Choice in Search Engine Use. They show that increasing recall works counter to user satisfaction, if it implies a choice from a more extensive set of result items. They call this phenomenon the paradox of choice. For example, having to choose from six results yielded both higher satisfaction and greater confidence than when there were 24 items to choose from

Telling Experts from Spammers: Expertise Ranking in Folksonomies. They presented a method in which power early-adopters score highly. I call power early-adopters those who promptly tag items that then go on to become popular.

Good Abandonment in Mobile and PC Internet Search. An investigation of when search abandonment is good (when the answer is right in the results list, so there is no need to open a page). Good abandonments are much more likely to occur on mobile devices than on PCs, and the rate varies by locale (the authors looked at the US, Japan, and China) and by category of query. “Our study has three key findings: First, queries potentially indicating good abandonment make up a significant portion of all abandoned queries. Second, the good abandonment rate from mobile search is significantly higher than that from PC search, across all locales tested. Third, classified by type of information need, the major classes of good abandonment vary dramatically by both locale and modality.”

Page Hunt: Improving Search Engines Using Human Computation Games.
Called Page Hunt, the game presents players with web pages and asks them to guess the queries that would produce the page within its first five results. Players score 100 points if the page is no.1 on the list, 90 points if it’s no.2, and so on. Bonuses are also awarded for avoiding frequently-used queries.
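
The scoring rule is simple enough to sketch in a few lines. This is only my reading of the description, not the game's actual code: I'm assuming that "and so on" means a fixed 10-point drop per position and that a page outside the first five results scores nothing. The function name and parameters are made up for illustration, and the bonus for avoiding frequently-used queries is left out because the description gives no numbers for it.

```python
def page_hunt_score(rank, top_n=5, top_score=100, step=10):
    """Points for a query that puts the target page at `rank` (1-based).

    Assumptions (not confirmed by the description): the score drops by a
    fixed `step` per position, and ranks beyond `top_n` score zero.
    """
    if rank < 1:
        raise ValueError("rank is 1-based")
    if rank > top_n:
        return 0
    return top_score - step * (rank - 1)


# Scores for ranks 1 to 6 under the assumptions above.
scores = [page_hunt_score(rank) for rank in range(1, 7)]
```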

danah boyd gave a GREAT talk titled ‘The Searchable Nature of Acts in Networked Publics’. In it, she debunked 3 myths about social networks:
1. There is only one type of social network. NO! There are 3 types of network:
1) sociological network (created from sociological study)
2) articulated network (created from listing friends)
3) behavioral network (created from interaction patterns)
Those networks are very different, but we have a tendency to assume they’re the same thing!!!

[Student Project Idea] Test whether the 3 types of social networks are related to each other and, if so, how!

2. Social ties are all equal. NO. For example, the context of those ties and their strength are two important aspects. (we have been discussing why context matters)
3. Content is King. In the tweet ‘i’m having for breakfast…’, the content isn’t important at all – it’s all about the awareness of sharing an experience.
danah then argued that social network sites are a type of networked public with four properties that are not typically present in face-to-face public life: persistence (what you say online stays online), replicability (content can be duplicated and taken out of context; often you can’t replicate the context), searchability (the potential visibility of content is great), and invisible audiences (we can only imagine the audience). This networked public creates a new sense of what is public and what is private. For example, young people care deeply about their privacy, but their notion of privacy is very different from that of adults. Finally, danah introduced a few stats on Twitter (5% of accounts are protected, 22% include http://, 36% mention @user, 5% contain a #hashtag, 3% are retweets (RT), and spam accounts are proliferating) and highlighted some interesting research points for the future: 1) how to make sense of content for such small bits of text; and 2) how social search can exploit analysis of the network of Twitter users, of context, and of tie strength.

8 friends are enough

Wednesday, May 20th, 2009

New article by Ross Anderson’s group. It’s beautiful in its simplicity. “Eight Friends are Enough: Social Graph Approximation via Public Listings shows how easy it is for an outsider to work out the structure of friendships on Facebook. (For more, see our blog on Facebook’s technical privacy and its democracy theatre.)”

In short: given

  • G: undirected graph (e.g., Facebook social net)
  • Gk: publicly available portion of G (one in which k outgoing friendship edges have been randomly chosen from G),

they show that the results of applying a certain function f (e.g., centrality, shortest paths, community structure) on Gk are similar to those of applying f on the entire G! That is, by using the public view (Gk), one is able to infer node centralities, shortest paths, and community structures of the whole G! Scary result for privacy-conscious people! But good news for researchers who need to handle big networks ;-) On the scary side, from a partial (public) view of a social network, one is able to guess

  • which nodes are central – e.g., 1) marketing companies are able to identify influential individuals and virally spread products through them; or 2) during protests that are self-organized via text messages, repressive governments are able to identify influential individuals and intercept their text traffic.
  • communities – the authors “were able to divide the [partial] graph into communities nearly as well as using complete graph knowledge.” (Sect 3.5)
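
To make the G versus Gk idea concrete, here is a minimal sketch in plain Python. It is not the authors' code or their evaluation: the random graph is a crude stand-in for a real social network, degree is used as the simplest possible centrality measure, and all function names are mine. It builds a public view in which every node reveals up to k randomly chosen friends, then checks how many of the highest-degree nodes of G are still among the highest-degree nodes of Gk.

```python
import random


def random_graph(n, p, rng):
    """Undirected random graph as a dict of neighbour sets (a toy stand-in for G)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def public_view(adj, k, rng):
    """Build Gk: each node reveals up to k randomly chosen friends."""
    view = {v: set() for v in adj}
    for u, friends in adj.items():
        shown = rng.sample(sorted(friends), min(k, len(friends)))
        for v in shown:
            view[u].add(v)
            view[v].add(u)  # a revealed friendship is visible from both ends
    return view


def top_nodes(adj, m):
    """The m nodes with the highest degree (ties broken by node id)."""
    return set(sorted(adj, key=lambda v: (-len(adj[v]), v))[:m])


rng = random.Random(0)
G = random_graph(200, 0.1, rng)
Gk = public_view(G, 8, rng)
overlap = len(top_nodes(G, 20) & top_nodes(Gk, 20))
print(f"{overlap} of the 20 highest-degree nodes in G are also top-20 in Gk")
```

Swapping in a real friendship graph and a better centrality measure would be the obvious next step; the point here is only how little code an outsider needs to start approximating G from its public listings.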

Studying Social Tagging and Folksonomy: A Review and Framework

Tuesday, April 14th, 2009

paper (pdf) by J. Trant, University of Toronto

Abstract:  This paper reviews research into social tagging and folksonomy (as reflected in about 180 sources published through December 2007). Methods of researching the contribution of social tagging and folksonomy are described, and outstanding research questions are presented. This is a new area of research, where theoretical perspectives and relevant research methods are only now being defined. This paper provides a framework for the study of folksonomy, tagging and social tagging systems. Three broad approaches are identified, focusing first, on the folksonomy itself (and the role of tags in indexing and retrieval); secondly, on tagging (and the behaviour of users); and thirdly, on the nature of social tagging systems (as socio-technical frameworks).

Crowdsourcing User Studies With Mechanical Turk

Tuesday, February 10th, 2009

We just finished our reading session of “Crowdsourcing User Studies With Mechanical Turk” (pdf). Very interesting paper. A few hand-written notes on which types of tasks we would run on MechTurk.

Sybils in RecSys

Friday, February 6th, 2009

SybilGuard’s authors will present a paper on how to defend recommender systems from the Sybil Attack.


DSybil: Optimal Sybil-Resistance for Recommendation Systems

I’m waiting to read the paper to see which real data they’ve used and how it would work on the typical social networks of recsys websites, which aren’t that big and may well not be fast mixing (fast mixing being one of SybilGuard’s controversial assumptions).

Homophily in MySpace

Friday, February 6th, 2009

(doc) by Mike Thelwall: “The results showed no evidence of gender homophily but significant evidence of homophily for ethnicity, religion, age, country, marital status, attitude towards children, sexual orientation, and reason for joining MySpace. There were also some imbalances, with women and the young being disproportionately commenters and commenters tending to have more Friends than commentees.”

On homophily

Thursday, January 29th, 2009

From “Birds of a Feather: Homophily in Social Networks” (pdf). “Similarity breeds connection. This principle—the homophily principle—structures network ties of every type, including marriage, friendship, work, advice, support, information transfer, exchange, comembership, and other types of relationship.”

On social web: open & privacy-friendly

Thursday, January 8th, 2009

From the Economist: Websites can now let visitors bring along their friends. A NEW button is appearing on some websites. It says “Facebook Connect” and saves visitors from having to fill out yet another tedious registration form, upload another profile picture and memorise another username and password. Instead, visitors can now sign into other sites using their existing identity on Facebook. …The big new idea, says Dave Morin, a Facebook Connect manager, is “dynamic privacy”. It means that, as the social network reaches out across the wider web, users will in theory take their privacy settings with them. Wherever on the web they are, they will be able to choose who among their friends will and won’t see what they are up to. As soon as a user demotes a “friend” from intimate to arm’s-length in his Facebook settings, this will also take effect on other sites.

Proximity Marketing & Proximity Networks

Thursday, November 20th, 2008

I’ve just finished putting some old material together for a position paper titled “Tapping the Mobile Digital Tapestry: Can mobile 2.0 companies make money without being greedy for personal data?” Of course, my answer is yes: “if companies were to give up control over user data, how they would make money? One promising way seems to be proximity marketing campaigns: distributing electronic ads among co-located mobile users. Companies like HyperTag and BlueMedia are currently working out how to best do so.”

However, to figure that out, those companies need to be supported by research, which necessarily needs real data. That is why it will be very important to collect data on who is co-located with whom and on what co-located people like. Only then will it be possible to run preliminary tests of the effectiveness of proximity marketing campaigns. Hopefully, that will open up a new research area: proximity & affinity networks!

Social network collaborative filtering

Monday, October 13th, 2008

Interestingly, “This paper demonstrates that ‘social network collaborative filtering’ (SNCF), wherein user-selected like-minded alters are used to make predictions, can rival traditional user-to-user collaborative filtering (CF) in predictive accuracy.”

WWW’08 highlights

Tuesday, July 8th, 2008

A few papers from WWW’08 that may be of interest:

Social Systems

Monday, June 30th, 2008

This month’s Data Engineering Bulletin is about Recommendation and Search in Social Systems. It sports thoughts on robustness and user experience.

Underground Aesthetics: Rethinking Urban Computing

Tuesday, April 8th, 2008

Yesterday I came across this terrific piece of research (pdf).

Situation: We usually see mobility as a (research) problem. So we design applications:

  • For accessing info “anytime, anywhere” (When we view mobility as disconnection)
  • For helping users to find interesting nearby restaurants (When mobility involves being “out of place” or lost)
  • That respond to contextual cues. For example, a mobile that sets “itself automatically to vibrate mode in a theatre”. (When we view mobility as disruption)

Proposal: Some local folk (Arianna Bassoli of LSE and Karen Martin of UCL) and some folk on the other side of the pond (Johanna Brewer and Paul Dourish of UCI and Scott Mainwaring of Intel) propose to depart from our habit of viewing mobility as a problem. By contrast, they encourage designers of mobile applications to profit from movement and space. To prove the point, they have designed undersound – a music application that consists of three parts:
“1) A mobile phone client lets both emerging musicians and audiophiles wirelessly upload their tracks at upload points inside the Underground station ticket halls.
2) This same phone application lets users download tracks from download points on the train platforms as well as from other users in proximity.
3) The phone application stores metadata from each music exchange, which the upload and download access points throughout the undersound network collect and use to drive large visualizations in the ticket halls, which reflect the music’s movement through the network.”
For example, emerging musicians can get some free publicity by uploading their latest track and by adding the date of their next gig as a note to the track.
Also, their ethnographic study in the Tube is well worth reading. It reminded me of what Francine Prose once wrote: “Travelers compare notes on how best to prevent their seatmates from making casual conversation. Perversely, it’s more likely that someone might “share” a confession with a national TV audience…”

WorldWide Buzz

Wednesday, April 2nd, 2008

A new paper, written while the author was an intern at Microsoft, analyses “the largest social network analyzed to date.” Here is the abstract:

We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. We investigate on a planetary-scale the oft-cited report that people are separated by “six degrees of separation” and find that the average path length among Messenger users is 6.6. We also find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.
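
For readers who want to see what an "average path length" figure means operationally, here is a small sketch that computes it by breadth-first search from every node. This is my own illustration on a four-node toy graph, not the method used in the paper; an exhaustive computation like this is only practical on small graphs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs of distinct nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy chain a - b - c - d (hypothetical data).
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
apl = average_path_length(toy)
```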

YouTube Behavior

Wednesday, April 2nd, 2008

I read an interesting paper, “Identifying User Behavior in Online Social Networks,” by Marcelo Maia, Jussara Almeida and Virgílio Almeida. It was presented yesterday at the First International Workshop on Social Networks (co-located with EuroSys 2008).

The paper uses an interesting dataset: a social network based on user subscriptions on YouTube. In other words, if I subscribe to your video uploads, then I link to you in the network. Here is a very brief summary: How is it possible to classify users according to different behaviors? An answer to this question would help specialists design their sites according to the target audience; however, trying to identify groups of similarly-behaved users based on individual attributes does not produce useful results. So what can be done? More informative traits can be used: the social interaction attributes. For example, consider the subscription network of YouTube: considering each user’s in-degree (people who subscribe to that user’s content), out-degree (number of subscriptions), and reciprocity (mutual subscriptions), as well as number of uploads, watches, and channel views, allows user behavior to be classified into five groups. The three main groups that appear are the content producers, consumers, and mixed producer/consumers. The last two are the old, possibly inactive users and those with small degree and a high clustering coefficient (the cliques).
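
As a rough illustration of the social interaction attributes mentioned above, the sketch below derives in-degree, out-degree, and reciprocity from a list of (subscriber, channel) pairs. The data and names are hypothetical, and I'm assuming reciprocity means the fraction of a user's subscriptions that are returned; the paper may define it differently.

```python
from collections import defaultdict


def subscription_features(edges):
    """Per-user in-degree, out-degree, and reciprocity from (subscriber, channel) pairs."""
    out_links = defaultdict(set)  # user -> channels the user subscribes to
    in_links = defaultdict(set)   # user -> users who subscribe to them
    for subscriber, channel in edges:
        out_links[subscriber].add(channel)
        in_links[channel].add(subscriber)

    features = {}
    for user in set(out_links) | set(in_links):
        outs = out_links.get(user, set())
        ins = in_links.get(user, set())
        features[user] = {
            "in_degree": len(ins),
            "out_degree": len(outs),
            # Assumed definition: share of the user's subscriptions that are mutual.
            "reciprocity": len(outs & ins) / len(outs) if outs else 0.0,
        }
    return features


# Hypothetical toy data: ann and bob follow each other, cat follows both.
edges = [("ann", "bob"), ("bob", "ann"), ("cat", "bob"), ("cat", "ann")]
features = subscription_features(edges)
```

On this toy data, ann and bob look like mixed producer/consumers (they have subscribers and return subscriptions), while cat is a pure consumer with no subscribers.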

The users in this dataset were classified using k-means, which typically relies on a pre-defined value of k to work. Another interesting contribution is a method that finds what k to use, based on properly balancing inter-cluster and intra-cluster distances (details in the paper). Of course, just like needing to specify k, a more general weakness of these techniques seems to be that you need to know what you are looking for before you can find any structure. In other words, if the authors had decided to cluster based on different social-interaction attributes (or social-net graph properties), maybe their results would have been remarkably different?
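
To give a feel for what "balancing inter- and intra-cluster distance" can look like, here is a small self-contained sketch. It is not the method from the paper (I have not reproduced their criterion); it is a simple heuristic of my own choosing that runs k-means for several candidate values of k and keeps the one with the lowest ratio of mean point-to-centroid distance to the smallest centroid-to-centroid distance. The seeding is a deterministic farthest-point rule so the example is repeatable, and the 2-D points are made up.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then keep adding the
    point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Put every point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=100):
    """Plain Lloyd iterations; an empty cluster keeps its previous centroid."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def separation_ratio(centroids, clusters):
    """Mean intra-cluster distance divided by the smallest inter-centroid
    distance (lower is better). Assumes k >= 2 and distinct centroids."""
    n = sum(len(cluster) for cluster in clusters)
    intra = sum(
        math.dist(p, centroids[i]) for i, cluster in enumerate(clusters) for p in cluster
    ) / n
    inter = min(
        math.dist(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:]
    )
    return intra / inter


def choose_k(points, candidates):
    """Pick the candidate k with the lowest separation ratio."""
    return min(candidates, key=lambda k: separation_ratio(*kmeans(points, k)))


# Three made-up, well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]
best_k = choose_k(points, [2, 3, 4])
```

On this toy data the heuristic settles on k = 3, matching the three groups. It also illustrates the weakness noted above: the answer depends entirely on which features and which distance you decided to feed in.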

There are a lot of other interesting papers that use YouTube datasets, including this one that looks at how content popularity on the site fluctuates.