Archive for the ‘paper’ Category


Tuesday, July 28th, 2009

Placing Flickr Photos on a Map. The authors place photos on a map based only on those photos’ tags, exploiting both information from nearby locations and spatial ambiguity.

When More Is Less: The Paradox of Choice in Search Engine Use. They show that increasing recall works against user satisfaction when it forces a choice from a more extensive set of results. They call this phenomenon the paradox of choice. For example, having to choose from six results yielded both higher satisfaction and greater confidence than having to choose from 24.

Telling Experts from Spammers: Expertise Ranking in Folksonomies. They presented a method in which power early-adopters score highly. I call power early-adopters those who promptly tag items that later become popular.
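The post doesn’t detail the paper’s actual ranking algorithm, but the power early-adopter intuition can be sketched as a toy scorer (the names and the discounting scheme below are my own assumptions, not the paper’s method):

```python
from collections import defaultdict

def early_adopter_scores(tag_events, popularity):
    """Toy scorer: reward users who tag items early that later become popular.

    tag_events: list of (user, item, time) tuples.
    popularity: dict mapping item -> its eventual popularity
                (e.g., total number of taggers).
    A user earns each tagged item's popularity, discounted by how long
    after the item's first tagger they arrived.
    """
    # Time at which each item was first tagged by anyone.
    first_time = {}
    for user, item, t in tag_events:
        first_time[item] = min(t, first_time.get(item, t))
    scores = defaultdict(float)
    for user, item, t in tag_events:
        delay = t - first_time[item]
        scores[user] += popularity.get(item, 0) / (1.0 + delay)
    return dict(scores)
```

Under this sketch, a user who tags a soon-to-be-popular item at time 0 outscores one who tags the same item five time units later.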

Good Abandonment in Mobile and PC Internet Search. An investigation of when search abandonment is good (when the answer is right in the results list, so there is no need to open a page). Good abandonments are much more likely to occur on mobile devices than on PCs, and they vary by locale (the study looked at the US, Japan, and China) and by category of query. “Our study has three key findings: First, queries potentially indicating good abandonment make up a significant portion of all abandoned queries. Second, the good abandonment rate from mobile search is significantly higher than that from PC search, across all locales tested. Third, classified by type of information need, the major classes of good abandonment vary dramatically by both locale and modality.”

Page Hunt: Improving Search Engines Using Human Computation Games.
Called Page Hunt, the game presents players with web pages and asks them to guess the queries that would produce the page within its first five results. Players score 100 points if the page is no.1 on the list, 90 points if it’s no.2, and so on. Bonuses are also awarded for avoiding frequently-used queries.
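The scoring rule is concrete enough to sketch (the bonus for avoiding frequent queries is omitted, and the linear drop-off beyond rank 2 is inferred from the “and so on”):

```python
def page_hunt_score(rank, max_rank=5):
    """Score a guessed query by the rank at which the target page appears.

    Rank 1 earns 100 points, rank 2 earns 90, and so on down the list;
    a page outside the top max_rank results earns nothing.
    """
    if 1 <= rank <= max_rank:
        return 100 - 10 * (rank - 1)
    return 0
```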

danah boyd gave a GREAT talk titled ‘The Searchable Nature of Acts in Networked Publics’. In it, she debunked 3 myths about social networks:
1. There is only one type of social network. NO! There are 3 types of network:
1) sociological network (created from sociological study)
2) articulated network (created from listing friends)
3) behavioral network (created from interaction patterns)
These networks are very different, but we have a tendency to assume they’re the same thing!

[Student Project Idea] Test whether the 3 types of social networks are related to each other and, if so, how!

2. Social ties are all equal. NO. The context of those ties and their strength are two important aspects, for example. (We have been discussing why context matters.)
3. Content is King. In the tweet ‘i’m having for breakfast…’, the content isn’t important at all – it’s all about the awareness of sharing an experience.
danah then argued that social network sites are a type of networked public with four properties that are not typically present in face-to-face public life: persistence (what you say online stays online), replicability (content can be duplicated, and can be taken out of context – often you can’t replicate context), searchability (the potential visibility of content is great), and invisible audiences (we can only imagine the audience). This networked public creates a new sense of what is public and what is private. For example, young people care deeply about their privacy, but their notion of privacy is very different from that of adults. Finally, danah introduced a few stats on Twitter (5% of accounts are protected, 22% of tweets include http://, 36% mention @user, 5% contain a #hashtag, 3% are retweets, and spam accounts are proliferating) and highlighted some interesting research points for the future: 1) how to make sense of such small bits of text; and 2) how social search can exploit analysis of the network of Twitter users, of context, and of tie strength.

A Principal Component Analysis of 39 Scientific Impact Measures

Wednesday, July 15th, 2009

Paper by Johan Bollen, Herbert Van de Sompel, Aric Hagberg, and Ryan Chute


Traditionally, the impact of scientific publications has been expressed in terms of citation counts (e.g., the Journal Impact Factor – JIF). Today, new impact measures have been proposed based on social network analysis (e.g., eigenvector centrality) and usage log data (e.g., usage impact factor) to capture scientific impact in the digital era. However, among the plethora of new measures, which is most suitable for measuring scientific impact?

The authors performed a principal component analysis on the rankings produced by 39 different measures of scientific impact. They find that scientific impact is a multi-dimensional construct that cannot be adequately measured by any single indicator, although some are more suitable than others.

From the results, they draw four conclusions that have significant implications on the development of scientific assessment.

  1. The set of usage measures is more strongly correlated than the set of citation measures: usage measures computed from the same usage log data agree with each other more than citation measures computed from the same citation data do.
  2. Usage-based measures are stronger indicators of scientific Prestige than many presently available citation measures. Impact factor and journal rank turn out to be strong indicators of scientific Popularity.
  3. Usage impact measures turn out to be closer to a “consensus ranking” of journals than some common citation measures.
  4. Contrary to the common belief that the JIF is the “gold standard”, usage-based measures such as Usage Closeness centrality may be better “consensus” measures than the JIF.
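A minimal sketch of the kind of analysis the paper runs (the real study covers 39 measures over large journal sets; the function name and the standardization choice here are mine):

```python
import numpy as np

def impact_measure_pca(rankings):
    """PCA over journal rankings produced by several impact measures.

    rankings: (n_measures, n_journals) array where each row ranks the
    same set of journals under one measure (rows must not be constant).
    Rows are standardized so the principal components reflect the
    correlations between measures.  Returns the fraction of variance
    explained by each component, largest first.
    """
    X = np.asarray(rankings, dtype=float)
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    # Covariance across measures; its eigenvalues give the variance
    # captured by each principal component.
    cov = np.cov(X)
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    return eigvals / eigvals.sum()
```

If scientific impact were one-dimensional, the first component would explain nearly all the variance; the paper’s point is that it doesn’t, so variance is spread across several components.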

Using Data Mining and Recommender Systems to Scale up the Requirements Process

Monday, July 13th, 2009

Paper by Jane Cleland-Huang and Bamshad Mobasher


Ultra-large-scale (ULS) software projects involve hundreds or even thousands of stakeholders. The requirements may not be fully knowable upfront and instead emerge over time as stakeholders interact with the system. As a result, the requirements process needs to scale up to the large number of stakeholders and be conducted in increments to respond quickly to changing needs.

Existing requirements engineering methods are not designed to scale for ULS projects:

  • Waterfall and iterative approaches assume requirements are knowable upfront and are elicited during the early phases of the project
  • Agile processes are suitable only for small-scale projects
  • Stakeholder identification methods only identify a subset of stakeholders

This position paper makes two proposals:

  • Using data-mining techniques (e.g., unsupervised clustering) to identify themes from stakeholders’ statements of needs
  • Using recommender systems to facilitate broad stakeholder participation in the requirements elicitation and prioritisation process

Early evaluations show promise in the proposals:

  • Clustering algorithms (e.g., bisecting K-means) generated reasonably cohesive requirements clusters. However, a significant number contained requirements that were only loosely coupled. Probabilistic Latent Semantic Analysis (pLSA) was then used, and early results showed improvement in cluster quality.
  • Their prototype recommender systems generated discussion forums that were more cohesive than ad-hoc ones created by users, and were able to recommend a significant number of relevant forums to stakeholders.
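This is not the paper’s pipeline (which applies bisecting K-means and pLSA to real stakeholder data), but the clustering step can be illustrated with a self-contained toy: bag-of-words vectors plus plain k-means.

```python
import numpy as np

def cluster_statements(statements, k, n_iter=20):
    """Group stakeholder need statements by shared vocabulary.

    A toy stand-in for the requirements-clustering step: each statement
    becomes a bag-of-words count vector, then plain k-means assigns it
    to one of k clusters.  Returns a list of cluster ids, one per statement.
    """
    vocab = sorted({w for s in statements for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(statements), len(vocab)))
    for row, s in enumerate(statements):
        for w in s.lower().split():
            X[row, index[w]] += 1
    # Use the first k statements as initial centres (fine for a toy).
    centers = X[:k].copy()
    for _ in range(n_iter):
        # Assign each statement to its nearest centre, then recompute centres.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels.tolist()
```

Statements about logins and passwords end up in one cluster, statements about report export in another; the paper’s observation is that such clusters are reasonably cohesive but need a probabilistic model (pLSA) to handle loosely coupled requirements.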

The proposed open elicitation framework introduces some challenges:

  • unsupervised group collaborations
  • decentralised prioritisation of requirements
  • malicious stakeholders manipulating the system for their personal gains

8 friends are enough

Wednesday, May 20th, 2009

New article by Ross Anderson’s group. It’s beautiful in its simplicity. “Eight Friends are Enough: Social Graph Approximation via Public Listings shows how easy it is for an outsider to work out the structure of friendships on Facebook. (For more, see our blog on Facebook’s technical privacy and its democracy theatre.)”

In short: Having

  • G: undirected graph (e.g., Facebook social net)
  • Gk: the publicly available portion of G (in which only k outgoing friendship edges per node have been randomly chosen from G),

they show that the results of applying a certain function f (e.g., centrality, shortest paths, community structure) on Gk are similar to those of applying f on the entire G! That is, by using only the public view Gk, one is able to infer node centralities, shortest paths, and community structures of the whole G! A scary result for privacy-conscious people, but good news for researchers who need to handle big networks ;-) On the scary side, from a partial (public) view of a social network, one is able to guess

  • which nodes are central – e.g., 1) marketing companies are able to identify influential individuals and virally spread products through them; or 2) during protests that are self-organized via text messages, repressive governments are able to identify influential individuals and intercept their text traffic.
  • communities – the authors “were able to divide the [partial] graph into communities nearly as well as using complete graph knowledge.” (Sect. 3.5)
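A toy version of the attack model, not the paper’s exact experiment (the function names and sampling details are mine): each node publicly reveals at most k friends, yet edges revealed by others accumulate on well-connected nodes, so the public view still exposes who is central.

```python
import random

def public_view(graph, k, seed=0):
    """Approximate a social graph from public listings.

    graph: dict node -> set of friends (undirected: both endpoints
    list each other).  Each node reveals at most k friends, chosen at
    random; the public view Gk is the union of all revealed edges.
    """
    rng = random.Random(seed)
    public = {node: set() for node in graph}
    for node, friends in graph.items():
        sample = rng.sample(sorted(friends), min(k, len(friends)))
        for friend in sample:
            # Revealed edges are undirected, so both endpoints gain them.
            public[node].add(friend)
            public[friend].add(node)
    return public

def top_by_degree(graph, n):
    """The n nodes with the most edges, most connected first."""
    return sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:n]
```

With a hub connected to 20 leaves and k = 3, the hub itself reveals only 3 edges, but every leaf reveals its edge to the hub, so the hub still tops the public-view degree ranking.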

Studying Social Tagging and Folksonomy: A Review and Framework

Tuesday, April 14th, 2009

paper (pdf) by J. Trant, University of Toronto

Abstract:  This paper reviews research into social tagging and folksonomy (as reflected in about 180 sources published through December 2007). Methods of researching the contribution of social tagging and folksonomy are described, and outstanding research questions are presented. This is a new area of research, where theoretical perspectives and relevant research methods are only now being defined. This paper provides a framework for the study of folksonomy, tagging and social tagging systems. Three broad approaches are identified, focusing first, on the folksonomy itself (and the role of tags in indexing and retrieval); secondly, on tagging (and the behaviour of users); and thirdly, on the nature of social tagging systems (as socio-technical frameworks).

Crowdsourcing User Studies With Mechanical Turk

Tuesday, February 10th, 2009

We just finished our reading session of “Crowdsourcing User Studies With Mechanical Turk” (pdf). Very interesting paper. A few hand-written notes on which types of tasks we would run on Mechanical Turk.

Sybils in RecSys

Friday, February 6th, 2009

SybilGuard’s authors will present a paper on how to defend recommender systems from the Sybil Attack.


DSybil: Optimal Sybil-Resistance for Recommendation Systems

I’m waiting to read the paper to see which real data they’ve used and how the approach would work on the typical social networks of recsys websites, which aren’t that big and may well not be fast mixing (one of SybilGuard’s controversial assumptions).

Homophily in MySpace

Friday, February 6th, 2009

(doc) by Mike Thelwall: “The results showed no evidence of gender homophily but significant evidence of homophily for ethnicity, religion, age, country, marital status, attitude towards children, sexual orientation, and reason for joining MySpace. There were also some imbalances, with women and the young being disproportionately commenters and commenters tending to have more Friends than commentees.”

On homophily

Thursday, January 29th, 2009

From “Birds of a Feather: Homophily in Social Networks” (pdf). “Similarity breeds connection. This principle—the homophily principle—structures network ties of every type, including marriage, friendship, work, advice, support, information transfer, exchange, comembership, and other types of relationship.”

On social web: open & privacy-friendly

Thursday, January 8th, 2009

From the Economist: Websites can now let visitors bring along their friends. A NEW button is appearing on some websites. It says “Facebook Connect” and saves visitors from having to fill out yet another tedious registration form, upload another profile picture and memorise another username and password. Instead, visitors can now sign into other sites using their existing identity on Facebook. …The big new idea, says Dave Morin, a Facebook Connect manager, is “dynamic privacy”. It means that, as the social network reaches out across the wider web, users will in theory take their privacy settings with them. Wherever on the web they are, they will be able to choose who among their friends will and won’t see what they are up to. As soon as a user demotes a “friend” from intimate to arm’s-length in his Facebook settings, this will also take effect on other sites.

Proximity Marketing & Proximity Networks

Thursday, November 20th, 2008

I’ve just finished putting some old material together for a position paper titled “Tapping the Mobile Digital Tapestry: Can mobile 2.0 companies make money without being greedy for personal data?” Of course, my answer is yes: “if companies were to give up control over user data, how would they make money? One promising way seems to be proximity marketing campaigns: distributing electronic ads among co-located mobile users. Companies like HyperTag and BlueMedia are currently working out how best to do so.”

However, to figure that out, those companies need to be supported by research, which necessarily needs real data. That is why it will be very important to collect data on who is co-located with whom and on what co-located people like. Only in that way will it be possible to preliminarily test the effectiveness of proximity marketing campaigns. Hopefully, that will open up a new research area: proximity & affinity networks!

Social network collaborative filtering

Monday, October 13th, 2008

Interestingly, this paper demonstrates that “social network collaborative filtering” (SNCF), wherein user-selected like-minded alters are used to make predictions, can rival traditional user-to-user collaborative filtering (CF) in predictive accuracy.
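The core idea fits in a few lines (not the paper’s exact predictor, which presumably weights alters; the names here are mine): instead of finding neighbours by rating similarity, just average the ratings of the alters the user explicitly picked.

```python
def predict_rating(user, item, ratings, alters):
    """Predict a rating from user-selected like-minded alters (SNCF-style).

    ratings: dict user -> dict item -> rating.
    alters: dict user -> list of users this user picked as like-minded.
    Averages the ratings of the user's chosen alters who rated the item;
    returns None if no alter rated it.
    """
    votes = [ratings[a][item] for a in alters.get(user, [])
             if item in ratings.get(a, {})]
    return sum(votes) / len(votes) if votes else None
```

Classic user-to-user CF would instead select neighbours automatically by rating-vector similarity; the paper’s finding is that the self-selected neighbourhood can do comparably well.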

WWW’08 highlights

Tuesday, July 8th, 2008

A few papers from WWW’08 that may be of interest:

Social Systems

Monday, June 30th, 2008

This month’s Data Engineering Bulletin is about Recommendation and Search in Social Systems. It sports thoughts on robustness and user experience.

Underground Aesthetics: Rethinking Urban Computing

Tuesday, April 8th, 2008

Yesterday I came across this terrific piece of research (pdf).

Situation: We usually see mobility as a (research) problem. So we design applications:

  • For accessing info “anytime, anywhere” (When we view mobility as disconnection)
  • For helping users to find interesting nearby restaurants (When mobility involves being “out of place” or lost)
  • That respond to contextual cues. For example, a mobile that sets “itself automatically to vibrate mode in a theatre”. (When we view mobility as disruption)

Proposal: Some local folk (Arianna Bassoli of LSE and Karen Martin of UCL) and some folk on the other side of the pond (Johanna Brewer and Paul Dourish of UCI and Scott Mainwaring of Intel) propose to depart from our habit of viewing mobility as a problem. By contrast, they encourage designers of mobile applications to profit from movement and space. To prove the point, they have designed undersound – a music application that consists of three parts:
“1) A mobile phone client lets both emerging musicians and audiophiles wirelessly upload their tracks at upload points inside the Underground station ticket halls.
2) This same phone application lets users download tracks from download points on the train platforms as well as from other users in proximity.
3) The phone application stores metadata from each music exchange, which the upload and download access points throughout the undersound network collect and use to drive large visualizations in the ticket halls, which reflect the music’s movement through the network.”
For example, emerging musicians can get some free publicity by uploading their latest track and by adding the date of their next gig as a note to the track.
Also, their ethnographic study in the Tube is well worth reading. It reminded me of what Francine Prose once wrote: “Travelers compare notes on how best to prevent their seatmates from making casual conversation. Perversely, it’s more likely that someone might “share” a confession with a national TV audience…”