Archive for January, 2011

icdm 2010

Thursday, January 27th, 2011

last month i went to icdm (i gave a talk on rethinking mobile recommendations & neal gave one on personalised public transport). a few interesting contributions follow:

// A. Christos Faloutsos (Carnegie Mellon University) gave a keynote talk titled ‘Mining Billion-node Graphs: Patterns, Generators and Tools’. He & his colleagues (=they, henceforth) studied several network measures such as:

  1. node degree. this measure might be useful to answer the question of, for example, whether an epidemic will die out or not (one may look at average degree, max degree, degree variance). However, it turns out that only the first (largest) eigenvalue of the adjacency matrix is needed to understand whether the epidemic will take over or not [see Prakash et al. on arXiv].

  2. network diameter. they found a surprising result: the diameter shrinks over time. this is surprising because the theory (see science papers by barabasi et al.) predicts the opposite, ie, that the diameter grows as log(n), where n is the number of nodes in the network. in a paper at sdm10, they proposed a way of computing the diameter efficiently and computed the diameter of the whole (Yahoo) Web graph: of course, its distribution was multi-modal, as one would expect, since some parts of the Web are separated from others by, for example, language
  3. eigenvalues. eigenvalues might be useful for fraud detection (see the KDD09 paper on belief propagation) and for characterising and spotting anomalies in networks (they built a tool for doing that, called Oddball, which characterises, eg, each node's egonetwork by very simple quantities such as number of nodes, number of edges, number of triangles, total weight, and principal eigenvalue)
  4. triangles. in [Tsourakakis ICDM 2008], they have shown that a node with n friends participates in n^1.6 triangles on average. computing the number of triangles in a network is computationally expensive. fortunately, they proposed two ways to make this computation tractable:
  • the 1st way computes only the few top eigenvalues (lambda_i): the number of triangles is then approximately 1/6 of sum(lambda_i^3), a sum that is exact when all eigenvalues are included [Tsourakakis ICDM 2008]
  • the 2nd way relies on SVD (see EigenSpokes at PKDD 2010)
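why does the eigenvalue trick in item 4 work? sum(lambda_i^3) equals trace(A^3), the number of closed walks of length 3 in the graph, and each triangle contributes exactly 6 of them (3 starting nodes times 2 directions). below is a minimal pure-Python sketch, assuming a small undirected graph stored as a dense matrix. it gets all eigenvalues with textbook Jacobi rotations, which is only a stand-in for the sparse top-k eigensolvers one would use on a real network (and not the paper's implementation), then compares the eigenvalue estimate with a brute-force count. all function names and the toy graph are hypothetical.

```python
import math
from itertools import combinations


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected simple graph."""
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    return a


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """All eigenvalues of a symmetric matrix via cyclic Jacobi rotations.

    Only sensible for toy graphs; large graphs need a sparse solver
    that returns just the top few eigenvalues.
    """
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def triangles_exact(n, edges):
    """Brute force: count every 3-subset of nodes that is fully connected."""
    adj = {(min(u, v), max(u, v)) for u, v in edges}
    return sum(1 for x, y, z in combinations(range(n), 3)
               if (x, y) in adj and (x, z) in adj and (y, z) in adj)


def triangles_from_eigenvalues(eigs, k=None):
    """(1/6) * sum(lambda_i^3), optionally keeping only the k eigenvalues
    of largest magnitude (the approximation)."""
    ranked = sorted(eigs, key=abs, reverse=True)
    if k is not None:
        ranked = ranked[:k]
    return sum(lam ** 3 for lam in ranked) / 6.0


# hypothetical toy graph: a 4-clique {0, 1, 2, 3} plus a pendant node 4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
eigs = jacobi_eigenvalues(adjacency(5, edges))
print(triangles_exact(5, edges))               # 4
print(round(triangles_from_eigenvalues(eigs)))  # 4
print(triangles_from_eigenvalues(eigs, k=1))    # estimate from lambda_1 alone
```

the k=1 line shows the point of the method: on graphs with a few dominant eigenvalues, a handful of lambda_i^3 terms already carry most of the sum, so one can stop early.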

they also studied network-related quantities over time. for instance, they looked at:

  1. popularity of posts over time. they studied the number of links to a post as a function of the “lag” in days: given a post published at time t, how many links to it appear at time t+lag? what’s the distribution of lag? it’s a power law with exponent -1.6
  2. duration of phone calls. this quantity is often used to compute link weights in networks. in their research, they found that phone call duration fits a log-normal distribution reasonably well but is best described by a newly introduced distribution called TLAC (Truncated LAzy Contractor): the longer a call has lasted, the longer it will last
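the lag measurement in item 1 boils down to counting, for each lag, the links that arrive that many days after their post, and then reading the exponent off a log-log plot. here is a sketch of that last step only, assuming the per-lag counts are already aggregated; the function name and the synthetic data are made up for illustration, not taken from the talk.

```python
import math


def loglog_slope(lags, counts):
    """Least-squares slope of log(count) against log(lag).

    If counts follow c * lag**alpha, the slope recovers alpha. Quick
    diagnostic only: least squares on a log-log plot is biased for noisy
    heavy-tailed data, where a maximum-likelihood fit is usually preferred.
    All lags and counts must be positive.
    """
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


lags = list(range(1, 31))
counts = [1000.0 * lag ** -1.6 for lag in lags]  # synthetic, noise-free
print(loglog_slope(lags, counts))  # ≈ -1.6
```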

// B. [don't remember title] this paper tries to predict age and gender of web-page creators based on the page’s text, title, or structure.

// C. Modeling Information Diffusion in Social Media by Jaewon Yang and Jure Leskovec. the goal of this paper is to model the process of information diffusion in networks and predict the number of infected nodes at a given point in time without knowing the network structure. to do so, they use a linear influence model that accounts only for which nodes got infected in the past (and when). each node has an influence function that is estimated from past data. a node’s influence function is modelled in discrete time units as a plain vector of values, so no assumption is made about its shape. more on
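as i understand the linear influence model, the number of newly infected nodes at t+1 is the sum, over already-infected nodes u, of u's influence function evaluated at the time elapsed since u got infected. a minimal sketch of that prediction step, assuming the influence vectors are already known (fitting them from past data is left out); the names and numbers are hypothetical.

```python
def predict_volume(infection_times, influence, t):
    """Predicted number of newly infected nodes at time t + 1.

    infection_times: node -> discrete time step at which it got infected
    influence: node -> list of non-negative weights; influence[u][k] is u's
        contribution k steps after its own infection (no parametric shape:
        the list itself is the function)
    Only nodes infected at or before t, and still inside their influence
    window, contribute. The network structure is never used.
    """
    total = 0.0
    for u, t_u in infection_times.items():
        lag = t - t_u
        weights = influence.get(u, [])
        if 0 <= lag < len(weights):
            total += weights[lag]
    return total


times = {"a": 0, "b": 1}  # hypothetical infection times
infl = {"a": [3.0, 2.0, 1.0], "b": [1.0, 0.5]}
print(predict_volume(times, infl, 1))  # 3.0  (a at lag 1, b at lag 0)
print(predict_volume(times, infl, 2))  # 1.5  (a at lag 2, b at lag 1)
```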

// D. MoodCast: Emotion Prediction via Dynamic Continuous Factor Graph Model. i would also briefly check out this paper ;-)

Morozov talk at the LSE

Wednesday, January 12th, 2011

Professional cyber-curmudgeon

Evgeny Morozov will be speaking at the LSE on the 19th of January.

The war of words

Wednesday, January 5th, 2011

Evgeny Morozov’s latest skeptical article (of many) about the liberatory potential of communication technology describes a new trend in internet censorship: in addition to building national firewalls and knocking websites offline, some governments are trying to win the war of words on the internet by engaging their citizens in debate.

The Chinese government’s practice of paying citizens — the so-called “50 cent party” — to make pro-government points in online debates is well documented, and the propaganda department reported last year that “we have used the Internet to vigorously organize and launch positive propaganda, and actively strengthen our abilities to guide public opinion.” In Russia, Morozov claims that the Kremlin is engaging in “comment warfare” with its opponents. Yet what’s rarely asked in Western criticisms of such developments is how they compare to the operation of media power in our own societies. To the extent that authoritarian governments are engaging in public debate about their policies, rather than trying to silence such debate, are they not in fact moving towards a different form of governance — one in which winning the argument, rather than preventing the argument, is the foundation of legitimacy?

I’m not, of course, trying to claim that the internet in Russia or China is some kind of ideal Habermasian public sphere — but neither is the internet in Britain. “Public relations” and “spin” are accepted facts of life in Western politics, though few are honest enough to call them “propaganda”. We accept that a few hands control the news agenda, and that while anyone is in principle free to speak their mind, most will not be heard — yet we regard our society as essentially democratic, and Chinese or Russian society as essentially authoritarian.

I want to argue that such essentialism is mistaken, and that the mode of governance towards which China is rapidly moving — described by Rebecca MacKinnon as networked authoritarianism, and by Min Jiang as authoritarian deliberation — resembles in many ways the society in which we already live. That is not, of course, to say that they are identical; only that in both systems, winning the war of words is of paramount political importance.

This raises a difficult question for those who would like to quantify the democratising influence of communication technology: can we distinguish between propaganda and spin, between “comment warfare” and genuine debate — and if not, what does that mean for our understanding of democracy?

Magazines, marketers, middlemen and micropayments

Tuesday, January 4th, 2011

Fortune Tech is reporting that digital magazine subscriptions are already falling, before the new medium has even managed to reach the mainstream. Some in the publishing world would like to blame the problem on the difficulty of marketing digital magazines when Apple refuses to share customer data with publishers. Such complaints may be put to the test if Google’s prospective subscription service for Android gives publishers access to the customer data they’ve been asking for, as the Fortune article speculates.

How might the industry evolve if Apple, Google and the like have to compete to attract content creators by offering them customers’ data? This is an issue that goes beyond the new medium of digital magazines and calls into question the role of the device vendor as information gatekeeper.

One interesting possibility is that once creators can contact their customers they’ll deal with them directly, cutting device vendors out of future transactions and reducing their power as gatekeepers – and thus their bargaining power when trying to withhold further customer data. Given the potential for companies such as Apple and Google to abuse their position as Holders of the Holy Code-Signing Keys, I see this as a good thing; it might even be a step towards the kind of ‘peer-to-peer economics’ copyright reformers have long advocated, where artists and audiences benefit by cutting out rent-seeking middlemen. But for that to happen, there needs to be an easy way for creators to squeeze the occasional drop of cash from their newly-identified customers.

Perhaps Flattr, a click-based micropayment system, can fill that niche. But I wonder whether we can’t take its core insight – minimising mental transaction costs – even further. Why do I need to click when my device already knows what I’m reading, playing, watching and listening to?

What I want to suggest isn’t a new idea – it’s just a combination of two existing ideas, Flattr and scrobbling. Creators would tag their content to indicate ownership, and the mobile device would keep track of how much time was spent on each creator’s content during the month, dividing a fixed budget among the creators at the end of the month. Mental transaction cost: zero.
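To make the end-of-month step concrete, here is a minimal sketch, assuming the device has already logged seconds of use per creator tag (tagging and tracking are not shown, and every name here is hypothetical). It works in whole cents and hands out rounding leftovers by largest remainder, so the payouts always add up to exactly the budget.

```python
def split_budget(budget_cents, seconds_by_creator):
    """Divide a fixed monthly budget among creators in proportion to time spent.

    Each creator first gets the rounded-down share of whole cents; the few
    cents lost to rounding go to the creators with the largest remainders,
    so the payouts sum to exactly budget_cents. If nothing was consumed,
    nobody is paid and the budget stays with the user.
    """
    total = sum(seconds_by_creator.values())
    if total == 0 or budget_cents == 0:
        return {creator: 0 for creator in seconds_by_creator}
    payouts = {}
    remainders = []
    for creator, secs in seconds_by_creator.items():
        share, rem = divmod(budget_cents * secs, total)
        payouts[creator] = share
        remainders.append((rem, creator))
    leftover = budget_cents - sum(payouts.values())
    # sorted() is stable, so ties go to whichever creator was logged first
    for _, creator in sorted(remainders, key=lambda rc: rc[0], reverse=True)[:leftover]:
        payouts[creator] += 1
    return payouts


print(split_budget(500, {"magazine": 3600, "podcast": 1800, "blog": 1800}))
# {'magazine': 250, 'podcast': 125, 'blog': 125}
print(split_budget(100, {"a": 1, "b": 1, "c": 1}))
# {'a': 34, 'b': 33, 'c': 33}
```

Integer arithmetic is a deliberate choice here: splitting a budget with floats would leave fractions of a cent that no payment system can actually transfer.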

The problem, of course, is that if users set their monthly budgets to zero, nobody gets paid. Will creators be willing to risk giving their work away for free in return for circumventing the middlemen and earning 100% of whatever people are willing to pay? If sales of digital magazines are anything to go by, they might not have much to lose…