## icdm 2010

last month i went to icdm (i gave a talk on rethinking mobile recommendations & neal on personalised public transport). few interesting contributions follow:

// A. Christos Faloutsos (Carnegie Mellon University) gave a keynote talk titled ‘Mining Billion-node Graphs: Patterns, Generators and Tools’. He & his colleagues (=they, henceforth) studied several network measures such as:

1. node degree. this measure might be useful to answer the question of, for example, whether an epidemic will die or not (one may look at average degree, max degree, degree variance). However, it turns out that only the first eigenvalue of the

adjacency matrix is needed to understand whether the epidemic will take over or not. [see Prakas on arvix]

2. network diameter. they found a surprising result: the diameter shrinks over time. this is surprising because the theory (see science papers by barabasi et al.)  predicts the opposite, ie, that the diameter increases, and it does so according to log(n), where n is the number of nodes in the network. in a paper at sdm10, they proposed a way of computing the diameter efficiently and computed the whole’s (Yahoo) Web diameter: of course, its distribution was multi-modal, as one would expect -  there are parts of the Web that are separate by, for example, language
3. eigenvalues. eigenvalues might be useful for fraud detection (see KDD09 paper on  belief propagation) and for characterising and spotting anomalies in networks (they built a tool for doing that. it’s called Oddball and characterises, eg, egonetwork by studying very simple quantities like number of nodes, number of edges, number of triangles, total weight, and principal eigenvalues)
4. triangles. in this paper [Tsonakakis ICDM 2008], they have shown that if one has n friends, then he/she would have n^1.6 triangles on average. computing the number of triangles in a network is computationally expensive.  fortunately, they proposed two ways to make this computation tractable:
• the 1st way is about computing  few top eigenvalues only (lambda_i) and the number of triangles would then be =1/6 of the sum(lamba_i^3)  [Tsonakakis ICDM 2008]
• the 2nd way relies on SVD (see EigenSpokes at PKDD 2010)

they also studied network-related quantities over time. for instance, they looked at:

1. popularity of posts over time. they studied the number of links to a post over “lag” days. that is, given a post at time t, they looked after which time t+lag a link to the post would start to appear. what’s the distribution of lag? it’s power law with exponent -1.6
2. duration of phone calls. this quantity is often used to compute link weights in networks. in their research, they found that phone call duration fits  a  log-normal distribution in a OKish way but is  best described by a newly introduced distribution called TLAC (LAzy Contractor): the longer a call has taken, the longer it will take

// B. [don't remember title] this paper tries to predict age and gender of web-page creators based on the page’s text, title, or structure.

// C. Modeling Information Diffusion in Social Media by Jaewon Yang and Jure Leskovec. the goal of this paper is to look at the process of info diffusion in networks and predict the number of infected nodes at a given point in time without knowing the network structure. to do so, they use a linear influence model that accounts for only which nodes got infected in the past. each node has an influence function that is estimated using past data. a node’s influence function is modelled in discrete time units, so no assumption is made about its shape. more on http://snap.stanford.edu/

//D. MoodCast: Emotion Prediction via Dynamic Continuous Factor Graph Model. i would also briefly check this paper 