Trust propagation and the origins of PageRank

Since Matteo’s seminar about neighbourhood maps a couple of months ago, I’ve been wondering whether PageRank could be applied to a local view of a social network to calculate trust scores. (This might be useful in the new darknet version of Freenet, for example.) One of the Freenet developers pointed out that PageRank is patented, but Wikipedia showed that using eigenvector centrality to calculate the importance of nodes isn’t a new idea.

After following a few references, I found that the idea of propagating trust/status/etc across a graph dates back to at least 1953 [1]. Pinski and Narin [2] suggested normalising each node’s output by dividing the output on each outgoing edge by the node’s outdegree. Geller [3] pointed out that their model is equivalent to a Markov chain: the scores assigned to the nodes follow the chain’s stationary distribution. In other words, propagating trust/status/etc with normalisation at each node is equivalent to taking long random walks from random starting points and counting how often the walk visits each node.
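
To make Geller’s observation concrete, here’s a minimal sketch in Python (a made-up three-node graph, purely illustrative, nothing from Freenet): propagating scores with normalisation at each node is just repeated application of the transition matrix, and its fixed point matches the visit frequencies of a long random walk.

```python
import random

# Toy directed graph: node -> the nodes it links to / trusts.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
nodes = list(graph)

# Pinski-Narin style propagation: each node splits its current score
# evenly among its outgoing edges; repeat until the scores settle.
scores = {v: 1.0 / len(nodes) for v in nodes}
for _ in range(100):
    new = {v: 0.0 for v in nodes}
    for v, out in graph.items():
        for w in out:
            new[w] += scores[v] / len(out)
    scores = new

# The same distribution, estimated by a long random walk (Geller's
# Markov-chain view): count how often the walk visits each node.
visits = {v: 0 for v in nodes}
current = random.choice(nodes)
for _ in range(200_000):
    current = random.choice(graph[current])
    visits[current] += 1
total = sum(visits.values())

for v in nodes:
    print(v, round(scores[v], 3), round(visits[v] / total, 3))
```

On this toy graph both columns converge to the same values, the stationary distribution of the chain.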

The only difference between Geller’s model and PageRank is the damping factor: in PageRank you continue your random walk with probability d or jump to a random node with probability 1-d. (Incidentally, when the algorithm’s described this way rather than in terms of a transition matrix, it’s easy to see how you could implement it in a web spider.)
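
As an illustration (again a made-up graph, not a real crawl), here’s a minimal sketch of that random-surfer view: follow an outgoing edge with probability d, jump to a uniformly chosen node with probability 1-d, and read off the fraction of time the walk spends at each node.

```python
import random

# Toy graph: node -> the nodes it links to.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "x": ["a"],  # nobody links to x, but the random jump still reaches it
}
nodes = list(graph)
d = 0.85  # damping factor

# Random-surfer simulation: follow an edge with probability d,
# jump to a uniformly random node with probability 1 - d.
visits = {v: 0 for v in nodes}
current = random.choice(nodes)
for _ in range(500_000):
    if random.random() < d:
        current = random.choice(graph[current])
    else:
        current = random.choice(nodes)
    visits[current] += 1

total = sum(visits.values())
for v in nodes:
    print(v, round(visits[v] / total, 3))
```

Without the jump, a node like x with no incoming edges would never be visited, and the walk could get stuck in one part of the graph.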

[1] L. Katz, “A new status index derived from sociometric analysis,” Psychometrika 18 (1953), pp. 39-43.
[2] G. Pinski and F. Narin, “Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics,” Information Processing and Management 12 (1976), pp. 297-312.
[3] N. Geller, “On the citation influence method of Pinski and Narin,” Information Processing and Management 14 (1978), pp. 93-95.

5 Responses to “Trust propagation and the origins of PageRank”

  1. Hi, I’m happy that my talk gave you food for thought. I’m quite curious about how you think you could use PageRank in the Freenet darknet. Wouldn’t it be a security hazard if you knew the local structure of the network in Freenet?

    BTW: would other trust metrics such as maximal flow be viable for your purposes? MaxFlow is also Sybil-proof, while with PageRank you could boost your ranking by a factor of 1/d with Sybil attacks.

  2. mike says:

    Hi Matteo,

    Thanks for the information about MaxFlow – it sounds similar to the approach used by Advogato.

    The problem we’re trying to solve in Freenet is similar to the predecessor attack: an eavesdropper inside the network sees a series of requests for pieces of the same file and concludes that its neighbour is likely to be the source of the requests, because the requests would have dispersed if the source were further away. The attack can be prevented by sending related requests through a tunnel before they begin to disperse, but the problem is then how to choose the tunnel endpoint in a Sybil-proof way.

    I agree that sharing information about the local topology isn’t ideal, but we’re hoping that, since nearby nodes in a friend-to-friend network are likely to be run by friends or friends of friends, it’s an acceptable risk. On the other hand, perhaps that contradicts the argument that a Sybil-proof mechanism is needed in the first place! :-)

    By the way, do you have any more information about the 1/d figure for PageRank? I would have thought the small cut around Sybil clusters would prevent them from attracting much trust, as in SybilGuard.

  3. The SybilGuard paper says that (under certain conditions) the endpoints of random walks are unlikely to be Sybil nodes, so maybe random walks are good enough for you…

    And about the 1/d figure… I’m sorry, using your notation it is 1/(1-d). It is actually an upper bound on the amount of damage a Sybil node can do.
    I’ll use my favourite explanation of PageRank: you *stop* your random walk with probability 1-d, continue the walk with probability d, and evaluate the probability of stopping at each node. This is equivalent to the more common definition and, at least to me, it gives a better intuition: each node keeps a (1-d) share of the score it receives for itself and redistributes the remaining d to other nodes.

    A malicious node could divert all walks passing through it to other Sybil nodes… in other words, once a walk enters the Sybil region it stays there with probability 1. I hope this explanation is clear enough, even if I’m not so sure about it…
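
    A quick toy check of the stop formulation (a made-up three-node graph, purely illustrative): the standard PageRank recursion and the probability of stopping at each node come out the same, up to sampling noise.

    ```python
    import random

    # Toy graph for comparing the two formulations of PageRank.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    nodes = list(graph)
    n = len(nodes)
    d = 0.85

    # Standard recursion: score = (1-d)/n + d * (score received from in-links).
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(200):
        new = {v: (1 - d) / n for v in nodes}
        for u, out in graph.items():
            for v in out:
                new[v] += d * pr[u] / len(out)
        pr = new

    # Stop formulation: start at a uniformly random node, keep walking with
    # probability d, stop with probability 1-d; count where the walk stops.
    stops = {v: 0 for v in nodes}
    for _ in range(200_000):
        current = random.choice(nodes)
        while random.random() < d:
            current = random.choice(graph[current])
        stops[current] += 1
    total = sum(stops.values())

    for v in nodes:
        print(v, round(pr[v], 3), round(stops[v] / total, 3))
    ```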

  4. mike says:

    Thanks, I’ve fixed the definition of d in the post.

  5. mike says:

    It turns out that the idea of recursively propagating value across a graph can be traced back even further, to 1941.