Tuesday, November 06, 2007

Notes on "Collaborative tagging and Semiotic Dynamics"

By Cattuto, C., V. Loreto and L. Pietronero (2006).

Firstly, I must say that I was glad to read this paper. Lately, I've been seeing many papers on the properties of folksonomies, such as tag co-occurrence, which have intrigued me quite a lot. This paper explains the underlying process well and makes one important point explicit: the authors factor out the users and deal only with streams of tagging events and their statistical properties!

I must admit that this makes the whole area of Semiotic Dynamics less attractive to me. I think it is important to study tags and their properties, but not in isolation from the users. I (barely) see the point of explaining tagging activity and the growth of tags separately from the users. But fair enough.

Problem statement: Uncovering the mechanisms governing the emergence of shared categorisations or vocabularies in the absence of global coordination is a key problem with significant scientific and technological potential. Collaborative tagging provides a precious opportunity both to analyze the emergence of shared conventions and to inspire the design of large agent systems.

Semiotic Dynamics studies how populations of humans or agents can establish and share semiotic systems, typically driven by their use in communication. The authors argue that the emergence of a folksonomy exhibits dynamical aspects also observed in human languages, such as the crystallisation of naming conventions, competition between terms, takeovers by neologisms, and more.

  • Users interact with a collaborative tagging system by using tags or by adding new resources to the system
  • The basic unit of information in a collaborative tagging system is a (user, resource, {tags}) triple, which the paper refers to as a post. The tagging events form a tri-partite graph (with partitions corresponding to users, resources and tags, respectively), which can be used as a navigation aid when browsing tagged information
    • Comment: Yes, I like the tri-partite graph as a navigation aid! But although the authors mention this just above, they do not consider other users and their networks as a navigation aid. On the contrary, they omit the users altogether in order to study the statistical properties of the tags, which strikes me as bizarre.
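
To make the data model concrete, here is a minimal Python sketch. The names Post, tripartite_edges, resources_by_tag and tag_stream are hypothetical, chosen for illustration, and are not from the paper. It shows a post as a (user, resource, {tags}) triple, the tri-partite edges it contributes, a tag-to-resource index of the kind one could use for navigation, and how dropping the user field turns the posts into the plain stream of tagging events that the authors study.

```python
from collections import defaultdict
from typing import FrozenSet, NamedTuple


class Post(NamedTuple):
    """One tagging event: a (user, resource, {tags}) triple."""
    user: str
    resource: str
    tags: FrozenSet[str]


def tripartite_edges(posts):
    """One (user, resource, tag) hyperedge per tag in each post."""
    return [(p.user, p.resource, t) for p in posts for t in sorted(p.tags)]


def resources_by_tag(posts):
    """Navigation index: for each tag, the set of resources carrying it."""
    index = defaultdict(set)
    for p in posts:
        for t in p.tags:
            index[t].add(p.resource)
    return dict(index)


def tag_stream(posts, resource=None):
    """Factor out the users: the sequence of tags in post order,
    optionally restricted to the posts about one resource.
    Tags inside a post are sorted only to make the order deterministic."""
    stream = []
    for p in posts:
        if resource is None or p.resource == resource:
            stream.extend(sorted(p.tags))
    return stream


posts = [
    Post("alice", "url1", frozenset({"python", "blog"})),
    Post("bob", "url1", frozenset({"python"})),
    Post("carol", "url2", frozenset({"web"})),
]
print(tag_stream(posts, "url1"))  # ['blog', 'python', 'python']
```

Note that the user field survives in tripartite_edges but disappears entirely from tag_stream, which is exactly the information the authors choose to discard.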
The authors cite the "rich get richer" model (Yule-Simon's stochastic model) and propose to enhance it with a "fat-tailed memory kernel". The original model describes the construction of a text from scratch:
At each discrete time step one word is appended to the text: with probability p the appended word is a new word that has never occurred before, while with probability 1-p a word is copied from the existing text, chosen with a probability proportional to its current frequency of occurrence. This simple process yields a frequency-rank distribution that displays a power-law tail with exponent alpha = 1-p, which is lower than the exponent observed in the actual data. This happens because the Yule-Simon process has no notion of "aging", i.e., all positions within the text are regarded as identical.
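
The Yule-Simon process is short enough to simulate directly. The following Python sketch is illustrative only: the function names, the single-word starting text, the random seed and the simple log-log least-squares fit are my own choices, not the paper's. Picking a uniformly random position in the text is what makes the copy step proportional to each word's frequency.

```python
import math
import random
from collections import Counter


def yule_simon_text(n_steps, p, seed=0):
    """Grow a text with the Yule-Simon process, starting from one word.

    Each step appends a word: with probability p a brand-new word,
    otherwise a copy of the word at a uniformly chosen position of the
    text so far. Copying a uniform position picks each distinct word
    with probability proportional to its current frequency.
    """
    rng = random.Random(seed)
    text = [0]            # hypothetical starting text: a single word
    next_word = 1
    for _ in range(n_steps):
        if rng.random() < p:
            text.append(next_word)
            next_word += 1
        else:
            text.append(rng.choice(text))
    return text


def frequency_rank(text):
    """Word frequencies in decreasing order; element R-1 is the frequency of rank R."""
    return [count for _, count in Counter(text).most_common()]


def loglog_slope(freqs, max_rank):
    """Least-squares slope of log f(R) against log R for ranks 1..max_rank."""
    max_rank = min(max_rank, len(freqs))
    xs = [math.log(r) for r in range(1, max_rank + 1)]
    ys = [math.log(f) for f in freqs[:max_rank]]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


freqs = frequency_rank(yule_simon_text(100_000, p=0.1))
# The tail should follow f(R) ~ R**-(1 - p), i.e. a slope near -0.9;
# a finite run over a fixed rank window only gives a rough estimate.
print(loglog_slope(freqs, 1000))
```

Because every position is equally likely to be copied, an old word is just as "alive" as a recent one; this is the missing notion of aging that the authors point to.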
This leads to the authors' model of users' behaviour: the process by which users of a collaborative tagging system associate tags with resources can be regarded as the construction of a "text", built one step at a time by adding "words" (tags) to a text initially comprising n₀ words. The model keeps the Yule-Simon choice between inventing a new tag and reusing an existing one, but adds a long-term memory, so that recently used tags are copied more often than old ones.
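
The memory kernel only changes the copy step. Below is a Python sketch under stated assumptions: I take the kernel to decay as 1/(x + τ), where x counts how many steps back in the text a word was written and τ is a free parameter, which is how I read the paper's "fat-tailed" kernel. The function name, the choice of n₀ distinct initial words and the seed are illustrative, not taken from the paper.

```python
import bisect
import random


def yule_simon_with_memory(n_steps, p, tau, n0=1, seed=0):
    """Yule-Simon process with an assumed fat-tailed memory kernel (sketch).

    The text starts with n0 distinct words (an assumption). At each step,
    with probability p a new word is appended; otherwise the step looks
    back x positions, x = 1 being the most recent word, with probability
    proportional to 1 / (x + tau), and copies the word found there.
    """
    if n0 < 1 or tau < 0:
        raise ValueError("need n0 >= 1 and tau >= 0")
    rng = random.Random(seed)
    text = list(range(n0))
    next_word = n0
    # cum[x - 1] = sum of 1 / (y + tau) for y = 1..x. The weight of a given
    # distance does not depend on the text length, so the table only grows.
    cum = []
    total = 0.0
    for x in range(1, n0 + 1):
        total += 1.0 / (x + tau)
        cum.append(total)
    for _ in range(n_steps):
        if rng.random() < p:
            text.append(next_word)
            next_word += 1
        else:
            r = rng.random() * cum[-1]
            # Distance x has probability (1 / (x + tau)) / cum[-1];
            # min() guards against r rounding up to exactly cum[-1].
            x = min(bisect.bisect_right(cum, r) + 1, len(text))
            text.append(text[-x])
        # Extend the kernel table to cover the new text length.
        total += 1.0 / (len(text) + tau)
        cum.append(total)
    return text
```

Small τ concentrates copying on the last few tags, while a very large τ makes the weights nearly flat and recovers the memoryless Yule-Simon process. Counting frequencies of the returned text by rank allows a comparison of its tail with the memoryless case.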

Also, "in our model, ..., the average user is exposed to a few roughly equivalent top-ranked tags and [this] is translated mathematically into a low-rank cutoff of the power law, i.e., the observed low-rank flattening".

Conclusion: It seems that users of collaborative tagging systems share a universal behaviour which, despite the intricacies of personal categorisation, tagging procedures and user interactions, appears to follow simple activity patterns.

The paper also discusses the co-occurrence of high-frequency and low-frequency tags, stating: "This suggests that high-frequency tags partition - or 'categorize' - the resources marked by tags of lower frequency."
Comment: This all sounds interesting and important, but I will need to look into it later.
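
As a first step towards looking into it, the raw counts behind such an analysis are easy to collect. The hypothetical helpers below (my names, not the paper's) treat each post as just its set of tags, count how often each pair of tags appears together, and list the tags that co-occur with one chosen tag. This is a plain counting sketch, not a reproduction of the paper's partition analysis.

```python
from collections import Counter
from itertools import combinations


def pair_cooccurrence(posts_tags):
    """How often each unordered pair of distinct tags appears in one post."""
    counts = Counter()
    for tags in posts_tags:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def cooccurring_with(posts_tags, tag):
    """Tags appearing in the same posts as `tag`, with their counts,
    most frequent first."""
    counts = Counter()
    for tags in posts_tags:
        tags = set(tags)
        if tag in tags:
            counts.update(tags - {tag})
    return counts.most_common()


posts_tags = [
    {"blog", "python"},
    {"blog", "python", "web"},
    {"blog", "web"},
    {"python"},
]
print(sorted(cooccurring_with(posts_tags, "blog")))  # [('python', 2), ('web', 2)]
```

Ranking the output of cooccurring_with for a frequent tag against the overall tag frequencies would be one way to start checking whether frequent tags act as categories for rarer ones.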
