Friday, February 27, 2009

Are tags from Mars and descriptors from Venus?

A study on the ecology of educational resource metadata.

I just finished a paper on the tag evaluations that we did in the MELT project. We had lots of fun with the name of the paper :) The main question was which one, tags or descriptors, should be from Venus...?

Anyway, we were able to show that not all tags are as far from the Thesaurus descriptors as Mars is from Venus. We evaluated the tags from different perspectives: end-users, expert indexers and repository owners. For me, the most interesting finding was that 11% of end-user generated tags are actually terms that can be found in our multilingual Thesaurus! I assume teachers are "better taggers" than average. Usually there is lots of talk about the gap between end-users' language and the one used by experts.

Abstract. pdf. In this study, over a period of six months, we gathered empirical data from more than 200 users on a learning resource portal with a social bookmarking and tagging feature. Our aim was to look at the tags from different stakeholders’ points of view: end-users, librarians/expert indexers and repository owners. We first look at how users tag resources, and then conduct an evaluation with indexers to understand how they perceive the value of tags as descriptors. We then present a case study from a repository owner’s point of view. Lastly, we study users’ clickstream when searching for resources. We find that, even though end-users and expert evaluators apply very different strategies when adding metadata (end-users have a rather synthetic approach, whereas expert indexers have an analytical one), there is an overlap in the information in tags and the official descriptors. This overlap is as high as 51%, creating an ecology of metadata.

Keywords: Learning resource metadata, tags, folksonomy, clickstream, thesaurus, evaluation.

Monday, February 09, 2009


One of the particularities of the MELT portal is that, apart from being a "traditional" resource portal, it also has social tagging features. This creates a situation where resources have both indexing terms that come from our multilingual Thesaurus and teacher-generated tags that, btw, are also multilingual.

I was looking at the tags today with a specific question in mind: "How many of the user-generated tags are actually terms that exist in the Thesaurus?" If there is a tag that is added by a teacher, and if it exists in the Thesaurus, I will call it a "Thesaurus-tag".
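The definition above boils down to a set intersection between the user tags and the Thesaurus vocabulary. Here is a minimal sketch of that idea. The function names, the sample data and the case-insensitive matching are all my assumptions for illustration; the actual MELT matching rules may have been different.

```python
def normalize(term: str) -> str:
    """Trim and case-fold a term so 'Europe ' and 'europe' compare equal.

    Case-insensitive matching is an assumption, not the documented MELT rule.
    """
    return term.strip().casefold()


def find_thesaurus_tags(user_tags, thesaurus_terms):
    """Return the distinct user tags that also occur as Thesaurus terms."""
    vocabulary = {normalize(term) for term in thesaurus_terms}
    return {normalize(tag) for tag in user_tags} & vocabulary


# Hypothetical sample data, not taken from the MELT portal.
sample_tags = ["Europe", "music", "cool stuff", "Musik", "europe"]
sample_thesaurus = ["europe", "music", "Musik", "chemistry"]

thesaurus_tags = find_thesaurus_tags(sample_tags, sample_thesaurus)
print(sorted(thesaurus_tags))
```

Since the Thesaurus is multilingual, the vocabulary set would simply contain the terms of every language, which is why "Musik" matches in this toy example.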

Here are the figures:
  • Distinct tags: 4428
  • Tags applied: 5009
  • Distinct "Thesaurus-tags": 505
  • "Thesaurus-tags" applied: 714
I was actually really impressed: 11.4% of distinct tags are "Thesaurus-tags"! And if we look at tag application, "Thesaurus-tags" amount to 14.25% of all tags.
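For anyone who wants to check the arithmetic, the two percentages follow directly from the four figures listed above. A small sketch (the variable and function names are mine):

```python
# Figures as reported in this post.
distinct_tags = 4428
tags_applied = 5009
distinct_thesaurus_tags = 505
thesaurus_tags_applied = 714


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


share_of_distinct = percent(distinct_thesaurus_tags, distinct_tags)
share_of_applications = percent(thesaurus_tags_applied, tags_applied)

print(f"{share_of_distinct:.1f}% of distinct tags are Thesaurus-tags")
print(f"{share_of_applications:.2f}% of tag applications are Thesaurus-tags")
```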

Moreover, 22.37% of distinct "Thesaurus-tags" are applied more than once. This amounts to 45.1% of all "Thesaurus-tag" applications! The top "Thesaurus-tags" were:
Europe (10), music (8), test (8), Vocabulary (8), Internet (7), art (6), biology (6), history (6), Australia (5), chemistry (5).
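The two "applied more than once" figures can be computed from a table of application counts per tag. Below is a rough sketch of that calculation. The function name is hypothetical, and the sample counts are illustrative only: the first three echo the top of the list above, the single-use tags are made up.

```python
from collections import Counter


def repeat_stats(applications: Counter) -> tuple[float, float]:
    """Return (% of distinct tags applied more than once,
    % of all applications that those repeated tags account for)."""
    if not applications:
        raise ValueError("no tag applications to analyse")
    total = sum(applications.values())
    repeated = {tag: n for tag, n in applications.items() if n > 1}
    pct_distinct = 100.0 * len(repeated) / len(applications)
    pct_applications = 100.0 * sum(repeated.values()) / total
    return pct_distinct, pct_applications


# Illustrative counts only; the single-use tags are invented.
sample = Counter({"europe": 10, "music": 8, "art": 6,
                  "ecology": 1, "poetry": 1, "maps": 1,
                  "opera": 1, "rivers": 1})

pct_distinct, pct_applications = repeat_stats(sample)
print(f"{pct_distinct:.1f}% of distinct tags are applied more than once")
print(f"{pct_applications:.1f}% of applications come from those tags")
```

A few popular tags can account for a large share of the applications even when most tags are used only once, which is the same pattern as in the real figures.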

This is really quite interesting. On the one hand, we always ask ourselves how to make learning resource indexing better, and here we could "crowd source" to end-users part of the indexing that is usually done by the experts. If an end-user thought that a Thesaurus-tag was good enough to add to a resource, I can be pretty sure that it is also good enough to be an indexing term. The story can be VERY different for tags in general: we do not think that ALL tags could become indexing terms, although we know they are good for other stuff.

On the other hand, being in a multilingual context, we always have a bit of a hard time with tags in different languages. At least with "Thesaurus-tags" we could easily show a translation of the tag, as we can be certain that it is a good one.

There are lots of interesting things to look at, for example whether these resources already had the same indexing term as the "Thesaurus-tag", i.e. was the tag redundant, or does it really add some value to our system? It will also be interesting to see whether there is a trend: did the resources that had poor or few indexing terms receive more "Thesaurus-tags" from the end-users? Well, lots more, I guess, but that will be for another time.
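As a starting point for the redundancy question, one could split a resource's "Thesaurus-tags" into those that repeat an existing indexing term and those that add a new one. This is only a sketch of how I might approach it; the function name, the sample data and the case-insensitive comparison are assumptions.

```python
def split_by_redundancy(thesaurus_tags, index_terms):
    """Split a resource's Thesaurus-tags into (redundant, additional) sets.

    A tag counts as redundant when the resource already carries the same
    indexing term. Case-insensitive comparison is an assumption.
    """
    existing = {term.strip().casefold() for term in index_terms}
    redundant, additional = set(), set()
    for tag in thesaurus_tags:
        key = tag.strip().casefold()
        if key in existing:
            redundant.add(key)
        else:
            additional.add(key)
    return redundant, additional


# Hypothetical resource: indexed with two terms, tagged with two Thesaurus-tags.
redundant, additional = split_by_redundancy(
    ["Europe", "history"], ["europe", "geography"]
)
print("redundant:", sorted(redundant))
print("adds value:", sorted(additional))
```

Counting the "additional" tags per resource, next to the number of indexing terms the resource already had, would be one way to look for the trend mentioned above.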