Monday, February 09, 2009

"Thesaurus-tags"

One particularity of the MELT portal is that, apart from being a "traditional" resource portal, it also has social tagging features. As a result, resources carry both indexing terms from our multilingual Thesaurus and teacher-generated tags, which, by the way, are also multilingual.

I was looking at the tags today with a specific question in mind: "How many of the user-generated tags are actually terms that exist in the Thesaurus?" If a tag added by a teacher also exists in the Thesaurus, I will call it a "Thesaurus-tag".

Here are the figures:
  • Distinct tags: 4428
  • Tags applied: 5009
  • Distinct "Thesaurus-tags": 505
  • "Thesaurus-tags" applied: 714
I was actually really impressed: 11.4% of distinct tags are "Thesaurus-tags"! And if we look at tag application, "Thesaurus-tags" amount to 14.25% of all tags.
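For anyone who wants to repeat this kind of count, here is a minimal sketch in Python. It assumes the tag applications are available as a plain list of strings and the Thesaurus as a set of terms; the function names and the case-insensitive matching are my own assumptions for illustration, not how the MELT portal actually does it.

```python
from collections import Counter


def thesaurus_tag_stats(applications, thesaurus_terms):
    """Count tags and "Thesaurus-tags" in a list of tag applications.

    Assumption: a tag matches a Thesaurus term if they are equal after
    trimming whitespace and lowercasing.
    """
    normalized_terms = {term.strip().lower() for term in thesaurus_terms}
    counts = Counter(tag.strip().lower() for tag in applications)
    thesaurus_counts = {
        tag: n for tag, n in counts.items() if tag in normalized_terms
    }
    return {
        "distinct_tags": len(counts),
        "tags_applied": sum(counts.values()),
        "distinct_thesaurus_tags": len(thesaurus_counts),
        "thesaurus_tags_applied": sum(thesaurus_counts.values()),
    }


def share(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Toy data, not the real MELT tags.
toy_applications = ["Europe", "europe", "music", "my-class", "fun"]
toy_thesaurus = {"Europe", "music", "history"}
stats = thesaurus_tag_stats(toy_applications, toy_thesaurus)

# The percentages quoted above, recomputed from the four figures.
distinct_share = share(505, 4428)  # 11.4
applied_share = share(714, 5009)   # 14.25
```

Whether "Europe" and "europe" should count as the same tag is a design choice; a stricter comparison would give lower numbers.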

Moreover, 22.37% of distinct "Thesaurus-tags" are applied more than once. This amounts to 45.1% of all "Thesaurus-tag" applications! The top "Thesaurus-tags" were:
Europe (10), music (8), test (8), Vocabulary (8), Internet (7), art (6), biology (6), history (6), Australia (5), chemistry (5).
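The reuse figures can be derived in the same spirit. The sketch below assumes a per-tag application count is already available (here a `Counter` with made-up numbers); `reuse_stats` is a hypothetical helper name.

```python
from collections import Counter


def reuse_stats(tag_counts):
    """Return (share of distinct tags applied more than once,
    share of all applications that those reused tags account for),
    both as percentages."""
    reused = {tag: n for tag, n in tag_counts.items() if n > 1}
    distinct_share = 100 * len(reused) / len(tag_counts)
    application_share = 100 * sum(reused.values()) / sum(tag_counts.values())
    return distinct_share, application_share


# Toy counts, not the real MELT data.
counts = Counter({"Europe": 3, "music": 2, "art": 1, "history": 1, "biology": 1})
distinct_share, application_share = reuse_stats(counts)

# The most frequently applied tags, highest count first.
top = counts.most_common(2)
```

With these toy counts, two of the five tags are reused, and they cover five of the eight applications.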

This is really quite interesting. On the one hand, we always ask ourselves how to improve the indexing of learning resources, and here we could "crowd source" part of the indexing, which is usually done by "experts", to end-users. If an end-user thought a Thesaurus-tag was good enough to add to a resource, I can be pretty sure that it is also good enough to serve as an indexing term. The story can be VERY different for tags in general: we do not think that ALL tags could become indexing terms, although we know they are good for other things.

On the other hand, working in a multilingual context, we always have a bit of a hard time with tags in different languages. At least with "Thesaurus-tags" we could easily show a translation of the tag, as we can be certain that it is a good one.

There are lots of interesting things to look at, for example whether these resources already had the same indexing term as the "Thesaurus-tag", i.e. was the tag redundant or does it really add some value to our system? It will also be interesting to see whether there is a trend: did the resources that had poor or few indexing terms receive more "Thesaurus-tags" from end-users? Well, lots more, I guess, but that will be for another time.
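As a rough idea of how the redundancy question could be checked, here is a small sketch. It assumes each tag application is a `(resource_id, tag)` pair and that the existing indexing terms are available per resource; the data layout and the name `split_by_redundancy` are hypothetical, and matching is again case-insensitive by assumption.

```python
def split_by_redundancy(applications, indexing_terms):
    """Split "Thesaurus-tag" applications into redundant and additive ones.

    applications: iterable of (resource_id, tag) pairs.
    indexing_terms: dict mapping resource_id to a set of indexing terms.
    A tag is redundant if the resource already has it as an indexing term.
    """
    redundant, additive = [], []
    for resource_id, tag in applications:
        existing = {t.lower() for t in indexing_terms.get(resource_id, set())}
        if tag.lower() in existing:
            redundant.append((resource_id, tag))
        else:
            additive.append((resource_id, tag))
    return redundant, additive


# Toy data, not real MELT resources.
toy_applications = [("r1", "music"), ("r1", "Europe"), ("r2", "history")]
toy_indexing = {"r1": {"Music"}, "r2": set()}
redundant, additive = split_by_redundancy(toy_applications, toy_indexing)
```

Counting the additive tags per resource against the number of indexing terms that resource already had would be one way to look for the trend mentioned above.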
