Wednesday, November 15, 2006

Where tags and structured vocabularies co-exist: a case for tagging learning resources

NOTE: this is a draft, a lot of thoughts put together


In the Learning Resources Exchange (LRE) portal, end-users, i.e. K-12 teachers from all over Europe, can access digital learning resources that are made available by European Schoolnet or by its partners from learning repositories in a number of European countries. All of the learning resources have associated metadata: all of the partners use Learning Object Metadata, and many of them use an application profile specifically geared to European K-12 education. The indexing keywords come from a multilingual thesaurus that currently exists in 14 languages. Additionally, teachers can add their own keywords to learning resources when saving them to their personal collections, which function like bookmarks. In their personal collections teachers can also evaluate learning resources, rate them and add comments. This paper focuses on vocabularies, both unstructured and structured, and proposes a complementary approach to their use to enhance the retrieval of learning resources. We think that tagging has the potential to enhance retrieval, both through the features with which it can complement controlled vocabularies and through its underlying social structures.

Structured and Unstructured vocabularies to access resources

The table "axes of organizational vocabularies" depicts the dimensions of structured and unstructured vocabularies. Learning resource repositories, for example, often index their material with an authoritative vocabulary that is based on national or regional educational needs, such as curriculum requirements or educational levels. Some of these vocabularies are structured domain taxonomies with hierarchies (LRE Thesaurus, EUN, Italy; Mobus thesaurus, France; ...) whereas others might be unstructured, glossary-type lists of terms (e.g. ...). These vocabularies are most often used in educational repositories and portals to provide access to resources by classification or by hierarchical browsing. Search functions are applied as well; however, teachers seem to prefer browsing features (based on focus groups and some Calibrate logs).

                         Structured                Unstructured
Authoritative            Domain Taxonomy           Glossary
Personal/Opportunistic   Hierarchical Filesystem   Folksonomy

Axes of organizational vocabularies, Iverson (2006), CC some rights reserved.

As opposed to authoritative vocabularies, personal tagging provides an unstructured vocabulary for resources. These end-user generated keywords serve, in the first place, personal knowledge management needs. For example, by bookmarking a resource one creates a link for accessing it later, and personal keywords allow naming things in a meaningful way, grouping them, etc. Depending on the user base and its behaviour, tagging can start to provide a community-based vocabulary for the given community of users. This is commonly known as a folksonomy. It is important to note that not all tagging and personal vocabularies allow a community folksonomy to emerge, hence the division into broad and narrow folksonomies (Vander Wal, 2005).

Classified categories or domain taxonomies offer a way to access resources, either through keyword-based retrieval or by browsing through categories. A solid indexing strategy based on controlled vocabularies guarantees that the searcher does not need to face all the uncertainties of natural language (synonymy, polysemy, homonymy) combined with all the uncertainties of a full-text search (no relevance control on the retrieved occurrences) (Trigari, 2001). Searching and browsing, both based on metadata and controlled, structured taxonomies, are the methods most used to access resources on educational portals.

Browsing and searching serve two fundamentally different information-seeking tasks. The difference is similar to that between exploring a problem space to formulate questions and actually looking for answers to specifically formulated questions (Mathes, 2004). One could say that searching happens at the stage where the user already has a clearly defined task, a well-formulated question, and an idea of how and from which information sources to fulfil it (reference), whereas browsing often occurs when the user does not yet have a clear question or task at hand, just a hunch of what to look for.

A second important advantage of folksonomies is that they can serve the community by promoting serendipitous access to resources through browsing sets of interlinked, related tags, also known as "tagclouds". A tagcloud can be a personal one, comprising only one user's own keywords; a local one, displaying the most common tags of a given community of users (e.g. based on domain interest, language, etc.); or even a global one, allowing access to all resources that are available through the service.
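The three scopes of tagcloud can be illustrated with a minimal sketch. The record layout (user, community, resource, tag) and the sample data are assumptions made for this illustration only; they do not describe how the portal actually stores tags.

```python
from collections import Counter

# Hypothetical tagging records: (user, community, resource_id, tag).
# "community" stands in for any grouping, e.g. language or domain interest.
TAGGINGS = [
    ("anna", "fi", "res1", "volcano"),
    ("anna", "fi", "res2", "geography"),
    ("marc", "fr", "res1", "volcan"),
    ("marc", "fr", "res3", "geography"),
    ("lisa", "fi", "res1", "volcano"),
]


def tag_cloud(taggings, user=None, community=None):
    """Count tag usage, optionally restricted to one user or one community."""
    counts = Counter()
    for tag_user, tag_community, _resource, tag in taggings:
        if user is not None and tag_user != user:
            continue
        if community is not None and tag_community != community:
            continue
        counts[tag] += 1
    return counts


personal = tag_cloud(TAGGINGS, user="anna")     # one teacher's own tags
local = tag_cloud(TAGGINGS, community="fi")     # tags of one community
global_cloud = tag_cloud(TAGGINGS)              # tags of all users

# The weights of a cloud are simply the tag counts, most common first.
print(local.most_common())
```

The same counting function serves all three scopes; only the filter changes, which is why a personal, a local and a global cloud can sit side by side in one interface.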

Complementary relationships between thesauri and folksonomies

An interesting hybrid of vocabularies will emerge within the LRE context, where users add personal keywords to learning resources that are already indexed with an authoritative, structured domain taxonomy, the LRE Thesaurus. In most Web-based tagging applications (e.g. Flickr, ...) no top-down vocabulary exists and users tag content only with their personal keywords; the situation in the LRE is different. Here, a learning resource is indexed with one to three thesaurus terms, which allow search and browsing in a multilingual context, and is additionally given multilingual tags by end-users. This will open a number of interesting research possibilities in the twilight zone between structured and unstructured vocabularies.

In the LRE Thesaurus the structure of the descriptors follows the classical semantic relationships: intra-language equivalence (USE-UF), inter-language equivalence, the hierarchical relationship (BT/NT) and the associative relationship (RT). Our research will study the relations between terms in all of these areas. For the purposes of revising the thesaurus, we will be interested in whether folksonomies could provide any new descriptors for the LRE Thesaurus.

Perhaps the most promising area of complementarity between folksonomies and the thesaurus will be intra-language equivalence (USE-UF): folksonomies could be seen as a good source of non-descriptors. Non-descriptors are part of the thesaurus; they provide the intra-language equivalence that facilitates access to resources indexed with descriptors (i.e. thesaurus terms) that do not match the words that the end-user uses. For example, a user could search for "antiquities", but the preferred term in the thesaurus, the one used for indexing, is "ancient history". The use of "antiquities" as a non-descriptor thus facilitates access to the resource.

Example of intra-language equivalence (USE-UF)

antiquities USE ancient history
ancient history UF antiquities
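How a non-descriptor redirects a search, and how tags might feed the list of non-descriptors, can be sketched roughly as follows. The USE table, the function names, the sample tags and the frequency threshold are all hypothetical; any candidate produced this way would still need review by the thesaurus editors.

```python
from collections import Counter

# Hypothetical USE table: non-descriptor -> preferred descriptor.
USE = {"antiquities": "ancient history"}


def used_for(use_table):
    """Invert USE into UF: descriptor -> sorted list of its non-descriptors."""
    uf = {}
    for non_descriptor, descriptor in use_table.items():
        uf.setdefault(descriptor, []).append(non_descriptor)
    return {descriptor: sorted(terms) for descriptor, terms in uf.items()}


def resolve(term, use_table):
    """Map a search term to the descriptor that is used for indexing."""
    key = term.strip().lower()
    return use_table.get(key, key)


def candidate_non_descriptors(tagged, use_table, min_count=2):
    """Tags often given to resources indexed under a descriptor, but not yet
    present in the thesaurus. min_count is an arbitrary threshold."""
    known = set(use_table) | set(use_table.values())
    counts = Counter(
        (tag.lower(), descriptor)
        for tag, descriptor in tagged
        if tag.lower() not in known
    )
    return [pair for pair, n in counts.items() if n >= min_count]


# Hypothetical (tag, descriptor of the tagged resource) observations.
TAGGED = [
    ("Antiquity", "ancient history"),
    ("antiquity", "ancient history"),
    ("antiquities", "ancient history"),
    ("rome", "ancient history"),
]

print(resolve("Antiquities", USE))             # ancient history
print(used_for(USE))                           # {'ancient history': ['antiquities']}
print(candidate_non_descriptors(TAGGED, USE))  # [('antiquity', 'ancient history')]
```

Here "antiquity" is proposed because it occurs twice on resources indexed with "ancient history" and is not yet a known term, whereas "rome" occurs only once and "antiquities" is already a non-descriptor.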

Moreover, an equally interesting area will be associative relationships (RT) and folksonomies. In thesauri, the associative or related-term relationship is expressed between terms that are mentally associated to such an extent that it is useful to make the link between them explicit, but that are neither members of the same equivalence set nor subordinate or superordinate to one another (Trigari, 2001). We think that folksonomies could help to define and redefine a number of associative relationships in thesauri.

Example of associative (RT) relationship
NT1 old age
. . . . RT elderly person
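One way in which folksonomies might suggest associative relationships is tag co-occurrence: terms that are repeatedly given to the same resource are, in a loose sense, mentally associated by the users. The sketch below only illustrates that idea; the sample data and the threshold are assumptions, and co-occurrence alone cannot tell an associative link from an equivalence or a hierarchical one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: resource_id -> set of tags given to it by end-users.
RESOURCE_TAGS = {
    "res1": {"old age", "elderly person", "retirement"},
    "res2": {"old age", "elderly person"},
    "res3": {"old age", "retirement"},
    "res4": {"elderly person", "health"},
}


def cooccurrence(resource_tags):
    """Count how many resources each unordered pair of tags shares."""
    counts = Counter()
    for tags in resource_tags.values():
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


def related_candidates(resource_tags, min_count=2):
    """Pairs that co-occur on at least min_count resources (arbitrary cut-off)."""
    pairs = cooccurrence(resource_tags)
    return sorted(pair for pair, n in pairs.items() if n >= min_count)


print(related_candidates(RESOURCE_TAGS))
# [('elderly person', 'old age'), ('old age', 'retirement')]
```

The pair "old age" and "elderly person" matches the RT link already in the example above, while "old age" and "retirement" would be a candidate for the editors to consider.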

As the LRE Thesaurus is also multilingual, we will be interested in investigating what kind of inter-language equivalence folksonomies could offer. As for the hierarchical relationship of broader and narrower terms (BT/NT) between thesaurus terms, we assume that folksonomies have very little to offer.

Finally, in addition to the above, for the purposes of revising the thesaurus we think that folksonomies can be a great source for verifying whether the scope of the current descriptors actually serves the needs of its users for indexing and retrieval.

To better understand the nature of folksonomies and tagging

A number of interesting research papers have emerged in the area of tags and folksonomies. Mathes (2004) described the phenomenon, whereas the term and its main characteristics, such as broad and narrow folksonomies, were described by Vander Wal (2005). Broad folksonomies can be defined as social tagging systems where many people tag a few items, whereas narrow folksonomies are those where the content provider, usually the owner of the content, gives tags to items (e.g. Flickr). Community folksonomies mainly emerge from broad ones, as there are many users, including niche user groups, who give the item the name that they know it by. Through recent research we also know more about the evolution of tagging and how users provide tags (Sen et al., 2006). Marlow et al. (2006) propose a model and taxonomy of different tagging systems. Moreover, Bogers et al. (2006), Lin et al. (2006) and Tonkin (2006) have provided interesting insights into the issues between classification and tagging.
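The broad/narrow distinction can be made a little more tangible by counting how many distinct people have tagged each item. This is only a rough heuristic of our own for illustration, not Vander Wal's definition; the sample data and the threshold of two taggers are assumptions.

```python
from collections import defaultdict

# Hypothetical tagging records: (user, item, tag).
TAGGINGS = [
    ("anna", "res1", "volcano"),
    ("marc", "res1", "volcan"),
    ("lisa", "res1", "geology"),
    ("owner1", "photo1", "holiday"),
    ("owner1", "photo1", "beach"),
]


def taggers_per_item(taggings):
    """Number of distinct users who tagged each item."""
    users = defaultdict(set)
    for user, item, _tag in taggings:
        users[item].add(user)
    return {item: len(item_users) for item, item_users in users.items()}


def tagging_pattern(tagger_count, threshold=2):
    """Label an item by how many people tagged it (threshold is arbitrary)."""
    return "broad-like" if tagger_count >= threshold else "narrow-like"


for item, n in taggers_per_item(TAGGINGS).items():
    print(item, n, tagging_pattern(n))
# res1 3 broad-like
# photo1 1 narrow-like
```

An item tagged by several people with their own words ("volcano", "volcan", "geology") is where a shared vocabulary can start to emerge; an item tagged only by its owner offers no such overlap.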

Studying tagging and folksonomies more closely in this specific hybrid setting, where learning resources are indexed by experts and tagged by end-users, might yield more interesting insights into their nature, from both a qualitative and a quantitative point of view. Moreover, we are interested in finding the best ways for social tagging systems to co-exist with top-down indexing based on structured vocabularies, and in finding ways to leverage such a system to better serve its users' needs. Research in this type of setting might also allow us to better understand the many shortcomings of folksonomies, such as the ambiguity of tags and the lack of control over synonyms (different word, same meaning) and homonyms (same word, different meaning) (Guy & Tonkin, 2006). Finally, we are interested in using folksonomies to tap into social structures and networks in order to, first, study and analyse the characteristics and main behaviour of our user base and, second, use the social networks to enhance information retrieval and serendipitous access to resources.


Bogers, T., Thoonen, W., & van den Bosch, A. (2006). Expertise classification: Collaborative classification vs. automatic extraction. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from

Guy, M., & Tonkin, E. (2006). Folksonomies: Tidying up Tags? D-Lib Magazine, 12(1). Retrieved November 10, 2006, from

Iverson, L. (2006). Thomas Vander Wal on Folksonomy. Blog posting. Retrieved from

Lin, X., Beaudoin, J. E., Bui, Y., et al. (2006). Exploring characteristics of social classification. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from

Long, K. (2005). Leveraging folksonomy - flickr clusters at ExperienceCurve. Blog posting. Retrieved November 10, 2006, from

Marlow, C., Naaman, M., boyd, D., et al. (2006). HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, To Read. Hypertext 06. Retrieved from

Mathes, A. (2004). Folksonomies - Cooperative Classification and Communication Through Shared Metadata. Retrieved November 13, 2006, from

Sen, S., Lam, S. K., Cosley, D., et al. (2006). tagging, community, vocabulary, evolution. Proceedings of CSCW 2006. Retrieved from

Tonkin, E. (2006). Searching the long tail: Hidden structure in social tagging. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from

Trigari, M. (2001). Multilingual thesaurus, why? European Schoolnet, ETB project. Retrieved November 13, 2006, from

Vander Wal, T. (2005). Explaining and Showing Broad and Narrow Folksonomies :: Personal InfoCloud. Blog posting. Retrieved November 13, 2006, from
