Saturday, August 04, 2007

Draft paper: Analysis of User Behavior on Multilingual tagging of learning resources

This is an almost final draft of a paper that I'm currently working on. It's been accepted as a full paper to the SIRTEL workshop.

Ah, should be mentioned, maybe, that I'm also co-chairing it :) It's gonna be very cool, so try to make it there, if possible.

If not, you can always think of posting a question on YouTube, like they did in the US presidential campaign. I kind of like that, although I don't think I'll get CNN to co-host it!

Anyway, comments are welcome on this one. All images are missing: I was testing Google Docs for this, and copy and paste from OO did not include images.

Also, the formatting took some damage, sorry about that. The final, more readable version will be at the conference site in about 10 days.

Analysis of User Behavior on Multilingual Tagging of Learning resources
Riina Vuorikari, Xavier Ochoa, and Erik Duval

Abstract. Although social, collaborative classification through tagging has been the focus of recent research, the effect of multilingual tags is often overlooked. This work presents an early exploratory study of the production and consumption of multilingual tags in a European educational K-12 context. The data, produced by teachers bookmarking and tagging learning resources during a three-month period, was analysed. Thereafter, this information was presented in the form of metadata keywords to a focus group of teachers, who evaluated its descriptiveness, usefulness and overall quality. The results of this early study suggest that users are divided about the benefits of multilingual tags. However, some tags are useful for some users, so “hiding all but the right tags” becomes crucial for the success of a multilingual collaborative tagging system.

Keywords: Collaborative tagging, multilinguality, learning resources.

1 Introduction
The use of social, collaborative classification systems has grown continuously in recent years [1]. An example of this is the multitude of sites that provide some type of social annotation of digital artefacts and a social navigation system (Flickr and CiteULike, among others). Social tagging, i.e. allowing individuals to apply free text keywords to digital objects, potentially offers advantages in terms of personal knowledge management, serendipitous access to objects through tags, and enhanced possibilities to share content with emerging social networks.

Several studies have been undertaken to better understand the behaviour and evolution of social tagging systems. Early research has been conducted by Mathes [2] where the term “folksonomy” is used to compare the emerging socially generated vocabulary with the more formal ontology concept. Golder and Huberman [3] first looked at user patterns of collaborative tagging systems. Recent studies focus on the navigability of such social systems [4] and on understanding the network properties [5].

A prevailing aspect among current studies concerning tagging is that they assume that tags are represented in a common language [6], understandable by all the members of the user community. Guy suggests that it is not always the case [7], but does not offer insight on how to deal with tags in multiple languages.

Lately, multilingual tags have started emerging on popular social tagging systems as their user base grows, and different ways of dealing with multiple languages can be observed. Delicious users, for example, add tags in different languages to a bookmark (e.g. achat, shopping) and on some occasions even add a language identifier as a tag (e.g. lang:fi) for the language of the resource. However, Delicious does not offer any system-level support that would allow users to see tags, say, only in French or Finnish. Other services, like Yahoo!'s MyWeb, on the other hand, offer tags and tag clouds in different languages in the localised parts of their portal (e.g. .fr, .es, ...), so some language identification of tags takes place at the system level. Thirdly, in LibraryThing experienced users can combine tags, and on some occasions tags in different languages have been grouped together.

Our work, still at an early stage, attempts to shed light on a community of users who share a common educational interest in using a social tagging system across country and language borders, but do not necessarily share a common language, as the users are free to choose the language(s) in which they apply tags. This exploration takes place in the context of two European Community funded projects, CALIBRATE and MELT, both focusing on the sharing and re-use of digital learning resources for K-12.

European education, especially K-12 education, is inherently multilingual and multicultural. Offering educational resources and services in native languages is deemed important, but equally important is the exposure to other languages. One way to promote this is to make learning resources available across national and linguistic borders. This puts constraints on semantic interoperability, i.e. how well content and its metadata can be understood by other systems and users.

Controlled vocabularies, such as the multilingual LRE Thesaurus, can be used to overcome some hurdles of semantic interoperability. However, the gap between the terms used by experts and practitioners in the field is also problematic. For that reason, the current research looks into the co-existence of taxonomies and end-user generated tags.

A federation of learning resource repositories in a multilingual context needs to support multiple languages at the system level in order to support each repository and its national user-base, but at the same time, there is a need to allow people (i.e. user information and preferences), resources and tags to “travel” across national and linguistic borders.
This paper is structured as follows: first, in section 2, we analyse the early stage of the bookmarking and tagging behavior of our community in order to better understand how teachers bookmark and tag resources in a multilingual context, what types of tags are provided and in which languages. Then, in section 3, an experiment is presented that measures the effect of multilingual tags on the descriptiveness, usefulness and overall quality of the metadata. Finally, the findings are discussed and applied to design decisions for multilingual tagging systems.

2 Analysis of tagging behaviour in multilingual context

The CALIBRATE project makes K-12 digital learning resources available to its pilot schools (78 schools) in Hungary, Austria, Estonia, Czech Republic, Lithuania and Poland in their different curriculum areas. Schools can access material in different languages through a portal that is connected to a federation of learning resource repositories [8] in the pilot countries.
As part of the project's multilingual search interface, a personal bookmarking and tagging tool has been available since the beginning of 2007. This tool allows a user to create personal collections of learning resources by bookmarking interesting resources found through the portal. To facilitate the management of these personal collections (also called favourites in the project), the user can also add keywords to resources to make it easier to “keep found things found”. These keywords are free for the user to choose and can be expressed in any language. The collections and keywords are kept private to the user, and at this stage of the experiment they cannot be shared among users.

The data for this analysis is from a period of about three months (January 24 to April 21, 2007). There were 77 teachers who made 459 bookmarks with 417 multilingual tags on 320 different learning resources. We intend to analyse this data regularly within the project's lifespan (until 2008).

2.1 Quasi-Experimental Set-up

A total of 173 subjects used the portal during the time of the experiment. However, the subjects of this dataset comprise the group of 77 teachers who had made at least one bookmark during this time. Thus, it was a self-selected group formed on the basis of bookmarking behavior during the three-month period, and it represents 45% of all the pilot participants. As there was no overall methodology to introduce bookmarking and tagging to the subjects, more than half of the participants showed no interest in using this feature of the portal.

The bookmarking habits, at this very early stage, varied a lot in terms of which languages to use, how many tags to add, how to add multiple tags (separated by commas or not), etc. Also, hardly any of the participants had previous experience with tagging, so no single tagging convention emerged but rather many different ways of using tags in multiple languages. As there is very little research on the multilingual context, we think it is important to study the early stage of tagging behaviour to better anticipate the effect of multilinguality on the system and to improve its design.

It is worth mentioning that the bookmarking and tagging system, at this stage of the pilot, offers very little social influence when it comes to choosing what to bookmark and which keywords to use. Social bookmarking sites often make social cues available (e.g. most bookmarked items, tag clouds, tag recommendations based on previous tags). At the time of the experiment, the system had hardly any tags attached to resources, so teachers started from a clean slate. In the case where a resource was already tagged by another participant, the user would see the term(s) only if they were in the same language as the interface.

2.2 Results

In this part we present the results of the analysis, which will be discussed further in conjunction with the other results in the discussion section.

When we look at the distribution of bookmarks per user, we find that on average each user had 6 bookmarks (Fig. 1). However, the distribution was very wide; 10% of the users had more than the average number of bookmarks, which leaves 90% under the average. Eight of the users could be called “super users”, as they had more than 20 bookmarks, and 12 users had between 6 and 20 bookmarks. About 30% of the users seem to have only experimented with the bookmarking system, as they have only one single bookmarked item in their favorites folder.

We recorded 418 tags in the system. During the semantic analysis of tags we found that many tags actually contained multiple terms, i.e. they were bundles of terms without comma separation. This was due to a technical feature of the tool that treated terms without comma separation as one tag. When broken down, they resulted in 585 terms. The terms were translated into English and a semantic analysis was performed to better understand the types of tags. We used the classification from Sen [9], which is also based on the categories of Golder et al. [3]: Factual tags (Golder: item topics, kinds of item, category refinements); Subjective tags (Golder: item qualities); and Personal tags (Golder: item ownership, self-reference, tasks organisation).
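
The difference between tags and terms described above can be illustrated with a minimal sketch. It assumes that the tool split the tag field on commas only, and that bundles can be broken into terms on whitespace; the function names and the sample input are hypothetical and are not taken from the project's code.

```python
def split_tags(raw_input):
    """Split a raw tag field on commas, as the tool is described to have done."""
    return [part.strip() for part in raw_input.split(",") if part.strip()]


def split_terms(tag):
    """Break one stored tag into its whitespace-separated terms (an assumption)."""
    return tag.split()


# Hypothetical inputs: one teacher uses commas, another does not.
with_commas = split_tags("Oekologie, Artenschutz, Regenwald")
without_commas = split_tags("Oekologie Artenschutz Regenwald")

print(with_commas)      # three separate tags
print(without_commas)   # one bundled tag
print([term for tag in without_commas for term in split_terms(tag)])
```

Splitting on whitespace would also break up genuine multi-word tags, so a breakdown like this would probably still need a manual check.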

The vast majority of the tags at this early stage (Table 1) are of the factual type. Of the factual tags, 79% were put into a rough category of topic and 14% into that of category refinement with richer information. The rest of the tags were subjective in nature and could be used to describe the quality of the resources or how the person felt about them. None of the tags fell into the category of personal tags as Golder describes them (e.g. tags related to item ownership, self-reference or personal tasks organisation). When we analysed how these tags were used and re-used among users, we found that 80% of the tags related to bookmarks were factual and 20% were subjective. In a MovieLens study [9], for comparison, the distribution was 63% factual, 29% subjective, 3% personal and 5% other.
Table 1. Type analysis of the tags (no re-use)

After categorising the tags, we further studied their nature. Two main trends seemed to emerge. First, many of the tags contained the same terms as the title, i.e. the user had simply copied the title into the tag field. Second, about 13% of the tags contained a general term, a name or a place, e.g. EU, Euroopa, Euroopa, Europa, europe, geograafia, Phytagoras, etc. We hypothesise that this type of “travel well” tag, even if not translated, could be useful for other users because of its close similarity in spelling across many languages. We think it could be of interest to work towards automatically filtering this type of term from the pool of all multilingual tags, for example by matching them against existing multilingual vocabulary lists available on the Internet.
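
One way such matching might work is sketched below. This is only an illustration under stated assumptions: the vocabulary list, the similarity cutoff of 0.8 and the function names are all hypothetical, and fuzzy string matching from Python's standard library stands in for whatever matching method a real system would use.

```python
import difflib
import unicodedata

# Hypothetical reference vocabulary; a real list would come from an
# existing multilingual vocabulary, which is not specified here.
VOCABULARY = ["eu", "europe", "europa", "pythagoras", "applet"]


def normalise(tag):
    """Lower-case a tag and strip diacritics so spelling variants compare more easily."""
    decomposed = unicodedata.normalize("NFKD", tag.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def travels_well(tag, vocabulary=VOCABULARY, cutoff=0.8):
    """Return True if the tag is spelled closely enough to a vocabulary entry."""
    matches = difflib.get_close_matches(normalise(tag), vocabulary, n=1, cutoff=cutoff)
    return bool(matches)


for tag in ["EU", "Euroopa", "Phytagoras", "Regenwald"]:
    print(tag, travels_well(tag))
```

The cutoff is a design choice: a lower value would accept more spelling variants but would also let unrelated words through, so it would need tuning against real tags.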
When we looked at the number of tags that users attached to bookmarks, we were able to identify some early trends. For the total of 459 bookmarked resources, we found that some of the tags were re-used; there was an average of 1.92 tags per resource. More than half (56%) of the tags were entered as a bundle of terms, i.e. most teachers had added 2 to 6 terms without comma separation. In quite a few cases these were the terms in the title of the resource (Fig. 2). In 28% of the cases only one term was entered as one tag.

The rest had used multiple separate tags (2-6 tags). In the latter case the terms were not necessarily related to the title alone, but carried other types of information (e.g. title: Umweltkids and tags: Oekologie, Artenschutz, Regenwald, Tierschutz, Skisport).
Contrary to our expectations, the users took the liberty of adding tags in multiple languages and of using the portal interface in languages other than their mother tongue (the interface was available in the languages of the pilot and in English). This made identifying the language of tags more difficult, as we had expected to be able to infer the language of a tag from the language of the interface that the user used when entering it. In about 70% of the cases we were able to identify the language of the tag correctly using this method, which leaves us with a 30% error rate in language identification. This error would make it hard, for example, to display tags and tag clouds in one single language, an issue that relates to the usability of the portal and one on which the second experiment was set up to gather more evidence.
We found the following scenarios for tagging; however, because of the way we logged the data, we cannot give percentages for these use cases:
  • Interface and tags in mother tongue
  • Interface was used in mother tongue, but tags in other language
  • Interface was used in a language that is other than the mother tongue, but tags were entered in mother tongue
  • The tagging language was other than the interface language and the mother tongue
These scenarios were found by comparing the real language of each tag with the language inferred from the interface language. At this early stage of the experiment it is impossible to draw firm conclusions, but it seems that users are likely to use, or at least try, the interface in different languages. We found, for example, that the tags entered through the English interface were in English in only 50% of the cases, which means that users added tags in the languages in which they were competent. On the other hand, we also found that there were many more tags in English than we expected from the choice of the interface language. These users had chosen to tag in English, even if they used the interface in some other language, most likely to be able to share tags with users from other countries.
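
The comparison described above could be sketched as follows, assuming that each logged tag carried the interface language, the teacher's mother tongue and the manually verified tag language. The record layout, the sample records and the function names are hypothetical; the records are not the study's data.

```python
from collections import Counter

# Hypothetical records: (interface language, mother tongue, verified tag language).
RECORDS = [
    ("hu", "hu", "hu"),
    ("hu", "hu", "en"),
    ("en", "et", "et"),
    ("en", "pl", "de"),
    ("de", "de", "de"),
]


def scenario(interface, mother_tongue, tag_language):
    """Assign a record to one of the tagging scenarios listed above."""
    if interface == mother_tongue:
        if tag_language == mother_tongue:
            return "interface and tags in mother tongue"
        return "interface in mother tongue, tags in another language"
    if tag_language == mother_tongue:
        return "interface in another language, tags in mother tongue"
    if tag_language == interface:
        # Not one of the four listed scenarios; kept separate so it is not miscounted.
        return "interface and tags in the same non-native language"
    return "tags in neither the interface language nor the mother tongue"


def heuristic_accuracy(records):
    """Share of records where the interface language equals the verified tag language."""
    hits = sum(1 for interface, _, tag_language in records if interface == tag_language)
    return hits / len(records)


print(Counter(scenario(*record) for record in RECORDS))
print(heuristic_accuracy(RECORDS))
```

With logging of this kind, the share of each scenario could be reported directly, which was not possible with the data available in this study.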

3 Experiment with Multilingual Tags

An exploratory experiment was set up to measure the perceived usefulness and quality [10] of multilingual tags, traditional metadata and expert classification keywords. We were also interested in how users reacted when they were confronted with tags in multiple languages that they did not know. The subjects were shown a list of learning resource metadata with keywords in multiple languages; the list imitated the search result list of the portal. The results of this experiment will be useful in guiding design decisions in the development of retrieval tools for learning objects in a multilingual environment.

3.1 Experimental Set-up

Thirteen teachers, who belong to the MELT focus group, were selected to participate in the experiment. They were confronted with metadata regarding five learning resources in different areas of primary and secondary education curricula, namely in health education, social science, physics, mathematics and biology. An online form was used for the experiment.
Each learning resource had a metadata description, but the number of elements varied. However, they all had the following metadata: title, description, age range (all in English) and keywords. The keywords comprised tags and Thesaurus terms, which were mixed together and displayed in alphabetical order. The number of Thesaurus terms and tags varied between resources. Twenty of these keywords were Thesaurus terms in English that an expert cataloguer had used to classify the resource. The rest (39) were multilingual tags provided by pilot teachers during the first three months of the CALIBRATE pilot. These tags were both in commonly used languages and in less used languages, as listed below:
  • 11 in Hungarian
  • 7 in German
  • 7 in English
  • 6 in Polish
  • 4 in Estonian
  • 1 in Finnish
The participants were asked to look at the learning resources one at a time and go through the metadata related to each. Then they were given two task-related questions: first, to select the keywords that helped them to learn about the given learning resource (i.e. descriptiveness), and second, they were asked about decision support (i.e. whether the keywords helped in using the learning resource in teaching). Finally, they were also asked to rate the perceived overall quality of all the metadata displayed (traditional metadata plus keywords). This procedure was repeated for each of the five learning resources.

Once the review of all the resources was concluded, the users were asked to identify their language competencies and to indicate their comfort level when keywords were presented in languages that they did not understand. All these questions were mandatory. The subjects commented later that in some cases they did not feel that any of the keywords were useful, but they had to choose one to complete the web survey. This might have skewed the results to some extent. Finally, participants could leave free comments about their experience during the experiment.


3.2 Results

In this part we present the results of the experiment, which will be discussed further in the following section.
On average, only 35% of the presented keywords, both Thesaurus terms and tags, were found descriptive for the learning resource. The Thesaurus terms were found descriptive in 58% of the cases, while the tags only in 25% (Fig. 3). When we look at the two top terms for each resource, we find that Thesaurus terms were somewhat more popular (60%) than tags (40%) (Table 2). All but one of the most popular tags were in English, which was also the most spoken language among the focus group. There was a lot of variation, by resource and by language group, in how users perceived the keywords.

For example, for the first resource in Fig. 3, there was only one Thesaurus term and nine tags, which were in English and German, languages widely spoken by the participants. In this case two of the tags were chosen almost as often as the Thesaurus term. As for the second resource in Fig. 3, there was an almost equal number of tags and Thesaurus terms; two tags, both generic terms (EU, Europa), were chosen more often than the Thesaurus terms.

Fig. 3. Percentage of tags and thesaurus terms found descriptive

Resource no. 3 in the same figure represents a case of multilingual tags in less spoken languages in which the users had no competence. In this case two Thesaurus terms were chosen most often; however, two “travel well” tags (JavaApplets, Applets) were very high on the list. As for resource no. 4, there was an equal number of tags and Thesaurus terms, which was also reflected in the results: the top two positions were shared between them. In the last case the tags were in less spoken languages, Hungarian and Estonian; one Hungarian tag was found useful by all participants with Hungarian skills.

Table 2. The two most popular keywords for each resource

Of the total number of keywords, 54% were in a language within the users' competencies; however, 87% of the keywords found descriptive were in a language in which the user had skills (Fig. 4). The remaining 13% of tags that were found useful, but were not in a language in which the users had competence, seem to consist of terms of the generic type, the “travel well” tags described previously.

Fig. 4. Percentage of keywords in a known and unknown language that were found descriptive

When we asked how well the keywords would help in using the resource, on average only 27% of the presented keywords were found useful to indicate possible uses of the learning resource. Thesaurus terms were found useful 50% of the time, against only 18% for tags. In this case, 83% of the keywords marked useful were in languages in which the participants had skills.
We can say that the issue of multilingual tags evokes sentiments and also splits users. Of the thirteen users, two “love” being able to see multilingual tags and four found them useful, whereas six found them confusing and one “hates” to see keywords in languages that he/she does not understand (Fig. 5).

Fig. 5. Answer of the participants to the question: “What do you think when you see the keywords in many languages?”

Lastly, we were also interested in how users evaluated the overall quality of the metadata record [10]. The quality assigned to the metadata record correlates in a statistically significant way with the number of words in the description (.909) and with how descriptive (.944) and useful (.994) the user found the keywords for that learning object. The first correlation was already found in a previous study [10].
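
As an illustration of how coefficients of this kind could be computed, here is a minimal sketch. It assumes a Pearson correlation over per-resource values; the text above does not state which coefficient was used, and the numbers below are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Raises ZeroDivisionError if either sequence is constant.
    return covariance / (spread_x * spread_y)


# Hypothetical per-resource values for five resources; not the study's data.
quality_rating = [3.1, 3.8, 2.5, 4.0, 3.3]
description_words = [40, 65, 22, 80, 45]
print(round(pearson(quality_rating, description_words), 3))
```

With only five resources, a coefficient like this rests on very few data points, so it should be read as an indication only.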

4 Discussion on the results

The main argument that comes out of this early experimental research is that certain multilingual tags seem to be useful for some users – the challenge is how to make the other tags invisible? Moreover, the results can lead us to discuss the multilingualism of tags and indexing keywords from different perspectives; what are the user needs and requirements in a multilingual Europe, how can they be supported at the system level, what are the ramifications on their usability and how is the overall quality of the portal enhanced through multilingual tags?

In the spirit of “how to hide all but the right tags for each user”, this research has identified two topics that need further investigation: one is that of identifying “travel well” tags and the other that of how to correctly identify the language of each entered tag. After tackling these two issues, hiding all but the right tags becomes a much more manageable task.
Solving those two issues would greatly enhance the usability of a portal that offers multilingual tags: as shown in the experiment with the focus group, being exposed to tags in many languages has a dividing effect. About half of the subjects said that they liked to see multilingual tags, whereas the other half found them rather irritating, especially when they were in languages that they did not recognise. It was also mentioned that multilingual tags make it harder and slower to pick the useful terms out of all the tags.

Two possible ways forward could be envisaged: automating the recognition of “travel well” tags and the identification of the languages of all tags by using existing vocabulary and dictionary lists on the Internet, or “crowd-sourcing” it to users, that is, asking the end users to identify “travel well” tags and allowing them to translate tags and correct their language. A combination of both could also be envisaged.
Another interesting outcome of the study is that keywords in general received a rather low appreciation rate among the subjects: 35% of the keywords were found descriptive and 27% were found helpful for using the resource. Overall, the Thesaurus terms performed better than the tags. However, it can be argued that the tags, which were after all produced at no cost, showed encouraging potential in their usefulness. This needs to be investigated further and in more depth with a bigger sample size.

It could be envisaged that, when it comes to sharing accumulated knowledge about the actual use of resources in teaching and learning, social tagging could in the future add value to keywords. Thus, more design-level effort is needed to guide and encourage users to use tags for such purposes.

5 Conclusions

This early study contributes to the understanding of tags in multiple languages. Despite the small sample size and early tagging behaviour of the participants, we can assume that tags in a multicultural and multilingual context offer potential advantages to the collaborative tagging system and its multilingual user communities (e.g. Europe). However, there are challenges and research questions that need further attention. As it becomes clear that some tags are useful for some users, the design challenge becomes “hiding all but the right tags”. This applies to both entering and viewing tags, e.g. which tags, and in which languages, to show or recommend to users when they are about to add a tag, and what kind of tags to show for retrieval and social navigation.

First, it seems important that the system is able to infer and identify the language of tags entered in multiple languages, so that users can be shown tags only in the languages that they desire. Second, it appears that there are tags that “travel well”, i.e. tags that are easily understood by many users despite language barriers. It appears important that those terms are identified, either automatically or by users, so that better use can be made of them. The two findings above seem to further indicate that tags in different languages should not be kept in separate silos, but that interaction between languages should be used to connect like-minded people across country and linguistic borders.
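
Taken together, the two points could translate into a display rule along the following lines. This is a sketch under the assumption that each tag already carries an identified language and a “travel well” flag; the `Tag` class, its fields and the sample tags are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    text: str
    language: str               # identified language code, e.g. "hu"
    travels_well: bool = False  # set automatically or by users


def visible_tags(tags, user_languages):
    """Keep tags in the user's languages plus any tag flagged as travelling well."""
    return [t for t in tags if t.language in user_languages or t.travels_well]


# Hypothetical tags for one resource.
tags = [
    Tag("Europa", "de", travels_well=True),
    Tag("Regenwald", "de"),
    Tag("rainforest", "en"),
    Tag("geograafia", "et"),
]
print([t.text for t in visible_tags(tags, {"en"})])
```

A rule like this hides tags rather than deleting them, so a user who likes to see multilingual tags could still be offered all of them.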

The issue of multilingual tags is intriguing and offers interesting possibilities both for learning resource repository managers and administrators and for end users. In a multilingual environment such as Europe, where making learning resources available in languages other than the mother tongue is becoming more mainstream, mixing tagging with a top-down expert classification system seems to offer interesting possibilities for accessing resources and for other novel educational applications that leverage the social network aspects of a given community. From this early experiment it becomes clear that further research into the topic of multilingualism is needed to better understand its complexity, but also to be able to design more adaptable applications.

Acknowledgments. We would like to thank Sylvia Hartinger from European Schoolnet for making the tags available for analysis and Jim Ayre from Multimedia Ventures Europe Ltd. for valuable comments. Acknowledgment also goes to Helsingin Sanomain 100-vuotissäätiö for the research grant that made this research possible.

1. Marlow, C., Naaman, M., Boyd, D., Davis, M.: Position paper, tagging, taxonomy, flickr, article, toread. In: Collaborative Web Tagging Workshop at WWW2006, Edinburgh, Scotland. (2006).
2. Mathes, A.: Folksonomies-cooperative classification and communication through shared metadata. In: Computer Mediated Communication, Graduate School of Library and Information Science, University of Illinois Urbana-Champaign. (2004)
3. Golder, S.A., Huberman, B.A.: Usage patterns of collaborative tagging systems. Journal of Information Science 32(2), pp. 198-208. (2006).
4. Chi, E., Mytkowicz, T.: Understanding navigability of social tagging systems. In: Proceedings of CHI. Volume 7. (2007)
5. Cattuto, C., Schmitz, C., Baldassarri, B., Servedio, V.D.P., Loreto, V., Hotho, A., Grahl, M., Stumme, G.: Network Properties of Folksonomies. AI Communications Journal, Special Issue on "Network Analysis in Natural Sciences and Engineering",
6. Hammond, T., Hannay, T., Lund, B., Scott, J.: Social bookmarking tools (i). D-Lib Magazine 11(4) (2005)
7. Guy, M., Tonkin, E.: Tidying up tags. D-Lib Magazine 12(1) (2006)
8. J.-N. Colin and D. Massart. LIMBS: Open source, open standards, and open content to foster learning resource exchanges. In Kinshuk, R. Koper, P. Kommers, P. Kirschner, D. Sampson, and W. Didderen, editors, Proc. of The Sixth IEEE International Conference on Advanced Learning Technologies, ICALT'06, pp. 682-686, Kerkrade, The Netherlands, July 2006.
9. Sen, S., Lam, S.K., Cosley, D., Frankowski, D., Osterhouse, J., Harper, F.M., Riedl, J.: tagging, communities, vocabulary, evolution. In: Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work. pp. 181–190 (2006)
10. Ochoa, X., Duval, E.: Towards automatic evaluation of learning object metadata quality. In: Advances in Conceptual Modeling - Theory and Practice, ER 2006 Workshops BP-UML, CoMoGIS, COSS, ECDM, OIS, QoIS, SemWAT. pp. 372–381. Lecture Notes in Computer Science, Tucson, AZ, USA, Springer (November 2006)


Suvendu Sahoo said...

Interesting work !! I landed up at your blog through searching for "personal knowledge management " (PKM) through Google blog search. I would like to know if you have any thoughts on PKM as I am interested in how social and personal KM works together and how web based tools facilitate that.

vuorikari said...

Hi there,

I have this one paper that you can find from
(sorry, you have to paste the url, as it is so long. Otherwise it was cut off).

It's basically a concept for PKM for educational purposes (isn't that called learning?), but no follow-up ever took place.

Thanks for dropping by.

