Monday, July 28, 2008

Measures for cross-border actions with Tags and Resources

If I break down the triple of {user, tag(s), resource} I can study the three things separately
  • User - resource
  • User - tags

  • Resource - users
  • Resource - tags

  • Tag - resource
  • Tag - users

And as I am especially interested in the cross-border actions, I would study the cases where:

User country ≠ Resource country

In this case I am interested in studying users' collections of bookmarked resources, especially establishing which countries the resources originate from. Using the cross-border metrics I can take a snapshot of the resources and calculate a cross-border resources value for the user.
  • E.g. User Finland has bookmarked Resource1 Poland, Resource2 Spain and Resource3 Finland

  • This would give User Finland a resource profile of Poland 33%, Spain 33% and Finland 33%

  • In this case, as the user is from Finland, the cross-border profile would be 66%, which would most likely be expressed as a value of .66, if we imagine that the cross-border value is between 0 and 1.
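The Finland example above can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the function names `country_profile` and `cross_border_value` are made up for illustration, and each bookmark is reduced to just the country of the resource.

```python
from collections import Counter

def country_profile(countries):
    """Return each country's share of the list as a fraction of the total."""
    counts = Counter(countries)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}

def cross_border_value(home_country, countries):
    """Share of items whose country differs from home_country (0 to 1)."""
    if not countries:
        return 0.0
    foreign = sum(1 for country in countries if country != home_country)
    return foreign / len(countries)

# Hypothetical bookmarks of a user from Finland (countries of the resources).
bookmarks = ["Poland", "Spain", "Finland"]
profile = country_profile(bookmarks)
value = cross_border_value("Finland", bookmarks)
```

Here `value` is two thirds, which is the .66 used in the example once it is rounded down. The same two functions work for the other relations in this post if the list holds user countries or tag languages instead.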
So what, you say. It makes a difference, I say.
  • This allows me to categorise this user as a cross-border user of resources. I assume that users differ in their inclination to use resources that come from other countries: some use them a lot, others do not want to bother with them.

  • So this metric allows me to study who does what and thus better understand our user-base.

  • In the long run this will of course make it easier to recommend resources to users, as we already know from their profile that they are inclined to use cross-border resources.
Resource country ≠ User country

This allows me to look at the thing from a different point of view. Here, I am interested in establishing a profile for a resource. It appears that some resources are used a lot by people from different countries, whereas others are used predominantly by users from the same country as the resource itself.
  • E.g. Resource Finland has been bookmarked by User1 Poland, User2 Spain and User3 Finland
  • This gives Resource Finland a user profile of Poland 33%, Spain 33% and Finland 33%.

  • In this case, as the resource is from Finland, the cross-border profile would be 66% of users, which would most likely be expressed as a value of .66, if we imagine that the cross-border value is between 0 and 1.
So what, you ask again. I think that it's cool, because then I can quickly and automatically calculate which of my resources have a high potential to cross borders easily. First of all, this will help me study whether there are some characteristics that make these resources cross borders.

Second, we can use this information to pick out the resources that we think cross borders easily. This could be cool on our portal, for example: we could flag these resources for users, and furthermore, we could give these resources priority when other repositories are harvesting or searching us in a federated manner.

Resource country ≠ Tag language
It'll also be interesting to create profiles for resources based on tags in different languages. For tags, we do not track the country of origin, just the language. So in this case I'm interested in looking at the resource's profile in terms of tags.
  • E.g. Resource Finland has been given Tag1 Polish, Tag2 Spanish and Tag3 Finnish
  • This gives Resource Finland a tag profile of Polish 33%, Spanish 33% and Finnish 33%.

  • In this case, as the resource is from Finland, the cross-border tag profile would be 66% of tags, which would most likely be expressed as a value of .66, as above.
This is also an indication that the resource has the potential to cross borders. Tags in different languages might yield some interesting information on how this learning resource could be used in a new context. In this example the resource was created in Finland, so one could assume that it has some underlying ingredients that make it suitable for the Finnish curriculum. On the other hand, the fact that users have added tags in Polish and Spanish too might indicate that this resource is also useful for teachers in those countries.

Here an interesting case seems to emerge for topics like language learning, say, English as a Second Language (ESL). Language learning and teaching resources seem to be easily reusable in another language context. Interestingly, though, we've seen that in these cases teachers tend to tag them in the language in question.
E.g. User Finland has added a Tag English for ESL Resource Poland

Tag language ≠ Resource country

We can also look at things from the tag's perspective.
  • E.g. Tag Finnish has been added to Resource1 Poland, Resource2 Spain and Resource3 Finland
  • This gives Tag Finnish a resource profile of Poland 33%, Spain 33% and Finland 33%.

  • In this case, as the tag is in Finnish, the cross-border resource profile would be 66% of resources, which would most likely be expressed as a value of .66, as above.
This allows us to observe cases where a tag is related to learning resources that most likely share some thematic resemblance. It could be for example Science resources from different countries that Finnish teachers have collected. In this case we also can find evidence that these resources were adaptable to Finnish curriculum despite the fact that they come from other countries.

Tag language ≠ User country

On the other hand, we also find tags that have been used by users from different countries. These are the tags that we have previously identified as "travel well" tags. They have some interesting properties that make them easily understandable without translations, e.g. names (people, country, place), acronyms, common terms (web2.0).

By looking at the connection between Tag language and User country we can possibly identify such tags. The other common case for this seems to be that these people have tagged the resource in English. In any case, if many people have done that, we can identify these terms and manually analyse them. The hypothesis is that they are either "travel well" tags or some super popular tags that could also score high on the tag non-obviousness metric by Farooq et al. (2007).
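A first pass at spotting such candidates could look like the sketch below. It is only an illustration with assumptions of mine: the function name `travel_well_candidates`, the `(tag, user_country)` pair format, the lower-casing of tags and the threshold of two countries are all hypothetical choices, and the sample data is invented.

```python
from collections import defaultdict

def travel_well_candidates(assignments, min_countries=2):
    """Return tags applied by users from at least min_countries countries.

    assignments is an iterable of (tag, user_country) pairs.
    Tags are lower-cased so that 'Web2.0' and 'web2.0' count as one tag.
    """
    countries_by_tag = defaultdict(set)
    for tag, user_country in assignments:
        countries_by_tag[tag.lower()].add(user_country)
    return {
        tag: sorted(countries)
        for tag, countries in countries_by_tag.items()
        if len(countries) >= min_countries
    }

# Invented sample data, not taken from MELT.
assignments = [
    ("web2.0", "Finland"),
    ("web2.0", "Hungary"),
    ("Web2.0", "Belgium"),
    ("matematiikka", "Finland"),
    ("ESL", "Italy"),
    ("ESL", "Austria"),
]
candidates = travel_well_candidates(assignments)
```

The result is only a shortlist for the manual analysis: a tag used in several countries may simply be an English term rather than a true "travel well" tag.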

User country - Tag language
Lastly, just to enumerate the cases, we also have the relation between User country and Tag language. This can be used to study users' personal tagging behaviour. In the previous study in Calibrate we found that on average users tag in their mother tongue and in English (75% to 25%). It seems though that things look different in MELT, where teachers are tagging more in English.

We are not sure whether these are personal preferences or the influence of social awareness, as in MELT tags are made readily available to others through a tag cloud, whereas in Calibrate they were only used for personal knowledge management reasons.

In any case, this relation allows us to measure individual differences between users and thus understand our user-base and possible user scenarios better.

What next? I will make a case study to apply these measures to the MELT tags that we've got in the system so far:

  • Learning resources: 199
  • Users: 40 (From Fi, Hu, Et, Be, At, It)
  • Tags:
    • 572 distinct,
    • 969 applied tags
    • 75% of tags were used only once
    • 25% of tags were used more than once
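Figures like the ones above can be derived from the raw list of applied tags. The snippet below is a minimal sketch with an invented five-tag list; the function name `tag_usage_summary` and the dictionary keys are my own, not part of any MELT tooling.

```python
from collections import Counter

def tag_usage_summary(applied_tags):
    """Summarise a list of applied tags (one entry per tag application)."""
    counts = Counter(applied_tags)
    distinct = len(counts)
    used_once = sum(1 for n in counts.values() if n == 1)
    return {
        "applied": len(applied_tags),
        "distinct": distinct,
        "used_once_share": used_once / distinct if distinct else 0.0,
    }

# Invented sample: 5 applied tags, 4 distinct, 3 of them used only once.
applied = ["science", "science", "esl", "maps", "physics"]
summary = tag_usage_summary(applied)
```

In this sketch the share of tags used only once is calculated over the distinct tags, not over the applied ones.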

Tags and SNA measures

Studies on tags commonly have the triple of {user, tag(s), item} as a unit of study. That's also what I'm interested in, especially in those underlying structures that build relationships between users, tags and items. Some apply Social Network Analysis to study, for example, the centrality measures of the network.

A run-down of SNA measures from Wikipedia

Betweenness
Degree an individual lies between other individuals in the network; the extent to which a node is directly connected only to those other nodes that are not directly connected to each other; an intermediary; liaisons; bridges. Therefore, it's the number of people who a person is connecting indirectly through their direct links.
(Somewhere else: The betweenness measurement indicates a node or nodes that connect clusters of nodes. Nodes that have high betweenness have high influence over what information flows in the network.)
Closeness
The degree an individual is near all other individuals in a network (directly or indirectly). It reflects the ability to access information through the "grapevine" of network members. Thus, closeness is the inverse of the sum of the shortest distances between each individual and every other person in the network.
(Degree) centrality
The count of the number of ties to other actors in the network. See also degree (graph theory).
Flow betweenness centrality
The degree that a node contributes to sum of maximum flow between all pairs of nodes (not that node).
Eigenvector centrality
A measure of the importance of a node in a network. It assigns relative scores to all nodes in the network based on the principle that connections to nodes having a high score contribute more to the score of the node in question.
Centralization
The difference between the n of links for each node divided by maximum possible sum of differences. A centralized network will have many of its links dispersed around one or a few nodes, while a decentralized network is one in which there is little variation between the n of links each node possesses.
Clustering coefficient
A measure of the likelihood that two associates of a node are associates themselves. A higher clustering coefficient indicates a greater 'cliquishness'.
Cohesion
The degree to which actors are connected directly to each other by cohesive bonds. Groups are identified as 'cliques' if every actor is directly tied to every other actor, 'social circles' if there is less stringency of direct contact, which is imprecise, or as structurally cohesive blocks if precision is wanted.
(Individual-level) density
The degree a respondent's ties know one another / proportion of ties among an individual's nominees. Network or global-level density is the proportion of ties in a network relative to the total number possible (sparse versus dense networks).
Path Length
The distances between pairs of nodes in the network. Average path-length is the average of these distances between all pairs of nodes.
Radiality
Degree an individual's network reaches out into the network and provides novel information and influence.
Reach
The degree any member of a network can reach other members of the network.
Structural cohesion
The minimum number of members who, if removed from a group, would disconnect the group.
Structural equivalence
Refers to the extent to which actors have a common set of linkages to other actors in the system. The actors don’t need to have any ties to each other to be structurally equivalent.
Structural hole
Static holes that can be strategically filled by connecting one or more links to link together other points. Linked to ideas of social capital: if you link to two people who are not linked you can control their communication.
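To make a couple of these definitions concrete, here is a minimal sketch of degree centrality and closeness in plain Python, following the wording above (closeness as the inverse of the sum of shortest distances). The tiny three-person network and the function names are invented for illustration; a real study would more likely use a dedicated SNA package.

```python
from collections import deque

def degree(graph, node):
    """Degree centrality: the count of ties a node has to other nodes."""
    return len(graph[node])

def shortest_distances(graph, start):
    """Breadth-first search: number of hops from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist

def closeness(graph, node):
    """Inverse of the sum of shortest distances to all reachable nodes."""
    dist = shortest_distances(graph, node)
    total = sum(d for other, d in dist.items() if other != node)
    return 1 / total if total else 0.0

# Invented undirected network: anna - ben - carla (ben sits in the middle).
graph = {
    "anna": {"ben"},
    "ben": {"anna", "carla"},
    "carla": {"ben"},
}
ben_degree = degree(graph, "ben")
ben_closeness = closeness(graph, "ben")
anna_closeness = closeness(graph, "anna")
```

In this toy network ben has the highest degree and closeness, which matches the intuition that he is the one connecting the other two.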
What made me think of this now was that I read this mini study: "Using Social Network Analysis to Highlight an Emerging Online Community of Practice" by Anthony Cocciolo, Hui Soo Chae and Gary Natriello (Teachers College, Columbia University).

The method used made me tick. They used
..System Theory to define the uploading and downloading of materials as "communicative acts", the users of the system were the "actors" and the cumulative communicative exchanges as "interactions" (Buckley, 1967). .. this particular systems arrangement is useful because it provides a readily available metric for assessing actors' interactions within a network.
I think it might be interesting to think how this could be used to study the underlying networks with tags.

The most comprehensive reference is: Wasserman, Stanley, & Faust, Katherine. (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press. A short, clear basic summary is in Krebs, Valdis. (2000). "The Social Life of Routers." Internet Protocol Journal, 3 (December): 14-25.

Thursday, July 10, 2008

Notes: Tagging tagging. Analysing user keywords in scientific bibliography management systems

An interesting paper in JoDI about tagging in bibliography management systems.
Tagging tagging. Analysing user keywords in scientific bibliography management systems
Christian Wolff, Markus Heckner, Susanne Mühlbacher
Journal of Digital Information, Vol 9, No 27 (2008)

Some outcomes:

a category model for tags in a scientific bibliography management scenario. This model covers linguistic features, the relation between tags and the text of the tagged resources, as well as functional and semantic aspects of social tags.
Here is an image of the model that I copied from the paper:

This is actually a really cool model for tags. So far I've been using three categories from the MovieLens and Golder (2006)/Huberman (2005) studies: factual, subjective and personal. I've noticed, though, that I've added many sub-categories for the factual ones.

Like in this model, I've discovered very similar types in tags. Especially the "Functional Category Model" is interesting: it has 2 sub-classes:
  • subject related (e.g. resource related and content related) and
  • non-subject related, personal tags (e.g. affective, time and task related, tag avoidance=no tags).

Other things:
The "typical tag" is a single-word noun, taken from the title of the respective article (identical or variation), thus directly related to the respective subject.
Yep, we have many of these too! When I talk about these I refer to the non-obviousness metric from Farooq et al. (2007).

In contrast to previous studies the number of non-subject related tags remains rather low in the scientific data we observed and the full potential of tagging systems to describe qualities or aspects of resources does not seem to be used. But the absence of tags like cool, interesting, to_read does not mean that users who tagged the resource do not think it is cool, of interest or worthy of reading, but simply that the users did not express their ideas they may have or may not have about the resource.

This is interesting too. I think each audience tags differently. Our target audience is teachers, about 35-55 years old. They do not seem to go around tagging learning resources with tags like cool, etc.
Compared to author keywords, social tags tend to introduce less and simpler concepts: Altogether, only one third of the social tags matched with (the far more numerous) authors' keywords. Moreover, tags tend to be more general and users tag their articles more general and with less words than authors.

This is also interesting. There are some studies that have compared the tags and expert indexer keywords and have found even less overlap, if I remember right.

I love this one, it is so much the case:
Additionally, it shows that the respective system environment, e.g. tag suggestions, has a major influence on the tagging behaviour in terms of spelling errors, tag usage and creation of a specific tagging languages. This extends the number of the main influential factors on tagging behaviour being personal tendency and community influence through the additional component system influence.

They also flag comparative studies across tagging platforms as an interesting study area. I've looked a bit at different tagging systems for educational resources. This version is an old one, but I post it anyway:

Vuorikari, R., Poldoja, H. (submitted). Comparing tagging and its purposes across learning resource repositories. pdf

Wednesday, July 09, 2008

Teachers as Net Promoters of digital content

I made a survey with 28 teachers from different European countries on multilingual learning resources. You can find those 28 resources from this list. Our portal has a lot of multilingual resources that come from a variety of Ministries of Education in Europe.

But - we do not know for sure whether teachers find resources useful that come from different countries than they do, and that are in different languages than they speak. Hence my little survey. You can read more details here.

We only considered responses from teachers who came from a different country than the resource in question (we had 18 resources in our survey). Quick round of results:
  • 43% of respondents found resources, which came from a different country than they did, of use for preparation purposes.

  • 41% of respondents found resources, which came from a different country than they did, of use for teaching purposes.

  • 65% of respondents said that they would share these resources, or parts of them, with their colleagues and friends.

  • Even 35% of respondents, who said they did not have expertise in the given subject area, thought that they would share the resource with their colleagues
These were the results on a scale 1-5 (n=254)

It made me think that:

a) If teachers use multilingual or foreign language resources, they most likely use them both for preparatory purposes and for teaching purposes. We do not know, though, whether they would use the resource in their teaching themselves or let pupils interact with this resource.

b) Teachers are good filters. More teachers said that they would be willing to share resources with their colleagues than actually use them themselves. It might be that this happens with a resource which they think is interesting, but which does not match their curriculum goals for the year. They might say, "Hey, my colleague would love this, I'll send it to her!" This is the basic mechanism of viral marketing. How can we leverage this on a learning portal?

c) "Would you like to share it with your colleagues" is one of the key questions when studying customer satisfaction and loyalty, a topic that we in learning repositories often neglect. If teachers are happy users, or if teachers find good material on the portal, they can become promoters of those resources. This might be very important especially when we deal with resources that are in multiple languages, because sometimes it is hard to discover those resources.

If we take the teachers in the survey, we could calculate the Net Promoter Score by subtracting the % Detractors (e.g. the ones in my survey who rated this 1 or 2 on the scale 1-5) from the % Promoters (e.g. the ones in my survey who rated this 4-5).

Take the case for sharing: it would be 65% - 22% = 43%. That is a pretty good net promoter score; most companies have it around 5 to 10%, and it is very unusual to have it above 50%.
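The subtraction above is simple enough to script. Below is a minimal sketch assuming the 1-5 scale of my survey (4-5 counted as promoters, 1-2 as detractors) rather than the 0-10 scale usually used for the Net Promoter Score; the function name and the list of 100 ratings are invented so that the shares match the 65% and 22% above.

```python
def net_promoter_score(ratings):
    """Net Promoter Score in percentage points for ratings on a 1-5 scale.

    Promoters are ratings of 4 or 5, detractors are ratings of 1 or 2;
    a rating of 3 counts as neither.
    """
    if not ratings:
        raise ValueError("no ratings given")
    promoters = sum(1 for r in ratings if r >= 4)
    detractors = sum(1 for r in ratings if r <= 2)
    return 100 * (promoters - detractors) / len(ratings)

# Invented ratings: 65 promoters, 13 neutral, 22 detractors.
ratings = [5] * 65 + [3] * 13 + [1] * 22
score = net_promoter_score(ratings)
```

With these invented ratings `score` comes out at 43.0, the same 43% as in the sharing case.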

This can indicate that teachers are willing to put their credibility on the line by recommending a resource that comes from a different country than they do to a friend!

Now, I just have to think of the best way to do this ;)

A draft idea for a paper: A case study on teachers' use of social tagging tools to create collections of resources - and how to consolidate them?

UPDATE: the submitted paper, comments welcome!

This paper explores how a group of pilot teachers (16) create collections of digital learning resources using tagging tools. We study two different tools: an educational portal (MELT) and delicious. We first look at the characteristics of these collections (number of resources, languages of resources, number of tags used, etc.), and then propose a way to display the resources and tags from delicious on the learning portal (MELT) using Attention Profiling Markup Language (APML). This allows a higher level of integration between a learning portal and an external social tagging service like delicious, and thus helps a wider variety of digital learning resources to be discovered.


We selected 16 pilot teachers to be subjects of this study from the MELT project. These teachers have both an account on the MELT portal and on the delicious bookmarking service. These teachers are primary and secondary teachers in science, language learning and ICTs in Finland, Estonia, Hungary and Belgium. 7 of them are females and 10 males. One participant is under 30 years old, 8 are under 40 years, 5 under 50 years, 3 under 60 years old.

They have been part of the MELT project since Summer 2007, when they were first introduced to delicious during a summer school. In March 2008 they were also invited to create a profile on the MELT portal, where they were able to access multilingual learning resources for different topical areas.

From the MELT portal we know the detailed profiles of these teachers: their names, topics they teach, country where they teach and languages they speak. Moreover, we have information regarding the learning resources that they have bookmarked using the portal. This includes the information about the resource itself and the tags applied. We additionally have asked for their delicious username to be part of this small study.

From delicious, using the html service, we were able to download the last 100 bookmarks and tags that these teachers had posted on delicious. We also took all the data regarding the tags and people these users had in their network. Lastly, we recorded the number of posts each teacher had on their account.

We collected the following data for our selected 16 users:

Additionally, the delicious data contained the following information regarding the networks. Two people had chosen to keep their networks private:
  • Number of distinct people in the networks: 104
  • Number of people in the networks: 270

References

API and other not so successful trials

I am getting somewhat disappointed in some of these web 2.0 "things". Take, for example, the delicious API.

I wanted to download the posts by a number of ppl in my network to study what the hell are they doing. The API allows you to download all your posts in a neat xml format. That's cool, I thought, let me just do this to 20 of my buddies, and I can study better how teachers are bookmarking - especially how are they bookmarking websites that are not from their own countries or in their own languages (e.g. cross-border use).

The delicious API only allows you to get the 30 latest posts from people whose password you do not know. wtf? The same if you try to get them through RSS feeds, you only get 30. Then, there is the html code that you can use, but it also allows you to get only 100 posts. What about the rest, those 999 posts that I want? That stuff is so badly documented on the site that it's very annoying. Why not just be frank about it and say this is how things are?

I do not understand why they limit the API, RSS or html code when all that stuff is freely viewable anyway. So I tried using wget to suck that stuff out, but there is also something fishy and I can never get past 100 posts. So, I guess that just makes me limit my study to a sample of 100 posts per user. Easy.

The other thing that I've been slightly disappointed with lately is APML and a number of tools that they make available for you to track your online profile, like engagd or tagurself.

You know what, the idea is great, but those tools/widgets suck, and they are so badly documented that it makes you just wanna cry. I've tried like 3 times in engagd to make my APML profile of 2 different feeds, and it never works. The tagurself cannot even load the example from the url that they have themselves posted as an example. wtf?

Moreover, the Yahoo! pipes are also somewhat strange, they never actually seem to post what they should. I put this example in one of my last posts and it hardly ever loads. Not so fun.

Hmm...I guess if more people used all these 2.0 tools, and not only talked about their potentially revolutionary usage by non savvy web-users, we could face the fact that the user-created web is far from being so revolutionary and does not empower users like me. Instead, I'd like to see those folks walk that talk, sit down on their asses and finally get past the BETA versions of their tools to actually make them work properly. Dude, cannot wait to get rid of all the BETA versions on the web.

Tuesday, July 08, 2008