Friday, September 19, 2008

Encourage lurking in conferences!

I'm always amazed how old-school these conferences are. I'm currently at Ec-Tel 08, sitting on the floor outside the session room and listening to the talk through the half-open door. I'm a lurker.
lurking in ectel08
I do not want to participate in the whole session, but I do want to lurk around because I'm half interested and there is one speaker that I want to see later. I just don't want to sit in the room and tap on my laptop.

It would be great to have a couch or a few seats outside a session room with a screen, and hopefully also audio, to be able to lurk properly. We already know from online communities that it is totally OK to lurk, so why not make it properly available at conferences, too?

I've had really interesting conversations in these marginal places of conferences. Let's just make them happen more often! Let's provide places to officially lurk around.

Friday, September 12, 2008

Cross-boundary ranking of learning resources

Based on the idea of Interest Indicators, like social bookmarks and ratings, I've looked at the data so that we can make cross-boundary resources more readily available on the MELT portal.
The aim is that, based on previous users' behaviour, we can:

a) make separate "travel well" lists of resources that have the potential to cross borders better,

b) use this information to rank resources better in the normal search result list,

c) allow users to search for resources that have a good "travel well" value (e.g. give me resources in math that can cross borders)

This is the data that I'm using (table below) and this is how I've defined cross-boundary (e.g. cross-country and language) learning resources before. Using that definition I have manually verified the number of cross-country resources. In the dataset, about 82% of resources were cross-country.

Now, we have a problem, though. On our MELT portal we do not have information about the country where the resource originates from. Dah!

This is a big blunder (in my opinion) in our Application Profile: we have not defined the country where the resource originates. We do define the provider, and the country could be inferred from the provider, but that does not always work.

For example, one of our providers frequently has metadata about resources that do not originate from its own country!

I've experimented with the data using the information that we have on the portal, which is LOM about the resource including the language of the resource. As we also know the mother tongue of the registered users, this gives us a kick.

In the table below we can see the coverage of cross-boundary actions that we can get on resources without using any manual labour or verification of the country or language. As a base-line, with manual verification I found that 82% of the actions concerned cross-border rating or bookmarking of a resource.

The first row represents the cross-language resources (i.e. the user's mother tongue is different from the resource language). Using this information alone, we get about 65% of resources right, as opposed to 82% with manual checking. I think that's pretty good, I'd settle for that! (Although I have to look at what kind of material was left out!)
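The cross-language check itself is simple enough to sketch. This is only an illustration, assuming each action (bookmark or rating) is stored with the user's mother tongue and the resource language as language codes; the field names and the sample data are made up and do not come from the MELT portal.

```python
from typing import NamedTuple


class Action(NamedTuple):
    """One bookmark or rating; field names are hypothetical."""
    user_mother_tongue: str   # e.g. "fi"
    resource_language: str    # e.g. "en"


def is_cross_language(action: Action) -> bool:
    """True when the user's mother tongue differs from the resource language."""
    return action.user_mother_tongue.lower() != action.resource_language.lower()


def cross_language_share(actions: list[Action]) -> float:
    """Fraction of actions flagged as cross-language (0.0 for an empty list)."""
    if not actions:
        return 0.0
    return sum(is_cross_language(a) for a in actions) / len(actions)


# Made-up sample data, only to show the calculation.
sample = [
    Action("fi", "en"),
    Action("hu", "en"),
    Action("nl", "nl"),
    Action("et", "et"),
]
share = cross_language_share(sample)
print(f"{share:.0%} of sample actions are cross-language")
```

Comparing this share with the manually verified one gives the kind of coverage figure discussed above.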

The two other comparisons in the table are based only on information about users' previous behaviour. These would be:
  • rating > 2
  • bookmark
Using only the information about bookmarked and rated resources results in a lousy coverage of around 20%. The problem is that only 25% of the bookmarked or rated resources have more than one bookmark or rating, so the data is still very sparse.

Anyway, I want to use that information to "cross-boundary rank" the resources. As we do not know the country where the resource comes from, my work-around is based on countries where these users come from.

Here is a visualisation about resources that have been bookmarked or rated by users (see also ManyEyes link below). We can see the orange node in the middle, a learning resource called "Five Days in New York..". We see 3 edges leading out to Finland, Belgium and Hungary. This means at least one user from each of these countries has bookmarked the resource!

So, even if we do not know the origin of the resource, we know that it has users from 3 different countries. I can infer that it is a cross-boundary resource.

As most likely one of these 3 users comes from the same country as the resource, I subtract one from the total number of countries: (number of countries - 1)

My cross-boundary rank will be the following:
  • Count the number of ratings greater than 2 and/or bookmarks for a resource (actions). Give each action one point
  • Count the number of these users and give each user one point
  • Count the number of user countries of origin. Give each country one point and then subtract 1
  • Compare the mother tongue of each of these users to the language of the resource. If they differ, give one point per mismatch.
Then, calculate the following:
(number of users + number of actions + number of cross-language) x (number of countries - 1)
Let's take the above resource "Five Days in New York.." as an example
  • Count the number of ratings greater than 2 (3) and/or bookmarks for a resource (5). Give each action one point. (8)
  • Count the number of these users and give each user one point (5)
  • Count the number of user countries of origin (Hungary, Finland, Belgium). Give each country one point and then minus 1 ( 3-1=2)
  • Count the number of user mother tongue (hu, nl, fi). Compare the mother tongue of each of these users to the language of resource (en). If they differ, give one point/mismatch (3).

  • (number of users (5) + number of actions (8) + number of cross-language (3)) x (number of countries - 1) (2) = 32 Travel well value
This way you can calculate a "travel well" value for each resource that users have previously interacted with on the portal. The value will always be an integer, which is important from the technical implementation point of view (in the Lucene index it apparently needs to be an integer).
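Here is a rough sketch of that calculation in Python. It rests on two assumptions read off the worked example rather than stated outright: the sum of users, actions and cross-language points is multiplied as a whole by (countries - 1), which is what gives 32, and cross-language mismatches are counted once per distinct mother tongue (hu, nl, fi). The record layout and the way the 8 actions are split over the 5 users are made up for illustration.

```python
from typing import NamedTuple


class Action(NamedTuple):
    """A bookmark or rating by one user; field names are hypothetical."""
    user_id: str
    country: str          # user's country of origin
    mother_tongue: str    # user's mother tongue
    kind: str             # "bookmark" or "rating"
    rating: int = 0       # only meaningful when kind == "rating"


def travel_well_value(actions: list[Action], resource_language: str) -> int:
    """(users + actions + cross-language points) x (countries - 1)."""
    # Only bookmarks and ratings greater than 2 count as actions.
    counted = [a for a in actions
               if a.kind == "bookmark" or (a.kind == "rating" and a.rating > 2)]
    if not counted:
        return 0
    n_actions = len(counted)
    n_users = len({a.user_id for a in counted})
    n_countries = len({a.country for a in counted})
    # Assumption from the worked example: one point per distinct
    # mother tongue that differs from the resource language.
    n_cross_language = len({a.mother_tongue for a in counted
                            if a.mother_tongue != resource_language})
    return (n_users + n_actions + n_cross_language) * (n_countries - 1)


# Made-up actions matching the totals of the "Five Days in New York.." example:
# 5 users, 5 bookmarks, 3 ratings greater than 2, 3 countries, resource in English.
actions = [
    Action("u1", "Hungary", "hu", "bookmark"),
    Action("u1", "Hungary", "hu", "rating", 4),
    Action("u2", "Hungary", "hu", "bookmark"),
    Action("u2", "Hungary", "hu", "rating", 2),   # ignored: not greater than 2
    Action("u3", "Finland", "fi", "bookmark"),
    Action("u3", "Finland", "fi", "rating", 3),
    Action("u4", "Belgium", "nl", "bookmark"),
    Action("u4", "Belgium", "nl", "rating", 5),
    Action("u5", "Belgium", "nl", "bookmark"),
]
value = travel_well_value(actions, "en")
print(value)
```

Note that under this formula a resource whose users all come from a single country gets a value of 0, whatever the number of actions.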

The downside is that we'll have a huge cold-start problem. As I said, our data is very sparse. To seed the system, I actually still manually check the new resources that users have interacted with and make a fake bookmark on them so that it looks like each has at least two users from 2 different countries. This way the resource gets a "travel well" value and appears on the "travel well" list, is better ranked, etc.

Of course, at the end I will evaluate how this treatment affects users. Do we, for example, see a large number of bookmarks on the resources for which I have been able to calculate a travel well value?

You can see a visualisation here. This is based on users' country of origin.

Wednesday, September 03, 2008

Cross-boundary use of learning resources in LeMill

LeMill is a web community for finding, authoring and sharing learning resources. It is divided into four sections: Content, Methods, Tools and Community. The main target audience is primary and secondary school teachers, but anyone can join. It is a wiki-like system where all the learning resources are published under an open licence and can be edited by other members.

Registered users can publish learning content, and descriptions of educational methods and tools. Users can also create their own Collections of learning resources. About 10% of LeMill users have created Collections (total users as of March 15 2008); this represents 188 users (data snapshot May 30 2008).

Collections are good for "Keeping found things found": the nice resources that you find in LeMill can easily be put in a Collection. Another important thing is that Collections can be used to make content units or thematic lessons. Collections are really just folders that you give a name, and you can make as many as you want.

For me the Collections tool is interesting. It's all about which content users find interesting enough to want to keep.

In Table 1 you can find the description of the data that I use for this study. I'll quickly give some descriptive statistics about it, and then drill into the Cross-Boundary use of learning resources. By this I mean that the user comes from a different country than the resource (cross-country) or the resource is in a different language than the user's mother tongue (cross-language). If the resource comes from a different country and is in a different language, then it's a cross-border resource. I'll give examples of this later to make it clearer.

What's in Collections?

Users had saved 1645 resources in 376 Collections. There are 4.4 resources on average in each Collection. Some Collections are huge: the biggest has 82 resources in it, whereas the median is 2 resources.

When we look at how the users have used this feature, we find a wide variety of cases. Just by looking at numbers, the most active user had 94 resources in Collections, whereas the average is 8.75 resources. The median is 3, so as usual, we have a group of very active users (30% are above average) and lots of not-so-active users of this functionality.

When we look at the resources in Collections, we find that there are 1387 different resources. 13% of the resources exist in more than one Collection, but most of the resources (1205) are put in only one Collection. Not much of an overlap there, which is a bit surprising given that other users' Collections are public, so I can easily go and see the lessons created by others. There are nice pivotal navigational features that allow me to click on another user's name and see their Collections. To my dismay, though, I noticed that it is not very easy to add resources from other users' Collections to mine, which might hinder reuse a bit.
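For the record, this is roughly how such an overlap figure can be counted. It is a small sketch, assuming the Collections are available as a mapping from Collection name to a list of resource identifiers; the names and data below are invented.

```python
from collections import Counter


def overlap_stats(collections: dict[str, list[str]]) -> tuple[int, int, float]:
    """Return (distinct resources, resources in more than one Collection, share)."""
    # Count each resource once per Collection, even if it was added twice there.
    counts = Counter(r for items in collections.values() for r in set(items))
    distinct = len(counts)
    shared = sum(1 for n in counts.values() if n > 1)
    share = shared / distinct if distinct else 0.0
    return distinct, shared, share


# Made-up Collections, only to show the calculation.
sample = {
    "biology lessons": ["heart", "dna", "chromosomes"],
    "my favourites": ["dna", "geogebra"],
    "maths": ["geogebra", "fractions"],
}
distinct, shared, share = overlap_stats(sample)
print(distinct, shared, f"{share:.0%}")
```

With the numbers above, 1387 different resources of which 1205 sit in only one Collection, the same ratio is 182/1387, which is about 13%.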

Who uses Collections?

The cool thing about LeMill is that their user base is a total fruit bowl. Users have registered from all over; in this dataset we have 22 countries. Estonians top the list with 45% of all resources in Collections. Others are Lithuania (14%), n/a (14%), Hungary (9%), and so on.

Where do resources come from in Collections?

It seems like most Estonians put resources made by another Estonian in their collections (48.5%). Then we have n/a "country" (13.2%), resources from Lithuania (11.4%), from Finland (7.5%), Hungary (6.6%), resources from Georgia (6.2%) and so on.

What about resources originating from different countries than the user? The case of crossing boundaries.

About 40% of Collection users are Cross-Boundary users (74). You can see this in the lower part of the 2nd Table above. I was able to calculate this by looking into the resources that they have put in their Collections. I found out that every fourth (419) resource in Collections crosses some boundary, either language or country. Let's look at this in more detail (Table here on the left).

1st Case: Cross-border resources. The resource comes from a different country than the user and is also in a different language than the user's mother tongue. 28% of the cases were like this.

Example: A German resource from a German teacher that a Finnish teacher has put in his Collection.

2nd Case is about crossing language borders. Most cases (76%) are resources in a different language than the user's mother tongue. Many of these resources are in English (63%) even if they were not created by a native English speaker.

Example: Let's say an Estonian teacher has made a learning resource in English and put it in her own Collection.

3rd Case: Cross-country resources. These are resources that are in the user's mother tongue, but come from a different country. There are far fewer of those, less than 5%.

Example: A resource in English made by an American and put in a Collection by a Canadian. Or it could be a German resource added into a Collection by an Austrian teacher.
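The three cases can be written down as a small function. This is a sketch that treats the cases as mutually exclusive labels, which is my simplifying assumption; the country and language codes are just illustrative values for the examples above.

```python
def classify(user_country: str, user_mother_tongue: str,
             resource_country: str, resource_language: str) -> str:
    """Label one (user, resource) pair with one of the three cases, or "none"."""
    other_country = user_country != resource_country
    other_language = user_mother_tongue != resource_language
    if other_country and other_language:
        return "cross-border"
    if other_language:
        return "cross-language"
    if other_country:
        return "cross-country"
    return "none"


# The examples from the text, with hypothetical country and language codes.
print(classify("FI", "fi", "DE", "de"))  # Finnish teacher, German resource
print(classify("EE", "et", "EE", "en"))  # Estonian teacher, own resource in English
print(classify("CA", "en", "US", "en"))  # Canadian user, American resource
print(classify("AT", "de", "DE", "de"))  # Austrian teacher, German resource
```

A resource only gets a label relative to a particular user, so the same resource can be cross-border for one teacher and not cross-boundary at all for another.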

So are there any commonalities in this?

In LeMill, it looks like most cross-boundary actions (i.e. resources put in Collections) are within users' language skills (lower part of the Table on the left). You'd be surprised to learn the language skills these teachers have! Most of them have one additional language on top of their mother tongue, but many of them boast 3 or 4. As a result, most of the cross-boundary resources in Collections (352 of 419) are within the foreign language skills of these users. That's 84%.
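The check behind that figure is a simple membership test. Below is a sketch, assuming each user profile lists a mother tongue and a set of self-reported foreign languages; the profile structure and the sample teacher are hypothetical, only the 352 and 419 come from the data above.

```python
from typing import NamedTuple


class User(NamedTuple):
    """Hypothetical user profile with self-reported language skills."""
    mother_tongue: str
    foreign_languages: frozenset[str]


def within_language_skills(user: User, resource_language: str) -> bool:
    """True if the resource language is the mother tongue or a listed foreign language."""
    return (resource_language == user.mother_tongue
            or resource_language in user.foreign_languages)


# Made-up (user, resource language) pairs for cross-boundary actions.
teacher = User("et", frozenset({"en", "ru"}))
actions = [(teacher, "en"), (teacher, "ru"), (teacher, "de"), (teacher, "en")]
covered = sum(within_language_skills(u, lang) for u, lang in actions)
print(f"{covered}/{len(actions)} within language skills")

# The share reported above: 352 of 419 cross-boundary resources.
print(f"{352 / 419:.0%}")
```

The weak spot is the self-reporting: a user who forgot to list a language will show up as acting outside their language skills.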

Naturally, one is curious to know what the remaining 16% of resources are.
Actually, a bit disappointingly, there was little surprise (and very little evidence for my last post on "Vygotsky's psychological tools"). Many of these resources were in English; most likely the user had forgotten to mention that s/he masters this lingua franca. The rest either had no text or were multimedia, and some resources I was not able to locate in LeMill anymore. I put some of them in my Travel Well Collection.

Tuesday, September 02, 2008

Emerging search patterns on learning resources

I am hugely inspired by the stuff from J. Feinberg and D. Millen, especially by the studies that they've done on Dogear, the IBM internal social bookmarking service. I must admit, though, that I had missed out on it a bit, I cannot believe it! Anyway..

This paper, Social bookmarking and exploratory search (2007), is really interesting, not least because its study design is very close to the one I am planning for my log file and search pattern analysis on the MELT portal. (Great minds think alike ;p yeah, right..)

The study design is a field study of a social bookmarking service in a large corporation (IBM), with quantitative data (click-level analyses of log files and bookmarking data) and qualitative data such as interviews.

And, this is the data that I collect for my study (Table 1). Quite similar!

They use a categorisation of search that I will adapt to my usage:
  • Community browsing (Examining bookmarks created by the community. In my case these could be lists of most bookmarked items, travel well; tags; other people's Favourites)
  • Personal search (Looking for bookmarks from one's own personal collection of bookmarks)
  • Explicit search (Explicit search using traditional search box)
A few days ago I looked at the first logs from MELT. Here is the run-down. I have not used the same types of searches, but I will explore them for later use. But basically what you can see:

There were 512 search events:
  • 41% Explicit searches (adv. search)
  • 34% Community browsing (tagcloud)
  • 25% browsing categories
What was called "click-through" in this paper is when the particular navigation path resulted in a page view. This is what I call view resources. We can see that there were 538 page views, which implies that most likely users have clicked on more than one resource as a result of a search.
  • 74% of resources were viewed in search result list (srl), this means that they were results from advanced search and browsing categories

  • 20% of resources were viewed as a result of community browsing (e.g. tagcloud, lists)

  • 5% of resources were viewed as a result of personal search (e.g. in Favourites)

How many searches resulted in viewing resources? I have to verify this
  • 85% of explicit searches and browsing categories
  • 65% of Community browsing (tag cloud and lists)
Millen et al. 2007 speculate that in their study a higher click-through rate indicates more purposeful searching, whereas community browsing was used more as an exploratory search activity. I will need to keep an eye on this and see whether I can draw similar conclusions.
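Counting click-through per search type from the logs could look something like this. It is a minimal sketch, assuming each search event has already been reduced to a pair of search type and whether it led to a resource view; the event format and the sample events are made up and are not the actual MELT log format.

```python
from collections import defaultdict


def click_through_rates(events: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of search events per search type that led to at least one resource view."""
    totals: dict[str, int] = defaultdict(int)
    clicked: dict[str, int] = defaultdict(int)
    for search_type, viewed in events:
        totals[search_type] += 1
        if viewed:
            clicked[search_type] += 1
    return {t: clicked[t] / totals[t] for t in totals}


# Made-up events: (search type, did the search lead to a resource view?)
events = [
    ("explicit", True),
    ("explicit", True),
    ("explicit", False),
    ("explicit", True),
    ("community", True),
    ("community", False),
]
rates = click_through_rates(events)
for search_type, rate in rates.items():
    print(f"{search_type}: {rate:.0%}")
```

The same loop works for bookmarks or ratings instead of views, by changing what the boolean stands for.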

Most likely, anyway, I will not use click-through as an indication of intent to use a learning resource. What I find most interesting in my study is how many of the search activities result in a bookmark and/or rating. That is much cooler in my mind than viewing the page. Actually, I've noticed that in our system users view resources a lot, but they do not necessarily show any further interest in them.

That is why I use both implicit and explicit Interest Indicators. By Interest Indicators I mean Explicit Interest Indicators like ratings as a subjective relevance judgment on a given learning content, Marking Interest Indicators like bookmarks and tags on educational content, and Navigational Interest Indicators such as time spent on evaluating the metadata of educational resources, as well as Repetition Interest Indicators as categorised by Claypool, et al., 2001.

In this small study we can see that 19% of viewed resources ended up in users' Favourites. Again, ratings were much rarer: only about 6% of viewed resources ended up being rated. In the study of the IBM system, click-through was as high as 34-39%. I will be keeping my eye on that.

My advisor asked me whether I could find search patterns in my logs. I think I could. In this paper (Millen et al. 2007) they do that :) and here is how: "Looking for Patterns using Cluster Analysis"
To better see the patterns of use, we performed a cluster analysis (K-means) for the different types of search activities. We first normalized the use data for each end-user by computing the percentage of each search type (i.e., community, person, topic, personal, and explicit search). The K-means cluster analysis is then performed, which finds related groups of users with similar distributions of the five search activities. The resulting cluster solution, shown in Table 4, shows the average percentage of item type for each of four easily interpretable clusters.
Should not be too hard :)
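To get a feel for it, here is a small sketch of the two steps from the quote: normalise each user's search counts into shares, then run K-means on them. It is not the procedure from the paper. It is a plain-Python toy with a deterministic start (the first k users serve as initial centroids), it uses only the three search types I listed above instead of their five, and the user counts are invented. For real data a statistics package or a library such as scikit-learn would be the more sensible choice.

```python
import math


def normalize(counts: dict[str, int], search_types: list[str]) -> list[float]:
    """Turn one user's raw counts into a share per search type."""
    total = sum(counts.get(t, 0) for t in search_types)
    if total == 0:
        return [0.0] * len(search_types)
    return [counts.get(t, 0) / total for t in search_types]


def k_means(points: list[list[float]], k: int, iterations: int = 100) -> list[int]:
    """Plain K-means; returns the cluster index assigned to each point."""
    # Deterministic start: the first k points act as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so we are done
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


SEARCH_TYPES = ["community", "personal", "explicit"]

# Made-up search counts per user.
users = [
    {"community": 8, "personal": 1, "explicit": 1},
    {"community": 1, "explicit": 9},
    {"community": 7, "personal": 2, "explicit": 1},
    {"community": 1, "personal": 1, "explicit": 8},
]
points = [normalize(u, SEARCH_TYPES) for u in users]
labels = k_means(points, k=2)
print(labels)
```

In this toy data the two community browsers land in one cluster and the two explicit searchers in the other. Picking the number of clusters, and a less naive initialisation, is where the real work would be.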

Millen, D., Yang, M., Whittaker, S., Feinberg, J.: Social bookmarking and exploratory search (2007). In L. Bannon, I. Wagner, C. Gutwin, R. Harper, and K. Schmidt (eds.), ECSCW'07: Proceedings of the Tenth European Conference on Computer Supported Cooperative Work, 24-28 September 2007, Limerick, Ireland

Activity Theory helping us explain folksonomies

Lately I've been reading Engeström's (1) stuff about Activity Theory. In this paper (2) I found a cool reference that explains how, using Activity Theory as a theoretical framework, we can study learning resource repositories (LOR) and their communities as one single system rather than as a loose set of instruments, subjects, objects and outcomes.

I think that is a very important point. I've been arguing for quite a while that tags and social bookmarking can be revolutionary for LOR because now we can make a connection between the user, the resource and its metadata.

Before, it was only the resource and metadata, and a scary-looking form for searching the resources (this is the time before Google's simple search box, right?). Now, if implemented correctly, social bookmarking and tagging not only help individuals with their resource management (e.g. Favourites), but also help other folks to find resources through other users and their digital traces such as tags, number of bookmarks, etc.

To understand Activity Theory it is important to get the basics: everything, well, everything within human activity, is based on the three dominant aspects, which are production, distribution and exchange (or communication).
The model suggests the possibility of analyzing a multitude of relations within the triangular structure of activity. However, the essential task is always to grasp the systemic whole, not just separate connections. (no page number in my print, just below Figure 2.6).

This is also the basis for the analysis in Margaryan & Littlejohn (2008) of learning resource repositories as an instrument, with rules, division of labour, outcomes, etc.

Moreover, Engeström emphasises that there is no activity without the component of production.
The specificity of human activity is that it yields more than what goes into the immediate reproduction of the subjects of production. One part of this "more" is the surplus product that leads to sharing and sociality, discussed by Leakey & Lewin and Ruben above...(found on the next page)

I was thinking of folksonomies and how the production of tags is first of all good for me. People often tag and bookmark to "keep found things found"; it's part of a personal knowledge management activity. Just as the production of food, referred to above, leads to sharing and sociality, the fact that tags are made available to all leads to sharing and sociality.

I thought that was pretty neat.

Ideas for the design of the "Evidence" study on cross-border use of learning resources

"Psychological tools" helping us explain "Travel well" resources

In chapter 2, in the subsection called The Third Lineage: From Vygotsky to Leont'ev, Engeström (1) talks about how Vygotsky distinguished between two interrelated types of mediating instruments in human activity: tools and signs. The latter belonged to the broader category of "psychological tools".

Psychological tools
..are directed towards the mastery or control of behavioral processes - someone else's or one's own - just as technical means are directed towards the control of processes of nature. (I guess this is from the same source as the quote above this, this is Vygotsky 1978, 55)
Examples of psychological tools:
various systems for counting; mnemonic techniques; algebraic symbol systems; works of art; writing; schemes, diagrams, maps, and technical drawings; all sorts of conventional signs, and so on. (Vygotsky, 1982:137, cited in Cole & Wertsch)
...and folksonomies :)

I was thinking that the concept of psychological tools is a pretty cool way to start looking into the cross-border use of digital learning resources. My definition is: By cross-border use of digital resources we mean that the user comes from a different country than the resource (cross-country) and/or the resource is in a different language than that of the user’s mother tongue (cross-language).

Many people are baffled by cross-border use. They always ask me "..but how would teachers be able to make any use of a resource that is in a language that they don't understand?".

Let me first elaborate on the different types of use of cross-border resources, and then I'll give my theory on it. The plan is to do a study to see whether this holds or not.

We had a workshop with 35 science and language teachers from different European countries. We asked them to bring along a learning resource that they thought would be useful for teachers in other countries. The observations were the following:

1. Resources that contain psychological tools: examples of this type of resources were in science, biology and math. Here are some examples of the characteristics of these resources
  • How chromosomes define characteristics (e.g. eye color, color of rabbit) or how the human heart works (we actually had examples of this in 2 different languages!).
    If you know the concept (as this would be part of the acquired knowledge of a teacher) you can explain it using this type of example. It's not important that the manipulations are not in their own language, as the user interface is pretty symbolic and self-explanatory. Also, the small bits of text in another language did not seem to bother teachers.

  • DNA and how it works, another one on chemistry. The interesting thing is that the text is in Estonian and the resource was intended for pupils, but the group of teachers agreed that they would find it useful as a tool to demonstrate the concept themselves (note: different intended user group). They explained that they would manipulate the resource for demonstration purposes, not let the pupils use it.

  • GeoGebra was one of the examples of a math application. They all loved it! Most importantly, it can be translated easily and it has user communities in different languages.
    Besides those points, they said that as math symbols are commonly shared, it is easy even in other languages. Even this type of applet would be useful for a non-Spanish speaker. BTW, they hated when some resources did not use the proper symbols, but wrote out "tiempo"
2. Resources with more text in a foreign language. One could also observe that some teachers did not mind the foreign-language text too much, but used their pedagogical skills to turn it into a learning challenge.
  • One example from another workshop was a history resource about the Greco-Persian war. Although this was already harder to navigate in Polish, a Belgian teacher started coming up with ideas where learners have to solve the language as a challenge. An example was given of a "match the words with an image"-type exercise about the war equipment of a Greek soldier.

  • Another innovative usage was this Japanese virtual reality game, where users have to find their way out of the virtual room with the help of a team. This Hungarian teacher had given it as an English exercise for his students to solve as a group and write down the instructions in English. He said that students were completing the exercise on Friday evening, working online with their buddies!
3. Resources that are in languages that the user has competencies in. This is of course the most common case. If you have studied Russian, for example, you can use resources in that language.

4. Language teachers. This group is a bit apart, too. They, of course, find the whole Internet to be their resource for learning! But language resources that are created, say, in Finland to study Way finding in French can also be useful to any other language teacher somewhere else. Here the important thing is to also give the instructions in the language that is being taught, so that teachers understand (of course, the best is making interfaces easy enough that no instructions are needed)

So, my theory is that the use of cross-border resources plays on a continuum that has two quite distinct extremes: On the one end we have psychological tools (example 1 and 2) and on the other Foreign language as a tool (example 3 and 4).

The acceptance of, or willingness to use, this kind of material is related on the one hand to the teacher's previous knowledge and understanding of the topic, and on the other to their knowledge of or previous experience in coping with foreign languages. Additionally, the pedagogical skill set and the pedagogical concepts that the teacher prefers drive the final decision on using such material.

A study design

In my study of "Finding evidence" I will only focus on the continuum of psychological tools and foreign language skills. I have a huge dataset from at least 3 or 4 different learning resource environments where users (teachers) have bookmarked or made collections of learning resources that exist in multiple languages.

The dataset currently has 440 users who have selected at least one learning resource to bookmark or add into their collection.
  • Calibrate (176 users, number of posts=1742)
  • LeMill (189 users, number of posts 1645, out of which 238 cross-border actions)
  • (16 users, number of posts 1176).
  • MELT
When I look at the titles of these learning resources, there are 3700 of them. 2992 of these resources have been bookmarked only once. The idea is to sort out cross-border resources (using my definition above), and see whether I can classify them on my continuum.

The good thing is that I know the user languages and country of origin in all the cases; the bad thing is that I do not know the country of origin or language of all the resources :( That seems like lots of resources starring.

1 Engeström, Y.: Learning by expanding: An activity theoretical approach to developmental research. Helsinki: Orienta-Konsultit Oy (1987) Retrieved August 25, 2008, from

2 Margaryan, A., & Littlejohn, A.: Repositories and communities at cross-purposes: Issues in sharing and reuse of digital learning resources. Journal of Computer Assisted Learning (JCAL), 24(4), 333-347 (2008)