[Sm]all things considered by Riina Vuorikari: January 2009

Tuesday, January 13, 2009

Modelling the portal ecology: What goes around comes around

The three main actions on the portal: discover resources, play them and annotate. The two main groups of users: those who are logged in and those who are not.

I've divided the resource discovery process into three slots (Millen et al.):

Explicit search
Community search
Personal search

Play is when the user clicks on the link. We also call this an implicit interest indicator; however, we cannot be sure whether the resource was relevant to the user or not. Worth noting anyway. This is also called "hits" or "click-through" in some lingo.

Annotation is when the user makes an explicit interest marking (indicator) on the resource. This can currently be either a rating (usefulness, on a scale of 1 to 5) or a bookmark with tags. Both of these actions are public.

About users and logs

In general terms we record all kinds of clicks and actions on the portal (see here). I studied the logs from the last 2.5 months. We know that we have 340 users who have a user name; excluding staff, etc., we have 168 "real" users. Out of them 82 had clicked on a resource on the portal at least once, so these users are included in the logs. Additionally, there are users who do not log in, but I currently have no idea how many there are (check Analytics). There were 13 604 actions recorded, 40% from users who logged in and 60% from others who did not.

In general, we can think that the relationship between these 3 actions is important on the portal and can indicate something about its efficiency for users to get what they want, as well as for the system to get what is needed to keep it going. In our case, we are in the process of looking at how Social Information (SI) can help discovery. So, a prerequisite is to have SI available; thus the system needs ratings and bookmarks.

With the contributing users (=logged-in) on the MELT portal:

2 searches result in one play;
2.6 searches result in one annotation, which can be either a rating or a bookmark;
1.3 plays result in one annotation.

For comparison, in Calibrate the figures were the following:

0.5 searches result in one play;
5.7 searches result in one annotation, which can be either a rating or a bookmark;
11.3 plays result in one annotation.

I will study this further too. A quick look suggests that in a system which emphasises Social Information for users' own benefit (Favourites) and for everyone's benefit (Community browsing), as is the case with MELT, the loop for getting annotations is more efficient than in a system which does not make use of such information (e.g. Calibrate). The ratio of searches to annotations is 2.6 to 1 in MELT, whereas in Calibrate it is 5.7 to 1.

However, if we look at the ratio of "hits", the Calibrate system has been four times more efficient: one search led to 2 plays, whereas in MELT it took 2 searches for one play. The MELT search function has been under constant development for speed, which has been somewhat problematic due to the huge amount of content. I will report later on the same ratio after our last optimisation effort.
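The comparison can be reproduced with a few lines of Python. This is only a sketch that works from the ratios quoted in this post; the dictionary layout and the function name `efficiency_factor` are made up for illustration and are not part of the portal's logging scheme.

```python
# Ratios quoted in the post: how many actions it takes to obtain one outcome.
RATIOS = {
    "MELT": {"searches_per_play": 2.0, "searches_per_annotation": 2.6},
    "Calibrate": {"searches_per_play": 0.5, "searches_per_annotation": 5.7},
}


def efficiency_factor(metric, better, worse):
    """How many times fewer actions `better` needs than `worse` for one outcome."""
    return RATIOS[worse][metric] / RATIOS[better][metric]


# Calibrate needs fewer searches per play, MELT fewer searches per annotation.
play_factor = efficiency_factor("searches_per_play", "Calibrate", "MELT")
annotation_factor = efficiency_factor("searches_per_annotation", "MELT", "Calibrate")

print(f"Calibrate is {play_factor:.1f}x more efficient for plays")
print(f"MELT is {annotation_factor:.1f}x more efficient for annotations")
```

So by the same yardstick, MELT needs a bit less than half as many searches as Calibrate to obtain one annotation.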

What goes around comes around

Graph 1 depicts what is going on on the portal. I will explain this in detail later. For each action I have indicated the percentage of the total, e.g. "Explicit search 78%" means that 78% of all Explicit searches were executed by users who were not logged in.

Graph 1

Monday, January 12, 2009

Discovering cross-boundary resources on the portal

This is a continuation of the previous post, using the same data.

The question now is how and where do users discover cross-border resources? By this I mean that the user and the content come from different countries, and/or that the content is in a language other than the user’s mother tongue.

One challenge is discovering resources in general (previous post); another is discovering resources that are in a different language and come from different countries. This is the case on our portal, so I'm interested in how to facilitate the discovery process that involves crossing those boundaries (language and national, which might have implications for the educational content of the resource), and hopefully make it more efficient.

My take is social information: making the cues left by other users readily available to all. I bet that this should facilitate the discovery process and thus also make it more efficient (more resources and faster).

So I looked at the previous data and the cross-boundary bookmarks from users. This is relative to the user, of course, so for every bookmark I check whether the resource and the user come from different countries and/or whether the user's mother tongue differs from the resource language.
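The check itself can be sketched in Python as follows. The field names (`user_country`, `resource_language`, etc.) and the sample records are assumptions for illustration; the actual log and metadata fields may well look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    # Hypothetical fields, not the portal's real schema.
    user_country: str
    user_language: str      # the user's mother tongue
    resource_country: str
    resource_language: str


def is_cross_boundary(b: Bookmark) -> bool:
    """True if the resource crosses a national and/or language boundary for this user."""
    crosses_country = b.user_country != b.resource_country
    crosses_language = b.user_language != b.resource_language
    return crosses_country or crosses_language


bookmarks = [
    Bookmark("DE", "de", "DE", "de"),  # German teacher, German resource
    Bookmark("FI", "fi", "DE", "de"),  # different country and language
    Bookmark("BE", "fr", "BE", "nl"),  # same country, different language
]
cross_boundary_count = sum(is_cross_boundary(b) for b in bookmarks)
print(cross_boundary_count, "of", len(bookmarks), "bookmarks are cross-boundary")
```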

41 out of 48 active users had bookmarked cross-boundary resources. They had added

299 distinct resources into their collections 350 times. Out of these resources
163 were cross-boundary resources, which had been added to their collections 198 times.
This means that 55% of distinct resources obtained during the period of 1.5 months were cross-boundary resources.

So it was interesting to see how many of these resources had Social Information added to them.

Table 1

Table 1 shows that about a third of the bookmarked cross-boundary resources had Social Information on them: they had either been bookmarked by previous users or were on the "travel well" list. This is cool! Although we cannot say that the users discovered these resources only because of Social Information, it is important to know that it has added value for users in discovering them. I also found out that about 10% of bookmarks on resources with SI were on resources previously bookmarked by someone from the same country as the resource. The fact that they were bookmarked, although the share is still almost dismissibly small (10%), is good news for SI and social navigation based on it.

I then looked at where the resources are discovered: 62% of cross-boundary discoveries were made in the Search Result List (SRL), whereas 37% took place in Community searches, most of them in the tag cloud (30%) and 7.5% on the Travel well and Most bookmarked lists (only one case in the latter :( ).

As a comparison, for the resource discoveries that did not cross any borders (e.g. a German teacher found German resources), 90% took place in the SRL. So it seems that for discovering cross-boundary resources Social Information is important, as it allows users to do Community searches to discover these resources. We do see, though, that cross-boundary discovery is also efficient among resources that cannot leverage previous user experiences, as 63% of cross-boundary resources do not have any SI.

Interestingly, when we look at Table 1, we see that for resource discovery that does not imply any cross-border action, users do not seem to care much for SI. Actually, more than 90% of these bookmarked resources had no SI. This is cool, as it seems that we need users to discover and annotate resources within their comfort zone (e.g. national and regional educational material in their own language) in order to make it more readily available to others.

Another thing that I've looked at are these measures for

usage coverage within the repository,
how many resources are shared among collections (e.g. Favourites) and
what I call the pick-up rate, that is, how many of the distinct used resources are reused. This could happen when someone discovers a resource that has SI related to it (e.g. in the travel well list or from other users' favourites, or just picks it up from the SRL based on someone else's annotations). These were discussed in more detail in the last paper.
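The first two measures can be sketched roughly as below. This is a simplified operationalisation under stated assumptions: coverage is taken as distinct bookmarked resources divided by repository size, and sharing as the fraction of used resources that appear in more than one user's collection. The exact definitions in the paper may differ, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def coverage_and_sharing(bookmarks, repository_size):
    """bookmarks: iterable of (user_id, resource_id) pairs from Favourites."""
    users_per_resource = defaultdict(set)
    for user_id, resource_id in bookmarks:
        users_per_resource[resource_id].add(user_id)

    used = len(users_per_resource)  # distinct resources in any collection
    shared = sum(1 for users in users_per_resource.values() if len(users) > 1)

    coverage = used / repository_size
    sharing = shared / used if used else 0.0
    return coverage, sharing


# Hypothetical sample: r1 sits in two users' collections, r2 and r3 in one each.
sample = [("u1", "r1"), ("u2", "r1"), ("u1", "r2"), ("u3", "r3"), ("u3", "r3")]
coverage, sharing = coverage_and_sharing(sample, repository_size=100)
print(f"coverage {coverage:.1%}, sharing {sharing:.1%}")
```

Using a set of users per resource means that the same user bookmarking a resource twice does not count as sharing.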

Table 2

Table 2 shows these measures in LeMill, Calibrate and delicious. Additionally, the gray column indicates this 1.5-month trial in MELT, and the last column has all the data from MELT, which includes the pilot teachers, staff, partners, etc.

We can see that as the initial amount of resources is so big in MELT, we still cover only a minimal share of resources, from 1 to 2%. This does not even include the assets, which would more than double the amount. "Used" here means that the resource has been added to Favourites once ("reused" means more than once).

We can see, though, that even if the resource coverage is not that high, there is still quite a lot of sharing among used resources. This figure remains low in the 1.5-month trial (about 15%, the same as in Calibrate, which did not make the SI available!), but if we look at all the usage so far, the sharing is at 43%. This is somewhat artificial, though, as there is a lot of staff use, but I still hope it indicates that making SI available helps sharing resources in the long run (or, I need to look for better solutions on the portal for sharing, which is also planned but super delayed because of all the other dev programme).

We can already see that the pick-up rate is higher in the MELT trial than in the 3 other platforms that I've looked at previously. This is an indication that SI works, I hope.

Btw, I could not find any correlation between the act of putting resources in Favourites and whether they have Social Information related to them (I got some lousy 0.18 even after removing 2 outliers who outperformed everyone else). Also, in some previous tests I had a hard time finding significant changes, so I might need to settle for an increased reuse rate, which has previously been shown to be about 20% across collections. I found that it was about half of this (or the same as general reuse) in my previous paper.

Saturday, January 10, 2009

Where does a CC-licence take you?

..or better, your photos? Some time ago one of my pics was "selected" for a travel guide from Prague. That was pretty cool and I really appreciated it. Tonight, I was checking my Flickr stats and noticed that one photo had been viewed many times recently. So I looked what's up with that.

I followed the referrals. Here are the top 3 sources, some other hits were through search engines for generic keywords like ski, etc.

1. It was the main picture on the Facebook group for Sankt-Anton fans.

2. It was on a Yahoo! travel website for best skiing in America. They had a Flickr badge with some random ski shots, et voila moi! (see the small image on the right corner)

3. The funniest of all, though, is that it is on some German photo website where the dude discussed how the picture should have been framed differently! Hmm..I think my boyfriend, who took the picture, did not appreciate this lesson..

It's just kinda weird to find yourself in odd places on the web, but hey, isn't that what the Creative Commons license is supposed to allow? So this is just one consequence of it, I guess.

Talking about odd places to be on the Web, the other night my Google alerts had picked up that my blog post appeared on a porno site. Sure enough, there after lots of photos of youknowwhat, were feeds from my latest blog post. Pretty hilarious... I don't recommend clicking on the link, but you can read the paper though!

Wednesday, January 07, 2009

Out Now: Special Issue on Social Information Retrieval for Technology Enhanced Learning

I am glad to announce the Special Issue on Social Information Retrieval for Technology Enhanced Learning (SIRTEL) which just came out today in Journal of Digital Information (JoDI) Vol 10, No 2 (2009)!

I co-edited it with Erik Duval and Nikos Manouselis. The following stuff's in it, enjoy!

Special Issue on Social Information Retrieval for Technology Enhanced Learning	HTML
Erik Duval, Riina Vuorikari, Nikos Manouselis

Articles

Identifying the Goal, User model and Conditions of Recommender Systems for Formal and Informal Learning	Abstract PDF
Hendrik Drachsler, Hans G. K. Hummel, Rob Koper

The Pedagogical Value of Papers: a Collaborative-Filtering based Paper Recommender	Abstract PDF
Tiffany Y Tang, Gordon McCalla

Lost in social space: Information retrieval issues in Web 1.5	Abstract HTML
Jon Dron, Terry Anderson

Exploratory Analysis of the Main Characteristics of Tags and Tagging of Educational Resources in a Multi-lingual Context	Abstract HTML
Riina Vuorikari, Xavier Ochoa

Visualising Social Bookmarks	Abstract PDF
Joris Klerkx, Erik Duval

Special thanks to the people who participated in the PC:

Alexander Felfernig, University of Klagenfurt, Austria
Brandon Muramatsu, Utah State University, USA
David Massar, European Schoolnet, Belgium
Hendrik Drachsler, Open University of the Netherlands, The Netherlands
Jon Dron, Athabasca University, Canada
Marc Spaniol, Max-Planck-Institute for Informatics, Germany
Martin Wolpers, Fraunhofer, Germany
Miguel-Angel Sicilia, University of Alcala, Spain
Nikos Manouselis, Greek Research & Technology Network, Greece
Rick D. Hangartner, MyStrands, USA
Salvador Sanchez, University of Alcala, Spain
Xavier Ochoa, Escuela Superior Politécnica del Litoral, Ecuador
Yiwei Cao, RWTH Aachen University, Germany

Tuesday, January 06, 2009

Users on the portal: consuming and contributing

I continued the previous study with more data: this time I took all the logs from Oct 1 to December 18 2008. The idea, again, was to see how much the previous annotations (ratings and bookmarks) would "guide" the choice of new users.

The datasets:

1. All bookmarks and ratings in MELT until Oct 1 2008. This comprises 565 distinct resources. We call these resources with Social Information (SI), as it is something that is shared and made public by users.

88 of these resources were on the list called "Travel well" resources. They are made available directly from the portal front page. They were recorded to the system by a user called "EUNRecommender" suggesting that these resources should be of interest and good quality. Additionally, annotations from other users were made available.
477 of these resources were annotated (ratings and bookmarks) by previous users. These were pilot teachers, project partners or staff in the office. The top bookmarked and rated ones would appear on the "Most bookmarked" list, also accessible from the front page. Similarly, annotations from other users were made available.

2. I took server-side logs from the period of Oct 1 to Dec 18 2008. At the end of this period the portal had 340 registered users, some of whom never used the portal when logged in and a number of whom were project-related staff. We excluded those people from the logs and were left with 168 users, out of which 82 had clicked on a resource on the portal at least once. Additionally, users who do not log in are recorded, so our click-through includes many more users. This group had:

clicked ("played") on 1711 distinct resources (all users);
out of which 974 were clicked ("played") by logged-in users (82);
bookmarked 294 distinct resources 351 times;
rated 323 distinct resources 385 times;
= 394 distinct resources annotated.

The method for this study is a manual log-file analysis using my own logging scheme (see here).

Questions: How and where on the portal do users discover resources? Do they rather discover new resources or re-discover ones that have been annotated by others? Are the suggestion lists like "Travel well" resources or "Most bookmarked" effective, i.e. are many resources discovered through them?

We split our understanding of "discover resources" into two different meanings and look at them separately. Discover meaning:

finding the resource on the portal, clicking on it (i.e. play). This is also called implicit interest indicator;
finding a resource and creating an explicit marker on the resource in terms of rating or bookmarking with tags (explicit interest indicator).

The first one is important for knowing how large a share of all resources in the repository has been "touched upon" or "hit" (use). The second one is important as it creates an explicit marker which can be used for cues and social navigation purposes. Moreover, the second one can be used as a proxy for reuse. For reuse we use the following definition: "resource is integrated in a new context with other components, and when this occurred more than once, we consider the resource reused".

1. Who discovers resources?

Out of 82 active users, 63 had annotated a resource at least once, 75% had annotated more than once. Average was 11.6 annotations, median 4. The top two users annotated 120 and 108 times, the following users 50 times. There were 36 users who bookmarked and rated, and 14 users who only bookmarked and 13 who only rated.
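A breakdown like this falls out of simple set operations. Below is a minimal sketch; the user ids are invented, and only the last lines use figures from this post (36 + 14 + 13 annotators out of 82 active users).

```python
def annotator_breakdown(bookmarkers, raters):
    """Split annotating users by the kind of annotation they made."""
    return {
        "both": len(bookmarkers & raters),
        "only_bookmarked": len(bookmarkers - raters),
        "only_rated": len(raters - bookmarkers),
        "annotators": len(bookmarkers | raters),
    }


# Hypothetical user ids, for illustration only.
bookmarkers = {"u1", "u2", "u3"}
raters = {"u2", "u3", "u4", "u5"}
breakdown = annotator_breakdown(bookmarkers, raters)
print(breakdown)

# Sanity check on the figures reported above.
reported_annotators = 36 + 14 + 13
share_of_active = reported_annotators / 82
print(reported_annotators, f"{share_of_active:.0%} of active users")
```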

2. What kind of "actions" take place?

By actions, in this case, I mean playing the resource (click on the link), bookmarking and/or rating it, but also viewing evaluations related to this resource or checking the "Favourites" from the user who had discovered an interesting resource.

Table 2 presents data on how the users (described in 2. above) interacted with resources. The left column explains where on the portal this action took place, whereas the top row indicates the action. There were a total of 3542 actions related to this.

In general, this group annotated 75% of resources in the SRL, 15% in the tag cloud, 4% on the Travel well list, 3% in their own Favourites (ratings) and only 1% on the Most bookmarked list.

Table 2

Let's first look at what happens in the Search Result List (SRL). Table 2 shows that most things happen when users are here: resources are played more than 2000 times (2/3 of all plays), most rating (70%) and bookmarking (73%) of "virgin" resources also take place here. We can also see that some small amount of resources with SI are rated and bookmarked from SRL (less than 10%). This indicates that some users paid attention to cues made available by previous SI and found them useful.

Mostly, though, it's "virgin" resources. This is good news, in a way, as I was worried that users might be lazy and not discover resources that other users have not annotated before. On the other hand, most users click on resources and never annotate them, so we are left to guess whether they liked them or not... only 13% of these clicks lead to rating the resource (268), for example. On average, 71% of virgin resources "hit" get annotated, and 29% of resources with SI get annotated.

Table 2 also shows that the tag cloud is a good catcher of ratings (15%) and taggings (15%), whereas the "Travel well" list is a real hit catcher: 13% of hits on resources with SI are played here. This list comprised 88 resources, out of which only 54 got played. The average was 6.8 plays per resource (median 3.5), but in reality some got played a lot more than others; 30% received more than the average number of hits. The highest were 30, 29 and 28 (great, just checked the top dog, and it's a dead link :( ).

The story looks worse for actually bookmarking and rating resources from the Travel well list: only 18 of them got hitched (20%). 2 got bookmarked twice and two rated 5 times (no overlap). So, I suspect that we did not succeed that well in creating appealing "recommendations". I'll get back to that point at a later stage. A rather effective feature of the TW list was the use of "view evaluations" and "view other users": about 40% of both were generated here, so they helped users to socially navigate the portal.

Compared to the "Travel well" list, the "Most bookmarked" resources did not seem to have the same effect of catching eyeballs. This is bizarre, as both are available on the front page; however, "Most bookmarked" is a click away on a tab. As 84% of these hits come from users who are not logged in, I think it might be lots of clicks from our testing period, but it's also possible that this is the only thing users experience on the portal before they fly away (forever..?).

Lastly, Table 2 shows that some ratings take place in Favourites. That's good, since we intended Favourites to be the place for that. I imagined that teachers first want to use the resource, so they put it in their Favourites and then come back later to rate it. Well, users know better: they seem to take about a second to view the resource and bookmark it on the SRL. I will have to look at whether there is any qualitative difference between the ratings made in the SRL and those made in Favourites.

3. Where do these actions take place?

I was also interested in seeing what kind of search methods users choose to use. The possibilities offered on the portal are the following:

Explicit search: using traditional search box with text or advanced search options. This results in resources on a Search Result List (SRL) where users can view the metadata about the resources as well as the annotations by others. Searches within results in SRL can also be refined, or ranked either by popularity or ratings. Also "Browse by category" results in this.
Community browsing: This includes browsing the tag cloud and examining bookmarks created by the community. In our case these are the lists of most bookmarked items; travel well; tags; other people's Favourites.
Personal search: Looking for bookmarks in one's own personal collection of bookmarks (Favourites)

Table 3.

Of all searches, 76% were Explicit searches, 21% Community searches and 2% Personal searches.

When it comes to spending actions on other things on the portal, we can observe some differences between logged-in and not logged-in users. Whereas not logged-in users spend most of their actions on searching (60% + 17%) and playing resources (23%), the logged-in users show more variety.

The logged-in users search differently: they use the Explicit search function far less (28%) and also spend less on Community search (only 8%), but additionally they have the Personal search function available (3.5%). The difference here is that these users can interact with the resources that they have found earlier ("keep found things found"). There is quite a big difference between these groups in how many actions are spent on searching, 77% vs. 40%. Logged-in users also play fewer resources (18%), but still "outsmart" not logged-in users in terms of searches returning plays:

the first group spends 3.4 searches to play one resource, whereas
the logged-in users spend 1.98 searches to play one resource.
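A per-group ratio like this can be tallied directly from log records. The sketch below assumes each record is a `(logged_in, action)` pair and uses made-up action names (`explicit_search`, `play`, etc.); the real logging scheme uses its own labels, and the sample log is invented.

```python
from collections import Counter

# Hypothetical action labels for the three search slots.
SEARCH_ACTIONS = {"explicit_search", "community_search", "personal_search"}


def searches_per_play(log):
    """log: iterable of (logged_in, action). Returns {logged_in: ratio or None}."""
    searches = Counter()
    plays = Counter()
    for logged_in, action in log:
        if action in SEARCH_ACTIONS:
            searches[logged_in] += 1
        elif action == "play":
            plays[logged_in] += 1
    groups = set(searches) | set(plays)
    # None marks a group that searched but never played anything.
    return {g: (searches[g] / plays[g] if plays[g] else None) for g in groups}


sample_log = [
    (False, "explicit_search"), (False, "explicit_search"),
    (False, "community_search"), (False, "play"),
    (True, "explicit_search"), (True, "personal_search"),
    (True, "play"), (True, "bookmark"),
]
ratios = searches_per_play(sample_log)
print(ratios)
```

Filtering out known test traffic before the tally would be a matter of dropping those records from `log` first.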

This apparent inefficiency is partly due to the fact that we are still testing the server, and probably many tests are done when the user is not logged in. However, this is something to note for now and keep an eye on (check: can I omit the searches from the office using the data from Analytics).

A major difference between the two groups is what I call contributing actions. We have already seen that 27% of plays are on resources with SI and that on average 13.3% of searches take advantage of Community browsing. We also saw that some 6% of annotations on the SRL are on resources that have Social Information related to them. So where does this social information come from?

In Table 3 we see that contributing actions cover about 40% of all the actions by logged in users. This is about 16% of all actions during this period! I will come back to this point later trying to make a picture of what the input is by a group of logged-in users and how it can be taken advantage of by other users of the system.

4. What resources do get played?

Users can access 30116 learning resources through the portal, and many more assets. During the trial period, 1547 distinct resources got played 2828 times by all users. That makes an average of 1.83 plays per resource, but in reality most get few hits and a few get many hits (median: 1). 27% got more than one hit; the most hits for one resource were 35 (strangely, this resource, "123216875", was not even bookmarked).
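The gap between the average and the median is typical of such skewed hit counts. Here is a small sketch of how these figures can be derived from a list of played resource ids; the sample ids are invented, and only the last line uses the totals reported above.

```python
from collections import Counter
from statistics import mean, median


def play_stats(played_resource_ids):
    """Summarise hits per distinct resource from a flat list of play events."""
    counts = list(Counter(played_resource_ids).values())
    return {
        "distinct": len(counts),
        "mean": mean(counts),
        "median": median(counts),
        "share_multi": sum(1 for c in counts if c > 1) / len(counts),
        "max": max(counts),
    }


# Hypothetical play events: one popular resource, four played once.
sample_plays = ["r1", "r1", "r1", "r1", "r2", "r3", "r4", "r5"]
stats = play_stats(sample_plays)
print(stats)

# Average from the totals reported in the post.
reported_mean = 2828 / 1547
print(f"{reported_mean:.2f} plays per resource")
```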

I was interested in what happened with resources that had Social Information vs. the ones that were "virgins", not yet annotated by previous users.

Of the total plays (2828), 73% were on resources without SI and 27% on resources with SI. If we look at it from the point of view of all resources that were made available, out of 565 resources with SI 34.7% were played at least once, whereas of all other resources (29551), 4.57% got played at least once. Also, some of the newly annotated resources got hits right away: we have 54 resources that got played 70 times, most likely thanks to their new annotations. Some of these newly annotated resources (32 cases) prompted 99 further annotations from the new users. These 99 annotations were made both in the SRL (69%) and in social navigation areas like the tag cloud and Favourites (31%). This shows that these new annotations became useful to other users right away, actually more so than the previously annotated resources, of which less than 10% were found of use by these users (31% vs. 10%, see Table 4).
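As a quick cross-check, the percentages above can be turned back into approximate resource counts. This sketch uses only figures quoted in this post; the variable names are made up, and the counts are rounded estimates, not values read from the logs.

```python
# Figures quoted in the post.
RESOURCES_WITH_SI = 565
RESOURCES_WITHOUT_SI = 29551
PLAYED_SHARE_WITH_SI = 0.347
PLAYED_SHARE_WITHOUT_SI = 0.0457

# Approximate number of distinct resources played in each group.
played_with_si = round(RESOURCES_WITH_SI * PLAYED_SHARE_WITH_SI)
played_without_si = round(RESOURCES_WITHOUT_SI * PLAYED_SHARE_WITHOUT_SI)

# How much more often a resource with SI got played at least once.
play_rate_ratio = PLAYED_SHARE_WITH_SI / PLAYED_SHARE_WITHOUT_SI

print(played_with_si, played_without_si, played_with_si + played_without_si)
print(f"resources with SI were played {play_rate_ratio:.1f}x as often")
```

The two estimates add up to roughly the 1547 distinct resources played, and the two groups together make up the 30116 resources in the repository.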

It's quite interesting, though, that only about one third of resources with SI got played. I would have thought that it would be more. This might be due to the fact that our search result list is not ranked to start with. It is possible for the user to re-organise the results by popularity or ratings, but actually we do not know whether this happens (I have not yet found a way to log it). I guess the other side of the coin that surprises me is that users clicked on so many resources that did not have any annotations on them. I guess it's a good sign of curiosity :)

Interestingly, we find that users who are logged in discovered fewer resources than the others. Out of all these resources (n=1547), 68% were played by users who were not logged in and 44% by logged-in users (last row in Table 1). The same can be observed for resources with and without SI.

Table 1.

5. How many annotations per resource?

Of the total amount of annotations received during this period, we can count that 394 distinct resources received 734 annotations, they were half and half ratings and bookmarkings.

322 resources received 383 ratings. 13% received more than one rating, average 1.18. The highest number of ratings was 5, which only 1 resource received.
294 resources received 350 bookmarkings. 14% received more than one bookmark, average 1.19. The highest number of bookmarks was 5, which only 2 resources received.

58.5% of annotations were on new resources, and 41.5% on old ones. They were distributed rather differently, the "old" resources received on the average more annotations than the new ones.

38 Travel well resources received 99 annotations. 63 of them received more than one annotation, average is 2.6 each. The top one received 12 annotations.
97 previously annotated resources (this includes 32 resources that were discovered during this period and annotated later by other new users) received 209 annotations. Average is 2.15 annotations per resource.

6. Story of the Travel well list

Only about 15% of the 88 resources on the list were re-discovered by the new users! Previously we saw that these resources received a good amount of "hits" and eyeballs, but not so many of them actually resulted in ratings, and when they did, they were not equally distributed among the resources (Table 4). Only 18 of the annotations took place on the TW list; an additional 20 resources from this list were annotated in the SRL and tag cloud. So the success rate of these recommendations was less than 50%! We can barely call these recommendations. I will look later at which ones were "thumbed up" and which ones downed. As with the previously bookmarked resources, ahem, maybe the taste differs between the two sets of users.

Table 4

The resources on the "Travel well" list received many more hits (average 7.12) than just "any previously annotated resource" (average: 0.68).

Sunday, January 04, 2009

New users on the portal and resource discovery

I looked at 18 new users on the portal, and studied resources that they bookmarked. I wondered how many of these resources had previous annotations by users? By annotations I mean that previous users had added ratings on them and Favourited these resources. If this is the case, it's clearly shown on the portal.

But are these annotations persuasive? Do they help users make their decision better or faster?

I had two sets of data:

last 3.5 months (Aug to Nov 17 2008)
Nov 18 to Dec 18. These are my 18 users who had bookmarked 114 resources.

Out of these results, it seems that 2/3 of the resources that these new users bookmarked had no previous annotations on them! I have to verify this finding, because currently I lack data from March to July to see what was bookmarked then.

Table 1

Anyhow, let's see what the current mini-study holds. Out of the third of resources that were discovered by this group, about 25% had previous annotations on them. They were mostly done by users before this group got initiated on the portal, however, some were also discovered thanks to the bookmarking by this group.

Only about 8% of resources were discovered through a special list called "Travel well" resources. These resources have been added there by "EUNRecommender" which currently is hand operated, but mainly based on picks by other users from at least two different countries. I find this figure rather surprising, as this "Travel well" list is the first thing that the user sees when they come to the portal.

Anyhow, I find it cool that 25% of bookmarks by this group were resources that had previous annotations. What we cannot say, though, is whether these users could have found these resources without these annotations displayed publicly on the search result list. However, I think that my measures like "pick-up rate" and "overlap" among Favourites will help me sort that out.

Here is a visualisation of this. The huge node is "sos-rec" which means that these resources were previously annotated (rate, bookmark). What is called "EUN Recommender" are resources from the "Travel well list". Other than that, the new users are the nodes which are connected by edges to resources that they have bookmarked.

Right off the bat, we can see that 5 users had taken totally their own trails and bookmarked resources that no one had bookmarked before - quite cool!

Additionally, there are a few users in the lower right-hand corner who are only connected to the whole graph through one resource (user: 192682) and (user: 217391).
