Tuesday, September 02, 2008

Emerging search patterns on learning resources

I am hugely inspired by the work of J. Feinberg and D. Millen, especially the studies they have done on dogear, the IBM-internal social bookmarking service. I must admit, though, that I had missed it until now, I cannot believe it! Anyway..

This paper, Social bookmarking and exploratory search (2007), is really interesting, not least because it offers a study design very similar to the one I am planning for my log file and search pattern analysis on the MELT portal. (Great minds think alike ;p yeah, right..)

The study design is a field study of a social bookmarking service in a large corporation (IBM), combining quantitative data (click-level analyses of log files and bookmarking data) with qualitative data such as interviews.

And this is the data that I collect for my study (Table 1). Quite similar!

They use a categorisation of search that I will adapt to my usage:
  • Community browsing (Examining bookmarks created by the community. In my case these could be lists of most bookmarked items, travel well, tags, other people's Favourites)
  • Personal search (Looking for bookmarks in one's own personal collection of bookmarks)
  • Explicit search (Explicit search using the traditional search box)
A few days ago I looked at the first logs from MELT. Here is the run-down. I have not used the same types of searches, but I will explore them later. Basically, this is what you can see:

There were 512 search events:
  • 41% Explicit searches (advanced search)
  • 34% Community browsing (tag cloud)
  • 25% Browsing categories
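Out of curiosity, the tallying step is simple enough to sketch. This assumes each log row has already been reduced to one search-type label per event; the labels and counts below are hypothetical, just hard-coded to reproduce the figures above:

```python
# Sketch: tallying MELT search events by type.
# The event labels and counts are made up for illustration;
# the real MELT logs will look different.
from collections import Counter

events = (
    ["explicit_search"] * 210       # advanced search box
    + ["community_browsing"] * 174  # tag cloud
    + ["category_browsing"] * 128   # browsing categories
)

counts = Counter(events)
total = sum(counts.values())
for search_type, n in counts.most_common():
    print(f"{search_type}: {n} ({100 * n / total:.0f}%)")
```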
What was called "click-through" in this paper is when a particular navigation path resulted in a page view. This is what I call viewing resources. We can see that there were 538 page views, which implies that users most likely clicked on more than one resource as a result of a search.
  • 74% of resources were viewed in the search result list (srl), meaning they were results of advanced search and category browsing

  • 20% of resources were viewed as a result of community browsing (e.g. tag cloud, lists)

  • 5% of resources were viewed as a result of personal search (e.g. in Favourites)

How many searches resulted in viewing resources? I still have to verify this:
  • 85% of explicit searches and browsing categories
  • 65% of Community browsing (tag cloud and lists)
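The verification itself should be straightforward: for each logged search event, check whether it was followed by at least one resource view. A sketch on made-up rows (the real MELT log fields will differ):

```python
# Sketch: click-through rate per search type, i.e. the share of search
# events of a given type that led to at least one resource view.
# The rows below are hypothetical toy data, not real MELT logs.
searches = [
    {"type": "explicit", "views": 2},
    {"type": "explicit", "views": 0},
    {"type": "community", "views": 1},
    {"type": "community", "views": 0},
]

def click_through_rate(rows, kind):
    of_kind = [r for r in rows if r["type"] == kind]
    followed = [r for r in of_kind if r["views"] > 0]
    return len(followed) / len(of_kind)

print(f"explicit: {click_through_rate(searches, 'explicit'):.0%}")
print(f"community: {click_through_rate(searches, 'community'):.0%}")
```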
Millen et al. (2007) speculate that in their study the higher click-through rate indicates more purposeful searching, whereas community browsing was used more as an exploratory search activity. I will need to keep my attention on this and see whether I can draw similar conclusions.

Most likely, anyway, I will not use click-through as an indication of intent to use a learning resource; what I find most interesting in my study is how many of the search activities result in a bookmark and/or rating. That is much cooler in my mind than viewing the page. Actually, I've noticed that in our system users view resources a lot, but they do not necessarily show any further interest in them.

That is why I use both implicit and explicit Interest Indicators. By Interest Indicators I mean Explicit Interest Indicators such as ratings as a subjective relevance judgment on given learning content, Marking Interest Indicators such as bookmarks and tags on educational content, Navigational Interest Indicators such as time spent evaluating the metadata of educational resources, and Repetition Interest Indicators, as categorised by Claypool et al. (2001).
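In practice this means mapping raw log events onto those indicator categories. A small sketch, with hypothetical MELT event names (the real event vocabulary is still to be defined):

```python
# Sketch: mapping raw log events onto the Claypool et al. (2001)
# interest-indicator categories I use. Event names are hypothetical.
INDICATOR_CATEGORY = {
    "rate_resource": "explicit",      # subjective relevance judgment
    "add_to_favourites": "marking",   # bookmark
    "tag_resource": "marking",
    "view_metadata": "navigational",  # time spent evaluating metadata
    "revisit_resource": "repetition",
}

def categorise(event_name):
    return INDICATOR_CATEGORY.get(event_name, "other")
```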

In this small study we can see that 19% of viewed resources ended up in users' Favourites. Ratings were much rarer: only about 6% of viewed resources ended up being rated. In the study of the IBM system, click-through was as high as 34-39%. I will be keeping my eye on that.

My advisor asked me whether I could find search patterns in my logs. I think I could. In this paper (Millen et al. 2007) they do just that :) and here is how ("Looking for Patterns using Cluster Analysis"):
"To better see the patterns of use, we performed a cluster analysis (K-means) for the different types of search activities. We first normalized the use data for each end-user by computing the percentage of each search type (i.e., community, person, topic, personal, and explicit search). The K-means cluster analysis is then performed, which finds related groups of users with similar distributions of the five search activities. The resulting cluster solution, shown in Table 4, shows the average percentage of item type for each of four easily interpretable clusters."
Should not be too hard :)
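The paper's clustering step can be sketched like this: normalise each user's activity into percentages of the five search types, then run K-means to find groups of users with similar search profiles. Plain-Python K-means here to keep it self-contained (in practice I would use a stats package), and the per-user counts are made up:

```python
# Sketch of the Millen et al. clustering step. The per-user counts
# are hypothetical; a real run would use the MELT log data.
import random

TYPES = ["community", "person", "topic", "personal", "explicit"]

def normalise(counts):
    """Turn raw per-type counts into percentages (as fractions)."""
    total = sum(counts)
    return [c / total for c in counts]

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-means: assign to nearest centroid, recompute, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical per-user raw counts of the five search types.
users = [
    [20, 1, 2, 1, 3],   # mostly community browsing
    [18, 0, 3, 2, 2],
    [1, 0, 2, 1, 25],   # mostly explicit search
    [2, 1, 1, 0, 20],
]
points = [normalise(u) for u in users]
centroids, clusters = kmeans(points, k=2)
for c in centroids:
    print({t: f"{v:.0%}" for t, v in zip(TYPES, c)})
```

Each resulting centroid is an average search-type profile, which is exactly the "average percentage of item type" per cluster that Table 4 in the paper reports.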

Millen, D., Yang, M., Whittaker, S., & Feinberg, J. (2007). Social bookmarking and exploratory search. In L. Bannon, I. Wagner, C. Gutwin, R. Harper, and K. Schmidt (eds.), ECSCW'07: Proceedings of the Tenth European Conference on Computer Supported Cooperative Work, 24-28 September 2007, Limerick, Ireland.
