[Sm]all things considered by Riina Vuorikari: November 2007

Monday, November 19, 2007

eTwinning/bookmarks and social networks

Excellent write-up here about SNA (social network analysis) on social networking sites. This makes me think of eTwinning, or my social bookmarks, and how SNA there could better support users.

Kumar, Novak and Tomkins (2006) saw that network activity is of three types:

“Singletons,” who have no connections and are least central
The “giant component,” which is the largest group of nodes tightly connected to the central nodes and to each other
The “middle region,” which represents isolated groups which interact amongst themselves but not with the rest of the network, forming isolated stars. These groups grow one user at a time. Over time they merge with the giant component.

The node analysis of these networks showed that more than half of a social network is outside the giant component where the greatest centrality lies. They used the “control” definition of centrality to determine this. The research also highlighted a prevalence of “stars” in the middle region which are mini social networks, typically driven by one dynamic member who serves as the point of centrality with others serving as satellite nodes – connected to the dynamic member but not to each other. In Kumar, Novak and Tomkins’ analysis the middle region represented one-third of users on Flickr and about ten percent of users on Yahoo! 360.
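The three regions described above can be sketched as a connected-components computation. This is only a minimal illustration, assuming an undirected friendship graph given as a list of node names and a list of edges; the function name `classify_regions` and the toy data are hypothetical, and it is not the method Kumar, Novak and Tomkins used on their data.

```python
from collections import defaultdict, deque

def classify_regions(nodes, edges):
    """Split an undirected graph into singletons, giant component and middle region.

    Assumption: `nodes` lists every node, and `edges` is a list of (a, b) pairs.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)

    singletons = {n for c in components if len(c) == 1 for n in c}
    connected = [c for c in components if len(c) > 1]
    # The giant component is simply the largest connected group.
    giant = max(connected, key=len) if connected else set()
    # Everything else that is connected forms the middle region.
    middle = [c for c in connected if c is not giant]
    return singletons, giant, middle

# Toy network: a-b-c-d is the largest group, e is the centre of a small star.
nodes = list("abcdefghi")
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f"), ("e", "g")]
singletons, giant, middle = classify_regions(nodes, edges)
```

In the toy data, `e` plays the role of the dynamic member of a star: `f` and `g` are connected to `e` but not to each other.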

Also keep in mind that the most growth happens in the middle region where dynamic members influence others to join their network. These sub-networks can gradually join the giant component over time. Once they do, the importance of the dynamic member diminishes. Even if that dynamic member were to leave the network, the others would stay in the network.

So, what is needed is to support the "stars" in their growth so that they become independent of that one dynamic member and are able to continue even without that person.

Wednesday, November 14, 2007

Such a cool way to send a message, thanks B.Dylan!

Tagging in different e-learning environments

In the last days we've had a few discussions about tagging in e-learning environments. My environment, where the tagging takes place, is a portal for learning resources.

Today I came across this nice graph that displays the "power law of participation". I haven't looked at the scientific background of it yet, so no comments on that. Anyway, it kinda rang a bell with what I'm doing when looking into levels of user engagement on the portal.

According to this graph, adding things to favourites (e.g. bookmarking) and tagging them represents a pretty low threshold to participate in the activities of that given community.

I'm also looking at the LeMill environment, which, on the other hand, demands a pretty high level of user engagement, as it is about collaborative authoring of digital learning resources. About a year in with users, there is rather little collaborative authoring actually taking place, Hans told me yesterday.

Maybe tagging could in some way help participants to take the first steps? Well, they can already tag and favourite things in LeMill, so maybe the issue is rather to see if similar levels of engagement appear in that community.

So, along with that, I am interested in looking at the tags in LeMill from the same point of view as I do for tags in our learning resources portal. The difference is that our case is clearly what is called a broad folksonomy, whereas LeMill should be a rather classical narrow folksonomy. Or is it? Maybe once we start looking at those tags as a triple {user, resource, (tags)} with a timestamp on them, it will appear that participants first start by bookmarking and tagging resources from other users, before the user takes a step to create her own resources and finally works collaboratively on others' resources.
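To make the timestamped-triple idea a bit more concrete, here is a minimal sketch of how one could check whether a user's earliest post was on somebody else's resource. Everything here is an assumption for illustration: the `Post` record, the `authors` mapping from resource to its creator, and the sample data are hypothetical and do not reflect LeMill's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Post:
    """One {user, resource, (tags)} triple with a timestamp (hypothetical schema)."""
    user: str
    resource: str
    tags: tuple
    timestamp: datetime

def started_with_others(posts, authors):
    """For each user, report whether their earliest post was on someone else's resource."""
    earliest = {}
    for post in posts:
        current = earliest.get(post.user)
        if current is None or post.timestamp < current.timestamp:
            earliest[post.user] = post
    return {user: authors.get(post.resource) != user
            for user, post in earliest.items()}

# Hypothetical sample: anna first tags ben's resource, later her own.
posts = [
    Post("anna", "r1", ("maths",), datetime(2007, 9, 1)),
    Post("anna", "r2", ("fractions",), datetime(2007, 10, 1)),
    Post("ben", "r3", ("history",), datetime(2007, 9, 15)),
]
authors = {"r1": "ben", "r2": "anna", "r3": "ben"}
first_steps = started_with_others(posts, authors)
```

A real analysis would look at the whole sequence of actions per user rather than only the first one; this only shows the shape of the data.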

Update:

So, some data to back this up was found:

Social Technographics®
Mapping Participation In Activities Forms The Foundation Of A Social Strategy
by Charlene Li
with Josh Bernoff, Remy Fiorentino, Sarah Glass
http://www.forrester.com/Research/Document/Excerpt/0,7211,42057,00.html

Executive summary (document excerpt):

Many companies approach Social Computing as a list of technologies to be deployed as needed — a blog here, a podcast there — to achieve a marketing goal. But a more coherent approach is to start with your target audience and determine what kind of relationship you want to build with them, based on what they are ready for. Forrester categorizes Social Computing behaviors into a ladder with six levels of participation; we use the term Social Technographics® to describe a population according to its participation in these levels. Brands, Web sites, and any other companies pursuing social technologies should analyze their customers' Social Technographics first and then create a social strategy based on this profile.

Wednesday, November 07, 2007

notes on "Context, (e)Learning, and Knowledge Discovery for Web User Modeling: Common Research Themes and Challenges"

"Context, (e)Learning, and Knowledge Discovery for Web User Modeling: Common Research Themes and Challenges" by B.Berendt

This paper is about context and how to define it or how it is defined differently. The following is related to "Context in Web usage mining and eLearning"

2.1 Context as data and as metadata

"In order to evaluate whether intended and actual usage coincide or not, and in order to obtain a more fine-grained picture of actual usage, it is of course interesting to measure aspects of actual usage. "

- This is also one thing that we are interested in finding out in MELT, and partly also in my PhD. As we have very little access to "actual use", we try to infer this type of information from usage logs. E.g. we have a teacher who has said in his profile that he teaches students aged 12 to 13. If he bookmarks LOs that have an intended audience of 14-18, we can maybe infer that this LO can also be used for younger students. Especially if we start seeing this take place a lot, we might want to update the LOM on intended audience: instead of 14-18 we could say 12-18.

- My interest is also to see if tags can give us any hints of this.
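The age-range inference from bookmarks mentioned above could be sketched roughly like this. It is only an illustration under assumptions: the function name, the `min_support` threshold (how many bookmarking teachers are needed before widening the range) and the sample numbers are all hypothetical, not something MELT implements.

```python
def suggest_age_range(intended, teacher_ranges, min_support=3):
    """Suggest a widened intended-audience range for one learning object.

    intended:       (low, high) age range from the LO's metadata
    teacher_ranges: (low, high) age ranges taught by teachers who bookmarked the LO
    min_support:    hypothetical threshold of bookmarks needed to widen a bound
    """
    low, high = intended
    younger = [t_low for t_low, _ in teacher_ranges if t_low < low]
    older = [t_high for _, t_high in teacher_ranges if t_high > high]
    if len(younger) >= min_support:
        low = min(younger)
    if len(older) >= min_support:
        high = max(older)
    return low, high

# An LO intended for 14-18, bookmarked by three teachers of younger students.
bookmarking_teachers = [(12, 13), (12, 13), (12, 14), (15, 16)]
suggested = suggest_age_range((14, 18), bookmarking_teachers)
```

With a single bookmark from a teacher of 12 to 13 year olds the range stays at 14-18; only repeated evidence widens it, which matches the "if we start seeing this taking place a lot" condition.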

2.2 Context and model parts

"context representations can form and/or enrich (a) user models, (b) material/environment models, or (c) interaction models."

- EUN uses (a) in one search to rank resources, but we are still only implementing it and we don't know how users react to it. That is related to my own PhD, as is how different search methods are used. In general, we do way too little with user modeling (I guess bigger issues are still more pressing).

2.3 Context: parameters of the (inter)action

- For my PhD I'm looking into user logs to create "levels of user interaction", e.g. what does it mean if a user views a page vs. makes a bookmark on it. We want to use this as an input for a recommendation system, for example.

- I'm also interested in the type of search that the user has chosen and its relation to the search task that the user has at hand.

- Tags were mentioned in this context, and they are also a huge part of what I am looking at. There are different questions around them; the most interesting one related to search is how they can be used for discovering resources.
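As a rough sketch of the "levels of user interaction" idea above: each logged action gets an ordinal level, and each (user, resource) pair keeps the highest level seen. The action names and their ordering (view below bookmark below tag) are purely my assumption for illustration, not a result.

```python
# Hypothetical ordering of actions by how much engagement they signal.
LEVELS = {"view": 1, "bookmark": 2, "tag": 3}

def engagement_levels(log):
    """Map each (user, resource) pair to the highest interaction level in the log.

    log: iterable of (user, resource, action) tuples taken from usage logs.
    """
    levels = {}
    for user, resource, action in log:
        key = (user, resource)
        levels[key] = max(levels.get(key, 0), LEVELS[action])
    return levels

# Hypothetical log lines.
usage_log = [
    ("anna", "lo1", "view"),
    ("anna", "lo1", "bookmark"),
    ("ben", "lo1", "view"),
    ("ben", "lo2", "tag"),
]
levels = engagement_levels(usage_log)
```

The resulting dictionary could then serve as the user-item matrix for a recommender, with the level used as an implicit rating.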

Need to look into these papers:

- B. Berendt, G. Stumme, and A. Hotho. Usage mining for and on the semantic web. In H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha, editors, Data Mining: Next Generation Challenges and Future Directions, pages 461–480. AAAI/MIT Press, 2004.

- Claus-Peter Klas, Hanne Albrechtsen, Norbert Fuhr, Preben Hansen, Sarantos Kapidakis, László Kovács, Sascha Kriewel, András Micsik, Christos Papatheodorou, Giannis Tsakonas, and Elin Jacob. A logging scheme for comparative digital library evaluation. In Julio Gonzalo, Costantino Thanos, M. Felisa Verdejo, and Rafael C. Carrasco, editors, ECDL, volume 4172 of Lecture Notes in Computer Science, pages 267–278. Springer, 2006.

- Totally agree with his observation, not the method: Tanimoto [53] emphasizes that it may be difficult to conclude, from a mere clicking event, that there was indeed attention paid to (specific) content of the requested page.

2.4 Context: background knowledge

Tags, tags, tags. Multiple views.

2.5 Context: Activity structure
" This metadatum can provide important information about a visitor’s intention or expectation (e.g., whether they followed a prescribed link from a course page, or whether they found a material by actively searching with a very detailed search phrase)."

For me this is important. Using terms from this paper, I guess I'm interested in users' intentions and expectations, and in finding out the ways users choose to access or discover resources in our portal. I'm also interested in seeing whether one method is more useful for a given task, e.g. if people like browsing to find inspirational material and some other method (social information retrieval vs. information retrieval) for another task. If we know what kind of method is useful for a given task, I think we can help our users a lot.

2.6 example

An example is given using the three aspects of context: activity structure, parameters of the (inter)action and background knowledge. This type of analysis allows answering questions like: which search options are popular and are there differences between users? Which content areas were frequented, and how did people navigate between them; did they go back to the search options, or did they use the inter-content links? Did certain content areas become hubs for navigation and thus serve to organise the domain and the presentation of the domain? On the other hand, there are questions like: were there differences between users with high verbal and users with high visuo-spatial competencies; did certain textual or pictorial material become hubs?

These are also questions that I am looking at in the LRE portal, and I am getting a good idea of them. However, I have not been able to link them with the task at hand yet, which is something that I'm interested in.

Schooling for Tomorrow scenarios and Science-Fiction

Last autumn I heard a few e-learning keynotes with a heavy science-fiction emphasis. That was wild, I totally loved it. I cannot agree more with the idea that working with future scenarios, or science-fiction for that matter, actually helps the future to come along. It is about helping to shape the future after having peeked into it with a positive or a negative outlook. And I'd like to say that looking at or creating scenarios from many perspectives is useful too, as it helps you to see whether this is where you want to end up or not.

There's been some work going on since the beginning of the millennium regarding scenarios for the future of schooling; we at EUN also worked on that. The OECD report Schooling for Tomorrow is out too. It's way less exciting than talking about 2048 in a science-fiction scenario, I'm afraid, but it is still aiming at the same goal: to see how technology and network enhanced learning could be used in the days to come.

Of course I picked up on the scenario called "Learning in Networks replacing schools"

This scenario imagines the disappearance of schools per se, replaced by learning networks operating within a highly developed “network society”.

Networks based on diverse cultural, religious and community interests lead to a multitude of diverse formal, non-formal and informal learning settings, with intensive use of ICTs.

How about that for science-fiction?

Well, if it were up to me, I would like to see learning networks in schools even if schools per se are not facing extinction. You know, before we need to go to a dinosaur museum to see a replica of a teacher.

I don't mind the idea of a "networked society", but somehow, when wearing those gray glasses, I think of how efficiently terrorist cells work, and how religious and local interest groups could push their own learning interests on me, while at the same time monitoring with whom I want to learn on the international scale. Yuck!

The Schooling for Tomorrow scenarios by OECD are a real tool set for policy-makers, and why not others, to work with. I like practical things like these. Lately, I've been toying with the idea of getting people who work with education, technology and networks to write science-fiction short stories about learning in the future. I also think that this would be a very helpful exercise for other PhD students to open up their thinking and not be stuck with what we have now.

How to get started with your own science-fiction story on Schooling for Tomorrow

It's important to understand that it's the journey that is important, not the destination. To say it in other words, I think the whole thinking process to come up with a plot for your story is what counts! Don't worry about picking up the right publisher now ;)

This is what I found out about writing science-fiction and how to get started:

Idea: the premise, the basic thought around which the story will turn
Setting: a world for your story to take place in; it can be familiar, or wholly new
Characters: two or three, at least, to people your story
Aliens: (optional) strange and mysterious beings for your characters to encounter
Problem: something your characters want, need, must escape from, etc.

Hmm, I really like to think of Aliens and education...

Next, when you have those sorted out, think about the five dimensional framework from Schooling for Tomorrow. What are:

“attitudes, expectations, and political support”,
“goals and functions” of education systems
“organisations and structures”,
the “geo-political aspects”
“the teaching force”.

Then, just let it flow!

My own attempt at this is in a wiki. I've already had quite a few engaging and hilarious dinner discussions with my friends about what the plot should be. We never got very far, but it's always been a lot of fun! I only wish we could somehow get the discussion transcripts onto the wiki....

Tuesday, November 06, 2007

Notes on "Collaborative tagging and Semiotic Dynamics"

By C. Cattuto, V. Loreto and L. Pietronero (2006).

Firstly, I must say that I was glad to read this paper. Lately, I've been seeing many papers talking about the properties of folksonomies, like co-occurrence, etc., which have intrigued me quite a lot. This paper explains the process pretty well and underlines an important point - they factor out the users and only deal with streams of tagging events and their statistical properties!

I must admit that this makes the whole area of Semiotic Dynamics less attractive to me. I think it is important to study tags and their properties, but not in isolation from the user. I see (barely) the point of explaining tagging activity and the growth of tags separately from the users. But fair enough.

Problem statement: Uncovering the mechanisms governing the emergence of shared categorisations or vocabularies in the absence of global coordination is a key problem with significant scientific and technological potential. Collaborative tagging provides a precious opportunity to both analyze the emergence of shared conventions and inspire the design of large agent systems.

Semiotic Dynamics studies how populations of humans or agents can establish and share semiotic systems, typically driven by their use in communication. The authors argue that the emergence of a folksonomy exhibits dynamical aspects also observed in human languages, such as the crystallisation of naming conventions, competition between terms, takeovers by neologisms, and more.

Users interact with a collaborative tagging system by using tags or adding new resources to the system.
The basic unit of information in collaborative tagging systems is a (user, resource, {tags}) triple, which they refer to as a post in this paper. The tagging events form a tri-partite graph (with partitions corresponding to users, resources and tags, respectively), which can be used as a navigation aid in browsing tagged information.

Comment: I like the tri-partite graph as a navigation aid, yes! But as the authors mention just above, they don't think of other users and those networks as a navigational aid. On the contrary, they omit the users just to study the properties, which strikes me as bizarre.

The authors cite the "rich get richer" model (Yule-Simon's stochastic model) and propose to enhance it with a "fat-tailed memory kernel". This original model is related to the construction of text from scratch:

At each discrete time step one word is appended to the text: with probability p the appended word is a new word, never occurred before, while with probability 1-p one word is copied from the existing text, choosing it with a probability proportional to its current frequency of occurrence. This simple process yields frequency-rank distributions that display a power law tail with exponent alpha = 1-p, lower than the exponent we observe in actual data. This happens because the Yule-Simon process has no notion of "aging", i.e., all positions within the text are regarded as identical ..

This all leads to a model of users' behaviour: the process by which users of a collaborative tagging system associate tags to resources can be regarded as the construction of a "text", built one step at a time by adding "words" (tags) to a text initially comprised of n_0 words. It is that same Yule-Simon model (about inventing new tags or using existing ones), now with long-term memory, where recent tags are used more often than old ones.
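To get a feel for the process described above, here is a small simulation sketch. The plain branch follows the Yule-Simon description quoted from the paper: picking a uniformly random position in the text is the same as picking a word with probability proportional to its frequency. The optional `tau` branch weights the word x steps back by 1/(x + tau), which is only my reading of the "fat-tailed memory kernel" and should be treated as an assumption; the function name and parameters are hypothetical as well.

```python
import random
from collections import Counter

def simulate_tag_stream(steps, p, n0=1, tau=None, seed=0):
    """Simulate a stream of tags as the step-by-step construction of a "text".

    steps: number of tags to append
    p:     probability of inventing a brand-new tag at each step
    n0:    number of distinct tags in the initial text
    tau:   if given, copy with weight 1/(x + tau) for the tag x steps back
           (assumed form of the memory kernel); if None, plain Yule-Simon
    """
    rng = random.Random(seed)
    text = list(range(n0))  # tags are just integers here
    next_tag = n0
    for _ in range(steps):
        if rng.random() < p:
            # Invent a tag that has never occurred before.
            text.append(next_tag)
            next_tag += 1
        elif tau is None:
            # Uniform choice of position = choice proportional to frequency.
            text.append(rng.choice(text))
        else:
            # x = 1 is the most recent tag, so recent tags get more weight.
            size = len(text)
            weights = [1.0 / (size - i + tau) for i in range(size)]
            text.append(rng.choices(text, weights=weights)[0])
    return text

stream = simulate_tag_stream(steps=1000, p=0.1, n0=5)
# Frequency-rank list: (tag, count) pairs, most frequent first.
ranked = Counter(stream).most_common()
```

Plotting the counts in `ranked` against their rank on log-log axes is the usual way to eyeball the power law tail; fitting the exponent properly is a separate exercise.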

Also, "in our model,.., the average user is exposed to a few roughly equivalent top-ranked tags and this is translated mathematically into a low-rank cutoff of the power law, i.e., the observed low-rank flattening".

Conclusion: It seems that users of collaborative tagging systems share universal behaviour which, despite the intricacies of personal categorisation, tagging procedures and user interactions, appears to follow a simple activity pattern.

There is also something about the co-occurrence between high-rank and low-rank tags. It says: "This suggests that high-frequency tags partition - or "categorize" - the resources marked by tags of lower frequency."
Comment: This all sounds interesting and important, but I will need to look into that later.

Monday, November 05, 2007

My PhD research


Notes on "Aspects on Broad Folksonomies"

Aspects on Broad Folksonomies by M. Lux and M. Granitzer (2007)

This paper continues the trend in studying and analysing the underlying statistical properties of broad folksonomies that aims to identify laws and characteristics which allow inferring those properties. A few notes on what I found interesting related to the emerging notion of quality of tags, something that I've also spared a few thoughts on.

First, though, some other issues. The paper talks about the emergence of power law distributions in folksonomies. They describe the approach they took to fit the sample to a power law, the how-part of which I've sometimes wondered about. The paper aims at analysing whether one can find a similar term distribution in folksonomies as in classical term retrieval (e.g. Zipf; note: Zipf's law with an exponent between 1 and 2). The dataset is that of delicious (uh, with about 800 000 bookmarks and about 27 000 users; I've got a way to go with my MELT bookmarks).

Tag co-occurrence

They are able to show that "for around 80% of the tags of a folksonomy the co-occurring tags follow a power law distribution, which approves Cattuto's assumption. We found that for about 90% the estimated power law exponent β ∈ [-1.5, -0.5], which shows that for most tags co-occurrence follows a model with similar parameters."

Resource and user based tagging characteristics

Secondly, they looked into frequently used tags (more than 30 users).

For resource statistics (frequency of users tagging the resource with a tag), they found that around 18.4% of resources followed a power law distribution.

Tags are assigned by a lot of users to a few resources (head) and to a lot of different resources by a few users (tail).

For user statistics (frequency of resources tagged with a tag), around 13% follow a power law.

A few users tag a lot, whereas a lot of users tag a little.

I.e. the characteristics of the user statistics are similar to the characteristics of the resource statistics.
They argue that those tags which follow a power law w.r.t. users and resources are high quality tags (i.e. tags describing resources with high accuracy [no misspellings and meaningful tags]) for most of the users involved in the investigated social bookmarking system.
A small fraction of tags have overlapping user groups, which points towards sub-communities (user groups sharing the same link selection and tagging behaviour) in the tail of the power law distribution.

This was found by splitting resources into three groups (high-, mid- and low-rank resources).

They also looked at the big chunk of tags that were not following the power law.

Unique assignments. More than half (57%) of the less frequently used tags are used only once. They think that these can be seen as "shortcuts" for a user to a resource, or as misspellings. They argue that these tags are useless from a retrieval point of view (hmm..).
Personal vocabulary. Especially among the less frequently used tags, 19% of tags were used by only one user but assigned to many resources. They are useful for personal retrieval but useless for the rest of the community.
Unpopular vocabularies. Between 1/5 and 2/5 of tags are assigned to different resources by different users only once. Unpopular vocabularies are used by a small fraction of users.
They conclude that, from a retrieval point of view (e.g. inverted indices, TF*IDF), a large fraction of tags are good only for single users or sub-communities, and only the power law distributed tags are good for community-wide retrieval.
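The first two categories above are easy to sketch on a list of tag assignments. This is a minimal illustration assuming distinct (user, resource, tag) triples; the function name, the label strings and the sample data are hypothetical, and everything that is neither a unique assignment nor a personal vocabulary is simply lumped together as "shared" rather than split the way the paper does.

```python
from collections import defaultdict

def classify_tags(assignments):
    """Label each tag by how it is used across users and resources.

    assignments: distinct (user, resource, tag) triples.
    """
    users = defaultdict(set)
    counts = defaultdict(int)
    for user, resource, tag in assignments:
        users[tag].add(user)
        counts[tag] += 1

    labels = {}
    for tag, n in counts.items():
        if n == 1:
            labels[tag] = "unique assignment"    # used exactly once
        elif len(users[tag]) == 1:
            labels[tag] = "personal vocabulary"  # one user, several resources
        else:
            labels[tag] = "shared"               # several users
    return labels

# Hypothetical bookmarks.
assignments = [
    ("anna", "r1", "maths"),
    ("ben", "r1", "maths"),
    ("anna", "r2", "toread"),
    ("anna", "r3", "toread"),
    ("ben", "r4", "fracshuns"),
]
labels = classify_tags(assignments)
```

Running this on a real bookmark export would give the proportions of each category, which could then be compared with the percentages reported in the paper.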

They don't say anything about how to include the large fraction of tags not distributed by a power law in IR methods.

Retrieval Aspects
Q: Do tags add further information to the description and title for retrieval purposes? This is very much along the lines that I am also interested in, although I will look more into the networks of users. They say that for retrieval, tags can be seen as an additional resource. Moreover, about 50% of the available descriptions contain information similar to the information described by tags, whereas the remaining 50% can be seen as orthogonal information.

Comment: This all treats tags only as additional keywords that can be useful for conventional retrieval purposes. I think the tag-resource-user connection is more interesting. Whether a tag is misspelled or only hooks into a small user community is less important to me, because the very fact that the resource was tagged shows that the user has an interest in it; thus it is a vote. This aspect has immense potential for retrieval (from a recommender point of view), but is seldom considered in papers with a very conventional retrieval approach.

Open social and education

I wonder who is going to come up with the first OpenSocial app or widget for educational use? We certainly are talking about it, for example for our eTwinning platform. It could be cool to be able to use information about teachers' collaborative networks to allow, say, better retrieval of learning resources relevant for the project, purpose or task that teachers are undertaking; to link with some other sources that teachers are working on through cool widgets, etc.

I never thought that Facebook, which has lately become really popular among my friends (not early adopters), would be the one app that would "take it all". I was glad to read this:

"The market has already decided that there's going to be a long tail of social networks, and that people are going to belong to more than one. As soon as you belong to more than one, this kind of interoperability is critical," Dash says. "Open standards win every time." wired

Hurray for open standards!
