Friday, September 29, 2006

"Hello YouTubists...!"

Ok, I must say that I've missed some of YouTube's glamour, but I'm discovering some interesting things about it now. I thought it was all about "lonelygirl15" and kids posting their skateboarding videos.

Today I came across "geriatric1927" (http://www.youtube.com/profile_videos?user=geriatric1927), the number 1927 probably being his year of birth. This grandpa has found YouTube, and an amazing audience there, to tell about his life, and especially about his life during the years of World War II in England. I find it adorable; I thought that only the young generation uses YouTube, and there he is: this nearly 80-year-old man, sitting in front of his computer, telling us all about his life. And he has learned about editing music and putting images into his videos, and all!

It's worth watching a few of his babbles, some of which have been watched over 100 000 times by YouTubists. He seems to always start by saying "Hello YouTubists...!", and he talks about wartime, his youth, life after the war and such. It's like anyone's granddad taking time to tell you a story in his own time, space and pace. Very touching! He also talks about the media exposure he's received, about communicating with people who post him messages and send him emails, etc. It must have taken his life into a totally new dimension!

I love it when new media is used in a new context, and especially when its take-up reaches groups of users that we never thought it might. Imagine the designers of YouTube, early on when they thought about setting it up, writing use cases for this! "Use case no 12: the 3rd generation using YouTube to record their experiences of life - hey, this might also help to integrate the oldies into Web 2.0...". Well, I don't think so!

This post is just to remind myself about the wonders of the Web!

Wednesday, September 20, 2006

Is rating broken?

Yahoo!, YouTube and Netflix all use ratings on their services to better gear them towards users' needs. We are talking about huge numbers here: Yahoo! gets some 5 million ratings a day for artists, albums, songs and videos; Netflix currently gets 2 million ratings per day from its 5M customers; and within a day of the launch of lonelygirl15's My First Kiss clip on YouTube, it got over 5000 ratings. Clearly, rating is not something to neglect: it is an easy way for users to input their opinion, as well as a rather straightforward way to compute affinities in relation to other types of data such as search history, demographics, etc.

However, it seems there is more to rating than meets the eye, and it is becoming increasingly complicated for services to make the best use of it. At the Recommenders06 conference, issues with ratings were mentioned but not discussed thoroughly. To me this seems a highly important issue, as current services use rating as a primary input for their recommenders and many of the algorithms work on the basis of ratings.


The following issues came up with ratings (unordered list):
  • Semantics of rating are pretty unclear; what does a user actually mean by 3.5?
  • Meaning of ratings is very subjective; does my 2.5 mean the same as your 2.5? (A tiny normalisation sketch follows this list.)
  • Ratings are straight out ambiguous; on a scale of 1 to 5, does a one (1) mean that I never want to see it again, or just that I don't quite like it?
  • What does a single attribute actually mean when rating, for example, one movie; is it about the plot, the actors, the soundtrack? What if I like the plot but hate the main actor, how do I express that?
  • Love/hate ratings: many services are getting more and more ratings only at the far ends of the scale; the distribution of rating values is very spread out.
  • Binaries like thumbs up and down have issues too; how do I interpret something that has 10 thumbs up and 10 thumbs down? Am I going to take the risk of either really liking it or really hating it?
  • Rating variance; how does a 10-up, 10-down rating affect people's choices? Do people go for the middle way? Apparently not, see Jolie's presentation.
  • Only a few users have rated many items, and many have rated only a few – the distribution of ratings is very sparse. It is hard to recommend something to the many users with few ratings.
  • Rating distribution between genres: some genres are more predictable than others thanks to user ratings. How to recommend the ones not so often rated? On Netflix, comedies and dramas are more predictable than musicals, for example. The percentage of 4-5 star movies rented has increased as prediction accuracy has improved.
  • Do users understand what ratings are for? Do they really understand what ratings can do for them when using Yahoo! Music or Netflix? It's a trade-off between the control users gain over the service and the convenience they give up when taking the time to rate.
  • The feedback loop between ratings and recommendations can become self-reinforcing. If I rate something highly, the recommender keeps recommending that or similar items to me (also known as the similarity trap). There is also the popularity bias: in the end, everything is related to Britney Spears.
  • Knowing the users' intentions: wanna buy or listen?
  • Ratings depend on when the item was rated (Netflix found out that ratings made immediately after watching a movie differ from the ones made at a later stage!).
  • Ratings are vulnerable to shilling: intentionally bad ratings, attempts to lift some music up the list by rating it highly, etc. (influencing the outcome is relatively easy with some algorithms, whereas hybrids might be more robust against manipulation; see Mr Mobasher's presentation).
  • Does the user feel at home with the other raters? Am I sure that I belong to this group of users and tastes? For example, Last.fm started as a rather geeky service, so lots of users have rated items that match a geeky music taste! This becomes really important when we think about the internationalisation of recommender services: can my taste match a white-male-middle-class American taste?
  • Computing affinities with user profiles, editorial rankings, etc. can take a long time; for example, some Yahoo! services are only updated weekly since the computation is so intensive.
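
To make the subjectivity point above concrete to myself, here is a minimal sketch (my own toy example, not what any of these services actually does; the users, items and ratings are invented) of mean-centring each user's ratings, so that my 2.5 and your 2.5 are read relative to how each of us uses the scale.

```python
# A minimal sketch of mean-centring ratings per user, so that two users'
# raw scores become comparable. All data below is invented for illustration.

ratings = {
    # user -> {item: rating on a 1-5 scale}
    "anna": {"movie_a": 2.5, "movie_b": 5.0, "movie_c": 4.5},
    "bert": {"movie_a": 2.5, "movie_b": 3.0, "movie_c": 1.0},
}

def mean_centred(user_ratings):
    """Subtract the user's own average, so 0 means 'average for this user'."""
    avg = sum(user_ratings.values()) / len(user_ratings)
    return {item: round(r - avg, 2) for item, r in user_ratings.items()}

for user, user_ratings in ratings.items():
    print(user, mean_centred(user_ratings))
# anna {'movie_a': -1.5, ...} -> for Anna a 2.5 is well below her own norm
# bert {'movie_a': 0.33, ...} -> for Bert the same 2.5 is slightly above his norm
```

Real services presumably do something far more careful, but even this little trick makes the two 2.5s comparable.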

Ok, I think it's broken, but the question is: can it be fixed? That looks like a long list of issues to deal with, but I'm sure nothing is beyond repair.

However, many questions remain: How to help people who don't rate? How to better understand users' behaviour – what they like and what they don't – and get that information in a more implicit way? The following remedies were mentioned at the conference:
  • Going beyond ratings as data input for recommenders, e.g. by monitoring play events in an online radio.
  • Playlists uploaded by users can yield important information about how music is sequenced, the moods it is played in, etc.
  • Netflix talked about encoding traits of movies that predict emotional responses, for example. Maria, one of the students, talked about combining personality traits and mood settings to further personalise and contextualise recommendations.
  • Prof. Riedl talked about letting users know the value of their rating to the community, e.g. how important rating a given item is for making better recommendations for a given group. It seems that people care about others; they are willing to rate in order to help people similar to them find better items.
  • Using social networks to better make and find recommendations.
  • Improving ROI for users: with fewer inputs, get more valuable outputs like playlists, concerts, videos, music news, etc.

Some more ways that I could think of:
  • If binary types of ratings are what people actually do, let's just use thumbs up and thumbs down.
  • If scales are used, be explicit about them. No one really knows what the stars mean on an iPod! Say clearly: 0 means “never play again”.
  • Multi-attribute ratings: if you allow people to rate, also give them options to be clearer about it; I think the plot is good, but the acting sucks. There are people who love to do ratings and evaluations (just look at Amazon.com with its reviewer lists!) and many times they are good at it.
  • Leverage the re-use of ratings from other services: Netflix, Yahoo!, MovieLens and God knows how many other services rate the movies. Think about web services or harvesting those ratings to get rid of the sparsity problem! There should be some interoperability between user ratings and other evaluations across services (a small sketch of this follows the list).
  • I want a meta-recommender! It would be good to know whether my music taste matches other people's taste in a given service, or whether I should hang out somewhere else to get favourable recommendations.
  • Anyway, those services are too focused on getting people to use that one and only service; pooling profiles and letting users take advantage of their profile from place A in place B would be convenient for me! Maybe Attention metadata could help here: Attention XML and Attention Metadata: Collecting, Managing and Exploiting Rich Usage Information at the International ACM Workshop.
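
To play with the interoperability idea from the list, here is a rough sketch of harvesting ratings for one movie from several services and collapsing everything to thumbs up/down. The service names, scales and numbers are all made up; a real version would need proper APIs and a shared way of identifying the same movie across services.

```python
# A made-up sketch: merge ratings for one movie harvested from several
# services with different scales, by collapsing everything to thumbs up/down.

harvested = [
    # (service, scale_max, raw ratings for the same movie) - invented data
    ("netflix_like",   5,  [4, 5, 2, 4]),
    ("yahoo_like",     10, [9, 3, 8]),
    ("movielens_like", 5,  [3, 5]),
]

def to_thumbs(rating, scale_max):
    """Thumb up if the rating is above the midpoint of its scale, else down."""
    return "up" if rating > scale_max / 2 else "down"

ups = downs = 0
for service, scale_max, votes in harvested:
    for r in votes:
        if to_thumbs(r, scale_max) == "up":
            ups += 1
        else:
            downs += 1

print(f"{ups} thumbs up, {downs} thumbs down across {len(harvested)} services")
# -> 7 thumbs up, 2 thumbs down across 3 services
```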

Monday, September 18, 2006

Anousheh Ansari: currently in orbit

What an inspiring story, what an inspirational person! And yeah, a woman :)

Not that I want to overemphasise that latter fact, but I must say that it really gives me shivers and makes me smile proudly - we need these kinds of examples to inspire us. A quote from BBC World:
The Star Trek fan, who spent her early childhood in pre-revolutionary Iran, has spoken of the nights on the balcony gazing at the stars and a longing to become an astronaut.
Imagine that! And now she's in space, blogging away, after making it onto Fortune magazine's "40 under 40", ha! The kind of person that makes me want to achieve something too. Being an example to someone else and making a positive impact on them.

Touché!

Her spaceblog is at: http://spaceblog.xprize.org/

Wednesday, September 13, 2006

A meta recommender of movies and music


[Photo: Recommenders06, originally uploaded by alvy]
It would be fun to play around with a meta-recommender that compares recommendations from different services, for example for a movie. Say I'm in one service, e.g. Netflix, and have a hard time choosing a film from the big selection of movies. Instead of only knowing what other Netflix users thought about a movie, I could also see how users on Yahoo!, MovieLens, etc. rated it.

As we found out today, ratings, which are widely used in recommenders, are quite tricky things. There are lots of sparsity problems, variations in ratings, interpretations of the rating scale, not enough criteria or too many of them, etc. So it might be useful (or then totally not) to see how MovieLens and Amazon users rated a movie, to compare whether there is deep variation between them, and eventually to find a community whose taste is similar to yours.

Of course the data models are not the same and the rating scales differ, but some normalisation could be done, or everything could be mapped to thumbs up/down, or so. Besides, comparing the evaluation data across different applications would probably yield very interesting results!
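
As a toy illustration of "find a community whose taste is similar to yours": scale everything to 0..1 and compare my own ratings with each service's community averages using a naive cosine similarity. All names and numbers below are invented.

```python
# A rough sketch: which service's community rates the same movies most like me?
# All names and numbers are invented; ratings are scaled to 0..1 before comparing.
from math import sqrt

my_ratings = {"movie_a": 5, "movie_b": 2, "movie_c": 4}                       # 1-5 scale
community_avgs = {
    "netflix_like": {"movie_a": 4.5, "movie_b": 2.0, "movie_c": 4.0},         # 1-5 scale
    "yahoo_like":   {"movie_a": 4.0, "movie_b": 9.0, "movie_c": 6.0},         # 1-10 scale
}
scale_max = {"me": 5, "netflix_like": 5, "yahoo_like": 10}

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

mine = {k: v / scale_max["me"] for k, v in my_ratings.items()}
for service, avgs in community_avgs.items():
    theirs = {k: avgs[k] / scale_max[service] for k in mine}
    print(service, round(cosine(mine, theirs), 3))
# The service with the highest score is the community closest to my taste.
```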

Recommenders06 - round table and attention


[Photo: IMG_2840, originally uploaded by jjdonald]
As the round table discussion drifted from one thing to another, some things caught my attention - especially when the discussion turned to business models and future trends. It was mentioned that convergences between things are being found, like the pact between Apple and Nike.

It made me think further about the labour intensiveness of social applications on the web; you have to add your user profile, friends, music taste, etc. to make them fun to work with (some automation exists, but not quite enough..). The same thing with recommenders: first I spend time rating movies in MovieLens to get some recommendations. Then I go to Amazon.com to buy some movies or books, and they don't know shit about my taste as I haven't been shopping exclusively there.

Someone mentioned a "Universal profile", which apparently has been thought of at some point in the past. To me that sounds like a really bad idea. That would be one mother of a profile, a real monster keeping all the info about me and my intentions and attentions.

The idea of using an Attention metadata (AttentionTrust.org) kind of solution would be better. I own my attention, and I choose how much of it I want to share with a commercial entity to get better recommendations.

Some current work going on around Attention XML and Attention Metadata: Collecting, Managing and Exploiting Rich Usage Information at the International ACM Workshop, http://ariadne.cs.kuleuven.be/cama2006/

Tuesday, September 12, 2006

Recommenders 06

Yesterday, after arriving in Bilbao, Spain, some people from the conference got together to go around the town: Claudio, Nikos, Jolie and Iilja. So I already had a chance to talk with Claudio Baccigalupo (IIIA-CSIC, Spain) about his research.

(btw, blog postings with the presentations can be found at http://blog.recommenders06.com/)

His research is about automatically generating playlists based on Case-Based Reasoning (CBR). You can find a demo at http://labs.mystrands.com. As the case base for his recommender he uses playlists that MyStrands users have decided to upload. The system then analyses their ordering, and when a user says that she wants a playlist with F. Sinatra, the recommender creates one based on previous playlists. Of course, for the system to work well it is desirable that the case base is good, i.e. hopefully the playlists are created by semi-professional DJs, and not just by people without any understanding of how songs best fit together to create a nice flow.
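
I don't know the internals of Claudio's CBR system, so the following is only my own toy sketch of the general idea: count which song tends to follow which in the uploaded playlists, and grow a new playlist from a seed song. The playlists and song names are made up.

```python
# A toy sketch (not Claudio's actual system): learn song-to-song transitions
# from uploaded playlists and grow a new playlist from a seed song.
from collections import Counter, defaultdict

uploaded_playlists = [  # invented case base
    ["sinatra_my_way", "sinatra_fly_me", "dean_martin_volare", "bennett_smile"],
    ["sinatra_fly_me", "dean_martin_volare", "bennett_smile"],
    ["sinatra_my_way", "sinatra_fly_me", "bennett_smile"],
]

follows = defaultdict(Counter)
for playlist in uploaded_playlists:
    for current, nxt in zip(playlist, playlist[1:]):
        follows[current][nxt] += 1  # how often nxt was played right after current

def build_playlist(seed, length=4):
    """Greedily pick the most common follower of the last song so far."""
    playlist = [seed]
    while len(playlist) < length and follows[playlist[-1]]:
        playlist.append(follows[playlist[-1]].most_common(1)[0][0])
    return playlist

print(build_playlist("sinatra_my_way"))
# -> ['sinatra_my_way', 'sinatra_fly_me', 'dean_martin_volare', 'bennett_smile']
```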

Fun, will play around with it!

Friday, September 08, 2006

Notes and ideas on “tagging, communities, vocabulary, evolution”

Shilad Sen, Shyong K. Lam, Dan Cosley, Al Mamunur Rashid, Dan Frankowski, Franklin Harper, Jeremy Osterhouse, John Riedl. tagging, community, vocabulary, evolution. To appear in Proceedings of CSCW 2006.

This paper focuses on users and tags from the user point of view, not from the object-tag point of view.

The paper refers to the nine tag classes (it feels kinda contradictory to talk about classes for tags...) presented by Golder et al. (2006), and presents a three-class system:
  1. factual tags (Golder: item topics, kinds of item, category refinements)
  2. subjective tags (Golder: item qualities)
  3. personal tags (Golder: item ownership, self-reference, task organisation)

The authors conducted their research on the MovieLens system, where they had introduced a tagging feature that was tested over a period of a month. The distribution over the tag classes was:
  • 63% factual
  • 29% subjective
  • 3% personal
  • 5% other
Note: in our experiments we should look into finding classes too. In my quick analysis of personal bookmarks in Celebrate, the following were found:
  • names related to a broad subject area such as biology or mathematics (factual);
  • names that additionally indicated the intended audience (factual);
  • names that only indicated the intended audience (factual);
  • names that indicated a sub-area such as “trigonometria,” a precise theme such as “brain,” or a pluridisciplinary subject such as “water” (factual);
  • names that identified other things, such as “easy” (subjective);
  • many names of people and other acronyms whose meaning was not identifiable to an “outsider” (personal);
  • names were mostly given in the language of the user.

Some findings worth mentioning, and things to look for or compare with in learning resources experiments

About half of the tags were tags that the user had previously applied; from this they conclude that habit and investment clearly influence tagging behaviour and grow stronger as users apply more tags (I'm not sure whether they mean as the tagger uses more tags (a bigger variety of tags) or as more items are tagged in general (using a small variety of tags..)). However, they state that habit and investment aren't the only factors that contribute to vocabulary evolution.
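
A quick sketch of how one could measure that "about half of the tags were previously applied" figure on our own data: walk through each user's tag applications in chronological order and count how many re-use a tag that the user has already applied. The event log below is invented.

```python
# A small sketch: what fraction of tag applications re-use a tag that the
# same user has applied before? The event log below is invented.

events = [  # (user, item, tag), in chronological order
    ("u1", "res_1", "biology"),
    ("u1", "res_2", "biology"),
    ("u1", "res_2", "easy"),
    ("u2", "res_1", "maths"),
    ("u2", "res_3", "maths"),
    ("u1", "res_3", "easy"),
]

seen = {}      # user -> set of tags that user has already applied
reused = 0
for user, item, tag in events:
    if tag in seen.setdefault(user, set()):
        reused += 1
    seen[user].add(tag)

print(f"{reused}/{len(events)} tag applications re-used an earlier tag")
# -> 3/6 tag applications re-used an earlier tag
```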

On the act of tagging

Users who view tags by other people before applying their first tag are more likely to have their tags influenced by the tagging community. So the community affects a user's personal vocabulary, and the effect on the user's first tag is stronger if they have been exposed to others' tags. Something to consider when designing a tool!

On the convergence of vocabularies

It seems (quite self-evidently) that if people see others' tags while they are tagging (e.g. tags are proposed, or autofilled when typing, ...), vocabularies are more likely to converge than if users are working on their own. This is important to think about when we are designing our tool: are we going to show only the user's own tags, all other users' tags, only the most used tags, or a ready-made set of desirable tags for a given resource? Plus, how will thesaurus terms affect the tagging culture? This could actually provide an interesting research opportunity: give users different tagging interfaces and see how the tags differ!
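
If we did run such an experiment, one simple way to compare the interface conditions would be to measure how much users' tag vocabularies overlap, for example with Jaccard similarity. A minimal sketch over invented vocabularies:

```python
# A minimal sketch: average Jaccard overlap between users' tag vocabularies
# as a rough convergence measure. Higher average overlap = more converged.
from itertools import combinations

vocabularies = {  # invented: user -> set of tags they have used
    "u1": {"biology", "cells", "easy"},
    "u2": {"biology", "cells", "photosynthesis"},
    "u3": {"maths", "trigonometry"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

pairs = list(combinations(vocabularies.values(), 2))
avg_overlap = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
print(round(avg_overlap, 2))  # compare this figure between interface conditions
```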

On tags and how they are useful for different user tasks

Different classes of tags prove useful for different tasks that the user has. The “taskonomy” is: self-expression (helps to express an opinion), organising, learning about the given movie, finding, and decision support.

It seems that personal tags are found useful only for self-organising (which hints in the direction I was thinking of regarding their usefulness for PKM), whereas factual tags are good for learning about a movie and for helping to find it. Subjective tags are found overwhelmingly good for self-expression, but also for supporting the decision-making process (!) (although only 1/3 of the people who did not tag thought so).

All in all, tags were mostly found useful for self-expression and organising. Note that 23% of the people who did not tag also found them helpful for organising! What is interesting, and worth noting for our development, is that people did not like seeing other people's personal tags. The authors mention that a design decision may need to be taken on whether to offer some way of keeping tags private to the tagger, i.e. how to strike a balance between the other benefits of tagging and the privacy that people might need.

Another design question relates to people who did not tag: there were overwhelmingly more non-taggers in the group that had not seen examples of tags than in the group that had seen them in their tagging interface. To remember, though: pre-existing tags affect future tagging behaviour.

Moreover, the authors suggest that it could be useful to find ways to classify tags into the above-mentioned classes; both automated ways to infer tag classes and interface designs should be considered.

Good references to check:

[4] C. Cattuto, V. Loreto, and L. Pietronero. Semiotic dynamics in online social communities. In The European Physical Journal C (accepted). Springer-Verlag, 2006.
[5] R. B. Cialdini. Influence Science and Practice. Allyn and Bacon, MA, USA, 2001.
[6] D. Cosley, S. K. Lam, I. Albert, J. Konstan, and J. Riedl. Is seeing believing? How recommender system interfaces affect users’ opinions. In CHI, 2003.
[9] S. Golder and B. A. Huberman. The structure of collaborative tagging systems. Journal of Information Science (accepted), 2006.
[10] M. Guy and E. Tonkin. Tidying up Tags? D-Lib Magazine, 12(1):1082–9873, 2006.
[11] T. Hammond, T. Hannay, B. Lund, and J. Scott. Social bookmarking tools: A general review. D-Lib Magazine, 11(4), April 2005.

Tuesday, September 05, 2006

Questions and notes on Making Recommendations Better: An Analytic Model for Human-Recommender Interaction

S.M. McNee, J. Riedl, and J.A. Konstan. "Making Recommendations Better: An Analytic Model for Human-Recommender Interaction". In the Extended Abstracts of the 2006 ACM Conference on Human Factors in Computing Systems (CHI 2006) [to appear], Montreal, Canada, April 2006.

The paper starts from the healthy self-assertion that recommenders do not always generate good recommendations for users. The authors propose Human-Recommender Interaction (HRI) as a framework and methodology for understanding users, their tasks and how they relate to recommender algorithms. They propose that HRI can be a bridge between user information-seeking tasks and recommender algorithms; when applied in the HRI Analytic Process Model, it can become a constructive model to support the process of designing a recommender.

As information density grows, users have more specific needs for their information seeking. HRI can be used to describe these needs. So, firstly, thinking about myself and my research area, I have to describe user types (probably I could use the LRE logs to deduce this) and typical domain tasks (these would have to be guesstimates that I test with a focus group). The authors suggest Hackos 1998 for this, but it looks pretty old. A detailed analysis of these tasks will allow us to link tasks to specific HRI Aspects.


HRI Aspects, the three pillars:

  • the Recommendation Dialog, the act of giving information to and receiving one recommendation list from a recommender. This covers aspects like Correctness, Transparency, Saliency, Serendipity, Quantity, Usefulness, Spread and Usability. The authors argue that a recommender's purpose is to generate salient recommendations that strike an emotional response (the awe factor!).
  • the Recommender Personality (uh, I don't like that term), the user's perception of the recommender over a period of time. Aspects include personalisation, boldness, adaptability, trust/first impression, risk taking/aversion, affirmation, pigeonholing and freshness.
  • User Information Seeking Tasks, the reason the user came to the recommender system. Aspects include concreteness of task, task compromising, recommender appropriateness, expectations of recommender usefulness, and recommender importance in meeting needs. Check out Case 2002.
The authors claim (still without any proof of concept, as the paper is pretty recent) that a user's needs and expectations from a recommender can be described by selecting the most relevant aspects from each pillar.

The HRI Analytic Process Model can help to analyse and redesign recommenders to better meet users' information needs. (Can it also help to design them in the first place?) For example, it could help to understand whether a user would be content with risky recommendations or would prefer ones that affirm her information-seeking needs.

Moreover, the authors say that by looking at which HRI aspects are important to which task, metrics can be designed (I would be very interested in those metrics!) to characterise the differences between tasks. These metrics could be used to benchmark the known algorithms, and thus help to choose the proper one for a task. Indeed, as the authors state, a recommender should have a set of algorithms to draw on instead of being a “one for all users” type of system.

Questions for Mr. Riedl
  • What are the metrics, any examples?
  • What are the outcomes of the simulations against the well-known algorithms, did the mapping between tasks and algorithms materialise, and if not, how well did it do? Is more information available? The paper mentions that the results are submitted and under review. Whom to contact?

More on HRI, a PhD thesis by McNee: http://www-users.cs.umn.edu/~mcnee/mcnee-thesis-preprint.pdf
Research statement by the above: http://www-users.cs.umn.edu/~mcnee/mcnee-research-statement.pdf

Sunday, September 03, 2006

On Social Network Analysis and Recommenders

An interesting area where I drifted today is the crossing point of recommender systems and social network analysis (SNA). I read a few papers in a row about it (Rashid et al., 2005; Korfiatis et al., 2006, not published; Garcia-Barriocanal & Sicilia, 2005).

The other day I chatted with my PhD study-buddy about SNA and it was actually quite enlightening. I was seeking to understand the difference between what most (old-school) recommenders do and what SNA has to offer here. SNA is used to better figure out what groups do and how they form, etc., but I lacked an understanding of how to use this for what I want to do, i.e. enhance the discovery and re-use of LOs in a repository.

I am an avid believer in and lover of social bookmarks. That's it, it's out. I think we could do so many things better just by doing that. I, of course, just have to prove that in my PhD, and find a way to prove it, so it helped to talk with my buddy who is researching stuff somewhere between behavioural economics and social network theory. He had the words that I was lacking for social navigation – you get into a social space and you don't have any clues about what is out there. What do people do? They follow others; they need a guide. Say you see other people going one way, and you follow. That is what I see social bookmarks offering you: a guide to go ahead, a direction, a pointer to start from. But there is also another aspect to it: bookmarks offer connections, relations between me and the things I like, and then again between the things I like and other people who like the same things.

Which brings me to: how can we leverage this for information retrieval (IR)? Sicilia and Garcia (2005) and Korfiatis et al. (2006, not published) talked about this: bridging the areas of Social Network Analysis (SNA) and Information Retrieval. In a way, the famous PageRank is already about social networks: who endorses whom in the form of a hyperlink. The only problem is that we also link to things that we don't care about... but back to recommenders...

Until somewhat recently recommenders were about ratings and explicit values that people gave to items. The big deal was inferring those values for users who had not explicitly done that or even interacted with the item. Nowadays we are moving into using all other kinds of data as an input for recommenders, like the context-aware attention metadata that my colleagues are looking into.

The idea of the Contextual Attention Metadata (CAM) framework is that it would log data from the different applications that a user uses for e-learning purposes. The fact is that nowadays we are getting further and further away (at least mentally) from single big Learning Management Systems (LMS) and are more and more looking into using small “comfy” tools (IM, bookmarks, wikis, blogs, ..) for learning purposes too. All these tools can generate attention metadata, and a framework like CAM could track it. A step ahead of conventional data mining of separate and sparse log files.

So now we are looking into contextual attention metadata that can arch across application boundaries and tell us things like: after watching that educational movie, learner 3 contacted a tutor by IM and then spent an hour working in a text editor while surfing the Web using keywords x and y. From that we can try to deduce things (like how the learner actually uses the learning tools and material) that we could use to make more personalised recommendations.
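
I'm only guessing at what the actual CAM records look like, but roughly the point is that each tool emits small attention events into one stream that can then be queried across application boundaries. A toy sketch (all field names and values invented):

```python
# A toy sketch of cross-application attention events (not the real CAM schema;
# all field names and values are invented for illustration).
from datetime import datetime

events = [
    {"learner": "learner_3", "tool": "video_player", "action": "watched",
     "object": "educational_movie_42", "time": datetime(2006, 9, 3, 10, 0)},
    {"learner": "learner_3", "tool": "im_client", "action": "contacted",
     "object": "tutor_anna", "time": datetime(2006, 9, 3, 10, 35)},
    {"learner": "learner_3", "tool": "text_editor", "action": "edited",
     "object": "essay_draft.doc", "time": datetime(2006, 9, 3, 10, 40)},
    {"learner": "learner_3", "tool": "browser", "action": "searched",
     "object": "keyword_x keyword_y", "time": datetime(2006, 9, 3, 11, 0)},
]

def trail(learner):
    """Reconstruct one learner's activity trail across all tools."""
    return sorted((e for e in events if e["learner"] == learner),
                  key=lambda e: e["time"])

for e in trail("learner_3"):
    print(e["time"].strftime("%H:%M"), e["tool"], e["action"], e["object"])
```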

What I find more interesting, though, is the social context, like PeopleRank (Garcia-Barriocanal & Sicilia, 2005; Korfiatis, 2006, not yet published), which could be used to complement something like PageRank. PeopleRank would use social ties, i.e. the links that people have expressed in a FOAF file, to complement the “conventional” PageRank algorithm. That's cool, all right, although right off the bat I feel like I prefer Yahoo!'s MyRank, which also uses a FOAF description on top of their conventional search algorithm. Moreover, I would be interested in finding some other ways to use the FOAF file, which I'm trying to think of. Maybe some more interesting things could indeed, as suggested by Garcia-Barriocanal & co, come from using FOAF to express relations between organisations or groups (schools, educational projects, .. like we could use it in our EUN context), instead of individuals.
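
I haven't seen the PeopleRank details, so this is just my own toy sketch of the underlying idea: take the "knows" links from FOAF files as a directed graph and run a plain PageRank-style power iteration over it, so that people endorsed by well-endorsed people rank higher. The names and links are invented.

```python
# My own toy sketch of the idea behind ranking people by their FOAF links
# (not the actual PeopleRank algorithm). Names and 'knows' links are invented.

knows = {  # person -> people they declare to know in their FOAF file
    "alice": ["bob", "carol"],
    "bob":   ["carol"],
    "carol": ["alice"],
    "dave":  ["carol"],
}

def people_rank(graph, damping=0.85, iterations=50):
    people = list(graph)
    rank = {p: 1 / len(people) for p in people}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(people) for p in people}
        for person, friends in graph.items():
            for friend in friends:
                new[friend] += damping * rank[person] / len(friends)
        rank = new
    return rank

for person, score in sorted(people_rank(knows).items(), key=lambda x: -x[1]):
    print(person, round(score, 3))
# carol should come out on top: she is 'known' by the most (and best-connected) people
```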

Well, back to my bookmarks and tags: I'm interested in observing what happens in a repository of LOs where users can bookmark learning resources and socially navigate them through other people's collections, where tags are used, and where people can rate and evaluate the LOs that they have in their collections. Furthermore, we would like to facilitate the creation of lesson plans, much as one would create playlists in iTunes.

Recommending educational material to teachers and learners, automatically sequencing course material or aggregating learning resources, and delivering personalised learning has in many research-oriented projects relied on pedagogical concepts, learners' learning styles, assessment of previous knowledge and skills, etc. This is probably very useful and undoubtedly has a lot of potential. (First we only need some kind of standardised testing to assess skills, and then a plentiful pool of varied learning resources that suit every different learning style – oh yeah, and which definition of learning styles are we going to use...).

Instead, I'm interested in tapping into the social power of a group of educators and their knowledge about which learning resources to use and in which cases. Instead of looking into the personalisation side of things, I want to see what happens if we just look into the socialisation side of things. A "do as others have done" kind of idea. If other people cross the street here, maybe I should cross it here too.

Of course, we would have to assume that there is some personalisation going on; each case is unique, after all. But still, many cases do resemble one another. And maybe looking into social navigation in an educational context can help us unlock the problematic and labour-intensive questions of recommending educational material.

Additionally, bookmarking comes with tagging: user-generated keywords that people can assign to material in order to find it later. That's the personal knowledge management side of things. Tags can also create communities; people interested in the same things eventually end up using similar names/tags, and thus a link is formed. Tags also help us better understand the different meanings and ways in which people can understand “a thing”, etc. In the LOR context, tags could make explicit some of the teachers' "folk pedagogy" type of knowledge. Folk pedagogy can be an accumulated set of beliefs, conceptions and assumptions that professors personally hold about the practice of teaching (Bruner, 1996). Maybe this can also unlock something that we don't know of yet.

Well, some thoughts that I've decided to write down to keep track of my thinking.