Tuesday, August 29, 2006

Notes and comments on: Accurate is not always good: How Accuracy Metrics have hurt Recommender systems

S.M. McNee, J. Riedl, and J.A. Konstan. "Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems". In the Extended Abstracts of the 2006 ACM Conference on Human Factors in Computing Systems (CHI 2006) [to appear], Montreal, Canada, April 2006

The paper starts by informally arguing that "the recommender community should move beyond the conventional accuracy metrics and their associated experiment methodologies. We propose new user-centric directions for evaluating recommender systems".

The paper states that the current accuracy metrics, such as MAE (Herlocker 1999), measure recommender algorithm performance by comparing the algorithm's prediction against a user's rating of an item. The authors continue that this means, in essence, that a recommender that recommends places the user has already visited would be rewarded, rather than one that recommends new places that might be of interest. Clearly, if that is the case, there is something rotten.
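To make the complaint concrete, here is a minimal sketch of how MAE (mean absolute error) is computed. The ratings are made up for illustration and the function name is my own, not something from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical ratings on a 1-5 scale, for places the user has ALREADY rated.
actual_ratings = [5, 4, 2, 5]
predicted_ratings = [4.5, 4, 3, 5]

mae = mean_absolute_error(predicted_ratings, actual_ratings)
print(mae)  # 0.375
```

Note that the metric can only be computed for items the user has already rated, so it says nothing about how useful a recommendation of an unknown place would have been.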

The paper discusses three aspects: similarity, recommendation serendipity, and the importance of user needs and expectations in a recommender, and suggests how they could be improved.

A) Similarity

- The item-item collaborative filtering algorithm can trap users in a "similarity hole", giving only similar recommendations. This becomes more problematic when there is less data, for example, for a new user in a system.

The authors go on to discuss how accuracy metrics fail to recognise this problem, because they are designed to judge the accuracy of individual items and not the list of items. However, the authors argue, "the recommendation list should be judged for its usefulness as a complete entity, not just as a collection of individual items." There was evidence from user testing that the lists that had performed badly on conventional accuracy measures were the ones preferred by users. These lists had been produced using the Intra-List Similarity metric and the process of Topic Diversification for recommendation lists (Ziegler 2005).
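A rough sketch of what a list-level measure could look like, as opposed to a per-item one. This is a simplified stand-in and not Ziegler's exact formula: I assume similarity between two items is the Jaccard overlap of their topic sets, and I average it over all pairs in the list. The movie topics are invented for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two topic sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intra_list_similarity(items, topics):
    """Average pairwise similarity of a recommendation list (lower = more diverse)."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(topics[x], topics[y]) for x, y in pairs) / len(pairs)


# Hypothetical topic assignments.
topics = {
    "Goodfellas": {"crime", "drama"},
    "Casino": {"crime", "drama"},
    "Heat": {"crime", "thriller"},
    "Amelie": {"romance", "comedy"},
}

narrow_list = ["Goodfellas", "Casino", "Heat"]
diverse_list = ["Goodfellas", "Heat", "Amelie"]

narrow_score = intra_list_similarity(narrow_list, topics)
diverse_score = intra_list_similarity(diverse_list, topics)
print(narrow_score > diverse_score)  # True
```

The point is only that the score belongs to the whole list: swapping one item changes it, even if every single item on its own is an "accurate" recommendation.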

The authors go on to say that, depending on the user's intentions, the makeup of items appearing on the list affected the user's satisfaction with the recommender. Here, in my opinion, it becomes important to remember the user intentions as described by Swearingen & Sinha (2001):
  1. Reminder recommendations, mostly from within genre (“I was planning to read this anyway, it’s my typical kind of item”)
  2. “More like this” recommendations, from within genre, similar to a particular item (“I am in the mood for a movie similar to GoodFellas”)
  3. New items, within a particular genre, just released, that they / their friends do not know about
  4. “Broaden my horizon” recommendations (might be from other genres)
Comment: So, taking all this into account, what can we think of the lists for learning resources?

B) Serendipity

This is how unexpected the recommendation is for the user and how novel it is. This is hard to measure. The authors approach the issue through its opposite: the ratability of received recommendations, which, they say, is easy to measure using the "leave-n-out" approach. However, the assumption that users are interested in the most ratable items does not always hold for recommenders. As an example, they describe recommending the Beatles' White Album to users of a music recommender as a bad idea, as it adds almost no value.
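As I understand it, "leave-n-out" means hiding n of a user's known items and checking whether the recommender brings them back. Below is a minimal sketch under my own assumptions: the data, the function names and the most-popular-first recommender are all hypothetical, and the popularity counts are taken from the full data set for simplicity, which a careful experiment would not do.

```python
import random
from collections import Counter


def popularity_recommender(user_items):
    """Build a recommender that always suggests the most widely owned items."""
    counts = Counter(item for items in user_items.values() for item in items)

    def recommend(visible, k):
        ranked = sorted(counts, key=lambda item: (-counts[item], item))
        return [item for item in ranked if item not in visible][:k]

    return recommend


def leave_n_out_hit_rate(user_items, recommend, n=1, k=2, seed=0):
    """Hide n items per user and count how many come back in the top k."""
    rng = random.Random(seed)
    hits = total = 0
    for items in user_items.values():
        if len(items) <= n:
            continue  # too little data to hide anything
        hidden = set(rng.sample(sorted(items), n))
        top_k = recommend(items - hidden, k)
        hits += len(hidden & set(top_k))
        total += n
    return hits / total if total else 0.0


# Hypothetical music collections.
user_items = {
    "ann": {"White Album", "Abbey Road", "Kind of Blue"},
    "bob": {"White Album", "Abbey Road"},
    "cat": {"White Album", "Blue Train"},
}

hit_rate = leave_n_out_hit_rate(user_items, popularity_recommender(user_items))
print(f"hit rate: {hit_rate:.2f}")
```

A recommender that pushes the White Album to everybody can do well on this kind of test, which is exactly why a high score here does not mean the recommendations are novel or valuable.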

I remember a similar example from somewhere else about recommending that people buy bananas when they go shopping. Apparently people almost always buy bananas anyway, so there is no commercial value in it. It could, though, have some value for building people's trust in a recommender.

Moreover, the authors point out that different algorithms give different recommendations, and that people preferred one over another depending on their current task (think again about Swearingen/Sinha). To conclude on serendipity, the authors say that other metrics could be needed to judge a variety of algorithm aspects - no direction given on this one, though.

C) User experiences and expectations

New users have different needs from experienced users. Rashid (2001) has shown that the choice of algorithm for a new user greatly affects the experience (really?!), and also, apparently, the native language is greatly preferred (Torres 2004). I wonder what kind of language groups were in question there...

- Moving forward

The authors don't suggest that the old-school metrics should be thrown away, only that they should not be used alone; we need to think of the users, who want meaningful recommendations (!!).

Firstly, it is recommended that instead of looking at each item on the list of recommendations, one should pay more attention to the quality of the list as a whole, using metrics such as the Intra-List Similarity metric and others of that kind.

We should test more thoroughly what kind of search algorithms users like and give them those.

Users have a purpose when they expect a recommendation, so we need to know better what the user's needs actually are when they come to see a recommendation (Zaslow 2002).

Well, well, if this is where we are at with recommender usability studies, it is not much. However, it is great that important people such as the GroupLens researchers tell us this, so maybe it makes the general audience more receptive to new things to come.


Swearingen, K. and Sinha, R. Beyond Algorithms: An HCI Perspective on Recommender Systems (2001).

Torres, R., McNee, S.M., Abel, M., Konstan, J.A., and Riedl, J. Enhancing digital libraries with TechLens+. In Proc. of ACM/IEEE JCDL 2004, ACM Press (2004), 228-236.

Ziegler, C.N., McNee, S.M., Konstan, J.A., and Lausen, G., Improving Recommendation Lists through Topic Diversification. In Proc. of WWW 2005, ACM Press (2005), 22-32.

Zaslow, J. If TiVo Thinks You Are Gay, Here's How To Set It Straight --- Amazon.com Knows You, Too, Based on What You Buy; Why All the Cartoons? The Wall Street Journal, sect. A, p. 1, November 26, 2002.

Wednesday, August 23, 2006

Why Google is not a content-based recommender

Yesterday in the HMDB-bookclub that we run in my unit, we read and discussed a paper on recommender systems (Adomavicius & Tuzhilin, 2005, Toward the Next Generation of Recommender Systems). As this is my topic of research, I was very eager to hear how my study-buddies perceived the issue and what they had to say.

The discussion turned to understanding the two main approaches to producing recommendations: content-based (CB) and collaborative recommendation. There were questions, and attempts to answer them, which left me unsatisfied after the session. Mainly, we left with the impression that Google, or any information retrieval system, would be, at the end of the day, just a content-based recommender. I was somewhat troubled by this thought and set myself on a quest to understand better what there is to discover.

Let’s go first by definition: Konstan et al. (2005) say:
Unlike ordinary keyword search systems, recommenders attempt to find items that match user's tastes and the user’s sense of quality, as well as syntactic matches on topic or keyword. For example, a music recommender will use an individual’s prior taste in music to identify additional songs or albums that may be of interest.

When Adomavicius et al (2005) talk about CB approach, they state that it has its roots in information retrieval and filtering research, but
the improvements over the traditional information retrieval approaches comes from the use of user profiles that contain information about user’s tastes, preferences, and needs. The profiling information can be elicited from users explicitly, e.g., through questionnaires, or implicitly- learned from their transactional behavior over time.

In the regular Google search there is no account, whereas to produce both content-based (CB) and collaborative filtering (CF) recommendations we need an account that we can assign to the user. An individual user profile is built on top of this.

In the CB recommendation a user is recommended items similar to the ones preferred in the past. This means that we need a search history, i.e. a user profile, where we can identify what the user has preferred in the past.
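A toy sketch of that idea, under my own assumptions: each item is described by a set of keywords, the user profile is just a count of the keywords of the items preferred in the past, and unseen items are ranked by how well their keywords match the profile. The page names, keywords and function names are all hypothetical.

```python
from collections import Counter


def build_profile(liked_items, item_keywords):
    """Count how often each keyword appears among the items the user liked."""
    profile = Counter()
    for item in liked_items:
        profile.update(item_keywords[item])
    return profile


def recommend_content_based(liked_items, item_keywords, k=2):
    """Rank unseen items by keyword overlap with the user's own history."""
    profile = build_profile(liked_items, item_keywords)
    scores = {
        item: sum(profile[word] for word in keywords)
        for item, keywords in item_keywords.items()
        if item not in liked_items
    }
    # Highest score first; ties broken alphabetically to keep the order stable.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]


# Hypothetical pages and their keywords.
item_keywords = {
    "page_a": {"recommender", "metrics"},
    "page_b": {"recommender", "music"},
    "page_c": {"patents", "law"},
    "page_d": {"metrics", "evaluation"},
}

recommended = recommend_content_based(["page_a"], item_keywords)
print(recommended)  # ['page_b', 'page_d']
```

Without the history in `liked_items` there is no profile, and the function has nothing to rank by. That is the gap between this and an anonymous keyword search.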

Thus, to generate a rather complete user profile that can be used to find similarities between items (not people!), things like a history of viewed pages, bookmarked pages, the purchase history, a “wish list”, and heuristic text analysis are important (implicit rating/input). Conventionally, especially with the first generation of recommenders, explicit ratings were the main input:

...mid-1990s when researchers started focusing on recommendation problems that explicitly rely on the ratings structure. In its most common formulation, the recommendation problem is reduced to the problem of estimating ratings for the items that have not been seen by a user...Once we can estimate ratings for the yet unrated items, we can recommend to the user the item(s) with the highest estimated rating(s) (Adomavicius, 2005)

Additionally, CB systems often use further information as input, such as demographics, specific interests, location, etc., that is part of the user’s self-declared profile. Maybe in the future this type of information could be extracted from other sources, such as blog postings, as was suggested during the session.

So, to get closer to the answer to the question, whether Google is just a content-based recommender, we can say that if used anonymously, it is not, although probably many of the techniques are the same. However, if we think of Google Personalized Search (beta) it for sure gets to be one.

The second somewhat baffling issue was the name collaborative filtering since, as it turns out, there is no collaboration between the users to produce any recommendations. In the CF recommendation the user is recommended items that people with similar tastes and preferences liked in the past. This means that we need a history for this person, too, in order to find similarities in tastes and past experiences.

The strength of the CF approach at this stage is that even if you personally haven’t seen a link, product or whatever object we are talking about, or indicated to the system what you thought of it, there most likely is someone in your nearest neighbourhood who has. Thus, in CF the values used to compute the recommendations are inferred from similarities between profiles, and you don’t necessarily need to have rated the item yourself. So, here lies the one main divider between CB and CF as for the input for the recommender: CB only uses YOUR history, whereas CF uses other users’ history to better understand, or guesstimate, yours.
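A small sketch of the neighbourhood idea, with assumptions of my own: similarity between two users is derived from how closely their ratings agree on the items both have rated, and the prediction is the similarity-weighted average of the neighbours' ratings. Real systems typically use measures such as Pearson correlation or cosine similarity; the users and ratings here are invented.

```python
def similarity(a, b):
    """Closeness of two rating profiles on co-rated items (1.0 = identical)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[i] - b[i]) for i in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)


def predict_rating(ratings, user, item):
    """Estimate the user's rating for an unseen item from similar users."""
    weighted_sum = weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


# Hypothetical ratings on a 1-5 scale; "me" has never rated item "c".
ratings = {
    "me": {"a": 5, "b": 1},
    "twin": {"a": 5, "b": 1, "c": 5},
    "opposite": {"a": 1, "b": 5, "c": 1},
}

predicted = predict_rating(ratings, "me", "c")
print(round(predicted, 2))  # 4.33
```

The estimate for "c" comes entirely from other people's ratings, pulled towards the neighbour whose history looks most like mine.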

Well, this is stuff explained in short, more and better arguments are found in the papers and in my links at: http://www.furl.net/members/vuorikari/recommendation

Adomavicius & Tuzhilin, 2005, Toward the Next Generation of Recommender Systems

J.A. Konstan, N. Kapoor, S.M. McNee, and J.T. Butler. "TechLens: Exploring the Use of Recommenders to Support Users of Digital Libraries". A Project Briefing at the Coalition for Networked Information Fall 2005 Task Force Meeting, Phoenix, AZ, December 2005.

Wednesday, August 16, 2006

Yet Another Summer school: The Present and Future of Recommender Systems

September 12-13, 2006 Bilbao, Euskadi

Now I am very excited: I was accepted to a summer school, well, rather a kind of corporate conference, on "The Present and Future of Recommender Systems".

It'll be kick-ass, even Chris Anderson, The Long Tail-guy and John Riedl from the GroupLens will be among speakers!

The event is organised by MyStrands, an online music service, a recommender for music with all the gadgets: using tags, recommending related music, tracking trends, etc. The easy way to feed the system is to let it hook into your iTunes. They'll check all the music you have there, and recommend something that you will probably like.

I'm testing it a bit, so now when I turn my iTunes on, it turns the recommender on, and I get some 5 to 10 items that the system thinks I'd like. You can rate them, tag songs, buy them (oh really?!) and also find similar profiles of other users. The site looks good and seems to work pretty well. However, I guess I should really use it more than a few times to know if it really works, and I guess I should at least try and buy a few tunes - just to see whether the match is made in heaven! And, ...just to see whether they can really persuade me to take the action they desire - to consume.

There have been some interesting usability studies on recommender systems by Kirsten Swearingen & Rashmi Sinha a few years back. These were somewhat outside the general strand of research in the field, as they asked for users' opinions (really!) on the usability issues and, most importantly, whether they liked the recommendations. It seems to me, when reading the literature, that most studies don't even give a heck whether the systems will ever be used by end-users, but are busy proving the accuracy of algorithms in some bizarre mathematical ways. They come up with stuff along the lines of: yes, this recommender works, it recommends to the user books by her favourite author. Right, just like one didn't know that...

This brings me to think what is it actually that people might want from a recommender system? Do they want it to help them to discover new items that they are not aware of, or just give good secure recommendations on the items that they already feel comfortable with?

Kirsten Swearingen & Rashmi Sinha go for "Different Strokes for Different Folks"-approach on user's needs:
  • Reminder recommendations, mostly from within genre (“I was planning to read this anyway, it’s my typical kind of item”)
  • “More like this” recommendations, from within genre, similar to a particular item (“I am in the mood for a movie similar to GoodFellas”)
  • New items, within a particular genre, just released, that they / their friends do not know about
  • “Broaden my horizon” recommendations (might be from other genres)
From: Beyond Algorithms: An HCI Perspective on Recommender Systems
Kirsten Swearingen & Rashmi Sinha

ps. the pic was done by Adam, Jehad's son. I will have to ask him for cc-licence, so for now it's copyrighted.

Friday, August 04, 2006

Showcase demonstration on the absurdity of software patents

This week's buzz has been the press release by Blackboard Inc. that announced, well, that Blackboard has actually invented e-learning or, at least, virtual learning environments. S. Downes gives a good rundown of the blog postings on the issue here and in today's OLDaily.

As many have noted and protested, there is a plethora of cases of prior art for what Blackboard Inc. claims to have invented, and for what the U.S. patent authority has granted them.

A great initiative called "History of virtual learning environments" has started in Wikipedia; it is currently collecting and documenting our e-learning history in the form of cases of prior art in the area of virtual learning environments. This will be an indispensable source of information, a sort of poor man's portfolio of counter-arguments, whenever it comes to patent litigation in court over this issue. Which, it seems, could be anticipated; it sounds like Blackboard is giving some indication that it might use its 30-global-patents-and-patents-pending-portfolio aggressively. From the FAQ:
"My institution doesn't use a Blackboard system but uses a competitor’s course management system. How are we affected?"
Answer: "Evaluating patents can be complex and because we don’t know the specifics of how your system works, we would encourage you to consult with your CMS provider for answers."
Just imagine all the CMS providers freaking out about this! Another question is whether anyone can afford to oppose this patent in court, as potential targets might be open source initiatives like Moodle, Sakai, etc. and the educational institutions using them. However, as Mr. Attwell notes: let's hope that big companies will take care of the fight: SAP apparently has pending e-learning patents, too.

Moreover, the saga continues in other countries and continents where Blackboard Inc. has filed claims for the same patent. To be precise, we here in EU-land have also had our share of attention: the European Patent Office's database has a record of a pending claim on "EP1192615: Software Patent: Internet-based education support system and methods".

How that will affect our lives is still unclear. It is a known fact that the EU still gives priority to strengthening Intellectual Property Rights, but where software patents stand in that seems pretty cryptic to a common citizen.

After the last Public Consultation and public hearing on future patent policy in Europe in July, it seems that the two DGs in the European Commission can't find a common message to send out, whereas the European patent litigation agreement (EPLA) is being set up entirely without the Commission's involvement by the European Patent Convention (EPC).

Hey, little things count: we started collecting names on the petition to raise awareness against possible e-learning patents in Europe last Spring. This was to flag our concerns to people who are preparing the Public Consultation and public hearing on future patent policy in EU.

Keep discussing this issue and ask your colleagues and friends to sign the petition online! It's good to show our policy-makers and corporate folks that there are many people who do not want to be part of the software patent hell, but just want to get on with their work.

The petition to sign: http://flosse.dicole.org/?item=don-t-allow-software-patents-to-threaten-technology-enhanced-learning-in-europe