Wednesday, July 15, 2009

Google parsing microformats, e.g. ratings

In May, Google announced that it will start parsing microformats (on a small scale at first). Yahoo! came out with something similar last year, but on an even smaller scale.

This is pretty huge for end-user generated ratings! I must say that I did not see it coming in this way, which makes it even more exciting :)

..Google is releasing support for parsing and display of microformat data in their search results. .. anyone who marks their pages up with the appropriate microformat data will be able to make their information understandable by Google. This technology would allow you to explicitly search, for example, for only printers that had an average customer review of 3 stars or higher.

Holy smokes! This is cool, can't wait to see when it will first pop up in my search :)
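To make the quoted idea a bit more concrete, here is a minimal sketch of what reading review markup and filtering on "3 stars or higher" could look like. It assumes pages use hReview-style class names (`hreview`, `fn`, `rating`); the sample HTML, the `ReviewParser` class and the `reviews_at_least` function are made up for illustration and say nothing about how Google actually does it.

```python
from html.parser import HTMLParser

# Hypothetical page snippet marked up with hReview-style class names.
SAMPLE_HTML = """
<div class="hreview">
  <span class="item"><span class="fn">Printer A</span></span>
  <span class="rating">4.5</span>
</div>
<div class="hreview">
  <span class="item"><span class="fn">Printer B</span></span>
  <span class="rating">2</span>
</div>
"""


class ReviewParser(HTMLParser):
    """Collects item names and ratings from hReview-style markup."""

    def __init__(self):
        super().__init__()
        self.reviews = []     # one dict per review found
        self._current = None  # review currently being filled in
        self._field = None    # field whose text comes next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "hreview" in classes:
            # A new review starts here.
            self._current = {}
            self.reviews.append(self._current)
        elif self._current is not None:
            if "fn" in classes:
                self._field = "name"
            elif "rating" in classes:
                self._field = "rating"

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            value = float(text) if self._field == "rating" else text
            self._current[self._field] = value
            self._field = None


def reviews_at_least(html, min_rating):
    """Return the reviews whose rating is at least min_rating."""
    parser = ReviewParser()
    parser.feed(html)
    return [r for r in parser.reviews if r.get("rating", 0) >= min_rating]


good = reviews_at_least(SAMPLE_HTML, 3)
print([r["name"] for r in good])
```

The point is only that once the rating sits in a predictable place in the markup, a query like "3 stars or higher" becomes a simple filter.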

For a long time it has been hard to get enough ratings on items. This is a known problem, especially in the field of recommender systems, where it is called "sparse data". For example, you want to make a recommendation on music, but item x has 3 ratings, item y has 2 ratings, and so on. That is far too little for the recommendation algorithms that are out there. Take another example: a camera shop lets users rate its cameras, but gets very few reviews from them.

Now, however, there are other camera shops struggling with the same problem. Essentially, they all sell the same camera brands, they all have only a few ratings on each camera, and in the end none of them can do much with this small amount of rather anecdotal information.

There has been talk about a unique identifier set by the industry, for example, so that all camera sellers could use it and thus aggregate all the reviews and ratings together. Yep, you guessed it, there's maybe that one shop down the block that does not want to use it. I think a couple of years back Yahoo! came up with a very compelling paper reiterating the idea and trying to muster up enough consensus among industry and other players. Not much happened - and then, here is Google and microformats... beautiful :)
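A minimal sketch of the pooling idea, assuming every shop uses the same product identifier. The shop names, identifiers and ratings below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical data: each shop holds only a few ratings per product identifier.
shop_ratings = {
    "shop_a": {"camera-123": [4, 5], "camera-456": [3]},
    "shop_b": {"camera-123": [5], "camera-456": [2, 4]},
    "shop_c": {"camera-123": [3, 4]},
}


def pool_ratings(shops):
    """Merge the ratings from all shops, keyed by the shared identifier."""
    pooled = defaultdict(list)
    for ratings_by_id in shops.values():
        for product_id, ratings in ratings_by_id.items():
            pooled[product_id].extend(ratings)
    return dict(pooled)


def summarize(pooled):
    """Return (number of ratings, mean rating) for each identifier."""
    return {pid: (len(r), sum(r) / len(r)) for pid, r in pooled.items()}


summary = summarize(pool_ratings(shop_ratings))
for product_id, (count, mean) in summary.items():
    print(product_id, count, round(mean, 2))
```

No single shop has more than two ratings on any camera here, but pooled together one camera already has five. Still tiny, of course, but the effect grows with every shop that joins in.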

I'm interested in this because we face the same problem with the idea of federating learning resource metadata across repositories. As a result of sharing metadata, the same resource might end up being used in many different repositories, where users might be allowed to rate it. But that metadata on ratings or evaluations is VERY seldom shipped back to the mother ship.

The same goes for tags and bookmarks (and other tools that allow users to create collections or playlists). That could be valuable information for the repository that first federated the resource metadata out. By collecting back the varied annotations from different repositories, it could gain interesting information and eventually overcome the sparse data problem. Moreover, it would gain data about what works and in which context, which makes me think of "travel well" resources.

In action

Here is an example of a search for Palm's new phone, which I'm contemplating. I search for reviews only and the result list shows the ratings, but I cannot yet make a query asking for "palm pre" ratings greater than 3. Nice in any case.




I've had a few ideas on this with some colleagues, and I really look forward to seeing what Google comes up with. And how are they going to solve the issue of the different rating scales in use, and of multi-attribute ratings?
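Just to illustrate the rating scale issue, here is one simple way it could be approached: rescale every rating linearly to the range 0 to 1 before comparing or pooling. This is only a sketch under that assumption. The function names and the equal weighting of attributes are my own choices for the example, not something Google or anyone else has announced.

```python
def normalize(value, scale_min, scale_max):
    """Linearly rescale a rating from [scale_min, scale_max] to [0, 1]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= value <= scale_max:
        raise ValueError("value lies outside the rating scale")
    return (value - scale_min) / (scale_max - scale_min)


def combine_attributes(attribute_ratings, scale_min, scale_max):
    """Collapse a multi-attribute rating into one score.

    Assumption: every attribute counts equally; a real system might weight them.
    """
    scores = [normalize(v, scale_min, scale_max) for v in attribute_ratings.values()]
    return sum(scores) / len(scores)


# Hypothetical ratings of the same item from sites with different scales.
print(normalize(4, 1, 5))    # 4 stars out of 5
print(normalize(8, 1, 10))   # 8 points out of 10
print(combine_attributes({"usability": 5, "content": 3}, 1, 5))
```

A linear rescale ignores the fact that people use a 5-star scale and a 10-point scale differently, so treat it as a starting point for discussion only.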

Vuorikari, R., Manouselis, N., & Duval, E. (2007). Metadata for social recommendations: storing, sharing and reusing evaluations of learning resources. In D. H. Goh & S. Foo (Eds.), Social Information Retrieval Systems: Emerging Technologies and Applications for Searching the Web Effectively (pp. 87-107). Hershey, PA: Idea Group Inc. Retrieved from http://elgg.ou.nl/rvu/files/20/144/SIR_vuorikari_manouselis_duval_web.pdf.


Manouselis, N., & Vuorikari, R. (2009). What if annotations were reusable: a preliminary discussion. In M. Spaniol (Ed.), Advances in Web-Based Learning - ICWL 2009, Lecture Notes in Computer Science (Vol. 5686, pp. 255–264). Berlin Heidelberg: Springer-Verlag.

1 comment:

Unknown said...

Interesting read! Nowadays there is actually a huge debate about reviews. And if we want to talk about it, in my opinion not all the websites providing review management are the same. Portals like Yelp, TripAdvisor, Trustpilot, Google reviews and whatnot are spammed with fake reviews. That is why it is important to learn from those companies who enlist independent third-party review providers like eKomi, BazaarVoice, etc. in order to read verified info from true critics.
These kinds of companies use a transaction-based system for feedback; in other words, only a customer who actually experienced the product or service can leave a comment. Simple as that, but 100% reliable! Plus, reviews automatically generate a stream of fresh SEO content that helps accelerate the page ranking on Google. There are so many benefits for companies that adopt this kind of feedback management that it is almost impossible to mention them all! Last but not least, this is the only way to get the golden stars, giving more visibility to your AdWords campaigns and leaving competitors kilometers behind you!