Tuesday, November 21, 2006

A Case study on 5S

5S, a fundamental abstraction of digital libraries, stands for Streams, Structures, Spaces, Scenarios and Societies. 5S proposes a formal language and model to describe digital libraries.

The advantage of using a formal language and model, according to its creators (Goncalves et al.), is that they are precise and unambiguous when defining the semantics of the specific abstractions of a knowledge field. Thus, 5S could be used as an instrument for
1) building and interpreting a DL taxonomy,
2) informal and formal analysis of case studies of digital libraries, and
3) utilisation as a formal basis for a DL description language.
Moreover, 5S could be used for requirements analysis in digital library development.

An example of a case study to describe a digital library using the main elements of 5S is presented below, taking Learning Resources Exchange as an example.

Streams are sequences of arbitrary items used to describe both static and dynamic content. Structures impose organisation. Spaces are sets with operations on those sets that obey certain constraints. Scenarios consist of sequences of events that modify states of a computation in order to accomplish a functional requirement. Societies are sets of entities and activities and the relationships among them. The case study below illustrates what each “S” comprises.


Societies
  • primary community: European teachers and learners
  • repository providers: public authorities who make the decision about joining
  • repository maintenance and running, editorial control
  • some pedagogical support, etc.
  • commercial providers

Scenarios (services, different corresponding scenarios)
  • trainings to help authorities to hook up to the federation
  • training to help teachers to use the system and submit material to it and creating the metadata
  • scenarios to maintain the service and content?
  • scenarios on how to access the site, through browsing and searching, in one repository or across the federation (SQI)

Spaces
  • physical location of members (a metric space)
  • vocabularies and metadata used in different services (conceptual space). In addition to this, also manual, semi-automatic, and automatic indexing and classification methods to relate repositories to the conceptual space of ELR.
  • User interfaces (APIs?) to relate various software routines (like LMS etc)

Streams
  • at the simplest level they are streams of characters for text and streams of pixels for images; also audio and other digital files. Quality of service is a challenge if content is delivered in real time, and storage is a problem at the local level if it is downloaded and played locally
  • Network protocols, transmissions of serialised streams over the network such as federated search, harvesting, hybrid services using protocols like Dienst, Z39.50, OAI-PMH,...

Structures
  • database management system at the heart of the software for submission and workflow management.
  • XML to store and exchange the resources' metadata
  • EUN Application profile
  • structures in the form of semantic networks, any?

5S as a description language

The paper, using the 5S descriptive language, defines a digital library as follows:

A digital library is a 4-tuple (R, DM, Serv, Soc), where
  • R is a repository;
  • DM = {DM_C1, DM_C2, ..., DM_Ck} is a set of metadata catalogs for all collections {C1, C2, ..., Ck};
  • Serv is a set of services containing at least services for indexing, searching and browsing;
  • Soc is a society.

“We should stress that the above definition only captures the syntax of a digital library, that is, what a digital library is. Many semantic constraints and consistency rules regarding the relationships among the DL components (e.g. how the scenarios in Serv should be built from R and DM and from the relationships among communities inside the society Soc, or what the consistency rules are among digital objects in collections of R and metadata records in DM) are not specified here. Those will be a subject of future research.”

Tuple: in a database, an ordered set of data constituting a record; a data structure consisting of comma-separated values passed to a program or operating system.
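
To make the 4-tuple more tangible, here is a minimal sketch in Python. It is only an illustration of the syntax quoted above, not anything taken from the paper: the class name, the field names, the `is_well_formed` check and the sample values are all my own assumptions.

```python
from typing import NamedTuple

# The definition requires at least these three services.
REQUIRED_SERVICES = {"indexing", "searching", "browsing"}


class DigitalLibrary(NamedTuple):
    """A digital library as the 4-tuple (R, DM, Serv, Soc)."""
    repository: frozenset      # R: here reduced to the names of its collections
    metadata_catalogs: dict    # DM: one catalog per collection, keyed by collection name
    services: frozenset        # Serv
    society: frozenset         # Soc: here reduced to the names of its communities


def is_well_formed(dl: DigitalLibrary) -> bool:
    """Check only the two syntactic conditions stated in the definition."""
    has_minimal_services = REQUIRED_SERVICES <= dl.services
    one_catalog_per_collection = set(dl.metadata_catalogs) == set(dl.repository)
    return has_minimal_services and one_catalog_per_collection


# Hypothetical values, loosely inspired by the case study above.
lre = DigitalLibrary(
    repository=frozenset({"C1", "C2"}),
    metadata_catalogs={"C1": [], "C2": []},
    services=frozenset({"indexing", "searching", "browsing", "federated search"}),
    society=frozenset({"teachers", "learners", "repository providers"}),
)
```

As the quote says, this captures syntax only: nothing here states how services are built from the repository and the catalogs, or how the communities relate to each other.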

GONCALVES, M. A., FOX, E. A., WATSON, L. T. and KIPP, N. A. (200 ).
Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital Libraries
Virginia Polytechnic Institute and State University
http://www.dlib.vt.edu/projects/5S-Model/

Monday, November 20, 2006

Workshop: Impact of Social Software on Society: PROWalk Event

Last Friday I participated in a BlogWalk event in Bonn. Growing increasingly weary of conferences where participants sit on their asses making the best use of the wireless Internet connection to catch up with all the mails they missed because of the travel, I made a personal commitment to actively participate in this one.

I've been aware of the idea of BlogWalk for some time already, but only now had the chance to experience it for the first time. I must say that I missed the "walking" part; we did not leave the conference venue to explore other areas. Nevertheless, mentally we did, and it was an interesting 3h of interactions, introductions, supporting questions, personal experiences and accounts, monologues, and post-its glued on the window (i.e. window-wiki, see the pic below).


The theme was "Impact of Social Software on Society". The discussion started from personal accounts and feelings: how information acquisition has changed in the last few to 10 years. Many of us had similar experiences of following tens to hundreds of blogs through a Web-feed reader, instead of reading a few authoritative newspapers and catching a number of news channels to get different points of view on affairs.

Currently, instead of getting a "pre-filtered" view, one aggregates and filters (through friends) news, opinions, digital prints and artefacts by choosing whose blogs, what news and whose pictures/videos to aggregate and actually read. There are many different contexts one works in during the day (a different way of working, that knowledge-worker stuff).

The overdose of all this was also discussed: how time-consuming it is to filter all this information, read it, mull it over, etc. Check out "continuous partial attention"; it's not only about email flows or instant messaging, it's increasingly about Web-feed loads, too! From my own point of view, just to balance out all the rants and not-so-well-formed opinions in the blogosphere, there is nothing better than reading The Economist, one of the best sources of journalism. I sometimes think of it as the ultimate opposite of blogs: it is impersonal (you have no idea who authored the article), and although its views on economics and the world (in that order) are very pre-set, its writers know how to be self-critical, and so on. All this coincided with an earlier discussion with a pal about how he, previously an avid blog reader/writer, is gonna start reading only peer-reviewed articles. Of course it was said half-jokingly, but there is a little truth in it, too. We are probably soon gonna see some blog fatigue in one way or another.

While talking about all this, we posted notes on the window with issues that we thought relevant for the BlogWalk. The following categories emerged:

1) Usage (personal level)
2) Effects and Fall-out (group level)
3) Contexts of usage (organisational/societal level)
4) Convergence (deep changes behind it all)
5) On-line interaction and/vs face to face
6) Design and evaluation of tools

"These clusters could be combined into one general narrative, starting from the personal level, moving upwards towards the group/organisational and societal level. This leads to a view on the deeper changes behind it all. At the same time from the personal level, moving downwards, we can cover social affordances and technical affordances."




One gap that was identified is how little research is actually done on the social practices around how the tools and systems are used. We discussed the many different ways in which, and reasons for which, people aggregate Web-feeds, for example. A cognitive point of view on this is missing, which, in a way, hinders us from seeing and understanding the phenomena that social software can be part of in our society. Some empirical descriptions are missing, or as it was put on the wiki:

To understand the social implications we need more storytelling/ethnographic/anthropological research that starts with the individual, instead of just looking at the 'big numbers' and large patterns. (More stuff like by Efimova and Ben Lassoued, and my blog articles on my personal information strategy: 1, 2 and 3)


Nowadays many skeptics laugh that blogs have only one reader, the author, in an attempt to put them down. Heck, blogs are great for self-reflection, too. Who needs an audience for that? If we had a better understanding of the diverse ways people use these tools, we could maybe make them support all those diverse ways better. This made me think about many of the read/write web tools and systems being in perpetual beta, and whether that is a good or a bad thing. It could be good if the developers actually followed how people use their tools and developed them accordingly. However, it can also mean that they have no interest in supporting any new features and functions and just leave things hanging. Thus, better exploratory field studies in different contexts are needed, which would lead to more rigorous field and lab experiments to understand all this better.

Once again, attention metadata was discussed: on the one hand from a privacy point of view, and on the other hand as a way to better understand how users use the wide palette of tools and systems.

Wednesday, November 15, 2006

Where tags and structured vocabularies co-exist, case for tagging learning resources

NOTE: this is a draft and a lot of thoughts put together

Context

In the Learning Resources Exchange portal, end-users, i.e. K-12 teachers from all over Europe, can access digital learning resources that are made available by European Schoolnet or by its partners from different learning repositories in a number of European countries. All of the learning resources have metadata associated with them; all of the partners use Learning Object Metadata, and many of them use an application profile specifically geared to European K-12 education. The indexing keywords come from a multilingual Thesaurus that currently exists in 14 languages. Additionally, teachers can add their own keywords to learning resources when saving them to their personal collections, which function like bookmarks. In their personal collections teachers can also evaluate their learning resources, rate them and add comments. This paper focuses on vocabularies, both unstructured and structured ones, and proposes complementary approaches to their use to enhance the retrieval of learning resources. We think that tagging has the potential to enhance retrieval, both through the complementary features that it can offer for controlled vocabularies and through its underlying social structures.


Structured and Unstructured vocabularies to access resources


The table "Axes of organizational vocabularies" depicts the dimensions of structured and unstructured vocabularies. Learning resource repositories, for example, often index their material with an authoritative vocabulary that is based on their national or regional educational needs, such as curriculum requirements or educational levels. Some of these vocabularies are structured domain taxonomies with hierarchies (LRE Thesaurus, EUN, Italy; Mobus thesaurus, France; ...), whereas others might be unstructured glossary-type lists of terms (eg....). In educational repositories and portals these vocabularies are most often used to provide access to resources by classification or by hierarchical browsing. Search functions are applied as well; however, teachers seem to prefer browsing features (based on focus groups and some Calibrate logs).

                          Structured                Unstructured
Authoritative             Domain Taxonomy           Glossary
Personal/Opportunistic    Hierarchical Filesystem   Folksonomy

Axes of organizational vocabularies, Iverson (2006), CC some rights reserved.


As opposed to authoritative vocabularies, personal tagging of resources provides an unstructured vocabulary for resources. These end-user generated keywords serve, in the first place, personal knowledge management needs. For example, by bookmarking resources one creates a link to them for later access, and the use of personal keywords allows naming things in a meaningful way, grouping them, etc. Depending on the user base and its behaviour, tagging can start providing a community-based vocabulary for the given community of users. This is commonly known as a folksonomy. It is important to note that not all acts of tagging and not all personal vocabularies allow a community folksonomy to emerge, hence the division into broad and narrow folksonomies (Vander Wal, 2005).

Classified categories or domain taxonomies offer a way to access resources, either through keyword-based retrieval or by browsing through categories. A solid indexing strategy based on controlled vocabularies guarantees that the searcher does not need to face all the uncertainties of natural language (synonymy, polysemy, homonymy), combined with all the uncertainties of a full-text search (no relevance control on the retrieved occurrences) (Trigari, 2001). Searching and browsing, both based on metadata and controlled, structured taxonomies, are the methods most used to access resources on educational portals.

Browsing and searching serve two fundamentally different information-seeking tasks. The difference is similar to that between exploring a problem space to formulate questions and actually looking for answers to specifically formulated questions (Mathes, 2004). One could say that searching happens at the stage where the user already has a clearly defined task, a well-formulated question, and an idea of how and with what information sources to fulfil it (reference), whereas browsing often occurs when the user does not yet have a clear question or task at hand, just a hunch of what to look for.

A second important advantage of folksonomies is that they can serve the community by promoting serendipitous access to resources through browsing interlinked sets of related tags, also known as "tagclouds". A tagcloud can be a personal one comprising only one's personal keywords, a local one displaying the most common tags of a given community of users (e.g. based on domain interest, language, etc.), or even a global one allowing access to all the resources that are available through the service.
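
As a rough illustration of the three kinds of tagcloud, here is a small Python sketch. The taggings, user names and the `tag_cloud` helper are all made up for this example; a real service would also weigh recency, language and so on.

```python
from collections import Counter

# Hypothetical taggings: (user, resource, tag)
TAGGINGS = [
    ("anna", "r1", "volcano"),
    ("anna", "r2", "geography"),
    ("ben", "r1", "volcano"),
    ("ben", "r1", "geology"),
    ("carla", "r3", "volcano"),
]


def tag_cloud(taggings, users=None):
    """Count how often each tag is used, optionally only by the given users."""
    return Counter(
        tag for user, _resource, tag in taggings
        if users is None or user in users
    )


personal = tag_cloud(TAGGINGS, users={"anna"})          # one user's own keywords
community = tag_cloud(TAGGINGS, users={"anna", "ben"})  # e.g. teachers of one subject
global_cloud = tag_cloud(TAGGINGS)                      # everything in the service
```

The counts would then drive the font size of each tag in the cloud; the only difference between the personal, local and global cloud is which users are included.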

Complicity of relationships between thesauri and folksonomies

An interesting hybrid of vocabularies will emerge within the ELR context, where users add personal keywords to learning resources that are already indexed with an authoritative, structured domain taxonomy such as the LRE Thesaurus. As opposed to most Web-based applications, where no top-down vocabulary exists and where users tag content with their personal keywords (del.icio.us, Flickr, ...), the situation is different in ELR. Here, a learning resource is indexed with 1 to 3 thesaurus terms, which allow search and browsing in a multilingual context, and is additionally given multilingual tags by end-users. This will open up a number of interesting research possibilities in the twilight zone between structured and unstructured vocabularies.

In the LRE Thesaurus the structure of the descriptors follows the classical semantic relationships such as the following: intra-language equivalence (USE-UF), inter-language equivalence, hierarchical BT/NT relationship and the associative relationship (RT). The research attention will be directed to study the relations between terms in all the areas. For the revision purposes of the thesauri, we will be interested in looking whether folksonomies could provide any new descriptors for the LRE Thesaurus.

Perhaps the most promising area of complicity between folksonomies and the thesaurus will be intra-language equivalence (USE-UF) between thesaurus terms and folksonomies, which could be seen as a good source of non-descriptors. Non-descriptors are part of a thesaurus; they provide the intra-language equivalence that facilitates access to resources indexed with descriptors (i.e. thesaurus terms) that do not match the vocabulary that the end-user uses. For example, a user could search for "antiquities", but the preferred term in the thesaurus, the one used for indexing, is "ancient history". The use of "antiquities" as a non-descriptor thus facilitates access to the resource.

Example of intra-language equivalence (USE-UF)

Antiquities
USE ancient history
ancient history UF Antiquities
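
The USE/UF pair above can be sketched as a simple lookup. This is a toy illustration in Python, not how the LRE Thesaurus is actually implemented; the `USE` table and both helper functions are assumptions made for the example.

```python
# Non-descriptor -> descriptor (the USE direction); hypothetical sample data.
USE = {"antiquities": "ancient history"}


def preferred_term(query, use=USE):
    """Map a search term to the descriptor used for indexing, if one is known."""
    key = query.strip().lower()
    return use.get(key, key)


def used_for(use=USE):
    """Invert the USE table to get the UF direction: descriptor -> non-descriptors."""
    uf = {}
    for non_descriptor, descriptor in use.items():
        uf.setdefault(descriptor, []).append(non_descriptor)
    return uf
```

A search for "Antiquities" is thus rewritten to "ancient history" before it hits the index. Tags that users keep attaching to resources indexed under one descriptor would be natural candidates for new entries in the `USE` table.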

Moreover, an equally interesting area will be associative relationships (RT) and folksonomies. The associative or related-term relationship is expressed in thesauri between terms that are mentally associated to such an extent that it is useful to make the link between them explicit; however, they are neither members of an equivalence set nor subordinate or superordinate to one another (Trigari, 2001). We think that folksonomies could help to define and re-define a number of associative relationships in thesauri.

Example of associative (RT) relationship
NT1 old age
. . . . RT elderly person
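
One conceivable way to let a folksonomy suggest RT candidates is to count how often a user tag co-occurs with a descriptor on the same resource. The Python sketch below is purely my own assumption about how this could be approached: the data, the `candidate_related_terms` function and the `min_count` threshold are hypothetical, and any suggestion would still need review by the thesaurus editors.

```python
from collections import Counter

# Hypothetical data: descriptors assigned by indexers and tags added by users.
RESOURCES = {
    "r1": {"descriptors": {"old age"}, "tags": {"elderly person", "retirement"}},
    "r2": {"descriptors": {"old age"}, "tags": {"elderly person"}},
    "r3": {"descriptors": {"ancient history"}, "tags": {"rome"}},
}


def candidate_related_terms(resources, min_count=2):
    """Return (descriptor, tag) pairs seen together on at least min_count resources."""
    pairs = Counter()
    for record in resources.values():
        for descriptor in record["descriptors"]:
            for tag in record["tags"]:
                if tag != descriptor:
                    pairs[(descriptor, tag)] += 1
    return sorted(pair for pair, count in pairs.items() if count >= min_count)
```

With this sample data only the pair ("old age", "elderly person") passes the threshold of two resources, which matches the RT example above.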

As the ELR Thesaurus is also multilingual, we will be interested in investigating what kind of inter-language equivalence folksonomies could offer between languages. As for the hierarchical relationship between broader terms and narrower terms (BT/NT) among the thesaurus terms, we assume that folksonomies have very little to offer.

All in all, in addition to the above, we think that for the purpose of revising the thesaurus folksonomies can be a great source for verifying whether the scope of the current descriptors actually serves the needs of its users for indexing and retrieval.


To better understand the nature of folksonomies and tagging

A number of interesting research papers have emerged in the area of tags and folksonomies. Mathes (2004) described the phenomenon, whereas the term and its main characteristics, such as broad and narrow folksonomies, were described by Vander Wal (2005). Broad folksonomies can be defined as social tagging systems where many people tag a few items (e.g. del.icio.us), whereas narrow folksonomies are the ones where the content provider, usually the owner of the content, gives tags to items (e.g. Flickr). Folksonomies mainly emerge from broad ones, as there are many users and also niche user groups who give the item the name that they know it by. Through recent research we also know more about the evolution of tagging and how users provide tags (Sen et al., 2006). Marlow et al. (2006) propose a model and taxonomy of different tagging systems. Moreover, Bogers et al. (2006), Lin et al. (2006) and Tonkin (2006) have provided interesting insights into the issues between classification and tagging.

Studying tagging and folksonomies more closely in this specific hybrid setting of learning resources that are indexed by experts and tagged by end-users might yield some more interesting insights into their nature, from both a qualitative and a quantitative point of view. Furthermore, we are interested in finding the best ways for social tagging systems to co-exist with more top-down indexing with structured vocabularies, and in finding ways to leverage such a system to better serve its users' needs. Research in this type of setting might also allow us to better understand the many pitfalls of folksonomies, such as the ambiguity of tags and the lack of synonym (different word, same meaning) and homonym (same word, different meaning) control (Guy & Tonkin, 2006). Moreover, we are interested in using the folksonomies to tap into the social structures and networks in order to, first of all, study and analyse the characteristics and the main behaviour of our user base, and secondly, to use the social networks to enhance information retrieval and serendipitous access to resources.


References


Bogers, T., Thoonen, W., & van den Bosch, A. (2006). Expertise classification: Collaborative classification vs. automatic extraction. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from http://www.slais.ubc.ca/users/sigcr/sigcr-06bogers.pdf.

Guy, M., & Tonkin, E. (2006). Folksonomies: Tidying up Tags? D-Lib Magazine, 12(1). Retrieved November 10, 2006, from http://www.dlib.org/dlib/january06/guy/01guy.html.

Iverson, L. (2006). Thomas Vander Wal on Folksonomy. Blog posting. Retrieved from http://www.ece.ubc.ca/~leei/weblog/2006/03/thomas_vander_wal_on_folksonom.html.

Lin, X., Beaudoin, J. E., Bui, Y., et al. (2006). Exploring characteristics of social classification. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from http://www.slais.ubc.ca/users/sigcr/sigcr-06lin.pdf.

Long, K. (2005). Leveraging folksonomy - flickr clusters at ExperienceCurve. Blog posting. Retrieved November 10, 2006, from http://blog.experiencecurve.com/archives/leveraging-folksonomy-flickr-clusters.

Marlow, C., Naaman, M., boyd, D., et al. (2006). HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, ToRead. Hypertext 06. Retrieved from http://www.danah.org/papers/Hypertext2006.pdf.

Mathes, A. (2004). Folksonomies - Cooperative Classification and Communication Through Shared Metadata. Retrieved November 13, 2006, from http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html.

Sen, S., Lam, S. K., Cosley, D., et al. (2006). tagging, communities, vocabulary, evolution. Proceedings of CSCW 2006. Retrieved from http://www.grouplens.org/papers/pdf/sen-cscw2006.pdf.

Tonkin, E. (2006). Searching the long tail: Hidden structure in social tagging. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from http://www.slais.ubc.ca/users/sigcr/sigcr-06tonkin.pdf.

Trigari, M. (2001). Multilingual thesaurus, why? European Schoolnet, ETB project. Retrieved November 13, 2006, from http://etb.eun.org/eun.org2/eun/en/etb/content.cfm?lang=en&ov=3813.

Vander Wal, T. (2005). Explaining and Showing Broad and Narrow Folksonomies :: Personal InfoCloud. Blog posting. Retrieved November 13, 2006, from http://www.personalinfocloud.com/2005/02/explaining_and_.html.

Monday, November 13, 2006

List of some good reading on folksonomies

Bogers, T., Thoonen, W., & van den Bosch, A. (2006). Expertise classification: Collaborative classification vs. automatic extraction. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from http://www.slais.ubc.ca/users/sigcr/sigcr-06bogers.pdf.

Guy, M., & Tonkin, E. (2006). Folksonomies: Tidying up Tags? D-Lib Magazine, 12(1). Retrieved November 10, 2006, from http://www.dlib.org/dlib/january06/guy/01guy.html.

Iverson, L. (2006). Thomas Vander Wal on Folksonomy. Blog posting. Retrieved from http://www.ece.ubc.ca/~leei/weblog/2006/03/thomas_vander_wal_on_folksonom.html.

Lin, X., Beaudoin, J. E., Bui, Y., et al. (2006). Exploring characteristics of social classification. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from http://www.slais.ubc.ca/users/sigcr/sigcr-06lin.pdf.

Long, K. (2005). Leveraging folksonomy - flickr clusters at ExperienceCurve. Blog posting. Retrieved November 10, 2006, from http://blog.experiencecurve.com/archives/leveraging-folksonomy-flickr-clusters.

Marlow, C., Naaman, M., boyd, D., et al. (2006). HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, ToRead. Hypertext 06. Retrieved from http://www.danah.org/papers/Hypertext2006.pdf.

Sen, S., Lam, S. K., Cosley, D., et al. (2006). tagging, communities, vocabulary, evolution. Proceedings of CSCW 2006. Retrieved from http://www.grouplens.org/papers/pdf/sen-cscw2006.pdf.

Tonkin, E. (2006). Searching the long tail: Hidden structure in social tagging. Social Classification: Panacea or Pandora? 17th Annual SIG/CR Classification Research Workshop. Retrieved from http://www.slais.ubc.ca/users/sigcr/sigcr-06tonkin.pdf.

Vander Wal, T. (2005). Explaining and Showing Broad and Narrow Folksonomies :: Personal InfoCloud. Blog posting. Retrieved November 13, 2006, from http://www.personalinfocloud.com/2005/02/explaining_and_.html.