The Use of Classification Systems on the Web
University of Illinois
For more information:
Classification on the Net (Candy Schwartz)
Koch, Traugott and Michael Day. The Role of Classification Schemes in Internet Resource Description and Discovery.
Project Aristotle(sm): Automated Categorization of Web Resources
The Scorpion Project (OCLC)
Scout Report Signpost
Weinberg, Bella Hass. "Improved Internet Access: Guidance from Research on Indexing and Classification." Bulletin of the American Society for Information Science, v. 25, no. 2 (26)
Experimentation with the use of classification systems on the Web has been going on for some time. Recently, though, it has become apparent that computer scientists, engineers and Web designers are reinventing the organizational skills of librarians to suit their needs in developing effective Web sites. Taxonomy and ontology building have become sought-after skills among designers as the amount of information to be organized grows exponentially. The skills of indexers are being applied to the construction of corporate intranets and public Internet portals as well. Search engines with a profit motive, such as Northern Light, employ indexers to add value to quality publications and make their money from document delivery. YAHOO! is essentially an alphabetico-classed system; alternatively, it can be viewed as a controlled vocabulary that displays narrower terms. In each case, a type of classification system provides structure and organization for the information offered.
As noted by Traugott Koch and Michael Day in their paper "The Role of Classification Schemes in Internet Resource Description and Discovery," a site that organizes knowledge with a classification scheme has several advantages over sites that do not:
Browsing: Classified subject lists are easily browsable in an online environment. Browsing is particularly helpful for an inexperienced user or users not familiar with a subject and its structure and terminology. The structure of the classification scheme can be displayed in different ways as a navigation aid. The classification notation does not even need to be displayed on the screen so that an inexperienced user can have the advantage of using a hierarchical system without the distraction of the notation itself.
Broadening and narrowing searches: Classification schemes are hierarchical and therefore can be used to broaden a search (i.e. for improved recall) or narrow it when required. Queries can be limited to individual parts of a collection (filtering) and the number of false hits reduced (i.e. for improved precision).
Context: The use of a classification scheme gives context to the search terms used.
Potential to permit multilingual access to a collection: since classification systems often use notations independent from a specific language, indices in different languages can offer multilingual access to the same resources without any further changes to the collection. A searcher could enter search terms in a given language and those terms would then relate to the relevant parts of the classification system (as a switching language) and be used to retrieve resources in any given language on the subject.
Partitioning and manipulation of a database: Large classified lists can be divided logically into smaller parts if required.
Revision and support: An established classification system is not usually in danger of obsolescence. The larger schemes undergo continuous revision.
Potential to be well known: Regular users of libraries will be familiar with at least part of one or more of the traditional library schemes. Members of a subject community are likely to be familiar with their own subject-specific schemes as well.
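The broadening, narrowing and multilingual points above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny resource list, the per-language term indices, and the simplified treatment of DDC-style notation (trailing zeros in a three-digit class are read as padding, so 510 is taken to contain 512 and 516). It is not drawn from Koch and Day's paper or from any real system.

```python
# Hypothetical mini-collection: (DDC-style notation, title).
RESOURCES = [
    ("510", "Mathematics overview"),
    ("512", "Algebra primer"),
    ("512.5", "Linear algebra notes"),
    ("516", "Geometry handbook"),
    ("530", "Physics survey"),
]

# Hypothetical term indices: each language maps its own words to the
# same language-independent notation (the "switching language").
TERM_INDEX = {
    "en": {"algebra": "512", "geometry": "516"},
    "de": {"algebra": "512", "geometrie": "516"},
    "fr": {"algèbre": "512", "géométrie": "516"},
}


def stem(notation):
    """Reduce a notation to its hierarchical prefix (simplified assumption)."""
    main, _, decimal = notation.partition(".")
    if decimal:
        return main + decimal
    # Trailing zeros in a three-digit class are treated as padding.
    return main.rstrip("0") or "0"


def narrow(notation):
    """Return titles classed at or below the given notation."""
    key = stem(notation)
    return [title for code, title in RESOURCES if stem(code).startswith(key)]


def broaden(notation):
    """Return titles under the parent of the given notation."""
    key = stem(notation)
    parent = key[:-1] if len(key) > 1 else key
    return [title for code, title in RESOURCES if stem(code).startswith(parent)]


def search(term, language):
    """Map a term in any indexed language to a notation, then retrieve."""
    notation = TERM_INDEX[language].get(term.lower())
    return narrow(notation) if notation else []


print(narrow("512"))               # algebra and its subdivisions
print(broaden("512"))              # everything under mathematics
print(search("géométrie", "fr"))   # same result as search("geometry", "en")
```

Because retrieval runs on the notation and not on the words, adding another language only requires another term index; the collection itself is unchanged.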
There are also some disadvantages to using an established classification scheme.
The division of logical collections of material: Classification schemes often split up collections of related material. This can be partly overcome with good cross-references.
The illogical subdivision of classes: Some popular schemes do not always subdivide classes in a logical manner. This can make them difficult to use for browsing purposes.
Assimilating new areas of interest: Classification schemes, since they are usually updated through formal processes by organized bodies, are often slow to respond to new areas of study.
The classification system employed can be one of two kinds: derived or imposed. A great deal of experimentation occurred in the late 1990s with automatic classification. Most projects attempting automatic classification used methods of derived indexing. These projects extracted information from documents and used it for structuring sites or access. Gerry McKiernan, Iowa State University, offers a comprehensive collection of pointers to such projects and systems, including short descriptions, citations and addresses, at Project Aristotle(sm): Automated Categorization of Web Resources.
One of the more prominent projects in this area is OCLC’s Scorpion Project. Scorpion is a research project attempting to combine indexing and cataloging based on the observation that these are complementary activities. Scorpion specifically focuses on building tools for automatic subject recognition by combining library science and information retrieval techniques. For instance, to assign subject codes to a document, the document can be treated as a query against a Dewey Decimal Classification (DDC) database using ranked retrieval. The top-ranked results of the search can then be treated as the subjects of the document. Subject assignment in this manner clearly differs from the traditional computer indexing behind the currently available free search services.
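The idea of treating a document as a query against a classification database can be sketched as follows. This is not Scorpion's implementation: the four class records, their descriptive text, and the simple TF-IDF weighted scoring are all assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a classification database:
# notation -> descriptive text for that class.
CLASS_RECORDS = {
    "004": "computer science data processing computers",
    "025": "library operations cataloging classification indexing",
    "510": "mathematics algebra geometry numbers",
    "610": "medicine health diseases",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(records):
    """Count terms per class and compute inverse document frequencies."""
    counts = {code: Counter(tokenize(text)) for code, text in records.items()}
    doc_freq = Counter(term for terms in counts.values() for term in terms)
    total = len(records)
    idf = {term: math.log(total / freq) + 1.0 for term, freq in doc_freq.items()}
    return counts, idf


def assign_subjects(document, records, top_n=2):
    """Rank classes against the document and return the best notations."""
    counts, idf = build_index(records)
    query = Counter(tokenize(document))
    scores = {}
    for code, terms in counts.items():
        # Weighted overlap between the document (query) and the class record.
        score = sum(query[t] * terms[t] * idf[t] ** 2 for t in query if t in terms)
        if score > 0:
            scores[code] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


text = "Computers in medicine: data processing for health records"
print(assign_subjects(text, CLASS_RECORDS))  # ['004', '610']
```

A production system would need much richer class records and a tuned ranking function; the sketch only shows the shape of the approach, in which the ranked result list doubles as the subject assignment.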
A different approach involves the use of imposed, traditional library classification systems. These systems have universal or subject-specific schemes constructed over many years by co-operative organizations, independently of the content of the documents that actually exist in particular collections. The DDC is the most frequently used classification system on the Web, followed by the Universal Decimal Classification (UDC) and the Library of Congress Classification (LCC). Specific subject-oriented sites often use classification systems germane to their topics. The Engineering Index (Ei), the National Library of Medicine’s NLM classification and its Medical Subject Headings (MeSH) are used heavily in their areas. For an excellent bibliography of writing on this subject and a list of projects using traditional library classification systems, see the Web site "Classification on the Net" maintained by Candy Schwartz.
A number of sites use LC classification to give structure to their information. CYBERSTACKS is a site maintained by Gerry McKiernan at Iowa State University. It organizes scholarly Internet resources in the sciences, including search services, by an abridged LC classification. Users can browse from a broad class letter to subclass letters (with topical icons) to division numbers, and then to resources with rich extracts and summaries.
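A browse path of this kind (class letter, then subclass letters, then number) can be derived mechanically from an LC-style notation. The sketch below is an illustration only: the regular expression covers just the leading letters and number of a notation, the sample resources are invented, and nothing here reflects how CYBERSTACKS itself is built.

```python
import re
from collections import defaultdict

# One to three class letters, optionally followed by a number.
# Cutter numbers and dates after the number are ignored in this sketch.
LCC_PATTERN = re.compile(r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)?")


def browse_path(notation):
    """Split a notation into (main class letter, subclass letters, number)."""
    match = LCC_PATTERN.match(notation.strip().upper())
    if not match:
        raise ValueError(f"Not an LC-style notation: {notation!r}")
    letters, number = match.groups()
    return letters[0], letters, number


def build_browse_tree(resources):
    """Group (notation, title) pairs into class -> subclass -> number."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for notation, title in resources:
        main, sub, number = browse_path(notation)
        tree[main][sub][number].append(title)
    return tree


# Hypothetical sample resources.
SAMPLE = [
    ("QA76.9", "Database systems guide"),
    ("QA76.9", "Data mining resources"),
    ("QC174", "Quantum theory index"),
    ("Z 699", "Information retrieval directory"),
]

tree = build_browse_tree(SAMPLE)
print(browse_path("QA76.9"))       # ('Q', 'QA', '76.9')
print(sorted(tree["Q"]))           # ['QA', 'QC']
print(tree["Q"]["QA"]["76.9"])     # both QA76.9 titles
```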
The Scout Report Signpost is a part of the Internet Scout Project, located in the Computer Sciences Department of the University of Wisconsin-Madison. The project is funded by the National Science Foundation. Its mission is to assist in the advancement of resource discovery on the Internet. The Scout Report Signpost demonstrates that Internet resources can be cataloged, classified, and arranged using existing taxonomies such as the Library of Congress Classification Scheme and the Library of Congress Subject Headings in concert with the Dublin Core.
Libraries can take advantage of this approach by using the classification system as a bridge between online and print collections. If both are organized using LC classification, browsing between the two is simpler for the user. It also makes it easier to offer access to both collections through one interface, since the data is organized using the same taxonomy.
Knowledge structuring on the Internet has to cope with large numbers of documents, exponential growth rates and a high likelihood that existing documents will change. Whether the structure of a site is mined from its documents or imposed from the outside, it is clear that classification systems bring a greater degree of order and usability.