FCIL Newsletter, February 1997


LC Classification Gets Automated

by Aaron Wolfe Kuperman
Library of Congress

Bibliographic (book) cataloging data has been distributed in a machine readable format for almost 30 years. The Library of Congress Subject Headings have existed as a computer file (from which printed volumes are produced) for over a decade. The Library of Congress classification schedules have remained totally manual. When numbers are added or changed (which happens on a weekly basis), they are distributed on printed lists from which conscientious catalogers manually annotate their printed copy of the schedules, or file replacement pages into loose-leaf copies of the schedules. Most libraries purchase and maintain few copies of the schedules, and rarely is a copy available to public service staff. Except for a non-updated index, access is limited to browsing or looking for a known number. This is about to change, radically, for the better.

The Library of Congress is converting its classification to machine readable records, and will shortly start to distribute them just as it distributes records for bibliographic data and subject headings. Already these tapes are being used by one vendor to produce a CD-ROM classification schedule, and two vendors are coming out with their own CD-ROM schedules later this year based on the printed schedules. The system for editing the records is also used for retrieval by LC staff via an internet connection.

A major advantage of automating classification schedules is that there will no longer be a need to manually update schedules. This especially benefits libraries with large foreign law collections since we have to worry about many more schedules than a library that intensively collects only KF materials. The other advantage of an online system will be a radically improved ability to search index terms and captions. Also, an online system can give meaningful access to many users, including public service staff.

Being online and readily available will unleash the underused power of the classification schedules. A search of bibliographic records using class numbers can be far more useful than a search by any other method. For an "easy" search, e.g., contracts in France, one can quickly get material by checking subject headings or going straight to the class number based on a search of the bibliographic data base. That won't necessarily work for a more exotic subject or a jurisdiction that lacks a comprehensive legal literature, e.g., Mandate in Mali. In that situation, a cataloger or reference librarian will need to expand a search by switching to a broader topic (contracts, or obligations, or civil law), or by searching jurisdictions with similar legal systems (Francophone Africa or France, and then probably any other civil law jurisdiction).

This is where classification becomes a more powerful tool. The hierarchy in the classification is well maintained and reflects the structure of the legal systems specific to each schedule. The hierarchy expressed in the subject headings is haphazard, never country specific, and tends to reflect a common law bias. Broadening the above search to include the range of numbers for broader civil law concepts is possible using classification, but very difficult to do with subject headings, since the hierarchy for those terms that are also used in common law systems is cluttered with references that, to someone researching civil law, are wrong or misleading. Expanding the jurisdictions covered is difficult using geographic subject headings, since using civil law is not a function of geography. Refining the search using the classification is possible since civil law jurisdictions (except for Louisiana) use different tables than common law jurisdictions.
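
The broadening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class numbers, the ClassRange structure and the broadening_search function are all invented for the example and are not taken from any LC schedule or system. The idea is simply that each caption covers a range of numbers and knows its broader parent, so a search that finds nothing can climb the hierarchy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassRange:
    """One caption in a schedule, covering an inclusive range of numbers."""
    caption: str
    low: int
    high: int
    parent: Optional["ClassRange"] = None


# Hypothetical numbers, for illustration only; not real LC class numbers.
civil_law = ClassRange("Civil law", 500, 999)
obligations = ClassRange("Obligations", 800, 899, parent=civil_law)
contracts = ClassRange("Contracts", 850, 879, parent=obligations)
mandate = ClassRange("Mandate", 864, 864, parent=contracts)

# (title, class number) pairs standing in for bibliographic records.
records = [
    ("General treatise on civil law", 510),
    ("Law of obligations", 805),
    ("Contracts handbook", 852),
]


def broadening_search(start, records):
    """Try the narrowest range first, then each broader parent in turn.

    Returns the caption at which something was found and the matching titles,
    or (None, []) if even the broadest range is empty.
    """
    node = start
    while node is not None:
        hits = [title for title, number in records
                if node.low <= number <= node.high]
        if hits:
            return node.caption, hits
        node = node.parent
    return None, []


# Nothing is classed under "Mandate", so the search widens to "Contracts".
print(broadening_search(mandate, records))
```

The same climb is hard to reproduce with subject headings, because their broader-term references are not tied to a particular legal system.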

An automated system will offer radically improved access to the schedules. Keyword access to captions will be possible for the first time. Additional index terms can be inserted. Boolean operators will be available for searching the text of the schedules. Once the classification schedule exists as machine readable files, new "tricks" become possible. Since each number in the schedule is represented by a distinct record, it is theoretically possible to modify parts of the schedule for local needs. This could take the form of an individual cataloger attaching a personal note ("send new books in this range to the Dean" or "remember that French rente is a false cognate"). It could allow for a library or group of libraries to create local numbers that remain linked to LC's numbers. It would make it possible to add local index terms, or to insert captions and index terms reflecting languages other than English (something now done only for KJV and KK). Having current schedules with meaningful access will encourage catalogers both in LC and elsewhere to propose improved captions and index terms. It might be possible to "break out" some jurisdictions whose legal systems are not a good match with their table (Louisiana, Israel, the Muslim countries, Australia and India all come to mind); once the schedules are predominantly seen as computerized rather than manual tools, a proliferation of country specific schedules is affordable and manageable.
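
As a rough sketch of how local notes and index terms might sit alongside LC's data, consider the following Python fragment. Everything in it is an assumption made for illustration: the Schedule and LocalOverlay names, the "X 100" number and the captions are hypothetical, and nothing here describes the actual record format. The point is only that when each number is a distinct record, local additions can be stored separately and survive an update to the distributed caption.

```python
from dataclasses import dataclass, field


@dataclass
class LocalOverlay:
    """Local additions attached to one class number (hypothetical structure)."""
    notes: list = field(default_factory=list)
    index_terms: list = field(default_factory=list)


class Schedule:
    def __init__(self):
        self.lc = {}     # class number -> caption as distributed
        self.local = {}  # class number -> LocalOverlay, untouched by updates

    def apply_update(self, number, caption):
        """Load or replace the distributed caption for one number."""
        self.lc[number] = caption

    def annotate(self, number, note=None, index_term=None):
        """Attach a local note and/or a local index term to a number."""
        overlay = self.local.setdefault(number, LocalOverlay())
        if note:
            overlay.notes.append(note)
        if index_term:
            overlay.index_terms.append(index_term)

    def display(self, number):
        """Combine the distributed caption with any local additions."""
        overlay = self.local.get(number, LocalOverlay())
        return {
            "caption": self.lc.get(number),
            "notes": list(overlay.notes),
            "index_terms": list(overlay.index_terms),
        }


schedule = Schedule()
schedule.apply_update("X 100", "Original caption")  # "X 100" is a placeholder
schedule.annotate("X 100", note="send new books in this range to the Dean")
schedule.apply_update("X 100", "Revised caption")   # a later weekly update
print(schedule.display("X 100"))  # revised caption, local note still attached
```

Keeping the local layer in its own table is one possible design; it means a weekly update never has to be merged by hand with a library's own annotations.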

None of the above is available for law, yet. While the law schedules have largely been converted to the online format, they haven't been proofread. KF will be ready later this year, and others will follow, including the new KZ schedule. The unproofread data is available internally within LC, and at present I'm the only law cataloger using it as a primary source of classification data (though I check the printed schedules frequently, and rarely find mistakes).

There are several possible ways that the classification schedules will be distributed in the future. One possibility is that the new classification tapes will be used to print "old fashioned" looking schedules. This will lower the cost of printing the schedules and result in fully revised editions coming out every few years rather than every few decades. This is what is already happening with the H social sciences schedules. This will probably have a negative impact on the two vendors (Rothman/Larry Dershem and Gale) who have been doing an excellent job of issuing loose-leaf or recompiled schedules on a regular basis. The more interesting possibility is that the tapes will be used to produce CD-ROM or otherwise computerized schedules.

The Cataloging Distribution Service (the "sales" part of the Library of Congress) is already offering "Classification Plus" as a CD-ROM that includes both the entire LCSH and those schedules already converted to machine readable format. This package will include the KF schedule later in 1997, and the remaining law schedules (for non-U.S. law) when they are converted to machine readable format.

At present, "Classification Plus" sells for under $500 for a single license, with only $16 for each additional license per institution, and includes four quarterly updates. This is less than what it would cost to purchase new copies of the law schedules together with the subject headings.

Two established vendors, Gale and Rothman, plan to start coming out with CD-ROMs based on the printed schedules, including the updates. Both will have products available later this year. Gale says its tentative price will be several thousand dollars, though the disparity between that price and the price of "Classification Plus" seems curious. Rothman hasn't announced a price yet.

Larry Dershem demonstrated a very promising prototype of KF at AALL last year, which included links from the subject headings. Since LC doesn't make such links, this would be a valuable addition, though implementing it for all schedules would be a lot of work. Presumably Gale and Rothman, as well as CDS, will be looking for features to give their products a competitive edge. As discussed below, there are many law specific features we might hope for.

Cactus Software's Minaret system ("a software package for managing diverse collections in archives, libraries and museums") is being used by LC for editing the online schedules and also for retrieval via a staff-only telnet connection. They demonstrated a prototype user-friendly classification at AALL, but decided not to produce it, citing, among other reasons, CDS's low price and the low number of potential sales. Their system lends itself well to internet access, but whether this will be exploited outside of LC is unclear. There is also the possibility that someone such as RLIN or OCLC will make the online classification data base available online, just as the bibliographic files and subject headings are. In theory the Library of Congress could make the files open to the public just as it does for bibliographic and subject heading files, though this option does not seem to be receiving strong consideration within LC. It would also raise a problem: while the classification data is produced by LC and not under copyright, the interfaces used internally by LC are licensed from private corporations.

As the first law cataloger to use online schedules on a regular basis, I have come to think that a law specific interface is needed. The non-law captions muddle up the indexing for a law cataloger. It would also be better if an index entry were initially displayed as a single line, and then expanded to show the specific numbers for each schedule only if requested. At present one has to work through a tremendous number of entries, since the index shows each table/schedule as a separate line, and there are well over 30 law schedules, some of which are officially called supplementary tables even though they are really schedules (e.g. the 5000 number KJ-KKZ Table for European countries other than France or Germany). It would be nice if a system displayed search results to reflect "our way of thinking". Thus if I'm looking for Ontario law the order should be: Ontario, rest of Canada, other common law schedules, other schedules. This sort of search logic would be unique to the K schedules.
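
The Ontario ordering suggested above amounts to a simple ranking rule, sketched here in Python. The field names (jurisdiction, country, system), the sample entries and the rank function are assumptions made for the example; no existing interface is known to work this way.

```python
def rank(entry, jurisdiction, country, system):
    """Lower numbers display first: exact jurisdiction, then the rest of the
    country, then schedules sharing the legal system, then everything else."""
    if entry["jurisdiction"] == jurisdiction:
        return 0
    if entry["country"] == country:
        return 1
    if entry["system"] == system:
        return 2
    return 3


# Hypothetical index hits for one search term, one per schedule.
entries = [
    {"jurisdiction": "France", "country": "France", "system": "civil law"},
    {"jurisdiction": "England", "country": "United Kingdom", "system": "common law"},
    {"jurisdiction": "Canada (general)", "country": "Canada", "system": "common law"},
    {"jurisdiction": "Ontario", "country": "Canada", "system": "common law"},
]

# sorted() is stable, so entries with the same rank keep their original order.
ordered = sorted(entries, key=lambda e: rank(e, "Ontario", "Canada", "common law"))
print([e["jurisdiction"] for e in ordered])
# ['Ontario', 'Canada (general)', 'England', 'France']
```

A real interface would also need to know which legal system each schedule belongs to, which is information the catalogers, not the software, would have to supply.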

An additional problem is the infamous "Garbage in, Garbage out" problem. In the past, LC catalogers never bothered to worry about upgrading captions or index terms since they were in effect unsearchable. Many of these need revision, and more importantly, they need to be standardized. At present "Torts" and "Tort (Extra-contractual liability)" and "Tort. Delict" mean the same thing but don't index together since the wording varies. This severely undermines the value of the index. Another problem is that law subject headings lack references to call numbers, primarily because any subject heading can be used in any of the 30+ schedules. Many non-law users at LC have found the links (in the 053 field) to be a very effective way to get to the numbers, but this won't work for law libraries. The Rothman prototype demonstrated by Larry Dershem at AALL had such links for KF (based on his efforts, not LC's). Establishing links for all schedules, or even for the more popular ones, would be a lot of work and would require either a very hardworking vendor or a substantial cooperative project among law catalogers. While such efforts might be easy to justify for KF, which is used heavily by virtually all American law libraries, it would be harder to get support for the non-U.S. schedules, which are used intensively only by a relatively small number of libraries.
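
To show why the three tort captions fail to index together, and what a crude fix might look like, here is a small Python sketch. The index_keys and build_index functions are hypothetical, and the plural-stripping rule is deliberately naive; real standardization would need editorial review of the captions rather than a mechanical rule.

```python
import re
from collections import defaultdict


def index_keys(caption):
    """Split a caption on periods and parentheses and normalize each part.

    Naive assumption: a trailing "s" marks a plural and can be dropped
    (wrong for words such as "status", so this is illustration only).
    """
    keys = set()
    for part in re.split(r"[.()]", caption):
        term = part.strip().lower()
        if not term:
            continue
        if term.endswith("s") and not term.endswith("ss"):
            term = term[:-1]
        keys.add(term)
    return keys


def build_index(captions):
    """Map each normalized term to the captions that contain it."""
    index = defaultdict(list)
    for caption in captions:
        for key in index_keys(caption):
            index[key].append(caption)
    return index


captions = ["Torts", "Tort (Extra-contractual liability)", "Tort. Delict"]
index = build_index(captions)
print(index["tort"])  # all three captions now file under one term
```

Indexed by their exact wording, the three captions would land in three different places; under the normalized key they file together.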

Online classification offers convenience for catalogers, and will probably justify its cost since it allows us to do our current work more cheaply and better. For the FCIL community these economies will be greater. As a tool, it offers many new possibilities.

This represents the author's views and is not an official communication from the Library of Congress.

