Southern Illinois University
Whenever we finally get a handle on one, another faster, better, new and improved one pops up. Extensible Markup Language (XML), a standard under development by the World Wide Web Consortium (W3C), is the latest entry into a crowded field.
On the markup language continuum, XML provides a simpler set of rules for markup than SGML, but more flexibility than HTML. To put it in less cryptic terms, using XML requires a different focus, demanding that text encoders examine the components of documents rather than how they should look. XML emphasizes the importance of such structural information by making it possible for text encoders to create and manage their own sets of tags. Designers can apply these tags in concert with Cascading Style Sheets to control formatting if they like, but the main emphasis of XML is on managing content.
When we speak of encoding a document's structure, we mean that XML tags can be used to encode information by identifying titles, sections, subsections, paragraphs, citations, lists, figures, etc. This differs from HTML, which is largely concerned with how a document is formatted for presentation via a browser and has little regard for the structure of the document. For instance, a legal case encoded in HTML could include structure-reflective tags such as title <title> and paragraph <p>, as well as more format-reflective tags such as strong <strong>, emphasis <em>, or tables <table> used to align text. Many of these formatting tags have been, and should be, replaced by style sheets, but may still be in use.
The same case encoded in XML might have tags such as <Caption>, <Plaintiff>, <Defendant>, <Appellant>, <Respondent>, <DocketNo>, <Syllabus>, <Attorneys>, <Opinion>, etc. Rules for using these tags would be set out in a Document Type Definition (DTD). In much the same way the MARC21 format guides catalogers in using MARC tags, these rules would specify which XML tags can be used, what order they must be used in, whether they are optional or required, and much more. Unlike the tags in the case encoded using HTML, the XML tags give no instructions as to how the content should look. A style sheet would do this. Simply put, a style sheet exists as a separate document linked to the encoded text. It relates styles to specific tags. For instance, using the above example, the style sheet may say that anything appearing within the <Appellant> tags should appear in bold, centered, in the font Times New Roman, or that text appearing within the <Attorneys> tags should appear italicized and aligned to the left. Using style sheets thus keeps content markup separate from presentation markup and creates a more reusable, portable document.
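As a rough illustration of this separation, the following Python sketch parses a small case marked up with some of the tags named above and pairs each element with presentation rules held in a separate mapping. The sample content, the wrapping <Case> element, the STYLES dictionary, and the describe_styles function are all hypothetical. A real project would rely on a DTD for validation and a Cascading Style Sheet for presentation; this sketch only mimics the idea with the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical case record using tag names mentioned in the text.
# The <Case> wrapper and all names and numbers are invented for illustration.
CASE_XML = """
<Case>
  <Caption>Doe v. Roe</Caption>
  <Appellant>Jane Doe</Appellant>
  <Respondent>Richard Roe</Respondent>
  <DocketNo>00-0000</DocketNo>
  <Attorneys>A. Lawyer for appellant; B. Counsel for respondent</Attorneys>
  <Opinion>The judgment is affirmed.</Opinion>
</Case>
"""

# Stand-in for a style sheet: presentation rules kept apart from the content.
STYLES = {
    "Appellant": {"font-weight": "bold", "text-align": "center",
                  "font-family": "Times New Roman"},
    "Attorneys": {"font-style": "italic", "text-align": "left"},
}

def describe_styles(xml_text, styles):
    """Pair each element's text with the style rules for its tag, if any."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text, styles.get(child.tag, {})) for child in root]

for tag, text, style in describe_styles(CASE_XML, STYLES):
    print(tag, "|", text, "|", style)
```

Because the style rules live outside the encoded text, the same case could be given a different look simply by swapping in a different mapping, without touching the content.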
Bibliographic, rights, and other information associated with an XML-encoded document can be stored as a part of the document by using the Resource Description Framework (RDF). RDF provides a model for describing resources. Also developed under the auspices of the W3C, it enables the encoding, exchange and reuse of structured data. This data could be bibliographic data, rights management data, or other information integral to the document's use. Since the data is stored as a part of the document it describes, it could be said to function like an electronic title page. RDF does not stipulate exactly how it must be used by each resource description community, but rather provides the ability for these communities to define data elements as needed. RDF uses XML as a common syntax for the exchange and processing of data. The XML syntax provides vendor independence, user extensibility, validation, human readability, and the ability to represent complex structures. By exploiting the features of XML, RDF imposes structure that provides for the unambiguous expression of semantics and, as such, enables consistent encoding, exchange, and machine processing of data.
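To make the "electronic title page" idea more concrete, here is a minimal Python sketch that builds an RDF description in XML syntax. It assumes the Dublin Core element set as the vocabulary for title, creator, and rights, which is one common choice but is not named in this article. The resource address, the property values, and the build_description function are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core: an assumed vocabulary

# Ask ElementTree to use the conventional prefixes when writing the XML.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

def build_description(resource_uri, properties):
    """Return an rdf:RDF element describing one resource."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": resource_uri})
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return rdf

record = build_description(
    "http://example.org/cases/doe-v-roe",  # hypothetical identifier
    {"title": "Doe v. Roe",
     "creator": "Hypothetical Court",
     "rights": "Public domain"},
)
print(ET.tostring(record, encoding="unicode"))
```

A different resource description community could pass in its own namespace and element names in place of the Dublin Core ones, which is the flexibility described above.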
One of the more prominent instances of XML use in a legal context is Legal XML. Founded in November 1998, Legal XML is a non-profit organization composed of volunteer members from private industry, non-government organizations, government, and academia. The mission of Legal XML is to develop open, non-proprietary technical standards for legal documents and related applications. The organization's scope includes standards for items such as court filings, case law, public and private law, legal books and law journals.
Legal XML states that its mission is not to standardize the internal format or functioning of applications (e.g., databases, database elements). Instead, it is to standardize the interchange format used between applications. For instance, one of the group's standards-creating efforts involves the creation of tag names. Examples are <CourtFiling>, <FirstName>, <Jurisdiction>, and <CivilActionNumber>.
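The distinction between internal formats and an interchange format can be sketched in a few lines of Python. In this hypothetical example, two applications keep their own field names internally and agree only on the tag names listed above. The nesting of the tags, the sample values, and the export_filing and import_filing functions are assumptions made for illustration and do not reproduce any actual Legal XML standard.

```python
import xml.etree.ElementTree as ET

def export_filing(record):
    """Sending application: turn its internal record into interchange XML."""
    filing = ET.Element("CourtFiling")
    ET.SubElement(filing, "FirstName").text = record["first"]
    ET.SubElement(filing, "Jurisdiction").text = record["court"]
    ET.SubElement(filing, "CivilActionNumber").text = record["case_no"]
    return ET.tostring(filing, encoding="unicode")

def import_filing(xml_text):
    """Receiving application: read the shared tags into its own field names."""
    filing = ET.fromstring(xml_text)
    return {
        "given_name": filing.findtext("FirstName"),
        "venue": filing.findtext("Jurisdiction"),
        "docket": filing.findtext("CivilActionNumber"),
    }

# Invented sample data; the internal field names differ on each side.
message = export_filing({
    "first": "Jane",
    "court": "Hypothetical County Circuit Court",
    "case_no": "00-CV-0000",
})
print(message)
print(import_filing(message))
```

Neither application needs to know how the other stores its data. Only the tags in the message passed between them are standardized.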
Other organizations and projects using XML with legal documents include the Utah Electronic Law and Commerce Partnership, the National Center for State Courts, and the Washington State Bar Association Electronic Communications Committee XML Study Group. There are few professions better suited than ours to implement XML given the ubiquitous use of structured documents in the legal world.
As vendors, publishers and projects such as Legal XML bring XML into the mainstream of legal publishing, understanding the basics of markup languages becomes more and more important for technical services librarians charged with acquiring, cataloging and preserving electronic resources. As with all such initiatives, there will come a time when XML ceases to be a project and must move into production. Vendors, if they have not already, will begin selling electronic resources encoded in XML. Both Microsoft Word and WordPerfect now include options for saving documents in XML. Whether the documents are created locally or purchased from a vendor or publisher, understanding their structure factors heavily into the acquisitions process. Were they created using a standard DTD? How were they encoded? Were they created using open standards, or using proprietary tags and standards that will only allow them to be viewed with specific types of software? The answers to these questions and more are important for each of us to consider as we begin collecting, preserving and providing access to more and more encoded electronic texts.