PreCYdent:  A New Search Engine Enters the Legal Research World

Steven Robert Miller, Reference Librarian,
Indiana University School of Law - Indianapolis

A new legal search engine named PreCYdent debuted earlier this year. PreCYdent.com was founded in April of 2006 to create new technologies for legal research. “PreCYdent” is an old English spelling of the word “precedent,” but the “cy” in the spelling is suggestive of “cybernetics.” PreCYdent.com was co-founded by Thomas A. Smith,1 a law professor at the University of San Diego. PreCYdent.com is based upon the application of a proprietary search system to a database of over 923,000 federal and state cases,2 over 51,000 statutes,3 regulations and other legal documents.4

PreCYdent screen shot

While working on a law review article in 2004, Professor Smith read a book unrelated to his research at the time, Linked:  The New Science of Networks,5 by Albert-László Barabási. Barabási’s book influenced Smith to write a paper entitled “Web of Law,” which he posted to SSRN.6 Law librarians and law professors received the SSRN paper very positively, but no academic law review saw the same importance in it that Smith and those providing positive feedback did.

Professor Smith began cold-calling mathematicians who might be interested in his idea of writing a paper about legal citation networks. He found Antonio Tomarchio,7 a graduate student of the internationally known computer scientist Piero Fraternali; Tomarchio was visiting at Cornell at the time. Tomarchio and Smith wrote about the mathematical properties of legal citation networks. In April of 2006, Smith and Tomarchio formed PreCYdent to build a legal search engine, and Smith found an angel investor in San Diego who invested $100,000 towards the development of PreCYdent. Smith and Tomarchio then began to form a team to build the legal search engine we call PreCYdent today.

PreCYdent has all U.S. Supreme Court decisions, all U.S. Courts of Appeals decisions reported in the Federal Reporter from 1950 to 2006, and all published and unpublished Courts of Appeals decisions reported after 2006. Its database of U.S. District Court decisions since 2004 is incomplete. In addition to federal court decisions, PreCYdent also “crawls” state court sites for all available state cases. The back file of state decisions varies from state to state. For example, the Arizona Court of Appeals’ decisions go back to January of 2008, but the Oklahoma Court of Criminal Appeals’ decisions go back to January of 1900. Most state case law on PreCYdent goes back about ten years. Archival data is much harder to provide to users, and licensing it will be expensive. Professor Smith recently received a quote for a licensing agreement from one of the three largest legal publishers of about 45 cents per 1,000 words. “When you have four million cases, such an agreement would cost about $10 million.”8
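As a rough check on the figures Professor Smith quotes, the short Python sketch below works backward from the quoted rate, case count, and total. The average length of a case is not stated in the interview; the value computed here is only what the three quoted numbers imply, and the function and variable names are illustrative.

```python
# Figures quoted in the interview.
PRICE_PER_1000_WORDS = 0.45   # dollars per 1,000 words
CASES = 4_000_000             # "four million cases"
QUOTED_TOTAL = 10_000_000     # "about $10 million"


def licensing_cost(cases, avg_words_per_case, price_per_1000_words=PRICE_PER_1000_WORDS):
    """Total licensing cost in dollars for a given average case length."""
    return cases * avg_words_per_case / 1000 * price_per_1000_words


# Average words per case implied by the quoted figures (a derived value,
# not one reported in the article).
implied_avg_words = QUOTED_TOTAL / (CASES * PRICE_PER_1000_WORDS / 1000)

print(round(implied_avg_words))
```

In other words, the quoted total is consistent with cases averaging somewhere between five and six thousand words each.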

Despite its popularity, there are no plans to charge anyone for using PreCYdent. Some have asked why PreCYdent should be free. Smith replies, “Well, looking at it from the other side, why shouldn’t it be free? This information is in the public domain.”9 Professor Smith has given some thought to providing subscription services in the future if PreCYdent expands into non-public-domain materials. As he explained, the market is crowded with low-end providers. “We don’t want to be another entrant in already crowded space.”

PreCYdent’s algorithm offers another way to cut the material and to make sure nothing important is missed. Therefore, even lawyers with a flat-rate license on Westlaw or LexisNexis may use PreCYdent in the interest of being thorough.10 Many users turn to PreCYdent to do their homework before they even talk to a lawyer. For savvy legal researchers who have access to Westlaw or LexisNexis, or who know their way around a law library, PreCYdent offers something that can supplement their research. Google Analytics11 tells the PreCYdent team that attorneys from large law firms also come to a free site like PreCYdent before going to Westlaw and Lexis, where the meter is running. Because it is free, lawyers can use PreCYdent to clear the ground before they start running up research charges. The team also knows that high school students and government agency workers use the service. “We provide a service to a wide audience. We see this as an opportunity in providing a service in law.”12

Carl Malamud, a longtime advocate of free government information,13 recently said, “It’s about time legal information is free and open online to the public. Information on medicine on the Web has changed the doctor-patient relationship - but it’s still hard to do your homework before you go to see a lawyer. Law is the last bastion.”14 Because of Carl Malamud’s efforts to create free online access to government documents, Smith decided to add GPO documents to PreCYdent and to donate copies of the state cases gathered by PreCYdent’s web crawling to Malamud’s organization. Although other sites also provide GPO documents, PreCYdent, as with case law, offers a different search algorithm to reach them.

How similar is PreCYdent to Google?

PreCYdent uses a combination of algorithm and user response similar to Mahalo (Beta),15 but focuses on legal cases and statutes.16 PreCYdent uses various mathematical techniques. One in particular is eigenvector centrality.17 Its unique algorithm was developed by Antonio Tomarchio. Professor Piero Fraternali and Professor Stefano Ceri, internationally known computer scientists at the Politecnico di Milano,18 are the scientific advisors of PreCYdent.19 The PreCYdent search engine developed from the same Kleinberg principles20 used by Google’s PageRank21 but branches off by accounting for the idiosyncrasies found within two hundred years of American case law.

The question of similarity to Google must be asked and answered in context. Professor Smith says that PreCYdent cannot disclose its algorithm, and he claims that it returns better search results in selected case research than either Westlaw or Google. Google’s PageRank algorithm models the probable behavior of networks, drawing on eigenvector methods.22 PreCYdent is different from PageRank and does not infringe upon Google’s patents23 in any way, but the perception is that it is very similar to Google because of its web-based, natural language (simple search string) strategy, as opposed to the longtime Boolean search strategies of Lexis and Westlaw. The PreCYdent algorithm mines the information latent in the legal citation network to find the most authoritative and relevant legal authorities in response to the researcher’s query.24

PreCYdent’s network takes into account not only direct connections but also indirect links and temporal factors. This allows them to get around some of the obstacles of the earlier Kleinberg algorithm on legal cases. According to Smith, the Kleinberg algorithm appears biased towards very old cases, which tend to have many citations. PreCYdent’s algorithm, on the contrary, is able to recognize important “very young” authorities. Even 2008 cases can rank among the first results when using the PreCYdent search engine.25

The PreCYdent algorithm is query-dependent. The network variables and the centrality scores are calculated on demand, on the set of documents textually related to the user query. Other successful eigenvector centrality algorithms, such as PageRank, calculate the authority scores offline on the whole network, independent of the subject domain of the document. The PreCYdent algorithm represents the first large-scale application of query-dependent eigenvector centrality methods for ranking.26
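Because PreCYdent does not disclose its algorithm, the following Python sketch is only a generic illustration of what “query-dependent” centrality can mean, not the company’s method. It assumes a toy collection of cases (the identifiers, texts, and citation pairs are invented), selects the cases whose text matches the query, and then runs a PageRank-style damped power iteration, used here as a stand-in for an eigenvector centrality computation, on the citation links among those cases only.

```python
def query_subgraph(docs, citations, query):
    """Keep only cases whose text contains a query term, plus the citations among them."""
    terms = query.lower().split()
    hits = {d for d, text in docs.items() if any(t in text.lower() for t in terms)}
    edges = [(citing, cited) for citing, cited in citations
             if citing in hits and cited in hits]
    return hits, edges


def centrality(nodes, edges, damping=0.85, iterations=50):
    """PageRank-style power iteration: a case scores highly when high-scoring cases cite it."""
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}
    cites = {v: [] for v in nodes}
    for citing, cited in edges:
        cites[citing].append(cited)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A case that cites nothing in the subgraph spreads its score evenly.
            targets = cites[v] or nodes
            share = damping * score[v] / len(targets)
            for t in targets:
                nxt[t] += share
        score = nxt
    return score


def rank(docs, citations, query):
    """Compute scores at query time, on the query's subgraph only."""
    nodes, edges = query_subgraph(docs, citations, query)
    scores = centrality(nodes, edges)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Invented toy data: (citing, cited) pairs among four hypothetical cases.
DOCS = {
    "A": "warning required before custodial arrest interrogation",
    "B": "arrest and custody",
    "C": "search incident to arrest",
    "D": "tax code interpretation",
}
CITATIONS = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A")]

print(rank(DOCS, CITATIONS, "arrest"))
```

An offline, whole-network approach would instead score all four cases once and reuse those scores for every query; here case D and its citation never enter the computation because D does not match the query.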

In a recent post on the Res Ipsa Blog, Benson Varghese writes about a search he ran using PreCYdent, namely, “arrest warning,” which yielded some interesting results. The landmark Miranda decision came up first in PreCYdent, but came up sixth in Westlaw and somewhat higher in LexisNexis.27 Varghese highlights a problem that many librarians face when counseling students, paralegals, and other legal researchers on the dangers of simple (natural language) search string strategies on any database. Librarians emphasize to students the need to know something about the topic before they begin searching. PreCYdent, however, appears to build topical elements into its search results ranking. It also appears to add a layer of authority context to its relevance ranking system.

PreCYdent mines the law from law review articles and from the opinions of law professors. For example, the team sent out letters to law professors around the country about various topics. Data collected from the surveys were used to generate search strings and search responses (e.g., “private property takings”) to arrive at the most qualitatively relevant matches to a particular search. Authority found in context was a key element of their search strategy. “Measuring authority is an art rather than a science,” according to Professor Smith, and PreCYdent.com is attempting to do it in a way that has not been done before.28

You do not need to log in to PreCYdent, and you will not need to in the future. The registration feature is used to find connections among users in a social network environment, so that an advanced search engine for finding people can be offered later, when PreCYdent is more developed. Collaborative filtering is what the registration process aims to achieve. Anonymity is presumed. The PreCYdent team was sensitive to news stories about privacy issues surrounding many online services that have surfaced over the past several years.29

The latest feature introduced on PreCYdent is the first release of a case citator that allows you to check for subsequent actions on cases, such as overrulings, reversals, and affirmances. PreCYdent now offers a citator page for each case that displays all this information. For your convenience, you can also access this information in the text of a case by placing your mouse cursor over a citation. The citator will display a window visualizing the subsequent judicial history of the cited opinion.30 Although not as advanced as Shepard’s or KeyCite, the first release of the citator does offer some authority checking of case law without charge.

PreCYdent pulls out cases as relevant even if the search term does not occur in them because it analyzes the citation links among cases. There might be a case that does not use the researcher’s search term but uses a synonym for it, or a related term. Cases that do contain your search term(s), however, are connected with that case by citations. The algorithm follows the citations to that other case (the one without the exact terms you are searching for) and calculates whether it is related closely enough by citation to be made part of the search results.31
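The idea of reaching a case through citations rather than through its wording can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not PreCYdent’s undisclosed method: the case names, texts, and citation pairs are invented, and the `min_links` threshold (how many term-matching cases must be linked to a non-matching case before it is included) is a hypothetical parameter.

```python
def expand_by_citation(docs, citations, term, min_links=2):
    """Return (seeds, related): cases containing the term, and cases lacking it
    that are tied by citation to at least `min_links` distinct seed cases."""
    term = term.lower()
    seeds = {d for d, text in docs.items() if term in text.lower()}
    linked_seeds = {}
    for citing, cited in citations:
        # Count a link in either direction between a seed and a non-seed case.
        if citing in seeds and cited not in seeds:
            linked_seeds.setdefault(cited, set()).add(citing)
        elif cited in seeds and citing not in seeds:
            linked_seeds.setdefault(citing, set()).add(cited)
    related = {d for d, s in linked_seeds.items() if len(s) >= min_links}
    return seeds, related


# Invented toy data: (citing, cited) pairs among four hypothetical cases.
CASES = {
    "Case1": "eminent domain and the taking of land",
    "Case2": "just compensation under eminent domain",
    "Case3": "condemnation of private property for public use",
    "Case4": "breach of contract dispute",
}
CITES = [("Case1", "Case3"), ("Case2", "Case3"), ("Case4", "Case1")]

seeds, related = expand_by_citation(CASES, CITES, "eminent domain")
print(sorted(seeds), sorted(related))
```

Case3 never uses the phrase “eminent domain,” yet two matching cases cite it, so it joins the results; Case4 has only a single link to a matching case and is left out.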

In practice, Smith says this is usually not necessary. More commonly, you should see an important case in which your search term occurs, but only once or twice. A natural language-based algorithm such as that of Westlaw or Lexis apparently ranks according to term frequency, so an important case could be ranked low, where you may not see it, or where it might take a long time to reach. But PreCYdent can determine the case’s level of importance even though the term occurs only once or twice.32 As with other legal web sites, PreCYdent follows early informatics concepts in web retrieval of European legal information that Erich Schweighofer of the University of Vienna wrote about in 1999.33 PreCYdent’s algorithm, however, is fine-tuned for U.S. case law research.

PreCYdent features Web 2.0 tagging and ranking of cases.34 PreCYdent appears to appeal to a younger generation of users. The full extent of current PreCYdent technology might not reach an older generation of users that primarily uses a combination of print resources (e.g., digests, legal encyclopedias, etc.), electronic resources (Lexis and Westlaw), and various secondary sources (print or electronic). PreCYdent, however, is primarily aimed at everyone using the Web for fundamental case law and statutory research needs.

Smith acknowledges that, in truth, PreCYdent is adding state-of-the-art features based on brainstorming within the team. As with a lot of web startups, they plan to evolve based on user responses and what seems to work for them. Since his information technology team is young, and he works with law students, there probably is a focus toward younger users, according to Professor Smith. But they also are consciously trying to reach the general public interested in law (e.g., people he thinks of as fans of the TV show “Law & Order”). “Our thought is that there is a lot of unsatisfied interest in law both in the U.S. and abroad that a free site with powerful search could reach.”35

Asked whether PreCYdent will be for primary legal sources only, or whether links to secondary legal sources will follow, Professor Smith responded that this depends on how successful they are. “If we have the resources, we would love to bring in secondary materials. I think our algorithm would be very powerful indeed in heavily annotated materials, such as law review articles, treatises and digests.”36 PreCYdent is not only a natural language search engine for federal and state case law and statutes. It has also added Boolean and proximity searching capabilities like those of Westlaw and Lexis.37

In an issue of the Law Librarian Blog,38 Joe Hodnicki of the University of Cincinnati credits Smith for indexing the law in an innovative way, noting that legal citation indexing originated with a table of cases that Joseph Story began in 1743. At first blush, some might argue that Professor Smith might be doing the opposite with a search engine like PreCYdent because no traditional index is offered by PreCYdent. Note, this same kind of criticism was made about Westlaw, Lexis, Dialog, and other online resources when they first came out. Forms of indexing were added to their databases. Free search engines like Google have made finding information on the Web fairly easy for many by the use of simple, natural language searches.

Professor Smith argues that what his team is doing is better than indexing, or at least complementary to it. Smith explains that an index is an order imposed by one or a small team of humans on a mass of complex information, while a purely textual search engine does something that they view as pretty crude, namely pull out parts of that mass (such as documents) based on something fairly arbitrary, namely whether certain words occur in the document. What his team does is very different. “We start with the assumption that judges (in the case of judicial opinions) have made literally billions of decisions in deciding which cases to cite in their opinions. Each of these choices is an expert decision about what other case is most relevant to the opinion the judge is writing,” Smith explains.39

According to Smith, far more information is embedded in these decisions than anything a team of indexers can create. Out of these many citation decisions judges make to link one case to another, there spontaneously emerges an order. “We measure attributes of these patterns to figure out how closely cases are related to one another. So we see ourselves as measuring attributes of the legal system that really are there organically, just as an ecologist goes out and figures out how various animals in some ecosystem are related to each other,” he adds. Smith is not just indexing; he is describing a natural order. “Thus, we view ourselves as being more true to the underlying organic structures of law than the indexers are, and certainly more true than (mere) textual searching is.”40

Professor Smith thinks that perhaps one day their technology, “especially as it gets better and more refined, may stand to law somewhat as calculators stand to arithmetic and trigonometry.” Because law is getting more complicated and massive in quantity all the time, Professor Smith says we need more powerful tools just to stay in the same place. He also strongly feels that technology like PreCYdent’s can make complex bodies of law more transparent to the ordinary, intelligent citizen without legal training. “This transparency helps promote the rule of law. So without being grandiose about it, I do believe our project is connected with promoting the rule of law in an increasingly complex world.”41


1 Thomas A. Smith, A.B. 1979, Cornell University; B.A. 1981, Oxford University; J.D. 1984, Yale University. Professor of Law, University of San Diego School of Law, 1992 - Present. http://www.sandiego.edu/usdlaw/faculty/facprofiles/smithta.php (last visited May 17, 2008).

2 923,211 cases reported by PreCYdent.com as of May 18, 2008. See http://precydent.com/ (last visited May 18, 2008).

3 51,583 statutes reported by PreCYdent.com as of May 18, 2008. See http://precydent.com/ (last visited May 18, 2008).

4 PreCYdent (Technical) Memo, March 28, 2008 (on file with author).

5 Albert-László Barabási, Linked:  The New Science of Networks, Perseus, Cambridge, MA, 2002.

6 Thomas A. Smith, “Web of Law,” University of San Diego Law School, Spring 2005, San Diego Legal Studies Research Paper No. 06-11 (abstract and SSRN electronic paper), available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=642863 (last visited May 18, 2008).

7 Antonio Tomarchio, CTO and co-founder of PreCYdent. Degree in Mathematical Engineering from Politecnico di Milano, Milan, Italy. Co-authored (with Frank B. Cross and Thomas A. Smith) “Determinants of Cohesion in the Supreme Court’s Network of Precedents,” U. of Texas Law, Law and Econ. Research Paper No. 90, 2nd Annual Conference on Empirical Legal Studies Paper, San Diego Legal Studies Paper No. 07-67 (abstract and SSRN electronic paper), available at http://search.ssrn.com/sol3/papers.cfm?abstract_id=924110 (last visited May 17, 2008).

8 Telephone interview with Thomas A. Smith, CEO and co-founder of PreCYdent.com (April 28, 2008).

9 Id.

10 Email correspondence with Thomas A. Smith, CEO and co-founder of PreCYdent.com (May 5, 2008).

11 Google Analytics™, available at http://www.google.com/analytics/ (last visited May 18, 2008).

12 Telephone interview with Thomas A. Smith, CEO and co-founder of PreCYdent.com (April 28, 2008).

13 John Markoff, “A Quest to Get More Court Rulings Online, and Free,” New York Times (Technology Section), August 20, 2007, available at http://www.nytimes.com/2007/08/20/technology/20westlaw.html (last visited May 19, 2008).

14 Anne Eisenberg, “Lawyers Open Their File Cabinets for a Web Resource,” New York Times (Technology Section), April 27, 2008, available at http://www.nytimes.com/2008/04/27/technology/27novel.html (last visited May 13, 2008).

15 Mahalo Beta, available at http://mahalo.com/ (last visited May 18, 2008).

16 “PreCYdent Setting the Precedent,” Tech Coast Review, March 6, 2008, available at http://www.techcoastreview.com/2008/03/precydent-setting-precident.html (last visited June 2, 2008).

17 PreCYdent Technical Memo 3, March 28, 2008 (on file with author).

18 Id.

19 Id.

20 Principles developed by Jon Kleinberg, Professor of computer science at Cornell University. A chronological list of his published papers can be found at http://www.cs.cornell.edu/home/kleinber/chrono.html.

21 PageRank is a system for ranking web pages developed by Google founders Larry Page and Sergey Brin at Stanford University. See http://www.google.com/technology/ (last visited May 17, 2008). See also T.H. Haveliwala, “Topic-Sensitive PageRank:  A Context-Sensitive Ranking Algorithm for Web Search,” IEEE Transactions on Knowledge and Data Engineering, 15 (4):784–796 (August 2003).

22 See M. Burgess, G. Canright, and K. Engø, “A Graph Theoretical Model of Computer Security:  From File Access to Social Engineering,” International Journal of Information Security, 3(2):70–85 (November 2004); G. Canright and K. Engø-Monsen, “A Natural Definition of Clusters and Roles in Undirected Graphs,” Science of Computer Programming, 53:195 (2004); and I. Bytyci, “Monitoring Changing in the Stability of Networks Using Eigenvector Centrality,” Master Thesis, 26-29, Oslo University College, Department of Informatics, University of Oslo, Norway (2006), available at http://research.iu.hio.no/theses/pdf/master2006/ilir.pdf (last visited May 7, 2008).

23 Google patent for PageRank found at http://www.pat2pdf.org/patents/pat6285999.pdf (last visited May 13, 2008).

24 Telephone interview with Thomas A. Smith, CEO and co-founder of PreCYdent.com (April 28, 2008).

25 PreCYdent Technical Memo 3, March 28, 2008 (on file with author).

26 Id. at 3-4.

27 Benson Varghese, “PreCYdent - A New Tool for Lawyers,” Res Ipsa Blog (posted in News, Tech Tips for Lawyers), April 20, 2008, available at http://resipsablog.com/2008/04/20/precydent-a-new-tool-for-lawyers/ (last visited May 13, 2008).

28 Telephone interview with Thomas A. Smith, CEO and co-founder of PreCYdent.com (April 28, 2008).

29 See Maria Aspen, “How Sticky Is Facebook Membership? Just Try Breaking Free,” New York Times (Technology Section), February 11, 2008, available at http://www.nytimes.com/2008/02/11/technology/11facebook.html (last visited May 18, 2008).

30 “PreCYdent Releases First Version of its Citator,” Precedent News, May 8, 2008, available at http://precydent.com/news.html (last visited May 18, 2008).

31 Email correspondence with Thomas A. Smith, CEO and co-founder of PreCYdent.com (May 5, 2008).

32 Id.

33 See Erich Schweighofer, “The Revolution in Legal Information Retrieval or: the Empire Strikes Back,” Journal of Information, Law & Technology, 1999(1): (1999), available at http://www2.warwick.ac.uk/fac/soc/law/elj/jilt/1999_1/schweighofer (last visited May 18, 2008).

34 Robert J. Ambrogi, “Sophisticated Search for Public Domain Law,” Robert Ambrogi’s Lawsites:  Tracking New and Intriguing Web Sites for the Legal Profession, available at http://www.legaline.com/2008/01/sophisticated-search-for-public-domain.html (last visited May 18, 2008).

35 Email correspondence with Thomas A. Smith, CEO and co-founder of PreCYdent.com (May 5, 2008).

36 Id.

37 See http://www.precydent.com/opinionbasic.html?keywords=&more=Jurisdictions+%26+options#

38 Joe Hodnicki, “Law Prof as Toolmaker:  An Interview with PreCYdent’s Thomas A. Smith (San Diego),” Law Librarian Blog:  A Member of the Law Professor Blogs Network (January 29, 2008), available at http://lawprofessors.typepad.com/law_librarian_blog/2008/01/law-prof-as-too.html (last visited May 13, 2008).

39 Email correspondence with Thomas A. Smith, CEO and co-founder of PreCYdent.com (May 5, 2008).

40 Id.

41 Id.



The ALL-SIS Newsletter