Where is All the Legal Data?
Last summer, UNC’s law journals tasked prospective student staff members with writing a note. The subject was to be State v. Gaddis, a criminal case involving a finding of harmless error. The harmless error doctrine holds that when a trial court errs, but the error does not affect the case’s outcome, an appellate court will not reverse the trial court’s decision.
This doctrine is controversial. We can never know for certain what would have happened without the error. The safest option would be to remand for a new trial every case in which the trial judge erred. But that would be inefficient; the outcome of the second trial would often be the same. And efficiency matters, given the volume of cases courts must process and the backlogs that often result.
As one of the many law students working on this note, I found myself wondering: doesn’t it matter how many cases contain a harmless error? The lower the number, the less convincing the efficiency rationale would be.
But despite my best research, I could find no statistics on harmless error holdings—in North Carolina or anywhere else.
Later that summer, I ran into the same problem involving a different legal issue. North Carolina’s alienation of affection doctrine allows a person to sue their unfaithful spouse’s lover. I wondered how this cause of action interacted with gender norms. Historically, did men or women bring more alienation of affection claims—and was one gender more successful in winning these claims than the other? But as with harmless error, I found no statistics on alienation of affection holdings.
I have now run into this issue more times than I can count. I’m surprised to find so little conversation about it among lawyers.
It is not surprising that statistics describing niche legal issues are difficult to find. Gathering the outcomes of all cases that deal with a particular legal issue is a monumental task, especially at the trial level. North Carolina only recently began to digitize basic docket information from trial courts.
But it doesn’t need to be this way. State and federal courts could record the basic characteristics of every cause of action filed before them: the core legal claims, the demographic characteristics of the parties, the outcome of each claim, and so on. They could then assemble this data into a publicly accessible dataset, with particularly sensitive information redacted as needed. Importantly, the dataset would need to contain information at the level of individual cases or causes of action. If courts released only summary statistics describing their cases broadly, lawyers would not be able to use the data to answer niche questions, such as how often a particular type of constitutional error is found to be harmless.
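To make the idea concrete, here is a minimal sketch, in Python, of what a case-level record could look like and the kind of niche question such records would let anyone answer. Every field name, case identifier, and figure below is invented for illustration; no court currently publishes data in this form.

```python
# Hypothetical sketch of a case-level record and a niche query it enables.
# All fields and values are invented for illustration.

from dataclasses import dataclass

@dataclass
class ClaimRecord:
    case_id: str          # anonymized docket identifier
    claim_type: str       # e.g., the kind of error or cause of action alleged
    error_found: bool     # did the appellate court find trial error?
    error_harmless: bool  # if so, was the error held harmless?
    outcome: str          # e.g., "affirmed", "reversed", "remanded"

# Toy stand-in for a comprehensive, court-published collection of records.
records = [
    ClaimRecord("2023-CR-001", "jury instruction", True, True, "affirmed"),
    ClaimRecord("2023-CR-002", "jury instruction", True, False, "reversed"),
    ClaimRecord("2023-CR-003", "evidentiary ruling", False, False, "affirmed"),
    ClaimRecord("2023-CR-004", "jury instruction", True, True, "affirmed"),
]

# The kind of question summary statistics cannot answer: of cases in which a
# jury-instruction error was found, how often was it held harmless?
errors = [r for r in records if r.claim_type == "jury instruction" and r.error_found]
harmless = [r for r in errors if r.error_harmless]
print(f"Harmless error rate for jury-instruction errors: {len(harmless)}/{len(errors)}")
```

The point of the sketch is only that answering this question requires individual records, not aggregates: a court that published a single “reversal rate” could never be queried this way.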
Providing this data would put an administrative burden on courts, but not an unreasonable one. Many federal agencies already record basic data for all administrative law cases, and legal academics have found ways to make this data more accessible, such as a random sample of the data from all EEOC cases.
Partial solutions may already exist for well-resourced law firms. Westlaw and Lexis boast legal data services, though it is unclear how comprehensive their data is. I reached out to both for more information and have not heard back.
For the rest of us, legislation is the best path forward. Legislation requiring the collection and publication of data would not be unprecedented. In 1999, the North Carolina General Assembly passed N.C. Gen. Stat. § 143B-903, a law requiring police officers to record traffic stop data. As a result, we now have access to a massive dataset containing every traffic stop made since 2002 in most parts of North Carolina. The data includes each stopped person’s age, race, and sex, whether an arrest was made, and what the arrest was for. This data has spawned litigation, books on policing, and research on racial discrimination.
A similar dataset describing court cases would be even more valuable. Lawyers could better understand their clients’ chances of winning a claim by looking at the success rate of past claims with similar litigants. Legislators and judges could better assess the consequences of changing legal doctrines. The impact on legal scholarship would be profound. In law, “empirical research” too often refers to a manual analysis of published cases available through Westlaw or Lexis. These tend to be a biased sample of the orders and decisions courts produce. If a comprehensive collection of case data were available, legal scholars could produce higher-quality research—often without needing an army of research assistants to read cases.
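As a rough illustration of that first use, the sketch below shows how a lawyer or researcher might estimate success rates for a single cause of action, broken down by a party characteristic, if case-level records were available in a tabular form. The pandas table, the column names, and the numbers are all invented assumptions, not a description of any existing dataset.

```python
# Hypothetical sketch: success rates for one cause of action, grouped by a
# party characteristic, computed from an assumed table of case-level records.

import pandas as pd

# Toy stand-in for a court-published, case-level dataset.
claims = pd.DataFrame(
    {
        "claim_type": ["alienation of affection"] * 6,
        "plaintiff_gender": ["F", "M", "F", "M", "F", "M"],
        "plaintiff_won": [True, False, True, True, False, True],
    }
)

# Fraction of winning plaintiffs, by gender, for a single cause of action.
rates = (
    claims[claims["claim_type"] == "alienation of affection"]
    .groupby("plaintiff_gender")["plaintiff_won"]
    .mean()
)
print(rates)
```

With real records, the same few lines would answer the alienation of affection question posed above; without them, that question remains unanswerable.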
The first step in getting detailed legal data is to ask for it. Lawyers should no longer be satisfied receiving only piecemeal information, like individual decisions, or generalized information, like high-level summary statistics, from courts. Courts can publish comprehensive and detailed case data. For both civic and professional reasons, we should demand that they do so.
Justin Giles
Justin Giles is a 2L at the University of North Carolina School of Law and an MPP student at the Duke Sanford School of Public Policy. Be on the lookout for Justin’s recent development, A Noisy Debate: Should the Law Force the U.S. Census Bureau to Produce Inaccurate Data?, in our forthcoming spring issue.