Legal documents are pushing text interfaces forward

30 April 2022
New York, NY
4 mins

A core research interest of mine is imagining new kinds of interfaces to text documents that are made possible by modern AI and software. I think an interesting place to look for such ideas may be interface designs for reading and writing legal documents.

The legal document-wrangling space has a handful of properties that make it fertile ground for innovative ways for humans to interact with text-dense documents:

  1. Large, well-financed market of users that will pay for small advantages and efficiency improvements in workflows
  2. Heavy use of well-established jargon that can be easily machine-parsed and referenced (“Plaintiff”, “Company”, etc.), often in standardized document formats (PDF and MS Word)
  3. Documents that are amenable to objective, fact-based analysis – a merger agreement can be distilled down to a bullet-point list of facts, hypotheticals, and consequences; Moby Dick, not so much.

If you squint, legal text looks a lot like source code: there are terms, their definitions, and references to them, and together they form graphs that obey certain rules (only one definition per term, all references to terms have to resolve, and so on). There are also established syntax norms around how terms are defined, how sections and subsections are notated, how court cases and external links are named, and how all these concepts are referenced in-line in the “source code”, if you will, of contracts and laws.
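To make the source-code analogy concrete, here is a toy sketch of my own (not drawn from any real legal tool) of compiler-style checks over a contract’s defined terms, assuming the common drafting convention of defining a term inline as `(the "Company")` and referencing it later as “the Company”:

```python
import re

# Toy sketch: treat a contract's defined terms like a symbol table
# and run compiler-style checks on it. The regexes assume the inline
# definition convention, e.g.:  Acme Corp (the "Company")

def check_terms(text):
    """Return problems: duplicate definitions and unresolved references."""
    problems = []
    defined = set()
    # Collect definitions, flagging any term defined more than once
    for term in re.findall(r'\(the\s+"([^"]+)"\)', text):
        if term in defined:
            problems.append(f'duplicate definition: "{term}"')
        defined.add(term)
    # Every capitalized "the X" reference should resolve to a definition
    for ref in set(re.findall(r'\b[Tt]he\s+([A-Z][A-Za-z]+)\b', text)):
        if ref not in defined:
            problems.append(f'unresolved reference: "{ref}"')
    return problems
```

Real tools parse documents far more robustly than a pair of regexes, but even this crude version captures the symbol-table flavor of the analogy.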

Because of the structured, regular nature of legal text, I think tools in this space have had an easier time building new interfaces and functionality to help professionals do their work. Many of the ideas in these tools would also improve the way we work with ordinary text documents and books, but we may have to wait for further advances in technology to realize them. Until then, I think legal tools offer us interesting glimpses into what futuristic reading and writing interfaces may look like.

Some examples

Coparse is a “smart PDF reader” that lets readers click on terms to jump to their definitions, expand chains of references, and automatically detect and resolve mentions of sections and named parties in documents. Coparse can also raise “compilation errors” when writing legal documents, warning the user of things like terms defined twice, mislabelled or missing sections, and unused definitions.
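To illustrate the flavor of “compilation errors” such a tool might raise, here is a toy sketch of my own (not how Coparse actually works), assuming `(the "Term")` definitions and headings of the form `Section N.`:

```python
import re

# Toy sketch of two "compilation errors" for legal documents:
# mislabelled/skipped sections and defined-but-unused terms.

def lint_sections(text):
    """Warn when top-level section numbers are skipped or out of order."""
    numbers = [int(n) for n in re.findall(r'^Section (\d+)\.', text, flags=re.M)]
    return [f'Section {cur} follows Section {prev}'
            for prev, cur in zip(numbers, numbers[1:])
            if cur != prev + 1]

def lint_unused_terms(text):
    """Warn about terms that are defined but never referenced afterwards."""
    warnings = []
    for m in re.finditer(r'\(the "([^"]+)"\)', text):
        term = m.group(1)
        # Look for a "the <Term>" reference anywhere after the definition
        if not re.search(r'\b[Tt]he ' + re.escape(term) + r'\b', text[m.end():]):
            warnings.append(f'unused definition: "{term}"')
    return warnings
```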

Casetext is a research tool for searching the legal literature, including the user’s own PDF documents and case law from past court decisions. Casetext features an AI-powered search tool that can take a specific description of a case (presumably a case a lawyer is defending) and search the available literature for similar cases, where important details like the charge and jurisdiction match.

There are a few other spaces that display weaker versions of the same properties that make the legal field such an interesting laboratory for UI experiments. One I’ve noticed this year is the space of tools to help researchers read and understand academic literature.

On the interface design side, here are a few research prototypes I’ve stumbled upon that try to give researchers better tools to digest academic papers.

  1. The ScholarPhi paper, Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols, explores an interface within a PDF reader that can automatically detect and surface various variables and terms and the places within the text where they are defined.
  2. The HEDDEx system tries to automate reliable definition extraction – answering the question “what does this term mean in this context?” for technical terms or newly defined terms in a research paper.
  3. Gehrmann, Layne, and Dernoncourt’s paper on autogenerating section titles explores whether using language models to automatically synthesize short section titles can help scholars read papers more effectively.

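To give a flavor of the definition-extraction problem these systems tackle, here is a naive surface-pattern sketch of my own (a toy illustration only; systems like HEDDEx rely on trained models, not handwritten regexes like these):

```python
import re

# Two naive surface patterns for spotting term definitions in a sentence.
DEF_PATTERNS = [
    # "graph neural network (GNN)" -> term + abbreviation
    r'(?P<term>[A-Za-z -]+?) \((?P<abbr>[A-Z]{2,})\)',
    # "X is defined as ..." -> term + definition text
    r'(?P<term>\w[\w -]*) is defined as (?P<definition>[^.]+)',
]

def extract_definitions(sentence):
    """Return (term, abbreviation-or-definition) pairs found by the patterns."""
    found = []
    for pattern in DEF_PATTERNS:
        for m in re.finditer(pattern, sentence):
            groups = m.groupdict()
            found.append((groups['term'].strip(),
                          groups.get('abbr') or groups.get('definition')))
    return found
```

Patterns like these are brittle (they miss paraphrased definitions and misjudge term boundaries), which is exactly why position-sensitive, learned approaches are an active research area.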
Of course, we can’t end without mentioning Elicit, the team behind which is exploring using language models as research assistants. Elicit’s current work is less about interfaces and more about capabilities: can language models, by their brute-force intelligence and mastery of language, help researchers by summarizing, understanding, critiquing, and discovering papers relevant to every researcher’s work?

Both law and academia are serious contexts of use for knowledge tools, and I think they breed interesting innovation in the space. As these tools advance and their ideas become battle-tested over time, hopefully we’ll see the best of them make their way downstream to everyday tools for consumers as well.


I share new posts on my newsletter. If you liked this one, you should consider joining the list.

Have a comment or response? You can email me.