Thoughts at the boundary between machine and mind

5 May 2022
New York, NY
8 mins

In the last post, I shared some possible ideas for how humans may interact in the future with large language models. It focused on specific examples of both good and bad interface ideas. In this post, I want to continue that exploration, but from first principles, asking ourselves the question, “what properties should good human-AI interfaces have?”

AI interface design is an AI alignment problem

As AI systems like GPT-3 and DALL-E get more and more capable, there’s going to be more and more leverage placed upon the interfaces through which humans try to guide their capabilities. Compared to the rate at which AI capabilities are progressing, I think interfaces to guide and control such capabilities are worryingly stagnant. In the last post, I wrote:

In a standard text generation process with an LM, we control the generated text through a single lever: the prompt. Prompts can be very expressive, but the best prompts are not always obvious. There is no sense in which we can use prompts to “directly manipulate” the text being generated – we’re merely pulling levers, with only some rules of thumb to guide us, and the levers adjust the model’s output through some black-box series of digital gears and knobs. Mechanistic interpretability research, understanding how these models work by breaking them down into well-understood sub-components and layers, is showing progress, but I don’t expect even a fully-understood language model (whatever that would mean) to give us the feeling of directly, tactilely guiding the text being generated, as if we were “in the loop”.

We currently control other generative AI systems like DALL-E 2 through the same rough kind of lever: a short text prompt. Text prompts are nice for play and creative exploration, but they take a lot of time to craft, and they are limited in the amount of information they can contain and communicate to the model. Text snippets also can’t be smoothly varied or adjusted incrementally, so they are poor levers for fine control of model output – it’s not trivial to take a prompt and just “dial up” the specificity or “tune out” fixation on certain kinds of topics, because these require thoughtful intervention by skilled prompt writers. Text prompts are a coarse, inefficient interface to an increasingly complex black box of capabilities.
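To make “smoothly varied” concrete: continuous representations, unlike text snippets, can be interpolated. Here is a minimal sketch using hypothetical two-dimensional “prompt embeddings” (real embeddings would be learned and high-dimensional; the `casual`/`formal` vectors below are purely illustrative):

```python
# Why continuous representations can serve as finer levers: unlike text
# prompts, vectors can be varied smoothly. The embeddings here are
# made-up stand-ins, not outputs of any real model.
def lerp(a, b, t):
    """Blend vector a toward vector b by fraction t in [0, 1]."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

casual = [0.9, 0.1]   # stand-in embedding for a "casual tone" prompt
formal = [0.1, 0.9]   # stand-in embedding for a "formal tone" prompt

# A slider in an interface could sweep t to "dial" tone continuously --
# a kind of incremental adjustment no edit to a text prompt offers.
for t in (0.0, 0.5, 1.0):
    print(lerp(casual, formal, t))
```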

This lack of fine control and feedback in our interface to large models isn’t just a creative inconvenience, it’s also a risk. The paper on training DeepMind’s Gopher language model shares an 800-token-long prompt used to start a conversation with the Gopher model. It begins with:

The following is a conversation between a highly knowledgeable and intelligent AI assistant, called Gopher, and a human user, called User. In the following interactions, User and Gopher will converse in natural language, and Gopher will do its best to answer User’s questions. Gopher was built to be respectful, polite and inclusive. It knows a lot, and always tells the truth. The conversation begins.

It’s notable that most of this excerpt, as well as the rest of the prompt, is focused on alignment – telling the truth, staying inclusive and respectful, and avoiding common biases and political statements.

Interfaces and notations form the vocabulary humans and machines must use to stay mutually aligned. Human-AI interface design, then, is a part of the AI alignment problem. If we are given only coarse and unintuitive interfaces, we’re going to have a much harder time getting ever-more-complex models to work in harmony with our values and goals.

Boundary objects for thought

Here’s the fundamental question we face when designing human-AI interface metaphors: what is the right representation for thought? For experience? For questions? What are the right boundary objects through which both AI systems and humans will be able to speak of the same ideas?

The concept of boundary objects comes from sociology; it refers to objects that different communities can use to work with the same underlying thing. A boundary object may appear differently to each community, but the underlying thing it represents doesn’t change, so it lets everyone with access to it collaborate effectively across potential interface “boundaries”.

I first encountered the term in Matt Webb’s piece about files as boundary objects, where he emphasizes that files are boundary objects that bridge the divide between software engineers and computer users through an easily understood shared metaphor.

The user can tell the computer what to do with a file without having to know the details of the inode structure or how to program their instructions; the computer can make a file available to a user without having to anticipate every single goal that a user may have in mind.

The “boundary object” quality of a file is incredibly empowering, magical really, one of the great discoveries of the early decades of computing.

I agree! Files act like reliable “handles” that let computer users manipulate bundles of data across the programmer-user boundary. The robustness and reliability of the file metaphor have been foundational to personal computing.

If files bridge the interface divide between software authors and end users (computer programs and end users?), what boundary objects may help bridge the divide between human-level AI and human operators? In particular, I started wondering what a “boundary object for thought” may look like. What metaphor could we reify into a good shared “handle” for ideas between language models and humans? I mused a bit on my direction of thinking on my stream:

What happens if we drag-to-select a thought? Can we pinch-to-zoom on questions? Double-click on answers? Can I drag-and-drop an idea between me and you? In the physical world, humans annotate their language with an elaborate organic dance of gestures, tone, pace, and glances. How, then, do we shrug at a computer or get excited at a chatbot? How might computers give us knowing glances about ideas they’ve stumbled upon in our work?

If text prompts are a coarse and unergonomic interface to communicate with language models, what might be a better representation of thought for this purpose?

I… don’t know yet. But I’ve been enumerating some useful properties I think such a software representation of ideas should have.

Properties of promising knowledge representations

We should be able to directly manipulate good knowledge representations. Files are useful boundary objects because we can move them around in the human-scale space of pixels on screen, and there are usually intuitive corresponding operations on files in the software space. I can create and delete files and see icons appear and disappear on screen. I can put a file in the trash and drag it back out. It would be useful to be able to grab a sentence, paragraph, or instruction fed into a language model as a reified thing in the interface, and directly move it around in software to combine it with other ideas and modify it.
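As a sketch of what reifying prompt fragments into manipulable software objects might look like, here is a toy example. The names (`IdeaFragment`, `compose`) are hypothetical illustrations, not any real library’s API:

```python
# A minimal sketch of "reified" prompt fragments as objects a user
# could grab, reorder, and combine -- the textual analogue of dragging
# icons around on a desktop.
from dataclasses import dataclass

@dataclass
class IdeaFragment:
    text: str

def compose(*fragments: IdeaFragment) -> str:
    """Combine fragments into a single prompt, the way a user might
    drag reified idea-objects together in an interface."""
    return " ".join(f.text for f in fragments)

tone = IdeaFragment("Write in a warm, informal tone.")
task = IdeaFragment("Summarize the following article:")
print(compose(tone, task))
```

The point of the object wrapper is not the string it holds but the handle it provides: something an interface can let you select, duplicate, and recombine without retyping prose.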

A good representation for thought should make useful information about each idea obvious to users, through some interaction or visual cue. When I look at a file on my computer, I can immediately learn a few things about it, like its file type, which apps can open it, whether it’s an image or a video or a document, and so on. I may even get a small preview thumbnail. File browsers let me sort and organize files by size, type, and age. Some files (on certain file systems) even remember where they were downloaded from. When I try to imagine some software-defined “idea-object”, I don’t expect it to have such crisply defined properties as file types and file size. But I think we should be able to easily tell how related two different idea-objects in front of us are, whether they’re in agreement or disagreement, or whether one mentions a person or thing also mentioned in another idea-object. I think it’s fair to expect “idea browsers” that deal with these thought-objects to easily let me cluster my ideas into topics or sort them by relatedness to some main idea.
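A toy illustration of sorting idea-objects by relatedness: a real “idea browser” would use learned embeddings, but even a bag-of-words cosine similarity is enough to show the shape of the interaction:

```python
# A toy sketch of how an "idea browser" might rank idea-objects by
# relatedness to a query idea. Bag-of-words cosine similarity stands in
# here for the learned embeddings a real system would use.
import math
from collections import Counter

def similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

ideas = [
    "files are boundary objects between users and programs",
    "prompts are a coarse interface to language models",
    "boundary objects let different communities share one thing",
]
query = "boundary objects bridge communities"
ranked = sorted(ideas, key=lambda i: similarity(query, i), reverse=True)
print(ranked[0])
```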

Lastly, this software representation of thought should remember where each idea came from, sort of like a file that remembers where it was downloaded from. As I was prototyping my own note-taking tool earlier this year, one of the features I wanted in a notes app was the ability to track the origins of an idea from beginning to end – from the first time I hear of it, whether in a conversation or a blog post or a video, to the “final form”, usually a blog post. Good ideas are little more than interesting recombinations of old ideas, some from my own past, some from books and articles. I think we don’t keep track of the provenance of our ideas because it’s just too tedious in our current workflows. If the default way of organizing and working with ideas automatically cited every word and phrase, I think it would lead to more powerful knowledge workflows.
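A sketch of what an idea-object that remembers its provenance could look like in software, the way some files remember where they were downloaded from. Every name here (`Idea`, `Source`, `derive`) is a hypothetical illustration:

```python
# An idea-object that carries its source trail with it. Deriving a new
# idea inherits the full provenance of its parent, so citations come
# for free instead of requiring tedious manual tracking.
from dataclasses import dataclass, field

@dataclass
class Source:
    kind: str       # "conversation", "blog post", "video", ...
    reference: str

@dataclass
class Idea:
    text: str
    sources: list = field(default_factory=list)

    def derive(self, new_text: str) -> "Idea":
        """A new idea inherits its parent's sources, plus the parent."""
        return Idea(new_text, self.sources + [Source("idea", self.text)])

seed = Idea("files as boundary objects", [Source("blog post", "Matt Webb")])
refined = seed.derive("boundary objects for thought")
print(len(refined.sources))  # the seed's source, plus the seed itself
```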

Even as I write these paragraphs, it bothers me that these “properties” are so vague, and don’t really tell us anything about what future interfaces for working with notes and ideas will look like. (I suppose, though, if it were that obvious, we would have it already.) A big focus of my current work is on prototyping different ways to reify ideas and thoughts into software objects, and implementing those designs using modern NLP techniques. The road ahead is foggy and uncertain, but I think this is an exciting and worthwhile space. Maybe in five years’ time, you won’t be reading these posts as just walls of text on a webpage, but something entirely new – a new kind of interface between the machine and your mind.
