I spent the last couple of months digging deeper into how I could bring modern machine learning into the personal knowledge tools I love to build. This is a space brimming with untapped ideas and experiments to come. One open question for me is exactly how human users should interact with AI integrated into knowledge tools and creative tools.
There are undoubtedly many ways to break this apart, but today, I’m finding it helpful to draw a spectrum from AI being used as a tool to AI being treated like a collaborator in our workflows.
AI as a tool
In this model, the human calls on the AI to improve or enhance something they’re making, in very specific, well-defined ways.
This might take the form of annotations or specific UI elements that “call on” the AI: a language model leaving comments on your essay to fix mistakes, a feature that highlights awkward wording, or an easy way to automate repetitive design tasks in a prototyping tool. In marketing speak, this might be called “Smart X”: “Smart erase” to automatically clean up a photo in a photo editor, or “Smart spell check” to automatically fix nuanced grammar mistakes.
As a real case study, Apple’s last few generations of iPhones have featured “Smart HDR”, which takes the basic principle of HDR (compositing multiple photos of the same scene into a single higher-quality image) and leans on AI to do more, like reducing blur and adjusting skin tones.
With this approach of integrating AI into our creative workflows, the AI is always subordinate to human users. It has no agency but that which is granted exactly and literally by the human operator.
AI as a collaborator
In this model, the human and the AI are two independent, autonomous agents engaged with the work as equals, with access to the same interaction mechanics and tools to accomplish the task together. Working with this kind of AI is like working with a smart human collaborator: you don’t invoke them to perform some specific task; you learn how they think, they learn how you think, and you develop a sense of how to produce the best ideas together. The collaboration is much more organic, and there’s a constant feedback loop informing both participants about the ever-changing creative direction.
I could imagine a world where photo editors, writing apps, note-taking apps, and IDEs all have collaborator-style AI built in, which you can turn on to let it lurk in the background and chime in here and there with suggestions like:
- “This style of color you’re using reminds me of this artist you’ve liked / you haven’t heard of yet. I’ll paste some images from their portfolio here.”
- “This question you wrote down sounds like a good one to tackle with this mental model you wrote about last week. Let me fill out the template for this question here.”
- “You’re using this pattern a lot in your code — here, I’ll refactor it into a function/template we can reuse.”
There are lots of details to iron out before any of these ideas can land in real tools, not to mention advancing the state of AI to achieve this level of helpfulness and accuracy: details like whether the human should be required to approve every machine-made suggestion, or to whom copyright is assigned. But these minutiae aren’t so important for the thought experiment I’m laying out here, because one way or another, these questions will be answered.
The collaborator model has two benefits:
- It dramatically opens up the potential capabilities of AI built into our tools, because the AI can do anything a human collaborator could do without needing some extra button or call-out in the interface for controlling “AI-assisted” features.
- Because the AI interacts with the tool and the human creator through the same mechanics the user already knows, there are no new interaction mechanics to learn in order to take advantage of its intelligence.
The second benefit feels important to me. In Transferable mechanics, Mary Rose notes that good interaction design often lets users apply mechanics they learned in one context to interact with something new. This lets them gain new capabilities without having to understand new mechanics or abstractions, and it helps new users learn tools or games much faster, while making the process more enjoyable.
The “type-ahead” suggestions we’re starting to see in Google Docs and Gmail are an imperfect example of the collaborator model of AI. Rather than hiding a “complete the sentence” button in a right-click menu, the AI “types ahead” in little grey letters, and you can confirm its suggestion with a single tap of the Tab key. There’s hardly anything new to learn, because it’s just finishing our sentences. Something feels pretty right about it compared to having to “call on” or “invoke” a feature. GitHub Copilot uses a similar mechanic, and it feels pretty good to me too.
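To make that mechanic concrete, here’s a minimal sketch of how a type-ahead loop might be wired up. It’s illustrative only: the `Editor` and `Model` interfaces are hypothetical stand-ins I’ve invented for this post, not the actual APIs of Google Docs, Gmail, or Copilot.

```typescript
// Hypothetical editor surface: just enough to render and accept ghost text.
interface Editor {
  textBeforeCursor(): string;
  showGhostText(text: string): void; // render the suggestion in grey
  clearGhostText(): void;
  insertAtCursor(text: string): void;
  onKeyDown(handler: (key: string) => boolean): void; // return true to consume the key
}

// Hypothetical model: predicts a continuation of the user's text.
interface Model {
  complete(prefix: string): Promise<string>;
}

function attachTypeAhead(editor: Editor, model: Model): void {
  let suggestion = "";

  // Ask the model to continue whatever the user has typed so far,
  // and show the result inline rather than behind a button or menu.
  async function refresh(): Promise<void> {
    suggestion = await model.complete(editor.textBeforeCursor());
    if (suggestion) editor.showGhostText(suggestion);
  }

  editor.onKeyDown((key) => {
    if (key === "Tab" && suggestion) {
      // The only "new" mechanic: Tab accepts the ghost text.
      editor.insertAtCursor(suggestion);
      suggestion = "";
      editor.clearGhostText();
      return true; // consume the Tab so it doesn't indent
    }
    // Any other keystroke invalidates the suggestion and requests a fresh one.
    suggestion = "";
    editor.clearGhostText();
    void refresh();
    return false;
  });
}
```

The design detail that matters here is that the suggestion lives in the same text stream the user is already editing: accepting it costs one keystroke, and ignoring it costs nothing.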
In these tools, the AI inherits our powers, and we inherit its intelligence in return, by working alongside it using the same buttons and knobs.
Obviously, many features and ideas will fall somewhere between these two archetypes. But I think this spectrum is a useful tool to help us imagine interesting ways to design AI into our creative tools: how much agency should it have? How specific or general should its expertise be? Is it a button we press, or another player whose footsteps we can follow to stumble upon something new?
Thanks to Karina Nguyen for helpful feedback that added to a revision of this post.