Sometimes approximate is fine

This newsletter strays a bit from the subject of management, but not too far. We’re in an era where every person in management is thinking about the implications of AI and LLMs on their job, on their firms, and on their industries. What I’m finding, though, is that the nature of LLMs isn’t really well understood, and both promoters and skeptics are struggling to imagine how to apply them to problems effectively. We see this both in claims that LLMs are completely useless and untrustworthy, and also in “AI-powered features” that customers find detract from their experience.
Rather than starting out talking about LLMs, I’m going to talk about another technology that is similar to LLMs in nature, but is much, much simpler — probabilistic data structures. Here’s the definition from the Redis documentation:
Probabilistic data structures give approximations of statistics such as counts, frequencies, and rankings rather than precise values. The advantage of using approximations is that they are adequate for many common purposes but are much more efficient to calculate. They sometimes have other advantages too, such as obfuscating times, locations, and other sensitive data.
This is probably most easily explained with an example. A Bloom filter is a data structure that’s used to check for the presence of an item in a set, and it’s probabilistic in a specific way: it’s always right about items not being in the set, but it’s occasionally wrong about items being in the set. The advantage of the Bloom filter is that it doesn’t use much storage and it performs much more efficiently than an ordinary lookup does as the size of the set grows. For example, the Chrome browser maintains a list of malicious URLs. If a URL is not on the list, it’s treated as safe. If it may be on the list, Chrome checks the full list to make sure. This is invisible to the user but improves the performance of the browser.
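To make the one-sided error concrete, here’s a minimal Bloom filter sketch in Python. The sizes, hash scheme, and URL names are arbitrary choices for illustration, not what Chrome actually uses:

```python
import hashlib

class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, m_bits=10_000, k_hashes=7):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means "definitely not in the set"; True means "possibly in it".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

Note the asymmetry: `add` sets every position, so a member can never hash to an unset bit (no false negatives), while an absent item occasionally lands on bits set by other members (rare false positives). With these parameters, 1,000 entries fit in about 1.25 KB with a false-positive rate around 1%.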
There are a couple of points to take away here. The first is that in certain use cases, imprecision is fine. The second is that when you ultimately do need a precise answer, you need a backstop. If you want to know how many distinct users open your app each day, knowing that some users open it more than once, you can use HyperLogLog. If getting an answer within a couple percent of the right one is good enough, you’re all set.
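As a rough sketch of how that trade works, here’s a simplified HyperLogLog in Python. The register count and hash function are illustrative choices, and this omits the refinements real implementations (like the one in Redis) use, but it shows the core idea: track only the maximum run of leading zero bits seen per register, rather than the items themselves:

```python
import hashlib
import math

def hll_estimate(items, b=10):
    """Estimate the number of distinct items using 2**b small registers."""
    m = 1 << b
    registers = [0] * m
    for item in items:
        # Deterministic 64-bit hash of the item.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h & (m - 1)            # low b bits choose a register
        w = h >> b                   # remaining 64 - b bits
        # Rank = position of the leftmost 1-bit in w (longer zero runs are rarer).
        rank = (64 - b) - w.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    # Small-range correction: fall back to linear counting when many
    # registers are still empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw
```

The appeal is the footprint: 1,024 tiny registers stand in for however many distinct users you have, duplicates are absorbed for free by the `max`, and the estimate typically lands within a few percent of the true count.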
This brings me to LLMs. They’re not probabilistic data structures, but they share similar benefits and drawbacks. An LLM generates text (probabilistically) based on a prompt. To do so, the model has an encoded version of all of the text used to train it (for the popular foundation models, that’s a huge amount of text) and the prompt itself (this is why putting a lot more data in the prompt is often helpful).
The power of the LLM is in the fact that it can store a representation of massive amounts of text and provide access to it in an extremely performant way relative to its size, returning responses that are both brief and coherent to just about any prompt. The weakness of LLMs is that they can be inaccurate in broad and unpredictable ways.
To use LLMs effectively, you need to keep these strengths and weaknesses in mind, just as you would with probabilistic data structures. Given that the data in the LLM is an encoded representation of the training data, a lot of specific details are going to be missing. For example, if I ask an LLM like Claude or GPT for a list of the most influential romantic comedies of the Nineties, it will give me a very useful answer. This is the sort of question that has been written about a lot, and there’s no “correct” answer; it’s a matter of opinion. If I ask it what the top 10 highest-grossing romantic comedies of the Nineties were, I’m really taking my chances. There’s a clear right answer to this question, and the LLM may not quite have it or be able to generate it from its training data.
Let’s compare this with Web search, briefly. If I ask Google for a list of the most influential romantic comedies, it provides hundreds of results, most of which were created purely for the purpose of search engine optimization (and at this point may just be AI-generated slop). Having a conversation with an AI chatbot about this question is likely to yield more insights than wading through a bunch of those articles. On the other hand, for the box office grosses, Google yields a number of reasonable looking results that include the source of the data, making it possible to verify.
Let’s go back to probabilistic data structures for a moment. They’re useful when imprecision is fine or verification is a practical option. Most importantly, they shine when there are huge efficiency gains to be made.
These are the properties of LLMs as well, but they manifest differently. Probabilistic data structures allow you to perform computation more efficiently, saving memory and CPU cycles. LLMs enable humans to work more efficiently, synthesizing large amounts of data far faster than people can on their own. Probabilistic data structures are inaccurate in predictable ways, whereas LLMs are inaccurate in unpredictable ways, and aren’t even deterministic: they will usually give different responses even when given the same input.
LLMs work poorly in products when the product ignores their key properties. Perhaps the most famous recent example is the AI-assisted summaries of push notifications provided by Apple Intelligence. Imprecision renders the summaries useless, and the way to verify them is to open the push notifications and read them, which you were going to do anyway. Most importantly, there are minimal efficiency gains from summarizing push notifications. They summarize information that is already designed to be brief. Just as you wouldn’t use a probabilistic data structure to help manage a set of 10 items, it makes no sense to use an LLM to summarize push notifications.
On the other hand, using generative AI for the text-to-speech function that Siri relies on would be great. Because the model processes words in context, it tends to do a much better job than Siri, which tends to read text word by word.
“Use the right tool for the job” is one of the most basic principles of engineering, and while LLMs seem (to some people) like they can do anything, the truth is that while there are some use cases for which they are incredibly powerful, there are many others where, due to their probabilistic nature, they are not. One of the most amazing things about LLMs is that because they are systems based on natural language, the model will do its best to generate a useful response to the prompt (where “useful” is defined by the system prompt, which you may not have access to). Unfortunately, that’s true regardless of whether or not the responses are actually good.
I’ll leave you with two thoughts. The first is that it’s a mistake to dismiss systems that return approximate responses, or, in the case of an LLM, will not always return the same response given the same input. This renders them unfit for some problems but makes them the optimal solution for others. The second is that because LLMs draw their input not from a large database of facts but rather from an encoded, summarized representation of a large corpus of text, their output needs to be verified. If the verification step (whatever it is) ruins the product experience, then an LLM isn’t the right tool for the job.