Fun with Token Probabilities

2025-02-22 · Technology

Westworld arrived just a couple of years before the AI hype train, but there’s one scene that really stuck with me. It’s in Season 1, Episode 6, “The Adversary”:

Having “died”, Maeve finds herself awake in the tunnels under Westworld. Felix, a Westworld technician, explains to her that she is a host (i.e., an android).

To convince her, he shows her a pad with a live dialog tree of what she will say, just milliseconds before she says it.

The writers on Westworld were probably thinking of Markov chains when they wrote this scene. But it turns out it’s very applicable to Large Language Models (LLMs) as well.

If you’ve ever heard audio feedback of yourself at a slight delay, you know the feeling.

Over the past year I’ve been building with LLMs. One of the most fascinating parts for me so far has been constrained grammars, such as BNF and JSON-schema output.
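To make that concrete, here’s a tiny sketch of a grammar in llama.cpp’s GBNF dialect (a hypothetical toy example, not from the project mentioned in this post), which constrains the model to emit only one of two answers:

```
# Toy GBNF grammar: the sampler may only pick tokens
# that keep the output matching "yes" or "no".
root ::= "yes" | "no"
```

The sampler masks out any token that would break the grammar, so the completion is guaranteed to match the format.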

But limiting the output to a fixed format isn’t the only thing LLMs can do. The other major way to manipulate LLM completions is through logit bias and token probabilities. So I wanted to try to recreate something like the Westworld dialog-tree UI to explore this as a mini art project.
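The mechanics behind logit bias are simple: a per-token offset is added to the model’s raw scores before the softmax turns them into probabilities. Here’s a toy sketch with made-up numbers (the tokens and logit values are invented for illustration):

```python
import math

def softmax(logits):
    """Convert raw logit scores into a probability distribution."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    s = sum(exps.values())
    return {t: e / s for t, e in exps.items()}

# Toy logits for a handful of candidate next tokens (invented values).
logits = {"dog": 2.0, "cat": 1.5, "fish": 0.5}

# A logit bias is just an offset added to a token's raw score
# before the softmax — here we suppress "dog" by -5.
bias = {"dog": -5.0}
biased = {t: v + bias.get(t, 0.0) for t, v in logits.items()}

print(softmax(logits))   # "dog" is the most likely token
print(softmax(biased))   # "dog" is now extremely unlikely
```

A large negative bias effectively bans a token; a large positive one all but forces it.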

And, this is what I came up with:

Demo of the TUI for exploring token probabilities.

It’s a small TUI that sends requests to the llama.cpp server. This demo is showing Qwen 2.5 32B.

Using the tool, you can enter a sentence and it will show the 4 most likely next words (really, tokens). You can then navigate with the arrow keys to see what it would generate if you chose one of the alternate tokens.
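Under the hood, the llama.cpp server can return per-token probabilities alongside a completion (via its `n_probs` request parameter). The exact response field names vary between server versions, so the shape below is an assumption for illustration, as is the `top_alternatives` helper:

```python
# Hypothetical response shape from the llama.cpp server's /completion
# endpoint when called with {"n_predict": 1, "n_probs": 4}.
# Field names and values here are illustrative, not captured output.
response = {
    "completion_probabilities": [
        {
            "content": " jumped",
            "probs": [
                {"tok_str": " jumped", "prob": 0.41},
                {"tok_str": " ran",    "prob": 0.22},
                {"tok_str": " sat",    "prob": 0.19},
                {"tok_str": " barked", "prob": 0.08},
            ],
        }
    ]
}

def top_alternatives(resp, n=4):
    """Pull the top-n candidate next tokens with their probabilities."""
    step = resp["completion_probabilities"][0]
    return [(p["tok_str"], p["prob"]) for p in step["probs"][:n]]

for tok, prob in top_alternatives(response):
    print(f"{prob:5.2f}  {tok!r}")
```

Rendering those four candidates, and re-querying whenever the user picks one, is essentially the whole dialog-tree UI.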

This makes it really easy to see a tiny bit more of the internal relationships a model has built up during training.

It also reminds me of the work done by David Rozado, exploring the biases LLMs show towards various demographic groups.

Even though this capability has existed for at least two years, I think it bears repeating that technology adoption takes a surprisingly long time, something a lot of people fail to fully appreciate.

There’s usually a significant (i.e., years-long) lag between AI research and when it becomes available in mainstream apps. For example, LM Studio just announced beta support for speculative decoding, which was added to llama.cpp back in 2023.

So, what will get built with logit bias and token probabilities? Who knows! But, I’m totally here for it.

There wasn’t really much point to this other than highlighting the cool Westworld scene and showing that this is basically how LLMs work. But thanks for reading my meandering thoughts.

Want new articles in your email inbox?