
Streaming tokens without the flicker

Token-by-token rendering looks smooth in the demo. In production it jitters. Here's how we smoothed it out.

Streaming responses solve the perceived-latency problem and create a new one: every token re-flows the layout. Citations slide. Code blocks reformat. The whole bubble nudges left when the punctuation lands. Death by a thousand reflows.

After a few rounds of profiling, we ended up with a small set of tricks that look obvious in hindsight.

Pin the bubble width

The bot bubble has max-width: 90% and min-width: 32ch. As tokens arrive the text fills out instead of pushing the bubble wider, so neighbors do not jiggle. Citations and feedback rows stay welded to the bottom edge.
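A minimal sketch of the pinning rule. The class name is ours; the values are the ones from the post:

```css
/* .bot-bubble is an illustrative class name. */
.bot-bubble {
  max-width: 90%;  /* never wider than the thread column */
  min-width: 32ch; /* reserve width up front so early tokens fill, not stretch */
}
```

Because min-width reserves the horizontal space before the first token lands, later tokens only change the text inside the box, not the box itself.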

Render layout shells, not finals

When the model signals it is about to produce a code block, we render an empty <pre> placeholder immediately. The placeholder reserves vertical space at a sane default height. As the code streams in it fills the placeholder rather than expanding it. Same trick for tables and source lists.
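A sketch of the shell idea. It assumes the stream emits a typed "block starting" signal before the body arrives; the event names, heights, and helper below are illustrative, not our production code:

```typescript
// When the stream signals an upcoming block, emit a fixed-height shell
// first, then fill it in place as content arrives.
type BlockKind = "code" | "table" | "sources";

// Hypothetical default heights that reserve vertical space up front.
const SHELL_HEIGHT: Record<BlockKind, string> = {
  code: "8em",
  table: "6em",
  sources: "3em",
};

function renderShell(kind: BlockKind): string {
  const tag = kind === "code" ? "pre" : "div";
  // min-height (not height) so the shell can still grow for long content.
  return `<${tag} style="min-height:${SHELL_HEIGHT[kind]}"></${tag}>`;
}
```

Using min-height instead of a fixed height means the common case (content shorter than the reservation) never reflows, and the rare long block grows once instead of on every token.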

Decouple token cadence from frame cadence

Tokens arrive in bursts of two or three, then nothing for 80ms, then another burst. If you naively call setState on every token, you trigger a render per token and the cursor stutters. We coalesce tokens into a single requestAnimationFrame flush, so React renders at most once per frame however chunky the upstream delivery is.

Mind the autoscroll

Autoscrolling on every render fights the user the moment they scroll up to read a citation. Our rule: only autoscroll if the user was at the bottom before the new tokens arrived. A 32-pixel slack zone counts as “at the bottom”, so a user who has drifted a few pixels off the bottom keeps following the stream instead of getting stranded mid-thread.
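The rule reduces to one comparison, measured before the new tokens are appended. A sketch, with the function name and parameter shape ours:

```typescript
// Was the user at (or within 32px of) the bottom before this render?
const SLACK_PX = 32;

function wasAtBottom(
  scrollTop: number,    // current scroll offset of the container
  clientHeight: number, // visible height of the container
  scrollHeight: number, // total content height
): boolean {
  return scrollHeight - (scrollTop + clientHeight) <= SLACK_PX;
}
```

Measure before mutating the DOM, append the tokens, and only then scroll to the bottom if the check passed; measuring after the append would always report the user as scrolled up.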

The boring win

None of these are clever. The win is in the boringness of the result: the bubble inflates smoothly, the cursor pulses cleanly, and nothing under the bot’s reply jumps around. That is the bar.
