Streaming tokens without the flicker
Token-by-token rendering looks smooth in the demo. In production it jitters. Here's how we smoothed it out.
Streaming responses solve the perceived-latency problem and create a new one: every token re-flows the layout. Citations slide. Code blocks reformat. The whole bubble nudges left when the punctuation lands. Death by a thousand reflows.
After a few rounds of profiling, we ended up with a small set of tricks that look obvious in hindsight.
Pin the bubble width
The bot bubble has max-width: 90% and min-width: 32ch. As tokens arrive the text fills out instead of pushing the bubble wider, so neighbors do not jiggle. Citations and feedback rows stay welded to the bottom edge.
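A minimal sketch of that sizing, written as a style object (the only values the post specifies are the max-width and min-width; the class structure and footer trick are our assumptions):

```typescript
// Illustrative style objects for the bot bubble. Names are hypothetical;
// only maxWidth/minWidth come from the post. The fixed min-width means
// early tokens fill the bubble instead of widening it.
const botBubbleStyle = {
  maxWidth: "90%",   // never wider than the column
  minWidth: "32ch",  // wide enough that text fills out, not pushes out
  display: "flex",
  flexDirection: "column" as const,
};

// One way to keep citations and feedback rows welded to the bottom edge
// of a flex-column bubble (an assumption, not from the post):
const bubbleFooterStyle = {
  marginTop: "auto", // pushes the footer to the bottom of the flex column
};
```

Because the bubble never changes width mid-stream, line wrapping is the only reflow left, and it stays inside the bubble.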
Render layout shells, not finals
When the model signals it is about to produce a code block, we render an empty <pre> placeholder immediately. The placeholder reserves vertical space at a sane default height. As the code streams in it fills the placeholder rather than expanding it. Same trick for tables and source lists.
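One way to sketch the shell idea is as a reducer from stream events to render shells. The event names and the 120px default height here are hypothetical; the point is that a block-start signal creates an empty placeholder at a reserved height, and later tokens fill it in place:

```typescript
// Hypothetical stream events: the model signals a block *before* its
// content arrives, which is what lets us reserve space immediately.
type StreamEvent =
  | { kind: "code_block_start" }
  | { kind: "code_token"; text: string }
  | { kind: "text_token"; text: string };

type Shell = { tag: "pre" | "p"; minHeight: number; content: string };

const CODE_SHELL_HEIGHT = 120; // illustrative default height for the <pre> placeholder

// Fold one event into the shell list. code_block_start appends an empty
// <pre> shell; subsequent code tokens fill it rather than growing it
// (the reserved minHeight stays put until the content exceeds it).
function reduce(shells: Shell[], ev: StreamEvent): Shell[] {
  const last = shells[shells.length - 1];
  switch (ev.kind) {
    case "code_block_start":
      return [...shells, { tag: "pre", minHeight: CODE_SHELL_HEIGHT, content: "" }];
    case "code_token":
      if (last?.tag === "pre") {
        return [...shells.slice(0, -1), { ...last, content: last.content + ev.text }];
      }
      return shells; // stray code token with no open block: drop it
    case "text_token":
      if (last?.tag === "p") {
        return [...shells.slice(0, -1), { ...last, content: last.content + ev.text }];
      }
      return [...shells, { tag: "p", minHeight: 0, content: ev.text }];
  }
}
```

Tables and source lists get the same treatment: a shell with a reserved height first, content second.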
Decouple token cadence from frame cadence
Tokens arrive in bursts of two or three, then nothing for 80ms, then another burst. If you naively call setState on every token you trigger a render per token and the cursor stutters. We coalesce tokens into a single requestAnimationFrame flush, so React renders at most once per frame regardless of how bursty the upstream connection is.
Mind the autoscroll
Autoscrolling on every render fights the user when they try to scroll up to read a citation. Our rule: only autoscroll if the user was at the bottom before the new tokens arrived. A 32-pixel slack zone counts as “at the bottom” so users who scrolled up by mistake do not feel trapped.
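The bottom check reduces to a pure function of the scroll metrics. The 32-pixel slack matches the rule above; the helper name and usage shape are ours:

```typescript
const SLACK_PX = 32; // within this distance of the bottom counts as "at the bottom"

// Call this *before* appending new tokens, with the scroll container's metrics.
function wasAtBottom(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
): boolean {
  return scrollHeight - (scrollTop + clientHeight) <= SLACK_PX;
}

// Usage sketch: capture the flag before the render, scroll after it.
// const stick = wasAtBottom(el.scrollTop, el.clientHeight, el.scrollHeight);
// renderNewTokens();
// if (stick) el.scrollTop = el.scrollHeight;
```

The order matters: measuring after the render would always report "not at bottom" once the new content lands, so the flag has to be captured first.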
The boring win
None of these are clever. The win is in the boringness of the result: the bubble inflates smoothly, the cursor pulses cleanly, and nothing under the bot’s reply jumps around. That is the bar.