Two different tricks for fast LLM inference
When you use Anthropic’s fast mode, you get the real Opus 4.6; when you use OpenAI’s fast mode, you get GPT-5.3-Codex-Spark, not the real GPT-5.3-Codex.
The AI labs aren’t advertising the details of how their fast modes work, but I’m pretty confident it’s something like this: Anthropic’s fast mode is backed by low-batch-size inference, while OpenAI’s fast mode is backed by special monster Cerebras chips.
How Anthropic’s fast mode works

The tradeoff at the heart of AI inference economics is batching, because the main bottleneck is memory bandwidth, not compute.
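To see why batching dominates the economics, here is a back-of-envelope sketch of decode-step latency versus batch size. All the numbers (rough 70B-parameter model, rough H100-class bandwidth and FLOPs) are illustrative assumptions, not measurements of any real deployment, and the model ignores KV-cache traffic for simplicity:

```python
# Toy model: each decode step must stream all model weights from HBM
# once, regardless of batch size, while compute scales with the batch.
# At low batch the step is memory-bound, so extra batch slots are
# nearly free throughput -- which is why providers normally run at
# high batch, and why low-batch inference is fast but expensive.

WEIGHT_BYTES = 140e9         # ~70B params at fp16 (assumption)
HBM_BANDWIDTH = 3.35e12      # bytes/s, roughly one H100 (assumption)
PEAK_FLOPS = 989e12          # fp16 FLOP/s, roughly one H100 (assumption)
FLOPS_PER_TOKEN = 2 * 70e9   # ~2 FLOPs per parameter per token

def step_latency(batch_size: int) -> float:
    """Seconds per decode step for a given batch size."""
    memory_time = WEIGHT_BYTES / HBM_BANDWIDTH
    compute_time = batch_size * FLOPS_PER_TOKEN / PEAK_FLOPS
    return max(memory_time, compute_time)

for b in (1, 8, 64, 256, 1024):
    lat = step_latency(b)
    print(f"batch {b:>4}: {1 / lat:6.1f} tok/s per user, "
          f"{b / lat:9.1f} tok/s per chip")
```

Under these assumptions, per-user speed is flat until the batch is large enough to become compute-bound, while per-chip throughput grows almost linearly with batch size. A provider maximizing revenue per chip runs large batches; a low-batch "fast mode" trades most of that throughput away to keep each user at the memory-bound latency floor.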
How OpenAI’s fast mode works

OpenAI’s fast mode does not work anything like this.
Also, they told us in the announcement blog post exactly what’s backing their fa…
1 hour ago @ seangoedecke.com
