
The model landscape

There is no single "best" AI model. Capability, cost, latency and context length trade off against each other, so teams keep a toolbox of model sizes and reach for the right one per task. Here's the mental model, the best-in-class families compared, and when to use each.

Why different sizes exist

Capability

Bigger / frontier models reason better and follow complex instructions more reliably.

Cost & latency

Smaller models are dramatically cheaper and faster. Inference is paid for on every request, so the savings compound with volume.
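A quick back-of-envelope calculation shows how per-request cost adds up. This is a sketch only: the tier names and per-million-token prices are hypothetical placeholders, not any provider's current rates. It assumes separate input and output token prices, which is how the major providers quote them.

```python
# Back-of-envelope monthly inference cost for two model tiers.
# PRICES ARE HYPOTHETICAL placeholders (USD per 1M tokens), not real provider rates.
PRICES_PER_MILLION = {
    "frontier": {"input": 10.00, "output": 40.00},
    "small": {"input": 0.20, "output": 0.80},
}


def monthly_cost(tier: str, requests_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Estimated USD cost of `days` of traffic on one hypothetical tier."""
    price = PRICES_PER_MILLION[tier]
    per_request = (input_tokens * price["input"]
                   + output_tokens * price["output"]) / 1_000_000
    return per_request * requests_per_day * days


if __name__ == "__main__":
    # 100,000 requests/day, each ~1,000 input tokens and ~200 output tokens.
    for tier in PRICES_PER_MILLION:
        cost = monthly_cost(tier, 100_000, 1_000, 200)
        print(f"{tier}: ${cost:,.2f} per month")
```

With these made-up numbers the frontier tier comes to about $54,000 a month and the small tier about $1,080, a 50× difference for identical traffic. Real ratios depend on the models and prices you compare.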

Context length

Some models can hold far more text in a single prompt. Filling a bigger window costs more (you pay per input token) and can dilute the model's focus on the parts that matter.

Control & privacy

Open-weight models can run in your own environment for data residency and customisation.

The practical rule (echoed in the major providers' own guidance): prototype on a strong model to prove the task is possible, then downshift to the smallest model that still passes your evals. See Choosing a model and Large reasoning models.
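Once you have eval results, the downshift rule reduces to a simple selection step. The sketch below assumes you have already run the same eval suite against each candidate; the model labels, costs, scores and the 0.93 threshold are all hypothetical. If nothing passes, not even the frontier model, that suggests the task may not be feasible as specified.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                    # hypothetical model label
    cost_per_1k_requests: float  # hypothetical USD cost
    eval_score: float            # fraction of your eval cases passed (0.0 to 1.0)


def cheapest_passing(candidates: list[Candidate],
                     threshold: float) -> Candidate | None:
    """Return the cheapest candidate that meets the eval threshold, or None."""
    passing = [c for c in candidates if c.eval_score >= threshold]
    return min(passing, key=lambda c: c.cost_per_1k_requests, default=None)


# Hypothetical eval results: the frontier run first proves the task is feasible.
EXAMPLE_CANDIDATES = [
    Candidate("frontier-model", 18.0, 0.97),
    Candidate("balanced-model", 4.0, 0.95),
    Candidate("small-model", 0.4, 0.81),
]

if __name__ == "__main__":
    choice = cheapest_passing(EXAMPLE_CANDIDATES, threshold=0.93)
    print(choice.name if choice else "no model passes; revisit the task")
```

Here the small model is cheapest but misses the bar, so the balanced model is selected.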

Model tiers compared

Representative examples per tier. Specs change quickly, so confirm current names, context sizes and prices in the provider documentation.

| Tier | What it is | Examples | Context | Best for | Watch out |
| --- | --- | --- | --- | --- | --- |
| Frontier / flagship | The most capable general models, with the strongest reasoning, instruction-following and agentic behaviour. | Claude Opus (Anthropic) · GPT-4o / GPT-4.1 (OpenAI) · Gemini 2.5 Pro (Google) · Nova Premier (Amazon) | ~128K–1M+ tokens | The hardest tasks, complex multi-step agents, and proving a use case is possible at all before optimising. | Highest cost and latency; overkill for simple, high-volume work. |
| Balanced / workhorse | Most of frontier quality at a fraction of the cost and latency; the sensible default for production. | Claude Sonnet (Anthropic) · GPT-4o (OpenAI) · Gemini 2.5 Flash (Google) · Nova Pro (Amazon) | ~128K–1M tokens | The bulk of production features: RAG, chat, drafting, summarisation, tool use. | Slightly below frontier on the very hardest reasoning. |
| Small / fast / efficient | Cheap, low-latency models for high-volume, simple tasks and edge/on-device use. | Claude Haiku (Anthropic) · GPT-4o mini (OpenAI) · Gemini Flash-Lite / Gemma (Google) · Nova Micro & Lite (Amazon) · Phi (Microsoft) · Granite small (IBM) | ~8K–1M tokens (varies widely) | Classification, extraction, routing, autocomplete and anything high-throughput or latency-sensitive. | Weaker on complex reasoning and nuanced, open-ended tasks. |
| Reasoning models | Spend extra test-time "thinking" compute on an internal chain of reasoning before answering. | o-series, e.g. o3 / o4-mini (OpenAI) · Claude with extended thinking (Anthropic) · Gemini thinking (Google) · DeepSeek-R1 (open) | ~128K–1M tokens | Hard maths, coding, planning and multi-constraint problems where a snap answer fails. | Slower and pricier, because they generate many hidden reasoning tokens; wasteful on easy tasks. |
| Open-weight | Downloadable weights you can self-host, fine-tune and run in your own environment. | Llama (Meta) · Granite (IBM) · Phi (Microsoft) · Gemma (Google) · Mistral / Mixtral (Mistral AI) | ~8K–128K+ tokens | Privacy, on-prem or regulated workloads, cost control at scale, and heavy customisation via fine-tuning. | You own the infrastructure, scaling, safety and updates. |
| Multimodal | Understand (and sometimes generate) more than text: images, audio and video. | GPT-4o (OpenAI) · Gemini (Google) · Claude vision (Anthropic) · Nova (Amazon) | Varies by model | Document and screenshot understanding, image Q&A, UI/diagram analysis, voice interfaces. | Can misread small text, charts and precise spatial detail; verify anything exact. |
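The "many hidden reasoning tokens" caveat is easy to quantify. Providers generally bill a reasoning model's internal thinking tokens as output tokens even when you do not see them in full (OpenAI documents this for its o-series). This sketch uses hypothetical prices and token counts to compare the same request with and without a hidden reasoning budget.

```python
def request_cost(input_tokens: int, visible_output_tokens: int,
                 reasoning_tokens: int, input_price: float,
                 output_price: float) -> float:
    """USD cost of one request; prices are per 1M tokens.

    Hidden reasoning tokens are assumed to be billed at the output rate.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical: $2 / $8 per 1M input / output tokens, and a short answer
    # (500 in, 150 out) that triggers 3,000 hidden reasoning tokens.
    plain = request_cost(500, 150, 0, 2.00, 8.00)
    reasoning = request_cost(500, 150, 3_000, 2.00, 8.00)
    print(f"plain: ${plain:.4f}  reasoning: ${reasoning:.4f}  "
          f"ratio: {reasoning / plain:.1f}x")
```

That is roughly 12× the cost for the same visible answer, and the extra generated tokens also add latency, which is why reasoning models are wasteful on easy tasks.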

Best-in-class by provider

| Provider | Flagship family | Where to access | Known for |
| --- | --- | --- | --- |
| OpenAI | GPT-4o / GPT-4.1, o-series (reasoning), GPT-4o mini | OpenAI API · Microsoft Azure OpenAI | Strong all-round capability and a leading reasoning line. |
| Anthropic | Claude: Opus (frontier), Sonnet (balanced), Haiku (fast) | Anthropic API · Amazon Bedrock · Google Vertex AI | Long context, strong writing and coding, and a focus on safety and steerability. |
| Google | Gemini (Pro / Flash / Flash-Lite) · Gemma (open) | Google AI Studio · Vertex AI | Very long context and strong native multimodality. |
| Microsoft | Phi (small open models); also hosts OpenAI and many other vendors' models | Azure AI Foundry (multi-vendor model catalog) | Efficient small language models; enterprise integration and tooling. |
| Amazon | Nova (Micro / Lite / Pro / Premier) · Titan | Amazon Bedrock (also serves Anthropic, Meta, Mistral and others) | A cost-tiered family and a single API across many vendors' models. |
| IBM | Granite (open, enterprise-focused) | IBM watsonx.ai | Governance, transparency and fit for regulated enterprise use. |
| Meta | Llama (open weights) | Self-hosted · Bedrock · Azure · most clouds | A leading open-weight ecosystem for customisation and self-hosting. |

When to use which

Starting a new feature / unsure if it's even feasible

Prototype on a frontier model to prove it works, then downshift to the cheapest model that still passes your evals.

High volume, latency-sensitive, or a simple task (tagging, routing)

A small/fast model — far cheaper and quicker, with little quality loss on easy tasks.

Hard maths, coding, planning or multi-step logic

A reasoning model — the extra test-time compute pays off here (but not on easy tasks).

Sensitive data, on-prem, or a regulated environment

A self-hosted open-weight model (Llama, Granite, Phi) so data never leaves your environment.

Very long documents or whole codebases in one prompt

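A large-context model (e.g. Gemini, Claude), but consider retrieval-augmented generation (RAG) to keep cost down and reduce "lost-in-the-middle" risk, where models use information buried in the middle of a long prompt less reliably than material near the start or end.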
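Before reaching for a very large window, it helps to estimate whether the input fits at all. This sketch uses the rough rule of thumb of about four characters per token for English text; real counts depend on each model's tokenizer, so use the provider's token-counting tools for anything exact. The output reserve and the 80% fill limit are hypothetical safety margins, not provider recommendations.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; tokenizers differ by model


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ceiling of characters / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, context_window: int,
                    reserve_for_output: int = 4_000,
                    max_fill: float = 0.8) -> bool:
    """True if the text fits within a hypothetical safe fraction of the window."""
    budget = int((context_window - reserve_for_output) * max_fill)
    return estimate_tokens(text) <= budget


if __name__ == "__main__":
    document = "x" * 400_000  # stand-in for roughly 100K tokens of text
    for window in (128_000, 1_000_000):
        plan = "single prompt" if fits_in_context(document, window) else "consider RAG"
        print(f"{window:,}-token window: {plan}")
```

Under these assumptions the ~100K-token document overflows the safe budget of a 128K window (about 99K tokens) but fits comfortably in a 1M window.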

Images, screenshots, audio or video as input

A multimodal model (GPT-4o, Gemini, Claude vision, Nova).
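The rules above can be condensed into a first-pass router. This is a hypothetical sketch: the `Task` fields, the returned tier labels and especially the priority order (hard constraints such as data residency and input type first, then feasibility and capability, then cost) are assumptions. A real task can match several rules at once, and this version simply returns the first match. Falling back to the balanced tier follows the tier table's description of it as the sensible production default.

```python
from dataclasses import dataclass


@dataclass
class Task:
    sensitive_data: bool = False         # must data stay in your environment?
    multimodal_input: bool = False       # images, screenshots, audio or video
    feasibility_unknown: bool = False    # brand-new feature, not yet proven
    hard_reasoning: bool = False         # maths, coding, planning, multi-step logic
    very_long_input: bool = False        # long documents or whole codebases
    high_volume_or_simple: bool = False  # tagging, routing, latency-sensitive


def suggest_tier(task: Task) -> str:
    """First matching rule wins, in a hypothetical priority order."""
    if task.sensitive_data:
        return "self-hosted open-weight"
    if task.multimodal_input:
        return "multimodal"
    if task.feasibility_unknown:
        return "frontier (prototype), then downshift"
    if task.hard_reasoning:
        return "reasoning"
    if task.very_long_input:
        return "large-context (or RAG)"
    if task.high_volume_or_simple:
        return "small / fast"
    return "balanced"  # the production default


if __name__ == "__main__":
    print(suggest_tier(Task(high_volume_or_simple=True)))
    print(suggest_tier(Task(hard_reasoning=True)))
```

In practice you would treat the result as a starting point and confirm it against your evals.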

Sources & further reading

Primary references from the providers themselves.


The AI model landscape changes quickly. Treat this page as a stable mental model and the linked provider docs as the source of truth for current model names, sizes and prices.