
The model landscape

There is no single "best" AI model. Capability, cost, latency and context length trade off against each other, so teams keep a toolbox of model sizes and reach for the right one per task. This page covers the mental model, compares the leading model families, and explains when to use each.

Why different sizes exist

Capability

Bigger / frontier models reason better and follow complex instructions more reliably.

Cost & latency

Smaller models are dramatically cheaper and faster — and you pay on every request (inference).

Context length

Some models hold far more text at once. Larger windows cost more per request, and details buried in the middle of a very long prompt can get overlooked.

Control & privacy

Open-weight models can run in your own environment for data residency and customisation.
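To make the cost point above concrete, here is a minimal arithmetic sketch of why per-request pricing dominates at volume. The `Pricing` class, the `monthly_cost` helper and every price and traffic figure are hypothetical placeholders chosen for round numbers; they are not real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens (not real provider prices)."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost(pricing: Pricing, requests_per_day: int,
                 input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Estimate monthly spend: per-request token cost times request volume."""
    per_request = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000
    return per_request * requests_per_day * days


# Made-up price points for a large and a small model.
large = Pricing(input_per_mtok=10.0, output_per_mtok=30.0)
small = Pricing(input_per_mtok=0.5, output_per_mtok=1.5)

# Assumed workload: 100,000 requests a day, 1,000 tokens in, 200 tokens out.
large_cost = monthly_cost(large, 100_000, 1_000, 200)
small_cost = monthly_cost(small, 100_000, 1_000, 200)

print(f"large: ${large_cost:,.0f}/month, small: ${small_cost:,.0f}/month")
```

Under these assumed numbers the same workload differs by a factor of 20 between the two models. Substitute current prices from the provider's pricing page before drawing any conclusion.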

The practical rule, echoed in guidance from the major providers: prototype on a strong model to prove the task is possible, then downshift to the smallest model that still passes your evals. See Choosing a model and Large reasoning models.
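The downshift rule can be sketched as a loop over candidate models, ordered cheapest first, that stops at the first one meeting an eval threshold. Everything here is an illustrative assumption: the models are stub functions rather than real API clients, the exact-match scoring is the simplest possible eval, and the names `pass_rate` and `smallest_passing` are hypothetical.

```python
from typing import Callable, Optional, Sequence

Model = Callable[[str], str]          # prompt in, answer out
EvalCase = tuple[str, str]            # (prompt, expected answer)


def pass_rate(model: Model, cases: Sequence[EvalCase]) -> float:
    """Fraction of eval cases the model answers exactly right."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for prompt, expected in cases
                 if model(prompt).strip() == expected)
    return passed / len(cases)


def smallest_passing(models: Sequence[tuple[str, Model]],
                     cases: Sequence[EvalCase],
                     threshold: float) -> Optional[str]:
    """Return the first (cheapest) model name meeting the threshold, else None.

    Assumes `models` is already ordered from cheapest to most expensive.
    """
    for name, model in models:
        if pass_rate(model, cases) >= threshold:
            return name
    return None


# Stub models standing in for real API calls (hypothetical behaviour).
ANSWERS = {"2+2": "4", "capital of France": "Paris", "17*3": "51"}


def small_stub(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "unknown")   # only handles the easy case


def balanced_stub(prompt: str) -> str:
    return ANSWERS.get(prompt, "unknown")


def frontier_stub(prompt: str) -> str:
    return ANSWERS.get(prompt, "unknown")


cases = list(ANSWERS.items())
candidates = [("small", small_stub), ("balanced", balanced_stub),
              ("frontier", frontier_stub)]

choice = smallest_passing(candidates, cases, threshold=0.9)
print(choice)
```

In this toy setup the small stub fails two of three cases, so the loop settles on the balanced stub and never needs the frontier one. A real eval set would be larger and would usually score with something more forgiving than exact string match.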

Model tiers compared

Representative examples per tier. Specs move fast — confirm against the provider links below.

Tier	What it is	Examples	Context	Best for	Watch out
Frontier / flagship	The most capable general models — strongest reasoning, instruction-following and agentic behaviour.	Claude Opus (Anthropic) · GPT-4o / GPT-4.1 (OpenAI) · Gemini 2.5 Pro (Google) · Nova Premier (Amazon)	~128K–1M+ tokens	Hardest tasks, complex multi-step agents, and proving a use case is even possible before optimising.	Highest cost and latency — overkill for simple, high-volume work.
Balanced / workhorse	Most of frontier quality at a fraction of the cost and latency. The sensible default for production.	Claude Sonnet (Anthropic) · GPT-4o (OpenAI) · Gemini 2.5 Flash (Google) · Nova Pro (Amazon)	~128K–1M tokens	The bulk of production features — RAG, chat, drafting, summarisation, tool use.	Slightly below frontier on the very hardest reasoning.
Small / fast / efficient	Cheap, low-latency models for high volume, simple tasks and edge/on-device use.	Claude Haiku (Anthropic) · GPT-4o mini (OpenAI) · Gemini Flash-Lite / Gemma (Google) · Nova Micro & Lite (Amazon) · Phi (Microsoft) · Granite small (IBM)	~8K–128K tokens	Classification, extraction, routing, autocomplete and anything high-throughput or latency-sensitive.	Weaker on complex reasoning and nuanced, open-ended tasks.
Reasoning models	Spend extra test-time ('thinking') compute on an internal chain of reasoning before answering.	o-series e.g. o3 / o4-mini (OpenAI) · Claude with extended thinking (Anthropic) · Gemini thinking (Google) · DeepSeek-R1 (open)	~128K–1M tokens	Hard maths, coding, planning and multi-constraint problems where a snap answer fails.	Slower and pricier (many hidden reasoning tokens). Wasteful on easy tasks.
Open-weight	Downloadable weights you can self-host, fine-tune and run in your own environment.	Llama (Meta) · Granite (IBM) · Phi (Microsoft) · Gemma (Google) · Mistral / Mixtral	~8K–128K+ tokens	Privacy / on-prem / regulated workloads, cost control at scale, and heavy customisation via fine-tuning.	You own the infrastructure, scaling, safety and updates.
Multimodal	Understand (and sometimes generate) more than text — images, audio, and video.	GPT-4o (OpenAI) · Gemini (Google) · Claude vision (Anthropic) · Nova (Amazon)	Varies by model	Document & screenshot understanding, image Q&A, UI/diagram analysis, voice interfaces.	Can misread small text, charts and precise spatial detail — verify anything exact.

Best-in-class by provider

Provider	Flagship family	Where to access	Known for	Docs
OpenAI	GPT-4o / GPT-4.1, o-series (reasoning), GPT-4o mini	OpenAI API · Microsoft Azure OpenAI	Strong all-round capability and a leading reasoning line.	Docs ↗
Anthropic	Claude — Opus (frontier), Sonnet (balanced), Haiku (fast)	Anthropic API · Amazon Bedrock · Google Vertex AI	Long context, strong writing/coding, safety & steerability focus.	Docs ↗
Google	Gemini (Pro / Flash / Flash-Lite) · Gemma (open)	Google AI Studio · Vertex AI	Very long context and strong native multimodality.	Docs ↗
Microsoft	Phi (small open models) · hosts OpenAI & many others	Azure AI Foundry (multi-vendor model catalog)	Efficient small language models; enterprise integration & tooling.	Docs ↗
Amazon	Nova (Micro / Lite / Pro / Premier) · Titan	Amazon Bedrock (also serves Anthropic, Meta, Mistral, etc.)	Cost-tiered family and a single API across many vendors' models.	Docs ↗
IBM	Granite (open, enterprise-focused)	IBM watsonx.ai	Governance, transparency and fit for regulated enterprise use.	Docs ↗
Meta	Llama (open weights)	Self-host · Bedrock · Azure · most clouds	Leading open-weight ecosystem for customisation and self-hosting.	Docs ↗

When to use which

Starting a new feature / unsure if it's even feasible

→ Prototype on a frontier model to prove it works, then downshift to the cheapest model that still passes your evals.

High volume, latency-sensitive, or a simple task (tagging, routing)

→ A small/fast model — far cheaper and quicker, with little quality loss on easy tasks.

Hard maths, coding, planning or multi-step logic

→ A reasoning model — the extra test-time compute pays off here (but not on easy tasks).

Sensitive data, on-prem, or a regulated environment

→ A self-hosted open-weight model (Llama, Granite, Phi) so data never leaves your environment.

Very long documents or whole codebases in one prompt

→ A large-context model (e.g. Gemini, Claude) — but consider RAG to keep cost and 'lost-in-the-middle' risk down.

Images, screenshots, audio or video as input

→ A multimodal model (GPT-4o, Gemini, Claude vision, Nova).
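The rules above can be read as a small routing function. This is a sketch under stated assumptions: the `Task` fields and `suggest_tier` are hypothetical names, the priority order when several conditions apply is a choice made here (hard constraints such as data sensitivity first), and the "balanced" fallback follows the tier table's description of that tier as the sensible production default.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a workload; all flags default to False."""
    sensitive_data: bool = False         # on-prem or regulated environment
    non_text_input: bool = False         # images, screenshots, audio, video
    long_input: bool = False             # very long documents or whole codebases
    hard_reasoning: bool = False         # maths, coding, planning, multi-step logic
    high_volume_or_simple: bool = False  # tagging, routing, latency-sensitive
    feasibility_unknown: bool = False    # new feature, unsure it is possible


def suggest_tier(task: Task) -> str:
    """Map a task to a model tier. The check order is an assumption."""
    if task.sensitive_data:
        return "open-weight (self-hosted)"
    if task.non_text_input:
        return "multimodal"
    if task.long_input:
        return "large-context (consider RAG)"
    if task.hard_reasoning:
        return "reasoning"
    if task.high_volume_or_simple:
        return "small/fast"
    if task.feasibility_unknown:
        return "frontier (then downshift)"
    return "balanced"


print(suggest_tier(Task(high_volume_or_simple=True)))
print(suggest_tier(Task(sensitive_data=True, hard_reasoning=True)))
```

Real workloads often match several rows at once, so treat the output as a starting point for evals rather than a final answer.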

Sources & further reading

Primary references from the providers themselves.

Watch — video explainers

Reputable channels (linked at channel level so they stay live).

▶ IBM · IBM Technology ↗
Short, whiteboard-style explainers: LLMs, RAG, AI agents, fine-tuning vs RAG, foundation models.
▶ Independent · 3Blue1Brown ↗
The best visual intuition for neural networks, and the 'But what is a Transformer / attention' series.
▶ Independent (ex-OpenAI/Tesla) · Andrej Karpathy ↗
Deep but accessible: 'Intro to Large Language Models' (1-hour talk) and 'Deep Dive into LLMs'.
▶ Google · Google Cloud Tech ↗
Gemini, Vertex AI model garden, and practical generative AI on Google Cloud.
▶ Anthropic · Anthropic ↗
Claude capabilities, prompt engineering, building effective agents and AI safety.
▶ OpenAI · OpenAI ↗
Model launches, reasoning models and developer/API walkthroughs.
▶ Microsoft · Microsoft Developer ↗
Phi small models, Azure AI Foundry and enterprise AI patterns.
▶ Amazon · AWS ↗
Amazon Bedrock, the Nova family and choosing models on AWS.

The AI model landscape changes quickly. Treat this page as a stable mental model and the linked provider docs as the source of truth for current model names, sizes and prices.