The model landscape
There is no single "best" AI model. Capability, cost, latency and context length trade off against each other, so teams keep a toolbox of model sizes and reach for the right one per task. Here's the mental model, the best-in-class families compared, and when to use each.
Why different sizes exist
Capability
Bigger / frontier models reason better and follow complex instructions more reliably.
Cost & latency
Smaller models are dramatically cheaper and faster. Because you pay for inference on every request, the savings grow with volume.
Context length
Some models hold far more text at once, but filling a bigger window costs more per request and can dilute the model's focus on what matters.
Control & privacy
Open-weight models can run in your own environment for data residency and customisation.
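Per-request billing is why the tier choice matters. Here is a minimal Python sketch of the arithmetic. It assumes the common pricing model of separate rates per million input tokens and per million output tokens. The `Pricing` class, the function names and both price points are hypothetical placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical rates; check your provider's pricing page for real figures.
    input_per_mtok: float   # USD per 1,000,000 input tokens
    output_per_mtok: float  # USD per 1,000,000 output tokens


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single inference request."""
    return (input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok) / 1_000_000


def monthly_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Every request is billed, so cost scales linearly with volume."""
    return request_cost(p, input_tokens, output_tokens) * requests_per_day * days


# Placeholder price points for a large tier and a small tier (illustrative only).
LARGE = Pricing(input_per_mtok=10.0, output_per_mtok=40.0)
SMALL = Pricing(input_per_mtok=0.5, output_per_mtok=2.0)

if __name__ == "__main__":
    # 1,000 input + 200 output tokens per request, 100,000 requests a day.
    for name, p in [("large", LARGE), ("small", SMALL)]:
        print(name, round(monthly_cost(p, 1_000, 200, 100_000), 2))
```

With these placeholder numbers, the small tier costs 20x less at the same volume. The ratio comes entirely from the assumed prices, so substitute current figures before drawing conclusions.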
The practical rule (echoed in the major providers' own guidance): prototype on a strong model to prove the task is possible, then downshift to the smallest model that still passes your evals. See Choosing a model and Large reasoning models.
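The prototype-then-downshift rule can be written as a small selection loop. This is a sketch under stated assumptions. `pick_model` and `run_evals` are hypothetical names, and `run_evals` stands in for your own eval harness, which returns a pass rate between 0 and 1. The candidate list is assumed to be ordered cheapest first, with the strongest model last.

```python
from typing import Callable, Optional, Sequence


def pick_model(
    candidates: Sequence[str],
    run_evals: Callable[[str], float],
    threshold: float,
) -> Optional[str]:
    """Return the cheapest candidate whose eval pass rate meets `threshold`.

    `candidates` must be ordered cheapest-first, with the strongest model last.
    Returns None if even the strongest model fails (the task isn't feasible yet).
    """
    strongest = candidates[-1]
    # Step 1: prototype on the strongest model to prove the task is possible.
    if run_evals(strongest) < threshold:
        return None
    # Step 2: downshift, trying the cheapest models first.
    for model in candidates:
        if run_evals(model) >= threshold:
            return model
    return strongest  # reached only if eval results vary between runs


if __name__ == "__main__":
    # Hypothetical model names and pass rates, for illustration only.
    scores = {"small-model": 0.81, "balanced-model": 0.93, "frontier-model": 0.97}
    print(pick_model(list(scores), scores.__getitem__, threshold=0.90))  # balanced-model
```

Eval runs are usually expensive and somewhat noisy. In practice you would cache scores and rerun them when the prompt or the model version changes.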
Model tiers compared
Representative examples per tier. Specs move fast — confirm against the provider links below.
| Tier | What it is | Examples | Context | Best for | Watch out |
|---|---|---|---|---|---|
| Frontier / flagship | The most capable general models: strongest reasoning, instruction-following and agentic behaviour. | Claude Opus (Anthropic) · GPT-4.1 (OpenAI) · Gemini 2.5 Pro (Google) · Nova Premier (Amazon) | ~128K–1M+ tokens | Hardest tasks, complex multi-step agents, and proving a use case is even possible before optimising. | Highest cost and latency; overkill for simple, high-volume work. |
| Balanced / workhorse | Most of frontier quality at a fraction of the cost and latency. The sensible default for production. | Claude Sonnet (Anthropic) · GPT-4o (OpenAI) · Gemini 2.5 Flash (Google) · Nova Pro (Amazon) | ~128K–1M tokens | The bulk of production features — RAG, chat, drafting, summarisation, tool use. | Slightly below frontier on the very hardest reasoning. |
| Small / fast / efficient | Cheap, low-latency models for high-volume simple tasks and edge/on-device use. | Claude Haiku (Anthropic) · GPT-4o mini (OpenAI) · Gemini Flash-Lite / Gemma (Google) · Nova Micro & Lite (Amazon) · Phi (Microsoft) · Granite small (IBM) | Varies widely (~4K–1M tokens) | Classification, extraction, routing, autocomplete and anything high-throughput or latency-sensitive. | Weaker on complex reasoning and nuanced, open-ended tasks. |
| Reasoning models | Spend extra compute at inference (test) time 'thinking' through an internal chain of reasoning before answering. | o-series, e.g. o3 / o4-mini (OpenAI) · Claude with extended thinking (Anthropic) · Gemini thinking (Google) · DeepSeek-R1 (open) | ~128K–1M tokens | Hard maths, coding, planning and multi-constraint problems where a snap answer fails. | Slower and pricier: hidden reasoning tokens are still generated and billed. Wasteful on easy tasks. |
| Open-weight | Downloadable weights you can self-host, fine-tune and run in your own environment. | Llama (Meta) · Granite (IBM) · Phi (Microsoft) · Gemma (Google) · Mistral / Mixtral | ~8K–128K+ tokens | Privacy / on-prem / regulated workloads, cost control at scale, and heavy customisation via fine-tuning. | You own the infrastructure, scaling, safety and updates. |
| Multimodal | Understand (and sometimes generate) more than text — images, audio, and video. | GPT-4o (OpenAI) · Gemini (Google) · Claude vision (Anthropic) · Nova (Amazon) | Varies by model | Document & screenshot understanding, image Q&A, UI/diagram analysis, voice interfaces. | Can misread small text, charts and precise spatial detail — verify anything exact. |
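Reasoning models are slower and pricier mostly because of token arithmetic. Providers such as OpenAI and Anthropic document that hidden reasoning (thinking) tokens are billed as output tokens. As a result, the same visible answer can cost far more. The Python sketch below assumes that billing model. The function name, token counts and prices are hypothetical.

```python
def request_cost_usd(
    input_tokens: int,
    answer_tokens: int,
    reasoning_tokens: int,
    input_per_mtok: float,
    output_per_mtok: float,
) -> float:
    """Cost of one request, assuming hidden reasoning tokens bill at the output rate."""
    billed_output = answer_tokens + reasoning_tokens  # reasoning is invisible but billed
    return (input_tokens * input_per_mtok + billed_output * output_per_mtok) / 1_000_000


if __name__ == "__main__":
    # Placeholder prices (USD per million tokens) and token counts.
    IN_PRICE, OUT_PRICE = 2.0, 8.0
    snap = request_cost_usd(500, 100, 0, IN_PRICE, OUT_PRICE)
    thought = request_cost_usd(500, 100, 4_000, IN_PRICE, OUT_PRICE)
    print(f"no thinking: ${snap:.4f}  with thinking: ${thought:.4f}  ratio: {thought / snap:.1f}x")
```

Here, 4,000 hidden tokens make an identical 100-token answer roughly 19x more expensive. Every one of those tokens must also be generated before the answer appears, which is where the extra latency comes from. On easy tasks, that spend buys nothing.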
Best-in-class by provider
| Provider | Flagship family | Where to access | Known for | Docs |
|---|---|---|---|---|
| OpenAI | GPT-4o / GPT-4.1, o-series (reasoning), GPT-4o mini | OpenAI API · Microsoft Azure OpenAI | Strong all-round capability and a leading reasoning line. | Docs ↗ |
| Anthropic | Claude — Opus (frontier), Sonnet (balanced), Haiku (fast) | Anthropic API · Amazon Bedrock · Google Vertex AI | Long context, strong writing/coding, safety & steerability focus. | Docs ↗ |
| Google | Gemini (Pro / Flash / Flash-Lite) · Gemma (open) | Google AI Studio · Vertex AI | Very long context and strong native multimodality. | Docs ↗ |
| Microsoft | Phi (small open models) · hosts OpenAI & many others | Azure AI Foundry (multi-vendor model catalog) | Efficient small language models; enterprise integration & tooling. | Docs ↗ |
| Amazon | Nova (Micro / Lite / Pro / Premier) · Titan | Amazon Bedrock (also serves Anthropic, Meta, Mistral, etc.) | Cost-tiered family and a single API across many vendors' models. | Docs ↗ |
| IBM | Granite (open, enterprise-focused) | IBM watsonx.ai | Governance, transparency and fit for regulated enterprise use. | Docs ↗ |
| Meta | Llama (open weights) | Self-host · Bedrock · Azure · most clouds | Leading open-weight ecosystem for customisation and self-hosting. | Docs ↗ |
When to use which
Starting a new feature / unsure if it's even feasible
→ Prototype on a frontier model to prove it works, then downshift to the cheapest model that still passes your evals.
High volume, latency-sensitive, or a simple task (tagging, routing)
→ A small/fast model — far cheaper and quicker, with little quality loss on easy tasks.
Hard maths, coding, planning or multi-step logic
→ A reasoning model — the extra test-time compute pays off here (but not on easy tasks).
Sensitive data, on-prem, or a regulated environment
→ A self-hosted open-weight model (Llama, Granite, Phi) so data never leaves your environment.
Very long documents or whole codebases in one prompt
→ A large-context model (e.g. Gemini, Claude) — but consider RAG to keep cost and 'lost-in-the-middle' risk down.
Images, screenshots, audio or video as input
→ A multimodal model (GPT-4o, Gemini, Claude vision, Nova).
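The decision list above can be encoded as a first-pass router. This is a minimal sketch, not a production policy. The `Task` fields, the `choose_tier` function and the 200,000-token cut-off are hypothetical. The checks are ordered so that hard constraints (data residency, input type, prompt size) win over cost and quality preferences.

```python
from dataclasses import dataclass

# Placeholder threshold for "very long input"; tune it to the models you use.
LONG_CONTEXT_TOKENS = 200_000


@dataclass
class Task:
    # Hypothetical per-feature labels mirroring the decision list.
    sensitive_data: bool = False      # regulated / must stay in your environment
    non_text_input: bool = False      # images, screenshots, audio or video
    prompt_tokens: int = 0            # rough size of the input
    hard_reasoning: bool = False      # maths, coding, planning, multi-step logic
    simple_high_volume: bool = False  # tagging, routing, latency-sensitive work
    feasibility_proven: bool = True   # has any model already passed your evals?


def choose_tier(t: Task) -> str:
    """Return one model tier for a task; hard constraints are checked first."""
    if t.sensitive_data:
        return "open-weight (self-hosted)"
    if t.non_text_input:
        return "multimodal"
    if t.prompt_tokens > LONG_CONTEXT_TOKENS:
        return "large-context (or RAG)"
    if t.hard_reasoning:
        return "reasoning"
    if not t.feasibility_proven:
        return "frontier"  # prototype first, downshift later
    if t.simple_high_volume:
        return "small/fast"
    return "balanced"  # the sensible production default


if __name__ == "__main__":
    print(choose_tier(Task(simple_high_volume=True)))  # small/fast
    print(choose_tier(Task(feasibility_proven=False)))  # frontier
```

Real features often match several rows at once. For example, sensitive screenshots would point to a self-hosted, multimodal open-weight model. A single-label router like this is a starting point for that discussion, not a replacement for it.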
Sources & further reading
Primary references from the providers themselves.
- IBM: What are foundation models? ↗
- IBM: Granite models (open, enterprise) ↗
- Google: Vertex AI, choosing a model ↗
- Google: Gemini models overview ↗
- Microsoft: Azure AI Foundry model catalog ↗
- Microsoft: Phi small language models ↗
- Anthropic: Claude models & choosing one ↗
- OpenAI: OpenAI models documentation ↗
- Amazon: Amazon Nova & Bedrock ↗
- Meta: Llama open models ↗
Watch — video explainers
Reputable channels (linked at channel level so they stay live).
- ▶ IBM: IBM Technology ↗
  Short, whiteboard-style explainers: LLMs, RAG, AI agents, fine-tuning vs RAG, foundation models.
- ▶ Independent: 3Blue1Brown ↗
  Outstanding visual intuition for neural networks, including the deep-learning series' chapters on transformers and attention.
- ▶ Independent (ex-OpenAI/Tesla): Andrej Karpathy ↗
  Deep but accessible: 'Intro to Large Language Models' (1-hour talk) and 'Deep Dive into LLMs'.
- ▶ Google: Google Cloud Tech ↗
  Gemini, Vertex AI Model Garden, and practical generative AI on Google Cloud.
- ▶ Anthropic: Anthropic ↗
  Claude capabilities, prompt engineering, building effective agents and AI safety.
- ▶ OpenAI: OpenAI ↗
  Model launches, reasoning models and developer/API walkthroughs.
- ▶ Microsoft: Microsoft Developer ↗
  Phi small models, Azure AI Foundry and enterprise AI patterns.
- ▶ Amazon: AWS ↗
  Amazon Bedrock, the Nova family and choosing models on AWS.
The AI model landscape changes quickly. Treat this page as a stable mental model and the linked provider docs as the source of truth for current model names, sizes and prices.