Beyond Chat: APIs, Agents & What's Next
Chat interfaces are just the surface. Underneath, there is a powerful ecosystem of APIs, agent frameworks, and emerging tools that let you build with AI, not just talk to it. This final part of the Academy takes you from consumer to creator — and looks at where all of this is heading.
1. Chat Interfaces vs APIs
When you use ChatGPT, Claude.ai, or Gemini through their websites, you are using a chat interface — a pre-built front end designed for conversation. An API (Application Programming Interface) is the raw connection underneath. It lets you send prompts and receive responses programmatically, from your own code.
Chat Interface
- Type a message, get a reply
- Designed for manual, interactive use
- Fixed features (file upload, web search, etc.)
- Subscription pricing (monthly fee)
- No coding required
API
- Send a request, receive structured data
- Designed for automation and integration
- Full control over parameters and behaviour
- Pay-per-use pricing (per token)
- Requires some programming knowledge
When Would You Want an API?
You'd reach for the API when you need to: process hundreds of documents automatically, embed AI into an app or website, build a custom chatbot for your business, run the same prompt across many inputs, or integrate AI into an existing workflow. If you are doing the same thing manually more than a dozen times, the API is probably worth learning.
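To make this concrete, here is a minimal sketch of what an API call looks like in Python. It builds a request body in the OpenAI-style Chat Completions format; the system message, model name, and parameter values are illustrative choices, not recommendations.

```python
import json

def build_chat_request(prompt: str, model: str = "gpt-4o-mini",
                       temperature: float = 0.2, max_tokens: int = 500) -> dict:
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

body = build_chat_request("Summarise this contract in three bullet points.")
print(json.dumps(body, indent=2))

# Sending it is a single HTTP POST, e.g. with the `requests` library:
#   requests.post("https://api.openai.com/v1/chat/completions",
#                 headers={"Authorization": f"Bearer {API_KEY}"},
#                 json=body)
```

Because it is just code, you can run this in a loop over hundreds of documents — exactly the automation a chat interface can't offer.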
2. Understanding Token Pricing
API pricing is based on tokens — the chunks of text that models process internally. Understanding tokens is essential to predicting costs and choosing the right model for your budget.
What Is a Token?
A token is roughly three-quarters of a word in English. "ChatGPT is amazing" is about four tokens. A typical page of text is around 400-500 tokens. Models are priced per million tokens processed, with prices typically listed as "per 1M tokens" for both input (what you send) and output (what you receive).
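A quick back-of-the-envelope rule is roughly one token per four characters of English text. Exact counts require the provider's own tokenizer (OpenAI publishes one called tiktoken, for example), but a rough estimate like this is enough for budgeting:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~1 token per 4 characters of English text.
    Real counts require the provider's tokenizer; this is a ballpark."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("ChatGPT is amazing"))  # → 4, matching the rule of thumb
print(estimate_tokens("word " * 450))          # a page of text: a few hundred tokens
```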
Input vs Output Pricing
Most providers charge differently for input tokens (your prompt, any context, system instructions) and output tokens (the model's response). Output tokens are typically 2-5 times more expensive than input tokens. This is why a long, detailed prompt with a short answer can be cheaper than a brief prompt that generates a 2,000-word essay.
Blended Cost
The "blended cost" is the average cost per token across both input and output, weighted by a typical use pattern. This is more useful for estimating real-world costs than looking at input and output prices separately. A model with cheap input but expensive output could end up costing more overall than one with moderate pricing on both sides.
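The arithmetic is simple enough to sketch. The prices and the 3:1 input-to-output mix below are hypothetical — plug in your provider's current rates and your own usage pattern:

```python
def blended_cost_per_mtok(input_price: float, output_price: float,
                          input_share: float = 0.75) -> float:
    """Blended $/1M tokens, weighted by a typical input:output mix.
    input_share is the fraction of total tokens that are input (here 3:1)."""
    return input_share * input_price + (1 - input_share) * output_price

# Hypothetical prices in $ per 1M tokens — check your provider's current rates.
cheap_input = blended_cost_per_mtok(0.50, 10.00)  # cheap input, pricey output
moderate    = blended_cost_per_mtok(2.00, 4.00)   # moderate on both sides
print(cheap_input, moderate)  # 2.875 vs 2.5 — the "cheap" model costs more here
```

Note how the model with the cheaper input price ends up more expensive once output is weighted in, which is exactly why blended cost is the better comparison.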
Use our pricing calculator to estimate costs for your specific use case, or explore pricing trends to see how costs have changed over time (spoiler: they keep falling).
3. Temperature and Parameters
When using the API, you can control how the model behaves through a few key parameters. Understanding these gives you much finer control than a chat interface offers.
Temperature
Temperature (range 0.0 - 2.0) controls randomness. At 0, the model gives the most predictable, deterministic response. At higher values, it becomes more creative and varied but also less reliable. Think of it as a dial between "strictly factual" and "creatively adventurous."
- Low (0 - 0.3): factual tasks, code, data extraction
- Medium (0.5 - 0.7): general conversation, balanced output
- High (0.8 - 1.5): creative writing, brainstorming
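A toy illustration of what temperature actually does: it divides the model's raw scores (logits) before they are turned into probabilities, so low values sharpen the distribution and high values flatten it. This is a simplified sketch, not any provider's exact implementation (at exactly 0, implementations typically just pick the top token rather than divide by zero):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; low T sharpens, high T flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # near-deterministic: top token dominates
print(softmax_with_temperature(logits, 1.5))  # flatter: more variety when sampling
```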
Top-p (Nucleus Sampling)
Top-p (range 0.0 - 1.0) is an alternative way to control randomness. Instead of scaling probabilities (like temperature), it restricts the pool of words the model considers. A top-p of 0.1 means the model only considers the top 10% most likely next tokens. Most of the time, you adjust either temperature or top-p, not both. If you are new to this, stick with temperature and leave top-p at the default.
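The mechanism can be sketched in a few lines: keep the most likely tokens until their combined probability reaches p, discard the rest, and renormalise. The token probabilities below are invented for illustration:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p,
    then renormalise. probs maps token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "zebra": 0.05}
print(top_p_filter(probs, 0.7))  # keeps "the" and "a"; rare tokens are cut
```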
Max Tokens
This sets an upper limit on how long the model's response can be. If you set max tokens to 500, the model will stop generating after 500 tokens regardless of whether it has finished its thought. This is useful for controlling costs and ensuring responses don't run away. It does not force the model to use all the tokens — it is a ceiling, not a target.
4. AI Agents
One of the most significant developments in AI is the emergence of agents — AI systems that don't just respond to a single prompt but can plan, use tools, and execute multi-step tasks autonomously.
What Is an AI Agent?
A regular chatbot responds to one message at a time. An agent receives a goal and then works towards it over multiple steps. It can decide which tools to use, break a problem into sub-tasks, check its own work, and iterate until the goal is met. Think of the difference between asking someone a question (chatbot) versus giving someone a project to complete (agent).
How Agents Work
Most agent architectures follow a loop:
Plan — The model analyses the goal and decides what to do first.
Act — It calls a tool (search the web, run code, read a file, call an API).
Observe — It reads the result of the tool call.
Reflect — It decides if the goal is met or if it needs another step.
Repeat — The loop continues until the task is done or a limit is reached.
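The loop above can be sketched in miniature. Here the "planner" function and the single `search` tool are stand-ins for real model calls and real tools — a production framework would let the model choose tools and check its own work:

```python
def run_agent(goal, tools, plan_next, max_steps=5):
    """Minimal plan -> act -> observe -> reflect loop.
    plan_next(goal, history) stands in for the model: it returns
    (tool_name, args) for the next step, or None when the goal is met."""
    history = []
    for _ in range(max_steps):           # hard limit so the loop can't run away
        step = plan_next(goal, history)  # Plan
        if step is None:                 # Reflect: goal met, stop
            break
        tool_name, args = step
        result = tools[tool_name](*args) # Act: call the chosen tool
        history.append((tool_name, args, result))  # Observe: record the result
    return history

# Toy tool and planner standing in for real web search and real model calls.
tools = {"search": lambda query: f"results for '{query}'"}

def plan_next(goal, history):
    # Search once, then declare the goal met.
    return None if history else ("search", (goal,))

trace = run_agent("cheapest GPT-4-class API", tools, plan_next)
print(trace)
```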
The Current State of Agents
AI agents are real and increasingly capable, but they are still in their early stages. They work well for well-defined tasks with clear success criteria — coding, research, data processing. They struggle more with ambiguous, open-ended goals where judgement is required. Expect rapid improvement here, but also expect that fully autonomous agents will take time to become reliable for high-stakes work.
See how different models perform on agent tasks in our agents leaderboard.
5. The Emerging AI Stack
Beyond basic prompt-and-response, a set of techniques has emerged that lets you build much more powerful AI applications. Here are the key building blocks you should know about.
RAG (Retrieval-Augmented Generation)
Instead of relying solely on what the model learned during training, RAG retrieves relevant information from a knowledge base (your documents, a database, the web) and includes it in the prompt. This keeps answers grounded in real, up-to-date data and dramatically reduces hallucination.
Use when: You need AI to answer questions about your specific data or documents.
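The retrieve-then-prompt pattern is simple to sketch. Real RAG systems retrieve by embedding similarity; the keyword-overlap scoring here is a deliberately crude stand-in so the example stays self-contained:

```python
import re

def keywords(text):
    """Lowercase words longer than three letters — a crude stopword filter."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def retrieve(query, documents, k=2):
    """Rank documents by keyword overlap with the query (a stand-in for
    embedding-based semantic search) and return the top k."""
    q = keywords(query)
    return sorted(documents, key=lambda d: len(q & keywords(d)), reverse=True)[:k]

def build_rag_prompt(query, documents):
    """Stuff the retrieved passages into the prompt as grounding context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Refunds are issued to the original payment method.",
]
print(build_rag_prompt("What is the refund policy?", docs))
```

The prompt the model finally sees contains the relevant passages, so its answer is grounded in your data rather than whatever it half-remembers from training.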
Fine-Tuning
Fine-tuning means taking a pre-trained model and training it further on your specific data. This adjusts the model's behaviour, style, or knowledge to match your needs. It is more involved than RAG — you need training data and some technical setup — but the result is a model that naturally "knows" your domain without needing information stuffed into every prompt.
Use when: You need consistent style, specialised terminology, or domain-specific behaviour.
Embeddings
Embeddings convert text into numerical vectors — lists of numbers that capture meaning. Texts with similar meanings end up with similar numbers. This is the foundation of semantic search: instead of matching keywords, you can find content that is conceptually related. Embeddings power RAG systems, recommendation engines, and content classification.
Use when: You need to search by meaning, cluster content, or build recommendation systems.
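"Similar meanings end up with similar numbers" is usually measured with cosine similarity — the angle between two vectors. The three-dimensional vectors below are made up purely for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Tiny made-up vectors; real embeddings are far higher-dimensional.
cat    = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
stock  = [0.0, 0.1, 0.9]
print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, stock))   # low: unrelated meanings
```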
Vector Databases
A vector database stores and searches embeddings efficiently. When you have thousands or millions of documents, you need a database that can quickly find the most similar vectors to a query. Tools like Pinecone, Weaviate, and Chroma are purpose-built for this. A vector database is what makes RAG fast and scalable.
Use when: You are building RAG at scale, or need fast semantic search across large datasets.
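At its core, a vector database answers one question: "which stored vectors are closest to this query vector?" The hypothetical `ToyVectorStore` below does this by brute force — real products like Pinecone, Weaviate, and Chroma add approximate-nearest-neighbour indexing so the same query stays fast at millions of vectors:

```python
import heapq
import math

class ToyVectorStore:
    """Brute-force nearest-neighbour search over (text, vector) pairs —
    what a vector database does, minus the indexing that makes it scale."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self.items.append((text, vector))

    def query(self, vector, k=1):
        """Return the k stored items most similar to the query vector."""
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb)
        return heapq.nlargest(k, self.items, key=lambda item: cos(item[1], vector))

store = ToyVectorStore()
store.add("refund policy", [0.9, 0.1])
store.add("holiday hours", [0.1, 0.9])
print(store.query([0.8, 0.2], k=1))  # nearest item: "refund policy"
```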
6. Where AI Is Heading
Predicting the future of AI is difficult — the field has repeatedly surprised even its own researchers. But several trends are clearly underway.
Scaling and Efficiency
Models continue to get better, but the focus is shifting from "make it bigger" to "make it smarter." Smaller models that match or exceed the performance of yesterday's giants are becoming common. This means AI will keep getting cheaper, faster, and more accessible. The performance you pay a premium for today will likely be available at a fraction of the cost within a year.
Reasoning Models
Models that "think before they answer" — like OpenAI's o-series — represent a significant shift. Instead of producing the first plausible response, they explore multiple approaches, check their logic, and arrive at better answers for complex problems. Expect reasoning capabilities to become standard across all model families, not just a premium feature.
Multimodal Everything
Models are rapidly gaining the ability to work across text, images, audio, video, and code simultaneously. The line between "text model" and "image model" is disappearing. In the near future, you'll interact with a single model that can read your documents, analyse your photos, transcribe your meetings, and generate visuals — all in one conversation.
Open-Source Momentum
Open-weight models from Meta, DeepSeek, Mistral, and others are closing the gap with proprietary models at a remarkable pace. This is good for everyone: it drives competition, reduces costs, and ensures that advanced AI isn't controlled by a handful of companies. For privacy-sensitive applications, local deployment of open models is becoming increasingly practical.
An Honest Note About Uncertainty
Nobody knows exactly where this goes. Some predictions from two years ago have already proven wildly wrong — in both directions. What we can say is that AI capabilities are improving faster than most experts expected, costs are falling faster than anyone predicted, and the technology is becoming more accessible every month. The best strategy is to stay curious, keep experimenting, and avoid getting locked into any single tool or workflow.
Congratulations — You've Completed the AI Academy
You've gone from "What is AI?" to understanding APIs, agents, and the emerging AI stack. That puts you ahead of the vast majority of people who are still guessing at how this technology works. The next step is to put this knowledge into practice. Here are the best places to go from here.
API Getting Started Guides
Code snippets and walkthroughs for OpenAI, Anthropic, Google, and more
Prompt Library
Model-specific prompting guides with official best practices
Use Cases
Real-world applications of AI across industries and tasks
Model Comparison
Compare any two models head-to-head on benchmarks, pricing, and capabilities