LLM App Development #1: Your First API Call and Environment Setup
This series is an introduction for people working with LLMs in code for the first time. If you have experience with REST APIs and databases, you will find it easier to follow. By the end, you will be able to build a working LLM app yourself, starting from simple calls and going all the way to RAG and agents.
It runs in 13 parts.
- #1 Your first API call and environment setup ← this post
- #2 Understanding messages and parameters
- #3 Streaming responses in real time
- #4 Prompt engineering in practice
- #5 Getting structured output
- #6 Connecting external functions with tool calling
- #7 Embeddings and vector search
- #8 Building a RAG pipeline
- #9 Conversation memory and context management
- #10 Building an AI agent
- #11 Connecting tools with MCP
- #12 Cost, evaluation, and observability
- #13 A real-world project
The code examples are written in Python, and the model is Anthropic’s Claude. Most of the concepts apply just as well to other providers.
How LLM apps differ from a typical backend
Calling an LLM from code is structurally similar to calling an external API. You send an HTTP request and get a response. But once you look inside, there are a few properties that differ from the backend development you are used to. Understanding these differences first is what lets you see why later parts take the approaches they do.
First, the output can differ every time. Even with the same input, you are not guaranteed an identical response. An ordinary function returns the same result for the same arguments, but an LLM makes no such promise. So code written on the assumption that “exactly this string comes back” will break. Forcing and validating the shape of the response becomes important, and we will cover that in #5.
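As a rough illustration of why exact string matching is fragile, the sketch below normalizes a reply before comparing it to a set of expected values. The label set and the `parse_label` helper are hypothetical names made up for this example; the proper techniques for shaping output come in #5.

```python
from typing import Optional

# Hypothetical label set for a sentiment-style task (an assumption for this sketch).
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_label(raw: str) -> Optional[str]:
    """Normalize a model reply and return a known label, or None if it is not one."""
    cleaned = raw.strip().strip(".!\"'").lower()
    return cleaned if cleaned in ALLOWED_LABELS else None


# The same intent can come back with different casing, punctuation, or extra words.
for reply in ["positive", "Positive.", "  NEGATIVE\n", "I think it is positive"]:
    print(repr(reply), "->", parse_label(reply))
```

A reply that cannot be mapped to a known label returns `None`, so the caller can retry or fall back instead of silently accepting an unexpected string.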
Second, the input and output are natural language. You describe what you want in sentences rather than a function signature. How you write those sentences changes the quality of the result significantly. Writing prompts becomes a skill of its own, and we will cover it separately in #4.
Third, cost is measured in tokens. Rather than a flat fee per request, you are billed in proportion to the amount of text exchanged (tokens, to be precise). The longer the input and the longer the output, the higher the cost. So how much context to send and how to cache it become important topics in real operation. We will cover this in #12.
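The arithmetic behind token billing can be sketched as follows. The prices here are placeholders chosen only to make the numbers easy to follow, not real pricing, and `estimate_cost_usd` is a hypothetical helper. The sketch assumes input and output tokens are priced separately per million tokens, so check the provider's pricing page for actual figures.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the cost of one call from token counts and per-million-token prices."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Placeholder prices in USD per million tokens (assumptions, NOT real pricing).
EXAMPLE_INPUT_PRICE = 3.0
EXAMPLE_OUTPUT_PRICE = 15.0

cost = estimate_cost_usd(1_200, 300, EXAMPLE_INPUT_PRICE, EXAMPLE_OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.4f}")
```

With the Anthropic SDK, the token counts for a call are reported on the response as `response.usage.input_tokens` and `response.usage.output_tokens`, so they can be fed into a function like this after each request.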
Fourth, responses can be slow and long. An LLM does not produce its answer all at once; it generates tokens one at a time. A long answer can take several seconds. So streaming the output to the screen as it is generated becomes a basic technique, which we will cover in #3.
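To get a feel for the idea before #3, here is a small simulation that makes no API call. `fake_token_stream` is a stand-in generator invented for this sketch; it only imitates text arriving in pieces, and the consumer prints each piece as soon as it arrives instead of waiting for the whole answer.

```python
import time
from typing import Iterable, Iterator


def fake_token_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Stand-in for a model stream: yields the text one word at a time."""
    for word in text.split(" "):
        time.sleep(delay)  # pretend each chunk takes time to generate
        yield word + " "


def print_as_generated(chunks: Iterable[str]) -> str:
    """Print each chunk as soon as it arrives and return the accumulated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # flush so the chunk shows up immediately
        parts.append(chunk)
    print()
    return "".join(parts)


full_text = print_as_generated(
    fake_token_stream("Tokens arrive one piece at a time.", delay=0.05)
)
```

The real streaming interface differs in its details, but the shape of the consuming loop is the same: handle each chunk on arrival and keep the accumulated text for later use.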
For now, it is enough to remember these four as “so that is how it behaves.” We will deal with each one concretely in its own part.
What you need
To make your first call, you need three things.
- An Anthropic account and an API key
- A Python 3.8 or later environment
- A small amount of usage credit (your first call costs a fraction of a cent)
You issue an API key by signing in to the Anthropic Console and going to the API Keys menu. The full key is shown only once, so copy it right there.
Installing the SDK and setting the key
Anthropic provides an official SDK for Python. Create a virtual environment, then install it.

    python -m venv venv
    source venv/bin/activate
    pip install anthropic

Put the key you issued into the environment variable `ANTHROPIC_API_KEY`. The SDK reads an environment variable of this name automatically.

    export ANTHROPIC_API_KEY="sk-ant-your-issued-key-here"

This way, the key string never appears anywhere in your code. It is best to build the habit of not hardcoding keys from the very start.
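If you want a script to fail early with a readable message when the variable is missing, a check along these lines is one option. `require_api_key` and `mask` are hypothetical helpers written for this sketch, not part of the SDK.

```python
import os


def require_api_key(name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from the environment, or fail early with a readable message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it in this shell before running.")
    return key


def mask(key: str) -> str:
    """Show only the first few characters so the full key never lands in logs."""
    return key[:7] + "..." if len(key) > 7 else "..."


try:
    print("Key found:", mask(require_api_key()))
except RuntimeError as err:
    print(err)
```

Printing only a masked prefix keeps the habit consistent: the full key stays in the environment and never shows up in code or output.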
Your first call
Now let’s send Claude a single sentence and get a response back.

    import anthropic

    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": "Introduce yourself in one sentence."}
        ],
    )

    for block in response.content:
        if block.type == "text":
            print(block.text)

Run it, and a one-sentence self-introduction generated by Claude is printed. It is short, but this code contains the entire basic skeleton of an LLM call. Let’s break it down piece by piece.
- `anthropic.Anthropic()`: creates the client. Leave the arguments empty and it reads the key from the environment variable `ANTHROPIC_API_KEY`.
- `model`: specifies which model answers. We cover model selection just below.
- `max_tokens`: sets an upper limit on how many tokens the response may generate. If generation reaches the limit, the response is cut off midway.
- `messages`: the conversation. For now it is a single `user` turn. We cover the role structure in detail in #2.
- `response.content`: the response is not a single string but a list of blocks. Check each block’s `type` and pull out the content of `text` blocks.
That last part can feel like a hassle at first. “Why a list when it could just hand back the text?” you might think. But once tool-calling or reasoning-related blocks start arriving alongside the text later on, this structure becomes necessary. So it is best to build the habit, from the start, of checking each block’s `type` before pulling the content out.
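One way to make that habit cheap is to wrap it in a small helper. The sketch below uses stand-in objects in place of a real response so it runs without an API call; `extract_text` is a hypothetical name, and the non-text block is included only to show that it gets skipped.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Join the text of every text block, skipping any other block types."""
    return "".join(
        block.text for block in content if getattr(block, "type", None) == "text"
    )


# Stand-in blocks that mimic the shape of response.content (assumption for this sketch).
fake_content = [
    SimpleNamespace(type="text", text="Hello, "),
    SimpleNamespace(type="tool_use", name="get_weather", input={}),
    SimpleNamespace(type="text", text="I am Claude."),
]

print(extract_text(fake_content))
```

With a real call, you would pass `response.content` instead of `fake_content`.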
Choosing among three models
The Claude models you can pass to `model` fall broadly into three tiers by capability and cost.
| Tier | Model ID | Character | Good for |
|---|---|---|---|
| Most capable | claude-opus-4-8 | Smartest, most expensive | Complex reasoning, long tasks, agents |
| Balanced | claude-sonnet-4-6 | Balance of speed and intelligence | Most real-world work |
| Fastest | claude-haiku-4-5 | Fast and cheap | Simple tasks like classification and summarization |
For learning, the balanced tier claude-sonnet-4-6 is a safe choice. It is light on cost while handling most of the examples comfortably. For light work like simple classification or short transformations, the cheaper Haiku is enough; when you need demanding reasoning, use Opus. Throughout this series we will pick the model that fits the nature of each task.
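If you end up switching models by task, keeping the choice in one place avoids scattering model IDs through the code. This is only a sketch: the task categories and the `pick_model` helper are hypothetical, and the model IDs are the ones from the table above.

```python
# Model IDs taken from the table above, keyed by tier.
MODEL_BY_TIER = {
    "most_capable": "claude-opus-4-8",
    "balanced": "claude-sonnet-4-6",
    "fastest": "claude-haiku-4-5",
}

# Hypothetical task categories mapped to a tier (an assumption for this sketch).
TIER_BY_TASK = {
    "classification": "fastest",
    "summarization": "fastest",
    "complex_reasoning": "most_capable",
    "agent": "most_capable",
}


def pick_model(task: str) -> str:
    """Return a model ID for the task, defaulting to the balanced tier."""
    tier = TIER_BY_TASK.get(task, "balanced")
    return MODEL_BY_TIER[tier]


print(pick_model("classification"))
print(pick_model("general_chat"))
```

Unknown tasks fall back to the balanced tier, which matches the default recommended above.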
Where people commonly trip up
Here are a few problems you commonly run into at the first-call stage.
- `AuthenticationError` is raised: The environment variable is not set, or the key is wrong. Run `echo $ANTHROPIC_API_KEY` to check the value is actually there. If you opened a new terminal, you may need to `export` it again.
- The response is cut off midway: `max_tokens` is too small. If the response’s `stop_reason` comes back as `max_tokens`, it hit the limit, so raise the value.
- `block.text` raises an error: Not every block is text. As in the example above, check `block.type == "text"` first before pulling it out, to be safe.
Knowing just these three will save you from most of the snags in your first 30 minutes.
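The truncation and block-type checks can be combined in a few lines. The sketch below again uses a stand-in object instead of a real response so it runs offline; `text_and_truncation` is a hypothetical helper name.

```python
from types import SimpleNamespace


def text_and_truncation(response):
    """Return (text, truncated): the joined text blocks and whether max_tokens was hit."""
    text = "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )
    truncated = getattr(response, "stop_reason", None) == "max_tokens"
    return text, truncated


# Stand-in for a response that ran into the token limit (assumption for this sketch).
fake_response = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(type="text", text="The answer starts but")],
)

text, truncated = text_and_truncation(fake_response)
if truncated:
    print("Response hit max_tokens; consider raising the limit.")
print(text)
```

Checking `stop_reason` on every response is cheap, and it turns a silently clipped answer into something you can detect and handle.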
Wrapping up
In this post, we looked at how LLM apps differ from a typical backend and walked through everything from issuing an API key to a first response. To summarize:
- LLM apps can produce different output for the same input, take natural language in and out, are billed per token, and can be slow to respond.
- Handle the API key through an environment variable and never hardcode it.
- The response is a list of blocks, so check `type` and pull the content out.
- Choose from Opus, Sonnet, and Haiku according to the nature of the task.
In the next post, “LLM App Development #2: Understanding Messages and Parameters,” we will take a proper look at the role structure of messages, which we just glossed over, and parameters like temperature. You need these to convey context and instructions to Claude precisely.