LLM App Development #1: Your First API Call and Environment Setup

This series is an introduction for people working with LLMs in code for the first time. If you have experience with REST APIs and databases, you will find it easier to follow. By the end, you will be able to build a working LLM app yourself, starting from simple calls and going all the way to RAG and agents.

It runs in 13 parts.

  • #1 Your first API call and environment setup ← this post
  • #2 Understanding messages and parameters
  • #3 Streaming responses in real time
  • #4 Prompt engineering in practice
  • #5 Getting structured output
  • #6 Connecting external functions with tool calling
  • #7 Embeddings and vector search
  • #8 Building a RAG pipeline
  • #9 Conversation memory and context management
  • #10 Building an AI agent
  • #11 Connecting tools with MCP
  • #12 Cost, evaluation, and observability
  • #13 A real-world project

The code examples are written in Python, and the model is Anthropic’s Claude. Most of the concepts apply just as well to other providers.

How LLM apps differ from a typical backend

Calling an LLM from code is structurally similar to calling an external API. You send an HTTP request and get a response. But look more closely and you will find a few properties that differ from the backend development you are used to. Understanding these differences up front makes it clear why later parts take the approaches they do.

First, the output can differ on every call. Even with the same input, you are not guaranteed an identical response. An ordinary function returns the same result for the same arguments, but an LLM does not. So code written on the assumption that “exactly this string comes back” will break. Constraining and validating the shape of the response therefore becomes important, and we will cover that in #5.
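A minimal sketch of the difference, assuming we asked the model to classify a review with one sentiment label. The label set and the helper name `parse_sentiment` are hypothetical; the point is to normalize the reply and check it against what you allow, instead of comparing it with one exact string.

```python
# Hypothetical example: the model was asked to reply with one sentiment label.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_sentiment(raw: str) -> str:
    """Normalize a free-form model reply and check it against the allowed labels."""
    # Replies can vary in case, surrounding whitespace, or trailing punctuation.
    label = raw.strip().strip(".!").lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {raw!r}")
    return label


# Brittle: breaks as soon as the reply is "positive" or "Positive!" instead.
# if reply == "Positive.": ...

# More robust: tolerates small variations, rejects anything outside the set.
print(parse_sentiment("  Positive. "))  # → positive
```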

Second, the input and output are natural language. You describe what you want in sentences rather than a function signature. How you write those sentences changes the quality of the result significantly. Writing prompts becomes a skill of its own, and we will cover it separately in #4.

Third, cost is measured in tokens. Rather than paying a flat fee per request, you are billed in proportion to the amount of text exchanged (tokens, to be precise), with input and output tokens priced separately. The longer the input and the longer the output, the higher the cost. So how much context to send and how to cache it become important questions in production. We will cover this in #12.
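As a worked example of token-based billing: each response reports its token counts in `response.usage.input_tokens` and `response.usage.output_tokens`, and you can turn those into a rough dollar figure. The per-million-token prices below are placeholders, not Anthropic's actual rates; look up the current prices for the model you use.

```python
# Placeholder prices in USD per million tokens. These are NOT real rates;
# replace them with the current prices for your model.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call in USD from its input and output token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# With a real response, pass response.usage.input_tokens and
# response.usage.output_tokens. Here we use made-up counts.
print(f"${estimate_cost(2_000, 500):.4f}")  # → $0.0135
```

Doubling either count doubles that side of the bill, which is why trimming unnecessary context pays off quickly.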

Fourth, responses can be slow and long. An LLM does not produce its answer all at once; it generates tokens one at a time. A long answer can take several seconds. So streaming the output to the screen as it is generated becomes a basic technique, which we will cover in #3.
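To see why streaming helps, here is a sketch that writes text to the screen chunk by chunk as it arrives while also collecting the full reply. A plain list stands in for the model's output; in #3 we connect the same idea to the SDK's streaming support. The function name `render_incrementally` is our own.

```python
import sys
from typing import Iterable, TextIO


def render_incrementally(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each text chunk as soon as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show the chunk now instead of waiting for a newline
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


# A list stands in for chunks that would arrive from the model over several seconds.
full = render_incrementally(["Hello", ", I am", " Claude."])
```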

For now, it is enough to keep these four properties in mind. We will deal with each one concretely in its own part.

What you need

To make your first call, you need three things.

  • An Anthropic account and an API key
  • A Python 3.8 or later environment
  • A small amount of usage credit (your first call costs a fraction of a cent)

You create an API key by signing in to the Anthropic Console and opening the API Keys page. The full key is shown only once, so copy it immediately.

Note
An API key is like a password. Never write it directly in your code or commit it to a Git repository. If a key leaks, you pay for any usage someone else runs up with it. Below, we handle it through an environment variable.

Installing the SDK and setting the key

Anthropic provides an official SDK for Python. Create a virtual environment, then install it.

Install the SDK
python -m venv venv
source venv/bin/activate
pip install anthropic

Store the key you created in the environment variable ANTHROPIC_API_KEY. The SDK reads a variable with this name automatically.

Set the API key as an environment variable
export ANTHROPIC_API_KEY="sk-ant-your-issued-key-here"

This way, the key string never appears anywhere in your code. It is best to build the habit of not hardcoding keys from the very start.
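If you want to fail fast with a clear message when the variable is missing, you can check it yourself before creating the client, rather than relying on whatever error surfaces later. A small sketch; the helper name `require_api_key` is hypothetical and not part of the SDK.

```python
import os


def require_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or stop with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Run: export {var_name}=\"sk-ant-...\""
        )
    return key


# Call this once at startup, before anthropic.Anthropic().
# It returns the key but never prints it.
```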

Your first call

Now let’s send Claude a single sentence and get a response back.

first_call.py
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Introduce yourself in one sentence."}
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text)

Run it, and a one-sentence self-introduction generated by Claude is printed. It is short, but this code contains the entire basic skeleton of an LLM call. Let’s break it down piece by piece.

  • anthropic.Anthropic(): creates the client. With no arguments, it reads the key from the environment variable ANTHROPIC_API_KEY.
  • model: specifies which model answers. We cover model selection just below.
  • max_tokens: sets an upper limit on how many tokens the response may generate. If the model reaches this limit, generation stops and the response is cut off midway.
  • messages: the conversation. For now it is a single message from the user. We cover the role structure in detail in #2.
  • response.content: the response is not a single string but a list of blocks. Check each block’s type and pull out the content of text blocks.

That last part can feel like a hassle at first. “Why a list when it could just hand back the text?” you might think. But later on, when tool-call blocks or reasoning-related blocks arrive alongside the text, this structure becomes necessary. So it is worth building the habit of checking type before reading a block’s content right from the start.
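Since the loop from first_call.py comes up constantly, it can be worth wrapping it in a helper. The sketch below (the name `extract_text` is ours) joins the text of every text block and skips everything else. To run without an API key, it uses `SimpleNamespace` objects that mimic blocks with a `type` attribute; with the SDK you would pass `response.content` directly.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Join the text of all text blocks, ignoring tool-use and other block types."""
    return "".join(block.text for block in content if block.type == "text")


# Stand-ins for response.content: a text block, then a non-text block
# that has no .text attribute at all (the tool name is hypothetical).
fake_content = [
    SimpleNamespace(type="text", text="I am Claude, an AI assistant."),
    SimpleNamespace(type="tool_use", name="get_weather", input={}),
]
print(extract_text(fake_content))  # → I am Claude, an AI assistant.
```

Because the type check comes first, the non-text block is skipped instead of raising an error.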

Choosing among three models

The Claude models you can pass to model fall broadly into three tiers by capability and cost.

Tier | Model ID | Characteristics | Good for
--- | --- | --- | ---
Most capable | claude-opus-4-8 | Smartest, most expensive | Complex reasoning, long tasks, agents
Balanced | claude-sonnet-4-6 | Balance of speed and intelligence | Most real-world work
Fastest | claude-haiku-4-5 | Fast and cheap | Simple tasks like classification and summarization

For learning, the balanced tier claude-sonnet-4-6 is a safe choice. It costs less than Opus while handling most of the examples comfortably. For light work like simple classification or short transformations, the cheaper Haiku is enough; when you need demanding reasoning, use Opus. Throughout this series we will pick the model that fits the nature of each task.

Note
Use the model ID exactly as written in the table above. If you alter it by hand, for example by inventing a date suffix or writing the version with a dot (4.6 instead of 4-6), the ID no longer matches any existing model and the API rejects the request with an error.
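One way to avoid typos in model IDs is to write them down once and refer to them by name everywhere else. This is a sketch of that habit, not something the SDK requires; the `MODELS` dictionary and the `model_for` helper are hypothetical, and the IDs are copied from the table.

```python
# Model IDs copied verbatim from the table, so they are typed in one place only.
MODELS = {
    "capable": "claude-opus-4-8",
    "balanced": "claude-sonnet-4-6",
    "fast": "claude-haiku-4-5",
}


def model_for(tier: str) -> str:
    """Return the model ID for a tier, failing loudly on an unknown tier name."""
    try:
        return MODELS[tier]
    except KeyError:
        raise ValueError(f"unknown tier {tier!r}; choose from {sorted(MODELS)}") from None


# e.g. client.messages.create(model=model_for("balanced"), ...)
print(model_for("balanced"))  # → claude-sonnet-4-6
```

A misspelled tier name now fails immediately in your own code, before any request is sent.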

Where people commonly trip up

Here are a few problems you commonly run into at the first-call stage.

  • AuthenticationError is raised: the environment variable is not set, or the key is wrong. Run echo $ANTHROPIC_API_KEY to check that the value is actually there. If you opened a new terminal, you may need to export it again.
  • The response is cut off midway: max_tokens is set too low. If response.stop_reason is "max_tokens", the model hit the limit, so raise the value.
  • block.text raises an AttributeError: not every block is a text block, and blocks such as tool calls have no text attribute. As in the example above, check block.type == "text" before reading it.

Knowing just these three will save you from most of the snags in your first 30 minutes.
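The stop_reason check from the second item can be made explicit in code. A minimal sketch: the helper name `warn_if_truncated` is hypothetical, and `SimpleNamespace` again stands in for a real response so the snippet runs without an API key; with the SDK you would pass the object returned by `client.messages.create`.

```python
import warnings
from types import SimpleNamespace


def warn_if_truncated(response) -> bool:
    """Return True and emit a warning if generation stopped at the max_tokens limit."""
    if response.stop_reason == "max_tokens":
        warnings.warn("Response hit max_tokens and was cut off; raise max_tokens.")
        return True
    return False


# Stand-in responses: one that finished normally, one that hit the limit.
finished = SimpleNamespace(stop_reason="end_turn")
truncated = SimpleNamespace(stop_reason="max_tokens")
print(warn_if_truncated(finished))  # → False
```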

Wrapping up

In this post, we looked at how LLM apps differ from a typical backend and walked through everything from creating an API key to getting a first response. To summarize:

  • LLM apps can produce different output on each call, take natural language in and out, are billed per token, and can be slow to respond.
  • Handle the API key through an environment variable and never hardcode it.
  • The response is a list of blocks, so check type and pull the content out.
  • Choose from Opus, Sonnet, and Haiku according to the nature of the task.

In the next post, “LLM App Development #2: Understanding Messages and Parameters,” we will take a proper look at the role structure of messages, which we just glossed over, and parameters like temperature. You need these to convey context and instructions to Claude precisely.
