Claude API Complete Guide | Anthropic SDK Usage, Pricing & Function Calling
Key Takeaways
The Claude API is Anthropic's conversational AI API. It is comparable to the OpenAI API but offers a longer context window (200K tokens), powerful prompt caching, and strong code generation.
What is the Claude API?
The Claude API gives you programmatic access to Anthropic's Claude models. Alongside GPT-4 and Gemini, it is one of the most widely used LLM APIs in 2026, and it is especially strong at long-document processing, code generation, and safety.
This guide covers everything from API key setup to advanced features, with Python and TypeScript examples you can use in production right away.
# Claude API in 30 seconds
import anthropic
client = anthropic.Anthropic(api_key="your-api-key")
message = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
messages=[{"role": "user", "content": "Implement the Fibonacci sequence in Python"}]
)
print(message.content[0].text)
Table of Contents
- Model Comparison & Pricing
- Getting Your API Key
- Installing the SDK
- Basic Messages API
- Streaming
- System Prompts
- Multi-turn Conversations
- Tool Use (Function Calling)
- Vision (Image Input)
- Prompt Caching
- Production Pattern: Error Handling & Retries
- ChatGPT API vs Claude API
Model Comparison & Pricing
| Model | Context | Input ($/MTok) | Output ($/MTok) | Notes |
|---|---|---|---|---|
| claude-haiku-4-5 | 200K | $0.80 | $4.00 | Fastest, lowest cost |
| claude-sonnet-4-5 | 200K | $3.00 | $15.00 | Balanced performance |
| claude-opus-4-5 | 200K | $15.00 | $75.00 | Highest capability |
| claude-sonnet-4-6 | 200K | $3.00 | $15.00 | Latest, improved reasoning |
Production tip: run development and testing against Haiku, which costs a fraction of Sonnet, and reserve Sonnet (or Opus) for production traffic to cut costs significantly.
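As a quick illustration of the table above, here is a back-of-the-envelope cost estimator. The prices are hardcoded from the table and subject to change, so check Anthropic's pricing page before relying on them:

```python
# Rough per-request cost estimate from the pricing table above.
# Prices are in $/million tokens and may change over time.
PRICES = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-5": (3.00, 15.00),
    "claude-opus-4-5": (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 10K-token prompt with a 1K-token answer on Sonnet
print(f"${estimate_cost('claude-sonnet-4-5', 10_000, 1_000):.4f}")  # $0.0450
```

Plugging the same request into Haiku instead gives $0.0120, which is where the "develop on Haiku" tip above comes from.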
Getting Your API Key
- Go to console.anthropic.com
- Create an account (Google/GitHub login supported)
- API Keys → Create Key
- Copy and store the key securely (it cannot be viewed again)
# Store in a .env file
ANTHROPIC_API_KEY=sk-ant-api03-...
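The SDK reads ANTHROPIC_API_KEY from the environment, so the .env file just needs to be loaded before the client is constructed. The python-dotenv package is the usual choice; as a dependency-free sketch, a minimal loader could look like this (load_env is a hypothetical helper written for illustration, not part of any SDK):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Set KEY=VALUE lines from a .env file as environment variables.
    Existing variables are not overwritten; comments and blanks are skipped."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_env()  # after this, anthropic.Anthropic() finds ANTHROPIC_API_KEY
```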
Installing the SDK
Python
pip install anthropic
TypeScript / Node.js
npm install @anthropic-ai/sdk
# or
pnpm add @anthropic-ai/sdk
Basic Messages API
Python
Create a client and send a minimal request:
import anthropic
client = anthropic.Anthropic() # Reads ANTHROPIC_API_KEY from environment automatically
message = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
messages=[
{"role": "user", "content": "What is the capital of France?"}
]
)
# Response structure
print(message.content[0].text) # Paris.
print(message.model) # claude-sonnet-4-5
print(message.usage) # token usage
TypeScript
The same request with the TypeScript SDK:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const message = await client.messages.create({
model: "claude-sonnet-4-5",
max_tokens: 1024,
messages: [{ role: "user", content: "What is the capital of France?" }],
});
console.log(message.content[0].type === "text" ? message.content[0].text : "");
Response Object Structure
The full response object is JSON like this:
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Paris."
}
],
"model": "claude-sonnet-4-5",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 14,
"output_tokens": 5
}
}
Streaming
Streaming is essential for showing responses in real time in conversational apps.
Python (Synchronous)
Use messages.stream as a context manager and iterate over text_stream:
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
model="claude-sonnet-4-5",
max_tokens=1024,
messages=[{"role": "user", "content": "Explain Python's GIL"}]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
# Access the final message object after completion
final_message = stream.get_final_message()
print(f"\n\nTotal tokens: {final_message.usage.input_tokens + final_message.usage.output_tokens}")
Python (Asynchronous)
The asynchronous client (AsyncAnthropic) follows the same pattern with async/await:
import asyncio
import anthropic
client = anthropic.AsyncAnthropic()
async def stream_response():
async with client.messages.stream(
model="claude-sonnet-4-5",
max_tokens=1024,
messages=[{"role": "user", "content": "What is async programming?"}]
) as stream:
async for text in stream.text_stream:
print(text, end="", flush=True)
asyncio.run(stream_response())
TypeScript (Streaming)
Streaming with the TypeScript SDK:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const stream = await client.messages.stream({
model: "claude-sonnet-4-5",
max_tokens: 1024,
messages: [{ role: "user", content: "Explain JavaScript's event loop" }],
});
for await (const chunk of stream) {
if (
chunk.type === "content_block_delta" &&
chunk.delta.type === "text_delta"
) {
process.stdout.write(chunk.delta.text);
}
}
System Prompts
Define the model’s role and behavior.
Pass the role description in the system parameter:
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
system="""You are a senior backend engineer with 10 years of experience.
- Proficient in Python, Go, and Rust
- You prioritize performance optimization and system design
- You provide practical and specific feedback during code reviews
- You answer in English""",
messages=[
{"role": "user", "content": "Fix the N+1 query problem in this Django code:\n\n```python\ndef get_posts():\n posts = Post.objects.all()\n return [(p.title, p.author.name) for p in posts]\n```"}
]
)
print(message.content[0].text)
Multi-turn Conversations
Maintain conversation history for contextual dialogue.
import anthropic
client = anthropic.Anthropic()
conversation_history = []
def chat(user_message: str) -> str:
conversation_history.append({
"role": "user",
"content": user_message
})
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=2048,
system="You are a friendly programming tutor.",
messages=conversation_history
)
assistant_message = response.content[0].text
conversation_history.append({
"role": "assistant",
"content": assistant_message
})
return assistant_message
# Example conversation
print(chat("What is Python?"))
print(chat("Show me a Hello World example in Python"))
print(chat("Now add a variable to that example"))
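One caveat with this pattern: conversation_history grows without bound, and every call resends the entire list, so input tokens (and cost) grow with each turn. A simple sketch of one mitigation is to keep only the most recent turns. The cutoff of 20 messages and the trim_history helper here are illustrative choices, not SDK features:

```python
MAX_MESSAGES = 20  # arbitrary cutoff for illustration

def trim_history(history: list) -> list:
    """Keep only the most recent messages, dropping a leading assistant
    message so the trimmed list still starts with a user turn."""
    if len(history) <= MAX_MESSAGES:
        return history
    trimmed = history[-MAX_MESSAGES:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

# Call this before each request, e.g.:
# response = client.messages.create(..., messages=trim_history(conversation_history))
```

For conversations where early context matters, a common alternative is to summarize older turns into a single message instead of dropping them.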
Tool Use (Function Calling)
Let Claude call external functions to process real-time information.
import anthropic
import json
client = anthropic.Anthropic()
# Tool definitions
tools = [
{
"name": "get_weather",
"description": "Get current weather for a city",
"input_schema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name (e.g. London, Tokyo)"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["city"]
}
},
{
"name": "get_stock_price",
"description": "Get stock price",
"input_schema": {
"type": "object",
"properties": {
"ticker": {
"type": "string",
"description": "Stock ticker symbol (e.g. AAPL, TSLA)"
}
},
"required": ["ticker"]
}
}
]
# Actual function implementations
def get_weather(city: str, unit: str = "celsius") -> dict:
# In production, call a real weather API
return {"city": city, "temperature": 22, "unit": unit, "condition": "Sunny"}
def get_stock_price(ticker: str) -> dict:
# In production, call a real stock API
return {"ticker": ticker, "price": 180.50, "currency": "USD"}
def process_tool_call(tool_name: str, tool_input: dict) -> str:
if tool_name == "get_weather":
result = get_weather(**tool_input)
elif tool_name == "get_stock_price":
result = get_stock_price(**tool_input)
else:
result = {"error": f"Unknown tool: {tool_name}"}
return json.dumps(result)
def chat_with_tools(user_message: str) -> str:
messages = [{"role": "user", "content": user_message}]
while True:
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
tools=tools,
messages=messages
)
# No tool calls — return final response
if response.stop_reason == "end_turn":
return response.content[0].text
# Handle tool calls
if response.stop_reason == "tool_use":
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = process_tool_call(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result
})
messages.append({"role": "user", "content": tool_results})
# Run
print(chat_with_tools("What's the weather in London and what's Apple's stock price?"))
Vision (Image Input)
Claude can understand and analyze images.
import anthropic
import base64
from pathlib import Path
client = anthropic.Anthropic()
# Method 1: Base64 encoding
def analyze_image_file(image_path: str, question: str) -> str:
image_data = Path(image_path).read_bytes()
base64_image = base64.standard_b64encode(image_data).decode("utf-8")
ext = Path(image_path).suffix.lower()
media_types = {".jpg": "image/jpeg", ".png": "image/png",
".gif": "image/gif", ".webp": "image/webp"}
media_type = media_types.get(ext, "image/jpeg")
message = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": media_type,
"data": base64_image,
},
},
{"type": "text", "text": question}
],
}
],
)
return message.content[0].text
# Method 2: URL (public images)
def analyze_image_url(image_url: str, question: str) -> str:
message = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {"type": "url", "url": image_url},
},
{"type": "text", "text": question}
],
}
],
)
return message.content[0].text
# Usage
result = analyze_image_file("screenshot.png", "What does this error message mean?")
print(result)
Prompt Caching
Reduce costs by up to 90% for repeated long contexts.
import anthropic
client = anthropic.Anthropic()
# Cache a long system prompt or document
LARGE_DOCUMENT = """
[Very long technical document — 5000+ tokens...]
""" # In practice, thousands of tokens
def ask_about_document(question: str) -> str:
message = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
system=[
{
"type": "text",
"text": "You are a technical documentation expert. Answer questions based on the document below.",
},
{
"type": "text",
"text": LARGE_DOCUMENT,
"cache_control": {"type": "ephemeral"} # Enable caching (5 minutes)
}
],
messages=[{"role": "user", "content": question}]
)
# Check cache hit
usage = message.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache created: {getattr(usage, 'cache_creation_input_tokens', 0)}")
print(f"Cache read: {getattr(usage, 'cache_read_input_tokens', 0)}")
return message.content[0].text
# First call: creates cache
print(ask_about_document("Summarize the main points of this document"))
# Second call: cache hit (90% cost savings)
print(ask_about_document("What are the performance optimization techniques in this document?"))
Production Pattern: Error Handling & Retries
import anthropic
import time
from typing import Optional
client = anthropic.Anthropic()
def create_message_with_retry(
messages: list,
model: str = "claude-sonnet-4-5",
max_tokens: int = 1024,
system: Optional[str] = None,
max_retries: int = 3,
base_delay: float = 1.0,
) -> str:
"""Reliable API call with retry logic"""
for attempt in range(max_retries):
try:
kwargs = {
"model": model,
"max_tokens": max_tokens,
"messages": messages,
}
if system:
kwargs["system"] = system
response = client.messages.create(**kwargs)
return response.content[0].text
except anthropic.RateLimitError:
# Rate limit exceeded — exponential backoff
if attempt < max_retries - 1:
delay = base_delay * (2 ** attempt)
print(f"Rate limit hit. Retrying in {delay}s ({attempt + 1}/{max_retries})")
time.sleep(delay)
else:
raise
except anthropic.APIStatusError as e:
if e.status_code == 529: # Overloaded
if attempt < max_retries - 1:
delay = base_delay * (2 ** attempt)
print(f"Server overloaded. Retrying in {delay}s")
time.sleep(delay)
else:
raise
else:
raise # Propagate other errors immediately
except anthropic.APIConnectionError:
if attempt < max_retries - 1:
time.sleep(base_delay)
else:
raise
# Usage
result = create_message_with_retry(
messages=[{"role": "user", "content": "Hello!"}],
system="You are a helpful assistant.",
)
print(result)
ChatGPT API vs Claude API
| Feature | Claude API | ChatGPT API |
|---|---|---|
| Max Context | 200K tokens | 128K tokens |
| Prompt Caching | Yes (up to 90% savings) | Yes (automatic) |
| Code Generation | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Document Analysis | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| JSON Mode | Yes | Yes |
| Vision | Yes | Yes |
| Ecosystem | Growing | Very broad |
| Price (Sonnet level) | $3/$15 | $5/$15 |
Conclusion
The Claude API excels at long document processing, code review, and complex reasoning. Using prompt caching aggressively can significantly cut your costs.
Next steps: