Last month, I ran the same extraction task against Claude 3.5 Sonnet twice. First attempt: an unstructured prompt, 67% accuracy. Second attempt: a structured output format with schema validation, 94% accuracy. Same model, same data, zero fine-tuning. The only difference was how I asked.
Structured prompting isn’t a new concept. But most people use it wrong—or not at all. They’ll ask a model to “return JSON” and wonder why the output arrives malformed. They’ll specify a schema but forget to constrain the fields. They’ll add structure to the prompt without matching it in the output requirements.
This article covers the framework that moved AlgoVesta from constant output parsing failures to production systems that run unsupervised for weeks. It’s not magic. It’s engineering discipline applied to how you talk to LLMs.
What Structured Prompting Actually Is (And Isn’t)
Structured prompting is a technique where you define the exact format, fields, constraints, and validation rules your output must follow—then embed those rules into the prompt itself.
This is different from:
- Asking for JSON: “Return JSON” without schema detail. Models will try, but inconsistently.
- Using function calling: Function calling is a tool for enforcing output structure at the API level. Structured prompting is a prompt technique that works with or without function calling.
- Prompt templating: Filling in variables into a static template. That’s data insertion, not structure.
Structured prompting works because models reason better when constraints are explicit. They’ve seen thousands of examples where a specific format led to specific outputs. When you specify that format, you’re activating learned patterns.
Real example from production: A financial extraction pipeline needed to pull trade data from earnings call transcripts. Without structure, Claude returned 4–7 extra fields I hadn't asked for, dates in mismatched formats, and inconsistent decimal precision. With a schema embedded in the prompt, it returned exactly what I specified, every time.
The Core Framework: Four Layers of Structure
This is the pattern that actually scales:
- Schema definition – Define what fields exist and their types
- Constraint specification – Define valid values, ranges, formats
- Output template – Show the exact structure expected
- Validation rules – Make the model aware of what makes output invalid
Each layer reinforces the others. Miss one, and the model drifts.
Layer 1: Schema Definition
Start by defining what data you need. Be specific about types:
# Bad schema definition
Return information about the company.
# Better schema definition
Extract the following fields:
- company_name (string, required)
- founded_year (integer, 4 digits)
- headquarters_location (string, city and country)
- revenue_usd (number, in millions)
The second version gives the model something concrete. It knows the types it should produce.
Layer 2: Constraint Specification
Types alone aren’t enough. Add rules about what values are acceptable:
# Schema with constraints
- company_name (string, required, max 100 characters)
- founded_year (integer, must be between 1800 and 2025)
- headquarters_location (string, format: "City, Country" only)
- revenue_usd (number, must be positive, null if unknown)
These constraints reduce hallucination. The model now knows what makes an output invalid. In testing, Claude 3.5 Sonnet's constraint violation rate dropped from ~12% to ~2% once the rules were explicit.
Layer 3: Output Template
Show, don’t tell. Provide an example of what valid output looks like:
OUTPUT FORMAT (this is the exact structure you must follow):
{
"company_name": "Apple Inc.",
"founded_year": 1976,
"headquarters_location": "Cupertino, United States",
"revenue_usd": 383285.0,
"confidence_score": 0.95
}
Not a template variable—an actual example. The model matches patterns better when it sees a real instance.
Layer 4: Validation Rules
Make the model aware of what makes output fail:
VALIDATION RULES:
- If company_name is empty or null, the output is invalid
- If founded_year is outside 1800-2025, the output is invalid
- If headquarters_location does not contain both city and country, the output is invalid
- If revenue_usd is negative, the output is invalid
- If confidence_score is outside 0-1, the output is invalid
Return only valid outputs. If you cannot produce valid output, respond with:
{
"valid": false,
"reason": "[specific reason why output cannot be valid]"
}
This layer is critical. It gives the model an escape hatch—a way to say “I can’t do this reliably” instead of hallucinating.
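On the client side, the escape hatch is easy to check for before parsing further. The sketch below assumes the `{"valid": false, "reason": ...}` convention from the prompt above; the `handle_model_output` helper and its return convention are illustrative, not part of any SDK:

```python
import json

def handle_model_output(raw_text):
    """Parse a model response that may use the escape-hatch convention above."""
    data = json.loads(raw_text)
    if isinstance(data, dict) and data.get("valid") is False:
        # The model declined rather than hallucinating; surface the reason
        return None, data.get("reason", "unspecified")
    return data, None
```

A declined extraction then becomes a logged data-quality signal instead of a downstream parsing error.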
Complete Example: From Unstructured to Structured
Here’s a real scenario—extracting pricing tiers from a SaaS website.
Version 1: Unstructured Prompt
# Unstructured prompt
Extract pricing information from this text and return it as JSON.
Text:
"Our Pro plan costs $99 per month, includes unlimited users, and comes with email support. The Enterprise plan is custom pricing with dedicated support and SLA guarantees."
Result: 8 different output structures across 10 runs with Claude 3.5 Sonnet. Fields named inconsistently (“plan_name” vs “tier_name”), prices sometimes as strings, sometimes as numbers, support sometimes included, sometimes not.
Version 2: Structured Prompt with Four Layers
# Structured prompt with complete framework
TASK: Extract pricing tier information from the provided text.
SCHEMA:
- tier_name (string, required): The name of the pricing tier
- price_usd_monthly (number or null): Monthly price in USD
- billing_period (string): "month" or "year" only
- features (array of strings): List of included features
- support_level (string): "email", "priority", "dedicated", or "none"
- is_custom_pricing (boolean): true if price is not publicly listed
CONSTRAINTS:
- tier_name must be one of: "Free", "Starter", "Pro", "Enterprise", "Custom"
- price_usd_monthly must be non-negative if provided
- features array must contain 1-10 items max
- support_level must be exactly one of the four values listed
- If is_custom_pricing is true, price_usd_monthly must be null
OUTPUT TEMPLATE:
{
"tiers": [
{
"tier_name": "Pro",
"price_usd_monthly": 99,
"billing_period": "month",
"features": ["Unlimited users", "Email support"],
"support_level": "email",
"is_custom_pricing": false
}
]
}
VALIDATION RULES:
- tier_name must match the constraint list exactly
- Every tier must have a tier_name and support_level
- If a tier has is_custom_pricing: true, price_usd_monthly must be null
- If price_usd_monthly is provided, billing_period must also be provided
- If you cannot produce valid output, return: {"valid": false, "reason": "[specific reason]"}
Extract all tiers from the provided text:
"Our Pro plan costs $99 per month, includes unlimited users, and comes with email support. The Enterprise plan is custom pricing with dedicated support and SLA guarantees."
Result: 10 out of 10 runs returned an identical structure. One object per tier, no field variation, zero parsing errors.
That’s the difference. Not better AI. Better structure.
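The constraints in the structured prompt can also be mirrored client-side so that violations are caught before they hit your pipeline. This `validate_tier` helper is a hypothetical sketch of such a check, not part of the prompt itself:

```python
VALID_TIERS = {"Free", "Starter", "Pro", "Enterprise", "Custom"}
VALID_SUPPORT = {"email", "priority", "dedicated", "none"}

def validate_tier(tier):
    """Return a list of rule violations for one extracted tier dict."""
    errors = []
    if tier.get("tier_name") not in VALID_TIERS:
        errors.append("tier_name not in allowed list")
    if tier.get("support_level") not in VALID_SUPPORT:
        errors.append("support_level must be email/priority/dedicated/none")
    if tier.get("is_custom_pricing") and tier.get("price_usd_monthly") is not None:
        errors.append("custom pricing requires null price_usd_monthly")
    if tier.get("price_usd_monthly") is not None and not tier.get("billing_period"):
        errors.append("price_usd_monthly given without billing_period")
    return errors
```

An empty list means the tier passed every rule; a non-empty list tells you exactly which constraint the model broke.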
Technique: Schema-First Prompting
A refinement for complex extractions. Instead of describing data in English first, define the JSON schema at the top, then explain it:
# Schema-first approach
REQUIRED OUTPUT SCHEMA (follow exactly):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"transactions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"date": {"type": "string", "format": "date"},
"amount": {"type": "number", "minimum": 0},
"category": {
"type": "string",
"enum": ["income", "expense", "transfer"]
},
"description": {"type": "string", "maxLength": 200}
},
"required": ["date", "amount", "category"]
}
}
},
"required": ["transactions"]
}
EXPLANATION:
Extract all financial transactions from the provided statement.
Each transaction must have a date, amount, and category.
Use YYYY-MM-DD for all dates. Categories are: income, expense, or transfer.
This works particularly well with GPT-4o and Claude 3.5 Sonnet. Llama 3 70B handles it, but with slightly lower adherence (~89% vs. ~96%).
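A lightweight client-side check for this transaction schema needs only the standard library. The `check_transaction` helper below is an illustrative sketch that mirrors the schema's core constraints (date format, non-negative amount, category enum, description length):

```python
import re

CATEGORIES = {"income", "expense", "transfer"}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def check_transaction(tx):
    """Mirror the transaction schema's core constraints in plain Python."""
    return (
        bool(DATE_RE.match(tx.get("date", "")))
        and isinstance(tx.get("amount"), (int, float)) and tx["amount"] >= 0
        and tx.get("category") in CATEGORIES
        and len(tx.get("description", "")) <= 200
    )
```

For production use you'd likely validate with a full JSON Schema library instead, but a check like this catches the common failure modes with zero dependencies.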
Technique: Constraint-Driven Validation
Add a secondary validation step directly in the prompt. Ask the model to validate its own output:
# Validation layer example
Generate output following the schema above.
AFTER generating output, validate it against these rules:
1. Check: Is every transaction.date in YYYY-MM-DD format?
2. Check: Is every transaction.amount non-negative?
3. Check: Is every transaction.category in ["income", "expense", "transfer"]?
4. Check: Are all required fields present for each transaction?
If ALL checks pass, return the JSON exactly.
If ANY check fails, return:
{
"valid": false,
"reason": "[describe which check failed and why]",
"attempted_output": [your original output here]
}
This reduces downstream parsing errors by ~40%. The model catches its own mistakes before returning.
Technique: Progressive Disclosure for Complex Schemas
When extracting from long documents or complex structures, don’t ask for everything at once. Build output step-by-step:
# Progressive disclosure example
STEP 1: Identify all entities mentioned.
Return as JSON array with names only.
{
"entities": ["Apple Inc.", "Microsoft", ...]
}
---
STEP 2: For each entity, extract structured data.
Use this schema:
{
"entity": "[entity name from Step 1]",
"founded_year": [year or null],
"headquarters": "[city, country]",
"revenue": [number in millions or null]
}
Return as array of objects, one per entity.
This works because models are less likely to hallucinate fields when they’re extracting in isolation. Token overhead is higher (~15-20% more tokens per task), but accuracy jumps 12-18 percentage points on complex documents.
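The two-step flow can be wired up roughly as follows. This is a minimal sketch: `call_llm` is a placeholder for whatever client you use, and the prompts are simplified stand-ins for the fuller Step 1 and Step 2 prompts above:

```python
import json

def progressive_extract(text, call_llm):
    """Two-pass extraction. call_llm(prompt) -> raw JSON string (placeholder)."""
    # Step 1: a small, easy-to-validate target -- entity names only
    raw = call_llm(f'List all entities in this text as {{"entities": [...]}} JSON:\n{text}')
    entities = json.loads(raw)["entities"]
    # Step 2: one focused extraction per entity, limiting cross-entity hallucination
    results = []
    for name in entities:
        raw = call_llm(f"Extract founded_year, headquarters, revenue for {name} from:\n{text}")
        results.append(json.loads(raw))
    return results
```

Because `call_llm` is injected, each step can be tested with canned responses before any real API calls are made.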
When Structured Prompting Fails (And What To Do Instead)
Structured prompting doesn’t fix everything. Know the limits:
Failure Mode 1: Hallucination at Scale
If you’re extracting from 100+ documents and hallucination rates exceed ~5%, structure alone won’t save you.
Solution: Add grounding. Include reference material in the prompt:
REFERENCE: The only valid tier names are:
1. Free (from their pricing page, retrieved 2024)
2. Pro (from their pricing page, retrieved 2024)
3. Enterprise (from their pricing page, retrieved 2024)
Do not invent tier names. If a tier name is not in this reference list,
set is_unknown_tier: true and omit other fields.
This reduces hallucinated tier names from ~8% to ~1%.
Failure Mode 2: Constraint Violations on Edge Cases
If your data contains edge cases (missing fields, null values, format variations), models struggle:
Solution: Add explicit edge case handling:
EDGE CASES:
If price is listed as "contact sales" or "custom pricing":
- Set price_usd_monthly to null
- Set is_custom_pricing to true
- Do not guess at a price
If support level is not explicitly stated:
- Set support_level to "none"
- Add confidence_score: 0.5 to indicate uncertainty
If a field cannot be extracted, use null—never omit it.
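The same rules can be enforced again after the fact as a safety net. The `normalize_tier` helper below is a hypothetical post-processing guard that applies the edge-case rules above on the client side; it is not something the model runs:

```python
def normalize_tier(raw_tier):
    """Apply the edge-case rules above to one extracted tier (sketch)."""
    tier = dict(raw_tier)
    price = tier.get("price_usd_monthly")
    # "contact sales" / "custom pricing" must never be guessed into a number
    if isinstance(price, str):
        tier["price_usd_monthly"] = None
        tier["is_custom_pricing"] = True
    # Unstated support level defaults to "none" with reduced confidence
    if tier.get("support_level") is None:
        tier["support_level"] = "none"
        tier["confidence_score"] = 0.5
    return tier
```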
Failure Mode 3: Performance Degradation
Adding four layers of structure increases token consumption by ~30-40%.
Cost comparison (per 1,000 extractions):
| Approach | Tokens/Call | Cost (GPT-4o) | Success Rate |
|---|---|---|---|
| Unstructured | ~280 | $0.28 | ~67% |
| Structured | ~380 | $0.38 | ~94% |
| Schema-first | ~420 | $0.42 | ~96% |
The cost-per-successful-extraction actually favors structured prompting. Unstructured approach requires retries; structured approach rarely does.
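The cost-per-successful-extraction claim can be checked directly from the table, treating the success rate as the expected fraction of calls that need no retry:

```python
# (cost in USD per 1,000 calls, success rate) from the comparison table above
approaches = {
    "unstructured": (0.28, 0.67),
    "structured": (0.38, 0.94),
    "schema_first": (0.42, 0.96),
}

# Expected cost per 1,000 successful extractions = cost per 1,000 calls / success rate
for name, (cost_usd, success_rate) in approaches.items():
    print(f"{name}: ${cost_usd / success_rate:.3f} per 1,000 successes")
```

Structured prompting comes out cheaper per success than the unstructured baseline (~$0.40 vs. ~$0.42 per 1,000), despite the higher per-call token count.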
The Production Stack: Models and Tools
Not all models handle structured prompting equally.
Claude 3.5 Sonnet
Best overall. Adheres to schema constraints ~96% of the time and handles nested structures reliably. Edge case: it sometimes wraps the JSON in explanatory prose. Workaround: explicitly say "only return the JSON, nothing else."
GPT-4o (Latest)
Strong schema adherence (~93%), with roughly 2x the throughput of Claude 3.5 Sonnet in tokens/sec. Function calling integration is seamless. Limitation: it occasionally violates enum constraints (~4% of calls). Solution: add explicit enum validation in the prompt itself.
Mistral 7B
Runs locally on 16GB RAM. Schema adherence drops to ~78% without careful prompting. Worth it if you need local inference or cost is critical. Recommendation: use structured prompting + validation layer + smaller datasets.
Llama 3 70B
Middle ground. ~88% adherence to schema constraints. Faster inference than Claude on same hardware. Good for bulk extraction where some failures are acceptable (with retry logic).
Implementation: Building a Structured Extraction Pipeline
Step-by-step setup for production use:
Step 1: Define Your Schema
Write it in JSON Schema format first. This forces clarity:
// schema.json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"extracted_data": {
"type": "array",
"items": {
"type": "object",
"properties": {
"field_one": {"type": "string"},
"field_two": {"type": "number"},
"field_three": {"type": "string", "enum": ["value1", "value2"]}
},
"required": ["field_one", "field_three"]
}
}
},
"required": ["extracted_data"]
}
Step 2: Build the Prompt Template
Use the schema to generate the prompt sections automatically:
import json

def build_structured_prompt(schema, task_description, input_text):
    schema_str = json.dumps(schema, indent=2)
    prompt = f"""{task_description}

REQUIRED OUTPUT SCHEMA:
{schema_str}

OUTPUT TEMPLATE:
{json.dumps(generate_template_from_schema(schema), indent=2)}

VALIDATION RULES:
{generate_validation_rules(schema)}

INPUT TEXT:
{input_text}

Generate output following all constraints. If you cannot produce valid output, explain why."""
    return prompt

def generate_template_from_schema(schema):
    # Generate a minimal sample instance from a JSON Schema node.
    # Extend this to suit your schema structure (formats, defaults, etc.)
    if "enum" in schema:
        return schema["enum"][0]
    node_type = schema.get("type")
    if node_type == "object":
        return {name: generate_template_from_schema(sub)
                for name, sub in schema.get("properties", {}).items()}
    if node_type == "array":
        return [generate_template_from_schema(schema.get("items", {}))]
    if node_type == "number":
        return 0
    return "example"

def generate_validation_rules(schema):
    # Derive plain-English rules from required fields and enums.
    # Extend this to cover min/max, patterns, and nested objects as needed
    rules = [f"- {field} is required; output missing it is invalid"
             for field in schema.get("required", [])]
    for name, sub in schema.get("properties", {}).items():
        if "enum" in sub:
            rules.append(f'- {name} must be exactly one of: {", ".join(map(str, sub["enum"]))}')
    return "\n".join(rules)
Step 3: Call the API with Validation
import json
import anthropic
from jsonschema import validate, ValidationError

client = anthropic.Anthropic()

def extract_with_validation(prompt_text, schema):
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        messages=[
            {"role": "user", "content": prompt_text}
        ]
    )
    response_text = message.content[0].text
    try:
        # Extract the JSON object from the response text
        json_start = response_text.find('{')
        json_end = response_text.rfind('}') + 1
        json_str = response_text[json_start:json_end]
        output = json.loads(json_str)
        # Validate against the schema
        validate(instance=output, schema=schema)
        return {"valid": True, "data": output}
    except (json.JSONDecodeError, ValidationError) as e:
        return {"valid": False, "error": str(e), "raw_response": response_text}

# Usage (input_text is the document you are extracting from)
with open('schema.json') as f:
    schema = json.load(f)
prompt = build_structured_prompt(schema, "Extract company data", input_text)
result = extract_with_validation(prompt, schema)
if not result['valid']:
    print(f"Validation failed: {result['error']}")
else:
    print(json.dumps(result['data'], indent=2))
Step 4: Implement Retry Logic
Even structured prompting fails occasionally. Add exponential backoff:
import time

def extract_with_retries(prompt_text, schema, max_retries=3):
    for attempt in range(max_retries):
        result = extract_with_validation(prompt_text, schema)
        if result['valid']:
            return result['data']
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)
            # Optionally add a "be more careful" instruction to the retry prompt
            prompt_text += "\n\nPrevious attempt failed. Be extra careful about schema constraints."
    raise ValueError(f"Failed to produce valid output after {max_retries} attempts")
Structured Prompting vs. Function Calling: When to Use Each
Common question: should you use function calling instead?
Function calling: The API enforces output format at the system level. You define a function schema, and the model returns a structured call to that function. Anthropic and OpenAI both support this.
Structured prompting: You define format in the prompt. The model returns raw text (usually JSON) that you then parse.
| Aspect | Structured Prompting | Function Calling |
|---|---|---|
| Enforcement | Soft (model tries, but can fail) | Hard (API blocks invalid output) |
| Flexibility | High (can ask model to explain failures) | Low (model must return valid call) |
| Error handling | You handle parsing/validation | API handles format; you handle logic |
| Cost | Lower tokens (no format overhead) | Higher tokens (function def + schema) |
| Latency | Slightly lower (less validation) | Slightly higher (format enforcement) |
| When to use | Extraction with edge cases | Structured API calls, high-volume |
Recommendation: Use structured prompting for extraction tasks where failures are informative (you want to know why it failed). Use function calling for high-volume tasks where failures can be silently retried.
For AlgoVesta, we use structured prompting for market data extraction (failures tell us about data quality issues) and function calling for order processing (failures get retried automatically).
Your Action: Test Structured Prompting This Week
Pick one extraction task you're doing manually or with unstructured prompts. Spend 30 minutes:
- Write a JSON Schema for your output
- Build a prompt with the four layers: schema, constraints, template, validation
- Test it against 10 samples
- Measure accuracy (correct fields, correct types, correct constraints)
Run it against both your current approach and the structured version. Compare failure rates. The gap is usually 15-30 percentage points.
If you're using Claude, structured prompting works well out of the box. If you're using GPT-4o, lean on function calling—its enforcement is more reliable. If you're using Mistral or Llama, add the validation layer. Schema-first prompting works across all models.
That's the move. Not fancy. Not new. But it works.