
# artemiskit stress

Test your LLM’s performance under load using scenario-based stress testing.

```sh
artemiskit stress <scenario-file> [options]
akit stress <scenario-file> [options]
```
| Argument | Description |
| --- | --- |
| `scenario-file` | Path to the YAML scenario file |
| Option | Short | Description | Default |
| --- | --- | --- | --- |
| `--provider` | `-p` | LLM provider to use | From config/scenario |
| `--model` | `-m` | Model name | From config/scenario |
| `--concurrency` | `-c` | Number of concurrent requests | `10` |
| `--requests` | `-n` | Total number of requests to make | Based on duration |
| `--duration` | `-d` | Duration to run the test (seconds) | `30` |
| `--ramp-up` | | Ramp-up time (seconds) | `5` |
| `--save` | | Save results to storage | `false` |
| `--output` | `-o` | Output directory for reports | `artemis-output` |
| `--verbose` | `-v` | Verbose output | `false` |
| `--config` | | Path to config file | `artemis.config.yaml` |
| `--redact` | | Enable PII/sensitive-data redaction | `false` |
| `--redact-patterns` | | Custom redaction patterns (space-separated) | Default patterns |
| `--budget` | | Maximum budget in USD; fail if exceeded | None |

Built-in patterns:

- `email` — Email addresses
- `phone` — Phone numbers (various formats)
- `credit_card` — Credit card numbers
- `ssn` — Social Security Numbers
- `api_key` — API keys (common formats)
- `ipv4` — IPv4 addresses
- `jwt` — JWT tokens
- `aws_key` — AWS access keys
- `secrets` — Generic secrets (`password=`, `secret=`, etc.)

Custom regex patterns:

Pass regex patterns as plain strings (without delimiters); the `g` flag is added automatically.

```sh
# Mix built-in and custom patterns
akit stress scenario.yaml --redact --redact-patterns email "\b[A-Z]{2}\d{6}\b"
```
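To illustrate how built-in pattern names and raw regex strings can be mixed, here is a minimal redaction sketch. The function name and the regexes are illustrative assumptions, not ArtemisKit's actual implementation; only the `email` pattern name comes from the list above.

```python
import re

# Illustrative regexes -- not ArtemisKit's actual built-in patterns.
BUILTIN_PATTERNS = {
    "email": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}

def redact(text: str, patterns: list[str]) -> str:
    """Replace matches of each pattern (built-in name or raw regex) with a placeholder."""
    for p in patterns:
        regex = BUILTIN_PATTERNS.get(p, p)  # fall back to treating p as a raw regex
        text = re.sub(regex, "[REDACTED]", text)
    return text

# Mix a built-in name with a custom pattern, mirroring the CLI example above
print(redact("Contact bob@example.com, ref AB123456",
             ["email", r"\b[A-Z]{2}\d{6}\b"]))
# → Contact [REDACTED], ref [REDACTED]
```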

Run a 30-second stress test with default concurrency:

```sh
akit stress scenarios/chatbot.yaml
```

Test with 50 concurrent requests:

```sh
akit stress scenarios/chatbot.yaml -c 50
```

Run exactly 500 requests:

```sh
akit stress scenarios/chatbot.yaml -n 500 -c 25
```

Run for 5 minutes:

```sh
akit stress scenarios/chatbot.yaml -d 300 --save
```

Gradually increase load over 30 seconds:

```sh
akit stress scenarios/chatbot.yaml -c 100 --ramp-up 30
```
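To make the effect concrete, here is a sketch of a linear ramp: the share of target concurrency that is active grows with elapsed time. This is an assumption for illustration; ArtemisKit's actual ramp curve may differ.

```python
def active_workers(target: int, ramp_up: float, elapsed: float) -> int:
    """Linear ramp: how many of `target` workers are live at `elapsed` seconds."""
    if elapsed >= ramp_up:
        return target
    return max(1, int(target * elapsed / ramp_up))

# -c 100 --ramp-up 30: load builds gradually instead of all at once
for t in (0, 10, 20, 30):
    print(f"{t}s: {active_workers(100, 30, t)} workers")
# 0s: 1, 10s: 33, 20s: 66, 30s: 100
```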

Fail if estimated cost exceeds a threshold:

```sh
# Fail if cost exceeds $5.00
akit stress scenarios/chatbot.yaml --budget 5.00
```

This is useful in CI/CD pipelines to prevent runaway costs during load testing.

The stress test measures:

| Metric | Description |
| --- | --- |
| Throughput | Requests per second |
| Avg Latency | Average response time |
| P50 Latency | 50th percentile (median) |
| P90 Latency | 90th percentile |
| P95 Latency | 95th percentile |
| P99 Latency | 99th percentile |
| Min/Max Latency | Latency range |
| Success Rate | Percentage of successful requests |
| Error Rate | Failures and rate limiting |
| Total Tokens | Total tokens consumed (prompt + completion) |
| Avg Tokens/Request | Average token usage per request |
| Estimated Cost | Cost estimation based on model pricing |
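The percentile metrics can be computed with a nearest-rank rule over the collected latency samples. A minimal sketch (one common convention, not necessarily ArtemisKit's internal implementation):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of data at or below it."""
    s = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(s)))
    return s[k - 1]

latencies = [89, 120, 200, 245, 260, 310, 512, 640, 892, 1245]  # ms
print("P50:", percentile(latencies, 50))  # P50: 260
print("P95:", percentile(latencies, 95))  # P95: 1245
```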
Example output:

```text
Stress Test: chatbot
Provider: openai (gpt-4o-mini)
Duration: 30s | Concurrency: 50 | Ramp-up: 5s

Progress: [████████████████████] 100%

┌─ STRESS TEST SUMMARY ─────────────────────────────┐
│ Total Requests: 523                               │
│ Successful: 513 (98.1%)                           │
│ Failed: 10 (1.9%)                                 │
│ Duration: 30.2s                                   │
│ Throughput: 17.3 req/s                            │
├───────────────────────────────────────────────────┤
│ LATENCY                                           │
│ Avg: 289ms | P50: 245ms | P95: 512ms | P99: 892ms │
│ Min: 89ms | Max: 1,245ms                          │
├───────────────────────────────────────────────────┤
│ TOKEN USAGE                                       │
│ Total Tokens: 125,000                             │
│ Prompt Tokens: 75,000                             │
│ Completion Tokens: 50,000                         │
│ Avg per Request: 1,250                            │
├───────────────────────────────────────────────────┤
│ COST ESTIMATION                                   │
│ Estimated Total: $2.50                            │
│ Model: gpt-4o-mini @ $0.15/$0.60 per 1M tokens    │
└───────────────────────────────────────────────────┘
```

ArtemisKit estimates API costs based on token usage and model pricing data.

  1. Token Tracking: Each request’s prompt and completion tokens are tracked
  2. Model Lookup: Pricing data is retrieved for the model used
  3. Calculation: Cost = (prompt_tokens × prompt_rate) + (completion_tokens × completion_rate)
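The three steps above can be sketched as follows, using the documented fallback rates for models missing from the pricing database ($0.003 prompt / $0.015 completion per 1K tokens); the function name is illustrative:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_rate: float = 0.003,
                  completion_rate: float = 0.015) -> float:
    """Cost = (prompt_tokens x prompt_rate) + (completion_tokens x completion_rate).

    Rates are USD per 1K tokens; defaults are the documented fallback rates.
    """
    return (prompt_tokens / 1000) * prompt_rate + (completion_tokens / 1000) * completion_rate

# 100K prompt + 20K completion tokens at the fallback rates
print(f"${estimate_cost(100_000, 20_000):.2f}")  # $0.60
```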
| Provider | Models | Pricing Source |
| --- | --- | --- |
| OpenAI | GPT-5, GPT-5.1, GPT-5.2, GPT-4.1, GPT-4o, o1, o3, o3-mini | OpenAI API pricing |
| Anthropic | Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, Claude 4, Claude 3.7 | Anthropic API pricing |
| Azure | Same as OpenAI | Azure OpenAI pricing |

OpenAI Models:

| Model | Prompt (per 1K tokens) | Completion (per 1K tokens) | Notes |
| --- | --- | --- | --- |
| `gpt-5` | $0.00125 | $0.01 | 400K context |
| `gpt-5-mini` | $0.00025 | $0.002 | |
| `gpt-4.1` | $0.002 | $0.008 | 1M context |
| `gpt-4o` | $0.0025 | $0.01 | 128K context |
| `gpt-4o-mini` | $0.00015 | $0.0006 | 128K context |
| `o3` | $0.002 | $0.008 | Reasoning model |
| `o3-mini` | $0.0011 | $0.0044 | |
| `o1` | $0.015 | $0.06 | Reasoning model |

Anthropic Models:

| Model | Prompt (per 1K tokens) | Completion (per 1K tokens) | Notes |
| --- | --- | --- | --- |
| `claude-opus-4.5` | $0.005 | $0.025 | Most capable |
| `claude-sonnet-4.5` | $0.003 | $0.015 | Balanced |
| `claude-haiku-4.5` | $0.001 | $0.005 | Fastest |
| `claude-opus-4` | $0.015 | $0.075 | |
| `claude-sonnet-4` | $0.003 | $0.015 | |
| `claude-sonnet-3.7` | $0.003 | $0.015 | |

For models not in the pricing database, ArtemisKit uses conservative default estimates:

- Prompt: $0.003 per 1K tokens
- Completion: $0.015 per 1K tokens
The cost section of the summary breaks the estimate down by prompt and completion usage:

```text
│ COST ESTIMATION                                   │
│ Estimated Total: $2.50                            │
│ Prompt Cost: $0.75 (75,000 tokens)                │
│ Completion Cost: $1.75 (50,000 tokens)            │
│ Model: gpt-4o-mini @ $0.15/$0.60 per 1M tokens    │
```
- Estimates are based on publicly available pricing data
- Actual costs may vary due to:
  - Enterprise agreements
  - Batch discounts
  - Regional pricing differences
- Always verify exact costs against your provider’s billing dashboard

When testing, be aware of provider rate limits:

- OpenAI: Varies by tier
- Anthropic: Varies by tier
- Azure: Depends on deployment

Use `--concurrency` and `--ramp-up` to avoid hitting limits too quickly.