
artemiskit stress

Test your LLM’s performance under load using scenario-based stress testing.

```sh
artemiskit stress <scenario-file> [options]
akit stress <scenario-file> [options]
```

`akit` is a shorthand alias for `artemiskit`; the two forms are interchangeable.
| Argument | Description |
| --- | --- |
| `scenario-file` | Path to the YAML scenario file |
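
A scenario file describes what each request sends and which model handles it. The exact schema isn't shown on this page, so the sketch below is illustrative only — every field name is an assumption:

```yaml
# scenarios/chatbot.yaml — illustrative sketch; field names are assumptions,
# not a confirmed artemiskit schema
name: chatbot
provider: openai       # overridable with --provider
model: gpt-4o-mini     # overridable with --model
prompt: "What are your support hours?"
```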
| Option | Short | Description | Default |
| --- | --- | --- | --- |
| `--provider` | `-p` | LLM provider to use | From config/scenario |
| `--model` | `-m` | Model name | From config/scenario |
| `--concurrency` | `-c` | Number of concurrent requests | `10` |
| `--requests` | `-n` | Total number of requests to make | Based on duration |
| `--duration` | `-d` | Duration to run test (seconds) | `30` |
| `--ramp-up` | | Ramp-up time (seconds) | `5` |
| `--save` | | Save results to storage | `false` |
| `--output` | `-o` | Output directory for reports | `artemis-output` |
| `--verbose` | `-v` | Verbose output | `false` |
| `--config` | | Path to config file | `artemis.config.yaml` |
| `--redact` | | Enable PII/sensitive data redaction | `false` |
| `--redact-patterns` | | Custom redaction patterns (regex or built-in) | Default patterns |

Built-in patterns: `email`, `phone`, `credit_card`, `ssn`, `api_key`
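
For example, to redact emails and API keys using the built-in patterns — note that the comma-separated list syntax shown here is an assumption, not confirmed by this page:

```sh
# The comma-separated pattern list is an assumed syntax
akit stress scenarios/chatbot.yaml --redact --redact-patterns email,api_key
```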

Run a 30-second stress test with default concurrency:

```sh
akit stress scenarios/chatbot.yaml
```

Test with 50 concurrent requests:

```sh
akit stress scenarios/chatbot.yaml -c 50
```

Run exactly 500 requests:

```sh
akit stress scenarios/chatbot.yaml -n 500 -c 25
```

Run for 5 minutes and save the results:

```sh
akit stress scenarios/chatbot.yaml -d 300 --save
```

Ramp up to 100 concurrent requests over 30 seconds:

```sh
akit stress scenarios/chatbot.yaml -c 100 --ramp-up 30
```

The stress test measures:

| Metric | Description |
| --- | --- |
| Throughput | Requests per second |
| Avg Latency | Average response time |
| P50 Latency | 50th percentile (median) |
| P90 Latency | 90th percentile |
| P95 Latency | 95th percentile |
| P99 Latency | 99th percentile |
| Min/Max Latency | Latency range |
| Success Rate | Percentage of successful requests |
| Error Rate | Failures and rate limiting |
| Total Tokens | Total tokens consumed (prompt + completion) |
| Avg Tokens/Request | Average token usage per request |
| Estimated Cost | Cost estimation based on model pricing |
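
The cost estimate follows directly from token counts and the model's per-million-token rates. For the sample run below, gpt-4o-mini at $0.15 input / $0.60 output per 1M tokens gives:

```
75,000 prompt × $0.15/1M + 50,000 completion × $0.60/1M
  ≈ $0.011 + $0.030 ≈ $0.04
```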
Sample output:

```
Stress Test: chatbot
Provider: openai (gpt-4o-mini)
Duration: 30s | Concurrency: 50 | Ramp-up: 5s

Progress: [████████████████████] 100%

┌─ STRESS TEST SUMMARY ────────────────────────────┐
│ Total Requests:    523                           │
│ Successful:        513 (98.1%)                   │
│ Failed:            10 (1.9%)                     │
│ Duration:          30.2s                         │
│ Throughput:        17.3 req/s                    │
├──────────────────────────────────────────────────┤
│ LATENCY                                          │
│ Avg: 289ms | P50: 245ms | P95: 512ms | P99: 892ms│
│ Min: 89ms | Max: 1,245ms                         │
├──────────────────────────────────────────────────┤
│ TOKEN USAGE                                      │
│ Total Tokens:      125,000                       │
│ Prompt Tokens:     75,000                        │
│ Completion Tokens: 50,000                        │
│ Avg per Request:   1,250                         │
├──────────────────────────────────────────────────┤
│ COST ESTIMATION                                  │
│ Estimated Total:   $0.04                         │
│ Model: gpt-4o-mini @ $0.15/$0.60 per 1M tokens   │
└──────────────────────────────────────────────────┘
```

When testing, be aware of provider rate limits:

- OpenAI: varies by tier
- Anthropic: varies by tier
- Azure: depends on deployment

Use `--concurrency` and `--ramp-up` to avoid hitting limits too quickly.
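
For example, start with low concurrency and a slow ramp, then scale up once you know your tier's headroom:

```sh
# Conservative first run: 5 concurrent workers, ramped up over 15 seconds
akit stress scenarios/chatbot.yaml -c 5 --ramp-up 15
```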