
artemiskit stress

Test your LLM’s performance under load using scenario-based stress testing.

```sh
artemiskit stress <scenario-file> [options]
akit stress <scenario-file> [options]
```

`akit` is a shorthand alias for `artemiskit`; the two forms are interchangeable.
| Argument | Description |
| --- | --- |
| `scenario-file` | Path to the YAML scenario file |
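
A scenario file describes what each request sends and which model handles it. The exact schema isn't shown on this page, so the sketch below is illustrative only — every field name is an assumption:

```yaml
# scenarios/chatbot.yaml — illustrative sketch; field names are assumptions,
# not a confirmed artemiskit schema
name: chatbot
provider: openai       # overridable with --provider
model: gpt-4o-mini     # overridable with --model
prompt: "What are your support hours?"
```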
| Option | Short | Description | Default |
| --- | --- | --- | --- |
| `--provider` | `-p` | LLM provider to use | From config/scenario |
| `--model` | `-m` | Model name | From config/scenario |
| `--concurrency` | `-c` | Number of concurrent requests | `10` |
| `--requests` | `-n` | Total number of requests to make | Based on duration |
| `--duration` | `-d` | Duration to run test (seconds) | `30` |
| `--ramp-up` | | Ramp-up time (seconds) | `5` |
| `--save` | | Save results to storage | `false` |
| `--output` | `-o` | Output directory for reports | `artemis-output` |
| `--verbose` | `-v` | Verbose output | `false` |
| `--config` | | Path to config file | `artemis.config.yaml` |
| `--redact` | | Enable PII/sensitive data redaction | `false` |
| `--redact-patterns` | | Custom redaction patterns (regex or built-in) | Default patterns |

Built-in patterns: `email`, `phone`, `credit_card`, `ssn`, `api_key`
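
For example, to redact emails and API keys using the built-in patterns — note that the comma-separated list syntax shown here is an assumption, not confirmed by this page:

```sh
# The comma-separated pattern list is an assumed syntax
akit stress scenarios/chatbot.yaml --redact --redact-patterns email,api_key
```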

Run a 30-second stress test with default concurrency:

```sh
akit stress scenarios/chatbot.yaml
```

Test with 50 concurrent requests:

```sh
akit stress scenarios/chatbot.yaml -c 50
```

Run exactly 500 requests:

```sh
akit stress scenarios/chatbot.yaml -n 500 -c 25
```

Run for 5 minutes and save the results:

```sh
akit stress scenarios/chatbot.yaml -d 300 --save
```

Ramp up to 100 concurrent requests over 30 seconds:

```sh
akit stress scenarios/chatbot.yaml -c 100 --ramp-up 30
```

The stress test measures:

| Metric | Description |
| --- | --- |
| Throughput | Requests per second |
| Avg Latency | Average response time |
| P50 Latency | 50th percentile (median) |
| P90 Latency | 90th percentile |
| P95 Latency | 95th percentile |
| P99 Latency | 99th percentile |
| Min/Max Latency | Latency range |
| Success Rate | Percentage of successful requests |
| Error Rate | Failures and rate limiting |
| Total Tokens | Total tokens consumed (prompt + completion) |
| Avg Tokens/Request | Average token usage per request |
| Estimated Cost | Cost estimation based on model pricing |
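
The cost estimate follows directly from token counts and the model's per-million-token rates. For the sample run below, gpt-4o-mini at $0.15 input / $0.60 output per 1M tokens gives:

```
75,000 prompt × $0.15/1M + 50,000 completion × $0.60/1M
  ≈ $0.011 + $0.030 ≈ $0.04
```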
Sample output:

```
Stress Test: chatbot
Provider: openai (gpt-4o-mini)
Duration: 30s | Concurrency: 50 | Ramp-up: 5s

Progress: [████████████████████] 100%

┌─ STRESS TEST SUMMARY ────────────────────────────┐
│ Total Requests:    523                           │
│ Successful:        513 (98.1%)                   │
│ Failed:            10 (1.9%)                     │
│ Duration:          30.2s                         │
│ Throughput:        17.3 req/s                    │
├──────────────────────────────────────────────────┤
│ LATENCY                                          │
│ Avg: 289ms | P50: 245ms | P95: 512ms | P99: 892ms│
│ Min: 89ms | Max: 1,245ms                         │
├──────────────────────────────────────────────────┤
│ TOKEN USAGE                                      │
│ Total Tokens:      125,000                       │
│ Prompt Tokens:     75,000                        │
│ Completion Tokens: 50,000                        │
│ Avg per Request:   1,250                         │
├──────────────────────────────────────────────────┤
│ COST ESTIMATION                                  │
│ Estimated Total:   $0.04                         │
│ Model: gpt-4o-mini @ $0.15/$0.60 per 1M tokens   │
└──────────────────────────────────────────────────┘
```

When testing, be aware of provider rate limits:

- OpenAI: varies by tier
- Anthropic: varies by tier
- Azure: depends on deployment

Use `--concurrency` and `--ramp-up` to avoid hitting limits too quickly.
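
For example, start with low concurrency and a slow ramp, then scale up once you know your tier's headroom:

```sh
# Conservative first run: 5 concurrent workers, ramped up over 15 seconds
akit stress scenarios/chatbot.yaml -c 5 --ramp-up 15
```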