# artemiskit stress

Test your LLM’s performance under load using scenario-based stress testing.
## Synopsis

```shell
artemiskit stress <scenario-file> [options]
akit stress <scenario-file> [options]
```

## Arguments
| Argument | Description |
|---|---|
| `scenario-file` | Path to the YAML scenario file |
## Options

| Option | Short | Description | Default |
|---|---|---|---|
| `--provider` | `-p` | LLM provider to use | From config/scenario |
| `--model` | `-m` | Model name | From config/scenario |
| `--concurrency` | `-c` | Number of concurrent requests | 10 |
| `--requests` | `-n` | Total number of requests to make | Based on duration |
| `--duration` | `-d` | Duration to run test (seconds) | 30 |
| `--ramp-up` | | Ramp-up time (seconds) | 5 |
| `--save` | | Save results to storage | false |
| `--output` | `-o` | Output directory for reports | artemis-output |
| `--verbose` | `-v` | Verbose output | false |
| `--config` | | Path to config file | artemis.config.yaml |
| `--redact` | | Enable PII/sensitive data redaction | false |
| `--redact-patterns` | | Custom redaction patterns (space-separated) | Default patterns |
| `--budget` | | Maximum budget in USD; fail if exceeded | None |
## Redaction Patterns

Built-in patterns:

- `email` — Email addresses
- `phone` — Phone numbers (various formats)
- `credit_card` — Credit card numbers
- `ssn` — Social Security Numbers
- `api_key` — API keys (common formats)
- `ipv4` — IPv4 addresses
- `jwt` — JWT tokens
- `aws_key` — AWS access keys
- `secrets` — Generic secrets (password=, secret=, etc.)
Custom regex patterns:

Pass regex patterns as plain strings (without delimiters). The `g` flag is added automatically.
```shell
# Mix built-in and custom patterns
akit stress scenario.yaml --redact --redact-patterns email "\b[A-Z]{2}\d{6}\b"
```
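To illustrate the idea, here is a minimal sketch of regex-based redaction. The email regex and the `[REDACTED]` placeholder are assumptions for illustration, not ArtemisKit’s actual built-in patterns; the second pattern is the custom one from the command above.

```python
import re

# Illustrative patterns: a simplified email regex plus the custom
# pattern from the example command. ArtemisKit's real built-ins may differ.
PATTERNS = [
    r"[\w.+-]+@[\w-]+\.[\w.]+",   # email (simplified)
    r"\b[A-Z]{2}\d{6}\b",         # custom: two letters + six digits
]

def redact(text: str) -> str:
    """Replace every match of every pattern with a placeholder."""
    for pattern in PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text)
    return text

print(redact("Contact alice@example.com, badge AB123456"))
# → Contact [REDACTED], badge [REDACTED]
```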
## Examples

### Basic Stress Test
Run a 30-second stress test with default concurrency:
```shell
akit stress scenarios/chatbot.yaml
```

### High Concurrency
Test with 50 concurrent requests:
```shell
akit stress scenarios/chatbot.yaml -c 50
```

### Fixed Request Count
Run exactly 500 requests:
```shell
akit stress scenarios/chatbot.yaml -n 500 -c 25
```

### Extended Duration
Run for 5 minutes:
```shell
akit stress scenarios/chatbot.yaml -d 300 --save
```

### Custom Ramp-Up
Gradually increase load over 30 seconds:
```shell
akit stress scenarios/chatbot.yaml -c 100 --ramp-up 30
```

### With Budget Limit
Fail if estimated cost exceeds a threshold:
```shell
# Fail if cost exceeds $5.00
akit stress scenarios/chatbot.yaml --budget 5.00
```

This is useful in CI/CD pipelines to prevent runaway costs during load testing.
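As a sketch of CI usage, a pipeline step could gate on the command’s exit status. The fragment below is a hypothetical GitHub Actions step; the step name is invented, and it assumes `akit` exits nonzero when the budget is exceeded.

```yaml
# Hypothetical GitHub Actions step; the nonzero-exit-on-budget-overrun
# behavior is an assumption based on the description above.
- name: Load test with cost guard
  run: |
    akit stress scenarios/chatbot.yaml \
      --budget 5.00 \
      --duration 60 \
      --save
  # A nonzero exit (budget exceeded or request failures) fails the job.
```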
## Metrics

The stress test measures:
| Metric | Description |
|---|---|
| Throughput | Requests per second |
| Avg Latency | Average response time |
| P50 Latency | 50th percentile (median) |
| P90 Latency | 90th percentile |
| P95 Latency | 95th percentile |
| P99 Latency | 99th percentile |
| Min/Max Latency | Latency range |
| Success Rate | Percentage of successful requests |
| Error Rate | Failures and rate limiting |
| Total Tokens | Total tokens consumed (prompt + completion) |
| Avg Tokens/Request | Average token usage per request |
| Estimated Cost | Cost estimation based on model pricing |
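The percentile metrics above can be computed from raw latency samples. Here is a minimal nearest-rank sketch, which is one common estimator but not necessarily the one ArtemisKit uses:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value such that
    at least p% of the samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

latencies_ms = [89, 120, 245, 250, 260, 290, 310, 400, 512, 1245]
print(percentile(latencies_ms, 50))  # → 260
print(percentile(latencies_ms, 95))  # → 1245
```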
## Example Output

```
Stress Test: chatbot
Provider: openai (gpt-4o-mini)
Duration: 30s | Concurrency: 50 | Ramp-up: 5s

Progress: [████████████████████] 100%

┌─ STRESS TEST SUMMARY ─────────────────────────────┐
│ Total Requests:   523                             │
│ Successful:       513 (98.1%)                     │
│ Failed:           10 (1.9%)                       │
│ Duration:         30.2s                           │
│ Throughput:       17.3 req/s                      │
├───────────────────────────────────────────────────┤
│ LATENCY                                           │
│ Avg: 289ms | P50: 245ms | P95: 512ms | P99: 892ms │
│ Min: 89ms  | Max: 1,245ms                         │
├───────────────────────────────────────────────────┤
│ TOKEN USAGE                                       │
│ Total Tokens:      125,000                        │
│ Prompt Tokens:     75,000                         │
│ Completion Tokens: 50,000                         │
│ Avg per Request:   1,250                          │
├───────────────────────────────────────────────────┤
│ COST ESTIMATION                                   │
│ Estimated Total: $2.50                            │
│ Model: gpt-4o-mini @ $0.15/$0.60 per 1M tokens    │
└───────────────────────────────────────────────────┘
```

## Cost Estimation
ArtemisKit estimates API costs based on token usage and model pricing data.
### How It Works

1. Token Tracking: Each request’s prompt and completion tokens are tracked
2. Model Lookup: Pricing data is retrieved for the model used
3. Calculation: Cost = (prompt_tokens × prompt_rate) + (completion_tokens × completion_rate)
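The calculation step can be sketched directly from the per-1K rates in this page’s sample pricing tables; the rate dictionary below copies two of those rates, and the fallback pair is the documented default for unknown models.

```python
# Per-1K-token rates (prompt, completion) from the sample pricing tables.
RATES = {
    "gpt-4o-mini": (0.00015, 0.0006),
    "claude-haiku-4.5": (0.001, 0.005),
}
DEFAULT_RATES = (0.003, 0.015)  # conservative fallback for unknown models

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost = (prompt_tokens x prompt_rate) + (completion_tokens x completion_rate),
    with rates expressed per 1K tokens."""
    prompt_rate, completion_rate = RATES.get(model, DEFAULT_RATES)
    return (prompt_tokens / 1000) * prompt_rate + (completion_tokens / 1000) * completion_rate

# 75,000 prompt + 50,000 completion tokens on gpt-4o-mini:
print(f"${estimate_cost('gpt-4o-mini', 75_000, 50_000):.5f}")  # → $0.04125
```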
### Supported Models

| Provider | Models | Pricing Source |
|---|---|---|
| OpenAI | GPT-5, GPT-5.1, GPT-5.2, GPT-4.1, GPT-4o, o1, o3, o3-mini | OpenAI API pricing |
| Anthropic | Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, Claude 4, Claude 3.7 | Anthropic API pricing |
| Azure | Same as OpenAI | Azure OpenAI pricing |
### Sample Pricing (per 1K tokens)

OpenAI Models:
| Model | Prompt | Completion | Notes |
|---|---|---|---|
| gpt-5 | $0.00125 | $0.01 | 400K context |
| gpt-5-mini | $0.00025 | $0.002 | |
| gpt-4.1 | $0.002 | $0.008 | 1M context |
| gpt-4o | $0.0025 | $0.01 | 128K context |
| gpt-4o-mini | $0.00015 | $0.0006 | 128K context |
| o3 | $0.002 | $0.008 | Reasoning model |
| o3-mini | $0.0011 | $0.0044 | |
| o1 | $0.015 | $0.06 | Reasoning model |
Anthropic Models:
| Model | Prompt | Completion | Notes |
|---|---|---|---|
| claude-opus-4.5 | $0.005 | $0.025 | Most capable |
| claude-sonnet-4.5 | $0.003 | $0.015 | Balanced |
| claude-haiku-4.5 | $0.001 | $0.005 | Fastest |
| claude-opus-4 | $0.015 | $0.075 | |
| claude-sonnet-4 | $0.003 | $0.015 | |
| claude-sonnet-3.7 | $0.003 | $0.015 | |
### Unknown Models

For models not in the pricing database, ArtemisKit uses conservative default estimates:
- Prompt: $0.003 per 1K tokens
- Completion: $0.015 per 1K tokens
### Cost Report Example

```
│ COST ESTIMATION                                │
│ Estimated Total: $2.50                         │
│ Prompt Cost: $0.75 (75,000 tokens)             │
│ Completion Cost: $1.75 (50,000 tokens)         │
│ Model: gpt-4o-mini @ $0.15/$0.60 per 1M tokens │
```

### Accuracy Notes

- Estimates are based on publicly available pricing data
- Actual costs may vary based on:
  - Enterprise agreements
  - Batch discounts
  - Regional pricing differences
- Always verify with your provider’s billing dashboard for exact costs
## Rate Limiting

When testing, be aware of provider rate limits:
- OpenAI: Varies by tier
- Anthropic: Varies by tier
- Azure: Depends on deployment
Use `--concurrency` and `--ramp-up` to avoid hitting limits too quickly.
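One plausible way a linear ramp-up schedule could work is sketched below. This is illustrative only; the function name and the linear shape are assumptions, since ArtemisKit’s actual scheduler is not specified here.

```python
import math

def allowed_concurrency(elapsed_s: float, ramp_up_s: float, target: int) -> int:
    """Scale concurrency linearly from 1 worker up to the target
    over the ramp-up window, then hold at the target."""
    if elapsed_s >= ramp_up_s:
        return target
    return max(1, math.ceil(target * elapsed_s / ramp_up_s))

# With -c 100 --ramp-up 30:
for t in (0, 15, 30, 45):
    print(t, allowed_concurrency(t, ramp_up_s=30, target=100))
```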
## See Also

- Scenario Format
- Run Command
- Red Team Command — Security testing with automated attacks
- CI/CD Integration