# artemiskit stress

Test your LLM's performance under load using scenario-based stress testing.
## Synopsis

```shell
artemiskit stress <scenario-file> [options]
akit stress <scenario-file> [options]
```

## Arguments

| Argument | Description |
|---|---|
| `scenario-file` | Path to the YAML scenario file |
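This page does not document the scenario file's schema. A minimal hypothetical sketch of what one might contain — every field name here is an illustrative assumption, not a documented schema; consult the scenario-file documentation for the real format:

```yaml
# Hypothetical scenario sketch: field names (name, provider, model, prompts)
# are illustrative assumptions, not a documented schema.
name: chatbot
provider: openai
model: gpt-4o-mini
prompts:
  - "Summarize the following support ticket: ..."
  - "Draft a polite reply to an angry customer."
```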
## Options

| Option | Short | Description | Default |
|---|---|---|---|
| `--provider` | `-p` | LLM provider to use | From config/scenario |
| `--model` | `-m` | Model name | From config/scenario |
| `--concurrency` | `-c` | Number of concurrent requests | `10` |
| `--requests` | `-n` | Total number of requests to make | Based on duration |
| `--duration` | `-d` | Duration to run the test (seconds) | `30` |
| `--ramp-up` | | Ramp-up time (seconds) | `5` |
| `--save` | | Save results to storage | `false` |
| `--output` | `-o` | Output directory for reports | `artemis-output` |
| `--verbose` | `-v` | Verbose output | `false` |
| `--config` | | Path to config file | `artemis.config.yaml` |
| `--redact` | | Enable PII/sensitive data redaction | `false` |
| `--redact-patterns` | | Custom redaction patterns (regex or built-in) | Default patterns |
## Redaction Patterns

Built-in patterns: `email`, `phone`, `credit_card`, `ssn`, `api_key`
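The exact regexes and replacement tokens the tool uses are internal details. As a rough, standalone sketch of what the built-in `email` pattern does conceptually — the regex and the `[REDACTED:email]` token below are assumptions for illustration, not the tool's actual behavior:

```shell
# Illustrative only: mask email addresses roughly the way a built-in
# "email" pattern would; the real pattern and token may differ.
echo "Contact alice@example.com for access" |
  sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[REDACTED:email]/g'
# → Contact [REDACTED:email] for access
```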
## Examples

### Basic Stress Test

Run a 30-second stress test with default concurrency:

```shell
akit stress scenarios/chatbot.yaml
```

### High Concurrency

Test with 50 concurrent requests:

```shell
akit stress scenarios/chatbot.yaml -c 50
```

### Fixed Request Count

Run exactly 500 requests:

```shell
akit stress scenarios/chatbot.yaml -n 500 -c 25
```

### Extended Duration

Run for 5 minutes and save the results:

```shell
akit stress scenarios/chatbot.yaml -d 300 --save
```

### Custom Ramp-Up

Gradually increase load over 30 seconds:

```shell
akit stress scenarios/chatbot.yaml -c 100 --ramp-up 30
```

## Metrics
The stress test measures:
| Metric | Description |
|---|---|
| Throughput | Requests per second |
| Avg Latency | Average response time |
| P50 Latency | 50th percentile (median) |
| P90 Latency | 90th percentile |
| P95 Latency | 95th percentile |
| P99 Latency | 99th percentile |
| Min/Max Latency | Latency range |
| Success Rate | Percentage of successful requests |
| Error Rate | Failures and rate limiting |
| Total Tokens | Total tokens consumed (prompt + completion) |
| Avg Tokens/Request | Average token usage per request |
| Estimated Cost | Cost estimation based on model pricing |
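The percentile latencies are order statistics over the recorded per-request response times. As a rough, standalone illustration — using the common nearest-rank method, not necessarily the tool's internal calculation — they can be computed from a file of latencies with standard shell tools:

```shell
# Illustrative only: nearest-rank percentiles over sample latencies (ms).
# Writes one latency per line, sorts ascending, then picks index ceil(p/100 * n).
printf '%s\n' 120 89 245 512 310 198 892 150 260 230 > latencies.txt

sort -n latencies.txt | awk '
  { a[NR] = $1 }
  END {
    n = NR
    split("50 95 99", ps, " ")
    for (i = 1; i <= 3; i++) {
      p = ps[i]
      idx = int((p / 100) * n + 0.9999)   # nearest-rank: ceil(p/100 * n)
      if (idx < 1) idx = 1
      printf "P%d: %dms\n", p, a[idx]
    }
  }' | tee percentiles.txt
# → P50: 230ms
# → P95: 892ms
# → P99: 892ms
```

With only 10 samples, P95 and P99 both fall on the slowest request — high percentiles are only meaningful once the request count is large, which is why longer runs give more trustworthy tail-latency numbers.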
## Example Output

```text
Stress Test: chatbot
Provider: openai (gpt-4o-mini)
Duration: 30s | Concurrency: 50 | Ramp-up: 5s

Progress: [████████████████████] 100%

┌─ STRESS TEST SUMMARY ─────────────────────────────┐
│ Total Requests:     523                           │
│ Successful:         513 (98.1%)                   │
│ Failed:             10 (1.9%)                     │
│ Duration:           30.2s                         │
│ Throughput:         17.3 req/s                    │
├───────────────────────────────────────────────────┤
│ LATENCY                                           │
│ Avg: 289ms | P50: 245ms | P95: 512ms | P99: 892ms │
│ Min: 89ms | Max: 1,245ms                          │
├───────────────────────────────────────────────────┤
│ TOKEN USAGE                                       │
│ Total Tokens:       125,000                       │
│ Prompt Tokens:      75,000                        │
│ Completion Tokens:  50,000                        │
│ Avg per Request:    1,250                         │
├───────────────────────────────────────────────────┤
│ COST ESTIMATION                                   │
│ Estimated Total:    $2.50                         │
│ Model: gpt-4o-mini @ $0.15/$0.60 per 1M tokens    │
└───────────────────────────────────────────────────┘
```

## Rate Limiting

When testing, be aware of provider rate limits:
- OpenAI: Varies by tier
- Anthropic: Varies by tier
- Azure: Depends on deployment
Use `--concurrency` and `--ramp-up` to avoid hitting limits too quickly.