# Stress Testing

Know your latency limits before users find them. ArtemisKit stress testing measures throughput, latency percentiles, and cost under load.
## Overview

Stress testing reveals:
- Throughput — Requests per second your system can handle
- Latency percentiles — p50, p90, p95, p99 response times
- Failure rate — How the system degrades under load
- Token usage — Consumption patterns at scale
- Cost estimates — API costs for production traffic
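Latency percentiles summarize the distribution of individual request times rather than a single average. As a rough sketch of the idea (illustrative only, not ArtemisKit's internal implementation), a nearest-rank percentile over sampled latencies looks like this:

```typescript
// Nearest-rank percentile over a set of latency samples (in ms).
// Illustrative sketch — not ArtemisKit's internal implementation.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [120, 95, 310, 210, 180, 450, 130, 160, 990, 140];
console.log('p50:', percentile(latencies, 50), 'ms');
console.log('p95:', percentile(latencies, 95), 'ms');
console.log('p99:', percentile(latencies, 99), 'ms');
```

Note how a single slow outlier (990ms here) dominates p95 and p99 while leaving p50 untouched; this is why the tail percentiles matter for user-facing SLOs.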
## Quick Start

```bash
# Basic stress test (30 seconds, 10 concurrent)
akit stress scenarios/test.yaml

# Higher load
akit stress scenarios/test.yaml \
  --concurrency 50 \
  --duration 60

# With cost tracking and budget limit
akit stress scenarios/test.yaml \
  --concurrency 20 \
  --duration 60 \
  --budget 5.00 \
  --save
```
## CLI Options

| Option | Description | Default |
|---|---|---|
| `-c, --concurrency <n>` | Concurrent requests | 10 |
| `-d, --duration <sec>` | Test duration in seconds | 30 |
| `-n, --requests <n>` | Max total requests (optional) | No limit |
| `--ramp-up <sec>` | Gradual ramp-up time | 5 |
| `--budget <usd>` | Max budget in USD (fails if exceeded) | No limit |
| `--save` | Save results to storage | false |
| `-o, --output <dir>` | Output directory for reports | None |
| `--redact` | Enable PII redaction | false |
| `-v, --verbose` | Show latency histogram | false |
## Creating a Stress Scenario

The scenario file defines the prompts to use:

```yaml
name: api-stress-test
description: Load test for production API
provider: openai
model: gpt-4o-mini

# Prompts are selected round-robin during the test
cases:
  - id: short-query
    prompt: "What is 2+2?"
    expected:
      type: contains
      values: ["4"]

  - id: medium-query
    prompt: "Explain photosynthesis in 2 sentences."
    expected:
      type: llm_grader
      rubric: "Explains photosynthesis clearly"

  - id: longer-query
    prompt: "Write a haiku about programming."
    expected:
      type: llm_grader
      rubric: "Valid haiku structure"
```
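The round-robin selection mentioned in the scenario comment means request *i* uses case *i* mod *N*, so load is spread evenly across all cases. A minimal sketch of that behavior (not ArtemisKit's actual scheduler):

```typescript
// Round-robin case selection: request i gets case i mod N.
// Sketch of the behavior described above, not ArtemisKit internals.
const caseIds = ['short-query', 'medium-query', 'longer-query'];

function pickCase(requestIndex: number): string {
  return caseIds[requestIndex % caseIds.length];
}

// Requests 0..5 cycle through the three cases twice.
const picks = Array.from({ length: 6 }, (_, i) => pickCase(i));
console.log(picks.join(', '));
```

A practical consequence: each case receives roughly total-requests / N samples, so a three-case scenario needs about three times as many requests for per-case latency numbers to stabilize.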
## Understanding Results

```text
Stress Test Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Requests     1,247 total │ 1,241 success │ 6 failed
Success      99.5%
Throughput   41.6 req/s

Latency      p50 234ms   p90 456ms   p95 512ms   p99 789ms   avg 267ms

Tokens       Total 156,234   Prompt 78,456   Completion 77,778   Avg/req 125.3

Cost         Estimated $0.47 (gpt-4o-mini)
```
## Key Metrics

| Metric | Meaning | Target |
|---|---|---|
| p50 | Median latency (50th percentile) | < 500ms typical |
| p95 | 95% of requests faster than this | < 2000ms typical |
| p99 | 99% of requests faster than this | Watch for spikes |
| Throughput | Requests per second | Depends on concurrency |
| Success rate | % of requests that succeeded | > 99% for production |
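Throughput, concurrency, and latency are linked by Little's law: with `c` concurrent workers each spending an average of `L` seconds per request, sustainable throughput is roughly `c / L`. A quick sanity check (illustrative arithmetic, not an ArtemisKit API):

```typescript
// Little's law estimate: throughput ≈ concurrency / average latency.
// Illustrative sanity check, not an ArtemisKit API.
function expectedRps(concurrency: number, avgLatencyMs: number): number {
  return concurrency / (avgLatencyMs / 1000);
}

// Roughly 10 workers at ~250ms average latency sustain about 40 req/s,
// consistent with the sample report above (41.6 req/s at 267ms avg).
console.log(expectedRps(10, 250), 'req/s');
```

If measured throughput falls well below this estimate, requests are likely queuing somewhere (rate limits, connection pools) rather than being limited by model latency alone.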
## SDK Stress Testing

```typescript
import { ArtemisKit } from '@artemiskit/sdk';

const kit = new ArtemisKit({
  provider: 'openai',
  model: 'gpt-4o-mini',
});

// Run stress test
const results = await kit.stress({
  scenario: './scenarios/stress-test.yaml',
  concurrency: 20,
  duration: 60,
});

// Access metrics
const { metrics } = results.manifest;

console.log('Throughput:', metrics.requests_per_second, 'req/s');
console.log('p50 Latency:', metrics.p50_latency_ms, 'ms');
console.log('p95 Latency:', metrics.p95_latency_ms, 'ms');
console.log('p99 Latency:', metrics.p99_latency_ms, 'ms');
console.log('Success Rate:', (metrics.success_rate * 100).toFixed(1), '%');

if (metrics.cost) {
  console.log('Estimated Cost:', `$${metrics.cost.estimated_total_usd.toFixed(2)}`);
}

if (metrics.tokens) {
  console.log('Total Tokens:', metrics.tokens.total_tokens);
  console.log('Avg Tokens/Request:', metrics.tokens.avg_tokens_per_request.toFixed(1));
}
```
## With Progress Tracking

```typescript
const kit = new ArtemisKit({
  provider: 'openai',
  model: 'gpt-4o-mini',
});

// Subscribe to events
kit.on('stress:request:complete', (event) => {
  console.log(`Request ${event.index}: ${event.latencyMs}ms ${event.success ? '✓' : '✗'}`);
});

const results = await kit.stress({
  scenario: './scenarios/stress-test.yaml',
  concurrency: 10,
  duration: 30,
});
```
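The per-request events can also feed a running latency histogram instead of line-by-line logging. A minimal bucket accumulator sketch (only the `stress:request:complete` event comes from the SDK; the accumulator itself is hypothetical glue code):

```typescript
// Accumulate latencies into fixed-width buckets (100ms wide, capped at 1s+).
// Hypothetical glue code for use with the event stream shown above.
const bucketWidthMs = 100;
const buckets: number[] = new Array(10).fill(0);

function record(latencyMs: number): void {
  const i = Math.min(Math.floor(latencyMs / bucketWidthMs), buckets.length - 1);
  buckets[i] += 1;
}

// e.g. kit.on('stress:request:complete', (e) => record(e.latencyMs));
[42, 150, 180, 250, 999, 1500].forEach(record);
console.log(buckets);
```

The last bucket is open-ended, so anything at or above 900ms lands there; widen or add buckets if your tail latencies routinely exceed one second.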
## Ramp-Up Testing

Gradually increase load to find breaking points:

```bash
# 10 second ramp-up to 50 concurrent
akit stress scenarios/test.yaml \
  --concurrency 50 \
  --ramp-up 10 \
  --duration 60 \
  --verbose
```

The `--verbose` flag shows a latency distribution histogram:

```text
Latency Distribution

   0- 100ms │ ██████████████████████████████ 423
 100- 200ms │ ████████████████████ 312
 200- 300ms │ ██████████████ 198
 300- 400ms │ ████████ 124
 400- 500ms │ █████ 78
 500- 600ms │ ███ 45
 600- 700ms │ █ 21
 700- 800ms │ █ 12
 800- 900ms │ 5
 900-1000ms │ 3
```
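A linear ramp-up scales active concurrency with elapsed time until the target is reached, so the system warms up instead of being hit with full load at t=0. A sketch of the idea (not ArtemisKit's actual scheduler):

```typescript
// Linear ramp: active workers grow from 1 to the target over rampUpSec.
// Sketch of the ramp-up concept, not ArtemisKit's scheduler.
function activeConcurrency(elapsedSec: number, target: number, rampUpSec: number): number {
  if (elapsedSec >= rampUpSec) return target;
  return Math.max(1, Math.floor((elapsedSec / rampUpSec) * target));
}

// With --concurrency 50 --ramp-up 10: 5s in → 25 workers; from 10s on → all 50.
console.log(activeConcurrency(5, 50, 10));
console.log(activeConcurrency(12, 50, 10));
```

Watching when latency percentiles start degrading during the ramp tells you roughly which concurrency level is the breaking point.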
## Budget Limits

Set cost limits to prevent runaway spending:

```bash
# Fail if cost exceeds $5
akit stress scenarios/test.yaml \
  --concurrency 20 \
  --duration 120 \
  --budget 5.00
```

If the budget is exceeded:

```text
✗ BUDGET EXCEEDED
  Budget: $5.00 | Actual: $5.47 | Over by: $0.47
```
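Conceptually, a budget gate keeps a running cost estimate from per-request token counts and stops issuing new requests once the limit is crossed. A sketch with hypothetical per-token rates (the rates below are illustrative placeholders, not real pricing, and this is not ArtemisKit's internal logic):

```typescript
// Running-cost budget gate. Rates are hypothetical placeholders.
const promptRatePerMTok = 0.15;     // $ per 1M prompt tokens (illustrative)
const completionRatePerMTok = 0.6;  // $ per 1M completion tokens (illustrative)

function requestCost(promptTokens: number, completionTokens: number): number {
  return (promptTokens * promptRatePerMTok + completionTokens * completionRatePerMTok) / 1e6;
}

let spent = 0;
const budget = 5.0;

// Returns false once the budget is exhausted → stop issuing new requests.
function chargeAndCheck(promptTokens: number, completionTokens: number): boolean {
  spent += requestCost(promptTokens, completionTokens);
  return spent <= budget;
}

console.log(chargeAndCheck(60, 65), spent.toFixed(6));
```

Because the check happens after each response, the final spend can overshoot the budget by at most one request's cost, which matches the "Over by" line in the sample output above.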
## CI/CD Integration

### GitHub Actions

```yaml
name: Performance Tests

on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 6 AM
  workflow_dispatch:

jobs:
  stress-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install ArtemisKit
        run: npm install -g @artemiskit/cli

      - name: Run Stress Test
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          akit stress scenarios/stress-test.yaml \
            --concurrency 20 \
            --duration 60 \
            --budget 10.00 \
            --save \
            --output ./reports

      - name: Check Performance
        run: |
          P95=$(cat reports/*.json | jq '.metrics.p95_latency_ms')
          SUCCESS=$(cat reports/*.json | jq '.metrics.success_rate')

          if (( $(echo "$P95 > 2000" | bc -l) )); then
            echo "::error::p95 latency too high: ${P95}ms"
            exit 1
          fi

          if (( $(echo "$SUCCESS < 0.99" | bc -l) )); then
            echo "::error::Success rate too low: ${SUCCESS}"
            exit 1
          fi

      - name: Upload Reports
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: stress-reports
          path: reports/
```
### SDK Performance Gate

```typescript
const results = await kit.stress({
  scenario: './scenarios/stress-test.yaml',
  concurrency: 20,
  duration: 60,
});

const { metrics } = results.manifest;

// Performance gates
if (metrics.p95_latency_ms > 2000) {
  throw new Error(`p95 latency too high: ${metrics.p95_latency_ms}ms`);
}

if (metrics.success_rate < 0.99) {
  throw new Error(`Success rate too low: ${(metrics.success_rate * 100).toFixed(1)}%`);
}

if (metrics.cost && metrics.cost.estimated_total_usd > 10) {
  throw new Error(`Cost too high: $${metrics.cost.estimated_total_usd.toFixed(2)}`);
}

console.log('Performance tests passed!');
```
## Comparing Providers

Test the same scenario against different providers:

```bash
# Test OpenAI
akit stress scenarios/test.yaml \
  --provider openai \
  --model gpt-4o-mini \
  --save

# Test Anthropic
akit stress scenarios/test.yaml \
  --provider anthropic \
  --model claude-3-haiku-20240307 \
  --save

# Compare results
akit compare --baseline <openai-run-id> --current <anthropic-run-id>
```
## Best Practices

### Test Matchers (SDK)

```typescript
import { ArtemisKit } from '@artemiskit/sdk';
import { vitestMatchers } from '@artemiskit/sdk/vitest';
import { describe, test, expect, beforeAll } from 'vitest';

expect.extend(vitestMatchers);

describe('Performance', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
    });
  });

  test('p95 latency under 2000ms', async () => {
    const results = await kit.stress({
      scenario: './scenarios/stress-test.yaml',
      concurrency: 10,
      duration: 30,
    });

    expect(results).toHaveStressP95LatencyBelow(2000);
  }, 60000);

  test('achieves 10 RPS', async () => {
    const results = await kit.stress({
      scenario: './scenarios/stress-test.yaml',
      concurrency: 10,
      duration: 30,
    });

    expect(results).toAchieveRPS(10);
  }, 60000);

  test('99% success rate', async () => {
    const results = await kit.stress({
      scenario: './scenarios/stress-test.yaml',
      concurrency: 10,
      duration: 30,
    });

    expect(results).toHaveStressSuccessRate(0.99);
  }, 60000);
});
```
## See Also

- CLI Stress Command — Full command reference
- CI/CD Integration — Pipeline setup
- Regression Testing — Track performance over time