
# Stress Testing

Know your latency limits before users find them. ArtemisKit stress testing measures throughput, latency percentiles, and cost under load.

Stress testing reveals:

  • Throughput — Requests per second your system can handle
  • Latency percentiles — p50, p90, p95, p99 response times
  • Failure rate — How the system degrades under load
  • Token usage — Consumption patterns at scale
  • Cost estimates — API costs for production traffic

```sh
# Basic stress test (30 seconds, 10 concurrent)
akit stress scenarios/test.yaml

# Higher load
akit stress scenarios/test.yaml \
  --concurrency 50 \
  --duration 60

# With cost tracking and budget limit
akit stress scenarios/test.yaml \
  --concurrency 20 \
  --duration 60 \
  --budget 5.00 \
  --save
```
| Option | Description | Default |
| --- | --- | --- |
| `-c, --concurrency <n>` | Concurrent requests | 10 |
| `-d, --duration <sec>` | Test duration in seconds | 30 |
| `-n, --requests <n>` | Max total requests (optional) | No limit |
| `--ramp-up <sec>` | Gradual ramp-up time | 5 |
| `--budget <usd>` | Max budget in USD (fails if exceeded) | No limit |
| `--save` | Save results to storage | false |
| `-o, --output <dir>` | Output directory for reports | None |
| `--redact` | Enable PII redaction | false |
| `-v, --verbose` | Show latency histogram | false |
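As a rule of thumb for choosing `--concurrency`, a closed-loop load generator's achievable throughput is roughly concurrency divided by average request latency (Little's law). The helper below is a back-of-envelope sketch, not a description of the CLI's internals:

```typescript
// Estimate steady-state throughput for a closed-loop load test:
// each worker issues one request at a time, so throughput ≈ workers / latency.
function expectedRps(concurrency: number, avgLatencyMs: number): number {
  return concurrency / (avgLatencyMs / 1000);
}

// 10 workers against a ~250ms endpoint:
console.log(expectedRps(10, 250), 'req/s'); // → 40 req/s
```

If the measured throughput is far below this estimate, the bottleneck is likely on the client side (or the provider is rate-limiting you) rather than in raw model latency.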

The scenario file defines the prompts to use:

scenarios/stress-test.yaml

```yaml
name: api-stress-test
description: Load test for production API
provider: openai
model: gpt-4o-mini

# Prompts are selected round-robin during the test
cases:
  - id: short-query
    prompt: "What is 2+2?"
    expected:
      type: contains
      values: ["4"]
  - id: medium-query
    prompt: "Explain photosynthesis in 2 sentences."
    expected:
      type: llm_grader
      rubric: "Explains photosynthesis clearly"
  - id: longer-query
    prompt: "Write a haiku about programming."
    expected:
      type: llm_grader
      rubric: "Valid haiku structure"
```
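The scenario comment notes that prompts are selected round-robin during the test. As a minimal sketch of what that selection looks like (an illustration, not the runner's actual code):

```typescript
// Round-robin selector: cycles through case IDs in order, wrapping at the end.
function roundRobin<T>(items: T[]): () => T {
  let i = 0;
  return () => items[i++ % items.length];
}

const next = roundRobin(['short-query', 'medium-query', 'longer-query']);
console.log(next(), next(), next(), next());
// cycles: short-query, medium-query, longer-query, short-query, ...
```

With three cases and a long run, each prompt receives roughly a third of the total requests, so the latency percentiles blend short and long prompts.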
Example output:

```
Stress Test Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Requests     1,247 total │ 1,241 success │ 6 failed
Success      99.5%
Throughput   41.6 req/s

Latency
  p50   234ms
  p90   456ms
  p95   512ms
  p99   789ms
  avg   267ms

Tokens
  Total       156,234
  Prompt      78,456
  Completion  77,778
  Avg/req     125.3

Cost
  Estimated   $0.47 (gpt-4o-mini)
```
| Metric | Meaning | Target |
| --- | --- | --- |
| p50 | Median latency (50th percentile) | < 500ms typical |
| p95 | 95% of requests faster than this | < 2000ms typical |
| p99 | 99% of requests faster than this | Watch for spikes |
| Throughput | Requests per second | Depends on concurrency |
| Success rate | % of requests that succeeded | > 99% for production |
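To make the percentile columns concrete: a pXX value is the latency that XX% of requests came in under. The sketch below derives percentiles from raw samples using the nearest-rank method (one common convention; the tool's exact interpolation may differ):

```typescript
// Nearest-rank percentile: sort the samples, then take the smallest value
// such that at least p% of the samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [120, 180, 234, 250, 310, 456, 512, 600, 789, 950];
console.log('p50:', percentile(latencies, 50), 'ms'); // → 310 ms
console.log('p95:', percentile(latencies, 95), 'ms'); // → 950 ms
```

This is why p99 is so sensitive to outliers: with 1,000 requests, p99 is decided by the 10 slowest responses.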
```ts
import { ArtemisKit } from '@artemiskit/sdk';

const kit = new ArtemisKit({
  provider: 'openai',
  model: 'gpt-4o-mini',
});

// Run stress test
const results = await kit.stress({
  scenario: './scenarios/stress-test.yaml',
  concurrency: 20,
  duration: 60,
});

// Access metrics
const { metrics } = results.manifest;
console.log('Throughput:', metrics.requests_per_second, 'req/s');
console.log('p50 Latency:', metrics.p50_latency_ms, 'ms');
console.log('p95 Latency:', metrics.p95_latency_ms, 'ms');
console.log('p99 Latency:', metrics.p99_latency_ms, 'ms');
console.log('Success Rate:', (metrics.success_rate * 100).toFixed(1), '%');

if (metrics.cost) {
  console.log('Estimated Cost:', `$${metrics.cost.estimated_total_usd.toFixed(2)}`);
}
if (metrics.tokens) {
  console.log('Total Tokens:', metrics.tokens.total_tokens);
  console.log('Avg Tokens/Request:', metrics.tokens.avg_tokens_per_request.toFixed(1));
}
```
```ts
const kit = new ArtemisKit({
  provider: 'openai',
  model: 'gpt-4o-mini',
});

// Subscribe to events
kit.on('stress:request:complete', (event) => {
  console.log(`Request ${event.index}: ${event.latencyMs}ms ${event.success ? '✓' : '✗'}`);
});

const results = await kit.stress({
  scenario: './scenarios/stress-test.yaml',
  concurrency: 10,
  duration: 30,
});
```

Gradually increase load to find breaking points:

```sh
# 10 second ramp-up to 50 concurrent
akit stress scenarios/test.yaml \
  --concurrency 50 \
  --ramp-up 10 \
  --duration 60 \
  --verbose
```
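One plausible model of a ramp-up is linear: concurrency grows from one worker to the target over the ramp-up window, then holds. The sketch below assumes that behavior for illustration; the CLI's actual schedule may differ:

```typescript
// Linear ramp-up schedule (assumption): concurrency scales with elapsed time
// until the ramp-up window ends, then stays at the target.
function concurrencyAt(elapsedSec: number, target: number, rampUpSec: number): number {
  if (elapsedSec >= rampUpSec) return target;
  return Math.max(1, Math.floor((elapsedSec / rampUpSec) * target));
}

for (const t of [0, 2, 5, 10]) {
  console.log(`t=${t}s → ${concurrencyAt(t, 50, 10)} workers`);
}
// t=0s → 1, t=2s → 10, t=5s → 25, t=10s → 50
```

Ramping up matters for finding breaking points: if the system falls over at 35 concurrent, a gradual ramp shows latency degrading as load crosses that threshold, while an instant start at 50 only shows the aftermath.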

The `--verbose` flag shows a latency distribution histogram:

```
Latency Distribution
   0- 100ms │ ██████████████████████████████ 423
 100- 200ms │ ████████████████████ 312
 200- 300ms │ ██████████████ 198
 300- 400ms │ ████████ 124
 400- 500ms │ █████ 78
 500- 600ms │ ███ 45
 600- 700ms │ █ 21
 700- 800ms │ █ 12
 800- 900ms │ 5
 900-1000ms │ 3
```
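A histogram like this is just raw latencies binned into fixed-width buckets. A minimal sketch of that binning (100ms buckets, bar length equal to the count):

```typescript
// Bin latencies into fixed-width buckets keyed by bucket start time.
function histogram(latenciesMs: number[], bucketMs = 100): Map<number, number> {
  const buckets = new Map<number, number>();
  for (const ms of latenciesMs) {
    const bucket = Math.floor(ms / bucketMs) * bucketMs;
    buckets.set(bucket, (buckets.get(bucket) ?? 0) + 1);
  }
  return buckets;
}

const counts = histogram([50, 80, 150, 160, 220, 950]);
for (const [start, n] of [...counts].sort((a, b) => a[0] - b[0])) {
  console.log(`${start}-${start + 100}ms │ ${'█'.repeat(n)} ${n}`);
}
```

The shape is often more informative than any single percentile: a long right tail with a second bump usually points at retries, cold starts, or rate-limit backoff rather than uniform slowness.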

Set cost limits to prevent runaway spending:

```sh
# Fail if cost exceeds $5
akit stress scenarios/test.yaml \
  --concurrency 20 \
  --duration 120 \
  --budget 5.00
```

If the budget is exceeded, the run fails:

```
✗ BUDGET EXCEEDED
Budget: $5.00 | Actual: $5.47 | Over by: $0.47
```
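Cost estimates of this kind generally come down to token counts multiplied by per-million-token prices. The sketch below shows the arithmetic; the rates are illustrative assumptions, not ArtemisKit's pricing table:

```typescript
// Hypothetical cost estimate: prompt and completion tokens priced separately,
// in USD per 1M tokens. Rates here are stand-ins for whatever your provider charges.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

function estimateCostUsd(
  usage: TokenUsage,
  inputPerMillion: number,   // USD per 1M prompt tokens (assumed)
  outputPerMillion: number,  // USD per 1M completion tokens (assumed)
): number {
  return (usage.promptTokens / 1e6) * inputPerMillion +
         (usage.completionTokens / 1e6) * outputPerMillion;
}

// Token counts from the sample run above, with made-up rates:
const cost = estimateCostUsd(
  { promptTokens: 78_456, completionTokens: 77_778 },
  0.15,
  0.60,
);
console.log(`Estimated: $${cost.toFixed(2)}`);
```

Doing this math before a long run is a good sanity check on the `--budget` value: concurrency × duration × avg tokens per request gives an upper bound on spend.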
.github/workflows/performance.yml

```yaml
name: Performance Tests

on:
  schedule:
    - cron: '0 6 * * *' # Daily at 6 AM
  workflow_dispatch:

jobs:
  stress-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install ArtemisKit
        run: npm install -g @artemiskit/cli

      - name: Run Stress Test
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          akit stress scenarios/stress-test.yaml \
            --concurrency 20 \
            --duration 60 \
            --budget 10.00 \
            --save \
            --output ./reports

      - name: Check Performance
        run: |
          P95=$(cat reports/*.json | jq '.metrics.p95_latency_ms')
          SUCCESS=$(cat reports/*.json | jq '.metrics.success_rate')
          if (( $(echo "$P95 > 2000" | bc -l) )); then
            echo "::error::p95 latency too high: ${P95}ms"
            exit 1
          fi
          if (( $(echo "$SUCCESS < 0.99" | bc -l) )); then
            echo "::error::Success rate too low: ${SUCCESS}"
            exit 1
          fi

      - name: Upload Reports
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: stress-reports
          path: reports/
```
```ts
const results = await kit.stress({
  scenario: './scenarios/stress-test.yaml',
  concurrency: 20,
  duration: 60,
});

const { metrics } = results.manifest;

// Performance gates
if (metrics.p95_latency_ms > 2000) {
  throw new Error(`p95 latency too high: ${metrics.p95_latency_ms}ms`);
}
if (metrics.success_rate < 0.99) {
  throw new Error(`Success rate too low: ${(metrics.success_rate * 100).toFixed(1)}%`);
}
if (metrics.cost && metrics.cost.estimated_total_usd > 10) {
  throw new Error(`Cost too high: $${metrics.cost.estimated_total_usd.toFixed(2)}`);
}

console.log('Performance tests passed!');
```

Test the same scenario against different providers:

```sh
# Test OpenAI
akit stress scenarios/test.yaml \
  --provider openai \
  --model gpt-4o-mini \
  --save

# Test Anthropic
akit stress scenarios/test.yaml \
  --provider anthropic \
  --model claude-3-haiku-20240307 \
  --save

# Compare results
akit compare --baseline <openai-run-id> --current <anthropic-run-id>
```
```ts
import { ArtemisKit } from '@artemiskit/sdk';
import { vitestMatchers } from '@artemiskit/sdk/vitest';
import { describe, test, expect, beforeAll } from 'vitest';

expect.extend(vitestMatchers);

describe('Performance', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
    });
  });

  test('p95 latency under 2000ms', async () => {
    const results = await kit.stress({
      scenario: './scenarios/stress-test.yaml',
      concurrency: 10,
      duration: 30,
    });
    expect(results).toHaveStressP95LatencyBelow(2000);
  }, 60000);

  test('achieves 10 RPS', async () => {
    const results = await kit.stress({
      scenario: './scenarios/stress-test.yaml',
      concurrency: 10,
      duration: 30,
    });
    expect(results).toAchieveRPS(10);
  }, 60000);

  test('99% success rate', async () => {
    const results = await kit.stress({
      scenario: './scenarios/stress-test.yaml',
      concurrency: 10,
      duration: 30,
    });
    expect(results).toHaveStressSuccessRate(0.99);
  }, 60000);
});
```