# Test Matchers
ArtemisKit provides custom matchers for Jest and Vitest to make LLM testing seamless in your test suites.
## Installation

The matchers are included in the SDK package:
```sh
bun add @artemiskit/sdk
# or
npm install @artemiskit/sdk
```

```ts
// jest.setup.ts or in your test file
import { jestMatchers } from '@artemiskit/sdk/jest';

expect.extend(jestMatchers);
```

```ts
// vitest.setup.ts or in your test file
import { vitestMatchers } from '@artemiskit/sdk/vitest';
import { expect } from 'vitest';

expect.extend(vitestMatchers);
```
## TypeScript Setup

For full TypeScript support, extend the matcher types:
```ts
// jest.d.ts or in your test file
declare global {
  namespace jest {
    interface Matchers<R> {
      toPassAllCases(): R;
      toHaveSuccessRate(rate: number): R;
      toPassCasesWithTag(tag: string): R;
      toHaveMedianLatencyBelow(ms: number): R;
      toHaveP95LatencyBelow(ms: number): R;
      toPassRedTeam(): R;
      toHaveDefenseRate(rate: number): R;
      toHaveNoCriticalVulnerabilities(): R;
      toHaveNoHighSeverityVulnerabilities(): R;
      toPassStressTest(): R;
      toHaveStressSuccessRate(rate: number): R;
      toAchieveRPS(rps: number): R;
      toHaveStressP95LatencyBelow(ms: number): R;
    }
  }
}
```

```ts
// vitest.d.ts or in your test file
declare module 'vitest' {
  interface Assertion<T = any> {
    toPassAllCases(): void;
    toHaveSuccessRate(rate: number): void;
    toPassCasesWithTag(tag: string): void;
    toHaveMedianLatencyBelow(ms: number): void;
    toHaveP95LatencyBelow(ms: number): void;
    toPassRedTeam(): void;
    toHaveDefenseRate(rate: number): void;
    toHaveNoCriticalVulnerabilities(): void;
    toHaveNoHighSeverityVulnerabilities(): void;
    toPassStressTest(): void;
    toHaveStressSuccessRate(rate: number): void;
    toAchieveRPS(rps: number): void;
    toHaveStressP95LatencyBelow(ms: number): void;
  }
}
```
## Available Matchers

### Run Test Matchers
Use these with results from `kit.run()`:
| Matcher | Description |
|---|---|
| `toPassAllCases()` | All test cases passed |
| `toHaveSuccessRate(rate)` | Achieves the minimum success rate (0-1) |
| `toPassCasesWithTag(tag)` | All cases with the specified tag passed |
| `toHaveMedianLatencyBelow(ms)` | Median latency under the threshold |
| `toHaveP95LatencyBelow(ms)` | P95 latency under the threshold |
### Red Team Matchers

Use these with results from `kit.redteam()`:
| Matcher | Description |
|---|---|
| `toPassRedTeam()` | No vulnerabilities found |
| `toHaveDefenseRate(rate)` | Achieves the minimum defense rate (0-1) |
| `toHaveNoCriticalVulnerabilities()` | No critical-severity issues |
| `toHaveNoHighSeverityVulnerabilities()` | No high- or critical-severity issues |
### Stress Test Matchers

Use these with results from `kit.stress()`:
| Matcher | Description |
|---|---|
| `toPassStressTest()` | Stress test passed overall |
| `toHaveStressSuccessRate(rate)` | Achieves the minimum success rate under load |
| `toAchieveRPS(rps)` | Achieves the minimum requests per second |
| `toHaveStressP95LatencyBelow(ms)` | P95 latency under the threshold |
## Examples
### Basic Quality Testing

```ts
import { ArtemisKit } from '@artemiskit/sdk';
import { jestMatchers } from '@artemiskit/sdk/jest';

expect.extend(jestMatchers);

describe('LLM Quality Tests', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
      project: 'jest-tests',
    });
  });

  it('should pass all test cases', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });

    expect(results).toPassAllCases();
  }, 60000);

  it('should achieve 90% success rate', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });

    expect(results).toHaveSuccessRate(0.9);
  }, 60000);

  it('should have acceptable latency', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });

    expect(results).toHaveMedianLatencyBelow(5000);
    expect(results).toHaveP95LatencyBelow(10000);
  }, 60000);
});
```

```ts
import { ArtemisKit } from '@artemiskit/sdk';
import { vitestMatchers } from '@artemiskit/sdk/vitest';
import { beforeAll, describe, expect, test } from 'vitest';

expect.extend(vitestMatchers);

describe('LLM Quality Tests', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
      project: 'vitest-tests',
    });
  });

  test('should pass all test cases', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });

    expect(results).toPassAllCases();
  }, 60000);

  test('should achieve 90% success rate', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });

    expect(results).toHaveSuccessRate(0.9);
  }, 60000);
});
```
### Tag-Based Testing

```ts
it('should pass critical test cases', async () => {
  const results = await kit.run({
    scenario: './scenarios/quality.yaml',
    tags: ['critical'],
  });

  expect(results).toPassCasesWithTag('critical');
});

it('should pass smoke tests', async () => {
  const results = await kit.run({
    scenario: './scenarios/quality.yaml',
    tags: ['smoke'],
  });

  expect(results).toPassCasesWithTag('smoke');
  expect(results).toHaveMedianLatencyBelow(2000);
});
```
### Security Testing

```ts
describe('Security Tests', () => {
  it('should pass red team testing', async () => {
    const results = await kit.redteam({
      scenario: './scenarios/quality.yaml',
      mutations: ['typo', 'role-spoof', 'encoding'],
      countPerCase: 5,
    });

    expect(results).toPassRedTeam();
  }, 120000);

  it('should maintain 95% defense rate', async () => {
    const results = await kit.redteam({
      scenario: './scenarios/quality.yaml',
      mutations: ['typo', 'role-spoof', 'instruction-flip'],
      countPerCase: 10,
    });

    expect(results).toHaveDefenseRate(0.95);
  }, 120000);

  it('should have no critical vulnerabilities', async () => {
    const results = await kit.redteam({
      scenario: './scenarios/quality.yaml',
      countPerCase: 5,
    });

    expect(results).toHaveNoCriticalVulnerabilities();
    expect(results).toHaveNoHighSeverityVulnerabilities();
  }, 120000);
});
```
### Performance Testing

```ts
describe('Performance Tests', () => {
  it('should handle concurrent load', async () => {
    const results = await kit.stress({
      scenario: './scenarios/performance.yaml',
      concurrency: 10,
      duration: 30,
      rampUp: 5,
    });

    expect(results).toPassStressTest();
  }, 60000);

  it('should achieve minimum throughput', async () => {
    const results = await kit.stress({
      scenario: './scenarios/performance.yaml',
      concurrency: 10,
      duration: 30,
    });

    expect(results).toAchieveRPS(2); // At least 2 requests per second
    expect(results).toHaveStressSuccessRate(0.95);
  }, 60000);

  it('should maintain acceptable latency under load', async () => {
    const results = await kit.stress({
      scenario: './scenarios/performance.yaml',
      concurrency: 10,
      duration: 30,
    });

    expect(results).toHaveStressP95LatencyBelow(5000);
  }, 60000);
});
```
## CI/CD Integration

Example test file for CI/CD pipelines:
```ts
import { ArtemisKit } from '@artemiskit/sdk';
import { jestMatchers } from '@artemiskit/sdk/jest';

expect.extend(jestMatchers);

const CI_TIMEOUT = 120000;
const isCI = process.env.CI === 'true';

describe('LLM Quality Gate', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
      project: 'ci-quality-gate',
    });
  });

  // Always run: quick smoke tests
  it('smoke: critical functionality', async () => {
    const results = await kit.run({
      scenario: './scenarios/smoke.yaml',
    });

    expect(results).toPassAllCases();
    expect(results).toHaveMedianLatencyBelow(3000);
  }, CI_TIMEOUT);

  // Always run: security baseline
  it('security: no critical vulnerabilities', async () => {
    const results = await kit.redteam({
      scenario: './scenarios/security.yaml',
      mutations: ['typo', 'role-spoof'],
      countPerCase: 3,
    });

    expect(results).toHaveNoCriticalVulnerabilities();
  }, CI_TIMEOUT);

  // Skip in CI: full regression suite (too slow)
  (isCI ? it.skip : it)('regression: full test suite', async () => {
    const results = await kit.run({
      scenario: './scenarios/',
      parallel: true,
    });

    expect(results).toHaveSuccessRate(0.95);
  }, 300000);

  // Skip in CI: performance testing
  (isCI ? it.skip : it)('performance: stress test', async () => {
    const results = await kit.stress({
      scenario: './scenarios/performance.yaml',
      concurrency: 10,
      duration: 60,
    });

    expect(results).toPassStressTest();
    expect(results).toAchieveRPS(1);
  }, 120000);
});
```
## Custom Assertions

You can also access raw results for custom assertions:
```ts
it('should meet custom criteria', async () => {
  const results = await kit.run({
    scenario: './scenarios/quality.yaml',
  });

  // Access raw manifest data
  const { manifest, cases } = results;

  // Custom assertions
  expect(manifest.project).toBe('my-project');
  expect(manifest.metrics.total_cases).toBeGreaterThan(0);
  expect(manifest.metrics.pass_rate).toBeGreaterThanOrEqual(0.8);

  // Check individual cases
  for (const caseResult of cases) {
    expect(caseResult.id).toBeDefined();
    if (!caseResult.ok) {
      console.log(`Failed: ${caseResult.name} - ${caseResult.reason}`);
    }
  }
});
```
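Recurring custom checks can be packaged as a matcher of your own and registered with `expect.extend` alongside the ArtemisKit ones. A minimal sketch, assuming the raw `manifest` shape shown above; the `toMeetPassRate` name is hypothetical:

```ts
// Hypothetical custom matcher over the raw results shape. Written as a
// standalone function so it can be passed to expect.extend({ toMeetPassRate }).
type RunResults = { manifest: { metrics: { pass_rate: number } } };

function toMeetPassRate(results: RunResults, min: number) {
  const actual = results.manifest.metrics.pass_rate;
  const pass = actual >= min;
  return {
    pass,
    message: () => `expected pass rate ${actual} to be at least ${min}`,
  };
}
```

After `expect.extend({ toMeetPassRate })`, a test can write `expect(results).toMeetPassRate(0.8)`; remember to extend the TypeScript declarations the same way as in the setup section above.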
## Best Practices

- Set appropriate timeouts — LLM calls can be slow; use 60-120 second timeouts
- Use tags for organization — Group tests by criticality (smoke, critical, regression)
- Skip slow tests in CI — Use `it.skip` or environment checks for performance tests
- Test incrementally — Start with smoke tests, then add security and performance
- Monitor flakiness — LLM responses can vary; use appropriate success rate thresholds
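The timeout and flakiness advice above can be centralized so every test reads its limits from one place. A sketch under the assumption that CI runners are noisier than local machines; the `thresholdsFor` name and the specific numbers are illustrative, not part of the SDK:

```ts
// Illustrative helper: loosen limits in CI, where shared runners add latency
// noise, and keep strict gates for local runs. Numbers here are examples.
interface Thresholds {
  timeoutMs: number;     // per-test timeout (the test's third argument)
  successRate: number;   // minimum rate for toHaveSuccessRate
  p95LatencyMs: number;  // ceiling for toHaveP95LatencyBelow
}

function thresholdsFor(ci: boolean): Thresholds {
  return ci
    ? { timeoutMs: 120_000, successRate: 0.85, p95LatencyMs: 15_000 }
    : { timeoutMs: 60_000, successRate: 0.95, p95LatencyMs: 10_000 };
}

const t = thresholdsFor(process.env.CI === 'true');
```

A test can then assert `expect(results).toHaveSuccessRate(t.successRate)` and pass `t.timeoutMs` as its timeout, so tightening or loosening a gate becomes a one-line change.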
## See Also

- SDK Overview — ArtemisKit SDK documentation
- Evaluation API — Programmatic evaluation
- Guardian Mode — Runtime protection
- CI/CD Integration — CI/CD setup guide