Scenario Builders
The ArtemisKit SDK provides a type-safe fluent API for building evaluation scenarios programmatically, without writing YAML files.
Installation
Builders are included in the main SDK package:

```bash
bun add @artemiskit/sdk
```

Import from the builders subpath:

```typescript
import { scenario, testCase, contains, exact } from '@artemiskit/sdk/builders';
```

Quick Start
```typescript
import { scenario, testCase, contains, exact, regex } from '@artemiskit/sdk/builders';
import { ArtemisKit } from '@artemiskit/sdk';

// Build a scenario programmatically
const myScenario = scenario('api-response-tests')
  .description('Test API response quality')
  .provider('openai')
  .model('gpt-4o')
  .cases([
    testCase('greeting')
      .prompt('Say hello to the user')
      .expect(contains(['hello', 'hi', 'hey'])),

    testCase('math-calculation')
      .prompt('What is 15 + 27?')
      .expect(exact('42')),

    testCase('code-output')
      .prompt('Write a function that returns true')
      .expect(regex(/return\s+true/)),
  ])
  .build();

// Run the scenario
const kit = new ArtemisKit({ project: 'my-project' });
const results = await kit.run({ scenario: myScenario });
```

Scenario Builder
The `scenario()` function creates a new scenario builder:

```typescript
import { scenario } from '@artemiskit/sdk/builders';

const myScenario = scenario('scenario-name')
  .description('What this scenario tests')
  .provider('openai')            // 'openai' | 'anthropic' | 'azure-openai' | etc.
  .model('gpt-4o')
  .timeout(30000)                // Timeout per case in ms
  .retries(2)                    // Retry failed cases
  .tags(['smoke', 'critical'])   // Tags for filtering
  .variables({                   // Variables for interpolation
    userName: 'Alice',
    topic: 'TypeScript',
  })
  .cases([...])
  .build();
```

Scenario Methods
| Method | Description |
|---|---|
| `description(text)` | Set scenario description |
| `provider(name)` | Set LLM provider |
| `model(name)` | Set model name |
| `timeout(ms)` | Set timeout per case |
| `retries(count)` | Set retry count |
| `tags(tags[])` | Add tags for filtering |
| `variables(obj)` | Set variables for interpolation |
| `cases(cases[])` | Add test cases |
| `build()` | Build the final scenario object |
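
To make the chaining behaviour concrete, here is a minimal sketch of how a fluent builder of this shape can be written. It is not ArtemisKit's implementation: `MiniScenario`, `MiniScenarioBuilder`, and `miniScenario` are hypothetical names, and only a few of the methods from the table are mirrored. Each setter stores a value and returns `this`, and `build()` returns a plain object.

```typescript
// Hypothetical shapes for illustration only; not the SDK's real types.
interface MiniScenario {
  name: string;
  description?: string;
  provider?: string;
  model?: string;
  timeout?: number;
  tags: string[];
}

class MiniScenarioBuilder {
  private readonly data: MiniScenario;

  constructor(name: string) {
    this.data = { name, tags: [] };
  }

  // Each setter records a value and returns `this` so calls can be chained.
  description(text: string): this {
    this.data.description = text;
    return this;
  }

  provider(name: string): this {
    this.data.provider = name;
    return this;
  }

  model(name: string): this {
    this.data.model = name;
    return this;
  }

  timeout(ms: number): this {
    this.data.timeout = ms;
    return this;
  }

  tags(tags: string[]): this {
    this.data.tags.push(...tags);
    return this;
  }

  // Return a copy so later builder calls cannot mutate the built object.
  build(): MiniScenario {
    return { ...this.data, tags: [...this.data.tags] };
  }
}

// Factory function mirroring the `scenario('name')` entry point style.
const miniScenario = (name: string) => new MiniScenarioBuilder(name);

const built = miniScenario('demo')
  .description('Chaining demo')
  .provider('openai')
  .model('gpt-4o')
  .tags(['smoke'])
  .build();
```

Returning `this` rather than the class name keeps chaining intact if the builder is ever subclassed, and copying in `build()` means the built object is a snapshot rather than a live view of the builder.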
Test Case Builder
The `testCase()` function creates individual test cases:

```typescript
import { testCase, contains } from '@artemiskit/sdk/builders';

const tc = testCase('case-id')
  .prompt('Your prompt here')
  .systemPrompt('Optional system prompt')
  .expect(contains(['expected', 'values']))
  .tags(['smoke'])
  .timeout(10000)
  .build();
```

Test Case Methods
| Method | Description |
|---|---|
| `prompt(text)` | Set the user prompt |
| `systemPrompt(text)` | Set system prompt |
| `messages(msgs[])` | Set full message array |
| `expect(expectation)` | Set expected output |
| `tags(tags[])` | Add tags |
| `timeout(ms)` | Override timeout |
| `build()` | Build the test case object |
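
The table lists a per-case `timeout(ms)` that overrides the scenario-level value, and tags that are used for filtering. The sketch below shows one plausible reading of those semantics: the case value wins when it is set, and a case is selected when it carries at least one of the requested tags. The names `CaseSettings`, `effectiveTimeout`, and `selectByTags`, as well as the 30000 ms fallback, are assumptions made for illustration; the SDK's actual resolution rules may differ.

```typescript
// Hypothetical shapes; they are not the SDK's real types.
interface CaseSettings {
  id: string;
  timeout?: number;
  tags?: string[];
}

// Prefer the case-level timeout, then the scenario-level one, then a fallback.
function effectiveTimeout(
  testCase: CaseSettings,
  scenarioTimeout?: number,
  fallbackMs = 30000,
): number {
  return testCase.timeout ?? scenarioTimeout ?? fallbackMs;
}

// Keep the cases that carry at least one of the requested tags.
function selectByTags(cases: CaseSettings[], wanted: string[]): CaseSettings[] {
  return cases.filter((c) => (c.tags ?? []).some((t) => wanted.includes(t)));
}

const sampleCases: CaseSettings[] = [
  { id: 'fast', timeout: 10000, tags: ['smoke'] },
  { id: 'default', tags: ['nightly'] },
];

// The first case uses its own timeout; the second inherits the scenario value.
const timeouts = sampleCases.map((c) => effectiveTimeout(c, 30000));
const smokeIds = selectByTags(sampleCases, ['smoke']).map((c) => c.id);
```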
Multi-turn Conversations
```typescript
const tc = testCase('conversation')
  .messages([
    { role: 'system', content: 'You are a helpful assistant' },
    { role: 'user', content: 'Hello!' },
    { role: 'assistant', content: 'Hi there! How can I help?' },
    { role: 'user', content: 'What is 2+2?' },
  ])
  .expect(contains(['4']))
  .build();
```

Expectation Builders
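
As a mental model for the builders in this section, an expectation can be thought of as a small data object that an evaluator later checks against the model's response. The sketch below is an assumption about that general shape, not the SDK's actual types or matching logic: `Check`, `evaluate`, and the field names are hypothetical, and only substring, exact, regex, and combinator checks are modelled.

```typescript
// Hypothetical expectation shapes for illustration; not the SDK's real types.
type Check =
  | { kind: 'contains'; values: string[]; mode: 'any' | 'all'; caseInsensitive: boolean }
  | { kind: 'exact'; value: string }
  | { kind: 'regex'; pattern: RegExp }
  | { kind: 'allOf'; checks: Check[] }
  | { kind: 'anyOf'; checks: Check[] };

// Decide whether a response satisfies a check, recursing into combinators.
function evaluate(check: Check, response: string): boolean {
  switch (check.kind) {
    case 'contains': {
      const { values, mode, caseInsensitive } = check;
      const norm = (s: string) => (caseInsensitive ? s.toLowerCase() : s);
      const haystack = norm(response);
      const hit = (v: string) => haystack.includes(norm(v));
      // 'all' requires every value to appear; 'any' requires at least one.
      return mode === 'all' ? values.every(hit) : values.some(hit);
    }
    case 'exact':
      return response === check.value;
    case 'regex':
      return check.pattern.test(response);
    case 'allOf':
      return check.checks.every((c) => evaluate(c, response));
    case 'anyOf':
      return check.checks.some((c) => evaluate(c, response));
  }
}

const greetingWithNumber: Check = {
  kind: 'allOf',
  checks: [
    { kind: 'contains', values: ['hello', 'hi'], mode: 'any', caseInsensitive: true },
    { kind: 'regex', pattern: /\d+/ },
  ],
};

const passes = evaluate(greetingWithNumber, 'Hello, you have 3 new messages');
const fails = evaluate(greetingWithNumber, 'Hello there');
```

Here `passes` is `true` because both the greeting and a digit are present, while `fails` is `false` because the second response has no digit. Modelling expectations as data rather than as functions keeps them serialisable, which is one reason a builder API and a YAML format can describe the same checks.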
contains(values, options?)

Check whether the response contains the specified values:

```typescript
import { contains } from '@artemiskit/sdk/builders';

// Any of the values (default)
contains(['hello', 'hi', 'hey'])

// All values required
contains(['hello', 'world'], { mode: 'all' })

// Case insensitive
contains(['HELLO'], { caseInsensitive: true })
```

notContains(values)
Check that the response does NOT contain the given values:

```typescript
import { notContains } from '@artemiskit/sdk/builders';

notContains(['error', 'failed', 'exception'])
```

exact(value)
Exact string match:

```typescript
import { exact } from '@artemiskit/sdk/builders';

exact('42')
exact('Hello, World!')
```

regex(pattern)
Regular expression match:

```typescript
import { regex } from '@artemiskit/sdk/builders';

regex(/\d{4}-\d{2}-\d{2}/)  // Date pattern
regex(/^(yes|no)$/i)        // Yes/No with flags
regex('\\d+')               // String pattern
```

fuzzy(value, options?)
Fuzzy string matching using Levenshtein distance:

```typescript
import { fuzzy } from '@artemiskit/sdk/builders';

fuzzy('Hello World', { threshold: 0.8 })
```

similarity(value, options?)
Semantic similarity matching:

```typescript
import { similarity } from '@artemiskit/sdk/builders';

// Embedding-based (default)
similarity('A friendly greeting', { threshold: 0.85 })

// LLM-based semantic comparison
similarity('A helpful response', { mode: 'llm', threshold: 0.9 })
```

llmGrade(rubric, options?)
LLM-as-judge grading:

```typescript
import { llmGrade } from '@artemiskit/sdk/builders';

llmGrade('Response should be helpful, accurate, and concise', {
  threshold: 0.8,
})

llmGrade('Is this a valid JSON response?', {
  threshold: 0.9,
  model: 'gpt-4o',  // Override grader model
})
```

jsonSchema(schema)
Validate JSON output against a schema:

```typescript
import { jsonSchema } from '@artemiskit/sdk/builders';

jsonSchema({
  type: 'object',
  required: ['name', 'age'],
  properties: {
    name: { type: 'string' },
    age: { type: 'number', minimum: 0 },
    email: { type: 'string', format: 'email' },
  },
})
```

allOf(...expectations) / anyOf(...expectations)
Combine multiple expectations:

```typescript
import { allOf, anyOf, contains, exact, regex } from '@artemiskit/sdk/builders';

// All must pass
allOf(
  contains(['hello']),
  regex(/\d+/),
)

// At least one must pass
anyOf(
  exact('yes'),
  exact('no'),
  contains(['maybe']),
)
```

Complete Example
```typescript
import {
  scenario,
  testCase,
  contains,
  exact,
  regex,
  jsonSchema,
  llmGrade,
  allOf,
} from '@artemiskit/sdk/builders';
import { ArtemisKit } from '@artemiskit/sdk';

const apiTestScenario = scenario('api-quality-tests')
  .description('Comprehensive API response quality tests')
  .provider('openai')
  .model('gpt-4o')
  .timeout(30000)
  .tags(['api', 'quality'])
  .variables({
    apiVersion: 'v2',
  })
  .cases([
    testCase('greeting-response')
      .prompt('Greet the user warmly')
      .expect(contains(['hello', 'hi', 'welcome']))
      .tags(['smoke']),

    testCase('json-output')
      .prompt('Return a JSON object with name and age')
      .expect(jsonSchema({
        type: 'object',
        required: ['name', 'age'],
        properties: {
          name: { type: 'string' },
          age: { type: 'number' },
        },
      })),

    testCase('code-generation')
      .systemPrompt('You are a helpful coding assistant')
      .prompt('Write a TypeScript function that adds two numbers')
      .expect(allOf(
        contains(['function']),
        regex(/:\s*number/),  // Return type
      )),

    testCase('helpful-response')
      .prompt('Explain what an API is to a beginner')
      .expect(llmGrade(
        'Response should be clear, accurate, and appropriate for beginners',
        { threshold: 0.85 }
      )),
  ])
  .build();

// Run the scenario
const kit = new ArtemisKit({
  provider: 'openai',
  model: 'gpt-4o',
  project: 'api-tests',
});

const results = await kit.run({ scenario: apiTestScenario });

console.log(`Pass rate: ${results.manifest.metrics.pass_rate * 100}%`);
```

Type Safety
All builders are fully typed, so TypeScript catches mistakes like the following at compile time:

```typescript
import { testCase, contains, llmGrade } from '@artemiskit/sdk/builders';

// TypeScript error: 'invald' is not a valid mode
contains(['hello'], { mode: 'invald' });

// TypeScript error: threshold must be a number
llmGrade('rubric', { threshold: 'high' });

// TypeScript error: missing required 'prompt' or 'messages'
testCase('test').expect(contains(['x'])).build();
```

Type Contracts
For maximum type safety, use the contract types:

```typescript
import type {
  ScenarioContract,
  TestCaseContract,
  ExpectationContract,
} from '@artemiskit/sdk/contracts';

const myScenario: ScenarioContract = {
  name: 'typed-scenario',
  cases: [...],
};
```

See Also
- Evaluation API: Running evaluations
- Scenario Format: YAML scenario reference
- Expectations: All expectation types