artemiskit run
Run scenario-based evaluations against your LLM.
Synopsis

    artemiskit run <scenario> [options]
    akit run <scenario> [options]

Arguments

| Argument | Description |
|---|---|
| scenario | Path to scenario file, directory, or glob pattern |

The scenario argument supports:
- Single file: scenarios/test.yaml
- Directory: scenarios/ (runs all .yaml/.yml files recursively)
- Glob pattern: scenarios/**/*.yaml (runs all files matching the pattern)
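
When passing a glob pattern, quoting decides who expands it. The following is a minimal sketch, using a throwaway directory and hypothetical file names, of the difference: a quoted pattern reaches the command as one literal argument, while an unquoted pattern is expanded by the shell first. The helper `count_args` is illustrative only and is not part of ArtemisKit.

```shell
# Throwaway directory with hypothetical scenario files.
tmp=$(mktemp -d)
mkdir -p "$tmp/scenarios/auth" "$tmp/scenarios/qa"
touch "$tmp/scenarios/a.yaml" "$tmp/scenarios/auth/b.yaml" "$tmp/scenarios/qa/c.yaml"
cd "$tmp" || exit 1

# Illustrative helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Quoted: the pattern is passed through untouched as a single argument.
quoted=$(count_args "scenarios/**/*.yaml")

# Unquoted: the shell expands the pattern before the command runs.
# In bash without `shopt -s globstar`, ** behaves like *, so only files
# exactly one directory down match and scenarios/a.yaml is skipped.
unquoted=$(count_args scenarios/**/*.yaml)

echo "quoted: $quoted argument(s), unquoted: $unquoted argument(s)"
```

Quoting the pattern, as in akit run "scenarios/**/*.yaml", keeps the matching behavior independent of the shell's own glob settings.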
Options

| Option | Short | Description | Default |
|---|---|---|---|
| --provider | -p | LLM provider to use | From config/scenario |
| --model | -m | Model name | From config/scenario |
| --output | -o | Output directory for results | artemis-output |
| --verbose | -v | Verbose output | false |
| --tags | -t | Filter test cases by tags | All cases |
| --save | | Save results to storage (enabled by default) | true |
| --concurrency | -c | Number of concurrent test cases per scenario | 1 |
| --parallel | | Number of scenarios to run in parallel | Sequential |
| --timeout | | Timeout per test case (ms) | From config |
| --retries | | Number of retries per test case | From config |
| --config | | Path to config file | artemis.config.yaml |
| --redact | | Enable PII/sensitive data redaction | false |
| --redact-patterns | | Custom redaction patterns (regex or built-in) | Default patterns |

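
The --concurrency and --parallel options apply at different levels: one to test cases within a scenario, the other to scenarios. Assuming the two settings combine independently, as their descriptions suggest but this page does not state, the product gives a rough upper bound on test cases running at once. The values below are illustrative.

```shell
# Illustrative values, matching the examples on this page.
parallel=4      # --parallel: scenarios running at once
concurrency=5   # --concurrency: test cases running at once within each scenario

# Assumed upper bound on simultaneously running test cases.
max_in_flight=$((parallel * concurrency))
echo "up to $max_in_flight test cases in flight"

# Corresponding invocation (commented out so the sketch runs without akit installed):
# akit run scenarios/ --parallel "$parallel" --concurrency "$concurrency"
```

This bound may be worth keeping in mind when the provider enforces rate limits.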
Redaction Patterns
Built-in patterns: email, phone, credit_card, ssn, api_key
You can also use custom regex patterns.
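
A custom pattern can be checked locally before it is handed to --redact-patterns. This is a minimal sketch: the ticket ID format is hypothetical, grep -E stands in for whatever regex engine ArtemisKit uses (the two may differ for more advanced syntax), and the commented invocation assumes a regex can be passed in the same position as a built-in pattern name.

```shell
# Hypothetical internal ticket ID format, e.g. ACME-123456.
pattern='ACME-[0-9]{6}'

# Sanity check: print only the part of the sample text the regex matches.
echo "ticket ACME-123456 opened" | grep -Eo "$pattern"

# Assumed usage, mixing a built-in name with the custom regex
# (commented out so the sketch runs without akit installed):
# akit run scenarios/qa-test.yaml --redact --redact-patterns email "$pattern"
```

Single quotes around the regex keep the shell from interpreting characters such as braces or brackets.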
Examples
Basic Run

    akit run scenarios/qa-test.yaml

Run All Scenarios in a Directory

    akit run scenarios/

Run with Glob Pattern

    akit run "scenarios/**/*.yaml"

Run Scenarios in Parallel
Run 4 scenarios concurrently:

    akit run scenarios/ --parallel 4

With Provider Override

    akit run scenarios/qa-test.yaml -p anthropic -m claude-3-5-sonnet-20241022

Filter by Tags

    akit run scenarios/qa-test.yaml --tags regression security

Save Results

    akit run scenarios/qa-test.yaml --save -o ./reports

Concurrent Execution

    akit run scenarios/qa-test.yaml --concurrency 5

Verbose Output

    akit run scenarios/qa-test.yaml -v

Custom Config

    akit run scenarios/qa-test.yaml --config ./custom-config.yaml

With Redaction
Redact PII from results using built-in patterns:

    akit run scenarios/qa-test.yaml --redact

With specific patterns:

    akit run scenarios/qa-test.yaml --redact --redact-patterns email phone api_key

Exit Codes
| Code | Meaning |
|---|---|
| 0 | All test cases passed |
| 1 | One or more test cases failed |
| 2 | Configuration or runtime error |
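
In a CI script, the exit codes in the table above can be used to tell failed test cases apart from a broken setup. A minimal sketch follows: `describe_exit` is a hypothetical helper name and the messages are illustrative; only the code meanings come from the table.

```shell
# describe_exit is a hypothetical helper; it maps an exit code from
# `akit run` to the meaning documented in the table above.
describe_exit() {
  case "$1" in
    0) echo "all test cases passed" ;;
    1) echo "one or more test cases failed" ;;
    2) echo "configuration or runtime error" ;;
    *) echo "undocumented exit code: $1" ;;
  esac
}

# Typical use (commented out so the sketch runs without akit installed):
#   akit run scenarios/qa-test.yaml
#   status=$?
#   describe_exit "$status"
#   exit "$status"

describe_exit 1
```

Re-exiting with the original status, as in the commented usage, lets the CI system see the same code that akit returned.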
Output
When --save is used (enabled by default), ArtemisKit creates files in the output directory:
- run_manifest.json: Complete run metadata and results
The manifest includes:
- Run ID and timestamps
- Provider and model used
- All test case results with pass/fail status
- Response latencies and token counts
- Git information (if in a git repo)
Example Output

    Running scenario: qa-test
    Provider: openai (gpt-5)

    ✓ greeting-test (234ms)
    ✓ math-test (189ms)
    ✗ complex-reasoning (512ms)
      Expected: contains ["explanation"]
      Got: Response did not contain expected values

    Results: 2/3 passed (66.7%)
    Total time: 1.2s