artemiskit run

Run scenario-based evaluations against your LLM.

artemiskit run <scenario> [options]
akit run <scenario> [options]

| Argument | Description |
| --- | --- |
| scenario | Path to scenario file, directory, or glob pattern |

The scenario argument supports:

  • Single file: scenarios/test.yaml
  • Directory: scenarios/ (runs all .yaml/.yml files recursively)
  • Glob pattern: scenarios/**/*.yaml (matches pattern)
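
To make the three accepted forms concrete, here is a minimal Python sketch of how such an argument could be expanded into a list of scenario files. It is not ArtemisKit's implementation; the function name `resolve_scenarios` and the sorting behaviour are assumptions made for illustration.

```python
import glob
from pathlib import Path

# Extensions treated as scenario files, per the directory rule above.
SCENARIO_SUFFIXES = {".yaml", ".yml"}


def resolve_scenarios(arg: str) -> list[Path]:
    """Expand a scenario argument into a sorted list of files (hypothetical helper)."""
    path = Path(arg)
    if path.is_file():
        # Single file: use it as-is.
        return [path]
    if path.is_dir():
        # Directory: collect .yaml/.yml files recursively.
        return sorted(
            p for p in path.rglob("*")
            if p.is_file() and p.suffix in SCENARIO_SUFFIXES
        )
    # Anything else is treated as a glob pattern; recursive=True lets
    # "**" match any number of nested directories.
    return sorted(
        Path(p) for p in glob.glob(arg, recursive=True) if Path(p).is_file()
    )
```

When passing a glob on the command line, quote it (as in `"scenarios/**/*.yaml"`) so the shell does not expand the pattern before the tool sees it.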

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| --provider | -p | LLM provider to use | From config/scenario |
| --model | -m | Model name | From config/scenario |
| --output | -o | Output directory for results | artemis-output |
| --verbose | -v | Verbose output | false |
| --tags | -t | Filter test cases by tags | All cases |
| --save | | Save results to storage (enabled by default) | true |
| --concurrency | -c | Number of concurrent test cases per scenario | 1 |
| --parallel | | Number of scenarios to run in parallel | Sequential |
| --timeout | | Timeout per test case (ms) | From config |
| --retries | | Number of retries per test case | From config |
| --config | | Path to config file | artemis.config.yaml |
| --redact | | Enable PII/sensitive data redaction | false |
| --redact-patterns | | Custom redaction patterns (regex or built-in) | Default patterns |

Built-in patterns: email, phone, credit_card, ssn, api_key

You can also use custom regex patterns.
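
As a rough illustration of regex-based redaction, the Python sketch below replaces matches of named or custom patterns with a placeholder. The regexes, the `[REDACTED]` placeholder, and the `redact` helper are all assumptions for the example; ArtemisKit's built-in pattern definitions and output format may differ.

```python
import re

# Hypothetical regexes for two of the built-in pattern names. These are
# illustrative only and are not taken from ArtemisKit.
EXAMPLE_PATTERNS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}


def redact(text: str, patterns: list[str], placeholder: str = "[REDACTED]") -> str:
    """Replace every match of each pattern with the placeholder.

    A known name is looked up in EXAMPLE_PATTERNS; any other string is
    treated as a custom regular expression.
    """
    for item in patterns:
        regex = EXAMPLE_PATTERNS.get(item, item)
        text = re.sub(regex, placeholder, text)
    return text
```

For instance, `redact("Contact jane@example.com", ["email"])` yields `Contact [REDACTED]`, and a custom pattern such as `r"ORD-\d+"` can be passed in the same list.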

Run a single scenario file:

akit run scenarios/qa-test.yaml

Run every scenario in a directory:

akit run scenarios/

Run scenarios matching a glob pattern:

akit run "scenarios/**/*.yaml"

Run 4 scenarios concurrently:

akit run scenarios/ --parallel 4

Override the provider and model:

akit run scenarios/qa-test.yaml -p anthropic -m claude-3-5-sonnet-20241022

Run only test cases with specific tags:

akit run scenarios/qa-test.yaml --tags regression security

Save results to a custom output directory:

akit run scenarios/qa-test.yaml --save -o ./reports

Run 5 test cases concurrently within a scenario:

akit run scenarios/qa-test.yaml --concurrency 5

Enable verbose output:

akit run scenarios/qa-test.yaml -v

Use a custom config file:

akit run scenarios/qa-test.yaml --config ./custom-config.yaml

Redact PII from results using built-in patterns:

akit run scenarios/qa-test.yaml --redact

With specific patterns:

akit run scenarios/qa-test.yaml --redact --redact-patterns email phone api_key

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | All test cases passed |
| 1 | One or more test cases failed |
| 2 | Configuration or runtime error |
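
The exit codes make it easy to tell failed tests apart from a broken setup in CI. The following Python sketch wraps the command and reports which case occurred. The helper names `describe_exit` and `run_scenarios` are hypothetical, and the script assumes `akit` is on the PATH.

```python
import subprocess
import sys

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "all test cases passed",
    1: "one or more test cases failed",
    2: "configuration or runtime error",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_scenarios(scenario: str) -> int:
    """Run `akit run <scenario>` and return its exit code (assumes akit is installed)."""
    result = subprocess.run(["akit", "run", scenario])
    print(
        f"akit exited with {result.returncode}: {describe_exit(result.returncode)}",
        file=sys.stderr,
    )
    return result.returncode
```

A CI step could call `sys.exit(run_scenarios("scenarios/"))` so the pipeline fails on either code 1 or code 2 while the log states which of the two happened.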

When results are saved (--save, enabled by default), ArtemisKit writes the following file to the output directory:

  • run_manifest.json — Complete run metadata and results

The manifest includes:

  • Run ID and timestamps
  • Provider and model used
  • All test case results with pass/fail status
  • Response latencies and token counts
  • Git information (if in a git repo)
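
The manifest's exact schema is not documented here, so the Python sketch below avoids assuming any field names: it locates `run_manifest.json` files under the output directory and lists each top-level key with the type of its value. The helper names are hypothetical, and searching recursively is an assumption in case runs are stored in subdirectories.

```python
import json
from pathlib import Path


def find_manifests(output_dir: str = "artemis-output") -> list[Path]:
    """Return every run_manifest.json found under output_dir, sorted by path."""
    return sorted(Path(output_dir).rglob("run_manifest.json"))


def top_level_summary(manifest_path: Path) -> dict[str, str]:
    """Map each top-level key of the manifest to the type name of its value."""
    data = json.loads(manifest_path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Unexpected shape: report the root type instead of guessing.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}
```

Printing `top_level_summary(path)` for each path from `find_manifests()` is a quick way to see which fields a given version actually emits before writing tooling against them.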

Example console output:

Running scenario: qa-test
Provider: openai (gpt-5)
✓ greeting-test (234ms)
✓ math-test (189ms)
✗ complex-reasoning (512ms)
Expected: contains ["explanation"]
Got: Response did not contain expected values
Results: 2/3 passed (66.7%)
Total time: 1.2s