
artemiskit baseline

Manage baseline runs for regression detection. Baselines allow you to track expected performance and detect regressions when metrics drop below acceptable thresholds.

```bash
artemiskit baseline <subcommand> [options]
akit baseline <subcommand> [options]
```
| Subcommand | Description |
|---|---|
| `set` | Set a run as the baseline for a scenario |
| `list` | List all configured baselines |
| `get` | Get baseline details by run ID or scenario name |
| `remove` | Remove a baseline by run ID or scenario name |

Set a run as the baseline for regression comparison.

```bash
akit baseline set <run-id> [options]
```
| Argument | Description |
|---|---|
| `run-id` | The run ID to set as the baseline (from `akit history`) |

| Option | Short | Description | Default |
|---|---|---|---|
| `--scenario` | `-s` | Override scenario name | From run manifest |
| `--tag` | `-t` | Optional tag or description for the baseline | None |
| `--config` | | Path to config file | `artemis.config.yaml` |
```bash
# Set baseline from a specific run
akit baseline set abc123def456
# Set baseline with a tag
akit baseline set abc123def456 --tag "v1.0.0-release"
# Override scenario name
akit baseline set abc123def456 --scenario "qa-regression-suite"
```

Example output:

```text
✔ Baseline created
Scenario: qa-test
Run ID: abc123def456
Success Rate: 95.0%
Test Cases: 19/20 passed
Tag: v1.0.0-release
Future runs of this scenario will be compared against this baseline.
```

List all configured baselines.

```bash
akit baseline list [options]
```

| Option | Description | Default |
|---|---|---|
| `--config` | Path to config file | `artemis.config.yaml` |
| `--json` | Output as JSON | `false` |
```bash
# List all baselines
akit baseline list
# Output as JSON
akit baseline list --json
```

Example output:

```text
╔════════════════════════════════════════════════════════════════════════════════════════════╗
║ BASELINES ║
╠════════════════════════════════════════════════════════════════════════════════════════════╣
║ Scenario Run ID Success Rate Created Tag ║
╟────────────────────────────────────────────────────────────────────────────────────────────╢
║ qa-test abc123def456 95.0% 1/15/2026 10:30 AM v1.0.0 ║
║ security-scan xyz789uvw123 88.5% 1/14/2026 3:45 PM - ║
╚════════════════════════════════════════════════════════════════════════════════════════════╝
2 baselines configured
```

Get baseline details by run ID or scenario name.

```bash
akit baseline get <identifier> [options]
```

| Argument | Description |
|---|---|
| `identifier` | Run ID of the baseline (or scenario name with `--scenario`) |

| Option | Short | Description | Default |
|---|---|---|---|
| `--scenario` | `-s` | Treat identifier as scenario name instead of run ID | `false` |
| `--config` | | Path to config file | `artemis.config.yaml` |
| `--json` | | Output as JSON | `false` |
```bash
# Get baseline by run ID (default)
akit baseline get abc123def456
# Get baseline by scenario name
akit baseline get "qa-test" --scenario
# Output as JSON
akit baseline get abc123def456 --json
```

Example output:

```text
Baseline: qa-test
Run ID: abc123def456
Created: 1/15/2026, 10:30:00 AM
Success Rate: 95.0%
Test Cases: 19/20
Latency: 150ms (median)
Tokens: 12,500
Tag: v1.0.0-release
```

Remove a baseline by run ID or scenario name.

```bash
akit baseline remove <identifier> [options]
```

| Argument | Description |
|---|---|
| `identifier` | Run ID of the baseline to remove (or scenario name with `--scenario`) |

| Option | Short | Description | Default |
|---|---|---|---|
| `--scenario` | `-s` | Treat identifier as scenario name instead of run ID | `false` |
| `--force` | `-f` | Skip confirmation prompt | `false` |
| `--config` | | Path to config file | `artemis.config.yaml` |
```bash
# Remove baseline by run ID (with confirmation)
akit baseline remove abc123def456
# Remove baseline by scenario name
akit baseline remove "qa-test" --scenario
# Force remove without confirmation
akit baseline remove abc123def456 --force
```
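When a baseline has to be removed from a script or CI job, the confirmation prompt must be skipped. Here is a minimal bash sketch that combines the documented `--scenario` and `--force` flags and assumes `akit` is on `PATH`. The function name `remove_scenario_baseline` and the `AKIT` override variable are hypothetical conveniences, not part of ArtemisKit:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: remove the baseline for a scenario without prompting.
# AKIT can be overridden (e.g. AKIT=echo) to print the command instead of running it.
remove_scenario_baseline() {
  local scenario="${1:?usage: remove_scenario_baseline <scenario-name>}"
  # --scenario: treat the identifier as a scenario name instead of a run ID
  # --force:    skip the confirmation prompt
  "${AKIT:-akit}" baseline remove "$scenario" --scenario --force
}
```

Calling `AKIT=echo remove_scenario_baseline qa-test` prints the command rather than executing it. This is a cheap way to check the arguments before wiring the helper into a pipeline.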

A typical regression-testing workflow:

1. Establish a baseline after a successful release:

   ```bash
   # Run tests and save results
   akit run scenarios/qa-test.yaml --save
   # Check history to get the run ID
   akit history
   # Set this run as the baseline
   akit baseline set abc123def456 --tag "v1.0.0"
   ```

2. Run tests with baseline comparison in CI:

   ```bash
   # Compare against baseline, fail if regression detected
   akit run scenarios/qa-test.yaml --ci --baseline --threshold 0.05
   ```

3. Update the baseline when expected behavior changes:

   ```bash
   # After intentional changes, update the baseline
   akit baseline set new_run_id --tag "v1.1.0"
   ```
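Updating the baseline is the step most worth scripting, because it changes what every later run is compared against. The following minimal bash sketch uses only the `set` and `get` subcommands documented on this page. `promote_baseline` and the `AKIT` override are hypothetical names, and the run ID still has to be chosen by hand from `akit history`:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: promote a reviewed run to baseline, then print the
# stored baseline so the new metrics and tag can be checked immediately.
# AKIT can be overridden (e.g. AKIT=echo) for a dry run.
promote_baseline() {
  local run_id="${1:?usage: promote_baseline <run-id> <tag>}"
  local tag="${2:?usage: promote_baseline <run-id> <tag>}"
  "${AKIT:-akit}" baseline set "$run_id" --tag "$tag"
  "${AKIT:-akit}" baseline get "$run_id"
}
```

For example, `promote_baseline new_run_id v1.1.0` sets the baseline and then shows its details, so the new success rate and tag can be reviewed in the same step.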
GitHub Actions:

```yaml
name: LLM Evaluation
on: [push, pull_request]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install
      - name: Run LLM Evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          bunx artemiskit run scenarios/ --ci --baseline --threshold 0.05
```

GitLab CI:

```yaml
llm-evaluation:
  stage: test
  script:
    - bunx artemiskit run scenarios/ --ci --baseline --threshold 0.05
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```
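The `--threshold 0.05` value in these pipelines controls how large a drop is tolerated before the run fails. This page does not define the exact comparison. The bash sketch below therefore assumes the threshold is an absolute drop in success rate, where 0.05 means 5 percentage points. ArtemisKit's real rule may differ; for example, it could use a relative drop or also check latency and tokens. `is_regression` is a hypothetical helper for illustration only:

```bash
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: a regression means the success rate fell by MORE than
# `threshold`, measured as an absolute fraction (0.05 = 5 percentage points).
# This illustrates the idea only; it is not ArtemisKit's implementation.
# Exit status 0 (true) = regression detected, 1 = within tolerance.
is_regression() {
  local baseline="$1" current="$2" threshold="$3"
  # awk does the floating-point math; t + 0 forces a numeric comparison
  awk -v b="$baseline" -v c="$current" -v t="$threshold" \
    'BEGIN { exit !((b - c) > t + 0) }'
}
```

With a baseline success rate of 0.95, a run at 0.885 drops by 0.065 and is flagged. A run at 0.93 drops by only 0.02 and passes.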

Baselines are stored in the `.artemis/baselines.json` file within your configured storage directory (default: `artemis-runs/.artemis/baselines.json`).

Each baseline stores:

- Scenario name
- Run ID reference
- Creation timestamp
- Key metrics (success rate, latency, tokens, case counts)
- Optional tag
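Assuming the default local storage, a CI job can only compare against baselines if `baselines.json` is present in the job's workspace, for example committed to the repository or restored from a cache. The following minimal bash sketch shows a pre-flight check. `baselines_file` and `require_baselines` are hypothetical helpers, and the path layout follows the default location described above:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Path of the baselines file for a given storage directory
# (defaults to artemis-runs, matching the documented default location).
baselines_file() {
  printf '%s/.artemis/baselines.json\n' "${1:-artemis-runs}"
}

# Hypothetical CI guard: fail early with a clear message if the
# baselines file is missing from the workspace.
require_baselines() {
  local file
  file="$(baselines_file "${1:-}")"
  if [[ ! -f "$file" ]]; then
    echo "No baselines found at $file" >&2
    return 1
  fi
}
```

Running `require_baselines` as a step before `bunx artemiskit run ... --baseline` makes a missing baselines file fail the job before any scenarios run.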