Manage baseline runs for regression detection. Baselines allow you to track expected performance and detect regressions when metrics drop below acceptable thresholds.
artemiskit baseline <subcommand> [options]
akit baseline <subcommand> [options]
| Subcommand | Description |
|------------|-------------|
| `set` | Set a run as the baseline for a scenario |
| `list` | List all configured baselines |
| `get` | Get baseline details by run ID or scenario name |
| `remove` | Remove a baseline |
Set a run as the baseline for regression comparison.
akit baseline set <run-id> [options]
| Argument | Description |
|----------|-------------|
| `run-id` | The run ID to set as the baseline (from `akit history`) |
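| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--scenario` | `-s` | Override scenario name | From run manifest |
| `--tag` | `-t` | Optional tag or description for the baseline | None |
| `--config` | | Path to config file | `artemis.config.yaml` |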
# Set baseline from a specific run
akit baseline set abc123def456
# Set baseline with a tag
akit baseline set abc123def456 --tag "v1.0.0-release"
# Override the scenario name
akit baseline set abc123def456 --scenario "qa-regression-suite"
Subsequent runs of this scenario that use `--baseline` are compared against this baseline.
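You may want a release script to take the tag from version control. The function below is a minimal sketch and is not part of ArtemisKit. `set_release_baseline` is a hypothetical helper name, and the sketch assumes `akit` is on your `PATH`. It passes an explicit tag through unchanged. Without one, it falls back to the most recent git tag (from `git describe --tags --abbrev=0`), using only the documented `--tag` option.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ArtemisKit): set a baseline and tag it.
# Usage: set_release_baseline <run-id> [tag]
set_release_baseline() {
  run_id="$1"
  if [ -z "$run_id" ]; then
    echo "usage: set_release_baseline <run-id> [tag]" >&2
    return 1
  fi

  # Use the explicit tag if given; otherwise try the latest git tag.
  # The git command only runs when no second argument is passed.
  tag="${2:-$(git describe --tags --abbrev=0 2>/dev/null)}"

  if [ -n "$tag" ]; then
    akit baseline set "$run_id" --tag "$tag"
  else
    # No tag available: set the baseline without one.
    akit baseline set "$run_id"
  fi
}
```

For example, `set_release_baseline abc123def456 v1.0.0` runs `akit baseline set abc123def456 --tag v1.0.0`.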
List all configured baselines.
akit baseline list [options]
| Option | Description | Default |
|--------|-------------|---------|
| `--config` | Path to config file | `artemis.config.yaml` |
| `--json` | Output as JSON | `false` |
# List baselines as a table
akit baseline list
# List baselines as JSON
akit baseline list --json

Example table output:
╔════════════════════════════════════════════════════════════════════════════════════════════╗
╠════════════════════════════════════════════════════════════════════════════════════════════╣
║ Scenario Run ID Success Rate Created Tag ║
╟────────────────────────────────────────────────────────────────────────────────────────────╢
║ qa-test abc123def456 95.0% 1/15/2026 10:30 AM v1.0.0 ║
║ security-scan xyz789uvw123 88.5% 1/14/2026 3:45 PM - ║
╚════════════════════════════════════════════════════════════════════════════════════════════╝
Get baseline details by run ID or scenario name.
akit baseline get <identifier> [options]
| Argument | Description |
|----------|-------------|
| `identifier` | Run ID of the baseline (or scenario name with `--scenario`) |
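| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--scenario` | `-s` | Treat identifier as scenario name instead of run ID | `false` |
| `--config` | | Path to config file | `artemis.config.yaml` |
| `--json` | | Output as JSON | `false` |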
# Get baseline by run ID (default)
akit baseline get abc123def456
# Get baseline by scenario name
akit baseline get "qa-test" --scenario
# Get baseline as JSON
akit baseline get abc123def456 --json

The human-readable output includes the creation time, for example:
Created: 1/15/2026, 10:30:00 AM
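A script may need to know whether a scenario already has a baseline before it decides to compare against one or create one. The function below is a minimal sketch. It relies on an assumption this page does not document: that `akit baseline get <name> --scenario` exits with a non-zero status when no baseline exists. `has_baseline` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the scenario has a baseline.
# ASSUMPTION: `akit baseline get --scenario` exits non-zero when no
# baseline is configured for that scenario.
has_baseline() {
  scenario="$1"
  # Discard the details; only the exit status matters here.
  akit baseline get "$scenario" --scenario >/dev/null 2>&1
}

# Example: compare only when a baseline exists, otherwise just save the run.
# if has_baseline "qa-test"; then
#   akit run scenarios/qa-test.yaml --ci --baseline --threshold 0.05
# else
#   akit run scenarios/qa-test.yaml --save
# fi
```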
Remove a baseline by run ID or scenario name.
akit baseline remove <identifier> [options]
| Argument | Description |
|----------|-------------|
| `identifier` | Run ID of the baseline to remove (or scenario name with `--scenario`) |
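| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--scenario` | `-s` | Treat identifier as scenario name instead of run ID | `false` |
| `--force` | `-f` | Skip confirmation prompt | `false` |
| `--config` | | Path to config file | `artemis.config.yaml` |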
# Remove baseline by run ID (with confirmation)
akit baseline remove abc123def456
# Remove baseline by scenario name
akit baseline remove "qa-test" --scenario
# Force remove without confirmation
akit baseline remove abc123def456 --force
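A non-interactive script cannot answer the confirmation prompt, so it needs `--force`. The loop below is a sketch that removes the baselines for several scenarios by name. `remove_baselines` is a hypothetical helper, and the scenario names in the usage line are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove the baselines of several scenarios without
# prompting. Stops at the first failure and returns its exit status.
remove_baselines() {
  for scenario in "$@"; do
    akit baseline remove "$scenario" --scenario --force || return $?
  done
}

# Usage: remove_baselines "qa-test" "security-scan"
```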
Establish a baseline after a successful release:
# Run tests and save results
akit run scenarios/qa-test.yaml --save
# Check history to get the run ID
akit history
# Set this run as the baseline
akit baseline set abc123def456 --tag "v1.0.0"
Run tests with baseline comparison in CI:
# Compare against baseline, fail if regression detected
akit run scenarios/qa-test.yaml --ci --baseline --threshold 0.05
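One way to picture the threshold check is as a comparison of success rates. The function below illustrates that idea only and is not ArtemisKit's implementation. It assumes `--threshold 0.05` means the success rate may drop by at most 0.05 (five percentage points) in absolute terms. `is_regression` is a hypothetical name. The tool itself may define the threshold differently, for example as a relative drop or across several metrics.

```bash
#!/usr/bin/env bash
# Illustration only: is the drop in success rate larger than the threshold?
# ASSUMPTION: the threshold is an absolute drop (0.05 = 5 percentage points).
# Arguments: <baseline-rate> <current-rate> <threshold>, as fractions (0-1).
# Returns 0 (true) when a regression is detected, 1 otherwise.
is_regression() {
  awk -v base="$1" -v cur="$2" -v thr="$3" \
    'BEGIN { exit !((base - cur) > thr) }'
}

# Example: a baseline at 0.95 and a current run at 0.89 is a drop of 0.06,
# which exceeds 0.05, so `is_regression 0.95 0.89 0.05` succeeds.
```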
Update baseline when expected behavior changes:
# After intentional changes, update the baseline
akit baseline set <new-run-id> --tag "v1.1.0"
GitHub Actions example (workflow steps):

- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v1
- name: Run LLM Evaluation
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  run: bunx artemiskit run scenarios/ --ci --baseline --threshold 0.05

GitLab CI example (job excerpt):

script:
  - bunx artemiskit run scenarios/ --ci --baseline --threshold 0.05
rules:
  - if: $CI_PIPELINE_SOURCE == "merge_request_event"
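When several pipelines run the same check, you might wrap the command in a small script so the threshold is set in one place. This is a sketch. `ARTEMIS_THRESHOLD` is a hypothetical environment variable that only this script reads; ArtemisKit itself is not assumed to read it. The sketch also relies on the CI command exiting non-zero when a regression is detected, which is what "fail if regression detected" in the CI example above implies.

```bash
#!/usr/bin/env bash
# Hypothetical CI wrapper: run all scenarios against their baselines.
# ARTEMIS_THRESHOLD is read only by this script; it defaults to 0.05.
run_baseline_check() {
  threshold="${ARTEMIS_THRESHOLD:-0.05}"
  echo "Comparing against baselines (threshold: $threshold)"
  # The exit status of artemiskit becomes the function's status,
  # so a detected regression fails the CI job.
  bunx artemiskit run scenarios/ --ci --baseline --threshold "$threshold"
}

# In a CI step: run_baseline_check
```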
Baselines are stored in the .artemis/baselines.json file within your configured storage directory (default: artemis-runs/.artemis/baselines.json).
Each baseline stores:
- Scenario name
- Run ID reference
- Creation timestamp
- Key metrics (success rate, latency, tokens, case counts)
- Optional tag
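All baselines live in a single JSON file, so it can be worth copying that file before bulk changes such as `remove --force`. The function below is a minimal sketch that uses the default storage path given above. `backup_baselines` is a hypothetical helper. If you configured a different storage directory, pass it as the first argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy baselines.json to a timestamped backup file.
# Argument: storage directory (default: artemis-runs, as documented above).
# Prints the backup path on success.
backup_baselines() {
  storage_dir="${1:-artemis-runs}"
  src="$storage_dir/.artemis/baselines.json"
  if [ ! -f "$src" ]; then
    echo "no baselines file at $src" >&2
    return 1
  fi
  dest="$src.$(date +%Y%m%d%H%M%S).bak"
  cp "$src" "$dest" && echo "$dest"
}
```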