# CI/CD Integration

Automate LLM evaluation as part of your deployment pipeline. ArtemisKit provides CI-friendly output formats, JUnit XML export, and validation tools.

```sh
# Validate scenarios first (fail fast)
akit validate scenarios/ --strict

# Run tests with JUnit export for CI platforms
akit run scenarios/ --ci --export junit --export-output ./test-results

# Run security tests
akit redteam scenarios/chatbot.yaml --export junit --export-output ./security-results
```
.github/workflows/llm-evaluation.yml

```yaml
name: LLM Evaluation

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Bun
        uses: oven-sh/setup-bun@v2

      - name: Install ArtemisKit
        run: bun add -g @artemiskit/cli

      - name: Validate Scenarios
        run: akit validate scenarios/ --strict --export junit --export-output ./validation-results

      - name: Run Evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: akit run scenarios/ --ci --export junit --export-output ./test-results

      - name: Publish Test Results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: |
            validation-results/*.xml
            test-results/*.xml

      - name: Upload Artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evaluation-results
          path: |
            validation-results/
            test-results/
```
To catch regressions, save each run and compare it against a stored baseline:

```yaml
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download Baseline
        uses: actions/download-artifact@v4
        with:
          name: baseline-results
          path: ./baseline
        continue-on-error: true

      - name: Install ArtemisKit
        run: npm install -g @artemiskit/cli

      - name: Run Evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: akit run scenarios/regression.yaml --save

      - name: Compare with Baseline
        if: hashFiles('baseline/run_manifest.json') != ''
        run: |
          akit compare baseline/run_manifest.json artemis-output/run_manifest.json --strict

      - name: Update Baseline
        if: github.ref == 'refs/heads/main'
        uses: actions/upload-artifact@v4
        with:
          name: baseline-results
          path: artemis-output/
```

Use exit codes to fail builds:

| Code | Meaning | Action |
| ---- | ------- | ------ |
| 0 | All tests passed | Continue |
| 1 | Tests failed | Fail build |
| 2 | Configuration error | Fail build |
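Most CI systems fail the build automatically on any non-zero exit code. If you instead want to report each outcome explicitly, the codes can be mapped in a small POSIX-shell wrapper. A sketch, using a stand-in `status` value where a real pipeline would capture it with `akit run scenarios/ --ci; status=$?`:

```shell
# Stand-in for: akit run scenarios/ --ci; status=$?
status=1

case "$status" in
  0) msg="All tests passed; continue" ;;
  1) msg="Tests failed; fail the build" ;;
  2) msg="Configuration error; fail the build" ;;
  *) msg="Unknown exit code: $status" ;;
esac

echo "$msg"
```

In a real pipeline you would follow the `case` with `[ "$status" -eq 0 ] || exit 1` so the job still fails after reporting.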

Store secrets securely:

```yaml
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Run evaluations on a schedule:

```yaml
on:
  schedule:
    - cron: '0 0 * * *' # Daily at midnight
```
.gitlab-ci.yml

```yaml
llm-evaluation:
  image: node:20
  script:
    - npm install -g @artemiskit/cli
    - akit run scenarios/regression.yaml --save
  artifacts:
    paths:
      - artemis-output/
    expire_in: 30 days
  variables:
    OPENAI_API_KEY: $OPENAI_API_KEY
```
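If you also export JUnit XML (as in the GitHub Actions example), GitLab can render test results directly in merge requests via its `artifacts:reports:junit` key. A sketch combining the two, reusing the `--export junit` flags from earlier on this page:

```yaml
llm-evaluation:
  image: node:20
  script:
    - npm install -g @artemiskit/cli
    - akit run scenarios/ --ci --export junit --export-output ./test-results
  artifacts:
    reports:
      junit: test-results/*.xml
    paths:
      - test-results/
  variables:
    OPENAI_API_KEY: $OPENAI_API_KEY
```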
  1. Store API keys as secrets — Never commit API keys
  2. Use deterministic seeds — Ensure reproducible results
  3. Set reasonable timeouts — Prevent hanging builds
  4. Archive results — Keep history for comparison
  5. Run on PRs — Catch regressions before merge
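For item 3, GitHub Actions' built-in `timeout-minutes` key is one way to keep a stuck model call from hanging the build; a minimal fragment (the 15-minute value is an arbitrary example, tune it to your scenario suite):

```yaml
jobs:
  evaluate:
    runs-on: ubuntu-latest
    timeout-minutes: 15 # fail the job instead of waiting indefinitely
    steps:
      - uses: actions/checkout@v4
      - run: akit run scenarios/ --ci
```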