CI/CD Integration
Automate LLM evaluation as part of your deployment pipeline. ArtemisKit provides CI-friendly output formats, JUnit XML export, and validation tools.
Quick Start
```bash
# Validate scenarios first (fail fast)
akit validate scenarios/ --strict

# Run tests with JUnit export for CI platforms
akit run scenarios/ --ci --export junit --export-output ./test-results

# Run security tests
akit redteam scenarios/chatbot.yaml --export junit --export-output ./security-results
```

GitHub Actions
Basic Workflow with JUnit Reports

```yaml
name: LLM Evaluation

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# publish-unit-test-result-action needs these to create check runs and PR comments
permissions:
  contents: read
  checks: write
  pull-requests: write

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Bun
        uses: oven-sh/setup-bun@v2

      - name: Install ArtemisKit
        run: bun add -g @artemiskit/cli

      - name: Validate Scenarios
        run: akit validate scenarios/ --strict --export junit --export-output ./validation-results

      - name: Run Evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: akit run scenarios/ --ci --export junit --export-output ./test-results

      - name: Publish Test Results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: |
            validation-results/*.xml
            test-results/*.xml

      - name: Upload Artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evaluation-results
          path: |
            validation-results/
            test-results/
```

With Regression Detection
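The following `jobs` block uploads the output of each `main` run as a `baseline-results` artifact and compares later runs against it with `akit compare --strict`:

```yaml
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download Baseline
        uses: actions/download-artifact@v4
        with:
          name: baseline-results
          path: ./baseline
        continue-on-error: true

      - name: Install ArtemisKit
        run: npm install -g @artemiskit/cli

      - name: Run Evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: akit run scenarios/regression.yaml --save

      - name: Compare with Baseline
        if: hashFiles('baseline/run_manifest.json') != ''
        run: |
          akit compare baseline/run_manifest.json artemis-output/run_manifest.json --strict

      - name: Update Baseline
        if: github.ref == 'refs/heads/main'
        uses: actions/upload-artifact@v4
        with:
          name: baseline-results
          path: artemis-output/
```

By default, `actions/download-artifact@v4` only downloads artifacts uploaded earlier in the same workflow run, so the Download Baseline step as written will not find a baseline saved by a previous run. To fetch one, also pass `run-id` (the ID of the run that uploaded `baseline-results`) and `github-token` (a token with `actions: read` permission) in that step's `with:` block.

Exit Codes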
`akit` reports the outcome through its exit code, so a CI step that runs it fails the build on any non-zero code:
| Code | Meaning | Action |
|---|---|---|
| 0 | All tests passed | Continue |
| 1 | Tests failed | Fail build |
| 2 | Configuration error | Fail build |
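CI systems already fail a step on any non-zero exit code. If you also want the log to say why the step failed, a small wrapper can branch on the codes in the table above. This is a sketch, not part of ArtemisKit: `akit_gate` is a hypothetical helper name, and it assumes only the codes listed in the table.

```bash
# Hypothetical helper (not an ArtemisKit command): run a command, normally
# `akit ...`, print what its exit code means, then return that same code
# so the CI step still fails on 1 or 2.
akit_gate() {
  local status=0
  "$@" || status=$?
  case "$status" in
    0) echo "akit: all tests passed" ;;
    1) echo "akit: one or more tests failed" >&2 ;;
    2) echo "akit: configuration error, check scenario files and CLI flags" >&2 ;;
    *) echo "akit: unexpected exit code $status" >&2 ;;
  esac
  return "$status"
}

# Usage in a CI step:
# akit_gate akit run scenarios/ --ci --export junit --export-output ./test-results
```

Because the original code is passed through unchanged, the wrapper only adds a log line; pass/fail behavior in the pipeline stays the same.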
Environment Variables
Store provider API keys as repository secrets and expose them to jobs or steps through `env`:
```yaml
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Scheduled Runs
Run evaluations on a schedule:
```yaml
on:
  schedule:
    - cron: '0 0 * * *' # Daily at 00:00 UTC
```

GitLab CI
```yaml
llm-evaluation:
  image: node:20
  script:
    - npm install -g @artemiskit/cli
    - akit run scenarios/regression.yaml --save
  artifacts:
    when: always # keep results even when the job fails
    paths:
      - artemis-output/
    expire_in: 30 days
  variables:
    # Define OPENAI_API_KEY as a masked variable under Settings > CI/CD > Variables
    OPENAI_API_KEY: $OPENAI_API_KEY
```

Best Practices
- Store API keys as secrets — Never commit API keys
- Use deterministic seeds — Ensure reproducible results
- Set reasonable timeouts — Prevent hanging builds
- Archive results — Keep history for comparison
- Run on PRs — Catch regressions before merge
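Some of these practices can also be enforced from the job script itself. The sketch below checks that an API key secret was actually injected into the environment and bounds a run with GNU coreutils `timeout` (available on GitHub's `ubuntu-latest` runners and in the Debian-based `node:20` image). `require_env` and `run_with_timeout` are hypothetical helper names, not ArtemisKit commands, and the 15-minute limit is an arbitrary example.

```bash
# Hypothetical helper: fail fast if a required secret is missing or empty,
# instead of letting the run fail later with an authentication error.
require_env() {
  local name
  for name in "$@"; do
    # printenv prints nothing when the variable is not exported
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required environment variable: $name" >&2
      return 1
    fi
  done
  return 0
}

# Hypothetical helper: kill the wrapped command if it exceeds the limit
# (e.g. 15m). GNU timeout exits with 124 when the limit is reached.
run_with_timeout() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after $limit: $*" >&2
  fi
  return "$status"
}

# Usage in a CI step:
# require_env OPENAI_API_KEY
# run_with_timeout 15m akit run scenarios/ --ci --export junit --export-output ./test-results
```

Platform-level limits such as `timeout-minutes` on a GitHub Actions job or step, or `timeout` on a GitLab CI job, achieve the same bound without a wrapper.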
See Also
- Validate Command — Pre-flight scenario validation
- Run Command — Execute scenarios with JUnit export
- Red Team Command — Security testing with JUnit export
- Compare Command — Compare runs for regression
- Baseline Command — Manage baselines for regression detection