
# Core Concepts

ArtemisKit is built around a few key concepts that apply across both the CLI and SDK. Understanding these will help you get the most out of the toolkit.

## Scenarios

Test suites containing prompts and expectations for evaluating LLM outputs.
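As an illustration, a YAML scenario pairs a prompt with an expectation. This is a hedged sketch: the `expected` field and `contains` matcher appear in the CLI/SDK comparison in this page, but the surrounding keys are assumptions, not documented ArtemisKit schema:

```yaml
# Sketch of a scenario file. Only `expected` and `contains` come from
# this page; `name` and `prompt` are illustrative key names.
name: greeting
prompt: "Greet the user politely."
expected:
  contains: "hello"
```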

## Expectations

Matchers that define how to evaluate LLM responses against expected behavior.
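To make the idea concrete, here is a minimal, self-contained sketch of what `contains()`- and `exact()`-style matchers do. This is illustrative TypeScript, not the actual ArtemisKit implementation:

```typescript
// A matcher takes an LLM response and returns pass/fail plus a reason.
type MatchResult = { pass: boolean; reason: string };
type Matcher = (response: string) => MatchResult;

// Sketch of a `contains` matcher: passes when the response
// includes the expected substring (case-insensitive here).
function contains(expected: string): Matcher {
  return (response) => {
    const pass = response.toLowerCase().includes(expected.toLowerCase());
    return {
      pass,
      reason: pass
        ? `response contains "${expected}"`
        : `response is missing "${expected}"`,
    };
  };
}

// Sketch of an `exact` matcher: passes only on an exact string match.
function exact(expected: string): Matcher {
  return (response) => ({
    pass: response === expected,
    reason: response === expected ? "exact match" : "no exact match",
  });
}

console.log(contains("hello")("Hello there!").pass); // true
```

Matchers compose well because they share one signature: the evaluator only needs to call the function and read `pass`.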

## Providers

LLM provider configurations for OpenAI, Anthropic, Azure, and more.
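A provider entry typically combines a provider name, a model, and credentials. A hypothetical config-file sketch follows; only the config-file mechanism and `--provider` flag are documented here, so every key name below is an assumption:

```yaml
# Hypothetical provider config sketch; key names are illustrative.
provider: openai
model: gpt-4o
apiKey: ${OPENAI_API_KEY}  # resolve from the environment, never commit keys
```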

## Evaluators

The evaluation engine that runs scenarios and produces results.
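Conceptually the engine is a loop: send each scenario's prompt to the provider, apply its expectation, and aggregate the results. A toy, self-contained sketch under those assumptions (not the real engine):

```typescript
// Toy sketch of an evaluation engine; illustrative, not ArtemisKit's code.
type Scenario = { name: string; prompt: string; expected: string };
type Result = { name: string; pass: boolean; latencyMs: number };

// Stand-in for a real provider call (OpenAI, Anthropic, Azure, ...).
async function callProvider(prompt: string): Promise<string> {
  return `echo: ${prompt}`;
}

// Run every scenario: send the prompt, apply a simple `contains`
// check against the expected text, and record latency.
async function runScenarios(scenarios: Scenario[]): Promise<Result[]> {
  const results: Result[] = [];
  for (const s of scenarios) {
    const start = Date.now();
    const response = await callProvider(s.prompt);
    results.push({
      name: s.name,
      pass: response.includes(s.expected),
      latencyMs: Date.now() - start,
    });
  }
  return results;
}

// Aggregate pass/fail into a single pass rate for the report.
function passRate(results: Result[]): number {
  return results.filter((r) => r.pass).length / results.length;
}
```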

```
┌─────────────────────────────────────────────────────────────┐
│                         ArtemisKit                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│  │  Scenarios  │───▶│ Evaluators  │───▶│   Results   │      │
│  │  (YAML/TS)  │    │ (Matchers)  │    │  (Reports)  │      │
│  └─────────────┘    └─────────────┘    └─────────────┘      │
│         │                  │                  │             │
│         ▼                  ▼                  ▼             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│  │  Provider   │    │  Guardian   │    │   Storage   │      │
│  │  (LLM API)  │    │(Protection) │    │  (History)  │      │
│  └─────────────┘    └─────────────┘    └─────────────┘      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
A typical workflow looks like this:

1. **Define scenarios** — Write test cases in YAML files or TypeScript using builders
2. **Configure providers** — Set up API keys and model settings
3. **Run evaluations** — Execute scenarios via the CLI (`akit run`) or SDK (`kit.run()`)
4. **Review results** — Analyze pass/fail rates, latency, and detailed responses
5. **Iterate** — Refine prompts, adjust expectations, and improve your LLM application

These concepts work the same way whether you’re using the CLI or SDK:

| Concept | CLI | SDK |
| --- | --- | --- |
| Scenarios | YAML files | YAML files or the `scenario()` builder |
| Expectations | YAML `expected` field | YAML or `contains()`, `exact()`, etc. |
| Providers | Config file or `--provider` flag | `ArtemisKit({ provider: '...' })` |
| Results | Terminal output + reports | `RunResult` object |
| Storage | `artemis-output/` directory | Configurable via the `storage` option |