SDK Overview

The ArtemisKit SDK (@artemiskit/sdk) provides programmatic access to LLM evaluation, testing, and Guardian Mode for runtime protection.

```sh
npm install @artemiskit/sdk
```

Or with other package managers:

```sh
# Bun (recommended)
bun add @artemiskit/sdk

# pnpm
pnpm add @artemiskit/sdk

# Yarn
yarn add @artemiskit/sdk
```

Guardian Mode

Runtime protection with semantic validation, injection detection, PII filtering, and action validation. Learn more →

Evaluation API

Programmatic LLM evaluation with all CLI evaluators available in code. Learn more →

Scenario Builders

Type-safe fluent API for building scenarios programmatically without YAML. Learn more →

Test Integration

Jest and Vitest matchers for LLM testing in your test suites. Learn more →

Validation & Comparison

Pre-flight scenario validation and regression detection between runs. Learn more →

Agentic Adapters

Test LangChain chains/agents and DeepAgents multi-agent systems. Learn more →
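To illustrate the fluent-builder idea behind the scenario builders, here is a minimal, self-contained sketch of the pattern itself. The types and method names below are illustrative stand-ins, not the SDK's actual API:

```typescript
// Illustrative fluent builder (stand-in, not the SDK's real ScenarioBuilder).
interface Case { input: string; expect: string; }
interface Scenario { name: string; cases: Case[]; }

class ScenarioBuilder {
  private cases: Case[] = [];
  constructor(private name: string) {}

  // Each method returns `this`, so calls chain fluently.
  addCase(input: string, expect: string): this {
    this.cases.push({ input, expect });
    return this;
  }

  build(): Scenario {
    return { name: this.name, cases: this.cases };
  }
}

const scenario = new ScenarioBuilder('quality-tests')
  .addCase('What is 2 + 2?', '4')
  .addCase('Capital of France?', 'Paris')
  .build();

console.log(scenario.cases.length); // 2
```

The payoff of the fluent style is that scenarios live in your codebase with type checking and refactoring support, instead of in untyped YAML.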

Protect your LLM applications from prompt injection, jailbreaks, and unauthorized actions.

```ts
import { createGuardian } from '@artemiskit/sdk';
import { createAdapter } from '@artemiskit/core';

// Create your LLM client
const client = await createAdapter({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
});

// Create a guardian with protection settings
const guardian = createGuardian({
  mode: 'selective', // 'observe' | 'selective' | 'strict'
  validateInput: true,
  validateOutput: true,
  contentValidation: {
    strategy: 'semantic', // LLM-as-judge validation (new in 0.3.3)
    semanticThreshold: 0.9,
  },
});

// Wrap your client with guardian protection
const protectedClient = guardian.protect(client);

// All requests now go through Guardian
const result = await protectedClient.generate({
  prompt: 'What is the capital of France?',
  maxTokens: 100,
});
```
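The `semanticThreshold` acts as a cutoff on a judge score: content whose LLM-as-judge score falls below the threshold is flagged. A minimal sketch of that decision, with illustrative names that are not part of the SDK:

```typescript
// Illustrative only: how a semanticThreshold-style cutoff behaves.
// judgeScore stands in for an LLM-as-judge relevance score in [0, 1].
function passesSemanticCheck(judgeScore: number, threshold = 0.9): boolean {
  return judgeScore >= threshold;
}

console.log(passesSemanticCheck(0.95)); // on-topic response passes
console.log(passesSemanticCheck(0.4));  // off-topic response is flagged
```

Raising the threshold makes validation stricter; what happens to flagged content then depends on the guardian's `mode`.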

Run pre-flight checks to validate scenario files before executing them:

```ts
import { ArtemisKit } from '@artemiskit/sdk';

const kit = new ArtemisKit({ project: 'my-project' });

// Validate scenario files
const validation = await kit.validate({
  scenario: './scenarios/**/*.yaml',
  strict: true,
});

if (!validation.valid) {
  console.error('Validation errors:', validation.errors);
  process.exit(1);
}
```

Compare runs to detect regressions:

```ts
const comparison = await kit.compare({
  baseline: 'baseline-run-id',
  current: 'current-run-id',
  threshold: 0.05,
});

if (comparison.regression) {
  console.error(`Regression: ${comparison.delta.passRate}% drop`);
}
```
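The threshold semantics can be pictured as a simple delta check: a run counts as a regression when its pass rate drops below the baseline by more than the threshold. A generic sketch under that assumption (the function is illustrative, not the SDK's implementation):

```typescript
// Illustrative regression check: flag a pass-rate drop beyond a threshold.
function isRegression(
  baselinePassRate: number,
  currentPassRate: number,
  threshold = 0.05,
): boolean {
  return baselinePassRate - currentPassRate > threshold;
}

console.log(isRegression(0.95, 0.88)); // drop of 0.07 exceeds 0.05
console.log(isRegression(0.95, 0.93)); // drop of 0.02 is within tolerance
```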

Run programmatic evaluations with full access to all evaluator types.

```ts
import { ArtemisKit } from '@artemiskit/sdk';

const kit = new ArtemisKit({
  provider: 'openai',
  model: 'gpt-4o',
  project: 'my-project',
});

// Run scenario-based evaluation
const results = await kit.run({
  scenario: './scenarios/quality-tests.yaml',
});
console.log(`Pass rate: ${results.manifest.metrics.pass_rate * 100}%`);

// Red-team security testing
const redteamResults = await kit.redteam({
  scenario: './scenarios/my-app.yaml',
  mutations: ['typo', 'role-spoof', 'encoding'],
  countPerCase: 5,
});

// Stress testing
const stressResults = await kit.stress({
  scenario: './scenarios/load-test.yaml',
  concurrency: 10,
  duration: 60,
});
```
With Jest, extend `expect` with the SDK's matchers:

```ts
import { ArtemisKit } from '@artemiskit/sdk';
import { jestMatchers } from '@artemiskit/sdk/jest';

// Extend Jest with ArtemisKit matchers
expect.extend(jestMatchers);

describe('My LLM App', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
      project: 'jest-tests',
    });
  });

  it('should pass all test cases', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });
    expect(results).toPassAllCases();
  });

  it('should achieve a 90% success rate', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });
    expect(results).toHaveSuccessRate(0.9);
  });

  it('should pass red team testing', async () => {
    const results = await kit.redteam({
      scenario: './scenarios/quality.yaml',
      mutations: ['typo', 'role-spoof'],
    });
    expect(results).toPassRedTeam();
    expect(results).toHaveNoCriticalVulnerabilities();
  });
});
```
With Vitest, the setup is the same, using the Vitest matcher entry point:

```ts
import { ArtemisKit } from '@artemiskit/sdk';
import { vitestMatchers } from '@artemiskit/sdk/vitest';
import { beforeAll, describe, expect, test } from 'vitest';

// Extend Vitest with ArtemisKit matchers
expect.extend(vitestMatchers);

describe('My LLM App', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
      project: 'vitest-tests',
    });
  });

  test('should pass all cases', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });
    expect(results).toPassAllCases();
  }, 60_000);

  test('should have acceptable latency', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });
    expect(results).toHaveMedianLatencyBelow(5000);
    expect(results).toHaveP95LatencyBelow(10000);
  }, 60_000);
});
```
Evaluation matchers:

| Matcher | Description |
| --- | --- |
| `toPassAllCases()` | All test cases passed |
| `toHaveSuccessRate(rate)` | Achieves a minimum success rate (0–1) |
| `toPassCasesWithTag(tag)` | All cases with the given tag passed |
| `toHaveMedianLatencyBelow(ms)` | Median latency under the threshold |
| `toHaveP95LatencyBelow(ms)` | P95 latency under the threshold |

Red team matchers:

| Matcher | Description |
| --- | --- |
| `toPassRedTeam()` | No vulnerabilities found |
| `toHaveDefenseRate(rate)` | Achieves a minimum defense rate (0–1) |
| `toHaveNoCriticalVulnerabilities()` | No critical-severity issues |
| `toHaveNoHighSeverityVulnerabilities()` | No high-severity issues |

Stress test matchers:

| Matcher | Description |
| --- | --- |
| `toPassStressTest()` | Stress test passed |
| `toHaveStressSuccessRate(rate)` | Achieves a minimum success rate under load |
| `toAchieveRPS(rps)` | Achieves a minimum requests per second |
| `toHaveStressP95LatencyBelow(ms)` | P95 latency under the threshold |
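Matchers of this kind follow the standard Jest/Vitest custom-matcher contract: a function that receives the asserted value and returns `{ pass, message }`. A minimal hand-rolled sketch of a success-rate matcher, with an assumed results shape (this is illustrative, not the SDK's implementation):

```typescript
// Illustrative custom matcher in the Jest/Vitest style (not the SDK's code).
// A matcher returns { pass, message }; `message` explains a failed assertion.
interface RunResults { passed: number; total: number; }

function toHaveSuccessRateMatcher(received: RunResults, rate: number) {
  const actual = received.passed / received.total;
  return {
    pass: actual >= rate,
    message: () =>
      `expected success rate >= ${rate}, got ${actual.toFixed(2)}`,
  };
}

const result = toHaveSuccessRateMatcher({ passed: 9, total: 10 }, 0.9);
console.log(result.pass); // 9/10 meets the 0.9 bar
```

Registering such a function via `expect.extend({ toHaveSuccessRate: ... })` is what makes `expect(results).toHaveSuccessRate(0.9)` available in a test.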