SDK Overview

The ArtemisKit SDK (@artemiskit/sdk) provides programmatic access to LLM evaluation, testing, and Guardian Mode for runtime protection.

```sh
npm install @artemiskit/sdk
```

Or with other package managers:

```sh
# Bun (recommended)
bun add @artemiskit/sdk

# pnpm
pnpm add @artemiskit/sdk

# Yarn
yarn add @artemiskit/sdk
```

Guardian Mode

Runtime protection with semantic validation, injection detection, PII filtering, and action validation. Learn more →

Evaluation API

Programmatic LLM evaluation with all CLI evaluators available in code. Learn more →

Scenario Builders

Type-safe fluent API for building scenarios programmatically without YAML. Learn more →

Test Integration

Jest and Vitest matchers for LLM testing in your test suites. Learn more →

Validation & Comparison

Pre-flight scenario validation and regression detection between runs. Learn more →

Agentic Adapters

Test LangChain chains/agents and DeepAgents multi-agent systems. Learn more →
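To illustrate the fluent-builder idea behind the scenario builders, here is a minimal, self-contained sketch of the pattern itself. The types and method names below are illustrative stand-ins, not the SDK's actual API:

```typescript
// Illustrative fluent builder (stand-in, not the SDK's real ScenarioBuilder).
interface Case { input: string; expect: string; }
interface Scenario { name: string; cases: Case[]; }

class ScenarioBuilder {
  private cases: Case[] = [];
  constructor(private name: string) {}

  // Each method returns `this`, so calls chain fluently.
  addCase(input: string, expect: string): this {
    this.cases.push({ input, expect });
    return this;
  }

  build(): Scenario {
    return { name: this.name, cases: this.cases };
  }
}

const scenario = new ScenarioBuilder('quality-tests')
  .addCase('What is 2 + 2?', '4')
  .addCase('Capital of France?', 'Paris')
  .build();

console.log(scenario.cases.length); // 2
```

The payoff of the fluent style is that scenarios live in your codebase with type checking and refactoring support, instead of in untyped YAML.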

Protect your LLM applications from prompt injection, jailbreaks, and unauthorized actions.

```ts
import { createGuardian } from '@artemiskit/sdk';
import { createAdapter } from '@artemiskit/core';

// Create your LLM client
const client = await createAdapter({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
});

// Create a guardian with protection settings
const guardian = createGuardian({
  mode: 'selective', // 'observe' | 'selective' | 'strict'
  validateInput: true,
  validateOutput: true,
  contentValidation: {
    strategy: 'semantic', // LLM-as-judge validation (new in 0.3.3)
    semanticThreshold: 0.9,
  },
});

// Wrap your client with guardian protection
const protectedClient = guardian.protect(client);

// All requests now go through Guardian
const result = await protectedClient.generate({
  prompt: 'What is the capital of France?',
  maxTokens: 100,
});
```
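The `semanticThreshold` acts as a cutoff on a judge score: content whose LLM-as-judge score falls below the threshold is flagged. A minimal sketch of that decision, with illustrative names that are not part of the SDK:

```typescript
// Illustrative only: how a semanticThreshold-style cutoff behaves.
// judgeScore stands in for an LLM-as-judge relevance score in [0, 1].
function passesSemanticCheck(judgeScore: number, threshold = 0.9): boolean {
  return judgeScore >= threshold;
}

console.log(passesSemanticCheck(0.95)); // on-topic response passes
console.log(passesSemanticCheck(0.4));  // off-topic response is flagged
```

Raising the threshold makes validation stricter; what happens to flagged content then depends on the guardian's `mode`.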

Run pre-flight checks to validate scenario files before executing them:

```ts
import { ArtemisKit } from '@artemiskit/sdk';

const kit = new ArtemisKit({ project: 'my-project' });

// Validate scenario files
const validation = await kit.validate({
  scenario: './scenarios/**/*.yaml',
  strict: true,
});

if (!validation.valid) {
  console.error('Validation errors:', validation.errors);
  process.exit(1);
}
```

Compare runs to detect regressions:

```ts
const comparison = await kit.compare({
  baseline: 'baseline-run-id',
  current: 'current-run-id',
  threshold: 0.05,
});

if (comparison.regression) {
  console.error(`Regression: ${comparison.delta.passRate}% drop`);
}
```
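The threshold semantics can be pictured as a simple delta check: a run counts as a regression when its pass rate drops below the baseline by more than the threshold. A generic sketch under that assumption (the function is illustrative, not the SDK's implementation):

```typescript
// Illustrative regression check: flag a pass-rate drop beyond a threshold.
function isRegression(
  baselinePassRate: number,
  currentPassRate: number,
  threshold = 0.05,
): boolean {
  return baselinePassRate - currentPassRate > threshold;
}

console.log(isRegression(0.95, 0.88)); // drop of 0.07 exceeds 0.05
console.log(isRegression(0.95, 0.93)); // drop of 0.02 is within tolerance
```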

Run programmatic evaluations with full access to all evaluator types.

```ts
import { ArtemisKit } from '@artemiskit/sdk';

const kit = new ArtemisKit({
  provider: 'openai',
  model: 'gpt-4o',
  project: 'my-project',
});

// Run scenario-based evaluation
const results = await kit.run({
  scenario: './scenarios/quality-tests.yaml',
});
console.log(`Pass rate: ${results.manifest.metrics.pass_rate * 100}%`);

// Red-team security testing
const redteamResults = await kit.redteam({
  scenario: './scenarios/my-app.yaml',
  mutations: ['typo', 'role-spoof', 'encoding'],
  countPerCase: 5,
});

// Stress testing
const stressResults = await kit.stress({
  scenario: './scenarios/load-test.yaml',
  concurrency: 10,
  duration: 60,
});
```
With Jest, extend `expect` with the SDK's matchers:

```ts
import { ArtemisKit } from '@artemiskit/sdk';
import { jestMatchers } from '@artemiskit/sdk/jest';

// Extend Jest with ArtemisKit matchers
expect.extend(jestMatchers);

describe('My LLM App', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
      project: 'jest-tests',
    });
  });

  it('should pass all test cases', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });
    expect(results).toPassAllCases();
  });

  it('should achieve a 90% success rate', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });
    expect(results).toHaveSuccessRate(0.9);
  });

  it('should pass red team testing', async () => {
    const results = await kit.redteam({
      scenario: './scenarios/quality.yaml',
      mutations: ['typo', 'role-spoof'],
    });
    expect(results).toPassRedTeam();
    expect(results).toHaveNoCriticalVulnerabilities();
  });
});
```
With Vitest, the setup is the same, using the Vitest matcher entry point:

```ts
import { ArtemisKit } from '@artemiskit/sdk';
import { vitestMatchers } from '@artemiskit/sdk/vitest';
import { beforeAll, describe, expect, test } from 'vitest';

// Extend Vitest with ArtemisKit matchers
expect.extend(vitestMatchers);

describe('My LLM App', () => {
  let kit: ArtemisKit;

  beforeAll(() => {
    kit = new ArtemisKit({
      provider: 'openai',
      model: 'gpt-4o-mini',
      project: 'vitest-tests',
    });
  });

  test('should pass all cases', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });
    expect(results).toPassAllCases();
  }, 60_000);

  test('should have acceptable latency', async () => {
    const results = await kit.run({
      scenario: './scenarios/quality.yaml',
    });
    expect(results).toHaveMedianLatencyBelow(5000);
    expect(results).toHaveP95LatencyBelow(10000);
  }, 60_000);
});
```
Evaluation matchers:

| Matcher | Description |
| --- | --- |
| `toPassAllCases()` | All test cases passed |
| `toHaveSuccessRate(rate)` | Achieves a minimum success rate (0–1) |
| `toPassCasesWithTag(tag)` | All cases with the given tag passed |
| `toHaveMedianLatencyBelow(ms)` | Median latency under the threshold |
| `toHaveP95LatencyBelow(ms)` | P95 latency under the threshold |

Red team matchers:

| Matcher | Description |
| --- | --- |
| `toPassRedTeam()` | No vulnerabilities found |
| `toHaveDefenseRate(rate)` | Achieves a minimum defense rate (0–1) |
| `toHaveNoCriticalVulnerabilities()` | No critical-severity issues |
| `toHaveNoHighSeverityVulnerabilities()` | No high-severity issues |

Stress test matchers:

| Matcher | Description |
| --- | --- |
| `toPassStressTest()` | Stress test passed |
| `toHaveStressSuccessRate(rate)` | Achieves a minimum success rate under load |
| `toAchieveRPS(rps)` | Achieves a minimum requests per second |
| `toHaveStressP95LatencyBelow(ms)` | P95 latency under the threshold |
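Matchers of this kind follow the standard Jest/Vitest custom-matcher contract: a function that receives the asserted value and returns `{ pass, message }`. A minimal hand-rolled sketch of a success-rate matcher, with an assumed results shape (this is illustrative, not the SDK's implementation):

```typescript
// Illustrative custom matcher in the Jest/Vitest style (not the SDK's code).
// A matcher returns { pass, message }; `message` explains a failed assertion.
interface RunResults { passed: number; total: number; }

function toHaveSuccessRateMatcher(received: RunResults, rate: number) {
  const actual = received.passed / received.total;
  return {
    pass: actual >= rate,
    message: () =>
      `expected success rate >= ${rate}, got ${actual.toFixed(2)}`,
  };
}

const result = toHaveSuccessRateMatcher({ passed: 9, total: 10 }, 0.9);
console.log(result.pass); // 9/10 meets the 0.9 bar
```

Registering such a function via `expect.extend({ toHaveSuccessRate: ... })` is what makes `expect(results).toHaveSuccessRate(0.9)` available in a test.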