ArtemisKit is an open-source CLI toolkit for testing LLM applications. It combines security red-teaming, quality evaluation, and stress testing in a single tool, replacing the need for 3+ separate tools.

Is ArtemisKit free to use?

Yes, ArtemisKit CLI is completely free and open-source under the Apache 2.0 license. You can use it for personal and commercial projects without any cost.

What security tests does ArtemisKit perform?

ArtemisKit tests 6 mutation types: prompt injection, jailbreak attempts, role spoofing, instruction flipping, encoding attacks, and multi-turn conversation attacks. Each vulnerability is scored using a CVSS-like severity system.

How do I install ArtemisKit?

Install ArtemisKit globally via npm with: npm install -g @artemiskit/cli. You can also use pnpm, yarn, or bun. After installation, verify with: akit --version

What LLM providers does ArtemisKit support?

ArtemisKit supports OpenAI, Anthropic (Claude), Azure OpenAI, and any OpenAI-compatible API. It also integrates with the Vercel AI SDK for additional provider support.

Can I use ArtemisKit in CI/CD pipelines?

Yes, ArtemisKit is designed for CI/CD integration. It returns proper exit codes (0 for pass, 1 for failures) and can generate JSON reports for automated processing.
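
Because the CLI signals results through its exit code, a pipeline step can gate on it directly. The following is a minimal sketch that assumes only the exit-code contract described above; `run_gate` is a hypothetical wrapper, not an ArtemisKit command.

```shell
#!/usr/bin/env bash
# Sketch of a CI gate built on the exit-code contract:
# 0 means the run passed, non-zero means at least one failure.

# run_gate is a hypothetical helper (not part of ArtemisKit).
# It runs the command it is given and propagates that command's exit code.
run_gate() {
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "gate: passed"
  else
    echo "gate: failed (exit code $status)" >&2
  fi
  return "$status"
}
```

In a pipeline this might be invoked as `run_gate artemiskit run scenarios/my-test.yaml`, so the job fails whenever the run reports failures.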

OWASP #1 Threat: Prompt Injection

Stop Hoping Your AI Is Secure. Start Proving It.

Name: ArtemisKit CLI
Author: ArtemisKit

Most teams use 3+ tools to test, secure, and benchmark their LLMs. ArtemisKit does all three. One CLI. Complete reliability.

Security: 6 Attack Types

Quality: 12 Evaluators

Performance: p50/p95/p99


$ npm install -g @artemiskit/cli


Available Now

Apache 2.0 Licensed


The Reality Check

Your LLM Is One Prompt Away From Disaster

Prompt injection is OWASP's #1 LLM security risk. Most teams don't test for it. These scenarios aren't hypothetical—they're happening.

CRITICAL • Prompt Injection

Your AI passed every test.

Then it leaked customer data.

A cleverly crafted prompt bypassed your safety filters. Sensitive information was in the response before anyone noticed.

#1 OWASP LLM Risk

HIGH • Output Inconsistency

Same prompt. Different answer.

Users noticed before you did.

Non-deterministic outputs eroded trust. Your chatbot contradicted itself, and support tickets flooded in.

73% of AI projects fail pre-production

HIGH • Performance Unknown

Staging worked perfectly.

Production crashed under load.

Your first real stress test happened when users flooded your app. Latency spiked. The API hit rate limits. Revenue was lost.

10x production load vs staging

Sound familiar?

Manual Testing

Copy-pasting prompts into a chat window is not a test suite.

Security Afterthought

Traditional SAST/DAST tools don't cover LLM attack surfaces.

Ship and Pray

Deploying without quality gates and hoping nothing breaks.

Tool Fragmentation

Using 3+ tools for testing, security, and performance.

There's a better way

The Solution

One CLI. Three Superpowers.

ArtemisKit is the open-source standard for LLM reliability. Security red-teaming, quality evaluation, and stress testing—unified in a single CLI.

Lead Capability

Security Red-Teaming

Break it before attackers do.

Systematically test for prompt injection, jailbreaks, data extraction, and more. Get CVSS-like severity scores and audit-ready reports.

6 mutation types

Prompt injection detection

Jailbreak resistance testing

Data extraction prevention

Custom attack definitions

CVSS-like severity scoring

terminal

$ artemiskit redteam


Core Capability

Quality Evaluation

Catch regressions before they ship.

Evaluate LLM outputs with semantic similarity, LLM-as-judge, JSON schema validation, regex matching, and more. Reproducible tests, every time.

12 evaluator types

Semantic similarity scoring

LLM-as-judge evaluation

JSON schema validation

Multi-turn conversation testing

CI/CD integration ready

terminal

$ artemiskit run


Core Capability

Stress Testing

Know your limits before users find them.

Measure p50/p95/p99 latency, throughput, and token costs under load. Discover rate limits and performance cliffs in staging, not production.

Configurable concurrency

p50/p95/p99 latency metrics

Throughput analysis

Token cost calculation

Rate limit detection

Capacity planning data

terminal

$ artemiskit stress
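
For readers unfamiliar with the p50/p95/p99 notation, the sketch below computes percentiles from raw latency samples with the nearest-rank method. It is an illustration only: `percentile` is a hypothetical helper, the input is assumed to be one latency value in milliseconds per line, and the source does not say which percentile method ArtemisKit itself uses.

```shell
#!/usr/bin/env bash
# percentile P: reads one latency sample per line on stdin and prints
# the P-th percentile (P is an integer from 1 to 100) using nearest rank.
# Hypothetical helper for illustration; not part of ArtemisKit.
percentile() {
  sort -n | awk -v p="$1" '
    { samples[NR] = $1 }                 # samples arrive sorted ascending
    END {
      if (NR == 0) exit 1                # no samples, nothing to report
      rank = int((p * NR + 99) / 100)    # ceil(p/100 * NR) for integer p
      if (rank < 1) rank = 1
      print samples[rank]
    }'
}
```

For example, `printf '%s\n' 120 80 100 90 110 | percentile 95` prints `120`, because with five samples the nearest rank for p95 is the fifth (largest) value. This is why tail percentiles are more sensitive to a single slow request than p50.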


Stop using 3 tools. Use one.

Open source • Apache 2.0 • Self-hosted

Quick Start

Get Started in 3 Steps

From installation to your first test results in under 5 minutes. No complex setup. No configuration hell.

01

Install

Install globally with npm, yarn, pnpm, or bun. Zero configuration required.

bash

# Install globally
npm install -g @artemiskit/cli

# Or run directly with npx
npx @artemiskit/cli --help
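
A global install only helps if the package manager's global bin directory is on your PATH. The following is a minimal sketch of a preflight check you could run before the first test; `require_cli` is a hypothetical helper name, and `akit` is the command name the FAQ uses for its version check.

```shell
#!/usr/bin/env bash
# require_cli NAME: succeeds if NAME resolves to a command on PATH.
# Hypothetical helper for illustration; not part of ArtemisKit.
require_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found on PATH" >&2
    return 1
  fi
}
```

A typical use would be `require_cli akit && akit --version`, which prints the installed version only when the binary is actually reachable.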

02

Define

Define test scenarios in simple YAML. Multi-turn conversations, custom evaluators, and more.

yaml

# scenarios/my-test.yaml
name: chatbot-test
config:
  provider: openai
  model: gpt-4

scenarios:
  - name: basic-qa
    turns:
      - role: user
        content: "What is 2+2?"
      - role: assistant
        expect:
          contains: ["4"]
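
The `contains` expectation in the scenario above can be pictured as a substring check on the assistant's reply. The sketch below assumes that every listed string must appear literally in the response; the source does not spell out ArtemisKit's exact matching rules (case sensitivity, any versus all), and `contains_all` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# contains_all RESPONSE EXPECTED...: succeeds only if RESPONSE includes
# every EXPECTED string as a literal, case-sensitive substring.
# Hypothetical helper for illustration; not ArtemisKit's implementation.
contains_all() {
  local response=$1
  shift
  local expected
  for expected in "$@"; do
    case $response in
      *"$expected"*) ;;                                  # literal match found
      *) echo "missing: $expected" >&2; return 1 ;;      # first miss fails the check
    esac
  done
  return 0
}
```

Under this reading, `contains_all "2+2 equals 4." "4"` succeeds, while a reply that spells the answer as "four" would fail. That is a reminder that substring checks are strict and are often paired with looser evaluators.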

03

Run

Execute tests and get detailed reports. HTML dashboards, JSON exports, CI/CD ready.

bash

$ artemiskit run scenarios/my-test.yaml

Running chatbot-test...

✓ Scenario: basic-qa ........... PASS

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Results: 1/1 passed (100%)
Report: ./artemis-output/report.html


Use Cases

Built for Every Team

From security audits to compliance documentation, ArtemisKit adapts to your workflow.

Security Teams


Red Team Your LLM

Break it before attackers do

Traditional security tools don't cover LLM attack surfaces. Prompt injection is OWASP #1, but most teams don't test for it.

$ akit redteam scenario.yaml

Prompt injection detection (direct & indirect)

Jailbreak resistance testing

Data extraction prevention

CVSS-like vulnerability scoring


Not sure where to start? The CLI guides you through your first evaluation in under 5 minutes.


Coming Soon

ArtemisKit Cloud

The CLI you love, with managed infrastructure, team collaboration, and advanced analytics. Zero setup required.

What's coming

Team Workspaces

Collaborate on evaluation suites

Scheduled Runs

Automated periodic testing

Historical Analytics

Track trends over time

Dashboard

Visualize results at a glance

REST API & SDKs

Programmatic control

Integrations

Slack, PagerDuty, Jira

Free tier available

No credit card required

Get Early Access

Be first in line when we launch.

Open Source

Built in the Open

ArtemisKit is Apache-2.0 licensed and open source. Contribute, customize, and build on a foundation you control.

Self-Hosted

Your data stays on your infrastructure. Beyond requests to the LLM provider you configure, there are no external calls and no data sharing.

Forever Free

Apache 2.0 licensed. No paywalls, no feature gates, no surprises.

Community Driven

Built in the open. Contributions welcome. Your input shapes the roadmap.

