OWASP #1 Threat Prompt Injection

Stop Hoping Your AI Is Secure. Start Proving It.

Most teams use 3+ tools to test, secure, and benchmark their LLMs. ArtemisKit does all three. One CLI. Complete reliability.

Security 6 Attack Types
Quality 12 Evaluators
Performance p50/p95/p99
$ npm install -g @artemiskit/cli
artemiskit — security scan
2 Critical3 Blocked
6 mutation types
Available Now
Apache 2.0 Licensed
starsforks
Apache-2.0
The Reality Check

Your LLM Is One Prompt Away
From Disaster

Prompt injection is OWASP's #1 LLM security risk. Most teams don't test for it. These scenarios aren't hypothetical—they're happening.

CRITICAL • Prompt Injection

Your AI passed every test.

Then it leaked customer data.

A cleverly crafted prompt bypassed your safety filters. Sensitive information was in the response before anyone noticed.

#1 OWASP LLM Risk
HIGH • Output Inconsistency

Same prompt. Different answer.

Users noticed before you did.

Non-deterministic outputs eroded trust. Your chatbot contradicted itself, and support tickets flooded in.

73% AI projects fail pre-production
HIGH • Performance Unknown

Staging worked perfectly.

Production crashed under load.

Your first real stress test happened when users flooded your app. Latency spiked. The API hit rate limits. Revenue was lost.

10x production load vs staging

Sound familiar?

Manual Testing

Copy-pasting prompts into a chat window is not a test suite.

Security Afterthought

Traditional SAST/DAST tools don't cover LLM attack surfaces.

Ship and Pray

Deploying without quality gates and hoping nothing breaks.

Tool Fragmentation

Using 3+ tools for testing, security, and performance.

There's a better way
The Solution

One CLI. Three Superpowers.

ArtemisKit is the open-source standard for LLM reliability. Security red-teaming, quality evaluation, and stress testing—unified in a single CLI.

Lead Capability

Security Red-Teaming

Break it before attackers do.

Systematically test for prompt injection, jailbreaks, data extraction, and more. Get CVSS-like severity scores and audit-ready reports.

6 mutation types
Prompt injection detection
Jailbreak resistance testing
Data extraction prevention
Custom attack definitions
CVSS-like severity scoring
terminal
$ artemiskit redteam
6
Mutation types
Core Capability

Quality Evaluation

Catch regressions before they ship.

Evaluate LLM outputs with semantic similarity, LLM-as-judge, JSON schema validation, regex matching, and more. Reproducible tests, every time.

12 evaluator types
Semantic similarity scoring
LLM-as-judge evaluation
JSON schema validation
Multi-turn conversation testing
CI/CD integration ready
terminal
$ artemiskit run
12
Evaluator types
Core Capability

Stress Testing

Know your limits before users find them.

Measure p50/p95/p99 latency, throughput, and token costs under load. Discover rate limits and performance cliffs in staging, not production.

Configurable concurrency
p50/p95/p99 latency metrics
Throughput analysis
Token cost calculation
Rate limit detection
Capacity planning data
terminal
$ artemiskit stress
p99
Latency percentiles
+
+

Stop using 3 tools. Use one.

Open source • Apache 2.0 • Self-hosted

Quick Start

Get Started in 3 Steps

From installation to your first test results in under 5 minutes. No complex setup. No configuration hell.

01

Install

Install globally with npm, yarn, pnpm, or bun. Zero configuration required.

bash
# Install globally
npm install -g @artemiskit/cli

# Or run directly with npx
npx @artemiskit/cli --help
02

Define

Define test scenarios in simple YAML. Multi-turn conversations, custom evaluators, and more.

yaml
# scenarios/my-test.yaml
name: chatbot-test
config:
  provider: openai
  model: gpt-4

scenarios:
  - name: basic-qa
    turns:
      - role: user
        content: "What is 2+2?"
      - role: assistant
        expect:
          contains: ["4"]
03

Run

Execute tests and get detailed reports. HTML dashboards, JSON exports, CI/CD ready.

bash
$ artemiskit run scenarios/my-test.yaml

Running chatbot-test...

✓ Scenario: basic-qa ........... PASS

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Results: 1/1 passed (100%)
Report: ./artemis-output/report.html
Use Cases

Built for Every Team

From security audits to compliance documentation, ArtemisKit adapts to your workflow.

Security Teams
01/05

Red Team Your LLM

Break it before attackers do

Traditional security tools don't cover LLM attack surfaces. Prompt injection is OWASP #1, but most teams don't test for it.

$akit redteam scenario.yaml
Prompt injection detection (direct & indirect)
Jailbreak resistance testing
Data extraction prevention
CVSS-like vulnerability scoring

Click tabs or dots to explore different use cases

Not sure where to start? The CLI guides you through your first evaluation in under 5 minutes.

Coming Soon

ArtemisKit Cloud

The CLI you love, with managed infrastructure, team collaboration, and advanced analytics. Zero setup required.

What's coming

Team Workspaces

Collaborate on evaluation suites

Scheduled Runs

Automated periodic testing

Historical Analytics

Track trends over time

Dashboard

Visualize results at a glance

REST API & SDKs

Programmatic control

Integrations

Slack, PagerDuty, Jira

Free tier available
No credit card required

Get Early Access

Be first in line when we launch.

We'll only email you about ArtemisKit Cloud launch.

Open Source

Built in the Open

ArtemisKit is Apache-2.0 licensed and open source. Contribute, customize, and build on a foundation you control.

Self-Hosted

Your data stays on your infrastructure. No external calls, no data sharing.

Forever Free

Apache 2.0 licensed. No paywalls, no feature gates, no surprises.

Community Driven

Built in the open. Contributions welcome. Your input shapes the roadmap.

Licensed under Apache-2.0