Stop Hoping Your AI Is Secure. Start Proving It.
Most teams use 3+ tools to test, secure, and benchmark their LLMs. ArtemisKit does all three. One CLI. Complete reliability.
npm install -g @artemiskit/cli
Your LLM Is
One Prompt Away
From Disaster
Prompt injection is OWASP's #1 LLM security risk. Most teams don't test for it. These scenarios aren't hypothetical—they're happening.
Your AI passed every test.
Then it leaked customer data.
A cleverly crafted prompt bypassed your safety filters. Sensitive information was in the response before anyone noticed.
Same prompt. Different answer.
Users noticed before you did.
Non-deterministic outputs eroded trust. Your chatbot contradicted itself, and support tickets flooded in.
Staging worked perfectly.
Production crashed under load.
Your first real stress test happened when users flooded your app. Latency spiked. The API hit rate limits. Revenue was lost.
Sound familiar?
Manual Testing
Copy-pasting prompts into a chat window is not a test suite.
Security Afterthought
Traditional SAST/DAST tools don't cover LLM attack surfaces.
Ship and Pray
Deploying without quality gates and hoping nothing breaks.
Tool Fragmentation
Using 3+ tools for testing, security, and performance.
One CLI. Three Superpowers.
ArtemisKit is the open-source standard for LLM reliability. Security red-teaming, quality evaluation, and stress testing—unified in a single CLI.
Security Red-Teaming
Break it before attackers do.
Systematically test for prompt injection, jailbreaks, data extraction, and more. Get CVSS-like severity scores and audit-ready reports.
Quality Evaluation
Catch regressions before they ship.
Evaluate LLM outputs with semantic similarity, LLM-as-judge, JSON schema validation, regex matching, and more. Reproducible tests, every time.
Stress Testing
Know your limits before users find them.
Measure p50/p95/p99 latency, throughput, and token costs under load. Discover rate limits and performance cliffs in staging, not production.
Stop using 3 tools. Use one.
Open source • Apache 2.0 • Self-hosted
Get Started in 3 Steps
From installation to your first test results in under 5 minutes. No complex setup. No configuration hell.
Install
Install globally with npm, yarn, pnpm, or bun. Zero configuration required.
# Install globally
npm install -g @artemiskit/cli
# Or run directly with npx
npx @artemiskit/cli --help Define
Define test scenarios in simple YAML. Multi-turn conversations, custom evaluators, and more.
# scenarios/my-test.yaml
name: chatbot-test
config:
provider: openai
model: gpt-4
scenarios:
- name: basic-qa
turns:
- role: user
content: "What is 2+2?"
- role: assistant
expect:
contains: ["4"] Run
Execute tests and get detailed reports. HTML dashboards, JSON exports, CI/CD ready.
$ artemiskit run scenarios/my-test.yaml
Running chatbot-test...
✓ Scenario: basic-qa ........... PASS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Results: 1/1 passed (100%)
Report: ./artemis-output/report.html Built for Every Team
From security audits to compliance documentation, ArtemisKit adapts to your workflow.
Red Team Your LLM
Break it before attackers do
Traditional security tools don't cover LLM attack surfaces. Prompt injection is OWASP #1, but most teams don't test for it.
akit redteam scenario.yamlClick tabs or dots to explore different use cases
Not sure where to start? The CLI guides you through your first evaluation in under 5 minutes.
ArtemisKit Cloud
The CLI you love, with managed infrastructure, team collaboration, and advanced analytics. Zero setup required.
What's coming
Team Workspaces
Collaborate on evaluation suites
Scheduled Runs
Automated periodic testing
Historical Analytics
Track trends over time
Dashboard
Visualize results at a glance
REST API & SDKs
Programmatic control
Integrations
Slack, PagerDuty, Jira
Get Early Access
Be first in line when we launch.
Built in the Open
ArtemisKit is Apache-2.0 licensed and open source. Contribute, customize, and build on a foundation you control.
Self-Hosted
Your data stays on your infrastructure. No external calls, no data sharing.
Forever Free
Apache 2.0 licensed. No paywalls, no feature gates, no surprises.
Community Driven
Built in the open. Contributions welcome. Your input shapes the roadmap.