
OpenAI-Compatible APIs

ArtemisKit works with any provider that implements the OpenAI API specification. This includes cloud providers, local inference servers, and API aggregators.

The @artemiskit/adapter-openai adapter accepts a baseUrl option that redirects API calls to any OpenAI-compatible endpoint:

providers:
  openai:
    apiKey: ${YOUR_API_KEY}
    baseUrl: https://your-provider.com/v1
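
In practice, "OpenAI-compatible" means the endpoint answers the same POST /v1/chat/completions request that OpenAI does. A quick way to sanity-check a provider before pointing ArtemisKit at it is to call that route directly. The base URL, key, and model below are placeholders; substitute values from the table that follows.

```sh
# Placeholders: substitute a real base URL, API key, and model ID
# (see the quick-reference table below).
BASE_URL="https://your-provider.com/v1"
API_KEY="$YOUR_API_KEY"
MODEL="your-model-id"

curl -s "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --data @- <<EOF
{
  "model": "$MODEL",
  "messages": [{"role": "user", "content": "What is 2+2?"}]
}
EOF
```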
| Provider | Base URL | Best For |
| --- | --- | --- |
| Groq | https://api.groq.com/openai/v1 | Ultra-fast LPU inference |
| Together AI | https://api.together.xyz/v1 | 200+ open-source models |
| Fireworks AI | https://api.fireworks.ai/inference/v1 | Function calling, MoE models |
| OpenRouter | https://openrouter.ai/api/v1 | 400+ models, unified API |
| Ollama | http://localhost:11434/v1 | Local inference |
| LM Studio | http://localhost:1234/v1 | Local GUI + API |
| vLLM | http://localhost:8000/v1 | Production self-hosting |

Groq

Ultra-fast inference powered by custom LPU (Language Processing Unit) hardware. Achieves up to 1660 tokens/second with speculative decoding.

  1. Get your API key from console.groq.com

  2. Set environment variable:

export GROQ_API_KEY="gsk_..."

  3. Configure ArtemisKit:
name: groq-test
provider: openai
model: llama-3.3-70b-specdec
providers:
  openai:
    apiKey: ${GROQ_API_KEY}
    baseUrl: https://api.groq.com/openai/v1
cases:
  - id: example
    prompt: "What is 2+2?"
    expected:
      type: contains
      values: ["4"]
| Model | Context | Notes |
| --- | --- | --- |
| llama-3.3-70b-specdec | 128K | Fastest 70B (1660 T/s with speculative decoding) |
| llama-3.3-70b-versatile | 128K | Best quality general-purpose |
| llama-3-groq-70b-tool-use | 8K | #1 on Berkeley Function Calling Leaderboard |
| llama-3.1-405b-instruct | 128K | Largest open model |
| qwen3-32b | 128K | Strong multilingual |
| mixtral-8x7b-32768 | 32K | Fast MoE |
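
The usage block in every chat completion response reports completion token counts, so a rough check of Groq's throughput takes one timed request. This is a sketch; the model ID comes from the table above and the usage field follows the standard OpenAI response schema.

```sh
# Time a single request and read completion_tokens from the usage block;
# tokens divided by elapsed seconds gives an approximate generation speed.
time curl -s https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  --data @- <<EOF | grep -o '"completion_tokens": *[0-9]*'
{
  "model": "llama-3.3-70b-specdec",
  "messages": [{"role": "user", "content": "Write a short paragraph about software testing."}],
  "max_tokens": 512
}
EOF
```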

Together AI

Access to 200+ open-source models with the Together Inference Engine (4x faster than vLLM). Features frontier-class models including Qwen3 and DeepSeek.

  1. Get your API key from api.together.xyz

  2. Set environment variable:

export TOGETHER_API_KEY="..."

  3. Configure ArtemisKit:
name: together-test
provider: openai
model: Qwen/Qwen3-235B-A22B-Instruct
providers:
  openai:
    apiKey: ${TOGETHER_API_KEY}
    baseUrl: https://api.together.xyz/v1
cases:
  - id: example
    prompt: "Explain quantum computing"
    expected:
      type: contains
      values: ["qubit"]
| Model | Context | Notes |
| --- | --- | --- |
| Qwen/Qwen3-235B-A22B-Instruct | 262K | Outperforms GPT-4 on benchmarks, 22B active params |
| Qwen/Qwen3-235B-A22B-Thinking | 262K | Deep reasoning mode, beats OpenAI o3 |
| Qwen/Qwen3-Coder-480B-A35B | 128K | Largest open-source coding model |
| deepseek-ai/DeepSeek-R1 | 128K | State-of-the-art reasoning (math, code, logic) |
| deepseek-ai/DeepSeek-V3 | 128K | 671B MoE, 37B active, tool calling |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 128K | Fast Llama 3.3 |
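
With 200+ models, the exact slug (including the organization prefix) is easy to get wrong. The currently served slugs can be pulled from the models endpoint and filtered; the grep pattern below is just an example, and a JSON-aware tool such as jq is more robust if installed.

```sh
# List model IDs served by Together and filter for Qwen3 variants.
curl -s https://api.together.xyz/v1/models \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  | grep -o '"id": *"[^"]*"' \
  | grep -i qwen3
```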

Fireworks AI

Optimized for function calling, structured outputs, and MoE models. 4x throughput with 50% lower latency on NVIDIA H100s.

  1. Get your API key from fireworks.ai/account/api-keys

  2. Set environment variable:

export FIREWORKS_API_KEY="..."

  3. Configure ArtemisKit:
name: fireworks-test
provider: openai
model: accounts/fireworks/models/qwen3-235b-a22b
providers:
  openai:
    apiKey: ${FIREWORKS_API_KEY}
    baseUrl: https://api.fireworks.ai/inference/v1
cases:
  - id: example
    prompt: "Write a haiku about testing"
    expected:
      type: contains
      values: ["test", "code"]
      mode: any
| Model | Context | Notes |
| --- | --- | --- |
| accounts/fireworks/models/qwen3-235b-a22b | 262K | Best quality MoE |
| accounts/fireworks/models/qwen3-coder-480b-a35b-instruct | 128K | Largest coding model |
| accounts/fireworks/models/deepseek-r1-0528 | 128K | Advanced reasoning |
| accounts/fireworks/models/glm-4p7-flash | 202K | 30B MoE, fast and efficient |
| accounts/fireworks/models/llama-v3p3-70b-instruct | 128K | Reliable general-purpose |
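
Because structured output is one of Fireworks' headline features, it is worth knowing it is reachable through the same OpenAI request shape. The sketch below assumes the chosen model supports OpenAI-style JSON mode via response_format; check Fireworks' documentation for per-model support.

```sh
# Ask for a JSON-only response using OpenAI's response_format parameter.
curl -s https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  --data @- <<EOF
{
  "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
  "messages": [{"role": "user", "content": "Return a JSON object with keys city and country for Paris."}],
  "response_format": {"type": "json_object"}
}
EOF
```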

OpenRouter

Single unified API for 400+ models from all major providers. Automatic fallbacks, load balancing, and cost optimization.

  1. Get your API key from openrouter.ai/keys

  2. Set environment variable:

export OPENROUTER_API_KEY="sk-or-..."

  3. Configure ArtemisKit:
name: openrouter-test
provider: openai
model: deepseek/deepseek-r1
providers:
  openai:
    apiKey: ${OPENROUTER_API_KEY}
    baseUrl: https://openrouter.ai/api/v1
cases:
  - id: example
    prompt: "Hello"
    expected:
      type: contains
      values: ["hello", "hi"]
      mode: any

OpenRouter provides access to models from all major providers:

| Model | Provider | Notes |
| --- | --- | --- |
| openai/gpt-5.2 | OpenAI | 400K context, adaptive reasoning |
| google/gemini-3 | Google | 1M token context, multimodal |
| anthropic/claude-sonnet-4 | Anthropic | Latest Claude |
| deepseek/deepseek-r1 | DeepSeek | Frontier quality at 1/100th cost |
| mistralai/devstral-2 | Mistral | 123B agentic coding, 256K context |
| meta-llama/llama-4-scout | Meta | Multimodal, extreme context |
| xiaomi/mimo-v2-flash | Xiaomi | 309B MoE, 15B active, 256K context |

See openrouter.ai/models for the full list of 400+ models.
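
OpenRouter also accepts two optional headers, HTTP-Referer and X-Title, which attribute traffic to your application in its dashboards; requests work without them. A direct request looks like this (the referer URL and title are placeholders, and the model slug comes from the table above):

```sh
# Optional HTTP-Referer and X-Title headers identify your app to OpenRouter.
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "HTTP-Referer: https://your-app.example" \
  -H "X-Title: ArtemisKit eval run" \
  --data @- <<EOF
{
  "model": "deepseek/deepseek-r1",
  "messages": [{"role": "user", "content": "Hello"}]
}
EOF
```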


Ollama

Run models locally on your machine. No API key required. Partners with OpenAI to bring GPT-OSS to local inference.

  1. Install Ollama from ollama.com

  2. Pull a model:

ollama pull qwen3

  3. Start the server (runs automatically on install):

ollama serve

  4. Configure ArtemisKit:
name: ollama-test
provider: openai
model: qwen3
providers:
  openai:
    apiKey: ollama # Required but ignored
    baseUrl: http://localhost:11434/v1
cases:
  - id: example
    prompt: "What is the capital of France?"
    expected:
      type: contains
      values: ["Paris"]

Run ollama list to see installed models. Popular options:

| Model | Size | Notes |
| --- | --- | --- |
| qwen3 | 8B | Latest Qwen with dual-mode reasoning |
| qwen3:72b | 72B | Best quality Qwen |
| deepseek-r1 | 7B-671B | State-of-the-art reasoning |
| llama3.3 | 70B | Latest Llama |
| gpt-oss | varies | OpenAI's open-source model |
| gemma3 | 4B-27B | Google's efficient models |
| mistral-large-3 | varies | Multimodal MoE for enterprise |
| qwen3-coder | varies | Code-specialized |
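
The model: value in the config must match the tag you pulled exactly. To run a specific size from the table above rather than the default, pull it by its full tag and confirm with ollama list:

```sh
# Pull specific size tags from the table above, then check the exact
# tag string to use in the model: field.
ollama pull deepseek-r1:7b
ollama pull gemma3:27b
ollama list
```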

LM Studio

Desktop app with a local API server. Easy model management through a GUI.

  1. Download from lmstudio.ai

  2. Load a model in the app (recommended: Qwen3, DeepSeek-R1, or Llama 3.3)

  3. Start the local server (Developer tab → Start Server)

  4. Configure ArtemisKit:

name: lmstudio-test
provider: openai
model: local-model # Model name from LM Studio
providers:
  openai:
    apiKey: lm-studio # Required but ignored
    baseUrl: http://localhost:1234/v1
cases:
  - id: example
    prompt: "Explain recursion"
    expected:
      type: contains
      values: ["function", "itself"]
      mode: any
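
The identifier to put in the model: field is whatever LM Studio assigns to the loaded model. With the local server running, it can be read back from the models endpoint:

```sh
# Lists the models the local server currently exposes;
# copy the returned id into the model: field above.
curl http://localhost:1234/v1/models
```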

vLLM

High-performance inference server for production self-hosting. Supports speculative decoding, continuous batching, and PagedAttention.

  1. Install vLLM:

pip install vllm

  2. Start the server (a multi-GPU variant is shown after the config example below):

python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-8B \
  --port 8000

  3. Configure ArtemisKit:
name: vllm-test
provider: openai
model: Qwen/Qwen3-8B
providers:
  openai:
    apiKey: vllm # Required but ignored
    baseUrl: http://localhost:8000/v1
cases:
  - id: example
    prompt: "Hello"
    expected:
      type: contains
      values: ["hello", "hi"]
      mode: any
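
For models that need more than one GPU, or that must be capped to a shorter context to fit in memory, the step-2 command above takes standard vLLM flags. The values here are illustrative; adjust them to your hardware.

```sh
# Multi-GPU launch with a capped context window (illustrative values).
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-8B \
  --port 8000 \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90
```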

Troubleshooting

For local servers (Ollama, LM Studio, vLLM), ensure the server is running:

# Check if server is responding
curl http://localhost:11434/v1/models

Some providers require specific header formats. If you get auth errors, check the provider’s documentation for any custom headers.

Model names vary by provider. Check the provider’s documentation for exact model identifiers.

For local inference with large models, increase the timeout:

providers:
  openai:
    baseUrl: http://localhost:11434/v1
    timeout: 120000 # 2 minutes