notokenlimit
PUBLIC REST API · v1

Build with our AI in minutes.

Drop-in compatible with the OpenAI Chat Completions format. Enterprise-grade quotas, audit logs, per-user model allow-lists and IP blocking. Production-ready.

≈ 380 ms p50 latency · 99.95% uptime · Frontier models · Audit logged

Introduction

The notokenlimit.com API lets you integrate frontier AI models into your own application using an endpoint that is fully compatible with the OpenAI Chat Completions format. Access is private and must be approved by an administrator.

  • API-Key authentication (Authorization: Bearer ntl_live_...).
  • Per-user quotas (per-minute, daily, monthly).
  • Admin-controlled model allow-list.
  • Full audit trail: every request is logged (metadata only, never message content).
  • Automatic and manual IP blocking.

Quick start

  1. Create an account on notokenlimit.com.
  2. Request API access from an administrator.
  3. Open the Developer panel and generate a key.
  4. Make your first request (see Examples).
```bash
curl -X POST https://notokenlimit.com/api/v1/chat/completions \
  -H "Authorization: Bearer ntl_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Authentication

Every request requires the Authorization header with your API key:

http
Authorization: Bearer ntl_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Keys always start with ntl_live_.
  • The plaintext key is shown only ONCE, right after creation.
  • We only store a SHA-256 hash of the key in our database.
  • If you lose a key, you must revoke it and generate a new one.
  • Never embed the key on the client side (frontend, mobile). Always use a backend.
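The rules above boil down to "load the key on the server and send it as a Bearer header". Below is a minimal sketch of that step. It assumes the key lives in an environment variable named `NTL_KEY`, as in the examples further down. The helper `build_auth_headers` is hypothetical, and its prefix check only catches obvious mix-ups. It does not prove the key is valid.

```python
import os
from typing import Dict, Optional

KEY_PREFIX = "ntl_live_"  # every key starts with this prefix


def build_auth_headers(key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, reading NTL_KEY from the environment by default.

    Hypothetical helper: the prefix check only catches obvious mix-ups
    (e.g. pasting a different provider's key); the server still decides
    whether the key is valid, revoked or expired.
    """
    if key is None:
        key = os.environ.get("NTL_KEY", "")
    if not key.startswith(KEY_PREFIX):
        raise ValueError("NTL_KEY must start with 'ntl_live_'")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

If you call it once at startup, a missing or mistyped key fails before the first request instead of surfacing later as a 401.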

Endpoints

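  • POST /api/v1/chat/completions: OpenAI-compatible chat completions (streaming and tool/function calling).
  • POST /api/v1/messages: Anthropic-compatible Messages endpoint (streaming and tool/function calling).
  • GET /api/v1/models: lists the model IDs enabled for your account.

Base URL: https://notokenlimit.com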

Limits & quotas

Each account has individually configurable quotas. When exceeded, the server responds with HTTP 429 and a specific error code.

| Window | Error code | Meaning |
|---|---|---|
| 60 seconds | rate_limit_minute | Too many requests in a minute |
| 24 hours | quota_day | Daily quota exhausted |
| 30 days | quota_month | Monthly quota exhausted |

Quota information is not yet returned in response headers. Check the Developer panel for live usage.
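Quota headers are not returned yet, so a client can throttle itself to stay under its per-minute limit. The sketch below is a hypothetical sliding-window limiter. Its default of 60 requests per minute is an assumption taken from the sample error message ("Rate limit: 60/min"). Substitute the value configured for your account.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteWindowLimiter:
    """Hypothetical client-side limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the sliding window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

Before each call, sleep for `wait_time()` seconds and then `record()`. The daily and monthly quotas are still enforced only on the server.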

Models

The models available to your account are approved by an administrator. You can list them on your panel or by calling GET /api/v1/models.

If you request a model that is not on your allow-list, the server responds with HTTP 403 and error code model_not_allowed.
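You can fail fast on the client by checking the model against your allow-list before sending a request. The sketch below assumes that GET /api/v1/models returns the OpenAI list shape, `{"object": "list", "data": [{"id": ...}]}`. This page does not confirm that shape. The helper names and the sample payload are illustrative only.

```python
import json
from typing import Set


def allowed_model_ids(models_response: dict) -> Set[str]:
    """Extract model IDs from a GET /api/v1/models response.

    Assumes the OpenAI list shape {"object": "list", "data": [{"id": ...}, ...]};
    adjust if the actual payload differs.
    """
    return {m["id"] for m in models_response.get("data", []) if "id" in m}


def check_model(model: str, allowed: Set[str]) -> None:
    """Raise locally instead of waiting for a 403 model_not_allowed."""
    if model not in allowed:
        raise ValueError(f"model {model!r} is not in your allow-list")


# Illustrative payload, not real output from the service.
sample = json.loads('{"object": "list", "data": [{"id": "claude-sonnet-4.6"}]}')
allowed = allowed_model_ids(sample)
check_model("claude-sonnet-4.6", allowed)  # passes silently
```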


Errors

Error bodies follow the OpenAI format:

```json
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_minute",
    "message": "Rate limit: 60/min"
  }
}
```

| HTTP | Code | Cause |
|---|---|---|
| 400 | invalid_messages | Malformed messages |
| 400 | missing_model | Missing model field |
| 400 | unknown_model | Unknown model |
| 401 | missing_auth | Missing Authorization header |
| 401 | invalid_key | Invalid or unknown key |
| 401 | key_revoked | Key revoked |
| 401 | key_expired | Key expired |
| 403 | access_disabled | API access not enabled |
| 403 | suspended | Access suspended |
| 403 | ip_blocked | Your IP is blocked |
| 403 | model_not_allowed | Model not allowed for your account |
| 429 | rate_limit_minute | Too many requests per minute |
| 429 | quota_day | Daily quota reached |
| 429 | quota_month | Monthly quota reached |
| 502 | upstream_error | Provider failure (retry) |
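Because every error body has the same shape, one small parser can turn any failed response into something your code can branch on. Below is a minimal sketch with hypothetical names (`ApiError`, `parse_error`). Which codes count as retryable is our reading of the table. The per-minute window clears within 60 seconds and upstream_error is marked "retry", whereas quota, auth and model errors will not succeed on a quick retry.

```python
import json
from typing import NamedTuple


class ApiError(NamedTuple):
    status: int
    code: str
    message: str
    retryable: bool


# Assumption drawn from the error table: only these codes are worth retrying soon.
RETRYABLE_CODES = {"rate_limit_minute", "upstream_error"}


def parse_error(status: int, body: str) -> ApiError:
    """Parse an OpenAI-style error body into a small record."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        payload = {}
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    code = str(err.get("code", "unknown"))
    message = str(err.get("message", body))
    # A 502 with a non-JSON body (e.g. from an intermediate proxy) is treated as retryable too.
    retryable = code in RETRYABLE_CODES or status == 502
    return ApiError(status, code, message, retryable)
```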

Security

  • Keys are stored only as SHA-256 hashes (one-way); only the public prefix is kept for display.
  • All traffic is encrypted in transit (HTTPS/TLS).
  • Logs store metadata (model, status, latency, tokens, IP), never message content.
  • Malicious IPs can be blocked in real time from the admin panel.
  • Each key can have an optional expiry date.
  • Administrators can suspend a user's access at any time.
  • Treat keys like passwords: they are NEVER shown again after creation.
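Storing only a hash means the server can check a key without being able to reproduce it. The sketch below shows the general technique, not the service's actual code. The 40-hex-character random body and the 12-character display prefix are assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

PREFIX = "ntl_live_"


def issue_key() -> Tuple[str, str, str]:
    """Create a key and return (plaintext, sha256_hex, display_prefix).

    Illustration only: the 40 hex-character body and the 12-character
    display prefix are assumptions, not the service's real parameters.
    """
    plaintext = PREFIX + secrets.token_hex(20)  # 20 random bytes -> 40 hex chars
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    # Persist only `digest` and the prefix; show `plaintext` to the user once.
    return plaintext, digest, plaintext[:12]


def verify_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)
```

A fast, unsalted hash is acceptable here because a random key in this sketch carries 160 bits of entropy. User-chosen passwords are different and need a slow, salted hash.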

Examples

cURL

```bash
curl -X POST https://notokenlimit.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NTL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain entropy in one sentence."}
    ],
    "max_tokens": 256
  }'
```

JavaScript (fetch)

```javascript
// Node.js 18+ (built-in fetch), run as an ES module so top-level await works.
const res = await fetch("https://notokenlimit.com/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.NTL_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4.6",
    messages: [{ role: "user", content: "Hello world" }],
  }),
});

if (!res.ok) {
  // Error bodies follow the OpenAI format; fall back if the body is not JSON.
  const err = await res.json().catch(() => ({}));
  throw new Error(err.error?.message ?? `API error (HTTP ${res.status})`);
}
const data = await res.json();
console.log(data.choices[0].message.content);
```

Python (requests)

```python
import os

import requests

resp = requests.post(
    "https://notokenlimit.com/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['NTL_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "claude-sonnet-4.6",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
print(resp.json()["choices"][0]["message"]["content"])
```
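The request above gives up on the first error. The error table marks upstream_error as retryable, and the per-minute limit clears within a minute, so a small backoff wrapper is often worth adding. Below is a minimal sketch with a hypothetical `call_with_retry` helper. Retrying rate_limit_minute after a pause is our inference from the table, not documented behaviour.

```python
import json
import random
import time
from typing import Any, Callable

# Assumption based on the Errors table: retry only the per-minute limit and
# upstream failures; quota_day / quota_month will not clear within minutes.
RETRYABLE_CODES = {"rate_limit_minute", "upstream_error"}


def _error_code(text: str) -> str:
    """Return error.code from an OpenAI-style error body, or "" if absent."""
    try:
        return str(json.loads(text)["error"]["code"])
    except (ValueError, KeyError, TypeError):
        return ""


def call_with_retry(send: Callable[[], Any], max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until it succeeds or the error is not worth retrying.

    `send` must return an object with .status_code and .text, such as a
    requests.Response. Delays grow 1 s, 2 s, 4 s, ... (for base_delay=1.0)
    with up to 10% random jitter.
    """
    attempt = 0
    while True:
        resp = send()
        retryable = resp.status_code == 502 or (
            resp.status_code >= 400 and _error_code(resp.text) in RETRYABLE_CODES
        )
        attempt += 1
        if not retryable or attempt >= max_attempts:
            return resp
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, 0.1 * delay))
```

For example, wrap the call as `resp = call_with_retry(lambda: requests.post(url, headers=headers, json=payload, timeout=60))` and then call `resp.raise_for_status()` as before. The jitter keeps many clients from retrying in lockstep.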

Drop-in with the OpenAI SDK

You can use the official OpenAI SDK pointed at our endpoint:

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://notokenlimit.com/api/v1",
    api_key=os.environ["NTL_KEY"],
)

resp = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

FAQ

How do I request access?
Sign up and contact an administrator. Once approved, you can generate keys from the Developer panel.

Do you support SSE streaming?
Yes. Pass "stream": true to /api/v1/chat/completions (OpenAI-style chunks) or /api/v1/messages (Anthropic-style events).

Can I use it with coding agents, Claude tools or the Anthropic/OpenAI SDKs?
Yes. We expose an Anthropic-compatible endpoint at /api/v1/messages and an OpenAI-compatible one at /api/v1/chat/completions, both with streaming AND tool/function calling. Point Claude Code or the Anthropic SDK at ANTHROPIC_BASE_URL=https://notokenlimit.com/api with your ntl_live_ key sent as x-api-key, or point any OpenAI-compatible tool at base_url=https://notokenlimit.com/api/v1. See the Clients & IDEs section for per-tool setup.

What if I lose a key?
Revoke it from the panel and create a new one. Lost keys cannot be recovered because we store only the hash.

Do you store my messages?
No. We only store metadata: timestamp, model, status, latency, estimated tokens and IP. Never the text.

Can I raise my quotas?
Yes, contact an administrator. Limits are configurable per user.
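The streaming answer above mentions OpenAI-style chunks. Those arrive as Server-Sent Events: `data: {...}` lines carrying partial `delta` objects, followed by a final `data: [DONE]`. The sketch below assumes the service emits exactly that standard format. The helper name `iter_deltas` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style SSE stream.

    Expects lines such as 'data: {...chunk...}' and a final 'data: [DONE]'.
    Blank separator lines, comments (':...') and other SSE fields are skipped.
    """
    for line in lines:
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # the first chunk often carries only the role, no content
                yield text
```

With `requests`, send the body with `"stream": true` and pass `stream=True` to `requests.post`. Set `resp.encoding = "utf-8"`, then feed `resp.iter_lines(decode_unicode=True)` to `iter_deltas`. With the OpenAI SDK, you can instead pass `stream=True` to `client.chat.completions.create` and iterate over the returned chunks.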

Clients & IDEs

Your ntl_live_ key is a drop-in replacement in any OpenAI- or Anthropic-compatible tool. Two universal modes are available, both with streaming and tool/function calling:

Claude Code · Anthropic SDK · Claude Agent SDK

Point Claude Code or the Anthropic SDK at our Anthropic endpoint. Tool/function calling works.

```bash
export ANTHROPIC_BASE_URL=https://notokenlimit.com/api
export ANTHROPIC_API_KEY=ntl_live_YOUR_KEY
export ANTHROPIC_MODEL=claude-opus-4.8
claude
```
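If you call the Anthropic-compatible endpoint without an SDK, the request follows the Anthropic Messages format. The sketch below only builds the URL, headers and body. The `anthropic-version` value is the one Anthropic's own API uses, and whether this endpoint requires it is an assumption. The helper name `anthropic_request` is hypothetical.

```python
import json
from typing import Dict, Tuple

BASE_URL = "https://notokenlimit.com/api"  # same value as ANTHROPIC_BASE_URL


def anthropic_request(key: str, prompt: str,
                      model: str = "claude-opus-4.8",
                      max_tokens: int = 256) -> Tuple[str, Dict[str, str], str]:
    """Build (url, headers, json_body) for an Anthropic-style Messages call.

    The anthropic-version header mirrors Anthropic's own API; whether this
    endpoint requires it is an assumption.
    """
    url = BASE_URL + "/v1/messages"  # the Anthropic SDK appends /v1/messages as well
    headers = {
        "x-api-key": key,  # your ntl_live_ key, as described in the FAQ
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": model,
        "max_tokens": max_tokens,  # required in the Anthropic Messages format
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(body)
```

You can send the result with `requests.post(url, headers=headers, data=body, timeout=60)`. Alternatively, construct the official client as `anthropic.Anthropic(base_url=BASE_URL, api_key=key)`, and it sends the same header.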

OpenAI SDK (Python / JS)

Point the OpenAI SDK (or any OpenAI-compatible library) at our base URL; the SDK appends /chat/completions automatically.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://notokenlimit.com/api/v1",
    api_key="ntl_live_YOUR_KEY",
)
resp = client.chat.completions.create(
    model="claude-opus-4.8",
    messages=[{"role": "user", "content": "Hello"}],
    # tools=[...],  # optional: OpenAI-style tool/function definitions are supported
)
print(resp.choices[0].message.content)
```
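Tool calling uses OpenAI's function-calling shapes: you pass tool definitions in `tools`, and the model replies with `tool_calls` whose `arguments` arrive as a JSON string. The sketch below assumes the endpoint mirrors those shapes exactly, as "OpenAI-compatible" implies. The `get_weather` tool and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List

# A hypothetical tool in the OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def extract_tool_calls(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return [{'id', 'name', 'args'}] from an OpenAI-style chat completion dict."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append({
            "id": call["id"],
            "name": fn["name"],
            # arguments are a JSON-encoded string, not a dict
            "args": json.loads(fn["arguments"] or "{}"),
        })
    return calls
```

Pass `tools=[WEATHER_TOOL]` to `client.chat.completions.create(...)`. The SDK returns a typed object, and `resp.model_dump()` converts it to the dict shape used here. After running the tool, send its result back as a message with `"role": "tool"` and the matching `"tool_call_id"`.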

Antigravity · Cline · Cursor · Roo Code · Continue · Kilo Code

In the model/provider settings, add a custom "OpenAI Compatible" provider with these values:

```text
Provider type : OpenAI Compatible
Base URL      : https://notokenlimit.com/api/v1
API Key       : ntl_live_YOUR_KEY
Model ID      : claude-opus-4.8
```

GitHub Copilot & others

Any tool that lets you set a custom OpenAI base URL works the same way. For tools locked to official providers (e.g. GitHub Copilot's stock setup), run a local OpenAI-compatible proxy pointing here.

Use any model ID from GET /api/v1/models (e.g. claude-opus-4.8, gpt-5.5, gemini-3.1-pro). Whatever you set as the model in your tool is sent as the model field.

Every request is audited (model, status, latency and IP, never your content). Unusual volume or a brand-new IP automatically alerts our staff and can trigger an IP block, so keep your key server-side.