AI Runtime
Multi-vendor LLM chat, embeddings, vision, image generation, speech and live sessions.
Overview
/v1/ai
ai
# Python
from infrai import ai
# TypeScript
import { InfraiClient } from "@infrai/sdk";
const client = new InfraiClient({ apiKey: process.env.INFRAI_API_KEY! });
// → client.ai
Methods
ai.chat
Multi-vendor chat completion with quality-based routing, tools, structured output, caching and batch mode.
Python
ai.chat(prompt, *, model=None, vendor=None, task="general", prefer="balanced", stream=False, response_format=None, tools=None, cache_strategy="vendor", batch_mode=False, max_cost_multiplier=1.5, timeout_seconds=60, idempotency_key=None)
TypeScript
client.ai.chat(opts: ChatOptions): Promise<ChatResult>
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| messages | string \| ChatMessage[] | Required | A prompt string, or a list of role/content messages. |
| model | string | Optional | Explicit model id; skips task/prefer routing. |
| vendor | string | Optional | Pin a specific AI vendor. |
| task | "general" \| "reasoning" \| "coding" \| "long_context" | Optional | Use-case axis (general/reasoning/coding/long_context); routes to a model family. |
| prefer | "balanced" \| "cheapest" \| "smartest" | Optional | Optimization axis: balanced (best value), cheapest, or smartest. |
| temperature | number | Optional | Sampling temperature. |
| max_tokens | number | Optional | Maximum tokens to generate. |
| tools | Tool[] | Optional | Tool/function-calling definitions. |
| response_format | { type: "json_object" \| "text" \| "json_schema" } | Optional | Force structured output (JSON object or schema). |
| cache_strategy | "vendor" \| "infrai" \| "none" | Optional | Which cache layer to use. |
| batch_mode | boolean | Optional | Run as a batch job for a discounted rate. |
| max_cost_multiplier | number | Optional | Cap failover cost relative to the primary vendor. |
| stream | boolean | Optional | Set true to stream; use the streaming method. |
| idempotency_key | string | Optional | Optional dedup key; identical retries return the same result. |
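The idempotency_key row above only says that identical retries return the same result; it does not say how to choose the key. One possible approach, sketched here, is to hash the request itself so that a retry of the same call reuses the same key. `make_idempotency_key` is a hypothetical helper, not part of the infrai SDK, and the `chat-` prefix and 32-character length are arbitrary choices.

```python
import hashlib
import json


def make_idempotency_key(prompt, **options):
    """Derive a stable key from the request so identical retries share it.

    Hypothetical helper for illustration; not part of the infrai SDK.
    """
    # Sort keys so that keyword-argument order does not change the digest.
    payload = json.dumps({"prompt": prompt, "options": options}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "chat-" + digest[:32]


key_a = make_idempotency_key("Summarize this ticket.", task="general", prefer="cheapest")
key_b = make_idempotency_key("Summarize this ticket.", prefer="cheapest", task="general")
key_c = make_idempotency_key("Summarize this ticket.", task="coding", prefer="cheapest")

print(key_a == key_b)  # True: same request, same key
print(key_a != key_c)  # True: different options, different key
```

The resulting string would then be passed as `idempotency_key=key_a` on both the first attempt and any retry.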
Returns
ChatResult { content, finish_reason, usage, metadata }
Example
One-time setup (every example assumes this has been done):
pip install infrai
# one-time auth (no secret needed): anonymous account + trial, writes ~/.infrai/credentials
python -c "from infrai import infra; infra.activate()"
# returning user instead: export INFRAI_API_KEY=ifr_pk_proj_...
from infrai import ai
# Non-streaming call: the full completion comes back as a single result.
result = ai.chat(
    "Tell me a 50-word story about a curious raccoon.",
    prefer="balanced",
)
print(result.get("text") or result.get("content"))
# Returns lists this field as `metadata`; accept either spelling of the key.
meta = result.get("_metadata") or result.get("metadata") or {}
print("request_id:", meta.get("request_id"))
print("vendor:", meta.get("vendor"))
ai.chat
Stream chat tokens as Server-Sent Events; iterate the chunks as they arrive.
Python
ai.chat(prompt, *, stream=True) # iterate chunks
TypeScript
client.ai.streamChat(opts: ChatOptions): AsyncIterable<ChatStreamChunk>
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| messages | string \| ChatMessage[] | Required | A prompt string, or a list of role/content messages. |
| signal | AbortSignal | Optional | An AbortSignal to cancel the stream. |
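Callers often need the complete text as well as the incremental output. The sketch below joins streamed deltas into one string. It assumes each chunk exposes `chunk.delta.content` and `chunk.finish_reason` as attributes, which is how the Python example in this section reads them. `fake_chunk` is a stand-in so the sketch runs without the SDK, and the `"stop"` finish reason is an illustrative value only.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join streamed deltas into the full completion text."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        # A chunk may carry no text (for example a final chunk that only
        # reports finish_reason), so treat missing content as empty.
        parts.append(getattr(chunk.delta, "content", None) or "")
        finish_reason = getattr(chunk, "finish_reason", None) or finish_reason
    return "".join(parts), finish_reason


def fake_chunk(text, finish_reason=None, index=0):
    """Stand-in for a ChatStreamChunk (assumed shape, not the SDK type)."""
    return SimpleNamespace(
        delta=SimpleNamespace(content=text),
        finish_reason=finish_reason,
        index=index,
    )


demo = [
    fake_chunk("A curious "),
    fake_chunk("raccoon."),
    fake_chunk(None, finish_reason="stop"),
]
text, reason = collect_stream(demo)
print(text)    # A curious raccoon.
print(reason)  # stop
```

With the real SDK, the same function would presumably be called as `collect_stream(ai.chat(prompt, stream=True))`.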
Returns
AsyncIterable<ChatStreamChunk { delta, finish_reason, index }>
Example
from infrai import ai
for chunk in ai.chat("Tell me a 50-word story about a curious raccoon.", stream=True):
    print(chunk.delta.content or "", end="", flush=True)
ai.embed
Create text embeddings, single or batched.
Python
ai.embed.create(text, *, model=None, vendor=None, dimensions=None, idempotency_key=None)
TypeScript
client.ai.embed(opts: EmbedOptions): Promise<EmbedResult>
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| input | string \| string[] | Required | Text or list of texts to embed. |
| model | string | Optional | Explicit model id; skips task/prefer routing. |
| dimensions | number | Optional | Target embedding dimension count. |
| idempotency_key | string | Optional | Optional dedup key; identical retries return the same result. |
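Embeddings are typically used for semantic search by comparing a query vector against document vectors with cosine similarity. The sketch below shows that ranking step in plain Python. The three-dimensional vectors are toy values chosen for illustration, not real model output, and `rank` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank(query_vec, docs):
    """docs is a list of (label, vector); returns labels, most similar first."""
    scored = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vec, d[1]),
        reverse=True,
    )
    return [label for label, _ in scored]


# Toy vectors standing in for embeddings returned by ai.embed.
query = [1.0, 0.0, 1.0]
docs = [
    ("raccoon", [0.9, 0.1, 0.8]),
    ("tractor", [0.0, 1.0, 0.0]),
    ("otter", [0.5, 0.5, 0.5]),
]
print(rank(query, docs))  # ['raccoon', 'otter', 'tractor']
```

For more than a few thousand documents, a vector index or a NumPy matrix product would usually replace the pure-Python loop.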
Returns
EmbedResult { embeddings, model, usage, metadata }
Example
from infrai import ai
# Embed text once for semantic search / RAG; the vectors can then be ranked by cosine similarity.
result = ai.embed.create("What animal has clever paws?")
data = result.get("data") or result.get("embeddings") or []
# Entries may be dicts with an "embedding" key or bare vectors; guard against an empty list.
first = data[0] if data else []
vector = first.get("embedding") if isinstance(first, dict) else first
print("dimensions:", len(vector or []))
ai.image
Generate images from a text prompt across image vendors.
Python
ai.image.generate(prompt, *, size=None, n=None, model=None, vendor=None, prefer="balanced")
TypeScript
client.ai.image(opts: ImageOptions): Promise<{ images }>
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | string | Required | Text description of the desired image. |
| size | string | Optional | Output size, e.g. 1024x1024. |
| n | number | Optional | Number of images to generate. |
| prefer | "balanced" \| "cheapest" \| "smartest" | Optional | Optimization axis: balanced (best value), cheapest, or smartest. |
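Each generated image is described as an entry with a `url` and an optional `b64` field, so client code has to handle both forms. The sketch below is one way to branch on them. It assumes `b64` holds standard base64 of the raw image bytes with no `data:` URI prefix, which this page does not confirm, and `image_payload` is a hypothetical helper.

```python
import base64


def image_payload(entry):
    """Return ("bytes", data) for inline base64 entries, else ("url", url)."""
    b64 = entry.get("b64")
    if b64:
        # ASSUMPTION: plain base64 of the image bytes, no data: URI prefix.
        return "bytes", base64.b64decode(b64)
    return "url", entry.get("url")


# Two hand-made entries standing in for items of the `images` list.
inline = {"url": None, "b64": base64.b64encode(b"\x89PNG demo").decode("ascii")}
hosted = {"url": "https://example.com/raccoon.png"}

print(image_payload(inline)[0])  # bytes
print(image_payload(hosted))     # ('url', 'https://example.com/raccoon.png')
```

Inline bytes can be written straight to disk, while a URL entry still needs a separate download step.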
Returns
{ images: Array<{ url, b64? }> }
Example
from infrai import ai
result = ai.image.generate(
    "A watercolor raccoon reading a book under a maple tree",
    size="1024x1024",
    n=1,
    prefer="balanced",
)
images = result.get("images") or result.get("data") or []
print("first url:", images[0].get("url") if images else None)
ai.vision
Reason over one or more images with a prompt.
Python
ai.vision(prompt=..., images=[...], model=None, vendor=None)
TypeScript
client.ai.vision(opts: VisionOptions): Promise<ChatResult>
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | string | Required | Question or instruction about the supplied images. |
| images | Array<{ url } \| { base64, mime? }> | Required | Image references as URLs or base64. |
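The images row accepts either a URL or a `{ base64, mime? }` object. For a local file that has no public URL, the base64 form has to be built by hand. The sketch below shows one way to do that. It assumes `base64` expects standard base64 text without a `data:` URI prefix, and `image_ref_from_bytes` is a hypothetical helper rather than an SDK function.

```python
import base64
import mimetypes


def image_ref_from_bytes(data, filename):
    """Build a { base64, mime? } reference from raw bytes.

    The filename is used only to guess the MIME type.
    """
    mime, _ = mimetypes.guess_type(filename)
    # ASSUMPTION: the API wants plain base64 text, not a data: URI.
    ref = {"base64": base64.b64encode(data).decode("ascii")}
    if mime:
        ref["mime"] = mime  # mime is optional, so omit it when unknown
    return ref


ref = image_ref_from_bytes(b"\xff\xd8\xff demo bytes", "photo.jpg")
print(sorted(ref))
print(ref.get("mime"))
```

In real use, `data` would come from `open(path, "rb").read()` and the result would be passed as one element of `images=[...]`.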
Returns
ChatResult
Example
from infrai import ai
r = ai.vision(prompt="What's in this image?",
              images=[{"url": "https://example.com/photo.jpg"}])
print(r.get("content") or r.get("text"))
ai.tts
Synthesize speech audio from text.
Python
ai.tts(input=..., voice=None, model=None, vendor=None, format=None)
TypeScript
client.ai.tts(opts: TtsOptions): Promise<{ audio_url, format }>
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| input | string | Required | Text to synthesize into speech. |
| voice | string | Optional | Voice id to use. |
| format | "mp3" \| "wav" \| "ogg" \| "pcm" | Optional | Output audio format. |
Returns
{ audio_url, format }
Example
from infrai import ai
r = ai.tts(input="Hello from Infrai.", voice="alloy", format="mp3")
print("audio url:", r.get("audio_url"))
ai.asr
Transcribe audio to text with optional language hints.
Python
ai.asr(audio=..., language=None, model=None, vendor=None)
TypeScript
client.ai.asr(opts: AsrOptions): Promise<{ text, language, segments? }>
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| audio | { url } \| { base64, mime? } | Required | Audio reference as URL or base64. |
| language | string | Optional | Language hint, e.g. en or zh. |
Returns
{ text, language, segments? }
Example
from infrai import ai
r = ai.asr(audio={"url": "https://example.com/clip.mp3"}, language="en")
print("transcript:", r.get("text"))
ai.realtime.token
Mint a short-lived token for a WebRTC live session.
Python
ai.realtime.token(model=None, voice=None, ttl_seconds=None)
TypeScript
client.ai.liveToken(opts): Promise<LiveSessionToken>
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| model | string | Optional | Explicit model id; skips task/prefer routing. |
| voice | string | Optional | Voice id to use. |
| ttl_seconds | number | Optional | Lifetime in seconds before expiry. |
Returns
LiveSessionToken { session_id, ice_servers, token, expires_at }
Example
from infrai import ai
tok = ai.realtime.token(ttl_seconds=300)
print("session:", tok.get("session_id"), "| expires:", tok.get("expires_at"))
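Because the token is short-lived, a client usually has to decide when to mint a fresh one before opening or renewing a session. The sketch below checks the remaining lifetime against a safety margin. It assumes `expires_at` is an ISO 8601 timestamp string, which this page does not specify; if the field turns out to be epoch seconds, the parsing line would need to change. `needs_refresh` and the 30-second margin are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expires_at, now=None, margin_seconds=30):
    """True when the token expires within margin_seconds.

    ASSUMPTION: expires_at is an ISO 8601 string such as
    "2025-01-01T12:05:00Z".
    """
    now = now or datetime.now(timezone.utc)
    # Older Python versions reject a trailing "Z" in fromisoformat.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Treat naive timestamps as UTC so the subtraction is well defined.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry - now <= timedelta(seconds=margin_seconds)


# A fixed clock keeps the demonstration deterministic.
now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(needs_refresh("2025-01-01T12:05:00Z", now=now))  # False
print(needs_refresh("2025-01-01T12:00:20Z", now=now))  # True
```

When the check returns True, the client would call `ai.realtime.token(...)` again and reconnect with the new token.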