One unified LLM infrastructure for research, and more.

The internal layer behind research.jing.vision and the Article API. The full stack in one place: a unified LLM endpoint that routes to any model; agents with tool-calling and decision loops; prompts‑as‑functions — versioned, typed, schema‑enforced; and compound pipelines that orchestrate whole workflows end‑to‑end.

one endpoint · any model · OpenAI-compatible
from openai import OpenAI

gw = OpenAI(
  api_key  = "gw_••••••••",
  base_url = "https://x.jing.vision/v1",
)
# route to any model — zero other changes
resp = gw.chat.completions.create(
  model    = "claude-opus-4.6",
  messages = messages,
  stream   = True,
)
multi-step agent · tools · automatic fallback & routing
agent = create_agent(
  tools   = [search, scrape, embed, summarize],
  routing = {
    "reasoning":  "claude-opus-4.6",
    "summaries":  "gpt-5-mini",
    "embeddings": "ollama/nomic-embed",
  },
)
findings = await agent.run(query)
trace · 3 steps · 4.1s
STEP 1 search(query) → 18 results
STEP 2 scrape + embed → 3 docs chunked
STEP 3 summarize(docs) → 320 tokens
prompt as versioned, typed function
extract_methods = define_prompt(
  name    = "extract-methods",
  model   = "gpt-5-mini",
  schema  = MethodsSchema,
  version = 4,
)
call it anywhere · schema-validated
out = await extract_methods(section=section)
# { "datasets":[...], "baselines":[...],
#   "confidence": 0.91 }
compound pipeline · PDF → index
paper = await gateway.ingest(
  id       = "2401.12345",
  passes   = ["entities", "methods",
              "metrics",  "topics"],
  strategy = "parallel",
)
output · 4 passes · 2.8s · schema-enforced
PASS methods datasets, baselines isolated
PASS metrics F1 0.88, BLEU 42.1, params 7B
PASS topics RAG · retrieval · long-context
→ INDEX deduped + written to research index

The engine room
for messy
papers.

Research papers are messy in two ways. Each one is long and unstructured — critical details like datasets, baselines, metrics, and limitations are buried deep inside PDFs with no reliable schema. And there are simply too many of them: manual reading doesn't scale, and keyword search returns noise rather than comparable evidence.

What you actually need is to extract evidence and key findings, then compare horizontally across many papers — reliably and repeatedly. That's what gateway makes possible. It's not a standalone product; it's the deep infrastructure behind the Article API extraction pipeline and the research experience at research.jing.vision.

Think of it as the engine room. It runs the LLM passes that make paper intelligence possible — so the app can surface key findings and compare papers at scale without anyone reading a full PDF.

Each extraction is a typed, versioned prompt function: submit an arXiv ID, retrieve and parse the PDF into sections, run parallel LLM passes, validate every output against a JSON schema with confidence scoring, dedup against the index, and serve structured results to the API. Repeatable, observable, and cheap enough to run across hundreds of papers.
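The dedup step in that flow can be sketched as a keyed upsert; keying records by (paper id, pass name) is an assumption made here for illustration, not gateway's actual index schema:

```python
# A minimal dedup-on-write sketch. The (paper id, pass name) key and
# the record fields are illustrative assumptions.
def upsert(index: dict, record: dict) -> dict:
    key = (record["paper"], record["pass"])
    index[key] = record  # a re-run replaces the prior record, so no duplicates
    return index

idx: dict = {}
upsert(idx, {"paper": "2401.12345", "pass": "methods", "version": 3})
upsert(idx, {"paper": "2401.12345", "pass": "methods", "version": 4})
# idx now holds a single record for this paper/pass pair
```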

capabilities

The building blocks.
Plus the research layer built on them.

Four core primitives — model routing, agents, prompt functions, compound pipelines — plus five deeper capabilities that power the research application. Every card is real and shipped.

─ Core primitives · 01–04
01 ·
Any Model, One Key

Different providers, different SDKs, different auth tokens.

One OpenAI-compatible endpoint routes to Claude, GPT-5, Gemini, Mistral, or any local model via Ollama — zero refactor. Swap a base_url and nothing else changes.
drop-in · zero refactor
02 ·
Tool-Calling Agents
agentic

Research tasks need decision loops, not one-shot completions.

create_agent wires tools into a repeatable multi-step decision loop with per-step model routing and automatic retry. Run a full literature review across 40 papers — searching, fetching, chunking, synthesizing — in a single await.
query search scrape embed synthesize findings
multi-step · tool-calling · decision loops
03 ·
Prompt-as-Function

Prompts scattered inline can’t be versioned, reused, or tested.

define_prompt wraps any prompt in a named, typed, schema-enforced function. Call it anywhere in the codebase. Bump the version and re-run only that pass — prior indexed results stay intact and historically comparable.
v1 v2 v3 v4 · live · prior results intact
versioned · typed · callable
04 ·
Compound Pipelines

Multi-stage tasks need orchestration, not a tangle of callbacks.

Chain steps — ingest, extract, validate, index — into one observable pipeline. Gateway manages state between steps end-to-end.
orchestrated · stateful · observable
─ Research layer · 05–09
05 ·
Parallel Execution

Sequential LLM passes don’t scale to hundreds of papers.

Run extraction passes concurrently — four at the latency of one. Each is independently validated so a failed pass never blocks a clean one from writing to the index.
extractMethods extractMetrics extractTopics 2.8s total
concurrent · independently scoped
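A minimal sketch of the pattern using asyncio.gather, where the pass functions below are stand-ins rather than gateway's real API:

```python
import asyncio

# Stand-in pass functions for illustration — not gateway's real passes.
async def extract_methods(paper_id: str) -> dict:
    return {"pass": "methods", "paper": paper_id}

async def extract_metrics(paper_id: str) -> dict:
    return {"pass": "metrics", "paper": paper_id}

async def extract_topics(paper_id: str) -> dict:
    raise RuntimeError("model timeout")  # simulate one failing pass

async def run_passes(paper_id: str) -> list:
    # return_exceptions=True captures a failed pass as a value, so it
    # never blocks a clean pass from reaching the index
    results = await asyncio.gather(
        extract_methods(paper_id),
        extract_metrics(paper_id),
        extract_topics(paper_id),
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, Exception)]

clean = asyncio.run(run_passes("2401.12345"))
# clean holds only the successful, independently validated passes
```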
06 ·
Cost-Optimal Routing

Using the same model for every task burns budget fast.

Route by role: Claude for deep reasoning, GPT-5-mini for bulk summaries, Ollama for embeddings. The agent picks the cheapest capable model per step.
claude · gpt-mini · ollama
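One way to sketch cheapest-capable selection, with a capability and price table whose model names and prices are purely illustrative:

```python
# Illustrative capability/price table — roles and per-1M-token prices
# are assumptions for this sketch, not real pricing.
MODELS = {
    "claude-opus-4.6":    {"roles": {"reasoning", "summaries"}, "price": 15.0},
    "gpt-5-mini":         {"roles": {"summaries"},              "price": 0.6},
    "ollama/nomic-embed": {"roles": {"embeddings"},             "price": 0.0},
}

def cheapest_capable(role: str) -> str:
    """Pick the lowest-priced model whose role set covers the task."""
    capable = [(spec["price"], name) for name, spec in MODELS.items()
               if role in spec["roles"]]
    return min(capable)[1]
```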
07 ·
Step-Level Tracing

Agents fail silently; outputs are hard to audit or reproduce.

Every tool call and LLM completion emits a trace — input, output, model, latency. Debug exactly what went wrong across any prompt version.
input · output · latency
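A minimal sketch of step-level tracing as a decorator; the decorator name and trace fields here are assumptions, not gateway's API:

```python
import functools
import time

TRACE = []  # in-memory trace sink; a real system would persist these

def traced(fn):
    """Record input, output, and latency for every call (sketch)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        out = fn(*args, **kwargs)
        TRACE.append({
            "step": fn.__name__,
            "input": args,
            "output": out,
            "latency_s": round(time.perf_counter() - start, 4),
        })
        return out
    return wrapper

@traced
def summarize(docs):
    return f"summary of {len(docs)} docs"

summarize(["a", "b", "c"])
# TRACE now holds one structured record for the summarize step
```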
08 ·
Versioned Prompt Registry

Improving a prompt silently invalidates prior results.

Every prompt function carries a version. Each indexed record stores the version that produced it — comparisons stay valid as prompts evolve.
reproducible · comparable
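The version tag is what keeps comparisons filterable; a sketch with illustrative record fields:

```python
# Sketch of version-tagged index records — field names and values are
# illustrative, not gateway's actual record schema.
index = [
    {"paper": "A", "pass": "methods", "version": 3, "datasets": ["NQ"]},
    {"paper": "B", "pass": "methods", "version": 4, "datasets": ["TriviaQA"]},
    {"paper": "C", "pass": "methods", "version": 4, "datasets": ["NQ"]},
]

def comparable(records: list, pass_name: str, version: int) -> list:
    """Only records produced by the same prompt version are comparable."""
    return [r for r in records
            if r["pass"] == pass_name and r["version"] == version]

same_version = comparable(index, "methods", 4)
```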
09 ·
Structured Output, Always

Freeform LLM text breaks every downstream system.

Every extraction is schema-enforced and confidence-scored. Low-confidence outputs are flagged before the index; duplicates dropped on write. Always clean and queryable.
typed · scored · deduped
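A minimal sketch of the validation gate, assuming a 0.75 confidence floor and illustrative required keys:

```python
# Sketch of schema + confidence gating. The 0.75 floor and the
# required-key set are assumptions for illustration.
CONFIDENCE_FLOOR = 0.75
REQUIRED_KEYS = {"datasets", "baselines", "confidence"}

def validate_extraction(out: dict) -> tuple:
    """Return (accepted, problems) for one extraction output."""
    problems = []
    missing = REQUIRED_KEYS - out.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if out.get("confidence", 0.0) < CONFIDENCE_FLOOR:
        problems.append("low confidence")
    return (not problems, problems)

ok, _ = validate_extraction(
    {"datasets": ["NQ"], "baselines": ["DPR"], "confidence": 0.91})
flagged, why = validate_extraction({"datasets": [], "confidence": 0.41})
# the second output is rejected: a key is missing and confidence is low
```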

The four primitives in action

Each primitive. Real code.

The same four building blocks — model routing, agents, prompt functions, compound pipelines — shown end-to-end on a real research task.

# ① unified LLM — one endpoint, swap model freely
from openai import OpenAI

gw = OpenAI(
  base_url = "https://x.jing.vision/v1",
  api_key  = "gw_••••••••",
)

# route to any provider — identical call each time
gw.chat.completions.create(model="claude-opus-4.6", messages=msgs)
gw.chat.completions.create(model="gpt-5",           messages=msgs)
gw.chat.completions.create(model="ollama/llama3.2", messages=msgs)
# ② agent — decision loop across tools, model-per-role routing
agent = create_agent(
  tools   = [search, scrape, embed, summarize],
  routing = {
    "reasoning":  "claude-opus-4.6",
    "bulk":       "gpt-5-mini",
    "embeddings": "ollama/nomic-embed",
  },
)
result = await agent.run("compare RAG architectures across 30 papers")

# trace emitted automatically:
# STEP 1  search(query)       → 31 papers found
# STEP 2  scrape + embed      → 31 docs chunked
# STEP 3  summarize(docs)     → synthesis, 480 tokens  ·  4.1 s
# ③ prompt-as-function — versioned, typed, schema-enforced
extract_methods = define_prompt(
  name    = "extract-methods",
  model   = "gpt-5-mini",
  schema  = MethodsSchema,  # pydantic model
  version = 4,
)

# call it anywhere — output is always typed + validated
out = await extract_methods(text=section_text)

# → { "datasets":   ["Natural Questions", "TriviaQA"],
#     "baselines":  ["DPR+FiD", "Atlas", "Vanilla GPT-4"],
#     "confidence": 0.91 }   # flagged if < 0.75
# ④ compound pipeline — ingest → ∥ passes → validate → index
paper = await gateway.ingest(
  id       = "2401.12345",
  passes   = ["methods", "metrics", "topics"],
  strategy = "parallel",
)

# agent orchestrates each pass, retries failures, routes by cost:
# ∥ extractMethods@v4  →  { datasets, baselines }  confidence 0.91
# ∥ extractMetrics@v3  →  { EM_NQ: 0.512, params_B: 7 }  conf 0.88
# ∥ extractTopics@v3   →  ["RAG", "retrieval", "long-context"]
# → dedup → index.upsert()   ·  2.8 s total  ·  pass_versions stored
Crafted abstraction for complex research.
01 ·

The research app it powers

The live research experience built on top of gateway — key-finding extraction, side-by-side paper comparison, and structured evidence across the full index.

Open research.jing.vision
02 ·

The extraction pipeline

The Article API landing and extraction pipeline — where PDFs go in and structured, machine-readable evidence comes out via gateway's orchestration layer.

Article API ↗