One unified LLM infrastructure for research, and more.

The internal layer behind research.jing.vision and the Article API. The full stack in one place: a unified LLM endpoint that routes to any model; agents with tool-calling and decision loops; prompts‑as‑functions — versioned, typed, schema‑enforced; and compound pipelines that orchestrate whole workflows end‑to‑end.

one endpoint · any model · OpenAI-compatible
from openai import OpenAI

gw = OpenAI(
  api_key  = "gw_••••••••",
  base_url = "https://x.jing.vision/v1",
)
# route to any model — zero other changes
resp = gw.chat.completions.create(
  model    = "claude-opus-4.6",
  messages = messages,
  stream   = True,
)
multi-step agent · tools · automatic fallback & routing
agent = create_agent(
  tools   = [search, scrape, embed, summarize],
  routing = {
    "reasoning":  "claude-opus-4.6",
    "summaries":  "gpt-5-mini",
    "embeddings": "ollama/nomic-embed",
  },
)
findings = await agent.run(query)
trace · 3 steps · 4.1s
STEP 1 search(query) → 18 results
STEP 2 scrape + embed → 3 docs chunked
STEP 3 summarize(docs) → 320 tokens
prompt as versioned, typed function
extract_methods = define_prompt(
  name    = "extract-methods",
  model   = "gpt-5-mini",
  schema  = MethodsSchema,
  version = 4,
)
call it anywhere · schema-validated
out = await extract_methods(section=section)
# { "datasets":[...], "baselines":[...],
#   "confidence": 0.91 }
compound pipeline · PDF → index
paper = await gateway.ingest(
  id       = "2401.12345",
  passes   = ["entities", "methods",
              "metrics",  "topics"],
  strategy = "parallel",
)
output · 4 passes · 2.8s · schema-enforced
PASS methods datasets, baselines isolated
PASS metrics F1 0.88, BLEU 42.1, params 7B
PASS topics RAG · retrieval · long-context
→ INDEX deduped + written to research index

The engine room
for messy
papers.

Research papers are messy in two ways. Each one is long and unstructured — critical details like datasets, baselines, metrics, and limitations are buried deep inside PDFs with no reliable schema. And there are simply too many of them: manual reading doesn't scale, and keyword search returns noise rather than comparable evidence.

What you actually need is to extract evidence and key findings, then compare horizontally across many papers — reliably and repeatedly. That's what gateway makes possible. It's not a standalone product; it's the deep infrastructure behind the Article API extraction pipeline and the research experience at research.jing.vision.

Think of it as the engine room. It runs the LLM passes that make paper intelligence possible — so the app can surface key findings and compare papers at scale without anyone reading a full PDF.

Each extraction is a typed, versioned prompt function: submit an arXiv ID, retrieve and parse the PDF into sections, run parallel LLM passes, validate every output against a JSON schema with confidence scoring, dedup against the index, and serve structured results to the API. Repeatable, observable, and cheap enough to run across hundreds of papers.
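The dedup step in that flow can be sketched as a keyed upsert; keying records by (paper id, pass name) is an assumption made here for illustration, not gateway's actual index schema:

```python
# A minimal dedup-on-write sketch. The (paper id, pass name) key and
# the record fields are illustrative assumptions.
def upsert(index: dict, record: dict) -> dict:
    key = (record["paper"], record["pass"])
    index[key] = record  # a re-run replaces the prior record, so no duplicates
    return index

idx: dict = {}
upsert(idx, {"paper": "2401.12345", "pass": "methods", "version": 3})
upsert(idx, {"paper": "2401.12345", "pass": "methods", "version": 4})
# idx now holds a single record for this paper/pass pair
```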

capabilities

The building blocks.
Plus the research layer built on them.

Four core primitives — model routing, agents, prompt functions, compound pipelines — plus five deeper capabilities that power the research application. Every card is real and shipped.

─ Core primitives · 01–04
01 ·
Any Model, One Key

Different providers, different SDKs, different auth tokens.

One OpenAI-compatible endpoint routes to Claude, GPT-5, Gemini, Mistral, or any local model via Ollama — zero refactor. Swap a base_url and nothing else changes.
drop-in · zero refactor
02 ·
Tool-Calling Agents
agentic

Research tasks need decision loops, not one-shot completions.

create_agent wires tools into a repeatable multi-step decision loop with per-step model routing and automatic retry. Run a full literature review across 40 papers — searching, fetching, chunking, synthesizing — in a single await.
query search scrape embed synthesize findings
multi-step · tool-calling · decision loops
03 ·
Prompt-as-Function

Prompts scattered inline can’t be versioned, reused, or tested.

define_prompt wraps any prompt in a named, typed, schema-enforced function. Call it anywhere in the codebase. Bump the version and re-run only that pass — prior indexed results stay intact and historically comparable.
v1 v2 v3 v4 · live · prior results intact
versioned · typed · callable
04 ·
Compound Pipelines

Multi-stage tasks need orchestration, not a tangle of callbacks.

Chain steps — ingest, extract, validate, index — into one observable pipeline. Gateway manages state between steps end-to-end.
orchestrated · stateful · observable
─ Research layer · 05–09
05 ·
Parallel Execution

Sequential LLM passes don’t scale to hundreds of papers.

Run extraction passes concurrently — four at the latency of one. Each is independently validated so a failed pass never blocks a clean one from writing to the index.
extractMethods extractMetrics extractTopics 2.8s total
concurrent · independently scoped
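A minimal sketch of the pattern using asyncio.gather, where the pass functions below are stand-ins rather than gateway's real API:

```python
import asyncio

# Stand-in pass functions for illustration — not gateway's real passes.
async def extract_methods(paper_id: str) -> dict:
    return {"pass": "methods", "paper": paper_id}

async def extract_metrics(paper_id: str) -> dict:
    return {"pass": "metrics", "paper": paper_id}

async def extract_topics(paper_id: str) -> dict:
    raise RuntimeError("model timeout")  # simulate one failing pass

async def run_passes(paper_id: str) -> list:
    # return_exceptions=True captures a failed pass as a value, so it
    # never blocks a clean pass from reaching the index
    results = await asyncio.gather(
        extract_methods(paper_id),
        extract_metrics(paper_id),
        extract_topics(paper_id),
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, Exception)]

clean = asyncio.run(run_passes("2401.12345"))
# clean holds only the successful, independently validated passes
```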
06 ·
Cost-Optimal Routing

Using the same model for every task burns budget fast.

Route by role: Claude for deep reasoning, GPT-5-mini for bulk summaries, Ollama for embeddings. The agent picks the cheapest capable model per step.
claude · gpt-mini · ollama
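One way to sketch cheapest-capable selection, with a capability and price table whose model names and prices are purely illustrative:

```python
# Illustrative capability/price table — roles and per-1M-token prices
# are assumptions for this sketch, not real pricing.
MODELS = {
    "claude-opus-4.6":    {"roles": {"reasoning", "summaries"}, "price": 15.0},
    "gpt-5-mini":         {"roles": {"summaries"},              "price": 0.6},
    "ollama/nomic-embed": {"roles": {"embeddings"},             "price": 0.0},
}

def cheapest_capable(role: str) -> str:
    """Pick the lowest-priced model whose role set covers the task."""
    capable = [(spec["price"], name) for name, spec in MODELS.items()
               if role in spec["roles"]]
    return min(capable)[1]
```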
07 ·
Step-Level Tracing

Agents fail silently; outputs are hard to audit or reproduce.

Every tool call and LLM completion emits a trace — input, output, model, latency. Debug exactly what went wrong across any prompt version.
input · output · latency
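A minimal sketch of step-level tracing as a decorator; the decorator name and trace fields here are assumptions, not gateway's API:

```python
import functools
import time

TRACE = []  # in-memory trace sink; a real system would persist these

def traced(fn):
    """Record input, output, and latency for every call (sketch)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        out = fn(*args, **kwargs)
        TRACE.append({
            "step": fn.__name__,
            "input": args,
            "output": out,
            "latency_s": round(time.perf_counter() - start, 4),
        })
        return out
    return wrapper

@traced
def summarize(docs):
    return f"summary of {len(docs)} docs"

summarize(["a", "b", "c"])
# TRACE now holds one structured record for the summarize step
```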
08 ·
Versioned Prompt Registry

Improving a prompt silently invalidates prior results.

Every prompt function carries a version. Each indexed record stores the version that produced it — comparisons stay valid as prompts evolve.
reproducible · comparable
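The version tag is what keeps comparisons filterable; a sketch with illustrative record fields:

```python
# Sketch of version-tagged index records — field names and values are
# illustrative, not gateway's actual record schema.
index = [
    {"paper": "A", "pass": "methods", "version": 3, "datasets": ["NQ"]},
    {"paper": "B", "pass": "methods", "version": 4, "datasets": ["TriviaQA"]},
    {"paper": "C", "pass": "methods", "version": 4, "datasets": ["NQ"]},
]

def comparable(records: list, pass_name: str, version: int) -> list:
    """Only records produced by the same prompt version are comparable."""
    return [r for r in records
            if r["pass"] == pass_name and r["version"] == version]

same_version = comparable(index, "methods", 4)
```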
09 ·
Structured Output, Always

Freeform LLM text breaks every downstream system.

Every extraction is schema-enforced and confidence-scored. Low-confidence outputs are flagged before the index; duplicates dropped on write. Always clean and queryable.
typed · scored · deduped
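A minimal sketch of the validation gate, assuming a 0.75 confidence floor and illustrative required keys:

```python
# Sketch of schema + confidence gating. The 0.75 floor and the
# required-key set are assumptions for illustration.
CONFIDENCE_FLOOR = 0.75
REQUIRED_KEYS = {"datasets", "baselines", "confidence"}

def validate_extraction(out: dict) -> tuple:
    """Return (accepted, problems) for one extraction output."""
    problems = []
    missing = REQUIRED_KEYS - out.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if out.get("confidence", 0.0) < CONFIDENCE_FLOOR:
        problems.append("low confidence")
    return (not problems, problems)

ok, _ = validate_extraction(
    {"datasets": ["NQ"], "baselines": ["DPR"], "confidence": 0.91})
flagged, why = validate_extraction({"datasets": [], "confidence": 0.41})
# the second output is rejected: a key is missing and confidence is low
```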

The four primitives in action

Each primitive. Real code.

The same four building blocks — model routing, agents, prompt functions, compound pipelines — shown end-to-end on a real research task.

# ① unified LLM — one endpoint, swap model freely
from openai import OpenAI

gw = OpenAI(
  base_url = "https://x.jing.vision/v1",
  api_key  = "gw_••••••••",
)

# route to any provider — identical call each time
gw.chat.completions.create(model="claude-opus-4.6", messages=msgs)
gw.chat.completions.create(model="gpt-5",           messages=msgs)
gw.chat.completions.create(model="ollama/llama3.2", messages=msgs)
# ② agent — decision loop across tools, model-per-role routing
agent = create_agent(
  tools   = [search, scrape, embed, summarize],
  routing = {
    "reasoning":  "claude-opus-4.6",
    "bulk":       "gpt-5-mini",
    "embeddings": "ollama/nomic-embed",
  },
)
result = await agent.run("compare RAG architectures across 30 papers")

# trace emitted automatically:
# STEP 1  search(query)       → 31 papers found
# STEP 2  scrape + embed      → 31 docs chunked
# STEP 3  summarize(docs)     → synthesis, 480 tokens  ·  4.1 s
# ③ prompt-as-function — versioned, typed, schema-enforced
extract_methods = define_prompt(
  name    = "extract-methods",
  model   = "gpt-5-mini",
  schema  = MethodsSchema,  # pydantic model
  version = 4,
)

# call it anywhere — output is always typed + validated
out = await extract_methods(text=section_text)

# → { "datasets":   ["Natural Questions", "TriviaQA"],
#     "baselines":  ["DPR+FiD", "Atlas", "Vanilla GPT-4"],
#     "confidence": 0.91 }   # flagged if < 0.75
# ④ compound pipeline — ingest → ∥ passes → validate → index
paper = await gateway.ingest(
  id       = "2401.12345",
  passes   = ["methods", "metrics", "topics"],
  strategy = "parallel",
)

# agent orchestrates each pass, retries failures, routes by cost:
# ∥ extractMethods@v4  →  { datasets, baselines }  confidence 0.91
# ∥ extractMetrics@v3  →  { EM_NQ: 0.512, params_B: 7 }  conf 0.88
# ∥ extractTopics@v3   →  ["RAG", "retrieval", "long-context"]
# → dedup → index.upsert()   ·  2.8 s total  ·  pass_versions stored
Crafted abstraction for complex research.
01 ·

The research app it powers

The live research experience built on top of gateway — key-finding extraction, side-by-side paper comparison, and structured evidence across the full index.

Open research.jing.vision
02 ·

The extraction pipeline

The Article API landing and extraction pipeline — where PDFs go in and structured, machine-readable evidence comes out via gateway's orchestration layer.

Article API ↗