How It Works

From document to answer

A nine-step pipeline that turns your raw files into precise, cited answers — with no model hallucination in the loop.

Step 1 — Document Ingestion

Upload

Direct to S3 via pre-signed URL

PDF, DOCX, images, spreadsheets — no file size bottleneck

Parse

Textract OCR + fast parsers

AWS Textract for scanned PDFs, vision model fallback for images

Chunk

Token-aware semantic chunking

Semchunk splits text into semantically coherent passages

Embed

AWS Bedrock Titan V2

Amazon Titan Embeddings V2 with Redis caching layer

Index

Pinecone per-workspace namespace

Tenant-isolated vector index, ready to query in seconds
Query time

Step 2 — Retrieval & Generation

Query

HyDE + expansion

HyDE generates a hypothetical answer to improve semantic match; query expansion widens recall

Retrieve

Vector similarity search

Top-k passages retrieved from Pinecone by cosine similarity

Re-rank

Cohere Rerank v3.5

Cross-encoder re-scores retrieved passages for precision — higher quality context for the LLM

Generate

OpenRouter LLM

LLM receives only the re-ranked, relevant context — grounded answer with document citations

2-stage

RAG pipeline

AES-256

Credential encryption

Per-workspace

Vector isolation

Fail-closed

JWT revocation