How It Works
From document to answer
A nine-step pipeline that turns your raw files into precise, cited answers, with every response grounded in passages retrieved from your own documents.
Step 1 — Document Ingestion
Upload
Direct to S3 via pre-signed URL
PDF, DOCX, images, spreadsheets: files go straight to S3, so the application server is never a file-size bottleneck
Parse
Textract OCR + fast parsers
Amazon Textract for scanned PDFs, with a vision-model fallback for images
Chunk
Token-aware semantic chunking
Semchunk splits text into semantically coherent passages
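The idea behind token-aware chunking can be sketched in a few lines. This is a simplified stand-in, not the semchunk algorithm: it packs whole sentences into chunks under a token budget, and `count_tokens` is a hypothetical whitespace counter standing in for a real tokenizer.

```python
import re


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    return len(text.split())


def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens is kept as its own
    oversized chunk rather than being split mid-sentence.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and count_tokens(candidate) > max_tokens:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries keeps each passage readable on its own, which matters later when passages are quoted as citations.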
Embed
Amazon Bedrock Titan V2
Amazon Titan Text Embeddings V2, with a Redis caching layer
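A caching layer in front of an embedding model might look like the sketch below. It assumes a content-hash cache key, uses a plain dict where the real system uses Redis, and `fake_embed` is a hypothetical placeholder for the embedding call.

```python
import hashlib
from typing import Callable

Vector = list[float]


class CachedEmbedder:
    """Embed text once per unique content, keyed by a SHA-256 hash."""

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self._embed_fn = embed_fn
        self._cache: dict[str, Vector] = {}  # stand-in for Redis
        self.misses = 0  # number of calls that reached the model

    def embed(self, text: str) -> Vector:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]


def fake_embed(text: str) -> Vector:
    # Hypothetical placeholder: a real embedder returns a dense vector.
    return [float(len(text)), float(text.count(" "))]
```

Keying on a hash of the text means re-uploading an unchanged document costs no additional embedding calls.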
Index
Pinecone per-workspace namespace
Tenant-isolated vector index, ready to query in seconds
Query time
Step 2 — Retrieval & Generation
Query
HyDE + expansion
HyDE generates a hypothetical answer to improve semantic match; query expansion widens recall
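As a rough sketch of the query step: HyDE searches with a generated hypothetical answer rather than the bare question, and expansion adds alternative phrasings. The `generate` and `expand` callables are hypothetical stand-ins for LLM calls, and the prompt wording is an assumption.

```python
from typing import Callable


def build_search_texts(
    question: str,
    generate: Callable[[str], str],
    expand: Callable[[str], list[str]],
) -> list[str]:
    """Return the texts to embed and search with, without duplicates."""
    # HyDE: ask the model for a passage that would answer the question.
    hypothetical = generate(f"Write a short passage that answers: {question}")
    candidates = [hypothetical, question, *expand(question)]
    unique: list[str] = []
    for text in candidates:
        if text and text not in unique:
            unique.append(text)
    return unique
```

The hypothetical answer tends to resemble stored passages more closely than a short question does, which is why it is embedded alongside the original query.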
Retrieve
Vector similarity search
Top-k passages retrieved from Pinecone by cosine similarity
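Cosine-similarity retrieval within a tenant namespace can be illustrated with a small in-memory index. This is a sketch only: `NamespacedIndex` is a hypothetical class, not the Pinecone client, and a production index uses approximate search rather than a full scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NamespacedIndex:
    """Vectors are stored per namespace; queries never cross namespaces."""

    def __init__(self) -> None:
        self._data: dict[str, list[tuple[str, list[float]]]] = {}

    def upsert(self, namespace: str, passage_id: str, vector: list[float]) -> None:
        self._data.setdefault(namespace, []).append((passage_id, vector))

    def query(
        self, namespace: str, vector: list[float], top_k: int
    ) -> list[tuple[str, float]]:
        # Score only the passages that belong to the requested namespace.
        scored = [
            (passage_id, cosine(vector, stored))
            for passage_id, stored in self._data.get(namespace, [])
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

Because the namespace is a required argument of every query, one workspace cannot retrieve another workspace's passages by accident.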
Re-rank
Cohere Rerank v3.5
A cross-encoder re-scores the retrieved passages for precision, giving the LLM higher-quality context
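The second-stage re-rank has a simple shape: score each (query, passage) pair jointly, then keep the best few. In the sketch below, `overlap_score` is a toy word-overlap scorer standing in for the cross-encoder; it is an assumption for illustration and says nothing about how Cohere Rerank scores passages.

```python
from typing import Callable


def overlap_score(query: str, passage: str) -> float:
    # Toy stand-in for a cross-encoder: share of query words in the passage.
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(
    query: str,
    passages: list[str],
    top_n: int,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> list[str]:
    """Re-order first-stage results by a pairwise relevance score."""
    ranked = sorted(passages, key=lambda p: score_fn(query, p), reverse=True)
    return ranked[:top_n]
```

Vector search is cheap but scores the query and passage independently; scoring them together is slower, so it runs only on the short list from the first stage.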
Generate
OpenRouter LLM
The LLM receives only the re-ranked, relevant context and returns a grounded answer with document citations
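One way to keep answers grounded and citable is to number the passages in the prompt and ask the model to cite those numbers. The prompt wording and the `doc` and `text` field names below are assumptions for illustration.

```python
def build_prompt(question: str, passages: list[dict[str, str]]) -> str:
    """Assemble a prompt that contains only the re-ranked passages."""
    sources = "\n".join(
        f"[{number}] ({passage['doc']}) {passage['text']}"
        for number, passage in enumerate(passages, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

The numbered markers in the model's reply can then be mapped back to the source documents when the answer is displayed.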
2-stage
RAG pipeline
AES-256
Credential encryption
Per-workspace
Vector isolation
Fail-closed
JWT revocation
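"Fail-closed" means that when the revocation check itself cannot be completed, the token is rejected rather than accepted. A minimal sketch, assuming a hypothetical `is_revoked` lookup keyed by the token's `jti` claim:

```python
from typing import Callable


def is_token_allowed(jti: str, is_revoked: Callable[[str], bool]) -> bool:
    """Allow a token only if the revocation store confirms it is not revoked."""
    try:
        return not is_revoked(jti)
    except Exception:
        # Fail closed: an unreachable or failing store means access is denied.
        return False
```

The trade-off is availability: an outage of the revocation store blocks sign-ins, but a revoked token can never slip through during that outage.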