AI Microservices Suite
RAG, multi-modal chat, voice & vision
AI Engineer
01 · Context
A US marketplace wanted AI woven through the product, not bolted on: recommendations that understand listings semantically, an assistant that remembers its conversations, voice as an input channel, image moderation that scales past human review, and a dispute process that doesn't burn support hours. Each of those is a different model with different failure modes — the engineering problem was shipping them as one dependable suite.
02 · Architecture
Every capability is its own Dockerized FastAPI microservice. The recommendation engine embeds listings with OpenAI embeddings and serves semantic search from Pinecone across 10,000+ listings. The assistant is a GPT-4o-mini chat service with persistent conversation history in MongoDB; Whisper handles voice-to-service transcription, and GPT-4 Vision screens uploaded images. The dispute-resolution engine is multi-modal — it weighs text and image evidence and produces an automated verdict with a confidence score attached, exposed over the same REST surface as everything else.
03 · Decisions
01One model, one service
RAG search, chat, transcription, vision moderation and dispute resolution each live in their own FastAPI container. A misbehaving model can be redeployed, rolled back, or rate-limited without touching its neighbors.
02Confidence scores on every verdict
The dispute engine never returns a bare answer — every automated verdict carries a confidence score, so low-confidence cases are observable and can be routed to a human instead of failing silently.
03Memory lives in the database, not the prompt
Chat history persists in MongoDB rather than being replayed from the client, which keeps conversations durable across sessions and keeps token costs under control.
04Embeddings + Pinecone over keyword search
Semantic retrieval over 10k+ listings meant recommendations could match intent ('something for a leaky kitchen tap') rather than literal keywords — the difference between a search box and a recommendation engine.
04 · Outcomes
- A RAG recommendation engine serving semantic search across 10,000+ marketplace listings.
- A multi-modal assistant in production: GPT-4o-mini chat with persistent MongoDB history, Whisper voice input, and GPT-4 Vision image moderation.
- Automated dispute resolution producing confidence-scored verdicts through a Dockerized FastAPI microservice.
Stack
Want the full walkthrough?
The repository is private — happy to walk through the code and decisions on a call.