Skip to content
Mohammad.
All systems
Case Studyoperational2024 – 2025

AI Microservices Suite

RAG, multi-modal chat, voice & vision

AI Engineer

Request flow
0k+listings indexed
multi-modaltext · voice · vision
scoreddispute verdicts

01 · Context

A US marketplace wanted AI woven through the product, not bolted on: recommendations that understand listings semantically, an assistant that remembers its conversations, voice as an input channel, image moderation that scales past human review, and a dispute process that doesn't burn support hours. Each of those is a different model with different failure modes — the engineering problem was shipping them as one dependable suite.

02 · Architecture

Every capability is its own Dockerized FastAPI microservice. The recommendation engine embeds listings with OpenAI embeddings and serves semantic search from Pinecone across 10,000+ listings. The assistant is a GPT-4o-mini chat service with persistent conversation history in MongoDB; Whisper handles voice-to-service transcription, and GPT-4 Vision screens uploaded images. The dispute-resolution engine is multi-modal — it weighs text and image evidence and produces an automated verdict with a confidence score attached, exposed over the same REST surface as everything else.

03 · Decisions

01One model, one service

RAG search, chat, transcription, vision moderation and dispute resolution each live in their own FastAPI container. A misbehaving model can be redeployed, rolled back, or rate-limited without touching its neighbors.

02Confidence scores on every verdict

The dispute engine never returns a bare answer — every automated verdict carries a confidence score, so low-confidence cases are observable and can be routed to a human instead of failing silently.

03Memory lives in the database, not the prompt

Chat history persists in MongoDB rather than being replayed from the client, which keeps conversations durable across sessions and keeps token costs under control.

04Embeddings + Pinecone over keyword search

Semantic retrieval over 10k+ listings meant recommendations could match intent ('something for a leaky kitchen tap') rather than literal keywords — the difference between a search box and a recommendation engine.

04 · Outcomes

  • A RAG recommendation engine serving semantic search across 10,000+ marketplace listings.
  • A multi-modal assistant in production: GPT-4o-mini chat with persistent MongoDB history, Whisper voice input, and GPT-4 Vision image moderation.
  • Automated dispute resolution producing confidence-scored verdicts through a Dockerized FastAPI microservice.

Stack

FastAPIOpenAIPineconeRAGWhisperGPT-4 VisionDocker

Want the full walkthrough?

The repository is private — happy to walk through the code and decisions on a call.

Get in touch