Proxima AI

A sleek, responsive Retrieval-Augmented Generation (RAG) AI chat application built with Blazor Server. It enables users to upload `.pdf`, `.docx`, `.txt`, and `.md` documents securely, dynamically extracting vectors from the text to power a highly intelligent, context-aware AI assistant. Featuring a persistent multi-thread conversational tracking—all without demanding users log in or create an account.

What I Did

Key technical highlights and implementation details include:

  • Plug-and-Play Anonymous Sessions: I wanted casual users to have instant access to the application without authentication friction. I engineered secure HTTP-only middleware to transparently generate and attach a persistent UUID cookie to routing traffic. This effortlessly separated user environments securely, mapping visitors specifically to their targeted vector partitions and local conversation streams.
  • Retrieval-Augmented Generation (RAG) Pipeline: The core intelligence of the app relies on a rigorous RAG architecture. When a user uploads a document, the `DataIngestor` service parses paragraphs dynamically and creates mathematical vector embeddings using Microsoft's `IEmbeddingGenerator`. These embeddings are stored locally within an SQLite semantic vector database partitioned by the user's Session ID. When chatting, the backend intercepts the prompt, executes a cosine similarity search across the vectors, and injects the highest-relevance text chunks seamlessly into the LLM context window. Bounded by a strict custom `SystemPrompt`, the AI generates contextual, cited answers directly from those specific texts, effectively neutralizing hallucination risks.
  • Multi-Conversation Persistence:A raw SQLite database instance allowed me to cleanly serialize and deserialize chat interfaces on the fly. I constructed a scoped SQLite `ConversationRepository` to dynamically pull and update tracking histories chronologically the moment users input new textual context.
  • AI Auto-Titling Engine: Implemented an asynchronous background algorithm that intercepts the very first message of any empty thread. The logic triggers a lightweight, ephemeral OpenAI call explicitly instructed to bypass normal instructions and just summarize the text into a clean 3-5 word label for the sidebar, keeping session management remarkably clean.