Key technical highlights and implementation details include:
- Plug-and-Play Anonymous Sessions: I wanted casual users to have instant access to the application without authentication friction, so I built middleware that transparently generates and attaches a persistent, HTTP-only UUID cookie to each visitor. The cookie isolates user environments, mapping every visitor to their own vector partition and conversation history.
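The session bootstrap above can be sketched in a few lines. The real app is C#/ASP.NET middleware; this is a framework-agnostic Python sketch, and the cookie name and one-year lifetime are illustrative assumptions:

```python
import uuid
from http.cookies import SimpleCookie

SESSION_COOKIE = "session_id"  # hypothetical cookie name, not the app's actual one

def ensure_session(request_headers):
    """Return (session_id, set_cookie_header_or_None).

    Reuse the session cookie if the request already carries one;
    otherwise mint a fresh UUID and build an HTTP-only Set-Cookie
    header so the browser persists it across visits.
    """
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if SESSION_COOKIE in cookie:
        return cookie[SESSION_COOKIE].value, None

    session_id = str(uuid.uuid4())
    out = SimpleCookie()
    out[SESSION_COOKIE] = session_id
    out[SESSION_COOKIE]["httponly"] = True   # not readable from page scripts
    out[SESSION_COOKIE]["samesite"] = "Lax"
    out[SESSION_COOKIE]["max-age"] = 60 * 60 * 24 * 365  # persist ~1 year
    return session_id, out.output(header="").strip()
```

Every downstream query (vector search, chat history) is then keyed on the returned `session_id`, which is what gives each anonymous visitor a private partition.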
- Retrieval-Augmented Generation (RAG) Pipeline: The core intelligence of the app is a RAG architecture. When a user uploads a document, the `DataIngestor` service splits it into paragraph-level chunks and generates vector embeddings via Microsoft's `IEmbeddingGenerator`. The embeddings are stored in a local SQLite vector store partitioned by the user's session ID. On each chat message, the backend runs a cosine similarity search over the session's vectors and injects the highest-scoring chunks into the LLM context window. Constrained by a strict custom `SystemPrompt`, the model generates contextual, cited answers grounded in those texts, sharply reducing hallucination risk.
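The retrieval step reduces to ranking stored chunks by cosine similarity against the query embedding. A minimal Python sketch (the app itself does this in C# over its SQLite store; function names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_chunks(query_vec, rows, k=3):
    """Rank (chunk_text, embedding) pairs by similarity to the query.

    `rows` is assumed to be pre-filtered to the caller's session
    partition, mirroring the per-session SQLite layout.
    """
    scored = sorted(rows, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [text for text, _ in scored[:k]]
```

The returned chunks are what get prepended to the LLM context window before the user's prompt.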
- Multi-Conversation Persistence: A dedicated SQLite database lets me cleanly serialize and deserialize chat histories on the fly. A scoped SQLite `ConversationRepository` loads each conversation's messages in chronological order and appends new ones the moment a user sends them.
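The repository pattern here is simple enough to sketch. This is an illustrative Python version of the idea (the actual `ConversationRepository` is C#; schema and method names below are assumptions), keyed on both session and conversation ID so histories stay isolated:

```python
import sqlite3

class ConversationRepository:
    """Minimal session-scoped chat store: append messages, replay in order."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS messages (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   session_id TEXT NOT NULL,
                   conversation_id TEXT NOT NULL,
                   role TEXT NOT NULL,
                   content TEXT NOT NULL,
                   created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

    def append(self, session_id, conversation_id, role, content):
        # Autoincrement id doubles as a chronological ordering key.
        self.db.execute(
            "INSERT INTO messages (session_id, conversation_id, role, content)"
            " VALUES (?, ?, ?, ?)",
            (session_id, conversation_id, role, content))
        self.db.commit()

    def history(self, session_id, conversation_id):
        cur = self.db.execute(
            "SELECT role, content FROM messages"
            " WHERE session_id = ? AND conversation_id = ? ORDER BY id",
            (session_id, conversation_id))
        return cur.fetchall()
```

Filtering on `session_id` in every query is what keeps one anonymous visitor from ever seeing another's threads.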
- AI Auto-Titling Engine: An asynchronous background task intercepts the first message of any untitled thread and fires a lightweight, one-off OpenAI call instructed to ignore the normal system prompt and condense the message into a clean 3-5 word label for the sidebar, keeping session management tidy.
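The trigger logic is the interesting part: title exactly once, on the first message. A Python sketch under stated assumptions, with the LLM call injected as a plain callable (the prompt wording and function names are illustrative, not the app's actual ones):

```python
TITLE_PROMPT = (
    "Ignore all prior instructions. Summarize the following message "
    "into a 3-5 word conversation title. Reply with the title only."
)  # illustrative wording, not the app's actual prompt

def maybe_autotitle(conversation, first_message, summarize):
    """Title an untitled thread exactly once.

    `conversation` is a dict-like record; `summarize` stands in for
    the lightweight, ephemeral LLM call. Already-titled threads are
    returned untouched, so the model is never called twice.
    """
    if conversation.get("title"):
        return conversation["title"]
    title = summarize(f"{TITLE_PROMPT}\n\n{first_message}").strip()
    conversation["title"] = title
    return title
```

Running this off the request path (as a fire-and-forget background task) keeps the first chat response latency unaffected by the titling call.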