ELISA (Emacs Lisp Information System Assistant) is a system designed to generate informative answers to user queries using a Retrieval Augmented Generation (RAG) approach. RAG combines the capabilities of Large Language Models (LLMs) with Information Retrieval (IR) techniques to enhance the accuracy and relevance of generated responses.

ELISA addresses limitations inherent in purely LLM-based systems by:

- Leveraging External Knowledge: Unlike an LLM, whose knowledge is fixed at training time, ELISA can access and process information from external knowledge sources, extending what it can answer beyond the model's training data.
- Reducing Computational Requirements: Instead of retraining the LLM to incorporate new information, ELISA retrieves the relevant data at query time, which avoids the computational cost of retraining.
- Minimizing Hallucinations: By grounding responses in data retrieved from external sources, ELISA aims to reduce the likelihood of generating incorrect or nonsensical information (hallucinations) often associated with LLMs.

The following sections detail the key components and processes involved in ELISA's operation: parsing, retrieval, augmentation, and generation.

Parsing. The simple solution is to split a text document into chunks by length, with overlap, and save them to storage. ELISA uses a more advanced solution: instead of splitting by length, it splits the text by the semantic distance between parts of the document. The chunks are stored in an SQLite database.

Retrieving. The simple solution is to extract the top K chunks from storage by semantic similarity. To improve quality, ELISA uses a more robust solution. Before searching the storage, the LLM rewrites the user query to make it context-agnostic. For example, the user asks the LLM about llamas and the LLM answers. Then the user asks "where it lives?". Searching for this query as it stands would hardly find anything useful. But the LLM can rewrite it to something like "where do llamas live?", and the search will then find useful information.
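The query rewriting step can be illustrated with a minimal sketch in Python. It only assembles the prompt that would be sent to the LLM; the instruction wording, the `build_rewrite_prompt` helper, and the `(role, text)` history format are assumptions made for illustration, not ELISA's actual prompt or data structures.

```python
from typing import List, Tuple

# Hypothetical instruction text; ELISA's real prompt may differ.
REWRITE_INSTRUCTION = (
    "Rewrite the last user question so that it can be understood without "
    "the conversation. Replace pronouns with the things they refer to. "
    "Return only the rewritten question."
)


def build_rewrite_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Assemble a prompt asking an LLM for a standalone version of `question`.

    `history` is a list of (role, text) pairs, oldest first.
    """
    lines = [REWRITE_INSTRUCTION, "", "Conversation:"]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines += ["", f"Last user question: {question}", "Rewritten question:"]
    return "\n".join(lines)


history = [
    ("user", "Tell me about llamas."),
    ("assistant", "Llamas are domesticated South American camelids."),
]
prompt = build_rewrite_prompt(history, "where it lives?")
```

The LLM's reply to this prompt, rather than the original follow-up question, would then be used as the search query.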
Instead of relying on semantic similarity search alone, ELISA uses hybrid search: it searches for relevant chunks both by semantic similarity and by full-text search, and then combines the results. To improve relevance even more, the user can enable a reranker instead of taking the top K results from hybrid search directly. The reranker is a service that takes the user query and the top N chunks and feeds them to a reranker model. The model receives pairs of texts (the user query and a text chunk) and returns a number that represents relevance. The service collects these scores and sorts the chunks by relevance. When the reranker is enabled, ELISA also filters out irrelevant results.

Augmentation. ELISA takes the text retrieved in the previous step and puts it into the context. This context is later sent to the LLM together with the user query.

Generation. To improve generation quality, instructions on how to answer are added to the user query. The LLM is allowed to say "not enough data" instead of hallucinating. The LLM generates the answer based on the context, the instructions, and the user query.
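The hybrid search and augmentation steps can be sketched in Python as follows. The text does not say how the two result lists are combined, so reciprocal rank fusion (RRF) is used here purely as an assumption; it is one common way to merge rankings. The chunk ids, chunk texts, the constant `k = 60`, and the prompt wording are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge best-first lists of chunk ids into a single best-first list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more score the higher it ranks in each list.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def build_prompt(chunks: List[str], query: str) -> str:
    """Put retrieved chunks into the context, followed by the user query."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        'If the context is insufficient, reply "not enough data".\n\n'
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical storage and search results (best match first).
store = {
    "c1": "Llamas live in the Andes of South America.",
    "c2": "Llamas are domesticated camelids.",
    "c3": "Alpacas are bred mainly for their fiber.",
    "c4": "Emacs is an extensible text editor.",
}
semantic_hits = ["c2", "c1", "c3"]
full_text_hits = ["c1", "c4", "c2"]

fused = reciprocal_rank_fusion([semantic_hits, full_text_hits])
top_k = fused[:2]
prompt = build_prompt([store[cid] for cid in top_k], "where do llamas live?")
```

Chunks found by both searches end up at the top of the fused list. A reranker, if enabled, would rescore and filter these candidates before the prompt is built.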