Retrieval-Augmented Generation for Everyone

Unlocking insights from your own data with Large Language Models (LLMs)

I lead an AI meetup group of over 1,500 members in the Research Triangle Park area of North Carolina. At these meetups, one question comes up again and again. People phrase it in many different ways, but it usually comes down to:

“How can I use ChatGPT with my private data?”

The appeal is that by using your own private data, you can get answers to questions that ChatGPT has never been trained on. The technique behind this is called retrieval-augmented generation, or RAG. So, in this week's edition of the AIE, I'll tackle how any knowledge worker can do this without being an AI developer.

Also, stay tuned for next week's edition; I have a lot of big announcements coming…

AIOS - Artificially Intelligent Operating System

The Artificially Intelligent Operating System

Work Smarter, Not Harder with AI

Accelerate your productivity and improve the quality of your work with this FREE 14-day email course, designed by industry experts and packed with practical tactics.

This course is free and only available to subscribers of The AIE!

  • Real-World Tested: Proven AI tactics used by top companies for real results.

  • Expert-Led: Strategies from AI pros who’ve transformed businesses.

  • Immediate Impact: Automate tasks and improve decisions instantly.

Click the button below and increase your productivity by 30% in 8 minutes a day over 2 weeks.

AI Efficiency Edge - Quick Tips for Big Gains

Analyze your Documents with Google NotebookLM

A little-known AI tool from Google, NotebookLM, is going viral for its Audio Overviews that mimic the speech cadence of podcasters. This allows users to create a simulated conversation between two people from a document. You can read about that feature in Wired. However, many other cool features align with our topic this week and let anyone essentially “chat with their documents” privately and securely.

NotebookLM is a powerful tool developed by Google AI that combines the functionality of a traditional notebook with the advanced capabilities of a large language model (LLM). With NotebookLM, you can ask questions about a document's content, summarize specific sections, generate ideas based on your writing, and even distill your notes into summaries!

NotebookLM keeps your information secure and cites its sources. It's perfect for analyzing research papers, summarizing meeting notes, or studying complex topics.

Here's how it works:

1. Accessing NotebookLM

  • Go to notebooklm.google.com and sign in with your Google account.

2. Creating a New Notebook

  • Click the "Create" button.

  • Give your notebook a descriptive name.

3. Adding Sources

NotebookLM can process information from various sources:

  • Upload files: You can upload documents directly from your computer (PDFs, Docs, etc.).

  • Connect to Google Drive: Link your Google Drive to import files.

  • Add links: Provide URLs of web pages or YouTube videos.

  • Input text directly: Paste the text into the notebook.

4. Utilizing the AI Features

Once you've added your sources, NotebookLM will process them and enable several AI-powered features:

  • Summarization: Get concise summaries of your uploaded documents or linked web pages.

  • Answering Questions: Ask questions about the material, and NotebookLM will provide answers with relevant quotes.

  • Generating Ideas: Brainstorm new concepts or explore different perspectives based on the information provided.

  • Creating Content: Generate different creative text formats, such as poems, code, scripts, musical pieces, emails, and letters.

Upload your content for analysis from Google Docs, a website, or just plain text.

Enterprise AI Essentials - Your Weekly Deep Dive

Retrieval-Augmented Generation for Everyone

Maximizing the value of AI hinges on bridging the gap between generic knowledge and your organization’s proprietary data. While large language models (LLMs) are powerful, they often struggle to deliver meaningful insights based on company-specific information. For example, asking ChatGPT about your Q3 performance or your customers' most common complaints will likely result in vague responses—it simply lacks access to your internal data. But what if AI could seamlessly tap into your organization’s unique knowledge base?

That’s where retrieval systems come into play. Researchers found that when GPT-4 was equipped with retrieval methods to access real-time data, its performance improved significantly—accuracy jumped by 13%, and unhelpful responses were cut in half. This approach, known as Retrieval-Augmented Generation (RAG), transforms AI from a broad-spectrum assistant into a powerful tool that can deliver tailored, context-rich insights. Whether you use a platform like NotebookLM or a CustomGPT with a specialized knowledge base, RAG ensures that AI provides answers relevant to your business.

Retrieval systems enable AI to access and pull relevant data from vast collections of documents or databases, much like a researcher consulting a well-organized library. Even when dealing with familiar topics, GPT-4 saw a noticeable improvement in performance when retrieval methods were applied. Instead of relying solely on pre-trained knowledge, RAG allows AI to reference your specific data to generate more accurate, actionable answers.

This improvement is crucial for knowledge workers handling a wealth of company data—from sales reports to customer feedback. While LLMs like GPT-4 are inherently powerful, they become significantly more helpful when they can draw from your proprietary information. Without retrieval systems, asking questions like, “What’s our Q3 performance?” or “What are our top customer complaints?” will yield little value. With RAG, AI becomes a customized, high-precision problem-solver, delivering the business-critical insights that matter most.

How RAG Works in Simple Terms

  1. Ask a question: You input a query into the AI.

  2. Retrieve data: The AI searches your documents or databases using embeddings stored in a vector database.

  3. Generate a response: The LLM uses its knowledge and the retrieved data to create a highly accurate and relevant answer.
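The three steps above can be sketched in a few lines of Python. This toy stands in word-count vectors for a real embedding model and an in-memory list for a vector database; every name here is illustrative, not a real product's API.

```python
# Toy illustration of the three RAG steps: embed the query, retrieve
# the most similar stored document, and build a prompt for the LLM.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a word-count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 2's "vector database": documents stored with their embeddings.
documents = [
    "Q3 sales were $1.2M, up 10% from Q2.",
    "Top customer complaint this year: slow support response times.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str) -> str:
    """Return the stored document most similar to the query."""
    q = embed(query)
    return max(index, key=lambda pair: cosine(q, pair[1]))[0]

query = "What were our sales in Q3?"
context = retrieve(query)
# Step 3: hand the query plus the retrieved context to the LLM.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
```

A production system would swap `embed` for a real embedding model and `index` for a vector database, but the flow is the same.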

Technical Overview of How RAG Works

If you wanted to build your own “EnterpriseGPT” with your data, it would look like this: The first step in creating a Retrieval-Augmented Generation (RAG) application is integrating a framework, such as LangChain, which acts as the orchestrator for the query process. 

The framework is responsible for receiving the client's query, adding necessary context, and managing the connection between the large language model (LLM) and external data sources. This framework ensures the flow of information is organized and that all components—LLMs and external databases—communicate effectively. Additionally, it formats the final output, ensuring the user receives a clear, structured response.
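A minimal sketch of that orchestration role, with the retriever and LLM stubbed out. The class and function names here are invented for illustration; they are not LangChain's actual API.

```python
# Sketch of the orchestration role a framework like LangChain plays:
# receive the query, add retrieved context, call the LLM, format output.
from typing import Callable, List

class RAGOrchestrator:
    def __init__(self, retriever: Callable[[str], List[str]],
                 llm: Callable[[str], str]):
        self.retriever = retriever  # pulls relevant docs for a query
        self.llm = llm              # generates the raw response

    def run(self, query: str) -> str:
        # 1. Receive the client's query and add retrieved context.
        context = "\n".join(self.retriever(query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        # 2. Let the LLM generate a raw response from the prompt.
        raw = self.llm(prompt)
        # 3. Format the final output for the client.
        return f"Answer: {raw.strip()}"

# Stub components standing in for a vector database and GPT-4.
fake_retriever = lambda q: ["Q3 sales were $1.2M."]
fake_llm = lambda prompt: "Sales in Q3 were $1.2M."

orchestrator = RAGOrchestrator(fake_retriever, fake_llm)
result = orchestrator.run("What were our sales in Q3?")
```

Swapping the stubs for a real retriever and a real model client changes nothing about the orchestration logic, which is exactly why frameworks handle this layer.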

At the core of the system is the LLM, such as GPT-4, or IBM Granite if you want a private LLM, which processes the client’s query and generates a response. However, the LLM does not rely solely on its pre-trained knowledge. Instead, the system performs a semantic search to access external, contextually relevant data from a vector database.

This vector database (for example, Milvus, Weaviate, or MongoDB Atlas Search, especially if you already run MongoDB) stores embeddings that represent documents, reports, and other internal data. The system uses these embeddings to retrieve relevant data based on the meaning behind the query, ensuring that the response is specific and tailored to the user's needs.

Finally, the LLM generates a raw response by combining the retrieved external data with its pre-existing knowledge. The framework then formats this response and sends it back to the client.

This architecture allows the RAG system to provide more accurate, reliable, and context-aware answers, leveraging the LLM's generative capabilities and the real-time information stored in external data sources. The inclusion of semantic search ensures that the system retrieves relevant documents efficiently, even when the user’s query wording does not precisely match the stored data.

Anatomy of a RAG Implementation

Example Workflow:

  1. User Query: "What was our company's Q3 performance?"

  2. Embedding Creation: The query is embedded into a vector representation and compared against stored financial reports in the vector database.

  3. Document Retrieval: The system retrieves the most relevant Q3 financial reports from the database based on vector similarity.

  4. Response Generation: GPT-4 processes both the original query and the retrieved documents, generating an answer that includes relevant numbers or insights directly from the Q3 data.
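The workflow above assumes the financial reports were already indexed. A common preparatory step is splitting each document into overlapping chunks before embedding and storing it; here is a rough sketch, where the chunk size and overlap are arbitrary illustrative values.

```python
# Sketch of the indexing step that precedes the query workflow:
# split documents into overlapping chunks, each of which would then
# be embedded and stored in the vector database.
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into `size`-character chunks overlapping by `overlap`."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break
    return chunks

report = "Q3 revenue was $1.2M. " * 30  # stand-in for a financial report
chunks = chunk_text(report)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk; real pipelines often split on sentences or tokens instead of raw characters.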

By using retrieval-augmented generation, companies can significantly enhance the utility of LLMs, turning generic AI systems into powerful tools for querying proprietary, domain-specific data in real-time.

Example With and Without RAG

You ask the AI, "What were our sales in Q3?"

  • Without RAG, the model might give you a general answer about sales because it doesn’t know your specific data.

  • With RAG, the model can pull your Q3 sales report from your database and generate a precise answer, such as: “Your sales in Q3 were $1.2M, up 10% from Q2.”

When to Use RAG

RAG is powerful for businesses and knowledge workers because it allows AI to provide answers based on your own data, not just general knowledge. This is especially useful for tasks like customer service, report generation, or any scenario where having access to specific data is critical.

[If you want to try a little more advanced example on your own, AI Rewind has a good tutorial on how to build RAG-as-a-service with Claude 3.5 Sonnet in less than 50 lines of Python code.]

Further Reading:

AI Toolbox

  • Inquisite - Research doesn't have to be slow and tedious. Inquisite conducts deep research for you using trusted sources to get answers you can rely on so you can make faster progress using agentic RAG.

  • Pryon - Pryon is a product that ingests your enterprise data and allows you to create an Enterprise GPT. The company was founded by Igor Jablokov, who previously founded industry pioneer Yap, which was acquired by Amazon and served as the nucleus for follow-on products such as Alexa, Echo, and Fire TV.

  • Granite 3.0 - The new IBM Granite 3.0 models deliver state-of-the-art performance relative to model size while maximizing safety, speed, and cost-efficiency for enterprise use cases.

  • Claude 3.5 - Anthropic has announced upgrades to its AI portfolio, including an enhanced Claude 3.5 Sonnet model and the introduction of Claude 3.5 Haiku, alongside a “computer control” feature in a public beta.

Prompt of the Week

Analyze Your Sales with This CustomGPT

You may want to implement RAG but aren’t an engineer. One easy solution is to create a CustomGPT to enhance customer service or sales analysis. Here’s a prompt that allows your LLM to retrieve relevant information from your company’s specific data and respond intelligently.

Remember that you are uploading your private data to a public service. Make sure this adheres to your company’s data policies. If you don’t have one, check out this edition of the AIE with a prompt for creating an AI Acceptable Use Policy.

How To Use This Prompt

First, go to ChatGPT and create a CustomGPT (instructions from an earlier AIE edition). Then, use this prompt in the Instructions section of the GPT. I used the following prompt, but you can customize it. Next, upload the files you want to analyze. Finally, save the GPT and ask questions about the topics housed in your data. It’s more than a search: the GPT can write summaries and draw insights. You can also fill in your company name in the prompt, especially if public data about your company exists, so you could ask how your results compare with those of others in your industry.

### CustomGPT Prompt for Company-Specific Data Queries

You are a smart assistant trained to assist users by retrieving and utilizing information from [Company Name]'s proprietary data, including sales reports, customer interactions, product documentation, and company policies. For every query, follow these steps:

1. **Clarify the user's request**: If the query is vague, ask clarifying questions.
2. **Search [Company Name]'s data repository**: Retrieve the most relevant documents or data points that address the query.
3. **Generate a detailed response**: Use the retrieved data to answer the query as accurately as possible. If applicable, provide context or additional insights.

#### Sample Queries:
- "What were our top-selling products last quarter?"
- "What is the most common customer complaint this year?"
- "How many employees took paid leave in the last six months?"

For each query, ensure your response is based on data retrieved from your knowledge base and provide actionable insights whenever possible.

How did we do with this edition of the AIE?


I appreciate your support.

Your AI Sherpa,

Mark R. Hinkle
Editor-in-Chief
Connect with me on LinkedIn
Follow Me on Twitter

P.S. Interested in just AI news? Try AI Tangle. Are you a marketer? Then you might enjoy The AI Marketing Advantage.
