
Build a RAG Application

Implement Retrieval-Augmented Generation (RAG) for accurate, grounded AI responses using Hypersave.

What is RAG?

RAG (Retrieval-Augmented Generation) combines the power of large language models with your own data:

  1. Retrieve relevant information from your knowledge base
  2. Augment the LLM prompt with this context
  3. Generate accurate responses grounded in your data

This reduces hallucinations and keeps responses grounded in your content.

Architecture

User Query → Hypersave Search → Context Retrieval → LLM + Context → Response

Set Up the Retriever

// retriever.js
const HYPERSAVE_API = 'https://api.hypersave.io';
const API_KEY = process.env.HYPERSAVE_API_KEY;
 
export async function retrieveContext(query, options = {}) {
  // Use hybrid search (semantic + keyword) for the best recall
  const response = await fetch(`${HYPERSAVE_API}/v1/search`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      query,
      limit: options.limit || 5,
      threshold: options.threshold || 0.7,
      hybrid: true
    }),
  });

  // Surface HTTP failures instead of silently returning an empty list
  if (!response.ok) {
    throw new Error(`Hypersave search failed: ${response.status}`);
  }

  const data = await response.json();
  return data.results || [];
}
 
export function formatContext(documents) {
  return documents
    .map((doc, i) => `[${i + 1}] ${doc.title || 'Document'}\n${doc.content}`)
    .join('\n\n---\n\n');
}

Create the RAG Pipeline

// rag.js
import OpenAI from 'openai';
import { retrieveContext, formatContext } from './retriever.js';
 
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
 
const SYSTEM_PROMPT = `You are a helpful assistant that answers questions based on the provided context.
 
IMPORTANT RULES:
1. Only answer based on the provided context
2. If the answer isn't in the context, say "I don't have information about that"
3. Cite your sources using [1], [2], etc.
4. Be concise and accurate
 
Context will be provided in the format:
[1] Document Title
Document content...`;
 
export async function ragQuery(question) {
  // 1. Retrieve relevant documents
  const documents = await retrieveContext(question);
 
  if (documents.length === 0) {
    return {
      answer: "I don't have any relevant information to answer this question.",
      sources: []
    };
  }
 
  // 2. Format context for the LLM
  const context = formatContext(documents);
 
  // 3. Generate response with context
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` }
    ],
    temperature: 0.3 // Lower temperature for more factual responses
  });
 
  return {
    answer: completion.choices[0].message.content,
    sources: documents.map(d => ({
      id: d.id,
      title: d.title,
      relevance: d.score
    }))
  };
}
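
Calling the pipeline from a Node ESM script looks like this (the question is illustrative):

const { answer, sources } = await ragQuery('What is our refund policy?');
console.log(answer);
console.log(sources.map((s) => s.title));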

Add Reranking (Optional)

For better accuracy, rerank results:

// Advanced retrieval with reranking
export async function retrieveWithRerank(query, options = {}) {
  // Cast a wide net first; these candidates also serve as a fallback below
  const candidates = await retrieveContext(query, {
    limit: 20,
    threshold: 0.5
  });

  // Use Hypersave's ask endpoint for intelligent selection
  const response = await fetch(`${HYPERSAVE_API}/v1/ask`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      query: `Select the most relevant documents for: ${query}`,
      includeContext: true,
      limit: options.limit || 5
    }),
  });

  const data = await response.json();
  // Fall back to the top plain-search candidates if no context comes back
  return data.context || candidates.slice(0, options.limit || 5);
}

Create the API Endpoint

// api/rag.js (Next.js API route)
import { ragQuery } from '../../lib/rag';
 
export default async function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }
 
  const { question } = req.body;
 
  if (!question) {
    return res.status(400).json({ error: 'Question is required' });
  }
 
  try {
    const result = await ragQuery(question);
    res.json(result);
  } catch (error) {
    console.error('RAG error:', error);
    res.status(500).json({ error: 'Failed to process question' });
  }
}
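
To sanity-check the route, hit it with a plain fetch call from a Node script (this assumes the Next.js dev server is running on localhost:3000):

// smoke-test.mjs
const res = await fetch('http://localhost:3000/api/rag', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'What is our refund policy?' })
});
console.log(await res.json());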

Build the Frontend

// RAGInterface.jsx
import { useState } from 'react';
 
export default function RAGInterface() {
  const [question, setQuestion] = useState('');
  const [result, setResult] = useState(null);
  const [loading, setLoading] = useState(false);
 
  const handleAsk = async () => {
    if (!question.trim() || loading) return;
    setLoading(true);

    try {
      const response = await fetch('/api/rag', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question })
      });

      const data = await response.json();
      setResult(data);
    } finally {
      // Reset the loading state even if the request throws
      setLoading(false);
    }
  };
 
  return (
    <div className="max-w-3xl mx-auto p-6">
      <h1 className="text-2xl font-bold mb-6">Ask Anything</h1>
 
      <div className="flex gap-2 mb-6">
        <input
          type="text"
          value={question}
          onChange={(e) => setQuestion(e.target.value)}
          placeholder="Ask a question..."
          className="flex-1 p-3 border rounded-lg"
          onKeyDown={(e) => e.key === 'Enter' && handleAsk()}
        />
        <button
          onClick={handleAsk}
          disabled={loading}
          className="px-6 py-3 bg-blue-600 text-white rounded-lg"
        >
          {loading ? 'Thinking...' : 'Ask'}
        </button>
      </div>
 
      {result && (
        <div className="space-y-4">
          <div className="bg-gray-50 p-4 rounded-lg">
            <p className="whitespace-pre-wrap">{result.answer}</p>
          </div>
 
          {result.sources?.length > 0 && (
            <div>
              <h3 className="font-semibold mb-2">Sources</h3>
              <ul className="space-y-2">
                {result.sources.map((source, i) => (
                  <li key={source.id} className="text-sm text-gray-600">
                    [{i + 1}] {source.title}
                    <span className="text-gray-400 ml-2">
                      ({Math.round(source.relevance * 100)}% relevant)
                    </span>
                  </li>
                ))}
              </ul>
            </div>
          )}
        </div>
      )}
    </div>
  );
}

Advanced Techniques

Chunking Strategies

For long documents, split the content into overlapping chunks before indexing:

function chunkDocument(content, options = {}) {
  const chunkSize = options.chunkSize || 500;
  // Cap overlap below chunkSize so the loop always advances
  const overlap = Math.min(options.overlap || 50, chunkSize - 1);

  const chunks = [];
  let start = 0;

  while (start < content.length) {
    const end = start + chunkSize;
    chunks.push(content.slice(start, end));
    // Step forward, repeating the last `overlap` characters for context
    start = end - overlap;
  }

  return chunks;
}
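
A quick check of the defaults: a 1,200-character input yields three chunks, each sharing 50 characters with its neighbor:

const doc = 'x'.repeat(1200); // stand-in for real document text
const chunks = chunkDocument(doc);
console.log(chunks.length);               // 3
console.log(chunks.map((c) => c.length)); // [500, 500, 300]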

Query Expansion

Improve retrieval with query expansion:

async function expandQuery(query) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{
      role: 'user',
      content: `Generate 3 alternative phrasings for this search query: "${query}". Return only the alternatives, one per line.`
    }],
    temperature: 0.7
  });
 
  const alternatives = completion.choices[0].message.content
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean);
 
  return [query, ...alternatives];
}
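
One way to use the expanded queries is to fan retrieval out across all of them and merge the results, deduplicating by document id. This sketch assumes each result carries the id and score fields used in the earlier examples:

async function retrieveExpanded(query, options = {}) {
  const queries = await expandQuery(query);

  // Run retrieval for every phrasing in parallel
  const resultSets = await Promise.all(
    queries.map((q) => retrieveContext(q, options))
  );

  // Merge and dedupe by document id, keeping the best score per document
  const byId = new Map();
  for (const doc of resultSets.flat()) {
    const existing = byId.get(doc.id);
    if (!existing || doc.score > existing.score) {
      byId.set(doc.id, doc);
    }
  }

  return [...byId.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, options.limit || 5);
}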

RAG Best Practices:

  • Use hybrid search (semantic + keyword) for best recall
  • Keep context under the model's token limit (see the trimming sketch after this list)
  • Include source citations for transparency
  • Use lower temperature for factual responses
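
Keeping context under the token limit is the easiest practice to get wrong. A minimal trimming sketch, using the rough heuristic of ~4 characters per token (swap in a real tokenizer such as tiktoken for production):

// Keep whole documents until the rough token budget is exhausted
function trimToTokenBudget(documents, maxTokens = 6000) {
  const kept = [];
  let usedTokens = 0;

  for (const doc of documents) {
    const docTokens = Math.ceil(doc.content.length / 4); // ~4 chars/token
    if (usedTokens + docTokens > maxTokens) break;
    kept.push(doc);
    usedTokens += docTokens;
  }

  return kept;
}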

Next Steps