
What is RAG and Why Does It Matter for Trusted Generative AI?

Generative AI is widely predicted to transform almost every industry and use case, and companies spent more than $20 billion on the technology last year. But it also exposes these firms to new risks if not implemented strategically. In the first of two blogs in our ‘RAGs to Riches’ series, we explain how the Retrieval Augmented Generation (RAG) technique enhances generative AI, helping to mitigate these risks and deliver more accurate, relevant and trustworthy results.

The ABC of RAG

Retrieval Augmented Generation (RAG) is a technique to enhance the results of a generative AI or Large Language Model (LLM) solution. Perhaps the best way to understand RAG is to first look at how generative AI traditionally works, and why that poses a risk to companies seeking to leverage the technology.

A typical generative AI tool which hasn’t been enhanced by Retrieval Augmented Generation will generate a response to a prompt based on its training data and on continuous learning from its users’ prompts and responses. This brings four main risks, which limit the confidence users can have in generative AI’s outputs:

  • Hallucinations: A generative AI tool can provide a response that sounds plausible but is false, based on learned patterns from past prompts and responses rather than relevant data. There have been examples of lawyers using generative AI to write a brief, only for the tool to cite completely fictional cases. Journalists have also found that generative AI confidently asserts inaccuracies to be true – in one case, Bloomberg reported that a tool inaccurately claimed there had been a ceasefire in an ongoing military conflict.
  • Outdated data: A training dataset is static and reflects a point in time so it can quickly become outdated, leading to inaccurate responses to user queries.
  • Inaccurate data: If the data used by a generative AI tool does not come from a trustworthy and licensed source, it may not be accurate or reliable.
  • Black box: A feature of generative AI is that we don’t see how it forms a response, and it does not generally provide its sources.

A Retrieval Augmented Generation technique is regarded as the best way to overcome these risks. This approach requires the generative AI tool to ground every response in authoritative, original sources, which takes precedence over its continuous learning from training data and subsequent prompts and responses. This contextual data shapes the response provided to the user, basing it on exact source content in the dataset, and allows the tool to provide a citation within the response.
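The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a LexisNexis product API: the corpus, the keyword-overlap scoring, and the prompt format are all simplifying assumptions (production systems typically use vector embeddings for retrieval), but the shape is the same – fetch citable sources first, then instruct the model to answer only from them.

```python
def retrieve(query: str, corpus: list[dict], top_k: int = 2) -> list[dict]:
    """Rank documents by naive keyword overlap with the query.
    (Real RAG systems use embedding similarity; overlap keeps this runnable.)"""
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query: str, corpus: list[dict]) -> str:
    """Prepend retrieved, citable context so the LLM answers from sources."""
    docs = retrieve(query, corpus)
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in docs)
    return (
        "Answer using ONLY the sources below, and cite them.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )

# Illustrative two-document corpus (hypothetical sources).
corpus = [
    {"source": "Reuters 2024-05-01", "text": "The ceasefire talks stalled on Monday."},
    {"source": "Court filing 12-cv-345", "text": "The motion was denied by the judge."},
]

prompt = build_grounded_prompt("Did the ceasefire talks succeed?", corpus)
print(prompt)
```

The grounded prompt would then be sent to the LLM, which can cite the bracketed source labels in its answer – giving the user a trail back to the original documents.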

This brings two significant benefits to companies using generative AI solutions:

  • Confidence: Companies can use these tools with the knowledge that their outputs come from authoritative original sources. Citations allow them to read the original sources to verify relevance and accuracy.
  • Ethics and compliance: Companies can demonstrate to stakeholders that they are using generative AI solutions which pull from original and accurate sources which are licensed for this specific use. This will allay fears of breaches of data protection and privacy regulations, or unethical harvesting of data.

Three considerations for effective use of RAG in generative AI

  1. Prioritize credible data for generative AI – The contextual data used in a RAG approach must be credible. This means sourcing data from trustworthy and licensed data providers and publishers. There have been instances of data allegedly being scraped and used in generative AI tools without permission from the publisher or the individuals the data belongs to, which brings legal and reputational risks. Companies must therefore ensure their data has been sourced ethically and be transparent about that.
  2. Find an optimal delivery method – A large company might have developed its own generative AI solution. In this case, it should think about how to bring in excellent data to support its RAG approach. Alternatively, companies may find it more cost-effective to use third-party generative AI tools to support their operations. These firms should seek to understand how the tool uses and collects data, and verify that the provider is trustworthy and compliant.
  3. Set out an ethical approach from the top down – The C-Suite is responsible for setting the strategy and tone for how a company uses generative AI, which will give a lead to its employees. Making clear that you only want to use the most reliable and credible data, and ensuring your generative AI tool uses a Retrieval Augmented Generation approach which clearly cites the sources used to generate each answer, will inspire confidence in your company. 97% of professionals surveyed for the LexisNexis® Future of Work Report 2024 said it is important that human members of staff validate AI outputs, so staff should be trained and empowered to oversee this technology and look out for potential inaccuracies or regulatory breaches.

LexisNexis® offers data and technology for a successful RAG approach

Using a Retrieval Augmented Generation technique for generative AI is only effective if the contextual data it brings in is accurate, trustworthy, and approved for use in generative AI tools. LexisNexis provides licensed content and optimized technology to support your generative AI and RAG ambitions:

  • Data for generative AI: Our extensive news coverage, enriched with robust metadata, is readily available for integration into your generative AI projects with Nexis® Data+. Thousands of sources are already available for use with generative AI technology. This can be input into your own tools via our API.
  • Generative AI for research: Nexis+ AI™ is a new, AI-powered research platform that combines time-saving generative AI tools with our vast library of trusted sources. Nexis+ AI not only saves time on core research tasks like document analysis, article summarization and report generation, but deploys Retrieval Augmented Generation and citations that transparently show the sources used for AI-generated content.