Generative AI’s potential for companies is well-known, but the technology can create new risks if it is not powered by original and trustworthy data sources. In the second blog in our ‘RAGs to Riches’ series, we explore those risks; highlight best practices around pulling data for generative AI using a Retrieval Augmented Generation (RAG) technique; and suggest the key questions to ask your data provider for a trustworthy and effective approach.
87% of companies plan to adopt generative AI technology (if they haven’t already), according to the LexisNexis® Future of Work Report 2024. But, in recent years, far too many corporate AI initiatives have ended in failure. A common cause of this is poor quality data – as the saying goes, “garbage in, garbage out”. The outputs from generative AI tools will only be as accurate and relevant as the data powering them.
The problem typically lies in companies inputting low-quality data from third parties into their generative AI models. This might be a third-party generative AI tool which a company uses to support its work, or a third-party data aggregator from which it pulls content to power its own generative AI solution. If these providers cannot clearly demonstrate where and how they pulled their data, it poses five main risks:
Retrieval Augmented Generation (RAG) is a technique that enhances a generative AI tool to mitigate these risks. A standard model answers from what it learned during training and, in some deployments, from its ongoing prompts and responses with users. RAG adds a retrieval step: at query time, the model pulls relevant information from a curated external data layer, and that retrieved context takes precedence over what the model previously learned. This data should be credible, authoritative and pulled directly from original sources, such as the data licensed for generative AI use by LexisNexis®. The generative AI model is therefore required to generate every answer from this data as context and cite the original source(s) used in each response.
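The retrieve-then-generate flow described above can be sketched in a few lines. This is an illustrative sketch only: the documents and source names are invented, retrieval is naive word overlap standing in for the embedding-based vector search a production system would use, and the final call to a language model is omitted.

```python
# Minimal RAG sketch: retrieve context from a trusted data layer,
# then build a prompt that forces the model to answer from it with citations.
# Documents and sources below are hypothetical placeholders for licensed content.
DOCUMENTS = [
    {"source": "Example Newswire, 2024-03-01",
     "text": "Acme Corp reported record quarterly revenue driven by cloud services."},
    {"source": "Example Gazette, 2024-02-15",
     "text": "Regulators opened an inquiry into Beta Ltd over data privacy practices."},
]

def retrieve(query, documents, top_k=1):
    """Rank documents by word overlap with the query -- a stand-in for
    the vector-similarity search used in real RAG pipelines."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, documents):
    """Assemble a prompt that instructs the model to answer only from the
    retrieved context and to cite each source it uses."""
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in documents)
    return (
        "Answer using ONLY the context below and cite the source in brackets.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

query = "Why did Acme Corp revenue grow?"
hits = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, hits)
```

The prompt produced here would then be sent to the generative model; because the context carries an explicit source tag, the model can cite the original publisher in its response.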
Retrieval Augmented Generation offers myriad benefits, for example:
Unlocking the benefits of a RAG approach to generative AI requires access to trustworthy data which is optimized for use in this specific technology. The LexisNexis® Future of Work Report 2024 found that for 9 in 10 professionals, the main consideration when choosing a generative AI tool is the quality and accuracy of its output, while 7 in 10 said trusted, accurate data sources are the key to fostering trust in their use of generative AI. So how can companies pull this contextual data for their generative AI models from original sources using a RAG approach?
Pulling from original sources to power generative AI initiatives involves going to individual, reliable publishers and requesting to use their data. Companies operating worldwide may need to do this for sources across multiple jurisdictions and languages. This would be extremely time-consuming, both to negotiate acquiring the data and to ensure compliance with differing regulations over time.
Therefore, it is far more efficient to outsource the acquisition of data sources to a specialist third-party provider. Depending on your budget, there are two approaches you might take:
Whichever approach you take, it is critical that the third-party provider has ensured each data source it uses is licensed and approved for the specific use of generative AI and meets all relevant regulations and ethical standards around data protection and privacy. Your company will be held accountable for any failures in this respect. Questions to ask a potential provider include:
Applying Retrieval Augmented Generation to your generative AI development is only effective if the contextual data it brings in is accurate, trustworthy, and approved for use in generative AI tools. LexisNexis provides licensed content and optimized technology to support your generative AI and RAG ambitions: