
Organisations are paying a price for poor data integrity
Problems with data integrity are costing large organisations about $390,000 a year on average. What are firms doing to stop bad data from derailing AI?
Data flaws can have broad implications and threaten the potential of artificial intelligence (AI) tools. A new study from information management company Iron Mountain, based on a survey of senior leaders at 500 large organisations worldwide (those with more than 1,000 employees), finds that issues with data integrity have lost these firms an average of $389,780 over the past 12 months.
Such information management problems must not persist if AI is to deliver the gains organisations want. The organisations in the research seeing the greatest revenue and profitability gains from their information management recognise the urgency of this issue. These high performers say that AI readiness is the information management area that will have the biggest influence on whether they achieve their organisational ambitions in 2025.
Data integrity is at the core of that readiness. “As AI technology advances, ethical considerations become increasingly important,” says Rohit Dhawan, Director of AI and Advanced Analytics at Lloyds Banking Group. “And nowhere is this more critical than in the quality and reliability of data.”
The key is for organisations to ensure their processes include practices that confront data integrity problems. The Iron Mountain research’s high performers are much more likely to have data integrity checks and balances in place – especially in three critical areas: eliminating redundant, obsolete or trivial (ROT) data; automating data extraction; and encrypting data and installing security systems.
1. Eliminating redundant, obsolete or trivial data
Organisations might believe that the bigger the dataset, the better, as they maximise the insights their AI models can access. But inaccurate and out-of-date data can add vulnerability rather than value, according to Swami Jayaraman, SVP and Chief Enterprise Architect at Iron Mountain. Not only can ROT data skew the accuracy of AI outputs, says Jayaraman, but “the more data you have, the bigger your attack surface becomes.”
Every high-performing organisation in the research has set up processes to eliminate ROT data. On average, organisations in the study scan for this data once a month, but higher-risk data may call for more frequent reviews, especially of redundant and trivial data.
And scans alone will not ensure integrity. Organisations also need a clear rules engine that sets out what data should be retained or eliminated, with strict practices for following the rules.
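As an illustration, such a rules engine might classify each record against retention policies before deciding what to keep. The sketch below is hypothetical: the field names, thresholds and rules are assumptions for demonstration, not Iron Mountain's methodology; real retention rules would come from records management and legal teams.

```python
from datetime import datetime, timedelta

# Assumed policy thresholds, purely illustrative.
OBSOLETE_AFTER = timedelta(days=5 * 365)   # retention horizon
TRIVIAL_MAX_BYTES = 1024                   # "trivial" size cutoff

def classify(record: dict, seen_hashes: set) -> str:
    """Label a record as 'redundant', 'obsolete', 'trivial' or 'keep'."""
    if record["content_hash"] in seen_hashes:
        return "redundant"                 # duplicate of data already stored
    seen_hashes.add(record["content_hash"])
    if datetime.now() - record["last_modified"] > OBSOLETE_AFTER:
        return "obsolete"                  # past the retention horizon
    if record["size_bytes"] < TRIVIAL_MAX_BYTES and not record["referenced"]:
        return "trivial"                   # small and unused by any process
    return "keep"
```

Running each monthly scan through a classifier like this, and logging which rule fired, is one way to make the retain-or-eliminate decision auditable rather than ad hoc.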
2. Automating data checkpoints
Every high-performing organisation in the research has automated data extraction, and they are also more likely than others in the study to have automatic validation checkpoints to detect issues at various process stages.
Automation is a critical way to check data quality at scale. Machine-readable controls can, for instance, check that data being added to a system meets the required standards and trigger alerts where employee attention is needed to resolve inconsistencies. These controls can be embedded throughout process workflows to continually test compliance and flag anomalies. For instance, a retail company’s automated validation checkpoints can detect discrepancies between sales transaction data and inventory levels to prevent shops from running out of stock.
“Automated systems analyse large datasets for accuracy and anomalies, reducing reliance on manual checks,” says Lloyds Banking Group’s Dhawan.
Dhawan also expects to see more organisations using retrieval-augmented generation (RAG), which combines generative AI with verified datasets to improve accuracy. Third-party verification standards and advanced identity verification techniques such as liveness detection will also play an important role: they can make sure that datasets have enough integrity to reliably feed into AI models.
3. Encrypting data and installing security mechanisms
Data encryption and other security mechanisms are incorporated into the workflows of every high-performing organisation in the research. Unless it has a security-centric focus, an organisation risks failures that expose it to regulatory sanctions and reputational damage, setting back its ability to achieve competitive gains through AI.
Jayaraman and Narasimha Goli, Iron Mountain’s CTO, recommend working inside ‘walled gardens’, with tight controls over who has access to source data, how they connect to the AI model and the security protocols they have to observe. This minimises the risk of data leakage.
Once AI models move into production, organisations need to monitor their security metrics. Constant surveillance will allow them to address issues swiftly and suspend any deployments while they sort out the problems.
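One simple pattern for that kind of surveillance is a circuit breaker: if any monitored security metric breaches its threshold, the deployment is flagged for suspension until the problem is resolved. The metric names and thresholds below are illustrative assumptions:

```python
def check_deployment(metrics: dict, thresholds: dict) -> str:
    """Return 'running' if every monitored metric is within its threshold,
    otherwise 'suspended' so the deployment can be paused for review."""
    breached = [name for name, value in metrics.items()
                if value > thresholds.get(name, float("inf"))]
    return "suspended" if breached else "running"
```

In practice this check would run continuously against live telemetry, with the suspension wired into the deployment pipeline rather than left to manual intervention.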
Data integrity is at the core of being AI-ready
The pressure is on to invest in data integrity. Without it, any gains from AI could be undone: data issues will lead AI models to produce flawed outputs that mislead employees in their decision-making.
“Fine tuning AI models with exclusive data sources and context-specific insights can give businesses a competitive edge,” says Andrew Chin, Chief AI Officer at AllianceBernstein. “But making sure the input data is accurate and responsibly sourced is mission critical.”
See Iron Mountain’s executive summary for more on how large organisations are working towards AI readiness.
