
# Data Quality: Noise in Data

Written by Ascent Standard

“The need for high-quality, trustworthy data in our world will never go away.”

While the above statement is true, it comes with its own set of challenges — starting with the collection of quality data.

With the exponential growth of data, the need for trust is greater than ever. Even though we have evolved from data silos to pipelines (ETL/ELT), streaming, modern data stacks, data warehouses, multi-cloud, and data mesh architectures, we still face an age-old problem: trusting data.

What data is good for which purpose? What data can be used where? How can it be improved? What data is sensitive?

These questions remain unanswered despite decades of Data Management disciplines such as Data Governance, Data Quality, Data Observability, Data Catalog, Master Data Management, Data Remediation, Data Discovery, Active Metadata Management, Data Privacy, Data Intelligence, Data Lineage, Reference Data Management, and many more.

Somewhere along the way, the original intent was lost: trusting data and improving it. Trust me, I am with you on this. If you don’t believe me, see my hand-drawn picture below.

*Figure: Noise in data quality (hand-drawn illustration)*

The deeper you look into your data, the scarier it gets, and that is a good sign.

For the brave data professionals who practice all these disciplines and are still hanging in there, this is our effort to cut through the noise and highlight what truly matters in modern data quality.

## Key Factors Creating Noise in Modern Data Quality

### Scaling with Data Growth (Scale)

As data volumes grow, platforms must scale to handle massive datasets, diverse data types, and multiple architectures. Speed and scale are critical, but equally important is the ability to scale using no-code or low-code approaches.

### Each Organization Is Unique (Context)

Organizations differ even within the same industry, technology stack, or architecture. Processes, people, and customers define context. Without understanding this context, data solutions become meaningless.

For example, the same income range can have completely different interpretations in marketing, risk analysis, or underwriting — same data, different meaning.
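To make this concrete, here is a minimal sketch of context-dependent interpretation. The thresholds, category labels, and function names are hypothetical, invented purely for illustration:

```python
# A minimal sketch of "same data, different meaning": the bands and labels
# below are hypothetical, not taken from any real organization.

INCOME_CONTEXTS = {
    # Marketing segments prospects by spending potential.
    "marketing": [(0, 40_000, "value segment"),
                  (40_000, 120_000, "mid-market"),
                  (120_000, float("inf"), "premium")],
    # Risk analysis reads the same figure as repayment capacity.
    "risk": [(0, 40_000, "high risk"),
             (40_000, 120_000, "moderate risk"),
             (120_000, float("inf"), "low risk")],
    # Underwriting maps it to an eligibility tier.
    "underwriting": [(0, 40_000, "manual review"),
                     (40_000, 120_000, "standard policy"),
                     (120_000, float("inf"), "preferred policy")],
}

def interpret_income(income: float, context: str) -> str:
    """Return the context-specific label for the same raw income value."""
    for low, high, label in INCOME_CONTEXTS[context]:
        if low <= income < high:
            return label
    raise ValueError(f"No band covers income {income} in context {context!r}")

# The same data point yields three different business meanings.
for ctx in INCOME_CONTEXTS:
    print(ctx, "->", interpret_income(85_000, ctx))
```

The point is that the mapping, not the value, carries the business meaning; a quality check that ignores context would score all three readings identically.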

### Supporting Organizational Evolution (Maturity)

Mergers and acquisitions introduce complexity. Platforms must support multiple maturity levels as organizations evolve; otherwise businesses end up replacing platforms every year, which simply does not work.

### Relevance to Business Value (Impact)

Data quality efforts often focus too much on metrics and alerts while forgetting business value. Anomalies or outliers may reflect deliberate business decisions rather than data problems.

Excessive alerts, unnecessary root-cause analysis, and irrelevant notifications only create more noise. Data quality must align with strategic business goals.
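As an illustration, a simple triage step could consult a calendar of planned business events before escalating an anomaly. The data structures and values below are hypothetical, a sketch rather than a prescribed design:

```python
from datetime import date

# Hypothetical calendar of planned business events that are expected to
# move the metrics; in practice this would come from the business teams.
BUSINESS_EVENTS = {
    ("orders_per_hour", date(2024, 11, 29)): "Black Friday promotion",
}

def triage_alert(metric: str, day: date, observed: float, expected: float) -> str:
    """Route an anomaly alert: suppress it if a known business event explains it."""
    event = BUSINESS_EVENTS.get((metric, day))
    if event is not None:
        return f"suppressed: spike in {metric} explained by '{event}'"
    return f"escalate: {metric} was {observed:.0f}, expected ~{expected:.0f}"

# A spike during a planned promotion is noise; the same spike on an
# ordinary day deserves a root-cause investigation.
print(triage_alert("orders_per_hour", date(2024, 11, 29), 9200, 3100))
print(triage_alert("orders_per_hour", date(2024, 12, 2), 9400, 3100))
```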

### Time to Value (Time & Cost)

Heavy governance frameworks with complex processes slow everything down. By the time implementation is complete, the data landscape has already changed.

Time to value should be measured in days — not weeks, months, or years.

### Supporting All Users (Stewardship)

Organizations have both business users and technical users. Both must collaborate to improve data quality. Otherwise, silos and communication barriers only increase.

## Data Accuracy

Data Accuracy is one of the most commonly cited dimensions of Data Quality. It aims to measure how closely data represents reality.

We define Data Accuracy as:

“The degree to which a data value represents what it purports to represent.”

One challenge is that there is no universal agreement on how Data Accuracy should be defined. More importantly, achieving 100% accuracy is impossible.

Walter A. Shewhart observed that all measurement systems introduce error. W. Edwards Deming expanded on this by stating that there is “no true value of anything.”

Since perfect accuracy is unattainable, the real question becomes: how inaccurate is the data?

This cannot be answered by analyzing the data alone. Assessing Data Accuracy requires stepping outside the dataset, independently evaluating samples, and comparing observations with curated data.
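Here is a minimal sketch of that sampling approach, assuming an independently curated reference source is available. All records and names below are invented for illustration:

```python
import random

# A minimal sketch: estimate accuracy by drawing a sample from the dataset
# and comparing each sampled value against an independently curated reference.

def estimate_accuracy(dataset: dict, reference: dict,
                      sample_size: int, seed: int = 7) -> float:
    """Estimate accuracy as the share of sampled values matching the reference."""
    random.seed(seed)
    sample_keys = random.sample(sorted(dataset), k=min(sample_size, len(dataset)))
    matches = sum(1 for key in sample_keys if dataset[key] == reference.get(key))
    return matches / len(sample_keys)

# Observed values vs. an independently verified ("curated") source of truth.
observed = {"cust-1": "Berlin", "cust-2": "Munich",
            "cust-3": "Hamburg", "cust-4": "Cologne"}
curated  = {"cust-1": "Berlin", "cust-2": "Muenster",
            "cust-3": "Hamburg", "cust-4": "Cologne"}

rate = estimate_accuracy(observed, curated, sample_size=4)
print(f"Estimated accuracy: {rate:.0%}, estimated inaccuracy: {1 - rate:.0%}")
```

Note that this estimates how inaccurate the data is rather than proving it accurate, which is exactly the framing above: the reference itself is only another observation, so the result is a bound, not a certainty.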

One exception exists when data itself represents reality — such as bank account balances in transactional systems. In such cases, data is not an observation but the reality itself.

## What’s Next

Welcome to the Data Quality Series. In upcoming blogs, we will explore the challenges of each data discipline and practical ways to overcome them to achieve trusted, high-quality data.

The goal of this series is to help data leaders and influencers avoid becoming victims of hype, fads, and broken promises.

Stay tuned for the next installments in the data series…