Dangers of Incomplete Data

It is easy to make comparisons between sets of quantitative data, but it is important to be aware of the dangers of incomplete data.  There is a risk of mistaking data for information.

A new management information system has been introduced to improve the analysis of sales fluctuations in an organization.  The system has been operational for two weeks but, because of transmission problems, not all stores have been able to transmit their data.  The data reported are therefore incomplete.  Even so, comparisons are still being made between total sales in weeks one and two.  Comparing totals drawn from different sets of stores produces misleading information: a change in the total may reflect which stores managed to report rather than any real change in sales.
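To make the problem concrete, here is a minimal sketch in Python.  The store names and sales figures are invented for illustration and do not come from any real system.  It contrasts a comparison of raw weekly totals with a like-for-like comparison restricted to stores that reported in both weeks.

```python
# Hypothetical weekly sales per store. A missing key means the store
# failed to transmit its data that week.
week1 = {"A": 100.0, "B": 120.0}              # store C did not report
week2 = {"A": 105.0, "B": 118.0, "C": 90.0}   # all three stores reported


def naive_change(prev, curr):
    """Change in raw totals, ignoring which stores reported."""
    return sum(curr.values()) - sum(prev.values())


def like_for_like_change(prev, curr):
    """Change in totals across only the stores present in both weeks."""
    common = prev.keys() & curr.keys()
    return sum(curr[s] for s in common) - sum(prev[s] for s in common)


print(naive_change(week1, week2))          # 93.0
print(like_for_like_change(week1, week2))  # 3.0
```

With these made-up figures the raw totals suggest sales rose by 93, yet the two stores that reported in both weeks grew by only 3 between them.  The rest of the apparent increase is simply store C appearing in the data.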

If the incomplete data are retained, they will be used routinely in quarterly, half-yearly and end-of-year sales comparisons, each of which will be wrong.  In subsequent years, store managers will be required to explain why sales in their store differ from this year's figures, which are themselves wrong!

The current transmission problems may be resolved so that data from week three onwards do include data from all stores.  However, it is likely that data transmission will fail in one or more stores in the future, affecting the completeness of the data.  A new store may also open or an existing store close, again affecting the total sales and the comparisons that can be made between trading years.

Comparisons of quantified data can be useful, but care is needed to ensure that the context in which the data are collected is fully documented.  Decisions based on incomplete data are likely to be wrong, and considerable time can be wasted in organizations as staff are tasked with explaining the data and developing action plans to improve situations based on data that are wrong.

Individuals have a responsibility to ensure that the context in which the data are collected, and the limitations of the data in terms of completeness and accuracy, are considered when analysing the data, to avoid the dangers of incomplete data.  The longevity of data means that the legacy of poor-quality data can haunt organizations long into the future.  The introduction of a new information system provides the opportunity to establish processes to capture and analyse reliable data.

In this organization, the initial data collected may need to be discarded until all stores are regularly transmitting data.  Processes will also need to be implemented to address transmission problems in the future.  The organization should consider what information it needs from the system, as store-level data may be more useful.  Rather than trying to rationalize incomplete data, the organization needs to take action based on complete information.
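One such process could be a completeness check that runs before any total is reported.  The sketch below is only an illustration in Python: the function names, the list of expected stores and the figures are all assumptions, not part of any particular system.  It withholds the total when a store is missing and says which stores need to be followed up.

```python
def missing_stores(expected, reported):
    """Return the expected stores that have not transmitted data."""
    return sorted(set(expected) - set(reported))


def total_sales_if_complete(expected, reported):
    """Return (total, missing). The total is None if any store is missing."""
    missing = missing_stores(expected, reported)
    if missing:
        return None, missing
    return sum(reported[store] for store in expected), missing


# Hypothetical store list and one week's reported sales.
expected_stores = ["A", "B", "C"]
week_sales = {"A": 100.0, "B": 120.0}

total, missing = total_sales_if_complete(expected_stores, week_sales)
if total is None:
    print("Total withheld; no data from:", ", ".join(missing))
else:
    print("Total sales:", total)
```

Withholding the figure, rather than reporting a partial total, keeps incomplete data out of later comparisons.  Store-level figures for the stores that did report remain available in the meantime.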

Further Reading: approaches to cleaning incomplete data are discussed in Chapter 8.

Please use the following to reference this blog post in your own work:

Cox, S. A., (2014), ‘Dangers of Incomplete Data’, 14 November 2014, http://www.managinginformation.org/dangers-incomplete-data/, [Date accessed: dd:mm:yy]

© 2014 Sharon A Cox