“Is ‘Big Data’ Just a Marketing Puff Term?” was the contentious title of an email circulated by ebulletins.co.uk. I am currently working with 25TB of data, so am I working with Big Data? I am certainly working with a lot of data, though for major organizations 25TB is a small volume of data. But are the challenges the same regardless of the volume of data?
I face three challenges in working with the 25TB of data. First, there are data management issues relating to the storage and backup of the data. Indexing and retrieving the data are challenging, not because of the volume of data, but because of its nature. About half of the data are videos, two thirds of the remainder are audio data, and the rest are text-based data. Each media object has a data element that records the provenance of the object (such as the date it was created and the location where the data were captured) and an information element describing the subject content of the data. Although the volume of data to be indexed is significant, the problems of indexing multimedia data objects remain the same whether there is one object or several thousand.
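To make the breakdown and the two-part structure of each media object concrete, here is a minimal Python sketch. The names `Provenance`, `MediaObject`, `build_index` and `volume_breakdown` are hypothetical, and the simple keyword index over the subject description is only one assumed way of indexing subject content, not the method used in this project.

```python
from dataclasses import dataclass
from datetime import date
from fractions import Fraction


@dataclass(frozen=True)
class Provenance:
    # Where the object came from (hypothetical fields taken from the examples above).
    date_created: date
    location: str


@dataclass(frozen=True)
class MediaObject:
    object_id: str
    media_type: str          # "video", "audio" or "text"
    provenance: Provenance   # the provenance element
    subject: str             # the information element: a description of the subject content


def build_index(objects):
    """Map each lower-cased word of the subject description to the ids of objects using it."""
    index = {}
    for obj in objects:
        for word in set(obj.subject.lower().split()):
            index.setdefault(word, set()).add(obj.object_id)
    return index


def volume_breakdown(total_tb):
    """Split a total volume: half video, two thirds of the remainder audio, the rest text."""
    total = Fraction(total_tb)
    video = total / 2
    audio = (total - video) * Fraction(2, 3)
    text = total - video - audio
    return {"video": video, "audio": audio, "text": text}
```

For 25TB, `volume_breakdown(25)` gives 25/2 TB of video (12.5TB), 25/3 TB of audio (roughly 8.3TB) and 25/6 TB of text (roughly 4.2TB). The `build_index` loop is identical for one object or several thousand; only the running time grows, which is the sense in which the indexing problem itself does not change with volume.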
Second, there are data analysis issues, or more specifically, the question of how to make sense of the data. By breaking a data stream down into a series of snapshots, we can use tools to analyse the specific parameters that make up a situation at an instant in time, providing a context from which information can be derived. The analysis is complicated by the multiple streams of multimedia data that must be analysed together to form a holistic view of the situation. However, the tools used to analyse the data in this project are fundamentally the same, irrespective of the volume of data being analysed.
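The idea of taking snapshots across several streams can be sketched in a few lines of Python. This assumes each stream is a time-ordered list of `(timestamp, value)` readings and that a snapshot at time `t` takes the most recent reading at or before `t` from every stream; both the representation and the alignment rule are assumptions for illustration, and `snapshot` and `snapshots` are hypothetical names.

```python
from bisect import bisect_right


def snapshot(streams, t):
    """For each named stream, return the latest value at or before time t (None if none yet).

    Assumes each stream's readings are sorted by timestamp.
    """
    result = {}
    for name, readings in streams.items():
        times = [ts for ts, _ in readings]
        i = bisect_right(times, t)  # number of readings with timestamp <= t
        result[name] = readings[i - 1][1] if i > 0 else None
    return result


def snapshots(streams, times):
    """Break the streams down into a series of (time, snapshot) pairs."""
    return [(t, snapshot(streams, t)) for t in times]
```

For example, with `streams = {"video": [(0, "f0"), (2, "f2")], "audio": [(1, "a1"), (3, "a3")]}`, `snapshot(streams, 2)` returns `{"video": "f2", "audio": "a1"}`. The same function applies whether there are two readings per stream or millions, which mirrors the point that the analysis tools do not change with volume.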
Third, there are data mining issues: how to discover relationships within the data. This is potentially where the strength of Big Data lies, though there is a danger that data mining produces more data rather than more meaningful data.
We can (and do) capture immense quantities of data every second. Perhaps Big Data is merely the latest marketing term, one that will be superseded by the Data Mountain, Epic Data or the Data Galaxy. But what we really need is accurate, valid data from which meaningful information can be derived. As we seek to educate the next generation of IT developers and information consumers, we need to foster an understanding of the importance of accurate data capture and of how individual actions can affect data quality.
We may have Big Data but often have little information.
Further Reading: data mining is discussed in Chapter 3 and making sense of data is discussed in Chapter 15.
Please use the following to reference this blog post in your own work:
Cox, S. A., (2014), ‘Big Data, Little Information’, 23 May 2014, http://www.managinginformation.org/big-data-little-information/, [Date accessed: dd:mm:yy]