What’s the Difference Between ‘Big Data’ and ‘Data’?
I’m not sure this question really needs answering, but frankly, when the topic comes up (and it does all the time), it’s just too tempting to pass up. Any definition is a bit circular, as “big” data is, of course, still data.
Data is a set of qualitative or quantitative variables – it can be structured or unstructured, machine-readable or not, digital or analogue, personal or not. Ultimately it is a specific set (or sets) of individual data points, which can be used to generate insights, or combined and abstracted to create information, knowledge and wisdom. Traditional analysis tools and software can be used to analyse and “crunch” data.
There are “dimensions” that distinguish data from BIG DATA, summarised as the “3 Vs”: Volume, Variety and Velocity. Hence BIG DATA is not just “more” data. It is so much data, so mixed and unstructured, and accumulating so rapidly, that traditional techniques and methodologies, including “normal” software such as Excel, Crystal Reports or similar, do not really work. Gartner stated that in 2011 the global rate of data growth was around 59%. If the total stock of data grows by 59% in a year, the newest year’s data makes up 0.59 / 1.59, or about 37%, of everything ever created. In other words, almost 40% of all data ever created was created in the previous year, and I am sure the share is even higher now.
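The step from “59% growth” to “almost 40% of all data ever created” is a simple ratio. Suppose the stock at the start of a year is S and it grows by a rate g. The year then adds g·S of new data, and the new total is (1 + g)·S. So the latest year’s share is g / (1 + g). The Python sketch below rests on one assumption of ours, not Gartner’s wording: that the 59% applies to the cumulative stock of data each year. The function name is hypothetical, and the 40% and 80% rates are illustrative comparisons, not measured figures.

```python
# Minimal sketch, assuming "growth rate" means the cumulative stock of data
# grows by a constant fraction g per year (our interpretation, not Gartner's).
# If the stock at the start of the year is S, the year adds g * S, and the
# new total is (1 + g) * S, so the latest year's share is g / (1 + g).


def share_created_last_year(growth_rate: float) -> float:
    """Fraction of all data ever created that was created in the latest year."""
    if growth_rate < 0:
        raise ValueError("growth_rate must be non-negative")
    return growth_rate / (1 + growth_rate)


if __name__ == "__main__":
    # 0.59 is the 2011 figure quoted above; 0.40 and 0.80 are hypothetical
    # rates shown only for comparison.
    for g in (0.40, 0.59, 0.80):
        share = share_created_last_year(g)
        print(f"growth {g:.0%}: latest year is {share:.1%} of all data")
```

At 59% the function returns about 0.371. The share rises with the growth rate but can never reach 100%, because older data never disappears from the total.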
Thus, “BIG DATA” can be a summary term for a set of tools, methodologies and techniques for deriving new “insight” from extremely large, complex samples of data, most likely by combining multiple extremely large, complex datasets. The potential here is that if we crunch true BIG DATA, we can attempt to establish patterns and correlations between seemingly random events in the world. Then, by forming and testing hypotheses, we could begin to understand causality, which would allow predictions to be made and deeper insights to be drawn.
Because of the complexity of BIG DATA and the computational power and new methods it requires, this has only become possible to attempt in the last decade or so. Even today, most BIG DATA projects do not attempt to test hypotheses or establish patterns, and so miss out on the potential.
In practice, BIG DATA almost always involves multiple datasets and, in most cases, has little to do with personal data. That said, personally identifiable data is likely to be ubiquitous, since correlating enough datasets could make individual data “fingerprints” unique.
In my experience, however, when ‘big’ data is discussed, the discussions are not really about ‘BIG’ data. Most examples given, such as those at the Big Data in Government Conference, are simply about better use of data, reporting and analytics. This is not new, nor should it be viewed as new. Arguably, it has been (or should have been) happening since the beginning of organised government. All too often, definitions and key concepts in the data / BIG DATA world are not shared among practitioners, and fashions and fads take over.
Further, there is no consensus or shared understanding that using data and using BIG DATA are different things that could deliver different outcomes. No one quite knows what special benefits might come from BIG DATA, not even in the private sector. Nonetheless, there have been some notable successes in using BIG DATA, such as Google Translate, Tesco Clubcard retail optimisation, and airline fare modelling and prediction algorithms.
So let’s get back to an easier topic: good use of “small” data. None of the examples given at the recent Big Data in Government Conference were BIG DATA. More worryingly, none of them really affected the day-to-day business of government: the actual decisions being made by officers or managers. I will repeat that: I heard no example of a government officer or civil servant changing an operational decision based on a new use of data (BIG or otherwise). Data and its analysis appeared to sit as an ‘appendix’ on the side of government. Being an appendix means that it is not involved in the day-to-day workings and processes of government. This may have been down to the specific examples chosen, but I would love to hear of some such examples at future conferences.
Today, many more excellent tools, platforms and ideas exist in the field of good data management (not just BIG DATA). This creates enormous and immediate potential for the Public Sector to make relevant and timely improvements in “small” data management, data integration and visualisation. Most importantly, there is potential in integrating “small” data into the real-time decision-making of public servants and making it useful. I think this is best achieved not by being distracted by fancy, fashionable titles such as BIG DATA, but by focusing on the boring (but essential) transformation of the Public Sector. Let’s have a “small” data (or just plain old “data”) conference. Less sexy, but more useful...