How Much Data is BIG Data?

A Fair Question.  Everyone in business is talking about Big Data, but the term is impenetrably, wonderfully vague in common use.  Certainly it means dealing with large amounts of data, but businesses have struggled to process large amounts of data for eons, so the current emphasis on ‘Big Data’ has to be about more than just ‘lots of data’.  It is instructive to recognize that our view of what constitutes a large amount of data has actually shifted over the years.

Fifty years ago, I was working on a demographic research project for the Department of Agriculture, forecasting population migrations across the United States.  For that project, we had 24 punched cards for each county in the country for each of two US population censuses, 1950 and 1960.  The cards contained census data about such things as births, deaths, and migrations in each county.  For that day and age, this was very big data indeed.  Doing the math, it worked out to something over 12 megabytes of data, and backing it up required a second full copy of all those punched cards!  The first-generation mainframe computer we were using had no operating system and ran on vacuum tubes.  It cost $2 million, yet it had only 2K of main memory on board.  There were no higher-level programming languages, only machine code, and it took forever to process all of those racks of punched cards.  Back then, this was a lot of data!

This illustrates that, in a practical sense, ‘big data’ tends to be whatever level of data stretches your capacity to process it.  In the real world, big data is situational, relative to a firm’s operational ability to deal with it effectively.
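To see roughly where that 12-megabyte figure comes from, here is a minimal back-of-the-envelope sketch.  The 24 cards per county and the two censuses come from the story above; the county count (roughly 3,100) and the standard 80-character capacity of a punched card are assumptions added for illustration.

```python
# Back-of-the-envelope estimate for the punched-card project described above.
# Assumed here (not stated in the post): roughly 3,100 U.S. counties and the
# standard 80-character capacity of one punched card.
counties = 3_100          # approximate count of U.S. counties (assumption)
censuses = 2              # the 1950 and 1960 censuses
cards_per_county = 24     # per county, per census (from the post)
chars_per_card = 80       # standard 80-column card (assumption)

total_cards = counties * censuses * cards_per_county
total_bytes = total_cards * chars_per_card

print(f"{total_cards:,} cards, roughly {total_bytes / 1e6:.1f} MB")
# -> 148,800 cards, roughly 11.9 MB
```

With those assumed inputs the estimate lands just under 12 MB; counting county equivalents or extra data fields would push it slightly higher, in line with the "something over 12 megabytes" figure above.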

Characteristics of Big Data.  With Big Data, the discussion always turns to the three V’s: VOLUME, VELOCITY, and VARIETY.[1]  We could add a fourth V, VALUE, as well.

Volume is the most obvious aspect of Big Data.  Today, businesses are preparing for what is being called The Age of Exabytes.  An Exabyte is a billion billion bytes, and Zettabytes (a trillion billion bytes) cannot be far behind.  Driven by advances in storage methods and technologies, the quest for competitive advantage by harnessing data seems endless.

Velocity describes the pace at which data arrives for storage.  Modern networking has had a profound impact on this rate of arrival, streamlining the connections to data sources and optimizing data flows everywhere.  Internet apps, automatic sensors, and devices of every description are generating data at a prodigious rate.

Variety is probably the most perplexing aspect of Big Data today.  It refers to the myriad formats in which modern data exist in corporate data warehouses.  Most of the history of data management has dealt with well-defined, structured data, but much of today’s relevant business data is highly unstructured, often machine-generated, and frequently free-format text.  Organizing and evaluating it is complicated, to say the least.

Value deals with such factors as timeliness and accuracy.  Data is obviously more valuable when it is available in time to use it, and accuracy is crucial for the reliability and, ultimately, the trustworthiness of the data.  But a good estimate achieved early is often better than a more exact answer received later, so these factors interact in important ways when assessing the value of data.
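To put the Volume discussion’s “Age of Exabytes” in perspective, the short sketch below simply lists the decimal storage prefixes up to the zettabyte.  It is purely illustrative and uses the power-of-1,000 (SI) prefixes rather than the binary, 1,024-based variants.

```python
# Decimal (SI) storage prefixes, from kilobyte up to zettabyte.
prefixes = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]

for power, name in enumerate(prefixes, start=1):
    print(f"1 {name}byte = 10^{3 * power} bytes")

# An exabyte is 10^18 bytes (a billion billion); a zettabyte is 10^21 bytes
# (a trillion billion), a thousand exabytes.
```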

Charles K. Davis, Ph.D.
Professor; Cameron Endowed Chair of Management & Marketing

© Copyright by Charles K. Davis, November 2013

 


[1] McAfee, A., & Brynjolfsson, E.  (2012). Big Data: The Management Revolution. Harvard Business Review, 90(10), 78–83
