How we define the scale of data has everything to do with the human context.
Big Data refers to enormous volumes of structured, semi-structured, and unstructured data that are too large and complex to be handled by conventional tools for processing, storing, and analyzing data. Big Data can also be defined using the three V’s: Volume, Velocity, and Variety. Volume refers to the amount of data generated every second; velocity is the speed at which data is received and processed; and variety refers to the different formats data can take.
Years ago, data meant documents and papers, perhaps with some photos or videos; today it means much more. It is almost impossible to estimate the amount of data we produce: in the digital age, an estimated 2.5 quintillion bytes (roughly 2.5 exabytes) are generated every day, counting all the web data produced by emails, apps, websites, and social platforms. With the growing number of digital devices and the expansion of the IoT, this figure keeps rising.
Small Data, on the other hand, is a ‘part’ of Big Data: it involves segmenting Big Data into smaller portions, and ‘many Small Data’ together make Big Data possible. These are data sets small enough to be conveniently stored and easily accessed on a single machine, such as a local server or a laptop.
Big Data: The ‘Unmanageable’?
In 1989, journalist Erik Larson used the term “Big Data” for the first time in the way we currently understand it, in an article where he sought to foresee the future relationship between marketing and the use of customer data.
Those were the years when the internet was becoming widespread, with the birth of Google and the approach of the 2000s: a decade that saw the rise of companies generating and storing large amounts of data created through collaboration and social networks. The term became popular because the volume, velocity, and variety of data grew exponentially.
Now, I’d like to pause here. It is often argued that Big Data is ‘unmanageable’ and ‘ungraspable’; however, I believe the right question to ask is: ‘For whom?’ It is essential to understand that ‘Big’ or ‘Small’ is nothing more than a concept defined in relation to the context.
From cuneiform writing—the oldest known writing system—to modern data centers, humans have always collected information. But how much data is ‘a lot’ or ‘a little’? What is the standard?
There are Big Data experts who trace its history back much further than 1989. Some even place it in the Palaeolithic era, because the key is to think of it in relational terms: What did the notion of ‘a lot of data’ imply back then? What form did the three V’s take? Surely it was tied to the capacity to generate and collect information at that time. It was certainly not, conceptually, the same as what came later; I am referring to the increasing volume of data generated in the historical periods that followed, with the industrial revolutions and the enormous impact of the digital revolution. The notion of ‘a lot’ or ‘a little’ data must necessarily have evolved in each historical period. Therefore, Big Data has less to do with a specific quantity of data and more with the human context and the capacity to generate, process, and manage that information.
The point is that a good recommendation for getting started in the world of Big Data is to begin with Small Data, applied, for example, to the commercial, sales, or production area. Diving into Big Data requires learning and training that Small Data can provide; otherwise, it would be like diving into the ocean without knowing how to swim, when the more logical path is to start in a pool.
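To make the idea concrete, here is a minimal sketch of that ‘start in a pool’ approach: a sales analysis that fits entirely on a laptop using an everyday tool such as pandas. The file name and column names (date, product, units, unit_price) are hypothetical, assumed only for illustration.

```python
import pandas as pd

# Hypothetical Small Data example: a sales file small enough to fit
# comfortably in memory on a single machine. The file name and columns
# (date, product, units, unit_price) are assumptions for illustration.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Revenue per transaction.
df["revenue"] = df["units"] * df["unit_price"]

# Monthly revenue by product: the kind of commercial question
# Small Data can answer without any Big Data infrastructure.
monthly_revenue = (
    df.groupby([df["date"].dt.to_period("M"), "product"])["revenue"]
      .sum()
      .reset_index()
      .rename(columns={"date": "month"})
)

print(monthly_revenue.sort_values(["month", "revenue"], ascending=[True, False]))
```

Only when questions like this outgrow a single machine, in volume, velocity, or variety, does it make sense to reach for Big Data tooling.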
By Julio Cesar Blanco – September 12, 2022