Big Data, Small Data: it all depends on how you look at it

Big Data refers to large volumes of complex data that cannot be processed by traditional software tools. It is characterized by the three Vs: Volume, Velocity and Variety. Small Data is a part of Big Data: smaller, easily accessible data. The term Big Data was first used in its current sense in 1989 and became popular with the massive growth of the Internet and the increase in data generated. However, whether a dataset is manageable or not depends on the context and on the human ability to process it. Starting with Small Data can be a first step into the world of Big Data, especially in commercial or production areas, because it provides gradual learning and training.

Defining data scale has everything to do with human context

Big Data is a combination of huge volumes of structured, semi-structured and unstructured data that are too complex to be stored, processed and analyzed by typical software tools. Big Data can also be defined using the three Vs: Volume, Velocity and Variety. Volume refers to the amount of data generated every second; velocity is the speed at which data is received and processed; and variety refers to the different data formats.

A few years ago, data meant documents and papers, perhaps with some photos or videos. Now it means much more than that. It is almost impossible to estimate the amount of data we produce. It is believed that almost 2.5 quintillion bytes (about 2.5 exabytes) of data are generated every day in the digital age, a figure that includes all web data generated by emails, apps, websites and social platforms. With the increasing number of digital devices and the growth of the Internet of Things, that number keeps rising.

Small Data, on the other hand, is a "part" of Big Data. It is like dividing Big Data into small doses: "many Small Data" make Big Data possible. It is data small enough to be conveniently stored on a single machine, such as a local server or a laptop, and it is easily accessible.
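The single-machine criterion can be made concrete with a minimal sketch in Python. The function name `fits_on_one_machine`, the `headroom` factor and the example sizes are illustrative assumptions, not part of any standard definition. The only point is that the same dataset can be "small" for one machine and "big" for another.

```python
def fits_on_one_machine(dataset_bytes: int, machine_memory_bytes: int,
                        headroom: float = 0.5) -> bool:
    """Return True if the dataset fits within a fraction of the machine's memory.

    `headroom` is an assumed rule of thumb: the share of memory we are
    willing to dedicate to the data, leaving the rest for processing.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return dataset_bytes <= machine_memory_bytes * headroom


GB = 10**9

dataset = 20 * GB   # hypothetical sales dataset
laptop = 16 * GB    # hypothetical laptop memory
server = 256 * GB   # hypothetical local server memory

print(fits_on_one_machine(dataset, laptop))  # False: "big" for the laptop
print(fits_on_one_machine(dataset, server))  # True: "small" for the server
```

With these assumed numbers, the 20 GB dataset exceeds the laptop's usable 8 GB but sits comfortably within the server's 128 GB, so whether it counts as Small Data depends on where it is handled.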

Big Data: The “intractable”?

In 1989, the journalist Erik Larson used the term “Big Data” for the first time in the sense we know today, in an article that looked at the future relationship between marketing and the use that would be made of customer data.

In the years that followed, the Internet consolidated as a mass medium. Google was born close to the 2000s, a decade that saw the birth of companies that generate and store large amounts of data from collaboration and social networks. The term became popular because the volume, velocity and variety of data grew exponentially.

Now, I would like to dwell on the following. It is often argued that Big Data is “intractable” and “unreachable”. However, I think the right question is: “For whom?” “Big” or “Small” is nothing more than a concept defined in relation to the context.

From cuneiform writing – the oldest known writing system – to modern data centers, humans have always collected information, but how much data is “a lot” or “a little”? What is the parameter? 

Some Big Data experts place its history long before 1989, and some even place it in the Paleolithic. The key is to think about it in a relational way: What did the notion of “a lot of data” imply at that time? What shape did the “three Vs” take? Surely it had to do with the ability to generate and collect information at that moment. In conceptual terms, it was certainly not the same as what came later: the growing volume of data generated in the historical periods that followed, with the industrial revolutions and the enormous impact of the digital revolution. The notion of “a lot of” or “little” data must have changed in each historical period. Big Data, then, has less to do with a certain amount of data than with the human context and the ability to generate, process and manage that information.

A good way to get started in the world of Big Data, then, is to start with Small Data, applied for example to the commercial area, sales or production. Jumping into Big Data requires learning and training that Small Data can provide. Otherwise, it would be like jumping into the ocean without knowing how to swim, when the most logical path would be to start in a pool.

Julio Cesar Blanco – September 12, 2022
