Defining the problem: everyone's responsibility in data management

In a data science strategy, defining the problem precisely is crucial. Asking the right questions yields insights and predictions that are useful to the business in a big data environment. It is important to involve all the actors in the organization and to use direct methods to raise the problem, integrating the vision of different areas. Collaboration between data scientists and business users is critical to the success of the project.

When designing a data strategy, the problem statement is a central issue. Defining it correctly depends on the participation of all the key areas and sectors of the company.

In a data science strategy, the correct definition of the problem is the key to the entire process. If we want data to work for us, we must be able to ask the right questions.

Once those questions are formulated, data can give us good predictions and reveal insights of great use to businesses, in a context where escaping big data seems unlikely: 65% of companies risk becoming irrelevant or uncompetitive if they do not adopt data strategies, and in Spain the sector grows by 30% each year.

As a first step, we must be very clear about the problem we want to solve, so it is time to ask ourselves: What is my main objective? What business problem do I have? What do I want to explain using data? What business model would I be interested in creating? Where do I think I can get the data to solve that problem?

As a second step, it is necessary to check whether there is a standardized methodology for presenting the problem to the data science team. For example, problems can be detected directly (through signals in the environment, through indicators, or by anticipating trends), or with tools such as the "Customer Journey Map", which describes the customer experience (CX), among other options.

At this point, I would like to draw attention to certain dangers in defining a problem: the definition may be too vague or too broad, or for some reason it may not frame the data as precisely as required.

That is why, among the direct methodologies, the most appropriate option when posing a problem is to involve all the actors of the organization. The data comes from multiple sources, so the problem must also be approached from a multisector perspective. We need to integrate the vision of all areas in order to build the problem statement in the most effective way.

An interesting reflection on this is that posing the problem is a step in the data science process that relies more on soft skills than on technical or hard skills.

By this I mean that it is a stage similar to brainstorming: all the key representatives of the different areas interact in the same space with the same objective, to arrive at the most complete definition of the problem.

It is worth noting that lateral thinking is a valuable soft skill in this phase of a data science project. These crowdsourced sessions are a highly effective technique, because questions from other members easily lead the team to generate additional questions that better serve the ultimate goal.

Collaboration between data scientists and business users, who after all know the business best from their respective areas, is key to the success of a project of this type: fluid, integrated communication among all of them undoubtedly marks the path to success.

Julio Cesar Blanco – June 23, 2022
