Problem Definition: A Shared Responsibility in Data Management

In a data science strategy, the precise definition of the problem is crucial. Asking the right questions enables us to obtain insights, predictions, and useful knowledge for businesses in a big data environment. It is important to involve all stakeholders within the organization and use direct methods to frame the problem, integrating perspectives from different areas. Collaboration between data scientists and business users is fundamental to the success of the project.

When designing a data strategy, defining the problem is a central issue. Its correct definition depends on the participation of all key areas and sectors within companies.

In a data science strategy, defining the problem correctly is the key to the entire process. If we want data to work in our favor, we must be able to ask the right questions.

Once these questions have been posed, data can provide valuable perspectives and accurate predictions, and reveal insights of great value to businesses. This matters in a context where avoiding big data seems all but impossible: 65% of companies risk becoming irrelevant or uncompetitive if they do not adopt data strategies, and in Spain this sector is growing by 30% annually.

The first step is to clearly understand the problem we want to solve, which leads to the following questions: What is my main objective? What business problem do I have? What do I want to explain using data? What business model would I be interested in creating? Where do I think I can obtain the data to solve this problem?

The second step is to check whether the data science team has a standardized methodology for framing the problem. For example, problems can be identified directly (through signals in the environment, through indicators, or by anticipating trends), or tools such as the "Customer Journey Map", which maps the customer experience (CX), can be used, among other possibilities.

At this point, I would like to draw attention to certain risks in defining a problem: the definition may be too ambiguous or too broad, or it may simply miss the precise focus that working with the data requires.

This is why, among the direct methodologies, the most suitable option for framing a problem is to involve all stakeholders within the organization. Data comes from multiple sources, so the problem must also be approached from a cross-functional perspective. We need to integrate the vision of every area to arrive at the most effective definition of the problem.

An interesting consequence is that problem definition is a step in the data science process that relies more on soft skills than on technical, or hard, skills.

In practice, this stage resembles a brainstorming session: key representatives from different areas interact in the same space with a shared goal, namely to reach the most refined definition of the problem.

Lateral thinking is a particularly valuable soft skill in this phase of a data science project. These collective exchanges are highly effective because questions raised by one member readily prompt the rest of the team to ask further questions that better serve the ultimate goal.

Collaboration between data scientists and business users, who ultimately know the business best within their respective areas, is key to the success of a project of this kind: smooth, integrated communication among all of them clearly paves the way.

By Julio Cesar Blanco – June 23, 2022
