Training algorithms: why artificial intelligence should not be left alone

Sixty-five percent of companies in Spain risk becoming irrelevant if they do not adopt big data strategies, in a sector growing by 30% annually. Machine Learning allows models to learn automatically, but it requires human supervision to avoid harmful outcomes, as the case of the Microsoft bot shows. Data quality and quantity, together with accurate and efficient labeling, are key to successfully training Machine Learning models.

Building a model for AI-based data analysis requires human-supervised training to avoid bias and unwanted effects.

According to the Cotec Foundation, 65% of companies run the risk of becoming irrelevant or uncompetitive if they do not adopt big data strategies, in a sector that grows by 30% every year in Spain.

When we approach a project with data at its center, one stage is often overlooked: building the model. A model establishes systematic procedures and rules around the data in order to solve a problem. Once the model is built, it becomes possible to outline scenarios from any available information.

In this sense, Machine Learning is a great revolution because computer algorithms allow models to learn automatically through experience. In fact, the quality and quantity of that learning have as much to do with the success of a data project as the algorithms themselves. However, this learning should not take place in complete "solitude", and I want to dwell on this point.

Machine Learning algorithms learn from data: comparing relationships, developing understanding, making decisions and evaluating based on the data they receive. But this training necessarily requires human accompaniment. When I say that the model should not learn alone, the paradigmatic example of Microsoft and Tay comes to mind: the experimental bot intended to teach Microsoft more about the interaction between computers and human beings on social networks. The experiment went wrong: within a few hours, through real-time interaction with certain users, the bot turned xenophobic and racist and had to be taken down. Its training had not received sufficient human monitoring.

Going back to training: the better the quality and quantity of the training data, the better the model will perform. But even a large amount of well-structured data does not guarantee correct training. For example, autonomous vehicles do not just need images of a street; they need tagged images of every car, pedestrian, street sign, and so on. Sentiment analysis projects require labels that help an algorithm understand when someone is being ironic or sarcastic. Chatbots need to understand syntax, tone, and so on.
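The role of labels can be sketched in a few lines. This is a minimal illustration, not a production classifier: the texts, labels and word-count scoring below are all invented for the example, using only the Python standard library. The point is that the raw texts alone teach the model nothing; the human-provided label on each example is what the training step actually learns from.

```python
from collections import Counter, defaultdict

# Human-labeled training data: the label on each text is the human
# contribution that makes supervised learning possible.
labeled_data = [
    ("great service and friendly staff", "positive"),
    ("loved the fast delivery", "positive"),
    ("terrible experience, never again", "negative"),
    ("the product arrived broken", "negative"),
]

def train(examples):
    """Count word frequencies per label (a toy bag-of-words model)."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(model, text):
    """Pick the label whose vocabulary overlaps the text the most."""
    scores = {
        label: sum(word_counts[w] for w in text.split())
        for label, word_counts in model.items()
    }
    return max(scores, key=scores.get)

model = train(labeled_data)
print(predict(model, "friendly staff and fast delivery"))  # positive
print(predict(model, "it arrived broken"))                 # negative
```

Without the second element of each tuple, `train` has nothing to group the words by; with mislabeled tuples, it learns the wrong associations. That is the sense in which labeling quality bounds model quality.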

Of course, more complicated use cases generally require more data and training than less complex ones. The more specific the model has to be, the more examples it needs to train. For example, an identification tool that only seeks to recognize food generally needs less data than one that tries to identify arbitrary objects.

What follows is the question of how to prepare data for successful model training. The best approach is as simple as involving humans in the loop who can label as much data as possible, accurately and efficiently. In this way, learning is accompanied, deviations are corrected, and the side effects of learning "in solitude" are avoided.
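A common way to structure this human accompaniment is a confidence-based routing step. The sketch below is a hypothetical illustration (the function names and the 0.8 threshold are assumptions, not part of any specific framework): confident model predictions are accepted, while uncertain ones are escalated to a human annotator, whose label is what gets used.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tuned per project in practice

def route(prediction, confidence, human_label_fn):
    """Accept confident model output; escalate the rest to a human.

    Returns the label to use and where it came from ("model" or "human").
    """
    if confidence >= CONFIDENCE_THRESHOLD:
        return prediction, "model"
    return human_label_fn(), "human"

# Simulated annotator for the sketch: a human who recognizes sarcasm
# the model missed.
label, source = route("literal", 0.55, lambda: "sarcasm")
print(label, source)  # sarcasm human

label, source = route("literal", 0.95, lambda: "sarcasm")
print(label, source)  # literal model
```

In a real pipeline, the human-corrected labels would be fed back into the training set, which is how the "accompanied" learning corrects the deviations described above.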

The data labeling process is often time-consuming. Depending on the scale of the project, proper data labeling and learning monitoring may require substantial resources, but it is the surest way to create a reliable and effective Machine Learning model.

Julio César Blanco – August 12, 2022

