
The challenge of Artificial Intelligence: fast delivery, continuous development

15 August 2023

Developing an Artificial Intelligence project (a broad concept that here also encompasses computational statistics and machine learning) involves several phases and a very high degree of uncertainty, because not every problem can be solved.

To face this reality with optimism, it is worth coming up with solutions that generate direct value and make a difference. In my work at Ferrovial’s Digital Hub, we soon discovered that delivering value with these projects was often expensive: it took a long time before their contribution became visible.

Therefore, we analyzed the entire process to understand the impact data science has and how to maximize the benefits of the AI we apply. We arrived at a plan in four stages: accessing the data, visualizing and treating it, selecting the best model, and putting it into production.

  • Access to data

“If you want a model, you need data”

For this first phase, we need raw data, that is, data exactly as it comes out of the data-generating machine (whether that is a sensor, a person logging expenses, a drone recording video…), so that we can model the system itself.

The data must be available, and the box where it is stored must be accessible (whether that box is an Excel file, an SQL database, a data lake…), so that we always work with up-to-date data.

We also need a few more things, such as a description of the data (what does each variable mean?) and its update frequency, but for now these are enough requirements for this stage.
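To make these access requirements concrete, here is a minimal Python sketch, assuming the data lives in an SQL table (an in-memory SQLite database stands in for it). The `DataSource` descriptor, its fields and the `readings` table are hypothetical illustrations, not Ferrovial’s tooling: they keep the variable descriptions and update frequency next to the table and use that frequency to check whether the data is still up to date.

```python
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DataSource:
    """Hypothetical descriptor for the 'box' a project reads its data from."""
    table: str
    update_frequency: timedelta                    # how often new rows are expected
    columns: dict = field(default_factory=dict)    # variable name -> what it means
    timestamp_column: str = "ts"


def load_rows(conn: sqlite3.Connection, source: DataSource) -> list:
    # Read only documented variables, oldest first. Table and column names are
    # interpolated into the SQL, so they must come from trusted configuration.
    cols = ", ".join(source.columns)
    query = f"SELECT {cols} FROM {source.table} ORDER BY {source.timestamp_column}"
    return conn.execute(query).fetchall()


def is_fresh(last_seen: datetime, source: DataSource, now: datetime) -> bool:
    # The data counts as up to date if the newest row is no older than one update period.
    return now - last_seen <= source.update_frequency


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts TEXT, sensor_id TEXT, power_kw REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [("2023-08-15T10:00:00", "s1", 12.5), ("2023-08-15T11:00:00", "s1", 13.1)],
    )
    source = DataSource(
        table="readings",
        update_frequency=timedelta(hours=1),
        columns={
            "ts": "time of the reading (ISO 8601)",
            "sensor_id": "sensor that produced the value",
            "power_kw": "instantaneous power in kW",
        },
    )
    rows = load_rows(conn, source)
    newest = datetime.fromisoformat(rows[-1][0])
    print(len(rows), is_fresh(newest, source, now=datetime(2023, 8, 15, 11, 30)))  # 2 True
```

Keeping the descriptions in the same object as the table name means a query can only select variables whose meaning has been written down.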

  • Visualization and Treatment

“If you want a model, you need to understand the data”

This stage recurs in every project of this type, and you will return to it many times.

You always start by understanding the data you have. Some problems can be solved with simple techniques, such as eliminating observations because they are doubtful or selecting the images that really show something… Others call for very manual techniques, such as marking exactly where a license plate is in an image, and others are more complicated, such as deciding whether a behavior in a set of electrical generation data is normal or not.
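One simple way to flag “doubtful” observations is Tukey’s interquartile-range rule. The sketch below applies it with Python’s standard `statistics` module; the rule, the `k=1.5` factor and the sample generation values are illustrative assumptions, not necessarily the technique used in these projects, and in practice a person would still review what gets flagged before discarding it.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return Tukey's (low, high) fences; values outside them are treated as doubtful."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def split_doubtful(values, k=1.5):
    """Separate plausible observations from doubtful ones, preserving their order."""
    low, high = iqr_fences(values, k)
    kept = [v for v in values if low <= v <= high]
    doubtful = [v for v in values if not low <= v <= high]
    return kept, doubtful


if __name__ == "__main__":
    # Illustrative hourly output of a generator, in MW, with two suspicious readings.
    hourly_mw = [10.1, 10.4, 9.8, 10.0, 55.0, 10.2, -3.0, 10.3]
    kept, doubtful = split_doubtful(hourly_mw)
    print(doubtful)  # [55.0, -3.0]
```

The rule uses quartiles rather than the mean, so the suspicious readings themselves barely shift the fences used to judge them.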

But you will always come back.

You will always come back because when the model does not work, you’ll have to see what happens to the data.

You will always come back because when the model is biased, you’ll have to see what happens to the data.

Always, always, always…

  • Selection of the best model

“If you want a model, you need a model”

Every problem, every goal, has a model that will achieve it best.

The key is to understand the problem, because the model’s objective is defined based on it.

Once you have an objective, you must select the best model or, at least, a model that meets the necessary conditions to alleviate, if not completely solve, the initial problem.

To get there, you test: you split the data so that you can evaluate each model as if you were already using it.

You define a metric that you want to optimize.
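The split, test and measure loop can be sketched in a few lines of standard-library Python. The two candidate models (an average baseline and a least-squares straight line), the 25% hold-out and mean absolute error as the metric are illustrative choices, not the ones any particular project uses; the point is that every candidate is scored on data it never saw during fitting, and the metric alone decides.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]            # (input x, observed target y)
Model = Callable[[float], float]


def train_test_split(data: List[Point], test_ratio: float = 0.25, seed: int = 0):
    """Shuffle a copy of the data and hold out the last part for testing."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


def fit_mean(train: List[Point]) -> Model:
    """Baseline candidate: always predict the average target seen in training."""
    mu = statistics.fmean(y for _, y in train)
    return lambda x: mu


def fit_linear(train: List[Point]) -> Model:
    """Candidate: ordinary least-squares line y = a + b * x."""
    mx = statistics.fmean(x for x, _ in train)
    my = statistics.fmean(y for _, y in train)
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    a = my - b * mx
    return lambda x: a + b * x


def mae(model: Model, test: List[Point]) -> float:
    """Mean absolute error on held-out data: the metric we choose to minimise."""
    return statistics.fmean(abs(model(x) - y) for x, y in test)


def select_best(data: List[Point], candidates: Dict[str, Callable[[List[Point]], Model]]):
    """Fit every candidate on the training part, score it on the test part, keep the lowest error."""
    train, test = train_test_split(data)
    scores = {name: mae(fit(train), test) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(float(x), 3.0 * x + 5.0 + rng.gauss(0, 1)) for x in range(40)]
    best, scores = select_best(data, {"mean": fit_mean, "linear": fit_linear})
    print(best, scores)
```

Adding a new candidate only means adding another `fit_*` function to the dictionary; the split and the metric stay fixed, so the comparison remains fair.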

And after testing and testing, you choose the best one, and… is that it?

  • Putting it into production

“If you want a model, you want to use the model”

And it can be used in different ways.

It can be used periodically: the system can get new data, make predictions, and store them somewhere where you can consume them.

It can be used on request: the system can be ready for you to send it data at any time, and it will return the prediction to you.

And it can be used constantly: the system can always be making predictions because you’re sending it data non-stop.
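The three ways of consuming a model can be sketched as three thin wrappers around the same prediction function. This is a simplified illustration with a toy model and hypothetical function names: in a real deployment the periodic job would be triggered by a scheduler, the on-request function would sit behind an API endpoint, and the stream would come from a message queue, none of which is shown here.

```python
import itertools
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Model = Callable[[float], float]


def run_batch(model: Model, new_rows: Sequence[float], store: List[Tuple[float, float]]) -> int:
    """Periodic use: score the rows that arrived since the last run and store the results."""
    store.extend((x, model(x)) for x in new_rows)
    return len(new_rows)


def on_request(model: Model, x: float) -> float:
    """On-request use: one input in, one prediction out, whenever the caller asks."""
    return model(x)


def stream(model: Model, source: Iterable[float]) -> Iterator[float]:
    """Constant use: yield a prediction for every item as it arrives, without end."""
    for x in source:
        yield model(x)


if __name__ == "__main__":
    def double(x: float) -> float:  # toy stand-in for a trained model
        return 2.0 * x

    store: List[Tuple[float, float]] = []
    run_batch(double, [1.0, 2.0], store)
    print(store)                    # [(1.0, 2.0), (2.0, 4.0)]
    print(on_request(double, 3.0))  # 6.0
    # itertools.count() never ends, so take just the first three predictions.
    print(list(itertools.islice(stream(double, itertools.count()), 3)))  # [0.0, 2.0, 4.0]
```

Because all three modes call the same model object, the choice between them is about how data arrives and where predictions are consumed, not about the model itself.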

Combining the steps to ensure impact

How do we speed up these steps, which every project shares, so that we deliver value quickly?

We turn to an MLOps stack, a set of tools that cuts the time needed to deliver an MVP (minimum viable product) by 90%:

  • Providing access to data
  • Automating model selection
  • Automating deployment to production

And not only that!

Once the automatically generated model is delivered, these tools let us keep iterating on it in the background: running experiments, comparing it with new models developed ad hoc for the specific objective, and switching from one model to another easily.

The way of working? You connect to the data, process it as needed, thoroughly define the problem and the metrics to optimize, and… your model is ready for production!

And after that?

Then the process of continually improving the models begins. Behind the scenes, the data scientists share the metrics they obtain with new models and new approaches to the problem, and when one of them improves on the model in production, they swap it in without difficulty.
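This background iteration resembles what is often called a champion/challenger pattern. The sketch below is a hypothetical in-memory registry, not the API of any MLOps product: the first, automatically generated model goes live immediately, and a challenger replaces it only if its score on the same evaluation data beats the production model’s score. Scores are assumed to be errors (lower is better) unless `higher_is_better` is set, and the scores in the usage example are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Model = Callable[[float], float]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: one model serves, the others compete in the background."""
    higher_is_better: bool = False                 # False for error metrics such as MAE
    models: Dict[str, Model] = field(default_factory=dict)
    scores: Dict[str, float] = field(default_factory=dict)
    production: Optional[str] = None

    def register(self, name: str, model: Model, score: float) -> None:
        """Record a model with its score on the shared evaluation data."""
        self.models[name] = model
        self.scores[name] = score
        if self.production is None:
            self.production = name                 # the first (automatic) model goes live directly

    def _beats(self, challenger: float, champion: float) -> bool:
        return challenger > champion if self.higher_is_better else challenger < champion

    def promote_if_better(self, name: str) -> bool:
        """Switch production to `name` only if its score beats the current model's."""
        if self.production is None or self._beats(self.scores[name], self.scores[self.production]):
            self.production = name
            return True
        return False

    def predict(self, x: float) -> float:
        """Always serve predictions from whichever model is currently in production."""
        if self.production is None:
            raise RuntimeError("no model in production yet")
        return self.models[self.production](x)


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register("automatic", lambda x: 2.0 * x, score=1.8)
    registry.register("ad_hoc_v1", lambda x: 2.1 * x, score=2.4)
    registry.register("ad_hoc_v2", lambda x: 1.9 * x + 0.5, score=0.9)
    for challenger in ("ad_hoc_v1", "ad_hoc_v2"):
        registry.promote_if_better(challenger)
    print(registry.production)  # ad_hoc_v2
```

Callers only ever use `predict`, so switching models never changes the interface that the consuming systems depend on.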

This is the way we work in data science at our Artificial Intelligence Center of Excellence.

Getting value fast, increasing it day by day.
