by Elvin Johnson, Data Scientist, Lucas Systems
Machine learning (ML) is a term we hear a lot in general discussion, and especially when it comes to services and software coming to the warehouse and distribution space. But do we really know what it means? In this article, I’d like to give you a peek behind the curtain at the process, beginning with how to build and train models for success.
To start, simply defined, ML is a subset of artificial intelligence (AI) that focuses on developing algorithms that enable machines to learn from, and make predictions or decisions based on, past or real-time data. The process of having these models learn from past data is referred to as ‘model training.’ This process is critical because it allows the model to learn and identify patterns in existing data, refining its ability to make accurate predictions or decisions. A well-trained model can generalize from its training data to new, unseen data, making it effective in real-world applications.
Mastering model training for success
When training models, there are several critical steps to consider to ensure the models perform well and generalize. Here are the key steps that set the stage for deployment:
- Data Collection
Data collection is the critical first step in building an ML model. The purpose is to gather relevant data that the model will use to learn patterns. The type of data collected depends on the specific problem we’re solving. Data can come in various forms, such as numeric (e.g., product sales), categorical (e.g., product categories, user types), or unstructured data like text (e.g., reviews, articles) and images. Data can be sourced from databases (if it’s customer-specific), APIs, web scraping, manual collection, or public sources.
Further, both the quality and quantity of data need to be considered for effective model performance. High-quality data should be accurate, complete, consistent, and relevant. The absence of these qualities can lead to flawed, noisy predictions and can degrade model performance. Quantity is equally crucial, as larger datasets allow models to learn from a wider range of examples and generalize better to unseen data. Additionally, data diversity is key to ensuring the model performs well across different real-world conditions, reducing bias and creating more reliable predictions.
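As a minimal sketch of what this step can look like in practice, the snippet below pulls historical pick data from a warehouse database and merges it with a product master file. The connection string, table name, file name, and column names are all hypothetical and will differ for every customer and system.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and table; real sources vary by customer.
engine = create_engine("postgresql://user:password@warehouse-db:5432/wms")

# Pull historical pick transactions from the warehouse management database.
picks = pd.read_sql(
    "SELECT sku, pick_date, quantity, order_id FROM pick_history",
    engine,
)

# Supplement with another source, e.g. a product master exported as a CSV
# (assumed to contain sku, product_category, and product_cost columns).
products = pd.read_csv("product_master.csv")

# Combine the sources into a single dataset keyed on SKU.
data = picks.merge(products, on="sku", how="left")
print(data.shape)
```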
- Data Preprocessing
Before the data can be ingested by an ML model, it needs to be preprocessed. Data preprocessing includes steps like data cleaning, normalization and feature engineering, all of which are vital for effective learning. Data cleaning addresses errors, missing values, and outliers to prevent unreliable predictions. Normalization scales data numerically across features, preventing any one feature from disproportionately influencing the model. Feature engineering enhances the model’s ability to recognize patterns by creating new features or transforming the existing ones.
To relate this to a warehouse process, let’s say we need to forecast the future demand (velocity) of certain SKUs being picked in a warehouse for the upcoming month, using an ML model. The pick data for a warehouse might consist of features like a SKU’s past average velocity per day, its past average velocity per week, the product category, product cost, the number of orders the SKU was part of, the average SKU quantity per order, and so on. Preprocessing steps like normalization would ensure that features such as average SKU quantity per order and velocity per day are brought to a similar scale using a suitable scaling technique, since they can differ vastly in magnitude. Data cleaning would ensure that missing data is filled in using a suitable, feature-dependent strategy. Further, feature engineering could be used to derive additional features. For example, for a particular product we could generate an additional feature, say the average velocity from the same time period last year, which helps capture seasonality to a certain extent.
The model being built will only be as good as your data, and bad data can throw off model predictions. If the data source is customer data, it’s useful to loop the customer into this process: understanding the customer’s use case and the associated processes is essential so that the prepared data matches the customer’s real-life processes. For example, the best strategy for filling in missing data is often best informed by an understanding of the actual warehouse processes at that particular site.
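A minimal sketch of these preprocessing steps, assuming a pandas DataFrame called sku_month with one row per SKU per month and hypothetical column names, might look like this:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed columns in sku_month: sku, month, product_category, product_cost,
# avg_velocity_per_day, avg_velocity_per_week, avg_qty_per_order, num_orders.
num_cols = ["avg_velocity_per_day", "avg_velocity_per_week",
            "avg_qty_per_order", "num_orders", "product_cost"]

# Data cleaning: fill missing numeric values with a per-category median
# (the right strategy is feature- and customer-dependent).
sku_month[num_cols] = (
    sku_month.groupby("product_category")[num_cols]
             .transform(lambda s: s.fillna(s.median()))
)

# Feature engineering: average velocity from the same month last year,
# to capture seasonality (assumes at least 12 months of history per SKU).
sku_month["velocity_same_month_last_year"] = (
    sku_month.sort_values("month")
             .groupby("sku")["avg_velocity_per_day"]
             .shift(12)
)

# Normalization: bring the numeric features to a comparable scale.
scaler = StandardScaler()
sku_month[num_cols] = scaler.fit_transform(sku_month[num_cols])
```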
- Model Selection
Choosing the right ML model depends on the type of problem we’re trying to solve, its complexity, and the amount and nature of the data. Typical ML models can be classified into two major categories: classification and regression. Classification models are used to predict whether a particular data point belongs to a particular category or class. For example, in the warehouse context, we might have to predict whether a particular order is likely to experience delays, based on order data. Since the prediction to be made is a simple ‘yes’ or ‘no,’ the model is called a classification model. On the other hand, let’s say we have to predict the product velocity of all products in a particular warehouse for a particular month. In this case, the prediction would be a continuous value (the output is not restricted to a set of discrete values previously seen). The model used to make this prediction is called a regression model. Within classification and regression, there are multiple techniques and models that could be used. Typically, one picks a variety of candidate models and compares their performance to determine which works best.
Simpler models, for example decision trees, are often favored for their ease of interpretation, which is beneficial in operational settings where transparency is essential and helps warehouse managers make decisions based on clear, explainable outcomes. Complex models like deep neural networks, though more resource-intensive and less interpretable, can offer greater predictive power and handle larger, more intricate datasets. These models could, for example, enhance dynamic routing algorithms in real time, optimizing picking paths or delivery schedules. The quality of the data available can also influence model selection; for example, complex models like neural networks can be more resilient to noisy data and outliers.
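To make the comparison concrete, here is a small sketch that evaluates a simple, interpretable model against a more complex one on the order-delay classification problem described above. X_orders and y_delayed are assumed to be preprocessed order features and yes/no delay labels, and the candidate models and parameters are purely illustrative.

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# X_orders, y_delayed: hypothetical preprocessed order features and labels.
# The problem type drives the estimator family: classifiers here, whereas
# the velocity forecast above would use regressors instead.
candidates = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "neural_network": MLPClassifier(hidden_layer_sizes=(32, 16),
                                    max_iter=1000, random_state=0),
}

# Compare a simple, interpretable model against a more complex one
# on the same data using cross-validation.
for name, model in candidates.items():
    scores = cross_val_score(model, X_orders, y_delayed, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.2f}")
```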
- Model Training
Model training is the process where an ML model learns patterns from the ingested, preprocessed data by adjusting its internal parameters to improve prediction accuracy. The training is performed on some form of compute, depending on the model: larger, complex models like neural networks require GPUs to train, while simpler models can run on a CPU, and typically training is performed on a cloud resource. Before training, the data is typically split into training and testing sets; the model learns from the training data, while performance is evaluated on the test set. It’s also important during the training process that the model does not overfit, which means it does well in making predictions on seen data but can’t generalize to unseen data. Techniques such as regularization, early stopping, and limiting model complexity are used to help prevent this during training.
ML models are often trained once and then deployed, while sometimes models are trained in real time. For example, real-time data collection on pickers’ movements within the warehouse, combined with dynamic slotting models, can suggest optimal paths and product placement to minimize walking distance. By continuously learning from this movement data, ML algorithms can adjust in real time to changing conditions, like increased demand for certain products or the addition of new SKUs, enhancing travel efficiency.
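A minimal sketch of the train/test split and an overfitting check, assuming X and y are the preprocessed features and next-month velocity targets from the earlier example (the model and its parameters are just one illustrative choice):

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Split the preprocessed data so performance is judged on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Limiting tree depth and requiring a minimum number of samples per leaf
# are simple ways to discourage overfitting in tree-based models.
model = RandomForestRegressor(
    n_estimators=200, max_depth=10, min_samples_leaf=5, random_state=42
)
model.fit(X_train, y_train)

# Error on training vs. test data; a large gap suggests overfitting.
print("train MAE:", mean_absolute_error(y_train, model.predict(X_train)))
print("test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```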
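One way to realize this kind of continuous learning is incremental (online) training, where the model is updated batch by batch instead of being retrained from scratch. The sketch below uses scikit-learn’s SGDRegressor as one example of a model that supports this; the data stream is hypothetical.

```python
from sklearn.linear_model import SGDRegressor

# A simple model that supports incremental updates can be refreshed as
# new pick data streams in, rather than retraining from scratch.
model = SGDRegressor()

def update_model(new_X, new_y):
    """Incrementally update the model with a fresh batch of pick data."""
    model.partial_fit(new_X, new_y)

# As each batch of picker-movement or pick records arrives (hypothetical
# stream), fold it into the model so predictions track current conditions:
# for new_X, new_y in pick_data_stream():
#     update_model(new_X, new_y)
```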
- Model Testing
After training, a model must be tested on unseen data to ensure it can generalize well. This process, known as model testing, checks whether the model has learned meaningful patterns for accurate real-world predictions. Evaluation metrics reveal how well the model generalizes and whether further refinement, such as adjusting preprocessing, selecting a different model, or tuning model parameters, is needed to suit the warehouse’s operational needs.
By aligning the right machine learning model with specific warehouse challenges, whether it’s improving order accuracy, reducing picking times, or forecasting stockouts, you can further help ensure success. Aligning the right model with the specific challenge we’re trying to solve includes the above-described steps of model selection, training, and testing. Other factors that play an important role include the type of compute available (GPUs, CPUs), the inference speed needed in real time, and the complexity of the problem we’re trying to solve. For example, for an application like forecasting future product velocities, a more complex model that takes longer to train can be used, since we may not need the results in real time. On the other hand, for an application like predicting order completion times in real time, one that is constantly using new data to train, we would need a simpler, pretrained model (or a model that can be trained very quickly in real time) that can make quick predictions to meet the real-time demand.
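As a brief illustration of the testing step, the sketch below evaluates the order-delay classifier from the earlier example on held-out orders. Here clf is assumed to be a classifier already fitted on training data, and X_test_orders and y_test_orders are hypothetical held-out order features and labels.

```python
from sklearn.metrics import classification_report, confusion_matrix

# clf: a delay-prediction classifier already fitted on training data.
# X_test_orders, y_test_orders: held-out orders the model has never seen.
y_pred = clf.predict(X_test_orders)

# The confusion matrix and per-class precision/recall show whether the
# model generalizes, or whether preprocessing, model choice, or parameters
# need revisiting.
print(confusion_matrix(y_test_orders, y_pred))
print(classification_report(y_test_orders, y_pred))
```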
AI has the potential to revolutionize the warehouse and distribution space by enabling more intelligent, data-driven decision-making. From optimizing picking routes to forecasting demand, these models can help warehouses run more efficiently and with greater accuracy. A good example of AI models used widely across industries is large language models (LLMs), the core of software like ChatGPT, which learn from large amounts of data and are able to perform complex tasks.
Leveraging these types of models to tackle complex problems in warehouses is the way forward, with data at the center of it all. The moral of the story: AI has the ability to completely and rapidly transform distribution centers, all made possible by leveraging the available data to its maximum extent. Your data has a story to tell; that story can help shape the future of your warehouse.
Elvin Johnson is a Data Scientist at Lucas Systems, where he applies advanced machine learning and AI to transform warehouse and supply chain operations. He earned his Master’s degree in Electrical and Computer Engineering from Carnegie Mellon University, specializing in Machine Learning and Data Science.
Elvin has completed multiple engagements across industry and academia, contributing to a wide range of data science and deep learning projects. His research background includes co-authoring three machine learning-based publications, as well as hands-on experience with a wide range of ML and data science applications across diverse domains.
Passionate about leveraging technology for the greater good, Elvin continues to explore innovative ways to apply data-driven solutions to complex real-world challenges.