Human Resources Predictive Analytics
AI, Big Data & Machine Learning
A multinational company with more than 10,000 white-collar employees had for several years been experiencing a turnover rate roughly twice that of its competitors (>10% vs. ~5%).
The company’s need was twofold:
- Understanding who was more likely to leave
- Understanding why those people were leaving
The company’s HR department was tasked with providing these answers to management and identifying feasible retention strategies, but with only marginal success. Moreover, the outcome of each strategy could be evaluated only at the end of the year, making the trial-and-error cycle too costly in time (and, consequently, money).
We proposed applying Machine Learning to the data the company already had in order to create a model whose predictions would be both accurate and interpretable.
Given the complexity of the problem and the noisiness of the data, we realized that a single estimator, no matter how powerful, would fail to deliver state-of-the-art results. So we decided to address the challenge with a more sophisticated approach: Ensemble Learning.
ENSEMBLE LEARNING involves combining multiple estimators into a “meta-estimator” with stronger predictive power. Well-known examples of meta-estimators are random forests and gradient boosting algorithms, as well as artificial neural networks. However, Ensemble Learning is itself an umbrella term covering a range of techniques, among which we chose to implement a powerful yet relatively unexplored one: Stacked Generalization.
STACKED GENERALIZATION stands out from the other ensemble learning techniques because it is the only one that makes use of heterogeneous estimators. Whereas random forests and gradient boosting algorithms combine decision trees, and neural networks combine neurons, a stacked generalization algorithm can combine tens of different algorithms, each of which can, in turn, be a meta-estimator itself.
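To make the idea concrete, here is a minimal sketch of stacked generalization using scikit-learn’s `StackingClassifier` on synthetic data. The estimators, hyperparameters, and data below are illustrative only — the production model described in this case study combined many more (and carefully tuned) components.

```python
# Minimal sketch of stacked generalization with heterogeneous base
# estimators, on synthetic data (illustrative; not the production model).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous level-0 estimators: a random forest (itself a
# meta-estimator) sits alongside a linear model and a k-NN classifier.
base_estimators = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
]

# A level-1 "meta-learner" is trained on out-of-fold predictions
# of the base estimators (cv=5 handles that internally).
stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
print(round(stack.score(X_test, y_test), 2))
```

The key property is the second level: instead of averaging or voting, a separate model learns how much to trust each base estimator’s predictions.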
This leads to a huge level of complexity. In fact, building a successful stacked ensemble model requires us to carefully choose each of the following:
- the individual estimators;
- the hyperparameters of each estimator; and
- the architecture used to combine them.
This corresponds to finding the best combination of an unknown number of factors, each selected from a potentially infinite number of options.
So how does one solve such a tremendously complex problem?
The solution is called AutoML.
AUTOML stands for Automated Machine Learning, and it corresponds to using an Artificial Intelligence to automatically select, tune, and arrange Machine Learning models the way a human would. The AutoML AI chooses a model, applies it to the data, and then evaluates the quality of its outputs. As more and more models are tested, the AutoML AI starts to understand what works with the data and what does not. This awareness makes it capable of increasingly precise inferences that lead to better and better models. On top of that, the process can be run in a distributed environment, such as a cloud architecture, in order to perform asynchronous model selection.
By combining Stacked Generalization with AutoML through a cloud-based solution that leveraged the power of distributed computation, we were able to build a stacked ensemble model that delivered interpretable as well as astonishingly good results. In fact, despite lacking any kind of exogenous information, our model correctly identified 90% of the people who left the company, with a precision of 50%; that is, of every two employees our model flagged as likely to leave, one actually left the company within the year.
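The arithmetic behind those two figures is worth spelling out. The counts below are made up purely for illustration; only the 90% recall and 50% precision come from the case study.

```python
# Illustrative arithmetic behind the reported metrics. Suppose 100
# employees actually left, and the model flagged 180 as likely leavers,
# 90 of them correctly. (Counts are hypothetical; only the resulting
# 90% / 50% figures are from the case study.)
actual_leavers = 100
flagged = 180
true_positives = 90

recall = true_positives / actual_leavers   # share of real leavers caught
precision = true_positives / flagged       # share of flags that were right
print(recall, precision)                   # 0.9 0.5
```

The trade-off is typical of churn prediction: casting a wider net catches more of the real leavers (higher recall) at the cost of more false alarms (lower precision).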
Even though the COVID-19 pandemic negatively impacted the quality of our model’s predictions (the detection rate dropped to 66% and the precision to 33%), the company’s Vice Head of HR congratulated us on achieving better results than his entire division.
For more information about our solution, contact us at email@example.com.
Discover more about Ennova Research