On the Importance of Meta-models in Data Science

Deep Learning

As we have said, deep learning is built upon neural networks, which are ensembles of artificial neurons. So the first question that arises is: what is an artificial neuron? Let us find out.

Artificial neurons

In short, an artificial neuron is a mathematical function that receives a series of values as inputs, multiplies them by a set of weights, and passes their weighted sum to another function, known as the activation function, whose output is the output of the neuron itself.

This structure mimics that of a biological neuron, whose dendrites collect the incoming signals and pass them to the:

  1. soma, the body of the cell, which sums up all the positive and negative signals coming from the dendrites (and corresponds to the weighted sum that we mentioned before); and
  2. axon, the filament which propagates the electrical signal if its value is higher than a certain threshold (which is mimicked by the activation function).
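The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article; the weights, bias, and step activation below are arbitrary choices:

```python
def step(x):
    # Threshold activation: fires (1.0) only above zero,
    # mimicking the axon's firing threshold.
    return 1.0 if x > 0 else 0.0

def neuron(inputs, weights, bias, activation):
    # The weighted sum of the inputs (the "soma") is passed to the
    # activation function (the "axon"), whose output is the neuron's output.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

# A neuron with two inputs and arbitrary weights:
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1, step))  # -> 1.0 (weighted sum = 0.1 > 0)
```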

The activation function

The ability of neural networks to model every kind of information lies in their non-linear activation functions. There are many kinds of activation functions, each with its own rationale. To keep this explanation short and easy, we will focus on one of the most common ones, the REctified Linear Unit (ReLU), which returns its input unchanged when positive and zero otherwise (the dedicated page of the excellent blog Machine Learning Mastery plots it and compares it with the alternatives).

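The piecewise behaviour of ReLU is simple enough to write down directly (a one-liner sketch, not from the article):

```python
def relu(x):
    # ReLU returns the input unchanged when positive, zero otherwise:
    # this kink at zero is what makes the function non-linear.
    return max(0.0, x)

print([relu(x) for x in [-2.0, -0.5, 0.0, 1.5, 3.0]])
# -> [0.0, 0.0, 0.0, 1.5, 3.0]
```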

Neural Networks and Big Data

Earlier we introduced two questions. Having discussed the source of the neural networks’ capability of modeling complex non-linear phenomena, let us now explain when they should be preferred over traditional machine learning models. A plot in the publication Wireless Networks Design in the Era of Deep Learning: Model-Based, AI-Based, or Both? links the performance of an estimator with the size of the dataset used for training it:

In short, with little data traditional machine learning models perform better, but as the dataset grows deep learning models overtake them and keep improving where traditional ones plateau. Of course, the exact crossover point depends on several factors, among which:

  • the number of features and data points involved;
  • the noise that affects the data; and
  • both the classical and deep learning models involved in the study.

Artificial Neural Networks

So far we have introduced the concept of artificial neurons, explained why they can model complex, non-linear phenomena, and seen when they are the best choice for solving data science tasks. Now we will explain how to assemble such neurons into networks.
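The assembly can be sketched in plain Python: neurons are grouped into layers, and the output of each layer becomes the input of the next one. The weights below are illustrative, not a trained model:

```python
def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    # Each row of `weights` holds the incoming weights of one neuron;
    # the layer's output is the list of all its neurons' outputs.
    return [relu(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def network(inputs, layers):
    # Feed-forward pass: each layer's output is the next layer's input.
    for weights, biases in layers:
        inputs = layer(inputs, weights, biases)
    return inputs

# A 2-input network with a hidden layer of 2 neurons and 1 output neuron:
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
output = ([[1.0, 1.0]], [0.0])
print(network([3.0, 1.0], [hidden, output]))  # -> [4.0]
```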

Ensemble Learning

Have you ever heard of Random Forests or Gradient Boosting? If so, after reading this section you will have a much clearer idea of them. But first we need to take a step back and go back to where it all began: Decision Trees.

Decision Trees

In the field of traditional Machine Learning, decision trees are the most common building blocks for ensemble models. Decision trees are estimators that perform their classification or regression task by, literally, building a tree of subsequent decisions (for a comprehensive tutorial, please see this video). Let us consider an example from the Wikipedia page of Decision Tree Learning, regarding the survival of the passengers of the Titanic (the Titanic dataset can be found on Kaggle, at this link):

The tree is built on features such as:

  1. Age; and
  2. SibSp, the number of siblings/spouses aboard.

Read from top to bottom, it encodes the following rules:

  1. If the passenger’s age is < 9.5 years, he died. Else,
  2. if the passenger’s number of siblings/spouses is < 3, he survived; else, he died.

Let us comment on what the tree learned:

  1. The very first split, on the passenger’s sex, is reasonable. After all, women and children are the first to be evacuated in life-or-death situations.
  2. “If the passenger’s age is < 9.5 years, he died.”
  3. Wait, what? Are children expected to die more often than adults? This is strange… and yet, if the decision tree determined it, it means that this is what the data say.
  4. “If the passenger’s number of siblings/spouses is < 3, he survived. Else, he died.”
  5. This is interesting too… people with fewer siblings appear to have a higher probability of survival than those with many.
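The rules above can be written directly as code. This is just the decision path transcribed by hand, not a trained model:

```python
def titanic_tree(age, sibsp):
    # First rule: very young passengers are predicted to have died.
    if age < 9.5:
        return "died"
    # Second rule: fewer than 3 siblings/spouses aboard -> survived.
    return "survived" if sibsp < 3 else "died"

print(titanic_tree(age=30, sibsp=1))  # -> survived
print(titanic_tree(age=5, sibsp=0))   # -> died
```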

Bootstrap Aggregation (Bagging)

Imagine that you are the sales manager of a company and you need to make an economic offer to a customer for developing a solution that they require. The team responsible for the implementation consists of three persons:

  1. Adam, junior developer;
  2. Bob, developer; and
  3. Christine, senior developer.

Asked independently, each developer returns a different estimate of the effort required, for example:

  1. Bob, 8 days; and
  2. Christine, 10 days.

Why do the estimates differ? According to their own experience and perspective, every developer considers a different set of features when creating their own model of the problem. For example, Adam, being a junior developer, may be prone to neglecting or underestimating some important tasks because he did not realize that they were needed in the first place, whereas Christine, as a senior developer, may be able to correctly identify all the significant factors and frame them in the big picture. Averaging the estimates smooths out these individual quirks: this is exactly what bagging does, training each base estimator on a different bootstrap sample of the data and aggregating (averaging) their predictions to reduce variance.
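In code, bootstrap aggregation looks roughly like this. It is a sketch under toy assumptions: the "dataset" is a made-up list of past project durations, and the base estimator is a plain mean:

```python
import random
import statistics

def bagged_estimate(history, n_estimators=3, seed=42):
    # Each base estimator (each "developer") sees a different bootstrap
    # sample of the past projects, so each produces a different estimate...
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_estimators):
        sample = rng.choices(history, k=len(history))  # resample with replacement
        estimates.append(statistics.mean(sample))      # base estimator: plain mean
    # ...and aggregating (averaging) the estimates reduces their variance.
    return statistics.mean(estimates)

durations = [8, 10, 9, 12, 7, 11]  # past project durations in days (made up)
print(round(bagged_estimate(durations), 1))
```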


Boosting

Whereas bagging corrects variance, boosting has been developed to reduce bias. So what is bias? Simply put, it is the average difference between the predicted and true values. For example, if you predict 8 days while the final amount is 9, and the next time you predict 15 while the true value is 16, then you have a bias of −1 day. Differently from variance, bias is not imputable to the data, but to the model itself: boosting fights it by training estimators sequentially, each one focusing on the errors left by the previous ones.
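The arithmetic of the example can be checked directly (predicted minus true, averaged):

```python
import statistics

def bias(predicted, true):
    # Average signed difference between predictions and true values.
    return statistics.mean(p - t for p, t in zip(predicted, true))

print(bias([8, 15], [9, 16]))  # -> -1
```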

Stacked Generalization

What bagging and boosting have in common is that they are typically built upon decision trees, so what if you wanted to mix different kinds of estimators? The answer to this question dates back to the early ’90s, to a paper called Stacked Generalization (Wolpert, 1992), which set the foundation for the most powerful (and yet least explored) paradigm of ensemble learning: stacked generalization.

Training a stacked architecture is more involved than training a single model; among other things, it requires:

  • selecting the best hyperparameter combination for each estimator of the layer; and
  • in the case of a multi-layer architecture, also selecting which data to feed into each layer.
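A stacked architecture can be sketched as follows. The base estimators and the meta-model below are hypothetical stand-ins of our own (a linear-like and a tree-like predictor, combined by a fixed weighted average), not the configuration used in the paper; in practice the meta-model's weights are learned on held-out data:

```python
def stack_predict(x, base_models, meta_model):
    # Layer 0: each base estimator produces its own prediction of x;
    # these predictions become the *features* of the meta-model.
    meta_features = [model(x) for model in base_models]
    # Layer 1: the meta-model combines the base predictions.
    return meta_model(meta_features)

# Hypothetical, pre-"trained" base estimators of different kinds:
linear_like = lambda x: 2.0 * x + 1.0
tree_like = lambda x: 10.0 if x > 3 else 5.0

# A trivial meta-model: fixed weighted average of the base predictions.
meta = lambda preds: 0.7 * preds[0] + 0.3 * preds[1]

print(stack_predict(4.0, [linear_like, tree_like], meta))
```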


In this article, we have given an overview of the different kinds of meta-estimators and we have detailed why they are so important for achieving top-notch results in data science. Every kind of meta-model is a tool and, just like any other tool, it offers different advantages and disadvantages. Choose wisely when working with your data and do not be afraid of trying new approaches: experimenting is the only way to make progress. Happy ensembling!



Ennova Research

We see beyond today’s needs focusing on R&D to bring innovative, human-driven and tech-driven solutions to the market. https://ennova-research.com