Stacking is a form of meta-learning. The main idea is that we use base learners to generate metadata for the problem's dataset, and then utilize another learner, called a meta-learner, to process that metadata. The base learners are considered level 0 learners, while the meta-learner is considered a level 1 learner. In other words, the meta-learner is stacked on top of the base learners, hence the name stacking.
A more intuitive way to describe the ensemble is to present an analogy with voting. In voting, we combine a number of base learners' predictions in order to increase their performance. In stacking, instead of explicitly defining the combination rule, we train a model that learns how to best combine the base learners' predictions. The meta-learner's input dataset consists of the base learners' predictions (metadata), as shown in the following figure:
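The two-level structure described above can be sketched in code. The following is a minimal example assuming scikit-learn is available; the dataset, the particular base learners (a k-NN and a decision tree), and the logistic-regression meta-learner are illustrative choices, not ones prescribed by the text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Level 0: base learners whose predictions form the metadata.
base_learners = [
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('tree', DecisionTreeClassifier(max_depth=4, random_state=0)),
]

# Level 1: the meta-learner, trained on the base learners' predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
)

stack.fit(X_train, y_train)
print(f"Stacking accuracy: {stack.score(X_test, y_test):.3f}")
```

Note that `StackingClassifier` generates the metadata with internal cross-validation, so the meta-learner is fitted on out-of-fold predictions rather than predictions the base learners made on their own training data.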