Module 4: Tree-Based Modeling


INTRODUCTION – Tree-Based Modeling

This module centers on supervised learning, a subfield of machine learning in which participants learn to test and validate the performance of several supervised models, including decision trees, random forests, and gradient boosting.

This deep dive will give students the tools to conceptualize, implement, and evaluate how well these models solve real problems. By the end of this module, students will be well grounded in supervised learning methodology, equipping them to make informed decisions and apply these powerful tools in practical scenarios.

Learning Objectives:

  • Identify the impact of model parameter tuning on performance and evaluation metrics
  • Explain boosting in machine learning, with emphasis on XGBoost models
  • Explain bagging in machine learning, specifically as it relates to random forest models
  • Describe decision tree models, how they work, and their advantages over other forms of supervised machine learning
  • Differentiate the various kinds of supervised learning models

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: ADDITIONAL SUPERVISED LEARNING TECHNIQUES

1. Tree-based learning is a type of unsupervised machine learning that performs classification and regression tasks.

  • True
  • False (CORRECT)

Correct: Unlike unsupervised machine learning, which relies on unlabeled data to find hidden structures or patterns that do not have predefined labels, tree-based learning is a form of supervised learning because it employs labeled datasets for training algorithms to classify or predict results.

2. Fill in the blank: Similar to a flow chart, a _____ is a classification model that represents various solutions available to solve a given problem based on the possible outcomes of each solution.

  • linear regression
  • decision tree (CORRECT)
  • Poisson distribution
  • binary logistic regression

Correct: A decision tree is a classification model that represents different solutions to a problem according to the possible outcomes of each solution. Decision trees let data professionals make forecasts, using current information to predict future events. Binary logistic regression, by contrast, models the probability of an event with two possible outcomes and is typically used when the outcome is binary (yes or no).
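As an illustration, here is a minimal sketch of fitting a decision tree classifier with scikit-learn; the features and labels below are invented for the example:

```python
# Minimal sketch: a decision tree learns splits from labeled data
# (the two features and their values are illustrative, not from the quiz).
from sklearn.tree import DecisionTreeClassifier

# Labeled training data: two features per row, binary target
X = [[25, 0], [32, 1], [47, 1], [51, 0], [62, 1], [23, 0]]
y = [0, 1, 1, 0, 1, 0]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)  # supervised: the tree is trained on labeled examples

print(clf.predict([[30, 1]]))  # classify a new observation
```

Because the second feature perfectly separates the two classes in this toy data, the fitted tree needs only a single split.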

3. In a decision tree, which node is the location where the first decision is made?

  • Leaf
  • Branch
  • Decision
  • Root (CORRECT)

Correct: The first decision is made at the root node, the highest node in the tree; all further decisions that contribute to the final conclusion stem from it. The root node splits the data on a particular feature, and branching continues until a leaf node is reached, where a final decision is made.

4. In tree-based learning, how is a split determined?

  • By which variables and cut-off values offer the most predictive power (CORRECT)
  • By the number of decisions required before arriving at a final prediction
  • By the amount of leaves present
  • By the level of balance present among the predictions made by the model

Correct: A split is determined by which variables and cut-off values offer the most predictive power. At each node, the algorithm evaluates candidate features and thresholds and chooses the split that best separates the classes, for example the one that most reduces impurity. The process repeats down the tree until a leaf is reached.
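One common way to score a candidate split is Gini impurity; a sketch with hypothetical helper names (`gini`, `split_impurity`) shows how a pure split scores best:

```python
# Illustrative helpers: score a candidate split by weighted Gini impurity.
# Lower impurity means more predictive power; 0 is a perfectly pure split.

def gini(labels):
    """Gini impurity of a set of binary class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n            # proportion of class 1
    return 1.0 - p1**2 - (1 - p1)**2

def split_impurity(left, right):
    """Weighted impurity of the two child nodes of a candidate split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A pure split scores 0; a mixed split scores higher
print(split_impurity([1, 1, 1], [0, 0, 0]))   # 0.0
print(split_impurity([1, 0, 1], [0, 1, 0]))   # about 0.444
```

A tree-building algorithm compares this score across candidate features and cut-offs and keeps the split with the lowest impurity.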

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: TUNE TREE-BASED MODELS

1. Fill in the blank: The hyperparameter max depth is used to limit the depth of a decision tree, which is the number of levels between the _____ and the farthest node away from it.

  • leaf node
  • root node (CORRECT)
  • first split
  • decision node

Correct: Max depth limits the number of levels between the root node and the farthest leaf node. Hyperparameters are settings specified before a model is trained; they are not learned from the data, but they can be adjusted to improve performance. Tuning a hyperparameter like max_depth directly affects how well the model fits the dataset: too much depth increases tree complexity and weakens generalization to new, unseen data.
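A short sketch of the effect, using scikit-learn's `max_depth` on synthetic data (the depth limit of 3 is illustrative):

```python
# Sketch: limiting tree depth with the max_depth hyperparameter.
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X, y)          # unrestricted
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# get_depth() counts levels from the root to the farthest leaf
print(deep.get_depth(), shallow.get_depth())
```

The shallow tree stops at three levels no matter how much further the unrestricted tree would grow.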

2. What tuning technique can a data professional use to confirm that a model achieves its intended purpose?

  • Min samples leaf
  • Classifier
  • Grid search (CORRECT)
  • Decision tree

Correct: Grid search is the tool used to confirm that a model fulfills its purpose; it systematically checks every combination of hyperparameters in a specified grid. It identifies the set that maximizes a chosen evaluation metric, so the exhaustive search over the parameter grid optimizes the model for the best possible performance.
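In scikit-learn this is `GridSearchCV`; a sketch with an illustrative parameter grid:

```python
# Sketch: exhaustive hyperparameter search with GridSearchCV.
# The grid values below are examples, not recommendations.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)

param_grid = {"max_depth": [2, 4, 6], "min_samples_split": [2, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, scoring="f1", cv=5)
search.fit(X, y)  # tries every combination, scored by cross-validation

print(search.best_params_)  # the combination with the best mean score
```

Every cell of the grid (here 3 × 2 = 6 combinations) is evaluated with cross-validation before the winner is chosen.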

3. During model validation, the validation dataset must be combined with test data in order to function properly.

  • True
  • False (CORRECT)

Correct: The validation dataset must be kept separate from the test data, not combined with it. It is a sample of data withheld during training and used to evaluate the model’s performance at intermediate stages. It differs from the test set, which is used only after the final model is trained and tuned. The validation dataset supports hyperparameter tuning and helps prevent overfitting, so the model generalizes well to new, unknown data.

4. Fill in the blank: Cross validation involves splitting training data into different combinations of _____, on which the model is trained.

  • Parcels
  • banks
  • tiers
  • folds (CORRECT)

Correct: In cross-validation, the training data is split into several parts, called “folds.” The model is trained on different combinations of these folds and, in each iteration, tested on the fold that was held out. Every fold takes a turn as the held-out set, so each data point is used for both training and testing. Cross-validation gives a robust assessment of the model’s performance, reducing overfitting and providing a better estimate of its behavior on new, previously unseen data.
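A sketch of 5-fold cross-validation with scikit-learn, on synthetic data:

```python
# Sketch: 5-fold cross-validation; each fold takes a turn as the
# held-out evaluation set while the other four are used for training.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # averaged estimate of generalization performance
```

Averaging over folds is what makes the estimate more robust than a single train/test split.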

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: BAGGING

1. Ensemble learning is most effective when the outputs are aggregated from models that follow the exact same methodology all using the same dataset.

  • True
  • False (CORRECT)

Correct: Ensemble learning is most effective when the contributing models follow different methodologies, such as logistic regression, naive Bayes, and decision tree classifiers. This diversity makes it unlikely that the models will all be wrong in the same way, since each has its own strengths and weaknesses. By combining the predictions of different models, an ensemble method decreases bias and variance, improving generalization and robustness.

2. What are some of the benefits of ensemble learning? Select all that apply.

  • It requires few base learners trained on the same dataset.
  • The predictions have less bias than other standalone models. (CORRECT)
  • It combines the results of many models to help make more reliable predictions. (CORRECT)
  • The predictions have lower variance than other standalone models. (CORRECT)

Correct: It is important to note that ensemble learning merges the predictions of many models to generate more reliable predictions. This approach produces less bias and variance when compared with the results from individual models. However, ensemble learning requires several base learners, each of which is trained on a random subset of the training data, to perform effectively.

3. In a random forest, what type of data is used to train the ensemble of decision-tree base learners?

  • Sampled
  • Unstructured
  • Bootstrapped (CORRECT)
  • Duplicated

Correct: A random forest trains each decision-tree base learner on bootstrapped data: a sample of the training set drawn at random with replacement, the same size as the original. Because each tree sees a slightly different sample, their errors are less correlated, which reduces variance when their predictions are aggregated.
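Drawing a bootstrapped sample is simply sampling with replacement; a sketch with numpy (the ten "rows" are a stand-in for a real training set):

```python
# Sketch: a bootstrapped sample, as drawn for each random forest base learner.
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)                            # stand-in for training rows
boot = rng.choice(data, size=len(data), replace=True)

# Same size as the original; repeats are expected, and some rows
# are typically left out (these become "out-of-bag" observations)
print(boot)
```

Each tree in the forest gets its own independently drawn bootstrap sample.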

4. Fill in the blank: When using a decision tree model, a data professional can use _____ to control the threshold below which nodes become leaves.

  • min_samples_split (CORRECT)
  • max_features
  • min_samples_leaf
  • max_depth

Correct: The minimum number of samples required to split an internal node can be specified with the min_samples_split hyperparameter when working with a decision tree model. If the number of samples in a node falls below the stated threshold, the node becomes a leaf and is not split further. This keeps the tree from growing overly complex and overfitting the training data.
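A sketch of the effect in scikit-learn (the threshold of 20 is illustrative):

```python
# Sketch: min_samples_split turns small nodes into leaves.
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)

unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)
restricted = DecisionTreeClassifier(min_samples_split=20,
                                    random_state=0).fit(X, y)

# Nodes with fewer than 20 samples are not split, so the tree shrinks
print(unrestricted.tree_.node_count, restricted.tree_.node_count)
```

Raising the threshold prunes away the small, noise-prone splits that drive overfitting.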

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: BOOSTING

1. When using the hyperparameter min_child_weight, a tree will not split a node if it results in any child node with less weight than what is specified. What happens to the node instead?

  • It gets deleted.
  • It becomes a root.
  • It becomes a leaf (CORRECT)
  • It duplicates itself to become another node.

Correct: The min_child_weight hyperparameter acts as a condition on splitting: if a split would produce a child node whose total instance weight falls below the threshold, the node is not split and becomes a leaf instead. This helps prevent overfitting by allowing splits only when each child node carries a significant enough instance count or weight, producing stronger, more generalizable models.
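The rule can be sketched in plain Python; note this helper (`becomes_leaf`) is illustrative, not XGBoost's actual implementation:

```python
# Illustrative helper (not XGBoost's real code): a node stays a leaf
# when a proposed split would leave either child with too little weight.

def becomes_leaf(left_weights, right_weights, min_child_weight):
    """Return True if the split is rejected and the node becomes a leaf."""
    return (sum(left_weights) < min_child_weight
            or sum(right_weights) < min_child_weight)

# Right child would carry weight 2 < 3, so the split is rejected
print(becomes_leaf([1, 1, 1], [1, 1], min_child_weight=3))      # True
# Both children meet the threshold, so the split is allowed
print(becomes_leaf([1, 1, 1], [1, 1, 1], min_child_weight=3))   # False
```

In XGBoost itself, the same check is applied to the sum of instance weights (hessians) in each candidate child during tree construction.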

2. Fill in the blank: The supervised learning technique boosting builds an ensemble of weak learners _____, then aggregates their predictions.

  • Repeatedly
  • in parallel
  • sequentially (CORRECT)
  • randomly

Correct: The supervised learning technique boosting builds an ensemble of weak learners sequentially, then aggregates their predictions.

3. When using a gradient boosting machine (GBM) modeling technique, which term describes a model’s ability to predict new values that fall outside of the range of values in the training data?

  • Grid search
  • Learning rate
  • Extrapolation (CORRECT)
  • Cross validation

Correct: In the context of a gradient boosting machine (GBM), extrapolation is the prediction of values that lie outside the numeric range seen in the training data. Interpolation, by contrast, predicts within the range of the training data. Tree-based models generally handle extrapolation poorly because they are not designed to generalize into untrained regions of the feature space.
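A sketch of the failure mode, using scikit-learn's GBM on an illustrative linear trend:

```python
# Sketch: tree-based models cannot extrapolate beyond the training range.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.arange(0, 100).reshape(-1, 1)
y = 2.0 * X.ravel()                       # the target keeps growing with x

gbm = GradientBoostingRegressor(random_state=0).fit(X, y)

# Inside the training range the fit is close; far outside it, the
# prediction plateaus near the largest value seen during training.
print(gbm.predict([[50]]))    # near 100, as expected
print(gbm.predict([[1000]]))  # nowhere near 2000
```

Every leaf stores a constant, so no combination of trees can produce a value beyond what the training targets support.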

QUIZ: MODULE 4 CHALLENGE

1. A junior data analyst uses tree-based learning for a sales and marketing project. Currently, they are interested in the section of the tree that represents where the first decision is made. What are they examining?

  • Branches
  • Leaves
  • Roots (CORRECT)
  • Splits

2. What are some disadvantages of decision trees? Select all that apply.

  • Preparing data to train a decision tree is a complex process involving significant preprocessing
  • Decision trees require assumptions regarding the distribution of underlying data.
  • Decision trees can be particularly susceptible to overfitting. (CORRECT)
  • When new data is introduced, decision trees can be less effective at prediction. (CORRECT)

3. Which section of a decision tree is where the final prediction is made?

  • Decision node
  • Split
  • Leaf node (CORRECT)
  • Root node

4. In a decision tree ensemble model, which hyperparameter controls how many decision trees the model will build for its ensemble?

  • max_features
  • max_depth
  • n_trees
  • n_estimators (CORRECT)

5. What process uses different “folds” (portions) of the data to train and evaluate a model across several iterations?

  • Grid search
  • Model validation
  • Cross validation (CORRECT)
  • Proportional verification

6. Which of the following statements correctly describe ensemble learning? Select all that apply.

  • When building an ensemble using different types of models, each should be trained on completely different data.
  • Predictions using an ensemble of models can be accurate even when the individual models are barely more accurate than a random guess. (CORRECT)
  • Ensemble learning involves aggregating the outputs of multiple models to make a final prediction. (CORRECT)
  • If a base learner’s prediction is only slightly better than a random guess, it is called a “weak learner.” (CORRECT)

7. Fill in the blank: A random forest is an ensemble of decision-tree _____ that are trained on bootstrapped data.

  • Statements
  • Observations
  • base learners (CORRECT)
  • variables

8. What are some benefits of boosting? Select all that apply.

  • Boosting is the most interpretable model methodology.
  • Boosting is a powerful predictive methodology. (CORRECT)
  • Boosting can handle both numeric and categorical features. (CORRECT)
  • Boosting does not require the data to be scaled. (CORRECT)

9. Which of the following statements correctly describe gradient boosting? Select all that apply.

  • Gradient boosting machines cannot perform classification tasks.
  • Gradient boosting machines have many hyperparameters. (CORRECT)
  • Gradient boosting machines do not give coefficients or directionality for their individual features. (CORRECT)
  • Gradient boosting machines are often called black-box models because their predictions can be difficult to explain. (CORRECT)

10. A data professional uses tree-based learning for an operations project. Currently, they are interested in the nodes at which the trees split. What type of nodes do they examine?

  • Decision (CORRECT)
  • Branch
  • Leaf
  • Root

11. What are some benefits of decision trees? Select all that apply.

  • When working with decision trees, overfitting is unlikely.
  • When preparing data to train a decision tree, very little preprocessing is required. (CORRECT)
  • Decision trees enable data professionals to make predictions about future events based on currently available information. (CORRECT)
  • Decision trees require no assumptions regarding the distribution of underlying data. (CORRECT)

12. In a decision tree, what type(s) of nodes can decision nodes point to? Select all that apply.

  • Root nodes
  • Decision nodes (CORRECT)
  • Leaf nodes (CORRECT)
  • Edges

Decision nodes can point either to other decision nodes, where another feature is evaluated and the data splits again, or to leaf nodes, where a final prediction is made. They cannot point back to the root node, which is the one node in the tree with no predecessors.

13. In a decision tree model, which hyperparameter sets the threshold below which nodes become leaves?

  • Min child weight
  • Min samples tree
  • Min samples split (CORRECT)
  • Min samples leaf

14. When might you use a separate validation dataset? Select all that apply.

  • If you have very little data.
  • If you want to choose the specific samples used to validate the model. (CORRECT)
  • If you have a very large amount of data. (CORRECT)
  • If you want to compare different model scores to choose a champion before predicting on test holdout data. (CORRECT)
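With a large dataset, a separate validation set can be carved out with two splits; a sketch with illustrative 60/20/20 proportions:

```python
# Sketch: splitting data into train / validation / test sets.
# The 60/20/20 proportions are illustrative.
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, random_state=0)

X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4,
                                            random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                                random_state=0)

# Tune and compare candidate models on the validation set; touch the
# test holdout only once, with the chosen champion model.
print(len(X_tr), len(X_val), len(X_test))
```

Keeping the test set untouched until the end is what makes its score an honest estimate of performance on new data.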

15. What tool is used to confirm that a model achieves its intended purpose by systematically checking combinations of hyperparameters to identify which set produces the best results, based on the selected metric?

  • GridSearchCV (CORRECT)
  • Model validation
  • Cross validation
  • Hyperparameter verification

16. Which of the following statements correctly describe ensemble learning? Select all that apply.

  • If a base learner’s prediction is equally effective as a random guess, it is a strong learner.
  • It’s possible to use the same methodology for each contributing model, as long as there are numerous base learners. (CORRECT)
  • Ensemble learning involves building multiple models. (CORRECT)
  • It’s possible to use very different methodologies for each contributing model. (CORRECT)

17. Which of the following statements correctly describe gradient boosting? Select all that apply.

  • Gradient boosting machines build models in parallel.
  • Gradient boosting machines tell you the coefficients for each feature.
  • Gradient boosting machines work well with missing data. (CORRECT)
  • Gradient boosting machines do not require the data to be scaled. (CORRECT)

18. Which of the following statements accurately describe decision trees? Select all that apply.

  • Decision trees are equally effective at predicting both existing and new data.
  • Decision trees work by sorting data. (CORRECT)
  • Decision trees require no assumptions regarding the distribution of underlying data. (CORRECT)
  • Decision trees are susceptible to overfitting. (CORRECT)

19. What is the only section of a decision tree that contains no predecessors?

  • Leaf node
  • Root node (CORRECT)
  • Decision node
  • Split based on what will provide the most predictive power.

20. In a decision tree, nodes are where decisions are made, and they are connected by edges.

  • True (CORRECT)
  • False

Correct: In a decision tree, nodes are where decisions are made, and they are connected by edges. At each node, a single feature of the data is considered and decided on. Edges direct from one node to the next during this process. Eventually, all relevant features will have been resolved, resulting in the classification prediction.

21. Fill in the blank: Each base learner in a random forest model has different combinations of features available to it, which helps prevent correlated errors among _____ in the ensemble.

  • Nodes
  • roots
  • learners (CORRECT)
  • splits

22. What are some benefits of boosting? Select all that apply.

  • The models used in boosting can be trained in parallel across many different servers.
  • Boosting reduces bias. (CORRECT)
  • Because no single tree weighs too heavily in the ensemble, boosting reduces the problem of high variance. (CORRECT)
  • Boosting can improve model accuracy. (CORRECT)

23. Which of the following statements correctly describe gradient boosting? Select all that apply.

  • Gradient boosting models can be trained in parallel.
  • Each base learner in the sequence is built to predict the residual errors of the model that preceded it. (CORRECT)
  • Gradient boosting machines can be difficult to interpret. (CORRECT)
  • Gradient boosting machines have difficulty with extrapolation. (CORRECT)
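The residual-fitting idea behind gradient boosting can be sketched by hand, using shallow regression trees as the base learners (the data and settings are illustrative):

```python
# Sketch of the core gradient-boosting loop: each new tree fits the
# residual errors of the ensemble built so far (squared-error loss).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.1, size=200)

pred = np.zeros_like(y)
learning_rate = 0.1
for _ in range(50):
    residuals = y - pred                        # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    pred += learning_rate * tree.predict(X)     # sequential correction

print(np.mean((y - pred) ** 2))  # training error shrinks as trees are added
```

Because each tree is built to correct its predecessors, the trees must be trained sequentially, not in parallel.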

24. A data analytics team uses tree-based learning for a research and development project. Currently, they are interested in the parts of the decision tree that represent an item’s target value. What are they examining?

  • Roots
  • Branches
  • Leaves (CORRECT)
  • Splits

25. In a decision tree model, which hyperparameter specifies the number of attributes that each tree selects randomly from the training data to determine its splits?

  • Learning rate
  • Max features (CORRECT)
  • Number of estimators
  • Max depth

26. Adaboost is a tree-based boosting methodology in which each consecutive base learner assigns greater weight to the observations that were correctly predicted by the preceding learner.

  • True
  • False (CORRECT)

Correct: AdaBoost builds a cascade of weak learners sequentially, with each subsequent base learner giving greater weight to the observations that the preceding learner predicted incorrectly, not correctly, which is why the statement is false. The first tree is trained with equal weights on all observations. After its predictions are made, the weights of incorrectly predicted observations are raised and the weights of correctly predicted observations are lowered. Each new tree therefore focuses on the errors of its predecessors, and the process continues until the predictions stop improving or a set number of trees is reached.
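The weight update can be sketched with a simplified, illustrative helper (`update_weights` is not AdaBoost's exact formula, but shows the direction of the change):

```python
# Simplified sketch of AdaBoost's reweighting: after each round,
# misclassified observations gain weight and correct ones lose it.
import numpy as np

def update_weights(weights, correct, alpha):
    """Raise weights of misclassified points, lower the rest, renormalize."""
    weights = np.where(correct,
                       weights * np.exp(-alpha),   # predicted correctly
                       weights * np.exp(alpha))    # predicted incorrectly
    return weights / weights.sum()

w = np.full(4, 0.25)                               # equal weights to start
correct = np.array([True, True, False, True])      # round-1 results
w = update_weights(w, correct, alpha=0.5)

print(w)  # the misclassified third observation now carries the most weight
```

The next weak learner trains against these updated weights, so it concentrates on the examples its predecessor got wrong.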

27. Why might a GBM, or gradient-boosting machine, be inappropriate for use in the health care or financial fields?

  • Its predictions cannot be precisely explained. (CORRECT)
  • It doesn’t perform well with missing data.
  • It requires the data to be scaled.
  • It is inaccurate.

Correct: A gradient boosting machine (GBM) may be inappropriate for healthcare or finance because its predictions cannot be precisely explained. These models are often called “black boxes”: their predictions can be accurate, but they are not transparent. That is a critical drawback in fields like healthcare and finance, where decisions must be understood and justified.

CONCLUSION – Tree-Based Modeling

In short, this module has explored one of the major aspects of machine learning: supervised learning. Students have learned how to test and validate results from key models, including decision trees, random forests, and gradient boosting, preparing them to explore intricate challenges and make data-based decisions.

It also gives students the tools to carry out these supervised learning techniques, emphasizing their potential to solve complex problems and support informed, data-driven decisions. This extensive overview positions participants well to continue learning and applying supervised learning methodologies as part of their own skill set, improving their performance in the ever-evolving discipline of machine learning.
