INTRODUCTION – Exam Preparation Course 5
Learning Objectives:
- Demonstrate proficiency in the skills assessed in DP-100: Designing and Implementing a Data Science Solution on Azure
- Take a practice exam covering all content in the specialization
Quiz: Full Practice Exam
1. Your task is to predict whether a person suffers from a disease by setting up a binary classification model. Your solution needs to account for the types of classification outcome that can occur.
Considering the description below, which of the following outcome types does it describe?
"A person suffers from a disease. Your model classifies the case as having the disease."
- True negatives
- False positives
- False negatives
- True positives (CORRECT)
Correct: A true positive is an outcome where the model correctly predicts the positive class.
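For reference, the four outcome types can be counted with scikit-learn's confusion_matrix; a minimal sketch with made-up label and prediction arrays (1 = disease, 0 = no disease):
from sklearn.metrics import confusion_matrix
y_true = [0, 0, 1, 1, 0, 1]  # actual labels (hypothetical)
y_pred = [0, 1, 1, 0, 0, 1]  # model predictions (hypothetical)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # counts of true negatives, false positives, false negatives, true positives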
2. As a senior data scientist, you need to evaluate a binary classification machine learning model.
You have to use precision as the evaluation metric. Considering this, which is the most appropriate visualization?
- Violin plot
- Receiver Operating Characteristic (ROC) curve (CORRECT)
- Gradient descent
- Scatter plot
Correct: A receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate for a model across classification thresholds.
3. You have a dataset that can be used for multiclass classification tasks. The dataset provided contains a normalized numerical feature set with 20,000 data points and 300 features. For training purposes, you need 75 percent of the data points and for testing purposes you need 25 percent.
Name Description
X_train Training feature set
Y_train Training class labels
x_test Testing feature set
y_test Testing class labels
Your goal is to use Principal Component Analysis (PCA) to reduce the dimensionality of the feature set to 10 features for both the training and testing sets.
You decide to use the scikit-learn machine learning library in Python.
You mark the feature set with X and the class labels with Y.
Your Python code includes the code segment below:
from sklearn.decomposition import PCA
pca = […]
X_train = […].fit_transform(X_train)
x_test = pca.[…]
How would you complete the missing brackets for the code snippet presented?
- Box1: PCA(n_components=10);
- Box2: model;
- Box3: transform(x_test)
- Box1: PCA(n_components=10); (CORRECT)
- Box2: pca;
- Box3: transform(x_test)
- Box1: PCA(n_components=150);
- Box2: pca;
- Box3: x_test
- Box1: PCA(n_components=10000);
- Box2: pca;
- Box3: X_train
Correct: pca = PCA(n_components=10) creates the PCA transformer; it is fitted to the training features with fit_transform and then applied to the test features with pca.transform(x_test).
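For reference, a minimal runnable sketch of the fit_transform/transform pattern described above, using randomly generated data in place of the real feature sets:
import numpy as np
from sklearn.decomposition import PCA
X_train = np.random.rand(150, 30)  # hypothetical training features
x_test = np.random.rand(50, 30)    # hypothetical testing features
pca = PCA(n_components=10)            # create the PCA transformer
X_train = pca.fit_transform(X_train)  # fit on the training set and reduce it to 10 features
x_test = pca.transform(x_test)        # apply the same transformation to the test set
print(X_train.shape, x_test.shape)    # (150, 10) (50, 10)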
4. What is the result of multiplying a list by 3?
- The new list is 3 times the original length, with the sequence repeated 3 times. (CORRECT)
- The new list is 3 times the original length, with the sequence repeated 3 times and every element also multiplied by 3.
- The new list remains the same size, but the elements are multiplied by 3.
Correct: This is how a list behaves when multiplied.
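A quick interactive check of this behaviour:
nums = [1, 2, 3]
print(nums * 3)  # [1, 2, 3, 1, 2, 3, 1, 2, 3]: the sequence is repeated, the elements are not multiplied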
5. Python is known for its extensive functionality and its powerful statistical and numerical libraries. What are the utilities of TensorFlow?
- Analyzing and manipulating data
- Providing attractive data visualizations
- Supplying machine learning and deep learning capabilities (CORRECT)
- Offering simple and effective predictive data analysis
Correct: TensorFlow supplies machine learning and deep learning capabilities.
6. Choose from the list below the evaluation metric that is a relative metric in which a higher value indicates a better fit of the model.
- Root Mean Square Error (RMSE)
- Coefficient of Determination (known as R-squared or R2) (CORRECT)
- Mean Square Error (MSE)
Correct: This evaluation metric reflects the proportion of the variance between predicted and actual label values that the model is able to explain. Essentially, it measures how well the model accounts for the variation in the data.
7. Choose from the list below the evaluation metric that provides you with an absolute metric in the same unit as the label.
- Coefficient of Determination (known as R-squared or R2)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE) (CORRECT)
Correct: This is the described metric. This means that the smaller the value, the better the model.
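As an illustration, both metrics can be computed with scikit-learn; the label and prediction values below are invented:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
y_true = [3.0, 5.0, 2.5, 7.0]  # hypothetical actual label values
y_pred = [2.8, 5.4, 2.9, 6.6]  # hypothetical predicted values
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE: absolute metric in the same unit as the label
r2 = r2_score(y_true, y_pred)                       # R2: relative metric, closer to 1 means a better fit
print(rmse, r2)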
8. With which of the following machine learning types can you associate the K-Means clustering algorithm?
- Reinforcement learning
- Unsupervised machine learning (CORRECT)
- Supervised machine learning
Correct: Clustering is a form of unsupervised machine learning in which the training data does not include known labels.
9. Your deep neural network is being trained, and you have configured the training process to run for 30 epochs.
In this scenario, how does the model behave?
- The entire training dataset is passed through the network 30 times (CORRECT)
- The training data is split into 30 subsets, and each subset is passed through the network
- The first 30 rows of data are used to train the model, and the remaining rows are used to validate it
Correct: The number of epochs determines the number of training passes for the full dataset.
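Conceptually, an epoch is one full pass over the training data; a schematic (framework-agnostic) training loop in plain Python looks like this:
training_batches = [[1, 2], [3, 4], [5, 6]]  # hypothetical mini-batches of the training dataset
epochs = 30
for epoch in range(epochs):        # the entire dataset is passed through the network 30 times
    for batch in training_batches: # within each epoch, every batch (and so every record) is seen once
        pass                       # forward pass, loss computation and backpropagation would go here
print('completed', epochs, 'passes over the data')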
10. Which of the layer types described below is a principal layer type that extracts important features from images by applying a filter to them?
- Pooling layer
- Flattening layer
- Convolutional layer (CORRECT)
Correct: One of the principal layer types is a convolutional layer that extracts important features in images. A convolutional layer works by applying a filter to images.
11. You want to set up a new Azure subscription. The subscription doesn’t contain any resources.
Your goal is to create an Azure Machine Learning workspace.
Considering this scenario, which are three possible ways to obtain this result? Keep in mind that every correct answer presents a complete solution.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with name, subscription_id, and resource_group parameters.
- Navigate to Azure Machine Learning studio and create a workspace.
- Use an Azure Resource Management template that includes a Microsoft.MachineLearningServices/ workspaces resource and its dependencies. (CORRECT)
- Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the az group create function with --name and --location parameters, and then the az ml workspace create function, specifying -w and -g parameters for the workspace name and resource group. (CORRECT)
- Run Python code that uses the Azure ML SDK library and calls the Workspace.create method with name, subscription_id, resource_group, and location parameters. (CORRECT)
Correct: This is one way to achieve the goal.
Correct: This is one way to achieve the goal.
Correct: This is one way to achieve the goal.
12. You decide to use GPU-based training to develop a deep learning model on the Azure Machine Learning service that is able to recognize images.
The environment where you configure the model needs to allow real-time GPU-based inferencing.
Considering that you have to set up compute resources for model inferencing, what is the most suitable compute type?
- Field Programmable Gate Array
- Azure Kubernetes Service (CORRECT)
- Machine Learning Compute
- Azure Container Instance
Correct: You can use Azure Machine Learning to deploy a GPU-enabled model as a web service. Deploying a model on Azure Kubernetes Service (AKS) is a viable option. The AKS cluster provides a GPU resource that is used by the model for inference.
13. You decide to use Azure Machine Learning designer for your real-time service endpoint. You can make use of only one Azure Machine Learning service compute resource.
You start training the model and preparing the real-time pipeline for deployment.
If you want to obtain a web service by publishing the inference pipeline, what is the most suitable compute type?
- the existing Machine Learning Compute resource
- Azure Databricks
- a new Machine Learning Compute resource
- HDInsight
- Azure Kubernetes Services (CORRECT)
Correct: Azure Kubernetes Service (AKS) can be used for real-time inference.
14. In order to define a pipeline with multiple steps, you decide to use the Azure Machine Learning Python SDK. You notice that some steps of the pipeline do not run. Instead of running the steps, the pipeline uses a cached output from a previous run. Your task is to make sure that the pipeline runs every step, even when the parameters and contents of the source directory are the same as in the previous run.
From the following list, which two ways are able to return the expected result? Keep in mind that every correct answer presents a complete solution.
- Restart the compute cluster where the pipeline experiment is configured to run.
- Set the outputs property of each step in the pipeline to True.
- Use a PipelineData object that references a datastore other than the default datastore.
- Set the regenerate_outputs property of the pipeline to True. (CORRECT)
- Set the allow_reuse property of each step in the pipeline to False. (CORRECT)
Correct: If regenerate_outputs is set to True, a new submit will always force generation of all step outputs, and disallow data reuse for any step of this run. Once this run is complete, however, subsequent runs may reuse the results of this run.
Correct: If the data used in a step is stored in a datastore and allow_reuse is set to True, any changes to the data will not be detected.
If the data is uploaded as part of the snapshot (under the step’s source_directory), although this is not recommended, the hash will change, triggering a rerun of the step.
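As a sketch of how the two settings could be applied with the Azure ML SDK v1 (the script name, folder, compute target, and experiment name are placeholders, not taken from the question):
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
ws = Workspace.from_config()
# Option 1: disable output reuse on each step so it always runs
step = PythonScriptStep(name='train-step',
                        script_name='train.py',        # placeholder script
                        source_directory='./scripts',  # placeholder folder
                        compute_target='cpu-cluster',  # placeholder compute target
                        allow_reuse=False)
pipeline = Pipeline(workspace=ws, steps=[step])
# Option 2: force regeneration of all step outputs for this submission
run = Experiment(ws, 'pipeline-experiment').submit(pipeline, regenerate_outputs=True)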
15. Yes or No?
You use a logistic regression algorithm to train your classification model. In order to explain the model’s predictions, you have to calculate the importance of all the features, taking into account the overall global relative importance value, but also the measure of local importance for a certain set of predictions.
You decide to obtain the global and local feature importance values that you need by using an explainer.
Solution: Configure a TabularExplainer. Is this solution effective?
- Yes (CORRECT)
- No
Correct: The TabularExplainer supports both global and local feature importance explanations.
16. If you want to install the Azure Machine Learning SDK for Python, which package manager and CLI command should you use?
- pip install azureml-sdk (CORRECT)
- npm install azureml-sdk
- yarn install azureml-sdk
- nuget azureml-sdk
Correct: pip is the package manager that provides the Azure ML SDK for Python, and this is the CLI command to install it.
17. If you want to use the from_delimited_files method of the Dataset.Tabular class to configure and register a tabular dataset, what are the most appropriate Python commands?
- from azureml.core import Dataset
- blob_ds = ws.get_default_datastore()
- csv_paths = [(blob_ds, ‘data/files/current_data.csv’),
- (blob_ds, ‘data/files/archive/*.csv’)]
- tab_ds = Dataset.Tabular.from_delimited_files()
- tab_ds = tab_ds.register(workspace=ws, name=’csv_table’)
- from azureml.core import Dataset
- blob_ds = ws.get_default_datastore()
- csv_paths = [(blob_ds, ‘data/files/current_data.csv’),
- (blob_ds, ‘data/files/archive/csv’)]
- tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
- tab_ds = tab_ds.register(workspace=ws, name=’csv_table’)
- from azureml.core import Dataset
- blob_ds = ws.change_default_datastore()
- csv_paths = [(blob_ds, ‘data/files/current_data.csv’),
- (blob_ds, ‘data/files/archive/*.csv’)]
- tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
- tab_ds = tab_ds.register(workspace=ws, name=’csv_table’)
- from azureml.core import Dataset (CORRECT)
- blob_ds = ws.get_default_datastore()
- csv_paths = [(blob_ds, ‘data/files/current_data.csv’),
- (blob_ds, ‘data/files/archive/*.csv’)]
- tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
- tab_ds = tab_ds.register(workspace=ws, name=’csv_table’)
Correct: This is the correct command and statement.
18. If you want to visualize the environments that you registered in your workspace, what are the most appropriate SDK commands that you should choose?
- from azureml.core import Environment
- env_names = Environment.list(workspace=ws)
- for env_name of env_names:
- print(‘Name:’,env_name)
- from azureml.core import Environment
- env_names = Environment_list(workspace=ws)
- for env_name in env_names:
- print(‘Name:’,env_name)
- from azureml.core import Environment
- env_names = Environment.list(workspace=ws)
- for each env_name in env_names:
- print(‘Name:’,env_name)
- from azureml.core import Environment (CORRECT)
- env_names = Environment.list(workspace=ws)
- for env_name in env_names:
- print(‘Name:’,env_name)
Correct: This is the correct code expression.
19. What object needs to be defined if your task is to create a schedule for your pipeline?
- ScheduleConfig
- ScheduleTimer
- ScheduleSync
- ScheduleRecurrence (CORRECT)
Correct: To schedule a pipeline to run at periodic intervals, you must define a ScheduleRecurrence that determines the run frequency, and use it to create a Schedule.
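A minimal sketch of that pattern (SDK v1; the workspace object, pipeline ID, and names are placeholders):
from azureml.pipeline.core import Schedule, ScheduleRecurrence
recurrence = ScheduleRecurrence(frequency='Day', interval=1)  # run once every day
schedule = Schedule.create(ws,                                # ws: an existing Workspace object
                           name='daily-training-schedule',
                           pipeline_id=published_pipeline.id, # a previously published pipeline
                           experiment_name='scheduled-training',
                           recurrence=recurrence)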
20. Choose the option below that explains how random sampling selects values for hyperparameters.
- From a mix of discrete and continuous values (CORRECT)
- It tries every possible combination of parameters in the search space
- It tries to select parameter combinations that will result in improved performance from the previous selection
Correct: Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values.
21. What Python code should you write if your goal is to extract the primary metric for a regression task?
- from azureml.train.automl.utilities import catch_primary_metrics
- catch_primary_metrics(‘regression’)
- from azureml.train.automl.utilities import pull_primary_metrics
- pull_primary_metrics(‘regression’)
- from azureml.train.automl.utilities import get_primary_metrics (CORRECT)
- get_primary_metrics(‘regression’)
- from azureml.train.automl.utilities import feed_primary_metrics
- feed_primary_metrics(‘regression’)
Correct: This is the correct code expression.
22. Your task is to enable the creation of an explanation in the experiment script. What packages should you install in the run environment in order to achieve this goal?
- azureml-blackbox
- azureml-explainer
- azureml-contrib-interpret (CORRECT)
- azureml-interpret (CORRECT)
Correct: You need to ensure this package is installed in the run environment to create an explanation in your experiment script.
Correct: You need to ensure this package is installed in the run environment to create an explanation in your experiment script.
23. If you want to minimize disparity in combined true positive rate and false positive rate across sensitive feature groups, what is the most suitable parity constraint that you should choose to use with any of the mitigation algorithms?
- True positive rate parity
- Error rate parity
- Equalized odds (CORRECT)
- False-positive rate parity
Correct: This is the parity constraint described. For example, in a binary classification scenario, this constraint tries to ensure that each group contains a comparable ratio of true positive and false-positive predictions.
24. You decided to preprocess and filter down only the relevant columns for your AirBnB housing dataframe.
The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.
In order to obtain the first initial from the host_name column, you have written the following function that you entitled firstInitialFunction:
def firstInitialFunction(name):
    return name[0]

firstInitialFunction("George")
Your goal is to use spark.udf.register to create a UDF from the function above, so that the UDF is available in the SQL namespace.
Considering this scenario, what code should you write?
- airbnbDF.createTempView(“airbnbDF”)
- spark.udf.register(sql_udf = firstInitialFunction)
- airbnbDF.createOrReplaceTempView(“airbnbDF”) (CORRECT)
- spark.udf.register(“sql_udf”, firstInitialFunction)
- airbnbDF.createAndReplaceTempView(“airbnbDF”)
- spark.udf.register(sql_udf.firstInitialFunction)
- airbnbDF.replaceTempView(“airbnbDF”)
- spark.udf.register(“sql_udf”, firstInitialFunction)
Correct: This is the correct code for the task.
25. In order to track the runs of a Linear Regression model of your AirBnB dataset, you decide to use MLflow.
You want to make use of all the features included in your dataset.
At this point, you have created and logged the pipeline and you have logged the parameters.
You now have to create some predictions and metrics.
Considering this scenario, what code should you write?
- predDF = pipelineModel.evaluate(testDF)
- regressionEvaluator = RegressionEvaluator(labelCol=”price”, predictionCol=”prediction”)
- rmse = regressionEvaluator.setMetricName(“rmse”).evaluate(predDF)
- r2 = regressionEvaluator.setMetricName(“r2”).evaluate(predDF)
- predDF = pipelineModel.estimate(testDF)
- regressionEvaluator = RegressionEvaluator(labelCol=”price”, predictionCol=”prediction”)
- rmse = regressionEvaluator.setMetricName(“rmse”).evaluate(predDF)
- r2 = regressionEvaluator.setMetricName(“r2”).evaluate(predDF)
- predDF = pipelineModel.transform(testDF) (CORRECT)
- regressionEvaluator = RegressionEvaluator(labelCol=”price”, predictionCol=”prediction”)
- rmse = regressionEvaluator.setMetricName(“rmse”).evaluate(predDF)
- r2 = regressionEvaluator.setMetricName(“r2”).evaluate(predDF)
- predDF = pipelineModel.transform(testDF)
- regression = RegressionEvaluator(labelCol=”price”, predictionCol=”prediction”)
- rmse = regressionEvaluator.setMetricName(“rmse”).evaluate(predDF)
- r2 = regressionEvaluator.setMetricName(“r2”).evaluate(predDF)
Correct: This is the correct code for the task.
26. You decided to use Azure Machine Learning to create machine learning models. You want to use multiple compute contexts to train and score models.
Moreover, you want to use Azure Databricks cluster to train models.
Considering this scenario, what compute type is the most suitable to use for Azure Databricks?
- Inference cluster
- Compute cluster
- Attached compute (CORRECT)
Correct: This compute type can be used by Azure Databricks (for use in machine learning pipelines), Azure Data Lake Analytics (for use in machine learning pipelines), and Azure HDInsight.
27. For your experiment in Azure Machine Learning you decide to run the following code:
from azureml.core import Workspace, Experiment, Run
from azureml.core import RunConfiguration, ScriptRunConfig
ws = Workspace.from_config()
run_config = RunConfiguration()
run_config.target=’local’
script_config = ScriptRunConfig(source_directory=’./script’, script=’experiment.py’, run_config=run_config)
experiment = Experiment(workspace=ws, name=’script experiment’)
run = experiment.submit(config=script_config)
run.wait_for_completion()
The experiment run generates several output files that need identification.
In order to retrieve the output file names, you must write some code. Which of the following code snippets should you choose to complete the script?
- files = run.get_properties()
- files = run.get_details_with_logs()
- files = run.get_file_names() (CORRECT)
- files = run.get_metrics()
Correct: You can list all of the files that are associated with this run record by calling run.get_file_names()
28. During exploratory data analysis, which of the options listed below tells you the number of observations in the dataset and can reveal whether the dataset has missing values?
- Standard deviation
- Count (CORRECT)
- Mean
Correct: Count gives us the number of observed values, indicating the size of the dataset and whether there are missing values.
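For example, with pandas the count statistic (and the rest of the summary) can be inspected like this:
import pandas as pd
df = pd.DataFrame({'avg_temp': [20.1, 22.4, None, 25.0],   # one missing value
                   'units_sold': [130, 150, 160, 190]})
print(df.describe())           # the 'count' row shows 3 for avg_temp vs 4 for units_sold
print(df['avg_temp'].count())  # count ignores missing values, revealing the gap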
29. You can use the MlflowClient object as the pathway to query previous runs programmatically.
What code should you write in Python to achieve this?
- from mlflow.pipelines import MlflowClient
- client = MlflowClient(
- list.experiments()
- from mlflow.tracking import MlflowClient (CORRECT)
- client = MlflowClient()
- client.list_experiments()
- from mlflow.pipelines import MlflowClient
- client = MlflowClient()
- client.list_experiments()
- from mlflow.tracking import MlflowClient
- client = MlflowClient()
- list.client_experiments()
Correct: This is the correct and complete command to run for this scenario.
30. If you want to explore the hyperparameters of a model, knowing that every algorithm uses different hyperparameters for tuning, what is the most appropriate method you should choose?
- showParams()
- getParams()
- exploreParams()
- explainParams() (CORRECT)
Correct: You can explore these hyperparameters by using the .explainParams() method on a model.
31. True or False?
Petastorm uses as an input a Vector and not an Array.
- True
- False (CORRECT)
Correct: It’s actually the other way around. Petastorm requires arrays as input, not vectors.
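If a DataFrame holds features as a Spark ML Vector column, it can be converted to a plain array column before the data is written for Petastorm; a sketch assuming Spark 3.x and an existing DataFrame df with a 'features' vector column:
from pyspark.ml.functions import vector_to_array
arrayDF = df.withColumn('features_array', vector_to_array('features'))  # array column usable by Petastorm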
32. Your task is to store in the Azure ML workspace a model for whose training you ran an experiment. You want to do this so that other experiments and services can be applied to the model.
Considering this scenario, what action should you take to achieve the result?
- Save the model as a file in a compute instance
- Save the experiment script as a notebook
- Save the model as a file in a Key Vault instance
- Register the model in the workspace (CORRECT)
Correct: Registering the model in the workspace stores it so that other experiments and services can find and use it.
33. Your hyperparameter tuning needs to have a search space defined. The values of the batch_size hyperparameter can be 128, 256, or 512 and the normal distribution values for the learning_rate hyperparameter can have a mean of 10 and a standard deviation of 3.
What Python code should you write in order to achieve this goal?
- from azureml.train.hyperdrive import choice, normal (CORRECT)
- param_space = {
- '--batch_size': choice(128, 256, 512),
- '--learning_rate': normal(10, 3)
- }
- from azureml.train.hyperdrive import choice, uniform
- param_space = {
- '--batch_size': choice(128, 256, 512),
- '--learning_rate': uniform(10, 3)
- }
- from azureml.train.hyperdrive import choice, normal
- param_space = {
- '--batch_size': choice(128, 256, 512),
- '--learning_rate': qnormal(10, 3)
- }
- from azureml.train.hyperdrive import choice, normal
- param_space = {
- '--batch_size': choice(128, 256, 512),
- '--learning_rate': lognormal(10, 3)
- }
Correct: This is the correct code for this scenario.
34. You decided to use Parquet files and Petastorm to train a distributed neural network by using Horovod.
Your housing prices dataset from California is entitled cal_housing.
After you loaded the data, you created a Spark DataFrame from the Pandas DataFrame in order to concatenate the features and labels of the model.
At this point, you want to set up Dense Vectors for the features.
What code should you write in Python to achieve this?
- from pyspark.ml.feature import VectorAssembler
- vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol=”features”)
- vecTrainDF = vecAssembler.transform(trainDF).call(“features”, “label”)
- display(vecTrainDF)
- from pyspark.ml.feature import VectorAssembler
- vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol=”features”)
- vecTrainDF = vecAssembler.transform(trainDF).hook(“features”, “label”)
- display(vecTrainDF)
- from pyspark.ml.feature import VectorAssembler (CORRECT)
- vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol=”features”)
- vecTrainDF = vecAssembler.transform(trainDF).select(“features”, “label”)
- display(vecTrainDF)
- from pyspark.ml.feature import VectorAssembler
- vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol=”labels “)
- vecTrainDF = vecAssembler.transform(trainDF).select(“features”, “label”)
- display(vecTrainDF)
Correct: This is the correct code for the task.
35. You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
- Box plot
- A violin plot
- Binary classification confusion matrix (CORRECT)
- Gradient descent
Correct: The confusion matrix includes the Precision metric, which results from the number of true positives divided by the number of true positives plus false positives.
36. In order to predict the price of a student’s craftwork, you have to rely on the following variables: the student’s length of education, degree type, and craft form. You decide to set up a linear regression model that you will have to evaluate. Solution: Apply the following metrics: Relative Squared Error, Coefficient of Determination, Accuracy, Precision, Recall, F1 score, and AUC:
Is this solution effective?
- Yes
- No (CORRECT)
Correct: Relative Squared Error and Coefficient of Determination are good metrics to evaluate a linear regression model, but the others are metrics for classification models.
37. What is the result of multiplying a NumPy array by 3?
- The new array will be 3 times longer, with the sequence repeated 3 times and also all the elements are multiplied by 3.
- Array stays the same size, but each element is multiplied by 3. (CORRECT)
- The new array will be 3 times longer, with the sequence repeated 3 times.
Correct: This is how a NumPy array behaves when multiplied.
38. How should the following sentence be completed? One example of the machine learning […] type models is the Support Vector Machine algorithm.
- Classification (CORRECT)
- Regression
- Clustering
Correct: The Support Vector Machine algorithm is a well-established algorithm for classification.
39. If you multiply a list and a NumPy array by 2, what results do you get?
- Multiplying a NumPy array by 2 creates a new array 2 times the length, with the original sequence repeated 2 times.
- Multiplying a list by 2 performs an element-wise calculation on the list, which sees the list stay the same size, but each element has been multiplied by 2.
- Multiplying a list by 2 creates a new list 2 times the length with the original sequence repeated 2 times. (CORRECT)
- Multiplying a NumPy array by 2 performs an element-wise calculation on the array, which sees the array stay the same size, but each element has been multiplied by 2. (CORRECT)
Correct: This is how a list behaves when multiplied.
Correct: This is how a NumPy array behaves when multiplied.
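Both behaviours side by side:
import numpy as np
nums = [1, 2, 3]
arr = np.array([1, 2, 3])
print(nums * 2)  # [1, 2, 3, 1, 2, 3]: the list is repeated
print(arr * 2)   # [2 4 6]: element-wise multiplication, same length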
40. To train a classification model that predicts, based on 8 numeric features, to which of four classes an observation belongs, you configured a deep neural network.
From the list below, which one states a truth related to the network architecture?
- The network layer should contain four hidden layers
- The input layer should contain four nodes
- The output layer should contain four nodes (CORRECT)
Correct: The output layer should contain a node for each possible class value.
41. The company that you work for decides to expand the use of machine learning. The company decides not to set up another compute environment in Azure. At the moment, you have at your disposal the compute environments below.
Environment name | Compute Type
---|---
nb_server | Compute instance
aks_cluster | Azure Kubernetes Service
mlc_cluster | Machine Learning compute
Considering the scenarios below, you must establish what is the most appropriate compute environment to:
1. Run an Azure Machine Learning Designer training pipeline
2. Deploy a web service from the Azure Machine Learning Designer
What are the best compute types for this goal?
- 1 mlc_cluster, 2 nb_server
- 1 mlc_cluster, 2 aks_cluster (CORRECT)
- 1 nb_server, 2 mlc_cluster
- 1 nb_server, 2 aks_cluster
Correct: Machine Learning Compute Cluster supports integration with AML designer training pipeline, and Azure Kubernetes Service supports integration with AML Designer.
42. In order to do a multi-class classification using an unbalanced training dataset, you have to apply C-Support Vector classification. You use the following Python code for the C-Support Vector classification:
from sklearn.svm import SVC
import numpy as np
svc = SVC(kernel='linear', class_weight='balanced', C=1.0, random_state=0)
model1 = svc.fit(X_train, y)
Considering that your task is to evaluate the C-Support Vector classification code, what is the most appropriate evaluation statement?
- class_weight=balanced: Automatically adjust weights inversely proportional to class frequencies in the input data
- C parameter: Size of the kernel cache
- class_weight=balanced: Automatically select the performance metrics for the classification
- C parameter: Size of the kernel cache
- class_weight=balanced: Automatically adjust weights inversely proportional to class frequencies in the input data (CORRECT)
- C parameter: Penalty parameter
- class_weight=balanced: Automatically adjust weights inversely proportional to class frequencies in the input data
- C parameter: Degree of polynomial kernel function
- class_weight=balanced: Automatically adjust weights directly proportional to class frequencies in the input data
- C parameter: Penalty parameter
Correct: class_weight='balanced' automatically adjusts weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)). The C parameter (float, optional, default=1.0) is the penalty parameter of the error term.
43. If you want to extract a dataset after its registration, what are the most suitable methods you should choose from the Dataset class?
- find_by_name
- find_by_id
- get_by_name (CORRECT)
- get_by_id (CORRECT)
Correct: This method will retrieve a dataset using its name.
Correct: This method will retrieve a dataset using its id.
44. What object needs to be defined if your task is to create a schedule for your pipeline?
- ScheduleTimer
- ScheduleRecurrence (CORRECT)
- ScheduleConfig
- ScheduleSync
Correct: To schedule a pipeline to run at periodic intervals, you must define a ScheduleRecurrence that determines the run frequency, and use it to create a Schedule.
45. If you want to set up a parallel run step, which of the SDK commands below should you choose?
- parallelrun_step = ParallelRunStep(
- name=’batch-score’,
- parallel_run_config=parallel.run.config,
- inputs=[batch_data_set.as_named_input(‘batch_data’)],
- output=output_dir,
- arguments=[]
- allow_reuse=True
- parallelrun_step = ParallelRunStep( (CORRECT)
- name=’batch-score’,
- parallel_run_config=parallel_run_config,
- inputs=[batch_data_set.as_named_input(‘batch_data’)],
- output=output_dir,
- arguments=[],
- allow_reuse=True
- parallelrun.step = ParallelRunStep(
- name=’batch-score’,
- parallel_run_config=parallel_run_config,
- inputs=[batch_data_set.as_named_input(‘batch_data’)],
- output=output_dir,
- arguments=[],
- allow_reuse=True
- parallelrun_step = ParallelRunStep(
- name=’batch-score’,
- parallel.run.config=parallel_run_config,
- inputs=[batch_data_set.as_named_input(‘batch_data’)],
- output=output_dir,
- arguments=[],
- allow_reuse=True
Correct: These are the correct commands.
46. You can use the Azure ML SDK to update web services that are already deployed and to enable Application Insights for them.
What code should you write to achieve this?
- service = ws.webservices[‘my-svc’]
- service.modify (enable_app_insights=True)
- service = ws.webservices[‘my-svc’]
- service.new (enable_app_insights=True)
- service = ws.webservices(‘my-svc’)
- service.create(enable_app_insights=True)
- service = ws.webservices[‘my-svc’] (CORRECT)
- service.update(enable_app_insights=True)
Correct: This is the correct code.
47. The DataFrame you are currently working on contains data regarding the daily sales of ice cream. In order to compare the avg_temp and units_sold columns you decided to use the corr method which returned a result of 0.95.
What information can you read from this result?
- On the day with the maximum units_sold value, the avg_temp value was 0.95
- Days with high avg_temp values tend to coincide with days that have high units_sold values (CORRECT)
- The units_sold value is, on average, 95% of the avg_temp value
Correct: The corr method returns the correlation, and a value near 1 indicates a positive correlation.
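A small sketch of the calculation with invented values:
import pandas as pd
sales = pd.DataFrame({'avg_temp': [18, 21, 24, 28, 31],
                      'units_sold': [110, 135, 160, 190, 220]})
print(sales['avg_temp'].corr(sales['units_sold']))  # close to 1: higher temperatures coincide with higher sales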
48. Which of the following are the default authentication methods for ACI services and AKS services?
- Disabled for AKS services
- Token-based for AKS services.
- Token-based for ACI services
- Key-based for AKS services (CORRECT)
- Disabled for ACI services (CORRECT)
Correct: By default, authentication is set to key-based authentication for AKS services (for which primary and secondary keys are automatically generated).
Correct: By default, authentication is disabled for ACI services.
49. If your goal is to use a configuration file in order to ensure connection to your Azure ML workspace, what Python command would be the most appropriate?
- from azureml.core import Workspace
- ws = from.config_Workspace()
- from azureml.core import Workspace (CORRECT)
- ws = Workspace.from_config()
- from azureml.core import Workspace
- ws = Workspace.from.config
Correct: This is the correct command for this task.
50. Your company asks you to analyze a dataset that contains historical data obtained from a local car-sharing company. For this task, you decide to develop a regression model to predict the price of a trip. For the correct evaluation of the regression model, you have to use performance metrics.
In this scenario, what are the best two metrics?
- An F1 score that is low
- An R-Squared value close to 0
- A Root Mean Square Error value that is low (CORRECT)
- An R-Squared value close to 1 (CORRECT)
Correct: RMSE and R2 are both metrics for regression models. Root mean squared error (RMSE) creates a single value that summarizes the error in the model.
Correct: RMSE and R2 are both metrics for regression models. Coefficient of determination, often referred to as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means there is a perfect fit.
51. Your task is to create and evaluate a model. One of the metrics shows an absolute metric in the same unit as the label.
What is the metric described above?
- Coefficient of Determination (known as R-squared or R2)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE) (CORRECT)
Correct: This is the described metric. This means that the smaller the value, the better the model.
52. When you use the Support Vector Machine algorithm, what type of machine learning model is possible to train?
- Regression
- Clustering
- Classification (CORRECT)
Correct: The Support Vector Machine algorithm is a well-established algorithm for classification.
53. In order to create clusters, Hierarchical clustering uses two methods.
What are the two methods used in this case?
- Aggregational
- Distinctive
- Agglomerative (CORRECT)
- Divisive (CORRECT)
Correct: Agglomerative clustering is a “bottom up” approach.
Correct: The divisive method is a “top down” approach starting with the entire dataset and then finding partitions in a stepwise manner
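Agglomerative (bottom-up) clustering is available directly in scikit-learn; a minimal sketch on random data:
import numpy as np
from sklearn.cluster import AgglomerativeClustering
X = np.random.rand(20, 2)                      # hypothetical 2-D data points
model = AgglomerativeClustering(n_clusters=3)  # each point starts as its own cluster, then clusters are merged
labels = model.fit_predict(X)
print(labels)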
54. What is the effect that you obtain if you increase the Learning Rate parameter for the deep neural network that you are creating?
- More hidden layers are added to the network
- Larger adjustments are made to weight values during backpropagation (CORRECT)
- More records are included in each batch passed through the network
Correct: Increasing the learning rate causes backpropagation to make larger weight adjustments.
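The role of the learning rate is visible in the basic gradient-descent weight update; a toy illustration with made-up numbers:
weight = 0.8
gradient = 0.25  # gradient of the loss with respect to this weight (made up)
for learning_rate in (0.01, 0.5):
    update = learning_rate * gradient
    print(learning_rate, weight - update)  # a larger learning rate produces a larger weight adjustment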
55. Your task is to set up an Azure Machine Learning workspace. You decide to use a laptop computer to create a local Python environment.
You want to ensure connection between the laptop and the workspace and you want to run experiments.
You start creating the config.json file below:
{ “workspace_name” : “ml-workspace” }
In order to interact with data and experiments in the workspace, you have to use the Azure Machine Learning SDK. Your config.json file has to be able to connect from the Python environment directly to the workspace. If you want to ensure connection to the workspace, which two additional parameters should you add to config.json? Keep in mind that every correct answer presents a part of the solution.
- Login
- Region
- Key
- Resource_group (CORRECT)
- Subscription_id (CORRECT)
Correct: This parameter must be specified.
Correct: This parameter must be specified.
56. You are using an Azure Machine Learning service for your data science project. In order to deploy the project, you have to choose a compute target. For this scenario, which of the following Azure services is the most suitable?
- Azure Databricks
- Azure Data Lake Analytics
- Apache Spark for HDInsight
- Azure Container Instances (CORRECT)
Correct: Azure Container Instances can be used as compute target for testing or development. Use for low-scale CPU-based workloads that require less than 48 GB of RAM.
57. You want to use your registered model in a batch inference pipeline.
For processing files in a file dataset, your batch inference pipeline has to use a ParallelRunStep step, and each time the inferencing function is called, the run needs to be able to process six input files.
You have to set up the pipeline. What configuration setting needs to be specified in the ParallelRunConfig object for the ParallelRunStep step?
- process_count_per_node= “6”
- node_count= “6”
- mini_batch_size= “6” (CORRECT)
- error_threshold= “6”
Correct: For FileDataset input, this field is the number of files a user script can process in one run() call. For TabularDataset input, this field is the approximate size of data the user script can process in one run() call.
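A sketch of the relevant part of the configuration (SDK v1; every other argument is a placeholder, not taken from the question):
from azureml.pipeline.steps import ParallelRunConfig
parallel_run_config = ParallelRunConfig(
    source_directory='./scripts',     # placeholder folder
    entry_script='batch_scoring.py',  # placeholder scoring script
    mini_batch_size='6',              # each run() call receives six files from the file dataset
    error_threshold=10,               # placeholder value
    output_action='append_row',
    environment=batch_env,            # placeholder Environment object
    compute_target='cpu-cluster',     # placeholder compute target
    node_count=2)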
58. After installing the Azure Machine Learning Python SDK, you decide to use it to configure on your subscription a workspace entitled “aml-workspace”.
What code should you write in Python for this task?
- azureml.core import Workspace
- ws = Workspace.create(name=’aml-workspace’,
- subscription_id=’123456-abc-123…’,
- resource_group=’aml-resources’,
- create_resource_group=False,
- location=’eastus’
- )
- from azureml.core import Workspace (CORRECT)
- ws = Workspace.create(name=’aml-workspace’,
- subscription_id=’123456-abc-123…’,
- resource_group=’aml-resources’,
- create_resource_group=True,
- location=’eastus’
- )
- from azureml.core import Workspace
- ws = Workspace.create(name=’aml-workspace’,
- subscription_id=’123456-abc-123…’,
- resource_group=’aml-resources’,
- location=’eastus’
- )
Correct: This is the correct and complete command to run for this scenario.
59. What Python command should you choose in order to view the models previously registered in the Azure ML studio by using the Model object?
- from azureml.core import Model (CORRECT)
- for model in Model.list(ws):
- print(model.name, ‘version:’, model.version)
- from azureml.core import Model
- for model in List.Model(ws):
- print(model.name, ‘version:’, model.version)
- from azureml.core import Model
- for model in Model.object(ws):
- print(model.name, ‘version:’, model.version)
- from azureml.core import Model
- for model in Model.list(ws):
- get(model.name, ‘version:’, model.version)
Correct: This is the correct command for this request.
60. What SDK commands should you choose if you want to extract a certain version of a data set?
- img_ds = Dataset.get_by_name(workspace=ws, name=’img_files’, version_2)
- img_ds = Dataset.get_by_name(workspace=ws, name=’img_files’, version(2))
- img_ds = Dataset.get_by_name(workspace=ws, name=’img_files’, version=2) (CORRECT)
- img_ds = Dataset.get_by_name(workspace=ws, name=’img_files’, version=’2’)
Correct: This is the correct command for this request.
61. Your task is to deploy your service on an AKS cluster that is set up as a compute target.
What SDK commands are able to return you the expected result?
- from azureml.core.webservice import ComputeTarget, AksWebservice
- cluster_name = ‘aks-cluster’
- compute_config = AksCompute.provisioning_configuration(location=’eastus’)
- production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
- production_cluster.wait_for_completion(show_output=True)
- from azureml.core.compute import ComputeTarget, AksCompute
- cluster_name = ‘aks-cluster’
- compute_config = AksCompute.provisioning_configuration(location=’eastus’)
- production_cluster = ComputeTarget.deploy (ws, cluster_name, compute_config)
- production_cluster.wait_for_completion(show_output=True)
- from azureml.core.compute import ComputeTarget, AksCompute (CORRECT)
- cluster_name = ‘aks-cluster’
- compute_config = AksCompute.provisioning_configuration(location=’eastus’)
- production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
- production_cluster.wait_for_completion(show_output=True)
- from azureml.core.webservice import ComputeTarget, AksCompute
- cluster_name = ‘aks-cluster’
- compute_config = AksCompute.provisioning_configuration(location=’eastus’)
- production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
- production_cluster.wait_for_completion(show_output=True)
Correct: This is the correct command for this request.
62. If you want to extract the parallel_run_step.txt file from the output of the step after the pipeline run has ended, what code should you choose?
- df = pd.read_csv(result_file, delimiter=”:”, header=None)
- df.columns = [“File”, “Prediction”]
- print(df)
- prediction_run = next(pipeline_run.get_children())
- prediction_output = prediction_run.get_output_data(‘inferences’)
- prediction_output.download(local_path=’results’)
- for root, dirs, files in os.walk(‘results’): (CORRECT)
- for file in files:
- if file.endswith(‘parallel_run_step.txt’):
- result_file = os.path.join(root,file)
Correct: This code will find the parallel_run_step.txt file.
63. What code should you write for an instance of a MimicExplainer if you have a model entitled loan_model?
- from interpret.ext.blackbox import MimicExplainer
- from interpret.ext.glassbox import DecisionTreeExplainableModel
- mim_explainer = MimicExplainer(model=loan_model,
- initialization_examples=X_test,
- explainable_model = DecisionTree,
- classes=[‘loan_amount’,’income’,’age’,’marital_status’],
- features=[‘reject’, ‘approve’])
- from interpret.ext.blackbox import MimicExplainer
- from interpret.ext.glassbox import DecisionTreeExplainableModel
- mim_explainer = MimicExplainer(model=loan_model,
- initialization_examples=X_test,
- explainable_model = DecisionTreeExplainableModel,
- features=[‘loan_amount’,’income’,’age’,’marital_status’],
- from interpret.ext.blackbox import MimicExplainer
- from interpret.ext.glassbox import DecisionTreeExplainableModel
- mim_explainer = MimicExplainer(model=loan_model,
- explainable_model = DecisionTreeExplainableModel,
- classes=[‘loan_amount’,’income’,’age’,’marital_status’],
- features=[‘reject’, ‘approve’])
- from interpret.ext.blackbox import MimicExplainer (CORRECT)
- from interpret.ext.glassbox import DecisionTreeExplainableModel
- mim_explainer = MimicExplainer(model=loan_model,
- initialization_examples=X_test,
- explainable_model = DecisionTreeExplainableModel,
- features=[‘loan_amount’,’income’,’age’,’marital_status’],
- classes=[‘reject’, ‘approve’])
Correct: This is the correct code for a MimicExplainer.
64. What code should you write for a PFIExplainer if you have a model entitled loan_model?
- from interpret.ext.blackbox import PFIExplainer
- pfi_explainer = PFIExplainer(model = loan_model,
- initialization_examples=X_test,
- classes=[‘loan_amount’,’income’,’age’,’marital_status’],
- features=[‘reject’, ‘approve’])
- from interpret.ext.blackbox import PFIExplainer (CORRECT)
- pfi_explainer = PFIExplainer(model = loan_model,
- features=[‘loan_amount’,’income’,’age’,’marital_status’],
- classes=[‘reject’, ‘approve’])
- from interpret.ext.blackbox import PFIExplainer
- pfi_explainer = PFIExplainer(model = loan_model,
- explainable_model= DecisionTreeExplainableModel,
- features=[‘loan_amount’,’income’,’age’,’marital_status’],
- classes=[‘reject’, ‘approve’])
- from interpret.ext.blackbox
- pfi_explainer = PFIExplainer(model = loan_model,
- initialization_examples=X_test,
- features=[‘loan_amount’,’income’,’age’,’marital_status’],
- classes=[‘reject’, ‘approve’])
Correct: This is the correct code for a PFIExplainer.
65. Choose from the list below all the other names by which qualitative variables are also known.
- Continuous
- Numerical
- Discrete (CORRECT)
- Categorical (CORRECT)
Correct: This is one of the ways qualitative variables are also known.
Correct: This is one of the ways qualitative variables are also known.
66. Which of the non-exhaustive cross validation techniques listed below enables you to assign data points in a random way to the training set and the test set?
- K-fold cross-validation
- Repeated random sub-sampling validation
- Holdout cross-validation (CORRECT)
Correct: In the holdout method, you randomly assign data points to two sets d0 and d1, usually called the training set and the test set, respectively. The size of each of the sets is arbitrary although typically the test set is smaller than the training set. You then train (build a model) on d0 and test (evaluate its performance) on d1.
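The holdout split described above is commonly done with scikit-learn's train_test_split; a sketch with random data:
import numpy as np
from sklearn.model_selection import train_test_split
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
# randomly assign 75 percent of the points to the training set and 25 percent to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)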
67. If you want to list the generated files after your experiment run is completed, which method of the run object is the most suitable?
- download_files
- download_file
- get_file_names (CORRECT)
- list_file_names
Correct: You can use the run object’s get_file_names method to list the files generated. Standard practice is for scripts that train models to save them in the run’s outputs folder.
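For reference, a sketch of listing and downloading run outputs with the SDK (run is assumed to be an existing azureml.core.Run object; the file names are placeholders):
for file_name in run.get_file_names():  # list the files associated with the run record
    print(file_name)
run.download_file(name='outputs/model.pkl', output_file_path='model.pkl')  # download one output file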
68. When you deploy a new real-time service, you can enable Application Insights as part of the service deployment configuration.
By using the SDK, what code should you write to achieve this goal?
- dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,
- memory_gb = 1,
- app_insights(True))
- dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,
- memory_gb = 1,
- appinsights=True)
- dep_config = AciWebservice.deploy_configuration(cpu_cores = 1, (CORRECT)
- memory_gb = 1,
- enable_app_insights=True)
- dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,
- memory_gb = 1,
- app_insights=True)
Correct: This is the correct code.
69. You intend to use the Hyperdrive feature of Azure Machine Learning to determine the optimal hyperparameter values when training a model.
You need to use Hyperdrive to try combinations of the following hyperparameter values:
— learning_rate: any value between 0.001 and 0.1
— batch_size: 16, 32, or 64
You must configure the search space for the Hyperdrive experiment.
Which two parameter expressions should you use? Each correct answer presents part of the solution.
- A choice expression for learning_rate
- A normal expression for batch_size
- A choice expression for batch_size (CORRECT)
- A uniform expression for learning_rate (CORRECT)
Correct: Discrete hyperparameters are specified as a choice among discrete values. choice can be: one or more comma-separated values — a range object — any arbitrary list object.
Correct: Continuous hyperparameters are specified as a distribution over a continuous range of values. Supported distributions include:
— uniform(low, high) – Returns a value uniformly distributed between low and high.
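A sketch of how the two expressions could be combined into a Hyperdrive search space (SDK v1; random sampling is chosen only for illustration):
from azureml.train.hyperdrive import RandomParameterSampling, choice, uniform
param_sampling = RandomParameterSampling({
    '--learning_rate': uniform(0.001, 0.1),  # continuous range between 0.001 and 0.1
    '--batch_size': choice(16, 32, 64)       # discrete set of values
})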
70. You have a set of CSV files that contain sales records. All of your CSV files follow an identical data schema.
The sales records for a given month are held in a CSV file named sales.csv, and each file sits in a storage folder that indicates the month and the year the data was recorded. A datastore has been set up in an Azure Machine Learning workspace for the folders kept in an Azure blob container. The parent folder, entitled sales, contains the folders organized in the hierarchical structure below:
/sales
/01-2019
/sales.csv
/02-2019
/sales.csv
/03-2019
/sales.csv
…
Every time a month ends, a new folder with that month’s sales is added to the sales folder. You want to train a machine learning model by using the sales data while complying with the requirements below:
– A single dataset must load all of your sales data to date into a structure that enables easy conversion to a dataframe.
– You have to ensure that experiments can be done by using only the data created until a specific previous month, disregarding any data added after the month selected.
– You have to keep the number of registered datasets to the minimum possible.
Considering that the sales data have to be registered as a dataset in the Azure Machine Learning service workspace, what actions should you take?
- Create a new tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
- Create a tabular dataset that references the datastore and specifies the path ‘sales/*/sales.csv’, register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
- Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary. (CORRECT)
- Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
Correct: This is the correct approach to this scenario.
71. Your company uses a set of labeled photographs for the multi-class image classification deep learning model that it is creating.
During the summer, the software engineering team noticed a heavy inferencing load on the prediction web service. Although the production web service for the model is deployed on a fully utilized compute cluster, it fails to meet demand.
While keeping downtime and administrative effort to a minimum, you have to improve the performance of the image classification web service. Considering this, what actions do you recommend the IT Operations team take?
- Increase the VM size of nodes in the compute cluster where the web service is deployed.
- Increase the node count of the compute cluster where the web service is deployed. (CORRECT)
- Create a new compute cluster by using larger VM sizes for the nodes, redeploy the web service to that cluster, and update the DNS registration for the service endpoint to point to the new cluster.
- Increase the minimum node count of the compute cluster where the web service is deployed.
Correct: The Azure Machine Learning SDK does not provide support scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in the Azure Machine Learning studio.
You can only change the node count, not the VM size of the cluster.
72. You decided to use the from_files method of the Dataset.File class to configure a file dataset.
You then want to register the file dataset with the title img_files in a workspace.
What SDK commands should you choose for this task?
- from azureml.core import Dataset
- file_ds = Dataset.File.from_files(path=(blob_ds, ‘data/files/images/*.jpg’))
- file_ds = file_ds.register(workspace=ws, name=’img_files’)
- from azureml.core import Dataset
- blob_ds = ws.get_default_datastore()
- file_ds = Dataset.File.from_files(path=(blob_ds, ‘data/files/images’))
- file_ds = file_ds.register(workspace=ws, name=’img_files’)
- from azureml.core import Dataset
- blob_ds = ws.get_default_datastore()
- file_ds = Dataset.File.from_files(path=(blob_ds, ‘data/files/images/*.jpg’))
- from azureml.core import Dataset (CORRECT)
- blob_ds = ws.get_default_datastore()
- file_ds = Dataset.File.from_files(path=(blob_ds, ‘data/files/images/*.jpg’))
- file_ds = file_ds.register(workspace=ws, name=’img_files’)
Correct: This is the correct and complete command for this scenario.
73. True or False?
Before publishing, a pipeline needs to have its parameters defined.
- True (CORRECT)
- False
Correct: You must define parameters for a pipeline before publishing it.
74. What code should you write using the SDK if your goal is to retrieve the best run and its model from an automated machine learning run?
- best_run, fitted_model = automl.run.get_output()
- best_run_metrics = best_run.get_metrics()
- for metric_name in best_run_metrics:
- metric = best_run_metrics[metric_name]
- print(metric_name, metric)
- best_run, fitted_model = automl_run.get_output()
- best_run_metrics = best_run_get_metrics(1)
- for metric_name in best_run_metrics:
- metric = best_run_metrics[metric_name]
- print(metric_name, metric)
- best_run, fitted_model = automl_run.get_output() (CORRECT)
- best_run_metrics = best_run.get_metrics()
- for metric_name in best_run_metrics:
- metric = best_run_metrics[metric_name]
- print(metric_name, metric)
- best_run, fitted_model = automl_run.get_input()
- best_run_metrics = best_run.get_metrics()
- for metric_name in best_run_metrics:
- metric = best_run_metrics[metric_name]
- print(metric_name, metric)
Correct: This is the correct and complete command for this scenario.
75. Which two of the following actions help you communicate effectively with your team members?
- Celebrate team success and milestones
- Ask team members how they prefer to communicate (CORRECT)
- Create motivation by rewarding good work
- Hold regular team meetings (CORRECT)
Correct: Learning your team members’ communication styles can help you connect with them individually.
76. Your task is to train a binary classification model in order for it to be able to target the correct subjects in a marketing campaign.
What actions should you take if you want to ensure that your model is fair and does not discriminate based on ethnicity?
- Remove the ethnicity feature from the training dataset.
- Evaluate each trained model with a validation dataset, and use the model with the highest accuracy score. An accurate model is inherently fair.
- Compare disparity between selection rates and performance metrics across ethnicities. (CORRECT)
Correct: By using ethnicity as a sensitive field, and comparing disparity between selection rates and performance metrics for each ethnicity value, you can evaluate the fairness of the model.
77. You decided to use Python code interactively in your Conda environment. You have all the required Azure Machine Learning SDK and MLflow packages in the environment.
In order to log metrics in your Azure Machine Learning experiment entitled mlflow-experiment, you have to use MLflow.
To give the correct answer, you have to replace the code comments that are bolded with some suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
#1 Set the MLflow logging target
#2 Configure the experiment
with #3 Begin the experiment run
#4 Log my_metric with value 1.00 (‘my_metric’, 1.00)
print(“Finished!”)
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run(‘mlflow-experiment), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.tracking.client = ws, #2 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #3 mlflow.active_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run(‘mlflow-experiment), #3 mlflow.start_run(), #4 run.log()
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.set_experiment(‘mlflow-experiment’), #3 mlflow.start_run(), #4 mlflow.log_metric (CORRECT)
Correct: #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()) In the following code, the get_mlflow_tracking_uri() method assigns a unique tracking URI address to the workspace, ws, and set_tracking_uri() points the MLflow tracking URI to that address.
#2 mlflow.set_experiment(experiment_name) Set the MLflow experiment name with set_experiment() and start your training run with start_run().
#3 mlflow.start_run()
#4 mlflow.log_metric – Then use log_metric() to activate the MLflow logging API and begin logging your training run metrics.
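Putting the pieces together, a minimal sketch of the completed snippet looks like the following; it assumes a config.json file is available so that Workspace.from_config() can locate your workspace:
import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())  # 1: point MLflow at the workspace
mlflow.set_experiment('mlflow-experiment')             # 2: configure the experiment

with mlflow.start_run():                               # 3: begin the experiment run
    mlflow.log_metric('my_metric', 1.00)               # 4: log my_metric with value 1.00
print('Finished!')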
78. You want to deploy in your Azure Container Instance a deep learning model.
In order to call the model API, you have to use the Azure Machine Learning SDK.
To invoke the deployed model, you have to use native SDK classes and methods.
To give the correct answer, you have to replace the code comments that are bolded with some suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
from azureml.core import Workspace
#1st code option
import json
ws = Workspace.from_config()
service_name = “mlmodel1-service”
service = Webservice(name=service_name, workspace=ws)
x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({“data”: x_new})
#2nd code option
- from azureml.core.webservice import requests, predictions = service.run(input_json)
- from azureml.core.webservice import Webservice, predictions = service.deserialize(ws, input_json)
- from azureml.core.webservice import LocalWebservice, predictions = service.run(input_json)
- from azureml.core.webservice import Webservice, predictions = service.run(input_json) (CORRECT)
Correct: These are the correct commands for this task.
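A minimal sketch of the completed snippet; it assumes a service named mlmodel1-service has already been deployed in the workspace referenced by config.json:
import json
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(name='mlmodel1-service', workspace=ws)

x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({'data': x_new})

predictions = service.run(input_json)  # invoke the deployed model over its API
print(predictions)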
79. One of the categorical variables of your AirBnB dataset is room type.
You have three room types, as follows: private room, entire home/apt, and shared room.
Every room type is assigned a unique numerical value because you have encoded each unique string as a number.
For the machine learning algorithm to treat each category independently, you have to one-hot encode each of the values to a position in an array.
What code should you write to achieve this goal?
- from pyspark.ml.feature import OneHotEncoder (CORRECT)
- encoder = OneHotEncoder(inputCols=[“room_type_index”], outputCols=[“encoded_room_type”])
- encoderModel = encoder.fit(indexedDF)
- encodedDF = encoderModel.transform(indexedDF)
- display(encodedDF)
- from pyspark.ml.feature import OneHotEncoder
- encoder = OneHotEncoder(inputCols=[“room_type_index”], outputCols=[“encoded_room_type”])
- encoderModel = encoder.fit(indexedDF)
- encodedDF = encoderModel.fit (indexedDF)
- display(encodedDF)
- from pyspark.ml.feature import OneHotEncoder
- encoder = OneHotEncoder(inputCols=[“room_type_index”], outputCols=[“encoded_room_type”])
- encoderModel = encoder.fit(indexedDF)
- encodedDF = encoderModel(indexedDF)
- display(encodedDF)
- from pyspark.ml.feature import OneHotEncoder
- encoder = OneHotEncoder(inputCols=[“room_type_index”], outputCols=[“encoded_room_type”])
- encoderModel = encoder.fit(indexedDF)
- encodedDF = encoderModel_transform()
- display(encodedDF)
Correct: This is the correct code for this task.
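A minimal end-to-end sketch of this encoding step, assuming an active SparkSession; the toy DataFrame below stands in for the indexed AirBnB data:
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer, OneHotEncoder

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [('private room',), ('entire home/apt',), ('shared room',)], ['room_type'])

# Encode each unique string into a number, then one-hot encode the index
indexer = StringIndexer(inputCol='room_type', outputCol='room_type_index')
indexedDF = indexer.fit(df).transform(df)

encoder = OneHotEncoder(inputCols=['room_type_index'], outputCols=['encoded_room_type'])
encodedDF = encoder.fit(indexedDF).transform(indexedDF)
encodedDF.show(truncate=False)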
80. You decided to use Azure Machine Learning and your goal is to train a Diabetes Model and build a container image for it.
You choose to make use of the scikit-learn ElasticNet linear regression model.
You want to use Azure Kubernetes Service (AKS) for the model deployment to production.
You have to create an active AKS cluster by using the Azure ML SDK.
You decide to use the standard configuration.
What code should you write for this task?
- aks_target = ComputeTarget.create(workspace = workspace, (CORRECT)
- name = aks_cluster_name,
- provisioning_configuration = prov_config)
- aks_target = ComputeTarget.create(workspace = workspace,
- name = aks_cluster_name,)
- aks_target = ComputeTarget.deploy(workspace = workspace,
- name = aks_cluster_name,
- provisioning_configuration = prov_config)
- aks_target = ComputeTarget.workspace = workspace
- (name = aks_cluster_name,
- provisioning_configuration = prov_config)
Correct: This is the correct and complete command to run for this scenario.
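A minimal sketch of creating an AKS compute target with the standard (default) configuration; the workspace is loaded from config.json and 'aks-cluster' is an example name:
from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget

ws = Workspace.from_config()
prov_config = AksCompute.provisioning_configuration()  # standard configuration

aks_target = ComputeTarget.create(
    workspace=ws,
    name='aks-cluster',
    provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)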
81. You decide to use a two-class logistic regression model for a binary classification. If you have to evaluate the results for imbalance issues, what would be the best evaluation metric for the model?
- Mean Absolute Error
- Relative Squared Error
- AUC Curve (CORRECT)
- Relative Absolute Error
Correct: One can inspect the true positive rate vs. the false positive rate in the Receiver Operating Characteristic (ROC) curve and the corresponding Area Under the Curve (AUC) value.
82. Your task is to create and evaluate a model. You decide to use a metric whose value is directly proportional to how well the model fits.
What is the evaluation model described above?
- Root Mean Square Error (RMSE)
- Mean Square Error (MSE)
- Coefficient of Determination (known as R-squared or R2) (CORRECT)
Correct: This is the evaluation metric described. In essence, this metric represents how much of the variance between predicted and actual label values the model is able to explain.
83. The Precision and Recall metrics are derived from four possible prediction outcomes.
What is the outcome in the scenario where the predicted label is 1, but the actual label is 0?
- False Negative
- True Negative
- True Positive
- False Positive (CORRECT)
Correct: This outcome happens when the predicted label is 1, but the actual label is 0.
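A toy sketch of how the four outcomes feed the Precision and Recall metrics in scikit-learn; the labels below are made up:
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 1]  # the two (actual=0, predicted=1) cases are false positives

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print('TN, FP, FN, TP:', tn, fp, fn, tp)
print('precision =', precision_score(y_true, y_pred))  # TP / (TP + FP)
print('recall    =', recall_score(y_true, y_pred))     # TP / (TP + FN)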
84. In order to register a datastore in a Machine Learning services workspace, one of your coworkers decides to use the code below:
Datastore.register_azure_blob_container(workspace=ws,
datastore_name=‘demo_datastore’,
container_name=‘demo_datacontainer’,
account_name=’demo_account’,
account_key=’0A0A0A-0A00A0A-0A0A0A0A0A0’,
create_if_not_exists=True)
You want to be able to access the datastore by using a notebook. If you want to achieve this goal, what code should you write for completing the following snippet segment?
import azureml.core
from azureml.core import Workspace, Datastore
ws = Workspace.from_config()
datastore = <add answer here> .get( <add answer here>, ‘<add answer here>’)
- Datastore, ws, demo_datastore (CORRECT)
- Run, ws, demo_datastore
- Run, experiment, demo_datastore
- Experiment, run, demo_account
Correct: Datastore.get(ws, ‘demo_datastore’) retrieves the datastore registered under that name from the workspace.
85. You decide to register and train a model in your Azure Machine Learning workspace.
Your pipeline needs to ensure that the client applications are able to use the model for batch inferencing.
Your single ParallelRunStep step pipeline uses a Python inferencing script in order to obtain predictions from the input data.
Your task is to configure the inferencing script for the ParallelRunStep pipeline step.
Which are the most suitable two functions that you should use? Keep in mind that every correct answer presents a part of the solution.
- main()
- score(mini_batch)
- batch()
- run(mini_batch) (CORRECT)
- init() (CORRECT)
Correct: init() is called once when the step starts and performs any setup, such as loading the model; run(mini_batch) is called for each batch of data to be processed.
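A skeleton of what such an inferencing (entry) script might look like; the model name ‘my-model’ is hypothetical:
import joblib
from azureml.core import Model

def init():
    # Called once when the ParallelRunStep starts: load the registered model
    global model
    model_path = Model.get_model_path('my-model')  # hypothetical model name
    model = joblib.load(model_path)

def run(mini_batch):
    # Called once per mini-batch: return one result per input item
    results = []
    for item in mini_batch:
        prediction = model.predict([item])
        results.append(str(prediction[0]))
    return results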
86. You decide to deploy a real-time inference service for a trained model.
Your model supports a business-critical application, and you have to ensure you can monitor the data that is submitted to the web service, as well as the predictions it generates.
While keeping the administrative effort to a minimum, you have to be able to implement a monitoring solution for the model deployed. What action should you take?
- az ml ws create -w ‘aml-workspace’ -g ‘aml-resources’
- az ml workspace create -w ‘aml-workspace’ -g ‘aml-resources’ (CORRECT)
- new az ml workspace create -w ‘aml-workspace’ -g ‘aml-resources’
- az ml new workspace create -w ‘aml-workspace’ -g ‘aml-resources’
Correct: This is the correct and complete command to run for this scenario.
87. After installing the Azure Machine Learning CLI extension, you decide to use it to set up an ML workspace in your existing resource group.
What Azure CLI command should you choose for this task?
- compute_config = AmlCompute_provisioning_configuration(vm_size=’STANDARD_DS11_V2′,
- min_nodes=0, max_nodes=4,
- vm_priority=’dedicated’)
- compute_config = AmlCompute.provisioning_configuration(vm_size=’STANDARD_DS11_V2′, (CORRECT)
- min_nodes=0, max_nodes=4,
- vm_priority=’dedicated’)
- compute_config = AmlCompute.provisioning_configuration(vm_size=’STANDARD_DS11_V2′,
- min_nodes=0, max_nodes=0,
- vm_priority=’dedicated’)
- compute_config = AmlCompute.provisioning.configuration(vm_size=’STANDARD_DS11_V2′,
- min_nodes=0, max_nodes=4,
- vm_priority=’dedicated’)
Correct: These are the correct commands for the job.
88. Your task is to use the SDK in order to define a compute configuration for a managed compute target.
Which of the following commands will return you the expected result?
- Evaluate alternative actions. (CORRECT)
- Make a decision and test it.
- Recognize an ethical issue.
- Act and reflect on the outcome.
89. What Python code should you write if your goal is to implement a median stopping policy?
- from azureml.train.hyperdrive import MedianStoppinPolicy
- early_termination_policy = MedianStoppingPolicy(slack_amount = 0.2,
- evaluation_interval=1,
- delay_evaluation=5)
- from azureml.train.hyperdrive import MedianStoppingPolicy
- early_termination_policy = MedianStoppingPolicy(truncation_percentage=10,
- evaluation_interval=1,
- delay_evaluation=5)
- from azureml.train.hyperdrive import MedianStoppingPolicy (CORRECT)
- early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
- delay_evaluation=5)
Correct: This is the correct code for this task.
90. Your task is to backfill a dataset monitor for the previous 5 months, based on changes made to the data on a monthly basis.
What code should you write in the SDK to achieve this goal?
- import datetime as dt
- backfill = monitor.backfill( dt.datetime.now(), dt.timedelta(months=5), dt.datetime.now())
- import datetime as dt
- backfill = monitor_backfill( dt.datetime.now(), dt.timedelta(months=5), dt.datetime.now())
- import datetime as dt
- backfill = monitor_backfill( dt.datetime.now() – dt.timedelta(months=5), dt.datetime.now())
- import datetime as dt (CORRECT)
- backfill = monitor.backfill( dt.datetime.now() – dt.timedelta(months=5), dt.datetime.now())
Correct: This is the correct code.
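One caveat worth noting: Python’s datetime.timedelta has no months argument, so a five-month window is usually approximated in weeks or days. A sketch of the date arithmetic, with the backfill call left as a comment because it assumes an existing DataDriftDetector named monitor:
import datetime as dt

end = dt.datetime.now()
start = end - dt.timedelta(weeks=5 * 4)  # roughly five months back; timedelta has no 'months' argument
print(start, end)
# backfill = monitor.backfill(start, end)  # 'monitor' would be an existing DataDriftDetector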
91. You decided to use the AirBnB Housing dataset and the Linear Regression algorithm for which you want to tune the Hyperparameters.
At this point, you have executed a train/test split on the dataset and built a pipeline for the linear regression.
You now want to use ParamGridBuilder() to test the maximum number of iterations, whether to fit an intercept with the y axis, and whether to standardize the features.
Considering this scenario, what code should you write?
- from pyspark.ml.tuning import ParamGridBuilder
- paramGrid = (ParamGridBuilder(lr)
- .addGrid(lr.maxIter, [1, 10, 100])
- .addGrid(lr.fitIntercept, [True, False])
- .addGrid(lr.standardization, [True, False])
- .run()
- )
- from pyspark.ml.tuning import ParamGridBuilder (CORRECT)
- paramGrid = (ParamGridBuilder()
- .addGrid(lr.maxIter, [1, 10, 100])
- .addGrid(lr.fitIntercept, [True, False])
- .addGrid(lr.standardization, [True, False])
- .build()
- )
- from pyspark.ml.tuning import ParamGridBuilder
- paramGrid = (ParamGridBuilder()
- .addGrid(lr.maxIter, [1, 10, 100])
- .addGrid(lr.fitIntercept, [True, False])
- .addGrid(lr.standardization, [True, False])
- .search()
- )
- from pyspark.ml.tuning import ParamGridBuilder
- paramGrid = (ParamGridBuilder(lr)
- .addGrid(lr.maxIter, [1, 10, 100])
- .addGrid(lr.fitIntercept, [True, False])
- .addGrid(lr.standardization, [True, False])
- .create()
- )
Correct: This is the correct code for this task.
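A minimal sketch of this grid wired into a CrossValidator, assuming the pipeline’s LinearRegression stage is called lr; the label column ‘price’ and the training DataFrame trainDF are hypothetical:
from pyspark.ml.regression import LinearRegression
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import RegressionEvaluator

lr = LinearRegression(featuresCol='features', labelCol='price')  # 'price' is an example label column

paramGrid = (ParamGridBuilder()
             .addGrid(lr.maxIter, [1, 10, 100])
             .addGrid(lr.fitIntercept, [True, False])
             .addGrid(lr.standardization, [True, False])
             .build())

cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=paramGrid,
                    evaluator=RegressionEvaluator(labelCol='price'),
                    numFolds=3)
# cvModel = cv.fit(trainDF)  # trainDF would be the featurized training DataFrame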
92. Choose from the list below the cross-validation technique that belongs to the exhaustive type.
- K-fold cross-validation
- Holdout cross-validation
- Leave-p-out cross-validation (CORRECT)
- Leave-one-out cross-validation (CORRECT)
Correct: Leave-p-out cross-validation (LpO CV) is an exhaustive type of cross-validation technique. It involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample on a validation set of p observations and a training set.
Correct: Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1, which makes it an exhaustive type of cross-validation.
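A toy scikit-learn sketch of both exhaustive splitters:
import numpy as np
from sklearn.model_selection import LeavePOut, LeaveOneOut

data = np.array([10, 20, 30, 40])

lpo = LeavePOut(p=2)            # every possible pair serves once as the validation set
print(lpo.get_n_splits(data))   # C(4, 2) = 6 splits
for train_idx, test_idx in lpo.split(data):
    print('train:', data[train_idx], 'test:', data[test_idx])

loo = LeaveOneOut()             # the p = 1 special case
print(loo.get_n_splits(data))   # 4 splits, one observation held out per split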
93. When you use HorovodRunner to develop a distributed training program, you usually take the following steps:
1. Create a HorovodRunner instance initialized with the number of nodes.
2. Define a Horovod training method following the approach described in Horovod usage, making sure to add any import statements inside the method.
What code should you write in Python to achieve this?
- hr = HorovodRunner(tf)
- def train():
- import tensorflow as np
- hvd.init(2)
- hr.run(train)
- hr = HorovodRunner()
- def train():
- import tensorflow as tf
- hvd.init(np)
- hr.run(train)
- hr = HorovodRunner(np=2) (CORRECT)
- def train():
- import tensorflow as tf
- hvd.init()
- hr.run(train)
- hr = HorovodRunner(np)
- def train():
- import tensorflow as tf
- hvd.init()
- hr.run(train)
Correct: This would be the correct code syntax.
94. You create a machine learning model by using the Azure Machine Learning designer. You publish the model as a real-time service on an Azure Kubernetes Service (AKS) inference compute cluster. You make no change to the deployed endpoint configuration.
You need to provide application developers with the information they need to consume the endpoint. Which two values should you provide to application developers? Each correct answer presents part of the solution.
- The run ID of the inference pipeline experiment for the endpoint
- The name of the inference pipeline for the endpoint
- The name of the AKS cluster where the endpoint is hosted
- The URL of the endpoint (CORRECT)
- The key for the endpoint (CORRECT)
Correct: You can get the URL of the endpoint from the scoring_uri.
Correct: When authentication is enabled, you can also use the SDK to get the authentication keys or tokens.
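A sketch of how a developer might consume the endpoint with those two values; the URL and key below are placeholders, not real values:
import json
import requests

scoring_uri = 'http://<your-endpoint>/score'  # from service.scoring_uri
key = '<your-endpoint-key>'                   # from service.get_keys()[0]

headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer ' + key}
payload = json.dumps({'data': [[2, 101.5, 1, 24, 21]]})

response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.json())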
95. Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person does not suffer from a disease. Your model classifies the case as having a disease”.
- True positives
- False negatives
- False positives (CORRECT)
- True negatives
Correct: A false positive is an outcome where the model incorrectly predicts the positive class.
96. In order to predict the price of a student’s artwork, you have to rely on the following variables: the student’s length of education, degree type, and art form. You decide to set up a linear regression model that you will have to evaluate.
Solution: Apply the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC.
Is this solution effective?
- Yes
- No (CORRECT)
Correct: Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models, so the solution does not meet the goal; Mean Absolute Error, Root Mean Absolute Error, and Relative Absolute Error are appropriate for a linear regression model.
97. You have a Pandas DataFrame entitled df_sales that contains the sales data from each day. Your DataFrame contains these columns: year, month, day_of_month, sales_total. Which of the following code options should you choose if your goal is to return the average sales_total value?
- mean(df_sales[‘sales_total’])
- df_sales[‘sales_total’].avg()
- df_sales[‘sales_total’].mean() (CORRECT)
Correct: This code will return the average of the sales_total column values.
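A runnable sketch with a tiny stand-in DataFrame:
import pandas as pd

df_sales = pd.DataFrame({
    'year': [2023, 2023, 2024],
    'month': [1, 2, 1],
    'day_of_month': [15, 3, 20],
    'sales_total': [250.0, 300.0, 350.0],
})
print(df_sales['sales_total'].mean())  # 300.0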
98. If you use the sklearn.metrics classification report for evaluating how your model performs, what result do you get from the F1-Score metric?
- How many instances of this class are there in the test dataset
- Out of all of the instances of this class in the test dataset, how many did the model identify
- Of the predictions the model made for this class, what proportion were correct
- An average metric that takes both precision and recall into account. (CORRECT)
Correct: This is what the F1-Score provides.
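A runnable sketch showing where the F1-score appears in the report; the labels are made up:
from sklearn.metrics import classification_report, f1_score

y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

print(classification_report(y_true, y_pred))
print('F1 for class 1:', f1_score(y_true, y_pred))  # harmonic mean of precision and recall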
99. In order to train your K-Means clustering model that enables grouping observations into four clusters, you decide to use scikit-learn library. Considering this scenario, what method should you choose to create the K-Means object?
- model = Kmeans(n_init=4)
- model = KMeans(n_clusters=4) (CORRECT)
- model = Kmeans(max_iter=4)
Correct: The n_clusters parameter determines the number of clusters.
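A runnable sketch on toy two-dimensional data:
import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(40, 2)         # 40 random observations with 2 features
model = KMeans(n_clusters=4, n_init=10, random_state=0)
clusters = model.fit_predict(X)                  # assigns each observation to one of 4 clusters
print(clusters[:10])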
100. The layer described below is used to reduce the number of feature values that are extracted from images, while still retaining the key differentiating features.
- Convolutional layer
- Pooling layer (CORRECT)
- Flattening layer
Correct: After extracting feature values from images, pooling (or downsampling) layers are used to reduce the number of feature values while retaining the key differentiating features that have been extracted.
101. You are in the process of training a machine learning model. Your model has to be configured for testing as a real-time inference service. For the service you have to ensure low CPU utilization and less than 48 GB of RAM. While keeping cost and administrative overhead to a minimum, you have to make sure that the compute target for the deployed service is initialized in an automatic manner.
In this scenario, what is the most appropriate compute target?
- attached Azure Databricks cluster
- Azure Kubernetes Service (AKS) inference cluster
- Azure Container Instance (ACI) (CORRECT)
- Azure Machine Learning compute cluster
Correct: Azure Container Instances (ACI) are suitable only for small models less than 1 GB in size.
Use ACI for low-scale CPU-based workloads that require less than 48 GB of RAM. Note: Microsoft recommends using single-node Azure Kubernetes Service (AKS) clusters for dev-test of larger models.
102. You want to create a pipeline for which you defined three steps entitled as step1, step2, and step3.
Your goal is to run the pipeline as an experiment after the steps have been assigned to it.
Which of the following SDK command should you choose for this task?
- train_pipeline = Pipeline(workspace = ws, steps = [step1,step2,step3])
- experiment = Experiment(workspace = ws, name = ‘training-pipeline’)
- pipeline_run = experiment_submit(train_pipeline)
- train_pipeline = Pipeline(workspace = ws, steps = [step1,step2,step3])
- experiment = Experiment(workspace = ws)
- pipeline_run = experiment.submit(train_pipeline)
- train_pipeline = Pipeline(workspace = ws, steps = [step1,step2,step3]) (CORRECT)
- experiment = Experiment(workspace = ws, name = ‘training-pipeline’)
- pipeline_run = experiment.submit(train_pipeline)
- train_pipeline = Pipeline(workspace = ws, steps = [step1;step2;step3])
- experiment = Experiment(workspace = ws, name = ‘training-pipeline’)
- pipeline_run = experiment.submit(train_pipeline)
Correct: These are the correct and complete commands for this scenario.
103. You want to evaluate a Python NumPy array that has six data points with the following definition: data = [10, 20, 30, 40, 50, 60]
Your task is to use the k-fold algorithm implementation in the Python Scikit-learn machine learning library to generate the output that follows:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
In order to generate the output, you have to implement a cross-validation.
To give the correct answer, you have to replace the code comments that are bolded with some suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
from numpy import array
from sklearn.model_selection import #1st option
data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits= #2nd option , shuffle=True, random_state=1)
for train, test in kfold.split( #3rd option ):
    print('train: %s, test: %s' % (data[train], data[test]))
- K-means, 6, array
- K-fold, 3, array
- K-fold, 3, data (CORRECT)
- CrossValidation, 3, data
Correct: K-Folds cross-validator provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default).
The parameter n_splits ( int, default=3) is the number of folds. Must be at least 2.
104. Choose from the list below the supervised learning problem type that usually outputs quantitative values.
- Classification
- Regression (CORRECT)
- Clustering
Correct: This would be the algorithm used because you would predict a label based on numerical values.
105. Choose from the descriptions below the one that explains what a negative correlation of -1 means.
- For each unit increase in one variable, the same increase is seen in the other
- There is no association between the variables
- For each unit increase in one variable, the same decrease is seen in the other (CORRECT)
Correct: This is what a negative correlation of -1 indicates.
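A runnable sketch of a perfect negative correlation:
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = 10 - x                      # each unit increase in x is matched by a unit decrease in y
print(np.corrcoef(x, y)[0, 1])  # -1.0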
106. Your task is to extract from the experiments list the last run.
What code should you write in Python to achieve this?
- runs = client.search_runs(experiment_id, order_by=[“attributes.start_time desc”], max_results=1) (CORRECT)
- runs[0].data.metrics
- runs = client.search_runs(experiment_id, order_by=[“attributes.start_time”], max_results=1)
- runs[0].data.metrics
- runs = client.search_runs(experiment_id, order_by=[“attributes.start_time asce”], max_results=1)
- runs[0].data.metrics
- runs = client.search_runs(experiment_id, order_by=[“attributes.start_time desc”], max_results=3)
- runs[0].data.metrics
Correct: This is the correct code syntax.
107. You published a parametrized pipeline and you now want to be able to pass parameter values in the JSON payload for the REST interface.
What SDK commands are the most appropriate to achieve your goal?
- response = requests_post(rest_endpoint,
- json={“ExperimentName”: “run_training_pipeline”,
- “ParameterAssignments”: {“reg_rate”: 0.1}})
- response = requests.post(rest_endpoint, (CORRECT)
- headers=auth_header,
- json={“ExperimentName”: “run_training_pipeline”,
- “ParameterAssignments”: {“reg_rate”: 0.1}})
- response = requests.post(rest_endpoint,
- json=auth_header,
- headers={“ExperimentName”: “run_training_pipeline”,
- “ParameterAssignments”: {“reg_rate”: 0.1}})
- response = requests.post(rest_endpoint,
- headers()
- json={“ExperimentName”: “run_training_pipeline”,
- “ParameterAssignments”: {“reg_rate”: 0.1}})
Correct: These are the correct commands.
108. You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: You use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination.
Does the solution meet the goal?
- No
- Yes (CORRECT)
Correct: The metrics above are used for evaluating regression models. When you compare models, they are ranked by the metric you select for evaluation.
109. Your NumPy array has the shape (2,35). Considering this, what information can you get about the elements?
- The array contains 2 elements with the values of 2 and 35.
- The array contains 35 elements, all with the value 2.
- The array is two dimensional, consisting of two arrays with 35 elements each. (CORRECT)
Correct: A shape of (2,35) indicates a multidimensional array with two arrays, each containing 35 elements.
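A runnable sketch:
import numpy as np

arr = np.arange(70).reshape(2, 35)
print(arr.shape)    # (2, 35)
print(arr.ndim)     # 2 dimensions
print(len(arr[0]))  # 35 elements in each row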
110. You decided to use the LinearRegression class from the scikit-learn library to create your model object.
If you want to train the model, what should your next step be?
- Call the score() method of the model object, specifying the training feature and test feature arrays
- Call the predict() method of the model object, specifying the training feature and label arrays
- Call the fit() method of the model object, specifying the training feature and label arrays (CORRECT)
Correct: To train the model, use the fit() method.
111. Which are two appropriate ways to approach a problem when using multiclass classification?
- Rest minus One
- One and Rest
- One vs Rest (CORRECT)
- One vs One (CORRECT)
Correct: One vs Rest (OVR), in which a classifier is created for each possible class value, with a positive outcome for cases where the prediction is this class, and negative predictions for cases where the prediction is any other class.
Correct: One vs One (OVO), in which a classifier for each possible pair of classes is created.
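A runnable sketch of both strategies using scikit-learn’s wrappers on the three-class Iris dataset:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # 3 classifiers: one per class
print(len(ovo.estimators_))  # 3 classifiers: one per pair of classes (C(3, 2) = 3)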
112. Your task is to train a model entitled finance-data for the financial department, by using data in an Azure Storage blob container.
Your container has to be registered in an Azure Machine Learning workspace as a datastore and you have to make sure that an error will appear if the container does not exist.
Considering this scenario, what should be the continuation for the code below?
datastore = Datastore.<add answer here>(workspace = ws,
datastore_name = ‘finance_datastore’,
container_name = ‘finance-data’,
account_name = ‘fintrainingdatastorage’,
account_key = ‘FdhIWHDaiwh2…’,
<add answer here>)
- register_azure_blob_container, overwrite = True
- register_azure_blob_container, create_if_not_exists = False (CORRECT)
- register_azure_data_lake, create_if_not_exists = False
- register_azure_data_lake, overwrite = False
Correct: register_azure_blob_container registers an Azure blob container to the datastore, and create_if_not_exists = False (the default) means the container is not created if it does not exist, so an error occurs instead.
113. You decide to use the code below for the deployment of a model as an Azure Machine Learning real-time web service:
# ws, model, inference_config, and deployment_config defined previously
service = Model.deploy(ws, ‘classification-service’, [model], inference_config, deployment_config)
service.wait_for_deployment(True)
Your deployment does not succeed.
You have to troubleshoot the deployment failure in order to determine what actions were taken while deploying and to identify the one action that encountered a problem and didn’t succeed.
For this scenario, which of the following code snippets should you use?
- service.get_logs() (CORRECT)
- service.update_deployment_state()
- service.serialize()
- service.state
Correct: You can print out detailed Docker engine log messages from the service object.
You can view the log for ACI, AKS, and Local deployments.
114. In order to train models, you decide to use an Azure Machine Learning compute resource. You set up the compute resource in the following manner: – Minimum nodes: 1 – Maximum nodes: 5. You have to decrease the minimum number of nodes and to increase the maximum number of nodes to the following values: – Minimum nodes: 0 – Maximum nodes: 8
Considering that you have to reconfigure the compute resource, which three ways are able to return the expected result? Keep in mind that every correct answer presents a complete solution.
- Use the Azure Machine Learning designer.
- Run the refresh_state() method of the BatchCompute class in the Python SDK.
- Run the update method of the AmlCompute class in the Python SDK. (CORRECT)
- Use the Azure portal. (CORRECT)
- Use the Azure Machine Learning studio. (CORRECT)
Correct: The update(min_nodes=None, max_nodes=None, idle_seconds_before_scaledown=None) of the AmlCompute class updates the ScaleSettings for this AmlCompute target.
Correct: To change the nodes in the cluster, use the UI for your cluster in the Azure portal.
Correct: You can manage assets and resources in the Azure Machine Learning studio.
115. Your AirBnB Housing dataset contains median values for a number of variables, such as the number of rooms, per capita crime, and the economic status of residents.
Depending on the average number of rooms, you want to be able to predict the median home value by using Linear Regression.
You decided to use VectorAssembler to import the dataset and to create your column entitled features that includes a single input variable entitled rm.
At this moment, you have to fit the Linear Regression model.
Considering this scenario, what code should you write?
- from pyspark.ml.regression import LinearRegression
- lr = LinearRegression(featuresCol=”rm”, labelCol=”medv”)
- lrModel = lr_fit(bostonFeaturizedDF)
- from pyspark.ml import LinearRegression
- lr = LinearRegression(featuresCol=”rm “, labelCol=”medv”)
- lrModel = lr_fit(bostonFeaturizedDF)
- from pyspark import LinearRegression
- lr = LinearRegression(featuresCol=”features”, labelCol=”medv”)
- lrModel = lr.fit(bostonFeaturizedDF)
- from pyspark.ml.regression import LinearRegression (CORRECT)
- lr = LinearRegression(featuresCol=”features”, labelCol=”medv”)
- lrModel = lr.fit(bostonFeaturizedDF)
Correct: This is the correct code for the task.
116. You are using remote compute in Azure Machine Learning to run a training experiment.
The Conda environment used for the experiment includes both the mlflow and the azureml-contrib-run packages. In order to track the metrics that the experiment generates, you have to log them by using the MLflow package.
To give the correct answer, you have to replace the code comments that are bolded with some suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
import numpy as np
#1 Import library to log metrics
#2 Start logging for this run
reg_rate = 0.01
#3 Log the reg_rate metric
#4 Stop logging for this run
- #1 import mlflow, #2 mlflow.start_run(), #3 logger.info(‘ ..’), #4 mlflow.end_run()
- #1 from azureml.core import Run, #2 run = Run.get_context(), #3 logger.info(‘ ..’), #4 run.complete()
- #1 import mlflow, #2 mlflow.start_run(), #3 mlflow.log_metric(‘ ..’), #4 mlflow.end_run() (CORRECT)
- #1 import logging, #2 mlflow.start_run(), #3 mlflow.log_metric(‘ ..’), #4 run.complete()
Correct: #1 Import the mlflow and Workspace classes to access MLflow’s tracking URI and configure your workspace.
#2 mlflow.start_run() Set the MLflow experiment name with set_experiment() and start your training run with start_run().
#3 mlflow.log_metric(‘ ..’) Use log_metric() to activate the MLflow logging API and begin logging your training run metrics.
#4 mlflow.end_run() – Close the run with end_run() when logging is finished.
117. Your task is to extract local feature importance from a TabularExplainer.
What code should you write in the SDK to achieve this goal?
- local.tab_explanation = tab_explainer_explain_local(X_test[0:5])
- local_tab_features = local_tab_explanation.get_ranked_local_names()
- local_tab_importance = local_tab_explanation.get_ranked_local_values()
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
- local_tab_features = local_tab_explanation.get_feature_local_names()
- local_tab_importance = local_tab_explanation.get_ranked_local_values()
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
- local_tab_features = local_tab_explanation.get_feature_importance_dict ()
- local_tab_importance = local_tab_explanation.get_ranked_local_values()
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5]) (CORRECT)
- local_tab_features = local_tab_explanation.get_ranked_local_names()
- local_tab_importance = local_tab_explanation.get_ranked_local_values()
Correct: This is the correct code for this task.
118. If you want to string together all the different possible hyperparameters that you need for testing, what is the most suitable PySpark class method you should choose?
- ParamGridBuilder() (CORRECT)
- ParamBuilder()
- ParamGridSearch()
- ParamSearch()
Correct: ParamGridBuilder() allows you to string together all of the different possible hyperparameters you would like to test. In this case, you can test the maximum number of iterations, whether you want to use an intercept with the y axis, and whether you want to standardize your features.
119. Your task is to clean up the deployments and terminate the “dev” ACI webservice by making use of the Azure ML SDK after your work with Azure Machine Learning has ended.
What is the most suitable method in order to achieve this goal?
- dev_webservice.remove()
- dev_webservice.terminate()
- dev_webservice.flush()
- dev_webservice.delete() (CORRECT)
Correct: Because ACI manages compute resources on your behalf, deleting the “dev” ACI webservice will remove all resources associated with the “dev” model deployment.
120. As a data scientist, you are asked to build a deep convolutional neural network (CNN) in order to classify images. Your CNN model seems to present some overfitting signs. Your goal is to minimize overfitting and to give an optimal fit to the model.
Considering this, what are the most appropriate two actions that you should take?
- Reduce the amount of training data
- Add an additional dense layer with 64 input units
- Add an additional dense layer with 512 input units
- Add L1/L2 regularization (CORRECT)
- Use training data augmentation (CORRECT)
Correct: Weight regularization provides an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set. L1L2: Sum of the absolute and the squared weights.
Correct: Training data augmentation effectively adds more training records, which should decrease overfitting.
121. You have the role of lead data scientist in a project that keeps records of birds’ health and migration. You decide to use a set of labeled bird photographs collected by experts for your multi-class image classification deep learning model.
The entire set of 200,000 bird photographs uses the JPG format and is kept in an Azure blob container in an Azure subscription. You have to ensure that the Azure Machine Learning service workspace used for deep learning model training can directly access the bird photograph files stored in the Azure blob container.
You have to keep data movement to a minimum. What action should you take?
- Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning service. (CORRECT)
- Copy the bird photographs to the blob datastore that was created with your Azure Machine Learning service workspace.
- Create an Azure Data Lake store and move the bird photographs to the store.
- Create and register a dataset by using TabularDataset class that references the Azure blob storage containing bird photographs.
- Create an Azure Cosmos DB database and attach the Azure Blob containing bird photographs storage to the database.
Correct: When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace.
122. Your task is to ensure that your data drift monitor, that you scheduled to run daily, is able to send an alert when the drift magnitude surpasses 0.2. What code should you write in Python to achieve this?
- alert_email = AlertConfiguration(‘data_scientists@contoso.com’) (CORRECT)
- monitor = DataDriftDetector.create_from_datasets(ws, ‘dataset-drift-detector’,
- baseline_data_set, target_data_set,
- compute_target=cpu_cluster,
- frequency=’Day’, latency=2,
- drift_threshold=.2,
- alert_configuration=alert_email)
- alert_email = AlertConfiguration(‘data_scientists@contoso.com’)
- monitor = DataDriftDetector.create_from_datasets(ws, ‘dataset-drift-detector’,
- baseline_data_set, target_data_set,
- compute_target=cpu_cluster,
- frequency=’Week’, latency=2,
- drift_threshold=.2,
- alert_configuration=alert_email)
- alert_email = AlertConfiguration(‘data_scientists@contoso.com’)
- monitor = DataDriftDetector.create_from_datasets(ws, ‘dataset-drift-detector’,
- baseline_data_set, target_data_set,
- compute_target=cpu_cluster,
- frequency=’Day’, latency=2,
- drift_threshold=.4)
Correct: This is the correct code for the task.
123. In order to find all the runs for a specific experiment, you can also use the search_runs method.
What code should you write in Python to achieve this?
- experiment = run.experiment_id
- runs_df = mlflow.search_runs(experiment_id)
- display(runs_df)
- experiment_id = run.info.experiment_id (CORRECT)
- runs_df = mlflow.search_runs(experiment_id)
- display(runs_df)
- experiment_id = info.experiment_id
- runs_df = mlflow.search_runs(experiment_id)
- display(runs_df)
- experiment_id = run.experiment_id
- runs_df = mlflow.search_runs(experiment_id)
- display(runs_df)
Correct: This is the correct code syntax.
124. You decided to use Azure Machine Learning and your goal is to train a Diabetes Model and build a container image for it.
You choose to make use of the scikit-learn ElasticNet linear regression model.
You want to use Azure Kubernetes Service (AKS) for the model deployment to production.
For deploying the model, you configured an AKS cluster.
At this point, you have deployed the image of the model to the desired AKS cluster.
After using different hyperparameters to train the new model, your goal is to deploy the new image of the model to the AKS cluster.
What code should you write for this task?
- prod_webservice.deploy (image=model_image_updated)
- prod_webservice.wait_for_deployment(show_output = True)
- prod_webservice.create (image=model_image_updated)
- prod_webservice.wait_for_deployment(show_output = True)
- prod_webservice.update(image=model_image_updated) (CORRECT)
- prod_webservice.wait_for_deployment(show_output = True)
- prod_webservice.delete (image=model_image_updated)
- prod_webservice.wait_for_deployment(show_output = True)
Correct: This is the correct code syntax.
125. What are the most appropriate SDK commands you should choose if you want to publish the pipeline that you created?
- published.pipeline = pipeline_publish(name=’training_pipeline’,
- description=’Model training pipeline’,
- version=’1.0′)
- published.pipeline = pipeline.publish(name=’training_pipeline’,
- description=’Model training pipeline’,
- version=’1.0′)
- published_pipeline = pipeline.publish(name=’training_pipeline’, (CORRECT)
- description=’Model training pipeline’,
- version=’1.0′)
- publishedpipeline = pipeline_publish(name=’training_pipeline’,
- description=’Model training pipeline’,
- version=’1.0′)
Correct: This is the correct command for publishing a pipeline using the SDK.
126. Yes or No?
In order to explain the model’s predictions, you have to calculate the importance of all the features, taking into account the overall global relative importance value, but also the measure of local importance for a certain set of predictions.
You decide to obtain the global and local feature importance values that you need by using an explainer.
Solution: Configure a PFIExplainer. Is this solution effective?
- Yes
- No (CORRECT)
Correct: The PFIExplainer doesn’t support local feature importance explanations.
127. Your task is to reduce the size of the feature maps that a convolutional layer generates when you create a convolutional neural network. What action should you take in this case?
- Increase the number of filters in the convolutional layer
- Add a pooling layer after the convolutional layer (CORRECT)
- Reduce the size of the filter kernel used in the convolutional layer
Correct: A pooling layer reduces the number of features in a feature map.
128. True or False?
In order to differentiate multiple images, convolutional filters and pooling are used by the feature extraction layers to emphasize edges, corners, and other patterns.
This solution is supposed to work for any other group of images that have the same dimensions set as the network input layer.
- True (CORRECT)
- False
Correct: The feature extraction layers apply convolutional filters and pooling to emphasize edges, corners, and other patterns in the images that can be used to differentiate them, and in theory should work for any set of images with the same dimensions as the input layer of the network.
129. You’re using the Azure Machine Learning Python SDK to define a pipeline to train a model.
The data used to train the model is read from a folder in a datastore.
You need to ensure the pipeline runs automatically whenever the data in the folder changes.
What should you do?
- Create a ScheduleRecurrence object with a Frequency of auto. Use the object to create a schedule for the pipeline
- Set the regenerate_outputs property of the pipeline to True
- Create a PipelineParameter with a default value that references the location where the training data is stored
- Create a Schedule for the pipeline. Specify the datastore in the datastore property, and the folder containing the training data in the path_on_datastore property (CORRECT)
Correct: To run a pipeline on a recurring basis, you’ll create a schedule. A Schedule associates a pipeline, an experiment, and a trigger. The trigger can either be a ScheduleRecurrence that describes the wait between jobs or a Datastore path that specifies a directory to watch for changes.
130. If you want to minimize disparity in the selection rate across sensitive feature groups, what is the most suitable parity constraint that you should choose to use with any of the mitigation algorithms?
- Equalized odds
- Bounded group loss
- Demographic parity (CORRECT)
- Error rate parity
Correct: This is the parity constraint described. For example, in a binary classification scenario, this constraint tries to ensure that an equal number of positive predictions are made in each group.
CONCLUSION – Exam Preparation Course 5
In conclusion, this module offers a thorough understanding of how core concepts are weighted and skills are assessed, equipping you for the full practice exam. You will also gain step-by-step guidance on scheduling your certification exam, ensuring you are fully prepared and informed about the certification process. By the end of this module, you will possess the knowledge and confidence to effectively manage both the evaluation and administrative steps needed to achieve your certification goals.