[Nov 22, 2023] Prepare For The DP-100 Question Papers In Advance [Q232-Q253]

DP-100 PDF Dumps Real 2023 Recently Updated Questions

The Microsoft DP-100 exam is a great way for data scientists to validate their skills and knowledge in Azure data science solutions. Passing the DP-100 exam shows that the candidate has the necessary skills to design, implement, and deploy data science solutions on Azure. Moreover, this certification can be a valuable asset for individuals who want to advance their career in the data science field, as it demonstrates their proficiency in various areas related to data science.

Microsoft DP-100 Exam Syllabus Topics:
Topic 1: Determine relative size of splits; resample a dataset to impose balance; adjust performance metric to resolve imbalances
Topic 2: Determine ideal split based on the nature of the data; determine number of splits; identify data imbalances
Topic 3: Select an algorithmic approach; consider data preparation steps that are specific to the selected algorithms
Topic 4: Determine appropriate performance metrics; implement appropriate algorithms
Topic 5: Analyze and recommend tools that meet system requirements; set up development environment
Topic 6: Assess the deployment environment constraints; select the development environment
Topic 7: Review visual analytics data to discover patterns and determine next steps; design a data sampling strategy

Q232. You plan to explore demographic data for home ownership in various cities. The data is in a CSV file with the following format:

age,city,income,home_owner
21,Chicago,50000,0
35,Seattle,120000,1
23,Seattle,65000,0
45,Seattle,130000,1
18,Chicago,48000,0

You need to run an experiment in your Azure Machine Learning workspace to explore the data and log the results. The experiment must log the following information:
* the number of observations in the dataset
* a box plot of income by home_owner
* a dictionary containing the city names and the average income for each city
You need to use the appropriate logging methods of the experiment's run object to log the required information. How should you complete the code? To answer, drag the appropriate code segments to the correct locations. Each code segment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Explanation
Box 1: log
The number of observations in the dataset: run.log(name, value, description='')
Scalar values: Log a numerical or string value to the run with the given name. Logging a metric to a run causes that metric to be stored in the run record in the experiment. You can log the same metric multiple times within a run, the result being considered a vector of that metric.
Example: run.log("accuracy", 0.95)
Box 2: log_image
A box plot of income by home_owner: log_image logs an image to the run record. Use log_image to log a .PNG image file or a matplotlib plot to the run. These images will be visible and comparable in the run record.
Example: run.log_image("ROC", plot=plt)
Box 3: log_table
A dictionary containing the city names and the average income for each city: log_table logs a dictionary object to the run with the given name.
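For reference, a minimal sketch of how the three logging calls might be combined in an experiment script, assuming the azureml-sdk v1 Run API and a pandas DataFrame loaded from the CSV (the file name and metric names are illustrative):

import pandas as pd
from azureml.core import Run

run = Run.get_context()                      # get the current experiment run
df = pd.read_csv('demographics.csv')         # illustrative input file name

# log: a scalar value - the number of observations
run.log('observations', len(df))

# log_image: a box plot of income by home_owner, logged as a matplotlib figure
ax = df.boxplot(column='income', by='home_owner')
run.log_image('income_by_home_owner', plot=ax.get_figure())

# log_table: a dictionary of city names and the average income for each city
avg_income = df.groupby('city')['income'].mean()
run.log_table('avg_income_by_city',
              {'city': list(avg_income.index),
               'avg_income': [float(v) for v in avg_income.values]})

run.complete()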
Q233. You need to modify the inputs for the global penalty event model to address the bias and variance issue. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Q234. You register a model that you plan to use in a batch inference pipeline. The batch inference pipeline must use a ParallelRunStep step to process files in a file dataset. The script that the ParallelRunStep step runs must process six input files each time the inferencing function is called. You need to configure the pipeline. Which configuration setting should you specify in the ParallelRunConfig object for the ParallelRunStep step?
A. process_count_per_node= "6"
B. node_count= "6"
C. mini_batch_size= "6"
D. error_threshold= "6"
mini_batch_size controls how much data each call to the entry script's run() function processes; for a FileDataset input it is the number of files passed per call, so mini_batch_size= "6" meets the requirement. By contrast, node_count is only the number of nodes in the compute target used for running the ParallelRunStep.
Reference: https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.parallelrunconfig?view=azure-ml-py
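For reference, a minimal sketch of this configuration, assuming the v1 azureml-pipeline-steps API (the script, environment, compute target, dataset, and output names are illustrative and assumed to be defined elsewhere):

from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

parallel_run_config = ParallelRunConfig(
    source_directory='scripts',            # illustrative folder holding the entry script
    entry_script='batch_scoring.py',       # illustrative scoring script with init()/run()
    mini_batch_size='6',                   # six files per run() call for a FileDataset input
    error_threshold=10,                    # tolerated failures before the job aborts
    output_action='append_row',
    environment=batch_env,                 # an azureml.core.Environment defined elsewhere
    compute_target=compute_target,         # an existing compute cluster defined elsewhere
    node_count=2)                          # nodes in the cluster, independent of batch size

parallel_step = ParallelRunStep(
    name='batch-inference',
    parallel_run_config=parallel_run_config,
    inputs=[input_file_dataset.as_named_input('input_files')],  # a FileDataset defined elsewhere
    output=output_dir)                     # an OutputFileDatasetConfig defined elsewhere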
Q235. You create an experiment in Azure Machine Learning Studio. You add a training dataset that contains 10,000 rows. The first 9,000 rows represent class 0 (90 percent). The remaining 1,000 rows represent class 1 (10 percent). The training set is imbalanced between the two classes. You must increase the number of training examples for class 1 to 4,000 by using data rows. You add the Synthetic Minority Oversampling Technique (SMOTE) module to the experiment. You need to configure the module. Which values should you use? To answer, select the appropriate options in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.
Explanation:
Box 1: 300
When you type 300 (%), the module generates three times the number of existing minority cases (3,000 synthetic rows), which, added to the original 1,000 rows, brings class 1 to 4,000.
Box 2: 5
We should use 5 data rows. Use the Number of nearest neighbors option to determine the size of the feature space that the SMOTE algorithm uses when building new cases. A nearest neighbor is a row of data (a case) that is very similar to some target case. The distance between any two cases is measured by combining the weighted vectors of all features. By increasing the number of nearest neighbors, you get features from more cases. By keeping the number of nearest neighbors low, you use features that are more like those in the original sample.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote

Q236. An organization uses Azure Machine Learning service and wants to expand their use of machine learning. You have the following compute environments. The organization does not want to create another compute environment. You need to determine which compute environment to use for the following scenarios. Which compute types should you use? To answer, drag the appropriate compute environments to the correct scenarios. Each compute environment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets

Q237. You are building an experiment using the Azure Machine Learning designer. You split a dataset into training and testing sets. You select the Two-Class Boosted Decision Tree as the algorithm. You need to determine the Area Under the Curve (AUC) of the model. Which three modules should you use in sequence? To answer, move the appropriate modules from the list of modules to the answer area and arrange them in the correct order.

Q238. You need to define a process for penalty event detection. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
1 – Vary the length of frequency bands between modeling epochs.
2 – Standardize to mono audio clips.
3 – Use an Inverse Fourier transform on frequency changes over time.

Q239. You create an experiment in Azure Machine Learning Studio. You add a training dataset that contains 10,000 rows. The first 9,000 rows represent class 0 (90 percent). The remaining 1,000 rows represent class 1 (10 percent). The training set is imbalanced between the two classes. You must increase the number of training examples for class 1 to 4,000 by using data rows. You add the Synthetic Minority Oversampling Technique (SMOTE) module to the experiment. You need to configure the module. Which values should you use? To answer, select the appropriate options in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.

Q240. You train a classification model by using a decision tree algorithm. You create an estimator by running the following Python code. The variable feature_names is a list of all feature names, and class_names is a list of all class names.
from interpret.ext.blackbox import TabularExplainer
You need to explain the predictions made by the model for all classes by determining the importance of all features. For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-aml

Q241. You are creating a new Azure Machine Learning pipeline using the designer. The pipeline must train a model using data in a comma-separated values (CSV) file that is published on a website. You have not created a dataset for this file. You need to ingest the data from the CSV file into the designer pipeline using the minimal administrative effort. Which module should you add to the pipeline in Designer?
A. Convert to CSV
B. Enter Data Manually
C. Import Data
D. Dataset
Explanation
The preferred way to provide data to a pipeline is a Dataset object. The Dataset object points to data that lives in or is accessible from a datastore or at a Web URL. The Dataset class is abstract, so you will create an instance of either a FileDataset (referring to one or more files) or a TabularDataset that's created from one or more files with delimited columns of data.
Example:
from azureml.core import Dataset
iris_tabular_dataset = Dataset.Tabular.from_delimited_files([(def_blob_store, 'train-dataset/iris.csv')])
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline
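As a companion to the explanation above, a minimal sketch of ingesting a CSV published at a web URL as a TabularDataset, assuming the azureml-sdk v1 API (the URL and dataset name are illustrative):

from azureml.core import Dataset, Workspace

ws = Workspace.from_config()  # loads workspace details from a local config.json

# Create a TabularDataset directly from a web URL; no datastore upload is needed.
web_path = 'https://example.com/data/training-data.csv'  # illustrative URL
dataset = Dataset.Tabular.from_delimited_files(path=web_path)

# Optionally register the dataset so it can be selected in the designer.
dataset = dataset.register(workspace=ws, name='training-data', create_new_version=True)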
Q242. You train a machine learning model. You must deploy the model as a real-time inference service for testing. The service requires low CPU utilization and less than 48 MB of RAM. The compute target for the deployed service must initialize automatically while minimizing cost and administrative overhead. Which compute target should you use?
A. Azure Kubernetes Service (AKS) inference cluster
B. Azure Machine Learning compute cluster
C. Azure Container Instance (ACI)
D. attached Azure Databricks cluster
Explanation
Azure Container Instances (ACI) are suitable only for small models less than 1 GB in size. Use ACI for low-scale CPU-based workloads that require less than 48 GB of RAM.
Note: Microsoft recommends using single-node Azure Kubernetes Service (AKS) clusters for dev-test of larger models.
Reference: https://docs.microsoft.com/id-id/azure/machine-learning/how-to-deploy-and-where

Q243. You have a dataset that contains over 150 features. You use the dataset to train a Support Vector Machine (SVM) binary classifier. You need to use the Permutation Feature Importance module in Azure Machine Learning Studio to compute a set of feature importance scores for the dataset. In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.
Explanation:
Step 1: Add a Two-Class Support Vector Machine module to initialize the SVM classifier.
Step 2: Add a dataset to the experiment.
Step 3: Add a Split Data module to create training and test datasets. To generate a set of feature scores requires that you have an already trained model, as well as a test dataset.
Step 4: Add a Permutation Feature Importance module and connect it to the trained model and test dataset.
Step 5: Set the Metric for measuring performance property to Classification - Accuracy and then run the experiment.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-support-vector-machine
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/permutation-feature-importance

Q244. You plan to use the Hyperdrive feature of Azure Machine Learning to determine the optimal hyperparameter values when training a model. You must use Hyperdrive to try combinations of the following hyperparameter values:
* learning_rate: any value between 0.001 and 0.1
* batch_size: 16, 32, or 64
You need to configure the search space for the Hyperdrive experiment. Which two parameter expressions should you use? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. a choice expression for learning_rate
B. a uniform expression for learning_rate
C. a normal expression for batch_size
D. a choice expression for batch_size
E. a uniform expression for batch_size
B: Continuous hyperparameters are specified as a distribution over a continuous range of values. Supported distributions include:
* uniform(low, high) - Returns a value uniformly distributed between low and high
D: Discrete hyperparameters are specified as a choice among discrete values. choice can be:
* one or more comma-separated values
* a range object
* any arbitrary list object
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
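A minimal sketch of that search space, assuming the azureml.train.hyperdrive v1 API (random sampling is shown for illustration, and the argument names are illustrative script arguments):

from azureml.train.hyperdrive import RandomParameterSampling, choice, uniform

# learning_rate: continuous range; batch_size: discrete set of values
param_sampling = RandomParameterSampling({
    '--learning_rate': uniform(0.001, 0.1),  # any value between 0.001 and 0.1
    '--batch_size': choice(16, 32, 64)       # one of three discrete values
})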
Q245. You create an Azure Machine Learning workspace and install the MLflow library. You need to log different types of data by using the MLflow library. Which method should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
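The hot-area choices are not reproduced here, but for reference, a minimal sketch of MLflow methods for logging different types of data, assuming the open-source mlflow package (the parameter, metric, and file names are illustrative):

import mlflow

with mlflow.start_run():
    mlflow.log_param('learning_rate', 0.01)    # a single parameter value
    mlflow.log_metric('accuracy', 0.95)        # a numeric metric
    mlflow.log_dict({'classes': ['cat', 'dog']}, 'labels.json')  # a dictionary, stored as an artifact
    mlflow.log_artifact('model_summary.txt')   # an arbitrary local file (must already exist)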
Q246. You create a multi-class image classification deep learning model that uses the PyTorch deep learning framework. You must configure Azure Machine Learning Hyperdrive to optimize the hyperparameters for the classification model. You need to define a primary metric to determine the hyperparameter values that result in the model with the best accuracy score. Which three actions must you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to maximize.
B. Add code to the bird_classifier_train.py script to calculate the validation loss of the model and log it as a float value with the key loss.
C. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to minimize.
D. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to accuracy.
E. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to loss.
F. Add code to the bird_classifier_train.py script to calculate the validation accuracy of the model and log it as a float value with the key accuracy.
Explanation
A, D: primary_metric_name="accuracy", primary_metric_goal=PrimaryMetricGoal.MAXIMIZE. Optimize the runs to maximize "accuracy". Make sure to log this value in your training script.
Note:
primary_metric_name: The name of the primary metric to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script.
primary_metric_goal: It can be either PrimaryMetricGoal.MAXIMIZE or PrimaryMetricGoal.MINIMIZE and determines whether the primary metric will be maximized or minimized when evaluating the runs.
F: The training script calculates the val_accuracy and logs it as "accuracy", which is used as the primary metric.
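For reference, a minimal sketch of how these settings fit together in a HyperDriveConfig, assuming the azureml.train.hyperdrive v1 API (the ScriptRunConfig and the sampling shown are illustrative):

from azureml.train.hyperdrive import (HyperDriveConfig, PrimaryMetricGoal,
                                      RandomParameterSampling, choice)

hyperdrive_config = HyperDriveConfig(
    run_config=script_run_config,        # a ScriptRunConfig for bird_classifier_train.py, defined elsewhere
    hyperparameter_sampling=RandomParameterSampling({'--batch_size': choice(16, 32, 64)}),
    primary_metric_name='accuracy',      # must exactly match the key logged by the training script
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20)

# Inside bird_classifier_train.py, the script logs the metric under the same key:
#     from azureml.core import Run
#     Run.get_context().log('accuracy', float(val_accuracy))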
Q247. You have a binary classifier that predicts positive cases of diabetes within two separate age groups. The classifier exhibits a high degree of disparity between the age groups. You need to modify the output of the classifier to maximize its degree of fairness across the age groups and meet the following requirements:
* Eliminate the need to retrain the model on which the classifier is based.
* Minimize the disparity between true positive rates and false positive rates across age groups.
Which algorithm and parity constraint should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Q248. You are developing a data science workspace that uses an Azure Machine Learning service. You need to select a compute target to deploy the workspace. What should you use?
A. Azure Data Lake Analytics
B. Azure Databricks
C. Apache Spark for HDInsight
D. Azure Container Service

Q249. You are evaluating a Python NumPy array that contains six data points defined as follows:
data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output. How should you complete the code segment? To answer, select the appropriate code segment in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.
Explanation
Box 1: k-fold
Box 2: 3
K-Folds cross-validator provides train/test indices to split data in train/test sets. It splits the dataset into k consecutive folds (without shuffling by default). The parameter n_splits (int, default=3) is the number of folds and must be at least 2.
Box 3: data
Example:
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
References: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

Q250. You plan to use the Hyperdrive feature of Azure Machine Learning to determine the optimal hyperparameter values when training a model. You must use Hyperdrive to try combinations of the following hyperparameter values. You must not apply an early termination policy.
* learning_rate: any value between 0.001 and 0.1
* batch_size: 16, 32, or 64
You need to configure the sampling method for the Hyperdrive experiment. Which two sampling methods can you use? Each correct answer is a complete solution.
NOTE: Each correct selection is worth one point.
A. Grid sampling
B. No sampling
C. Bayesian sampling
D. Random sampling
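For reference: grid sampling supports only choice values, while random and Bayesian sampling both accept a mix of continuous and discrete expressions, and Bayesian sampling does not support early termination policies. A minimal sketch of the two sampling configurations, assuming the azureml.train.hyperdrive v1 API:

from azureml.train.hyperdrive import (BayesianParameterSampling,
                                      RandomParameterSampling, choice, uniform)

search_space = {
    '--learning_rate': uniform(0.001, 0.1),  # continuous range
    '--batch_size': choice(16, 32, 64)       # discrete values
}

random_sampling = RandomParameterSampling(search_space)      # random combinations from the space
bayesian_sampling = BayesianParameterSampling(search_space)  # samples based on previous results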
Q251. You need to correct the model fit issue. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Explanation:
Step 1: Augment the data.
Scenario: Columns in each dataset contain missing and null values. The datasets also contain many outliers.
Step 2: Add the Bayesian Linear Regression module.
Scenario: You produce a regression model to predict property prices by using the Linear Regression and Bayesian Linear Regression modules.
Step 3: Configure the regularization weight.
Regularization typically is used to avoid overfitting. For example, in L2 regularization weight, type the value to use as the weight for L2 regularization. We recommend that you use a non-zero value to avoid overfitting.
Scenario: Model fit: The model shows signs of overfitting. You need to produce a more refined regression model that reduces the overfitting.
Incorrect Answers:
Multiclass Decision Jungle module: Decision jungles are a recent extension to decision forests. A decision jungle consists of an ensemble of decision directed acyclic graphs (DAGs).
L-BFGS: L-BFGS stands for "limited memory Broyden-Fletcher-Goldfarb-Shanno". It can be found in the Two-Class Logistic Regression module, which is used to create a logistic regression model that can be used to predict two (and only two) outcomes.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/linear-regression

Q252. You use Azure Machine Learning Studio to build a machine learning experiment. You need to divide data into two distinct datasets. Which module should you use?
A. Split Data
B. Load Trained Model
C. Assign Data to Clusters
D. Group Data into Bins
The Split Data module divides a dataset into two distinct sets. The Group Data into Bins module, by contrast, supports multiple options for binning data; you can customize how the bin edges are set and how values are apportioned into the bins.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins

Perform Feature Engineering
Testlet 1
Case study
Overview
You are a data scientist in a company that provides data science for professional sporting events. Models will use global and local market data to meet the following business goals:
* Understand sentiment of mobile device users at sporting events based on audio from crowd reactions.
* Assess a user's tendency to respond to an advertisement.
* Customize styles of ads served on mobile devices.
* Use video to detect penalty events.
Current environment
* Media used for penalty event detection will be provided by consumer devices. Media may include images and videos captured during the sporting event and shared using social media. The images and videos will have varying sizes and formats.
* The data available for model building comprises seven years of sporting event media. The sporting event media includes recorded video, transcripts of radio commentary, and logs from related social media feeds captured during the sporting events.
* Crowd sentiment will include audio recordings submitted by event attendees in both mono and stereo formats.
Penalty detection and sentiment
* Data scientists must build an intelligent solution by using multiple machine learning models for penalty event detection.
* Data scientists must build notebooks in a local environment using automatic feature engineering and model building in machine learning pipelines.
* Notebooks must be deployed to retrain by using Spark instances with dynamic worker allocation.
* Notebooks must execute with the same code on new Spark instances to recode only the source of the data.
* Global penalty detection models must be trained by using dynamic runtime graph computation during training.
* Local penalty detection models must be written by using BrainScript.
* Experiments for local crowd sentiment models must combine local penalty detection data.
* Crowd sentiment models must identify known sounds such as cheers and known catch phrases. Individual crowd sentiment models will detect similar sounds.
* All shared features for local models are continuous variables.
* Shared features must use double precision. Subsequent layers must have aggregate running mean and standard deviation metrics available.
Advertisements
During the initial weeks in production, the following was observed:
* Ad response rates declined.
* Drops were not consistent across ad styles.
* The distribution of features across training and production data is not consistent. Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features that come from location sources are being used as raw features. A suggested experiment to remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
* Initial data discovery shows a wide range of densities of target states in training data used for crowd sentiment models.
* All penalty detection models show that inference phases using Stochastic Gradient Descent (SGD) are running too slowly.
* Audio samples show that the length of a catch phrase varies between 25%-47% depending on region.
* The performance of the global penalty detection models shows lower variance but higher bias when comparing training and validation sets. Before implementing any feature changes, you must confirm the bias and variance using all training and validation cases.
* Ad response models must be trained at the beginning of each event and applied during the sporting event.
* Market segmentation models must optimize for similar ad response history.
* Sampling must guarantee mutual and collective exclusivity between local and global segmentation models that share the same features.
* Local market segmentation models will be applied before determining a user's propensity to respond to an advertisement.
* Ad response models must support non-linear boundaries of features.
* The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa deviates from 0.1 by +/- 5%.
* The ad propensity model uses cost factors shown in the following diagram:
* The ad propensity model uses proposed cost factors shown in the following diagram:
* Performance curves of current and proposed cost factor scenarios are shown in the following diagram:

Q253. You create a binary classification model using Azure Machine Learning Studio. You must use a Receiver Operating Characteristic (ROC) curve and an F1 score to evaluate the model. You need to create the required business metrics. How should you complete the experiment? To answer, select the appropriate options in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.

The Microsoft DP-100 exam covers a range of topics related to data science, including data ingestion, transformation, and storage. Candidates will be tested on their ability to design and implement solutions using Azure tools and services, such as Azure Machine Learning, Azure Cognitive Services, and Azure Data Factory. They will also be tested on their ability to work with big data technologies such as Hadoop and Spark.

DP-100 Dumps and Practice Test (403 Exam Questions): https://www.dumpleader.com/DP-100_exam.html