Building ML Training Sets: Preprocessing Datasets for Classification

In this course, learners explore how to apply feature scaling techniques, such as standardization and normalization, to continuous data, and label encoding to the target, so that machine learning algorithms perform at their best. They also examine dimensionality reduction with Principal Component Analysis (PCA). The 6-video course begins by using the Pandas library to load a CSV data set into a data frame and scaling continuous features with a standard scaler. You will then learn how to build and evaluate a support vector classifier in scikit-learn, use Pandas and Seaborn to generate a heat map, and spot the correlations between features in a data set. Discover how to apply PCA to reduce the number of dimensions in your input data and obtain the explained variance of each principal component. In the course's final tutorial, you will explore how to apply normalization and PCA to data sets and build a classification model with the principal components of scaled data. The concluding exercise involves processing data for classification.
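
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the course's CSV file is not named here, so scikit-learn's bundled wine data set stands in for it, and the linear kernel, the two principal components, and the 25% test split are all assumptions chosen for brevity.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled wine data replaces the course's pd.read_csv(...) step.
wine = load_wine(as_frame=True)
features = wine.data  # DataFrame of continuous columns
labels = wine.target.map(lambda i: wine.target_names[i])  # string class names

# Pairwise feature correlations; Seaborn's heatmap can render this matrix.
corr = features.corr()

# Label-encode the string target into integers 0..n_classes-1.
y = LabelEncoder().fit_transform(labels)

X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize: fit the scaler on the training split only, then apply to both.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Reduce the scaled features to two principal components.
pca = PCA(n_components=2).fit(X_train_scaled)
explained = pca.explained_variance_ratio_

# Train and evaluate a support vector classifier on the components.
clf = SVC(kernel="linear").fit(pca.transform(X_train_scaled), y_train)
accuracy = clf.score(pca.transform(X_test_scaled), y_test)

print("explained variance ratio:", explained)
print("test accuracy:", accuracy)
```

To try normalization instead of standardization, one option is to swap `StandardScaler` for scikit-learn's `MinMaxScaler`, which rescales each feature to the range 0 to 1; whether that matches the course's exact meaning of "normalization" is an assumption.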

