Building ML Training Sets: Introduction
There are numerous options available to scale and encode features and labels in data sets to get the best out of machine learning (ML) algorithms. In this 10-video course, explore techniques such as standardizing, normalizing, and one-hot encoding. Learners begin by learning how to use the Pandas library to load a data set from a CSV file and perform exploratory analysis on its features. Then use scikit-learn's Binarizer to transform the continuous data in a series to binary values; apply the MinMaxScaler on a data set so that two similar columns have the same range of values; and standardize multiple columns in data sets with scikit-learn's StandardScaler. Examine differences between the Normalizer and other scaling techniques, and learn how to represent values in a column as a proportion of the maximum absolute value by using the MaxAbsScaler. Finally, discover how to use the Pandas library to one-hot encode one or more features of your data set, and distinguish between this technique and label encoding. The concluding exercise involves building ML training sets.
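As a rough preview of the techniques listed above, the following minimal sketch applies each one to a small data set. The DataFrame, its column names, and the threshold value are hypothetical stand-ins for illustration; they are not taken from the course, which loads its data from a CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import (
    Binarizer,
    LabelEncoder,
    MaxAbsScaler,
    MinMaxScaler,
    Normalizer,
    StandardScaler,
)

# Hypothetical toy data; in practice this might come from pd.read_csv(...)
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
    "balance": [-20.0, 5.0, 10.0, 40.0],
    "color": ["red", "green", "blue", "green"],
})
numeric = df[["height_cm", "weight_kg"]]

# Binarizer: values strictly above the threshold become 1, all others 0
binary = Binarizer(threshold=165.0).fit_transform(df[["height_cm"]])

# MinMaxScaler: rescale each column to the range [0, 1]
minmax = MinMaxScaler().fit_transform(numeric)

# StandardScaler: shift each column to zero mean and unit variance
standard = StandardScaler().fit_transform(numeric)

# Normalizer: operates on rows, scaling each row to unit L2 norm
normalized = Normalizer(norm="l2").fit_transform(numeric)

# MaxAbsScaler: divide each column by its maximum absolute value
maxabs = MaxAbsScaler().fit_transform(df[["balance"]])

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df, columns=["color"])

# Label encoding: one integer per category, assigned in sorted order
labels = LabelEncoder().fit_transform(df["color"])
```

Two contrasts are worth noting in this sketch. The Normalizer rescales each row, whereas the other scalers work column by column. Label encoding maps categories to integers, which can suggest an ordering that the data does not have, while one-hot encoding avoids that at the cost of extra columns.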