Machine & Deep Learning Algorithms: Imbalanced Datasets Using Pandas ML

The imbalanced-learn library, which integrates with Pandas ML, offers several techniques to address class imbalance in datasets used for classification. In this course, explore oversampling, undersampling, and techniques that combine the two. Begin by using Pandas ML to explore a dataset in which samples are not evenly distributed across target classes. Then apply oversampling with the RandomOverSampler class in the imbalanced-learn library, build a classification model with the oversampled data, and evaluate its performance. Next, learn how to create a balanced dataset with the Synthetic Minority Oversampling Technique (SMOTE) and how to undersample a dataset by applying the NearMiss, Cluster Centroids, and Neighborhood Cleaning Rule techniques. Then look at ensemble classifiers for imbalanced data, applying combination samplers, and finding correlations in a dataset. Finally, learn how to build a multilabel classification model, explore the use of principal component analysis (PCA), and see how to combine oversampling and PCA when building a classification model. The exercise involves working with imbalanced datasets.
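
As a rough illustration of the idea behind random oversampling, the sketch below duplicates randomly chosen minority-class rows until every class is as large as the biggest one. It uses only the Python standard library and is not the imbalanced-learn implementation; the function name random_oversample and the toy data are hypothetical, chosen for illustration only.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many samples as the largest class. (Hypothetical helper.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_features = list(features)
    out_labels = list(labels)
    for cls, count in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, label in enumerate(labels) if label == cls]
        # Draw with replacement to make up the shortfall for this class.
        for _ in range(target - count):
            pick = rng.choice(idx)
            out_features.append(features[pick])
            out_labels.append(labels[pick])
    return out_features, out_labels


# Hypothetical toy data: six samples of class 0, two samples of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = random_oversample(X, y)
print(Counter(y), "->", Counter(y_res))
```

Because the added rows are exact copies, a model trained on the result sees no new information about the minority class, which is one reason synthetic approaches such as SMOTE are also covered.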