Data Analysis Using the Spark DataFrame API
Apache Spark, an open-source cluster-computing framework used for data science, has become the de facto big data framework. In this Skillsoft Aspire course, learners explore how to analyze real data sets by using DataFrame API methods. Discover how to optimize operations with shared variables and combine data from multiple DataFrames using joins. Explore the features that make Spark 2.x significantly faster than Spark 1.x. Other topics include how to create a Spark DataFrame from a CSV file; apply DataFrame transformations, grouping, and aggregation; and perform operations on a DataFrame to analyze categories of data in a data set. Visualize the contents of a Spark DataFrame with Matplotlib. Conclude by studying how to broadcast variables and save DataFrame contents in text file format.