Cloud Hadoop: Scaling Apache Spark

English | MP4 | AVC 1280×720 | AAC 48 kHz 2ch | 3h 13m | 481 MB

Apache Hadoop and Spark make it possible to generate genuine business insights from big data. The Amazon cloud is a natural home for this powerful toolset, providing a variety of services for running large-scale data-processing workflows. In this course, big data architect Lynn Langit shows how to implement your own Apache Hadoop and Spark workflows on AWS. Explore deployment options for production-scale jobs: virtual machines with EC2, managed Spark clusters with EMR, or containers with EKS. Learn how to configure and manage Hadoop clusters and Spark jobs with Databricks, and use Python (or the programming language of your choice) to import data and execute jobs. Plus, learn how to use Spark libraries for machine learning, genomics, and streaming. Each lesson helps you understand which deployment option is best for your workload.

Topics include:

  • File systems for Hadoop and Spark
  • Working with Databricks
  • Loading data into tables
  • Setting up Hadoop and Spark clusters on the cloud
  • Running Spark jobs
  • Importing and exporting Python notebooks
  • Executing Spark jobs in Databricks using Python and Scala
  • Importing data into Spark clusters
  • Coding and executing Spark transformations and actions
  • Data caching
  • Spark libraries: Spark SQL, SparkR, Spark ML, and more
  • Spark streaming
  • Scaling Spark with AWS and GCP
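
Several of the topics above (transformations and actions, data caching, running Spark jobs) rest on one idea: transformations such as flatMap, map, and reduceByKey are lazy, and nothing executes until an action like collect or count forces the pipeline to run. As a rough illustration only (plain Python generators standing in for Spark's lazy evaluation, not code from the course), the classic WordCount pipeline looks like this:

```python
from itertools import chain
from collections import Counter

# A plain-Python sketch of Spark's WordCount. In Spark, flatMap/map/
# reduceByKey are lazy *transformations*; work happens only when an
# *action* (e.g. collect) runs. Generators mimic that laziness here.
# The input data is made up for the example.
lines = ["to be or not to be", "to see or not to see"]

# Transformations: build a lazy pipeline; no work is done yet.
words = chain.from_iterable(line.split() for line in lines)  # ~ flatMap
pairs = ((w, 1) for w in words)                              # ~ map

# Action: consuming the pipeline (like reduceByKey + collect) runs it.
counts = Counter()
for word, n in pairs:
    counts[word] += n

print(counts["to"])  # 4
print(counts["be"])  # 2
```

In real Spark the same shape is written against an RDD or DataFrame, and the cluster partitions the data so each executor counts its own slice before the results are merged.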

Table of Contents

1 Scaling Apache Hadoop and Spark
2 What you should know
3 Using cloud services
4 Modern Hadoop and Spark
5 File systems used with Hadoop and Spark
6 Apache or commercial Hadoop distros
7 Hadoop and Spark libraries
8 Hadoop on Google Cloud Platform
9 Spark job on Google Cloud Platform
10 Sign up for Databricks Community Edition
11 Add Hadoop libraries
12 Databricks AWS Community Edition
13 Load data into tables
14 Hadoop and Spark cluster on AWS EMR
15 Run Spark job on AWS EMR
16 Review batch architecture for ETL on AWS
17 Apache Spark libraries
18 Spark data interfaces
19 Select your programming language
20 Spark session objects
21 Spark shell
22 Tour the Databricks environment
23 Tour the notebook
24 Import and export notebooks
25 Calculate Pi on Spark
26 Run WordCount on Spark with Scala
27 Import data
28 Transformations and actions
29 Caching and the DAG
30 Architect streaming for prediction
31 Spark SQL
32 SparkR
33 Spark ML: Preparing data
34 Spark ML: Building the model
35 Spark ML: Evaluating the model
36 Advanced machine learning on Spark
37 MXNet
38 Spark with ADAM for genomics
39 Spark architecture for genomics
40 Reexamine streaming pipelines
41 Spark Streaming
42 Streaming ingest services
43 Advanced Spark Streaming with MLeap
44 Scale Spark on the cloud by example
45 Build a quick start with Databricks AWS
46 Scale Spark cloud compute with VMs
47 Optimize cloud Spark virtual machines
48 Use AWS EKS containers and data lake
49 Optimize Spark cloud data tiers on Kubernetes
50 Build reproducible cloud infrastructure
51 Scale on GCP Dataproc or on Terra.bio
52 Continue learning for scaling
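
Lesson 25, "Calculate Pi on Spark," is the standard Monte Carlo demonstration: scatter random points in the unit square and count how many land inside the quarter circle. Spark's version parallelizes the sampling across executors; the underlying arithmetic, sketched here in plain single-machine Python (an illustration, not the lesson's notebook code), is:

```python
import random

# Monte Carlo estimate of pi: the fraction of random points in the
# unit square that fall inside the quarter circle approaches pi/4.
# Spark distributes the sampling across executors; this sketch runs
# the same math on one machine.
random.seed(42)  # seeded only to make the example reproducible

def estimate_pi(samples: int) -> float:
    inside = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:   # point lies in the quarter circle
            inside += 1
    return 4.0 * inside / samples  # ratio of areas, times 4

print(estimate_pi(100_000))        # roughly 3.14
```

More samples tighten the estimate, which is exactly why the exercise suits a cluster: each partition draws its own points and only the counts are combined.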