Learning Hadoop 2020

English | MP4 | AVC 1280×720 | AAC 48 kHz 2ch | 4h 06m | 491 MB

Hadoop is indispensable when it comes to processing big data, as necessary to understanding your information as servers are to storing it. This course is your introduction to Hadoop; the key file systems used with Hadoop; its processing engine, MapReduce; and its many libraries and programming tools. Developer and big-data consultant Lynn Langit shows how to set up a Hadoop development environment, run and optimize MapReduce jobs, code basic queries with Hive and Pig, and build workflows to schedule jobs. Plus, learn about the depth and breadth of the Apache Spark libraries available for use with a Hadoop cluster, as well as options for running machine learning jobs on a Hadoop cluster.

Table of Contents

Introduction
1 Getting started with Hadoop
2 What you should know before watching this course
3 Using cloud services

Why Change
4 Limits of relational database management systems
5 Introducing CAP (consistency, availability, partition tolerance)
6 Understanding big data

What Is Hadoop
7 Introducing Hadoop
8 Understanding Hadoop distributions
9 Understanding the difference between HBase and Hadoop
10 Exploring the future of Hadoop

Understanding Hadoop Core Components
11 Understanding Java Virtual Machines (JVMs)
12 Exploring HDFS and other file systems
13 Introducing Hadoop cluster components
14 Introducing Hadoop Spark
15 Exploring the Apache and Cloudera Hadoop distributions
16 Using the public cloud to host Hadoop: AWS or GCP

Setting Up a Hadoop Development Environment
17 Understanding the parts and pieces
18 Hosting Hadoop locally with the Cloudera developer distribution
19 Setting up the Cloudera Hadoop developer virtual machine
20 Adding Hadoop libraries to your test environment
21 Picking your programming language and IDE
22 Use GCP Dataproc for development

Understanding MapReduce 1.0
23 Understanding MapReduce 1.0
24 Exploring the components of a MapReduce job
25 Working with the Hadoop file system
26 Running a MapReduce job using the console
27 Reviewing the code for a MapReduce WordCount job
28 Running and tracking Hadoop jobs

Tuning MapReduce
29 Tuning by physical methods
30 Tuning a Mapper
31 Tuning a Reducer
32 Using a cache for lookups

Understanding MapReduce 2.0 (YARN)
33 Understanding MapReduce 2.0
34 Coding a basic WordCount in Java using MapReduce 2.0
35 Exploring advanced WordCount in Java using MapReduce 2.0

Understanding Hive
36 Introducing Hive and HBase
37 Understanding Hive
38 Revisiting WordCount using Hive
39 Understanding more about HQL query optimization
40 Using Hive in GCP Dataproc

Understanding Pig
41 Introducing Pig
42 Understanding Pig
43 Exploring use cases for Pig
44 Exploring Pig tools in GCP Dataproc

Understanding Workflows and Connectors
45 Introducing Oozie
46 Building a workflow with Oozie
47 Introducing Sqoop
48 Importing data with Sqoop
49 Introducing ZooKeeper
50 Coordinating workflows with ZooKeeper

Using Spark
51 Introducing Apache Spark
52 Running a Spark job to calculate Pi
53 Running a Spark job in a Jupyter Notebook

Hadoop Today
54 Understanding machine learning options
55 Understanding data lakes
56 Visualizing Hadoop systems

Next Steps
57 Next steps with Hadoop