Hadoop and Spark Fundamentals

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 14h 07m | 4.30 GB

The perfect (and fast) way to get started with Hadoop and Spark

Hadoop and Spark Fundamentals LiveLessons provides 9+ hours of video introduction to the Apache Hadoop Big Data ecosystem. The tutorial includes background information and explains the core components of Hadoop, including the Hadoop Distributed File System (HDFS), MapReduce, the YARN resource manager, and YARN Frameworks. In addition, it demonstrates how to use Hadoop at several levels, including the native Java interface, C++ Pipes, and the universal streaming program interface. Examples include how to use benchmarks and high-level tools, including the Apache Pig scripting language, the Apache Hive “SQL-like” interface, Apache Flume for streaming input, Apache Sqoop for import and export of relational data, and Apache Oozie for Hadoop workflow management. In addition, there is comprehensive coverage of Spark, PySpark, and the Zeppelin web-GUI. The steps for easily installing a working Hadoop/Spark system on a desktop/laptop and on a local stand-alone cluster using the powerful Ambari GUI are also included. All software used in these LiveLessons is open source and freely available for your use and experimentation. A bonus lesson includes a quick primer on the Linux command line as used with Hadoop and Spark.

Learn How To

  • Understand Hadoop design and key components
  • Understand how the MapReduce process works in Hadoop
  • Understand the relationship of Spark and Hadoop
  • Understand key aspects of the new YARN design and Frameworks
  • Use, administer, and program HDFS
  • Run and administer Hadoop/Spark programs
  • Write basic MapReduce/Spark programs
  • Install Hadoop/Spark on a laptop/desktop
  • Run Apache Pig, Hive, Flume, Sqoop, Oozie, Spark applications
  • Perform basic data ingest with Hive and Spark
  • Use the Zeppelin web-GUI for Spark/Hive programming
  • Install and administer Hadoop with the Apache Ambari GUI tool

Table of Contents

1 Hadoop and Spark Fundamentals – Introduction
2 Learning objectives
3 1.1 Understand Big Data and analytics
4 1.2 Understand Hadoop as a data platform
5 1.3 Understand Hadoop MapReduce basics
6 1.4 Understand Spark language basics
7 1.5 Learn the Linux command line features
8 Learning objectives
9 2.1 Install Hortonworks Hadoop and Spark HDP Sandbox
10 2.2 Install from Hadoop sources–Part 1
11 2.2 Install from Hadoop sources–Part 2
12 2.3 Install from Spark sources
13 Learning objectives
14 3.1 Understand HDFS basics
15 3.2 Use HDFS command line tools
16 3.3 Use HDFS in programs
17 3.4 Utilize additional features of HDFS
18 Learning objectives
19 4.1 Understand the MapReduce paradigm
20 4.2 Develop and run a Java MapReduce application
21 4.3 Understand how MapReduce works
22 Learning objectives
23 5.1 Use the Streaming interface
24 5.2 Use the Pipes interface
25 5.3 Run the Hadoop grep example
26 5.4 Debugging MapReduce
27 5.5 Understand Hadoop Version 2 MapReduce
28 5.6 Use Hadoop Version 2 features–Part 1
29 5.6 Use Hadoop Version 2 features–Part 2
30 Learning objectives
31 6.1 Demonstrate a Pig example
32 6.2 Demonstrate a Hive example
33 6.3 Demonstrate an Oozie example–Part 1
34 6.3 Demonstrate an Oozie example–Part 2
35 Learning objectives
36 7.1 Learn Spark language basics
37 7.2 Demonstrate a PySpark command line example
38 Learning objectives
39 8.1 Import data into Hive tables
40 8.2 Use Spark to import data into HDFS
41 8.3 Demonstrate a Flume example–Part 1
42 8.3 Demonstrate a Flume example–Part 2
43 8.4 Demonstrate a Sqoop example–Part 1
44 8.4 Demonstrate a Sqoop example–Part 2
45 Learning objectives
46 9.1 Understand Zeppelin features
47 9.2 Deconstruct a Spark application in Zeppelin
48 Learning objectives
49 10.1 Install and configure Hadoop using Ambari–Part 1
50 10.1 Install and configure Hadoop using Ambari–Part 2
51 10.2 Perform simple administration and monitoring with Ambari
52 10.3 Perform simple command line administration
53 10.4 Utilize additional features of HDFS
54 Hadoop and Spark Fundamentals – Summary