Hadoop Developer In Real World

English | MP4 | AVC 1280×720 | AAC 48KHz 2ch | 18.5 Hours | 2.18 GB

Free Cluster Access * HDFS * MapReduce * YARN * Pig * Hive * Flume * Sqoop * AWS * EMR * Optimization * Troubleshooting

From the creators of the successful Hadoop Starter Kit course hosted on Udemy comes the Hadoop In Real World course. It is designed for anyone who aspires to a career as a Hadoop developer, and it covers the concepts that every aspiring Hadoop developer must know to SURVIVE in REAL WORLD Hadoop environments.

The course covers must-know topics such as HDFS, MapReduce, YARN, Apache Pig, and Hive, and explores each in depth. We don’t stop at the easy concepts: we go a step further and cover important, complex topics such as file formats, custom Writables, input/output formats, troubleshooting, and optimizations.

All concepts are backed by hands-on projects, such as analyzing the Million Song Dataset to find less familiar artists with hot songs, ranking pages using page dumps from Wikipedia, and simulating the mutual friends functionality in Facebook, to name a few.

What Will I Learn?

  • Understand what Big Data is, the challenges it poses, and how Hadoop proposes to solve the Big Data problem
  • Work with and navigate a Hadoop cluster with ease
  • Install and configure a Hadoop cluster on cloud services like Amazon Web Services (AWS)
  • Understand the different phases of MapReduce in detail
  • Write optimized Pig Latin instructions to perform complex data analysis
  • Write optimized Hive queries to perform data analysis on simple and nested datasets
  • Work with file formats like SequenceFile, AVRO, etc.
  • Understand Hadoop architecture, Single Points of Failure (SPOF), Secondary/Checkpoint/Backup nodes, HA configuration, and YARN
  • Tune and optimize slow-running MapReduce jobs, Pig instructions, and Hive queries
  • Understand how joins work behind the scenes and write optimized join statements
  • Wherever possible, students will be introduced to difficult questions that are asked in real Hadoop interviews

Table of Contents

Thank You and Let’s Get Started
1 Course Structure
2 Tools & Setup (Windows)
3 Tools & Setup (Linux)

Introduction To Big Data
4 What is Big Data?
5 Understanding Big Data Problem
6 History of Hadoop

HDFS
7 HDFS – Why Another Filesystem?
8 Blocks
9 Working With HDFS
10 HDFS – Read & Write
11 HDFS – Read & Write (Program)
12 HDFS Assignment

MapReduce
13 Introduction to MapReduce
14 Dissecting MapReduce Components
15 Dissecting MapReduce Program (Part 1)
16 Dissecting MapReduce Program (Part 2)
17 Combiner
18 Counters
19 Facebook – Mutual Friends
20 New York Times – Time Machine
21 MapReduce Assignment

Apache Pig
22 Introduction to Apache Pig
23 Loading & Projecting Datasets
24 Solving a Problem
25 Complex Types
26 Pig Latin – Joins
27 Million Song Dataset (Part 1)
28 Million Song Dataset (Part 2)
29 Page Ranking (Part 1)
30 Page Ranking (Part 2)
31 Page Ranking (Part 3)
32 Apache Pig Assignment

Apache Hive
33 Introduction to Apache Hive
34 Dissect a Hive Table
35 Loading Hive Tables
36 Simple Selects
37 Managed Table vs. External Table
38 Order By vs. Sort By vs. Cluster By
39 Partitions
40 Buckets
41 Hive QL – Joins
42 Twitter (Part 1)
43 Twitter (Part 2)
44 Apache Hive Assignment

Architecture
45 HDFS Architecture
46 Secondary Namenode
47 Highly Available Hadoop
48 MRv1 Architecture
49 YARN

Cluster Setup
50 Vendors & Hosting
51 Cluster Setup (Part 1)
52 Cluster Setup (Part 2)
53 Cluster Setup (Part 3)
54 Amazon EMR

Hadoop Administrator In Real World (Preview)
55 Cloudera Manager – Introduction
56 Cloudera Manager – Installation

File Formats
57 Compression
58 Sequence File
59 AVRO
60 File Formats – Pig
61 File Formats – Hive
62 Introduction to RCFile
63 Working with RCFile
64 Introduction to ORC
65 Working with ORC
66 Parquet – Another Columnar Format

Troubleshooting and Optimizations
67 Exploring Logs
68 MRUnit
69 MapReduce Tuning
70 Pig Join Optimizations (Part 1)
71 Pig Join Optimizations (Part 2)
72 Hive Join Optimizations

Apache Sqoop
73 Sqoop Imports
74 Sqoop – File Formats
75 Jobs & Incremental Imports
76 Hive – Exports

Apache Flume
77 Introduction to Flume
78 Replication
79 Consolidation & Multiplexing
80 Streaming Twitter with Flume

Kafka
81 Kafka – The Why & the What?
82 Kafka Concepts
83 Tolerating Failures – Producers & Consumers
84 Tolerating Failures – Brokers
85 Kafka Installation
86 Experiments with Kafka
87 Streaming Meetup with Kafka (Part 1)
88 Streaming Meetup with Kafka (Part 2)

Bonus
89 Preparing For Hadoop Interviews