Building a Big Data Analytics Stack

English | MP4 | AVC 1920×1080 | AAC 48KHz 2ch | 1h 31m | 277 MB

Building a Big Data ecosystem is hard. A variety of technologies is available, and each one has its pros and cons. As software engineers building a big data pipeline, we need to use lower-level tools and APIs such as HBase and Apache Spark.

In this course, we’ll check out HBase, a database built on top of HDFS. Moving on, we’ll have a bit of fun with Spark MLlib. Finally, you’ll get an understanding of ETL and deploy a Hadoop project to the cloud.

By the end of the course, you’ll also be able to use higher-level tools with more user-friendly, declarative APIs, such as Pig and Hive.

What You Will Learn

  • Use Pig and Hive in a non-Java way to understand the power of Hadoop
  • Explore Spark and use it for stream and batch processing
  • Use the HBase database from a Java application
  • Find out more about the machine learning toolkit and its use with Spark
  • Know how to leverage the pros of Big Data tools

Table of Contents

01 The Course Overview
02 Introduction to Pig
03 Introduction to Hive
04 Hive Query Language
05 Writing Spark Jobs
06 Introducing YARN
07 Creating Spark Job
08 HBase and HDFS
09 Using HBase Database from Java Application
10 Composing Spark ML Pipelines
11 Build a Recommendation System Using Collaborative Filtering
12 ETL
13 Introducing AWS EMR
14 Creating S3 and EMR Cluster
15 Running Jobs in Series Using EMR Java API