Apache Flink: Batch Mode Data Engineering

English | MP4 | AVC 1280×720 | AAC 48 kHz 2ch | 1h 07m | 167 MB

Data engineering is the foundation for enabling analytics and data science applications in the world of big data. It requires building scalable data processing pipelines and delivering them in short time frames. Apache Flink, the powerful and popular stream-processing platform, was designed to help you achieve these goals. In this course, join Kumaran Ponnambalam as he focuses on how to build batch mode data pipelines with Apache Flink. Kumaran kicks off the course by reviewing the features and architecture of Apache Flink. He then takes a deeper look at the DataSet API and explores various capabilities available for transforming, aggregating, and combining data. To wrap up the course, he presents a use case project that allows you to leverage your new skills.

Topics include:

  • The architecture of Apache Flink
  • Features of the DataSet API
  • Using POJO classes for DataSet typing
  • Working with joins in Flink
  • Using MySQL with Flink
  • Using broadcast variables to share and collect data

Table of Contents

1 Batch mode engineering
2 What is Apache Flink
3 Apache Flink features
4 Architecture of Apache Flink
5 Flink program structure
6 Flink execution flow
7 Installing Flink standalone
8 Creating a Flink project
9 Build a sample Flink program
10 Running jobs on the cluster
11 Using the Flink web interface
12 Setting up the exercise files
13 DataSet API concepts
14 Reading a CSV file
15 Using Map
16 Using FlatMap
17 Using filters
18 Using aggregates
19 Using Reduce
20 Using POJO classes
21 Join operations
22 Using MySQL with Flink
23 Using broadcast variables
24 Problem definition
25 Computing total score
26 Printing scores for physics
27 Computing average scores across subjects
28 Find the top student for each subject
29 Next steps