Data Stream Development with Apache Spark, Kafka, and Spring Boot

English | MP4 | AVC 1920×1080 | AAC 48KHz 2ch | 7h 51m | 1.88 GB

Handle high volumes of data at high speed. Architect and implement an end-to-end data streaming pipeline

Today, organizations struggle to work with enormous volumes of data. On top of that, data needs to be processed and analyzed in real time to gain insights. This is where data streaming comes in. As big data is no longer a niche topic, having the skill set to architect and develop robust data streaming pipelines is a must for every developer, and so is the ability to reason about the entire pipeline, including the trade-offs at every tier.

This course starts by explaining the blueprint architecture for a fully functional data streaming pipeline and by installing the technologies used. With the help of live coding sessions, you will get hands-on experience architecting every tier of the pipeline and handling the specific issues that arise when working with streaming data. As the running example, you will ingest a live stream of Meetup RSVPs, analyze it, and display the results via Google Maps.
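
To give a flavour of the live coding, here is a minimal sketch of what the collection tier's entry point might look like, assuming Spring's StandardWebSocketClient and the public Meetup RSVP endpoint; the URL, class name, and the choice to simply print each message are illustrative assumptions rather than the course's exact code.

    import org.springframework.web.socket.TextMessage;
    import org.springframework.web.socket.WebSocketSession;
    import org.springframework.web.socket.client.WebSocketClient;
    import org.springframework.web.socket.client.standard.StandardWebSocketClient;
    import org.springframework.web.socket.handler.TextWebSocketHandler;

    // Minimal collection-tier sketch: subscribe to the Meetup RSVP WebSocket
    // stream and handle every incoming JSON message. The endpoint URL is an
    // assumption based on the public Meetup API, not taken from the course.
    public class RsvpCollector {

        private static final String MEETUP_RSVPS_URL = "ws://stream.meetup.com/2/rsvps";

        public static void main(String[] args) throws InterruptedException {
            WebSocketClient client = new StandardWebSocketClient();
            client.doHandshake(new TextWebSocketHandler() {
                @Override
                protected void handleTextMessage(WebSocketSession session, TextMessage message) {
                    // Each message is one RSVP event encoded as JSON; a real
                    // collection tier would forward it to the message queuing tier.
                    System.out.println(message.getPayload());
                }
            }, MEETUP_RSVPS_URL);

            Thread.currentThread().join(); // keep the JVM alive while messages stream in
        }
    }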

By the end of the course, you will have built an efficient data streaming pipeline and will be able to analyze its various tiers, ensuring a continuous flow of data.

The course combines text, plenty of diagrams, and substantial live coding sessions. Each topic follows a three-step structure: first, a few headline facts; second, diagrams that add detail; and finally, the text and diagrams are turned into code in the appropriate technology.

What You Will Learn

  • Attain a solid foundation in the most powerful and versatile technologies involved in data streaming: Apache Spark and Apache Kafka
  • Form a robust and clean architecture for a data streaming pipeline
  • Implement the correct tools to bring your data streaming architecture to life
  • Isolate the most problematic trade-off for each tier involved in a data streaming pipeline
  • Query, analyze, and apply machine learning algorithms to collected data
  • Display analyzed pipeline data via Google Maps on your web browser
  • Discover and resolve difficulties in scaling and securing data streaming applications

Table of Contents

Introducing Data Streaming Architecture
1 The Course Overview
2 Discovering the Data Streaming Pipeline Blueprint Architecture
3 Analyzing Meetup RSVPs in Real-Time

Deployment of Collection and Message Queuing Tiers
4 Running the Collection Tier (Part I – Collecting Data)
5 Collecting Data Via the Stream Pattern and Spring WebSocketClient API
6 Explaining the Message Queuing Tier Role
7 Introducing Our Message Queuing Tier – Apache Kafka
8 Running the Collection Tier (Part II – Sending Data)
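
The hand-off from the collection tier to the message queuing tier boils down to publishing each RSVP to a Kafka topic. Below is a minimal producer sketch; the broker address and topic name are illustrative assumptions, not values from the course.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    // Minimal message-queuing hand-off: publish one RSVP (already serialized
    // as JSON) to a Kafka topic. Broker address and topic name are assumptions.
    public class RsvpProducer {

        private final KafkaProducer<String, String> producer;

        public RsvpProducer() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            this.producer = new KafkaProducer<>(props);
        }

        public void send(String rsvpJson) {
            // Fire-and-forget send; delivery guarantees are a separate concern.
            producer.send(new ProducerRecord<>("meetup-rsvps", rsvpJson));
        }

        public void close() {
            producer.close();
        }
    }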

Proceeding to the Data Access Tier
9 Dissecting the Data Access Tier
10 Introducing Our Data Access Tier – MongoDB
11 Exploring Spring Reactive
12 Exposing the Data Access Tier in Browser
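
For the data access tier, a hedged sketch of how Spring Data's reactive MongoDB support and Spring WebFlux might fit together is shown below: a document class, a reactive repository, and a controller that streams stored RSVPs to the browser as server-sent events. Class names, fields, and the /rsvps route are assumptions, not the course's exact code.

    import org.springframework.data.annotation.Id;
    import org.springframework.data.mongodb.core.mapping.Document;
    import org.springframework.data.mongodb.repository.ReactiveMongoRepository;
    import org.springframework.http.MediaType;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;
    import reactor.core.publisher.Flux;

    // A stored RSVP document; the collection name and field list are
    // simplified assumptions. Public fields keep the sketch short.
    @Document(collection = "rsvps")
    class Rsvp {
        @Id
        public String id;
        public String eventName;
        public double lat;
        public double lon;
    }

    // Reactive repository: queries return Flux/Mono instead of blocking lists.
    interface RsvpRepository extends ReactiveMongoRepository<Rsvp, String> {
    }

    // WebFlux controller exposing the data access tier to the browser as a
    // server-sent event stream.
    @RestController
    class RsvpController {

        private final RsvpRepository repository;

        RsvpController(RsvpRepository repository) {
            this.repository = repository;
        }

        @GetMapping(value = "/rsvps", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
        Flux<Rsvp> streamRsvps() {
            return repository.findAll();
        }
    }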

Implementing the Analysis Tier
13 Diving into the Analysis Tier
14 Streaming Algorithms for Data Analysis
15 Introducing Our Analysis Tier – Apache Spark
16 Plug-in Spark Analysis Tier to Our Pipeline
17 Brief Overview of Spark RDDs
18 Spark Streaming
19 DataFrames, Datasets and Spark SQL
20 Spark Structured Streaming
21 Machine Learning in 7 Steps
22 MLlib (Spark ML)
23 Spark ML and Structured Streaming
24 Spark GraphX
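
As a taste of the analysis tier, here is a hedged sketch of a Spark Structured Streaming job that reads the RSVP topic from Kafka and counts RSVPs per country. It assumes the spark-sql-kafka connector is on the classpath; the broker, topic name, JSON path, and console sink are illustrative choices, not the course's exact code.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.get_json_object;

    // Minimal analysis-tier sketch: a Structured Streaming job that consumes
    // the RSVP topic from Kafka and counts RSVPs per country.
    public class RsvpAnalysis {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("rsvp-analysis")
                    .master("local[*]")
                    .getOrCreate();

            // Read raw Kafka records and keep the message value as a JSON string.
            Dataset<Row> rsvps = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "localhost:9092")
                    .option("subscribe", "meetup-rsvps")
                    .load()
                    .selectExpr("CAST(value AS STRING) AS json");

            // Extract the country code from each RSVP and count per country.
            Dataset<Row> countsByCountry = rsvps
                    .select(get_json_object(col("json"), "$.group.group_country").alias("country"))
                    .groupBy("country")
                    .count();

            // Write the running aggregation to the console for inspection.
            StreamingQuery query = countsByCountry.writeStream()
                    .outputMode("complete")
                    .format("console")
                    .start();

            query.awaitTermination();
        }
    }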

Mitigate Data Loss between Collection, Analysis and Message Queuing Tiers
25 Fault Tolerance (HML)
26 Kafka Connect
27 Securing Communication between Tiers
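
To hint at what securing communication between tiers can involve, the sketch below shows TLS-related settings a Kafka client might add to its configuration; the keystore and truststore paths and passwords are placeholders, not values from the course.

    import java.util.Properties;
    import org.apache.kafka.clients.CommonClientConfigs;
    import org.apache.kafka.common.config.SslConfigs;

    // Sketch of TLS settings a Kafka client (producer or consumer) could use
    // when communication between tiers is secured. Paths and passwords are
    // placeholders for illustration only.
    public class SecureKafkaConfig {

        public static Properties sslProperties() {
            Properties props = new Properties();
            props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
            props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/path/to/kafka.client.truststore.jks");
            props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "changeit");
            props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "/path/to/kafka.client.keystore.jks");
            props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "changeit");
            props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "changeit");
            return props;
        }
    }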