Advanced Analytics and Real-Time Data Processing in Apache Spark

English | MP4 | AVC 1920×1080 | AAC 48 kHz 2ch | 3h 24m | 791 MB

Implement high-velocity streaming for real-time data processing, along with machine learning and graph analysis operations, using Spark MLlib, GraphX, and SparkR on Apache Spark, and explore some analytical use cases on Spark.

This comprehensive tutorial will acquaint you with the key aspects of real-time analytics with Apache Spark, one of the most popular Big Data processing frameworks today. It will show you how to use the various components of the Spark framework to efficiently process, analyze, and visualize your data.

You will learn how to implement high-velocity streaming operations for data processing in order to perform efficient analytics on your real-time data. You’ll analyze data using machine learning techniques and graphs. You’ll learn about Spark Streaming and build real-world stream-processing applications that address the problems you encounter in practice. You’ll solve problems using machine learning techniques and find out about the tools available in the MLlib toolkit. You’ll also find out how to use graphs to solve real-world problems.

Later in the course, you’ll also see some useful machine learning algorithms with the help of Spark MLlib and integrate Spark with R. We’ll also make sure you’re confident and prepared for graph processing, as you’ll learn more about the GraphX API. By the end, you’ll be well-versed in real-time analytics and able to implement it with Apache Spark.

Filled with hands-on examples, this course will help you perform data analysis and take you from an intermediate to an advanced level of data analytics. You will perform graph analysis and handle high-velocity streaming, working through several analytical use cases.

What You Will Learn

  • Real-time data streaming processes and operations with Spark Streaming
  • Implement high-velocity streaming and data processing use cases while working with the streaming API
  • Dive into MLlib, Spark’s machine learning library of highly scalable algorithms
  • Create machine learning pipelines to combine multiple algorithms in a single workflow
  • Understand graphs and the Apache Spark API for graphs—GraphX
  • Apply interesting graph algorithms and graph processing with GraphX in a distributed environment
  • Use R, the popular statistical language, to work with Spark—SparkR
  • See how SparkR allows users to create and transform distributed SparkDataFrames in R
  • See analytical use case implementations using MLlib, GraphX, and Spark Streaming

Table of Contents

Spark Streaming
Integrating Spark Streaming with Apache Kafka
Introducing Spark Streaming
Join and Output Operations
mapWithState Operation
Output Operations – Saving Results to Kafka Sink
Processing Streaming Data
Spark Streaming – Understanding Master URL
Spark Streaming Word Count Hands-On
Streaming Context
The Course Overview
Transform and Window Operation
Use Cases

Advanced Streaming and Use Cases
Building a Streaming Application – Handling Events That Are Not in Order
Connecting External Systems That Work with At-Least-Once Guarantees – Deduplication
Filtering Bots from Stream of Page View Events
Handling Time in High Velocity Streams

Spark MLlib and ML Pipelines
Clustering
Collaborative Filtering – Building Recommendation Engine
Feature Extraction and Transformation
Implementing GMM in Apache Spark
Introducing Machine Learning with Spark
Logistic Regression
Model Evaluation
Principal Component Analysis and Distributing the Singular Value Decomposition (SVD)
Transforming Text into a Vector of Numbers – ML Bag-of-Words Technique

Spark GraphX
Create a Graph Using GraphX and Property Graph
Importing GraphX
Introducing Spark GraphX – How to Represent a Graph
Limitations of Graph-Parallel Systems – Why Spark GraphX
List of Operators
Perform Graph Operations Using GraphX
Triplet View

Performing Spark GraphX Operations
Caching and Uncaching
Counting Degree of Vertex
GraphBuilder
Neighbourhood Aggregations – Collecting Neighbours
Perform Subgraph Operations
Structural Operators – Connected Components
Vertex and Edge RDD

SparkR
Creating Spark DataFrames from Data Sources
Introduction to SparkR and How It’s Used
Run a Given Function on a Large Dataset Using dapply or dapplyCollect
Run Local R Functions Distributed Using spark.lapply
Running a Function on a Large Dataset Grouped by Input Column(s) Using gapply or gapplyCollect
Running SQL Queries from SparkR
Setting Up from RStudio
SparkDataFrames Operations – Grouping, Aggregation

Analytical Use Cases
PageRank Using Spark GraphX
Sending Real-Time Notifications When a User Wants to Buy a Product on the E-Commerce Site