R for Data Science Solutions

English | 2016 | MP4 | AVC 1280×720 | AAC 44 kHz 2ch | 5.5 hours | 2.14 GB

Over 100 hands-on tasks to help you effectively solve real-world data problems using the most popular R packages and techniques

R is both a programming language and a software environment for data analysis. Data scientists, statisticians, and analysts use R for statistical analysis, data visualization, and predictive modeling. R is open source and integrates with other applications and systems. Compared to other data analysis platforms, R offers an extensive collection of packages for working with data, and its strong data visualization capabilities help analysts uncover and resolve problems in their data.

The first section of this course shows how to write R functions that avoid unnecessary code duplication. You will learn how to prepare and process data and how to perform sophisticated ETL (extract, transform, load) operations on heterogeneous data sources with R packages. A data manipulation example illustrates how to use the ‘dplyr’ and ‘data.table’ packages to process larger data structures efficiently. We also focus on ‘ggplot2’ and show you how to create advanced figures for data exploration.

In addition, you will learn how to build an interactive report using the ‘ggvis’ package. Later sections offer insight into time series analysis and cover the popular topic of machine learning in detail, including data classification, regression, clustering, association rule mining, and dimension reduction.

By the end of this course, you will know how to resolve common issues and will be able to confidently solve problems encountered while performing data analysis.

What You Will Learn

  • Get to know the functional characteristics of the R language
  • Extract, transform, and load data from heterogeneous sources
  • Understand how easily R can tackle probability and statistics problems
  • Use simple R commands to quickly organize and manipulate large datasets
  • Create professional data visualizations and interactive reports
  • Predict user purchase behavior by adopting a classification approach
  • Implement data mining techniques to discover items that are frequently purchased together
  • Group similar text documents by using various clustering methods

Table of Contents

Functions in R
1. R Functions and Arguments
2. Understanding Environments
3. Working with Lexical Scoping
4. Understanding Closure
5. Performing Lazy Evaluation
6. Creating Infix Operators
7. Using the Replacement Function
8. Handling Errors in a Function
9. The Debugging Function

Data Extracting, Transforming, and Loading
10. Downloading Open Data
11. Reading and Writing CSV Files
12. Scanning Text Files
13. Working with Excel Files
14. Reading Data from Databases
15. Scraping Web Data

Data Pre-Processing and Preparation
16. Renaming the Data Variable
17. Converting Data Types
18. Working with Date Format
19. Adding New Records
20. Filtering Data
21. Dropping Data
22. Merging and Sorting Data
23. Reshaping Data
24. Detecting Missing Data
25. Imputing Missing Data

Data Manipulation
26. Enhancing a data.frame with a data.table
27. Managing Data with data.table
28. Performing Fast Aggregation with data.table
29. Merging Large Datasets with a data.table
30. Subsetting and Slicing Data with dplyr
31. Sampling Data with dplyr
32. Selecting Columns with dplyr
33. Chaining Operations in dplyr
34. Arranging Rows with dplyr
35. Eliminating Duplicated Rows with dplyr
36. Adding New Columns with dplyr
37. Summarizing Data with dplyr
38. Merging Data with dplyr

Visualizing Data with ggplot2
39. Creating Basic Plots with ggplot2
40. Changing Aesthetics Mapping
41. Introducing Geometric Objects
42. Performing Transformations
43. Adjusting Scales
45. Adjusting Themes
46. Combining Plots
47. Creating Maps

Making Interactive Reports
48. Creating R Markdown Reports
49. Learning the Markdown Syntax
50. Embedding R Code Chunks
51. Creating Interactive Graphics with ggvis
52. Understanding Basic Syntax and Grammar
53. Controlling Axes and Legends and Using Scales
54. Adding Interactivity to a ggvis Plot
55. Creating an R Shiny Document
56. Publishing an R Shiny Report

Simulation from Probability Distributions
57. Generating Random Samples
58. Understanding Uniform Distributions
59. Generating Binomial Random Variates
60. Generating Poisson Random Variates
61. Sampling from a Normal Distribution
62. Sampling from a Chi-Squared Distribution
63. Understanding Student's t-Distribution
64. Sampling from a Dataset
65. Simulating the Stochastic Process

Statistical Inference in R
66. Getting Confidence Intervals
67. Performing Z-tests
68. Performing Student's t-Tests
69. Conducting Exact Binomial Tests
70. Performing Kolmogorov-Smirnov Tests
71. Working with Pearson's Chi-Squared Tests
72. Understanding the Wilcoxon Rank Sum and Signed Rank Tests
73. Conducting One-way ANOVA
74. Performing Two-way ANOVA

Rule and Pattern Mining with R
75. Transforming Data into Transactions
76. Displaying Transactions and Associations
77. Mining Associations with the Apriori Rule
78. Pruning Redundant Rules
79. Visualizing Association Rules
80. Mining Frequent Itemsets with Eclat
81. Creating Transactions with Temporal Information
82. Mining Frequent Sequential Patterns with cSPADE

Time Series Mining with R
83. Creating Time Series Data
84. Plotting a Time Series Object
85. Decomposing Time Series
86. Smoothing Time Series
87. Forecasting Time Series
88. Selecting an ARIMA Model
89. Creating an ARIMA Model
90. Forecasting with an ARIMA Model
91. Predicting Stock Prices with an ARIMA Model

Supervised Machine Learning
92. Fitting a Linear Regression Model with lm
93. Summarizing Linear Model Fits
94. Using Linear Regression to Predict Unknown Values
95. Measuring the Performance of the Regression Model
96. Performing a Multiple Regression Analysis
97. Selecting the Best-Fitted Regression Model with Stepwise Regression
98. Applying the Gaussian Model for Generalized Linear Regression
99. Performing a Logistic Regression Analysis
100. Building a Classification Model with Recursive Partitioning Trees
101. Visualizing a Recursive Partitioning Tree
102. Measuring Model Performance with a Confusion Matrix
103. Measuring Prediction Performance Using ROCR

Unsupervised Machine Learning
104. Clustering Data with Hierarchical Clustering
105. Cutting a Tree into Clusters
106. Clustering Data with the k-means Method
107. Clustering Data with the Density-Based Method
108. Extracting Silhouette Information from Clustering
109. Comparing Clustering Methods
110. Recognizing Digits Using the Density-Based Clustering Method
111. Grouping Similar Text Documents with the k-means Clustering Method
112. Performing Dimension Reduction with Principal Component Analysis (PCA)
113. Determining the Number of Principal Components Using a Scree Plot
114. Determining the Number of Principal Components Using the Kaiser Method
115. Visualizing Multivariate Data Using a biplot