
Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning 1st Edition
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build on top of the Google Cloud Platform (GCP). This hands-on guide shows developers entering the data science field how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP. Through the course of the book, you’ll work through a sample business decision by employing a variety of data science approaches.
Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science.
You’ll learn how to:
- Automate and schedule data ingest, using an App Engine application
- Create and populate a dashboard in Google Data Studio
- Build a real-time analysis pipeline to carry out streaming analytics
- Conduct interactive data exploration with Google BigQuery
- Create a Bayesian model on a Cloud Dataproc cluster
- Build a logistic regression machine-learning model with Spark
- Compute time-aggregate features with a Cloud Dataflow pipeline
- Create a high-performing prediction model with TensorFlow
- Use your deployed model as a microservice you can access from both batch and real-time pipelines
- ISBN-10: 1491974567
- ISBN-13: 978-1491974568
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: January 16, 2018
- Language: English
- Dimensions: 7 x 0.83 x 9.19 inches
- Print length: 404 pages
From the Publisher

From the Preface
In this book, we walk through an example of this new transformative, more collaborative way of doing data science. You will learn how to implement an end-to-end data pipeline: we will begin with ingesting the data in a serverless way and work our way through data exploration, dashboards, relational databases, and streaming data all the way to training and making operational a machine learning model. I cover all these aspects of data-based services because data engineers will be involved in designing the services, developing the statistical and machine learning models, and implementing them in large-scale production and in real time.
Who This Book Is For
If you use computers to work with data, this book is for you. You might go by the title of data analyst, database administrator, data engineer, data scientist, or systems programmer today. Although your role might be narrower today (perhaps you do only data analysis, or only model building, or only DevOps), you want to stretch your wings a bit: you want to learn how to create data science models as well as how to implement them at scale in production systems.
Google Cloud Platform is designed to make you forget about infrastructure. The marquee data services (Google BigQuery, Cloud Dataflow, Cloud Pub/Sub, and Cloud ML Engine) are all serverless and autoscaling. When you submit a query to BigQuery, it is run on thousands of nodes, and you get your result back; you don’t spin up a cluster or install any software. Similarly, in Cloud Dataflow, when you submit a data pipeline, and in Cloud Machine Learning Engine, when you submit a machine learning job, you can process data at scale and train models at scale without worrying about cluster management or failure recovery. Cloud Pub/Sub is a global messaging service that autoscales to the throughput and number of subscribers and publishers without any work on your part. Even when you’re running open source software like Apache Spark that’s designed to operate on a cluster, Google Cloud Platform makes it easy. Leave your data on Google Cloud Storage, not in HDFS, and spin up a job-specific cluster to run the Spark job. After the job completes, you can safely delete the cluster. Because of this job-specific infrastructure, there’s no need to fear overprovisioning hardware or running out of capacity to run a job when you need it. Plus, data is encrypted, both at rest and in transit, and kept secure. As a data scientist, not having to manage infrastructure is incredibly liberating.
The reason that you can afford to forget about virtual machines and clusters when running on Google Cloud Platform comes down to networking. The network bisection bandwidth within a Google Cloud Platform datacenter is 1 PBps, and so sustained reads off Cloud Storage are extremely fast. What this means is that you don’t need to shard your data as you would with traditional MapReduce jobs. Instead, Google Cloud Platform can autoscale your compute jobs by shuffling the data onto new compute nodes as needed. Hence, you’re liberated from cluster management when doing data science on Google Cloud Platform.
These autoscaled, fully managed services make it easier to implement data science models at scale, which is why data scientists no longer need to hand off their models to data engineers. Instead, they can write a data science workload, submit it to the cloud, and have that workload executed automatically in an autoscaled manner. At the same time, data science packages are becoming simpler and simpler. So, it has become extremely easy for an engineer to slurp in data and use a canned model to get an initial (and often very good) model up and running. With well-designed packages and easy-to-consume APIs, you don’t need to know the esoteric details of data science algorithms, only what each algorithm does and how to link algorithms together to solve realistic problems. This convergence between data science and data engineering is why you can stretch your wings beyond your current role.
Rather than simply read this book cover-to-cover, I strongly encourage you to follow along with me by also trying out the code. The full source code for the end-to-end pipeline I build in this book is on GitHub. Create a Google Cloud Platform project and after reading each chapter, try to repeat what I did by referring to the code and to the Readme file in each folder of the GitHub repository.
Product details
- Publisher : O'Reilly Media; 1st edition (January 16, 2018)
- Language : English
- Paperback : 404 pages
- ISBN-10 : 1491974567
- ISBN-13 : 978-1491974568
- Item Weight : 1.42 pounds
- Dimensions : 7 x 0.83 x 9.19 inches
- Best Sellers Rank: #1,652,904 in Books
- #415 in Database Storage & Design
- #709 in Data Modeling & Design (Books)
- #1,022 in Data Processing
About the author

Lak is Head for Data Analytics and AI Solutions on Google Cloud. His team builds software solutions for business problems using Google Cloud's data analytics and machine learning products. He is the author of Machine Learning Design Patterns, Data Science on GCP (O'Reilly), and BigQuery: The Definitive Guide (O'Reilly). He founded Google's Advanced Solutions Lab ML Immersion program. Before Google, Lak was a Director of Data Science at Climate Corporation and a Research Scientist at NOAA. He's the original author of several Coursera specializations, including Machine Learning on GCP, Advanced Machine Learning on GCP, and Data Engineering.
Follow him on Twitter at @lak_luster.
http://www.vlakshman.com/
Customer reviews
Top reviews from the United States
- Reviewed in the United States on November 29, 2019
I knew this book was for me just a few pages into the first chapter. This book by Lak is unlike many other books on data science and a particular technology that just enumerate the how-to's of that technology. Lak starts with a concrete user problem strongly anchored in probabilistic outcomes, and then steps through a typical data science process of discovery, refinement, and then converting to a production pipeline. While teaching about GCP technologies along the way, the book stays strongly anchored in the original user problem. There is not a corner of GCP that is needed for a full production data science product that goes untouched in this book. The material is well covered, with pointers to deeper material and user manuals.
I received the first edition. As GCP technology evolved, Lak was posting updates to his blog on Medium so that everyone could understand the updates to GCP and how to use them. I was pleasantly surprised to get these updates, and they made having the book that much more valuable.
- Reviewed in the United States on January 16, 2018
5.0 out of 5 stars Wow. A true tour of data science and engineering on the cloud.
It's been a few years since I've worked with tools in this field, but this book was a clear level-headed view for data engineers looking to derive and drive insights from data. Using a core example use case and following it end to end through the entire book (and indeed cloud tools integrated with each other) helped me keep track of what was going on, and kept things from becoming a book on theory rather than one of accomplishment and answers. The purpose and process for each tool was clear, and I also appreciated the explanations of trade-offs and the value added for the choices made. The practice of data science is a LOT easier now with cloud/serverless tools than eight or nine years ago, and I feel this brought me back to the state of the art.
- Reviewed in the United States on August 2, 2019
While Lak’s conversational style can be a turn-off to some who just want an answer and don’t care about how, I liked this book. Many times with books like this you get an answer or a recipe and you’re done. What happens when your answer or recipe isn’t right for the situation? I’m glad Lak explains his rationale and lets it be known that there’s more than one way to do it. Could the book have been condensed without the explanations? Yes. Would it have been like almost every other book in the space? Yes. Check out this book if you want a well-thought-out answer and maybe alternatives. If you just want the “right answer”, then buy something else.
- Reviewed in the United States on January 29, 2021
I do not understand the high reviews for this book, especially ones written in 2020. I'm only into chapter 2 and the code to download the files fails. There is a supplement on the GitHub page that allowed me to copy the bucket. But the explanation, like many things, is vague and inaccurate (you don't provide the path to your bucket, just the name of the bucket). I assumed this book was an introduction to using the Google Cloud Platform for data science, so I was expecting an introduction. This book has detail where it doesn't need it, and lacks detail where it does. It just assumes you have already been using GCP, but if that were the case, this book wouldn't really be needed.
Major Problems:
1. Code is not working.
2. Code is not explained in any detail.
3. Vague details about how to navigate GCP (chapter one has you create a bucket, but doesn't explain what a bucket is, and how to create it, yet there are three pages about the definition of a data engineer).
4. Inconsistent assumptions about your background knowledge.
Good parts:
1. The use of a case study for learning.
- Reviewed in the United States on June 11, 2019
The book is easy to follow with detailed descriptions of each step followed to build a project from start to end on the Google Cloud Platform.
The book is also accompanied by a code repository which lets the readers try out the project themselves.
Strongly recommended for data scientists learning to use the platform.
- Reviewed in the United States on January 7, 2020
Narrative structure in a technical book is hard to find, and this was executed masterfully, with lots of code examples for you to follow along with on your own. Highly recommended.
- Reviewed in the United States on May 21, 2020
Excellent book for learning which GCP services can be used for what portions of data analytic pipelines. From data acquisition all the way to model revalidation.
- Reviewed in the United States on May 5, 2021
This product is more akin to a course than a reference book. I tried flipping over to the chapter on Cloud SQL (actually, the author only goes into BigQuery, so I ended up scrolling through Stack Overflow anyway). When I finally found the relevant chapter, it was impossible to disentangle the SQL code from the class objects built in the preceding 6 chapters. Do not buy this book if you have any intention other than reading every single page in order. Otherwise, you'll end up doing what I did, which was reading Stack Overflow and Medium articles to mixed effect.
Top reviews from other countries
- KMoreno8: Reviewed in Mexico on May 10, 2022
5.0 out of 5 stars For anyone who wants to get started with or learn about the topic with GCP
Anyone who works in the data field will potentially use Google Cloud. This book gives you a good foundation for it.
- 宗教好き: Reviewed in Japan on July 6, 2018
5.0 out of 5 stars Well suited to learning the whole subject in outline
A good product. It is in English, but it is written so that you can learn the whole subject in outline.
I am someone who plans to do data science on Google Cloud, and this served as a good introductory book.
- Chandra Shekhar Singh: Reviewed in India on October 10, 2019
4.0 out of 5 stars Get to the heart of data science straight away 😊👍
It sets a very clear direction for aspiring data engineers / scientists, as well as what is expected of them.
- Amazon Customer: Reviewed in Canada on November 21, 2018
5.0 out of 5 stars Great
Great
- Wolfgang Giersche: Reviewed in Germany on May 17, 2018
5.0 out of 5 stars Very knowledgeable author. Balanced and informative reasoning
Very knowledgeable author. Balanced and at times beautiful reasoning, presented in an understandable way. Definitely a must-read for Google Cloud practitioners.