English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 64 Lessons (8h 52m) | 1.00 GB
Master Python techniques and libraries to reduce run times, efficiently handle huge datasets, and optimize execution for complex machine learning applications.
Fast Python is a toolbox of techniques for high-performance Python, including:
- Writing efficient pure-Python code
- Optimizing the NumPy and pandas libraries
- Rewriting critical code in Cython
- Designing persistent data structures
- Tailoring code for different architectures
- Implementing Python GPU computing
Fast Python is your guide to optimizing every part of your Python-based data analysis process, from the pure Python code you write to managing the resources of modern hardware and GPUs. You’ll learn to rewrite inefficient data structures, improve underperforming code with multithreading, and simplify your datasets without sacrificing accuracy.
Written for experienced practitioners, this book dives right into practical solutions for improving computation and storage efficiency. You’ll experiment with fun and interesting examples such as rewriting games in Cython and implementing a MapReduce framework from scratch. Finally, you’ll go deep into Python GPU computing and learn how modern hardware has rehabilitated some former antipatterns and made counterintuitive ideas the most efficient way of working.
Face it. Slow code will kill a big data project. Fast pure-Python code, optimized libraries, and fully utilized multiprocessor hardware are the price of entry for machine learning and large-scale data analysis. What you need are reliable solutions that respond faster to computing requirements while using fewer resources and saving money.
Fast Python is a toolbox of techniques for speeding up Python, with an emphasis on big data applications. Following the clear examples and precisely articulated details, you’ll learn how to use common libraries like NumPy and pandas in more performant ways and transform data for efficient storage and I/O. More importantly, Fast Python takes a holistic approach to performance, so you’ll see how to optimize the whole system, from code to architecture.
Table of Contents
1 Part 1. Foundational Approaches
2 An urgent need for efficiency in data processing
3 Modern computing architectures and high-performance computing
4 Working with Python’s limitations
5 A summary of the solutions
6 Summary
7 Extracting maximum performance from built-in features
8 Profiling code to detect performance bottlenecks
9 Optimizing basic data structures for speed: Lists, sets, and dictionaries
10 Finding excessive memory allocation
11 Using laziness and generators for big-data pipelining
12 Summary
13 Concurrency, parallelism, and asynchronous processing
14 Implementing a basic MapReduce engine
15 Implementing a concurrent version of a MapReduce engine
16 Using multiprocessing to implement MapReduce
17 Tying it all together: An asynchronous multithreaded and multiprocessing MapReduce server
18 Summary
19 High-performance NumPy
20 Using array programming
21 Tuning NumPy’s internal architecture for performance
22 Summary
23 Part 2. Hardware
24 Re-implementing critical code with Cython
25 A whirlwind tour of Cython
26 Profiling Cython code
27 Optimizing array access with Cython memoryviews
28 Writing NumPy generalized universal functions in Cython
29 Advanced array access in Cython
30 Parallelism with Cython
31 Summary
32 Memory hierarchy, storage, and networking
33 Efficient data storage with Blosc
34 Accelerating NumPy with NumExpr
35 The performance implications of using the local network
36 Summary
37 Part 3. Applications and Libraries for Modern Data Processing
38 High-performance pandas and Apache Arrow
39 Techniques to increase data analysis speed
40 pandas on top of NumPy, Cython, and NumExpr
41 Reading data into pandas with Arrow
42 Using Arrow interop to delegate work to more efficient languages and systems
43 Summary
44 Storing big data
45 Parquet: An efficient format to store columnar data
46 Dealing with larger-than-memory datasets the old-fashioned way
47 Zarr for large-array persistence
48 Summary
49 Part 4. Advanced Topics
50 Data analysis using GPU computing
51 Using Numba to generate GPU code
52 Performance analysis of GPU code: The case of a CuPy application
53 Summary
54 Analyzing big data with Dask
55 The computational cost of Dask operations
56 Using Dask’s distributed scheduler
57 Summary
58 Setting up the environment
59 Installing your own Python distribution
60 Using Docker
61 Hardware considerations
62 Using Numba to generate efficient low-level code
63 Writing explicitly parallel functions in Numba
64 Writing NumPy-aware code in Numba