Data Analysis with Python and PySpark by Jonathan Rioux is a practical guide for data professionals who want to use PySpark for big data processing and analysis. The book bridges the gap between traditional Python-based data analysis and distributed computing, so that readers can handle large datasets efficiently.
What's inside:
- PySpark Basics: Installing, Configuring, and Running Code
- Working with DataFrames and Spark SQL for Data Processing
- Using Grouping, Aggregation, and Windowing Functions
- Scalable Machine Learning and Statistical Analysis with PySpark
- Optimizing Performance and Managing Cluster Resources
The book is aimed at data scientists, analysts, and developers who want to integrate PySpark into their projects and work more efficiently with large volumes of data.
Book specifications | |
Author | Jonathan Rioux |
Number of pages | 456 |
Cover | Paperback |
Paper type | Offset |
Language | English |