Language
  • Python 3
Reading time
  • Approximately 53 days
What you will learn
  • Numerical Programming and Data Mining
Author
  • Theodore Petrou
Published
  • 7 years, 1 month ago
Packages you will be introduced to
  • pandas
  • matplotlib
  • seaborn

Key Features

  • Use the power of pandas to solve even the most complex scientific computing problems with ease
  • Leverage fast, robust data structures in pandas to gain useful insights from your data
  • Practical, easy-to-implement recipes for quick solutions to common data problems using pandas

Book Description

Pandas is one of the most powerful, flexible, and efficient scientific computing packages in Python. With this book, you will explore data in pandas through dozens of practice problems with detailed solutions in IPython notebooks.

This book provides clean, clear recipes and solutions that explain how to handle common data manipulation and scientific computing tasks with pandas. You will work with different types of datasets and perform data manipulation and data wrangling effectively. You will explore the power of pandas DataFrames and learn about boolean indexing and multi-indexing. The book also covers statistical and time series computations and how to implement them in financial and scientific applications.

By the end of this book, you will have all the knowledge you need to master pandas and perform fast and accurate scientific computing.

What you will learn

  • Master the fundamentals of pandas to quickly begin exploring any dataset
  • Isolate any subset of data by properly selecting and querying the data
  • Split data into independent groups before applying aggregations and transformations to each group
  • Restructure data into a tidy form to make data analysis and visualization easier
  • Prepare messy real-world datasets for machine learning
  • Combine and merge data from different sources through pandas' SQL-like operations
  • Utilize pandas' unparalleled time series functionality
  • Create beautiful and insightful visualizations through pandas' direct hooks to matplotlib and seaborn
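
To give a flavor of the operations listed above, here is a minimal sketch of boolean selection, grouped aggregation, reshaping into tidy form, and a SQL-like merge. It is not taken from the book: the `sales` and `regions` tables, their column names, and all values are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only (not from the book).
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Select a subset with boolean indexing: rows where revenue exceeds 90.
high = sales[sales["revenue"] > 90]

# Split into groups by store, then aggregate revenue within each group.
totals = sales.groupby("store")["revenue"].sum()

# Reshape: pivot to a wide table, then melt back into tidy (long) form.
wide = sales.pivot(index="store", columns="quarter", values="revenue")
tidy = wide.reset_index().melt(
    id_vars="store", var_name="quarter", value_name="revenue"
)

# Combine two sources with a SQL-like left join on the shared key.
merged = sales.merge(regions, on="store", how="left")
```

Each step uses a standard pandas call (`groupby`, `pivot`, `melt`, `merge`); the book's own recipes may approach these tasks differently.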

About the Author

Theodore Petrou is a data scientist and the founder of Dunder Data, a professional educational company focusing on exploratory data analysis. He is also the head of Houston Data Science, a meetup group with more than 2,000 members that has the primary goal of getting local data enthusiasts together in the same room to practice data science. Before founding Dunder Data, Ted was a data scientist at Schlumberger, a large oil services company, where he spent the vast majority of his time exploring data.

Some of his projects included using targeted sentiment analysis to discover the root cause of part failure from engineer text, and developing customized client/server dashboarding applications and real-time web services to avoid the mispricing of sales items. Ted received his master's degree in statistics from Rice University, and used his analytical skills to play poker professionally and teach math before becoming a data scientist. Ted is a strong supporter of learning through practice and can often be found answering questions about pandas on Stack Overflow.

Table of Contents

  1. Pandas Foundations
  2. Essential DataFrame Operations
  3. Beginning Data Analysis
  4. Selecting Subsets of Data
  5. Boolean Indexing
  6. Index Alignment
  7. Grouping for Aggregation, Filtration and Transformation
  8. Restructuring Data into Tidy Form
  9. Joining Multiple pandas Objects
  10. Time Series
  11. Visualization

The author, Theodore Petrou, has the following credentials:

  • Works/Worked at Schlumberger