#Pandas
7 posts
Python Data Analysis #7 A Taste of Polars: Your Next Move When pandas Slows Down
When pandas struggles with millions of rows, Polars is a strong alternative. We compare reading, filtering, and groupby side by side in pandas and Polars, explain the idea behind lazy mode, lay out criteria for choosing between the two tools, and close out the series.
Python Data Analysis #6 Visualization: matplotlib Fundamentals and Choosing Charts
The visualization fundamentals in one post: matplotlib's minimal structure understood through Figure and Axes, quick plotting with DataFrame.plot, picking the right chart for each purpose, fixing broken fonts in CJK environments, and saving figures with savefig.
Python Data Analysis #5 Grouping and Joining: groupby, pivot_table, merge
Starting from the classic sales-by-branch-by-month question, we build a mental model for groupby, multi-statistic aggregation with agg, pivot_table for Excel users, merge as the pandas counterpart of SQL JOIN, concat for stacking tables, and the habit of checking row counts after every join.
Python Data Analysis #4: Transforming Data — New Columns, Dates, and Missing Values
One post covering the data cleanup phase in pandas: vectorized operations for new columns, the str and dt accessors, what NaN really is, how to decide between dropna and fillna, type conversion with astype, and removing duplicates.
Python Data Analysis #3: Selecting and Filtering — loc, iloc, and Boolean Indexing
How to pick out just the rows and columns you want in pandas: single vs. double brackets for column selection, the loc/iloc distinction, boolean indexing where a condition becomes a mask, the query method, and the pitfall that SettingWithCopyWarning is flagging.
Python Data Analysis #2: Loading Data — CSV, Excel, and First Exploration
The encoding, sep, and dtype arguments of read_csv (and the legacy code page trap), sheet selection in read_excel, and the routine of checking your data with head, info, and describe right after loading.
Python Data Analysis #1: Getting Started with pandas — Notebooks and the DataFrame
pandas is the Python library for working with tabular data. We set up a notebook environment with uv, build Series and DataFrame objects by hand, and kick off this seven-part data analysis series.