Python Data Analysis #6 Visualization: matplotlib Fundamentals and Choosing Charts

You can stare at a groupby result table for a long time and miss something that becomes obvious the moment you draw a single line chart. Questions like “when did sales start to dip?” or “did one outlier drag the whole average up?” are answered far faster by a picture than by numbers. In this post we establish the minimal structure of matplotlib, work out criteria for choosing the right chart, and cover the font problem that bites anyone labeling charts in a CJK language.

  • #1 Getting started
  • #2 Loading and exploring data
  • #3 Selecting and filtering
  • #4 Transforming and missing data
  • #5 Grouping and joining
  • #6 Visualization ← this post
  • #7 A taste of Polars (wrap-up)

The moments when a picture beats a table

Summary statistics are weak on trends and outliers. Two datasets can report exactly the same mean and standard deviation in describe(), yet one rises steadily over time while the other jumps all over the place: completely different data. A trend only shows up once you draw a line, and an outlier only once you plot the points. That’s why plotting is a default tool in the exploration phase, not an optional extra.
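
A minimal sketch of that point, using made-up numbers rather than the order data from this series: the same 50 values are placed in two different orders, so describe() reports matching statistics while the line charts look nothing alike.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# The same 50 values in two different orders (illustrative data only)
rising = pd.Series(np.linspace(10, 100, 50))
shuffled = pd.Series(rng.permutation(rising.to_numpy()))

# describe() ignores row order, so the two summaries match
summary = pd.DataFrame({"rising": rising.describe(), "shuffled": shuffled.describe()})
print(summary.loc[["mean", "std", "min", "max"]])

# A line chart shows the difference immediately
fig, axes = plt.subplots(1, 2, figsize=(10, 3), sharey=True)
rising.plot(ax=axes[0], title="Rising steadily")
shuffled.plot(ax=axes[1], title="Same values, shuffled")
fig.tight_layout()
```

The summary table gives no hint of which series has a trend; the two panels do.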

First, let’s build the example data used throughout this post. It has the same shape as the order data we worked with in #5.

example data
import pandas as pd
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the example is reproducible
dates = pd.date_range("2026-01-01", periods=180, freq="D")
df = pd.DataFrame({
    "order_date": rng.choice(dates, size=600),   # 600 orders on randomly chosen days
    "category": rng.choice(["Clothing", "Food", "Electronics", "Books"], size=600),
    "amount": rng.integers(5_000, 120_000, size=600),   # upper bound is exclusive
    "qty": rng.integers(1, 6, size=600),   # 1 to 5 items per order
})

matplotlib’s minimal model: Figure and Axes

matplotlib looks like a lot to learn, but the structure you need in practice comes down to two things.

  • Figure: the entire canvas. Size and saving attach here.
  • Axes: one coordinate plane on the canvas. All actual drawing happens on an Axes.

Master the single plt.subplots() pattern that creates both at once, and you’re set.

basic pattern
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot([1, 2, 3, 4], [10, 30, 25, 40])
ax.set_title("Basic pattern")
plt.show()

Searching the web, you’ll see plenty of code that draws directly on plt, like plt.plot(...). That’s fine for quick experiments, but as soon as you have two or more plots, “which one am I drawing on right now?” becomes ambiguous. Start with fig, ax = plt.subplots() from day one and draw on ax, and the confusion never appears.
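
The ambiguity can be seen in a small sketch with two Axes. The variable names (left, right, implicit_target) are only for illustration; the point is that plt.plot draws on whichever Axes pyplot considers current, while ax.plot names its target in the code.

```python
import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Implicit style: pyplot draws on the "current" Axes, which after
# plt.subplots(1, 2) is the last one created (the right-hand one).
plt.plot([1, 2, 3], [1, 4, 9])
implicit_target = plt.gca()

# Explicit style: the target is spelled out in the code itself.
left.plot([1, 2, 3], [3, 2, 1])
left.set_title("left (explicit ax.plot)")
right.set_title("right (received plt.plot)")
```

Nothing in the plt.plot line says where the curve goes; you have to know pyplot's internal state to tell.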

Fast plotting with DataFrame.plot

pandas’s DataFrame.plot (and its Series counterpart, Series.plot, which the example below uses) is a shortcut interface that calls matplotlib under the hood. In the exploration phase, this route is much faster.

DataFrame.plot
monthly = df.groupby(df["order_date"].dt.to_period("M"))["amount"].sum()
monthly.plot(figsize=(8, 4), marker="o")
plt.show()

The value plot() returns is a matplotlib Axes. So you can plot quickly with pandas, then continue with matplotlib methods for fine adjustments like titles and labels. The two tools aren’t separate — they’re touching the same picture.

plot with pandas, polish with matplotlib
ax = monthly.plot(figsize=(8, 4), marker="o")
ax.set_title("Monthly Sales")
ax.set_ylabel("Amount")
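
To check this for yourself, here is a small sketch with a hypothetical three-value Series (not the post's monthly data): it confirms that the object pandas returns is a matplotlib Axes and that the parent Figure is reachable from it.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# Hypothetical values, only to have something to draw
sales = pd.Series([120, 150, 90], index=["Jan", "Feb", "Mar"])

ax = sales.plot(marker="o")      # drawn through pandas
print(isinstance(ax, Axes))      # the return value is a matplotlib Axes

ax.set_title("Monthly Sales")    # polished through matplotlib
fig = ax.figure                  # the Figure that owns this Axes
fig.set_size_inches(8, 4)
print(isinstance(fig, Figure))
```

Because ax.figure gives you the Figure, anything attached to the canvas, such as size and saving, stays available even when the plot started in pandas.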

Choosing a chart: four are enough

There are dozens of chart types, but in the analysis phase you actually use four. Start from “what do I want to show?” and there’s rarely anything to agonize over.

What you want to show              | Chart        | Call
Trend over time                    | Line chart   | monthly.plot()
Comparison across items            | Bar chart    | by_cat.plot.bar()
Distribution of values             | Histogram    | df["amount"].plot.hist(bins=30)
Relationship between two variables | Scatter plot | df.plot.scatter(x="amount", y="qty")

Drawing all four in practice looks like this.

the four basic charts
by_cat = df.groupby("category")["amount"].sum().sort_values(ascending=False)

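# Run each call in its own notebook cell (or pass its own ax=); consecutive
# Series plots can otherwise end up drawn on the same Axes.
monthly.plot(marker="o")                  # trend: you can see where it turns
by_cat.plot.bar()                         # comparison: size differences across items
df["amount"].plot.hist(bins=30)           # distribution: skew and tails
df.plot.scatter(x="amount", y="qty")      # relationship: correlation and outliers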

Conversely, you will almost never need a pie chart or a 3D chart during analysis. For comparisons, bars are generally read more accurately than pie slices, because lengths on a common baseline are easier to judge than angles or areas.

Styling with restraint: title, axis labels, legend — and stop

Time spent picking color palettes and styles for an exploratory plot is wasted. Even for a chart you’ll show someone else, a title, axis labels, and a legend are enough.

all the styling you need
fig, ax = plt.subplots(figsize=(8, 4))
monthly.plot(ax=ax, marker="o", label="Sales")
ax.set_title("Monthly Sales Trend")
ax.set_xlabel("Month")
ax.set_ylabel("Amount")
ax.legend()

What to avoid is just as clear. 3D charts distort along the depth axis, making values hard to read accurately. Dual axes (twinx) can give the same data a completely different impression depending on how the two scales are chosen, so they easily mislead the viewer. If you want to compare two metrics, drawing two charts stacked vertically with a shared x-axis is the safer route.
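
As a sketch of the stacked alternative, the block below rebuilds the example data from earlier in this post and draws monthly amount and monthly quantity on two Axes that share an x-axis. The names by_month, top, and bottom are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same example data as earlier in the post
rng = np.random.default_rng(42)
dates = pd.date_range("2026-01-01", periods=180, freq="D")
df = pd.DataFrame({
    "order_date": rng.choice(dates, size=600),
    "category": rng.choice(["Clothing", "Food", "Electronics", "Books"], size=600),
    "amount": rng.integers(5_000, 120_000, size=600),
    "qty": rng.integers(1, 6, size=600),
})
by_month = df.groupby(df["order_date"].dt.to_period("M"))[["amount", "qty"]].sum()

# Two rows, one column; sharex=True keeps the months aligned
fig, (top, bottom) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)

by_month["amount"].plot(ax=top, marker="o")
top.set_ylabel("Amount")

by_month["qty"].plot(ax=bottom, marker="o")
bottom.set_ylabel("Quantity")
bottom.set_xlabel("Month")

fig.tight_layout()
```

Each metric keeps its own honest y-scale, and the shared x-axis still lets the eye line up the same month in both panels.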

Broken fonts in CJK environments

If you label charts in Korean, Japanese, or Chinese, you’ll often find every title and label rendered as boxes (□). matplotlib’s default fonts simply lack CJK glyphs. The fix is to point font.family at a font installed on your system that covers the language. For Korean, that is “Malgun Gothic” on Windows, “AppleGothic” on macOS, and an installed font such as “NanumGothic” on Linux; for Japanese on Windows, “Yu Gothic” works.

CJK font setup (Korean example)
import platform
import matplotlib.pyplot as plt

if platform.system() == "Windows":
    plt.rcParams["font.family"] = "Malgun Gothic"
elif platform.system() == "Darwin":
    plt.rcParams["font.family"] = "AppleGothic"
else:
    plt.rcParams["font.family"] = "NanumGothic"   # must be installed

plt.rcParams["axes.unicode_minus"] = False

The last line is the trap people miss. After switching to a CJK font, the minus sign on negative axis labels turns into a box, because the Unicode minus (U+2212) matplotlib uses by default is missing from many CJK fonts. Turning off axes.unicode_minus substitutes a plain hyphen and everything renders normally. Treat the font setting and this option as a single set, and in a notebook, put this cell at the very top.
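
If you are not sure which of these fonts a machine actually has, one option is to ask matplotlib's font manager before setting anything. The helper below, use_first_available_font, is a hypothetical name introduced for this sketch and not part of matplotlib; it relies on font_manager.fontManager.ttflist, the list of fonts matplotlib has discovered.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager


def use_first_available_font(candidates):
    """Set font.family to the first installed candidate; return its name or None."""
    installed = {entry.name for entry in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            plt.rcParams["font.family"] = name
            plt.rcParams["axes.unicode_minus"] = False  # keep the pair together
            return name
    return None  # nothing matched: rcParams are left untouched


candidates = ["Malgun Gothic", "AppleGothic", "NanumGothic"]
chosen = use_first_available_font(candidates)
print(chosen or "no candidate font found; CJK labels may still render as boxes")
```

This avoids branching on the operating system, and the None return value makes a missing font visible at the top of the notebook instead of in the rendered chart.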

Saving and notebook display

In a Jupyter notebook, running a cell displays the figure right below it. When running as a script, you need to call plt.show() for a window to appear. To save to a file, use savefig.

saving
fig.savefig("report.png", dpi=150, bbox_inches="tight")

  • dpi: resolution. 100 is enough for on-screen checks, 150–300 for embedding in documents.
  • bbox_inches="tight": prevents labels from being clipped at the figure’s edges.
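
To make the dpi setting concrete: the saved image's pixel size is the figure size in inches multiplied by dpi. The sketch below saves one 8 by 4 inch figure at two resolutions into a temporary folder and reads the pixel dimensions back. It omits bbox_inches="tight" on purpose, since that option trims or pads the canvas and changes the final size.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))   # 8 x 4 inches
ax.plot([1, 2, 3, 4], [10, 30, 25, 40])

out_dir = Path(tempfile.mkdtemp())       # throwaway folder for the demo files
sizes = {}
for dpi in (100, 300):
    path = out_dir / f"demo_{dpi}.png"
    fig.savefig(path, dpi=dpi)           # pixels = inches * dpi
    height, width = plt.imread(str(path)).shape[:2]
    sizes[dpi] = (width, height)

print(sizes)
```

At dpi=100 the file is 800 by 400 pixels, and at dpi=300 it is 2400 by 1200, which is why the higher setting holds up better when the image is enlarged in a document.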

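One caution: in some environments, calling savefig after plt.show() saves a blank figure, typically because the figure was closed along with its window and a pyplot-level plt.savefig then writes a new, empty one. If saving is the goal, always call savefig before plt.show().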
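
A short sketch of the safe ordering, writing to a temporary path chosen just for this example. The plt.show() call is left as a comment so the snippet also runs unattended; in a script it would go where the comment is.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([1, 2, 3, 4], [10, 30, 25, 40])
ax.set_title("Saved before shown")

out_path = Path(tempfile.mkdtemp()) / "saved_first.png"

# 1. Save while the figure is guaranteed to exist.
#    Calling savefig on the fig object (not plt) also makes the target explicit.
fig.savefig(out_path, dpi=150, bbox_inches="tight")

# 2. Only then display it.
# plt.show()

print(out_path.exists(), out_path.stat().st_size > 0)
```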

Putting it together: one report figure from groupby results

Let’s place the two aggregates from #5 — the monthly trend and the category comparison — side by side in a single figure. The structure: create two Axes with plt.subplots(1, 2), then plug each aggregate in via the ax= argument.

combined example
monthly = df.groupby(df["order_date"].dt.to_period("M"))["amount"].sum()
by_cat = df.groupby("category")["amount"].sum().sort_values(ascending=False)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

monthly.plot(ax=axes[0], marker="o")
axes[0].set_title("Monthly Sales Trend")
axes[0].set_ylabel("Amount")

by_cat.plot.bar(ax=axes[1], rot=0)
axes[1].set_title("Sales by Category")

fig.tight_layout()
fig.savefig("monthly_report.png", dpi=150, bbox_inches="tight")

fig.tight_layout() automatically adjusts spacing so the two charts’ labels don’t collide. This division of labor — pandas does the aggregation, matplotlib does the layout and saving — is the basic skeleton of visualization code. Even with four charts, the same pattern extends as is with plt.subplots(2, 2) and axes[row, col] indexing.
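
As a sketch of that 2 × 2 extension, the block below rebuilds the example data and places the four basic chart types from this post on one figure. The panel titles are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same example data as earlier in the post
rng = np.random.default_rng(42)
dates = pd.date_range("2026-01-01", periods=180, freq="D")
df = pd.DataFrame({
    "order_date": rng.choice(dates, size=600),
    "category": rng.choice(["Clothing", "Food", "Electronics", "Books"], size=600),
    "amount": rng.integers(5_000, 120_000, size=600),
    "qty": rng.integers(1, 6, size=600),
})
monthly = df.groupby(df["order_date"].dt.to_period("M"))["amount"].sum()
by_cat = df.groupby("category")["amount"].sum().sort_values(ascending=False)

# axes is a 2 x 2 array, indexed as axes[row, col]
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

monthly.plot(ax=axes[0, 0], marker="o", title="Monthly Sales Trend")
by_cat.plot.bar(ax=axes[0, 1], rot=0, title="Sales by Category")
df["amount"].plot.hist(ax=axes[1, 0], bins=30, title="Order Amount Distribution")
df.plot.scatter(x="amount", y="qty", ax=axes[1, 1], title="Amount vs. Quantity")

fig.tight_layout()
```

Every call receives its own ax=, so there is never a question of which panel a chart lands on.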

Wrap-up

What this post covered:

  • matplotlib’s structure boils down to Figure (the canvas) and Axes (the coordinate plane), and everything starts from the one plt.subplots() pattern
  • DataFrame.plot is a shortcut interface over matplotlib, and you can keep polishing the Axes it returns
  • Pick charts by purpose: line for trends, bar for comparisons, histogram for distributions, scatter for relationships
  • Style only up to title, axis labels, and legend; avoid 3D and dual axes
  • A CJK font setting and axes.unicode_minus = False always travel as a set
  • Save with savefig, specifying dpi and bbox_inches="tight" together

In the next post (#7 A taste of Polars and wrap-up), we sample Polars, a newer DataFrame library that is often much faster than pandas, and close the series with a one-page map of all seven posts.
