Seaborn Pairplot: How to Display Off-Diagonal KDEs for Two Classes (Instead of Scatterplots)

Exploratory Data Analysis (EDA) is a critical step in understanding your data, and visualizations are its backbone. Among the most powerful EDA tools in Python is Seaborn’s pairplot, which visualizes pairwise relationships between variables in a dataset. By default, pairplot uses scatterplots for off-diagonal elements (to show bivariate relationships) and histograms/KDEs for diagonal elements (to show univariate distributions). However, when analyzing two classes (e.g., "healthy" vs. "diseased" or "0" vs. "1"), scatterplots on the off-diagonal can become cluttered due to overplotting, making it hard to distinguish how each class is distributed.

In this post, we’ll explore how to replace the off-diagonal scatterplots in Seaborn’s pairplot with Kernel Density Estimation (KDE) plots. KDEs visualize the estimated probability density of the data, making it easier to compare the distributions of two classes and identify overlap or separation. We’ll walk through a step-by-step implementation, explain the benefits, and share customization tips to make your visualizations more insightful.

Table of Contents

  1. What is a Seaborn Pairplot?
  2. Default Pairplot Behavior: Diagonals vs. Off-Diagonals
  3. The Problem with Scatterplots for Two Classes
  4. Solution: Replace Off-Diagonal Scatterplots with KDEs
  5. Step-by-Step Implementation
  6. Advanced Customization
  7. Conclusion
  8. References

1. What is a Seaborn Pairplot?

Seaborn’s pairplot is a high-level function for visualizing pairwise relationships in a dataset. It creates a grid of plots where:

  • Diagonal elements show univariate distributions (e.g., histograms or KDEs) of individual variables.
  • Off-diagonal elements show bivariate relationships between pairs of variables (e.g., scatterplots or KDEs).

pairplot is particularly useful for EDA because it quickly reveals correlations, outliers, and distribution shapes across multiple variables. It works seamlessly with pandas DataFrames and supports a hue parameter to group data by a categorical variable (e.g., class labels), making it ideal for comparing two classes.

2. Default Pairplot Behavior: Diagonals vs. Off-Diagonals

Let’s start by understanding the default behavior of pairplot with a two-class dataset. We’ll use the breast cancer dataset from sklearn, which contains 30 features describing breast mass characteristics, with a binary target (0 = malignant, 1 = benign).

Default Pairplot Code (Scatterplots for Off-Diagonals):

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_breast_cancer
 
# Load dataset
cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target
df["target"] = df["target"].map({0: "Malignant", 1: "Benign"})  # Rename targets for clarity
 
# Select a subset of features to avoid clutter
features = ["mean radius", "mean texture", "mean perimeter", "mean area"]
 
# Default pairplot with scatterplots on off-diagonal
sns.pairplot(df, hue="target", vars=features, corner=True)
plt.suptitle("Default Pairplot: Scatterplots on Off-Diagonal", y=1.02)
plt.show()

What You’ll See:

  • Diagonals: KDEs showing the distribution of each feature for both classes. With the default diag_kind="auto", pairplot draws histograms when no hue is given and KDEs when hue is specified, as it is here.
  • Off-diagonals: Scatterplots with points colored by class (Malignant vs. Benign).

3. The Problem with Scatterplots for Two Classes

While scatterplots show individual data points, they suffer from overplotting when data is dense. For two classes with overlapping distributions:

  • Points from both classes may merge visually, hiding the true shape of each class’s distribution.
  • It’s hard to judge how much the two classes overlap, or to answer questions such as "Do malignant tumors have a higher mean radius?".

For example, in the breast cancer dataset, features like mean radius and mean perimeter are highly correlated. A scatterplot of these features will have dense clusters, making it hard to distinguish how "Malignant" and "Benign" classes are distributed.
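To get a feel for why such a scatterplot becomes crowded, the sketch below uses synthetic stand-in data rather than the real dataset: a made-up "radius" for two classes and a "perimeter" derived from it with a little noise. All the numbers here are assumptions chosen for illustration. It reports the correlation between the two features and the share of points that fall in the value range occupied by both classes.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for two strongly related features (values are illustrative only)
rng = np.random.default_rng(1)
radius = np.concatenate([rng.normal(12.0, 1.8, 350), rng.normal(17.5, 3.2, 210)])
perimeter = 2 * np.pi * radius + rng.normal(0, 2.0, radius.size)
toy = pd.DataFrame({
    "radius": radius,
    "perimeter": perimeter,
    "label": ["Benign"] * 350 + ["Malignant"] * 210,
})

# Pearson correlation between the two features
corr = toy["radius"].corr(toy["perimeter"])

# Range of radius values that both classes occupy, and the share of points inside it
per_class = toy.groupby("label")["radius"].agg(["min", "max"])
shared_lo, shared_hi = per_class["min"].max(), per_class["max"].min()
share_in_overlap = toy["radius"].between(shared_lo, shared_hi).mean()

print(f"correlation: {corr:.3f}")
print(f"share of points in the shared range: {share_in_overlap:.2f}")
```

On the real DataFrame from the earlier code, the same `corr` call can be applied to any two columns in `features`.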

4. Solution: Replace Off-Diagonal Scatterplots with KDEs

Kernel Density Estimation (KDE) plots solve this problem by estimating the probability density of a dataset. Instead of plotting individual points, KDEs show smooth curves (or filled regions) representing where data is most concentrated. For two classes, overlaid KDEs on off-diagonal elements make it easy to:

  • See the shape of each class’s distribution (e.g., "Benign tumors are concentrated at lower mean radius values").
  • Identify overlap (e.g., "Malignant and Benign overlap most in the 10–15 range for mean texture").
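To make the idea of density estimation concrete, here is a hand-rolled one-dimensional Gaussian KDE in NumPy. It is a simplified sketch, not Seaborn’s implementation: the bandwidth is fixed by hand, and the two samples are synthetic. It also computes the overlap coefficient (the area under the pointwise minimum of the two densities), which is one way to put a number on "how much do the classes overlap".

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Two synthetic classes with different centres and spreads (illustrative values)
rng = np.random.default_rng(42)
class_a = rng.normal(loc=12.0, scale=1.5, size=300)
class_b = rng.normal(loc=17.0, scale=3.0, size=200)

grid = np.linspace(0, 35, 701)
step = grid[1] - grid[0]
dens_a = gaussian_kde_1d(class_a, grid, bandwidth=0.8)
dens_b = gaussian_kde_1d(class_b, grid, bandwidth=0.8)

# Each density integrates to roughly 1 over a grid that covers the data
area_a = dens_a.sum() * step
area_b = dens_b.sum() * step

# Overlap coefficient: 0 means fully separated, 1 means identical densities
overlap = np.minimum(dens_a, dens_b).sum() * step
print(f"areas: {area_a:.3f}, {area_b:.3f}; overlap: {overlap:.3f}")
```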

Seaborn’s pairplot supports this via the kind parameter. Setting kind="kde" replaces off-diagonal scatterplots with KDEs, and diag_kind="kde" ensures diagonal elements remain KDEs (for consistency).

5. Step-by-Step Implementation

Let’s modify the earlier example to use KDEs on off-diagonal elements.

Step 1: Import Libraries

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_breast_cancer

Step 2: Load and Prepare Data

We’ll use the same breast cancer dataset, with target labels renamed for clarity:

cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target
df["target"] = df["target"].map({0: "Malignant", 1: "Benign"})  # Class labels
features = ["mean radius", "mean texture", "mean perimeter", "mean area"]  # Subset of features
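One thing worth checking after a relabeling step like this: `Series.map` with a dictionary returns NaN for any value that is missing from the dictionary, which would leave those rows without a class label. The sketch below shows the behavior on a small made-up Series (the code `2` is deliberately unexpected); on the real data the same `isna().sum()` check can be run on `df["target"]`.

```python
import pandas as pd

# Toy target codes; 2 is a deliberately unexpected value
codes = pd.Series([0, 1, 1, 0, 2])
labels = codes.map({0: "Malignant", 1: "Benign"})

# Values absent from the mapping become NaN
n_unmapped = int(labels.isna().sum())
print(labels.value_counts(dropna=False))
print(f"unmapped rows: {n_unmapped}")
```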

Step 3: Create Pairplot with Off-Diagonal KDEs

Use kind="kde" to set off-diagonal plots to KDEs and diag_kind="kde" to keep diagonals as KDEs:

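# Pairplot with off-diagonal KDEs
sns.pairplot(
    df,
    hue="target",  # Group by class
    vars=features,  # Subset of features
    kind="kde",  # Off-diagonal: KDE plots
    diag_kind="kde",  # Diagonals: KDE plots
    corner=True,  # Remove redundant upper triangle (optional)
    palette="viridis",  # Color palette
    # pairplot does not accept styling keywords such as linewidth or alpha directly;
    # they are forwarded to kdeplot through plot_kws (off-diagonal) and diag_kws (diagonal)
    plot_kws={"linewidths": 2, "alpha": 0.7},  # Thicker, semi-transparent contour lines
    diag_kws={"linewidth": 2},  # Thicker outline for the diagonal KDE curves
)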
plt.suptitle("Pairplot: Off-Diagonal KDEs for Two Classes", y=1.02)
plt.show()

What You’ll See Now:

  • Diagonals: KDEs for each feature, with separate curves for "Malignant" and "Benign".
  • Off-diagonals: Overlaid KDEs for each class, showing where each class is most dense. For example:
    • mean radius vs. mean perimeter: The Malignant density contours are shifted toward higher values on both axes.
    • mean texture vs. mean area: Benign tumors show a tighter density cluster at lower areas.

6. Advanced Customization

For more control (e.g., mixing scatterplots and KDEs in upper/lower triangles), use PairGrid (the underlying API for pairplot). This lets you define different plot types for upper, lower, and diagonal elements.

Example: Upper Triangle Scatterplots, Lower Triangle KDEs

# Initialize PairGrid
# (corner=True is omitted: it removes the upper-triangle axes, so map_upper would draw nothing)
g = sns.PairGrid(
    df,
    hue="target",
    vars=features,
    palette="viridis"
)
 
# Upper triangle: Scatterplots (sparse points)
g.map_upper(sns.scatterplot, alpha=0.5, s=30)  # s=30: point size
 
# Lower triangle: Filled KDEs (dense regions)
g.map_lower(sns.kdeplot, fill=True, alpha=0.3)  # fill=True: shaded KDEs
 
# Diagonals: Filled KDEs
g.map_diag(sns.kdeplot, fill=True, alpha=0.5)
 
# Add legend and title
g.add_legend(title="Tumor Type")
plt.suptitle("Mixed Pairplot: Scatter (Upper) + KDE (Lower)", y=1.02)
plt.show()

Key Customizations:

  • fill=True: Shades the area under the diagonal KDE curves and fills the contour regions of the bivariate KDEs for better readability.
  • alpha=0.3: Makes KDEs transparent to avoid hiding overlapping regions.
  • map_upper/map_lower: Separates plot types for upper/lower triangles (useful for comparing scatter and KDE side-by-side).

7. Conclusion

Seaborn’s pairplot is a powerful EDA tool, but default scatterplots on off-diagonal elements can obscure class distributions for two-class datasets. By replacing off-diagonal scatterplots with KDEs (via kind="kde"), you can visualize density and overlap more effectively. For advanced use cases, PairGrid lets you mix plot types (e.g., scatter upper, KDE lower) to balance detail and clarity.

Whether you’re analyzing medical data (like breast cancer) or customer churn, KDEs in pairplots will help you answer: "How do my classes distribute across features?"

8. References