Why Isn't Seaborn pairplot hue='C' Hiding Column 'C'? Troubleshooting Hue Parameter Behavior

Seaborn’s pairplot is a staple in exploratory data analysis (EDA), offering a concise way to visualize relationships between multiple numeric variables in a dataset. By creating a grid of scatterplots (for pairwise relationships) and histograms/kernel density estimates (KDEs, for univariate distributions), it helps analysts quickly identify patterns, correlations, and outliers.

A common feature of pairplot is the hue parameter, which allows users to group data points by a categorical or numeric variable, encoding these groups with distinct colors. However, a frequent source of confusion arises when users set hue='C' expecting the column C to be excluded from the pair grid—only to find C still plotted as one of the variables.

In this post, we'll look at how hue interacts with column selection, explain why hue='C' should not be relied on to hide column C, and walk through step-by-step troubleshooting to resolve the issue.

Table of Contents#

  1. Understanding Seaborn’s pairplot and the hue Parameter
  2. Why Doesn’t hue='C' Hide Column C?
  3. Step-by-Step Troubleshooting
  4. Advanced: Explicitly Excluding Columns from pairplot
  5. Conclusion
  6. References

1. Understanding Seaborn’s pairplot and the hue Parameter#

Before diving into the troubleshooting, let’s clarify how pairplot works under the hood and the role of the hue parameter.

What is pairplot?#

Seaborn’s pairplot (short for “pairwise plot”) generates a matrix of plots where each cell displays the relationship between two variables. By default:

  • The x-axis and y-axis of each cell correspond to numeric columns in the input DataFrame.
  • The diagonal cells show univariate distributions (histograms or KDEs) of individual variables.
  • Off-diagonal cells show scatterplots of pairs of variables (or regression fits when kind="reg" is passed).

Role of the hue Parameter#

The hue parameter adds a layer of grouping to the plot by color-encoding data points based on a specified column. For example:

import seaborn as sns  
import pandas as pd  
 
# Load sample dataset (iris has numeric features and a categorical 'species' column)  
iris = sns.load_dataset("iris")  
sns.pairplot(iris, hue="species")  

Here, hue="species" colors data points by the species category, but species does not appear as an axis variable in the pair grid. Why? Because species is a categorical column (non-numeric), and pairplot only includes numeric columns in the pair grid by default.

Key Takeaway#

pairplot leaves non-numeric columns off the pair grid, even when they are used as hue. A numeric hue column is a different case: it qualifies as a grid variable, so depending on the seaborn version it can be plotted on the axes as well as used for color.

2. Why Doesn’t hue='C' Hide Column C?#

The confusion often stems from a misunderstanding of hue’s purpose: hue controls color encoding, not variable inclusion/exclusion in the pair grid.

The Root Cause: Numeric vs. Non-Numeric hue Columns#

By default, pairplot builds the grid from the DataFrame's numeric columns. hue only assigns colors, so it should not be relied on to remove a column. What happens to C therefore depends on its dtype:

Case 1: C is Numeric#

If C is a numeric column (e.g., integer, float), pairplot treats it as a candidate grid variable. With hue='C' set, C can therefore still appear as an axis variable, producing panels such as C vs. A and C vs. B.

Case 2: C is Non-Numeric (Categorical)#

If C is non-numeric (e.g., string labels, pd.Categorical, or object dtype), pairplot will exclude it from the pair grid (since only numeric columns are plotted). In this case, hue='C' will color points without C appearing as an axis variable.

Example: Numeric C is Included#

Let’s create a DataFrame where C is numeric and use it as hue:

import pandas as pd  
import seaborn as sns  
 
# Create a DataFrame with numeric columns A, B, C  
data = pd.DataFrame({  
    "A": [1, 2, 3, 4, 5],  
    "B": [5, 4, 3, 2, 1],  
    "C": [10, 20, 30, 40, 50]  # Numeric column  
})  
 
# Use C as hue  
sns.pairplot(data, hue="C")  

Output: On seaborn versions that keep a numeric hue column among the default grid variables, the pair grid includes A, B, and C as axis variables: C appears on both the x and y axes, with points colored by their C values.

Example: Non-Numeric C is Excluded#

Now, convert C to a categorical column and re-run:

# Convert C to categorical (non-numeric)  
data["C"] = pd.Categorical(["Low", "Low", "Medium", "High", "High"])  
 
# Use C as hue  
sns.pairplot(data, hue="C")  

Output: The pair grid now only includes A and B as axis variables. C (categorical) is excluded from the grid, even though it’s used as hue.

Why This Happens#

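pairplot delegates to PairGrid, which by default builds the grid from the columns it detects as numeric. The effect is conceptually similar to data.select_dtypes(include="number"), although that is not the literal call seaborn makes. Numeric C passes this check; non-numeric C fails it and is left out. Whether a numeric hue column is then kept on the grid depends on the seaborn version: older releases plot it, while recent ones drop the hue column from the default variable list. In every version, an explicit vars, x_vars, or y_vars argument takes precedence over these defaults.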

3. Step-by-Step Troubleshooting#

If hue='C' isn’t hiding C, follow these steps to diagnose the issue:

Step 1: Check the Data Type of C#

First, confirm whether C is numeric. Use the DataFrame's dtypes attribute to check:

print(data.dtypes)  
# Output (if C is numeric):  
# A    int64  
# B    int64  
# C    int64  # Numeric! Will be included in pairplot  
# dtype: object  
 
# Output (if C is non-numeric):  
# A      int64  
# B      int64  
# C    category  # Non-numeric! Will be excluded  
# dtype: object  

Step 2: Verify pairplot’s Default Behavior#

pairplot draws its default grid variables from the numeric columns. To see which columns qualify, list them:

print(data.select_dtypes(include="number").columns.tolist())
# If this prints ['A', 'B', 'C'], all three are candidates for the grid.

Step 3: Confirm hue Doesn’t Control Exclusion#

hue is a color mapping, not a documented way to filter columns. To see its effect on your installation, run pairplot with and without hue and compare the axes:

# Without hue: every numeric column is on the grid
sns.pairplot(data)

# With hue='C' (numeric): check which columns are on the axes
sns.pairplot(data, hue="C")

If C is numeric and still appears in the second plot, hue is not going to remove it for you, and you need to exclude it explicitly.

4. Advanced: Explicitly Excluding Columns from pairplot#

To hide C (or any column) from the pair grid, explicitly specify which columns to include using the vars parameter.

Solution 1: Use vars to Select Columns#

The vars parameter lets you list the numeric columns to include in the pair grid. Exclude C by omitting it:

# Include only A and B (exclude C)  
sns.pairplot(data, hue="C", vars=["A", "B"])  

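Solution 2: Drop C from the Variable List#

pairplot expects hue to be the name of a column in the DataFrame it receives, so C has to stay in the data. Instead of dropping C from the DataFrame, drop it from the list of grid variables:

# Keep C in the data for hue, but leave it out of the grid variables
# (assumes every remaining column is numeric)
sns.pairplot(data, hue="C", vars=list(data.columns.drop("C")))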

Solution 3: Use x_vars and y_vars#

For more control, use x_vars and y_vars to specify axes separately:

# Plot A and B on x-axis; A and B on y-axis (exclude C)  
sns.pairplot(data, hue="C", x_vars=["A", "B"], y_vars=["A", "B"])  
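x_vars and y_vars do not have to match, so the same mechanism can produce a rectangular grid that leaves C off both axes. The example below is a sketch on made-up data (the extra column D is an assumption added for illustration). It passes diag_kind=None because this layout has no diagonal cells.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook

import pandas as pd
import seaborn as sns

data = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "B": [6, 5, 4, 3, 2, 1],
    "D": [2, 4, 1, 5, 3, 6],
    "C": ["Low", "Low", "Low", "High", "High", "High"],
})

# One row (D on the y-axis) and two columns (A and B on the x-axis)
grid = sns.pairplot(
    data, hue="C", x_vars=["A", "B"], y_vars=["D"], diag_kind=None
)
print(grid.axes.shape)  # (1, 2)
```

The axes array has one row per entry in y_vars and one column per entry in x_vars.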

5. Conclusion#

The key takeaway is that hue in pairplot controls color encoding and should not be relied on for variable exclusion. If C isn't hidden, it is most likely a numeric column on a seaborn version that keeps numeric hue columns among the default grid variables.

To fix this:

  • Check if C is numeric (use df.dtypes).
  • Explicitly exclude C using vars, x_vars, or y_vars.
  • Convert C to a categorical dtype if it represents groups (this auto-excludes it from the pair grid).

By understanding pairplot’s default behavior and using explicit parameters, you can tailor the pair grid to your needs.

6. References#