How to Efficiently Select Non-Adjacent Columns by Column Number in Pandas (iloc Method)
When working with data in Pandas, selecting specific columns is a fundamental task. While many users rely on column names (using loc), there are scenarios where selecting columns by their position (column number) is more convenient: for example, when column names are long, non-descriptive, or dynamically generated. Pandas’ iloc indexer is designed for integer-based indexing, making it ideal for this job.
In this blog, we’ll focus on a common challenge: selecting non-adjacent columns (columns that are not next to each other) using their position with iloc. We’ll break down the syntax, walk through practical examples, highlight pitfalls to avoid, and share best practices to make your code robust and readable.
Table of Contents#
- Understanding iloc Basics
- Selecting Non-Adjacent Columns with iloc
- Step-by-Step Examples
- Common Pitfalls and How to Avoid Them
- Best Practices
- Conclusion
- References
1. Understanding iloc Basics#
Before diving into non-adjacent column selection, let’s recap how iloc works.
iloc is Pandas’ integer-location based indexer, used to select rows and columns by their position (0-based index). The syntax is:
df.iloc[row_indexer, column_indexer]
- row_indexer: Specifies which rows to select (e.g., 0:5 for the first 5 rows, [0, 2, 4] for specific rows).
- column_indexer: Specifies which columns to select (using the same logic as row_indexer).
For example, to select rows 0-2 and columns 1 and 3:
df.iloc[0:3, [1, 3]]  # Rows 0, 1, 2; columns 1 and 3
Key Note: iloc uses 0-based indexing, meaning the first column is 0, the second is 1, and so on.
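As a quick illustration of how the form of the column indexer shapes the result, here is a minimal sketch. The frame `demo` and its values are made up for illustration only. A single integer returns a Series, while a list or a slice returns a DataFrame, and a slice excludes its stop position.

```python
import pandas as pd

# Hypothetical 3x4 frame used only for illustration
demo = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=["w", "x", "y", "z"],
)

single = demo.iloc[:, 1]        # scalar position -> Series (column "x")
as_list = demo.iloc[:, [1]]     # one-element list -> DataFrame with one column
as_slice = demo.iloc[0:2, 1:3]  # rows 0-1, columns 1-2 (stop positions excluded)

print(type(single).__name__)    # Series
print(type(as_list).__name__)   # DataFrame
print(as_slice.shape)           # (2, 2)
```

Wrapping a single position in a list is a simple way to keep the result two-dimensional when later code expects a DataFrame.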
2. Selecting Non-Adjacent Columns with iloc#
To select non-adjacent columns (columns that are not next to each other) by their position, use a list of integers as the column_indexer. This list contains the positions of the columns you want to keep.
Basic Syntax for Non-Adjacent Columns:#
df.iloc[:, [col1, col2, col3, ...]]
- The : in the row_indexer selects all rows (replace it with specific row indices if needed).
- [col1, col2, ...] is a list of non-adjacent column positions (e.g., [0, 2, 4] for the 1st, 3rd, and 5th columns).
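One detail worth seeing in action: the result follows the order of the list, not the order of the original DataFrame, and a position may be repeated. The sketch below uses a small made-up frame (`demo` is an assumption for illustration).

```python
import pandas as pd

# Hypothetical frame with five columns at positions 0-4
demo = pd.DataFrame(
    {"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8], "E": [9, 10]}
)

reordered = demo.iloc[:, [4, 0, 2]]  # columns come back in list order
repeated = demo.iloc[:, [1, 1, 3]]   # the same position can appear twice

print(list(reordered.columns))  # ['E', 'A', 'C']
print(list(repeated.columns))   # ['B', 'B', 'D']
```

Repeated positions produce duplicate column names, which can complicate later name-based selection, so it is usually safer to keep the list free of duplicates.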
3. Step-by-Step Examples#
Let’s walk through practical examples to solidify your understanding.
Example 1: Basic Non-Adjacent Column Selection#
Suppose we have a small DataFrame with 5 columns (A, B, C, D, E). We want to select the 1st, 3rd, and 5th columns (positions 0, 2, and 4).
Step 1: Create a Sample DataFrame#
import pandas as pd
data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90],
    'D': [100, 110, 120],
    'E': [130, 140, 150]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
Output:
Original DataFrame:
     A   B   C    D    E
0  10  40  70  100  130
1  20  50  80  110  140
2  30  60  90  120  150
Step 2: Select Non-Adjacent Columns (Positions 0, 2, 4)#
selected_columns = df.iloc[:, [0, 2, 4]] # Columns 0 (A), 2 (C), 4 (E)
print("Selected Non-Adjacent Columns:\n", selected_columns)
Output:
Selected Non-Adjacent Columns:
     A   C    E
0  10  70  130
1  20  80  140
2  30  90  150
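For comparison, the same selection can be written in a couple of other ways. This sketch rebuilds the sample DataFrame so that it runs on its own, and checks that `DataFrame.take` and indexing through `df.columns` give the same result as `iloc`. It is offered as a cross-check, not as a preferred alternative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [10, 20, 30],
        "B": [40, 50, 60],
        "C": [70, 80, 90],
        "D": [100, 110, 120],
        "E": [130, 140, 150],
    }
)

positions = [0, 2, 4]

via_iloc = df.iloc[:, positions]        # positional indexer
via_take = df.take(positions, axis=1)   # positional selection along the columns
via_names = df[df.columns[positions]]   # translate positions to names first

print(via_iloc.equals(via_take))   # True
print(via_iloc.equals(via_names))  # True
```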
Example 2: Selecting Non-Adjacent Columns in a Large Dataset#
For larger datasets, column positions are often easier to track than long/complex column names. Let’s use the Titanic dataset (via Seaborn) to demonstrate.
Step 1: Load the Dataset and Inspect Columns#
import seaborn as sns
titanic = sns.load_dataset('titanic')
print("Titanic Columns (with positions):")
for idx, col in enumerate(titanic.columns):
    print(f"Position {idx}: {col}")
Output (truncated):
Titanic Columns (with positions):
Position 0: survived
Position 1: pclass
Position 2: sex
Position 3: age
Position 4: sibsp
Position 5: parch
Position 6: fare
...
Step 2: Select Non-Adjacent Columns by Position#
Suppose we want survived (pos 0), age (pos 3), and fare (pos 6):
selected_titanic = titanic.iloc[:, [0, 3, 6]] # Columns 0, 3, 6
print("Selected Titanic Columns:\n", selected_titanic.head())
Output:
Selected Titanic Columns:
    survived   age     fare
0         0  22.0   7.2500
1         1  38.0  71.2833
2         1  26.0   7.9250
3         1  35.0  53.1000
4         0  35.0   8.0500
Example 3: Combining Adjacent and Non-Adjacent Columns#
Sometimes you may want to select a mix of adjacent (e.g., columns 0-1) and non-adjacent (e.g., columns 3 and 5) columns. A single column indexer cannot mix a slice with a list, so expand the adjacent range into a list with list(range()) and concatenate it with the list of individual positions.
Example: Select Columns 0-1 (adjacent) and 3, 5 (non-adjacent)#
# Columns 0-1 (adjacent) + 3,5 (non-adjacent)
columns_to_select = list(range(0, 2)) + [3, 5] # [0,1,3,5]
selected_mixed = titanic.iloc[:, columns_to_select]
print("Mixed Selection Columns:\n", selected_mixed.columns)
Output:
Mixed Selection Columns:
Index(['survived', 'pclass', 'age', 'parch'], dtype='object')
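When several ranges and single positions need to be combined, NumPy's `np.r_` helper offers a more compact spelling than chaining `list(range())` calls. The sketch below is one possible approach, using a made-up eight-column frame (`wide` is an assumption for illustration) so that it runs without downloading a dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 8 columns named a-h (positions 0-7)
wide = pd.DataFrame(np.arange(16).reshape(2, 8), columns=list("abcdefgh"))

# np.r_ concatenates slices and scalars into one integer array
positions = np.r_[0:2, 3, 5:7]    # array([0, 1, 3, 5, 6])
subset = wide.iloc[:, positions]

print(positions.tolist())     # [0, 1, 3, 5, 6]
print(list(subset.columns))   # ['a', 'b', 'd', 'f', 'g']
```

As with ordinary slices, each range inside `np.r_` excludes its stop value, so `5:7` contributes positions 5 and 6.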
4. Common Pitfalls and How to Avoid Them#
While iloc is powerful, it’s easy to make mistakes. Here are key pitfalls and fixes:
Pitfall 1: Confusing 0-Based vs. 1-Based Indexing#
Problem: Assuming the first column is position 1 (instead of 0).
Example:
# Trying to select the first column (A) with [1] (incorrect)
df.iloc[:, [1]]  # Returns column B instead of A
Fix: Use 0 for the first column:
df.iloc[:, [0]]  # Correctly selects column A
Pitfall 2: Using Negative Indices (Without Awareness)#
iloc supports negative indices (e.g., -1 = last column), but this can cause confusion.
Example:
df.iloc[:, [-1]]  # Selects the last column (E in our sample DataFrame)
Fix: Explicitly note negative indices in comments to avoid ambiguity.
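One way to make negative positions explicit is to translate them into their non-negative equivalents before selecting. The following is a minimal sketch on a one-row version of the sample DataFrame. The names `mixed` and `normalized` are illustrative, not part of any Pandas API.

```python
import pandas as pd

df = pd.DataFrame({"A": [10], "B": [40], "C": [70], "D": [100], "E": [130]})

n_cols = df.shape[1]   # 5 columns, valid positions 0-4
mixed = [0, -2, -1]    # first, second-to-last, and last column

# Convert each negative position to the equivalent non-negative one
normalized = [pos + n_cols if pos < 0 else pos for pos in mixed]

print(normalized)                               # [0, 3, 4]
print(list(df.iloc[:, mixed].columns))          # ['A', 'D', 'E']
print(list(df.iloc[:, normalized].columns))     # ['A', 'D', 'E']
```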
Pitfall 3: Using String Names in iloc#
iloc only accepts integer indices. Using column names raises an error.
Problem:
df.iloc[:, ['A']]  # IndexError: .iloc requires numeric indexers
Fix: Use loc for column names, or iloc with positions:
df.loc[:, ['A']]  # Correct (using loc with names)
df.iloc[:, [0]]  # Correct (using iloc with position)
Pitfall 4: Out-of-Bounds Column Indices#
Passing a list that contains a position greater than or equal to the number of columns raises an IndexError (valid positions run from 0 to len(df.columns) - 1).
Problem:
df.iloc[:, [10]]  # df has 5 columns (0-4) → IndexError
Fix: Check column count first with len(df.columns):
print(f"Total columns: {len(df.columns)}")  # Ensure indices are < len(df.columns)
5. Best Practices#
To make your code readable and robust when selecting non-adjacent columns with iloc:
1. Document Column Positions#
Comment the purpose of each column position to clarify intent:
# Select 'survived' (0), 'age' (3), 'fare' (6)
selected = titanic.iloc[:, [0, 3, 6]]
2. Use Variables for Reusable Column Indices#
Assign column positions to variables for clarity, especially in large projects:
col_survived = 0
col_age = 3
col_fare = 6
selected = titanic.iloc[:, [col_survived, col_age, col_fare]]
3. Validate Selections with df.columns#
Check that your selected positions map to the correct columns:
selected_cols = [0, 3, 6]
print("Selected Column Names:", titanic.columns[selected_cols])  # Verify names
4. Avoid Over-Reliance on Positions#
If column order might change (e.g., after data preprocessing), prefer loc with names. Use iloc only when column positions are stable.
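If you still want positional selection but cannot fully trust the column order, one option is to look the positions up from the names at run time. The sketch below uses `Index.get_indexer`, which returns -1 for names it cannot find. The `wanted` list and the error handling are illustrative assumptions, not a prescribed pattern.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20], "B": [40, 50], "C": [70, 80], "D": [100, 110], "E": [130, 140]}
)

wanted = ["A", "C", "E"]

# Resolve names to current positions; -1 marks a name that is not present
positions = df.columns.get_indexer(wanted)

if (positions == -1).any():
    missing = [name for name, pos in zip(wanted, positions) if pos == -1]
    raise KeyError(f"Columns not found: {missing}")

selected = df.iloc[:, positions]

print(positions.tolist())       # [0, 2, 4]
print(list(selected.columns))   # ['A', 'C', 'E']
```

Because the positions are recomputed each time, the selection keeps working if columns are reordered upstream, and it fails loudly if one of them disappears.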
6. Conclusion#
Selecting non-adjacent columns by position in Pandas is straightforward with iloc. By passing a list of integers as the column indexer, you can extract exactly the columns you need, even in large datasets. Remember to:
- Use 0-based indexing.
- Combine list(range()) with a list of positions for mixed adjacent/non-adjacent selections.
- Avoid common pitfalls like string indices or out-of-bounds errors.
- Document and validate your selections for readability.
With these techniques, you’ll streamline your data wrangling workflow and avoid costly mistakes.