How to Search a Text File for Keywords or Key-Phrases and Print Related Lines in Python

Searching text files for relevant information is a common task. Whether you’re analyzing log files, parsing large datasets, or pulling details out of text documents, Python’s standard library has the tools you need. This post walks through step-by-step methods for finding keywords or key-phrases in text files and printing the matching lines, with practical examples and techniques for real-world scenarios. By the end, you’ll be able to automate text searches and extend them to multiple files.

Table of Contents#

  1. Prerequisites
  2. Basic Approach: Search for Keywords in a Single Text File
  3. Advanced Search: Key-Phrases, Case Insensitivity, and Partial Matches
  4. Searching Across Multiple Text Files
  5. Filtering and Processing Results (Beyond Printing)
  6. Practical Examples: Real-World Use Cases
  7. Troubleshooting Common Issues
  8. Conclusion
  9. References

Prerequisites#

Before diving in, ensure you have:

  • Python 3.x installed: Download from python.org.
  • Basic Python knowledge: Familiarity with variables, loops, and functions.
  • A text file to practice: Create a sample file (e.g., sample.txt) with lines like:
    The quick brown fox jumps over the lazy dog.  
    Python is a powerful programming language.  
    Learning Python can boost your career!  
    The dog chased the fox through the forest.  
    Error: Connection timed out.  
    

Basic Approach: Search for Keywords in a Single Text File#

Let’s start with the fundamentals: searching for a single keyword in a text file and printing the lines containing it.

Step 1: Open the File and Read Lines#

Use Python’s with statement to open the file (ensures proper closure). Read lines one at a time to handle large files efficiently.

Step 2: Check for Keyword Presence#

Loop through each line, check if the keyword exists, and print the line (with its number for context).

Example Code:#

def search_keyword_in_file(file_path, keyword):
    with open(file_path, 'r', encoding='utf-8') as file:
        for line_num, line in enumerate(file, 1):  # enumerate starts at line 1
            if keyword in line:
                print(f"Line {line_num}: {line.strip()}")  # .strip() removes surrounding whitespace, including the trailing newline
 
# Usage
file_path = 'sample.txt'
keyword = 'Python'
search_keyword_in_file(file_path, keyword)

Output:#

Line 2: Python is a powerful programming language.  
Line 3: Learning Python can boost your career!  

Explanation:#

  • with open(...): Safely opens the file and closes it automatically.
  • enumerate(file, 1): Tracks line numbers (starts at 1 instead of 0).
  • keyword in line: Checks if the keyword exists in the current line.
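The same loop extends to several keywords at once. The following is a minimal sketch, not part of the original example: the function name find_lines_with_any is hypothetical, it returns the matches instead of printing them, and it writes a temporary copy of the sample file so it can run on its own.

```python
import os
import tempfile

def find_lines_with_any(file_path, keywords):
    """Return (line_number, line) pairs for lines containing at least one keyword."""
    matches = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line_num, line in enumerate(file, 1):
            # any() stops at the first keyword found in the line
            if any(keyword in line for keyword in keywords):
                matches.append((line_num, line.strip()))
    return matches

# Throwaway copy of the sample file so the sketch is self-contained
sample_text = (
    "The quick brown fox jumps over the lazy dog.\n"
    "Python is a powerful programming language.\n"
    "Learning Python can boost your career!\n"
    "The dog chased the fox through the forest.\n"
    "Error: Connection timed out.\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

for line_num, line in find_lines_with_any(sample_path, ['Python', 'Error']):
    print(f"Line {line_num}: {line}")

os.remove(sample_path)  # clean up the temporary file
```

Returning a list rather than printing inside the function makes the result easier to reuse, for example in the report-writing step later in this post.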

Advanced Search: Key-Phrases, Case Insensitivity, and Partial Matches#

The basic method works for simple keywords, but real-world tasks often require:

  • Key-phrases (multiple words, e.g., "powerful programming").
  • Case insensitivity (e.g., match "python", "Python", or "PYTHON").
  • Avoiding partial matches (e.g., "cat" should not match "category").

1. Search for Key-Phrases#

Key-phrases are just longer strings. Use the same in operator:

def search_phrase_in_file(file_path, phrase):
    with open(file_path, 'r', encoding='utf-8') as file:
        for line_num, line in enumerate(file, 1):
            if phrase in line:
                print(f"Line {line_num}: {line.strip()}")
 
# Search for a phrase
search_phrase_in_file('sample.txt', 'powerful programming')

Output:#

Line 2: Python is a powerful programming language.  

2. Case-Insensitive Search#

To ignore case, convert both the line and the phrase to lowercase (or uppercase) before comparing:

def search_case_insensitive(file_path, phrase):
    with open(file_path, 'r', encoding='utf-8') as file:
        for line_num, line in enumerate(file, 1):
            if phrase.lower() in line.lower():
                print(f"Line {line_num}: {line.strip()}")
 
# Match "python", "PYTHON", etc.
search_case_insensitive('sample.txt', 'PYTHON')

Output:#

Line 2: Python is a powerful programming language.  
Line 3: Learning Python can boost your career!  
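For text beyond plain ASCII, lower() may not catch every variant. Here is a small sketch, assuming str.casefold() is an acceptable replacement for lower() in your data. The German sample line and the helper name contains_casefold are illustrative and not part of the examples above.

```python
def contains_casefold(line, phrase):
    """Case-insensitive substring test using str.casefold()."""
    return phrase.casefold() in line.casefold()

line = "Die Straße ist gesperrt."

print('STRASSE'.lower() in line.lower())   # False: lower() keeps "ß" unchanged
print(contains_casefold(line, 'STRASSE'))  # True: casefold() maps "ß" to "ss"
```

For English-only files the two approaches behave the same, so lower() remains a reasonable default.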

3. Avoid Partial Matches with Regular Expressions#

Use the re module to search for whole words (e.g., "dog" but not "dogged"). The \b metacharacter denotes word boundaries.

Example: Match "dog" (not "dogged")#

import re
 
def search_exact_word(file_path, word):
    pattern = re.compile(rf'\b{re.escape(word)}\b', re.IGNORECASE)  # re.escape handles special chars
    with open(file_path, 'r', encoding='utf-8') as file:
        for line_num, line in enumerate(file, 1):
            if pattern.search(line):
                print(f"Line {line_num}: {line.strip()}")
 
# Search for "dog" (exact word)
search_exact_word('sample.txt', 'dog')

Output:#

Line 1: The quick brown fox jumps over the lazy dog.  
Line 4: The dog chased the fox through the forest.  

Why This Works:#

  • re.compile(rf'\b{re.escape(word)}\b', re.IGNORECASE):
    • \b: Ensures the word is not part of another word.
    • re.escape(word): Escapes regex metacharacters so the search term is matched literally (e.g., the "." in "3.14").
    • re.IGNORECASE: Matches any case.
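Each piece of the pattern can be checked in isolation on in-memory strings. The sketch below uses a hypothetical helper, build_word_pattern, that wraps the same re.compile call. The test sentences are made up for illustration.

```python
import re

def build_word_pattern(word):
    """Compile a case-insensitive, whole-word pattern for a literal search term."""
    return re.compile(rf'\b{re.escape(word)}\b', re.IGNORECASE)

dog = build_word_pattern('dog')
print(bool(dog.search('The DOG barked.')))    # True: case is ignored
print(bool(dog.search('A dogged pursuit.')))  # False: "dog" is only part of a longer word

# re.escape makes the "." literal; unescaped, it would match any character
version = build_word_pattern('3.14')
print(bool(version.search('pi is about 3.14')))   # True
print(bool(version.search('build 3514 failed')))  # False: "5" is not a literal "."
```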

Searching Across Multiple Text Files#

To search in multiple files (e.g., all .txt files in a folder), use the glob module to list files and loop through them.

Example: Search All .txt Files in a Directory#

import glob
import re
 
def search_multiple_files(directory, keyword):
    pattern = re.compile(rf'\b{re.escape(keyword)}\b', re.IGNORECASE)
    # Get all .txt files in the directory
    for file_path in glob.glob(f"{directory}/*.txt"):
        print(f"\n--- Results in {file_path} ---")
        with open(file_path, 'r', encoding='utf-8') as file:
            for line_num, line in enumerate(file, 1):
                if pattern.search(line):
                    print(f"Line {line_num}: {line.strip()}")
 
# Search all .txt files in the current folder for "error"
search_multiple_files('.', 'error')

Output:#

--- Results in ./sample.txt ---  
Line 5: Error: Connection timed out.  

Explanation:#

  • glob.glob(f"{directory}/*.txt"): Finds all .txt files directly inside directory (subfolders are not searched).
  • Loop through each file and apply the search logic.
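If the files are spread across subfolders, one option is pathlib's Path.rglob, which descends into nested directories. This is a sketch under assumptions rather than a drop-in replacement: the function name search_tree and the demo file names are hypothetical, and the function returns its hits instead of printing them.

```python
import re
import tempfile
from pathlib import Path

def search_tree(directory, keyword):
    """Return (path, line_number, line) for whole-word matches in every .txt file under directory."""
    pattern = re.compile(rf'\b{re.escape(keyword)}\b', re.IGNORECASE)
    hits = []
    # rglob('*.txt') also visits subfolders; sorted() gives a stable order
    for path in sorted(Path(directory).rglob('*.txt')):
        with open(path, 'r', encoding='utf-8') as file:
            for line_num, line in enumerate(file, 1):
                if pattern.search(line):
                    hits.append((path, line_num, line.strip()))
    return hits

# Demo on a temporary folder tree with made-up files
with tempfile.TemporaryDirectory() as root:
    nested = Path(root) / 'archive'
    nested.mkdir()
    (Path(root) / 'app.txt').write_text("All good.\nError: disk full.\n", encoding='utf-8')
    (nested / 'old.txt').write_text("error while parsing\n", encoding='utf-8')

    for path, line_num, line in search_tree(root, 'error'):
        print(f"{path.relative_to(root)} line {line_num}: {line}")
```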

Filtering and Processing Results (Beyond Printing)#

Instead of printing results immediately, store them in a structured format, such as a list of dictionaries, so you can process them further (for example, save them to a report or count occurrences).

Example: Collect Results and Save to a Report#

import re
 
def collect_results(file_path, keyword):
    results = []
    pattern = re.compile(rf'\b{re.escape(keyword)}\b', re.IGNORECASE)
    with open(file_path, 'r', encoding='utf-8') as file:
        for line_num, line in enumerate(file, 1):
            if pattern.search(line):
                results.append({
                    'file': file_path,
                    'line_number': line_num,
                    'content': line.strip()
                })
    return results
 
# Collect results for "Python" and save to a report
results = collect_results('sample.txt', 'Python')
 
# Write results to a report file
with open('search_report.txt', 'w', encoding='utf-8') as report:
    report.write("Search Results for 'Python':\n")
    for result in results:
        report.write(f"\nFile: {result['file']}\nLine {result['line_number']}: {result['content']}\n")

Output in search_report.txt:#

Search Results for 'Python':  

File: sample.txt  
Line 2: Python is a powerful programming language.  

File: sample.txt  
Line 3: Learning Python can boost your career!  
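Because the results are plain dictionaries, other summaries take only a few lines. Below is a sketch of counting matching lines per file with collections.Counter, assuming results shaped like those returned by collect_results. The sample data, including notes.txt, is made up for illustration, and count_matches_per_file is a hypothetical helper name.

```python
from collections import Counter

# Results in the same shape that collect_results() returns (illustrative data)
results = [
    {'file': 'sample.txt', 'line_number': 2, 'content': 'Python is a powerful programming language.'},
    {'file': 'sample.txt', 'line_number': 3, 'content': 'Learning Python can boost your career!'},
    {'file': 'notes.txt', 'line_number': 7, 'content': 'Try Python today.'},
]

def count_matches_per_file(results):
    """Count how many matching lines each file contributed."""
    return Counter(result['file'] for result in results)

# most_common() lists files from most to fewest matches
for file_name, count in count_matches_per_file(results).most_common():
    print(f"{file_name}: {count} matching line(s)")
```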

Practical Examples: Real-World Use Cases#

1. Log File Analysis#

Search server logs for errors/warnings to debug issues:

search_multiple_files('server_logs', 'error')  # Find "error" entries in the .txt files inside server_logs
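To catch warnings in the same pass, the pattern can list several alternatives. Here is a minimal sketch, assuming log levels appear as whole words. The function name find_log_levels and the log lines are invented for illustration.

```python
import re

def find_log_levels(lines, levels=('error', 'warning')):
    """Yield (line_number, line) for lines that mention any of the levels as a whole word."""
    alternatives = '|'.join(re.escape(level) for level in levels)
    pattern = re.compile(rf'\b(?:{alternatives})\b', re.IGNORECASE)
    for line_num, line in enumerate(lines, 1):
        if pattern.search(line):
            yield line_num, line.strip()

# Made-up log lines; any iterable of strings works, including an open file
log_lines = [
    "2024-01-01 10:00:00 INFO Server started",
    "2024-01-01 10:05:12 WARNING Disk usage at 91%",
    "2024-01-01 10:07:45 ERROR Connection timed out",
]

for line_num, line in find_log_levels(log_lines):
    print(f"Line {line_num}: {line}")
```

Since the function accepts any iterable of lines, it can be combined with the file loops shown earlier without changes.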

2. Text Analysis (e.g., Novels)#

Find themes in a novel (e.g., "freedom" in 1984):

search_keyword_in_file('1984.txt', 'freedom')

3. Data Extraction from Raw Text#

Extract lines with email addresses (using regex):

import re
 
def extract_emails(file_path):
    email_pattern = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            matches = email_pattern.findall(line)
            if matches:
                print(f"Emails found: {matches}")
 
extract_emails('contacts.txt')  # Prints the email addresses found on each matching line

Troubleshooting Common Issues#

1. "File Not Found" Error#

  • Fix: Use the absolute file path (e.g., C:/projects/sample.txt on Windows or /home/user/sample.txt on Linux/macOS).
  • Check if the file exists with os.path.exists(file_path).
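One way to apply both tips is to check the path before opening it and to report the absolute path that was tried. The sketch below uses a hypothetical helper named safe_search, and it uses os.path.isfile rather than os.path.exists so that directories are rejected as well.

```python
import os

def safe_search(file_path, keyword):
    """Return matching lines, or None when the file is missing."""
    if not os.path.isfile(file_path):
        # The absolute path makes a wrong working directory easy to spot
        print(f"File not found: {os.path.abspath(file_path)}")
        return None
    with open(file_path, 'r', encoding='utf-8') as file:
        return [line.strip() for line in file if keyword in line]

print(f"Current working directory: {os.getcwd()}")
print(safe_search('no_such_file_for_demo.txt', 'Python'))
```

Relative paths are resolved against the current working directory, which is often the real cause of this error when a script is launched from a different folder.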

2. Encoding Errors (e.g., UnicodeDecodeError)#

  • Fix: Specify the file encoding (e.g., encoding='latin-1' or encoding='utf-16'):
    with open(file_path, 'r', encoding='latin-1') as file:
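When the correct encoding is unknown, another option is to keep UTF-8 and pass errors='replace' to open(), so undecodable bytes become the replacement character U+FFFD instead of raising. This is a sketch with a hypothetical function name, search_tolerant, and a made-up demo file. Note that replaced characters can hide a keyword that contains non-ASCII letters.

```python
import os
import tempfile

def search_tolerant(file_path, keyword, encoding='utf-8'):
    """Search line by line, replacing undecodable bytes instead of raising."""
    matches = []
    # errors='replace' substitutes U+FFFD for bytes the codec cannot decode
    with open(file_path, 'r', encoding=encoding, errors='replace') as file:
        for line_num, line in enumerate(file, 1):
            if keyword in line:
                matches.append((line_num, line.strip()))
    return matches

# Demo: a file containing a Latin-1 byte (0xE9) that is invalid as UTF-8
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b"caf\xe9 serves Python talks\nplain line\n")
    demo_path = tmp.name

print(search_tolerant(demo_path, 'Python'))
os.remove(demo_path)  # clean up the temporary file
```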

3. Slow Performance with Large Files#

  • Fix: Read lines one at a time (as shown) instead of loading the entire file into memory with read() or readlines().
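A generator keeps this line-at-a-time behavior while letting the caller stop early. The sketch below is illustrative only: iter_matches is a hypothetical name, and the demo writes a throwaway file of 100,000 made-up records before taking the first three matches with itertools.islice.

```python
import itertools
import os
import tempfile

def iter_matches(file_path, keyword):
    """Lazily yield (line_number, line) pairs, reading one line at a time."""
    with open(file_path, 'r', encoding='utf-8') as file:
        for line_num, line in enumerate(file, 1):
            if keyword in line:
                yield line_num, line.rstrip('\n')

# Demo: build a larger throwaway file where every 1000th record fails
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    for i in range(1, 100_001):
        tmp.write(f"record {i} status={'FAIL' if i % 1000 == 0 else 'ok'}\n")
    big_path = tmp.name

# islice stops pulling lines once three matches have been produced
first_three = list(itertools.islice(iter_matches(big_path, 'FAIL'), 3))
print(first_three)

os.remove(big_path)  # clean up the temporary file
```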

Conclusion#

Searching text files for keywords and phrases is a foundational skill in Python, with applications in log analysis, data extraction, and text processing. With the techniques above (basic searches, regex for precision, multi-file handling, and result processing), you can automate tedious tasks and pull useful information out of text data.

Experiment with different scenarios (e.g., log files, novels, or raw data) to build confidence!

References#