List Comprehension vs Generator Expression

When should you choose a list comprehension, and when a generator expression?

Syntax:

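  • List comprehension: [expression for item in iterable if condition]

  • Generator expression: (expression for item in iterable if condition)

  • The only syntactic difference is the brackets: [] builds a list, while () builds a generator. The "if condition" part is optional in both.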
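
As a minimal sketch of the two forms side by side (the names numbers, as_list, and as_gen are illustrative, not from any library):

```python
numbers = [1, 2, 3, 4, 5, 6]

# Square brackets: the whole list is built immediately.
as_list = [n * 10 for n in numbers if n % 2 == 0]

# Parentheses: a generator object is created; no element is computed yet.
as_gen = (n * 10 for n in numbers if n % 2 == 0)

print(type(as_list).__name__)  # list
print(type(as_gen).__name__)   # generator
print(as_list)                 # [20, 40, 60]

# Converting the generator to a list forces it to produce its values.
from_gen = list(as_gen)
print(from_gen)                # [20, 40, 60]
```

Both forms describe the same values; they differ in when those values are computed and whether they are kept.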

How a Generator Expression Works:

  1. Initialization:

    • When you create a generator expression, it doesn't compute any values right away. Instead, it returns a generator object that can produce values one at a time.
  2. Iteration:

    • As you iterate over the generator (e.g., when the sum function requests the next value), the generator expression computes the next value and yields it to the caller.

    • The generator keeps track of its state internally, so it knows where it left off each time it is called to produce a new value.

  3. On-the-Fly Computation:

    • The computation of each value happens only when it is needed, which means memory usage is minimized since it doesn't store all values at once.
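
The three steps above can be observed directly with next(). This is a small sketch; the helper square and the list calls are hypothetical names used only to record when each value is actually computed:

```python
calls = []  # records every input that actually gets processed


def square(x):
    calls.append(x)
    return x * x


# Initialization: creating the generator does not call square() at all.
gen = (square(x) for x in [1, 2, 3])
print(calls)          # []

# Iteration: each next() computes exactly one more value.
first = next(gen)
print(first, calls)   # 1 [1]

second = next(gen)
print(second, calls)  # 4 [1, 2]

# The generator remembers where it left off, so only 3 remains.
rest = list(gen)
print(rest, calls)    # [9] [1, 2, 3]
```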

Example with sum Function:

When you use a generator expression with the sum function, the sum function requests values one at a time and accumulates them. Here's the process step by step:

large_range = range(1, 1000000)
total_sum = sum(x ** 2 for x in large_range)
print(total_sum)  # Output: Sum of squares of numbers from 1 to 999999

  • Generator Creation: The generator expression (x ** 2 for x in large_range) creates a generator object.

  • Summation: The sum function starts iterating over the generator:

    • It requests the first value: 1 ** 2 = 1

    • It requests the second value: 2 ** 2 = 4

    • It continues requesting and summing values until the end of the range.
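
To make the memory difference concrete, here is a rough sketch that compares the two approaches with sys.getsizeof. It uses a smaller range (n = 100_000, an arbitrary choice) so it runs quickly. Note that sys.getsizeof reports only the size of the container object itself, not the integers it refers to, and exact byte counts vary by Python version, so none are shown:

```python
import sys

n = 100_000  # arbitrary size chosen for illustration

squares_list = [x ** 2 for x in range(1, n)]  # all values stored at once
squares_gen = (x ** 2 for x in range(1, n))   # values produced on demand

# Size of the container objects only (not the integers inside the list).
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)     # True

# Both give the same total; the generator never holds all values together.
list_total = sum(squares_list)
gen_total = sum(squares_gen)
print(list_total == gen_total)  # True
```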

When to Use Which:

The following use cases for list comprehensions and generator expressions illustrate when and why you might use each.

Use Case Examples for List Comprehensions:

  1. Transforming Data:

    • Example: Squaring numbers in a list.
    numbers = [1, 2, 3, 4, 5]
    squares = [x ** 2 for x in numbers]
    print(squares)  # Output: [1, 4, 9, 16, 25]
  2. Filtering Data:

    • Example: Filtering even numbers from a list.
    numbers = [1, 2, 3, 4, 5, 6]
    evens = [x for x in numbers if x % 2 == 0]
    print(evens)  # Output: [2, 4, 6]
  3. Flattening a List of Lists:

    • Example: Flattening a 2D list.
    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    flat_list = [num for row in matrix for num in row]
    print(flat_list)  # Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
  4. Creating Dictionaries or Sets:

    • Example: Building a dictionary that maps each key to its square (a dict comprehension, which uses the same syntax with curly braces).
    keys = [1, 2, 3]
    squares_dict = {x: x ** 2 for x in keys}
    print(squares_dict)  # Output: {1: 1, 2: 4, 3: 9}
  5. Conditional Logic:

    • Example: Replacing negative numbers with 0.
    numbers = [-1, 2, -3, 4, -5]
    non_negatives = [x if x >= 0 else 0 for x in numbers]
    print(non_negatives)  # Output: [0, 2, 0, 4, 0]

Use Case Examples for Generator Expressions:

  1. Processing Large Datasets:

    • Example: Summing squares of a large range without storing the list.
    large_range = range(1, 1000000)
    total = sum(x ** 2 for x in large_range)
    print(total)  # Output: A large number (sum of squares)
  2. Memory-Efficient Data Processing:

    • Example: Filtering and transforming a large list of numbers.
    large_list = range(1, 1000000)
    evens_squared = (x ** 2 for x in large_list if x % 2 == 0)
    total = sum(evens_squared)
    print(total)  # Output: Sum of squares of even numbers
  3. Counting Files in a Databricks Directory (Excluding Subdirectories):

     # dbutils is provided by the Databricks notebook environment.
     # Replace this path with the actual path to your folder.
     folder_path = "dbfs:/FileStore/PySpark_demo/"
    
     # List the contents of the folder
     folder_contents = dbutils.fs.ls(folder_path)
    
     # Filter to count only files (exclude directories)
     file_count = sum(1 for item in folder_contents if not item.isDir())
    
     # Display the file count
     print(f"The number of files (excluding directories) in the folder '{folder_path}' is: {file_count}")
    

Summary:

  • List Comprehensions: Ideal when you need the entire result at once or want to reuse it later. They are concise and readable for transforming and filtering data, and the resulting list supports access by index, which a generator does not.

  • Generator Expressions: Useful for handling large datasets or streams where you want to process items one by one without storing them all in memory. They are more memory-efficient.
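
The trade-offs in this summary can be checked with a short sketch (variable names are illustrative). It also shows a consequence of one-by-one processing: a generator can be consumed only once, while a list can be reused:

```python
squares_list = [x ** 2 for x in range(5)]
squares_gen = (x ** 2 for x in range(5))

# A list supports indexing and len(), and can be reused freely.
print(squares_list[2])    # 4
print(len(squares_list))  # 5

# A generator does not support indexing; trying raises TypeError.
indexing_failed = False
try:
    squares_gen[2]
except TypeError:
    indexing_failed = True
print(indexing_failed)    # True

# A generator is single-pass: the second sum sees nothing left.
first_pass = sum(squares_gen)
second_pass = sum(squares_gen)
print(first_pass, second_pass)  # 30 0
```

If you need to iterate more than once or look up items by position, build a list; otherwise a generator is usually enough.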
