Generator functions

Generator functions behave like regular functions in all respects except one. Instead of collecting results and returning them all at once, they yield results one at a time: calling a generator function gives you back an iterator (a generator object), and each call to next on it produces the next result. Python sets all of this up automatically.

This is all very theoretical, so let's make it clear why such a mechanism is so powerful, and then let's see an example.

Say I asked you to count out loud from 1 to 1,000,000. You start, and at some point I ask you to stop. After some time, I ask you to resume. At this point, what is the minimum information you need to be able to resume correctly? Well, you need to remember the last number you called. If I stopped you after 31,415, you will just go on with 31,416, and so on.

The point is, you don't need to remember all the numbers you said before 31,415, nor do you need them to be written down somewhere. Well, you may not know it, but you're behaving like a generator already!
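In code, the counting analogy might look like the following sketch. The names count_out_loud, counter, and first_batch are made up for this illustration, and the yield keyword it relies on is explained with the next example. The point to notice is that the generator keeps only the current number, and resumes from exactly where it was paused.

```python
# counting.analogy.py (illustrative sketch, names are hypothetical)
from itertools import islice

def count_out_loud(start, stop):
    number = start
    while number <= stop:
        yield number  # execution pauses here until the next request
        number += 1

counter = count_out_loud(1, 1_000_000)
first_batch = list(islice(counter, 31_415))  # count, then "stop"
print(first_batch[-1])  # prints: 31415
print(next(counter))  # resume, prints: 31416
```

Nothing beyond the value of number is stored between requests, which is all the information needed to carry on counting.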

Take a good look at the following code:

# first.n.squares.py
def get_squares(n):  # classic function approach
    return [x ** 2 for x in range(n)]
print(get_squares(10))

def get_squares_gen(n):  # generator approach
    for x in range(n):
        yield x ** 2  # we yield, we don't return
print(list(get_squares_gen(10)))

The two print calls produce the same output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]. But there is a huge difference between the two functions. get_squares is a classic function that collects the squares of all the numbers in [0, n) in a list, and returns it. On the other hand, get_squares_gen is a generator function, and behaves very differently. Each time the interpreter reaches the yield line, execution of the function is suspended. The only reason the two print calls produce the same output is that we fed get_squares_gen(10) to the list constructor, which exhausts the generator completely by asking for the next element until a StopIteration is raised. Let's see this in detail:

# first.n.squares.manual.py
def get_squares_gen(n):
    for x in range(n):
        yield x ** 2

squares = get_squares_gen(4)  # this creates a generator object
print(squares)  # <generator object get_squares_gen at 0x10dd...>
print(next(squares))  # prints: 0
print(next(squares))  # prints: 1
print(next(squares))  # prints: 4
print(next(squares))  # prints: 9
# the following raises StopIteration, the generator is exhausted,
# any further call to next will keep raising StopIteration
print(next(squares))

In the preceding code, each time we call next on the generator object, we either start it (first next) or make it resume from the last suspension point (any other next).

The first time we call next on it, we get 0, which is the square of 0, then 1, then 4, then 9. Since the for loop stops after that (n is 4), the generator naturally ends. A classic function would at that point just return None, but in order to comply with the iteration protocol, a generator instead raises a StopIteration exception.

This explains how a for loop works. When you write for k in range(n), what happens under the hood is that the for loop gets an iterator out of range(n) and starts calling next on it, until StopIteration is raised, which tells the for loop that the iteration has reached its end.
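As a rough sketch of that mechanism, the loop below is written out by hand with iter and next. It approximates what the for statement does and is not the interpreter's actual implementation. The names manual_for and collected are invented for the illustration.

```python
# for.loop.by.hand.py (illustrative sketch, names are hypothetical)
def manual_for(iterable):
    collected = []
    iterator = iter(iterable)  # the for loop asks for an iterator first
    while True:
        try:
            k = next(iterator)  # then it keeps asking for the next item
        except StopIteration:
            break  # the iteration has reached its end
        collected.append(k)  # this stands in for the loop body
    return collected

print(manual_for(range(4)))  # prints: [0, 1, 2, 3]
```

Because the loop only relies on iter and next, passing it a generator object works exactly as well as passing it a range.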

Having this behavior built into every iteration aspect of Python makes generators even more powerful because once we write them, we'll be able to plug them into whatever iteration mechanism we want.
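To illustrate, here is a small sketch that feeds the same generator function to a few built-in iteration mechanisms. The selection (sum, max, the in operator, and unpacking) is just a sample chosen for this example.

```python
# gen.plugged.in.py (illustrative sketch)
def get_squares_gen(n):
    for x in range(n):
        yield x ** 2

print(sum(get_squares_gen(5)))  # prints: 30
print(max(get_squares_gen(5)))  # prints: 16
print(25 in get_squares_gen(10))  # prints: True
first, second, *rest = get_squares_gen(5)
print(first, second, rest)  # prints: 0 1 [4, 9, 16]
```

Note that each line creates a fresh generator object, since a generator that has been exhausted cannot be restarted.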

At this point, you're probably asking yourself why you would want to use a generator instead of a regular function. Well, the title of this chapter should suggest the answer. I'll talk about performance later, so for now let's concentrate on another aspect: sometimes generators allow you to do something that wouldn't be possible with a simple list. For example, say you want to analyze all permutations of a sequence. If the sequence has a length of N, then the number of its permutations is N!. This means that if the sequence is 10 elements long, the number of permutations is 3,628,800. But a sequence of 20 elements would have 2,432,902,008,176,640,000 permutations. They grow factorially.

Now imagine you have a classic function that attempts to calculate all permutations, put them in a list, and return it to you. With 10 elements, it would probably require a few dozen seconds, but for 20 elements there is simply no way that it can be done.

On the other hand, a generator function will be able to start the computation and give you back the first permutation, then the second, and so on. Of course, you won't have the time to go through them all, because there are too many, but at least you'll be able to work with some of them.
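The following sketch shows the idea using itertools.permutations from the standard library. It is an iterator implemented by the library rather than a hand-written generator function, but it delivers its results lazily in the same way. The names sequence and lazy_perms are made up for the example.

```python
# permutations.lazy.py (illustrative sketch, names are hypothetical)
from itertools import islice, permutations

sequence = range(20)  # 20 elements: far too many permutations to store
lazy_perms = permutations(sequence)  # nothing is computed yet

for perm in islice(lazy_perms, 3):  # take only the first three
    print(perm[-3:])  # show the last three items of each permutation
# prints:
# (17, 18, 19)
# (17, 19, 18)
# (18, 17, 19)
```

The first three permutations are available immediately, even though building the full list would be out of the question.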

Remember when we were talking about the break statement in for loops? When we found a number dividing a candidate prime we were breaking the loop, and there was no need to go on.

Sometimes it's exactly the same, only the amount of data you have to iterate over is so huge that you cannot keep it all in memory in a list. In this case, generators are invaluable: they make possible what wouldn't be possible otherwise.
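As a minimal sketch of this situation, the generator below never ends, so its results could not possibly be stored in a list, yet a for loop with a break can still consume as much of it as needed. The name squares_forever is invented for the example, and itertools.count is the standard library's endless counter.

```python
# gen.break.early.py (illustrative sketch, names are hypothetical)
from itertools import count

def squares_forever():
    for x in count():  # count() yields 0, 1, 2, ... without end
        yield x ** 2

for square in squares_forever():
    if square > 1000:
        print(square)  # prints: 1024
        break  # we found what we needed, no reason to go on
```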

So, in order to save memory (and time), use generator functions whenever possible.

It's also worth noting that you can use the return statement in a generator function. It causes a StopIteration exception to be raised, effectively ending the iteration. This is extremely important. If a return statement were actually to make the function return something, it would break the iteration protocol. Python's consistency prevents this, and allows us great ease when coding. Let's see a quick example:

# gen.yield.return.py
def geometric_progression(a, q):
    k = 0
    while True:
        result = a * q**k
        if result <= 100000:
            yield result
        else:
            return
        k += 1

for n in geometric_progression(2, 5):
    print(n)

The preceding code yields the terms of the geometric progression a, aq, aq^2, aq^3, .... When the progression produces a term that is greater than 100000, the generator stops (with a return statement). Running the code produces the following result:

$ python gen.yield.return.py
2
10
50
250
1250
6250
31250

The next term would have been 156250, which is too big.
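To see the effect of return more directly, the sketch below drives a tiny generator by hand. The name first_two_then_stop is hypothetical, and the unreachable yield is there only to show that nothing after the return is ever produced.

```python
# gen.return.manual.py (illustrative sketch, names are hypothetical)
def first_two_then_stop():
    yield 1
    yield 2
    return  # ends the iteration here
    yield 3  # never reached

gen = first_two_then_stop()
print(next(gen))  # prints: 1
print(next(gen))  # prints: 2
try:
    next(gen)  # the generator hits return
except StopIteration:
    print('exhausted')  # prints: exhausted

print(list(first_two_then_stop()))  # prints: [1, 2]
```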

Speaking of StopIteration, the way exceptions are handled in generators changed with PEP 479: the new behavior was introduced in Python 3.5 as an opt-in, and became the default in Python 3.7. To understand the implications of the change is probably asking too much of you at this point, so just know that you can read all about it in PEP 479 (https://legacy.python.org/dev/peps/pep-0479/).