Appendix Y: Generators
Goals
- Introduce generator functions and common patterns used with generators.
Motivation for Generators
We often write functions that return large lists, with the intention of iterating over the items once.
def permute(word):
    if len(word) == 1:
        return [word]
    else:
        result = []
        for i in range(len(word)):
            for perm in permute(word[:i] + word[i + 1 :]):
                result.append(word[i] + perm)
        return result
permute("abc")
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
This returns all of the permutations at once, as a list.
We often write functions that return lists of substantial size, for example the functions you write to load a list of data.
Yet we often only use a single item at a time:
for item in permute("cat"):
    print(item)
cat
cta
act
atc
tca
tac
Consider that this code is the same as:
items = permute("cat")
for item in items:
    print(item)
How can we avoid building the entire list in memory?
One option would be to print from within the permute function itself, which would avoid the need for a list. But hard-coding the print would reduce our ability to reuse the function. This seems to create a tension between well-structured code and efficiency.
In practice, we will turn to generators.
Generator Functions
Generator functions are a special type of function. They can be identified by their use of the yield keyword.
def simple_generator_func():
    print("start")
    yield 1
    print("still running")
    yield 2
    print("one more")
    yield 3
When you call a generator function, it seems like nothing happens:
# nothing prints?
g = simple_generator_func()
The function returns an object of type generator.
print(type(g))
<class 'generator'>
A generator is a special type of object that is iterable, but is not a container type.
for item in g:
    print("printing from loop", item)
start
printing from loop 1
still running
printing from loop 2
one more
printing from loop 3
Note that the output alternates between lines printed inside the function and lines printed by the loop.
When the function encounters a yield, it pauses. The calling code receives the value as if it had been returned.
The difference is that on the next iteration, the function resumes from where it paused instead of starting over.
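As a small sketch, we can drive a fresh generator by hand with the builtin next (which we will see again shortly) to watch it pause and resume:

g = simple_generator_func()
value = next(g)   # runs until the first yield, printing "start"
print(value)      # 1
value = next(g)   # resumes after the first yield, printing "still running"
print(value)      # 2
value = next(g)   # resumes again, printing "one more"
print(value)      # 3
# calling next(g) once more would raise StopIteration, since the function ends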
Let’s rewrite permute as a generator function:
def ipermute(word):
    if len(word) == 1:
        yield word
    else:
        for i in range(len(word)):
            for perm in ipermute(word[:i] + word[i + 1 :]):
                yield word[i] + perm
This works the same way, but avoids ever building the full list of permutations in memory.
# this works as expected, but avoids the N! list
for perm in ipermute("abcd"):
    print(perm)
abcd
abdc
acbd
acdb
adbc
adcb
bacd
badc
bcad
bcda
bdac
bdca
cabd
cadb
cbad
cbda
cdab
cdba
dabc
dacb
dbac
dbca
dcab
dcba
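If we ever do need the whole list, the generator can simply be passed to list, so nothing is lost by writing the generator version:

list(ipermute("abc"))
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']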
Iterables
How does this work?
We’ve seen that Python syntax is largely hiding calls to dunder methods.
A for loop is “syntactic sugar” for calling two other methods: __iter__ and __next__.
ll = [1, 2, 3, 4]
for item in ll:
    print(item)
1
2
3
4
ll = [1, 2, 3, 4]
iterator = ll.__iter__()
while True:
    try:
        item = iterator.__next__()
    except StopIteration:
        break
    print(item)
1
2
3
4
- __iter__ returns an iterator object, an intermediary object that tracks the current position in the iterable. We usually don’t see this variable.
- __next__ returns the next item in the iterable, and raises StopIteration when there are no more items.
If you implement a type that has these methods, it can be used in for loops and other iteration contexts.
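As a sketch, here is a hypothetical Countdown class that implements these two methods and can therefore be used directly in a for loop:

class Countdown:
    # a hypothetical iterable that counts down from `start` to 1

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # this object acts as its own iterator, so reset the position
        self.current = self.start
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)
3
2
1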
Generators in Practice
You’ve been using something very generator-like without realizing it.
What does range return?
# does this create a list with 1 trillion elements?
r = range(1_000_000_000_000)
print(f"object {r} is {type(r)=}")
object range(0, 1000000000000) is type(r)=<class 'range'>
# get the iterator object (not needed for generators, but range is generator-like)
ri = iter(r)
# the builtin `next` gets a single item from any iterator
print(next(ri))
# it is the same as calling __next__
print(ri.__next__())
# it yields one value at a time
print(next(ri))
0
1
2
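One caveat: range is not literally a generator. It is a lazy sequence type, so unlike a generator it also supports len, indexing, and fast membership tests without ever materializing its values:

r = range(1_000_000_000_000)
print(len(r))      # 1000000000000
print(r[123_456])  # 123456
print(500 in r)    # True, computed without iterating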
With generators we could write our own range function:
# simplified form with one parameter
def our_range(n):
    i = 0
    while i < n:
        yield i
        i += 1
for x in our_range(10):
    print(x)
0
1
2
3
4
5
6
7
8
9
If you’ve used map and filter, as of Python 3 these also return lazy iterators rather than lists.
Though you are increasingly unlikely to encounter Python 2, there is a fair bit of legacy Python 2 code out there.
In all Python 2.x versions, range does in fact return a list! Therefore, the code above would crash, since you would not have the RAM to store a list of 1 trillion elements.
Generators were introduced in Python 2.2. For backwards-compatibility reasons, it wouldn’t be until Python 3 that the language was able to make range lazy by default.
In Python 2 it is almost always correct to use xrange, the lazy version, which became range in Python 3.
This demonstrates two of the primary benefits of generators:
- Generators use significantly less memory: only enough for a single item plus a small amount of overhead, rather than the entire list (see the sketch below).
- Generators enable lazy evaluation: work is not done until, and unless, it is needed.
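As a rough sketch of the memory point, we can compare the size of the two objects themselves with sys.getsizeof (the exact byte counts vary by Python version and platform):

import sys

big_list = list(range(1_000_000))
big_gen = our_range(1_000_000)

# the list object alone takes megabytes (not counting the int objects it holds)
print(sys.getsizeof(big_list))
# the generator is a few hundred bytes at most, no matter how large n is
print(sys.getsizeof(big_gen))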
Infinite Generators
To demonstrate this idea of lazy evaluation we can look at infinite generators as an extreme case:
# Generators do not have to ever exit.
# Here is an infinite generator:
def powers_of_two():
    n = 2
    while True:
        yield n
        n *= 2
This is an infinite loop!
But since it will pause each time yield is reached, we can use it as long as the calling code breaks the loop:
for x in powers_of_two():
    print(x)
    if x > 1000:
        break
2
4
8
16
32
64
128
256
512
1024
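Breaking out of the loop is not the only way to stop; any lazy consumer works. For example, zip stops as soon as its shortest argument is exhausted, so it is safe to pair an infinite generator with a finite one:

for i, p in zip(range(5), powers_of_two()):
    print(i, p)
0 2
1 4
2 8
3 16
4 32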
Generator Expressions
In Python we can make a list comprehension:
# list of squares
[x ** 2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
A dict comprehension:
# maps values to their squares
{x: x ** 2 for x in range(10)}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
A set comprehension:
# a set of cubes
{x ** 3 for x in range(10)}
{0, 1, 8, 27, 64, 125, 216, 343, 512, 729}
But if you have ever tried to make a tuple comprehension, you didn’t get what you expected:
(x ** 2 for x in range(10))
<generator object <genexpr> at 0x1173c3bc0>
This is in fact a generator expression:
# this is a generator, it will be lazy-evaluated
g = (x ** 2 for x in range(10000000000))
# we can iterate over it and get a single item at a time
for item in g:
    print(item)
    if item > 1000:
        break
0
1
4
9
16
25
36
49
64
81
100
121
144
169
196
225
256
289
324
361
400
441
484
529
576
625
676
729
784
841
900
961
1024
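Generator expressions are most often passed directly to functions that consume an iterable, in which case the surrounding parentheses can be dropped (as long as the expression is the only argument):

# sum pulls one square at a time; no intermediate list is built
sum(x ** 2 for x in range(10))
285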
itertools
The itertools module contains many useful functions for working with iterators, all of which return lazy iterators rather than building lists.
Useful functions include:
- itertools.permutations - permutations of an iterable
- itertools.combinations - combinations without replacement
- itertools.product - Cartesian product of multiple iterables (like nested for loops)
- itertools.chain - concatenate iterators
- itertools.islice - slice an iterator the way you would a list
- itertools.groupby - group items by a key
- itertools.tee - create multiple iterators from one
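As a brief sketch of a few of these in use (reusing the powers_of_two generator from above):

import itertools

# permutations does the same job as our ipermute, yielding tuples lazily
for p in itertools.permutations("abc"):
    print("".join(p))

# islice takes just the first five values from the infinite generator
for value in itertools.islice(powers_of_two(), 5):
    print(value)

# chain iterates over several iterables as if they were one
for item in itertools.chain([1, 2], (3, 4)):
    print(item)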