Fluent Python Study Notes - Built-in Sequences

Sequences Grouping

Container Sequences

Container sequences can hold items of different types

list, tuple, collections.depue

Flat Sequences

Flat Sequences hold items of one type

str, bytes, bytearray, memoryview, array.array

  • Container sequences hold references to the objects they contain, which may be of any type
  • Flat sequences physically store the value of each item, which has its own memory space
  • Flat sequences are more compact than Container sequences, while it’s limited to holding primitive values like string, number, bytes

Mutable Sequences

list, bytearray, array.array, collection.deque, memoryview

Immutable Sequences

tuple, str, bytes

List

Listcomps

listcomps are short of list comprehesions, which is used to build sequences quickly. listcomps can enhance the readability of the python code.

1
2
3
4
5
6
7
8
9
10
# example one
symbols = 'feifeiyu'
codes1 = []
for sy in symbols:
codes1.append(ord(sy))
print(codes)

# examples two
codes2 = [ord(sy) for sy in symbols]
print(codes)

example two is more readable than example one, because its intent is explicit and its length is short

listcomps do everything the map and filter functions do, without using the python lambda.

1
2
3
4
5
6
# use listcomps
symbols = 'feifeiyu'
ascii = [ord(sy) for sy in symbols if ord(sy) > 105]

# use map filter
ascii2 = list(filter(lambda sy: sy > 105, map(ord, symbols)))

Cartesian Products

Listcomps can generate lists from the cartesian products of two or more iterables. The items that make up the catesian products are tuples made from items from every input iterable.

1
2
3
4
5
6
7
8
9
10
11
# use listcomps
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts1 = [(color, size) for color in colors
for size in sizes]

# use for loop
tshirts2 = []
for color in colors:
for size in sizes:
tshirts2.append((color, size))

Generator Expressions

Tuples, arrays and other types of sequences can be initialized by listcomp, but a genexp (short of generator expressions) saves the memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed another constructor.
Genexp use the same syntax as listcomps, but are closed in parenthesis rather than brackets.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# example one 
symbols = 'feifeiyu'
tuple(ord(sy) for sy in symbols)

# example two
import array
array.array('I', (ord(sy) for sy in symbols))

# notes print generator object in console
(ord(sy) for sy in symbols)
<generator object <genexpr> at xxxxx>

# example three, use a genexp with a cartesian product
colors = ['black', 'whith']
sizes = ['S', 'M', 'L']
for tshirt in ('%s %s' % (color, size) for color in colors
for size in sizes)
print(tshirt)

Tuples

Tuples are not just immutable lists, they do double-duty: they can be used ad immutable lists and also as records with field namses.
Tuples as records: each item in the tuple holds the data for one field and the postion of the item gives its meaning.
If a tuple is just used ad immutable list, the quantity and the order of the items may or may not be important. But when using a tuple as collection of fields, the number of items is often fixed and their order is always vital.

1
2
3
4
5
6
7
8
9
# examples of tuples used as records
# example one use tuple to express latitude and longitude
lax_coordinates = (33.9425, -118.4080)

# use tuple to exporess data about Tokyo
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)

# contacts
contacts = [('James', '13212342345'), ('green', '13512342345')]

Tuple Unpacking

Parallel assignment is the most visible form of tuple unpacking, that is assigning items from an iterable to a tuple of variables.

1
2
lax_coord = (33.9425, -118.4080)
latitude, longitude = lax_coord #tuple unpacking

Swapping the values of variables without using temporary variable is also a application of tuple unpacking

1
2
# swap values of two variable
b, a = a, b

Another example of tuple unpacking is prefixing an argument with a star when calling a function

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
divmod(20, 8) # output (2, 4)

# use tuple
t = (20, 8)
divmod(*t) # use * to upack tuple, output (2, 4)
# output upacking
quo, rem = divmod(*t) # quo = 2, rem = 4
# Defining function parameters with *args to grab arbitrary excess arguments is a classic python feature

# dummy variable _
_, rem = divmod(*t)
# rem = 4,

# nested tuple unpacking
name, cc, pop, (lat, long) = ('Tokyo', 'JP', 36.933, (35.689, 139.691))
# restult: name = 'Tokyo', cc = 'JP', pop = 36.933, lat = 35.689, long 139.691

Sometimes we only care about certain parts of a tuple when upacking, a dummy variable like _ is used ad placeholder.

In python 3, use * to parallel assignment as well

1
2
3
4
5
6
a, b, *rest = range(5)
# output a = 0, b = 1, rest = [2,3,4]
a, b, *rest = range(3)
# output a = 0, b = 1, rest = [2]
a, b, *rest = range(2)
# output a = 0, b = 1, rest = []

Named tuples

The collections.namedtuple function is a factory that produces subclasses of tuple enhanced with field names and a class name

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
from collections import namedtuple
# Two paramters are required to create a named tuple: a class name and a list of field names
City = namedtuple('City', 'name country population coordinates')
# or City = namedtuple('City', ['name', 'country', 'population', 'coordinates'])
# field names can be given as an iterable of strings or as a single string that space-delimited string

# _fields is a tuple with the field names of the named tuple
# City._fields => ('name', 'country', 'population', 'coordinates')

tokyo = City('Tokyo', 'JP', 36.933, (35.689, 139.691))
# tokyo value: City(name='Tokyo', country='JP', population=36.933, coordinates=(35,689, 139.691))
# tokyo.name => 'Tokyo', tokyo.coordinates => (35,689, 139.691)

# _make() lets you instantiate a named tuple from a iterable; City(*tokyo)
# _asdict() returns a collections.OrderedDict built from the named tuple instance

Slicing

A common feature of list, tuple, str and all sequence types in python is the support of slicing operations.

1
2
3
4
5
6
7
8
l = [1,2,3,4,5,6]
# split at index 2
l[:2] # output => [1,2]
l[2:] # output => [3,4,5,6]
# split at index 5
l[:5] # output => [1,2,3,4,5]
l[5:] # output => [6]
l[2:4] # output => [3,4]

Slice objects

To specify a stride or step in slice: s[a:b:c] a is the start postion, b is the stop postion, c is the step

1
2
3
4
5
6
s = 'feifeiyu'
# step = 3,
s[::3] # output => ffy
# if the step is negative, returned items is reversed
s[::-3] # output => uee
s[::-1] # output => uyiefief, reverse the string

The built-in sequence types in python are one-dimensional, so they support only one index or slice, and not a tuple of them.
The [] operator can take multiple indexes or slices separated by commonas. For instance, in the external NumPy package, where items of a 2-dimensional numpy.ndarray can be fetched using the syntax a[i, j] and a 2-dimensional slice obtained with an expresssion like a[m:n, k:l].

Assigning to Slices

Mutable sequences can be grafted, excised and otherwise modified in-place using slice notation on the left side of an assignment statement or as the target of a del statement.

1
2
3
4
5
l = range(10) # output => [0,1,2,3,4,5,6,7,8,9]
l[2:5] = [20, 30] # output => [0,1,20,30,5,6,7,8,9]
del l[5:7] # output => [0,1,20,30,5,8,9]
l[3::2] = [11, 22] # output => [0,1,20,11,5,22,9]
l[2:5] = [100] # outpu => [0,1,100,22,9]

Using + and * with sequences

Operator + is used to concatenate two operands that have same sequence type. And the two oprands are not modified but a new sequnces is created as result of the concatenation.

Operator * is used to concatenate multiple copies of the same sequence, multiply it by an integer. A new sequence is created as result.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# operator +
# l1, l2 have the same sequence type
l1 = [1,2,3]
l2 = [4,5,6]
res1 = l1 + l2
# l1 => [1,2,3], l2 => [4,5,6], both l1, l2 are not modified
# res1 => [1,2,3,4,5,6]

# operator *
res2 = l1 * 2
# l1 => [1,2,3] not modified
# res2 => [1,2,3,1,2,3]
s1 = 'fei'
s2 = sl * 2 # output: s1 => 'fei', s2 => 'feifei'
s3 = s2 + 'yu' # output: s2 => 'feifei', s3 => 'feifeiyu'

Building lists of lists

Sometimes we need to initialize a list with a certain number of nested list. The best way of doing this is with a listcomp

1
2
3
4
5
6
7
# create a list of with 3 lists of 3 items each.
board = [['_'] * for i in range(3)]
# board => [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

# modify board
board[1][2] = 'x'
# board => [['_', '_', '_'], ['_', '_', 'x'], ['_', '_', '_']]

A tempting but wrong shortcut is doing like below.

1
2
3
4
5
6
7
8
9
# The outer list is made of three references to the same inner list.
board = [['-'] * 3] * 3
# board => [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
# before board is changed, all seems right

# modify board
board[1][2] = 'x'
# board => [['_', '_', 'x'], ['_', '_', 'x'], ['_', '_', 'x']]
# all rows are aliases referring to the same object.

Augmented assignment with sequences

The augmented assignment operators: +=, *=, -= …

The sepcial method that makes += work is iadd(for “in-place addition”), is iadd is not implemented, python fall back to calling add.

1
2
3
4
5
6
7
8
9
10
11
12
# number
n = 2
n += 3 # n => 5
# string
s = 'fei'
s += 'fei' # s => 'feifei'
# sequence
l = [1,2,3]
l += [4,5,6] # l => [1,2,3,4,5,6]

# use *=
l *= 2 # l => [1,2,3,4,5,6,1,2,3,4,5,6]

Note: Repeated concatenation of immutable sequence is inefficient, because instead of just appending new items, the interpreter has to copy the whole target sequence to create a new one with the new items concatenated.

1
2
3
4
5
6
7
8
9
# example for augmented assignment for a mutable item in tuples
# raise TypeError
# value of t is changed
t = (1, 2, [3, 4])
t[2] += [5, 6]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
# output: t => (1,2,[3,4,5,6])

Note:

  • Putting mutable items in tuple is not a good idea
  • Augmented assignment is not an atomic operation

list.sort and the sorted built-in function

The list.sort method sorts a list in-place, without making a copy. It returns None to remind us that it changes the target object, and does not create a new list.

In contrast, the built-in function sorted creates a new list and returns it. sorted accepts any iterable objects as arguments, include immutable sequences and generators.

Both list.sort and sorted take two optional, keyword-only arguments: key and reverse.

  • reverse: if True, the items are returned in descending order. default is False
  • key: A one-argument function that will be applied to each item to produce its sorting key.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# examples of sorted
fruits = ['grap', 'raspberry', 'apple', 'banana']
# sorted by the first charactor ascii value of a world
sorted(fruits) # output => ['apple', 'banana', 'grape', 'raspberry']
sorted(fruits, reverse=True) # output => ['raspberry', 'grape', 'banana', 'apple']
# sorted by the length of each world
sorted(fruits, key=len) # output => ['grape', 'apple', 'banana', 'raspberry']
sorted(fruits, key=len, reverse=True) # ouput => ['raspberry', 'banana', 'grape', 'apple']
# finally, the order in fruits is not changed
fruits # output => ['grap', 'raspberry', 'apple', 'banana']

# examples of list.sort
fruits.sort() # output None
# order in fruits is changed.
fruits # output => ['apple', 'banana', 'grape', 'raspberry']
fruits.sort(key=len, reverse=True) # output None
fruits # output => ['raspberry', 'banana', 'grape', 'apple']

Managing ordered sequences with bisect

The bisect module offers two main functions bisect and insort, they use the binary search algorithm to quickly find an insert items in any sorted sequence

bisect

bisect(haystack, needle) does a binary search for needle in haystack which must be a sorted sequence, to locate the position where needle can be inserted while maintaining haystack in ascending order. In other words, all items appearing up to that position are less or equal to the needle.

1
2
3
4
5
6
7
8
9
10
11
# usage of bisect
import bisect

def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
# get position of socre in breakpoints
i = bisect.bisect(breakpoints, score)
return grades[i]

[grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]

#output => ['raspberry', 'banana', 'grape', 'apple']

Inserting with bisect.insort

Sorting is expensive, so once a sequence has sorted, it’s good to keep it that way.
insort(seq, item) inserts item in to seq so as to keep seq in ascending order.

1
2
3
4
5
6
7
8
9
# usage of 
import bisect
import random
my_list = []
random.seed(1729)
for i in range(7):
new_item = random.randrange(14)
bisect.insort(my_list, new_item)
print '%2d ->' % (new_item, my_list)

Arrays

For numbbers, array.array is more efficient than a list, it supports all mutable sequence operations(include .pop, .insert, and .extend), and additional methods for fast loading and saving such as .frombytes and .tofile.

When creating an array a typecode must be provided which determine the underlying C type used to store each item in the array.

1
2
3
4
5
6
7
8
9
from array import array
from random import random
# 'd' is a typecode mean double-precision floats
floats = array('d', (random() for i in range(100)))
floats[0] # output => 0.7810596079396112

# 'b' is another typecode mean signed char -128 ~ 127
integer = array('b', [1,2,3,4,5,6])
integer[2] # output => 3

array.tofile and array.fromfile are easy to write or read numbers from files, and they are very fast.

1
2
3
4
5
6
7
8
fp = open('float.bin', 'wb')
floats.tofile(fp)
fp.close()
floats2 = array('d')
fp = open('floats.bin', 'rb')
# read 50 numbers from binary file
floats.fromfile(fp, 50)
fp.close()

Memory Views

The built-in memoryview class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes.
Using notation similar to the array module, the memoryview.cast method lets you change the way multiple bytes are read or written as units without moving bits around.

1
2
3
4
5
6
# build a array of 5 short signed integers (typecode 'h')
# not work in python 2.7
numbers = array.array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers)
len(memv) # output => 5
memv[1] # output => -1

Deques and other queues

The .append and .pop methods make a list usable as stack or a queue. But inserting and removing from the left of a list is costly because the entire list must be shifted.

The class cllections.deque is a thread-safe double-ended queue designed for fast inserting and removing from both ends.

1
2
3
4
5
6
7
8
9
from collections import deque
dq = deque(range(12), maxlen=10)
dq # output deque([2, 3, 4, 5, 6, 7, 8, 9, 10, 11], maxlen=10), not start with 0
dq.rotate(3)
dq # output deque([9, 10, 11, 2, 3, 4, 5, 6, 7, 8], maxlen=10)
dq.appendleft(-1)
dq # output deque([-1, 9, 10, 11, 2, 3, 4, 5, 6, 7], maxlen=10)
dq.extend([11,22,33])
dq # output deque([11, 2, 3, 4, 5, 6, 7, 11, 22, 33], maxlen=10)

Besides deques, other python standard library packages implement queues

queue

Provides the synchronized classes Queue, LifoQueue and PriorityQueue. These are used for safe communication between threads.

multiprocessing

Implements its own bounded Queue, very similar to queue.Queue but designed for inter-process communictaion.

asyncio

Newly added to python 3.4, asyncio provides Queue, LifoQueue, PriorityQueue and JoinableQueue with APIs inspired by the class queue and multiprocessing, but adapted for managing tasks in synchronous programming

heapq

In contrast to the previous three modules, heapq does not implement a queue class, but provides functions like heappush and heappop that let you use a mutable sequence as a heap queue or priority queue

END