Fluent Python Study Notes - Object references, mutability and recycling

Identity, equality and aliases

In Python, every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The is operator compares the identity of two objects; the id() function returns an integer representing an object’s identity. The real meaning of an object’s id is implementation-dependent.

yu = {'name': 'feifeiyu', 'age': 23}
fei = yu
print(fei is yu)
# output => True
print('id(fei)=%s, id(yu)=%s' % (id(fei), id(yu)))
# output => id(fei)=140066670235272, id(yu)=140066670235272
yufei = {'name': 'feifeiyu', 'age': 23}
print(yu == yufei)
# output => True
print(yu is yufei)
# output => False
print(id(yufei))
# output => 140066670245320

In the above code, yu and fei are aliases: two variables bound to the same object. On the other hand, yufei is not an alias for yu: they have different identities and are bound to distinct objects. The objects bound to yu and yufei have the same value – that’s what == compares.

Choosing between == and is

The == operator compares the values of objects (the data they hold), while is compares their identities.
We often care about values and not identities, so == appears more frequently than is in Python code.

However, if you are comparing a variable to a singleton, then it makes sense to use is. By far the most common case is checking whether a variable is bound to None.

x is None
# its negation
x is not None

The is operator is faster than ==, because it cannot be overloaded, so Python does not have to find and invoke special methods to evaluate it; an is comparison is as simple as comparing two integer ids.
In contrast, a == b is syntactic sugar for a.__eq__(b). The __eq__ method inherited from object compares object ids, so it produces the same result as is. But most built-in types override __eq__ with more meaningful implementations that actually take into account the values of the object attributes.
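For example, a class that does not override __eq__ inherits the identity-based comparison (Bead is a made-up class for illustration):

```python
class Bead:
    def __init__(self, serial):
        self.serial = serial

a = Bead(1)
b = Bead(1)
print(a == b)  # output => False: object.__eq__ falls back to identity
print(a is b)  # output => False

class Bead2(Bead):
    def __eq__(self, other):
        return self.serial == other.serial

print(Bead2(1) == Bead2(1))  # output => True: now the values are compared
```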

Copies are shallow by default

The easiest way to copy a list (or most built-in mutable collections) is to use the built-in constructor for the type itself.

l1 = [3, [66, 55, 44], (7, 8, 9)]
l2 = list(l1)
print(id(l1), id(l2))
# output => 140066670707128 140066652306752
# l1 and l2 have different identities
print(id(l1[1]), id(l2[1]))
# output => 140066652306248 140066652306248
# the second elements of l1 and l2 have the same identity
l1[1].append(33)
print(l1)
# output => [3, [66, 55, 44, 33], (7, 8, 9)]
print(l2)
# output => [3, [66, 55, 44, 33], (7, 8, 9)]
l1[0] = 6
# rebinding l1[0] does not affect l2
print(l1)
# output => [6, [66, 55, 44, 33], (7, 8, 9)]
print(l2)
# output => [3, [66, 55, 44, 33], (7, 8, 9)]
# immutable object: l1[2] and l2[2] have the same identity
print(id(l1[2]), id(l2[2]))
# output => 140066653249040 140066653249040
l1[2] += (10, 11)
# id(l1[2]) has changed: += creates a new tuple and rebinds l1[2]
print(id(l1[2]), id(l2[2]))
# output => 140066652287920 140066653249040

Using the constructor produces a shallow copy: the outermost container is duplicated, but the copy is filled with references to the same items held by the original container. This saves memory and causes no problems if all the items are immutable. But if there are mutable items, this may lead to unpleasant surprises.

Deep and shallow copies of arbitrary objects

The copy module provides the deepcopy and copy functions, which return deep and shallow copies of arbitrary objects.

import copy
l1 = [3, [66, 55, 44], (7, 8, 9)]
l2 = copy.copy(l1)
l3 = copy.deepcopy(l1)
print('id(l1)=%s, id(l2)=%s, id(l3)=%s' % (id(l1), id(l2), id(l3)))
# output => id(l1)=140063262851800, id(l2)=140063271641528, id(l3)=140063271582176
print('id(l1[1])=%s, id(l2[1])=%s, id(l3[1])=%s' % (id(l1[1]), id(l2[1]), id(l3[1])))
# output => id(l1[1])=140063262908072, id(l2[1])=140063262908072, id(l3[1])=140063271582320

Note that making deep copies is not a simple matter in the general case. Objects may have cyclic references, which would cause a naive algorithm to enter an infinite loop. The deepcopy function remembers the objects already copied, so it handles cyclic references gracefully.
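A small sketch of a cyclic structure that deepcopy handles without looping forever:

```python
import copy

a = [10, 20]
b = [a, 30]
a.append(b)  # a now refers to b and b refers to a: a cycle
print(a)
# output => [10, 20, [[...], 30]]
c = copy.deepcopy(a)  # no infinite loop: already-copied objects are remembered
print(c[2][0] is c)
# output => True: the cycle is reproduced in the copy
```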

Function parameters as references

The only mode of parameter passing in Python is call by sharing. Call by sharing means that each formal parameter of the function gets a copy of each reference in the arguments; in other words, the parameters inside the function become aliases of the actual arguments.

def f(a, b):
    a += b
    return a

x = 1
y = 2
f(x, y)
print('x=%s, y=%s' % (x, y))
# output => x=1, y=2 (the numbers are unchanged)
a = [1, 2]
b = [3, 4]
f(a, b)
print('a=%r, b=%r' % (a, b))
# output => a=[1, 2, 3, 4], b=[3, 4] (the list a is changed in place)
m = (10, 20)
n = (30, 40)
f(m, n)
print('m=%r, n=%r' % (m, n))
# output => m=(10, 20), n=(30, 40) (the tuples are unchanged)

Mutable types as parameter defaults: bad idea

Optional parameters with default values are a great feature of Python function definitions, allowing our APIs to evolve while remaining backward compatible. However, we should avoid mutable objects as default values for parameters.

def params(vals=[]):
    return vals

ts1 = params(vals=[1, 2, 3])
ts2 = params()
ts3 = params()
print('ts1=%r, ts2=%r, ts3=%r' % (ts1, ts2, ts3))
# output => ts1=[1, 2, 3], ts2=[], ts3=[]
ts2.append(2)
print('ts1=%r, ts2=%r, ts3=%r' % (ts1, ts2, ts3))
# output => ts1=[1, 2, 3], ts2=[2], ts3=[2]
# ts3 changed too: ts2 and ts3 are aliases of the same default list,
# which is created only once, when the function is defined

The issue with mutable defaults explains why None is often used as the default value for parameters that may receive mutable values.
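The usual idiom looks like this, shown as a reworked version of the params function above:

```python
def params(vals=None):
    if vals is None:
        vals = []  # a fresh list is created on every call
    return vals

ts1 = params()
ts2 = params()
ts1.append(2)
print(ts1, ts2)
# output => [2] [] - the two calls no longer share a list
```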

del and garbage collection

The del statement deletes names, not objects. An object may be garbage collected as a result of a del command, but only if the variable deleted holds the last reference to the object, or if the object otherwise becomes unreachable. Rebinding a variable may also cause the number of references to an object to reach zero, causing its destruction.

In CPython the primary algorithm for garbage collection is reference counting. Essentially, each object keeps a count of how many references point to it. As soon as that refcount reaches zero, the object is immediately destroyed: CPython calls the __del__ method on the object (if defined) and then frees the memory allocated to the object.
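One way to watch this happen is weakref.finalize, which registers a callback to run when an object is destroyed without holding a strong reference to it (the immediate destruction shown below relies on CPython's reference counting):

```python
import weakref

s1 = {1, 2, 3}
s2 = s1  # an alias: a second reference to the same set

def bye():
    print('object destroyed')

ender = weakref.finalize(s1, bye)  # watch s1 without creating a strong reference
del s1
print(ender.alive)
# output => True: s2 still refers to the set
s2 = 'spam'  # rebinding the last reference drops the refcount to zero
print(ender.alive)
# output => False: bye was called and the set was destroyed
```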

Weak references

The presence of references is what keeps an object alive in memory. When the reference count of an object reaches zero, the garbage collector disposes of it. But sometimes it is useful to have a reference to an object that does not keep it around longer than necessary. A common use case is a cache.

Weak references to an object do not increase its reference count. The object that is the target of a weak reference is called the referent. Therefore, we say that a weak reference does not prevent the referent from being garbage collected.

Weak references are useful in caching applications because you don’t want the cached objects to be kept alive just because they are referenced by the cache.

>>> import weakref
>>> a_set = {0, 1}
>>> wref = weakref.ref(a_set)
>>> wref
<weakref at 0x7f98ed10b6d8; to 'set' at 0x7f98ede923f0>
>>> wref()  # calling the weak reference returns the referent
{0, 1}
>>> a_set = {1, 2, 3}
>>> wref()  # {0, 1} survives only because the console variable _ still refers to it
{0, 1}
>>> wref() is None
False
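A sketch of such a cache using weakref.WeakValueDictionary (the Cheese class is illustrative, and the loop-variable gotcha at the end is deliberate; immediate collection assumes CPython's reference counting):

```python
import weakref

class Cheese:
    def __init__(self, kind):
        self.kind = kind
    def __repr__(self):
        return 'Cheese(%r)' % self.kind

stock = weakref.WeakValueDictionary()  # entries vanish when the value is garbage collected
catalog = [Cheese('Red Leicester'), Cheese('Parmesan')]
for cheese in catalog:
    stock[cheese.kind] = cheese

print(sorted(stock.keys()))
# output => ['Parmesan', 'Red Leicester']
del catalog
print(sorted(stock.keys()))
# output => ['Parmesan'] - the loop variable cheese still refers to the last item
del cheese
print(sorted(stock.keys()))
# output => []
```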

Tricks Python plays with immutables

Instances of str, bytes, tuple and frozenset are immutable, so assigning them to other variables does not make copies; the new variables are simply references to the same instances. Even a constructor call such as tuple(t) may return a reference to the same object.

num1 = 12
num2 = num1
print(num1 is num2)
# output => True
num2 = 13
print(num1 is num2)
# output => False
str1 = 'feifeiyu'
str2 = str1
print(str1 is str2)
# output => True
tu1 = (1, 2, 3)
tu2 = tu1
print(tu1 is tu2)
# output => True
tu2 = tuple(tu1)
print(tu1 is tu2)
# output => True (tuple() of a tuple returns the same object)
tu2 = (1, 2, 3)
print(tu1 is tu2)
# output => False

The tricks discussed in this section are white lies: they save memory and make the interpreter faster. Do not worry about them; they should not give you any trouble, because they only apply to immutable types.

Object representations

Thanks to the Python data model, user-defined types can behave as naturally as the built-in types. And this can be accomplished without inheritance, in the spirit of duck typing: you just implement the methods needed for your objects to behave as expected.

Every object-oriented language has at least one standard way of getting a string representation from any object. Python has two:

  • repr()

    Returns a string representing the object as the developer wants to see it.

  • str()

    Returns a string representing the object as the user wants to see it.

    We can implement the special methods __repr__ and __str__ to support repr() and str().
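A minimal sketch of a class implementing both (the Vector2d name follows the book's running example):

```python
class Vector2d:
    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)
    def __repr__(self):
        # unambiguous, for developers: mirrors the constructor call
        return 'Vector2d(%r, %r)' % (self.x, self.y)
    def __str__(self):
        # friendly, for end users
        return '(%g, %g)' % (self.x, self.y)

v = Vector2d(3, 4)
print(repr(v))
# output => Vector2d(3.0, 4.0)
print(v)
# output => (3, 4)
```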

classmethod vs staticmethod

The classmethod decorator is used to define a method that operates on the class and not on instances. classmethod changes the way the method is called, so that it receives the class itself as the first argument, instead of an instance. Its most common use is for alternate constructors.

# an alternate constructor (a method of a Vector2d class)
@classmethod
def frombytes(cls, octets):
    typecode = chr(octets[0])
    memv = memoryview(octets[1:]).cast(typecode)
    # unpack the memoryview resulting from the cast
    # into the pair of arguments needed for the constructor
    return cls(*memv)

In contrast, the staticmethod decorator changes a method so that it receives no special first argument. In essence, a static method is just like a plain function that happens to live in a class body, instead of being defined at module level.

class Demo:
    @classmethod
    def classmeth(*args):
        return args
    @staticmethod
    def statmeth(*args):
        return args

Demo.classmeth()
# output => (<class '__main__.Demo'>,)
Demo.classmeth('spam')
# output => (<class '__main__.Demo'>, 'spam')
Demo.statmeth()
# output => ()
Demo.statmeth('spam')
# output => ('spam',)

Formatted displays

The format() built-in function and the str.format() method delegate the actual formatting to each type by calling its __format__(format_spec) method. The format_spec is a formatting specifier, which is either:

  • The second argument in format(obj, format_spec)
  • Whatever appears after the colon in a replacement field delimited with {} inside a format string used with str.format()
brl = 1 / 2.43
print(brl)
# output => 0.4115226337448559
print(format(brl, '0.4f'))
# output => 0.4115
print('1 BRL = {rate:0.2f} USD'.format(rate=brl))
# output => 1 BRL = 0.41 USD
# base 2 output
print(format(12, 'b'))
# output => 1100
# base 16 output
print(format(12, 'x'))
# output => c
# % for a percentage display
print(format(1 / 3, '.1%'))
# output => 33.3%

Hashable objects

An object that defines __eq__ is unhashable unless it also implements __hash__, so it cannot be put in a set. To make such an object hashable, we must implement the __hash__ method consistently with __eq__. We should also make the instances immutable, by using the @property decorator to expose read-only attributes.

class Vector2d:
    def __init__(self, x, y):
        self.__x = float(x)
        self.__y = float(y)
    # @property marks the getter method of a read-only attribute
    @property
    def x(self):
        return self.__x
    @property
    def y(self):
        return self.__y
    def __repr__(self):
        return 'Vector2d(%r, %r)' % (self.x, self.y)
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        return hash(self.x) ^ hash(self.y)

v1 = Vector2d(3, 4)
v2 = Vector2d(3.1, 4.2)
print(hash(v1), hash(v2))
# output => 7 384307168202284039
print({v1, v2})
# output => {Vector2d(3.1, 4.2), Vector2d(3.0, 4.0)}

Private and “Protected” attributes in Python

In Python there is no way to create private variables in the strong sense of the private modifier in Java. What Python has is a simple mechanism to prevent accidental overwriting of a “private” attribute in a subclass.

Private attributes

If you name an instance attribute in the form __name (two leading underscores and at most one trailing underscore), Python stores the name in the instance __dict__ prefixed with a leading underscore and the class name. This language feature goes by the lovely name of name mangling.

Protected attributes

In Python, we use a single underscore prefix to ‘protect’ attributes by convention, e.g. self._x. The single underscore prefix has no special meaning to the Python interpreter when used in attribute names, but it’s a very strong convention among Python programmers that you should not access such attributes from outside the class.

class Vector(object):
    def __init__(self, x, y, z, m):
        self.__x = x    # "private": name-mangled to _Vector__x
        self._y = y     # "protected" by convention only
        self.__z_ = z   # also mangled: two leading underscores, one trailing
        self.__m__ = m  # dunder names are reserved for Python special attributes; avoid them

vec = Vector(1, 2, 3, 4)
print(vec.__dict__)
# output => {'_Vector__x': 1, '_y': 2, '_Vector__z_': 3, '__m__': 4}
vec.__x
# output => AttributeError: 'Vector' object has no attribute '__x'
print(vec._y)
# output => 2
vec.__z_
# output => AttributeError: 'Vector' object has no attribute '__z_'
print(vec.__m__)
# output => 4
print(vec._Vector__x)
# output => 1
print(vec._Vector__z_)
# output => 3

Name mangling of Python private attributes is about safety, not security: it’s designed to prevent accidental access, not intentional wrongdoing.

Save space with the __slots__ class attribute

By default, Python stores instance attributes in a per-instance dict named __dict__. However, dictionaries have a significant memory overhead because of the underlying hash table used to provide fast access. If you are dealing with millions of instances with few attributes, the __slots__ class attribute can save a lot of memory, by letting the interpreter store the instance attributes in a tuple-like structure instead of a dict.
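A minimal sketch (Point is a made-up class; the names in __slots__ must cover every attribute the instances will store):

```python
class Point:
    __slots__ = ('x', 'y')  # instances get no __dict__; attributes live in fixed slots

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p.x, p.y)
# output => 1 2
print(hasattr(p, '__dict__'))
# output => False
try:
    p.z = 3  # not listed in __slots__
except AttributeError as e:
    print('AttributeError:', e)
```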

The problems with __slots__

  • You must remember to redeclare __slots__ in each subclass, since the inherited attribute is ignored by the interpreter.
  • Instances will only be able to have the attributes listed in __slots__, unless you include '__dict__' in __slots__.
  • Instances cannot be targets of weak references unless you remember to include '__weakref__' in __slots__.

Overriding class attributes
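Class attributes can serve as default values for instance attributes: assigning through an instance creates an instance attribute that shadows the class attribute, and subclassing is the idiomatic way to customize the default. A sketch (typecode mirrors the book's Vector2d example):

```python
class Vector2d:
    typecode = 'd'  # class attribute: a default shared by all instances

v1 = Vector2d()
print(v1.typecode)
# output => d (read through the instance, found on the class)
v1.typecode = 'f'  # creates an instance attribute that shadows the class one
print(v1.typecode, Vector2d.typecode)
# output => f d

class ShortVector2d(Vector2d):
    typecode = 'f'  # overrides the default for all instances of the subclass

print(ShortVector2d.typecode)
# output => f
```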

End