Python3入门到精通——自定义序列类

作者: Daniel Meng

GitHub: LibertyDream

博客:明月轩

本系列教程采用知识共享署名-非商业性使用-相同方式共享 2.5 中国大陆许可协议

先回顾一下 Python 里的序列分类,按照存储类型可以分为容器序列和扁平序列

容器序列可以存放不同类型的数据。即可以存放任意类型对象的引用。比如 list, tuple, deque

扁平序列只能容纳一种类型,存放的是值而不是引用,实际是占据一段连续的内存空间,但只能存放诸如字符、字节和数值等基础类型数据。比如 str,bytes,bytearray,array.array

按照存储内容是否可变,又可分类成可变序列与不可变序列。前者包括 list,deque,bytearray,array。后者包括 str,tuple,bytes

不可变序列类型的实现可以参照 collections.abc.Sequence,继承 ReversibleCollection 两个类,需要实现 __getitem____len__ 两个方法。而 MutableSequence 是继承 Sequence 后又通过 mixin 添加了若干方法

In [2]:
import collections.abc as abc

help(abc.Sequence)
Help on class Sequence in module collections.abc:

class Sequence(Reversible, Collection)
 |  All the operations on a read-only sequence.
 |  
 |  Concrete subclasses must override __new__ or __init__,
 |  __getitem__, and __len__.
 |  
 |  Method resolution order:
 |      Sequence
 |      Reversible
 |      Collection
 |      Sized
 |      Iterable
 |      Container
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, value)
 |  
 |  __getitem__(self, index)
 |  
 |  __iter__(self)
 |  
 |  __reversed__(self)
 |  
 |  count(self, value)
 |      S.count(value) -> integer -- return number of occurrences of value
 |  
 |  index(self, value, start=0, stop=None)
 |      S.index(value, [start, [stop]]) -> integer -- return first index of value.
 |      Raises ValueError if the value is not present.
 |      
 |      Supporting start and stop arguments is optional, but
 |      recommended.
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __abstractmethods__ = frozenset({'__getitem__', '__len__'})
 |  
 |  ----------------------------------------------------------------------
 |  Class methods inherited from Reversible:
 |  
 |  __subclasshook__(C) from abc.ABCMeta
 |      Abstract classes can override this to customize issubclass().
 |      
 |      This is invoked early on by abc.ABCMeta.__subclasscheck__().
 |      It should return True, False or NotImplemented.  If it returns
 |      NotImplemented, the normal algorithm is used.  Otherwise, it
 |      overrides the normal algorithm (and the outcome is cached).
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from Sized:
 |  
 |  __len__(self)

In [3]:
help(abc.MutableSequence)
Help on class MutableSequence in module collections.abc:

class MutableSequence(Sequence)
 |  All the operations on a read-only sequence.
 |  
 |  Concrete subclasses must override __new__ or __init__,
 |  __getitem__, and __len__.
 |  
 |  Method resolution order:
 |      MutableSequence
 |      Sequence
 |      Reversible
 |      Collection
 |      Sized
 |      Iterable
 |      Container
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __delitem__(self, index)
 |  
 |  __iadd__(self, values)
 |  
 |  __setitem__(self, index, value)
 |  
 |  append(self, value)
 |      S.append(value) -- append value to the end of the sequence
 |  
 |  clear(self)
 |      S.clear() -> None -- remove all items from S
 |  
 |  extend(self, values)
 |      S.extend(iterable) -- extend sequence by appending elements from the iterable
 |  
 |  insert(self, index, value)
 |      S.insert(index, value) -- insert value before index
 |  
 |  pop(self, index=-1)
 |      S.pop([index]) -> item -- remove and return item at index (default last).
 |      Raise IndexError if list is empty or index is out of range.
 |  
 |  remove(self, value)
 |      S.remove(value) -- remove first occurrence of value.
 |      Raise ValueError if the value is not present.
 |  
 |  reverse(self)
 |      S.reverse() -- reverse *IN PLACE*
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __abstractmethods__ = frozenset({'__delitem__', '__getitem__', '__len_...
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from Sequence:
 |  
 |  __contains__(self, value)
 |  
 |  __getitem__(self, index)
 |  
 |  __iter__(self)
 |  
 |  __reversed__(self)
 |  
 |  count(self, value)
 |      S.count(value) -> integer -- return number of occurrences of value
 |  
 |  index(self, value, start=0, stop=None)
 |      S.index(value, [start, [stop]]) -> integer -- return first index of value.
 |      Raises ValueError if the value is not present.
 |      
 |      Supporting start and stop arguments is optional, but
 |      recommended.
 |  
 |  ----------------------------------------------------------------------
 |  Class methods inherited from Reversible:
 |  
 |  __subclasshook__(C) from abc.ABCMeta
 |      Abstract classes can override this to customize issubclass().
 |      
 |      This is invoked early on by abc.ABCMeta.__subclasscheck__().
 |      It should return True, False or NotImplemented.  If it returns
 |      NotImplemented, the normal algorithm is used.  Otherwise, it
 |      overrides the normal algorithm (and the outcome is cached).
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from Sized:
 |  
 |  __len__(self)

+,+=,extend 和 append

对于序列类型,拼接和添加是最常见的操作,但仔细追究还另有天地。+ 执行的是运算,会对两个算子进行格式审查,+= 不同,它是通过 MutableSequence 中的 __iadd__ 实现的,方法内部调用了 extend 方法,所以 += 和调用 extend 一样,都接一个可迭代对象做参数,执行就地变换。加法则会返回运算结果

In [12]:
one_lst = [1,2]
rst_lst = one_lst + [3, 4]  # 加法有类型检查,左右类型要一致
rst_lst
Out[12]:
[1, 2, 3, 4]
In [13]:
one_lst += [3, 4]
one_lst
Out[13]:
[1, 2, 3, 4]
In [14]:
one_lst += (5, 6)  # 可迭代对象都能参与计算,效果等同于 one_lst.extend(value)
one_lst
Out[14]:
[1, 2, 3, 4, 5, 6]
In [16]:
one_lst.extend((7,8))
one_lst
Out[16]:
[1, 2, 3, 4, 5, 6, 7, 8]

注意这里不要和 append 方法弄混,append(object) 是在序列末尾添加一个新对象,而不是执行序列合并操作

In [17]:
one_lst.append([9,10])
one_lst
Out[17]:
[1, 2, 3, 4, 5, 6, 7, 8, [9, 10]]

切片的实现

Python 中的序列,比如 list,一大特色且高频使用的功能是切片,无论是取值,反转,添加,删除,在不谈可读性时切片实际都能实现。这么灵活的功能在实现上只需要完成 __getitem__ 方法,就能对对象进行切片操作了

In [19]:
class MySlice:
    def __init__(self,val_lst):
        self.vals = val_lst
    
    def __getitem__(self, item):
        return self.vals[item]
In [20]:
my_slice = MySlice([1,2,3,4,5,6])
my_slice[::-1]
Out[20]:
[6, 5, 4, 3, 2, 1]

此时结果看似没问题,但实际上述实现中 self.vals[item] 是借助 list 完成了切片,返回的数据类型是 list 不符合预期,对于一个序列,切片后应该还是本身才对。

In [21]:
rst_slice = my_slice[::-1]
type(rst_slice)
Out[21]:
list

问题的关键在于传入的参数 item 上,它是 Python 内置类型 slice 的一个实例,当执行切片操作时,操作内容就被封装在其中,但却没有返回类型检查。所以经验上,对于序列类型的切片操作,需要自行实现操作类型判断,是取某个位置的值,还是切片,进而返回相应结果。

In [34]:
import numbers

class Group:
    def __init__(self, group_name, company_name, staffs):
        self.group_name = group_name
        self.company_name = company_name
        self.staffs = staffs

    def __getitem__(self, item):
        clss = type(self)
        if isinstance(item, slice):  # sequence[::]
            return clss(self.group_name,self.company_name,self.staffs[item])
        elif isinstance(item, numbers.Integral):  # sequence[index]
            return clss(self.group_name,self.company_name, [self.staffs[item]])
    
    #  实现自定义序列类的其他方法
    def __len__(self):
        return len(self.staffs)

    def __reversed__(self):
        self.staffs.reverse()
    
    def __iter__(self):
        return iter(self.staffs)

    def __contains__(self, item):
        return True if item in self.staffs else False
In [36]:
group = Group(company_name='Bilibili',group_name='anime',staffs=['A','B','C','D'])
group[2:].__dict__
Out[36]:
{'group_name': 'anime', 'company_name': 'Bilibili', 'staffs': ['C', 'D']}
In [37]:
group[1].__dict__
Out[37]:
{'group_name': 'anime', 'company_name': 'Bilibili', 'staffs': ['B']}

维护有序序列

实际开发中排序是永恒的主题,对于序列类型尤甚。如果自己维护序列可能会不停的重复造轮子,效率低下。Python 对此提供了 bisect 模块专门用于维护有序序列,默认升序。主要方法有 insortbisect 通过二分法,前者负责插入,后者负责寻找目标位置。为了一些特殊需求,模块还提供了 insort_right,insort_leftbisect_right,bisect_left 方法来确认目标的前后近邻位置

insortbisect 等价于 insort_rightbisect_right

In [42]:
import bisect
from collections import deque
sorted_seq_1 = []
bisect.insort(sorted_seq_1, 4)
bisect.insort(sorted_seq_1, 5)
bisect.insort(sorted_seq_1, 3)
bisect.insort(sorted_seq_1, 2)
print(sorted_seq_1)

sorted_seq_2 = deque()
bisect.insort(sorted_seq_2, 8)
bisect.insort(sorted_seq_2, 7)
bisect.insort(sorted_seq_2, 9)
bisect.insort(sorted_seq_2, 6)
print(sorted_seq_2)
[2, 3, 4, 5]
deque([6, 7, 8, 9])
In [43]:
bisect.bisect(sorted_seq_1, 4)
Out[43]:
3