python内置的数据结构包括:列表(list)、集合(set)、字典(dictionary),一般情况下我们可以直接使用这些数据结构,但通常我们还需要考虑比如搜索、排序、排列以及赛选等一些常见的问题。
如何巧妙的使用数据结构和同数据有关的算法,在collections模块中包含了针对各种数据结构的解决方法。
1、序列分解为变量
In [5]: a = (4,5,6)
In [6]: x,y,z =a
In [7]: x
Out[7]: 4In [8]: z
Out[8]: 6In [9]: y
Out[9]: 5In [10]: b = ['python',222,(,9,30)] #嵌套分解变量
In [11]: p,n,(year,mon,day) =b
In [12]: p
Out[12]: 'python'In [13]: n
Out[13]: 222In [14]: year
Out[14]: In [15]: day
Out[15]: 30
#可以分解的对象只要是可迭代对象如字符串、文件、迭代器和生成器
In [16]: s = 'py'In [17]: x,y =s
In [18]: x
Out[18]: 'p'
#忽略某个值使用下划线代替
In [19]: data = 'python'In [20]: x,_,_,y,_,_ =data
In [21]: x
Out[21]: 'p'In [22]: y
Out[22]: 'h'
2、任意长度对象分解元素
要从某个可迭代对象中分解出N个元素,可以使用python的“*表达式”来代表多个
列1:在作业成绩中去掉最高和最低后取平均分
In [47]: grades = (68,98,85,78,84,79,88)
In [48]: defdrop_first_last(grades):
...: first,*middle,last =grades
...:return sum(middle) /len(middle)
...:
...:
In [49]: drop_first_last(sorted(list(grades),reverse=True))
Out[49]: 82.8
列2:在嵌套元组中*式语法的分解应用
records =[
('foo',1,2,3),
('bar',11,22,33),
('foo',4,5,6),
('bar',44,55,66),
]defdo_foo(x,y,z):print('foo',x,y,z)defdo_bar(a,b,c):print('bar',a,b,c)for tag,*args inrecords: #分解元组打印if tag == 'foo':
do_foo(*args)elif tag == 'bar':
do_bar(*args)#outing
foo 1 2 3bar11 22 33foo4 5 6bar44 55 66
列3:通过split拆分分解元素
In [52]: passwd = 'root:x:0:0:root:/root:/bin/bash'In [53]: username,*_,homedir,sh = passwd.split(":")
In [54]: username
Out[54]: 'root'In [55]: homedir
Out[55]: '/root'In [56]: sh
Out[56]: '/bin/bash'
3、保存最后N个元素
列1:使用collections.deque保存有限的历史纪录,deque用来创建一个固定长度的队列
In [61]: from collections importdeque#创建队列长度对象
In [62]: q = deque(maxlen=3)#加入数据到队列
In [63]: q.append(1)
In [64]: q.append(2)
In [65]: q.append(3)
In [66]: q
Out[66]: deque([1, 2, 3])
In [67]: q.append(4)
In [68]: q
Out[68]: deque([2, 3, 4])#从左边加入数据到队列
In [69]: q.appendleft(5)
In [70]: q
Out[70]: deque([5, 2, 3])#从末尾取出一个数据
In [71]: q.pop()
Out[71]: 3In [72]: q
Out[72]: deque([5, 2])
In [73]: q.popleft()
Out[73]: 5In [74]: q
Out[74]: deque([2])
4、找到最大或最小的N个元素
在heapq模块中有两个函数nlargest()从最大的值开始取,nsmallest()从最小的值开始取
In [75]: importheapq
In [76]: numbers = [1,3,4,9,11,34,55,232,445,9812,321,45,67,434,555]#取三个最大的值
In [77]: heapq.nlargest(3,numbers)
Out[77]: [9812, 555, 445]#取三个最小的值
In [78]: heapq.nsmallest(3,numbers)
Out[78]: [1, 3, 4]
5、python堆排序peapq模块
hepaq模块实现了python中的推排序,并提供了很多方法,让用python实现排序算法有了简单快捷的方式
In [1]: importheapq
In [2]: date = [19,1,9,3,11,21]
In [3]: heap =[]#heappush方法会插入一个元素到堆中,并按从小到大排序
In [4]: for i indate:
...: heapq.heappush(heap,i)
...:
In [5]: heap
Out[5]: [1, 3, 9, 19, 11, 21]
In [6]: date
Out[6]: [19, 1, 9, 3, 11, 21]#heapify方法会重新排序整个列表
In [7]: heapq.heapify(date)
In [8]: date
Out[8]: [1, 3, 9, 19, 11, 21]#heappop()方法会取出第一个元素,并将剩下的元素堆排序
In [10]: date
Out[10]: [19, 1, 9, 3, 11, 21]
In [11]: heapq.heappop(date)
Out[11]: 19In [12]: date
Out[12]: [1, 3, 9, 21, 11]#heapreplace()的作用是在堆中取第一个元素并插入一个元素
In [27]: date = [11,8,3,78,35]
In [28]: heapq.heapreplace(date,1)
Out[28]: 11In [29]: date
Out[29]: [1, 8, 3, 78, 35]#在集合中找出最大或者最小的N个元素,可以使用nlargest()和nsmallest()
In [30]: date = [3,88,32,97,56]
In [31]: heapq.nlargest(2,date)
Out[31]: [97, 88]
In [33]: heapq.nsmallest(2,date)
Out[33]: [3, 32]#nlargest()和nsmallest()还可以接受一个key参数来实现复杂的数据结构上的取值,如根据字典的值取值
In [34]: port =[
...: {'name':'dhcp','port':67},
...: {'name':'mysql','port':3306},
...: {'name':'memcached','port':11211},
...: {'name':'nginx','port':80},
...: {'name':'ssh','port':22},]
In [35]: heapq.nlargest(3,port,key=lambda x:x['port'])
Out[35]:
[{'name': 'memcached', 'port': 11211},
{'name': 'mysql', 'port': 3306},
{'name': 'nginx', 'port': 80}]
In [36]: heapq.nsmallest(3,port,key=lambda x:x['port'])
Out[36]:
[{'name': 'ssh', 'port': 22},
{'name': 'dhcp', 'port': 67},
{'name': 'nginx', 'port': 80}]
实现优先级队列实例:
importheapqclasspriorityqueue(object):def __init__(self):
self._queue=[]
self._index=0defpush(self,item,priority):
heapq.heappush(self._queue,(-priority,self._index,item))
self._index+= 1
defpop(self):return heapq.heappop(self._queue)[-1]deflistt(self):returnself._queue
q=priorityqueue()
q.push('python',44)
q.push('java',2)
q.push('c++',4)
q.push('c#',8)
q.push('goo',88)
q.push('perl',1)
date1=q.listt()print(date1)print(q.pop())print(q.listt())#output
[(-88, 4, 'goo'), (-44, 0, 'python'), (-4, 2, 'c++'), (-2, 1, 'java'), (-8, 3, 'c#'), (-1, 5, 'perl')]
goo
[(-44, 0, 'python'), (-8, 3, 'c#'), (-4, 2, 'c++'), (-2, 1, 'java'), (-1, 5, 'perl')]
6、在字典中将键映射到多个值上
可以使用列表、元组、集合来创建多个值的字典键值
dictlist ={'a':[1,2],'b':[3,4],'c':[5,6],
}
dictset={'as':{7,8},'bs':{9,0},
}
在collection模块中的defaultdict类,它可以自动初始化第一个值,只需要添加元素即可
In [1]: from collections importdefaultdict
In [2]: d =defaultdict(list)
In [3]: d
Out[3]: defaultdict(list, {})
In [4]: d['a'].append(1)
In [5]: d['a'].append(2)
In [6]: d
Out[6]: defaultdict(list, {'a': [1, 2]})
7、让字典保持有序
要控制字典中元素的顺序,可以使用collections模块中的OrderedDict类,当对字典做迭代时,它会严格按照元素初始添加的顺序进行迭代
from collections importOrderedDict
d=OrderedDict()
d['one'] = 1d['two'] = 2d['three'] = 3d['four'] = 4
for key,value ind.items():print(key,value)#output
one 1two2three3four4
当我们先精确控制字典中各字段的顺序然后序列化时,只需要在序列化前使用OrderdDist来构建字典数据,OrderedDict内部维护了一个双向链表,它会根据元素加入的顺序来排列键的位置,第一个新加入的元素被放置在链表的末尾,以后对已存在的键做修改也不会改变键的顺序,由于它额外创建了链表所占用的空间会是普通字典的2倍
from collections importOrderedDictimportjson
d=OrderedDict()
d['one'] = 1d['two'] = 2d['three'] = 3d['four'] = 4jsd=json.dumps(d)
d1=json.loads(jsd)print(jsd)print(d1)#{"one": 1, "two": 2, "three": 3, "four": 4}
{'one': 1, 'two': 2, 'three': 3, 'four': 4}
8、字典中的计算(求最大值、最小值和排序)
prices ={'ACME':45.23,'AAPL':612.78,'IBM':205.55,'HPQ':10.75,'FB':10.75}print(min(zip(prices.values(),prices.keys())))print(max(zip(prices.values(),prices.keys())))print(sorted(zip(prices.values(),prices.keys())))#使用zip()将字典中的值映射为元组的迭代器,但zip()只能被使用一次#如果对比的值相同,则选择键的排序大小#(10.75, 'FB')
(612.78, 'AAPL')
[(10.75, 'FB'), (10.75, 'HPQ'), (45.23, 'ACME'), (205.55, 'IBM'), (612.78, 'AAPL')]
9、在两个字典中寻找相同点
a = {'x':1,'y':2,'z':3}
b= {'w':10,'x':11,'y':2}print(a.keys() & b.keys()) #a和b中同时都有的key
print(a.keys() - b.keys()) #a中的键不在b中出现的key
print(a.items() & b.items()) #a和b中键值都相同的元素
#{'x', 'y'}
{'z'}
{('y', 2)}
In [1]: a = {'a':11,'b':22,'c':44,'d':99,'f':101}#推倒式排除键新建字典
In [2]: c = {key:a[key] for key in a.keys() - {'b','d'}}
In [3]: c
Out[3]: {'f': 101, 'a': 11, 'c': 44}
10、从序列中移除重复项并保持元素顺序
dic = [{'x':1,'y':3},{'x':3,'y':8},{'x':1,'y':11},{'x':1,'y':3}]def dedupe(items,key=None):
seen=set()for item initems:
val= item if key is None elsekey(item)if val not inseen:yielditem
seen.add(val)#key传递函数将序列中的元素转换为可哈希值,来去除重复项
date = list(dedupe(dic,key=lambda d:(d['x'],d['y'])))print(date)
date1= list(dedupe(dic,key=lambda x:(x['x'])))print(date1)#[{'x': 1, 'y': 3}, {'x': 3, 'y': 8}, {'x': 1, 'y': 11}]
[{'x': 1, 'y': 3}, {'x': 3, 'y': 8}]
11、对切片命名
使用内置函数slice来创建切片对象
In [4]: li = [1,2,3,4,5,6,7,8,9,0]
In [5]: cost = li[slice(2,8)]
In [6]: cost
Out[6]: [3, 4, 5, 6, 7, 8]
In [8]: li[slice(2,9,2)]
Out[8]: [3, 5, 7, 9]
12、找出序列中出现次数最多的元素
collections模块中的Counter类可以直接统计每个元素出现的次数,它会以字典的形式映射每个元素出现的次数,其中的most_common()方法可以直接显示结果,可传参数为显示的元素个数
In [15]: date = [1,23,4,3,2,5,23,123,553,23,1,3,4,5,2,3,423,12,3,4,23,412,43]
In [16]: from collections importCounter
In [17]: Counter(date)
Out[17]:
Counter({1: 2,23: 4,4: 3,3: 4,2: 2,5: 2,123: 1,553: 1,423: 1,12: 1,412: 1,43: 1})
In [18]: Counter(date).most_common()
Out[18]:
[(23, 4),
(3, 4),
(4, 3),
(1, 2),
(2, 2),
(5, 2),
(123, 1),
(553, 1),
(423, 1),
(12, 1),
(412, 1),
(43, 1)]
In [19]: Counter(date).most_common(2)
Out[19]: [(23, 4), (3, 4)]
In [20]: Counter(date).most_common(4)
Out[20]: [(23, 4), (3, 4), (4, 3), (1, 2)]
In [27]: date1 = ['a','b','c','a','a','b']
In [28]: from collections importCounter#生成一个Counter对象,为字典映射的统计值
In [29]: counts =Counter(date1)
In [30]: counts['a']
Out[30]: 3
#创建第二个序列
In [31]: date2 = ['b','b','a']#先统计元素出现的次数
In [32]: counts.most_common()
Out[32]: [('a', 3), ('b', 2), ('c', 1)]#使用update()方法来手动更新counts对象
In [33]: counts.update(date2)#查看结果
In [34]: counts.most_common()
Out[34]: [('a', 4), ('b', 4), ('c', 1)]#创建第二个counter对象
In [35]: counts1 =Counter(date2)#counter对象可以用加减来运算
In [36]: counts -counts1
Out[36]: Counter({'a': 3, 'b': 2, 'c': 1})
13、字典列表排序
operator模块中的itemgetter函数可以对嵌套数据结构的排序会非常简单且运行很快
from operator importitemgetter
date1=[
{'fname':'Brian','lname':'Jones','uid':1003},
{'fname':'David','lname':'Beazley','uid':1002},
{'fname':'John','lname':'Cleese','uid':1001},
{'fname':'Big','lname':'Jones','uid':1004},
]print(sorted(date1,key=itemgetter('uid')))print(sorted(date1,key=itemgetter('uid'),reverse=True)) #反向排序
print(sorted(date1,key=itemgetter('uid','fname'))) #通过多个公共键排序
print(sorted(date1,key=lambda x:x['uid'])) #也可以使用匿名函数来代替,但速度没有itemgetter()函数快
print(min(date1,key=itemgetter('uid'))) #itemgetter()也可以用在去最大或最小值上
print(max(date1,key=itemgetter('uid')))#[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
对原生不支持比较操作的对象排序
from operator importattrgetterclassuser(object):def __init__(self,user_id):
self.user_id=user_iddef __repr__(self):return 'user({})'.format(self.user_id)
users= [user(11),user(22),user(3)]print(users)print(sorted(users,key=lambdax:x.user_id))print(sorted(users,key=attrgetter('user_id'))) #使用attrgetter()函数来对实例化对象的参数值排序
14、根据字段将记录分组
itertools模块中的函数groupby()可以通过扫描序列找出拥有相同值或是参数key指定的函数所返回的值的序列项,并将它们分组,groupby()创建一个迭代器,而每次迭代时都回返回一个值,和一个子迭代器,这个子迭代器可以产生所有在该分组内具有该值的项。
from operator importitemgetterfrom itertools importgroupby
rows=[
{'a':'python','date':'07/01/'},
{'a':'java','date':'08/11/'},
{'a':'c++','date':'09/12/'},
{'a':'perl','date':'17/06/'},
]
rows.sort(key=itemgetter('date'))print(rows)for date,items in groupby(rows,key=itemgetter('date')):print(date)for i initems:print(' ',i)#[{'a': 'python', 'date': '07/01/'}, {'a': 'java', 'date': '08/11/'}, {'a': 'c++', 'date': '09/12/'}, {'a': 'perl', 'date': '17/06/'}]07/01/{'a': 'python', 'date': '07/01/'}
08/11/{'a': 'java', 'date': '08/11/'}
09/12/{'a': 'c++', 'date': '09/12/'}17/06/{'a': 'perl', 'date': '17/06/'}
from collections importdefaultdict#根据数据分组来构建一个一键多值的字典
rows_date =defaultdict(list)for row inrows:
rows_date[row['date']].append(row)print(rows_date)#defaultdict(, {'07/01/': [{'a': 'python', 'date': '07/01/'}], '08/11/': [{'a': 'java', 'date': '08/11/'}], '09/12/': [{'a': 'c++', 'date': '09/12/'}], '17/06/': [{'a': 'perl', 'date': '17/06/'}]})
15、筛选序列中的元素
#使用列表推到式来赛选列表中符合要求的值
In [37]: mylist = [1,2,-5,10,-8,3,-1]
In [38]: list(i for i in mylist if i >0)
Out[38]: [1, 2, 10, 3]
In [39]: list(i for i in mylist if i <0)
Out[39]: [-5, -8, -1]#如果输入的值非常多,可以先生成生成器然后筛选结果值
In [43]: pos = (i for i in mylist if i >0)
In [46]: for i inpos:
...:print(i)
...:1
2
10
3
如果碰到筛选不标准的值如包含字符和数字,只筛选出数字呢?
In [47]: values = [1,'3','-4','-',88,'N/A','python','5']
In [48]: defis_int(val):
...:try:
...: x=int(val)
...:returnTrue
...:exceptValueError:
...:returnFalse
...:#在筛选不规则的值式使用函数来过滤异常然后使用filter函数处理
In [49]: list(filter(is_int,values))
Out[49]: [1, '3', '-4', 88, '5']#用新值替换掉筛选不和规定的值
In [50]: mylist = [1,4,-5,10,-7,2,3,-1]
In [51]: list(i if i > 0 else 0 for i inmylist)
Out[51]: [1, 4, 0, 10, 0, 2, 3, 0]
In [52]: list(i if i < 0 else 0 for i inmylist)
Out[52]: [0, 0, -5, 0, -7, 0, 0, -1]#还可以使用press()来构建一个布尔选择器序列来赛选数据
In [53]: addresses = ['one','two','three','four','five']
In [54]: from itertools importcompress
In [55]: counts = [1,3,5,6,3]
In [56]: more1 = [i > 3 for i incounts]
In [57]: more1
Out[57]: [False, False, True, True, False]
In [58]: list(compress(addresses,more1))
Out[58]: ['three', 'four']
16、从字典中提取子集
prices ={'ACME':45.23,'AAPL':612.78,'IBM':205.55,'HPQ':37.20,'FB':10.75,
}#推到式创建值大于30的字典集合
P1 = {key:value for key,value in prices.items() if value > 30}print(P1)#推倒式创建在tech中有的键的字典集合
tech = {'ACME','IBM','HPQ','FB'}
P2= {key:value for key,value in prices.items() if key intech}print(P2)#{'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.2}
{'ACME': 45.23, 'IBM': 205.55, 'HPQ': 37.2, 'FB': 10.75}
#使用dict()函数来创建会更加清晰,效率会是上面的两倍
P3 = dict((key,value) for key,value in prices.items() if value > 100)print(P3)#{'AAPL': 612.78, 'IBM': 205.55}
#多种实现方式,但这种方法会慢很多
p4 ={key:prices[key] for key in prices.keys() &tech}print(p4)#{'ACME': 45.23, 'IBM': 205.55, 'FB': 10.75, 'HPQ': 37.2}
17、将名称映射到序列的元素中
collections.namedtuple()模块定义命名元组
In [1]: from collections importnamedtuple
In [2]: subject = namedtuple('subject',['one','two','three'])
In [3]: sub = subject(1,2,3)
In [4]: sub
Out[4]: subject(one=1, two=2, three=3)
In [7]: sub.index(2)
Out[7]: 1In [9]: sub.one
Out[9]: 1In [10]: sub.two
Out[10]: 2In [11]: sub.three
Out[11]: 3In [12]: len(sub)
Out[12]: 3In [13]: a,b,c=sub
In [14]: a
Out[14]: 1In [15]: c
Out[15]: 3
#如果要修改某个值可以使用_replace()方法
In [17]: sub._replace(one=88)
Out[17]: subject(one=88, two=2, three=3)
18、同时对数据做转换和换算
#使用生成器表达式将数据转换和换算
In [19]: numbers = [1,2,3,4,5]
In [20]: s = sum(x * x for x innumbers)
In [21]: s
Out[21]: 55In [26]: portfolio = [{'name':'GOOG','shares':50},{'name':'YHOO','shares':75},{'name':'AOL','shares':20},{'name':'SCOX','shares':65}]#对所有商品求和
In [27]: sumnmber = sum(i['shares'] for i inportfolio)
In [28]: sumnmber
Out[28]: 210In [29]: minmber = min(i['shares'] for i inportfolio)
In [30]: minmber
Out[30]: 20In [31]: maxmber = max(i['shares'] for i inportfolio)
In [32]: maxmber
Out[32]: 75
#也可以使用key参数来换算
In [33]: min(portfolio,key=lambda s:s['shares'])
Out[33]: {'name': 'AOL', 'shares': 20}
19、将多个映射合并为单个映射
In [34]: a = {'x':11,'z':33}
In [35]: b = {'y':22,'z':44}#利用collections模块中的ChainMap类来实现多个字典的合并检查
In [36]: from collections importChainMap
In [37]: c =ChainMap(a,b)
In [38]: c
Out[38]: ChainMap({'x': 11, 'z': 33}, {'y': 22, 'z': 44})
In [39]: c['z'] = 33In [40]: c
Out[40]: ChainMap({'x': 11, 'z': 33}, {'y': 22, 'z': 44})
In [41]: c['z'] = 55In [42]: c
Out[42]: ChainMap({'x': 11, 'z': 55}, {'y': 22, 'z': 44})
In [43]: values =ChainMap()
In [44]: values['x'] = 100In [45]: values =values.new_child()
In [46]: values['x'] = 200In [47]: values
Out[47]: ChainMap({'x': 200}, {'x': 100})
In [48]: values =values.new_child()
In [50]: values['x'] = 50In [51]: values
Out[51]: ChainMap({'x': 50}, {'x': 200}, {'x': 100})
In [52]: values['x']
Out[52]: 50
#利用字典的update()方法将多个字典合并一起,它会重新构建一个完整的字典
In [58]: a = {'x':1,'z':3}
In [59]: b = {'y':2,'z':4}
In [60]: merged =dict(b)
In [61]: merged.update(a)
In [62]: merged
Out[62]: {'y': 2, 'z': 3, 'x': 1}#ChainMap使用的是原始的字典,对原始数据的更改会映射到新建的对象上
In [63]: a = {'x':1,'z':3}
In [64]: b = {'y':2,'z':4}
In [65]: merged =ChainMap(a,b)
In [66]: merged
Out[66]: ChainMap({'x': 1, 'z': 3}, {'y': 2, 'z': 4})
In [67]: merged['x']
Out[67]: 1In [68]: a['x'] = 100In [69]: merged['x']
Out[69]: 100In [70]: merged
Out[70]: ChainMap({'x': 100, 'z': 3}, {'y': 2, 'z': 4})