Get difference between two lists

I have two lists in Python, like these:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

I need to create a third list with the items from the first list that aren't present in the second one. From the example, I have to get:

temp3 = ['Three', 'Four']

Is there a fast way to do this without looping and checking?

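Are the elements guaranteed to be unique? If you have temp1 = ['One', 'One', 'One'] and temp2 = ['One'], do you want ['One', 'One'] back, or []?

@michael-mrozek They are unique.

Do you want to preserve the order of the elements?

Answer #1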

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

Beware that

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1])

where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you'll need to use set([1, 2]).symmetric_difference(set([2, 3])).
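To make the distinction concrete, here is a small sketch using the same two sets as above. It compares the one-directional difference with the symmetric difference; sorted() is applied only so the printed result has a stable order, since sets are unordered.

```python
a = {1, 2}
b = {2, 3}

# One-directional difference: elements of a that are not in b.
only_in_a = a - b                 # same as a.difference(b)

# Symmetric difference: elements that are in exactly one of the two sets.
in_exactly_one = a ^ b            # same as a.symmetric_difference(b)

print(sorted(only_in_a))          # [1]
print(sorted(in_exactly_one))     # [1, 3]
```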

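@Drewdin: Lists do not support the "-" operand. Sets, however, do, and that is what is demonstrated above if you look closely.

– Godsmith
Oct 14 '14 at 21:21

Thanks, I ended up using set(ListA).symmetric_difference(ListB)

– Drewdin
Oct 15 '14 at 12:07

The symmetric difference can also be written with ^ (set1 ^ set2)

– Bastian
Oct 1 '15 at 18:18

Could you please edit the answer and point out that this only returns temp1 - temp2? As others have said, to return all the differences you have to use the symmetric difference: list(set(temp1) ^ set(temp2))

– rkachach
Feb 16 '16 at 16:00

Note that because sets are unordered, an iterator over the difference can return the elements in any order. For example, list(set(temp1) - set(temp2)) == ['Four', 'Three'] or list(set(temp1) - set(temp2)) == ['Three', 'Four'].

– Arthur
Mar 31 '17 at 12:50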
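If you want the set-based result but in the original order of temp1, one option (a sketch, not taken from any answer here) is to sort the difference by each element's position in temp1. Note that temp1.index is a linear scan, so this costs O(k·n) for k surviving elements; filtering temp1 with a list comprehension against a set avoids that cost.

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# The set difference gives the right elements, but in arbitrary order.
unordered = set(temp1) - set(temp2)

# Re-sort by position in temp1 to recover the original order.
# temp1.index is O(n), so this is only reasonable for small lists.
temp3 = sorted(unordered, key=temp1.index)
print(temp3)  # ['Three', 'Four']
```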

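Answer #2

The existing solutions all offer either one or the other of:

Faster than O(n*m) performance.
Preserving the order of the input list.

But so far no solution has both. If you want both, try this: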

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]

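As well as preserving order, the approach I present is also (slightly) faster than set subtraction, because it doesn't require constructing an unnecessary set. The performance difference becomes more noticeable if the first list is considerably longer than the second and hashing is expensive. Here's a second test demonstrating this: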

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print(timeit.timeit('list(set(temp1) - set(temp2))', init, number=100000))
print(timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number=100000))
print(timeit.timeit('[item for item in temp1 if item not in temp2]', init, number=100000))

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

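Additional support for this answer: I ran across a use case where preserving list order was important for performance. When working with tarinfo or zipinfo objects, I was using set subtraction to exclude certain tarinfo objects from the archive. Creating the new list was fast, but extraction was extremely slow. The reason evaded me at first. It turned out that reordering the list of tarinfo objects caused a huge performance penalty. Switching to the list comprehension method saved the day.

– Ray Thompson
Dec 13 '11 at 0:26

@MarkByers - Maybe I should write a whole new question for this. But how would this work in a for loop? For example, if my temp1 and temp2 keep changing... and I want to append the new information to temp3?

– Ason
Aug 9 '12 at 17:57

@MarkByers - Sounds good. I'll keep thinking about it for a bit. But +1 for a great solution.

– Ason
Aug 9 '12 at 18:39

I agree with @Dejel >>> temp1 = ['One', 'Two', 'Three', 'Four'] >>> temp2 = ['One', 'Two', 'Six'] >>> s = set(temp2) >>> temp3 = [x for x in temp1 if x not in s] >>> temp3 ['Three', 'Four']

– earlonrails
Jul 7 '14 at 2:58

@haccks Because checking membership in a list is an O(n) operation (it iterates over the whole list), but checking membership in a set is O(1) on average.

– Mark Byers
Nov 5 '15 at 16:57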

Answer #3

temp3 = [item for item in temp1 if item not in temp2]

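Turning temp2 into a set beforehand would make this more efficient.

– lunaryorn
Aug 11 '10 at 19:47

True, it depends on whether Ockonal cares about duplicates or not (the original question doesn't say)

– matt b
Aug 11 '10 at 19:47

The comments say the (lists|tuples) don't have duplicates.

– user395760
Aug 11 '10 at 19:52

I upvoted your answer because I thought you were right about the duplicates at first. But item not in temp2 and item not in set(temp2) will always return the same results, regardless of whether there are duplicates in temp2 or not.

– Arekolek
Mar 7 '16 at 22:42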
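A quick check of that last point, with a made-up temp2 that contains duplicates: membership tests ignore how many times an item occurs on the right-hand side, so filtering against the list and against a set built from it gives the same result.

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'One', 'Two', 'Two']  # duplicates on the right-hand side

s = set(temp2)  # built once, outside the comprehension

via_list = [item for item in temp1 if item not in temp2]
via_set = [item for item in temp1 if item not in s]

print(via_list)             # ['Three', 'Four']
print(via_list == via_set)  # True
```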

Upvoted for not requiring the list items to be hashable.

– Brent
Sep 11 '17 at 15:19
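To illustrate the hashability point with made-up data: lists cannot be put into a set, so the set-based answers fail on a list of lists, while the plain list comprehension only needs equality comparisons. The trade-off is O(n*m) time, since each membership test scans temp2.

```python
temp1 = [[1, 2], [3, 4], [5, 6]]
temp2 = [[3, 4]]

# Lists are unhashable, so building a set from temp1 raises TypeError...
try:
    set(temp1)
except TypeError as exc:
    print("set() fails:", exc)  # unhashable type: 'list'

# ...but the list comprehension only relies on ==, so it still works.
temp3 = [item for item in temp1 if item not in temp2]
print(temp3)  # [[1, 2], [5, 6]]
```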

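Answer #4

This can be done using the Python XOR operator.

This will remove the duplicates in each list.
This will show the difference of temp1 from temp2 and of temp2 from temp1.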

set(temp1) ^ set(temp2)

Best answer!

– Artsiom Praneuski
Feb 24 at 12:54

How is this buried... great call

– Carl Boneri
Jul 31 at 20:44

Damn. This is a much better solution!

– AlexAMP
Sep 25 at 18:15

Answer #5

The difference between two lists (say list1 and list2) can be found using either of the following simple functions.

def diff(list1, list2):
    c = set(list1).union(set(list2))  # or c = set(list1) | set(list2)
    d = set(list1).intersection(set(list2))  # or d = set(list1) & set(list2)
    return list(c - d)

def diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))  # or return list(set(list1) ^ set(list2))

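With either function above, the difference can be found using diff(temp2, temp1) or diff(temp1, temp2). Both give the elements 'Four' and 'Three' (e.g. ['Four', 'Three']; the order within the returned list is not guaranteed). You don't have to worry about the order of the lists or which list is given first.

Python documentation reference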
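A quick way to check that the two definitions above agree, and that argument order does not matter for a symmetric difference, is to compare their results after sorting (sorting is needed because the list order is not guaranteed). The names diff_union and diff_symmetric are hypothetical, chosen only so both versions can coexist in one file.

```python
def diff_union(list1, list2):
    # (A | B) - (A & B): everything in either list minus what they share
    return list((set(list1) | set(list2)) - (set(list1) & set(list2)))

def diff_symmetric(list1, list2):
    # A ^ B: elements that appear in exactly one of the two lists
    return list(set(list1) ^ set(list2))

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

results = [
    sorted(diff_union(temp1, temp2)),
    sorted(diff_union(temp2, temp1)),
    sorted(diff_symmetric(temp1, temp2)),
    sorted(diff_symmetric(temp2, temp1)),
]
print(results[0])                             # ['Four', 'Three']
print(all(r == results[0] for r in results))  # True
```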

Why not set(list1).symmetric_difference(set(list2))?

– swietyy
Mar 4 '15 at 16:49

Answer #6

If you are looking for a recursive difference, I have written a package for Python:
https://github.com/seperman/deepdiff

Installation

Install from PyPi:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates (with the same dictionaries as above):

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains a dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
...
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>>
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'], 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Answer #7

The simplest way:

use set().difference(set())

list_a = [1, 2, 3]
list_b = [2, 3]
print(set(list_a).difference(set(list_b)))

The answer is {1}, which can be printed as a list:

print(list(set(list_a).difference(set(list_b))))

Comments
This removes duplicates and doesn't preserve order

– Albert Rothman
Sep 3 at 18:51

Answer #8

If you are really looking into performance, then use numpy! //gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451
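The linked notebook is not reproduced here. As a rough sketch of what a NumPy version can look like (assuming NumPy is installed): np.setdiff1d returns the sorted unique values of the first array that are not in the second, while a boolean mask built with np.isin keeps the original order and any duplicates.

```python
import numpy as np

temp1 = np.array(['One', 'Two', 'Three', 'Four'])
temp2 = np.array(['One', 'Two'])

# Sorted, de-duplicated difference.
sorted_diff = np.setdiff1d(temp1, temp2)

# Order-preserving difference: keep the elements of temp1 not found in temp2.
ordered_diff = temp1[~np.isin(temp1, temp2)]

print(sorted_diff.tolist())   # ['Four', 'Three']
print(ordered_diff.tolist())  # ['Three', 'Four']
```

Whether either is faster than the pure-Python set approaches depends on data size and dtype, so it is worth timing on your own inputs.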

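Comments
I updated the notebook in the link, and also the screenshot. Surprisingly, pandas is slower than numpy even when it switches to a hashtable internally. This may be partly due to the upcast to int64.

– denfromufa
Apr 29 '17 at 7:06

Answer #9

Since none of the current solutions yields a tuple, I'll throw this in:

Like the other non-tuple-yielding answers in this direction, it preserves order.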
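The code for this answer did not survive in this copy. A minimal sketch of the idea it describes, an order-preserving difference returned as a tuple, might look like this; it is a reconstruction of the approach, not the author's exact code.

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

s = set(temp2)  # build the set once for O(1) average membership tests
temp3 = tuple(x for x in temp1 if x not in s)
print(temp3)  # ('Three', 'Four')
```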

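Answer #10

I wanted something that would take two lists and could do what diff in bash does. Since this question pops up first when you search for "python diff two lists" and is not very specific, I will post what I came up with. None of the other answers will tell you the position where the difference occurs, but this one does. Some answers only give the difference in one direction. Some reorder the elements. Some don't handle duplicates. But this solution gives you a true difference between two lists:

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
    if tag == 'equal':
        print('both have', a[i:j])
    if tag in ('delete', 'replace'):
        print(' 1st has', a[i:j])
    if tag in ('insert', 'replace'):
        print(' 2nd has', b[k:l])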

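This outputs:

both have ['A', 'quick']
 1st has ['fox']
 2nd has ['brown', 'mouse']
both have ['jumps']
 2nd has ['over']
both have ['the']
 1st has ['lazy']
both have ['dog']

Of course, if your application makes the same assumptions the other answers make, you will benefit from those the most. But if you are looking for true diff functionality, then this is the only way to go. For example, none of the other answers could handle:

a = [1,2,3,4,5]
b = [5,4,3,2,1]

But this one does: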
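The output for this second pair was not preserved in this copy. The sketch below wraps the same opcode loop in a helper (show_diff is a hypothetical name) and runs it on those lists. SequenceMatcher finds a single matching element and reports the rest as one insertion and one deletion.

```python
from difflib import SequenceMatcher

def show_diff(a, b):
    # Same loop as above, collecting the lines instead of printing directly.
    lines = []
    for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
        if tag == 'equal':
            lines.append(f"both have {a[i:j]}")
        if tag in ('delete', 'replace'):
            lines.append(f" 1st has {a[i:j]}")
        if tag in ('insert', 'replace'):
            lines.append(f" 2nd has {b[k:l]}")
    return lines

a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]
for line in show_diff(a, b):
    print(line)
# Expected:
#  2nd has [5, 4, 3, 2]
# both have [1]
#  1st has [2, 3, 4, 5]
```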

Answer #11

Try this:

temp3 = set(temp1) - set(temp2)

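Answer #12

This could be even faster than Mark's list comprehension:

list(itertools.filterfalse(set(temp2).__contains__, temp1))

Comments
Might want to include the from itertools import filterfalse bit here. Also note that this doesn't return a sequence like the others; it returns an iterator.

– Matt Luongo
Jan 17 '12 at 16:16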

Answer #13

Here's a Counter answer for the simplest case: it returns what is in the first list but not in the second.

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

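Output:

['Three', 'Four']

Alternatively, depending on your readability preferences, it makes for a decent one-liner:

diff = list((Counter(lst1) - Counter(lst2)).elements())

Note that you can remove the list(...) call if you are just iterating over it.

Because this solution uses counters, it handles quantities properly, unlike the many set-based answers. For example, on this input:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

The output is: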
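The output for this input was not preserved in this copy. Running the same one-liner on it is sketched below. Counter subtraction removes one occurrence per match and drops non-positive counts; on Python 3.7+, where Counter keeps insertion order, the leftover items come back grouped in the order they first appear in lst1.

```python
from collections import Counter

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

# elements() expands the remaining counts back into individual items.
diff = list((Counter(lst1) - Counter(lst2)).elements())
print(diff)  # ['Two', 'Two', 'Three', 'Three', 'Four']
```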

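Answer #14

Here are a few simple, order-preserving ways of diffing two lists of strings.

Code

An unusual approach using pathlib:

import pathlib

temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
r = p.relative_to(pathlib.Path(*temp2))
list(r.parts)
# ['Three', 'Four']

This assumes both lists contain strings with equivalent beginnings. See the docs for more details. Note that it is not particularly fast compared to set operations.

A straightforward implementation using itertools.zip_longest: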
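The zip_longest code did not survive in this copy. A minimal sketch of such a positional comparison is below: it keeps the items of temp1 that differ from the item at the same position in temp2, so (as the comment that follows points out) it only makes sense when the two lists line up.

```python
import itertools

temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

# zip_longest pads the shorter list with None, so the trailing items of
# temp1 are compared against None and therefore kept. If temp2 were the
# longer list, the None padding from temp1 would end up in the result.
temp3 = [x for x, y in itertools.zip_longest(temp1, temp2) if x != y]
print(temp3)  # ['Three', 'Four']
```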

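Comments
The itertools solution only works when the elements in temp1 and temp2 line up well. If you, for example, reverse the elements in temp2 or insert some other value at the beginning of temp2, the listcomp will just return the same elements as in temp1.

– KenHBS
Aug 24 '18 at 10:50

Yes, that is a characteristic of these approaches. As mentioned, these solutions are order-preserving: they assume some relative order between the lists. An unordered solution is to diff two sets.

– pylang
Aug 24 '18 at 18:09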

Answer #15

Here's another solution:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb

Answer #16

You could use a naive method if the elements of the lists are sorted and unique (set-like).

Naive solution: 0.0787101593292

Native set solution: 0.998837615564
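The code behind these timings did not survive in this copy. A sketch of the two approaches being compared, with made-up data: the naive version simply slices off the first len(temp2) items, which is only correct when temp2 is exactly the leading part of temp1. The counter-example at the end shows what happens otherwise.

```python
temp1 = [1, 2, 3, 4, 5]
temp2 = [1, 2, 3]

# Naive: valid only if temp2 is exactly a prefix of temp1.
naive = temp1[len(temp2):]

# Native set method: works regardless of position, but loses ordering.
native = set(temp1).difference(temp2)

print(naive)           # [4, 5]
print(sorted(native))  # [4, 5]

# Counter-example: [2, 4] is not a prefix, so slicing gives a wrong answer.
other = [2, 4]
print(temp1[len(other):])                    # [3, 4, 5]  (wrong)
print(sorted(set(temp1).difference(other)))  # [1, 3, 5]
```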

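Answer #17

I'm a little late to the game, but you can compare the performance of some of the above-mentioned code with this. Two of the fastest contenders are:

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

I apologize for the elementary level of coding.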

import time
import random
from itertools import filterfalse

# performance = 1 -> measure time taken
# performance = 2 -> check correctness (expected answer: 1, 4, 5, 6)
# note: methods 4, 5 and 6 only compute x - y, not the symmetric difference
performance = 1
numberoftests = 7

def answer(x, y, z):
    # time.clock() was removed in Python 3.8; perf_counter() replaces it
    start = time.perf_counter()
    if z == 0:
        lists = str(list(set(x) - set(y)) + list(set(y) - set(x)))
    elif z == 1:
        lists = str(list(set(x).symmetric_difference(set(y))))
    elif z == 2:
        lists = str(list(set(x) ^ set(y)))
    elif z == 3:
        # consume the lazy iterator so the filtering is actually timed
        lists = list(filterfalse(set(y).__contains__, x))
    elif z == 4:
        lists = tuple(set(x) - set(y))
    elif z == 5:
        lists = [tt for tt in x if tt not in y]
    else:
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = str(Xarray + Yarray)
    times = str(z + 1) + " = " + str(time.perf_counter() - start)
    return (lists, times)

n = numberoftests
if performance == 2:
    a = [1, 2, 3, 4, 5]
    b = [3, 2, 6]
    for c in range(0, n):
        d = answer(a, b, c)
        print(d[0])
elif performance == 1:
    for tests in range(0, 10):
        print("Test Number " + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0, n):
            # if c not in (1, 4, 5, 6):
            d = answer(a, b, c)
            print(d[1])

Answer #18

If you run into TypeError: unhashable type: 'list', you need to turn the lists or sets into tuples, e.g.

set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))

See also How to compare a list of lists/sets in python?
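If you need the result as a list of lists again, the tuples can be mapped back afterwards. A minimal sketch with made-up sample data:

```python
list_of_lists1 = [[1, 2], [3, 4], [5, 6]]
list_of_lists2 = [[3, 4], [7, 8]]

# Inner lists are unhashable, so convert each one to a tuple first.
sym_diff = set(map(tuple, list_of_lists1)).symmetric_difference(
    set(map(tuple, list_of_lists2))
)

# Convert back to lists; sorting is only there for a deterministic order.
result = sorted(list(t) for t in sym_diff)
print(result)  # [[1, 2], [5, 6], [7, 8]]
```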

Answer #19

Let's say we have two lists

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

We can see from the two lists above that items 1, 3, 5 exist in list2 and items 7, 9 do not. On the other hand, items 1, 3, 5 exist in list1 and items 2, 4 do not.

What is the best solution to return a new list containing items 7, 9 and 2, 4?

All the answers above find the solution; now which one is the most optimal?

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)
    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

versus

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

Using timeit we can see the results

import timeit

t1 = timeit.Timer("difference(list1, list2)", "from __main__ import difference, list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)", "from __main__ import sym_diff, list1, list2")

# timeit() returns the total time in seconds for all runs
print('Using two for loops', t1.timeit(number=100000), 'seconds')
print('Using symmetric_difference', t2.timeit(number=100000), 'seconds')

returns

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 seconds
Using symmetric_difference 0.11285737506113946 seconds

Answer #20

A one-line version of arulmr's solution

def diff(listA, listB):
    return set(listA) - set(listB) | set(listB) - set(listA)

Answer #21

If you want something more like a changeset... you could use Counter

from collections import Counter

def diff(a, b):
    """ more verbose than needs to be, for clarity """
    ca, cb = Counter(a), Counter(b)
    to_add = cb - ca
    to_remove = ca - cb
    changes = Counter(to_add)
    changes.subtract(to_remove)
    return changes

lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

In [127]: diff(lista, listb)
Out[127]: Counter({'two': 1, 'one': -1, 'four': -2})
# in order to go from lista to listb, you need to add a "two", remove a "one", and remove two "four"s

In [128]: diff(listb, lista)
Out[128]: Counter({'four': 2, 'one': 1, 'two': -1})
# in order to go from listb to lista, you must add two "four"s, add a "one", and remove a "two"

Answer #22

We can calculate the union of the lists minus their intersection:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

set(temp1+temp2)-(set(temp1)&set(temp2))

Out: set(['Four', 'Five', 'Three'])

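Answer #23

This can be solved with one line.
Given two lists (temp1 and temp2), it returns their difference in a third list (temp3).

Answer #24

Here is a simple way to diff two lists (whatever their contents); you can get the result as shown below: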

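Answer #25

I prefer to convert to sets and then use the "difference()" function. The full code is:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
set1 = set(temp1)
set2 = set(temp2)
set3 = set1.difference(set2)
temp3 = list(set3)
print(temp3)

Output:

>>> print(temp3)
['Three', 'Four']

It's the easiest to understand, and in the future, if you work with large data, converting it to sets will also remove duplicates, if you don't need them. Hope it helps ;-)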

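Comments
The difference function is the same as the - operator shown in the accepted answer, so I'm not sure this really adds any new information 10 years later

– OneCricketeer
Dec 3 at 6:49

Answer #26

(list(set(a)-set(b))+list(set(b)-set(a)))

Comments
Besides providing the answer, can you explain how it works/applies to this particular problem? Answers and solutions are great, but detailed guides and explanations are infinitely better.

– Business
Jul 6 '17 at 17:15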

Answer #27

def diffList(list1, list2):     # returns the difference between two lists.
    if len(list1) > len(list2):
        return (list(set(list1) - set(list2)))
    else:
        return (list(set(list2) - set(list1)))

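For example, if list1 = [10, 15, 20, 25, 30, 35, 40] and list2 = [25, 40, 35], the returned list will be output = [10, 20, 30, 15] (element order may vary).

Comments
You can't do it this way for a difference operation. Even with integers, if you tell a function to perform "a - b", it should subtract "b" from "a" only, regardless of whether "b" is larger than "a". The same goes for lists and sets. A - B and B - A are both valid operations regardless of the lengths of A and B; to perform A - B, you just need to exclude the contents of B from A.

– Abu Talha Danish
Dec 10 '19 at 18:40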
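To illustrate the comment's point with the answer's own example: A - B and B - A are both well defined whatever the lengths, and diffList hides one of them because it always subtracts the shorter list from the longer one. A minimal sketch, with diffList reproduced only for comparison:

```python
def diffList(list1, list2):
    # The answer's length-based version, reproduced for comparison.
    if len(list1) > len(list2):
        return list(set(list1) - set(list2))
    else:
        return list(set(list2) - set(list1))

list1 = [10, 15, 20, 25, 30, 35, 40]
list2 = [25, 40, 35]

# Directional differences: both are valid, and they differ.
a_minus_b = set(list1) - set(list2)
b_minus_a = set(list2) - set(list1)
print(sorted(a_minus_b))  # [10, 15, 20, 30]
print(sorted(b_minus_a))  # []

# diffList returns the same thing whichever way round it is called.
print(sorted(diffList(list1, list2)) == sorted(diffList(list2, list1)))  # True
```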
