Remove duplicates from a List<T> in C#

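Does anyone have a quick method for de-duplicating a generic List in C#?

Do you care about the order of elements in the result? That would rule out some solutions.

A one-line solution: ICollection<T> withoutDuplicates = new HashSet<T>(inputList);

Answer #1

Perhaps you should consider using a HashSet.

From the MSDN link: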

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        HashSet<int> evenNumbers = new HashSet<int>();
        HashSet<int> oddNumbers = new HashSet<int>();

        for (int i = 0; i < 5; i++)
        {
            // Populate numbers with just even numbers.
            evenNumbers.Add(i * 2);

            // Populate oddNumbers with just odd numbers.
            oddNumbers.Add((i * 2) + 1);
        }

        Console.Write("evenNumbers contains {0} elements: ", evenNumbers.Count);
        DisplaySet(evenNumbers);

        Console.Write("oddNumbers contains {0} elements: ", oddNumbers.Count);
        DisplaySet(oddNumbers);

        // Create a new HashSet populated with even numbers.
        HashSet<int> numbers = new HashSet<int>(evenNumbers);
        Console.WriteLine("numbers UnionWith oddNumbers...");
        numbers.UnionWith(oddNumbers);

        Console.Write("numbers contains {0} elements: ", numbers.Count);
        DisplaySet(numbers);
    }

    private static void DisplaySet(HashSet<int> set)
    {
        Console.Write("{");
        foreach (int i in set)
        {
            Console.Write(" {0}", i);
        }
        Console.WriteLine(" }");
    }
}

/* This example produces output similar to the following:
 * evenNumbers contains 5 elements: { 0 2 4 6 8 }
 * oddNumbers contains 5 elements: { 1 3 5 7 9 }
 * numbers UnionWith oddNumbers...
 * numbers contains 10 elements: { 0 2 4 6 8 1 3 5 7 9 }
 */

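It's unbelievably fast... 100,000 strings with List take 400 s and 8 MB of RAM, my own solution takes 2.5 s and 28 MB, and HashSet takes 0.1 s (!) and 11 MB of RAM.

– sasjaq
Mar 25 '13 at 22:28

HashSet doesn't have an index, so it's not always possible to use it. I once had to create a huge list without duplicates and then use it for a ListView in virtual mode. It was very fast to build a HashSet<> first and then convert it into a List<> (so the ListView can access items by index). List<>.Contains() is too slow.

– Sinatr
Jul 31 '13 at 8:50

It would help if there were an example of how to use a HashSet in this particular context.

– Nathan McKaskle
Jan 28 '15 at 17:04

How is this an answer? It's a link.

– mcont
Jun 4 '15 at 15:13

HashSet is great in most circumstances. But if you have an object like DateTime, it compares by reference and not by value, so you will still end up with duplicates.

– Jason McKindly
Dec 9 '15 at 20:03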
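Whether duplicates survive a HashSet depends on how the element type defines equality. The following is a minimal sketch, not taken from the thread: Tag is a hypothetical class that does not override Equals or GetHashCode, so it is compared by reference. DateTime, by contrast, is a struct with value equality, so equal DateTime values should collapse into one entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration; it does not override Equals/GetHashCode,
// so two instances with the same Name are still different to a HashSet.
class Tag
{
    public string Name;
    public Tag(string name) { Name = name; }
}

static class EqualityDemo
{
    // Number of distinct DateTime values under the default comparer (value equality).
    public static int CountDistinctDates(IEnumerable<DateTime> dates)
    {
        return new HashSet<DateTime>(dates).Count;
    }

    // Number of distinct Tag objects under the default comparer (reference equality).
    public static int CountDistinctTags(IEnumerable<Tag> tags)
    {
        return new HashSet<Tag>(tags).Count;
    }

    static void Main()
    {
        var dates = new List<DateTime> { new DateTime(2015, 12, 9), new DateTime(2015, 12, 9) };
        var tags = new List<Tag> { new Tag("a"), new Tag("a") };

        Console.WriteLine(CountDistinctDates(dates)); // 1
        Console.WriteLine(CountDistinctTags(tags));   // 2
    }
}
```

For a class like Tag, either override Equals and GetHashCode or pass an IEqualityComparer<T> to the HashSet constructor.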

Answer #2

If you're using .NET 3.5 or later, you can use LINQ.

List<T> withDupes = LoadSomeData();
List<T> noDupes = withDupes.Distinct().ToList();

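That code will fail, as .Distinct() returns an IEnumerable<T>. You have to add .ToList() to it.

– ljs
Sep 6 '08 at 20:21

This approach can be used only for lists with simple values.

– Polaris
Nov 8 '10 at 7:06

No, it works with lists containing objects of any type. But you will have to override the default comparer for your type. Like so: public override bool Equals(object obj){...}

– BaBu
Dec 9 '10 at 14:27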
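As a rough illustration of the comment above, here is a sketch of a type that overrides both Equals and GetHashCode so that Distinct() treats equal-valued objects as duplicates. Point2 and its fields are hypothetical names chosen for this example; the hash formula is just one common choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: two points are equal when both coordinates match.
class Point2
{
    public int X;
    public int Y;
    public Point2(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        Point2 other = obj as Point2;
        return other != null && X == other.X && Y == other.Y;
    }

    // Must agree with Equals: objects that are equal return the same hash code.
    public override int GetHashCode()
    {
        return (X * 397) ^ Y;
    }
}

static class DistinctDemo
{
    // Distinct() uses the default equality comparer, which calls the overrides above.
    public static List<Point2> Dedupe(List<Point2> points)
    {
        return points.Distinct().ToList();
    }

    static void Main()
    {
        var points = new List<Point2> { new Point2(1, 2), new Point2(1, 2), new Point2(3, 4) };
        Console.WriteLine(Dedupe(points).Count); // 2
    }
}
```

Overriding Equals without GetHashCode is not enough here, because Distinct() and HashSet look items up by hash code first.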

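It's always a good idea to override ToString() and GetHashCode() in your classes so that this kind of thing will work.

– B Seven
Apr 8 '11 at 16:58

You can also use the MoreLINQ NuGet package, which has a .DistinctBy() extension method. Pretty useful.

– yu_ominae
May 16 '13 at 2:49

Answer #3

How about: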

var noDupes = list.Distinct().ToList();

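in .NET 3.5?

Does it duplicate the list?

– darkgaze
May 28 '19 at 14:20

@darkgaze this just creates another list with only unique entries. So any duplicates will be removed and you're left with a list where every position has a different object.

– hexagod
Oct 4 '19 at 17:33

Does this work for a list of items where the item codes are duplicated and a unique list is needed?

– venkat
Jan 19 at 21:10

Answer #4

Simply initialize a HashSet with a List of the same type.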
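The answer's own code did not survive on this page. A minimal sketch of the idea, with the class and method names (HashSetDedupe, WithoutDuplicates) invented for illustration, might look like this:

```csharp
using System;
using System.Collections.Generic;

static class HashSetDedupe
{
    // Returns a new list holding one copy of each distinct item.
    // HashSet<T> does not document an enumeration order, so the order
    // of the result should not be relied on.
    public static List<T> WithoutDuplicates<T>(List<T> withDupes)
    {
        HashSet<T> noDupes = new HashSet<T>(withDupes);
        return new List<T>(noDupes);
    }

    static void Main()
    {
        var withDupes = new List<string> { "a", "b", "a", "c", "b" };
        List<string> result = WithoutDuplicates(withDupes);
        Console.WriteLine(result.Count); // 3
    }
}
```

Copying the set into a List<T> gives back index access, which is what the ListView scenario mentioned earlier in the thread needs.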

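...and if you need a List<T> as the result, use new HashSet<T>(withDupes).ToList()

– Tim Schmelter
Feb 9 '17 at 16:29

Answer #5

Sort it, then compare adjacent items pairwise, since the duplicates will clump together.

The comparison runs from back to front, to avoid having to re-sort the list after each removal.
This example now uses C# value tuples to do the swapping; substitute appropriate code if you can't use them.
The end result is no longer sorted.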
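The code for this answer is missing from this page. The sketch below is a reconstruction that is consistent with the three notes above, not necessarily the author's exact code: it sorts, walks from back to front, swaps each duplicate to the end with a value tuple and removes it there. The names SortedDedupe and RemoveDuplicates are assumptions, and it assumes the list holds no null items.

```csharp
using System;
using System.Collections.Generic;

static class SortedDedupe
{
    // Sorts the list, then walks from back to front. Each duplicate is swapped
    // with the last element and removed from the end, so no block of elements
    // has to be shifted. The result is duplicate-free but no longer sorted.
    public static void RemoveDuplicates<T>(List<T> list) where T : IComparable<T>
    {
        list.Sort();
        int index = list.Count - 1;
        while (index > 0)
        {
            if (list[index].CompareTo(list[index - 1]) == 0)
            {
                int last = list.Count - 1;
                if (index < last)
                    (list[index], list[last]) = (list[last], list[index]); // value-tuple swap (C# 7+)
                list.RemoveAt(last); // removing the final element is cheap
            }
            index--;
        }
    }

    static void Main()
    {
        var numbers = new List<int> { 3, 1, 2, 3, 1, 2, 2 };
        RemoveDuplicates(numbers);
        Console.WriteLine(numbers.Count); // 3
    }
}
```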

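If I'm not mistaken, most of the approaches mentioned above are just abstractions of this very routine, right? I would have taken your approach here, Lasse, because it's how I mentally picture moving through data. But now I'm interested in the performance differences between some of the suggestions.

– Ian Patrick Hughes
Aug 11 '09 at 20:52

Implement them and time them; that's the only way to be sure. Even Big-O notation won't give you actual performance metrics, only a growth relationship.

– Lasse V. Karlsen
Aug 12 '09 at 7:03

I like this approach; it's more portable to other languages.

– Jerry Liang
May 14 '12 at 0:26

Don't do that. It's super slow. RemoveAt is a very costly operation on a List.

– Clément
Feb 9 '13 at 21:53

Clément is right. A way to salvage this would be to wrap it in a method that yields with an enumerator and only returns distinct values. Alternatively, you could copy the values to a new array or list.

– JHubbard80
Oct 25 '13 at 17:08

Answer #6

I like to use this command: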

List<Store> myStoreList = Service.GetStoreListbyProvince(provinceId)
                                                 .GroupBy(s => s.City)
                                                 .Select(grp => grp.FirstOrDefault())
                                                 .OrderBy(s => s.City)
                                                 .ToList();

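I have these fields in my list: Id, StoreName, City, PostalCode.
I wanted to show the list of cities in a dropdown, but the list has duplicate values.
Solution: group by city, then pick the first item of each group.

This works for a case where I have multiple items with the same key and have to keep only the one with the most recent update date. So a solution using "Distinct" wouldn't work.

– Paul Evans
Oct 27 at 4:01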
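For the case described in the comment above, keeping only the most recently updated item per key, the same GroupBy pattern can pick the newest element of each group instead of the first. This is a sketch under assumptions: Entry, Key and Updated are hypothetical names, not types from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type for illustration.
class Entry
{
    public string Key;
    public DateTime Updated;
    public Entry(string key, DateTime updated) { Key = key; Updated = updated; }
}

static class LatestPerKey
{
    // Groups by Key and keeps the most recently updated entry of each group.
    public static List<Entry> KeepLatest(IEnumerable<Entry> entries)
    {
        return entries
            .GroupBy(e => e.Key)
            .Select(g => g.OrderByDescending(e => e.Updated).First())
            .ToList();
    }

    static void Main()
    {
        var entries = new List<Entry>
        {
            new Entry("a", new DateTime(2020, 1, 1)),
            new Entry("a", new DateTime(2020, 6, 1)),
            new Entry("b", new DateTime(2020, 3, 1)),
        };

        foreach (Entry e in KeepLatest(entries))
            Console.WriteLine(e.Key + " " + e.Updated.Month); // a 6, then b 3
    }
}
```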

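Answer #7

It worked for me. Simply use

liIDs = liIDs.Distinct().ToList<Type>();

Replace "Type" with your desired type, e.g. int.

Distinct is in Linq, not in System.Collections.Generic as reported by the MSDN page.

– Almo
Oct 1 '14 at 19:54

This answer (2012) seems to be the same as two other answers on this page that are from 2008?

– Jon Schneider
Jan 6 '16 at 21:33

Answer #8

As kronoz said, in .NET 3.5 you can use Distinct(). In .NET 2 you could mimic it: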

public IEnumerable<T> DedupCollection<T> (IEnumerable<T> input) 
{
    var passedValues = new HashSet<T>();

    // Relatively simple dupe check alg used as example
    foreach(T item in input)
        if(passedValues.Add(item)) // True if item is new
            yield return item;
}

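This could be used to dedupe any collection and will return the values in the original order.

It's normally much quicker to filter a collection (as both Distinct() and this sample do) than it would be to remove items from it.

The problem with this approach though is that it's O(N^2)-ish, as opposed to a hashset. But at least it is evident what it is doing.

– Tamas Czinege
Jan 29 '09 at 18:25

@DrJokepu - actually I didn't realise that the HashSet constructor de-duped, which makes it better for most circumstances. However, this would preserve the sort order, which a HashSet doesn't.

– Keith
Aug 24 '10 at 14:59

HashSet<T> was introduced in 3.5

– thorn̈
Nov 5 '11 at 19:00

@thorn really? So hard to keep track. In that case you could just use a Dictionary<T, object> instead, replace .Contains with .ContainsKey and .Add(item) with .Add(item, null)

– Keith
Nov 6 '11 at 22:32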
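The substitution suggested in the last comment could be sketched as follows. This is an illustration rather than Keith's actual code: it avoids HashSet<T>, var and LINQ, and it assumes the input contains no null items, since Dictionary keys cannot be null.

```csharp
using System;
using System.Collections.Generic;

static class DictionaryDedupe
{
    // Yields each item the first time it is seen, keeping the original order.
    // Seen items are tracked as dictionary keys; the values are unused.
    public static IEnumerable<T> DedupCollection<T>(IEnumerable<T> input)
    {
        Dictionary<T, object> passedValues = new Dictionary<T, object>();
        foreach (T item in input)
        {
            if (!passedValues.ContainsKey(item))
            {
                passedValues.Add(item, null);
                yield return item;
            }
        }
    }

    static void Main()
    {
        List<int> input = new List<int>(new int[] { 1, 2, 1, 3, 2 });
        foreach (int value in DedupCollection(input))
            Console.Write(value + " "); // 1 2 3
        Console.WriteLine();
    }
}
```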

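@Keith, as per my testing HashSet preserves order while Distinct() doesn't.

– Dennis T --Reinstate Monica--
Jun 9 '15 at 15:50

Answer #9

An extension method might be a decent way to go... something like this: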

public static List<T> Deduplicate<T>(this List<T> listToDeduplicate)
{
    return listToDeduplicate.Distinct().ToList();
}

And then call it like this, for example:

List<int> myFilteredList = unfilteredList.Deduplicate();

Answer #10

In Java (I assume C# is more or less identical):

list = new ArrayList<T>(new HashSet<T>(list))

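If you really wanted to mutate the original list, clear it and add the de-duplicated items back.
To preserve order, simply replace HashSet with LinkedHashSet.

In C# it would be: List<T> noDupes = new List<T>(new HashSet<T>(list)); list.Clear(); list.AddRange(noDupes);

– 烟熏
Apr 16 '12 at 14:45

In C#, it's easier this way: var noDupes = new HashSet<T>(list); list.Clear(); list.AddRange(noDupes); :)

– nawfal
May 26 '14 at 23:20

Answer #11

This takes the distinct elements (the elements without duplicates) and converts them into a list again: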

List<type> myNoneDuplicateValue = listValueWithDuplicate.Distinct().ToList();

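Answer #12

Use LINQ's Union method.

Note: This solution requires no knowledge of LINQ, aside from the fact that it exists.

Code

Begin by adding the following to the top of your class file: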

using System.Linq;

Now you can use the following to remove duplicates from an object called obj1:

obj1 = obj1.Union(obj1).ToList();

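Note: Rename obj1 to the name of your object.

How it works

The Union command lists one of each entry of two source objects. Since obj1 is both source objects, this reduces obj1 to one of each entry.
ToList() returns a new List. This is necessary because LINQ commands like Union return their result as an IEnumerable instead of modifying the original List or returning a new List.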
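A small self-contained sketch of the explanation above, with the wrapper names UnionDedupe and Dedupe chosen only for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class UnionDedupe
{
    // Union yields each distinct element once, so the union of a list
    // with itself is that list without duplicates.
    public static List<T> Dedupe<T>(List<T> obj1)
    {
        return obj1.Union(obj1).ToList();
    }

    static void Main()
    {
        var obj1 = new List<int> { 1, 2, 2, 3, 1 };
        obj1 = Dedupe(obj1);
        Console.WriteLine(string.Join(", ", obj1)); // 1, 2, 3
    }
}
```

Union enumerates the first sequence before the second and yields only elements it has not yielded yet, so the first occurrence of each value keeps its position.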

Answer #13

As a helper method (without LINQ):

public static List<T> Distinct<T>(this List<T> list)
{
    return new List<T>(new HashSet<T>(list)); // List<T> constructor instead of the LINQ ToList()
}

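I think Distinct is already taken. Apart from that (if you rename the method) it should work.

– Andreas Reiff
Jan 5 '15 at 12:34

Answer #14

After installing the MoreLINQ package via NuGet, you can easily make a list of objects distinct by a property: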

IEnumerable<Catalogue> distinctCatalogues = catalogues.DistinctBy(c => c.CatalogueCode);

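Answer #15

If you don't care about the order, you can just shove the items into a HashSet. If you do want to maintain the order, you can do that with an explicit loop or the LINQ way.

Edit: the HashSet method takes O(N) time and O(N) space, while sorting and then removing adjacent duplicates takes O(N log N) time but can work in place, so it's not so clear to me (as it was at first glance) that the sorting way is inferior (my apologies for the temporary downvote...).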
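The code snippets of this answer are missing from this page. As one possible order-preserving variant (an assumption, not necessarily what the author wrote), the sketch below tracks seen items in a HashSet and compacts the list in place; the names OrderedDedupe and RemoveDuplicatesInPlace are invented for the example.

```csharp
using System;
using System.Collections.Generic;

static class OrderedDedupe
{
    // Keeps the first occurrence of every item, preserves the original order,
    // and modifies the list in place: O(N) time, O(N) extra space for the set.
    public static void RemoveDuplicatesInPlace<T>(List<T> list)
    {
        var seen = new HashSet<T>();
        int write = 0;
        for (int read = 0; read < list.Count; read++)
        {
            // Add returns false when the item is already in the set.
            if (seen.Add(list[read]))
                list[write++] = list[read];
        }
        // Everything from 'write' onwards is leftover; drop it in one call.
        list.RemoveRange(write, list.Count - write);
    }

    static void Main()
    {
        var items = new List<string> { "b", "a", "b", "c", "a" };
        RemoveDuplicatesInPlace(items);
        Console.WriteLine(string.Join(", ", items)); // b, a, c
    }
}
```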

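Answer #16

Here's an extension method for removing adjacent duplicates in situ. Call Sort() first and pass in the same IComparer. This should be more efficient than Lasse V. Karlsen's version, which calls RemoveAt repeatedly (resulting in multiple block memory moves).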

public static void RemoveAdjacentDuplicates<T>(this List<T> List, IComparer<T> Comparer)
{
    int NumUnique = 0;
    for (int i = 0; i < List.Count; i++)
        if ((i == 0) || (Comparer.Compare(List[NumUnique - 1], List[i]) != 0))
            List[NumUnique++] = List[i];
    List.RemoveRange(NumUnique, List.Count - NumUnique);
}

Answer #17

It might be easier to simply make sure that duplicates are not added to the list.

if (items.IndexOf(new_item) < 0)
    items.Add(new_item);

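I'm currently doing it like this, but the more entries you have, the longer the check for duplicates takes.

– Robert Strauch
Jun 24 '13 at 14:59

I have the same problem here. I was using the List<T>.Contains method each time, but with more than 1,000,000 entries this slowed down my application. I'm using List<T>.Distinct().ToList<T>() first instead.

– RPDeshaies
Jan 3 '14 at 19:05

This method is very slow.

– darkgaze
May 28 '19 at 14:22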
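One way to keep the "never add a duplicate" idea without the linear IndexOf or Contains scan the commenters complain about is to pair the list with a HashSet for membership checks. This is a sketch of that swapped-in technique; UniqueList is a hypothetical helper, not something from the thread or the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a list that silently rejects duplicates.
// The HashSet answers "have I seen this?" in roughly constant time,
// while the List keeps insertion order and index access.
class UniqueList<T>
{
    private readonly List<T> items = new List<T>();
    private readonly HashSet<T> seen = new HashSet<T>();

    // Returns true if the item was added, false if it was already present.
    public bool Add(T item)
    {
        if (!seen.Add(item))
            return false;
        items.Add(item);
        return true;
    }

    public int Count { get { return items.Count; } }

    public T this[int index] { get { return items[index]; } }
}

static class UniqueListDemo
{
    static void Main()
    {
        var list = new UniqueList<string>();
        Console.WriteLine(list.Add("x")); // True
        Console.WriteLine(list.Add("x")); // False
        Console.WriteLine(list.Count);    // 1
    }
}
```

The cost is the extra memory for the set, which is the same trade-off discussed elsewhere in this thread.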

Answer #18

You can use Union

obj2 = obj1.Union(obj1).ToList();

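Explaining why it works would definitely make this answer better.

– Igor B
Aug 6 '17 at 15:26

Answer #19

If you have two classes, Product and Customer, and you want to remove duplicate items from lists of them: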

public class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; }
}

public class Customer
{
    public int Id { get; set; }
    public string CustomerName { get; set; }

}

You have to define a generic class of the following form:

public class ItemEqualityComparer<T> : IEqualityComparer<T> where T : class
{
    private readonly PropertyInfo _propertyInfo;

    public ItemEqualityComparer(string keyItem)
    {
        _propertyInfo = typeof(T).GetProperty(keyItem, BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
    }

    public bool Equals(T x, T y)
    {
        var xValue = _propertyInfo?.GetValue(x, null);
        var yValue = _propertyInfo?.GetValue(y, null);
        return xValue != null && yValue != null && xValue.Equals(yValue);
    }

    public int GetHashCode(T obj)
    {
        var propertyValue = _propertyInfo?.GetValue(obj, null); // null-safe, consistent with Equals above
        return propertyValue == null ? 0 : propertyValue.GetHashCode();
    }
}

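You can then remove the duplicate items in your list. To remove duplicates by a different property, replace Id with the corresponding nameof(YourClass.DuplicateProperty); for example, nameof(Customer.CustomerName) removes duplicates by the CustomerName property.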
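The usage code of this answer is missing from this page. An IEqualityComparer<T> like the one above is typically passed to Distinct() or to the HashSet<T> constructor. To keep the example self-contained, the sketch below uses a different, delegate-based comparer instead of the reflection-based one; KeyEqualityComparer is a hypothetical name, and unlike the answer's class it treats two null keys as equal. It assumes the list holds no null items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; }
}

// Hypothetical comparer: equality is decided by a key picked with a delegate
// instead of a property name looked up through reflection.
public class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        _keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        return EqualityComparer<TKey>.Default.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        TKey key = _keySelector(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

static class KeyComparerDemo
{
    // Keeps the first product seen for each Id.
    public static List<Product> DistinctById(IEnumerable<Product> products)
    {
        return products.Distinct(new KeyEqualityComparer<Product, int>(p => p.Id)).ToList();
    }

    static void Main()
    {
        var products = new List<Product>
        {
            new Product { Id = 1, ProductName = "Pen" },
            new Product { Id = 1, ProductName = "Pen (copy)" },
            new Product { Id = 2, ProductName = "Ink" },
        };
        Console.WriteLine(DistinctById(products).Count); // 2
    }
}
```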

Answer #20

Another way, in .NET 2.0:

    static void Main(string[] args)
    {
        List<string> alpha = new List<string>();

        for(char a = 'a'; a <= 'd'; a++)
        {
            alpha.Add(a.ToString());
            alpha.Add(a.ToString());
        }

        Console.WriteLine("Data :");
        alpha.ForEach(delegate(string t) { Console.WriteLine(t); });

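        // Walk backwards so that removals do not shift the items still to be visited.
        // (Removing items inside List<T>.ForEach throws InvalidOperationException on current .NET runtimes.)
        for (int i = alpha.Count - 1; i >= 0; i--)
        {
            string v = alpha[i];
            if (alpha.FindAll(delegate(string t) { return t == v; }).Count > 1)
                alpha.RemoveAt(i);
        }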

        Console.WriteLine("Unique Result :");
        alpha.ForEach(delegate(string t) { Console.WriteLine(t);});
        Console.ReadKey();
    }

Answer #21

There are many ways to deal with duplicates in a list; below is one of them:

List<Container> containerList = LoadContainer(); // Assume it has duplicates
List<Container> filteredList = new List<Container>();
foreach (var container in containerList)
{
    // Look for an already-kept item with the same UniqueId.
    // Assume 'UniqueId' is the property of the Container class on which you are searching.
    Container duplicateContainer = filteredList.Find(delegate(Container checkContainer)
    { return (checkContainer.UniqueId == container.UniqueId); });

    if (duplicateContainer == null) // Add the object only when no match was found in the new list
    {
        filteredList.Add(container);
    }
}

Cheers

Answer #22

Here's a simple solution that doesn't require any hard-to-read LINQ or any prior sorting of the list.

   private static void CheckForDuplicateItems(List<string> items)
    {
        if (items == null ||
            items.Count == 0)
            return;

        for (int outerIndex = 0; outerIndex < items.Count; outerIndex++)
        {
            for (int innerIndex = 0; innerIndex < items.Count; innerIndex++)
            {
                if (innerIndex == outerIndex) continue;
                if (items[outerIndex].Equals(items[innerIndex]))
                {
                    // Duplicate Found
                }
            }
        }
    }

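This method gives you more control over the duplicates, even more so if you have a database to update. For innerIndex, why not start from outerIndex + 1 instead of starting from the beginning each time?

– Nolmë Informatique
Apr 22 '17 at 10:16

Answer #23

David J.'s answer is a good method, with no need for extra objects, sorting, etc. It can be improved on, however:

The outer loop goes top to bottom over the entire list, but the inner loop goes bottom up "until the outer loop position is reached". Duplicates can only occur in the part that the outer loop has not processed yet. Alternatively, the inner loop could start at outerIndex + 1.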
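The loop that this answer describes did not survive on this page. The following is a sketch of the described iteration order, assuming the goal is to remove the duplicates that are found (the earlier nested-loop answer only detects them); the names NestedLoopDedupe and RemoveDuplicates are invented for the example.

```csharp
using System;
using System.Collections.Generic;

static class NestedLoopDedupe
{
    // Outer loop: top to bottom over the whole list.
    // Inner loop: bottom up, stopping just above the outer position, because
    // everything at or before outerIndex has already been made unique.
    public static void RemoveDuplicates(List<string> items)
    {
        if (items == null)
            return;

        for (int outerIndex = 0; outerIndex < items.Count; outerIndex++)
        {
            // Scanning bottom-up means RemoveAt never shifts an item that
            // the inner loop still has to look at.
            for (int innerIndex = items.Count - 1; innerIndex > outerIndex; innerIndex--)
            {
                if (string.Equals(items[outerIndex], items[innerIndex]))
                    items.RemoveAt(innerIndex);
            }
        }
    }

    static void Main()
    {
        var items = new List<string> { "a", "b", "a", "c", "b" };
        RemoveDuplicates(items);
        Console.WriteLine(string.Join(", ", items)); // a, b, c
    }
}
```

This is still quadratic in the number of items, so it mainly suits short lists or cases where no extra memory may be used.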

Answer #24

A simple and intuitive implementation:

public static List<PointF> RemoveDuplicates(List<PointF> listPoints)
{
    List<PointF> result = new List<PointF>();

    for (int i = 0; i < listPoints.Count; i++)
    {
        if (!result.Contains(listPoints[i]))
            result.Add(listPoints[i]);
    }

    return result;
}

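This approach is slow as well. It creates a new list.

– darkgaze
May 28 '19 at 14:23

Answer #25

All the answers copy lists, or create a new list, or use slow functions, or are just painfully slow. (The approach here is backed by a very experienced programmer who specializes in real-time physics optimization.)

The final cost is O(n log n), which is pretty good.

A note about RemoveRange:
Since we cannot set the count of the list and we want to avoid the Remove functions, I don't know exactly how fast this operation is, but I guess it is the fastest way.

Answer #26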

  public static void RemoveDuplicates<T>(IList<T> list )
  {
     if (list == null)
     {
        return;
     }
     int i = 1;
     while(i<list.Count)
     {
        int j = 0;
        bool remove = false;
        while (j < i && !remove)
        {
           if (list[i].Equals(list[j]))
           {
              remove = true;
           }
           j++;
        }
        if (remove)
        {
           list.RemoveAt(i);
        }
        else
        {
           i++;
        }
     }  
  }

Answer #27

I think the simplest way is:
Create a new list and add the unique items to it.
Example:

class MyList
{
    public int id;
    public string date;
    public string email;
}

List<MyList> ml = new List<MyList>();

ml.Add(new MyList()
{
    id = 1,
    date = "2020/09/06",
    email = "zarezadeh@gmailcom"
});

ml.Add(new MyList()
{
    id = 2,
    date = "2020/09/01",
    email = "zarezadeh@gmailcom"
});

List<MyList> New_ml = new List<MyList>();

foreach (var item in ml)
{
    // Copy the item only if no kept item has the same email yet.
    if (New_ml.Where(w => w.email == item.email).SingleOrDefault() == null)
    {
        New_ml.Add(new MyList()
        {
            id = item.id,
            date = item.date,
            email = item.email
        });
    }
}
