Best algorithm to remove duplicate values ​​from a list

What is the best algorithm to remove duplicate values ​​from a list? I tried this:

for (int i = 0; i < AuthorCounter-1; i++)
{
    for (int j = 0; j < AuthorCounter-1; j++)
    {
        if (i != j)
        {
            if (AuthorGroupNode.Nodes[i].Text == AuthorGroupNode.Nodes[j].Text)
            {
                AuthorGroupNode.Nodes[j].Remove();
                AuthorCounter--;
            }

        }
    }
}

Here AuthorGroupNodesis a list of nodes. It did everything in order, but not perfect. Anyone has a better solution.

+5
source share
4 answers

Your current algorithm is O (N-square), which will perform rather poorly for a large list.

If space is not a problem, you can save the HashSet<int>hashes of the nodes. Move the list once. If the node hash is in a HashSet, you know that this is a duplicate of the node. Skip it. If the hash is not in the HashSet, add this node to the new list and add the hash from the node to the HashSet.

O (N) , HashSet. .

Linq,

var distinctList = originalList.Distinct().ToList();

UPDATE

, , Distinct.

public static IEnumerable<TSource> Distinct<TSource>( 
    this IEnumerable<TSource> source) 
{ 
    return source.Distinct(EqualityComparer<TSource>.Default); 
} 

public static IEnumerable<TSource> Distinct<TSource>( 
    this IEnumerable<TSource> source, 
    IEqualityComparer<TSource> comparer) 
{ 
    if (source == null)  
    { 
        throw new ArgumentNullException("source"); 
    } 
    return DistinctImpl(source, comparer ?? EqualityComparer<TSource>.Default); 
} 

private static IEnumerable<TSource> DistinctImpl<TSource>( 
    IEnumerable<TSource> source, 
    IEqualityComparer<TSource> comparer) 
{ 
    HashSet<TSource> seenElements = new HashSet<TSource>(comparer); 
    foreach (TSource item in source) 
    { 
        if (seenElements.Add(item)) 
        { 
            yield return item; 
        } 
    } 
}

https://codeblog.jonskeet.uk/2010/12/30/reimplementing-linq-to-objects-part-14-distinct/

+6

:

var xs = new []
{
    2, 3, 2, 4, 3, 3, 5, 6,
};

var ys = xs
    .ToLookup(z => z, z => z)
    .Select(x => x.First());

:

var nodes = AuthorGroupNode.Nodes
    .ToLookup(z => z.Text, z => z)
    .Select(x => x.First())
    .ToArray();

.: -)

+4

Piggy, Eric J., ... EqualityComparer, , .

class Program
{
    static void Main(string[] args)
    {
        var list = new List<SampleClass>();
        // add some items

        var distinctItems = list.Distinct(new SampleClass());
    }
}

public class SampleClass : EqualityComparer<SampleClass>
{
    public string Text { get; set; }

    public override bool Equals(SampleClass x, SampleClass y)
    {
        if (x == null || y == null) return false;
        return x.Text == y.Text;
    }

    public override int GetHashCode(SampleClass obj)
    {
        if (obj == null) return 0;
        if (obj.Text == null) return 0;
        return obj.Text.GetHashCode();
    }
}

: http://msdn.microsoft.com/en-us/library/bb338049

+3

, :

for (int j = 0; j < AuthorCounter; j++)

. , = 0 j = 1, , = 1 j = 0. j i. = 0, , , AuthorGroupNodes.Nodes[0] . , AuthorGroupNodes.Nodes[1] . j + 1 == j. , node, j node. node j, , , j j, node:

for (int j = i + 1; j < AuthorCounter;)
{
    if (AuthorGroupNode.Nodes[i].Text == AuthorGroupNode.Nodes[j].Text)
    {
        AuthorGroupNode.Nodes[j].Remove();
        AuthorCounter--;
    }
    else
    {
        j++;
    }
}

You say that this works, but not perfect, so I assume that you are not using a standard list and that your nodes handle their own removal from the list using the Remove () method.

If the list is sorted by the field you are comparing, you can completely remove the inner for loop and remove any duplicates of the current element until you find another:

for (int i = 0; i < AuthorCounter-1;)
{
    if (AuthorGroupNode.Nodes[i].Text == AuthorGroupNode.Nodes[i + 1].Text)
    {
        AuthorGroupNode.Nodes[i].Remove();
        AuthorCounter--;
    }
    else
    {
        i++;
    }
}
+2
source

All Articles