Best algorithm to remove duplicate values from a list

Question

Best algorithm to remove duplicate values from a list

What is the best algorithm to remove duplicate values from a list? I tried this:

for (int i = 0; i < AuthorCounter-1; i++)
{
    for (int j = 0; j < AuthorCounter-1; j++)
    {
        if (i != j)
        {
            if (AuthorGroupNode.Nodes[i].Text == AuthorGroupNode.Nodes[j].Text)
            {
                AuthorGroupNode.Nodes[j].Remove();
                AuthorCounter--;
            }

        }
    }
}

Here AuthorGroupNodesis a list of nodes. It did everything in order, but not perfect. Anyone has a better solution.

+5

list c # algorithm duplicate-removal

Itz.Irshad Jul 17 '12 at 4:22

source share

4 answers

Eric J. · Answer 1 · 2012-07-17T04:28:02+0000

Your current algorithm is O (N-square), which will perform rather poorly for a large list.

If space is not a problem, you can save the HashSet<int>hashes of the nodes. Move the list once. If the node hash is in a HashSet, you know that this is a duplicate of the node. Skip it. If the hash is not in the HashSet, add this node to the new list and add the hash from the node to the HashSet.

O (N) , HashSet. .

Linq,

var distinctList = originalList.Distinct().ToList();

UPDATE

, , Distinct.

public static IEnumerable<TSource> Distinct<TSource>( 
    this IEnumerable<TSource> source) 
{ 
    return source.Distinct(EqualityComparer<TSource>.Default); 
} 

public static IEnumerable<TSource> Distinct<TSource>( 
    this IEnumerable<TSource> source, 
    IEqualityComparer<TSource> comparer) 
{ 
    if (source == null)  
    { 
        throw new ArgumentNullException("source"); 
    } 
    return DistinctImpl(source, comparer ?? EqualityComparer<TSource>.Default); 
} 

private static IEnumerable<TSource> DistinctImpl<TSource>( 
    IEnumerable<TSource> source, 
    IEqualityComparer<TSource> comparer) 
{ 
    HashSet<TSource> seenElements = new HashSet<TSource>(comparer); 
    foreach (TSource item in source) 
    { 
        if (seenElements.Add(item)) 
        { 
            yield return item; 
        } 
    } 
}

https://codeblog.jonskeet.uk/2010/12/30/reimplementing-linq-to-objects-part-14-distinct/

Enigmativity · Answer 2 · 2012-07-17T04:52:01+0000

:

var xs = new []
{
    2, 3, 2, 4, 3, 3, 5, 6,
};

var ys = xs
    .ToLookup(z => z, z => z)
    .Select(x => x.First());

:

var nodes = AuthorGroupNode.Nodes
    .ToLookup(z => z.Text, z => z)
    .Select(x => x.First())
    .ToArray();

.: -)

Glenn Ferrie · Answer 3 · 2012-07-17T04:34:50+0000

Piggy, Eric J., ... EqualityComparer, , .

class Program
{
    static void Main(string[] args)
    {
        var list = new List<SampleClass>();
        // add some items

        var distinctItems = list.Distinct(new SampleClass());
    }
}

public class SampleClass : EqualityComparer<SampleClass>
{
    public string Text { get; set; }

    public override bool Equals(SampleClass x, SampleClass y)
    {
        if (x == null || y == null) return false;
        return x.Text == y.Text;
    }

    public override int GetHashCode(SampleClass obj)
    {
        if (obj == null) return 0;
        if (obj.Text == null) return 0;
        return obj.Text.GetHashCode();
    }
}

: http://msdn.microsoft.com/en-us/library/bb338049

Jason goemaat · Answer 4 · 2012-07-17T04:42:26+0000

, :

for (int j = 0; j < AuthorCounter; j++)

. , = 0 j = 1, , = 1 j = 0. j i. = 0, , , AuthorGroupNodes.Nodes[0] . , AuthorGroupNodes.Nodes[1] . j + 1 == j. , node, j node. node j, , , j j, node:

for (int j = i + 1; j < AuthorCounter;)
{
    if (AuthorGroupNode.Nodes[i].Text == AuthorGroupNode.Nodes[j].Text)
    {
        AuthorGroupNode.Nodes[j].Remove();
        AuthorCounter--;
    }
    else
    {
        j++;
    }
}

You say that this works, but not perfect, so I assume that you are not using a standard list and that your nodes handle their own removal from the list using the Remove () method.

If the list is sorted by the field you are comparing, you can completely remove the inner for loop and remove any duplicates of the current element until you find another:

for (int i = 0; i < AuthorCounter-1;)
{
    if (AuthorGroupNode.Nodes[i].Text == AuthorGroupNode.Nodes[i + 1].Text)
    {
        AuthorGroupNode.Nodes[i].Remove();
        AuthorCounter--;
    }
    else
    {
        i++;
    }
}

Best algorithm to remove duplicate values ​​from a list

More articles:

Best algorithm to remove duplicate values from a list