Easy language identification with LINQ

I am experimenting with LINQ for the first time and decided to try some basic human language identification. The input text is checked against a HashSet of the 10,000 most common words in each language and given a score.

My question is: is there a better way to write the LINQ query? Perhaps another form that I don't know about? It works, but I'm sure the experts here can provide a much cleaner solution!

public PolyAnalyzer() {
    Dictionaries = new Dictionary<string, AbstractDictionary>();
    Dictionaries.Add("Bulgarian", new BulgarianDictionary());
    Dictionaries.Add("English", new EnglishDictionary());
    Dictionaries.Add("German", new GermanDictionary());
    Dictionaries.Values.Select(n => new Thread(() => n.LoadDictionaryAsync())).ToList().ForEach(n => n.Start());            
}  

public string getResults(string text) {
    int total = 0;
    return string.Join(" ",
        Dictionaries.Select(n => new {
            Language = n.Key,
            Score = new Regex(@"\W+").Split(text).AsQueryable().Select(m => n.Value.getScore(m)).Sum()
        }).
        Select(n => { total += n.Score; return n; }).
        ToList().AsQueryable(). // Force immediate evaluation
        Select(n =>
        "[" + n.Score * 100 / total + "% " + n.Language + "]").
        ToArray());
}

P.S. I know this is a very simplified approach to language identification; I'm only interested in the LINQ side.

2 answers

I would reorganize it like this:

    public string GetResults(string text)
    {
        Regex wordRegex = new Regex(@"\W+");
        var scores = Dictionaries.Select(n => new
            {
                Language = n.Key,
                Score = wordRegex.Split(text)
                                 .Select(m => n.Value.getScore(m))
                                 .Sum()
            });

        int total = scores.Sum(n => n.Score);
        return string.Join(" ", scores.Select(n => "[" + n.Score * 100 / total + "% " + n.Language + "]"));
    }

A few points:

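  • AsQueryable() is unnecessary - this is all LINQ to Objects, and IEnumerable<T> is good enough.

  • ToList() is removed - it is not needed either, and dropping it avoids forcing evaluation early.

  • LINQ is nice, but splitting the work into separate statements (rather than one long chain) is easier to read (imo).

  • Avoid side effects - mutating total from inside a lambda is not how LINQ is meant to be used, and it is easy to get wrong. Compute the total with LINQ's Sum instead.

  • Create the Regex once, outside the LINQ query - in the original, a new Regex is constructed N times, once per dictionary.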
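
One thing to watch with the lazy version above: scores is a deferred query, so it runs once for Sum and again for the final Select, which means every word is scored twice. The integer division also throws when total is 0 (no word matched any dictionary). Below is a minimal, self-contained sketch of one way around both. It makes several assumptions that are not in the question: plain HashSet<string> word sets stand in for the AbstractDictionary classes, each known word scores 1, the input is lower-cased before lookup, and the LanguageScorer name and "[no match]" result are purely illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LanguageScorer
{
    // Built once and reused for every call.
    private static readonly Regex NonWord = new Regex(@"\W+");

    // Hypothetical stand-in for the question's dictionaries: language name -> known words.
    public static string GetResults(IDictionary<string, HashSet<string>> dictionaries, string text)
    {
        // Split can yield empty strings at the edges of the input, so filter them out.
        string[] words = NonWord.Split(text.ToLowerInvariant())
                                .Where(w => w.Length > 0)
                                .ToArray();

        // ToList() runs the scoring exactly once; both uses below read the cached results.
        var scores = dictionaries
            .Select(d => new { Language = d.Key, Score = words.Count(w => d.Value.Contains(w)) })
            .ToList();

        int total = scores.Sum(s => s.Score);
        if (total == 0)
            return "[no match]";   // assumption: avoids DivideByZeroException

        return string.Join(" ",
            scores.Select(s => "[" + s.Score * 100 / total + "% " + s.Language + "]"));
    }
}
```

With an English set of { "the", "cat", "is" } and a German set of { "die", "katze", "ist" }, the text "The cat is die" scores 3 and 1, so English gets 75% and German 25%. Whether materializing with ToList() is worth it depends on how expensive getScore is; for a HashSet lookup the double enumeration is cheap, but it is still redundant work.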


Another option is to restructure the code along these lines:

public PolyAnalyzer()
{
    Dictionaries = new Dictionary<string, AbstractDictionary>();
    Dictionaries.Add("Bulgarian", new BulgarianDictionary());
    Dictionaries.Add("English", new EnglishDictionary());
    Dictionaries.Add("German", new GermanDictionary());

    //Tip: Use the Parallel library to do multi-core, multi-threaded work.
    Parallel.ForEach(Dictionaries.Values, d =>
    {
        d.LoadDictionaryAsync();
    });            
}  

public Dictionary<string, int> GetResults(string text)
{
    //1) Split the words.
    //2) Calculate the score per dictionary (per language).
    //3) Return the scores.
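    // Split on runs of non-word characters (\W+); Regex.Split already returns string[].
    string[] words = new Regex(@"\W+").Split(text);
    // ToDictionary builds the language -> score map that the return type requires.
    // getScore is the method name used in the question.
    Dictionary<string, int> scores = this.Dictionaries.ToDictionary(
        d => d.Key,
        d => words.Sum(w => d.Value.getScore(w)));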

    return scores;
}
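
Returning the raw scores leaves the formatting to the caller. As a rough sketch, the percentage string from the question could be rebuilt from such a dictionary as follows. The ScoreFormatter class and ToPercentages method are hypothetical names, not part of either answer, and sorting by score and returning an empty string when nothing matched are assumptions of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScoreFormatter
{
    // Turns language -> score into "[75% English] [25% German]", highest score first.
    public static string ToPercentages(IDictionary<string, int> scores)
    {
        int total = scores.Values.Sum();
        if (total == 0)
            return string.Empty;   // assumption: nothing to report when no word matched

        return string.Join(" ", scores
            .OrderByDescending(s => s.Value)
            .Select(s => "[" + s.Value * 100 / total + "% " + s.Key + "]"));
    }
}
```

Keeping the scoring and the formatting apart like this also makes the scores easy to test or reuse, for example to pick only the top language.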