Counting individual words in a text file

Question

I am trying to count the number of times each word occurs in a text file. The text file is given as a command-line argument to the Perl program.

while($text = <>)
{
    @words = split (/\W*\s+\W*/, $text);
    @words = grep (/^[a-zA-Z\-]+$/, @words);
    foreach $word (@words)
    {
        $wordCount{$word}++;
    }
}

I do not have a clear understanding of these lines -

@words = split (/\W*\s+\W*/, $text);
@words = grep (/^[a-zA-Z\-]+$/, @words);

I know split is going to split the string into an array, but how? Is it splitting on non-word characters? I do not understand the regex used in the split function.

What does grep do? Its regular expression is also unclear to me.

PS: When I test this, the code seems to have a bug when the text file contains text like this:

fast brown fox jumps over a lazy dog dog. brown, purple fox jumps.

It counts the words fox and dog only once, which is wrong.

What is wrong here?

+3

regex perl

goldenmean 29 '11 23:00

3 Answers

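my $text = 'the quick brown fox jumps over the lazy dog dog.rose is brown, violet jumps the fox.';
my %wordCount;
# Match runs of letters; a hyphen is accepted only with a letter or hyphen on both sides.
# The group is non-capturing so that //g returns each whole match, not just its last piece.
for my $word ( $text =~ /(?:[a-zA-Z]+|-(?=[a-zA-Z\-])(?<=[a-zA-Z\-]-))+/g ) {
    ++$wordCount{$word};
}

# Print the counts, least frequent first, ties in alphabetical order.
for my $word ( sort { $wordCount{$a} <=> $wordCount{$b} || $a cmp $b } keys %wordCount ) {
    print "$word: $wordCount{$word}\n";
}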

+1

ysth 29 '11 23:28

\W matches non-word characters
\s matches whitespace

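With that pattern, a token such as dog.rose is never split, because it contains no whitespace.

\b matches a word boundary (the zero-width position between a word character and a non-word character). Splitting on it is simpler than splitting on \W*\s+\W*.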

while($text = <>)
{
    @words = split (/\b/, $text);
    foreach $word (@words)
    {
        next unless $word =~ /\w/;   # skip the whitespace and punctuation pieces
        $wordCount{$word}++;
    }
}
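One thing worth seeing concretely is that split /\b/ returns the runs of whitespace and punctuation between words as well as the words themselves, so those pieces have to be filtered out before counting. The following is only a sketch; the sample line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen to include punctuation next to words.
my $line = 'lazy dog dog. brown, fox';

# split /\b/ cuts at every word boundary, so the pieces alternate
# between words and the separators between them.
my @pieces = split /\b/, $line;

# Keep only the pieces that contain at least one word character.
my @words = grep { /\w/ } @pieces;

my %wordCount;
$wordCount{$_}++ for @words;

print scalar(@pieces), " pieces, ", scalar(@words), " words\n";   # 9 pieces, 5 words
print "$_: $wordCount{$_}\n" for sort keys %wordCount;
```

Here "dog" is counted twice, because the period ends up in a separate piece (". ") rather than staying attached to the word.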

0

cellcortex 29 '11 23:07

TLP · Accepted Answer · 2011-05-30T07:16:39+0000

while ($text = <>) {
    while ($text =~ /([A-Za-z\-]+)/g)  {
        my $word = lc($1);    # don't differentiate between 'Dog' and 'dog'
        $count++;             # total word count
        $wordCount{$word}++;  # individual word count
    }
}

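If you want other characters to count as part of a word, for example the underscore in this_file, add them to the character class: [A-Za-z\-_].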
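As a rough illustration of the wider character class [A-Za-z\-_], the sketch below counts this_file as a single word. The sample sentence is invented for the example; the loop follows the same match-and-lowercase pattern as the code above.

```perl
use strict;
use warnings;

# Hypothetical input line containing an underscore-joined name in two cases.
my $line = 'Open this_file, then THIS_FILE again.';

my %wordCount;
while ($line =~ /([A-Za-z\-_]+)/g) {
    $wordCount{ lc $1 }++;    # lowercase so 'THIS_FILE' and 'this_file' are one key
}

print "$_: $wordCount{$_}\n" for sort keys %wordCount;
# again: 1
# open: 1
# then: 1
# this_file: 2
```

Without the underscore in the class, the same line would be counted as "this" and "file" separately.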

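As for your questions:

\W*\s+\W* : the split pattern matches one or more whitespace characters, together with any non-word characters directly before or after them. The line is therefore cut at whitespace, and punctuation next to that whitespace is thrown away with it (for example, "dog, dog" is split into dog and dog).

grep filters a list and keeps only the elements for which the condition is true. Here it is applied to @words and keeps only the elements that consist entirely of letters and hyphens (from start to end); everything else is discarded by grep.

That is why "dog.rose" and "fox." are not counted. split only removes punctuation that sits next to whitespace: "dog.rose" contains no whitespace at all, and "fox." at the very end of the input has none after it (assuming there is no trailing newline). Both tokens keep their period and are then removed by grep.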
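To see which tokens are lost, the two statements from the question can be run on a single line and the rejected tokens printed. This is only a sketch: the sample text is the one used in ysth's answer, assumed here to have no trailing newline, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Sample text from ysth's answer, without a trailing newline (assumption).
my $text = 'the quick brown fox jumps over the lazy dog dog.rose is brown, violet jumps the fox.';

# Same split and grep patterns as in the question.
my @tokens  = split /\W*\s+\W*/, $text;
my @kept    = grep {  /^[a-zA-Z\-]+$/ } @tokens;
my @dropped = grep { !/^[a-zA-Z\-]+$/ } @tokens;

my %wordCount;
$wordCount{$_}++ for @kept;

print "dropped: @dropped\n";                      # dropped: dog.rose fox.
print "fox: $wordCount{fox}, dog: $wordCount{dog}\n";   # fox: 1, dog: 1
```

The comma in "brown, violet" disappears because it is next to a space, while the periods in "dog.rose" and "fox." survive the split and cause grep to reject those tokens.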
