List of all words in a text file with the number of occurrences?

Question

Suppose I have a file text.txt as shown below:

she likes cats, and he likes cats too.

I would like my result to look like this:

she 1
likes 2
cats 2
and 1
he 1
too 1

If it simplifies the script, it is fine to treat only spaces, commas and periods as word separators.

Is there a simple shell pipeline that could achieve this?


Jackwm Mar 14 '13 at 3:25

2 answers

With GNU awk, you can simply set the record separator (RS) to a regular expression, such as any sequence of non-alphabetic characters:

$ gawk -v RS='[^[:alpha:]]+' '{sum[$0]++} END{for (word in sum) print word,sum[word]}' file
she 1
likes 2
and 1
too 1
he 1
cats 2

This does not, however, solve the more general problem of deciding what counts as a "word".


Ed morton Mar 14 '13 at 21:00
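The caveat above about identifying "words" in general comes down to what you count as a word. As one sketch, assuming a word is any maximal run of letters and that case should be ignored, tr can do the splitting portably (no GNU awk needed). count_words is a hypothetical helper name, not taken from either answer. Under this assumption a contraction such as "don't" becomes two words, "don" and "t".

```shell
#!/bin/sh
# Hypothetical helper. Assumptions: a "word" is a maximal run of letters,
# and case is ignored ("She" and "she" count as the same word).
#   tr -cs:         turn every run of non-letters into a single newline
#   tr:             fold upper case to lower case
#   grep -v '^$':   drop the empty line produced by leading punctuation
#   sort | uniq -c: bring identical words together and count each run
count_words() {
  tr -cs '[:alpha:]' '\n' < "$1" |
    tr '[:upper:]' '[:lower:]' |
    grep -v '^$' |
    sort |
    uniq -c
}

# Sample input from the question.
printf 'she likes cats, and he likes cats too.\n' > text.txt
count_words text.txt
```

Replace '[:alpha:]' with '[:alnum:]' if digits should also count as part of a word.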

phs · Accepted Answer · 2013-03-14T03:28:51+0000

Here is a one-liner close to my heart:

sed 's|[,.]||g' text.txt | tr -s ' ' '\n' | sort | uniq -c

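sed deletes the punctuation characters (adjust the regular expression to taste), tr puts one word per line, sort brings identical words together, and uniq -c counts each run. Note that uniq -c prints the count before the word.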
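Neither answer prints exactly the format the question asks for: the gawk version prints "word count" but in unspecified for-in order, and the uniq -c version prints the count first. A minimal sketch with plain awk that keeps words in order of first appearance, using the question's own simplification (only spaces, commas and periods separate words); word_counts_in_order is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints "word count" pairs in the order each word
# first appears. Assumes, as the question allows, that only spaces,
# commas and periods separate words. Reads the named files, or stdin.
word_counts_in_order() {
  awk -F '[ ,.]+' '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "") continue                # empty field at a line edge
        if (!($i in count)) order[++n] = $i   # remember first appearance
        count[$i]++
      }
    }
    END { for (j = 1; j <= n; j++) print order[j], count[order[j]] }
  ' "$@"
}

# Sample input from the question.
printf 'she likes cats, and he likes cats too.\n' > text.txt
word_counts_in_order text.txt
```

Keeping first-appearance order in a separate array avoids a sort, so for the sample input the output matches the six lines shown in the question.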