Return a list of unique items with their counts

I have been given a .CSV file with hundreds of thousands of values, inconsistently formatted. The structure looks something like this:

A, B, C, D
E, F
G, H, I, J, K, L, M, N, O
P, Q, R, S

Etc.

All I need to do is: a) list the unique values and b) count the instances of each. I'm happy to do this in R, Excel, or any other tool you'd recommend.

I would normally use something like Google Docs' =UNIQUE and =COUNT functions, but the spreadsheet is too large to load there. Oddly enough, I haven't found exact equivalents in Excel.

Any help is appreciated.

3 answers

A simple approach in R:

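# Emulate your file
cat('A,B,C,D\nB,D\nA,A,F,Q,F\n', file='foo.csv')

# Read every comma-separated field into one character vector;
# strip.white = TRUE drops the spaces after commas seen in your data
x <- scan('foo.csv', what='', sep=',', strip.white=TRUE)
table(x)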
#x
#A B C D F Q 
#3 2 1 2 2 1
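If you want the result as a two-column list rather than a printed table (for example, to open it in Excel), the table can be converted to a data frame. A minimal sketch, assuming the spaces after commas shown in the question; the column names `value`/`count` and the output file name `counts.csv` are just examples:

```r
# Emulate the file, with the spaces after commas shown in the question
cat('A, B, C, D\nB, D\nA, A, F, Q, F\n', file = 'foo.csv')

x <- scan('foo.csv', what = '', sep = ',', strip.white = TRUE, quiet = TRUE)

# Turn the frequency table into a two-column data frame: value, count
counts <- as.data.frame(table(x), stringsAsFactors = FALSE)
names(counts) <- c('value', 'count')

# Most frequent values first; ties broken alphabetically
counts <- counts[order(-counts$count, counts$value), ]
rownames(counts) <- NULL

# Save the result, e.g. to open in Excel (file name is only an example)
write.csv(counts, 'counts.csv', row.names = FALSE)
counts
```

`table()` already does the counting; the extra steps only reshape and sort its output.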

Or as a Perl one-liner from the shell:

perl -F',\s*' -l -a -n -e '$count{$_}++ foreach @F; END {print "$_: $count{$_}" foreach sort keys %count}' foo.csv
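The Perl one-liner never loads the whole file at once: it reads one line at a time and keeps only a hash of counts. A rough sketch of the same streaming idea in R, assuming comma-separated values with optional surrounding spaces; `count_values_streaming` and `chunk_lines` are hypothetical names chosen for illustration:

```r
# Hypothetical helper: count comma-separated values chunk by chunk,
# so only the running tally (not the whole file) is held in memory.
count_values_streaming <- function(path, chunk_lines = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  counts <- integer(0)  # named integer vector: value -> count
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    # Split on commas and trim whitespace around each field
    vals <- trimws(unlist(strsplit(lines, ",", fixed = TRUE)))
    vals <- vals[vals != ""]
    tab <- table(vals)
    # Merge this chunk's counts into the running total
    keys <- union(names(counts), names(tab))
    merged <- integer(length(keys))
    names(merged) <- keys
    merged[names(counts)] <- counts
    merged[names(tab)] <- merged[names(tab)] + as.integer(tab)
    counts <- merged
  }
  # Sort by value, like `sort keys %count` in the Perl version
  counts[order(names(counts))]
}
```

For a few hundred thousand values, reading everything with `scan()` fits comfortably in memory, so this is mainly useful if the file grows much larger.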

I assume you know how to import data into R; something like read.csv should work. Without getting into the apply family of functions, you can write a simple loop that counts the occurrences of each unique value (here, random letters):

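set.seed(1)

# sample() draws each letter with equal probability
# (round(runif(...)) would make A and Z half as likely as the rest)
OBJ <- sample(LETTERS, 1000, replace=TRUE)
VALS <- unique(OBJ)
VALS
# Pre-allocate one counter per unique value
COUNTS <- integer(length(VALS))
for(i in seq_along(VALS)){
    # Number of elements equal to the i-th unique value
    COUNTS[i] <- sum(OBJ==VALS[i])
}

data.frame(VALS, COUNTS)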
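The loop can also be replaced by a single vectorized call. A sketch using base R's `match()` and `tabulate()`, which yields counts in the same first-appearance order as `VALS`; the name `result` is just for illustration:

```r
set.seed(1)
OBJ <- sample(LETTERS, 1000, replace = TRUE)

# Unique values in order of first appearance, as in the loop version
VALS <- unique(OBJ)

# match() maps each element of OBJ to its position in VALS;
# tabulate() counts how often each position occurs,
# so COUNTS[i] is the number of times VALS[i] appears
COUNTS <- tabulate(match(OBJ, VALS), nbins = length(VALS))

result <- data.frame(VALS, COUNTS)
result
```

This avoids comparing the whole vector once per unique value, which matters when there are many distinct values.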