I have some data that I'm trying to load into R. It is stored in .csv files, and I can view it in Excel and OpenOffice. (If you're curious, these are the 2011 poll results from the Canadian election data available here.)
The data is encoded in an unusual way. A typical line:
12002,Central Nova","Nova-Centre"," 1","River John",N,N,"",1,299,"Chisholm","","Matthew","Green Party","Parti Vert",N,N,11
There is a " at the end of Central Nova, but not at the beginning. So, in order to read the data, I suppressed quotes, which worked fine for the first few files, i.e.
test<-read.csv("pollresults_resultatsbureau11001.csv",header = TRUE,sep=",",fileEncoding="latin1",as.is=TRUE,quote="")
Now here is the problem: in another file (for example, pollresults_resultatsbureau12002.csv) there is a line of data like this:
12002,Central Nova","Nova-Centre"," 6-1","Pictou, Subd. A",N,N,"",0,168,"Parker","","David K.","NDP-New Democratic Party","NPD-Nouveau Parti democratique",N,N,28
Because I have to suppress quotes, the entry "Pictou, Subd. A" makes R split that field into two variables at the comma. The data can't be read because R wants to add a column partway through building the data frame.
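To make the problem easy to reproduce, here is a minimal sketch that uses only the two sample lines quoted above. The names sample_lines and n_fields are made up for illustration; count.fields() is the base R helper that reports how many fields each line splits into, called here with the same sep and quote settings as my read.csv() call.

```r
# The two sample lines from the files, copied verbatim.
# Single-quoted R strings, so the embedded double quotes need no escaping.
sample_lines <- c(
  '12002,Central Nova","Nova-Centre"," 1","River John",N,N,"",1,299,"Chisholm","","Matthew","Green Party","Parti Vert",N,N,11',
  '12002,Central Nova","Nova-Centre"," 6-1","Pictou, Subd. A",N,N,"",0,168,"Parker","","David K.","NDP-New Democratic Party","NPD-Nouveau Parti democratique",N,N,28'
)

# Count comma-separated fields per line with quoting disabled,
# mirroring the quote = "" setting used in the read.csv() call.
con <- textConnection(sample_lines)
n_fields <- count.fields(con, sep = ",", quote = "")
close(con)

print(n_fields)
```

The second line comes out with one more field than the first, because the comma inside "Pictou, Subd. A" is treated as a separator once quotes are ignored.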
Excel and OpenOffice can open these files without problems. Somehow, they know that quotation marks only matter if they appear at the beginning of a field.
Do you know which option I need to set in R to read this data in? I have more than 300 files to load (about 1000 lines each), so fixing them manually is not an option...
I have looked everywhere for a solution but cannot find one.