I have a text file (~10 GB) with the following format:
data1<TAB>data2<TAB>data3<TAB>data4<NEWLINE>
I want to scan it and process only data2. What is the best (fastest) way to extract data2 in C++?
EDIT: added the trailing NEWLINE to the format.
Read the file line by line. Split each line on the tab character. This leaves you with an array of fields, so you can work with the second one (data2).
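A minimal sketch of the line-by-line approach, assuming you only ever need the second field: instead of building a full array of fields, it locates the first two tabs with `std::string::find` and hands the text between them to a callback. `for_each_data2` is a hypothetical helper name, and `process` stands for whatever you do with each value.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for in-memory input such as tests
#include <string>

// Hypothetical helper: calls process(data2) for every line of `in`.
// Lines without a tab (no second field) are skipped; data3 and data4 are never copied.
template <class F>
void for_each_data2(std::istream& in, F process) {
    std::string line;
    while (std::getline(in, line)) {                  // one record per line
        std::size_t start = line.find('\t');          // end of data1
        if (start == std::string::npos) continue;     // no second field
        ++start;                                      // first character of data2
        std::size_t end = line.find('\t', start);     // end of data2 (npos if it is the last field)
        std::size_t len = (end == std::string::npos) ? std::string::npos : end - start;
        process(line.substr(start, len));
    }
}
```

Pass it a `std::ifstream` opened on the real file; because `std::getline` reuses the same `line` buffer, memory use stays bounded by the longest line rather than the 10 GB file.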
This sounds like a job for a higher-level tool, such as the shell utilities:
cut -f2              # from stdin
cut -f2 < my_file    # from file
Nevertheless, you can do this in C++ too:
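#include <fstream>
#include <iostream>
#include <string>

void process(const std::string& data2);  // your own handler, defined elsewhere

void parse(std::istream& in) {
    std::string word;
    // operator>> splits on any whitespace, so this assumes no field contains spaces
    while (in >> word) {              // data1 (thrown away)
        if (!(in >> word)) break;     // data2
        process(word);
        in >> word >> word;           // data3 and data4 (thrown away)
    }
}

// ...
parse(std::cin);
std::ifstream file("my_file");
parse(file);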
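Since the file is big (10gig), read it sequentially instead of loading it into memory, and use the '\t' delimiter to skip over the fields you don't need.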
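#include <fstream>
#include <limits>
#include <string>

int main() {
    std::ifstream fin("your_file.txt");
    const auto all = std::numeric_limits<std::streamsize>::max();
    std::string data2;
    // skip data1 (up to and including the first tab), then read data2 up to the next tab
    while (fin.ignore(all, '\t') && std::getline(fin, data2, '\t')) {
        // do stuff with data2
        // skip data3, data4 and the newline
        fin.ignore(all, '\n');
    }
}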
Alternatively, read the file line by line and split each line with strtok().
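A minimal sketch of the strtok() approach, assuming every line fits in a 4096-byte buffer (longer lines would be split across two reads). `for_each_data2_strtok` is a hypothetical name. Keep in mind that strtok() modifies the buffer, is not thread-safe, and treats consecutive tabs as a single delimiter, so an empty data1 would shift the fields.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: calls process(data2) for each line read from fp.
// Assumes lines are shorter than the 4096-byte buffer and data1 is never empty.
template <class F>
void for_each_data2_strtok(std::FILE* fp, F process) {
    char line[4096];
    while (std::fgets(line, sizeof line, fp)) {
        char* data1 = std::strtok(line, "\t\n");     // first field, ignored
        if (!data1) continue;                        // blank line
        char* data2 = std::strtok(nullptr, "\t\n");  // second field
        if (data2) process(data2);
    }
}
```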
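If you need the very fastest solution and you are on Linux: since 2.6, Linux supports asynchronous I/O (AIO). Start a read with aio_read, wait for it with aio_suspend, and parse one buffer while the next one is being read. Work directly on the char * buffer; copying every field into a std::string (which allocates memory) slows things down.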
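A sketch of the AIO idea with double buffering, assuming the POSIX AIO interface from `<aio.h>` (on glibc older than 2.34 you need to link with `-lrt`). `aio_for_each_data2` is a hypothetical name and the 1 MiB chunk size is an arbitrary choice. It walks the raw char buffer with a small tab/newline state machine, so a line split across two chunks is handled; the one concession is a single reused `std::string` that carries data2 across a chunk boundary.

```cpp
#include <aio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` with POSIX AIO using two buffers, so the
// next chunk is being read while the current one is parsed, and calls
// process(data2) for each line. Returns false on an I/O error.
template <class F>
bool aio_for_each_data2(const char* path, F process) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    const std::size_t kChunk = 1 << 20;  // 1 MiB per read (arbitrary choice)
    std::vector<char> bufs[2] = {std::vector<char>(kChunk), std::vector<char>(kChunk)};
    aiocb cb[2];

    // Queue an asynchronous read of one chunk into buffer i.
    auto start_read = [&](int i, off_t offset) {
        std::memset(&cb[i], 0, sizeof cb[i]);
        cb[i].aio_fildes = fd;
        cb[i].aio_buf = bufs[i].data();
        cb[i].aio_nbytes = bufs[i].size();
        cb[i].aio_offset = offset;
        return aio_read(&cb[i]) == 0;
    };

    // Tab/newline state machine; its state survives between chunks.
    int field = 0;      // index of the current field within the line
    std::string data2;  // reused, so its capacity is kept between lines
    auto feed = [&](const char* p, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            char c = p[i];
            if (c == '\t' || c == '\n') {
                if (field == 1) { process(data2); data2.clear(); }
                field = (c == '\n') ? 0 : field + 1;
            } else if (field == 1) {
                data2.push_back(c);
            }
        }
    };

    bool ok = start_read(0, 0);
    int cur = 0;
    while (ok) {
        // Block until the current read has finished.
        const aiocb* wait_list[1] = {&cb[cur]};
        while (aio_error(&cb[cur]) == EINPROGRESS) aio_suspend(wait_list, 1, nullptr);
        if (aio_error(&cb[cur]) != 0) { ok = false; break; }
        ssize_t n = aio_return(&cb[cur]);
        if (n <= 0) break;  // 0 bytes read: end of file

        // Queue the next chunk before parsing this one, so I/O and parsing overlap.
        ok = start_read(1 - cur, cb[cur].aio_offset + n);
        feed(bufs[cur].data(), static_cast<std::size_t>(n));
        cur = 1 - cur;
    }
    if (ok && field == 1) process(data2);  // last line ended inside data2
    close(fd);
    return ok;
}
```

Whether this actually beats a plain large-buffer `read()` loop depends on the system; it is worth measuring on the real 10 GB file before committing to it.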
If iostream turns out to be too slow, you can use fscanf instead:
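#include <stdio.h>
...
FILE* fp = fopen(path_to_file, "r");
char data[256];
/* %*s skips a field; %255s reads data2 without overflowing the buffer.
   A blank in the format matches any whitespace, including the tab,
   so this assumes no field contains spaces. */
while (fscanf(fp, "%*s %255s %*s %*s", data) == 1) {
    /* do what you want with your data */
}
fclose(fp);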