Parsing Tab-Separated Data

I have a text file (~10 GB) with the following format:

data1<TAB>data2<TAB>data3<TAB>data4<NEWLINE>

I want to scan it and process only data2. What is the best (fastest) way to extract data2 in C++?

EDIT: added the NEWLINE

+3
6 answers

Read the file line by line. Split each line on the tab character. This leaves you with an array containing the fields, so you can work with the second field (data2).
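A minimal sketch of this approach, assuming no field contains an embedded tab. The names `second_field` and `for_each_data2` are hypothetical, not from the question. Rather than building the whole array, it locates only the first two tabs, which is all that is needed to get data2:

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the second tab-separated field of a line,
// or an empty string if the line contains no tab at all.
std::string second_field(const std::string& line)
{
    const std::string::size_type first = line.find('\t');
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type second = line.find('\t', first + 1);
    // If there is no second tab, take everything up to the end of the line.
    const std::string::size_type count =
        (second == std::string::npos) ? std::string::npos : second - first - 1;
    return line.substr(first + 1, count);
}

// Hypothetical driver: reads `in` line by line, writes each data2 value to
// `out` (a stand-in for real processing), and returns the number of lines read.
std::size_t for_each_data2(std::istream& in, std::ostream& out)
{
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        out << second_field(line) << '\n';
        ++count;
    }
    return count;
}
```

To use it on a file, open a `std::ifstream` and pass it as `in`. Reusing one `std::string` for the line keeps allocations low, which matters on a file of this size.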

+4

This sounds like a job for a higher level tool, such as shell utilities:

cut -f2           # from stdin
cut -f2 <my_file  # from file

Nevertheless, you can do this with C++ too:

#include <istream>
#include <string>

void parse(std::istream& in)
{
    std::string skip, data2;
    // operator>> splits on any whitespace, so this assumes no field contains spaces
    while( in >> skip >> data2 ) {  // throwaway 1, then data2
        process(data2);
        in >> skip >> skip;  // throwaway 3 and 4
    }
}

// ...
parse(std::cin);
std::ifstream file("my_file");
parse(file);
+2

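Since the file is large (10 GB), do not read whole lines: skip ahead to the first '\t', read data2, then skip the rest of the line.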

#include <fstream>
#include <limits>
#include <string>

int main(){
  std::ifstream fin("your_file.txt");
  const std::streamsize max = std::numeric_limits<std::streamsize>::max();
  std::string data2;

  // skip to first tab, then read data2 up to the next tab
  while(fin.ignore(max,'\t') && std::getline(fin, data2, '\t')){
    // do stuff with data2

    // skip to next line
    fin.ignore(max,'\n');
  }
}
+1

Another option is to tokenize each line with strtok().

+1

On Linux (kernel 2.6 and later) you can use asynchronous I/O (AIO): issue reads with aio_read and wait for them with aio_suspend. Parse the buffer directly as char * rather than copying the fields into std::string.
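The point about working on a raw char * buffer can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: `collect_data2` is a hypothetical helper, the buffer is assumed to hold only complete lines (a caller reading in chunks has to carry a partial last line over to the next chunk), and how the buffer gets filled (aio_read, fread, or anything else) is left out. It uses `std::memchr` to find the delimiters without building a `std::string` per line:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: scans a buffer of complete lines and appends the
// second tab-separated field of each line to `out`.
void collect_data2(const char* buf, std::size_t len, std::vector<std::string>& out)
{
    const char* p = buf;
    const char* const end = buf + len;
    while (p < end) {
        // Find the end of the current line (or the end of the buffer).
        const char* eol = static_cast<const char*>(std::memchr(p, '\n', end - p));
        if (!eol) eol = end;

        // The first tab ends data1; data2 starts right after it.
        const char* t1 = static_cast<const char*>(std::memchr(p, '\t', eol - p));
        if (t1) {
            const char* start = t1 + 1;
            const char* t2 = static_cast<const char*>(std::memchr(start, '\t', eol - start));
            if (!t2) t2 = eol;  // data2 is the last field on this line
            // The only copy happens here; real code could process
            // the range [start, t2) in place instead.
            out.emplace_back(start, t2);
        }
        p = (eol == end) ? end : eol + 1;
    }
}
```

Lines without any tab are skipped. Storing the fields in a vector is only for demonstration; on a 10 GB input you would process each range as it is found.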

+1

Instead of iostream, you can use fscanf. For example:

#include <stdio.h>

...

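FILE* fp = fopen(path_to_file, "r");
char data[256];

/* skip field 1 and its tab, read field 2 (at most 255 chars), skip the rest of the line */
while(fscanf(fp, "%*[^\t]%*c%255[^\t\n]%*[^\n]", data) == 1)
{
   /* do what you want with your data */
}
fclose(fp);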
0
