Why is string-to-number conversion in C++ so slow?

This function reads an array of doubles from a string:

vector<double> parseVals(string& str) {
    stringstream ss(str);
    vector<double> vals;
    double val;
    while (ss >> val) vals.push_back(val);
    return vals;
}

When called with a string containing 1 million numbers, the function takes 7.8 seconds to run (Core i5, 3.3 GHz). That works out to roughly 25,000 CPU cycles (7.8 s × 3.3 GHz / 1,000,000 numbers) spent parsing ONE NUMBER.

user315052 pointed out that the same code runs an order of magnitude faster on their system, and further testing showed very large performance differences between systems and compilers (see also user315052's answer):

1. Win7, Visual Studio 2012RC or Intel C++ 2013 beta: 7.8  sec
2. Win7, mingw / g++ 4.5.2                          : 4    sec
3. Win7, Visual Studio 2010                         : 0.94 sec
4. Ubuntu 12.04, g++ 4.7                            : 0.65 sec

I found a great alternative in the Boost.Spirit library. The code is safe, concise, and extremely fast (0.06 seconds on VC2012, 130x faster than stringstream).

#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

vector<double> parseVals4(string& str) {
    vector<double> vals;
    qi::phrase_parse(str.begin(), str.end(),
        *qi::double_ >> qi::eoi, ascii::space, vals);
    return vals;
}

PS: All builds used -O2 optimization.

+5
5

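I tested on a Linux VM running on a 1.6 GHz i7. The measurements and the exact code I used follow.

My Linux compiler is g++ 4.6.3, with -O3. I have no MS or Intel compiler, so I also used cygwin g++ 4.5.3, with -O3. Windows 7 is 64-bit, as is the Linux VM; cygwin runs in 32-bit mode. On Linux, I got: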

elapsed: 0.46 stringstream
elapsed: 0.11 strtod

With cygwin, I got:

elapsed: 1.685 stringstream
elapsed: 0.171 strtod

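I suspect the difference between cygwin and Linux comes from the MS library dependencies. Note that cygwin runs on the host machine of the Linux VM.

This is the routine I timed using istringstream: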

std::vector<double> parseVals (std::string &s) {
    std::istringstream ss(s);
    std::vector<double> vals;
    vals.reserve(1000000);
    double val;
    while (ss >> val) vals.push_back(val);
    return vals;
}

This is the routine I timed using strtod:

std::vector<double> parseVals2 (char *s) {
    char *p = 0;
    std::vector<double> vals;
    vals.reserve(1000000);
    do {
        double val = strtod(s, &p);
        if (s == p) break;
        vals.push_back(val);
        s = p+1;
    } while (*p);
    return vals;
}

This is the routine that builds the input string:

std::string one_million_doubles () {
    std::ostringstream oss;
    double x = RAND_MAX/(1.0 + rand()) + rand();
    oss << x;
    for (int i = 1; i < 1000000; ++i) {
        x = RAND_MAX/(1.0 + rand()) + rand();
        oss << " " << x;
    }
    return oss.str();
}

This is the timing routine:

template <typename PARSE, typename S>
void time_parse (PARSE p, S s, const char *m) {
    struct tms start;
    struct tms finish;
    long ticks_per_second;
    std::vector<double> vals_vec;

    times(&start);
    vals_vec = p(s);
    times(&finish);
    assert(vals_vec.size() == 1000000);
    ticks_per_second = sysconf(_SC_CLK_TCK);
    std::cout << "elapsed: "
              << ((finish.tms_utime - start.tms_utime
                   + finish.tms_stime - start.tms_stime)
                  / (1.0 * ticks_per_second))
              << " " << m << std::endl;
}

main:

int main ()
{
    std::string vals_str;

    vals_str = one_million_doubles();
    std::vector<char> s(vals_str.begin(), vals_str.end());
    s.push_back('\0');  // strtod requires a null-terminated buffer

    time_parse(parseVals, vals_str, "stringstream");
    time_parse(parseVals2, &s[0], "strtod");
}
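
The time_parse routine above relies on times() and sysconf(), which are POSIX interfaces and may not build with Visual Studio. As a rough sketch of a portable alternative, the helper below uses std::chrono from C++11. The name elapsed_seconds is hypothetical and not part of the original answer, and it measures wall-clock time rather than the user plus system CPU time that time_parse reports, so the two numbers are not directly comparable.

```cpp
#include <chrono>

// Hypothetical helper (not from the original answer): runs `work` once and
// returns the elapsed wall-clock time in seconds.
template <typename Work>
double elapsed_seconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto finish = std::chrono::steady_clock::now();
    // duration<double> converts the tick count to fractional seconds
    return std::chrono::duration<double>(finish - start).count();
}
```

A call could look like `double t = elapsed_seconds([&] { vals = parseVals(vals_str); });`, assuming a `vals` vector declared beforehand.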

+5

std::stringstream carries a lot of overhead. If you need speed, #include <cstdlib> and use std::strtod().
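
A minimal sketch of how std::strtod() might be used for this task, assuming the numbers are separated by whitespace. The function name parse_doubles_strtod is hypothetical. Unlike the parseVals2 routine shown earlier, it takes a std::string and does not assume exactly one separator character between numbers.

```cpp
#include <cstdlib>   // std::strtod
#include <string>
#include <vector>

// Hypothetical helper (not from the original answers): parses every
// whitespace-separated double in `text` using std::strtod.
std::vector<double> parse_doubles_strtod(const std::string& text) {
    std::vector<double> vals;
    const char* cur = text.c_str();   // c_str() is always null-terminated
    char* end = nullptr;
    for (;;) {
        const double val = std::strtod(cur, &end);
        if (end == cur) break;        // nothing converted: end of input or bad token
        vals.push_back(val);
        cur = end;                    // strtod skips leading whitespace on the next call
    }
    return vals;
}
```

Parsing stops silently at the first token that is not a number, so a caller that needs strict validation would have to check whether the whole string was consumed.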

+2

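Converting a string to a double is inherently slow: the Core i5 has no native instruction for it.

The CPU can convert natively between short, float, int, and so on, but turning a string into a double has to be done in software.

The parser also has to accept many input forms, such as -.0, INF, 4E6, and -NAN, and still produce a correct double.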
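
To illustrate the variety of inputs listed above, here is a small sketch that feeds single tokens to std::strtod and checks that the whole token was consumed. The helper name parse_whole_token is hypothetical, and the behavior for INF and NAN assumes a C99-conforming strtod in the default "C" locale.

```cpp
#include <cmath>     // std::isinf, std::isnan, std::signbit
#include <cstdlib>   // std::strtod

// Hypothetical helper: converts one token to a double and reports whether
// the entire token was a valid number.
bool parse_whole_token(const char* token, double& out) {
    char* end = nullptr;
    out = std::strtod(token, &end);
    // Success means something was converted and nothing was left over.
    return end != token && *end == '\0';
}
```

The results for tokens such as "INF" or "-NAN" can then be inspected with std::isinf, std::isnan, and std::signbit from <cmath>.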

+1

stringstream is known to be slow.

However, we do not have enough information to answer your question. Are you compiling with full optimizations enabled? Is your function inlined, or is there a function call on every invocation?

As a suggestion for speeding things up, consider boost::lexical_cast<double>(str).

0