Suppose I want to sum a single variable (call it var_1) in a very large dataset (approaching terabytes in size). The dataset is both long and wide. My code would look like this:
PROC MEANS DATA=my_big_dataset SUM;
VAR var_1;
RUN;
Would I get any performance boost by using the KEEP= option on the input dataset? I.e.:
PROC MEANS DATA=my_big_dataset (KEEP=var_1) SUM;
VAR var_1;
RUN;
In terms of disk I/O, I believe every record has to be read in its entirety no matter what, but perhaps less memory is needed to hold the records once they are read. Any advice is appreciated.
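One way I could check this empirically is to turn on the FULLSTIMER system option, run both versions, and compare the real time, CPU time, and memory figures that SAS writes to the log. This is only a minimal sketch: the dataset and variable names are the same placeholders as above, and on a dataset this size the second run may benefit from operating system file caching, so the comparison may not be entirely fair.

```sas
/* Write detailed resource statistics (real time, CPU time, memory) to the log */
OPTIONS FULLSTIMER;

/* Run 1: no KEEP= option on the input dataset */
PROC MEANS DATA=my_big_dataset SUM;
VAR var_1;
RUN;

/* Run 2: KEEP= restricts the variables passed to the procedure */
PROC MEANS DATA=my_big_dataset (KEEP=var_1) SUM;
VAR var_1;
RUN;

/* Restore the default, less verbose log statistics */
OPTIONS NOFULLSTIMER;
```

Comparing the two log notes for the PROC MEANS steps should show whether KEEP= changes elapsed time or memory use on my system.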