I'm trying to remove outliers from a tick data series, following Brownlees and Gallo (2006), in case you are interested.
The code works fine, but I work with really long vectors (most have about 20 million observations, and after 20 hours the run had still not finished), so I was wondering how to speed it up.
What I have done so far:
I changed the time and date format to a numeric double, which saves quite a lot of processing time and a lot of memory.
I preallocated memory for the vectors:
n = size(price,1); %number of observations
x = price;
score = nan(n,1,'double'); %using tic and toc I saw that nan requires less time than zeros
trimmed_mean = nan(n,1,'double');
sd = nan(n,1,'double');
out_mat = nan(n,1,'double');
Here is the loop I would like to remove. I have read that vectorization speeds things up, especially with long vectors.
for i = k+1:n-k %the first and last 'k' observations do not have a full window
neighbours = x([i-k:i-1, i+1:i+k]); %the 'k' observations on each side of 'i' (i is excluded)
trimmed_mean(i) = trimmean(neighbours,10,'round'); %trimmed mean computed on the neighbours of 'i'
score(i) = x(i) - trimmed_mean(i);
sd(i) = std(neighbours); %same window as the mean
tmp = abs(score(i)) > (alpha .* sd(i) + gamma);
out_mat(i) = tmp*1;
end
Here is what I tried:
trimmed_mean=trimmean(regroup_matrix,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(regroup_matrix,0,2); %std takes the weight flag second and the dimension third
temp = abs(score) > (alpha .* sd + gamma);
out_mat = temp*1;
The part I cannot work out in Matlab is how to build regroup_matrix. So far I have only preallocated it: regroup_matrix = nan(n,2*k).
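EDIT: To clarify what I am after:
"x" is an (n, 1) vector. For each element "i" of "x" I want to collect the "k" observations on either side of "i" (from i-k to i-1 and from i+1 to i+k) into one row of an (n, 2*k) matrix.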
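The (n, 2*k) matrix described above could be built with index arithmetic instead of a loop. The following is only a minimal sketch on toy data, not a tested solution for the real series: the values of x and k are made up, and the names offsets, idx and valid are hypothetical helpers. Rows too close to either end keep NaN where a neighbour does not exist.

```matlab
% Toy data (hypothetical values, only to illustrate the indexing)
x = (1:10)';
k = 2;
n = numel(x);

% Offsets of the k neighbours on each side of i, with i itself excluded
offsets = [-k:-1, 1:k];                  % 1-by-2k

% Row i of idx holds the positions i-k..i-1 and i+1..i+k
idx = bsxfun(@plus, (1:n)', offsets);    % n-by-2k

% Positions that fall outside 1..n have no neighbour
valid = idx >= 1 & idx <= n;
idx(~valid) = 1;                         % placeholder so the indexing is legal

% Indexing a vector with a matrix returns a matrix of the same shape
regroup_matrix = x(idx);                 % n-by-2k
regroup_matrix(~valid) = NaN;            % mark the missing neighbours
```

Note that for a series of around 20 million observations this matrix has 2*k times as many elements as x, so it may not fit in memory in one piece.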
EDIT 2: Here is my latest attempt, based on a related question about for loops in Matlab:
n = size(price,1);
x = price;
[j1]=find(x); %indices of the observations
matrix_left=zeros(n, k,'double');
matrix_right=zeros(n, k,'double');
matrix_left(j1(k+1:end),:)=x(j1-k:j1-1); %meant to hold the k observations before each i
matrix_right(j1(1:end-k),:)=x(j1+1:j1+k); %meant to hold the k observations after each i
matrix_group=[matrix_left matrix_right];
trimmed_mean=trimmean(matrix_group,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(matrix_group,0,2); %std takes the weight flag second and the dimension third
temp = abs(score) > (alpha .* sd + gamma);
out_mat = temp*1;
The problem is in the lines that fill matrix_left and matrix_right.
j1 is just the vector of indices, i.e. j1=[1:1:n], a double of size (n, 1).
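One possible way around both the indexing problem and the memory cost is to build the neighbour windows block by block, so that only a slice of the (n, 2*k) matrix exists at any time. This is a rough sketch under stated assumptions, not a benchmarked solution: the toy series, the values of k, alpha and gamma, and the block size are all made up, and rows, offsets and nb are hypothetical names. It uses trimmean with the same arguments as the code above.

```matlab
% Toy series with one artificial spike (hypothetical data)
rng(0);
x = 100 + cumsum(0.01 * randn(1000, 1));
x(500) = x(500) + 5;

k = 10; alpha = 3; gamma = 0.02;   % assumed parameter values
block = 200;                       % assumed block size, tune to available memory

n = numel(x);
out_mat = nan(n, 1);               % first and last k entries stay NaN
offsets = [-k:-1, 1:k];            % neighbours of i, with i excluded

for s = k+1:block:n-k
    rows = (s:min(s + block - 1, n - k))';   % observations in this block
    idx = bsxfun(@plus, rows, offsets);      % numel(rows)-by-2k positions
    nb = reshape(x(idx), size(idx));         % neighbour values, one row per observation
    tm = trimmean(nb, 10, 'round', 2);       % trimmed mean along each row
    sd = std(nb, 0, 2);                      % standard deviation along each row
    out_mat(rows) = abs(x(rows) - tm) > (alpha .* sd + gamma);
end
```

The loop now runs once per block rather than once per observation, and the block size trades memory for the number of iterations.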