I don't think your two loops are equivalent. In the CPU implementation you seem to be collecting each element into an array, but with arrayfun you are doing some kind of count instead.
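Here is a minimal sketch of what "equivalent" means, written in Python rather than MATLAB. It assumes the comparison is an element-wise operation such as squaring each element. `arrayfun` applies a function to each element and returns an array of the results, so the loop you time against it should build that same array, not a count or some other reduction. The function names and the `> 10` threshold in the counting version are hypothetical. They only illustrate a loop that computes something different.

```python
# Minimal sketch (assumption): arrayfun applies a function to every element
# independently and returns an array of results. A fair loop-based comparison
# must produce that same array, not a count or other reduction.

def square(x):
    """Per-element function, standing in for the body passed to arrayfun."""
    return x * x

def loop_version(values):
    """Explicit loop that collects one result per element."""
    out = []
    for v in values:
        out.append(square(v))
    return out

def map_version(values):
    """Element-wise map, analogous in spirit to arrayfun(@square, values)."""
    return list(map(square, values))

def count_version(values):
    """Hypothetical NON-equivalent loop: it reduces everything to one number."""
    count = 0
    for v in values:
        if square(v) > 10:  # hypothetical threshold, for illustration only
            count += 1
    return count

data = [1, 2, 3, 4, 5]
print(loop_version(data))   # [1, 4, 9, 16, 25]
print(map_version(data))    # [1, 4, 9, 16, 25]
print(count_version(data))  # 2  (only 16 and 25 exceed 10)
```

The first two functions return identical arrays and can be timed against each other. The third returns a single number, so comparing its speed with the others says nothing about loops versus `arrayfun`.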
That said, I think the explanation you are looking for is the following:
GPU - - . , i [cell_i]^2 . , S-, S - , GPU. . , .
, , : * array_fun * , , . , , , . , .