How to detect sections (clusters) of sparse data in linear time and (hopefully) sublinear space?

Say I have m integers from n disjoint integer intervals that are, in a sense, “far apart”. n is not known beforehand, but it is known as small (see assumptions below).

For example, for n = 3, I could get random distributed integers from the intervals 105-2400, 58030-571290, 1000000-1000100.

Finding the minimum (105) and maximum (1000100) is clearly trivial.
But is there a way to efficiently (in O (m) time and, hopefully, o (m) space) find the boundary points of the intervals so that I can quickly split the data for separate processing?

If there is no effective way to do this accurately, is there an effective way to approximate sections up to a small constant factor (e.g. 2)?
(For example, 4000 would be an acceptable approximation of the upper bound of a smaller interval, and 30000 would be an acceptable approximation of the lower bound of the middle interval.)

Assumptions:

  • All non-negative

  • n is very small (say, <10)

  • The maximum value is relatively large (for example, of the order of 2 26 )

  • Intervals are dense (i.e. there is an integer in the array for most values ​​within this interval)

  • , . ( :. , c , . , 1 1000000 500000-2000000.)

  • , . O (m) radix, radix O ( ), , - m.

, ; , .

+3
4

, , , - , , radix.

, . 2, .

, radix O (m log ( )) ≈ O (m) time ( log ( ) 26 ), .

0

, c , , , , . , (0.5X .. 0.5X+0.5) , X - .

, c 2, 2 26 52 , floor(2*log<sub>2</sub>i), i - , . , m , , , .

, , c, aka 128, 181, 256, 363, 512 .. .

. , . , :

  • . .
  • , . . .
  • .
  • . .

: ( )

counters=[];
lowest=[];
highest=[];
for (i=0;i<m;i++) { 
    x=getNextInteger();
    n=Math.floor(2.0*logByBase(c,x));
    counters[n]++;
    if (counters[n]==1) {
        lowest[n]=x;
        highest[n]=x;
    } else {
        if (lowest[n]>x) lowest[n]=x;
        if (highest[n]<x) highest[n]=x;
    }
}
zeroflag=true; /// are we in mode of finding a zero or a nonzero
intervals=[];
currentLow=0;
currentHigh=0;
for (i=0;i<counters.length;i++) {
    if (zeroflag) {
        // we search for nonzero
        if (counters[i]>0) {
            currentLow=lowest[i]; // there a value
            zeroflag=false;
        } // else skip
    } else {
        if (counters[i]==0) {
            currentHigh=highest[i-1]; // previous was nonzero, get its highest
            intervals.push([currentLow,currentHigh]); // store interval
            zeroflag=true;
    }
}
if (!zeroflag) { // unfinished interval 
    currentHigh=highest[counters.length-1];
    intervals.push([currentLow,currentHigh]); // store last interval
}
+4

.

; .

, .

  • 10000 .

  • .

  • , .

, . - .

+1

. X_i , , X_i=16*i. , X_i=4*i*i X_i=2^(i/16), , . , .
, . . , , . , , 16, 15. 2 26 - 16 , 2 19 512 .

0

All Articles