Combining Bloom filters

I use Bloom filters to check for duplicate data in a set. However, I need to combine the filters of two data sets into one so I can check for duplicates across both sets. I wrote a function in pseudo-Python to do this:

def combine(a : bloom_filter, b : bloom_filter):
    assert a.length == b.length
    assert a.hashes == b.hashes

    c = new bloom_filter(length = a.length, hashes = b.hashes)
    c.attempts = a.attempts + b.attempts
    c.bits = a.bits | b.bits

    # Determining the amount of items
    a_and_b = count(a & b)
    a_not_b = count(a & !b)
    not_a_b = count(!a & b)
    neither = count(!a & !b)
    c.item_count = a_not_b / a.length * a.item_count
                 + not_a_b / b.length * b.item_count
                 + a_and_b / c.length * min(a.item_count, b.item_count)

    return c

Does that even look right? I keep debating with myself whether what I intend is possible at all, since most of the information about the source data is lost (which is the point of a Bloom filter).

2 answers

You can derive a formula for estimating the number of elements in a Bloom filter:

c = log(z / N) / (h * log(1 - 1 / N))

N: Number of bits in the bit vector
h: Number of hashes
z: Number of zero bits in the bit vector

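For the combined Bloom filter (the bitwise OR of the two bit vectors), you can then apply the same formula to estimate its number of elements.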
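As a rough sketch of how the formula above might be applied to the question, the snippet below ORs two filters and estimates the item count of the result from its zero bits. The `BloomFilter` dataclass, its field names, and the choice to store the bit vector as a Python int are assumptions made for illustration, not anything from the original post.

```python
import math
from dataclasses import dataclass


@dataclass
class BloomFilter:
    length: int    # N: number of bits in the bit vector
    hashes: int    # h: number of hashes
    bits: int = 0  # bit vector stored as a Python int (illustrative choice)


def estimate_items(f: BloomFilter) -> float:
    """Estimate the item count: c = log(z / N) / (h * log(1 - 1 / N))."""
    zeros = f.length - bin(f.bits).count("1")  # z: number of zero bits
    if zeros == 0:
        return math.inf  # saturated filter, no finite estimate
    return math.log(zeros / f.length) / (f.hashes * math.log(1 - 1 / f.length))


def combine(a: BloomFilter, b: BloomFilter) -> BloomFilter:
    """Union of two compatible filters: OR the bit vectors."""
    assert a.length == b.length and a.hashes == b.hashes
    return BloomFilter(a.length, a.hashes, a.bits | b.bits)


a = BloomFilter(16, 1, 0b0011)
b = BloomFilter(16, 1, 0b0110)
print(estimate_items(combine(a, b)))
```

This sidesteps the per-filter `item_count` bookkeeping from the question: the estimate comes only from the combined bit vector, so items present in both sets are not counted twice.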


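Say set A contains apples and oranges,

and set B contains peas and carrots.

Build a simple 16-bit Bloom filter as an example, using CRC32 as the hash: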

crc32(apples) = 0x70CCB02F

crc32(oranges) = 0x45CDF3B4

crc32(peas) = 0xB18D0C2B

crc32(carrots) = 0x676A9E28

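Start with an empty Bloom filter (BF) (say, 16 bits) for each set (A, B):

BFA = BFB = 0000 0000 0000 0000

Then break each hash down into chunks of some bit length, 4 bits here, and use each chunk as a bit index into the BF.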

Get Apples BF Index list by splitting up the hash:

0x70CCB02F = 0111 0000 1100 1100 1011 0000 0010 1111
             7    0    C    C    B    0    2    F
----------------------------------------------------

Add Apples to BFA by setting BF bit indexes [ 7, 0, 12, 12, 11, 0, 2, 15]

                                 (set the index bit of an empty BF to 1)
Apples =     1001 1000 1000 0101 (<- see indexes 0,2,7,11,12,15 are set)
BF =         0000 0000 0000 0000  (or operation adds that item to the BF)
================================
Updated BFA = 1001 1000 1000 0101 
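The same step can be written as a minimal Python sketch: split the 32-bit hash into eight 4-bit indexes and set those bits in a 16-bit filter. The helper names (`nibble_indexes`, `item_mask`, `show`) are made up for this illustration, and the CRC32 value is taken from the text as given rather than recomputed.

```python
def nibble_indexes(hash32: int) -> list[int]:
    """Split a 32-bit hash into eight 4-bit indexes, most significant first."""
    return [(hash32 >> shift) & 0xF for shift in range(28, -1, -4)]


def item_mask(hash32: int) -> int:
    """16-bit pattern with one bit set per index (duplicates collapse)."""
    mask = 0
    for index in nibble_indexes(hash32):
        mask |= 1 << index
    return mask


def show(bits: int) -> str:
    """Format 16 bits in groups of four, bit 15 on the left."""
    s = format(bits, "016b")
    return " ".join(s[i:i + 4] for i in range(0, 16, 4))


APPLES = 0x70CCB02F  # CRC32 value quoted in the text, taken as given

bfa = 0                   # empty filter
bfa |= item_mask(APPLES)  # "or" the item into the filter

print(nibble_indexes(APPLES))  # [7, 0, 12, 12, 11, 0, 2, 15]
print(show(bfa))               # 1001 1000 1000 0101
```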

Add oranges to the BF in the same way:

0x45CDF3B4 = 0100 0101 1100 1101 1111 0011 1011 0100
             4    5    12   13   15   3    11   4
----------------------------------------------------
Add oranges to BF by setting BF bit indexes [ 4,5,12,13,15,3,11,4]

Oranges =      1011 1000 0011 1000 
BFA =          1001 1000 1000 0101  (or operation)
================================
Updated BFA =  1011 1000 1011 1101 

So now apples and oranges have been inserted into BFA, with a final value of 1011 1000 1011 1101

Do the same for BFB:

crc32(peas) = 0xB18D0C2B becomes => 
set [11,2,12,0,13,1,8] in BFB
 0011 1001 0000 0111 = BF(peas)

crc32(carrots) = 0x676A9E28 becomes => 
set [8,2,14,9,10,6,7] in BFB

0100 0111 1100 0100 = BF(carrots)

so BFB = 
0011 1001 0000 0111  BF(peas)
0100 0111 1100 0100  BF(carrots)
===================  ('add' them to BFB via logical or op)
0111 1111 1100 0111

Now search B for entries from A:

Does B contain "oranges"? =>

 1011 1000 0011 1000 (Oranges BF representation)
 0111 1111 1100 0111 (BFB)
=====================     (and operation)
 0011 1000 0000 0000  

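Since this result (0011 1000 0000 0000) does not match the BF representation of oranges, you can conclude that B does not contain oranges.

... (and so on for the remaining items)

It follows that B contains none of the items in A, just as A contains none of the items in B.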
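The lookup described here can be sketched in Python as follows. The names `item_mask` and `might_contain` are hypothetical helpers for this example, and the CRC32 values are copied from the text rather than recomputed. An item counts as possibly present only if every one of its bits is set in the filter.

```python
def item_mask(hash32: int) -> int:
    """16-bit pattern with one bit set per 4-bit chunk of the hash."""
    mask = 0
    for shift in range(0, 32, 4):
        mask |= 1 << ((hash32 >> shift) & 0xF)
    return mask


def might_contain(bf: int, hash32: int) -> bool:
    """True if all of the item's bits are set (false positives are possible)."""
    mask = item_mask(hash32)
    return (bf & mask) == mask


# CRC32 values quoted in the text, taken as given
ORANGES, PEAS, CARROTS = 0x45CDF3B4, 0xB18D0C2B, 0x676A9E28

bfb = item_mask(PEAS) | item_mask(CARROTS)

print(format(bfb, "016b"))          # 0111111111000111
print(might_contain(bfb, ORANGES))  # False
print(might_contain(bfb, PEAS))     # True
```

A `False` here is definitive, while a `True` only means "possibly present", which is the usual Bloom filter trade-off.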

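Ideally you would not have to query every item against the other BF one at a time. It seems an xor op could give a combined "difference" answer in one step: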

0111 1111 1100 0111 (BFB)
1011 1000 1011 1101 (BFA)
========================
1100 0111 0111 1010 (BFA xor BFB) == (items in B not in A, and items in A not in B)

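Keep in mind that because this is a BF, which is not 100% accurate, this result is not 100% accurate either.

For example, checking peas (which is not in A):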

 1100 0111 0111 1010 (BFA xor BFB)
 0011 1001 0000 0111 (Peas)
============================== (And operation)
 0000 0001 0000 0010 (non-zero)

Since (BFA xor BFB) && (Peas) != 0, you know that one of the sets does not contain "peas"...
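For anyone who wants to replay the xor step, here is a small Python sketch using the bit patterns from the walkthrough. `item_mask` is a hypothetical helper and the peas hash is taken from the text as given. Treat the non-zero overlap as a hint rather than proof: an item that is in neither set can overlap the xor result just as well.

```python
def item_mask(hash32: int) -> int:
    """16-bit pattern with one bit set per 4-bit chunk of the hash."""
    mask = 0
    for shift in range(0, 32, 4):
        mask |= 1 << ((hash32 >> shift) & 0xF)
    return mask


BFA = 0b1011_1000_1011_1101  # apples + oranges, from the walkthrough
BFB = 0b0111_1111_1100_0111  # peas + carrots, from the walkthrough
PEAS = 0xB18D0C2B            # CRC32 value quoted in the text, taken as given

diff = BFA ^ BFB                  # bits set in exactly one of the two filters
overlap = diff & item_mask(PEAS)  # peas bits that appear in only one filter

print(format(diff, "016b"))     # 1100011101111010
print(format(overlap, "016b"))  # 0000000100000010
print(overlap != 0)             # True
```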

