Bulk-writing a Java array to disk

I have two arrays (an int array and a long array) that contain millions of records. So far I write them with a DataOutputStream over a large buffer, so the disk I/O cost stays low (NIO performs about the same, since with such a huge buffer there are few I/O calls):

DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream("abc.txt"),1024*1024*100));

for(int i = 0 ; i < 220000000 ; i++){
    long l = longarray[i];
    dos.writeLong(l);
}
dos.close(); // flushes the remaining buffered bytes and releases the file

But this takes several minutes (more than 5 minutes). What I really want is to dump the memory to disk in bulk, something like mapping main memory directly onto the disk. I found a good approach here and here, but I can't figure out how to use it in my Java code. Can someone help me with this, or suggest another way to do it efficiently?


On my machine, a 3.8 GHz i7 with an SSD,

DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream("abc.txt"), 32 * 1024));

long start = System.nanoTime();
final int count = 220000000;
for (int i = 0; i < count; i++) {
    long l = i;
    dos.writeLong(l);
}
dos.close();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to write %,d longs%n",
        time / 1e9, count);

prints

Took 11.706 seconds to write 220,000,000 longs
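If you want to stay with the stream format (so DataInputStream.readLong can still read the file), one option is to avoid calling writeLong once per element and instead copy a chunk of the array into a byte[] in one bulk operation. The sketch below assumes a hypothetical class name BulkLongWriter and an arbitrary chunk size of 1 Mi longs (8 MiB); it has not been benchmarked against the numbers above.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class BulkLongWriter {
    // Hypothetical chunk size: 1 Mi longs, i.e. 8 MiB of bytes per write call.
    static final int CHUNK_LONGS = 1 << 20;

    /**
     * Writes data to path in big-endian order, the same byte layout that
     * DataOutputStream.writeLong produces, so DataInputStream.readLong can read it back.
     */
    public static void writeLongs(long[] data, String path) throws IOException {
        // One reusable byte[]; a ByteBuffer wrapping it is big-endian by default.
        byte[] bytes = new byte[Math.min(data.length, CHUNK_LONGS) * 8];
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        try (OutputStream out = new FileOutputStream(path)) {
            for (int off = 0; off < data.length; off += CHUNK_LONGS) {
                int len = Math.min(CHUNK_LONGS, data.length - off);
                // Bulk-copy len longs into the byte[] through a LongBuffer view starting at byte 0.
                bb.asLongBuffer().put(data, off, len);
                // Large writes go straight to the file, so no BufferedOutputStream is needed.
                out.write(bytes, 0, len * 8);
            }
        }
    }
}
```

Usage would be something like `BulkLongWriter.writeLongs(longarray, "abc.txt");`. The per-element loop moves from the stream into LongBuffer.put, which copies the whole chunk at once, and the extra heap cost is a single 8 MiB byte[].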

Using memory-mapped files

final int count = 220000000;

final FileChannel channel = new RandomAccessFile("abc.txt", "rw").getChannel();
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, count * 8);
mbb.order(ByteOrder.nativeOrder());

long start = System.nanoTime();
for (int i = 0; i < count; i++) {
    long l = i;
    mbb.putLong(l);
}
channel.close();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to write %,d longs%n",
        time / 1e9, count);

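// Unmaps the buffer immediately via the internal sun.nio.ch.DirectBuffer API; only works on Sun/HotSpot/OpenJDK up to Java 8 (later JDKs do not export this internal API).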
((DirectBuffer) mbb).cleaner().clean();

final FileChannel channel2 = new RandomAccessFile("abc.txt", "r").getChannel();
MappedByteBuffer mbb2 = channel2.map(FileChannel.MapMode.READ_ONLY, 0, channel2.size());
mbb2.order(ByteOrder.nativeOrder());
assert mbb2.remaining() == count * 8;
long start2 = System.nanoTime();
for (int i = 0; i < count; i++) {
    long l = mbb2.getLong();
    if (i != l)
        throw new AssertionError("Expected "+i+" but got "+l);
}
channel2.close();
long time2 = System.nanoTime() - start2;
System.out.printf("Took %.3f seconds to read %,d longs%n",
        time2 / 1e9, count);

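// Unmaps the buffer immediately via the internal sun.nio.ch.DirectBuffer API; only works on Sun/HotSpot/OpenJDK up to Java 8 (later JDKs do not export this internal API).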
((DirectBuffer) mbb2).cleaner().clean();

prints, on my 3.8 GHz i7:

Took 0.568 seconds to write 220,000,000 longs

and on a slower machine it prints

Took 1.180 seconds to write 220,000,000 longs
Took 0.990 seconds to read 220,000,000 longs

Is there a way to do this without creating another buffer? I already have the array in main memory, and I can't allocate more than 500 MB for this.

This uses less than 1 KB of heap. If you measure how much memory is used before and after this call, you usually won't see any increase.
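A single FileChannel.map call is limited to Integer.MAX_VALUE bytes, so a larger array (or a desire to keep each mapping small) calls for mapping the file one window at a time. The sketch below is a variation on the answer's code, not the answer itself: it assumes a hypothetical class MappedLongWriter and an arbitrary 128 MiB default window, and it copies each slice with a bulk LongBuffer.put instead of calling putLong per element. The mappings live outside the Java heap, like the original example.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedLongWriter {
    // Hypothetical default window: 128 MiB per mapping (one map() call is capped at Integer.MAX_VALUE bytes).
    static final long DEFAULT_WINDOW_BYTES = 128L << 20;

    public static void writeLongs(long[] data, Path path) throws IOException {
        writeLongs(data, path, DEFAULT_WINDOW_BYTES);
    }

    /** windowBytes must be a positive multiple of 8 so no long straddles two windows. */
    public static void writeLongs(long[] data, Path path, long windowBytes) throws IOException {
        if (windowBytes <= 0 || windowBytes % 8 != 0 || windowBytes > Integer.MAX_VALUE)
            throw new IllegalArgumentException("windowBytes must be a positive multiple of 8");
        long totalBytes = data.length * 8L;
        // READ_WRITE mappings require a channel opened for both reading and writing.
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            int off = 0; // index of the next array element to copy
            for (long pos = 0; pos < totalBytes; pos += windowBytes) {
                long size = Math.min(windowBytes, totalBytes - pos);
                // Mapping a region past the current end of file grows the file.
                MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_WRITE, pos, size);
                mbb.order(ByteOrder.nativeOrder()); // same byte order as the answer's example
                int len = (int) (size / 8);
                // The LongBuffer view inherits the native order set above.
                mbb.asLongBuffer().put(data, off, len);
                off += len;
            }
        }
    }
}
```

Each window is released when its MappedByteBuffer is garbage-collected; the Java 8-only cleaner trick from the answer could be applied per window if the mappings need to be released immediately.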

One more thing: does this mean that the most efficient way to load the data back is also a MappedByteBuffer?

In my experience, using a memory-mapped file is the fastest, as it reduces the number of system calls and memory copies.
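Loading can mirror the write side: map the file window by window in READ_ONLY mode and bulk-copy each window into a long[] with LongBuffer.get. This is a sketch under assumptions: the class name MappedLongReader and the 128 MiB default window are hypothetical, and the file is assumed to hold native-order longs as written in the answer. Note that the returned long[] itself lives on the heap, so it needs as much heap as the data.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedLongReader {
    // Hypothetical default window: 128 MiB per mapping.
    static final long DEFAULT_WINDOW_BYTES = 128L << 20;

    public static long[] readLongs(Path path) throws IOException {
        return readLongs(path, DEFAULT_WINDOW_BYTES);
    }

    /** windowBytes must be a positive multiple of 8 so no long straddles two windows. */
    public static long[] readLongs(Path path, long windowBytes) throws IOException {
        if (windowBytes <= 0 || windowBytes % 8 != 0 || windowBytes > Integer.MAX_VALUE)
            throw new IllegalArgumentException("windowBytes must be a positive multiple of 8");
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long totalBytes = ch.size();
            if (totalBytes % 8 != 0 || totalBytes / 8 > Integer.MAX_VALUE - 8)
                throw new IOException("file size is not a valid long[] length: " + totalBytes);
            long[] data = new long[(int) (totalBytes / 8)];
            int off = 0; // index of the next array element to fill
            for (long pos = 0; pos < totalBytes; pos += windowBytes) {
                long size = Math.min(windowBytes, totalBytes - pos);
                MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, pos, size);
                mbb.order(ByteOrder.nativeOrder()); // must match the order used when writing
                int len = (int) (size / 8);
                // Bulk copy from the mapping into the array, no per-element getLong.
                mbb.asLongBuffer().get(data, off, len);
                off += len;
            }
            return data;
        }
    }
}
```

If the target array already exists, the same loop can fill it in place instead of allocating a new one.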


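Note: because the mapped buffer is set to native byte order, the file cannot be read back correctly with readLong.

writeLong/readLong always use big-endian order, whereas the native order on Intel/AMD CPUs is little-endian.

If you need the big-endian format that DataInput/OutputStream use, drop the order(ByteOrder.nativeOrder()) calls, since a ByteBuffer is big-endian by default.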
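To make the byte-order difference between DataOutputStream and a ByteBuffer concrete, the small sketch below (hypothetical class name ByteOrderDemo) encodes one long both ways and shows what happens when little-endian bytes are read back as big-endian, which is what readLong does.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteOrderDemo {
    /** Encodes v exactly as DataOutputStream.writeLong does (always big-endian). */
    public static byte[] viaDataOutput(long v) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeLong(v);
        }
        return bos.toByteArray();
    }

    /** Encodes v with a ByteBuffer in the given byte order. */
    public static byte[] viaByteBuffer(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(v).array();
    }

    public static void main(String[] args) throws IOException {
        long v = 0x0102030405060708L;
        System.out.println(Arrays.toString(viaDataOutput(v)));                          // [1, 2, 3, 4, 5, 6, 7, 8]
        System.out.println(Arrays.toString(viaByteBuffer(v, ByteOrder.LITTLE_ENDIAN))); // [8, 7, 6, 5, 4, 3, 2, 1]
        // Reading little-endian bytes as big-endian yields the byte-reversed value.
        long misread = ByteBuffer.wrap(viaByteBuffer(v, ByteOrder.LITTLE_ENDIAN)).getLong();
        System.out.println(misread == Long.reverseBytes(v));                            // true
        // Prints LITTLE_ENDIAN on Intel/AMD hardware.
        System.out.println(ByteOrder.nativeOrder());
    }
}
```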
