On my machine 3.8 GHz i7 with SSD
DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream("abc.txt"), 32 * 1024));
long start = System.nanoTime();
final int count = 220000000;
for (int i = 0; i < count; i++) {
long l = i;
dos.writeLong(l);
}
dos.close();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to write %,d longs%n",
time / 1e9, count);
prints
Took 11.706 seconds to write 220,000,000 longs
Using memory mapped files
final int count = 220000000;
final FileChannel channel = new RandomAccessFile("abc.txt", "rw").getChannel();
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_WRITE, 0, count * 8);
mbb.order(ByteOrder.nativeOrder());
long start = System.nanoTime();
for (int i = 0; i < count; i++) {
long l = i;
mbb.putLong(l);
}
channel.close();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to write %,d longs%n",
time / 1e9, count);
((DirectBuffer) mbb).cleaner().clean();
final FileChannel channel2 = new RandomAccessFile("abc.txt", "r").getChannel();
MappedByteBuffer mbb2 = channel2.map(FileChannel.MapMode.READ_ONLY, 0, channel2.size());
mbb2.order(ByteOrder.nativeOrder());
assert mbb2.remaining() == count * 8;
long start2 = System.nanoTime();
for (int i = 0; i < count; i++) {
long l = mbb2.getLong();
if (i != l)
throw new AssertionError("Expected "+i+" but got "+l);
}
channel.close();
long time2 = System.nanoTime() - start2;
System.out.printf("Took %.3f seconds to read %,d longs%n",
time2 / 1e9, count);
((DirectBuffer) mbb2).cleaner().clean();
prints on my 3.8 GHz i7.
Took 0.568 seconds to write 220,000,000 longs
on slower machine printing
Took 1.180 seconds to write 220,000,000 longs
Took 0.990 seconds to read 220,000,000 longs
Is there any other way not to create this? Because I already have this array in my main memory, and I can’t allocate more than 500 MB for this?
This does not use less than 1 Kbyte of heap. If you look at how much memory is used before and after this call, you usually will not see any increase.
Another thing, does this mean that efficient loading also means MappedByteBuffer?
In my experience, using a memory mapped file is the fastest as you reduce the number of system calls and copies to memory.
- read (buffer), . ( , , 220 int array -float array 5 )
, .
: readLong
. writeLong/readLong , Intel/AMD, .
big-endian, , (DataInput/OutputStream endian)