Fastest way to read a huge number of ints from a binary file

I am using Java 1.5 on an embedded Linux device and want to read a binary file containing 2 MB of int values (currently 4-byte big-endian, but I can choose the format).

Using a DataInputStream on top of a BufferedInputStream and calling dis.readInt(), these 500,000 calls take 17 seconds, whereas reading the whole file into one large byte buffer takes 5 seconds.
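For reference, the slow baseline described above presumably looks something like the following sketch. The class name `ReadIntBaseline`, the method signature, and the 64 KB stream buffer are illustrative assumptions; they are not taken from the original code.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical baseline: one readInt() call per value.
class ReadIntBaseline {

    // Reads `count` big-endian ints, one call at a time.
    static int[] readInts(String path, int count) throws IOException {
        int[] result = new int[count];
        DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path), 64 * 1024));
        try {
            for (int i = 0; i < count; i++) {
                // Each call pulls four single bytes through the buffered stream.
                result[i] = dis.readInt();
            }
        } finally {
            dis.close(); // Java 1.5 has no try-with-resources
        }
        return result;
    }
}
```

The per-value call overhead in this loop is what the bulk approaches in the answers try to avoid.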

How can I read this file faster into one huge int[]?

The reading process should not use more than 512 KB.

The code below, which uses nio, is no faster than the readInt() method from java.io.

    // assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // here I want the result into
    int[] result = new int[numInts];
    int cnt = 0;

    RandomAccessFile aFile = new RandomAccessFile("filename", "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buf = ByteBuffer.allocate(512 * 1024);

    int bytesRead = inChannel.read(buf); //read into buffer.

    while (bytesRead != -1 && cnt < numInts) {

      buf.flip();  //make buffer ready for get()

      // only call getInt() while a whole int (4 bytes) is available
      while (buf.remaining() >= 4 && cnt < numInts) {
          // probably slow here since called 500 000 times
          result[cnt] = buf.getInt();
          cnt++;
      }

      buf.compact(); //keep any partial int, make buffer ready for writing
      bytesRead = inChannel.read(buf);
    }


    aFile.close();
    inChannel.close();

Update: evaluation of the answers:

IntBuffer .
jit: java.io DataInputStream.readInt() (17 s, versus 20 s for MemMap IntBuffer)

: . ( init)

+5
3

You could try memory-mapping the file and reading the ints through an IntBuffer view:

    try (FileInputStream stream = new FileInputStream(filename)) {
        FileChannel inChannel = stream.getChannel();

        ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        int[] result = new int[500000];

        buffer.order( ByteOrder.BIG_ENDIAN );
        IntBuffer intBuffer = buffer.asIntBuffer( );
        intBuffer.get(result);
    }
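
Mapping the whole file at once may not sit well with the 512 KB limit from the question. As a hedged variation, the same idea can be applied to one window of the file at a time. The class name `WindowedMapReader` and the window size are assumptions made for illustration, and because mapped regions are only released when they are garbage collected, the real footprint may exceed a single window.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: maps the file in fixed-size windows.
class WindowedMapReader {

    static final int WINDOW_BYTES = 512 * 1024; // must be a multiple of 4

    static int[] readInts(String path) throws IOException {
        RandomAccessFile file = new RandomAccessFile(path, "r");
        try {
            FileChannel channel = file.getChannel();
            // A trailing partial int (1 to 3 bytes) is ignored.
            int[] result = new int[(int) (channel.size() / 4)];
            long totalBytes = 4L * result.length;
            int done = 0;
            for (long pos = 0; pos < totalBytes; pos += WINDOW_BYTES) {
                long len = Math.min(WINDOW_BYTES, totalBytes - pos);
                IntBuffer ints = channel
                        .map(FileChannel.MapMode.READ_ONLY, pos, len)
                        .order(ByteOrder.BIG_ENDIAN)
                        .asIntBuffer();
                int n = ints.remaining();
                ints.get(result, done, n); // bulk copy of this window
                done += n;
            }
            return result;
        } finally {
            file.close(); // also closes the channel
        }
    }
}
```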
+4

You can use IntBuffer from the nio package: http://docs.oracle.com/javase/6/docs/api/java/nio/IntBuffer.html

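    int[] intArray = new int[ 500000 ];

    IntBuffer intBuffer = IntBuffer.wrap( intArray );

    ...

Fill the buffer by making calls to inChannel.read(intBuffer).

Once the buffer is full, intArray will contain the 500000 integers.

Edit: channels only support ByteBuffer, so read into one and view it as an IntBuffer: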

    // assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // here I want the result into
    int[] result = new int[numInts];

    // 4 bytes per int, direct buffer
    ByteBuffer buf = ByteBuffer.allocateDirect( numInts * 4 );

    // BIG_ENDIAN byte order
    buf.order( ByteOrder.BIG_ENDIAN );

    // Fill in the buffer
    while ( buf.hasRemaining( ) )
    {
       // Per EJP suggestion check EOF condition
       if( inChannel.read( buf ) == -1 )
       {
           // Hit EOF
           throw new EOFException( );
       }
    }

    buf.flip( );

    // Create IntBuffer view
    IntBuffer intBuffer = buf.asIntBuffer( );

    // result will now contain all ints read from file
    intBuffer.get( result );
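
The snippet above allocates the full 2 MB as a direct buffer, which is more than the 512 KB the question allows. A possible variation, sketched here under assumptions, reuses one smaller buffer and does one bulk copy per chunk. The class name `ChunkedIntReader` and the `chunkBytes` parameter are hypothetical, and no claim is made about how fast this is on the embedded device.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: bulk-copies ints through a bounded direct buffer.
class ChunkedIntReader {

    // chunkBytes must be at least 4, e.g. 512 * 1024.
    static int[] readInts(String path, int numInts, int chunkBytes) throws IOException {
        int[] result = new int[numInts];
        RandomAccessFile file = new RandomAccessFile(path, "r");
        try {
            FileChannel channel = file.getChannel();
            ByteBuffer buf = ByteBuffer.allocateDirect(chunkBytes);
            buf.order(ByteOrder.BIG_ENDIAN);
            int done = 0;
            while (done < numInts) {
                if (channel.read(buf) == -1) {
                    throw new EOFException("file ended after " + done + " ints");
                }
                buf.flip();
                IntBuffer ints = buf.asIntBuffer();   // view covers whole ints only
                int n = Math.min(ints.remaining(), numInts - done);
                ints.get(result, done, n);            // one bulk copy per chunk
                done += n;
                buf.position(buf.position() + 4 * n); // the view does not advance buf
                buf.compact();                        // keep a partial int for the next read
            }
            return result;
        } finally {
            file.close(); // also closes the channel
        }
    }
}
```

A call such as `ChunkedIntReader.readInts("filename", 500000, 512 * 1024)` would match the numbers in the question.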
+3

I ran a fairly thorough experiment on serialization and deserialization, comparing DataInputStream with ObjectInputStream, both backed by a ByteArrayInputStream to avoid I/O effects. For a million ints, readObject took about 20 ms and readInt about 116 ms. The serialization overhead for the million-int array was 27 bytes. This was on a 2013 MacBook Pro.

Having said that, object serialization is somewhat evil, and the data must have been written by a Java program.
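
An experiment along these lines could be sketched as follows. This is not the code that produced the numbers above; the class name `SerializationComparison` and its helper methods are made up for illustration, and the timings it prints will vary by machine and JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical in-memory comparison of readInt() and readObject().
class SerializationComparison {

    // Writes each int as 4 big-endian bytes.
    static byte[] writeRaw(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(values.length * 4);
        DataOutputStream out = new DataOutputStream(bytes);
        for (int i = 0; i < values.length; i++) {
            out.writeInt(values[i]);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the ints back one readInt() call at a time.
    static int[] readRaw(byte[] data, int count) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = in.readInt();
        }
        return result;
    }

    // Serializes the whole array as one object.
    static byte[] writeObject(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(values.length * 4 + 64);
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(values);
        out.close();
        return bytes.toByteArray();
    }

    // Deserializes the array with a single readObject() call.
    static int[] readObject(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        return (int[]) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        int[] values = new int[1000000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        byte[] raw = writeRaw(values);
        byte[] obj = writeObject(values);

        long t0 = System.nanoTime();
        readRaw(raw, values.length);
        long t1 = System.nanoTime();
        readObject(obj);
        long t2 = System.nanoTime();

        System.out.println("readInt:    " + (t1 - t0) / 1000000 + " ms");
        System.out.println("readObject: " + (t2 - t1) / 1000000 + " ms");
        System.out.println("overhead:   " + (obj.length - raw.length) + " bytes");
    }
}
```

A single unwarmed run like this is only a rough indication; a serious measurement would repeat each read several times.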

+2