Best way to extract text from a 1.3 GB text file using PHP?

I have a 1.3 GB text file that I need to extract some information from in PHP. I have researched it and come up with a few different ways to do what I need, but as always I am after a little clarification on which method would be best, or whether there is a better one that I don't know about.

The information I need from the text file is only the first 40 characters of each line, and the file has around 17 million lines. The 40 characters from each line will be inserted into a database.

The method I have so far is below:

// REMOVE TIME LIMIT
set_time_limit(0);
// REMOVE MEMORY LIMIT
ini_set('memory_limit', '-1');
// OPEN FILE
$handle = @fopen('C:\Users\Carl\Downloads\test.txt', 'r');
if($handle) {
    while(($buffer = fgets($handle)) !== false) {
        $insert[] = substr($buffer, 0, 40);
    }
    if(!feof($handle)) {
        // fgets() FAILED BEFORE THE END OF THE FILE WAS REACHED
    }
    fclose($handle);
}

The above reads one line at a time and extracts the data. I have the database inserts sorted out, doing 50 inserts ten times per transaction.

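The next idea was to use file() to read all the lines into an array and loop over it with foreach? I am not sure about that, though, since the array would have 17 million values.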

, , , , script header?


script wamp, , script 0. , script ?

+5
3

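Don't use "file()": it would most likely hit the memory limit and terminate your script.

I wouldn't even accumulate the lines in "insert[]", since that wastes RAM as well. If you can, insert each row into the database right away.

By the way, there is a handy tool called "cut" that you could use to process the file: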

cut -c1-40 file.txt

It prints to stdout, which you can pipe into a PHP script that does the inserts.

cut -c1-40 file.txt | php -f inserter.php

inserter.php can then read the lines from php://stdin and insert them into the DB.
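As a rough sketch of what inserter.php in the pipeline above might look like, the function below streams lines from a handle and hands each one to a callback, so nothing accumulates in memory. The name processLines and the callback are hypothetical and not part of the original answer; the callback stands in for whatever performs the real database insert.

```php
<?php
/**
 * Read $in line by line and call $insertRow once per line.
 * Only one line is held in memory at a time.
 * Returns the number of lines processed.
 */
function processLines($in, callable $insertRow): int
{
    $count = 0;
    while (($line = fgets($in)) !== false) {
        // Strip the trailing newline (and a Windows "\r", if present).
        $insertRow(rtrim($line, "\r\n"));
        $count++;
    }
    return $count;
}
```

Inside inserter.php this could be called as processLines(STDIN, $insertRow), where $insertRow is a closure that runs the insert for one row.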

"cut" is a standard tool on Linux; on Windows you can get it from the MinGW msys tools (which ship with git) or as a native win32 build from gnuWin32.

+5

Why use PHP for this at all? If your database is MySQL, it can do this natively with LOAD DATA INFILE:

LOAD DATA INFILE 'data.txt'
INTO TABLE `some_table`
  FIELDS TERMINATED BY ''
  LINES TERMINATED BY '\n'
  ( @line )
SET `some_column` = LEFT( @line, 40 );


MySQL also has the mysqlimport utility, which provides the same functionality from the command line.

+2

. fgets() , , . , fgets() . file(). - , .

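Instead of fgets(), you can read each line character by character and keep only the characters you need, with a function like this: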

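// Read one line from $fp, keeping only its first $len characters.
// Returns false at end of file.
function fgetl($fp, $len) {
    $l = 0;
    $buffer = '';
    // Compare against "\n" rather than PHP_EOL: on Windows PHP_EOL is "\r\n",
    // which a single character from fgetc() can never equal.
    while (false !== ($c = fgetc($fp)) && "\n" !== $c) {
        if ($l < $len) {
            $buffer .= $c;
        }
        ++$l;
    }
    if (0 === $l && false === $c) {
        return false;
    }
    // Drop the "\r" of a Windows line ending if it was captured.
    return rtrim($buffer, "\r");
}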

Do the insert immediately, or you will waste memory. Make sure you use prepared statements to insert this many rows; this will drastically reduce execution time. You don't want to send the full query for each insert when you can send only the data.
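A minimal sketch of that advice, assuming PDO is the database layer (the thread does not say which one is used). The function name insertLines and the batch size of 500 are arbitrary choices, and the table and column names are placeholders borrowed from the LOAD DATA example earlier in the thread. Note that pdo_mysql emulates prepared statements by default, so only the bound value is sent per row when PDO::ATTR_EMULATE_PREPARES is set to false.

```php
<?php
/**
 * Insert the first 40 characters of every line read from $in, using a single
 * prepared statement and committing every $batchSize rows.
 * Returns the number of rows inserted.
 */
function insertLines(PDO $pdo, $in, int $batchSize = 500): int
{
    // Prepared once and reused for every row.
    // `some_table` and `some_column` are placeholder names.
    $stmt = $pdo->prepare('INSERT INTO some_table (some_column) VALUES (?)');
    $count = 0;
    $pdo->beginTransaction();
    while (($line = fgets($in)) !== false) {
        // Insert right away instead of collecting rows in an array.
        $stmt->execute([(string) substr(rtrim($line, "\r\n"), 0, 40)]);
        if (++$count % $batchSize === 0) {
            // Commit the finished batch and start the next one.
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
    $pdo->commit();
    return $count;
}
```

It could be called with a handle from fopen() on the text file. Setting PDO::ATTR_ERRMODE to PDO::ERRMODE_EXCEPTION on the connection makes a failed insert stop the loop instead of passing silently.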

+1