How to efficiently read stdin data that requires seeking

I am looking for the best way to read data from a piped stdin in C.

Problem: I need to seek within this data, that is, I need to read data from the beginning of the stream after having read some data at the end of the same stream.

A small use case: gunzip -c 4GbDataFile.gz | myprogram

Another one:

  • On the local host: nc -l -p 1234 | myprogram
  • On the remote host: gunzip -c 4GbDataFile.gz | nc -q 0 theotherhost 1234

I know that a FIFO can only be read once. So, at the moment:

  • I dump everything from stdin into memory and work from this allocated memory.

This is ugly, but it works. The obvious problem is that if someone sends a huge (or continuous) stream to my application, I will end up with a huge allocated chunk of memory, or I will run out of memory entirely. (Think of an 8Gb file.)

What I have considered:

  • I set a size limit (possibly user-defined) for this chunk of memory. Once I have read that much data from stdin:
    • Either I stop there with an error and give up,
    • Or I start dumping what I read into a file, and work from this file once all the data has been read.
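The memory-then-file fallback from the list above could be sketched roughly as follows. This is only an illustration under assumptions: the struct spooled type and the spool_input function are hypothetical names, the buffer is allocated up front at the full limit for simplicity, and the spill file comes from the standard tmpfile().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: on success exactly one of mem/file is set. */
struct spooled {
    char  *mem;   /* whole input, if it fit within the limit */
    size_t len;   /* number of valid bytes in mem */
    FILE  *file;  /* rewound temporary file, if the limit was exceeded */
};

/* Read all of `in`. Keep it in memory while it fits in `limit` bytes;
 * otherwise spill everything to an anonymous temporary file.
 * Returns 0 on success, -1 on error. */
int spool_input(FILE *in, size_t limit, struct spooled *out)
{
    char chunk[65536];
    size_t n;

    out->mem = malloc(limit ? limit : 1);
    out->len = 0;
    out->file = NULL;
    if (out->mem == NULL)
        return -1;

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (out->file == NULL && n <= limit - out->len) {
            memcpy(out->mem + out->len, chunk, n);  /* still fits in memory */
            out->len += n;
            continue;
        }
        if (out->file == NULL) {
            /* Limit exceeded: move what was buffered so far into a file. */
            out->file = tmpfile();
            if (out->file == NULL)
                goto fail;
            if (fwrite(out->mem, 1, out->len, out->file) != out->len)
                goto fail;
            free(out->mem);
            out->mem = NULL;
            out->len = 0;
        }
        if (fwrite(chunk, 1, n, out->file) != n)
            goto fail;
    }
    if (ferror(in))
        goto fail;
    /* Flush and rewind so the caller can read and seek from the start. */
    if (out->file != NULL && fseek(out->file, 0L, SEEK_SET) != 0)
        goto fail;
    return 0;

fail:
    free(out->mem);
    out->mem = NULL;
    if (out->file != NULL)
        fclose(out->file);
    out->file = NULL;
    return -1;
}
```

A caller would pass stdin and the chosen limit, then work either on out.mem or on out.file with fseek, depending on which one is set.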

But then, what's the point? I cannot know where the data I am reading comes from. If it is a local 8Gb file, I will end up dumping it into another 8Gb file on the same system.
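One thing that can be checked at run time is whether stdin is already a regular file (as with a shell redirection), in which case no copy is needed. A minimal sketch, assuming a POSIX system; fd_is_regular_file is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* Returns true if fd refers to a regular file, which supports seeking.
 * Pipes, FIFOs and sockets report a different file type. */
bool fd_is_regular_file(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 && S_ISREG(st.st_mode);
}
```

Calling fd_is_regular_file(0) at startup would tell the program whether it can seek on stdin directly or has to fall back to buffering.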

So my question is:

How do you efficiently read a lot of data from a piped stdin when you need to seek back and forth?

Thanks in advance for your answers.







TL;DR: cat 4gbfile | yourprogram is a useless use of cat; run yourprogram < 4gbfile instead, so that stdin is a regular, seekable file.

If you really insist on making it work with data from a pipe, you need to save the data to a temporary file at startup, and then replace file descriptor 0 with a duplicate of the temporary file's descriptor using dup2.
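A minimal sketch of that temporary-file approach, assuming a POSIX system. The function name make_fd_seekable is hypothetical, the temporary file comes from tmpfile(), and interrupted reads (EINTR) are simply treated as errors here:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* If fd is not a regular file, copy everything readable from it into an
 * anonymous temporary file and make fd refer to that file instead.
 * Returns 0 on success, -1 on error. */
int make_fd_seekable(int fd)
{
    char buf[65536];
    ssize_t n;
    struct stat st;
    FILE *tmp;
    int tmpfd;

    if (fstat(fd, &st) != 0)
        return -1;
    if (S_ISREG(st.st_mode))
        return 0;               /* already seekable, nothing to copy */

    tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    tmpfd = fileno(tmp);

    /* Drain the pipe into the temporary file. */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(tmpfd, buf + off, (size_t)(n - off));
            if (w < 0) {
                fclose(tmp);
                return -1;
            }
            off += w;
        }
    }

    /* Rewind the file, then let fd take over its open file description. */
    if (n < 0 || lseek(tmpfd, 0, SEEK_SET) == (off_t)-1 || dup2(tmpfd, fd) < 0) {
        fclose(tmp);
        return -1;
    }
    fclose(tmp);                /* fd still keeps the temporary file open */
    return 0;
}
```

Calling make_fd_seekable(STDIN_FILENO) at the very start of main, before anything has been read through stdio, would let the rest of the program use lseek or fseek on stdin. The cost is the one raised in the question: the whole stream is written to disk once.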
