I was thinking of an improvement. I am currently doing a lot of text file processing.
I am not claiming that PCRE, or any other implementation, is slow or fast.
The language I write in is primarily Perl. I know that it has a powerful regex engine, and I know that it is more expressive than PCRE.
My idea is to write a small C++ regex engine that compiles a regex to raw NASM assembly.
I know that PCRE is quite complicated, and I believe I could skip a lot of the processing PCRE does that I do not need. I could also make it faster than Perl, because Perl works with VM-like opcodes and other things that can be considered overhead.
I started the implementation some time ago. I am not going to post it here, since I have no problems with it so far; I could finish it and end up with a regex engine that supports captures and understands +, *, ^, $ and character classes (although I have not yet written the part that converts the regex to assembly).
Would this be a good idea or a bad idea? What could go wrong in terms of achieving good performance with this?
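For readers who want something concrete to picture, here is a minimal sketch of an interpreted backtracking matcher for a subset of the features listed above (literals, ., *, +, ^ and $). It is not the engine described in this question: the function names are hypothetical, captures and character classes are left out, and it only illustrates the kind of interpretive baseline that a compile-to-assembly engine would be compared against.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical names throughout; this is an illustration, not the asker's engine.
static bool match_here(std::string_view re, std::string_view text);

// Match `c` (or any character if c == '.') repeated at least `min` times,
// followed by the remaining pattern `re`. Tries the shortest repetition first.
static bool match_repeat(char c, std::size_t min,
                         std::string_view re, std::string_view text) {
    std::size_t i = 0;
    // Consume the mandatory repetitions first.
    while (i < min) {
        if (i >= text.size() || (c != '.' && text[i] != c)) return false;
        ++i;
    }
    // Then try the rest of the pattern after each further repetition.
    for (;;) {
        if (match_here(re, text.substr(i))) return true;
        if (i >= text.size() || (c != '.' && text[i] != c)) return false;
        ++i;
    }
}

// Match the pattern `re` at the very beginning of `text`.
static bool match_here(std::string_view re, std::string_view text) {
    if (re.empty()) return true;
    if (re.size() >= 2 && re[1] == '*')
        return match_repeat(re[0], 0, re.substr(2), text);
    if (re.size() >= 2 && re[1] == '+')
        return match_repeat(re[0], 1, re.substr(2), text);
    if (re == "$") return text.empty();
    if (!text.empty() && (re[0] == '.' || re[0] == text[0]))
        return match_here(re.substr(1), text.substr(1));
    return false;
}

// Search for `re` anywhere in `text`; a leading '^' anchors the match at the start.
bool regex_search_sketch(std::string_view re, std::string_view text) {
    if (!re.empty() && re[0] == '^') return match_here(re.substr(1), text);
    for (std::size_t i = 0; i <= text.size(); ++i)
        if (match_here(re, text.substr(i))) return true;
    return false;
}

// Small self-check of the behaviour described above.
void self_check() {
    assert(regex_search_sketch("^ab*c$", "abbbc"));
    assert(!regex_search_sketch("a+b", "b"));
}
```

Every step here goes through a branch on the pattern text at match time, which is the sort of overhead a generated-code approach would try to remove.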
tl; dr = > ++ - , ?
, . , , . , , , , , , , - . .
Perl regexp , . . Russ Cox Regexp Perl vs. Thompson , .
Thompson/Cox, Ragel, . Ragel , . " ".
, , .
, IBM . , Davide Pasetto, , .
. , , -, . . , , .
, regexp , , . , - (, char).
, , , -, , . , , .
, . Regexp . , , . , , , . , ?
, . ... ummm...
, . - regexp . , - , 100% - , , . , , - ! , , , , , , ad-hoc- ( ). , .
LLVM, , - .
, , . , , .
, , , , , ( Google RE2) . RE, , .
If, however, you want to do this as a challenge and a learning exercise, and there is no deadline, then by all means give it a shot. Be prepared for a lot of work, and do not forget to write plenty of test cases, because there will be bugs. Your regular expression engine will be an essential foundation of the whole work product, and you cannot afford to have it misbehave in production.
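One cheap way to get a lot of test coverage is differential testing: run the same pattern and input through your engine and through a trusted reference, and flag every disagreement. Below is a minimal sketch of that idea, assuming your engine can be wrapped in a function that takes a pattern and a text and returns whether a match was found. `Engine`, `Case`, `find_disagreements` and `sample_cases` are hypothetical names made up for this example; the reference is `std::regex` with its default ECMAScript grammar, so it is only a fair oracle for syntax both engines interpret the same way.

```cpp
#include <cassert>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// One test input: a pattern and the text to search in.
struct Case {
    std::string pattern;
    std::string text;
};

// Hypothetical signature for the engine under test.
using Engine = std::function<bool(const std::string&, const std::string&)>;

// Trusted reference: std::regex_search with the default ECMAScript grammar.
bool reference_search(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

// Return every case on which `engine` disagrees with the reference.
std::vector<Case> find_disagreements(const Engine& engine,
                                     const std::vector<Case>& cases) {
    std::vector<Case> bad;
    for (const auto& c : cases) {
        if (engine(c.pattern, c.text) != reference_search(c.pattern, c.text))
            bad.push_back(c);
    }
    return bad;
}

// A few hand-written cases covering +, *, ^, $ and a character class.
std::vector<Case> sample_cases() {
    return {
        {"^ab*c$", "abbbc"},
        {"a+b", "b"},
        {"^b", "ab"},
        {"[0-9]+$", "abc123"},
        {"[0-9]+$", "123abc"},
    };
}

// Sanity check: the reference must never disagree with itself.
void self_check() {
    assert(find_disagreements(reference_search, sample_cases()).empty());
}
```

In practice you would pass your own engine as the first argument and grow the case list over time, ideally adding randomly generated patterns and inputs and keeping every disagreement you ever find as a regression test.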
Good luck