I'm not sure whether there is explicit support for zero-length matches (for example, ^ in Perl-style regular expressions, which matches a position rather than a substring). However, you can have your lexer turn newlines into an explicit token, something like this:
parser.mly
%token EOL
%token <int> EOLWS
/* other declarations here */
%%
main:
    EOL stmt { MyStmtDataType(0, $2) }
  | EOLWS stmt { MyStmtDataType($1 - 1, $2) } /* $1 counts the newline too, hence - 1 */
;
lexer.mll
{
open Parser
exception Eof
}
rule token = parse
    [' ' '\t'] { token lexbuf } (* skip other blanks *)
  | ['\n'][' ']+ as lxm { EOLWS(String.length(lxm)) }
  | ['\n'] { EOL }
  (* ... *)
This is not tested, but the general idea is:
- Treat newlines as statement starters
- Measure the run of spaces that immediately follows the newline and pass its length as an int
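To make the two steps above concrete without running ocamllex, here is a minimal hand-written sketch of the same idea in plain OCaml. The `token` type and the `tokenize` function are hypothetical stand-ins, not part of the generated lexer, and `Other` is only a placeholder for whatever tokens the real lexer would produce. Unlike the lexer rule above, `EOLWS` here carries the number of spaces directly, so no `- 1` adjustment is needed.

```ocaml
(* Hypothetical stand-ins for the parser's tokens. *)
type token =
  | EOL            (* newline with no spaces after it *)
  | EOLWS of int   (* newline followed by n >= 1 spaces *)
  | Other of char  (* placeholder for every other token *)

(* Turn each newline, together with the spaces right after it, into one token. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else if s.[i] = '\n' then begin
      (* count the spaces that immediately follow the newline *)
      let j = ref (i + 1) in
      while !j < n && s.[!j] = ' ' do incr j done;
      let width = !j - (i + 1) in
      let tok = if width = 0 then EOL else EOLWS width in
      go !j (tok :: acc)
    end
    else if s.[i] = ' ' || s.[i] = '\t' then
      go (i + 1) acc  (* skip other blanks *)
    else
      go (i + 1) (Other s.[i] :: acc)
  in
  go 0 []
```

For example, `tokenize "\na\n  b"` yields `[EOL; Other 'a'; EOLWS 2; Other 'b']`, so the parser sees the indentation width as part of the token that starts each line.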
Caution: you need to pre-process your input so that it starts with a \n if it does not already begin with one; otherwise the first statement is not preceded by an EOL or EOLWS token.
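The pre-processing step could look roughly like this, assuming the whole input is available as a string. The name `ensure_leading_newline` is made up for illustration.

```ocaml
(* Prepend a newline unless the input already starts with one,
   so that the first statement is also introduced by a newline token. *)
let ensure_leading_newline (s : string) : string =
  if String.length s > 0 && s.[0] = '\n' then s else "\n" ^ s
```

The result can then be handed to `Lexing.from_string` to build the lexbuf that the generated lexer reads from.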