Does OCamllex match the beginning of a line?

I am writing a toy programming language in OCaml with ocamllex and want to make the language sensitive to indentation changes, Python-style, but I have a problem matching the beginning of a line in ocamllex regex rules. I tried using ^ to match the beginning of a line, but in OCaml ^ is the string concatenation operator. My Google searches unfortunately did not turn up much :( Does anyone know how to make this work?

1 answer

I'm not sure whether ocamllex has explicit support for zero-length matches (for example, ^ in Perl-style regular expressions, which matches a position rather than a substring). However, you can have your lexer turn newlines into an explicit token, something like this:

parser.mly

%token EOL
%token <int> EOLWS
/* other declarations (%start, %type, more tokens) go here */
%%
main:
    EOL stmt                { MyStmtDataType(0, $2) }
  | EOLWS stmt              { MyStmtDataType($1 - 1, $2) }
 ;

lexer.mll

{
 open Parser
 exception Eof
}
rule token = parse
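    [' ' '\t']           { token lexbuf }     (* skip blanks that are not at the start of a line *)
  | ['\n'][' ']+ as lxm  { EOLWS(String.length lxm) }  (* lxm includes the '\n', hence the "- 1" in the parser *)
  | ['\n']               { EOL }              (* newline with no indentation *)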
  (* ... *)

This is not tested, but the general idea is:

  • Treat newlines as statement starters
  • Measure the whitespace that immediately follows the newline and pass its length as an int
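
To make the second point concrete, here is a minimal sketch in plain OCaml (no ocamllex) of the measurement the EOL / EOLWS rules are meant to perform. The function name indent_widths is hypothetical and only for illustration; like the lexer rules above, it counts spaces only, not tabs.

```ocaml
(* Hypothetical helper, not part of ocamllex: for every '\n' in [src],
   count the spaces that immediately follow it. A width of 0 corresponds
   to the EOL token, a positive width to EOLWS. *)
let indent_widths (src : string) : int list =
  let n = String.length src in
  (* Count consecutive spaces starting at index [i]. *)
  let rec count_spaces i acc =
    if i < n && src.[i] = ' ' then count_spaces (i + 1) (acc + 1) else acc
  in
  (* Walk the string and record one width per newline. *)
  let rec scan i acc =
    if i >= n then List.rev acc
    else if src.[i] = '\n' then scan (i + 1) (count_spaces (i + 1) 0 :: acc)
    else scan (i + 1) acc
  in
  scan 0 []
```

For example, indent_widths "\nif x\n  y\n    z\nw" evaluates to [0; 2; 4; 0], one width per statement line.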

Caution: you need to pre-process your input so that it starts with a \n if it does not already start with one.
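
One possible way to do that pre-processing, as an untested-in-context sketch: the helper names ensure_leading_newline and lexbuf_of_source are made up for this example, while Lexing.from_string is the standard library function for building a lexbuf from a string.

```ocaml
(* Hypothetical helper: prepend a '\n' unless the source already starts
   with one, so that the first statement is also preceded by EOL/EOLWS.
   Note that ^ here is OCaml's string concatenation operator. *)
let ensure_leading_newline (src : string) : string =
  if String.length src > 0 && src.[0] = '\n' then src else "\n" ^ src

(* Build the lexbuf that would be handed to the generated lexer. *)
let lexbuf_of_source (src : string) : Lexing.lexbuf =
  Lexing.from_string (ensure_leading_newline src)
```

The resulting lexbuf can then be passed to the generated entry point in the usual way, for example Parser.main Lexer.token lexbuf, assuming main is declared as the start symbol.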
