Does OCamllex match the beginning of a line?

Question

I am writing a toy programming language in OCaml with ocamllex and want to make the language sensitive to indentation changes, Python-style, but I am stuck on matching the beginning of a line in ocamllex regex rules. I tried using ^ to match the beginning of a line, but in OCaml that is the string concatenation operator. Google searches unfortunately did not turn up much :( Does anyone know how to make this work?

+3

ocaml ocamllex

Paul woolcock Mar 15 '11 at 16:17

1 answer

phooji · Accepted Answer · 2011-03-15T17:00:40+0000

I'm not sure whether there is explicit support for zero-length matches (for example, ^ in Perl-style regular expressions, which matches a position rather than a substring). However, you should be able to have your lexer turn newlines into an explicit token, something like this:

parser.mly

%token EOL
%token <int> EOLWS
/* other stuff here */
%%
main:
    EOL stmt                { MyStmtDataType(0, $2) }       /* no indentation */
  | EOLWS stmt              { MyStmtDataType($1 - 1, $2) }  /* $1 also counts the newline */
 ;

lexer.mll

{
 open Parser
 exception Eof
}
rule token = parse
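    [' ' '\t']           { token lexbuf }     (* skip blanks inside a line *)
  | ['\n'][' ']+ as lxm  { EOLWS(String.length(lxm)) }  (* newline plus indentation; the length includes the newline *)
  | ['\n']               { EOL }              (* newline with no indentation *)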
  (* ... *)

This is not tested, but the general idea is:

Treat newlines as statement starters
Measure the run of spaces that immediately follows the newline and pass its length along as an int
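
To try the two ideas above without running ocamllex, here is a minimal plain-OCaml sketch of what the newline rules compute. The type line_start and the function line_start_at are hypothetical names introduced only for this illustration; they mirror the EOL and EOLWS tokens but are not part of the generated lexer.

```ocaml
(* Hypothetical stand-in for the EOL / EOLWS tokens declared in parser.mly. *)
type line_start = EOL | EOLWS of int

(* Given a source string and the index of a newline character, return the
   token the lexer rules above would produce, together with the index of the
   first character after the matched text. As in the lexer, the int carried
   by EOLWS counts the newline itself plus the spaces that follow it. *)
let line_start_at (s : string) (i : int) : line_start * int =
  assert (s.[i] = '\n');
  let n = String.length s in
  (* Advance over consecutive spaces starting at index j. *)
  let rec skip j = if j < n && s.[j] = ' ' then skip (j + 1) else j in
  let j = skip (i + 1) in
  if j = i + 1 then (EOL, j) else (EOLWS (j - i), j)
```

Subtracting 1 from the EOLWS payload, as the grammar action does, gives the indentation width in spaces.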

Caution: the input must start with a \n, so pre-process it to add one if it does not; otherwise the first statement has no EOL or EOLWS token in front of it.
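
The pre-processing step could look roughly like the sketch below. The helper names ensure_leading_newline and lexbuf_of_source are assumptions made up for this example; only Lexing.from_string comes from the standard library.

```ocaml
(* Hypothetical helper: prepend a newline unless the source already starts
   with one, so that the first statement is also preceded by EOL or EOLWS. *)
let ensure_leading_newline (s : string) : string =
  if String.length s > 0 && s.[0] = '\n' then s else "\n" ^ s

(* Hypothetical helper: build the lexbuf that would be handed to the
   generated lexer and parser. *)
let lexbuf_of_source (s : string) : Lexing.lexbuf =
  Lexing.from_string (ensure_leading_newline s)
```

The parser entry point would then be called with the generated lexer's token function and lexbuf_of_source source instead of a lexbuf built directly from the raw text.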
