Using a PEG parser for BBCode: pegjs or ... what?

I have a bbcode → html converter that responds to the change event of a text box. Currently this is done with a series of regular expressions, and there are a number of pathological cases. I've always wanted to sharpen my pencil on this grammar, but didn't want to get into yak shaving. But ... I recently learned about pegjs, which seems to be a pretty complete implementation of PEG parser generation. I have most of the grammar specified, but now I'm left wondering whether this is an appropriate use of a full-blown parser.

My specific questions are:

  • Since my application relies on translating what I can into HTML and leaving the rest as raw text, does it make sense to implement bbcode with a parser that can fail on a syntax error? For example, [url=/foo/bar]click me![/url] will certainly parse once the closing bracket of the close tag has been typed. But what would the user see in the meantime? With regexes, I can simply ignore anything that doesn't match and treat it as plain text for preview purposes. With a formal grammar, I don't know whether this is possible, because I rely on creating the HTML from the parse tree, and when the parse fails I get ... what?

  • I'm not sure where the transformations should be done. In a formal lex/yacc parser, I would have header files and symbols denoting the node type. In pegjs, I get nested arrays containing the node text. I can emit the translated code from an action of the pegjs-generated parser, but combining the parser and the emitter seems like a code smell. However, if I call PEG.parse.parse(), I get back something like this:

[
   [
      "[",
      "img",
      "",
      [ "/", "f", "o", "o", "/", "b", "a", "r" ],
      "",
      "]"
   ],
   [
      "[/",
      "img",
      "]"
   ]
]

given a grammar like:

document
   = (open_tag / close_tag / new_line / text)*

open_tag
   = ("[" tag_name "="? tag_data? tag_attributes? "]")


close_tag
   = ("[/" tag_name "]") 

text
   = non_tag+

non_tag
   = [^\n\[\]]

new_line
   = ("\r\n" / "\n")
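
One way to keep the emitter out of the grammar is to post-process the raw result into typed nodes first. The following is only a sketch, assuming the parser output has exactly the array shapes shown above; flatten and toNode are hypothetical helper names, not part of pegjs:

```javascript
// Hypothetical helper: joins a (possibly nested) array of characters
// into a single string; plain strings pass through unchanged.
function flatten(value) {
  return Array.isArray(value) ? value.map(flatten).join("") : String(value);
}

// Turns one raw entry of the parse result into a typed node.
// Assumption: close tags are arrays starting with "[/", open tags are
// six-element arrays starting with "[", and anything else is text.
function toNode(raw) {
  if (Array.isArray(raw) && raw[0] === "[/") {
    return { type: "close", name: flatten(raw[1]) };
  }
  if (Array.isArray(raw) && raw[0] === "[" && raw.length === 6) {
    return { type: "open", name: flatten(raw[1]), data: flatten(raw[3]) };
  }
  return { type: "text", value: flatten(raw) };
}

// The raw result from the question, fed through the helper.
const raw = [
  ["[", "img", "", ["/", "f", "o", "o", "/", "b", "a", "r"], "", "]"],
  ["[/", "img", "]"]
];
const nodes = raw.map(toNode);
console.log(nodes);
```

A separate emitter could then walk nodes and switch on type, so the grammar itself stays free of HTML.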

, , , . , , , , node, , . , , , .

? ?


, , . , , , " ", . Peg.js , , , , .

, . , ,

text
   = text:non_tag+ {
     // we captured the text in an array and can manipulate it now
     return text.join("");
   }
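
To make the effect of that action concrete, here is the same step in plain JavaScript. This is just an illustration: textAction is a hypothetical name for the action body, and the sample array stands in for what non_tag+ would match on the input "click me!".

```javascript
// Without an action, `non_tag+` yields one string per matched character.
const matched = ["c", "l", "i", "c", "k", " ", "m", "e", "!"];

// The body of the action above, written as an ordinary function.
function textAction(text) {
  return text.join("");
}

const result = textAction(matched);
console.log(result);
```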

. , pre.js pullrequest, . , .


( ):

incomplete_tag = ("[" tag_name "="? tag_data? tag_attributes?)
//                         the closing bracket is omitted ---^
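
Alongside a rule like incomplete_tag, the caller can also guard the parse call itself, so the preview falls back to plain text the way the regex version in the question does. A minimal sketch, assuming the generated parser throws on a syntax error (as pegjs parsers do); renderPreview, escapeHtml and the parse/emit parameters are hypothetical names, and the stubs exist only to make the example runnable:

```javascript
// Escapes the characters that are significant in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// `parse` stands in for the generated parser's parse function and `emit`
// for whatever turns its result into HTML; both are assumptions.
function renderPreview(source, parse, emit) {
  try {
    return emit(parse(source));
  } catch (err) {
    // Parsing (or emitting) failed, for example because a tag is still
    // being typed: show the raw input as escaped plain text instead.
    return escapeHtml(source);
  }
}

// Stub parser for illustration only: rejects input with an unclosed "[".
function stubParse(source) {
  if (source.lastIndexOf("[") > source.lastIndexOf("]")) {
    throw new SyntaxError("unclosed tag");
  }
  return source;
}

// Stub emitter for illustration only: wraps the text in a paragraph.
const stubEmit = (tree) => "<p>" + escapeHtml(tree) + "</p>";

console.log(renderPreview("[b]hi[/b", stubParse, stubEmit));
console.log(renderPreview("a < b", stubParse, stubEmit));
```

Note that the catch block here swallows every error, including bugs in the emitter; a real implementation would probably want to check the error type before falling back.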

open_tag document, . , , - . incomplete_tag .

( ):

. - Javascript, pegjs, i. . !

, { return result.join("") }, , pegjs . . pegjs , . , .

. PEG Python. : .


Try something like this replacement rule. You are on the right track; you just have to tell it to collect the results.

text = result:non_tag+ { return result.join(''); }
