Parsing Transact SQL with RegEx

I'm pretty inexperienced with RegEx. I have only used the odd straightforward RegEx for a programming task, worked out by trial and error, but now I have a serious RegEx problem:

I have about 970 text files containing Sybase Transact SQL fragments, and I need to find every table name in these files and prefix it with "#". My options are either to spend a week editing the files by hand, or to write a script or application using RegEx (Python 3 or Delphi-PCRE) that performs the task.

The rules are as follows:

Table names are ALWAYS upper case, so I'm only looking for upper-case words;

Column names, SQL expressions, and variables are ALWAYS lower case;

SQL keywords, table aliases, and column values MAY BE upper case, but MUST NOT be prefixed with '#';

Table aliases (which must not be prefixed) are always separated by white space from the end of the previous word, which is the table name;

Column values (which must not be prefixed) are either numeric values or characters enclosed in quotes.

Here is an example of a text requiring all of the above rules to apply:

update SYBASE_TABLE
set ok = convert(char(10),MB.limit)
from MOVE_BOOKS MB, PEOPLEPLACES PPL
where MB.move_num = PPL.move_num
AND PPL.mot_ind = 'B'
AND PPL.trade_type_ind = 'P'

So far I've only got this far (which isn't very far...):

- [[:upper:]] (me?)

Any help would be greatly appreciated. TIA

Mn

1 answer

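A simple regex replacement is not enough here, because it cannot tell an upper-case table name from upper-case words inside a string literal or a comment: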

update TABLE set x='NOT_A_TABLE' where y='NOT TABLES EITHER' 
-- AND NO TABLES HERE AS WELL
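
To make the problem concrete, here is a minimal sketch of the naive approach. The pattern and variable names are my own illustration, not something from the question: it prefixes every upper-case word wherever it appears, so the string literals and the comment get rewritten too.

```python
import re

sql = """update TABLE set x='NOT_A_TABLE' where y='NOT TABLES EITHER'
-- AND NO TABLES HERE AS WELL"""

# Naive attempt: prefix every upper-case word, regardless of context.
naive = re.sub(r"\b[A-Z][A-Z_]*\b", lambda m: "#" + m.group(), sql)
print(naive)
```

Only the first TABLE should have been touched, but the words inside the quotes and after the `--` are prefixed as well.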

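Nor can you get around this by naively pairing up quote characters, because comments may contain a quote and string literals may contain escaped (doubled) quotes. SQL like the following is valid: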

-- a quote: '
update TABLE set x=42 where y=666
-- another quote: '

update TABLE set x='not '' A '''' table' where y=666 
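
As a small sketch of how these two cases can be handled (the variable names are hypothetical), the pattern below tries a comment first and a string literal second. A quote inside a comment is then swallowed by the comment, and a doubled quote is consumed as part of the literal instead of ending it.

```python
import re

# Comment alternative first, string-literal alternative second.
token = re.compile(r"(--[^\r\n]*)|('(?:''|[^\r\n'])*')")

with_comments = "-- a quote: '\nupdate TABLE set x=42 where y=666\n-- another quote: '"
with_escapes = "update TABLE set x='not '' A '''' table' where y=666"

# Group 1 holds comments, group 2 holds string literals.
comments = [m.group(1) for m in token.finditer(with_comments) if m.group(1)]
literals = [m.group(2) for m in token.finditer(with_escapes) if m.group(2)]

print(comments)
print(literals)
```

The first snippet yields two comments and no string literal. The second yields one literal that spans the whole quoted value, including the upper-case A.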

EDIT II

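A workable approach is to scan the input from start to end and, at each position, try to match one of the following, in this order: a comment, a string literal, a keyword, or a capitalized word. Anything else is consumed one character at a time. Only when alternative 4 (the capitalized word) matches is a "#" put in front of the match.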

Python:

#!/usr/bin/env python3
import re

sql = """
UPDATE SYBASE_TABLE
SET ok = convert(char(10),MB.limit) -- ignore me!
from MOVE_BOOKS MB, PEOPLEPLACES PPL
where MB.move_num = PPL.move_num
-- comment '
AND PPL.mot_ind = 'B '' X'
-- another comment '
AND PPL.trade_type_ind = 'P -- not a comment'
"""

regex = r"""(?xs)          # x = enable inline comments, s = enable DOT-ALL
  (--[^\r\n]*)             # [1] comments
  |                        # OR
  ('(?:''|[^\r\n'])*')     # [2] string literal
  |                        # OR
  (\b(?:AND|UPDATE|SET)\b) # [3] keywords
  |                        # OR
  ([A-Z][A-Z_]*)           # [4] capitalized word
  |                        # OR
  .                        # [5] fall through: matches any char
"""

output = ''

for m in re.finditer(regex, sql):
    # prepend a `#` if group(4) matched
    if m.group(4):
        output += '#'
    # append the matched text (any of the groups!)
    output += m.group()

# print the adjusted SQL
print(output)

This produces:

UPDATE #SYBASE_TABLE
SET ok = convert(char(10),#MB.limit) -- ignore me!
from #MOVE_BOOKS #MB, #PEOPLEPLACES #PPL
where #MB.move_num = #PPL.move_num
-- comment '
AND #PPL.mot_ind = 'B '' X'
-- another comment '
AND #PPL.trade_type_ind = 'P -- not a comment'

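This may not be exactly the output you want (the table aliases are prefixed too), but the script should be simple enough to adjust to your needs.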
