I use Text::Ngrams to determine the combinations of words in a string. However, I need to keep words that contain numbers. I decided that $o->{tokenrex} is what I need to change, but I cannot work out the correct regular expression for it.
The original is qr/([a-zA-Z]+|(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?)/, but I think I need something more along these lines:
qr/([a-zA-Z]+|(?<=\w)(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?(?=\w)|(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?)/;
If I read the regular expression correctly, it should match any run of alphabetic characters, or a “number” that has a word character before and after it, or a standalone “number”. Except that it still splits my "word" into separate tokens. The example I'm working with is " A1X ".
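The splitting can be reproduced without Text::Ngrams itself. The sketch below applies both patterns to " A1X " with a plain matching loop in core Perl; the helper name tokens_for is hypothetical, and the loop is only an assumption about how tokens get pulled out, not the module's actual code.

```perl
use strict;
use warnings;

# Original tokenrex quoted above.
my $original = qr/([a-zA-Z]+|(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?)/;
# Modified attempt quoted above.
my $modified = qr/([a-zA-Z]+|(?<=\w)(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?(?=\w)|(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?)/;

# Hypothetical helper: collect every successive match of $rex in $text.
sub tokens_for {
    my ($rex, $text) = @_;
    my @tokens;
    while ($text =~ /$rex/g) {
        push @tokens, $1;    # $1 is the outermost group, i.e. the whole match
    }
    return @tokens;
}

my @from_original = tokens_for($original, " A1X ");
my @from_modified = tokens_for($modified, " A1X ");

print join("|", @from_original), "\n";    # prints A|1|X
print join("|", @from_modified), "\n";    # prints A|1|X
```

Both patterns give three tokens, because each alternative matches either letters only or a number only. The look-behind and look-ahead only check the neighbouring characters and never add them to the match.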
Any help would be great.
You're making this harder than it needs to be. The original regular expression matches words consisting only of letters, or numbers (integers and floating point, including exponential notation).
If you need to match words consisting of letters and digits, the expression for that is [a-zA-Z\d]+. According to the module documentation, you also need to specify what to skip, and that is matched by [^a-zA-Z\d]+.
$self->{tokenrex} = qr/([a-z\d]+)/i;
$self->{skiprex} = qr/([^a-z\d]+)/i;
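To check the two patterns outside the module, here is a minimal sketch in core Perl. The sample sentence and the alternating skip/token loop are assumptions made for illustration; Text::Ngrams may apply tokenrex and skiprex differently internally.

```perl
use strict;
use warnings;

my $tokenrex = qr/([a-z\d]+)/i;     # letters and digits form one token
my $skiprex  = qr/([^a-z\d]+)/i;    # everything else is skipped

my $text = "model A1X costs 42 dollars";    # made-up sample input
my @words;

pos($text) = 0;
while (pos($text) < length $text) {
    if    ($text =~ /\G$skiprex/gc)  { next }               # discard separators
    elsif ($text =~ /\G$tokenrex/gc) { push @words, $1 }    # keep the token
    else                             { last }               # guard, not expected
}

print join("|", @words), "\n";    # prints model|A1X|costs|42|dollars
```

With these patterns A1X stays in one piece, since digits and letters belong to the same character class.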
If you also need to recognize numbers, as the original pattern from the module documentation does, please let me know and I will gladly add that. From your description, that does not seem to be what you need.
Note the two kinds of grouping: (?:foo) matches foo without capturing it, while (foo) matches and captures it.
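As a small illustration of the difference between (?:foo) and (foo), the following sketch matches a made-up string "foobar" both ways and compares what ends up in the capture list.

```perl
use strict;
use warnings;

my $text = "foobar";    # made-up sample input

# Capturing group: "foo" is stored in $1 and "bar" in $2.
my @captured = $text =~ /(foo)(bar)/;

# Non-capturing group: "foo" must still match, but only "bar" is captured.
my @uncaptured = $text =~ /(?:foo)(bar)/;

print scalar(@captured),   " captures: @captured\n";      # prints 2 captures: foo bar
print scalar(@uncaptured), " capture: @uncaptured\n";     # prints 1 capture: bar
```

This matters for a pattern like tokenrex, where every extra pair of plain parentheses adds another capture group.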
For example:
\p{L}*(?:\d*\.)?\d+(?:[eE][-+]?\d+)?(?:(?<=\p{L}(?:\d*\.)?\d+(?:[eE][-+]?\d+)?)\p{L}+)?
Broken down:
\p{L}*                     # Zero or more letter characters (note that this is broader than [a-zA-Z], as it allows accent marks and so forth)
(?:\d*\.)?\d+              # Slightly simplified version of your number-matching pattern
(?:(?<=\p{L}...)\p{L}+)?   # Optionally match trailing letters, but only if there are letters at the beginning
One open question concerns [eE]: in a string such as A3E4D, is the E an exponent marker or a letter?
(?<=...) and (?=...) are look-behind and look-ahead assertions: they check the text before or after the current position without including it in the match.
For example, with $_ = "A1X",
qr/(?<=A)1(?=X)/
matches $_, but the matched text (that is, $&) is 1, not A1X.
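The look-around behaviour on "A1X" can be checked directly. This sketch uses only core Perl; the variable names are illustrative, and the offsets are read from the built-in @- and @+ arrays.

```perl
use strict;
use warnings;

my $text = "A1X";
my $rex  = qr/(?<=A)1(?=X)/;

my ($matched, $start, $end);
if ($text =~ $rex) {
    $matched = $&;                      # text actually consumed by the match
    ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
}

print "matched '$matched' at offsets $start..$end\n";    # prints matched '1' at offsets 1..2
```

The A and the X are required for the match to succeed, but they are not part of it, which is why the token still comes out as 1.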
(A string such as A1B2C3D is also kept as a single token by the following pattern.)
qr/(\b[a-zA-Z]([a-zA-Z\d]+[a-zA-Z])?\b|(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?)/
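A quick way to try this pattern is to run it over a string that mixes both kinds of token. The sample input below is made up, and the loop is a plain core-Perl sketch rather than the way Text::Ngrams itself applies the pattern.

```perl
use strict;
use warnings;

my $rex = qr/(\b[a-zA-Z]([a-zA-Z\d]+[a-zA-Z])?\b|(\d+(\.\d+)?|\d*\.\d+)([eE][-+]?\d+)?)/;

my $text = "A1X 42 A1B2C3D 3.5e10";    # made-up sample input
my @tokens;
while ($text =~ /$rex/g) {
    push @tokens, $1;    # $1 is the outermost group, i.e. the whole token
}

print join("|", @tokens), "\n";    # prints A1X|42|A1B2C3D|3.5e10
```

Words that start and end with a letter are kept whole even when they contain digits, while standalone numbers still go through the original number alternative.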