I am trying to grab a subdomain from huge lists of domain names. For example, I want to capture "funstuff" from "funstuff.mysite.com", without including ".mysite.com" in the match. These occurrences are buried in a sea of text, so I cannot rely on them being at the beginning of a line. I know that the subdomain will not contain any special characters or numbers. So far I have:
[a-z]{2,10}(?=\.mysite\.com)
The problem is that this will only work if the subdomain is NOT preceded by a number or a special character. For example, "asdfbasdasdfdfunstuff.mysite.com" will return "fdfunstuff", but "asdfasf23 / funstuff.mysite.com" will not match.
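To make the first pattern's behaviour concrete, here is a small sketch using Python's `re` module. Python is an assumption: the question does not say which regex flavor is in use, though plain lookaheads like this behave the same way in most engines. The helper name `find_subdomains` is hypothetical.

```python
import re

# The pattern from the question: 2-10 lowercase letters immediately followed
# by ".mysite.com". The lookahead checks for ".mysite.com" without consuming it.
PATTERN = re.compile(r"[a-z]{2,10}(?=\.mysite\.com)")

def find_subdomains(text):
    """Hypothetical helper: return every substring PATTERN matches in text."""
    return PATTERN.findall(text)

# When the subdomain is glued to other letters, the engine tries start
# positions left to right. The first start whose next 2-10 letters are
# followed by ".mysite.com" wins, so only the last ten letters come back.
print(find_subdomains("asdfbasdasdfdfunstuff.mysite.com"))  # ['fdfunstuff']
```

Because the lookahead is zero-width, the engine resumes scanning at ".mysite.com" after each match, so `findall` can pull several subdomains out of one long run of text.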
I cannot rely on a special character such as "/" appearing in front of the subdomain (as in "http://funstuff.mysite.com"), so it cannot be used as part of the condition.
It is fine if the match picks up some stray text before the subdomain, although in 99% of cases the subdomain will be preceded by something other than a lowercase letter. I tried:
(?<=[^a-z])[a-z]{2,10}(?=\.mysite\.com)
but for some reason this fails to match in a case such as:
afb"asdfunstuff.mysite.com
It is as if the quotation mark prevents [a-z]{2,10} from matching. What I would like in this case is to capture "asdfunstuff" from "asdfunstuff.mysite.com". How can I do that?
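A quick check in Python's `re` (again an assumption about the regex flavor) shows two things about the lookbehind attempt. First, "asdfunstuff" is 11 letters, one more than `{2,10}` allows, so nothing after the quotation mark can satisfy the pattern. Second, a positive lookbehind `(?<=[^a-z])` needs a preceding character to exist, so it also fails when the subdomain starts the string. The `NEG_LOOKBEHIND` variant below, a negative lookbehind `(?<![a-z])` with an upper bound of 20, is a hypothetical adjustment and not something from the question; the bound of 20 is arbitrary.

```python
import re

# The lookbehind attempt from the question: the character before the match
# must exist and must not be a lowercase letter.
LOOKBEHIND = re.compile(r"(?<=[^a-z])[a-z]{2,10}(?=\.mysite\.com)")

# Hypothetical adjustment: a negative lookbehind ("not preceded by a
# lowercase letter") also succeeds at the very start of the string.
# The upper bound of 20 is an arbitrary assumption.
NEG_LOOKBEHIND = re.compile(r"(?<![a-z])[a-z]{2,20}(?=\.mysite\.com)")

sample = 'afb"asdfunstuff.mysite.com'

# "asdfunstuff" has 11 letters, more than {2,10} allows, so no match.
print(LOOKBEHIND.findall(sample))      # []
print(NEG_LOOKBEHIND.findall(sample))  # ['asdfunstuff']

# With nothing before the subdomain, only the negative form matches.
print(LOOKBEHIND.findall("funstuff.mysite.com"))      # []
print(NEG_LOOKBEHIND.findall("funstuff.mysite.com"))  # ['funstuff']
```

One trade-off with the negative form: it always starts at the beginning of a letter run, so a run longer than the cap (such as the 21-letter "asdfbasdasdfdfunstuff") no longer yields a trailing fragment and simply does not match.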