Parsing string for domain / hostName

Users can enter websites (domain names). They can also enter the mail addresses of their contacts.

Now we need to find customers whose website domain can be associated with those mail addresses.

So my idea is to extract the host from the website and from the mail address and compare them.

So what is the most reliable algorithm to get the hostname from the URL?

For example, the input can be:

foo.com
www.foo.com
http://foo.com
https://foo.com
https://www.foo.com

The result should always be foo.com

4 answers

Instead of relying on an unreliable regular expression, use System.Uri to do the parsing for you. Use code like this:

string uriStr = "www.foo.com";
if (!uriStr.Contains(Uri.SchemeDelimiter)) {
    uriStr = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriStr);
}
Uri uri = new Uri(uriStr);
string domain = uri.Host; // will return www.foo.com

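Note that GetLeftPart(UriPartial.Authority) returns the scheme and host, not the bare domain, so to get foo.com, strip the leading "www." from the host instead:

string authority = uri.GetLeftPart(UriPartial.Authority); // returns "http://www.foo.com"
string bare = uri.Host.StartsWith("www.", StringComparison.OrdinalIgnoreCase) ? uri.Host.Substring(4) : uri.Host; // returns "foo.com"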
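
The same idea (prepend a scheme when it is missing, then let a real URL parser do the work) can also be sketched in JavaScript with the standard URL class. This is only an illustration: the helper name hostWithoutWww is made up here, and stripping just a leading "www." is a simplification, not a full registrable-domain lookup.

```javascript
// Hypothetical helper: normalise user input to a host name without "www.".
function hostWithoutWww(input) {
  let text = input.trim();
  // The URL constructor requires a scheme, so prepend one when it is missing.
  if (!text.includes('://')) {
    text = 'http://' + text;
  }
  // The parser lowercases the host name for http/https URLs.
  const host = new URL(text).hostname;
  return host.startsWith('www.') ? host.slice(4) : host;
}

// The sample inputs from the question.
const samples = [
  'foo.com',
  'www.foo.com',
  'http://foo.com',
  'https://foo.com',
  'https://www.foo.com',
];
console.log(samples.map(hostWithoutWww));
```

Each of the five inputs from the question should come out as foo.com. Input that the parser rejects makes the URL constructor throw, so real code would likely wrap the call in try/catch.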
+10

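Alternatively, a regular expression can pick the host out of the URL. It treats the http or https scheme as optional, and the www. prefix as well: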

var expression = /(https?:\/\/)?(www\.)?([^\/]*)(\/.*)?$/;

Then use it like this:

var result = 'https://www.foo.com.vu/blah'.replace(expression, '$3')

result === 'foo.com.vu'
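
To tie this back to the question, the expression above could be applied to each website, and the part after the "@" taken from each mail address, so that the two can be compared. A minimal sketch, assuming simple well-formed input; the helper names websiteHost, mailDomain, and sameDomain are made up for illustration:

```javascript
// Same expression as in the answer above.
const expression = /(https?:\/\/)?(www\.)?([^\/]*)(\/.*)?$/;

// Hypothetical helper: reduce a website string to its host without "www.".
function websiteHost(website) {
  return website.trim().toLowerCase().replace(expression, '$3');
}

// Hypothetical helper: take everything after the last "@" of a mail address.
function mailDomain(address) {
  return address.slice(address.lastIndexOf('@') + 1).trim().toLowerCase();
}

// Hypothetical helper: true when the website and the mail address share a domain.
function sameDomain(website, address) {
  return websiteHost(website) === mailDomain(address);
}

console.log(sameDomain('https://www.foo.com', 'jane@foo.com')); // true
console.log(sameDomain('foo.com', 'jane@bar.com')); // false
```

This compares exact host names only, so mail.foo.com and foo.com would not match; whether subdomains should count as the same customer is a separate decision.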
+1
