C is the last language I would choose for this. First, if you want to do this reliably, use a MIME parser to extract the HTML body of the message. Java has mime4j, Perl has MIME::Parser, Python has the email package, and so on. It is not that difficult, and I'm happy to help with this step in any of these languages if you want. Second, use an HTML parser to isolate the links.
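To give an idea of how the two steps fit together, here is a minimal sketch in Python using only the standard library (`email` for the MIME part, `html.parser` for the links). The names `LinkCollector` and `extract_links` are made up for this example, and it assumes the raw message is already available as a string and that only `href` attributes on `<a>` tags are of interest.

```python
from email import message_from_string, policy
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(raw_message):
    """Return the link targets found in the text/html parts of a raw email."""
    # Step 1: let the MIME parser split the message into parts and decode them.
    msg = message_from_string(raw_message, policy=policy.default)
    collector = LinkCollector()
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding and charset for us.
            # Step 2: hand the decoded HTML to the HTML parser.
            collector.feed(part.get_content())
    collector.close()
    return collector.links
```

Calling `extract_links(raw)` on a message read from a file or a pipe returns a plain list of URLs. If the message comes in as bytes, `email.message_from_bytes` is probably the better entry point.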
, Perl PHP. . . URL-, sed. URL-, , , - , , url_encode, P- .