Awk match() - multiple matches per line

I am using the match() function in gawk to grab links from an HTML file. The regex looks something like this:

match($0, /(<a href=\")([^\"]+)/, arr)

I can't seem to add a /g flag at the end to get multiple matches per line, though. Is there a way to do this?

2 answers

That's right: awk regular expressions do not take flags.
Also, match() has no built-in way to find a second or later match on the same line.
Only gsub() and gensub() act on every match (gensub() is a gawk extension). I would try something like this:

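s = $0
gsub(/<a href="/, "\001", s)                       # mark each link start; .* would be greedy and eat earlier links
s = gensub(/[^\001]*\001([^"]+)/, "\\1%", "g", s)  # gensub() returns its result; keep "URL%" per link
last = split(s, arr, "%")                          # split(string, array, separator): arr[1..last-1] are the URLs
delete arr[last]                                   # the last piece is the leftover text after the final link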

where % is a character that you can guarantee will not appear in the input.
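Another way that needs no sentinel character is to call match() repeatedly on whatever is left of the line: each call sets RSTART and RLENGTH for the leftmost match, and substr() then drops everything up to the end of that match before searching again. A minimal sketch, wrapped in a shell function; the name extract_links is hypothetical, and it assumes every link is written exactly as <a href="..." (double quotes, a single space), like the regex in the question. It uses only POSIX awk features, so it should also work with awk implementations other than gawk.

```shell
# extract_links (hypothetical helper): reads HTML on stdin and prints every
# href value, one per line, including several links on the same input line.
# Assumes links look exactly like <a href="..." as in the question's regex.
extract_links() {
    awk '{
        rest = $0
        # match() only reports the leftmost match, so loop over the remainder
        while (match(rest, /<a href="[^"]+/)) {
            # skip the 9-character prefix <a href=" and print just the URL
            print substr(rest, RSTART + 9, RLENGTH - 9)
            # resume the search right after the end of this match
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}
```

For example, printf '<p><a href="a.html">A</a> and <a href="b.html">B</a></p>\n' | extract_links prints a.html and b.html on separate lines.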


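Have you considered letting lynx extract the URLs for you? Run it with the -dump option: it prints the page as plain text and lists every link's URL at the end, under References.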

$ lynx -dump http://www.stackoverflow.com 

[snip]
References

   Visible links
   1. http://stackoverflow.com/opensearch.xml
   2. http://stackoverflow.com/feeds
   3. http://stackexchange.com/
   4. http://stackoverflow.com/users/login
   5. http://careers.stackoverflow.com/
   6. http://chat.stackoverflow.com/
[snip]
 676. http://creativecommons.org/licenses/by-sa/3.0/
 677. http://blog.stackoverflow.com/2009/06/attribution-required/

   Hidden links:
 678. http://www.peer1.com/stackoverflow
 679. http://creativecommons.org/licenses/by-sa/3.0/
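If you only want the URLs themselves, the numbered list at the end of the dump is easy to filter with awk. A minimal sketch, assuming the output layout shown above (a References heading followed by lines of the form "  N. URL"); lynx_urls is a hypothetical helper name:

```shell
# lynx_urls (hypothetical helper): reads `lynx -dump` output on stdin and
# prints only the URLs from the numbered list after the References heading.
# Assumes the layout shown above; numbered lines in the page body are ignored.
lynx_urls() {
    awk '
        # everything after the References heading is the link list
        /^References *$/ { in_refs = 1; next }
        # keep the second field of lines such as "  12. http://example.com/"
        in_refs && $1 ~ /^[0-9]+\.$/ { print $2 }
    '
}
```

Usage: lynx -dump http://www.stackoverflow.com | lynx_urls. Append | sort -u if you want to drop duplicates such as the repeated creativecommons.org link.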