I am building a link-sharing site with Ruby on Rails, where users exchange links to web pages.
I would like to extract some representative images for each page (like on Facebook when you share a link).
I currently use the opengraph gem to parse the og:image meta tag first, and then use Nokogiri to parse the page content and collect the src attribute of every <img> tag. This gives good results (apart from some decorative images, so I filter the results by size...).
-
Now I would like to go further and analyze the CSS background-image property: the website logo is often displayed as the background of an <h1> or <a> tag.
I am thinking of the following process:
... and absolute URLs according to document URLs.
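To make the last step concrete, here is a minimal sketch of pulling background image URLs out of a chunk of CSS and resolving them to absolute URLs. It uses only Ruby's standard library; the method name background_image_urls and the BACKGROUND_URL regex are my own illustration, not part of any gem, and a regex is only an approximation of real CSS parsing (it ignores comments, @import, and selector matching).

```ruby
require "uri"

# Matches `background` or `background-image` declarations and captures the
# url(...) argument, with optional single or double quotes around it.
# Hypothetical helper constant, not taken from any library.
BACKGROUND_URL = /background(?:-image)?\s*:[^;}]*?url\(\s*(['"]?)(.*?)\1\s*\)/i

# Returns the absolute URLs of all background images declared in `css`.
# `base_url` is assumed to be the stylesheet's own URL for external CSS,
# or the page URL for inline <style> blocks and style="" attributes.
def background_image_urls(css, base_url)
  css.scan(BACKGROUND_URL).filter_map do |_quote, raw|
    # Skip empty values and inline data: URIs, which are not fetchable images.
    next if raw.empty? || raw.start_with?("data:")

    begin
      URI.join(base_url, raw).to_s
    rescue URI::Error
      nil # Ignore values that are not valid URLs.
    end
  end.uniq
end

css = <<~CSS
  h1 { background: #fff url("/img/logo.png") no-repeat; }
  a.brand { background-image: url(../sprites/brand.gif); }
  .icon { background-image: url(data:image/png;base64,AAAA); }
  p { background-color: red; }
CSS

puts background_image_urls(css, "http://example.com/assets/css/site.css")
```

Note that relative URLs in an external stylesheet resolve against the stylesheet's URL, not the page's, which is why base_url is passed in explicitly. The resulting candidates would still need the same size filtering I already apply to <img> results.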
-
My questions:
Do you think there is a more effective alternative?
Is there any library that can improve process performance?
, HTML + CSS, CSS DOM, HTML (h1, a,...) .