Parsing an HTTP Response with Nokogiri

Hi, I am having trouble parsing Net::HTTPResponse objects using Nokogiri.

I use this function to get the website here:

def fetch(uri_str, limit = 10)


  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  url = URI.parse(URI.encode(uri_str.strip))
  puts url

  #get path
  req = Net::HTTP::Get.new(url.path,headers)
  #start TCP/IP
  response = Net::HTTP.start(url.host,url.port) { |http|
        http.request(req)
  }
  case response
  when Net::HTTPSuccess
    then #print final redirect to a file
    puts "this is location" + uri_str
    puts "this is the host #{url.host}"
    puts "this is the path #{url.path}"

    return response
    # if you get a 302 response
  when Net::HTTPRedirection 
    then 
    puts "this is redirect" + response['location']
    return fetch(response['location'],aFile, limit - 1)
  else
    response.error!
  end
end




html = fetch("http://www.somewebsite.com/hahaha/")
puts html
noko = Nokogiri::HTML(html)

When I do this, printing html produces a whole bunch of gibberish, and Nokogiri complains that "node_set should be Nokogiri::XML::NodeSet".

If anyone could offer help, I would greatly appreciate it.

1 answer

First, your fetch method returns a Net::HTTPResponse object, not just the body. You need to pass the body to Nokogiri.

response = fetch("http://www.somewebsite.com/hahaha/")
puts response.body
noko = Nokogiri::HTML(response.body)
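
To make the distinction concrete, here is a minimal sketch. The name html_source is a hypothetical helper invented for illustration; it is not part of Net::HTTP or Nokogiri. It assumes that anything responding to body is a response-like object and that anything else is already an HTML string.

```ruby
# Hypothetical helper (not part of Net::HTTP or Nokogiri): returns the HTML
# string that a parser such as Nokogiri::HTML expects.
def html_source(response_or_string)
  if response_or_string.respond_to?(:body)
    # Response-like object: hand back its body (empty string if body is nil).
    response_or_string.body.to_s
  else
    # Already a string (or something printable): pass it through.
    response_or_string.to_s
  end
end
```

With this in place, Nokogiri::HTML(html_source(fetch(...))) would receive a string whether fetch returns the response or the body.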

I updated your script so that it runs (below). Several things were undefined.

require 'nokogiri'
require 'net/http'

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  # URI.encode was removed in Ruby 3.0, so this assumes uri_str is already
  # percent-encoded where needed.
  url = URI.parse(uri_str.strip)
  puts url

  # Build the GET request; request_uri keeps the query string, if any.
  headers = {}
  req = Net::HTTP::Get.new(url.request_uri, headers)
  # Open the TCP connection and send the request.
  response = Net::HTTP.start(url.host, url.port) { |http|
    http.request(req)
  }

  case response
  when Net::HTTPSuccess
    # Final response: report where it came from and return it.
    puts "this is location " + uri_str
    puts "this is the host #{url.host}"
    puts "this is the path #{url.path}"
    return response
  when Net::HTTPRedirection
    # Redirect (for example a 302): follow the Location header.
    puts "this is redirect " + response['location']
    return fetch(response['location'], limit - 1)
  else
    response.error!
  end
end

response = fetch("http://www.google.com/")
puts response
noko = Nokogiri::HTML(response.body)
puts noko
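
One thing the script does not cover: the recursive call passes response['location'] straight back into fetch, which works when the server sends an absolute URL. Some servers send a relative Location such as "/login" instead. A rough sketch of how that could be handled follows; resolve_redirect is a hypothetical helper name, and the URLs in the usage note are placeholders.

```ruby
require 'uri'

# Hypothetical helper: resolve a Location header against the URL that was
# just requested, so that relative redirects still yield a full URL.
# Absolute Location values are returned unchanged.
def resolve_redirect(current_url, location)
  URI.join(current_url, location).to_s
end
```

Inside fetch, the redirect branch could then call fetch(resolve_redirect(uri_str, response['location']), limit - 1). For example, resolve_redirect("http://www.somewebsite.com/hahaha/", "/other/page") gives "http://www.somewebsite.com/other/page".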

