How to create a Python script that captures text from one site and passes it to another?

I would like to create a Python script that captures the digits of Pi from this site: http://www.piday.org/million.php and posts them to this site: http://www.freelove-forum.com/index.php. I am NOT spamming or pulling a prank; this is an inside joke with the creator and webmaster, a belated Pi Day celebration, if you like.

3 answers

Import urllib2 and BeautifulSoup:

import urllib2
from BeautifulSoup import BeautifulSoup

Specify the URL and fetch it using urllib2:

url = 'http://www.piday.org/million.php'
response = urllib2.urlopen(url)

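Then parse the response with BeautifulSoup. It builds a tree from the page's tags, which you can search for the tags that hold the data you want.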

soup = BeautifulSoup(response)

pi = soup.findAll('TAG')

where "TAG" is the name of the tag that contains the digits of pi.
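If BeautifulSoup isn't available, here is a rough Python 3 standard-library sketch of the same idea using html.parser instead: collect the text inside every matching tag. Unlike findAll, it returns plain strings rather than Tag objects, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every <tag_name> element (hypothetical helper)."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name  # html.parser reports tag names in lowercase
        self.depth = 0            # > 0 while we are inside a matching tag
        self.results = []         # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            if self.depth == 0:
                self.results.append('')  # start a new result for this element
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag_name and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data  # text from nested tags is included too


def find_all_text(html, tag_name):
    parser = TagTextCollector(tag_name)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.results


sample = '<html><body><p>3.14159</p><div>ignored</div><p>26535</p></body></html>'
print(find_all_text(sample, 'p'))  # ['3.14159', '26535']
```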

Then build the output page:

out = '<html><body>' + ''.join(str(tag) for tag in pi) + '</body></html>'

You can write this to an HTML file using Python's built-in file operations:

f = open('file.html', 'w')
f.write(out)
f.close()

Then serve 'file.html' from your web server.
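A Python 3 sketch of the build-and-write step, assuming the extracted pieces are plain text rather than markup. It escapes the text, closes the tags in the right order, and takes any writable file object so it can be shown with an in-memory buffer; the function name is hypothetical.

```python
import html
import io


def write_pi_page(fragments, out_file):
    """Wrap plain-text fragments in a minimal HTML page and write it to out_file."""
    # Escape the text so stray '<' or '&' characters cannot break the markup.
    body = ''.join(html.escape(fragment) for fragment in fragments)
    # Note the closing order: </body> before </html>.
    page = '<html><body>' + body + '</body></html>'
    out_file.write(page)
    return page


# Demo with an in-memory file. For a real file on disk:
#   with open('file.html', 'w', encoding='utf-8') as f:
#       write_pi_page(fragments, f)
buffer = io.StringIO()
write_pi_page(['3.14', '159'], buffer)
print(buffer.getvalue())  # <html><body>3.14159</body></html>
```

Using `with open(...)` closes the file automatically, even if the write fails, which replaces the explicit `f.close()`.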

If you don't want to use BeautifulSoup, you could use re and urllib instead, but it's not as convenient as BeautifulSoup.
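A minimal sketch of the re approach in Python 3, run on a made-up snippet standing in for the downloaded page. It strips tags and whitespace, then grabs the first "3." followed by digits. This is fragile: it would pick up any earlier "3.x" number on the page.

```python
import re

# Hypothetical snippet standing in for the downloaded page.
page = '<p>3.1415926535<br />\n8979323846</p>'


def extract_pi_digits(html_text):
    """Pull '3.' plus the following digits out of raw HTML with regexes."""
    # Drop tags first so <br /> line breaks don't split the number.
    text = re.sub(r'<[^>]+>', '', html_text)
    # Remove the newlines and spaces left between the digit groups.
    text = re.sub(r'\s+', '', text)
    match = re.search(r'3\.\d+', text)
    return match.group(0) if match else None


print(extract_pi_digits(page))  # 3.14159265358979323846
```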


Looking at the source of the forum page, text is submitted through a form that sends a POST request:

<form action="enter.php" method="post">
  <textarea name="post">Enter text here</textarea> 
</form>

So you need to send a POST request to enter.php with your text in the post field (the name of the textarea).
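The body of such a request is just the form fields in application/x-www-form-urlencoded format. A Python 3 sketch of how it is built; only the post field name comes from the form above, and the value is a short stand-in for the real digits.

```python
from urllib.parse import parse_qs, urlencode

# 'post' is the textarea name from the form; the value is a stand-in.
body = urlencode({'post': '3.14159'})
print(body)  # post=3.14159

# Spaces and other special characters are escaped automatically.
print(urlencode({'post': 'Happy Pi Day & more'}))  # post=Happy+Pi+Day+%26+more

# Decoding the body gives back the original field.
print(parse_qs(body))  # {'post': ['3.14159']}
```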

Next, on the Pi page itself, the digits are not in the page directly; they are loaded in an <iframe> from this URL:

 http://www.piday.org/includes/pi_to_1million_digits_v2.html

That page is simple: apart from the <!DOCTYPE> and the head, the digits sit in a single <p> tag inside <body>:

<!DOCTYPE html>

<html>
  <head>
    ...
  </head>

  <body>
    <p>3.1415926535897932384...</p>
  </body>
</html>

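HTML looks a lot like XML, so you can use a parser to pull the digits out of the page. BeautifulSoup works well for this because it parses both XML and HTML.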
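To see why a forgiving parser matters, here is a Python 3 sketch using the standard library's strict XML parser, xml.etree.ElementTree. It handles a well-formed fragment shaped like the page above, but rejects ordinary HTML such as an unclosed <br>. The sample strings and the helper name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment shaped like the Pi page (note the self-closed <br />).
well_formed = '<html><body><p>3.1415<br />9265</p></body></html>'

root = ET.fromstring(well_formed)
p = root.find('.//p')           # first <p> anywhere under <html>
digits = ''.join(p.itertext())  # joins the text before and after <br />
print(digits)  # 3.14159265


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(well_formed))        # True
print(parses_as_xml('<p>3.14<br></p>'))  # False: <br> is never closed
```

Real-world pages often contain markup like that unclosed `<br>`, which is why a lenient HTML parser such as BeautifulSoup is the safer choice.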

To download the page before parsing it, you can use Python's urllib (or urllib2). To send the POST request, you can use Python's httplib.
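In Python 3, urllib2 became urllib.request (and httplib became http.client), and urllib.request can send the POST by itself. A sketch that only builds the request without sending it; the target URL is inferred from the form's action="enter.php", and the helper name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    data = urlencode(fields).encode('ascii')  # the request body must be bytes
    return Request(url, data=data,
                   headers={'Content-Type': 'application/x-www-form-urlencoded'},
                   method='POST')


req = build_post('http://www.freelove-forum.com/enter.php', {'post': '3.14159'})
print(req.get_method(), req.full_url, req.data)
# To actually send it: urllib.request.urlopen(req).read()
```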

Putting it all together:

import urllib, httplib
from BeautifulSoup import BeautifulSoup

# Downloads and parses the webpage with Pi
page = urllib.urlopen('http://www.piday.org/includes/pi_to_1million_digits_v2.html')
soup = BeautifulSoup(page)

# Extracts the Pi. There is only one <p> tag, so just select the first one
pi_list = soup.findAll('p')[0].contents
pi = ''.join(str(s).replace('\n', '') for s in pi_list).replace('<br />', '')

# Creates the POST request body. The textarea is (confusingly) named 'post'.
parameters = urllib.urlencode({'post':      pi, 
                               'name':      'spammer',
                               'post_type': 'confession',
                               'school':    'all'})

# Crafts the POST request header.
headers = {'Content-type': 'application/x-www-form-urlencoded',
           'Accept':       'text/plain'}

# Creates the connection to the website
connection = httplib.HTTPConnection('freelove-forum.com:80')
connection.request('POST', '/enter.php', parameters, headers)

# Sends it out and gets the response
response = connection.getresponse()
print response.status, response.reason

# Reads the response body, then closes the connection
data = response.read()
connection.close()

Be careful, though: the site will see your IP address and could ban it.


Use urllib2, which is part of Python's standard library.

It lets you open URLs much like files. For example, to fetch the Pi page:

pi_million_file = urllib2.urlopen("http://www.piday.org/million.php")
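In Python 3 the same call is urllib.request.urlopen, and the result can be read like a file. This sketch uses a data: URL so it runs without network access; the assumption is that a real http:// response behaves the same way for read().

```python
from urllib.request import urlopen

# A data: URL carries its content inline, so no network is needed here.
with urlopen('data:text/plain,3.14159') as response:
    first_bytes = response.read(4)  # read part of it, like a file
    rest = response.read()          # then the remainder
print(first_bytes, rest)  # b'3.14' b'159'
```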

You will then have to parse the HTML to find the part of the page that holds the digits.

Then open the other site's URL with a POST request containing the digits of Pi.
