Reading from and posting to web pages using C#

Question

I have a project at work that requires me to enter information into a web page, read the next page I'm redirected to, and then take further action. A simplified real-world example would be going to google.com, entering "Coding Tricks" as the search criteria, and reading the resulting page.

Small coding examples, such as the one at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx , show how to read a web page, but not how to interact with it by submitting information to a form and continuing on to the next page.

For the record, I am not building a malicious and/or spam-related product.

So, how do I go about reading web pages that require a few simple navigation steps first?

c# screen-scraping

borktholamue 25 sept. '08 at 18:48

6 answers

Chris lawlor · Answer 1 · 2008-09-25T19:29:25+0000

You can programmatically create an HTTP request and read the response:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    string uri = "http://www.google.com/search";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";

    // Encode the form data to POST (key=value pairs joined by '&')
    string postData = "q=searchterm&hl=en";
    byte[] encodedData = Encoding.ASCII.GetBytes(postData);
    request.ContentLength = encodedData.Length;

    // Write the request body; the using block closes the stream afterwards
    using (Stream requestStream = request.GetRequestStream())
    {
        requestStream.Write(encodedData, 0, encodedData.Length);
    }

    // Send the request and get the response
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        // Do something with the response stream. As an example, we'll
        // stream the response to the console via a 256-character buffer
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            char[] buffer = new char[256];
            int count = reader.Read(buffer, 0, buffer.Length);
            while (count > 0)
            {
                // Write, not WriteLine, so chunk boundaries don't add line breaks
                Console.Write(new string(buffer, 0, count));
                count = reader.Read(buffer, 0, buffer.Length);
            }
        } // reader is disposed here
    } // response is disposed here

Of course, this particular request will fail, since Google uses GET, not POST, for search queries.
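
For a site that expects GET, the same parameters can go in the query string instead of the request body. The following is only a minimal sketch of that variant: the class and method names (GetRequestSketch, CreateGetRequest, ReadAll) are made up for illustration, and it stays with HttpWebRequest to match the code above, even though newer .NET versions recommend HttpClient.

```csharp
using System;
using System.IO;
using System.Net;

public static class GetRequestSketch
{
    // Builds (but does not send) a GET request. With GET, the form data
    // travels in the URL, so there is no request body to write.
    public static HttpWebRequest CreateGetRequest(string uriWithQuery)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uriWithQuery);
        request.Method = "GET"; // GET is the default; set here only for clarity
        return request;
    }

    // Sends the request and returns the whole response body as one string.
    public static string ReadAll(HttpWebRequest request)
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling ReadAll(CreateGetRequest("http://www.google.com/search?q=searchterm&hl=en")) would then fetch the results page, assuming the site answers plain requests of this kind.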

This approach works when you are dealing with specific, known web pages, because the URLs and POST data are mostly hardcoded. If you need something more dynamic, you will have to:

Fetch the page
Extract the form from it
Build a POST string from the form's fields
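
The last two steps could be sketched roughly as follows, assuming the page HTML has already been fetched into a string. Everything here is hypothetical illustration (FormScraper, ExtractFields, SetField, BuildPostData), and the regular expressions only cope with simple, well-behaved <input> tags with quoted attributes; a real HTML parser would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class FormScraper
{
    // Matches <input ...> tags. Regexes are fragile against real-world HTML;
    // they are used here only to keep the sketch dependency-free.
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Reads one quoted attribute (name="..." or name='...') from a single tag.
    private static string GetAttribute(string tag, string attribute)
    {
        Match m = Regex.Match(
            tag,
            @"\s" + attribute + @"\s*=\s*(?:""([^""]*)""|'([^']*)')",
            RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }

    // Collects name/value pairs from every named <input> in the HTML.
    public static List<KeyValuePair<string, string>> ExtractFields(string html)
    {
        var fields = new List<KeyValuePair<string, string>>();
        foreach (Match tag in InputTag.Matches(html))
        {
            string name = GetAttribute(tag.Value, "name");
            if (name == null) continue; // unnamed inputs are not submitted
            string value = GetAttribute(tag.Value, "value") ?? "";
            fields.Add(new KeyValuePair<string, string>(name, WebUtility.HtmlDecode(value)));
        }
        return fields;
    }

    // Overwrites a field's value (e.g. the search box), or adds the field.
    public static void SetField(List<KeyValuePair<string, string>> fields, string name, string value)
    {
        int index = fields.FindIndex(f => f.Key == name);
        var pair = new KeyValuePair<string, string>(name, value);
        if (index >= 0) fields[index] = pair; else fields.Add(pair);
    }

    // Joins the fields into an application/x-www-form-urlencoded string.
    public static string BuildPostData(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (KeyValuePair<string, string> f in fields)
        {
            parts.Add(WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value));
        }
        return string.Join("&", parts);
    }
}
```

The resulting string can be used as the postData of an HttpWebRequest. Keeping the hidden fields the page supplies and overriding only the ones a user would type into is the point of scraping the form first.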

FWIW, I think something like Perl or Python might be better suited for this kind of task.

edit: x-www-form-urlencoded

Axl · Answer 2 · 2008-09-25T21:29:38+0000

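You could try Selenium: record the steps in Firefox with Selenium IDE, save the script as C#, and play it back with Selenium RC for C#. Alternatively, use System.Net.HttpWebRequest or System.Net.WebClient. There is also System.Windows.Forms.WebBrowser.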

Note: as counterparts to Selenium IDE and Selenium RC, which are Java-based, WatiN Test Recorder and WatiN are built for .NET.

Joel Coehoorn · Answer 3 · 2008-09-25T18:56:45+0000

This comes down to retrieving and examining the HTML of each page in turn.

For that you can use System.Net.HttpWebRequest/HttpWebResponse or, more simply, System.Net.WebClient. Keep in mind that you will also have to deal with cookies, etc.
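
One common way to carry cookies from one request to the next with WebClient is to subclass it and attach a shared CookieContainer to every request it creates. This is only a sketch under that assumption; CookieAwareWebClient and SubmitThenRead are hypothetical names, and the URLs and fields are whatever the target site needs.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

public class CookieAwareWebClient : WebClient
{
    // One container shared by every request this client makes, so cookies
    // set by an earlier response are sent with later requests.
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.CookieContainer = Cookies;
        }
        return request;
    }

    // Posts form fields to one page, then reads a follow-up page using the
    // same cookies (for example, a session cookie set by the first response).
    public static string SubmitThenRead(string formUrl, NameValueCollection fields, string nextUrl)
    {
        using (var client = new CookieAwareWebClient())
        {
            client.UploadValues(formUrl, "POST", fields);
            return client.DownloadString(nextUrl);
        }
    }
}
```

Without the shared container, each request starts with no cookies, which is usually why a multi-step sequence appears to "forget" the first step.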

SecretDeveloper · Answer 4 · 2008-09-25T18:57:34+0000

Depending on the site, you may be able to build the target URL by hand. For example, a search for "beetles" is simply google.com?q=beetles.

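If the site takes its form data in the query string (URL), you can build that URL yourself instead of filling in the form. For Google, a WebRequest and WebResponse are all you need.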

Austin Salonen · Answer 5 · 2008-09-25T18:59:59+0000

For Google, the search terms can go straight into the URL.

Example: http://www.google.com/search?hl=en&q=coding%20tricks
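
A URL like this one can be assembled from individual parameters, with each name and value escaped. A minimal sketch, where QueryStringBuilder and BuildUrl are illustrative names and not part of any library:

```csharp
using System;
using System.Collections.Generic;

public static class QueryStringBuilder
{
    // Appends the parameters to baseUrl as an escaped query string.
    // Assumes baseUrl does not already contain a '?'.
    public static string BuildUrl(string baseUrl, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var pairs = new List<string>();
        foreach (KeyValuePair<string, string> p in parameters)
        {
            // EscapeDataString encodes spaces as %20 and escapes '&', '=', etc.
            pairs.Add(Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value));
        }
        return pairs.Count == 0 ? baseUrl : baseUrl + "?" + string.Join("&", pairs);
    }
}
```

Passing hl=en and q=coding tricks with the base http://www.google.com/search reproduces the URL shown above.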

Mindmodel · Answer 6 · 2008-09-25T19:40:27+0000

Have a look at iMacros:

http://www.iopus.com/

The top-tier edition has a graphical interface for recording and editing macros, as well as C# libraries that you can call from .NET code.

IMHO, this is one of those areas of programming that seems simple at first ("I'll just GET the HTML for the page, parse the string, then GET the next page..."), but in practice turns out to be a real PITA.
