13.8. Obtaining the HTML from a URL

Problem

You need to get the HTML returned from a web server in order to examine it for items of interest. For example, you could examine the returned HTML for links to other pages or for headlines from a news site.

Solution

We can use the web communication methods set up in Recipe 13.5 and Recipe 13.6 to make the HTTP request and verify the response. We can then read the HTML from the stream returned by the GetResponseStream method of the HttpWebResponse object:

// Requires: using System; using System.IO; using System.Net; using System.Text;
public static string GetHTMLFromURL(string url)
{
    // Reject null as well as empty URLs.
    if (string.IsNullOrEmpty(url))
        throw new ArgumentException("Invalid URL", "url");

    string html = "";
    // Build the GET request with the helper from the earlier recipes.
    HttpWebRequest request = GenerateGetOrPostRequest(url, "GET", null);

    // The using blocks close the response and the reader even if an exception is thrown.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        if (VerifyResponse(response) == ResponseCategories.Success)
        {
            // Get the response stream and read it as UTF-8 text.
            Stream responseStream = response.GetResponseStream();
            using (StreamReader reader = new StreamReader(responseStream, Encoding.UTF8))
            {
                html = reader.ReadToEnd();
            }
        }
    }

    // html is still the empty string if the response was not a success.
    return html;
}
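
Once the HTML is in a string, it can be examined for items of interest such as links to other pages. The following is a minimal sketch of one way to do that, assuming a simple regular expression is adequate for quoted href attributes (a real HTML parser would be more robust). The LinkFinder class and ExtractLinks method are hypothetical names used for illustration and are not part of the recipe:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original recipe.
public static class LinkFinder
{
    // Matches href="..." or href='...' and captures the quoted value.
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the href values found in the HTML, in document order.
    public static List<string> ExtractLinks(string html)
    {
        List<string> links = new List<string>();
        if (string.IsNullOrEmpty(html))
            return links;

        foreach (Match match in HrefPattern.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }
}
```

Passing the string returned by GetHTMLFromURL to ExtractLinks would give the href values as they appear in the page. Relative links are returned unchanged, so a caller that wants absolute addresses would still need to resolve them against the page URL.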

Discussion

The GetHTMLFromURL method gets a web page using the GenerateGetOrPostRequest and GetResponse methods and verifies the response using the VerifyResponse method. Only when the response is valid does it go on to read the HTML that was returned; otherwise it returns an empty string.

The GetResponseStream method on the HttpWebResponse provides access to the body of the returned message as a System.IO.Stream object. To read the data, we instantiate a StreamReader with the response stream and the UTF8 property of the Encoding class, so that UTF-8-encoded text is read correctly from the stream. We then call ReadToEnd on the StreamReader, which reads the entire body into the string variable html, and return that string to the caller. Because the encoding is fixed at UTF-8, pages served in another character set may not decode correctly.
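
The Solution always reads the body as UTF-8. Where a server reports a different character set, one possible variation is to choose the encoding from that value. HttpWebResponse exposes it through its CharacterSet property. The sketch below takes the stream and the charset name as parameters so that it can be tried without a network connection. The HtmlBodyReader class and its method names are hypothetical, and falling back to UTF-8 for a missing or unrecognized charset is an assumption, not something the recipe prescribes:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original recipe.
public static class HtmlBodyReader
{
    // Maps a charset name (for example, the value of
    // HttpWebResponse.CharacterSet) to an Encoding.
    // Assumption: fall back to UTF-8 when the name is missing or unknown.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrEmpty(charset))
            return Encoding.UTF8;

        try
        {
            // Some servers wrap the charset name in quotes.
            return Encoding.GetEncoding(charset.Trim().Trim('"', '\''));
        }
        catch (ArgumentException)
        {
            // GetEncoding throws ArgumentException for unrecognized names.
            return Encoding.UTF8;
        }
    }

    // Reads the whole stream as text using the resolved encoding.
    public static string ReadBody(Stream body, string charset)
    {
        using (StreamReader reader = new StreamReader(body, ResolveEncoding(charset)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside GetHTMLFromURL, this would amount to calling ReadBody with response.GetResponseStream() and response.CharacterSet in place of constructing the StreamReader with Encoding.UTF8.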
