Interacting with Forms

In earlier chapters, we downloaded static web pages that always return the same content. In this chapter, we will interact with web pages that depend on user input and state to return relevant content. This chapter will cover the following topics:

  • Sending a POST request to submit a form
  • Using cookies and sessions to log in to a website
  • Using Selenium for form submissions

To interact with these forms, you'll need a user account to log in to the website. You can register an account manually at http://example.webscraping.com/user/register. Unfortunately, we can't automate the registration form until the next chapter, which deals with CAPTCHA images.

Form methods
HTML forms define two methods for submitting data to the server: GET and POST. With the GET method, data such as ?name1=value1&name2=value2 is appended to the URL; this part of the URL is known as a "query string". Browsers (and servers) limit URL length, so this is only useful for small amounts of data. Additionally, GET is generally intended only to retrieve data from the server, not to change it, though this convention is sometimes ignored. With POST requests, the data is sent in the request body, not the URL. Sensitive data should only be sent in a POST request to avoid exposing it in the URL. How the POST data is represented in the body depends on the encoding type.
Servers can also support other HTTP methods, such as PUT and DELETE; however, these are not supported in standard HTML forms.
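
To make the difference between the methods concrete, here is a minimal sketch using only Python's standard library. It builds the requests without sending them, so no network access or account is needed. The form path and the field names are hypothetical, chosen only to match the query string shown above; they are not taken from the example website's real forms.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields and URL path, for illustration only.
data = {'name1': 'value1', 'name2': 'value2'}
url = 'http://example.webscraping.com/hypothetical/form'

# urlencode produces the name=value&name=value format, which is also
# how the default form encoding represents data in a POST body.
encoded = urlencode(data)

# GET: the encoded data is appended to the URL as a query string.
get_request = Request(url + '?' + encoded)

# POST: the URL is unchanged and the encoded data travels in the body.
# urllib chooses POST automatically when a body is supplied.
post_request = Request(url, data=encoded.encode('utf-8'))

# Other methods such as PUT must be requested explicitly.
put_request = Request(url, data=encoded.encode('utf-8'), method='PUT')

for request in (get_request, post_request, put_request):
    print(request.get_method(), request.full_url, request.data)
```

Passing any of these objects to urllib.request.urlopen would send the request. Note that in the GET case the data is visible in the URL itself, which is why sensitive values belong in the body of a POST request.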