An example dynamic web page

Let's look at an example dynamic web page. The example website has a search form for locating countries, available at http://example.webscraping.com/search. Let's say we want to find all the countries that begin with the letter A:

If we right-click on these results to inspect them with our browser tools (as covered in Chapter 2, Scraping the Data), we would find the results are stored within a div element with ID "results":

Let's try to extract these results using the lxml module, which was also covered in Chapter 2, Scraping the Data, and the Downloader class from Chapter 3, Caching Downloads:

>>> from lxml.html import fromstring
>>> from downloader import Downloader
>>> D = Downloader()
>>> html = D('http://example.webscraping.com/search')
>>> tree = fromstring(html)
>>> tree.cssselect('div#results a')
[]

The example scraper here has failed to extract results. Examining the source code of this web page (by using the right-click View Page Source option instead of using the browser tools) can help you understand why. Here, we find the div element we are trying to scrape is empty:

<div id="results"> 
</div>

Our browser tools show the current state of the web page, after any JavaScript has run, whereas View Page Source shows the HTML as it was originally sent by the server. The difference between the two views means the web page has used JavaScript to load the search results dynamically. In the next section, we will use another feature of our browser tools to understand how these results are loaded.
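The gap between the downloaded source and the page as the browser displays it can be illustrated offline. The following is a minimal sketch that uses only the standard library's html.parser rather than lxml, so it runs without extra dependencies. The count_result_links helper and the rendered_dom string are hypothetical stand-ins for illustration; the real rendered markup on the example site may differ.

```python
from html.parser import HTMLParser


class ResultsLinkCounter(HTMLParser):
    """Count <a> tags nested anywhere inside <div id="results">."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # div nesting depth inside the results div (0 = outside)
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            if self.depth > 0:
                self.depth += 1  # a nested div inside the results div
            elif dict(attrs).get('id') == 'results':
                self.depth = 1  # entering the results div
        elif tag == 'a' and self.depth > 0:
            self.links += 1

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth > 0:
            self.depth -= 1


def count_result_links(html):
    """Hypothetical helper: number of links inside div#results."""
    parser = ResultsLinkCounter()
    parser.feed(html)
    parser.close()
    return parser.links


# What the server sends (as seen with View Page Source): an empty div.
raw_source = '<div id="results"> \n</div>'

# Assumed stand-in for the DOM after JavaScript has filled in the results.
rendered_dom = ('<div id="results"><table><tr><td>'
                '<a href="#">Afghanistan</a>'
                '</td></tr></table></div>')

print(count_result_links(raw_source))
print(count_result_links(rendered_dom))
```

A scraper that only downloads the HTML sees the first string, which is why the selector matched nothing, while the browser tools show something closer to the second.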

What is AJAX?
AJAX stands for Asynchronous JavaScript and XML. The term was coined in 2005 to describe the features available across web browsers that make dynamic web applications possible. Most importantly, the JavaScript XMLHttpRequest object, which was originally implemented by Microsoft for ActiveX, became available in many common web browsers. This allowed JavaScript to make HTTP requests to a remote server and receive responses, which meant that a web application could send and receive data without reloading the page. Previously, the only way to communicate between client and server was to refresh the entire web page, which resulted in a poor user experience and wasted bandwidth when only a small amount of data needed to be transmitted.
Google's Gmail and Maps sites were early examples of dynamic web applications and helped make AJAX mainstream.
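
To make the request-and-response idea concrete, here is a rough sketch in Python that imitates what a page's JavaScript does with XMLHttpRequest: it asks a server for a small piece of data in the background and receives only that data, not a whole page. Everything here is an assumption for illustration: the local test server, the /data path, and the JSON payload are made up and are not the example website's actual interface.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class DataHandler(BaseHTTPRequestHandler):
    """Toy server that answers every GET with a small JSON document."""

    def do_GET(self):
        # Made-up payload standing in for an AJAX response.
        body = json.dumps({'records': [{'country': 'Afghanistan'}]}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch_json(url):
    """Request a URL and decode the JSON body, bypassing any system proxy."""
    opener = build_opener(ProxyHandler({}))
    with opener.open(url) as response:
        return json.loads(response.read().decode('utf-8'))


# Port 0 asks the operating system for any free port.
server = HTTPServer(('127.0.0.1', 0), DataHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    port = server.server_address[1]
    data = fetch_json('http://127.0.0.1:%d/data' % port)
finally:
    server.shutdown()
    server.server_close()

print(data['records'][0]['country'])
```

The response is a few dozen bytes of data rather than a full HTML page, which is the bandwidth saving described above.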