Rendering a dynamic web page

For the example search web page, we were able to quickly reverse engineer how the API worked and how to use it to retrieve the results in one request. However, websites can be very complex and difficult to understand, even with advanced browser tools. For example, if a website has been built with Google Web Toolkit (GWT), the resulting JavaScript code will be machine-generated and minified. This generated code can be cleaned up with a tool such as JS beautifier, but the result is verbose and the original variable names are lost, so it remains difficult to understand and reverse engineer.

Additionally, higher-level frameworks such as React.js and other Node.js-based tools can further abstract already complex JavaScript logic, obfuscate data and variable names, and add more layers of API request security (for example, by requiring cookies, browser sessions, and timestamps, or by using other anti-scraper technologies).

With enough effort, any website can be reverse engineered. However, this effort can be avoided by using a browser rendering engine instead, which is the part of the web browser that parses HTML, applies the CSS formatting, and executes JavaScript to display a web page. In this section, we will use the WebKit rendering engine, which has a convenient Python interface through the Qt framework.
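To see why executing JavaScript matters, here is a minimal sketch that uses only Python's standard-library `html.parser` on a hypothetical page (the `PAGE` string and the `static_result` helper are illustrative assumptions, not code from this chapter). An HTML parser reads the markup but never runs the inline script, so the `#result` element it finds is empty; only a rendering engine that executes the JavaScript, such as WebKit, would see "Hello World" inside it.

```python
from html.parser import HTMLParser

# Hypothetical page: the #result div is empty in the raw HTML and is only
# filled in when a browser executes the inline <script>.
PAGE = """
<html><body>
<div id="result"></div>
<script>
document.getElementById("result").innerHTML = "Hello World";
</script>
</body></html>
"""


class ResultExtractor(HTMLParser):
    """Collects the text inside the element with id="result".

    Simplification: assumes the #result div contains no nested <div>.
    """

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == "result":
            self.inside = True

    def handle_endtag(self, tag):
        if self.inside and tag == "div":
            self.inside = False

    def handle_data(self, data):
        # Script bodies also arrive here, but only text inside #result is kept.
        if self.inside:
            self.text.append(data)


def static_result(html):
    """Return the #result text as seen by a parser that does not run JavaScript."""
    parser = ResultExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.text).strip()


if __name__ == "__main__":
    print(repr(static_result(PAGE)))  # the div is empty without JavaScript
```

The text "Hello World" is present in the downloaded HTML, but only inside the script, so a static scraper would have to reverse engineer that script to recover it. A rendering engine avoids this by running the script and exposing the resulting page.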

What is WebKit?
The code for WebKit started life as the KHTML project in 1998, which was the rendering engine for the Konqueror web browser. Apple forked it as WebKit in 2001 for use in its Safari web browser. Google used WebKit up to Chrome version 27 before forking it in 2013 to create its own engine, Blink. Opera used its in-house rendering engine, Presto, from 2003 to 2012 before briefly switching to WebKit and then following Chrome to Blink. Other popular browser rendering engines are Trident, used by Internet Explorer, and Gecko, used by Firefox.