Facebook API

As mentioned in Chapter 1, Introduction to Web Scraping, scraping a website is a last resort when the data is not available in a structured format. Facebook offers APIs for the vast majority of its public data, and for private data accessible through your user account, so we should check whether these APIs provide what we are after before building a resource-intensive browser scraper.

The first step is to determine what data is available via the API by consulting the API documentation. The developer documentation at https://developers.facebook.com/docs/ lists the different types of APIs, including the Graph API, which is the one containing the information we want. If you need to build other interactions with Facebook (via the API or SDK), the documentation is regularly updated and easy to use.

Also available via the documentation links is the in-browser Graph API Explorer, located at https://developers.facebook.com/tools/explorer/. As shown in the following screenshot, the Explorer is a great place to test queries and their results:

Here, I can search the API to retrieve the PacktPub Facebook Page ID. This Graph Explorer can also be used to generate access tokens, which we will use to navigate the API.

To utilize the Graph API with Python, we need to use special access tokens with slightly more advanced requests. Luckily, there is already a well-maintained library for us, called facebook-sdk (https://facebook-sdk.readthedocs.io). We can easily install it using pip:

pip install facebook-sdk
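To make "access tokens with slightly more advanced requests" concrete, here is a minimal sketch of what a Graph API request URL looks like when built by hand. The helper name build_graph_url and the placeholder token are hypothetical and used only for illustration; this is not a description of how facebook-sdk works internally, only of the general URL shape (host, version, node, query string) that Graph API requests follow.

```python
from urllib.parse import urlencode

GRAPH_HOST = 'https://graph.facebook.com'


def build_graph_url(node, access_token, version='2.7', fields=None):
    """Build a Graph API URL for a node (hypothetical helper for illustration)."""
    # The access token travels as a query parameter alongside any requested fields.
    params = {'access_token': access_token}
    if fields:
        # Multiple fields are sent as one comma-separated value.
        params['fields'] = ','.join(fields)
    return '{}/v{}/{}?{}'.format(GRAPH_HOST, version, node, urlencode(params))


# 'YOUR_TOKEN' is a placeholder, not a working token.
url = build_graph_url('PacktPub', 'YOUR_TOKEN', fields=['about', 'picture'])
print(url)
```

A library such as facebook-sdk saves us from assembling these URLs, sending the requests, and decoding the JSON responses ourselves.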

Here is an example of using Facebook's Graph API to extract data from the Packt Publishing page:

In [1]: from facebook import GraphAPI

In [2]: access_token = '....' # insert your actual token here

In [3]: graph = GraphAPI(access_token=access_token, version='2.7')

In [4]: graph.get_object('PacktPub')
Out[4]: {'id': '204603129458', 'name': 'Packt'}

We see the same results as in the browser-based Graph Explorer. We can request more information about the page by passing the extra details we would like to extract. To determine which details are available, we can consult the list of page fields in the Graph documentation at https://developers.facebook.com/docs/graph-api/reference/page/. Using the keyword argument fields, we can extract these extra fields from the API:

In [5]: graph.get_object('PacktPub', fields='about,events,feed,picture')
Out[5]:
{'about': 'Packt provides software learning resources, from eBooks to video courses, to everyone from web developers to data scientists.',
 'feed': {'data': [{'created_time': '2017-03-27T10:30:00+0000',
    'id': '204603129458_10155195603119459',
    'message': "We've teamed up with CBR Online to give you a chance to win 5 tech eBooks - enter by March 31! http://bit.ly/2mTvmeA"},
 ...
 'id': '204603129458',
 'picture': {'data': {'is_silhouette': False,
   'url': 'https://scontent.xx.fbcdn.net/v/t1.0-1/p50x50/14681705_10154660327349459_72357248532027065_n.png?oh=d0a26e6c8a00cf7e6ce957ed2065e430&oe=59660265'}}}

We can see that this response is a well-formatted Python dictionary, which we can easily parse.
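As a minimal sketch of such parsing, the following example walks the feed entries of a response shaped like the one above. The response dictionary is a trimmed copy of the output already shown, so no token or network access is needed; the function name summarize_feed is hypothetical, and a live response may contain more entries and fields than this sample.

```python
from datetime import datetime

# Sample response trimmed from the output shown above.
response = {
    'about': 'Packt provides software learning resources, from eBooks to '
             'video courses, to everyone from web developers to data scientists.',
    'feed': {'data': [{
        'created_time': '2017-03-27T10:30:00+0000',
        'id': '204603129458_10155195603119459',
        'message': "We've teamed up with CBR Online to give you a chance "
                   "to win 5 tech eBooks - enter by March 31! http://bit.ly/2mTvmeA",
    }]},
    'id': '204603129458',
}


def summarize_feed(page):
    """Return a (created, post_id, message) tuple for each feed entry."""
    posts = []
    # Use .get() so a page without a feed yields an empty list instead of a KeyError.
    for post in page.get('feed', {}).get('data', []):
        created = datetime.strptime(post['created_time'], '%Y-%m-%dT%H:%M:%S%z')
        # Not every post carries a message (assumed here), so default to ''.
        posts.append((created, post['id'], post.get('message', '')))
    return posts


for created, post_id, message in summarize_feed(response):
    print(created.date(), post_id, message[:40])
```

Converting created_time into a datetime object makes it straightforward to sort or filter posts by date later on.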

The Graph API provides many other calls to access user data, which are documented on Facebook's developer page at https://developers.facebook.com/docs/graph-api. Depending on the data you need, you may also want to create a Facebook developer application, which can give you a longer-lived access token.
