We need to do two things on the API side of the project. The first one is related to the API we expose; we need to add methods to insert, retrieve, and delete subscriptions. Another big task we have to do is related to the articles we need to fetch from the RSS feeds. We will create a CLI script to fetch and store them on the database. As the first big task is similar to what we did in the previous chapters, we will revise it quickly and focus on the CLI script.
On the API side, the requirement is to add a new endpoint to manage the feed subscriptions of a user. Of course, we will need a couple of tables and a few configurations on the database to store the information. For the CLI script, we need to add a new CLI route to the configuration, and then create the script that will fetch the data and store it in the database, printing information on the console as feedback to the user.
The new endpoint will be /api/feeds/:username[/:id]. Now, let's see how to use each HTTP method on this new endpoint:
| HTTP method | Controller method | Parameters | Functionality |
|---|---|---|---|
| GET | get(), getList() | None | The get() method is not allowed. The getList() method returns all the feeds of the user, with the articles of each feed nested inside it. |
| POST | create() | data | This is the method used to add a new subscription for a user. This method will add a new entry to the table of feeds, but will not retrieve the articles of the feed at creation time. |
| PUT | update() | ID | This method is not allowed. |
| DELETE | delete() | ID | This method will be used to remove a subscription. Because an ID is mandatory, we need to pass the ID of the feed we want to remove; this will also trigger the removal of the related articles. |
To store the information of the feeds and articles in the database, we need to use two tables. The first one, called user_feeds, will store the subscriptions of a specific user and the general information of that feed. The second table, called user_feed_articles, will store the articles of specific feeds and the information related to each article. The following is the structure of each table:
    CREATE TABLE `user_feeds` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `user_id` int(11) unsigned DEFAULT NULL,
      `url` varchar(2048) DEFAULT NULL,
      `title` varchar(512) DEFAULT NULL,
      `icon` varchar(2048) DEFAULT NULL,
      `created_at` timestamp NULL DEFAULT NULL,
      `updated_at` timestamp NULL DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `idx_user_id` (`user_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE `user_feed_articles` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `feed_id` int(11) unsigned DEFAULT NULL,
      `title` varchar(512) DEFAULT NULL,
      `content` text,
      `url` varchar(2048) DEFAULT NULL,
      `author` varchar(255) DEFAULT NULL,
      `created_at` timestamp NULL DEFAULT NULL,
      `updated_at` timestamp NULL DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `idx_feed_id` (`feed_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
As you can see, the data stored in the tables is self-explanatory and straightforward.
OK, now it's time to take a look at the structure of the folders. As usual, we will create a new module called Feeds for this functionality. Remember to add it to the application.config.php file. The folder structure is as follows:
The module.config.php file will contain the route that we will expose on the API and also a new type of route called a console route. This new type of route allows us to map the parameters used to call the application from the command line to the controllers and actions that will take care of executing the required code. As this controls the parameters of the command-line call, we can also specify which ones we will use, whether any of them are optional, and so on.
    'router' => array(
        'routes' => array(
            'news' => array(
                'type' => 'Zend\Mvc\Router\Http\Segment',
                'options' => array(
                    'route' => '/api/feeds/:username[/:id]',
                    'constraints' => array(
                        'id' => '\d+'
                    ),
                    'defaults' => array(
                        'controller' => 'Feeds\Controller\Index'
                    ),
                ),
            ),
        ),
    ),
This is the normal route we expose at the API level. As you can see, the username is mandatory, and we also have an optional ID. The ID refers to the feed ID when we delete a subscription. In the next chapter, we will make the username parameter disappear because we will use the information of the OAuth 2.0 mechanism to identify who is making the request; but right now, we need to specify the username each time we make a request.
    'console' => array(
        'router' => array(
            'routes' => array(
                'feeds-process' => array(
                    'options' => array(
                        'route' => 'feeds process [--verbose|-v]',
                        'defaults' => array(
                            'controller' => 'Feeds\Controller\Cli',
                            'action' => 'processFeeds'
                        )
                    )
                )
            )
        )
    ),
This is the new type of route we were talking about earlier. As you can see, the structure is fairly similar to the default routes. The only difference is that the route parameter specifies the arguments you have to pass to the index.php file when it is called from the command line in order to execute the specific controller and the configured action. As you can also see, there is a parameter between square brackets, which means that the parameter is optional. This is exactly the same as with the normal routes, but in this case, we also have a pipe symbol and a shorter version of the same parameter. This allows us to give longer and shorter versions of each parameter. Now, go to the public folder using the command line and execute the following command:
    php index.php feeds process -v
The request will be sent to the CliController.php file and will be fulfilled. This is also the command we will use when we configure the cronjob. The rest of the configuration file declares the controllers:
    'controllers' => array(
        'invokables' => array(
            'Feeds\Controller\Index' => 'Feeds\Controller\IndexController',
            'Feeds\Controller\Cli' => 'Feeds\Controller\CliController'
        ),
    ),
This is the last block of code of this file, and we are only listing the available controllers on this module and mapping them to the actual file.
In the Module.php file of this module, we do not need to add or modify any of the default code that usually comes with the file. If you review the code, you will see that we have two methods, getConfig() and getAutoloaderConfig(), and they are the same as the ones we saw in the Module.php files of the other modules.
At this point, we have already seen a few table gateways, and I'm confident that you are capable of creating them on your own. For the UserFeedsTable gateway, we will just highlight the updateTimestamp() method, which takes care of updating the updated_at column with the current timestamp. This column will be used by the CLI script to track the last time we fetched the articles and avoid duplicating them in the database. As usual, we have other methods: create(), which adds a new row to the table; getByUserId(), which fetches all the rows based on the user_id attribute; and setDbAdapter(), which we have on all the table gateways.
    public function updateTimestamp($feedId)
    {
        return $this->update(
            array('updated_at' => new Expression('NOW()')),
            array('id' => $feedId)
        );
    }
As you can see, we issue a call to the update() method, passing an array with the changes we want to make; in this case, we set the updated_at column using the NOW() expression. The second parameter we pass to the method is the WHERE clause, because we don't want to update all the rows by mistake.
We are not going to review the UserFeedArticlesTable gateway in detail because it is really simple and, as we said before, you should be capable of writing it yourself. In this case, we have a couple of methods: the usual create() method to insert the data into the table and the getByFeedId() method to retrieve all the articles of a specific feed. As usual, the setDbAdapter() method is also present in this class.
In the IndexController, we take care of retrieving the information of an RSS feed to insert it into the database, which implies the usage of Zend\Feed.
As usual, at the beginning of the class, we add the namespace and the dependencies we need on this controller. In this case, we also add the following line to declare that we want to use the Zend\Feed\Reader\Reader component of ZF2:
    use Zend\Feed\Reader\Reader;
As we make a request to retrieve the URL of the RSS feed and the favicon of the page to use as an icon on our menu, we also need Zend\Http\Client:
    use Zend\Http\Client;
Additionally, we also use the following components:
    use Zend\Dom\Query;
    use Zend\Validator\Db\NoRecordExists;
As in the other controllers, we declare a few properties at the top of the class to hold the table gateway instances and avoid creating them more than once:
    protected $userFeedsTable;
    protected $userFeedArticlesTable;
    protected $usersTable;
The methods that are not implemented on this controller are the same as the ones we saw in the previous table: get() and update(). They will contain the following line to return a 405 HTTP code to the clients:
$this->methodNotAllowed();
Let's now review each method one by one to see what they do and how they do it.
The first one we will review is getList(), which returns a list of all the feeds with the associated articles nested inside each one. In the first few lines of the code, you can see that we extract the information of the user based on the mandatory username parameter of the route. As we mentioned before, this will be amended in the future when we implement the API authentication. But for now, the following quick and dirty solution will do the job:
    $username = $this->params()->fromRoute('username');
    $usersTable = $this->getTable('UsersTable');
    $user = $usersTable->getByUsername($username);

    $userFeedsTable = $this->getTable('UserFeedsTable');
    $userFeedArticlesTable = $this->getTable('UserFeedArticlesTable');

    $feedsFromDb = $userFeedsTable->getByUserId($user->id);
    $feeds = array();
    foreach ($feedsFromDb as $f) {
        // Use the feed ID as the key and nest the articles inside the feed
        $feeds[$f->id] = $f;
        $feeds[$f->id]['articles'] = $userFeedArticlesTable
            ->getByFeedId($f->id)->toArray();
    }

    return new JsonModel($feeds);
As you can see, we retrieve the feeds from the table based on the user ID and then extract the articles of each feed from the database. If you take a closer look, you can see that we are creating an associative array using the ID of the feed as the key. This is a convenient way to return the information to the frontend, which can then quickly extract the articles it needs to show based on the feed ID. Of course, if a feed doesn't have any articles, we will store an empty array on the articles key.
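To make the shape of that response concrete, here is a minimal sketch in plain PHP that builds the same structure, keyed by feed ID, from in-memory arrays. The sample rows and the groupArticlesByFeed() helper are hypothetical stand-ins for the table gateways and are not part of the project code; the controller itself queries the articles feed by feed.

```php
<?php

/**
 * Builds an array keyed by feed ID, with the articles of each feed nested
 * under the 'articles' key. Hypothetical helper, for illustration only.
 */
function groupArticlesByFeed(array $feeds, array $articles)
{
    $result = array();

    foreach ($feeds as $feed) {
        $result[$feed['id']] = $feed;
        // Feeds without articles still get an empty array.
        $result[$feed['id']]['articles'] = array();
    }

    foreach ($articles as $article) {
        // Ignore articles whose feed is not in the list.
        if (isset($result[$article['feed_id']])) {
            $result[$article['feed_id']]['articles'][] = $article;
        }
    }

    return $result;
}

// Sample data, made up for the example.
$feeds = array(
    array('id' => 3, 'title' => 'First feed'),
    array('id' => 7, 'title' => 'Second feed'),
);
$articles = array(
    array('id' => 10, 'feed_id' => 3, 'title' => 'Hello'),
    array('id' => 11, 'feed_id' => 3, 'title' => 'World'),
);

$grouped = groupArticlesByFeed($feeds, $articles);
echo json_encode($grouped), PHP_EOL;
```

With this layout, a client that knows the feed ID can read the articles directly from the entry stored under that ID, without scanning the whole list.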
Next, we are going to review the delete() method, which is very simple:
    $username = $this->params()->fromRoute('username');
    $usersTable = $this->getTable('UsersTable');
    $user = $usersTable->getByUsername($username);

    $userFeedsTable = $this->getTable('UserFeedsTable');
    $userFeedArticlesTable = $this->getTable('UserFeedArticlesTable');

    // Remove the articles first, then the subscription itself
    $userFeedArticlesTable->delete(array('feed_id' => $id));

    return new JsonModel(array(
        'result' => $userFeedsTable->delete(array(
            'id' => $id,
            'user_id' => $user->id
        ))
    ));
The procedure is essentially the same; the only difference from the method we reviewed before is that we issue two delete() calls on the table gateways. The first call removes the articles from user_feed_articles based on the feed_id attribute, and the second delete() call removes the subscription from user_feeds. As you can see, we also use the user_id attribute on the second delete() call. This is a small protection to prevent someone from deleting a subscription of another user. Of course, if we were doing this in a professional way, we would first check whether the user has the subscription we are trying to delete before we actually delete any data.
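The ownership check mentioned above could look roughly like the following sketch. It uses plain arrays instead of the table gateways, and the deleteSubscription() function and the sample data are hypothetical. Note that in the controller listing the articles are removed before the user_id condition is applied, so checking ownership first also protects the articles.

```php
<?php

/**
 * Removes a subscription only when it belongs to the given user.
 * Returns true when something was deleted, false otherwise.
 * Hypothetical helper working on in-memory arrays, for illustration only.
 */
function deleteSubscription(array &$feeds, array &$articles, $userId, $feedId)
{
    // 1. Check ownership before touching any data.
    if (!isset($feeds[$feedId]) || $feeds[$feedId]['user_id'] !== $userId) {
        return false;
    }

    // 2. Remove the articles that belong to the feed.
    foreach ($articles as $key => $article) {
        if ($article['feed_id'] === $feedId) {
            unset($articles[$key]);
        }
    }

    // 3. Remove the subscription itself.
    unset($feeds[$feedId]);

    return true;
}

// Sample data, made up for the example (feeds are keyed by feed ID).
$feeds = array(
    1 => array('user_id' => 10, 'url' => 'http://example.com/rss'),
    2 => array('user_id' => 20, 'url' => 'http://example.org/rss'),
);
$articles = array(
    array('feed_id' => 1, 'title' => 'A'),
    array('feed_id' => 2, 'title' => 'B'),
);

// User 10 tries to delete a feed owned by user 20: nothing is removed.
$denied = deleteSubscription($feeds, $articles, 10, 2);
// User 10 deletes their own feed: the feed and its article are removed.
$allowed = deleteSubscription($feeds, $articles, 10, 1);
```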
Now let's jump to the last method we are going to explain in this controller. The create() method takes care of extracting the information given by the user from the website and storing the subscription. We could just accept an RSS URL and store it, but in this case, we also need to extract the favicon of the original website to use it in the frontend. So, the user needs to provide the URL of the website instead of the URL of the RSS feed, and we take care of discovering the RSS URL. The method starts with the following code:
    $username = $this->params()->fromRoute('username');
    $usersTable = $this->getTable('UsersTable');
    $user = $usersTable->getByUsername($username);
This is the first block and as we saw before, it just retrieves the user from the database to have access to the user data.
    $userFeedsTable = $this->getTable('UserFeedsTable');
    $rssLinkXpath = '//link[@type="application/rss+xml"]';
    $faviconXpath = '//link[@rel="shortcut icon"]';
In the second block of the method, we retrieve an instance of the table gateway, and then store two XPath expressions that will help us to retrieve the HTML tags that contain the RSS URL and the favicon URL.
    $client = new Client($data['url']);
    $client->setEncType(Client::ENC_URLENCODED);
    $client->setMethod(\Zend\Http\Request::METHOD_GET);
    $response = $client->send();
The third section of code prepares the HTTP client that we will use to retrieve the HTML and issues the request.
Now, we arrive at a conditional block that checks whether we got the HTML from the website or not. If the request fails, we throw an exception.
    if ($response->isSuccess()) {
        ...
    } else {
        throw new \Exception("Website not found", 404);
    }
Keep in mind that here we are taking a shortcut with an exception. In a proper application, you need to add this to a queue and retry in a few minutes or just retry a few times before throwing the exception. But this is out of the scope of this book.
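As a rough illustration of the "retry a few times" option, the following sketch wraps an operation in a small retry loop. The retry() helper, its parameters, and the simulated request are assumptions made for the example; they are not part of ZF2 or of the project code.

```php
<?php

/**
 * Calls $operation until it succeeds or $maxAttempts is reached.
 * The last exception is rethrown when every attempt fails.
 * Hypothetical helper, for illustration only.
 */
function retry($operation, $maxAttempts = 3, $delaySeconds = 0)
{
    $attempt = 0;

    while (true) {
        $attempt++;
        try {
            // The attempt number is passed in so the callback can log it.
            return call_user_func($operation, $attempt);
        } catch (\Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            if ($delaySeconds > 0) {
                sleep($delaySeconds);
            }
        }
    }
}

// Simulated request that fails twice and then succeeds.
$calls = 0;
$body = retry(function ($attempt) use (&$calls) {
    $calls++;
    if ($attempt < 3) {
        throw new \RuntimeException('Website not found', 404);
    }
    return '<html></html>';
});
```

In the controller, the callback would send the HTTP request and throw when the response is not successful, so the exception only reaches the client after the last attempt.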
    $html = $response->getBody();
    $html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");

    $dom = new Query($html);
    $rssUrl = $dom->execute($rssLinkXpath);

    if (!count($rssUrl)) {
        return new JsonModel(array(
            'result' => false,
            'message' => 'Rss link not found in the url provided'
        ));
    }
    $rssUrl = $rssUrl->current()->getAttribute('href');

    $faviconUrl = $dom->execute($faviconXpath);
    if (count($faviconUrl)) {
        $faviconUrl = $faviconUrl->current()->getAttribute('href');
    } else {
        $faviconUrl = null;
    }
This is the code you will find inside the if block that we saw before, which, of course, is only executed if we are able to get the contents of the page.
The first two lines just retrieve the contents and take care of converting the encoding, ensuring that there are no issues when using the HTML inside the Zend\Dom\Query object that comes later.
The first thing we try is to retrieve the HTML tag that contains the RSS link. If we don't find it, we return a JSON response with the result set to false and an error message. After that, we store the href value in a variable and carry on trying to get the favicon URL. In this case, the favicon is not a critical part, so if we don't get it, we just ignore it.
The final part of this method is the following block. Here, we load the RSS URL using the Zend\Feed\Reader\Reader component, and use that component to extract information from the RSS feed, such as the title:
    $rss = Reader::import($rssUrl);

    return new JsonModel(array(
        'result' => $userFeedsTable->create(
            $user->id,
            $rssUrl,
            $rss->getTitle(),
            $faviconUrl
        )
    ));
As a final note on this controller, we also have the methodNotAllowed() and getTable() methods, which are similar to the ones used in the other controllers. As you can see, we use exactly the same code over and over again for the methodNotAllowed() method, which makes it a good candidate to be moved to a parent class and removed from all its children.
The CliController is a new kind of controller in this book. Actually, it is not a new type; it is just a controller extending the AbstractActionController class that will be called using the command-line interface. This means that all the controllers extending the AbstractActionController class can be called through the CLI, but it is your job to detect whether the request is coming from the normal channels as an HTTP request or from the CLI, and then act accordingly in each case. You can check the type of the request object to determine whether it is a CLI request or a normal request. In the first case, the object will be an instance of Zend\Console\Request, and in the latter, it will be an instance of Zend\Http\PhpEnvironment\Request.
This script will take care of retrieving the new articles found on each RSS feed stored in the database. To accomplish that, we will use the updated_at field to avoid duplicating articles. Also, keep in mind that this script is supposed to be called by a cronjob every X minutes to refresh the articles automatically.
As we are going to read the RSS feed from this controller, we need to declare the usage of the Feed components of ZF2 as follows:
    use Zend\Feed\Reader\Reader;
We need to work with the new tables, which means that we need to declare the properties that will hold the tables and the getTable() method that will create the instances.
    protected $userFeedsTable;
    protected $userFeedArticlesTable;
After all of this, we arrive at the processFeedsAction() method we referenced in the console route, which is the one in charge of processing the requests. Let's see what we have inside it.
    $request = $this->getRequest();
    $verbose = $request->getParam('verbose') || $request->getParam('v');
These are the first two lines. The objective is to check whether the verbose parameter was specified when calling the script from the command line. If you remember, we specified two versions of the verbose parameter, the long and the short one, and that's why we need to retrieve both here.
    $userFeedsTable = $this->getTable('UserFeedsTable');
    $userFeedArticlesTable = $this->getTable('UserFeedArticlesTable');
    $feeds = $userFeedsTable->select();
The following block of code gets instances of the tables we need to use, and utilizes the first one to fetch all the feeds on the database and loop over them to execute the following code:
    foreach ($feeds as $feed) {
        if ($verbose) {
            printf("Processing feed: %s\n", $feed['url']);
        }

        $lastUpdate = strtotime($feed['updated_at']);
        $rss = Reader::import($feed['url']);

        // Loop over each channel item/entry and store relevant data for each
        foreach ($rss as $item) {
            $timestamp = $item->getDateCreated()->getTimestamp();
            if ($timestamp > $lastUpdate) {
                if ($verbose) {
                    printf("Processing item: %s\n", $item->getTitle());
                }

                $author = $item->getAuthor();
                if (is_array($author)) {
                    $author = $author['name'];
                }

                $userFeedArticlesTable->create(
                    $feed['id'],
                    $item->getTitle(),
                    $item->getContent(),
                    $item->getLink(),
                    $author
                );
            }
        }

        if ($verbose) {
            printf("Updating timestamp\n");
        }
        $userFeedsTable->updateTimestamp($feed['id']);

        if ($verbose) {
            printf("Finished feed processing\n");
        }
    }
The first part just outputs information to the console if the verbose parameter is specified. Then, we get the last update from the database and convert it to a timestamp to be able to compare it. Right after that, we import the RSS feed using the Zend\Feed\Reader\Reader component and loop over it to access all the articles.
For each article on the feed, we extract the timestamp and compare it against the value of the last update we have on the database. In case the article was published after the last update, we proceed printing more debug information, and then insert the data on the table, using the information we get from the item object.
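The comparison described above can be isolated in a small function. The following sketch uses plain arrays instead of feed entries; the filterNewItems() helper and the sample items are hypothetical, and treating a NULL updated_at value as "never processed" is an assumption made for the example, based on the column allowing NULL in the table definition.

```php
<?php

// Use a fixed timezone so strtotime() gives the same result everywhere.
date_default_timezone_set('UTC');

/**
 * Returns the items published after $lastUpdate.
 * A feed that was never processed (null) accepts every item.
 * Hypothetical helper, for illustration only.
 */
function filterNewItems(array $items, $lastUpdate)
{
    $lastTimestamp = ($lastUpdate === null) ? 0 : strtotime($lastUpdate);

    $newItems = array();
    foreach ($items as $item) {
        if ($item['created'] > $lastTimestamp) {
            $newItems[] = $item;
        }
    }

    return $newItems;
}

// Sample items, made up for the example.
$items = array(
    array('title' => 'Old article', 'created' => strtotime('2014-01-01 10:00:00')),
    array('title' => 'New article', 'created' => strtotime('2014-01-01 14:00:00')),
);

// Only the article published after the last update is kept.
$new = filterNewItems($items, '2014-01-01 12:00:00');
// A feed without a previous update keeps every article.
$all = filterNewItems($items, null);
```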
After looping through all the articles, we print more debug information and then update the timestamp for this feed on the database, finishing with more debug strings.
If something goes wrong while processing, the script will stop. So, if you are going to use this on a production-ready application, you need to take care of all the exceptions that can occur.
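One possible way to keep a single broken feed from stopping the whole run is to catch the exceptions feed by feed and report them at the end. The sketch below shows the idea with plain PHP; the processAllFeeds() function, the callback, and the sample feeds are hypothetical and only illustrate the control flow.

```php
<?php

/**
 * Runs $processFeed for every feed and collects failures instead of
 * stopping at the first one. Returns the error messages keyed by feed ID.
 * Hypothetical helper, for illustration only.
 */
function processAllFeeds(array $feeds, $processFeed)
{
    $errors = array();

    foreach ($feeds as $feed) {
        try {
            call_user_func($processFeed, $feed);
        } catch (\Exception $e) {
            // Remember the failure and carry on with the next feed.
            $errors[$feed['id']] = $e->getMessage();
        }
    }

    return $errors;
}

// Sample feeds, made up for the example.
$feeds = array(
    array('id' => 1, 'url' => 'http://example.com/rss'),
    array('id' => 2, 'url' => 'http://broken.example/rss'),
    array('id' => 3, 'url' => 'http://example.org/rss'),
);

$processed = array();
$errors = processAllFeeds($feeds, function ($feed) use (&$processed) {
    // Simulate a feed that cannot be imported.
    if ($feed['id'] === 2) {
        throw new \RuntimeException('Unable to import feed');
    }
    $processed[] = $feed['id'];
});

foreach ($errors as $feedId => $message) {
    printf("Feed %d failed: %s\n", $feedId, $message);
}
```

In the real script, the callback would contain the body of the loop over the feeds, so the timestamp of a feed is only updated when its import finishes without errors.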
Because the controllers can now be called through the normal channel or through the CLI, we need to handle the errors differently in each case. If an error is produced while fulfilling an HTTP request, we need to return it using the method we have in place; however, if an error occurs while running the CLI script, we need to output it on the console. To fix this, we have to make a small change in the ApiErrorListener.php file and replace the first line of the onRender() method. Instead of just checking whether the response is OK, we are also going to test whether the request is a CLI request.
    if ($e->getRequest() instanceof \Zend\Console\Request || $e->getResponse()->isOk())