API development

We need to do two things on the API side of the project. The first is related to the API we expose: we need to add methods to insert, retrieve, and delete subscriptions. The second, larger task is related to the articles we need to fetch from the RSS feeds: we will create a CLI script to fetch them and store them in the database. As the first task is similar to what we did in the previous chapters, we will go over it quickly and focus on the CLI script.

Requirements

The requirement on the API side is a new endpoint to manage the feed subscriptions of a user. Of course, we will need a couple of tables and a few configurations on the database to store the information. For the CLI script, we need to add a new CLI route to the configuration, and then create the script that will fetch the data and store it in the database, printing information on the console as feedback to the user.

The new endpoint will be /api/feeds/:username[/:id]. Now, let's see how to use each HTTP method on this new endpoint:

HTTP method | Controller method | Parameters | Functionality
----------- | ----------------- | ---------- | -------------
GET | get(), getList() | None | The get() method will return an HTTP error 405, because we will not retrieve information of a subscription directly. The getList() method will return a list of all the feeds to which a user is subscribed, and also the related list of posts nested inside each feed information.
POST | create() | data | This is the method used to add a new subscription for a user. This method will add a new entry on the table of feeds, but will not retrieve the articles of the feed at creation time.
PUT | update() | ID | This method is not allowed.
DELETE | delete() | ID | This method will be used to remove a subscription. Because an ID is mandatory, we will need to pass the ID of the feed we want to remove, and this will also trigger a removal of the related articles.

Working with the database

To store the information of the feeds and articles on the database, we need to use two tables. The first one called user_feeds will store the subscriptions of a specific user and the general information of that feed. The second table called user_feed_articles will store the articles of specific feeds and the information related to each article. The following is the structure of each table:

CREATE TABLE `user_feeds` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(11) unsigned DEFAULT NULL,
  `url` varchar(2048) DEFAULT NULL,
  `title` varchar(512) DEFAULT NULL,
  `icon` varchar(2048) DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `user_feed_articles` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `feed_id` int(11) unsigned DEFAULT NULL,
  `title` varchar(512) DEFAULT NULL,
  `content` text,
  `url` varchar(2048) DEFAULT NULL,
  `author` varchar(255) DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_feed_id` (`feed_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

As you can see, the data stored in the tables is self-explanatory and straightforward.

Expanding the module structure

Ok, now it's time to take a look at the structure of the folders. As usual, we will create a new module called Feeds for this functionality. Remember to add it to the application.config.php file. The folder structure is as follows:

[Figure: folder structure of the Feeds module]

The module.config.php file

This file will contain the route that we will expose on the API and also a new type of route called console route. This new type of route will allow us to map the parameters used to call the application from the command line to actual controllers and actions that will take care of executing the required code. As this controls the parameters of the command-line call, we can also specify which one we will use, if any of them are optional, and so on.

'router' => array(
    'routes' => array(
        'news' => array(
            'type' => 'Zend\Mvc\Router\Http\Segment',
            'options' => array(
                'route' => '/api/feeds/:username[/:id]',
                'constraints' => array(
                    'id' => '\d+'
                ),
                'defaults' => array(
                    'controller' => 'Feeds\Controller\Index'
                ),
            ),
        ),
    ),
),

This is the normal route we expose on the API level. As you can see, the username is mandatory, and we also have an ID that is optional. The ID will refer to the feed ID when we delete the subscriptions. In the next chapter, we will make the username parameter disappear because we will use the information of the OAuth 2.0 mechanism to identify who is making the request; but right now, we need to specify the username each time we make a request.

'console' => array(
    'router' => array(
        'routes' => array(
            'feeds-process' => array(
                'options' => array(
                    'route' => 'feeds process [--verbose|-v]',
                    'defaults' => array(
                        'controller' => 'Feeds\Controller\Cli',
                        'action'     => 'processFeeds'
                    )
                )
            )
        )
    )
),

This is the new type of route we were talking about earlier. As you can see, the structure is fairly similar to the default routes. The only difference is that the route parameter specifies the arguments you have to pass to the index.php file when it is called from the command line in order to execute the specific controller and the configured action. As you can also see, there is a parameter between square brackets, which means that the parameter is optional. This is exactly the same as with the normal routes, but in this case, we also have a pipe symbol and a shorter version of the same parameter. This allows us to offer a long and a short version of each parameter. Now, go to the public folder using the command line and execute the following command:

php index.php feeds process -v

The request will be routed to the CliController.php file and fulfilled. This is also the command line we will use to configure the cronjob. Back in the configuration file, the remaining block registers the controllers:

'controllers' => array(
    'invokables' => array(
        'Feeds\Controller\Index' => 
            'Feeds\Controller\IndexController',
        'Feeds\Controller\Cli' => 'Feeds\Controller\CliController'
    ),
),

This is the last block of code of this file, and we are only listing the available controllers on this module and mapping them to the actual file.

The Module.php file

In this case, we do not need to add or modify any of the default code that usually comes with this file. If you review the code on the file, you will see that we have two methods: getConfig() and getAutoloaderConfig(), and they are the same as the ones we saw in the previous Module.php files of other modules.

Adding the UserFeedsTable.php file

At this point, we already saw a few table gateways, and I'm confident that you are totally capable of creating them on your own. In this case, we will just highlight the updateTimestamp() method that takes care of updating the updated_at column with the current timestamp. This column will be used by the CLI script to track the last time we fetched the articles and avoid duplicating them on the database. As usual, we have other methods: create() that adds a new row on the table, getByUserId() that fetches all the rows based on the user_id attribute, and the setDbAdapter() method, which we have on all the table gateways.

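// Requires: use Zend\Db\Sql\Expression;
public function updateTimestamp($feedId)
{
    return $this->update(
        // Columns to change: let the database set the current time
        array('updated_at' => new Expression('NOW()')),
        // WHERE clause: only touch the given feed
        array('id' => $feedId)
    );
}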

As you can see, we issue a call to the update() method passing an array with the changes we want to make; in this case, by updating the updated_at column with the NOW() expression. The second parameter we pass to the method is the WHERE clause, because we don't want to mess up and update all the rows.

Adding the UserFeedArticleTable.php file

We are not going to review this table gateway in detail because it is really simple, and as we said before, you should be capable of doing it yourself. In this case, we have a couple of methods: the usual create() method to insert the data on the table and the getByFeedId() method to retrieve all the articles of a specific feed. As usual, the setDbAdapter() method is also present in this class.

The contents of the IndexController.php file

In this controller, we will take care of retrieving the information of an RSS feed to insert it into the database, which implies the usage of Zend\Feed.

As usual, at the beginning of the class, we add the namespace and the dependencies we need on this controller. In this case, we also add the following line to declare that we want to use the Zend\Feed\Reader\Reader component of ZF2:

use Zend\Feed\Reader\Reader;

As we make a request to retrieve the URL of the RSS feed and the favicon of the page to use as an icon on our menu, we also need Zend\Http\Client:

use Zend\Http\Client;

Additionally, we also use the following components:

use Zend\Dom\Query;
use Zend\Validator\Db\NoRecordExists;

As is common on the other controllers, we declare a few properties at the top of the class to hold a copy of the table gateways and avoid creating them more than once, using the following code:

protected $userFeedsTable;
protected $userFeedArticlesTable;
protected $usersTable;

The methods that are not implemented on this controller are the ones marked as not allowed in the previous table: get() and update(). They will contain the following line to return a 405 HTTP code to the clients:

$this->methodNotAllowed();

Let's now review each method one by one to see what they do and how they do it.

The first one we will review is getList(), which basically returns a list of all the feeds with the associated articles nested on the information. In the first few lines of the code, you can see that we extract the information of the user based on the mandatory username parameter on the route. As we mentioned before, this will be amended in the future when we implement the API authentication. But for now, the following quick and dirty solution will do the job:

$username = $this->params()->fromRoute('username');
$usersTable = $this->getTable('UsersTable');
$user = $usersTable->getByUsername($username);
$userFeedsTable = $this->getTable('UserFeedsTable');
$userFeedArticlesTable = $this->getTable('UserFeedArticlesTable');

$feedsFromDb = $userFeedsTable->getByUserId($user->id);
$feeds = array();
foreach ($feedsFromDb as $f) {
    // Index each feed by its ID and nest its articles under 'articles'
    $feeds[$f->id] = $f;
    $feeds[$f->id]['articles'] = $userFeedArticlesTable
        ->getByFeedId($f->id)->toArray();
}

return new JsonModel($feeds);

As you can see, we retrieved the feeds from the table based on the user ID and then we proceeded to extract the articles of each feed from the database. If you take a closer look, you can see that we are creating an associative array using the ID of the feed as a key. This is a convenient way to return the information for the frontend and be able to quickly extract the articles we need to show, based on the feed ID. Of course, if a feed doesn't have any articles, we will store an empty array on the articles key.
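
To make the shape of that response concrete, here is a minimal sketch in plain PHP, with no ZF2 classes involved. The sample rows and the $articlesByFeed lookup are hypothetical stand-ins for what the table gateways would return; only the keying by feed ID and the nested articles key mirror the controller code.

```php
<?php
// Hypothetical rows, standing in for the results of the table gateways.
$feedsFromDb = array(
    array('id' => 3, 'title' => 'Example feed', 'url' => 'http://example.com/rss'),
    array('id' => 7, 'title' => 'Empty feed', 'url' => 'http://example.org/rss'),
);
$articlesByFeed = array(
    3 => array(
        array('id' => 10, 'feed_id' => 3, 'title' => 'First post'),
    ),
);

$feeds = array();
foreach ($feedsFromDb as $f) {
    // Use the feed ID as the key so the frontend can look a feed up directly.
    $feeds[$f['id']] = $f;
    // A feed without articles still gets an (empty) 'articles' array.
    $feeds[$f['id']]['articles'] = isset($articlesByFeed[$f['id']])
        ? $articlesByFeed[$f['id']]
        : array();
}

// Non-sequential integer keys are encoded as a JSON object keyed by feed ID.
$json = json_encode($feeds);
echo $json, "\n";
```

Because the keys are feed IDs rather than a 0-based sequence, the encoded result is a JSON object, which is what lets the frontend pick the articles of a feed by its ID.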

In the following section, we are going to review the delete() method, which is very simple:

$username = $this->params()->fromRoute('username');
$usersTable = $this->getTable('UsersTable');
$user = $usersTable->getByUsername($username);
$userFeedsTable = $this->getTable('UserFeedsTable');
$userFeedArticlesTable = $this->getTable('UserFeedArticlesTable');

// Remove the articles first, then the subscription itself
$userFeedArticlesTable->delete(array('feed_id' => $id));
return new JsonModel(array(
    'result' => $userFeedsTable->delete(array(
        'id' => $id, 
        'user_id' => $user->id
    ))
));

The procedure is essentially the same. The only difference from the method we reviewed before is that we issue delete() calls on the table gateways. The first call removes the articles from user_feed_articles using the feed_id attribute, and the second delete() call removes the subscription from user_feeds. As you can see, we also used the user_id attribute on the second delete() call. This is a small protection to prevent someone from deleting the subscription of another user. Of course, if we were doing this in a professional way, we would first check that the user has the subscription we are trying to delete before we actually delete any data.

Now let's jump to the last method we are going to explain in this controller. The create() method takes care of extracting the information given by the user from the website and storing the subscription. We could simply accept an RSS URL and store it, but in this case, we also need to extract the favicon of the original website to use it on the frontend. So, the user needs to provide the URL of the website instead of the URL of the RSS; we will take care of discovering the RSS URL by using the following code:

$username = $this->params()->fromRoute('username');
$usersTable = $this->getTable('UsersTable');
$user = $usersTable->getByUsername($username);

This is the first block and, as we saw before, it just retrieves the user from the database to have access to the user data.

$userFeedsTable = $this->getTable('UserFeedsTable');
$rssLinkXpath = '//link[@type="application/rss+xml"]';
$faviconXpath = '//link[@rel="shortcut icon"]';

In the second block of the method, we retrieve an instance of the table gateway, and then store two XPath expressions that will help us retrieve the HTML tags that contain the RSS URL and the favicon URL.

$client = new Client($data['url']);
$client->setEncType(Client::ENC_URLENCODED);
$client->setMethod(\Zend\Http\Request::METHOD_GET);
$response = $client->send();

The third section of code prepares the HTTP client that we will use to retrieve the HTML and issues the request.

Now, we arrive at a conditional block that checks whether we got the HTML from the website or not. If the request fails, we throw an exception.

if ($response->isSuccess()) {
    ...
} else {
    throw new \Exception("Website not found", 404);
}

Note

XPath is a query language that allows us to select nodes inside an XML document.

Keep in mind that here we are taking a shortcut with an exception. In a proper application, you need to add this to a queue and retry in a few minutes or just retry a few times before throwing the exception. But this is out of the scope of this book.

$html = $response->getBody();
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"); 

$dom = new Query($html);
$rssUrl = $dom->execute($rssLinkXpath);

if (!count($rssUrl)) {
    return new JsonModel(array(
        'result' => false, 
        'message' => 'Rss link not found in the url provided'
    ));
}
$rssUrl = $rssUrl->current()->getAttribute('href');

$faviconUrl = $dom->execute($faviconXpath);
if (count($faviconUrl)) {
    $faviconUrl = $faviconUrl->current()->getAttribute('href');
} else {
    $faviconUrl = null;
}

This is the code you will find inside the if block that we saw before, which of course, is only executed if we are able to get the contents of the page.

The first two lines just retrieve the contents and take care of converting the encoding, ensuring that there are no issues when using the HTML inside the Zend\Dom\Query object that comes later.

The first thing we try is to retrieve the HTML tag that contains the RSS link. If we don't find it, we stop and return a JSON response with result set to false and a message explaining that no RSS link was found. Otherwise, we store the href value in a variable and carry on trying to get the favicon URL. The favicon is not a critical part, so if we don't get it, we just store null and continue.
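
The same two XPath expressions can be tried outside the framework. The following is a minimal sketch that swaps Zend\Dom\Query for PHP's built-in DOMDocument and DOMXPath classes, so it runs without ZF2; the HTML string and the URLs in it are made up for illustration.

```php
<?php
// Made-up page markup; a real page would come from the HTTP response body.
$html = '<html><head>'
    . '<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml">'
    . '<link rel="shortcut icon" href="/favicon.ico">'
    . '</head><body></body></html>';

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The RSS link is required: without it there is nothing to subscribe to.
$rssNodes = $xpath->query('//link[@type="application/rss+xml"]');
$rssUrl = $rssNodes->length > 0
    ? $rssNodes->item(0)->getAttribute('href')
    : null;

// The favicon is optional: fall back to null when the tag is missing.
$iconNodes = $xpath->query('//link[@rel="shortcut icon"]');
$faviconUrl = $iconNodes->length > 0
    ? $iconNodes->item(0)->getAttribute('href')
    : null;

echo $rssUrl, "\n", $faviconUrl, "\n";
```

Note that the favicon href can be relative, as in this sample, so a frontend that displays it would probably need to resolve it against the website URL.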

The final part of this method is the following block. Here, we load the RSS URL using the Zend\Feed\Reader\Reader component, and use that component to extract information from the RSS, such as the title:

$rss = Reader::import($rssUrl);

return new JsonModel(array(
    'result' => $userFeedsTable->create(
        $user->id, 
        $rssUrl, 
        $rss->getTitle(), 
        $faviconUrl
    )
));

As a final note on this controller, we also have the methodNotAllowed() and getTable() methods, which are similar to the ones used on the other controllers. As you can see, we use exactly the same code over and over again for the methodNotAllowed() method, which makes it a good candidate to be moved to a parent class and removed from all its children.

Creating the CliController.php file

This is the new type of controller we are going to see in this book. Actually, it is not a new type; it is just a controller extending the AbstractActionController class, which will be called using the command-line interface. This means that all the controllers extending the AbstractActionController class can be called through the CLI, but it is your job to detect whether the request is coming from the normal channels as an HTTP request or as a CLI request, and then act accordingly in each case. You can check the type of the request object to determine whether it is a CLI request or a normal request. In the first case, the object will be an instance of Zend\Console\Request, and in the latter, it will be an instance of Zend\Http\PhpEnvironment\Request.

This script will take care of retrieving the new articles found on each RSS feed stored in the database. To accomplish that, we will use the updated_at field to avoid duplicating articles. Also, keep in mind that this script is supposed to be called by a cronjob every X minutes to refresh the articles automatically.

As we are going to read the RSS feed from this controller, we need to declare the usage of the Feed components of ZF2 as follows:

use Zend\Feed\Reader\Reader;

We need to work with the new tables and this means that we need to declare the properties that will hold the table and the getTable() method, which will create the instances.

protected $userFeedsTable;
protected $userFeedArticlesTable;

After all of this, we arrive at the processFeedsAction() method we referenced on the console route, which is the one in charge of processing the requests. Let's see what we have inside it.

$request = $this->getRequest();
$verbose = $request->getParam('verbose') 
    || $request->getParam('v');

These are the first two lines. The objective is to check if the verbose parameter was specified while calling the script from the command line. If you remember, we specified two versions of the verbose parameter, the long and the short one, and that's why we need to retrieve both here.

$userFeedsTable = $this->getTable('UserFeedsTable');
$userFeedArticlesTable = $this->getTable('UserFeedArticlesTable');
$feeds = $userFeedsTable->select();

This block of code gets instances of the tables we need to use, and utilizes the first one to fetch all the feeds on the database. We then loop over them to execute the following code:

foreach ($feeds as $feed) {
    if ($verbose) {
        printf("Processing feed: %s\n", $feed['url']);
    }
    $lastUpdate = strtotime($feed['updated_at']);
    $rss = Reader::import($feed['url']);

    // Loop over each channel item/entry and store relevant data for each
    foreach ($rss as $item) {
        $timestamp = $item->getDateCreated()->getTimestamp();
        if ($timestamp > $lastUpdate) {
            if ($verbose) {
                printf("Processing item: %s\n", $item->getTitle());
            }
            $author = $item->getAuthor();
            if (is_array($author)) {
                $author = $author['name'];
            }

            $userFeedArticlesTable->create(
                $feed['id'], 
                $item->getTitle(), 
                $item->getContent(), 
                $item->getLink(), 
                $author
            );
        }
    }

    if ($verbose) {
        printf("Updating timestamp\n");
    }

    $userFeedsTable->updateTimestamp($feed['id']);

    if ($verbose) {
        printf("Finished feed processing\n\n");
    }
}

The first part just outputs the information to the console in case the verbose parameter is specified. Then, we get the last update from the database and convert it to a timestamp to be able to compare it. Right after that, we import the RSS feed using the ZendFeedReaderReader component and loop over it to access all the articles.

For each article on the feed, we extract the timestamp and compare it against the value of the last update we have on the database. In case the article was published after the last update, we proceed printing more debug information, and then insert the data on the table, using the information we get from the item object.
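
The comparison at the heart of that loop can be isolated in a few lines of plain PHP. This is only a sketch: filterNewItems() and the sample items are hypothetical, and the explicit handling of a NULL updated_at (a feed that has never been processed) is an addition of this example rather than something the chapter's code does.

```php
<?php
// Fix the timezone so strtotime() and DateTime interpret dates the same way.
date_default_timezone_set('UTC');

/**
 * Hypothetical helper: keep only the items created after the last update.
 */
function filterNewItems(array $items, $lastUpdate)
{
    // Assumption of this sketch: a NULL updated_at means "never processed",
    // so every item counts as new.
    $threshold = $lastUpdate === null ? 0 : strtotime($lastUpdate);

    $newItems = array();
    foreach ($items as $item) {
        if ($item['created']->getTimestamp() > $threshold) {
            $newItems[] = $item;
        }
    }
    return $newItems;
}

// Made-up items standing in for the entries of an imported feed.
$items = array(
    array('title' => 'Old post', 'created' => new DateTime('2014-01-01 10:00:00')),
    array('title' => 'New post', 'created' => new DateTime('2014-01-03 10:00:00')),
);

$newItems = filterNewItems($items, '2014-01-02 00:00:00');
foreach ($newItems as $item) {
    echo $item['title'], "\n";
}
```

Because the comparison is strictly greater than, an item created in the same second as the last update is skipped, which avoids storing it twice at the cost of possibly missing an article published in that exact second.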

After looping through all the articles, we print more debug information and then update the timestamp for this feed on the database, finishing with more debug strings.

If something goes wrong while processing, the script will stop. So, if you are going to use this on a production-ready application, you need to take care of all the exceptions that can occur.
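
One possible way to keep a single broken feed from stopping the whole run is to wrap the work done for each feed in a try/catch block. The sketch below is plain PHP under stated assumptions: processAllFeeds() and the callback are hypothetical names, and the callback stands in for the import-and-store logic of the loop shown earlier.

```php
<?php
/**
 * Hypothetical helper: run $processFeed for every feed and collect failures
 * instead of letting the first exception abort the whole run.
 */
function processAllFeeds(array $feeds, $processFeed)
{
    $failures = array();
    foreach ($feeds as $feed) {
        try {
            call_user_func($processFeed, $feed);
        } catch (Exception $e) {
            // Remember what went wrong and carry on with the next feed.
            $failures[$feed['id']] = $e->getMessage();
        }
    }
    return $failures;
}

// Made-up feeds; the second one simulates an unreachable URL.
$feeds = array(
    array('id' => 1, 'url' => 'http://example.com/rss'),
    array('id' => 2, 'url' => 'http://example.invalid/rss'),
    array('id' => 3, 'url' => 'http://example.org/rss'),
);

$processed = array();
$failures = processAllFeeds($feeds, function ($feed) use (&$processed) {
    if ($feed['id'] === 2) {
        throw new RuntimeException('Feed unreachable');
    }
    $processed[] = $feed['id'];
});

printf("Processed: %s\n", implode(', ', $processed));
printf("Failed: %s\n", implode(', ', array_keys($failures)));
```

With this arrangement, the timestamp of a failing feed is never updated, so the next cron run would retry it.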

The ApiErrorListener.php file

Because the controllers can now be called through the normal channel or the CLI, we need to handle errors differently in each case. If an error is produced while fulfilling an HTTP request, we need to return it using the method we have in place; however, if an error occurs while running the CLI script, we need to output it on the console. To fix this, we have to make a small change in the ApiErrorListener.php file and replace the first line of the onRender() method. Instead of just checking if the response is OK, we are also going to test if the request is a CLI request.

if ($e->getRequest() instanceof \Zend\Console\Request || 
    $e->getResponse()->isOk())