Now that you have a refreshed and updated idea of what JavaScript programming is really like, it’s time to get into the core concept that makes Node.js what it is: nonblocking IO and asynchronous programming. It carries with it some huge advantages and benefits, which you shall soon see, but it also brings some complications and challenges with it.
In the olden days (2008 or so), when you sat down to write an application and needed to load in a file, you would write something like the following (let’s assume you’re using something vaguely PHP-ish for the purposes of this example):
$file = fopen('info.txt', 'r');
// wait until file is open
$contents = fread($file, 100000);
// wait until contents are read
// do something with those contents
If you were to analyze the execution of this script, you would find that it spends a vast majority of its time doing nothing at all. Indeed, most of the clock time taken by this script is spent waiting for the computer’s file system to do its job and return the file contents you requested. Let me generalize things a step further and state that for most IO-based applications—those that frequently connect to databases, communicate with external servers, or read and write files—your scripts will spend a majority of their time sitting around waiting (see Figure 3.1).
The way your servers process multiple requests at the same time is by running many of these scripts in parallel. Modern computer operating systems are great at multitasking, so you can easily switch out processes that are blocked and let other processes have access to the CPU. Some environments take things a step further and use threads instead of processes.
The problem is that for each of these processes or threads, there is some amount of overhead. For heavier implementations using Apache and PHP, I have seen up to 10–15MB of memory overhead per process—never mind the resources and time consumed by the operating system switching that context in and out constantly. That’s not even 100 simultaneously executing servers per gigabyte of RAM! Threaded solutions and those using more lightweight HTTP servers do, of course, have better results, but you still end up in a situation in which the computer spends most of its time waiting around for blocked processes to get their results, and you risk running out of capacity to handle incoming requests.
It would be nice if there were some way to make better use of all the available CPU power and available memory so as not to waste so much. This is where Node.js shines.
To understand how Node.js changes the method demonstrated in the preceding section into a nonblocking, asynchronous model, first look at the setTimeout
function in JavaScript. This function takes a function to call and a timeout after which it should be called:
setTimeout(() => {
console.log("I've done my work!");
}, 2000);
console.log("I'm waiting for all my work to finish.");
If you run the preceding code, you see the following output:
I'm waiting for all my work to finish.
I've done my work!
I hope this is not a surprise to you: The program sets the timeout for 2000 ms (2 seconds), giving it the function to call when it fires, and then continues with execution, which prints out the “I’m waiting...” text. Two seconds later, you see the “I’ve done...” message, and the program then exits.
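To see that it is the event queue, not the delay itself, that defers the callback, try a timeout of 0 ms. The callback still runs only after the currently executing code has finished. (This is a small sketch; the variable names are my own.)

```javascript
// Even with a 0 ms delay, the callback is placed on the event queue
// and runs only after the current code has finished executing.
var order = [];

setTimeout(function () {
    order.push('timeout');
    console.log(order.join(' -> ')); // prints "sync -> timeout"
}, 0);

order.push('sync');
```
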
Now, look at a world where any time you call a function that needs to wait for some external resource (database server, network request, or file system read/write operation), it has a similar signature. That is, instead of calling fopen(path, mode)
and waiting, you would instead call fopen(path, mode, (file_handle) => { ... })
.
Now rewrite the preceding synchronous script using the new asynchronous functions. You can actually enter and run this program with node
from the command line. Just make sure you also create a file called info.txt that can be read.
var fs = require('fs'); // We'll explain this below
var file;
var buf = Buffer.alloc(100000); // Buffer.alloc replaces the deprecated new Buffer()
fs.open('info.txt', 'r', (err, handle) => {
file = handle;
});
// fs.read needs the file handle returned by fs.open. But this is broken.
fs.read(file, buf, 0, 100000, null, (err, length) => {
console.log(buf.toString());
fs.close(file, () => { /* don't care */ });
});
The first line of this code is something you haven’t seen just yet: the require
function is a way to include additional functionality in your Node.js programs. Node comes with a pretty impressive set of modules, each of which you can include separately as you need functionality. You will work further with modules frequently from now on; you learn about consuming them and writing your own in Chapter 5, “Modules.”
If you run this program as it is, it throws an error and terminates. How come? Because the fs.open
function runs asynchronously; it returns immediately, before the file has been opened and the callback function invoked. The file
variable is not set until the file has been opened and the handle to it has been passed to the callback specified as the third parameter to the fs.open
function. Thus, you are trying to access an undefined
variable when you try to call the fs.read
function with it immediately afterward.
Fixing this program is easy:
var fs = require('fs');
fs.open('info.txt', 'r', (err, handle) => {
var buf = Buffer.alloc(100000);
fs.read(handle, buf, 0, 100000, null, (err, length) => {
console.log(buf.toString('utf8', 0, length));
fs.close(handle, () => { /* Don't care */ });
});
});
The key way to think of how these asynchronous functions work internally in Node is something along the following lines:
Check and validate parameters.
Tell the Node.js core to queue the call to the appropriate function for you (in the preceding example, the operating system open
or read
function) and to notify (call) the provided callback function when there is a result.
Return to the caller.
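The three steps above can be sketched as follows. This is an illustrative toy, not real Node internals; slow_double is a hypothetical function that simply doubles a number, standing in for a call that would normally go to the operating system.

```javascript
// A sketch of the pattern an asynchronous function follows internally.
// slow_double is a hypothetical function, not part of any Node API.
function slow_double(n, callback) {
    // 1. Check and validate parameters.
    if (typeof n !== 'number') {
        // Report bad input through the callback, still asynchronously.
        process.nextTick(function () {
            callback(new Error('n must be a number'));
        });
        return;
    }
    // 2. Queue the real work, and notify the callback with the result.
    process.nextTick(function () {
        callback(null, n * 2);
    });
    // 3. Return to the caller immediately.
}

slow_double(21, function (err, result) {
    if (err) { console.log('ERROR: ' + err.message); return; }
    console.log('result: ' + result); // prints "result: 42"
});
```
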
You might be asking: if the open
function returns right away, why doesn’t the node
process exit immediately after that function has returned? The answer is that Node operates with an event queue; if there are pending events for which you are awaiting a response, it does not exit until your code has finished executing and there are no events left on that queue. If you are waiting for a response (either to the open
or the read
function calls), it waits. See Figure 3.2 for an idea of how this scenario looks conceptually.
In the preceding chapter, I discussed error handling and events as well as the try / catch
block in JavaScript. The addition of nonblocking IO and asynchronous function callbacks in this chapter, however, creates a new problem. Consider the following code:
try {
setTimeout(() => {
throw new Error("Uh oh!");
}, 2000);
} catch (e) {
console.log("I caught the error: " + e.message);
}
If you run this code, you might very well expect to see the output "I caught the error: Uh oh!"
. But you do not. You actually see the following:
timers.js:103
if (!process.listeners('uncaughtException').length) throw e;
^
Error: Uh oh!
    at Object._onTimeout (errors_async.js:5:15)
    at Timer.list.ontimeout (timers.js:101:19)
What happened? Did I not say that try / catch
blocks were supposed to catch errors for you? I did, but asynchronous callbacks throw a new little wrench into this situation.
In reality, the call to setTimeout
does execute within the try / catch
block. If that function were to throw an error, the catch
block would catch it, and you would see the message that you had hoped to see. However, the setTimeout
function just adds an event to the Node event queue (instructing it to call the provided function after the specified time interval—2000 ms in this example) and then returns. The provided callback function actually operates within its own entirely new context and scope!
As a result, asynchronous functions for nonblocking IO rarely throw errors; instead, they use a separate way of telling you that something has gone wrong.
In Node, you use a number of core patterns to help you standardize how you write code and avoid errors. These patterns are not enforced syntactically by the language or runtime, but you will see them used frequently and should absolutely use them yourself.
One of the first patterns you will see is the format of the callback function you pass to most asynchronous functions. It always has at least one parameter, the success or failure status of the last operation, and very commonly a second parameter with some sort of additional results or information from the last operation (such as a file handle, database connection, rows from a query, and so on); some callbacks are given even more than two:
do_something(param1, param2, ..., paramN, function (err, results) { ... });
The err parameter in this callback takes one of two values:
null, indicating that the operation was a success; if there should be a result, it will be in the second parameter.
An instance of the Error object class, indicating that something went wrong. You will occasionally notice some inconsistency here, with some people always adding a code field to the Error object and then using the message field to hold a description of what happened, whereas others have chosen other patterns. For all the code you write in this book, you will follow the pattern of always including a code field and using the message field to provide as much information as you can. For all the modules you write, you will use a string value for the code because strings tend to be a bit easier to read. Some libraries provide extra data in the Error object with additional information, but at least these two members should always be there.
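To make the convention concrete, here is a sketch of an asynchronous function that produces errors in this shape. parse_port is a hypothetical helper, not part of any Node module; note the string value in the code field and the descriptive message.

```javascript
// A sketch of producing errors that follow the code/message convention.
// parse_port is a hypothetical helper, not a real Node API.
function parse_port(str, callback) {
    var port = parseInt(str, 10);
    if (isNaN(port) || port < 1 || port > 65535) {
        var err = new Error('"' + str + '" is not a valid TCP port');
        err.code = 'invalid_port'; // string code, per the convention above
        process.nextTick(function () { callback(err); });
        return;
    }
    process.nextTick(function () { callback(null, port); });
}

parse_port('8080', function (err, port) {
    if (err) {
        console.log('ERROR: ' + err.code + ' (' + err.message + ')');
        return;
    }
    console.log('port: ' + port); // prints "port: 8080"
});
```
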
This standard callback signature enables you to write predictable code when you are working with nonblocking functions. Throughout this book, I demonstrate two common coding styles for handling errors in callbacks. Here's the first:
fs.open('info.txt', 'r', (err, handle) => {
if (err) {
console.log("ERROR: " + err.code + " (" + err.message + ")");
return;
}
// success!! continue working here
});
In this style, you check for errors and return if you see one; otherwise, you continue to process the result. And now here’s the other way:
fs.open('info.txt', 'r', (err, handle) => {
if (err) {
console.log("ERROR: " + err.code + " (" + err.message + ")");
} else {
// success! continue working here
}
});
In this method, you use an if ... else statement to handle the error.
The difference between these two may seem like splitting hairs, but the former method is a little more prone to bugs for those cases when you forget to use the return statement inside the if block, whereas the latter indents your code one level deeper with each nested callback, leaving you with long, less readable lines. We'll look at a solution to this second problem in the section titled "Managing Asynchronous Code" in Chapter 5.
A fully updated version of the file loading code with error handling is shown in Listing 3.1.
var fs = require('fs');
fs.open('info.txt', 'r', (err, handle) => {
if (err) {
console.log("ERROR: " + err.code + " (" + err.message + ")");
return;
}
var buf = Buffer.alloc(100000);
fs.read(handle, buf, 0, 100000, null, (err, length) => {
if (err) {
console.log("ERROR: " + err.code
+ " (" + err.message + ")");
return;
}
console.log(buf.toString('utf8', 0, length));
fs.close(handle, () => { /* don't care */ });
});
});
Now you’re ready to write a little class to help you with some common file operations:
var fs = require('fs');
function FileObject () {
this.filename = '';
this.file_exists = function (callback) {
console.log("About to open: " + this.filename);
fs.open(this.filename, 'r', function (err, handle) {
if (err) {
console.log("Can't open: " + this.filename);
callback(err);
return;
}
fs.close(handle, function () { });
callback(null, true);
});
};
}
You have currently added one property, filename
, and a single method, file_exists
. This method does the following:
It tries to open the file specified in the filename
property read-only.
If the file doesn’t exist, it prints a message and calls the callback function with the error info.
If the file does exist, it calls the callback function indicating success.
Now, run this class with the following code:
var fo = new FileObject();
fo.filename = "file_that_does_not_exist";
fo.file_exists((err, results) => {
if (err) {
console.log("\nError opening file: " + JSON.stringify(err));
return;
}
console.log("file exists!!!");
});
You might expect the following output:
About to open: file_that_does_not_exist
Can't open: file_that_does_not_exist
But, in fact, you see this:
About to open: file_that_does_not_exist
Can't open: undefined
What happened? Most of the time, when you have a function nested within another, it inherits the scope of its parent/host function and should have access to all the same variables. So why does the nested callback function not get the correct value for the filename
property?
The problem lies with the this
keyword and asynchronous callback functions. Don’t forget that when you call a function like fs.open
, it initializes itself, calls the underlying operating system function (in this case to open a file), and places the provided callback function on the event queue. Execution immediately returns to the FileObject#file_exists
function, and then you exit. When the fs.open
function completes its work and Node runs the callback, you no longer have the context of the FileObject
class any more, and the callback function is given a new this
pointer representing some other execution context!
The bad news is that you have, indeed, lost your this
pointer referring to the FileObject
class. The good news is that the callback function for fs.open
does still have its function scope. A common solution to this problem is to “save” the disappearing this
pointer in a variable called self
or me
or something similar. Now rewrite the file_exists
function to take advantage of this:
this.file_exists = function (callback) {
var self = this;
console.log("About to open: " + self.filename);
fs.open(this.filename, 'r', function (err, handle) {
if (err) {
console.log("Can't open: " + self.filename);
callback(err);
return;
}
fs.close(handle, function () { });
callback(null, true);
});
};
Because local function scope is preserved via closures, the new self
variable is maintained for you even when your callback is executed asynchronously later by Node.js. You will make extensive use of this in all your applications. Some people like to use me
instead of self
because it is shorter; others still use completely different words. Pick whatever kind you like and stick with it for consistency.
The above scenario is another reason to use arrow functions, introduced in the previous chapter. Arrow functions capture the this
value of the enclosing scope, so your code actually works as expected! Thus, as long as you are using =>
, you can continue to use the this
keyword, as follows:
var fs = require('fs');
function FileObject () {
this.filename = '';
// Always use "function" for member fns, not =>, see below for why
this.file_exists = function (callback) {
console.log("About to open: " + this.filename);
fs.open(this.filename, 'r', (err, handle) => {
if (err) {
console.log("Can't open: " + this.filename);
callback(err);
return;
}
fs.close(handle, () => { });
callback(null, true);
});
};
}
One other thing to note is that we do not use arrow functions for declaring member functions on objects or prototypes. This is because in those cases, we actually do want the this
variable to update with the context of the currently executing object. Thus, you’ll see us using => only when we’re using anonymous functions in other contexts.
The key takeaway for this section should be: If you’re using an anonymous function that’s not a class or prototype method, you should stop and think before using this
. There’s a good chance it won’t work the way you want. Use arrow functions as much as possible.
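A minimal, synchronous illustration of the difference (all names here are invented for the example): a regular function used as a callback loses the object's this, while the arrow version keeps it.

```javascript
function Greeter() {
    this.name = 'node';

    // A regular function callback gets its own "this" (the global
    // object, or undefined in strict mode), not the Greeter.
    this.greet_broken = function (callback) {
        [1].forEach(function () {
            callback(this && this.name); // not the Greeter's name
        });
    };

    // An arrow function captures the enclosing "this": the Greeter.
    this.greet_fixed = function (callback) {
        [1].forEach(() => {
            callback(this.name); // the Greeter's name
        });
    };
}

var g = new Greeter();
g.greet_broken(function (name) { console.log('broken: ' + name); });
g.greet_fixed(function (name) { console.log('fixed: ' + name); });  // prints "fixed: node"
```
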
Node runs in a single thread with a single event loop that makes calls to external functions and services. It places callback functions on the event queue to wait for the responses and otherwise tries to execute code as quickly as possible. So what happens if you have a function that tries to compute the intersection between two arrays:
function compute_intersection(arr1, arr2, callback) {
var results = [];
for (var i = 0 ; i < arr1.length; i++) {
for (var j = 0; j < arr2.length; j++) {
if (arr2[j] == arr1[i]) {
results[results.length] = arr2[j];
break;
}
}
}
callback(null, results); // no error, pass in results!
}
For arrays of a few thousand elements, this function starts to consume a significant amount of time to do its work, on the order of a second or more. In a single-threaded model, where Node.js can do only one thing at a time, this amount of time can be a problem. Similar functions that compute hashes or digests, or otherwise perform expensive operations, will cause your applications to temporarily "freeze" while they do their work. What can you do?
In the introduction to this book, I mentioned that there are certain things for which Node.js is not particularly well suited, and one of them is definitely acting as a compute server. Node is far better suited to more common network application tasks, such as those with heavy amounts of IO and requests to other services. If you want to write a server that does a lot of expensive computations and calculations, you might want to consider moving these operations to other services that your Node applications can then call remotely.
I am not saying, however, that you should completely shy away from computationally intensive tasks. If you’re doing these only some of the time, you can still include them in Node.js and take advantage of a method on the process
global object called nextTick
. This method basically says “Give up control of execution, and then when you have a free moment, call the provided function.” It tends to be significantly faster than just using the setTimeout
function.
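A quick sketch of that ordering: a function queued with process.nextTick runs as soon as the currently executing code finishes, before any timers fire. (The variable names are my own.)

```javascript
// nextTick callbacks run after the current code completes but
// before timer callbacks, even a 0 ms setTimeout.
var sequence = [];

setTimeout(function () {
    sequence.push('timeout');
    console.log(sequence.join(' -> ')); // prints "sync -> nextTick -> timeout"
}, 0);

process.nextTick(function () {
    sequence.push('nextTick');
});

sequence.push('sync');
```
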
Listing 3.2 contains an updated version of the compute_intersection
function that yields every once in a while to let Node process other tasks.
function compute_intersection(arr1, arr2, callback) {
// let's break up the bigger of the two arrays
var bigger = arr1.length > arr2.length ? arr1 : arr2;
var smaller = bigger == arr1 ? arr2 : arr1;
var biglen = bigger.length;
var smlen = smaller.length;
var sidx = 0; // starting index of any chunk
var size = 10; // chunk size, can adjust!
var results = []; // intermediate results
// for each chunk of "size" elements in bigger, search through smaller
function sub_compute_intersection() {
for (var i = sidx; i < (sidx + size) && i < biglen; i++) {
for (var j = 0; j < smlen; j++) {
if (bigger[i] == smaller[j]) {
results.push(smaller[j]);
break;
}
}
}
if (i >= biglen) {
callback(null, results); // no error, send back results
} else {
sidx += size;
process.nextTick(sub_compute_intersection);
}
}
sub_compute_intersection();
}
In this new version of the function, you basically divide the bigger of the input arrays into chunks of 10 (you can choose whatever number you want), compute the intersection of that many items, and then call process#nextTick
to allow other events or requests a chance to do their work. Only when there are no events ahead of yours will the next chunk of work proceed. Don't forget that passing the callback function sub_compute_intersection
to process#nextTick
ensures that the current scope is preserved as a closure, so you can store the intermediate results in the variables in compute_intersection
.
Listing 3.3 shows the code you use to test this new compute_intersection
function.
var a1 = [ 3476, 2457, 7547, 34523, 3, 6, 7, 2, 77, 8, 2345,
           7623457, 2347, 23572457, 237457, 234869, 237,
           24572457524 ];
var a2 = [ 3476, 75347547, 2457634563, 56763472, 34574, 2347,
           7, 34652364, 13461346, 572346, 23723457234, 237,
           234, 24352345, 537, 2345235, 2345675, 34534,
           7582768, 284835, 8553577, 2577257, 545634, 457247247,
           2345 ];
compute_intersection(a1, a2, function (err, results) {
if (err) {
console.log(err);
} else {
console.log(results);
}
});
Although this has made things a bit more complicated than the original version of the function to compute the intersections, the new version plays much better in the single-threaded world of Node event processing and callbacks, and you can use process.nextTick
in any situation in which you are worried that a complex or slow computation is necessary.
Now that I have spent nearly an entire chapter telling you how Node.js is very much asynchronous and about all the tricks and traps of programming nonblocking IO, I must mention that Node actually does have synchronous versions of some key APIs, most notably file APIs. You use them for writing command-line tools in Chapter 12, “Command-Line Programming.”
To demonstrate briefly here, you can rewrite the first script of this chapter as follows:
var fs = require('fs');
var handle = fs.openSync('info.txt', 'r');
var buf = Buffer.alloc(100000);
var read = fs.readSync(handle, buf, 0, 100000, null);
console.log(buf.toString('utf8', 0, read));
fs.closeSync(handle);
As you work your way through this book, I hope you are able to see quite quickly that Node.js isn’t just for network or web applications. You can use it for everything from command-line utilities to prototyping to server management and more!
Switching from a model of programming where you execute a sequence of synchronous or blocking IO function calls, waiting for each of them to complete before moving on to the next, to a model where you do everything asynchronously and wait for Node to tell you when a given task is done, requires a bit of mental gymnastics and experimentation. But I am convinced that when you get the hang of this, you'll never be able to imagine going back to the other way of writing your web apps.
Next, you write your first simple JSON application server.