Select EC2 under All Services; you can also find the EC2 service in the Services menu at the top of the page. The EC2 Dashboard
provides summary information about existing EC2 instances (see figure E.2).
Figure E.2. Creating a new AWS instance
In the EC2 Dashboard, click the blue Launch Instance button to start the instance setup wizard, a sequence of screens where
you can configure the virtual machine you want to launch.
This screen (figure E.3) shows the server hard drive images, or ISOs, that you can install on your virtual machine. Amazon calls these Amazon Machine Images (AMIs).[1] Some AMIs come with deep learning frameworks already installed. That way, you don’t need to install and configure the CUDA
and BLAS libraries or Python packages such as TensorFlow, NumPy, and Keras. To find a free preconfigured deep learning AMI, click the AWS Marketplace or Community AMIs tab on the left side and
search for “deep learning.”[2] You must still choose hardware that can make use of the software features a particular AMI provides.
[1] ISO is short for ISO 9660, an open standard from the International Organization for Standardization for writing disk images so that they
can be transported and installed elsewhere, not only on one proprietary cloud service, such as AWS.
[2] At the time of this writing, one such image under the AWS Marketplace had an AMI ID of ami-f1d51489.
Figure E.3. Selecting an AWS Machine Image
Some of the neural network code in this book was tested on the Deep Learning AMI (Ubuntu), which is designed to take advantage
of any GPU hardware present on your virtual machine. Click the blue Select button next to the AMI you want to use. If you’ve
selected an AWS Marketplace image, you’ll be presented with an estimate of the prices for running the AMI on various EC2
instance types that have a GPU (see figure E.4).
Figure E.4. Cost overview for the machine image and the available instance types in your AWS region
Many open source AMIs, like the Deep Learning Ubuntu AMI, are free, so the Software cost column on the More Info page for
AWS Marketplace shows $0. Other AMIs under the AWS Marketplace tab, such as the RocketML AMI, may have software costs associated
with them. Regardless of the software cost, you’ll need to pay for server instance power-on time if it exceeds your “free
tier” allowance. A GPU instance isn’t covered under the free tier, so make sure your pipeline has been fully tested on a low-cost
CPU machine before running it on a more expensive instance. Click the blue Continue button if you’re viewing this
price list (see figure E.4). If you’ve returned to the AMI lists on AWS Marketplace, you can click the blue Select button next to the AMI you would
like to install on your EC2 instance, which will take you to “Step 2: Choose an Instance Type.”
In this step, you select the server type for your virtual machine (see figure E.5). The smallest GPU instance, g2.2xlarge, is a good value. Amazon’s dark pattern UI will preselect a much more expensive type,
so you’ll have to manually select the g2.2xlarge instance if that’s the one you want. Also, you’ll find that virtual machines are often cheaper if you’ve
selected US West (Oregon), listed as us-west-2, as your region rather than other US regions. You can find this selection in the menu at the upper-right
corner of the page near your account name.
Figure E.5. Choosing your instance type
Once you’ve selected the instance type you’d like to use, you can launch your machine by clicking the blue Review and Launch
button. But for your first instance, you should work your way through all the setup wizard steps so you can see what your
options are, even if you decide to accept the defaults on each of these screens. To proceed to the next step, click the gray
Next: Configure Instance Details button.
Here you can configure the instance details (see figure E.6). If you are already using AWS machines on an existing virtual private cloud (VPC), you can assign your GPU machine to your existing VPC. Machines on the same VPC can use the same gateway or bastion
servers on that VPC to access your machine. But if this is your first EC2 instance or you don’t have a “bastion server,”[3] you don’t need to worry about this.
Selecting “Protect against accidental termination” makes it harder for you to accidentally terminate your machine. On Amazon
Web Services, “terminate” means to power off a machine and wipe its storage. “Stop” means to power down or suspend the machine
while retaining any training checkpoints you may have saved to persistent storage on that machine.
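The same stop and protect actions are also available from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with your credentials; the helper names and the instance ID in the usage lines are hypothetical.

```shell
# Stop (not terminate) an instance; data on persistent storage is retained.
# $1: instance ID
stop_instance() {
    aws ec2 stop-instances --instance-ids "$1"
}

# Turn on termination protection for an instance.
# $1: instance ID
protect_instance() {
    aws ec2 modify-instance-attribute --instance-id "$1" --disable-api-termination
}

# Example (hypothetical instance ID):
# protect_instance i-0123456789abcdef0
# stop_instance i-0123456789abcdef0
```

Wrapping each call in a small function keeps the instance ID as the only thing you have to type, which lowers the chance of stopping the wrong machine.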
To continue, click the Next: Add Storage button.
In this step (figure E.7), you can add storage if you plan to work with large corpora. But you may be better off proceeding with a minimal amount
of “local” storage on your EC2 instance and waiting to mount an Amazon “S3 Bucket” or other cloud storage service after your
EC2 instance is up and running. This will allow you to share large datasets across multiple servers or training runs (between
instance terminations). Amazon Web Services will charge you for any “local” EC2 storage above the 30 GB free tier allowance.
The AWS UX has a lot of dark patterns that make it hard to avoid racking up charges.
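As a rough sketch of the S3 approach described above, the AWS CLI's s3 sync command copies (rather than mounts) a bucket prefix to local storage and back. The helper names, bucket name, and paths are hypothetical, and the sketch assumes the AWS CLI is installed and configured on the instance.

```shell
# Copy a corpus from S3 to local storage on the instance.
# $1: bucket and prefix, $2: local directory
pull_corpus() {
    aws s3 sync "s3://$1" "$2"
}

# Copy model checkpoints from the instance back to S3.
# $1: local directory, $2: bucket and prefix
push_checkpoints() {
    aws s3 sync "$1" "s3://$2"
}

# Example (hypothetical bucket name):
# pull_corpus my-nlp-bucket/corpora ./data
# push_checkpoints ./checkpoints my-nlp-bucket/checkpoints
```

Pushing checkpoints to a bucket before you terminate an instance means a later instance can pull them and resume training.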
Figure E.7. Adding persistent storage to your instance
Click the Next buttons to proceed through the next steps and review the default tags and security groups assigned to your
EC2 instance. The final Next button sends you to the review step (see figure E.8).
Figure E.8. Reviewing your instance setup before launching
On the review screen (see figure E.8), Amazon Web Services shows you the details of your instance in one overview.
Confirm that the instance details, particularly the type (RAM and CPU), the AMI (Deep Learning Ubuntu), and storage (enough
GB for your data), are what you want before clicking the Launch button. At that point, AWS will power up your virtual machine
and start loading your software image onto it.
If you haven’t previously created an instance with AWS, you’ll be asked to create a new key pair (see figure E.9). The key pair allows you to ssh into the machine without a password. By default, EC2 instances don’t allow password login,
so you’ll need to save the .pem file in your $HOME/.ssh/ folder and keep a copy of it in a safe place (such as your password manager). Without it, you won’t be able to access your running
server and will have to start over.
Figure E.9. Creating a new instance key (or downloading an existing one)
After saving your key pair (if you created a new key pair), AWS confirms that the instance is launched. On rare occasions,
the Amazon data center may not have the resources you requested and you’ll receive an error, requiring you to start over.
Click the instance hash that starts with i-... (see figure E.10). The link sends you to the overview of all your EC2 instances, where you’ll see your instance with its state indicated as
“running” or “initializing.”
Figure E.10. AWS launch confirmation
You’ll want to record the public IP address for your instance (see figure E.11). A good place to store it is in your password manager, alongside the .pem file for the key pair you generated earlier. You’ll also want to add it to your $HOME/.ssh/config file under a host name of your choosing, so you don’t have to look up the IP address in the future.
Figure E.11. EC2 Dashboard showing the newly created instance
A typical config file will look something like the following listing. Change the HostName value to the public IP address (from the EC2 Dashboard) or fully qualified domain name (from your “Route 53” Dashboard on
AWS) of the EC2 instance you just launched.
Listing E.1. $HOME/.ssh/config
Host totalgood
    User ubuntu
    # Replace INSTANCE_PUBLIC_IP with your public IP address.
    HostName INSTANCE_PUBLIC_IP
    Port 22
    # The path to the .pem file you downloaded goes here.
    IdentityFile ~/.ssh/nlp-in-action.pem
    # You can leave notes as comments in your config file, such as:
    # ssh -i ~/.ssh/nlp-in-action.pem ubuntu@INSTANCE_PUBLIC_IP
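If you launch instances often, you may prefer to generate such an entry rather than edit it by hand. The following is a minimal sketch; the helper name ssh_host_entry is hypothetical, and the IP address in the usage line is a documentation placeholder, not a real instance.

```shell
# Print an ssh config entry for one host.
# $1: host alias, $2: public IP address or domain name, $3: path to the .pem file
ssh_host_entry() {
    printf 'Host %s\n' "$1"
    printf '    User ubuntu\n'
    printf '    HostName %s\n' "$2"
    printf '    Port 22\n'
    printf '    IdentityFile %s\n' "$3"
}

# Example: append an entry for a new instance to your config file.
# ssh_host_entry totalgood 203.0.113.10 ~/.ssh/nlp-in-action.pem >> "$HOME/.ssh/config"
```

The user name ubuntu and port 22 are hardcoded to match the listing; change them if your AMI's documentation says otherwise.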
Before logging into the AWS instance, ssh requires that the private key file (.pem file in your $HOME/.ssh directory) can be read only by you and the root superuser on your system. You can set the appropriate permissions by executing
the following bash commands:[4]
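The commands could look like the following minimal sketch, wrapped in a hypothetical helper named secure_ssh_dir so that the directory and key path are explicit. The 700 and 600 modes and the id_* file pattern are assumptions chosen to match the numbered notes that follow; adjust them if your key files are named differently.

```shell
# Restrict permissions on an ssh directory and the key files inside it.
# $1: ssh directory (typically "$HOME/.ssh")
# $2: private key file (assumed to be the .pem file from listing E.1)
secure_ssh_dir() {
    # Note 1: only the owner can read, write, or enter the directory.
    chmod 700 "$1" || return 1
    # Note 2: only the owner can read and write the .pem file.
    chmod 600 "$2" || return 1
    # Note 3: same for default key files such as id_rsa and id_rsa.pub.
    for key in "$1"/id_*; do
        if [ -f "$key" ]; then
            chmod 600 "$key" || return 1
        fi
    done
}

# Example:
# secure_ssh_dir "$HOME/.ssh" "$HOME/.ssh/nlp-in-action.pem"
```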
1 This ensures that only you can delete, write, read, and execute the $HOME/.ssh directory.
2 This ensures that only you can write and read the .pem file you downloaded.
3 This ensures that you can read and write any of the key files in your $HOME/.ssh directory, like the default id_rsa and id_rsa.pub
files that may have been generated when your account was created.
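After you’ve set the appropriate file permissions and set up your config file, execute the following bash command (the same one recorded as a comment in listing E.1) to attempt
to log into your EC2 instance. With the Host entry from listing E.1 in place, ssh totalgood is an equivalent shorthand:
$ ssh -i ~/.ssh/nlp-in-action.pem ubuntu@INSTANCE_PUBLIC_IP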
If the Amazon Machine Image is Ubuntu-based, the user name is usually ubuntu. But each AMI will have documentation on the user name and ssh port number required to log into it.
When you log in for the first time, you’re warned that the fingerprint of the machine is unknown (see figure E.12). Confirm with yes to go ahead with the login process.[5]
[5] If you see this warning again later and the IP address of your instance hasn’t changed, someone may be attempting to spoof
the IP address or domain name of your machine and hack into your instance with a man-in-the-middle attack. This is extremely
rare.
Figure E.12. Confirmation request to exchange ssh credentials
After a successful login, you see a welcome screen (see figure E.13).
Figure E.13. Welcome screen after a successful login
As the final step, you need to activate your preferred development environment. The machine image provides various environments,
including PyTorch, TensorFlow, and CNTK. Because we use TensorFlow and Keras in this book, you should activate the tensorflow_p36 environment. This loads a virtual
environment with Python 3.6, Keras, and TensorFlow installed (see figure E.14):
$ source activate tensorflow_p36
Figure E.14. Activating your pre-installed Keras environment
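After activating the environment, it can be worth confirming that the expected tools are on your PATH before starting a long training run. The sketch below uses a hypothetical helper, check_tools; the tool names in the usage line are assumptions about what the AMI provides.

```shell
# Report whether each named command can be found on the PATH.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# Example (tool names are assumptions about the AMI):
# check_tools python ipython nvidia-smi
```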
Now that you’ve activated your TensorFlow environment, head over to
an IPython shell with
$ ipython
Now you’re ready to train your models. Have fun!
E.1.1. Cost control
Running a GPU instance on a cloud service like AWS can quickly get expensive. The smallest GPU instance in the us-west-2 (Oregon) region
cost $0.65 per hour at the time of this writing. Training a simple sequence-to-sequence model can take a few hours, and then
you might want to iterate on your model parameters. These iterations can quickly add up to a sizable monthly bill. You can minimize
surprises with a few precautions (see figures E.15 and E.16):
Turn off idle GPU machines. When you stop (not terminate) your machine, the last state of the storage (except your /tmp folder)
will be preserved and you can return to it. In-memory data will be lost, so make sure to save all your model checkpoints before
stopping the machine.
Figure E.15. AWS Billing Dashboard
Figure E.16. AWS Budget Console
Check your EC2 instance summary page for running instances.
Check your AWS bill summary regularly for charges from instances you left running.
Create an AWS Budget with spending alarms. Once you’ve configured a budget, AWS will alert you when you are exceeding it.
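To get a feel for how the hourly rate adds up, and to check for running instances without opening the console, a minimal sketch follows. The helper names are hypothetical, the run counts in the usage line are illustrative, and list_running_instances assumes the AWS CLI is installed and configured.

```shell
# Estimate the cost in dollars of a series of training runs.
# $1: hourly rate in dollars, $2: hours per run, $3: number of runs
estimate_cost() {
    awk -v rate="$1" -v hours="$2" -v runs="$3" \
        'BEGIN { printf "%.2f\n", rate * hours * runs }'
}

# List the ID, type, and public IP address of every running instance
# in the currently configured region.
list_running_instances() {
    aws ec2 describe-instances \
        --filters "Name=instance-state-name,Values=running" \
        --query "Reservations[].Instances[].[InstanceId,InstanceType,PublicIpAddress]" \
        --output table
}

# Example: 20 three-hour runs at the $0.65 hourly rate quoted above.
# estimate_cost 0.65 3 20    # prints 39.00
```

The estimate covers instance time only; storage and any AMI software costs come on top of it.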