The Secure Shell Protocol (SSH) is commonly used to remotely manage Unix systems. SSH was originally invented as a secure alternative to the telnet command but soon became the de facto remote management tool.
Even systems that use custom agents to manage a server fleet, such as Salt, are often bootstrapped with SSH to install the custom agents. When a system is described as agent-less, as Ansible is, for example, it usually means that it uses SSH as its underlying management protocol.
The Paramiko library implements an SSH client. This allows automating remote management of Unix systems using Python.
Paramiko has both high-level and low-level abstractions of the SSH protocol. This chapter covers, for the most part, the high-level abstractions.
Before delving into the details, it is worth noting the synergy Paramiko has with Jupyter. Using a Jupyter notebook, and running Paramiko inside it, provides a powerful auto-documented remote-control console. Having multiple browsers connected to the same notebook means it has a native ability to share troubleshooting sessions for remote servers without the need for cumbersome screen sharing.
The Paramiko library relies on a few binary wheels to implement the cryptographic operations that are part of the SSH protocol. On systems with good support for binary wheels, like Windows, macOS, or Linux distributions that use the GNU C Library, pip install paramiko installs the library without further work.
Installing Paramiko without using the binary wheels from PyPI for its dependencies can be more complicated. The official install guide covers the relevant steps and should be followed. These steps can sometimes change as dependencies are upgraded. For example, cryptography changed its tooling to take advantage of Rust, which means building it from source requires a Rust compiler.
9.1 SSH Security
SSH allows you to securely control and configure remote hosts. However, security is a subtle topic. Even if the underlying cryptographic primitives, and the way the protocol uses them, are secure, misusing them can still open the door to a successful attack.
To use SSH securely, it is important to understand how it thinks about security. Unfortunately, it was built at a time when affordance for security was not considered a high priority. It is easy to use SSH in a way that negates all the security benefits it offers.
The SSH protocol establishes mutual trust. The client is assured that the server is authentic, and the server is assured that the client is authentic. There are several ways it can establish this trust, but this discussion covers only the public key method, which is the most common one.
A server’s public key is identified by a fingerprint. This fingerprint confirms the server’s identity in one of two ways. One way is for the fingerprint to be communicated over a previously established secure channel and saved in a file.
For example, when an AWS EC2 server boots up, it prints the fingerprint to its virtual console. The contents of the console can be retrieved using an AWS API call (which is secured using the web's TLS model) and parsed to retrieve the fingerprint.
The other way is the Trust On First Use (TOFU) model. In the initial connection, the fingerprint is assumed to be authentic and stored locally in a secure location. On any subsequent attempts, the fingerprint is checked against the stored fingerprint, and a different fingerprint is marked as an error.
The fingerprint is a hash of the server’s public key. If the fingerprints are the same, the public keys are the same. A server can prove that it knows the private key that corresponds to a given public key. In other words, a server can say "here is my fingerprint" and prove that it is indeed a server with that fingerprint. Therefore, if the fingerprint is confirmed, the identity of the server is confirmed.
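The fingerprint comparison and the TOFU rule can be sketched with the standard library alone. This is a minimal sketch, assuming the OpenSSH-style SHA-256 fingerprint format and using a plain dictionary as the local cache. The names fingerprint, check_tofu, and HostKeyMismatch are illustrative and are not part of SSH or Paramiko.

```python
import base64
import hashlib


class HostKeyMismatch(Exception):
    """The host presented a key that differs from the one seen earlier."""


def fingerprint(public_key_b64):
    """Hash a base64-encoded public key blob into an OpenSSH-style fingerprint."""
    blob = base64.b64decode(public_key_b64)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as unpadded base64 with a "SHA256:" prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def check_tofu(cache, hostname, public_key_b64):
    """Trust the key on first use; afterwards, insist that it never changes."""
    seen = fingerprint(public_key_b64)
    stored = cache.setdefault(hostname, seen)  # first contact stores the value
    if stored != seen:
        raise HostKeyMismatch(f"{hostname}: expected {stored}, got {seen}")
    return stored
```

In a real deployment the cache would have to live somewhere persistent and protected, since anyone who can rewrite it can impersonate the server.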
On the other side, users can tell the server which public keys they trust. Again, this is often done via some out-of-band mechanism: a web API for the system administrator to put in a public key, a shared filesystem, or a boot script that reads information from the network. Regardless of how it is done, a user’s directory can contain a file that means, “please authorize connections that can prove they have a private key corresponding to this particular public key as coming from me.”
When an SSH connection is established, the client verifies the server’s identity and then provides proof that it owns a private key corresponding to some public key on the server. If both steps succeed, the connection is verified in both directions and can be used for running commands and modifying files.
9.2 Client Keys
Client private and public keys are kept in files next to each other. Users often already have a key; if not, generating one is easy.
However, when you write out the private part of the key, make sure that the file permissions are secure. Change the mode after opening the file but before writing any sensitive data to it.
A mode of 0o600 gives read and write permissions to the owner, no permissions to non-owner group members, and no permissions for anyone else.
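The order of operations matters here: open, restrict, then write. The following is a minimal sketch of that order on a POSIX system. The helper name write_private_file is hypothetical; the Paramiko calls in the trailing comment (ECDSAKey.generate and write_private_key) are shown only as one possible way to use it.

```python
import os


def write_private_file(path, write_secret):
    """Create path, restrict it to its owner, and only then write the secret."""
    with open(path, "w") as fp:
        # The file exists but is still empty, so nothing sensitive is exposed
        # while the default permissions are in effect.
        os.chmod(path, 0o600)
        write_secret(fp)


# Possible use with Paramiko (assumes the library is installed):
#     key = paramiko.ECDSAKey.generate()
#     write_private_file("client.key", key.write_private_key)
```

Passing the writer as a callable keeps the permission logic in one place regardless of which kind of key is being saved.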
Now, through some out-of-band mechanism, push the public key to the relevant server.
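Before the public key can be pushed anywhere, it has to be written in the one-line form that servers expect. The sketch below assumes key is a Paramiko key object, whose get_name and get_base64 methods return the key type and the encoded public part. The helper name write_public_key and the comment argument are illustrative.

```python
def write_public_key(private_path, key, comment):
    """Write the public half next to the private key file and return its path."""
    public_path = private_path + ".pub"
    # One line: key type, base64-encoded public key, free-form comment.
    line = f"{key.get_name()} {key.get_base64()} {comment}\n"
    with open(public_path, "w") as fp:
        fp.write(line)
    return public_path
```

The resulting line is what gets appended to the list of authorized keys on the server, by whichever out-of-band mechanism is in use.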
Another thing that is sometimes done is using a Docker container as a bastion. This means you expect users to SSH both into the container and from the container into the specific machine they need to run commands on.
In this case, a simple COPY instruction at build time (or a docker cp at runtime, as appropriate) accomplishes the goal. Note that it is perfectly fine to publish an image with public keys to a Docker registry. The requirement that this is a safe operation is part of the definition of public keys.
9.3 Host Identity
The TOFU principle is the most common first line of defense against man-in-the-middle attacks in SSH. After connecting to a host, its fingerprint must be saved in a cache for this to work.
The location of that cache used to be straightforward—a file in the user’s home directory. However, more modern setups of immutable, throw-away environments, multiple user machines, and other issues complicate this.
A client can set a MissingHostKeyPolicy, which is an instance of a class that implements a small interface. This means that you can add logic to record the key or to query an external database for it.
Paramiko also provides an abstraction of the most common format on Unix systems, the known_hosts file. By reading that file and recording new entries in it, Paramiko shares its knowledge of host keys with the regular SSH client.
9.4 Connecting
The client can be used as a context manager. When the with statement is at the top level, or as close to it as is reasonable, functions can accept client as an argument without worrying about its lifetime. The connection is closed at the end of the stanza.
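The division of labor described above can be sketched as follows: the top-level function owns the connection, and the helper only borrows it. The function names are illustrative, and the hostname is whatever the caller supplies. The import sits inside main only so that the helper can be exercised without Paramiko installed.

```python
def remote_hostname(client):
    """Run `hostname` remotely; the caller owns the client's lifetime."""
    _stdin, stdout, _stderr = client.exec_command("hostname")
    return stdout.read().decode().strip()


def main(hostname):
    # Imported here only so the helper above stays usable without Paramiko.
    import paramiko

    # SSHClient is a context manager: the connection is closed when the
    # with block ends, even if the helper raises.
    with paramiko.SSHClient() as client:
        client.load_system_host_keys()
        client.connect(hostname)
        print(remote_hostname(client))
```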
Sometimes, before connecting, various policies need to be configured on the client. This is sometimes useful in a function that returns a ready-to-connect client.
For example, the policy can be set to WarningPolicy(). This policy uses the Python warnings module to warn about missing keys and allows the connection.
Policies are instances that define a method: policy.missing_host_key(client, name, key). The method should raise an error to prevent the connection. Any successful return is treated as a success.
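A policy that consults an external source of truth might look like the sketch below. It relies only on the method signature described above; in real code, subclassing paramiko.MissingHostKeyPolicy is the documented route. The names LookupPolicy and UnknownHostKey, and the lookup callable, are hypothetical.

```python
class UnknownHostKey(Exception):
    """Raised to refuse a connection to a host whose key is not trusted."""


class LookupPolicy:
    """Accept a host key only if an external lookup vouches for it."""

    def __init__(self, lookup):
        # lookup(hostname) returns the expected base64 public key, or None
        # when the host is not known at all.
        self._lookup = lookup

    def missing_host_key(self, client, hostname, key):
        expected = self._lookup(hostname)
        if expected is None or expected != key.get_base64():
            # Raising prevents the connection from being established.
            raise UnknownHostKey(f"refusing unverified key for {hostname}")
        # Returning normally allows the connection.
```

It could be installed with client.set_missing_host_key_policy(LookupPolicy(inventory.get)), where inventory is, for this illustration, a dictionary that maps host names to keys obtained over a trusted channel.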
load_system_host_keys() loads a file with host keys that should not be modified.
load_host_keys() loads a file with host keys that can be modified.
The latter is typically the known_hosts file, which is also checked and updated by the command-line SSH tool. New keys are automatically saved to it if paramiko.AutoAddPolicy() is set as the policy.
Note that only keys loaded via load_host_keys() are resaved. Keys loaded via load_system_host_keys() are not saved and are expected to be loaded again when recreating a client.
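The difference between the two loading methods is easiest to see when both are used together. The following is a minimal sketch of a function that prepares a client before it connects; prepare_client and the path in the trailing comment are hypothetical.

```python
def prepare_client(client, policy, known_hosts_path):
    """Configure host-key handling on a client that has not connected yet."""
    # Read-only source of trusted keys: never written back.
    client.load_system_host_keys()
    # Read-write source: keys accepted by the policy can be saved here.
    # The file must already exist, or loading it fails.
    client.load_host_keys(known_hosts_path)
    client.set_missing_host_key_policy(policy)
    return client


# Possible use (assumes Paramiko is installed and the file exists):
#     client = prepare_client(
#         paramiko.SSHClient(), paramiko.AutoAddPolicy(), "fleet_known_hosts"
#     )
```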
The connect method takes several arguments.
hostname is the server to connect to.
port is needed if the server listens on a port other than 22. This is sometimes done as part of a security protocol: an attempt to connect to port 22 automatically blocks all further connections from that IP address, while the real server runs on 5022 or on a port that is only discoverable via an API.
username is the name of the user. The default is the local user name, but the remote user name is often different. Cloud virtual machine images, for example, often have a default system user.
pkey is a private key to use for authentication. This is useful if you want some programmatic way to get the private key (for example, retrieving it from a secret manager).
allow_agent is True by default, for good reasons. Using an agent is often a good option because Paramiko then never loads the private key. Therefore, no matter what happens, the private key itself cannot be compromised by anything inside the Python process; for example, by accidentally logging the __dict__ of an instance.
look_for_keys can be set to False, while giving no other key options, to force using an agent.
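Put together, a connection that authenticates only through an agent can be sketched as follows. The function name is illustrative; the keyword arguments are the connect parameters described above.

```python
def connect_via_agent(client, hostname, username, port=22):
    """Connect using only keys held by a running SSH agent."""
    client.connect(
        hostname,
        port=port,
        username=username,
        allow_agent=True,     # ask the agent to prove key ownership
        look_for_keys=False,  # do not read private key files from disk
    )
    return client
```

Because no pkey is passed and key files are not searched, the private key never enters the Python process.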
9.5 Running Commands
The original SSH was invented as a telnet substitute, and its main job is still to run commands on remote machines. Note that remote is taken metaphorically, not always literally. SSH is sometimes used to control virtual machines and sometimes even containers that might be running close by.
After a Paramiko client has connected, it can run commands on the remote host. This is done using the exec_command client method. Note that this method takes the command to be executed as a string, not a list. This means that extra care must be exercised when interpolating user values into the command to make sure that it does not give a user complete execution privileges.
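One way to exercise that care is to quote every interpolated value with the standard library's shlex.quote. This is a minimal sketch, assuming the remote login shell is POSIX-compatible; the function name and the ls command are only an illustration.

```python
import shlex


def list_command(path):
    """Build a command string that lists one path, whatever characters it has."""
    # shlex.quote wraps the value so the remote shell sees a single word;
    # the "--" stops ls from treating a leading "-" as an option.
    return f"ls -l -- {shlex.quote(path)}"
```

With this helper, a hostile value such as "/tmp/x; rm -rf /" is passed to ls as one odd file name instead of being run as a second command.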
The return value of exec_command() is the command’s standard input, output, and error. This means that the responsibility of communicating carefully with the command to avoid deadlocks is firmly in the hands of the end-user. The best way to do so is to avoid commands which read from standard input. If at all possible, create a file on the remote machine first.
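For a command that reads nothing from standard input, the three streams can be handled as in the sketch below. It assumes the command's output is modest in size; a command that writes a large amount to standard error while standard output is being read would need a more careful approach. The function name run is illustrative.

```python
def run(client, command):
    """Run a non-interactive command; return (exit status, stdout, stderr)."""
    stdin, stdout, stderr = client.exec_command(command)
    # Nothing is sent to the command, so close its input right away.
    stdin.close()
    # Drain the output before asking for the exit status, so the remote
    # side is never left waiting for its output to be read.
    out = stdout.read()
    err = stderr.read()
    status = stdout.channel.recv_exit_status()
    return status, out, err
```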
The client also has an invoke_shell method, which creates a remote shell and allows programmatic access to it. It returns a Channel object connected directly to the shell. The send method on the channel sends data to the shell, just as if a person were typing at the terminal.
Similarly, the recv method allows retrieving the output. Note that this can be tricky to get right, especially around timing. In general, using exec_command is much safer. Opening an explicit shell is rarely needed, except for running commands that need interaction. For example, remotely running visudo requires real shell-like access.
9.6 Remote Files
To start file management, call the client’s open_sftp method, which returns an SFTPClient object. You use methods on this object for all the remote file manipulation.
Internally, this starts a new SSH channel on the same TCP connection. This means that even while transferring files back and forth, the connection can still be used to send commands to the remote host.
SSH does not have a notion of the current directory. Though SFTPClient emulates it, it is better to avoid relying on it and instead use fully qualified paths for all file manipulation. This makes code easier to refactor, and it does not have subtle dependencies on the order of operations.
9.6.1 Metadata Management
Sometimes you do not want to change the data but merely filesystem attributes. The SFTPClient object allows you to do the normal manipulation you expect.
For example, the chmod method sets a file's mode, such as 0o644. Note that the 0644 notation, borrowed from C, does not work in Python 3 (and is deprecated in Python 2). The 0o644 notation is more explicit and Pythonic.
(Passing a plain decimal 644 by mistake sets mode 0o1204, which would correspond to -w----r-T in a directory listing. That is not insecure but very confusing!)
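The difference between the octal and the decimal literal can be checked locally with the standard library's stat module, without touching a remote host. In this sketch, listing is an illustrative helper, and the path in the trailing comment is hypothetical. Note that decimal 644 also happens to set the sticky bit, which a listing shows as a trailing T.

```python
import stat


def listing(mode):
    """Render permission bits as a directory listing would, for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


print(listing(0o644))  # the intended mode: -rw-r--r--
print(listing(644))    # decimal 644 is 0o1204: --w----r-T

# With an SFTPClient from client.open_sftp(), the mode would be applied as:
#     sftp.chmod("/etc/example.conf", 0o644)
```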
chown changes the owner of a file.
listdir_iter retrieves file names and metadata.
stat and lstat retrieve file metadata.
posix_rename atomically changes a file’s name. (Do not use rename because it has confusingly different semantics; it is there for backward compatibility.)
mkdir and rmdir create and remove directories.
utime sets the accessed and modified times of a file.
9.6.2 Upload
There are two main ways to upload files to a remote host with Paramiko. One is to simply use put. The easiest way is to give it a local path and a remote path and copy the file. The function also accepts other parameters, mainly a callback to call with intermediate progress. However, it is better to upload differently if such sophistication is required.
The open method on SFTPClient returns an open file-like object. It is straightforward to write a loop that remotely copies block by block or line by line. In that case, the logic for progress can be embedded in the loop itself instead of having to supply a callback function and carefully maintain state between calls.
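Such a loop might look like the following sketch. It assumes sftp is the object returned by open_sftp; the function name, the block size, and the optional progress callable are illustrative choices.

```python
def upload(sftp, local_path, remote_path, block_size=32768, progress=None):
    """Copy a local file to the remote host block by block; return bytes sent."""
    sent = 0
    with open(local_path, "rb") as local, sftp.open(remote_path, "wb") as remote:
        while True:
            block = local.read(block_size)
            if not block:
                break
            remote.write(block)
            sent += len(block)
            # Progress lives inside the loop, so no state has to be
            # carried between callback invocations.
            if progress is not None:
                progress(sent)
    return sent
```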
9.6.3 Download
Much like uploading, there are two ways to retrieve files from the remote host. One is via the get method, which takes the names of the remote and local files and manages the copying.
The other is again by using the open method, this time in read mode instead of write, and copying block by block or line by line. Again, if a progress indicator is needed or feedback from the user is desired, that is the better approach.
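One possible shape for that approach is a generator that reports the running byte count, leaving it to the caller to decide how to present progress. This is a sketch under the same assumptions as before: sftp comes from open_sftp, and the function name and block size are illustrative.

```python
def download(sftp, remote_path, local_path, block_size=32768):
    """Copy a remote file locally, yielding the byte count after each block."""
    received = 0
    with sftp.open(remote_path, "rb") as remote, open(local_path, "wb") as local:
        while True:
            block = remote.read(block_size)
            if not block:
                break
            local.write(block)
            received += len(block)
            yield received
```

Because this is a generator, nothing is copied until it is iterated, for example with a for loop that prints each count or asks the user whether to continue.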
9.7 Summary
Most Unix-based servers can be managed remotely using the SSH protocol. Paramiko is a powerful way to automate management tasks in Python while assuming the least about any server: only that it runs an SSH server that you have permission to log in to.