Certificate-based SSH

by Peter Manthey

SSH access to remote servers is often granted through an SSH key that the user generated themselves. The public part of the key is added by hand to the authorized_keys file in the corresponding home directory on the server. Maintaining these keys is therefore an entirely manual process, usually without dual control.

An end to arbitrariness

As part of rebuilding our automated data center infrastructure, we tackled this problem at its root. We took some inspiration from the major platforms and came across, among others, this article: Implementing SSH CA at Facebook

After collecting, refining, and agreeing on the requirements, we searched the free software landscape for suitable tools to set up our CA (certificate authority). We found what we were looking for in Vault, and we used many of its features in the implementation (e.g. the API, seal/unseal via a split master key, the built-in UI, and the SSH CA plugin).

Roles, rights, zones and flow

While collecting the requirements, we quickly realized that we were dealing with quite different authorizations and validity periods. For example, we want to cleanly separate the administration role from on-call duty. On-call duty, in turn, is covered by different employees working in shifts. In the lab, we roll out systems that must not simply be forgotten after an unsuccessful evaluation and left to decay. Collaboration with external persons must also be clearly documented and reflected in the access rights.

This means that everyone should have access to systems only if they need it and have been approved for it. Approvals should follow the four-eyes principle (or six, eight, and so on).

Our requirements resulted in the following roles:

  • Administration
  • On-call (Standby)
  • Lab
  • Development (permanent employees and freelancers)
  • Application

In addition to the roles, we had to define zones for which the roles are responsible. This let us separate rights, for example between development and on-call: on-call staff must be able to access all systems in the event of a fault, while development may only access its own dev environment.

We also defined sensible validity periods for the keys, based for example on project durations or on-call shifts.
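
How roles, zones, and validity periods might be expressed in Vault is sketched below, assuming HashiCorp Vault's SSH secrets engine is used as the CA. The mount path, role names, principals, and TTL values are hypothetical and not taken from our setup. The script only prints the commands by default; the helper functions run and define_role are illustrative names.

```shell
#!/bin/sh
# Sketch only: mount path, role names, principals, and TTLs are hypothetical.
# Commands are printed by default; set APPLY=1 to run them against a real,
# unsealed Vault with a valid token.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

MOUNT="ssh-client-signer"   # assumed mount path

# Enable the SSH secrets engine and let Vault generate the CA key pair.
run vault secrets enable -path="$MOUNT" ssh
run vault write "$MOUNT/config/ca" generate_signing_key=true

# One Vault role per organisational role: the zone is the only allowed
# principal, and max_ttl caps the lifetime of every signed certificate.
define_role() {  # usage: define_role NAME PRINCIPAL MAX_TTL
  run vault write "$MOUNT/roles/$1" \
    key_type=ca \
    allow_user_certificates=true \
    allow_user_key_ids=true \
    allowed_users="$2" \
    allowed_extensions=permit-pty \
    max_ttl="$3"
}

define_role standby     Standby 168h    # e.g. one week of on-call duty
define_role development Dev     2160h   # e.g. a 90-day project
```

Keeping one Vault role per organisational role means that a request for a principal outside the assigned zone, or for a longer lifetime than max_ttl, is rejected by the CA rather than by convention.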

Example workflow: on-call handover

Handover of on-call duty between the current and the incoming employee.

  • The incoming employee creates an SSH key on their client (an existing key can be reused):

ssh-keygen -t ecdsa -f [USER]_[ROLE]_ecdsa

  • The incoming employee sends the public key they just created to the current employee.
  • Vault is unsealed: the current and the incoming employee each enter their part of the master key.
  • The current employee logs in to Vault with their credentials.
  • The current employee signs the public key of the incoming employee:

KeyID: [USER], e.g. JOHNDOE
Valid principals: [ZONE], e.g. Standby
Extensions: "permit-pty": ""
TTL: X days

  • The current employee sends the freshly signed public key back to the incoming employee, who must store it in place of the unsigned public key. (Note: if the key is to be reused later, keep the unsigned public part separately.)
  • After successful signing, the current employee seals Vault again.
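
What the signing step produces can be reproduced locally with plain OpenSSH. The following is a minimal sketch in which a throwaway CA key stands in for Vault; it is not our Vault workflow. The file names and the 7-day validity are illustrative, and the key ID, principal, and extension mirror the example parameters above.

```shell
#!/bin/sh
# Sketch only: a throwaway local CA stands in for Vault; file names and the
# 7-day validity are illustrative.
set -eu
WORK=$(mktemp -d)

# Stand-in CA key pair (in the workflow above, the CA key lives in Vault).
ssh-keygen -q -t ecdsa -N '' -C 'demo-ca' -f "$WORK/ca"

# The incoming employee's key pair, named [USER]_[ROLE]_ecdsa.
ssh-keygen -q -t ecdsa -N '' -C 'johndoe' -f "$WORK/johndoe_standby_ecdsa"

# Sign the public key: -I sets the key ID, -n the valid principals,
# -V the validity period. -O clear drops the default extensions and
# -O permit-pty re-enables only PTY allocation.
ssh-keygen -q -s "$WORK/ca" -I JOHNDOE -n Standby -V +7d \
  -O clear -O permit-pty "$WORK/johndoe_standby_ecdsa.pub"

# ssh-keygen writes the certificate next to the key as <name>-cert.pub.
CERT="$WORK/johndoe_standby_ecdsa-cert.pub"

# Print key ID, principals, validity window, and extensions.
ssh-keygen -L -f "$CERT"
```

The output of the last command lists the key ID, the principals, the validity window, and the extensions, which makes it a quick check that a certificate matches what was requested. OpenSSH clients also look for a certificate named <key>-cert.pub next to the private key.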

From here on, the magic happens in the background: the key exchange on the servers is handled automatically in conjunction with Vault.

The current employee can go to sleep with peace of mind. The incoming employee is now the current employee, and the former current employee no longer has access to the machines because the validity of their signed key has expired.
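
What a server needs in order to accept such certificates can be sketched as follows. This is an assumption about a typical OpenSSH setup, not a description of our automation: the paths, the login account oncall, and the placeholder CA key are hypothetical, and the files are written to a scratch directory instead of /etc/ssh.

```shell
#!/bin/sh
# Sketch only: paths, the "oncall" login account, and the placeholder CA key
# are hypothetical; on a real server these files would live under /etc/ssh.
set -eu
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc/ssh/auth_principals"

# sshd accepts every certificate signed by a CA listed in this file. With
# Vault's SSH secrets engine the CA public key can be read from the mount's
# public_key endpoint; a placeholder stands in for it here.
echo 'ecdsa-sha2-nistp256 AAAA...placeholder... demo-ca' \
  > "$ROOT/etc/ssh/trusted-user-ca-keys.pem"

# sshd_config directives: trust the CA and map principals to accounts.
cat > "$ROOT/etc/ssh/sshd_config.snippet" <<'EOF'
TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
EOF

# Certificates carrying the principal "Standby" may log in as "oncall".
echo 'Standby' > "$ROOT/etc/ssh/auth_principals/oncall"

cat "$ROOT/etc/ssh/sshd_config.snippet"
```

With this layout no individual public keys are stored on the server. Access is decided by the CA signature, the principal, and the certificate's validity period.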
