Keep Cloud Infrastructure Secure & Compliant
How to Keep Your Cloud Infrastructure Secure and Compliant - overview
In a world of hyperscale public clouds, dynamically provisioned environments, distributed teams and remote work, how can you reliably secure access to your infrastructure and satisfy compliance requirements without slowing down your development teams? Gus Luxton discusses the essential elements of secure infrastructure access and how you can implement best practices in your environment.
Key topics on How to Keep Your Cloud Infrastructure Secure and Compliant
- Once people get access to infrastructure, there is traditionally very little visibility into what's actually going on.
- There are five key things you can do to secure cloud infrastructure, all covered in this webinar.
- Teleport provides secure defaults and best practices out of the box that provide visibility into infrastructure access and user sessions.
- Teleport is open source and has a very active community where you can get answers to your questions.
Expanding your knowledge on How to Keep Your Cloud Infrastructure Secure and Compliant
- Teleport Server Access
- Teleport Application Access
- Teleport Kubernetes Access Guide
- Teleport Database Access
Introduction - How to Keep Your Cloud Infrastructure Secure and Compliant
(The transcript of the session)
Lucy: 00:00:03.585 Hello, everybody. And welcome to today's webinar presentation, “How to Keep Your Cloud Infrastructure Secure and Compliant Without Slowing Down Developer Productivity”. Brought to you by Teleport. Before we get started, I have a few housekeeping items that I need to go over. First, if you have any questions for our presenter today, post those into the questions panel so we can get to them after the presentation. If you experience any technical difficulties today, please post a message in the chat panel, and I'll work with you to resolve those issues. Last on my housekeeping list, today's presentation is being recorded. It will be available shortly after the webinar ends. Now that we are through those housekeeping items, I'd like to introduce you to our presenter, Gus Luxton. Gus is originally from the UK and currently lives in Canada. Formerly a production engineer at Facebook and now a Solutions Engineer at Teleport, Gus helps developers securely access their infrastructure as well as deliver cloud-native applications to restricted and regulated environments. When he's not working, Gus enjoys PC gaming, great food, and craft beer. With that, I'll hand it over to Gus.
What cloud infrastructure encompasses
Gus: 00:01:05.160 Hey, thanks very much. Hi, everyone. Yeah, Gus Luxton, as said. I work here at Teleport, and today I'm going to do a webinar for you on keeping your cloud infrastructure secure and compliant without slowing down developer productivity. So let's get going. Now, what do I mean when I say cloud infrastructure? Well, it can be pretty much anything on this list. Bare metal servers; VMs more commonly, things like EC2 and Azure; Docker containers that you run on those; Kubernetes pods, whether those are in EKS, GKE, any sort of managed Kubernetes environment, or self-hosted Kubernetes built from scratch; serverless applications like Lambda functions; databases; web applications; Windows desktop and server machines; and the cloud consoles and APIs themselves. We're talking about a very broad range of everything. These days, so many things are hosted in the cloud, and it's incredibly important to keep them secure and make sure that only the right people get access to them. And that's one of the things that we specialize in here at Teleport. That's something we really like to help people with.
Why cloud infrastructure security matters
Gus: 00:02:11.486 So why does it matter? Well, sharing passwords is insecure. Traditionally, that's been the way a lot of places have approached this. And even with password vaults like 1Password or LastPass or anything like that, it can still be an issue to share those passwords between people. You don't necessarily know who has access to them. People can copy and paste them, give them to others. It's not really easy to see exactly who has access. To get around this, people have traditionally come up with things like SSH public keys. So if you need to give access to a server, you load on an SSH public key, but those don't really scale either. You can add them fairly easily with automation, things like Ansible, Salt, that kind of stuff. But it's very hard to keep track of who has access to the private key which will allow you access. It's incredibly difficult to keep track of. Onboarding and offboarding users is difficult. Whenever you have a new user join, you have to add them to a system, and you have to give them access to things in maybe many different places. If you're using three cloud services or more, you may have to add them to each one of those consoles individually to give them access. It's a big workload and cognitive strain. Credential leaks and compromises are a real risk. We've seen more and more of those, particularly in the past few years. And every one of these great big breaches presents problems for people. It's really difficult to keep track of them, to see where things are happening. You have to change your passwords all the time. You have to do all this sort of thing. It's difficult.
Gus: 00:03:41.535 Once people do get access to infrastructure, traditionally, there's very little visibility into what's actually going on. Once you sign into the AWS Console, there is information there in CloudTrail and so forth about what people are doing, but it's quite tricky to be able to keep track of that. There's not necessarily one central location where you can get that visibility. And also, the final point, and probably the most important for some of us: developers really don't like clunky, overbearing security solutions. Anyone who's ever worked in a regulated or government environment knows this. I certainly have. I've been there in the past. I used to be a contractor and worked for the UK government, and I have had to deal with a lot of very arbitrary restrictions placed on me, people who won't listen to reason, people who won't open anything up. You have to do things a certain way, and that's the only way around it. And of course, that hurts your productivity. Trying to do anything in that restricted, regulated environment is very difficult, and developers just end up working around the restrictions. I certainly did, and I know other people did as well. And that's not really what you want. It's the antithesis of security. That's not the sort of thing you want to be happening. So you want to make it easy for developers to do the right thing. And that's what we're all about.
Top 5 things you can do to secure cloud infrastructure
Gus: 00:04:48.083 So what can you do? Well, if you take nothing else away from the webinar today, take these five points. Number one, don't use shared credentials or SSH public keys. They're antiquated, and there are much better ways to do things. And I'm going to go into exactly what those are. Point two, use an external identity database to identify your users. So don't add users in individual locations, having to add them whenever they join and then remove them when they leave. Have a central identity database, a single sign-on identity provider, which keeps a record of your users and what they should have access to. Step three, give people the lowest level of access that they need. So don't give them access to everything out of the box. Give them a baseline level of access, because they can always request more. And I'll go into details about that too. Step four, use a bastion or proxy server and funnel access through there for audit logging. And step five, one of the most important, enforce the use of a second factor when logging in.
Step 1: Don't use shared credentials or SSH public keys
Gus: 00:05:47.099 So step one, don't use shared credentials or SSH public keys. I'm going to repeat it because it's very important. But do, do use short-lived certificates issued by a central certificate authority. Public keys are great. They've served us well for many years. They are more secure than a password. You don't type them in. So your public key is loaded onto a machine. You keep the private key associated with that. And when you go to log in, your private key gets used to sign the authentication packets, and it authenticates that you, the holder of the private key, are allowed to access the server. They're more secure. You don't have to type them. You never type a public key. It's a huge string. It's got a lot of entropy. It's very hard to guess. It's very hard to keylog because it never gets typed. It just gets pasted into a box, saved on disk, and left alone, so you won't get keylogged. And they're cryptographically secure. Public keys are backed by a variety of different types of cryptography. I say complicated math with prime numbers here; that's in the case of RSA, one of the best-known algorithms for generating keys. But there are also much newer algorithms, elliptic curve cryptography and that sort of thing, which generate far shorter keys with the same level of security.
Gus: 00:07:02.987 But there are downsides as well. Some of the cons of this. They're pretty hard to keep track of. So you generate a public key on your machine, and you install that public key on a server so it gives you access. You can put a comment on the end of that. So I can say this is Gus' public key, and that's how Gus is getting access to this server. But quite often keys will get shared. I've definitely pasted a private key to someone in Slack many years ago. I've done this sort of thing. There have been private keys in password vaults which I've pulled out and used to access other services. Sometimes they've been checked into repos on GitHub — private repos, of course, but nevertheless they're there. And there are shared credentials which people are using. So often you don't really know who has the private key and who is getting access. You know that a public key is granting the access, but you don't know who actually has the private key. It could be one person; it could be hundreds.
Gus: 00:07:50.497 Public keys don't expire by default. So when you add a public key to a system, if you forget about it, you give the person who's got the private key — which could be one person, it could be many — perpetual access. So unless you remove that, which may impair access for a lot of people, you really don't know who's going to be able to get in and for how long. And they're hard to distribute as well. As I say, you can do this with Ansible, Puppet, anything like that. You can push these keys out to servers. It works. It can be done. I've definitely been there in the past. But there are better ways, and you don't need to do that anymore. There are alternatives. And I'm going to tell you what those are. There are more downsides. There's a lack of accountability, as I mentioned before. Who does that key belong to? The comment can say anything it likes. Does that mean that's actually the person who has that key? And how do you know? It's pretty hard to enforce credential rotation as well. So people run ssh-keygen once, they generate a key, they keep the private key. They put the public key on all their servers, and they use that for access. Most people, including myself, historically, often don't rotate those. So they get set once, and they get left behind, and they're there for years. I haven't rotated some of mine in a very long time, and it's bad practice. I should do better. I know I should do better, and that's what I'm trying to do now. And these keys are pretty hard to revoke quickly. If you've put a public key out to a whole load of servers and suddenly you find out that the private part of that key has been compromised — someone's laptop has been taken, and it wasn't encrypted, and it had the key left on it — you've suddenly got to revoke the public part of that key from absolutely everywhere. But where is it? How do you know? Maybe your Ansible runs can tell you, "Oh, it was applied to these 50 machines," but do you know that someone didn't go and copy that elsewhere manually? How do you know that that has been revoked from everywhere? It's going to be a pretty time-consuming operation to do that.
Gus: 00:09:42.984 But there is hope. There is an alternative. And the hope is certificates. So SSH certificates, and also X.509 certificates — the same sort of thing that's used for HTTPS and encrypting traffic between websites and browsers — and any kind of certificate method in general. Certificates have all the pros of the public keys, but they have none of the downsides that I mentioned. They can be set to expire. They do have metadata and logging associated with them. You do know who issued a certificate and when it was issued, and you can set an expiry time on it. You can force credential rotation. It's the same principle as public keys, except that rather than adding one public key which associates with one private key, you add the public key of a certificate authority to your system, saying, "I trust that any certificate issued by this certificate authority is genuine. It's real. We should allow access." And if you ever need to revoke that access, you can remove the CA, and all the certificates that were issued by that CA are suddenly invalid. So it's not a question of individual people generating certificates and putting them in place. It's one central CA which issues all the certificates, and you can track who issued the certificates, when they were issued, what permissions were given, what time each was issued, and how long it is valid for. And you can enforce that expiry out of the box. The metadata that you can add can be usernames, it can be email addresses, it could be ticket identifiers for production issues and cases like that.
Gus: 00:11:16.553 Certificates are really good. I'm going to keep driving this point home because it's incredibly important. You can issue new certificates easily, really, really easily. A certificate authority — that's exactly what it's designed to do. It issues certificates. And that's so easy that you can make certificates expire every few hours if you like. The process to get a new certificate is very simple. It's a case of issuing a request and getting a certificate back once you've been authenticated, again, with that central identity, which we'll touch on later. So you can make these certificates expire every few hours. You can also revoke certificates. So if you have a problem and you know that a certificate has been compromised, you can revoke it, which makes it instantly invalid. That's very useful. So you can just revoke that centrally, and the certificate authority will provide that for you. Rotating credentials becomes very simple as well. You set up a new certificate authority, you add that, and then you start issuing the new certificates from the new authority instead. And you can use a grace period for old certificates as well. And one of the reasons I'm dwelling on this so much is because Teleport does all of this out of the box. Teleport uses a central certificate authority. It uses short-lived expiry times, and it makes sure that you do the right thing with very little configuration. These are all sane defaults. Teleport's default certificate lifetime is eight hours, roughly the same as a working day. And that makes sure that people get a certificate at the beginning of the day, they use it all day, and it expires when they leave. That's essentially the ideal in terms of security. And when they come in the next day, they have to get a new certificate before they can carry on. And that goes through the same authentication process, making sure that they're allowed to have it.
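To make this concrete, here's a minimal sketch of the same idea using plain OpenSSH rather than Teleport itself — the identity, principals, and paths are illustrative, not a prescribed layout:

```bash
# Create a user certificate authority (done once; keep the private key safe).
ssh-keygen -t ed25519 -f user_ca -C "user-ca"

# A user generates a normal keypair on their own machine.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

# The CA signs the user's public key, producing a certificate that embeds
# an identity (-I), the logins it is valid for (-n), and an 8-hour expiry (-V).
ssh-keygen -s user_ca -I "gus@example.com" -n gus,ec2-user -V +8h ~/.ssh/id_ed25519.pub

# Inspect the certificate: key ID, principals, and validity window.
ssh-keygen -L -f ~/.ssh/id_ed25519-cert.pub
```

On the server side, a single `TrustedUserCAKeys /etc/ssh/user_ca.pub` line in `sshd_config` is enough to trust every certificate that CA issues — no per-user `authorized_keys` entries to distribute or clean up.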
Gus: 00:12:51.812 There's more to this even. So certificates don't just have to authenticate that I, Gus, a user, want to connect to a given host. They can work the other way around too. So they can also authenticate the hosts that I'm connecting to, so that when I connect, I know that the host I'm connecting to is legitimate because it's presenting a certificate that's trusted by my host certificate authority, which I have installed locally on my laptop. Have you ever seen this prompt before? The "Do you want to trust this key?" prompt. Yes, no. It's called TOFU, or trust on first use. We've probably all seen it if we've ever SSHed anywhere, because it's the first thing you see whenever you connect to a machine and you don't have a host certificate authority configured. But it's quite insecure. The idea originally was that when these keys were presented, you'd get the fingerprint out on the screen, and you're supposed to go and look somewhere to make sure that that's the correct fingerprint for that server. But I'm going to wager that 9 and a half times out of 10, people never actually go and check that; they just type yes, and they connect. That's a problem. It means that you don't know that you're connecting to the right place. It's dangerous, and we shouldn't do it. We should do better than that. You can eliminate this, as I say, by using host certificates: you issue a certificate to a host, you get the host to present that certificate, and you configure the client to trust that certificate authority. And all of a sudden, all of these trust-on-first-use prompts disappear. They're a thing of the past. Whenever you're connecting to a host, you can be certain that that host has a certificate issued by a trusted certificate authority.
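Again as a rough sketch with plain OpenSSH — hostnames and paths here are illustrative — host certificates work like this:

```bash
# Sign the server's existing host key with a host CA; -h marks it as a host
# certificate, and -n lists the hostnames it is valid for.
ssh-keygen -s host_ca -I "web1.example.com" -h -n web1.example.com -V +52w \
    /etc/ssh/ssh_host_ed25519_key.pub

# Tell sshd to present it, with one line in /etc/ssh/sshd_config:
#   HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

# On each client, trust the host CA instead of individual host keys --
# this single known_hosts line replaces every trust-on-first-use prompt.
echo "@cert-authority *.example.com $(cat host_ca.pub)" >> ~/.ssh/known_hosts
```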
Step 2: Use an external identity database to authenticate your users
Gus: 00:14:23.556 That's it for step one. So step two, don't add users individually to each provider or system. What you should be doing is using an external identity database to authenticate the users. And then when you issue the certificates, as we mentioned in step one, you encapsulate that identity in there. So when I log in and I get a certificate, it says, "This is Gus' certificate." It's valid for my email address. It's valid for the next eight hours, and there's a full audit trail associated with all of that. Why do you want to do this? Maybe that's important to explain. Well, do you know who every one of your users is? Can you name them all? Back when I first joined Teleport, I could. I certainly can't anymore. There are a lot of people working for the company now, and I don't know all of the names. I couldn't tell you who all of them are. And moreover, I couldn't tell you which individual person has access to which individual parts of our infrastructure. I really don't know. There are just too many now. It's spiraled out of control. When you have a really small number of users and a small amount of infrastructure, it's totally plausible to do that. But what about when you have 50 users or 100 users or 1,000 or 10,000 or 500,000 users? Are you still going to know that? It's impossible to keep track of. And that's why having somewhere centralized which manages identity and keeps track of who people are and what they have access to is incredibly important.
Gus: 00:15:41.315 So what happens when a user leaves? Well, you need to revoke their access from all of the infrastructure that you have. But how's that going to scale? It's pretty tricky to know. What about the infrastructure that you don't even know that you have? The development VM in EC2 with the public IP — nobody told you about that. You didn't know that it was there. You didn't know that was something that was going to happen. What can you do about that? And what happens when there's a public key installed on it with a comment like "generated key", something random? You don't recognize that. Who put that there? Was it someone trusted? Was it somebody else? Nobody's really sure. Is it the user who left? Or could it be some other critical process which is actually responsible for deploying your entire website? Again, you're not really sure, and it's important to know. So external identity databases: you derive your user identity from a central source. You have some kind of provider, some kind of system which keeps track of all of your users. We're talking identity providers, things like Okta, Auth0, GitHub, OneLogin, Active Directory, anything like that. That's one place to add users when they join and one place to remove them when they leave. And you can use groups to grant access to particular sections of your infrastructure.
Gus: 00:16:54.957 So when these things are going on, you know who is going to be granted access to a given thing. When someone joins, you add them and you grant them access to all the things they need. When they leave, you remove them, and all of that access automatically gets removed. And then your provider, or a system like Teleport which you're using to get your access, knows who should have access to what. Create that process, require the users to log in, get the certificate, and tie the certificate to the provider's identity. Every certificate gets that identifier — that username and email address — assigned to it so you can keep track of them and you know who they are. Teleport also has functionality which allows you to lock users. So if someone has a certificate issued and they leave the company, they still might have a certificate that's valid for a few hours. You can lock the user, so that certificate is useless. If they tried to use it to connect to anything, they wouldn't get any access, and they wouldn't be able to log in again using the identity provider because you would have removed them and offboarded them. Again, better for peace of mind. Security teams love it. Everyone's very happy.
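As a rough illustration of what this looks like day to day with Teleport's CLI tools — the proxy address is invented, and exact flags can vary between versions, so treat this as a sketch rather than gospel:

```bash
# Log in through the configured single sign-on provider; the certificate
# that comes back carries your identity, roles, and an expiry time.
tsh login --proxy=teleport.example.com

# Show the identity baked into the certificate you were just issued:
# user, roles, allowed logins, and when it expires.
tsh status

# Admin side: lock a departed user so any still-valid certificates stop
# working immediately, without waiting for them to expire.
tctl lock --user=gus --message="offboarded"
```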
Step 3: Give people the lowest level of access that they need
Gus: 00:17:59.993 Step three, don't give everyone full access. You don't need it. As much as it's nice, you really don't need to have full access to everything all the time. Give people the lowest level of access that they need on a regular basis, because they can always request more. If you need to connect to production to go and fix something, have a process which automates that: "I need to connect to production. I'm going to need to be in there for a couple of hours doing something on this one machine. I need to go and log into the production Kubernetes cluster to fix a deployment which has gone wrong, and the CI or Argo CD can't fix it." That's incredibly important. I like my production access. Of course, I've had a lot of access to a lot of things in my time and still do in some circumstances. And it's important. It's good. It's nice to know that I have the ability to fix problems if they occur, but I still spend most of my days working in development or demo environments. I don't need that production access. I don't need the important access to things unless I actually have to go and fix something. If you do need access, have that approval process, log the request saying, "Gus requested production access at this time. He wanted it for 4 hours," and have an audit trail that goes along with that. Who requested the access? Who approved the access? Because someone needs to say, "Yes. Okay. Gus can have that access." What was the reason? Why did I say I needed the production access? What access did I get, and how long do I have it for? Keep track of all of this, and all of a sudden you can keep perfect tabs on who has been accessing things and why.
Gus: 00:19:28.600 You can even automate this process as well. So if you have an on-call rotation and somebody needs to fix something at 3:00 in the morning because they've been woken up because things are broken, you can have processes which will automatically approve that person. Oh, Gus is on call today. He's requested production access. That must be because he's on call. We'll grant him the access, and we'll log it, and we'll make sure that while he's in production, while he's doing all of these things, we'll know exactly what he's doing. Teleport makes this easy. It has access request workflows built into it, so you can request access and people can approve it right from Teleport's web UI. And I'll show a little demo of that as well. Teleport has open source plugins as well for Slack, Mattermost, Jira, PagerDuty, things like that. And we have an API as well, so you can write new plugins. They're all written in Go. If you know Go, if you like Go, you can write new plugins for whatever provider you'd like. We've had people who've written integrations for ServiceNow and other providers. So when people file a ticket saying, "I need access," there's automatically an access request created for them in Teleport. Teleport approves it and grants them the access. Streamlining those workflows, reducing all of that friction, making developers happy, making it so it's easy to do the right thing, and making the security teams happy too, because they've got that audit trail, and they know exactly what's going on at all times.
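For a sense of what the request/approve loop can look like from the command line — the role name and reason are invented for illustration, and exact subcommands vary between Teleport versions:

```bash
# Developer: ask for an elevated role at login time, with a reason attached.
tsh login --proxy=teleport.example.com \
    --request-roles=production --request-reason="on-call, fixing OPS-1234"

# Approver: list pending access requests, then approve (or deny) one by ID.
tctl requests ls
tctl requests approve <request-id>
```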
Step 4: Use a bastion or proxy server and funnel all access through it for audit logging
Gus: 00:20:43.232 Step four, don't give access to individual servers or services by IP or hostname. Instead, have a central gateway, a central access plane, something which sits in the middle and allows people to connect via it. Traditionally, people have done this with VPNs. VPNs are good, but they have their issues. They provide a network transport layer only, and they don't really handle any kind of authentication or logging on top of that. You can do better than that. And that's what we can do with Teleport. You funnel all the access through Teleport, you get those audit trails, you get that authentication. You're asserting that your users are genuine when they connect. Then there's remote work. It's brilliant. Many of us have been working remotely, especially for the last couple of years. But that brings new challenges. People have been connecting from different locations very regularly, and you can't whitelist IPs with the best will in the world. IP addresses change regularly. People have dynamic leases from internet service providers. People are using mobile data in some cases, shared WiFi, hotspots, coffee shops, whatever. It's completely implausible to just whitelist IPs and say, "You know what? We trust this user because they're coming from a given IP." It's not a good idea. So use a central bastion or proxy instead. You configure the certificate-based authentication that we've mentioned above, and you enforce the use of that bastion as the gateway to all of the rest of your infrastructure. You make the infrastructure only accessible from behind your bastion, behind your Teleport, behind your proxy server, and you configure that authentication so that the servers trust the certificates that are issued when people log in. You can use host certificates to ensure that people connect to the right place, so they know that they're connecting to the correct bastion server, the correct proxy. And you can use user certificates to ensure that they're legitimate. They have an identity which comes from an identity provider. It's trusted, and you know that you should have faith in them.
Gus: 00:22:32.199 So what about scaling concerns for this? Isn't a bastion server a single point of failure? Well, no, because the use of host certificates here means that you're connecting to a hostname. And as long as the server behind there, whichever it is, is presenting a valid host certificate saying, "I am the bastion server," you can have as many as you like. You put them behind a load balancer. As long as all your bastions are presenting that host certificate, and it's signed by that trusted host certificate authority, there's no problem at all. You know that you can trust them no matter which bastion of the fleet you're connecting to. And if you need to prevent access, you can lock the user out. Teleport gives you that functionality, and because they're all connecting through Teleport, and you can't get to the systems without that, you know that they're going to be locked out if you revoke their certificates and remove their access. They can't get new certificates, and their old ones are useless.
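In plain OpenSSH terms, funneling access through a bastion can be as simple as the sketch below (the hostnames are illustrative); the other half of the job is firewalling the internal machines so port 22 is only reachable from the bastion:

```bash
# Append to ~/.ssh/config: internal hosts are only ever reached by jumping
# through the bastion, never connected to directly.
cat >> ~/.ssh/config <<'EOF'
Host *.internal.example.com
    ProxyJump bastion.example.com
EOF

# A normal ssh command is now transparently routed via the bastion,
# where it can be authenticated and audit-logged centrally.
ssh gus@db1.internal.example.com
```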
Step 5: Enforce the use of a second factor
Gus: 00:23:27.620 Step five. We're getting close to the final one. Don't rely on passwords only, or even on certificates only. They're great. Certificates are better than passwords — more trustworthy, short-lived, all the perks that we've described above. But even with that, it's not a good idea to just rely on them. Attacks are getting more and more sophisticated — any kind of attack which compromises a laptop or gives control of it is a risk. If that certificate gets leaked, or the laptop gets stolen — even if it's encrypted, if it hasn't been turned off and it's still unlocked — an attacker is going to have access to those certificates. But there is a way around this. Enforce the use of a second factor. Credential compromise is still a risk. Even when you follow all of the best practices that I'm outlining, it's still a risk. And that's why we can do better with two-factor, second-factor authentication.
Gus: 00:24:19.516 So what is a second factor? Many of you may know. If you don't, I'll explain it to you. Two-factor authentication is any system enforcing the use of two factors from a list to be granted access. So what's a factor? Well, it's something you know, like a password or a private key — something that you have available to you locally. Then you've got something you have. So a physical device: a Yubikey, a phone with an authenticator on it. It's something that you must have in your possession to be able to provide that factor. It can also be a phone number for receiving an SMS. There's a caveat to that: SMS is very insecure. It's far too easy, particularly in North America, to take over someone's phone number and steal the tokens that are being sent to them by SMS. So don't use SMS. Use something else. Use TOTP, time-based one-time passwords — the things where you get your authenticator app, you point it at the QR code on screen, and it gives you the six-digit code. Use things like that. Use a physical device which you have to have. MacBooks have Touch ID. That's, again, a great example of something that you have, something that's available to you. And you can use something you are. So again, Touch ID. It's a fingerprint, a voiceprint, iris scans. I mean, we haven't had iris scanners on laptops yet, but we do have fingerprint scanners, and we do have microphones. I haven't heard of anyone using voiceprints for authentication, but they could. And Touch ID is a perfect example. So having to have a password and a fingerprint to be able to be granted access — that's two factors.
Gus: 00:25:51.469 So why do you need them? Well, needing that extra step on top of merely presenting a certificate adds defense in depth, and it makes it incredibly difficult for attackers to successfully compromise your infrastructure. The likelihood of a password or a certificate being stolen is actually fairly high. Passwords can be keylogged. Certificates — if someone steals your laptop, they get the certificates which are on it. And if they haven't expired yet — again, one of the perks of having short-lived certificates is that they're a very low risk because they expire quite quickly. But the likelihood that a password or your laptop gets stolen and your finger gets stolen at the same time? Quite unlikely. And that's why you're adding defense in depth with multiple factors. The same is true for someone managing to keylog your password and steal your phone. It's not completely implausible, but it's much more difficult. The same with a password and a USB device on your key ring like a Yubikey or something like that. It's incredibly unlikely that both of them will be stolen at once.
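As one concrete, hedged example of enforcing this on plain OpenSSH — this mirrors the approach in the "How to SSH Properly" blog post Gus mentions in the Q&A, using a PAM TOTP module; package names vary by distro:

```bash
# Install and enrol a TOTP module (Debian/Ubuntu package name shown);
# the enrolment command prints a QR code for your authenticator app.
sudo apt install libpam-google-authenticator
google-authenticator

# Require BOTH a key/certificate AND a one-time code, in /etc/ssh/sshd_config:
#   AuthenticationMethods publickey,keyboard-interactive
#   ChallengeResponseAuthentication yes
#   UsePAM yes
#
# And wire the TOTP module into PAM, in /etc/pam.d/sshd:
#   auth required pam_google_authenticator.so
```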
Gus: 00:26:51.365 So in summary — this is the first slide again, but I'm just going to reiterate. Step one, don't use shared credentials or SSH public keys; instead use short-lived certificates, host certificates and user certificates. Add metadata to them, and use them for tracking who's doing what. Step two, use an external identity database to authenticate your users, so that the user authentication step can go through that external identity database. You make sure that the users are in there, and you grant them access based on the groups that they have available to them. It's not a big, arduous task for a developer to have to sign in via a single sign-on provider. It's an extra 30 seconds in a workflow, and it gives you an access certificate that's good for the entire day. That's not an arduous step, and it provides much better security than just having those shared credentials available. Step three, give people the lowest level of access that they need. Don't give them full production access, and don't let them have it all the time. They can have development access to a small area of the infrastructure — the things they need on a daily basis. And if they need more, it takes two minutes to submit an access request to production and get it approved and get the access you need. And that is an incredibly small price to pay for the extra benefit that you get. I haven't met a developer yet who has a problem with needing to go through that process to get production access. And actually, many people, including myself, prefer not having that access available at all times. It limits your own blast radius. It limits everything that you can do wrong. If you don't have access to machines where you can do harm, you can't do anything wrong. If you run a command accidentally — I've certainly done that in my past. I've run a command on a database that I thought was development, and it turned out to be production. I've wiped out a whole table, and I had to go and restore it from a backup. I've been there. Without that production access, I wouldn't have been able to do that. I would only have deleted the development database, and it wouldn't have been a big deal. So sometimes it's actually a perk having less access.
Gus: 00:28:46.745 Step four, use a bastion or proxy server — or Teleport — and funnel all access through it for audit logging. The benefit of this is incredible. You get to know who got access to what, and you can provide a highly available bastion cluster which people can get access through. Whenever someone logs in, you get an audit log entry. Whenever they connect to a machine, you get an audit log entry. They connect to a Kubernetes cluster, to a database, to an application, to a console — you've got that log; you know that they connected. In the case of SSH sessions or Kubernetes, you can record them with Teleport. Teleport will record the full content of the session for you, and I'll show a little demo of that as well. You funnel all that access through and you get the audit logs. It keeps security teams very happy. And step five, enforce the use of a second factor. Don't just rely on a password. Don't just rely on a certificate. Enforce the use of that second factor as well. Overall, why do you want to do this? Well, these are the industry-standard best practices. These are the things that the Googles, Facebooks, Amazons, Microsofts, and Netflixes of the world all do — they all use similar methods to make sure their infrastructure is secure, because they know that having shared credentials is dangerous. Using passwords only is dangerous. They know that at big scale, you've got to protect yourself, and you've got to make sure you've got defense in depth. You have security on your perimeter, and you have security when connecting to individual services. You need to know who is doing what, and you've got to have those audit trails so that you can prove everything.
Teleport: secure defaults and best practices out of the box
Gus: 00:30:19.271 It's tough to implement these things from scratch, and we know that, especially at Teleport. But these things can scale from a very small number of users. I run Teleport personally on my own home clusters. I've got a development cluster with a very small number of machines. I run Teleport on it because it works very well for small clusters. We have customers and people who are running hundreds of thousands of machines within Teleport, and it scales. You can use it for all of this, and they rely on those processes themselves. And Teleport makes it easy to do this. As I said earlier, we pick secure, sane defaults — certificates that are issued for only eight hours, and a mandatory second factor: when you sign up a user with Teleport, you have to add a second factor, something like a Yubikey or a time-based one-time password. The best practices are used out of the box. So we make it easy to do this from day one. Then there are some other things that Teleport does that we haven't covered, and I'll show a little bit of this in the demo I'm about to do. Audit logging — we have talked about it, but I haven't shown you an example of what that actually looks like. When you have audit logs, you can export them to a SIEM like Splunk, Datadog, or an ELK stack, something like that. You can export those JSON-based events, so we know what's happening there. You get session logging — SSH and Kubernetes session logging. You get logging of database queries when they go via Teleport. Incredibly important. It's something useful, being able to see every database query that was run, being able to see the full content of an SSH session from start to finish.
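Since the events are just JSON lines, they're also easy to filter locally before (or instead of) shipping them to a SIEM. A small sketch — the log path below is a common default for a self-hosted cluster, but treat it as an assumption and check your own configuration:

```bash
# Pull out session-end events (the `session.end` code shown in the demo
# below) and print when each happened and which user was involved.
cat /var/lib/teleport/log/*.log | \
    jq -r 'select(.event == "session.end") | [.time, .user] | @tsv'
```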
Demo: requesting elevated access via Teleport
Gus: 00:31:53.680 You can allow only certain users access to certain parts of your infrastructure, as I mentioned. You set groups on people within the identity provider, and you can allow that access. That's something I haven't touched on, but it's something you can do via Teleport. Then there's organization and ease of management. How do you keep track of all of this? How do you know who's got access to what? Teleport has web UI views that can show you these things, but we haven't touched on them yet. So as I mentioned, I'm going to do a quick demo of some of the capabilities within Teleport. I'm going to show you logging into a machine and what that looks like. I'm going to show you what it looks like to request elevated access via Teleport as well. So when I go and log in here into my cluster, it presents me with a single sign-on screen. I can pick my Google account, and I can choose either of these accounts. So I've got an admin-level account, and I have a developer-level account. If I pick the developer-level account, I'll connect, and it will give me access to a small subset of my infrastructure. So this shows me three servers that I have access to. There are more servers in this cluster, but my role-based access control here is only granting me access to these three machines. I also have other things, web applications available to me like Grafana and Jenkins and the AWS Console. I've also got a Kubernetes cluster available to me, and I can connect to that via Teleport. If I click on Connect, it gives me the commands to use, and it sets my `kubectl` context on my local machine so that I can route all requests to that cluster. I have databases, so things like an RDS Postgres database. I can connect to that via Teleport as well, and I can connect to it using GUI clients. And we have Windows desktop access as well. When I click on Connect here, it'll give me a virtual console on that machine which I can use.
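The Kubernetes step in that demo has a command-line equivalent along these lines — the cluster name is made up for illustration:

```bash
# List the Kubernetes clusters registered with Teleport, then log in to one;
# this updates your local kubeconfig so kubectl routes requests via Teleport.
tsh kube ls
tsh kube login cloud-demo-cluster

# kubectl now works as normal, with every request authenticated and audited.
kubectl get pods
```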
Gus: 00:33:42.402 So these are the servers that I have access to. When I drop down the list, I have a shortlist of usernames that I can connect as. These are set by my roles within Teleport. So because I'm logged in as a developer, I've only got this small set of usernames available to me. And if I pick ec2-user and connect via Teleport, I can log into the console. I can do anything here that I would do ordinarily, including make typos. It can show everything that's on screen. I've got color support for colors and prompts. I can do anything that I would do on a machine. This isn't a machine that's running apt; it's running yum. You can see everything. Teleport will record this session from start to finish. It will know everything that's happening. It shows everything that I type into the console and everything that comes back out of it. I can run top — things that redraw my screen, things that add commands, things that rewrite stuff, move characters around, that sort of thing. Teleport records everything. I have buttons for file upload and file download, so I can download files to my local machine, and I can upload files through Teleport. You get a web-based terminal which does all of these things.
Gus: 00:34:53.956 That was my session. What we can see is that there's an audit log which goes along with that. So I can see gus@teleportinfra.dev ended an interactive session lasting a minute on this particular node. We can see when I logged into the cluster. We can see a session was started. If I go in, I can see the details of the session that was going on. I've got a JSON structured event with an event code saying `session.end`. I know exactly what happened, who the participants were, what server address I connected to, the time it started, the time it finished, all of these things. I've got that full audit trail available to me, and administrators of the cluster can see those things as well. And this button here, when I click on it, will show me a session recording. Because I'm the development user, it won't show me that; I have to be an administrator. And I'll show you that in a second. We'll play the recording back. The other thing I'm going to show, as I mentioned, was the idea of access requests. So if I sign out of this development user and I sign into the administrator user instead, we can go and look at the audit log and see what happened. So now I get the same cluster view, but with a couple of extra machines, because I'm now an administrator. When I go and look at the audit log, we see everything. We see the administrator user logging in; we see the same session as before. And when I click on the playback button, we'll see a recording of the session that I had as that previous user — everything that got typed, everything that happened within the terminal, all of the typos I made, me running the wrong command, running top, loading all of this stuff. It plays back just like a video. You can see everything going on within the session. If I just want to stop and focus on a given part, I can do that. I can pause and look at what was on screen. I can drag the scroller back, and it will show me exactly what was on screen at a previous part of the session, what was being typed, all of the commands that were run, that sort of thing. Incredibly powerful.
Gus: 00:36:51.102 So I mentioned I was going to show you the access requests. For some reason, that appears not to be working at the moment, so unfortunately, I will have to skip that part. But trust me when I say we can do it. So that's a rough demo of Teleport. Accessing applications works largely the same way. Say I want to connect to Jenkins here. Teleport gives me a subdomain of my Teleport cluster, and if I connect to that, it provides me access to the Jenkins server which is running in my environment. It logs that I've accessed it, and it provides me proxied access through to Jenkins, where I can do all of the usual things. I don't have to have Jenkins on the public internet. It's hosted in a private cluster that has a private IP. There's no public access to this at all. It's just that Teleport provides that gateway and that proxy. It's the bastion server. It's the central component which is providing access to my infrastructure. And along with that, it logs that I had a session in Jenkins at that time. The same is true for Kubernetes. The same is true for databases. So, incredibly powerful.
Q&A time
Gus: 00:37:56.164 And that essentially is everything that I wanted to cover today. So thank you very much for your attention. If you have any questions, please put them in the chat or in the questions section. There are a few already I can see, and I'll be happy to answer all of them. If there's anything that I haven't covered that you'd like to know more about, please ask away, and I will address it. We've still got 20 minutes' worth of time for questions. There's plenty of time to answer anything that you're curious about. Of course, I have to plug it: check out Teleport. As I've said numerous times, it's open source, and it makes it easy to do all of this stuff out of the box. We're on GitHub, gravitational/teleport. You can also go to goteleport.com. We have a community Slack, which you can join. It's very active. Lots of support going on there, people asking questions, people looking for help with their Teleport installations, people setting up Teleport for the first time, looking at more advanced use cases, all kinds of different things. Go and look and see some of the cool ways that people are using Teleport, the really interesting things that people are doing with Teleport, and the stuff that they find useful.
Gus: 00:39:00.388 So I'm going to get going with a few of these questions. Question number one: do I need a license to use Teleport, or can I use it for free? Well, there are two versions of Teleport. As I mentioned, it's open source. There's the Teleport Community Edition, which is on GitHub. You can download the source code and compile it yourself, or you can download the open-source binaries from our website as well, goteleport.com/download. You can download that, install Teleport, and use it for free. You don't need to pay for it. The Community Edition is completely without limitations: there are no limitations on server size, number of servers, number of users, anything like that. There is also the Enterprise Edition of Teleport, and there are a few differences there. The Enterprise Edition does require a license; it's paid for. You get full single sign-on provider support. The open-source Community Edition of Teleport supports GitHub as an identity provider, and it also supports the use of local users that you can configure. The Enterprise Edition adds the use of SAML and OIDC based providers like Okta, like Auth0, like Active Directory, as I mentioned earlier. We also have support offerings which come as part of the Enterprise packages, and that's the reason why people are interested in them, generally.
Gus: 00:40:12.444 Question number two: do I have to use Teleport to get all of this, and what if I just want to build it myself? Well, that's absolutely a valid question. And you don't have to use Teleport. It just makes it an awful lot easier. There are those five different steps that I mentioned, and building those from scratch is pretty difficult. I actually wrote a blog post a while ago called How to SSH Properly. If you google "How to SSH Properly", you'll find that blog post on the Teleport website. In there, I build out three of those five things. I build out certificate-based SSH access, I build out two-factor authentication, and I build out some other parts as well. So you can see how to do this. And I did that all with OpenSSH, with ssh-keygen, completely open-source tools, no Teleport whatsoever. It was done using just open-source technology which is already there. So it is possible. It will just take you a lot longer. And honestly, having done it from scratch myself, it took quite a long time, and I learned a fantastic amount of stuff. Obviously, I'm going to say this because of where I work, but try Teleport. It's worth trying. It's a lot easier to do all of this stuff out of the box.
Gus: 00:41:24.839 Question number three: does Teleport work with Windows machines? The answer is yes. So the web UI that I showed you briefly works in any browser — macOS, Windows, Linux. It works on iOS. It works on Android. Any system that has a reasonably modern web browser will be able to use that. So, absolutely. The `tsh` client as well, which I didn't show as part of the demo — the Teleport command-line client, which you can use for Kubernetes and for other things — also works on Windows, on macOS, and on Linux. So you can access those things. And recently, in Teleport 8, we added the ability to proxy access to Windows servers as well, basically an RDP client within Teleport. So we have that access in there as well. So now you can access Windows desktops and Windows servers using Teleport too. So, yes, we are improving Teleport's Windows support by the day, improving those use cases. But anything that you want to be able to do with it currently, you can do with Windows as well. Thank you, Kat.
Gus: 00:42:25.979 Here's another question. Question number four: I have a lot of things already set up on my laptop to use SSH for access. Do I have to change all of those to be able to use Teleport? And the answer is no. No, you don't. I was previously in the same position: I had muscle memory. I typed ssh, I had server names that I knew, and whenever I ran the commands, it just connected me straight into them. It was using public keys for authentication that I had configured years before. So I didn't need to change all of that. Teleport's command-line tool, `tsh`, has a `tsh config` command which will output an SSH config file that you can use with OpenSSH. So in your home directory, you've got a `.ssh` folder and a config file within that. You can run `tsh config` after you log into a Teleport cluster, put that configuration in there, and then you can connect using SSH. You can use scp, you can use rsync, you can use Visual Studio Code — anything that uses SSH as a transport, Ansible included. Ansible also works over Teleport. You can SSH into machines using Ansible via Teleport. So you can do all of your Ansible runs via Teleport and see what gets run, what gets connected to, and what Ansible was touching during those times as well. So absolutely, you don't have to use the `tsh` client; you can use an SSH client and configure everything to go via Teleport's proxy server as well.
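A rough sketch of that workflow — the proxy address and node name are illustrative:

```bash
# Log in to the Teleport cluster, then generate OpenSSH-compatible
# configuration and append it to your regular SSH config.
tsh login --proxy=teleport.example.com
tsh config >> ~/.ssh/config

# Plain ssh (and scp, rsync, Ansible, VS Code...) now routes via Teleport,
# picking up short-lived certificates and audit logging along the way.
ssh ec2-user@node1.teleport.example.com
```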
Conclusion
Gus: 00:43:52.810 So that seems to be all we have for questions at the moment; I don't see any others. So with that, I think I'm going to say thank you very much again for paying attention to the webinar. Thank you for coming. It's a pleasure to talk to everyone. Please go and check out Teleport. Go and look at the website, go and look at everything. If you've got any questions, join our community Slack and ask there. We'd absolutely be happy to talk to you, answer your questions, and help solve your problems. And with that, I'm going to hand it back to DZone. Thank you very much.
Lucy: 00:44:28.135 All right. Well, DZone would like to thank Gus for a great presentation, and DZone would also like to thank Teleport for providing the audience with a wonderful webinar. And lastly, thank you to everyone who attended today. We hope you learned something new that will help you in your developer career.
Join The Teleport Community