The Truth About the PID 1 Problem

Bret Fisher
A free video tutorial from Bret Fisher
Docker Captain and DevOps Sysadmin
4.6 instructor rating • 4 courses • 197,648 students

Lecture description

Let's clear up the confusion about how you should start node in containers

Learn more from the full course

Docker for Node.js Projects From a Docker Captain

Build, test, deploy Node for Docker, Kubernetes, Swarm, and ARM with the latest DevOps practices from a container expert

08:13:00 of on-demand video • Updated September 2020

  • Optimize your local development setup for NodeJS in Docker
  • Operate smoothly in a team of NodeJS developers using Docker and Compose
  • Improve the speed and reliability of your Node builds and testing using Docker
  • Get the best NodeJS tweaks to use for dev, test, and prod
  • Design NodeJS images for use with Kubernetes and Swarm
  • Learn about security scanning and locking-down your NodeJS apps
English I teased in the last video that there are some potential issues with the shutdown process for Node and npm. I want to expand a little bit on that in this lecture around what we're dealing with inside a container. It has to do with the fundamental starting and stopping of processes at the root of a file system. That's known as the init process. It's also known as the PID 1 process. That's the technical term for it because it is technically running Process Identifier 1. If you did a ps command, you would see that it was the number 1, meaning the first one. In Linux, that means it's got a couple of extra responsibilities as a process. That is, it needs to handle zombie processes. That sounds scary. But really what it means is processes that don't have a parent. Or maybe something that was launched by something else, but then that parent process crashed, and then the subprocess is left hanging out, right. That's happened sometimes in a bad situation. Those are known as zombie processes. The main, or root init process there, the PID 1, it's supposed to clean up, or reap, those processes, or kind of take them in, and take hold of them and deal with them. And the other big thing it does is pass as process signals to its subprocesses. That is important when you want to shut down a process. So, there are things in the Linux kernel that are sent, they're called signals, and they're sent to the processes. Then if there are subprocesses of a particular process, it needs to pass those on as well. So, let's see how we deal with these two problems. I want to tell you that the Internet, to a large extent, has this wrong when it comes to Node. So if you go start researching around about Node and zombie processes in containers, you will read everybody telling you that you shouldn't run Node directly in a container. That's simply not true. That's maybe a little bit of misunderstanding on their part about the potential solutions because there are at least three ways to deal with this problem. Certain ways have certain negative effects. So, let's go through those. The first here is the zombie process. I just want to tell you that zombie processes are not really an issue with Node. I've never had this issue with Node and I've not seen others on the Internet talking about zombie processes happening in Node, so let's just not deal with that or add an extra layer of management in the middle here if we don't need it, right. So, if we don't have zombie processes, great. Usually, you're just going to have that one node process and subprocesses, if any at all, that you would have for that node process. Let's focus on the shutdown which I feel like is really the core and most important issue with Node in production when you're going to do rolling updates. Now we'll get the rolling updates later in the course. But for now, let's just talk about the different options for handling shutdown in a container. This is where I feel like the Internet, in general, gets this wrong and they're a little bit confused around the real core issues we're dealing with here. If we ignore zombie processes because they're not an issue, then really the main thing left is shutdown. Shutdown is really about signals coming from Docker. Docker gets those either from telling itself, you know hey, I'm going to shut down this container because I'm going to replace it. Or it might get those signals from the OS. Maybe a control c at the command line when you're running on your local machine, you hit control c. That's also a shutdown process that is sent from Docker into the container. Then the kernel itself could maybe tell Docker, you need to shut down and Docker will make sure that it's shutting down all of its containers first. These come in three different flavors of signals from the operating system and that is Signal Int or SIGINT, Signal Term, also called SIGTERM. Then SIGKILL as you can guess, killing the container. So, we never want to deal with SIGKILL because that's not really a healthy state. That is actually the operating system just literally right now, immediately killing the container. The container won't even see that signal because it's not even given a chance to respond. It's just terminated. Now the other two, the INT and TERM, are passed into the container by Docker. What I see is SIGINT is used when you control c, and SIGTERM is used when you do a docker container stop, or maybe a rolling update, or Kubernetes is doing an update, something like that. So, those are the two common ones. Those are really the only two options actually. You're going to see those two in your app, and we need to understand them and handle them properly in Docker. Because INT and TERM are seen by the app, the app has a chance to properly shut down. This is actually pretty common in the real world of Linux even without Docker. So, if you had a MySQL server, for example, and you wanted to stop the process, Linux will send it a SIGTERM and give some seconds, you know depending on the init process and how long it wants to wait, for MySQL to shut down. And that's a healthy thing. Usually you want apps to handle their connections, maybe save the files, save memory to files. That stuff is going to all happen. With Node, what we really care about, usually, is making sure that it cleans up any files it might be writing or streaming. And then most importantly, deal gracefully with any connections because a lot of us are using Node for web apps. We want to make sure that our clients aren't suddenly terminated and they lose connection or any backend processes don't suddenly lose something midstream. Unfortunately, npm itself, does not handle these signals at all. It doesn't do them well or at all. And so, this is why I kind of rule out npm for use in production Docker because it just doesn't do anything gracefully that I can tell. Node though, Node doesn't by default handle these signals but you can add code to it, in your app, that will listen for these signals and then do whatever you want. Maybe it cleans up temp files. Maybe it gracefully shuts down connections. Maybe it does something that you need just specific for your app. But it gives you this chance. It will actually watch, and then capture the signal, and then it can run the functions you need, and then give the exit to essentially exit the app gracefully at the very end. As a backup plan here, Docker does have a built-in init process. This is an interesting thing. It's actually called teeny. If you've taken my Docker Mastery, you've seen it before. We've used it in a couple of examples with Node. This is sort of a backup or a workaround. So, when you read on the Internet that people are saying use teeny. Always use teeny or use an init process other than Node. I don't always agree with that because if you know what these are doing and you've handled them in your app with special code, like I'll show you in a minute, then you don't have to worry about adding teeny in there. Again, we don't need to worry about zombie processes really, and as long as we're properly handling the shutdown, we don't need yet another process in the mix just to handle that.