Debugging Processes Inside a Docker Container

This article uses a Nebula Graph process as an example to show how to debug a process inside a container as if it were running locally, without modifying the contents of the original container or installing any tools in it.

1. Motivation

During development and testing we often deploy Nebula Graph with the repo vesoft-inc/nebula-docker-compose. To keep the Docker image of each Nebula Graph service as small as possible, none of the tools commonly used during development are installed in these images, not even the Vim editor.

This makes it hard to locate problems inside a container: every time, we first have to install some tools before we can do any real work, which is tedious. In fact, there is another way to debug a process inside a container that neither modifies the original container nor installs any tools in it.

This technique is common in Kubernetes environments, where it is known as the sidecar pattern. The principle is simple: start a second container and let it share the PID and network namespaces of the container you want to debug. The processes and network state of the original container are then visible from the debugging container, which already has all the tools you need installed, and you can take it from there.
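As a rough sketch of this idea, the helper below only prints the docker run command that would start such a debugging container; it does not run anything. The function name debug_cmd and the default image nicolaka/netshoot are illustrative assumptions, while --pid and --network with a container:NAME argument are standard docker run options.

```shell
# Print (do not run) a docker command that starts a debugging container
# sharing the PID and network namespaces of a target container.
# debug_cmd is a hypothetical helper name, not part of Docker.
debug_cmd() {
  local target="$1"                      # name of the container to debug
  local image="${2:-nicolaka/netshoot}"  # debugging image (assumed default)
  if [ -z "$target" ]; then
    echo "usage: debug_cmd <container> [image]" >&2
    return 2
  fi
  printf 'docker run --rm -ti --pid container:%s --network container:%s --cap-add sys_admin %s bash\n' \
    "$target" "$target" "$image"
}
```

Calling debug_cmd with a container name (for example a hypothetical my-service_1) prints a command that can be reviewed and then pasted into the shell.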

2. Demo

Next, I will demonstrate how this works.

Let's first deploy a Nebula Graph cluster locally with the docker-compose method mentioned above. For instructions, see the README in the repo. The result of the deployment looks like this:

$ docker-compose up -d
Creating network "nebula-docker-compose_nebula-net" with the default driver
Creating nebula-docker-compose_metad1_1 ... done
Creating nebula-docker-compose_metad2_1 ... done
Creating nebula-docker-compose_metad0_1 ... done
Creating nebula-docker-compose_storaged2_1 ... done
Creating nebula-docker-compose_storaged1_1 ... done
Creating nebula-docker-compose_storaged0_1 ... done
Creating nebula-docker-compose_graphd_1    ... done
$ docker-compose ps
              Name                             Command                       State                                             Ports
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
nebula-docker-compose_graphd_1      ./bin/nebula-graphd --flag ...   Up (health: starting)   0.0.0.0:32907->13000/tcp, 0.0.0.0:32906->13002/tcp, 0.0.0.0:3699->3699/tcp
nebula-docker-compose_metad0_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32898->11000/tcp, 0.0.0.0:32896->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_metad1_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32895->11000/tcp, 0.0.0.0:32894->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_metad2_1      ./bin/nebula-metad --flagf ...   Up (health: starting)   0.0.0.0:32899->11000/tcp, 0.0.0.0:32897->11002/tcp, 45500/tcp, 45501/tcp
nebula-docker-compose_storaged0_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32901->12000/tcp, 0.0.0.0:32900->12002/tcp, 44500/tcp, 44501/tcp
nebula-docker-compose_storaged1_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32903->12000/tcp, 0.0.0.0:32902->12002/tcp, 44500/tcp, 44501/tcp
nebula-docker-compose_storaged2_1   ./bin/nebula-storaged --fl ...   Up (health: starting)   0.0.0.0:32905->12000/tcp, 0.0.0.0:32904->12002/tcp, 44500/tcp, 44501/tcp

We will demonstrate two scenarios: the process space and the network space. First we need a handy debugging image. We will not build one ourselves; for this demonstration we use one that is already published on Docker Hub. If it turns out not to be enough later, we can maintain a nebula-debug image with all the debugging tools we want installed. For now I will borrow the community image nicolaka/netshoot. First, pull the image:

$ docker pull nicolaka/netshoot
$ docker images
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
vesoft/nebula-graphd     nightly             c67fe54665b7        36 hours ago        282MB
vesoft/nebula-storaged   nightly             5c77dbcdc507        36 hours ago        288MB
vesoft/nebula-console    nightly             f3256c99eda1        36 hours ago        249MB
vesoft/nebula-metad      nightly             5a78d3e3008f        36 hours ago        288MB
nicolaka/netshoot        latest              6d7e8891c980        2 months ago        352MB

Let's see what happens if we run this image directly:

$ docker run --rm -ti nicolaka/netshoot bash
bash-5.0# ps
PID   USER     TIME  COMMAND
    1 root      0:00 bash
    8 root      0:00 ps
bash-5.0#

As shown above, this container cannot see any Nebula Graph service process. Let's add some parameters and look again:

$ docker run --rm -ti --pid container:nebula-docker-compose_metad0_1 --cap-add sys_admin nicolaka/netshoot bash
bash-5.0# ps
PID   USER     TIME  COMMAND
    1 root      0:03 ./bin/nebula-metad --flagfile=./etc/nebula-metad.conf --daemonize=false --meta_server_addrs=172.28.1.1:45500,172.28.1.2:45500,172.28.1.3:45500 --local_ip=172.28.1.1 --ws_ip=172.28.1.1 --port=45500 --data_path=/data/meta --log_dir=/logs --v=15 --minloglevel=0
  452 root      0:00 bash
  459 root      0:00 ps
bash-5.0# ls -al /proc/1/net/
total 0
dr-xr-xr-x    6 root     root             0 Sep 18 07:17 .
dr-xr-xr-x    9 root     root             0 Sep 18 06:55 ..
-r--r--r--    1 root     root             0 Sep 18 07:18 anycast6
-r--r--r--    1 root     root             0 Sep 18 07:18 arp
dr-xr-xr-x    2 root     root             0 Sep 18 07:18 bonding
-r--r--r--    1 root     root             0 Sep 18 07:18 dev
...
-r--r--r--    1 root     root             0 Sep 18 07:18 sockstat
-r--r--r--    1 root     root             0 Sep 18 07:18 sockstat6
-r--r--r--    1 root     root             0 Sep 18 07:18 softnet_stat
dr-xr-xr-x    2 root     root             0 Sep 18 07:18 stat
-r--r--r--    1 root     root             0 Sep 18 07:18 tcp
-r--r--r--    1 root     root             0 Sep 18 07:18 tcp6
-r--r--r--    1 root     root             0 Sep 18 07:18 udp
-r--r--r--    1 root     root             0 Sep 18 07:18 udp6
-r--r--r--    1 root     root             0 Sep 18 07:18 udplite
-r--r--r--    1 root     root             0 Sep 18 07:18 udplite6
-r--r--r--    1 root     root             0 Sep 18 07:18 unix
-r--r--r--    1 root     root             0 Sep 18 07:18 xfrm_stat

This time the result is different: we can see the metad0 process, and its PID is still 1. Once the process is visible, there is a lot we can do with it. For example, can we attach to it directly with gdb? Since I have no image containing the nebula binary at hand, I leave that for you to explore.
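If the debugging image provides a debugger, attaching might look like the sketch below. This is an example under stated assumptions, not something verified against Nebula Graph: attach_debugger is a hypothetical helper, it assumes gdb or strace is installed in the debugging image, and depending on the Docker version and kernel the debugging container may also need --cap-add sys_ptrace before it is allowed to attach.

```shell
# Attach to a process in the shared PID namespace with whichever tool exists.
# attach_debugger is a hypothetical helper; gdb and strace are assumed to be
# provided by the debugging image and are not installed by this function.
attach_debugger() {
  local pid="${1:-1}"   # PID 1 is the target container's main process
  if command -v gdb >/dev/null 2>&1; then
    gdb -p "$pid"              # interactive debugger attached to the process
  elif command -v strace >/dev/null 2>&1; then
    strace -f -p "$pid"        # fall back to tracing system calls
  else
    echo "neither gdb nor strace is available in this image" >&2
    return 1
  fi
}
```

Run attach_debugger inside the debugging container. With no argument it targets PID 1, which is the nebula-metad process in the ps output above.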

We have seen that the PID space can be shared by specifying --pid container: followed by the container name. Now let's look at the network. After all, sometimes we need to capture packets. Run the following command:

bash-5.0# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name

Nothing. That is not what we expected: a running metad0 process cannot be without any listening sockets. To see the network space of the target container, we need one more parameter. Start the debugging container like this:

$ docker run --rm -ti --pid container:nebula-docker-compose_metad0_1 --network container:nebula-docker-compose_metad0_1 --cap-add sys_admin nicolaka/netshoot bash
bash-5.0# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 172.28.1.1:11000        0.0.0.0:*               LISTEN      -
tcp        0      0 172.28.1.1:11002        0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:45500           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:45501           0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.11:33249        0.0.0.0:*               LISTEN      -
udp        0      0 127.0.0.11:51929        0.0.0.0:*                           -

This time the output is different. With the --network container:nebula-docker-compose_metad0_1 parameter added, the listening sockets of the metad0 container are visible as well, so we can capture packets and debug from here.
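With the network namespace shared, a capture could be started along these lines. This is a minimal sketch: capture_port is a hypothetical helper, it assumes tcpdump is available in the debugging image, and the default output path /tmp/capture.pcap is arbitrary. Port 45500 is the metad port that appears in the netstat output above.

```shell
# Capture TCP traffic on one port from inside the debugging container.
# capture_port is a hypothetical helper; tcpdump is assumed to be installed.
capture_port() {
  local port="$1"                      # e.g. 45500 for metad
  local out="${2:-/tmp/capture.pcap}"  # where to write the capture (assumed path)
  if [ -z "$port" ]; then
    echo "usage: capture_port <port> [output.pcap]" >&2
    return 2
  fi
  # -i any: all interfaces, -nn: no name or port resolution, -w: write pcap file
  tcpdump -i any -nn -w "$out" "tcp port $port"
}
```

capture_port 45500 runs until it is interrupted with Ctrl-C. Because the debugging container was started with --rm, the capture file disappears with the container unless it is written to a mounted volume or copied out first.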
