AWS Compute Blog

Using EC2 Serial Console to access the GRUB menu and recover from boot failures

This post is written by Pallavi Ravishankar, a Senior Product Manager, and Jason Nicholls, an Enterprise Solutions Architect.

Debugging and fixing infrastructure is one of the key responsibilities of system and network administrators. When a server on premises fails to start up or can no longer connect to the network, an administrator can walk up to the server in a data center and use its serial port to resolve the problem. In the AWS Cloud, the administrator has no access to the data center. In the past, Amazon Elastic Compute Cloud (Amazon EC2) customers could only view console output using the GetConsoleOutput API to identify that a network or boot failure had occurred.

On March 30, 2021, AWS announced the EC2 Serial Console, a simple and secure way for system and network administrators to interactively troubleshoot boot and network connectivity issues by establishing a virtual serial connection to their Amazon EC2 instances.

In this post, we walk you through a scenario where an Amazon EC2 instance fails to start. We demonstrate how you can access the GNU GRand Unified Bootloader (GRUB) using Amazon EC2 Serial Console to fix the problem. While this post focuses on Amazon Linux 2, you can use a similar workflow with other Linux distributions, as well as on Windows instances, to troubleshoot and fix boot failures.

Product overview

The EC2 Serial Console provides a secure and dedicated way to connect to the serial port (ttyS0 or COM1) of your Amazon EC2 instance. The EC2 Serial Console can establish connection to an instance even when a user is not able to connect using other methods such as Secure Shell (SSH) or Remote Desktop Protocol (RDP). You do not need to have direct network connectivity to the instance in order to access it using the virtual serial port.

Configuration changes or software updates are two examples out of many that could result in an Amazon EC2 instance start-up failure.

For example, in the configuration change scenario, you might have introduced an error in the file systems table (/etc/fstab), which controls how disks are mounted. Or you might have misconfigured the Ethernet configuration, resulting in an instance that boots without network connectivity. In both cases, the only way to troubleshoot the misconfigured instance is to view the boot sequence. You can view the entire boot sequence using EC2 Serial Console via the AWS Management Console.

The following screenshot from EC2 Serial Console shows the boot process of an instance that had both the /etc/fstab and network configuration files deleted.

Figure 1: Boot sequence showing a failure caused by deleting the fstab file and disabling networking

In the software update scenario, you might have upgraded the Linux kernel and, in the process, inadvertently deleted the wrong kernel. After rebooting, the instance fails to boot because the kernel it expects is missing. In this scenario, you must revert to the last known stable kernel. Access to the serial port using EC2 Serial Console shows you that the kernel is no longer available. The following screenshot shows an example of this error.

Figure 2: Linux boot failure - kernel not found

You can simulate the same scenario by renaming the Linux kernel. Before you rename the kernel, you must ensure that GRUB is configured to read and write to the virtual serial port of your EC2 instance. You will later use GRUB to access a rescue kernel to resolve the start-up failure.

Configuring GRUB

Now that you have seen example cases that can lead to a start-up failure, let’s walk through how EC2 Serial Console with GRUB can help you fix the issue.

GRUB is the default boot-loader for most Linux operating systems. The GRUB menu appears when you press and hold Shift during the boot process. From the GRUB menu, you can select which kernel to boot into, or modify menu entries to change how the kernel will boot. This can be useful when troubleshooting a failing instance.

The GRUB menu displays during the boot process. The menu is not accessible using normal SSH, but you can access it using the EC2 Serial Console. Before you can access GRUB using the EC2 Serial Console, GRUB must be configured to read from and write to the serial port. On Amazon Linux 2, you configure this by editing the /etc/default/grub file.

Before you begin, ensure that you have started an Amazon EC2 instance using the latest Amazon Linux 2 AMI with the latest kernel.

To edit the GRUB file

  1. Log in to your EC2 instance using your preferred method.
  2. After you have connected to your EC2 instance running Amazon Linux 2, edit the GRUB file (/etc/default/grub) using your preferred editor. The default configuration is set as follows:

GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0"
GRUB_TIMEOUT=0
GRUB_DISABLE_RECOVERY="true"

GRUB waits a configured amount of time before starting the kernel. This value is set by GRUB_TIMEOUT, in seconds. Notice that Amazon Linux 2 sets it to 0, which boots the default entry immediately without showing the menu. A value of -1 waits indefinitely.

  3. For this example, set the GRUB_TIMEOUT to 30. A value of 30 means that GRUB will wait 30 seconds before loading the kernel. During this time, you can press any key to access the GRUB menu.
  4. Add the following two lines to configure GRUB serial port access:

GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200"

The following shows an example GRUB file:

GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0"
GRUB_TIMEOUT=30
GRUB_DISABLE_RECOVERY="true"
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200"

For additional GRUB options, see the GRUB manual.

  5. After you have updated the configuration, apply the changes by running the following command:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
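
The edits to GRUB_TIMEOUT, GRUB_TERMINAL, and GRUB_SERIAL_COMMAND described above can also be scripted. The following is a minimal sketch, assuming GNU sed and a file in the /etc/default/grub format shown earlier. The function name configure_grub_serial and the .bak backup suffix are illustrative choices, not part of Amazon Linux 2 or GRUB.

```shell
# Sketch: apply the serial console settings from this section to a GRUB
# defaults file. configure_grub_serial is a hypothetical helper name.
configure_grub_serial() {
  local file
  file="$1"

  # Keep a copy of the original file so the change can be reverted.
  cp -p "$file" "$file.bak"

  # Show the GRUB menu for 30 seconds, replacing any existing timeout.
  if grep -q '^GRUB_TIMEOUT=' "$file"; then
    sed -i 's/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=30/' "$file"
  else
    echo 'GRUB_TIMEOUT=30' >> "$file"
  fi

  # Add the serial terminal settings only if they are not already present,
  # so running the function twice does not duplicate lines.
  grep -q '^GRUB_TERMINAL=' "$file" ||
    echo 'GRUB_TERMINAL="console serial"' >> "$file"
  grep -q '^GRUB_SERIAL_COMMAND=' "$file" ||
    echo 'GRUB_SERIAL_COMMAND="serial --speed=115200"' >> "$file"
}

# Usage on the instance (as root), followed by regenerating the GRUB config:
#   configure_grub_serial /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

The function leaves an existing GRUB_TERMINAL or GRUB_SERIAL_COMMAND line untouched, so review the file afterward if those settings were already customized.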

Accessing the GRUB menu

After you have configured GRUB to read and write to the serial port, you can access the GRUB menu using EC2 Serial Console. This console provides a Secure Shell (SSH) session to securely access your EC2 instance’s serial port. The SSH session is authorized using an SSH key pair. You can access the EC2 Serial Console from the EC2 console in your browser or from your own SSH client.

To connect to EC2 Serial Console with your own SSH client, you must generate a one-time SSH key locally on your client. Use the SendSerialConsoleSSHPublicKey action (for example, through the AWS CLI) to push the public key to the EC2 Serial Console service, and use SSH to connect to the EC2 Serial Console endpoint.

The EC2 console combines all these steps into single-click access. For detailed instructions, see Connect to the EC2 Serial Console.

By default, your AWS user account does not have access to push an SSH public key to the EC2 Serial Console service. You can allow access by using the EC2 console or the AWS CLI. Allowing access at the account level ensures that all instances in that account can access EC2 Serial Console. You can exercise more granular controls at the instance level by setting a resource group or tag-based IAM policy. For more information, see Configure access to the EC2 Serial Console.
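
Account-level access can be turned on and checked from the AWS CLI. The following is a minimal sketch; the function name enable_serial_console_access is hypothetical and us-east-1 is a placeholder Region. The setting applies per Region, so repeat it for each Region you use.

```shell
# Sketch: allow EC2 Serial Console access for the account in one Region,
# then print the resulting status. The function name is illustrative.
enable_serial_console_access() {
  local region
  region="$1"

  # Turn on account-level access, and only query the status if that succeeds.
  aws ec2 enable-serial-console-access --region "$region" &&
    aws ec2 get-serial-console-access-status --region "$region"
}

# Usage:
#   enable_serial_console_access us-east-1
# To turn access off again:
#   aws ec2 disable-serial-console-access --region us-east-1
```

The caller still needs IAM permissions for these actions and for pushing the SSH public key.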

Troubleshooting using EC2 Serial Console

For the example in this blog post, you use AWS CloudShell. AWS CloudShell is a browser-based shell that makes it simple to securely manage, explore, and interact with your AWS resources. AWS CloudShell is pre-authenticated with your console credentials. You can launch AWS CloudShell directly from the AWS Management Console.

To troubleshoot using EC2 Serial Console

  1. From the AWS Management Console, open AWS CloudShell by choosing the CloudShell icon.
  2. Generate a one-time SSH key pair using ssh-keygen:
    ssh-keygen -t rsa -f my_rsa_key
  3. Push your public key to EC2 Serial Console using the AWS CLI installed on AWS CloudShell.
    aws ec2-instance-connect send-serial-console-ssh-public-key \
    --instance-id i-00123EXAMPLE \
    --serial-port 0 \
    --ssh-public-key file://my_rsa_key.pub \
    --region $REGION
  4. Start an SSH session to EC2 Serial Console.
    ssh -i my_rsa_key i-00123EXAMPLE.port0@serial-console.ec2-instance-connect.{region}.aws
  5. In a new browser window, open the EC2 console.
  6. From the EC2 console, reboot your EC2 instance.
  7. Go back to your AWS CloudShell session and watch your instance reboot.
  8. You should see your instance pause on the GRUB menu. When the GRUB menu appears, press any key to stop the boot process, allowing you to interact with the GRUB menu. The following screenshot shows an example of the GRUB menu for Amazon Linux 2.

Figure 3: GRUB menu for Amazon Linux 2, including advanced options
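
Because the pushed public key is only accepted for a short window (60 seconds at the time of writing, to our understanding), it can help to push the key and open the SSH session in one go. The following is a minimal sketch of steps 3 and 4. The function names are hypothetical, and i-00123EXAMPLE, us-east-1, and my_rsa_key are placeholders.

```shell
# Build the SSH target for serial port 0 of an instance, using the endpoint
# format shown in step 4.
serial_console_target() {
  local instance_id region
  instance_id="$1"
  region="$2"
  printf '%s.port0@serial-console.ec2-instance-connect.%s.aws\n' \
    "$instance_id" "$region"
}

# Push the public key, then connect immediately if the push succeeded.
connect_serial_console() {
  local instance_id region key
  instance_id="$1"
  region="$2"
  key="$3"   # path to the private key; the public key is expected at $key.pub

  aws ec2-instance-connect send-serial-console-ssh-public-key \
    --instance-id "$instance_id" \
    --serial-port 0 \
    --ssh-public-key "file://${key}.pub" \
    --region "$region" &&
    ssh -i "$key" "$(serial_console_target "$instance_id" "$region")"
}

# Usage:
#   connect_serial_console i-00123EXAMPLE us-east-1 my_rsa_key
# To reboot from a second shell instead of the EC2 console (steps 5 and 6):
#   aws ec2 reboot-instances --instance-ids i-00123EXAMPLE --region us-east-1
```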

 

Simulating a kernel failure

Now that you can access the GRUB menu, you can simulate a failed kernel boot. In this example, you use AWS CloudShell to connect via SSH to your instance.

To simulate a failed kernel load

  1. Log in to your EC2 instance using the same method you used when configuring GRUB.
  2. Install an older kernel using the following command:

sudo yum install kernel-4.14.104-95.84.amzn2.aarch64

  3. After the kernel is installed, reboot the instance using the following command:

sudo reboot now

  4. Log in to your EC2 instance again using your preferred method. After logging in, run the following command:

uname -r

You should see output similar to the following image, confirming that you are running version 4.14.104-95.84.

Figure 4: Amazon Linux 2 now running kernel 4.14.104-95.84

  5. Rename the kernel file to simulate a kernel failure by running the following command:

sudo mv /boot/vmlinuz-4.14.104-95.84.amzn2.aarch64 \
 /boot/vmlinuz-4.14.104-95.84.amzn2.aarch64.oops

  6. Reboot the EC2 instance, and connect using EC2 Serial Console to see the boot failure that was shown previously in Figure 2.
  7. After a few seconds, the GRUB menu reappears and you can select another kernel. The following image shows an example of the GRUB boot menu with both the original Linux kernel version 4.14.214 and the now default and broken Linux kernel 4.14.104.

Figure 5: GRUB menu showing all kernels installed

  8. Select the previous stable version from the GRUB menu (in this example, 4.14.214) and continue the boot process, as shown in the following image.

Figure 6: Successful boot of Amazon Linux 2
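
After the instance boots with the stable kernel, it can help to confirm which entries GRUB offers and to undo the simulated failure. The following is a minimal sketch, assuming the generated configuration is at /boot/grub2/grub.cfg as used earlier in this post. The helper name list_grub_entries is hypothetical.

```shell
# Print the title of every menuentry in a GRUB configuration file, including
# entries nested inside a submenu. Titles are the text between the first pair
# of single quotes on each menuentry line.
list_grub_entries() {
  awk -F"'" '/^[[:space:]]*menuentry / { print $2 }' "$1"
}

# Usage on the instance:
#   sudo cat /boot/grub2/grub.cfg > /tmp/grub.cfg && list_grub_entries /tmp/grub.cfg
#   uname -r    # confirm which kernel is running now
# Undo the simulated failure by restoring the original kernel file name:
#   sudo mv /boot/vmlinuz-4.14.104-95.84.amzn2.aarch64.oops \
#    /boot/vmlinuz-4.14.104-95.84.amzn2.aarch64
```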

 

The GRUB menu provides additional options to help troubleshoot and fix boot failures. The example in this section shows a simple failure that you can recover from by selecting the previous stable kernel. However, sometimes a previous stable kernel is not available. For those cases, you can use GRUB to boot into one of two additional modes: single user mode and emergency mode.

Single user mode

Single user mode boots the kernel at a lower run level. For example, it might mount the file system but not activate the network, which allows you to perform necessary maintenance to fix the instance.

To boot into single user mode

  1. From the GRUB menu, highlight the kernel you want to boot into and press e to edit its entry.
  2. Use the arrow keys to move the cursor to the line containing the kernel.
  3. At the end of the line, add the word single. The following is an example for Amazon Linux 2:

linux /boot/vmlinuz-4.14.193-149.317.amzn2.aarch64 root=UUID=d33f9c9a-\
dadd-4499-938d-ebbf42c3e499 ro console=tty0 console=ttyS0,115200n8 net.ifname\
s=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.she\
ll=0 single

  4. Use Ctrl-X to boot into single user mode.
  5. At the prompt, enter the root user password.

Emergency mode

Emergency mode is similar to single user mode, except that the kernel runs at the lowest run level possible. To boot into emergency mode, follow the same steps as for single user mode, but at step 3, add the word emergency instead of single. The following is an example for Amazon Linux 2:

linux /boot/vmlinuz-4.14.193-149.317.amzn2.aarch64 root=UUID=d33f9c9a-\
dadd-4499-938d-ebbf42c3e499 ro console=tty0 console=ttyS0,115200n8 net.ifname\
s=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.she\
ll=0 emergency
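
Both modes prompt for the root password, and Amazon Linux 2 AMIs typically do not ship with one set, so set it while the instance is still healthy (for example, with sudo passwd root). Once at the prompt, a typical repair for the /etc/fstab scenario described earlier is to remount the root file system read-write and correct the file. The following is a minimal sketch of a check that may help with that review. The helper name fstab_entries_without_nofail is hypothetical, and the nofail mount option is one common way to keep a missing secondary volume from blocking boot.

```shell
# List mount points in an fstab-format file that would block boot if their
# device were missing, that is, non-root entries without the nofail option.
fstab_entries_without_nofail() {
  awk '
    /^[[:space:]]*#/ || NF < 4 { next }      # skip comments, blank and short lines
    $2 == "/"                  { next }      # the root file system must always mount
    $4 !~ /(^|,)nofail(,|$)/   { print $2 }  # report entries lacking nofail
  ' "$1"
}

# Usage from the single user or emergency mode prompt:
#   mount -o remount,rw /                    # root is usually read-only in emergency mode
#   fstab_entries_without_nofail /etc/fstab  # review the reported mount points
#   vi /etc/fstab                            # fix or comment out the bad entry
#   mount -a                                 # confirm that all entries mount cleanly
#   reboot
```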

Clean up

After you’ve finished with the instance you created for this post, you should clean up by deleting the instance. This will prevent you from incurring any additional costs. To delete the instance, open the Amazon EC2 console and complete the following steps:

  1. In the navigation pane, choose Instances. In the list of instances, select the instance.
  2. Choose Instance state, Terminate instance.
  3. Choose Terminate when prompted for confirmation.

Amazon EC2 shuts down and deletes your instance. After your instance is deleted, it remains visible on the console for a short while, and then the entry is automatically deleted. You cannot remove the deleted instance from the console display yourself.
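
The same clean up can be done from AWS CloudShell. The following is a minimal sketch using the AWS CLI. The function name terminate_lab_instance is hypothetical, and i-00123EXAMPLE and us-east-1 are placeholders.

```shell
# Sketch: terminate the instance used in this post and wait until Amazon EC2
# reports it as terminated. The function name is illustrative.
terminate_lab_instance() {
  local instance_id region
  instance_id="$1"
  region="$2"

  aws ec2 terminate-instances --instance-ids "$instance_id" --region "$region" &&
    aws ec2 wait instance-terminated --instance-ids "$instance_id" --region "$region"
}

# Usage:
#   terminate_lab_instance i-00123EXAMPLE us-east-1
#   rm -f my_rsa_key my_rsa_key.pub   # remove the one-time key pair created earlier
```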

Conclusion

EC2 Serial Console offers serial port access to an EC2 instance to interact with boot loaders, debug boot issues, update networking configuration, and troubleshoot malfunctioning instances. To learn more, see EC2 Serial Console for Linux instances in the Amazon EC2 User Guide, or follow the hands-on Qwiklabs lab, Troubleshooting connectivity using EC2 Serial Console. You can also connect to the EC2 Serial Console of your Windows instances.