AWS Open Source Blog

Enhancing data science environments with Vim, tmux, and Zsh on Amazon EC2

This post was written by Josiah Davis, Yin Song, and Anne Hu. The solution can also be found on GitHub.

Many professional data scientists are adopting open source software development tools such as Vim, tmux, and Zsh to get more productivity out of their working environment.

  • Vim is a free and open source, highly configurable text editor built to make creating and changing any kind of text very efficient.
  • Tmux is an open source terminal multiplexer for Unix-like operating systems. Tmux allows creating persistent terminal sessions, and further splitting these sessions into multiple windows and panes.
  • Oh My Zsh is an open source, community-driven framework for managing your Zsh configuration. It provides ergonomic support for working with the command-line, including using Git.

Although there is a bit of a learning curve with these tools, once you commit certain keystrokes to memory, Vim, tmux, and Zsh provide the foundation for a pleasant coding experience. They also offer several benefits:

  • Increase portability and consistency: The same environment can be reproduced anywhere quickly and programmatically.
  • Increase productivity: Sessions are saved, and effective navigation saves time. Also, you can do nearly all of the tasks you need purely by using a keyboard (mouse not required).
  • Expand your skillset: Using these tools will help you develop good software development practices, such as writing code that is modular and testable.

Few resources, however, are available for those just getting started on this journey. In this blog post, we provide guidance for setting up Vim, tmux, and Zsh on an Amazon Elastic Compute Cloud (Amazon EC2) instance.

Solution overview

We provide an AWS CloudFormation template to spin up an AWS Deep Learning AMI (Amazon Linux 2), a user data script to set everything up, and a few tips for getting started (on GitHub). Functionally, there are three aspects to this solution:

  1. Manage code (for example, edit, run, and monitor simultaneously) in a persistent session using tmux.
  2. Develop and edit code using Vim, configured with popular plugins and sensible defaults.
  3. Enable ergonomic directory navigation and display the Git branch and status with Oh My Zsh.

The configuration options provided here are not meant to be comprehensive, but they should help you get started. Once the setup is complete, the development environment will look something like this:

Screenshot of what the development environment should look like once the setup is complete.

Part 1: Setting up the environment

We have provided two quick setup options below: command-line deployment or one-click deployment.

Option A: Command-line deployment

For command-line deployment, clone the GitHub repository, change to the relevant working directory, update the relevant parameters in the config/deploy.ini file, and then run the deploy command with a stack name and Region.

git clone https://github.com/aws-samples/ec2-data-science-vim-tmux-zsh.git
cd ec2-data-science-vim-tmux-zsh
# Edit the parameters in config/deploy.ini
./deploy.sh <STACK_NAME> <REGION> [<PROFILE>]

Option B: One-click deployment

For one-click deployment, follow the screenshots provided by first creating a stack in AWS CloudFormation from the AWS Management Console by selecting Launch Stack:

Launch stack.

Select Upload a template file and then upload the CloudFormation template file:

Screenshot of the console highlighting the steps to upload the CloudFormation template file.

Input your own parameters in the Parameters fields:

Screenshot of the console highlighting the specifics of the parameters when creating the stack.

Select Create a stack, configure the stack options, and then select Create stack:

Screenshot of the "Advanced Options" in configuring the stack options.

Screenshot of the settings necessary to create the stack.

Part 2: Using the environment

Once the EC2 instance has been set up and configured, you are ready to begin development. If you are new to Vim, tmux, or Zsh, you can read through the following sections to get up to speed.

Practical guidance on Vim

Vim makes navigating through code seamless. With practice, you will be able to write and edit code quickly. If you have never used Vim before, check out vimtutor, which teaches the most basic navigation commands. Here we focus on a few useful Vim commands that will be helpful for those in the early days of learning to use the tool.

Navigating enclosures

Quick and effective code editing often comes down to manipulating text within enclosures (for example, ", (, [, {). Core commands consist of three components:

  1. What you want to do to the text (for example, change, yank, delete).
  2. Whether you want to include the enclosure or merely the text inside of it (i for inside, a for around).
  3. Which enclosure you want to include (for example, ", (, [, {).

Put these commands together when inside an enclosure, and you have a powerful tool for editing code. To illustrate, here are a few examples:

  • ci) changes everything inside the parentheses.
  • yi" yanks everything inside the quotation marks. This is akin to copying the text within the quotation marks.
  • da] deletes everything within the square brackets, including the brackets themselves.

As always, once a change has been made, Vim remembers it, and it can be repeated with the . command until it is replaced by another change.

Example of navigating enclosures in Vim.
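The three-part structure described above (operator, then i or a, then the enclosure) means that a handful of keys combine into many commands. The following is a minimal shell sketch that only prints the combinations as a cheat sheet; the function names describe and cheat_sheet are hypothetical helpers for illustration and are not part of Vim or of the accompanying repository.

```sh
# Print one Vim text-object command and a plain-English description.
# Arguments: operator (c, y, d), scope (i, a), enclosure character.
describe() {
  case $1 in
    c) verb="change" ;;
    y) verb="yank" ;;
    d) verb="delete" ;;
  esac
  case $2 in
    i) scope="inside" ;;
    a) scope="around" ;;
  esac
  printf '%s%s%s  %s %s %s\n' "$1" "$2" "$3" "$verb" "$scope" "$3"
}

# Combine every operator, scope, and enclosure into a cheat sheet.
cheat_sheet() {
  for op in c y d; do
    for scope_key in i a; do
      for enclosure in '"' ')' ']' '}'; do
        describe "$op" "$scope_key" "$enclosure"
      done
    done
  done
}
```

Running cheat_sheet prints one line per combination, which can be a handy reference while the keystrokes are still becoming muscle memory.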

Setting the leader key

Some operations are so common that it makes sense to make a custom shortcut for them. The leader key provides a standardized approach to shortcuts. A common leader key option is <Space>. We can set this value in the ~/.vimrc file.

let mapleader="\<Space>"

In the following example, we use it for easier writing and for quitting, but it can be used for any common shortcut we want to set.

nnoremap <Leader>w :w<CR>
nnoremap <Leader>x :x<CR>
nnoremap <Leader>q :q<CR>

Now, when writing code, for example, you can use <Space>w to save the updates instead of :w<Enter>.
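Because the environment is meant to be reproducible from scripts, settings like these can also be added to a Vim configuration file programmatically. The following is a minimal sketch, assuming you want each line added only once no matter how many times the script runs; append_once and install_leader_mappings are hypothetical helper names and are not taken from the repository's setup script.

```sh
# Append a line to a file only if that exact line is not already present.
append_once() {
  file=$1
  line=$2
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the leader key and the three shortcuts to the given vimrc file.
# mapleader is written first so the <Leader> mappings below pick it up.
install_leader_mappings() {
  vimrc=$1
  append_once "$vimrc" 'let mapleader="\<Space>"'
  append_once "$vimrc" 'nnoremap <Leader>w :w<CR>'
  append_once "$vimrc" 'nnoremap <Leader>x :x<CR>'
  append_once "$vimrc" 'nnoremap <Leader>q :q<CR>'
}
```

For example, install_leader_mappings "$HOME/.vimrc" could be called from a provisioning script, and calling it a second time leaves the file unchanged.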

Vertical scrolling

Part of programming is simply “scrolling” up and down in a single code file, and Vim has many options for vertical scrolling. To navigate around the page, use G to travel to the bottom of the page, use gg to travel to the top of the page, and use Ctrl-U/Ctrl-D to travel up or down by half pages.

To navigate by code chunks, { and } jump up and down by blocks of code separated by blank lines. To configure the position of the screen with respect to the cursor, zz centers the screen on the cursor line, zt scrolls the screen so that the cursor line is at the top, and zb scrolls the screen so that the cursor line is at the bottom.

Example of vertical scrolling.

Tabs with the Nerdtree plugin

To navigate between files in the code base, you can use Nerdtree, which is a popular Vim plugin that functions as a pop-out navigation pane and central hub for opening new files. Nerdtree can be expanded/collapsed using Ctrl-N. You can view/hide invisible files using Shift-I to toggle, and refresh the file menu using Shift-R.

Tabs also pair nicely with the Nerdtree plugin. Use t or T to open a tab from the Nerdtree menu. Using t will open the file in a new tab and jump to it, whereas T will open the file in a new tab and will not jump to it. The gt and gT commands will scroll through tabs forward and backward, respectively.

Additionally, from the Nerdtree menu, you can select s to open a file in a side-by-side (vertical) split or i to open it in a stacked (horizontal) split.

Example of navigating between files using the Nerdtree plugin.

Auto-complete

Working with a good auto-complete mechanism is pure joy, and once you start using one, you will never go back. In our ~/.vimrc, we use tabnine. It suggests completions as you type, and whether to accept a suggestion is left to your discretion. You can use Tab/Shift-Tab to “scroll” through the available options.

Commenting

The vim-commentary plugin from Tim Pope makes commenting or uncommenting blocks of code easy. Select the text in visual mode (for example, select three lines with Vjj) and then press gc to toggle comment/uncomment. To install the plugin, put the following in ~/.vimrc:

Plug 'tpope/vim-commentary'

Example of commenting and uncommenting using the Vjj and gc commands.

The g command

We’ve already discussed G and gg for jumping to the bottom or top of a file, but the g command is useful for a variety of additional things. Let’s say you were just editing code, and you navigated to a different part of the file when you realized that you forgot to add something. Entering gi will jump back to the place where you last inserted text and resume insert mode there. In a similar way, gv reselects the previous visual selection. Finally, gq is another useful command that rewraps any visually selected text to fit within the line limit, which can come in handy when editing README.md files, for example.

Hybrid numbers

With line numbering turned on, Vim shows absolute line numbers inside of a text file. However, jumping by a relative number of lines becomes unwieldy when you have to count the lines yourself. With relative line numbers, you can quickly jump the correct number of lines up or down (for example, 4j will jump down 4 lines, 8k will jump up 8 lines). That said, there are merits to using absolute line numbers also. For example, when troubleshooting an error that is coming from a particular line, it is helpful to be able to see that line number. One solution is to use hybrid line numbers. That is, you can use relative line numbers when you are in normal mode jumping around the code, and absolute numbers when you are not in normal mode (for example, when the cursor is in another file or you are actively typing in insert mode).

You can set up hybrid line numbers by putting the following snippet in your ~/.vimrc:

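" Hybrid line numbers: relative in normal mode, absolute otherwise
set number relativenumber
augroup numbertoggle
  autocmd!
  " Relative numbers when the buffer is active and you leave insert mode
  autocmd BufEnter,FocusGained,InsertLeave * set relativenumber
  " Absolute numbers when focus is lost or you start typing in insert mode
  autocmd BufLeave,FocusLost,InsertEnter * set norelativenumber
augroup END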

Practical guidance on tmux

The tmux tool allows you to persist your session, work in multiple windows, and split them into multiple sections. This functionality comes in handy when performing data science work on an EC2 instance. Here are examples of what tmux lets you do:

  • You can split up the screen to edit code in one section, while running it in another (shown below).
  • For training jobs, which may take hours or longer to complete, tmux can ensure that the training job is uninterrupted and finishes, even if the SSH session disconnects.
  • You can have separate windows for different tasks, such as monitoring GPU utilization or running a Flask server.

Screenshot displaying the session, window, tabs, and panes within tmux.

Tmux is organized into sessions, windows, and panes. In the preceding image, there is one upper pane and one lower pane. These comprise a single window. There is also a second window that is not shown. A session comprises all windows and constituent panes.

Configuring our environment and the prefix key

In tmux, configuration options are controlled with a .tmux.conf file, typically located in the home directory. One of the first options you might set is the prefix key, which is the key combination you press to tell tmux that the next keystroke is a tmux command. The default is Ctrl-B, but a common alternative is Ctrl-A. Making the switch requires the following:

set -g prefix C-a

Another common option is to remap Caps Lock (otherwise not a frequently used key) to Ctrl on your keyboard so that Ctrl and A are next to each other. Most operating systems offer this as a standard setting.
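When scripting the setup, the prefix change can be written into the tmux configuration file from the shell. The following is a minimal sketch under stated assumptions: write_prefix_config is a hypothetical helper name, and the unbind-key and send-prefix lines are optional additions that are commonly paired with a prefix change (they are not required by the line shown earlier).

```sh
# Append a Ctrl-A prefix configuration to the given tmux config file.
write_prefix_config() {
  conf=$1
  cat >> "$conf" <<'EOF'
# Use Ctrl-A as the prefix instead of the default Ctrl-B
unbind-key C-b
set -g prefix C-a
# Press Ctrl-A twice to send a literal Ctrl-A to the program in the pane
bind-key C-a send-prefix
EOF
}
```

After calling write_prefix_config "$HOME/.tmux.conf", a running tmux server picks up the change with tmux source-file ~/.tmux.conf. The send-prefix binding matters mostly for programs that use Ctrl-A themselves, such as a shell where Ctrl-A moves to the start of the line.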

Switching between tmux and Vim panes

The vim-tmux-navigator plugin from Chris Toomey lets you navigate the tmux panes with the same keystroke combination as Vim panes using Ctrl-{hjkl}. Similarly to the way {hjkl} in Vim navigates around the document (left, down, up, right), Ctrl-{hjkl} navigates between the tmux panes.

You just need to add the following plugin to ~/.vimrc and add a few lines of boilerplate to .tmux.conf.

Plug 'christoomey/vim-tmux-navigator'

Using vi mode

You can use vi mode in tmux to “scroll” through output. A common option is to use the prefix key followed by Esc to turn on this mode. From there, you can use Vim keystrokes to navigate up and down. Additionally, increasing the amount of history that is retained can be useful. Here, the history is set to the most recent 10,000 lines:

set-option -g history-limit 10000
setw -g mode-keys vi
unbind-key [
bind-key Escape copy-mode

Programmatically setting up environments

Tmux has the ability to set up a new environment programmatically. Included in the (non-exhaustive) range of design options is the ability to specify the windows and panes, or even specify where to send commands to the session. We’ve provided an example Bash script that creates two panes stacked vertically, as shown above, as well as options for additional windows for monitoring or running a Jupyter server.
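To give a feel for what such a script can look like, here is a minimal sketch. It is not the script from the repository: the function names run and setup_session, the window names code and monitor, and the nvidia-smi monitoring command are all assumptions made for illustration. By default it only prints the tmux commands so that you can inspect them; setting DRY_RUN=0 executes them.

```sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create a detached session with two windows.
setup_session() {
  session=$1
  # Window "code": split into two panes stacked vertically
  run tmux new-session -d -s "$session" -n code
  run tmux split-window -v -t "$session:code"
  # Window "monitor": the trailing colon targets the session, not a window
  run tmux new-window -t "$session:" -n monitor
  # Type a command into the monitor window and press Enter (C-m)
  run tmux send-keys -t "$session:monitor" "nvidia-smi -l 5" C-m
  # Return focus to the first window
  run tmux select-window -t "$session:code"
}
```

For example, DRY_RUN=0 setup_session ds followed by tmux attach-session -t ds creates the session and connects to it. Creating the session detached (-d) first is what allows the rest of the layout to be built before anything is shown on screen.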

Practical guidance on Zsh

Many different shell options are available, including sh, Bash, fish, and Zsh. In particular, Oh My Zsh is a wrapper for Zsh that includes features to make the process of working from the command line more enjoyable. Two areas of focus are navigation and Git use. From the perspective of navigation, Oh My Zsh allows for tab completion, tab scrolling through directories, and folder expansion (for example, cd m/p + Tab translates to module/pipeline).

Example of Oh My Zsh navigation through tabs.
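Conceptually, expanding cd m/p treats each abbreviated path component as a prefix. The following is a rough emulation in plain shell, meant only to illustrate the idea: expand_abbrev is a hypothetical function, it relies on ordinary filename globbing, and it is not how Zsh or Oh My Zsh implement completion.

```sh
# Expand an abbreviated path such as "m/p" by treating every component
# as a prefix, which is equivalent to the glob "m*/p*".
expand_abbrev() {
  # Turn m/p into m*/p* by adding * after each component
  pattern="$(printf '%s' "$1" | sed 's|/|*/|g')*"
  # Left unquoted on purpose so the shell expands the glob
  for match in $pattern; do
    # When nothing matches, the glob stays literal, so check existence
    [ -e "$match" ] && printf '%s\n' "$match"
  done
  return 0
}
```

In a project that contains a module/pipeline directory, expand_abbrev m/p prints module/pipeline. If several paths share the same prefixes, all of them are printed, which mirrors the situation where Tab offers more than one candidate.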

From the perspective of using Git, Oh My Zsh automatically displays whether your Git status is clean or dirty (i.e., whether you have uncommitted changes or not) and which branch you are on. These two indicators are useful for data scientists who are new to software engineering in that they provide constant visual reminders to utilize Git functionality.

Example of Oh My Zsh displaying the Git status and which branch you are on.
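The information behind such a prompt can be obtained from two Git commands. The following is a minimal sketch of the idea, not the Oh My Zsh implementation: git_prompt_info_sketch is a hypothetical function name, and the trailing asterisk used to mark a dirty working tree is an arbitrary choice for this example.

```sh
# Print the current branch, followed by " *" if there are uncommitted
# or untracked changes. Prints nothing outside of a Git work tree.
git_prompt_info_sketch() {
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || return 0
  # symbolic-ref fails when HEAD is detached, so fall back to a label
  branch=$(git symbolic-ref --short HEAD 2>/dev/null) || branch="detached"
  # --porcelain prints one line per changed or untracked file
  if [ -n "$(git status --porcelain 2>/dev/null)" ]; then
    printf '%s *\n' "$branch"
  else
    printf '%s\n' "$branch"
  fi
}
```

Running git_prompt_info_sketch inside a repository prints something like main when everything is committed and main * when there is work left to commit.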

Conclusion

Whether you are an experienced data scientist who is new to cloud computing or a new data scientist seeking to grow your software development skillset, there are many benefits to incorporating Vim, tmux, and Zsh into your workflow.

You can have the same programming environment in a variety of settings, whether you are programming locally or on a remote instance, and your environment can be managed purely by configuration files and Bash scripts, further increasing portability. Additionally, by embedding the necessary keystrokes into muscle memory, you can reduce your overall cognitive burden, while simultaneously receiving prompts to follow good software development practices. Eventually, you can get to the point where you can edit and write code as fast as you can think.

Anne Hu

Anne Hu is an Intern at AWS who hopes to make an impact by improving practical applications of AI/ML, especially with respect to ethical AI. She holds a BSc in Statistics and BAS(Hons I) in Data Science from the University of Sydney and is currently completing her LLB. In her downtime, she enjoys exploring new cultures and hidden gems in every corner of the world.

Josiah Davis

Josiah Davis is a Senior Data Scientist with AWS where he engages with customers to solve applied problems in Machine Learning. Outside of work, he enjoys reading and travelling with his family. He holds a master's degree in Statistics from UC Berkeley.

Yin Song

Yin Song has been a data scientist on the AWS ProServe ML APJC team since May 2019. He works closely with enterprises across several industries (e.g., telecommunication, mining, and FSI) to design and apply machine learning and AI solutions and to create value for customers. Before joining AWS, Yin worked for Telstra, the largest telecommunication company in Australia, and delivered several projects on customer and network experience optimisation. Prior to this, he worked as a data scientist in the field of online advertising, where he led ML-based advertising optimisation. He obtained his PhD in 2014; his thesis was about probabilistic machine learning and applications.