AWS Open Source Blog

How and why AWS contributes to Jupyter

Artificial intelligence (AI) and machine learning (ML) have exploded in popularity as enterprises have sought to make better use of their data. At the heart of these efforts is Project Jupyter, a popular open source project widely used in data science, machine learning, and scientific computing. Although Jupyter is beloved for helping data scientists do complicated, technical work, its real genius has perhaps less to do with machines and more to do with humans. That is, Jupyter makes it easy for people to think and tell stories with code and data.

At AWS, we work with incredible open source data science communities, such as TensorFlow and PyTorch, to make code accessible for customers through services like Amazon SageMaker. In the case of Jupyter, it was so important to customer success that we went a step further, hiring Jupyter co-founder Brian Granger in early 2019. This move put Granger in a position to learn from the “challenges and pain points” AWS customers have working with Jupyter and then turn around and work with the Jupyter community to improve the project for all, including AWS customers.

This approach is nothing new for Granger. As he said in a recent interview, when he and Jupyter co-founder Fernando Pérez started working together in 2004 as academics, they weren’t trying to change the world through data science. Rather, he says they were obsessed with building something that they could put to immediate use with their students in doing computational physics. Granger, in other words, has always embodied the Amazon principle of customer obsession. It’s just that the nature of his customers has changed.

Data science for all

AWS has long had a vision that in the not-too-distant future, virtually every application will be infused with ML and AI, as I wrote in 2019. Today, tens of thousands of customers benefit from ML through Amazon SageMaker, a fully managed service that lets data scientists and developers quickly and easily build, train, and deploy ML models at scale.

Key to that vision and, indeed, to the work of pretty much any data scientist, is Jupyter.

Jupyter Notebooks have become ubiquitous across computational education and research, science, data science, and machine learning. Millions of users and tens of thousands of organizations use Jupyter daily. As of early 2021, there are more than 10 million public Jupyter Notebooks on GitHub. Data scientists are building incredible things using Jupyter, as a review of JupyterCon 2020 talks indicates.

But what is Jupyter, exactly?

If you’re a software developer, you may write code using VS Code. Jupyter, by contrast, is not really used to build software as much as to extract insight from data with code, Granger says. Jupyter, in other words, is a tool that enables people to think with code and data and then to build narratives or stories around them to communicate those code- and data-driven insights to others. The Jupyter Notebook is a concrete artifact that enables both of those flows, combining live code with narrative text, mathematical equations, visualizations, and other content. Jupyter comes with tools for converting these notebooks into websites, blog posts, dashboards, and other means of sharing.
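To make the idea of a notebook as a "concrete artifact" a little more tangible, here is a minimal sketch of what one looks like on disk: a JSON document holding an ordered list of narrative (Markdown) cells and code cells. The helper name `build_notebook` and the sample cell contents are hypothetical and chosen only for illustration; the field names follow the publicly documented nbformat version 4 layout, and the sketch uses only Python's standard library rather than Jupyter's own tooling.

```python
import json


def build_notebook(cells):
    """Build a dict in the nbformat 4 JSON layout.

    `cells` is a list of (cell_type, source) pairs, where cell_type is
    "markdown" for narrative text or "code" for live code.
    (Hypothetical helper for illustration; not part of Jupyter itself.)
    """
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also record their execution count and outputs;
            # both are empty until the notebook is actually run.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {
        "cells": nb_cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Narrative text (with an equation) followed by the code that backs it up.
notebook = build_notebook([
    ("markdown", "# Falling object\nDistance fallen is $d = \\tfrac{1}{2} g t^2$."),
    ("code", "g = 9.81\nt = 2.0\nprint(0.5 * g * t**2)"),
])

# Serialize to the text that would be stored in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

Saving `notebook_json` to a file such as `example.ipynb` (an assumed file name) should give something Jupyter can open, and a command like `jupyter nbconvert --to html example.ipynb` is one way to turn it into a shareable web page. In practice, most people create notebooks through the Jupyter interface rather than by hand.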

As popular as Jupyter is today, back in 2004 it was just an idea that Granger and Pérez had for delivering a Mathematica-like experience for their students, but with a more approachable programming language (Python) and, importantly, open source. “At the time it was less about it being cheaper and more about it being open source, which made it more extensible and easier to hack on,” Granger explains.

Back then, rich web applications like Gmail were emerging, giving Granger and Pérez the sense that they could deliver a similar experience for their students. Granger says there was just one problem: “We were theoretical computational physicists with no experience building this type of stuff.”

What they did have, however, was the perfect target user: themselves. “In the early days we were building something that we ourselves wanted to use,” Granger says.

Did they think it might have broader applicability? Yes, but with a caveat. “We imagined that this might take off and have broad impact in that space,” Granger says. Yet that’s where the vision ended. “What we really couldn’t have foreseen is that the rest of the world would wake up to the value of data science and machine learning,” he adds. This seems obvious in retrospect; it was anything but back then.

Improving Jupyter for all

Granger explains that the organizational complexity of managing an open source project of Jupyter’s size is a huge amount of work. Fortunately, he’s not alone. Jupyter is now part of the NumFOCUS Foundation and, although Granger and Pérez co-founded Jupyter, more than 1,500 others have contributed to the project since its launch in 2011. Today, Jupyter is a thriving community, one in which AWS engineers are fortunate to participate.

That brings me to the last question I asked Granger: How (and why) is AWS involved?

“At AWS, we want to make Jupyter as good as we possibly can, engage with the open source community, and improve Jupyter on behalf of our customers and all Jupyter users,” Granger explains. Given how much AWS customers depend on Jupyter, this makes sense, and supporting Jupyter—directly, and through the team at AWS that actively contributes to the project—is a full-time job for him.

Granger’s job mirrors the role Jupyter plays in data science: Jupyter sits at the intersection of challenging technical problems and interesting human problems, making data science work for and speak to humans. Doing this well, Granger concludes, means that his open source work with Jupyter and his product-related work with SageMaker aren’t divergent. They’re near-perfect complements, essentially requiring Granger to remain an active leader on the Jupyter front.

Interested in trying Jupyter? Please check out the Jupyter project page to get help installing and using Jupyter. Would you like to help Granger and others in the Jupyter community to make it even better for your needs? Have a look at the Jupyter contribution guide and, when you’re ready, submit your first pull request on GitHub.

Matt Asay

Matt Asay (pronounced "Ay-see") has been involved in open source and all that it enables (cloud, machine learning, data infrastructure, mobile, etc.) for nearly two decades, working for a variety of open source companies and writing regularly for InfoWorld and TechRepublic. You can follow him on Twitter (@mjasay).