Supercomputing
Image of Discover SCU 17 with intense blue back lighting
Added in 2023, Scalable Compute Unit 17 is the most recent addition to the Discover supercomputer at the NASA Center for Climate Simulation. This row of racks provides one-third of the computational capacity available to NASA scientists. Credit: Bruce Pfaff, NASA/Goddard

The Discover supercomputer at the NASA Center for Climate Simulation (NCCS) enables NASA earth science, heliophysics, planetary science, and astrophysics research. Discover was architected in 2005 and has evolved over the years through the addition of new Scalable Compute Units (SCUs) and the removal of old ones. The supercomputer began in 2006 with a Base Unit providing 3.3 teraflops. As of 2023, Discover provides the NCCS user community with 8.8 petaflops, more than 2,600 times the original compute capacity. In fact, a single 48-core node on Discover’s newest SCU provides slightly more computing power than the entire Base Unit!
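The growth figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the variable names are illustrative, and the per-core figure is a rough lower bound that assumes the node's computing power is spread evenly across its 48 cores.

```python
# Figures quoted in the text, in floating-point operations per second.
BASE_UNIT_FLOPS = 3.3e12      # 2006 Base Unit: 3.3 teraflops
DISCOVER_2023_FLOPS = 8.8e15  # Discover in 2023: 8.8 petaflops
NODE_CORES = 48               # cores in one node of the newest SCU

# How many times larger the 2023 system is than the Base Unit.
growth_factor = DISCOVER_2023_FLOPS / BASE_UNIT_FLOPS

# If one node slightly exceeds the Base Unit, each core must deliver
# at least this much (assumption: power is evenly divided among cores).
min_flops_per_core = BASE_UNIT_FLOPS / NODE_CORES

print(f"Growth factor: {growth_factor:,.0f}x")
print(f"Implied per-core floor: {min_flops_per_core / 1e9:.2f} gigaflops")
```

The ratio comes out to roughly 2,667, consistent with the "more than 2,600 times" stated above.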

Despite this growth, after 18 years of changes to high-performance computing (HPC) architectures, user workflows, data volumes, cloud computing, and security requirements, the NCCS has decided to start anew. Presented here are some of the original architectural choices that motivated this decision, the requirements for the new design, a review of the technologies and architectures under consideration, and some musings on the balance between the “forklift” install and the “evergreen” approach, a balance that applies to HPC systems as well as to everyday life.

Quick Facts

The NCCS provides critical resources to support HPC and machine learning workflows for NASA scientists, but it’s time to re-architect our environment to meet the challenges of the next decade that can be best served with hybrid HPC and cloud computing.

Laura Carriere,
NASA Goddard Space Flight Center
  • The NCCS supports NASA scientists with a diverse array of resources: the Discover supercomputer; Explore/ADAPT, an on-premises cloud environment; Prism, an artificial intelligence/machine learning (AI/ML) system; and over 100 petabytes of online storage.
  • The approach to expanding Discover over the past 18 years through adding and removing SCUs is no longer viable due to emerging technologies and evolving science requirements.
  • The NCCS's next-generation supercomputer will support both HPC and AI workflows, access to NASA-curated data products, and the flexibility available in the cloud to support the science needs of the future.
  • The NCCS will purchase a rack of test hardware to evaluate new technologies, including high-speed interconnect options, storage solutions, an orchestration layer for controlling access to compute resources, Continuous Integration/Continuous Delivery (CI/CD) for operating system provisioning and maintenance, and Zero Trust Architecture for security.

Researcher

  • Laura Carriere, NASA Goddard Space Flight Center

More Information