Globus Enables Experiment-Time Data Analysis
July 17, 2024
The Advanced Photon Source (APS) at the Department of Energy’s Argonne National Laboratory is completing a two-year upgrade of its facility and beamlines that will make its x-ray beams 500x more powerful and allow data to be collected at 10x the previous rate. This revolutionary upgrade brings extraordinary new opportunities for researchers: scientists will be able to study materials at the atomic level with higher precision and at faster speeds, and achieve new scientific breakthroughs. To help researchers capitalize on this upgrade, the APS is rethinking how experiments are done. Researchers must find the “needle in the haystack” in the mountain of data the upgraded facility will produce. That requires rapid data analysis on highly scalable systems, such as the powerful new exascale supercomputer “Aurora” at the Argonne Leadership Computing Facility (ALCF), so researchers can make sense of all the data and even modify their experiments while they still have time at the scientific user facility.
For over a decade, the ALCF and APS, both DOE Office of Science user facilities, have collaborated to build the infrastructure for integrated ALCF-APS research, including work to develop workflow management tools and to enable secure access to on-demand computing. The Globus platform and suite of tools enabled them to achieve these goals. Together, the APS and ALCF have built a fully automated data analysis pipeline that uses ALCF resources to rapidly process data from x-ray experiments at the APS. Researchers at the APS collect data as it comes off the instruments and use Globus to transfer it to the ALCF. With Globus Auth they create a beamline service account, and with Globus Groups they give the team secure, shared access to results through a data portal. Globus Compute brings supercomputers into the loop during the experiment, offloading the heavy computation needed for data analysis. All of these tasks are wrapped into a Globus Flow that executes without human intervention and delivers results in near real-time, so researchers can review them and adjust the experiment if necessary. They no longer have to wait for results until after leaving the facility, only to discover that their samples or data are not usable.
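To make the pattern concrete, the sketch below shows the basic building blocks using the Globus Python SDKs: authenticate with Globus Auth, transfer raw data from a beamline collection to a compute facility, and offload analysis to a Globus Compute endpoint. The client ID, collection and endpoint UUIDs, paths, and the analysis function are placeholders for illustration, not the actual APS/ALCF configuration.

```python
# Minimal sketch of the beamline-to-HPC pattern; IDs, paths, and the analysis
# function below are placeholders, not the real APS/ALCF pipeline.
import globus_sdk
from globus_compute_sdk import Executor

# --- Authenticate with Globus Auth (interactive native-app login for illustration) ---
auth_client = globus_sdk.NativeAppAuthClient("YOUR_CLIENT_ID")  # placeholder client ID
auth_client.oauth2_start_flow(requested_scopes=globus_sdk.TransferClient.scopes.all)
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]
transfer = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_tokens["access_token"])
)

# --- Move raw detector data from the beamline to the computing facility ---
BEAMLINE_COLLECTION = "..."  # placeholder: Globus collection UUID at the instrument
HPC_COLLECTION = "..."       # placeholder: Globus collection UUID at the compute site
task_data = globus_sdk.TransferData(
    source_endpoint=BEAMLINE_COLLECTION, destination_endpoint=HPC_COLLECTION
)
task_data.add_item("/raw/scan_0001/", "/project/scan_0001/", recursive=True)
transfer_task = transfer.submit_transfer(task_data)
transfer.task_wait(transfer_task["task_id"])

# --- Offload analysis to a Globus Compute endpoint on the HPC system ---
def analyze(scan_dir):
    # Placeholder for the real reconstruction/analysis code run on the cluster.
    return f"analyzed {scan_dir}"

COMPUTE_ENDPOINT = "..."     # placeholder: Globus Compute endpoint UUID
with Executor(endpoint_id=COMPUTE_ENDPOINT) as gce:
    future = gce.submit(analyze, "/project/scan_0001/")
    print(future.result())
```

In a production setup like the one described here, these steps would typically be expressed as action states in a Globus Flow, so the whole sequence fires automatically each time new data lands, rather than being driven by a hand-run script.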
This fully automated pipeline, which integrates experimental and observational facilities with computing centers, changes how experiments are run in today’s labs, where researchers are grappling with a tsunami of data and with how to leverage exciting new tools and technologies to accelerate time to insight and discovery.