‘If Data, Then Discover’ — UChicago Software Group Globus Seeks to Automate Science

October 25, 2018

Smart home devices and popular web services such as If This Then That (IFTTT) have made it possible for people to automate many routine life tasks. Users can set their thermostat to kick up the heat if the temperature nears freezing, have their washing machine send them a text when a load is finished, or sync up calendars and to-do lists across home and work devices.

Globus, a software service created and based at the University of Chicago, already helps scientists simplify their workflows by automating data transfer and synchronization tasks. Now, thanks to a $2 million National Science Foundation grant, Globus will introduce a broader set of services that extend automation beyond moving data to other stages of research, such as analysis and publication.

Globus Automate Service

Imagine a telescope that, whenever it collects a new image, automatically sends the data to an institutional cluster or the cloud, where it is stored, published, and analyzed according to predetermined recipes. If the analysis detects a notable event, such as a supernova, another telescope can then be steered to observe that region.

The new Globus Automate services will follow the model of “trigger-action programming,” a user-friendly approach that requires no programming knowledge to create automated sequences initiated by events. Using the platform, a scientist will be able to set up a workflow of actions, such as data cleaning, transfer, curation, and analysis, that runs automatically whenever a trigger event occurs. For example, a workflow might run whenever the scientist collects or receives data, after a preprocessing step has completed, or at other specified times.
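To make the trigger-action model concrete, here is a minimal, purely illustrative Python sketch. It is not the Globus Automate API: the `Rule` and `Automator` classes, the event names, and the `clean`/`transfer`/`analyze` steps are hypothetical stand-ins. Each rule pairs a trigger (an event type) with an ordered flow of actions, and dispatching an event runs every flow whose trigger matches, passing each step's output to the next.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical event and action types; these are NOT Globus APIs.
Event = dict
Action = Callable[[Event], Event]


@dataclass
class Rule:
    trigger: str                                          # event type that starts the flow
    actions: list[Action] = field(default_factory=list)   # steps run in order


class Automator:
    """Toy dispatcher: runs each matching rule's actions in sequence."""

    def __init__(self) -> None:
        self.rules: list[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def dispatch(self, event: Event) -> list[Event]:
        results = []
        for rule in self.rules:
            if rule.trigger == event["type"]:
                data = event
                for action in rule.actions:
                    data = action(data)   # each step feeds the next
                results.append(data)
        return results


# Illustrative steps standing in for cleaning, transfer, and analysis.
def clean(e: Event) -> Event:
    return {**e, "cleaned": True}


def transfer(e: Event) -> Event:
    return {**e, "location": "cluster"}


def analyze(e: Event) -> Event:
    return {**e, "analyzed": True}


automator = Automator()
automator.add_rule(Rule("new_data", [clean, transfer, analyze]))

print(automator.dispatch({"type": "new_data", "file": "image_001.tif"}))
# → [{'type': 'new_data', 'file': 'image_001.tif', 'cleaned': True, 'location': 'cluster', 'analyzed': True}]
print(automator.dispatch({"type": "heartbeat"}))
# → []
```

The user only declares *what* should happen *when*; the dispatcher decides when to run it. That separation is what lets trigger-action tools hide the plumbing from people who do not want to write it.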

“We want to help scientists by moving all the complexities of managing data to a cloud service,” said Ian Foster, co-founder of Globus and Arthur Holly Compton Distinguished Service Professor of Computer Science at UChicago. “We intend to accelerate scientific discovery by providing expanded automation services, beneficial to researchers across a broad spectrum and delivered via a widely adopted and sustainable platform.”

Globus was first launched in 1997 to enable the scientific use of grid computing -- the connection of distributed computational resources that was a precursor to what’s now known as cloud computing. Since then, it has grown into a full suite of services used by tens of thousands of researchers and over 1,000 institutions in the U.S. and internationally to move, share, publish and otherwise manage data through over 12,000 Globus “endpoints.” The new Automate services will turn those static data waystations into active portals that can initiate actions whenever interesting events occur, such as the creation or receipt of new data, Foster said.

Asst. Prof. Narayanan “Bobby” Kasthuri is one of several research partners who will use Globus Automate in their scientific workflow. Photo by Mark Lopez/Argonne National Laboratory

As part of the new grant, Globus will spend the next three years building and testing the new automation platform with research partners in astronomy, geoscience, hazard engineering, materials science, and neuroanatomy. For example, Narayanan “Bobby” Kasthuri, a neuroscientist at UChicago and Argonne National Laboratory who uses Argonne’s Advanced Photon Source to map the brains of humans and animals, will automate his workflow from image collection to 3-D reconstruction and visualization. Because the instrument collects 20 gigabytes of data each minute, automation is key to processing the data quickly and efficiently.
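The figure of 20 gigabytes per minute is easier to appreciate as sustained rates. The short calculation below is a back-of-the-envelope sketch that assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB) and, for the daily total, round-the-clock operation, which real instrument schedules may not match.

```python
# Back-of-the-envelope data volumes for an instrument producing
# 20 GB per minute. Assumes decimal units and continuous operation.
GB_PER_MINUTE = 20

gb_per_second = GB_PER_MINUTE / 60
megabytes_per_second = gb_per_second * 1000
gigabits_per_second = gb_per_second * 8       # 8 bits per byte
tb_per_hour = GB_PER_MINUTE * 60 / 1000
tb_per_day = GB_PER_MINUTE * 60 * 24 / 1000

print(f"{megabytes_per_second:.0f} MB/s")     # → 333 MB/s
print(f"{gigabits_per_second:.2f} Gbit/s")    # → 2.67 Gbit/s
print(f"{tb_per_hour:.1f} TB/hour")           # → 1.2 TB/hour
print(f"{tb_per_day:.1f} TB/day")             # → 28.8 TB/day
```

At well over a terabyte per hour, the data arrive faster than a person could reasonably shepherd them by hand, which is the practical case for automating the pipeline.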

The new services are intended not only for large, data-intensive research projects but also for small labs, where limited time and resources are all too often spent on tedious and repetitive tasks.

“Scientists will be able to define flows before an experiment begins and then ingest data rapidly, confident in the knowledge that data are being analyzed, processed, and shared with collaborators; that errors are detected and reported; and that analysis results are available to steer the experiment,” said Kyle Chard, senior researcher at Globus and co-investigator on the grant. “The expected result will be a significant increase in the quantity and quality of research output.”

The Globus team will also work with research computing centers at the University of Chicago, the University of Notre Dame, the University of Michigan, Purdue University, and Northern Illinois University to test the automation services and train potential users. Researchers will be able to create trigger actions in the Python programming language through a web-based interface as well as through the popular data science tool JupyterLab.

As these new functions are developed and rolled out, the Globus team will also work with Blase Ur, Neubauer Family Assistant Professor of Computer Science at UChicago, on user-centered design. Ur has previously studied the unexpected bugs that can arise in trigger-action programming despite its aim of being accessible.

“There’s very much a science to how you make things user-friendly,” Ur said. “There is a very low barrier to entry with these tools, but then once you have a set or suite of trigger-action programs, they can start interacting in really complicated ways. Globus Automate is a great testbed for the work we’re doing, and to be able to deploy this research on a large service and actually see it in practice in the field is going to be awesome.”
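Ur’s point about rules interacting can be illustrated with a toy case. Suppose one hypothetical rule copies any new file in folder A to folder B, and another copies any new file in folder B to folder A: each rule is sensible on its own, but together they trigger each other indefinitely. The Python sketch below is an illustration only, not a feature of Globus Automate. It models each rule as an edge from the event it listens for to the event its action produces, then uses depth-first search to flag such cycles before the rules are deployed.

```python
from typing import Optional


def find_cycle(rules: list[tuple[str, str]]) -> Optional[list[str]]:
    """Return events forming a trigger cycle, or None if there is none.

    Each rule is a hypothetical (trigger_event, produced_event) pair.
    """
    graph: dict[str, list[str]] = {}
    for trigger, produced in rules:
        graph.setdefault(trigger, []).append(produced)
        graph.setdefault(produced, [])

    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path: list[str] = []

    def dfs(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:        # back edge: rules re-trigger each other
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


rules = [
    ("file_added:A", "file_added:B"),   # copy new files from A to B
    ("file_added:B", "file_added:A"),   # copy new files from B to A
]
print(find_cycle(rules))                               # → ['file_added:A', 'file_added:B', 'file_added:A']
print(find_cycle([("file_added:A", "file_added:B")]))  # → None
```

A static check like this catches only the simplest interactions; conditions on file names, timing, or external state can create problems that a plain event graph does not capture.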

Read the story on the UChicago news site: https://www.cs.uchicago.edu/news/article/if-data-then-discover-uchicago-software-group-globus-seeks-to-automate-scie/