Enabling AI and ML
Globus provides a platform that supports artificial intelligence research, development and deployment across diverse and distributed science cyberinfrastructure.

Machine learning (ML) and artificial intelligence (AI) are now critical tools in almost every research discipline. These methods push the limits of computing and storage resources, requiring management of vast quantities of data, specialized and distributed training resources, and high-throughput inference services.
Globus provides a range of capabilities that reduce the barriers associated with AI research, development, and deployment, and in particular remove the friction of exploiting diverse and distributed cyberinfrastructure. For example, using Globus, researchers can manage large training datasets, moving them between storage and computing resources for training; deploy large-scale training campaigns across diverse computing resources, from institutional clusters to national supercomputers and commercial clouds; and construct robust workflows that automate sophisticated AI processes, from data acquisition to model deployment, such as AI-driven discovery processes. Further, Globus, as a cloud-hosted platform with open APIs, provides an ideal base for building AI-centric services that address domain- or AI-specific use cases.
Globus provides these capabilities via a cloud-hosted model in which the complexity of managing AI processes is outsourced to a set of reliable, available, performant, and secure cloud services. Globus leverages industry-standard authentication and authorization methods to secure interactions between users, applications, services, and distributed cyberinfrastructure, so that its capabilities can be applied broadly across research domains, cyberinfrastructure, and AI methods.
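As a simple illustration of the data-management piece, the sketch below uses the Globus Python SDK (globus_sdk) to move a training dataset between two Globus collections. It is a minimal example rather than a complete application; the client ID, collection UUIDs, and paths are placeholders, not real resources.

import globus_sdk

CLIENT_ID = "YOUR_NATIVE_APP_CLIENT_ID"         # placeholder Globus Auth client ID
SRC_COLLECTION = "source-collection-uuid"       # placeholder: where the dataset lives
DST_COLLECTION = "destination-collection-uuid"  # placeholder: training cluster storage

# Log in interactively and obtain a Transfer access token.
auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth.oauth2_start_flow()
print("Please log in at:", auth.oauth2_get_authorize_url())
tokens = auth.oauth2_exchange_code_for_tokens(input("Authorization code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

# Describe the transfer of a training-data directory and submit it as an
# asynchronous task managed by the Globus Transfer service.
task = globus_sdk.TransferData(
    tc, SRC_COLLECTION, DST_COLLECTION, label="Stage training data"
)
task.add_item("/datasets/train/", "/scratch/train/", recursive=True)
result = tc.submit_transfer(task)
print("Submitted transfer task:", result["task_id"])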
Why Globus?
- Globus overcomes the data, computation, and cyberinfrastructure challenges of the AI lifecycle.
- Globus enables researchers to easily scale from small, exploratory work to production use of the largest AI datasets and models.
- Globus lets researchers “bring their own” resources, so that Globus capabilities can reliably manage the AI process on any computing resources they can access.
- Programmatic APIs, CLIs, and SDKs enable straightforward adoption in arbitrary research domains and the creation of customized services and platforms for others.

Diamond
Model training
Diamond is a service that manages the training and fine-tuning of models on high performance computing clusters. It provides an accessible cloud-hosted service via which users can discover models (e.g., OpenFold, Llama), create a container for a specific target resource (e.g., TACC Frontera), deploy the training job on that resource, and monitor and manage the training process via training statistics.
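Diamond itself is accessed through its hosted interface, but the kind of remote training launch it manages can be pictured with the generic Globus Compute SDK. The sketch below is illustrative only; the endpoint UUID and the body of fine_tune are placeholders, not part of Diamond.

from globus_compute_sdk import Executor

def fine_tune(model_name, dataset_path, epochs):
    # Placeholder for the containerized training code that would actually
    # run on the target cluster.
    return {"model": model_name, "epochs": epochs, "status": "complete"}

HPC_ENDPOINT_ID = "target-cluster-endpoint-uuid"  # placeholder endpoint UUID

with Executor(endpoint_id=HPC_ENDPOINT_ID) as ex:
    future = ex.submit(fine_tune, "OpenFold", "/scratch/openfold-data", 3)
    print(future.result())  # training status reported back through the future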

Advanced Privacy-Preserving Federated Learning framework (APPFL/APPFLx)
Federated learning
APPFL is a powerful Python library, and APPFLx a hosted service, for managing federated machine learning training and inference across disparate computing resources. APPFLx provides a REST API and web interface to create “federations” of participating devices and then deploy a machine learning training process across those federations. It uses Globus Compute to launch local training on each device and to aggregate the local models into a global model. The resulting global model combines the contributions of the local models, without training data ever leaving each device, and can then be used by the federation for inference.
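The sketch below is not APPFL's API; it only illustrates, with the Globus Compute SDK and plain NumPy, the pattern described above: local training launched on each device's endpoint, followed by aggregation (here, simple federated averaging) into a global model. The endpoint UUIDs and the toy model are placeholders.

import numpy as np
from globus_compute_sdk import Executor

DEVICE_ENDPOINTS = ["device-a-endpoint-uuid", "device-b-endpoint-uuid"]  # placeholders

def local_train(global_weights):
    # Runs remotely on a device's Globus Compute endpoint: update the global
    # model using data that never leaves the device (stubbed here).
    import numpy as np
    return [w + 0.01 * np.random.randn(*w.shape) for w in global_weights]

def federated_average(weight_sets):
    # Aggregate local models into a global model by simple averaging (FedAvg).
    return [np.mean(layers, axis=0) for layers in zip(*weight_sets)]

global_weights = [np.zeros((4, 4)), np.zeros(4)]  # toy two-layer "model"
for round_num in range(3):
    local_models = []
    for endpoint_id in DEVICE_ENDPOINTS:
        with Executor(endpoint_id=endpoint_id) as ex:
            local_models.append(ex.submit(local_train, global_weights).result())
    global_weights = federated_average(local_models)
    print(f"Round {round_num}: aggregated {len(local_models)} local models")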

Garden
Model publication and inference
Garden is a service that supports publication of machine learning models, alongside data, grouped by scientific domain and use case. It simplifies the process of wrapping a model with all of its dependencies so that it can easily be used for inference. Inference can be launched on any Globus Compute endpoint or in the cloud, enabling researchers to easily discover a published model and then use it on their local resources.
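As a sketch of the inference pattern only (not Garden's actual SDK), the snippet below submits a stand-in predict function to a researcher-chosen Globus Compute endpoint; the endpoint UUID and the predict body are placeholders for a model discovered and packaged through Garden.

from globus_compute_sdk import Executor

def predict(samples):
    # Stand-in for a published model's inference function; the real function
    # would load the model and its pinned dependencies on the endpoint.
    return [sum(features) for features in samples]

MY_ENDPOINT_ID = "my-local-cluster-endpoint-uuid"  # placeholder

with Executor(endpoint_id=MY_ENDPOINT_ID) as ex:
    predictions = ex.submit(predict, [[0.1, 0.2], [0.3, 0.4]]).result()
    print(predictions)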

AuroraGPT
On-demand inference
AuroraGPT is an effort at Argonne National Laboratory to build large, science-focused generative models. The project uses Globus services to provide on-demand inference with these models on supercomputers at Argonne.

Foundry
Data, Models, Science
Foundry hosts structured, ML-ready data of any size that can be accessed programmatically via a Python SDK. It provides a schema via which publishers can describe datasets, making it easy for users to discover and then use those datasets. Datasets can be loaded directly in Python and used as input to model training or inference.
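A minimal sketch of programmatic access is shown below. It follows the shape of older foundry_ml examples (a Foundry client with load and load_data methods); treat the exact method names as assumptions that may differ from the current SDK, and the DOI as a placeholder.

from foundry import Foundry  # the foundry_ml package

f = Foundry(index="mdf")

# Load a dataset's metadata by DOI (placeholder), then pull the data itself;
# method names follow older foundry_ml examples and may have changed.
f.load("10.xxxxx/placeholder-doi", globus=False)
data = f.load_data()

# The returned structure exposes ML-ready splits that can feed directly into
# model training or inference.
X_train, y_train = data["train"]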