Carnegie Mellon University

[Photo: a snowy mountain range, with a line of people traversing the range visible in the distance]

Expeditions: Carbon Connect: An Ecosystem for Sustainable Computing

By Emma Strubell

AI technology has come of age over the last decade thanks to advances in hardware and in machine learning (ML) methodology using deep neural networks. So far, these impressive advances in AI capabilities, such as the now-ubiquitous conversational fluency of OpenAI's ChatGPT and AI pair programming with GitHub Copilot, have been enabled by massively scaling deep neural networks: in model size (number of parameters), in computation (floating-point operations), and in the number of training examples required to fit the models. Advances in end-task performance and the emergence of desirable model capabilities grow in proportion to computation and dataset size (Ghorbani et al. 2022, Wei et al. 2022), with corresponding operational emissions from the energy required to power that computation (Strubell et al. 2019, Dodge et al. 2022). Sustainable development and use of AI will require ML methodologies that not only demand less computation and training data while maintaining or improving the required capabilities, but that also adapt to resource availability and system conditions.

We identify two key challenges in machine learning that bottleneck how readily models can adapt to changes in their environment: continual learning and characterizing example difficulty. Continual learning will extend the useful lifetime of ML models and training data by allowing models to be incrementally updated over time as data and phenomena change, rather than regularly re-trained from scratch as new data arrives, substantially reducing the amortized computational and carbon cost of training and data ingestion. Characterizing example difficulty will enable more effective provisioning of computational resources along multiple axes: reducing operational emissions by improving efficiency in the training, inference, and data curation stages, and reducing embodied emissions by extending the useful lifetime of aging hardware.
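To make the continual-learning framing concrete, here is a minimal toy sketch (all models, data, and step counts are illustrative assumptions, not from the article). A tiny linear model is trained by SGD; when new data arrives, resuming from the existing weights needs only a short refresh pass, whereas re-training from scratch repeats the full optimization:

```python
def sgd_fit(data, w=0.0, b=0.0, lr=0.1, epochs=200):
    """Fit y = w*x + b by SGD; passing in (w, b) resumes from existing weights."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x   # gradient of squared error w.r.t. w
            b -= lr * err       # gradient of squared error w.r.t. b
    return w, b

# Data available today, drawn from y = 2x + 1.
old_data = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]
w, b = sgd_fit(old_data)  # initial training: 200 epochs * 10 examples = 2,000 steps

# New data arrives. Continual learning resumes from the current weights,
# so a short refresh suffices (10 epochs * 20 examples = 200 new steps)...
new_data = [(x / 10, 2 * (x / 10) + 1) for x in range(10, 20)]
w_cont, b_cont = sgd_fit(old_data + new_data, w=w, b=b, epochs=10)

# ...whereas re-training from scratch repeats the full optimization
# (200 epochs * 20 examples = 4,000 steps) to reach a comparable fit.
w_scratch, b_scratch = sgd_fit(old_data + new_data, epochs=200)
```

The amortized saving in the article's sense comes from the middle call: the cost already sunk into the old model is reused rather than discarded. Real continual learning must also contend with distribution shift and catastrophic forgetting, which this toy example deliberately omits.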
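Example difficulty can be operationalized in many ways; one simple proxy is the model's per-example loss. The following sketch (the model, data, and split point are hypothetical, chosen only for illustration) ranks a pool of examples by squared error and keeps the hardest ones, the kind of signal that could drive data curation or route easy inputs down a cheaper inference path:

```python
def example_difficulty(model, data):
    """Score each example by the model's loss on it: higher loss = harder."""
    return [(y - model(x)) ** 2 for x, y in data]

# Hypothetical trained model and a pool mixing on-distribution and shifted examples.
model = lambda x: 2 * x + 1
easy = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]        # model fits these
hard = [(x / 10, 2 * (x / 10) + 1 + 5.0) for x in range(3)]   # shifted labels

scores = example_difficulty(model, easy + hard)

# Provision compute by difficulty: e.g., keep only the hardest half of the
# pool for further training, or flag low-score inputs for a cheaper path.
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
hardest = ranked[: len(ranked) // 2]
```

Loss is only one axis of difficulty; scores computed this way also change as the model trains, so in practice such rankings would be refreshed periodically rather than computed once.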