The Carbon Footprint of ChatGPT
Source: towardsdatascience.com
Opinion
This article attempts to estimate the carbon footprint of ChatGPT, the popular OpenAI chatbot.
ChatGPT is getting a lot of attention these days. Some people discuss the monetary cost of running the model, but few discuss its environmental cost.
Increasing levels of greenhouse gases in the atmosphere due to human activities are a major driver of climate change [8]. The information and communications technology (ICT) sector and the data center industry are responsible for a relatively large share of global greenhouse gas emissions [9].
We, the users and developers of digital tools that run in data centers, therefore need to do our part to reduce the carbon footprint of digital activities and thereby help mitigate climate change.
To this end, it is first and foremost important that we become aware that even digital products require energy to develop and use, and therefore have a carbon footprint. This article contributes towards this objective. It is also important to have access to factual information when we discuss how to reduce our footprint, so that we can prioritize the efforts that yield the biggest carbon savings.
Finally, I hope this article will inspire developers of machine learning models to disclose the energy consumption and/or carbon footprint of their models, so that such information can be used along with model accuracy metrics to assess the performance of models.
Environmental costs of large scale machine learning
Environmental costs can come in various guises, such as water usage, soil contamination, and air pollution. In this post, we’ll take a look at the environmental impact of ChatGPT through the lens of carbon footprint.
When determining the carbon footprint of a machine learning model, one can distinguish between (a) the carbon footprint from training the model, (b) the carbon footprint from running inference with the model once it has been deployed, and (c) the total life cycle carbon footprint of the model. For a deep dive into how the carbon footprint of machine learning models can be estimated and reduced, see [9].
Regardless of the scope, one needs to know two things to calculate the carbon footprint of any model:
1. The amount of electricity it consumes
2. The carbon intensity of this electricity
The first depends largely on the hardware the model runs on and on the utilization rate of that hardware.
The second depends largely on how the electricity is produced: solar and wind energy, for instance, are much greener than coal. To quantify this, one often uses the average carbon intensity of electricity in the grid where the hardware is located.
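The two quantities combine by simple multiplication. Here is a minimal Python sketch of that calculation, assuming energy is measured in kWh and carbon intensity in kgCO2e per kWh. The function names are my own, the linear scaling of power with utilization is a simplification, and the numbers in the usage example are placeholders rather than measurements.

```python
def energy_kwh(power_watts: float, hours: float, utilization: float = 1.0) -> float:
    """Electricity used by hardware over a period of time.

    Simplifying assumption: power draw scales linearly with utilization.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return power_watts / 1000.0 * hours * utilization


def carbon_footprint_kg(energy_used_kwh: float, intensity_kg_per_kwh: float) -> float:
    """Emissions in kgCO2e = electricity consumed * carbon intensity of the grid."""
    if energy_used_kwh < 0 or intensity_kg_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    return energy_used_kwh * intensity_kg_per_kwh


# Placeholder example: a 300 W device running for 10 hours at 50 % utilization
# on a grid emitting 0.5 kgCO2e per kWh.
used = energy_kwh(power_watts=300, hours=10, utilization=0.5)
print(f"{used:.2f} kWh -> {carbon_footprint_kg(used, 0.5):.2f} kgCO2e")
```

Keeping the two inputs as separate parameters makes it easy to see which one drives a given estimate: the hardware and its load, or the grid it is plugged into.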
Carbon footprint from training ChatGPT
If I’ve understood correctly, ChatGPT is based on a version of GPT-3. It has been estimated that training GPT-3 consumed 1,287 MWh of electricity and emitted 552 tons of CO2e [1].
These emissions should not be attributed to ChatGPT alone, however, and it is not clear to me how one would go about assigning a share of them to ChatGPT. In addition, ChatGPT has been trained using reinforcement learning [2], and the emissions from that training should be added, but relevant information about this training procedure is not available and I am not aware of any reasonable proxies. Please point me in the right direction if you are.
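As a quick sanity check, the two GPT-3 training figures from [1] imply an average carbon intensity for the electricity used. The sketch below only divides one reported number by the other. It assumes that "tons" means metric tons, and the variable names are mine.

```python
# Figures reported for GPT-3 training in [1].
TRAINING_ENERGY_MWH = 1_287
TRAINING_EMISSIONS_TONS = 552  # assumed to be metric tons

training_energy_kwh = TRAINING_ENERGY_MWH * 1_000
training_emissions_kg = TRAINING_EMISSIONS_TONS * 1_000

# Emissions divided by electricity gives the implied average carbon intensity.
implied_intensity = training_emissions_kg / training_energy_kwh
print(f"Implied carbon intensity: {implied_intensity:.3f} kgCO2e/kWh")
```

This works out to roughly 0.43 kgCO2e per kWh, which is a derived figure and not one stated in [1].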
Carbon footprint from running ChatGPT
Now, let’s take a look at how much CO2e might be emitted from running inference with ChatGPT. I haven’t come across any information about the electricity consumption or the carbon intensity of the electricity used to run ChatGPT, so let’s instead make some guesstimates.
The large language model BLOOM was once deployed on a Google Cloud Platform instance with 16 Nvidia A100 40GB GPUs for 18 days [3].
Let’s assume the same hardware is used for ChatGPT. Since the models are roughly the same size (175 billion and 176 billion parameters for GPT-3 and BLOOM respectively), let’s assume ChatGPT is also running on 16 Nvidia A100 40GB GPUs, but on an Azure instance rather than a GCP instance, since OpenAI has a partnership with Microsoft [4].
Since OpenAI is headquartered in San Francisco, let’s further assume that ChatGPT is running in an Azure region on the US west coast. Since electricity in West US has a lower carbon intensity than West US 2, let’s use the former.
Using the ML CO2 Impact calculator, we can estimate ChatGPT’s daily carbon footprint at 23.04 kgCO2e. The average Dane is responsible for emitting 11 tons CO2e per year [7], so the daily carbon footprint of ChatGPT is roughly 0.2 percent of the annual carbon footprint of a Dane. If ChatGPT ran for a year, its carbon footprint would be 365 * 23.04 kg = 8.4 tons, or roughly 76 percent of the annual carbon footprint of a Dane.
The estimate of 23.04 kgCO2e daily was obtained by assuming 16 GPUs * 24 hours = 384 GPU hours per day. It’s not immediately clear, but I think ML CO2 Impact assumes 100 % hardware utilization all of the time, which might be a fair assumption in this case given the heavy load the service is reportedly experiencing.
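The arithmetic behind these numbers can be reproduced in a few lines of Python. This is a sketch using only the figures quoted above, with one addition that is purely my assumption: a per-GPU power draw of 0.25 kW, which is not stated in this article and is used only to back out an implied carbon intensity. Tons are assumed to be metric.

```python
# Figures taken from the text above.
NUM_GPUS = 16
HOURS_PER_DAY = 24
DAILY_EMISSIONS_KG = 23.04         # ML CO2 Impact result quoted above
DANE_ANNUAL_EMISSIONS_KG = 11_000  # 11 tons CO2e per year [7], assumed metric

gpu_hours_per_day = NUM_GPUS * HOURS_PER_DAY
kg_per_gpu_hour = DAILY_EMISSIONS_KG / gpu_hours_per_day

annual_emissions_kg = 365 * DAILY_EMISSIONS_KG
daily_share_of_dane = DAILY_EMISSIONS_KG / DANE_ANNUAL_EMISSIONS_KG
annual_share_of_dane = annual_emissions_kg / DANE_ANNUAL_EMISSIONS_KG

# ASSUMPTION (not from the article): each GPU draws 0.25 kW at full load.
# Under that assumption, the daily figure implies this grid carbon intensity.
ASSUMED_GPU_POWER_KW = 0.25
implied_intensity_kg_per_kwh = kg_per_gpu_hour / ASSUMED_GPU_POWER_KW

print(f"GPU hours per day:       {gpu_hours_per_day}")
print(f"kgCO2e per GPU hour:     {kg_per_gpu_hour:.3f}")
print(f"Annual emissions:        {annual_emissions_kg / 1000:.1f} tons")
print(f"Daily share of a Dane:   {daily_share_of_dane:.1%}")
print(f"Annual share of a Dane:  {annual_share_of_dane:.0%}")
print(f"Implied carbon intensity: {implied_intensity_kg_per_kwh:.2f} kgCO2e/kWh")
```

Changing ASSUMED_GPU_POWER_KW shows how sensitive the implied carbon intensity is to the hardware assumption, which is one reason the estimate should be read as a guesstimate.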
How much faith should we put in this guesstimate?
To get an idea, let’s look at how it compares to the carbon footprint of BLOOM.
Over the course of 18 days, daily emissions of 23.04 kgCO2e would put ChatGPT’s emissions at 414 kgCO2e. In comparison, BLOOM emitted 360 kgCO2e over an 18-day period. The fact that the two estimates are not too far apart indicates that 23.04 kgCO2e might not be a poor guesstimate.
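To put a number on "not too far off", here is a small Python sketch comparing the two 18-day totals. It uses only the figures quoted above, and the variable names are mine.

```python
DAYS = 18
CHATGPT_DAILY_KG = 23.04  # guesstimate from this article
BLOOM_TOTAL_KG = 360      # BLOOM emissions over the same number of days, as quoted above

chatgpt_total_kg = CHATGPT_DAILY_KG * DAYS
relative_difference = chatgpt_total_kg / BLOOM_TOTAL_KG - 1

print(f"ChatGPT guesstimate over {DAYS} days: {chatgpt_total_kg:.1f} kgCO2e")
print(f"Difference relative to BLOOM: {relative_difference:.0%}")
```

The guesstimate lands about 15 percent above the BLOOM figure.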
The difference between the two emission estimates can come down to many things, e.g. a difference in the carbon intensity of BLOOM’s and ChatGPT’s electricity.
It is also worth noting that BLOOM handled 230,768 requests over the 18-day period [3], corresponding to an average of 12,820 requests per day. If roughly 1.3 % of ChatGPT’s 1 million users [6] send one request daily, ChatGPT would incur the same number of daily requests as BLOOM did in that period. If all the talk of ChatGPT on social media and in conventional media outlets is any indication of its usage, ChatGPT probably handles far more daily requests, so it might be fair to expect it to have a larger carbon footprint.
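The request figures can be checked the same way. The Python sketch below derives BLOOM's average daily requests and the share of ChatGPT's users needed to match it. It also computes BLOOM's emissions per request from the 360 kg figure quoted earlier, which is a number I derive here and not one reported in [3].

```python
BLOOM_REQUESTS = 230_768    # requests handled over the deployment period [3]
DAYS = 18
BLOOM_EMISSIONS_KG = 360    # BLOOM emissions over the same period, as quoted earlier
CHATGPT_USERS = 1_000_000   # user count from [6]

requests_per_day = BLOOM_REQUESTS / DAYS
share_of_users = requests_per_day / CHATGPT_USERS

# Derived figure: average emissions per BLOOM request, in grams of CO2e.
grams_per_request = BLOOM_EMISSIONS_KG * 1_000 / BLOOM_REQUESTS

print(f"BLOOM requests per day: {requests_per_day:,.0f}")
print(f"Share of ChatGPT users needed to match: {share_of_users:.1%}")
print(f"BLOOM emissions per request: {grams_per_request:.2f} gCO2e")
```

The share works out to about 1.3 %. Note that the per-request figure is an average over a deployment with modest traffic, so it would not necessarily carry over to a service with a much heavier load.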
On the other hand, my estimate of ChatGPT’s carbon footprint could be too high if OpenAI’s engineers have found some smart ways to handle all the requests more efficiently.
Total life cycle carbon footprint of ChatGPT
To calculate the total life cycle carbon footprint of ChatGPT, we would need to factor in the emissions from the training procedure. Some information about this can be obtained, but it will be very difficult to determine what share of GPT-3 training emissions should be attributed to ChatGPT.
We would also need to factor in emissions from the pre-processing of the training data. This information is not available.
Furthermore, we would need to obtain an estimate of the embodied emissions from producing the hardware. This is a rather complex undertaking that is left as an exercise for the reader. Useful information can be found in [3] which estimates the total life cycle carbon footprint of BLOOM and in [5] which estimates the total life cycle carbon footprint of some of Facebook’s large models.
Conclusion
This article estimates the daily carbon footprint from running ChatGPT to be 23.04 kgCO2e. This estimate is based upon some crude assumptions and it therefore comes with a lot of uncertainty, but it seems reasonable compared to thorough estimates of the carbon footprint of a comparable language model called BLOOM.
By providing a fact-based estimate of the carbon footprint of ChatGPT, this article enables an informed debate about the costs and benefits of ChatGPT.
Finally, this article focuses solely on the CO2e emissions from ChatGPT. Other types of environmental impact, including water usage, air pollution, and soil contamination, are also important to consider.
That’s it! I hope you enjoyed this post 🤞
Also, make sure to check out the Danish Data Science Community’s Sustainable Data Science guide for more resources on sustainable data science and the environmental impact of machine learning.
References
[1] https://arxiv.org/ftp/arxiv/papers/2204/2204.05149.pdf
[2] https://openai.com/blog/chatgpt/
[3] https://arxiv.org/pdf/2211.02001.pdf
[5] https://arxiv.org/pdf/2111.00364.pdf
[6] https://twitter.com/sama/status/1599668808285028353
[8] https://storymaps.arcgis.com/stories/5417cd9148c248c0985a5b6d028b0277
The Carbon Footprint of ChatGPT was originally published in Towards Data Science on Medium.
...