Cost of Computing in Coal

Much of my academic research involves statistics and crunching through big datasets. To do this, I use computer clusters like Amazon’s EC2 and a cluster at the Harvard MIT Data Center. I frequently kick off a job to run overnight on the full HMDC cluster of ~100 computers, and some of my friends do so nearly every night on similar clusters. Like many researchers and engineers, I pay nothing to kick off a big job. That said, computers consume a lot of energy, so I did a little back-of-the-envelope calculation to figure out what such a job might cost in terms of resources.

An overnight job that uses a 100-computer cluster might use 800 computer-hours (100 machines running for about 8 hours). Although power efficiency varies hugely between computers, most statistical analysis is CPU-intensive and should come close to maximizing power consumption. According to a few sources [e.g., 1 2 3], 200 watts might be a conservative estimate of how much a modern multi-CPU server will draw under high load, and that figure doesn’t include other costs like cooling. Using this estimate, the overnight job on 100 machines would easily use 160 kilowatt-hours (kWh) of energy: 800 computer-hours × 200 W = 160,000 Wh.
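The arithmetic is simple enough to parametrize, which makes it easy to plug in your own cluster size or wattage. This is a minimal sketch; the only assumption beyond the numbers in the text is that "overnight" means 8 hours, which is what turns 100 machines into 800 computer-hours.

```python
# Back-of-the-envelope energy estimate for one cluster job.
# Assumption: an "overnight" run is 8 hours (100 machines x 8 h = 800 computer-hours).
# The per-machine wattage is the server draw under load; cooling is NOT included.


def job_energy_kwh(machines: int, hours: float, watts_per_machine: float) -> float:
    """Return the energy a cluster job draws, in kilowatt-hours."""
    computer_hours = machines * hours
    watt_hours = computer_hours * watts_per_machine
    return watt_hours / 1000  # 1 kWh = 1,000 Wh


if __name__ == "__main__":
    # 100 machines, 8 hours, 200 W each
    print(job_energy_kwh(machines=100, hours=8, watts_per_machine=200))  # 160.0
```

Because 200 W is described as conservative, treat the result as a lower bound; raising `watts_per_machine` or adding a cooling overhead scales the total linearly.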

In Massachusetts, most of our power comes from coal. This page suggests that an efficient coal plant will generate 2,460 kWh for each ton of coal. That means that one overnight job would burn 59 kg (130 lb) of coal. In the process, it would also create 153 kg (338 lb) of CO2 and a bit under half a kilogram (about 1 lb) each of nitrogen oxides and sulfur dioxide. It’s a very rough estimate, but it certainly generates some pressure to make sure the research counts!
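The conversion from 160 kWh to coal and CO2 can be sketched the same way. Two assumptions are labeled here: the "ton" in the 2,460 kWh figure is taken to be a US short ton (2,000 lb), since that is what reproduces the 59 kg / 130 lb result, and the CO2 factor of 2.6 kg per kg of coal is simply back-derived from the ratio of the numbers above (153 kg / 59 kg), not an independent emission factor.

```python
# Convert a job's electricity use into coal burned and CO2 emitted.
# Assumptions (not from an authoritative source):
#   - "ton" in 2,460 kWh/ton means a US short ton (2,000 lb = 907.18474 kg);
#   - CO2_PER_KG_COAL is hypothetical, back-derived from 153 kg CO2 / 59 kg coal.

KWH_PER_TON_COAL = 2460
LB_PER_SHORT_TON = 2000
KG_PER_SHORT_TON = 907.18474
CO2_PER_KG_COAL = 2.6  # hypothetical ratio implied by the estimate in the text


def coal_short_tons(kwh: float) -> float:
    """Short tons of coal needed to generate `kwh` at an efficient plant."""
    return kwh / KWH_PER_TON_COAL


def coal_kg(kwh: float) -> float:
    return coal_short_tons(kwh) * KG_PER_SHORT_TON


def coal_lb(kwh: float) -> float:
    return coal_short_tons(kwh) * LB_PER_SHORT_TON


def co2_kg(kwh: float) -> float:
    return coal_kg(kwh) * CO2_PER_KG_COAL


if __name__ == "__main__":
    kwh = 160  # one overnight job on ~100 machines
    print(f"coal: {coal_kg(kwh):.0f} kg ({coal_lb(kwh):.0f} lb)")  # coal: 59 kg (130 lb)
    print(f"CO2:  {co2_kg(kwh):.0f} kg")  # CO2:  153 kg
```

Multiplying by the number of jobs per year (or by the number of machines running a piece of software) turns this into a rough annual figure.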

Of course, I’ve also written some free software that runs on many thousands of computers and servers. How many tons of coal are burnt to support laziness or a lack of optimization in my software? What is the coal cost of writing a program in a less efficient, but easier to write, higher-level programming language like Python or Ruby instead of writing a more efficient version in C?