
Better late than never? The rationale and controversy behind Equinix and Google raising data center temperatures



Technical analysis: the battle over data center temperature standards

Over-cooling is past its sell-by date

Peter Judge, DCD Editor

Fourteen years after definitive proof that warmer is better, colocation companies are still struggling to turn their cooling systems down.

Goodbye to over-cooling?

The biggest effort in the green data center movement has been around reducing the energy wasted in cooling data centers. The old consensus was that data centers should be kept at a chilly temperature below 20°C, to be absolutely sure that the electronics were not stressed. Hardware vendors disagreed, and industry bodies assured data center operators that warmer temperatures were safe. Now the conservative colocation sector is acting on those recommendations. But wait. Research engineers are sounding a cautious note. It turns out that the electronics use more power when the temperature goes up, to perform the same work. Some are suggesting that the move to raise temperatures is a big mistake, based on over-reliance on simplistic efficiency measures like PUE (power usage effectiveness). Could it be that we need a bigger dataset and more research to sort this out?

Definitive industry guidance says it is perfectly safe to run data centers at temperatures up to 27°C (80°F). But large parts of the industry persist in over-cooling their servers, wasting vast amounts of energy and causing unnecessary emissions. There are signs that this may be changing, but progress has been incredibly slow, and future developments don't look like speeding things up very much.


Don't be so cool

When data centers first emerged, operators kept them cool to avoid any chance of overheating. Temperatures were pegged at 22°C (71.6°F), which meant that chillers were working overtime to maintain an unnecessarily cool atmosphere in the server rooms. In the early 2000s, more energy was spent on the cooling systems than on the IT racks themselves, a trend which seemed obviously wrong. The industry began an effort to reduce that imbalance, and created a metric, PUE (Power Usage Effectiveness), to measure progress.

PUE is the total power used in the data center divided by the power used in the racks, so an "ideal" PUE of 1.0 would mean all power is going to the racks. Finding ways to switch off the air conditioning, and letting temperatures rise, was a major strategy in approaching this goal. In 2004, ASHRAE (the American Society of Heating, Refrigerating and Air-Conditioning Engineers) recommended an operating temperature range from 20°C to 25°C. In 2008, the society went further, suggesting that temperatures could be raised to 27°C. Following that, the society issued Revision A1, which raised the limit to 32°C (89.6°F) depending on conditions.
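As a quick illustration of the metric itself, here is a minimal sketch of the PUE calculation described above. The power figures are made-up numbers for illustration, not measurements from any real facility:

```python
def pue(total_facility_power_kw: float, it_power_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT (rack) power."""
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_power_kw / it_power_kw

# Hypothetical figures: 1,000 kW of IT load plus 500 kW of cooling and other overhead.
print(pue(total_facility_power_kw=1500, it_power_kw=1000))  # 1.5
# An "ideal" facility with zero overhead would report exactly 1.0.
```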


This was not an idle whim.

ASHRAE engineers said that higher temperatures would have little effect on the lifetime of components, but would offer significant energy savings. Figures from the US General Services Administration suggested that data centers could save four percent of their total energy for every degree they allowed the temperature to climb. Hyperscale companies are often best placed to pick up advanced technology ideas. They own the building, the cooling systems, and the IT, so if they allow temperatures to climb, it is their own equipment that feels the heat. It is no surprise, then, that the cloud giants were the first to get on board with raising data center temperatures. Facebook quickly found it could go beyond the ASHRAE guidelines. At its Prineville and Forest City data centers, it raised server temperatures to 29.4°C (85°F) and found no ill effects. "This will further reduce our environmental impact and allow us to have 45 percent less air-handling hardware than we have in Prineville," said Yael Maguire, then Facebook's director of engineering.
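To put the GSA figure in rough perspective, here is a minimal sketch of the claimed savings. The source does not say whether the four percent per degree is linear or compounding, so treating it as compounding here is an assumption:

```python
def estimated_saving(delta_t: float, saving_per_degree: float = 0.04) -> float:
    """Cumulative fraction of total energy saved when each extra degree Celsius
    saves `saving_per_degree`, assuming the savings compound (an assumption;
    a linear reading would simply be delta_t * saving_per_degree)."""
    return 1 - (1 - saving_per_degree) ** delta_t

# Hypothetical example: raising the setpoint from 22°C to 27°C (5 degrees).
print(f"{estimated_saving(5):.1%}")   # roughly 18.5% (compounding)
print(f"{5 * 0.04:.0%}")              # linear reading: 20%
```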

Google went up to 26.6°C, and Joe Kava, then vice president of data centers, said the move was working: "Google runs data centers warmer than most because it helps efficiency." Intel went furthest. For ten months in 2008, the chip giant took 900 servers and ran half of them in a traditionally cooled data center, while the other 450 were given no external cooling, with server temperatures rising as high as 33.3°C (92°F) at times. At the end of the ten months, Intel compared those servers with the 450 which had been run in a traditional air-conditioned environment. The 450 hot servers had saved some 67 percent of the power budget. In this higher-temperature test, Intel did find a measurable increase in failures: among the hot servers, two percent more failed. But that failure rate may have had nothing to do with the temperature. The 450 servers under test also had no air filtration or humidity control, so the small increase in failures may have been due to dust and condensation.


Some like it hot

Academics backed up the idea, with support coming from a 2012 University of Toronto paper titled "Temperature Management in Data Centers: Why Some (Might) Like It Hot." "Our results indicate that, all things considered, the effect of temperature on hardware reliability is weaker than commonly thought," the Canadian academics concluded. "Increasing data center temperatures creates the potential for large energy savings and reductions in carbon emissions." At the same time, server makers responded to ASHRAE's guidelines and confirmed that these new, higher temperatures were acceptable without breaking equipment warranties. Given that weight of support, you might have expected data center temperatures to rise dramatically across the industry, and you can still find commentary from 2011 predicting a rapid increase in cold aisle temperatures.

However, look around for recommended data center temperatures today, and figures of 22°C and 25°C are still widely quoted. This reluctance to change is widely put down to the industry's reputation for conservatism, although there are also some influential voices raised against the consensus that higher temperatures are automatically better.


Equinix makes a cautious move

All of which makes a recent announcement from Equinix very interesting. On some measures, Equinix is the world's largest colocation player, housing a huge chunk of the servers that are neither in on-premises data centers nor in the cloud. In December, Equinix announced that it would "adjust the thermostat of its colocation data centers, letting them run warmer" (moving gradually towards 80°F, or 27°C), "to reduce the amount of energy spent cooling them down unnecessarily." "With this new initiative, we can intelligently adjust the thermostat in our data centers in the same way that consumers do in their homes," said Raouf Abdel, EVP of global operations for Equinix.

Equinix's announcement features congratulatory quotes from analysts and vendors. Rob Brothers, program vice president for data center services at analyst firm IDC, explains that "most data centers … are unnecessarily cooler than required." Brothers goes on to say that the announcement will see Equinix "play a key role in driving change in the industry and help shape the overall sustainability story we all need to participate in," and that it will "change the way we think about operating temperatures within data center environments."

Which really does oversell the announcement somewhat. All Equinix has promised to do is to make an attempt to push temperatures up towards 27°C, the target which ASHRAE set 14 years ago, and which ASHRAE already says can be exceeded. No Equinix data centers will get warmer straight away, either. The announcement has no immediate impact on any existing customers in any Equinix data center. Instead, customers will be notified at some unspecified time in the future, when Equinix plans to adjust the thermostat at the site where their equipment is hosted.

(Image: Scope 1, 2 and 3 carbon emissions)

Customers like it cool

Reading between the lines, it is obvious that Equinix is facing pushback from its customers, who are ignoring the vast weight of evidence that higher temperatures are safe, and are unwilling to budge from the traditional 22°C that has been the norm. Equinix pitches the idea of increased temperatures as a way for its customers to meet the goal of reducing Scope 3 emissions, the CO2 equivalent emitted by activity in their supply chain. For colocation customers, the energy used in their colo provider's facility is part of their Scope 3 emissions, and there are moves to encourage all companies to cut their Scope 3 emissions in order to reach net-zero goals.

Revealingly, Equinix does not provide any supporting quotes at all from customers eager to have their servers hosted at a higher temperature. For Equinix itself, the emissions from electricity used in its cooling systems are part of its Scope 2 emissions, which it has promised to reduce, and raising the temperature would be a major step towards that goal. "Our cooling systems account for approximately 25 percent of our total energy usage globally," said Abdel. "Once rolled out across our current global data center footprint, we anticipate energy efficiency improvements of as much as 10 percent in various locations." Equinix is in a difficult position: it cannot increase the temperature without risking the displeasure of its customers, who might refuse to allow the increase, or go elsewhere.

It is a move that needs to be made, and Equinix deserves support for setting the goal. But the cautious nature of the announcement makes it clear that this could be an uphill battle. Equinix clearly believes, however, that future net-zero regulations will push customers in the direction it wants to be allowed to go. "Equinix is committed to understanding how these changes will affect our customers and we will work together to find a mutually beneficial path toward a more sustainable future," says the statement from the company. "As global sustainability requirements for data center operations become more stringent, our customers and partners will depend on Equinix to continue leading efforts that help them achieve their sustainability goals."


Is over-cooling really so bad?

Surprisingly, after the industry seems to have reached a consensus that data centers should be run as warm as possible, there are dissenting voices, some of them very authoritative. There are two main objections to running data centers warmer. One is that data center staff working in a contained hot aisle will be subjected to a harsher working environment. The other is that the chips in the servers will also be subjected to more extreme conditions.

John Haile, a retired 24-year veteran of Telehouse, commented on a LinkedIn discussion about Equinix's announcement: "The people that work in the data center generally have to work in the hot aisle once the row goes live. The temperatures in there are well over 40°C - it dries your eyes out." While many professionals are prepared to work at higher temperatures, and some even relish the opportunity to work in shorts, others question whether the effort is even beneficial in the first place.

Running with hotter air temperatures may create a completely spurious benefit, based on over-reliance on one efficiency metric, argues Professor Jon Summers, research lead in data centers at Research Institutes of Sweden (RISE). Data centers measure efficiency by aiming for a low PUE (power usage effectiveness), which can be achieved by shifting power consumption from the building's air conditioning into the racks. That makes sense if the energy in the racks is all used for computation, but some of it goes to cooling fans, Summers points out. "Increasing temperatures will improve the ISO PUE of a DC, which a vast majority appear to cite as a measure of efficiency," says Summers. His research suggests that the reduction in energy used by the air conditioning is offset by increased energy used in the servers. "At RISE Research Institutes of Sweden, in the ICE data center we have researched the effect of supply temperature on DC IT equipment using wind tunnels, full air-cooled data centers, direct-to-chip, and immersion systems connected to well-controlled liquid cooling testbeds," says Summers.


"The upshot is that, irrespective of the cooling method, the microprocessors draw more power when operated hotter for the same digital workload, due to current leakage." The effect varies between processors, says Summers: Xeon E5-2769-v3 CPUs running at 50 percent workload drew 8W more when the temperature was increased from 40°C to 75°C in a wind tunnel, with the server fans set to target a fixed CPU temperature.

Essentially, when the air inlet temperature goes up, the cooling work shifts from the air conditioning systems to the fans in the servers, which have to work harder. This automatically reduces the PUE, because the fans sit inside the racks, and PUE rewards maximizing the share of energy used within the racks compared to the energy used in the external cooling systems.
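A small worked example may make Summers' point concrete. The numbers below are illustrative assumptions, not RISE measurements; they simply show how shifting cooling work from the facility's air conditioning into server fans can lower PUE while the total energy draw rises:

```python
def pue(total_kw: float, it_kw: float) -> float:
    """PUE = total facility power / IT power (server fans count as IT power)."""
    return total_kw / it_kw

# Hypothetical "cool" operating point: the facility cooling plant does most of the work.
it_cool, cooling_cool = 1000, 300          # kW
print(round(pue(it_cool + cooling_cool, it_cool), 2))   # 1.3

# Hypothetical "warm" operating point: chillers back off, but server fans spin up
# and CPUs leak more current, so the IT load itself grows.
it_warm, cooling_warm = 1080, 260          # kW
print(round(pue(it_warm + cooling_warm, it_warm), 2))   # 1.24, a "better" PUE

# Yet the total draw has gone up, not down.
print((it_warm + cooling_warm) - (it_cool + cooling_cool))  # 40 kW more
```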

Running at hotter temperatures can create completely illusory benefits, says Summers: "With increased supply temperatures we do see an increased overall energy consumption even though the PUE drops." In immersion tanks, where systems have no server fans, Summers ran 108 of the same CPUs. In that setup, his team found a six percent drop in power requirements at the same 50 percent workload when the tank coolant was dropped from 50°C to 30°C. "Other than less energy consumed by the DC cooling equipment resulting in a lower ISO PUE, what are the reasons for pushing up air-supply temperatures?" Summers asks.

Summers' colleague Tor Björn Minde, head of RISE's ICE data center, agrees: "Why in the world would you like to do this?" Allowing warmer temperatures might make sense if the outside air temperature is above 30°C, says Minde, but otherwise "you should run it as cold as possible. The power draw of the IT is less at low temperatures. If you have free cooling, run it cold. You will have less fan speed overall, both in the facility and on the servers." Minde thinks the industry should aim for constant CPU temperatures, and use the air conditioning compressors only when the CPU temperature is getting too high. Further work will be done on this: Interact, a division of TechBuyer, has also been researching the issue, and will publish a paper with the IEEE in 2023.


DeepKnowledge (深知社) exclusive commentary

乐海林, a senior researcher at DeepKnowledge, argues that raising data hall temperatures also brings the following advantages:

  • Higher temperatures improve chiller efficiency: for every 1°C increase in chilled-water supply temperature, chiller efficiency improves by roughly 2 to 3 percent (a rough numerical sketch follows this list);

  • Higher temperatures extend the free-cooling season: in general, raising the supply air temperature by 1°C allows the chilled-water supply temperature to rise by 1°C as well, extending the use of free cooling by around 10 to 15 days;

  • If the system's return air temperature is raised to 38°C, the maximum return water temperature can reach about 32°C. In that state, the system can use free cooling all year round, and the control scheme becomes much simpler.
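A minimal back-of-the-envelope sketch of the first two points, under loudly stated assumptions: a 2.5 percent chiller efficiency gain per degree (midpoint of the 2 to 3 percent range above, assumed to compound) and 12.5 extra free-cooling days per degree (midpoint of the 10 to 15 day range above):

```python
def chiller_energy_ratio(delta_t: float, gain_per_degree: float = 0.025) -> float:
    """Approximate fraction of the original chiller energy still needed after the
    chilled-water supply temperature is raised by delta_t degrees C (assumes the
    per-degree efficiency gain compounds; 2.5 percent per degree is an assumption)."""
    return 1 / (1 + gain_per_degree) ** delta_t

def extra_free_cooling_days(delta_t: float, days_per_degree: float = 12.5) -> float:
    """Rough extra days of free cooling per year, using the midpoint of 10-15 days per degree."""
    return delta_t * days_per_degree

# Hypothetical example: raising the supply water temperature by 3°C.
print(f"Chiller energy: {chiller_energy_ratio(3):.1%} of the original")       # about 92.9%
print(f"Extra free-cooling days: {extra_free_cooling_days(3):.0f} per year")  # about 38
```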

Raising the operating temperature of the data hall also brings the following risks:

  • If a cooling failure occurs, the buffer time before IT equipment overheats and shuts down is shorter, so continuous cooling becomes more important;

  • A warmer room is more prone to local hot spots, which places higher demands on airflow management;

  • Higher temperatures increase server fan power. If a project cannot use free cooling, the total energy consumption of a purely mechanical cooling system may actually increase.



DeepKnowledge (深知社)


Translation: Seaman, elite member of the DKV (DeepKnowledge Volunteer) program

Commentary: 乐海林, founding member of the DKV (DeepKnowledge Volunteer) program
