
12 tips for improving data center cooling efficiency in midsummer

 yi321yi 2020-08-29

Top summer cooling tips for your data center

July 27, 2020

Julius Neudorfer is the CTO and founder of North American Access Technologies, Inc. (NAAT). Based in Westchester, NY, NAAT’s clients include Fortune 500 firms and government agencies. NAAT has been designing and implementing data center infrastructure and related technology projects for the last 25 years.

Avoid a server meltdown

Those hot and humid days of summer are upon us again, stress testing the limits of our cooling systems. While many organizations have given up operating their onsite data centers and moved to colocation facilities and the cloud, there are still many small to midsize data centers and “server rooms” in operation. I have written about this before, yet many firms will still see their data center’s cooling systems pushed to, and even beyond, their limits. So you may want to consider putting sunburn lotion on your servers (just kidding), or you can use some of these cooling tips to keep them from overheating.

There are many posts about the new ASHRAE expanded Thermal Guidelines and free cooling, but they do not help if you are in a site with marginal cooling units. This is also a common issue for server rooms located in mixed-use buildings that do not have large dedicated cooling systems, or whose systems lack the extra capacity for those very hot summer days. Virtually any cooling system’s performance will decrease with higher outdoor temperatures and humidity. Many IT departments are “sweating” out the summer (again), hoping that they will not have servers suddenly crashing from over-temperature shutdowns.


Here are a few tips, tricks and techniques that may not solve the long-term problem, but may help enough to get you through the summer. Many times, when the actual heat load of the equipment does not severely exceed the actual capacity of the cooling system, optimizing the airflow may improve the situation until a new or additional cooling system is installed.

01

If it feels warm, don’t panic, even if you see 80°F (about 27°C) in the cold aisle. Yes, this is hotter than the proverbial 70-72°F (21-22°C) data center “standard” you were used to, and you may not enjoy working in the room, but it may not be as bad for the servers as you think. If the highest temperature reading at the front of the rack is 80°F or less, you are still within the latest “Recommended” guidelines from ASHRAE’s TC 9.9. Even if the intake temperature is somewhat higher (up to 90°F, about 32°C), it is still within the A1 “Allowable” guidelines.
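As a rough illustration of the two limits quoted in this tip, the check can be sketched in a few lines of Python. The 80°F and 90°F constants are simply the figures from the text above, and the function names are hypothetical; confirm the limits against the current ASHRAE TC 9.9 guidelines for your equipment class before relying on them.

```python
# Limits as quoted in the tip above; verify against the current
# ASHRAE TC 9.9 guidelines before using them operationally.
RECOMMENDED_MAX_F = 80.0   # upper end of the "Recommended" range
A1_ALLOWABLE_MAX_F = 90.0  # upper end of the A1 "Allowable" range


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def classify_intake(temp_f: float) -> str:
    """Classify a server intake (front of rack) temperature reading."""
    if temp_f <= RECOMMENDED_MAX_F:
        return "recommended"
    if temp_f <= A1_ALLOWABLE_MAX_F:
        return "allowable"
    return "over limit"


if __name__ == "__main__":
    for reading_f in (72.0, 80.0, 86.0, 95.0):
        reading_c = fahrenheit_to_celsius(reading_f)
        print(f"{reading_f:.0f}°F ({reading_c:.1f}°C): {classify_intake(reading_f)}")
```

Only intake readings taken at the front of the rack should be fed into a check like this; rear (exhaust) temperatures are expected to be much higher.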

02

Take temperature measurements inside the face of the cabinet, at the front of the servers. This is where the servers draw in the cool air, and it is really the only valid and most important measurement. Take readings at the top, middle and bottom of the front of the racks (assuming that you have a hot aisle/cold aisle layout). The top of the rack is usually the warmest. If the bottom areas of the racks are cooler, and where possible, try to re-arrange the servers nearer the bottom (or coolest area) of the racks. Make sure that you use blanking panels to block off any and all open, unused spaces in the front of the racks. This will prevent hot air from the rear recirculating into the front of the racks.

03

Don’t worry about rear temperatures, even if they are at 100°F (about 38°C) or more (this is not unusual)! Do not place random fans blowing at the rear of racks to “cool them down”; this just causes more mixing of warm air into the cold aisles (I wish I had a dollar for every time I have seen this)!

04

If you have a raised floor, make sure that the floor grates or perforated tiles are properly located in front of the hottest racks. If necessary, re-arrange them or change to different floor grates to match the airflow to the heat load. Be careful not to locate floor grates too close to the CRACs (computer room air conditioners); this will “short circuit” the cool air flow immediately back into the CRACs and rob the rest of the room or row of sufficient cool air.
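To get a feel for what “matching the airflow to the heat load” means in numbers, here is a minimal sketch based on the common HVAC sensible-heat rule of thumb, BTU/hr = 1.08 × CFM × ΔT(°F), which assumes standard air near sea level. The formula, the 20°F temperature rise, and the rack wattages are illustrative assumptions, not figures from the article.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W of IT load produces about 3.412 BTU/hr of heat
SENSIBLE_HEAT_FACTOR = 1.08    # standard-air constant in BTU/hr = 1.08 * CFM * dT(°F)


def required_cfm(it_load_watts: float, delta_t_f: float) -> float:
    """Estimate the airflow (CFM) needed to carry away a rack's heat.

    delta_t_f is the air temperature rise across the servers in °F
    (exhaust temperature minus intake temperature).
    """
    if delta_t_f <= 0:
        raise ValueError("delta_t_f must be positive")
    heat_btu_per_hour = it_load_watts * BTU_PER_HOUR_PER_WATT
    return heat_btu_per_hour / (SENSIBLE_HEAT_FACTOR * delta_t_f)


if __name__ == "__main__":
    # Hypothetical rack loads in watts.
    racks = {"rack-A": 2000.0, "rack-B": 5000.0, "rack-C": 8000.0}
    for name, watts in racks.items():
        cfm = required_cfm(watts, delta_t_f=20.0)
        print(f"{name}: {watts:.0f} W needs roughly {cfm:.0f} CFM at a 20°F rise")
```

Comparing an estimate like this with the rated airflow of the tile or grate in front of each rack gives a first indication of which racks are starved for air. Treat the result as a rough guide only, since real airflow depends on under-floor pressure, leakage, and server fan behavior.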

05

Avoid bypass airflow. Check the raised floor for openings inside the cabinets. Cable openings in the floor allow air to escape the raised floor plenum where it is not needed, which lowers the cold air available to the floor vents in the cold aisles. Use air containment brush-type collar kits to minimize this problem.

06

If possible, try to redistribute the heat loads and spread them evenly across every rack to avoid or minimize “hot spots”. At the very least, manually check the temperature in the racks at the top, middle and bottom before you move the servers. Install permanent temperature sensors in each rack, or at least every third rack, along with central monitoring if possible.
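One simple way to plan such a redistribution is a greedy heuristic: place the hottest server first, always into the rack with the lowest total load so far. The sketch below assumes you know an approximate wattage per server; the server names, rack names, and wattages are hypothetical, and the heuristic ignores rack space, power circuits, and cabling constraints.

```python
import heapq


def spread_heat_loads(server_watts: dict[str, float],
                      rack_names: list[str]) -> dict[str, list[str]]:
    """Assign servers to racks so that total watts per rack stay roughly even.

    Greedy heuristic: take servers from hottest to coolest and put each
    one into the rack that currently carries the lowest total load.
    """
    # Min-heap of (current total watts, rack name).
    heap = [(0.0, name) for name in rack_names]
    heapq.heapify(heap)
    placement: dict[str, list[str]] = {name: [] for name in rack_names}

    hottest_first = sorted(server_watts.items(), key=lambda kv: kv[1], reverse=True)
    for server, watts in hottest_first:
        load, rack = heapq.heappop(heap)
        placement[rack].append(server)
        heapq.heappush(heap, (load + watts, rack))
    return placement


def rack_totals(placement: dict[str, list[str]],
                server_watts: dict[str, float]) -> dict[str, float]:
    """Sum the wattage of the servers placed in each rack."""
    return {rack: sum(server_watts[s] for s in servers)
            for rack, servers in placement.items()}


if __name__ == "__main__":
    servers = {"db1": 900, "db2": 850, "app1": 600,
               "web1": 400, "web2": 400, "backup": 300}
    plan = spread_heat_loads(servers, ["R1", "R2", "R3"])
    for rack, total in rack_totals(plan, servers).items():
        print(f"{rack}: {plan[rack]} -> {total:.0f} W")
```

A plan like this is only a starting point; measure the intake temperatures again after moving equipment to confirm that the hot spots have actually gone.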

07

Check the rear of racks for cables blocking exhaust airflow. This causes excessive back pressure for the IT equipment fans and can cause the equipment to overheat, even when there is enough cool air in front. This is especially true of racks full of 1U servers with a lot of long power cords and network cabling. Consider purchasing shorter (1-2 foot) power cords to replace the longer OEM cords shipped with most servers. Use the shortest possible network cables as well. Use cable management to unclutter the rear of the rack so that the airflow is not impeded.

08

If you have an overhead ducted cooling system, make sure that the cool air outlets are directly over the front of the racks and the return ducts are over the hot aisles. I have seen sites where the ceiling vents and returns are poorly located and the room is very hot, yet the capacity of the cooling system has not been exceeded; the cool air is simply not getting directly to the front of the racks, or the hot air is not being properly extracted. The most important issue is to avoid recirculation: make sure the hot air from the rear of the cabinets can get directly back to the CRAC return without mixing with the cold air. If you have a plenum ceiling, consider using it to capture the warm air, and add a ducted collar going into the ceiling from your CRAC’s top return air intake. Some basic duct work will have an immediate impact on the room temperature. In fact, the warmer the return air, the higher the efficiency and actual cooling capacity of the CRAC.

09

Consider adding temporary “roll-in” type cooling units only if you can exhaust the heat into an external area. Running the exhaust ducts into a ceiling that leads back to the CRAC does not work. The heat exhaust ducts of the roll-in unit must exhaust into an area outside of the controlled space.

10

When the room is not occupied, turn off the lights. This can save 1-3% of the electrical and heat load, which in a marginal cooling situation may lower the temperature by 1-2 degrees.

11

Check to see if there is any equipment that is still plugged in and powered up but is no longer in production (aka the ever-popular zombie servers). This is a fairly common occurrence and has an easy fix: just shut them off!

12

If you have blade servers, consider activating the “power capping” feature when cooling systems are not able to handle the full heat load. This may slow down the processors a bit, but it is much better than having an unexpected server crash due to thermal shutdown.

The bottom line

Of course, make sure that your cooling system is properly serviced and that all exterior heat rejection systems have been cleaned. While there is no true quick fix when your heat load totally exceeds your cooling system’s capacity, sometimes just improving the airflow may increase the overall efficiency by 5-20%. This may get you through the hottest days until you can upgrade your cooling systems, should you need to. In any event, it will lower your energy costs, which is always a good thing.

This year the Covid-19 pandemic has made it more difficult for IT and other support personnel to work onsite, making remote monitoring and control more important than ever. Plan ahead. At the very least, install some basic remote temperature monitoring inside some or all of the cabinets. Set alarm thresholds to provide an early warning of developing problems. If all else fails, have a fall-back plan to shut down the least critical systems so that the more critical servers (e.g. email, financial systems) can remain operational. Make sure to locate the most critical systems in the coolest area. This is a lot better than getting (or perhaps not getting) high-temperature warning emails, or having the most critical systems unexpectedly shut down from overheating.
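The alarm thresholds and the fall-back plan can be written down ahead of time in a very small script. The following is a minimal sketch, not a monitoring product: the two thresholds reuse the 80°F and 90°F figures quoted in tip 01, and the system names, priorities, and wattages are hypothetical placeholders for your own inventory.

```python
from dataclasses import dataclass

WARNING_F = 80.0   # early-warning threshold (assumed; reuses the tip 01 figure)
CRITICAL_F = 90.0  # act-now threshold (assumed; reuses the tip 01 figure)


@dataclass
class System:
    name: str
    priority: int  # 1 = most critical; larger numbers are less critical
    watts: float   # approximate heat load removed when it is shut down


def alarm_level(intake_f: float) -> str:
    """Map an intake temperature reading to an alarm level."""
    if intake_f >= CRITICAL_F:
        return "critical"
    if intake_f >= WARNING_F:
        return "warning"
    return "ok"


def shed_plan(systems: list[System], watts_to_shed: float) -> list[str]:
    """List systems to shut down, least critical first, until the target is met."""
    plan: list[str] = []
    shed = 0.0
    for system in sorted(systems, key=lambda s: s.priority, reverse=True):
        if shed >= watts_to_shed:
            break
        plan.append(system.name)
        shed += system.watts
    return plan


if __name__ == "__main__":
    inventory = [
        System("email", 1, 800),
        System("finance", 1, 1200),
        System("reporting", 2, 700),
        System("build-farm", 3, 1500),
        System("test-lab", 4, 900),
    ]
    reading_f = 91.5
    level = alarm_level(reading_f)
    print(f"Intake {reading_f}°F -> {level}")
    if level == "critical":
        print("Shut down in this order:", shed_plan(inventory, watts_to_shed=2000))
```

Agreeing on the shutdown order in advance means that whoever receives the alarm remotely does not have to decide under pressure which systems can go dark first.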
