[DKV] The Changing Landscape of Data Centers (Part 3)

 yi321yi 2019-08-12

Changing Landscape Of Data Centers

BY DONALD L. BEATY, P.E., FELLOW ASHRAE; DAVID QUIRK, P.E., MEMBER ASHRAE; JEFF JAWORSKI

From the moment a data center’s design is finalized, it may no longer be optimal for the information technology (IT) equipment it will someday contain. Not only does the IT equipment in a data center rarely remain unchanged, but also the loads the IT equipment can experience may vary significantly over time or even within the same day. These changes can easily affect cooling, power, airflow, acoustics, and other considerations, creating an ever-changing landscape the data center must accommodate.

An inherent mismatch of time scales exists between IT equipment and the data center in which it operates. IT equipment is responsive, adapting immediately to its workload and environment. Data center facility equipment, on the other hand, reacts when changes within the data center reach critical mass where larger responses are necessary. For example, if servers consume more power by handling a higher compute workload, internal fans can speed up instantly to remove the additional dissipated heat. In contrast, the facility cooling may wait for a measurable increase in return air temperature before increasing cooling capacity.

Further, many interactions exist between the IT equipment and the data center facility systems, not just those related to power consumption and heat dissipation. The differential air temperature (ΔT) between supply and return air is inversely related to how much airflow is required to remove a given amount of heat. A server’s ability to move air is affected by the pressure differential between its intake and exhaust. Even acoustics within the data center are directly related to fan speeds.

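As a rough illustration of the ΔT–airflow relationship (a sketch, not part of the original article), the following computes the airflow needed to remove a given heat load at a given ΔT using the common sensible-heat rule of thumb for sea-level air; the 1.08 constant and the example load are assumptions.

```python
# Airflow required to remove a sensible heat load at a given delta-T.
# Uses the common rule-of-thumb form for air at sea level:
#   Q [BTU/h] = 1.08 * cfm * dT [degF]
# The 1.08 factor bundles air density and specific heat; it is an
# assumed standard-conditions value, not a figure from the article.

def required_airflow_cfm(load_kw: float, delta_t_f: float) -> float:
    """Airflow (cfm) needed to remove load_kw of heat at delta_t_f."""
    btu_per_hour = load_kw * 3412.0           # 1 kW = 3,412 BTU/h
    return btu_per_hour / (1.08 * delta_t_f)

# Doubling delta-T halves the airflow needed for the same 10 kW load.
for dt in (25.0, 50.0):
    print(f"10 kW at dT = {dt:.0f} degF -> "
          f"{required_airflow_cfm(10, dt):,.0f} cfm")
```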

Combining the multitude of interactions and dependencies with the mismatch of time scales creates a potentially challenging environment in which all the equipment is in a constant state of reacting to the reactions of other equipment. This highlights the importance of understanding and designing for these interactions in ways that yield predictable and/or addressable results.

Differential Air Temperature

For a long time, data center operators were mainly concerned with IT power consumption. This was a clear indicator of the amount of power that would need to be supplied to the equipment and of the heat that would be dissipated by the equipment and need to be removed. But in the quest to improve data center efficiency, other important metrics have emerged—ΔT being one of them.

Before entering the spotlight, ΔT was typically just a consequence of the supply air temperature and the dissipated power of the equipment. Whatever heat could be picked up by the supply air defined the ΔT. However, to increase server efficiency, additional sensors and more complex thermal management techniques have allowed servers to operate at higher average temperatures while managing the risk that any portion of the server could overheat. This has allowed both a reduction in fan speeds within the server (to not provide more cooling than needed) and more efficient heat rejection (less airflow is needed to remove the same amount of heat).

As IT manufacturers began to design equipment for higher ΔT, the consequence has been that exhaust air temperatures have increased, since the typical supply air temperatures have certainly not been decreased (colder supply air temperatures would be less efficient for the facility to provide). This was compounded by a general increase in supply air temperatures in data centers as IT manufacturers agreed that warmer air at IT intakes would not impact reliability.

Between 2008 and 2010 a rapid increase occurred in typical ΔT for new IT equipment, climbing over 12°F (7°C) in just one or two generations of equipment. At that point, ΔT ranged from just under 30°F (16°C) to over 50°F (28°C), based on the type, size, and utilization of equipment. Since then, the lower range has steadily increased.1

But at 50°F (28°C) ΔT, the upper range of this trend has been pushing the limit of what constitutes acceptable exhaust air temperatures, with exhaust temperatures reaching 140°F (60°C)! The concerns at this temperature include touch safety, certification of ancillary equipment such as PDUs and switches, and general human comfort. As such, the upper limit of ΔT is not expected to increase any further.

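The 140°F figure follows directly from the arithmetic: exhaust temperature is intake temperature plus ΔT. The 90°F intake in this sketch is an assumed value chosen to be consistent with the article’s numbers.

```python
# Exhaust temperature is simply intake temperature plus delta-T.
# The 90 degF intake is an assumed illustrative value; with the 50 degF
# delta-T from the text it reproduces the ~140 degF exhaust concern.
def exhaust_temp_f(intake_f: float, delta_t_f: float) -> float:
    return intake_f + delta_t_f

print(exhaust_temp_f(90.0, 50.0))  # -> 140.0 degF (60 degC)
```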

As mentioned, increasing ΔT corresponds to decreasing airflow requirements per unit of heat. Until 2010, the increase in ΔT meant less airflow was needed for the same load, which allowed server manufacturers to actually increase the amount of heat that could be dissipated by equipment without necessitating major changes to existing facilities. However, as equipment nears 50°F (28°C) ΔT, the amount of airflow levels off to a minimum of ~60 cfm/kW (~100 (m³/h)/kW). Beyond this point, simply put, more airflow will be required to remove additional heat.

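A quick check of that floor, under the same assumed rule-of-thumb constants as the earlier sketch:

```python
# At 50 degF delta-T, the rule of thumb lands close to the ~60 cfm/kW
# (~100 (m3/h)/kW) floor quoted above; constants are assumed sea-level values.
CFM_TO_M3H = 1.699                        # 1 cfm is about 1.699 m3/h

cfm_per_kw = 3412.0 / (1.08 * 50.0)       # about 63 cfm per kW
print(f"{cfm_per_kw:.0f} cfm/kW ~ {cfm_per_kw * CFM_TO_M3H:.0f} (m3/h)/kW")
```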

Airflow

If IT equipment power consumption and density continue to increase while near or at the maximum practical ΔT, then airflow must continue to increase (or another cooling method must be used, such as liquid cooling). Depending on the flexibility and future-proofing of existing data centers, this may or may not necessitate major changes during future IT refreshes.

It has been commonplace for a long time to adopt hot-aisle, cold-aisle layouts within the data center. This practice concentrates the exhaust air for more efficient removal while simultaneously reducing inefficient mixing of supply and return air. To supply the air, many data centers use raised floor plenums to distribute the supply air across the data center floor, with perforated tiles in the cold aisles.

However, this standard layout typically allows for, at most, a single perforated tile per rack, creating a possible constraint on the maximum amount of airflow that can be supplied. Based on standard raised floor pressures of 0.03 in. w.c. to 0.05 in. w.c. (7.5 Pa to 12.5 Pa), perforated tiles (~25% free area) could supply up to 500 cfm (850 m³/h) while very open grates (~60% free area) could supply up to 1,800 cfm (3,060 m³/h). This range already exceeds what may be required for a full 42U rack.

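To illustrate how tile airflow scales with underfloor pressure, the following is a simple sharp-orifice sketch. The tile size, discharge coefficient, and air density are assumed values, not from the article; real tiles are rated from manufacturer curves.

```python
# Rough sharp-orifice model of tile airflow vs. underfloor pressure:
#   Q = Cd * A_free * sqrt(2 * dP / rho)
# All constants below are illustrative assumptions.
from math import sqrt

RHO_AIR = 1.2                 # kg/m3, assumed air density
CD = 0.61                     # assumed sharp-orifice discharge coefficient
TILE_AREA_M2 = 0.6096 ** 2    # one 2 ft x 2 ft tile

def tile_airflow_m3h(free_area_fraction: float, dp_pa: float) -> float:
    a_free = TILE_AREA_M2 * free_area_fraction
    velocity = sqrt(2.0 * dp_pa / RHO_AIR)     # m/s through the openings
    return CD * a_free * velocity * 3600.0

for free_area in (0.25, 0.60):
    q = tile_airflow_m3h(free_area, 12.5)      # top of the 7.5-12.5 Pa range
    print(f"{free_area:.0%} free area: {q:,.0f} m3/h ({q / 1.699:,.0f} cfm)")
```

For the 25% tiles, this simple model lands near the article’s ~500 cfm figure; open grates outperform the sharp-orifice assumption, which is one reason rated tile curves from manufacturers should be used in practice.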

Attempting to simply raise the underfloor pressure to increase airflow may not be effective, due to increased leakage and unexpected pressure effects where the velocity of supply air under the floor is high. For example, perforated tiles near computer room air handler (CRAH) units may actually experience reverse airflow due to low underfloor static pressure, according to the Bernoulli principle. High pressures also increase the typical air velocity exiting the raised floor, and can cause air to overshoot the servers in the bottom of the rack and increase mixing with the return air.

A few potential solutions exist to overcome this restriction. The cold aisle could be widened to allow two perforated tiles per rack, but this would require significantly more floor space, leading to a reduction in the number of racks that could fit into the data center. Alternatively, active tiles could be used that contain fans and can increase airflow up to 3,000 cfm (5,100 m³/h) or more. It turns out, though, that these are best used sparingly for spot cooling as opposed to increasing airflow throughout the data center. Unless enough supply air is provided, widespread use of these tiles could lower the pressure under the raised floor, causing more harm than good.

Directly Coupled Air Paths

If the IT equipment requirements are in conflict with the amount of air that can be reasonably supplied in a typical manner, it may be advisable to consider directly coupled cooling solutions in which supply or return air is entirely contained. By containing one of the air streams, pressures can be increased without leakage or mixing, making it easier to supply additional air.

For direct coupling, either the supply air or return air can be contained. Further, containment can be as large as the entire aisle or smaller, such as a contained chimney for the hot exhaust, or even smaller still by limiting the containment to the rack itself, as is the case with rear door heat exchangers. Figure 1 shows a graphical matrix representing possible designs that answer these two questions.

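Figure 1 itself is not reproduced in this copy. As a rough, hypothetical reconstruction from the two questions the text poses (which air stream is contained, and at what scale), the design space might be laid out as follows; the cell names are assumed, except where the text names them directly.

```python
# Hypothetical layout of the containment design space described in the
# text; Figure 1 is not reproduced here, so the exact cells are assumed.
CONTAINMENT_MATRIX = {
    "aisle":   {"supply": "cold-aisle containment",        # assumed naming
                "return": "hot-aisle containment"},         # assumed naming
    "chimney": {"return": "ducted hot-exhaust chimney"},    # named in text
    "rack":    {"return": "rear door heat exchanger"},      # named in text
}

print(CONTAINMENT_MATRIX["rack"]["return"])  # -> rear door heat exchanger
```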

The size of the containment will typically be inversely related to the pressure created by flow resistance inside it. For very large containment volumes, there may be a negligible to minor increase in resistance that must be overcome. For smaller containment volumes, such as capturing the hot exhaust and ducting it into a return plenum, the resistance due to pressure may become significant unless there is active assistance increasing the flow. The increased pressure also can achieve higher cooling capacities, thus higher rack densities.

Modern servers can adjust to external pressures by appropriately and continuously speeding up or slowing down fans, assuming there is enough margin to do so. Limitations of the IT equipment must be considered though; if servers are running at their maximum utilization at maximum inlet temperature, the fans may already be running at their design maximum. Additionally, other IT equipment such as legacy servers, storage equipment, and switches may have more basic thermal controls and fail to adequately adapt to external pressures.

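That margin is expensive to use. By the standard fan affinity laws (general fan physics, not specific to this article), flow scales with speed, pressure with its square, and power with its cube, as this sketch shows.

```python
# Standard fan affinity laws: flow ~ N, pressure ~ N^2, power ~ N^3.
# Illustrates why servers fighting extra external pressure with higher
# fan speeds pay a steep power (and acoustic) penalty.
def affinity(speed_ratio: float) -> tuple[float, float, float]:
    return speed_ratio, speed_ratio ** 2, speed_ratio ** 3

for r in (1.1, 1.2, 1.5):
    q, p, w = affinity(r)
    print(f"speed x{r:.1f}: flow x{q:.2f}, pressure x{p:.2f}, power x{w:.2f}")
```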

Data Center Transients

Within the data center, environmental factors may vary frequently and at different rates. On one end of the spectrum we have changes to air temperatures, which can change quite significantly and rapidly due to their minimal thermal mass and rapid flow through the data center. If a rack of IT equipment is turned on, then the air temperature at the back of that rack will increase almost immediately.

At the other end, the temperature of the physical data center(walls, floor, etc.) may change the slowest since they have very large thermal masses and do not produce heat themselves. The IT and facility equipment most likely land somewhere in between, as they have large thermal masses but the potential to produce or exchange great quantities of heat.

There are two main things to consider related to the capacity of the data center environment to change. The first is IT equipment specifications that often specify a maximal rate of temperature change per some timeframe. Thermal Guidelines for Data Processing Environments states a maximal rate of 9°F (5°C) per hour for tape and 9°F (5°C) per 15 minutes for all other IT equipment.2 The thermal mass of IT equipment is often enough to alleviate concerns that the equipment would change temperature too rapidly due to a normal change in environmental conditions.

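A minimal sketch of how such a limit might be checked against monitoring data; the sampling interval and temperature trace below are invented for illustration.

```python
# Check a temperature trace against the rate-of-change limits quoted
# above: 9 degF (5 degC) per hour for tape, 9 degF (5 degC) per 15
# minutes for other IT equipment. Sample data are illustrative.
def max_change_over_window(samples, window_s):
    """samples: list of (time_s, temp_c), ascending by time. Returns the
    worst-case absolute temperature change within any window_s span."""
    worst = 0.0
    for i, (t0, temp0) in enumerate(samples):
        for t1, temp1 in samples[i + 1:]:
            if t1 - t0 > window_s:
                break
            worst = max(worst, abs(temp1 - temp0))
    return worst

# Hypothetical five-minute samples during a slow warm-up event:
samples = [(m * 300, 22.0 + 0.4 * m) for m in range(13)]  # +0.4 degC / 5 min
print(f"15-min worst: {max_change_over_window(samples, 15 * 60):.1f} degC "
      f"(limit 5 degC for non-tape equipment)")
print(f"1-hour worst: {max_change_over_window(samples, 60 * 60):.1f} degC "
      f"(limit 5 degC for tape)")
```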

The second consideration is related to failure modes of the data center and their effect on the equipment. For example, a loss of CRAH fan power could have a rapid impact as return air ceases to be cooled, whereas a loss of pumps to the CRAH would have a slower impact as the return air could continue to transfer heat to the CRAH coils for a short period of time.

All modes of failure should be properly evaluated for their impact on the data center. Depending on the design and the failure, factors such as the thermal mass along the airflow path, possible mixing of supply and return air, and control schemes may greatly affect how quickly temperatures rise in the data center. This information is critical to creating the right data center response plans in the event of failures.

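As a back-of-the-envelope illustration of why this matters, here is a lumped-capacitance estimate of air heat-up after a total cooling loss. It deliberately ignores all solid thermal mass, so it is a worst case; the room size and load are assumed.

```python
# Lumped-capacitance sketch of air temperature rise after a total
# cooling failure: dT/dt = Q / (m * cp). Ignoring the thermal mass of
# walls, floor, and equipment makes this a pessimistic bound; the room
# volume and IT load are illustrative assumptions.
RHO_AIR, CP_AIR = 1.2, 1005.0          # kg/m3 and J/(kg*K), assumed

def heatup_rate_c_per_min(load_kw: float, room_volume_m3: float) -> float:
    air_mass = RHO_AIR * room_volume_m3
    return (load_kw * 1000.0) / (air_mass * CP_AIR) * 60.0

# 200 kW of IT in a 1,000 m3 room, air mass only: on the order of
# 10 degC per minute, which is why thermal mass and response plans matter.
print(f"{heatup_rate_c_per_min(200, 1000):.1f} degC/min")
```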

Some Other Interrelated Considerations

In addition to air temperature-related concerns, many other aspects of data center design and performance need to be considered when making any changes to the data center. For example, the potential for electrostatic discharge may increase when humidity is very low, but unnecessary humidification wastes power.

Humidity also plays a role regarding air contaminants; corrosive gaseous factors are generally more dangerous when humidity is high.3 Therefore, design humidity may affect what sort of air filtration is required.

Particulate air filtration is another factor that can have far-reaching effects, since fouling of the air streams on even tiny components within servers can affect the thermal performance of servers throughout the data center. On the other hand, unnecessary air filtration will certainly waste energy.

Even acoustics and weight should be considered. Increasing temperatures throughout the data center could increase server fan speed to the point where acoustical specifications are being approached. Similarly, increasing server density could mean increasing the load on a raised floor beyond its design point, necessitating additional structural supports and creating further airflow obstructions under the floors.

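The acoustic sensitivity to fan speed can be quantified with the standard fan sound-power law (general fan acoustics, not from the article):

```python
# Fan sound power rises roughly 50 * log10(speed ratio) dB for the same
# fan, so modest speed increases approach acoustical limits quickly.
from math import log10

def delta_sound_power_db(speed_ratio: float) -> float:
    return 50.0 * log10(speed_ratio)

print(f"+{delta_sound_power_db(1.2):.1f} dB at 20% higher fan speed")
```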

Conclusion

Throughout the life of the data center, changes to both the IT and facilities equipment will almost inevitably occur on a regular basis. However, inherent risk exists in making changes without considering all the effects they might have downstream. One seemingly minor change to one aspect might ultimately cause a massive system failure elsewhere.

Designing for change is a wise strategy, but even observing current trends in IT and technology may not be enough to predict what changes may be required a few years from now. Therefore, potentially the best weapon against future risk is a solid understanding and appreciation for the interrelated nature of systems throughout the data center.

References

1. ASHRAE. 2016. IT Equipment Design Impact on Data Center Solutions. Atlanta: ASHRAE.

2. ASHRAE. 2015. Thermal Guidelines for Data Processing Environments, 4th Edition. Atlanta: ASHRAE.

3. ASHRAE. 2009. Particulate and Gaseous Contamination in Datacom Environments. Atlanta: ASHRAE.

Reproduced with permission of copyright owner. Further reproduction prohibited without permission.

Translated by:

Wang Yahui

Service Engineer, Vertiv Tech Co., Ltd., Tianjin Office

Founding member of the DKV (Deep Knowledge Volunteer) program

Edited by:

Li Qing

Senior Operations Manager, Beijing Xinsheng Yunlu Technology Co., Ltd.
