[DKV] Just four moves to improve data center bits-per-watt efficiency

 yi321yi 2020-04-23

More work per watt: Four ways to turbocharge data centers in the IoT era

This article was translated with DCD's authorization and published on the DeepKnowledge platform.

More work-per-watt potentially means a better use of real estate, which can be a premium cost in cities or at the edge

April 15, 2020

Chris Bergey, SVP and general manager of the infrastructure division, Arm

Energy efficiency is the unsung hero of computing.

We all know the performance story. Cutting-edge mobile technology once meant a calculator with a dedicated square root button. Soon, autonomous cars, smart medical devices and smart buildings will change the fabric of daily life.

But those advances have only been possible because mobile phones, servers and other devices have been doubling the amount of work they perform per watt of power roughly every 2 to 2.5 years. Take data centers. Data center workloads have nearly tripled since 2015 while Internet traffic has grown 5x, yet data center power consumption - thanks to a number of background breakthroughs that may not make the front pages - has stayed relatively flat at around 198 billion kilowatt hours.
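As a rough illustration of what that doubling rate implies, the sketch below compounds the gain over a five-year window. The window length and the function name `efficiency_gain` are assumptions chosen for illustration; the article itself gives only the 2 to 2.5 year doubling period.

```python
def efficiency_gain(years: float, doubling_period_years: float) -> float:
    """Multiplicative gain in work-per-watt after `years`, assuming steady doubling."""
    return 2.0 ** (years / doubling_period_years)


# Assumed five-year window, evaluated at both ends of the 2 to 2.5 year range.
slow = efficiency_gain(5, 2.5)  # 2 ** 2.0
fast = efficiency_gain(5, 2.0)  # 2 ** 2.5
print(f"work-per-watt gain over 5 years: {slow:.2f}x to {fast:.2f}x")
# prints "work-per-watt gain over 5 years: 4.00x to 5.66x"
```

A gain in this range is at least consistent with workloads nearly tripling while total consumption stays roughly flat, though the article does not break the figures down this way.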

Is this unusual? Does your car’s gas mileage double every few years?

Another way to look at the energy efficiency phenomenon is through the lens of compute density. Increasing compute density increases available resources per data center, which in turn means a potentially better ROI for carriers, better pricing for applications taking advantage of the infrastructure and more rapid adoption by customers. A rack, however, can only accommodate a finite amount of power and heat: either productivity increases via work-per-watt or more space and servers are required. The virtuous cycle suddenly starts creaking.
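The rack constraint can be sketched with a toy model. Everything here is hypothetical: the 10 kW rack budget, the 500 W server and the abstract "work units" are illustrative numbers, not figures from the article.

```python
def rack_throughput(rack_power_w: float, server_power_w: float,
                    work_per_watt: float) -> tuple[int, float]:
    """Return (servers that fit the power budget, total work units delivered)."""
    servers = int(rack_power_w // server_power_w)
    return servers, servers * server_power_w * work_per_watt


# Hypothetical rack: 10 kW budget, 500 W per server.
servers, base = rack_throughput(10_000, 500, work_per_watt=1.0)
_, improved = rack_throughput(10_000, 500, work_per_watt=2.0)
print(servers, base, improved)  # prints "20 10000.0 20000.0"
```

With the power budget fixed, doubling work-per-watt doubles the rack's output; matching that output at the old efficiency would take a second rack.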

The coming wave of intelligent devices adds a new level of urgency to the performance-per-watt quest. Electricity is often 30-50 percent of the operating costs of data centers. However, the State of the Edge 2020 report estimates that 102GW of new power capacity, or basically a California-size grid, will be needed by 2028 to serve these new applications. As a result, a number of industries could sink or swim depending on what happens with efficiency and density.

More Cores

Over the past 15 years, most have followed an “outside in” path to better efficiency and density. First, companies retrofitted buildings to improve air flow. Then, plastic sheeting over racks further brought down cooling budgets. Then they went inside the racks to root out zombie servers running in place. Next came GPUs and flash accelerators to boost operations-per-watt.

The next layer of the onion revolves around the crown prince of computer components. Yes, the processor. Data centers are deploying servers with processors powered by large numbers of energy-efficient compute cores instead of smaller numbers of beefier cores. Put another way, they are shifting from racing in a lower gear at high RPMs to a higher gear at lower RPMs.

The new Graviton2 system-on-chip (SoC) processor from Amazon Web Services (AWS) consists of 64 compute cores that can be combined with other purpose-built cores for hundreds, if not thousands, of cores per rack. Fujitsu last November claimed the #1 spot on the Green 500 list with Fugaku, a high-performance supercomputer containing nearly 37,000 energy-efficient cores that can perform 16.876 gigaflops per watt. Ampere recently showed how it can fit 3680 cores into a rack, or more than 2.5x the norm.

Add NPUs

Like 1970s Stan Smith shoes, the co-processor is back. Neural network processing technology - which can come in the form of vector extensions, integrated circuitry or a dedicated NPU - is specifically designed for performing matrix multiplication in a compute (and thus energy) efficient manner. Classic CPUs are not. Deloitte predicts NPU shipments will grow by 20 percent per year, more than 2x the long term CAGR for semiconductors.
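To make the workload concrete, here is a minimal pure-Python sketch of matrix multiplication that also counts the multiply-accumulate (MAC) steps involved. It does not represent how any NPU is implemented; it only shows the operation count (rows x inner x cols) that such hardware is built to execute efficiently. The function name is hypothetical.

```python
def matmul_with_mac_count(a, b):
    """Naive matrix product of a (rows x inner) and b (inner x cols), plus MAC count."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0.0] * cols for _ in range(rows)]
    macs = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate
                macs += 1
            out[i][j] = acc
    return out, macs


product, macs = matmul_with_mac_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, macs)  # prints "[[19.0, 22.0], [43.0, 50.0]] 8"
```

Because the count grows with the product of all three dimensions, even modest neural network layers involve millions of these steps, which is why hardware specialized for them can pay off in energy terms.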

NPU technology could be critical for reducing the resources required for extracting insights from video and images with AI. Video already accounts for an estimated 75 percent of Internet traffic today and will gobble up 82 percent by 2022. Facial recognition and real-time analysis will only add to the heft.

Leverage the Edge

Cloud-based data centers - in the abstract - are the most efficient place to run computing loads. Workloads can be aggressively consolidated, and the most efficient equipment can be used to populate them.

Unfortunately, we don’t live in an abstract world. A 15-minute surge in power demand at a cloud, for instance, could lead to annoying (and exorbitant) peak-power charges which wouldn’t occur if loads were spread to the periphery. Meanwhile, autonomous driving and AI-enabled factories will also require a sophisticated edge infrastructure.
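The peak-charge effect can be sketched under a simplified, hypothetical tariff in which the demand charge is a flat rate applied to the highest 15-minute average draw in the billing period. Real tariffs vary, and the load figures, the rate and the 1,000 kW cap below are invented for illustration.

```python
def demand_charge(interval_kw: list[float], rate_per_kw: float) -> float:
    """Charge based on the highest 15-minute average demand (assumed tariff)."""
    return max(interval_kw) * rate_per_kw


# Hypothetical 15-minute average loads at a central site, with one surge.
baseline = [800.0, 820.0, 1500.0, 810.0]
# Assume anything above 1,000 kW can be served from edge sites instead.
shifted = [min(kw, 1000.0) for kw in baseline]

print(demand_charge(baseline, 15.0))  # prints "22500.0"
print(demand_charge(shifted, 15.0))   # prints "15000.0"
```

The sketch ignores the energy the edge sites would consume themselves; it only shows why a single short surge can dominate this kind of charge.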

And in a national or global crisis, like the current Covid-19 pandemic, a cloud-centric outlook could mean slow connections and spotty video feeds. For example, Internet traffic in the U.S. temporarily spiked by 20 percent after a national state-of-emergency was declared on March 13, according to Cloudflare, while Italy’s daily traffic surged 20-40 percent (see image).

And with videoconferencing likely to be more common in a post-pandemic world, the need for a deep edge is even more apparent.

This trend will require carriers and others to develop techniques for fluidly shifting loads to maximize utilization and minimize power and network traffic. Ehsan Ahvar, Anne-Cécile Orgerie and Adrien Lebre at the Inria Project Lab recently wrote that managing IoT-related tasks on distributed servers can consume 14 percent to 25 percent less power than a cloud-based data center. Why? Fewer network hops, lower cooling loads, and better use of fallow computing assets.

Similarly, researchers at the University of Leeds found that the power consumption for surveillance video tasks could be cut by up to 32 percent through harvesting the available capacity from nearby IoT and fog computing assets during slow periods.

Calculate the Ancillary Effects

More work-per-watt potentially means a better use of real estate, which can be a premium cost in cities or at the edge. A shift to renewables can reduce power and water costs as well as reduce regulatory fees. Employees, consumers and investors are demanding companies reduce their own carbon footprint as well as buy from greener suppliers.

Data centers aren’t facing a crisis yet, but they should start thinking now about the changes coming their way.

Translation and proofreading: Eric

Editing and layout: Amy
