CNN vs. RNN: How are they different? | TechTarget

我的技术大杂烩 · Published 2023-10-13 in Guangdong

To set realistic expectations for AI without missing opportunities, it's important to understand both the capabilities and limitations of different model types.

Two categories of algorithms that have propelled the field of AI forward are convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Compare how CNNs and RNNs work to understand their strengths and weaknesses, including where they can complement each other.

The main differences between CNNs and RNNs include the following:

CNNs are commonly used to solve problems involving spatial data, such as images. RNNs are better suited to analyzing temporal and sequential data, such as text or videos.
CNNs and RNNs have different architectures. CNNs are feedforward neural networks that use filters and pooling layers, whereas RNNs feed results back into the network.
In a typical CNN classifier, the size of the input and the resulting output are fixed: the network receives images of a fixed size and outputs a predicted class label for each image along with a confidence level. In RNNs, the size of the input and the resulting output can vary.
Common use cases for CNNs include facial recognition, medical image analysis and image classification. Common use cases for RNNs include machine translation, natural language processing, sentiment analysis and speech analysis.

ANNs and the history of neural networks

The neural network was widely recognized at the time of its invention as a major breakthrough in the field. Taking inspiration from the interconnected networks of neurons in the human brain, the architecture introduced an algorithm that enabled computers to fine-tune their decision-making -- in other words, to "learn."

An artificial neural network (ANN) consists of many perceptrons. In its simplest form, a perceptron is a function that takes two inputs, multiplies each by its own weight, adds the products together with a bias value, passes the sum through an activation function and outputs the result. The weights and bias, which typically start out as random values and are adjustable, define the outcome of the perceptron given two specific input values.
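
The description above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sigmoid activation and the specific weight and bias values are assumptions chosen for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron(x1: float, x2: float, w1: float, w2: float, bias: float) -> float:
    """Weighted sum of two inputs plus a bias, passed through an activation."""
    weighted_sum = x1 * w1 + x2 * w2 + bias
    return sigmoid(weighted_sum)


# Illustrative values only (assumed for this sketch). In practice the
# weights start out random and are adjusted during training.
output = perceptron(x1=1.0, x2=0.0, w1=0.8, w2=-0.4, bias=-0.3)
print(round(output, 3))  # sigmoid(0.5), about 0.622
```

Changing any weight or the bias changes the output for the same two inputs, which is what makes the perceptron trainable.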

Combining perceptrons enabled researchers to build multilayered networks with adjustable variables that could take on a wide range of complex tasks. A mechanism called backpropagation is used to address the challenge of selecting the ideal numbers for weights and bias values.

In backpropagation, the ANN is given an input, and the result is compared with the expected output. The difference between the desired and actual output is then fed back into the neural network via a mathematical calculation that determines how to adjust each perceptron to achieve the desired result. This procedure is repeated until a satisfactory level of accuracy is reached.
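
The feedback loop described above can be illustrated with a deliberately small case: a single sigmoid neuron learning the logical OR function. With only one neuron, backpropagation reduces to a single chain-rule step; in a multilayered network the same rule is applied layer by layer. The training data, learning rate, epoch count and squared-error loss are all assumptions made for this sketch.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Training data for logical OR: ((input 1, input 2), expected output).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]


def train(epochs: int = 5000, learning_rate: float = 0.5):
    w1, w2, bias = 0.0, 0.0, 0.0  # fixed starting point so runs are repeatable
    for _ in range(epochs):
        for (x1, x2), target in DATA:
            actual = sigmoid(x1 * w1 + x2 * w2 + bias)
            error = actual - target  # difference between actual and desired output
            # Chain rule for squared error: error times the sigmoid's slope.
            delta = error * actual * (1.0 - actual)
            # Nudge each adjustable value against its share of the error.
            w1 -= learning_rate * delta * x1
            w2 -= learning_rate * delta * x2
            bias -= learning_rate * delta
    return w1, w2, bias


w1, w2, bias = train()
predictions = [round(sigmoid(x1 * w1 + x2 * w2 + bias)) for (x1, x2), _ in DATA]
print(predictions)
```

After training, rounding the neuron's output for each of the four input pairs should reproduce the OR truth table.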

This type of ANN works well for simple statistical forecasting, such as predicting a person's favorite football team given their age, gender and geographical location. But using AI for more difficult tasks, such as image recognition, requires a more complex neural network architecture.

Convolutional neural networks

Computers interpret images as sets of color values distributed over a certain width and height. Thus, what humans see as shapes and objects on a computer screen appear as arrays of numbers to the machine.

CNNs make sense of this data through mechanisms called filters: small matrices of weights tuned to detect certain features in an image, such as colors, edges or textures. In the first layers of a CNN, known as convolutional layers, a filter is slid -- or convolved -- over the input, scanning for matches between the input and the filter pattern. This results in a new matrix indicating areas where the feature of interest was detected, known as a feature map.
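
To make the sliding-filter idea concrete, here is a minimal sketch in plain Python. It assumes a stride of 1 and no padding, and the 4x4 image and edge filter are hand-written for illustration; in a real CNN the filter weights are learned. As in most deep learning libraries, the filter is applied without flipping, which is strictly a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the filter over the image (stride 1, no padding); return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the filter by the patch beneath it, element by element, and sum.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x4 grayscale image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]
```

The nonzero middle column of the feature map marks where the edge sits in the input.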

In the next stage of the CNN, known as the pooling layer, these feature maps are cut down using a filter that identifies the maximum or average value in various regions of the image. Reducing the dimensions of the feature maps greatly decreases the size of the data representations, making the neural network much faster.
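
A small sketch of max pooling shows how much the data shrinks. The 2x2 non-overlapping window and the sample feature map are assumptions for the example.

```python
def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

Sixteen values become four, while the strongest response in each region is preserved. Replacing `max` with an average would give average pooling.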

Finally, the resulting information is fed into the CNN's fully connected layer. This layer of the network takes into account all the features extracted in the convolutional and pooling layers, enabling the model to categorize new input images into various classes.
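
As a rough sketch of this final stage, the code below flattens a pooled feature map into a list, computes one score per class as a weighted sum of all features, and converts the scores into confidence levels with a softmax. The class names, weights and feature values are hypothetical; a trained network would have learned the weights.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the peak keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(features, weights, biases):
    """One score per class: every feature contributes to every class."""
    return [
        sum(f * w for f, w in zip(features, class_weights)) + b
        for class_weights, b in zip(weights, biases)
    ]


CLASSES = ["cat", "dog"]             # hypothetical labels
features = [4.0, 2.0, 2.0, 8.0]      # e.g. a flattened 2x2 pooled feature map
weights = [
    [0.1, 0.2, 0.0, -0.1],           # assumed weights for "cat"
    [-0.1, 0.0, 0.1, 0.2],           # assumed weights for "dog"
]
biases = [0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
print(CLASSES[best], round(probabilities[best], 3))
```

The output is a predicted class label together with a confidence level, matching the fixed-size output described earlier.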

In a CNN, the series of filters effectively builds a network that understands more and more of the image with every passing layer. The filters in the initial layers detect low-level features, such as edges. In deeper layers, the filters begin to recognize more complex patterns, such as shapes and textures. Ultimately, this results in a model capable of recognizing entire objects largely regardless of where they appear in the image; robustness to changes in orientation typically has to be learned from varied training examples.

Bias in artificial neurons

In both artificial and biological networks, when neurons process the input they receive, they decide whether the output should be passed on to the next layer as input. In an artificial neuron, that decision is made by an activation function, and the bias is an adjustable value added to the weighted sum of the inputs that shifts the point at which the neuron activates. For example, a neuron with a simple threshold activation passes a signal on to the next layer only if the weighted sum of its inputs plus the bias exceeds zero. In biological neurons, the inputs are actually voltages, and the neuron fires once they sum to a value above a particular threshold.

Recurrent neural networks

CNNs are great at recognizing objects, animals and people, but what if we want to understand what is happening in a picture?

Consider a picture of a ball in the air. Determining whether the ball is rising or falling would require more context than a single picture -- for example, a video whose sequence could clarify whether the ball is going up or down.

This, in turn, would require the neural network to "remember" previously encountered information and factor that into future calculations. And the problem of remembering goes beyond videos: For example, natural language understanding algorithms typically deal only with text, but they still need to recall information such as the topic of a discussion or previous words in a sentence.

RNNs were designed to tackle exactly this problem. RNNs can process sequential data, such as text or video, using loops that can recall and detect patterns in those sequences. The units containing these feedback loops are called recurrent cells and enable the network to retain information over time.

When the RNN receives input, the recurrent cells combine the new data with the information received in prior steps, using that previously received input to inform their analysis of the new data. The recurrent cells then update their internal states in response to the new input, enabling the RNN to identify relationships and patterns.
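
The update described above can be sketched with a one-number internal state. This is a simplified illustration: the tanh activation and the weight values `w_x` and `w_h` are assumptions, and real recurrent cells work on vectors with learned weight matrices.

```python
import math


def rnn_step(x: float, h_prev: float, w_x: float = 0.5, w_h: float = 0.8, bias: float = 0.0) -> float:
    """Combine the new input with the previous state to produce the new state."""
    return math.tanh(w_x * x + w_h * h_prev + bias)


def run_rnn(sequence):
    """Feed a sequence of any length through the same cell, one item at a time."""
    h = 0.0  # the internal state starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h)  # the feedback loop: the old state goes back in
        states.append(h)
    return states


states_short = run_rnn([1.0, 0.0, 0.0])
states_long = run_rnn([0.0, 0.0, 1.0, 0.0, 0.0])
print([round(h, 3) for h in states_short])
print([round(h, 3) for h in states_long])
```

The same cell handles sequences of different lengths, and in the first sequence the state stays nonzero after the input returns to 0.0, which is the network "remembering" the earlier input.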

To illustrate, imagine that you want to translate the sentence "What date is it?" In an RNN, the algorithm feeds each word separately into the neural network. By the time the model arrives at the word "it," its output is already influenced by the word "What."

RNNs do have a problem, however. In basic RNNs, words that are fed into the network later tend to have a greater influence than earlier words, causing a form of memory loss over the course of a sequence. In the previous example, the words "is it" have a greater influence than the more meaningful word "date." Architectures such as long short-term memory (LSTM) networks address this issue by using recurrent cells designed to preserve information over longer sequences.
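
This fading influence can be measured in a toy setting. The sketch below switches on one input at a time in an otherwise silent four-step sequence (one step per word of "What date is it") and records how much the final state changes. The scalar cell and its weights are assumptions made for the illustration.

```python
import math


def final_state(sequence, w_x: float = 0.5, w_h: float = 0.5) -> float:
    """Run a basic scalar recurrent cell over the sequence and return its last state."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
    return h


def influence(position: int, length: int = 4) -> float:
    """Change in the final state when only the input at `position` is switched on."""
    baseline = final_state([0.0] * length)
    sequence = [0.0] * length
    sequence[position] = 1.0
    return abs(final_state(sequence) - baseline)


influences = [influence(p) for p in range(4)]
for position, value in enumerate(influences):
    print(position, round(value, 3))
```

Each step back in the sequence roughly halves the effect on the final state under these weights, so the earliest input matters least.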

[Figure: Although CNNs and RNNs are both types of neural networks, they differ in several important ways: architecture, input and output, ideal use cases, and real-world applications.]

CNNs vs. RNNs: Strengths and weaknesses

CNNs are well suited for working with images and video, although they can also handle audio, spatial and textual data. Thus, CNNs are primarily used in computer vision and image processing tasks, such as object classification, image recognition and pattern recognition. Example use cases for CNNs include facial recognition, object detection for autonomous vehicles and anomaly identification in medical images such as X-rays.

RNNs, on the other hand, excel at working with sequential data thanks to their ability to develop contextual understanding of sequences. RNNs are therefore often used for speech recognition and natural language processing tasks, such as text summarization, machine translation and speech analysis. Example use cases for RNNs include generating textual captions for images, forecasting time series data such as sales or stock prices, and analyzing user sentiment in social media posts.

For some tasks, one option is clearly the better fit. For example, CNNs typically aren't well suited for the types of predictive text tasks where RNNs excel. Trying to use a CNN's spatial modeling capabilities to capture sequential text data would require unnecessary effort and memory; it would be much simpler and more efficient to use an RNN.

However, in other cases, the two types of models can complement each other. Combining CNNs' spatial processing and feature extraction abilities with RNNs' sequence modeling and context recall can yield powerful systems that take advantage of each algorithm's strengths.

For example, a CNN and an RNN could be used together in a video captioning application, with the CNN extracting features from video frames and the RNN using those features to write captions. Similarly, in weather forecasting, a CNN could identify patterns in maps of meteorological data, which an RNN could then use in conjunction with time series data to make weather predictions.
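
The division of labor in such a pipeline can be sketched with the earlier ball-in-the-air example. Note that nothing here is a trained network: `ball_height` is a hand-written stand-in for the CNN's per-frame feature extraction, and `describe_motion` is a stand-in for the RNN's memory across frames. Both function names and the tiny 3x3 frames are hypothetical.

```python
def ball_height(frame):
    """Stand-in for CNN feature extraction: height of the brightest pixel above the bottom row."""
    brightest_row = max(range(len(frame)), key=lambda r: max(frame[r]))
    return len(frame) - 1 - brightest_row


def describe_motion(frames):
    """Stand-in for the RNN: carry the previous height forward from frame to frame."""
    previous_height = None  # the "memory" passed between steps
    labels = []
    for frame in frames:
        height = ball_height(frame)  # spatial step: look at one frame only
        if previous_height is None:
            labels.append("unknown")  # a single picture is not enough
        elif height > previous_height:
            labels.append("rising")
        elif height < previous_height:
            labels.append("falling")
        else:
            labels.append("level")
        previous_height = height  # sequential step: update the state
    return labels


frames = [
    [[0, 0, 0], [0, 0, 0], [0, 1, 0]],  # ball in the bottom row
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # middle row
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],  # top row
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # back to the middle row
]
labels = describe_motion(frames)
print(labels)  # ['unknown', 'rising', 'rising', 'falling']
```

The first frame alone yields "unknown," which mirrors the point that a single picture cannot show whether the ball is going up or down.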

Dig deeper into the expanding universe of neural networks

CNNs and RNNs are just two of the most popular categories of neural network architectures. There are dozens of other approaches, and previously obscure types of models are seeing significant growth today.

Transformers, like RNNs, are a type of neural network architecture well suited to processing sequential text data. However, transformers address RNNs' limitations through a technique called attention, which enables the model to focus on the most relevant portions of the input. This means transformers can capture relationships across longer sequences, making them a powerful tool for building large language models such as ChatGPT.
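
One common form of attention, scaled dot-product attention, can be sketched for a single query as follows. The two-dimensional key, value and query vectors are made up for the example; in a transformer they are produced by learned projections of the input.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtracting the peak keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Score every position by how well its key matches the query.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the values, weighted by relevance.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]

weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])
print([round(o, 3) for o in output])
```

Unlike the recurrent cell, every position is scored directly against the query, so a relevant item is not penalized for appearing early in the sequence.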

Generative adversarial networks (GANs) combine two competing neural networks: a generator and a discriminator. The generator creates synthetic data that attempts to mimic the real input as closely as possible, while the discriminator tries to detect whether data is real or produced by the generator. GANs are used in generative AI applications to create high-quality synthetic data, such as images and video.

[Figure: CNNs and GANs have distinct architectures, use cases and training requirements, and they differ in their approaches to convolution.]

Autoencoders are another type of neural network that is becoming the tool of choice for dimensionality reduction, image compression and data encoding. Similar to GANs, autoencoders consist of two models: an encoder, which compresses input data into a code, and a decoder, which attempts to reconstruct the input data from the generated code. The autoencoder's goal is to improve its performance over time by minimizing the difference between the original input and the decoder's reconstruction.

In addition, researchers are finding ways to automatically create new, highly optimized neural networks on the fly using neural architecture search. This technique starts with a wide range of potential architecture configurations and network components for a particular problem. The search algorithm then iteratively tries out different architectures and analyzes the results, aiming to find the optimal combination.

In this way, neural architecture search improves efficiency by helping model developers automate the process of designing customized neural networks for specific tasks. Examples of automated machine learning tools include Google AutoML, IBM Watson Studio and the open source library AutoKeras.

Researchers can also use ensemble modeling techniques to combine multiple neural networks with the same or different architectures. The resulting ensemble model can often achieve better performance than any of the individual models, but identifying the best combination involves comparing many possibilities.
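
One simple way to combine models is to average their predicted class probabilities, sketched below. The three sets of model outputs are invented for the example, and averaging is only one of several ensemble strategies.

```python
def ensemble_average(probabilities_per_model):
    """Average the class probabilities predicted by several models."""
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    return [
        sum(model[c] for model in probabilities_per_model) / n_models
        for c in range(n_classes)
    ]


# Hypothetical outputs of three models for one input: [P(class 0), P(class 1)].
model_outputs = [
    [0.6, 0.4],  # this model prefers class 0
    [0.3, 0.7],
    [0.2, 0.8],
]
averaged = ensemble_average(model_outputs)
predicted_class = averaged.index(max(averaged))
print([round(p, 3) for p in averaged], predicted_class)
```

Here one model disagrees with the other two, and the averaged probabilities settle on class 1.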

To address this issue, researchers have developed techniques for comparing the performance and accuracy of neural network architectures, enabling them to more efficiently sift through the many options available for a given task. Creative applications of statistical techniques such as bootstrapping and cluster analysis can help researchers compare the relative performance of different neural network architectures.
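
As one possible illustration of bootstrapping in this setting, the sketch below estimates a 95% interval for the accuracy gap between two models evaluated on the same test set. The per-example results are fabricated for the example, and the function name, resample count and seed are assumptions.

```python
import random


def bootstrap_accuracy_gap(correct_a, correct_b, n_resamples=2000, seed=0):
    """Approximate 95% bootstrap interval for accuracy(A) - accuracy(B) on a shared test set."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    n = len(correct_a)
    gaps = []
    for _ in range(n_resamples):
        # Resample test examples with replacement, using the same examples for both models.
        indices = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(correct_a[i] for i in indices) / n
        acc_b = sum(correct_b[i] for i in indices) / n
        gaps.append(acc_a - acc_b)
    gaps.sort()
    lower = gaps[int(0.025 * n_resamples)]
    upper = gaps[int(0.975 * n_resamples) - 1]
    return lower, upper


# Hypothetical per-example results on 100 test items (1 = correct, 0 = wrong).
correct_a = [1] * 90 + [0] * 10  # model A: 90% accurate
correct_b = [1] * 70 + [0] * 30  # model B: 70% accurate
lower, upper = bootstrap_accuracy_gap(correct_a, correct_b)
print(round(lower, 2), round(upper, 2))
```

If the whole interval lies above zero, the observed advantage of model A is unlikely to be an accident of which test examples happened to be chosen.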

Editor's note: David Petersson originally wrote this article, and Lev Craig updated and expanded it. George Lawton also contributed to this story.