The cool thing about programming is that there is an absolute truth in every Bit: It’s either black or white, 1 or 0, true or false. In theory. But whenever you transmit data or store it somewhere on a physical drive, it can get corrupted. Individual bits can get flipped or even bursts of multiple bits, changing the value and the meaning of your data inadvertently. Surrounding electromagnetic radiation can interfere with your Bluetooth or wi-fi signals and cause unintended changes in your data stream. Simply accessing a flash memory might also result in bit flips. So whenever you exchange data with other systems, how do you make sure that the data you receive is exactly the same as the data that was sent to you? How do you detect accidental changes in your data stream? Or in other words, how do you check the integrity of your data against unintended changes and losses? There are a number of mechanisms used for data error detection. The most common one is called Cyclic Redundancy Check (CRC). It is often used in Bluetooth and other wireless communication protocols. It is also used to check the integrity of different types of files such as Gzip, Bzip2, PNG etc. The name sounds rather complex and intimidating. However, after reading this article you should have a good understanding of what a CRC is and how it works. The BasicsCRC is an error detection code used for verifying the integrity of data. It works just like a checksum and is appended to the end of the payload data and transmitted (or stored) along with that data. For example, let’s say we want to send the lower-case letter z to someone else. In Unicode, the z is represented by the number , that’s in binary representation. In order to make it possible for the receiver to verify that the data wasn’t modified on transmission, we calculate the CRC for this data (1000 ) and simply append it to our message:
0111 1010 (data) → 0111 1010 1000 (data + CRC) 0111 1010 (data) → 0111 1010 1000 (data + CRC) The check value is called redundant because it doesn’t add any additional information to the message. (This data is only appended for the sake of error detection and data integrity). 0111 1010 contains the same information as (where last 4 bits are the CRC), however, in the first case, the receiver has no way of verifying if the data was received correctly.
CRCs are specifically designed to detect common data communication errors. It can also detect when the order of the bits or bytes changes. CRCs can easily be implemented in hardware – another reason for their widespread use. Before we take a deep dive into how a CRC works, there is one important concept that we need to understand first: The endianness. EndiannessThe endianness is the order of bytes with which data words are stored. We distinguish the following to types:
In the introductory example (with the letter From an app developer’s perspective, the answer is quite simple: We use the same endianness as the system we want to communicate with. It’s usually required by the hardware to use a particular byte order. For the remainder of this article, we’ll stick with the big-endian format and provide some hints along the way on how a little-endian implementation would differ. The Concept of a CRCMathematically spoken, a CRC is the remainder of a modulo-2 polynomial division of the data. So, the CRC can be written as CRC = data %2 polynomial The polynomial can be any mathematical polynomial (without any coefficients) up to the order of the CRC bit size. So if we want a CRC with 8 bits, the highest exponent of the polynomial must be 8 as well. Example: x8 + x2 + x + 1 As you can imagine from the simple equation above, choosing a different polynomial will result in a different CRC for the same data. So sender and receiver need to agree on a common polynomial, or else they will always end up with a different CRC and assume that the data got corrupted on the way. Depending on the use case, some polynomials are better suited than others. (Picking the right one is a topic on its own and beyond the scope of this article.) Any such polynomial can be represented in binary form by going through the exponents from highest to lowest and writing a 1 for each exponent that is present (8, 2, 1 and 0 in our example) and a 0 for each absent exponent (7, 6, 5, 4, 3 in this case): Exponent 8 7 6 5 4 3 2 1 0 Polynomial 1 0 0 0 0 0 1 1 1 ––––––––––––––––––––––––––––– Bit Index 0 1 2 3 4 5 6 7 8 Exponent 8 7 6 5 4 3 2 1 0
Polynomial 1 0 0 0 0 0 1 1 1
–––––––––––––––––––––––––––––
Bit Index 0 1 2 3 4 5 6 7 8 The above is the big-endian representation of the polynomial. For its little-endian representation, simply reverse the order of bits: Exponent 0 1 2 3 4 5 6 7 8 Polynomial 1 1 1 0 0 0 0 0 1 ––––––––––––––––––––––––––––– Bit Index 0 1 2 3 4 5 6 7 8 Exponent 0 1 2 3 4 5 6 7 8
Polynomial 1 1 1 0 0 0 0 0 1
–––––––––––––––––––––––––––––
Bit Index 0 1 2 3 4 5 6 7 8 Here are some example polynomials with their (big-endian) binary representation:
Next, in order to be able to calculate a CRC, we need to understand the modulo-2 division. Making Calculations in Modulo-2 ArithmeticModulo-2 arithmetic is performed digit by digit (bit by bit) on binary numbers. There is no carry or borrowing in this arithmetic. Each digit is considered independent from its neighbours. Addition & Subtraction The curious thing about modulo-2 arithmetic is that both addition and subtraction yield the same result. 10 10 + 11 - 11 ----- ------ 01 01 10 10
+ 11 - 11
----- ------
01 01 The sum or the difference of two bits can be computed with an XOR operation: The result is 1 if exactly one of the two bits is 1, otherwise the result is 0. As there is no carry or borrowing, the same applies not only for individual bits, but for binary numbers of any length, for example: 10 + 11 = 10 - 11 = 10 XOR 11 = 01 10 + 11
= 10 - 11
= 10 XOR 11
= 01 We get this result by applying the XOR operator on each pair of bits individually:
1 XOR 1 = 0 0 XOR 1 = 1 ----- 01 1 XOR 1 = 0
0 XOR 1 = 1
-----
01 Division Modulo-2 division is performed similarly to “normal” arithmetic division. The only difference is that we use modulo-2 subtraction ( XOR ) instead of arithmetic subtraction for calculating the remainders in each step.
If you are German, please note that in English literature, the divisor 11011 is written first, followed by the dividend 01111010 . (In German literature it’s usually the other way around.) The result of the division (quotient) is written on top of the calculation, the remainder at the bottom.
01011 ⬅︎ Quotient –––––––––––––––––– 11011 | 01111010 - 00000 ---------- 1111010 - 11011 --------- 010110 - 00000 -------- 10110 - 11011 ------- 11010 - 11011 ------- 0001 ⬅︎ Remainder (mod-2) 01011 ⬅︎ Quotient
––––––––––––––––––
11011 | 01111010
- 00000
----------
1111010
- 11011
---------
010110
- 00000
--------
10110
- 11011
-------
11010
- 11011
-------
0001 ⬅︎ Remainder (mod-2) Types of CRCsThere are different types of CRCs. They are categorized by the degree of the polynomial they use. As the first exponent of a polynomial of degree n is always present by definition (otherwise it would have a lower degree), its binary representation always begins with a 1 . In other words, the first bit of a binary polynomial representation doesn’t carry any information about the polynomial when we agree on a fixed degree.
For that reason, the first bit of a binary polynomial representation is always dropped when computing a CRC in software. So the bit size of the resulting binary representation is always n for a polynomial of degree n. Example:
CRCs types are named by their bit size. Here are the most common ones:
Generally, we can refer to a CRC as CRC-n, where n is the number of CRC bits and the number of bits of the polynomial’s binary representation with a dropped first bit. Obviously, different CRCs are possible for the same n as multiple polynomials exist for the same degree. For example, if we want to calculate the CRC-8 for these bits: 01000001 01000001 the result looks different, depending on the polynomial we use:
Above are the standard CRC-8 polynomials used in different applications such as Bluetooth, GSM-B, etc. Error DetectionHow do we choose a suitable CRC and a respective polynomial? There are three things we need to consider:
Random Error Detection AccuracyRandom errors are errors that can occur randomly in any data. For example, a single bit is flipped when transmitting data or a few bits are lost during the transmission. Depending on the bit size of the CRC we use, we can detect most of these random errors. However, for a CRC-n, 1/2n of these errors cannot be detected. The following table shows the percentage of the possible random errors that remain undetected for each CRC type:
Burst Error Detection AccuracyErrors in data transmission are oftentimes not random but produced over a consecutive sequence of bits. Such errors are called burst errors. They are the most common errors in data communication. It’s one of the CRC’s strongest properties to detecting burst errors reliably. A CRC-n can detect single burst errors with a maximum length of n bits. However, this depends a lot on the polynomial used for computing the CRC. Some polynomials are able to detect multiple bursts of errors in the transmitted data.
The Redundancy FactorUsing a CRC for error detection comes at the cost of extra (non-meaningful) data. When we use a CRC-32 (4 bytes), we need to transmit two more bytes of “unnecessary” data as compared to a CRC-16. CRCs with a lower bit size are obviously cheaper with respect to storage space. Based on these three factors, we can decide which CRC type to choose for our application. However, the polynomial you choose for your CRC also affects the efficiency and quality of your error detection. But that’s a topic for itself and we won’t cover it in this article. Fortunately, there are a couple of standard polynomials used for a particular CRC type and in most cases it makes sense to just use one of these. How It Works: The CRC AlgorithmThis section will take you through the steps of calculating a CRC-n for any input data. We will first explain the algorithm and then follow these steps to calculate a 4-bit CRC of some sample data.
ExampleLet’s see how this algorithm works in practice and calculate the 4-bit CRC for the binary Unicode representation of the letter z (as introduced at the beginning of this article):
First, we need to choose a suitable polynomial depending on our use case which must used by both the sender and the receiver. For this example, let’s use this one:
We have already performed step (1) of the algorithm in the table above: 1011 is our binary polynomial representation with a dropped most significant bit. Next, let’s go through the remaining steps as part of our modulo-2 division:
(Quotient is ignored) ––––––––––––––––––––– 1011 | 01111010 // 1. Polynomial with dropped MSB | Input 01111010 0000 // 2. Append 4 zero bits for the CRC 01111010 0000 // 3. MSB (first bit) = 0 ------------- 1111010 0000 // 4. Discard the MSB - 0000 // 5. Do nothing (= subtract 0000) –––––––––––––––––– 1111010 0000 // 3. MSB = 1 ------------ 111010 0000 // 4. Discard the MSB - 1011 // 5. Subtract the polynomial –––––––––––––––––– 010110 0000 // 3. MSB = 0 ----------- 10110 0000 // 4. Discard the MSB - 0000 // 5. Do nothing (= subtract 0000) –––––––––––––––––– 10110 0000 // 3. MSB = 1 ---------- 0110 0000 // 4. Discard the MSB - 1011 // 5. Subtract the polynomial –––––––––––––––––– 1101 0000 // 3. MSB = 1 ---------- 1010000 // 4. Discard the MSB - 1011 // 5. Subtract the polynomial –––––––––––––––––– 0001000 // 3. MSB = 0 ------- 001000 // 4. Discard the MSB - 0000 // 5. Do nothing (= subtract 0000) –––––––––––––––––– 001000 // 3. Check the most significant bit (MSB = 0) ------ 01000 // 4. Discard the MSB - 0000 // 5. Do nothing (= subtract 0000) –––––––––––––––––– 01000 // 3. MSB = 0 ----- 1000 // 4. Discard the MSB - 0000 // 5.Do nothing (= subtract 0000) –––––––––––––––––– 1000 // 7. Final remainder = CRC (Quotient is ignored)
–––––––––––––––––––––
1011 | 01111010 // 1. Polynomial with dropped MSB | Input
01111010 0000 // 2. Append 4 zero bits for the CRC
01111010 0000 // 3. MSB (first bit) = 0
-------------
1111010 0000 // 4. Discard the MSB
- 0000 // 5. Do nothing (= subtract 0000)
––––––––––––––––––
1111010 0000 // 3. MSB = 1
------------
111010 0000 // 4. Discard the MSB
- 1011 // 5. Subtract the polynomial
––––––––––––––––––
010110 0000 // 3. MSB = 0
-----------
10110 0000 // 4. Discard the MSB
- 0000 // 5. Do nothing (= subtract 0000)
––––––––––––––––––
10110 0000 // 3. MSB = 1
----------
0110 0000 // 4. Discard the MSB
- 1011 // 5. Subtract the polynomial
––––––––––––––––––
1101 0000 // 3. MSB = 1
----------
1010000 // 4. Discard the MSB
- 1011 // 5. Subtract the polynomial
––––––––––––––––––
0001000 // 3. MSB = 0
-------
001000 // 4. Discard the MSB
- 0000 // 5. Do nothing (= subtract 0000)
––––––––––––––––––
001000 // 3. Check the most significant bit (MSB = 0)
------
01000 // 4. Discard the MSB
- 0000 // 5. Do nothing (= subtract 0000)
––––––––––––––––––
01000 // 3. MSB = 0
-----
1000 // 4. Discard the MSB
- 0000 // 5.Do nothing (= subtract 0000)
––––––––––––––––––
1000 // 7. Final remainder = CRC Implementing The Algorithm in SoftwareOnce you’ve understood the algorithm above, it’s pretty straight-forward to implement it in software. The only thing you need to be clear about is whether you implement the CRC with a big-endian or a little-endian representation. In the following, we’ll provide some concrete implementation advice, assuming a big-endian format.
The concrete implementation of a CRC algorithm in software always depends on the platform and the programming language you use. If you know how to work with bits and bytes on your respective platform, you should now be able to implement a CRC algorithm of your desired type on your own. However, if you are an iOS or macOS developer and don’t want to out in all the effort to write your own implementation from scratch, we’ve dedicated another article for implementing a CRC with Swift on iOS. Another article for Android developers on how to implement a CRC in Kotlin is in our pipeline and coming soon. Stay tuned! SummaryCongratulations! You’ve made it to the end of this article and should now have a broad understanding of what a Cyclic Redundancy Check (CRC) is and how it works. Here’s a short summary of this article:
Let us finish with some final thoughts on things you should be aware of:
Where to Go From Here
|
|