Number Systems
Base 10:
We mortal humans use the decimal (base 10) system.
Base 10 includes 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Here is 243 in base 10:
243 = (10^2 * 2) + (10^1 * 4) + (10^0 * 3) = 200 + 40 + 3.
When decimal is explicitly denoted, it is usually with the suffix “d”, such as 12d.
Base 7:
We can apply this to any base. For example, 243 in base 7:
243(in base 7) = (7^2 * 2) + (7^1 * 4) + (7^0 * 3) = 98 + 28 + 3 = 129(in decimal).
Base 7 includes 0, 1, 2, 3, 4, 5, 6.
The digit 9 doesn’t exist in base 7, so how do we represent the decimal number 9?
9(in decimal) = (7^1 * 1) + (7^0 * 2) = 7 + 2. Our answer is going to be 12(base 7) = 9(base 10).
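The same place-value arithmetic can be sketched in Python. Here `to_base7` is a hypothetical helper name for illustration; the built-in `int` handles the reverse direction when given an explicit base:

```python
def to_base7(n: int) -> str:
    """Convert a non-negative decimal integer to base 7 via repeated division."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 7)  # peel off the least significant base-7 digit
        digits.append(str(remainder))
    return "".join(reversed(digits))

print(to_base7(9))    # 12  (decimal 9 is 12 in base 7)
print(int("243", 7))  # 129 (base-7 243 is 129 in decimal)
```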
Base2/Binary:
What about base 2? Base 2 includes 0 and 1. It works the same as the others. Here are some good values to know:
2^10 = 1024, 2^9 = 512, 2^8 = 256, 2^7 = 128, etc.

If you want to learn binary conversion and how to evaluate different bases, go here:
Trying to explain this stuff through text can be a little difficult, but that video describes it very well.
Binary is usually denoted with the prefix “0b” such as 0b0110 and sometimes denoted with the suffix “b” such as 110b.
Hexadecimal:
Hexa = 6, dec = 10, so hexadecimal is base 16. It works like the other bases but can be a little confusing at first. We only have ten individual digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), while hexadecimal needs 16. You could write 0, 1… 11, 12, 13… but that would be ambiguous. For example, what is 1432? Is that 1,4,3,2 or 14,3,2? So when we need to represent anything above 9, we instead use letters: A, B, C, D, E, and F in the case of hexadecimal.
A = 10, B = 11, …, F = 15
Hexadecimal numbers are usually given a “0x” prefix or the suffix “h” such as 0xFF or FFh.
0x4A = (16^1 * 4d) + (16^0 * 10d) = 64d + 10d = 74d.
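The same calculation can be checked in Python, either by expanding the place values by hand or with the built-in `int`:

```python
# 0x4A expanded by place value: each hex digit is weighted by a power of 16
value = (16**1 * 4) + (16**0 * 10)
print(value)          # 74
print(int("4A", 16))  # 74 - parse the digits with an explicit base
print(0x4A)           # 74 - or use a hex literal directly
```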
Learn more about hexadecimal here:
Prefixes and Suffixes:
To distinguish between different number systems, we use prefixes or suffixes. Many notations are in use; I will only show the most common.
- Decimal is represented with the suffix “d” or with nothing. Examples: 12d or 12.
- Hexadecimal is represented with the **prefix “0x”** or suffix “h”. Examples: 0x12 or 12h. Another way hexadecimal is represented is with the prefix “\x”; however, this is typically used per byte (two hexadecimal digits make one byte). Examples: \x12 or \x12\x45\x21. If bits and bytes seem a little weird, we’ll get into them soon, so don’t worry.
- Binary is represented with a suffix “b” or with padding of zeros at the start. Examples: 100101b or 00100101. The padding at the start is often used because a decimal number can’t start with a zero.
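Many languages understand these prefixes directly. In Python, for instance, passing base 0 to `int` makes it honour the “0x” and “0b” prefixes, while suffix forms have to be stripped by hand:

```python
print(int("0x12", 0))              # 18 - "0x" prefix marks hexadecimal
print(int("0b100101", 0))          # 37 - "0b" prefix marks binary
print(int("12h".rstrip("h"), 16))  # 18 - suffix forms need manual handling
print(int("00100101", 2))          # 37 - an explicit base accepts zero padding
```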
Answer the questions below
What is 0xA in decimal?
What is decimal 25 in hexadecimal? Include the prefix for hexadecimal.
Note
To convert from hex to decimal, use the operation ‘From Base’ with Radix: 16, or from decimal to hex, use ‘To Base’ with Radix: 16.
To convert from binary to hex, use the operation ‘From Binary’ and ‘To Hex’, or for hex to binary use ‘From Hex’ and ‘To Binary’.
Note
When transforming decimals to and from binary set the operations ‘To Base’ or ‘From Base’ to Radix: 2.
When transforming letters to and from binary, set the operations ‘To Binary’ or ‘From Binary’ with Delimiter: Space and Byte Length: 8.
Note
To convert to and from ASCII in CyberChef, simply use the operation required for conversion. For example, to convert from ASCII to binary, use the operation ‘To Binary’.
Note
In CyberChef, use the operation ‘Escape Unicode Characters’ to decode a language to its Unicode value and ‘Unescape Unicode Characters’ to encode a Unicode value to its language.
Note
In CyberChef, ensure that the checkbox ‘Internationalised domain name’ is checked when using the correct operations.
Binary
Quick Summary
Binary is a numbering system that consists of only two possible values for each digit: 0 and 1. It’s the primary language for computers and makes up all the raw data stored on our machines. In this lab, you will use CyberChef to encode and decode data to and from binary.
What is binary?
Binary is a numbering system invented by Gottfried Leibniz in 1679 that consists of only two numbers or digits: 0 and 1. This numbering system is the basis of binary code, which is the primary language of computers.
The term binary can also refer to any digital encoding/decoding method with only two potential states. The 0 and 1 values in digital data memory, storage, processing, and communications are commonly referred to as ‘low’ and ‘high’, or ‘off’ and ‘on’, respectively.
A group of eight bits is known as a byte. A bit, short for binary digit, is the smallest unit of data on a computer; each bit has one of two values: 1 or 0. Executable (ready-to-run) applications are identified as binary files, often with the file extension ‘.bin’. Programmers often call these executable files binaries.
How does it work?
When typed out, binary numbers appear unreadable to the human eye. This is because the weight of the digits grows by powers of two rather than powers of ten. The placement of a binary digit determines its decimal value. For example, if we have an 8-bit binary number, the values will be calculated like so:
Bit 1: 2^0 = 1
Bit 2: 2^1 = 2
Bit 3: 2^2 = 4
Bit 4: 2^3 = 8
Bit 5: 2^4 = 16
Bit 6: 2^5 = 32
Bit 7: 2^6 = 64
Bit 8: 2^7 = 128
To read binary, we work from right to left. As shown above, the power of the right-most digit is zero, followed (towards the left) by two to the power of one, two, three, and so on. We can transform a binary number into a decimal number by adding together all these values.
We can express any decimal number from 0 to 255 in binary form by adding together the values of the positions where the bit is 1. By adding more bits to the system, we can represent larger numbers.
For example, the binary number 01011001 represents the decimal number 89 (1+0+0+8+16+0+64+0 = 89).
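This right-to-left weighting can be verified in Python:

```python
bits = "01011001"
# sum each bit times its power-of-two weight, working right to left
value = sum(int(bit) * 2**position
            for position, bit in enumerate(reversed(bits)))
print(value)         # 89
print(int(bits, 2))  # 89 - the built-in conversion agrees
```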
Here are the first 15 decimal numbers in binary form:
The transformation of letters into binary is defined by the rules set by UTF-8, in which each character is assigned a group of eight binary digits:
Why is it used and who might use it?
Binary is the primary language for computers and electronic devices, which can only understand and store data in this form.
However, at the most granular level of the data, complex media data such as images and videos are also stored in binary code. An image, for example, is made up of hundreds of thousands of pixels, with each pixel containing an RGB value (color value) stored in binary code.
Music is also stored in binary format, using a technique called pulse code modulation. This technique digitizes continuous sound waves by taking snapshots (samples) of their amplitudes thousands of times per second and storing each sample in binary; CD-quality audio uses 44,100 samples per second. The binary code of an audio file allows a computer to determine the frequency of the vibration in the coils of the speaker, which then projects the sounds we hear.
Hexadecimal
Quick Summary
Hexadecimal, also known as hex, is a base-16 numbering system used to shorten binary code and make it easier to understand. As computers can only read in 1s and 0s, hex transforms this data into a user-friendly format. In this lab, you will use CyberChef to encode and decode data using hexadecimal.
What is hexadecimal?
Hexadecimal, also called base-16 or hex, is a number system that uses 16 unique symbols to represent a particular value. These symbols are the numbers 0-9 and the letters A-F (in capitals), which represent an equivalent binary or decimal number, starting with the least significant digit on the right-hand side.
The unique hexadecimal symbols appear in the following order:
0 1 2 3 4 5 6 7 8 9 A B C D E F
The decimal 12, for example, would be C in hexadecimal.
Computers only understand binary code (or base 2), which is challenging for humans to read, so programmers use hexadecimal to simplify working with binary. One hexadecimal digit represents four binary digits: whereas eight binary digits are needed to represent one byte, only two hexadecimal digits are required.
Hex is often used in computing due to its simplicity and the ease of converting it to binary. To learn more about the binary numbering system, visit our lab here:
How does it work?
Counting in hex is very similar to decimal, except there are six extra non-numerical digits. Firstly, you count from zero to nine and then from A to F. Once you’ve reached F, you roll that place back to zero and increment the digit to the left by one.
Converting hex to/from binary
Converting between hex and binary is simple because each digit of a hexadecimal number represents four bits (a bit being an individual binary digit) of a binary value. So a byte (eight bits) can always be represented by two hexadecimal digits. Therefore, hex is a concise way to represent a byte or group of bytes.
Converting from binary to hex
To convert binary data to hex, we base our workings on the fact that four binary digits represent one hex digit.
These are the steps for converting binary to hex:
- Split the binary value into groups of four digits, starting at the right-hand side of the data.
- Replace each group of four digits with its matching hex value in the table above.
For example: Convert the binary 0110110001001101 to hex.
Starting at the right-hand side of the binary number, we need to sort the 1s and 0s into groups of four:
Binary groups: 0110 1100 0100 1101
Then, we can use the table above to convert these groups into a single hex digit:
Binary groups: 0110 1100 0100 1101
Hex digit: 6 C 4 D
Our binary number 0110110001001101 is converted to 6C4D in hex.
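The grouping steps above can be sketched in Python; `format` with the “X” specifier converts each 4-bit group (or the whole number) to uppercase hex:

```python
binary = "0110110001001101"
# split into 4-bit groups; slicing from the left is safe here because the
# length is already a multiple of four (otherwise, pad zeros on the left)
groups = [binary[i:i + 4] for i in range(0, len(binary), 4)]
print(groups)  # ['0110', '1100', '0100', '1101']
hex_digits = "".join(format(int(group, 2), "X") for group in groups)
print(hex_digits)                   # 6C4D
print(format(int(binary, 2), "X"))  # 6C4D - converting in one step agrees
```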
Converting from hex to binary
To convert from hex to binary, we simply reverse the process:
- For each hex digit, find the matching four-digit binary value in our table, and replace the one hex value with groups of binary digits.
For example:
Convert the hex DECAF to binary.
First, sort the string into individual digits:
Hex digit: D E C A F
Now convert each hex digit into groups of four binary bits:
Hex digit: D E C A F
Binary groups: 1101 1110 1100 1010 1111
Our hex digit DECAF is converted to 11011110110010101111 in binary.
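The reverse mapping, one hex digit to four bits, can be sketched the same way:

```python
hex_string = "DECAF"
# expand each hex digit into its zero-padded 4-bit group
binary = "".join(format(int(digit, 16), "04b") for digit in hex_string)
print(binary)  # 11011110110010101111
print(format(int(hex_string, 16), "020b"))  # same result in one step
```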
How and why is hexadecimal used?
Computer error codes use a hexadecimal format. For example, the STOP codes displayed on the famous Windows Blue Screen of Death, are always hexadecimal.
Programmers use hexadecimal digits because their values are shorter than decimal equivalents and are more concise than binary. An example of this would be the hexadecimal value F4240. This value is equivalent to 1,000,000 in decimal and 1111 0100 0010 0100 0000 in binary, but the hexadecimal format allows you to store the same information while using less space.
It is also simple to convert between hexadecimal numbers and binary. Hex can therefore be used to represent large binary numbers in just a few digits. This makes it easier for us to read, write and understand data, which also reduces the possibility of human error when working with it.
Hexadecimal is commonly used as an HTML color code to represent a specific color on the color chart.
For example, the hex value FF0000 is used to define the color red. Breaking this down, the value becomes FF,00,00. This then defines the amount of red, green, and blue colors in this specific shade (RRGGBB), which in this example is 255 red, 0 green, and 0 blue.
Two hexadecimal digits can express values from 0 to 255, i.e. 256 values, and HTML color codes use three sets of two digits. This means over 16 million (256 x 256 x 256 = 16,777,216) possible colors can be represented in hexadecimal format, taking up less space than other formats like decimal or binary.
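Splitting a color code into its channels is just a two-digits-per-channel slice; `hex_to_rgb` below is a hypothetical helper name for illustration:

```python
def hex_to_rgb(code: str) -> tuple:
    """Split an RRGGBB hex color code into (red, green, blue) values."""
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("FF0000"))  # (255, 0, 0) - pure red
print(256 ** 3)              # 16777216 possible colors
```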
ASCII
Quick Summary
ASCII is a code for representing 128 characters as numbers. Computers use ASCII to represent text data, which makes it possible to transfer data from one computer to another. In this lab, you will use CyberChef to encode and decode using ASCII.
What is ASCII?
Computers operate using numbers and therefore need a way to convert letters (and other characters) to and from numbers so they can be stored and manipulated. An encoding standard known as the American Standard Code for Information Interchange (ASCII), which debuted in 1963, is used for this purpose. The code represents 128 characters as numbers, with each character assigned a number from 0 to 127. Standard ASCII uses seven bits for each character.
ASCII contains the upper and lower case letters of the English alphabet, crucial punctuation markers, mathematical symbols, and 33 control codes for data transfer and text formatting.
The following characters are grouped like so:
- 0-32 and 127 – control codes for data transfer as well as spaces, tabs, and line breaks.
- 48-57 – digits (numbers).
- 65-90 – upper case letters.
- 97-122 – lower case letters.
- 33-47, 58-64, 91-96, and 123-126 – punctuation marks, mathematical symbols, brackets, and other key characters.
ASCII table
Each letter is assigned a value according to its position within the ASCII table:
There is also an extended ASCII table that contains an additional 128 characters, bringing the total to 256, which can be viewed here.
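This mapping is exposed directly in most languages; Python’s `ord` and `chr` convert between a character and its code:

```python
print(ord("A"))  # 65 - start of the upper case range
print(ord("a"))  # 97 - start of the lower case range
print(chr(48))   # 0  - code 48 is the digit zero

# round-trip a string through its ASCII codes
codes = [ord(c) for c in "Hi!"]
print(codes)                           # [72, 105, 33]
print("".join(chr(n) for n in codes))  # Hi!
```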
How does it work?
All computer data is stored as numbers, which means a computer must transform any characters (such as those pressed on a keyboard) into numbers to process them. ASCII is the mapping process that assigns each character to a number. For a character to display on the screen, the computer memory must read its ASCII number and “draw” it as a glyph.
How and when is ASCII used?
All ASCII-capable applications (like word processors) can read or store text information to and from computer memory. Without ASCII, modern computing would be more complex and less streamlined.
ASCII code provides a common language across all computer operating systems to establish this link between the screen and hard drive. This means, regardless of the software installed on your computer (Windows, Mac, Linux), you’ll be able to read documents, as every software recognizes the binary equivalent of the characters on-screen.
Files using ASCII can also serve as a common denominator for all types of data conversions. For example, if a software program cannot convert its data to another format, both programs can still input and output ASCII files. This makes data conversion possible, even when the two software programs are incompatible.
Base64 Encoding
Quick Summary
Base64 is a binary-to-text encoding scheme used to transfer content-based messages over the internet. When binary data is transmitted to transfer media, systems can misinterpret the data and lose or corrupt it in the transmission process. Base64 is used to prevent this. In this lab you’ll use CyberChef to encode and decode using Base64.
What is Base64 encoding?
Base64 is a binary-to-text encoding scheme that represents binary data and transforms it into an ASCII string format. Base64 encoding schemes are used when data needs to be stored and transferred over media designed to deal with text.
How does it work?
Base64 breaks binary data into groups of three bytes (24 bits), splits each group into four 6-bit blocks, and represents each block as a printable ASCII character. It does this in two steps:
Step one is to break down the binary string into 6-bit blocks. Base64 uses only six bits per output character to ensure the encoded data is printable and human-readable; ASCII special and control characters are not used.
The 64 characters of Base64 encoding (hence the name Base64) are 10 digits, 26 lowercase characters, 26 uppercase characters, the plus sign (+), and the forward-slash (/). The 65th character, known as pad, is the equals sign (=). Pad is used when the last segment of binary data doesn’t contain the full six bits.
All 64 characters in Base64 encoding can be found in the following table:
For example, say we want to convert the plaintext string ab@yz. The binary stream of this string is:
0110000101100010010000000111100101111010.
To encode to Base64 we first break the binary stream into groups of six bits:
011000 010110 001001 000000 011110 010111 101000 (note here that two 0s have been appended to the final grouping to complete the last group of six bits).
These binary groupings translate into the decimal numbers 24, 22, 9, 0, 30, 23, and 40.
Step two is to convert these decimals using the Base64 encoding table.
The Base64 table shows us that:
- 24 is Y
- 22 is W
- 9 is J
- 0 is A
- 30 is e
- 23 is X
- 40 is o
YWJAeXo= (padded with ’=’ to account for extra bits added).
This two-step process is applied to the entire binary string that is being encoded.
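The two steps can be reproduced in Python and checked against the standard library’s `base64` module:

```python
import base64

TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

data = b"ab@yz"
# step one: concatenate the bits and zero-pad to a multiple of six
bits = "".join(format(byte, "08b") for byte in data)
bits += "0" * (-len(bits) % 6)
chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
indices = [int(chunk, 2) for chunk in chunks]
print(indices)  # [24, 22, 9, 0, 30, 23, 40]

# step two: look each value up in the table, then pad the output with '='
encoded = "".join(TABLE[i] for i in indices) + "=" * (-len(data) % 3)
print(encoded)                          # YWJAeXo=
print(base64.b64encode(data).decode())  # YWJAeXo= - the library agrees
```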
Why do we need it?
Base64 encoding is used to transfer data over a system that only supports ASCII formats. Examples of this are email messages on Multipurpose Internet Mail Extension (MIME) and Extensible Markup Language (XML) data. Base64 is also the industry-standard format for SSL certificate content. The most common web servers will generate certificate signing requests and accept SSL certificates in Base64 format.
Additionally, Base64 encoding is used when binary data found in media, such as images or video, is transmitted over systems designed to transmit data in an ASCII (plaintext) format.
Base64 encoding’s necessity comes from the problems that can occur when media, such as those types mentioned above, are transmitted in raw binary format to text-based systems.
Because text-based systems like email interpret binary data as a wide range of characters (including command characters), binary data can be misinterpreted by those systems and lost or corrupted during media transmission. Sending it as plain ASCII text in Base64-encoded format avoids such transmission problems.
Unicode
Quick Summary
Unicode, formally known as the Unicode Standard, provides a unique number for every character, regardless of platform, device, application, or language. Modern software providers have adopted it to transport data through these mediums without corruption. In this lab, you will encode and decode using Unicode in CyberChef.
What is Unicode?
Unicode is a universal character encoding system that assigns a code to the characters and symbols of most languages. It’s the only encoding standard that ensures you can retrieve or combine data using an unlimited combination of languages.
Unicode is used in all major operating systems, search engines, browsers, laptops, and smartphones, as well as on the Internet and the World Wide Web (URLs, HTML, XML, CSS, JSON, etc). As of version 13.1, Unicode now encompasses 144,076 characters used in almost every language in the world, alongside emojis.
To find the Unicode value for emojis, visit the official Unicode website here.
The emergence of Unicode and the availability of tools supporting it are among the most significant recent global software technology trends.
How does it work?
Unicode defines codes (unique numbers) for all characters used in most languages, including diacritics, punctuation marks, mathematical symbols, technical symbols, arrows, and dingbats. Overall, Unicode has codes for over 100,000 characters from world alphabets, ideograph sets, and symbol collections, including the classical and historical texts of numerous written languages.
Unicode can represent characters in different encoding forms, such as UTF-8 and UTF-16.
UTF-8 is a variable-length encoding scheme in which each written symbol is represented by a one to four-byte code, whereas UTF-16 encodes each symbol as one or two 16-bit code units.
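The difference in byte lengths is easy to observe in Python; note that characters beyond U+FFFF take two 16-bit units in UTF-16 (a surrogate pair):

```python
# UTF-8 uses one to four bytes per character depending on the code point
for ch in ("a", "é", "€", "😀"):
    print(ch, len(ch.encode("utf-8")), "byte(s) in UTF-8")
# a 1, é 2, € 3, 😀 4

# UTF-16 uses one 16-bit unit for most characters, two beyond U+FFFF
print(len("a".encode("utf-16-be")))   # 2
print(len("😀".encode("utf-16-be")))  # 4
```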
ASCII and Unicode
Unicode and ASCII are the most popular character coding standards and are frequently compared as a result. However, they have some essential differences.
Unicode is a universal character encoding method used for the consistent encoding, representation, and handling of text expressed in any spoken language. In comparison, ASCII (American Standard Code for Information Interchange) uses 128 English characters (symbols, letters, digits) as a number to represent text in computers.
A significant difference between these coding methods is size. Unicode represents characters from a more substantial character table than ASCII, which has 128 characters (or 256 with extended ASCII). ASCII represents lowercase letters (a-z), uppercase letters (A-Z), digits (0–9), and symbols such as punctuation marks. In contrast, Unicode represents English, Arabic, Greek, mathematical symbols, historical scripts, emojis, and so on, covering a far greater range of characters than ASCII.
Additionally, the first 128 Unicode characters point to ASCII characters, meaning ASCII can be viewed as a subset of Unicode; this isn’t true vice versa.
Unicode also allows characters to be up to 32 bits wide, whereas ASCII uses seven bits (or eight in the extended version) to represent a character. A 32-bit form offers over four billion possible values, far more than enough to represent all languages accurately. Because Unicode allows for larger bit sizes, Unicode text may take up more storage space than ASCII files.
How and why is it used?
The Unicode encoding system provides a consistent method of encoding multilingual plaintext, making it easier to exchange text files internationally. Supporting data with multiple languages such as French, Japanese, and Hebrew, Unicode allows you to combine different language scripts on a single file. Before Unicode, a computer operating system could only process and display the written symbols on its code page, which was tied to one language script. For example, if a computer could process German, it could not process Hebrew or Japanese.
There is a growing trend for new computer technologies to adopt Unicode. Currently, industry leaders such as Microsoft, Apple, HP, IBM, and Oracle have adopted it. Unicode is also the preferred text encoding method in web browsers such as Google Chrome and Firefox, and is used in Java technologies, HTML, XML, Windows, and Office.
Visit the Unicode website here to learn more.
In this lab
In this lab, you will practice encoding and decoding Unicode using CyberChef. You will also search through the Unicode website here to find Unicode values for emojis.
Note
In CyberChef, use the operation ‘Escape Unicode Characters’ to decode a language to its Unicode value and ‘Unescape Unicode Characters’ to encode a Unicode value to its language.
Punycode
Quick Summary
Punycode is an encoding method used to convert Unicode characters to ASCII – a smaller, more restricted character set. However, Punycode can also be exploited to launch a homograph attack, which lures victims into clicking on an illegitimate URL to steal sensitive data or infect a device. In this lab, you will use CyberChef to decode and encode Punycode URLs to identify any suspicious modification.
What is Punycode?
Punycode is an encoding method that converts Unicode characters to a limited ASCII (American Standard Code for Information Interchange) character set.
This character set, called the Letter-Digit-Hyphen (LDH) subset, consists of:
- Lowercase letters: a-z
- Digits: 0-9
- Special character: hyphen (-)
Punycode is primarily used to process Internationalized Domain Names (IDNs). Hostname resolution is limited to ASCII characters, so Punycode allows Unicode characters to be used in hostnames by representing them in ASCII format. For example, if a domain is composed of Chinese characters, Punycode will encode them into an ASCII format.
How does it work?
Punycode allows domain names to include non-ASCII characters by using a bootstring algorithm, which encodes Unicode characters within a limited set of ASCII characters. The algorithm interprets any string passed to it, analyzes it for non-ASCII characters, and then goes through a series of steps to create an encoded string usable on ASCII systems. Along the way, it performs normalization on the string to eliminate ambiguities that may exist in the Unicode-encoded text.
Step one: The process of normalization is performed on the string to eliminate any Unicode text ambiguity. Normalization involves converting all characters into lowercase (a more ‘normal’ format), where applicable. The string is then searched for characters that exist within the ASCII character set. Any characters found within the set are ignored; however, any non-standard ASCII characters are removed and a hyphen is added to the end of the string.
Step two: If any non-standard ASCII characters are found, the prefix **xn--** is added to the string. This signifies that the string contains ACE (ASCII Compatible Encoding) and that the hyphen appended should be interpreted using Punycode rather than as part of the string itself.
Step three: Punycode analyzes the non-ASCII characters and appends a string of characters to the hyphen. Here, it uses ASCII characters to dictate which characters should be represented and where they should be placed within the string. While doing this, Punycode ensures that the end string does not exceed a 63-character limit.
To see these steps in action, we will use ‘adιdas.com’ but with the Greek letter ‘ι’ rather than ‘i’. Here, the ‘ι’ will be removed, the prefix xn-- will be added to the start of the string to signify the presence of Unicode characters, and a hyphen will be added to the end of the string. Punycode will then add characters to the ending hyphen to represent the Unicode character.
Our final string would be: **xn--addas-rbe.com**
Though the resulting text is not the most straightforward to read, it accurately represents the original string of Unicode characters while using only previously allowed characters for domain names.
Here is an example of the process for a domain name using all Unicode characters:
Punycode allows the encoding of any 8, 16, or 32-bit character. In the example above, we have used the domain ‘www.날씨.co.kr’. The characters ‘날씨’ are converted by Punycode to ‘xn--i20bj30b’. The newly converted URL can now be represented as ‘www.xn--i20bj30b.co.kr’. This means that if you bought the domain name ‘www.날씨.co.kr’, you would actually be buying ‘www.xn--i20bj30b.co.kr’.
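Python ships a `punycode` codec implementing this bootstring algorithm, so both examples can be reproduced directly; note the codec emits only the part after the ‘xn--’ prefix, which the IDNA layer adds separately:

```python
# the punycode codec produces everything after the xn-- ACE prefix
print("adιdas".encode("punycode"))  # b'addas-rbe'
print("날씨".encode("punycode"))     # b'i20bj30b'
```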
By default, many web browsers use the xn-- prefix to indicate that the domain is using Punycode to represent some Unicode characters. However, not all browsers show the Punycode prefix, leaving the potential for a homograph attack.
Homograph attacks
Though Unicode characters may look the same in a web browser address, their appearance can be deceptive. For instance, many letters in the Roman alphabet look similar to letters in Greek and Cyrillic. This similarity means it can be easy for a malicious actor to replace ASCII characters with Unicode characters in a domain name.
For example, we could swap the English letter ‘T’ for a Greek Tau: ‘τ’ – an almost identical symbol to the naked eye. However, the Punycode URL would be ‘xn--5xa’ rather than just ‘T’, and if a browser address bar renders in a way that does not indicate that Punycode is present, these character changes can be hard to identify. This ambiguity can lead to a user unintentionally clicking on a harmful link.
The technique of swapping out ASCII characters for Unicode is called a homograph attack. The URL will look authentic, and the page content may appear the same, but it will be a different website set up with malicious intentions, such as stealing the victim’s sensitive data or infecting their device. Homograph attacks use techniques like phishing, forced downloads, and scams to fulfill these aims.
How and why is it used?
Punycode is valuable for processing internationalized domain names. For example, characters in the Korean character system, Hangul, cannot be adequately encoded using ASCII. Therefore, Punycode takes Unicode encoded strings and converts them into a universally readable and resolvable ASCII string.
Before Punycode, companies and services operating in Korea (or other countries using a character-based language) had to adjust their domain name to fit the ASCII restrictions. For example, ‘날씨 ’ translates to ‘weather’ in Korean. Pre-Punycode, a website would have had to change its domain name to an ASCII acceptable format, such as ‘www.weather.co.kr’. Now, a website can use the domain name ‘www.날씨.co.kr’ intact with Unicode characters. This allows companies and services to keep their brand name identity while operating in markets that do not use the Latin alphabet.
In this lab
In this lab, you will encode and decode Punycode URLs to identify any suspicious modification.