07

Diy: make your own ciphers

OK, enough theory, let’s get down to actually doing something now. In this chapter, we’ll guide you through the rather interesting journey of creating your own cipher. The first question that crops up is “What cipher should Classical ciphers are the best examples, and are the easiest ones as well. The problem with them is that they work on the level of ‘letters’. Classical ciphers don’t let you go below that level. In our daily life, we deal with a lot of data that’s not just a collection of letters. For example, there’s no easy way to convert an image or a video to letters. Of course, the word ‘letter’ is being used loosely here. Another question could be “Letters in which language?” Modern encoding techniques can represent letters from quite a long list of languages using clever techniques to organize bits. This leaves us with one option – abandon classical ciphers because classical ciphers cannot be used to encrypt data well enough in a digital world.

DIY: MAKE YOUR OWN CIPHER

So now that you know a decent bit about cryptography, how about making your own little cipher?


OK, enough theory, let’s get down to actually doing something now. In this chapter, we’ll guide you through the rather interesting journey of creating your own cipher.

The first question that crops up is “What cipher should Classical ciphers are the best examples, and are the easiest ones as well. The problem with them is that they work on the level of ‘letters’. Classical ciphers don’t let you go below that level. In our daily life, we deal with a lot of data that’s not just a collection of letters. For example, there’s no easy way to convert an image or a video to letters. Of course, the word ‘letter’ is being used loosely here. Another question could be “Letters in which language?” Modern encoding techniques can represent letters from quite a long list of languages using clever techniques to organize bits. This leaves us with one option – abandon classical ciphers because classical ciphers cannot be used to encrypt data well enough in a digital world.

 

Coming to modern ciphers, we have two options: asymmetric algorithms (or public key algorithms) and symmetric algorithms (or private key algorithms). It’s well known that asymmetric algorithms provide more security by allowing the key used for encryption to be known by anyone. However, most asymmetric algorithms are based on very complicated mathematical calculations. Though we’re going to create our own algorithm, and we would want to make a secure one, designing an asymmetric algorithm would require quite a lot of explanation and in-depth understanding of number theory, which is beyond the scope of this book. So let’s stick to a symmetric algorithm, and move on to creating it. We’ll do this stepwise.

Step 1

The first factor to be considered before starting the actual design procedure is to decide on the block and key length. The bigger both are, the better the security. This is because small blocks and/or small keys can make the ciphertext vulnerable to cryptanalysis attacks.

Of course, you should also know that as the block size and key length increases in length, the cipher process slows down. All CPUs are designed to work with a certain amount of data per cycle. For example, 32-bit computers work with 32 bits per cycle, while 64-bit CPUs can process 64 bits in one cycle (max). So when you make the block size larger than that, each block would need multiple cycles to be transformed and that will slow down the entire process. We’ll be using a 32-bit key and a 32-bit block size.

Step 2

Since we’re creating a process which can scramble and unscramble messages using the same key, we obviously need to find a mathematical process that’s reversible. Step 2 is choosing that method to serve as one of the key parts of the algorithm. For this example, we’re choosing a simple ‘XOR’ method.

The table will illustrate the XOR method: The idea is simple: You take two bits. If both bits are the same (either both 1s or both 0s) then the result is 0. If they’re different, the result is a 1. Notice that if you reverse the order of the rows and read bottom up (treat the last row as the First Bit and the First Bit as the XOR Result row) even then the results hold true. This is the simplest way to know that the process is easily reversible. In our example, the first bit row is the actual data, and the second bit would be the key. As long as you know the key (Second Bit), you can encrypt and decrypt.

Step 3

As we said earlier, small block-and key sizes are a bad combination that can be cracked easily. However, that’s true only if the “algorithm” is known by the person trying to crack the cipher (someone who’s not the intended recipient). In this case we won’t be increasing the block size but altering the key size so that the strength of the cipher increases. You could also increase the block size to further increase the strength, but we’re trying to keep things simple to understand.

We’re going to increase the key size using a very simple technique – we take the original key (32-bit) and split it into four parts. Remember, each bit sequence of 4 bits corresponds to one hexadecimal number. We will use that here instead of dealing with bits directly to make it easy to read. Our key when represented in hexadecimal would be a sequence of 8 hexadecimal numbers. For simplicity, our key will be “02468ACE”. We’re going to derive a 128-bit key from the original 32-bit key by working with chunks of 8 bits from the original key using the following table: If you notice, the pattern is actually simple: we rotate the chunk’s positions by one and we do this three times. We then write them in sequence making the derived key:

 

02468ACECE02468A8ACE0246468ACE02

We’ll use this key to run the main algorithm. Before we proceed, there are a few things you need to note:

  • The algorithm we’ve used to derive the 128-bit key from the original 32-bit key is very weak and can’t be used for any real secure communication.

  • There are better ways to generate a 128-bit key from a 32-bit key. For example, you can do bit shuffling rather than shuffling blocks of 8-bits. By nesting rotations (rotating bits of chunks, as well as rotating the chunks themselves) the generated sequence would make the procedure of tracking the 32-bit key from the 128-bit key a bit more difficult.

 

  • We could have chosen a 128-bit key in the very beginning, it would have served as well, but then, we would have to send the 128-bit key – thus making it impractical. We do the 32-bit to 128-bit transformation so that the algorithm doesn’t make use of the key directly but first obtains the 128-bit key.

Now, some of you will be wondering how on earth we’re going to encrypt a 32-bit block-size plaintext with a 128-bit key size. Also, if we’re using XOR, how do you XOR a 32-bit block with a 128-bit block? That’s what we’ll show you in the next step.

Step 4

Since the plaintext block size is 32 bit and the key length is 128 bit, they cannot be involved in a direct XOR operation. So what we do here is take consecutive 32-bit blocks from the plaintext and XOR them with consecutive 32-bit blocks of the key. Following this method, the key would get consumed by the fourth block of the plaintext (32 x 4 = 128). The fifth block of the plaintext would then be XORed with the first 32-bit block of the key, and so on. Hence, if the plaintext is

“FDA72BE910CA8AB3C1068EEA735715BCFF2938A31029ABD37ECA928CE7EA53 3” and the 128-key is “02468ACECE02468A8ACE0246468ACE02” (which we got in Step 2) then the XOR operation would be done with chunks in the following manner.We’re going to concatenate the 32-bit blocks of result into one to get the ciphertext. Hence the resulting ciphertext would be: FFE1A127DEC8CC394BC88CAC35DDDBBEFD6FB26DDE2BED59F- 40490CAA1609DC1

Decryption

Decryption of the ciphertext is straightforward in this algorithm. Of course, the two things we’re going to need before beginning the process are the original 32-bit key and the entire ciphertext. We then are going to derive the 128-bit key from the 32-bit key using the process in Step 3. Once we have that, we’re simply going to do the XOR on each 32-bit block of the ciphertext. It would be something like this: We can then concatenate the resulting blocks of plaintext to arrive at the original plaintext. Simple and easy, but was our design really any good? No. It wasn’t. Not only is the encryption easy to break but the design process contains a perspective which is unnecessarily complicated and can be simplified.

How our design process is flawed

While the algorithm does look real good, there is a flaw in its design. The question is where? Let us reconsider the way we’re doing the XOR operation. It’s in a sequence of blocks. The first 32-bit block of plaintext and the first 32-bit block of the key are being XORed. The second 32-bit block of plaintext and the second 32-bit lock of the key are being XORed. Similar for the 3rd and 4th blocks of plaintext and the key. For the fifth 32-bit block of plaintext, we’re reusing the first 32-bit block of the key. If you pay just a little attention, all that we’re doing is to XOR the first 128 bits of the plaintext with the full 128-bit key and the next 128 bits of plaintext with the key again. The table we used to show the conversion of plaintext to ciphertext (table 3) could also have been shown as in the following table (table 5).

By such behavior we can say that the block size that the algorithm is actually using is not 32 bits but 128 bits. Instead of thinking that we’re running the algorithm in chunks of 32 bits, you can rather think that we’re operating in chunks of 128 bits instead. By changing the perspective, we would actually be simplifying the algorithm. But then again, the simplification would only be in terms of thinking, not in terms of writing it as a

program for the computer.

A view from the implementation perspective

We said earlier that implementations of any algorithm on a machine are limited by the capabilities of the machine and that 32-bit machines can take in only 32 bits and 64-bits machines can take only up to 64-bits in one cycle. Hence, when implementing this algorithm on the machine, you cannot just stuff up all the 128-bits of the plaintext in a variable and 128-bits of the key into another variable and run the XOR operation. The XOR operation is always done on bits and the number of bits that a machine can take in is limited. Hence, your implementation as such will have to consider those boundaries. In such a context, you will have to go with either chunks of 32 bits or 64 bits depending on the machine on which you’re going to run the encryption.

It’s the same reason why we have different setup files for 32-bit and 64-bit architectures for the same version of the same software. The original perspective of running the XOR operation in blocks of 32-bit was not all that bad. After all that’s how you would be doing it on a 32-bit machine.

Suitable programming languages for implementation

Just because you can create beautiful web pages with great functionality using PHP and Javascript does not mean you can use those for implementing the cipher we just designed. Some factors which would govern the decision on the programming language to be used for implementation are:

1. Features provided by the language: While PHP and Javascript can be used to do bit-level operations, those are not the things these languages were designed for. The use of PHP and Javascript remains on a higher level than playing with bits. A little lower level programming language such as Java, C, C++, etc., would be what you need.

2. Speed: PHP and Javascript are not really programming languages in the classical sense. These are scripting languages. Though you do write programs in them and they do work, the programs are not compiled. They are interpreted. While Java has a rich range of functionality and can work on bit-levels, it is not really fast at doing that (after all, Java programs too are bytecodes which are interpreted by the JRE). A programming language which when compiled creates high performance code which can run on multiple architectures is deemed ideal for such usage. While Java is cross platform compatible, it’s not faster than C/C++ for bit-level operations. Thus C/C++ are thought to be the best languages to write high performance ciphers with complicated mathematical procedures.

As a side note, we would like to say that some people argue that Haskell is high performance as well, but benchmarks show that it’s not faster than C/C++ (implementation of Haskell itself happens to be in C/C++) and thus it does not really outperform C/C++.

 

Since most modern ciphers deal with mathematical calculations, C/C++ would qualify easily as the duo comes closest to hardware while maintaining high speed for calculations.

A missing piece

We told you earlier that we were not going to illustrate the best ciphering technique. However the algorithm that we have designed has a missing piece in it: the cipher will fail for any data whose size is not a direct multiple of 128 bits (16 bytes). There are multiple ways to do just that. One of those ways would be to pad the plaintext with random data to make its size a perfect multiple of 128 bits and send along the size of data that’s actually to be considered after decrypting along with the key.

Disadvantages of using your own cipher

Nothing comes without its problems. A custom-designed cipher such as the one we illustrated in this chapter are no exceptions.

1. The first disadvantage of making your own cipher is that it’s going to be weak (unless you’re a mathematical whizkid). Well established ciphers are the result of years (if not decades) of mathematical research. It’s next to impossible to design a new algorithm overnight that would be cleverer than years of research done by actual mathematics whizzes.

 

2. The second disadvantage is that no one except you would be able to use it for practical purposes. The software to encrypt and decrypt data using well known algorithms is usually already installed in the operating system or other software which is needed to use them (e.g. web browsers). Your algorithm is going to be just yours. You know the design and how it’s implemented, and no one else can make use of it. Not only that, you need to transfer the algorithm to everyone who wants to use it.

 

3. You need to implement the cipher yourself. By that we mean to say that if you’re going to use your own cipher on your computer, you need to write the program yourself as well. This can take time and can be imperfect or make it insecure if the person who is writing the algorithm skips a step.

 

4. If you encrypt something using your own cipher and then forget the algorithm, or lose the program you use to encrypt and decrypt data, you’re literally locked out of your own data. No other program on this planet is going to help you there. Make sure you have saved the algorithm and the program’s source code somewhere safe other than your own computer (cloud services are a good choice), so that if you ever need it back, you can have it.

Advantages of using your own cipher

Reality is that no one wants to do anything unless it has got at least some advantages. So what are the advantages to having your own cipher? The first advantage (and the only big one) is that it’s yours. You know how it works and you alone know it. So if you’re going to encrypt something using your own cipher and send the data (or store it), you alone will know how to decrypt it! For a cipher to be cracked, the key needs to be known; but that’s not the only thing needed. You also need to know the algorithm. If someone gets the key but does not know the algorithm, you’re still relatively safe. You can implement your own cipher as a program which can be used to ‘password protect’ data – the password you enter should act as the key which can then decrypt the data. If the wrong key is supplied, the decrypted result would be garbage. As you can see, this technique can be used with only what you use; after all it’s your own cipher and you would have wanted to use it only for yourself. Of course you can give the cipher and the program you created to your friend, but the cipher would not be accepted as IETF for being utilized in HTTPS anyway and it would remain private.