Wednesday, February 8, 2012

Encryption by hand

Audio recording of blog post:

The latest edition of 2600 Magazine has an interesting article about encryption. It explains a way to encrypt messages by hand, without the help of a computer. It's a two-step process: First, the message is converted into a series of numbers. And second, it is encrypted using a secret key.

This kind of encryption technique is thought to be used by spies to decrypt messages from number stations. Number stations came into existence shortly after World War II and still exist today. They are short-wave radio stations that are silent most of the time, but occasionally will broadcast a voice that reads off a series of numbers. These numbers are thought to be encrypted messages for spies in the field. The Conet Project has a great collection of freely-downloadable recordings from number stations.

In this blog post, I'll describe how perform this encryption technique.

Cypher

The message must first be converted into a series of numbers before encryption can occur. This is done using what's known as a simple substitution cipher, where each letter in the alphabet corresponds to a numeric code. An example of a very basic substitution cipher looks like this:

Each letter in this simple substitution cipher has a two-digit code.

This cipher assigns two-digit codes to each letter. The code 01 is assigned to the letter A, 02 to the letter B, and so on up to 26 for the letter Z.

However, there is a more efficient way to encode each letter. A cipher called a straddling checkerboard takes advantage of the fact that certain letters in the alphabet are used more often than others. The numeric codes for these letters have only one digit, while the codes for all the other letters have two. This makes the cipher text shorter, which makes the message easier to encrypt and decrypt.

A "straddling checkerboard" is arranged so that the most frequently-used letters are only one digit long.

The image above shows the straddling checkerboard that we'll use. The most frequently used letters in the English language ("AEINORST") appear on the top line. The author chooses to arrange them in a way that makes them easy to memorize ("AT ONE SIR"). However, they can be arranged in any order, as long as the person decrypting your message uses the same arrangement. The rest of the alphabet appears in the rows below.

The dashes you see on the first row are called shift characters. You use them to "jump down" to another row in the table. For example, the number 2 in the first row is a shift character, which means that 2 will be used as a prefix for the characters that are in the row labeled 2 (the second row). The number 6 in the first row is also a shift character, meaning that 6 will act as a prefix for the characters that are in the row labeled 6 (the third row). For example, the word WOMBAT translates to 64 3 29 20 0 1.

But what if your secret message has numbers in it? Well, numbers are encoded in a special way. The # character in the table represents the number escape character, which encodes to 69. The number, when encoded, will start and end with 69. Also, each digit in the number is repeated three times. For example, 2012 becomes 69 222 000 111 222 69.

Encryption

Before the message can be encrypted, a secret key must be generated. The key is a series of random numbers and must be at least as long as the ciphered message itself. If the key is longer than the ciphered message, then "0"s are appended to the message.

So, let's encrypt the message OCEANS11 using the key 62206 45175 09174 12846. Note that, spaces are inserted for readability purposes and are not part of the message or the key.

  32150 47691 11111 69000 (plain text)
- 62206 45175 09174 12846 (key)
-------------------------
  70954 02526 12047 57264 (encrypted)

The plain text message contains 17 digits. Because my key is 20 digits, I append three zeroes to the plain text so that it matches the size of the key. Then, I subtract each digit in the key from each digit in the plain text. If the result is less than zero, I add 10. For example, in the first column, I calculate 3 - 6, which results in -3. Because this is less than zero, I add 10 to get 7 as the final answer. In the second column, the result of 2 - 2 is 0, which is not less than zero, so I leave that as my final answer.

Decryption

  70954 02526 12047 57264 (encrypted)
+ 62206 45175 09174 12846 (key)
-------------------------
  32150 47691 11111 69000 (plain text)

Decrypting a message involves adding each digit in the key to each digit in the encrypted text. If the result is greater than 10, then 10 is subtracted from the result. For example, in the first column, the result of 7 + 6 is 13. This is greater than 10, so I subtract 10 to get 3 as my final answer. In the second column, 0 + 2 is 2, which is not greater than 10, so I leave this as my final answer.

Saturday, February 4, 2012

The Java Collections Framework - Maps

The Collections Framework is a group of classes that allow you to group objects together in various ways and perform operations against those groupings. It is contained within the core Java API, so you don't have to install anything extra to use it. Its features are so basic and powerful that pretty much every piece of software written in Java makes use of this framework in one way or another. The framework has been around since Java 1.2 (1998) and has been improved upon with every subsequent Java release. There are four major collection types: Lists, Sets, Queues, and Maps.

The four basic collection types (List, Set, and Queue all inherit from the Collection interface).

A map is a data structure that contains a collection of key/value pairs. After the key/value pair is added to the map, the key can then be used to retrieve its corresponding value. All of the keys in a map are unique, so if the key in a key/value pair already exists in the map, adding that key/value pair to the map will just replace the existing key's value.

Implementations

The Java Collections Framework contains three main implementations of the Map interface. Much like the Set implementations (which I described in a previous blog post), they differ in how they iterate over their data.

  • HashMap - The most basic implementation. It iterates over its entries in no particular order.
  • LinkedHashMap - This implemention iterates over its entries in either (1) the order in which the entries were added to the map, with the oldest entries coming first or (2) the access time of each entry, with the least recently accessed entry coming first. The order type can be specified in the constructor and it defaults to insertion order.
  • TreeMap - This implementation iterates over its entries by the sorted order of its keys. This requires that the key class implement the Comparable interface, which defines how objects of the class are sorted. If the class does not implement this interface, then a Comparator object can be passed into the TreeMap constructor to define the sorting rules.

Example

The following example demonstrates how to use a map in the Java Collections Framework. This map will hold the names of some of the states in the United States, with the state abbreviation acting as the key.

Map<String, String> states = new HashMap<String, String>();
states.put("AZ", "Azerbaijan");
states.put("TX", "Texas");
states.put("AZ", "Arizona");
states.put("CO", "Colorado");

System.out.println(states.get("AZ"));
System.out.println(states.get("ZZ"));
for (Map.Entry<String, String> entry : states.entrySet()){
  String abbr = entry.getKey();
  String name = entry.getValue();
  System.out.println(name + " (" + abbr + ")");
}

On line 1, a new instance of HashMap is created. The classes that will be used for the keys and values are specified using generics. In this case, both the key and value are Strings.

On lines 2-5, data is added to the map. On line 4, the key is the same as the key on line 2. This will cause the key's original value of "Azerbaijan" to be replaced with "Arizona".

Line 7 shows that the value for "AZ" was indeed replaced. It will print "Arizona". Line 8 will print "null" because the key "ZZ" does not exist in the map (the get() method returns null if the key does not exist).

On lines 9-13, each key/value pair in the map is iterated over and the abbreviation (key) and full name (value) of each state is printed out. The entrySet() method is used to do this. It returns a Set object that contains all of the map's key/value pairs. Because this is a HashMap, the map entries are returned in no particular order. Other map implementations would iterate over these entries differently:

  • LinkedHashMap (insertion order): AZ, TX, CO
    The first entry to be added was "AZ", followed by "TX" and "CO". Replacing a key's value does not count as an insertion, which is why "AZ" comes before and not after "TX".
  • LinkedHashMap (access order): TX, CO, AZ
    "AZ" comes last because it was most recently accessed (on line 7).
  • TreeMap (sorted order): AZ, CO, TX
    Because the keys are Strings, they are sorted alphabetically.