Friday, November 19, 2010

Linux patch increases multi-tasking performance

A couple days ago, a programmer named Mike Galbraith submitted a patch for the Linux kernel that significantly improves multi-tasking performance on the Linux desktop. With this patch, you can run multiple CPU-intensive applications at the same time more effectively. Phoronix has two videos demonstrating the effect of this patch. Each video shows a computer that has a number of tasks running simultaneously, such as compiling the Linux kernel and playing a high-definition video. On the computer without the patch, the video stuttered badly and looked like a still image most of the time. On the computer with the patch, the video played almost perfectly. The patch is only about 200 lines of code, which is small compared to the improvement it brings. Linus Torvalds, the developer in charge of the Linux kernel, liked the patch very much, calling it "one of those 'real improvement' patches", meaning that it creates few, if any, negative side-effects.

The patch achieves these performance gains by tweaking the scheduler, the part of the kernel that decides when each running program gets CPU time. It works by grouping all programs by the TTY (terminal) that they were started in. CPU time is then divided evenly among the groups, as opposed to being divided evenly among the individual programs and threads. Slashdot user tinkerghost gives a good example:

As an example, we need to run extinguish_fire and evacuate_building at the same time. extinguish_fire spawns a thread for each bucket in the brigade, while evacuate_building only spawns a thread for each escape route. Now, if there are 96 buckets and 4 escape routes, extinguish_fire will consume 96% of the CPU and choke out the evacuate_building threads...By grouping all of the threads from a program, extinguish_fire and evacuate_building get equal footing regardless of the number of threads they spawn.
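To make the arithmetic in that example concrete, here is a small Java sketch. It is only a toy model, not the kernel's scheduler: it assumes every thread is always runnable and that each program forms its own group, and the class and method names (FairShare, perThread, perGroup) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FairShare {
    /** CPU percentage per program when every thread gets an equal slice. */
    static Map<String, Double> perThread(Map<String, Integer> threadsByProgram) {
        int totalThreads = threadsByProgram.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> share = new LinkedHashMap<>();
        threadsByProgram.forEach((name, threads) -> share.put(name, 100.0 * threads / totalThreads));
        return share;
    }

    /** CPU percentage per program when every group gets an equal slice. */
    static Map<String, Double> perGroup(Map<String, Integer> threadsByProgram) {
        double slice = 100.0 / threadsByProgram.size();
        Map<String, Double> share = new LinkedHashMap<>();
        threadsByProgram.keySet().forEach(name -> share.put(name, slice));
        return share;
    }

    public static void main(String[] args) {
        Map<String, Integer> threads = new LinkedHashMap<>();
        threads.put("extinguish_fire", 96);   // one thread per bucket
        threads.put("evacuate_building", 4);  // one thread per escape route

        System.out.println(perThread(threads)); // {extinguish_fire=96.0, evacuate_building=4.0}
        System.out.println(perGroup(threads));  // {extinguish_fire=50.0, evacuate_building=50.0}
    }
}
```

Under grouping, each of the 4 evacuate_building threads ends up with 12.5% of the CPU instead of 1%, while each of the 96 extinguish_fire threads drops to roughly 0.52%.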

Because it groups according to the TTY, it seems that you will only notice a performance improvement if you are running command-line programs. From what I've read, GUI programs are TTY-less, so with this patch they would all be lumped into the same group and there would be no performance gain. To take advantage of the new scheduler, you would have to launch a program from a new terminal to place it in another group.

Today, a Red Hat developer named Lennart Poettering found out that you can achieve the same thing just by running a few commands and editing a configuration file. This alternative solution is implemented differently, grouping programs by session instead of TTY, but it achieves the same effect. Linus says that this is actually how the kernel patch was originally tested, but that it should still be included in the kernel because it's such a good improvement.

Tuesday, November 9, 2010

Internationalization in Java

Internationalization means developing your application in such a way that people from different countries who speak different languages can still use it. For example, a French user, when presented with a "Yes/No" dialog, should instead be shown a "Oui/Non" dialog. The term "internationalization" is often abbreviated to "i18n". The "18" represents the eighteen letters between the first and last letters of the word.

Java provides a very nice way of doing this. It involves putting all of the text that your application uses into .properties files, which reside somewhere on the classpath. Each translation has its own .properties file and is named according to the language and (optionally) the country. For example, a German properties file would be named "messages_de.properties" ("de" being the standard two-letter ISO 639 code for German). A British English properties file would be named "messages_en_GB.properties" (the ISO country code for the United Kingdom is "GB", not "UK").

The properties files are organized in a hierarchy. This means that when searching for a particular string, the language/country file is looked for first (messages_en_GB.properties), followed by the language file (messages_en.properties), followed by the default file (messages.properties). One benefit of this is that if two translations are mostly the same (like US and UK translations), the parent file can contain most of the text, while the child files define only the differences. This helps to reduce duplication. It also allows for "fall-back" translations. For example, if there is no American English file (messages_en_US.properties), then the English file (messages_en.properties) is used.
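The search order can be inspected with ResourceBundle.Control, which is the class that generates the candidate names. The sketch below is just an illustration: the base name "messages" and the CandidateNames class are my own choices, and it only shows the chain for one requested locale (the real getBundle lookup may also try the default locale before giving up).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class CandidateNames {
    /** Lists the .properties file names searched for a locale, most specific first. */
    static List<String> fileNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return names;
    }

    public static void main(String[] args) {
        // Locale.UK is the built-in constant for British English (en_GB).
        System.out.println(fileNames("messages", Locale.UK));
        // [messages_en_GB.properties, messages_en.properties, messages.properties]
    }
}
```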

To access these files in the Java code, the ResourceBundle class is used like so:

import java.util.ResourceBundle;

ResourceBundle messages = ResourceBundle.getBundle("com/example/messages");
System.out.println(messages.getString("hello.world"));

In this example, the .properties files are located in the "com.example" package and have names that begin with "messages". The code finds the property named "hello.world" and prints its value to the console.

To determine which language to use, ResourceBundle looks at the default Locale (Locale.getDefault()), which the JVM sets automatically at startup from the operating system's settings. For example, a French user running your application will automatically be shown the French translation because his or her default locale is French. No extra work needs to be done by the application.
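Here is a self-contained sketch for trying the lookup without setting up a project. Everything in it is an assumption for demo purposes: it writes two throwaway .properties files to a temporary directory, puts that directory on a URLClassLoader, and passes the locale explicitly instead of relying on the default. The keys ("yes", "no", "cancel") and the BundleLookupDemo class are made up. It needs Java 11 or later for Files.writeString.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleLookupDemo {
    /** Writes a default and a French bundle to a temp directory and returns a loader for it. */
    static ClassLoader writeBundles() throws IOException {
        Path dir = Files.createTempDirectory("i18n-demo");
        Files.writeString(dir.resolve("messages.properties"), "yes=Yes\nno=No\ncancel=Cancel\n");
        // The French file deliberately omits "cancel" to show the fall-back to the parent.
        Files.writeString(dir.resolve("messages_fr.properties"), "yes=Oui\nno=Non\n");
        return new URLClassLoader(new URL[] { dir.toUri().toURL() });
    }

    /** Looks up a key for an explicitly chosen locale. */
    static String lookup(ClassLoader loader, Locale locale, String key) {
        return ResourceBundle.getBundle("messages", locale, loader).getString(key);
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = writeBundles();
        System.out.println("Default locale: " + Locale.getDefault());
        System.out.println(lookup(loader, Locale.FRANCE, "yes"));    // Oui
        System.out.println(lookup(loader, Locale.FRANCE, "cancel")); // Cancel
    }
}
```

There is no messages_fr_FR.properties, so the request for Locale.FRANCE lands on messages_fr.properties, and the missing "cancel" key is then inherited from messages.properties.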

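Additionally, you can add arguments to your property values. This allows you to customize a message at runtime. For example, the following property contains two arguments:

hello.someone=Hello, {0}! I''m {1} to see you!

To populate the arguments, you must use the MessageFormat class. Note the doubled apostrophe in "I''m": MessageFormat treats a single apostrophe as a quoting character, so a lone one would be dropped from the output and would stop {1} from being replaced.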

import java.text.MessageFormat;
import java.util.ResourceBundle;

ResourceBundle messages = ResourceBundle.getBundle("com/example/messages");
String hello = messages.getString("hello.someone");
System.out.println(MessageFormat.format(hello, "Joe", "happy"));

The above code would print "Hello, Joe! I'm happy to see you!" to the console.
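One thing to watch out for is how MessageFormat handles apostrophes. It uses the single quote as an escape character, so a literal apostrophe has to be written twice in the pattern. The sketch below passes the patterns in directly as strings rather than loading them from a bundle; the ApostropheDemo class and greet method are just names I picked.

```java
import java.text.MessageFormat;

class ApostropheDemo {
    /** Fills the numbered placeholders in a MessageFormat pattern. */
    static String greet(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        // Doubled apostrophe: printed as one apostrophe, and {1} is substituted.
        System.out.println(greet("Hello, {0}! I''m {1} to see you!", "Joe", "happy"));
        // Hello, Joe! I'm happy to see you!

        // Single apostrophe: it opens a quoted section, so {1} is treated as literal text.
        System.out.println(greet("Hello, {0}! I'm {1} to see you!", "Joe", "happy"));
        // Hello, Joe! Im {1} to see you!
    }
}
```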

Friday, November 5, 2010

How HTTPS Works

I'm currently reading the book Java Web Services: Up and Running by Martin Kalin. In Chapter 5, he discusses issues related to security. He starts out by giving a brief overview of HTTPS.

HTTPS is a secure version of HTTP, the protocol that web browsers use to access websites over the Internet. With HTTPS, all communication is encrypted so that it can't be read or tampered with by malicious attackers, even if they manage to intercept it. This is crucial to ensuring that, for example, nobody steals your credit card information when you purchase something from an online shopping site.

When your browser visits an HTTPS website, it first must initiate the connection in a process known as a handshake. The browser starts by requesting the server's digital certificate. The digital certificate contains the server's public key as well as a digital signature that vouches for the certificate. The signature usually comes from a CA (certificate authority) such as VeriSign, but a certificate can also be self-signed, meaning it is signed with its own key. The browser checks its trust-store to see if it has either (a) a certificate matching the server's certificate or (b) a certificate belonging to whoever made the digital signature. For example, the browser's trust-store may not have a certificate for Amazon, but it probably does have a certificate for VeriSign, which is the CA that signed Amazon's certificate.
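Java programs have a trust-store too, and peeking into it gives a feel for what the browser is checking against. This sketch loads the JVM's default trust-store (not a browser's, which is a separate list) and counts the CA certificates in it. The TrustStorePeek class and its method names are my own, and isSelfIssued only compares the subject and issuer names without verifying the signature.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.Arrays;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

class TrustStorePeek {
    /** Returns the CA certificates in the JVM's default trust-store. */
    static X509Certificate[] trustedIssuers() throws GeneralSecurityException {
        TrustManagerFactory factory =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        factory.init((KeyStore) null); // null selects the JVM's default trust-store
        for (TrustManager manager : factory.getTrustManagers()) {
            if (manager instanceof X509TrustManager) {
                return ((X509TrustManager) manager).getAcceptedIssuers();
            }
        }
        return new X509Certificate[0];
    }

    /** True when the subject and issuer names match (the signature is not verified here). */
    static boolean isSelfIssued(X509Certificate cert) {
        return cert.getSubjectX500Principal().equals(cert.getIssuerX500Principal());
    }

    public static void main(String[] args) throws GeneralSecurityException {
        X509Certificate[] issuers = trustedIssuers();
        long selfIssued = Arrays.stream(issuers).filter(TrustStorePeek::isSelfIssued).count();
        System.out.println(issuers.length + " trusted CA certificates, " + selfIssued + " self-issued");
    }
}
```

The exact numbers depend on the Java installation, so I have not listed an expected output.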

If the browser can't find an appropriate certificate in its trust-store, then it will show a scary security warning saying that it's dangerous to proceed with the connection. The danger is that a malicious attacker could create a certificate which tries to present itself as being from a reputable organization, like Amazon. He or she could create a fake website which looks like the Amazon website and fool you into buying something, thus giving him or her access to your credit card information.

If the certificate validates against the trust-store, the client generates a pre-master secret key, which is a random string of 48 bytes. It then encrypts it with the server's public key and sends it to the server. Since only the server has the private key, only the server can decrypt it, which means that the key can't be read by a malicious attacker even if it is intercepted. Public/private key encryption is called asymmetric encryption. Then, the client and the server each use the pre-master secret key, together with random values they exchanged at the start of the handshake, to create a master secret key. Because they both used the same inputs, the master secret key will be identical on both the client and server. This master secret key is then used to encrypt and decrypt all subsequent communication between the client and server. This is called symmetric encryption because only one key is needed to both encrypt and decrypt the data. Symmetric encryption is much faster than asymmetric encryption (a figure of about 1000 times faster is often quoted, though the real ratio depends on the algorithms and hardware).
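The key exchange described above can be sketched in Java with the standard crypto classes. This is an illustration of the idea and not real TLS: the SHA-256 hash stands in for the actual TLS key derivation function, the choice of RSA with OAEP padding and AES-GCM is mine, and both "client" and "server" run in the same method. The HandshakeSketch class and its method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class HandshakeSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        RANDOM.nextBytes(bytes);
        return bytes;
    }

    /** Stand-in for the TLS key derivation: hashes the shared inputs into an AES key. */
    static SecretKeySpec deriveKey(byte[] preMaster, byte[] clientRandom, byte[] serverRandom)
            throws GeneralSecurityException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        sha256.update(preMaster);
        sha256.update(clientRandom);
        sha256.update(serverRandom);
        return new SecretKeySpec(sha256.digest(), "AES");
    }

    /** Runs the exchange and returns the message as the "server" decrypts it. */
    static String roundTrip(String message) throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair serverKeys = generator.generateKeyPair();
        byte[] clientRandom = randomBytes(32); // exchanged in the clear in a real handshake
        byte[] serverRandom = randomBytes(32);

        // Client: create the 48-byte pre-master secret and encrypt it with the public key.
        byte[] preMaster = randomBytes(48);
        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, serverKeys.getPublic());
        byte[] encryptedPreMaster = rsa.doFinal(preMaster);

        // Server: only the private key can recover the pre-master secret.
        rsa.init(Cipher.DECRYPT_MODE, serverKeys.getPrivate());
        byte[] recoveredPreMaster = rsa.doFinal(encryptedPreMaster);

        // Both sides derive the same symmetric key from the same inputs.
        SecretKeySpec clientKey = deriveKey(preMaster, clientRandom, serverRandom);
        SecretKeySpec serverKey = deriveKey(recoveredPreMaster, clientRandom, serverRandom);

        // Client encrypts with its key; server decrypts with its own copy.
        byte[] iv = randomBytes(12);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, clientKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = aes.doFinal(message.getBytes(StandardCharsets.UTF_8));
        aes.init(Cipher.DECRYPT_MODE, serverKey, new GCMParameterSpec(128, iv));
        return new String(aes.doFinal(ciphertext), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        System.out.println(roundTrip("4111 1111 1111 1111")); // 4111 1111 1111 1111
    }
}
```

The slow asymmetric step happens only once, on 48 bytes. Everything after that uses the fast symmetric key, which is the whole point of the two-stage design.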