Tuesday, December 14, 2010

Google's Cr-48 netbook

If you haven't heard, Google is releasing its own operating system, called Chrome OS (check out this marketing video). It's based on Linux, but is tailored solely for use with online, web-based applications, such as Gmail and Google Docs. They are also releasing a netbook, called Cr-48, that comes installed with Chrome OS. The whole idea is that all your data is stored in the (mysterious, but reliable) cloud, so if something bad happens to your computer, none of your data is lost. It's almost like we're returning to the dumb-terminal days of yore. A pre-production version of the netbook has been given to a number of people so that Google can get feedback on the device and improve upon it before releasing it to the public.

One thing that differentiates Cr-48 from other netbooks is the keyboard. Google modified it slightly to better integrate it with the OS. The "Caps Lock" key was replaced with a "Search" key, which opens a new tab in the browser. The function keys (F1, F2, etc.) were replaced with keys that perform tasks such as controlling the brightness of the screen, controlling the speaker volume, and going back and forward in the browser.

The touchpad works as follows: tap it with one finger for a left mouse click, tap it with two fingers for a right mouse click, and drag two fingers across its surface to scroll. People online have expressed frustration with this setup, but this is exactly how the touchpad of my Asus Eee netbook functions and I'm satisfied with it. Cr-48 comes with a single USB port with extremely limited functionality. It doesn't support anything other than a mouse or keyboard, not even a thumb drive. It comes with an SD card slot, although Chrome OS currently doesn't recognize it. A headphone jack is included and is fully functional. It has a 16GB solid-state drive.

In terms of internet connectivity, Wi-Fi and 3G (which is on Verizon's network) are supported. You get a free 100MB per month of 3G bandwidth for the first two years. The netbook doesn't have an Ethernet jack, so you're limited to wireless connections. It comes with a VGA port, allowing you to plug in a larger monitor or a projector.

The idea of an OS which stores all of its applications and data in the cloud is an interesting concept, but I'm in no hurry to start using it. Even though Google provides very well designed online applications, I would imagine that they are not as powerful as their desktop equivalents. Cr-48 sounds more like something you'd take with you on the road or to the coffee shop to do casual work.

Friday, December 3, 2010

Java: The Next Generation

A few weeks ago, plans for the next version of Java were released. It will be split into two different versions in order to more quickly get a new version out the door. Two key features that were originally slated to be released in Java 7 will be pushed back to Java 8.

One of these features is called Lambda, which adds closures to Java. From what I understand, a closure is like a function that can be assigned to a variable. This means that the closure can be passed as an argument to another function, and then called inside of the function's body. If the closure is declared inside of another function, it has access to the parent function's local variables. Many languages support this. Here is an example of how closures can be used in Javascript (taken from Wikipedia):

// Return a list of all books with at least 'threshold' copies sold.
function bestSellingBooks(threshold) {
  return bookList.filter(
      function (book) { return book.sales >= threshold; }
  );
}

The function filters the bookList array, returning only the books that have sold a minimum number of copies. It does this by passing a closure into the filter() method, which gets executed on every book in the array. If the closure returns true for a particular book, then that book is kept in the result; otherwise it is left out. Note that the closure can read threshold even though that variable belongs to the enclosing function.
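For comparison, here is a rough sketch of the same filter using the lambda syntax that eventually shipped in Java 8. The Book class and its fields are hypothetical stand-ins for the entries of bookList, so treat this as an illustration rather than a definitive translation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class BestSellers {
    // Hypothetical Book type, standing in for the entries of bookList.
    static class Book {
        final String title;
        final int sales;

        Book(String title, int sales) {
            this.title = title;
            this.sales = sales;
        }
    }

    // Return a list of all books with at least 'threshold' copies sold.
    static List<Book> bestSellingBooks(List<Book> bookList, int threshold) {
        // The lambda captures the local variable 'threshold',
        // just like the JavaScript closure does.
        return bookList.stream()
                .filter(book -> book.sales >= threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(new Book("Popular", 500), new Book("Obscure", 50));
        for (Book book : bestSellingBooks(books, 100)) {
            System.out.println(book.title);
        }
    }
}
```

As in the JavaScript version, books for which the lambda returns true are the ones that stay in the list.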

The other major feature that was pushed back to Java 8 is called Jigsaw. The goal of Jigsaw is to break the JDK into separate modules. This would help applications boot up more quickly. It also would help in situations where a JRE is packaged with the application. This JRE could be reduced in size by only including the modules that the application needs.

Java 7 is slated for release in mid-2011 and Java 8 is to be released sometime in 2012.

Friday, November 19, 2010

Linux patch increases multi-tasking performance

A couple of days ago, a programmer named Mike Galbraith submitted a patch for the Linux kernel which significantly increases the performance of multi-tasking on the Linux desktop. With this patch, you can run multiple, CPU-intensive applications at the same time more effectively. Phoronix has two videos demonstrating the effect of this patch. Each video shows a computer that has a number of tasks running simultaneously, such as compiling the Linux kernel and watching a high definition video. On the computer without the patch, the video was very stuttery and looked like a still image most of the time. But on the computer with the patch, the video plays almost perfectly. The patch is only about 200 lines of code, which is small compared to the large improvements the patch brings. Linus Torvalds, the developer in charge of the Linux kernel, liked the patch very much, calling it "one of those 'real improvement' patches", meaning that it creates few, if any, negative side-effects.

The patch achieves these performance gains by tweaking the scheduler (the scheduler determines when each running program is allowed to get CPU time). It works by grouping all programs by the TTY (terminal) that they were started in. CPU time is then spread out evenly across each group, as opposed to being spread out evenly across each individual program. Slashdot user tinkerghost gives a good example:

As an example, we need to run extinguish_fire and evacuate_building at the same time. extinguish_fire spawns a thread for each bucket in the brigade, while evacuate_building only spawns a thread for each escape route. Now, if there are 96 buckets and 4 escape routes, extinguish_fire will consume 96% of the CPU and choke out the evacuate_building threads...By grouping all of the threads from a program, extinguish_fire and evacuate_building get equal footing regardless of the number of threads they spawn.
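The arithmetic behind that example can be sketched in a few lines. This is only a back-of-the-envelope model, not the kernel's scheduler: it assumes every thread is always ready to run and that CPU time is divided perfectly evenly. The class and method names are made up for illustration.

```java
class GroupScheduling {
    // Fraction of CPU a program gets when time is split evenly across threads.
    static double perThreadShare(int programThreads, int totalThreads) {
        return (double) programThreads / totalThreads;
    }

    // Fraction of CPU a program gets when time is split evenly across groups,
    // with one group per program.
    static double perGroupShare(int groupCount) {
        return 1.0 / groupCount;
    }

    public static void main(String[] args) {
        int fireThreads = 96;      // one thread per bucket
        int evacuateThreads = 4;   // one thread per escape route
        int total = fireThreads + evacuateThreads;

        System.out.printf("Per thread: extinguish_fire %.0f%%, evacuate_building %.0f%%%n",
                100 * perThreadShare(fireThreads, total),
                100 * perThreadShare(evacuateThreads, total));
        System.out.printf("Per group:  each program %.0f%%%n", 100 * perGroupShare(2));
    }
}
```

Under these assumptions, the thread count stops mattering once programs are grouped: two groups simply get half the CPU each.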

Because it groups according to the TTY, it seems that you will only notice a performance improvement if you are running command-line programs. From what I've read, GUI programs are TTY-less. With this patch, GUI programs would all be lumped into the same group, so there would be no performance gain. In order to take advantage of the new scheduler, you would have to run a program from a new terminal in order to place it into another group.

Today, a Red Hat developer named Lennart Poettering found out that you can achieve the same thing just by running a few commands and editing a configuration file. This alternative solution is implemented differently, grouping programs by session instead of TTY, but it achieves the same effect. Linus says that this is actually how the kernel patch was originally tested, but that it should still be included in the kernel because it's such a good improvement.

Tuesday, November 9, 2010

Internationalization in Java

Internationalization means developing your application in such a way so that people from different countries who speak different languages can still use your program. For example, a French user, when presented with a "Yes/No" dialog, should instead be shown a "Oui/Non" dialog. The term "internationalization" is often abbreviated to "i18n". The "18" represents the eighteen letters between the first and last letters of the word.

Java provides a very nice way of doing this. It involves putting all of the text that your application uses into .properties files, which reside somewhere on the classpath. Each translation has its own .properties file and is named according to the language and (optionally) the country. For example, a German properties file would look like "messages_de.properties" ("de" being the standard, two-letter abbreviation for "German"). A British English properties file would look like "messages_en_GB.properties" (the standard country code for the United Kingdom is "GB", not "UK").

The properties files are organized in a hierarchy. This means that when searching for a particular string, the language/country file is looked for first (messages_en_GB.properties) followed by the language file (messages_en.properties) followed by the default file (messages.properties). One benefit to this is that if two translations are mostly the same (like US and UK translations), the parent file can contain most of the text, while the child files can define the differences. This helps to reduce duplication. It also allows for "fall-back" translations. For example, if there is no American English file (messages_en_US.properties), then it will use the English file (messages_en.properties).
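To see the lookup order for yourself, the sketch below asks ResourceBundle.Control which file names it would try for a given locale. The base name "messages" and the class name are just examples. It only lists the candidates for the requested locale; as far as I know, getBundle() may also try the default locale's files before settling on the base file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleLookupOrder {
    // Lists the .properties file names that would be tried for a locale,
    // from most specific to least specific.
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return names;
    }

    public static void main(String[] args) {
        // Locale.UK is the built-in constant for British English (language "en", country "GB").
        for (String name : candidateNames("messages", Locale.UK)) {
            System.out.println(name);
        }
    }
}
```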

To access these files in the Java code, the ResourceBundle class is used like so:

ResourceBundle messages = ResourceBundle.getBundle("com/example/messages");
System.out.println(messages.getString("hello.world"));

In this example, the .properties files are located in the "com.example" package and have names that begin with "messages". It finds the property with a name of "hello.world" and prints its value to the console.

To determine which language to use, the Java application looks at the default Locale, which is automatically set in every Java program. ResourceBundle uses the default Locale to determine which properties file to use. For example, a French user using your application will automatically be shown the French translation because his or her default locale is set to France. No extra work needs to be done by the application.

Additionally, you can add arguments to your property values. This allows you to customize a message at runtime.  For example, the following property contains two arguments:

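hello.someone=Hello, {0}! I''m {1} to see you!

Note the doubled apostrophe in "I''m". MessageFormat treats a single apostrophe as a quote character, so a lone one would swallow the {1} argument instead of replacing it. To populate the arguments, you must use the MessageFormat class like so: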

ResourceBundle messages = ResourceBundle.getBundle("com/example/messages");
String hello = messages.getString("hello.someone");
System.out.println(MessageFormat.format(hello, "Joe", "happy"));

The above code would print "Hello, Joe! I'm happy to see you!" to the console.
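The apostrophe rule in MessageFormat is easy to trip over, so here is a small sketch that formats the same message with a doubled and a single apostrophe. The patterns are passed in directly rather than loaded from a .properties file, and the class name is made up.

```java
import java.text.MessageFormat;

class ApostropheDemo {
    // Fills the {0} and {1} arguments of a MessageFormat pattern.
    static String greet(String pattern, String name, String mood) {
        return MessageFormat.format(pattern, name, mood);
    }

    public static void main(String[] args) {
        // Doubled apostrophe: produces a literal apostrophe, and {1} is replaced.
        System.out.println(greet("Hello, {0}! I''m {1} to see you!", "Joe", "happy"));

        // Single apostrophe: starts a quoted section, so {1} is NOT replaced.
        System.out.println(greet("Hello, {0}! I'm {1} to see you!", "Joe", "happy"));
    }
}
```

Only the first call produces the intended greeting; in the second, the text after the apostrophe is treated as literal.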

Friday, November 5, 2010

How HTTPS Works

I'm currently reading the book Java Web Services: Up and Running by Martin Kalin. In Chapter 5, he discusses issues related to security. He starts out by giving a brief overview of HTTPS.

HTTPS is a secure version of HTTP, the protocol that web browsers use to access websites over the Internet.  With HTTPS, all communication is encrypted so that it can't be intercepted or altered by malicious attackers.  This is crucial to ensuring that, for example, nobody steals your credit card information when you purchase something from an online shopping site.

When your browser visits an HTTPS website, it first must initiate the connection in a process known as a handshake. The browser starts by requesting the server's digital certificate. The digital certificate contains the server's public key as well as a digital signature, which is said to sign the certificate. The digital signature is usually from a CA (certificate authority) such as VeriSign, but can also be self-signed. The browser checks its trust-store to see if it has either (a) a certificate matching the server's certificate or (b) a certificate corresponding to the digital signature. For example, the browser's trust store may not have a certificate for Amazon, but it probably does have a certificate for VeriSign, which is the CA that signed Amazon's certificate.

If the browser can't find an appropriate certificate in its trust-store, then it will show a scary security warning saying that it's dangerous to proceed with the connection. The danger is that a malicious attacker could create a certificate which tries to present itself as being from a reputable organization, like Amazon. He or she could create a fake website which looks like the Amazon website and fool you into buying something, thus giving him or her access to your credit card information.

If the certificate validates against the trust-store, the client generates a pre-master secret key, which is a random string of 48 bytes.  It then encrypts it with the server's public key and sends it to the server. Since only the server has the private key, only the server can decrypt it, which means that the key can't be read by a malicious attacker who intercepts it. Public/private key encryption is called asymmetric encryption. Then, the client and the server use the pre-master secret key to create a master secret key. Because they both used the same pre-master secret key, the master secret key will be identical on both the client and server. This master secret key is then used to encrypt and decrypt all subsequent communication between the client and server. This is called symmetric encryption because only one key is needed to both encrypt and decrypt the data. Symmetric encryption is much faster than asymmetric encryption (about 1000 times faster).
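The two kinds of encryption can be sketched with the standard javax.crypto classes. This is a toy model under several assumptions, not the real TLS handshake: both sides run in one method, the certificate check is skipped, and the "master secret" is derived with a plain SHA-256 hash instead of the key derivation that TLS actually uses. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class HandshakeSketch {
    // Stand-in for the real key derivation: hash the pre-master secret into an AES key.
    static SecretKeySpec deriveKey(byte[] preMaster) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(preMaster);
        return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
    }

    // Sends a message from "client" to "server" and returns what the server decrypts.
    static String roundTrip(String message) {
        try {
            // Server: key pair whose public half would be inside the certificate.
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair serverKeys = generator.generateKeyPair();

            // Client: random 48-byte pre-master secret, encrypted with the public key.
            SecureRandom random = new SecureRandom();
            byte[] preMaster = new byte[48];
            random.nextBytes(preMaster);
            Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            rsa.init(Cipher.ENCRYPT_MODE, serverKeys.getPublic());
            byte[] encryptedPreMaster = rsa.doFinal(preMaster);

            // Server: only the private key can recover the pre-master secret.
            rsa.init(Cipher.DECRYPT_MODE, serverKeys.getPrivate());
            byte[] recoveredPreMaster = rsa.doFinal(encryptedPreMaster);

            // Both sides derive the same symmetric key from the same secret.
            SecretKeySpec clientKey = deriveKey(preMaster);
            SecretKeySpec serverKey = deriveKey(recoveredPreMaster);

            // Symmetric encryption: the same key encrypts and decrypts.
            byte[] iv = new byte[12];
            random.nextBytes(iv);
            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, clientKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(message.getBytes(StandardCharsets.UTF_8));
            aes.init(Cipher.DECRYPT_MODE, serverKey, new GCMParameterSpec(128, iv));
            return new String(aes.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("my credit card number"));
    }
}
```

The slow RSA step is used exactly once, to move the small secret; everything after that uses the fast symmetric cipher.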

Saturday, October 30, 2010

Screen-scraping Wikipedia

In order to screen-scrape a page on Wikipedia, there is one extra step that you must take in order to successfully download a page for processing. You must include a User-Agent header in your HTTP request. Wikipedia requires that this header be included or else it will return a 403 Forbidden error. I found this out thanks to a user on the #mediawiki IRC channel. They suggest that you set the User-Agent to something which uniquely identifies your program or application. They strongly discourage using the User-Agent string of a browser because this signals that you might be doing something malicious.

It is easy to set the User-Agent header in PHP. You can either edit your PHP installation's php.ini file or add the following line of code to your PHP script. The cURL library also supports setting HTTP headers, but this library is not included in the standard PHP installation.

//tell it what value to use for the User-Agent header
ini_set('user_agent', 'My Cool Screen-Scraper (+http://www.mangst.com)');

//includes the above User-Agent header in this request and all subsequent requests
$page = file_get_contents('http://en.wikipedia.org/wiki/Pumpkin');

Note that this is different from the header() function. The header() function is used to set the headers of the HTTP response that the PHP script itself is generating. This has nothing to do with any HTTP requests that the script makes in the process of generating its response.
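The same header can be set from Java as well. Here is a minimal sketch using HttpURLConnection; the class and method names are made up, and the User-Agent value is only an example. Nothing is sent over the network until you actually read the response.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class UserAgentRequest {
    // Prepares (but does not send) a request with a custom User-Agent header.
    static HttpURLConnection prepare(String url, String userAgent) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestProperty("User-Agent", userAgent);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection connection = prepare(
                "http://en.wikipedia.org/wiki/Pumpkin",
                "My Cool Screen-Scraper (+http://www.mangst.com)");
        // Show the header that will be sent with the request.
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

To download the page, call getInputStream() on the returned connection and read from it.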

Monday, October 25, 2010

Poor Man's FTP

In the November issue of Linux Journal Magazine, Kyle Rankin wrote an article about his experience attending the DEF CON conference.  One lesson he took away was the importance of knowing how to use the basic Linux commands, such as vi and sh.  Being familiar with these commands means that you won't be dead in the water if you have to work on a computer with a minimal Linux install.

One of these commands is netcat (nc).  Netcat allows you to open TCP and UDP connections with other computers as well as listen for connections.  Kyle described many interesting ways that you can use this command.  My favorite was using it to transfer files.  I think that this technique would come in very handy if ssh or ftp is not installed.

It's very simple.  The computer receiving the file runs this command:

nc -l 31337 > output_file

And then the computer sending the file runs this command:

nc hostname 31337 < input_file

This will send the file through port 31337 and automatically close the connection when the transfer is complete.  It doesn't matter what port you use, so long as the port isn't being used by another program.

Friday, October 22, 2010

Starbucks Wi-Fi

I ran into a small problem at a Starbucks store the other day.  I had my netbook with me and wanted to connect to their Wi-Fi network.  The service is free, but in order to access the Internet, you must first visit a Starbucks webpage that asks you to accept their terms and conditions.  Any attempt to visit any other website will redirect you to this page.

I like to have my browser reopen all the tabs from my last browsing session when it starts up.  My problem was that, because I have to accept the terms and conditions first, all my tabs would redirect to the Starbucks page.  Clicking the back button after accepting the terms and conditions doesn't return me to my original page.  I think that this is because it's an HTTP 3xx redirect response.  So I basically lose all my tabs.

Getting around this wasn't too tricky.  The terms and conditions page is just an HTML form with a bunch of hidden parameters and a checkbox for "I agree".  I wrote a Java program to parse all the parameters out of the page and submit the form.  So if I run this before opening my browser, my browser will reload all its tabs no problem.  No annoying redirects to the Starbucks page.
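The core of that approach looks roughly like the sketch below. It is not the program from the download, just an illustration: the regular expression assumes the hidden inputs list their attributes in the order type, name, value, and the field names in the example are invented.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HiddenFormFields {
    // Assumes attributes appear in the order: type, name, value.
    private static final Pattern HIDDEN_INPUT = Pattern.compile(
            "<input\\s+type=\"hidden\"\\s+name=\"([^\"]*)\"\\s+value=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Collects the name/value pairs of all hidden inputs, in page order.
    static Map<String, String> parse(String html) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher matcher = HIDDEN_INPUT.matcher(html);
        while (matcher.find()) {
            params.put(matcher.group(1), matcher.group(2));
        }
        return params;
    }

    // Builds the URL-encoded body that would be POSTed back to the form's action URL.
    static String toFormBody(Map<String, String> params) {
        try {
            StringBuilder body = new StringBuilder();
            for (Map.Entry<String, String> entry : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            }
            return body.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Invented sample page, not the real Starbucks form.
        String html = "<form><input type=\"hidden\" name=\"token\" value=\"abc123\">"
                + "<input type=\"hidden\" name=\"dest\" value=\"http://example.com/\"></form>";
        System.out.println(toFormBody(parse(html)));
    }
}
```

The "I agree" checkbox would be added to the map as one more parameter before the body is submitted.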

You can download it here.  I put all the classes in one file to make it simpler.  I also wrote some JUnit tests to test the part that parses the HTML page.  To run it, just compile the file and run java Starbucks -v.  The -v (verbose) is optional and will cause it to print status messages as it's working.  Run this program as soon as you connect to the Starbucks Wi-Fi network (and before you open your browser).

Monday, October 18, 2010

Peer-to-Peer (P2P) Systems

In the current issue of the magazine Communications of the ACM, there is an article called Peer-to-Peer Systems by Rodrigo Rodrigues and Peter Druschel. Along with discussing the pros and cons of P2P networks and including examples of how they are used, the article goes into technical detail about how they work.

The article divides P2P systems into two types. One type is partly centralized. In these systems, there exists a single controller node, which keeps a list of all nodes that are connected to the network, along with the resources that each node is sharing. A good example of this kind of P2P network would be Napster (now non-existent). When you searched for a song on Napster, a request would be sent to a centralized server owned by the Napster folks themselves. This server would then search its database for all computers in the P2P network that had the song, and return this list to you. You then downloaded the song by directly connecting to the computer hosting the file. Without this server, there would be no way to get the song that you were looking for because you wouldn't know what computers, out of all the computers in the entire Internet, are both sharing their music collection and have the song you are looking for.

The other type of P2P network is decentralized. In a decentralized P2P network, there is no controller node that knows about all the computers in the network. Computers connect to the network through a bootstrap node, which is just one computer in the network that makes its IP address publicly known. This lack of centralization makes these types of P2P networks more robust, as the network is not dependent on a single server being functional. But because of this lack of centralization, there is no straight-forward way of knowing what computers are connected to the network or what resources they are sharing. This makes searching for particular resources trickier. The article describes two ways in which a decentralized P2P network can be structured in order to solve this problem of search.

One way of structuring a decentralized P2P network is by using an unstructured overlay (an overlay is a graph that describes how the nodes are connected with each other). Each computer in the network only knows about a few other computers that are also connected to the network. To search for a file, the computer will query its neighbors first. If none of its neighbors have the file, then it will ask its neighbors' neighbors. If none of these computers have the file, then it will ask its neighbors' neighbors' neighbors, and so on. This kind of decentralized network is fine if the resource you are looking for is replicated across many other nodes. However, if the resource is rare, then finding that resource could take a very long time. For example, imagine having to search for a file which exists on only one computer in a network of one million computers. The odds of that computer being within a short search distance of your computer are slim.
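This neighbor-by-neighbor search is essentially a breadth-first search with a hop limit. Here is a small sketch of it, assuming the whole overlay is available as an in-memory map (which a real peer would never have). The node names and the maxHops cutoff are my own additions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FloodSearch {
    // Returns the number of hops to the nearest node holding the file,
    // or -1 if nobody within maxHops has it.
    static int search(Map<String, List<String>> neighbors, Set<String> holders,
                      String start, int maxHops) {
        Set<String> visited = new HashSet<>();
        visited.add(start);
        List<String> frontier = Collections.singletonList(start);

        for (int hops = 0; hops <= maxHops; hops++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                if (holders.contains(node)) {
                    return hops;
                }
                // Queue up this node's neighbors that haven't been asked yet.
                for (String neighbor : neighbors.getOrDefault(node, Collections.<String>emptyList())) {
                    if (visited.add(neighbor)) {
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        return -1;
    }

    public static void main(String[] args) {
        // A simple chain: A - B - C - D, where only D has the file.
        Map<String, List<String>> overlay = new HashMap<>();
        overlay.put("A", Arrays.asList("B"));
        overlay.put("B", Arrays.asList("A", "C"));
        overlay.put("C", Arrays.asList("B", "D"));
        overlay.put("D", Arrays.asList("C"));
        System.out.println(search(overlay, Collections.singleton("D"), "A", 3));
    }
}
```

With a rare file, the frontier has to grow very large before it happens to reach the one node that has it, which is exactly the weakness described above.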

The other way a decentralized P2P network can be structured is by using a structured overlay. I didn't quite understand the specifics of this technique, but it involves the use of unique keys. Each computer in the network is assigned a unique key in such a way that all keys are evenly spread out in the key space. For example, if the key space is 0-999, the first node will be assigned a random key, say 432. Then, the second node will be assigned a key around 932 (999/2+432, on the opposite side of the "circle"). The third node will be assigned a key around either 682 or 182, and so on. Each node only directly knows about its two neighbors, so the overlay graph looks like a circle. The advantage to this type of overlay is that it makes searching much faster. It's able to use these unique keys to quickly find a computer hosting the resource. This is called key-based routing (KBR). Even if the resource is rare, it will still be able to find it quickly (unlike unstructured overlays, which must spend the time to ask each node directly). However, the downside is that there is overhead in maintaining these keys. Extra work must be done every time a node enters or leaves the network, so if the number of computers that are connected to the network is constantly changing (called churn), this may not be the best solution.
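Here is my rough mental model of the key circle, sketched in code. It rests on an assumption the article summary above doesn't spell out: that each resource also gets a key, and the node responsible for it is the first node at or after that key on the circle, wrapping around at the end. It also collapses the whole network into one sorted map, so it shows the key-to-node mapping but not the hop-by-hop routing. All names are invented.

```java
import java.util.Map;
import java.util.TreeMap;

class KeyRing {
    // Node keys in sorted order; the map wraps around to form the "circle".
    private final TreeMap<Integer, String> nodesByKey = new TreeMap<>();

    void join(int key, String node) {
        nodesByKey.put(key, node);
    }

    void leave(int key) {
        nodesByKey.remove(key);
    }

    // Finds the node responsible for a resource key: the first node at or
    // after the key, wrapping around to the lowest key if none is found.
    String lookup(int resourceKey) {
        Map.Entry<Integer, String> entry = nodesByKey.ceilingEntry(resourceKey);
        if (entry == null) {
            entry = nodesByKey.firstEntry();
        }
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        KeyRing ring = new KeyRing();
        ring.join(432, "node-a");
        ring.join(932, "node-b");
        ring.join(182, "node-c");
        System.out.println(ring.lookup(500));
        System.out.println(ring.lookup(950));
    }
}
```

The churn cost shows up in join() and leave(): every time the set of nodes changes, responsibility for some range of keys moves to a different node.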

It was a very good article, but I do have one criticism: it considers applications like SETI@home to be P2P. How is this P2P? You do not communicate with the other peers on the network. You only communicate with the centralized SETI server in order to download new data to process. I think that a better category for SETI@home would be "distributed computing", not P2P.

Saturday, October 2, 2010

How to access your home computer over the Internet with VNC

The VNC protocol gives you remote control of another computer's screen. You can see and interact with the computer as if you were sitting right in front of it.

In this blog post, I'm going to describe how to set up your computer so that you can connect to it anywhere in the world through the Internet.  If your home computer is connected to a router (which it probably is), then the process is a little tricky, which is why I thought it would be helpful to write this blog post.

1. Install a VNC Server

First, you must install a VNC Server on the computer you want to control.

Mac OS X already comes with the necessary software, so you don't have to install anything. To enable Mac's VNC Server, do the following:

a. In System Preferences, click on "Sharing".
b. Check the "Screen Sharing" checkbox to enable it.
c. Click the "Computer Settings..." button.
d. Check the box that says "VNC viewers may control screen with password". Type in a password, then click OK. Remember that your computer will be visible to the world, so make sure the password is secure!

If you are running Windows, you can use TightVNC Server as a VNC Server.

2. Configure your router

Your home computer is probably connected to a router, either through a wired Ethernet connection or a wireless connection. A router connects all of your computers together to form a home network and acts as the gatekeeper to the Internet. But if your computer is connected to a router, then it doesn't have its own public IP address, which is what you need in order for VNC to connect to your computer from the Internet.

To get around this, you must tell your router to forward VNC traffic to the computer you want to control. VNC communicates over port 5900, so you must tell your router to forward all data it receives on this port to port 5900 on your computer. Here is how I did this with my Belkin router:

First, open the router's configuration web page by typing the router's private IP address into a web browser. (Private IP addresses are only visible within your home network; they are not visible from the Internet.)

Then, click on the "Virtual Servers" menu option under the "Firewall" category.  It will ask you for a password. If you haven't configured the router with a password, then just click "Submit".  This page lists all the data that the router will forward to other computers on the network. Pick an empty row and enter 5900 for the Inbound and Private port fields. Then, enter the private IP address of the computer you want to control. Finally, click on the "Enable" checkbox, then click "Apply Changes". You can do this for other services too, like SSH, FTP, or HTTP.

3. Test it out

You'll need a VNC Viewer in order to connect to the computer. Chicken of the VNC is a good VNC Viewer for Mac. For Windows, you can use TightVNC Viewer.

You'll also need the IP address of your router. An easy way to get the IP address of your router is to visit whatismyip.com from one of the computers that are connected to the router.

If the VNC Viewer asks for a display number, enter "0". Display "0" maps to port 5900, display "1" maps to port 5901, display "2" maps to port 5902, etc.

4. Get a free domain name (optional)

DynDNS is a free service that maps your IP address to a domain name like foobar.dyndns.com. Check to see if your router supports this service. My router will automatically update my DynDNS account whenever the router's IP address changes (which can happen often). Using DynDNS means you don't have to memorize your IP address or worry about it changing.

Wednesday, September 22, 2010

More Screen Scraping!

The website CelebrityBookSigningsAndEvents.com contains a list of book signing appearances that different celebrities are making across the country. I like to go to this website occasionally to see if there is anybody I would be interested in seeing. However, it does not contain any search functionality that allows you to see what celebrities are visiting your area. I thought that it would be fun to screen scrape the webpage and make the data more searchable.

Screen Scraping Overview

Screen scraping means taking the HTML source code of a webpage and extracting data out of it. For example, one thing that I wanted to extract from CelebrityBookSigningsAndEvents.com was the titles of all the books. Looking at the source code, I could see a pattern to where the titles were located in the HTML. They all looked something like this:

<div class="modWrap">
    <font size="4">
            <font color="#0000ff">Mysterious Galaxy</font>
    <font size="4">
            <font color="#0000ff">Called To Coach</font>
  ... more tag hierarchies like these

To extract these book titles, I used XPath queries. XPath is a query language that allows you to pull data from specific parts of an XML document. To get all the book titles from my example, the XPath query would look like the following:

//div[@class='modWrap']/p/font/strong/em/u/font

This will return a list of all the font tags that are nested within this particular hierarchy of tags. The two slashes at the beginning mean that it doesn't matter what tags come before the div tag. The [@class='modWrap'] part returns only those div tags that have a class attribute with a value of "modWrap". The PHP code to run this query would look like the following:

//load the HTML source code into a DOM
$html = file_get_contents('http://www.celebritybooksigningsandevents.com/events');
$dom = new DOMDocument();
$dom->loadHTML($html);

//run the XPath query
$xpath = new DOMXPath($dom);
$bookTitleNodes = $xpath->query('//div[@class="modWrap"]/p/font/strong/em/u/font');
foreach ($bookTitleNodes as $node){
  echo $node->textContent, "\n"; //the title is the text within the "font" tag
}


Aside from slight inconsistencies in the structure of the HTML, which I could account for by tweaking the XPath queries, the biggest problem I ran into was the fact that the source code was littered with these two strange characters--ASCII 160 and ASCII 194. Having values greater than 127, they are not part of the normal ASCII character set, which gave me problems in places where I needed to access individual characters in a string. (Together, the bytes 194 and 160 are the UTF-8 encoding of a non-breaking space, which explains the next part.) They appeared as spaces in my web browser, but were not treated as spaces in my PHP code. Simply replacing all of these characters with spaces before creating the DOM fixed this issue.

The DOMDocument::loadHTML() function was throwing warnings that didn't affect the screen scraping results, but that I didn't want appearing on my webpage. You can silence the error messages that a function generates using PHP's @ operator: @$dom->loadHTML($html);
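For Java folks, the book title XPath query from earlier can be tried out with the XPath API that ships with the JDK. This is only a sketch: the sample markup is my own reconstruction of the tag hierarchy the query expects, and unlike PHP's loadHTML(), Java's XML parser insists on well-formed input, so real-world HTML would need cleaning up first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookTitleXPath {
    // Returns the text of every font tag matched by the book title query.
    static List<String> titles(String xhtml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//div[@class='modWrap']/p/font/strong/em/u/font",
                    document, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, well-formed sample in the shape the query expects.
        String sample = "<html><body><div class=\"modWrap\">"
                + "<p><font size=\"4\"><strong><em><u>"
                + "<font color=\"#0000ff\">Mysterious Galaxy</font>"
                + "</u></em></strong></font></p>"
                + "<p><font size=\"4\"><strong><em><u>"
                + "<font color=\"#0000ff\">Called To Coach</font>"
                + "</u></em></strong></font></p>"
                + "</div></body></html>";
        System.out.println(titles(sample));
    }
}
```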

I also ran into a problem of my web server running an earlier version of PHP than my local computer. It took me forever to track down. FYI: DateTime::getTimestamp() is only supported in PHP versions 5.3 and above...


After scraping the page, I save the data to an XML file and use it as a cache. If the file gets to be more than an hour old, it will refresh the cache by re-scraping the original webpage. This keeps the cache up to date with any changes that were made to the original webpage. And by using a cache, CelebrityBookSigningsAndEvents.com is not constantly harassed by requests from my website.
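The freshness check is the only interesting part of the cache. My site does it in PHP, but the idea can be sketched in Java like this. The class name, file name, and method are made up, and the current time is passed in as a parameter so the logic is easy to test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

class ScrapeCache {
    // A cache file is stale if it doesn't exist or was last modified
    // more than maxAge before 'now'.
    static boolean isStale(Path cacheFile, Duration maxAge, Instant now) {
        try {
            if (!Files.exists(cacheFile)) {
                return true;
            }
            Instant modified = Files.getLastModifiedTime(cacheFile).toInstant();
            return Duration.between(modified, now).compareTo(maxAge) > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path cache = Paths.get("events-cache.xml"); // hypothetical cache file name
        if (isStale(cache, Duration.ofHours(1), Instant.now())) {
            System.out.println("Cache is stale: re-scrape the page and rewrite the file.");
        } else {
            System.out.println("Cache is fresh: serve the saved XML.");
        }
    }
}
```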

Tuesday, August 31, 2010


Yesterday, I finished writing a small game in Java. It is called Bemuled! and it's based on the infamous game Bejeweled. The rules of the game differ slightly in that you are only given ten moves in which to score as many points as you can. It uses Java's Web Start technology, so you can launch it simply by clicking on a link in your browser. It took about a week to make and I had a lot of fun making it. You can read more about the technical aspects of the application in the Technical Overview whitepaper.

Sunday, July 25, 2010

Does screen brightness significantly affect battery life?

Many laptops will automatically dim the screen when running on battery power in order to conserve energy. But does this significantly increase the battery life? How much power is actually saved when the screen's brightness is turned down? My guess has always been that the energy savings are negligible, that a darkened screen might give the laptop 10 more minutes of battery life, nothing more.

To discover the answer, I ran a small experiment. I ran a script which polled the battery data on my netbook every 5 seconds. I monitored this data for 2 minutes with the screen on its brightest setting and 2 minutes with the screen on its darkest setting. I disabled my netbook's wireless network card and muted the speakers to avoid any interference in the data. The results were marginally better than I expected:

Setting     Avg. mA    Bat. life (4200 mAh battery)
Brightest   1120.8     3h 42m
Darkest     1018.6     4h 7m

If kept on its darkest screen setting, my netbook battery would last about 25 minutes longer. If the brightness were to be set somewhere in between the two extremes to a level where the screen would actually be readable, the energy savings would probably be about half that--12.5 minutes.

The battery data was collected from "/proc/acpi/battery/BAT0/state". The script used to poll the data was written in PHP and is shown below:

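 <?php
 //poll the battery's discharge rate every 5 seconds
 while (true) {
   $state = file_get_contents('/proc/acpi/battery/BAT0/state');

   //pull the number out of the "present rate" line; skip the sample if it's missing
   if (preg_match('/present rate:\s+(\d+)/', $state, $matches)) {
     echo date('M d G:i:s'), ' ', $matches[1], "\n";
   }

   sleep(5);
 }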


Wednesday, July 21, 2010

UNetbootin Troubles

If you are having trouble running the UNetbootin Linux binary, you can try installing it using apt-get. This will automatically download and install any required dependencies.

1. Add the following lines to "/etc/apt/sources.list", where "[name]" is the codename of your Ubuntu release (for example, "lucid" for Ubuntu 10.04):
deb http://ppa.launchpad.net/gezakovacs/ppa/ubuntu [name] main
deb-src http://ppa.launchpad.net/gezakovacs/ppa/ubuntu [name] main

2. Add the GPG key.
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 72D340A3

3. Get the list of packages from the newly added source.
sudo apt-get update

4. Install UNetbootin. It will appear in the "System Tools" menu.
sudo apt-get install unetbootin

Monday, May 31, 2010

Flair Additions

So I made some improvements to my developer forum flair!

Firstly, I added a Sun Forums flair, which displays your username, number of posts, and number of Dukes.

The background image to the left changes depending on how many "Dukes" you have. Dukes are points you can earn for answering questions. It will also put a star next to your name if you are a moderator.

This one was a little more complicated because I couldn't pass the HTML directly into a DOMDocument object like with JavaRanch. I had to use tidy to clean up the HTML before passing it to DOMDocument:
//tidy the page
$tidy = new tidy("http://forums.sun.com/profile.jspa?userID=$id");
$tidy->cleanRepair();
$tidy = preg_replace('/<\/?nobr>/', '', (string) $tidy); //<nobr> tags must be removed for DOMDocument

//load the page
$html = new DOMDocument();
$html->loadHTML($tidy);
I also had to pretty much do all development on the mangst.com server because the PHP installation on my iMac doesn't have tidy installed and it doesn't support all of the GD graphics functions.

Secondly, I added the ability to view the flair as a dynamically created image (nerdgasm!!!). Just change the "type" parameter to "png" or "jpeg":
<img src="http://www.mangst.com/flair/sun?id=1071155&type=png" />

I was actually surprised how easy this was. I had this fear that generating an image programmatically would be hugely complicated, but it wasn't really. The hardest part was keeping track of the pixels to make sure everything lined up right. I tried to make it look as close as possible to the Javascript and HTML versions, but for some reason the bold version of Verdana comes out looking a lot less bold in the image.

Tuesday, May 25, 2010

JavaRanch Flair

Check out my JavaRanch flair!

I was so inspired by the flair over at Stack Overflow that I thought it would be fun to create my own! It basically works just like Stack Overflow's does. You have three options for how to include it in your webpage:

  • Javascript - You can include it using a <script> tag, which injects the flair into the DOM via Javascript.
  • HTML - You can include it using an <iframe>, which loads the flair as HTML into the frame.
  • JSON - You can get the raw data with JSON and handle the data however you wish with Javascript.
    JSON URL: http://www.mangst.com/flair/javaranch?id=209694&type=json
On the backend, it screen-scrapes your JavaRanch profile page, plucking out the information that it needs. It uses a PHP DOMDocument object to load the HTML, then uses XPath to get the data fields.
$html = new DOMDocument();
$html->loadHTMLFile($profileUrl); //$profileUrl is a placeholder for the URL of the profile page
$xpath = new DOMXPath($html);
$username = $xpath->query("//span[@id='profileUserName']")->item(0)->textContent;

I think it's pretty elegant, though it will break if JavaRanch decides to do any site redesigns. I was afraid that it wouldn't be able to load the HTML into a DOM, since webpage HTML tends not to be well formed XML, which would have prevented me from using XPath. The SimpleXMLElement class wouldn't accept the HTML, but the DOMDocument class did.

Friday, May 7, 2010

SimpleDateFormat and Thread Safety

The Java class SimpleDateFormat converts Dates to Strings and vice versa. For example, the following code converts the String "04/15/2010" to a Date object and back again:
DateFormat df = new SimpleDateFormat("MM/dd/yyyy");
Date d = df.parse("04/15/2010");
String s = df.format(d);
"04/15/2010".equals(s); //true
This class is documented as not being thread-safe, but I decided to see for myself if this was true.

Thread-safety proof
I created a program that proves that SimpleDateFormat is not thread-safe. It creates X threads which run concurrently. Each thread generates Y random Dates, then formats each Date in two ways: using a local instance of SimpleDateFormat that no other thread has access to, and using a static instance of SimpleDateFormat which all threads use. The Strings created by the local instance are added to one List and the Strings created by the static instance are added to another List.

If everything is synchronized properly, then these two lists should be identical. But because SimpleDateFormat is not thread safe and all threads use a shared static instance, the lists do not always come out identical (I've found that around ten threads and ten dates-per-thread consistently produce different lists).

If the call to "staticDf.format()" is wrapped in a synchronized block, then the lists always come out identical, which shows that SimpleDateFormat needs to be manually synchronized and is therefore not thread-safe.
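The synchronized fix can also be sketched on its own, outside of the test program. The helper class below is hypothetical (it is not part of the program listed further down): it routes every call through one lock, then checks its output from several threads against private formatter instances. Because the shared instance is only ever used by one thread at a time, the mismatch count stays at zero.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the test program: every caller goes through
// one lock, so the shared SimpleDateFormat is used by one thread at a time.
class SafeDateFormatter {
  private static final String FORMAT = "MM/dd/yy";
  private static final DateFormat sharedDf = new SimpleDateFormat(FORMAT);

  static String format(Date d) {
    synchronized (sharedDf) {
      return sharedDf.format(d);
    }
  }

  /** Formats dates from several threads at once and counts the wrong results. */
  static int countMismatches(int numThreads, final int loopsPerThread) {
    final AtomicInteger mismatches = new AtomicInteger();
    Thread[] threads = new Thread[numThreads];
    for (int i = 0; i < threads.length; ++i) {
      final long firstDay = (long) i * loopsPerThread;
      threads[i] = new Thread() {
        @Override
        public void run() {
          //a private formatter produces the reference output
          DateFormat localDf = new SimpleDateFormat(FORMAT);
          for (int j = 0; j < loopsPerThread; ++j) {
            Date d = new Date((firstDay + j) * 86400000L);
            if (!localDf.format(d).equals(SafeDateFormatter.format(d))) {
              mismatches.incrementAndGet();
            }
          }
        }
      };
    }
    for (Thread t : threads) {
      t.start();
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return mismatches.get();
  }

  public static void main(String[] args) {
    System.out.println("Mismatches: " + countMismatches(10, 20));
  }
}
```

Locking on the formatter itself keeps the example short. Giving each thread its own SimpleDateFormat instance is another common option that avoids the lock entirely.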

The following command will create ten threads, each of which will generate twenty dates:
java SimpleDateFormatThreadSafe 10 20

Source code
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;

/**
 * Proves that SimpleDateFormat is indeed not thread safe (as documented in the
 * javadocs).
 * @author mangstadt
 */
public class SimpleDateFormatThreadSafe {
  private static final String format = "MM/dd/yy";
  private static final DateFormat staticDf = new SimpleDateFormat(format);
  private static int numThreads = 10;
  private static int numLoopsPerThread = 10;

  public static void main(String args[]) throws Exception {
    //get the arguments
    if (args.length > 0){
      numThreads = Integer.parseInt(args[0]);
      if (args.length > 1){
        numLoopsPerThread = Integer.parseInt(args[1]);
      }
    }

    //create the threads
    MyThread threads[] = new MyThread[numThreads];
    for (int i = 0; i < threads.length; ++i) {
      threads[i] = new MyThread();
    }

    //start the threads
    for (MyThread t : threads) {
      t.start();
    }

    //wait for the threads to finish
    for (MyThread t : threads) {
      t.join();
    }

    //check the results
    boolean allIdentical = true;
    for (MyThread t : threads) {
      if (!t.localList.equals(t.staticList)){
        System.out.println(t.getName() + " lists are different:");
        System.out.println("local:  " + t.localList);
        System.out.println("static: " + t.staticList);
        allIdentical = false;
      }
    }
    if (allIdentical){
      System.out.println("All lists are identical.");
    }
  }

  private static class MyThread extends Thread {
    public final List<String> localList = new ArrayList<String>();
    public final List<String> staticList = new ArrayList<String>();

    @Override
    public void run() {
      DateFormat localDf = new SimpleDateFormat(format);
      for (int i = 0; i < numLoopsPerThread; ++i) {
        //create a random Date
        Calendar c = Calendar.getInstance();
        c.set(Calendar.MONTH, randInt(0, 12));
        c.set(Calendar.DATE, randInt(1, 21));
        c.set(Calendar.YEAR, randInt(1990, 2011));
        Date d = c.getTime();

        //add formatted dates to lists
        localList.add(localDf.format(d));
        //synchronized (staticDf){
          staticList.add(staticDf.format(d));
        //}
      }
    }
  }

  private static int randInt(int min, int max) {
    return (int) (Math.random() * (max - min) + min);
  }
}

Sunday, April 11, 2010

The Data URI

Typically, an image is included in a web page by referencing an image file:

<img src="take-this.jpg" />

But it's also possible to put the image directly inside the web page source. This is done using a data URI:

<img src="data:image/jpeg;base64,image-data-goes-here" />

The URI contains the content-type of the image ("image/jpeg" in this case, since this image is a JPEG) and the image data encoded in base64.
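Building such a URI takes only a few lines of code. The sketch below does it in Java with the standard Base64 encoder. The class and method names are made up for illustration, and the content-type is passed in by the caller rather than detected from the file.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical example class: builds a data URI from raw bytes.
class DataUriExample {
  /** Returns "data:<content-type>;base64,<encoded data>". */
  static String toDataUri(String contentType, byte[] data) {
    String base64 = Base64.getEncoder().encodeToString(data);
    return "data:" + contentType + ";base64," + base64;
  }

  public static void main(String[] args) throws Exception {
    if (args.length > 0) {
      //treat the first argument as the path to a JPEG image
      byte[] image = Files.readAllBytes(Paths.get(args[0]));
      System.out.println("<img src=\"" + toDataUri("image/jpeg", image) + "\" />");
    } else {
      //no file given: encode a few stand-in bytes instead
      byte[] text = "hello".getBytes(StandardCharsets.US_ASCII);
      System.out.println(toDataUri("text/plain", text));
    }
  }
}
```

Base64 turns every three bytes into four characters, which is where the extra page weight comes from.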

This is supported by most browsers. Internet Explorer support lags behind, with only version 8 supporting it (to a limited extent).

This technique shouldn't be used on a regular basis, since it bloats the size of the web page considerably and makes it take longer to load. Including images as separate files (the typical way) lets the user explore the page while the more bandwidth intensive images are loading.

Check out my Data URI Generator to generate a data URI from an image of your choosing.

Thursday, April 8, 2010

Bad HealthVault Method Schema URLs

One thing to note when working with the HealthVault XML method schemas is that the links to these schemas in the HealthVault Developer Center (http://developer.healthvault.com/methods/methods.aspx), as well as in the GetServiceDefinition response, are not always correct. Some versions of some methods refer to the version 1 schemas, when they actually have their own schemas.

For example, the URL to the GetThings3 request schema is listed as:


This URL incorrectly points to the version 1 schema. The version number must be added to the end of the file name in order to get the correct schema:


This holds true for the following methods:

CreateConnectPackage2 request
GetServiceDefinition2 response
GetThings3 request, response
OverwriteThings2 request
PutThings2 request

Saturday, March 27, 2010

Creating a Connect-Request in HealthVault

I've been doing a lot of work with Microsoft HealthVault at my job lately. I did some experimental work with the platform when I first started working there and since then, I've sort of been the team's HealthVault expert. It's been fun learning all about it and challenging as well, since most of the HealthVault libraries, tutorials, etc are centered around .NET (we use Java). So HealthVault is something I think would be fun to blog about.

This blog post will be about how to use what are called connect-requests in HealthVault. It assumes that you already have some knowledge of HealthVault from a developer's perspective. It includes Java code samples that use the JAX-B classes from the HealthVault Java Library.

Connect-requests are used by non-web-based applications to create a connection to a HealthVault record. The process only needs to be completed once--not every time the application wants to access the record. They work like this:

1 The application prompts the user for the following information:

  • Friendly name - This can be anything, but should be the name that's on the HealthVault record.

  • Question - A question of the user's choosing (such as "What high school did I go to?").

  • Answer - The answer to the above question.

2 The application uses the above information, along with an external-id, to create a connect-request. The external-id can be any value that uniquely identifies the connect-request (the current time in milliseconds works fine). This value must be saved somewhere, as the application will need to use it again later (see step 5). Using these four bits of information, the application sends the request to HealthVault:
//create the request
CreateConnectRequestRequest request = new CreateConnectRequestRequest();
request.setExternalId(System.currentTimeMillis() + "");
request.setFriendlyName("Joe Smith");

//send the request / get the response
SimpleRequestTemplate srt = new SimpleRequestTemplate(ConnectionFactory.getConnection());
CreateConnectRequestResponse response = (CreateConnectRequestResponse) srt.makeRequest(request);
String identityCode = response.getIdentityCode();

Because we're using the JAX-B request classes, be sure to use
and not

3 HealthVault returns an identity-code, which is a sequence of 20 random letters separated into five groups of four:
The application must show this to the user. It should also show (what I call) the patient connect URL. The user will have to visit this URL in order to validate the connect-request:

Remove the "-ppe" when going live of course.

4 The user visits the patient connect URL in a web browser. This page will step the user through a process, requiring her to (1) log in to her HealthVault account, (2) enter the identity-code, (3) enter the question/answer, and (4) choose which record in her account to grant the application access to.

5 Once the user has done this, the application is able to retrieve the person-id and record-id of the user's HealthVault record:
String externalId = ...; //retrieve the external-id you saved in step 2
GetAuthorizedConnectRequestsRequest request = new GetAuthorizedConnectRequestsRequest();
SimpleRequestTemplate srt = new SimpleRequestTemplate(ConnectionFactory.getConnection());
GetAuthorizedConnectRequestsResponse response = (GetAuthorizedConnectRequestsResponse) srt.makeRequest(request);
for (ConnectRequest cr : response.getConnectRequest()) {
  if (cr.getExternalId().equals(externalId)) {
    String personId = cr.getPersonId();
    String recordId = cr.getRecordId();
    //save to persistent storage (like a database)
  }
}
However, the application has no way of knowing when the user will approve the application (that is, when she completes step 4). So, the application must continually poll HealthVault for newly approved connect-requests (e.g. with a separate thread that calls GetAuthorizedConnectRequests every 30 seconds or so).
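A polling loop like that might look something like the sketch below. It is deliberately independent of the HealthVault library: the Supplier is a stand-in for a call to GetAuthorizedConnectRequests that returns the external-ids of approved requests, and the class name, method names, and callback are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical poller. The Supplier stands in for a GetAuthorizedConnectRequests
// call that returns the external-ids of the approved connect-requests.
class ConnectRequestPoller {
  private final Set<String> pending = ConcurrentHashMap.newKeySet();
  private final Supplier<List<String>> fetchAuthorizedIds;
  private final Consumer<String> onAuthorized;
  private ScheduledExecutorService scheduler;

  ConnectRequestPoller(Supplier<List<String>> fetchAuthorizedIds, Consumer<String> onAuthorized) {
    this.fetchAuthorizedIds = fetchAuthorizedIds;
    this.onAuthorized = onAuthorized;
  }

  /** Remembers an external-id whose connect-request has not been approved yet. */
  void watch(String externalId) {
    pending.add(externalId);
  }

  /** Checks once for approved connect-requests; returns how many were new. */
  int pollOnce() {
    int found = 0;
    for (String id : fetchAuthorizedIds.get()) {
      if (pending.remove(id)) {
        onAuthorized.accept(id);
        ++found;
      }
    }
    return found;
  }

  /** Polls on a background thread, pausing the given number of seconds between checks. */
  synchronized void start(long intervalSeconds) {
    scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        pollOnce();
      } catch (RuntimeException e) {
        e.printStackTrace(); //keep polling even if one check fails
      }
    }, 0, intervalSeconds, TimeUnit.SECONDS);
  }

  synchronized void stop() {
    if (scheduler != null) {
      scheduler.shutdownNow();
    }
  }

  public static void main(String[] args) {
    List<String> approved = new ArrayList<String>();
    ConnectRequestPoller poller = new ConnectRequestPoller(
        () -> Arrays.asList("1269700000000"), //pretend HealthVault approved this one
        approved::add);
    poller.watch("1269700000000");
    System.out.println("Newly approved: " + poller.pollOnce());
    System.out.println("Approved ids: " + approved);
  }
}
```

In a real application, the callback would save the person-id and record-id to persistent storage, and start(30) would be called once at startup instead of calling pollOnce() by hand.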

The CreateConnectRequest method has a "call-back-url" parameter, which is a URL that HealthVault is supposed to call when the connect-request is validated by the user. However, at the time of this writing, it is not supported.

Monday, March 22, 2010

Mac OS X Mouse Acceleration

If the mouse acceleration settings of Mac OS X ever frustrate you, check out the Mouse Acceleration Preference Pane. This is a utility created by Christian Zuckschwerdt which lets you adjust these settings to your liking.

Apple changed around OS X's mouse API when version 10.6 was released, which made existing mouse acceleration tools useless. However, Christian just recently released an updated version supporting 10.6! My hand feels less cramped already...

Sunday, March 14, 2010

Working with Scala and Maven

In order to get Maven to properly recognize Scala code, there are a number of steps you must take. Included below are pom.xml samples, along with explanations.

1. Name your source directories properly:



2. Add the scala-tools.org repositories.

These are required in order to download the necessary Scala dependencies (see step #3):
<name>Scala-tools Maven2 Repository</name>

<name>Scala-tools Maven2 Repository</name>

3. Add the appropriate dependencies:

This contains the scala compiler, which Maven will use to compile your code and run your unit tests.

This is Scala's unit testing framework. It's also possible to use JUnit, but when in Rome, right?

If you want to use scalatest, unfortunately you also need to include JUnit as a dependency (see step #5).


4. Add the scala-tools plugin:



5. Add the @RunWith annotation to each unit test.

If you are using scalatest as your unit testing framework, you must trick Maven into thinking that your tests are JUnit tests. Otherwise, Maven will not run your tests:
import org.junit.runner.RunWith
import org.scalatest.junit.JUnitRunner
import org.scalatest.FunSuite

@RunWith(classOf[JUnitRunner])
class MyScalaTest extends FunSuite{
  test("one plus one is two"){
    assert(1 + 1 === 2)
  }
}

Tip: If you're using Eclipse, you can right click on one of these unit tests and select "Run As > JUnit Test" to manually run the test.

Wednesday, March 10, 2010

XPath and Java

XPath is a domain specific language which is used to extract data from an XML document. It's supported by many different general purpose languages like C#, PHP, and Java. Take the following XML document for example:

<author>Stanislaw Lem</author>
<title>Le Petit Prince</title>
<author>Antoine de Saint-Exupéry</author>
<author>Frank Herbert</author>

The following XPath query retrieves the titles of all English-language books:

/library/book[language='en']/title

Java makes working with XPath (and XML in general) kind of complicated, since many different classes are involved. First, the XML document must be loaded into a DOM (Document Object Model).

StreamSource source = new StreamSource(new File("books.xml"));
DOMResult result = new DOMResult();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(source, result);
Node documentRoot = result.getNode();

This means that the XML text is read into memory and organized into a tree of nodes where each tag is an element node. The top element node would be the <library> element, which would have three child element nodes (<book>), and so on.

To demonstrate the power of XPath, this is what Java code might look like if XPath did not exist. The programmer would have to manually iterate through the entire DOM to get what she needed.

List<String> englishTitles = new ArrayList<String>();
Node books = documentRoot.getFirstChild();
if (books.getNodeName().equals("library")){
  for (int i = 0; i < books.getChildNodes().getLength(); ++i){
    Node book = books.getChildNodes().item(i);
    if (book.getNodeName().equals("book")){
      boolean english = false;
      String title = null;
      for (int j = 0; j < book.getChildNodes().getLength(); ++j){
        Node bookChild = book.getChildNodes().item(j);
        if (bookChild.getNodeName().equals("language") &&
            bookChild.getTextContent().equals("en")){
          english = true;
        }
        if (bookChild.getNodeName().equals("title")){
          title = bookChild.getTextContent();
        }
      }
      if (english && title != null){
        englishTitles.add(title);
      }
    }
  }
}
for (String title : englishTitles){
  System.out.println(title);
}

As you can see, this is very very tedious and error prone! XPath is the better solution by far:

XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList)xpath.evaluate("/library/book[language='en']/title",
    documentRoot, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); ++i){
  Node node = nodeList.item(i);
  System.out.println(node.getTextContent());
}
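
For a version that can be compiled and run as-is, here is a self-contained sketch. It hands the XML straight to the XPath object as an InputSource instead of building the DOM first, which is a shortcut of mine rather than the approach used in the snippets. The inline document is a stand-in with the same shape as the library example, and its titles are placeholders.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML below uses placeholder titles.
class XPathDemo {
  static final String XML =
      "<library>"
    + "<book><title>Book A</title><language>en</language></book>"
    + "<book><title>Book B</title><language>fr</language></book>"
    + "<book><title>Book C</title><language>en</language></book>"
    + "</library>";

  /** Returns the titles of all books whose language is "en". */
  static List<String> englishTitles(String xml) {
    try {
      XPath xpath = XPathFactory.newInstance().newXPath();
      NodeList nodeList = (NodeList) xpath.evaluate(
          "/library/book[language='en']/title",
          new InputSource(new StringReader(xml)), XPathConstants.NODESET);
      List<String> titles = new ArrayList<String>();
      for (int i = 0; i < nodeList.getLength(); ++i) {
        titles.add(nodeList.item(i).getTextContent());
      }
      return titles;
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(englishTitles(XML)); //prints [Book A, Book C]
  }
}
```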


XML documents often use namespaces. These are sort of like Java packages--they group related elements together and prevent name collisions from occurring. Let's say that each <language> element belonged to a namespace:

<language xmlns="http://translate.google.com">en</language>
<author>Stanislaw Lem</author>

Note: While namespaces technically can be anything (like "abc123" for example), they should be globally unique. There's no way to enforce this, so the convention is to use a URI belonging to the person or company creating the namespace. For example, if Oracle wants to use a namespace, they can be fairly certain that no one else in the entire world is using one starting with "http://www.oracle.com".

To make Java aware of namespaces, a NamespaceContext object must be created and added to the XPath object.

XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(new NamespaceContext() {
  public String getNamespaceURI(String prefix) {
    if ("tr".equals(prefix)){
      return "http://translate.google.com";
    }
    return null;
  }
  public Iterator getPrefixes(String uri) {
    return null;
  }
  public String getPrefix(String uri) {
    return null;
  }
});

This will assign the prefix "tr" to the namespace "http://translate.google.com". The prefix can be anything, but the namespace must match the one in the XML document.

NodeList nodeList = (NodeList)xpath.evaluate("/library/book[tr:language='en']/title",
    documentRoot, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); ++i){
  Node node = nodeList.item(i);
  System.out.println(node.getTextContent());
}
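
Putting the namespace pieces together, a runnable sketch might look like this. It parses the XML with a namespace-aware DocumentBuilder rather than the Transformer approach shown earlier, which is my own substitution to keep the example short. The inline document and its titles are placeholders.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML below uses placeholder titles.
class NamespaceXPathDemo {
  static final String NS = "http://translate.google.com";
  static final String XML =
      "<library>"
    + "<book><title>Book A</title><language xmlns=\"" + NS + "\">en</language></book>"
    + "<book><title>Book B</title><language xmlns=\"" + NS + "\">fr</language></book>"
    + "</library>";

  /** Returns the titles of all books whose namespaced language element is "en". */
  static List<String> englishTitles(String xml) {
    try {
      //the parser must be namespace-aware, or the "tr" prefix will match nothing
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);
      Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

      XPath xpath = XPathFactory.newInstance().newXPath();
      xpath.setNamespaceContext(new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
          return "tr".equals(prefix) ? NS : null;
        }
        public Iterator<String> getPrefixes(String uri) {
          return null;
        }
        public String getPrefix(String uri) {
          return null;
        }
      });

      NodeList nodeList = (NodeList) xpath.evaluate(
          "/library/book[tr:language='en']/title", doc, XPathConstants.NODESET);
      List<String> titles = new ArrayList<String>();
      for (int i = 0; i < nodeList.getLength(); ++i) {
        titles.add(nodeList.item(i).getTextContent());
      }
      return titles;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(englishTitles(XML)); //prints [Book A]
  }
}
```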

To learn more about XPath, you can visit w3schools.com.

Tuesday, March 9, 2010


I just found out about this cool service called Dropbox, which I thought I'd write about. I read a blurb about it in the March issue of Linux Journal.

Dropbox is a free service that lets you store files online. That sounds pretty boring. But it's more than just a free FTP server. What happens is, you install a small app on your computer, which creates a special "dropbox" folder in your home directory (or in My Documents if you're using Windows). The app monitors this folder for any changes and syncs these changes with the Dropbox server. For example, when you copy a file (Word document, MP3, whatever) to this folder, it will immediately upload it to your Dropbox account. If you delete a file, it will delete it from your account.

But the really cool thing is that you can connect multiple computers to your account by installing the Dropbox application on each one (there are Windows, Mac, Linux, and iPhone versions). So if you add a file to your Dropbox folder on your desktop PC, it will immediately download to your MacBook laptop, Linux Netbook, and whatever else.