Monday, October 18, 2010

Peer-to-Peer (P2P) Systems

In the current issue of the magazine Communications of the ACM, there is an article called Peer-to-Peer Systems by Rodrigo Rodrigues and Peter Druschel. Along with discussing the pros and cons of P2P networks and including examples of how they are used, the article goes into technical detail about how they work.

The article divides P2P systems into two types. One type is partly centralized. In these systems, there exists a single controller node, which keeps a list of all nodes that are connected to the network, along with the resources that each node is sharing. A good example of this kind of P2P network would be Napster (now non-existent). When you searched for a song on Napster, a request would be sent to a centralized server owned by the Napster folks themselves. This server would then search its database for all computers in the P2P network that had the song, and return this list to you. You then downloaded the song by directly connecting to the computer hosting the file. Without this server, there would be no way to get the song that you were looking for because you wouldn't know what computers, out of all the computers in the entire Internet, are both sharing their music collection and have the song you are looking for.

The other type of P2P network is decentralized. In a decentralized P2P network, there is no controller node that knows about all the computers in the network. Computers connect to the network through a bootstrap node, which is just one computer in the network that makes its IP address publicly known. This lack of centralization makes these types of P2P networks more robust, as the network is not dependent on a single server being functional. But because of this lack of centralization, there is no straight-forward way of knowing what computers are connected to the network or what resources they are sharing. This makes searching for particular resources trickier. The article describes two ways in which a decentralized P2P network can be structured in order to solve this problem of search.

One way of structuring a decentralized P2P network is by using an unstructured overlay (an overlay is a graph that describes how the nodes are connected with each other). Each computer in the network only knows about a few other computers that are also connected to the network. To search for a file, the computer will query its neighbors first. If none of its neighbors have the file, then it will ask its neighbor's neighbors. If none of these computers have the file, then it will ask its neighbor's neighbor's neighbors, and so on. This kind of decentralized network is fine if the resource you are looking for is replicated across many other nodes. However, if the resource is rare, then finding that resource could take a very long time. For example, imagine having to search for a file which exists on only one computer in a network of one million computers. The odds of that computer being within a short search distance to your computer is slim.

The other way a decentralized P2P network can be structured is by using a structured overlay. I didn't quite understand the specifics of this technique, but it involves the use of unique keys. Each computer in the network is assigned a unique key in such a way that all keys are evenly spread out in the key space. For example, if the key space is 0-999, the first node will be assigned a random key, say 432. Then, the second node will be assigned a key around 932 (999/2+432, on the opposite side of the "circle"). The third key will be assigned a key around either 682 or 182, and so on. Each node only directly knows about its two neighbors, so the overlay graph looks like a circle. The advantage to this type of overlay is that it makes searching much faster. It's able to use these unique keys to quickly find a computer hosting the resource. This is called key-based routing (KBR). Even if the resource is rare, it will still be able to find it quickly (unlike unstructured overlays, which must spend the time to ask each node directly). However, the downside is that there exists an overhead to maintain these keys. Extra work must be done every time a node enters or leaves the network, so if the number of computers that are connected to the network is constantly changing (called churn) this may not be the best solution.

It was a very good article, but I do have one criticism: it considers applications like SETI@home to be P2P. How is this P2P? You do not communicate with the other peers on the network. You only communicate with the centralized SETI server in order to download new data to process. I think that a better category for SETI@home would be "distributed computing", not P2P.

1 comment:

digital signature certificate said...

Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipment participants in the application.