Chord

From P2P Wiki

Jump to: navigation, search

Chord is one of the original distributed hash table protocols. Chord is being developed at MIT and the current Chord source code can be downloaded and used under the MIT License.

Contents

Overview

The Chord protocol specifies how to find the locations of keys, how new nodes join the system, and how to recover from the failure (or planned departure) of existing nodes. At its heart, Chord provides fast distributed computation of a hash function, and mapping keys to nodes responsible for them. It uses consistent hashing to assign key (see [DHT|Distributed Hash Tables]), value pairs to their hash buckets, which are physical nodes.

Using the Chord lookup protocol, node keys are arranged in a circle. The circle cannot have more than 2^m nodes. The ring can have ids/keys ranging from 0 to 2^m - 1.

With high probability the hash function balances the load (all nodes receive roughly the same number of keys). Also with high probability, when an Nth node joins (or leaves) the network, only an O(1/N) fraction of the keys are moved to a different location: this is clearly the minimum requirement for maintaining a balanced load. Chord improves the scalability of consistent hashing by avoiding the requirement that every node knows about every other node. A Chord node needs only a small amount of routing information about other nodes. Because this information is distributed, a node resolves the hash function by communicating with a few other nodes. In an N-node network, each node maintains information only about O(log N) other nodes, and a lookup requires O(log N) messages. Chord must update the routing information when a node joins or leaves the network; a join or leave requires O(log2 N) messages.

The consistent hash function uses a base hash function such as Secure Hash Algorithm 1 (SHA-1) to assign each node and key an m-bit identifier. A node‘s identifier is chosen by hashing the node‘s IP address, while a key identifier is produced by hashing the key. The identifier m must be long enough to make the probability of two nodes or keys hashing to the same identifier negligible. Consistent hashing assigns keys to nodes as follows. Identifiers are ordered in an identifier circle modulo 2^m. Key k is assigned to the first node whose identifier is equal to or follows (the identifier of) k in the identifier space. This node is called the successor node of key k, denoted by successor(k). If identifiers are represented as a circle of numbers from 0 to 2^m - 1, then successor(k) is the first node clockwise from k. Consistent hashing is designed to let nodes enter and leave the network with minimal disruption. To maintain the consistent hashing mapping when a node n joins the network, some keys that were previously assigned to n's successor are now assigned to n. When node n leaves the network, all of its assigned keys are reassigned to n's successor. No other changes in the assignment of keys to nodes need occur. Each node stores information about only a small subset of the nodes in the system. in its routing table, called a finger table. The search for a node moves progressively closer to identifying the successor with each step. A search for the successor of f initiated at node r begins by determining if f is between r and the immediate successor of r. If so, the search terminates and the successor of r is returned. Otherwise, r forwards the search request to the largest node in its finger table that precedes f; call this node s. The same procedure is repeated by s until the search terminates.

Chord includes a simple stabilization protocol which allows it to be fault resilient, self-organizing and self-healing, and to perform acceptably even in the face of concurrent node arrivals and departures. Nevertheless, this simplicity is also one of the protocol‘s biggest problems, since it involves too much communication between nodes. The authors of Chord proposed an extension to support network proximity for lower latency and higher throughput.

Potential Uses

  • Cooperative Mirroring: A load balancing mechanism by a local network hosting information available to computers outside of the local network. This scheme could allow developers to balance the load between many computers instead of a central server to ensure availability of their product.
  • Time-shared storage: In a network, once a computer joins the network its available data is distributed throughout the network for retrieval when that computer disconnects from the network. As well as other computers' data is sent to the computer in question for offline retrieval when they are no longer connected to the network. Mainly for nodes without the ability to connect full time to the network.
  • Distributed Indices: Retrieval of files over the network within a searchable database. eg. P2P file transfer clients.
  • Large scale combinatorial searches: Keys being candidate solutions to a problem and each key mapping to the node, or computer, that is responsible for evaluating them as a solution or not. eg. Code Breaking


A Chord ring consisting of many nodes. Notice how the finger table is organized and how K54 is looked up following Chord‘s algorithm.
A Chord ring consisting of many nodes. Notice how the finger table is organized and how K54 is looked up following Chord‘s algorithm.


















Pseudocode

See also

External links and papers

Websites:




Files

Personal tools