P2p
From P2P Wiki
Overview
The term peer-to-peer refers to a class of systems and applications that employ distributed resources to perform a critical function in a decentralized manner. The resources encompass computing power, data (storage and content), network bandwidth, and presence (computers, humans, and other resources). The critical function can be distributed computing, data/content sharing, communication and collaboration, or platform services. Decentralization may apply to algorithms, data, meta-data, or all of them. This does not preclude retaining centralization in some parts of the systems and applications if it meets their requirements.
A peer-to-peer (or "P2P", or, rarely, "PtP") computer network exploits diverse connectivity between participants in a network and the cumulative bandwidth of network participants, rather than conventional centralized resources where a relatively small number of servers provide the core value to a service or application. Peer-to-peer networks are typically used for connecting nodes via largely ad hoc connections. Such networks are useful for many purposes. Sharing content files containing audio, video, data or anything in digital format is very common, and real-time data, such as Voice over IP traffic, is also passed using P2P technology.
A pure peer-to-peer network does not have the notion of clients or servers, but only equal peer nodes that simultaneously function as both "clients" and "servers" to the other nodes on the network. This model of network arrangement differs from the client-server model, where communication is usually to and from a central server. A typical example of a non-peer-to-peer file transfer is an FTP server, where the client and server programs are quite distinct: the clients initiate the downloads/uploads, and the servers react to and satisfy these requests.
The concept of peer-to-peer is increasingly evolving toward an expanded usage as the relational dynamic active in distributed networks, i.e. not just computer to computer, but human to human. Yochai Benkler has coined the term "commons-based peer production" to denote collaborative projects such as free software. Associated with peer production are the concepts of peer governance (referring to the manner in which peer production projects are managed) and peer property (referring to the new type of licenses which recognize individual authorship but not exclusive property rights, such as the GNU General Public License and the Creative Commons licenses).
A little bit of history
Peer-to-peer networking essentially started in the late 1960s with the establishment of the ARPANET. This network was designed to share computing resources and documents between different US research facilities. The initial system was not a client-server system: every host was treated equally, so one could call this network a first peer-to-peer network, although it was not self-organizing and no overlay network was established. Everything mapped directly onto the physical connections, and no virtual connections were established as they are in today's peer-to-peer networks.
In 1979 the Usenet protocol was developed. Usenet is a newsgroup application that helps organize content and offers a self-organizing approach to adding and removing newsgroup servers, through a rigorous democratic process among the participating users. However, the application itself is still a typical client-server application, with the clients as requesting nodes and the servers as content-providing nodes.
Around 1990, the general public rushed to join the Internet, and many applications were developed, such as the WWW, email, and streaming. The basic communication model was the client/server model: a simple application on the user side establishes a connection with the server, downloads the requested content from it, and disconnects once it is done.
Classifications of peer-to-peer networks
Peer-to-peer networks can be classified in a wide variety of ways. However, we are interested in classifying the various p2p architectures to date by the algorithms they use when trying to locate resources. One of the principal challenges of such systems is how to locate a particular resource. Since this can be a highly complex problem, several approaches have been taken to overcome it. In chronological order, these are the central index location scheme, the unstructured location scheme (mainly based on unstructured p2p networks) and the distributed hash table scheme (based on structured p2p networks).
Peer-to-peer networks can also be classified in other ways.
By use, they include networks for:
- file sharing
- telephony
- media streaming (radio, video)
- discussion forums
Another classification of peer-to-peer networks is by their degree of centralization.
In 'pure' peer-to-peer networks:
- Peers act as equals, merging the roles of clients and servers
- There is no central server managing the network
- There is no central router
Some examples of pure peer-to-peer application layer networks designed for file sharing are Gnutella and Freenet.
There also exist many hybrid peer-to-peer systems, which typically have the following characteristics:
- A central server keeps information on peers and responds to requests for that information.
- Peers are responsible for hosting available resources (as the central server does not have them), for letting the central server know what resources they want to share, and for making their shareable resources available to peers that request them.
- Route terminals use addresses that are referenced by a set of indices to obtain an absolute address.
Examples:
- Centralized P2P network such as Napster
- Decentralized P2P network such as KaZaA
- Structured P2P network such as Content Addressable Network
- Unstructured P2P network such as Gnutella
- Hybrid P2P network (Centralized and Decentralized) such as JXTA.
1st Generation Clients (Centralized)
Around 1999, home users started to use their computers for something more than requesting content from web or email servers. Their computers not only downloaded content but also provided content to other users over the Internet.
Napster
Napster was the original P2P application that brought the concept to millions of users. The way Napster worked was quite simple. (For further technical details see Napster.)
Napster's architecture consisted of a central index server where all users logged in and uploaded metadata about which resources they were sharing. Content searches were made on the index server, and resource transfers were made between the peers themselves.
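The division of labor just described (metadata and searches on one index server, transfers directly between peers) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Napster's actual protocol: the class name `CentralIndex`, its methods, and the peer addresses are hypothetical, and a real server would also handle authentication, persistence, and network I/O.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical Napster-style index server: it stores only metadata,
    never the shared files themselves."""

    def __init__(self) -> None:
        # file name -> set of peer addresses that claim to share it
        self._index = defaultdict(set)

    def login(self, peer: str, shared_files: list) -> None:
        # On login, a peer uploads the list of files it is sharing.
        for name in shared_files:
            self._index[name].add(peer)

    def logout(self, peer: str) -> None:
        # Forget every entry that points at the departing peer.
        for peers in self._index.values():
            peers.discard(peer)

    def search(self, keyword: str) -> dict:
        # Searches run entirely on the server; the result only tells the
        # requester which peers to contact directly for the transfer.
        keyword = keyword.lower()
        return {
            name: sorted(peers)
            for name, peers in self._index.items()
            if keyword in name.lower() and peers
        }


if __name__ == "__main__":
    index = CentralIndex()
    index.login("10.0.0.5:6699", ["song_a.mp3", "song_b.mp3"])
    index.login("10.0.0.7:6699", ["song_b.mp3"])
    print(index.search("song_b"))
    # The download itself would then go peer-to-peer, e.g. from 10.0.0.7.
```

Because the server object holds the only copy of the index, removing it leaves peers with no way to find each other's files, even though every file is still stored on the peers.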
The central server model made sense for many reasons: it was an efficient way to handle searches, and it allowed Napster to retain control over the network. However, it also meant that when the lawyers came down on Napster, all that was needed was to turn off the central servers, and that was the end of Napster.
Napster was a file sharing service that paved the way for decentralized P2P file-sharing programs such as Kazaa, Limewire, iMesh, Morpheus, and BearShare, which are now used for many of the same reasons and can download music, pictures, and other files. The popularity and repercussions of the first Napster have made it a legendary icon in the computer and entertainment fields.
Some of the advantages and disadvantages of centralized p2p systems are the following:
- Advantages:
  - Locates files quickly and efficiently.
  - Searches are as comprehensive as possible.
  - All users must be registered to be on the network.
- Disadvantages:
  - Vulnerable to censorship and technical failure.
  - Slashdot effect: popular data become less accessible because of the load of the requests on a central server.
  - The central index might be out of date because the central server's database is only refreshed periodically.
2nd Generation Clients (Decentralized)
In contrast to centralized systems, in decentralized p2p systems all peers have the same capability and responsibility. Communication between peers is symmetric: there is no central directory server where file metadata is stored. Instead, the metadata is stored locally by the peers.
The technical principles on which they are based are (see Unstructured p2p systems):
- Flooding.
- Replication & Caching.
- Time To Live (TTL).
- Epidemics & Gossiping protocols.
- Super-Peers.
- Random Walkers & Probabilistic algorithms.
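As an illustration of the random-walker principle listed above, here is a minimal Python sketch: instead of flooding a query to every neighbor, each walker forwards it to one randomly chosen neighbor per hop until it finds the file or its TTL runs out. The function name, the adjacency-dict topology, and the parameter defaults are hypothetical, and the walkers are simulated one after another rather than in parallel.

```python
import random


def random_walk_search(graph, start, has_file, walkers=2, ttl=8, rng=None):
    """Return a peer holding the file, or None if no walker finds one.

    graph    -- hypothetical topology: dict mapping a peer to its neighbors
    has_file -- set of peers that store the requested file
    walkers  -- number of independent walkers (simulated sequentially here)
    ttl      -- maximum number of hops each walker may take
    """
    if rng is None:
        rng = random.Random()
    if start in has_file:
        return start
    for _ in range(walkers):
        current = start
        for _hop in range(ttl):
            neighbors = graph.get(current, [])
            if not neighbors:
                break  # dead end: this walker stops early
            current = rng.choice(neighbors)  # forward to ONE random neighbor
            if current in has_file:
                return current
    return None


if __name__ == "__main__":
    topology = {
        "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
        "D": ["B", "C", "E"], "E": ["D"],
    }
    print(random_walk_search(topology, "A", {"E"}, walkers=4, ttl=6,
                             rng=random.Random(42)))
```

Compared with flooding, a walker sends far fewer messages per query, at the cost of only a probabilistic chance of finding rarely replicated files.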
Some examples of applications based on this approach are:
- Gnutella.
- Freenet.
- FreeServe.
- MojoNation.
Techniques used in this approach
Gnutella was the second major P2P network to emerge. Its creators wanted to build a decentralized network, one that could not be shut down simply by turning off a server.
In the most basic sense, Gnutella worked by connecting users to other users directly (and bypassing any central server altogether). When you started the Gnutella client, you would connect to a certain number of other users, and those users were connected to other users, and so on, in one giant network. To search for a file, you asked everyone you were connected to, "Hey, do you have this file?" They in turn would check whether they did, and also pass the message on to everyone they were connected to. Basically, it was one big game of "telephone".
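The "telephone game" search can be simulated as a breadth-first flood with a hop limit (TTL) and duplicate suppression. The sketch below is a hedged illustration, not the Gnutella wire protocol: the topology, the function name, and the default TTL of 7 are assumptions chosen for the example. It returns both the peers that would answer and the number of query messages sent, which makes the traffic cost of flooding visible.

```python
from collections import deque


def flood_search(graph, start, has_file, ttl=7):
    """Simulate a flooded query; return (peers_with_hits, messages_sent).

    graph    -- hypothetical topology: dict mapping a peer to its neighbors
    has_file -- set of peers that would answer with a hit
    ttl      -- hop limit carried by the query (7 is only an example value)
    """
    seen = {start}                        # peers that already saw this query
    hits = []
    messages = 0
    queue = deque([(start, ttl, None)])   # (peer, remaining TTL, sender)
    while queue:
        peer, remaining, sender = queue.popleft()
        if peer != start and peer in has_file:
            hits.append(peer)             # this peer would send a hit back
        if remaining == 0:
            continue                      # TTL exhausted: stop forwarding
        for neighbor in graph.get(peer, []):
            if neighbor == sender:
                continue                  # never echo the query to the sender
            messages += 1                 # each forwarded copy costs bandwidth
            if neighbor not in seen:      # duplicates are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1, peer))
    return hits, messages


if __name__ == "__main__":
    topology = {
        "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
        "D": ["B", "C", "E"], "E": ["D"],
    }
    for ttl in (1, 2, 3):
        print(ttl, flood_search(topology, "A", {"D", "E"}, ttl=ttl))
    # 1 ([], 2)
    # 2 (['D'], 4)
    # 3 (['D', 'E'], 6)
```

Raising the TTL reaches more peers (and more potential hits) but increases the message count, which is one reason flooding searches were slow and generated heavy query traffic.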
The main advantage was that it couldn't easily be shut down. The disadvantages were many, such as slow searches and islands of sub-networks that were not connected to each other.
Gnutella
Gnutella, in contrast to Napster, is a fully decentralized architecture. In Gnutella, the participating peers act not only as servents (simultaneously servers and clients) but also take over the routing functions that the central server performed in Napster. Thus, not only the file exchange but also the content lookup/routing functionality is completely distributed. This avoids any single point of failure and makes tracking by the RIAA (Recording Industry Association of America) more complex, making it difficult to prove any illegal activities on the part of the network's inventors.
- Advantages:
  - Inherent scalability.
  - Avoidance of the “single point of litigation” problem.
  - Fault tolerance.
- Disadvantages:
  - Slow information discovery.
  - More query traffic on the network.
3rd Generation Clients
The next generation of p2p networks tries to solve the non-determinism problem of resource location. The idea is that if a specific resource is on the network, it should be found. To find it, the philosophy changes, and these networks start becoming structured node groupings. Nodes are arranged in a structured fashion, typically following tree or ring formations. The objective is to assign particular nodes to store particular content. When a node wishes to look for a resource, it must be redirected to the node which is supposed to hold it. The challenges of these structured peer-to-peer networks are:
- To avoid bottlenecks at particular nodes by distributing responsibilities evenly among the existing peers.
- To adapt to nodes joining or leaving (or failing). As a consequence, it is logical to give new responsibilities to joining nodes and to redistribute the responsibilities of leaving nodes.
These challenges match the idea of a hash table very well, with the table distributed across the peers.
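One common way to realize this is consistent hashing: node names and content keys are hashed onto the same circular identifier space, and each key is stored on its successor, the first node whose identifier follows the key's identifier on the ring. Chord, for example, uses this successor rule. The Python sketch below is a hypothetical, centralized simulation of the assignment rule only (no routing tables, no network messages); the `Ring` class, the 32-bit ring size, and the use of SHA-1 are illustrative choices.

```python
import bisect
import hashlib


def ring_id(name: str, bits: int = 32) -> int:
    """Hash a node name or content key onto a 2**bits identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class Ring:
    """Toy consistent-hashing ring: key -> first node clockwise (successor)."""

    def __init__(self, nodes=()):
        self._ids = []       # sorted node identifiers
        self._names = {}     # identifier -> node name
        for node in nodes:
            self.join(node)

    def join(self, node: str) -> None:
        node_id = ring_id(node)
        bisect.insort(self._ids, node_id)
        self._names[node_id] = node

    def leave(self, node: str) -> None:
        node_id = ring_id(node)
        self._ids.remove(node_id)
        del self._names[node_id]

    def responsible_node(self, key: str) -> str:
        # First node ID >= key ID; wrap around to the smallest ID if none.
        i = bisect.bisect_left(self._ids, ring_id(key))
        return self._names[self._ids[i % len(self._ids)]]


if __name__ == "__main__":
    ring = Ring(["node-1", "node-2", "node-3"])
    keys = [f"file-{i}" for i in range(1000)]
    before = {k: ring.responsible_node(k) for k in keys}
    ring.join("node-4")
    moved = [k for k in keys if ring.responsible_node(k) != before[k]]
    # Every moved key now lives on the new node; all others stayed put.
    print(len(moved), {ring.responsible_node(k) for k in moved})
```

When a node joins, only the keys between its predecessor and itself move to it; when it leaves, those keys fall back to its successor. Every other key stays where it was, which addresses the second challenge above.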
The Future
P2P networks and clients have been the focus of much legal action. The music industry has been the most threatened by the advent of P2P networks, which make audio files easy to share. As a result, it initiated a legal war that brought Napster to an end. Since then, it has been unsuccessful in shutting down FastTrack (Kazaa), and it is now aiming its sights at end users. Beyond the exchange of multimedia content, a number of future application areas for peer-to-peer networks are emerging:
- Self-organizing collaborative environments.
- Location-based services in conjunction with mobile networks.
- Peer-to-peer media streaming networks.
Further open problems that have to be addressed include reliability, availability, load balancing, QoS and network organization. Peer-to-peer technology is likely to play a key role in next-generation networks.
Chronology
- July, 1999: publication of Freenet protocol
- September, 1999: creation of Napster
- November, 1999: first release of Direct Connect client
- March 14, 2000: first release of Gnutella
- September 6, 2000: first release of eDonkey2000
- March, 2001: introduction of the FastTrack protocol
- April, 2001: design of the BitTorrent protocol
- May, 2001: first release of WinMX Peer Network Protocol
- July, 2001: shutdown of Napster
- November 6, 2001: first release of GNUnet
- March, 2002: publication of the Kademlia DHT
- November, 2002: start of the Gnutella2 project
Advantages of peer-to-peer networks
An important goal in peer-to-peer networks is that all clients provide resources, including bandwidth, storage space, and computing power. Thus, as nodes arrive and demand on the system increases, the total capacity of the system also increases. This is not true of a client-server architecture with a fixed set of servers, in which adding more clients could mean slower data transfer for all users.
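The capacity argument can be made concrete with a simple fluid model of the kind used in networking textbooks. To distribute a file of F bits to N receivers, a single server with upload rate u_s needs at least max(N*F/u_s, F/d_min) seconds, where d_min is the slowest download rate. If each of the N peers also uploads at rate u, the bound becomes max(F/u_s, F/d_min, N*F/(u_s + N*u)). The Python functions below only evaluate these idealized lower bounds; the function names and the parameter values in the example are made up for illustration.

```python
def client_server_time(n, file_bits, server_up, min_down):
    """Lower bound (seconds) for one server to send the file to n clients."""
    return max(n * file_bits / server_up,   # server must upload n copies
               file_bits / min_down)        # slowest client downloads one


def p2p_time(n, file_bits, server_up, min_down, peer_up):
    """Same bound when each of the n peers also uploads at peer_up."""
    total_up = server_up + n * peer_up      # capacity grows with the swarm
    return max(file_bits / server_up,       # server uploads at least one copy
               file_bits / min_down,        # slowest peer downloads one
               n * file_bits / total_up)    # n copies over all upload capacity


if __name__ == "__main__":
    F = 8e8                        # example: 800-megabit (100 MB) file
    u_s, d_min, u = 3e7, 2e7, 1e6  # example rates in bit/s
    for n in (10, 100, 1000):
        print(n, round(client_server_time(n, F, u_s, d_min)),
              round(p2p_time(n, F, u_s, d_min, u)))
```

With these example values, the client-server bound grows linearly with N (about 267 s, 2,667 s and 26,667 s for 10, 100 and 1,000 clients), whereas the P2P bound stays below roughly 780 s because every new peer also adds upload capacity.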
The distributed nature of peer-to-peer networks also increases robustness against failures, by replicating data over multiple peers and, in pure P2P systems, by enabling peers to find the data without relying on a centralized index server. In the latter case, there is no single point of failure in the system.
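How much replication helps can be estimated with a simple probability model. Assuming each peer hosting a copy is online independently with probability p (a strong simplification, since real peer uptimes are correlated), data replicated on k peers is reachable with probability 1 - (1 - p)^k. The helper names below are hypothetical.

```python
import math


def availability(p_online: float, replicas: int) -> float:
    """P(at least one replica reachable), assuming independent peer uptime."""
    return 1.0 - (1.0 - p_online) ** replicas


def replicas_needed(p_online: float, target: float) -> int:
    """Smallest k with availability(p_online, k) >= target (0 < p, target < 1)."""
    # 1 - (1 - p)^k >= target  <=>  k >= log(1 - target) / log(1 - p).
    # Floating-point rounding can make this one too large when the ratio
    # is mathematically an exact integer.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_online))


if __name__ == "__main__":
    for k in (1, 3, 7):
        print(k, availability(0.5, k))
    print(replicas_needed(0.5, 0.99))
```

For example, with peers online half the time, seven replicas give about 99.2% availability, while a single copy is only as available as its one host.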
When the term peer-to-peer was used to describe the Napster network, it implied that the peer protocol was important, but, in reality, the great achievement of Napster was the empowerment of the peers (i.e., the fringes of the network) in association with a central index, which made it fast and efficient to locate available content. The peer protocol was just a common way to achieve this.
Legal controversy
Peer-to-peer technologies are rarely considered in and of themselves to be illegal.
However, a frequent use of many peer-to-peer technologies is the sharing of copyrighted material, which is typically illegal unless a license permits it (such as the GPL or GFDL) or the material has entered the public domain.
Other uses of peer-to-peer, such as telephony, are typically far less controversial, although the provision of telephony is restricted in some legal jurisdictions around the world.
Computer science perspective
Technically, a completely pure peer-to-peer application must implement only peering protocols that do not recognize the concepts of "server" and "client". Such pure peer applications and networks are rare. Most networks and applications described as peer-to-peer actually contain or rely on some non-peer elements, such as DNS. Also, real-world applications often use multiple protocols and act as client, server, and peer simultaneously, or over time. Completely decentralized networks of peers have been in use for many years: two examples are Usenet and FidoNet (1984).
Many P2P systems use stronger peers (super-peers, super-nodes) as servers, with each client peer connected in a star-like fashion to a single super-peer.
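A two-tier super-peer arrangement can be sketched as follows: each leaf peer registers its file list with one super-peer, and queries are answered from the super-peers' indices rather than by flooding every leaf. This is a hypothetical simplification in Python (the `SuperPeer` class and its methods are illustrative, and queries travel only one hop across the super-peer layer); it is not the protocol of any particular system.

```python
from collections import defaultdict


class SuperPeer:
    """Hypothetical super-peer that indexes the files of its leaf peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.leaf_index = defaultdict(set)  # file name -> leaves sharing it
        self.neighbors = []                 # other super-peers

    def register_leaf(self, leaf: str, files: list) -> None:
        # A leaf uploads its file list to its single super-peer (star topology).
        for name in files:
            self.leaf_index[name].add(leaf)

    def local_hits(self, file_name: str) -> set:
        # .get() avoids creating empty entries in the defaultdict.
        return set(self.leaf_index.get(file_name, ()))

    def query(self, file_name: str) -> set:
        # Answer from the local index, then ask neighboring super-peers once
        # (a one-hop simplification of search across the super-peer layer).
        hits = self.local_hits(file_name)
        for other in self.neighbors:
            hits |= other.local_hits(file_name)
        return hits


if __name__ == "__main__":
    s1, s2 = SuperPeer("s1"), SuperPeer("s2")
    s1.neighbors.append(s2)
    s2.neighbors.append(s1)
    s1.register_leaf("leaf-a", ["x.ogg"])
    s2.register_leaf("leaf-b", ["x.ogg", "y.ogg"])
    print(sorted(s1.query("x.ogg")))  # ['leaf-a', 'leaf-b']
```

The leaves never see query traffic for files they do not have, which is the main motivation for letting stronger peers act as servers.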
In the late 1990s, Sun added classes to the Java technology to speed up the development of peer-to-peer applications, so that developers could build decentralized real-time chat applets and applications before instant messaging networks were popular. This effort is now being continued with the JXTA project.
Peer-to-peer systems and applications have attracted a great deal of attention from computer science research; some prominent research projects include the Chord project; P-Grid, a self-organized and emerging overlay network; and the CoopNet content distribution system.
Applications of P2P Networks outside Computer Science
- Bioinformatics: Peer-to-peer networks have also begun to attract attention from scientists in other disciplines, especially those that deal with large datasets such as bioinformatics. P2P networks can be used to run large programs designed to carry out tests to identify drug candidates. The first such program was begun in 2001 by the Centre for Computational Drug Discovery at Oxford University in cooperation with the National Foundation for Cancer Research. There are now several similar programs running under the auspices of the United Devices Cancer Research Project. On a smaller scale, a self-administered program for computational biologists to run and compare various bioinformatics software is available from Chinook. Tranche is an open-source set of software tools for setting up and administering a decentralized network. It was developed to solve the bioinformatics data sharing problem in a secure and scalable fashion.
- Academic search engine: The sciencenet peer-to-peer search engine provides a free and open search engine for scientific knowledge and is based on YaCy technology. Universities and research institutes can download the free Java software and contribute their own peer(s) to the global network. The project is associated with the Liebel-Lab at the Karlsruhe Institute of Technology (KIT).
- Education and academia: Because of their fast distribution and large storage capacity, P2P networks are being applied by many organizations for educational and academic purposes. For instance, Pennsylvania State University, MIT and Simon Fraser University are carrying out a project called LionShare, designed to facilitate file sharing among educational institutions globally.
- Military: The U.S. Department of Defense has started research on P2P networks as part of its modern networked-warfare efforts. In November 2001, Colonel Robert Wardell from the Pentagon told a group of peer-to-peer software engineers at a tech conference in Washington, DC: "You have to empower the fringes if you are going to... be able to make decisions faster than the bad guy" (Walker, Leslie. "Uncle Sam Wants Napster!" The Washington Post, November 8, 2001). Wardell indicated he was looking for peer-to-peer experts to join his engineering effort. In May 2003, Dr. Tether, Director of the Defense Advanced Research Projects Agency (DARPA), testified that the U.S. military is using P2P networks. For security reasons, details are kept confidential.
- Business: P2P networks are already used in business, but adoption is still at an early stage. Studies by Kato et al. indicate that over 200 companies, with approximately US$400 million, are investing in P2P networks. Besides file sharing, companies are also interested in distributed computing, content distribution, e-marketplaces, distributed search engines, groupware and office automation via P2P networks. Companies sometimes prefer P2P for several reasons, such as real-time collaboration, a server being unable to cope with an increasing volume of content, or a process requiring strong computing power or high-speed communications. At the same time, P2P is not yet fully used, as it still faces many security issues.
- TV: One of the first applications of P2P in this area is Joost, which is expected to deliver (relay) near-TV resolution images.
- Telecommunication: People are no longer satisfied with merely being able to hear someone on the other side of the earth; demand for clearer real-time voice is growing globally. As with TV networks, the cabling is already in place, and companies are unlikely to replace all of it. Many have turned to the Internet instead, and more specifically to P2P networks. For instance, Skype, one of the most widely used Internet telephony applications, has used P2P technology. Furthermore, many research organizations are trying to apply P2P networking to cellular networks.