Bencode (pronounced like Bee-encode ) is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.
55-892: It supports four different types of values: Bencoding is most commonly used in torrent files , and as such is part of the BitTorrent specification. These metadata files are simply bencoded dictionaries. Bencoding is simple and (because numbers are encoded as text in decimal notation) is unaffected by endianness , which is important for a cross-platform application like BitTorrent. It is also fairly flexible, as long as applications ignore unexpected dictionary keys, so that new ones can be added without creating incompatibilities. Bencode uses ASCII characters as delimiters and digits. There are no restrictions on what kind of values may be stored in lists and dictionaries; they may (and usually do) contain other lists and dictionaries. This allows for arbitrarily complex data structures to be encoded. Bencode
110-447: A query flooding model – in essence, each search would result in a message being broadcast to every machine in the network. While avoiding a single point of failure , this method was significantly less efficient than Napster. Later versions of Gnutella clients moved to a dynamic querying model which vastly improved efficiency. Freenet is fully distributed, but employs a heuristic key-based routing in which each file
165-403: A Pastry overlay, and DQ-DHT, which implements a dynamic querying search algorithm over a Chord network. Because of the decentralization, fault tolerance, and scalability of DHTs, they are inherently more resilient against a hostile attacker than a centralized system. Open systems for distributed data storage that are robust against massive hostile attackers are feasible. A DHT system that
220-749: A file debian-503-amd64-CD-1.iso (whose size is 678 301 696 bytes) might look like: Note: pieces here would be a 51 KiB value ( ⌈ l e n g t h p i e c e l e n g t h ⌉ × 160 = 414080 b i t s {\displaystyle {\color {Blue}\left\lceil {\color {Black}{\frac {\mathtt {length}}{\mathtt {piece\ length}}}}\right\rceil }\times 160=414080\ \mathrm {bits} } ). A de-bencoded torrent file (with 'piece length' 256 KiB = 262144 B) for two files, 111.txt and 222.txt , might look like: Distributed hash table A distributed hash table ( DHT )
275-478: A file-sharing service. These systems differed in how they located the data offered by their peers. Napster, the first large-scale P2P content delivery system, required a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the queries to the nodes that held the results. This central component left the system vulnerable to attacks and lawsuits. Gnutella and similar networks moved to
330-500: A key to one of the n available servers. Each client has the same list of identifiers { S 1 , S 2 , ..., S n } , one for each server. Given some key k , a client computes n hash weights w 1 = h ( S 1 , k ), w 2 = h ( S 2 , k ), ..., w n = h ( S n , k ) . The client associates that key with the server corresponding to the highest hash weight for that key. A server with ID S x {\displaystyle S_{x}} owns all
385-449: A navigable small-world network based on random walk sampling also assuring logarithmic search time. Each node maintains a set of links to other nodes (its neighbors or routing table ). Together, these links form the overlay network. A node picks its neighbors according to a certain structure, called the network's topology . All DHT topologies share some variant of the most essential property: for any key k , each node either has
440-459: A new key, httpseeds , is placed in the top-most list (i.e., with announce and info ). This key's value is a list of web addresses where torrent data can be retrieved. Special server support is required. It remains at Draft status. BEP-0030 extends BitTorrent to support Merkle trees (originally implemented in Tribler ). The purpose is to reduce the file size of torrent files, which reduces
495-415: A node ID that owns k or has a link to a node whose node ID is closer to k , in terms of the keyspace distance defined above. It is then easy to route a message to the owner of any key k using the following greedy algorithm (that is not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest to k . When there is no such neighbor, then we must have arrived at
550-504: A torrent would have different infohashes in v1 and v2 networks, two swarms would form, requiring special handling by the client to merge the two. A core feature of the new format is its application of merkle trees , allowing for 16KiB blocks of a piece to be individually verified and re-downloaded. Each file now always occupy whole piece sizes and have an independent merkle root hash, so that it's possible to find duplicate files across unrelated torrent files of any piece length. The file size
605-499: A tracker restricts access to torrents it tracks by checking the peer's IP, refusing to provide a peer list if the IP is unknown. The peer itself is usually registered to the tracker via a gated online community; the private tracker typically also keep statistics of data transfer for use in the community. Decentralized methods like DHT, PeX, LSD are disabled to maintain the centralized control. A private torrent can be manually edited to remove
SECTION 10
#1732855140186660-401: A trackerless torrent has a nodes key: For example, The specification recommends that nodes "should be set to the K closest nodes in the torrent generating client's routing table. Alternatively, the key could be set to a known good node such as one operated by the person generating the torrent." BEP-0012 extends BitTorrent to support multiple trackers. A new key, announce-list ,
715-508: Is a distributed system that provides a lookup service similar to a hash table . Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values , which in turn can be anything from addresses, to documents , to arbitrary data . Responsibility for maintaining
770-483: Is a very specialized kind of binary coding with some unique properties: However, this uniqueness can cause some problems: Torrent file In the BitTorrent file distribution system, a torrent file or meta-info file is a computer file that contains metadata about files and folders to be distributed, and usually also a list of the network locations of trackers , which are computers that help participants in
825-405: Is associated with a key, and files with similar keys tend to cluster on a similar set of nodes. Queries are likely to be routed through the network to such a cluster without needing to visit many peers. However, Freenet does not guarantee that data will be found. Distributed hash tables use a more structured key-based routing in order to attain both the decentralization of Freenet and Gnutella, and
880-428: Is asymmetrical, supporting greater download speeds than upload speeds, limiting the bandwidth of each download, and sometimes enforcing bandwidth caps and periods where systems are not accessible. This creates inefficiency when many people want to obtain the same set of files from a single source; the source must always be online and must have massive outbound bandwidth. The torrent protocol addresses this by decentralizing
935-501: Is carefully designed to have Byzantine fault tolerance can defend against a security weakness, known as the Sybil attack , which affects most current DHT designs. Whanau is a DHT designed to be resistant to Sybil attacks. Petar Maymounkov, one of the original authors of Kademlia , has proposed a way to circumvent the weakness to the Sybil attack by incorporating social trust relationships into
990-413: Is created to represent a file or folder to be shared. The torrent file acts as the key to initiating downloading of the actual content. Someone interested in receiving the shared file or folder first obtains the corresponding torrent file, either by directly downloading it or by using a magnet link . The user then opens that file in a torrent client, which automates the rest of the process. In order to learn
1045-527: Is fundamental in graph theory . Route length can be greater than diameter, since the greedy routing algorithm may not find shortest paths. Aside from routing, there exist many algorithms that exploit the structure of the overlay network for sending a message to all nodes, or a subset of nodes, in a DHT. These algorithms are used by applications to do overlay multicast , range queries, or to collect statistics. Two systems that are based on this approach are Structella, which implements flooding and random walks on
1100-425: Is less common than in many other peer-to-peer (especially file sharing ) systems; see anonymous P2P . The structure of a DHT can be decomposed into several main components. The foundation is an abstract keyspace , such as the set of 160-bit strings . A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes. An overlay network then connects the nodes, allowing them to find
1155-665: Is low, so that maintenance overhead is not excessive. Of course, having shorter routes requires higher maximum degree . Some common choices for maximum degree and route length are as follows, where n is the number of nodes in the DHT, using Big O notation : The most common choice, O ( log n ) {\displaystyle O(\log n)} degree/route length, is not optimal in terms of degree/route length tradeoff, but such topologies typically allow more flexibility in choice of neighbors. Many DHTs use that flexibility to pick neighbors that are close in terms of latency in
SECTION 20
#17328551401861210-528: Is no more assurance that the keys (and thus the load) is uniformly randomly distributed over the key space and the participating peers. DHT protocols such as Self-Chord and Oscar address such issues. Self-Chord decouples object keys from peer IDs and sorts keys along the ring with a statistical approach based on the swarm intelligence paradigm. Sorting ensures that similar keys are stored by neighbour nodes and that discovery procedures, including range queries , can be performed in logarithmic time. Oscar constructs
1265-437: Is not reduced (assuming piece size stays the same; v2's tree structure allows larger pieces with fewer ill effects), but the info dictionary required for magnet links are (only in v2-only torrents). A torrent file can also contain additional metadata defined in extensions to the BitTorrent specification. These are known as "BitTorrent Enhancement Proposals." Examples of such proposals include metadata for stating who created
1320-511: Is placed in the top-most dictionary (i.e., with announce and info ) BEP-0019 is one of two extensions allowing HTTP seeds to be used in BitTorrent. In BEP-0019, a new key url-list , is placed in the top-most list. The client uses the links to assemble ordinary HTTP URLs – no server-side support is required. This feature is very commonly used by open source projects offering software downloads. Web seeds allow smart selection and simultaneous use of mirror sites , P2P or HTTP(S), by
1375-469: Is uniquely identified by an infohash , a SHA-1 hash calculated over the contents of the info dictionary in bencode form. Changes to other portions of the torrent do not affect the hash. This hash is used to identify the torrent to other peers via DHT and to the tracker. It is also used in magnet links . The BitTorrent v2 protocol (BEP-0052) introduces a new definition of the torrent file. The basic structure is: The new format uses SHA-256 in both
1430-537: Is unrelated to geographical distance or network latency . Each node is assigned a single key called its identifier (ID). A node with ID i x {\displaystyle i_{x}} owns all the keys k m {\displaystyle k_{m}} for which i x {\displaystyle i_{x}} is the closest ID, measured according to δ ( k m , i x ) {\displaystyle \delta (k_{m},i_{x})} . For example,
1485-456: The Chord DHT uses consistent hashing, which treats nodes as points on a circle, and δ ( k 1 , k 2 ) {\displaystyle \delta (k_{1},k_{2})} is the distance traveling clockwise around the circle from k 1 {\displaystyle k_{1}} to k 2 {\displaystyle k_{2}} . Thus,
1540-787: The Kad network , the Storm botnet , the Tox instant messenger , Freenet , the YaCy search engine, and the InterPlanetary File System . DHT research was originally motivated, in part, by peer-to-peer (P2P) systems such as Freenet , Gnutella , BitTorrent and Napster , which took advantage of resources distributed across the Internet to provide a single useful application. In particular, they took advantage of increased bandwidth and hard disk capacity to provide
1595-515: The DHT from one node to another, minimizing such reorganization is required to efficiently support high rates of churn (node arrival and failure). Consistent hashing employs a function δ ( k 1 , k 2 ) {\displaystyle \delta (k_{1},k_{2})} that defines an abstract notion of the distance between the keys k 1 {\displaystyle k_{1}} and k 2 {\displaystyle k_{2}} , which
1650-457: The DHT. The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key k as specified by the keyspace partitioning. That node then stores the key and the data. Any other client can then retrieve the contents of the file by again hashing filename to produce k and asking any DHT node to find the data associated with k with a message get ( k ) . The message will again be routed through
1705-711: The Infrastructure for Resilient Internet Systems (Iris) was funded by a $ 12 million grant from the United States National Science Foundation in 2002. Researchers included Sylvia Ratnasamy , Ion Stoica , Hari Balakrishnan and Scott Shenker . Outside academia, DHT technology has been adopted as a component of BitTorrent and in PlanetLab projects such as the Coral Content Distribution Network. DHTs characteristically emphasize
Bencode - Misplaced Pages Continue
1760-448: The burden on those that serve torrent files. A torrent file using Merkle trees does not have a pieces key in the info list. Instead, such a torrent file has a root_hash key in the info list. This key's value is the root hash of the Merkle hash: BitTorrent v2 uses a different type of Merkle tree. A de-bencoded torrent file (with piece length 256 KiB = 262,144 bytes) for
1815-426: The circular keyspace is split into contiguous segments whose endpoints are the node identifiers. If i 1 {\displaystyle i_{1}} and i 2 {\displaystyle i_{2}} are two adjacent IDs, with a shorter clockwise distance from i 1 {\displaystyle i_{1}} to i 2 {\displaystyle i_{2}} , then
1870-580: The client has all the pieces, the torrent client assembles them into a usable form. They may also continue sharing the pieces, elevating their status to that of seeder rather than an ordinary peer. A torrent file contains a list of files and integrity metadata about all the pieces, and optionally contains a large list of trackers. A torrent file is a bencoded dictionary with the following keys (the keys in any bencoded dictionary are lexicographically ordered ): All strings must be UTF-8 encoded, except for pieces , which contains binary data. A torrent
1925-419: The client. Doing so reducing the load on the project's servers while maximizing download speed. MirrorBrain [ de ] automatically generates torrents with web seeds. BEP-0027 extends BitTorrent to support private torrents. A new key, private , is placed in the info dictionary. This key's value is 1 if the torrent is private: Private torrents are to be used with a private tracker . Such
1980-401: The closest node, which is the owner of k as defined above. This style of routing is sometimes called key-based routing . Beyond basic routing correctness, two important constraints on the topology are to guarantee that the maximum number of hops in any route (route length) is low, so that requests complete quickly; and that the maximum number of neighbors of any node (maximum node degree )
2035-499: The distributed hash table problem. Both consistent hashing and rendezvous hashing have the essential property that removal or addition of one node changes only the set of keys owned by the nodes with adjacent IDs, and leaves all other nodes unaffected. Contrast this with a traditional hash table in which addition or removal of one bucket causes nearly the entire keyspace to be remapped. Since any change in ownership typically corresponds to bandwidth-intensive movement of objects stored in
2090-411: The distribution, leveraging the ability of people to network " peer-to-peer ", among themselves. Each file to be distributed is divided into small information chunks called pieces . Downloading peers achieve high download speeds by requesting multiple pieces from different computers simultaneously in the swarm. Once obtained, these pieces are usually immediately made available for download by others in
2145-418: The efficiency and guaranteed results of Napster. One drawback is that, like Freenet, DHTs only directly support exact-match search, rather than keyword search, although Freenet's routing algorithm can be generalized to any key type where a closeness operation can be defined. In 2001, four systems— CAN , Chord , Pastry , and Tapestry —ignited DHTs as a popular research topic. A project called
2200-415: The file in addition to, or in place of, the primary server . A torrent file does not contain the content to be distributed; it only contains information about those files, such as their names, folder structure , sizes, and cryptographic hash values for verifying file integrity. The torrent system has been created to ease the load on central servers, as instead of having individual clients fetch files from
2255-468: The following properties: A key technique used to achieve these goals is that any one node needs to coordinate with only a few other nodes in the system — most commonly, O (log n ) of the n participants (see below) — so that only a limited amount of work needs to be done for each change in membership. Some DHT designs seek to be secure against malicious participants and to allow participants to remain anonymous , though this
Bencode - Misplaced Pages Continue
2310-419: The internet locations of peers who may be sharing pieces, the client connects to the trackers named in the torrent file, and/or achieves a similar result through the use of distributed hash tables . Then the client connects directly to the peers in order to request pieces and otherwise participate in a swarm. The client may also report progress to trackers, to help the tracker with its peer recommendations. When
2365-476: The keys k m {\displaystyle k_{m}} for which the hash weight h ( S x , k m ) {\displaystyle h(S_{x},k_{m})} is higher than the hash weight of any other node for that key. Locality-preserving hashing ensures that similar keys are assigned to similar objects. This can enable a more efficient execution of range queries, however, in contrast to using consistent hashing, there
2420-658: The mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures. DHTs form an infrastructure that can be used to build more complex services, such as anycast , cooperative web caching , distributed file systems , domain name services , instant messaging , multicast , and also peer-to-peer file sharing and content distribution systems. Notable distributed networks that use DHTs include BitTorrent 's distributed tracker,
2475-453: The node with ID i 2 {\displaystyle i_{2}} owns all the keys that fall between i 1 {\displaystyle i_{1}} and i 2 {\displaystyle i_{2}} . In rendezvous hashing, also called highest random weight (HRW) hashing, all clients use the same hash function h ( ) {\displaystyle h()} (chosen ahead of time) to associate
2530-456: The overlay to the node responsible for k , which will reply with the stored data . The keyspace partitioning and overlay network components are described below with the goal of capturing the principal ideas common to most DHTs; many designs differ in the details. Most DHTs use some variant of consistent hashing or rendezvous hashing to map keys to nodes. The two algorithms appear to have been devised independently and simultaneously to solve
2585-465: The owner of any given key in the keyspace. Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To index a file with given filename and data in the DHT, the SHA-1 hash of filename is generated, producing a 160-bit key k , and a message put ( k, data ) is sent to any node participating in
2640-418: The physical underlying network. In general, all DHTs construct navigable small-world network topologies, which trade-off route length vs. network degree. Maximum route length is closely related to diameter : the maximum number of hops in any shortest path between nodes. Clearly, the network's worst case route length is at least as large as its diameter, so DHTs are limited by the degree/diameter tradeoff that
2695-419: The piece-hashing and the infohash , replacing the broken SHA-1 hash. The "btmh" magnet link would contain the full 32-byte hash, while communication with trackers and on the DHT uses the 20-byte truncated version to fit into the old message structure. It is possible to construct a torrent file with only updated new fields for a "v2" torrent, or with both the old and new fields for a "hybrid" format. However, as
2750-528: The private flag, but doing so will change the info-hash (deterministically), forming a separate "swarm" of peers. On the other hand, changing the tracker list will not change the hash. The flag does not offer true privacy, instead operating as a gentlemen's agreement . These extensions are under consideration for standardization. Most are already widely adopted as de facto standards. BEP-0017 extends BitTorrent to support HTTP seeds, later more commonly termed "web seeds" to be inclusive of HTTPS . In BEP-0017,
2805-405: The server, torrent can crowd-source the bandwidth needed for the file transfer and reduce the time needed to download large files. Many free/freeware programs and operating systems, such as the various Linux distributions offer a torrent download option for users seeking the aforementioned benefits. Other large downloads, such as media files, are often torrented as well. Typically, Internet access
SECTION 50
#17328551401862860-406: The swarm. In this way, the burden on the network is spread among the downloaders, rather than concentrating at a central distribution hub or cluster. As long as all the pieces are available, peers (downloaders and uploaders) can come and go; no one peer needs to have all the chunks or to even stay connected to the swarm in order for distribution to continue among the other peers. A small torrent file
2915-609: The system design. The new system, codenamed Tonika or also known by its domain name as 5ttt, is based on an algorithm design known as "electric routing" and co-authored with the mathematician Jonathan Kelner. Maymounkov has now undertaken a comprehensive implementation effort of this new system. However, research into effective defences against Sybil attacks is generally considered an open question, and wide variety of potential defences are proposed every year in top security research conferences. Most notable differences encountered in practical instances of DHT implementations include at least
2970-457: The system find each other and form efficient distribution groups called swarms . Torrent files are normally named with the extension .torrent . A torrent file acts like a table of contents (index) that allows computers to find information through the use of a torrent client. With the help of a torrent file, one can download small parts of the original file from computers that have already downloaded it. These "peers" allow for downloading of
3025-536: The torrent, and when. These extensions have been deployed in one or more implementations as well as having been proven useful through consistent and widespread use. While they may require minor revisions, they are largely considered to be complete, only awaiting the blessing of Bram Cohen in order to be elevated to the status of Final/Active Process. BEP-0005 extends BitTorrent to support distributed hash tables , specifically Mainline DHT . A trackerless torrent dictionary does not have an announce key. Instead,
#185814