OFTorrent

From OpenP2P


OFTorrent uses the concept of the Owner-Free File System to modify torrents so that they provide significantly improved privacy. The basic idea is to break a file down into fixed-size blocks and then xor them with other blocks, which have either been generated randomly or pulled from other nodes in the network. Nodes retrieving a file download each of the blocks and xor them together to reconstruct the actual file data, using the block lists in a torrent-like file. The significance of this mechanism is that a single block can be used to produce more than one file (and, as far as an observing node can tell, it could just be random data that is not used for any file at all).
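As a minimal illustration of why a stored block need not belong to any single file: xor-ing the same block with two different partner blocks yields two different outputs. The helper `xor_blocks`, the `BLOCK_SIZE` value and the tiny "files" below are hypothetical and chosen only to keep the example short.

```python
import os

BLOCK_SIZE = 16  # tiny block size, for illustration only


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# A block stored on the network: on its own it is indistinguishable from random bytes.
stored = os.urandom(BLOCK_SIZE)

# Two unrelated pieces of data (16 bytes each).
data_a = b"first file data!"
data_b = b"second file data"

# Each has its own partner block; the stored block is shared by both.
partner_a = xor_blocks(stored, data_a)
partner_b = xor_blocks(stored, data_b)

# The same stored block reconstructs either piece of data, depending on its partner.
assert xor_blocks(stored, partner_a) == data_a
assert xor_blocks(stored, partner_b) == data_b
```

An observer holding only `stored` cannot tell whether it contributes to `data_a`, to `data_b`, or to nothing at all.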


Creating a torrent

  1. Generate a random AES-256 key
  2. Encrypt the data with the key
  3. Split the encrypted data into appropriately sized blocks (generally from 16KB up to 4MB)
  4. Select an equal number of 'randomizing' blocks, either by using existing blocks (which could have been pulled from the network or produced while creating another torrent) or by generating them randomly
  5. Xor each encrypted data block with the corresponding randomizing block
  6. Store the result blocks
  7. For each result block, store our IP address on the DHT under the block's SHA-256 hash
  8. Finally, generate the OFTorrent file, containing the key and, for each data block, the SHA-256 hashes of each of the blocks that need to be xor-ed together to produce it, and distribute this to users so they can download the file
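Steps 3 to 8 above can be sketched roughly as follows. This is a minimal sketch under several assumptions: the input is taken to be already AES-256 encrypted (steps 1 and 2 are not shown), every randomizing block is generated freshly rather than reused, the final block is zero-padded, and plain dicts stand in for local block storage and the DHT. The function name `create_torrent` and the manifest layout are hypothetical, not the actual OFTorrent file format.

```python
import hashlib
import os
from typing import Dict, List, Set

BLOCK_SIZE = 16 * 1024  # 16KB, the lower end of the range in step 3


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def create_torrent(encrypted: bytes, key: bytes, my_address: str,
                   store: Dict[str, bytes], dht: Dict[str, Set[str]],
                   block_size: int = BLOCK_SIZE) -> dict:
    """Sketch of steps 3-8 for data that is assumed to be AES-256 encrypted already.

    `store` (hash -> block) and `dht` (hash -> set of addresses) are plain dicts
    standing in for local storage and the real DHT; the returned dict is a
    hypothetical manifest layout.
    """
    manifest_blocks: List[List[str]] = []
    for offset in range(0, len(encrypted), block_size):
        # Step 3. Assumption: the final block is zero-padded, and the manifest
        # records the true length so the padding can be stripped later.
        chunk = encrypted[offset:offset + block_size].ljust(block_size, b"\0")
        # Step 4: always a fresh random block here; reusing existing blocks
        # from the network is the other option the steps allow.
        randomizer = os.urandom(block_size)
        result = xor_blocks(chunk, randomizer)  # step 5
        hashes = []
        # Steps 6-7: the fresh randomizer is stored and announced too,
        # since nobody else holds it yet.
        for block in (result, randomizer):
            h = hashlib.sha256(block).hexdigest()
            store[h] = block
            dht.setdefault(h, set()).add(my_address)
            hashes.append(h)
        manifest_blocks.append(hashes)
    # Step 8: the key plus, for each data block, the hashes to xor together.
    return {"key": key.hex(), "length": len(encrypted),
            "block_size": block_size, "blocks": manifest_blocks}
```

For example, 40,000 bytes of stand-in ciphertext with the default 16KB block size produce three manifest entries, each naming one result block and one randomizing block.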

Downloading a torrent

  1. Connect to the DHT (or, more likely, already be connected)
  2. For each of the blocks specified in the OFTorrent file, find the IP addresses of nodes seeding it by looking up its SHA-256 hash (as given in the OFTorrent file) in the DHT
  3. Download each of the blocks from these IP addresses
  4. For each data block, xor together the blocks listed for it in the OFTorrent file to recover the encrypted data block
  5. Decrypt the recovered data with the AES-256 key given in the OFTorrent file
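The lookup, download and xor stages of downloading can be sketched as below. It is a minimal sketch under stated assumptions: the manifest is a dict with a `blocks` list (one list of hex SHA-256 hashes per data block) and a `length` field, a plain dict stands in for the DHT, and `fetch(address, hash)` is a hypothetical network call supplied by the caller. Checking each downloaded block against its hash is an addition in this sketch, and the final AES-256 decryption is left out, so the function returns the still-encrypted data.

```python
import hashlib
from functools import reduce
from typing import Callable, Dict, List, Set


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def download_torrent(manifest: dict,
                     dht: Dict[str, Set[str]],
                     fetch: Callable[[str, str], bytes]) -> bytes:
    """Look up seeders, download blocks and xor them back into encrypted data."""
    cache: Dict[str, bytes] = {}  # blocks already held are never downloaded twice
    encrypted_blocks: List[bytes] = []
    for hashes in manifest["blocks"]:
        parts = []
        for h in hashes:
            if h not in cache:
                # Try each seeder listed in the DHT until one returns a valid block.
                for address in dht.get(h, set()):
                    block = fetch(address, h)
                    if hashlib.sha256(block).hexdigest() == h:
                        cache[h] = block
                        break
                else:
                    raise LookupError(f"no seeder returned a valid block for {h}")
            parts.append(cache[h])
        # Xor all listed blocks together to recover one encrypted data block.
        encrypted_blocks.append(reduce(xor_blocks, parts))
    # Strip any padding on the final block using the recorded length.
    return b"".join(encrypted_blocks)[:manifest["length"]]
```

The result would then be passed to an AES-256 decryption routine using the key from the OFTorrent file.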

Privacy

The mechanisms above provide the most privacy when the OFTorrent file is only distributed within a closed group. In this case, because AES output looks random, it is infeasible to identify which files any block found on the network can be used to generate. Even if the OFTorrent file is distributed widely, it is very unclear to observers whether the seeding IP addresses are even aware of the underlying file, since they could be seeding another file that uses the same block(s). As described above, nodes will seed all the blocks for a file; in the interests of privacy it is likely wise to seed only a randomly sized, randomly selected portion of the blocks, ensuring that at least one of the blocks isn't being seeded (this could also take into account the relative availability of each block), and hence introducing reasonable doubt.
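Choosing a randomly sized, randomly selected portion of a file's blocks that always leaves at least one block out might look like the sketch below. The function name `choose_blocks_to_seed` is hypothetical, and this sketch picks blocks uniformly; weighting the choice by the relative availability of each block, as suggested above, is not modelled.

```python
import random
from typing import List, Optional, Sequence


def choose_blocks_to_seed(block_hashes: Sequence[str],
                          rng: Optional[random.Random] = None) -> List[str]:
    """Pick a random-size, random subset of blocks, never all of them."""
    rng = rng or random.Random()
    if len(block_hashes) < 2:
        # With a single block, seeding anything would mean seeding everything.
        return []
    count = rng.randint(1, len(block_hashes) - 1)  # at least one, at most n - 1
    return rng.sample(list(block_hashes), count)
```

Because at least one block is always withheld, no single seeder is ever observed offering every block a file needs.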

In the system as described, the amount of data to be downloaded will be twice the size of the file itself, since every data block is paired with one randomizing block, which adds a considerable overhead. However, this overhead can be reduced by reusing some of the randomizing blocks or by using some of the encrypted data blocks as randomizing blocks; both approaches still maintain the property that a single block can be used to generate multiple different files. In general it should be possible to achieve an overhead of less than 50% by these methods, although a smaller overhead is likely to mean weaker privacy (but I need to study this). It is important to note that nodes are likely to already hold many blocks downloaded for other files, so any recurrence of these blocks in other files will not require re-downloading.
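The overhead arithmetic can be made concrete by counting the distinct blocks a manifest requires, minus any blocks a node already holds, relative to the number of data blocks. The function `download_overhead` and the block names below are hypothetical; the three cases show the naive scheme (100%), randomizing blocks each reused twice (50%), and the same manifest when the shared randomizers are already held locally (0%).

```python
from typing import Iterable, List


def download_overhead(manifest_blocks: List[List[str]],
                      already_held: Iterable[str] = ()) -> float:
    """Extra blocks to fetch per data block, as a fraction (1.0 means 100%).

    Can go negative when enough blocks are already held locally.
    """
    needed = {h for hashes in manifest_blocks for h in hashes}
    to_fetch = needed - set(already_held)
    data_blocks = len(manifest_blocks)
    return (len(to_fetch) - data_blocks) / data_blocks


# One fresh randomizing block per data block: 8 distinct blocks for 4 data blocks.
naive = [["r1", "x1"], ["r2", "x2"], ["r3", "x3"], ["r4", "x4"]]
# Each randomizing block reused twice: 6 distinct blocks for 4 data blocks.
reused = [["r1", "x1"], ["r2", "x1"], ["r3", "x2"], ["r4", "x2"]]

print(download_overhead(naive))                  # 1.0
print(download_overhead(reused))                 # 0.5
print(download_overhead(reused, ["x1", "x2"]))   # 0.0
```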

A clear focus of attack appears to be the original uploader, since they will be seeding all the original blocks, many of which may only just have appeared on the network. This is mitigated considerably by incorporating a gossip protocol, through which nodes can announce blocks they possess, whether or not they know which files those blocks could generate; they may in fact have just generated them randomly themselves. Other nodes can then download these blocks at random times and begin seeding them to other nodes, allowing existing seeders to reduce their commitments and hence improve their privacy. In future the protocol may introduce an economy-like system (e.g. Tribler's BarterCast) to encourage seeding behaviour.
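The gossip idea can be illustrated with a toy simulation: each node announces the hashes it holds without saying which files they belong to, and in each round every node asks one random peer for its announcements and replicates one block it lacks. This is not a specification of OFTorrent's gossip protocol (which the text does not define); `Node`, `announce` and `gossip_round` are hypothetical names for illustration only.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def announce(self) -> Set[str]:
        """Advertise held block hashes, with no indication of which files use them."""
        return set(self.blocks)


def gossip_round(nodes: List[Node], rng: random.Random) -> None:
    """Each node asks one random peer what it holds and copies one missing block."""
    if len(nodes) < 2:
        return
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        missing = sorted(peer.announce() - set(node.blocks))
        if missing:
            h = rng.choice(missing)
            node.blocks[h] = peer.blocks[h]  # `node` can now seed this block too
```

After a few rounds, blocks that started only on the uploader are spread across other nodes, so the uploader is no longer the only visible source.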

Efficiency

OFTorrent is not as efficient as typical torrenting, due to various small communication overheads (although BitTorrent now uses a DHT too), the use of encryption (although BitTorrent often encrypts data now as well), and the size overhead introduced by the randomizing blocks. However, typical torrents provide little privacy: it is trivial to get a list of IP addresses seeding a particular file, with certainty that they are seeding that file. OFTorrent should therefore be compared to existing solutions that provide privacy.

The most significant method of improving privacy, and the one that provides maximum privacy, is a darknet. Unlike OFTorrent, darknets aim to ensure that only a node's neighbours are even aware that the node exists and is using the protocol, and all communication must go through a node's neighbours. This is a very convenient way to transfer small amounts of data privately, but it introduces a large overhead because data must be relayed through multiple nodes: throughput is limited by the slowest link on the path, and latency is the sum of the per-hop latencies. Even in its Opennet mode, a system such as Freenet is very slow because data must be routed between multiple computers. OFTorrent transfers all data blocks directly from seeder to downloader, avoiding these overheads.
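A back-of-the-envelope model makes the routing cost concrete: along a relayed path, throughput is capped by the slowest hop and latency adds up across hops. The model is idealized (it ignores per-hop processing and queuing), and the link figures below are made-up illustrative numbers, not measurements of Freenet or any other darknet.

```python
from typing import Sequence, Tuple


def path_performance(bandwidths_mbps: Sequence[float],
                     latencies_ms: Sequence[float]) -> Tuple[float, float]:
    """Idealized path: throughput = slowest hop, latency = sum of hop latencies."""
    return min(bandwidths_mbps), sum(latencies_ms)


# Three relayed hops versus a single direct seeder-to-downloader link.
routed = path_performance([10.0, 2.0, 5.0], [20.0, 50.0, 30.0])
direct = path_performance([10.0], [20.0])
print(routed)  # (2.0, 100.0)
print(direct)  # (10.0, 20.0)
```

In this example the relayed path gets a fifth of the direct link's throughput at five times its latency.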

OFTorrent is a modification of the Owner-Free File System, and much of the modification came from a desire to improve performance. The Owner-Free File System actually stores all data blocks in the DHT, meaning that even relatively small files can take a long time to be stored in the network, so the original upload is very time-consuming. Furthermore, that system can place significant load on nodes that simply happen to have DHT IDs close to the hash of a popular block, rather than spreading the load across all nodes that have the block and are prepared to seed it. OFTorrent also increases load on such nodes in a similar way, but there the data stored is IP addresses, so messages are very small and bandwidth consumption is negligible. It is also much easier for other nodes to cache the IP address lists so they can handle future requests. The Root Network solves this problem in a similar way; there it has the potential to be much worse.
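To put "very small" in numbers, the sketch below encodes a list of IPv4 seeders at 6 bytes each (4-byte address plus 2-byte port), the compact format BitTorrent trackers use. That OFTorrent would use the same encoding is an assumption, as are the 50 hypothetical seeder addresses; the point is the size of a DHT reply compared with the blocks themselves.

```python
import ipaddress
import struct
from typing import List, Tuple


def encode_peers(peers: List[Tuple[str, int]]) -> bytes:
    """Pack IPv4 peers as 4-byte address + 2-byte big-endian port each."""
    return b"".join(ipaddress.IPv4Address(ip).packed + struct.pack("!H", port)
                    for ip, port in peers)


peers = [(f"192.0.2.{i}", 6881) for i in range(1, 51)]  # 50 hypothetical seeders
reply = encode_peers(peers)
print(len(reply))  # 300
for block_size in (16 * 1024, 4 * 1024 * 1024):  # the 16KB and 4MB extremes
    print(f"{block_size} B block: peer list is {len(reply) / block_size:.4%} of it")
```

Even for the smallest 16KB block, a reply listing 50 seeders is under 2% of the size of serving the block itself.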
