FolderSync

From OpenP2P


OpenP2P 'FolderSync' is a proposed network that allows multiple nodes to synchronise the contents of a file system folder. The aim is a simple, reliable and efficient mechanism that serves as an open source equivalent to commercial solutions such as 'Dropbox' and 'Box'.


Functionality

A synchronised folder has the following functionality:

  • Nodes can modify their data at any time, whether connected to other nodes or not.
  • When two nodes with the same folder connect, they should merge their folders together.

Data Store

Nodes should build the directory structure on top of a hash-based data store. The structure is very similar to Unix inodes: all data is held in fixed-size blocks in a generic data store, and each directory is stored as a list of pairs, each consisting of a file name and a data block ID.

Unlike inode numbers, however, data block IDs are SHA256 hashes. This means that if two files have identical contents, they use the same blocks in the data store, and hence copy operations can be communicated quickly and stored efficiently.
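The deduplication property can be illustrated with a minimal sketch of a content-addressed store. This is not part of the proposal: the names `BlockStore`, `store_file` and `BLOCK_SIZE` are hypothetical, the 4096-byte block size is an assumption (the page does not specify one), and the store is held in memory purely for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the proposal does not specify one


class BlockStore:
    """In-memory content-addressed store keyed by SHA-256 hex digests."""

    def __init__(self):
        self.blocks = {}  # block ID (hex digest) -> block bytes

    def put(self, data: bytes) -> str:
        # The block ID is derived from the content, so identical
        # content always maps to the same key.
        block_id = hashlib.sha256(data).hexdigest()
        self.blocks[block_id] = data
        return block_id

    def get(self, block_id: str) -> bytes:
        return self.blocks[block_id]


def store_file(store: BlockStore, content: bytes) -> list:
    """Split content into fixed-size blocks and return their block IDs."""
    return [
        store.put(content[i:i + BLOCK_SIZE])
        for i in range(0, len(content), BLOCK_SIZE)
    ]


store = BlockStore()
original = store_file(store, b"a" * 5000)
blocks_before = len(store.blocks)
copy = store_file(store, b"a" * 5000)  # "copying" the file adds no new blocks
```

In this sketch a copied file yields the same list of block IDs as the original, so a peer that already holds those blocks would only need the IDs, not the data.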

Journal

When a node modifies a file within a synchronised folder, it should note the change in its journal, which is a log of all operations it has performed on its local copy of the folder.

The journal consists of a series of file IDs for the root directory. Recording only the root is sufficient because directories are themselves files that contain the block IDs of the files they conceptually contain, so a change to any file causes an 'upward cascade' that changes every enclosing directory up to and including the root.
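The upward cascade can be sketched as follows. Everything here is an illustrative assumption rather than the proposed format: directories are serialised as JSON lists of (name, block ID) pairs, each file fits in a single block, directory entries carry no type information, and the helpers `put`, `put_dir`, `get_dir` and `write_file` are hypothetical.

```python
import hashlib
import json

store = {}  # block ID -> bytes; stand-in for the hash-based data store


def put(data: bytes) -> str:
    block_id = hashlib.sha256(data).hexdigest()
    store[block_id] = data
    return block_id


def put_dir(entries: dict) -> str:
    """Store a directory as a sorted list of (name, block ID) pairs."""
    return put(json.dumps(sorted(entries.items())).encode())


def get_dir(block_id: str) -> dict:
    return dict(json.loads(store[block_id]))


def write_file(root_id: str, path: list, data: bytes) -> str:
    """Write data at path (a list of names) and return the new root ID.

    Every directory on the path is re-stored, so its ID changes too.
    The sketch assumes intermediate names refer to directories.
    """
    entries = get_dir(root_id)
    name = path[0]
    if len(path) == 1:
        entries[name] = put(data)
    else:
        child = entries.get(name) or put_dir({})
        entries[name] = write_file(child, path[1:], data)
    return put_dir(entries)


# The journal is the sequence of root directory IDs, oldest first.
journal = [put_dir({})]
journal.append(write_file(journal[-1], ["docs", "a.txt"], b"hello"))
journal.append(write_file(journal[-1], ["docs", "a.txt"], b"hello world"))
```

Editing `docs/a.txt` changes the ID of the file, then of `docs`, then of the root, so each edit appends a distinct root ID to the journal. Older roots remain resolvable for as long as their blocks are kept in the store.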

The journal is then communicated to other nodes so that they can reproduce the steps that the node took to create its synchronised folder.

Protocol

FolderSync requires a reliable stream protocol (normally that means TCP).

There are two parts to the FolderSync protocol:

  • Journal queries and updates.
  • Data block queries.

Journal Protocol

When nodes initially connect (and synchronisation has presumably been approved by both ends), they should send each other any new journal entries since their last connection. Node identification is not handled by this network; FolderSync is assumed to be built on top of the Root Network and can therefore take advantage of the identification and authentication it provides.

Once nodes are connected and have performed the initial journal transfer, they should maintain the connection in order to communicate any new journal entries to other nodes.
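One way to track "new journal entries since the last connection" is for each node to remember how many of its entries each peer has already been sent. The sketch below assumes exactly that. The `Node` class, its fields and the `exchange` helper are hypothetical, and the transport is replaced by direct method calls.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.journal = []  # this node's own root IDs, oldest first
        self.sent = {}     # peer name -> number of entries already sent
        self.remote = {}   # peer name -> that peer's journal as received

    def record(self, root_id):
        """Note a local change by appending the new root directory ID."""
        self.journal.append(root_id)

    def new_entries_for(self, peer_name):
        """Return entries the peer has not been sent, and mark them sent."""
        start = self.sent.get(peer_name, 0)
        entries = self.journal[start:]
        self.sent[peer_name] = len(self.journal)
        return entries

    def receive(self, peer_name, entries):
        self.remote.setdefault(peer_name, []).extend(entries)


def exchange(a, b):
    """Both ends send each other whatever is new since the last exchange."""
    b.receive(a.name, a.new_entries_for(b.name))
    a.receive(b.name, b.new_entries_for(a.name))


a, b = Node("a"), Node("b")
a.record("r1")
a.record("r2")
b.record("s1")
exchange(a, b)  # initial transfer on connect
a.record("r3")
exchange(a, b)  # later update over the maintained connection
```

The same `exchange` step covers both the initial transfer and later updates. A real implementation would also need to confirm delivery before advancing the `sent` counter, which this sketch omits.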

Data Block Protocol

A node can request a data block by sending its block ID (i.e. its SHA256 hash) to the node that has reported the block.
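Because a block ID is the hash of the block's contents, the requesting node can check a reply without trusting the sender. The sketch below assumes a wire format that the proposal does not define: the request is the raw 32-byte digest, and the response is a 4-byte big-endian length followed by the block data. All function names are hypothetical, and an in-memory stream stands in for the TCP connection.

```python
import hashlib
import io
import struct


def encode_request(block_id: str) -> bytes:
    """Assumed request format: the raw 32-byte SHA-256 digest."""
    return bytes.fromhex(block_id)


def serve_request(request: bytes, store: dict) -> bytes:
    """Assumed response format: 4-byte big-endian length, then the data.

    Raises KeyError if the block is not held; the sketch defines no
    'not found' reply.
    """
    data = store[request.hex()]
    return struct.pack(">I", len(data)) + data


def read_exact(stream, n: int) -> bytes:
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError("stream closed mid-message")
    return buf


def decode_response(stream, block_id: str) -> bytes:
    """Read one response and verify it against the requested block ID."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    data = read_exact(stream, length)
    if hashlib.sha256(data).hexdigest() != block_id:
        raise ValueError("block does not match requested ID")
    return data


block = b"hello"
block_id = hashlib.sha256(block).hexdigest()
peer_store = {block_id: block}

response = serve_request(encode_request(block_id), peer_store)
received = decode_response(io.BytesIO(response), block_id)
```

A corrupted or substituted block fails the hash check in `decode_response`, so the requester can discard it and ask another node.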
