IPFS alternative proposal

By Synthbot
Created: 2023-10-04 07:35:11
Updated: 2023-10-04 08:23:51
Expiry: Never

ppp torrent ipfs pag

Downsides of IPFS

It's hard to make IPFS reliable. There are a lot of configuration options with non-obvious consequences, and it's hard to find the important ones. One problem I keep running into is that IPFS takes up all available disk space and is then unable to service any requests.
It has severe CPU bottlenecks. On a computer that can normally transfer tens of megabytes per second, IPFS maxes out at 1 MB/s.
It can only use 1 disk per daemon, and the CLI doesn't support multiple daemons. It's possible to run multiple daemons by creating multiple users, but this introduces a lot of pin management complexity.

Use normal torrents instead of IPFS. Use an S3 interface to manage torrents. Use s3fs (a FUSE interface to S3 buckets) to access files.

What it should look like:

To create a new torrent, add a file to an S3 bucket. Some process will monitor the bucket and auto-generate a .torrent file, which you can download.
To locally cache and seed a file, add the .torrent file to the S3 bucket. Some process will monitor the bucket and start downloading & seeding the file.
To access everything in the S3 bucket as if it were a normal folder, use s3fs.

Why S3?

An S3 interface makes it easy to abstract away disks. A single S3 bucket can represent a hundred disks. The disks can be local or remote, cloud-managed or self-managed.
The S3 interface can deduplicate data on disk, which torrents usually do not.
For anyone with the bandwidth to do so, S3 also lets you provide direct downloads instead of torrent downloads.
On clouds, S3 storage is often cheaper than disks.
S3 is easier to work with programmatically than torrents and IPFS.

Data deduplication

Ceph supports deduplication for S3 object storage (https://docs.ceph.com/en/latest/dev/deduplication/). Downloading torrents to an s3fs filesystem would let us add deduplication benefits to torrents.
This isn't as good as IPFS's automatic deduplication, but it should be good enough when run periodically and when combined with the Incrementally updating datasets guidelines.
I haven't used Ceph's deduplication features myself, and I don't know how hard it will be to set up. This seems difficult right now, but it looks like Ceph intends to expand support for it over time.

Creation of data subsets

IPFS makes it somewhat easy to create a new pin that bundles together files from other pins.
The same is possible with torrent files. A torrent file contains a list of files to download (in its info section) along with tracker and hash information for where to find them. That's enough to create a new torrent file that bundles files taken from other torrents, though the piece hashes may need to be recomputed when file boundaries don't line up with piece boundaries.
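
A rough sketch of bundling files into a new multi-file torrent, assuming the classic (v1) torrent layout and that the file contents are available locally. The function names, the 256 KiB piece length, and the tracker URL in the usage note are my own placeholders. Because v1 pieces are hashed over the concatenation of all files, this sketch recomputes the hashes from the data instead of copying them from the source torrents.

```python
import hashlib


def bencode(value):
    """Encode ints, strings, bytes, lists and dicts in bencode form."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be sorted.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_info(name, files, piece_length=262144):
    """Build a v1 info dict from a list of (path_parts, data_bytes) pairs.

    Everything is held in memory, which is only reasonable for a sketch.
    """
    stream = b"".join(data for _, data in files)
    pieces = b"".join(
        hashlib.sha1(stream[i:i + piece_length]).digest()
        for i in range(0, len(stream), piece_length)
    )
    return {
        "name": name,
        "piece length": piece_length,
        "pieces": pieces,
        "files": [{"length": len(data), "path": list(parts)}
                  for parts, data in files],
    }


def build_torrent(name, files, tracker_url, piece_length=262144):
    """Return the bytes of a .torrent file for the given files."""
    info = build_info(name, files, piece_length)
    return bencode({"announce": tracker_url, "info": info})
```

For example, build_torrent("subset", [(["a.txt"], data_a), (["sub", "b.txt"], data_b)], "http://tracker.example/announce") would return the bytes to write to subset.torrent. The tracker URL there is made up.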

Web interfaces for interacting with pins

Torrent files contain enough information to navigate directory structures, similar to IPFS pins. Torrent clients can convert magnet links to torrent files.
It looks like there are libraries for running torrent clients in the browser, similar to Helia for IPFS.
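
To illustrate the directory-navigation point, here is a minimal sketch that decodes a torrent file and lists the paths and sizes it describes. It assumes the classic (v1) layout with either a single "length" or a "files" list in the info section, and it does no validation of malformed input. The function names are my own.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    # Anything else is a length-prefixed byte string such as b"4:demo".
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def list_files(torrent_bytes):
    """Return (path, size) pairs for every file a torrent describes."""
    meta, _ = bdecode(torrent_bytes)
    info = meta[b"info"]
    name = info[b"name"].decode("utf-8")
    if b"files" not in info:
        # Single-file torrent: the name is the file itself.
        return [(name, info[b"length"])]
    return [
        ("/".join([name] + [part.decode("utf-8") for part in entry[b"path"]]),
         entry[b"length"])
        for entry in info[b"files"]
    ]
```

A web interface could build its folder tree from the output of list_files without downloading any of the actual data.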

Incrementally updating datasets

Doing this efficiently will require following certain guidelines.
- The path for any static data should be fully determined by its hash (preferably SHA512).
- The path for metadata should be determined by its version.
To create a new version of the torrent, add all of the files in the static data folder, and add the newest metadata file. Make sure to specify the same "download folder" for all versions of the same dataset.
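
A small sketch of what these guidelines could look like in practice. The "static/" and "metadata/" folder names and the version file format are placeholders I picked for illustration, not part of the proposal.

```python
import hashlib


def static_path(data: bytes) -> str:
    """Path for static data, fully determined by its SHA-512 hash."""
    return "static/" + hashlib.sha512(data).hexdigest()


def metadata_path(version: int) -> str:
    """Path for a metadata file, determined only by its version."""
    return f"metadata/v{version:06d}.json"


def torrent_file_list(static_paths, version):
    """Files to include in a given version of the dataset's torrent.

    Every version lists all static files plus only the newest metadata file.
    """
    return sorted(set(static_paths)) + [metadata_path(version)]
```

Since a static file's path never changes between versions, a client that uses the same download folder for each version should find the unchanged files already on disk and only need to fetch the new ones.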

Setting up local storage:

Run Minikube to set up a local kubernetes cluster.
Install Rook-Ceph in the cluster.
Use Rook to create an object store. This will create an internal gateway for accessing objects (the RADOS Gateway).
Expose the Rados Gateway either locally or (if you want others to access it directly) publicly.
All of this can be done in a short script.
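
As a partial sketch of the object store step, the script below prints a CephObjectStore manifest as JSON. It assumes the Rook operator and a Ceph cluster are already running in the "rook-ceph" namespace, and the field names follow my understanding of Rook's CephObjectStore resource, so check them against the Rook documentation for the version in use. The store name "my-store" and the single-replica pools (for a one-node Minikube cluster) are placeholders.

```python
import json


def object_store_manifest(name="my-store", namespace="rook-ceph", replicas=1):
    """Build a CephObjectStore manifest as a plain dict."""
    return {
        "apiVersion": "ceph.rook.io/v1",
        "kind": "CephObjectStore",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # One replica only makes sense on a single-node test cluster.
            "metadataPool": {"replicated": {"size": replicas}},
            "dataPool": {"replicated": {"size": replicas}},
            # The gateway section is what starts the RADOS Gateway.
            "gateway": {"port": 80, "instances": 1},
        },
    }


if __name__ == "__main__":
    print(json.dumps(object_store_manifest(), indent=2))
```

Since kubectl accepts JSON manifests, the output can be piped into "kubectl apply -f -".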

Converting between S3 files and torrents:

Create a bucket whose files should be monitored.
Use s3fs to get a filesystem interface to the bucket.
Run a torrent client to seed and download data. The torrent client reads and writes data through the filesystem, and s3fs transparently converts this to S3 operations.
Use Rook to set up a bucket notification for changes to the bucket.
Whenever a new metadata file is created, create a matching .torrent file, and notify the torrent client of the new file.
Whenever a .torrent file gets added, notify the torrent client of the new file.
I haven't used s3fs or any API for interacting with torrent clients, so I don't know how difficult or reliable those parts will be. The notification part can be done with a short script.
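
The notification part could look roughly like the sketch below. It assumes notifications arrive as S3-style JSON with a "Records" list, where each record carries an "eventName" and the object key under "s3" → "object" → "key". That is the shape I'd expect from Ceph's bucket notifications, but the field names should be checked against a real event. The "metadata/" prefix and the action names are placeholders.

```python
import json


def plan_actions(event_json, metadata_prefix="metadata/"):
    """Turn one bucket notification into a list of (action, key) pairs."""
    actions = []
    for record in json.loads(event_json).get("Records", []):
        # Only react to newly created objects, not deletions.
        if "ObjectCreated" not in record.get("eventName", ""):
            continue
        key = record["s3"]["object"]["key"]
        if key.endswith(".torrent"):
            # Hand the .torrent file to the torrent client.
            actions.append(("add_torrent", key))
        elif key.startswith(metadata_prefix):
            # Generate a matching .torrent file for the new metadata.
            actions.append(("create_torrent", key))
    return actions
```

If the generated .torrent file is written back into the same bucket, its own creation event would trigger the "add_torrent" step, so the two rules chain together without extra code.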

For browsing downloaded files:

Use s3fs to create a folder that mirrors the S3 bucket contents.
I haven't used s3fs, so I don't know how reliable it will be. It looks like it's possible to run it on Windows, though that requires building it from source. I don't know how easy or reliable that will be. If it doesn't work, it looks like rclone and WinS3FS might work too.
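
For reference, a sketch of how the mount command might be assembled on Linux. The bucket name, mount point, endpoint URL, and credentials file in the usage note are placeholders. The url, passwd_file, and use_path_request_style options are s3fs options that are commonly needed for non-AWS endpoints such as a RADOS Gateway, but I haven't tested this against one.

```python
import shlex


def s3fs_mount_command(bucket, mountpoint, endpoint_url, passwd_file):
    """Build the argument list for mounting a bucket with s3fs."""
    return [
        "s3fs", bucket, mountpoint,
        "-o", f"url={endpoint_url}",
        "-o", f"passwd_file={passwd_file}",
        # Use http://host/bucket/key rather than http://bucket.host/key.
        "-o", "use_path_request_style",
    ]


def as_shell(command):
    """Render the command as a shell-safe string for logging or copying."""
    return shlex.join(command)
```

Calling as_shell(s3fs_mount_command("datasets", "/mnt/datasets", "http://localhost:8080", "/home/me/.passwd-s3fs")) gives a line that can be pasted into a terminal. The same list can also be passed to subprocess.run.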