msd, rmsd - copy unique files back to minimum safe distance
msd [-DnvX] [-T top] [user@]host [targets]
msd -U [-T top] [paths]
msd -W [-T top] [sizes]
Msd creates a de-duped archive of all the unique files available from
each of the target directories on the given host. This archive is
packed into a structure that is easy to search by file size.
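As a rough illustration of the de-duplication idea, the sketch below
walks one directory tree and keeps a single path for each distinct
(size, checksum) pair. It is not msd's code: the function name
unique_files and the choice of SHA-256 are assumptions made only for
this example, and the real cache layout is not reproduced.

```python
import hashlib
import os


def unique_files(top, chunk=1 << 16):
    """Map (size, sha256 hex digest) to one path holding that content.

    Hypothetical helper: msd's real checksum and layout are not shown here.
    """
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Hash in chunks so large files are never held in memory.
                for block in iter(lambda: fh.read(chunk), b""):
                    digest.update(block)
            key = (os.path.getsize(path), digest.hexdigest())
            # Keep the first path seen for this content, ignore later copies.
            seen.setdefault(key, path)
    return seen
```

Keying on the size first mirrors the statement above that the archive
is easy to search by file size.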
The idea is to cache all the unique files from a "group" of hosts in a
common backup pool. As each host is collected, a log of the files
needed to rebuild that host (and where they would be placed in the
restored file systems) should be produced. This log is not currently
generated. Symbolic links are also not recorded at present, as they
would exist only as log entries.
Large files that compress well could be replaced with a copy compressed
by the best method available. Any such replacement would be recorded in
the host log, and would be done in the process described below.
After all the hosts in a group have been processed, a second pass
through the host manifest files would remove any file that was a
truncated version of another, replacing the log entry with a pointer to
the larger file and the required length. Any file which is the
uncompressed version of a compressed (gzip(1), xz(1), compress(1), etc.)
image of another file would similarly be noted. Archive files (tar(1),
ar(1), zip(1), etc.) could be indexed so their members could be
replaced with references, as well.
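The truncation check in that second pass could look something like the
sketch below. It is only an illustration: the function name
truncation_entry and the (path, length) tuple it returns are
hypothetical, since the format of the host log is not described here.

```python
import os


def truncation_entry(short_path, long_path, chunk=1 << 16):
    """Return (long_path, length) if short_path is a strict prefix of
    long_path, else None.  The tuple stands in for a log entry that
    points at the larger file plus the required length (assumed shape).
    """
    short_len = os.path.getsize(short_path)
    if short_len >= os.path.getsize(long_path):
        return None
    with open(short_path, "rb") as small, open(long_path, "rb") as large:
        remaining = short_len
        while remaining > 0:
            n = min(chunk, remaining)
            # Compare the same byte range from both files.
            if small.read(n) != large.read(n):
                return None
            remaining -= n
    # Note: an empty file counts as a prefix of any non-empty file.
    return (long_path, short_len)
```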
The resulting directory and per-host recipe files should be a very
compact representation of the entire cluster, with only one copy of each
unique file. The restoration of any member of the cluster should be
easy, but some information would be lost (file ownership, times, and
This has never been a problem, as we've used this to meet legal
requirements to keep "every file" for a mandated e-discovery. There was
no requirement to keep the modes, owners, or access times on the files.
And pulling any single file out of the cache is quite easy.
Rmsd is the remote service that msd starts on each target host. It
supports a very simple request/reply protocol to allow the checksum and
recovery of files under start (which defaults to the current working
Debug protocol by tracing the commands sent to the remote host.
Output a standard help message.
Do not download any files, just output which files should be
saved. This also builds all the cache directories, which looks
like a bug but really isn't. This allows the back-feeder to
allocate all the directory structure for a group of hosts in
Change the local cache root, rather than ".".
Output the size of the cached paths listed. Don't use ls(1), as
the file may not exist, may be compressed, or may be segmented.
Output a line for each file cached, implied by -n. Under -U and
-W include the requested input for each output line.
Output only the standard version information.
Output what path a file of the given size would be cached under.
Debug protocol by tracing traffic from the remote process.
Specify the remote login name for ssh(1) and the target host.
Specify the source directories to recursively search for unique
msd email@example.com /home/staff1/ksb
Archive all the unique files from my account on "nostromo" here.
msd -v firstname.lastname@example.org /home/staff1/ksb
Add all the unique files from my account on "sulaco" to the
above archive, and show me the list of the files added.
msd -W 4660046610375530309
Outputs "./xJ/0" which is the directory a file of that size
would be cached under.
msd -W 4660046610377530309
Outputs "./xJ/Y/W/T/P/H/B/0" because that size is a little
bigger (by 2,000,000) than an exact Fibonacci number.
msd -vU ./P/H/D/8/3/1
Outputs "./P/H/D/8/3/1: 18155" because that path will recover a
file that unpacks to 18,155 bytes on disk.
msd -W `msd -U kevin`
Outputs "./v/n/k/i/e/0" which is the canonical spelling. Some
names compress a lot (my last name does).
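The sizes in these examples can be checked with a little arithmetic.
The sketch below greedily splits a size into Fibonacci numbers, largest
first. That each directory level stands for one such term is an
assumption drawn from the examples, not something this page states, and
the mapping from terms to path characters is not attempted. The
function name fib_terms is made up for the illustration.

```python
def fib_terms(size):
    """Greedy split of size into Fibonacci numbers, largest first.

    Assumption: this mirrors how a size is spread over directory
    levels; the real character encoding of each level is not modeled.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    fibs = [1, 2]
    # Grow the table until the next Fibonacci number would exceed size.
    while fibs[-1] + fibs[-2] <= size:
        fibs.append(fibs[-1] + fibs[-2])
    terms = []
    for f in reversed(fibs):
        if f <= size:
            terms.append(f)
            size -= f
    return terms


print(fib_terms(18155))  # [17711, 377, 55, 8, 3, 1]
print(len(fib_terms(4660046610375530309)))  # 1
print(len(fib_terms(4660046610377530309)))  # 7
```

The six terms for 18155 line up with the six components of
"./P/H/D/8/3/1", and the single term for 4660046610375530309 agrees
with the short "./xJ/0" path.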
The purpose of this program is not clear until you need it. Mostly,
when you miss a file it is because it was unique on some host (while
being the same across all others). Msd only keeps the unique ones
(mostly), so you can find the one you want if you can grep for its
contents or know the file size.
There are missing parts to this facility that I've not released (or
finished coding) yet.
There is no manual page for rmsd(7l), yet.
KS Braunsdorf, Non-Player Character's Guild,
msd at ksb.npcguild.org.NoSpam.rm
ssh(1), grep(1), rsync(1), sbp(8l)