MSD(8)                                                                  MSD(8)



NAME
       msd, rmsd - copy unique files back to minimum safe distance

SYNOPSIS
       msd [-DnvX] [-T top] [user@]host [targets]
       msd -U [-T top] [paths]
       msd -W [-T top] [sizes]
       msd -h
       msd -V
       rmsd [start]

DESCRIPTION
       Msd  creates  a de-duped archive of all the unique files available from
       each of the target directories on the  given  host.   This  archive  is
       packed into a structure that is easy to search by file size.
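
       As a rough illustration of the de-duplication idea only (this is not
       msd's actual implementation), the sketch below keeps one path for each
       distinct file content, keyed by file size and a checksum.  The names
       file_digest and unique_files, and the choice of SHA-256, are
       assumptions made for this example.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of the file at path."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unique_files(targets):
    """Map (size, digest) to the first path seen with that content.

    Hypothetical helper: walks each target directory in sorted order and
    keeps one representative path per distinct file content.
    """
    unique = {}
    for top in targets:
        for dirpath, dirnames, filenames in os.walk(top):
            dirnames.sort()  # make the traversal order deterministic
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                # Skip symbolic links and anything that is not a plain file.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                key = (os.path.getsize(path), file_digest(path))
                unique.setdefault(key, path)
    return unique
```

       Keying on the size first mirrors the description above, where the
       archive is organized so that it is easy to search by file size.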

       The  idea is to cache all the unique files from a "group" of hosts in a
       common backup pool.  As each host is  collected  a  log  of  the  files
       needed  to  rebuild  that  host  (and where they would be placed in the
       restored file systems) should be produced.  This log is not currently
       generated.  Symbolic links are also not recorded at present, because
       they would exist only as entries in that log.

       Files which are large and compress well could be replaced with the best
       compression  available.   Any  such replacement would be updated in the
       host log, and would be done in the process described below.
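
       One way such a replacement might be chosen is sketched below.  This is
       an assumption-laden illustration, not part of msd: the function name
       best_compression, the size threshold, and the required saving are all
       hypothetical, and only compressors from the Python standard library
       are tried.

```python
import bz2
import gzip
import lzma


def best_compression(data, min_size=4096, min_saving=0.25):
    """Return (method, payload) for the given bytes.

    Hypothetical policy: files smaller than min_size are left alone, and a
    compressed form is used only when it saves at least min_saving of the
    original size.
    """
    if len(data) < min_size:
        return ("none", data)
    candidates = {
        "gzip": gzip.compress(data),
        "bzip2": bz2.compress(data),
        "xz": lzma.compress(data),
    }
    # Pick whichever compressor produced the smallest output.
    method, payload = min(candidates.items(), key=lambda kv: len(kv[1]))
    if len(payload) > len(data) * (1.0 - min_saving):
        return ("none", data)
    return (method, payload)
```

       The method name returned here stands in for whatever note the host
       log would need in order to unpack the file on restore.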

       After all the hosts in a  group  have  been  processed  a  second  pass
       through  the host manifest files would remove any file that was a trun-
       cated version of another, replacing the log entry with a pointer to the
       larger file and the required length.  Any file which is the uncompressed
       version of a compressed (gzip(1), xz(1), compress(1), etc.) file would
       similarly be noted.  Archive files (tar(1), ar(1), zip(1), etc.) could
       be indexed so that their members could be replaced with references as
       well.
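
       The truncation test described above can be sketched as a prefix
       comparison.  This is a minimal example under stated assumptions, not
       msd code: the function name is_truncation_of is hypothetical, and an
       empty file is treated as a truncation of any non-empty file.

```python
import os


def is_truncation_of(short_path, long_path, chunk_size=1 << 16):
    """Return True if short_path is a proper leading prefix of long_path."""
    short_size = os.path.getsize(short_path)
    if short_size >= os.path.getsize(long_path):
        return False
    with open(short_path, "rb") as short_fh, open(long_path, "rb") as long_fh:
        remaining = short_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            a = short_fh.read(n)
            b = long_fh.read(n)
            # A mismatch, or an unexpected end of file, means no match.
            if a != b or not a:
                return False
            remaining -= len(a)
    return True
```

       When the test succeeds, the log entry for the shorter file could
       record the longer file and the length of the shorter one, as the
       paragraph above describes.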

       The resulting directory and per-host recipe files should be a very com-
       pact representation of the entire cluster, with only  1  copy  of  each
       unique  file.   The  restoration of any member of the cluster should be
       easy, but some information would be lost (file  ownership,  times,  and
       modes).

       This has never been a problem, as we've used this to meet legal
       requirements to keep "every file" for a mandated e-discovery.  There
       was no requirement to keep the modes, owners, or access times on the
       files.  And pulling any single file out of the cache is quite easy.

       Rmsd is the remote service that msd starts on  each  target  host.   It
       supports a very simple request/reply protocol to allow the checksum and
       recovery of files under start (which defaults to  the  current  working
       directory).

OPTIONS
       -D
              Debug  protocol by tracing the commands sent to the remote host.

       -h
              Output a standard help message.

       -n
              Do not download any files, just output  which  files  should  be
              saved.   This also builds all the cache directories, which looks
              like a bug but really isn't.  This  allows  the  back-feeder  to
              allocate  all  the  directory  structure for a group of hosts in
              parallel.

       -T top
              Change the local cache root, rather than ".".

       -U
              Output the size of the cached paths listed.  Don't use ls(1), as
              the  file may not exist, may be compressed, or may be segmented.

       -v
              Output a line for each file cached, implied by -n.  Under -U and
              -W include the requested input for each output line.

       -V
              Output only the standard version information.

       -W
              Output what path a file of the given size would be cached under.

       -X
              Debug protocol  by  tracing  traffic  from  the  remote  process
              (rmsd).

       [user@]host
              Specify the remote login name for ssh(1) and the target host.

       targets
              Specify  the source directories to recursively search for unique
              files.

EXAMPLES
       msd ksb@nostromo.example.com /home/staff1/ksb
              Archive all the unique files from my account on "nostromo" here.

       msd -v ksb@sulaco.example.com /home/staff1/ksb
              Add  all  the  unique  files  from my account on "sulaco" to the
              above archive, and show me the list of the files added.

       msd -W 4660046610375530309
              Outputs "./xJ/0" which is the directory  a  file  of  that  size
              would be cached under.

       msd -W 4660046610377530309
              Outputs  "./xJ/Y/W/T/P/H/B/0"  because  that  number is a little
              bigger than an even Fibonacci number.

       msd -vU ./P/H/D/8/3/1
              Outputs "./P/H/D/8/3/1: 18155" because that path will recover  a
              file that unpacks to 18,155 bytes on disk.

       msd -W `msd -U kevin`
              Outputs "./v/n/k/i/e/0" which is the canonical spelling.  Some
              names compress a lot (my last name does).

BUGS
       The purpose of this program is not clear, until you need  it.   Largely
       when  you  miss  a file it is because it was unique on some host (while
       being the same across all others).  Msd  only  keeps  the  unique  ones
       (mostly) so you can find the unique one if you can grep for contents or
       know the file size.

       There are missing parts to this facility that  I've  not  released  (or
       finished coding) yet.

       There is no manual page for rmsd(7l), yet.

AUTHORS
       KS Braunsdorf, Non-Player Character's Guild,
       msd at ksb.npcguild.org.NoSpam.rm

SEE ALSO
       ssh(1), grep(1), rsync(1), sbp(8l)



                                     LOCAL                              MSD(8)
