I'm looking for a backup utility with incremental backups, but in a more complicated way.
I tried rsync, but it doesn't seem to be able to do what I want, or more likely, I don't know how to make it do it.
So this is an example of what I want to achieve with it. I have the following files:
testdir
├── picture1
├── randomfile1
├── randomfile2
└── textfile1
I want to run the backup utility and basically create an archive (or tarball) of all these files in a different directory:
$ mystery-command testdir/ testbak
testbak
└── 2020-02-16--05-10-45--testdir.tar
Now, let's say the next day, I add a file, so my structure looks like this:
testdir
├── picture1
├── randomfile1
├── randomfile2
├── randomfile3
└── textfile1
Now, when I run the mystery command, I get another tarball for that day:
$ mystery-command testdir/ testbak
testbak
├── 2020-02-16--05-10-45--testdir.tar
└── 2020-02-17--03-24-16--testdir.tar
Here's the kicker: I want the backup utility to detect that picture1, randomfile1, randomfile2 and textfile1 haven't changed since the last backup, and to back up only the new/changed files, in this case randomfile3, such that:
tester@raspberrypi:~ $ tar -tf testbak/2020-02-16--05-10-45--testdir.tar
testdir/
testdir/randomfile1
testdir/textfile1
testdir/randomfile2
testdir/picture1
tester@raspberrypi:~ $ tar -tf testbak/2020-02-17--03-24-16--testdir.tar
testdir/randomfile3
So, as one last example, let's say the next day I change textfile1 and add picture2 and picture3:
$ mystery-command testdir/ testbak
testbak/
├── 2020-02-16--05-10-45--testdir.tar
├── 2020-02-17--03-24-16--testdir.tar
└── 2020-02-18--01-54-41--testdir.tar
tester@raspberrypi:~ $ tar -tf testbak/2020-02-16--05-10-45--testdir.tar
testdir/
testdir/randomfile1
testdir/textfile1
testdir/randomfile2
testdir/picture1
tester@raspberrypi:~ $ tar -tf testbak/2020-02-17--03-24-16--testdir.tar
testdir/randomfile3
tester@raspberrypi:~ $ tar -tf testbak/2020-02-18--01-54-41--testdir.tar
testdir/textfile1
testdir/picture2
testdir/picture3
With this system, I would save space by backing up only the incremental changes between backups (with the main backup obviously containing all the initial files), and I would have backups of all the incremental changes; for example, if I made a change on day 2 and changed the same thing again on day 3, I could still get the file with the day-2 change, from before the day-3 change.
I think this is kind of how GitHub works :)
I know I could probably write a script that runs a diff and then selects the files to back up based on the result (or, more efficiently, just gets checksums and compares them), but I'd like to know if there's any utility that can do this a bit more easily :)
rsync is precisely that program that copies based on a diff. By default, it copies only when there is a difference in last-modified time or size, but it can even compare by checksum with -c.

The trouble here is that you're tar'ing the backups. This becomes easier if you don't do that. I don't even know why you're doing it. It might make sense if you were compressing them, but you're not even doing that.

The Wikipedia article on incremental backups has an example rsync command that goes roughly like this:
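(The command itself wasn't preserved here; this is a sketch of that style of invocation, with $src, $dst, $prev_backup and $new_backup as placeholder paths:)

$ rsync -va --link-dest="$dst/$prev_backup" "$src/" "$dst/$new_backup/"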
What it does is to hardlink files from the previous backup when they are unchanged from the source. There's also --copy-dest if you want it to copy instead (it's still faster when $dst is a remote or on a faster drive).

If you use a filesystem with subvolumes like btrfs, you can also just snapshot from the previous backup before rsync'ing. Snapshots are instantaneous and don't take additional space[1].
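(For instance, with the same placeholder names as above:)

$ btrfs subvolume snapshot "$dst/$prev_backup" "$dst/$new_backup"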
Or if you're using a filesystem that supports reflinks, then you can also do that. Reflinks are done by making a new inode but referring to the same blocks as the source file, implementing COW support. It's still faster than regular copy because it doesn't read and write the data, and it also doesn't take additional space[1].
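(Again a sketch with the same placeholders; --reflink=always makes cp fail rather than silently fall back to a regular copy:)

$ cp -a --reflink=always "$dst/$prev_backup" "$dst/$new_backup"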
Anyway, once you've done something like that, you can just do a regular rsync to copy the differences:
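(Without --link-dest this time, since the unchanged files are already in place:)

$ rsync -va "$src/" "$dst/$new_backup/"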
Though, you might want to add --delete, which would cause rsync to delete files from the destination that are no longer present in the source.

Another useful option is -i or --itemize-changes. It produces succinct, machine-readable output that describes what changes rsync is doing. I normally add that option and pipe like this:
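(A sketch combining the options mentioned here; |& and tee assume a bash-like shell:)

$ rsync -Pai --link-dest="$dst/$prev_backup" "$src/" "$dst/$new_backup/" |& tee "$dst/$new_backup.log"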
to keep a record of the changes via easily greppable files. The |& is to pipe both stdout and stderr.

The -P is short for --partial and --progress. --partial keeps partially transferred files, but more importantly, --progress reports per-file progress.

How this compares to archiving changes with tar
The above solutions result in directories that seem to hold everything. Even though that's the case, in total for any amount/frequency of backups, they would occupy around the same amount of space as having plain tar archives with only changes. That's because of how hardlinks, reflinks, and snapshots work. The use of bandwidth when creating the backups would also be the same.
The advantages are that each backup is a plain directory you can browse and restore from with standard tools, and that deletions are represented naturally: a file deleted from the source is simply absent from the next backup. With tar archives of changes, recording that a file foo was deleted is awkward: you'd have to either omit foo, mark it foo.DELETED, or do something complicated. I've never used duplicity, for example, but looking at its documentation, it seems it encodes deletions by adding an empty file of the same name in the new tar and holding the original signature of the file in a separate .sigtar file. I imagine it compares the original signature with that of an empty file to differentiate between a file deletion and a change to an actual empty file.

If one still wants to set up each backup as only holding the files that are different (added or modified), then one can use the --link-dest solution described above and then delete the hardlinks using something like the following:
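(A sketch of the idea: files hardlinked from the previous backup have a link count greater than 1, so deleting them leaves only the new/changed files:)

$ find "$dst/$new_backup" -type f -links +1 -delete
$ find "$dst/$new_backup" -type d -empty -delete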
[1] Strictly speaking, they do use additional space in the form of duplicate metadata, like the filename and such. However, I think anyone would consider that insignificant.
Although tar does have an incremental mode, there are a couple of more comprehensive tools to do the job, duplicity for example. They not only support incremental backups; it's also easy to configure a schedule on which a full backup needs to be taken. For example, in duplicity:

duplicity --full-if-older-than 1M
will make sure a full backup has run. They also support going back in time to a specific file; with plain tar, you'd have to go through all the incremental archives until you find the one that contains the right file. Additionally, they support encryption and uploading to a variety of backends (like sftp, blob storage, etc.). Obviously, if you encrypt, don't forget to make a good backup of your keys to a secondary backup!
Another important aspect is that you can verify the integrity of your backups, ensuring you can actually restore, e.g. using duplicity verify.
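(A sketch of typical usage; the local path and sftp URL are placeholders, not from the answer:)

$ duplicity --full-if-older-than 1M testdir/ sftp://user@backuphost/testbak
$ duplicity verify sftp://user@backuphost/testbak testdir/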
I would advise against a git-based backup strategy; large restores take significant time.
Why don't you consider git itself? The strategy you describe gets complicated to carry on with after one full backup and two incrementals. It's easy to get wrong, and it gets quite inefficient, depending on the changes. There would have to be some kind of rotation, i.e. a fresh full backup from time to time -- and do you then keep the old ones?
Given a working directory "testdir" with some items in it (files and subdirectories), git by default creates a hidden .git subdirectory for the data. That one would be for the local, additional version-control features. For backup, you can archive/copy it away onto a medium, or clone it over the network.
The version control you get (without asking for it) is a side effect of git's differential storage.
You can leave out all of the forking/branching and so on. This means you have one branch, called "master".
Before you can commit (actually write to the git archive/repository), you have to configure a minimal user for the config file. Then you should first learn and test in a subdirectory (maybe on tmpfs). Git can be just as tricky as tar, at times.
Anyway, as a comment said: backing up is easy; the hard part is restoring.
The only disadvantage of git would be the slight overhead/overkill.
The advantages are: git tracks content and file names, and it saves only what is necessary, based on diffs (for text files, at least).
Example
I have 3 files in a directory. After git init, git add . and git commit, I have a 260K .git directory.
Then I cp -r .git /tmp/abpic.git (a good place to save a backup :). I rm the 154K jpg, and also change a text file. I also rm -r .git.

Before restoring the files, I can get the exact differences:
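(The original command isn't preserved here; with the working .git gone, you would point git at the backed-up copy, for instance:)

$ git --git-dir=/tmp/abpic.git/ status
$ git --git-dir=/tmp/abpic.git/ diff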
Here, I want to follow the git restore hints.

After git --git-dir=/tmp/abpic.git/ restore \*: the jpeg is back, and the text file btext has not been updated (timestamp preserved). The modifications in atext were overwritten.

To reunite the repo and the (working) directory, you can just copy the .git directory back.
The files in the current directory are now identical to the .git archive (after the restore). New changes will be displayed and can be added and committed, without any planning. For a backup, you only need to store the archive on another medium. After modifying a file, you can use status or diff:
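(For instance, after changing btext:)

$ git status
$ git diff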
Just as git knows about the "+more" in file "btext", it will also store only that line incrementally. After git add . (or git add btext), the status command switches from red to green, and commit gives you the info. And you can really get at the contents, somehow:
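(A sketch; ls-tree lists the blob hashes of the files in the last commit, @ being HEAD:)

$ git ls-tree @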
then, using the first 4 hex digits of a blob's hash:
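(hypothetical hash prefix shown; cat-file prints that blob's stored content:)

$ git cat-file blob 1234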
Going back in time by one commit is:
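(@~ being the parent of the last commit:)

$ git ls-tree @~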
The blob of btext has a different hash before the last commit; the others have the same one.
An overview would be:
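(The original command isn't preserved; something like this would give it:)

$ git log --oneline --stat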
Instead of manually timestamped tar files, you have commits with a message and a date (and an author). Logically attached to these commits are the file lists and the contents.
Plain git is 20% more complicated than tar, but you get a decisive 50% more functionality out of it. I then wanted to make the OP's third change: change one file plus add two new "picture" files. I did, but now I have:
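(The original listing isn't preserved; a plain git log would show the situation:)

$ git log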
So what exactly did this Your Name guy do in his two commits, shortly before 6 pm?
The details of the last commit are:
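(for instance:)

$ git show --stat @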
And to check the second-to-last commit, whose message announces the two pictures:
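(similarly:)

$ git show --stat @~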
This happened because I tried git commit -a as a shortcut for git add . plus commit, and the two picture files were new (untracked). They showed in red in git status, but as I said, git is no trickier than tar or unix.

"Your stager only knows what you need, but I know what you want" (or the other way around; the point is that they aren't always the same).
Update:
Please see some caveats here: Is it possible to use tar for full system backups?
According to that answer, restoring incremental backups with tar is error-prone and should be avoided. Don't use the method below unless you're absolutely sure you can recover your data when you need it.
Per the documentation, you can use the -g/--listed-incremental option to create incremental tar files, e.g.:
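(A sketch using the file names mentioned below, with the date as a placeholder; the snapshot file data.inc must not exist yet for the first, full backup:)

$ tar --create --listed-incremental=data.inc --file=2020-02-16-data.tar testdir/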
Then, the next time, do something like:
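(tar updates data.inc on each run, so only files changed since the previous run get archived:)

$ tar --create --listed-incremental=data.inc --file=2020-02-17-data.tar testdir/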
where data.inc is your incremental metadata, and DATE-data.tar is your incremental archive.
I recommend star for incremental backups, because star has been verified to reliably support incremental dumps and restores. The latter does not work with GNU tar when you rename directories, even though it has been advertised for 28 years. Please read the section about incremental backups in the star man page at http://schilytools.sourceforge.net/man/man1/star.1.html ; it currently starts at page 53.
To download the source, get the schilytools tarball from http://sourceforge.net/projects/schilytools/files/
Check Is it possible to use tar for full system backups? for verification of the GNU tar bug.
I would recommend you take a look at Borg Backup.
This will handle backups that:
Are deduplicated. This indirectly makes them differential backups, but it has more advantages besides.
Are compressed
It will manage pruning of old backups using rules such as "keep one daily backup for a week, one weekly backup for a month, one monthly backup for a year"
It's really easy to set up and use.
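(A minimal sketch of typical usage; the repo path is a placeholder, and the prune rule mirrors the one quoted above:)

$ borg init --encryption=repokey /mnt/backup/borgrepo
$ borg create /mnt/backup/borgrepo::{now:%Y-%m-%d} testdir
$ borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=12 /mnt/backup/borgrepo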
Take a look at restic. It does incremental backups using deduplication. It's also very easy to use, so it's great for both beginner and advanced command-line users.
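(A sketch, with a local repository directory as a placeholder:)

$ restic init --repo /mnt/backup/restic
$ restic -r /mnt/backup/restic backup testdir
$ restic -r /mnt/backup/restic snapshots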
You could try BackupPC.
It allows incremental backups; you can decide how often to do them and how many to keep, and when viewing them you can see either a merged view or just the actual incremental backup itself. It also deduplicates whole files that exist in different backups, whether of the same or of different hosts.
It's most likely already packaged for your distribution.
This is not exactly what you are requesting, because it doesn't use tar. But it does use rsync, and it has worked very well for me. One of the abilities that I really like is the ability to drop incremental restore points over time without losing points before or after the one I am dropping. This allows me, for example, to have daily backups for the last 2 weeks, then thin those out once they're 2 weeks old so they're weekly for a couple of months, then thin those out further until they're monthly for a quarter or two, then thin those out to roughly quarterly over a timespan of years. I have a python script that I can share that can prune these automatically if you want. (Obviously, it comes with NO WARRANTY, as letting a computer automatically delete backups sounds a bit scary.)
What I do is use a ZFS pool & filesystem for storing backups. With the ZFS filesystem, which is (thankfully!) now usable on linux, you can take snapshots. When you write to a filesystem that has been snapshotted, it (smartly) writes a new version of only the changed blocks, thus making it an incremental backup. Even easier is that all of your snapshots can be mounted as a full (read only) Unix filesystem, that you can use all of your normal tools on to look at and copy from. Want to see what that file looked like 2 months ago? Just cd to the right folder and use less or vim or whatever to look at it. Want to see when a (hacked) wordpress install you were backing up went off the rails? Just do a grep for an identifying mark with something like
grep -in "somebadstring" /zfsbackup/computername/.zfs/snapshot/*/var/www/html/wp-config.php
You can even use Linux's LUKS system to do disk encryption and then present the mapped device as "drives" to ZFS, giving you an encrypted backup.
If you ever need to migrate your backups to a new drive, you can use zfs send & receive to move the entire filesystem.
It has been a year or two since I've set it up (I just keep adding on incremental backups and haven't needed to upgrade my backup drive for a while), so these will be rough instructions. Bear with me, or better yet, edit them.
First, make sure you have zfs, rsync, and, if you want to encrypt your backups, the LUKS tools installed.
Next, create any partition layout you might want on your backup drive. (You may want to make a small unencrypted partition that holds scripts for running the backup.)
Then, if you want disk encryption, encrypt the partition with LUKS (example assumes a backup drive of /dev/sde and a partition /dev/sde2 since /dev/sde1 is probably scripts):
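(The command itself wasn't included in the answer; the standard invocation would be:)

$ sudo cryptsetup luksFormat /dev/sde2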
(Put in a nice strong passphrase).
If you are doing disk encryption, now you need to open the volume:
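(Again a sketch; the mapped name matches the note below:)

$ sudo cryptsetup luksOpen /dev/sde2 zfsbackuppart1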
(Now an unencrypted version of the raw device should be available (mapped) at /dev/mapper/zfsbackuppart1).
Now, create your ZFS pool (the group of drive(s) holding the data; multiple drives/devices can be used for RAID if you wish):
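(For the single mapped device from above:)

$ sudo zpool create zfsbackup /dev/mapper/zfsbackuppart1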
This will create a ZFS pool named "zfsbackup".
Now, create a filesystem for each machine you are backing up:
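(A sketch; "computername" is a placeholder, matching the grep example above:)

$ sudo zfs create zfsbackup/computername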
And create a folder for each partition you want to back up from the source machine:
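(Assuming, say, you back up the source machine's / and /home:)

$ sudo mkdir -p /zfsbackup/computername/root /zfsbackup/computername/home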
Then, use rsync to copy files to there:
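(One rsync per source partition; this flag set is a typical choice, with -x keeping rsync on one filesystem and -H/-A/-X preserving hardlinks, ACLs, and xattrs. Assuming you pull from the source machine over ssh; adjust the paths if it's local:)

$ sudo rsync -avxHAX --delete root@computername:/ /zfsbackup/computername/root/
$ sudo rsync -avxHAX --delete root@computername:/home/ /zfsbackup/computername/home/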
Take a snapshot:
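(Named by date, for example:)

$ sudo zfs snapshot zfsbackup/computername@2020-02-16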
To disconnect the drive when you are done:
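(Export the pool, then close the LUKS mapping:)

$ sudo zpool export zfsbackup
$ sudo cryptsetup luksClose zfsbackuppart1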
And to set it up when taking another backup in the future, before the above rsync commands:
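(Reopen the LUKS volume and re-import the pool:)

$ sudo cryptsetup luksOpen /dev/sde2 zfsbackuppart1
$ sudo zpool import zfsbackup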
Let me know if you need more info on this approach, or are interested in a script to thin out backups as they get farther back in time.
And, yes, you can back up a whole system this way -- you just have to create partitions/filesystems (which don't have to match the original layout -- a great way to migrate stuff!), tweak /etc/fstab, and install GRUB & have it rescan/rebuild the GRUB config.
One possibility is AMANDA, the Advanced Maryland Automatic Network Disk Archiver, which among many other features also supports incremental backups.