I'm looking for a backup utility with incremental backups, but in a more complicated way.
I tried rsync, but it doesn't seem to be able to do what I want, or more likely, I don't know how to make it do it.
So this is an example of what I want to achieve with it. I have the following files:
testdir
├── picture1
├── randomfile1
├── randomfile2
└── textfile1
I want to run the backup utility and basically create an archive (or tarball) of all these files in a different directory:
$ mystery-command testdir/ testbak
testbak
└── 2020-02-16--05-10-45--testdir.tar
Now, let's say the next day, I add a file, so my structure looks like this:
testdir
├── picture1
├── randomfile1
├── randomfile2
├── randomfile3
└── textfile1
Now, when I run the mystery command, I get another tarball for that day:
$ mystery-command testdir/ testbak
testbak
├── 2020-02-16--05-10-45--testdir.tar
└── 2020-02-17--03-24-16--testdir.tar
Here's the kicker: I want the backup utility to detect that picture1, randomfile1, randomfile2 and textfile1 haven't changed since the last backup, and to back up only the new/changed files, in this case randomfile3, such that:
tester@raspberrypi:~ $ tar -tf testbak/2020-02-16--05-10-45--testdir.tar
testdir/
testdir/randomfile1
testdir/textfile1
testdir/randomfile2
testdir/picture1
tester@raspberrypi:~ $ tar -tf testbak/2020-02-17--03-24-16--testdir.tar
testdir/randomfile3
So, as one last example, let's say the next day I change textfile1 and add picture2 and picture3:
$ mystery-command testdir/ testbak
testbak/
├── 2020-02-16--05-10-45--testdir.tar
├── 2020-02-17--03-24-16--testdir.tar
└── 2020-02-18--01-54-41--testdir.tar
tester@raspberrypi:~ $ tar -tf testbak/2020-02-16--05-10-45--testdir.tar
testdir/
testdir/randomfile1
testdir/textfile1
testdir/randomfile2
testdir/picture1
tester@raspberrypi:~ $ tar -tf testbak/2020-02-17--03-24-16--testdir.tar
testdir/randomfile3
tester@raspberrypi:~ $ tar -tf testbak/2020-02-18--01-54-41--testdir.tar
testdir/textfile1
testdir/picture2
testdir/picture3
With this system, I would save space by backing up only the incremental changes between backups (with the main backup obviously containing all the initial files), and I would have backups of all the incremental changes; for example, if I made a change on day 2 and changed the same thing again on day 3, I could still get the file with the day-2 change, from before the day-3 change.
I think this is kind of how GitHub works :)
I know I could probably write a script that runs a diff and then selects the files to back up based on the result (or, more efficiently, just gets checksums and compares them), but I'd like to know if there's any utility that can do this a bit more easily :)
rsync is precisely that program that copies based on a diff. By default, it copies only when there is a difference in last-modified time or size, but it can even compare by checksum with -c.

The trouble here is that you're tar'ing the backups. This becomes easier if you don't do that. I don't even know why you're doing it. It might make sense if you were compressing them, but you're not even doing that.

The Wikipedia article on incremental backups has an example rsync command that goes roughly like this:
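(The command itself wasn't preserved here; this is a sketch of that style of invocation, with $src, $dst, $prev_backup and $new_backup as placeholder paths:)

$ rsync -va --link-dest="$dst/$prev_backup" "$src/" "$dst/$new_backup/"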
What it does is to hardlink files from the previous backup when they are unchanged from the source. There's also --copy-dest if you want it to copy instead (it's still faster when $dst is a remote or on a faster drive).

If you use a filesystem with subvolumes like btrfs, you can also just snapshot from the previous backup before rsync'ing. Snapshots are instantaneous and don't take additional space[1].
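(For instance, with the same placeholder names as above:)

$ btrfs subvolume snapshot "$dst/$prev_backup" "$dst/$new_backup"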
Or if you're using a filesystem that supports reflinks, then you can also do that. Reflinks are done by making a new inode but referring to the same blocks as the source file, implementing COW support. It's still faster than regular copy because it doesn't read and write the data, and it also doesn't take additional space[1].
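(Again a sketch with the same placeholders; --reflink=always makes cp fail rather than silently fall back to a regular copy:)

$ cp -a --reflink=always "$dst/$prev_backup" "$dst/$new_backup"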
Anyway, once you've done something like that, you can just do a regular rsync to copy the differences:
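(Without --link-dest this time, since the unchanged files are already in place:)

$ rsync -va "$src/" "$dst/$new_backup/"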
Though, you might want to add --delete, which would cause rsync to delete files from the destination that are no longer present in the source.

Another useful option is -i or --itemize-changes. It produces succinct, machine-readable output that describes what changes rsync is doing. I normally add that option and pipe like this:
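(A sketch combining the options mentioned here; |& and tee assume a bash-like shell:)

$ rsync -Pai --link-dest="$dst/$prev_backup" "$src/" "$dst/$new_backup/" |& tee "$dst/$new_backup.log"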
to keep a record of the changes via easily greppable files. The |& is to pipe both stdout and stderr.

The -P is short for --partial and --progress. --partial keeps partially transferred files, but more importantly, --progress reports per-file progress.

How this compares to archiving changes with tar
The above solutions result in directories that seem to hold everything. Even though that's the case, in total for any amount/frequency of backups, they would occupy around the same amount of space as having plain tar archives with only changes. That's because of how hardlinks, reflinks, and snapshots work. The use of bandwidth when creating the backups would also be the same.
The advantages are that each backup is a plain directory you can browse and restore from with standard tools, and that deletions are represented naturally: a file deleted from the source is simply absent from the next backup. With tar archives of changes, recording that a file foo was deleted is awkward: you'd have to either omit foo, mark it foo.DELETED, or do something complicated. I've never used duplicity, for example, but looking at its documentation, it seems it encodes deletions by adding an empty file of the same name in the new tar and holding the original signature of the file in a separate .sigtar file. I imagine it compares the original signature with that of an empty file to differentiate between a file deletion and a change to an actual empty file.

If one still wants to set up each backup as only holding the files that are different (added or modified), then one can use the --link-dest solution described above and then delete the hardlinks using something like the following:
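(A sketch of the idea: files hardlinked from the previous backup have a link count greater than 1, so deleting them leaves only the new/changed files:)

$ find "$dst/$new_backup" -type f -links +1 -delete
$ find "$dst/$new_backup" -type d -empty -delete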
[1] Strictly speaking, they do use additional space in the form of duplicate metadata, like the filename and such. However, I think anyone would consider that insignificant.
Although tar does have an incremental mode, there are a couple of more comprehensive tools to do the job, duplicity for example. They not only support incremental backups; it's also easy to configure a schedule on which a full backup needs to be taken. For example, in duplicity:

duplicity --full-if-older-than 1M
will make sure a full backup has run. They also support going back in time to a specific file; with plain tar, you'd have to go through all the incremental archives until you find the one that contains the right file. Additionally, they support encryption and uploading to a variety of backends (like sftp, blob storage, etc.). Obviously, if you encrypt, don't forget to make a good backup of your keys to a secondary backup!
Another important aspect is that you can verify the integrity of your backups, ensuring you can actually restore, e.g. using duplicity verify.
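(A sketch of typical usage; the local path and sftp URL are placeholders, not from the answer:)

$ duplicity --full-if-older-than 1M testdir/ sftp://user@backuphost/testbak
$ duplicity verify sftp://user@backuphost/testbak testdir/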
I would advise against a git-based backup strategy; large restores take significant time.
Why don't you consider git itself? The strategy you describe gets complicated to carry on with after one full backup and two incrementals. It's easy to get wrong, and it gets quite inefficient, depending on the changes. There would have to be some kind of rotation, i.e. a fresh full backup from time to time -- and do you then keep the old ones?
Given a working directory "testdir" with some items in it (files and subdirectories), git by default creates a hidden .git subdirectory for the data. That one would be for the local, additional version-control features. For backup, you can archive/copy it away onto a medium, or clone it over the network.
The version control you get (without asking for it) is a side effect of git's differential storage.
You can leave out all of the forking/branching and so on. This means you have one branch, called "master".
Before you can commit (actually write to the git archive/repository), you have to configure a minimal user for the config file. Then you should first learn and test in a subdirectory (maybe on tmpfs). Git can be just as tricky as tar, at times.
Anyway, as a comment said: backing up is easy; the hard part is restoring.
The only disadvantage of git would be the slight overhead/overkill.
The advantages are: git tracks content and file names, and it saves only what is necessary, based on diffs (for text files, at least).
Example
I have 3 files in a directory. After git init, git add . and git commit, I have a 260K .git directory.
Then I cp -r .git /tmp/abpic.git (a good place to save a backup :). I rm the 154K jpg, and also change a text file. I also rm -r .git.

Before restoring the files, I can get the exact differences:
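(The original command isn't preserved here; with the working .git gone, you would point git at the backed-up copy, for instance:)

$ git --git-dir=/tmp/abpic.git/ status
$ git --git-dir=/tmp/abpic.git/ diff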
Here, I want to follow the git restore hints.

After git --git-dir=/tmp/abpic.git/ restore \*: the jpeg is back, and the text file btext has not been updated (timestamp preserved). The modifications in atext were overwritten.

To reunite the repo and the (working) directory, you can just copy the .git directory back.
The files in the current directory are now identical to the .git archive (after the restore). New changes will be displayed and can be added and committed, without any planning. For a backup, you only need to store the archive on another medium. After modifying a file, you can use status or diff:
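(For instance, after changing btext:)

$ git status
$ git diff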
Just as git knows about the "+more" in file "btext", it will also store only that line incrementally. After git add . (or git add btext), the status command switches from red to green, and commit gives you the info. And you can really get at the contents, somehow:
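(A sketch; ls-tree lists the blob hashes of the files in the last commit, @ being HEAD:)

$ git ls-tree @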
then, using the first 4 hex digits of a blob's hash:
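(hypothetical hash prefix shown; cat-file prints that blob's stored content:)

$ git cat-file blob 1234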
Going back in time by one commit is:
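(@~ being the parent of the last commit:)

$ git ls-tree @~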
The blob of btext has a different hash before the last commit; the others have the same one.
An overview would be:
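(The original command isn't preserved; something like this would give it:)

$ git log --oneline --stat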
Instead of manually timestamped tar files, you have commits with a message and a date (and an author). Logically attached to these commits are the file lists and the contents.
Plain git is 20% more complicated than tar, but you get a decisive 50% more functionality out of it. I then wanted to make the OP's third change: change one file plus add two new "picture" files. I did, but now I have:
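(The original listing isn't preserved; a plain git log would show the situation:)

$ git log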
So what exactly did this Your Name guy do in his two commits, shortly before 6 pm?
The details of the last commit are:
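(for instance:)

$ git show --stat @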
And to check the second-to-last commit, whose message announces the two pictures:
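(similarly:)

$ git show --stat @~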
This happened because I tried git commit -a as a shortcut for git add . plus commit, and the two picture files were new (untracked). They showed in red in git status, but as I said, git is no trickier than tar or unix.

"Your stager only knows what you need, but I know what you want" (or the other way around; the point is that they aren't always the same).
Update:
Please see some caveats here: Is it possible to use tar for full system backups?
According to that answer, restoring incremental backups with tar is error-prone and should be avoided. Don't use the method below unless you're absolutely sure you can recover your data when you need it.
Per the documentation, you can use the -g/--listed-incremental option to create incremental tar files, e.g.:
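(A sketch using the file names mentioned below, with the date as a placeholder; the snapshot file data.inc must not exist yet for the first, full backup:)

$ tar --create --listed-incremental=data.inc --file=2020-02-16-data.tar testdir/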
Then, the next time, do something like:
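(tar updates data.inc on each run, so only files changed since the previous run get archived:)

$ tar --create --listed-incremental=data.inc --file=2020-02-17-data.tar testdir/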
where data.inc is your incremental metadata, and DATE-data.tar is your incremental archive.
I recommend star for incremental backups, because star has been verified to reliably support incremental dumps and restores. The latter does not work with GNU tar when you rename directories, even though it has been advertised for 28 years. Please read the section about incremental backups in the star man page at http://schilytools.sourceforge.net/man/man1/star.1.html ; it currently starts at page 53.
To download the source, get the schilytools tarball from http://sourceforge.net/projects/schilytools/files/
Check Is it possible to use tar for full system backups? for verification of the GNU tar bug.
I would recommend you take a look at Borg Backup.
This will handle backups that:
Are deduplicated. This indirectly makes them differential backups, but it has more advantages besides.
Are compressed
It will manage pruning of old backups using rules such as "keep one daily backup for a week, one weekly backup for a month, one monthly backup for a year"
It's really easy to set up and use.
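(A minimal sketch of typical usage; the repo path is a placeholder, and the prune rule mirrors the one quoted above:)

$ borg init --encryption=repokey /mnt/backup/borgrepo
$ borg create /mnt/backup/borgrepo::{now:%Y-%m-%d} testdir
$ borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=12 /mnt/backup/borgrepo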
Take a look at restic. It does incremental backups using deduplication. It's also very easy to use, so it's great for both beginner and advanced command-line users.
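(A sketch, with a local repository directory as a placeholder:)

$ restic init --repo /mnt/backup/restic
$ restic -r /mnt/backup/restic backup testdir
$ restic -r /mnt/backup/restic snapshots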
You could try BackupPC.
It allows incremental backups; you can decide how often to do them and how many to keep, and when viewing them you can see either a merged view or just the actual incremental backup itself. It also deduplicates whole files that exist in different backups, whether of the same or of different hosts.
It's most likely already packaged for your distribution.
This is not exactly what you are requesting, because it doesn't use tar. But it does use rsync, and it has worked very well for me. One of the abilities that I really like is the ability to drop incremental restore points over time without losing points before or after the one I am dropping. This allows me, for example, to have daily backups for the last 2 weeks, then thin those out once they're 2 weeks old so they're weekly for a couple of months, then thin those out further until they're monthly for a quarter or two, then thin those out to roughly quarterly over a timespan of years. I have a python script that I can share that can prune these automatically if you want. (Obviously, it comes with NO WARRANTY, as letting a computer automatically delete backups sounds a bit scary.)
What I do is use a ZFS pool & filesystem for storing backups. With the ZFS filesystem, which is (thankfully!) now usable on linux, you can take snapshots. When you write to a filesystem that has been snapshotted, it (smartly) writes a new version of only the changed blocks, thus making it an incremental backup. Even easier is that all of your snapshots can be mounted as a full (read only) Unix filesystem, that you can use all of your normal tools on to look at and copy from. Want to see what that file looked like 2 months ago? Just cd to the right folder and use less or vim or whatever to look at it. Want to see when a (hacked) wordpress install you were backing up went off the rails? Just do a grep for an identifying mark with something like
grep -in "somebadstring" /zfsbackup/computername/.zfs/snapshot/*/var/www/html/wp-config.php
You can even use Linux's LUKS system to do disk encryption and then present the mapped device as "drives" to ZFS, giving you an encrypted backup.
If you ever need to migrate your backups to a new drive, you can use zfs send & receive to move the entire filesystem.
It has been a year or two since I've set it up (I just keep adding on incremental backups and haven't needed to upgrade my backup drive for a while), so these will be rough instructions. Bear with me, or better yet, edit them.
First, make sure you have zfs, rsync, and, if you want to encrypt your backups, the LUKS tools installed.
Next, create any partition layout you might want on your backup drive. (You may want to make a small unencrypted partition that holds scripts for running the backup.)
Then, if you want disk encryption, encrypt the partition with LUKS (example assumes a backup drive of /dev/sde and a partition /dev/sde2 since /dev/sde1 is probably scripts):
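(The command itself wasn't included in the answer; the standard invocation would be:)

$ sudo cryptsetup luksFormat /dev/sde2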
(Put in a nice strong passphrase).
If you are doing disk encryption, now you need to open the volume:
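(Again a sketch; the mapped name matches the note below:)

$ sudo cryptsetup luksOpen /dev/sde2 zfsbackuppart1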
(Now an unencrypted version of the raw device should be available (mapped) at /dev/mapper/zfsbackuppart1).
Now, create your ZFS pool (the group of drive(s) holding the data; multiple drives/devices can be used for RAID if you wish):
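(For the single mapped device from above:)

$ sudo zpool create zfsbackup /dev/mapper/zfsbackuppart1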
This will create a ZFS pool named "zfsbackup".
Now, create a filesystem for each machine you are backing up:
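(A sketch; "computername" is a placeholder, matching the grep example above:)

$ sudo zfs create zfsbackup/computername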
And create a folder for each partition you want to back up from the source machine:
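(Assuming, say, you back up the source machine's / and /home:)

$ sudo mkdir -p /zfsbackup/computername/root /zfsbackup/computername/home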
Then, use rsync to copy files to there:
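(One rsync per source partition; this flag set is a typical choice, with -x keeping rsync on one filesystem and -H/-A/-X preserving hardlinks, ACLs, and xattrs. Assuming you pull from the source machine over ssh; adjust the paths if it's local:)

$ sudo rsync -avxHAX --delete root@computername:/ /zfsbackup/computername/root/
$ sudo rsync -avxHAX --delete root@computername:/home/ /zfsbackup/computername/home/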
Take a snapshot:
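(Named by date, for example:)

$ sudo zfs snapshot zfsbackup/computername@2020-02-16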
To disconnect the drive when you are done:
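(Export the pool, then close the LUKS mapping:)

$ sudo zpool export zfsbackup
$ sudo cryptsetup luksClose zfsbackuppart1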
And to set it up when taking another backup in the future, before the above rsync commands:
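(Reopen the LUKS volume and re-import the pool:)

$ sudo cryptsetup luksOpen /dev/sde2 zfsbackuppart1
$ sudo zpool import zfsbackup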
Let me know if you need more info on this approach, or are interested in a script to thin out backups as they get farther back in time.
And, yes, you can back up a whole system this way -- you just have to create partitions/filesystems (which don't have to match the original layout -- a great way to migrate stuff!), tweak /etc/fstab, and install GRUB & have it rescan/rebuild the GRUB config.
One possibility is AMANDA, the Advanced Maryland Automatic Network Disk Archiver, which among many other features also supports incremental backups.