user361323

Asked: 2020-02-16 19:30:53 +0800 CST

Linux backup utility for incremental backups

I am looking for a backup utility with incremental backups, but in a more complicated way.

I tried rsync, but it doesn't seem to be able to do what I want, or more likely, I don't know how to make it do that.

So this is an example of what I want to achieve with it. I have the following files:

testdir
├── picture1
├── randomfile1
├── randomfile2
└── textfile1

I want to run the backup utility and basically create an archive (or a tarball) of all of these files in a different directory:

$ mystery-command testdir/ testbak
testbak
└── 2020-02-16--05-10-45--testdir.tar

Now, let's say the next day, I add a file, such that my structure looks like:

testdir
├── picture1
├── randomfile1
├── randomfile2
├── randomfile3
└── textfile1

Now, when I run the mystery command, I will get another tarball for that day:

$ mystery-command testdir/ testbak
testbak
├── 2020-02-16--05-10-45--testdir.tar
└── 2020-02-17--03-24-16--testdir.tar

Here's the kicker: I want the backup utility to detect the fact that picture1, randomfile1, randomfile2 and textfile1 have not been changed since the last backup, and only back up the new/changed files, which in this case is randomfile3, such that:

tester@raspberrypi:~ $ tar -tf testbak/2020-02-16--05-10-45--testdir.tar 
testdir/
testdir/randomfile1
testdir/textfile1
testdir/randomfile2
testdir/picture1
tester@raspberrypi:~ $ tar -tf testbak/2020-02-17--03-24-16--testdir.tar 
testdir/randomfile3

Then, as a last example, let's say the next day I changed textfile1 and added picture2 and picture3:

$ mystery-command testdir/ testbak
testbak/
├── 2020-02-16--05-10-45--testdir.tar
├── 2020-02-17--03-24-16--testdir.tar
└── 2020-02-18--01-54-41--testdir.tar
tester@raspberrypi:~ $ tar -tf testbak/2020-02-16--05-10-45--testdir.tar 
testdir/
testdir/randomfile1
testdir/textfile1
testdir/randomfile2
testdir/picture1
tester@raspberrypi:~ $ tar -tf testbak/2020-02-17--03-24-16--testdir.tar 
testdir/randomfile3
tester@raspberrypi:~ $ tar -tf testbak/2020-02-18--01-54-41--testdir.tar 
testdir/textfile1
testdir/picture2
testdir/picture3

With this system, I would save space by only backing up the incremental changes between each backup (with obviously the master backup that has all the initial files), and I would have backups of the incremental changes. So, for example, if I made a change on day 2 and changed the same thing again on day 3, I could still get the file with the change from day 2, but before the change from day 3.

I think it's kind of like how GitHub works :)

I know I could probably create a script that runs a diff and then selects the files to back up based on the result (or more efficiently, just get a checksum and compare), but I want to know if there's any utility that can do this a tad easier :)

15 Answers

JoL · Answer 1 · 2020-02-17T09:25:28+08:00

I tried rsync, but it doesn't seem to be able to do what I want, or more likely, I don't know how to make it do that.

I know I could probably create a script that runs a diff and then selects the files to backup based on the result (or more efficiently, just get a checksum and compare), but I want to know if there's any utility that can do this a tad easier :)

rsync is precisely that program that copies based on a diff. By default, it copies only when there is a difference in last-modified time or size, but it can even compare by checksum with -c.
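To illustrate what comparing by content rather than by timestamp means, here is a minimal sketch that uses sha256sum manifests instead of rsync. The sandbox directory, the manifest names and the file contents are made up for the demonstration, and this is not how rsync is implemented internally; it only shows the idea of selecting files whose checksum differs from the previous run.

```shell
set -eu
export LC_ALL=C                        # stable sort order for sort and comm
work=$(mktemp -d)                      # throwaway sandbox (assumption)
mkdir "$work/testdir"
echo one > "$work/testdir/textfile1"
echo two > "$work/testdir/randomfile1"

# Day 1: record one checksum line per file.
(cd "$work/testdir" && find . -type f -exec sha256sum {} + | sort) > "$work/day1.sha256"

# Day 2: one file is modified and one file is added.
echo changed > "$work/testdir/textfile1"
echo three > "$work/testdir/randomfile3"
(cd "$work/testdir" && find . -type f -exec sha256sum {} + | sort) > "$work/day2.sha256"

# Lines that appear only in the newer manifest belong to new or modified files.
# The sed call strips the hash and the two separator spaces, leaving the path.
changed=$(comm -13 "$work/day1.sha256" "$work/day2.sha256" | sed 's/^[0-9a-f]*  //' | sort)
echo "$changed"
```

Note that this only finds additions and modifications. A deleted file shows up as a line that exists only in the older manifest, which `comm -23` would list.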

The trouble here is that you're tar'ing the backups. This becomes easier if you don't do that. I don't even know why you're doing it. It might make sense if you're compressing them, but you're not even doing that.

The Wikipedia article for Incremental Backups has an example rsync command that goes roughly:

rsync -va \
  --link-dest="$dst/2020-02-16--05-10-45--testdir/" \
  "$src/testdir/" \
  "$dst/2020-02-17--03-24-16--testdir/"

What it does is to hardlink files from the previous backup when they are unchanged from the source. There's also --copy-dest if you want it to copy instead (it's still faster when $dst is a remote or on a faster drive).
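Why a hard-linked backup costs almost no extra space can be seen without rsync. The following is a small sketch with made-up directory names, assuming GNU stat is available: a hard link is just a second name for the same inode, so the file data is stored only once.

```shell
set -eu
work=$(mktemp -d)                      # throwaway sandbox (assumption)
mkdir "$work/backup-day1" "$work/backup-day2"
echo "picture data" > "$work/backup-day1/picture1"

# Roughly what --link-dest does for an unchanged file:
# add a second name for the existing data instead of copying it.
ln "$work/backup-day1/picture1" "$work/backup-day2/picture1"

inode_day1=$(stat -c %i "$work/backup-day1/picture1")
inode_day2=$(stat -c %i "$work/backup-day2/picture1")
link_count=$(stat -c %h "$work/backup-day2/picture1")
echo "inodes: $inode_day1 $inode_day2, link count: $link_count"
```

Both names report the same inode number, and the link count is 2. One consequence is that editing such a file in place inside one backup directory changes it in every backup that links to it, so the backup directories are best treated as read-only.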

If you use a filesystem with subvolumes like btrfs, you can also just snapshot from the previous backup before rsync'ing. Snapshots are instantaneous and don't take additional space[1].

btrfs subvolume snapshot \
  "$dst/2020-02-16--05-10-45--testdir" \
  "$dst/2020-02-17--03-24-16--testdir"

Or if you're using a filesystem that supports reflinks, then you can also do that. Reflinks are done by making a new inode but referring to the same blocks as the source file, implementing COW support. It's still faster than regular copy because it doesn't read and write the data, and it also doesn't take additional space[1].

cp --reflink -av \
  "$dst/2020-02-16--05-10-45--testdir" \
  "$dst/2020-02-17--03-24-16--testdir"

Anyway, once having done something like that you can just do a regular rsync to copy the differences:

rsync -va \
  "$src/testdir/" \
  "$dst/2020-02-17--03-24-16--testdir/"

Though, you might want to add --delete, which would cause rsync to delete files from the destination that are no longer present in the source.

Another useful option is -i or --itemize-changes. It produces succinct, machine readable output that describes what changes rsync is doing. I normally add that option and pipe like:

rsync -Pai --delete \
  "$src/testdir/" \
  "$dst/2020-02-17--03-24-16--testdir/" \
|& tee -a "$dst/2020-02-17--03-24-16--testdir.log"

to keep record of the changes via easily grepable files. The |& is to pipe both stdout and stderr.
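For shells other than bash, `2>&1 |` is the portable spelling of `|&`. A minimal sketch, where the `noisy` function and its messages are hypothetical stand-ins for the rsync command:

```shell
set -eu
log=$(mktemp)                          # throwaway log file (assumption)

# Stand-in for a command that writes to both stdout and stderr.
noisy() {
  echo "sent file"
  echo "warning: skipped socket" >&2
}

# Merge stderr into stdout, then show the output and append it to the log.
noisy 2>&1 | tee -a "$log"

logged_lines=$(wc -l < "$log")
```

Both lines end up in the log, so warnings are recorded next to the itemized changes.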

The -P is short for --partial and --progress. --partial keeps partially transferred files, but more importantly --progress reports per-file progress.

How this compares to archiving changes with tar

The above solutions result in directories that seem to hold everything. Even though that's the case, in total for any amount/frequency of backups, they would occupy around the same amount of space as having plain tar archives with only changes. That's because of how hardlinks, reflinks, and snapshots work. The use of bandwidth when creating the backups would also be the same.

The advantages are:

- backups are easy to restore with rsync and faster, since rsync would only transfer the differences from the backup.
- they're simpler to browse and modify if needed.
- file deletions can be encoded naturally as the file's absence in new backups. When using tar archives, one would have to resort to hacks, like to delete a file foo, mark it foo.DELETED or do something complicated. I've never used duplicity for example, but looking at its documentation, it seems it encodes deletions by adding an empty file of the same name in the new tar and holding the original signature of the file in a separate .sigtar file. I imagine it compares the original signature with that of an empty file to differentiate between a file deletion and a change to an actual empty file.

If one still wants to set up each backup as only holding the files that are different (added or modified), then one can use the --link-dest solution described above and then delete the hardlinks using something like the following:

find "$new_backup" -type f ! -links 1 -delete
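The effect of that find command can be tried out safely in a sandbox. This sketch simulates the result of a --link-dest run with plain `ln` (the directory and file names are made up), then prunes the hard-linked files:

```shell
set -eu
work=$(mktemp -d)                      # throwaway sandbox (assumption)
old_backup="$work/2020-02-16"
new_backup="$work/2020-02-17"
mkdir "$old_backup" "$new_backup"

echo "unchanged" > "$old_backup/picture1"
ln "$old_backup/picture1" "$new_backup/picture1"   # unchanged file: hard link, link count 2
echo "new" > "$new_backup/randomfile3"              # new file: link count 1

# Delete every regular file in the new backup that has more than one name.
find "$new_backup" -type f ! -links 1 -delete

remaining=$(ls "$new_backup")
echo "$remaining"
```

Only randomfile3 is left in the new backup. The copy in the old backup survives, because deleting one name of a hard-linked file only lowers its link count.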

[1] Strictly speaking, they do use additional space in the form of duplicate metadata, like the filename and such. However, I think anyone would consider that insignificant.

nathan_gs · Answer 2 · 2020-02-17T07:18:06+08:00

Although tar does have an incremental mode there are a couple of more comprehensive tools to do the job:

Not only do they support incremental backups, they also make it easy to configure a schedule on which a full backup needs to be taken. For example, in duplicity: duplicity --full-if-older-than 1M will make sure a full backup has run within the last month. They also support going back in time to a specific file; with plain tar you'll have to go through all the incremental files until you find one which contains the right file.

Additionally, they support encryption and uploading to a variety of backends (like sftp, blob storage, etc.). Obviously, if you encrypt, don't forget to make a good backup of your keys to a secondary backup!

Another important aspect is that you can verify the integrity of your backups, ensuring you can restore, e.g. using duplicity verify.

I would advise against a git-based backup strategy. Large restores take significant time.

user373503 · Answer 3 · 2020-02-17T00:24:24+08:00

And why are you not considering git?

The strategy you describe, after one full and two incremental backups, has its complications when you continue. It is easy to make mistakes, and it can get very inefficient, depending on the changes. There would have to be a kind of rotation, i.e. from time to time you make a new full backup, and then do you want to keep the old one or not?

Given a working directory "testdir" containing some project (files and subdirectories), git by default creates a hidden .git subdirectory for the data. This would be for the local, additional version control features. For backup, you can archive/copy it to a medium or clone it via network.

The revision control you get (without asking for it) is a side effect of git's differential storage.

You can leave out all the forking/branching and so on. This means you have one branch called "master".

Before you can commit (actually write to the git archive/repo), you have to configure a minimal user for the config file. Then you should first learn and test in a subdirectory (maybe tmpfs). Git is just as tricky as tar, sometimes.
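A minimal sketch of such a test run in a throwaway directory might look like this. The user name, the e-mail address and the file names are placeholders, and the identity is written only to the test repository's own config:

```shell
set -eu
repo=$(mktemp -d)                      # throwaway directory for experiments (assumption)
git -C "$repo" init -q

# Minimal identity, stored in this repository's config only (placeholder values).
git -C "$repo" config user.name "Backup Tester"
git -C "$repo" config user.email "tester@example.invalid"

echo "first version" > "$repo/textfile1"
git -C "$repo" add .
git -C "$repo" commit -q -m "day 1"

echo "second version" > "$repo/textfile1"
git -C "$repo" add .
git -C "$repo" commit -q -m "day 2"

# Read the file as it was one commit ago, without touching the working copy.
previous=$(git -C "$repo" show HEAD^:textfile1)
echo "$previous"
```

Each commit plays the role of one dated backup, and any earlier state of a file stays reachable through the commit history.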

Anyway, as a comment says: backing up is easy, the hard part is restoring.

The disadvantage of git would be just the small overhead/overkill.

The advantages are: git tracks content and file names. It only saves what is necessary, based on a diff (for text files at least).

Example

I have 3 files in a dir. After git init, git add . and git commit I have a 260K .git dir.

Then I cp -r .git /tmp/abpic.git (a good place to save a backup :). I rm the 154K jpg, and also change one text file. I also rm -r .git.

  ]# ls
    atext  btext

  ]# git --git-dir=/tmp/abpic.git/ ls-files
    atext
    btext
    pic154k.jpg

Before restoring the files, I can get the precise differences:

]# git --git-dir=/tmp/abpic.git/ status
On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   atext
        deleted:    pic154k.jpg

no changes added to commit (use "git add" and/or "git commit -a")

Here I want to follow the git restore hint.

After git --git-dir=/tmp/abpic.git/ restore \*:

]# ls -st
total 164
  4 atext  156 pic154k.jpg    4 btext

The jpeg is back, and the text file btext has not been updated (it keeps its timestamp). The modifications in atext are overwritten.

To reunite the repo and the (working) directory, you can just copy it back.

]# cp -r /tmp/abpic.git/ .git
]# git status
On branch master
nothing to commit, working tree clean

The files in the current dir are identical to the .git archive (after the restore). New changes will be displayed and can be added and committed, without any planning. You only have to store it on another medium, for backup purposes.

After a file is modified, you can use status or diff:

]# echo more >>btext 

]# git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   btext

no changes added to commit (use "git add" and/or "git commit -a")

]# git diff
diff --git a/btext b/btext
index 96b5d76..a4a6c5b 100644
--- a/btext
+++ b/btext
@@ -1,2 +1,3 @@
 This is file b
 second line
+more
#]

And just as git knows about "+more" in file 'btext', it will also store that line only incrementally.

After git add . (or git add btext) the status command switches from red to green, and the commit gives you the info.

]# git add .
]# git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   btext

]# git commit -m 'btext: more'
[master fad0453] btext: more
 1 file changed, 1 insertion(+)

And you can really get at the contents, somehow:

]# git ls-tree @
100644 blob 321e55a5dc61e25fe34e7c79f388101bd1ae4bbf    atext
100644 blob a4a6c5bd3359d84705e5fd01884caa8abd1736d0    btext
100644 blob 2d550ffe96aa4347e465109831ac52b7897b9f0d    pic154k.jpg

And then, using the first 4 hex digits of the hash:

]# git cat-file blob a4a6
This is file b
second line
more

To travel back in time by one commit:

]# git ls-tree @^
100644 blob 321e55a5dc61e25fe34e7c79f388101bd1ae4bbf    atext
100644 blob 96b5d76c5ee3ccb7e02be421e21c4fb8b96ca2f0    btext
100644 blob 2d550ffe96aa4347e465109831ac52b7897b9f0d    pic154k.jpg

]# git cat-file blob 96b5
This is file b
second line

The blob of btext has a different hash before the last commit; the others have the same.

An overview would be:

]# git log
commit fad04538f7f8ddae1f630b648d1fe85c1fafa1b4 (HEAD -> master)
Author: Your Name <[email protected]>
Date:   Sun Feb 16 10:51:51 2020 +0000

    btext: more

commit 0bfc1837e20988f1b80f8b7070c5cdd2de346dc7
Author: Your Name <[email protected]>
Date:   Sun Feb 16 08:45:16 2020 +0000

    added 3 files with 'add .'

Instead of manually timestamped tar files you have commits with a message and date (and an author). Logically attached to these commits are the file lists and the contents.

Simple git is 20% more complicated than tar, but you get a decisive 50% more functionality from it.

I wanted to make the OP's third change: change a file plus two new 'picture' files. I did, but now I have:

]# git log
commit deca7be7de8571a222d9fb9c0d1287e1d4d3160c (HEAD -> master)
Author: Your Name <[email protected]>
Date:   Sun Feb 16 17:56:18 2020 +0000

    didn't add the pics before :(

commit b0355a07476c8d8103ce937ddc372575f0fb8ebf
Author: Your Name <[email protected]>
Date:   Sun Feb 16 17:54:03 2020 +0000

    Two new picture files
    Had to change btext...

commit fad04538f7f8ddae1f630b648d1fe85c1fafa1b4
Author: Your Name <[email protected]>
Date:   Sun Feb 16 10:51:51 2020 +0000

    btext: more

commit 0bfc1837e20988f1b80f8b7070c5cdd2de346dc7
Author: Your Name <[email protected]>
Date:   Sun Feb 16 08:45:16 2020 +0000

    added 3 files with 'add .'
]#

So what exactly did that Your Name guy do, in his two commits shortly before 6pm?

The details of the last commit are:

]# git show
commit deca7be7de8571a222d9fb9c0d1287e1d4d3160c (HEAD -> master)
Author: Your Name <[email protected]>
Date:   Sun Feb 16 17:56:18 2020 +0000

    didn't add the pics before :(

diff --git a/picture2 b/picture2
new file mode 100644
index 0000000..d00491f
--- /dev/null
+++ b/picture2
@@ -0,0 +1 @@
+1
diff --git a/picture3 b/picture3
new file mode 100644
index 0000000..0cfbf08
--- /dev/null
+++ b/picture3
@@ -0,0 +1 @@
+2
]#

And to check the second-to-last commit, whose message announces two pictures:

]# git show @^
commit b0355a07476c8d8103ce937ddc372575f0fb8ebf
Author: Your Name <[email protected]>
Date:   Sun Feb 16 17:54:03 2020 +0000

    Two new picture files
    Had to change btext...

diff --git a/btext b/btext
index a4a6c5b..de7291e 100644
--- a/btext
+++ b/btext
@@ -1,3 +1 @@
-This is file b
-second line
-more
+Completely changed file b
]#

This happened because I tried git commit -a as a shortcut for git add ., and the two files were new (untracked). They showed up in red with git status, but as I said, git is no less tricky than tar, or unix.

"Your debutante just knows what you need, but I know what you want" (or the other way round. Point is it's not always the same)

Angelo · Answer 4 · 2020-02-16T19:54:51+08:00

Best Answer

Update:

Please see some caveats here: Is it possible to use tar for full system backups?

According to that answer, restoring incremental backups with tar is error-prone and should be avoided. Do not use the method below unless you are absolutely sure you can recover your data when you need it.

According to the documentation, you can use the -g/--listed-incremental option to create incremental tar files, e.g.

tar -cg data.inc -f DATE-data.tar /path/to/data

Then next time do something like

tar -cg data.inc -f NEWDATE-data.tar /path/to/data

Where data.inc is your incremental metadata, and DATE-data.tar are your incremental archives.
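To see the behaviour on the example from the question, here is a small sandbox sketch. It assumes GNU tar; the snapshot file name testdir.snar and the archive names are made up, and the `sleep 1` only stands in for "the next day" so that the timestamps are clearly apart.

```shell
set -eu
work=$(mktemp -d)                      # throwaway sandbox (assumption)
mkdir "$work/testdir" "$work/testbak"
echo a > "$work/testdir/picture1"
echo b > "$work/testdir/textfile1"

# First run: the snapshot file does not exist yet, so everything is archived.
tar -c -g "$work/testbak/testdir.snar" -f "$work/testbak/day1.tar" -C "$work" testdir

sleep 1                                # stand-in for "the next day"
echo c > "$work/testdir/randomfile3"

# Second run with the same snapshot file: only new or changed files are stored.
tar -c -g "$work/testbak/testdir.snar" -f "$work/testbak/day2.tar" -C "$work" testdir

day2_listing=$(tar -tf "$work/testbak/day2.tar")
echo "$day2_listing"
```

The second archive lists randomfile3 but not picture1 or textfile1. To restore, the GNU tar manual describes extracting the archives in the order they were created, each with `--listed-incremental=/dev/null`. Test the restore path before relying on it, given the caveats mentioned above.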

schily · Answer 5 · 2020-02-17T00:28:39+08:00

I recommend star for incremental backups, since star has been verified to reliably support incremental dumps and restores. The latter is what does not work with GNU tar when you rename directories, even though it has been advertised for 28 years.

Please read the star man page at http://schilytools.sourceforge.net/man/man1/star.1.html

The section about incremental backups is currently starting at page 53.

To download the source, get the schilytools tarball from http://sourceforge.net/projects/schilytools/files/

Check Is it possible to use tar for full system backups? for a verification of the GNU tar bug.

jcaron · Answer 6 · 2020-02-18T02:24:47+08:00

I would recommend you take a look at Borg Backup.

This will handle backups that:

- Are deduplicated. This indirectly makes it differential backups, but has more advantages:
  - It will handle multiple copies of the same file
  - Or even of the same blocks within different files
  - Will help with files that grow (like logs)
  - Will help with files that are renamed (like logs in some rotation setups)
- Are compressed
- Can be mounted like a regular remote file system (you can mount any of the previous backups)

It will manage pruning of old backups using rules such as "keep one daily backup for a week, one weekly backup for a month, one monthly backup for a year"

It's really easy to set up and use.

bit · Answer 7 · 2020-02-18T10:35:49+08:00

Take a look at restic. It does incremental backups using a technique called deduplication. It's also very easy to use, so it's great for a beginner or an advanced command line user.

Eduardo Trápani · Answer 8 · 2020-02-16T19:37:29+08:00

You can try BackupPC.

It allows incremental backups, you can decide how often to do them and how many to keep, and when you look at them you can see them consolidated or just the actual incremental backup. It also deduplicates complete files if they are present in different backups of the same or different hosts.

It is probably already packaged for your distro.

Azendale · Answer 9 · 2020-02-17T12:21:57+08:00

This is not exactly what you are requesting, because it doesn't use tar. But it does use rsync, and it has worked very well for me. One of the abilities that I really like is the ability to drop incremental restore points over time without losing points before or after the one I am dropping. This allows me to, for example, have daily backups for the last 2 weeks, then thin those out once they get 2 weeks old so they are weekly for a couple of months, then further thin those out until they are monthly for a quarter or two, then thin those out to about quarterly over the timespan of years. I have a python script that I can share that can prune these automatically if you want. (Obviously, it comes with NO WARRANTY, as letting a computer automatically delete backups sounds a bit scary.)

What I do is use a ZFS pool & filesystem for storing backups. With the ZFS filesystem, which is (thankfully!) now usable on linux, you can take snapshots. When you write to a filesystem that has been snapshotted, it (smartly) writes a new version of only the changed blocks, thus making it an incremental backup. Even easier is that all of your snapshots can be mounted as a full (read only) Unix filesystem, that you can use all of your normal tools on to look at and copy from. Want to see what that file looked like 2 months ago? Just cd to the right folder and use less or vim or whatever to look at it. Want to see when a (hacked) wordpress install you were backing up went off the rails? Just do a grep for an identifying mark with something like grep -in "somebadstring" /zfsbackup/computername/.zfs/snapshot/*/var/www/html/wp-config.php

You can even use Linux's LUKS system to do disk encryption and then present the mapped device as "drives" to ZFS, giving you an encrypted backup.

If you ever need to migrate your backups to a new drive, you can use zfs send & receive to move the entire filesystem.

It has been a year or two since I've set it up (I just keep adding on incremental backups and haven't needed to upgrade my backup drive for a while), so these will be rough instructions. Bear with me, or better yet, edit them.

First, make sure you have zfs, rsync, and, if you want to encrypt your backups, the LUKS tools installed.

Next, create any partition layout you might want on your backup drive. (You may want to make a small unencrypted partition that has scripts for running the backup.)

Then, if you want disk encryption, encrypt the partition with LUKS (example assumes a backup drive of /dev/sde and a partition /dev/sde2 since /dev/sde1 is probably scripts):

sudo cryptsetup luksFormat /dev/sde2

(Put in a nice strong passphrase).

If you are doing disk encryption, now you need to open the volume:

sudo cryptsetup luksOpen /dev/sde2 zfsbackuppart1

(Now an unencrypted version of the raw device should be available (mapped) at /dev/mapper/zfsbackuppart1).

Now, create your ZFS pool (a group of one or more drives holding data; multiple drives/devices can be used for RAID if you wish):

sudo zpool create zfsbackup /dev/mapper/zfsbackuppart1

This will create a ZFS pool named "zfsbackup".

Now, create a filesystem for each machine you are backing up:

sudo zfs create zfsbackup/machinename

And create a folder for each partition you want to back up from the source machine:

sudo mkdir /zfsbackup/machinename/slash/
sudo mkdir /zfsbackup/machinename/boot/

Then, use rsync to copy files to there:

sudo rsync -avx --numeric-ids --exclude .gvfs / /zfsbackup/machinename/slash/ --delete-after
sudo rsync -avx --numeric-ids --exclude .gvfs /boot/ /zfsbackup/machinename/boot/ --delete-after

Take a snapshot:

zfs snapshot zfsbackup/machinename@`date +%F_%T`

To disconnect the drive when you are done:

zpool export zfsbackup
# Next line, for each underlying encrypted block device, if using encryption:
cryptsetup luksClose zfsbackuppart1

And to set it up when taking another backup in the future, before the above rsync commands:

cryptsetup luksOpen /dev/sde2 zfsbackuppart1
zpool import zfsbackup

Let me know if you need more info on this approach, or are interested in a script to thin out backups as they get farther back in time.

And, yes, you can backup a whole system this way -- you just have to create partitions/filesystems (which don't have to match the original layout -- a great way to migrate stuff!), tweak /etc/fstab, and install GRUB & have it rescan/rebuild the GRUB config.

Paulo Tomé · Answer 10 · 2020-02-16T20:07:43+08:00

One possibility is AMANDA, the Advanced Maryland Automatic Network Disk Archiver, which among many other features also supports incremental backups.
