Question

T. Caio

Asked: 2018-09-22 05:11:50 +0800 CST

Wget: download files selectively and recursively?

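A question about wget, subfolders and index.html.

Let's say I am inside the "travels/" folder, which is in "website.com": "website.com/travels/".

The "travels/" folder contains a lot of files and other (sub)folders: "website.com/travels/list.doc", "website.com/travels/cover.png", "website.com/travels/[1990] America/", "website.com/travels/[1994] Japan/" and so on...

How can I download only the ".mov" and ".jpg" files that live in the subfolders? I don't want to pick up files from "travels/" itself (for example, not "website.com/travels/list.doc").

I found a wget command (on Unix & Linux Stack Exchange, I don't remember in which discussion) that downloads the "index.html" of each subfolder and nothing else. Why are only the index files downloaded?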

1 Answer

user88036 · Answer 1 · 2018-09-22T05:39:35+08:00

This command downloads only images and movies from a given website:

wget -nd -r -P /save/location -A jpeg,jpg,bmp,gif,png,mov "http://www.somedomain.com"

According to the wget man page:

-nd prevents the creation of a directory hierarchy (i.e. no directories).

-r enables recursive retrieval. See Recursive Download for more information.

-P sets the directory prefix where all files and directories are saved to.

-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma separated list (as seen above). See Types of Files for more information.

If you want to download from subfolders, you need the --no-parent flag, as in the following command:

wget -r -l1 --no-parent -P /save/location -A jpeg,jpg,bmp,gif,png,mov "http://www.somedomain.com"

-r: recursive retrieving
-l1: sets the maximum recursion depth to be 1
--no-parent: does not ascend to the parent; only downloads from the specified subdirectory and downwards hierarchy
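
Applied to the folder layout from the question, the flags above could be combined roughly as follows. This is only a sketch: fetch_media and DRY_RUN are hypothetical names introduced here, and the depth of 2 assumes the media files sit exactly one level below "travels/" (one hop to each subfolder's listing, one more to the files). By default the function only prints the command so that it can be inspected before anything is downloaded.

```shell
# fetch_media URL DEST
# Hypothetical wrapper around wget (not part of wget itself).
# With DRY_RUN=1 (the default here) it prints the command instead of running it.
fetch_media() {
  url=$1
  dest=$2
  # -r            recurse into linked pages
  # -l 2          assumed depth: start page -> subfolder listing -> files
  # --no-parent   never climb above the start URL
  # -A jpg,mov    keep only files whose names end in jpg or mov
  set -- wget -r -l 2 --no-parent -P "$dest" -A 'jpg,mov' "$url"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example (prints the command; set DRY_RUN=0 to actually download):
# fetch_media "http://website.com/travels/" /save/location
```

Note that the accept list does not distinguish between directory levels, so a .jpg or .mov placed directly in "travels/" would be downloaded as well; files such as list.doc and cover.png are skipped because their suffixes are not in the list.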

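Regarding the index.html pages: once the -A flag is included in the wget command, they are excluded, because -A restricts wget to the listed file types. During a recursive run wget still fetches HTML pages in order to extract links from them, but if html is not in the accept list (the argument of -A), the page is not kept and wget prints the following message in the terminal: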

Removing /save/location/default.htm since it should be rejected.
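
The effect of the accept list can be illustrated with a small stand-alone function. This is a simplified model of suffix matching written for this answer, not wget's actual implementation; is_accepted is a hypothetical name, and the sketch assumes the list contains plain suffixes without wildcard characters.

```shell
# is_accepted NAME LIST
# Simplified model of an accept list such as "jpg,mov":
# succeeds if NAME ends with one of the comma-separated suffixes in LIST.
# Assumes LIST holds plain suffixes (no wildcards); matching is case-sensitive.
is_accepted() {
  name=$1
  old_ifs=$IFS
  IFS=,                      # split LIST on commas
  for suffix in $2; do
    case $name in
      *"$suffix")
        IFS=$old_ifs
        return 0             # suffix matched: the file would be kept
        ;;
    esac
  done
  IFS=$old_ifs
  return 1                   # no suffix matched: the file would be rejected
}

# Example:
# is_accepted "index.html" "jpg,mov" || echo "index.html would be rejected"
```

Under this model, adding html to the list (for example "jpg,mov,html") is what would make index.html count as accepted.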

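wget can download specific file types (for example jpg, jpeg, png, mov, avi, mpeg, and so on) when those files are linked from the URL given to wget. For example:

Let's say we would like to download .zip and .chd files from this website.

At this link there are folders and .zip files (scroll to the end). Now, let's say we run this command: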

wget -r --no-parent -P /save/location -A chd,zip "https://archive.org/download/MAME0.139_MAME2010_Reference_Set_ROMs_CHDs_Samples/roms/"

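This command downloads the .zip files and, at the same time, creates empty folders for the .chd files.

To download the .chd files, we need to extract the names of the empty folders and convert those folder names to their actual URLs. Then we put all the URLs of interest in a text file, file.txt, and finally feed that text file to wget, as follows: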

wget -r --no-parent -P /save/location -A chd,zip -i file.txt

The previous command will find all the .chd files.
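
The step of turning the empty folders into URLs could be scripted along these lines. This is a sketch under stated assumptions: empty_dirs_to_urls is a hypothetical helper, the local directory passed to it is assumed to mirror the remote one, and find is assumed to support -mindepth and -empty (GNU and BSD find do; they are not in POSIX). Folder names are copied into the URLs unchanged, so names with spaces or other special characters may still need percent-encoding.

```shell
# empty_dirs_to_urls ROOT BASE_URL
# Hypothetical helper: prints one URL per empty directory found under ROOT,
# assuming ROOT is the local copy of the remote directory BASE_URL.
empty_dirs_to_urls() {
  root=${1%/}        # local directory, trailing slash removed
  base_url=${2%/}    # remote URL, trailing slash removed
  find "$root" -mindepth 1 -type d -empty | sort | while IFS= read -r dir; do
    rel=${dir#"$root"/}                 # path relative to ROOT
    printf '%s/%s/\n' "$base_url" "$rel"
  done
}

# Example (the local path is an assumption about where wget saved the files):
# empty_dirs_to_urls \
#   "/save/location/archive.org/download/MAME0.139_MAME2010_Reference_Set_ROMs_CHDs_Samples/roms" \
#   "https://archive.org/download/MAME0.139_MAME2010_Reference_Set_ROMs_CHDs_Samples/roms/" \
#   > file.txt
```

The resulting file.txt can then be passed to wget with -i as in the command above.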
