I had to download a bunch of files from a Cloudflare-protected site. Said site is annoying to scrape, because it makes heavy use of JavaScript to generate unguessable links. Selenium is a possibility, but I found it gets defeated by Cloudflare's CAPTCHAs. Cloudscraper isn't feasible either, because the links are not present in the downloaded HTML.
Clicking the links manually is an acceptable solution in this case, but then another problem emerges: Firefox's download manager is rather substandard. That isn't an issue most of the time, but here it was a limiting factor, as downloads would fail and never be retried. Firefox also doesn't limit the number of in-progress downloads, so they saturate the connection and fail.
One possibility is to run aria2c in daemon mode and use a Firefox extension to add downloads to it. aria2c then queues the downloads and runs them in order, with optional retries. The extension embeds something called AriaNg, which provides a nice web interface on top of the aria2c daemon, so it's actually quite friendly.
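Under the hood, the extension just talks to the daemon over aria2's RPC interface, which you can also drive yourself. Here is a minimal sketch in Python using the standard library's XML-RPC client, assuming the daemon is listening on its default port 6800 with the rpc-secret from the config below; the download URL is a placeholder.

import xmlrpc.client

# The aria2c daemon exposes XML-RPC at /rpc (default port 6800).
server = xmlrpc.client.ServerProxy("http://localhost:6800/rpc")

# Queue a download. The first argument is the rpc-secret from the
# config, prefixed with "token:". The URL is a placeholder.
gid = server.aria2.addUri(
    "token:xyzzy",
    ["https://example.com/big-file.iso"],
    {"dir": "/mnt/disk/mydownloads"},
)

# aria2 hands back a GID you can use to poll or control the download.
print(server.aria2.tellStatus("token:xyzzy", gid, ["status", "completedLength"]))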
The only tricky part is that you may need to start aria2c yourself. You can do that using the config below, which I took from the Arch wiki; aria2c must be invoked with the --conf-path option to use it, as shown after the config.
# Adapted from the Arch wiki example; adjust paths and the secret.
continue=true
daemon=true
# Where completed downloads are written.
dir=/mnt/disk/mydownloads
file-allocation=falloc
log-level=warn
# Per-server connection cap and overall queue size.
max-connection-per-server=4
max-concurrent-downloads=3
# 0 means no overall bandwidth limit.
max-overall-download-limit=0
min-split-size=5M
enable-http-pipelining=true
# Expose the RPC interface the browser extension talks to.
enable-rpc=true
rpc-listen-all=true
# Change this; anyone who knows the secret can control the daemon.
rpc-secret=xyzzy
# Seconds to wait before retrying a failed download.
retry-wait=5
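With that saved somewhere like ~/.config/aria2/aria2.conf (the path is up to you), start the daemon with:

aria2c --conf-path="$HOME/.config/aria2/aria2.conf"

Because of daemon=true, aria2c forks into the background, and the extension can then reach it over RPC.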