I had to download a bunch of files from a Cloudflare-protected site. Said site is annoying to scrape, because it makes heavy use of JavaScript to generate unguessable links. Selenium is a possibility, but I found it gets defeated by Cloudflare's CAPTCHAs. Cloudscraper isn't feasible either, because the links are not present in the downloaded HTML.
Clicking the links manually is an acceptable solution in this case, but then another problem emerges: Firefox's download manager is rather substandard. That isn't an issue most of the time, but here it was a limiting factor, as downloads would fail and never be retried. Firefox also doesn't limit the number of in-progress downloads, so they saturate the connection and fail.
One possibility is to run aria2c in daemon mode and use a Firefox extension to add downloads to it. aria2c then queues the downloads and runs them in order, with optional retries. The extension embeds something called AriaNg, which provides a nice web interface on top of the aria2c daemon, so it's actually quite friendly.
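Under the hood, the extension just talks to the daemon over aria2's RPC interface, which you can also drive yourself. Here is a minimal sketch in Python using the standard library's XML-RPC client, assuming the daemon is listening on its default port 6800 with the rpc-secret from the config below; the download URL is a placeholder.

import xmlrpc.client

# The aria2c daemon exposes XML-RPC at /rpc (default port 6800).
server = xmlrpc.client.ServerProxy("http://localhost:6800/rpc")

# Queue a download. The first argument is the rpc-secret from the
# config, prefixed with "token:". The URL is a placeholder.
gid = server.aria2.addUri(
    "token:xyzzy",
    ["https://example.com/big-file.iso"],
    {"dir": "/mnt/disk/mydownloads"},
)

# aria2 hands back a GID you can use to poll or control the download.
print(server.aria2.tellStatus("token:xyzzy", gid, ["status", "completedLength"]))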
The only tricky part is that you may need to start aria2c yourself. You can do that using the config below, which I took from the Arch wiki; aria2c must be invoked with the --conf-path option to use it, as shown after the config.
# Adapted from the Arch wiki example; adjust paths and the secret.
continue=true
daemon=true
# Where completed downloads are written.
dir=/mnt/disk/mydownloads
file-allocation=falloc
log-level=warn
# Per-server connection cap and overall queue size.
max-connection-per-server=4
max-concurrent-downloads=3
# 0 means no overall bandwidth limit.
max-overall-download-limit=0
min-split-size=5M
enable-http-pipelining=true
# Expose the RPC interface the browser extension talks to.
enable-rpc=true
rpc-listen-all=true
# Change this; anyone who knows the secret can control the daemon.
rpc-secret=xyzzy
# Seconds to wait before retrying a failed download.
retry-wait=5
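With that saved somewhere like ~/.config/aria2/aria2.conf (the path is up to you), start the daemon with:

aria2c --conf-path="$HOME/.config/aria2/aria2.conf"

Because of daemon=true, aria2c forks into the background, and the extension can then reach it over RPC.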