I'm looking for tips to prevent erroring when downloading large channels

I’m currently working on archiving all of the US government’s YouTube pages. Many of these pages have 5,000-13,000 videos. My issue is that after a few hundred, sometimes a few thousand, videos, it starts to fail for every subsequent video. What commands can I use to make this more reliable and to reduce startup time after a future crash?

The specific error is “video not available,” even when the video clearly is available.

You’re likely getting rate-limited. Try adding these to your config:

--sleep-requests SECONDS        Number of seconds to sleep between requests
                                during data extraction
--sleep-interval SECONDS        Number of seconds to sleep before each
                                download. This is the minimum time to sleep
                                when used along with --max-sleep-interval
                                (Alias: --min-sleep-interval)
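For example, a config sketch combining both options (the values here are assumptions; tune them for your connection and channel size):

```shell
# Example values only; longer sleeps are safer for very large channels
--sleep-requests 1.5
--sleep-interval 5
--max-sleep-interval 30
```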

Could you try Hola or Urban VPN?

These may give you residential (“home”) IP addresses, so they’re less likely to be detected than other VPNs.

I recommend downloading the metadata first and then fanning out the downloads across multiple IPs using something like squid-dl.

If you only have one IP address, then you could use something like this. It doesn’t prevent blocking, but it has a fairly robust retry mechanism, so you can retry previously failed videos without re-downloading all of the channel metadata over and over.
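A minimal sketch of a resumable setup along those lines, using only yt-dlp’s built-in flags (the channel URL is a placeholder):

```shell
# One-time: list watch URLs quickly, without fetching per-video metadata
yt-dlp --flat-playlist --print "https://www.youtube.com/watch?v=%(id)s" \
    "https://www.youtube.com/@SomeAgency/videos" > urls.txt

# Re-runnable: --download-archive records finished IDs, so a re-run after a
# crash skips completed videos instead of re-crawling the whole channel
yt-dlp --download-archive archive.txt --batch-file urls.txt
```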

Why in God’s name are you downloading that much content?

On top of the other comments, abusing the --datebefore and --dateafter options will let you download in smaller chunks at a time by limiting the date range to download from. Depending on how many videos per year, you may need to limit it to as little as a one-month block. Wait a while, then do another block by adjusting the dates for those options, or combine this with the other rate-limiting options.
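For example, one month at a time (the dates and channel URL are placeholders; shift the window on each run):

```shell
# Download only videos uploaded in January 2024, then move the window
yt-dlp --dateafter 20240101 --datebefore 20240131 \
    --download-archive archive.txt \
    "https://www.youtube.com/@SomeAgency/videos"
```

Adding --download-archive on top means the monthly chunks can safely overlap without re-downloading anything.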

That works very well in combination with a web proxy.

Ok, so the only thing I’m not using here is --sleep-requests.

I’m having such a frustrating time with this. I have a bash script that refreshes cookies, switches VPNs, pauses, etc. Still the same cap on downloads… I guess it’s the account at this point.

…and a VPN so you hop IPs every 30 minutes or so.

Personally, if you are trying something this size, I would spin up a VM or three and load up a VPN on each…

Not sure if it helps, but maybe try reducing your download speed a bit too. I use something like:

--limit-rate 3M

If your usual speed is super fast, try halving it maybe?

Also use a VPN and rotate IPs? Script it to detect download failures and automatically reconnect to get a new IP.
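A rough sketch of that loop, assuming your VPN provider has a CLI (`my-vpn-cli` here is a hypothetical placeholder, not a real tool):

```shell
#!/usr/bin/env bash
# Retry until yt-dlp finishes cleanly; rotate the IP after every failure.
while true; do
    yt-dlp --download-archive archive.txt --batch-file urls.txt && break
    echo "yt-dlp failed; reconnecting VPN for a fresh IP..."
    my-vpn-cli reconnect   # hypothetical; substitute your VPN's actual CLI
    sleep 60               # give the tunnel time to come back up
done
```

Because of --download-archive, each pass through the loop only attempts the videos that haven’t finished yet.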

I was using cookies copied from Edge because cookies pulled from the browser weren’t working. I’ve now created a new YouTube account just for this, and it looks like it’s working. The vast majority of the time is being spent on sleep delays now. With 11.5k videos on one YouTube page, out of the 12 I’m going for, it’ll take a bit.