I’m currently working on archiving all of the US government’s YouTube pages. Many of these pages have 5000–13000 videos. My issue is that after a few hundred, sometimes a few thousand videos, it starts to fail for every subsequent video. What commands can I use to make this more reliable and reduce startup time in the case of a future crash?
The specific error is “video not available” when it clearly is.
You’re likely getting rate-limited. Try adding these to your config:
--sleep-requests SECONDS    Number of seconds to sleep between requests
                            during data extraction
--sleep-interval SECONDS    Number of seconds to sleep before each
                            download. This is the minimum time to sleep
                            when used along with --max-sleep-interval
                            (Alias: --min-sleep-interval)
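A minimal invocation with those flags might look like the following. The sleep values are illustrative guesses, not tested thresholds, and the channel URL is a placeholder:

```shell
# Sketch only: sleep values are assumptions -- tune them to how hard
# you're being rate-limited. @CHANNEL is a placeholder.
yt-dlp \
  --sleep-requests 2 \
  --min-sleep-interval 5 \
  --max-sleep-interval 30 \
  "https://www.youtube.com/@CHANNEL/videos"
```

Randomizing the per-download sleep between a min and max (rather than a fixed delay) makes the traffic pattern look a little less mechanical.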
I recommend downloading the metadata first, then fanning out the downloads across multiple IPs with something like squid-dl.
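A sketch of the metadata-first step, assuming yt-dlp (the channel URL is a placeholder): dump the video IDs once with a flat extraction, which avoids a per-video metadata fetch, then split the list among however many workers/IPs you have.

```shell
# Step 1: list video URLs only (flat extraction, no per-video fetch).
# @CHANNEL is a placeholder.
yt-dlp --flat-playlist \
  --print "https://www.youtube.com/watch?v=%(id)s" \
  "https://www.youtube.com/@CHANNEL/videos" > urls.txt

# Step 2: split into 4 chunks, one per worker/IP.
split -n l/4 urls.txt chunk_

# Each worker then downloads its chunk, e.g.:
#   yt-dlp --sleep-requests 2 -a chunk_aa
```

The flat extraction is cheap, so if a worker dies you only re-run its chunk rather than re-crawling the whole channel.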
If you only have one IP address, then you could use something like this. It doesn’t prevent blocking, but it has a fairly robust retry mechanism, so you can retry previously failed videos without re-downloading all the channel metadata over and over.
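A minimal version of that retry idea, assuming yt-dlp (URL and sleep values are placeholders): --download-archive records every completed video in a file, so a re-run after a crash skips finished videos instead of starting from scratch.

```shell
# Sketch of a resumable retry loop. archive.txt accumulates the IDs of
# completed videos; re-runs skip anything already listed there.
attempts=0
until yt-dlp \
    --download-archive archive.txt \
    --sleep-requests 2 \
    "https://www.youtube.com/@CHANNEL/videos"; do
  attempts=$((attempts + 1))
  # Cap retries: permanently unavailable videos would otherwise make
  # yt-dlp exit nonzero forever and the loop would never terminate.
  [ "$attempts" -ge 10 ] && { echo "giving up after $attempts runs" >&2; break; }
  echo "run failed; sleeping before retry" >&2
  sleep 600
done
```

This also answers the startup-time part of the question: the archive file is what lets a restart skip straight to the unfinished videos.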
On top of the other comments, using the --datebefore and --dateafter flags will let you download in smaller chunks by limiting the date range to download from. Depending on how many videos per year, you may need to limit it to as little as a one-month block. Wait a while, then do another block by adjusting the dates for those flags, or combine this with the rate-limiting options.
I’m having such a frustrating time with this. I have a bash script that refreshes cookies, switches VPNs, pauses, etc. Still the same cap on downloads… I guess it’s the account at this point.
I was using cookies copied from Edge because cookies pulled from the browser weren’t working. I’ve now created a new YouTube account just for this, and it looks like it’s working. The vast majority of the time is being spent on sleep delays now. With 11.5k videos on one YouTube page, out of the 12 I’m going for, it’ll take a while.