HPR2720: Download youtube channels using the rss feeds


Podcast, direct link: hackerpublicradio.org

I had a very similar problem to Ahuka aka Kevin, in hpr2675 :: YouTube Playlists. I wanted to be able to download an entire youtube channel and store the videos so that I could play them in the order in which they were posted.
See the previous episode, hpr2705 :: Youtube downloader for channels.

The problem with the original script is that it needs to download and check each video in each channel, so it slows to a crawl on large channels like EEVblog.

The solution was given in hpr2544 :: How I prepared episode 2493: YouTube Subscriptions - update, which has more details in its full-length notes.

  1. Subscribe:
    Subscriptions are the currency of YouTube creators so don't be afraid to create an account to subscribe to the creators. Here is my current subscription_manager.opml to give you some ideas.
  2. Export:
    Log in to https://www.youtube.com/subscription_manager and at the bottom you will see the option to Export subscriptions. Save the file and alter the script to point to it.
  3. Download: Run the script youtube-rss.bash

How it works

The first part allows you to define where you want to save your files. It also allows you to set what videos to skip based on length and strings in their titles.

savepath="/mnt/media/Videos/channels"
subscriptions="${savepath}/subscription_manager.opml"
logfile="${savepath}/log/downloaded.log"
youtubedl="/mnt/media/Videos/youtube-dl/youtube-dl"
DRYRUN="echo DEBUG: "
maxlength=7200 # two hours
skipcrap="fail |react |live |Best Pets|BLOOPERS|Kids Try"
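The checks and cleanup themselves are not shown in these notes. As a rough sketch of what such a step might look like, assuming the variables defined above: the function names `require_tools` and `prepare_workspace` are hypothetical, and the choice to delete leftover `_getlist` and `_todo` files is an assumption based on the script appending to them.

```shell
# Hypothetical helpers; not taken from the original script.

# Report every listed command that is not on the PATH.
# Returns 0 when all are present, 1 otherwise.
require_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "${tool}" >/dev/null 2>&1; then
      echo "Missing required tool: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Create the working directories, make sure the log exists,
# and remove the temporary lists left over from a previous run.
prepare_workspace() {
  local savepath="$1"
  local logfile="$2"
  mkdir -p "${savepath}/description" "$(dirname "${logfile}")" || return 1
  touch "${logfile}" || return 1
  rm -f "${logfile}_getlist" "${logfile}_todo"
}
```

It could be called as `require_tools wget xmlstarlet jq && prepare_workspace "${savepath}" "${logfile}"` before the rest of the script runs.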

After some checks and cleanup, we can then parse the opml file. This is an example of the top of mine.

<?xml version="1.0"?>
<opml version="1.1">
  <body>
    <outline text="YouTube Subscriptions" title="YouTube Subscriptions">
      <outline text="Wintergatan" title="Wintergatan" type="rss" xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCcXhhVwCT6_WqjkEniejRJQ"/>
      <outline text="Primitive Technology" title="Primitive Technology" type="rss" xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCAL3JXZSzSm8AlZyD3nQdBA"/>
      <outline text="John Ward" title="John Ward" type="rss" xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UC2uFFhnMKyF82UY2TbXRaNg"/>

Now we use the xmlstarlet tool to extract each feed url and its title from the opml file. The title is only used to give some feedback, while the url is used to fetch the feed. Each feed is then parsed for its video urls, which are stored for later. When the loop finishes we have a complete list of all the current video urls in all the feeds.

xmlstarlet sel -T -t -m '/opml/body/outline/outline' -v 'concat( @xmlUrl, " ", @title)' -n "${subscriptions}" | while read -r subscription title
do
  echo "Getting \"${title}\""
  wget -q "${subscription}" -O - | xmlstarlet sel -T -t -m '/_:feed/_:entry/media:group/media:content' -v '@url' -n - | awk -F '?' '{print $1}'  >> "${logfile}_getlist"
done
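The `awk` step at the end of that pipeline keeps only the part of each media url before the `?`. Here is that step on its own as a small sketch. The function name `strip_query` is hypothetical, and the sample url with its placeholder id `VIDEOID` is only an illustration of the general shape, not a real entry.

```shell
# Hypothetical helper: for each line on stdin, print only the text
# before the first "?" (that is, the url without its query string).
strip_query() {
  awk -F '?' '{print $1}'
}

# Illustrative input only; VIDEOID is a placeholder.
printf '%s\n' 'https://www.youtube.com/v/VIDEOID?version=3' | strip_query
```

Storing the url without its query string means the same video always produces the same line, so the later comparison against the log is a plain string match.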

The main part of the script then counts the total so we can have some feedback while we are running it. It then pumps the list from the previous step into a loop which first checks to make sure we have not already downloaded it.

count=1
total=$( sort -u "${logfile}_getlist" | wc -l )

sort -u "${logfile}_getlist" | while read -r thisvideo
do
  # Only look at videos that are not yet recorded in the download log
  if ! grep -qF -- "${thisvideo}" "${logfile}"
  then
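The script checks the log once per video inside the loop. As an alternative sketch, not the approach the script takes, the same "not yet downloaded" filter can be applied to the whole list in one pass. The function name `new_videos` is hypothetical, and the sketch assumes both files hold one url per line.

```shell
# Hypothetical helper: print each unique line of the candidate list ($1)
# that does not already appear as a whole line in the log ($2).
new_videos() {
  local getlist="$1"
  local log="$2"
  if [ -s "${log}" ]; then
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    sort -u "${getlist}" | grep -Fvx -f "${log}"
  else
    # An empty or missing log means nothing has been downloaded yet.
    sort -u "${getlist}"
  fi
}
```

Note that `grep` exits with status 1 when nothing is left to print, so the function's exit status is non-zero when there are no new videos.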

The next part takes advantage of the youtube-dl --dump-json option, which prints all sorts of information about the video as JSON. We store this in a variable so that we can query it later.

    # Fetch the metadata once, then pull out the fields we need with jq
    metadata="$( "${youtubedl}" --dump-json "${thisvideo}" )"
    uploader="$( jq -r '.uploader' <<< "${metadata}" )"
    title="$( jq -r '.title' <<< "${metadata}" )"
    upload_date="$( jq -r '.upload_date' <<< "${metadata}" )"
    id="$( jq -r '.id' <<< "${metadata}" )"
    duration="$( jq '.duration' <<< "${metadata}" )"

Having the duration, we can skip long episodes.

    if [[ -z ${duration} || ${duration} -le 0 ]]
    then
      echo -e "\nError: The duration \"${duration}\" is strange. \"${thisvideo}\"."
      continue
    elif [[ ${duration} -ge ${maxlength} ]]
    then
      echo -e "\nFilter: You told me not to download titles over ${maxlength} seconds long \"${title}\", \"${thisvideo}\""
      continue
    fi
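The duration test can be tried out on its own with a small function. This is only a sketch: the name `classify_duration` and its three output words are hypothetical, and it is a little stricter than the script in that it treats anything that is not a whole number (for example `null`) as invalid explicitly.

```shell
# Hypothetical helper: classify a duration in seconds ($1) against a
# maximum ($2). Prints "invalid", "too_long" or "ok".
classify_duration() {
  local duration="$1"
  local maxlength="$2"
  case "${duration}" in
    # Empty, or containing any character that is not a digit
    ''|*[!0-9]*) echo "invalid"; return 0 ;;
  esac
  if [ "${duration}" -le 0 ]; then
    echo "invalid"
  elif [ "${duration}" -ge "${maxlength}" ]; then
    echo "too_long"
  else
    echo "ok"
  fi
}
```

With `maxlength=7200`, a five minute video (`classify_duration 300 7200`) is kept, while one that is exactly two hours long is filtered out because the comparison is "greater than or equal".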

Or videos that don't interest us.

    if [[ -n "${skipcrap}" && $( echo "${title}" | grep -E -i "${skipcrap}" | wc -l ) -ne 0 ]]
    then
      echo -e "\nSkipping: You told me not to download this stuff. ${uploader}: \"${title}\", \"${thisvideo}\""
      continue
    else
      echo -e "\n${uploader}: \"${title}\", \"${thisvideo}\""
    fi
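To see which titles a given pattern would catch, the match can be tried in isolation. A minimal sketch, where the function name `title_is_skipped` and the sample titles in the comments are made up for illustration:

```shell
# Hypothetical helper: exit 0 when the title ($1) matches the
# case-insensitive extended regular expression ($2), non-zero otherwise.
# An empty pattern never matches.
title_is_skipped() {
  local title="$1"
  local pattern="$2"
  [ -n "${pattern}" ] && printf '%s\n' "${title}" | grep -Eiq -- "${pattern}"
}

skipcrap="fail |react |live |Best Pets|BLOOPERS|Kids Try"

# Made-up titles, for illustration only.
title_is_skipped "Epic Fail Compilation" "${skipcrap}" && echo "skipped"
title_is_skipped "Failure analysis of a capacitor" "${skipcrap}" || echo "kept"
```

The trailing space in alternatives such as `fail ` means the word must be followed by a space, so "Failure" is not caught, and neither is a title that simply ends in "fail".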

Now we have a filtered list of the urls we do want to keep. For each of these we also save the description in a text file named after the video id, in case we want to refer to it later.

    echo "${thisvideo}" >> "${logfile}_todo"
    jq -r '.description' <<< "${metadata}" > "${savepath}/description/${id}.txt"
  else
    echo -ne "\rProcessing ${count} of ${total}"
  fi
  count=$((count+1))
done
echo ""

And finally we download the actual videos, saving each channel in its own directory. Each file name starts with an ISO 8601 date (YYYYMMDD), followed by the title restricted to ASCII with no spaces or ampersands. I then use a "⋄" as a delimiter before the video id.

# Download the list
if [ -e "${logfile}_todo" ];
then
  cat "${logfile}_todo" | "${youtubedl}" --batch-file - --ignore-errors --no-mtime --restrict-filenames --format mp4 -o "${savepath}"'/%(uploader)s/%(upload_date)s-%(title)s⋄%(id)s.%(ext)s'
  # Record the urls in the log so they are skipped on the next run
  cat "${logfile}_todo" >> "${logfile}"
fi
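One convenience of this naming scheme is that a file name can be split back into its parts with plain parameter expansion, because the "⋄" marks where the id begins even when the id itself contains hyphens. A small sketch, where the function name `parse_video_filename` and the sample path are hypothetical:

```shell
# Hypothetical helper: split "<date>-<title>⋄<id>.<ext>" into its parts
# and print the date, title and id on separate lines.
parse_video_filename() {
  local name="${1##*/}"      # drop any leading directories
  local stem="${name%.*}"    # drop the extension
  local id="${stem##*⋄}"     # text after the last delimiter
  local rest="${stem%⋄*}"    # text before the last delimiter
  local date="${rest%%-*}"   # text before the first hyphen
  local title="${rest#*-}"   # text after the first hyphen
  printf '%s\n' "${date}" "${title}" "${id}"
}

# Made-up path, for illustration only.
parse_video_filename '/mnt/media/Videos/channels/Wintergatan/20190101-Example_Title⋄abc-DEF_123.mp4'
```

Because the date comes first and is written as YYYYMMDD, an ordinary alphabetical listing of a channel directory is also a chronological one.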

Now you have a fast script that keeps you up to date with your feeds.

...

http://hackerpublicradio.org/eps.php?id=2720
