Import Upstream version 2020.01.24

author Rogério Brito <rbrito@ime.usp.br>

Fri, 10 Apr 2020 12:32:01 +0000 (09:32 -0300)

committer Rogério Brito <rbrito@ime.usp.br>

Fri, 10 Apr 2020 12:32:01 +0000 (09:32 -0300)
diff --git a/ChangeLog b/ChangeLog
index 80681a9aeb7f5540c7142eaf04ae8828e1a7c233..94aa9f327c73a5c8ee3e792b8435a445de1432b1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,363 @@
+version 2020.01.24
+
+Extractors
+* [youtube] Fix sigfunc name extraction (#23819)
+* [stretchinternet] Fix extraction (#4319)
+* [voicerepublic] Fix extraction
+* [azmedien] Fix extraction (#23783)
+* [businessinsider] Fix jwplatform id extraction (#22929, #22954)
++ [24video] Add support for 24video.vip (#23753)
+* [ivi:compilation] Fix entries extraction (#23770)
+* [ard] Improve extraction (#23761)
+    * Simplify extraction
+    + Extract age limit and series
+    * Bypass geo-restriction
++ [nbc] Add support for nbc multi network URLs (#23049)
+* [americastestkitchen] Fix extraction
+* [zype] Improve extraction
+    + Extract subtitles (#21258)
+    + Support URLs with alternative keys/tokens (#21258)
+    + Extract more metadata
+* [orf:tvthek] Improve geo restricted videos detection (#23741)
+* [soundcloud] Restore previews extraction (#23739)
+
+
+version 2020.01.15
+
+Extractors
+* [yourporn] Fix extraction (#21645, #22255, #23459)
++ [canvas] Add support for new API endpoint (#17680, #18629)
+* [ndr:base:embed] Improve thumbnails extraction (#23731)
++ [vodplatform] Add support for embed.kwikmotion.com domain
++ [twitter] Add support for promo_video_website cards (#23711)
+* [orf:radio] Clean description and improve extraction
+* [orf:fm4] Fix extraction (#23599)
+* [safari] Fix kaltura session extraction (#23679, #23670)
+* [lego] Fix extraction and extract subtitle (#23687)
+* [cloudflarestream] Improve extraction
+    + Add support for bytehighway.net domain
+    + Add support for signed URLs
+    + Extract thumbnail
+* [naver] Improve extraction
+    * Improve geo-restriction handling
+    + Extract automatic captions
+    + Extract uploader metadata
+    + Extract VLive HLS formats
+    * Improve metadata extraction
+- [pandatv] Remove extractor (#23630)
+* [dctp] Fix format extraction (#23656)
++ [scrippsnetworks] Add support for www.discovery.com videos
+* [discovery] Fix anonymous token extraction (#23650)
+* [nrktv:seriebase] Fix extraction (#23625, #23537)
+* [wistia] Improve format extraction and extract subtitles (#22590)
+* [vice] Improve extraction (#23631)
+* [redtube] Detect private videos (#23518)
+
+
+version 2020.01.01
+
+Extractors
+* [brightcove] Invalidate policy key cache on failing requests
+* [pornhub] Improve locked videos detection (#22449, #22780)
++ [pornhub] Add support for m3u8 formats
+* [pornhub] Fix extraction (#22749, #23082)
+* [brightcove] Update policy key on failing requests
+* [spankbang] Improve removed video detection (#23423)
+* [spankbang] Fix extraction (#23307, #23423, #23444)
+* [soundcloud] Automatically update client id on failing requests
+* [prosiebensat1] Improve geo restriction handling (#23571)
+* [brightcove] Cache brightcove player policy keys
+* [teachable] Fail with error message if no video URL found
+* [teachable] Improve locked lessons detection (#23528)
++ [scrippsnetworks] Add support for Scripps Networks sites (#19857, #22981)
+* [mitele] Fix extraction (#21354, #23456)
+* [soundcloud] Update client id (#23516)
+* [mailru] Relax URL regular expressions (#23509)
+
+
+version 2019.12.25
+
+Core
+* [utils] Improve str_to_int
++ [downloader/hls] Add ability to override AES decryption key URL (#17521)
+
+Extractors
+* [mediaset] Fix parse formats (#23508)
++ [tv2dk:bornholm:play] Add support for play.tv2bornholm.dk (#23291)
++ [slideslive] Add support for url and vimeo service names (#23414)
+* [slideslive] Fix extraction (#23413)
+* [twitch:clips] Fix extraction (#23375)
++ [soundcloud] Add support for token protected embeds (#18954)
+* [vk] Improve extraction
+    * Fix User Videos extraction (#23356)
+    * Extract all videos for lists with more than 1000 videos (#23356)
+    + Add support for video albums (#14327, #14492)
+- [kontrtube] Remove extractor
+- [videopremium] Remove extractor
+- [musicplayon] Remove extractor (#9225)
++ [ufctv] Add support for ufcfightpass.imgdge.com and
+  ufcfightpass.imggaming.com (#23343)
++ [twitch] Extract m3u8 formats frame rate (#23333)
++ [imggaming] Add support for playlists and extract subtitles
++ [ufcarabia] Add support for UFC Arabia (#23312)
+* [ufctv] Fix extraction
+* [yahoo] Fix gyao brightcove player id (#23303)
+* [vzaar] Override AES decryption key URL (#17521)
++ [vzaar] Add support for AES HLS manifests (#17521, #23299)
+* [nrl] Fix extraction
+* [teachingchannel] Fix extraction
+* [nintendo] Fix extraction and partially add support for Nintendo Direct
+  videos (#4592)
++ [ooyala] Add better fallback values for domain and streams variables
++ [youtube] Add support for youtubekids.com (#23272)
+* [tv2] Detect DRM protection
++ [tv2] Add support for katsomo.fi and mtv.fi (#10543)
+* [tv2] Fix tv2.no article extraction
+* [msn] Improve extraction
+    + Add support for YouTube and NBCSports embeds
+    + Add support for articles with multiple videos
+    * Improve AOL embed support
+    * Improve format extraction
+* [abcotvs] Relax URL regular expression and improve metadata extraction
+  (#18014)
+* [channel9] Reduce response size
+* [adobetv] Improve extraction
+    * Use OnDemandPagedList for list extractors
+    * Reduce show extraction requests
+    * Extract original video format and subtitles
+    + Add support for adobe tv embeds
+
+
+version 2019.11.28
+
+Core
++ [utils] Add generic caesar cipher and rot47
+* [utils] Handle rd-suffixed day parts in unified_strdate (#23199)
+
+Extractors
+* [vimeo] Improve extraction
+    * Fix review extraction
+    * Fix ondemand extraction
+    * Make password protected player case as an expected error (#22896)
+    * Simplify channel based extractors code
+- [openload] Remove extractor (#11999)
+- [verystream] Remove extractor
+- [streamango] Remove extractor (#15406)
+* [dailymotion] Improve extraction
+    * Extract http formats included in m3u8 manifest
+    * Fix user extraction (#3553, #21415)
+    + Add support for User Authentication (#11491)
+    * Fix password protected videos extraction (#23176)
+    * Respect age limit option and family filter cookie value (#18437)
+    * Handle video url playlist query param
+    * Report allowed countries for geo-restricted videos
+* [corus] Improve extraction
+    + Add support for Series Plus, W Network, YTV, ABC Spark, disneychannel.com
+      and disneylachaine.ca (#20861)
+    + Add support for self hosted videos (#22075)
+    * Detect DRM protection (#14910, #9164)
+* [vivo] Fix extraction (#22328, #22279)
++ [bitchute] Extract upload date (#22990, #23193)
+* [soundcloud] Update client id (#23214)
+
+
+version 2019.11.22
+
+Core
++ [extractor/common] Clean jwplayer description HTML tags
++ [extractor/common] Add data, headers and query to all major extract formats
+  methods
+
+Extractors
+* [chaturbate] Fix extraction (#23010, #23012)
++ [ntvru] Add support for non relative file URLs (#23140)
+* [vk] Fix wall audio thumbnails extraction (#23135)
+* [ivi] Fix format extraction (#21991)
+- [comcarcoff] Remove extractor
++ [drtv] Add support for new URL schema (#23059)
++ [nexx] Add support for Multi Player JS Setup (#23052)
++ [teamcoco] Add support for new videos (#23054)
+* [soundcloud] Check if the soundtrack has downloads left (#23045)
+* [facebook] Fix posts video data extraction (#22473)
+- [addanime] Remove extractor
+- [minhateca] Remove extractor
+- [daisuki] Remove extractor
+* [seeker] Fix extraction
+- [revision3] Remove extractors
+* [twitch] Fix video comments URL (#18593, #15828)
+* [twitter] Improve extraction
+    + Add support for generic embeds (#22168)
+    * Always extract http formats for native videos (#14934)
+    + Add support for Twitter Broadcasts (#21369)
+    + Extract more metadata
+    * Improve VMap format extraction
+    * Unify extraction code for both twitter statuses and cards
++ [twitch] Add support for Clip embed URLs
+* [lnkgo] Fix extraction (#16834)
+* [mixcloud] Improve extraction
+    * Improve metadata extraction (#11721)
+    * Fix playlist extraction (#22378)
+    * Fix user mixes extraction (#15197, #17865)
++ [kinja] Add support for Kinja embeds (#5756, #11282, #22237, #22384)
+* [onionstudios] Fix extraction
++ [hotstar] Pass Referer header to format requests (#22836)
+* [dplay] Minimize response size
++ [patreon] Extract uploader_id and filesize
+* [patreon] Minimize response size
+* [roosterteeth] Fix login request (#16094, #22689)
+
+
+version 2019.11.05
+
+Extractors
++ [scte] Add support for learning.scte.org (#22975)
++ [msn] Add support for Vidible and AOL embeds (#22195, #22227)
+* [myspass] Fix video URL extraction and improve metadata extraction (#22448)
+* [jamendo] Improve extraction
+    * Fix album extraction (#18564)
+    * Improve metadata extraction (#18565, #21379)
+* [mediaset] Relax URL guid matching (#18352)
++ [mediaset] Extract unprotected M3U and MPD manifests (#17204)
+* [telegraaf] Fix extraction
++ [bellmedia] Add support for marilyn.ca videos (#22193)
+* [stv] Fix extraction (#22928)
+- [iconosquare] Remove extractor
+- [keek] Remove extractor
+- [gameone] Remove extractor (#21778)
+- [flipagram] Remove extractor
+- [bambuser] Remove extractor
+* [wistia] Reduce embed extraction false positives
++ [wistia] Add support for inline embeds (#22931)
+- [go90] Remove extractor
+* [kakao] Remove raw request
++ [kakao] Extract format total bitrate
+* [daum] Fix VOD and Clip extraction (#15015)
+* [kakao] Improve extraction
+    + Add support for embed URLs
+    + Add support for Kakao Legacy vid based embed URLs
+    * Only extract fields used for extraction
+    * Strip description and extract tags
+* [mixcloud] Fix cloudcast data extraction (#22821)
+* [yahoo] Improve extraction
+    + Add support for live streams (#3597, #3779, #22178)
+    * Bypass cookie consent page for european domains (#16948, #22576)
+    + Add generic support for embeds (#20332)
+* [tv2] Fix and improve extraction (#22787)
++ [tv2dk] Add support for TV2 DK sites
+* [onet] Improve extraction …
+    + Add support for onet100.vod.pl
+    + Extract m3u8 formats
+    * Correct audio only format info
+* [fox9] Fix extraction
+
+
+version 2019.10.29
+
+Core
+* [utils] Actualize major IPv4 address blocks per country
+
+Extractors
++ [go] Add support for abc.com and freeform.com (#22823, #22864)
++ [mtv] Add support for mtvjapan.com
+* [mtv] Fix extraction for mtv.de (#22113)
+* [videodetective] Fix extraction
+* [internetvideoarchive] Fix extraction
+* [nbcnews] Fix extraction (#12569, #12576, #21703, #21923)
+- [hark] Remove extractor
+- [tutv] Remove extractor
+- [learnr] Remove extractor
+- [macgamestore] Remove extractor
+* [la7] Update Kaltura service URL (#22358)
+* [thesun] Fix extraction (#16966)
+- [makertv] Remove extractor
++ [tenplay] Add support for 10play.com.au (#21446)
+* [soundcloud] Improve extraction
+    * Improve format extraction (#22123)
+    + Extract uploader_id and uploader_url (#21916)
+    + Extract all known thumbnails (#19071, #20659)
+    * Fix extraction for private playlists (#20976)
+    + Add support for playlist embeds (#20976)
+    * Skip preview formats (#22806)
+* [dplay] Improve extraction
+    + Add support for dplay.fi, dplay.jp and es.dplay.com (#16969)
+    * Fix it.dplay.com extraction (#22826)
+    + Extract creator, tags and thumbnails
+    * Handle playback API call errors
++ [discoverynetworks] Add support for dplay.co.uk
+* [vk] Improve extraction
+    + Add support for Odnoklassniki embeds
+    + Extract more videos from user lists (#4470)
+    + Fix wall post audio extraction (#18332)
+    * Improve error detection (#22568)
++ [odnoklassniki] Add support for embeds
+* [puhutv] Improve extraction
+    * Fix subtitles extraction
+    * Transform HLS URLs to HTTP URLs
+    * Improve metadata extraction
+* [ceskatelevize] Skip DRM media
++ [facebook] Extract subtitles (#22777)
+* [globo] Handle alternative hash signing method
+
+
+version 2019.10.22
+
+Core
+* [utils] Improve subtitles_filename (#22753)
+
+Extractors
+* [facebook] Bypass download rate limits (#21018)
++ [contv] Add support for contv.com
+- [viewster] Remove extractor
+* [xfileshare] Improve extractor (#17032, #17906, #18237, #18239)
+    * Update the list of domains
+    + Add support for aa-encoded video data
+    * Improve jwplayer format extraction
+    + Add support for Clappr sources
+* [mangomolo] Fix video format extraction and add support for player URLs
+* [audioboom] Improve metadata extraction
+* [twitch] Update VOD URL matching (#22395, #22727)
+- [mit] Remove support for video.mit.edu (#22403)
+- [servingsys] Remove extractor (#22639)
+* [dumpert] Fix extraction (#22428, #22564)
+* [atresplayer] Fix extraction (#16277, #16716)
+
+
+version 2019.10.16
+
+Core
+* [extractor/common] Make _is_valid_url more relaxed
+
+Extractors
+* [vimeo] Improve album videos id extraction (#22599)
++ [globo] Extract subtitles (#22713)
+* [bokecc] Improve player params extraction (#22638)
+* [nexx] Handle result list (#22666)
+* [vimeo] Fix VHX embed extraction
+* [nbc] Switch to graphql API (#18581, #22693, #22701)
+- [vessel] Remove extractor
+- [promptfile] Remove extractor (#6239)
+* [kaltura] Fix service URL extraction (#22658)
+* [kaltura] Fix embed info strip (#22658)
+* [globo] Fix format extraction (#20319)
+* [redtube] Improve metadata extraction (#22492, #22615)
+* [pornhub:uservideos:upload] Fix extraction (#22619)
++ [telequebec:squat] Add support for squat.telequebec.tv (#18503)
+- [wimp] Remove extractor (#22088, #22091)
++ [gfycat] Extend URL regular expression (#22225)
++ [chaturbate] Extend URL regular expression (#22309)
+* [peertube] Update instances (#22414)
++ [telequebec] Add support for coucou.telequebec.tv (#22482)
++ [xvideos] Extend URL regular expression (#22471)
+- [youtube] Remove support for invidious.enkirton.net (#22543)
++ [openload] Add support for oload.monster (#22592)
+* [nrktv:seriebase] Fix extraction (#22596)
++ [youtube] Add support for yt.lelux.fi (#22597)
+* [orf:tvthek] Make manifest requests non fatal (#22578)
+* [teachable] Skip login when already logged in (#22572)
+* [viewlift] Improve extraction (#22545)
+* [nonktube] Fix extraction (#22544)
+
+
  version 2019.09.28
  
  Core
@@ -353,7 +713,7 @@ Extractors
  version 2019.04.17
  
  Extractors
-* [openload] Randomize User-Agent (closes #20688)
+* [openload] Randomize User-Agent (#20688)
  + [openload] Add support for oladblock domains (#20471)
  * [adn] Fix subtitle extraction (#12724)
  + [aol] Add support for localized websites
@@ -918,7 +1278,7 @@ Extractors
  + [youtube] Extract channel meta fields (#9676, #12939)
  * [porntube] Fix extraction (#17541)
  * [asiancrush] Fix extraction (#15630)
-+ [twitch:clips] Extend URL regular expression (closes #17559)
++ [twitch:clips] Extend URL regular expression (#17559)
  + [vzaar] Add support for HLS
  * [tube8] Fix metadata extraction (#17520)
  * [eporner] Extract JSON-LD (#17519)
diff --git a/README.md b/README.md
index c39b13616946fbb1c07377e1bc74a39a3dfc4ee5..01f975958c8370016a39c9f3fb872241c977c62b 100644
--- a/README.md
+++ b/README.md
@@ -752,8 +752,8 @@ As a last resort, you can also uninstall the version installed by your package m
  Afterwards, simply follow [our manual installation instructions](https://ytdl-org.github.io/youtube-dl/download.html):
  
  ```
-sudo wget https://yt-dl.org/latest/youtube-dl -O /usr/local/bin/youtube-dl
-sudo chmod a+x /usr/local/bin/youtube-dl
+sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
+sudo chmod a+rx /usr/local/bin/youtube-dl
  hash -r
  ```
  
diff --git a/README.txt b/README.txt
index 045e0b19a873b35e5856240fe3975757288f0625..cc86a1be504693d38cc15f08a567a4c8be01b872 100644
--- a/README.txt
+++ b/README.txt
@@ -960,8 +960,8 @@ that, remove the distribution's package, with a line like
  
  Afterwards, simply follow our manual installation instructions:
  
-    sudo wget https://yt-dl.org/latest/youtube-dl -O /usr/local/bin/youtube-dl
-    sudo chmod a+x /usr/local/bin/youtube-dl
+    sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
+    sudo chmod a+rx /usr/local/bin/youtube-dl
      hash -r
  
  Again, from then on you'll be able to update with sudo youtube-dl -U.
diff --git a/devscripts/create-github-release.py b/devscripts/create-github-release.py
index 428111b3f0e893d9ae53da648844833e87dd72b3..2ddfa109698864f642b97f8bcb9846f84631e4d9 100644
--- a/devscripts/create-github-release.py
+++ b/devscripts/create-github-release.py
@@ -1,7 +1,6 @@
  #!/usr/bin/env python
  from __future__ import unicode_literals
  
-import base64
  import io
  import json
  import mimetypes
@@ -15,7 +14,6 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
  
  from youtube_dl.compat import (
      compat_basestring,
-    compat_input,
      compat_getpass,
      compat_print,
      compat_urllib_request,
@@ -40,28 +38,20 @@ class GitHubReleaser(object):
          try:
              info = netrc.netrc().authenticators(self._NETRC_MACHINE)
              if info is not None:
-                self._username = info[0]
-                self._password = info[2]
+                self._token = info[2]
                  compat_print('Using GitHub credentials found in .netrc...')
                  return
              else:
                  compat_print('No GitHub credentials found in .netrc')
          except (IOError, netrc.NetrcParseError):
              compat_print('Unable to parse .netrc')
-        self._username = compat_input(
-            'Type your GitHub username or email address and press [Return]: ')
-        self._password = compat_getpass(
-            'Type your GitHub password and press [Return]: ')
+        self._token = compat_getpass(
+            'Type your GitHub PAT (personal access token) and press [Return]: ')
  
      def _call(self, req):
          if isinstance(req, compat_basestring):
              req = sanitized_Request(req)
-        # Authorizing manually since GitHub does not response with 401 with
-        # WWW-Authenticate header set (see
-        # https://developer.github.com/v3/#basic-authentication)
-        b64 = base64.b64encode(
-            ('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
-        req.add_header('Authorization', 'Basic %s' % b64)
+        req.add_header('Authorization', 'token %s' % self._token)
          response = self._opener.open(req).read().decode('utf-8')
          return json.loads(response)
  
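The hunk above swaps HTTP Basic authentication for a token-based `Authorization` header. A minimal sketch of the same header shape, using only the Python 3 standard library, might look like the following. The helper name `build_authorized_request`, the example URL, and the placeholder token are assumptions made for illustration and are not part of the release script; no request is sent.

```python
import urllib.request


def build_authorized_request(url, token):
    """Return a Request carrying a 'token <PAT>' Authorization header.

    Hypothetical helper for illustration only; the release script itself
    goes through sanitized_Request and its own opener.
    """
    req = urllib.request.Request(url)
    # Same header shape as the patched _call(): 'token %s' % token
    req.add_header('Authorization', 'token %s' % token)
    return req


# Placeholder URL and token; nothing is sent over the network here.
req = build_authorized_request(
    'https://api.github.com/repos/ytdl-org/youtube-dl/releases',
    'example-token')
print(req.get_header('Authorization'))  # token example-token
```

Compared with the removed code, there is no base64 step: the token is passed through as-is, which is why the `base64` import could be dropped.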
diff --git a/docs/supportedsites.md b/docs/supportedsites.md
index 35275278bf68f667e3c19c46d1634072e2bbbe59..e9a8cc27adfc8a3da64f5a41ad8319e1563d5059 100644
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -26,13 +26,13 @@
   - **AcademicEarth:Course**
   - **acast**
   - **acast:channel**
- - **AddAnime**
   - **ADN**: Anime Digital Network
   - **AdobeConnect**
- - **AdobeTV**
- - **AdobeTVChannel**
- - **AdobeTVShow**
- - **AdobeTVVideo**
+ - **adobetv**
+ - **adobetv:channel**
+ - **adobetv:embed**
+ - **adobetv:show**
+ - **adobetv:video**
   - **AdultSwim**
   - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
   - **afreecatv**: afreecatv.com
@@ -76,8 +76,6 @@
   - **awaan:video**
   - **AZMedien**: AZ Medien videos
   - **BaiduVideo**: 百度视频
- - **bambuser**
- - **bambuser:channel**
   - **Bandcamp**
   - **Bandcamp:album**
   - **Bandcamp:weekly**
@@ -177,12 +175,12 @@
   - **CNN**
   - **CNNArticle**
   - **CNNBlogs**
- - **ComCarCoff**
   - **ComedyCentral**
   - **ComedyCentralFullEpisodes**
   - **ComedyCentralShortname**
   - **ComedyCentralTV**
   - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
+ - **CONtv**
   - **Corus**
   - **Coub**
   - **Cracked**
@@ -204,8 +202,6 @@
   - **dailymotion**
   - **dailymotion:playlist**
   - **dailymotion:user**
- - **DaisukiMotto**
- - **DaisukiMottoPlaylist**
   - **daum.net**
   - **daum.net:clip**
   - **daum.net:playlist**
@@ -231,7 +227,6 @@
   - **DouyuShow**
   - **DouyuTV**: 斗鱼
   - **DPlay**
- - **DPlayIt**
   - **DRBonanza**
   - **Dropbox**
   - **DrTuber**
@@ -284,12 +279,12 @@
   - **FiveThirtyEight**
   - **FiveTV**
   - **Flickr**
- - **Flipagram**
   - **Folketinget**: Folketinget (ft.dk; Danish parliament)
   - **FootyRoom**
   - **Formula1**
   - **FOX**
   - **FOX9**
+ - **FOX9News**
   - **Foxgay**
   - **foxnews**: Fox News and Fox Business Video
   - **foxnews:article**
@@ -315,8 +310,6 @@
   - **FXNetworks**
   - **Gaia**
   - **GameInformer**
- - **GameOne**
- - **gameone:playlist**
   - **GameSpot**
   - **GameStar**
   - **Gaskrank**
@@ -331,14 +324,12 @@
   - **Globo**
   - **GloboArticle**
   - **Go**
- - **Go90**
   - **GodTube**
   - **Golem**
   - **GoogleDrive**
   - **Goshgay**
   - **GPUTechConf**
   - **Groupon**
- - **Hark**
   - **hbo**
   - **HearThisAt**
   - **Heise**
@@ -367,7 +358,6 @@
   - **Hungama**
   - **HungamaSong**
   - **Hypem**
- - **Iconosquare**
   - **ign.com**
   - **imdb**: Internet Movie Database trailers
   - **imdb:list**: Internet Movie Database lists
@@ -407,14 +397,14 @@
   - **Kankan**
   - **Karaoketv**
   - **KarriereVideos**
- - **keek**
+ - **Katsomo**
   - **KeezMovies**
   - **Ketnet**
   - **KhanAcademy**
   - **KickStarter**
+ - **KinjaEmbed**
   - **KinoPoisk**
   - **KonserthusetPlay**
- - **kontrtube**: KontrTube.ru - Труба зовёт
   - **KrasView**: Красвью
   - **Ku6**
   - **KUSI**
@@ -431,7 +421,6 @@
   - **Lcp**
   - **LcpPlay**
   - **Le**: 乐视网
- - **Learnr**
   - **Lecture2Go**
   - **Lecturio**
   - **LecturioCourse**
@@ -465,11 +454,9 @@
   - **lynda**: lynda.com videos
   - **lynda:course**: lynda.com online courses
   - **m6**
- - **macgamestore**: MacGameStore trailers
   - **mailru**: Видео@Mail.Ru
   - **mailru:music**: Музыка@Mail.Ru
   - **mailru:music:search**: Музыка@Mail.Ru
- - **MakerTV**
   - **MallTV**
   - **mangomolo:live**
   - **mangomolo:video**
@@ -496,14 +483,12 @@
   - **Mgoon**
   - **MGTV**: 芒果TV
   - **MiaoPai**
- - **Minhateca**
   - **MinistryGrid**
   - **Minoto**
   - **miomio.tv**
   - **MiTele**: mitele.es
   - **mixcloud**
   - **mixcloud:playlist**
- - **mixcloud:stream**
   - **mixcloud:user**
   - **Mixer:live**
   - **Mixer:vod**
@@ -525,11 +510,10 @@
   - **mtg**: MTG services
   - **mtv**
   - **mtv.de**
- - **mtv81**
   - **mtv:video**
+ - **mtvjapan**
   - **mtvservices:embedded**
   - **MuenchenTV**: münchen.tv
- - **MusicPlayOn**
   - **mva**: Microsoft Virtual Academy videos
   - **mva:course**: Microsoft Virtual Academy courses
   - **Mwave**
@@ -634,7 +618,6 @@
   - **OnionStudios**
   - **Ooyala**
   - **OoyalaExternal**
- - **Openload**
   - **OraTV**
   - **orf:fm4**: radio FM4
   - **orf:fm4:story**: fm4.orf.at stories
@@ -645,7 +628,6 @@
   - **OutsideTV**
   - **PacktPub**
   - **PacktPubCourse**
- - **PandaTV**: 熊猫TV
   - **pandora.tv**: 판도라TV
   - **ParamountNetwork**
   - **parliamentlive.tv**: UK parliament videos
@@ -694,7 +676,6 @@
   - **PornoXO**
   - **PornTube**
   - **PressTV**
- - **PromptFile**
   - **prosiebensat1**: ProSiebenSat.1 Digital
   - **puhutv**
   - **puhutv:serie**
@@ -735,8 +716,6 @@
   - **Restudy**
   - **Reuters**
   - **ReverbNation**
- - **revision**
- - **revision3:embed**
   - **RICE**
   - **RMCDecouverte**
   - **RockstarGames**
@@ -781,11 +760,13 @@
   - **screen.yahoo:search**: Yahoo screen search
   - **Screencast**
   - **ScreencastOMatic**
+ - **ScrippsNetworks**
   - **scrippsnetworks:watch**
+ - **SCTE**
+ - **SCTECourse**
   - **Seeker**
   - **SenateISVP**
   - **SendtoNews**
- - **ServingSys**
   - **Servus**
   - **Sexu**
   - **SeznamZpravy**
@@ -816,6 +797,7 @@
   - **soundcloud:set**
   - **soundcloud:trackstation**
   - **soundcloud:user**
+ - **SoundcloudEmbed**
   - **soundgasm**
   - **soundgasm:profile**
   - **southpark.cc.com**
@@ -842,7 +824,6 @@
   - **Steam**
   - **Stitcher**
   - **Streamable**
- - **Streamango**
   - **streamcloud.eu**
   - **StreamCZ**
   - **StreetVoice**
@@ -884,9 +865,11 @@
   - **TeleQuebec**
   - **TeleQuebecEmission**
   - **TeleQuebecLive**
+ - **TeleQuebecSquat**
   - **TeleTask**
   - **Telewebion**
   - **TennisTV**
+ - **TenPlay**
   - **TF1**
   - **TFO**
   - **TheIntercept**
@@ -925,11 +908,12 @@
   - **tunein:topic**
   - **TunePk**
   - **Turbo**
- - **Tutv**
   - **tv.dfb.de**
   - **TV2**
   - **tv2.hu**
   - **TV2Article**
+ - **TV2DK**
+ - **TV2DKBornholmPlay**
   - **TV4**: tv4.se and tv4play.se
   - **TV5MondePlus**: TV5MONDE+
   - **TVA**
@@ -966,10 +950,12 @@
   - **twitch:vod**
   - **twitter**
   - **twitter:amplify**
+ - **twitter:broadcast**
   - **twitter:card**
   - **udemy**
   - **udemy:course**
   - **UDNEmbed**: 聯合影音
+ - **UFCArabia**
   - **UFCTV**
   - **UKTVPlay**
   - **umg:de**: Universal Music Deutschland
@@ -990,8 +976,6 @@
   - **Vbox7**
   - **VeeHD**
   - **Veoh**
- - **verystream**
- - **Vessel**
   - **Vesti**: Вести.Ru
   - **Vevo**
   - **VevoPlaylist**
@@ -1006,13 +990,11 @@
   - **Viddler**
   - **Videa**
   - **video.google:search**: Google Video search
- - **video.mit.edu**
   - **VideoDetective**
   - **videofy.me**
   - **videomore**
   - **videomore:season**
   - **videomore:video**
- - **VideoPremium**
   - **VideoPress**
   - **Vidio**
   - **VidLii**
@@ -1024,7 +1006,6 @@
   - **vier:videos**
   - **ViewLift**
   - **ViewLiftEmbed**
- - **Viewster**
   - **Viidea**
   - **viki**
   - **viki:channel**
@@ -1090,7 +1071,6 @@
   - **Weibo**
   - **WeiboMobile**
   - **WeiqiTV**: WQTV
- - **Wimp**
   - **Wistia**
   - **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
   - **WorldStarHipHop**
@@ -1099,7 +1079,7 @@
   - **WWE**
   - **XBef**
   - **XboxClips**
- - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
+ - **XFileShare**: XFileShare based sites: ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing
   - **XHamster**
   - **XHamsterEmbed**
   - **XHamsterUser**
diff --git a/test/test_all_urls.py b/test/test_all_urls.py
index 465ce005097d246d6b77ae8c04b47fa94889dc41..81056a999d2014506b589ea4b31ab67951486f1b 100644
--- a/test/test_all_urls.py
+++ b/test/test_all_urls.py
@@ -123,12 +123,6 @@ class TestAllURLsMatching(unittest.TestCase):
          self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['pbs'])
          self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['pbs'])
  
-    def test_yahoo_https(self):
-        # https://github.com/ytdl-org/youtube-dl/issues/2701
-        self.assertMatch(
-            'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
-            ['Yahoo'])
-
      def test_no_duplicated_ie_names(self):
          name_accu = collections.defaultdict(list)
          for ie in self.ies:
diff --git a/test/test_utils.py b/test/test_utils.py
index 659c6ece53d480a5d709366ef44e29eedbec3027..0896f41506aa6d6cdb45b1c601203d6e717946d6 100644
--- a/test/test_utils.py
+++ b/test/test_utils.py
@@ -19,6 +19,7 @@ from youtube_dl.utils import (
      age_restricted,
      args_to_str,
      encode_base_n,
+    caesar,
      clean_html,
      date_from_str,
      DateRange,
@@ -69,11 +70,13 @@ from youtube_dl.utils import (
      remove_start,
      remove_end,
      remove_quotes,
+    rot47,
      shell_quote,
      smuggle_url,
      str_to_int,
      strip_jsonp,
      strip_or_none,
+    subtitles_filename,
      timeconvert,
      unescapeHTML,
      unified_strdate,
@@ -261,6 +264,11 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
          self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
  
+    def test_subtitles_filename(self):
+        self.assertEqual(subtitles_filename('abc.ext', 'en', 'vtt'), 'abc.en.vtt')
+        self.assertEqual(subtitles_filename('abc.ext', 'en', 'vtt', 'ext'), 'abc.en.vtt')
+        self.assertEqual(subtitles_filename('abc.unexpected_ext', 'en', 'vtt', 'ext'), 'abc.unexpected_ext.en.vtt')
+
      def test_remove_start(self):
          self.assertEqual(remove_start(None, 'A - '), None)
          self.assertEqual(remove_start('A - B', 'A - '), 'B')
@@ -334,6 +342,8 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(unified_strdate('July 15th, 2013'), '20130715')
          self.assertEqual(unified_strdate('September 1st, 2013'), '20130901')
          self.assertEqual(unified_strdate('Sep 2nd, 2013'), '20130902')
+        self.assertEqual(unified_strdate('November 3rd, 2019'), '20191103')
+        self.assertEqual(unified_strdate('October 23rd, 2005'), '20051023')
  
      def test_unified_timestamps(self):
          self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@@ -489,6 +499,12 @@ class TestUtil(unittest.TestCase):
      def test_str_to_int(self):
          self.assertEqual(str_to_int('123,456'), 123456)
          self.assertEqual(str_to_int('123.456'), 123456)
+        self.assertEqual(str_to_int(523), 523)
+        # Python 3 has no long
+        if sys.version_info < (3, 0):
+            eval('self.assertEqual(str_to_int(123456L), 123456)')
+        self.assertEqual(str_to_int('noninteger'), None)
+        self.assertEqual(str_to_int([]), None)
  
      def test_url_basename(self):
          self.assertEqual(url_basename('http://foo.de/'), '')
@@ -1361,6 +1377,20 @@ Line 1
          self.assertRaises(ValueError, encode_base_n, 0, 70)
          self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
  
+    def test_caesar(self):
+        self.assertEqual(caesar('ace', 'abcdef', 2), 'cea')
+        self.assertEqual(caesar('cea', 'abcdef', -2), 'ace')
+        self.assertEqual(caesar('ace', 'abcdef', -2), 'eac')
+        self.assertEqual(caesar('eac', 'abcdef', 2), 'ace')
+        self.assertEqual(caesar('ace', 'abcdef', 0), 'ace')
+        self.assertEqual(caesar('xyz', 'abcdef', 2), 'xyz')
+        self.assertEqual(caesar('abc', 'acegik', 2), 'ebg')
+        self.assertEqual(caesar('ebg', 'acegik', -2), 'abc')
+
+    def test_rot47(self):
+        self.assertEqual(rot47('youtube-dl'), r'J@FEF36\5=')
+        self.assertEqual(rot47('YOUTUBE-DL'), r'*~&%&qt\s{')
+
      def test_urshift(self):
          self.assertEqual(urshift(3, 1), 1)
          self.assertEqual(urshift(-3, 1), 2147483646)
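The new `test_caesar` and `test_rot47` cases pin down the expected behaviour of the two helpers: characters found in the alphabet are shifted cyclically, and everything else passes through unchanged. The sketch below is one way to satisfy those assertions. It is written from the tests alone and is not claimed to be the code that ships in `youtube_dl/utils.py`; in particular, building the ROT47 alphabet from the ASCII range 33 to 126 is an assumption.

```python
def caesar(s, alphabet, shift):
    """Shift every character of s that occurs in alphabet by shift positions.

    The shift wraps around the alphabet; characters that are not in the
    alphabet are returned unchanged.
    """
    if shift == 0:
        return s
    size = len(alphabet)
    return ''.join(
        alphabet[(alphabet.index(c) + shift) % size] if c in alphabet else c
        for c in s)


def rot47(s):
    # Assumption: ROT47 over the 94 printable ASCII characters '!' to '~'.
    printable = ''.join(chr(c) for c in range(33, 127))
    return caesar(s, printable, 47)


print(caesar('ace', 'abcdef', 2))  # cea
print(rot47('youtube-dl'))         # J@FEF36\5=
```

Because the alphabet has 94 characters and the shift is 47, applying `rot47` twice returns the original string.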
diff --git a/youtube-dl b/youtube-dl
index c2e66c3ac5f06b4ee41061572c563b83ea233075..748d6a0deb8f348b9d682f478fb36db4c0515ebc 100755
Binary files a/youtube-dl and b/youtube-dl differ
diff --git a/youtube-dl.1 b/youtube-dl.1
index fa17c311357571462837bc8e49309826d48251c2..251810f21fd7545613c8ce9ef0ade45afdbff652 100644
--- a/youtube-dl.1
+++ b/youtube-dl.1
@@ -1550,8 +1550,8 @@ instructions (https://ytdl-org.github.io/youtube-dl/download.html):
  .IP
  .nf
  \f[C]
-sudo\ wget\ https://yt\-dl.org/latest/youtube\-dl\ \-O\ /usr/local/bin/youtube\-dl
-sudo\ chmod\ a+x\ /usr/local/bin/youtube\-dl
+sudo\ wget\ https://yt\-dl.org/downloads/latest/youtube\-dl\ \-O\ /usr/local/bin/youtube\-dl
+sudo\ chmod\ a+rx\ /usr/local/bin/youtube\-dl
  hash\ \-r
  \f[]
  .fi
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py
index c3d1407f99ef76ad55ff36fbfae6649fa406d998..f5cb46308198e4c65316fba10ccf30d2f3e14b6a 100755
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -1814,7 +1814,7 @@ class YoutubeDL(object):
              ie = self.get_info_extractor(info_dict['extractor_key'])
              for sub_lang, sub_info in subtitles.items():
                  sub_format = sub_info['ext']
-                sub_filename = subtitles_filename(filename, sub_lang, sub_format)
+                sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext'))
                  if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
                      self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
                  else:
diff --git a/youtube_dl/downloader/hls.py b/youtube_dl/downloader/hls.py
index b59aad73f9fc0a798548585bd2b2ef120e867d76..84bc349288052fe15e92be61e72645db007003d0 100644
--- a/youtube_dl/downloader/hls.py
+++ b/youtube_dl/downloader/hls.py
@@ -64,7 +64,7 @@ class HlsFD(FragmentFD):
          s = urlh.read().decode('utf-8', 'ignore')
  
          if not self.can_download(s, info_dict):
-            if info_dict.get('extra_param_to_segment_url'):
+            if info_dict.get('extra_param_to_segment_url') or info_dict.get('_decryption_key_url'):
                  self.report_error('pycrypto not found. Please install it.')
                  return False
              self.report_warning(
@@ -169,7 +169,7 @@ class HlsFD(FragmentFD):
                      if decrypt_info['METHOD'] == 'AES-128':
                          iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
                          decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
-                            self._prepare_url(info_dict, decrypt_info['URI'])).read()
+                            self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
                          frag_content = AES.new(
                              decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
                      self._append_fragment(ctx, frag_content)
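Note on the `hls.py` hunks above: the second hunk lets an extractor override the key location given in the playlist by setting `_decryption_key_url`, while the default IV is still the media sequence number packed into 16 bytes. A minimal standalone sketch of those two choices follows. It uses `struct.pack` directly, on the assumption that `compat_struct_pack` is a thin compatibility wrapper around it; `pick_key_url` and `default_iv` are hypothetical helper names, not part of youtube-dl.

```python
import struct


def pick_key_url(info_dict, decrypt_info):
    # Prefer the extractor-supplied override; fall back to the URI
    # attribute taken from the playlist's key tag.
    return info_dict.get('_decryption_key_url') or decrypt_info['URI']


def default_iv(media_sequence):
    # '>8xq': 8 zero pad bytes followed by a big-endian signed 64-bit
    # integer, 16 bytes in total (the AES block size).
    return struct.pack('>8xq', media_sequence)
```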
diff --git a/youtube_dl/extractor/abcotvs.py b/youtube_dl/extractor/abcotvs.py
index 03b92a39c785453123a4c5fbc5e09c98c7128634..0bc69a64f699604bdb6c538da1737f974626da3e 100644
--- a/youtube_dl/extractor/abcotvs.py
+++ b/youtube_dl/extractor/abcotvs.py
@@ -4,29 +4,30 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
+    dict_get,
      int_or_none,
-    parse_iso8601,
+    try_get,
  )
  
  
  class ABCOTVSIE(InfoExtractor):
      IE_NAME = 'abcotvs'
      IE_DESC = 'ABC Owned Television Stations'
-    _VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?P<site>abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:(?:/[^/]+)*/(?P<display_id>[^/]+))?/(?P<id>\d+)'
      _TESTS = [
          {
              'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
              'info_dict': {
-                'id': '472581',
+                'id': '472548',
                  'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
                  'ext': 'mp4',
-                'title': 'East Bay museum celebrates vintage synthesizers',
+                'title': 'East Bay museum celebrates synthesized music',
                  'description': 'md5:24ed2bd527096ec2a5c67b9d5a9005f3',
                  'thumbnail': r're:^https?://.*\.jpg$',
-                'timestamp': 1421123075,
+                'timestamp': 1421118520,
                  'upload_date': '20150113',
-                'uploader': 'Jonathan Bloom',
              },
              'params': {
                  # m3u8 download
@@ -37,39 +38,63 @@ class ABCOTVSIE(InfoExtractor):
              'url': 'http://abc7news.com/472581',
              'only_matching': True,
          },
+        {
+            'url': 'https://6abc.com/man-75-killed-after-being-struck-by-vehicle-in-chester/5725182/',
+            'only_matching': True,
+        },
      ]
+    _SITE_MAP = {
+        '6abc': 'wpvi',
+        'abc11': 'wtvd',
+        'abc13': 'ktrk',
+        'abc30': 'kfsn',
+        'abc7': 'kabc',
+        'abc7chicago': 'wls',
+        'abc7news': 'kgo',
+        'abc7ny': 'wabc',
+    }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id') or video_id
-
-        webpage = self._download_webpage(url, display_id)
+        site, display_id, video_id = re.match(self._VALID_URL, url).groups()
+        display_id = display_id or video_id
+        station = self._SITE_MAP[site]
  
-        m3u8 = self._html_search_meta(
-            'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
+        data = self._download_json(
+            'https://api.abcotvs.com/v2/content', display_id, query={
+                'id': video_id,
+                'key': 'otv.web.%s.story' % station,
+                'station': station,
+            })['data']
+        video = try_get(data, lambda x: x['featuredMedia']['video'], dict) or data
+        video_id = compat_str(dict_get(video, ('id', 'publishedKey'), video_id))
+        title = video.get('title') or video['linkText']
  
-        formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
+        formats = []
+        m3u8_url = video.get('m3u8')
+        if m3u8_url:
+            formats = self._extract_m3u8_formats(
+                video['m3u8'].split('?')[0], display_id, 'mp4', m3u8_id='hls', fatal=False)
+        mp4_url = video.get('mp4')
+        if mp4_url:
+            formats.append({
+                'abr': 128,
+                'format_id': 'https',
+                'height': 360,
+                'url': mp4_url,
+                'width': 640,
+            })
          self._sort_formats(formats)
  
-        title = self._og_search_title(webpage).strip()
-        description = self._og_search_description(webpage).strip()
-        thumbnail = self._og_search_thumbnail(webpage)
-        timestamp = parse_iso8601(self._search_regex(
-            r'<div class="meta">\s*<time class="timeago" datetime="([^"]+)">',
-            webpage, 'upload date', fatal=False))
-        uploader = self._search_regex(
-            r'rel="author">([^<]+)</a>',
-            webpage, 'uploader', default=None)
+        image = video.get('image') or {}
  
          return {
              'id': video_id,
              'display_id': display_id,
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'uploader': uploader,
+            'description': dict_get(video, ('description', 'caption'), try_get(video, lambda x: x['meta']['description'])),
+            'thumbnail': dict_get(image, ('source', 'dynamicSource')),
+            'timestamp': int_or_none(video.get('date')),
+            'duration': int_or_none(video.get('length')),
              'formats': formats,
          }
  
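Note on the `abcotvs.py` hunks above: the reworked `_VALID_URL` captures the site name, tolerates any number of path segments before the display id, and feeds the site through `_SITE_MAP` to get the station code used in the API query. The sketch below replays that matching step outside the extractor. The pattern and the map are copied from the diff; `split_url` is a hypothetical helper name used only for illustration.

```python
import re

VALID_URL = (
    r'https?://(?P<site>abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com'
    r'(?:(?:/[^/]+)*/(?P<display_id>[^/]+))?/(?P<id>\d+)')

SITE_MAP = {
    '6abc': 'wpvi',
    'abc11': 'wtvd',
    'abc13': 'ktrk',
    'abc30': 'kfsn',
    'abc7': 'kabc',
    'abc7chicago': 'wls',
    'abc7news': 'kgo',
    'abc7ny': 'wabc',
}


def split_url(url):
    # groups() yields the captures in pattern order: site, display_id, id.
    site, display_id, video_id = re.match(VALID_URL, url).groups()
    # display_id is None for bare URLs such as http://abc7news.com/472581.
    return SITE_MAP[site], display_id or video_id, video_id
```

Every site the pattern can match has an entry in the map, so the dictionary lookup cannot fail once the URL has matched.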
diff --git a/youtube_dl/extractor/addanime.py b/youtube_dl/extractor/addanime.py
deleted file mode 100644
index 5e7c072..0000000
--- a/youtube_dl/extractor/addanime.py
+++ /dev/null
@@ -1,95 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_HTTPError,
-    compat_str,
-    compat_urllib_parse_urlencode,
-    compat_urllib_parse_urlparse,
-)
-from ..utils import (
-    ExtractorError,
-    qualities,
-)
-
-
-class AddAnimeIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
-    _TESTS = [{
-        'url': 'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
-        'md5': '72954ea10bc979ab5e2eb288b21425a0',
-        'info_dict': {
-            'id': '24MR3YO5SAS9',
-            'ext': 'mp4',
-            'description': 'One Piece 606',
-            'title': 'One Piece 606',
-        },
-        'skip': 'Video is gone',
-    }, {
-        'url': 'http://add-anime.net/video/MDUGWYKNGBD8/One-Piece-687',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        try:
-            webpage = self._download_webpage(url, video_id)
-        except ExtractorError as ee:
-            if not isinstance(ee.cause, compat_HTTPError) or \
-               ee.cause.code != 503:
-                raise
-
-            redir_webpage = ee.cause.read().decode('utf-8')
-            action = self._search_regex(
-                r'<form id="challenge-form" action="([^"]+)"',
-                redir_webpage, 'Redirect form')
-            vc = self._search_regex(
-                r'<input type="hidden" name="jschl_vc" value="([^"]+)"/>',
-                redir_webpage, 'redirect vc value')
-            av = re.search(
-                r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);',
-                redir_webpage)
-            if av is None:
-                raise ExtractorError('Cannot find redirect math task')
-            av_res = int(av.group(1)) + int(av.group(2)) * int(av.group(3))
-
-            parsed_url = compat_urllib_parse_urlparse(url)
-            av_val = av_res + len(parsed_url.netloc)
-            confirm_url = (
-                parsed_url.scheme + '://' + parsed_url.netloc
-                + action + '?'
-                + compat_urllib_parse_urlencode({
-                    'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
-            self._download_webpage(
-                confirm_url, video_id,
-                note='Confirming after redirect')
-            webpage = self._download_webpage(url, video_id)
-
-        FORMATS = ('normal', 'hq')
-        quality = qualities(FORMATS)
-        formats = []
-        for format_id in FORMATS:
-            rex = r"var %s_video_file = '(.*?)';" % re.escape(format_id)
-            video_url = self._search_regex(rex, webpage, 'video file URLx',
-                                           fatal=False)
-            if not video_url:
-                continue
-            formats.append({
-                'format_id': format_id,
-                'url': video_url,
-                'quality': quality(format_id),
-            })
-        self._sort_formats(formats)
-        video_title = self._og_search_title(webpage)
-        video_description = self._og_search_description(webpage)
-
-        return {
-            '_type': 'video',
-            'id': video_id,
-            'formats': formats,
-            'title': video_title,
-            'description': video_description
-        }
diff --git a/youtube_dl/extractor/adobetv.py b/youtube_dl/extractor/adobetv.py
index 008c98e51ead3ffcad7bb350fcf928a945b91e35..80060f03722c241231d3ed468539d81b9ac9c93e 100644
--- a/youtube_dl/extractor/adobetv.py
+++ b/youtube_dl/extractor/adobetv.py
@@ -1,25 +1,119 @@
  from __future__ import unicode_literals
  
+import functools
  import re
  
  from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
-    parse_duration,
-    unified_strdate,
-    str_to_int,
-    int_or_none,
      float_or_none,
+    int_or_none,
      ISO639Utils,
-    determine_ext,
+    OnDemandPagedList,
+    parse_duration,
+    str_or_none,
+    str_to_int,
+    unified_strdate,
  )
  
  
  class AdobeTVBaseIE(InfoExtractor):
-    _API_BASE_URL = 'http://tv.adobe.com/api/v4/'
+    def _call_api(self, path, video_id, query, note=None):
+        return self._download_json(
+            'http://tv.adobe.com/api/v4/' + path,
+            video_id, note, query=query)['data']
+
+    def _parse_subtitles(self, video_data, url_key):
+        subtitles = {}
+        for translation in video_data.get('translations', []):
+            vtt_path = translation.get(url_key)
+            if not vtt_path:
+                continue
+            lang = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
+            subtitles.setdefault(lang, []).append({
+                'ext': 'vtt',
+                'url': vtt_path,
+            })
+        return subtitles
+
+    def _parse_video_data(self, video_data):
+        video_id = compat_str(video_data['id'])
+        title = video_data['title']
+
+        s3_extracted = False
+        formats = []
+        for source in video_data.get('videos', []):
+            source_url = source.get('url')
+            if not source_url:
+                continue
+            f = {
+                'format_id': source.get('quality_level'),
+                'fps': int_or_none(source.get('frame_rate')),
+                'height': int_or_none(source.get('height')),
+                'tbr': int_or_none(source.get('video_data_rate')),
+                'width': int_or_none(source.get('width')),
+                'url': source_url,
+            }
+            original_filename = source.get('original_filename')
+            if original_filename:
+                if not (f.get('height') and f.get('width')):
+                    mobj = re.search(r'_(\d+)x(\d+)', original_filename)
+                    if mobj:
+                        f.update({
+                            'height': int(mobj.group(2)),
+                            'width': int(mobj.group(1)),
+                        })
+                if original_filename.startswith('s3://') and not s3_extracted:
+                    formats.append({
+                        'format_id': 'original',
+                        'preference': 1,
+                        'url': original_filename.replace('s3://', 'https://s3.amazonaws.com/'),
+                    })
+                    s3_extracted = True
+            formats.append(f)
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'thumbnail': video_data.get('thumbnail'),
+            'upload_date': unified_strdate(video_data.get('start_date')),
+            'duration': parse_duration(video_data.get('duration')),
+            'view_count': str_to_int(video_data.get('playcount')),
+            'formats': formats,
+            'subtitles': self._parse_subtitles(video_data, 'vtt'),
+        }
+
+
+class AdobeTVEmbedIE(AdobeTVBaseIE):
+    IE_NAME = 'adobetv:embed'
+    _VALID_URL = r'https?://tv\.adobe\.com/embed/\d+/(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://tv.adobe.com/embed/22/4153',
+        'md5': 'c8c0461bf04d54574fc2b4d07ac6783a',
+        'info_dict': {
+            'id': '4153',
+            'ext': 'flv',
+            'title': 'Creating Graphics Optimized for BlackBerry',
+            'description': 'md5:eac6e8dced38bdaae51cd94447927459',
+            'thumbnail': r're:https?://.*\.jpg$',
+            'upload_date': '20091109',
+            'duration': 377,
+            'view_count': int,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video_data = self._call_api(
+            'episode/' + video_id, video_id, {'disclosure': 'standard'})[0]
+        return self._parse_video_data(video_data)
  
  
  class AdobeTVIE(AdobeTVBaseIE):
+    IE_NAME = 'adobetv'
      _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?watch/(?P<show_urlname>[^/]+)/(?P<id>[^/]+)'
  
      _TEST = {
@@ -42,45 +136,33 @@ class AdobeTVIE(AdobeTVBaseIE):
          if not language:
              language = 'en'
  
-        video_data = self._download_json(
-            self._API_BASE_URL + 'episode/get/?language=%s&show_urlname=%s&urlname=%s&disclosure=standard' % (language, show_urlname, urlname),
-            urlname)['data'][0]
-
-        formats = [{
-            'url': source['url'],
-            'format_id': source.get('quality_level') or source['url'].split('-')[-1].split('.')[0] or None,
-            'width': int_or_none(source.get('width')),
-            'height': int_or_none(source.get('height')),
-            'tbr': int_or_none(source.get('video_data_rate')),
-        } for source in video_data['videos']]
-        self._sort_formats(formats)
-
-        return {
-            'id': compat_str(video_data['id']),
-            'title': video_data['title'],
-            'description': video_data.get('description'),
-            'thumbnail': video_data.get('thumbnail'),
-            'upload_date': unified_strdate(video_data.get('start_date')),
-            'duration': parse_duration(video_data.get('duration')),
-            'view_count': str_to_int(video_data.get('playcount')),
-            'formats': formats,
-        }
+        video_data = self._call_api(
+            'episode/get', urlname, {
+                'disclosure': 'standard',
+                'language': language,
+                'show_urlname': show_urlname,
+                'urlname': urlname,
+            })[0]
+        return self._parse_video_data(video_data)
  
  
  class AdobeTVPlaylistBaseIE(AdobeTVBaseIE):
-    def _parse_page_data(self, page_data):
-        return [self.url_result(self._get_element_url(element_data)) for element_data in page_data]
+    _PAGE_SIZE = 25
+
+    def _fetch_page(self, display_id, query, page):
+        page += 1
+        query['page'] = page
+        for element_data in self._call_api(
+                self._RESOURCE, display_id, query, 'Download Page %d' % page):
+            yield self._process_data(element_data)
  
-    def _extract_playlist_entries(self, url, display_id):
-        page = self._download_json(url, display_id)
-        entries = self._parse_page_data(page['data'])
-        for page_num in range(2, page['paging']['pages'] + 1):
-            entries.extend(self._parse_page_data(
-                self._download_json(url + '&page=%d' % page_num, display_id)['data']))
-        return entries
+    def _extract_playlist_entries(self, display_id, query):
+        return OnDemandPagedList(functools.partial(
+            self._fetch_page, display_id, query), self._PAGE_SIZE)
  
  
  class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
+    IE_NAME = 'adobetv:show'
      _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?show/(?P<id>[^/]+)'
  
      _TEST = {
@@ -92,26 +174,31 @@ class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
          },
          'playlist_mincount': 136,
      }
-
-    def _get_element_url(self, element_data):
-        return element_data['urls'][0]
+    _RESOURCE = 'episode'
+    _process_data = AdobeTVBaseIE._parse_video_data
  
      def _real_extract(self, url):
          language, show_urlname = re.match(self._VALID_URL, url).groups()
          if not language:
              language = 'en'
-        query = 'language=%s&show_urlname=%s' % (language, show_urlname)
+        query = {
+            'disclosure': 'standard',
+            'language': language,
+            'show_urlname': show_urlname,
+        }
  
-        show_data = self._download_json(self._API_BASE_URL + 'show/get/?%s' % query, show_urlname)['data'][0]
+        show_data = self._call_api(
+            'show/get', show_urlname, query)[0]
  
          return self.playlist_result(
-            self._extract_playlist_entries(self._API_BASE_URL + 'episode/?%s' % query, show_urlname),
-            compat_str(show_data['id']),
-            show_data['show_name'],
-            show_data['show_description'])
+            self._extract_playlist_entries(show_urlname, query),
+            str_or_none(show_data.get('id')),
+            show_data.get('show_name'),
+            show_data.get('show_description'))
  
  
  class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
+    IE_NAME = 'adobetv:channel'
      _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?channel/(?P<id>[^/]+)(?:/(?P<category_urlname>[^/]+))?'
  
      _TEST = {
@@ -121,24 +208,30 @@ class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
          },
          'playlist_mincount': 96,
      }
+    _RESOURCE = 'show'
  
-    def _get_element_url(self, element_data):
-        return element_data['url']
+    def _process_data(self, show_data):
+        return self.url_result(
+            show_data['url'], 'AdobeTVShow', str_or_none(show_data.get('id')))
  
      def _real_extract(self, url):
          language, channel_urlname, category_urlname = re.match(self._VALID_URL, url).groups()
          if not language:
              language = 'en'
-        query = 'language=%s&channel_urlname=%s' % (language, channel_urlname)
+        query = {
+            'channel_urlname': channel_urlname,
+            'language': language,
+        }
          if category_urlname:
-            query += '&category_urlname=%s' % category_urlname
+            query['category_urlname'] = category_urlname
  
          return self.playlist_result(
-            self._extract_playlist_entries(self._API_BASE_URL + 'show/?%s' % query, channel_urlname),
+            self._extract_playlist_entries(channel_urlname, query),
              channel_urlname)
  
  
-class AdobeTVVideoIE(InfoExtractor):
+class AdobeTVVideoIE(AdobeTVBaseIE):
+    IE_NAME = 'adobetv:video'
      _VALID_URL = r'https?://video\.tv\.adobe\.com/v/(?P<id>\d+)'
  
      _TEST = {
@@ -160,38 +253,36 @@ class AdobeTVVideoIE(InfoExtractor):
  
          video_data = self._parse_json(self._search_regex(
              r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
-
-        formats = [{
-            'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
-            'url': source['src'],
-            'width': int_or_none(source.get('width')),
-            'height': int_or_none(source.get('height')),
-            'tbr': int_or_none(source.get('bitrate')),
-        } for source in video_data['sources']]
+        title = video_data['title']
+
+        formats = []
+        sources = video_data.get('sources') or []
+        for source in sources:
+            source_src = source.get('src')
+            if not source_src:
+                continue
+            formats.append({
+                'filesize': int_or_none(source.get('kilobytes') or None, invscale=1000),
+                'format_id': '-'.join(filter(None, [source.get('format'), source.get('label')])),
+                'height': int_or_none(source.get('height') or None),
+                'tbr': int_or_none(source.get('bitrate') or None),
+                'width': int_or_none(source.get('width') or None),
+                'url': source_src,
+            })
          self._sort_formats(formats)
  
          # For both metadata and downloaded files the duration varies among
          # formats. I just pick the max one
          duration = max(filter(None, [
              float_or_none(source.get('duration'), scale=1000)
-            for source in video_data['sources']]))
-
-        subtitles = {}
-        for translation in video_data.get('translations', []):
-            lang_id = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
-            if lang_id not in subtitles:
-                subtitles[lang_id] = []
-            subtitles[lang_id].append({
-                'url': translation['vttPath'],
-                'ext': 'vtt',
-            })
+            for source in sources]))
  
          return {
              'id': video_id,
              'formats': formats,
-            'title': video_data['title'],
+            'title': title,
              'description': video_data.get('description'),
-            'thumbnail': video_data['video'].get('poster'),
+            'thumbnail': video_data.get('video', {}).get('poster'),
              'duration': duration,
-            'subtitles': subtitles,
+            'subtitles': self._parse_subtitles(video_data, 'vttPath'),
          }
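Note on the `adobetv.py` playlist hunks above: the eager loop over `page['paging']['pages']` is replaced by `OnDemandPagedList`, which is handed a page function built with `functools.partial`. The `page += 1` in `_fetch_page` suggests the paged list counts pages from zero while the API counts from one. The sketch below imitates that arrangement with a plain generator. `iter_entries` is a simplified stand-in for `OnDemandPagedList`, not its real implementation, and it assumes a page shorter than the page size marks the end of the list; `call_api` is a hypothetical callable standing in for `_call_api`.

```python
import functools

PAGE_SIZE = 25


def fetch_page(call_api, query, page):
    # The caller counts pages from 0; the API is assumed to count from 1.
    page += 1
    # Copy the query rather than mutating it (the diff mutates in place).
    paged_query = dict(query, page=page)
    for element in call_api(paged_query):
        yield element


def iter_entries(pagefunc, page_size):
    # Simplified stand-in for OnDemandPagedList: request pages lazily and
    # stop after the first page that comes back short.
    page = 0
    while True:
        results = list(pagefunc(page))
        for result in results:
            yield result
        if len(results) < page_size:
            break
        page += 1
```

A caller would build the page function with `functools.partial(fetch_page, call_api, query)` and pass it to `iter_entries` together with `PAGE_SIZE`, so later pages are requested only when the consumer iterates that far.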
diff --git a/youtube_dl/extractor/americastestkitchen.py b/youtube_dl/extractor/americastestkitchen.py
index 8b32aa886e9696e9334f73a777a70264f28c9433..9c9d77ae107e0b822b46368d89445f21e9e830a6 100644
--- a/youtube_dl/extractor/americastestkitchen.py
+++ b/youtube_dl/extractor/americastestkitchen.py
@@ -5,6 +5,7 @@ from .common import InfoExtractor
  from ..utils import (
      clean_html,
      int_or_none,
+    js_to_json,
      try_get,
      unified_strdate,
  )
@@ -13,22 +14,21 @@ from ..utils import (
  class AmericasTestKitchenIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
      _TESTS = [{
-        'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
+        'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
          'md5': 'b861c3e365ac38ad319cfd509c30577f',
          'info_dict': {
-            'id': '1_5g5zua6e',
-            'title': 'Summer Dinner Party',
+            'id': '5b400b9ee338f922cb06450c',
+            'title': 'Weeknight Japanese Suppers',
              'ext': 'mp4',
-            'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
-            'thumbnail': r're:^https?://.*\.jpg',
-            'timestamp': 1497285541,
-            'upload_date': '20170612',
-            'uploader_id': 'roger.metcalf@americastestkitchen.com',
-            'release_date': '20170617',
+            'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
+            'thumbnail': r're:^https?://',
+            'timestamp': 1523664000,
+            'upload_date': '20180414',
+            'release_date': '20180414',
              'series': "America's Test Kitchen",
-            'season_number': 17,
-            'episode': 'Summer Dinner Party',
-            'episode_number': 24,
+            'season_number': 18,
+            'episode': 'Weeknight Japanese Suppers',
+            'episode_number': 15,
          },
          'params': {
              'skip_download': True,
@@ -47,7 +47,7 @@ class AmericasTestKitchenIE(InfoExtractor):
              self._search_regex(
                  r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
                  webpage, 'initial context'),
-            video_id)
+            video_id, js_to_json)
  
          ep_data = try_get(
              video_data,
@@ -55,17 +55,7 @@ class AmericasTestKitchenIE(InfoExtractor):
               lambda x: x['videoDetail']['content']['data']), dict)
          ep_meta = ep_data.get('full_video', {})
  
-        zype_id = ep_meta.get('zype_id')
-        if zype_id:
-            embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
-            ie_key = 'Zype'
-        else:
-            partner_id = self._search_regex(
-                r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
-                webpage, 'kaltura partner id')
-            external_id = ep_data.get('external_id') or ep_meta['external_id']
-            embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
-            ie_key = 'Kaltura'
+        zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
  
          title = ep_data.get('title') or ep_meta.get('title')
          description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@@ -79,8 +69,8 @@ class AmericasTestKitchenIE(InfoExtractor):
  
          return {
              '_type': 'url_transparent',
-            'url': embed_url,
-            'ie_key': ie_key,
+            'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
+            'ie_key': 'Zype',
              'title': title,
              'description': description,
              'thumbnail': thumbnail,
diff --git a/youtube_dl/extractor/ard.py b/youtube_dl/extractor/ard.py
index 8adae46449232fe487ec6796b5b60345f144d310..2f47e21c324102d73cf277accc0b30ac41ddfab6 100644
--- a/youtube_dl/extractor/ard.py
+++ b/youtube_dl/extractor/ard.py
@@ -1,6 +1,7 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import json
  import re
  
  from .common import InfoExtractor
@@ -22,66 +23,28 @@ from ..utils import (
  from ..compat import compat_etree_fromstring
  
  
-class ARDMediathekIE(InfoExtractor):
-    IE_NAME = 'ARD:mediathek'
-    _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
-
-    _TESTS = [{
-        # available till 26.07.2022
-        'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
-        'info_dict': {
-            'id': '44726822',
-            'ext': 'mp4',
-            'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
-            'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
-            'duration': 1740,
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        }
-    }, {
-        'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
-        'only_matching': True,
-    }, {
-        # audio
-        'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
-        'only_matching': True,
-    }, {
-        'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
-        'only_matching': True,
-    }, {
-        # audio
-        'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
-        'only_matching': True,
-    }, {
-        'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
-        'only_matching': True,
-    }]
-
-    @classmethod
-    def suitable(cls, url):
-        return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
+class ARDMediathekBaseIE(InfoExtractor):
+    _GEO_COUNTRIES = ['DE']
  
      def _extract_media_info(self, media_info_url, webpage, video_id):
          media_info = self._download_json(
              media_info_url, video_id, 'Downloading media JSON')
+        return self._parse_media_info(media_info, video_id, '"fsk"' in webpage)
  
+    def _parse_media_info(self, media_info, video_id, fsk):
          formats = self._extract_formats(media_info, video_id)
  
          if not formats:
-            if '"fsk"' in webpage:
+            if fsk:
                  raise ExtractorError(
                      'This video is only available after 20:00', expected=True)
              elif media_info.get('_geoblocked'):
-                raise ExtractorError('This video is not available due to geo restriction', expected=True)
+                self.raise_geo_restricted(
+                    'This video is not available due to geoblocking',
+                    countries=self._GEO_COUNTRIES)
  
          self._sort_formats(formats)
  
-        duration = int_or_none(media_info.get('_duration'))
-        thumbnail = media_info.get('_previewImage')
-        is_live = media_info.get('_isLive') is True
-
          subtitles = {}
          subtitle_url = media_info.get('_subtitleUrl')
          if subtitle_url:
@@ -92,9 +55,9 @@ class ARDMediathekIE(InfoExtractor):
  
          return {
              'id': video_id,
-            'duration': duration,
-            'thumbnail': thumbnail,
-            'is_live': is_live,
+            'duration': int_or_none(media_info.get('_duration')),
+            'thumbnail': media_info.get('_previewImage'),
+            'is_live': media_info.get('_isLive') is True,
              'formats': formats,
              'subtitles': subtitles,
          }
@@ -123,11 +86,11 @@ class ARDMediathekIE(InfoExtractor):
                              update_url_query(stream_url, {
                                  'hdcore': '3.1.1',
                                  'plugin': 'aasp-3.1.1.69.124'
-                            }),
-                            video_id, f4m_id='hds', fatal=False))
+                            }), video_id, f4m_id='hds', fatal=False))
                      elif ext == 'm3u8':
                          formats.extend(self._extract_m3u8_formats(
-                            stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                            stream_url, video_id, 'mp4', 'm3u8_native',
+                            m3u8_id='hls', fatal=False))
                      else:
                          if server and server.startswith('rtmp'):
                              f = {
@@ -140,7 +103,9 @@ class ARDMediathekIE(InfoExtractor):
                                  'url': stream_url,
                                  'format_id': 'a%s-%s-%s' % (num, ext, quality)
                              }
-                        m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
+                        m = re.search(
+                            r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$',
+                            stream_url)
                          if m:
                              f.update({
                                  'width': int(m.group('width')),
@@ -151,6 +116,48 @@ class ARDMediathekIE(InfoExtractor):
                          formats.append(f)
          return formats
  
+
+class ARDMediathekIE(ARDMediathekBaseIE):
+    IE_NAME = 'ARD:mediathek'
+    _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
+
+    _TESTS = [{
+        # available till 26.07.2022
+        'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
+        'info_dict': {
+            'id': '44726822',
+            'ext': 'mp4',
+            'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
+            'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
+            'duration': 1740,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
+        'only_matching': True,
+    }, {
+        # audio
+        'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
+        'only_matching': True,
+    }, {
+        'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
+        'only_matching': True,
+    }, {
+        # audio
+        'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
+        'only_matching': True,
+    }, {
+        'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
+
      def _real_extract(self, url):
          # determine video id from url
          m = re.match(self._VALID_URL, url)
@@ -302,19 +309,20 @@ class ARDIE(InfoExtractor):
          }
  
  
-class ARDBetaMediathekIE(InfoExtractor):
-    _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
+class ARDBetaMediathekIE(ARDMediathekBaseIE):
+    _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/(?P<client>[^/]+)/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
      _TESTS = [{
          'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
-        'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
+        'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
          'info_dict': {
              'display_id': 'die-robuste-roswita',
-            'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
-            'title': 'Tatort: Die robuste Roswita',
+            'id': '70153354',
+            'title': 'Die robuste Roswita',
              'description': r're:^Der Mord.*trüber ist als die Ilm.',
              'duration': 5316,
-            'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
-            'upload_date': '20180826',
+            'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
+            'timestamp': 1577047500,
+            'upload_date': '20191222',
              'ext': 'mp4',
          },
      }, {
@@ -330,71 +338,69 @@ class ARDBetaMediathekIE(InfoExtractor):
          video_id = mobj.group('video_id')
          display_id = mobj.group('display_id') or video_id
  
-        webpage = self._download_webpage(url, display_id)
-        data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
-        data = self._parse_json(data_json, display_id)
-
-        res = {
-            'id': video_id,
-            'display_id': display_id,
+        player_page = self._download_json(
+            'https://api.ardmediathek.de/public-gateway',
+            display_id, data=json.dumps({
+                'query': '''{
+  playerPage(client:"%s", clipId: "%s") {
+    blockedByFsk
+    broadcastedOn
+    maturityContentRating
+    mediaCollection {
+      _duration
+      _geoblocked
+      _isLive
+      _mediaArray {
+        _mediaStreamArray {
+          _quality
+          _server
+          _stream
          }
-        formats = []
-        subtitles = {}
-        geoblocked = False
-        for widget in data.values():
-            if widget.get('_geoblocked') is True:
-                geoblocked = True
-            if '_duration' in widget:
-                res['duration'] = int_or_none(widget['_duration'])
-            if 'clipTitle' in widget:
-                res['title'] = widget['clipTitle']
-            if '_previewImage' in widget:
-                res['thumbnail'] = widget['_previewImage']
-            if 'broadcastedOn' in widget:
-                res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
-            if 'synopsis' in widget:
-                res['description'] = widget['synopsis']
-            subtitle_url = url_or_none(widget.get('_subtitleUrl'))
-            if subtitle_url:
-                subtitles.setdefault('de', []).append({
-                    'ext': 'ttml',
-                    'url': subtitle_url,
-                })
-            if '_quality' in widget:
-                format_url = url_or_none(try_get(
-                    widget, lambda x: x['_stream']['json'][0]))
-                if not format_url:
-                    continue
-                ext = determine_ext(format_url)
-                if ext == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
-                        format_url + '?hdcore=3.11.0',
-                        video_id, f4m_id='hds', fatal=False))
-                elif ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        format_url, video_id, 'mp4', m3u8_id='hls',
-                        fatal=False))
-                else:
-                    # HTTP formats are not available when geoblocked is True,
-                    # other formats are fine though
-                    if geoblocked:
-                        continue
-                    quality = str_or_none(widget.get('_quality'))
-                    formats.append({
-                        'format_id': ('http-' + quality) if quality else 'http',
-                        'url': format_url,
-                        'preference': 10,  # Plain HTTP, that's nice
-                    })
-
-        if not formats and geoblocked:
-            self.raise_geo_restricted(
-                msg='This video is not available due to geoblocking',
-                countries=['DE'])
-
-        self._sort_formats(formats)
-        res.update({
-            'subtitles': subtitles,
-            'formats': formats,
+      }
+      _previewImage
+      _subtitleUrl
+      _type
+    }
+    show {
+      title
+    }
+    synopsis
+    title
+    tracking {
+      atiCustomVars {
+        contentId
+      }
+    }
+  }
+}''' % (mobj.group('client'), video_id),
+            }).encode(), headers={
+                'Content-Type': 'application/json'
+            })['data']['playerPage']
+        title = player_page['title']
+        content_id = str_or_none(try_get(
+            player_page, lambda x: x['tracking']['atiCustomVars']['contentId']))
+        media_collection = player_page.get('mediaCollection') or {}
+        if not media_collection and content_id:
+            media_collection = self._download_json(
+                'https://www.ardmediathek.de/play/media/' + content_id,
+                content_id, fatal=False) or {}
+        info = self._parse_media_info(
+            media_collection, content_id or video_id,
+            player_page.get('blockedByFsk'))
+        age_limit = None
+        description = player_page.get('synopsis')
+        maturity_content_rating = player_page.get('maturityContentRating')
+        if maturity_content_rating:
+            age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
+        if not age_limit and description:
+            age_limit = int_or_none(self._search_regex(
+                r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
+        info.update({
+            'age_limit': age_limit,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
+            'series': try_get(player_page, lambda x: x['show']['title']),
          })
-
-        return res
+        return info
diff --git a/youtube_dl/extractor/atresplayer.py b/youtube_dl/extractor/atresplayer.py
index ae1c094277251b6bfe26298d3c8dd8dfd741a905..c2cec984525fe9d318b4a615fefd01e644ce788a 100644
--- a/youtube_dl/extractor/atresplayer.py
+++ b/youtube_dl/extractor/atresplayer.py
@@ -1,202 +1,118 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
-import time
-import hmac
-import hashlib
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import compat_HTTPError
  from ..utils import (
      ExtractorError,
-    float_or_none,
      int_or_none,
-    sanitized_Request,
      urlencode_postdata,
-    xpath_text,
  )
  
  
  class AtresPlayerIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?atresplayer\.com/television/[^/]+/[^/]+/[^/]+/(?P<id>.+?)_\d+\.html'
+    _VALID_URL = r'https?://(?:www\.)?atresplayer\.com/[^/]+/[^/]+/[^/]+/[^/]+/(?P<display_id>.+?)_(?P<id>[0-9a-f]{24})'
      _NETRC_MACHINE = 'atresplayer'
      _TESTS = [
          {
-            'url': 'http://www.atresplayer.com/television/programas/el-club-de-la-comedia/temporada-4/capitulo-10-especial-solidario-nochebuena_2014122100174.html',
-            'md5': 'efd56753cda1bb64df52a3074f62e38a',
+            'url': 'https://www.atresplayer.com/antena3/series/pequenas-coincidencias/temporada-1/capitulo-7-asuntos-pendientes_5d4aa2c57ed1a88fc715a615/',
              'info_dict': {
-                'id': 'capitulo-10-especial-solidario-nochebuena',
+                'id': '5d4aa2c57ed1a88fc715a615',
                  'ext': 'mp4',
-                'title': 'Especial Solidario de Nochebuena',
-                'description': 'md5:e2d52ff12214fa937107d21064075bf1',
-                'duration': 5527.6,
-                'thumbnail': r're:^https?://.*\.jpg$',
+                'title': 'Capítulo 7: Asuntos pendientes',
+                'description': 'md5:7634cdcb4d50d5381bedf93efb537fbc',
+                'duration': 3413,
+            },
+            'params': {
+                'format': 'bestvideo',
              },
              'skip': 'This video is only available for registered users'
          },
          {
-            'url': 'http://www.atresplayer.com/television/especial/videoencuentros/temporada-1/capitulo-112-david-bustamante_2014121600375.html',
-            'md5': '6e52cbb513c405e403dbacb7aacf8747',
-            'info_dict': {
-                'id': 'capitulo-112-david-bustamante',
-                'ext': 'flv',
-                'title': 'David Bustamante',
-                'description': 'md5:f33f1c0a05be57f6708d4dd83a3b81c6',
-                'duration': 1439.0,
-                'thumbnail': r're:^https?://.*\.jpg$',
-            },
+            'url': 'https://www.atresplayer.com/lasexta/programas/el-club-de-la-comedia/temporada-4/capitulo-10-especial-solidario-nochebuena_5ad08edf986b2855ed47adc4/',
+            'only_matching': True,
          },
          {
-            'url': 'http://www.atresplayer.com/television/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_2014122400174.html',
+            'url': 'https://www.atresplayer.com/antena3/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_5ad51046986b2886722ccdea/',
              'only_matching': True,
          },
      ]
-
-    _USER_AGENT = 'Dalvik/1.6.0 (Linux; U; Android 4.3; GT-I9300 Build/JSS15J'
-    _MAGIC = 'QWtMLXs414Yo+c#_+Q#K@NN)'
-    _TIMESTAMP_SHIFT = 30000
-
-    _TIME_API_URL = 'http://servicios.atresplayer.com/api/admin/time.json'
-    _URL_VIDEO_TEMPLATE = 'https://servicios.atresplayer.com/api/urlVideo/{1}/{0}/{1}|{2}|{3}.json'
-    _PLAYER_URL_TEMPLATE = 'https://servicios.atresplayer.com/episode/getplayer.json?episodePk=%s'
-    _EPISODE_URL_TEMPLATE = 'http://www.atresplayer.com/episodexml/%s'
-
-    _LOGIN_URL = 'https://servicios.atresplayer.com/j_spring_security_check'
-
-    _ERRORS = {
-        'UNPUBLISHED': 'We\'re sorry, but this video is not yet available.',
-        'DELETED': 'This video has expired and is no longer available for online streaming.',
-        'GEOUNPUBLISHED': 'We\'re sorry, but this video is not available in your region due to right restrictions.',
-        # 'PREMIUM': 'PREMIUM',
-    }
+    _API_BASE = 'https://api.atresplayer.com/'
  
      def _real_initialize(self):
          self._login()
  
+    def _handle_error(self, e, code):
+        if isinstance(e.cause, compat_HTTPError) and e.cause.code == code:
+            error = self._parse_json(e.cause.read(), None)
+            if error.get('error') == 'required_registered':
+                self.raise_login_required()
+            raise ExtractorError(error['error_description'], expected=True)
+        raise
+
      def _login(self):
          username, password = self._get_login_info()
          if username is None:
              return
  
-        login_form = {
-            'j_username': username,
-            'j_password': password,
-        }
+        self._request_webpage(
+            self._API_BASE + 'login', None, 'Downloading login page')
  
-        request = sanitized_Request(
-            self._LOGIN_URL, urlencode_postdata(login_form))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        response = self._download_webpage(
-            request, None, 'Logging in')
+        try:
+            target_url = self._download_json(
+                'https://account.atresmedia.com/api/login', None,
+                'Logging in', headers={
+                    'Content-Type': 'application/x-www-form-urlencoded'
+                }, data=urlencode_postdata({
+                    'username': username,
+                    'password': password,
+                }))['targetUrl']
+        except ExtractorError as e:
+            self._handle_error(e, 400)
  
-        error = self._html_search_regex(
-            r'(?s)<ul[^>]+class="[^"]*\blist_error\b[^"]*">(.+?)</ul>',
-            response, 'error', default=None)
-        if error:
-            raise ExtractorError(
-                'Unable to login: %s' % error, expected=True)
+        self._request_webpage(target_url, None, 'Following Target URL')
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
+        display_id, video_id = re.match(self._VALID_URL, url).groups()
  
-        episode_id = self._search_regex(
-            r'episode="([^"]+)"', webpage, 'episode id')
+        try:
+            episode = self._download_json(
+                self._API_BASE + 'client/v1/player/episode/' + video_id, video_id)
+        except ExtractorError as e:
+            self._handle_error(e, 403)
  
-        request = sanitized_Request(
-            self._PLAYER_URL_TEMPLATE % episode_id,
-            headers={'User-Agent': self._USER_AGENT})
-        player = self._download_json(request, episode_id, 'Downloading player JSON')
-
-        episode_type = player.get('typeOfEpisode')
-        error_message = self._ERRORS.get(episode_type)
-        if error_message:
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
+        title = episode['titulo']
  
          formats = []
-        video_url = player.get('urlVideo')
-        if video_url:
-            format_info = {
-                'url': video_url,
-                'format_id': 'http',
-            }
-            mobj = re.search(r'(?P<bitrate>\d+)K_(?P<width>\d+)x(?P<height>\d+)', video_url)
-            if mobj:
-                format_info.update({
-                    'width': int_or_none(mobj.group('width')),
-                    'height': int_or_none(mobj.group('height')),
-                    'tbr': int_or_none(mobj.group('bitrate')),
-                })
-            formats.append(format_info)
-
-        timestamp = int_or_none(self._download_webpage(
-            self._TIME_API_URL,
-            video_id, 'Downloading timestamp', fatal=False), 1000, time.time())
-        timestamp_shifted = compat_str(timestamp + self._TIMESTAMP_SHIFT)
-        token = hmac.new(
-            self._MAGIC.encode('ascii'),
-            (episode_id + timestamp_shifted).encode('utf-8'), hashlib.md5
-        ).hexdigest()
-
-        request = sanitized_Request(
-            self._URL_VIDEO_TEMPLATE.format('windows', episode_id, timestamp_shifted, token),
-            headers={'User-Agent': self._USER_AGENT})
-
-        fmt_json = self._download_json(
-            request, video_id, 'Downloading windows video JSON')
-
-        result = fmt_json.get('resultDes')
-        if result.lower() != 'ok':
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, result), expected=True)
-
-        for format_id, video_url in fmt_json['resultObject'].items():
-            if format_id == 'token' or not video_url.startswith('http'):
-                continue
-            if 'geodeswowsmpra3player' in video_url:
-                # f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0]
-                # f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path)
-                # this videos are protected by DRM, the f4m downloader doesn't support them
+        for source in episode.get('sources', []):
+            src = source.get('src')
+            if not src:
                  continue
-            video_url_hd = video_url.replace('free_es', 'es')
-            formats.extend(self._extract_f4m_formats(
-                video_url_hd[:-9] + '/manifest.f4m', video_id, f4m_id='hds',
-                fatal=False))
-            formats.extend(self._extract_mpd_formats(
-                video_url_hd[:-9] + '/manifest.mpd', video_id, mpd_id='dash',
-                fatal=False))
+            src_type = source.get('type')
+            if src_type == 'application/vnd.apple.mpegurl':
+                formats.extend(self._extract_m3u8_formats(
+                    src, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif src_type == 'application/dash+xml':
+                formats.extend(self._extract_mpd_formats(
+                    src, video_id, mpd_id='dash', fatal=False))
          self._sort_formats(formats)
  
-        path_data = player.get('pathData')
-
-        episode = self._download_xml(
-            self._EPISODE_URL_TEMPLATE % path_data, video_id,
-            'Downloading episode XML')
-
-        duration = float_or_none(xpath_text(
-            episode, './media/asset/info/technical/contentDuration', 'duration'))
-
-        art = episode.find('./media/asset/info/art')
-        title = xpath_text(art, './name', 'title')
-        description = xpath_text(art, './description', 'description')
-        thumbnail = xpath_text(episode, './media/asset/files/background', 'thumbnail')
-
-        subtitles = {}
-        subtitle_url = xpath_text(episode, './media/asset/files/subtitle', 'subtitle')
-        if subtitle_url:
-            subtitles['es'] = [{
-                'ext': 'srt',
-                'url': subtitle_url,
-            }]
+        heartbeat = episode.get('heartbeat') or {}
+        omniture = episode.get('omniture') or {}
+        get_meta = lambda x: heartbeat.get(x) or omniture.get(x)
  
          return {
+            'display_id': display_id,
              'id': video_id,
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
+            'description': episode.get('descripcion'),
+            'thumbnail': episode.get('imgPoster'),
+            'duration': int_or_none(episode.get('duration')),
              'formats': formats,
-            'subtitles': subtitles,
+            'channel': get_meta('channel'),
+            'season': get_meta('season'),
+            'episode_number': int_or_none(get_meta('episodeNumber')),
          }
diff --git a/youtube_dl/extractor/audioboom.py b/youtube_dl/extractor/audioboom.py
index 393f381c6129f41688e2e163b6135a5254eaf3ea..c51837b40615ac95da2d7ed917eb7e9fc8464723 100644
--- a/youtube_dl/extractor/audioboom.py
+++ b/youtube_dl/extractor/audioboom.py
@@ -2,22 +2,25 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import float_or_none
+from ..utils import (
+    clean_html,
+    float_or_none,
+)
  
  
  class AudioBoomIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?audioboom\.com/(?:boos|posts)/(?P<id>[0-9]+)'
      _TESTS = [{
-        'url': 'https://audioboom.com/boos/4279833-3-09-2016-czaban-hour-3?t=0',
-        'md5': '63a8d73a055c6ed0f1e51921a10a5a76',
+        'url': 'https://audioboom.com/posts/7398103-asim-chaudhry',
+        'md5': '7b00192e593ff227e6a315486979a42d',
          'info_dict': {
-            'id': '4279833',
+            'id': '7398103',
              'ext': 'mp3',
-            'title': '3/09/2016 Czaban Hour 3',
-            'description': 'Guest:   Nate Davis - NFL free agency,   Guest:   Stan Gans',
-            'duration': 2245.72,
-            'uploader': 'SB Nation A.M.',
-            'uploader_url': r're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
+            'title': 'Asim Chaudhry',
+            'description': 'md5:2f3fef17dacc2595b5362e1d7d3602fc',
+            'duration': 4000.99,
+            'uploader': 'Sue Perkins: An hour or so with...',
+            'uploader_url': r're:https?://(?:www\.)?audioboom\.com/channel/perkins',
          }
      }, {
          'url': 'https://audioboom.com/posts/4279833-3-09-2016-czaban-hour-3?t=0',
@@ -32,8 +35,8 @@ class AudioBoomIE(InfoExtractor):
          clip = None
  
          clip_store = self._parse_json(
-            self._search_regex(
-                r'data-new-clip-store=(["\'])(?P<json>{.*?"clipId"\s*:\s*%s.*?})\1' % video_id,
+            self._html_search_regex(
+                r'data-new-clip-store=(["\'])(?P<json>{.+?})\1',
                  webpage, 'clip store', default='{}', group='json'),
              video_id, fatal=False)
          if clip_store:
@@ -47,14 +50,15 @@ class AudioBoomIE(InfoExtractor):
  
          audio_url = from_clip('clipURLPriorToLoading') or self._og_search_property(
              'audio', webpage, 'audio url')
-        title = from_clip('title') or self._og_search_title(webpage)
-        description = from_clip('description') or self._og_search_description(webpage)
+        title = from_clip('title') or self._html_search_meta(
+            ['og:title', 'og:audio:title', 'audio_title'], webpage)
+        description = from_clip('description') or clean_html(from_clip('formattedDescription')) or self._og_search_description(webpage)
  
          duration = float_or_none(from_clip('duration') or self._html_search_meta(
              'weibo:audio:duration', webpage))
  
-        uploader = from_clip('author') or self._og_search_property(
-            'audio:artist', webpage, 'uploader', fatal=False)
+        uploader = from_clip('author') or self._html_search_meta(
+            ['og:audio:artist', 'twitter:audio:artist_name', 'audio_artist'], webpage, 'uploader')
          uploader_url = from_clip('author_url') or self._html_search_meta(
              'audioboo:channel', webpage, 'uploader url')
  
diff --git a/youtube_dl/extractor/azmedien.py b/youtube_dl/extractor/azmedien.py
index fcbdc71b98d98076852e0f88559f4a2ed428d7af..b1e20def5343e6b1a077ff3ba0b36f6a96c4f2c4 100644
--- a/youtube_dl/extractor/azmedien.py
+++ b/youtube_dl/extractor/azmedien.py
@@ -47,39 +47,19 @@ class AZMedienIE(InfoExtractor):
          'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
          'only_matching': True
      }]
-
+    _API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
      _PARTNER_ID = '1719221'
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        host = mobj.group('host')
-        video_id = mobj.group('id')
-        entry_id = mobj.group('kaltura_id')
+        host, display_id, article_id, entry_id = re.match(self._VALID_URL, url).groups()
  
          if not entry_id:
-            api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
-            payload = {
-                'query': '''query VideoContext($articleId: ID!) {
-                    article: node(id: $articleId) {
-                      ... on Article {
-                        mainAssetRelation {
-                          asset {
-                            ... on VideoAsset {
-                              kalturaId
-                            }
-                          }
-                        }
-                      }
-                    }
-                  }''',
-                'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
-            }
-            json_data = self._download_json(
-                api_url, video_id, headers={
-                    'Content-Type': 'application/json',
-                },
-                data=json.dumps(payload).encode())
-            entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
+            entry_id = self._download_json(
+                self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
+                    'variables': json.dumps({
+                        'contextId': 'NewsArticle:' + article_id,
+                    }),
+                })['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
  
          return self.url_result(
              'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
diff --git a/youtube_dl/extractor/bambuser.py b/youtube_dl/extractor/bambuser.py
deleted file mode 100644
index 4400ff9..0000000
--- a/youtube_dl/extractor/bambuser.py
+++ /dev/null
@@ -1,142 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-import itertools
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-    ExtractorError,
-    float_or_none,
-    int_or_none,
-    sanitized_Request,
-    urlencode_postdata,
-)
-
-
-class BambuserIE(InfoExtractor):
-    IE_NAME = 'bambuser'
-    _VALID_URL = r'https?://bambuser\.com/v/(?P<id>\d+)'
-    _API_KEY = '005f64509e19a868399060af746a00aa'
-    _LOGIN_URL = 'https://bambuser.com/user'
-    _NETRC_MACHINE = 'bambuser'
-
-    _TEST = {
-        'url': 'http://bambuser.com/v/4050584',
-        # MD5 seems to be flaky, see https://travis-ci.org/ytdl-org/youtube-dl/jobs/14051016#L388
-        # 'md5': 'fba8f7693e48fd4e8641b3fd5539a641',
-        'info_dict': {
-            'id': '4050584',
-            'ext': 'flv',
-            'title': 'Education engineering days - lightning talks',
-            'duration': 3741,
-            'uploader': 'pixelversity',
-            'uploader_id': '344706',
-            'timestamp': 1382976692,
-            'upload_date': '20131028',
-            'view_count': int,
-        },
-        'params': {
-            # It doesn't respect the 'Range' header, it would download the whole video
-            # caused the travis builds to fail: https://travis-ci.org/ytdl-org/youtube-dl/jobs/14493845#L59
-            'skip_download': True,
-        },
-    }
-
-    def _login(self):
-        username, password = self._get_login_info()
-        if username is None:
-            return
-
-        login_form = {
-            'form_id': 'user_login',
-            'op': 'Log in',
-            'name': username,
-            'pass': password,
-        }
-
-        request = sanitized_Request(
-            self._LOGIN_URL, urlencode_postdata(login_form))
-        request.add_header('Referer', self._LOGIN_URL)
-        response = self._download_webpage(
-            request, None, 'Logging in')
-
-        login_error = self._html_search_regex(
-            r'(?s)<div class="messages error">(.+?)</div>',
-            response, 'login error', default=None)
-        if login_error:
-            raise ExtractorError(
-                'Unable to login: %s' % login_error, expected=True)
-
-    def _real_initialize(self):
-        self._login()
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        info = self._download_json(
-            'http://player-c.api.bambuser.com/getVideo.json?api_key=%s&vid=%s'
-            % (self._API_KEY, video_id), video_id)
-
-        error = info.get('error')
-        if error:
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, error), expected=True)
-
-        result = info['result']
-
-        return {
-            'id': video_id,
-            'title': result['title'],
-            'url': result['url'],
-            'thumbnail': result.get('preview'),
-            'duration': int_or_none(result.get('length')),
-            'uploader': result.get('username'),
-            'uploader_id': compat_str(result.get('owner', {}).get('uid')),
-            'timestamp': int_or_none(result.get('created')),
-            'fps': float_or_none(result.get('framerate')),
-            'view_count': int_or_none(result.get('views_total')),
-            'comment_count': int_or_none(result.get('comment_count')),
-        }
-
-
-class BambuserChannelIE(InfoExtractor):
-    IE_NAME = 'bambuser:channel'
-    _VALID_URL = r'https?://bambuser\.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
-    # The maximum number we can get with each request
-    _STEP = 50
-    _TEST = {
-        'url': 'http://bambuser.com/channel/pixelversity',
-        'info_dict': {
-            'title': 'pixelversity',
-        },
-        'playlist_mincount': 60,
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        user = mobj.group('user')
-        urls = []
-        last_id = ''
-        for i in itertools.count(1):
-            req_url = (
-                'http://bambuser.com/xhr-api/index.php?username={user}'
-                '&sort=created&access_mode=0%2C1%2C2&limit={count}'
-                '&method=broadcast&format=json&vid_older_than={last}'
-            ).format(user=user, count=self._STEP, last=last_id)
-            req = sanitized_Request(req_url)
-            # Without setting this header, we wouldn't get any result
-            req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
-            data = self._download_json(
-                req, user, 'Downloading page %d' % i)
-            results = data['result']
-            if not results:
-                break
-            last_id = results[-1]['vid']
-            urls.extend(self.url_result(v['page'], 'Bambuser') for v in results)
-
-        return {
-            '_type': 'playlist',
-            'title': user,
-            'entries': urls,
-        }
diff --git a/youtube_dl/extractor/bellmedia.py b/youtube_dl/extractor/bellmedia.py
index f36a2452d4e9aba89dde708e3bd51ada35ada301..485173774d9f9c2534f9b18f1668a8d5fb204dc9 100644
--- a/youtube_dl/extractor/bellmedia.py
+++ b/youtube_dl/extractor/bellmedia.py
@@ -22,7 +22,8 @@ class BellMediaIE(InfoExtractor):
                  bravo|
                  mtv|
                  space|
-                etalk
+                etalk|
+                marilyn
              )\.ca|
              much\.com
          )/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
@@ -70,6 +71,7 @@ class BellMediaIE(InfoExtractor):
          'animalplanet': 'aniplan',
          'etalk': 'ctv',
          'bnnbloomberg': 'bnn',
+        'marilyn': 'ctv_marilyn',
      }
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/bitchute.py b/youtube_dl/extractor/bitchute.py
index 430663fbf93166a67ee1b7088a77831b59cd60f6..0c773e66e1c7349802b4bb8425e844163d18da75 100644
--- a/youtube_dl/extractor/bitchute.py
+++ b/youtube_dl/extractor/bitchute.py
@@ -7,6 +7,7 @@ import re
  from .common import InfoExtractor
  from ..utils import (
      orderedSet,
+    unified_strdate,
      urlencode_postdata,
  )
  
@@ -23,6 +24,7 @@ class BitChuteIE(InfoExtractor):
              'description': 'md5:3f21f6fb5b1d17c3dee9cf6b5fe60b3a',
              'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Victoria X Rave',
+            'upload_date': '20170813',
          },
      }, {
          'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/',
@@ -74,12 +76,17 @@ class BitChuteIE(InfoExtractor):
               r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>'),
              webpage, 'uploader', fatal=False)
  
+        upload_date = unified_strdate(self._search_regex(
+            r'class=["\']video-publish-date[^>]+>[^<]+ at \d+:\d+ UTC on (.+?)\.',
+            webpage, 'upload date', fatal=False))
+
          return {
              'id': video_id,
              'title': title,
              'description': description,
              'thumbnail': thumbnail,
              'uploader': uploader,
+            'upload_date': upload_date,
              'formats': formats,
          }
  
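Illustrative note, not part of the upstream patch: the BitChute hunk above pulls the publish date out of the page with a regex and normalizes it to YYYYMMDD. A minimal standalone sketch of that idea follows. The function name `extract_upload_date` is hypothetical, the standard library replaces youtube-dl's `unified_strdate`, and the sketch assumes the date is spelled like "August 13th, 2017".

```python
import re
from datetime import datetime

# Same pattern as in the patch: capture the text between "UTC on " and the next period.
PUBLISH_DATE_RE = (
    r'class=["\']video-publish-date[^>]+>[^<]+ at \d+:\d+ UTC on (.+?)\.')


def extract_upload_date(page):
    """Return the publish date as YYYYMMDD, or None if it is not found."""
    m = re.search(PUBLISH_DATE_RE, page)
    if not m:
        return None
    # Drop ordinal suffixes ("13th" -> "13") so strptime can parse the day.
    text = re.sub(r'(?<=\d)(?:st|nd|rd|th)\b', '', m.group(1))
    return datetime.strptime(text, '%B %d, %Y').strftime('%Y%m%d')
```

The month name is parsed with `%B`, so the sketch relies on the default (English) C locale.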
diff --git a/youtube_dl/extractor/bokecc.py b/youtube_dl/extractor/bokecc.py
index 86a7f4d7d6261966101508023450e08f5d6c30d7..6017e834495efa07ee3fd3adeff28cafc6939452 100644
--- a/youtube_dl/extractor/bokecc.py
+++ b/youtube_dl/extractor/bokecc.py
@@ -11,8 +11,8 @@ from ..utils import ExtractorError
  class BokeCCBaseIE(InfoExtractor):
      def _extract_bokecc_formats(self, webpage, video_id, format_id=None):
          player_params_str = self._html_search_regex(
-            r'<(?:script|embed)[^>]+src="http://p\.bokecc\.com/player\?([^"]+)',
-            webpage, 'player params')
+            r'<(?:script|embed)[^>]+src=(?P<q>["\'])(?:https?:)?//p\.bokecc\.com/(?:player|flash/player\.swf)\?(?P<query>.+?)(?P=q)',
+            webpage, 'player params', group='query')
  
          player_params = compat_parse_qs(player_params_str)
  
@@ -36,9 +36,9 @@ class BokeCCIE(BokeCCBaseIE):
      _VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
  
      _TESTS = [{
-        'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',
+        'url': 'http://union.bokecc.com/playvideo.bo?vid=E0ABAE9D4F509B189C33DC5901307461&uid=FE644790DE9D154A',
          'info_dict': {
-            'id': 'CD0C5D3C8614B28B_E44D40C15E65EA30',
+            'id': 'FE644790DE9D154A_E0ABAE9D4F509B189C33DC5901307461',
              'ext': 'flv',
              'title': 'BokeCC Video',
          },
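Illustrative note, not part of the upstream patch: the new BokeCC pattern above accepts either quote character around `src`, protocol-relative URLs, and the `flash/player.swf` path, and it uses a named backreference so the closing quote must match the opening one. A small sketch of how such a pattern behaves is shown below. The function name `player_params` and the sample parameter names in the usage are made up for illustration, and `urllib.parse.parse_qs` stands in for `compat_parse_qs`.

```python
import re
from urllib.parse import parse_qs

# Pattern copied from the patch; (?P=q) requires the same quote that opened the attribute.
PLAYER_RE = (
    r'<(?:script|embed)[^>]+src=(?P<q>["\'])(?:https?:)?//p\.bokecc\.com/'
    r'(?:player|flash/player\.swf)\?(?P<query>.+?)(?P=q)')


def player_params(page):
    """Return the player query string parsed into a dict of lists, or None."""
    m = re.search(PLAYER_RE, page)
    if not m:
        return None
    return parse_qs(m.group('query'))
```

Unlike `_html_search_regex`, this sketch does not unescape HTML entities in the matched value.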
diff --git a/youtube_dl/extractor/brightcove.py b/youtube_dl/extractor/brightcove.py
index 8e2f7217ab85a81a58d1bb902af02b6e62ec2ab6..85001b3ad3cf100d093a16806085b02fa5933d5a 100644
--- a/youtube_dl/extractor/brightcove.py
+++ b/youtube_dl/extractor/brightcove.py
@@ -586,45 +586,63 @@ class BrightcoveNewIE(AdobePassIE):
  
          account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
  
-        webpage = self._download_webpage(
-            'http://players.brightcove.net/%s/%s_%s/index.min.js'
-            % (account_id, player_id, embed), video_id)
+        policy_key_id = '%s_%s' % (account_id, player_id)
+        policy_key = self._downloader.cache.load('brightcove', policy_key_id)
+        policy_key_extracted = False
+        store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
  
-        policy_key = None
+        def extract_policy_key():
+            webpage = self._download_webpage(
+                'http://players.brightcove.net/%s/%s_%s/index.min.js'
+                % (account_id, player_id, embed), video_id)
  
-        catalog = self._search_regex(
-            r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
-        if catalog:
-            catalog = self._parse_json(
-                js_to_json(catalog), video_id, fatal=False)
+            policy_key = None
+
+            catalog = self._search_regex(
+                r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
              if catalog:
-                policy_key = catalog.get('policyKey')
+                catalog = self._parse_json(
+                    js_to_json(catalog), video_id, fatal=False)
+                if catalog:
+                    policy_key = catalog.get('policyKey')
+
+            if not policy_key:
+                policy_key = self._search_regex(
+                    r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
+                    webpage, 'policy key', group='pk')
  
-        if not policy_key:
-            policy_key = self._search_regex(
-                r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
-                webpage, 'policy key', group='pk')
+            store_pk(policy_key)
+            return policy_key
  
          api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
-        headers = {
-            'Accept': 'application/json;pk=%s' % policy_key,
-        }
+        headers = {}
          referrer = smuggled_data.get('referrer')
          if referrer:
              headers.update({
                  'Referer': referrer,
                  'Origin': re.search(r'https?://[^/]+', referrer).group(0),
              })
-        try:
-            json_data = self._download_json(api_url, video_id, headers=headers)
-        except ExtractorError as e:
-            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
-                json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
-                message = json_data.get('message') or json_data['error_code']
-                if json_data.get('error_subcode') == 'CLIENT_GEO':
-                    self.raise_geo_restricted(msg=message)
-                raise ExtractorError(message, expected=True)
-            raise
+
+        for _ in range(2):
+            if not policy_key:
+                policy_key = extract_policy_key()
+                policy_key_extracted = True
+            headers['Accept'] = 'application/json;pk=%s' % policy_key
+            try:
+                json_data = self._download_json(api_url, video_id, headers=headers)
+                break
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
+                    json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
+                    message = json_data.get('message') or json_data['error_code']
+                    if json_data.get('error_subcode') == 'CLIENT_GEO':
+                        self.raise_geo_restricted(msg=message)
+                    elif json_data.get('error_code') == 'INVALID_POLICY_KEY' and not policy_key_extracted:
+                        policy_key = None
+                        store_pk(None)
+                        continue
+                    raise ExtractorError(message, expected=True)
+                raise
  
          errors = json_data.get('errors')
          if errors and errors[0].get('error_subcode') == 'TVE_AUTH':
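Illustrative note, not part of the upstream patch: the BrightcoveNewIE hunk above caches the policy key and, when the API rejects a cached key with INVALID_POLICY_KEY, discards it and retries once with a freshly extracted key. The control flow can be sketched in isolation as below. The names `call_with_cached_key`, `fetch_key`, `call_api` and `InvalidKeyError` are hypothetical stand-ins, and a plain dict replaces youtube-dl's cache object.

```python
class InvalidKeyError(Exception):
    """Hypothetical error raised by call_api when the key is rejected."""


def call_with_cached_key(cache, cache_id, fetch_key, call_api):
    """Call the API with a cached key, refreshing the key at most once."""
    key = cache.get(cache_id)
    freshly_fetched = False
    for _ in range(2):
        if not key:
            key = fetch_key()
            cache[cache_id] = key
            freshly_fetched = True
        try:
            return call_api(key)
        except InvalidKeyError:
            # A key that was just fetched and still fails is a real error.
            if freshly_fetched:
                raise
            # Otherwise the cached key is stale: forget it and try again.
            key = None
            cache.pop(cache_id, None)
    raise RuntimeError('retry loop ended without a result')
```

Limiting the loop to two passes means a stale cache costs one extra request, while a key that is genuinely invalid fails immediately after the refresh.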
diff --git a/youtube_dl/extractor/businessinsider.py b/youtube_dl/extractor/businessinsider.py
index dfcf9bc6b50b9274d2e45ff7e0b6d1af9920cab0..73a57b1e4db835ab09ac308704bd105d796628ac 100644
--- a/youtube_dl/extractor/businessinsider.py
+++ b/youtube_dl/extractor/businessinsider.py
@@ -9,21 +9,26 @@ class BusinessInsiderIE(InfoExtractor):
      _VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
-        'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
+        'md5': 'ffed3e1e12a6f950aa2f7d83851b497a',
          'info_dict': {
-            'id': 'hZRllCfw',
+            'id': 'cjGDb0X9',
              'ext': 'mp4',
-            'title': "Here's how much radiation you're exposed to in everyday life",
-            'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
-            'upload_date': '20170709',
-            'timestamp': 1499606400,
-        },
-        'params': {
-            'skip_download': True,
+            'title': "Bananas give you more radiation exposure than living next to a nuclear power plant",
+            'description': 'md5:0175a3baf200dd8fa658f94cade841b3',
+            'upload_date': '20160611',
+            'timestamp': 1465675620,
          },
      }, {
          'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
-        'only_matching': True,
+        'md5': '43f438dbc6da0b89f5ac42f68529d84a',
+        'info_dict': {
+            'id': '5zJwd4FK',
+            'ext': 'mp4',
+            'title': 'Deze dingen zorgen ervoor dat je minder snel een date scoort',
+            'description': 'md5:2af8975825d38a4fed24717bbe51db49',
+            'upload_date': '20170705',
+            'timestamp': 1499270528,
+        },
      }, {
          'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
          'only_matching': True,
@@ -35,7 +40,8 @@ class BusinessInsiderIE(InfoExtractor):
          jwplatform_id = self._search_regex(
              (r'data-media-id=["\']([a-zA-Z0-9]{8})',
               r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
-             r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
+             r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
+             r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})'),
              webpage, 'jwplatform id')
          return self.url_result(
              'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
diff --git a/youtube_dl/extractor/canvas.py b/youtube_dl/extractor/canvas.py
index c506bc5dd2402a95752bdf3223fe4a24cf9d06ae..8667a0d0457cccfc145cc52bc1eb1c7816aa04b8 100644
--- a/youtube_dl/extractor/canvas.py
+++ b/youtube_dl/extractor/canvas.py
@@ -13,6 +13,8 @@ from ..utils import (
      int_or_none,
      merge_dicts,
      parse_iso8601,
+    str_or_none,
+    url_or_none,
  )
  
  
@@ -20,15 +22,15 @@ class CanvasIE(InfoExtractor):
      _VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
-        'md5': '90139b746a0a9bd7bb631283f6e2a64e',
+        'md5': '68993eda72ef62386a15ea2cf3c93107',
          'info_dict': {
              'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
              'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Nachtwacht: De Greystook',
-            'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
+            'description': 'Nachtwacht: De Greystook',
              'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 1468.03,
+            'duration': 1468.04,
          },
          'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
      }, {
@@ -39,23 +41,45 @@ class CanvasIE(InfoExtractor):
          'HLS': 'm3u8_native',
          'HLS_AES': 'm3u8',
      }
+    _REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          site_id, video_id = mobj.group('site_id'), mobj.group('id')
  
+        # Old API endpoint, serves more formats but may fail for some videos
          data = self._download_json(
              'https://mediazone.vrt.be/api/v1/%s/assets/%s'
-            % (site_id, video_id), video_id)
+            % (site_id, video_id), video_id, 'Downloading asset JSON',
+            'Unable to download asset JSON', fatal=False)
+
+        # New API endpoint
+        if not data:
+            token = self._download_json(
+                '%s/tokens' % self._REST_API_BASE, video_id,
+                'Downloading token', data=b'',
+                headers={'Content-Type': 'application/json'})['vrtPlayerToken']
+            data = self._download_json(
+                '%s/videos/%s' % (self._REST_API_BASE, video_id),
+                video_id, 'Downloading video JSON', fatal=False, query={
+                    'vrtPlayerToken': token,
+                    'client': '%s@PROD' % site_id,
+                }, expected_status=400)
+            message = data.get('message')
+            if message and not data.get('title'):
+                if data.get('code') == 'AUTHENTICATION_REQUIRED':
+                    self.raise_login_required(message)
+                raise ExtractorError(message, expected=True)
  
          title = data['title']
          description = data.get('description')
  
          formats = []
          for target in data['targetUrls']:
-            format_url, format_type = target.get('url'), target.get('type')
+            format_url, format_type = url_or_none(target.get('url')), str_or_none(target.get('type'))
              if not format_url or not format_type:
                  continue
+            format_type = format_type.upper()
              if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
                  formats.extend(self._extract_m3u8_formats(
                      format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
@@ -134,20 +158,20 @@ class CanvasEenIE(InfoExtractor):
          },
          'skip': 'Pagina niet gevonden',
      }, {
-        'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
+        'url': 'https://www.een.be/thuis/emma-pakt-thilly-aan',
          'info_dict': {
-            'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
-            'display_id': 'herbekijk-sorry-voor-alles',
+            'id': 'md-ast-3a24ced2-64d7-44fb-b4ed-ed1aafbf90b8',
+            'display_id': 'emma-pakt-thilly-aan',
              'ext': 'mp4',
-            'title': 'Herbekijk Sorry voor alles',
-            'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
+            'title': 'Emma pakt Thilly aan',
+            'description': 'md5:c5c9b572388a99b2690030afa3f3bad7',
              'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 3788.06,
+            'duration': 118.24,
          },
          'params': {
              'skip_download': True,
          },
-        'skip': 'Episode no longer available',
+        'expected_warnings': ['is not a supported codec'],
      }, {
          'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
          'only_matching': True,
@@ -183,19 +207,44 @@ class VrtNUIE(GigyaBaseIE):
      IE_DESC = 'VrtNU.be'
      _VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
      _TESTS = [{
+        # Available via old API endpoint
          'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
          'info_dict': {
              'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'De zwarte weduwe',
-            'description': 'md5:d90c21dced7db869a85db89a623998d4',
+            'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
              'duration': 1457.04,
              'thumbnail': r're:^https?://.*\.jpg$',
-            'season': '1',
+            'season': 'Season 1',
              'season_number': 1,
              'episode_number': 1,
          },
-        'skip': 'This video is only available for registered users'
+        'skip': 'This video is only available for registered users',
+        'params': {
+            'username': '<snip>',
+            'password': '<snip>',
+        },
+        'expected_warnings': ['is not a supported codec'],
+    }, {
+        # Only available via new API endpoint
+        'url': 'https://www.vrt.be/vrtnu/a-z/kamp-waes/1/kamp-waes-s1a5/',
+        'info_dict': {
+            'id': 'pbs-pub-0763b56c-64fb-4d38-b95b-af60bf433c71$vid-ad36a73c-4735-4f1f-b2c0-a38e6e6aa7e1',
+            'ext': 'mp4',
+            'title': 'Aflevering 5',
+            'description': 'Wie valt door de mand tijdens een missie?',
+            'duration': 2967.06,
+            'season': 'Season 1',
+            'season_number': 1,
+            'episode_number': 5,
+        },
+        'skip': 'This video is only available for registered users',
+        'params': {
+            'username': '<snip>',
+            'password': '<snip>',
+        },
+        'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
      }]
      _NETRC_MACHINE = 'vrtnu'
      _APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'
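Illustrative note, not part of the upstream patch: the CanvasIE change above first queries the old asset endpoint non-fatally and only falls back to the token-based endpoint when that returns nothing. A minimal sketch of that fallback follows. The function `load_video_data` and its three callables are hypothetical, and `LookupError` stands in for `ExtractorError`.

```python
def load_video_data(fetch_old, fetch_token, fetch_new):
    """Try the old endpoint first, then fall back to the token-based one."""
    data = fetch_old()  # expected to return None or {} on failure
    if not data:
        token = fetch_token()
        # The `or {}` guard is an addition of this sketch, not of the patch.
        data = fetch_new(token) or {}
        message = data.get('message')
        # An error payload carries a message but no title.
        if message and not data.get('title'):
            raise LookupError(message)
    return data
```

Trying the old endpoint first matches the comment in the patch, which says it serves more formats but may fail for some videos.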
diff --git a/youtube_dl/extractor/ceskatelevize.py b/youtube_dl/extractor/ceskatelevize.py
index 1ec58f7d8d40d92eacb0aa723e49f189f118ffcd..7cb4efb746b5f7eae929ccd16993e530122d9ed3 100644
--- a/youtube_dl/extractor/ceskatelevize.py
+++ b/youtube_dl/extractor/ceskatelevize.py
@@ -147,6 +147,8 @@ class CeskaTelevizeIE(InfoExtractor):
                  is_live = item.get('type') == 'LIVE'
                  formats = []
                  for format_id, stream_url in item.get('streamUrls', {}).items():
+                    if 'drmOnly=true' in stream_url:
+                        continue
                      if 'playerType=flash' in stream_url:
                          stream_formats = self._extract_m3u8_formats(
                              stream_url, playlist_id, 'mp4', 'm3u8_native',
diff --git a/youtube_dl/extractor/channel9.py b/youtube_dl/extractor/channel9.py
index 81108e70424f5a98bc97dcdb2ee9b11c6869ed08..09cacf6d3b7e56029ef675b93c970c7860660ada 100644
--- a/youtube_dl/extractor/channel9.py
+++ b/youtube_dl/extractor/channel9.py
@@ -32,7 +32,7 @@ class Channel9IE(InfoExtractor):
              'upload_date': '20130828',
              'session_code': 'KOS002',
              'session_room': 'Arena 1A',
-            'session_speakers': ['Andrew Coates', 'Brady Gaster', 'Mads Kristensen', 'Ed Blankenship', 'Patrick Klug'],
+            'session_speakers': 'count:5',
          },
      }, {
          'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
@@ -65,14 +65,14 @@ class Channel9IE(InfoExtractor):
              'skip_download': True,
          },
      }, {
-        'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
+        'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
          'info_dict': {
-            'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
-            'title': 'Channel 9',
+            'id': 'Events/DEVintersection/DEVintersection-2016',
+            'title': 'DEVintersection 2016 Orlando Sessions',
          },
-        'playlist_mincount': 100,
+        'playlist_mincount': 14,
      }, {
-        'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
+        'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
          'only_matching': True,
      }, {
          'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
@@ -112,11 +112,11 @@ class Channel9IE(InfoExtractor):
                  episode_data), content_path)
              content_id = episode_data['contentId']
              is_session = '/Sessions(' in episode_data['api']
-            content_url = 'https://channel9.msdn.com/odata' + episode_data['api']
+            content_url = 'https://channel9.msdn.com/odata' + episode_data['api'] + '?$select=Captions,CommentCount,MediaLengthInSeconds,PublishedDate,Rating,RatingCount,Title,VideoMP4High,VideoMP4Low,VideoMP4Medium,VideoPlayerPreviewImage,VideoWMV,VideoWMVHQ,Views,'
              if is_session:
-                content_url += '?$expand=Speakers'
+                content_url += 'Code,Description,Room,Slides,Speakers,ZipFile&$expand=Speakers'
              else:
-                content_url += '?$expand=Authors'
+                content_url += 'Authors,Body&$expand=Authors'
              content_data = self._download_json(content_url, content_id)
              title = content_data['Title']
  
@@ -210,7 +210,7 @@ class Channel9IE(InfoExtractor):
                  'id': content_id,
                  'title': title,
                  'description': clean_html(content_data.get('Description') or content_data.get('Body')),
-                'thumbnail': content_data.get('Thumbnail') or content_data.get('VideoPlayerPreviewImage'),
+                'thumbnail': content_data.get('VideoPlayerPreviewImage'),
                  'duration': int_or_none(content_data.get('MediaLengthInSeconds')),
                  'timestamp': parse_iso8601(content_data.get('PublishedDate')),
                  'avg_rating': int_or_none(content_data.get('Rating')),
diff --git a/youtube_dl/extractor/chaturbate.py b/youtube_dl/extractor/chaturbate.py
index e2b828d8afdada257b6b4f458022947522180642..a459dcb8d5fbedc00f068da023392f0898d687a2 100644
--- a/youtube_dl/extractor/chaturbate.py
+++ b/youtube_dl/extractor/chaturbate.py
@@ -3,11 +3,15 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    lowercase_escape,
+    url_or_none,
+)
  
  
  class ChaturbateIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?:fullvideo/?\?.*?\bb=)?(?P<id>[^/?&#]+)'
      _TESTS = [{
          'url': 'https://www.chaturbate.com/siswet19/',
          'info_dict': {
@@ -21,6 +25,9 @@ class ChaturbateIE(InfoExtractor):
              'skip_download': True,
          },
          'skip': 'Room is offline',
+    }, {
+        'url': 'https://chaturbate.com/fullvideo/?b=caylin',
+        'only_matching': True,
      }, {
          'url': 'https://en.chaturbate.com/siswet19/',
          'only_matching': True,
@@ -32,14 +39,34 @@ class ChaturbateIE(InfoExtractor):
          video_id = self._match_id(url)
  
          webpage = self._download_webpage(
-            url, video_id, headers=self.geo_verification_headers())
+            'https://chaturbate.com/%s/' % video_id, video_id,
+            headers=self.geo_verification_headers())
  
-        m3u8_urls = []
+        found_m3u8_urls = []
+
+        data = self._parse_json(
+            self._search_regex(
+                r'initialRoomDossier\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
+                webpage, 'data', default='{}', group='value'),
+            video_id, transform_source=lowercase_escape, fatal=False)
+        if data:
+            m3u8_url = url_or_none(data.get('hls_source'))
+            if m3u8_url:
+                found_m3u8_urls.append(m3u8_url)
  
-        for m in re.finditer(
-                r'(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage):
-            m3u8_fast_url, m3u8_no_fast_url = m.group('url'), m.group(
-                'url').replace('_fast', '')
+        if not found_m3u8_urls:
+            for m in re.finditer(
+                    r'(\\u002[27])(?P<url>http.+?\.m3u8.*?)\1', webpage):
+                found_m3u8_urls.append(lowercase_escape(m.group('url')))
+
+        if not found_m3u8_urls:
+            for m in re.finditer(
+                    r'(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage):
+                found_m3u8_urls.append(m.group('url'))
+
+        m3u8_urls = []
+        for found_m3u8_url in found_m3u8_urls:
+            m3u8_fast_url, m3u8_no_fast_url = found_m3u8_url, found_m3u8_url.replace('_fast', '')
              for m3u8_url in (m3u8_fast_url, m3u8_no_fast_url):
                  if m3u8_url not in m3u8_urls:
                      m3u8_urls.append(m3u8_url)
@@ -59,7 +86,12 @@ class ChaturbateIE(InfoExtractor):
  
          formats = []
          for m3u8_url in m3u8_urls:
-            m3u8_id = 'fast' if '_fast' in m3u8_url else 'slow'
+            for known_id in ('fast', 'slow'):
+                if '_%s' % known_id in m3u8_url:
+                    m3u8_id = known_id
+                    break
+            else:
+                m3u8_id = None
              formats.extend(self._extract_m3u8_formats(
                  m3u8_url, video_id, ext='mp4',
                  # ffmpeg skips segments for fast m3u8
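Illustrative note, not part of the upstream patch: the extractor change above adds a fallback that looks for m3u8 URLs delimited by escaped quotes (`\u0022` or `\u0027`) and unescapes them, and it labels each URL with a `for`/`else` loop so that URLs matching neither known id get no label. Both steps are sketched below. The helper names are hypothetical, and `unescape_u` is a simplified stand-in for youtube-dl's `lowercase_escape`.

```python
import re


def unescape_u(text):
    r"""Turn \uXXXX sequences into the characters they encode."""
    return re.sub(
        r'\\u([0-9a-fA-F]{4})',
        lambda m: chr(int(m.group(1), 16)), text)


def find_escaped_m3u8_urls(page):
    """Find m3u8 URLs wrapped in \\u0022 or \\u0027 and unescape them."""
    return [
        unescape_u(m.group('url'))
        for m in re.finditer(
            r'(\\u002[27])(?P<url>http.+?\.m3u8.*?)\1', page)]


def stream_label(m3u8_url):
    """Return 'fast' or 'slow' if the URL contains that marker, else None."""
    for known_id in ('fast', 'slow'):
        if '_%s' % known_id in m3u8_url:
            label = known_id
            break
    else:
        # Runs only when the loop finished without hitting `break`.
        label = None
    return label
```

Before this change every URL without `_fast` was labelled `slow`; the `else` branch leaves such URLs unlabelled instead.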
diff --git a/youtube_dl/extractor/cloudflarestream.py b/youtube_dl/extractor/cloudflarestream.py
index 8ff2c6531570ee3a06210832cf967c7033d6ab9f..2fdcfbb3af1fbffb9e66abff56b86e31762ad449 100644
--- a/youtube_dl/extractor/cloudflarestream.py
+++ b/youtube_dl/extractor/cloudflarestream.py
@@ -1,20 +1,24 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import base64
  import re
  
  from .common import InfoExtractor
  
  
  class CloudflareStreamIE(InfoExtractor):
+    _DOMAIN_RE = r'(?:cloudflarestream\.com|(?:videodelivery|bytehighway)\.net)'
+    _EMBED_RE = r'embed\.%s/embed/[^/]+\.js\?.*?\bvideo=' % _DOMAIN_RE
+    _ID_RE = r'[\da-f]{32}|[\w-]+\.[\w-]+\.[\w-]+'
      _VALID_URL = r'''(?x)
                      https?://
                          (?:
-                            (?:watch\.)?(?:cloudflarestream\.com|videodelivery\.net)/|
-                            embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=
+                            (?:watch\.)?%s/|
+                            %s
                          )
-                        (?P<id>[\da-f]+)
-                    '''
+                        (?P<id>%s)
+                    ''' % (_DOMAIN_RE, _EMBED_RE, _ID_RE)
      _TESTS = [{
          'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
          'info_dict': {
@@ -41,23 +45,28 @@ class CloudflareStreamIE(InfoExtractor):
          return [
              mobj.group('url')
              for mobj in re.finditer(
-                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
+                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s(?:%s).*?)\1' % (CloudflareStreamIE._EMBED_RE, CloudflareStreamIE._ID_RE),
                  webpage)]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+        domain = 'bytehighway.net' if 'bytehighway.net/' in url else 'videodelivery.net'
+        base_url = 'https://%s/%s/' % (domain, video_id)
+        if '.' in video_id:
+            video_id = self._parse_json(base64.urlsafe_b64decode(
+                video_id.split('.')[1]), video_id)['sub']
+        manifest_base_url = base_url + 'manifest/video.'
  
          formats = self._extract_m3u8_formats(
-            'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
-            video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
-            fatal=False)
+            manifest_base_url + 'm3u8', video_id, 'mp4',
+            'm3u8_native', m3u8_id='hls', fatal=False)
          formats.extend(self._extract_mpd_formats(
-            'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
-            video_id, mpd_id='dash', fatal=False))
+            manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False))
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': video_id,
+            'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
              'formats': formats,
          }
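Illustrative note, not part of the upstream patch: when the id in the URL has the three-part `header.payload.signature` shape, the CloudflareStreamIE change above base64url-decodes the middle part and reads the real video id from its `sub` field. A standalone sketch of that step follows. The function name `token_subject` is hypothetical. The sketch also restores the `=` padding before decoding, because `base64.urlsafe_b64decode` raises an error on input whose padding is missing; the patched line does not do this, and whether it needs to depends on the tokens the service issues.

```python
import base64
import json


def token_subject(token):
    """Return the 'sub' claim from the payload part of a dotted token."""
    payload = token.split('.')[1]
    # base64url payloads are usually stored without '=' padding; add it back.
    payload += '=' * (-len(payload) % 4)
    decoded = base64.urlsafe_b64decode(payload).decode('utf-8')
    return json.loads(decoded)['sub']
```

The signature part is ignored here, as in the patch: the token is only read to find the id, not verified.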
diff --git a/youtube_dl/extractor/comcarcoff.py b/youtube_dl/extractor/comcarcoff.py
deleted file mode 100644
index 588aad0..0000000
--- a/youtube_dl/extractor/comcarcoff.py
+++ /dev/null
@@ -1,74 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-    int_or_none,
-    parse_duration,
-    parse_iso8601,
-)
-
-
-class ComCarCoffIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
-    _TESTS = [{
-        'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
-        'info_dict': {
-            'id': '2494164',
-            'ext': 'mp4',
-            'upload_date': '20141127',
-            'timestamp': 1417107600,
-            'duration': 1232,
-            'title': 'Happy Thanksgiving Miranda',
-            'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
-        },
-        'params': {
-            'skip_download': 'requires ffmpeg',
-        }
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        if not display_id:
-            display_id = 'comediansincarsgettingcoffee.com'
-        webpage = self._download_webpage(url, display_id)
-
-        full_data = self._parse_json(
-            self._search_regex(
-                r'window\.app\s*=\s*({.+?});\n', webpage, 'full data json'),
-            display_id)['videoData']
-
-        display_id = full_data['activeVideo']['video']
-        video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
-
-        video_id = compat_str(video_data['mediaId'])
-        title = video_data['title']
-        formats = self._extract_m3u8_formats(
-            video_data['mediaUrl'], video_id, 'mp4')
-        self._sort_formats(formats)
-
-        thumbnails = [{
-            'url': video_data['images']['thumb'],
-        }, {
-            'url': video_data['images']['poster'],
-        }]
-
-        timestamp = int_or_none(video_data.get('pubDateTime')) or parse_iso8601(
-            video_data.get('pubDate'))
-        duration = int_or_none(video_data.get('durationSeconds')) or parse_duration(
-            video_data.get('duration'))
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': video_data.get('description'),
-            'timestamp': timestamp,
-            'duration': duration,
-            'thumbnails': thumbnails,
-            'formats': formats,
-            'season_number': int_or_none(video_data.get('season')),
-            'episode_number': int_or_none(video_data.get('episode')),
-            'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
-        }
diff --git a/youtube_dl/extractor/common.py b/youtube_dl/extractor/common.py
index 85978661793a77411419d619f654a33755d23d4b..eaae5e484f99311ccf018301f773c87f9c8cf544 100644
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -1424,12 +1424,10 @@ class InfoExtractor(object):
          try:
              self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
              return True
-        except ExtractorError as e:
-            if isinstance(e.cause, compat_urllib_error.URLError):
-                self.to_screen(
-                    '%s: %s URL is invalid, skipping' % (video_id, item))
-                return False
-            raise
+        except ExtractorError:
+            self.to_screen(
+                '%s: %s URL is invalid, skipping' % (video_id, item))
+            return False
  
      def http_scheme(self):
          """ Either "http:" or "https:", depending on the user's preferences """
@@ -1457,14 +1455,14 @@ class InfoExtractor(object):
  
      def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
                               transform_source=lambda s: fix_xml_ampersands(s).strip(),
-                             fatal=True, m3u8_id=None):
+                             fatal=True, m3u8_id=None, data=None, headers={}, query={}):
          manifest = self._download_xml(
              manifest_url, video_id, 'Downloading f4m manifest',
              'Unable to download f4m manifest',
              # Some manifests may be malformed, e.g. prosiebensat1 generated manifests
              # (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244)
              transform_source=transform_source,
-            fatal=fatal)
+            fatal=fatal, data=data, headers=headers, query=query)
  
          if manifest is False:
              return []
@@ -1588,12 +1586,13 @@ class InfoExtractor(object):
      def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
                                entry_protocol='m3u8', preference=None,
                                m3u8_id=None, note=None, errnote=None,
-                              fatal=True, live=False):
+                              fatal=True, live=False, data=None, headers={},
+                              query={}):
          res = self._download_webpage_handle(
              m3u8_url, video_id,
              note=note or 'Downloading m3u8 information',
              errnote=errnote or 'Failed to download m3u8 information',
-            fatal=fatal)
+            fatal=fatal, data=data, headers=headers, query=query)
  
          if res is False:
              return []
@@ -1767,6 +1766,19 @@ class InfoExtractor(object):
                          # the same GROUP-ID
                          f['acodec'] = 'none'
                  formats.append(f)
+
+                # for DailyMotion
+                progressive_uri = last_stream_inf.get('PROGRESSIVE-URI')
+                if progressive_uri:
+                    http_f = f.copy()
+                    del http_f['manifest_url']
+                    http_f.update({
+                        'format_id': f['format_id'].replace('hls-', 'http-'),
+                        'protocol': 'http',
+                        'url': progressive_uri,
+                    })
+                    formats.append(http_f)
+
                  last_stream_inf = {}
          return formats
  
@@ -2011,12 +2023,12 @@ class InfoExtractor(object):
              })
          return entries
  
-    def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}):
+    def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}, data=None, headers={}, query={}):
          res = self._download_xml_handle(
              mpd_url, video_id,
              note=note or 'Downloading MPD manifest',
              errnote=errnote or 'Failed to download MPD manifest',
-            fatal=fatal)
+            fatal=fatal, data=data, headers=headers, query=query)
          if res is False:
              return []
          mpd_doc, urlh = res
@@ -2319,12 +2331,12 @@ class InfoExtractor(object):
                          self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
          return formats
  
-    def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True):
+    def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
          res = self._download_xml_handle(
              ism_url, video_id,
              note=note or 'Downloading ISM manifest',
              errnote=errnote or 'Failed to download ISM manifest',
-            fatal=fatal)
+            fatal=fatal, data=data, headers=headers, query=query)
          if res is False:
              return []
          ism_doc, urlh = res
@@ -2691,7 +2703,7 @@ class InfoExtractor(object):
              entry = {
                  'id': this_video_id,
                  'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
-                'description': video_data.get('description'),
+                'description': clean_html(video_data.get('description')),
                  'thumbnail': urljoin(base_url, self._proto_relative_url(video_data.get('image'))),
                  'timestamp': int_or_none(video_data.get('pubdate')),
                  'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
diff --git a/youtube_dl/extractor/contv.py b/youtube_dl/extractor/contv.py
new file mode 100644
index 0000000..84b462d
--- /dev/null
+++ b/youtube_dl/extractor/contv.py
@@ -0,0 +1,118 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    int_or_none,
+)
+
+
+class CONtvIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?contv\.com/details-movie/(?P<id>[^/]+)'
+    _TESTS = [{
+        'url': 'https://www.contv.com/details-movie/CEG10022949/days-of-thrills-&-laughter',
+        'info_dict': {
+            'id': 'CEG10022949',
+            'ext': 'mp4',
+            'title': 'Days Of Thrills & Laughter',
+            'description': 'md5:5d6b3d0b1829bb93eb72898c734802eb',
+            'upload_date': '20180703',
+            'timestamp': 1530634789.61,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://www.contv.com/details-movie/CLIP-show_fotld_bts/fight-of-the-living-dead:-behind-the-scenes-bites',
+        'info_dict': {
+            'id': 'CLIP-show_fotld_bts',
+            'title': 'Fight of the Living Dead: Behind the Scenes Bites',
+        },
+        'playlist_mincount': 7,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        details = self._download_json(
+            'http://metax.contv.live.junctiontv.net/metax/2.5/details/' + video_id,
+            video_id, query={'device': 'web'})
+
+        if details.get('type') == 'episodic':
+            seasons = self._download_json(
+                'http://metax.contv.live.junctiontv.net/metax/2.5/seriesfeed/json/' + video_id,
+                video_id)
+            entries = []
+            for season in seasons:
+                for episode in season.get('episodes', []):
+                    episode_id = episode.get('id')
+                    if not episode_id:
+                        continue
+                    entries.append(self.url_result(
+                        'https://www.contv.com/details-movie/' + episode_id,
+                        CONtvIE.ie_key(), episode_id))
+            return self.playlist_result(entries, video_id, details.get('title'))
+
+        m_details = details['details']
+        title = details['title']
+
+        formats = []
+
+        media_hls_url = m_details.get('media_hls_url')
+        if media_hls_url:
+            formats.extend(self._extract_m3u8_formats(
+                media_hls_url, video_id, 'mp4',
+                m3u8_id='hls', fatal=False))
+
+        media_mp4_url = m_details.get('media_mp4_url')
+        if media_mp4_url:
+            formats.append({
+                'format_id': 'http',
+                'url': media_mp4_url,
+            })
+
+        self._sort_formats(formats)
+
+        subtitles = {}
+        captions = m_details.get('captions') or {}
+        for caption_url in captions.values():
+            subtitles.setdefault('en', []).append({
+                'url': caption_url
+            })
+
+        thumbnails = []
+        for image in m_details.get('images', []):
+            image_url = image.get('url')
+            if not image_url:
+                continue
+            thumbnails.append({
+                'url': image_url,
+                'width': int_or_none(image.get('width')),
+                'height': int_or_none(image.get('height')),
+            })
+
+        description = None
+        for p in ('large_', 'medium_', 'small_', ''):
+            d = m_details.get(p + 'description')
+            if d:
+                description = d
+                break
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'thumbnails': thumbnails,
+            'description': description,
+            'timestamp': float_or_none(details.get('metax_added_on'), 1000),
+            'subtitles': subtitles,
+            'duration': float_or_none(m_details.get('duration'), 1000),
+            'view_count': int_or_none(details.get('num_watched')),
+            'like_count': int_or_none(details.get('num_fav')),
+            'categories': details.get('category'),
+            'tags': details.get('tags'),
+            'season_number': int_or_none(details.get('season')),
+            'episode_number': int_or_none(details.get('episode')),
+            'release_year': int_or_none(details.get('pub_year')),
+        }
diff --git a/youtube_dl/extractor/corus.py b/youtube_dl/extractor/corus.py
index a1b2518047f8564df2dc74c9762cfcf1a8801493..e11aadf14fe960be6d9e8c583cf103f0b75ec057 100644
--- a/youtube_dl/extractor/corus.py
+++ b/youtube_dl/extractor/corus.py
@@ -4,7 +4,12 @@ from __future__ import unicode_literals
  import re
  
  from .theplatform import ThePlatformFeedIE
-from ..utils import int_or_none
+from ..utils import (
+    dict_get,
+    ExtractorError,
+    float_or_none,
+    int_or_none,
+)
  
  
  class CorusIE(ThePlatformFeedIE):
@@ -12,24 +17,49 @@ class CorusIE(ThePlatformFeedIE):
                      https?://
                          (?:www\.)?
                          (?P<domain>
-                            (?:globaltv|etcanada)\.com|
-                            (?:hgtv|foodnetwork|slice|history|showcase|bigbrothercanada)\.ca
+                            (?:
+                                globaltv|
+                                etcanada|
+                                seriesplus|
+                                wnetwork|
+                                ytv
+                            )\.com|
+                            (?:
+                                hgtv|
+                                foodnetwork|
+                                slice|
+                                history|
+                                showcase|
+                                bigbrothercanada|
+                                abcspark|
+                                disney(?:channel|lachaine)
+                            )\.ca
+                        )
+                        /(?:[^/]+/)*
+                        (?:
+                            video\.html\?.*?\bv=|
+                            videos?/(?:[^/]+/)*(?:[a-z0-9-]+-)?
+                        )
+                        (?P<id>
+                            [\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}|
+                            (?:[A-Z]{4})?\d{12,20}
                          )
-                        /(?:video/(?:[^/]+/)?|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))
-                        (?P<id>\d+)
                      '''
      _TESTS = [{
          'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
-        'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
          'info_dict': {
              'id': '870923331648',
              'ext': 'mp4',
              'title': 'Movie Night Popcorn with Bryan',
              'description': 'Bryan whips up homemade popcorn, the old fashion way for Jojo and Lincoln.',
-            'uploader': 'SHWM-NEW',
              'upload_date': '20170206',
              'timestamp': 1486392197,
          },
+        'params': {
+            'format': 'bestvideo',
+            'skip_download': True,
+        },
+        'expected_warnings': ['Failed to parse JSON'],
      }, {
          'url': 'http://www.foodnetwork.ca/shows/chopped/video/episode/chocolate-obsession/video.html?v=872683587753',
          'only_matching': True,
@@ -48,58 +78,83 @@ class CorusIE(ThePlatformFeedIE):
      }, {
          'url': 'https://www.bigbrothercanada.ca/video/big-brother-canada-704/1457812035894/',
          'only_matching': True
+    }, {
+        'url': 'https://www.seriesplus.com/emissions/dre-mary-mort-sur-ordonnance/videos/deux-coeurs-battant/SERP0055626330000200/',
+        'only_matching': True
+    }, {
+        'url': 'https://www.disneychannel.ca/shows/gabby-duran-the-unsittables/video/crybaby-duran-clip/2f557eec-0588-11ea-ae2b-e2c6776b770e/',
+        'only_matching': True
      }]
-
-    _TP_FEEDS = {
-        'globaltv': {
-            'feed_id': 'ChQqrem0lNUp',
-            'account_id': 2269680845,
-        },
-        'etcanada': {
-            'feed_id': 'ChQqrem0lNUp',
-            'account_id': 2269680845,
-        },
-        'hgtv': {
-            'feed_id': 'L0BMHXi2no43',
-            'account_id': 2414428465,
-        },
-        'foodnetwork': {
-            'feed_id': 'ukK8o58zbRmJ',
-            'account_id': 2414429569,
-        },
-        'slice': {
-            'feed_id': '5tUJLgV2YNJ5',
-            'account_id': 2414427935,
-        },
-        'history': {
-            'feed_id': 'tQFx_TyyEq4J',
-            'account_id': 2369613659,
-        },
-        'showcase': {
-            'feed_id': '9H6qyshBZU3E',
-            'account_id': 2414426607,
-        },
-        'bigbrothercanada': {
-            'feed_id': 'ChQqrem0lNUp',
-            'account_id': 2269680845,
-        },
+    _GEO_BYPASS = False
+    _SITE_MAP = {
+        'globaltv': 'series',
+        'etcanada': 'series',
+        'foodnetwork': 'food',
+        'bigbrothercanada': 'series',
+        'disneychannel': 'disneyen',
+        'disneylachaine': 'disneyfr',
      }
  
      def _real_extract(self, url):
          domain, video_id = re.match(self._VALID_URL, url).groups()
-        feed_info = self._TP_FEEDS[domain.split('.')[0]]
-        return self._extract_feed_info('dtjsEC', feed_info['feed_id'], 'byId=' + video_id, video_id, lambda e: {
-            'episode_number': int_or_none(e.get('pl1$episode')),
-            'season_number': int_or_none(e.get('pl1$season')),
-            'series': e.get('pl1$show'),
-        }, {
-            'HLS': {
-                'manifest': 'm3u',
-            },
-            'DesktopHLS Default': {
-                'manifest': 'm3u',
-            },
-            'MP4 MBR': {
-                'manifest': 'm3u',
-            },
-        }, feed_info['account_id'])
+        site = domain.split('.')[0]
+        path = self._SITE_MAP.get(site, site)
+        if path != 'series':
+            path = 'migration/' + path
+        video = self._download_json(
+            'https://globalcontent.corusappservices.com/templates/%s/playlist/' % path,
+            video_id, query={'byId': video_id},
+            headers={'Accept': 'application/json'})[0]
+        title = video['title']
+
+        formats = []
+        for source in video.get('sources', []):
+            smil_url = source.get('file')
+            if not smil_url:
+                continue
+            source_type = source.get('type')
+            note = 'Downloading%s smil file' % (' ' + source_type if source_type else '')
+            resp = self._download_webpage(
+                smil_url, video_id, note, fatal=False,
+                headers=self.geo_verification_headers())
+            if not resp:
+                continue
+            error = self._parse_json(resp, video_id, fatal=False)
+            if error:
+                if error.get('exception') == 'GeoLocationBlocked':
+                    self.raise_geo_restricted(countries=['CA'])
+                raise ExtractorError(error['description'])
+            smil = self._parse_xml(resp, video_id, fatal=False)
+            if smil is None:
+                continue
+            namespace = self._parse_smil_namespace(smil)
+            formats.extend(self._parse_smil_formats(
+                smil, smil_url, video_id, namespace))
+        if not formats and video.get('drm'):
+            raise ExtractorError('This video is DRM protected.', expected=True)
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for track in video.get('tracks', []):
+            track_url = track.get('file')
+            if not track_url:
+                continue
+            lang = 'fr' if site in ('disneylachaine', 'seriesplus') else 'en'
+            subtitles.setdefault(lang, []).append({'url': track_url})
+
+        metadata = video.get('metadata') or {}
+        get_number = lambda x: int_or_none(video.get('pl1$' + x) or metadata.get(x + 'Number'))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'thumbnail': dict_get(video, ('defaultThumbnailUrl', 'thumbnail', 'image')),
+            'description': video.get('description'),
+            'timestamp': int_or_none(video.get('availableDate'), 1000),
+            'subtitles': subtitles,
+            'duration': float_or_none(metadata.get('duration')),
+            'series': dict_get(video, ('show', 'pl1$show')),
+            'season_number': get_number('season'),
+            'episode_number': get_number('episode'),
+        }
diff --git a/youtube_dl/extractor/dailymotion.py b/youtube_dl/extractor/dailymotion.py
index 745971900b23f1cbf44ce286b9c4a4a7e527a3da..327fdb04a71215b8de170314d7080093ced3bbba 100644
--- a/youtube_dl/extractor/dailymotion.py
+++ b/youtube_dl/extractor/dailymotion.py
@@ -1,50 +1,93 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import functools
-import hashlib
-import itertools
  import json
-import random
  import re
-import string
  
  from .common import InfoExtractor
-from ..compat import compat_struct_pack
+from ..compat import compat_HTTPError
  from ..utils import (
-    determine_ext,
-    error_to_compat_str,
+    age_restricted,
+    clean_html,
      ExtractorError,
      int_or_none,
-    mimetype2ext,
      OnDemandPagedList,
-    parse_iso8601,
-    sanitized_Request,
-    str_to_int,
      try_get,
      unescapeHTML,
-    update_url_query,
-    url_or_none,
      urlencode_postdata,
  )
  
  
  class DailymotionBaseInfoExtractor(InfoExtractor):
-    @staticmethod
-    def _build_request(url):
-        """Build a request with the family filter disabled"""
-        request = sanitized_Request(url)
-        request.add_header('Cookie', 'family_filter=off; ff=off')
-        return request
+    _FAMILY_FILTER = None
+    _HEADERS = {
+        'Content-Type': 'application/json',
+        'Origin': 'https://www.dailymotion.com',
+    }
+    _NETRC_MACHINE = 'dailymotion'
  
-    def _download_webpage_handle_no_ff(self, url, *args, **kwargs):
-        request = self._build_request(url)
-        return self._download_webpage_handle(request, *args, **kwargs)
+    def _get_dailymotion_cookies(self):
+        return self._get_cookies('https://www.dailymotion.com/')
  
-    def _download_webpage_no_ff(self, url, *args, **kwargs):
-        request = self._build_request(url)
-        return self._download_webpage(request, *args, **kwargs)
+    @staticmethod
+    def _get_cookie_value(cookies, name):
+        cookie = cookies.get(name)
+        if cookie:
+            return cookie.value
+
+    def _set_dailymotion_cookie(self, name, value):
+        self._set_cookie('www.dailymotion.com', name, value)
+
+    def _real_initialize(self):
+        cookies = self._get_dailymotion_cookies()
+        ff = self._get_cookie_value(cookies, 'ff')
+        self._FAMILY_FILTER = ff == 'on' if ff else age_restricted(18, self._downloader.params.get('age_limit'))
+        self._set_dailymotion_cookie('ff', 'on' if self._FAMILY_FILTER else 'off')
+
+    def _call_api(self, object_type, xid, object_fields, note, filter_extra=None):
+        if not self._HEADERS.get('Authorization'):
+            cookies = self._get_dailymotion_cookies()
+            token = self._get_cookie_value(cookies, 'access_token') or self._get_cookie_value(cookies, 'client_token')
+            if not token:
+                data = {
+                    'client_id': 'f1a362d288c1b98099c7',
+                    'client_secret': 'eea605b96e01c796ff369935357eca920c5da4c5',
+                }
+                username, password = self._get_login_info()
+                if username:
+                    data.update({
+                        'grant_type': 'password',
+                        'password': password,
+                        'username': username,
+                    })
+                else:
+                    data['grant_type'] = 'client_credentials'
+                try:
+                    token = self._download_json(
+                        'https://graphql.api.dailymotion.com/oauth/token',
+                        None, 'Downloading Access Token',
+                        data=urlencode_postdata(data))['access_token']
+                except ExtractorError as e:
+                    if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
+                        raise ExtractorError(self._parse_json(
+                            e.cause.read().decode(), xid)['error_description'], expected=True)
+                    raise
+                self._set_dailymotion_cookie('access_token' if username else 'client_token', token)
+            self._HEADERS['Authorization'] = 'Bearer ' + token
+
+        resp = self._download_json(
+            'https://graphql.api.dailymotion.com/', xid, note, data=json.dumps({
+                'query': '''{
+  %s(xid: "%s"%s) {
+    %s
+  }
+}''' % (object_type, xid, ', ' + filter_extra if filter_extra else '', object_fields),
+            }).encode(), headers=self._HEADERS)
+        obj = resp['data'][object_type]
+        if not obj:
+            raise ExtractorError(resp['errors'][0]['message'], expected=True)
+        return obj
  
  
  class DailymotionIE(DailymotionBaseInfoExtractor):
@@ -54,18 +97,9 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                              (?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
                              (?:www\.)?lequipe\.fr/video
                          )
-                        /(?P<id>[^/?_]+)
+                        /(?P<id>[^/?_]+)(?:.+?\bplaylist=(?P<playlist_id>x[0-9a-z]+))?
                      '''
      IE_NAME = 'dailymotion'
-
-    _FORMATS = [
-        ('stream_h264_ld_url', 'ld'),
-        ('stream_h264_url', 'standard'),
-        ('stream_h264_hq_url', 'hq'),
-        ('stream_h264_hd_url', 'hd'),
-        ('stream_h264_hd1080_url', 'hd180'),
-    ]
-
      _TESTS = [{
          'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
          'md5': '074b95bdee76b9e3654137aee9c79dfe',
@@ -74,7 +108,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
              'ext': 'mp4',
              'title': 'Office Christmas Party Review –  Jason Bateman, Olivia Munn, T.J. Miller',
              'description': 'Office Christmas Party Review -  Jason Bateman, Olivia Munn, T.J. Miller',
-            'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
              'duration': 187,
              'timestamp': 1493651285,
              'upload_date': '20170501',
@@ -146,7 +179,16 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
      }, {
          'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
          'only_matching': True,
+    }, {
+        'url': 'https://www.dailymotion.com/video/x3z49k?playlist=xv4bw',
+        'only_matching': True,
      }]
+    _GEO_BYPASS = False
+    _COMMON_MEDIA_FIELDS = '''description
+      geoblockedCountries {
+        allowed
+      }
+      xid'''
  
      @staticmethod
      def _extract_urls(webpage):
@@ -162,264 +204,140 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
          return urls
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage_no_ff(
-            'https://www.dailymotion.com/video/%s' % video_id, video_id)
-
-        age_limit = self._rta_search(webpage)
-
-        description = self._og_search_description(
-            webpage, default=None) or self._html_search_meta(
-            'description', webpage, 'description')
-
-        view_count_str = self._search_regex(
-            (r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
-             r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
-            webpage, 'view count', default=None)
-        if view_count_str:
-            view_count_str = re.sub(r'\s', '', view_count_str)
-        view_count = str_to_int(view_count_str)
-        comment_count = int_or_none(self._search_regex(
-            r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
-            webpage, 'comment count', default=None))
-
-        player_v5 = self._search_regex(
-            [r'buildPlayer\(({.+?})\);\n',  # See https://github.com/ytdl-org/youtube-dl/issues/7826
-             r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
-             r'buildPlayer\(({.+?})\);',
-             r'var\s+config\s*=\s*({.+?});',
-             # New layout regex (see https://github.com/ytdl-org/youtube-dl/issues/13580)
-             r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
-            webpage, 'player v5', default=None)
-        if player_v5:
-            player = self._parse_json(player_v5, video_id, fatal=False) or {}
-            metadata = try_get(player, lambda x: x['metadata'], dict)
-            if not metadata:
-                metadata_url = url_or_none(try_get(
-                    player, lambda x: x['context']['metadata_template_url1']))
-                if metadata_url:
-                    metadata_url = metadata_url.replace(':videoId', video_id)
-                else:
-                    metadata_url = update_url_query(
-                        'https://www.dailymotion.com/player/metadata/video/%s'
-                        % video_id, {
-                            'embedder': url,
-                            'integration': 'inline',
-                            'GK_PV5_NEON': '1',
-                        })
-                metadata = self._download_json(
-                    metadata_url, video_id, 'Downloading metadata JSON')
-
-            if try_get(metadata, lambda x: x['error']['type']) == 'password_protected':
-                password = self._downloader.params.get('videopassword')
-                if password:
-                    r = int(metadata['id'][1:], 36)
-                    us64e = lambda x: base64.urlsafe_b64encode(x).decode().strip('=')
-                    t = ''.join(random.choice(string.ascii_letters) for i in range(10))
-                    n = us64e(compat_struct_pack('I', r))
-                    i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
-                    metadata = self._download_json(
-                        'http://www.dailymotion.com/player/metadata/video/p' + i + t + n, video_id)
-
-            self._check_error(metadata)
-
-            formats = []
-            for quality, media_list in metadata['qualities'].items():
-                for media in media_list:
-                    media_url = media.get('url')
-                    if not media_url:
-                        continue
-                    type_ = media.get('type')
-                    if type_ == 'application/vnd.lumberjack.manifest':
-                        continue
-                    ext = mimetype2ext(type_) or determine_ext(media_url)
-                    if ext == 'm3u8':
-                        m3u8_formats = self._extract_m3u8_formats(
-                            media_url, video_id, 'mp4', preference=-1,
-                            m3u8_id='hls', fatal=False)
-                        for f in m3u8_formats:
-                            f['url'] = f['url'].split('#')[0]
-                            formats.append(f)
-                    elif ext == 'f4m':
-                        formats.extend(self._extract_f4m_formats(
-                            media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
-                    else:
-                        f = {
-                            'url': media_url,
-                            'format_id': 'http-%s' % quality,
-                            'ext': ext,
-                        }
-                        m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
-                        if m:
-                            f.update({
-                                'width': int(m.group('width')),
-                                'height': int(m.group('height')),
-                            })
-                        formats.append(f)
-            self._sort_formats(formats)
-
-            title = metadata['title']
-            duration = int_or_none(metadata.get('duration'))
-            timestamp = int_or_none(metadata.get('created_time'))
-            thumbnail = metadata.get('poster_url')
-            uploader = metadata.get('owner', {}).get('screenname')
-            uploader_id = metadata.get('owner', {}).get('id')
-
-            subtitles = {}
-            subtitles_data = metadata.get('subtitles', {}).get('data', {})
-            if subtitles_data and isinstance(subtitles_data, dict):
-                for subtitle_lang, subtitle in subtitles_data.items():
-                    subtitles[subtitle_lang] = [{
-                        'ext': determine_ext(subtitle_url),
-                        'url': subtitle_url,
-                    } for subtitle_url in subtitle.get('urls', [])]
-
-            return {
-                'id': video_id,
-                'title': title,
-                'description': description,
-                'thumbnail': thumbnail,
-                'duration': duration,
-                'timestamp': timestamp,
-                'uploader': uploader,
-                'uploader_id': uploader_id,
-                'age_limit': age_limit,
-                'view_count': view_count,
-                'comment_count': comment_count,
-                'formats': formats,
-                'subtitles': subtitles,
-            }
-
-        # vevo embed
-        vevo_id = self._search_regex(
-            r'<link rel="video_src" href="[^"]*?vevo\.com[^"]*?video=(?P<id>[\w]*)',
-            webpage, 'vevo embed', default=None)
-        if vevo_id:
-            return self.url_result('vevo:%s' % vevo_id, 'Vevo')
-
-        # fallback old player
-        embed_page = self._download_webpage_no_ff(
-            'https://www.dailymotion.com/embed/video/%s' % video_id,
-            video_id, 'Downloading embed page')
-
-        timestamp = parse_iso8601(self._html_search_meta(
-            'video:release_date', webpage, 'upload date'))
-
-        info = self._parse_json(
-            self._search_regex(
-                r'var info = ({.*?}),$', embed_page,
-                'video info', flags=re.MULTILINE),
-            video_id)
-
-        self._check_error(info)
+        video_id, playlist_id = re.match(self._VALID_URL, url).groups()
+
+        if playlist_id:
+            if not self._downloader.params.get('noplaylist'):
+                self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % playlist_id)
+                return self.url_result(
+                    'http://www.dailymotion.com/playlist/' + playlist_id,
+                    'DailymotionPlaylist', playlist_id)
+            self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
+
+        password = self._downloader.params.get('videopassword')
+        media = self._call_api(
+            'media', video_id, '''... on Video {
+      %s
+      stats {
+        likes {
+          total
+        }
+        views {
+          total
+        }
+      }
+    }
+    ... on Live {
+      %s
+      audienceCount
+      isOnAir
+    }''' % (self._COMMON_MEDIA_FIELDS, self._COMMON_MEDIA_FIELDS), 'Downloading media JSON metadata',
+            'password: "%s"' % self._downloader.params.get('videopassword') if password else None)
+        xid = media['xid']
+
+        metadata = self._download_json(
+            'https://www.dailymotion.com/player/metadata/video/' + xid,
+            xid, 'Downloading metadata JSON',
+            query={'app': 'com.dailymotion.neon'})
+
+        error = metadata.get('error')
+        if error:
+            title = error.get('title') or error['raw_message']
+            # See https://developer.dailymotion.com/api#access-error
+            if error.get('code') == 'DM007':
+                allowed_countries = try_get(media, lambda x: x['geoblockedCountries']['allowed'], list)
+                self.raise_geo_restricted(msg=title, countries=allowed_countries)
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, title), expected=True)
  
+        title = metadata['title']
+        is_live = media.get('isOnAir')
          formats = []
-        for (key, format_id) in self._FORMATS:
-            video_url = info.get(key)
-            if video_url is not None:
-                m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
-                if m_size is not None:
-                    width, height = map(int_or_none, (m_size.group(1), m_size.group(2)))
+        for quality, media_list in metadata['qualities'].items():
+            for m in media_list:
+                media_url = m.get('url')
+                media_type = m.get('type')
+                if not media_url or media_type == 'application/vnd.lumberjack.manifest':
+                    continue
+                if media_type == 'application/x-mpegURL':
+                    formats.extend(self._extract_m3u8_formats(
+                        media_url, video_id, 'mp4',
+                        'm3u8' if is_live else 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
                  else:
-                    width, height = None, None
-                formats.append({
-                    'url': video_url,
-                    'ext': 'mp4',
-                    'format_id': format_id,
-                    'width': width,
-                    'height': height,
-                })
+                    f = {
+                        'url': media_url,
+                        'format_id': 'http-' + quality,
+                    }
+                    m = re.search(r'/H264-(\d+)x(\d+)(?:-(60)/)?', media_url)
+                    if m:
+                        width, height, fps = map(int_or_none, m.groups())
+                        f.update({
+                            'fps': fps,
+                            'height': height,
+                            'width': width,
+                        })
+                    formats.append(f)
+        for f in formats:
+            f['url'] = f['url'].split('#')[0]
+            if not f.get('fps') and f['format_id'].endswith('@60'):
+                f['fps'] = 60
          self._sort_formats(formats)
  
-        # subtitles
-        video_subtitles = self.extract_subtitles(video_id, webpage)
-
-        title = self._og_search_title(webpage, default=None)
-        if title is None:
-            title = self._html_search_regex(
-                r'(?s)<span\s+id="video_title"[^>]*>(.*?)</span>', webpage,
-                'title')
+        subtitles = {}
+        subtitles_data = try_get(metadata, lambda x: x['subtitles']['data'], dict) or {}
+        for subtitle_lang, subtitle in subtitles_data.items():
+            subtitles[subtitle_lang] = [{
+                'url': subtitle_url,
+            } for subtitle_url in subtitle.get('urls', [])]
+
+        thumbnails = []
+        for height, poster_url in metadata.get('posters', {}).items():
+            thumbnails.append({
+                'height': int_or_none(height),
+                'id': height,
+                'url': poster_url,
+            })
+
+        owner = metadata.get('owner') or {}
+        stats = media.get('stats') or {}
+        get_count = lambda x: int_or_none(try_get(stats, lambda y: y[x + 's']['total']))
  
          return {
              'id': video_id,
+            'title': self._live_title(title) if is_live else title,
+            'description': clean_html(media.get('description')),
+            'thumbnails': thumbnails,
+            'duration': int_or_none(metadata.get('duration')) or None,
+            'timestamp': int_or_none(metadata.get('created_time')),
+            'uploader': owner.get('screenname'),
+            'uploader_id': owner.get('id') or metadata.get('screenname'),
+            'age_limit': 18 if metadata.get('explicit') else 0,
+            'tags': metadata.get('tags'),
+            'view_count': get_count('view') or int_or_none(media.get('audienceCount')),
+            'like_count': get_count('like'),
              'formats': formats,
-            'uploader': info['owner.screenname'],
-            'timestamp': timestamp,
-            'title': title,
-            'description': description,
-            'subtitles': video_subtitles,
-            'thumbnail': info['thumbnail_url'],
-            'age_limit': age_limit,
-            'view_count': view_count,
-            'duration': info['duration']
+            'subtitles': subtitles,
+            'is_live': is_live,
          }
  
-    def _check_error(self, info):
-        error = info.get('error')
-        if error:
-            title = error.get('title') or error['message']
-            # See https://developer.dailymotion.com/api#access-error
-            if error.get('code') == 'DM007':
-                self.raise_geo_restricted(msg=title)
-            raise ExtractorError(
-                '%s said: %s' % (self.IE_NAME, title), expected=True)
-
-    def _get_subtitles(self, video_id, webpage):
-        try:
-            sub_list = self._download_webpage(
-                'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
-                video_id, note=False)
-        except ExtractorError as err:
-            self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err))
-            return {}
-        info = json.loads(sub_list)
-        if (info['total'] > 0):
-            sub_lang_list = dict((l['language'], [{'url': l['url'], 'ext': 'srt'}]) for l in info['list'])
-            return sub_lang_list
-        self._downloader.report_warning('video doesn\'t have subtitles')
-        return {}
-
  
-class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
-    IE_NAME = 'dailymotion:playlist'
-    _VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
-    _TESTS = [{
-        'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
-        'info_dict': {
-            'title': 'SPORT',
-            'id': 'xv4bw',
-        },
-        'playlist_mincount': 20,
-    }]
+class DailymotionPlaylistBaseIE(DailymotionBaseInfoExtractor):
      _PAGE_SIZE = 100
  
-    def _fetch_page(self, playlist_id, authorizaion, page):
+    def _fetch_page(self, playlist_id, page):
          page += 1
-        videos = self._download_json(
-            'https://graphql.api.dailymotion.com',
-            playlist_id, 'Downloading page %d' % page,
-            data=json.dumps({
-                'query': '''{
-  collection(xid: "%s") {
-    videos(first: %d, page: %d) {
-      pageInfo {
-        hasNextPage
-        nextPage
-      }
+        videos = self._call_api(
+            self._OBJECT_TYPE, playlist_id,
+            '''videos(allowExplicit: %s, first: %d, page: %d) {
        edges {
          node {
            xid
            url
          }
        }
-    }
-  }
-}''' % (playlist_id, self._PAGE_SIZE, page)
-            }).encode(), headers={
-                'Authorization': authorizaion,
-                'Origin': 'https://www.dailymotion.com',
-            })['data']['collection']['videos']
+    }''' % ('false' if self._FAMILY_FILTER else 'true', self._PAGE_SIZE, page),
+            'Downloading page %d' % page)['videos']
          for edge in videos['edges']:
              node = edge['node']
              yield self.url_result(
@@ -427,86 +345,49 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
  
      def _real_extract(self, url):
          playlist_id = self._match_id(url)
-        webpage = self._download_webpage(url, playlist_id)
-        api = self._parse_json(self._search_regex(
-            r'__PLAYER_CONFIG__\s*=\s*({.+?});',
-            webpage, 'player config'), playlist_id)['context']['api']
-        auth = self._download_json(
-            api.get('auth_url', 'https://graphql.api.dailymotion.com/oauth/token'),
-            playlist_id, data=urlencode_postdata({
-                'client_id': api.get('client_id', 'f1a362d288c1b98099c7'),
-                'client_secret': api.get('client_secret', 'eea605b96e01c796ff369935357eca920c5da4c5'),
-                'grant_type': 'client_credentials',
-            }))
-        authorizaion = '%s %s' % (auth.get('token_type', 'Bearer'), auth['access_token'])
          entries = OnDemandPagedList(functools.partial(
-            self._fetch_page, playlist_id, authorizaion), self._PAGE_SIZE)
+            self._fetch_page, playlist_id), self._PAGE_SIZE)
          return self.playlist_result(
-            entries, playlist_id,
-            self._og_search_title(webpage))
+            entries, playlist_id)
+
+
+class DailymotionPlaylistIE(DailymotionPlaylistBaseIE):
+    IE_NAME = 'dailymotion:playlist'
+    _VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
+    _TESTS = [{
+        'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
+        'info_dict': {
+            'id': 'xv4bw',
+        },
+        'playlist_mincount': 20,
+    }]
+    _OBJECT_TYPE = 'collection'
  
  
-class DailymotionUserIE(DailymotionBaseInfoExtractor):
+class DailymotionUserIE(DailymotionPlaylistBaseIE):
      IE_NAME = 'dailymotion:user'
-    _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
-    _MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
-    _PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
+    _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<id>[^/]+)'
      _TESTS = [{
          'url': 'https://www.dailymotion.com/user/nqtv',
          'info_dict': {
              'id': 'nqtv',
-            'title': 'Rémi Gaillard',
          },
-        'playlist_mincount': 100,
+        'playlist_mincount': 152,
      }, {
          'url': 'http://www.dailymotion.com/user/UnderProject',
          'info_dict': {
              'id': 'UnderProject',
-            'title': 'UnderProject',
          },
-        'playlist_mincount': 1800,
-        'expected_warnings': [
-            'Stopped at duplicated page',
-        ],
+        'playlist_mincount': 1000,
          'skip': 'Takes too long time',
+    }, {
+        'url': 'https://www.dailymotion.com/user/nqtv',
+        'info_dict': {
+            'id': 'nqtv',
+        },
+        'playlist_mincount': 148,
+        'params': {
+            'age_limit': 0,
+        },
      }]
-
-    def _extract_entries(self, id):
-        video_ids = set()
-        processed_urls = set()
-        for pagenum in itertools.count(1):
-            page_url = self._PAGE_TEMPLATE % (id, pagenum)
-            webpage, urlh = self._download_webpage_handle_no_ff(
-                page_url, id, 'Downloading page %s' % pagenum)
-            if urlh.geturl() in processed_urls:
-                self.report_warning('Stopped at duplicated page %s, which is the same as %s' % (
-                    page_url, urlh.geturl()), id)
-                break
-
-            processed_urls.add(urlh.geturl())
-
-            for video_id in re.findall(r'data-xid="(.+?)"', webpage):
-                if video_id not in video_ids:
-                    yield self.url_result(
-                        'http://www.dailymotion.com/video/%s' % video_id,
-                        DailymotionIE.ie_key(), video_id)
-                    video_ids.add(video_id)
-
-            if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
-                break
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        user = mobj.group('user')
-        webpage = self._download_webpage(
-            'https://www.dailymotion.com/user/%s' % user, user)
-        full_user = unescapeHTML(self._html_search_regex(
-            r'<a class="nav-image" title="([^"]+)" href="/%s">' % re.escape(user),
-            webpage, 'user'))
-
-        return {
-            '_type': 'playlist',
-            'id': user,
-            'title': full_user,
-            'entries': self._extract_entries(user),
-        }
+    _OBJECT_TYPE = 'channel'
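
The new format loop in the dailymotion.py hunks above reads the width, the height and an optional 60 fps marker out of each media URL. A minimal standalone sketch of that parsing follows. The helper name `parse_h264_dimensions` and the sample URLs are hypothetical, and `int_or_none` is a simplified local stand-in for the youtube-dl utility, not the real implementation.

```python
import re


def int_or_none(value):
    # Simplified stand-in (assumption) for youtube_dl.utils.int_or_none.
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_h264_dimensions(media_url):
    # Same pattern as in the diff: the "-60/" suffix is optional.
    m = re.search(r'/H264-(\d+)x(\d+)(?:-(60)/)?', media_url)
    if not m:
        return {}
    width, height, fps = map(int_or_none, m.groups())
    return {'width': width, 'height': height, 'fps': fps}


# Hypothetical URLs, for illustration only.
print(parse_h264_dimensions('https://example.com/cdn/H264-1280x720-60/video/x1.mp4'))
print(parse_h264_dimensions('https://example.com/cdn/H264-512x384/video/x1.mp4'))
```

When the optional group does not match, `fps` stays `None`, which is presumably why the diff follows up with a fallback that sets 60 fps for format ids ending in `@60`.
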
diff --git a/youtube_dl/extractor/daisuki.py b/youtube_dl/extractor/daisuki.py
deleted file mode 100644
index dbc1aa5..0000000
--- a/youtube_dl/extractor/daisuki.py
+++ /dev/null
@@ -1,154 +0,0 @@
-from __future__ import unicode_literals
-
-import base64
-import json
-import random
-import re
-
-from .common import InfoExtractor
-from ..aes import (
-    aes_cbc_decrypt,
-    aes_cbc_encrypt,
-)
-from ..compat import compat_b64decode
-from ..utils import (
-    bytes_to_intlist,
-    bytes_to_long,
-    extract_attributes,
-    ExtractorError,
-    intlist_to_bytes,
-    js_to_json,
-    int_or_none,
-    long_to_bytes,
-    pkcs1pad,
-)
-
-
-class DaisukiMottoIE(InfoExtractor):
-    _VALID_URL = r'https?://motto\.daisuki\.net/framewatch/embed/[^/]+/(?P<id>[0-9a-zA-Z]{3})'
-
-    _TEST = {
-        'url': 'http://motto.daisuki.net/framewatch/embed/embedDRAGONBALLSUPERUniverseSurvivalsaga/V2e/760/428',
-        'info_dict': {
-            'id': 'V2e',
-            'ext': 'mp4',
-            'title': '#117 SHOWDOWN OF LOVE! ANDROIDS VS UNIVERSE 2!!',
-            'subtitles': {
-                'mul': [{
-                    'ext': 'ttml',
-                }],
-            },
-        },
-        'params': {
-            'skip_download': True,  # AES-encrypted HLS stream
-        },
-    }
-
-    # The public key in PEM format can be found in clientlibs_anime_watch.min.js
-    _RSA_KEY = (0xc5524c25e8e14b366b3754940beeb6f96cb7e2feef0b932c7659a0c5c3bf173d602464c2df73d693b513ae06ff1be8f367529ab30bf969c5640522181f2a0c51ea546ae120d3d8d908595e4eff765b389cde080a1ef7f1bbfb07411cc568db73b7f521cedf270cbfbe0ddbc29b1ac9d0f2d8f4359098caffee6d07915020077d, 65537)
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        flashvars = self._parse_json(self._search_regex(
-            r'(?s)var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
-            video_id, transform_source=js_to_json)
-
-        iv = [0] * 16
-
-        data = {}
-        for key in ('device_cd', 'mv_id', 'ss1_prm', 'ss2_prm', 'ss3_prm', 'ss_id'):
-            data[key] = flashvars.get(key, '')
-
-        encrypted_rtn = None
-
-        # Some AES keys are rejected. Try it with different AES keys
-        for idx in range(5):
-            aes_key = [random.randint(0, 254) for _ in range(32)]
-            padded_aeskey = intlist_to_bytes(pkcs1pad(aes_key, 128))
-
-            n, e = self._RSA_KEY
-            encrypted_aeskey = long_to_bytes(pow(bytes_to_long(padded_aeskey), e, n))
-            init_data = self._download_json(
-                'http://motto.daisuki.net/fastAPI/bgn/init/',
-                video_id, query={
-                    's': flashvars.get('s', ''),
-                    'c': flashvars.get('ss3_prm', ''),
-                    'e': url,
-                    'd': base64.b64encode(intlist_to_bytes(aes_cbc_encrypt(
-                        bytes_to_intlist(json.dumps(data)),
-                        aes_key, iv))).decode('ascii'),
-                    'a': base64.b64encode(encrypted_aeskey).decode('ascii'),
-                }, note='Downloading JSON metadata' + (' (try #%d)' % (idx + 1) if idx > 0 else ''))
-
-            if 'rtn' in init_data:
-                encrypted_rtn = init_data['rtn']
-                break
-
-            self._sleep(5, video_id)
-
-        if encrypted_rtn is None:
-            raise ExtractorError('Failed to fetch init data')
-
-        rtn = self._parse_json(
-            intlist_to_bytes(aes_cbc_decrypt(bytes_to_intlist(
-                compat_b64decode(encrypted_rtn)),
-                aes_key, iv)).decode('utf-8').rstrip('\0'),
-            video_id)
-
-        title = rtn['title_str']
-
-        formats = self._extract_m3u8_formats(
-            rtn['play_url'], video_id, ext='mp4', entry_protocol='m3u8_native')
-
-        subtitles = {}
-        caption_url = rtn.get('caption_url')
-        if caption_url:
-            # mul: multiple languages
-            subtitles['mul'] = [{
-                'url': caption_url,
-                'ext': 'ttml',
-            }]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
-
-
-class DaisukiMottoPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://motto\.daisuki\.net/(?P<id>information)/'
-
-    _TEST = {
-        'url': 'http://motto.daisuki.net/information/',
-        'info_dict': {
-            'title': 'DRAGON BALL SUPER',
-        },
-        'playlist_mincount': 117,
-    }
-
-    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, playlist_id)
-
-        entries = []
-        for li in re.findall(r'(<li[^>]+?data-product_id="[a-zA-Z0-9]{3}"[^>]+>)', webpage):
-            attr = extract_attributes(li)
-            ad_id = attr.get('data-ad_id')
-            product_id = attr.get('data-product_id')
-            if ad_id and product_id:
-                episode_id = attr.get('data-chapter')
-                entries.append({
-                    '_type': 'url_transparent',
-                    'url': 'http://motto.daisuki.net/framewatch/embed/%s/%s/760/428' % (ad_id, product_id),
-                    'episode_id': episode_id,
-                    'episode_number': int_or_none(episode_id),
-                    'ie_key': 'DaisukiMotto',
-                })
-
-        return self.playlist_result(entries, playlist_title='DRAGON BALL SUPER')
diff --git a/youtube_dl/extractor/daum.py b/youtube_dl/extractor/daum.py
index 76f0218923536b29550c9384ce8348baf05289d5..13709557754afc133c2654e202d52c32d5d85d66 100644
--- a/youtube_dl/extractor/daum.py
+++ b/youtube_dl/extractor/daum.py
@@ -2,25 +2,21 @@
  
  from __future__ import unicode_literals
  
-import re
  import itertools
  
  from .common import InfoExtractor
  from ..compat import (
      compat_parse_qs,
      compat_urllib_parse_unquote,
-    compat_urllib_parse_urlencode,
      compat_urlparse,
  )
-from ..utils import (
-    int_or_none,
-    str_to_int,
-    xpath_text,
-    unescapeHTML,
-)
  
  
-class DaumIE(InfoExtractor):
+class DaumBaseIE(InfoExtractor):
+    _KAKAO_EMBED_BASE = 'http://tv.kakao.com/embed/player/cliplink/'
+
+
+class DaumIE(DaumBaseIE):
      _VALID_URL = r'https?://(?:(?:m\.)?tvpot\.daum\.net/v/|videofarm\.daum\.net/controller/player/VodPlayer\.swf\?vid=)(?P<id>[^?#&]+)'
      IE_NAME = 'daum.net'
  
@@ -36,6 +32,9 @@ class DaumIE(InfoExtractor):
              'duration': 2117,
              'view_count': int,
              'comment_count': int,
+            'uploader_id': 186139,
+            'uploader': '콘간지',
+            'timestamp': 1387310323,
          },
      }, {
          'url': 'http://m.tvpot.daum.net/v/65139429',
@@ -44,11 +43,14 @@ class DaumIE(InfoExtractor):
              'ext': 'mp4',
              'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118',
              'description': 'md5:79794514261164ff27e36a21ad229fc5',
-            'upload_date': '20150604',
+            'upload_date': '20150118',
              'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 154,
              'view_count': int,
              'comment_count': int,
+            'uploader': 'MBC 예능',
+            'uploader_id': 132251,
+            'timestamp': 1421604228,
          },
      }, {
          'url': 'http://tvpot.daum.net/v/07dXWRka62Y%24',
@@ -59,12 +61,15 @@ class DaumIE(InfoExtractor):
              'id': 'vwIpVpCQsT8$',
              'ext': 'flv',
              'title': '01-Korean War ( Trouble on the horizon )',
-            'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름',
+            'description': 'Korean War 01\r\nTrouble on the horizon\r\n전쟁의 먹구름',
              'upload_date': '20080223',
              'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 249,
              'view_count': int,
              'comment_count': int,
+            'uploader': '까칠한 墮落始祖 황비홍님의',
+            'uploader_id': 560824,
+            'timestamp': 1203770745,
          },
      }, {
          # Requires dte_type=WEB (#9972)
@@ -73,60 +78,24 @@ class DaumIE(InfoExtractor):
          'info_dict': {
              'id': 's3794Uf1NZeZ1qMpGpeqeRU',
              'ext': 'mp4',
-            'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny) [쇼! 음악중심] 508회 20160611',
-            'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\n\n[쇼! 음악중심] 20160611, 507회',
-            'upload_date': '20160611',
+            'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)',
+            'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\r\n\r\n[쇼! 음악중심] 20160611, 507회',
+            'upload_date': '20170129',
+            'uploader': '쇼! 음악중심',
+            'uploader_id': 2653210,
+            'timestamp': 1485684628,
          },
      }]
  
      def _real_extract(self, url):
          video_id = compat_urllib_parse_unquote(self._match_id(url))
-        movie_data = self._download_json(
-            'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json',
-            video_id, 'Downloading video formats info', query={'vid': video_id, 'dte_type': 'WEB'})
-
-        # For urls like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
-        if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
-            return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
-
-        info = self._download_xml(
-            'http://tvpot.daum.net/clip/ClipInfoXml.do', video_id,
-            'Downloading video info', query={'vid': video_id})
-
-        formats = []
-        for format_el in movie_data['output_list']['output_list']:
-            profile = format_el['profile']
-            format_query = compat_urllib_parse_urlencode({
-                'vid': video_id,
-                'profile': profile,
-            })
-            url_doc = self._download_xml(
-                'http://videofarm.daum.net/controller/api/open/v1_2/MovieLocation.apixml?' + format_query,
-                video_id, note='Downloading video data for %s format' % profile)
-            format_url = url_doc.find('result/url').text
-            formats.append({
-                'url': format_url,
-                'format_id': profile,
-                'width': int_or_none(format_el.get('width')),
-                'height': int_or_none(format_el.get('height')),
-                'filesize': int_or_none(format_el.get('filesize')),
-            })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': info.find('TITLE').text,
-            'formats': formats,
-            'thumbnail': xpath_text(info, 'THUMB_URL'),
-            'description': xpath_text(info, 'CONTENTS'),
-            'duration': int_or_none(xpath_text(info, 'DURATION')),
-            'upload_date': info.find('REGDTTM').text[:8],
-            'view_count': str_to_int(xpath_text(info, 'PLAY_CNT')),
-            'comment_count': str_to_int(xpath_text(info, 'COMMENT_CNT')),
-        }
+        if not video_id.isdigit():
+            video_id += '@my'
+        return self.url_result(
+            self._KAKAO_EMBED_BASE + video_id, 'Kakao', video_id)
  
  
-class DaumClipIE(InfoExtractor):
+class DaumClipIE(DaumBaseIE):
      _VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:clip/ClipView.(?:do|tv)|mypot/View.do)\?.*?clipid=(?P<id>\d+)'
      IE_NAME = 'daum.net:clip'
      _URL_TEMPLATE = 'http://tvpot.daum.net/clip/ClipView.do?clipid=%s'
@@ -142,6 +111,9 @@ class DaumClipIE(InfoExtractor):
              'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 3868,
              'view_count': int,
+            'uploader': 'GOMeXP',
+            'uploader_id': 6667,
+            'timestamp': 1377911092,
          },
      }, {
          'url': 'http://m.tvpot.daum.net/clip/ClipView.tv?clipid=54999425',
@@ -154,22 +126,8 @@ class DaumClipIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        clip_info = self._download_json(
-            'http://tvpot.daum.net/mypot/json/GetClipInfo.do?clipid=%s' % video_id,
-            video_id, 'Downloading clip info')['clip_bean']
-
-        return {
-            '_type': 'url_transparent',
-            'id': video_id,
-            'url': 'http://tvpot.daum.net/v/%s' % clip_info['vid'],
-            'title': unescapeHTML(clip_info['title']),
-            'thumbnail': clip_info.get('thumb_url'),
-            'description': clip_info.get('contents'),
-            'duration': int_or_none(clip_info.get('duration')),
-            'upload_date': clip_info.get('up_date')[:8],
-            'view_count': int_or_none(clip_info.get('play_count')),
-            'ie_key': 'Daum',
-        }
+        return self.url_result(
+            self._KAKAO_EMBED_BASE + video_id, 'Kakao', video_id)
  
  
  class DaumListIE(InfoExtractor):
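
In the daum.py hunks above, both DaumIE and DaumClipIE stop scraping the old Daum endpoints and hand the video over to the Kakao extractor through an embed URL. A minimal sketch of how that URL is built in DaumIE is shown here. The function name `kakao_embed_url` is hypothetical; the base URL and the `@my` rule are taken from the diff.

```python
KAKAO_EMBED_BASE = 'http://tv.kakao.com/embed/player/cliplink/'


def kakao_embed_url(video_id):
    # Non-numeric ids get the '@my' suffix, as in the new DaumIE._real_extract.
    if not video_id.isdigit():
        video_id += '@my'
    return KAKAO_EMBED_BASE + video_id


print(kakao_embed_url('65139429'))
print(kakao_embed_url('vwIpVpCQsT8$'))
```

DaumClipIE skips the suffix check in the diff, since its `_VALID_URL` only captures numeric clip ids.
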
diff --git a/youtube_dl/extractor/dctp.py b/youtube_dl/extractor/dctp.py
index 04ff214f727826a60bbdde5ec17bb48ba004a91e..e700f8d86531415da0f1db0f2ccdaef6ea10ac53 100644
--- a/youtube_dl/extractor/dctp.py
+++ b/youtube_dl/extractor/dctp.py
@@ -16,10 +16,11 @@ class DctpTvIE(InfoExtractor):
      _TESTS = [{
          # 4x3
          'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
+        'md5': '3ffbd1556c3fe210724d7088fad723e3',
          'info_dict': {
              'id': '95eaa4f33dad413aa17b4ee613cccc6c',
              'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
-            'ext': 'flv',
+            'ext': 'm4v',
              'title': 'Videoinstallation für eine Kaufhausfassade',
              'description': 'Kurzfilm',
              'thumbnail': r're:^https?://.*\.jpg$',
@@ -27,10 +28,6 @@ class DctpTvIE(InfoExtractor):
              'timestamp': 1302172322,
              'upload_date': '20110407',
          },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
      }, {
          # 16x9
          'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
@@ -59,33 +56,26 @@ class DctpTvIE(InfoExtractor):
  
          uuid = media['uuid']
          title = media['title']
-        ratio = '16x9' if media.get('is_wide') else '4x3'
-        play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
-
-        servers = self._download_json(
-            'http://www.dctp.tv/streaming_servers/', display_id,
-            note='Downloading server list JSON', fatal=False)
-
-        if servers:
-            endpoint = next(
-                server['endpoint']
-                for server in servers
-                if url_or_none(server.get('endpoint'))
-                and 'cloudfront' in server['endpoint'])
-        else:
-            endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
-
-        app = self._search_regex(
-            r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
-
-        formats = [{
-            'url': endpoint,
-            'app': app,
-            'play_path': play_path,
-            'page_url': url,
-            'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
-            'ext': 'flv',
-        }]
+        is_wide = media.get('is_wide')
+        formats = []
+
+        def add_formats(suffix):
+            templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
+            formats.extend([{
+                'format_id': 'hls-' + suffix,
+                'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
+                'protocol': 'm3u8_native',
+            }, {
+                'format_id': 's3-' + suffix,
+                'url': templ % 'completed-media.s3.amazonaws.com',
+            }, {
+                'format_id': 'http-' + suffix,
+                'url': templ % 'cdn-media.dctp.tv',
+            }])
+
+        add_formats('0500_' + ('16x9' if is_wide else '4x3'))
+        if is_wide:
+            add_formats('720p')
  
          thumbnails = []
          images = media.get('images')
diff --git a/youtube_dl/extractor/discovery.py b/youtube_dl/extractor/discovery.py

index 6a2712cc50429b7297a9d4fe9e1ec2d80177986e..e0139cc862d74bc3c20a9d6567747b96b642c730 100644
--- a/youtube_dl/extractor/discovery.py
+++ b/youtube_dl/extractor/discovery.py
@@ -13,8 +13,8 @@ from ..compat import compat_HTTPError
  class DiscoveryIE(DiscoveryGoBaseIE):
      _VALID_URL = r'''(?x)https?://
          (?P<site>
-            (?:(?:www|go)\.)?discovery|
-            (?:www\.)?
+            go\.discovery|
+            www\.
                  (?:
                      investigationdiscovery|
                      discoverylife|
@@ -22,8 +22,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
                      ahctv|
                      destinationamerica|
                      sciencechannel|
-                    tlc|
-                    velocity
+                    tlc
                  )|
              watch\.
                  (?:
@@ -83,7 +82,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
                      'authRel': 'authorization',
                      'client_id': '3020a40c2356a645b4b4',
                      'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
-                    'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
+                    'redirectUri': 'https://www.discovery.com/',
                  })['access_token']
  
          headers = self.geo_verification_headers()
diff --git a/youtube_dl/extractor/discoverynetworks.py b/youtube_dl/extractor/discoverynetworks.py

index fba1ef22180a6f4adf93c83e23d8370b6da87c56..607a54948535f018f02748b9478f1139d6edb237 100644
--- a/youtube_dl/extractor/discoverynetworks.py
+++ b/youtube_dl/extractor/discoverynetworks.py
@@ -3,63 +3,38 @@ from __future__ import unicode_literals
  
  import re
  
-from .brightcove import BrightcoveLegacyIE
  from .dplay import DPlayIE
-from ..compat import (
-    compat_parse_qs,
-    compat_urlparse,
-)
-from ..utils import smuggle_url
  
  
  class DiscoveryNetworksDeIE(DPlayIE):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?(?P<site>discovery|tlc|animalplanet|dmax)\.de/
-                        (?:
-                           .*\#(?P<id>\d+)|
-                           (?:[^/]+/)*videos/(?P<display_id>[^/?#]+)|
-                           programme/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)
-                        )'''
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:tlc|dmax)\.de|dplay\.co\.uk)/(?:programme|show)/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)'
  
      _TESTS = [{
-        'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
+        'url': 'https://www.tlc.de/programme/breaking-amish/video/die-welt-da-drauen/DCB331270001100',
          'info_dict': {
-            'id': '3235167922001',
+            'id': '78867',
              'ext': 'mp4',
-            'title': 'Breaking Amish: Die Welt da draußen',
-            'description': (
-                'Vier Amische und eine Mennonitin wagen in New York'
-                '  den Sprung in ein komplett anderes Leben. Begleitet sie auf'
-                ' ihrem spannenden Weg.'),
-            'timestamp': 1396598084,
-            'upload_date': '20140404',
-            'uploader_id': '1659832546',
+            'title': 'Die Welt da draußen',
+            'description': 'md5:61033c12b73286e409d99a41742ef608',
+            'timestamp': 1554069600,
+            'upload_date': '20190331',
+        },
+        'params': {
+            'format': 'bestvideo',
+            'skip_download': True,
          },
      }, {
-        'url': 'http://www.dmax.de/programme/storage-hunters-uk/videos/storage-hunters-uk-episode-6/',
+        'url': 'https://www.dmax.de/programme/dmax-highlights/video/tuning-star-sidney-hoffmann-exklusiv-bei-dmax/191023082312316',
          'only_matching': True,
      }, {
-        'url': 'http://www.discovery.de/#5332316765001',
+        'url': 'https://www.dplay.co.uk/show/ghost-adventures/video/hotel-leger-103620/EHD_280313B',
          'only_matching': True,
      }]
-    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1659832546/default_default/index.html?videoId=%s'
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        alternate_id = mobj.group('alternate_id')
-        if alternate_id:
-            self._initialize_geo_bypass({
-                'countries': ['DE'],
-            })
-            return self._get_disco_api_info(
-                url, '%s/%s' % (mobj.group('programme'), alternate_id),
-                'sonic-eu1-prod.disco-api.com', mobj.group('site') + 'de')
-        brightcove_id = mobj.group('id')
-        if not brightcove_id:
-            title = mobj.group('title')
-            webpage = self._download_webpage(url, title)
-            brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
-            brightcove_id = compat_parse_qs(compat_urlparse.urlparse(
-                brightcove_legacy_url).query)['@videoPlayer'][0]
-        return self.url_result(smuggle_url(
-            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['DE']}),
-            'BrightcoveNew', brightcove_id)
+        domain, programme, alternate_id = re.match(self._VALID_URL, url).groups()
+        country = 'GB' if domain == 'dplay.co.uk' else 'DE'
+        realm = 'questuk' if country == 'GB' else domain.replace('.', '')
+        return self._get_disco_api_info(
+            url, '%s/%s' % (programme, alternate_id),
+            'sonic-eu1-prod.disco-api.com', realm, country)
diff --git a/youtube_dl/extractor/dplay.py b/youtube_dl/extractor/dplay.py

index ebf59512c6bdf89df0efeb22c8cf047156dd4a7d..a7b9db5689c4df697f1606a343995ef3d29ee03a 100644
--- a/youtube_dl/extractor/dplay.py
+++ b/youtube_dl/extractor/dplay.py
@@ -1,74 +1,68 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import json
  import re
-import time
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_HTTPError,
-    compat_str,
-    compat_urlparse,
-)
+from ..compat import compat_HTTPError
  from ..utils import (
      determine_ext,
      ExtractorError,
      float_or_none,
      int_or_none,
-    remove_end,
-    try_get,
-    unified_strdate,
      unified_timestamp,
-    update_url_query,
-    urljoin,
-    USER_AGENTS,
  )
  
  
  class DPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/(?:video(?:er|s)/)?(?P<id>[^/]+/[^/?#]+)'
+    _VALID_URL = r'''(?x)https?://
+        (?P<domain>
+            (?:www\.)?(?P<host>dplay\.(?P<country>dk|fi|jp|se|no))|
+            (?P<subdomain_country>es|it)\.dplay\.com
+        )/[^/]+/(?P<id>[^/]+/[^/?#]+)'''
  
      _TESTS = [{
          # non geo restricted, via secure api, unsigned download hls URL
-        'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
+        'url': 'https://www.dplay.se/videos/nugammalt-77-handelser-som-format-sverige/nugammalt-77-handelser-som-format-sverige-101',
          'info_dict': {
-            'id': '3172',
-            'display_id': 'nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet',
+            'id': '13628',
+            'display_id': 'nugammalt-77-handelser-som-format-sverige/nugammalt-77-handelser-som-format-sverige-101',
              'ext': 'mp4',
              'title': 'Svensken lär sig njuta av livet',
              'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
-            'duration': 2650,
-            'timestamp': 1365454320,
+            'duration': 2649.856,
+            'timestamp': 1365453720,
              'upload_date': '20130408',
-            'creator': 'Kanal 5 (Home)',
+            'creator': 'Kanal 5',
              'series': 'Nugammalt - 77 händelser som format Sverige',
              'season_number': 1,
              'episode_number': 1,
-            'age_limit': 0,
+        },
+        'params': {
+            'format': 'bestvideo',
+            'skip_download': True,
          },
      }, {
          # geo restricted, via secure api, unsigned download hls URL
-        'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
+        'url': 'http://www.dplay.dk/videoer/ted-bundy-mind-of-a-monster/ted-bundy-mind-of-a-monster',
          'info_dict': {
-            'id': '70816',
-            'display_id': 'mig-og-min-mor/season-6-episode-12',
+            'id': '104465',
+            'display_id': 'ted-bundy-mind-of-a-monster/ted-bundy-mind-of-a-monster',
              'ext': 'mp4',
-            'title': 'Episode 12',
-            'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
-            'duration': 2563,
-            'timestamp': 1429696800,
-            'upload_date': '20150422',
-            'creator': 'Kanal 4 (Home)',
-            'series': 'Mig og min mor',
-            'season_number': 6,
-            'episode_number': 12,
-            'age_limit': 0,
+            'title': 'Ted Bundy: Mind Of A Monster',
+            'description': 'md5:8b780f6f18de4dae631668b8a9637995',
+            'duration': 5290.027,
+            'timestamp': 1570694400,
+            'upload_date': '20191010',
+            'creator': 'ID - Investigation Discovery',
+            'series': 'Ted Bundy: Mind Of A Monster',
+            'season_number': 1,
+            'episode_number': 1,
+        },
+        'params': {
+            'format': 'bestvideo',
+            'skip_download': True,
          },
-    }, {
-        # geo restricted, via direct unsigned hls URL
-        'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
-        'only_matching': True,
      }, {
          # disco-api
          'url': 'https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7',
@@ -89,19 +83,59 @@ class DPlayIE(InfoExtractor):
              'format': 'bestvideo',
              'skip_download': True,
          },
+        'skip': 'Available for Premium users',
      }, {
-
-        'url': 'https://www.dplay.dk/videoer/singleliv/season-5-episode-3',
+        'url': 'http://it.dplay.com/nove/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij/',
+        'md5': '2b808ffb00fc47b884a172ca5d13053c',
+        'info_dict': {
+            'id': '6918',
+            'display_id': 'biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij',
+            'ext': 'mp4',
+            'title': 'Luigi Di Maio: la psicosi di Stanislawskij',
+            'description': 'md5:3c7a4303aef85868f867a26f5cc14813',
+            'thumbnail': r're:^https?://.*\.jpe?g',
+            'upload_date': '20160524',
+            'timestamp': 1464076800,
+            'series': 'Biografie imbarazzanti',
+            'season_number': 1,
+            'episode': 'Episode 1',
+            'episode_number': 1,
+        },
+    }, {
+        'url': 'https://es.dplay.com/dmax/la-fiebre-del-oro/temporada-8-episodio-1/',
+        'info_dict': {
+            'id': '21652',
+            'display_id': 'la-fiebre-del-oro/temporada-8-episodio-1',
+            'ext': 'mp4',
+            'title': 'Episodio 1',
+            'description': 'md5:b9dcff2071086e003737485210675f69',
+            'thumbnail': r're:^https?://.*\.png',
+            'upload_date': '20180709',
+            'timestamp': 1531173540,
+            'series': 'La fiebre del oro',
+            'season_number': 8,
+            'episode': 'Episode 1',
+            'episode_number': 1,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://www.dplay.fi/videot/shifting-gears-with-aaron-kaufman/episode-16',
          'only_matching': True,
      }, {
-        'url': 'https://www.dplay.se/videos/sofias-anglar/sofias-anglar-1001',
+        'url': 'https://www.dplay.jp/video/gold-rush/24086',
          'only_matching': True,
      }]
  
-    def _get_disco_api_info(self, url, display_id, disco_host, realm):
-        disco_base = 'https://' + disco_host
+    def _get_disco_api_info(self, url, display_id, disco_host, realm, country):
+        geo_countries = [country.upper()]
+        self._initialize_geo_bypass({
+            'countries': geo_countries,
+        })
+        disco_base = 'https://%s/' % disco_host
          token = self._download_json(
-            '%s/token' % disco_base, display_id, 'Downloading token',
+            disco_base + 'token', display_id, 'Downloading token',
              query={
                  'realm': realm,
              })['data']['attributes']['token']
@@ -110,17 +144,35 @@ class DPlayIE(InfoExtractor):
              'Authorization': 'Bearer ' + token,
          }
          video = self._download_json(
-            '%s/content/videos/%s' % (disco_base, display_id), display_id,
+            disco_base + 'content/videos/' + display_id, display_id,
              headers=headers, query={
-                'include': 'show'
+                'fields[channel]': 'name',
+                'fields[image]': 'height,src,width',
+                'fields[show]': 'name',
+                'fields[tag]': 'name',
+                'fields[video]': 'description,episodeNumber,name,publishStart,seasonNumber,videoDuration',
+                'include': 'images,primaryChannel,show,tags'
              })
          video_id = video['data']['id']
          info = video['data']['attributes']
-        title = info['name']
+        title = info['name'].strip()
          formats = []
-        for format_id, format_dict in self._download_json(
-                '%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
-                display_id, headers=headers)['data']['attributes']['streaming'].items():
+        try:
+            streaming = self._download_json(
+                disco_base + 'playback/videoPlaybackInfo/' + video_id,
+                display_id, headers=headers)['data']['attributes']['streaming']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
+                error = info['errors'][0]
+                error_code = error.get('code')
+                if error_code == 'access.denied.geoblocked':
+                    self.raise_geo_restricted(countries=geo_countries)
+                elif error_code == 'access.denied.missingpackage':
+                    self.raise_login_required()
+                raise ExtractorError(info['errors'][0]['detail'], expected=True)
+            raise
+        for format_id, format_dict in streaming.items():
              if not isinstance(format_dict, dict):
                  continue
              format_url = format_dict.get('url')
@@ -142,235 +194,54 @@ class DPlayIE(InfoExtractor):
                  })
          self._sort_formats(formats)
  
-        series = None
-        try:
-            included = video.get('included')
-            if isinstance(included, list):
-                show = next(e for e in included if e.get('type') == 'show')
-                series = try_get(
-                    show, lambda x: x['attributes']['name'], compat_str)
-        except StopIteration:
-            pass
+        creator = series = None
+        tags = []
+        thumbnails = []
+        included = video.get('included') or []
+        if isinstance(included, list):
+            for e in included:
+                attributes = e.get('attributes')
+                if not attributes:
+                    continue
+                e_type = e.get('type')
+                if e_type == 'channel':
+                    creator = attributes.get('name')
+                elif e_type == 'image':
+                    src = attributes.get('src')
+                    if src:
+                        thumbnails.append({
+                            'url': src,
+                            'width': int_or_none(attributes.get('width')),
+                            'height': int_or_none(attributes.get('height')),
+                        })
+                elif e_type == 'show':
+                    series = attributes.get('name')
+                elif e_type == 'tag':
+                    name = attributes.get('name')
+                    if name:
+                        tags.append(name)
  
          return {
              'id': video_id,
              'display_id': display_id,
              'title': title,
              'description': info.get('description'),
-            'duration': float_or_none(
-                info.get('videoDuration'), scale=1000),
+            'duration': float_or_none(info.get('videoDuration'), 1000),
              'timestamp': unified_timestamp(info.get('publishStart')),
              'series': series,
              'season_number': int_or_none(info.get('seasonNumber')),
              'episode_number': int_or_none(info.get('episodeNumber')),
-            'age_limit': int_or_none(info.get('minimum_age')),
+            'creator': creator,
+            'tags': tags,
+            'thumbnails': thumbnails,
              'formats': formats,
          }
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          display_id = mobj.group('id')
-        domain = mobj.group('domain')
-
-        self._initialize_geo_bypass({
-            'countries': [mobj.group('country').upper()],
-        })
-
-        webpage = self._download_webpage(url, display_id)
-
-        video_id = self._search_regex(
-            r'data-video-id=["\'](\d+)', webpage, 'video id', default=None)
-
-        if not video_id:
-            host = mobj.group('host')
-            return self._get_disco_api_info(
-                url, display_id, 'disco-api.' + host, host.replace('.', ''))
-
-        info = self._download_json(
-            'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
-            video_id)['data'][0]
-
-        title = info['title']
-
-        PROTOCOLS = ('hls', 'hds')
-        formats = []
-
-        def extract_formats(protocol, manifest_url):
-            if protocol == 'hls':
-                m3u8_formats = self._extract_m3u8_formats(
-                    manifest_url, video_id, ext='mp4',
-                    entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
-                # Sometimes final URLs inside m3u8 are unsigned, let's fix this
-                # ourselves. Also fragments' URLs are only served signed for
-                # Safari user agent.
-                query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
-                for m3u8_format in m3u8_formats:
-                    m3u8_format.update({
-                        'url': update_url_query(m3u8_format['url'], query),
-                        'http_headers': {
-                            'User-Agent': USER_AGENTS['Safari'],
-                        },
-                    })
-                formats.extend(m3u8_formats)
-            elif protocol == 'hds':
-                formats.extend(self._extract_f4m_formats(
-                    manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
-                    video_id, f4m_id=protocol, fatal=False))
-
-        domain_tld = domain.split('.')[-1]
-        if domain_tld in ('se', 'dk', 'no'):
-            for protocol in PROTOCOLS:
-                # Providing dsc-geo allows to bypass geo restriction in some cases
-                self._set_cookie(
-                    'secure.dplay.%s' % domain_tld, 'dsc-geo',
-                    json.dumps({
-                        'countryCode': domain_tld.upper(),
-                        'expiry': (time.time() + 20 * 60) * 1000,
-                    }))
-                stream = self._download_json(
-                    'https://secure.dplay.%s/secure/api/v2/user/authorization/stream/%s?stream_type=%s'
-                    % (domain_tld, video_id, protocol), video_id,
-                    'Downloading %s stream JSON' % protocol, fatal=False)
-                if stream and stream.get(protocol):
-                    extract_formats(protocol, stream[protocol])
-
-        # The last resort is to try direct unsigned hls/hds URLs from info dictionary.
-        # Sometimes this does work even when secure API with dsc-geo has failed (e.g.
-        # http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
-        if not formats:
-            for protocol in PROTOCOLS:
-                if info.get(protocol):
-                    extract_formats(protocol, info[protocol])
-
-        self._sort_formats(formats)
-
-        subtitles = {}
-        for lang in ('se', 'sv', 'da', 'nl', 'no'):
-            for format_id in ('web_vtt', 'vtt', 'srt'):
-                subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
-                if subtitle_url:
-                    subtitles.setdefault(lang, []).append({'url': subtitle_url})
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': info.get('video_metadata_longDescription'),
-            'duration': int_or_none(info.get('video_metadata_length'), scale=1000),
-            'timestamp': int_or_none(info.get('video_publish_date')),
-            'creator': info.get('video_metadata_homeChannel'),
-            'series': info.get('video_metadata_show'),
-            'season_number': int_or_none(info.get('season')),
-            'episode_number': int_or_none(info.get('episode')),
-            'age_limit': int_or_none(info.get('minimum_age')),
-            'formats': formats,
-            'subtitles': subtitles,
-        }
-
-
-class DPlayItIE(InfoExtractor):
-    _VALID_URL = r'https?://it\.dplay\.com/[^/]+/[^/]+/(?P<id>[^/?#]+)'
-    _GEO_COUNTRIES = ['IT']
-    _TEST = {
-        'url': 'http://it.dplay.com/nove/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij/',
-        'md5': '2b808ffb00fc47b884a172ca5d13053c',
-        'info_dict': {
-            'id': '6918',
-            'display_id': 'luigi-di-maio-la-psicosi-di-stanislawskij',
-            'ext': 'mp4',
-            'title': 'Biografie imbarazzanti: Luigi Di Maio: la psicosi di Stanislawskij',
-            'description': 'md5:3c7a4303aef85868f867a26f5cc14813',
-            'thumbnail': r're:^https?://.*\.jpe?g',
-            'upload_date': '20160524',
-            'series': 'Biografie imbarazzanti',
-            'season_number': 1,
-            'episode': 'Luigi Di Maio: la psicosi di Stanislawskij',
-            'episode_number': 1,
-        },
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
-
-        title = remove_end(self._og_search_title(webpage), ' | Dplay')
-
-        video_id = None
-
-        info = self._search_regex(
-            r'playback_json\s*:\s*JSON\.parse\s*\(\s*("(?:\\.|[^"\\])+?")',
-            webpage, 'playback JSON', default=None)
-        if info:
-            for _ in range(2):
-                info = self._parse_json(info, display_id, fatal=False)
-                if not info:
-                    break
-            else:
-                video_id = try_get(info, lambda x: x['data']['id'])
-
-        if not info:
-            info_url = self._search_regex(
-                (r'playback_json_url\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
-                 r'url\s*[:=]\s*["\'](?P<url>(?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)'),
-                webpage, 'info url', group='url')
-
-            info_url = urljoin(url, info_url)
-            video_id = info_url.rpartition('/')[-1]
-
-            try:
-                info = self._download_json(
-                    info_url, display_id, headers={
-                        'Authorization': 'Bearer %s' % self._get_cookies(url).get(
-                            'dplayit_token').value,
-                        'Referer': url,
-                    })
-                if isinstance(info, compat_str):
-                    info = self._parse_json(info, display_id)
-            except ExtractorError as e:
-                if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
-                    info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
-                    error = info['errors'][0]
-                    if error.get('code') == 'access.denied.geoblocked':
-                        self.raise_geo_restricted(
-                            msg=error.get('detail'), countries=self._GEO_COUNTRIES)
-                    raise ExtractorError(info['errors'][0]['detail'], expected=True)
-                raise
-
-        hls_url = info['data']['attributes']['streaming']['hls']['url']
-
-        formats = self._extract_m3u8_formats(
-            hls_url, display_id, ext='mp4', entry_protocol='m3u8_native',
-            m3u8_id='hls')
-        self._sort_formats(formats)
-
-        series = self._html_search_regex(
-            r'(?s)<h1[^>]+class=["\'].*?\bshow_title\b.*?["\'][^>]*>(.+?)</h1>',
-            webpage, 'series', fatal=False)
-        episode = self._search_regex(
-            r'<p[^>]+class=["\'].*?\bdesc_ep\b.*?["\'][^>]*>\s*<br/>\s*<b>([^<]+)',
-            webpage, 'episode', fatal=False)
-
-        mobj = re.search(
-            r'(?s)<span[^>]+class=["\']dates["\'][^>]*>.+?\bS\.(?P<season_number>\d+)\s+E\.(?P<episode_number>\d+)\s*-\s*(?P<upload_date>\d{2}/\d{2}/\d{4})',
-            webpage)
-        if mobj:
-            season_number = int(mobj.group('season_number'))
-            episode_number = int(mobj.group('episode_number'))
-            upload_date = unified_strdate(mobj.group('upload_date'))
-        else:
-            season_number = episode_number = upload_date = None
-
-        return {
-            'id': compat_str(video_id or display_id),
-            'display_id': display_id,
-            'title': title,
-            'description': self._og_search_description(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'series': series,
-            'season_number': season_number,
-            'episode': episode,
-            'episode_number': episode_number,
-            'upload_date': upload_date,
-            'formats': formats,
-        }
+        domain = re.sub(r'^www\.', '', mobj.group('domain'))
+        country = mobj.group('country') or mobj.group('subdomain_country')
+        host = 'disco-api.' + domain if domain.startswith('dplay.') else 'eu2-prod.disco-api.com'
+        return self._get_disco_api_info(
+            url, display_id, host, 'dplay' + country, country)
diff --git a/youtube_dl/extractor/drtv.py b/youtube_dl/extractor/drtv.py

index 218f1020947c42c6ae611e7b1609abdf06ba6a11..390e79f8cfae9cc0ec4b0f045ed8bac88bd6c523 100644
--- a/youtube_dl/extractor/drtv.py
+++ b/youtube_dl/extractor/drtv.py
@@ -17,6 +17,7 @@ from ..utils import (
      float_or_none,
      mimetype2ext,
      str_or_none,
+    try_get,
      unified_timestamp,
      update_url_query,
      url_or_none,
@@ -24,7 +25,14 @@ from ..utils import (
  
  
  class DRTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*|
+                            (?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode)/
+                        )
+                        (?P<id>[\da-z_-]+)
+                    '''
      _GEO_BYPASS = False
      _GEO_COUNTRIES = ['DK']
      IE_NAME = 'drtv'
@@ -83,6 +91,26 @@ class DRTVIE(InfoExtractor):
      }, {
          'url': 'https://www.dr.dk/radio/p4kbh/regionale-nyheder-kh4/p4-nyheder-2019-06-26-17-30-9',
          'only_matching': True,
+    }, {
+        'url': 'https://www.dr.dk/drtv/se/bonderoeven_71769',
+        'info_dict': {
+            'id': '00951930010',
+            'ext': 'mp4',
+            'title': 'Bonderøven (1:8)',
+            'description': 'md5:3cf18fc0d3b205745d4505f896af8121',
+            'timestamp': 1546542000,
+            'upload_date': '20190103',
+            'duration': 2576.6,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://www.dr.dk/drtv/episode/bonderoeven_71769',
+        'only_matching': True,
+    }, {
+        'url': 'https://dr-massive.com/drtv/se/bonderoeven_71769',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -100,13 +128,32 @@ class DRTVIE(InfoExtractor):
              webpage, 'video id', default=None)
  
          if not video_id:
-            video_id = compat_urllib_parse_unquote(self._search_regex(
+            video_id = self._search_regex(
                  r'(urn(?:%3A|:)dr(?:%3A|:)mu(?:%3A|:)programcard(?:%3A|:)[\da-f]+)',
-                webpage, 'urn'))
+                webpage, 'urn', default=None)
+            if video_id:
+                video_id = compat_urllib_parse_unquote(video_id)
+
+        _PROGRAMCARD_BASE = 'https://www.dr.dk/mu-online/api/1.4/programcard'
+        query = {'expanded': 'true'}
+
+        if video_id:
+            programcard_url = '%s/%s' % (_PROGRAMCARD_BASE, video_id)
+        else:
+            programcard_url = _PROGRAMCARD_BASE
+            page = self._parse_json(
+                self._search_regex(
+                    r'data\s*=\s*({.+?})\s*(?:;|</script)', webpage,
+                    'data'), '1')['cache']['page']
+            page = page[list(page.keys())[0]]
+            item = try_get(
+                page, (lambda x: x['item'], lambda x: x['entries'][0]['item']),
+                dict)
+            video_id = item['customId'].split(':')[-1]
+            query['productionnumber'] = video_id
  
          data = self._download_json(
-            'https://www.dr.dk/mu-online/api/1.4/programcard/%s' % video_id,
-            video_id, 'Downloading video JSON', query={'expanded': 'true'})
+            programcard_url, video_id, 'Downloading video JSON', query=query)
  
          title = str_or_none(data.get('Title')) or re.sub(
              r'\s*\|\s*(?:TV\s*\|\s*DR|DRTV)$', '',
diff --git a/youtube_dl/extractor/dumpert.py b/youtube_dl/extractor/dumpert.py

index be2e3d37841b48ce7cf187b01ddd5236f85d7683..d9d9afdecba0a996bbcf36cfe46bc604344bd117 100644
--- a/youtube_dl/extractor/dumpert.py
+++ b/youtube_dl/extractor/dumpert.py
@@ -1,20 +1,17 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..compat import compat_b64decode
  from ..utils import (
+    int_or_none,
      qualities,
-    sanitized_Request,
  )
  
  
  class DumpertIE(InfoExtractor):
-    _VALID_URL = r'(?P<protocol>https?)://(?:www\.)?dumpert\.nl/(?:mediabase|embed)/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
+    _VALID_URL = r'(?P<protocol>https?)://(?:(?:www|legacy)\.)?dumpert\.nl/(?:mediabase|embed|item)/(?P<id>[0-9]+[/_][0-9a-zA-Z]+)'
      _TESTS = [{
-        'url': 'http://www.dumpert.nl/mediabase/6646981/951bc60f/',
+        'url': 'https://www.dumpert.nl/item/6646981_951bc60f',
          'md5': '1b9318d7d5054e7dcb9dc7654f21d643',
          'info_dict': {
              'id': '6646981/951bc60f',
@@ -24,46 +21,60 @@ class DumpertIE(InfoExtractor):
              'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
-        'url': 'http://www.dumpert.nl/embed/6675421/dc440fe7/',
+        'url': 'https://www.dumpert.nl/embed/6675421_dc440fe7',
+        'only_matching': True,
+    }, {
+        'url': 'http://legacy.dumpert.nl/mediabase/6646981/951bc60f',
+        'only_matching': True,
+    }, {
+        'url': 'http://legacy.dumpert.nl/embed/6675421/dc440fe7',
          'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        protocol = mobj.group('protocol')
-
-        url = '%s://www.dumpert.nl/mediabase/%s' % (protocol, video_id)
-        req = sanitized_Request(url)
-        req.add_header('Cookie', 'nsfw=1; cpc=10')
-        webpage = self._download_webpage(req, video_id)
-
-        files_base64 = self._search_regex(
-            r'data-files="([^"]+)"', webpage, 'data files')
-
-        files = self._parse_json(
-            compat_b64decode(files_base64).decode('utf-8'),
-            video_id)
+        video_id = self._match_id(url).replace('_', '/')
+        item = self._download_json(
+            'http://api-live.dumpert.nl/mobile_api/json/info/' + video_id.replace('/', '_'),
+            video_id)['items'][0]
+        title = item['title']
+        media = next(m for m in item['media'] if m.get('mediatype') == 'VIDEO')
  
          quality = qualities(['flv', 'mobile', 'tablet', '720p'])
-
-        formats = [{
-            'url': video_url,
-            'format_id': format_id,
-            'quality': quality(format_id),
-        } for format_id, video_url in files.items() if format_id != 'still']
+        formats = []
+        for variant in media.get('variants', []):
+            uri = variant.get('uri')
+            if not uri:
+                continue
+            version = variant.get('version')
+            formats.append({
+                'url': uri,
+                'format_id': version,
+                'quality': quality(version),
+            })
          self._sort_formats(formats)
  
-        title = self._html_search_meta(
-            'title', webpage) or self._og_search_title(webpage)
-        description = self._html_search_meta(
-            'description', webpage) or self._og_search_description(webpage)
-        thumbnail = files.get('still') or self._og_search_thumbnail(webpage)
+        thumbnails = []
+        stills = item.get('stills') or {}
+        for t in ('thumb', 'still'):
+            for s in ('', '-medium', '-large'):
+                still_id = t + s
+                still_url = stills.get(still_id)
+                if not still_url:
+                    continue
+                thumbnails.append({
+                    'id': still_id,
+                    'url': still_url,
+                })
+
+        stats = item.get('stats') or {}
  
          return {
              'id': video_id,
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'formats': formats
+            'description': item.get('description'),
+            'thumbnails': thumbnails,
+            'formats': formats,
+            'duration': int_or_none(media.get('duration')),
+            'like_count': int_or_none(stats.get('kudos_total')),
+            'view_count': int_or_none(stats.get('views_total')),
          }
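
Note on the dumpert.py change above: the extractor now reads the mobile JSON API instead of scraping the page. It reports the slash-separated id while querying the API with the underscore form, and it ranks each variant by its `version` string. The following is a minimal standalone sketch of those two steps, not the extractor itself. The `qualities` helper is a local stand-in written to mirror how `youtube_dl.utils.qualities` is understood to behave (position in the list, -1 if unknown); `normalize_id`, `build_formats`, `sample_media` and the example.invalid URIs are hypothetical, and the final `sorted` call is a simplification of `_sort_formats`.

```python
def qualities(quality_ids):
    """Local stand-in: rank an id by its position in quality_ids, -1 if unknown."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


def normalize_id(matched_id):
    # The URL may carry '6646981_951bc60f' or '6646981/951bc60f'.
    # The slash form is reported as the id; the underscore form goes to the API.
    video_id = matched_id.replace('_', '/')
    api_id = video_id.replace('/', '_')
    return video_id, api_id


def build_formats(media):
    quality = qualities(['flv', 'mobile', 'tablet', '720p'])
    formats = []
    for variant in media.get('variants', []):
        uri = variant.get('uri')
        if not uri:
            continue  # skip variants without a playable URI
        version = variant.get('version')
        formats.append({
            'url': uri,
            'format_id': version,
            'quality': quality(version),
        })
    return formats


# Hypothetical payload shaped like the fields the diff reads.
sample_media = {
    'mediatype': 'VIDEO',
    'variants': [
        {'version': '720p', 'uri': 'https://example.invalid/v720.mp4'},
        {'version': 'mobile', 'uri': 'https://example.invalid/vmobile.mp4'},
        {'version': 'tablet'},  # no uri, so it is dropped
    ],
}

video_id, api_id = normalize_id('6646981_951bc60f')
# Worst-to-best ordering by the computed quality rank.
formats = sorted(build_formats(sample_media), key=lambda f: f['quality'])
```

One thing to keep in mind when reading the diff: `next(...)` is called there without a default, so an item with no `VIDEO` media entry would raise `StopIteration` rather than a descriptive extractor error.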
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py

index 44120cae25e386a632f33d621a81ba0520e0f738..1cab440f46e9733e6182563b36a464d77b3610c7 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -18,10 +18,10 @@ from .acast import (
      ACastIE,
      ACastChannelIE,
  )
-from .addanime import AddAnimeIE
  from .adn import ADNIE
  from .adobeconnect import AdobeConnectIE
  from .adobetv import (
+    AdobeTVEmbedIE,
      AdobeTVIE,
      AdobeTVShowIE,
      AdobeTVChannelIE,
@@ -80,7 +80,6 @@ from .awaan import (
  )
  from .azmedien import AZMedienIE
  from .baidu import BaiduVideoIE
-from .bambuser import BambuserIE, BambuserChannelIE
  from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
  from .bbc import (
      BBCCoUkIE,
@@ -224,13 +223,13 @@ from .comedycentral import (
      ComedyCentralTVIE,
      ToshIE,
  )
-from .comcarcoff import ComCarCoffIE
  from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
  from .commonprotocols import (
      MmsIE,
      RtmpIE,
  )
  from .condenast import CondeNastIE
+from .contv import CONtvIE
  from .corus import CorusIE
  from .cracked import CrackedIE
  from .crackle import CrackleIE
@@ -254,10 +253,6 @@ from .dailymotion import (
      DailymotionPlaylistIE,
      DailymotionUserIE,
  )
-from .daisuki import (
-    DaisukiMottoIE,
-    DaisukiMottoPlaylistIE,
-)
  from .daum import (
      DaumIE,
      DaumClipIE,
@@ -276,10 +271,7 @@ from .douyutv import (
      DouyuShowIE,
      DouyuTVIE,
  )
-from .dplay import (
-    DPlayIE,
-    DPlayItIE,
-)
+from .dplay import DPlayIE
  from .dreisat import DreiSatIE
  from .drbonanza import DRBonanzaIE
  from .drtuber import DrTuberIE
@@ -358,7 +350,6 @@ from .firsttv import FirstTVIE
  from .fivemin import FiveMinIE
  from .fivetv import FiveTVIE
  from .flickr import FlickrIE
-from .flipagram import FlipagramIE
  from .folketinget import FolketingetIE
  from .footyroom import FootyRoomIE
  from .formula1 import Formula1IE
@@ -369,7 +360,10 @@ from .fourtube import (
      FuxIE,
  )
  from .fox import FOXIE
-from .fox9 import FOX9IE
+from .fox9 import (
+    FOX9IE,
+    FOX9NewsIE,
+)
  from .foxgay import FoxgayIE
  from .foxnews import (
      FoxNewsIE,
@@ -402,10 +396,6 @@ from .fusion import FusionIE
  from .fxnetworks import FXNetworksIE
  from .gaia import GaiaIE
  from .gameinformer import GameInformerIE
-from .gameone import (
-    GameOneIE,
-    GameOnePlaylistIE,
-)
  from .gamespot import GameSpotIE
  from .gamestar import GameStarIE
  from .gaskrank import GaskrankIE
@@ -421,7 +411,6 @@ from .globo import (
      GloboArticleIE,
  )
  from .go import GoIE
-from .go90 import Go90IE
  from .godtube import GodTubeIE
  from .golem import GolemIE
  from .googledrive import GoogleDriveIE
@@ -430,7 +419,6 @@ from .googlesearch import GoogleSearchIE
  from .goshgay import GoshgayIE
  from .gputechconf import GPUTechConfIE
  from .groupon import GrouponIE
-from .hark import HarkIE
  from .hbo import HBOIE
  from .hearthisat import HearThisAtIE
  from .heise import HeiseIE
@@ -462,7 +450,6 @@ from .hungama import (
      HungamaSongIE,
  )
  from .hypem import HypemIE
-from .iconosquare import IconosquareIE
  from .ign import (
      IGNIE,
      OneUPIE,
@@ -521,10 +508,9 @@ from .keezmovies import KeezMoviesIE
  from .ketnet import KetnetIE
  from .khanacademy import KhanAcademyIE
  from .kickstarter import KickStarterIE
+from .kinja import KinjaEmbedIE
  from .kinopoisk import KinoPoiskIE
-from .keek import KeekIE
  from .konserthusetplay import KonserthusetPlayIE
-from .kontrtube import KontrTubeIE
  from .krasview import KrasViewIE
  from .ku6 import Ku6IE
  from .kusi import KUSIIE
@@ -548,7 +534,6 @@ from .lcp import (
      LcpPlayIE,
      LcpIE,
  )
-from .learnr import LearnrIE
  from .lecture2go import Lecture2GoIE
  from .lecturio import (
      LecturioIE,
@@ -600,13 +585,11 @@ from .lynda import (
      LyndaCourseIE
  )
  from .m6 import M6IE
-from .macgamestore import MacGameStoreIE
  from .mailru import (
      MailRuIE,
      MailRuMusicIE,
      MailRuMusicSearchIE,
  )
-from .makertv import MakerTVIE
  from .malltv import MallTVIE
  from .mangomolo import (
      MangomoloVideoIE,
@@ -640,17 +623,15 @@ from .microsoftvirtualacademy import (
      MicrosoftVirtualAcademyIE,
      MicrosoftVirtualAcademyCourseIE,
  )
-from .minhateca import MinhatecaIE
  from .ministrygrid import MinistryGridIE
  from .minoto import MinotoIE
  from .miomio import MioMioIE
-from .mit import TechTVMITIE, MITIE, OCWMITIE
+from .mit import TechTVMITIE, OCWMITIE
  from .mitele import MiTeleIE
  from .mixcloud import (
      MixcloudIE,
      MixcloudUserIE,
      MixcloudPlaylistIE,
-    MixcloudStreamIE,
  )
  from .mlb import MLBIE
  from .mnet import MnetIE
@@ -672,10 +653,9 @@ from .mtv import (
      MTVVideoIE,
      MTVServicesEmbeddedIE,
      MTVDEIE,
-    MTV81IE,
+    MTVJapanIE,
  )
  from .muenchentv import MuenchenTVIE
-from .musicplayon import MusicPlayOnIE
  from .mwave import MwaveIE, MwaveMeetGreetIE
  from .mychannels import MyChannelsIE
  from .myspace import MySpaceIE, MySpaceAlbumIE
@@ -815,10 +795,6 @@ from .ooyala import (
      OoyalaIE,
      OoyalaExternalIE,
  )
-from .openload import (
-    OpenloadIE,
-    VerystreamIE,
-)
  from .ora import OraTVIE
  from .orf import (
      ORFTVthekIE,
@@ -832,7 +808,6 @@ from .packtpub import (
      PacktPubIE,
      PacktPubCourseIE,
  )
-from .pandatv import PandaTVIE
  from .pandoratv import PandoraTVIE
  from .parliamentliveuk import ParliamentLiveUKIE
  from .patreon import PatreonIE
@@ -893,7 +868,6 @@ from .puhutv import (
      PuhuTVSerieIE,
  )
  from .presstv import PressTVIE
-from .promptfile import PromptFileIE
  from .prosiebensat1 import ProSiebenSat1IE
  from .puls4 import Puls4IE
  from .pyvideo import PyvideoIE
@@ -945,10 +919,6 @@ from .rentv import (
  from .restudy import RestudyIE
  from .reuters import ReutersIE
  from .reverbnation import ReverbNationIE
-from .revision3 import (
-    Revision3EmbedIE,
-    Revision3IE,
-)
  from .rice import RICEIE
  from .rmcdecouverte import RMCDecouverteIE
  from .ro220 import Ro220IE
@@ -992,11 +962,17 @@ from .savefrom import SaveFromIE
  from .sbs import SBSIE
  from .screencast import ScreencastIE
  from .screencastomatic import ScreencastOMaticIE
-from .scrippsnetworks import ScrippsNetworksWatchIE
+from .scrippsnetworks import (
+    ScrippsNetworksWatchIE,
+    ScrippsNetworksIE,
+)
+from .scte import (
+    SCTEIE,
+    SCTECourseIE,
+)
  from .seeker import SeekerIE
  from .senateisvp import SenateISVPIE
  from .sendtonews import SendtoNewsIE
-from .servingsys import ServingSysIE
  from .servus import ServusIE
  from .sevenplus import SevenPlusIE
  from .sexu import SexuIE
@@ -1037,6 +1013,7 @@ from .snotr import SnotrIE
  from .sohu import SohuIE
  from .sonyliv import SonyLIVIE
  from .soundcloud import (
+    SoundcloudEmbedIE,
      SoundcloudIE,
      SoundcloudSetIE,
      SoundcloudUserIE,
@@ -1080,7 +1057,6 @@ from .srmediathek import SRMediathekIE
  from .stanfordoc import StanfordOpenClassroomIE
  from .steam import SteamIE
  from .streamable import StreamableIE
-from .streamango import StreamangoIE
  from .streamcloud import StreamcloudIE
  from .streamcz import StreamCZIE
  from .streetvoice import StreetVoiceIE
@@ -1129,12 +1105,14 @@ from .telegraaf import TelegraafIE
  from .telemb import TeleMBIE
  from .telequebec import (
      TeleQuebecIE,
+    TeleQuebecSquatIE,
      TeleQuebecEmissionIE,
      TeleQuebecLiveIE,
  )
  from .teletask import TeleTaskIE
  from .telewebion import TelewebionIE
  from .tennistv import TennisTVIE
+from .tenplay import TenPlayIE
  from .testurl import TestURLIE
  from .tf1 import TF1IE
  from .tfo import TFOIE
@@ -1187,10 +1165,14 @@ from .tunein import (
  )
  from .tunepk import TunePkIE
  from .turbo import TurboIE
-from .tutv import TutvIE
  from .tv2 import (
      TV2IE,
      TV2ArticleIE,
+    KatsomoIE,
+)
+from .tv2dk import (
+    TV2DKIE,
+    TV2DKBornholmPlayIE,
  )
  from .tv2hu import TV2HuIE
  from .tv4 import TV4IE
@@ -1248,13 +1230,17 @@ from .twitter import (
      TwitterCardIE,
      TwitterIE,
      TwitterAmplifyIE,
+    TwitterBroadcastIE,
  )
  from .udemy import (
      UdemyIE,
      UdemyCourseIE
  )
  from .udn import UDNEmbedIE
-from .ufctv import UFCTVIE
+from .ufctv import (
+    UFCTVIE,
+    UFCArabiaIE,
+)
  from .uktvplay import UKTVPlayIE
  from .digiteka import DigitekaIE
  from .dlive import (
@@ -1282,7 +1268,6 @@ from .varzesh3 import Varzesh3IE
  from .vbox7 import Vbox7IE
  from .veehd import VeeHDIE
  from .veoh import VeohIE
-from .vessel import VesselIE
  from .vesti import VestiIE
  from .vevo import (
      VevoIE,
@@ -1309,7 +1294,6 @@ from .videomore import (
      VideomoreVideoIE,
      VideomoreSeasonIE,
  )
-from .videopremium import VideoPremiumIE
  from .videopress import VideoPressIE
  from .vidio import VidioIE
  from .vidlii import VidLiiIE
@@ -1324,7 +1308,6 @@ from .viewlift import (
      ViewLiftIE,
      ViewLiftEmbedIE,
  )
-from .viewster import ViewsterIE
  from .viidea import ViideaIE
  from .vimeo import (
      VimeoIE,
@@ -1413,7 +1396,6 @@ from .weibo import (
      WeiboMobileIE
  )
  from .weiqitv import WeiqiTVIE
-from .wimp import WimpIE
  from .wistia import WistiaIE
  from .worldstarhiphop import WorldStarHipHopIE
  from .wsj import (
diff --git a/youtube_dl/extractor/facebook.py b/youtube_dl/extractor/facebook.py

index a3dcdca3e2bd424bd7a24a800bdf457ec37a033f..ce64e26831fdafceb97b6d8ae919c00a78f0f90f 100644
--- a/youtube_dl/extractor/facebook.py
+++ b/youtube_dl/extractor/facebook.py
@@ -334,7 +334,7 @@ class FacebookIE(InfoExtractor):
          if not video_data:
              server_js_data = self._parse_json(
                  self._search_regex(
-                    r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall|permalink_video_pagelet)',
+                    r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_\d+)',
                      webpage, 'js data', default='{}'),
                  video_id, transform_source=js_to_json, fatal=False)
              video_data = extract_from_jsmods_instances(server_js_data)
@@ -379,6 +379,7 @@ class FacebookIE(InfoExtractor):
          if not video_data:
              raise ExtractorError('Cannot parse data')
  
+        subtitles = {}
          formats = []
          for f in video_data:
              format_id = f['stream_type']
@@ -402,9 +403,17 @@ class FacebookIE(InfoExtractor):
              if dash_manifest:
                  formats.extend(self._parse_mpd_formats(
                      compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
+            subtitles_src = f[0].get('subtitles_src')
+            if subtitles_src:
+                subtitles.setdefault('en', []).append({'url': subtitles_src})
          if not formats:
              raise ExtractorError('Cannot find video formats')
  
+        # Downloads with browser's User-Agent are rate limited. Working around
+        # with non-browser User-Agent.
+        for f in formats:
+            f.setdefault('http_headers', {})['User-Agent'] = 'facebookexternalhit/1.1'
+
          self._sort_formats(formats)
  
          video_title = self._html_search_regex(
@@ -442,6 +451,7 @@ class FacebookIE(InfoExtractor):
              'timestamp': timestamp,
              'thumbnail': thumbnail,
              'view_count': view_count,
+            'subtitles': subtitles,
          }
  
          return webpage, info_dict
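
Note on the facebook.py change above: both additions rely on `dict.setdefault`, once to attach a non-browser User-Agent to every format without discarding headers that are already present, and once to accumulate subtitle URLs under a language key. The sketch below isolates that pattern. The helper names `add_user_agent` and `collect_subtitles` and the example.invalid URLs are hypothetical; only the header value and the `'en'` key come from the diff.

```python
def add_user_agent(formats, user_agent='facebookexternalhit/1.1'):
    for f in formats:
        # setdefault creates the header dict only when it is missing,
        # so headers already attached to a format are preserved.
        f.setdefault('http_headers', {})['User-Agent'] = user_agent
    return formats


def collect_subtitles(sources, lang='en'):
    subtitles = {}
    for src in sources:
        if src:  # ignore missing or empty subtitle sources
            subtitles.setdefault(lang, []).append({'url': src})
    return subtitles


formats = add_user_agent([
    {'url': 'https://example.invalid/a.mp4'},
    {'url': 'https://example.invalid/b.mp4',
     'http_headers': {'Referer': 'https://example.invalid/'}},
])
subtitles = collect_subtitles(['https://example.invalid/subs.srt', None])
```

Because `subtitles` starts as an empty dict, a video without any `subtitles_src` still yields `'subtitles': {}` in the info dict rather than a missing key.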
diff --git a/youtube_dl/extractor/flipagram.py b/youtube_dl/extractor/flipagram.py

deleted file mode 100644

index b7be40f..0000000
--- a/youtube_dl/extractor/flipagram.py
+++ /dev/null
@@ -1,115 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-    int_or_none,
-    float_or_none,
-    try_get,
-    unified_timestamp,
-)
-
-
-class FlipagramIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?flipagram\.com/f/(?P<id>[^/?#&]+)'
-    _TEST = {
-        'url': 'https://flipagram.com/f/nyvTSJMKId',
-        'md5': '888dcf08b7ea671381f00fab74692755',
-        'info_dict': {
-            'id': 'nyvTSJMKId',
-            'ext': 'mp4',
-            'title': 'Flipagram by sjuria101 featuring Midnight Memories by One Direction',
-            'description': 'md5:d55e32edc55261cae96a41fa85ff630e',
-            'duration': 35.571,
-            'timestamp': 1461244995,
-            'upload_date': '20160421',
-            'uploader': 'kitty juria',
-            'uploader_id': 'sjuria101',
-            'creator': 'kitty juria',
-            'view_count': int,
-            'like_count': int,
-            'repost_count': int,
-            'comment_count': int,
-            'comments': list,
-            'formats': 'mincount:2',
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        video_data = self._parse_json(
-            self._search_regex(
-                r'window\.reactH2O\s*=\s*({.+});', webpage, 'video data'),
-            video_id)
-
-        flipagram = video_data['flipagram']
-        video = flipagram['video']
-
-        json_ld = self._search_json_ld(webpage, video_id, default={})
-        title = json_ld.get('title') or flipagram['captionText']
-        description = json_ld.get('description') or flipagram.get('captionText')
-
-        formats = [{
-            'url': video['url'],
-            'width': int_or_none(video.get('width')),
-            'height': int_or_none(video.get('height')),
-            'filesize': int_or_none(video_data.get('size')),
-        }]
-
-        preview_url = try_get(
-            flipagram, lambda x: x['music']['track']['previewUrl'], compat_str)
-        if preview_url:
-            formats.append({
-                'url': preview_url,
-                'ext': 'm4a',
-                'vcodec': 'none',
-            })
-
-        self._sort_formats(formats)
-
-        counts = flipagram.get('counts', {})
-        user = flipagram.get('user', {})
-        video_data = flipagram.get('video', {})
-
-        thumbnails = [{
-            'url': self._proto_relative_url(cover['url']),
-            'width': int_or_none(cover.get('width')),
-            'height': int_or_none(cover.get('height')),
-            'filesize': int_or_none(cover.get('size')),
-        } for cover in flipagram.get('covers', []) if cover.get('url')]
-
-        # Note that this only retrieves comments that are initially loaded.
-        # For videos with large amounts of comments, most won't be retrieved.
-        comments = []
-        for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
-            text = comment.get('comment')
-            if not text or not isinstance(text, list):
-                continue
-            comments.append({
-                'author': comment.get('user', {}).get('name'),
-                'author_id': comment.get('user', {}).get('username'),
-                'id': comment.get('id'),
-                'text': text[0],
-                'timestamp': unified_timestamp(comment.get('created')),
-            })
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'duration': float_or_none(flipagram.get('duration'), 1000),
-            'thumbnails': thumbnails,
-            'timestamp': unified_timestamp(flipagram.get('iso8601Created')),
-            'uploader': user.get('name'),
-            'uploader_id': user.get('username'),
-            'creator': user.get('name'),
-            'view_count': int_or_none(counts.get('plays')),
-            'like_count': int_or_none(counts.get('likes')),
-            'repost_count': int_or_none(counts.get('reflips')),
-            'comment_count': int_or_none(counts.get('comments')),
-            'comments': comments,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/fox9.py b/youtube_dl/extractor/fox9.py

index 17dfffa7b70867adaa5a8995e4e24e03ed18df4d..91f8f7b8a1a2727783305dfd15137978d61d121f 100644
--- a/youtube_dl/extractor/fox9.py
+++ b/youtube_dl/extractor/fox9.py
@@ -1,13 +1,23 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-from .anvato import AnvatoIE
+from .common import InfoExtractor
  
  
-class FOX9IE(AnvatoIE):
-    _VALID_URL = r'https?://(?:www\.)?fox9\.com/(?:[^/]+/)+(?P<id>\d+)-story'
-    _TESTS = [{
-        'url': 'http://www.fox9.com/news/215123287-story',
+class FOX9IE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?fox9\.com/video/(?P<id>\d+)'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            'anvato:anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b:' + video_id,
+            'Anvato', video_id)
+
+
+class FOX9NewsIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?fox9\.com/news/(?P<id>[^/?&#]+)'
+    _TEST = {
+        'url': 'https://www.fox9.com/news/black-bear-in-tree-draws-crowd-in-downtown-duluth-minnesota',
          'md5': 'd6e1b2572c3bab8a849c9103615dd243',
          'info_dict': {
              'id': '314473',
@@ -21,22 +31,11 @@ class FOX9IE(AnvatoIE):
              'categories': ['News', 'Sports'],
              'tags': ['news', 'video'],
          },
-    }, {
-        'url': 'http://www.fox9.com/news/investigators/214070684-story',
-        'only_matching': True,
-    }]
+    }
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        video_id = self._parse_json(
-            self._search_regex(
-                r"this\.videosJson\s*=\s*'(\[.+?\])';",
-                webpage, 'anvato playlist'),
-            video_id)[0]['video']
-
-        return self._get_anvato_videos(
-            'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b',
-            video_id)
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        anvato_id = self._search_regex(
+            r'anvatoId\s*:\s*[\'"](\d+)', webpage, 'anvato id')
+        return self.url_result('https://www.fox9.com/video/' + anvato_id, 'FOX9')
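
Note on the fox9.py change above: extraction is now split into two hops. `FOX9NewsIE` scrapes the numeric Anvato id from the article page and hands back a `/video/<id>` URL, and `FOX9IE` turns that URL into an `anvato:<access key>:<id>` URL for the Anvato extractor. The sketch below traces that chain with plain `re` calls, assuming a made-up page snippet. The function names and `sample_page` are hypothetical; the regular expressions and the access key are copied from the diff.

```python
import re

ANVATO_ACCESS_KEY = 'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b'
FOX9_VIDEO_RE = r'https?://(?:www\.)?fox9\.com/video/(?P<id>\d+)'


def news_page_to_video_url(webpage):
    # First hop: find the Anvato id embedded in the article markup.
    mobj = re.search(r'anvatoId\s*:\s*[\'"](\d+)', webpage)
    if mobj is None:
        return None
    return 'https://www.fox9.com/video/' + mobj.group(1)


def video_url_to_anvato(url):
    # Second hop: map the /video/<id> URL to an anvato: URL.
    mobj = re.match(FOX9_VIDEO_RE, url)
    if mobj is None:
        return None
    return 'anvato:%s:%s' % (ANVATO_ACCESS_KEY, mobj.group('id'))


# Hypothetical article markup containing the id used in the test case.
sample_page = '<script>player.setup({anvatoId: "314473", autoplay: true});</script>'
video_url = news_page_to_video_url(sample_page)
anvato_url = video_url_to_anvato(video_url)
```

Splitting the work this way means a direct `/video/<id>` link needs no page download at all, while article URLs cost a single page fetch before delegation.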
diff --git a/youtube_dl/extractor/gameone.py b/youtube_dl/extractor/gameone.py

deleted file mode 100644

index a07d698..0000000
--- a/youtube_dl/extractor/gameone.py
+++ /dev/null
@@ -1,134 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    xpath_with_ns,
-    parse_iso8601,
-    float_or_none,
-    int_or_none,
-)
-
-NAMESPACE_MAP = {
-    'media': 'http://search.yahoo.com/mrss/',
-}
-
-# URL prefix to download the mp4 files directly instead of streaming via rtmp
-# Credits go to XBox-Maniac
-# http://board.jdownloader.org/showpost.php?p=185835&postcount=31
-RAW_MP4_URL = 'http://cdn.riptide-mtvn.com/'
-
-
-class GameOneIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?gameone\.de/tv/(?P<id>\d+)'
-    _TESTS = [
-        {
-            'url': 'http://www.gameone.de/tv/288',
-            'md5': '136656b7fb4c9cb4a8e2d500651c499b',
-            'info_dict': {
-                'id': '288',
-                'ext': 'mp4',
-                'title': 'Game One - Folge 288',
-                'duration': 1238,
-                'thumbnail': 'http://s3.gameone.de/gameone/assets/video_metas/teaser_images/000/643/636/big/640x360.jpg',
-                'description': 'FIFA-Pressepokal 2014, Star Citizen, Kingdom Come: Deliverance, Project Cars, Schöner Trants Nerdquiz Folge 2 Runde 1',
-                'age_limit': 16,
-                'upload_date': '20140513',
-                'timestamp': 1399980122,
-            }
-        },
-        {
-            'url': 'http://gameone.de/tv/220',
-            'md5': '5227ca74c4ae6b5f74c0510a7c48839e',
-            'info_dict': {
-                'id': '220',
-                'ext': 'mp4',
-                'upload_date': '20120918',
-                'description': 'Jet Set Radio HD, Tekken Tag Tournament 2, Source Filmmaker',
-                'timestamp': 1347971451,
-                'title': 'Game One - Folge 220',
-                'duration': 896.62,
-                'age_limit': 16,
-            }
-        }
-
-    ]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-        og_video = self._og_search_video_url(webpage, secure=False)
-        description = self._html_search_meta('description', webpage)
-        age_limit = int(
-            self._search_regex(
-                r'age=(\d+)',
-                self._html_search_meta(
-                    'age-de-meta-label',
-                    webpage),
-                'age_limit',
-                '0'))
-        mrss_url = self._search_regex(r'mrss=([^&]+)', og_video, 'mrss')
-
-        mrss = self._download_xml(mrss_url, video_id, 'Downloading mrss')
-        title = mrss.find('.//item/title').text
-        thumbnail = mrss.find('.//item/image').get('url')
-        timestamp = parse_iso8601(mrss.find('.//pubDate').text, delimiter=' ')
-        content = mrss.find(xpath_with_ns('.//media:content', NAMESPACE_MAP))
-        content_url = content.get('url')
-
-        content = self._download_xml(
-            content_url,
-            video_id,
-            'Downloading media:content')
-        rendition_items = content.findall('.//rendition')
-        duration = float_or_none(rendition_items[0].get('duration'))
-        formats = [
-            {
-                'url': re.sub(r'.*/(r2)', RAW_MP4_URL + r'\1', r.find('./src').text),
-                'width': int_or_none(r.get('width')),
-                'height': int_or_none(r.get('height')),
-                'tbr': int_or_none(r.get('bitrate')),
-            }
-            for r in rendition_items
-        ]
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-            'description': description,
-            'age_limit': age_limit,
-            'timestamp': timestamp,
-        }
-
-
-class GameOnePlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?gameone\.de(?:/tv)?/?$'
-    IE_NAME = 'gameone:playlist'
-    _TEST = {
-        'url': 'http://www.gameone.de/tv',
-        'info_dict': {
-            'title': 'GameOne',
-        },
-        'playlist_mincount': 294,
-    }
-
-    def _real_extract(self, url):
-        webpage = self._download_webpage('http://www.gameone.de/tv', 'TV')
-        max_id = max(map(int, re.findall(r'<a href="/tv/(\d+)"', webpage)))
-        entries = [
-            self.url_result('http://www.gameone.de/tv/%d' %
-                            video_id, 'GameOne')
-            for video_id in range(max_id, 0, -1)]
-
-        return {
-            '_type': 'playlist',
-            'title': 'GameOne',
-            'entries': entries,
-        }
diff --git a/youtube_dl/extractor/generic.py b/youtube_dl/extractor/generic.py

index d1725d98b0c63f72031068a001ea150b54fd23ec..3c002472f795b3600c8f721e76c446f85f3f9bf5 100644
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -77,11 +77,10 @@ from .instagram import InstagramIE
  from .liveleak import LiveLeakIE
  from .threeqsdn import ThreeQSDNIE
  from .theplatform import ThePlatformIE
-from .vessel import VesselIE
  from .kaltura import KalturaIE
  from .eagleplatform import EaglePlatformIE
  from .facebook import FacebookIE
-from .soundcloud import SoundcloudIE
+from .soundcloud import SoundcloudEmbedIE
  from .tunein import TuneInBaseIE
  from .vbox7 import Vbox7IE
  from .dbtv import DBTVIE
@@ -89,10 +88,6 @@ from .piksel import PikselIE
  from .videa import VideaIE
  from .twentymin import TwentyMinutenIE
  from .ustream import UstreamIE
-from .openload import (
-    OpenloadIE,
-    VerystreamIE,
-)
  from .videopress import VideoPressIE
  from .rutube import RutubeIE
  from .limelight import LimelightBaseIE
@@ -119,6 +114,8 @@ from .foxnews import FoxNewsIE
  from .viqeo import ViqeoIE
  from .expressen import ExpressenIE
  from .zype import ZypeIE
+from .odnoklassniki import OdnoklassnikiIE
+from .kinja import KinjaEmbedIE
  
  
  class GenericIE(InfoExtractor):
@@ -1487,16 +1484,18 @@ class GenericIE(InfoExtractor):
                  'timestamp': 1432570283,
              },
          },
-        # OnionStudios embed
+        # Kinja embed
          {
              'url': 'http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537',
              'info_dict': {
-                'id': '2855',
+                'id': '106351',
                  'ext': 'mp4',
                  'title': 'Don’t Understand Bitcoin? This Man Will Mumble An Explanation At You',
+                'description': 'Migrated from OnionStudios',
                  'thumbnail': r're:^https?://.*\.jpe?g$',
-                'uploader': 'ClickHole',
-                'uploader_id': 'clickhole',
+                'uploader': 'clickhole',
+                'upload_date': '20150527',
+                'timestamp': 1432744860,
              }
          },
          # SnagFilms embed
@@ -2099,6 +2098,9 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Smoky Barbecue Favorites',
                  'thumbnail': r're:^https?://.*\.jpe?g',
+                'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
+                'upload_date': '20170909',
+                'timestamp': 1504915200,
              },
              'add_ie': [ZypeIE.ie_key()],
              'params': {
@@ -2491,11 +2493,6 @@ class GenericIE(InfoExtractor):
          if tp_urls:
              return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform')
  
-        # Look for Vessel embeds
-        vessel_urls = VesselIE._extract_urls(webpage)
-        if vessel_urls:
-            return self.playlist_from_matches(vessel_urls, video_id, video_title, ie=VesselIE.ie_key())
-
          # Look for embedded rtl.nl player
          matches = re.findall(
              r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
@@ -2633,9 +2630,9 @@ class GenericIE(InfoExtractor):
              return self.url_result(mobj.group('url'), 'VK')
  
          # Look for embedded Odnoklassniki player
-        mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
-        if mobj is not None:
-            return self.url_result(mobj.group('url'), 'Odnoklassniki')
+        odnoklassniki_url = OdnoklassnikiIE._extract_url(webpage)
+        if odnoklassniki_url:
+            return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
  
          # Look for embedded ivi player
          mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
@@ -2754,9 +2751,9 @@ class GenericIE(InfoExtractor):
              return self.url_result(myvi_url)
  
          # Look for embedded soundcloud player
-        soundcloud_urls = SoundcloudIE._extract_urls(webpage)
+        soundcloud_urls = SoundcloudEmbedIE._extract_urls(webpage)
          if soundcloud_urls:
-            return self.playlist_from_matches(soundcloud_urls, video_id, video_title, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
+            return self.playlist_from_matches(soundcloud_urls, video_id, video_title, getter=unescapeHTML)
  
          # Look for tunein player
          tunein_urls = TuneInBaseIE._extract_urls(webpage)
@@ -2899,6 +2896,12 @@ class GenericIE(InfoExtractor):
          if senate_isvp_url:
              return self.url_result(senate_isvp_url, 'SenateISVP')
  
+        # Look for Kinja embeds
+        kinja_embed_urls = KinjaEmbedIE._extract_urls(webpage, url)
+        if kinja_embed_urls:
+            return self.playlist_from_matches(
+                kinja_embed_urls, video_id, video_title)
+
          # Look for OnionStudios embeds
          onionstudios_url = OnionStudiosIE._extract_url(webpage)
          if onionstudios_url:
@@ -2960,7 +2963,7 @@ class GenericIE(InfoExtractor):
  
          # Look for VODPlatform embeds
          mobj = re.search(
-            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1',
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
              webpage)
          if mobj is not None:
              return self.url_result(
@@ -2968,10 +2971,14 @@ class GenericIE(InfoExtractor):
  
          # Look for Mangomolo embeds
          mobj = re.search(
-            r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?admin\.mangomolo\.com/analytics/index\.php/customers/embed/
+            r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//
+                (?:
+                    admin\.mangomolo\.com/analytics/index\.php/customers/embed|
+                    player\.mangomolo\.com/v1
+                )/
                  (?:
                      video\?.*?\bid=(?P<video_id>\d+)|
-                    index\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
+                    (?:index|live)\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
                  ).+?)\1''', webpage)
          if mobj is not None:
              info = {
@@ -3040,18 +3047,6 @@ class GenericIE(InfoExtractor):
              return self.playlist_from_matches(
                  twentymin_urls, video_id, video_title, ie=TwentyMinutenIE.ie_key())
  
-        # Look for Openload embeds
-        openload_urls = OpenloadIE._extract_urls(webpage)
-        if openload_urls:
-            return self.playlist_from_matches(
-                openload_urls, video_id, video_title, ie=OpenloadIE.ie_key())
-
-        # Look for Verystream embeds
-        verystream_urls = VerystreamIE._extract_urls(webpage)
-        if verystream_urls:
-            return self.playlist_from_matches(
-                verystream_urls, video_id, video_title, ie=VerystreamIE.ie_key())
-
          # Look for VideoPress embeds
          videopress_urls = VideoPressIE._extract_urls(webpage)
          if videopress_urls:
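Review note: the VODPlatform hunk above widens the iframe pattern so that embeds served from embed.kwikmotion.com are picked up alongside vod-platform.net. A minimal standalone sketch of that pattern follows; the helper name and the page fragments (including the embed ids) are made up for illustration and are not part of the commit.

```python
import re

# Pattern copied from the added line of the VODPlatform hunk.
VODPLATFORM_EMBED_RE = (
    r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//'
    r'(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1'
)


def find_vodplatform_embed(webpage):
    """Return the embedded player URL, or None if the page has no such iframe."""
    mobj = re.search(VODPLATFORM_EMBED_RE, webpage)
    return mobj.group('url') if mobj else None


# Hypothetical page fragments; the embed ids are invented.
old_style = '<iframe src="http://vod-platform.net/embed/abc123"></iframe>'
new_style = "<iframe width='640' src='//embed.kwikmotion.com/embed/xyz789'></iframe>"
unrelated = '<iframe src="https://example.com/embed/abc123"></iframe>'

print(find_vodplatform_embed(old_style))
print(find_vodplatform_embed(new_style))
print(find_vodplatform_embed(unrelated))
```

The backreference `\1` ties the closing quote to the opening one, so single- and double-quoted `src` attributes are both handled.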
diff --git a/youtube_dl/extractor/gfycat.py b/youtube_dl/extractor/gfycat.py
index bbe3cb283afcc4f382ff613f706488b2e7c7c564..18a30fe678e2c6f97f44e64bce1d30ab31d877ec 100644
--- a/youtube_dl/extractor/gfycat.py
+++ b/youtube_dl/extractor/gfycat.py
@@ -11,7 +11,7 @@ from ..utils import (
  
  
  class GfycatIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
+    _VALID_URL = r'https?://(?:(?:www|giant|thumbs)\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#\.]+)'
      _TESTS = [{
          'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
          'info_dict': {
@@ -53,6 +53,12 @@ class GfycatIE(InfoExtractor):
      }, {
          'url': 'https://gfycat.com/acceptablehappygoluckyharborporpoise-baseball',
          'only_matching': True
+    }, {
+        'url': 'https://thumbs.gfycat.com/acceptablehappygoluckyharborporpoise-size_restricted.gif',
+        'only_matching': True
+    }, {
+        'url': 'https://giant.gfycat.com/acceptablehappygoluckyharborporpoise.mp4',
+        'only_matching': True
      }]
  
      def _real_extract(self, url):
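Review note: the new `_VALID_URL` accepts the `giant.` and `thumbs.` hosts and stops the id at the first dot, so direct media links resolve to the same id as the page URL. The sketch below checks this outside the extractor; only the two patterns and the test URLs come from the diff, and the helper name is an assumption.

```python
import re

# Old and new patterns, copied from the removed and added lines of the hunk.
OLD_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
NEW_VALID_URL = r'https?://(?:(?:www|giant|thumbs)\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#\.]+)'


def match_id(pattern, url):
    """Return the captured id, or None when the URL does not match."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


# URLs taken from the tests added in this commit.
thumbs_url = 'https://thumbs.gfycat.com/acceptablehappygoluckyharborporpoise-size_restricted.gif'
giant_url = 'https://giant.gfycat.com/acceptablehappygoluckyharborporpoise.mp4'

for url in (thumbs_url, giant_url):
    print(match_id(OLD_VALID_URL, url), match_id(NEW_VALID_URL, url))
```

Excluding `.` from the id character class is what keeps the `.mp4` suffix out of the id on the `giant.` URL.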
diff --git a/youtube_dl/extractor/globo.py b/youtube_dl/extractor/globo.py
index fb8f7679b0366ad7accdf802af34a644b03aea01..60d842d3a81e801fc9cd11c721c8d14fc9591fb9 100644
--- a/youtube_dl/extractor/globo.py
+++ b/youtube_dl/extractor/globo.py
@@ -96,21 +96,31 @@ class GloboIE(InfoExtractor):
          video = self._download_json(
              'http://api.globovideos.com/videos/%s/playlist' % video_id,
              video_id)['videos'][0]
+        if video.get('encrypted') is True:
+            raise ExtractorError('This video is DRM protected.', expected=True)
  
          title = video['title']
  
          formats = []
+        subtitles = {}
          for resource in video['resources']:
              resource_id = resource.get('_id')
              resource_url = resource.get('url')
-            if not resource_id or not resource_url:
+            resource_type = resource.get('type')
+            if not resource_url or (resource_type == 'media' and not resource_id) or resource_type not in ('subtitle', 'media'):
+                continue
+
+            if resource_type == 'subtitle':
+                subtitles.setdefault(resource.get('language') or 'por', []).append({
+                    'url': resource_url,
+                })
                  continue
  
              security = self._download_json(
                  'http://security.video.globo.com/videos/%s/hash' % video_id,
                  video_id, 'Downloading security hash for %s' % resource_id, query={
-                    'player': 'flash',
-                    'version': '17.0.0.132',
+                    'player': 'desktop',
+                    'version': '5.19.1',
                      'resource_id': resource_id,
                  })
  
@@ -123,18 +133,23 @@ class GloboIE(InfoExtractor):
                  continue
  
              hash_code = security_hash[:2]
-            received_time = security_hash[2:12]
-            received_random = security_hash[12:22]
-            received_md5 = security_hash[22:]
-
-            sign_time = compat_str(int(received_time) + 86400)
              padding = '%010d' % random.randint(1, 10000000000)
-
-            md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
+            if hash_code in ('04', '14'):
+                received_time = security_hash[3:13]
+                received_md5 = security_hash[24:]
+                hash_prefix = security_hash[:23]
+            elif hash_code in ('02', '12', '03', '13'):
+                received_time = security_hash[2:12]
+                received_md5 = security_hash[22:]
+                padding += '1'
+                hash_prefix = '05' + security_hash[:22]
+
+            padded_sign_time = compat_str(int(received_time) + 86400) + padding
+            md5_data = (received_md5 + padded_sign_time + '0xAC10FD').encode()
              signed_md5 = base64.urlsafe_b64encode(hashlib.md5(md5_data).digest()).decode().strip('=')
-            signed_hash = hash_code + received_time + received_random + sign_time + padding + signed_md5
+            signed_hash = hash_prefix + padded_sign_time + signed_md5
+            signed_url = '%s?h=%s&k=html5&a=%s&u=%s' % (resource_url, signed_hash, 'F' if video.get('subscriber_only') else 'A', security.get('user') or '')
  
-            signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
              if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
                  formats.extend(self._extract_m3u8_formats(
                      signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
@@ -164,7 +179,8 @@ class GloboIE(InfoExtractor):
              'duration': duration,
              'uploader': uploader,
              'uploader_id': uploader_id,
-            'formats': formats
+            'formats': formats,
+            'subtitles': subtitles,
          }
  
  
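Review note: the reworked signing step in the globo.py hunk can be read in isolation as a pure function. The sketch below mirrors the added lines; the function name, the injectable `padding` argument, and the `ValueError` for unrecognized hash codes are assumptions added for illustration (the diff does not show how other codes are handled), and the sample hash is invented.

```python
import base64
import hashlib
import random


def sign_security_hash(security_hash, padding=None):
    """Build the signed hash from a received security hash, following the hunk above."""
    hash_code = security_hash[:2]
    if padding is None:
        padding = '%010d' % random.randint(1, 10000000000)
    if hash_code in ('04', '14'):
        received_time = security_hash[3:13]
        received_md5 = security_hash[24:]
        hash_prefix = security_hash[:23]
    elif hash_code in ('02', '12', '03', '13'):
        received_time = security_hash[2:12]
        received_md5 = security_hash[22:]
        padding += '1'
        hash_prefix = '05' + security_hash[:22]
    else:
        # Assumption: not part of the diff, added so the sketch fails loudly.
        raise ValueError('unsupported hash code %r' % hash_code)

    # The signature is valid for one day past the received timestamp.
    padded_sign_time = str(int(received_time) + 86400) + padding
    md5_data = (received_md5 + padded_sign_time + '0xAC10FD').encode()
    signed_md5 = base64.urlsafe_b64encode(
        hashlib.md5(md5_data).digest()).decode().strip('=')
    return hash_prefix + padded_sign_time + signed_md5


# Invented sample in the '04' layout: code, one filler char, 10-digit time,
# 10 more chars, one filler char, then the received md5 part.
sample_04 = '04' + 'a' + '1579000000' + '0123456789' + 'b' + 'deadbeef'
signed = sign_security_hash(sample_04, padding='0000000001')
print(signed)
```

Passing `padding` explicitly makes the output deterministic for testing; in the extractor it is always random.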
diff --git a/youtube_dl/extractor/go.py b/youtube_dl/extractor/go.py
index 03e48f4ea4b93153ef445f0d9a66779144821361..03cfba91f00ab6134150090c3088fb7f5d6666d7 100644
--- a/youtube_dl/extractor/go.py
+++ b/youtube_dl/extractor/go.py
@@ -40,8 +40,17 @@ class GoIE(AdobePassIE):
              'resource_id': 'Disney',
          }
      }
-    _VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
-                 % '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:(?P<sub_domain>%s)\.)?go|
+                            (?P<sub_domain_2>abc|freeform|disneynow)
+                        )\.com/
+                        (?:
+                            (?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)|
+                            (?:[^/]+/)*(?P<display_id>[^/?\#]+)
+                        )
+                    ''' % '|'.join(list(_SITE_INFO.keys()))
      _TESTS = [{
          'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
          'info_dict': {
@@ -54,6 +63,7 @@ class GoIE(AdobePassIE):
              # m3u8 download
              'skip_download': True,
          },
+        'skip': 'This content is no longer available.',
      }, {
          'url': 'http://watchdisneyxd.go.com/doraemon',
          'info_dict': {
@@ -61,6 +71,34 @@ class GoIE(AdobePassIE):
              'id': 'SH55574025',
          },
          'playlist_mincount': 51,
+    }, {
+        'url': 'http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood',
+        'info_dict': {
+            'id': 'VDKA3609139',
+            'ext': 'mp4',
+            'title': 'This Guilty Blood',
+            'description': 'md5:f18e79ad1c613798d95fdabfe96cd292',
+            'age_limit': 14,
+        },
+        'params': {
+            'geo_bypass_ip_block': '3.244.239.0/24',
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet',
+        'info_dict': {
+            'id': 'VDKA13435179',
+            'ext': 'mp4',
+            'title': 'The Bet',
+            'description': 'md5:c66de8ba2e92c6c5c113c3ade84ab404',
+            'age_limit': 14,
+        },
+        'params': {
+            'geo_bypass_ip_block': '3.244.239.0/24',
+            # m3u8 download
+            'skip_download': True,
+        },
      }, {
          'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
          'only_matching': True,
@@ -95,10 +133,13 @@ class GoIE(AdobePassIE):
          if not video_id or not site_info:
              webpage = self._download_webpage(url, display_id or video_id)
              video_id = self._search_regex(
-                # There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
-                # from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
-                r'data-video-id=["\']*(VDKA\w+)', webpage, 'video id',
-                default=video_id)
+                (
+                    # There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
+                    # from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
+                    r'data-video-id=["\']*(VDKA\w+)',
+                    # https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
+                    r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
+                ), webpage, 'video id', default=video_id)
              if not site_info:
                  brand = self._search_regex(
                      (r'data-brand=\s*["\']\s*(\d+)',
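Review note: the verbose `_VALID_URL` in the go.py hunk now matches bare abc.com and freeform.com hosts and accepts the VDKA prefix in any letter case. The sketch below exercises the pattern on URLs from the tests in this diff. `SITE_KEYS` is a hypothetical subset of the `_SITE_INFO` keys, since the full dict is not visible here, and `parse` is an illustrative helper.

```python
import re

# Hypothetical subset of _SITE_INFO keys; the real dict is not shown in the hunk.
SITE_KEYS = ['abc', 'freeform', 'watchdisneyxd']

# Pattern copied from the added lines; '#' is escaped because (?x) treats it as a comment.
VALID_URL = r'''(?x)
    https?://
        (?:
            (?:(?P<sub_domain>%s)\.)?go|
            (?P<sub_domain_2>abc|freeform|disneynow)
        )\.com/
        (?:
            (?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)|
            (?:[^/]+/)*(?P<display_id>[^/?\#]+)
        )
    ''' % '|'.join(SITE_KEYS)


def parse(url):
    """Return the named groups for a URL, or None when it does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.groupdict() if mobj else None


print(parse('https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet'))
print(parse('http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643'))
```

When the URL carries only a display id, the extractor falls back to scraping the page for the VDKA id, which is why the same hunk adds a second search pattern.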
diff --git a/youtube_dl/extractor/go90.py b/youtube_dl/extractor/go90.py
deleted file mode 100644
index c3ea717..0000000
--- a/youtube_dl/extractor/go90.py
+++ /dev/null
@@ -1,149 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_HTTPError
-from ..utils import (
-    determine_ext,
-    ExtractorError,
-    int_or_none,
-    parse_age_limit,
-    parse_iso8601,
-)
-
-
-class Go90IE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?go90\.com/(?:videos|embed)/(?P<id>[0-9a-zA-Z]+)'
-    _TESTS = [{
-        'url': 'https://www.go90.com/videos/84BUqjLpf9D',
-        'md5': 'efa7670dbbbf21a7b07b360652b24a32',
-        'info_dict': {
-            'id': '84BUqjLpf9D',
-            'ext': 'mp4',
-            'title': 'Daily VICE - Inside The Utah Coalition Against Pornography Convention',
-            'description': 'VICE\'s Karley Sciortino meets with activists who discuss the state\'s strong anti-porn stance. Then, VICE Sports explains NFL contracts.',
-            'timestamp': 1491868800,
-            'upload_date': '20170411',
-            'age_limit': 14,
-        }
-    }, {
-        'url': 'https://www.go90.com/embed/261MflWkD3N',
-        'only_matching': True,
-    }]
-    _GEO_BYPASS = False
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        try:
-            headers = self.geo_verification_headers()
-            headers.update({
-                'Content-Type': 'application/json; charset=utf-8',
-            })
-            video_data = self._download_json(
-                'https://www.go90.com/api/view/items/' + video_id, video_id,
-                headers=headers, data=b'{"client":"web","device_type":"pc"}')
-        except ExtractorError as e:
-            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
-                message = self._parse_json(e.cause.read().decode(), None)['error']['message']
-                if 'region unavailable' in message:
-                    self.raise_geo_restricted(countries=['US'])
-                raise ExtractorError(message, expected=True)
-            raise
-
-        if video_data.get('requires_drm'):
-            raise ExtractorError('This video is DRM protected.', expected=True)
-        main_video_asset = video_data['main_video_asset']
-
-        episode_number = int_or_none(video_data.get('episode_number'))
-        series = None
-        season = None
-        season_id = None
-        season_number = None
-        for metadata in video_data.get('__children', {}).get('Item', {}).values():
-            if metadata.get('type') == 'show':
-                series = metadata.get('title')
-            elif metadata.get('type') == 'season':
-                season = metadata.get('title')
-                season_id = metadata.get('id')
-                season_number = int_or_none(metadata.get('season_number'))
-
-        title = episode = video_data.get('title') or series
-        if series and series != title:
-            title = '%s - %s' % (series, title)
-
-        thumbnails = []
-        formats = []
-        subtitles = {}
-        for asset in video_data.get('assets'):
-            if asset.get('id') == main_video_asset:
-                for source in asset.get('sources', []):
-                    source_location = source.get('location')
-                    if not source_location:
-                        continue
-                    source_type = source.get('type')
-                    if source_type == 'hls':
-                        m3u8_formats = self._extract_m3u8_formats(
-                            source_location, video_id, 'mp4',
-                            'm3u8_native', m3u8_id='hls', fatal=False)
-                        for f in m3u8_formats:
-                            mobj = re.search(r'/hls-(\d+)-(\d+)K', f['url'])
-                            if mobj:
-                                height, tbr = mobj.groups()
-                                height = int_or_none(height)
-                                f.update({
-                                    'height': f.get('height') or height,
-                                    'width': f.get('width') or int_or_none(height / 9.0 * 16.0 if height else None),
-                                    'tbr': f.get('tbr') or int_or_none(tbr),
-                                })
-                        formats.extend(m3u8_formats)
-                    elif source_type == 'dash':
-                        formats.extend(self._extract_mpd_formats(
-                            source_location, video_id, mpd_id='dash', fatal=False))
-                    else:
-                        formats.append({
-                            'format_id': source.get('name'),
-                            'url': source_location,
-                            'width': int_or_none(source.get('width')),
-                            'height': int_or_none(source.get('height')),
-                            'tbr': int_or_none(source.get('bitrate')),
-                        })
-
-                for caption in asset.get('caption_metadata', []):
-                    caption_url = caption.get('source_url')
-                    if not caption_url:
-                        continue
-                    subtitles.setdefault(caption.get('language', 'en'), []).append({
-                        'url': caption_url,
-                        'ext': determine_ext(caption_url, 'vtt'),
-                    })
-            elif asset.get('type') == 'image':
-                asset_location = asset.get('location')
-                if not asset_location:
-                    continue
-                thumbnails.append({
-                    'url': asset_location,
-                    'width': int_or_none(asset.get('width')),
-                    'height': int_or_none(asset.get('height')),
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'thumbnails': thumbnails,
-            'description': video_data.get('short_description'),
-            'like_count': int_or_none(video_data.get('like_count')),
-            'timestamp': parse_iso8601(video_data.get('released_at')),
-            'series': series,
-            'episode': episode,
-            'season': season,
-            'season_id': season_id,
-            'season_number': season_number,
-            'episode_number': episode_number,
-            'subtitles': subtitles,
-            'age_limit': parse_age_limit(video_data.get('rating')),
-        }
diff --git a/youtube_dl/extractor/hark.py b/youtube_dl/extractor/hark.py
deleted file mode 100644
index 342a613..0000000
--- a/youtube_dl/extractor/hark.py
+++ /dev/null
@@ -1,33 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-
-class HarkIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?hark\.com/clips/(?P<id>.+?)-.+'
-    _TEST = {
-        'url': 'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
-        'md5': '6783a58491b47b92c7c1af5a77d4cbee',
-        'info_dict': {
-            'id': 'mmbzyhkgny',
-            'ext': 'mp3',
-            'title': 'Obama: \'Beyond The Afghan Theater, We Only Target Al Qaeda\' on May 23, 2013',
-            'description': 'President Barack Obama addressed the nation live on May 23, 2013 in a speech aimed at addressing counter-terrorism policies including the use of drone strikes, detainees at Guantanamo Bay prison facility, and American citizens who are terrorists.',
-            'duration': 11,
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        data = self._download_json(
-            'http://www.hark.com/clips/%s.json' % video_id, video_id)
-
-        return {
-            'id': video_id,
-            'url': data['url'],
-            'title': data['name'],
-            'description': data.get('description'),
-            'thumbnail': data.get('image_original'),
-            'duration': data.get('duration'),
-        }
diff --git a/youtube_dl/extractor/hotstar.py b/youtube_dl/extractor/hotstar.py
index f9f7c5a645793eb13c44af45914b768b1982fe2d..f97eefa3d6789cf58b05a6e6bf91d290463cf444 100644
--- a/youtube_dl/extractor/hotstar.py
+++ b/youtube_dl/extractor/hotstar.py
@@ -118,6 +118,7 @@ class HotStarIE(HotStarBaseIE):
          if video_data.get('drmProtected'):
              raise ExtractorError('This video is DRM protected.', expected=True)
  
+        headers = {'Referer': url}
          formats = []
          geo_restricted = False
          playback_sets = self._call_api_v2('h/v2/play', video_id)['playBackSets']
@@ -137,10 +138,11 @@ class HotStarIE(HotStarBaseIE):
                  if 'package:hls' in tags or ext == 'm3u8':
                      formats.extend(self._extract_m3u8_formats(
                          format_url, video_id, 'mp4',
-                        entry_protocol='m3u8_native', m3u8_id='hls'))
+                        entry_protocol='m3u8_native',
+                        m3u8_id='hls', headers=headers))
                  elif 'package:dash' in tags or ext == 'mpd':
                      formats.extend(self._extract_mpd_formats(
-                        format_url, video_id, mpd_id='dash'))
+                        format_url, video_id, mpd_id='dash', headers=headers))
                  elif ext == 'f4m':
                      # produce broken files
                      pass
@@ -158,6 +160,9 @@ class HotStarIE(HotStarBaseIE):
              self.raise_geo_restricted(countries=['IN'])
          self._sort_formats(formats)
  
+        for f in formats:
+            f.setdefault('http_headers', {}).update(headers)
+
          return {
              'id': video_id,
              'title': title,
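Review note: besides passing `headers` to the manifest requests, the hotstar.py hunks attach the Referer to every resulting format so that it is also sent when the media itself is fetched. A minimal sketch of that merge step, with an invented page URL and invented format entries:

```python
def apply_headers(formats, headers):
    """Merge headers into each format's http_headers, creating the dict if absent."""
    for f in formats:
        f.setdefault('http_headers', {}).update(headers)
    return formats


# Hypothetical values for illustration only.
headers = {'Referer': 'https://www.hotstar.com/1000000000'}
formats = [
    {'format_id': 'hls-1', 'url': 'https://example.com/a.m3u8'},
    {'format_id': 'dash-1', 'url': 'https://example.com/a.mpd',
     'http_headers': {'User-Agent': 'demo'}},
]

apply_headers(formats, headers)
print(formats)
```

`setdefault` keeps any headers a format already carries and only adds or overrides the keys in `headers`.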
diff --git a/youtube_dl/extractor/iconosquare.py b/youtube_dl/extractor/iconosquare.py
deleted file mode 100644
index a39f422..0000000
--- a/youtube_dl/extractor/iconosquare.py
+++ /dev/null
@@ -1,85 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    get_element_by_id,
-    remove_end,
-)
-
-
-class IconosquareIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:iconosquare\.com|statigr\.am)/p/(?P<id>[^/]+)'
-    _TEST = {
-        'url': 'http://statigr.am/p/522207370455279102_24101272',
-        'md5': '6eb93b882a3ded7c378ee1d6884b1814',
-        'info_dict': {
-            'id': '522207370455279102_24101272',
-            'ext': 'mp4',
-            'title': 'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
-            'description': 'md5:644406a9ec27457ed7aa7a9ebcd4ce3d',
-            'timestamp': 1376471991,
-            'upload_date': '20130814',
-            'uploader': 'aguynamedpatrick',
-            'uploader_id': '24101272',
-            'comment_count': int,
-            'like_count': int,
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        media = self._parse_json(
-            get_element_by_id('mediaJson', webpage),
-            video_id)
-
-        formats = [{
-            'url': f['url'],
-            'format_id': format_id,
-            'width': int_or_none(f.get('width')),
-            'height': int_or_none(f.get('height'))
-        } for format_id, f in media['videos'].items()]
-        self._sort_formats(formats)
-
-        title = remove_end(self._og_search_title(webpage), ' - via Iconosquare')
-
-        timestamp = int_or_none(media.get('created_time') or media.get('caption', {}).get('created_time'))
-        description = media.get('caption', {}).get('text')
-
-        uploader = media.get('user', {}).get('username')
-        uploader_id = media.get('user', {}).get('id')
-
-        comment_count = int_or_none(media.get('comments', {}).get('count'))
-        like_count = int_or_none(media.get('likes', {}).get('count'))
-
-        thumbnails = [{
-            'url': t['url'],
-            'id': thumbnail_id,
-            'width': int_or_none(t.get('width')),
-            'height': int_or_none(t.get('height'))
-        } for thumbnail_id, t in media.get('images', {}).items()]
-
-        comments = [{
-            'id': comment.get('id'),
-            'text': comment['text'],
-            'timestamp': int_or_none(comment.get('created_time')),
-            'author': comment.get('from', {}).get('full_name'),
-            'author_id': comment.get('from', {}).get('username'),
-        } for comment in media.get('comments', {}).get('data', []) if 'text' in comment]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnails': thumbnails,
-            'timestamp': timestamp,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'comment_count': comment_count,
-            'like_count': like_count,
-            'formats': formats,
-            'comments': comments,
-        }
diff --git a/youtube_dl/extractor/imggaming.py b/youtube_dl/extractor/imggaming.py
new file mode 100644
index 0000000..e11f920
--- /dev/null
+++ b/youtube_dl/extractor/imggaming.py
@@ -0,0 +1,133 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_HTTPError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    str_or_none,
+    try_get,
+)
+
+
+class ImgGamingBaseIE(InfoExtractor):
+    _API_BASE = 'https://dce-frontoffice.imggaming.com/api/v2/'
+    _API_KEY = '857a1e5d-e35e-4fdf-805b-a87b6f8364bf'
+    _HEADERS = None
+    _MANIFEST_HEADERS = {'Accept-Encoding': 'identity'}
+    _REALM = None
+    _VALID_URL_TEMPL = r'https?://(?P<domain>%s)/(?P<type>live|playlist|video)/(?P<id>\d+)(?:\?.*?\bplaylistId=(?P<playlist_id>\d+))?'
+
+    def _real_initialize(self):
+        self._HEADERS = {
+            'Realm': 'dce.' + self._REALM,
+            'x-api-key': self._API_KEY,
+        }
+
+        email, password = self._get_login_info()
+        if email is None:
+            self.raise_login_required()
+
+        p_headers = self._HEADERS.copy()
+        p_headers['Content-Type'] = 'application/json'
+        self._HEADERS['Authorization'] = 'Bearer ' + self._download_json(
+            self._API_BASE + 'login',
+            None, 'Logging in', data=json.dumps({
+                'id': email,
+                'secret': password,
+            }).encode(), headers=p_headers)['authorisationToken']
+
+    def _call_api(self, path, media_id):
+        return self._download_json(
+            self._API_BASE + path + media_id, media_id, headers=self._HEADERS)
+
+    def _extract_dve_api_url(self, media_id, media_type):
+        stream_path = 'stream'
+        if media_type == 'video':
+            stream_path += '/vod/'
+        else:
+            stream_path += '?eventId='
+        try:
+            return self._call_api(
+                stream_path, media_id)['playerUrlCallback']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                raise ExtractorError(
+                    self._parse_json(e.cause.read().decode(), media_id)['messages'][0],
+                    expected=True)
+            raise
+
+    def _real_extract(self, url):
+        domain, media_type, media_id, playlist_id = re.match(self._VALID_URL, url).groups()
+
+        if playlist_id:
+            if self._downloader.params.get('noplaylist'):
+                self.to_screen('Downloading just video %s because of --no-playlist' % media_id)
+            else:
+                self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % playlist_id)
+                media_type, media_id = 'playlist', playlist_id
+
+        if media_type == 'playlist':
+            playlist = self._call_api('vod/playlist/', media_id)
+            entries = []
+            for video in try_get(playlist, lambda x: x['videos']['vods']) or []:
+                video_id = str_or_none(video.get('id'))
+                if not video_id:
+                    continue
+                entries.append(self.url_result(
+                    'https://%s/video/%s' % (domain, video_id),
+                    self.ie_key(), video_id))
+            return self.playlist_result(
+                entries, media_id, playlist.get('title'),
+                playlist.get('description'))
+
+        dve_api_url = self._extract_dve_api_url(media_id, media_type)
+        video_data = self._download_json(dve_api_url, media_id)
+        is_live = media_type == 'live'
+        if is_live:
+            title = self._live_title(self._call_api('event/', media_id)['title'])
+        else:
+            title = video_data['name']
+
+        formats = []
+        for proto in ('hls', 'dash'):
+            media_url = video_data.get(proto + 'Url') or try_get(video_data, lambda x: x[proto]['url'])
+            if not media_url:
+                continue
+            if proto == 'hls':
+                m3u8_formats = self._extract_m3u8_formats(
+                    media_url, media_id, 'mp4', 'm3u8' if is_live else 'm3u8_native',
+                    m3u8_id='hls', fatal=False, headers=self._MANIFEST_HEADERS)
+                for f in m3u8_formats:
+                    f.setdefault('http_headers', {}).update(self._MANIFEST_HEADERS)
+                    formats.append(f)
+            else:
+                formats.extend(self._extract_mpd_formats(
+                    media_url, media_id, mpd_id='dash', fatal=False,
+                    headers=self._MANIFEST_HEADERS))
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for subtitle in video_data.get('subtitles', []):
+            subtitle_url = subtitle.get('url')
+            if not subtitle_url:
+                continue
+            subtitles.setdefault(subtitle.get('lang', 'en_US'), []).append({
+                'url': subtitle_url,
+            })
+
+        return {
+            'id': media_id,
+            'title': title,
+            'formats': formats,
+            'thumbnail': video_data.get('thumbnailUrl'),
+            'description': video_data.get('description'),
+            'duration': int_or_none(video_data.get('duration')),
+            'tags': video_data.get('tags'),
+            'is_live': is_live,
+            'subtitles': subtitles,
+        }
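Review note: `ImgGamingBaseIE` leaves the domain open in `_VALID_URL_TEMPL`, and `_real_extract` switches to the playlist when a `playlistId` query parameter is present unless `--no-playlist` is set. The sketch below reproduces just that URL handling. The domain pattern, the `resolve_target` helper, and the sample URLs are hypothetical; a concrete subclass would supply its own domain.

```python
import re

# Template copied from the new file; the domain filled in below is a placeholder.
VALID_URL_TEMPL = (
    r'https?://(?P<domain>%s)/(?P<type>live|playlist|video)/(?P<id>\d+)'
    r'(?:\?.*?\bplaylistId=(?P<playlist_id>\d+))?'
)
VALID_URL = VALID_URL_TEMPL % r'(?:www\.)?example\.com'


def resolve_target(url, noplaylist=False):
    """Return (domain, media_type, media_id) after applying the playlist override."""
    domain, media_type, media_id, playlist_id = re.match(VALID_URL, url).groups()
    if playlist_id and not noplaylist:
        # Same precedence as _real_extract: the playlist wins unless noplaylist is set.
        media_type, media_id = 'playlist', playlist_id
    return domain, media_type, media_id


print(resolve_target('https://www.example.com/video/123?playlistId=456'))
print(resolve_target('https://www.example.com/video/123?playlistId=456', noplaylist=True))
print(resolve_target('https://example.com/live/789'))
```

Keeping the template in the base class lets each site-specific extractor define `_VALID_URL` with a single `%` substitution.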
diff --git a/youtube_dl/extractor/internetvideoarchive.py b/youtube_dl/extractor/internetvideoarchive.py
index 76cc5ec3ee21450f724564ef0c75f9c08931d2f7..59b0a90c398c4c8ad6ab8fd39566bffbb8793046 100644
--- a/youtube_dl/extractor/internetvideoarchive.py
+++ b/youtube_dl/extractor/internetvideoarchive.py
@@ -1,15 +1,13 @@
  from __future__ import unicode_literals
  
+import json
+import re
+
  from .common import InfoExtractor
  from ..compat import (
      compat_parse_qs,
      compat_urlparse,
  )
-from ..utils import (
-    determine_ext,
-    int_or_none,
-    xpath_text,
-)
  
  
  class InternetVideoArchiveIE(InfoExtractor):
@@ -20,7 +18,7 @@ class InternetVideoArchiveIE(InfoExtractor):
          'info_dict': {
              'id': '194487',
              'ext': 'mp4',
-            'title': 'KICK-ASS 2',
+            'title': 'Kick-Ass 2',
              'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
          },
          'params': {
@@ -33,68 +31,34 @@ class InternetVideoArchiveIE(InfoExtractor):
      def _build_json_url(query):
          return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
  
-    @staticmethod
-    def _build_xml_url(query):
-        return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
-
      def _real_extract(self, url):
-        query = compat_urlparse.urlparse(url).query
-        query_dic = compat_parse_qs(query)
-        video_id = query_dic['publishedid'][0]
-
-        if '/player/' in url:
-            configuration = self._download_json(url, video_id)
-
-            # There are multiple videos in the playlist whlie only the first one
-            # matches the video played in browsers
-            video_info = configuration['playlist'][0]
-            title = video_info['title']
-
-            formats = []
-            for source in video_info['sources']:
-                file_url = source['file']
-                if determine_ext(file_url) == 'm3u8':
-                    m3u8_formats = self._extract_m3u8_formats(
-                        file_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
-                    if m3u8_formats:
-                        formats.extend(m3u8_formats)
-                        file_url = m3u8_formats[0]['url']
-                        formats.extend(self._extract_f4m_formats(
-                            file_url.replace('.m3u8', '.f4m'),
-                            video_id, f4m_id='hds', fatal=False))
-                        formats.extend(self._extract_mpd_formats(
-                            file_url.replace('.m3u8', '.mpd'),
-                            video_id, mpd_id='dash', fatal=False))
-                else:
-                    a_format = {
-                        'url': file_url,
-                    }
-
-                    if source.get('label') and source['label'][-4:] == ' kbs':
-                        tbr = int_or_none(source['label'][:-4])
-                        a_format.update({
-                            'tbr': tbr,
-                            'format_id': 'http-%d' % tbr,
-                        })
-                        formats.append(a_format)
-
-            self._sort_formats(formats)
-
-            description = video_info.get('description')
-            thumbnail = video_info.get('image')
-        else:
-            configuration = self._download_xml(url, video_id)
-            formats = [{
-                'url': xpath_text(configuration, './file', 'file URL', fatal=True),
-            }]
-            thumbnail = xpath_text(configuration, './image', 'thumbnail')
-            title = 'InternetVideoArchive video %s' % video_id
-            description = None
+        query = compat_parse_qs(compat_urlparse.urlparse(url).query)
+        video_id = query['publishedid'][0]
+        data = self._download_json(
+            'https://video.internetvideoarchive.net/videojs7/videojs7.ivasettings.ashx',
+            video_id, data=json.dumps({
+                'customerid': query['customerid'][0],
+                'publishedid': video_id,
+            }).encode())
+        title = data['Title']
+        formats = self._extract_m3u8_formats(
+            data['VideoUrl'], video_id, 'mp4',
+            'm3u8_native', m3u8_id='hls', fatal=False)
+        file_url = formats[0]['url']
+        if '.ism/' in file_url:
+            replace_url = lambda x: re.sub(r'\.ism/[^?]+', '.ism/' + x, file_url)
+            formats.extend(self._extract_f4m_formats(
+                replace_url('.f4m'), video_id, f4m_id='hds', fatal=False))
+            formats.extend(self._extract_mpd_formats(
+                replace_url('.mpd'), video_id, mpd_id='dash', fatal=False))
+            formats.extend(self._extract_ism_formats(
+                replace_url('Manifest'), video_id, ism_id='mss', fatal=False))
+        self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': title,
              'formats': formats,
-            'thumbnail': thumbnail,
-            'description': description,
+            'thumbnail': data.get('PosterUrl'),
+            'description': data.get('Description'),
          }
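Review note on the hunk above: the new extractor derives the HDS, DASH and Smooth Streaming manifest URLs by rewriting whatever follows `.ism/` in the first HLS format URL. The sketch below isolates that rewrite as a minimal illustration. The helper names and the example URL are hypothetical and not part of the patch. Because `_extract_m3u8_formats` is called with `fatal=False`, the format list may be empty, so the sketch also guards the `formats[0]` lookup; whether upstream needs that guard is an assumption.

```python
import re


def first_format_url(formats):
    # Hypothetical guard: return None instead of raising IndexError on an empty list.
    return formats[0]['url'] if formats else None


def swap_manifest(file_url, manifest_name):
    # Replace everything between '.ism/' and the query string with manifest_name,
    # using the same pattern as the hunk above.
    return re.sub(r'\.ism/[^?]+', '.ism/' + manifest_name, file_url)


# Example URL is made up for illustration.
hls_url = 'https://cdn.example.com/clip.ism/master.m3u8?token=abc'
dash_url = swap_manifest(hls_url, '.mpd')
mss_url = swap_manifest(hls_url, 'Manifest')
missing = first_format_url([])
```

The query string is preserved because the pattern stops at the first `?`.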
diff --git a/youtube_dl/extractor/ivi.py b/youtube_dl/extractor/ivi.py
index 86c014b076bfaa1d7da143bdd7cb29c10438d2f3..b5a740a01e1d360ca667869edef88f85e3559842 100644
--- a/youtube_dl/extractor/ivi.py
+++ b/youtube_dl/extractor/ivi.py
@@ -1,8 +1,9 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
  import json
+import re
+import sys
  
  from .common import InfoExtractor
  from ..utils import (
@@ -18,6 +19,8 @@ class IviIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?ivi\.(?:ru|tv)/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
      _GEO_BYPASS = False
      _GEO_COUNTRIES = ['RU']
+    _LIGHT_KEY = b'\xf1\x02\x32\xb7\xbc\x5c\x7a\xe8\xf7\x96\xc1\x33\x2b\x27\xa1\x8c'
+    _LIGHT_URL = 'https://api.ivi.ru/light/'
  
      _TESTS = [
          # Single movie
@@ -80,48 +83,96 @@ class IviIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        data = {
+        data = json.dumps({
              'method': 'da.content.get',
              'params': [
                  video_id, {
-                    'site': 's183',
+                    'site': 's%d',
                      'referrer': 'http://www.ivi.ru/watch/%s' % video_id,
                      'contentid': video_id
                  }
              ]
-        }
+        })
  
-        video_json = self._download_json(
-            'http://api.digitalaccess.ru/api/json/', video_id,
-            'Downloading video JSON', data=json.dumps(data))
-
-        if 'error' in video_json:
-            error = video_json['error']
-            origin = error['origin']
-            if origin == 'NotAllowedForLocation':
-                self.raise_geo_restricted(
-                    msg=error['message'], countries=self._GEO_COUNTRIES)
-            elif origin == 'NoRedisValidData':
-                raise ExtractorError('Video %s does not exist' % video_id, expected=True)
-            raise ExtractorError(
-                'Unable to download video %s: %s' % (video_id, error['message']),
-                expected=True)
+        bundled = hasattr(sys, 'frozen')
  
-        result = video_json['result']
+        for site in (353, 183):
+            content_data = (data % site).encode()
+            if site == 353:
+                if bundled:
+                    continue
+                try:
+                    from Cryptodome.Cipher import Blowfish
+                    from Cryptodome.Hash import CMAC
+                    pycryptodomex_found = True
+                except ImportError:
+                    pycryptodomex_found = False
+                    continue
  
-        quality = qualities(self._KNOWN_FORMATS)
+                timestamp = (self._download_json(
+                    self._LIGHT_URL, video_id,
+                    'Downloading timestamp JSON', data=json.dumps({
+                        'method': 'da.timestamp.get',
+                        'params': []
+                    }).encode(), fatal=False) or {}).get('result')
+                if not timestamp:
+                    continue
  
-        formats = [{
-            'url': x['url'],
-            'format_id': x.get('content_format'),
-            'quality': quality(x.get('content_format')),
-        } for x in result['files'] if x.get('url')]
+                query = {
+                    'ts': timestamp,
+                    'sign': CMAC.new(self._LIGHT_KEY, timestamp.encode() + content_data, Blowfish).hexdigest(),
+                }
+            else:
+                query = {}
  
-        self._sort_formats(formats)
+            video_json = self._download_json(
+                self._LIGHT_URL, video_id,
+                'Downloading video JSON', data=content_data, query=query)
  
+            error = video_json.get('error')
+            if error:
+                origin = error.get('origin')
+                message = error.get('message') or error.get('user_message')
+                extractor_msg = 'Unable to download video %s'
+                if origin == 'NotAllowedForLocation':
+                    self.raise_geo_restricted(message, self._GEO_COUNTRIES)
+                elif origin == 'NoRedisValidData':
+                    extractor_msg = 'Video %s does not exist'
+                elif site == 353:
+                    continue
+                elif bundled:
+                    raise ExtractorError(
+                        'This feature does not work from bundled exe. Run youtube-dl from sources.',
+                        expected=True)
+                elif not pycryptodomex_found:
+                    raise ExtractorError(
+                        'pycryptodomex not found. Please install it.',
+                        expected=True)
+                elif message:
+                    extractor_msg += ': ' + message
+                raise ExtractorError(extractor_msg % video_id, expected=True)
+            else:
+                break
+
+        result = video_json['result']
          title = result['title']
  
-        duration = int_or_none(result.get('duration'))
+        quality = qualities(self._KNOWN_FORMATS)
+
+        formats = []
+        for f in result.get('files', []):
+            f_url = f.get('url')
+            content_format = f.get('content_format')
+            if not f_url or '-MDRM-' in content_format or '-FPS-' in content_format:
+                continue
+            formats.append({
+                'url': f_url,
+                'format_id': content_format,
+                'quality': quality(content_format),
+                'filesize': int_or_none(f.get('size_in_bytes')),
+            })
+        self._sort_formats(formats)
+
          compilation = result.get('compilation')
          episode = title if compilation else None
  
@@ -158,7 +209,7 @@ class IviIE(InfoExtractor):
              'episode_number': episode_number,
              'thumbnails': thumbnails,
              'description': description,
-            'duration': duration,
+            'duration': int_or_none(result.get('duration')),
              'formats': formats,
          }
  
@@ -188,7 +239,7 @@ class IviCompilationIE(InfoExtractor):
              self.url_result(
                  'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
              for serie in re.findall(
-                r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]
+                r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
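Review note on the ivi.py hunks above: the request body is serialized once with a literal `s%d` placeholder and then filled in per site id inside the loop. A minimal sketch of that templating step follows. The function name and the sample video id are hypothetical. The approach assumes the serialized JSON contains no other `%` characters, which holds here because `_VALID_URL` restricts ids to digits.

```python
import json


def build_payloads(video_id, sites=(353, 183)):
    # Serialize once; the '%d' placeholder passes through json.dumps unchanged.
    template = json.dumps({
        'method': 'da.content.get',
        'params': [video_id, {
            'site': 's%d',
            'referrer': 'http://www.ivi.ru/watch/%s' % video_id,
            'contentid': video_id,
        }],
    })
    # Fill in each site id and encode to bytes, as the request body must be bytes.
    return [(site, (template % site).encode()) for site in sites]


# Sample id is made up for illustration.
payloads = build_payloads('53141')
```

Sites are tried in order, so the signed `353` request comes first and `183` serves as the unsigned fallback.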
diff --git a/youtube_dl/extractor/jamendo.py b/youtube_dl/extractor/jamendo.py
index c218276184ab4acf9086404cdb679a74e24f353b..490efa8fb4aadcbf626295338fbd8c40e818d32d 100644
--- a/youtube_dl/extractor/jamendo.py
+++ b/youtube_dl/extractor/jamendo.py
@@ -1,38 +1,26 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
+import hashlib
+import random
  
-from ..compat import compat_urlparse
+from ..compat import compat_str
  from .common import InfoExtractor
-from ..utils import parse_duration
-
-
-class JamendoBaseIE(InfoExtractor):
-    def _extract_meta(self, webpage, fatal=True):
-        title = self._og_search_title(
-            webpage, default=None) or self._search_regex(
-            r'<title>([^<]+)', webpage,
-            'title', default=None)
-        if title:
-            title = self._search_regex(
-                r'(.+?)\s*\|\s*Jamendo Music', title, 'title', default=None)
-        if not title:
-            title = self._html_search_meta(
-                'name', webpage, 'title', fatal=fatal)
-        mobj = re.search(r'(.+) - (.+)', title or '')
-        artist, second = mobj.groups() if mobj else [None] * 2
-        return title, artist, second
-
-
-class JamendoIE(JamendoBaseIE):
+from ..utils import (
+    clean_html,
+    int_or_none,
+    try_get,
+)
+
+
+class JamendoIE(InfoExtractor):
      _VALID_URL = r'''(?x)
                      https?://
                          (?:
                              licensing\.jamendo\.com/[^/]+|
                              (?:www\.)?jamendo\.com
                          )
-                        /track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)
+                        /track/(?P<id>[0-9]+)(?:/(?P<display_id>[^/?#&]+))?
                      '''
      _TESTS = [{
          'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
@@ -45,7 +33,9 @@ class JamendoIE(JamendoBaseIE):
              'artist': 'Maya Filipič',
              'track': 'Stories from Emona I',
              'duration': 210,
-            'thumbnail': r're:^https?://.*\.jpg'
+            'thumbnail': r're:^https?://.*\.jpg',
+            'timestamp': 1217438117,
+            'upload_date': '20080730',
          }
      }, {
          'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
@@ -53,15 +43,20 @@ class JamendoIE(JamendoBaseIE):
      }]
  
      def _real_extract(self, url):
-        mobj = self._VALID_URL_RE.match(url)
-        track_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-
+        track_id, display_id = self._VALID_URL_RE.match(url).groups()
          webpage = self._download_webpage(
-            'https://www.jamendo.com/track/%s/%s' % (track_id, display_id),
-            display_id)
-
-        title, artist, track = self._extract_meta(webpage)
+            'https://www.jamendo.com/track/' + track_id, track_id)
+        models = self._parse_json(self._html_search_regex(
+            r"data-bundled-models='([^']+)",
+            webpage, 'bundled models'), track_id)
+        track = models['track']['models'][0]
+        title = track_name = track['name']
+        get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
+        artist = get_model('artist')
+        artist_name = artist.get('name')
+        if artist_name:
+            title = '%s - %s' % (artist_name, title)
+        album = get_model('album')
  
          formats = [{
              'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@@ -77,31 +72,58 @@ class JamendoIE(JamendoBaseIE):
          ))]
          self._sort_formats(formats)
  
-        thumbnail = self._html_search_meta(
-            'image', webpage, 'thumbnail', fatal=False)
-        duration = parse_duration(self._search_regex(
-            r'<span[^>]+itemprop=["\']duration["\'][^>]+content=["\'](.+?)["\']',
-            webpage, 'duration', fatal=False))
+        urls = []
+        thumbnails = []
+        for _, covers in track.get('cover', {}).items():
+            for cover_id, cover_url in covers.items():
+                if not cover_url or cover_url in urls:
+                    continue
+                urls.append(cover_url)
+                size = int_or_none(cover_id.lstrip('size'))
+                thumbnails.append({
+                    'id': cover_id,
+                    'url': cover_url,
+                    'width': size,
+                    'height': size,
+                })
+
+        tags = []
+        for tag in track.get('tags', []):
+            tag_name = tag.get('name')
+            if not tag_name:
+                continue
+            tags.append(tag_name)
+
+        stats = track.get('stats') or {}
  
          return {
              'id': track_id,
              'display_id': display_id,
-            'thumbnail': thumbnail,
+            'thumbnails': thumbnails,
              'title': title,
-            'duration': duration,
-            'artist': artist,
-            'track': track,
-            'formats': formats
+            'description': track.get('description'),
+            'duration': int_or_none(track.get('duration')),
+            'artist': artist_name,
+            'track': track_name,
+            'album': album.get('name'),
+            'formats': formats,
+            'license': '-'.join(track.get('licenseCC', [])) or None,
+            'timestamp': int_or_none(track.get('dateCreated')),
+            'view_count': int_or_none(stats.get('listenedAll')),
+            'like_count': int_or_none(stats.get('favorited')),
+            'average_rating': int_or_none(stats.get('averageNote')),
+            'tags': tags,
          }
  
  
-class JamendoAlbumIE(JamendoBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)/(?P<display_id>[\w-]+)'
+class JamendoAlbumIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
      _TEST = {
          'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
          'info_dict': {
              'id': '121486',
-            'title': 'Shearer - Duck On Cover'
+            'title': 'Duck On Cover',
+            'description': 'md5:c2920eaeef07d7af5b96d7c64daf1239',
          },
          'playlist': [{
              'md5': 'e1a2fcb42bda30dfac990212924149a8',
@@ -111,6 +133,8 @@ class JamendoAlbumIE(JamendoBaseIE):
                  'title': 'Shearer - Warmachine',
                  'artist': 'Shearer',
                  'track': 'Warmachine',
+                'timestamp': 1368089771,
+                'upload_date': '20130509',
              }
          }, {
              'md5': '1f358d7b2f98edfe90fd55dac0799d50',
@@ -120,6 +144,8 @@ class JamendoAlbumIE(JamendoBaseIE):
                  'title': 'Shearer - Without Your Ghost',
                  'artist': 'Shearer',
                  'track': 'Without Your Ghost',
+                'timestamp': 1368089771,
+                'upload_date': '20130509',
              }
          }],
          'params': {
@@ -127,24 +153,35 @@ class JamendoAlbumIE(JamendoBaseIE):
          }
      }
  
+    def _call_api(self, resource, resource_id):
+        path = '/api/%ss' % resource
+        rand = compat_str(random.random())
+        return self._download_json(
+            'https://www.jamendo.com' + path, resource_id, query={
+                'id[]': resource_id,
+            }, headers={
+                'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
+            })[0]
+
      def _real_extract(self, url):
-        mobj = self._VALID_URL_RE.match(url)
-        album_id = mobj.group('id')
-
-        webpage = self._download_webpage(url, mobj.group('display_id'))
-
-        title, artist, album = self._extract_meta(webpage, fatal=False)
-
-        entries = [{
-            '_type': 'url_transparent',
-            'url': compat_urlparse.urljoin(url, m.group('path')),
-            'ie_key': JamendoIE.ie_key(),
-            'id': self._search_regex(
-                r'/track/(\d+)', m.group('path'), 'track id', default=None),
-            'artist': artist,
-            'album': album,
-        } for m in re.finditer(
-            r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
-            webpage)]
-
-        return self.playlist_result(entries, album_id, title)
+        album_id = self._match_id(url)
+        album = self._call_api('album', album_id)
+        album_name = album.get('name')
+
+        entries = []
+        for track in album.get('tracks', []):
+            track_id = track.get('id')
+            if not track_id:
+                continue
+            track_id = compat_str(track_id)
+            entries.append({
+                '_type': 'url_transparent',
+                'url': 'https://www.jamendo.com/track/' + track_id,
+                'ie_key': JamendoIE.ie_key(),
+                'id': track_id,
+                'album': album_name,
+            })
+
+        return self.playlist_result(
+            entries, album_id, album_name,
+            clean_html(try_get(album, lambda x: x['description']['en'], compat_str)))
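Review note on `_call_api` above: the `X-Jam-Call` header is built as `$<sha1(path + rand)>*<rand>~`, where `rand` is a random float rendered as a string. The sketch below reproduces only that string construction so it can be checked in isolation. The function name and the injectable `rand` argument are hypothetical additions for testability, and nothing here claims to describe how the server validates the value.

```python
import hashlib
import random


def jam_call_header(path, rand=None):
    # rand can be injected for deterministic checks; by default it is random, as in the patch.
    rand = str(random.random()) if rand is None else rand
    digest = hashlib.sha1((path + rand).encode()).hexdigest()
    return '$%s*%s~' % (digest, rand)


header = jam_call_header('/api/albums', rand='0.5')
```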
diff --git a/youtube_dl/extractor/kakao.py b/youtube_dl/extractor/kakao.py
index 7fa140b0ce299ce7cbd9fb3b2e9016d9be3286ff..32935bb283bfda6273c3eccf0f98fc6fb02f3e84 100644
--- a/youtube_dl/extractor/kakao.py
+++ b/youtube_dl/extractor/kakao.py
@@ -6,14 +6,15 @@ from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
      int_or_none,
+    strip_or_none,
      unified_timestamp,
      update_url_query,
  )
  
  
  class KakaoIE(InfoExtractor):
-    _VALID_URL = r'https?://tv\.kakao\.com/channel/(?P<channel>\d+)/cliplink/(?P<id>\d+)'
-    _API_BASE = 'http://tv.kakao.com/api/v1/ft/cliplinks'
+    _VALID_URL = r'https?://(?:play-)?tv\.kakao\.com/(?:channel/\d+|embed/player)/cliplink/(?P<id>\d+|[^?#&]+@my)'
+    _API_BASE_TMPL = 'http://tv.kakao.com/api/v1/ft/cliplinks/%s/'
  
      _TESTS = [{
          'url': 'http://tv.kakao.com/channel/2671005/cliplink/301965083',
@@ -36,7 +37,7 @@ class KakaoIE(InfoExtractor):
              'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\r\n\r\n[쇼! 음악중심] 20160611, 507회',
              'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)',
              'uploader_id': 2653210,
-            'uploader': '쇼 음악중심',
+            'uploader': '쇼! 음악중심',
              'timestamp': 1485684628,
              'upload_date': '20170129',
          }
@@ -44,6 +45,8 @@ class KakaoIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+        display_id = video_id.rstrip('@my')
+        api_base = self._API_BASE_TMPL % video_id
  
          player_header = {
              'Referer': update_url_query(
@@ -55,20 +58,23 @@ class KakaoIE(InfoExtractor):
                  })
          }
  
-        QUERY_COMMON = {
+        query = {
              'player': 'monet_html5',
              'referer': url,
              'uuid': '',
              'service': 'kakao_tv',
              'section': '',
              'dteType': 'PC',
+            'fields': ','.join([
+                '-*', 'tid', 'clipLink', 'displayTitle', 'clip', 'title',
+                'description', 'channelId', 'createTime', 'duration', 'playCount',
+                'likeCount', 'commentCount', 'tagList', 'channel', 'name',
+                'clipChapterThumbnailList', 'thumbnailUrl', 'timeInSec', 'isDefault',
+                'videoOutputList', 'width', 'height', 'kbps', 'profile', 'label'])
          }
  
-        query = QUERY_COMMON.copy()
-        query['fields'] = 'clipLink,clip,channel,hasPlusFriend,-service,-tagList'
          impress = self._download_json(
-            '%s/%s/impress' % (self._API_BASE, video_id),
-            video_id, 'Downloading video info',
+            api_base + 'impress', display_id, 'Downloading video info',
              query=query, headers=player_header)
  
          clip_link = impress['clipLink']
@@ -76,32 +82,22 @@ class KakaoIE(InfoExtractor):
  
          title = clip.get('title') or clip_link.get('displayTitle')
  
-        tid = impress.get('tid', '')
-
-        query = QUERY_COMMON.copy()
-        query.update({
-            'tid': tid,
-            'profile': 'HIGH',
-        })
-        raw = self._download_json(
-            '%s/%s/raw' % (self._API_BASE, video_id),
-            video_id, 'Downloading video formats info',
-            query=query, headers=player_header)
+        query['tid'] = impress.get('tid', '')
  
          formats = []
-        for fmt in raw.get('outputList', []):
+        for fmt in clip.get('videoOutputList', []):
              try:
                  profile_name = fmt['profile']
+                if profile_name == 'AUDIO':
+                    continue
+                query.update({
+                    'profile': profile_name,
+                    'fields': '-*,url',
+                })
                  fmt_url_json = self._download_json(
-                    '%s/%s/raw/videolocation' % (self._API_BASE, video_id),
-                    video_id,
+                    api_base + 'raw/videolocation', display_id,
                      'Downloading video URL for profile %s' % profile_name,
-                    query={
-                        'service': 'kakao_tv',
-                        'section': '',
-                        'tid': tid,
-                        'profile': profile_name
-                    }, headers=player_header, fatal=False)
+                    query=query, headers=player_header, fatal=False)
  
                  if fmt_url_json is None:
                      continue
@@ -113,7 +109,8 @@ class KakaoIE(InfoExtractor):
                      'width': int_or_none(fmt.get('width')),
                      'height': int_or_none(fmt.get('height')),
                      'format_note': fmt.get('label'),
-                    'filesize': int_or_none(fmt.get('filesize'))
+                    'filesize': int_or_none(fmt.get('filesize')),
+                    'tbr': int_or_none(fmt.get('kbps')),
                  })
              except KeyError:
                  pass
@@ -134,9 +131,9 @@ class KakaoIE(InfoExtractor):
              })
  
          return {
-            'id': video_id,
+            'id': display_id,
              'title': title,
-            'description': clip.get('description'),
+            'description': strip_or_none(clip.get('description')),
              'uploader': clip_link.get('channel', {}).get('name'),
              'uploader_id': clip_link.get('channelId'),
              'thumbnails': thumbs,
@@ -146,4 +143,5 @@ class KakaoIE(InfoExtractor):
              'like_count': int_or_none(clip.get('likeCount')),
              'comment_count': int_or_none(clip.get('commentCount')),
              'formats': formats,
+            'tags': clip.get('tagList'),
          }
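Review note on `display_id = video_id.rstrip('@my')` in the kakao.py hunk above: `str.rstrip` takes a set of characters, not a suffix, so it can remove more than the trailing `@my` when the id itself ends in `m`, `y` or `@`. Purely numeric ids are unaffected. The sketch below contrasts the two behaviors. `strip_suffix` is a hypothetical helper, not something in the patch, and the sample id is made up.

```python
def strip_suffix(value, suffix):
    # Remove suffix only when value really ends with it.
    if suffix and value.endswith(suffix):
        return value[:-len(suffix)]
    return value


# rstrip treats '@my' as the character set {'@', 'm', 'y'}.
naive = 'tommy@my'.rstrip('@my')
safe = strip_suffix('tommy@my', '@my')
numeric = strip_suffix('301965083', '@my')
```

Here `naive` loses the trailing `mmy` of the name as well, while `safe` keeps it.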
diff --git a/youtube_dl/extractor/kaltura.py b/youtube_dl/extractor/kaltura.py
index 0a733424c471e6a0d83b04b7773742353dda5c5a..2d38b758b72a852c6d9718f0537c62e7c215e903 100644
--- a/youtube_dl/extractor/kaltura.py
+++ b/youtube_dl/extractor/kaltura.py
@@ -151,14 +151,15 @@ class KalturaIE(InfoExtractor):
          if mobj:
              embed_info = mobj.groupdict()
              for k, v in embed_info.items():
-                embed_info[k] = v.strip()
+                if v:
+                    embed_info[k] = v.strip()
              url = 'kaltura:%(partner_id)s:%(id)s' % embed_info
              escaped_pid = re.escape(embed_info['partner_id'])
-            service_url = re.search(
-                r'<script[^>]+src=["\']((?:https?:)?//.+?)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
+            service_mobj = re.search(
+                r'<script[^>]+src=(["\'])(?P<id>(?:https?:)?//(?:(?!\1).)+)/p/%s/sp/%s00/embedIframeJs' % (escaped_pid, escaped_pid),
                  webpage)
-            if service_url:
-                url = smuggle_url(url, {'service_url': service_url.group(1)})
+            if service_mobj:
+                url = smuggle_url(url, {'service_url': service_mobj.group('id')})
              return url
  
      def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
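Review note on the `if v:` guard added in the kaltura.py hunk above: `groupdict()` returns `None` for named groups that did not take part in the match, so calling `.strip()` on every value unconditionally can raise `AttributeError`. The sketch below shows the guarded loop on a deliberately simplified pattern. `EMBED_RE` and its group names are hypothetical and much smaller than the real Kaltura embed regexes.

```python
import re

# Hypothetical pattern with one optional named group.
EMBED_RE = re.compile(r'entry_id=(?P<id>\w+)(?:&flavor=(?P<flavor>\s*\d+\s*))?')


def clean_groups(mobj):
    info = mobj.groupdict()
    for k, v in info.items():
        # Unmatched optional groups are None; strip only real strings.
        if v:
            info[k] = v.strip()
    return info


without_optional = clean_groups(EMBED_RE.search('entry_id=abc'))
with_optional = clean_groups(EMBED_RE.search('entry_id=abc&flavor= 42 '))
```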
diff --git a/youtube_dl/extractor/keek.py b/youtube_dl/extractor/keek.py
deleted file mode 100644
index 94a03d2..0000000
--- a/youtube_dl/extractor/keek.py
+++ /dev/null
@@ -1,39 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-
-class KeekIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?keek\.com/keek/(?P<id>\w+)'
-    IE_NAME = 'keek'
-    _TEST = {
-        'url': 'https://www.keek.com/keek/NODfbab',
-        'md5': '9b0636f8c0f7614afa4ea5e4c6e57e83',
-        'info_dict': {
-            'id': 'NODfbab',
-            'ext': 'mp4',
-            'title': 'md5:35d42050a3ece241d5ddd7fdcc6fd896',
-            'uploader': 'ytdl',
-            'uploader_id': 'eGT5bab',
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        return {
-            'id': video_id,
-            'url': self._og_search_video_url(webpage),
-            'ext': 'mp4',
-            'title': self._og_search_description(webpage).strip(),
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'uploader': self._search_regex(
-                r'data-username=(["\'])(?P<uploader>.+?)\1', webpage,
-                'uploader', fatal=False, group='uploader'),
-            'uploader_id': self._search_regex(
-                r'data-user-id=(["\'])(?P<uploader_id>.+?)\1', webpage,
-                'uploader id', fatal=False, group='uploader_id'),
-        }
diff --git a/youtube_dl/extractor/kinja.py b/youtube_dl/extractor/kinja.py
new file mode 100644
index 0000000..79e3026
--- /dev/null
+++ b/youtube_dl/extractor/kinja.py
@@ -0,0 +1,221 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_urllib_parse_unquote,
+)
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    strip_or_none,
+    try_get,
+    unescapeHTML,
+    urljoin,
+)
+
+
+class KinjaEmbedIE(InfoExtractor):
+    IENAME = 'kinja:embed'
+    _DOMAIN_REGEX = r'''(?:[^.]+\.)?
+        (?:
+            avclub|
+            clickhole|
+            deadspin|
+            gizmodo|
+            jalopnik|
+            jezebel|
+            kinja|
+            kotaku|
+            lifehacker|
+            splinternews|
+            the(?:inventory|onion|root|takeout)
+        )\.com'''
+    _COMMON_REGEX = r'''/
+        (?:
+            ajax/inset|
+            embed/video
+        )/iframe\?.*?\bid='''
+    _VALID_URL = r'''(?x)https?://%s%s
+        (?P<type>
+            fb|
+            imgur|
+            instagram|
+            jwp(?:layer)?-video|
+            kinjavideo|
+            mcp|
+            megaphone|
+            ooyala|
+            soundcloud(?:-playlist)?|
+            tumblr-post|
+            twitch-stream|
+            twitter|
+            ustream-channel|
+            vimeo|
+            vine|
+            youtube-(?:list|video)
+        )-(?P<id>[^&]+)''' % (_DOMAIN_REGEX, _COMMON_REGEX)
+    _TESTS = [{
+        'url': 'https://kinja.com/ajax/inset/iframe?id=fb-10103303356633621',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=kinjavideo-100313',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=megaphone-PPY1300931075',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=ooyala-xzMXhleDpopuT0u1ijt_qZj3Va-34pEX%2FZTIxYmJjZDM2NWYzZDViZGRiOWJjYzc5',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=soundcloud-128574047',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=soundcloud-playlist-317413750',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=tumblr-post-160130699814-daydreams-at-midnight',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=twitch-stream-libratus_extra',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=twitter-1068875942473404422',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=ustream-channel-10414700',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=vimeo-120153502',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=vine-5BlvV5qqPrD',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=youtube-list-BCQ3KyrPjgA/PLE6509247C270A72E',
+        'only_matching': True,
+    }, {
+        'url': 'https://kinja.com/ajax/inset/iframe?id=youtube-video-00QyL0AgPAE',
+        'only_matching': True,
+    }]
+    _JWPLATFORM_PROVIDER = ('cdn.jwplayer.com/v2/media/', 'JWPlatform')
+    _PROVIDER_MAP = {
+        'fb': ('facebook.com/video.php?v=', 'Facebook'),
+        'imgur': ('imgur.com/', 'Imgur'),
+        'instagram': ('instagram.com/p/', 'Instagram'),
+        'jwplayer-video': _JWPLATFORM_PROVIDER,
+        'jwp-video': _JWPLATFORM_PROVIDER,
+        'megaphone': ('player.megaphone.fm/', 'Generic'),
+        'ooyala': ('player.ooyala.com/player.js?embedCode=', 'Ooyala'),
+        'soundcloud': ('api.soundcloud.com/tracks/', 'Soundcloud'),
+        'soundcloud-playlist': ('api.soundcloud.com/playlists/', 'SoundcloudPlaylist'),
+        'tumblr-post': ('%s.tumblr.com/post/%s', 'Tumblr'),
+        'twitch-stream': ('twitch.tv/', 'TwitchStream'),
+        'twitter': ('twitter.com/i/cards/tfw/v1/', 'TwitterCard'),
+        'ustream-channel': ('ustream.tv/embed/', 'Ustream'),
+        'vimeo': ('vimeo.com/', 'Vimeo'),
+        'vine': ('vine.co/v/', 'Vine'),
+        'youtube-list': ('youtube.com/embed/%s?list=%s', 'YoutubePlaylist'),
+        'youtube-video': ('youtube.com/embed/', 'Youtube'),
+    }
+
+    @staticmethod
+    def _extract_urls(webpage, url):
+        return [urljoin(url, unescapeHTML(mobj.group('url'))) for mobj in re.finditer(
+            r'(?x)<iframe[^>]+?src=(?P<q>["\'])(?P<url>(?:(?:https?:)?//%s)?%s(?:(?!\1).)+)\1' % (KinjaEmbedIE._DOMAIN_REGEX, KinjaEmbedIE._COMMON_REGEX),
+            webpage)]
+
+    def _real_extract(self, url):
+        video_type, video_id = re.match(self._VALID_URL, url).groups()
+
+        provider = self._PROVIDER_MAP.get(video_type)
+        if provider:
+            video_id = compat_urllib_parse_unquote(video_id)
+            if video_type == 'tumblr-post':
+                video_id, blog = video_id.split('-', 1)
+                result_url = provider[0] % (blog, video_id)
+            elif video_type == 'youtube-list':
+                video_id, playlist_id = video_id.split('/')
+                result_url = provider[0] % (video_id, playlist_id)
+            else:
+                if video_type == 'ooyala':
+                    video_id = video_id.split('/')[0]
+                result_url = provider[0] + video_id
+            return self.url_result('http://' + result_url, provider[1])
+
+        if video_type == 'kinjavideo':
+            data = self._download_json(
+                'https://kinja.com/api/core/video/views/videoById',
+                video_id, query={'videoId': video_id})['data']
+            title = data['title']
+
+            formats = []
+            for k in ('signedPlaylist', 'streaming'):
+                m3u8_url = data.get(k + 'Url')
+                if m3u8_url:
+                    formats.extend(self._extract_m3u8_formats(
+                        m3u8_url, video_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
+            self._sort_formats(formats)
+
+            thumbnail = None
+            poster = data.get('poster') or {}
+            poster_id = poster.get('id')
+            if poster_id:
+                thumbnail = 'https://i.kinja-img.com/gawker-media/image/upload/%s.%s' % (poster_id, poster.get('format') or 'jpg')
+
+            return {
+                'id': video_id,
+                'title': title,
+                'description': strip_or_none(data.get('description')),
+                'formats': formats,
+                'tags': data.get('tags'),
+                'timestamp': int_or_none(try_get(
+                    data, lambda x: x['postInfo']['publishTimeMillis']), 1000),
+                'thumbnail': thumbnail,
+                'uploader': data.get('network'),
+            }
+        else:
+            video_data = self._download_json(
+                'https://api.vmh.univision.com/metadata/v1/content/' + video_id,
+                video_id)['videoMetadata']
+            iptc = video_data['photoVideoMetadataIPTC']
+            title = iptc['title']['en']
+            fmg = video_data.get('photoVideoMetadata_fmg') or {}
+            tvss_domain = fmg.get('tvssDomain') or 'https://auth.univision.com'
+            data = self._download_json(
+                tvss_domain + '/api/v3/video-auth/url-signature-tokens',
+                video_id, query={'mcpids': video_id})['data'][0]
+            formats = []
+
+            rendition_url = data.get('renditionUrl')
+            if rendition_url:
+                formats = self._extract_m3u8_formats(
+                    rendition_url, video_id, 'mp4',
+                    'm3u8_native', m3u8_id='hls', fatal=False)
+
+            fallback_rendition_url = data.get('fallbackRenditionUrl')
+            if fallback_rendition_url:
+                formats.append({
+                    'format_id': 'fallback',
+                    'tbr': int_or_none(self._search_regex(
+                        r'_(\d+)\.mp4', fallback_rendition_url,
+                        'bitrate', default=None)),
+                    'url': fallback_rendition_url,
+                })
+
+            self._sort_formats(formats)
+
+            return {
+                'id': video_id,
+                'title': title,
+                'thumbnail': try_get(iptc, lambda x: x['cloudinaryLink']['link'], compat_str),
+                'uploader': fmg.get('network'),
+                'duration': int_or_none(iptc.get('fileDuration')),
+                'formats': formats,
+                'description': try_get(iptc, lambda x: x['description']['en'], compat_str),
+                'timestamp': parse_iso8601(iptc.get('dateReleased')),
+            }
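Note on the provider lookup in `_real_extract` above: the following is a minimal standalone sketch of how a Kinja inset id is turned into a provider URL, assuming Python 3. The names `PROVIDER_MAP` and `resolve_embed` are hypothetical, only a subset of the table is copied, `urllib.parse.unquote` stands in for `compat_urllib_parse_unquote`, and the Tumblr id in the usage note is an invented illustration.

```python
from urllib.parse import unquote

# Subset of _PROVIDER_MAP from the hunk above:
# inset type -> (URL prefix or template, extractor key).
PROVIDER_MAP = {
    'ooyala': ('player.ooyala.com/player.js?embedCode=', 'Ooyala'),
    'tumblr-post': ('%s.tumblr.com/post/%s', 'Tumblr'),
    'vimeo': ('vimeo.com/', 'Vimeo'),
    'youtube-list': ('youtube.com/embed/%s?list=%s', 'YoutubePlaylist'),
}


def resolve_embed(video_type, video_id):
    """Return (url, extractor_key) for a known provider, or None otherwise."""
    provider = PROVIDER_MAP.get(video_type)
    if not provider:
        return None
    template, ie_key = provider
    video_id = unquote(video_id)
    if video_type == 'tumblr-post':
        # The id is '<post id>-<blog name>'; the template wants blog first.
        post_id, blog = video_id.split('-', 1)
        path = template % (blog, post_id)
    elif video_type == 'youtube-list':
        # The id is '<video id>/<playlist id>'.
        first_video_id, playlist_id = video_id.split('/')
        path = template % (first_video_id, playlist_id)
    else:
        if video_type == 'ooyala':
            # Keep only the embed code before the first slash.
            video_id = video_id.split('/')[0]
        path = template + video_id
    return 'http://' + path, ie_key
```

For example, `resolve_embed('vimeo', '120153502')` yields the Vimeo URL and the `'Vimeo'` key, while an unknown type returns `None`, which corresponds to the point where the extractor falls through to the `kinjavideo` and Univision branches.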
diff --git a/youtube_dl/extractor/kontrtube.py b/youtube_dl/extractor/kontrtube.py
deleted file mode 100644
index 1fda451..0000000
--- a/youtube_dl/extractor/kontrtube.py
+++ /dev/null
@@ -1,73 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    parse_duration,
-)
-
-
-class KontrTubeIE(InfoExtractor):
-    IE_NAME = 'kontrtube'
-    IE_DESC = 'KontrTube.ru - Труба зовёт'
-    _VALID_URL = r'https?://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/(?P<display_id>[^/]+)/'
-
-    _TEST = {
-        'url': 'http://www.kontrtube.ru/videos/2678/nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag/',
-        'md5': '975a991a4926c9a85f383a736a2e6b80',
-        'info_dict': {
-            'id': '2678',
-            'display_id': 'nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag',
-            'ext': 'mp4',
-            'title': 'Над олимпийской деревней в Сочи поднят российский флаг',
-            'description': 'md5:80edc4c613d5887ae8ccf1d59432be41',
-            'thumbnail': 'http://www.kontrtube.ru/contents/videos_screenshots/2000/2678/preview.mp4.jpg',
-            'duration': 270,
-        }
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-
-        webpage = self._download_webpage(
-            url, display_id, 'Downloading page')
-
-        video_url = self._search_regex(
-            r"video_url\s*:\s*'(.+?)/?',", webpage, 'video URL')
-        thumbnail = self._search_regex(
-            r"preview_url\s*:\s*'(.+?)/?',", webpage, 'thumbnail', fatal=False)
-        title = self._html_search_regex(
-            r'(?s)<h2>(.+?)</h2>', webpage, 'title')
-        description = self._html_search_meta(
-            'description', webpage, 'description')
-
-        duration = self._search_regex(
-            r'Длительность: <em>([^<]+)</em>', webpage, 'duration', fatal=False)
-        if duration:
-            duration = parse_duration(duration.replace('мин', 'min').replace('сек', 'sec'))
-
-        view_count = self._search_regex(
-            r'Просмотров: <em>([^<]+)</em>',
-            webpage, 'view count', fatal=False)
-        if view_count:
-            view_count = int_or_none(view_count.replace(' ', ''))
-
-        comment_count = int_or_none(self._search_regex(
-            r'Комментарии \((\d+)\)<', webpage, ' comment count', fatal=False))
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'url': video_url,
-            'thumbnail': thumbnail,
-            'title': title,
-            'description': description,
-            'duration': duration,
-            'view_count': int_or_none(view_count),
-            'comment_count': int_or_none(comment_count),
-        }
diff --git a/youtube_dl/extractor/la7.py b/youtube_dl/extractor/la7.py
index 6373268c400744f9b51a3d65a41a20d1eb27e034..c3b4ffa7e3f523bf9e0d10ad4660e72d18078b6f 100644
--- a/youtube_dl/extractor/la7.py
+++ b/youtube_dl/extractor/la7.py
@@ -20,7 +20,7 @@ class LA7IE(InfoExtractor):
          'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722',
          'md5': '8b613ffc0c4bf9b9e377169fc19c214c',
          'info_dict': {
-            'id': 'inccool8-02-10-2015-163722',
+            'id': '0_42j6wd36',
              'ext': 'mp4',
              'title': 'Inc.Cool8',
              'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto  atletico',
@@ -57,7 +57,7 @@ class LA7IE(InfoExtractor):
          return {
              '_type': 'url_transparent',
              'url': smuggle_url('kaltura:103:%s' % player_data['vid'], {
-                'service_url': 'http://kdam.iltrovatore.it',
+                'service_url': 'http://nkdam.iltrovatore.it',
              }),
              'id': video_id,
              'title': player_data['title'],
diff --git a/youtube_dl/extractor/learnr.py b/youtube_dl/extractor/learnr.py
deleted file mode 100644
index 1435e09..0000000
--- a/youtube_dl/extractor/learnr.py
+++ /dev/null
@@ -1,33 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-
-class LearnrIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?learnr\.pro/view/video/(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://www.learnr.pro/view/video/51624-web-development-tutorial-for-beginners-1-how-to-build-webpages-with-html-css-javascript',
-        'md5': '3719fdf0a68397f49899e82c308a89de',
-        'info_dict': {
-            'id': '51624',
-            'ext': 'mp4',
-            'title': 'Web Development Tutorial for Beginners (#1) - How to build webpages with HTML, CSS, Javascript',
-            'description': 'md5:b36dbfa92350176cdf12b4d388485503',
-            'uploader': 'LearnCode.academy',
-            'uploader_id': 'learncodeacademy',
-            'upload_date': '20131021',
-        },
-        'add_ie': ['Youtube'],
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        return {
-            '_type': 'url_transparent',
-            'url': self._search_regex(
-                r"videoId\s*:\s*'([^']+)'", webpage, 'youtube id'),
-            'id': video_id,
-        }
diff --git a/youtube_dl/extractor/lego.py b/youtube_dl/extractor/lego.py
index b312e77f1abd5d4a05f43c763b7c9f56aefbd0e5..1e3c19dfd65b442ee7b4f3abf1803022c5975d8e 100644
--- a/youtube_dl/extractor/lego.py
+++ b/youtube_dl/extractor/lego.py
@@ -2,23 +2,24 @@
  from __future__ import unicode_literals
  
  import re
+import uuid
  
  from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import compat_HTTPError
  from ..utils import (
-    unescapeHTML,
-    parse_duration,
-    get_element_by_class,
+    ExtractorError,
+    int_or_none,
+    qualities,
  )
  
  
  class LEGOIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
+    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]{32})'
      _TESTS = [{
          'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
          'md5': 'f34468f176cfd76488767fc162c405fa',
          'info_dict': {
-            'id': '55492d823b1b4d5e985787fa8c2973b1',
+            'id': '55492d82-3b1b-4d5e-9857-87fa8c2973b1_en-US',
              'ext': 'mp4',
              'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
              'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
@@ -26,103 +27,123 @@ class LEGOIE(InfoExtractor):
      }, {
          # geo-restricted but the contentUrl contain a valid url
          'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
-        'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
+        'md5': 'c7420221f7ffd03ff056f9db7f8d807c',
          'info_dict': {
-            'id': '13bdc2299ab24d9685701a915b3d71e7',
+            'id': '13bdc229-9ab2-4d96-8570-1a915b3d71e7_nl-NL',
              'ext': 'mp4',
-            'title': 'Aflevering 20 - Helden van het koninkrijk',
+            'title': 'Aflevering 20:  Helden van het koninkrijk',
              'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
+            'age_limit': 5,
          },
      }, {
-        # special characters in title
-        'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
+        # with subtitle
+        'url': 'https://www.lego.com/nl-nl/kids/videos/classic/creative-storytelling-the-little-puppy-aa24f27c7d5242bc86102ebdc0f24cba',
          'info_dict': {
-            'id': '9685ee9d12e84ff38e84b4e3d0db533d',
+            'id': 'aa24f27c-7d52-42bc-8610-2ebdc0f24cba_nl-NL',
              'ext': 'mp4',
-            'title': 'Force Surprise – LEGO® Star Wars™ Microfighters',
-            'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
+            'title': 'De kleine puppy',
+            'description': 'md5:5b725471f849348ac73f2e12cfb4be06',
+            'age_limit': 1,
+            'subtitles': {
+                'nl': [{
+                    'ext': 'srt',
+                    'url': r're:^https://.+\.srt$',
+                }],
+            },
          },
          'params': {
              'skip_download': True,
          },
      }]
-    _BITRATES = [256, 512, 1024, 1536, 2560]
+    _QUALITIES = {
+        'Lowest': (64, 180, 320),
+        'Low': (64, 270, 480),
+        'Medium': (96, 360, 640),
+        'High': (128, 540, 960),
+        'Highest': (128, 720, 1280),
+    }
  
      def _real_extract(self, url):
          locale, video_id = re.match(self._VALID_URL, url).groups()
-        webpage = self._download_webpage(url, video_id)
-        title = get_element_by_class('video-header', webpage).strip()
-        progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
-        streaming_base = 'http://legoprod-f.akamaihd.net/'
-        content_url = self._html_search_meta('contentUrl', webpage)
-        path = self._search_regex(
-            r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
-            content_url, 'video path', default=None)
-        if not path:
-            player_url = self._proto_relative_url(self._search_regex(
-                r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
-                webpage, 'player url', default=None))
-            if not player_url:
-                base_url = self._proto_relative_url(self._search_regex(
-                    r'data-baseurl="([^"]+)"', webpage, 'base url',
-                    default='http://www.lego.com/%s/mediaplayer/video/' % locale))
-                player_url = base_url + video_id
-            player_webpage = self._download_webpage(player_url, video_id)
-            video_data = self._parse_json(unescapeHTML(self._search_regex(
-                r"video='([^']+)'", player_webpage, 'video data')), video_id)
-            progressive_base = self._search_regex(
-                r'data-video-progressive-url="([^"]+)"',
-                player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
-            streaming_base = self._search_regex(
-                r'data-video-streaming-url="([^"]+)"',
-                player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
-            item_id = video_data['ItemId']
+        countries = [locale.split('-')[1].upper()]
+        self._initialize_geo_bypass({
+            'countries': countries,
+        })
  
-            net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
-            base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
-            path = '/'.join([net_storage_path, base_path])
-        streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES))
+        try:
+            item = self._download_json(
+                # https://contentfeed.services.lego.com/api/v2/item/[VIDEO_ID]?culture=[LOCALE]&contentType=Video
+                'https://services.slingshot.lego.com/mediaplayer/v2',
+                video_id, query={
+                    'videoId': '%s_%s' % (uuid.UUID(video_id), locale),
+                }, headers=self.geo_verification_headers())
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 451:
+                self.raise_geo_restricted(countries=countries)
+            raise
  
-        formats = self._extract_akamai_formats(
-            '%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id)
-        m3u8_formats = list(filter(
-            lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none',
-            formats))
-        if len(m3u8_formats) == len(self._BITRATES):
-            self._sort_formats(m3u8_formats)
-            for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats):
-                progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate)
-                mp4_f = m3u8_format.copy()
-                mp4_f.update({
-                    'url': progressive_base_url + 'mp4',
-                    'format_id': m3u8_format['format_id'].replace('hls', 'mp4'),
-                    'protocol': 'http',
-                })
-                web_f = {
-                    'url': progressive_base_url + 'webm',
-                    'format_id': m3u8_format['format_id'].replace('hls', 'webm'),
-                    'width': m3u8_format['width'],
-                    'height': m3u8_format['height'],
-                    'tbr': m3u8_format.get('tbr'),
-                    'ext': 'webm',
+        video = item['Video']
+        video_id = video['Id']
+        title = video['Title']
+
+        q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
+        formats = []
+        for video_source in item.get('VideoFormats', []):
+            video_source_url = video_source.get('Url')
+            if not video_source_url:
+                continue
+            video_source_format = video_source.get('Format')
+            if video_source_format == 'F4M':
+                formats.extend(self._extract_f4m_formats(
+                    video_source_url, video_id,
+                    f4m_id=video_source_format, fatal=False))
+            elif video_source_format == 'M3U8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_source_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=video_source_format, fatal=False))
+            else:
+                video_source_quality = video_source.get('Quality')
+                format_id = []
+                for v in (video_source_format, video_source_quality):
+                    if v:
+                        format_id.append(v)
+                f = {
+                    'format_id': '-'.join(format_id),
+                    'quality': q(video_source_quality),
+                    'url': video_source_url,
                  }
-                formats.extend([web_f, mp4_f])
-        else:
-            for bitrate in self._BITRATES:
-                for ext in ('web', 'mp4'):
-                    formats.append({
-                        'format_id': '%s-%s' % (ext, bitrate),
-                        'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
-                        'tbr': bitrate,
-                        'ext': ext,
-                    })
+                quality = self._QUALITIES.get(video_source_quality)
+                if quality:
+                    f.update({
+                        'abr': quality[0],
+                        'height': quality[1],
+                        'width': quality[2],
+                    })
+                formats.append(f)
          self._sort_formats(formats)
  
+        subtitles = {}
+        sub_file_id = video.get('SubFileId')
+        if sub_file_id and sub_file_id != '00000000-0000-0000-0000-000000000000':
+            net_storage_path = video.get('NetstoragePath')
+            invariant_id = video.get('InvariantId')
+            video_file_id = video.get('VideoFileId')
+            video_version = video.get('VideoVersion')
+            if net_storage_path and invariant_id and video_file_id and video_version:
+                subtitles.setdefault(locale[:2], []).append({
+                    'url': 'https://lc-mediaplayerns-live-s.legocdn.com/public/%s/%s_%s_%s_%s_sub.srt' % (net_storage_path, invariant_id, video_file_id, locale, video_version),
+                })
+
          return {
              'id': video_id,
              'title': title,
-            'description': self._html_search_meta('description', webpage),
-            'thumbnail': self._html_search_meta('thumbnail', webpage),
-            'duration': parse_duration(self._html_search_meta('duration', webpage)),
+            'description': video.get('Description'),
+            'thumbnail': video.get('GeneratedCoverImage') or video.get('GeneratedThumbnail'),
+            'duration': int_or_none(video.get('Length')),
              'formats': formats,
+            'subtitles': subtitles,
+            'age_limit': int_or_none(video.get('AgeFrom')),
+            'season': video.get('SeasonTitle'),
+            'season_number': int_or_none(video.get('Season')) or None,
+            'episode_number': int_or_none(video.get('Episode')) or None,
          }
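Note on the rewritten LEGO extractor above: the sketch below isolates three small steps from the new `_real_extract`, assuming Python 3. It covers building the `videoId` query value with `uuid.UUID`, deriving the geo-bypass country from the locale, and assembling a format dict for a source that is neither F4M nor M3U8. All function names are hypothetical, `quality_rank` is a local stand-in for the function returned by `qualities(...)` as I understand it (list index, or -1 for unknown names), and the `'MP4'` format value and `example.invalid` URL in the usage note are invented.

```python
import uuid

# Per-quality (audio bitrate, height, width) table copied from the hunk above.
QUALITIES = {
    'Lowest': (64, 180, 320),
    'Low': (64, 270, 480),
    'Medium': (96, 360, 640),
    'High': (128, 540, 960),
    'Highest': (128, 720, 1280),
}
QUALITY_ORDER = ['Lowest', 'Low', 'Medium', 'High', 'Highest']


def query_video_id(hex_id, locale):
    """Turn the 32-hex URL id into the hyphenated '<uuid>_<locale>' value."""
    return '%s_%s' % (uuid.UUID(hex_id), locale)


def geo_countries(locale):
    """Country code from the second half of a locale such as 'nl-nl'."""
    return [locale.split('-')[1].upper()]


def quality_rank(name):
    """Index of the quality name in QUALITY_ORDER, or -1 if it is unknown."""
    try:
        return QUALITY_ORDER.index(name)
    except ValueError:
        return -1


def progressive_format(source):
    """Build a format dict from one entry of the 'VideoFormats' list."""
    fmt, quality = source.get('Format'), source.get('Quality')
    f = {
        'format_id': '-'.join(v for v in (fmt, quality) if v),
        'quality': quality_rank(quality),
        'url': source['Url'],
    }
    dims = QUALITIES.get(quality)
    if dims:
        f.update({'abr': dims[0], 'height': dims[1], 'width': dims[2]})
    return f
```

`query_video_id('55492d823b1b4d5e985787fa8c2973b1', 'en-us')` gives `55492d82-3b1b-4d5e-9857-87fa8c2973b1_en-us`. The `_en-US` casing seen in the updated test ids comes from `video['Id']` in the API response, not from this string.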
diff --git a/youtube_dl/extractor/lnkgo.py b/youtube_dl/extractor/lnkgo.py
index cfec0d3d0910c6b107831b050f4af6dcd9d248c0..3e71852aab92fdfb607332cb1438f526f7ee6ae0 100644
--- a/youtube_dl/extractor/lnkgo.py
+++ b/youtube_dl/extractor/lnkgo.py
@@ -5,24 +5,27 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
+    clean_html,
+    compat_str,
      int_or_none,
-    unified_strdate,
+    parse_iso8601,
  )
  
  
  class LnkGoIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?lnkgo\.(?:alfa\.)?lt/visi-video/(?P<show>[^/]+)/ziurek-(?P<id>[A-Za-z0-9-]+)'
+    _VALID_URL = r'https?://(?:www\.)?lnk(?:go)?\.(?:alfa\.)?lt/(?:visi-video/[^/]+|video)/(?P<id>[A-Za-z0-9-]+)(?:/(?P<episode_id>\d+))?'
      _TESTS = [{
-        'url': 'http://lnkgo.alfa.lt/visi-video/yra-kaip-yra/ziurek-yra-kaip-yra-162',
+        'url': 'http://www.lnkgo.lt/visi-video/aktualai-pratesimas/ziurek-putka-trys-klausimai',
          'info_dict': {
-            'id': '46712',
+            'id': '10809',
              'ext': 'mp4',
-            'title': 'Yra kaip yra',
-            'upload_date': '20150107',
-            'description': 'md5:d82a5e36b775b7048617f263a0e3475e',
-            'age_limit': 7,
-            'duration': 3019,
-            'thumbnail': r're:^https?://.*\.jpg$'
+            'title': "Put'ka: Trys Klausimai",
+            'upload_date': '20161216',
+            'description': 'Seniai matytas Put’ka užduoda tris klausimėlius. Pabandykime surasti atsakymus.',
+            'age_limit': 18,
+            'duration': 117,
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'timestamp': 1481904000,
          },
          'params': {
              'skip_download': True,  # HLS download
@@ -30,20 +33,21 @@ class LnkGoIE(InfoExtractor):
      }, {
          'url': 'http://lnkgo.alfa.lt/visi-video/aktualai-pratesimas/ziurek-nerdas-taiso-kompiuteri-2',
          'info_dict': {
-            'id': '47289',
+            'id': '10467',
              'ext': 'mp4',
              'title': 'Nėrdas: Kompiuterio Valymas',
              'upload_date': '20150113',
              'description': 'md5:7352d113a242a808676ff17e69db6a69',
              'age_limit': 18,
              'duration': 346,
-            'thumbnail': r're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'timestamp': 1421164800,
          },
          'params': {
              'skip_download': True,  # HLS download
          },
      }, {
-        'url': 'http://www.lnkgo.lt/visi-video/aktualai-pratesimas/ziurek-putka-trys-klausimai',
+        'url': 'https://lnk.lt/video/neigalieji-tv-bokste/37413',
          'only_matching': True,
      }]
      _AGE_LIMITS = {
@@ -51,66 +55,34 @@ class LnkGoIE(InfoExtractor):
          'N-14': 14,
          'S': 18,
      }
+    _M3U8_TEMPL = 'https://vod.lnk.lt/lnk_vod/lnk/lnk/%s:%s/playlist.m3u8%s'
  
      def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(
-            url, display_id, 'Downloading player webpage')
-
-        video_id = self._search_regex(
-            r'data-ep="([^"]+)"', webpage, 'video ID')
-        title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
-        upload_date = unified_strdate(self._search_regex(
-            r'class="[^"]*meta-item[^"]*air-time[^"]*">.*?<strong>([^<]+)</strong>', webpage, 'upload date', fatal=False))
-
-        thumbnail_w = int_or_none(
-            self._og_search_property('image:width', webpage, 'thumbnail width', fatal=False))
-        thumbnail_h = int_or_none(
-            self._og_search_property('image:height', webpage, 'thumbnail height', fatal=False))
-        thumbnail = {
-            'url': self._og_search_thumbnail(webpage),
-        }
-        if thumbnail_w and thumbnail_h:
-            thumbnail.update({
-                'width': thumbnail_w,
-                'height': thumbnail_h,
-            })
-
-        config = self._parse_json(self._search_regex(
-            r'episodePlayer\((\{.*?\}),\s*\{', webpage, 'sources'), video_id)
-
-        if config.get('pGeo'):
-            self.report_warning(
-                'This content might not be available in your country due to copyright reasons')
+        display_id, video_id = re.match(self._VALID_URL, url).groups()
  
-        formats = [{
-            'format_id': 'hls',
-            'ext': 'mp4',
-            'url': config['EpisodeVideoLink_HLS'],
-        }]
-
-        m = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<play_path>.+)$', config['EpisodeVideoLink'])
-        if m:
-            formats.append({
-                'format_id': 'rtmp',
-                'ext': 'flv',
-                'url': m.group('url'),
-                'play_path': m.group('play_path'),
-                'page_url': url,
-            })
+        video_info = self._download_json(
+            'https://lnk.lt/api/main/video-page/%s/%s/false' % (display_id, video_id or '0'),
+            display_id)['videoConfig']['videoInfo']
  
+        video_id = compat_str(video_info['id'])
+        title = video_info['title']
+        prefix = 'smil' if video_info.get('isQualityChangeAvailable') else 'mp4'
+        formats = self._extract_m3u8_formats(
+            self._M3U8_TEMPL % (prefix, video_info['videoUrl'], video_info.get('secureTokenParams') or ''),
+            video_id, 'mp4', 'm3u8_native')
          self._sort_formats(formats)
  
+        poster_image = video_info.get('posterImage')
+
          return {
              'id': video_id,
              'display_id': display_id,
              'title': title,
              'formats': formats,
-            'thumbnails': [thumbnail],
-            'duration': int_or_none(config.get('VideoTime')),
-            'description': description,
-            'age_limit': self._AGE_LIMITS.get(config.get('PGRating'), 0),
-            'upload_date': upload_date,
+            'thumbnail': 'https://lnk.lt/all-images/' + poster_image if poster_image else None,
+            'duration': int_or_none(video_info.get('duration')),
+            'description': clean_html(video_info.get('htmlDescription')),
+            'age_limit': self._AGE_LIMITS.get(video_info.get('pgRating'), 0),
+            'timestamp': parse_iso8601(video_info.get('airDate')),
+            'view_count': int_or_none(video_info.get('viewsCount')),
          }
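Note on the new LnkGo flow above: the extractor no longer scrapes the page; it matches the URL, calls a JSON endpoint, and fills an HLS template. A minimal sketch of the two string-building steps follows, with no network access. The helper names `api_url` and `m3u8_url` are hypothetical, and the `videoUrl` value in the usage note is an invented placeholder, not real API output.

```python
import re

# Pattern and template copied from the hunk above.
VALID_URL = (
    r'https?://(?:www\.)?lnk(?:go)?\.(?:alfa\.)?lt/'
    r'(?:visi-video/[^/]+|video)/(?P<id>[A-Za-z0-9-]+)(?:/(?P<episode_id>\d+))?')
M3U8_TEMPL = 'https://vod.lnk.lt/lnk_vod/lnk/lnk/%s:%s/playlist.m3u8%s'


def api_url(page_url):
    """Build the video-page API URL; a missing episode id becomes '0'."""
    display_id, video_id = re.match(VALID_URL, page_url).groups()
    return 'https://lnk.lt/api/main/video-page/%s/%s/false' % (
        display_id, video_id or '0')


def m3u8_url(video_info):
    """Fill the HLS template from the 'videoInfo' object of the API response."""
    # 'smil' is used when the API reports that quality switching is available.
    prefix = 'smil' if video_info.get('isQualityChangeAvailable') else 'mp4'
    return M3U8_TEMPL % (
        prefix, video_info['videoUrl'],
        video_info.get('secureTokenParams') or '')
```

For the `only_matching` test URL `https://lnk.lt/video/neigalieji-tv-bokste/37413`, `api_url` returns `https://lnk.lt/api/main/video-page/neigalieji-tv-bokste/37413/false`. Older `visi-video` URLs carry no numeric episode id, so the second path component falls back to `0`.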
diff --git a/youtube_dl/extractor/macgamestore.py b/youtube_dl/extractor/macgamestore.py
deleted file mode 100644
index 43db992..0000000
--- a/youtube_dl/extractor/macgamestore.py
+++ /dev/null
@@ -1,42 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import ExtractorError
-
-
-class MacGameStoreIE(InfoExtractor):
-    IE_NAME = 'macgamestore'
-    IE_DESC = 'MacGameStore trailers'
-    _VALID_URL = r'https?://(?:www\.)?macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
-
-    _TEST = {
-        'url': 'http://www.macgamestore.com/mediaviewer.php?trailer=2450',
-        'md5': '8649b8ea684b6666b4c5be736ecddc61',
-        'info_dict': {
-            'id': '2450',
-            'ext': 'm4v',
-            'title': 'Crow',
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(
-            url, video_id, 'Downloading trailer page')
-
-        if '>Missing Media<' in webpage:
-            raise ExtractorError(
-                'Trailer %s does not exist' % video_id, expected=True)
-
-        video_title = self._html_search_regex(
-            r'<title>MacGameStore: (.*?) Trailer</title>', webpage, 'title')
-
-        video_url = self._html_search_regex(
-            r'(?s)<div\s+id="video-player".*?href="([^"]+)"\s*>',
-            webpage, 'video URL')
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': video_title
-        }
diff --git a/youtube_dl/extractor/mailru.py b/youtube_dl/extractor/mailru.py
index 6b0e64b7f1032159262220dcf77c6ffaa358d014..50234798bbe5176d881b5a42357b92ef36c16fe3 100644
--- a/youtube_dl/extractor/mailru.py
+++ b/youtube_dl/extractor/mailru.py
@@ -20,10 +20,10 @@ class MailRuIE(InfoExtractor):
      IE_DESC = 'Видео@Mail.Ru'
      _VALID_URL = r'''(?x)
                      https?://
-                        (?:(?:www|m)\.)?my\.mail\.ru/
+                        (?:(?:www|m)\.)?my\.mail\.ru/+
                          (?:
                              video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
-                            (?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
+                            (?:(?P<idv2prefix>(?:[^/]+/+){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
                              (?:video/embed|\+/video/meta)/(?P<metaid>\d+)
                          )
                      '''
@@ -85,6 +85,14 @@ class MailRuIE(InfoExtractor):
          {
              'url': 'http://my.mail.ru/+/video/meta/7949340477499637815',
              'only_matching': True,
+        },
+        {
+            'url': 'https://my.mail.ru//list/sinyutin10/video/_myvideo/4.html',
+            'only_matching': True,
+        },
+        {
+            'url': 'https://my.mail.ru//list//sinyutin10/video/_myvideo/4.html',
+            'only_matching': True,
          }
      ]
  
@@ -237,7 +245,7 @@ class MailRuMusicSearchBaseIE(InfoExtractor):
  class MailRuMusicIE(MailRuMusicSearchBaseIE):
      IE_NAME = 'mailru:music'
      IE_DESC = 'Музыка@Mail.Ru'
-    _VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)'
+    _VALID_URL = r'https?://my\.mail\.ru/+music/+songs/+[^/?#&]+-(?P<id>[\da-f]+)'
      _TESTS = [{
          'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
          'md5': '0f8c22ef8c5d665b13ac709e63025610',
@@ -273,7 +281,7 @@ class MailRuMusicIE(MailRuMusicSearchBaseIE):
  class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
      IE_NAME = 'mailru:music:search'
      IE_DESC = 'Музыка@Mail.Ru'
-    _VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://my\.mail\.ru/+music/+search/+(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'https://my.mail.ru/music/search/black%20shadow',
          'info_dict': {
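Note on the Mail.Ru pattern changes above: each `/` separator becomes `/+`, so URLs with repeated slashes still match. The sketch below compares the old and new `mailru:music:search` patterns. The double-slash music URL is a hypothetical input chosen for illustration; the tests added in this diff only cover double slashes in video URLs.

```python
import re

# Old and new _VALID_URL values for MailRuMusicSearchIE, copied from the hunk.
OLD_SEARCH_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
NEW_SEARCH_URL = r'https?://my\.mail\.ru/+music/+search/+(?P<id>[^/?#&]+)'


def search_id(pattern, url):
    """Return the captured search term, or None when the URL does not match."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


normal = 'https://my.mail.ru/music/search/black%20shadow'
doubled = 'https://my.mail.ru//music//search/black%20shadow'

# Both patterns accept the canonical URL; only the new one accepts the
# variant with repeated slashes.
results = {
    'old_normal': search_id(OLD_SEARCH_URL, normal),
    'new_normal': search_id(NEW_SEARCH_URL, normal),
    'old_doubled': search_id(OLD_SEARCH_URL, doubled),
    'new_doubled': search_id(NEW_SEARCH_URL, doubled),
}
```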
diff --git a/youtube_dl/extractor/makertv.py b/youtube_dl/extractor/makertv.py
deleted file mode 100644
index 8eda69c..0000000
--- a/youtube_dl/extractor/makertv.py
+++ /dev/null
@@ -1,32 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-
-class MakerTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:www\.)?maker\.tv/(?:[^/]+/)*video|makerplayer\.com/embed/maker)/(?P<id>[a-zA-Z0-9]{12})'
-    _TEST = {
-        'url': 'http://www.maker.tv/video/Fh3QgymL9gsc',
-        'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
-        'info_dict': {
-            'id': 'Fh3QgymL9gsc',
-            'ext': 'mp4',
-            'title': 'Maze Runner: The Scorch Trials Official Movie Review',
-            'description': 'md5:11ff3362d7ef1d679fdb649f6413975a',
-            'upload_date': '20150918',
-            'timestamp': 1442549540,
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        jwplatform_id = self._search_regex(r'jw_?id="([^"]+)"', webpage, 'jwplatform id')
-
-        return {
-            '_type': 'url_transparent',
-            'id': video_id,
-            'url': 'jwplatform:%s' % jwplatform_id,
-            'ie_key': 'JWPlatform',
-        }
diff --git a/youtube_dl/extractor/mangomolo.py b/youtube_dl/extractor/mangomolo.py
index 482175a3412db859e32cb79893df1c830afa17e7..acee370e931602bd3972eb631109e71c74c97ce8 100644
--- a/youtube_dl/extractor/mangomolo.py
+++ b/youtube_dl/extractor/mangomolo.py
@@ -10,18 +10,21 @@ from ..utils import int_or_none
  
  
  class MangomoloBaseIE(InfoExtractor):
+    _BASE_REGEX = r'https?://(?:admin\.mangomolo\.com/analytics/index\.php/customers/embed/|player\.mangomolo\.com/v1/)'
+
      def _get_real_id(self, page_id):
          return page_id
  
      def _real_extract(self, url):
          page_id = self._get_real_id(self._match_id(url))
-        webpage = self._download_webpage(url, page_id)
+        webpage = self._download_webpage(
+            'https://player.mangomolo.com/v1/%s?%s' % (self._TYPE, url.split('?')[1]), page_id)
          hidden_inputs = self._hidden_inputs(webpage)
          m3u8_entry_protocol = 'm3u8' if self._IS_LIVE else 'm3u8_native'
  
          format_url = self._html_search_regex(
              [
-                r'file\s*:\s*"(https?://[^"]+?/playlist\.m3u8)',
+                r'(?:file|src)\s*:\s*"(https?://[^"]+?/playlist\.m3u8)',
                  r'<a[^>]+href="(rtsp://[^"]+)"'
              ], webpage, 'format url')
          formats = self._extract_wowza_formats(
@@ -39,14 +42,16 @@ class MangomoloBaseIE(InfoExtractor):
  
  
  class MangomoloVideoIE(MangomoloBaseIE):
-    IE_NAME = 'mangomolo:video'
-    _VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/video\?.*?\bid=(?P<id>\d+)'
+    _TYPE = 'video'
+    IE_NAME = 'mangomolo:' + _TYPE
+    _VALID_URL = MangomoloBaseIE._BASE_REGEX + r'video\?.*?\bid=(?P<id>\d+)'
      _IS_LIVE = False
  
  
  class MangomoloLiveIE(MangomoloBaseIE):
-    IE_NAME = 'mangomolo:live'
-    _VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/index\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
+    _TYPE = 'live'
+    IE_NAME = 'mangomolo:' + _TYPE
+    _VALID_URL = MangomoloBaseIE._BASE_REGEX + r'(live|index)\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
      _IS_LIVE = True
  
      def _get_real_id(self, page_id):
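The URL rewriting introduced in the Mangomolo hunks above can be sketched in isolation. The regex fragments and the format string are copied from the added lines; the helper name `player_url` and the sample embed URL (id and extra parameter) are assumptions made for this example.

```python
import re

# Shared prefix from the added MangomoloBaseIE._BASE_REGEX.
BASE_REGEX = (
    r'https?://(?:admin\.mangomolo\.com/analytics/index\.php/customers/embed/'
    r'|player\.mangomolo\.com/v1/)')
# Video pattern built the same way as the added MangomoloVideoIE._VALID_URL.
VIDEO_URL = BASE_REGEX + r'video\?.*?\bid=(?P<id>\d+)'


def player_url(url, media_type):
    """Hypothetical helper mirroring the URL passed to _download_webpage."""
    # Only the text between the first and the second '?' is reused as the query.
    return 'https://player.mangomolo.com/v1/%s?%s' % (media_type, url.split('?')[1])


# Made-up embed URL for illustration.
embed = ('https://admin.mangomolo.com/analytics/index.php/customers/embed/'
         'video?id=12345&autoplay=1')

video_id = re.match(VIDEO_URL, embed).group('id')
rewritten = player_url(embed, 'video')
print(video_id)
print(rewritten)
```

Because `_VALID_URL` requires a literal `?`, the `split('?')[1]` lookup should always find a query part for URLs the extractor accepts.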
diff --git a/youtube_dl/extractor/mediaset.py b/youtube_dl/extractor/mediaset.py
index df3748798e9f943c16ddd29cead9515b0dd44c57..027a790b8b182541dd8b592b183b0dbaff505322 100644
--- a/youtube_dl/extractor/mediaset.py
+++ b/youtube_dl/extractor/mediaset.py
@@ -27,7 +27,7 @@ class MediasetIE(ThePlatformBaseIE):
                                  (?:video|on-demand)/(?:[^/]+/)+[^/]+_|
                                  player/index\.html\?.*?\bprogramGuid=
                              )
-                    )(?P<id>[0-9A-Z]{16})
+                    )(?P<id>[0-9A-Z]{16,})
                      '''
      _TESTS = [{
          # full episode
@@ -62,7 +62,6 @@ class MediasetIE(ThePlatformBaseIE):
              'uploader': 'Canale 5',
              'uploader_id': 'C5',
          },
-        'expected_warnings': ['HTTP Error 403: Forbidden'],
      }, {
          # clip
          'url': 'https://www.mediasetplay.mediaset.it/video/gogglebox/un-grande-classico-della-commedia-sexy_FAFU000000661680',
@@ -78,6 +77,18 @@ class MediasetIE(ThePlatformBaseIE):
      }, {
          'url': 'mediaset:FAFU000000665924',
          'only_matching': True,
+    }, {
+        'url': 'https://www.mediasetplay.mediaset.it/video/mediasethaacuoreilfuturo/palmieri-alicudi-lisola-dei-tre-bambini-felici--un-decreto-per-alicudi-e-tutte-le-microscuole_FD00000000102295',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.mediasetplay.mediaset.it/video/cherryseason/anticipazioni-degli-episodi-del-23-ottobre_F306837101005C02',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.mediasetplay.mediaset.it/video/tg5/ambiente-onda-umana-per-salvare-il-pianeta_F309453601079D01',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.mediasetplay.mediaset.it/video/grandefratellovip/benedetta-una-doccia-gelata_F309344401044C135',
+        'only_matching': True,
      }]
  
      @staticmethod
@@ -109,6 +120,11 @@ class MediasetIE(ThePlatformBaseIE):
                  entries.append(embed_url)
          return entries
  
+    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
+        for video in smil.findall(self._xpath_ns('.//video', namespace)):
+            video.attrib['src'] = re.sub(r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+', r'\1\2', video.attrib['src'])
+        return super(MediasetIE, self)._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
+
      def _real_extract(self, url):
          guid = self._match_id(url)
          tp_path = 'PR1GhC/media/guid/2702976343/' + guid
@@ -118,14 +134,15 @@ class MediasetIE(ThePlatformBaseIE):
          subtitles = {}
          first_e = None
          for asset_type in ('SD', 'HD'):
-            for f in ('MPEG4', 'MPEG-DASH', 'M3U', 'ISM'):
+            # TODO: fixup ISM+none manifest URLs
+            for f in ('MPEG4', 'MPEG-DASH+none', 'M3U+none'):
                  try:
                      tp_formats, tp_subtitles = self._extract_theplatform_smil(
                          update_url_query('http://link.theplatform.%s/s/%s' % (self._TP_TLD, tp_path), {
                              'mbr': 'true',
                              'formats': f,
                              'assetTypes': asset_type,
-                        }), guid, 'Downloading %s %s SMIL data' % (f, asset_type))
+                        }), guid, 'Downloading %s %s SMIL data' % (f.split('+')[0], asset_type))
                  except ExtractorError as e:
                      if not first_e:
                          first_e = e
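To illustrate the `src` rewriting that the added `_parse_smil_formats` override in the Mediaset hunks above applies to each SMIL `video` element, here is a standalone sketch. The substitution pattern is copied from the `+` line; the function name `strip_vod05t` and both sample URLs are invented for the example.

```python
import re


def strip_vod05t(src):
    """Hypothetical wrapper around the substitution used in the hunk."""
    # Drops the 't' after 'vod05' and the query string after the .mpd path.
    return re.sub(
        r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+',
        r'\1\2', src)


# Made-up manifest URL on the affected host, with a query string.
tokenized = 'https://vod05t-mediaset-it.akamaized.net/some/path/manifest.mpd?hdnts=abc'
# Made-up URL on a different host; the pattern should leave it alone.
other = 'https://example.invalid/some/path/manifest.mpd?hdnts=abc'

cleaned = strip_vod05t(tokenized)
untouched = strip_vod05t(other)
print(cleaned)
print(untouched)
```

URLs that do not match the pattern pass through unchanged, so the override only affects the `vod05t` host.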
diff --git a/youtube_dl/extractor/minhateca.py b/youtube_dl/extractor/minhateca.py
deleted file mode 100644
index dccc542..0000000
--- a/youtube_dl/extractor/minhateca.py
+++ /dev/null
@@ -1,70 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    parse_duration,
-    parse_filesize,
-    sanitized_Request,
-    urlencode_postdata,
-)
-
-
-class MinhatecaIE(InfoExtractor):
-    _VALID_URL = r'https?://minhateca\.com\.br/[^?#]+,(?P<id>[0-9]+)\.'
-    _TEST = {
-        'url': 'http://minhateca.com.br/pereba/misc/youtube-dl+test+video,125848331.mp4(video)',
-        'info_dict': {
-            'id': '125848331',
-            'ext': 'mp4',
-            'title': 'youtube-dl test video',
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'filesize_approx': 1530000,
-            'duration': 9,
-            'view_count': int,
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        token = self._html_search_regex(
-            r'<input name="__RequestVerificationToken".*?value="([^"]+)"',
-            webpage, 'request token')
-        token_data = [
-            ('fileId', video_id),
-            ('__RequestVerificationToken', token),
-        ]
-        req = sanitized_Request(
-            'http://minhateca.com.br/action/License/Download',
-            data=urlencode_postdata(token_data))
-        req.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        data = self._download_json(
-            req, video_id, note='Downloading metadata')
-
-        video_url = data['redirectUrl']
-        title_str = self._html_search_regex(
-            r'<h1.*?>(.*?)</h1>', webpage, 'title')
-        title, _, ext = title_str.rpartition('.')
-        filesize_approx = parse_filesize(self._html_search_regex(
-            r'<p class="fileSize">(.*?)</p>',
-            webpage, 'file size approximation', fatal=False))
-        duration = parse_duration(self._html_search_regex(
-            r'(?s)<p class="fileLeng[ht][th]">.*?class="bold">(.*?)<',
-            webpage, 'duration', fatal=False))
-        view_count = int_or_none(self._html_search_regex(
-            r'<p class="downloadsCounter">([0-9]+)</p>',
-            webpage, 'view count', fatal=False))
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': title,
-            'ext': ext,
-            'filesize_approx': filesize_approx,
-            'duration': duration,
-            'view_count': view_count,
-            'thumbnail': self._og_search_thumbnail(webpage),
-        }
diff --git a/youtube_dl/extractor/mit.py b/youtube_dl/extractor/mit.py
index 1aea78d118a84a135494214da54c3c2c21465bc9..e1506a745a69e6c300f1e36cdc6b3e4ed8206bf0 100644
--- a/youtube_dl/extractor/mit.py
+++ b/youtube_dl/extractor/mit.py
@@ -65,30 +65,6 @@ class TechTVMITIE(InfoExtractor):
          }
  
  
-class MITIE(TechTVMITIE):
-    IE_NAME = 'video.mit.edu'
-    _VALID_URL = r'https?://video\.mit\.edu/watch/(?P<title>[^/]+)'
-
-    _TEST = {
-        'url': 'http://video.mit.edu/watch/the-government-is-profiling-you-13222/',
-        'md5': '7db01d5ccc1895fc5010e9c9e13648da',
-        'info_dict': {
-            'id': '21783',
-            'ext': 'mp4',
-            'title': 'The Government is Profiling You',
-            'description': 'md5:ad5795fe1e1623b73620dbfd47df9afd',
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        page_title = mobj.group('title')
-        webpage = self._download_webpage(url, page_title)
-        embed_url = self._search_regex(
-            r'<iframe .*?src="(.+?)"', webpage, 'embed url')
-        return self.url_result(embed_url)
-
-
  class OCWMITIE(InfoExtractor):
      IE_NAME = 'ocw.mit.edu'
      _VALID_URL = r'^https?://ocw\.mit\.edu/courses/(?P<topic>[a-z0-9\-]+)'
diff --git a/youtube_dl/extractor/mitele.py b/youtube_dl/extractor/mitele.py
index 40f214a873a7dcbb5087726aa270b74a5dab3e76..ad9da96125b92d66d4e6367e23b7ed9a1a633aaf 100644
--- a/youtube_dl/extractor/mitele.py
+++ b/youtube_dl/extractor/mitele.py
@@ -4,8 +4,8 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
+    parse_iso8601,
      smuggle_url,
-    parse_duration,
  )
  
  
@@ -18,16 +18,18 @@ class MiTeleIE(InfoExtractor):
          'info_dict': {
              'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
              'ext': 'mp4',
-            'title': 'Tor, la web invisible',
-            'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
+            'title': 'Diario de La redacción Programa 144',
+            'description': 'md5:07c35a7b11abb05876a6a79185b58d27',
              'series': 'Diario de',
-            'season': 'La redacción',
+            'season': 'Season 14',
              'season_number': 14,
-            'season_id': 'diario_de_t14_11981',
-            'episode': 'Programa 144',
+            'episode': 'Tor, la web invisible',
              'episode_number': 3,
              'thumbnail': r're:(?i)^https?://.*\.jpg$',
              'duration': 2913,
+            'age_limit': 16,
+            'timestamp': 1471209401,
+            'upload_date': '20160814',
          },
          'add_ie': ['Ooyala'],
      }, {
@@ -39,13 +41,15 @@ class MiTeleIE(InfoExtractor):
              'title': 'Cuarto Milenio Temporada 6 Programa 226',
              'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
              'series': 'Cuarto Milenio',
-            'season': 'Temporada 6',
+            'season': 'Season 6',
              'season_number': 6,
-            'season_id': 'cuarto_milenio_t06_12715',
-            'episode': 'Programa 226',
+            'episode': 'Episode 24',
              'episode_number': 24,
              'thumbnail': r're:(?i)^https?://.*\.jpg$',
              'duration': 7313,
+            'age_limit': 12,
+            'timestamp': 1471209021,
+            'upload_date': '20160814',
          },
          'params': {
              'skip_download': True,
@@ -54,67 +58,36 @@ class MiTeleIE(InfoExtractor):
      }, {
          'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
          'only_matching': True,
+    }, {
+        'url': 'https://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144-40_1006364575251/player/',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        paths = self._download_json(
-            'https://www.mitele.es/amd/agp/web/metadata/general_configuration',
-            video_id, 'Downloading paths JSON')
-
-        ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
-        base_url = ooyala_s.get('base_url', 'cdn-search-mediaset.carbyne.ps.ooyala.com')
-        full_path = ooyala_s.get('full_path', '/search/v1/full/providers/')
-        source = self._download_json(
-            '%s://%s%s%s/docs/%s' % (
-                ooyala_s.get('protocol', 'https'), base_url, full_path,
-                ooyala_s.get('provider_id', '104951'), video_id),
-            video_id, 'Downloading data JSON', query={
-                'include_titles': 'Series,Season',
-                'product_name': ooyala_s.get('product_name', 'test'),
-                'format': 'full',
-            })['hits']['hits'][0]['_source']
-
-        embedCode = source['offers'][0]['embed_codes'][0]
-        titles = source['localizable_titles'][0]
-
-        title = titles.get('title_medium') or titles['title_long']
-
-        description = titles.get('summary_long') or titles.get('summary_medium')
-
-        def get(key1, key2):
-            value1 = source.get(key1)
-            if not value1 or not isinstance(value1, list):
-                return
-            if not isinstance(value1[0], dict):
-                return
-            return value1[0].get(key2)
-
-        series = get('localizable_titles_series', 'title_medium')
-
-        season = get('localizable_titles_season', 'title_medium')
-        season_number = int_or_none(source.get('season_number'))
-        season_id = source.get('season_id')
-
-        episode = titles.get('title_sort_name')
-        episode_number = int_or_none(source.get('episode_number'))
-
-        duration = parse_duration(get('videos', 'duration'))
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        pre_player = self._parse_json(self._search_regex(
+            r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
+            webpage, 'Pre Player'), display_id)['prePlayer']
+        title = pre_player['title']
+        video = pre_player['video']
+        video_id = video['dataMediaId']
+        content = pre_player.get('content') or {}
+        info = content.get('info') or {}
  
          return {
              '_type': 'url_transparent',
              # for some reason only HLS is supported
-            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}),
+            'url': smuggle_url('ooyala:' + video_id, {'supportedformats': 'm3u8,dash'}),
              'id': video_id,
              'title': title,
-            'description': description,
-            'series': series,
-            'season': season,
-            'season_number': season_number,
-            'season_id': season_id,
-            'episode': episode,
-            'episode_number': episode_number,
-            'duration': duration,
-            'thumbnail': get('images', 'url'),
+            'description': info.get('synopsis'),
+            'series': content.get('title'),
+            'season_number': int_or_none(info.get('season_number')),
+            'episode': content.get('subtitle'),
+            'episode_number': int_or_none(info.get('episode_number')),
+            'duration': int_or_none(info.get('duration')),
+            'thumbnail': video.get('dataPoster'),
+            'age_limit': int_or_none(info.get('rating')),
+            'timestamp': parse_iso8601(pre_player.get('publishedTime')),
          }
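The rewritten MiTele `_real_extract` above reads an inline JSON state object from the page instead of calling the Ooyala search API. A minimal sketch of that step using only the standard library follows; the regex is copied from the added lines, while the page fragment and all of its field values are invented for illustration.

```python
import json
import re

# Hypothetical page fragment; real pages are much larger and these values are made up.
webpage = '''<script>
window.$REACTBASE_STATE.prePlayer_mtweb = {"prePlayer": {"title": "Example", "video": {"dataMediaId": "abc123"}, "content": {"title": "Series", "info": {"season_number": "14"}}}}
</script>'''

# Same pattern as in the hunk: capture the object literal assigned to the state key.
raw = re.search(
    r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})', webpage).group(1)
pre_player = json.loads(raw)['prePlayer']

# Mandatory fields are indexed directly, optional ones fall back to empty dicts.
title = pre_player['title']
video_id = pre_player['video']['dataMediaId']
content = pre_player.get('content') or {}
info = content.get('info') or {}

# Stand-in for int_or_none(): convert only when a value is present.
season_number = int(info['season_number']) if info.get('season_number') else None
print(video_id, title, season_number)
```

Since `.` does not match newlines by default, the greedy `{.+}` stays on the line holding the assignment, which is why the sample keeps the JSON on a single line.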
diff --git a/youtube_dl/extractor/mixcloud.py b/youtube_dl/extractor/mixcloud.py
index bf5353ef941d660ff3fe542069a7160afe281369..9759560f1bee5642c7e4f11c8aa8891fd4b3ed7d 100644
--- a/youtube_dl/extractor/mixcloud.py
+++ b/youtube_dl/extractor/mixcloud.py
@@ -1,6 +1,5 @@
  from __future__ import unicode_literals
  
-import functools
  import itertools
  import re
  
@@ -11,28 +10,37 @@ from ..compat import (
      compat_ord,
      compat_str,
      compat_urllib_parse_unquote,
-    compat_urlparse,
      compat_zip
  )
  from ..utils import (
-    clean_html,
-    ExtractorError,
      int_or_none,
-    OnDemandPagedList,
-    str_to_int,
+    parse_iso8601,
+    strip_or_none,
      try_get,
-    urljoin,
  )
  
  
-class MixcloudIE(InfoExtractor):
+class MixcloudBaseIE(InfoExtractor):
+    def _call_api(self, object_type, object_fields, display_id, username, slug=None):
+        lookup_key = object_type + 'Lookup'
+        return self._download_json(
+            'https://www.mixcloud.com/graphql', display_id, query={
+                'query': '''{
+  %s(lookup: {username: "%s"%s}) {
+    %s
+  }
+}''' % (lookup_key, username, ', slug: "%s"' % slug if slug else '', object_fields)
+            })['data'][lookup_key]
+
+
+class MixcloudIE(MixcloudBaseIE):
      _VALID_URL = r'https?://(?:(?:www|beta|m)\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
      IE_NAME = 'mixcloud'
  
      _TESTS = [{
          'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
          'info_dict': {
-            'id': 'dholbach-cryptkeeper',
+            'id': 'dholbach_cryptkeeper',
              'ext': 'm4a',
              'title': 'Cryptkeeper',
              'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
@@ -40,11 +48,13 @@ class MixcloudIE(InfoExtractor):
              'uploader_id': 'dholbach',
              'thumbnail': r're:https?://.*\.jpg',
              'view_count': int,
+            'timestamp': 1321359578,
+            'upload_date': '20111115',
          },
      }, {
          'url': 'http://www.mixcloud.com/gillespeterson/caribou-7-inch-vinyl-mix-chat/',
          'info_dict': {
-            'id': 'gillespeterson-caribou-7-inch-vinyl-mix-chat',
+            'id': 'gillespeterson_caribou-7-inch-vinyl-mix-chat',
              'ext': 'mp3',
              'title': 'Caribou 7 inch Vinyl Mix & Chat',
              'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
@@ -52,11 +62,14 @@ class MixcloudIE(InfoExtractor):
              'uploader_id': 'gillespeterson',
              'thumbnail': 're:https?://.*',
              'view_count': int,
+            'timestamp': 1422987057,
+            'upload_date': '20150203',
          },
      }, {
          'url': 'https://beta.mixcloud.com/RedLightRadio/nosedrip-15-red-light-radio-01-18-2016/',
          'only_matching': True,
      }]
+    _DECRYPTION_KEY = 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'
  
      @staticmethod
      def _decrypt_xor_cipher(key, ciphertext):
@@ -66,176 +79,193 @@ class MixcloudIE(InfoExtractor):
              for ch, k in compat_zip(ciphertext, itertools.cycle(key))])
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        uploader = mobj.group(1)
-        cloudcast_name = mobj.group(2)
-        track_id = compat_urllib_parse_unquote('-'.join((uploader, cloudcast_name)))
+        username, slug = re.match(self._VALID_URL, url).groups()
+        username, slug = compat_urllib_parse_unquote(username), compat_urllib_parse_unquote(slug)
+        track_id = '%s_%s' % (username, slug)
+
+        cloudcast = self._call_api('cloudcast', '''audioLength
+    comments(first: 100) {
+      edges {
+        node {
+          comment
+          created
+          user {
+            displayName
+            username
+          }
+        }
+      }
+      totalCount
+    }
+    description
+    favorites {
+      totalCount
+    }
+    featuringArtistList
+    isExclusive
+    name
+    owner {
+      displayName
+      url
+      username
+    }
+    picture(width: 1024, height: 1024) {
+        url
+    }
+    plays
+    publishDate
+    reposts {
+      totalCount
+    }
+    streamInfo {
+      dashUrl
+      hlsUrl
+      url
+    }
+    tags {
+      tag {
+        name
+      }
+    }''', track_id, username, slug)
  
-        webpage = self._download_webpage(url, track_id)
+        title = cloudcast['name']
  
-        # Legacy path
-        encrypted_play_info = self._search_regex(
-            r'm-play-info="([^"]+)"', webpage, 'play info', default=None)
+        stream_info = cloudcast['streamInfo']
+        formats = []
  
-        if encrypted_play_info is not None:
-            # Decode
-            encrypted_play_info = compat_b64decode(encrypted_play_info)
-        else:
-            # New path
-            full_info_json = self._parse_json(self._html_search_regex(
-                r'<script id="relay-data" type="text/x-mixcloud">([^<]+)</script>',
-                webpage, 'play info'), 'play info')
-            for item in full_info_json:
-                item_data = try_get(
-                    item, lambda x: x['cloudcast']['data']['cloudcastLookup'],
-                    dict)
-                if try_get(item_data, lambda x: x['streamInfo']['url']):
-                    info_json = item_data
-                    break
-            else:
-                raise ExtractorError('Failed to extract matching stream info')
-
-        message = self._html_search_regex(
-            r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
-            webpage, 'error message', default=None)
-
-        js_url = self._search_regex(
-            r'<script[^>]+\bsrc=["\"](https://(?:www\.)?mixcloud\.com/media/(?:js2/www_js_4|js/www)\.[^>]+\.js)',
-            webpage, 'js url')
-        js = self._download_webpage(js_url, track_id, 'Downloading JS')
-        # Known plaintext attack
-        if encrypted_play_info:
-            kps = ['{"stream_url":']
-            kpa_target = encrypted_play_info
-        else:
-            kps = ['https://', 'http://']
-            kpa_target = compat_b64decode(info_json['streamInfo']['url'])
-        for kp in kps:
-            partial_key = self._decrypt_xor_cipher(kpa_target, kp)
-            for quote in ["'", '"']:
-                key = self._search_regex(
-                    r'{0}({1}[^{0}]*){0}'.format(quote, re.escape(partial_key)),
-                    js, 'encryption key', default=None)
-                if key is not None:
-                    break
+        for url_key in ('url', 'hlsUrl', 'dashUrl'):
+            format_url = stream_info.get(url_key)
+            if not format_url:
+                continue
+            decrypted = self._decrypt_xor_cipher(
+                self._DECRYPTION_KEY, compat_b64decode(format_url))
+            if url_key == 'hlsUrl':
+                formats.extend(self._extract_m3u8_formats(
+                    decrypted, track_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif url_key == 'dashUrl':
+                formats.extend(self._extract_mpd_formats(
+                    decrypted, track_id, mpd_id='dash', fatal=False))
              else:
+                formats.append({
+                    'format_id': 'http',
+                    'url': decrypted,
+                    'downloader_options': {
+                        # Mixcloud starts throttling at >~5M
+                        'http_chunk_size': 5242880,
+                    },
+                })
+
+        if not formats and cloudcast.get('isExclusive'):
+            self.raise_login_required()
+
+        self._sort_formats(formats)
+
+        comments = []
+        for edge in (try_get(cloudcast, lambda x: x['comments']['edges']) or []):
+            node = edge.get('node') or {}
+            text = strip_or_none(node.get('comment'))
+            if not text:
                  continue
-            break
-        else:
-            raise ExtractorError('Failed to extract encryption key')
-
-        if encrypted_play_info is not None:
-            play_info = self._parse_json(self._decrypt_xor_cipher(key, encrypted_play_info), 'play info')
-            if message and 'stream_url' not in play_info:
-                raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
-            song_url = play_info['stream_url']
-            formats = [{
-                'format_id': 'normal',
-                'url': song_url
-            }]
-
-            title = self._html_search_regex(r'm-title="([^"]+)"', webpage, 'title')
-            thumbnail = self._proto_relative_url(self._html_search_regex(
-                r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail', fatal=False))
-            uploader = self._html_search_regex(
-                r'm-owner-name="([^"]+)"', webpage, 'uploader', fatal=False)
-            uploader_id = self._search_regex(
-                r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False)
-            description = self._og_search_description(webpage)
-            view_count = str_to_int(self._search_regex(
-                [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
-                 r'/listeners/?">([0-9,.]+)</a>',
-                 r'(?:m|data)-tooltip=["\']([\d,.]+) plays'],
-                webpage, 'play count', default=None))
+            user = node.get('user') or {}
+            comments.append({
+                'author': user.get('displayName'),
+                'author_id': user.get('username'),
+                'text': text,
+                'timestamp': parse_iso8601(node.get('created')),
+            })
  
-        else:
-            title = info_json['name']
-            thumbnail = urljoin(
-                'https://thumbnailer.mixcloud.com/unsafe/600x600/',
-                try_get(info_json, lambda x: x['picture']['urlRoot'], compat_str))
-            uploader = try_get(info_json, lambda x: x['owner']['displayName'])
-            uploader_id = try_get(info_json, lambda x: x['owner']['username'])
-            description = try_get(info_json, lambda x: x['description'])
-            view_count = int_or_none(try_get(info_json, lambda x: x['plays']))
-
-            stream_info = info_json['streamInfo']
-            formats = []
-
-            def decrypt_url(f_url):
-                for k in (key, 'IFYOUWANTTHEARTISTSTOGETPAIDDONOTDOWNLOADFROMMIXCLOUD'):
-                    decrypted_url = self._decrypt_xor_cipher(k, f_url)
-                    if re.search(r'^https?://[0-9A-Za-z.]+/[0-9A-Za-z/.?=&_-]+$', decrypted_url):
-                        return decrypted_url
-
-            for url_key in ('url', 'hlsUrl', 'dashUrl'):
-                format_url = stream_info.get(url_key)
-                if not format_url:
-                    continue
-                decrypted = decrypt_url(compat_b64decode(format_url))
-                if not decrypted:
-                    continue
-                if url_key == 'hlsUrl':
-                    formats.extend(self._extract_m3u8_formats(
-                        decrypted, track_id, 'mp4', entry_protocol='m3u8_native',
-                        m3u8_id='hls', fatal=False))
-                elif url_key == 'dashUrl':
-                    formats.extend(self._extract_mpd_formats(
-                        decrypted, track_id, mpd_id='dash', fatal=False))
-                else:
-                    formats.append({
-                        'format_id': 'http',
-                        'url': decrypted,
-                        'downloader_options': {
-                            # Mixcloud starts throttling at >~5M
-                            'http_chunk_size': 5242880,
-                        },
-                    })
-            self._sort_formats(formats)
+        tags = []
+        for t in cloudcast.get('tags') or []:
+            tag = try_get(t, lambda x: x['tag']['name'], compat_str)
+            if tag:
+                tags.append(tag)
+
+        get_count = lambda x: int_or_none(try_get(cloudcast, lambda y: y[x]['totalCount']))
+
+        owner = cloudcast.get('owner') or {}
  
          return {
              'id': track_id,
              'title': title,
              'formats': formats,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'view_count': view_count,
+            'description': cloudcast.get('description'),
+            'thumbnail': try_get(cloudcast, lambda x: x['picture']['url'], compat_str),
+            'uploader': owner.get('displayName'),
+            'timestamp': parse_iso8601(cloudcast.get('publishDate')),
+            'uploader_id': owner.get('username'),
+            'uploader_url': owner.get('url'),
+            'duration': int_or_none(cloudcast.get('audioLength')),
+            'view_count': int_or_none(cloudcast.get('plays')),
+            'like_count': get_count('favorites'),
+            'repost_count': get_count('reposts'),
+            'comment_count': get_count('comments'),
+            'comments': comments,
+            'tags': tags,
+            'artist': ', '.join(cloudcast.get('featuringArtistList') or []) or None,
          }
  
  
-class MixcloudPlaylistBaseIE(InfoExtractor):
-    _PAGE_SIZE = 24
+class MixcloudPlaylistBaseIE(MixcloudBaseIE):
+    def _get_cloudcast(self, node):
+        return node
  
-    def _find_urls_in_page(self, page):
-        for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
-            yield self.url_result(
-                compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
-                MixcloudIE.ie_key())
+    def _get_playlist_title(self, title, slug):
+        return title
+
+    def _real_extract(self, url):
+        username, slug = re.match(self._VALID_URL, url).groups()
+        username = compat_urllib_parse_unquote(username)
+        if not slug:
+            slug = 'uploads'
+        else:
+            slug = compat_urllib_parse_unquote(slug)
+        playlist_id = '%s_%s' % (username, slug)
  
-    def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
-        real_page_number = real_page_number or current_page + 1
-        return self._download_webpage(
-            'https://www.mixcloud.com/%s/' % path, video_id,
-            note='Download %s (page %d)' % (page_name, current_page + 1),
-            errnote='Unable to download %s' % page_name,
-            query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
-            headers={'X-Requested-With': 'XMLHttpRequest'})
+        is_playlist_type = self._ROOT_TYPE == 'playlist'
+        playlist_type = 'items' if is_playlist_type else slug
+        list_filter = ''
  
-    def _tracks_page_func(self, page, video_id, page_name, current_page):
-        resp = self._fetch_tracks_page(page, video_id, page_name, current_page)
+        has_next_page = True
+        entries = []
+        while has_next_page:
+            playlist = self._call_api(
+                self._ROOT_TYPE, '''%s
+    %s
+    %s(first: 100%s) {
+      edges {
+        node {
+          %s
+        }
+      }
+      pageInfo {
+        endCursor
+        hasNextPage
+      }
+    }''' % (self._TITLE_KEY, self._DESCRIPTION_KEY, playlist_type, list_filter, self._NODE_TEMPLATE),
+                playlist_id, username, slug if is_playlist_type else None)
+
+            items = playlist.get(playlist_type) or {}
+            for edge in items.get('edges', []):
+                cloudcast = self._get_cloudcast(edge.get('node') or {})
+                cloudcast_url = cloudcast.get('url')
+                if not cloudcast_url:
+                    continue
+                entries.append(self.url_result(
+                    cloudcast_url, MixcloudIE.ie_key(), cloudcast.get('slug')))
  
-        for item in self._find_urls_in_page(resp):
-            yield item
+            page_info = items['pageInfo']
+            has_next_page = page_info['hasNextPage']
+            list_filter = ', after: "%s"' % page_info['endCursor']
  
-    def _get_user_description(self, page_content):
-        return self._html_search_regex(
-            r'<div[^>]+class="profile-bio"[^>]*>(.+?)</div>',
-            page_content, 'user description', fatal=False)
+        return self.playlist_result(
+            entries, playlist_id,
+            self._get_playlist_title(playlist[self._TITLE_KEY], slug),
+            playlist.get(self._DESCRIPTION_KEY))
  
  
  class MixcloudUserIE(MixcloudPlaylistBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
+    _VALID_URL = r'https?://(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/(?P<type>uploads|favorites|listens|stream)?/?$'
      IE_NAME = 'mixcloud:user'
  
      _TESTS = [{
@@ -243,68 +273,58 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
          'info_dict': {
              'id': 'dholbach_uploads',
              'title': 'Daniel Holbach (uploads)',
-            'description': 'md5:def36060ac8747b3aabca54924897e47',
+            'description': 'md5:b60d776f0bab534c5dabe0a34e47a789',
          },
-        'playlist_mincount': 11,
+        'playlist_mincount': 36,
      }, {
          'url': 'http://www.mixcloud.com/dholbach/uploads/',
          'info_dict': {
              'id': 'dholbach_uploads',
              'title': 'Daniel Holbach (uploads)',
-            'description': 'md5:def36060ac8747b3aabca54924897e47',
+            'description': 'md5:b60d776f0bab534c5dabe0a34e47a789',
          },
-        'playlist_mincount': 11,
+        'playlist_mincount': 36,
      }, {
          'url': 'http://www.mixcloud.com/dholbach/favorites/',
          'info_dict': {
              'id': 'dholbach_favorites',
              'title': 'Daniel Holbach (favorites)',
-            'description': 'md5:def36060ac8747b3aabca54924897e47',
-        },
-        'params': {
-            'playlist_items': '1-100',
+            'description': 'md5:b60d776f0bab534c5dabe0a34e47a789',
          },
-        'playlist_mincount': 100,
+        # 'params': {
+        #     'playlist_items': '1-100',
+        # },
+        'playlist_mincount': 396,
      }, {
          'url': 'http://www.mixcloud.com/dholbach/listens/',
          'info_dict': {
              'id': 'dholbach_listens',
              'title': 'Daniel Holbach (listens)',
-            'description': 'md5:def36060ac8747b3aabca54924897e47',
+            'description': 'md5:b60d776f0bab534c5dabe0a34e47a789',
          },
-        'params': {
-            'playlist_items': '1-100',
+        # 'params': {
+        #     'playlist_items': '1-100',
+        # },
+        'playlist_mincount': 1623,
+        'skip': 'Large list',
+    }, {
+        'url': 'https://www.mixcloud.com/FirstEar/stream/',
+        'info_dict': {
+            'id': 'FirstEar_stream',
+            'title': 'First Ear (stream)',
+            'description': 'Curators of good music\r\n\r\nfirstearmusic.com',
          },
-        'playlist_mincount': 100,
+        'playlist_mincount': 271,
      }]
  
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        user_id = mobj.group('user')
-        list_type = mobj.group('type')
-
-        # if only a profile URL was supplied, default to download all uploads
-        if list_type is None:
-            list_type = 'uploads'
-
-        video_id = '%s_%s' % (user_id, list_type)
+    _TITLE_KEY = 'displayName'
+    _DESCRIPTION_KEY = 'biog'
+    _ROOT_TYPE = 'user'
+    _NODE_TEMPLATE = '''slug
+          url'''
  
-        profile = self._download_webpage(
-            'https://www.mixcloud.com/%s/' % user_id, video_id,
-            note='Downloading user profile',
-            errnote='Unable to download user profile')
-
-        username = self._og_search_title(profile)
-        description = self._get_user_description(profile)
-
-        entries = OnDemandPagedList(
-            functools.partial(
-                self._tracks_page_func,
-                '%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
-            self._PAGE_SIZE)
-
-        return self.playlist_result(
-            entries, video_id, '%s (%s)' % (username, list_type), description)
+    def _get_playlist_title(self, title, slug):
+        return '%s (%s)' % (title, slug)
  
  
  class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
@@ -312,87 +332,20 @@ class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
      IE_NAME = 'mixcloud:playlist'
  
      _TESTS = [{
-        'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
-        'info_dict': {
-            'id': 'RedBullThre3style_tokyo-finalists-2015',
-            'title': 'National Champions 2015',
-            'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
-        },
-        'playlist_mincount': 16,
-    }, {
          'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        user_id = mobj.group('user')
-        playlist_id = mobj.group('playlist')
-        video_id = '%s_%s' % (user_id, playlist_id)
-
-        webpage = self._download_webpage(
-            url, user_id,
-            note='Downloading playlist page',
-            errnote='Unable to download playlist page')
-
-        title = self._html_search_regex(
-            r'<a[^>]+class="parent active"[^>]*><b>\d+</b><span[^>]*>([^<]+)',
-            webpage, 'playlist title',
-            default=None) or self._og_search_title(webpage, fatal=False)
-        description = self._get_user_description(webpage)
-
-        entries = OnDemandPagedList(
-            functools.partial(
-                self._tracks_page_func,
-                '%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
-            self._PAGE_SIZE)
-
-        return self.playlist_result(entries, video_id, title, description)
-
-
-class MixcloudStreamIE(MixcloudPlaylistBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
-    IE_NAME = 'mixcloud:stream'
-
-    _TEST = {
-        'url': 'https://www.mixcloud.com/FirstEar/stream/',
          'info_dict': {
-            'id': 'FirstEar',
-            'title': 'First Ear',
-            'description': 'Curators of good music\nfirstearmusic.com',
+            'id': 'maxvibes_jazzcat-on-ness-radio',
+            'title': 'Ness Radio sessions',
          },
-        'playlist_mincount': 192,
-    }
-
-    def _real_extract(self, url):
-        user_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, user_id)
-
-        entries = []
-        prev_page_url = None
-
-        def _handle_page(page):
-            entries.extend(self._find_urls_in_page(page))
-            return self._search_regex(
-                r'm-next-page-url="([^"]+)"', page,
-                'next page URL', default=None)
-
-        next_page_url = _handle_page(webpage)
-
-        for idx in itertools.count(0):
-            if not next_page_url or prev_page_url == next_page_url:
-                break
-
-            prev_page_url = next_page_url
-            current_page = int(self._search_regex(
-                r'\?page=(\d+)', next_page_url, 'next page number'))
-
-            next_page_url = _handle_page(self._fetch_tracks_page(
-                '%s/stream' % user_id, user_id, 'stream', idx,
-                real_page_number=current_page))
-
-        username = self._og_search_title(webpage)
-        description = self._get_user_description(webpage)
-
-        return self.playlist_result(entries, user_id, username, description)
+        'playlist_mincount': 59,
+    }]
+    _TITLE_KEY = 'name'
+    _DESCRIPTION_KEY = 'description'
+    _ROOT_TYPE = 'playlist'
+    _NODE_TEMPLATE = '''cloudcast {
+            slug
+            url
+          }'''
+
+    def _get_cloudcast(self, node):
+        return node.get('cloudcast') or {}
diff --git a/youtube_dl/extractor/msn.py b/youtube_dl/extractor/msn.py
index 0460cf4d5c886a41856e933419f206687de73f0d..e59b0b7b04a4ab841cef6c773d3b9a7e13e7e929 100644
--- a/youtube_dl/extractor/msn.py
+++ b/youtube_dl/extractor/msn.py
@@ -14,20 +14,27 @@ from ..utils import (
  
  
  class MSNIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
+    _VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
      _TESTS = [{
-        'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/criminal-minds-shemar-moore-shares-a-touching-goodbye-message/vp-BBqQYNE',
-        'md5': '8442f66c116cbab1ff7098f986983458',
+        'url': 'https://www.msn.com/en-in/money/video/7-ways-to-get-rid-of-chest-congestion/vi-BBPxU6d',
+        'md5': '087548191d273c5c55d05028f8d2cbcd',
          'info_dict': {
-            'id': 'BBqQYNE',
-            'display_id': 'criminal-minds-shemar-moore-shares-a-touching-goodbye-message',
+            'id': 'BBPxU6d',
+            'display_id': '7-ways-to-get-rid-of-chest-congestion',
              'ext': 'mp4',
-            'title': 'Criminal Minds - Shemar Moore Shares A Touching Goodbye Message',
-            'description': 'md5:e8e89b897b222eb33a6b5067a8f1bc25',
-            'duration': 104,
-            'uploader': 'CBS Entertainment',
-            'uploader_id': 'IT0X5aoJ6bJgYerJXSDCgFmYPB1__54v',
+            'title': 'Seven ways to get rid of chest congestion',
+            'description': '7 Ways to Get Rid of Chest Congestion',
+            'duration': 88,
+            'uploader': 'Health',
+            'uploader_id': 'BBPrMqa',
          },
+    }, {
+        # Article, multiple Dailymotion Embeds
+        'url': 'https://www.msn.com/en-in/money/sports/hottest-football-wags-greatest-footballers-turned-managers-and-more/ar-BBpc7Nl',
+        'info_dict': {
+            'id': 'BBpc7Nl',
+        },
+        'playlist_mincount': 4,
      }, {
          'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf',
          'only_matching': True,
@@ -41,75 +48,124 @@ class MSNIE(InfoExtractor):
      }, {
          'url': 'http://www.msn.com/en-ae/entertainment/bollywood/watch-how-salman-khan-reacted-when-asked-if-he-would-apologize-for-his-‘raped-woman’-comment/vi-AAhvzW6',
          'only_matching': True,
+    }, {
+        # Vidible(AOL) Embed
+        'url': 'https://www.msn.com/en-us/money/other/jupiter-is-about-to-come-so-close-you-can-see-its-moons-with-binoculars/vi-AACqsHR',
+        'only_matching': True,
+    }, {
+        # Dailymotion Embed
+        'url': 'https://www.msn.com/es-ve/entretenimiento/watch/winston-salem-paire-refait-des-siennes-en-perdant-sa-raquette-au-service/vp-AAG704L',
+        'only_matching': True,
+    }, {
+        # YouTube Embed
+        'url': 'https://www.msn.com/en-in/money/news/meet-vikram-%E2%80%94-chandrayaan-2s-lander/vi-AAGUr0v',
+        'only_matching': True,
+    }, {
+        # NBCSports Embed
+        'url': 'https://www.msn.com/en-us/money/football_nfl/week-13-preview-redskins-vs-panthers/vi-BBXsCDb',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id, display_id = mobj.group('id', 'display_id')
+        display_id, page_id = re.match(self._VALID_URL, url).groups()
  
          webpage = self._download_webpage(url, display_id)
  
-        video = self._parse_json(
-            self._search_regex(
-                r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1',
-                webpage, 'video data', default='{}', group='data'),
-            display_id, transform_source=unescapeHTML)
+        entries = []
+        for _, metadata in re.findall(r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1', webpage):
+            video = self._parse_json(unescapeHTML(metadata), display_id)
+
+            provider_id = video.get('providerId')
+            player_name = video.get('playerName')
+            if player_name and provider_id:
+                entry = None
+                if player_name == 'AOL':
+                    if provider_id.startswith('http'):
+                        provider_id = self._search_regex(
+                            r'https?://delivery\.vidible\.tv/video/redirect/([0-9a-f]{24})',
+                            provider_id, 'vidible id')
+                    entry = self.url_result(
+                        'aol-video:' + provider_id, 'Aol', provider_id)
+                elif player_name == 'Dailymotion':
+                    entry = self.url_result(
+                        'https://www.dailymotion.com/video/' + provider_id,
+                        'Dailymotion', provider_id)
+                elif player_name == 'YouTube':
+                    entry = self.url_result(
+                        provider_id, 'Youtube', provider_id)
+                elif player_name == 'NBCSports':
+                    entry = self.url_result(
+                        'http://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/' + provider_id,
+                        'NBCSportsVPlayer', provider_id)
+                if entry:
+                    entries.append(entry)
+                    continue
+
+            video_id = video['uuid']
+            title = video['title']
+
+            formats = []
+            for file_ in video.get('videoFiles', []):
+                format_url = file_.get('url')
+                if not format_url:
+                    continue
+                if 'format=m3u8-aapl' in format_url:
+                    # m3u8_native should not be used here until
+                    # https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, display_id, 'mp4',
+                        m3u8_id='hls', fatal=False))
+                elif 'format=mpd-time-csf' in format_url:
+                    formats.extend(self._extract_mpd_formats(
+                        format_url, display_id, 'dash', fatal=False))
+                elif '.ism' in format_url:
+                    if format_url.endswith('.ism'):
+                        format_url += '/manifest'
+                    formats.extend(self._extract_ism_formats(
+                        format_url, display_id, 'mss', fatal=False))
+                else:
+                    format_id = file_.get('formatCode')
+                    formats.append({
+                        'url': format_url,
+                        'ext': 'mp4',
+                        'format_id': format_id,
+                        'width': int_or_none(file_.get('width')),
+                        'height': int_or_none(file_.get('height')),
+                        'vbr': int_or_none(self._search_regex(r'_(\d+)\.mp4', format_url, 'vbr', default=None)),
+                        'preference': 1 if format_id == '1001' else None,
+                    })
+            self._sort_formats(formats)
+
+            subtitles = {}
+            for file_ in video.get('files', []):
+                format_url = file_.get('url')
+                format_code = file_.get('formatCode')
+                if not format_url or not format_code:
+                    continue
+                if compat_str(format_code) == '3100':
+                    subtitles.setdefault(file_.get('culture', 'en'), []).append({
+                        'ext': determine_ext(format_url, 'ttml'),
+                        'url': format_url,
+                    })
  
-        if not video:
+            entries.append({
+                'id': video_id,
+                'display_id': display_id,
+                'title': title,
+                'description': video.get('description'),
+                'thumbnail': video.get('headlineImage', {}).get('url'),
+                'duration': int_or_none(video.get('durationSecs')),
+                'uploader': video.get('sourceFriendly'),
+                'uploader_id': video.get('providerId'),
+                'creator': video.get('creator'),
+                'subtitles': subtitles,
+                'formats': formats,
+            })
+
+        if not entries:
              error = unescapeHTML(self._search_regex(
                  r'data-error=(["\'])(?P<error>.+?)\1',
                  webpage, 'error', group='error'))
              raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
  
-        title = video['title']
-
-        formats = []
-        for file_ in video.get('videoFiles', []):
-            format_url = file_.get('url')
-            if not format_url:
-                continue
-            if 'm3u8' in format_url:
-                # m3u8_native should not be used here until
-                # https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
-                m3u8_formats = self._extract_m3u8_formats(
-                    format_url, display_id, 'mp4',
-                    m3u8_id='hls', fatal=False)
-                formats.extend(m3u8_formats)
-            elif determine_ext(format_url) == 'ism':
-                formats.extend(self._extract_ism_formats(
-                    format_url + '/Manifest', display_id, 'mss', fatal=False))
-            else:
-                formats.append({
-                    'url': format_url,
-                    'ext': 'mp4',
-                    'format_id': 'http',
-                    'width': int_or_none(file_.get('width')),
-                    'height': int_or_none(file_.get('height')),
-                })
-        self._sort_formats(formats)
-
-        subtitles = {}
-        for file_ in video.get('files', []):
-            format_url = file_.get('url')
-            format_code = file_.get('formatCode')
-            if not format_url or not format_code:
-                continue
-            if compat_str(format_code) == '3100':
-                subtitles.setdefault(file_.get('culture', 'en'), []).append({
-                    'ext': determine_ext(format_url, 'ttml'),
-                    'url': format_url,
-                })
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': video.get('description'),
-            'thumbnail': video.get('headlineImage', {}).get('url'),
-            'duration': int_or_none(video.get('durationSecs')),
-            'uploader': video.get('sourceFriendly'),
-            'uploader_id': video.get('providerId'),
-            'creator': video.get('creator'),
-            'subtitles': subtitles,
-            'formats': formats,
-        }
+        return self.playlist_result(entries, page_id)
diff --git a/youtube_dl/extractor/mtv.py b/youtube_dl/extractor/mtv.py
index 7a3b57abd852f4ffb8e49bdce9d9e8029fca65c3..fedd5f46bba67eb5cb4304d38aa0bdffa537e11d 100644
--- a/youtube_dl/extractor/mtv.py
+++ b/youtube_dl/extractor/mtv.py
@@ -1,3 +1,4 @@
+# coding: utf-8
  from __future__ import unicode_literals
  
  import re
@@ -349,33 +350,29 @@ class MTVIE(MTVServicesInfoExtractor):
      }]
  
  
-class MTV81IE(InfoExtractor):
-    IE_NAME = 'mtv81'
-    _VALID_URL = r'https?://(?:www\.)?mtv81\.com/videos/(?P<id>[^/?#.]+)'
+class MTVJapanIE(MTVServicesInfoExtractor):
+    IE_NAME = 'mtvjapan'
+    _VALID_URL = r'https?://(?:www\.)?mtvjapan\.com/videos/(?P<id>[0-9a-z]+)'
  
      _TEST = {
-        'url': 'http://www.mtv81.com/videos/artist-to-watch/the-godfather-of-japanese-hip-hop-segment-1/',
-        'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
+        'url': 'http://www.mtvjapan.com/videos/prayht/fresh-info-cadillac-escalade',
          'info_dict': {
-            'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
+            'id': 'bc01da03-6fe5-4284-8880-f291f4e368f5',
              'ext': 'mp4',
-            'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
-            'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
-            'timestamp': 1468846800,
-            'upload_date': '20160718',
+            'title': '【Fresh Info】Cadillac ESCALADE Sport Edition',
+        },
+        'params': {
+            'skip_download': True,
          },
      }
+    _GEO_COUNTRIES = ['JP']
+    _FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
  
-    def _extract_mgid(self, webpage):
-        return self._search_regex(
-            r'getTheVideo\((["\'])(?P<id>mgid:.+?)\1', webpage,
-            'mgid', group='id')
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        mgid = self._extract_mgid(webpage)
-        return self.url_result('http://media.mtvnservices.com/embed/%s' % mgid)
+    def _get_feed_query(self, uri):
+        return {
+            'arcEp': 'mtvjapan.com',
+            'mgid': uri,
+        }
  
  
  class MTVVideoIE(MTVServicesInfoExtractor):
@@ -425,14 +422,14 @@ class MTVVideoIE(MTVServicesInfoExtractor):
  
  class MTVDEIE(MTVServicesInfoExtractor):
      IE_NAME = 'mtv.de'
-    _VALID_URL = r'https?://(?:www\.)?mtv\.de/(?:artists|shows|news)/(?:[^/]+/)*(?P<id>\d+)-[^/#?]+/*(?:[#?].*)?$'
+    _VALID_URL = r'https?://(?:www\.)?mtv\.de/(?:musik/videoclips|folgen|news)/(?P<id>[0-9a-z]+)'
      _TESTS = [{
-        'url': 'http://www.mtv.de/artists/10571-cro/videos/61131-traum',
+        'url': 'http://www.mtv.de/musik/videoclips/2gpnv7/Traum',
          'info_dict': {
-            'id': 'music_video-a50bc5f0b3aa4b3190aa',
-            'ext': 'flv',
-            'title': 'MusicVideo_cro-traum',
-            'description': 'Cro - Traum',
+            'id': 'd5d472bc-f5b7-11e5-bffd-a4badb20dab5',
+            'ext': 'mp4',
+            'title': 'Traum',
+            'description': 'Traum',
          },
          'params': {
              # rtmp download
@@ -441,11 +438,12 @@ class MTVDEIE(MTVServicesInfoExtractor):
          'skip': 'Blocked at Travis CI',
      }, {
          # mediagen URL without query (e.g. http://videos.mtvnn.com/mediagen/e865da714c166d18d6f80893195fcb97)
-        'url': 'http://www.mtv.de/shows/933-teen-mom-2/staffeln/5353/folgen/63565-enthullungen',
+        'url': 'http://www.mtv.de/folgen/6b1ylu/teen-mom-2-enthuellungen-S5-F1',
          'info_dict': {
-            'id': 'local_playlist-f5ae778b9832cc837189',
-            'ext': 'flv',
-            'title': 'Episode_teen-mom-2_shows_season-5_episode-1_full-episode_part1',
+            'id': '1e5a878b-31c5-11e7-a442-0e40cf2fc285',
+            'ext': 'mp4',
+            'title': 'Teen Mom 2',
+            'description': 'md5:dc65e357ef7e1085ed53e9e9d83146a7',
          },
          'params': {
              # rtmp download
@@ -453,7 +451,7 @@ class MTVDEIE(MTVServicesInfoExtractor):
          },
          'skip': 'Blocked at Travis CI',
      }, {
-        'url': 'http://www.mtv.de/news/77491-mtv-movies-spotlight-pixels-teil-3',
+        'url': 'http://www.mtv.de/news/glolix/77491-mtv-movies-spotlight--pixels--teil-3',
          'info_dict': {
              'id': 'local_playlist-4e760566473c4c8c5344',
              'ext': 'mp4',
@@ -466,25 +464,11 @@ class MTVDEIE(MTVServicesInfoExtractor):
          },
          'skip': 'Das Video kann zur Zeit nicht abgespielt werden.',
      }]
+    _GEO_COUNTRIES = ['DE']
+    _FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
  
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        playlist = self._parse_json(
-            self._search_regex(
-                r'window\.pagePlaylist\s*=\s*(\[.+?\]);\n', webpage, 'page playlist'),
-            video_id)
-
-        def _mrss_url(item):
-            return item['mrss'] + item.get('mrssvars', '')
-
-        # news pages contain single video in playlist with different id
-        if len(playlist) == 1:
-            return self._get_videos_info_from_url(_mrss_url(playlist[0]), video_id)
-
-        for item in playlist:
-            item_id = item.get('id')
-            if item_id and compat_str(item_id) == video_id:
-                return self._get_videos_info_from_url(_mrss_url(item), video_id)
+    def _get_feed_query(self, uri):
+        return {
+            'arcEp': 'mtv.de',
+            'mgid': uri,
+        }
diff --git a/youtube_dl/extractor/musicplayon.py b/youtube_dl/extractor/musicplayon.py
deleted file mode 100644
index 1854d59..0000000
--- a/youtube_dl/extractor/musicplayon.py
+++ /dev/null
@@ -1,66 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..compat import compat_urlparse
-from ..utils import (
-    int_or_none,
-    js_to_json,
-    mimetype2ext,
-)
-
-
-class MusicPlayOnIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'
-
-    _TESTS = [{
-        'url': 'http://en.musicplayon.com/play?v=433377',
-        'md5': '00cdcdea1726abdf500d1e7fd6dd59bb',
-        'info_dict': {
-            'id': '433377',
-            'ext': 'mp4',
-            'title': 'Rick Ross - Interview On Chelsea Lately (2014)',
-            'description': 'Rick Ross Interview On Chelsea Lately',
-            'duration': 342,
-            'uploader': 'ultrafish',
-        },
-    }, {
-        'url': 'http://en.musicplayon.com/play?pl=102&play=442629',
-        'only_matching': True,
-    }]
-
-    _URL_TEMPLATE = 'http://en.musicplayon.com/play?v=%s'
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        url = self._URL_TEMPLATE % video_id
-
-        page = self._download_webpage(url, video_id)
-
-        title = self._og_search_title(page)
-        description = self._og_search_description(page)
-        thumbnail = self._og_search_thumbnail(page)
-        duration = self._html_search_meta('video:duration', page, 'duration', fatal=False)
-        view_count = self._og_search_property('count', page, fatal=False)
-        uploader = self._html_search_regex(
-            r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
-
-        sources = self._parse_json(
-            self._search_regex(r'setup\[\'_sources\'\]\s*=\s*([^;]+);', page, 'video sources'),
-            video_id, transform_source=js_to_json)
-        formats = [{
-            'url': compat_urlparse.urljoin(url, source['src']),
-            'ext': mimetype2ext(source.get('type')),
-            'format_note': source.get('data-res'),
-        } for source in sources]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'duration': int_or_none(duration),
-            'view_count': int_or_none(view_count),
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/myspass.py b/youtube_dl/extractor/myspass.py
index 2afe535b5de0804927f2798850572caf9267b044..db7ebc94ca7305dfb61cafd46679689906019ceb 100644
--- a/youtube_dl/extractor/myspass.py
+++ b/youtube_dl/extractor/myspass.py
@@ -1,73 +1,56 @@
+# coding: utf-8
  from __future__ import unicode_literals
-import os.path
+
+import re
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
  from ..utils import (
-    ExtractorError,
+    int_or_none,
+    parse_duration,
+    xpath_text,
  )
  
  
  class MySpassIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?myspass\.de/.*'
+    _VALID_URL = r'https?://(?:www\.)?myspass\.de/([^/]+/)*(?P<id>\d+)'
      _TEST = {
          'url': 'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
          'md5': '0b49f4844a068f8b33f4b7c88405862b',
          'info_dict': {
              'id': '11741',
              'ext': 'mp4',
-            'description': 'Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?',
-            'title': 'Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2',
+            'description': 'Wer kann in die Fußstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?',
+            'title': '17.02.2013 - Die Highlights, Teil 2',
          },
      }
  
      def _real_extract(self, url):
-        META_DATA_URL_TEMPLATE = 'http://www.myspass.de/myspass/includes/apps/video/getvideometadataxml.php?id=%s'
+        video_id = self._match_id(url)
  
-        # video id is the last path element of the URL
-        # usually there is a trailing slash, so also try the second but last
-        url_path = compat_urllib_parse_urlparse(url).path
-        url_parent_path, video_id = os.path.split(url_path)
-        if not video_id:
-            _, video_id = os.path.split(url_parent_path)
-
-        # get metadata
-        metadata_url = META_DATA_URL_TEMPLATE % video_id
          metadata = self._download_xml(
-            metadata_url, video_id, transform_source=lambda s: s.strip())
-
-        # extract values from metadata
-        url_flv_el = metadata.find('url_flv')
-        if url_flv_el is None:
-            raise ExtractorError('Unable to extract download url')
-        video_url = url_flv_el.text
-        title_el = metadata.find('title')
-        if title_el is None:
-            raise ExtractorError('Unable to extract title')
-        title = title_el.text
-        format_id_el = metadata.find('format_id')
-        if format_id_el is None:
-            format = 'mp4'
-        else:
-            format = format_id_el.text
-        description_el = metadata.find('description')
-        if description_el is not None:
-            description = description_el.text
-        else:
-            description = None
-        imagePreview_el = metadata.find('imagePreview')
-        if imagePreview_el is not None:
-            thumbnail = imagePreview_el.text
-        else:
-            thumbnail = None
+            'http://www.myspass.de/myspass/includes/apps/video/getvideometadataxml.php?id=' + video_id,
+            video_id)
+
+        title = xpath_text(metadata, 'title', fatal=True)
+        video_url = xpath_text(metadata, 'url_flv', 'download url', True)
+        video_id_int = int(video_id)
+        for group in re.search(r'/myspass2009/\d+/(\d+)/(\d+)/(\d+)/', video_url).groups():
+            group_int = int(group)
+            if group_int > video_id_int:
+                video_url = video_url.replace(
+                    group, compat_str(group_int // video_id_int))
  
          return {
              'id': video_id,
              'url': video_url,
              'title': title,
-            'format': format,
-            'thumbnail': thumbnail,
-            'description': description,
+            'thumbnail': xpath_text(metadata, 'imagePreview'),
+            'description': xpath_text(metadata, 'description'),
+            'duration': parse_duration(xpath_text(metadata, 'duration')),
+            'series': xpath_text(metadata, 'format'),
+            'season_number': int_or_none(xpath_text(metadata, 'season')),
+            'season_id': xpath_text(metadata, 'season_id'),
+            'episode': title,
+            'episode_number': int_or_none(xpath_text(metadata, 'episode')),
          }
diff --git a/youtube_dl/extractor/naver.py b/youtube_dl/extractor/naver.py
index bb3d944133d6a1e2685779b86a7565ad9b0985f0..61fc59126f61ef3599f69ac7ad1d920e9bc87817 100644
--- a/youtube_dl/extractor/naver.py
+++ b/youtube_dl/extractor/naver.py
@@ -1,68 +1,33 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..utils import (
+    clean_html,
+    dict_get,
      ExtractorError,
      int_or_none,
+    parse_duration,
+    try_get,
      update_url_query,
  )
  
  
-class NaverIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'
+class NaverBaseIE(InfoExtractor):
+    _CAPTION_EXT_RE = r'\.(?:ttml|vtt)'
  
-    _TESTS = [{
-        'url': 'http://tv.naver.com/v/81652',
-        'info_dict': {
-            'id': '81652',
-            'ext': 'mp4',
-            'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
-            'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
-            'upload_date': '20130903',
-        },
-    }, {
-        'url': 'http://tv.naver.com/v/395837',
-        'md5': '638ed4c12012c458fefcddfd01f173cd',
-        'info_dict': {
-            'id': '395837',
-            'ext': 'mp4',
-            'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
-            'description': 'md5:5bf200dcbf4b66eb1b350d1eb9c753f7',
-            'upload_date': '20150519',
-        },
-        'skip': 'Georestricted',
-    }, {
-        'url': 'http://tvcast.naver.com/v/81652',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        vid = self._search_regex(
-            r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'video id', fatal=None, group='value')
-        in_key = self._search_regex(
-            r'inKey["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'key', default=None, group='value')
-
-        if not vid or not in_key:
-            error = self._html_search_regex(
-                r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
-                webpage, 'error', default=None)
-            if error:
-                raise ExtractorError(error, expected=True)
-            raise ExtractorError('couldn\'t extract vid and key')
+    def _extract_video_info(self, video_id, vid, key):
          video_data = self._download_json(
              'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid,
              video_id, query={
-                'key': in_key,
+                'key': key,
              })
          meta = video_data['meta']
          title = meta['subject']
          formats = []
+        get_list = lambda x: try_get(video_data, lambda y: y[x + 's']['list'], list) or []
  
          def extract_formats(streams, stream_type, query={}):
              for stream in streams:
@@ -73,7 +38,7 @@ class NaverIE(InfoExtractor):
                  encoding_option = stream.get('encodingOption', {})
                  bitrate = stream.get('bitrate', {})
                  formats.append({
-                    'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
+                    'format_id': '%s_%s' % (stream.get('type') or stream_type, dict_get(encoding_option, ('name', 'id'))),
                      'url': stream_url,
                      'width': int_or_none(encoding_option.get('width')),
                      'height': int_or_none(encoding_option.get('height')),
@@ -83,7 +48,7 @@ class NaverIE(InfoExtractor):
                      'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
                  })
  
-        extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
+        extract_formats(get_list('video'), 'H264')
          for stream_set in video_data.get('streams', []):
              query = {}
              for param in stream_set.get('keys', []):
@@ -101,28 +66,101 @@ class NaverIE(InfoExtractor):
                      'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
          self._sort_formats(formats)
  
+        replace_ext = lambda x, y: re.sub(self._CAPTION_EXT_RE, '.' + y, x)
+
+        def get_subs(caption_url):
+            if re.search(self._CAPTION_EXT_RE, caption_url):
+                return [{
+                    'url': replace_ext(caption_url, 'ttml'),
+                }, {
+                    'url': replace_ext(caption_url, 'vtt'),
+                }]
+            else:
+                return [{'url': caption_url}]
+
+        automatic_captions = {}
          subtitles = {}
-        for caption in video_data.get('captions', {}).get('list', []):
+        for caption in get_list('caption'):
              caption_url = caption.get('source')
              if not caption_url:
                  continue
-            subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
-                'url': caption_url,
-            })
+            sub_dict = automatic_captions if caption.get('type') == 'auto' else subtitles
+            sub_dict.setdefault(dict_get(caption, ('locale', 'language')), []).extend(get_subs(caption_url))
  
-        upload_date = self._search_regex(
-            r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
-            webpage, 'upload date', fatal=False)
-        if upload_date:
-            upload_date = upload_date.replace('.', '')
+        user = meta.get('user', {})
  
          return {
              'id': video_id,
              'title': title,
              'formats': formats,
              'subtitles': subtitles,
-            'description': self._og_search_description(webpage),
-            'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
+            'automatic_captions': automatic_captions,
+            'thumbnail': try_get(meta, lambda x: x['cover']['source']),
              'view_count': int_or_none(meta.get('count')),
-            'upload_date': upload_date,
+            'uploader_id': user.get('id'),
+            'uploader': user.get('name'),
+            'uploader_url': user.get('url'),
          }
+
+
+class NaverIE(NaverBaseIE):
+    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/(?:v|embed)/(?P<id>\d+)'
+    _GEO_BYPASS = False
+    _TESTS = [{
+        'url': 'http://tv.naver.com/v/81652',
+        'info_dict': {
+            'id': '81652',
+            'ext': 'mp4',
+            'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
+            'description': '메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
+            'timestamp': 1378200754,
+            'upload_date': '20130903',
+            'uploader': '메가스터디, 합격불변의 법칙',
+            'uploader_id': 'megastudy',
+        },
+    }, {
+        'url': 'http://tv.naver.com/v/395837',
+        'md5': '8a38e35354d26a17f73f4e90094febd3',
+        'info_dict': {
+            'id': '395837',
+            'ext': 'mp4',
+            'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
+            'description': 'md5:eb6aca9d457b922e43860a2a2b1984d3',
+            'timestamp': 1432030253,
+            'upload_date': '20150519',
+            'uploader': '4가지쇼 시즌2',
+            'uploader_id': 'wrappinguser29',
+        },
+        'skip': 'Georestricted',
+    }, {
+        'url': 'http://tvcast.naver.com/v/81652',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        content = self._download_json(
+            'https://tv.naver.com/api/json/v/' + video_id,
+            video_id, headers=self.geo_verification_headers())
+        player_info_json = content.get('playerInfoJson') or {}
+        current_clip = player_info_json.get('currentClip') or {}
+
+        vid = current_clip.get('videoId')
+        in_key = current_clip.get('inKey')
+
+        if not vid or not in_key:
+            player_auth = try_get(player_info_json, lambda x: x['playerOption']['auth'])
+            if player_auth == 'notCountry':
+                self.raise_geo_restricted(countries=['KR'])
+            elif player_auth == 'notLogin':
+                self.raise_login_required()
+            raise ExtractorError('couldn\'t extract vid and key')
+        info = self._extract_video_info(video_id, vid, in_key)
+        info.update({
+            'description': clean_html(current_clip.get('description')),
+            'timestamp': int_or_none(current_clip.get('firstExposureTime'), 1000),
+            'duration': parse_duration(current_clip.get('displayPlayTime')),
+            'like_count': int_or_none(current_clip.get('recommendPoint')),
+            'age_limit': 19 if current_clip.get('adult') else None,
+        })
+        return info
diff --git a/youtube_dl/extractor/nbc.py b/youtube_dl/extractor/nbc.py
index 3282f84ee4ef637eb7d7ece2b9b859ba0059e42c..6f3cb30034da7f5fcebb99fc6dec05f1ff3cd8e4 100644
--- a/youtube_dl/extractor/nbc.py
+++ b/youtube_dl/extractor/nbc.py
@@ -9,10 +9,13 @@ from .theplatform import ThePlatformIE
  from .adobepass import AdobePassIE
  from ..compat import compat_urllib_parse_unquote
  from ..utils import (
+    int_or_none,
+    js_to_json,
+    parse_duration,
      smuggle_url,
      try_get,
+    unified_timestamp,
      update_url_query,
-    int_or_none,
  )
  
  
@@ -84,28 +87,61 @@ class NBCIE(AdobePassIE):
      def _real_extract(self, url):
          permalink, video_id = re.match(self._VALID_URL, url).groups()
          permalink = 'http' + compat_urllib_parse_unquote(permalink)
-        response = self._download_json(
-            'https://api.nbc.com/v3/videos', video_id, query={
-                'filter[permalink]': permalink,
-                'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
-                'fields[shows]': 'shortTitle',
-                'include': 'show.shortTitle',
-            })
-        video_data = response['data'][0]['attributes']
+        video_data = self._download_json(
+            'https://friendship.nbc.co/v2/graphql', video_id, query={
+                'query': '''query bonanzaPage(
+  $app: NBCUBrands! = nbc
+  $name: String!
+  $oneApp: Boolean
+  $platform: SupportedPlatforms! = web
+  $type: EntityPageType! = VIDEO
+  $userId: String!
+) {
+  bonanzaPage(
+    app: $app
+    name: $name
+    oneApp: $oneApp
+    platform: $platform
+    type: $type
+    userId: $userId
+  ) {
+    metadata {
+      ... on VideoPageData {
+        description
+        episodeNumber
+        keywords
+        locked
+        mpxAccountId
+        mpxGuid
+        rating
+        resourceId
+        seasonNumber
+        secondaryTitle
+        seriesShortTitle
+      }
+    }
+  }
+}''',
+                'variables': json.dumps({
+                    'name': permalink,
+                    'oneApp': True,
+                    'userId': '0',
+                }),
+            })['data']['bonanzaPage']['metadata']
          query = {
              'mbr': 'true',
              'manifest': 'm3u',
          }
-        video_id = video_data['guid']
-        title = video_data['title']
-        if video_data.get('entitlement') == 'auth':
+        video_id = video_data['mpxGuid']
+        title = video_data['secondaryTitle']
+        if video_data.get('locked'):
              resource = self._get_mvpd_resource(
-                'nbcentertainment', title, video_id,
-                video_data.get('vChipRating'))
+                video_data.get('resourceId') or 'nbcentertainment',
+                title, video_id, video_data.get('rating'))
              query['auth'] = self._extract_mvpd_auth(
                  url, video_id, 'nbcentertainment', resource)
          theplatform_url = smuggle_url(update_url_query(
-            'http://link.theplatform.com/s/NnzsPC/media/guid/2410887629/' + video_id,
+            'http://link.theplatform.com/s/NnzsPC/media/guid/%s/%s' % (video_data.get('mpxAccountId') or '2410887629', video_id),
              query), {'force_smil_url': True})
          return {
              '_type': 'url_transparent',
@@ -117,7 +153,7 @@ class NBCIE(AdobePassIE):
              'season_number': int_or_none(video_data.get('seasonNumber')),
              'episode_number': int_or_none(video_data.get('episodeNumber')),
              'episode': title,
-            'series': try_get(response, lambda x: x['included'][0]['attributes']['shortTitle']),
+            'series': video_data.get('seriesShortTitle'),
              'ie_key': 'ThePlatform',
          }
  
@@ -272,13 +308,12 @@ class NBCNewsIE(ThePlatformIE):
      _TESTS = [
          {
              'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
-            'md5': 'af1adfa51312291a017720403826bb64',
+            'md5': 'cf4bc9e6ce0130f00f545d80ecedd4bf',
              'info_dict': {
                  'id': '269389891880',
                  'ext': 'mp4',
                  'title': 'How Twitter Reacted To The Snowden Interview',
                  'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
-                'uploader': 'NBCU-NEWS',
                  'timestamp': 1401363060,
                  'upload_date': '20140529',
              },
@@ -296,28 +331,26 @@ class NBCNewsIE(ThePlatformIE):
          },
          {
              'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
-            'md5': '73135a2e0ef819107bbb55a5a9b2a802',
+            'md5': '8eb831eca25bfa7d25ddd83e85946548',
              'info_dict': {
                  'id': '394064451844',
                  'ext': 'mp4',
                  'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
                  'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
                  'timestamp': 1423104900,
-                'uploader': 'NBCU-NEWS',
                  'upload_date': '20150205',
              },
          },
          {
              'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
-            'md5': 'a49e173825e5fcd15c13fc297fced39d',
+            'md5': '4a8c4cec9e1ded51060bdda36ff0a5c0',
              'info_dict': {
-                'id': '529953347624',
+                'id': 'n431456',
                  'ext': 'mp4',
-                'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
-                'description': 'md5:c8be487b2d80ff0594c005add88d8351',
+                'title': "Volkswagen U.S. Chief:  We 'Totally Screwed Up'",
+                'description': 'md5:d22d1281a24f22ea0880741bb4dd6301',
                  'upload_date': '20150922',
                  'timestamp': 1442917800,
-                'uploader': 'NBCU-NEWS',
              },
          },
          {
@@ -330,7 +363,6 @@ class NBCNewsIE(ThePlatformIE):
                  'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
                  'upload_date': '20160420',
                  'timestamp': 1461152093,
-                'uploader': 'NBCU-NEWS',
              },
          },
          {
@@ -344,7 +376,6 @@ class NBCNewsIE(ThePlatformIE):
                  'thumbnail': r're:^https?://.*\.jpg$',
                  'timestamp': 1406937606,
                  'upload_date': '20140802',
-                'uploader': 'NBCU-NEWS',
              },
          },
          {
@@ -360,20 +391,61 @@ class NBCNewsIE(ThePlatformIE):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        if not video_id.isdigit():
-            webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, video_id)
  
-            data = self._parse_json(self._search_regex(
-                r'window\.__data\s*=\s*({.+});', webpage,
-                'bootstrap json'), video_id)
-            video_id = data['article']['content'][0]['primaryMedia']['video']['mpxMetadata']['id']
+        data = self._parse_json(self._search_regex(
+            r'window\.__data\s*=\s*({.+});', webpage,
+            'bootstrap json'), video_id, js_to_json)
+        video_data = try_get(data, lambda x: x['video']['current'], dict)
+        if not video_data:
+            video_data = data['article']['content'][0]['primaryMedia']['video']
+        title = video_data['headline']['primary']
+
+        formats = []
+        for va in video_data.get('videoAssets', []):
+            public_url = va.get('publicUrl')
+            if not public_url:
+                continue
+            if '://link.theplatform.com/' in public_url:
+                public_url = update_url_query(public_url, {'format': 'redirect'})
+            format_id = va.get('format')
+            if format_id == 'M3U':
+                formats.extend(self._extract_m3u8_formats(
+                    public_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=format_id, fatal=False))
+                continue
+            tbr = int_or_none(va.get('bitrate'), 1000)
+            if tbr:
+                format_id += '-%d' % tbr
+            formats.append({
+                'format_id': format_id,
+                'url': public_url,
+                'width': int_or_none(va.get('width')),
+                'height': int_or_none(va.get('height')),
+                'tbr': tbr,
+                'ext': 'mp4',
+            })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        closed_captioning = video_data.get('closedCaptioning')
+        if closed_captioning:
+            for cc_url in closed_captioning.values():
+                if not cc_url:
+                    continue
+                subtitles.setdefault('en', []).append({
+                    'url': cc_url,
+                })
  
          return {
-            '_type': 'url_transparent',
              'id': video_id,
-            # http://feed.theplatform.com/f/2E2eJC/nbcnews also works
-            'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {'byId': video_id}),
-            'ie_key': 'ThePlatformFeed',
+            'title': title,
+            'description': try_get(video_data, lambda x: x['description']['primary']),
+            'thumbnail': try_get(video_data, lambda x: x['primaryImage']['url']['primary']),
+            'duration': parse_duration(video_data.get('duration')),
+            'timestamp': unified_timestamp(video_data.get('datePublished')),
+            'formats': formats,
+            'subtitles': subtitles,
          }
  
  
diff --git a/youtube_dl/extractor/ndr.py b/youtube_dl/extractor/ndr.py
index aec2ea1331f3c909957e50d4166e7657618fa1a6..9c8bf05af10b26f0c45da23ddeb5fa42ec51f307 100644
--- a/youtube_dl/extractor/ndr.py
+++ b/youtube_dl/extractor/ndr.py
@@ -9,6 +9,8 @@ from ..utils import (
      int_or_none,
      parse_iso8601,
      qualities,
+    try_get,
+    urljoin,
  )
  
  
@@ -220,11 +222,17 @@ class NDREmbedBaseIE(InfoExtractor):
          upload_date = ppjson.get('config', {}).get('publicationDate')
          duration = int_or_none(config.get('duration'))
  
-        thumbnails = [{
-            'id': thumbnail.get('quality') or thumbnail_id,
-            'url': thumbnail['src'],
-            'preference': quality_key(thumbnail.get('quality')),
-        } for thumbnail_id, thumbnail in config.get('poster', {}).items() if thumbnail.get('src')]
+        thumbnails = []
+        poster = try_get(config, lambda x: x['poster'], dict) or {}
+        for thumbnail_id, thumbnail in poster.items():
+            thumbnail_url = urljoin(url, thumbnail.get('src'))
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'id': thumbnail.get('quality') or thumbnail_id,
+                'url': thumbnail_url,
+                'preference': quality_key(thumbnail.get('quality')),
+            })
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/nexx.py b/youtube_dl/extractor/nexx.py
index 82d526c228b9a41c90676aed6471f7613b9df7a7..586c1b7eb0197a3973edb3096c9457716f2bb982 100644
--- a/youtube_dl/extractor/nexx.py
+++ b/youtube_dl/extractor/nexx.py
@@ -108,7 +108,7 @@ class NexxIE(InfoExtractor):
      @staticmethod
      def _extract_domain_id(webpage):
          mobj = re.search(
-            r'<script\b[^>]+\bsrc=["\'](?:https?:)?//require\.nexx(?:\.cloud|cdn\.com)/(?P<id>\d+)',
+            r'<script\b[^>]+\bsrc=["\'](?:https?:)?//(?:require|arc)\.nexx(?:\.cloud|cdn\.com)/(?:sdk/)?(?P<id>\d+)',
              webpage)
          return mobj.group('id') if mobj else None
  
@@ -123,7 +123,7 @@ class NexxIE(InfoExtractor):
          domain_id = NexxIE._extract_domain_id(webpage)
          if domain_id:
              for video_id in re.findall(
-                    r'(?is)onPLAYReady.+?_play\.init\s*\(.+?\s*,\s*["\']?(\d+)',
+                    r'(?is)onPLAYReady.+?_play\.(?:init|(?:control\.)?addPlayer)\s*\(.+?\s*,\s*["\']?(\d+)',
                      webpage):
                  entries.append(
                      'https://api.nexx.cloud/v3/%s/videos/byid/%s'
@@ -295,13 +295,23 @@ class NexxIE(InfoExtractor):
  
          video = None
  
+        def find_video(result):
+            if isinstance(result, dict):
+                return result
+            elif isinstance(result, list):
+                vid = int(video_id)
+                for v in result:
+                    if try_get(v, lambda x: x['general']['ID'], int) == vid:
+                        return v
+            return None
+
          response = self._download_json(
              'https://arc.nexx.cloud/api/video/%s.json' % video_id,
              video_id, fatal=False)
          if response and isinstance(response, dict):
              result = response.get('result')
-            if result and isinstance(result, dict):
-                video = result
+            if result:
+                video = find_video(result)
  
          # not all videos work via arc, e.g. nexx:741:1269984
          if not video:
@@ -348,7 +358,7 @@ class NexxIE(InfoExtractor):
              request_token = hashlib.md5(
                  ''.join((op, domain_id, secret)).encode('utf-8')).hexdigest()
  
-            video = self._call_api(
+            result = self._call_api(
                  domain_id, 'videos/%s/%s' % (op, video_id), video_id, data={
                      'additionalfields': 'language,channel,actors,studio,licenseby,slug,subtitle,teaser,description',
                      'addInteractionOptions': '1',
@@ -363,6 +373,7 @@ class NexxIE(InfoExtractor):
                      'X-Request-CID': cid,
                      'X-Request-Token': request_token,
                  })
+            video = find_video(result)
  
          general = video['general']
          title = general['title']
@@ -399,8 +410,8 @@ class NexxIE(InfoExtractor):
  
  
  class NexxEmbedIE(InfoExtractor):
-    _VALID_URL = r'https?://embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?P<id>[^/?#&]+)'
-    _TEST = {
+    _VALID_URL = r'https?://embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?:video/)?(?P<id>[^/?#&]+)'
+    _TESTS = [{
          'url': 'http://embed.nexx.cloud/748/KC1614647Z27Y7T?autoplay=1',
          'md5': '16746bfc28c42049492385c989b26c4a',
          'info_dict': {
@@ -409,7 +420,6 @@ class NexxEmbedIE(InfoExtractor):
              'title': 'Nervenkitzel Achterbahn',
              'alt_title': 'Karussellbauer in Deutschland',
              'description': 'md5:ffe7b1cc59a01f585e0569949aef73cc',
-            'release_year': 2005,
              'creator': 'SPIEGEL TV',
              'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 2761,
@@ -420,7 +430,10 @@ class NexxEmbedIE(InfoExtractor):
              'format': 'bestvideo',
              'skip_download': True,
          },
-    }
+    }, {
+        'url': 'https://embed.nexx.cloud/11888/video/DSRTO7UVOX06S7',
+        'only_matching': True,
+    }]
  
      @staticmethod
      def _extract_urls(webpage):
diff --git a/youtube_dl/extractor/nintendo.py b/youtube_dl/extractor/nintendo.py
index 4b4e66b059527c337c2b4c9a210b4cf39c1b7a87..ff8f70ba662a11a42df5c827e71ac54025bcb8c9 100644
--- a/youtube_dl/extractor/nintendo.py
+++ b/youtube_dl/extractor/nintendo.py
@@ -5,13 +5,12 @@ import re
  
  from .common import InfoExtractor
  from .ooyala import OoyalaIE
-from ..utils import unescapeHTML
  
  
  class NintendoIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?nintendo\.com/games/detail/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?nintendo\.com/(?:games/detail|nintendo-direct)/(?P<id>[^/?#&]+)'
      _TESTS = [{
-        'url': 'http://www.nintendo.com/games/detail/yEiAzhU2eQI1KZ7wOHhngFoAHc1FpHwj',
+        'url': 'https://www.nintendo.com/games/detail/duck-hunt-wii-u/',
          'info_dict': {
              'id': 'MzMmticjp0VPzO3CCj4rmFOuohEuEWoW',
              'ext': 'flv',
@@ -28,7 +27,19 @@ class NintendoIE(InfoExtractor):
              'id': 'tokyo-mirage-sessions-fe-wii-u',
              'title': 'Tokyo Mirage Sessions ♯FE',
          },
-        'playlist_count': 3,
+        'playlist_count': 4,
+    }, {
+        'url': 'https://www.nintendo.com/nintendo-direct/09-04-2019/',
+        'info_dict': {
+            'id': 'J2bXdmaTE6fe3dWJTPcc7m23FNbc_A1V',
+            'ext': 'mp4',
+            'title': 'Switch_ROS_ND0904-H264.mov',
+            'duration': 2324.758,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['Ooyala'],
      }]
  
      def _real_extract(self, url):
@@ -39,8 +50,11 @@ class NintendoIE(InfoExtractor):
          entries = [
              OoyalaIE._build_url_result(m.group('code'))
              for m in re.finditer(
-                r'class=(["\'])embed-video\1[^>]+data-video-code=(["\'])(?P<code>(?:(?!\2).)+)\2',
-                webpage)]
+                r'data-(?:video-id|directVideoId)=(["\'])(?P<code>(?:(?!\1).)+)\1', webpage)]
+
+        title = self._html_search_regex(
+            r'(?s)<(?:span|div)[^>]+class="(?:title|wrapper)"[^>]*>.*?<h1>(.+?)</h1>',
+            webpage, 'title', fatal=False)
  
          return self.playlist_result(
-            entries, page_id, unescapeHTML(self._og_search_title(webpage, fatal=False)))
+            entries, page_id, title)
diff --git a/youtube_dl/extractor/nonktube.py b/youtube_dl/extractor/nonktube.py
index 63e58aae2666f4f228a5a73ee4f4fc12add6dd37..ca1424e06a5c77a8ab2af5c2e4fe5b8d98aee1f9 100644
--- a/youtube_dl/extractor/nonktube.py
+++ b/youtube_dl/extractor/nonktube.py
@@ -25,9 +25,14 @@ class NonkTubeIE(NuevoBaseIE):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        info = self._extract_nuevo(
-            'https://www.nonktube.com/media/nuevo/econfig.php?key=%s'
-            % video_id, video_id)
+        webpage = self._download_webpage(url, video_id)
  
-        info['age_limit'] = 18
+        title = self._og_search_title(webpage)
+        info = self._parse_html5_media_entries(url, webpage, video_id)[0]
+
+        info.update({
+            'id': video_id,
+            'title': title,
+            'age_limit': 18,
+        })
          return info
diff --git a/youtube_dl/extractor/nrk.py b/youtube_dl/extractor/nrk.py
index 5f43e692f43091eff351567e57c030585b4cc22e..94115534b72ac19f3aaea3e35ef06fa3eaef3d7f 100644
--- a/youtube_dl/extractor/nrk.py
+++ b/youtube_dl/extractor/nrk.py
@@ -12,6 +12,7 @@ from ..utils import (
      ExtractorError,
      int_or_none,
      JSON_LD_RE,
+    js_to_json,
      NO_DEFAULT,
      parse_age_limit,
      parse_duration,
@@ -105,6 +106,7 @@ class NRKBaseIE(InfoExtractor):
              MESSAGES = {
                  'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
                  'ProgramRightsHasExpired': 'Programmet har gått ut',
+                'NoProgramRights': 'Ikke tilgjengelig',
                  'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
              }
              message_type = data.get('messageType', '')
@@ -255,6 +257,17 @@ class NRKTVIE(NRKBaseIE):
                      ''' % _EPISODE_RE
      _API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
      _TESTS = [{
+        'url': 'https://tv.nrk.no/program/MDDP12000117',
+        'md5': '8270824df46ec629b66aeaa5796b36fb',
+        'info_dict': {
+            'id': 'MDDP12000117AA',
+            'ext': 'mp4',
+            'title': 'Alarm Trolltunga',
+            'description': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
+            'duration': 2223,
+            'age_limit': 6,
+        },
+    }, {
          'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
          'md5': '9a167e54d04671eb6317a37b7bc8a280',
          'info_dict': {
@@ -266,6 +279,7 @@ class NRKTVIE(NRKBaseIE):
              'series': '20 spørsmål',
              'episode': '23.05.2014',
          },
+        'skip': 'NoProgramRights',
      }, {
          'url': 'https://tv.nrk.no/program/mdfp15000514',
          'info_dict': {
@@ -370,7 +384,24 @@ class NRKTVIE(NRKBaseIE):
  
  class NRKTVEpisodeIE(InfoExtractor):
      _VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)'
-    _TEST = {
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/hellums-kro/sesong/1/episode/2',
+        'info_dict': {
+            'id': 'MUHH36005220BA',
+            'ext': 'mp4',
+            'title': 'Kro, krig og kjærlighet 2:6',
+            'description': 'md5:b32a7dc0b1ed27c8064f58b97bda4350',
+            'duration': 1563,
+            'series': 'Hellums kro',
+            'season_number': 1,
+            'episode_number': 2,
+            'episode': '2:6',
+            'age_limit': 6,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
          'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8',
          'info_dict': {
              'id': 'MSUI14000816AA',
@@ -386,7 +417,8 @@ class NRKTVEpisodeIE(InfoExtractor):
          'params': {
              'skip_download': True,
          },
-    }
+        'skip': 'ProgramRightsHasExpired',
+    }]
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
@@ -406,10 +438,10 @@ class NRKTVSerieBaseIE(InfoExtractor):
      def _extract_series(self, webpage, display_id, fatal=True):
          config = self._parse_json(
              self._search_regex(
-                (r'INITIAL_DATA_*\s*=\s*({.+?})\s*;',
+                (r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
                   r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
                  webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
-            display_id, fatal=False)
+            display_id, fatal=False, transform_source=js_to_json)
          if not config:
              return
          return try_get(
@@ -479,6 +511,14 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
      _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
      _ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
      _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/blank',
+        'info_dict': {
+            'id': 'blank',
+            'title': 'Blank',
+            'description': 'md5:7664b4e7e77dc6810cd3bca367c25b6e',
+        },
+        'playlist_mincount': 30,
+    }, {
          # new layout, seasons
          'url': 'https://tv.nrk.no/serie/backstage',
          'info_dict': {
@@ -648,7 +688,7 @@ class NRKSkoleIE(InfoExtractor):
  
      _TESTS = [{
          'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
-        'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
+        'md5': '18c12c3d071953c3bf8d54ef6b2587b7',
          'info_dict': {
              'id': '6021',
              'ext': 'mp4',
diff --git a/youtube_dl/extractor/nrl.py b/youtube_dl/extractor/nrl.py
index 798b91e048c2ecad4754ab6538e00e2b47bd1e60..22a2df8d3fe4e3038879997a72c3df850308db6b 100644
--- a/youtube_dl/extractor/nrl.py
+++ b/youtube_dl/extractor/nrl.py
@@ -23,8 +23,8 @@ class NRLTVIE(InfoExtractor):
      def _real_extract(self, url):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
-        q_data = self._parse_json(self._search_regex(
-            r"(?s)q-data='({.+?})'", webpage, 'player data'), display_id)
+        q_data = self._parse_json(self._html_search_regex(
+            r'(?s)q-data="({.+?})"', webpage, 'player data'), display_id)
          ooyala_id = q_data['videoId']
          return self.url_result(
              'ooyala:' + ooyala_id, 'Ooyala', ooyala_id, q_data.get('title'))
diff --git a/youtube_dl/extractor/ntvru.py b/youtube_dl/extractor/ntvru.py
index 4f9cedb84a47a8481b2c4058c5a59483b9a613bd..c47d1dfa4d73d6bd31a377150ea92a3dbca75b38 100644
--- a/youtube_dl/extractor/ntvru.py
+++ b/youtube_dl/extractor/ntvru.py
@@ -3,9 +3,10 @@ from __future__ import unicode_literals
  
  from .common import InfoExtractor
  from ..utils import (
-    clean_html,
-    xpath_text,
      int_or_none,
+    strip_or_none,
+    unescapeHTML,
+    xpath_text,
  )
  
  
@@ -47,10 +48,10 @@ class NTVRuIE(InfoExtractor):
              'duration': 1496,
          },
      }, {
-        'url': 'http://www.ntv.ru/kino/Koma_film',
-        'md5': 'f825770930937aa7e5aca0dc0d29319a',
+        'url': 'https://www.ntv.ru/kino/Koma_film/m70281/o336036/video/',
+        'md5': 'e9c7cde24d9d3eaed545911a04e6d4f4',
          'info_dict': {
-            'id': '1007609',
+            'id': '1126480',
              'ext': 'mp4',
              'title': 'Остросюжетный фильм «Кома»',
              'description': 'Остросюжетный фильм «Кома»',
@@ -68,6 +69,10 @@ class NTVRuIE(InfoExtractor):
              'thumbnail': r're:^http://.*\.jpg',
              'duration': 2590,
          },
+    }, {
+        # Schemeless file URL
+        'url': 'https://www.ntv.ru/video/1797442',
+        'only_matching': True,
      }]
  
      _VIDEO_ID_REGEXES = [
@@ -96,37 +101,31 @@ class NTVRuIE(InfoExtractor):
              'http://www.ntv.ru/vi%s/' % video_id,
              video_id, 'Downloading video XML')
  
-        title = clean_html(xpath_text(player, './data/title', 'title', fatal=True))
-        description = clean_html(xpath_text(player, './data/description', 'description'))
+        title = strip_or_none(unescapeHTML(xpath_text(player, './data/title', 'title', fatal=True)))
  
          video = player.find('./data/video')
-        video_id = xpath_text(video, './id', 'video id')
-        thumbnail = xpath_text(video, './splash', 'thumbnail')
-        duration = int_or_none(xpath_text(video, './totaltime', 'duration'))
-        view_count = int_or_none(xpath_text(video, './views', 'view count'))
-
-        token = self._download_webpage(
-            'http://stat.ntv.ru/services/access/token',
-            video_id, 'Downloading access token')
  
          formats = []
          for format_id in ['', 'hi', 'webm']:
-            file_ = video.find('./%sfile' % format_id)
-            if file_ is None:
+            file_ = xpath_text(video, './%sfile' % format_id)
+            if not file_:
                  continue
-            size = video.find('./%ssize' % format_id)
+            if file_.startswith('//'):
+                file_ = self._proto_relative_url(file_)
+            elif not file_.startswith('http'):
+                file_ = 'http://media.ntv.ru/vod/' + file_
              formats.append({
-                'url': 'http://media2.ntv.ru/vod/%s&tok=%s' % (file_.text, token),
-                'filesize': int_or_none(size.text if size is not None else None),
+                'url': file_,
+                'filesize': int_or_none(xpath_text(video, './%ssize' % format_id)),
              })
          self._sort_formats(formats)
  
          return {
-            'id': video_id,
+            'id': xpath_text(video, './id'),
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'view_count': view_count,
+            'description': strip_or_none(unescapeHTML(xpath_text(player, './data/description'))),
+            'thumbnail': xpath_text(video, './splash'),
+            'duration': int_or_none(xpath_text(video, './totaltime')),
+            'view_count': int_or_none(xpath_text(video, './views')),
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/odnoklassniki.py b/youtube_dl/extractor/odnoklassniki.py
index 114b93c078516cab082d3943dbc260346cdb753d..7ed9fac5510f5f69b9eb23c07163bd50f91fdca9 100644
--- a/youtube_dl/extractor/odnoklassniki.py
+++ b/youtube_dl/extractor/odnoklassniki.py
@@ -1,6 +1,8 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..compat import (
      compat_etree_fromstring,
@@ -121,6 +123,13 @@ class OdnoklassnikiIE(InfoExtractor):
          'only_matching': True,
      }]
  
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
+        if mobj:
+            return mobj.group('url')
+
      def _real_extract(self, url):
          start_time = int_or_none(compat_parse_qs(
              compat_urllib_parse_urlparse(url).query).get('fromTime', [None])[0])
diff --git a/youtube_dl/extractor/onet.py b/youtube_dl/extractor/onet.py
index 58da1bc27b120e6636e61237dc1b0b61972bf96e..e55b2ac89bb7b6449d0ac51f1092d30742b3606c 100644
--- a/youtube_dl/extractor/onet.py
+++ b/youtube_dl/extractor/onet.py
@@ -20,6 +20,8 @@ from ..utils import (
  
  
  class OnetBaseIE(InfoExtractor):
+    _URL_BASE_RE = r'https?://(?:(?:www\.)?onet\.tv|onet100\.vod\.pl)/[a-z]/'
+
      def _search_mvp_id(self, webpage):
          return self._search_regex(
              r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
@@ -45,7 +47,7 @@ class OnetBaseIE(InfoExtractor):
          video = response['result'].get('0')
  
          formats = []
-        for _, formats_dict in video['formats'].items():
+        for format_type, formats_dict in video['formats'].items():
              if not isinstance(formats_dict, dict):
                  continue
              for format_id, format_list in formats_dict.items():
@@ -56,21 +58,31 @@ class OnetBaseIE(InfoExtractor):
                      if not video_url:
                          continue
                      ext = determine_ext(video_url)
-                    if format_id == 'ism':
+                    if format_id.startswith('ism'):
                          formats.extend(self._extract_ism_formats(
                              video_url, video_id, 'mss', fatal=False))
                      elif ext == 'mpd':
                          formats.extend(self._extract_mpd_formats(
                              video_url, video_id, mpd_id='dash', fatal=False))
+                    elif format_id.startswith('hls'):
+                        formats.extend(self._extract_m3u8_formats(
+                            video_url, video_id, 'mp4', 'm3u8_native',
+                            m3u8_id='hls', fatal=False))
                      else:
-                        formats.append({
+                        http_f = {
                              'url': video_url,
                              'format_id': format_id,
-                            'height': int_or_none(f.get('vertical_resolution')),
-                            'width': int_or_none(f.get('horizontal_resolution')),
                              'abr': float_or_none(f.get('audio_bitrate')),
-                            'vbr': float_or_none(f.get('video_bitrate')),
-                        })
+                        }
+                        if format_type == 'audio':
+                            http_f['vcodec'] = 'none'
+                        else:
+                            http_f.update({
+                                'height': int_or_none(f.get('vertical_resolution')),
+                                'width': int_or_none(f.get('horizontal_resolution')),
+                                'vbr': float_or_none(f.get('video_bitrate')),
+                            })
+                        formats.append(http_f)
          self._sort_formats(formats)
  
          meta = video.get('meta', {})
@@ -105,12 +117,12 @@ class OnetMVPIE(OnetBaseIE):
  
  
  class OnetIE(OnetBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
+    _VALID_URL = OnetBaseIE._URL_BASE_RE + r'[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
      IE_NAME = 'onet.tv'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://onet.tv/k/openerfestival/open-er-festival-2016-najdziwniejsze-wymagania-gwiazd/qbpyqc',
-        'md5': 'e3ffbf47590032ac3f27249204173d50',
+        'md5': '436102770fb095c75b8bb0392d3da9ff',
          'info_dict': {
              'id': 'qbpyqc',
              'display_id': 'open-er-festival-2016-najdziwniejsze-wymagania-gwiazd',
@@ -120,7 +132,10 @@ class OnetIE(OnetBaseIE):
              'upload_date': '20160705',
              'timestamp': 1467721580,
          },
-    }
+    }, {
+        'url': 'https://onet100.vod.pl/k/openerfestival/open-er-festival-2016-najdziwniejsze-wymagania-gwiazd/qbpyqc',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
@@ -140,18 +155,21 @@ class OnetIE(OnetBaseIE):
  
  
  class OnetChannelIE(OnetBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/(?P<id>[a-z]+)(?:[?#]|$)'
+    _VALID_URL = OnetBaseIE._URL_BASE_RE + r'(?P<id>[a-z]+)(?:[?#]|$)'
      IE_NAME = 'onet.tv:channel'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://onet.tv/k/openerfestival',
          'info_dict': {
              'id': 'openerfestival',
-            'title': 'Open\'er Festival Live',
-            'description': 'Dziękujemy, że oglądaliście transmisje. Zobaczcie nasze relacje i wywiady z artystami.',
+            'title': "Open'er Festival",
+            'description': "Tak było na Open'er Festival 2016! Oglądaj nasze reportaże i wywiady z artystami.",
          },
-        'playlist_mincount': 46,
-    }
+        'playlist_mincount': 35,
+    }, {
+        'url': 'https://onet100.vod.pl/k/openerfestival',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          channel_id = self._match_id(url)
@@ -173,7 +191,7 @@ class OnetChannelIE(OnetBaseIE):
              'Downloading channel %s - add --no-playlist to just download video %s' % (
                  channel_id, video_name))
          matches = re.findall(
-            r'<a[^>]+href=[\'"](https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/[0-9a-z-]+/[0-9a-z]+)',
+            r'<a[^>]+href=[\'"](%s[a-z]+/[0-9a-z-]+/[0-9a-z]+)' % self._URL_BASE_RE,
              webpage)
          entries = [
              self.url_result(video_link, OnetIE.ie_key())
diff --git a/youtube_dl/extractor/onionstudios.py b/youtube_dl/extractor/onionstudios.py
index c6e3d5640da8f4f0040e862a9641fb73deb391a4..cf5c39e66b95052bd9d297232fb4fe51a966dcc2 100644
--- a/youtube_dl/extractor/onionstudios.py
+++ b/youtube_dl/extractor/onionstudios.py
@@ -4,12 +4,8 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import (
-    determine_ext,
-    int_or_none,
-    float_or_none,
-    mimetype2ext,
-)
+from ..compat import compat_str
+from ..utils import js_to_json
  
  
  class OnionStudiosIE(InfoExtractor):
@@ -17,14 +13,16 @@ class OnionStudiosIE(InfoExtractor):
  
      _TESTS = [{
          'url': 'http://www.onionstudios.com/videos/hannibal-charges-forward-stops-for-a-cocktail-2937',
-        'md5': '719d1f8c32094b8c33902c17bcae5e34',
+        'md5': '5a118d466d62b5cd03647cf2c593977f',
          'info_dict': {
-            'id': '2937',
+            'id': '3459881',
              'ext': 'mp4',
              'title': 'Hannibal charges forward, stops for a cocktail',
+            'description': 'md5:545299bda6abf87e5ec666548c6a9448',
              'thumbnail': r're:^https?://.*\.jpg$',
-            'uploader': 'The A.V. Club',
-            'uploader_id': 'the-av-club',
+            'uploader': 'a.v. club',
+            'upload_date': '20150619',
+            'timestamp': 1434728546,
          },
      }, {
          'url': 'http://www.onionstudios.com/embed?id=2855&autoplay=true',
@@ -44,38 +42,12 @@ class OnionStudiosIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        video_data = self._download_json(
-            'http://www.onionstudios.com/video/%s.json' % video_id, video_id)
-
-        title = video_data['title']
-
-        formats = []
-        for source in video_data.get('sources', []):
-            source_url = source.get('url')
-            if not source_url:
-                continue
-            ext = mimetype2ext(source.get('content_type')) or determine_ext(source_url)
-            if ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
-                    source_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
-            else:
-                tbr = int_or_none(source.get('bitrate'))
-                formats.append({
-                    'format_id': ext + ('-%d' % tbr if tbr else ''),
-                    'url': source_url,
-                    'width': int_or_none(source.get('width')),
-                    'tbr': tbr,
-                    'ext': ext,
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': video_data.get('poster_url'),
-            'uploader': video_data.get('channel_name'),
-            'uploader_id': video_data.get('channel_slug'),
-            'duration': float_or_none(video_data.get('duration', 1000)),
-            'tags': video_data.get('tags'),
-            'formats': formats,
-        }
+        webpage = self._download_webpage(
+            'http://onionstudios.com/embed/dc94dc2899fe644c0e7241fa04c1b732.js',
+            video_id)
+        mcp_id = compat_str(self._parse_json(self._search_regex(
+            r'window\.mcpMapping\s*=\s*({.+?});', webpage,
+            'MCP Mapping'), video_id, js_to_json)[video_id]['mcp_id'])
+        return self.url_result(
+            'http://kinja.com/ajax/inset/iframe?id=mcp-' + mcp_id,
+            'KinjaEmbed', mcp_id)
diff --git a/youtube_dl/extractor/ooyala.py b/youtube_dl/extractor/ooyala.py
index 995b24d1bd4758f0c51350d517a52a83053d584f..eb957b8fefc9dd4fbbbad0279b4d87687a501b07 100644
--- a/youtube_dl/extractor/ooyala.py
+++ b/youtube_dl/extractor/ooyala.py
@@ -1,12 +1,12 @@
  from __future__ import unicode_literals
  
+import base64
  import re
  
  from .common import InfoExtractor
  from ..compat import (
      compat_b64decode,
      compat_str,
-    compat_urllib_parse_urlencode,
  )
  from ..utils import (
      determine_ext,
@@ -21,9 +21,9 @@ from ..utils import (
  class OoyalaBaseIE(InfoExtractor):
      _PLAYER_BASE = 'http://player.ooyala.com/'
      _CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
-    _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s?'
+    _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s'
  
-    def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None, embed_token=None):
+    def _extract(self, content_tree_url, video_id, domain=None, supportedformats=None, embed_token=None):
          content_tree = self._download_json(content_tree_url, video_id)['content_tree']
          metadata = content_tree[list(content_tree)[0]]
          embed_code = metadata['embed_code']
@@ -31,59 +31,62 @@ class OoyalaBaseIE(InfoExtractor):
          title = metadata['title']
  
          auth_data = self._download_json(
-            self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code)
-            + compat_urllib_parse_urlencode({
-                'domain': domain,
+            self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code),
+            video_id, headers=self.geo_verification_headers(), query={
+                'domain': domain or 'player.ooyala.com',
                  'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds,dash,smooth',
                  'embedToken': embed_token,
-            }), video_id, headers=self.geo_verification_headers())
-
-        cur_auth_data = auth_data['authorization_data'][embed_code]
+            })['authorization_data'][embed_code]
  
          urls = []
          formats = []
-        if cur_auth_data['authorized']:
-            for stream in cur_auth_data['streams']:
-                url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
-                if not url_data:
-                    continue
-                s_url = compat_b64decode(url_data).decode('utf-8')
-                if not s_url or s_url in urls:
-                    continue
-                urls.append(s_url)
-                ext = determine_ext(s_url, None)
-                delivery_type = stream.get('delivery_type')
-                if delivery_type == 'hls' or ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
-                        m3u8_id='hls', fatal=False))
-                elif delivery_type == 'hds' or ext == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
-                        s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
-                elif delivery_type == 'dash' or ext == 'mpd':
-                    formats.extend(self._extract_mpd_formats(
-                        s_url, embed_code, mpd_id='dash', fatal=False))
-                elif delivery_type == 'smooth':
-                    self._extract_ism_formats(
-                        s_url, embed_code, ism_id='mss', fatal=False)
-                elif ext == 'smil':
-                    formats.extend(self._extract_smil_formats(
-                        s_url, embed_code, fatal=False))
-                else:
-                    formats.append({
-                        'url': s_url,
-                        'ext': ext or delivery_type,
-                        'vcodec': stream.get('video_codec'),
-                        'format_id': delivery_type,
-                        'width': int_or_none(stream.get('width')),
-                        'height': int_or_none(stream.get('height')),
-                        'abr': int_or_none(stream.get('audio_bitrate')),
-                        'vbr': int_or_none(stream.get('video_bitrate')),
-                        'fps': float_or_none(stream.get('framerate')),
-                    })
-        else:
+        streams = auth_data.get('streams') or [{
+            'delivery_type': 'hls',
+            'url': {
+                'data': base64.b64encode(('http://player.ooyala.com/hls/player/all/%s.m3u8' % embed_code).encode()).decode(),
+            }
+        }]
+        for stream in streams:
+            url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
+            if not url_data:
+                continue
+            s_url = compat_b64decode(url_data).decode('utf-8')
+            if not s_url or s_url in urls:
+                continue
+            urls.append(s_url)
+            ext = determine_ext(s_url, None)
+            delivery_type = stream.get('delivery_type')
+            if delivery_type == 'hls' or ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif delivery_type == 'hds' or ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
+            elif delivery_type == 'dash' or ext == 'mpd':
+                formats.extend(self._extract_mpd_formats(
+                    s_url, embed_code, mpd_id='dash', fatal=False))
+            elif delivery_type == 'smooth':
+                self._extract_ism_formats(
+                    s_url, embed_code, ism_id='mss', fatal=False)
+            elif ext == 'smil':
+                formats.extend(self._extract_smil_formats(
+                    s_url, embed_code, fatal=False))
+            else:
+                formats.append({
+                    'url': s_url,
+                    'ext': ext or delivery_type,
+                    'vcodec': stream.get('video_codec'),
+                    'format_id': delivery_type,
+                    'width': int_or_none(stream.get('width')),
+                    'height': int_or_none(stream.get('height')),
+                    'abr': int_or_none(stream.get('audio_bitrate')),
+                    'vbr': int_or_none(stream.get('video_bitrate')),
+                    'fps': float_or_none(stream.get('framerate')),
+                })
+        if not formats and not auth_data.get('authorized'):
              raise ExtractorError('%s said: %s' % (
-                self.IE_NAME, cur_auth_data['message']), expected=True)
+                self.IE_NAME, auth_data['message']), expected=True)
          self._sort_formats(formats)
  
          subtitles = {}
diff --git a/youtube_dl/extractor/openload.py b/youtube_dl/extractor/openload.py
index 1fe58178097a1cc1c101409a672b861dbf353561..0c20d0177e0f2abbc5c0503babf51ba66715f374 100644
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -3,21 +3,17 @@ from __future__ import unicode_literals
  
  import json
  import os
-import re
  import subprocess
  import tempfile
  
-from .common import InfoExtractor
  from ..compat import (
      compat_urlparse,
      compat_kwargs,
  )
  from ..utils import (
      check_executable,
-    determine_ext,
      encodeArgument,
      ExtractorError,
-    get_element_by_id,
      get_exe_version,
      is_outdated_version,
      std_headers,
@@ -240,259 +236,3 @@ class PhantomJSwrapper(object):
          self._load_cookies()
  
          return (html, encodeArgument(out))
-
-
-class OpenloadIE(InfoExtractor):
-    _DOMAINS = r'''
-                    (?:
-                        openload\.(?:co|io|link|pw)|
-                        oload\.(?:tv|best|biz|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|online|press|pw|life|live|space|services|website|vip)|
-                        oladblock\.(?:services|xyz|me)|openloed\.co
-                    )
-                '''
-    _VALID_URL = r'''(?x)
-                    https?://
-                        (?P<host>
-                            (?:www\.)?
-                            %s
-                        )/
-                        (?:f|embed)/
-                        (?P<id>[a-zA-Z0-9-_]+)
-                    ''' % _DOMAINS
-    _EMBED_WORD = 'embed'
-    _STREAM_WORD = 'f'
-    _REDIR_WORD = 'stream'
-    _URL_IDS = ('streamurl', 'streamuri', 'streamurj')
-    _TESTS = [{
-        'url': 'https://openload.co/f/kUEfGclsU9o',
-        'md5': 'bf1c059b004ebc7a256f89408e65c36e',
-        'info_dict': {
-            'id': 'kUEfGclsU9o',
-            'ext': 'mp4',
-            'title': 'skyrim_no-audio_1080.mp4',
-            'thumbnail': r're:^https?://.*\.jpg$',
-        },
-    }, {
-        'url': 'https://openload.co/embed/rjC09fkPLYs',
-        'info_dict': {
-            'id': 'rjC09fkPLYs',
-            'ext': 'mp4',
-            'title': 'movie.mp4',
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'subtitles': {
-                'en': [{
-                    'ext': 'vtt',
-                }],
-            },
-        },
-        'params': {
-            'skip_download': True,  # test subtitles only
-        },
-    }, {
-        'url': 'https://openload.co/embed/kUEfGclsU9o/skyrim_no-audio_1080.mp4',
-        'only_matching': True,
-    }, {
-        'url': 'https://openload.io/f/ZAn6oz-VZGE/',
-        'only_matching': True,
-    }, {
-        'url': 'https://openload.co/f/_-ztPaZtMhM/',
-        'only_matching': True,
-    }, {
-        # unavailable via https://openload.co/f/Sxz5sADo82g/, different layout
-        # for title and ext
-        'url': 'https://openload.co/embed/Sxz5sADo82g/',
-        'only_matching': True,
-    }, {
-        # unavailable via https://openload.co/embed/e-Ixz9ZR5L0/ but available
-        # via https://openload.co/f/e-Ixz9ZR5L0/
-        'url': 'https://openload.co/f/e-Ixz9ZR5L0/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.tv/embed/KnG-kKZdcfY/',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.openload.link/f/KnG-kKZdcfY',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.stream/f/KnG-kKZdcfY',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.xyz/f/WwRBpzW8Wtk',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.win/f/kUEfGclsU9o',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.download/f/kUEfGclsU9o',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.cloud/f/4ZDnBXRWiB8',
-        'only_matching': True,
-    }, {
-        # Its title has not got its extension but url has it
-        'url': 'https://oload.download/f/N4Otkw39VCw/Tomb.Raider.2018.HDRip.XviD.AC3-EVO.avi.mp4',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.cc/embed/5NEAbI2BDSk',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.icu/f/-_i4y_F_Hs8',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.fun/f/gb6G1H4sHXY',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.club/f/Nr1L-aZ2dbQ',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.info/f/5NEAbI2BDSk',
-        'only_matching': True,
-    }, {
-        'url': 'https://openload.pw/f/WyKgK8s94N0',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.pw/f/WyKgK8s94N0',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.live/f/-Z58UZ-GR4M',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.space/f/IY4eZSst3u8/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.services/embed/bs1NWj1dCag/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.online/f/W8o2UfN1vNY/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.press/embed/drTBl1aOTvk/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.website/embed/drTBl1aOTvk/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.life/embed/oOzZjNPw9Dc/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.biz/f/bEk3Gp8ARr4/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.best/embed/kkz9JgVZeWc/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oladblock.services/f/b8NWEgkqNLI/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oladblock.xyz/f/b8NWEgkqNLI/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oladblock.me/f/b8NWEgkqNLI/',
-        'only_matching': True,
-    }, {
-        'url': 'https://openloed.co/f/b8NWEgkqNLI/',
-        'only_matching': True,
-    }, {
-        'url': 'https://oload.vip/f/kUEfGclsU9o',
-        'only_matching': True,
-    }]
-
-    @classmethod
-    def _extract_urls(cls, webpage):
-        return re.findall(
-            r'(?x)<iframe[^>]+src=["\']((?:https?://)?%s/%s/[a-zA-Z0-9-_]+)'
-            % (cls._DOMAINS, cls._EMBED_WORD), webpage)
-
-    def _extract_decrypted_page(self, page_url, webpage, video_id):
-        phantom = PhantomJSwrapper(self, required_version='2.0')
-        webpage, _ = phantom.get(page_url, html=webpage, video_id=video_id)
-        return webpage
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        host = mobj.group('host')
-        video_id = mobj.group('id')
-
-        url_pattern = 'https://%s/%%s/%s/' % (host, video_id)
-
-        for path in (self._EMBED_WORD, self._STREAM_WORD):
-            page_url = url_pattern % path
-            last = path == self._STREAM_WORD
-            webpage = self._download_webpage(
-                page_url, video_id, 'Downloading %s webpage' % path,
-                fatal=last)
-            if not webpage:
-                continue
-            if 'File not found' in webpage or 'deleted by the owner' in webpage:
-                if not last:
-                    continue
-                raise ExtractorError('File not found', expected=True, video_id=video_id)
-            break
-
-        webpage = self._extract_decrypted_page(page_url, webpage, video_id)
-        for element_id in self._URL_IDS:
-            decoded_id = get_element_by_id(element_id, webpage)
-            if decoded_id:
-                break
-        if not decoded_id:
-            decoded_id = self._search_regex(
-                (r'>\s*([\w-]+~\d{10,}~\d+\.\d+\.0\.0~[\w-]+)\s*<',
-                 r'>\s*([\w~-]+~\d+\.\d+\.\d+\.\d+~[\w~-]+)',
-                 r'>\s*([\w-]+~\d{10,}~(?:[a-f\d]+:){2}:~[\w-]+)\s*<',
-                 r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)\s*<',
-                 r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)'), webpage,
-                'stream URL')
-        video_url = 'https://%s/%s/%s?mime=true' % (host, self._REDIR_WORD, decoded_id)
-
-        title = self._og_search_title(webpage, default=None) or self._search_regex(
-            r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
-            'title', default=None) or self._html_search_meta(
-            'description', webpage, 'title', fatal=True)
-
-        entries = self._parse_html5_media_entries(page_url, webpage, video_id)
-        entry = entries[0] if entries else {}
-        subtitles = entry.get('subtitles')
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': entry.get('thumbnail') or self._og_search_thumbnail(webpage, default=None),
-            'url': video_url,
-            'ext': determine_ext(title, None) or determine_ext(url, 'mp4'),
-            'subtitles': subtitles,
-        }
-
-
-class VerystreamIE(OpenloadIE):
-    IE_NAME = 'verystream'
-
-    _DOMAINS = r'(?:verystream\.com|woof\.tube)'
-    _VALID_URL = r'''(?x)
-                    https?://
-                        (?P<host>
-                            (?:www\.)?
-                            %s
-                        )/
-                        (?:stream|e)/
-                        (?P<id>[a-zA-Z0-9-_]+)
-                    ''' % _DOMAINS
-    _EMBED_WORD = 'e'
-    _STREAM_WORD = 'stream'
-    _REDIR_WORD = 'gettoken'
-    _URL_IDS = ('videolink', )
-    _TESTS = [{
-        'url': 'https://verystream.com/stream/c1GWQ9ngBBx/',
-        'md5': 'd3e8c5628ccb9970b65fd65269886795',
-        'info_dict': {
-            'id': 'c1GWQ9ngBBx',
-            'ext': 'mp4',
-            'title': 'Big Buck Bunny.mp4',
-            'thumbnail': r're:^https?://.*\.jpg$',
-        },
-    }, {
-        'url': 'https://verystream.com/e/c1GWQ9ngBBx/',
-        'only_matching': True,
-    }]
-
-    def _extract_decrypted_page(self, page_url, webpage, video_id):
-        return webpage  # for Verystream, the webpage is already decrypted
diff --git a/youtube_dl/extractor/orf.py b/youtube_dl/extractor/orf.py
index 499be0029e739ca1779812a34c812c70a70595b1..d54b8ace6e5eeee6ee7188bd1cfe88a2471f3627 100644
--- a/youtube_dl/extractor/orf.py
+++ b/youtube_dl/extractor/orf.py
@@ -6,12 +6,14 @@ import re
  from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
+    clean_html,
      determine_ext,
      float_or_none,
      HEADRequest,
      int_or_none,
      orderedSet,
      remove_end,
+    str_or_none,
      strip_jsonp,
      unescapeHTML,
      unified_strdate,
@@ -86,12 +88,16 @@ class ORFTVthekIE(InfoExtractor):
                      if value:
                          format_id_list.append(value)
                  format_id = '-'.join(format_id_list)
-                if determine_ext(fd['src']) == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        fd['src'], video_id, 'mp4', m3u8_id=format_id))
-                elif determine_ext(fd['src']) == 'f4m':
+                ext = determine_ext(src)
+                if ext == 'm3u8':
+                    m3u8_formats = self._extract_m3u8_formats(
+                        src, video_id, 'mp4', m3u8_id=format_id, fatal=False)
+                    if any('/geoprotection' in f['url'] for f in m3u8_formats):
+                        self.raise_geo_restricted()
+                    formats.extend(m3u8_formats)
+                elif ext == 'f4m':
                      formats.extend(self._extract_f4m_formats(
-                        fd['src'], video_id, f4m_id=format_id))
+                        src, video_id, f4m_id=format_id, fatal=False))
                  else:
                      formats.append({
                          'format_id': format_id,
@@ -160,44 +166,48 @@ class ORFRadioIE(InfoExtractor):
          show_date = mobj.group('date')
          show_id = mobj.group('show')
  
-        if station == 'fm4':
-            show_id = '4%s' % show_id
-
          data = self._download_json(
-            'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s' % (station, show_id, show_date),
-            show_id
-        )
-
-        def extract_entry_dict(info, title, subtitle):
-            return {
-                'id': info['loopStreamId'].replace('.mp3', ''),
-                'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, info['loopStreamId']),
+            'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
+            % (station, show_id, show_date), show_id)
+
+        entries = []
+        for info in data['streams']:
+            loop_stream_id = str_or_none(info.get('loopStreamId'))
+            if not loop_stream_id:
+                continue
+            title = str_or_none(data.get('title'))
+            if not title:
+                continue
+            start = int_or_none(info.get('start'), scale=1000)
+            end = int_or_none(info.get('end'), scale=1000)
+            duration = end - start if end and start else None
+            entries.append({
+                'id': loop_stream_id.replace('.mp3', ''),
+                'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, loop_stream_id),
                  'title': title,
-                'description': subtitle,
-                'duration': (info['end'] - info['start']) / 1000,
-                'timestamp': info['start'] / 1000,
+                'description': clean_html(data.get('subtitle')),
+                'duration': duration,
+                'timestamp': start,
                  'ext': 'mp3',
-                'series': data.get('programTitle')
-            }
-
-        entries = [extract_entry_dict(t, data['title'], data['subtitle']) for t in data['streams']]
+                'series': data.get('programTitle'),
+            })
  
          return {
              '_type': 'playlist',
              'id': show_id,
-            'title': data['title'],
-            'description': data['subtitle'],
-            'entries': entries
+            'title': data.get('title'),
+            'description': clean_html(data.get('subtitle')),
+            'entries': entries,
          }
  
  
  class ORFFM4IE(ORFRadioIE):
      IE_NAME = 'orf:fm4'
      IE_DESC = 'radio FM4'
-    _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
  
      _TEST = {
-        'url': 'http://fm4.orf.at/player/20170107/CC',
+        'url': 'http://fm4.orf.at/player/20170107/4CC',
          'md5': '2b0be47375432a7ef104453432a19212',
          'info_dict': {
              'id': '2017-01-07_2100_tl_54_7DaysSat18_31295',
@@ -208,7 +218,8 @@ class ORFFM4IE(ORFRadioIE):
              'timestamp': 1483819257,
              'upload_date': '20170107',
          },
-        'skip': 'Shows from ORF radios are only available for 7 days.'
+        'skip': 'Shows from ORF radios are only available for 7 days.',
+        'only_matching': True,
      }
  
  
diff --git a/youtube_dl/extractor/pandatv.py b/youtube_dl/extractor/pandatv.py
deleted file mode 100644
index 4219802..0000000
--- a/youtube_dl/extractor/pandatv.py
+++ /dev/null
@@ -1,99 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    qualities,
-)
-
-
-class PandaTVIE(InfoExtractor):
-    IE_DESC = '熊猫TV'
-    _VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
-    _TESTS = [{
-        'url': 'http://www.panda.tv/66666',
-        'info_dict': {
-            'id': '66666',
-            'title': 're:.+',
-            'uploader': '刘杀鸡',
-            'ext': 'flv',
-            'is_live': True,
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'skip': 'Live stream is offline',
-    }, {
-        'url': 'https://www.panda.tv/66666',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        config = self._download_json(
-            'https://www.panda.tv/api_room_v2?roomid=%s' % video_id, video_id)
-
-        error_code = config.get('errno', 0)
-        if error_code != 0:
-            raise ExtractorError(
-                '%s returned error %s: %s'
-                % (self.IE_NAME, error_code, config['errmsg']),
-                expected=True)
-
-        data = config['data']
-        video_info = data['videoinfo']
-
-        # 2 = live, 3 = offline
-        if video_info.get('status') != '2':
-            raise ExtractorError(
-                'Live stream is offline', expected=True)
-
-        title = data['roominfo']['name']
-        uploader = data.get('hostinfo', {}).get('name')
-        room_key = video_info['room_key']
-        stream_addr = video_info.get(
-            'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
-
-        # Reverse engineered from web player swf
-        # (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
-        # writing).
-        plflag0, plflag1 = video_info['plflag'].split('_')
-        plflag0 = int(plflag0) - 1
-        if plflag1 == '21':
-            plflag0 = 10
-            plflag1 = '4'
-        live_panda = 'live_panda' if plflag0 < 1 else ''
-
-        plflag_auth = self._parse_json(video_info['plflag_list'], video_id)
-        sign = plflag_auth['auth']['sign']
-        ts = plflag_auth['auth']['time']
-        rid = plflag_auth['auth']['rid']
-
-        quality_key = qualities(['OD', 'HD', 'SD'])
-        suffix = ['_small', '_mid', '']
-        formats = []
-        for k, v in stream_addr.items():
-            if v != '1':
-                continue
-            quality = quality_key(k)
-            if quality <= 0:
-                continue
-            for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
-                formats.append({
-                    'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s?sign=%s&ts=%s&rid=%s'
-                    % (pl, plflag1, room_key, live_panda, suffix[quality], ext, sign, ts, rid),
-                    'format_id': '%s-%s' % (k, ext),
-                    'quality': quality,
-                    'source_preference': pref,
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': self._live_title(title),
-            'uploader': uploader,
-            'formats': formats,
-            'is_live': True,
-        }
diff --git a/youtube_dl/extractor/patreon.py b/youtube_dl/extractor/patreon.py
index 426dd812158686a4a86e06c0846dd257fcee6748..761a4b1de3d194686339d4d8eb15a5f1bb2b50a6 100644
--- a/youtube_dl/extractor/patreon.py
+++ b/youtube_dl/extractor/patreon.py
@@ -6,7 +6,11 @@ from ..utils import (
      clean_html,
      determine_ext,
      int_or_none,
+    KNOWN_EXTENSIONS,
+    mimetype2ext,
      parse_iso8601,
+    str_or_none,
+    try_get,
  )
  
  
@@ -24,6 +28,7 @@ class PatreonIE(InfoExtractor):
              'thumbnail': 're:^https?://.*$',
              'timestamp': 1406473987,
              'upload_date': '20140727',
+            'uploader_id': '87145',
          },
      }, {
          'url': 'http://www.patreon.com/creation?hid=754133',
@@ -90,7 +95,13 @@ class PatreonIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
          post = self._download_json(
-            'https://www.patreon.com/api/posts/' + video_id, video_id)
+            'https://www.patreon.com/api/posts/' + video_id, video_id, query={
+                'fields[media]': 'download_url,mimetype,size_bytes',
+                'fields[post]': 'comment_count,content,embed,image,like_count,post_file,published_at,title',
+                'fields[user]': 'full_name,url',
+                'json-api-use-default-includes': 'false',
+                'include': 'media,user',
+            })
          attributes = post['data']['attributes']
          title = attributes['title'].strip()
          image = attributes.get('image') or {}
@@ -104,33 +115,42 @@ class PatreonIE(InfoExtractor):
              'comment_count': int_or_none(attributes.get('comment_count')),
          }
  
-        def add_file(file_data):
-            file_url = file_data.get('url')
-            if file_url:
-                info.update({
-                    'url': file_url,
-                    'ext': determine_ext(file_data.get('name'), 'mp3'),
-                })
-
          for i in post.get('included', []):
              i_type = i.get('type')
-            if i_type == 'attachment':
-                add_file(i.get('attributes') or {})
+            if i_type == 'media':
+                media_attributes = i.get('attributes') or {}
+                download_url = media_attributes.get('download_url')
+                ext = mimetype2ext(media_attributes.get('mimetype'))
+                if download_url and ext in KNOWN_EXTENSIONS:
+                    info.update({
+                        'ext': ext,
+                        'filesize': int_or_none(media_attributes.get('size_bytes')),
+                        'url': download_url,
+                    })
              elif i_type == 'user':
                  user_attributes = i.get('attributes')
                  if user_attributes:
                      info.update({
                          'uploader': user_attributes.get('full_name'),
+                        'uploader_id': str_or_none(i.get('id')),
                          'uploader_url': user_attributes.get('url'),
                      })
  
          if not info.get('url'):
-            add_file(attributes.get('post_file') or {})
+            embed_url = try_get(attributes, lambda x: x['embed']['url'])
+            if embed_url:
+                info.update({
+                    '_type': 'url',
+                    'url': embed_url,
+                })
  
          if not info.get('url'):
-            info.update({
-                '_type': 'url',
-                'url': attributes['embed']['url'],
-            })
+            post_file = attributes['post_file']
+            ext = determine_ext(post_file.get('name'))
+            if ext in KNOWN_EXTENSIONS:
+                info.update({
+                    'ext': ext,
+                    'url': post_file['url'],
+                })
  
          return info
diff --git a/youtube_dl/extractor/peertube.py b/youtube_dl/extractor/peertube.py
index b50543e329f983a9fe06c0bda9c457bb2aa337b6..d3a83ea2bb5215e34e72ad85bf99867697e2e1b2 100644
--- a/youtube_dl/extractor/peertube.py
+++ b/youtube_dl/extractor/peertube.py
@@ -18,81 +18,385 @@ from ..utils import (
  class PeerTubeIE(InfoExtractor):
      _INSTANCES_RE = r'''(?:
                              # Taken from https://instances.joinpeertube.org/instances
+                            peertube\.rainbowswingers\.net|
+                            tube\.stanisic\.nl|
+                            peer\.suiri\.us|
+                            medias\.libox\.fr|
+                            videomensoif\.ynh\.fr|
+                            peertube\.travelpandas\.eu|
+                            peertube\.rachetjay\.fr|
+                            peertube\.montecsys\.fr|
+                            tube\.eskuero\.me|
+                            peer\.tube|
+                            peertube\.umeahackerspace\.se|
+                            tube\.nx-pod\.de|
+                            video\.monsieurbidouille\.fr|
                              tube\.openalgeria\.org|
-                            peertube\.pointsecu\.fr|
+                            vid\.lelux\.fi|
+                            video\.anormallostpod\.ovh|
+                            tube\.crapaud-fou\.org|
+                            peertube\.stemy\.me|
+                            lostpod\.space|
+                            exode\.me|
+                            peertube\.snargol\.com|
+                            vis\.ion\.ovh|
+                            videosdulib\.re|
+                            v\.mbius\.io|
+                            videos\.judrey\.eu|
+                            peertube\.osureplayviewer\.xyz|
+                            peertube\.mathieufamily\.ovh|
+                            www\.videos-libr\.es|
+                            fightforinfo\.com|
+                            peertube\.fediverse\.ru|
+                            peertube\.oiseauroch\.fr|
+                            video\.nesven\.eu|
+                            v\.bearvideo\.win|
+                            video\.qoto\.org|
+                            justporn\.cc|
+                            video\.vny\.fr|
+                            peervideo\.club|
+                            tube\.taker\.fr|
+                            peertube\.chantierlibre\.org|
+                            tube\.ipfixe\.info|
+                            tube\.kicou\.info|
+                            tube\.dodsorf\.as|
+                            videobit\.cc|
+                            video\.yukari\.moe|
+                            videos\.elbinario\.net|
+                            hkvideo\.live|
+                            pt\.tux\.tf|
+                            www\.hkvideo\.live|
+                            FIGHTFORINFO\.com|
+                            pt\.765racing\.com|
+                            peertube\.gnumeria\.eu\.org|
+                            nordenmedia\.com|
+                            peertube\.co\.uk|
+                            tube\.darfweb\.eu|
+                            tube\.kalah-france\.org|
+                            0ch\.in|
+                            vod\.mochi\.academy|
+                            film\.node9\.org|
+                            peertube\.hatthieves\.es|
+                            video\.fitchfamily\.org|
+                            peertube\.ddns\.net|
+                            video\.ifuncle\.kr|
+                            video\.fdlibre\.eu|
+                            tube\.22decembre\.eu|
+                            peertube\.harmoniescreatives\.com|
+                            tube\.fabrigli\.fr|
+                            video\.thedwyers\.co|
+                            video\.bruitbruit\.com|
+                            peertube\.foxfam\.club|
+                            peer\.philoxweb\.be|
+                            videos\.bugs\.social|
+                            peertube\.malbert\.xyz|
+                            peertube\.bilange\.ca|
+                            libretube\.net|
+                            diytelevision\.com|
+                            peertube\.fedilab\.app|
+                            libre\.video|
+                            video\.mstddntfdn\.online|
+                            us\.tv|
+                            peertube\.sl-network\.fr|
+                            peertube\.dynlinux\.io|
+                            peertube\.david\.durieux\.family|
+                            peertube\.linuxrocks\.online|
+                            peerwatch\.xyz|
+                            v\.kretschmann\.social|
+                            tube\.otter\.sh|
+                            yt\.is\.nota\.live|
+                            tube\.dragonpsi\.xyz|
+                            peertube\.boneheadmedia\.com|
+                            videos\.funkwhale\.audio|
+                            watch\.44con\.com|
+                            peertube\.gcaillaut\.fr|
+                            peertube\.icu|
+                            pony\.tube|
+                            spacepub\.space|
+                            tube\.stbr\.io|
+                            v\.mom-gay\.faith|
+                            tube\.port0\.xyz|
+                            peertube\.simounet\.net|
+                            play\.jergefelt\.se|
+                            peertube\.zeteo\.me|
+                            tube\.danq\.me|
+                            peertube\.kerenon\.com|
+                            tube\.fab-l3\.org|
+                            tube\.calculate\.social|
+                            peertube\.mckillop\.org|
+                            tube\.netzspielplatz\.de|
+                            vod\.ksite\.de|
+                            peertube\.laas\.fr|
+                            tube\.govital\.net|
+                            peertube\.stephenson\.cc|
+                            bistule\.nohost\.me|
+                            peertube\.kajalinifi\.de|
+                            video\.ploud\.jp|
+                            video\.omniatv\.com|
+                            peertube\.ffs2play\.fr|
+                            peertube\.leboulaire\.ovh|
+                            peertube\.tronic-studio\.com|
+                            peertube\.public\.cat|
+                            peertube\.metalbanana\.net|
+                            video\.1000i100\.fr|
+                            peertube\.alter-nativ-voll\.de|
+                            tube\.pasa\.tf|
+                            tube\.worldofhauru\.xyz|
+                            pt\.kamp\.site|
+                            peertube\.teleassist\.fr|
+                            videos\.mleduc\.xyz|
+                            conf\.tube|
+                            media\.privacyinternational\.org|
+                            pt\.forty-two\.nl|
+                            video\.halle-leaks\.de|
+                            video\.grosskopfgames\.de|
+                            peertube\.schaeferit\.de|
+                            peertube\.jackbot\.fr|
+                            tube\.extinctionrebellion\.fr|
+                            peertube\.f-si\.org|
+                            video\.subak\.ovh|
+                            videos\.koweb\.fr|
+                            peertube\.zergy\.net|
+                            peertube\.roflcopter\.fr|
+                            peertube\.floss-marketing-school\.com|
+                            vloggers\.social|
+                            peertube\.iriseden\.eu|
+                            videos\.ubuntu-paris\.org|
+                            peertube\.mastodon\.host|
+                            armstube\.com|
+                            peertube\.s2s\.video|
+                            peertube\.lol|
+                            tube\.open-plug\.eu|
+                            open\.tube|
+                            peertube\.ch|
+                            peertube\.normandie-libre\.fr|
+                            peertube\.slat\.org|
+                            video\.lacaveatonton\.ovh|
+                            peertube\.uno|
+                            peertube\.servebeer\.com|
+                            peertube\.fedi\.quebec|
+                            tube\.h3z\.jp|
+                            tube\.plus200\.com|
+                            peertube\.eric\.ovh|
+                            tube\.metadocs\.cc|
+                            tube\.unmondemeilleur\.eu|
+                            gouttedeau\.space|
+                            video\.antirep\.net|
+                            nrop\.cant\.at|
+                            tube\.ksl-bmx\.de|
+                            tube\.plaf\.fr|
+                            tube\.tchncs\.de|
+                            video\.devinberg\.com|
+                            hitchtube\.fr|
+                            peertube\.kosebamse\.com|
+                            yunopeertube\.myddns\.me|
+                            peertube\.varney\.fr|
+                            peertube\.anon-kenkai\.com|
+                            tube\.maiti\.info|
+                            tubee\.fr|
+                            videos\.dinofly\.com|
+                            toobnix\.org|
+                            videotape\.me|
+                            voca\.tube|
+                            video\.heromuster\.com|
+                            video\.lemediatv\.fr|
+                            video\.up\.edu\.ph|
+                            balafon\.video|
+                            video\.ivel\.fr|
+                            thickrips\.cloud|
+                            pt\.laurentkruger\.fr|
+                            video\.monarch-pass\.net|
+                            peertube\.artica\.center|
+                            video\.alternanet\.fr|
+                            indymotion\.fr|
+                            fanvid\.stopthatimp\.net|
+                            video\.farci\.org|
+                            v\.lesterpig\.com|
+                            video\.okaris\.de|
+                            tube\.pawelko\.net|
+                            peertube\.mablr\.org|
+                            tube\.fede\.re|
+                            pytu\.be|
+                            evertron\.tv|
+                            devtube\.dev-wiki\.de|
+                            raptube\.antipub\.org|
+                            video\.selea\.se|
+                            peertube\.mygaia\.org|
+                            video\.oh14\.de|
+                            peertube\.livingutopia\.org|
+                            peertube\.the-penguin\.de|
+                            tube\.thechangebook\.org|
+                            tube\.anjara\.eu|
+                            pt\.pube\.tk|
+                            video\.samedi\.pm|
+                            mplayer\.demouliere\.eu|
+                            widemus\.de|
+                            peertube\.me|
+                            peertube\.zapashcanon\.fr|
+                            video\.latavernedejohnjohn\.fr|
+                            peertube\.pcservice46\.fr|
+                            peertube\.mazzonetto\.eu|
+                            video\.irem\.univ-paris-diderot\.fr|
+                            video\.livecchi\.cloud|
+                            alttube\.fr|
+                            video\.coop\.tools|
+                            video\.cabane-libre\.org|
+                            peertube\.openstreetmap\.fr|
+                            videos\.alolise\.org|
+                            irrsinn\.video|
+                            video\.antopie\.org|
+                            scitech\.video|
+                            tube2\.nemsia\.org|
+                            video\.amic37\.fr|
+                            peertube\.freeforge\.eu|
+                            video\.arbitrarion\.com|
+                            video\.datsemultimedia\.com|
+                            stoptrackingus\.tv|
+                            peertube\.ricostrongxxx\.com|
+                            docker\.videos\.lecygnenoir\.info|
+                            peertube\.togart\.de|
+                            tube\.postblue\.info|
+                            videos\.domainepublic\.net|
+                            peertube\.cyber-tribal\.com|
+                            video\.gresille\.org|
+                            peertube\.dsmouse\.net|
+                            cinema\.yunohost\.support|
+                            tube\.theocevaer\.fr|
+                            repro\.video|
+                            tube\.4aem\.com|
+                            quaziinc\.com|
+                            peertube\.metawurst\.space|
+                            videos\.wakapo\.com|
+                            video\.ploud\.fr|
+                            video\.freeradical\.zone|
+                            tube\.valinor\.fr|
+                            refuznik\.video|
+                            pt\.kircheneuenburg\.de|
+                            peertube\.asrun\.eu|
+                            peertube\.lagob\.fr|
+                            videos\.side-ways\.net|
+                            91video\.online|
+                            video\.valme\.io|
+                            video\.taboulisme\.com|
+                            videos-libr\.es|
+                            tv\.mooh\.fr|
+                            nuage\.acostey\.fr|
+                            video\.monsieur-a\.fr|
+                            peertube\.librelois\.fr|
+                            videos\.pair2jeux\.tube|
+                            videos\.pueseso\.club|
+                            peer\.mathdacloud\.ovh|
+                            media\.assassinate-you\.net|
+                            vidcommons\.org|
+                            ptube\.rousset\.nom\.fr|
+                            tube\.cyano\.at|
+                            videos\.squat\.net|
+                            video\.iphodase\.fr|
+                            peertube\.makotoworkshop\.org|
+                            peertube\.serveur\.slv-valbonne\.fr|
+                            vault\.mle\.party|
+                            hostyour\.tv|
+                            videos\.hack2g2\.fr|
+                            libre\.tube|
+                            pire\.artisanlogiciel\.net|
+                            videos\.numerique-en-commun\.fr|
+                            video\.netsyms\.com|
+                            video\.die-partei\.social|
+                            video\.writeas\.org|
+                            peertube\.swarm\.solvingmaz\.es|
+                            tube\.pericoloso\.ovh|
+                            watching\.cypherpunk\.observer|
+                            videos\.adhocmusic\.com|
+                            tube\.rfc1149\.net|
+                            peertube\.librelabucm\.org|
+                            videos\.numericoop\.fr|
+                            peertube\.koehn\.com|
+                            peertube\.anarchmusicall\.net|
+                            tube\.kampftoast\.de|
+                            vid\.y-y\.li|
+                            peertube\.xtenz\.xyz|
+                            diode\.zone|
+                            tube\.egf\.mn|
+                            peertube\.nomagic\.uk|
+                            visionon\.tv|
+                            videos\.koumoul\.com|
+                            video\.rastapuls\.com|
+                            video\.mantlepro\.com|
+                            video\.deadsuperhero\.com|
+                            peertube\.musicstudio\.pro|
+                            peertube\.we-keys\.fr|
+                            artitube\.artifaille\.fr|
+                            peertube\.ethernia\.net|
+                            tube\.midov\.pl|
+                            peertube\.fr|
+                            watch\.snoot\.tube|
+                            peertube\.donnadieu\.fr|
+                            argos\.aquilenet\.fr|
+                            tube\.nemsia\.org|
+                            tube\.bruniau\.net|
+                            videos\.darckoune\.moe|
+                            tube\.traydent\.info|
+                            dev\.videos\.lecygnenoir\.info|
+                            peertube\.nayya\.org|
+                            peertube\.live|
+                            peertube\.mofgao\.space|
+                            video\.lequerrec\.eu|
+                            peertube\.amicale\.net|
+                            aperi\.tube|
+                            tube\.ac-lyon\.fr|
+                            video\.lw1\.at|
+                            www\.yiny\.org|
+                            videos\.pofilo\.fr|
+                            tube\.lou\.lt|
+                            choob\.h\.etbus\.ch|
+                            tube\.hoga\.fr|
+                            peertube\.heberge\.fr|
+                            video\.obermui\.de|
+                            videos\.cloudfrancois\.fr|
+                            betamax\.video|
+                            video\.typica\.us|
+                            tube\.piweb\.be|
+                            video\.blender\.org|
+                            peertube\.cat|
+                            tube\.kdy\.ch|
+                            pe\.ertu\.be|
+                            peertube\.social|
+                            videos\.lescommuns\.org|
+                            tv\.datamol\.org|
+                            videonaute\.fr|
+                            dialup\.express|
                              peertube\.nogafa\.org|
-                            peertube\.pl|
                              megatube\.lilomoino\.fr|
                              peertube\.tamanoir\.foucry\.net|
-                            peertube\.inapurna\.org|
-                            peertube\.netzspielplatz\.de|
-                            video\.deadsuperhero\.com|
                              peertube\.devosi\.org|
                              peertube\.1312\.media|
-                            tube\.worldofhauru\.xyz|
                              tube\.bootlicker\.party|
                              skeptikon\.fr|
-                            peertube\.geekshell\.fr|
-                            tube\.opportunis\.me|
-                            peertube\.peshane\.net|
                              video\.blueline\.mg|
                              tube\.homecomputing\.fr|
-                            videos\.cloudfrancois\.fr|
-                            peertube\.viviers-fibre\.net|
                              tube\.ouahpiti\.info|
                              video\.tedomum\.net|
                              video\.g3l\.org|
                              fontube\.fr|
                              peertube\.gaialabs\.ch|
-                            peertube\.extremely\.online|
-                            peertube\.public-infrastructure\.eu|
                              tube\.kher\.nl|
                              peertube\.qtg\.fr|
-                            tube\.22decembre\.eu|
-                            facegirl\.me|
                              video\.migennes\.net|
-                            janny\.moe|
                              tube\.p2p\.legal|
-                            video\.atlanti\.se|
                              troll\.tv|
-                            peertube\.geekael\.fr|
-                            vid\.leotindall\.com|
-                            video\.anormallostpod\.ovh|
-                            p-tube\.h3z\.jp|
-                            tube\.darfweb\.eu|
                              videos\.iut-orsay\.fr|
                              peertube\.solidev\.net|
-                            videos\.symphonie-of-code\.fr|
-                            testtube\.ortg\.de|
                              videos\.cemea\.org|
-                            peertube\.gwendalavir\.eu|
                              video\.passageenseine\.fr|
                              videos\.festivalparminous\.org|
                              peertube\.touhoppai\.moe|
-                            peertube\.duckdns\.org|
                              sikke\.fi|
-                            peertube\.mastodon\.host|
-                            firedragonvideos\.com|
-                            vidz\.dou\.bet|
-                            peertube\.koehn\.com|
                              peer\.hostux\.social|
                              share\.tube|
                              peertube\.walkingmountains\.fr|
-                            medias\.libox\.fr|
-                            peertube\.moe|
-                            peertube\.xyz|
-                            jp\.peertube\.network|
                              videos\.benpro\.fr|
-                            tube\.otter\.sh|
-                            peertube\.angristan\.xyz|
                              peertube\.parleur\.net|
-                            peer\.ecutsa\.fr|
                              peertube\.heraut\.eu|
-                            peertube\.tifox\.fr|
-                            peertube\.maly\.io|
-                            vod\.mochi\.academy|
-                            exode\.me|
-                            coste\.video|
                              tube\.aquilenet\.fr|
                              peertube\.gegeweb\.eu|
                              framatube\.org|
@@ -100,18 +404,11 @@ class PeerTubeIE(InfoExtractor):
                              tube\.conferences-gesticulees\.net|
                              peertube\.datagueule\.tv|
                              video\.lqdn\.fr|
-                            meilleurtube\.delire\.party|
                              tube\.mochi\.academy|
-                            peertube\.dav\.li|
                              media\.zat\.im|
-                            pytu\.be|
-                            peertube\.valvin\.fr|
-                            peertube\.nsa\.ovh|
                              video\.colibris-outilslibres\.org|
-                            video\.hispagatos\.org|
                              tube\.svnet\.fr|
                              peertube\.video|
-                            videos\.lecygnenoir\.info|
                              peertube3\.cpy\.re|
                              peertube2\.cpy\.re|
                              videos\.tcit\.fr|
@@ -126,7 +423,7 @@ class PeerTubeIE(InfoExtractor):
                      (?P<id>%s)
                      ''' % (_INSTANCES_RE, _UUID_RE)
      _TESTS = [{
-        'url': 'https://peertube.moe/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
+        'url': 'https://peertube.cpy.re/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
          'md5': '80f24ff364cc9d333529506a263e7feb',
          'info_dict': {
              'id': '2790feb0-8120-4e63-9af3-c943c69f5e6c',
diff --git a/youtube_dl/extractor/periscope.py b/youtube_dl/extractor/periscope.py
index b337a56c0575c670e9d8ecf5043bde22962098f3..c02e34abac8720361f94b7085e5f7fb3814df312 100644
--- a/youtube_dl/extractor/periscope.py
+++ b/youtube_dl/extractor/periscope.py
@@ -17,12 +17,54 @@ class PeriscopeBaseIE(InfoExtractor):
              'https://api.periscope.tv/api/v2/%s' % method,
              item_id, query=query)
  
+    def _parse_broadcast_data(self, broadcast, video_id):
+        title = broadcast['status']
+        uploader = broadcast.get('user_display_name') or broadcast.get('username')
+        title = '%s - %s' % (uploader, title) if uploader else title
+        is_live = broadcast.get('state').lower() == 'running'
+
+        thumbnails = [{
+            'url': broadcast[image],
+        } for image in ('image_url', 'image_url_small') if broadcast.get(image)]
+
+        return {
+            'id': broadcast.get('id') or video_id,
+            'title': self._live_title(title) if is_live else title,
+            'timestamp': parse_iso8601(broadcast.get('created_at')),
+            'uploader': uploader,
+            'uploader_id': broadcast.get('user_id') or broadcast.get('username'),
+            'thumbnails': thumbnails,
+            'view_count': int_or_none(broadcast.get('total_watched')),
+            'tags': broadcast.get('tags'),
+            'is_live': is_live,
+        }
+
+    @staticmethod
+    def _extract_common_format_info(broadcast):
+        return broadcast.get('state').lower(), int_or_none(broadcast.get('width')), int_or_none(broadcast.get('height'))
+
+    @staticmethod
+    def _add_width_and_height(f, width, height):
+        for key, val in (('width', width), ('height', height)):
+            if not f.get(key):
+                f[key] = val
+
+    def _extract_pscp_m3u8_formats(self, m3u8_url, video_id, format_id, state, width, height, fatal=True):
+        m3u8_formats = self._extract_m3u8_formats(
+            m3u8_url, video_id, 'mp4',
+            entry_protocol='m3u8_native'
+            if state in ('ended', 'timed_out') else 'm3u8',
+            m3u8_id=format_id, fatal=fatal)
+        if len(m3u8_formats) == 1:
+            self._add_width_and_height(m3u8_formats[0], width, height)
+        return m3u8_formats
+
  
  class PeriscopeIE(PeriscopeBaseIE):
      IE_DESC = 'Periscope'
      IE_NAME = 'periscope'
      _VALID_URL = r'https?://(?:www\.)?(?:periscope|pscp)\.tv/[^/]+/(?P<id>[^/?#]+)'
-    # Alive example URLs can be found here http://onperiscope.com/
+    # Live example URLs can be found at https://www.periscope.tv/
      _TESTS = [{
          'url': 'https://www.periscope.tv/w/aJUQnjY3MjA3ODF8NTYxMDIyMDl2zCg2pECBgwTqRpQuQD352EMPTKQjT4uqlM3cgWFA-g==',
          'md5': '65b57957972e503fcbbaeed8f4fa04ca',
@@ -61,21 +103,9 @@ class PeriscopeIE(PeriscopeBaseIE):
              'accessVideoPublic', {'broadcast_id': token}, token)
  
          broadcast = stream['broadcast']
-        title = broadcast['status']
-
-        uploader = broadcast.get('user_display_name') or broadcast.get('username')
-        uploader_id = (broadcast.get('user_id') or broadcast.get('username'))
+        info = self._parse_broadcast_data(broadcast, token)
  
-        title = '%s - %s' % (uploader, title) if uploader else title
          state = broadcast.get('state').lower()
-        if state == 'running':
-            title = self._live_title(title)
-        timestamp = parse_iso8601(broadcast.get('created_at'))
-
-        thumbnails = [{
-            'url': broadcast[image],
-        } for image in ('image_url', 'image_url_small') if broadcast.get(image)]
-
          width = int_or_none(broadcast.get('width'))
          height = int_or_none(broadcast.get('height'))
  
@@ -92,32 +122,20 @@ class PeriscopeIE(PeriscopeBaseIE):
                  continue
              video_urls.add(video_url)
              if format_id != 'rtmp':
-                m3u8_formats = self._extract_m3u8_formats(
-                    video_url, token, 'mp4',
-                    entry_protocol='m3u8_native'
-                    if state in ('ended', 'timed_out') else 'm3u8',
-                    m3u8_id=format_id, fatal=False)
-                if len(m3u8_formats) == 1:
-                    add_width_and_height(m3u8_formats[0])
+                m3u8_formats = self._extract_pscp_m3u8_formats(
+                    video_url, token, format_id, state, width, height, False)
                  formats.extend(m3u8_formats)
                  continue
              rtmp_format = {
                  'url': video_url,
                  'ext': 'flv' if format_id == 'rtmp' else 'mp4',
              }
-            add_width_and_height(rtmp_format)
+            self._add_width_and_height(rtmp_format, width, height)
              formats.append(rtmp_format)
          self._sort_formats(formats)
  
-        return {
-            'id': broadcast.get('id') or token,
-            'title': title,
-            'timestamp': timestamp,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'thumbnails': thumbnails,
-            'formats': formats,
-        }
+        info['formats'] = formats
+        return info
  
  
  class PeriscopeUserIE(PeriscopeBaseIE):
diff --git a/youtube_dl/extractor/pornhub.py b/youtube_dl/extractor/pornhub.py
index 11b8cfcf73f13f7db93a005d04b81ee80e06ed08..b3251ccd9b2300188f7efce561c3c5d1fbff702e 100644
--- a/youtube_dl/extractor/pornhub.py
+++ b/youtube_dl/extractor/pornhub.py
@@ -17,6 +17,7 @@ from ..utils import (
      determine_ext,
      ExtractorError,
      int_or_none,
+    NO_DEFAULT,
      orderedSet,
      remove_quotes,
      str_to_int,
@@ -227,12 +228,13 @@ class PornHubIE(PornHubBaseIE):
          else:
              thumbnail, duration = [None] * 2
  
-        if not video_urls:
-            tv_webpage = dl_webpage('tv')
-
+        def extract_js_vars(webpage, pattern, default=NO_DEFAULT):
              assignments = self._search_regex(
-                r'(var.+?mediastring.+?)</script>', tv_webpage,
-                'encoded url').split(';')
+                pattern, webpage, 'encoded url', default=default)
+            if not assignments:
+                return {}
+
+            assignments = assignments.split(';')
  
              js_vars = {}
  
@@ -254,11 +256,35 @@ class PornHubIE(PornHubBaseIE):
                  assn = re.sub(r'var\s+', '', assn)
                  vname, value = assn.split('=', 1)
                  js_vars[vname] = parse_js_value(value)
+            return js_vars
  
-            video_url = js_vars['mediastring']
-            if video_url not in video_urls_set:
-                video_urls.append((video_url, None))
-                video_urls_set.add(video_url)
+        def add_video_url(video_url):
+            v_url = url_or_none(video_url)
+            if not v_url:
+                return
+            if v_url in video_urls_set:
+                return
+            video_urls.append((v_url, None))
+            video_urls_set.add(v_url)
+
+        if not video_urls:
+            FORMAT_PREFIXES = ('media', 'quality')
+            js_vars = extract_js_vars(
+                webpage, r'(var\s+(?:%s)_.+)' % '|'.join(FORMAT_PREFIXES),
+                default=None)
+            if js_vars:
+                for key, format_url in js_vars.items():
+                    if any(key.startswith(p) for p in FORMAT_PREFIXES):
+                        add_video_url(format_url)
+            if not video_urls and re.search(
+                    r'<[^>]+\bid=["\']lockedPlayer', webpage):
+                raise ExtractorError(
+                    'Video %s is locked' % video_id, expected=True)
+
+        if not video_urls:
+            js_vars = extract_js_vars(
+                dl_webpage('tv'), r'(var.+?mediastring.+?)</script>')
+            add_video_url(js_vars['mediastring'])
  
          for mobj in re.finditer(
                  r'<a[^>]+\bclass=["\']downloadBtn\b[^>]+\bhref=(["\'])(?P<url>(?:(?!\1).)+)\1',
@@ -276,10 +302,16 @@ class PornHubIE(PornHubBaseIE):
                      r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
                  if upload_date:
                      upload_date = upload_date.replace('/', '')
-            if determine_ext(video_url) == 'mpd':
+            ext = determine_ext(video_url)
+            if ext == 'mpd':
                  formats.extend(self._extract_mpd_formats(
                      video_url, video_id, mpd_id='dash', fatal=False))
                  continue
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+                continue
              tbr = None
              mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
              if mobj:
@@ -403,6 +435,15 @@ class PornHubUserIE(PornHubPlaylistBaseIE):
  
  
  class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
+    @staticmethod
+    def _has_more(webpage):
+        return re.search(
+            r'''(?x)
+                <li[^>]+\bclass=["\']page_next|
+                <link[^>]+\brel=["\']next|
+                <button[^>]+\bid=["\']moreDataBtn
+            ''', webpage) is not None
+
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          host = mobj.group('host')
@@ -411,13 +452,11 @@ class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
          page = int_or_none(self._search_regex(
              r'\bpage=(\d+)', url, 'page', default=None))
  
-        page_url = self._make_page_url(url)
-
          entries = []
          for page_num in (page, ) if page is not None else itertools.count(1):
              try:
                  webpage = self._download_webpage(
-                    page_url, item_id, 'Downloading page %d' % page_num,
+                    url, item_id, 'Downloading page %d' % page_num,
                      query={'page': page_num})
              except ExtractorError as e:
                  if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
@@ -547,18 +586,6 @@ class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
                  if PornHubIE.suitable(url) or PornHubUserIE.suitable(url) or PornHubUserVideosUploadIE.suitable(url)
                  else super(PornHubPagedVideoListIE, cls).suitable(url))
  
-    def _make_page_url(self, url):
-        return url
-
-    @staticmethod
-    def _has_more(webpage):
-        return re.search(
-            r'''(?x)
-                <li[^>]+\bclass=["\']page_next|
-                <link[^>]+\brel=["\']next|
-                <button[^>]+\bid=["\']moreDataBtn
-            ''', webpage) is not None
-
  
  class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
      _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
@@ -572,11 +599,3 @@ class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
          'url': 'https://www.pornhub.com/model/zoe_ph/videos/upload',
          'only_matching': True,
      }]
-
-    def _make_page_url(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        return '%s/ajax' % mobj.group('url')
-
-    @staticmethod
-    def _has_more(webpage):
-        return True
diff --git a/youtube_dl/extractor/promptfile.py b/youtube_dl/extractor/promptfile.py
deleted file mode 100644
index 23ac93d..0000000
--- a/youtube_dl/extractor/promptfile.py
+++ /dev/null
@@ -1,70 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    determine_ext,
-    ExtractorError,
-    urlencode_postdata,
-)
-
-
-class PromptFileIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?promptfile\.com/l/(?P<id>[0-9A-Z\-]+)'
-    _TEST = {
-        'url': 'http://www.promptfile.com/l/86D1CE8462-576CAAE416',
-        'md5': '5a7e285a26e0d66d9a263fae91bc92ce',
-        'info_dict': {
-            'id': '86D1CE8462-576CAAE416',
-            'ext': 'mp4',
-            'title': 'oceans.mp4',
-            'thumbnail': r're:^https?://.*\.jpg$',
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        if re.search(r'<div.+id="not_found_msg".+>(?!We are).+</div>[^-]', webpage) is not None:
-            raise ExtractorError('Video %s does not exist' % video_id,
-                                 expected=True)
-
-        chash = self._search_regex(
-            r'val\("([^"]*)"\s*\+\s*\$\("#chash"\)', webpage, 'chash')
-        fields = self._hidden_inputs(webpage)
-        keys = list(fields.keys())
-        chash_key = keys[0] if len(keys) == 1 else next(
-            key for key in keys if key.startswith('cha'))
-        fields[chash_key] = chash + fields[chash_key]
-
-        webpage = self._download_webpage(
-            url, video_id, 'Downloading video page',
-            data=urlencode_postdata(fields),
-            headers={'Content-type': 'application/x-www-form-urlencoded'})
-
-        video_url = self._search_regex(
-            (r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>\s*Download File',
-             r'<a[^>]+href=(["\'])(?P<url>https?://(?:www\.)?promptfile\.com/file/(?:(?!\1).)+)\1'),
-            webpage, 'video url', group='url')
-        title = self._html_search_regex(
-            r'<span.+title="([^"]+)">', webpage, 'title')
-        thumbnail = self._html_search_regex(
-            r'<div id="player_overlay">.*button>.*?<img src="([^"]+)"',
-            webpage, 'thumbnail', fatal=False, flags=re.DOTALL)
-
-        formats = [{
-            'format_id': 'sd',
-            'url': video_url,
-            'ext': determine_ext(title),
-        }]
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/prosiebensat1.py b/youtube_dl/extractor/prosiebensat1.py
index e19a470a5eee5efda48fe88a694c7e3a62010963..1bc4f9b6b3a455e027f1e676f24c4f352888a0c4 100644
--- a/youtube_dl/extractor/prosiebensat1.py
+++ b/youtube_dl/extractor/prosiebensat1.py
@@ -16,7 +16,7 @@ from ..utils import (
  
  
  class ProSiebenSat1BaseIE(InfoExtractor):
-    _GEO_COUNTRIES = ['DE']
+    _GEO_BYPASS = False
      _ACCESS_ID = None
      _SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear'
      _V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get'
@@ -39,14 +39,18 @@ class ProSiebenSat1BaseIE(InfoExtractor):
          formats = []
          if self._ACCESS_ID:
              raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID
-            server_token = (self._download_json(
+            protocols = self._download_json(
                  self._V4_BASE_URL + 'protocols', clip_id,
                  'Downloading protocols JSON',
                  headers=self.geo_verification_headers(), query={
                      'access_id': self._ACCESS_ID,
                      'client_token': sha1((raw_ct).encode()).hexdigest(),
                      'video_id': clip_id,
-                }, fatal=False) or {}).get('server_token')
+                }, fatal=False, expected_status=(403,)) or {}
+            error = protocols.get('error') or {}
+            if error.get('title') == 'Geo check failed':
+                self.raise_geo_restricted(countries=['AT', 'CH', 'DE'])
+            server_token = protocols.get('server_token')
              if server_token:
                  urls = (self._download_json(
                      self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={
diff --git a/youtube_dl/extractor/puhutv.py b/youtube_dl/extractor/puhutv.py
index 5465e8ab79f6b4d3183708b50c5a02a1f8f2ee62..fb704a3c4390b9da5b6fa3a5dad027c6e812b7eb 100644
--- a/youtube_dl/extractor/puhutv.py
+++ b/youtube_dl/extractor/puhutv.py
@@ -25,21 +25,21 @@ class PuhuTVIE(InfoExtractor):
      _TESTS = [{
          # film
          'url': 'https://puhutv.com/sut-kardesler-izle',
-        'md5': 'fbd8f2d8e7681f8bcd51b592475a6ae7',
+        'md5': 'a347470371d56e1585d1b2c8dab01c96',
          'info_dict': {
              'id': '5085',
              'display_id': 'sut-kardesler',
              'ext': 'mp4',
              'title': 'Süt Kardeşler',
-            'description': 'md5:405fd024df916ca16731114eb18e511a',
+            'description': 'md5:ca09da25b7e57cbb5a9280d6e48d17aa',
              'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 4832.44,
              'creator': 'Arzu Film',
-            'timestamp': 1469778212,
-            'upload_date': '20160729',
+            'timestamp': 1561062602,
+            'upload_date': '20190620',
              'release_year': 1976,
              'view_count': int,
-            'tags': ['Aile', 'Komedi', 'Klasikler'],
+            'tags': list,
          },
      }, {
          # episode, geo restricted, bypassable with --geo-verification-proxy
@@ -64,9 +64,10 @@ class PuhuTVIE(InfoExtractor):
              display_id)['data']
  
          video_id = compat_str(info['id'])
-        title = info.get('name') or info['title']['name']
+        show = info.get('title') or {}
+        title = info.get('name') or show['name']
          if info.get('display_name'):
-            title = '%s %s' % (title, info.get('display_name'))
+            title = '%s %s' % (title, info['display_name'])
  
          try:
              videos = self._download_json(
@@ -78,17 +79,36 @@ class PuhuTVIE(InfoExtractor):
                  self.raise_geo_restricted()
              raise
  
+        urls = []
          formats = []
+
+        def add_http_from_hls(m3u8_f):
+            http_url = m3u8_f['url'].replace('/hls/', '/mp4/').replace('/chunklist.m3u8', '.mp4')
+            if http_url != m3u8_f['url']:
+                f = m3u8_f.copy()
+                f.update({
+                    'format_id': f['format_id'].replace('hls', 'http'),
+                    'protocol': 'http',
+                    'url': http_url,
+                })
+                formats.append(f)
+
          for video in videos['data']['videos']:
              media_url = url_or_none(video.get('url'))
-            if not media_url:
+            if not media_url or media_url in urls:
                  continue
+            urls.append(media_url)
+
              playlist = video.get('is_playlist')
-            if video.get('stream_type') == 'hls' and playlist is True:
-                formats.extend(self._extract_m3u8_formats(
+            if (video.get('stream_type') == 'hls' and playlist is True) or 'playlist.m3u8' in media_url:
+                m3u8_formats = self._extract_m3u8_formats(
                      media_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False))
+                    m3u8_id='hls', fatal=False)
+                for m3u8_f in m3u8_formats:
+                    formats.append(m3u8_f)
+                    add_http_from_hls(m3u8_f)
                  continue
+
              quality = int_or_none(video.get('quality'))
              f = {
                  'url': media_url,
@@ -96,34 +116,29 @@ class PuhuTVIE(InfoExtractor):
                  'height': quality
              }
              video_format = video.get('video_format')
-            if video_format == 'hls' and playlist is False:
+            is_hls = (video_format == 'hls' or '/hls/' in media_url or '/chunklist.m3u8' in media_url) and playlist is False
+            if is_hls:
                  format_id = 'hls'
                  f['protocol'] = 'm3u8_native'
              elif video_format == 'mp4':
                  format_id = 'http'
-
              else:
                  continue
              if quality:
                  format_id += '-%sp' % quality
              f['format_id'] = format_id
              formats.append(f)
+            if is_hls:
+                add_http_from_hls(f)
          self._sort_formats(formats)
  
-        description = try_get(
-            info, lambda x: x['title']['description'],
-            compat_str) or info.get('description')
-        timestamp = unified_timestamp(info.get('created_at'))
          creator = try_get(
-            info, lambda x: x['title']['producer']['name'], compat_str)
+            show, lambda x: x['producer']['name'], compat_str)
  
-        duration = float_or_none(
-            try_get(info, lambda x: x['content']['duration_in_ms'], int),
-            scale=1000)
-        view_count = try_get(info, lambda x: x['content']['watch_count'], int)
+        content = info.get('content') or {}
  
          images = try_get(
-            info, lambda x: x['content']['images']['wide'], dict) or {}
+            content, lambda x: x['images']['wide'], dict) or {}
          thumbnails = []
          for image_id, image_url in images.items():
              if not isinstance(image_url, compat_str):
@@ -137,14 +152,8 @@ class PuhuTVIE(InfoExtractor):
              })
              thumbnails.append(t)
  
-        release_year = try_get(info, lambda x: x['title']['released_at'], int)
-
-        season_number = int_or_none(info.get('season_number'))
-        season_id = str_or_none(info.get('season_id'))
-        episode_number = int_or_none(info.get('episode_number'))
-
          tags = []
-        for genre in try_get(info, lambda x: x['title']['genres'], list) or []:
+        for genre in show.get('genres') or []:
              if not isinstance(genre, dict):
                  continue
              genre_name = genre.get('name')
@@ -152,12 +161,11 @@ class PuhuTVIE(InfoExtractor):
                  tags.append(genre_name)
  
          subtitles = {}
-        for subtitle in try_get(
-                info, lambda x: x['content']['subtitles'], list) or []:
+        for subtitle in content.get('subtitles') or []:
              if not isinstance(subtitle, dict):
                  continue
              lang = subtitle.get('language')
-            sub_url = url_or_none(subtitle.get('url'))
+            sub_url = url_or_none(subtitle.get('url') or subtitle.get('file'))
              if not lang or not isinstance(lang, compat_str) or not sub_url:
                  continue
              subtitles[self._SUBTITLE_LANGS.get(lang, lang)] = [{
@@ -168,15 +176,15 @@ class PuhuTVIE(InfoExtractor):
              'id': video_id,
              'display_id': display_id,
              'title': title,
-            'description': description,
-            'season_id': season_id,
-            'season_number': season_number,
-            'episode_number': episode_number,
-            'release_year': release_year,
-            'timestamp': timestamp,
+            'description': info.get('description') or show.get('description'),
+            'season_id': str_or_none(info.get('season_id')),
+            'season_number': int_or_none(info.get('season_number')),
+            'episode_number': int_or_none(info.get('episode_number')),
+            'release_year': int_or_none(show.get('released_at')),
+            'timestamp': unified_timestamp(info.get('created_at')),
              'creator': creator,
-            'view_count': view_count,
-            'duration': duration,
+            'view_count': int_or_none(content.get('watch_count')),
+            'duration': float_or_none(content.get('duration_in_ms'), 1000),
              'tags': tags,
              'subtitles': subtitles,
              'thumbnails': thumbnails,
diff --git a/youtube_dl/extractor/redtube.py b/youtube_dl/extractor/redtube.py
index 10311a81a8ae26e283d381fba88c44752844d04f..b1bde1e8117bf5fa72cccf94ce825a0ad31c274e 100644
--- a/youtube_dl/extractor/redtube.py
+++ b/youtube_dl/extractor/redtube.py
@@ -6,6 +6,7 @@ from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
      int_or_none,
+    merge_dicts,
      str_to_int,
      unified_strdate,
      url_or_none,
@@ -42,14 +43,24 @@ class RedTubeIE(InfoExtractor):
          webpage = self._download_webpage(
              'http://www.redtube.com/%s' % video_id, video_id)
  
-        if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
-            raise ExtractorError('Video %s has been removed' % video_id, expected=True)
+        ERRORS = (
+            (('video-deleted-info', '>This video has been removed'), 'has been removed'),
+            (('private_video_text', '>This video is private', '>Send a friend request to its owner to be able to view it'), 'is private'),
+        )
  
-        title = self._html_search_regex(
-            (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
-             r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
-            webpage, 'title', group='title',
-            default=None) or self._og_search_title(webpage)
+        for patterns, message in ERRORS:
+            if any(p in webpage for p in patterns):
+                raise ExtractorError(
+                    'Video %s %s' % (video_id, message), expected=True)
+
+        info = self._search_json_ld(webpage, video_id, default={})
+
+        if not info.get('title'):
+            info['title'] = self._html_search_regex(
+                (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
+                 r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
+                webpage, 'title', group='title',
+                default=None) or self._og_search_title(webpage)
  
          formats = []
          sources = self._parse_json(
@@ -88,28 +99,28 @@ class RedTubeIE(InfoExtractor):
  
          thumbnail = self._og_search_thumbnail(webpage)
          upload_date = unified_strdate(self._search_regex(
-            r'<span[^>]+>ADDED ([^<]+)<',
-            webpage, 'upload date', fatal=False))
+            r'<span[^>]+>(?:ADDED|Published on) ([^<]+)<',
+            webpage, 'upload date', default=None))
          duration = int_or_none(self._og_search_property(
              'video:duration', webpage, default=None) or self._search_regex(
                  r'videoDuration\s*:\s*(\d+)', webpage, 'duration', default=None))
          view_count = str_to_int(self._search_regex(
              (r'<div[^>]*>Views</div>\s*<div[^>]*>\s*([\d,.]+)',
-             r'<span[^>]*>VIEWS</span>\s*</td>\s*<td>\s*([\d,.]+)'),
-            webpage, 'view count', fatal=False))
+             r'<span[^>]*>VIEWS</span>\s*</td>\s*<td>\s*([\d,.]+)',
+             r'<span[^>]+\bclass=["\']video_view_count[^>]*>\s*([\d,.]+)'),
+            webpage, 'view count', default=None))
  
          # No self-labeling, but they describe themselves as
          # "Home of Videos Porno"
          age_limit = 18
  
-        return {
+        return merge_dicts(info, {
              'id': video_id,
              'ext': 'mp4',
-            'title': title,
              'thumbnail': thumbnail,
              'upload_date': upload_date,
              'duration': duration,
              'view_count': view_count,
              'age_limit': age_limit,
              'formats': formats,
-        }
+        })
diff --git a/youtube_dl/extractor/revision3.py b/youtube_dl/extractor/revision3.py
deleted file mode 100644
index 833d8a2..0000000
--- a/youtube_dl/extractor/revision3.py
+++ /dev/null
@@ -1,170 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-    int_or_none,
-    parse_iso8601,
-    unescapeHTML,
-    qualities,
-)
-
-
-class Revision3EmbedIE(InfoExtractor):
-    IE_NAME = 'revision3:embed'
-    _VALID_URL = r'(?:revision3:(?:(?P<playlist_type>[^:]+):)?|https?://(?:(?:(?:www|embed)\.)?(?:revision3|animalist)|(?:(?:api|embed)\.)?seekernetwork)\.com/player/embed\?videoId=)(?P<playlist_id>\d+)'
-    _TEST = {
-        'url': 'http://api.seekernetwork.com/player/embed?videoId=67558',
-        'md5': '83bcd157cab89ad7318dd7b8c9cf1306',
-        'info_dict': {
-            'id': '67558',
-            'ext': 'mp4',
-            'title': 'The Pros & Cons Of Zoos',
-            'description': 'Zoos are often depicted as a terrible place for animals to live, but is there any truth to this?',
-            'uploader_id': 'dnews',
-            'uploader': 'DNews',
-        }
-    }
-    _API_KEY = 'ba9c741bce1b9d8e3defcc22193f3651b8867e62'
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        playlist_id = mobj.group('playlist_id')
-        playlist_type = mobj.group('playlist_type') or 'video_id'
-        video_data = self._download_json(
-            'http://revision3.com/api/getPlaylist.json', playlist_id, query={
-                'api_key': self._API_KEY,
-                'codecs': 'h264,vp8,theora',
-                playlist_type: playlist_id,
-            })['items'][0]
-
-        formats = []
-        for vcodec, media in video_data['media'].items():
-            for quality_id, quality in media.items():
-                if quality_id == 'hls':
-                    formats.extend(self._extract_m3u8_formats(
-                        quality['url'], playlist_id, 'mp4',
-                        'm3u8_native', m3u8_id='hls', fatal=False))
-                else:
-                    formats.append({
-                        'url': quality['url'],
-                        'format_id': '%s-%s' % (vcodec, quality_id),
-                        'tbr': int_or_none(quality.get('bitrate')),
-                        'vcodec': vcodec,
-                    })
-        self._sort_formats(formats)
-
-        return {
-            'id': playlist_id,
-            'title': unescapeHTML(video_data['title']),
-            'description': unescapeHTML(video_data.get('summary')),
-            'uploader': video_data.get('show', {}).get('name'),
-            'uploader_id': video_data.get('show', {}).get('slug'),
-            'duration': int_or_none(video_data.get('duration')),
-            'formats': formats,
-        }
-
-
-class Revision3IE(InfoExtractor):
-    IE_NAME = 'revision'
-    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:revision3|animalist)\.com)/(?P<id>[^/]+(?:/[^/?#]+)?)'
-    _TESTS = [{
-        'url': 'http://www.revision3.com/technobuffalo/5-google-predictions-for-2016',
-        'md5': 'd94a72d85d0a829766de4deb8daaf7df',
-        'info_dict': {
-            'id': '71089',
-            'display_id': 'technobuffalo/5-google-predictions-for-2016',
-            'ext': 'webm',
-            'title': '5 Google Predictions for 2016',
-            'description': 'Google had a great 2015, but it\'s already time to look ahead. Here are our five predictions for 2016.',
-            'upload_date': '20151228',
-            'timestamp': 1451325600,
-            'duration': 187,
-            'uploader': 'TechnoBuffalo',
-            'uploader_id': 'technobuffalo',
-        }
-    }, {
-        # Show
-        'url': 'http://revision3.com/variant',
-        'only_matching': True,
-    }, {
-        # Tag
-        'url': 'http://revision3.com/vr',
-        'only_matching': True,
-    }]
-    _PAGE_DATA_TEMPLATE = 'http://www.%s/apiProxy/ddn/%s?domain=%s'
-
-    def _real_extract(self, url):
-        domain, display_id = re.match(self._VALID_URL, url).groups()
-        site = domain.split('.')[0]
-        page_info = self._download_json(
-            self._PAGE_DATA_TEMPLATE % (domain, display_id, domain), display_id)
-
-        page_data = page_info['data']
-        page_type = page_data['type']
-        if page_type in ('episode', 'embed'):
-            show_data = page_data['show']['data']
-            page_id = compat_str(page_data['id'])
-            video_id = compat_str(page_data['video']['data']['id'])
-
-            preference = qualities(['mini', 'small', 'medium', 'large'])
-            thumbnails = [{
-                'url': image_url,
-                'id': image_id,
-                'preference': preference(image_id)
-            } for image_id, image_url in page_data.get('images', {}).items()]
-
-            info = {
-                'id': page_id,
-                'display_id': display_id,
-                'title': unescapeHTML(page_data['name']),
-                'description': unescapeHTML(page_data.get('summary')),
-                'timestamp': parse_iso8601(page_data.get('publishTime'), ' '),
-                'author': page_data.get('author'),
-                'uploader': show_data.get('name'),
-                'uploader_id': show_data.get('slug'),
-                'thumbnails': thumbnails,
-                'extractor_key': site,
-            }
-
-            if page_type == 'embed':
-                info.update({
-                    '_type': 'url_transparent',
-                    'url': page_data['video']['data']['embed'],
-                })
-                return info
-
-            info.update({
-                '_type': 'url_transparent',
-                'url': 'revision3:%s' % video_id,
-            })
-            return info
-        else:
-            list_data = page_info[page_type]['data']
-            episodes_data = page_info['episodes']['data']
-            num_episodes = page_info['meta']['totalEpisodes']
-            processed_episodes = 0
-            entries = []
-            page_num = 1
-            while True:
-                entries.extend([{
-                    '_type': 'url',
-                    'url': 'http://%s%s' % (domain, episode['path']),
-                    'id': compat_str(episode['id']),
-                    'ie_key': 'Revision3',
-                    'extractor_key': site,
-                } for episode in episodes_data])
-                processed_episodes += len(episodes_data)
-                if processed_episodes == num_episodes:
-                    break
-                page_num += 1
-                episodes_data = self._download_json(self._PAGE_DATA_TEMPLATE % (
-                    domain, display_id + '/' + compat_str(page_num), domain),
-                    display_id)['episodes']['data']
-
-            return self.playlist_result(
-                entries, compat_str(list_data['id']),
-                list_data.get('name'), list_data.get('summary'))
diff --git a/youtube_dl/extractor/roosterteeth.py b/youtube_dl/extractor/roosterteeth.py
index 8d88ee4994827b8cc648f03986e2015a2c2a1751..8883639b290b9712eb94ef05ab41ecaff7b06ee3 100644
--- a/youtube_dl/extractor/roosterteeth.py
+++ b/youtube_dl/extractor/roosterteeth.py
@@ -1,8 +1,6 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
  from ..compat import (
      compat_HTTPError,
@@ -18,7 +16,6 @@ from ..utils import (
  
  class RoosterTeethIE(InfoExtractor):
      _VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/(?:episode|watch)/(?P<id>[^/?#&]+)'
-    _LOGIN_URL = 'https://roosterteeth.com/login'
      _NETRC_MACHINE = 'roosterteeth'
      _TESTS = [{
          'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
@@ -53,48 +50,40 @@ class RoosterTeethIE(InfoExtractor):
          'url': 'https://roosterteeth.com/watch/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
          'only_matching': True,
      }]
+    _EPISODE_BASE_URL = 'https://svod-be.roosterteeth.com/api/v1/episodes/'
  
      def _login(self):
          username, password = self._get_login_info()
          if username is None:
              return
  
-        login_page = self._download_webpage(
-            self._LOGIN_URL, None,
-            note='Downloading login page',
-            errnote='Unable to download login page')
-
-        login_form = self._hidden_inputs(login_page)
-
-        login_form.update({
-            'username': username,
-            'password': password,
-        })
-
-        login_request = self._download_webpage(
-            self._LOGIN_URL, None,
-            note='Logging in',
-            data=urlencode_postdata(login_form),
-            headers={
-                'Referer': self._LOGIN_URL,
-            })
-
-        if not any(re.search(p, login_request) for p in (
-                r'href=["\']https?://(?:www\.)?roosterteeth\.com/logout"',
-                r'>Sign Out<')):
-            error = self._html_search_regex(
-                r'(?s)<div[^>]+class=(["\']).*?\balert-danger\b.*?\1[^>]*>(?:\s*<button[^>]*>.*?</button>)?(?P<error>.+?)</div>',
-                login_request, 'alert', default=None, group='error')
-            if error:
-                raise ExtractorError('Unable to login: %s' % error, expected=True)
-            raise ExtractorError('Unable to log in')
+        try:
+            self._download_json(
+                'https://auth.roosterteeth.com/oauth/token',
+                None, 'Logging in', data=urlencode_postdata({
+                    'client_id': '4338d2b4bdc8db1239360f28e72f0d9ddb1fd01e7a38fbb07b4b1f4ba4564cc5',
+                    'grant_type': 'password',
+                    'username': username,
+                    'password': password,
+                }))
+        except ExtractorError as e:
+            msg = 'Unable to login'
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+                resp = self._parse_json(e.cause.read().decode(), None, fatal=False)
+                if resp:
+                    error = resp.get('extra_info') or resp.get('error_description') or resp.get('error')
+                    if error:
+                        msg += ': ' + error
+            self.report_warning(msg)
  
      def _real_initialize(self):
+        if self._get_cookies(self._EPISODE_BASE_URL).get('rt_access_token'):
+            return
          self._login()
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
-        api_episode_url = 'https://svod-be.roosterteeth.com/api/v1/episodes/%s' % display_id
+        api_episode_url = self._EPISODE_BASE_URL + display_id
  
          try:
              m3u8_url = self._download_json(
diff --git a/youtube_dl/extractor/safari.py b/youtube_dl/extractor/safari.py
index bd9ee1647d47d47bfc8d8341139c2bf953ecf158..4942437c7dbe918bd408bf805f8b76cc1d7967e4 100644
--- a/youtube_dl/extractor/safari.py
+++ b/youtube_dl/extractor/safari.py
@@ -165,7 +165,8 @@ class SafariIE(SafariBaseIE):
              kaltura_session = self._download_json(
                  '%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
                  video_id, 'Downloading kaltura session JSON',
-                'Unable to download kaltura session JSON', fatal=False)
+                'Unable to download kaltura session JSON', fatal=False,
+                headers={'Accept': 'application/json'})
              if kaltura_session:
                  session = kaltura_session.get('session')
                  if session:
diff --git a/youtube_dl/extractor/scrippsnetworks.py b/youtube_dl/extractor/scrippsnetworks.py
index 8b3275735b1638b98c9c64f8360a6498c525a8f7..b40b4c4afded1b6f9541d60b3b2d3fb5fe0c5973 100644
--- a/youtube_dl/extractor/scrippsnetworks.py
+++ b/youtube_dl/extractor/scrippsnetworks.py
@@ -7,6 +7,7 @@ import re
  
  from .aws import AWSIE
  from .anvato import AnvatoIE
+from .common import InfoExtractor
  from ..utils import (
      smuggle_url,
      urlencode_postdata,
@@ -102,3 +103,50 @@ class ScrippsNetworksWatchIE(AWSIE):
                  'anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:%s' % mcp_id,
                  {'geo_countries': ['US']}),
              AnvatoIE.ie_key(), video_id=mcp_id)
+
+
+class ScrippsNetworksIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.cookingchanneltv.com/videos/the-best-of-the-best-0260338',
+        'info_dict': {
+            'id': '0260338',
+            'ext': 'mp4',
+            'title': 'The Best of the Best',
+            'description': 'Catch a new episode of MasterChef Canada Tuedsay at 9/8c.',
+            'timestamp': 1475678834,
+            'upload_date': '20161005',
+            'uploader': 'SCNI-SCND',
+        },
+        'add_ie': ['ThePlatform'],
+    }, {
+        'url': 'https://www.diynetwork.com/videos/diy-barnwood-tablet-stand-0265790',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.foodnetwork.com/videos/chocolate-strawberry-cake-roll-7524591',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.hgtv.com/videos/cookie-decorating-101-0301929',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.travelchannel.com/videos/two-climates-one-bag-5302184',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.discovery.com/videos/guardians-of-the-glades-cooking-with-tom-cobb-5578368',
+        'only_matching': True,
+    }]
+    _ACCOUNT_MAP = {
+        'cookingchanneltv': 2433005105,
+        'discovery': 2706091867,
+        'diynetwork': 2433004575,
+        'foodnetwork': 2433005105,
+        'hgtv': 2433004575,
+        'travelchannel': 2433005739,
+    }
+    _TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'
+
+    def _real_extract(self, url):
+        site, guid = re.match(self._VALID_URL, url).groups()
+        return self.url_result(smuggle_url(
+            self._TP_TEMPL % (self._ACCOUNT_MAP[site], guid),
+            {'force_smil_url': True}), 'ThePlatform', guid)
diff --git a/youtube_dl/extractor/scte.py b/youtube_dl/extractor/scte.py
new file mode 100644
index 0000000..ca1de63
--- /dev/null
+++ b/youtube_dl/extractor/scte.py
@@ -0,0 +1,144 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    decode_packed_codes,
+    ExtractorError,
+    urlencode_postdata,
+)
+
+
+class SCTEBaseIE(InfoExtractor):
+    _LOGIN_URL = 'https://www.scte.org/SCTE/Sign_In.aspx'
+    _NETRC_MACHINE = 'scte'
+
+    def _real_initialize(self):
+        self._login()
+
+    def _login(self):
+        username, password = self._get_login_info()
+        if username is None:
+            return
+
+        login_popup = self._download_webpage(
+            self._LOGIN_URL, None, 'Downloading login popup')
+
+        def is_logged(webpage):
+            return any(re.search(p, webpage) for p in (
+                r'class=["\']welcome\b', r'>Sign Out<'))
+
+        # already logged in
+        if is_logged(login_popup):
+            return
+
+        login_form = self._hidden_inputs(login_popup)
+
+        login_form.update({
+            'ctl01$TemplateBody$WebPartManager1$gwpciNewContactSignInCommon$ciNewContactSignInCommon$signInUserName': username,
+            'ctl01$TemplateBody$WebPartManager1$gwpciNewContactSignInCommon$ciNewContactSignInCommon$signInPassword': password,
+            'ctl01$TemplateBody$WebPartManager1$gwpciNewContactSignInCommon$ciNewContactSignInCommon$RememberMe': 'on',
+        })
+
+        response = self._download_webpage(
+            self._LOGIN_URL, None, 'Logging in',
+            data=urlencode_postdata(login_form))
+
+        if '|pageRedirect|' not in response and not is_logged(response):
+            error = self._html_search_regex(
+                r'(?s)<[^>]+class=["\']AsiError["\'][^>]*>(.+?)</',
+                response, 'error message', default=None)
+            if error:
+                raise ExtractorError('Unable to login: %s' % error, expected=True)
+            raise ExtractorError('Unable to log in')
+
+
+class SCTEIE(SCTEBaseIE):
+    _VALID_URL = r'https?://learning\.scte\.org/mod/scorm/view\.php?.*?\bid=(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://learning.scte.org/mod/scorm/view.php?id=31484',
+        'info_dict': {
+            'title': 'Introduction to DOCSIS Engineering Professional',
+            'id': '31484',
+        },
+        'playlist_count': 5,
+        'skip': 'Requires account credentials',
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._search_regex(r'<h1>(.+?)</h1>', webpage, 'title')
+
+        context_id = self._search_regex(r'context-(\d+)', webpage, video_id)
+        content_base = 'https://learning.scte.org/pluginfile.php/%s/mod_scorm/content/8/' % context_id
+        context = decode_packed_codes(self._download_webpage(
+            '%smobile/data.js' % content_base, video_id))
+
+        data = self._parse_xml(
+            self._search_regex(
+                r'CreateData\(\s*"(.+?)"', context, 'data').replace(r"\'", "'"),
+            video_id)
+
+        entries = []
+        for asset in data.findall('.//asset'):
+            asset_url = asset.get('url')
+            if not asset_url or not asset_url.endswith('.mp4'):
+                continue
+            asset_id = self._search_regex(
+                r'video_([^_]+)_', asset_url, 'asset id', default=None)
+            if not asset_id:
+                continue
+            entries.append({
+                'id': asset_id,
+                'title': title,
+                'url': content_base + asset_url,
+            })
+
+        return self.playlist_result(entries, video_id, title)
+
+
+class SCTECourseIE(SCTEBaseIE):
+    _VALID_URL = r'https?://learning\.scte\.org/(?:mod/sub)?course/view\.php?.*?\bid=(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://learning.scte.org/mod/subcourse/view.php?id=31491',
+        'only_matching': True,
+    }, {
+        'url': 'https://learning.scte.org/course/view.php?id=3639',
+        'only_matching': True,
+    }, {
+        'url': 'https://learning.scte.org/course/view.php?id=3073',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        course_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, course_id)
+
+        title = self._search_regex(
+            r'<h1>(.+?)</h1>', webpage, 'title', default=None)
+
+        entries = []
+        for mobj in re.finditer(
+                r'''(?x)
+                    <a[^>]+
+                        href=(["\'])
+                        (?P<url>
+                            https?://learning\.scte\.org/mod/
+                            (?P<kind>scorm|subcourse)/view\.php?(?:(?!\1).)*?
+                            \bid=\d+
+                        )
+                    ''',
+                webpage):
+            item_url = mobj.group('url')
+            if item_url == url:
+                continue
+            ie = (SCTEIE.ie_key() if mobj.group('kind') == 'scorm'
+                  else SCTECourseIE.ie_key())
+            entries.append(self.url_result(item_url, ie=ie))
+
+        return self.playlist_result(entries, course_id, title)
diff --git a/youtube_dl/extractor/seeker.py b/youtube_dl/extractor/seeker.py
index 3b9c65e7ea012c239ba407e98d3f83f60cc9d48d..7872dc80d9d93c20233023c75b75ab94b289b94a 100644
--- a/youtube_dl/extractor/seeker.py
+++ b/youtube_dl/extractor/seeker.py
@@ -4,34 +4,37 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..utils import (
+    get_element_by_class,
+    strip_or_none,
+)
  
  
  class SeekerIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?seeker\.com/(?P<display_id>.*)-(?P<article_id>\d+)\.html'
      _TESTS = [{
-        # player.loadRevision3Item
          'url': 'http://www.seeker.com/should-trump-be-required-to-release-his-tax-returns-1833805621.html',
-        'md5': '30c1dc4030cc715cf05b423d0947ac18',
+        'md5': '897d44bbe0d8986a2ead96de565a92db',
          'info_dict': {
-            'id': '76243',
-            'ext': 'webm',
+            'id': 'Elrn3gnY',
+            'ext': 'mp4',
              'title': 'Should Trump Be Required To Release His Tax Returns?',
-            'description': 'Donald Trump has been secretive about his "big," "beautiful" tax returns. So what can we learn if he decides to release them?',
-            'uploader': 'Seeker Daily',
-            'uploader_id': 'seekerdaily',
+            'description': 'md5:41efa8cfa8d627841045eec7b018eb45',
+            'timestamp': 1490090165,
+            'upload_date': '20170321',
          }
      }, {
          'url': 'http://www.seeker.com/changes-expected-at-zoos-following-recent-gorilla-lion-shootings-1834116536.html',
          'playlist': [
              {
-                'md5': '83bcd157cab89ad7318dd7b8c9cf1306',
+                'md5': '0497b9f20495174be73ae136949707d2',
                  'info_dict': {
-                    'id': '67558',
+                    'id': 'FihYQ8AE',
                      'ext': 'mp4',
                      'title': 'The Pros & Cons Of Zoos',
-                    'description': 'Zoos are often depicted as a terrible place for animals to live, but is there any truth to this?',
-                    'uploader': 'DNews',
-                    'uploader_id': 'dnews',
+                    'description': 'md5:d88f99a8ea8e7d25e6ff77f271b1271c',
+                    'timestamp': 1490039133,
+                    'upload_date': '20170320',
                  },
              }
          ],
@@ -45,13 +48,11 @@ class SeekerIE(InfoExtractor):
      def _real_extract(self, url):
          display_id, article_id = re.match(self._VALID_URL, url).groups()
          webpage = self._download_webpage(url, display_id)
-        mobj = re.search(r"player\.loadRevision3Item\('([^']+)'\s*,\s*(\d+)\);", webpage)
-        if mobj:
-            playlist_type, playlist_id = mobj.groups()
-            return self.url_result(
-                'revision3:%s:%s' % (playlist_type, playlist_id), 'Revision3Embed', playlist_id)
-        else:
-            entries = [self.url_result('revision3:video_id:%s' % video_id, 'Revision3Embed', video_id) for video_id in re.findall(
-                r'<iframe[^>]+src=[\'"](?:https?:)?//api\.seekernetwork\.com/player/embed\?videoId=(\d+)', webpage)]
-            return self.playlist_result(
-                entries, article_id, self._og_search_title(webpage), self._og_search_description(webpage))
+        entries = []
+        for jwp_id in re.findall(r'data-video-id="([a-zA-Z0-9]{8})"', webpage):
+            entries.append(self.url_result(
+                'jwplatform:' + jwp_id, 'JWPlatform', jwp_id))
+        return self.playlist_result(
+            entries, article_id,
+            self._og_search_title(webpage),
+            strip_or_none(get_element_by_class('subtitle__text', webpage)) or self._og_search_description(webpage))
diff --git a/youtube_dl/extractor/servingsys.py b/youtube_dl/extractor/servingsys.py
deleted file mode 100644
index c013d67..0000000
--- a/youtube_dl/extractor/servingsys.py
+++ /dev/null
@@ -1,72 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-)
-
-
-class ServingSysIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:[^.]+\.)?serving-sys\.com/BurstingPipe/adServer\.bs\?.*?&pli=(?P<id>[0-9]+)'
-
-    _TEST = {
-        'url': 'http://bs.serving-sys.com/BurstingPipe/adServer.bs?cn=is&c=23&pl=VAST&pli=5349193&PluID=0&pos=7135&ord=[timestamp]&cim=1?',
-        'info_dict': {
-            'id': '5349193',
-            'title': 'AdAPPter_Hyundai_demo',
-        },
-        'playlist': [{
-            'md5': 'baed851342df6846eb8677a60a011a0f',
-            'info_dict': {
-                'id': '29955898',
-                'ext': 'flv',
-                'title': 'AdAPPter_Hyundai_demo (1)',
-                'duration': 74,
-                'tbr': 1378,
-                'width': 640,
-                'height': 400,
-            },
-        }, {
-            'md5': '979b4da2655c4bc2d81aeb915a8c5014',
-            'info_dict': {
-                'id': '29907998',
-                'ext': 'flv',
-                'title': 'AdAPPter_Hyundai_demo (2)',
-                'duration': 34,
-                'width': 854,
-                'height': 480,
-                'tbr': 516,
-            },
-        }],
-        'params': {
-            'playlistend': 2,
-        },
-        '_skip': 'Blocked in the US [sic]',
-    }
-
-    def _real_extract(self, url):
-        pl_id = self._match_id(url)
-        vast_doc = self._download_xml(url, pl_id)
-
-        title = vast_doc.find('.//AdTitle').text
-        media = vast_doc.find('.//MediaFile').text
-        info_url = self._search_regex(r'&adData=([^&]+)&', media, 'info URL')
-
-        doc = self._download_xml(info_url, pl_id, 'Downloading video info')
-        entries = [{
-            '_type': 'video',
-            'id': a.attrib['id'],
-            'title': '%s (%s)' % (title, a.attrib['assetID']),
-            'url': a.attrib['URL'],
-            'duration': int_or_none(a.attrib.get('length')),
-            'tbr': int_or_none(a.attrib.get('bitrate')),
-            'height': int_or_none(a.attrib.get('height')),
-            'width': int_or_none(a.attrib.get('width')),
-        } for a in doc.findall('.//AdditionalAssets/asset')]
-
-        return {
-            '_type': 'playlist',
-            'id': pl_id,
-            'title': title,
-            'entries': entries,
-        }
diff --git a/youtube_dl/extractor/shared.py b/youtube_dl/extractor/shared.py
index ff575f592003beb91314b0101fb97b3a27a67432..02295d1a4cacf570ee903c06161b072f35a6da77 100644
--- a/youtube_dl/extractor/shared.py
+++ b/youtube_dl/extractor/shared.py
@@ -1,13 +1,18 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..compat import compat_b64decode
+from ..compat import (
+    compat_b64decode,
+    compat_urllib_parse_unquote_plus,
+)
  from ..utils import (
      determine_ext,
      ExtractorError,
      int_or_none,
+    js_to_json,
      KNOWN_EXTENSIONS,
      parse_filesize,
+    rot47,
      url_or_none,
      urlencode_postdata,
  )
@@ -112,16 +117,22 @@ class VivoIE(SharedBaseIE):
              webpage, 'filesize', fatal=False))
  
      def _extract_video_url(self, webpage, video_id, url):
-        def decode_url(encoded_url):
+        def decode_url_old(encoded_url):
              return compat_b64decode(encoded_url).decode('utf-8')
  
-        stream_url = url_or_none(decode_url(self._search_regex(
+        stream_url = self._search_regex(
              r'data-stream\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
-            'stream url', default=None, group='url')))
+            'stream url', default=None, group='url')
+        if stream_url:
+            stream_url = url_or_none(decode_url_old(stream_url))
          if stream_url:
              return stream_url
-        return self._parse_json(
+
+        def decode_url(encoded_url):
+            return rot47(compat_urllib_parse_unquote_plus(encoded_url))
+
+        return decode_url(self._parse_json(
              self._search_regex(
-                r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
-                webpage, 'stream', group='url'),
-            video_id, transform_source=decode_url)[0]
+                r'(?s)InitializeStream\s*\(\s*({.+?})\s*\)\s*;', webpage,
+                'stream'),
+            video_id, transform_source=js_to_json)['source'])
diff --git a/youtube_dl/extractor/slideslive.py b/youtube_dl/extractor/slideslive.py
index ed84322c5131b49a1acac4deedb4d049832d7547..d9ea76831e7ebdc37bad0e722cd92268bfd8eb6f 100644
--- a/youtube_dl/extractor/slideslive.py
+++ b/youtube_dl/extractor/slideslive.py
@@ -2,7 +2,7 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import smuggle_url
  
  
  class SlidesLiveIE(InfoExtractor):
@@ -14,9 +14,9 @@ class SlidesLiveIE(InfoExtractor):
          'info_dict': {
              'id': 'LMtgR8ba0b0',
              'ext': 'mp4',
-            'title': '38902413: external video',
-            'description': '3890241320170925-9-1yd6ech.mp4',
-            'uploader': 'SlidesLive Administrator',
+            'title': 'GCC IA16 backend',
+            'description': 'Watch full version of this video at https://slideslive.com/38902413.',
+            'uploader': 'SlidesLive Videos - A',
              'uploader_id': 'UC62SdArr41t_-_fX40QCLRw',
              'upload_date': '20170925',
          }
@@ -24,16 +24,38 @@ class SlidesLiveIE(InfoExtractor):
          # video_service_name = youtube
          'url': 'https://slideslive.com/38903721/magic-a-scientific-resurrection-of-an-esoteric-legend',
          'only_matching': True,
+    }, {
+        # video_service_name = url
+        'url': 'https://slideslive.com/38922070/learning-transferable-skills-1',
+        'only_matching': True,
+    }, {
+        # video_service_name = vimeo
+        'url': 'https://slideslive.com/38921896/retrospectives-a-venue-for-selfreflection-in-ml-research-3',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
          video_data = self._download_json(
-            url, video_id, headers={'Accept': 'application/json'})
+            'https://ben.slideslive.com/player/' + video_id, video_id)
          service_name = video_data['video_service_name'].lower()
-        if service_name == 'youtube':
-            yt_video_id = video_data['video_service_id']
-            return self.url_result(yt_video_id, 'Youtube', video_id=yt_video_id)
+        assert service_name in ('url', 'vimeo', 'youtube')
+        service_id = video_data['video_service_id']
+        info = {
+            'id': video_id,
+            'thumbnail': video_data.get('thumbnail'),
+            'url': service_id,
+        }
+        if service_name == 'url':
+            info['title'] = video_data['title']
          else:
-            raise ExtractorError(
-                'Unsupported service name: {0}'.format(service_name), expected=True)
+            info.update({
+                '_type': 'url_transparent',
+                'ie_key': service_name.capitalize(),
+                'title': video_data.get('title'),
+            })
+            if service_name == 'vimeo':
+                info['url'] = smuggle_url(
+                    'https://player.vimeo.com/video/' + service_id,
+                    {'http_headers': {'Referer': url}})
+        return info
diff --git a/youtube_dl/extractor/soundcloud.py b/youtube_dl/extractor/soundcloud.py
index 05538f3d6b8733fe30f45ef642a9d730bbd39cd5..a0b09f5b1747e3b1b9b3a17b10d482b47512143a 100644
--- a/youtube_dl/extractor/soundcloud.py
+++ b/youtube_dl/extractor/soundcloud.py
@@ -9,16 +9,18 @@ from .common import (
      SearchInfoExtractor
  )
  from ..compat import (
+    compat_HTTPError,
+    compat_kwargs,
      compat_str,
      compat_urlparse,
-    compat_urllib_parse_urlencode,
  )
  from ..utils import (
+    error_to_compat_str,
      ExtractorError,
      float_or_none,
+    HEADRequest,
      int_or_none,
      KNOWN_EXTENSIONS,
-    merge_dicts,
      mimetype2ext,
      str_or_none,
      try_get,
@@ -28,6 +30,30 @@ from ..utils import (
  )
  
  
+class SoundcloudEmbedIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:w|player|p)\.soundcloud\.com/player/?.*?\burl=(?P<id>.+)'
+    _TEST = {
+        # from https://www.soundi.fi/uutiset/ennakkokuuntelussa-timo-kaukolammen-station-to-station-to-station-julkaisua-juhlitaan-tanaan-g-livelabissa/
+        'url': 'https://w.soundcloud.com/player/?visual=true&url=https%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F922213810&show_artwork=true&maxwidth=640&maxheight=960&dnt=1&secret_token=s-ziYey',
+        'only_matching': True,
+    }
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [m.group('url') for m in re.finditer(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:w\.)?soundcloud\.com/player.+?)\1',
+            webpage)]
+
+    def _real_extract(self, url):
+        query = compat_urlparse.parse_qs(
+            compat_urlparse.urlparse(url).query)
+        api_url = query['url'][0]
+        secret_token = query.get('secret_token')
+        if secret_token:
+            api_url = update_url_query(api_url, {'secret_token': secret_token[0]})
+        return self.url_result(api_url)
+
+
  class SoundcloudIE(InfoExtractor):
      """Information extractor for soundcloud.com
         To access the media, the uid of the song and a stream token
@@ -44,9 +70,8 @@ class SoundcloudIE(InfoExtractor):
                              (?!(?:tracks|albums|sets(?:/.+?)?|reposts|likes|spotlight)/?(?:$|[?#]))
                              (?P<title>[\w\d-]+)/?
                              (?P<token>[^?]+?)?(?:[?].*)?$)
-                       |(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+)
+                       |(?:api(?:-v2)?\.soundcloud\.com/tracks/(?P<track_id>\d+)
                            (?:/?\?secret_token=(?P<secret_token>[^&]+))?)
-                       |(?P<player>(?:w|player|p.)\.soundcloud\.com/player/?.*?url=.*)
                      )
                      '''
      IE_NAME = 'soundcloud'
@@ -60,6 +85,7 @@ class SoundcloudIE(InfoExtractor):
                  'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
                  'description': 'No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o\'d',
                  'uploader': 'E.T. ExTerrestrial Music',
+                'uploader_id': '1571244',
                  'timestamp': 1349920598,
                  'upload_date': '20121011',
                  'duration': 143.216,
@@ -70,7 +96,7 @@ class SoundcloudIE(InfoExtractor):
                  'repost_count': int,
              }
          },
-        # not streamable song
+        # not streamable song, preview
          {
              'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
              'info_dict': {
@@ -79,6 +105,7 @@ class SoundcloudIE(InfoExtractor):
                  'title': 'Goldrushed',
                  'description': 'From Stockholm Sweden\r\nPovel / Magnus / Filip / David\r\nwww.theroyalconcept.com',
                  'uploader': 'The Royal Concept',
+                'uploader_id': '9615865',
                  'timestamp': 1337635207,
                  'upload_date': '20120521',
                  'duration': 30,
@@ -103,6 +130,7 @@ class SoundcloudIE(InfoExtractor):
                  'title': 'Youtube - Dl Test Video \'\' Ä↭',
                  'description': 'test chars:  \"\'/\\ä↭',
                  'uploader': 'jaimeMF',
+                'uploader_id': '69767071',
                  'timestamp': 1386604920,
                  'upload_date': '20131209',
                  'duration': 9.927,
@@ -123,6 +151,7 @@ class SoundcloudIE(InfoExtractor):
                  'title': 'Youtube - Dl Test Video \'\' Ä↭',
                  'description': 'test chars:  \"\'/\\ä↭',
                  'uploader': 'jaimeMF',
+                'uploader_id': '69767071',
                  'timestamp': 1386604920,
                  'upload_date': '20131209',
                  'duration': 9.927,
@@ -143,6 +172,7 @@ class SoundcloudIE(InfoExtractor):
                  'title': 'Bus Brakes',
                  'description': 'md5:0053ca6396e8d2fd7b7e1595ef12ab66',
                  'uploader': 'oddsamples',
+                'uploader_id': '73680509',
                  'timestamp': 1389232924,
                  'upload_date': '20140109',
                  'duration': 17.346,
@@ -163,6 +193,7 @@ class SoundcloudIE(InfoExtractor):
                  'title': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
                  'description': 'md5:fa20ee0fca76a3d6df8c7e57f3715366',
                  'uploader': 'Ori Uplift Music',
+                'uploader_id': '12563093',
                  'timestamp': 1504206263,
                  'upload_date': '20170831',
                  'duration': 7449.096,
@@ -183,6 +214,7 @@ class SoundcloudIE(InfoExtractor):
                  'title': 'Sideways (Prod. Mad Real)',
                  'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
                  'uploader': 'garyvee',
+                'uploader_id': '2366352',
                  'timestamp': 1488152409,
                  'upload_date': '20170226',
                  'duration': 207.012,
@@ -207,6 +239,7 @@ class SoundcloudIE(InfoExtractor):
                  'title': 'Mezzo Valzer',
                  'description': 'md5:4138d582f81866a530317bae316e8b61',
                  'uploader': 'Giovanni Sarani',
+                'uploader_id': '3352531',
                  'timestamp': 1551394171,
                  'upload_date': '20190228',
                  'duration': 180.157,
@@ -221,114 +254,124 @@ class SoundcloudIE(InfoExtractor):
          }
      ]
  
-    _CLIENT_ID = 'BeGVhOrGmfboy1LtiHTQF6Ejpt9ULJCI'
+    _API_BASE = 'https://api.soundcloud.com/'
+    _API_V2_BASE = 'https://api-v2.soundcloud.com/'
+    _BASE_URL = 'https://soundcloud.com/'
+    _IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
+
+    _ARTWORK_MAP = {
+        'mini': 16,
+        'tiny': 20,
+        'small': 32,
+        'badge': 47,
+        't67x67': 67,
+        'large': 100,
+        't300x300': 300,
+        'crop': 400,
+        't500x500': 500,
+        'original': 0,
+    }
  
-    @staticmethod
-    def _extract_urls(webpage):
-        return [m.group('url') for m in re.finditer(
-            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:w\.)?soundcloud\.com/player.+?)\1',
-            webpage)]
+    def _store_client_id(self, client_id):
+        self._downloader.cache.store('soundcloud', 'client_id', client_id)
+
+    def _update_client_id(self):
+        webpage = self._download_webpage('https://soundcloud.com/', None)
+        for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
+            script = self._download_webpage(src, None, fatal=False)
+            if script:
+                client_id = self._search_regex(
+                    r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
+                    script, 'client id', default=None)
+                if client_id:
+                    self._CLIENT_ID = client_id
+                    self._store_client_id(client_id)
+                    return
+        raise ExtractorError('Unable to extract client id')
+
+    def _download_json(self, *args, **kwargs):
+        non_fatal = kwargs.get('fatal') is False
+        if non_fatal:
+            del kwargs['fatal']
+        query = kwargs.get('query', {}).copy()
+        for _ in range(2):
+            query['client_id'] = self._CLIENT_ID
+            kwargs['query'] = query
+            try:
+                return super(SoundcloudIE, self)._download_json(*args, **compat_kwargs(kwargs))
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+                    self._store_client_id(None)
+                    self._update_client_id()
+                    continue
+                elif non_fatal:
+                    self._downloader.report_warning(error_to_compat_str(e))
+                    return False
+                raise
+
+    def _real_initialize(self):
+        self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or 'YUKXoArFcqrlQn9tfNHvvyfnDISj04zk'
  
      @classmethod
      def _resolv_url(cls, url):
-        return 'https://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
+        return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url
  
-    def _extract_info_dict(self, info, full_title=None, quiet=False, secret_token=None):
+    def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2):
          track_id = compat_str(info['id'])
          title = info['title']
-        name = full_title or track_id
-        if quiet:
-            self.report_extraction(name)
-        thumbnail = info.get('artwork_url') or info.get('user', {}).get('avatar_url')
-        if isinstance(thumbnail, compat_str):
-            thumbnail = thumbnail.replace('-large', '-t500x500')
-        username = try_get(info, lambda x: x['user']['username'], compat_str)
-
-        def extract_count(key):
-            return int_or_none(info.get('%s_count' % key))
-
-        like_count = extract_count('favoritings')
-        if like_count is None:
-            like_count = extract_count('likes')
-
-        result = {
-            'id': track_id,
-            'uploader': username,
-            'timestamp': unified_timestamp(info.get('created_at')),
-            'title': title,
-            'description': info.get('description'),
-            'thumbnail': thumbnail,
-            'duration': float_or_none(info.get('duration'), 1000),
-            'webpage_url': info.get('permalink_url'),
-            'license': info.get('license'),
-            'view_count': extract_count('playback'),
-            'like_count': like_count,
-            'comment_count': extract_count('comment'),
-            'repost_count': extract_count('reposts'),
-            'genre': info.get('genre'),
-        }
+        track_base_url = self._API_BASE + 'tracks/%s' % track_id
  
          format_urls = set()
          formats = []
          query = {'client_id': self._CLIENT_ID}
-        if secret_token is not None:
+        if secret_token:
              query['secret_token'] = secret_token
-        if info.get('downloadable', False):
-            # We can build a direct link to the song
+
+        if info.get('downloadable') and info.get('has_downloads_left'):
              format_url = update_url_query(
-                'https://api.soundcloud.com/tracks/%s/download' % track_id, query)
+                info.get('download_url') or track_base_url + '/download', query)
              format_urls.add(format_url)
+            if version == 2:
+                v1_info = self._download_json(
+                    track_base_url, track_id, query=query, fatal=False) or {}
+            else:
+                v1_info = info
              formats.append({
                  'format_id': 'download',
-                'ext': info.get('original_format', 'mp3'),
+                'ext': v1_info.get('original_format') or 'mp3',
+                'filesize': int_or_none(v1_info.get('original_content_size')),
                  'url': format_url,
-                'vcodec': 'none',
                  'preference': 10,
              })
  
-        # Old API, does not work for some tracks (e.g.
-        # https://soundcloud.com/giovannisarani/mezzo-valzer)
-        format_dict = self._download_json(
-            'https://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
-            track_id, 'Downloading track url', query=query, fatal=False)
-
-        if format_dict:
-            for key, stream_url in format_dict.items():
-                if stream_url in format_urls:
-                    continue
-                format_urls.add(stream_url)
-                ext, abr = 'mp3', None
-                mobj = re.search(r'_([^_]+)_(\d+)_url', key)
-                if mobj:
-                    ext, abr = mobj.groups()
-                    abr = int(abr)
-                if key.startswith('http'):
-                    stream_formats = [{
-                        'format_id': key,
-                        'ext': ext,
-                        'url': stream_url,
-                    }]
-                elif key.startswith('rtmp'):
-                    # The url doesn't have an rtmp app, we have to extract the playpath
-                    url, path = stream_url.split('mp3:', 1)
-                    stream_formats = [{
-                        'format_id': key,
-                        'url': url,
-                        'play_path': 'mp3:' + path,
-                        'ext': 'flv',
-                    }]
-                elif key.startswith('hls'):
-                    stream_formats = self._extract_m3u8_formats(
-                        stream_url, track_id, ext, entry_protocol='m3u8_native',
-                        m3u8_id=key, fatal=False)
-                else:
-                    continue
-
-                if abr:
-                    for f in stream_formats:
-                        f['abr'] = abr
+        def invalid_url(url):
+            return not url or url in format_urls
  
-                formats.extend(stream_formats)
+        def add_format(f, protocol, is_preview=False):
+            mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
+            if mobj:
+                for k, v in mobj.groupdict().items():
+                    if not f.get(k):
+                        f[k] = v
+            format_id_list = []
+            if protocol:
+                format_id_list.append(protocol)
+            for k in ('ext', 'abr'):
+                v = f.get(k)
+                if v:
+                    format_id_list.append(v)
+            preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
+            if preview:
+                format_id_list.append('preview')
+            abr = f.get('abr')
+            if abr:
+                f['abr'] = int(abr)
+            f.update({
+                'format_id': '_'.join(format_id_list),
+                'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
+                'preference': -10 if preview else None,
+            })
+            formats.append(f)
  
          # New API
          transcodings = try_get(
@@ -340,126 +383,161 @@ class SoundcloudIE(InfoExtractor):
              if not format_url:
                  continue
              stream = self._download_json(
-                update_url_query(format_url, query), track_id, fatal=False)
+                format_url, track_id, query=query, fatal=False)
              if not isinstance(stream, dict):
                  continue
              stream_url = url_or_none(stream.get('url'))
-            if not stream_url:
-                continue
-            if stream_url in format_urls:
+            if invalid_url(stream_url):
                  continue
              format_urls.add(stream_url)
-            protocol = try_get(t, lambda x: x['format']['protocol'], compat_str)
+            stream_format = t.get('format') or {}
+            protocol = stream_format.get('protocol')
              if protocol != 'hls' and '/hls' in format_url:
                  protocol = 'hls'
              ext = None
              preset = str_or_none(t.get('preset'))
              if preset:
                  ext = preset.split('_')[0]
-                if ext not in KNOWN_EXTENSIONS:
-                    mimetype = try_get(
-                        t, lambda x: x['format']['mime_type'], compat_str)
-                    ext = mimetype2ext(mimetype) or 'mp3'
-            format_id_list = []
-            if protocol:
-                format_id_list.append(protocol)
-            format_id_list.append(ext)
-            format_id = '_'.join(format_id_list)
-            formats.append({
+            if ext not in KNOWN_EXTENSIONS:
+                ext = mimetype2ext(stream_format.get('mime_type'))
+            add_format({
                  'url': stream_url,
-                'format_id': format_id,
                  'ext': ext,
-                'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
-            })
+            }, 'http' if protocol == 'progressive' else protocol,
+                t.get('snipped') or '/preview/' in format_url)
+
+        if not formats:
+            # Old API, does not work for some tracks (e.g.
+            # https://soundcloud.com/giovannisarani/mezzo-valzer)
+            # and might serve preview URLs (e.g.
+            # http://www.soundcloud.com/snbrn/ele)
+            format_dict = self._download_json(
+                track_base_url + '/streams', track_id,
+                'Downloading track url', query=query, fatal=False) or {}
+
+            for key, stream_url in format_dict.items():
+                if invalid_url(stream_url):
+                    continue
+                format_urls.add(stream_url)
+                mobj = re.search(r'(http|hls)_([^_]+)_(\d+)_url', key)
+                if mobj:
+                    protocol, ext, abr = mobj.groups()
+                    add_format({
+                        'abr': abr,
+                        'ext': ext,
+                        'url': stream_url,
+                    }, protocol)
  
          if not formats:
              # We fallback to the stream_url in the original info, this
              # cannot be always used, sometimes it can give an HTTP 404 error
-            formats.append({
-                'format_id': 'fallback',
-                'url': update_url_query(info['stream_url'], query),
-                'ext': 'mp3',
-            })
-            self._check_formats(formats, track_id)
+            urlh = self._request_webpage(
+                HEADRequest(info.get('stream_url') or track_base_url + '/stream'),
+                track_id, query=query, fatal=False)
+            if urlh:
+                stream_url = urlh.geturl()
+                if not invalid_url(stream_url):
+                    add_format({'url': stream_url}, 'http')
  
          for f in formats:
              f['vcodec'] = 'none'
  
          self._sort_formats(formats)
-        result['formats'] = formats
  
-        return result
+        user = info.get('user') or {}
+
+        thumbnails = []
+        artwork_url = info.get('artwork_url')
+        thumbnail = artwork_url or user.get('avatar_url')
+        if isinstance(thumbnail, compat_str):
+            if re.search(self._IMAGE_REPL_RE, thumbnail):
+                for image_id, size in self._ARTWORK_MAP.items():
+                    i = {
+                        'id': image_id,
+                        'url': re.sub(self._IMAGE_REPL_RE, '-%s.jpg' % image_id, thumbnail),
+                    }
+                    if image_id == 'tiny' and not artwork_url:
+                        size = 18
+                    elif image_id == 'original':
+                        i['preference'] = 10
+                    if size:
+                        i.update({
+                            'width': size,
+                            'height': size,
+                        })
+                    thumbnails.append(i)
+            else:
+                thumbnails = [{'url': thumbnail}]
+
+        def extract_count(key):
+            return int_or_none(info.get('%s_count' % key))
+
+        return {
+            'id': track_id,
+            'uploader': user.get('username'),
+            'uploader_id': str_or_none(user.get('id')) or user.get('permalink'),
+            'uploader_url': user.get('permalink_url'),
+            'timestamp': unified_timestamp(info.get('created_at')),
+            'title': title,
+            'description': info.get('description'),
+            'thumbnails': thumbnails,
+            'duration': float_or_none(info.get('duration'), 1000),
+            'webpage_url': info.get('permalink_url'),
+            'license': info.get('license'),
+            'view_count': extract_count('playback'),
+            'like_count': extract_count('favoritings') or extract_count('likes'),
+            'comment_count': extract_count('comment'),
+            'repost_count': extract_count('reposts'),
+            'genre': info.get('genre'),
+            'formats': formats
+        }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
-        if mobj is None:
-            raise ExtractorError('Invalid URL: %s' % url)
+        mobj = re.match(self._VALID_URL, url)
  
          track_id = mobj.group('track_id')
-        new_info = {}
  
-        if track_id is not None:
-            info_json_url = 'https://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
+        query = {}
+        if track_id:
+            info_json_url = self._API_V2_BASE + 'tracks/' + track_id
              full_title = track_id
              token = mobj.group('secret_token')
              if token:
-                info_json_url += '&secret_token=' + token
-        elif mobj.group('player'):
-            query = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
-            real_url = query['url'][0]
-            # If the token is in the query of the original url we have to
-            # manually add it
-            if 'secret_token' in query:
-                real_url += '?secret_token=' + query['secret_token'][0]
-            return self.url_result(real_url)
+                query['secret_token'] = token
          else:
-            # extract uploader (which is in the url)
-            uploader = mobj.group('uploader')
-            # extract simple title (uploader + slug of song title)
-            slug_title = mobj.group('title')
+            full_title = resolve_title = '%s/%s' % mobj.group('uploader', 'title')
              token = mobj.group('token')
-            full_title = resolve_title = '%s/%s' % (uploader, slug_title)
              if token:
                  resolve_title += '/%s' % token
+            info_json_url = self._resolv_url(self._BASE_URL + resolve_title)
  
-            webpage = self._download_webpage(url, full_title, fatal=False)
-            if webpage:
-                entries = self._parse_json(
-                    self._search_regex(
-                        r'var\s+c\s*=\s*(\[.+?\])\s*,\s*o\s*=Date\b', webpage,
-                        'data', default='[]'), full_title, fatal=False)
-                if entries:
-                    for e in entries:
-                        if not isinstance(e, dict):
-                            continue
-                        if e.get('id') != 67:
-                            continue
-                        data = try_get(e, lambda x: x['data'][0], dict)
-                        if data:
-                            new_info = data
-                            break
-                info_json_url = self._resolv_url(
-                    'https://soundcloud.com/%s' % resolve_title)
-
-        # Contains some additional info missing from new_info
+        version = 2
          info = self._download_json(
-            info_json_url, full_title, 'Downloading info JSON')
+            info_json_url, full_title, 'Downloading info JSON', query=query, fatal=False)
+        if not info:
+            info = self._download_json(
+                info_json_url.replace(self._API_V2_BASE, self._API_BASE),
+                full_title, 'Downloading info JSON', query=query)
+            version = 1
  
-        return self._extract_info_dict(
-            merge_dicts(info, new_info), full_title, secret_token=token)
+        return self._extract_info_dict(info, full_title, token, version)
  
  
  class SoundcloudPlaylistBaseIE(SoundcloudIE):
-    @staticmethod
-    def _extract_id(e):
-        return compat_str(e['id']) if e.get('id') else None
-
-    def _extract_track_entries(self, tracks):
-        return [
-            self.url_result(
-                track['permalink_url'], SoundcloudIE.ie_key(),
-                video_id=self._extract_id(track))
-            for track in tracks if track.get('permalink_url')]
+    def _extract_track_entries(self, tracks, token=None):
+        entries = []
+        for track in tracks:
+            track_id = str_or_none(track.get('id'))
+            url = track.get('permalink_url')
+            if not url:
+                if not track_id:
+                    continue
+                url = self._API_V2_BASE + 'tracks/' + track_id
+                if token:
+                    url += '?secret_token=' + token
+            entries.append(self.url_result(
+                url, SoundcloudIE.ie_key(), track_id))
+        return entries
  
  
  class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@@ -480,54 +558,41 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
  
-        # extract uploader (which is in the url)
-        uploader = mobj.group('uploader')
-        # extract simple title (uploader + slug of song title)
-        slug_title = mobj.group('slug_title')
-        full_title = '%s/sets/%s' % (uploader, slug_title)
-        url = 'https://soundcloud.com/%s/sets/%s' % (uploader, slug_title)
-
+        full_title = '%s/sets/%s' % mobj.group('uploader', 'slug_title')
          token = mobj.group('token')
          if token:
              full_title += '/' + token
-            url += '/' + token
  
-        resolv_url = self._resolv_url(url)
-        info = self._download_json(resolv_url, full_title)
+        info = self._download_json(self._resolv_url(
+            self._BASE_URL + full_title), full_title)
  
          if 'errors' in info:
              msgs = (compat_str(err['error_message']) for err in info['errors'])
              raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
  
-        entries = self._extract_track_entries(info['tracks'])
+        entries = self._extract_track_entries(info['tracks'], token)
  
-        return {
-            '_type': 'playlist',
-            'entries': entries,
-            'id': '%s' % info['id'],
-            'title': info['title'],
-        }
+        return self.playlist_result(
+            entries, str_or_none(info.get('id')), info.get('title'))
  
  
  class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
-    _API_V2_BASE = 'https://api-v2.soundcloud.com'
-
      def _extract_playlist(self, base_url, playlist_id, playlist_title):
          COMMON_QUERY = {
-            'limit': 50,
-            'client_id': self._CLIENT_ID,
+            'limit': 2000000000,
              'linked_partitioning': '1',
          }
  
          query = COMMON_QUERY.copy()
          query['offset'] = 0
  
-        next_href = base_url + '?' + compat_urllib_parse_urlencode(query)
+        next_href = base_url
  
          entries = []
          for i in itertools.count():
              response = self._download_json(
-                next_href, playlist_id, 'Downloading track page %s' % (i + 1))
+                next_href, playlist_id,
+                'Downloading track page %s' % (i + 1), query=query)
  
              collection = response['collection']
  
@@ -546,9 +611,8 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
                          continue
                      return self.url_result(
                          permalink_url,
-                        ie=SoundcloudIE.ie_key() if SoundcloudIE.suitable(permalink_url) else None,
-                        video_id=self._extract_id(cand),
-                        video_title=cand.get('title'))
+                        SoundcloudIE.ie_key() if SoundcloudIE.suitable(permalink_url) else None,
+                        str_or_none(cand.get('id')), cand.get('title'))
  
              for e in collection:
                  entry = resolve_entry((e, e.get('track'), e.get('playlist')))
@@ -559,11 +623,10 @@ class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
              if not next_href:
                  break
  
-            parsed_next_href = compat_urlparse.urlparse(response['next_href'])
-            qs = compat_urlparse.parse_qs(parsed_next_href.query)
-            qs.update(COMMON_QUERY)
-            next_href = compat_urlparse.urlunparse(
-                parsed_next_href._replace(query=compat_urllib_parse_urlencode(qs, True)))
+            next_href = response['next_href']
+            parsed_next_href = compat_urlparse.urlparse(next_href)
+            query = compat_urlparse.parse_qs(parsed_next_href.query)
+            query.update(COMMON_QUERY)
  
          return {
              '_type': 'playlist',
@@ -609,7 +672,7 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
          'url': 'https://soundcloud.com/jcv246/sets',
          'info_dict': {
              'id': '12982173',
-            'title': 'Jordi / cv (Playlists)',
+            'title': 'Jordi / cv (Sets)',
          },
          'playlist_mincount': 2,
      }, {
@@ -636,39 +699,29 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
      }]
  
      _BASE_URL_MAP = {
-        'all': '%s/stream/users/%%s' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
-        'tracks': '%s/users/%%s/tracks' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
-        'albums': '%s/users/%%s/albums' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
-        'sets': '%s/users/%%s/playlists' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
-        'reposts': '%s/stream/users/%%s/reposts' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
-        'likes': '%s/users/%%s/likes' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
-        'spotlight': '%s/users/%%s/spotlight' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
-    }
-
-    _TITLE_MAP = {
-        'all': 'All',
-        'tracks': 'Tracks',
-        'albums': 'Albums',
-        'sets': 'Playlists',
-        'reposts': 'Reposts',
-        'likes': 'Likes',
-        'spotlight': 'Spotlight',
+        'all': 'stream/users/%s',
+        'tracks': 'users/%s/tracks',
+        'albums': 'users/%s/albums',
+        'sets': 'users/%s/playlists',
+        'reposts': 'stream/users/%s/reposts',
+        'likes': 'users/%s/likes',
+        'spotlight': 'users/%s/spotlight',
      }
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          uploader = mobj.group('user')
  
-        url = 'https://soundcloud.com/%s/' % uploader
-        resolv_url = self._resolv_url(url)
          user = self._download_json(
-            resolv_url, uploader, 'Downloading user info')
+            self._resolv_url(self._BASE_URL + uploader),
+            uploader, 'Downloading user info')
  
          resource = mobj.group('rsrc') or 'all'
  
          return self._extract_playlist(
-            self._BASE_URL_MAP[resource] % user['id'], compat_str(user['id']),
-            '%s (%s)' % (user['username'], self._TITLE_MAP[resource]))
+            self._API_V2_BASE + self._BASE_URL_MAP[resource] % user['id'],
+            str_or_none(user.get('id')),
+            '%s (%s)' % (user['username'], resource.capitalize()))
  
  
  class SoundcloudTrackStationIE(SoundcloudPagedPlaylistBaseIE):
@@ -678,7 +731,7 @@ class SoundcloudTrackStationIE(SoundcloudPagedPlaylistBaseIE):
          'url': 'https://soundcloud.com/stations/track/officialsundial/your-text',
          'info_dict': {
              'id': '286017854',
-            'title': 'Track station: your-text',
+            'title': 'Track station: your text',
          },
          'playlist_mincount': 47,
      }]
@@ -686,19 +739,17 @@ class SoundcloudTrackStationIE(SoundcloudPagedPlaylistBaseIE):
      def _real_extract(self, url):
          track_name = self._match_id(url)
  
-        webpage = self._download_webpage(url, track_name)
-
+        track = self._download_json(self._resolv_url(url), track_name)
          track_id = self._search_regex(
-            r'soundcloud:track-stations:(\d+)', webpage, 'track id')
+            r'soundcloud:track-stations:(\d+)', track['id'], 'track id')
  
          return self._extract_playlist(
-            '%s/stations/soundcloud:track-stations:%s/tracks'
-            % (self._API_V2_BASE, track_id),
-            track_id, 'Track station: %s' % track_name)
+            self._API_V2_BASE + 'stations/%s/tracks' % track['id'],
+            track_id, 'Track station: %s' % track['title'])
  
  
  class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
-    _VALID_URL = r'https?://api\.soundcloud\.com/playlists/(?P<id>[0-9]+)(?:/?\?secret_token=(?P<token>[^&]+?))?$'
+    _VALID_URL = r'https?://api(?:-v2)?\.soundcloud\.com/playlists/(?P<id>[0-9]+)(?:/?\?secret_token=(?P<token>[^&]+?))?$'
      IE_NAME = 'soundcloud:playlist'
      _TESTS = [{
          'url': 'https://api.soundcloud.com/playlists/4110309',
@@ -713,29 +764,20 @@ class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          playlist_id = mobj.group('id')
-        base_url = '%s//api.soundcloud.com/playlists/%s.json?' % (self.http_scheme(), playlist_id)
  
-        data_dict = {
-            'client_id': self._CLIENT_ID,
-        }
+        query = {}
          token = mobj.group('token')
-
          if token:
-            data_dict['secret_token'] = token
+            query['secret_token'] = token
  
-        data = compat_urllib_parse_urlencode(data_dict)
          data = self._download_json(
-            base_url + data, playlist_id, 'Downloading playlist')
+            self._API_V2_BASE + 'playlists/' + playlist_id,
+            playlist_id, 'Downloading playlist', query=query)
  
-        entries = self._extract_track_entries(data['tracks'])
+        entries = self._extract_track_entries(data['tracks'], token)
  
-        return {
-            '_type': 'playlist',
-            'id': playlist_id,
-            'title': data.get('title'),
-            'description': data.get('description'),
-            'entries': entries,
-        }
+        return self.playlist_result(
+            entries, playlist_id, data.get('title'), data.get('description'))
  
  
  class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
@@ -753,18 +795,17 @@ class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
      _SEARCH_KEY = 'scsearch'
      _MAX_RESULTS_PER_PAGE = 200
      _DEFAULT_RESULTS_PER_PAGE = 50
-    _API_V2_BASE = 'https://api-v2.soundcloud.com'
  
      def _get_collection(self, endpoint, collection_id, **query):
          limit = min(
              query.get('limit', self._DEFAULT_RESULTS_PER_PAGE),
              self._MAX_RESULTS_PER_PAGE)
-        query['limit'] = limit
-        query['client_id'] = self._CLIENT_ID
-        query['linked_partitioning'] = '1'
-        query['offset'] = 0
-        data = compat_urllib_parse_urlencode(query)
-        next_url = '{0}{1}?{2}'.format(self._API_V2_BASE, endpoint, data)
+        query.update({
+            'limit': limit,
+            'linked_partitioning': 1,
+            'offset': 0,
+        })
+        next_url = update_url_query(self._API_V2_BASE + endpoint, query)
  
          collected_results = 0
  
@@ -791,5 +832,5 @@ class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
                  break
  
      def _get_n_results(self, query, n):
-        tracks = self._get_collection('/search/tracks', query, limit=n, q=query)
+        tracks = self._get_collection('search/tracks', query, limit=n, q=query)
          return self.playlist_result(tracks, playlist_title=query)
diff --git a/youtube_dl/extractor/spankbang.py b/youtube_dl/extractor/spankbang.py
index e040ada29b24542582f72f08f31b843d928af251..61ca902ce286e6274c7d5776bd10c265a023643f 100644
--- a/youtube_dl/extractor/spankbang.py
+++ b/youtube_dl/extractor/spankbang.py
@@ -4,6 +4,7 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
+    determine_ext,
      ExtractorError,
      merge_dicts,
      orderedSet,
@@ -64,7 +65,7 @@ class SpankBangIE(InfoExtractor):
              url.replace('/%s/embed' % video_id, '/%s/video' % video_id),
              video_id, headers={'Cookie': 'country=US'})
  
-        if re.search(r'<[^>]+\bid=["\']video_removed', webpage):
+        if re.search(r'<[^>]+\b(?:id|class)=["\']video_removed', webpage):
              raise ExtractorError(
                  'Video %s is not available' % video_id, expected=True)
  
@@ -75,11 +76,20 @@ class SpankBangIE(InfoExtractor):
              if not f_url:
                  return
              f = parse_resolution(format_id)
-            f.update({
-                'url': f_url,
-                'format_id': format_id,
-            })
-            formats.append(f)
+            ext = determine_ext(f_url)
+            if format_id.startswith('m3u8') or ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    f_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif format_id.startswith('mpd') or ext == 'mpd':
+                formats.extend(self._extract_mpd_formats(
+                    f_url, video_id, mpd_id='dash', fatal=False))
+            elif ext == 'mp4' or f.get('width') or f.get('height'):
+                f.update({
+                    'url': f_url,
+                    'format_id': format_id,
+                })
+                formats.append(f)
  
          STREAM_URL_PREFIX = 'stream_url_'
  
@@ -93,28 +103,22 @@ class SpankBangIE(InfoExtractor):
                  r'data-streamkey\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
                  webpage, 'stream key', group='value')
  
-            sb_csrf_session = self._get_cookies(
-                'https://spankbang.com')['sb_csrf_session'].value
-
              stream = self._download_json(
                  'https://spankbang.com/api/videos/stream', video_id,
                  'Downloading stream JSON', data=urlencode_postdata({
                      'id': stream_key,
                      'data': 0,
-                    'sb_csrf_session': sb_csrf_session,
                  }), headers={
                      'Referer': url,
-                    'X-CSRFToken': sb_csrf_session,
+                    'X-Requested-With': 'XMLHttpRequest',
                  })
  
              for format_id, format_url in stream.items():
-                if format_id.startswith(STREAM_URL_PREFIX):
-                    if format_url and isinstance(format_url, list):
-                        format_url = format_url[0]
-                    extract_format(
-                        format_id[len(STREAM_URL_PREFIX):], format_url)
+                if format_url and isinstance(format_url, list):
+                    format_url = format_url[0]
+                extract_format(format_id, format_url)
  
-        self._sort_formats(formats)
+        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
  
          info = self._search_json_ld(webpage, video_id, default={})
  
diff --git a/youtube_dl/extractor/srmediathek.py b/youtube_dl/extractor/srmediathek.py
index 28baf901c9f021c15544f099f78dd5d5a6b9165c..359dadaa3cce4540f5abb2fb58a43596379e0c56 100644
--- a/youtube_dl/extractor/srmediathek.py
+++ b/youtube_dl/extractor/srmediathek.py
@@ -1,14 +1,14 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-from .ard import ARDMediathekIE
+from .ard import ARDMediathekBaseIE
  from ..utils import (
      ExtractorError,
      get_element_by_attribute,
  )
  
  
-class SRMediathekIE(ARDMediathekIE):
+class SRMediathekIE(ARDMediathekBaseIE):
      IE_NAME = 'sr:mediathek'
      IE_DESC = 'Saarländischer Rundfunk'
      _VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
diff --git a/youtube_dl/extractor/streamango.py b/youtube_dl/extractor/streamango.py
deleted file mode 100644
index f1e17dd..0000000
--- a/youtube_dl/extractor/streamango.py
+++ /dev/null
@@ -1,128 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_chr
-from ..utils import (
-    determine_ext,
-    ExtractorError,
-    int_or_none,
-    js_to_json,
-)
-
-
-class StreamangoIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:streamango\.com|fruithosts\.net|streamcherry\.com)/(?:f|embed)/(?P<id>[^/?#&]+)'
-    _TESTS = [{
-        'url': 'https://streamango.com/f/clapasobsptpkdfe/20170315_150006_mp4',
-        'md5': 'e992787515a182f55e38fc97588d802a',
-        'info_dict': {
-            'id': 'clapasobsptpkdfe',
-            'ext': 'mp4',
-            'title': '20170315_150006.mp4',
-        }
-    }, {
-        # no og:title
-        'url': 'https://streamango.com/embed/foqebrpftarclpob/asdf_asd_2_mp4',
-        'info_dict': {
-            'id': 'foqebrpftarclpob',
-            'ext': 'mp4',
-            'title': 'foqebrpftarclpob',
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'skip': 'gone',
-    }, {
-        'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4',
-        'only_matching': True,
-    }, {
-        'url': 'https://fruithosts.net/f/mreodparcdcmspsm/w1f1_r4lph_2018_brrs_720p_latino_mp4',
-        'only_matching': True,
-    }, {
-        'url': 'https://streamcherry.com/f/clapasobsptpkdfe/',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        def decrypt_src(encoded, val):
-            ALPHABET = '=/+9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA'
-            encoded = re.sub(r'[^A-Za-z0-9+/=]', '', encoded)
-            decoded = ''
-            sm = [None] * 4
-            i = 0
-            str_len = len(encoded)
-            while i < str_len:
-                for j in range(4):
-                    sm[j % 4] = ALPHABET.index(encoded[i])
-                    i += 1
-                char_code = ((sm[0] << 0x2) | (sm[1] >> 0x4)) ^ val
-                decoded += compat_chr(char_code)
-                if sm[2] != 0x40:
-                    char_code = ((sm[1] & 0xf) << 0x4) | (sm[2] >> 0x2)
-                    decoded += compat_chr(char_code)
-                if sm[3] != 0x40:
-                    char_code = ((sm[2] & 0x3) << 0x6) | sm[3]
-                    decoded += compat_chr(char_code)
-            return decoded
-
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._og_search_title(webpage, default=video_id)
-
-        formats = []
-        for format_ in re.findall(r'({[^}]*\bsrc\s*:\s*[^}]*})', webpage):
-            mobj = re.search(r'(src\s*:\s*[^(]+\(([^)]*)\)[\s,]*)', format_)
-            if mobj is None:
-                continue
-
-            format_ = format_.replace(mobj.group(0), '')
-
-            video = self._parse_json(
-                format_, video_id, transform_source=js_to_json,
-                fatal=False) or {}
-
-            mobj = re.search(
-                r'([\'"])(?P<src>(?:(?!\1).)+)\1\s*,\s*(?P<val>\d+)',
-                mobj.group(1))
-            if mobj is None:
-                continue
-
-            src = decrypt_src(mobj.group('src'), int_or_none(mobj.group('val')))
-            if not src:
-                continue
-
-            ext = determine_ext(src, default_ext=None)
-            if video.get('type') == 'application/dash+xml' or ext == 'mpd':
-                formats.extend(self._extract_mpd_formats(
-                    src, video_id, mpd_id='dash', fatal=False))
-            else:
-                formats.append({
-                    'url': src,
-                    'ext': ext or 'mp4',
-                    'width': int_or_none(video.get('width')),
-                    'height': int_or_none(video.get('height')),
-                    'tbr': int_or_none(video.get('bitrate')),
-                })
-
-        if not formats:
-            error = self._search_regex(
-                r'<p[^>]+\bclass=["\']lead[^>]+>(.+?)</p>', webpage,
-                'error', default=None)
-            if not error and '>Sorry' in webpage:
-                error = 'Video %s is not available' % video_id
-            if error:
-                raise ExtractorError(error, expected=True)
-
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'url': url,
-            'title': title,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/stretchinternet.py b/youtube_dl/extractor/stretchinternet.py
index ae2ac1b42fe0021c8b904721221d441368bf50ca..4dbead2ba428abdbfc678aea31403b998a1eb06c 100644
--- a/youtube_dl/extractor/stretchinternet.py
+++ b/youtube_dl/extractor/stretchinternet.py
@@ -5,44 +5,28 @@ from ..utils import int_or_none
  
  
  class StretchInternetIE(InfoExtractor):
-    _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
+    _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/(?:portal|full)\.htm\?.*?\beventId=(?P<id>\d+)'
      _TEST = {
-        'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
+        'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=573272&streamType=video',
          'info_dict': {
-            'id': '313900',
+            'id': '573272',
              'ext': 'mp4',
-            'title': 'Augustana (S.D.) Baseball vs University of Mary',
-            'description': 'md5:7578478614aae3bdd4a90f578f787438',
-            'timestamp': 1490468400,
-            'upload_date': '20170325',
+            'title': 'University of Mary Wrestling vs. Upper Iowa',
+            'timestamp': 1575668361,
+            'upload_date': '20191206',
          }
      }
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        stream = self._download_json(
-            'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
-            % video_id, video_id)
-
-        video_url = 'https://%s' % stream['source']
-
          event = self._download_json(
-            'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
-            video_id, query={
-                'clientID': 99997,
-                'eventID': video_id,
-                'token': 'asdf',
-            })['event']
-
-        title = event.get('title') or event['mobileTitle']
-        description = event.get('customText')
-        timestamp = int_or_none(event.get('longtime'))
+            'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
+            video_id)[0]
  
          return {
              'id': video_id,
-            'title': title,
-            'description': description,
-            'timestamp': timestamp,
-            'url': video_url,
+            'title': event['title'],
+            'timestamp': int_or_none(event.get('dateCreated'), 1000),
+            'url': 'https://' + event['media'][0]['url'],
          }
diff --git a/youtube_dl/extractor/stv.py b/youtube_dl/extractor/stv.py
index ccb074cd42abd1a3cfc928cdb62ed723ecc53d00..bae8b71f4ba8247696f81cec2315a4beb99cd891 100644
--- a/youtube_dl/extractor/stv.py
+++ b/youtube_dl/extractor/stv.py
@@ -4,15 +4,10 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_parse_qs,
-    compat_urllib_parse_urlparse
-)
  from ..utils import (
-    extract_attributes,
+    compat_str,
      float_or_none,
      int_or_none,
-    str_or_none,
  )
  
  
@@ -20,20 +15,20 @@ class STVPlayerIE(InfoExtractor):
      IE_NAME = 'stv:player'
      _VALID_URL = r'https?://player\.stv\.tv/(?P<type>episode|video)/(?P<id>[a-z0-9]{4})'
      _TEST = {
-        'url': 'https://player.stv.tv/video/7srz/victoria/interview-with-the-cast-ahead-of-new-victoria/',
-        'md5': '2ad867d4afd641fa14187596e0fbc91b',
+        'url': 'https://player.stv.tv/video/4gwd/emmerdale/60-seconds-on-set-with-laura-norton/',
+        'md5': '5adf9439c31d554f8be0707c7abe7e0a',
          'info_dict': {
-            'id': '6016487034001',
+            'id': '5333973339001',
              'ext': 'mp4',
-            'upload_date': '20190321',
-            'title': 'Interview with the cast ahead of new Victoria',
-            'description': 'Nell Hudson and Lily Travers tell us what to expect in the new season of Victoria.',
-            'timestamp': 1553179628,
+            'upload_date': '20170301',
+            'title': '60 seconds on set with Laura Norton',
+            'description': "How many questions can Laura - a.k.a Kerry Wyatt - answer in 60 seconds? Let\'s find out!",
+            'timestamp': 1488388054,
              'uploader_id': '1486976045',
          },
          'skip': 'this resource is unavailable outside of the UK',
      }
-    _PUBLISHER_ID = '1486976045'
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1486976045/default_default/index.html?videoId=%s'
      _PTYPE_MAP = {
          'episode': 'episodes',
          'video': 'shortform',
@@ -41,54 +36,32 @@ class STVPlayerIE(InfoExtractor):
  
      def _real_extract(self, url):
          ptype, video_id = re.match(self._VALID_URL, url).groups()
-        webpage = self._download_webpage(url, video_id)
+        resp = self._download_json(
+            'https://player.api.stv.tv/v1/%s/%s' % (self._PTYPE_MAP[ptype], video_id),
+            video_id)
  
-        qs = compat_parse_qs(compat_urllib_parse_urlparse(self._search_regex(
-            r'itemprop="embedURL"[^>]+href="([^"]+)',
-            webpage, 'embed URL', default=None)).query)
-        publisher_id = qs.get('publisherID', [None])[0] or self._PUBLISHER_ID
+        result = resp['results']
+        video = result['video']
+        video_id = compat_str(video['id'])
  
-        player_attr = extract_attributes(self._search_regex(
-            r'(<[^>]+class="bcplayer"[^>]+>)', webpage, 'player', default=None)) or {}
+        subtitles = {}
+        _subtitles = result.get('_subtitles') or {}
+        for ext, sub_url in _subtitles.items():
+            subtitles.setdefault('en', []).append({
+                'ext': 'vtt' if ext == 'webvtt' else ext,
+                'url': sub_url,
+            })
  
-        info = {}
-        duration = ref_id = series = video_id = None
-        api_ref_id = player_attr.get('data-player-api-refid')
-        if api_ref_id:
-            resp = self._download_json(
-                'https://player.api.stv.tv/v1/%s/%s' % (self._PTYPE_MAP[ptype], api_ref_id),
-                api_ref_id, fatal=False)
-            if resp:
-                result = resp.get('results') or {}
-                video = result.get('video') or {}
-                video_id = str_or_none(video.get('id'))
-                ref_id = video.get('guid')
-                duration = video.get('length')
-                programme = result.get('programme') or {}
-                series = programme.get('name') or programme.get('shortName')
-                subtitles = {}
-                _subtitles = result.get('_subtitles') or {}
-                for ext, sub_url in _subtitles.items():
-                    subtitles.setdefault('en', []).append({
-                        'ext': 'vtt' if ext == 'webvtt' else ext,
-                        'url': sub_url,
-                    })
-                info.update({
-                    'description': result.get('summary'),
-                    'subtitles': subtitles,
-                    'view_count': int_or_none(result.get('views')),
-                })
-        if not video_id:
-            video_id = qs.get('videoId', [None])[0] or self._search_regex(
-                r'<link\s+itemprop="url"\s+href="(\d+)"',
-                webpage, 'video id', default=None) or 'ref:' + (ref_id or player_attr['data-refid'])
+        programme = result.get('programme') or {}
  
-        info.update({
+        return {
              '_type': 'url_transparent',
-            'duration': float_or_none(duration or player_attr.get('data-duration'), 1000),
              'id': video_id,
+            'url': self.BRIGHTCOVE_URL_TEMPLATE % video_id,
+            'description': result.get('summary'),
+            'duration': float_or_none(video.get('length'), 1000),
+            'subtitles': subtitles,
+            'view_count': int_or_none(result.get('views')),
+            'series': programme.get('name') or programme.get('shortName'),
              'ie_key': 'BrightcoveNew',
-            'series': series or player_attr.get('data-programme-name'),
-            'url': 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id),
-        })
-        return info
+        }
diff --git a/youtube_dl/extractor/teachable.py b/youtube_dl/extractor/teachable.py
index c1a9deafee4dc2bdb16393b64374367cd47f3e8c..6b7f13b437e3f3671b355b7dabcea9fd5056969e 100644
--- a/youtube_dl/extractor/teachable.py
+++ b/youtube_dl/extractor/teachable.py
@@ -48,6 +48,16 @@ class TeachableBaseIE(InfoExtractor):
              'https://%s/sign_in' % site, None,
              'Downloading %s login page' % site)
  
+        def is_logged(webpage):
+            return any(re.search(p, webpage) for p in (
+                r'class=["\']user-signout',
+                r'<a[^>]+\bhref=["\']/sign_out',
+                r'Log\s+[Oo]ut\s*<'))
+
+        if is_logged(login_page):
+            self._logged_in = True
+            return
+
          login_url = compat_str(urlh.geturl())
  
          login_form = self._hidden_inputs(login_page)
@@ -78,10 +88,7 @@ class TeachableBaseIE(InfoExtractor):
                  'Go to https://%s/ and accept.' % (site, site), expected=True)
  
          # Successful login
-        if any(re.search(p, response) for p in (
-                r'class=["\']user-signout',
-                r'<a[^>]+\bhref=["\']/sign_out',
-                r'>\s*Log out\s*<')):
+        if is_logged(response):
              self._logged_in = True
              return
  
@@ -158,8 +165,12 @@ class TeachableIE(TeachableBaseIE):
              if any(re.search(p, webpage) for p in (
                      r'class=["\']lecture-contents-locked',
                      r'>\s*Lecture contents locked',
-                    r'id=["\']lecture-locked')):
+                    r'id=["\']lecture-locked',
+                    # https://academy.tailoredtutors.co.uk/courses/108779/lectures/1955313
+                    r'class=["\'](?:inner-)?lesson-locked',
+                    r'>LESSON LOCKED<')):
                  self.raise_login_required('Lecture contents locked')
+            raise ExtractorError('Unable to find video URL')
  
          title = self._og_search_title(webpage, default=None)
  
diff --git a/youtube_dl/extractor/teachingchannel.py b/youtube_dl/extractor/teachingchannel.py
index e89759714e6e3cea3da8a7007df838618f6f1cc1..624cdb3adcaf8644fe2c816970fbe990656968ef 100644
--- a/youtube_dl/extractor/teachingchannel.py
+++ b/youtube_dl/extractor/teachingchannel.py
@@ -1,35 +1,33 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from .ooyala import OoyalaIE
  
  
  class TeachingChannelIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?teachingchannel\.org/videos/(?P<title>.+)'
+    _VALID_URL = r'https?://(?:www\.)?teachingchannel\.org/videos?/(?P<id>[^/?&#]+)'
  
      _TEST = {
          'url': 'https://www.teachingchannel.org/videos/teacher-teaming-evolution',
-        'md5': '3d6361864d7cac20b57c8784da17166f',
          'info_dict': {
-            'id': 'F3bnlzbToeI6pLEfRyrlfooIILUjz4nM',
+            'id': '3swwlzkT',
              'ext': 'mp4',
              'title': 'A History of Teaming',
              'description': 'md5:2a9033db8da81f2edffa4c99888140b3',
-            'duration': 422.255,
+            'duration': 422,
+            'upload_date': '20170316',
+            'timestamp': 1489691297,
          },
          'params': {
              'skip_download': True,
          },
-        'add_ie': ['Ooyala'],
+        'add_ie': ['JWPlatform'],
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        title = mobj.group('title')
-        webpage = self._download_webpage(url, title)
-        ooyala_code = self._search_regex(
-            r'data-embed-code=\'(.+?)\'', webpage, 'ooyala code')
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        mid = self._search_regex(
+            r'(?:data-mid=["\']|id=["\']jw-video-player-)([a-zA-Z0-9]{8})',
+            webpage, 'media id')
  
-        return OoyalaIE._build_url_result(ooyala_code)
+        return self.url_result('jwplatform:' + mid, 'JWPlatform', mid)
diff --git a/youtube_dl/extractor/teamcoco.py b/youtube_dl/extractor/teamcoco.py
index 7640cf00a1d87bc2582e355899fca3e017babc69..5793b711f6dc1103f7d33b41abb766eabfab1aea 100644
--- a/youtube_dl/extractor/teamcoco.py
+++ b/youtube_dl/extractor/teamcoco.py
@@ -84,6 +84,19 @@ class TeamcocoIE(TurnerBaseIE):
              'only_matching': True,
          }
      ]
+    _RECORD_TEMPL = '''id
+        title
+        teaser
+        publishOn
+        thumb {
+          preview
+        }
+        tags {
+          name
+        }
+        duration
+        turnerMediaId
+        turnerMediaAuthToken'''
  
      def _graphql_call(self, query_template, object_type, object_id):
          find_object = 'find' + object_type
@@ -98,36 +111,36 @@ class TeamcocoIE(TurnerBaseIE):
          display_id = self._match_id(url)
  
          response = self._graphql_call('''{
-  %s(slug: "%s") {
+  %%s(slug: "%%s") {
      ... on RecordSlug {
        record {
+        %s
+      }
+    }
+    ... on PageSlug {
+      child {
          id
-        title
-        teaser
-        publishOn
-        thumb {
-          preview
-        }
-        file {
-          url
-        }
-        tags {
-          name
-        }
-        duration
-        turnerMediaId
-        turnerMediaAuthToken
        }
      }
      ... on NotFoundSlug {
        status
      }
    }
-}''', 'Slug', display_id)
+}''' % self._RECORD_TEMPL, 'Slug', display_id)
          if response.get('status'):
              raise ExtractorError('This video is no longer available.', expected=True)
  
-        record = response['record']
+        child = response.get('child')
+        if child:
+            record = self._graphql_call('''{
+  %%s(id: "%%s") {
+    ... on Video {
+      %s
+    }
+  }
+}''' % self._RECORD_TEMPL, 'Record', child['id'])
+        else:
+            record = response['record']
          video_id = record['id']
  
          info = {
@@ -150,25 +163,21 @@ class TeamcocoIE(TurnerBaseIE):
                  'accessTokenType': 'jws',
              }))
          else:
-            d = self._download_json(
+            video_sources = self._download_json(
                  'https://teamcoco.com/_truman/d/' + video_id,
-                video_id, fatal=False) or {}
-            video_sources = d.get('meta') or {}
-            if not video_sources:
-                video_sources = self._graphql_call('''{
-  %s(id: "%s") {
-    src
-  }
-}''', 'RecordVideoSource', video_id) or {}
+                video_id)['meta']['src']
+            if isinstance(video_sources, dict):
+                video_sources = video_sources.values()
  
              formats = []
              get_quality = qualities(['low', 'sd', 'hd', 'uhd'])
-            for format_id, src in video_sources.get('src', {}).items():
+            for src in video_sources:
                  if not isinstance(src, dict):
                      continue
                  src_url = src.get('src')
                  if not src_url:
                      continue
+                format_id = src.get('label')
                  ext = determine_ext(src_url, mimetype2ext(src.get('type')))
                  if format_id == 'hls' or ext == 'm3u8':
                      # compat_urllib_parse.urljoin does not work here
@@ -190,9 +199,6 @@ class TeamcocoIE(TurnerBaseIE):
                          'format_id': format_id,
                          'quality': get_quality(format_id),
                      })
-            if not formats:
-                formats = self._extract_m3u8_formats(
-                    record['file']['url'], video_id, 'mp4', fatal=False)
              self._sort_formats(formats)
              info['formats'] = formats
  
diff --git a/youtube_dl/extractor/telegraaf.py b/youtube_dl/extractor/telegraaf.py
index 0f576c1aba1a01491f657d8d892291cfe7934f21..2dc0205373a87381663fd655cf5512e8cfcdeaed 100644
--- a/youtube_dl/extractor/telegraaf.py
+++ b/youtube_dl/extractor/telegraaf.py
@@ -4,21 +4,25 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from ..utils import (
      determine_ext,
-    remove_end,
+    int_or_none,
+    parse_iso8601,
+    try_get,
  )
  
  
  class TelegraafIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?telegraaf\.nl/tv/(?:[^/]+/)+(?P<id>\d+)/[^/]+\.html'
+    _VALID_URL = r'https?://(?:www\.)?telegraaf\.nl/video/(?P<id>\d+)'
      _TEST = {
-        'url': 'http://www.telegraaf.nl/tv/nieuws/binnenland/24353229/__Tikibad_ontruimd_wegens_brand__.html',
+        'url': 'https://www.telegraaf.nl/video/734366489/historisch-scheepswrak-slaat-na-100-jaar-los',
          'info_dict': {
-            'id': '24353229',
+            'id': 'gaMItuoSeUg2',
              'ext': 'mp4',
-            'title': 'Tikibad ontruimd wegens brand',
-            'description': 'md5:05ca046ff47b931f9b04855015e163a4',
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 33,
+            'title': 'Historisch scheepswrak slaat na 100 jaar los',
+            'description': 'md5:6f53b7c4f55596722ac24d6c0ec00cfb',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'duration': 55,
+            'timestamp': 1572805527,
+            'upload_date': '20191103',
          },
          'params': {
              # m3u8 download
@@ -27,23 +31,30 @@ class TelegraafIE(InfoExtractor):
      }
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        article_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, video_id)
+        video_id = self._download_json(
+            'https://www.telegraaf.nl/graphql', article_id, query={
+                'query': '''{
+  article(uid: %s) {
+    videos {
+      videoId
+    }
+  }
+}''' % article_id,
+            })['data']['article']['videos'][0]['videoId']
  
-        player_url = self._html_search_regex(
-            r'<iframe[^>]+src="([^"]+")', webpage, 'player URL')
-        player_page = self._download_webpage(
-            player_url, video_id, note='Download player webpage')
-        playlist_url = self._search_regex(
-            r'playlist\s*:\s*"([^"]+)"', player_page, 'playlist URL')
-        playlist_data = self._download_json(playlist_url, video_id)
+        item = self._download_json(
+            'https://content.tmgvideo.nl/playlist/item=%s/playlist.json' % video_id,
+            video_id)['items'][0]
+        title = item['title']
  
-        item = playlist_data['items'][0]
          formats = []
-        locations = item['locations']
+        locations = item.get('locations') or {}
          for location in locations.get('adaptive', []):
-            manifest_url = location['src']
+            manifest_url = location.get('src')
+            if not manifest_url:
+                continue
              ext = determine_ext(manifest_url)
              if ext == 'm3u8':
                  formats.extend(self._extract_m3u8_formats(
@@ -54,25 +65,25 @@ class TelegraafIE(InfoExtractor):
              else:
                  self.report_warning('Unknown adaptive format %s' % ext)
          for location in locations.get('progressive', []):
+            src = try_get(location, lambda x: x['sources'][0]['src'])
+            if not src:
+                continue
+            label = location.get('label')
              formats.append({
-                'url': location['sources'][0]['src'],
-                'width': location.get('width'),
-                'height': location.get('height'),
-                'format_id': 'http-%s' % location['label'],
+                'url': src,
+                'width': int_or_none(location.get('width')),
+                'height': int_or_none(location.get('height')),
+                'format_id': 'http' + ('-%s' % label if label else ''),
              })
  
          self._sort_formats(formats)
  
-        title = remove_end(self._og_search_title(webpage), ' - VIDEO')
-        description = self._og_search_description(webpage)
-        duration = item.get('duration')
-        thumbnail = item.get('poster')
-
          return {
              'id': video_id,
              'title': title,
-            'description': description,
+            'description': item.get('description'),
              'formats': formats,
-            'duration': duration,
-            'thumbnail': thumbnail,
+            'duration': int_or_none(item.get('duration')),
+            'thumbnail': item.get('poster'),
+            'timestamp': parse_iso8601(item.get('datecreated'), ' '),
          }
diff --git a/youtube_dl/extractor/telequebec.py b/youtube_dl/extractor/telequebec.py
index 6965c127b3351122e894608c8a41ac3e79aff58d..ae9f66787439462967baa63dc58f39870fb89382 100644
--- a/youtube_dl/extractor/telequebec.py
+++ b/youtube_dl/extractor/telequebec.py
@@ -7,6 +7,7 @@ from ..utils import (
      int_or_none,
      smuggle_url,
      try_get,
+    unified_timestamp,
  )
  
  
@@ -22,7 +23,13 @@ class TeleQuebecBaseIE(InfoExtractor):
  
  
  class TeleQuebecIE(TeleQuebecBaseIE):
-    _VALID_URL = r'https?://zonevideo\.telequebec\.tv/media/(?P<id>\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            zonevideo\.telequebec\.tv/media|
+                            coucou\.telequebec\.tv/videos
+                        )/(?P<id>\d+)
+                    '''
      _TESTS = [{
          # available till 01.01.2023
          'url': 'http://zonevideo.telequebec.tv/media/37578/un-petit-choc-et-puis-repart/un-chef-a-la-cabane',
@@ -41,6 +48,9 @@ class TeleQuebecIE(TeleQuebecBaseIE):
          # no description
          'url': 'http://zonevideo.telequebec.tv/media/30261',
          'only_matching': True,
+    }, {
+        'url': 'https://coucou.telequebec.tv/videos/41788/idee-de-genie/l-heure-du-bain',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -61,6 +71,52 @@ class TeleQuebecIE(TeleQuebecBaseIE):
          return info
  
  
+class TeleQuebecSquatIE(InfoExtractor):
+    _VALID_URL = r'https://squat\.telequebec\.tv/videos/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://squat.telequebec.tv/videos/9314',
+        'info_dict': {
+            'id': 'd59ae78112d542e793d83cc9d3a5b530',
+            'ext': 'mp4',
+            'title': 'Poupeflekta',
+            'description': 'md5:2f0718f8d2f8fece1646ee25fb7bce75',
+            'duration': 1351,
+            'timestamp': 1569057600,
+            'upload_date': '20190921',
+            'series': 'Miraculous : Les Aventures de Ladybug et Chat Noir',
+            'season': 'Saison 3',
+            'season_number': 3,
+            'episode_number': 57,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'https://squat.api.telequebec.tv/v1/videos/%s' % video_id,
+            video_id)
+
+        media_id = video['sourceId']
+
+        return {
+            '_type': 'url_transparent',
+            'url': 'http://zonevideo.telequebec.tv/media/%s' % media_id,
+            'ie_key': TeleQuebecIE.ie_key(),
+            'id': media_id,
+            'title': video.get('titre'),
+            'description': video.get('description'),
+            'timestamp': unified_timestamp(video.get('datePublication')),
+            'series': video.get('container'),
+            'season': video.get('saison'),
+            'season_number': int_or_none(video.get('noSaison')),
+            'episode_number': int_or_none(video.get('episode')),
+        }
+
+
  class TeleQuebecEmissionIE(TeleQuebecBaseIE):
      _VALID_URL = r'''(?x)
                      https?://
diff --git a/youtube_dl/extractor/tenplay.py b/youtube_dl/extractor/tenplay.py
new file mode 100644
index 0000000..dff44a4
--- /dev/null
+++ b/youtube_dl/extractor/tenplay.py
@@ -0,0 +1,55 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    parse_age_limit,
+    parse_iso8601,
+    smuggle_url,
+)
+
+
+class TenPlayIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/[^/]+/episodes/[^/]+/[^/]+/(?P<id>tpv\d{6}[a-z]{5})'
+    _TEST = {
+        'url': 'https://10play.com.au/masterchef/episodes/season-1/masterchef-s1-ep-1/tpv190718kwzga',
+        'info_dict': {
+            'id': '6060533435001',
+            'ext': 'mp4',
+            'title': 'MasterChef - S1 Ep. 1',
+            'description': 'md5:4fe7b78e28af8f2d900cd20d900ef95c',
+            'age_limit': 10,
+            'timestamp': 1240828200,
+            'upload_date': '20090427',
+            'uploader_id': '2199827728001',
+        },
+        'params': {
+            'format': 'bestvideo',
+            'skip_download': True,
+        }
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        content_id = self._match_id(url)
+        data = self._download_json(
+            'https://10play.com.au/api/video/' + content_id, content_id)
+        video = data.get('video') or {}
+        metadata = data.get('metaData') or {}
+        brightcove_id = video.get('videoId') or metadata['showContentVideoId']
+        brightcove_url = smuggle_url(
+            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
+            {'geo_countries': ['AU']})
+
+        return {
+            '_type': 'url_transparent',
+            'url': brightcove_url,
+            'id': content_id,
+            'title': video.get('title') or metadata.get('pageContentName') or metadata.get('showContentName'),
+            'description': video.get('description'),
+            'age_limit': parse_age_limit(video.get('showRatingClassification') or metadata.get('showProgramClassification')),
+            'series': metadata.get('showName'),
+            'season': metadata.get('showContentSeason'),
+            'timestamp': parse_iso8601(metadata.get('contentPublishDate') or metadata.get('pageContentPublishDate')),
+            'ie_key': 'BrightcoveNew',
+        }
diff --git a/youtube_dl/extractor/thesun.py b/youtube_dl/extractor/thesun.py
index 22d003776c890fa71918047eb8c1984392ffb43d..15d4a693271f064b183106a77b824e29b878ebcc 100644
--- a/youtube_dl/extractor/thesun.py
+++ b/youtube_dl/extractor/thesun.py
@@ -3,7 +3,7 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from .ooyala import OoyalaIE
+from ..utils import extract_attributes
  
  
  class TheSunIE(InfoExtractor):
@@ -16,6 +16,7 @@ class TheSunIE(InfoExtractor):
          },
          'playlist_count': 2,
      }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
  
      def _real_extract(self, url):
          article_id = self._match_id(url)
@@ -23,10 +24,15 @@ class TheSunIE(InfoExtractor):
          webpage = self._download_webpage(url, article_id)
  
          entries = []
-        for ooyala_id in re.findall(
-                r'<[^>]+\b(?:id\s*=\s*"thesun-ooyala-player-|data-content-id\s*=\s*")([^"]+)',
+        for video in re.findall(
+                r'<video[^>]+data-video-id-pending=[^>]+>',
                  webpage):
-            entries.append(OoyalaIE._build_url_result(ooyala_id))
+            attrs = extract_attributes(video)
+            video_id = attrs['data-video-id-pending']
+            account_id = attrs.get('data-account', '5067014667001')
+            entries.append(self.url_result(
+                self.BRIGHTCOVE_URL_TEMPLATE % (account_id, video_id),
+                'BrightcoveNew', video_id))
  
          return self.playlist_result(
              entries, article_id, self._og_search_title(webpage, fatal=False))
diff --git a/youtube_dl/extractor/trunews.py b/youtube_dl/extractor/trunews.py
index b0c7caabf336499dea40adbb62a321e43a6a03f4..cca5b5cebd7c7dedd081156fcb670be3b8bd6027 100644
--- a/youtube_dl/extractor/trunews.py
+++ b/youtube_dl/extractor/trunews.py
@@ -1,21 +1,12 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import (
-    dict_get,
-    float_or_none,
-    int_or_none,
-    unified_timestamp,
-    update_url_query,
-    url_or_none,
-)
  
  
  class TruNewsIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?trunews\.com/stream/(?P<id>[^/?#&]+)'
      _TEST = {
          'url': 'https://www.trunews.com/stream/will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
-        'md5': 'a19c024c3906ff954fac9b96ce66bb08',
          'info_dict': {
              'id': '5c5a21e65d3c196e1c0020cc',
              'display_id': 'will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
@@ -28,48 +19,16 @@ class TruNewsIE(InfoExtractor):
          },
          'add_ie': ['Zype'],
      }
+    _ZYPE_TEMPL = 'https://player.zype.com/embed/%s.js?api_key=X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
  
-        video = self._download_json(
+        zype_id = self._download_json(
              'https://api.zype.com/videos', display_id, query={
                  'app_key': 'PUVKp9WgGUb3-JUw6EqafLx8tFVP6VKZTWbUOR-HOm__g4fNDt1bCsm_LgYf_k9H',
                  'per_page': 1,
                  'active': 'true',
                  'friendly_title': display_id,
-            })['response'][0]
-
-        zype_id = video['_id']
-
-        thumbnails = []
-        thumbnails_list = video.get('thumbnails')
-        if isinstance(thumbnails_list, list):
-            for thumbnail in thumbnails_list:
-                if not isinstance(thumbnail, dict):
-                    continue
-                thumbnail_url = url_or_none(thumbnail.get('url'))
-                if not thumbnail_url:
-                    continue
-                thumbnails.append({
-                    'url': thumbnail_url,
-                    'width': int_or_none(thumbnail.get('width')),
-                    'height': int_or_none(thumbnail.get('height')),
-                })
-
-        return {
-            '_type': 'url_transparent',
-            'url': update_url_query(
-                'https://player.zype.com/embed/%s.js' % zype_id,
-                {'api_key': 'X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'}),
-            'ie_key': 'Zype',
-            'id': zype_id,
-            'display_id': display_id,
-            'title': video.get('title'),
-            'description': dict_get(video, ('description', 'ott_description', 'short_description')),
-            'duration': int_or_none(video.get('duration')),
-            'timestamp': unified_timestamp(video.get('published_at')),
-            'average_rating': float_or_none(video.get('rating')),
-            'view_count': int_or_none(video.get('request_count')),
-            'thumbnails': thumbnails,
-        }
+            })['response'][0]['_id']
+        return self.url_result(self._ZYPE_TEMPL % zype_id, 'Zype', zype_id)
diff --git a/youtube_dl/extractor/tutv.py b/youtube_dl/extractor/tutv.py
deleted file mode 100644
index 362318b..0000000
--- a/youtube_dl/extractor/tutv.py
+++ /dev/null
@@ -1,36 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_b64decode,
-    compat_parse_qs,
-)
-
-
-class TutvIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tu\.tv/videos/(?P<id>[^/?]+)'
-    _TEST = {
-        'url': 'http://tu.tv/videos/robots-futbolistas',
-        'md5': '0cd9e28ad270488911b0d2a72323395d',
-        'info_dict': {
-            'id': '2973058',
-            'ext': 'mp4',
-            'title': 'Robots futbolistas',
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        internal_id = self._search_regex(r'codVideo=([0-9]+)', webpage, 'internal video ID')
-
-        data_content = self._download_webpage(
-            'http://tu.tv/flvurl.php?codVideo=%s' % internal_id, video_id, 'Downloading video info')
-        video_url = compat_b64decode(compat_parse_qs(data_content)['kpt'][0]).decode('utf-8')
-
-        return {
-            'id': internal_id,
-            'url': video_url,
-            'title': self._og_search_title(webpage),
-        }
diff --git a/youtube_dl/extractor/tv2.py b/youtube_dl/extractor/tv2.py
index d5071e8a5faf72a25c6b21cdfb2987b6a731fb32..4a19b9be63a6369bd9006da2c6f4af951d23b4b2 100644
--- a/youtube_dl/extractor/tv2.py
+++ b/youtube_dl/extractor/tv2.py
@@ -4,13 +4,17 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_HTTPError
  from ..utils import (
      determine_ext,
+    ExtractorError,
      int_or_none,
      float_or_none,
      js_to_json,
      parse_iso8601,
      remove_end,
+    strip_or_none,
+    try_get,
  )
  
  
@@ -20,7 +24,7 @@ class TV2IE(InfoExtractor):
          'url': 'http://www.tv2.no/v/916509/',
          'info_dict': {
              'id': '916509',
-            'ext': 'mp4',
+            'ext': 'flv',
              'title': 'Se Frode Gryttens hyllest av Steven Gerrard',
              'description': 'TV 2 Sportens huspoet tar avskjed med Liverpools kaptein Steven Gerrard.',
              'timestamp': 1431715610,
@@ -29,22 +33,40 @@ class TV2IE(InfoExtractor):
              'view_count': int,
              'categories': list,
          },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
      }
+    _API_DOMAIN = 'sumo.tv2.no'
+    _PROTOCOLS = ('HDS', 'HLS', 'DASH')
+    _GEO_COUNTRIES = ['NO']
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
+        api_base = 'http://%s/api/web/asset/%s' % (self._API_DOMAIN, video_id)
  
          formats = []
          format_urls = []
-        for protocol in ('HDS', 'HLS'):
-            data = self._download_json(
-                'http://sumo.tv2.no/api/web/asset/%s/play.json?protocol=%s&videoFormat=SMIL+ISMUSP' % (video_id, protocol),
-                video_id, 'Downloading play JSON')['playback']
-            for item in data['items']['item']:
+        for protocol in self._PROTOCOLS:
+            try:
+                data = self._download_json(
+                    api_base + '/play.json?protocol=%s&videoFormat=SMIL+ISMUSP' % protocol,
+                    video_id, 'Downloading play JSON')['playback']
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+                    error = self._parse_json(e.cause.read().decode(), video_id)['error']
+                    error_code = error.get('code')
+                    if error_code == 'ASSET_PLAYBACK_INVALID_GEO_LOCATION':
+                        self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
+                    elif error_code == 'SESSION_NOT_AUTHENTICATED':
+                        self.raise_login_required()
+                    raise ExtractorError(error['description'])
+                raise
+            items = try_get(data, lambda x: x['items']['item'])
+            if not items:
+                continue
+            if not isinstance(items, list):
+                items = [items]
+            for item in items:
+                if not isinstance(item, dict):
+                    continue
                  video_url = item.get('url')
                  if not video_url or video_url in format_urls:
                      continue
@@ -57,9 +79,13 @@ class TV2IE(InfoExtractor):
                      formats.extend(self._extract_f4m_formats(
                          video_url, video_id, f4m_id=format_id, fatal=False))
                  elif ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        video_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                        m3u8_id=format_id, fatal=False))
+                    if not data.get('drmProtected'):
+                        formats.extend(self._extract_m3u8_formats(
+                            video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                            m3u8_id=format_id, fatal=False))
+                elif ext == 'mpd':
+                    formats.extend(self._extract_mpd_formats(
+                        video_url, video_id, format_id, fatal=False))
                  elif ext == 'ism' or video_url.endswith('.ism/Manifest'):
                      pass
                  else:
@@ -69,34 +95,30 @@ class TV2IE(InfoExtractor):
                          'tbr': int_or_none(item.get('bitrate')),
                          'filesize': int_or_none(item.get('fileSize')),
                      })
+        if not formats and data.get('drmProtected'):
+            raise ExtractorError('This video is DRM protected.', expected=True)
          self._sort_formats(formats)
  
          asset = self._download_json(
-            'http://sumo.tv2.no/api/web/asset/%s.json' % video_id,
-            video_id, 'Downloading metadata JSON')['asset']
-
+            api_base + '.json', video_id,
+            'Downloading metadata JSON')['asset']
          title = asset['title']
-        description = asset.get('description')
-        timestamp = parse_iso8601(asset.get('createTime'))
-        duration = float_or_none(asset.get('accurateDuration') or asset.get('duration'))
-        view_count = int_or_none(asset.get('views'))
-        categories = asset.get('keywords', '').split(',')
  
          thumbnails = [{
              'id': thumbnail.get('@type'),
              'url': thumbnail.get('url'),
-        } for _, thumbnail in asset.get('imageVersions', {}).items()]
+        } for _, thumbnail in (asset.get('imageVersions') or {}).items()]
  
          return {
              'id': video_id,
              'url': video_url,
              'title': title,
-            'description': description,
+            'description': strip_or_none(asset.get('description')),
              'thumbnails': thumbnails,
-            'timestamp': timestamp,
-            'duration': duration,
-            'view_count': view_count,
-            'categories': categories,
+            'timestamp': parse_iso8601(asset.get('createTime')),
+            'duration': float_or_none(asset.get('accurateDuration') or asset.get('duration')),
+            'view_count': int_or_none(asset.get('views')),
+            'categories': asset.get('keywords', '').split(','),
              'formats': formats,
          }
  
@@ -108,7 +130,7 @@ class TV2ArticleIE(InfoExtractor):
          'info_dict': {
              'id': '6930542',
              'title': 'Russen hetses etter pingvintyveri - innrømmer å ha åpnet luken på buret',
-            'description': 'md5:339573779d3eea3542ffe12006190954',
+            'description': 'De fire siktede nekter fortsatt for å ha stjålet pingvinbabyene, men innrømmer å ha åpnet luken til de små kyllingene.',
          },
          'playlist_count': 2,
      }, {
@@ -126,7 +148,7 @@ class TV2ArticleIE(InfoExtractor):
  
          if not assets:
              # New embed pattern
-            for v in re.findall(r'TV2ContentboxVideo\(({.+?})\)', webpage):
+            for v in re.findall(r'(?s)TV2ContentboxVideo\(({.+?})\)', webpage):
                  video = self._parse_json(
                      v, playlist_id, transform_source=js_to_json, fatal=False)
                  if not video:
@@ -143,3 +165,28 @@ class TV2ArticleIE(InfoExtractor):
          description = remove_end(self._og_search_description(webpage), ' - TV2.no')
  
          return self.playlist_result(entries, playlist_id, title, description)
+
+
+class KatsomoIE(TV2IE):
+    _VALID_URL = r'https?://(?:www\.)?(?:katsomo|mtv)\.fi/(?:#!/)?(?:[^/]+/[0-9a-z-]+-\d+/[0-9a-z-]+-|[^/]+/\d+/[^/]+/)(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://www.mtv.fi/sarja/mtv-uutiset-live-33001002003/lahden-pelicans-teki-kovan-ratkaisun-ville-nieminen-pihalle-1181321',
+        'info_dict': {
+            'id': '1181321',
+            'ext': 'mp4',
+            'title': 'MTV Uutiset Live',
+            'description': 'Päätöksen teki Pelicansin hallitus.',
+            'timestamp': 1575116484,
+            'upload_date': '20191130',
+            'duration': 37.12,
+            'view_count': int,
+            'categories': list,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+    _API_DOMAIN = 'api.katsomo.fi'
+    _PROTOCOLS = ('HLS', 'MPD')
+    _GEO_COUNTRIES = ['FI']
diff --git a/youtube_dl/extractor/tv2dk.py b/youtube_dl/extractor/tv2dk.py
new file mode 100644
index 0000000..611fdc0
--- /dev/null
+++ b/youtube_dl/extractor/tv2dk.py
@@ -0,0 +1,154 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    extract_attributes,
+    js_to_json,
+    url_or_none,
+)
+
+
+class TV2DKIE(InfoExtractor):
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            tvsyd|
+                            tv2ostjylland|
+                            tvmidtvest|
+                            tv2fyn|
+                            tv2east|
+                            tv2lorry|
+                            tv2nord
+                        )\.dk/
+                        (?:[^/]+/)*
+                        (?P<id>[^/?\#&]+)
+                    '''
+    _TESTS = [{
+        'url': 'https://www.tvsyd.dk/nyheder/28-10-2019/1930/1930-28-okt-2019?autoplay=1#player',
+        'info_dict': {
+            'id': '0_52jmwa0p',
+            'ext': 'mp4',
+            'title': '19:30 - 28. okt. 2019',
+            'timestamp': 1572290248,
+            'upload_date': '20191028',
+            'uploader_id': 'tvsyd',
+            'duration': 1347,
+            'view_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['Kaltura'],
+    }, {
+        'url': 'https://www.tv2ostjylland.dk/artikel/minister-gaar-ind-i-sag-om-diabetes-teknologi',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tv2ostjylland.dk/nyheder/28-10-2019/22/2200-nyhederne-mandag-d-28-oktober-2019?autoplay=1#player',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tvmidtvest.dk/nyheder/27-10-2019/1930/1930-27-okt-2019',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tv2fyn.dk/artikel/fyn-kan-faa-landets-foerste-fabrik-til-groent-jetbraendstof',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tv2east.dk/artikel/gods-faar-indleveret-tonsvis-af-aebler-100-kilo-aebler-gaar-til-en-aeblebrandy',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tv2lorry.dk/koebenhavn/rasmus-paludan-evakueret-til-egen-demonstration#player',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tv2nord.dk/artikel/dybt-uacceptabelt',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        entries = []
+        for video_el in re.findall(r'(?s)<[^>]+\bdata-entryid\s*=[^>]*>', webpage):
+            video = extract_attributes(video_el)
+            kaltura_id = video.get('data-entryid')
+            if not kaltura_id:
+                continue
+            partner_id = video.get('data-partnerid')
+            if not partner_id:
+                continue
+            entries.append(self.url_result(
+                'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura',
+                video_id=kaltura_id))
+        return self.playlist_result(entries)
+
+
+class TV2DKBornholmPlayIE(InfoExtractor):
+    _VALID_URL = r'https?://play\.tv2bornholm\.dk/\?.*?\bid=(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://play.tv2bornholm.dk/?area=specifikTV&id=781021',
+        'info_dict': {
+            'id': '781021',
+            'ext': 'mp4',
+            'title': '12Nyheder-27.11.19',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'http://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
+            data=json.dumps({
+                'playlist_id': video_id,
+                'serienavn': '',
+            }).encode(), headers={
+                'X-Requested-With': 'XMLHttpRequest',
+                'Content-Type': 'application/json; charset=UTF-8',
+            })['d']
+
+        # TODO: generalize flowplayer
+        title = self._search_regex(
+            r'title\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', video, 'title',
+            group='value')
+        sources = self._parse_json(self._search_regex(
+            r'(?s)sources:\s*(\[.+?\]),', video, 'sources'),
+            video_id, js_to_json)
+
+        formats = []
+        srcs = set()
+        for source in sources:
+            src = url_or_none(source.get('src'))
+            if not src:
+                continue
+            if src in srcs:
+                continue
+            srcs.add(src)
+            ext = determine_ext(src)
+            src_type = source.get('type')
+            if src_type == 'application/x-mpegurl' or ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    src, video_id, ext='mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif src_type == 'application/dash+xml' or ext == 'mpd':
+                formats.extend(self._extract_mpd_formats(
+                    src, video_id, mpd_id='dash', fatal=False))
+            else:
+                formats.append({
+                    'url': src,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/twentyfourvideo.py b/youtube_dl/extractor/twentyfourvideo.py
index 1d66eeaff6e80cd1c629f79ad98a1a68f1d52564..2830c212ebbf9c6daa819667cc994001f8dc798a 100644
--- a/youtube_dl/extractor/twentyfourvideo.py
+++ b/youtube_dl/extractor/twentyfourvideo.py
@@ -18,7 +18,7 @@ class TwentyFourVideoIE(InfoExtractor):
                      https?://
                          (?P<host>
                              (?:(?:www|porno)\.)?24video\.
-                            (?:net|me|xxx|sexy?|tube|adult|site)
+                            (?:net|me|xxx|sexy?|tube|adult|site|vip)
                          )/
                          (?:
                              video/(?:(?:view|xml)/)?|
@@ -59,6 +59,9 @@ class TwentyFourVideoIE(InfoExtractor):
      }, {
          'url': 'https://porno.24video.net/video/2640421-vsya-takaya-gibkaya-i-v-masle',
          'only_matching': True,
+    }, {
+        'url': 'https://www.24video.vip/video/view/1044982',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/twitch.py b/youtube_dl/extractor/twitch.py
index 0500e33a65c92a9667f84ca20f1c72e832365cad..a8c2502af8132834a34b8ef9c8ade935dd432604 100644
--- a/youtube_dl/extractor/twitch.py
+++ b/youtube_dl/extractor/twitch.py
@@ -17,12 +17,10 @@ from ..compat import (
  from ..utils import (
      clean_html,
      ExtractorError,
-    float_or_none,
      int_or_none,
      orderedSet,
      parse_duration,
      parse_iso8601,
-    qualities,
      try_get,
      unified_timestamp,
      update_url_query,
@@ -248,7 +246,7 @@ class TwitchVodIE(TwitchItemBaseIE):
                      https?://
                          (?:
                              (?:(?:www|go|m)\.)?twitch\.tv/(?:[^/]+/v(?:ideo)?|videos)/|
-                            player\.twitch\.tv/\?.*?\bvideo=v
+                            player\.twitch\.tv/\?.*?\bvideo=v?
                          )
                          (?P<id>\d+)
                      '''
@@ -306,6 +304,9 @@ class TwitchVodIE(TwitchItemBaseIE):
      }, {
          'url': 'https://www.twitch.tv/northernlion/video/291940395',
          'only_matching': True,
+    }, {
+        'url': 'https://player.twitch.tv/?video=480452374',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -324,6 +325,7 @@ class TwitchVodIE(TwitchItemBaseIE):
                      'allow_audio_only': 'true',
                      'allow_spectre': 'true',
                      'player': 'twitchweb',
+                    'playlist_include_framerate': 'true',
                      'nauth': access_token['token'],
                      'nauthsig': access_token['sig'],
                  })),
@@ -341,9 +343,8 @@ class TwitchVodIE(TwitchItemBaseIE):
              info['subtitles'] = {
                  'rechat': [{
                      'url': update_url_query(
-                        'https://rechat.twitch.tv/rechat-messages', {
-                            'video_id': 'v%s' % item_id,
-                            'start': info['timestamp'],
+                        'https://api.twitch.tv/v5/videos/%s/comments' % item_id, {
+                            'client_id': self._CLIENT_ID,
                          }),
                      'ext': 'json',
                  }],
@@ -596,6 +597,7 @@ class TwitchStreamIE(TwitchBaseIE):
              'allow_spectre': 'true',
              'p': random.randint(1000000, 10000000),
              'player': 'twitchweb',
+            'playlist_include_framerate': 'true',
              'segment_preference': '4',
              'sig': access_token['sig'].encode('utf-8'),
              'token': access_token['token'].encode('utf-8'),
@@ -641,7 +643,7 @@ class TwitchStreamIE(TwitchBaseIE):
  
  class TwitchClipsIE(TwitchBaseIE):
      IE_NAME = 'twitch:clips'
-    _VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:[^/]+/)*|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
  
      _TESTS = [{
          'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
@@ -664,68 +666,89 @@ class TwitchClipsIE(TwitchBaseIE):
      }, {
          'url': 'https://www.twitch.tv/sergeynixon/clip/StormyThankfulSproutFutureMan',
          'only_matching': True,
+    }, {
+        'url': 'https://clips.twitch.tv/embed?clip=InquisitiveBreakableYogurtJebaited',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        status = self._download_json(
-            'https://clips.twitch.tv/api/v2/clips/%s/status' % video_id,
-            video_id)
+        clip = self._download_json(
+            'https://gql.twitch.tv/gql', video_id, data=json.dumps({
+                'query': '''{
+  clip(slug: "%s") {
+    broadcaster {
+      displayName
+    }
+    createdAt
+    curator {
+      displayName
+      id
+    }
+    durationSeconds
+    id
+    tiny: thumbnailURL(width: 86, height: 45)
+    small: thumbnailURL(width: 260, height: 147)
+    medium: thumbnailURL(width: 480, height: 272)
+    title
+    videoQualities {
+      frameRate
+      quality
+      sourceURL
+    }
+    viewCount
+  }
+}''' % video_id,
+            }).encode(), headers={
+                'Client-ID': self._CLIENT_ID,
+            })['data']['clip']
+
+        if not clip:
+            raise ExtractorError(
+                'This clip is no longer available', expected=True)
  
          formats = []
-
-        for option in status['quality_options']:
+        for option in clip.get('videoQualities', []):
              if not isinstance(option, dict):
                  continue
-            source = url_or_none(option.get('source'))
+            source = url_or_none(option.get('sourceURL'))
              if not source:
                  continue
              formats.append({
                  'url': source,
                  'format_id': option.get('quality'),
                  'height': int_or_none(option.get('quality')),
-                'fps': int_or_none(option.get('frame_rate')),
+                'fps': int_or_none(option.get('frameRate')),
              })
-
          self._sort_formats(formats)
  
-        info = {
+        thumbnails = []
+        for thumbnail_id in ('tiny', 'small', 'medium'):
+            thumbnail_url = clip.get(thumbnail_id)
+            if not thumbnail_url:
+                continue
+            thumb = {
+                'id': thumbnail_id,
+                'url': thumbnail_url,
+            }
+            mobj = re.search(r'-(\d+)x(\d+)\.', thumbnail_url)
+            if mobj:
+                thumb.update({
+                    'height': int(mobj.group(2)),
+                    'width': int(mobj.group(1)),
+                })
+            thumbnails.append(thumb)
+
+        return {
+            'id': clip.get('id') or video_id,
+            'title': clip.get('title') or video_id,
              'formats': formats,
+            'duration': int_or_none(clip.get('durationSeconds')),
+            'views': int_or_none(clip.get('viewCount')),
+            'timestamp': unified_timestamp(clip.get('createdAt')),
+            'thumbnails': thumbnails,
+            'creator': try_get(clip, lambda x: x['broadcaster']['displayName'], compat_str),
+            'uploader': try_get(clip, lambda x: x['curator']['displayName'], compat_str),
+            'uploader_id': try_get(clip, lambda x: x['curator']['id'], compat_str),
          }
-
-        clip = self._call_api(
-            'kraken/clips/%s' % video_id, video_id, fatal=False, headers={
-                'Accept': 'application/vnd.twitchtv.v5+json',
-            })
-
-        if clip:
-            quality_key = qualities(('tiny', 'small', 'medium'))
-            thumbnails = []
-            thumbnails_dict = clip.get('thumbnails')
-            if isinstance(thumbnails_dict, dict):
-                for thumbnail_id, thumbnail_url in thumbnails_dict.items():
-                    thumbnails.append({
-                        'id': thumbnail_id,
-                        'url': thumbnail_url,
-                        'preference': quality_key(thumbnail_id),
-                    })
-
-            info.update({
-                'id': clip.get('tracking_id') or video_id,
-                'title': clip.get('title') or video_id,
-                'duration': float_or_none(clip.get('duration')),
-                'views': int_or_none(clip.get('views')),
-                'timestamp': unified_timestamp(clip.get('created_at')),
-                'thumbnails': thumbnails,
-                'creator': try_get(clip, lambda x: x['broadcaster']['display_name'], compat_str),
-                'uploader': try_get(clip, lambda x: x['curator']['display_name'], compat_str),
-                'uploader_id': try_get(clip, lambda x: x['curator']['id'], compat_str),
-            })
-        else:
-            info.update({
-                'title': video_id,
-                'id': video_id,
-            })
-
-        return info
diff --git a/youtube_dl/extractor/twitter.py b/youtube_dl/extractor/twitter.py
index cebb6238c561cbfdf77c5ada514d55c216c19a34..01468981c7cb9ebdcbcd0d7cca92df5b58672c7d 100644
--- a/youtube_dl/extractor/twitter.py
+++ b/youtube_dl/extractor/twitter.py
@@ -4,32 +4,67 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_HTTPError,
+    compat_parse_qs,
+    compat_urllib_parse_unquote,
+    compat_urllib_parse_urlparse,
+)
  from ..utils import (
-    determine_ext,
      dict_get,
      ExtractorError,
      float_or_none,
      int_or_none,
-    remove_end,
      try_get,
+    strip_or_none,
+    unified_timestamp,
+    update_url_query,
      xpath_text,
  )
  
-from .periscope import PeriscopeIE
+from .periscope import (
+    PeriscopeBaseIE,
+    PeriscopeIE,
+)
  
  
  class TwitterBaseIE(InfoExtractor):
+    _API_BASE = 'https://api.twitter.com/1.1/'
+    _BASE_REGEX = r'https?://(?:(?:www|m(?:obile)?)\.)?twitter\.com/'
+    _GUEST_TOKEN = None
+
+    def _extract_variant_formats(self, variant, video_id):
+        variant_url = variant.get('url')
+        if not variant_url:
+            return []
+        elif '.m3u8' in variant_url:
+            return self._extract_m3u8_formats(
+                variant_url, video_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False)
+        else:
+            tbr = int_or_none(dict_get(variant, ('bitrate', 'bit_rate')), 1000) or None
+            f = {
+                'url': variant_url,
+                'format_id': 'http' + ('-%d' % tbr if tbr else ''),
+                'tbr': tbr,
+            }
+            self._search_dimensions_in_video_url(f, variant_url)
+            return [f]
+
      def _extract_formats_from_vmap_url(self, vmap_url, video_id):
          vmap_data = self._download_xml(vmap_url, video_id)
-        video_url = xpath_text(vmap_data, './/MediaFile').strip()
-        if determine_ext(video_url) == 'm3u8':
-            return self._extract_m3u8_formats(
-                video_url, video_id, ext='mp4', m3u8_id='hls',
-                entry_protocol='m3u8_native')
-        return [{
-            'url': video_url,
-        }]
+        formats = []
+        urls = []
+        for video_variant in vmap_data.findall('.//{http://twitter.com/schema/videoVMapV2.xsd}videoVariant'):
+            video_variant.attrib['url'] = compat_urllib_parse_unquote(
+                video_variant.attrib['url'])
+            urls.append(video_variant.attrib['url'])
+            formats.extend(self._extract_variant_formats(
+                video_variant.attrib, video_id))
+        video_url = strip_or_none(xpath_text(vmap_data, './/MediaFile'))
+        if video_url not in urls:
+            formats.extend(self._extract_variant_formats({'url': video_url}, video_id))
+        return formats
  
      @staticmethod
      def _search_dimensions_in_video_url(a_format, video_url):
@@ -40,10 +75,30 @@ class TwitterBaseIE(InfoExtractor):
                  'height': int(m.group('height')),
              })
  
-
-class TwitterCardIE(TwitterBaseIE):
+    def _call_api(self, path, video_id, query={}):
+        headers = {
+            'Authorization': 'Bearer AAAAAAAAAAAAAAAAAAAAAPYXBAAAAAAACLXUNDekMxqa8h%2F40K4moUkGsoc%3DTYfbDKbT3jJPCEVnMYqilB28NHfOPqkca3qaAxGfsyKCs0wRbw',
+        }
+        if not self._GUEST_TOKEN:
+            self._GUEST_TOKEN = self._download_json(
+                self._API_BASE + 'guest/activate.json', video_id,
+                'Downloading guest token', data=b'',
+                headers=headers)['guest_token']
+        headers['x-guest-token'] = self._GUEST_TOKEN
+        try:
+            return self._download_json(
+                self._API_BASE + path, video_id, headers=headers, query=query)
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                raise ExtractorError(self._parse_json(
+                    e.cause.read().decode(),
+                    video_id)['errors'][0]['message'], expected=True)
+            raise
+
+
+class TwitterCardIE(InfoExtractor):
      IE_NAME = 'twitter:card'
-    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?P<path>cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)'
+    _VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/(?:cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)'
      _TESTS = [
          {
              'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
@@ -51,19 +106,28 @@ class TwitterCardIE(TwitterBaseIE):
              'info_dict': {
                  'id': '560070183650213889',
                  'ext': 'mp4',
-                'title': 'Twitter web player',
-                'thumbnail': r're:^https?://.*\.jpg$',
+                'title': "Twitter - You can now shoot, edit and share video on Twitter. Capture life's most moving moments from your perspective.",
+                'description': 'md5:18d3e24bb4f6e5007487dd546e53bd96',
+                'uploader': 'Twitter',
+                'uploader_id': 'Twitter',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 30.033,
+                'timestamp': 1422366112,
+                'upload_date': '20150127',
              },
          },
          {
              'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768',
-            'md5': '7ee2a553b63d1bccba97fbed97d9e1c8',
+            'md5': '7137eca597f72b9abbe61e5ae0161399',
              'info_dict': {
                  'id': '623160978427936768',
                  'ext': 'mp4',
-                'title': 'Twitter web player',
-                'thumbnail': r're:^https?://.*$',
+                'title': "NASA - Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASANewHorizons #PlutoFlyby video.",
+                'description': "Fly over Pluto's icy Norgay Mountains and Sputnik Plain in this @NASANewHorizons #PlutoFlyby video. https://t.co/BJYgOjSeGA",
+                'uploader': 'NASA',
+                'uploader_id': 'NASA',
+                'timestamp': 1437408129,
+                'upload_date': '20150720',
              },
          },
          {
@@ -75,7 +139,7 @@ class TwitterCardIE(TwitterBaseIE):
                  'title': 'Ubuntu 11.10 Overview',
                  'description': 'md5:a831e97fa384863d6e26ce48d1c43376',
                  'upload_date': '20111013',
-                'uploader': 'OMG! Ubuntu!',
+                'uploader': 'OMG! UBUNTU!',
                  'uploader_id': 'omgubuntu',
              },
              'add_ie': ['Youtube'],
@@ -99,190 +163,30 @@ class TwitterCardIE(TwitterBaseIE):
              'info_dict': {
                  'id': '705235433198714880',
                  'ext': 'mp4',
-                'title': 'Twitter web player',
-                'thumbnail': r're:^https?://.*',
+                'title': "Brent Yarina - Khalil Iverson's missed highlight dunk. And made highlight dunk. In one highlight.",
+                'description': "Khalil Iverson's missed highlight dunk. And made highlight dunk. In one highlight. https://t.co/OrxcJ28Bns",
+                'uploader': 'Brent Yarina',
+                'uploader_id': 'BTNBrentYarina',
+                'timestamp': 1456976204,
+                'upload_date': '20160303',
              },
+            'skip': 'This content is no longer available.',
          }, {
              'url': 'https://twitter.com/i/videos/752274308186120192',
              'only_matching': True,
          },
      ]
  
-    _API_BASE = 'https://api.twitter.com/1.1'
-
-    def _parse_media_info(self, media_info, video_id):
-        formats = []
-        for media_variant in media_info.get('variants', []):
-            media_url = media_variant['url']
-            if media_url.endswith('.m3u8'):
-                formats.extend(self._extract_m3u8_formats(media_url, video_id, ext='mp4', m3u8_id='hls'))
-            elif media_url.endswith('.mpd'):
-                formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash'))
-            else:
-                tbr = int_or_none(dict_get(media_variant, ('bitRate', 'bitrate')), scale=1000)
-                a_format = {
-                    'url': media_url,
-                    'format_id': 'http-%d' % tbr if tbr else 'http',
-                    'tbr': tbr,
-                }
-                # Reported bitRate may be zero
-                if not a_format['tbr']:
-                    del a_format['tbr']
-
-                self._search_dimensions_in_video_url(a_format, media_url)
-
-                formats.append(a_format)
-        return formats
-
-    def _extract_mobile_formats(self, username, video_id):
-        webpage = self._download_webpage(
-            'https://mobile.twitter.com/%s/status/%s' % (username, video_id),
-            video_id, 'Downloading mobile webpage',
-            headers={
-                # A recent mobile UA is necessary for `gt` cookie
-                'User-Agent': 'Mozilla/5.0 (Android 6.0.1; Mobile; rv:54.0) Gecko/54.0 Firefox/54.0',
-            })
-        main_script_url = self._html_search_regex(
-            r'<script[^>]+src="([^"]+main\.[^"]+)"', webpage, 'main script URL')
-        main_script = self._download_webpage(
-            main_script_url, video_id, 'Downloading main script')
-        bearer_token = self._search_regex(
-            r'BEARER_TOKEN\s*:\s*"([^"]+)"',
-            main_script, 'bearer token')
-        # https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-show-id
-        api_data = self._download_json(
-            '%s/statuses/show/%s.json' % (self._API_BASE, video_id),
-            video_id, 'Downloading API data',
-            headers={
-                'Authorization': 'Bearer ' + bearer_token,
-            })
-        media_info = try_get(api_data, lambda o: o['extended_entities']['media'][0]['video_info']) or {}
-        return self._parse_media_info(media_info, video_id)
-
      def _real_extract(self, url):
-        path, video_id = re.search(self._VALID_URL, url).groups()
-
-        config = None
-        formats = []
-        duration = None
-
-        urls = [url]
-        if path.startswith('cards/'):
-            urls.append('https://twitter.com/i/videos/' + video_id)
-
-        for u in urls:
-            webpage = self._download_webpage(
-                u, video_id, headers={'Referer': 'https://twitter.com/'})
-
-            iframe_url = self._html_search_regex(
-                r'<iframe[^>]+src="((?:https?:)?//(?:www\.youtube\.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',
-                webpage, 'video iframe', default=None)
-            if iframe_url:
-                return self.url_result(iframe_url)
-
-            config = self._parse_json(self._html_search_regex(
-                r'data-(?:player-)?config="([^"]+)"', webpage,
-                'data player config', default='{}'),
-                video_id)
-
-            if config.get('source_type') == 'vine':
-                return self.url_result(config['player_url'], 'Vine')
-
-            periscope_url = PeriscopeIE._extract_url(webpage)
-            if periscope_url:
-                return self.url_result(periscope_url, PeriscopeIE.ie_key())
-
-            video_url = config.get('video_url') or config.get('playlist', [{}])[0].get('source')
-
-            if video_url:
-                if determine_ext(video_url) == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls'))
-                else:
-                    f = {
-                        'url': video_url,
-                    }
-
-                    self._search_dimensions_in_video_url(f, video_url)
-
-                    formats.append(f)
-
-            vmap_url = config.get('vmapUrl') or config.get('vmap_url')
-            if vmap_url:
-                formats.extend(
-                    self._extract_formats_from_vmap_url(vmap_url, video_id))
-
-            media_info = None
-
-            for entity in config.get('status', {}).get('entities', []):
-                if 'mediaInfo' in entity:
-                    media_info = entity['mediaInfo']
-
-            if media_info:
-                formats.extend(self._parse_media_info(media_info, video_id))
-                duration = float_or_none(media_info.get('duration', {}).get('nanos'), scale=1e9)
-
-            username = config.get('user', {}).get('screen_name')
-            if username:
-                formats.extend(self._extract_mobile_formats(username, video_id))
-
-            if formats:
-                title = self._search_regex(r'<title>([^<]+)</title>', webpage, 'title')
-                thumbnail = config.get('posterImageUrl') or config.get('image_src')
-                duration = float_or_none(config.get('duration'), scale=1000) or duration
-                break
-
-        if not formats:
-            headers = {
-                'Authorization': 'Bearer AAAAAAAAAAAAAAAAAAAAAPYXBAAAAAAACLXUNDekMxqa8h%2F40K4moUkGsoc%3DTYfbDKbT3jJPCEVnMYqilB28NHfOPqkca3qaAxGfsyKCs0wRbw',
-                'Referer': url,
-            }
-            ct0 = self._get_cookies(url).get('ct0')
-            if ct0:
-                headers['csrf_token'] = ct0.value
-            guest_token = self._download_json(
-                '%s/guest/activate.json' % self._API_BASE, video_id,
-                'Downloading guest token', data=b'',
-                headers=headers)['guest_token']
-            headers['x-guest-token'] = guest_token
-            self._set_cookie('api.twitter.com', 'gt', guest_token)
-            config = self._download_json(
-                '%s/videos/tweet/config/%s.json' % (self._API_BASE, video_id),
-                video_id, headers=headers)
-            track = config['track']
-            vmap_url = track.get('vmapUrl')
-            if vmap_url:
-                formats = self._extract_formats_from_vmap_url(vmap_url, video_id)
-            else:
-                playback_url = track['playbackUrl']
-                if determine_ext(playback_url) == 'm3u8':
-                    formats = self._extract_m3u8_formats(
-                        playback_url, video_id, 'mp4',
-                        entry_protocol='m3u8_native', m3u8_id='hls')
-                else:
-                    formats = [{
-                        'url': playback_url,
-                    }]
-            title = 'Twitter web player'
-            thumbnail = config.get('posterImage')
-            duration = float_or_none(track.get('durationMs'), scale=1000)
-
-        self._remove_duplicate_formats(formats)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-        }
+        status_id = self._match_id(url)
+        return self.url_result(
+            'https://twitter.com/statuses/' + status_id,
+            TwitterIE.ie_key(), status_id)
  
  
-class TwitterIE(InfoExtractor):
+class TwitterIE(TwitterBaseIE):
      IE_NAME = 'twitter'
-    _VALID_URL = r'https?://(?:www\.|m\.|mobile\.)?twitter\.com/(?:i/web|(?P<user_id>[^/]+))/status/(?P<id>\d+)'
-    _TEMPLATE_URL = 'https://twitter.com/%s/status/%s'
-    _TEMPLATE_STATUSES_URL = 'https://twitter.com/statuses/%s'
+    _VALID_URL = TwitterBaseIE._BASE_REGEX + r'(?:(?:i/web|[^/]+)/status|statuses)/(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'https://twitter.com/freethenipple/status/643211948184596480',
@@ -291,10 +195,13 @@ class TwitterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'FREE THE NIPPLE - FTN supporters on Hollywood Blvd today!',
              'thumbnail': r're:^https?://.*\.jpg',
-            'description': 'FREE THE NIPPLE on Twitter: "FTN supporters on Hollywood Blvd today! http://t.co/c7jHH749xJ"',
+            'description': 'FTN supporters on Hollywood Blvd today! http://t.co/c7jHH749xJ',
              'uploader': 'FREE THE NIPPLE',
              'uploader_id': 'freethenipple',
              'duration': 12.922,
+            'timestamp': 1442188653,
+            'upload_date': '20150913',
+            'age_limit': 18,
          },
      }, {
          'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1',
@@ -316,19 +223,23 @@ class TwitterIE(InfoExtractor):
              'id': '665052190608723968',
              'ext': 'mp4',
              'title': 'Star Wars - A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens.',
-            'description': 'Star Wars on Twitter: "A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens."',
+            'description': 'A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens. https://t.co/OkSqT2fjWJ',
              'uploader_id': 'starwars',
              'uploader': 'Star Wars',
+            'timestamp': 1447395772,
+            'upload_date': '20151113',
          },
      }, {
          'url': 'https://twitter.com/BTNBrentYarina/status/705235433198714880',
          'info_dict': {
              'id': '705235433198714880',
              'ext': 'mp4',
-            'title': 'Brent Yarina - Khalil Iverson\'s missed highlight dunk. And made highlight dunk. In one highlight.',
-            'description': 'Brent Yarina on Twitter: "Khalil Iverson\'s missed highlight dunk. And made highlight dunk. In one highlight."',
+            'title': "Brent Yarina - Khalil Iverson's missed highlight dunk. And made highlight dunk. In one highlight.",
+            'description': "Khalil Iverson's missed highlight dunk. And made highlight dunk. In one highlight. https://t.co/OrxcJ28Bns",
              'uploader_id': 'BTNBrentYarina',
              'uploader': 'Brent Yarina',
+            'timestamp': 1456976204,
+            'upload_date': '20160303',
          },
          'params': {
              # The same video as https://twitter.com/i/videos/tweet/705235433198714880
@@ -340,12 +251,14 @@ class TwitterIE(InfoExtractor):
          'info_dict': {
              'id': '700207533655363584',
              'ext': 'mp4',
-            'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
-            'description': 'JG on Twitter: "BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
+            'title': 'simon vetugo - BEAT PROD: @suhmeduh #Damndaniel',
+            'description': 'BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ',
              'thumbnail': r're:^https?://.*\.jpg',
-            'uploader': 'JG',
-            'uploader_id': 'jaydingeer',
+            'uploader': 'simon vetugo',
+            'uploader_id': 'simonvertugo',
              'duration': 30.0,
+            'timestamp': 1455777459,
+            'upload_date': '20160218',
          },
      }, {
          'url': 'https://twitter.com/Filmdrunk/status/713801302971588609',
@@ -353,10 +266,9 @@ class TwitterIE(InfoExtractor):
          'info_dict': {
              'id': 'MIOxnrUteUd',
              'ext': 'mp4',
-            'title': 'Vince Mancini - Vine of the day',
-            'description': 'Vince Mancini on Twitter: "Vine of the day https://t.co/xmTvRdqxWf"',
-            'uploader': 'Vince Mancini',
-            'uploader_id': 'Filmdrunk',
+            'title': 'Dr.Pepperの飲み方 #japanese #バカ #ドクペ #電動ガン',
+            'uploader': 'TAKUMA',
+            'uploader_id': '1004126642786242560',
              'timestamp': 1402826626,
              'upload_date': '20140615',
          },
@@ -367,21 +279,22 @@ class TwitterIE(InfoExtractor):
              'id': '719944021058060289',
              'ext': 'mp4',
              'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theaters.',
-            'description': 'Captain America on Twitter: "@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI"',
-            'uploader_id': 'captainamerica',
+            'description': '@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI',
+            'uploader_id': 'CaptainAmerica',
              'uploader': 'Captain America',
              'duration': 3.17,
+            'timestamp': 1460483005,
+            'upload_date': '20160412',
          },
      }, {
          'url': 'https://twitter.com/OPP_HSD/status/779210622571536384',
          'info_dict': {
              'id': '1zqKVVlkqLaKB',
              'ext': 'mp4',
-            'title': 'Sgt Kerry Schmidt - LIVE on #Periscope: Road rage, mischief, assault, rollover and fire in one occurrence',
-            'description': 'Sgt Kerry Schmidt on Twitter: "LIVE on #Periscope: Road rage, mischief, assault, rollover and fire in one occurrence  https://t.co/EKrVgIXF3s"',
+            'title': 'Sgt Kerry Schmidt - Ontario Provincial Police - Road rage, mischief, assault, rollover and fire in one occurrence',
              'upload_date': '20160923',
-            'uploader_id': 'OPP_HSD',
-            'uploader': 'Sgt Kerry Schmidt',
+            'uploader_id': '1PmKqpJdOJQoY',
+            'uploader': 'Sgt Kerry Schmidt - Ontario Provincial Police',
              'timestamp': 1474613214,
          },
          'add_ie': ['Periscope'],
@@ -392,10 +305,12 @@ class TwitterIE(InfoExtractor):
              'id': '852138619213144067',
              'ext': 'mp4',
              'title': 'عالم الأخبار - كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة',
-            'description': 'عالم الأخبار on Twitter: "كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة   https://t.co/xg6OhpyKfN"',
+            'description': 'كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة   https://t.co/xg6OhpyKfN',
              'uploader': 'عالم الأخبار',
              'uploader_id': 'news_al3alm',
              'duration': 277.4,
+            'timestamp': 1492000653,
+            'upload_date': '20170412',
          },
      }, {
          'url': 'https://twitter.com/i/web/status/910031516746514432',
@@ -404,10 +319,12 @@ class TwitterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Préfet de Guadeloupe - [Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terre. Restez confinés. Réfugiez-vous dans la pièce la + sûre.',
              'thumbnail': r're:^https?://.*\.jpg',
-            'description': 'Préfet de Guadeloupe on Twitter: "[Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terre. Restez confinés. Réfugiez-vous dans la pièce la + sûre. https://t.co/mwx01Rs4lo"',
+            'description': '[Direct] #Maria Le centre se trouve actuellement au sud de Basse-Terre. Restez confinés. Réfugiez-vous dans la pièce la + sûre. https://t.co/mwx01Rs4lo',
              'uploader': 'Préfet de Guadeloupe',
              'uploader_id': 'Prefet971',
              'duration': 47.48,
+            'timestamp': 1505803395,
+            'upload_date': '20170919',
          },
          'params': {
              'skip_download': True,  # requires ffmpeg
@@ -420,10 +337,12 @@ class TwitterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 're:.*?Shep is on a roll today.*?',
              'thumbnail': r're:^https?://.*\.jpg',
-            'description': 'md5:63b036c228772523ae1924d5f8e5ed6b',
+            'description': 'md5:37b9f2ff31720cef23b2bd42ee8a0f09',
              'uploader': 'Lis Power',
              'uploader_id': 'LisPower1',
              'duration': 111.278,
+            'timestamp': 1527623489,
+            'upload_date': '20180529',
          },
          'params': {
              'skip_download': True,  # requires ffmpeg
@@ -435,88 +354,168 @@ class TwitterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Twitter - A new is coming.  Some of you got an opt-in to try it now. Check out the emoji button, quick keyboard shortcuts, upgraded trends, advanced search, and more. Let us know your thoughts!',
              'thumbnail': r're:^https?://.*\.jpg',
-            'description': 'md5:66d493500c013e3e2d434195746a7f78',
+            'description': 'md5:6dfd341a3310fb97d80d2bf7145df976',
              'uploader': 'Twitter',
              'uploader_id': 'Twitter',
              'duration': 61.567,
+            'timestamp': 1548184644,
+            'upload_date': '20190122',
+        },
+    }, {
+        # not available in Periscope
+        'url': 'https://twitter.com/ViviEducation/status/1136534865145286656',
+        'info_dict': {
+            'id': '1vOGwqejwoWxB',
+            'ext': 'mp4',
+            'title': 'Vivi - Vivi founder @lior_rauchy announcing our new student feedback tool live at @EduTECH_AU #EduTECH2019',
+            'uploader': 'Vivi',
+            'uploader_id': '1eVjYOLGkGrQL',
          },
+        'add_ie': ['TwitterBroadcast'],
+    }, {
+        # Twitch Clip Embed
+        'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
+        'only_matching': True,
+    }, {
+        # promo_video_website card
+        'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        twid = mobj.group('id')
-
-        webpage, urlh = self._download_webpage_handle(
-            self._TEMPLATE_STATUSES_URL % twid, twid)
-
-        if 'twitter.com/account/suspended' in urlh.geturl():
-            raise ExtractorError('Account suspended by Twitter.', expected=True)
-
-        user_id = None
-
-        redirect_mobj = re.match(self._VALID_URL, urlh.geturl())
-        if redirect_mobj:
-            user_id = redirect_mobj.group('user_id')
-
-        if not user_id:
-            user_id = mobj.group('user_id')
-
-        username = remove_end(self._og_search_title(webpage), ' on Twitter')
-
-        title = description = self._og_search_description(webpage).strip('').replace('\n', ' ').strip('“”')
+        twid = self._match_id(url)
+        status = self._call_api(
+            'statuses/show/%s.json' % twid, twid, {
+                'cards_platform': 'Web-12',
+                'include_cards': 1,
+                'include_reply_count': 1,
+                'include_user_entities': 0,
+                'tweet_mode': 'extended',
+            })
  
+        title = description = status['full_text'].replace('\n', ' ')
          # strip  'https -_t.co_BJYgOjSeGA' junk from filenames
          title = re.sub(r'\s+(https?://[^ ]+)', '', title)
+        user = status.get('user') or {}
+        uploader = user.get('name')
+        if uploader:
+            title = '%s - %s' % (uploader, title)
+        uploader_id = user.get('screen_name')
+
+        tags = []
+        for hashtag in (try_get(status, lambda x: x['entities']['hashtags'], list) or []):
+            hashtag_text = hashtag.get('text')
+            if not hashtag_text:
+                continue
+            tags.append(hashtag_text)
  
          info = {
-            'uploader_id': user_id,
-            'uploader': username,
-            'webpage_url': url,
-            'description': '%s on Twitter: "%s"' % (username, description),
-            'title': username + ' - ' + title,
+            'id': twid,
+            'title': title,
+            'description': description,
+            'uploader': uploader,
+            'timestamp': unified_timestamp(status.get('created_at')),
+            'uploader_id': uploader_id,
+            'uploader_url': 'https://twitter.com/' + uploader_id if uploader_id else None,
+            'like_count': int_or_none(status.get('favorite_count')),
+            'repost_count': int_or_none(status.get('retweet_count')),
+            'comment_count': int_or_none(status.get('reply_count')),
+            'age_limit': 18 if status.get('possibly_sensitive') else 0,
+            'tags': tags,
          }
  
-        mobj = re.search(r'''(?x)
-            <video[^>]+class="animated-gif"(?P<more_info>[^>]+)>\s*
-                <source[^>]+video-src="(?P<url>[^"]+)"
-        ''', webpage)
-
-        if mobj:
-            more_info = mobj.group('more_info')
-            height = int_or_none(self._search_regex(
-                r'data-height="(\d+)"', more_info, 'height', fatal=False))
-            width = int_or_none(self._search_regex(
-                r'data-width="(\d+)"', more_info, 'width', fatal=False))
-            thumbnail = self._search_regex(
-                r'poster="([^"]+)"', more_info, 'poster', fatal=False)
-            info.update({
-                'id': twid,
-                'url': mobj.group('url'),
-                'height': height,
-                'width': width,
-                'thumbnail': thumbnail,
-            })
-            return info
-
-        twitter_card_url = None
-        if 'class="PlayableMedia' in webpage:
-            twitter_card_url = '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid)
-        else:
-            twitter_card_iframe_url = self._search_regex(
-                r'data-full-card-iframe-url=([\'"])(?P<url>(?:(?!\1).)+)\1',
-                webpage, 'Twitter card iframe URL', default=None, group='url')
-            if twitter_card_iframe_url:
-                twitter_card_url = compat_urlparse.urljoin(url, twitter_card_iframe_url)
+        media = try_get(status, lambda x: x['extended_entities']['media'][0])
+        if media and media.get('type') != 'photo':
+            video_info = media.get('video_info') or {}
+
+            formats = []
+            for variant in video_info.get('variants', []):
+                formats.extend(self._extract_variant_formats(variant, twid))
+            self._sort_formats(formats)
+
+            thumbnails = []
+            media_url = media.get('media_url_https') or media.get('media_url')
+            if media_url:
+                def add_thumbnail(name, size):
+                    thumbnails.append({
+                        'id': name,
+                        'url': update_url_query(media_url, {'name': name}),
+                        'width': int_or_none(size.get('w') or size.get('width')),
+                        'height': int_or_none(size.get('h') or size.get('height')),
+                    })
+                for name, size in media.get('sizes', {}).items():
+                    add_thumbnail(name, size)
+                add_thumbnail('orig', media.get('original_info') or {})
  
-        if twitter_card_url:
              info.update({
-                '_type': 'url_transparent',
-                'ie_key': 'TwitterCard',
-                'url': twitter_card_url,
+                'formats': formats,
+                'thumbnails': thumbnails,
+                'duration': float_or_none(video_info.get('duration_millis'), 1000),
              })
-            return info
-
-        raise ExtractorError('There\'s no video in this tweet.')
+        else:
+            card = status.get('card')
+            if card:
+                binding_values = card['binding_values']
+
+                def get_binding_value(k):
+                    o = binding_values.get(k) or {}
+                    return try_get(o, lambda x: x[x['type'].lower() + '_value'])
+
+                card_name = card['name'].split(':')[-1]
+                if card_name in ('amplify', 'promo_video_website'):
+                    is_amplify = card_name == 'amplify'
+                    vmap_url = get_binding_value('amplify_url_vmap') if is_amplify else get_binding_value('player_stream_url')
+                    content_id = get_binding_value('%s_content_id' % (card_name if is_amplify else 'player'))
+                    formats = self._extract_formats_from_vmap_url(vmap_url, content_id or twid)
+                    self._sort_formats(formats)
+
+                    thumbnails = []
+                    for suffix in ('_small', '', '_large', '_x_large', '_original'):
+                        image = get_binding_value('player_image' + suffix) or {}
+                        image_url = image.get('url')
+                        if not image_url or '/player-placeholder' in image_url:
+                            continue
+                        thumbnails.append({
+                            'id': suffix[1:] if suffix else 'medium',
+                            'url': image_url,
+                            'width': int_or_none(image.get('width')),
+                            'height': int_or_none(image.get('height')),
+                        })
+
+                    info.update({
+                        'formats': formats,
+                        'thumbnails': thumbnails,
+                        'duration': int_or_none(get_binding_value(
+                            'content_duration_seconds')),
+                    })
+                elif card_name == 'player':
+                    info.update({
+                        '_type': 'url',
+                        'url': get_binding_value('player_url'),
+                    })
+                elif card_name == 'periscope_broadcast':
+                    info.update({
+                        '_type': 'url',
+                        'url': get_binding_value('url') or get_binding_value('player_url'),
+                        'ie_key': PeriscopeIE.ie_key(),
+                    })
+                elif card_name == 'broadcast':
+                    info.update({
+                        '_type': 'url',
+                        'url': get_binding_value('broadcast_url'),
+                        'ie_key': TwitterBroadcastIE.ie_key(),
+                    })
+                else:
+                    raise ExtractorError('Unsupported Twitter Card.')
+            else:
+                expanded_url = try_get(status, lambda x: x['entities']['urls'][0]['expanded_url'])
+                if not expanded_url:
+                    raise ExtractorError("There's no video in this tweet.")
+                info.update({
+                    '_type': 'url',
+                    'url': expanded_url,
+                })
+        return info
  
  
  class TwitterAmplifyIE(TwitterBaseIE):
@@ -573,3 +572,27 @@ class TwitterAmplifyIE(TwitterBaseIE):
              'formats': formats,
              'thumbnails': thumbnails,
          }
+
+
+class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
+    IE_NAME = 'twitter:broadcast'
+    _VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})'
+
+    def _real_extract(self, url):
+        broadcast_id = self._match_id(url)
+        broadcast = self._call_api(
+            'broadcasts/show.json', broadcast_id,
+            {'ids': broadcast_id})['broadcasts'][broadcast_id]
+        info = self._parse_broadcast_data(broadcast, broadcast_id)
+        media_key = broadcast['media_key']
+        source = self._call_api(
+            'live_video_stream/status/' + media_key, media_key)['source']
+        m3u8_url = source.get('noRedirectPlaybackUrl') or source['location']
+        if '/live_video_stream/geoblocked/' in m3u8_url:
+            self.raise_geo_restricted()
+        m3u8_id = compat_parse_qs(compat_urllib_parse_urlparse(
+            m3u8_url).query).get('type', [None])[0]
+        state, width, height = self._extract_common_format_info(broadcast)
+        info['formats'] = self._extract_pscp_m3u8_formats(
+            m3u8_url, broadcast_id, m3u8_id, state, width, height)
+        return info
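Review note on the twitter.py hunks above: the card branch reads every value through `get_binding_value`, which picks the `<type>_value` key named by each binding's own `type` field. The following is a minimal standalone sketch of that lookup, not the extractor's code. It assumes a card payload shaped like the made-up `sample_card` below, uses the hypothetical helper name `binding_value`, and substitutes a plain try/except for youtube-dl's `try_get`.

```python
# Illustrative sketch only: sample_card is invented for demonstration and is
# not a real API response.
def binding_value(binding_values, key):
    """Return the typed payload stored under `key`, or None if absent or malformed."""
    entry = binding_values.get(key) or {}
    try:
        # A binding of type 'STRING' keeps its payload under 'string_value',
        # one of type 'IMAGE' under 'image_value', and so on.
        return entry[entry['type'].lower() + '_value']
    except (KeyError, TypeError, AttributeError):
        return None


sample_card = {
    'name': 'example:promo_video_website',  # hypothetical prefixed card name
    'binding_values': {
        'player_stream_url': {
            'type': 'STRING',
            'string_value': 'https://example.com/v.vmap',
        },
        'player_image': {
            'type': 'IMAGE',
            'image_value': {'url': 'https://example.com/t.jpg', 'width': 640, 'height': 360},
        },
        'broken': {'type': 'STRING'},  # declares a type but carries no payload
    },
}

# Same normalisation as the diff: keep only the part after the last colon.
card_name = sample_card['name'].split(':')[-1]
stream_url = binding_value(sample_card['binding_values'], 'player_stream_url')
image = binding_value(sample_card['binding_values'], 'player_image') or {}
```

Returning None for a missing or malformed binding is what lets the diff chain lookups with `or`, as in `get_binding_value('url') or get_binding_value('player_url')`.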
diff --git a/youtube_dl/extractor/ufctv.py b/youtube_dl/extractor/ufctv.py
index f3eaee6b3f45b71dadb0a1d3310aa3bf9cebf073..3d74ba0714e2f71f87e476abab6e9d62c902ee52 100644
--- a/youtube_dl/extractor/ufctv.py
+++ b/youtube_dl/extractor/ufctv.py
@@ -1,73 +1,16 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    parse_duration,
-    parse_iso8601,
-    urlencode_postdata,
-)
+from .imggaming import ImgGamingBaseIE
  
  
-class UFCTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ufc\.tv/video/(?P<id>[^/]+)'
+class UFCTVIE(ImgGamingBaseIE):
+    _VALID_URL = ImgGamingBaseIE._VALID_URL_TEMPL % r'(?:(?:app|www)\.)?(?:ufc\.tv|(?:ufc)?fightpass\.com)|ufcfightpass\.img(?:dge|gaming)\.com'
      _NETRC_MACHINE = 'ufctv'
-    _TEST = {
-        'url': 'https://www.ufc.tv/video/ufc-219-countdown-full-episode',
-        'info_dict': {
-            'id': '34167',
-            'ext': 'mp4',
-            'title': 'UFC 219 Countdown: Full Episode',
-            'description': 'md5:26d4e8bf4665ae5878842d7050c3c646',
-            'timestamp': 1513962360,
-            'upload_date': '20171222',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        }
-    }
+    _REALM = 'ufc'
  
-    def _real_initialize(self):
-        username, password = self._get_login_info()
-        if username is None:
-            return
  
-        code = self._download_json(
-            'https://www.ufc.tv/secure/authenticate',
-            None, 'Logging in', data=urlencode_postdata({
-                'username': username,
-                'password': password,
-                'format': 'json',
-            })).get('code')
-        if code and code != 'loginsuccess':
-            raise ExtractorError(code, expected=True)
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        video_data = self._download_json(url, display_id, query={
-            'format': 'json',
-        })
-        video_id = str(video_data['id'])
-        title = video_data['name']
-        m3u8_url = self._download_json(
-            'https://www.ufc.tv/service/publishpoint', video_id, query={
-                'type': 'video',
-                'format': 'json',
-                'id': video_id,
-            }, headers={
-                'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0_1 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A402 Safari/604.1',
-            })['path']
-        m3u8_url = m3u8_url.replace('_iphone.', '.')
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': video_data.get('description'),
-            'duration': parse_duration(video_data.get('runtime')),
-            'timestamp': parse_iso8601(video_data.get('releaseDate')),
-            'formats': formats,
-        }
+class UFCArabiaIE(ImgGamingBaseIE):
+    _VALID_URL = ImgGamingBaseIE._VALID_URL_TEMPL % r'(?:(?:app|www)\.)?ufcarabia\.(?:ae|com)'
+    _NETRC_MACHINE = 'ufcarabia'
+    _REALM = 'admufc'
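Review note on the ufctv.py change above: both extractors now only supply a domain fragment that is substituted into `ImgGamingBaseIE._VALID_URL_TEMPL`. That template is not part of this diff, so the sketch below checks the two fragments on their own against a few host names. The helper `matches_domain` and the host list are illustrative assumptions, and `re.fullmatch` requires Python 3.4 or later.

```python
import re

# Domain fragments copied verbatim from the two _VALID_URL definitions above.
UFCTV_DOMAINS = r'(?:(?:app|www)\.)?(?:ufc\.tv|(?:ufc)?fightpass\.com)|ufcfightpass\.img(?:dge|gaming)\.com'
UFCARABIA_DOMAINS = r'(?:(?:app|www)\.)?ufcarabia\.(?:ae|com)'


def matches_domain(pattern, host):
    """True if `host` as a whole matches one of the alternatives in `pattern`."""
    return re.fullmatch(pattern, host) is not None


# Hypothetical host names chosen to exercise each alternative.
hosts = [
    'www.ufc.tv',
    'ufcfightpass.com',
    'app.fightpass.com',
    'ufcfightpass.imggaming.com',
    'www.ufcfightpass.imggaming.com',
    'ufcarabia.ae',
]

ufctv_hits = [h for h in hosts if matches_domain(UFCTV_DOMAINS, h)]
arabia_hits = [h for h in hosts if matches_domain(UFCARABIA_DOMAINS, h)]
```

Because the `|` in the UFC TV fragment sits at the top level, the optional `app.`/`www.` prefix applies only to the first alternative, which is why `www.ufcfightpass.imggaming.com` is not matched in this sketch.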
diff --git a/youtube_dl/extractor/vessel.py b/youtube_dl/extractor/vessel.py
deleted file mode 100644
index 31eee0b..0000000
--- a/youtube_dl/extractor/vessel.py
+++ /dev/null
@@ -1,157 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import json
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    parse_iso8601,
-    sanitized_Request,
-)
-
-
-class VesselIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vessel\.com/(?:videos|embed)/(?P<id>[0-9a-zA-Z-_]+)'
-    _API_URL_TEMPLATE = 'https://www.vessel.com/api/view/items/%s'
-    _LOGIN_URL = 'https://www.vessel.com/api/account/login'
-    _NETRC_MACHINE = 'vessel'
-    _TESTS = [{
-        'url': 'https://www.vessel.com/videos/HDN7G5UMs',
-        'md5': '455cdf8beb71c6dd797fd2f3818d05c4',
-        'info_dict': {
-            'id': 'HDN7G5UMs',
-            'ext': 'mp4',
-            'title': 'Nvidia GeForce GTX Titan X - The Best Video Card on the Market?',
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'upload_date': '20150317',
-            'description': 'Did Nvidia pull out all the stops on the Titan X, or does its performance leave something to be desired?',
-            'timestamp': int,
-        },
-    }, {
-        'url': 'https://www.vessel.com/embed/G4U7gUJ6a?w=615&h=346',
-        'only_matching': True,
-    }, {
-        'url': 'https://www.vessel.com/videos/F01_dsLj1',
-        'only_matching': True,
-    }, {
-        'url': 'https://www.vessel.com/videos/RRX-sir-J',
-        'only_matching': True,
-    }]
-
-    @staticmethod
-    def _extract_urls(webpage):
-        return [url for _, url in re.findall(
-            r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?vessel\.com/embed/[0-9a-zA-Z-_]+.*?)\1',
-            webpage)]
-
-    @staticmethod
-    def make_json_request(url, data):
-        payload = json.dumps(data).encode('utf-8')
-        req = sanitized_Request(url, payload)
-        req.add_header('Content-Type', 'application/json; charset=utf-8')
-        return req
-
-    @staticmethod
-    def find_assets(data, asset_type, asset_id=None):
-        for asset in data.get('assets', []):
-            if not asset.get('type') == asset_type:
-                continue
-            elif asset_id is not None and not asset.get('id') == asset_id:
-                continue
-            else:
-                yield asset
-
-    def _check_access_rights(self, data):
-        access_info = data.get('__view', {})
-        if not access_info.get('allow_access', True):
-            err_code = access_info.get('error_code') or ''
-            if err_code == 'ITEM_PAID_ONLY':
-                raise ExtractorError(
-                    'This video requires subscription.', expected=True)
-            else:
-                raise ExtractorError(
-                    'Access to this content is restricted. (%s said: %s)' % (self.IE_NAME, err_code), expected=True)
-
-    def _login(self):
-        username, password = self._get_login_info()
-        if username is None:
-            return
-        self.report_login()
-        data = {
-            'client_id': 'web',
-            'type': 'password',
-            'user_key': username,
-            'password': password,
-        }
-        login_request = VesselIE.make_json_request(self._LOGIN_URL, data)
-        self._download_webpage(login_request, None, False, 'Wrong login info')
-
-    def _real_initialize(self):
-        self._login()
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-        data = self._parse_json(self._search_regex(
-            r'App\.bootstrapData\((.*?)\);', webpage, 'data'), video_id)
-        asset_id = data['model']['data']['id']
-
-        req = VesselIE.make_json_request(
-            self._API_URL_TEMPLATE % asset_id, {'client': 'web'})
-        data = self._download_json(req, video_id)
-        video_asset_id = data.get('main_video_asset')
-
-        self._check_access_rights(data)
-
-        try:
-            video_asset = next(
-                VesselIE.find_assets(data, 'video', asset_id=video_asset_id))
-        except StopIteration:
-            raise ExtractorError('No video assets found')
-
-        formats = []
-        for f in video_asset.get('sources', []):
-            location = f.get('location')
-            if not location:
-                continue
-            name = f.get('name')
-            if name == 'hls-index':
-                formats.extend(self._extract_m3u8_formats(
-                    location, video_id, ext='mp4',
-                    entry_protocol='m3u8_native', m3u8_id='m3u8', fatal=False))
-            elif name == 'dash-index':
-                formats.extend(self._extract_mpd_formats(
-                    location, video_id, mpd_id='dash', fatal=False))
-            else:
-                formats.append({
-                    'format_id': name,
-                    'tbr': f.get('bitrate'),
-                    'height': f.get('height'),
-                    'width': f.get('width'),
-                    'url': location,
-                })
-        self._sort_formats(formats)
-
-        thumbnails = []
-        for im_asset in VesselIE.find_assets(data, 'image'):
-            thumbnails.append({
-                'url': im_asset['location'],
-                'width': im_asset.get('width', 0),
-                'height': im_asset.get('height', 0),
-            })
-
-        return {
-            'id': video_id,
-            'title': data['title'],
-            'formats': formats,
-            'thumbnails': thumbnails,
-            'description': data.get('short_description'),
-            'duration': data.get('duration'),
-            'comment_count': data.get('comment_count'),
-            'like_count': data.get('like_count'),
-            'view_count': data.get('view_count'),
-            'timestamp': parse_iso8601(data.get('released_at')),
-        }
diff --git a/youtube_dl/extractor/vice.py b/youtube_dl/extractor/vice.py
index 8fdfd743d04a6ee4319a64337ea67398d2243896..e37499512856234d9b989aec53dfd59c42647300 100644
--- a/youtube_dl/extractor/vice.py
+++ b/youtube_dl/extractor/vice.py
@@ -1,35 +1,50 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-import time
+import functools
  import hashlib
  import json
  import random
+import re
+import time
  
  from .adobepass import AdobePassIE
-from .youtube import YoutubeIE
  from .common import InfoExtractor
+from .youtube import YoutubeIE
  from ..compat import (
      compat_HTTPError,
      compat_str,
  )
  from ..utils import (
+    clean_html,
      ExtractorError,
      int_or_none,
+    OnDemandPagedList,
      parse_age_limit,
      str_or_none,
      try_get,
  )
  
  
-class ViceIE(AdobePassIE):
+class ViceBaseIE(InfoExtractor):
+    def _call_api(self, resource, resource_key, resource_id, locale, fields, args=''):
+        return self._download_json(
+            'https://video.vice.com/api/v1/graphql', resource_id, query={
+                'query': '''{
+  %s(locale: "%s", %s: "%s"%s) {
+    %s
+  }
+}''' % (resource, locale, resource_key, resource_id, args, fields),
+            })['data'][resource]
+
+
+class ViceIE(ViceBaseIE, AdobePassIE):
      IE_NAME = 'vice'
-    _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?viceland)\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]+)'
+    _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]{24})'
      _TESTS = [{
          'url': 'https://video.vice.com/en_us/video/pet-cremator/58c69e38a55424f1227dc3f7',
          'info_dict': {
-            'id': '5e647f0125e145c9aef2069412c0cbde',
+            'id': '58c69e38a55424f1227dc3f7',
              'ext': 'mp4',
              'title': '10 Questions You Always Wanted To Ask: Pet Cremator',
              'description': 'md5:fe856caacf61fe0e74fab15ce2b07ca5',
@@ -43,17 +58,16 @@ class ViceIE(AdobePassIE):
              # m3u8 download
              'skip_download': True,
          },
-        'add_ie': ['UplynkPreplay'],
      }, {
          # geo restricted to US
          'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
          'info_dict': {
-            'id': '930c0ad1f47141cc955087eecaddb0e2',
+            'id': '5816510690b70e6c5fd39a56',
              'ext': 'mp4',
-            'uploader': 'waypoint',
+            'uploader': 'vice',
              'title': 'The Signal From Tölva',
              'description': 'md5:3927e3c79f9e8094606a2b3c5b5e55d5',
-            'uploader_id': '57f7d621e05ca860fa9ccaf9',
+            'uploader_id': '57a204088cb727dec794c67b',
              'timestamp': 1477941983,
              'upload_date': '20161031',
          },
@@ -61,15 +75,14 @@ class ViceIE(AdobePassIE):
              # m3u8 download
              'skip_download': True,
          },
-        'add_ie': ['UplynkPreplay'],
      }, {
          'url': 'https://video.vice.com/alps/video/ulfs-wien-beruchtigste-grafitti-crew-part-1/581b12b60a0e1f4c0fb6ea2f',
          'info_dict': {
              'id': '581b12b60a0e1f4c0fb6ea2f',
              'ext': 'mp4',
              'title': 'ULFs - Wien berüchtigste Grafitti Crew - Part 1',
-            'description': '<p>Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.</p>',
-            'uploader': 'VICE',
+            'description': 'Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.',
+            'uploader': 'vice',
              'uploader_id': '57a204088cb727dec794c67b',
              'timestamp': 1485368119,
              'upload_date': '20170125',
@@ -78,9 +91,7 @@ class ViceIE(AdobePassIE):
          'params': {
              # AES-encrypted m3u8
              'skip_download': True,
-            'proxy': '127.0.0.1:8118',
          },
-        'add_ie': ['UplynkPreplay'],
      }, {
          'url': 'https://video.vice.com/en_us/video/pizza-show-trailer/56d8c9a54d286ed92f7f30e4',
          'only_matching': True,
@@ -98,7 +109,7 @@ class ViceIE(AdobePassIE):
      @staticmethod
      def _extract_urls(webpage):
          return re.findall(
-            r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]+)',
+            r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})',
              webpage)
  
      @staticmethod
@@ -109,31 +120,16 @@ class ViceIE(AdobePassIE):
      def _real_extract(self, url):
          locale, video_id = re.match(self._VALID_URL, url).groups()
  
-        webpage = self._download_webpage(
-            'https://video.vice.com/%s/embed/%s' % (locale, video_id),
-            video_id)
-
-        video = self._parse_json(
-            self._search_regex(
-                r'PREFETCH_DATA\s*=\s*({.+?})\s*;\s*\n', webpage,
-                'app state'), video_id)['video']
-        video_id = video.get('vms_id') or video.get('id') or video_id
-        title = video['title']
-        is_locked = video.get('locked')
+        video = self._call_api('videos', 'id', video_id, locale, '''body
+    locked
+    rating
+    thumbnail_url
+    title''')[0]
+        title = video['title'].strip()
          rating = video.get('rating')
-        thumbnail = video.get('thumbnail_url')
-        duration = int_or_none(video.get('duration'))
-        series = try_get(
-            video, lambda x: x['episode']['season']['show']['title'],
-            compat_str)
-        episode_number = try_get(
-            video, lambda x: x['episode']['episode_number'])
-        season_number = try_get(
-            video, lambda x: x['episode']['season']['season_number'])
-        uploader = None
  
          query = {}
-        if is_locked:
+        if video.get('locked'):
              resource = self._get_mvpd_resource(
                  'VICELAND', title, video_id, rating)
              query['tvetoken'] = self._extract_mvpd_auth(
@@ -148,12 +144,9 @@ class ViceIE(AdobePassIE):
          query.update({
              'exp': exp,
              'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
-            '_ad_blocked': None,
-            '_ad_unit': '',
-            '_debug': '',
+            'skipadstitching': 1,
              'platform': 'desktop',
              'rn': random.randint(10000, 100000),
-            'fbprebidtoken': '',
          })
  
          try:
@@ -169,85 +162,94 @@ class ViceIE(AdobePassIE):
              raise
  
          video_data = preplay['video']
-        base = video_data['base']
-        uplynk_preplay_url = preplay['preplayURL']
-        episode = video_data.get('episode', {})
-        channel = video_data.get('channel', {})
+        formats = self._extract_m3u8_formats(
+            preplay['playURL'], video_id, 'mp4', 'm3u8_native')
+        self._sort_formats(formats)
+        episode = video_data.get('episode') or {}
+        channel = video_data.get('channel') or {}
+        season = video_data.get('season') or {}
  
          subtitles = {}
-        cc_url = preplay.get('ccURL')
-        if cc_url:
-            subtitles['en'] = [{
+        for subtitle in preplay.get('subtitleURLs', []):
+            cc_url = subtitle.get('url')
+            if not cc_url:
+                continue
+            language_code = try_get(subtitle, lambda x: x['languages'][0]['language_code'], compat_str) or 'en'
+            subtitles.setdefault(language_code, []).append({
                  'url': cc_url,
-            }]
+            })
  
          return {
-            '_type': 'url_transparent',
-            'url': uplynk_preplay_url,
+            'formats': formats,
              'id': video_id,
              'title': title,
-            'description': base.get('body') or base.get('display_body'),
-            'thumbnail': thumbnail,
-            'duration': int_or_none(video_data.get('video_duration')) or duration,
+            'description': clean_html(video.get('body')),
+            'thumbnail': video.get('thumbnail_url'),
+            'duration': int_or_none(video_data.get('video_duration')),
              'timestamp': int_or_none(video_data.get('created_at'), 1000),
-            'age_limit': parse_age_limit(video_data.get('video_rating')),
-            'series': video_data.get('show_title') or series,
-            'episode_number': int_or_none(episode.get('episode_number') or episode_number),
+            'age_limit': parse_age_limit(video_data.get('video_rating') or rating),
+            'series': try_get(video_data, lambda x: x['show']['base']['display_title'], compat_str),
+            'episode_number': int_or_none(episode.get('episode_number')),
              'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
-            'season_number': int_or_none(season_number),
-            'season_id': str_or_none(episode.get('season_id')),
-            'uploader': channel.get('base', {}).get('title') or channel.get('name') or uploader,
+            'season_number': int_or_none(season.get('season_number')),
+            'season_id': str_or_none(season.get('id') or video_data.get('season_id')),
+            'uploader': channel.get('name'),
              'uploader_id': str_or_none(channel.get('id')),
              'subtitles': subtitles,
-            'ie_key': 'UplynkPreplay',
          }
  
  
-class ViceShowIE(InfoExtractor):
+class ViceShowIE(ViceBaseIE):
      IE_NAME = 'vice:show'
-    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?show/(?P<id>[^/?#&]+)'
-
-    _TEST = {
-        'url': 'https://munchies.vice.com/en/show/fuck-thats-delicious-2',
+    _VALID_URL = r'https?://(?:video\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/show/(?P<id>[^/?#&]+)'
+    _PAGE_SIZE = 25
+    _TESTS = [{
+        'url': 'https://video.vice.com/en_us/show/fck-thats-delicious',
          'info_dict': {
-            'id': 'fuck-thats-delicious-2',
-            'title': "Fuck, That's Delicious",
-            'description': 'Follow the culinary adventures of rapper Action Bronson during his ongoing world tour.',
+            'id': '57a2040c8cb727dec794c901',
+            'title': 'F*ck, That’s Delicious',
+            'description': 'The life and eating habits of rap’s greatest bon vivant, Action Bronson.',
          },
-        'playlist_count': 17,
-    }
+        'playlist_mincount': 64,
+    }, {
+        'url': 'https://www.vicetv.com/en_us/show/fck-thats-delicious',
+        'only_matching': True,
+    }]
  
-    def _real_extract(self, url):
-        show_id = self._match_id(url)
-        webpage = self._download_webpage(url, show_id)
+    def _fetch_page(self, locale, show_id, page):
+        videos = self._call_api('videos', 'show_id', show_id, locale, '''body
+    id
+    url''', ', page: %d, per_page: %d' % (page + 1, self._PAGE_SIZE))
+        for video in videos:
+            yield self.url_result(
+                video['url'], ViceIE.ie_key(), video.get('id'))
  
-        entries = [
-            self.url_result(video_url, ViceIE.ie_key())
-            for video_url, _ in re.findall(
-                r'<h2[^>]+class="article-title"[^>]+data-id="\d+"[^>]*>\s*<a[^>]+href="(%s.*?)"'
-                % ViceIE._VALID_URL, webpage)]
+    def _real_extract(self, url):
+        locale, display_id = re.match(self._VALID_URL, url).groups()
+        show = self._call_api('shows', 'slug', display_id, locale, '''dek
+    id
+    title''')[0]
+        show_id = show['id']
  
-        title = self._search_regex(
-            r'<title>(.+?)</title>', webpage, 'title', default=None)
-        if title:
-            title = re.sub(r'(.+)\s*\|\s*.+$', r'\1', title).strip()
-        description = self._html_search_meta(
-            'description', webpage, 'description')
+        entries = OnDemandPagedList(
+            functools.partial(self._fetch_page, locale, show_id),
+            self._PAGE_SIZE)
  
-        return self.playlist_result(entries, show_id, title, description)
+        return self.playlist_result(
+            entries, show_id, show.get('title'), show.get('dek'))
  
  
-class ViceArticleIE(InfoExtractor):
+class ViceArticleIE(ViceBaseIE):
      IE_NAME = 'vice:article'
-    _VALID_URL = r'https://www\.vice\.com/[^/]+/article/(?P<id>[^?#]+)'
+    _VALID_URL = r'https://(?:www\.)?vice\.com/(?P<locale>[^/]+)/article/(?:[0-9a-z]{6}/)?(?P<id>[^?#]+)'
  
      _TESTS = [{
          'url': 'https://www.vice.com/en_us/article/on-set-with-the-woman-making-mormon-porn-in-utah',
          'info_dict': {
-            'id': '41eae2a47b174a1398357cec55f1f6fc',
+            'id': '58dc0a3dee202d2a0ccfcbd8',
              'ext': 'mp4',
-            'title': 'Mormon War on Porn ',
-            'description': 'md5:6394a8398506581d0346b9ab89093fef',
+            'title': 'Mormon War on Porn',
+            'description': 'md5:1c5d91fe25fa8aa304f9def118b92dbf',
              'uploader': 'vice',
              'uploader_id': '57a204088cb727dec794c67b',
              'timestamp': 1491883129,
@@ -258,10 +260,10 @@ class ViceArticleIE(InfoExtractor):
              # AES-encrypted m3u8
              'skip_download': True,
          },
-        'add_ie': ['UplynkPreplay'],
+        'add_ie': [ViceIE.ie_key()],
      }, {
          'url': 'https://www.vice.com/en_us/article/how-to-hack-a-car',
-        'md5': '7fe8ebc4fa3323efafc127b82bd821d9',
+        'md5': '13010ee0bc694ea87ec40724397c2349',
          'info_dict': {
              'id': '3jstaBeXgAs',
              'ext': 'mp4',
@@ -271,15 +273,15 @@ class ViceArticleIE(InfoExtractor):
              'uploader_id': 'MotherboardTV',
              'upload_date': '20140529',
          },
-        'add_ie': ['Youtube'],
+        'add_ie': [YoutubeIE.ie_key()],
      }, {
          'url': 'https://www.vice.com/en_us/article/znm9dx/karley-sciortino-slutever-reloaded',
          'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
          'info_dict': {
-            'id': 'e2ed435eb67e43efb66e6ef9a6930a88',
+            'id': '57f41d3556a0a80f54726060',
              'ext': 'mp4',
              'title': "Making The World's First Male Sex Doll",
-            'description': 'md5:916078ef0e032d76343116208b6cc2c4',
+            'description': 'md5:19b00b215b99961cf869c40fbe9df755',
              'uploader': 'vice',
              'uploader_id': '57a204088cb727dec794c67b',
              'timestamp': 1476919911,
@@ -288,6 +290,7 @@ class ViceArticleIE(InfoExtractor):
          },
          'params': {
              'skip_download': True,
+            'format': 'bestvideo',
          },
          'add_ie': [ViceIE.ie_key()],
      }, {
@@ -299,14 +302,11 @@ class ViceArticleIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
+        locale, display_id = re.match(self._VALID_URL, url).groups()
  
-        prefetch_data = self._parse_json(self._search_regex(
-            r'__APP_STATE\s*=\s*({.+?})(?:\s*\|\|\s*{}\s*)?;\s*\n',
-            webpage, 'app state'), display_id)['pageData']
-        body = prefetch_data['body']
+        article = self._call_api('articles', 'slug', display_id, locale, '''body
+    embed_code''')[0]
+        body = article['body']
  
          def _url_res(video_url, ie_key):
              return {
@@ -316,7 +316,7 @@ class ViceArticleIE(InfoExtractor):
                  'ie_key': ie_key,
              }
  
-        vice_url = ViceIE._extract_url(webpage)
+        vice_url = ViceIE._extract_url(body)
          if vice_url:
              return _url_res(vice_url, ViceIE.ie_key())
  
@@ -332,6 +332,6 @@ class ViceArticleIE(InfoExtractor):
  
          video_url = self._html_search_regex(
              r'data-video-url="([^"]+)"',
-            prefetch_data['embed_code'], 'video URL')
+            article['embed_code'], 'video URL')
  
          return _url_res(video_url, ViceIE.ie_key())
diff --git a/youtube_dl/extractor/videodetective.py b/youtube_dl/extractor/videodetective.py
index a19411a058784fc61db3b764ec882f0b9986323f..fe70db713ea51fabe95912e0574a7eaa66bc8b42 100644
--- a/youtube_dl/extractor/videodetective.py
+++ b/youtube_dl/extractor/videodetective.py
@@ -1,7 +1,6 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
  from .internetvideoarchive import InternetVideoArchiveIE
  
  
@@ -13,7 +12,7 @@ class VideoDetectiveIE(InfoExtractor):
          'info_dict': {
              'id': '194487',
              'ext': 'mp4',
-            'title': 'KICK-ASS 2',
+            'title': 'Kick-Ass 2',
              'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
          },
          'params': {
@@ -24,7 +23,7 @@ class VideoDetectiveIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        og_video = self._og_search_video_url(webpage)
-        query = compat_urlparse.urlparse(og_video).query
-        return self.url_result(InternetVideoArchiveIE._build_json_url(query), ie=InternetVideoArchiveIE.ie_key())
+        query = 'customerid=69249&publishedid=' + video_id
+        return self.url_result(
+            InternetVideoArchiveIE._build_json_url(query),
+            ie=InternetVideoArchiveIE.ie_key())
diff --git a/youtube_dl/extractor/videopremium.py b/youtube_dl/extractor/videopremium.py
deleted file mode 100644
index cf690d7..0000000
--- a/youtube_dl/extractor/videopremium.py
+++ /dev/null
@@ -1,46 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-import random
-
-from .common import InfoExtractor
-
-
-class VideoPremiumIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?videopremium\.(?:tv|me)/(?P<id>\w+)(?:/.*)?'
-    _TEST = {
-        'url': 'http://videopremium.tv/4w7oadjsf156',
-        'info_dict': {
-            'id': '4w7oadjsf156',
-            'ext': 'f4v',
-            'title': 'youtube-dl_test_video____a_________-BaW_jenozKc.mp4.mp4'
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'skip': 'Test file has been deleted.',
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage_url = 'http://videopremium.tv/' + video_id
-        webpage = self._download_webpage(webpage_url, video_id)
-
-        if re.match(r'^<html><head><script[^>]*>window\.location\s*=', webpage):
-            # Download again, we need a cookie
-            webpage = self._download_webpage(
-                webpage_url, video_id,
-                note='Downloading webpage again (with cookie)')
-
-        video_title = self._html_search_regex(
-            r'<h2(?:.*?)>\s*(.+?)\s*<', webpage, 'video title')
-
-        return {
-            'id': video_id,
-            'url': 'rtmp://e%d.md.iplay.md/play' % random.randint(1, 16),
-            'play_path': 'mp4:%s.f4v' % video_id,
-            'page_url': 'http://videopremium.tv/' + video_id,
-            'player_url': 'http://videopremium.tv/uplayer/uppod.swf',
-            'ext': 'f4v',
-            'title': video_title,
-        }
diff --git a/youtube_dl/extractor/viewlift.py b/youtube_dl/extractor/viewlift.py
index c43d1a1e838987bb916c7b3eca83b7a900457c2b..851ad936cfc012b02c3125c9a0a3e898f9c6f005 100644
--- a/youtube_dl/extractor/viewlift.py
+++ b/youtube_dl/extractor/viewlift.py
@@ -13,11 +13,12 @@ from ..utils import (
      js_to_json,
      parse_age_limit,
      parse_duration,
+    try_get,
  )
  
  
  class ViewLiftBaseIE(InfoExtractor):
-    _DOMAINS_REGEX = r'(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm)\.com|hoichoi\.tv'
+    _DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm)\.com|hoichoi\.tv'
  
  
  class ViewLiftEmbedIE(ViewLiftBaseIE):
@@ -113,7 +114,7 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
  
  
  class ViewLiftIE(ViewLiftBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)/(?:films/title|show|(?:news/)?videos?)/(?P<id>[^?#]+)' % ViewLiftBaseIE._DOMAINS_REGEX
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?:/(?:films/title|show|(?:news/)?videos?))?/(?P<id>[^?#]+)' % ViewLiftBaseIE._DOMAINS_REGEX
      _TESTS = [{
          'url': 'http://www.snagfilms.com/films/title/lost_for_life',
          'md5': '19844f897b35af219773fd63bdec2942',
@@ -128,7 +129,7 @@ class ViewLiftIE(ViewLiftBaseIE):
              'categories': 'mincount:3',
              'age_limit': 14,
              'upload_date': '20150421',
-            'timestamp': 1429656819,
+            'timestamp': 1429656820,
          }
      }, {
          'url': 'http://www.snagfilms.com/show/the_world_cut_project/india',
@@ -141,10 +142,26 @@ class ViewLiftIE(ViewLiftBaseIE):
              'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
              'thumbnail': r're:^https?://.*\.jpg',
              'duration': 979,
-            'categories': 'mincount:2',
              'timestamp': 1399478279,
              'upload_date': '20140507',
          }
+    }, {
+        'url': 'http://main.snagfilms.com/augie_alone/s_2_ep_12_love',
+        'info_dict': {
+            'id': '00000148-7b53-de26-a9fb-fbf306f70020',
+            'display_id': 'augie_alone/s_2_ep_12_love',
+            'ext': 'mp4',
+            'title': 'Augie, Alone:S. 2 Ep. 12 - Love',
+            'description': 'md5:db2a5c72d994f16a780c1eb353a8f403',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'duration': 107,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://main.snagfilms.com/films/title/the_freebie',
+        'only_matching': True,
      }, {
          # Film is not playable in your area.
          'url': 'http://www.snagfilms.com/films/title/inside_mecca',
@@ -162,6 +179,10 @@ class ViewLiftIE(ViewLiftBaseIE):
          'only_matching': True,
      }]
  
+    @classmethod
+    def suitable(cls, url):
+        return False if ViewLiftEmbedIE.suitable(url) else super(ViewLiftIE, cls).suitable(url)
+
      def _real_extract(self, url):
          domain, display_id = re.match(self._VALID_URL, url).groups()
  
@@ -181,7 +202,21 @@ class ViewLiftIE(ViewLiftBaseIE):
              gist = content_data['gist']
              film_id = gist['id']
              title = gist['title']
-            video_assets = content_data['streamingInfo']['videoAssets']
+            video_assets = try_get(
+                content_data, lambda x: x['streamingInfo']['videoAssets'], dict)
+            if not video_assets:
+                token = self._download_json(
+                    'https://prod-api.viewlift.com/identity/anonymous-token',
+                    film_id, 'Downloading authorization token',
+                    query={'site': 'snagfilms'})['authorizationToken']
+                video_assets = self._download_json(
+                    'https://prod-api.viewlift.com/entitlement/video/status',
+                    film_id, headers={
+                        'Authorization': token,
+                        'Referer': url,
+                    }, query={
+                        'id': film_id
+                    })['video']['streamingInfo']['videoAssets']
  
              formats = []
              mpeg_video_assets = video_assets.get('mpeg') or []
@@ -241,8 +276,9 @@ class ViewLiftIE(ViewLiftBaseIE):
                          if category.get('title')]
                      break
              else:
-                title = self._search_regex(
-                    r'itemprop="title">([^<]+)<', webpage, 'title')
+                title = self._html_search_regex(
+                    (r'itemprop="title">([^<]+)<',
+                     r'(?s)itemprop="title">(.+?)<div'), webpage, 'title')
                  description = self._html_search_regex(
                      r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
                      webpage, 'description', default=None) or self._og_search_description(webpage)
diff --git a/youtube_dl/extractor/viewster.py b/youtube_dl/extractor/viewster.py
deleted file mode 100644
index 6e31847..0000000
--- a/youtube_dl/extractor/viewster.py
+++ /dev/null
@@ -1,217 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_HTTPError,
-    compat_urllib_parse_unquote,
-)
-from ..utils import (
-    determine_ext,
-    ExtractorError,
-    int_or_none,
-    parse_iso8601,
-    sanitized_Request,
-    HEADRequest,
-    url_basename,
-)
-
-
-class ViewsterIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?viewster\.com/(?:serie|movie)/(?P<id>\d+-\d+-\d+)'
-    _TESTS = [{
-        # movie, Type=Movie
-        'url': 'http://www.viewster.com/movie/1140-11855-000/the-listening-project/',
-        'md5': 'e642d1b27fcf3a4ffa79f194f5adde36',
-        'info_dict': {
-            'id': '1140-11855-000',
-            'ext': 'mp4',
-            'title': 'The listening Project',
-            'description': 'md5:bac720244afd1a8ea279864e67baa071',
-            'timestamp': 1214870400,
-            'upload_date': '20080701',
-            'duration': 4680,
-        },
-    }, {
-        # series episode, Type=Episode
-        'url': 'http://www.viewster.com/serie/1284-19427-001/the-world-and-a-wall/',
-        'md5': '9243079a8531809efe1b089db102c069',
-        'info_dict': {
-            'id': '1284-19427-001',
-            'ext': 'mp4',
-            'title': 'The World and a Wall',
-            'description': 'md5:24814cf74d3453fdf5bfef9716d073e3',
-            'timestamp': 1428192000,
-            'upload_date': '20150405',
-            'duration': 1500,
-        },
-    }, {
-        # serie, Type=Serie
-        'url': 'http://www.viewster.com/serie/1303-19426-000/',
-        'info_dict': {
-            'id': '1303-19426-000',
-            'title': 'Is It Wrong to Try to Pick up Girls in a Dungeon?',
-            'description': 'md5:eeda9bef25b0d524b3a29a97804c2f11',
-        },
-        'playlist_count': 13,
-    }, {
-        # unfinished serie, no Type
-        'url': 'http://www.viewster.com/serie/1284-19427-000/baby-steps-season-2/',
-        'info_dict': {
-            'id': '1284-19427-000',
-            'title': 'Baby Steps—Season 2',
-            'description': 'md5:e7097a8fc97151e25f085c9eb7a1cdb1',
-        },
-        'playlist_mincount': 16,
-    }, {
-        # geo restricted series
-        'url': 'https://www.viewster.com/serie/1280-18794-002/',
-        'only_matching': True,
-    }, {
-        # geo restricted video
-        'url': 'https://www.viewster.com/serie/1280-18794-002/what-is-extraterritoriality-lawo/',
-        'only_matching': True,
-    }]
-
-    _ACCEPT_HEADER = 'application/json, text/javascript, */*; q=0.01'
-
-    def _download_json(self, url, video_id, note='Downloading JSON metadata', fatal=True, query={}):
-        request = sanitized_Request(url)
-        request.add_header('Accept', self._ACCEPT_HEADER)
-        request.add_header('Auth-token', self._AUTH_TOKEN)
-        return super(ViewsterIE, self)._download_json(request, video_id, note, fatal=fatal, query=query)
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        # Get 'api_token' cookie
-        self._request_webpage(
-            HEADRequest('http://www.viewster.com/'),
-            video_id, headers=self.geo_verification_headers())
-        cookies = self._get_cookies('http://www.viewster.com/')
-        self._AUTH_TOKEN = compat_urllib_parse_unquote(cookies['api_token'].value)
-
-        info = self._download_json(
-            'https://public-api.viewster.com/search/%s' % video_id,
-            video_id, 'Downloading entry JSON')
-
-        entry_id = info.get('Id') or info['id']
-
-        # unfinished serie has no Type
-        if info.get('Type') in ('Serie', None):
-            try:
-                episodes = self._download_json(
-                    'https://public-api.viewster.com/series/%s/episodes' % entry_id,
-                    video_id, 'Downloading series JSON')
-            except ExtractorError as e:
-                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
-                    self.raise_geo_restricted()
-                else:
-                    raise
-            entries = [
-                self.url_result(
-                    'http://www.viewster.com/movie/%s' % episode['OriginId'], 'Viewster')
-                for episode in episodes]
-            title = (info.get('Title') or info['Synopsis']['Title']).strip()
-            description = info.get('Synopsis', {}).get('Detailed')
-            return self.playlist_result(entries, video_id, title, description)
-
-        formats = []
-        for language_set in info.get('LanguageSets', []):
-            manifest_url = None
-            m3u8_formats = []
-            audio = language_set.get('Audio') or ''
-            subtitle = language_set.get('Subtitle') or ''
-            base_format_id = audio
-            if subtitle:
-                base_format_id += '-%s' % subtitle
-
-            def concat(suffix, sep='-'):
-                return (base_format_id + '%s%s' % (sep, suffix)) if base_format_id else suffix
-
-            medias = self._download_json(
-                'https://public-api.viewster.com/movies/%s/videos' % entry_id,
-                video_id, fatal=False, query={
-                    'mediaTypes': ['application/f4m+xml', 'application/x-mpegURL', 'video/mp4'],
-                    'language': audio,
-                    'subtitle': subtitle,
-                })
-            if not medias:
-                continue
-            for media in medias:
-                video_url = media.get('Uri')
-                if not video_url:
-                    continue
-                ext = determine_ext(video_url)
-                if ext == 'f4m':
-                    manifest_url = video_url
-                    video_url += '&' if '?' in video_url else '?'
-                    video_url += 'hdcore=3.2.0&plugin=flowplayer-3.2.0.1'
-                    formats.extend(self._extract_f4m_formats(
-                        video_url, video_id, f4m_id=concat('hds')))
-                elif ext == 'm3u8':
-                    manifest_url = video_url
-                    m3u8_formats = self._extract_m3u8_formats(
-                        video_url, video_id, 'mp4', m3u8_id=concat('hls'),
-                        fatal=False)  # m3u8 sometimes fail
-                    if m3u8_formats:
-                        formats.extend(m3u8_formats)
-                else:
-                    qualities_basename = self._search_regex(
-                        r'/([^/]+)\.csmil/',
-                        manifest_url, 'qualities basename', default=None)
-                    if not qualities_basename:
-                        continue
-                    QUALITIES_RE = r'((,\d+k)+,?)'
-                    qualities = self._search_regex(
-                        QUALITIES_RE, qualities_basename,
-                        'qualities', default=None)
-                    if not qualities:
-                        continue
-                    qualities = list(map(lambda q: int(q[:-1]), qualities.strip(',').split(',')))
-                    qualities.sort()
-                    http_template = re.sub(QUALITIES_RE, r'%dk', qualities_basename)
-                    http_url_basename = url_basename(video_url)
-                    if m3u8_formats:
-                        self._sort_formats(m3u8_formats)
-                        m3u8_formats = list(filter(
-                            lambda f: f.get('vcodec') != 'none', m3u8_formats))
-                    if len(qualities) == len(m3u8_formats):
-                        for q, m3u8_format in zip(qualities, m3u8_formats):
-                            f = m3u8_format.copy()
-                            f.update({
-                                'url': video_url.replace(http_url_basename, http_template % q),
-                                'format_id': f['format_id'].replace('hls', 'http'),
-                                'protocol': 'http',
-                            })
-                            formats.append(f)
-                    else:
-                        for q in qualities:
-                            formats.append({
-                                'url': video_url.replace(http_url_basename, http_template % q),
-                                'ext': 'mp4',
-                                'format_id': 'http-%d' % q,
-                                'tbr': q,
-                            })
-
-        if not formats and not info.get('VODSettings'):
-            self.raise_geo_restricted()
-
-        self._sort_formats(formats)
-
-        synopsis = info.get('Synopsis') or {}
-        # Prefer title outside synopsis since it's less messy
-        title = (info.get('Title') or synopsis['Title']).strip()
-        description = synopsis.get('Detailed') or (info.get('Synopsis') or {}).get('Short')
-        duration = int_or_none(info.get('Duration'))
-        timestamp = parse_iso8601(info.get('ReleaseDate'))
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'timestamp': timestamp,
-            'duration': duration,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/vimeo.py b/youtube_dl/extractor/vimeo.py
index ddf375c6c2189f4eea655196a69c7e2339952527..baa46d5f3513cbde337f144c7143a9c501455ff4 100644
--- a/youtube_dl/extractor/vimeo.py
+++ b/youtube_dl/extractor/vimeo.py
@@ -15,19 +15,20 @@ from ..compat import (
      compat_urlparse,
  )
  from ..utils import (
+    clean_html,
      determine_ext,
+    dict_get,
      ExtractorError,
      js_to_json,
      int_or_none,
      merge_dicts,
-    NO_DEFAULT,
      OnDemandPagedList,
      parse_filesize,
-    qualities,
      RegexNotFoundError,
      sanitized_Request,
      smuggle_url,
      std_headers,
+    str_or_none,
      try_get,
      unified_timestamp,
      unsmuggle_url,
@@ -211,6 +212,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
          video_uploader_url = owner.get('url')
  
          return {
+            'id': str_or_none(video_data.get('id')) or video_id,
              'title': self._live_title(video_title) if is_live else video_title,
              'uploader': owner.get('name'),
              'uploader_id': video_uploader_url.split('/')[-1] if video_uploader_url else None,
@@ -258,11 +260,11 @@ class VimeoIE(VimeoBaseInfoExtractor):
                          (?:
                              (?:
                                  www|
-                                (?P<player>player)
+                                player
                              )
                              \.
                          )?
-                        vimeo(?P<pro>pro)?\.com/
+                        vimeo(?:pro)?\.com/
                          (?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
                          (?:.*?/)?
                          (?:
@@ -284,7 +286,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'id': '56015672',
                  'ext': 'mp4',
                  'title': "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
-                'description': 'md5:509a9ad5c9bf97c60faee9203aca4479',
+                'description': 'md5:2d3305bad981a06ff79f027f19865021',
                  'timestamp': 1355990239,
                  'upload_date': '20121220',
                  'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user7108434',
@@ -293,6 +295,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'duration': 10,
                  'license': 'by-sa',
              },
+            'params': {
+                'format': 'best[protocol=https]',
+            },
          },
          {
              'url': 'http://vimeopro.com/openstreetmapus/state-of-the-map-us-2013/video/68093876',
@@ -305,8 +310,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'uploader_id': 'openstreetmapus',
                  'uploader': 'OpenStreetMap US',
                  'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography',
-                'description': 'md5:fd69a7b8d8c34a4e1d2ec2e4afd6ec30',
+                'description': 'md5:2c362968038d4499f4d79f88458590c1',
                  'duration': 1595,
+                'upload_date': '20130610',
+                'timestamp': 1370893156,
+            },
+            'params': {
+                'format': 'best[protocol=https]',
              },
          },
          {
@@ -323,6 +333,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'duration': 3610,
                  'description': None,
              },
+            'params': {
+                'format': 'best[protocol=https]',
+            },
+            'expected_warnings': ['Unable to download JSON metadata'],
          },
          {
              'url': 'http://vimeo.com/68375962',
@@ -341,6 +355,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'description': 'md5:dca3ea23adb29ee387127bc4ddfce63f',
              },
              'params': {
+                'format': 'best[protocol=https]',
                  'videopassword': 'youtube-dl',
              },
          },
@@ -441,10 +456,14 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'uploader': '10Ft Films',
                  'uploader_url': r're:https?://(?:www\.)?vimeo\.com/tenfootfilms',
                  'uploader_id': 'tenfootfilms',
+                'description': 'md5:0fa704e05b04f91f40b7f3ca2e801384',
+                'upload_date': '20130830',
+                'timestamp': 1377853339,
              },
              'params': {
                  'skip_download': True,
              },
+            'expected_warnings': ['Unable to download JSON metadata'],
          },
          {
              'url': 'http://player.vimeo.com/video/68375962',
@@ -459,6 +478,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'duration': 10,
              },
              'params': {
+                'format': 'best[protocol=https]',
                  'videopassword': 'youtube-dl',
              },
          },
@@ -523,7 +543,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
      def _verify_player_video_password(self, url, video_id, headers):
          password = self._downloader.params.get('videopassword')
          if password is None:
-            raise ExtractorError('This video is protected by a password, use the --video-password option')
+            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
          data = urlencode_postdata({
              'password': base64.b64encode(password.encode()),
          })
@@ -552,28 +572,26 @@ class VimeoIE(VimeoBaseInfoExtractor):
              r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
  
          # Extract ID from URL
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
          orig_url = url
-        if mobj.group('pro'):
+        is_pro = 'vimeopro.com/' in url
+        is_player = '://player.vimeo.com/video/' in url
+        if is_pro:
              # some videos require portfolio_id to be present in player url
              # https://github.com/ytdl-org/youtube-dl/issues/20070
              url = self._extract_url(url, self._download_webpage(url, video_id))
-        elif mobj.group('player'):
+            if not url:
+                url = 'https://vimeo.com/' + video_id
+        elif is_player:
              url = 'https://player.vimeo.com/video/' + video_id
          elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
              url = 'https://vimeo.com/' + video_id
  
-        # Retrieve video webpage to extract further information
-        request = sanitized_Request(url, headers=headers)
          try:
-            webpage, urlh = self._download_webpage_handle(request, video_id)
+            # Retrieve video webpage to extract further information
+            webpage, urlh = self._download_webpage_handle(
+                url, video_id, headers=headers)
              redirect_url = compat_str(urlh.geturl())
-            # Some URLs redirect to ondemand can't be extracted with
-            # this extractor right away thus should be passed through
-            # ondemand extractor (e.g. https://vimeo.com/73445910)
-            if VimeoOndemandIE.suitable(redirect_url):
-                return self.url_result(redirect_url, VimeoOndemandIE.ie_key())
          except ExtractorError as ee:
              if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
                  errmsg = ee.cause.read()
@@ -600,6 +618,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
  
          cc_license = None
          timestamp = None
+        video_description = None
  
          # Extract the config JSON
          try:
@@ -611,17 +630,17 @@ class VimeoIE(VimeoBaseInfoExtractor):
                      # Sometimes new react-based page is served instead of old one that require
                      # different config URL extraction approach (see
                      # https://github.com/ytdl-org/youtube-dl/pull/7209)
-                    vimeo_clip_page_config = self._search_regex(
-                        r'vimeo\.clip_page_config\s*=\s*({.+?});', webpage,
-                        'vimeo clip page config')
-                    page_config = self._parse_json(vimeo_clip_page_config, video_id)
+                    page_config = self._parse_json(self._search_regex(
+                        r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
+                        webpage, 'page config'), video_id)
                      config_url = page_config['player']['config_url']
                      cc_license = page_config.get('cc_license')
                      timestamp = try_get(
                          page_config, lambda x: x['clip']['uploaded_on'],
                          compat_str)
-                config_json = self._download_webpage(config_url, video_id)
-                config = json.loads(config_json)
+                    video_description = clean_html(dict_get(
+                        page_config, ('description', 'description_html_escaped')))
+                config = self._download_json(config_url, video_id)
              except RegexNotFoundError:
                  # For pro videos or player.vimeo.com urls
                  # We try to find out to which variable is assigned the config dic
@@ -675,14 +694,14 @@ class VimeoIE(VimeoBaseInfoExtractor):
                      {'force_feature_id': True}), 'Vimeo')
  
          # Extract video description
-
-        video_description = self._html_search_regex(
-            r'(?s)<div\s+class="[^"]*description[^"]*"[^>]*>(.*?)</div>',
-            webpage, 'description', default=None)
+        if not video_description:
+            video_description = self._html_search_regex(
+                r'(?s)<div\s+class="[^"]*description[^"]*"[^>]*>(.*?)</div>',
+                webpage, 'description', default=None)
          if not video_description:
              video_description = self._html_search_meta(
                  'description', webpage, default=None)
-        if not video_description and mobj.group('pro'):
+        if not video_description and is_pro:
              orig_webpage = self._download_webpage(
                  orig_url, video_id,
                  note='Downloading webpage for description',
@@ -690,7 +709,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
              if orig_webpage:
                  video_description = self._html_search_meta(
                      'description', orig_webpage, default=None)
-        if not video_description and not mobj.group('player'):
+        if not video_description and not is_player:
              self._downloader.report_warning('Cannot find video description')
  
          # Extract upload date
@@ -730,7 +749,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
          channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None
  
          info_dict = {
-            'id': video_id,
              'formats': formats,
              'timestamp': unified_timestamp(timestamp),
              'description': video_description,
@@ -748,9 +766,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
          return info_dict
  
  
-class VimeoOndemandIE(VimeoBaseInfoExtractor):
+class VimeoOndemandIE(VimeoIE):
      IE_NAME = 'vimeo:ondemand'
-    _VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/([^/]+/)?(?P<id>[^/?#&]+)'
      _TESTS = [{
          # ondemand video not available via https://vimeo.com/id
          'url': 'https://vimeo.com/ondemand/20704',
@@ -762,24 +780,32 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor):
              'uploader': 'גם סרטים',
              'uploader_url': r're:https?://(?:www\.)?vimeo\.com/gumfilms',
              'uploader_id': 'gumfilms',
+            'description': 'md5:4c027c965e439de4baab621e48b60791',
+            'upload_date': '20140906',
+            'timestamp': 1410032453,
          },
          'params': {
              'format': 'best[protocol=https]',
          },
+        'expected_warnings': ['Unable to download JSON metadata'],
      }, {
          # requires Referer to be passed along with og:video:url
          'url': 'https://vimeo.com/ondemand/36938/126682985',
          'info_dict': {
-            'id': '126682985',
+            'id': '126584684',
              'ext': 'mp4',
              'title': 'Rävlock, rätt läte på rätt plats',
              'uploader': 'Lindroth & Norin',
-            'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user14430847',
-            'uploader_id': 'user14430847',
+            'uploader_url': r're:https?://(?:www\.)?vimeo\.com/lindrothnorin',
+            'uploader_id': 'lindrothnorin',
+            'description': 'md5:c3c46a90529612c8279fb6af803fc0df',
+            'upload_date': '20150502',
+            'timestamp': 1430586422,
          },
          'params': {
              'skip_download': True,
          },
+        'expected_warnings': ['Unable to download JSON metadata'],
      }, {
          'url': 'https://vimeo.com/ondemand/nazmaalik',
          'only_matching': True,
@@ -791,16 +817,6 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor):
          'only_matching': True,
      }]
  
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        return self.url_result(
-            # Some videos require Referer to be passed along with og:video:url
-            # similarly to generic vimeo embeds (e.g.
-            # https://vimeo.com/ondemand/36938/126682985).
-            VimeoIE._smuggle_referrer(self._og_search_video_url(webpage), url),
-            VimeoIE.ie_key())
-
  
  class VimeoChannelIE(VimeoBaseInfoExtractor):
      IE_NAME = 'vimeo:channel'
@@ -816,6 +832,7 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
          },
          'playlist_mincount': 25,
      }]
+    _BASE_URL_TEMPL = 'https://vimeo.com/channels/%s'
  
      def _page_url(self, base_url, pagenum):
          return '%s/videos/page:%d/' % (base_url, pagenum)
@@ -887,14 +904,13 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
          return self.playlist_result(title_and_entries, list_id, list_title)
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        channel_id = mobj.group('id')
-        return self._extract_videos(channel_id, 'https://vimeo.com/channels/%s' % channel_id)
+        channel_id = self._match_id(url)
+        return self._extract_videos(channel_id, self._BASE_URL_TEMPL % channel_id)
  
  
  class VimeoUserIE(VimeoChannelIE):
      IE_NAME = 'vimeo:user'
-    _VALID_URL = r'https://vimeo\.com/(?!(?:[0-9]+|watchlater)(?:$|[?#/]))(?P<name>[^/]+)(?:/videos|[#?]|$)'
+    _VALID_URL = r'https://vimeo\.com/(?!(?:[0-9]+|watchlater)(?:$|[?#/]))(?P<id>[^/]+)(?:/videos|[#?]|$)'
      _TITLE_RE = r'<a[^>]+?class="user">([^<>]+?)</a>'
      _TESTS = [{
          'url': 'https://vimeo.com/nkistudio/videos',
@@ -904,11 +920,7 @@ class VimeoUserIE(VimeoChannelIE):
          },
          'playlist_mincount': 66,
      }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        name = mobj.group('name')
-        return self._extract_videos(name, 'https://vimeo.com/%s' % name)
+    _BASE_URL_TEMPL = 'https://vimeo.com/%s'
  
  
  class VimeoAlbumIE(VimeoChannelIE):
@@ -939,7 +951,7 @@ class VimeoAlbumIE(VimeoChannelIE):
      def _fetch_page(self, album_id, authorizaion, hashed_pass, page):
          api_page = page + 1
          query = {
-            'fields': 'link',
+            'fields': 'link,uri',
              'page': api_page,
              'per_page': self._PAGE_SIZE,
          }
@@ -954,7 +966,9 @@ class VimeoAlbumIE(VimeoChannelIE):
              link = video.get('link')
              if not link:
                  continue
-            yield self.url_result(link, VimeoIE.ie_key(), VimeoIE._match_id(link))
+            uri = video.get('uri')
+            video_id = self._search_regex(r'/videos/(\d+)', uri, 'video_id', default=None) if uri else None
+            yield self.url_result(link, VimeoIE.ie_key(), video_id)
  
      def _real_extract(self, url):
          album_id = self._match_id(url)
@@ -968,25 +982,18 @@ class VimeoAlbumIE(VimeoChannelIE):
              r'<title>\s*(.+?)(?:\s+on Vimeo)?</title>', webpage, 'title', fatal=False))
  
  
-class VimeoGroupsIE(VimeoAlbumIE):
+class VimeoGroupsIE(VimeoChannelIE):
      IE_NAME = 'vimeo:group'
-    _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)(?:/(?!videos?/\d+)|$)'
+    _VALID_URL = r'https://vimeo\.com/groups/(?P<id>[^/]+)(?:/(?!videos?/\d+)|$)'
      _TESTS = [{
-        'url': 'https://vimeo.com/groups/rolexawards',
+        'url': 'https://vimeo.com/groups/kattykay',
          'info_dict': {
-            'id': 'rolexawards',
-            'title': 'Rolex Awards for Enterprise',
+            'id': 'kattykay',
+            'title': 'Katty Kay',
          },
-        'playlist_mincount': 73,
+        'playlist_mincount': 27,
      }]
-
-    def _extract_list_title(self, webpage):
-        return self._og_search_title(webpage, fatal=False)
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        name = mobj.group('name')
-        return self._extract_videos(name, 'https://vimeo.com/groups/%s' % name)
+    _BASE_URL_TEMPL = 'https://vimeo.com/groups/%s'
  
  
  class VimeoReviewIE(VimeoBaseInfoExtractor):
@@ -1002,7 +1009,9 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
              'title': "DICK HARDWICK 'Comedian'",
              'uploader': 'Richard Hardwick',
              'uploader_id': 'user21297594',
-        }
+            'description': "Comedian Dick Hardwick's five minute demo filmed in front of a live theater audience.\nEdit by Doug Mattocks",
+        },
+        'expected_warnings': ['Unable to download JSON metadata'],
      }, {
          'note': 'video player needs Referer',
          'url': 'https://vimeo.com/user22258446/review/91613211/13f927e053',
@@ -1015,7 +1024,8 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
              'duration': 2773,
              'thumbnail': r're:^https?://.*\.jpg$',
              'uploader_id': 'user22258446',
-        }
+        },
+        'skip': 'video gone',
      }, {
          'note': 'Password protected',
          'url': 'https://vimeo.com/user37284429/review/138823582/c4d865efde',
@@ -1035,33 +1045,20 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
      def _real_initialize(self):
          self._login()
  
-    def _get_config_url(self, webpage_url, video_id, video_password_verified=False):
-        webpage = self._download_webpage(webpage_url, video_id)
-        config_url = self._html_search_regex(
-            r'data-config-url=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
-            'config URL', default=None, group='url')
-        if not config_url:
-            data = self._parse_json(self._search_regex(
-                r'window\s*=\s*_extend\(window,\s*({.+?})\);', webpage, 'data',
-                default=NO_DEFAULT if video_password_verified else '{}'), video_id)
-            config = data.get('vimeo_esi', {}).get('config', {})
-            config_url = config.get('configUrl') or try_get(config, lambda x: x['clipData']['configUrl'])
-        if config_url is None:
-            self._verify_video_password(webpage_url, video_id, webpage)
-            config_url = self._get_config_url(
-                webpage_url, video_id, video_password_verified=True)
-        return config_url
-
      def _real_extract(self, url):
          page_url, video_id = re.match(self._VALID_URL, url).groups()
-        config_url = self._get_config_url(url, video_id)
+        clip_data = self._download_json(
+            page_url.replace('/review/', '/review/data/'),
+            video_id)['clipData']
+        config_url = clip_data['configUrl']
          config = self._download_json(config_url, video_id)
          info_dict = self._parse_config(config, video_id)
-        source_format = self._extract_original_format(page_url, video_id)
+        source_format = self._extract_original_format(
+            page_url + '/action', video_id)
          if source_format:
              info_dict['formats'].append(source_format)
          self._vimeo_sort_formats(info_dict['formats'])
-        info_dict['id'] = video_id
+        info_dict['description'] = clean_html(clip_data.get('description'))
          return info_dict
  
  
@@ -1115,94 +1112,17 @@ class VimeoLikesIE(VimeoChannelIE):
          return self._extract_videos(user_id, 'https://vimeo.com/%s/likes' % user_id)
  
  
-class VHXEmbedIE(InfoExtractor):
+class VHXEmbedIE(VimeoBaseInfoExtractor):
      IE_NAME = 'vhx:embed'
      _VALID_URL = r'https?://embed\.vhx\.tv/videos/(?P<id>\d+)'
  
-    def _call_api(self, video_id, access_token, path='', query=None):
-        return self._download_json(
-            'https://api.vhx.tv/videos/' + video_id + path, video_id, headers={
-                'Authorization': 'Bearer ' + access_token,
-            }, query=query)
-
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
-        credentials = self._parse_json(self._search_regex(
-            r'(?s)credentials\s*:\s*({.+?}),', webpage,
-            'config'), video_id, js_to_json)
-        access_token = credentials['access_token']
-
-        query = {}
-        for k, v in credentials.items():
-            if k in ('authorization', 'authUserToken', 'ticket') and v and v != 'undefined':
-                if k == 'authUserToken':
-                    query['auth_user_token'] = v
-                else:
-                    query[k] = v
-        files = self._call_api(video_id, access_token, '/files', query)
-
-        formats = []
-        for f in files:
-            href = try_get(f, lambda x: x['_links']['source']['href'])
-            if not href:
-                continue
-            method = f.get('method')
-            if method == 'hls':
-                formats.extend(self._extract_m3u8_formats(
-                    href, video_id, 'mp4', 'm3u8_native',
-                    m3u8_id='hls', fatal=False))
-            elif method == 'dash':
-                formats.extend(self._extract_mpd_formats(
-                    href, video_id, mpd_id='dash', fatal=False))
-            else:
-                fmt = {
-                    'filesize': int_or_none(try_get(f, lambda x: x['size']['bytes'])),
-                    'format_id': 'http',
-                    'preference': 1,
-                    'url': href,
-                    'vcodec': f.get('codec'),
-                }
-                quality = f.get('quality')
-                if quality:
-                    fmt.update({
-                        'format_id': 'http-' + quality,
-                        'height': int_or_none(self._search_regex(r'(\d+)p', quality, 'height', default=None)),
-                    })
-                formats.append(fmt)
-        self._sort_formats(formats)
-
-        video_data = self._call_api(video_id, access_token)
-        title = video_data.get('title') or video_data['name']
-
-        subtitles = {}
-        for subtitle in try_get(video_data, lambda x: x['tracks']['subtitles'], list) or []:
-            lang = subtitle.get('srclang') or subtitle.get('label')
-            for _link in subtitle.get('_links', {}).values():
-                href = _link.get('href')
-                if not href:
-                    continue
-                subtitles.setdefault(lang, []).append({
-                    'url': href,
-                })
-
-        q = qualities(['small', 'medium', 'large', 'source'])
-        thumbnails = []
-        for thumbnail_id, thumbnail_url in video_data.get('thumbnail', {}).items():
-            thumbnails.append({
-                'id': thumbnail_id,
-                'url': thumbnail_url,
-                'preference': q(thumbnail_id),
-            })
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': video_data.get('description'),
-            'duration': int_or_none(try_get(video_data, lambda x: x['duration']['seconds'])),
-            'formats': formats,
-            'subtitles': subtitles,
-            'thumbnails': thumbnails,
-            'timestamp': unified_timestamp(video_data.get('created_at')),
-            'view_count': int_or_none(video_data.get('plays_count')),
-        }
+        config_url = self._parse_json(self._search_regex(
+            r'window\.OTTData\s*=\s*({.+})', webpage,
+            'ott data'), video_id, js_to_json)['config_url']
+        config = self._download_json(config_url, video_id)
+        info = self._parse_config(config, video_id)
+        self._vimeo_sort_formats(info['formats'])
+        return info
diff --git a/youtube_dl/extractor/vk.py b/youtube_dl/extractor/vk.py
index 8b6dc0e244982f46cbc0d65bf5133b3505779fb0..00ec006c465e4f350bdc3f1099e6d38e5e2a8b8b 100644
--- a/youtube_dl/extractor/vk.py
+++ b/youtube_dl/extractor/vk.py
@@ -2,6 +2,7 @@
  from __future__ import unicode_literals
  
  import collections
+import functools
  import re
  
  from .common import InfoExtractor
@@ -11,8 +12,8 @@ from ..utils import (
      ExtractorError,
      get_element_by_class,
      int_or_none,
+    OnDemandPagedList,
      orderedSet,
-    remove_start,
      str_or_none,
      str_to_int,
      unescapeHTML,
@@ -21,6 +22,7 @@ from ..utils import (
      urlencode_postdata,
  )
  from .dailymotion import DailymotionIE
+from .odnoklassniki import OdnoklassnikiIE
  from .pladform import PladformIE
  from .vimeo import VimeoIE
  from .youtube import YoutubeIE
@@ -60,6 +62,18 @@ class VKBaseIE(InfoExtractor):
      def _real_initialize(self):
          self._login()
  
+    def _download_payload(self, path, video_id, data, fatal=True):
+        data['al'] = 1
+        code, payload = self._download_json(
+            'https://vk.com/%s.php' % path, video_id,
+            data=urlencode_postdata(data), fatal=fatal,
+            headers={'X-Requested-With': 'XMLHttpRequest'})['payload']
+        if code == '3':
+            self.raise_login_required()
+        elif code == '8':
+            raise ExtractorError(clean_html(payload[0][1:-1]), expected=True)
+        return payload
+
  
  class VKIE(VKBaseIE):
      IE_NAME = 'vk'
@@ -96,7 +110,6 @@ class VKIE(VKBaseIE):
          },
          {
              'url': 'http://vk.com/video205387401_165548505',
-            'md5': '6c0aeb2e90396ba97035b9cbde548700',
              'info_dict': {
                  'id': '205387401_165548505',
                  'ext': 'mp4',
@@ -110,18 +123,18 @@ class VKIE(VKBaseIE):
          },
          {
              'note': 'Embedded video',
-            'url': 'http://vk.com/video_ext.php?oid=32194266&id=162925554&hash=7d8c2e0d5e05aeaa&hd=1',
-            'md5': 'c7ce8f1f87bec05b3de07fdeafe21a0a',
+            'url': 'https://vk.com/video_ext.php?oid=-77521&id=162222515&hash=87b046504ccd8bfa',
+            'md5': '7babad3b85ea2e91948005b1b8b0cb84',
              'info_dict': {
-                'id': '32194266_162925554',
+                'id': '-77521_162222515',
                  'ext': 'mp4',
-                'uploader': 'Vladimir Gavrin',
-                'title': 'Lin Dan',
-                'duration': 101,
-                'upload_date': '20120730',
-                'view_count': int,
+                'uploader': 're:(?:Noize MC|Alexander Ilyashenko).*',
+                'title': 'ProtivoGunz - Хуёвая песня',
+                'duration': 195,
+                'upload_date': '20120212',
+                'timestamp': 1329049880,
+                'uploader_id': '-77521',
              },
-            'skip': 'This video has been removed from public access.',
          },
          {
              # VIDEO NOW REMOVED
@@ -138,18 +151,19 @@ class VKIE(VKBaseIE):
                  'upload_date': '20121218',
                  'view_count': int,
              },
-            'skip': 'Requires vk account credentials',
+            'skip': 'Removed',
          },
          {
              'url': 'http://vk.com/hd_kino_mania?z=video-43215063_168067957%2F15c66b9b533119788d',
-            'md5': '4d7a5ef8cf114dfa09577e57b2993202',
              'info_dict': {
                  'id': '-43215063_168067957',
                  'ext': 'mp4',
-                'uploader': 'Киномания - лучшее из мира кино',
+                'uploader': 'Bro Mazter',
                  'title': ' ',
                  'duration': 7291,
                  'upload_date': '20140328',
+                'uploader_id': '223413403',
+                'timestamp': 1396018030,
              },
              'skip': 'Requires vk account credentials',
          },
@@ -165,7 +179,7 @@ class VKIE(VKBaseIE):
                  'upload_date': '20140626',
                  'view_count': int,
              },
-            'skip': 'Only works from Russia',
+            'skip': 'Removed',
          },
          {
              # video (removed?) only available with list id
@@ -204,8 +218,7 @@ class VKIE(VKBaseIE):
                  'id': 'k3lz2cmXyRuJQSjGHUv',
                  'ext': 'mp4',
                  'title': 'md5:d52606645c20b0ddbb21655adaa4f56f',
-                # TODO: fix test by fixing dailymotion description extraction
-                'description': 'md5:c651358f03c56f1150b555c26d90a0fd',
+                'description': 'md5:424b8e88cc873217f520e582ba28bb36',
                  'uploader': 'AniLibria.Tv',
                  'upload_date': '20160914',
                  'uploader_id': 'x1p5vl5',
@@ -247,6 +260,9 @@ class VKIE(VKBaseIE):
                  'uploader_id': '-387766',
                  'timestamp': 1475137527,
              },
+            'params': {
+                'skip_download': True,
+            },
          },
          {
              # live stream, hls and rtmp links, most likely already finished live
@@ -288,80 +304,94 @@ class VKIE(VKBaseIE):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('videoid')
  
+        mv_data = {}
          if video_id:
-            info_url = 'https://vk.com/al_video.php?act=show_inline&al=1&video=' + video_id
+            data = {
+                'act': 'show_inline',
+                'video': video_id,
+            }
              # Some videos (removed?) can only be downloaded with list id specified
              list_id = mobj.group('list_id')
              if list_id:
-                info_url += '&list=%s' % list_id
+                data['list'] = list_id
+
+            payload = self._download_payload('al_video', video_id, data)
+            info_page = payload[1]
+            opts = payload[-1]
+            mv_data = opts.get('mvData') or {}
+            player = opts.get('player') or {}
          else:
-            info_url = 'http://vk.com/video_ext.php?' + mobj.group('embed_query')
              video_id = '%s_%s' % (mobj.group('oid'), mobj.group('id'))
  
-        info_page = self._download_webpage(info_url, video_id)
+            info_page = self._download_webpage(
+                'http://vk.com/video_ext.php?' + mobj.group('embed_query'), video_id)
  
-        error_message = self._html_search_regex(
-            [r'(?s)<!><div[^>]+class="video_layer_message"[^>]*>(.+?)</div>',
-                r'(?s)<div[^>]+id="video_ext_msg"[^>]*>(.+?)</div>'],
-            info_page, 'error message', default=None)
-        if error_message:
-            raise ExtractorError(error_message, expected=True)
+            error_message = self._html_search_regex(
+                [r'(?s)<!><div[^>]+class="video_layer_message"[^>]*>(.+?)</div>',
+                    r'(?s)<div[^>]+id="video_ext_msg"[^>]*>(.+?)</div>'],
+                info_page, 'error message', default=None)
+            if error_message:
+                raise ExtractorError(error_message, expected=True)
  
-        if re.search(r'<!>/login\.php\?.*\bact=security_check', info_page):
-            raise ExtractorError(
-                'You are trying to log in from an unusual location. You should confirm ownership at vk.com to log in with this IP.',
-                expected=True)
+            if re.search(r'<!>/login\.php\?.*\bact=security_check', info_page):
+                raise ExtractorError(
+                    'You are trying to log in from an unusual location. You should confirm ownership at vk.com to log in with this IP.',
+                    expected=True)
  
-        ERROR_COPYRIGHT = 'Video %s has been removed from public access due to rightholder complaint.'
+            ERROR_COPYRIGHT = 'Video %s has been removed from public access due to rightholder complaint.'
  
-        ERRORS = {
-            r'>Видеозапись .*? была изъята из публичного доступа в связи с обращением правообладателя.<':
-            ERROR_COPYRIGHT,
+            ERRORS = {
+                r'>Видеозапись .*? была изъята из публичного доступа в связи с обращением правообладателя.<':
+                ERROR_COPYRIGHT,
  
-            r'>The video .*? was removed from public access by request of the copyright holder.<':
-            ERROR_COPYRIGHT,
+                r'>The video .*? was removed from public access by request of the copyright holder.<':
+                ERROR_COPYRIGHT,
  
-            r'<!>Please log in or <':
-            'Video %s is only available for registered users, '
-            'use --username and --password options to provide account credentials.',
+                r'<!>Please log in or <':
+                'Video %s is only available for registered users, '
+                'use --username and --password options to provide account credentials.',
  
-            r'<!>Unknown error':
-            'Video %s does not exist.',
+                r'<!>Unknown error':
+                'Video %s does not exist.',
  
-            r'<!>Видео временно недоступно':
-            'Video %s is temporarily unavailable.',
+                r'<!>Видео временно недоступно':
+                'Video %s is temporarily unavailable.',
  
-            r'<!>Access denied':
-            'Access denied to video %s.',
+                r'<!>Access denied':
+                'Access denied to video %s.',
  
-            r'<!>Видеозапись недоступна, так как её автор был заблокирован.':
-            'Video %s is no longer available, because its author has been blocked.',
+                r'<!>Видеозапись недоступна, так как её автор был заблокирован.':
+                'Video %s is no longer available, because its author has been blocked.',
  
-            r'<!>This video is no longer available, because its author has been blocked.':
-            'Video %s is no longer available, because its author has been blocked.',
+                r'<!>This video is no longer available, because its author has been blocked.':
+                'Video %s is no longer available, because its author has been blocked.',
  
-            r'<!>This video is no longer available, because it has been deleted.':
-            'Video %s is no longer available, because it has been deleted.',
+                r'<!>This video is no longer available, because it has been deleted.':
+                'Video %s is no longer available, because it has been deleted.',
  
-            r'<!>The video .+? is not available in your region.':
-            'Video %s is not available in your region.',
-        }
+                r'<!>The video .+? is not available in your region.':
+                'Video %s is not available in your region.',
+            }
+
+            for error_re, error_msg in ERRORS.items():
+                if re.search(error_re, info_page):
+                    raise ExtractorError(error_msg % video_id, expected=True)
  
-        for error_re, error_msg in ERRORS.items():
-            if re.search(error_re, info_page):
-                raise ExtractorError(error_msg % video_id, expected=True)
+            player = self._parse_json(self._search_regex(
+                r'var\s+playerParams\s*=\s*({.+?})\s*;\s*\n',
+                info_page, 'player params'), video_id)
  
          youtube_url = YoutubeIE._extract_url(info_page)
          if youtube_url:
-            return self.url_result(youtube_url, ie=YoutubeIE.ie_key())
+            return self.url_result(youtube_url, YoutubeIE.ie_key())
  
          vimeo_url = VimeoIE._extract_url(url, info_page)
          if vimeo_url is not None:
-            return self.url_result(vimeo_url)
+            return self.url_result(vimeo_url, VimeoIE.ie_key())
  
          pladform_url = PladformIE._extract_url(info_page)
          if pladform_url:
-            return self.url_result(pladform_url)
+            return self.url_result(pladform_url, PladformIE.ie_key())
  
          m_rutube = re.search(
              r'\ssrc="((?:https?:)?//rutube\.ru\\?/(?:video|play)\\?/embed(?:.*?))\\?"', info_page)
@@ -374,6 +404,10 @@ class VKIE(VKBaseIE):
          if dailymotion_urls:
              return self.url_result(dailymotion_urls[0], DailymotionIE.ie_key())
  
+        odnoklassniki_url = OdnoklassnikiIE._extract_url(info_page)
+        if odnoklassniki_url:
+            return self.url_result(odnoklassniki_url, OdnoklassnikiIE.ie_key())
+
          m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.+?});', info_page)
          if m_opts:
              m_opts_url = re.search(r"url\s*:\s*'((?!/\b)[^']+)", m_opts.group(1))
@@ -383,38 +417,7 @@ class VKIE(VKBaseIE):
                      opts_url = 'http:' + opts_url
                  return self.url_result(opts_url)
  
-        # vars does not look to be served anymore since 24.10.2016
-        data = self._parse_json(
-            self._search_regex(
-                r'var\s+vars\s*=\s*({.+?});', info_page, 'vars', default='{}'),
-            video_id, fatal=False)
-
-        # <!json> is served instead
-        if not data:
-            data = self._parse_json(
-                self._search_regex(
-                    [r'<!json>\s*({.+?})\s*<!>', r'<!json>\s*({.+})'],
-                    info_page, 'json', default='{}'),
-                video_id)
-            if data:
-                data = data['player']['params'][0]
-
-        if not data:
-            data = self._parse_json(
-                self._search_regex(
-                    r'var\s+playerParams\s*=\s*({.+?})\s*;\s*\n', info_page,
-                    'player params', default='{}'),
-                video_id)
-            if data:
-                data = data['params'][0]
-
-        # <!--{...}
-        if not data:
-            data = self._parse_json(
-                self._search_regex(
-                    r'<!--\s*({.+})', info_page, 'payload'),
-                video_id)['payload'][-1][-1]['player']['params'][0]
-
+        data = player['params'][0]
          title = unescapeHTML(data['md_title'])
  
          # 2 = live
@@ -463,12 +466,12 @@ class VKIE(VKBaseIE):
              'title': title,
              'thumbnail': data.get('jpg'),
              'uploader': data.get('md_author'),
-            'uploader_id': str_or_none(data.get('author_id')),
-            'duration': data.get('duration'),
+            'uploader_id': str_or_none(data.get('author_id') or mv_data.get('authorId')),
+            'duration': int_or_none(data.get('duration') or mv_data.get('duration')),
              'timestamp': timestamp,
              'view_count': view_count,
-            'like_count': int_or_none(data.get('liked')),
-            'dislike_count': int_or_none(data.get('nolikes')),
+            'like_count': int_or_none(mv_data.get('likes')),
+            'comment_count': int_or_none(mv_data.get('commcount')),
              'is_live': is_live,
          }
  
@@ -476,15 +479,23 @@ class VKIE(VKBaseIE):
  class VKUserVideosIE(VKBaseIE):
      IE_NAME = 'vk:uservideos'
      IE_DESC = "VK - User's Videos"
-    _VALID_URL = r'https?://(?:(?:m|new)\.)?vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&]|$)'
+    _VALID_URL = r'https?://(?:(?:m|new)\.)?vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&](?:.*?\bsection=(?P<section>\w+))?|$)'
      _TEMPLATE_URL = 'https://vk.com/videos'
      _TESTS = [{
-        'url': 'http://vk.com/videos205387401',
+        'url': 'https://vk.com/videos-767561',
+        'info_dict': {
+            'id': '-767561_all',
+        },
+        'playlist_mincount': 1150,
+    }, {
+        'url': 'https://vk.com/videos-767561?section=uploaded',
          'info_dict': {
-            'id': '205387401',
-            'title': "Tom Cruise's Videos",
+            'id': '-767561_uploaded',
          },
-        'playlist_mincount': 4,
+        'playlist_mincount': 425,
+    }, {
+        'url': 'http://vk.com/videos205387401',
+        'only_matching': True,
      }, {
          'url': 'http://vk.com/videos-77521',
          'only_matching': True,
@@ -498,22 +509,33 @@ class VKUserVideosIE(VKBaseIE):
          'url': 'http://new.vk.com/videos205387401',
          'only_matching': True,
      }]
+    _PAGE_SIZE = 1000
+    _VIDEO = collections.namedtuple('Video', ['owner_id', 'id'])
+
+    def _fetch_page(self, page_id, section, page):
+        l = self._download_payload('al_video', page_id, {
+            'act': 'load_videos_silent',
+            'offset': page * self._PAGE_SIZE,
+            'oid': page_id,
+            'section': section,
+        })[0][section]['list']
+
+        for video in l:
+            v = self._VIDEO._make(video[:2])
+            video_id = '%d_%d' % (v.owner_id, v.id)
+            yield self.url_result(
+                'http://vk.com/video' + video_id, VKIE.ie_key(), video_id)
  
      def _real_extract(self, url):
-        page_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, page_id)
-
-        entries = [
-            self.url_result(
-                'http://vk.com/video' + video_id, 'VK', video_id=video_id)
-            for video_id in orderedSet(re.findall(r'href="/video(-?[0-9_]+)"', webpage))]
+        page_id, section = re.match(self._VALID_URL, url).groups()
+        if not section:
+            section = 'all'
  
-        title = unescapeHTML(self._search_regex(
-            r'<title>\s*([^<]+?)\s+\|\s+\d+\s+videos',
-            webpage, 'title', default=page_id))
+        entries = OnDemandPagedList(
+            functools.partial(self._fetch_page, page_id, section),
+            self._PAGE_SIZE)
  
-        return self.playlist_result(entries, page_id, title)
+        return self.playlist_result(entries, '%s_%s' % (page_id, section))
  
  
  class VKWallPostIE(VKBaseIE):
@@ -523,15 +545,15 @@ class VKWallPostIE(VKBaseIE):
          # public page URL, audio playlist
          'url': 'https://vk.com/bs.official?w=wall-23538238_35',
          'info_dict': {
-            'id': '23538238_35',
-            'title': 'Black Shadow - Wall post 23538238_35',
+            'id': '-23538238_35',
+            'title': 'Black Shadow - Wall post -23538238_35',
              'description': 'md5:3f84b9c4f9ef499731cf1ced9998cc0c',
          },
          'playlist': [{
              'md5': '5ba93864ec5b85f7ce19a9af4af080f6',
              'info_dict': {
                  'id': '135220665_111806521',
-                'ext': 'mp3',
+                'ext': 'mp4',
                  'title': 'Black Shadow - Слепое Верование',
                  'duration': 370,
                  'uploader': 'Black Shadow',
@@ -542,18 +564,16 @@ class VKWallPostIE(VKBaseIE):
              'md5': '4cc7e804579122b17ea95af7834c9233',
              'info_dict': {
                  'id': '135220665_111802303',
-                'ext': 'mp3',
+                'ext': 'mp4',
                  'title': 'Black Shadow - Война - Негасимое Бездны Пламя!',
                  'duration': 423,
                  'uploader': 'Black Shadow',
                  'artist': 'Black Shadow',
                  'track': 'Война - Негасимое Бездны Пламя!',
              },
-            'params': {
-                'skip_download': True,
-            },
          }],
          'params': {
+            'skip_download': True,
              'usenetrc': True,
          },
          'skip': 'Requires vk account credentials',
@@ -562,7 +582,7 @@ class VKWallPostIE(VKBaseIE):
          'url': 'https://vk.com/wall85155021_6319',
          'info_dict': {
              'id': '85155021_6319',
-            'title': 'Sergey Gorbunov - Wall post 85155021_6319',
+            'title': 'Сергей Горбунов - Wall post 85155021_6319',
          },
          'playlist_count': 1,
          'params': {
@@ -578,58 +598,72 @@ class VKWallPostIE(VKBaseIE):
          'url': 'https://m.vk.com/wall-23538238_35',
          'only_matching': True,
      }]
+    _BASE64_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN0PQRSTUVWXYZO123456789+/='
+    _AUDIO = collections.namedtuple('Audio', ['id', 'owner_id', 'url', 'title', 'performer', 'duration', 'album_id', 'unk', 'author_link', 'lyrics', 'flags', 'context', 'extra', 'hashes', 'cover_url', 'ads'])
+
+    def _decode(self, enc):
+        dec = ''
+        e = n = 0
+        for c in enc:
+            r = self._BASE64_CHARS.index(c)
+            cond = n % 4
+            e = 64 * e + r if cond else r
+            n += 1
+            if cond:
+                dec += chr(255 & e >> (-2 * n & 6))
+        return dec
+
+    def _unmask_url(self, mask_url, vk_id):
+        if 'audio_api_unavailable' in mask_url:
+            extra = mask_url.split('?extra=')[1].split('#')
+            func, base = self._decode(extra[1]).split(chr(11))
+            mask_url = list(self._decode(extra[0]))
+            url_len = len(mask_url)
+            indexes = [None] * url_len
+            index = int(base) ^ vk_id
+            for n in range(url_len - 1, -1, -1):
+                index = (url_len * (n + 1) ^ index + n) % url_len
+                indexes[n] = index
+            for n in range(1, url_len):
+                c = mask_url[n]
+                index = indexes[url_len - 1 - n]
+                mask_url[n] = mask_url[index]
+                mask_url[index] = c
+            mask_url = ''.join(mask_url)
+        return mask_url
  
      def _real_extract(self, url):
          post_id = self._match_id(url)
  
-        wall_url = 'https://vk.com/wall%s' % post_id
-
-        post_id = remove_start(post_id, '-')
-
-        webpage = self._download_webpage(wall_url, post_id)
-
-        error = self._html_search_regex(
-            r'>Error</div>\s*<div[^>]+class=["\']body["\'][^>]*>([^<]+)',
-            webpage, 'error', default=None)
-        if error:
-            raise ExtractorError('VK said: %s' % error, expected=True)
+        webpage = self._download_payload('wkview', post_id, {
+            'act': 'show',
+            'w': 'wall' + post_id,
+        })[1]
  
          description = clean_html(get_element_by_class('wall_post_text', webpage))
          uploader = clean_html(get_element_by_class('author', webpage))
-        thumbnail = self._og_search_thumbnail(webpage)
  
          entries = []
  
-        audio_ids = re.findall(r'data-full-id=["\'](\d+_\d+)', webpage)
-        if audio_ids:
-            al_audio = self._download_webpage(
-                'https://vk.com/al_audio.php', post_id,
-                note='Downloading audio info', fatal=False,
-                data=urlencode_postdata({
-                    'act': 'reload_audio',
-                    'al': '1',
-                    'ids': ','.join(audio_ids)
-                }))
-            if al_audio:
-                Audio = collections.namedtuple(
-                    'Audio', ['id', 'user_id', 'url', 'track', 'artist', 'duration'])
-                audios = self._parse_json(
-                    self._search_regex(
-                        r'<!json>(.+?)<!>', al_audio, 'audios', default='[]'),
-                    post_id, fatal=False, transform_source=unescapeHTML)
-                if isinstance(audios, list):
-                    for audio in audios:
-                        a = Audio._make(audio[:6])
-                        entries.append({
-                            'id': '%s_%s' % (a.user_id, a.id),
-                            'url': a.url,
-                            'title': '%s - %s' % (a.artist, a.track) if a.artist and a.track else a.id,
-                            'thumbnail': thumbnail,
-                            'duration': a.duration,
-                            'uploader': uploader,
-                            'artist': a.artist,
-                            'track': a.track,
-                        })
+        for audio in re.findall(r'data-audio="([^"]+)', webpage):
+            audio = self._parse_json(unescapeHTML(audio), post_id)
+            a = self._AUDIO._make(audio[:16])
+            if not a.url:
+                continue
+            title = unescapeHTML(a.title)
+            performer = unescapeHTML(a.performer)
+            entries.append({
+                'id': '%s_%s' % (a.owner_id, a.id),
+                'url': self._unmask_url(a.url, a.ads['vk_id']),
+                'title': '%s - %s' % (performer, title) if performer else title,
+                'thumbnails': [{'url': c_url} for c_url in a.cover_url.split(',')] if a.cover_url else None,
+                'duration': int_or_none(a.duration),
+                'uploader': uploader,
+                'artist': performer,
+                'track': title,
+                'ext': 'mp4',
+                'protocol': 'm3u8',
+            })
  
          for video in re.finditer(
                  r'<a[^>]+href=(["\'])(?P<url>/video(?:-?[\d_]+).*?)\1', webpage):
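Note on the VK hunk above: the added `_decode` helper is a base64-style decoder over a non-standard alphabet (lowercase letters first, with `0` and `O` swapped). A minimal standalone sketch of the same loop, for experimenting outside the extractor; the names `VK_ALPHABET` and `vk_decode` are hypothetical and not part of youtube-dl:

```python
# Alphabet copied from the _BASE64_CHARS constant added in the hunk above.
# Compared with standard base64, lowercase comes first and '0'/'O' are swapped.
VK_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN0PQRSTUVWXYZO123456789+/='


def vk_decode(enc):
    """Standalone copy of the _decode loop (hypothetical helper name)."""
    dec = ''
    e = n = 0
    for c in enc:
        r = VK_ALPHABET.index(c)       # 6-bit value of this character
        cond = n % 4                   # position inside the 4-character group
        e = 64 * e + r if cond else r  # accumulate bits; reset at group start
        n += 1
        if cond:
            # Emit one byte for every character except the first of a group.
            dec += chr(255 & e >> (-2 * n & 6))
    return dec


print(vk_decode('sgK'))  # prints: Hi
```

The sketch only covers the decoding step; the index shuffling in `_unmask_url` is not reproduced here.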
diff --git a/youtube_dl/extractor/vlive.py b/youtube_dl/extractor/vlive.py

index c3429f723ddec36cb89dfc9f329f766f45b953e1..f79531e6f3a2e922b0369706cc0d76a22feb2499 100644
--- a/youtube_dl/extractor/vlive.py
+++ b/youtube_dl/extractor/vlive.py
@@ -6,22 +6,18 @@ import time
  import itertools
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlencode,
-    compat_str,
-)
+from .naver import NaverBaseIE
+from ..compat import compat_str
  from ..utils import (
-    dict_get,
      ExtractorError,
-    float_or_none,
-    int_or_none,
+    merge_dicts,
      remove_start,
      try_get,
      urlencode_postdata,
  )
  
  
-class VLiveIE(InfoExtractor):
+class VLiveIE(NaverBaseIE):
      IE_NAME = 'vlive'
      _VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
      _NETRC_MACHINE = 'vlive'
@@ -34,6 +30,7 @@ class VLiveIE(InfoExtractor):
              'title': "[V LIVE] Girl's Day's Broadcast",
              'creator': "Girl's Day",
              'view_count': int,
+            'uploader_id': 'muploader_a',
          },
      }, {
          'url': 'http://www.vlive.tv/video/16937',
@@ -44,6 +41,7 @@ class VLiveIE(InfoExtractor):
              'creator': 'EXO',
              'view_count': int,
              'subtitles': 'mincount:12',
+            'uploader_id': 'muploader_j',
          },
          'params': {
              'skip_download': True,
@@ -187,45 +185,9 @@ class VLiveIE(InfoExtractor):
                      'This video is only available for CH+ subscribers')
              long_video_id, key = video_info['vid'], video_info['inkey']
  
-        playinfo = self._download_json(
-            'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
-            % compat_urllib_parse_urlencode({
-                'videoId': long_video_id,
-                'key': key,
-                'ptc': 'http',
-                'doct': 'json',  # document type (xml or json)
-                'cpt': 'vtt',  # captions type (vtt or ttml)
-            }), video_id)
-
-        formats = [{
-            'url': vid['source'],
-            'format_id': vid.get('encodingOption', {}).get('name'),
-            'abr': float_or_none(vid.get('bitrate', {}).get('audio')),
-            'vbr': float_or_none(vid.get('bitrate', {}).get('video')),
-            'width': int_or_none(vid.get('encodingOption', {}).get('width')),
-            'height': int_or_none(vid.get('encodingOption', {}).get('height')),
-            'filesize': int_or_none(vid.get('size')),
-        } for vid in playinfo.get('videos', {}).get('list', []) if vid.get('source')]
-        self._sort_formats(formats)
-
-        view_count = int_or_none(playinfo.get('meta', {}).get('count'))
-
-        subtitles = {}
-        for caption in playinfo.get('captions', {}).get('list', []):
-            lang = dict_get(caption, ('locale', 'language', 'country', 'label'))
-            if lang and caption.get('source'):
-                subtitles[lang] = [{
-                    'ext': 'vtt',
-                    'url': caption['source']}]
-
-        info = self._get_common_fields(webpage)
-        info.update({
-            'id': video_id,
-            'formats': formats,
-            'view_count': view_count,
-            'subtitles': subtitles,
-        })
-        return info
+        return merge_dicts(
+            self._get_common_fields(webpage),
+            self._extract_video_info(video_id, long_video_id, key))
  
      def _download_init_page(self, video_id):
          return self._download_webpage(
diff --git a/youtube_dl/extractor/vodplatform.py b/youtube_dl/extractor/vodplatform.py

index 239644340384b60c8e1a80d40b50cabbd0fd2c9e..74d2257e7c059cdb410db69d77c25e1c7e8c9902 100644
--- a/youtube_dl/extractor/vodplatform.py
+++ b/youtube_dl/extractor/vodplatform.py
@@ -6,8 +6,8 @@ from ..utils import unescapeHTML
  
  
  class VODPlatformIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vod-platform\.net/[eE]mbed/(?P<id>[^/?#]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/(?P<id>[^/?#]+)'
+    _TESTS = [{
          # from http://www.lbcgroup.tv/watch/chapter/29143/52844/%D8%A7%D9%84%D9%86%D8%B5%D8%B1%D8%A9-%D9%81%D9%8A-%D8%B6%D9%8A%D8%A7%D9%81%D8%A9-%D8%A7%D9%84%D9%80-cnn/ar
          'url': 'http://vod-platform.net/embed/RufMcytHDolTH1MuKHY9Fw',
          'md5': '1db2b7249ce383d6be96499006e951fc',
@@ -16,7 +16,10 @@ class VODPlatformIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'LBCi News_ النصرة في ضيافة الـ "سي.أن.أن"',
          }
-    }
+    }, {
+        'url': 'http://embed.kwikmotion.com/embed/RufMcytHDolTH1MuKHY9Fw',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
diff --git a/youtube_dl/extractor/voicerepublic.py b/youtube_dl/extractor/voicerepublic.py

index 59e1359c48628af9b4c53bedc337fa6b9b3d1396..a52e40afa2892a10538251ba40e4d2a44a10a67d 100644
--- a/youtube_dl/extractor/voicerepublic.py
+++ b/youtube_dl/extractor/voicerepublic.py
@@ -1,17 +1,12 @@
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urlparse,
-)
+from ..compat import compat_str
  from ..utils import (
      ExtractorError,
      determine_ext,
      int_or_none,
-    sanitized_Request,
+    urljoin,
  )
  
  
@@ -26,8 +21,7 @@ class VoiceRepublicIE(InfoExtractor):
              'ext': 'm4a',
              'title': 'Watching the Watchers: Building a Sousveillance State',
              'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
-            'thumbnail': r're:^https?://.*\.(?:png|jpg)$',
-            'duration': 1800,
+            'duration': 1556,
              'view_count': int,
          }
      }, {
@@ -38,63 +32,31 @@ class VoiceRepublicIE(InfoExtractor):
      def _real_extract(self, url):
          display_id = self._match_id(url)
  
-        req = sanitized_Request(
-            compat_urlparse.urljoin(url, '/talks/%s' % display_id))
-        # Older versions of Firefox get redirected to an "upgrade browser" page
-        req.add_header('User-Agent', 'youtube-dl')
-        webpage = self._download_webpage(req, display_id)
+        webpage = self._download_webpage(url, display_id)
  
          if '>Queued for processing, please stand by...<' in webpage:
              raise ExtractorError(
                  'Audio is still queued for processing', expected=True)
  
-        config = self._search_regex(
-            r'(?s)return ({.+?});\s*\n', webpage,
-            'data', default=None)
-        data = self._parse_json(config, display_id, fatal=False) if config else None
-        if data:
-            title = data['title']
-            description = data.get('teaser')
-            talk_id = compat_str(data.get('talk_id') or display_id)
-            talk = data['talk']
-            duration = int_or_none(talk.get('duration'))
-            formats = [{
-                'url': compat_urlparse.urljoin(url, talk_url),
-                'format_id': format_id,
-                'ext': determine_ext(talk_url) or format_id,
-                'vcodec': 'none',
-            } for format_id, talk_url in talk['links'].items()]
-        else:
-            title = self._og_search_title(webpage)
-            description = self._html_search_regex(
-                r"(?s)<div class='talk-teaser'[^>]*>(.+?)</div>",
-                webpage, 'description', fatal=False)
-            talk_id = self._search_regex(
-                [r"id='jc-(\d+)'", r"data-shareable-id='(\d+)'"],
-                webpage, 'talk id', default=None) or display_id
-            duration = None
-            player = self._search_regex(
-                r"class='vr-player jp-jplayer'([^>]+)>", webpage, 'player')
-            formats = [{
-                'url': compat_urlparse.urljoin(url, talk_url),
-                'format_id': format_id,
-                'ext': determine_ext(talk_url) or format_id,
-                'vcodec': 'none',
-            } for format_id, talk_url in re.findall(r"data-([^=]+)='([^']+)'", player)]
+        talk = self._parse_json(self._search_regex(
+            r'initialSnapshot\s*=\s*({.+?});',
+            webpage, 'talk'), display_id)['talk']
+        title = talk['title']
+        formats = [{
+            'url': urljoin(url, talk_url),
+            'format_id': format_id,
+            'ext': determine_ext(talk_url) or format_id,
+            'vcodec': 'none',
+        } for format_id, talk_url in talk['media_links'].items()]
          self._sort_formats(formats)
  
-        thumbnail = self._og_search_thumbnail(webpage)
-        view_count = int_or_none(self._search_regex(
-            r"class='play-count[^']*'>\s*(\d+) plays",
-            webpage, 'play count', fatal=False))
-
          return {
-            'id': talk_id,
+            'id': compat_str(talk.get('id') or display_id),
              'display_id': display_id,
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'view_count': view_count,
+            'description': talk.get('teaser'),
+            'thumbnail': talk.get('image_url'),
+            'duration': int_or_none(talk.get('archived_duration')),
+            'view_count': int_or_none(talk.get('play_count')),
              'formats': formats,
          }
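Note on the VoiceRepublic hunk above: the rewritten extractor reads all metadata from a single `initialSnapshot` JSON object embedded in the page. A minimal sketch of that lookup using only the standard library; the function name `parse_initial_snapshot` and the sample page string are made up for illustration and do not reflect the site's real markup:

```python
import json
import re


def parse_initial_snapshot(webpage):
    """Return the 'talk' object from an embedded initialSnapshot, or None."""
    # Same pattern as in the hunk above: a lazy match up to the first '};'.
    m = re.search(r'initialSnapshot\s*=\s*({.+?});', webpage)
    if not m:
        return None
    return json.loads(m.group(1))['talk']


# Hypothetical page fragment, not captured from the real site.
page = ('<script>window.initialSnapshot = {"talk": {"id": 7, "title": "T", '
        '"media_links": {"m4a": "/a.m4a"}}};</script>')
talk = parse_initial_snapshot(page)
print(talk['title'], sorted(talk['media_links']))
```

Because the match is lazy, a `};` sequence inside a string value would cut the match short, so this remains a sketch of the approach rather than a robust parser.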
diff --git a/youtube_dl/extractor/vzaar.py b/youtube_dl/extractor/vzaar.py

index 3336e6c152f80212468cc950c8da662ddf5998db..b7d02fca3ac426679b21f2ba7213e2ac6fdb701b 100644
--- a/youtube_dl/extractor/vzaar.py
+++ b/youtube_dl/extractor/vzaar.py
@@ -32,6 +32,18 @@ class VzaarIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'MP3',
          },
+    }, {
+        # hlsAes = true
+        'url': 'https://view.vzaar.com/11379930/player',
+        'info_dict': {
+            'id': '11379930',
+            'ext': 'mp4',
+            'title': 'Videoaula',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
      }, {
          # with null videoTitle
          'url': 'https://view.vzaar.com/20313539/download',
@@ -58,6 +70,7 @@ class VzaarIE(InfoExtractor):
              f = {
                  'url': source_url,
                  'format_id': 'http',
+                'preference': 1,
              }
              if 'audio' in source_url:
                  f.update({
@@ -75,13 +88,17 @@ class VzaarIE(InfoExtractor):
  
          video_guid = video_data.get('guid')
          usp = video_data.get('usp')
-        if isinstance(video_guid, compat_str) and isinstance(usp, dict):
-            m3u8_url = ('http://fable.vzaar.com/v4/usp/%s/%s.ism/.m3u8?'
-                        % (video_guid, video_id)) + '&'.join(
-                '%s=%s' % (k, v) for k, v in usp.items())
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                m3u8_id='hls', fatal=False))
+        if video_data.get('uspEnabled') and isinstance(video_guid, compat_str) and isinstance(usp, dict):
+            hls_aes = video_data.get('hlsAes')
+            qs = '&'.join('%s=%s' % (k, v) for k, v in usp.items())
+            url_templ = 'http://%%s.vzaar.com/v5/usp%s/%s/%s.ism%%s?' % ('aes' if hls_aes else '', video_guid, video_id)
+            m3u8_formats = self._extract_m3u8_formats(
+                url_templ % ('fable', '/.m3u8') + qs, video_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False)
+            if hls_aes:
+                for f in m3u8_formats:
+                    f['_decryption_key_url'] = url_templ % ('goose', '') + qs
+            formats.extend(m3u8_formats)
  
          self._sort_formats(formats)
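Note on the vzaar hunk above: the new code builds one two-slot URL template and fills it twice, once for the HLS manifest host (`fable`) and once for the decryption key host (`goose`). A minimal sketch of that string handling in isolation; `build_vzaar_urls` and the sample values are hypothetical:

```python
def build_vzaar_urls(video_guid, video_id, usp, hls_aes):
    """Return (m3u8_url, key_url); key_url is None when hls_aes is falsy."""
    qs = '&'.join('%s=%s' % (k, v) for k, v in usp.items())
    # '%%s' survives the first formatting pass as '%s', leaving two slots
    # (host prefix and path suffix) to be filled in afterwards.
    url_templ = 'http://%%s.vzaar.com/v5/usp%s/%s/%s.ism%%s?' % (
        'aes' if hls_aes else '', video_guid, video_id)
    m3u8_url = url_templ % ('fable', '/.m3u8') + qs
    key_url = url_templ % ('goose', '') + qs if hls_aes else None
    return m3u8_url, key_url


# Sample values are invented for illustration.
print(build_vzaar_urls('abc', '42', {'a': '1'}, True))
```

With `hls_aes` set, the path segment becomes `uspaes` and both URLs share the same query string.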
  
diff --git a/youtube_dl/extractor/wimp.py b/youtube_dl/extractor/wimp.py

deleted file mode 100644

index ea234e3..0000000
--- a/youtube_dl/extractor/wimp.py
+++ /dev/null
@@ -1,54 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from .youtube import YoutubeIE
-
-
-class WimpIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)'
-    _TESTS = [{
-        'url': 'http://www.wimp.com/maru-is-exhausted/',
-        'md5': 'ee21217ffd66d058e8b16be340b74883',
-        'info_dict': {
-            'id': 'maru-is-exhausted',
-            'ext': 'mp4',
-            'title': 'Maru is exhausted.',
-            'description': 'md5:57e099e857c0a4ea312542b684a869b8',
-        }
-    }, {
-        'url': 'http://www.wimp.com/clowncar/',
-        'md5': '5c31ad862a90dc5b1f023956faec13fe',
-        'info_dict': {
-            'id': 'cG4CEr2aiSg',
-            'ext': 'webm',
-            'title': 'Basset hound clown car...incredible!',
-            'description': '5 of my Bassets crawled in this dog loo! www.bellinghambassets.com\n\nFor licensing/usage please contact: licensing(at)jukinmediadotcom',
-            'upload_date': '20140303',
-            'uploader': 'Gretchen Hoey',
-            'uploader_id': 'gretchenandjeff1',
-        },
-        'add_ie': ['Youtube'],
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        youtube_id = self._search_regex(
-            (r"videoId\s*:\s*[\"']([0-9A-Za-z_-]{11})[\"']",
-             r'data-id=["\']([0-9A-Za-z_-]{11})'),
-            webpage, 'video URL', default=None)
-        if youtube_id:
-            return self.url_result(youtube_id, YoutubeIE.ie_key())
-
-        info_dict = self._extract_jwplayer_data(
-            webpage, video_id, require_title=False)
-
-        info_dict.update({
-            'id': video_id,
-            'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
-        })
-
-        return info_dict
diff --git a/youtube_dl/extractor/wistia.py b/youtube_dl/extractor/wistia.py

index fa142b974ae873a7883ede4894421f4f8a010549..085514d470bcee6d5719a9260ceefdfc3a9cdb17 100644
--- a/youtube_dl/extractor/wistia.py
+++ b/youtube_dl/extractor/wistia.py
@@ -12,9 +12,8 @@ from ..utils import (
  
  
  class WistiaIE(InfoExtractor):
-    _VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]+)'
-    _API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
-    _IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
+    _VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]{10})'
+    _EMBED_BASE_URL = 'http://fast.wistia.com/embed/'
  
      _TESTS = [{
          'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
@@ -43,33 +42,34 @@ class WistiaIE(InfoExtractor):
          'only_matching': True,
      }]
  
+    # https://wistia.com/support/embed-and-share/video-on-your-website
      @staticmethod
      def _extract_url(webpage):
          match = re.search(
-            r'<(?:meta[^>]+?content|iframe[^>]+?src)=(["\'])(?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/iframe/.+?)\1', webpage)
+            r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage)
          if match:
              return unescapeHTML(match.group('url'))
  
-        match = re.search(r'(?:id=["\']wistia_|data-wistia-?id=["\']|Wistia\.embed\(["\'])(?P<id>[^"\']+)', webpage)
-        if match:
-            return 'wistia:%s' % match.group('id')
-
          match = re.search(
              r'''(?sx)
                  <script[^>]+src=(["'])(?:https?:)?//fast\.wistia\.com/assets/external/E-v1\.js\1[^>]*>.*?
-                <div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]+)\b.*?\2
+                <div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]{10})\b.*?\2
              ''', webpage)
          if match:
              return 'wistia:%s' % match.group('id')
  
+        match = re.search(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage)
+        if match:
+            return 'wistia:%s' % match.group('id')
+
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
          data_json = self._download_json(
-            self._API_URL % video_id, video_id,
+            self._EMBED_BASE_URL + 'medias/%s.json' % video_id, video_id,
              # Some videos require this.
              headers={
-                'Referer': url if url.startswith('http') else self._IFRAME_URL % video_id,
+                'Referer': url if url.startswith('http') else self._EMBED_BASE_URL + 'iframe/' + video_id,
              })
  
          if data_json.get('error'):
@@ -94,27 +94,61 @@ class WistiaIE(InfoExtractor):
                      'url': aurl,
                      'width': int_or_none(a.get('width')),
                      'height': int_or_none(a.get('height')),
+                    'filesize': int_or_none(a.get('size')),
                  })
              else:
                  aext = a.get('ext')
-                is_m3u8 = a.get('container') == 'm3u8' or aext == 'm3u8'
-                formats.append({
-                    'format_id': atype,
+                display_name = a.get('display_name')
+                format_id = atype
+                if atype and atype.endswith('_video') and display_name:
+                    format_id = '%s-%s' % (atype[:-6], display_name)
+                f = {
+                    'format_id': format_id,
                      'url': aurl,
-                    'tbr': int_or_none(a.get('bitrate')),
-                    'vbr': int_or_none(a.get('opt_vbitrate')),
-                    'width': int_or_none(a.get('width')),
-                    'height': int_or_none(a.get('height')),
-                    'filesize': int_or_none(a.get('size')),
-                    'vcodec': a.get('codec'),
-                    'container': a.get('container'),
-                    'ext': 'mp4' if is_m3u8 else aext,
-                    'protocol': 'm3u8' if is_m3u8 else None,
+                    'tbr': int_or_none(a.get('bitrate')) or None,
                      'preference': 1 if atype == 'original' else None,
-                })
+                }
+                if display_name == 'Audio':
+                    f.update({
+                        'vcodec': 'none',
+                    })
+                else:
+                    f.update({
+                        'width': int_or_none(a.get('width')),
+                        'height': int_or_none(a.get('height')),
+                        'vcodec': a.get('codec'),
+                    })
+                if a.get('container') == 'm3u8' or aext == 'm3u8':
+                    ts_f = f.copy()
+                    ts_f.update({
+                        'ext': 'ts',
+                        'format_id': f['format_id'].replace('hls-', 'ts-'),
+                        'url': f['url'].replace('.bin', '.ts'),
+                    })
+                    formats.append(ts_f)
+                    f.update({
+                        'ext': 'mp4',
+                        'protocol': 'm3u8_native',
+                    })
+                else:
+                    f.update({
+                        'container': a.get('container'),
+                        'ext': aext,
+                        'filesize': int_or_none(a.get('size')),
+                    })
+                formats.append(f)
  
          self._sort_formats(formats)
  
+        subtitles = {}
+        for caption in data.get('captions', []):
+            language = caption.get('language')
+            if not language:
+                continue
+            subtitles[language] = [{
+                'url': self._EMBED_BASE_URL + 'captions/' + video_id + '.vtt?language=' + language,
+            }]
+
          return {
              'id': video_id,
              'title': title,
@@ -123,4 +157,5 @@ class WistiaIE(InfoExtractor):
              'thumbnails': thumbnails,
              'duration': float_or_none(data.get('duration')),
              'timestamp': int_or_none(data.get('createdAt')),
+            'subtitles': subtitles,
          }
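Note on the Wistia hunk above: asset types ending in `_video` now get a `format_id` built from the type prefix plus the display name, and each m3u8 asset also produces a sibling `ts` format. A minimal sketch of those two transformations; the function names and the sample asset are hypothetical:

```python
def wistia_format_id(atype, display_name):
    """Derive format_id the way the hunk above does (hypothetical helper)."""
    format_id = atype
    if atype and atype.endswith('_video') and display_name:
        # Strip the 6-character '_video' suffix and append the display name.
        format_id = '%s-%s' % (atype[:-6], display_name)
    return format_id


def ts_variant(f):
    """Build the sibling 'ts' format for an m3u8 asset; f is left unchanged."""
    ts_f = f.copy()
    ts_f.update({
        'ext': 'ts',
        'format_id': f['format_id'].replace('hls-', 'ts-'),
        'url': f['url'].replace('.bin', '.ts'),
    })
    return ts_f


# Sample asset values are invented for illustration.
f = {
    'format_id': wistia_format_id('hls_video', '720p'),
    'url': 'http://example.invalid/abc.bin',
}
ts = ts_variant(f)
print(f['format_id'], ts['format_id'], ts['url'])
```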
diff --git a/youtube_dl/extractor/xfileshare.py b/youtube_dl/extractor/xfileshare.py

index b38c7a7b3cd5f87a33f814444fd70af3219afb93..48ef07ed1415aff0e81f9029136d480c76e3c14c 100644
--- a/youtube_dl/extractor/xfileshare.py
+++ b/youtube_dl/extractor/xfileshare.py
@@ -4,37 +4,64 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_chr
  from ..utils import (
      decode_packed_codes,
      determine_ext,
      ExtractorError,
      int_or_none,
-    NO_DEFAULT,
+    js_to_json,
      urlencode_postdata,
  )
  
  
+# based on openload_decode from 2bfeee69b976fe049761dd3012e30b637ee05a58
+def aa_decode(aa_code):
+    symbol_table = [
+        ('7', '((ﾟｰﾟ) + (o^_^o))'),
+        ('6', '((o^_^o) +(o^_^o))'),
+        ('5', '((ﾟｰﾟ) + (ﾟΘﾟ))'),
+        ('2', '((o^_^o) - (ﾟΘﾟ))'),
+        ('4', '(ﾟｰﾟ)'),
+        ('3', '(o^_^o)'),
+        ('1', '(ﾟΘﾟ)'),
+        ('0', '(c^_^o)'),
+    ]
+    delim = '(ﾟДﾟ)[ﾟεﾟ]+'
+    ret = ''
+    for aa_char in aa_code.split(delim):
+        for val, pat in symbol_table:
+            aa_char = aa_char.replace(pat, val)
+        aa_char = aa_char.replace('+ ', '')
+        m = re.match(r'^\d+', aa_char)
+        if m:
+            ret += compat_chr(int(m.group(0), 8))
+        else:
+            m = re.match(r'^u([\da-f]+)', aa_char)
+            if m:
+                ret += compat_chr(int(m.group(1), 16))
+    return ret
+
+
  class XFileShareIE(InfoExtractor):
      _SITES = (
-        (r'daclips\.(?:in|com)', 'DaClips'),
-        (r'filehoot\.com', 'FileHoot'),
-        (r'gorillavid\.(?:in|com)', 'GorillaVid'),
-        (r'movpod\.in', 'MovPod'),
-        (r'powerwatch\.pw', 'PowerWatch'),
-        (r'rapidvideo\.ws', 'Rapidvideo.ws'),
+        (r'clipwatching\.com', 'ClipWatching'),
+        (r'gounlimited\.to', 'GoUnlimited'),
+        (r'govid\.me', 'GoVid'),
+        (r'holavid\.com', 'HolaVid'),
+        (r'streamty\.com', 'Streamty'),
          (r'thevideobee\.to', 'TheVideoBee'),
-        (r'vidto\.(?:me|se)', 'Vidto'),
-        (r'streamin\.to', 'Streamin.To'),
-        (r'xvidstage\.com', 'XVIDSTAGE'),
-        (r'vidabc\.com', 'Vid ABC'),
+        (r'uqload\.com', 'Uqload'),
          (r'vidbom\.com', 'VidBom'),
          (r'vidlo\.us', 'vidlo'),
-        (r'rapidvideo\.(?:cool|org)', 'RapidVideo.TV'),
-        (r'fastvideo\.me', 'FastVideo.me'),
+        (r'vidlocker\.xyz', 'VidLocker'),
+        (r'vidshare\.tv', 'VidShare'),
+        (r'vup\.to', 'VUp'),
+        (r'xvideosharing\.com', 'XVideoSharing'),
      )
  
      IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
-    _VALID_URL = (r'https?://(?P<host>(?:www\.)?(?:%s))/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
+    _VALID_URL = (r'https?://(?:www\.)?(?P<host>%s)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
                    % '|'.join(site for site in list(zip(*_SITES))[0]))
  
      _FILE_NOT_FOUND_REGEXES = (
@@ -43,82 +70,14 @@ class XFileShareIE(InfoExtractor):
      )
  
      _TESTS = [{
-        'url': 'http://gorillavid.in/06y9juieqpmi',
-        'md5': '5ae4a3580620380619678ee4875893ba',
-        'info_dict': {
-            'id': '06y9juieqpmi',
-            'ext': 'mp4',
-            'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
-            'thumbnail': r're:http://.*\.jpg',
-        },
-    }, {
-        'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
-        'only_matching': True,
-    }, {
-        'url': 'http://daclips.in/3rso4kdn6f9m',
-        'md5': '1ad8fd39bb976eeb66004d3a4895f106',
+        'url': 'http://xvideosharing.com/fq65f94nd2ve',
+        'md5': '4181f63957e8fe90ac836fa58dc3c8a6',
          'info_dict': {
-            'id': '3rso4kdn6f9m',
+            'id': 'fq65f94nd2ve',
              'ext': 'mp4',
-            'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc',
+            'title': 'sample',
              'thumbnail': r're:http://.*\.jpg',
-        }
-    }, {
-        'url': 'http://movpod.in/0wguyyxi1yca',
-        'only_matching': True,
-    }, {
-        'url': 'http://filehoot.com/3ivfabn7573c.html',
-        'info_dict': {
-            'id': '3ivfabn7573c',
-            'ext': 'mp4',
-            'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4',
-            'thumbnail': r're:http://.*\.jpg',
-        },
-        'skip': 'Video removed',
-    }, {
-        'url': 'http://vidto.me/ku5glz52nqe1.html',
-        'info_dict': {
-            'id': 'ku5glz52nqe1',
-            'ext': 'mp4',
-            'title': 'test'
-        }
-    }, {
-        'url': 'http://powerwatch.pw/duecjibvicbu',
-        'info_dict': {
-            'id': 'duecjibvicbu',
-            'ext': 'mp4',
-            'title': 'Big Buck Bunny trailer',
-        },
-    }, {
-        'url': 'http://xvidstage.com/e0qcnl03co6z',
-        'info_dict': {
-            'id': 'e0qcnl03co6z',
-            'ext': 'mp4',
-            'title': 'Chucky Prank 2015.mp4',
-        },
-    }, {
-        # removed by administrator
-        'url': 'http://xvidstage.com/amfy7atlkx25',
-        'only_matching': True,
-    }, {
-        'url': 'http://vidabc.com/i8ybqscrphfv',
-        'info_dict': {
-            'id': 'i8ybqscrphfv',
-            'ext': 'mp4',
-            'title': 're:Beauty and the Beast 2017',
-        },
-        'params': {
-            'skip_download': True,
          },
-    }, {
-        'url': 'http://www.rapidvideo.cool/b667kprndr8w',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.fastvideo.me/k8604r8nk8sn/FAST_FURIOUS_8_-_Trailer_italiano_ufficiale.mp4.html',
-        'only_matching': True,
-    }, {
-        'url': 'http://vidto.se/1tx1pf6t12cg.html',
-        'only_matching': True,
      }]
  
      @staticmethod
@@ -131,10 +90,9 @@ class XFileShareIE(InfoExtractor):
                  webpage)]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        host, video_id = re.match(self._VALID_URL, url).groups()
  
-        url = 'http://%s/%s' % (mobj.group('host'), video_id)
+        url = 'https://%s/' % host + ('embed-%s.html' % video_id if host in ('govid.me', 'vidlo.us') else video_id)
          webpage = self._download_webpage(url, video_id)
  
          if any(re.search(p, webpage) for p in self._FILE_NOT_FOUND_REGEXES):
@@ -142,7 +100,7 @@ class XFileShareIE(InfoExtractor):
  
          fields = self._hidden_inputs(webpage)
  
-        if fields['op'] == 'download1':
+        if fields.get('op') == 'download1':
              countdown = int_or_none(self._search_regex(
                  r'<span id="countdown_str">(?:[Ww]ait)?\s*<span id="cxc">(\d+)</span>\s*(?:seconds?)?</span>',
                  webpage, 'countdown', default=None))
@@ -160,13 +118,37 @@ class XFileShareIE(InfoExtractor):
              (r'style="z-index: [0-9]+;">([^<]+)</span>',
               r'<td nowrap>([^<]+)</td>',
               r'h4-fine[^>]*>([^<]+)<',
-             r'>Watch (.+) ',
+             r'>Watch (.+)[ <]',
               r'<h2 class="video-page-head">([^<]+)</h2>',
-             r'<h2 style="[^"]*color:#403f3d[^"]*"[^>]*>([^<]+)<'),  # streamin.to
+             r'<h2 style="[^"]*color:#403f3d[^"]*"[^>]*>([^<]+)<',  # streamin.to
+             r'title\s*:\s*"([^"]+)"'),  # govid.me
              webpage, 'title', default=None) or self._og_search_title(
              webpage, default=None) or video_id).strip()
  
-        def extract_formats(default=NO_DEFAULT):
+        for regex, func in (
+                (r'(eval\(function\(p,a,c,k,e,d\){.+)', decode_packed_codes),
+                (r'(ﾟ.+)', aa_decode)):
+            obf_code = self._search_regex(regex, webpage, 'obfuscated code', default=None)
+            if obf_code:
+                webpage = webpage.replace(obf_code, func(obf_code))
+
+        formats = []
+
+        jwplayer_data = self._search_regex(
+            [
+                r'jwplayer\("[^"]+"\)\.load\(\[({.+?})\]\);',
+                r'jwplayer\("[^"]+"\)\.setup\(({.+?})\);',
+            ], webpage,
+            'jwplayer data', default=None)
+        if jwplayer_data:
+            jwplayer_data = self._parse_json(
+                jwplayer_data.replace(r"\'", "'"), video_id, js_to_json)
+            if jwplayer_data:
+                formats = self._parse_jwplayer_data(
+                    jwplayer_data, video_id, False,
+                    m3u8_id='hls', mpd_id='dash')['formats']
+
+        if not formats:
              urls = []
              for regex in (
                      r'(?:file|src)\s*:\s*(["\'])(?P<url>http(?:(?!\1).)+\.(?:m3u8|mp4|flv)(?:(?!\1).)*)\1',
@@ -177,6 +159,12 @@ class XFileShareIE(InfoExtractor):
                      video_url = mobj.group('url')
                      if video_url not in urls:
                          urls.append(video_url)
+
+            sources = self._search_regex(
+                r'sources\s*:\s*(\[(?!{)[^\]]+\])', webpage, 'sources', default=None)
+            if sources:
+                urls.extend(self._parse_json(sources, video_id))
+
              formats = []
              for video_url in urls:
                  if determine_ext(video_url) == 'm3u8':
@@ -189,21 +177,13 @@ class XFileShareIE(InfoExtractor):
                          'url': video_url,
                          'format_id': 'sd',
                      })
-            if not formats and default is not NO_DEFAULT:
-                return default
-            self._sort_formats(formats)
-            return formats
-
-        formats = extract_formats(default=None)
-
-        if not formats:
-            webpage = decode_packed_codes(self._search_regex(
-                r"(}\('(.+)',(\d+),(\d+),'[^']*\b(?:file|embed)\b[^']*'\.split\('\|'\))",
-                webpage, 'packed code'))
-            formats = extract_formats()
+        self._sort_formats(formats)
  
          thumbnail = self._search_regex(
-            r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None)
+            [
+                r'<video[^>]+poster="([^"]+)"',
+                r'(?:image|poster)\s*:\s*["\'](http[^"\']+)["\'],',
+            ], webpage, 'thumbnail', default=None)
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/xvideos.py b/youtube_dl/extractor/xvideos.py
index 166bcf44380813e53686724e2ee629a232fcf190..8fc64914caf7166bcc49525ae133325067b134d3 100644
--- a/youtube_dl/extractor/xvideos.py
+++ b/youtube_dl/extractor/xvideos.py
@@ -17,7 +17,8 @@ class XVideosIE(InfoExtractor):
      _VALID_URL = r'''(?x)
                      https?://
                          (?:
-                            (?:www\.)?xvideos\.com/video|
+                            (?:[^/]+\.)?xvideos2?\.com/video|
+                            (?:www\.)?xvideos\.es/video|
                              flashservice\.xvideos\.com/embedframe/|
                              static-hw\.xvideos\.com/swf/xv-player\.swf\?.*?\bid_video=
                          )
@@ -39,6 +40,42 @@ class XVideosIE(InfoExtractor):
      }, {
          'url': 'http://static-hw.xvideos.com/swf/xv-player.swf?id_video=4588838',
          'only_matching': True,
+    }, {
+        'url': 'http://xvideos.com/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'https://xvideos.com/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'https://xvideos.es/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'https://www.xvideos.es/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'http://xvideos.es/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'http://www.xvideos.es/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'http://fr.xvideos.com/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'https://fr.xvideos.com/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'http://it.xvideos.com/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'https://it.xvideos.com/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'http://de.xvideos.com/video4588838/biker_takes_his_girl',
+        'only_matching': True
+    }, {
+        'url': 'https://de.xvideos.com/video4588838/biker_takes_his_girl',
+        'only_matching': True
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/yahoo.py b/youtube_dl/extractor/yahoo.py
index e5ebdd1806ec30944d04ddb56707fea1228e2fae..238d9cea0c729912351895e5bd6ad453d43b7d31 100644
--- a/youtube_dl/extractor/yahoo.py
+++ b/youtube_dl/extractor/yahoo.py
@@ -3,453 +3,313 @@ from __future__ import unicode_literals
  
  import hashlib
  import itertools
-import json
  import re
  
  from .common import InfoExtractor, SearchInfoExtractor
  from ..compat import (
      compat_str,
      compat_urllib_parse,
-    compat_urlparse,
  )
  from ..utils import (
      clean_html,
-    determine_ext,
-    ExtractorError,
-    extract_attributes,
      int_or_none,
      mimetype2ext,
+    parse_iso8601,
      smuggle_url,
      try_get,
-    unescapeHTML,
      url_or_none,
  )
  
-from .brightcove import (
-    BrightcoveLegacyIE,
-    BrightcoveNewIE,
-)
-from .nbc import NBCSportsVPlayerIE
+from .brightcove import BrightcoveNewIE
  
  
  class YahooIE(InfoExtractor):
      IE_DESC = 'Yahoo screen and movies'
-    _VALID_URL = r'(?P<host>https?://(?:(?P<country>[a-zA-Z]{2})\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?:(?P<display_id>.+)?-)?(?P<id>[0-9]+)(?:-[a-z]+)?(?:\.html)?'
-    _TESTS = [
-        {
-            'url': 'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
-            'info_dict': {
-                'id': '2d25e626-2378-391f-ada0-ddaf1417e588',
-                'ext': 'mp4',
-                'title': 'Julian Smith & Travis Legg Watch Julian Smith',
-                'description': 'Julian and Travis watch Julian Smith',
-                'duration': 6863,
-            },
+    _VALID_URL = r'(?P<url>https?://(?:(?P<country>[a-zA-Z]{2}(?:-[a-zA-Z]{2})?|malaysia)\.)?(?:[\da-zA-Z_-]+\.)?yahoo\.com/(?:[^/]+/)*(?P<id>[^?&#]*-[0-9]+(?:-[a-z]+)?)\.html)'
+    _TESTS = [{
+        'url': 'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
+        'info_dict': {
+            'id': '2d25e626-2378-391f-ada0-ddaf1417e588',
+            'ext': 'mp4',
+            'title': 'Julian Smith & Travis Legg Watch Julian Smith',
+            'description': 'Julian and Travis watch Julian Smith',
+            'duration': 6863,
+            'timestamp': 1369812016,
+            'upload_date': '20130529',
          },
-        {
-            'url': 'http://screen.yahoo.com/wired/codefellas-s1-ep12-cougar-lies-103000935.html',
-            'md5': '251af144a19ebc4a033e8ba91ac726bb',
-            'info_dict': {
-                'id': 'd1dedf8c-d58c-38c3-8963-e899929ae0a9',
-                'ext': 'mp4',
-                'title': 'Codefellas - The Cougar Lies with Spanish Moss',
-                'description': 'md5:66b627ab0a282b26352136ca96ce73c1',
-                'duration': 151,
-            },
-            'skip': 'HTTP Error 404',
+    }, {
+        'url': 'https://screen.yahoo.com/community/community-sizzle-reel-203225340.html?format=embed',
+        'md5': '7993e572fac98e044588d0b5260f4352',
+        'info_dict': {
+            'id': '4fe78544-8d48-39d8-97cd-13f205d9fcdb',
+            'ext': 'mp4',
+            'title': "Yahoo Saves 'Community'",
+            'description': 'md5:4d4145af2fd3de00cbb6c1d664105053',
+            'duration': 170,
+            'timestamp': 1406838636,
+            'upload_date': '20140731',
          },
-        {
-            'url': 'https://screen.yahoo.com/community/community-sizzle-reel-203225340.html?format=embed',
-            'md5': '7993e572fac98e044588d0b5260f4352',
-            'info_dict': {
-                'id': '4fe78544-8d48-39d8-97cd-13f205d9fcdb',
-                'ext': 'mp4',
-                'title': "Yahoo Saves 'Community'",
-                'description': 'md5:4d4145af2fd3de00cbb6c1d664105053',
-                'duration': 170,
-            }
+    }, {
+        'url': 'https://uk.screen.yahoo.com/editor-picks/cute-raccoon-freed-drain-using-091756545.html',
+        'md5': '71298482f7c64cbb7fa064e4553ff1c1',
+        'info_dict': {
+            'id': 'b3affa53-2e14-3590-852b-0e0db6cd1a58',
+            'ext': 'webm',
+            'title': 'Cute Raccoon Freed From Drain\u00a0Using Angle Grinder',
+            'description': 'md5:f66c890e1490f4910a9953c941dee944',
+            'duration': 97,
+            'timestamp': 1414489862,
+            'upload_date': '20141028',
+        }
+    }, {
+        'url': 'http://news.yahoo.com/video/china-moses-crazy-blues-104538833.html',
+        'md5': '88e209b417f173d86186bef6e4d1f160',
+        'info_dict': {
+            'id': 'f885cf7f-43d4-3450-9fac-46ac30ece521',
+            'ext': 'mp4',
+            'title': 'China Moses Is Crazy About the Blues',
+            'description': 'md5:9900ab8cd5808175c7b3fe55b979bed0',
+            'duration': 128,
+            'timestamp': 1385722202,
+            'upload_date': '20131129',
+        }
+    }, {
+        'url': 'https://www.yahoo.com/movies/v/true-story-trailer-173000497.html',
+        'md5': '2a9752f74cb898af5d1083ea9f661b58',
+        'info_dict': {
+            'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
+            'ext': 'mp4',
+            'title': '\'True Story\' Trailer',
+            'description': 'True Story',
+            'duration': 150,
+            'timestamp': 1418919206,
+            'upload_date': '20141218',
          },
-        {
-            'url': 'https://tw.news.yahoo.com/%E6%95%A2%E5%95%8F%E5%B8%82%E9%95%B7%20%E9%BB%83%E7%A7%80%E9%9C%9C%E6%89%B9%E8%B3%B4%E6%B8%85%E5%BE%B7%20%E9%9D%9E%E5%B8%B8%E9%AB%98%E5%82%B2-034024051.html',
-            'md5': '45c024bad51e63e9b6f6fad7a43a8c23',
-            'info_dict': {
-                'id': 'cac903b3-fcf4-3c14-b632-643ab541712f',
-                'ext': 'mp4',
-                'title': '敢問市長／黃秀霜批賴清德「非常高傲」',
-                'description': '直言台南沒捷運 交通居五都之末',
-                'duration': 396,
-            },
+    }, {
+        'url': 'https://gma.yahoo.com/pizza-delivery-man-surprised-huge-tip-college-kids-195200785.html',
+        'only_matching': True,
+    }, {
+        'note': 'NBC Sports embeds',
+        'url': 'http://sports.yahoo.com/blogs/ncaab-the-dagger/tyler-kalinoski-s-buzzer-beater-caps-davidson-s-comeback-win-185609842.html?guid=nbc_cbk_davidsonbuzzerbeater_150313',
+        'info_dict': {
+            'id': '9CsDKds0kvHI',
+            'ext': 'flv',
+            'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
+            'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
+            'upload_date': '20150313',
+            'uploader': 'NBCU-SPORTS',
+            'timestamp': 1426270238,
          },
-        {
-            'url': 'https://uk.screen.yahoo.com/editor-picks/cute-raccoon-freed-drain-using-091756545.html',
-            'md5': '71298482f7c64cbb7fa064e4553ff1c1',
-            'info_dict': {
-                'id': 'b3affa53-2e14-3590-852b-0e0db6cd1a58',
-                'ext': 'webm',
-                'title': 'Cute Raccoon Freed From Drain\u00a0Using Angle Grinder',
-                'description': 'md5:f66c890e1490f4910a9953c941dee944',
-                'duration': 97,
-            }
+    }, {
+        'url': 'https://tw.news.yahoo.com/-100120367.html',
+        'only_matching': True,
+    }, {
+        # Query result is embedded in webpage, but explicit request to video API fails with geo restriction
+        'url': 'https://screen.yahoo.com/community/communitary-community-episode-1-ladders-154501237.html',
+        'md5': '4fbafb9c9b6f07aa8f870629f6671b35',
+        'info_dict': {
+            'id': '1f32853c-a271-3eef-8cb6-f6d6872cb504',
+            'ext': 'mp4',
+            'title': 'Communitary - Community Episode 1: Ladders',
+            'description': 'md5:8fc39608213295748e1e289807838c97',
+            'duration': 1646,
+            'timestamp': 1440436550,
+            'upload_date': '20150824',
+            'series': 'Communitary',
+            'season_number': 6,
+            'episode_number': 1,
          },
-        {
-            'url': 'https://ca.sports.yahoo.com/video/program-makes-hockey-more-affordable-013127711.html',
-            'md5': '57e06440778b1828a6079d2f744212c4',
-            'info_dict': {
-                'id': 'c9fa2a36-0d4d-3937-b8f6-cc0fb1881e73',
-                'ext': 'mp4',
-                'title': 'Program that makes hockey more affordable not offered in Manitoba',
-                'description': 'md5:c54a609f4c078d92b74ffb9bf1f496f4',
-                'duration': 121,
-            },
-            'skip': 'Video gone',
-        }, {
-            'url': 'https://ca.finance.yahoo.com/news/hackers-sony-more-trouble-well-154609075.html',
-            'info_dict': {
-                'id': '154609075',
-            },
-            'playlist': [{
-                'md5': '000887d0dc609bc3a47c974151a40fb8',
-                'info_dict': {
-                    'id': 'e624c4bc-3389-34de-9dfc-025f74943409',
-                    'ext': 'mp4',
-                    'title': '\'The Interview\' TV Spot: War',
-                    'description': 'The Interview',
-                    'duration': 30,
-                },
-            }, {
-                'md5': '81bc74faf10750fe36e4542f9a184c66',
-                'info_dict': {
-                    'id': '1fc8ada0-718e-3abe-a450-bf31f246d1a9',
-                    'ext': 'mp4',
-                    'title': '\'The Interview\' TV Spot: Guys',
-                    'description': 'The Interview',
-                    'duration': 30,
-                },
-            }],
-        }, {
-            'url': 'http://news.yahoo.com/video/china-moses-crazy-blues-104538833.html',
-            'md5': '88e209b417f173d86186bef6e4d1f160',
-            'info_dict': {
-                'id': 'f885cf7f-43d4-3450-9fac-46ac30ece521',
-                'ext': 'mp4',
-                'title': 'China Moses Is Crazy About the Blues',
-                'description': 'md5:9900ab8cd5808175c7b3fe55b979bed0',
-                'duration': 128,
-            }
-        }, {
-            'url': 'https://in.lifestyle.yahoo.com/video/connect-dots-dark-side-virgo-090247395.html',
-            'md5': 'd9a083ccf1379127bf25699d67e4791b',
-            'info_dict': {
-                'id': '52aeeaa3-b3d1-30d8-9ef8-5d0cf05efb7c',
-                'ext': 'mp4',
-                'title': 'Connect the Dots: Dark Side of Virgo',
-                'description': 'md5:1428185051cfd1949807ad4ff6d3686a',
-                'duration': 201,
-            },
-            'skip': 'Domain name in.lifestyle.yahoo.com gone',
-        }, {
-            'url': 'https://www.yahoo.com/movies/v/true-story-trailer-173000497.html',
-            'md5': '989396ae73d20c6f057746fb226aa215',
-            'info_dict': {
-                'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
-                'ext': 'mp4',
-                'title': '\'True Story\' Trailer',
-                'description': 'True Story',
-                'duration': 150,
-            },
-        }, {
-            'url': 'https://gma.yahoo.com/pizza-delivery-man-surprised-huge-tip-college-kids-195200785.html',
-            'only_matching': True,
-        }, {
-            'note': 'NBC Sports embeds',
-            'url': 'http://sports.yahoo.com/blogs/ncaab-the-dagger/tyler-kalinoski-s-buzzer-beater-caps-davidson-s-comeback-win-185609842.html?guid=nbc_cbk_davidsonbuzzerbeater_150313',
-            'info_dict': {
-                'id': '9CsDKds0kvHI',
-                'ext': 'flv',
-                'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
-                'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
-                'upload_date': '20150313',
-                'uploader': 'NBCU-SPORTS',
-                'timestamp': 1426270238,
-            }
-        }, {
-            'url': 'https://tw.news.yahoo.com/-100120367.html',
-            'only_matching': True,
-        }, {
-            # Query result is embedded in webpage, but explicit request to video API fails with geo restriction
-            'url': 'https://screen.yahoo.com/community/communitary-community-episode-1-ladders-154501237.html',
-            'md5': '4fbafb9c9b6f07aa8f870629f6671b35',
-            'info_dict': {
-                'id': '1f32853c-a271-3eef-8cb6-f6d6872cb504',
-                'ext': 'mp4',
-                'title': 'Communitary - Community Episode 1: Ladders',
-                'description': 'md5:8fc39608213295748e1e289807838c97',
-                'duration': 1646,
-            },
-        }, {
-            # it uses an alias to get the video_id
-            'url': 'https://www.yahoo.com/movies/the-stars-of-daddys-home-have-very-different-212843197.html',
-            'info_dict': {
-                'id': '40eda9c8-8e5f-3552-8745-830f67d0c737',
-                'ext': 'mp4',
-                'title': 'Will Ferrell & Mark Wahlberg Are Pro-Spanking',
-                'description': 'While they play feuding fathers in \'Daddy\'s Home,\' star Will Ferrell & Mark Wahlberg share their true feelings on parenthood.',
-            },
+    }, {
+        # ytwnews://cavideo/
+        'url': 'https://tw.video.yahoo.com/movie-tw/單車天使-中文版預-092316541.html',
+        'info_dict': {
+            'id': 'ba133ff2-0793-3510-b636-59dfe9ff6cff',
+            'ext': 'mp4',
+            'title': '單車天使 - 中文版預',
+            'description': '中文版預',
+            'timestamp': 1476696196,
+            'upload_date': '20161017',
          },
-        {
-            # config['models']['applet_model']['data']['sapi'] has no query
-            'url': 'https://www.yahoo.com/music/livenation/event/galactic-2016',
-            'md5': 'dac0c72d502bc5facda80c9e6d5c98db',
-            'info_dict': {
-                'id': 'a6015640-e9e5-3efb-bb60-05589a183919',
-                'ext': 'mp4',
-                'description': 'Galactic',
-                'title': 'Dolla Diva (feat. Maggie Koerner)',
-            },
-            'skip': 'redirect to https://www.yahoo.com/music',
+        'params': {
+            'skip_download': True,
          },
-        {
-            # yahoo://article/
-            'url': 'https://www.yahoo.com/movies/video/true-story-trailer-173000497.html',
-            'info_dict': {
-                'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
-                'ext': 'mp4',
-                'title': "'True Story' Trailer",
-                'description': 'True Story',
-            },
-            'params': {
-                'skip_download': True,
-            },
+    }, {
+        # Contains both a Yahoo hosted video and multiple Youtube embeds
+        'url': 'https://www.yahoo.com/entertainment/gwen-stefani-reveals-the-pop-hit-she-passed-on-assigns-it-to-her-voice-contestant-instead-033045672.html',
+        'info_dict': {
+            'id': '46c5d95a-528f-3d03-b732-732fcadd51de',
+            'title': 'Gwen Stefani reveals the pop hit she passed on, assigns it to her \'Voice\' contestant instead',
+            'description': 'Gwen decided not to record this hit herself, but she decided it was the perfect fit for Kyndall Inskeep.',
          },
-        {
-            # ytwnews://cavideo/
-            'url': 'https://tw.video.yahoo.com/movie-tw/單車天使-中文版預-092316541.html',
+        'playlist': [{
              'info_dict': {
-                'id': 'ba133ff2-0793-3510-b636-59dfe9ff6cff',
+                'id': '966d4262-4fd1-3aaa-b45b-049ca6e38ba6',
                  'ext': 'mp4',
-                'title': '單車天使 - 中文版預',
-                'description': '中文版預',
+                'title': 'Gwen Stefani reveals she turned down one of Sia\'s best songs',
+                'description': 'On "The Voice" Tuesday, Gwen Stefani told Taylor Swift which Sia hit was almost hers.',
+                'timestamp': 1572406500,
+                'upload_date': '20191030',
              },
-            'params': {
-                'skip_download': True,
-            },
-        },
-        {
-            # custom brightcove
-            'url': 'https://au.tv.yahoo.com/plus7/sunrise/-/watch/37083565/clown-entertainers-say-it-is-hurting-their-business/',
+        }, {
              'info_dict': {
-                'id': '5575377707001',
+                'id': '352CFDOQrKg',
                  'ext': 'mp4',
-                'title': "Clown entertainers say 'It' is hurting their business",
-                'description': 'Stephen King s horror film has much to answer for. Jelby and Mr Loopy the Clowns join us.',
-                'timestamp': 1505341164,
-                'upload_date': '20170913',
-                'uploader_id': '2376984109001',
-            },
-            'params': {
-                'skip_download': True,
+                'title': 'Kyndal Inskeep "Performs the Hell Out of" Sia\'s "Elastic Heart" - The Voice Knockouts 2019',
+                'description': 'md5:35b61e94c2ae214bc965ff4245f80d11',
+                'uploader': 'The Voice',
+                'uploader_id': 'NBCTheVoice',
+                'upload_date': '20191029',
              },
+        }],
+        'params': {
+            'playlistend': 2,
          },
-        {
-            # custom brightcove, geo-restricted to Australia, bypassable
-            'url': 'https://au.tv.yahoo.com/plus7/sunrise/-/watch/37263964/sunrise-episode-wed-27-sep/',
-            'only_matching': True,
-        }
-    ]
+        'expected_warnings': ['HTTP Error 404'],
+    }, {
+        'url': 'https://malaysia.news.yahoo.com/video/bystanders-help-ontario-policeman-bust-190932818.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://es-us.noticias.yahoo.com/es-la-puerta-irrompible-que-110539379.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.yahoo.com/entertainment/v/longtime-cbs-news-60-minutes-032036500-cbs.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        page_id = mobj.group('id')
-        display_id = mobj.group('display_id') or page_id
-        host = mobj.group('host')
-        webpage, urlh = self._download_webpage_handle(url, display_id)
-        if 'err=404' in urlh.geturl():
-            raise ExtractorError('Video gone', expected=True)
-
-        # Look for iframed media first
-        entries = []
-        iframe_urls = re.findall(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
-        for idx, iframe_url in enumerate(iframe_urls):
-            entries.append(self.url_result(host + iframe_url, 'Yahoo'))
-        if entries:
-            return self.playlist_result(entries, page_id)
-
-        # Look for NBCSports iframes
-        nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage)
-        if nbc_sports_url:
-            return self.url_result(nbc_sports_url, NBCSportsVPlayerIE.ie_key())
-
-        # Look for Brightcove Legacy Studio embeds
-        bc_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
-        if bc_url:
-            return self.url_result(bc_url, BrightcoveLegacyIE.ie_key())
-
-        def brightcove_url_result(bc_url):
-            return self.url_result(
-                smuggle_url(bc_url, {'geo_countries': [mobj.group('country')]}),
-                BrightcoveNewIE.ie_key())
-
-        # Look for Brightcove New Studio embeds
-        bc_url = BrightcoveNewIE._extract_url(self, webpage)
-        if bc_url:
-            return brightcove_url_result(bc_url)
-
-        brightcove_iframe = self._search_regex(
-            r'(<iframe[^>]+data-video-id=["\']\d+[^>]+>)', webpage,
-            'brightcove iframe', default=None)
-        if brightcove_iframe:
-            attr = extract_attributes(brightcove_iframe)
-            src = attr.get('src')
-            if src:
-                parsed_src = compat_urlparse.urlparse(src)
-                qs = compat_urlparse.parse_qs(parsed_src.query)
-                account_id = qs.get('accountId', ['2376984109001'])[0]
-                brightcove_id = attr.get('data-video-id') or qs.get('videoId', [None])[0]
-                if account_id and brightcove_id:
-                    return brightcove_url_result(
-                        'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
-                        % (account_id, brightcove_id))
-
-        # Query result is often embedded in webpage as JSON. Sometimes explicit requests
-        # to video API results in a failure with geo restriction reason therefore using
-        # embedded query result when present sounds reasonable.
-        config_json = self._search_regex(
-            r'window\.Af\.bootstrap\[[^\]]+\]\s*=\s*({.*?"applet_type"\s*:\s*"td-applet-videoplayer".*?});(?:</script>|$)',
-            webpage, 'videoplayer applet', default=None)
-        if config_json:
-            config = self._parse_json(config_json, display_id, fatal=False)
-            if config:
-                sapi = config.get('models', {}).get('applet_model', {}).get('data', {}).get('sapi')
-                if sapi and 'query' in sapi:
-                    info = self._extract_info(display_id, sapi, webpage)
-                    self._sort_formats(info['formats'])
-                    return info
-
-        items_json = self._search_regex(
-            r'mediaItems: ({.*?})$', webpage, 'items', flags=re.MULTILINE,
-            default=None)
-        if items_json is None:
-            alias = self._search_regex(
-                r'"aliases":{"video":"(.*?)"', webpage, 'alias', default=None)
-            if alias is not None:
-                alias_info = self._download_json(
-                    'https://www.yahoo.com/_td/api/resource/VideoService.videos;video_aliases=["%s"]' % alias,
-                    display_id, 'Downloading alias info')
-                video_id = alias_info[0]['id']
-            else:
-                CONTENT_ID_REGEXES = [
-                    r'YUI\.namespace\("Media"\)\.CONTENT_ID\s*=\s*"([^"]+)"',
-                    r'root\.App\.Cache\.context\.videoCache\.curVideo = \{"([^"]+)"',
-                    r'"first_videoid"\s*:\s*"([^"]+)"',
-                    r'%s[^}]*"ccm_id"\s*:\s*"([^"]+)"' % re.escape(page_id),
-                    r'<article[^>]data-uuid=["\']([^"\']+)',
-                    r'<meta[^<>]+yahoo://article/view\?.*\buuid=([^&"\']+)',
-                    r'<meta[^<>]+["\']ytwnews://cavideo/(?:[^/]+/)+([\da-fA-F-]+)[&"\']',
-                ]
-                video_id = self._search_regex(
-                    CONTENT_ID_REGEXES, webpage, 'content ID')
+        url, country, display_id = re.match(self._VALID_URL, url).groups()
+        if not country:
+            country = 'us'
          else:
-            items = json.loads(items_json)
-            info = items['mediaItems']['query']['results']['mediaObj'][0]
-            # The 'meta' field is not always in the video webpage, we request it
-            # from another page
-            video_id = info['id']
-        return self._get_info(video_id, display_id, webpage)
-
-    def _extract_info(self, display_id, query, webpage):
-        info = query['query']['results']['mediaObj'][0]
-        meta = info.get('meta')
-        video_id = info.get('id')
-
-        if not meta:
-            msg = info['status'].get('msg')
-            if msg:
-                raise ExtractorError(
-                    '%s returned error: %s' % (self.IE_NAME, msg), expected=True)
-            raise ExtractorError('Unable to extract media object meta')
+            country = country.split('-')[0]
+        api_base = 'https://%s.yahoo.com/_td/api/resource/' % country
+
+        for i, uuid in enumerate(['url=' + url, 'ymedia-alias=' + display_id]):
+            content = self._download_json(
+                api_base + 'content;getDetailView=true;uuids=["%s"]' % uuid,
+                display_id, 'Downloading content JSON metadata', fatal=i == 1)
+            if content:
+                item = content['items'][0]
+                break
+
+        if item.get('type') != 'video':
+            entries = []
+
+            cover = item.get('cover') or {}
+            if cover.get('type') == 'yvideo':
+                cover_url = cover.get('url')
+                if cover_url:
+                    entries.append(self.url_result(
+                        cover_url, 'Yahoo', cover.get('uuid')))
+
+            for e in item.get('body', []):
+                if e.get('type') == 'videoIframe':
+                    iframe_url = e.get('url')
+                    if not iframe_url:
+                        continue
+                    entries.append(self.url_result(iframe_url))
+
+            return self.playlist_result(
+                entries, item.get('uuid'),
+                item.get('title'), item.get('summary'))
+
+        video_id = item['uuid']
+        video = self._download_json(
+            api_base + 'VideoService.videos;view=full;video_ids=["%s"]' % video_id,
+            video_id, 'Downloading video JSON metadata')[0]
+        title = video['title']
+
+        if country == 'malaysia':
+            country = 'my'
  
+        is_live = video.get('live_state') == 'live'
+        fmts = ('m3u8',) if is_live else ('webm', 'mp4')
+
+        urls = []
          formats = []
-        for s in info['streams']:
-            tbr = int_or_none(s.get('bitrate'))
-            format_info = {
-                'width': int_or_none(s.get('width')),
-                'height': int_or_none(s.get('height')),
-                'tbr': tbr,
-            }
-
-            host = s['host']
-            path = s['path']
-            if host.startswith('rtmp'):
-                fmt = 'rtmp'
-                format_info.update({
-                    'url': host,
-                    'play_path': path,
-                    'ext': 'flv',
-                })
-            else:
-                if s.get('format') == 'm3u8_playlist':
-                    fmt = 'hls'
-                    format_info.update({
-                        'protocol': 'm3u8_native',
-                        'ext': 'mp4',
-                    })
-                else:
-                    fmt = format_info['ext'] = determine_ext(path)
-                format_url = compat_urlparse.urljoin(host, path)
-                format_info['url'] = format_url
-            format_info['format_id'] = fmt + ('-%d' % tbr if tbr else '')
-            formats.append(format_info)
-
-        closed_captions = self._html_search_regex(
-            r'"closedcaptions":(\[[^\]]+\])', webpage, 'closed captions',
-            default='[]')
-
-        cc_json = self._parse_json(closed_captions, video_id, fatal=False)
          subtitles = {}
-        if cc_json:
-            for closed_caption in cc_json:
-                lang = closed_caption['lang']
-                if lang not in subtitles:
-                    subtitles[lang] = []
-                subtitles[lang].append({
-                    'url': closed_caption['url'],
-                    'ext': mimetype2ext(closed_caption['content_type']),
+        for fmt in fmts:
+            media_obj = self._download_json(
+                'https://video-api.yql.yahoo.com/v1/video/sapi/streams/' + video_id,
+                video_id, 'Downloading %s JSON metadata' % fmt,
+                headers=self.geo_verification_headers(), query={
+                    'format': fmt,
+                    'region': country.upper(),
+                })['query']['results']['mediaObj'][0]
+            msg = media_obj.get('status', {}).get('msg')
+
+            for s in media_obj.get('streams', []):
+                host = s.get('host')
+                path = s.get('path')
+                if not host or not path:
+                    continue
+                s_url = host + path
+                if s.get('format') == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        s_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                    continue
+                tbr = int_or_none(s.get('bitrate'))
+                formats.append({
+                    'url': s_url,
+                    'format_id': fmt + ('-%d' % tbr if tbr else ''),
+                    'width': int_or_none(s.get('width')),
+                    'height': int_or_none(s.get('height')),
+                    'tbr': tbr,
+                    'fps': int_or_none(s.get('framerate')),
+                })
+
+            for cc in media_obj.get('closedcaptions', []):
+                cc_url = cc.get('url')
+                if not cc_url or cc_url in urls:
+                    continue
+                urls.append(cc_url)
+                subtitles.setdefault(cc.get('lang') or 'en-US', []).append({
+                    'url': cc_url,
+                    'ext': mimetype2ext(cc.get('content_type')),
                  })
  
+        streaming_url = video.get('streaming_url')
+        if streaming_url and not is_live:
+            formats.extend(self._extract_m3u8_formats(
+                streaming_url, video_id, 'mp4',
+                'm3u8_native', m3u8_id='hls', fatal=False))
+
+        if not formats and msg == 'geo restricted':
+            self.raise_geo_restricted()
+
+        self._sort_formats(formats)
+
+        thumbnails = []
+        for thumb in video.get('thumbnails', []):
+            thumb_url = thumb.get('url')
+            if not thumb_url:
+                continue
+            thumbnails.append({
+                'id': thumb.get('tag'),
+                'url': thumb_url,
+                'width': int_or_none(thumb.get('width')),
+                'height': int_or_none(thumb.get('height')),
+            })
+
+        series_info = video.get('series_info') or {}
+
          return {
              'id': video_id,
-            'display_id': display_id,
-            'title': unescapeHTML(meta['title']),
+            'title': self._live_title(title) if is_live else title,
              'formats': formats,
-            'description': clean_html(meta['description']),
-            'thumbnail': meta['thumbnail'] if meta.get('thumbnail') else self._og_search_thumbnail(webpage),
-            'duration': int_or_none(meta.get('duration')),
+            'display_id': display_id,
+            'thumbnails': thumbnails,
+            'description': clean_html(video.get('description')),
+            'timestamp': parse_iso8601(video.get('publish_time')),
              'subtitles': subtitles,
+            'duration': int_or_none(video.get('duration')),
+            'view_count': int_or_none(video.get('view_count')),
+            'is_live': is_live,
+            'series': video.get('show_name'),
+            'season_number': int_or_none(series_info.get('season_number')),
+            'episode_number': int_or_none(series_info.get('episode_number')),
          }
  
-    def _get_info(self, video_id, display_id, webpage):
-        region = self._search_regex(
-            r'\\?"region\\?"\s*:\s*\\?"([^"]+?)\\?"',
-            webpage, 'region', fatal=False, default='US').upper()
-        formats = []
-        info = {}
-        for fmt in ('webm', 'mp4'):
-            query_result = self._download_json(
-                'https://video.media.yql.yahoo.com/v1/video/sapi/streams/' + video_id,
-                display_id, 'Downloading %s video info' % fmt, query={
-                    'protocol': 'http',
-                    'region': region,
-                    'format': fmt,
-                })
-            info = self._extract_info(display_id, query_result, webpage)
-            formats.extend(info['formats'])
-        formats.extend(self._extract_m3u8_formats(
-            'http://video.media.yql.yahoo.com/v1/hls/%s?region=%s' % (video_id, region),
-            video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
-        self._sort_formats(formats)
-        info['formats'] = formats
-        return info
-
  
  class YahooSearchIE(SearchInfoExtractor):
      IE_DESC = 'Yahoo screen search'
@@ -523,7 +383,7 @@ class YahooGyaOPlayerIE(InfoExtractor):
              'id': video_id,
              'title': video['title'],
              'url': smuggle_url(
-                'http://players.brightcove.net/4235717419001/default_default/index.html?videoId=' + video['videoId'],
+                'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['videoId'],
                  {'geo_countries': ['JP']}),
              'description': video.get('longDescription'),
              'ie_key': BrightcoveNewIE.ie_key(),
diff --git a/youtube_dl/extractor/yourporn.py b/youtube_dl/extractor/yourporn.py
index 8a2d5f63bdb929edd310c250b9242bd2ed409208..98347491ee00b66d2c2f8df69ddf663c1bffbd84 100644
--- a/youtube_dl/extractor/yourporn.py
+++ b/youtube_dl/extractor/yourporn.py
@@ -1,6 +1,7 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
      parse_duration,
      urljoin,
@@ -8,9 +9,9 @@ from ..utils import (
  
  
  class YourPornIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:yourporn\.sexy|sxyprn\.com)/post/(?P<id>[^/?#&.]+)'
+    _VALID_URL = r'https?://(?:www\.)?sxyprn\.com/post/(?P<id>[^/?#&.]+)'
      _TESTS = [{
-        'url': 'https://yourporn.sexy/post/57ffcb2e1179b.html',
+        'url': 'https://sxyprn.com/post/57ffcb2e1179b.html',
          'md5': '6f8682b6464033d87acaa7a8ff0c092e',
          'info_dict': {
              'id': '57ffcb2e1179b',
@@ -33,11 +34,19 @@ class YourPornIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        video_url = urljoin(url, self._parse_json(
+        parts = self._parse_json(
              self._search_regex(
                  r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
                  group='data'),
-            video_id)[video_id]).replace('/cdn/', '/cdn5/')
+            video_id)[video_id].split('/')
+
+        num = 0
+        for c in parts[6] + parts[7]:
+            if c.isnumeric():
+                num += int(c)
+        parts[5] = compat_str(int(parts[5]) - num)
+        parts[1] += '8'
+        video_url = urljoin(url, '/'.join(parts))
  
          title = (self._search_regex(
              r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',
@@ -54,4 +63,5 @@ class YourPornIE(InfoExtractor):
              'thumbnail': thumbnail,
              'duration': duration,
              'age_limit': 18,
+            'ext': 'mp4',
          }
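The yourporn.py hunk above replaces a simple `/cdn/` to `/cdn5/` substitution with a small path transformation. The following is a minimal standalone sketch of that transformation only; `rewrite_path` and the sample path are hypothetical and are not part of the patch, and `str` stands in for `compat_str`.

```python
def rewrite_path(path):
    """Mirror the part-rewriting steps from the hunk on a '/'-separated path."""
    parts = path.split('/')

    # Sum every numeric character found in the 7th and 8th path components.
    num = 0
    for c in parts[6] + parts[7]:
        if c.isnumeric():
            num += int(c)

    # Subtract that sum from the 6th component and suffix the 2nd with '8'.
    parts[5] = str(int(parts[5]) - num)
    parts[1] += '8'
    return '/'.join(parts)


# Hypothetical input; real values come from the page's data-vnfo attribute.
print(rewrite_path('/cdn/c8/abc/def/1000/a1b2/c3.mp4'))
```

In the sample path the digits 1, 2, 3 and 4 sum to 10, so `1000` becomes `990` and `cdn` becomes `cdn8`.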
diff --git a/youtube_dl/extractor/youtube.py b/youtube_dl/extractor/youtube.py
index a3364a14ed0ebb9f273915414dda78eaae258a33..eacaa5ecdb70d2a16748b4c2e58edc14d7d69484 100644
--- a/youtube_dl/extractor/youtube.py
+++ b/youtube_dl/extractor/youtube.py
@@ -69,7 +69,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
      # If True it will raise an error if no login info is provided
      _LOGIN_REQUIRED = False
  
-    _PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL|OLAK5uy_)[0-9A-Za-z-_]{10,}'
+    _PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}'
  
      def _set_language(self):
          self._set_cookie(
@@ -372,7 +372,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
      _VALID_URL = r"""(?x)^
                       (
                           (?:https?://|//)                                    # http(s):// or protocol-independent URL
-                         (?:(?:(?:(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/|
+                         (?:(?:(?:(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie|kids)?\.com/|
                              (?:www\.)?deturl\.com/www\.youtube\.com/|
                              (?:www\.)?pwnyoutube\.com/|
                              (?:www\.)?hooktube\.com/|
@@ -383,7 +383,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                              (?:(?:www|no)\.)?invidiou\.sh/|
                              (?:(?:www|fi|de)\.)?invidious\.snopyta\.org/|
                              (?:www\.)?invidious\.kabi\.tk/|
-                            (?:www\.)?invidious\.enkirton\.net/|
                              (?:www\.)?invidious\.13ad\.de/|
                              (?:www\.)?invidious\.mastodon\.host/|
                              (?:www\.)?invidious\.nixnet\.xyz/|
@@ -391,6 +390,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                              (?:www\.)?tube\.poal\.co/|
                              (?:www\.)?vid\.wxzm\.sx/|
                              (?:www\.)?yt\.elukerio\.org/|
+                            (?:www\.)?yt\.lelux\.fi/|
                              (?:www\.)?kgg2m7yk5aybusll\.onion/|
                              (?:www\.)?qklhadlycap4cnod\.onion/|
                              (?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion/|
@@ -1224,6 +1224,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'skip_download': True,
              },
          },
+        {
+            'url': 'https://www.youtubekids.com/watch?v=3b8nCWDgZ6Q',
+            'only_matching': True,
+        },
      ]
  
      def __init__(self, *args, **kwargs):
@@ -1339,6 +1343,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          funcname = self._search_regex(
              (r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
               r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
+             r'\b(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
               r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
               # Obsolete patterns
               r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
@@ -2465,7 +2470,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
                          (?:\w+\.)?
                          (?:
                              (?:
-                                youtube\.com|
+                                youtube(?:kids)?\.com|
                                  invidio\.us
                              )
                              /
@@ -2477,7 +2482,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
                              youtu\.be/[0-9A-Za-z_-]{11}\?.*?\blist=
                          )
                          (
-                            (?:PL|LL|EC|UU|FL|RD|UL|TL|OLAK5uy_)?[0-9A-Za-z-_]{10,}
+                            (?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)?[0-9A-Za-z-_]{10,}
                              # Top tracks, they can also include dots
                              |(?:MC)[\w\.]*
                          )
@@ -2647,6 +2652,9 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
      }, {
          'url': 'https://invidio.us/playlist?list=PLDIoUOhQQPlXr63I_vwF9GD8sAKh77dWU',
          'only_matching': True,
+    }, {
+        'url': 'https://www.youtubekids.com/watch?v=Agk7R8I8o5U&list=PUZ6jURNr1WQZCNHF0ao-c0g',
+        'only_matching': True,
      }]
  
      def _real_initialize(self):
@@ -2817,7 +2825,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
  
  class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
      IE_DESC = 'YouTube.com channels'
-    _VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com|(?:www\.)?invidio\.us)/channel/(?P<id>[0-9A-Za-z_-]+)'
+    _VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie|kids)?\.com|(?:www\.)?invidio\.us)/channel/(?P<id>[0-9A-Za-z_-]+)'
      _TEMPLATE_URL = 'https://www.youtube.com/channel/%s/videos'
      _VIDEO_RE = r'(?:title="(?P<title>[^"]+)"[^>]+)?href="/watch\?v=(?P<id>[0-9A-Za-z_-]+)&?'
      IE_NAME = 'youtube:channel'
@@ -2845,6 +2853,9 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
      }, {
          'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA',
          'only_matching': True,
+    }, {
+        'url': 'https://www.youtubekids.com/channel/UCyu8StPfZWapR6rfW_JgqcA',
+        'only_matching': True,
      }]
  
      @classmethod
diff --git a/youtube_dl/extractor/zype.py b/youtube_dl/extractor/zype.py
index 3b16e703b9b9ef54497b3172deac0d299efb7373..2e2e97a0c4454971dab30136518ca26e903b9470 100644
--- a/youtube_dl/extractor/zype.py
+++ b/youtube_dl/extractor/zype.py
@@ -4,10 +4,20 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_HTTPError
+from ..utils import (
+    dict_get,
+    ExtractorError,
+    int_or_none,
+    js_to_json,
+    parse_iso8601,
+)
  
  
  class ZypeIE(InfoExtractor):
-    _VALID_URL = r'https?://player\.zype\.com/embed/(?P<id>[\da-fA-F]+)\.js\?.*?api_key=[^&]+'
+    _ID_RE = r'[\da-fA-F]+'
+    _COMMON_RE = r'//player\.zype\.com/embed/%s\.(?:js|json|html)\?.*?(?:access_token|(?:ap[ip]|player)_key)='
+    _VALID_URL = r'https?:%s[^&]+' % (_COMMON_RE % ('(?P<id>%s)' % _ID_RE))
      _TEST = {
          'url': 'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ&autoplay=false&controls=true&da=false',
          'md5': 'eaee31d474c76a955bdaba02a505c595',
@@ -16,6 +26,9 @@ class ZypeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Smoky Barbecue Favorites',
              'thumbnail': r're:^https?://.*\.jpe?g',
+            'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
+            'timestamp': 1504915200,
+            'upload_date': '20170909',
          },
      }
  
@@ -24,34 +37,98 @@ class ZypeIE(InfoExtractor):
          return [
              mobj.group('url')
              for mobj in re.finditer(
-                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//player\.zype\.com/embed/[\da-fA-F]+\.js\?.*?api_key=.+?)\1',
+                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?%s.+?)\1' % (ZypeIE._COMMON_RE % ZypeIE._ID_RE),
                  webpage)]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, video_id)
+        try:
+            response = self._download_json(re.sub(
+                r'\.(?:js|html)\?', '.json?', url), video_id)['response']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401, 403):
+                raise ExtractorError(self._parse_json(
+                    e.cause.read().decode(), video_id)['message'], expected=True)
+            raise
  
-        title = self._search_regex(
-            r'video_title\s*[:=]\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'title', group='value')
+        body = response['body']
+        video = response['video']
+        title = video['title']
  
-        m3u8_url = self._search_regex(
-            r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1', webpage,
-            'm3u8 url', group='url')
-
-        formats = self._extract_m3u8_formats(
-            m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
-            m3u8_id='hls')
+        if isinstance(body, dict):
+            formats = []
+            for output in body.get('outputs', []):
+                output_url = output.get('url')
+                if not output_url:
+                    continue
+                name = output.get('name')
+                if name == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        output_url, video_id, 'mp4',
+                        'm3u8_native', m3u8_id='hls', fatal=False))
+                else:
+                    f = {
+                        'format_id': name,
+                        'tbr': int_or_none(output.get('bitrate')),
+                        'url': output_url,
+                    }
+                    if name in ('m4a', 'mp3'):
+                        f['vcodec'] = 'none'
+                    else:
+                        f.update({
+                            'height': int_or_none(output.get('height')),
+                            'width': int_or_none(output.get('width')),
+                        })
+                    formats.append(f)
+            text_tracks = body.get('subtitles') or []
+        else:
+            m3u8_url = self._search_regex(
+                r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1',
+                body, 'm3u8 url', group='url')
+            formats = self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
+            text_tracks = self._search_regex(
+                r'textTracks\s*:\s*(\[[^]]+\])',
+                body, 'text tracks', default=None)
+            if text_tracks:
+                text_tracks = self._parse_json(
+                    text_tracks, video_id, js_to_json, False)
          self._sort_formats(formats)
  
-        thumbnail = self._search_regex(
-            r'poster\s*[:=]\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'thumbnail',
-            default=False, group='url')
+        subtitles = {}
+        if text_tracks:
+            for text_track in text_tracks:
+                tt_url = dict_get(text_track, ('file', 'src'))
+                if not tt_url:
+                    continue
+                subtitles.setdefault(text_track.get('label') or 'English', []).append({
+                    'url': tt_url,
+                })
+
+        thumbnails = []
+        for thumbnail in video.get('thumbnails', []):
+            thumbnail_url = thumbnail.get('url')
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': thumbnail_url,
+                'width': int_or_none(thumbnail.get('width')),
+                'height': int_or_none(thumbnail.get('height')),
+            })
  
          return {
              'id': video_id,
+            'display_id': video.get('friendly_title'),
              'title': title,
-            'thumbnail': thumbnail,
+            'thumbnails': thumbnails,
+            'description': dict_get(video, ('description', 'ott_description', 'short_description')),
+            'timestamp': parse_iso8601(video.get('published_at')),
+            'duration': int_or_none(video.get('duration')),
+            'view_count': int_or_none(video.get('request_count')),
+            'average_rating': int_or_none(video.get('rating')),
+            'season_number': int_or_none(video.get('season')),
+            'episode_number': int_or_none(video.get('episode')),
              'formats': formats,
+            'subtitles': subtitles,
          }
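The zype.py hunk above builds `_VALID_URL` from two smaller patterns and rewrites `.js`/`.html` embed URLs to their `.json` form. The sketch below reproduces only those two regex steps outside the extractor class, as an illustration of what they accept; `match_id`, `to_json_url` and the sample URLs are hypothetical names and values.

```python
import re

# Patterns copied from the hunk (class attributes in the patch).
ID_RE = r'[\da-fA-F]+'
COMMON_RE = r'//player\.zype\.com/embed/%s\.(?:js|json|html)\?.*?(?:access_token|(?:ap[ip]|player)_key)='
VALID_URL = r'https?:%s[^&]+' % (COMMON_RE % ('(?P<id>%s)' % ID_RE))


def match_id(url):
    """Return the hexadecimal video id, or None if the URL is not accepted."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


def to_json_url(url):
    """Rewrite a .js or .html embed URL to the .json endpoint."""
    return re.sub(r'\.(?:js|html)\?', '.json?', url)


base = 'https://player.zype.com/embed/5b400b834b32992a310622b9'
print(match_id(base + '.js?api_key=abc'))         # api_key is accepted
print(match_id(base + '.html?access_token=xyz'))  # so is access_token
print(match_id(base + '.js?foo=bar'))             # no recognised key: None
print(to_json_url(base + '.html?player_key=k'))
```

A URL that already ends in `.json?` is left unchanged by `to_json_url`, because the substitution pattern requires `?` directly after `js` or `html`.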
diff --git a/youtube_dl/postprocessor/ffmpeg.py b/youtube_dl/postprocessor/ffmpeg.py
index 70416c25ee1a3fb333c63ff7334f87250122fb07..fd3f921a8a11da2e8c31573889ea4d7f5a9fea25 100644
--- a/youtube_dl/postprocessor/ffmpeg.py
+++ b/youtube_dl/postprocessor/ffmpeg.py
@@ -393,7 +393,7 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
              sub_ext = sub_info['ext']
              if ext != 'webm' or ext == 'webm' and sub_ext == 'vtt':
                  sub_langs.append(lang)
-                sub_filenames.append(subtitles_filename(filename, lang, sub_ext))
+                sub_filenames.append(subtitles_filename(filename, lang, sub_ext, ext))
              else:
                  if not webm_vtt_warn and ext == 'webm' and sub_ext != 'vtt':
                      webm_vtt_warn = True
@@ -606,9 +606,9 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
                  self._downloader.to_screen(
                      '[ffmpeg] Subtitle file for %s is already in the requested format' % new_ext)
                  continue
-            old_file = subtitles_filename(filename, lang, ext)
+            old_file = subtitles_filename(filename, lang, ext, info.get('ext'))
              sub_filenames.append(old_file)
-            new_file = subtitles_filename(filename, lang, new_ext)
+            new_file = subtitles_filename(filename, lang, new_ext, info.get('ext'))
  
              if ext in ('dfxp', 'ttml', 'tt'):
                  self._downloader.report_warning(
@@ -616,7 +616,7 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
                      'which results in style information loss')
  
                  dfxp_file = old_file
-                srt_file = subtitles_filename(filename, lang, 'srt')
+                srt_file = subtitles_filename(filename, lang, 'srt', info.get('ext'))
  
                  with open(dfxp_file, 'rb') as f:
                      srt_data = dfxp2srt(f.read())
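The ffmpeg.py hunks above pass the media file's extension as a fourth argument to `subtitles_filename`, which this commit redefines in utils.py in terms of `replace_extension`. The sketch below shows why that matters for titles containing a dot. The body of `replace_extension` is not shown in this diff, so the version here is an assumption about its behaviour, not a copy of the library code.

```python
import os.path


def replace_extension(filename, ext, expected_real_ext=None):
    # Assumed behaviour: strip the current extension only when it matches
    # the expected one (or when no expectation is given).
    name, real_ext = os.path.splitext(filename)
    if not expected_real_ext or real_ext[1:] == expected_real_ext:
        base = name
    else:
        base = filename
    return '%s.%s' % (base, ext)


def subtitles_filename(filename, sub_lang, sub_format, expected_real_ext=None):
    # Same one-line definition as in the utils.py hunk of this commit.
    return replace_extension(filename, sub_lang + '.' + sub_format, expected_real_ext)


print(subtitles_filename('video.mp4', 'en', 'vtt', 'mp4'))
# Without the expected extension, '.5' is mistaken for the extension.
print(subtitles_filename('Episode 1.5', 'en', 'vtt'))
print(subtitles_filename('Episode 1.5', 'en', 'vtt', 'mp4'))
```

Under that assumption, the second call drops the `.5` from the title while the third keeps it, which is the case the extra argument appears intended to handle.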
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 7987572419a398349db4f73d2901fe324f3a401a..f6204692a81002cdfc44b02d183126e755283bd9 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -46,6 +46,7 @@ from .compat import (
      compat_html_entities,
      compat_html_entities_html5,
      compat_http_client,
+    compat_integer_types,
      compat_kwargs,
      compat_os_name,
      compat_parse_qs,
@@ -1718,13 +1719,16 @@ DATE_FORMATS = (
      '%B %d %Y',
      '%B %dst %Y',
      '%B %dnd %Y',
+    '%B %drd %Y',
      '%B %dth %Y',
      '%b %d %Y',
      '%b %dst %Y',
      '%b %dnd %Y',
+    '%b %drd %Y',
      '%b %dth %Y',
      '%b %dst %Y %I:%M',
      '%b %dnd %Y %I:%M',
+    '%b %drd %Y %I:%M',
      '%b %dth %Y %I:%M',
      '%Y %m %d',
      '%Y-%m-%d',
@@ -2906,8 +2910,8 @@ def determine_ext(url, default_ext='unknown_video'):
          return default_ext
  
  
-def subtitles_filename(filename, sub_lang, sub_format):
-    return filename.rsplit('.', 1)[0] + '.' + sub_lang + '.' + sub_format
+def subtitles_filename(filename, sub_lang, sub_format, expected_real_ext=None):
+    return replace_extension(filename, sub_lang + '.' + sub_format, expected_real_ext)
  
  
  def date_from_str(date_str):
@@ -3516,10 +3520,11 @@ def str_or_none(v, default=None):
  
  def str_to_int(int_str):
      """ A more relaxed version of int_or_none """
-    if int_str is None:
-        return None
-    int_str = re.sub(r'[,\.\+]', '', int_str)
-    return int(int_str)
+    if isinstance(int_str, compat_integer_types):
+        return int_str
+    elif isinstance(int_str, compat_str):
+        int_str = re.sub(r'[,\.\+]', '', int_str)
+        return int_or_none(int_str)
  
  
  def float_or_none(v, scale=1, invscale=1, default=None):
@@ -4979,7 +4984,7 @@ class ISO3166Utils(object):
  class GeoUtils(object):
      # Major IPv4 address blocks per country
      _country_ip_map = {
-        'AD': '85.94.160.0/19',
+        'AD': '46.172.224.0/19',
          'AE': '94.200.0.0/13',
          'AF': '149.54.0.0/17',
          'AG': '209.59.64.0/18',
@@ -4987,28 +4992,30 @@ class GeoUtils(object):
          'AL': '46.99.0.0/16',
          'AM': '46.70.0.0/15',
          'AO': '105.168.0.0/13',
-        'AP': '159.117.192.0/21',
+        'AP': '182.50.184.0/21',
+        'AQ': '23.154.160.0/24',
          'AR': '181.0.0.0/12',
          'AS': '202.70.112.0/20',
-        'AT': '84.112.0.0/13',
+        'AT': '77.116.0.0/14',
          'AU': '1.128.0.0/11',
          'AW': '181.41.0.0/18',
-        'AZ': '5.191.0.0/16',
+        'AX': '185.217.4.0/22',
+        'AZ': '5.197.0.0/16',
          'BA': '31.176.128.0/17',
          'BB': '65.48.128.0/17',
          'BD': '114.130.0.0/16',
          'BE': '57.0.0.0/8',
-        'BF': '129.45.128.0/17',
+        'BF': '102.178.0.0/15',
          'BG': '95.42.0.0/15',
          'BH': '37.131.0.0/17',
          'BI': '154.117.192.0/18',
          'BJ': '137.255.0.0/16',
-        'BL': '192.131.134.0/24',
+        'BL': '185.212.72.0/23',
          'BM': '196.12.64.0/18',
          'BN': '156.31.0.0/16',
          'BO': '161.56.0.0/16',
          'BQ': '161.0.80.0/20',
-        'BR': '152.240.0.0/12',
+        'BR': '191.128.0.0/12',
          'BS': '24.51.64.0/18',
          'BT': '119.2.96.0/19',
          'BW': '168.167.0.0/16',
@@ -5016,20 +5023,20 @@ class GeoUtils(object):
          'BZ': '179.42.192.0/18',
          'CA': '99.224.0.0/11',
          'CD': '41.243.0.0/16',
-        'CF': '196.32.200.0/21',
-        'CG': '197.214.128.0/17',
+        'CF': '197.242.176.0/21',
+        'CG': '160.113.0.0/16',
          'CH': '85.0.0.0/13',
-        'CI': '154.232.0.0/14',
+        'CI': '102.136.0.0/14',
          'CK': '202.65.32.0/19',
          'CL': '152.172.0.0/14',
-        'CM': '165.210.0.0/15',
+        'CM': '102.244.0.0/14',
          'CN': '36.128.0.0/10',
          'CO': '181.240.0.0/12',
          'CR': '201.192.0.0/12',
          'CU': '152.206.0.0/15',
          'CV': '165.90.96.0/19',
          'CW': '190.88.128.0/17',
-        'CY': '46.198.0.0/15',
+        'CY': '31.153.0.0/16',
          'CZ': '88.100.0.0/14',
          'DE': '53.0.0.0/8',
          'DJ': '197.241.0.0/17',
@@ -5046,6 +5053,7 @@ class GeoUtils(object):
          'EU': '2.16.0.0/13',
          'FI': '91.152.0.0/13',
          'FJ': '144.120.0.0/16',
+        'FK': '80.73.208.0/21',
          'FM': '119.252.112.0/20',
          'FO': '88.85.32.0/19',
          'FR': '90.0.0.0/9',
@@ -5055,8 +5063,8 @@ class GeoUtils(object):
          'GE': '31.146.0.0/16',
          'GF': '161.22.64.0/18',
          'GG': '62.68.160.0/19',
-        'GH': '45.208.0.0/14',
-        'GI': '85.115.128.0/19',
+        'GH': '154.160.0.0/12',
+        'GI': '95.164.0.0/16',
          'GL': '88.83.0.0/19',
          'GM': '160.182.0.0/15',
          'GN': '197.149.192.0/18',
@@ -5085,13 +5093,13 @@ class GeoUtils(object):
          'JE': '87.244.64.0/18',
          'JM': '72.27.0.0/17',
          'JO': '176.29.0.0/16',
-        'JP': '126.0.0.0/8',
+        'JP': '133.0.0.0/8',
          'KE': '105.48.0.0/12',
          'KG': '158.181.128.0/17',
          'KH': '36.37.128.0/17',
          'KI': '103.25.140.0/22',
          'KM': '197.255.224.0/20',
-        'KN': '198.32.32.0/19',
+        'KN': '198.167.192.0/19',
          'KP': '175.45.176.0/22',
          'KR': '175.192.0.0/10',
          'KW': '37.36.0.0/14',
@@ -5099,10 +5107,10 @@ class GeoUtils(object):
          'KZ': '2.72.0.0/13',
          'LA': '115.84.64.0/18',
          'LB': '178.135.0.0/16',
-        'LC': '192.147.231.0/24',
+        'LC': '24.92.144.0/20',
          'LI': '82.117.0.0/19',
          'LK': '112.134.0.0/15',
-        'LR': '41.86.0.0/19',
+        'LR': '102.183.0.0/16',
          'LS': '129.232.0.0/17',
          'LT': '78.56.0.0/13',
          'LU': '188.42.0.0/16',
@@ -5127,7 +5135,7 @@ class GeoUtils(object):
          'MT': '46.11.0.0/16',
          'MU': '105.16.0.0/12',
          'MV': '27.114.128.0/18',
-        'MW': '105.234.0.0/16',
+        'MW': '102.70.0.0/15',
          'MX': '187.192.0.0/11',
          'MY': '175.136.0.0/13',
          'MZ': '197.218.0.0/15',
@@ -5158,23 +5166,23 @@ class GeoUtils(object):
          'PW': '202.124.224.0/20',
          'PY': '181.120.0.0/14',
          'QA': '37.210.0.0/15',
-        'RE': '139.26.0.0/16',
+        'RE': '102.35.0.0/16',
          'RO': '79.112.0.0/13',
-        'RS': '178.220.0.0/14',
+        'RS': '93.86.0.0/15',
          'RU': '5.136.0.0/13',
-        'RW': '105.178.0.0/15',
+        'RW': '41.186.0.0/16',
          'SA': '188.48.0.0/13',
          'SB': '202.1.160.0/19',
          'SC': '154.192.0.0/11',
-        'SD': '154.96.0.0/13',
+        'SD': '102.120.0.0/13',
          'SE': '78.64.0.0/12',
-        'SG': '152.56.0.0/14',
+        'SG': '8.128.0.0/10',
          'SI': '188.196.0.0/14',
          'SK': '78.98.0.0/15',
-        'SL': '197.215.0.0/17',
+        'SL': '102.143.0.0/17',
          'SM': '89.186.32.0/19',
          'SN': '41.82.0.0/15',
-        'SO': '197.220.64.0/19',
+        'SO': '154.115.192.0/18',
          'SR': '186.179.128.0/17',
          'SS': '105.235.208.0/21',
          'ST': '197.159.160.0/19',
@@ -5197,15 +5205,15 @@ class GeoUtils(object):
          'TV': '202.2.96.0/19',
          'TW': '120.96.0.0/11',
          'TZ': '156.156.0.0/14',
-        'UA': '93.72.0.0/13',
-        'UG': '154.224.0.0/13',
-        'US': '3.0.0.0/8',
+        'UA': '37.52.0.0/14',
+        'UG': '102.80.0.0/13',
+        'US': '6.0.0.0/8',
          'UY': '167.56.0.0/13',
-        'UZ': '82.215.64.0/18',
+        'UZ': '84.54.64.0/18',
          'VA': '212.77.0.0/19',
-        'VC': '24.92.144.0/20',
+        'VC': '207.191.240.0/21',
          'VE': '186.88.0.0/13',
-        'VG': '172.103.64.0/18',
+        'VG': '66.81.192.0/20',
          'VI': '146.226.0.0/16',
          'VN': '14.160.0.0/11',
          'VU': '202.80.32.0/20',
@@ -5214,8 +5222,8 @@ class GeoUtils(object):
          'YE': '134.35.0.0/16',
          'YT': '41.242.116.0/22',
          'ZA': '41.0.0.0/11',
-        'ZM': '165.56.0.0/13',
-        'ZW': '41.85.192.0/19',
+        'ZM': '102.144.0.0/13',
+        'ZW': '102.177.192.0/18',
      }
  
      @classmethod
@@ -5377,6 +5385,19 @@ def decode_packed_codes(code):
          obfucasted_code)
  
  
+def caesar(s, alphabet, shift):
+    if shift == 0:
+        return s
+    l = len(alphabet)
+    return ''.join(
+        alphabet[(alphabet.index(c) + shift) % l] if c in alphabet else c
+        for c in s)
+
+
+def rot47(s):
+    return caesar(s, r'''!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~''', 47)
+
+
  def parse_m3u8_attributes(attrib):
      info = {}
      for (key, val) in re.findall(r'(?P<key>[A-Z0-9-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)', attrib):
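The utils.py hunk above adds `caesar` and `rot47`. The standalone sketch below restates them so their behaviour can be tried in isolation. Two details differ from the patch and are choices made for this example: the alphabet is generated from ASCII code points 33 to 126, which is the same 94-character sequence as the literal in the patch, and the length variable is named `n`.

```python
def caesar(s, alphabet, shift):
    """Shift each character of s that occurs in alphabet by `shift` places."""
    if shift == 0:
        return s
    n = len(alphabet)
    return ''.join(
        alphabet[(alphabet.index(c) + shift) % n] if c in alphabet else c
        for c in s)


# Printable ASCII from '!' (33) to '~' (126), 94 characters in total.
ROT47_ALPHABET = ''.join(chr(i) for i in range(33, 127))


def rot47(s):
    return caesar(s, ROT47_ALPHABET, 47)


encoded = rot47('youtube-dl')
print(encoded)
print(rot47(encoded))  # applying rot47 twice restores the input
```

Because 47 is half of 94, `rot47` is its own inverse. Characters outside the alphabet, such as spaces, pass through unchanged.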
diff --git a/youtube_dl/version.py b/youtube_dl/version.py
index c3eafb068b21766290dcc0ae53771688abe2b999..fa6f7289a0e11c045f741090bfaaef6f006e5ed6 100644
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
  from __future__ import unicode_literals
  
-__version__ = '2019.09.28'
+__version__ = '2020.01.24'
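
Note on the caesar/rot47 helpers added in the youtube_dl/utils.py hunk above: the sketch below restates them in standalone form to show how they behave. It is an illustration, not the upstream code verbatim. The name ROT47_ALPHABET and the way the alphabet is built from code points 33..126 are assumptions made here for readability; the resulting string has the same 94 printable ASCII characters, in the same order, as the literal in the diff.

```python
# Standalone sketch of the helpers from the utils.py hunk (illustrative only).

def caesar(s, alphabet, shift):
    """Rotate each character of s that occurs in alphabet by shift places.

    Characters outside the alphabet are passed through unchanged.
    """
    if shift == 0:
        return s
    n = len(alphabet)
    return ''.join(
        alphabet[(alphabet.index(c) + shift) % n] if c in alphabet else c
        for c in s)


# Assumption: built from code points instead of the literal used in the diff.
ROT47_ALPHABET = ''.join(chr(i) for i in range(33, 127))


def rot47(s):
    # 47 is half of the 94-character alphabet, so rot47 undoes itself.
    return caesar(s, ROT47_ALPHABET, 47)


if __name__ == '__main__':
    sample = 'youtube-dl 2020.01.24'
    encoded = rot47(sample)
    print(encoded)
    # Applying rot47 twice restores the input; spaces are never touched.
    print(rot47(encoded) == sample)  # prints True
```

Because the shift is exactly half the alphabet length, the same function serves as both encoder and decoder. With any other shift, decoding needs a second call with the negated shift.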
ChangeLog
README.md
README.txt
devscripts/create-github-release.py
docs/supportedsites.md
test/test_all_urls.py
test/test_utils.py
youtube-dl
youtube-dl.1
youtube_dl/YoutubeDL.py
youtube_dl/downloader/hls.py
youtube_dl/extractor/abcotvs.py
youtube_dl/extractor/addanime.py [deleted file]
youtube_dl/extractor/adobetv.py
youtube_dl/extractor/americastestkitchen.py
youtube_dl/extractor/ard.py
youtube_dl/extractor/atresplayer.py
youtube_dl/extractor/audioboom.py
youtube_dl/extractor/azmedien.py
youtube_dl/extractor/bambuser.py [deleted file]
youtube_dl/extractor/bellmedia.py
youtube_dl/extractor/bitchute.py
youtube_dl/extractor/bokecc.py
youtube_dl/extractor/brightcove.py
youtube_dl/extractor/businessinsider.py
youtube_dl/extractor/canvas.py
youtube_dl/extractor/ceskatelevize.py
youtube_dl/extractor/channel9.py
youtube_dl/extractor/chaturbate.py
youtube_dl/extractor/cloudflarestream.py
youtube_dl/extractor/comcarcoff.py [deleted file]
youtube_dl/extractor/common.py
youtube_dl/extractor/contv.py [new file with mode: 0644]
youtube_dl/extractor/corus.py
youtube_dl/extractor/dailymotion.py
youtube_dl/extractor/daisuki.py [deleted file]
youtube_dl/extractor/daum.py
youtube_dl/extractor/dctp.py
youtube_dl/extractor/discovery.py
youtube_dl/extractor/discoverynetworks.py
youtube_dl/extractor/dplay.py
youtube_dl/extractor/drtv.py
youtube_dl/extractor/dumpert.py
youtube_dl/extractor/extractors.py
youtube_dl/extractor/facebook.py
youtube_dl/extractor/flipagram.py [deleted file]
youtube_dl/extractor/fox9.py
youtube_dl/extractor/gameone.py [deleted file]
youtube_dl/extractor/generic.py
youtube_dl/extractor/gfycat.py
youtube_dl/extractor/globo.py
youtube_dl/extractor/go.py
youtube_dl/extractor/go90.py [deleted file]
youtube_dl/extractor/hark.py [deleted file]
youtube_dl/extractor/hotstar.py
youtube_dl/extractor/iconosquare.py [deleted file]
youtube_dl/extractor/imggaming.py [new file with mode: 0644]
youtube_dl/extractor/internetvideoarchive.py
youtube_dl/extractor/ivi.py
youtube_dl/extractor/jamendo.py
youtube_dl/extractor/kakao.py
youtube_dl/extractor/kaltura.py
youtube_dl/extractor/keek.py [deleted file]
youtube_dl/extractor/kinja.py [new file with mode: 0644]
youtube_dl/extractor/kontrtube.py [deleted file]
youtube_dl/extractor/la7.py
youtube_dl/extractor/learnr.py [deleted file]
youtube_dl/extractor/lego.py
youtube_dl/extractor/lnkgo.py
youtube_dl/extractor/macgamestore.py [deleted file]
youtube_dl/extractor/mailru.py
youtube_dl/extractor/makertv.py [deleted file]
youtube_dl/extractor/mangomolo.py
youtube_dl/extractor/mediaset.py
youtube_dl/extractor/minhateca.py [deleted file]
youtube_dl/extractor/mit.py
youtube_dl/extractor/mitele.py
youtube_dl/extractor/mixcloud.py
youtube_dl/extractor/msn.py
youtube_dl/extractor/mtv.py
youtube_dl/extractor/musicplayon.py [deleted file]
youtube_dl/extractor/myspass.py
youtube_dl/extractor/naver.py
youtube_dl/extractor/nbc.py
youtube_dl/extractor/ndr.py
youtube_dl/extractor/nexx.py
youtube_dl/extractor/nintendo.py
youtube_dl/extractor/nonktube.py
youtube_dl/extractor/nrk.py
youtube_dl/extractor/nrl.py
youtube_dl/extractor/ntvru.py
youtube_dl/extractor/odnoklassniki.py
youtube_dl/extractor/onet.py
youtube_dl/extractor/onionstudios.py
youtube_dl/extractor/ooyala.py
youtube_dl/extractor/openload.py
youtube_dl/extractor/orf.py
youtube_dl/extractor/pandatv.py [deleted file]
youtube_dl/extractor/patreon.py
youtube_dl/extractor/peertube.py
youtube_dl/extractor/periscope.py
youtube_dl/extractor/pornhub.py
youtube_dl/extractor/promptfile.py [deleted file]
youtube_dl/extractor/prosiebensat1.py
youtube_dl/extractor/puhutv.py
youtube_dl/extractor/redtube.py
youtube_dl/extractor/revision3.py [deleted file]
youtube_dl/extractor/roosterteeth.py
youtube_dl/extractor/safari.py
youtube_dl/extractor/scrippsnetworks.py
youtube_dl/extractor/scte.py [new file with mode: 0644]
youtube_dl/extractor/seeker.py
youtube_dl/extractor/servingsys.py [deleted file]
youtube_dl/extractor/shared.py
youtube_dl/extractor/slideslive.py
youtube_dl/extractor/soundcloud.py
youtube_dl/extractor/spankbang.py
youtube_dl/extractor/srmediathek.py
youtube_dl/extractor/streamango.py [deleted file]
youtube_dl/extractor/stretchinternet.py
youtube_dl/extractor/stv.py
youtube_dl/extractor/teachable.py
youtube_dl/extractor/teachingchannel.py
youtube_dl/extractor/teamcoco.py
youtube_dl/extractor/telegraaf.py
youtube_dl/extractor/telequebec.py
youtube_dl/extractor/tenplay.py [new file with mode: 0644]
youtube_dl/extractor/thesun.py
youtube_dl/extractor/trunews.py
youtube_dl/extractor/tutv.py [deleted file]
youtube_dl/extractor/tv2.py
youtube_dl/extractor/tv2dk.py [new file with mode: 0644]
youtube_dl/extractor/twentyfourvideo.py
youtube_dl/extractor/twitch.py
youtube_dl/extractor/twitter.py
youtube_dl/extractor/ufctv.py
youtube_dl/extractor/vessel.py [deleted file]
youtube_dl/extractor/vice.py
youtube_dl/extractor/videodetective.py
youtube_dl/extractor/videopremium.py [deleted file]
youtube_dl/extractor/viewlift.py
youtube_dl/extractor/viewster.py [deleted file]
youtube_dl/extractor/vimeo.py
youtube_dl/extractor/vk.py
youtube_dl/extractor/vlive.py
youtube_dl/extractor/vodplatform.py
youtube_dl/extractor/voicerepublic.py
youtube_dl/extractor/vzaar.py
youtube_dl/extractor/wimp.py [deleted file]
youtube_dl/extractor/wistia.py
youtube_dl/extractor/xfileshare.py
youtube_dl/extractor/xvideos.py
youtube_dl/extractor/yahoo.py
youtube_dl/extractor/yourporn.py
youtube_dl/extractor/youtube.py
youtube_dl/extractor/zype.py
youtube_dl/postprocessor/ffmpeg.py
youtube_dl/utils.py
youtube_dl/version.py