Update upstream source from tag 'upstream/2019.09.01'

author Rogério Brito <rbrito@ime.usp.br>

Mon, 2 Sep 2019 17:47:34 +0000 (14:47 -0300)

committer Rogério Brito <rbrito@ime.usp.br>

Mon, 2 Sep 2019 17:47:34 +0000 (14:47 -0300)
diff --git a/ChangeLog b/ChangeLog

index 5ce78b07a234dd9bc30763bd616c63bb5f1d011a..e91e49854d4d7e7b409b7262698d206cd49a686e 100644 (file)
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,136 @@
+version 2019.09.01
+
+Core
++ [extractor/generic] Add support for squarespace embeds (#21294, #21802,
+  #21859)
++ [downloader/external] Respect mtime option for aria2c (#22242)
+
+Extractors
++ [xhamster:user] Add support for user pages (#16330, #18454)
++ [xhamster] Add support for more domains
++ [verystream] Add support for woof.tube (#22217)
++ [dailymotion] Add support for lequipe.fr (#21328, #22152)
++ [openload] Add support for oload.vip (#22205)
++ [bbccouk] Extend URL regular expression (#19200)
++ [youtube] Add support for invidious.nixnet.xyz and yt.elukerio.org (#22223)
+* [safari] Fix authentication (#22161, #22184)
+* [usanetwork] Fix extraction (#22105)
++ [einthusan] Add support for einthusan.ca (#22171)
+* [youtube] Improve unavailable message extraction (#22117)
++ [piksel] Extract subtitles (#20506)
+
+
+version 2019.08.13
+
+Core
+* [downloader/fragment] Fix ETA calculation of resumed download (#21992)
+* [YoutubeDL] Check annotations availability (#18582)
+
+Extractors
+* [youtube:playlist] Improve flat extraction (#21927)
+* [youtube] Fix annotations extraction (#22045)
++ [discovery] Extract series meta field (#21808)
+* [youtube] Improve error detection (#16445)
+* [vimeo] Fix album extraction (#1933, #15704, #15855, #18967, #21986)
++ [roosterteeth] Add support for watch URLs
+* [discovery] Limit video data by show slug (#21980)
+
+
+version 2019.08.02
+
+Extractors
++ [tvigle] Add support for HLS and DASH formats (#21967)
+* [tvigle] Fix extraction (#21967)
++ [yandexvideo] Add support for DASH formats (#21971)
+* [discovery] Use API call for video data extraction (#21808)
++ [mgtv] Extract format_note (#21881)
+* [tvn24] Fix metadata extraction (#21833, #21834)
+* [dlive] Relax URL regular expression (#21909)
++ [openload] Add support for oload.best (#21913)
+* [youtube] Improve metadata extraction for age gate content (#21943)
+
+
+version 2019.07.30
+
+Extractors
+* [youtube] Fix and improve title and description extraction (#21934)
+
+
+version 2019.07.27
+
+Extractors
++ [yahoo:japannews] Add support for yahoo.co.jp (#21698, #21265)
++ [discovery] Add support go.discovery.com URLs
+* [youtube:playlist] Relax video regular expression (#21844)
+* [generic] Restrict --default-search schemeless URLs detection pattern
+  (#21842)
+* [vrv] Fix CMS signing query extraction (#21809)
+
+
+version 2019.07.16
+
+Extractors
++ [asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv
+  (#21281, #21290)
+* [kaltura] Check source format URL (#21290)
+* [ctsnews] Fix YouTube embeds extraction (#21678)
++ [einthusan] Add support for einthusan.com (#21748, #21775)
++ [youtube] Add support for invidious.mastodon.host (#21777)
++ [gfycat] Extend URL regular expression (#21779, #21780)
+* [youtube] Restrict is_live extraction (#21782)
+
+
+version 2019.07.14
+
+Extractors
+* [porn91] Fix extraction (#21312)
++ [yandexmusic] Extract track number and disk number (#21421)
++ [yandexmusic] Add support for multi disk albums (#21420, #21421)
+* [lynda] Handle missing subtitles (#20490, #20513)
++ [youtube] Add more invidious instances to URL regular expression (#21694)
+* [twitter] Improve uploader id extraction (#21705)
+* [spankbang] Fix and improve metadata extraction
+* [spankbang] Fix extraction (#21763, #21764)
++ [dlive] Add support for dlive.tv (#18080)
++ [livejournal] Add support for livejournal.com (#21526)
+* [roosterteeth] Fix free episode extraction (#16094)
+* [dbtv] Fix extraction
+* [bellator] Fix extraction
+- [rudo] Remove extractor (#18430, #18474)
+* [facebook] Fallback to twitter:image meta for thumbnail extraction (#21224)
+* [bleacherreport] Fix Bleacher Report CMS extraction
+* [espn] Fix fivethirtyeight.com extraction
+* [5tv] Relax video URL regular expression and support https URLs
+* [youtube] Fix is_live extraction (#21734)
+* [youtube] Fix authentication (#11270)
+
+
+version 2019.07.12
+
+Core
++ [adobepass] Add support for AT&T U-verse (mso ATT) (#13938, #21016)
+
+Extractors
++ [mgtv] Pass Referer HTTP header for format URLs (#21726)
++ [beeg] Add support for api/v6 v2 URLs without t argument (#21701)
+* [voxmedia:volume] Improve vox embed extraction (#16846)
+* [funnyordie] Move extraction to VoxMedia extractor (#16846)
+* [gameinformer] Fix extraction (#8895, #15363, #17206)
+* [funk] Fix extraction (#17915)
+* [packtpub] Relax lesson URL regular expression (#21695)
+* [packtpub] Fix extraction (#21268)
+* [philharmoniedeparis] Relax URL regular expression (#21672)
+* [peertube] Detect embed URLs in generic extraction (#21666)
+* [mixer:vod] Relax URL regular expression (#21657, #21658)
++ [lecturio] Add support id based URLs (#21630)
++ [go] Add site info for disneynow (#21613)
+* [ted] Restrict info regular expression (#21631)
+* [twitch:vod] Actualize m3u8 URL (#21538, #21607)
+* [vzaar] Fix videos with empty title (#21606)
+* [tvland] Fix extraction (#21384)
+* [arte] Clean extractor (#15583, #21614)
+
+
  version 2019.07.02
  
  Core
diff --git a/README.md b/README.md

index 8c48a30121bdd53344868615bced9ce00b99e40b..c39b13616946fbb1c07377e1bc74a39a3dfc4ee5 100644 (file)
--- a/README.md
+++ b/README.md
@@ -1216,6 +1216,72 @@ Incorrect:
  'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
  ```
  
+### Inline values
+
+Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
+
+#### Example
+
+Correct:
+
+```python
+title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+```
+
+Incorrect:
+
+```python
+TITLE_RE = r'<title>([^<]+)</title>'
+# ...some lines of code...
+title = self._html_search_regex(TITLE_RE, webpage, 'title')
+```
+
+### Collapse fallbacks
+
+Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
+
+#### Example
+
+Good:
+
+```python
+description = self._html_search_meta(
+    ['og:description', 'description', 'twitter:description'],
+    webpage, 'description', default=None)
+```
+
+Unwieldy:
+
+```python
+description = (
+    self._og_search_description(webpage, default=None)
+    or self._html_search_meta('description', webpage, default=None)
+    or self._html_search_meta('twitter:description', webpage, default=None))
+```
+
+Methods supporting list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
+
+### Trailing parentheses
+
+Always move trailing parentheses after the last argument.
+
+#### Example
+
+Correct:
+
+```python
+    lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+    list)
+```
+
+Incorrect:
+
+```python
+    lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+    list,
+)
+```
+
  ### Use convenience conversion and parsing functions
  
  Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
diff --git a/README.txt b/README.txt

index 44cd53432b1f0cb9752d87a782d41ccc9d50c26c..045e0b19a873b35e5856240fe3975757288f0625 100644 (file)
--- a/README.txt
+++ b/README.txt
@@ -1648,6 +1648,65 @@ Incorrect:
      'https://www.youtube.com/watch?v=FqZTN594JQw&list='
      'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
  
+Inline values
+
+Extracting variables is acceptable for reducing code duplication and
+improving readability of complex expressions. However, you should avoid
+extracting variables used only once and moving them to opposite parts of
+the extractor file, which makes reading the linear flow difficult.
+
+Example
+
+Correct:
+
+    title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+
+Incorrect:
+
+    TITLE_RE = r'<title>([^<]+)</title>'
+    # ...some lines of code...
+    title = self._html_search_regex(TITLE_RE, webpage, 'title')
+
+Collapse fallbacks
+
+Multiple fallback values can quickly become unwieldy. Collapse multiple
+fallback values into a single expression via a list of patterns.
+
+Example
+
+Good:
+
+    description = self._html_search_meta(
+        ['og:description', 'description', 'twitter:description'],
+        webpage, 'description', default=None)
+
+Unwieldy:
+
+    description = (
+        self._og_search_description(webpage, default=None)
+        or self._html_search_meta('description', webpage, default=None)
+        or self._html_search_meta('twitter:description', webpage, default=None))
+
+Methods supporting list of patterns are: _search_regex,
+_html_search_regex, _og_search_property, _html_search_meta.
+
+Trailing parentheses
+
+Always move trailing parentheses after the last argument.
+
+Example
+
+Correct:
+
+        lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+        list)
+
+Incorrect:
+
+        lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+        list,
+    )
+
  Use convenience conversion and parsing functions
  
  Wrap all extracted numeric data into safe functions from
diff --git a/docs/supportedsites.md b/docs/supportedsites.md

index 55ae4314488d6cdffa6db1264cd2d44e9f0cec87..18bddc1383d9d1e36131585a8ce8536878f41e49 100644 (file)
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -58,16 +58,8 @@
   - **ARD:mediathek**
   - **ARDBetaMediathek**
   - **Arkena**
- - **arte.tv**
   - **arte.tv:+7**
- - **arte.tv:cinema**
- - **arte.tv:concert**
- - **arte.tv:creative**
- - **arte.tv:ddc**
   - **arte.tv:embed**
- - **arte.tv:future**
- - **arte.tv:info**
- - **arte.tv:magazine**
   - **arte.tv:playlist**
   - **AsianCrush**
   - **AsianCrushPlaylist**
@@ -231,6 +223,8 @@
   - **DiscoveryNetworksDe**
   - **DiscoveryVR**
   - **Disney**
+ - **dlive:stream**
+ - **dlive:vod**
   - **Dotsub**
   - **DouyuShow**
   - **DouyuTV**: 斗鱼
@@ -313,9 +307,7 @@
   - **FrontendMastersCourse**
   - **FrontendMastersLesson**
   - **Funimation**
- - **FunkChannel**
- - **FunkMix**
- - **FunnyOrDie**
+ - **Funk**
   - **Fusion**
   - **Fux**
   - **FXNetworks**
@@ -458,6 +450,7 @@
   - **linkedin:learning:course**
   - **LinuxAcademy**
   - **LiTV**
+ - **LiveJournal**
   - **LiveLeak**
   - **LiveLeakEmbed**
   - **livestream**
@@ -764,7 +757,6 @@
   - **rtve.es:television**
   - **RTVNH**
   - **RTVS**
- - **Rudo**
   - **RUHD**
   - **rutube**: Rutube videos
   - **rutube:channel**: Rutube channels
@@ -896,7 +888,6 @@
   - **TF1**
   - **TFO**
   - **TheIntercept**
- - **theoperaplatform**
   - **ThePlatform**
   - **ThePlatformFeed**
   - **TheScene**
@@ -1109,6 +1100,7 @@
   - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
   - **XHamster**
   - **XHamsterEmbed**
+ - **XHamsterUser**
   - **xiami:album**: 虾米音乐 - 专辑
   - **xiami:artist**: 虾米音乐 - 歌手
   - **xiami:collection**: 虾米音乐 - 精选集
@@ -1126,6 +1118,7 @@
   - **Yahoo**: Yahoo screen and movies
   - **yahoo:gyao**
   - **yahoo:gyao:player**
+ - **yahoo:japannews**: Yahoo! Japan News
   - **YandexDisk**
   - **yandexmusic:album**: Яндекс.Музыка - Альбом
   - **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
diff --git a/youtube-dl b/youtube-dl

index 8864e5d193fb62b49a044a89a7cfb320743b6a7d..76a8e49b8b8864bb89ef6bccaf7f4289a2beb302 100755 (executable)

Binary files a/youtube-dl and b/youtube-dl differ
diff --git a/youtube-dl.1 b/youtube-dl.1

index 119ecd2ee67b1b0d97c45da23004fed633035da8..fa17c311357571462837bc8e49309826d48251c2 100644 (file)
--- a/youtube-dl.1
+++ b/youtube-dl.1
@@ -2423,6 +2423,86 @@ Incorrect:
  \[aq]PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4\[aq]
  \f[]
  .fi
+.SS Inline values
+.PP
+Extracting variables is acceptable for reducing code duplication and
+improving readability of complex expressions.
+However, you should avoid extracting variables used only once and moving
+them to opposite parts of the extractor file, which makes reading the
+linear flow difficult.
+.SS Example
+.PP
+Correct:
+.IP
+.nf
+\f[C]
+title\ =\ self._html_search_regex(r\[aq]<title>([^<]+)</title>\[aq],\ webpage,\ \[aq]title\[aq])
+\f[]
+.fi
+.PP
+Incorrect:
+.IP
+.nf
+\f[C]
+TITLE_RE\ =\ r\[aq]<title>([^<]+)</title>\[aq]
+#\ ...some\ lines\ of\ code...
+title\ =\ self._html_search_regex(TITLE_RE,\ webpage,\ \[aq]title\[aq])
+\f[]
+.fi
+.SS Collapse fallbacks
+.PP
+Multiple fallback values can quickly become unwieldy.
+Collapse multiple fallback values into a single expression via a list of
+patterns.
+.SS Example
+.PP
+Good:
+.IP
+.nf
+\f[C]
+description\ =\ self._html_search_meta(
+\ \ \ \ [\[aq]og:description\[aq],\ \[aq]description\[aq],\ \[aq]twitter:description\[aq]],
+\ \ \ \ webpage,\ \[aq]description\[aq],\ default=None)
+\f[]
+.fi
+.PP
+Unwieldy:
+.IP
+.nf
+\f[C]
+description\ =\ (
+\ \ \ \ self._og_search_description(webpage,\ default=None)
+\ \ \ \ or\ self._html_search_meta(\[aq]description\[aq],\ webpage,\ default=None)
+\ \ \ \ or\ self._html_search_meta(\[aq]twitter:description\[aq],\ webpage,\ default=None))
+\f[]
+.fi
+.PP
+Methods supporting list of patterns are: \f[C]_search_regex\f[],
+\f[C]_html_search_regex\f[], \f[C]_og_search_property\f[],
+\f[C]_html_search_meta\f[].
+.SS Trailing parentheses
+.PP
+Always move trailing parentheses after the last argument.
+.SS Example
+.PP
+Correct:
+.IP
+.nf
+\f[C]
+\ \ \ \ lambda\ x:\ x[\[aq]ResultSet\[aq]][\[aq]Result\[aq]][0][\[aq]VideoUrlSet\[aq]][\[aq]VideoUrl\[aq]],
+\ \ \ \ list)
+\f[]
+.fi
+.PP
+Incorrect:
+.IP
+.nf
+\f[C]
+\ \ \ \ lambda\ x:\ x[\[aq]ResultSet\[aq]][\[aq]Result\[aq]][0][\[aq]VideoUrlSet\[aq]][\[aq]VideoUrl\[aq]],
+\ \ \ \ list,
+)
+\f[]
+.fi
  .SS Use convenience conversion and parsing functions
  .PP
  Wrap all extracted numeric data into safe functions from
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py

index 3e832fec29f621db0837c6775deac0721b9211ff..6a44bc7bab8b79c67d5529f58ab455cc5da5c53c 100755 (executable)
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -1783,6 +1783,8 @@ class YoutubeDL(object):
              annofn = replace_extension(filename, 'annotations.xml', info_dict.get('ext'))
              if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
                  self.to_screen('[info] Video annotations are already present')
+            elif not info_dict.get('annotations'):
+                self.report_warning('There are no annotations to write.')
              else:
                  try:
                      self.to_screen('[info] Writing video annotations to: ' + annofn)
diff --git a/youtube_dl/__init__.py b/youtube_dl/__init__.py

index 165c975dd758e90afa657b9d2ce4ee8101abd036..9a659fc654d2a3af5d63e47363f8cdfdcdc0c333 100644 (file)
--- a/youtube_dl/__init__.py
+++ b/youtube_dl/__init__.py
@@ -94,7 +94,7 @@ def _real_main(argv=None):
              if opts.verbose:
                  write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
          except IOError:
-            sys.exit('ERROR: batch file could not be read')
+            sys.exit('ERROR: batch file %s could not be read' % opts.batchfile)
      all_urls = batch_urls + [url.strip() for url in args]  # batch_urls are already striped in read_batch_urls
      _enc = preferredencoding()
      all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]
diff --git a/youtube_dl/downloader/dash.py b/youtube_dl/downloader/dash.py

index eaa7adf7c4f2fc39d84984642a1ef17c014d089f..c6d674bc6c0690e00dee3c70879a97574c3911f3 100644 (file)
--- a/youtube_dl/downloader/dash.py
+++ b/youtube_dl/downloader/dash.py
@@ -53,7 +53,7 @@ class DashSegmentsFD(FragmentFD):
                  except compat_urllib_error.HTTPError as err:
                      # YouTube may often return 404 HTTP error for a fragment causing the
                      # whole download to fail. However if the same fragment is immediately
-                    # retried with the same request data this usually succeeds (1-2 attemps
+                    # retried with the same request data this usually succeeds (1-2 attempts
                      # is usually enough) thus allowing to download the whole file successfully.
                      # To be future-proof we will retry all fragments that fail with any
                      # HTTP error.
diff --git a/youtube_dl/downloader/external.py b/youtube_dl/downloader/external.py

index acdb27712264005346f1e9163af08ce71b73e84d..c31f8910ad89ceeaba9582eea5dc75e6adde3757 100644 (file)
--- a/youtube_dl/downloader/external.py
+++ b/youtube_dl/downloader/external.py
@@ -194,6 +194,7 @@ class Aria2cFD(ExternalFD):
          cmd += self._option('--interface', 'source_address')
          cmd += self._option('--all-proxy', 'proxy')
          cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
+        cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
          cmd += ['--', info_dict['url']]
          return cmd
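Note on the hunk above: the added line maps youtube-dl's `updatetime` parameter onto aria2c's `--remote-time` flag. The following is a minimal sketch of how such a boolean option could expand into command-line arguments. `bool_option` is a simplified stand-in written for illustration, not the project's actual `_bool_option` helper, and the `params` dicts are hypothetical.

```python
def bool_option(params, command_option, param,
                true_value='true', false_value='false', separator=None):
    """Simplified stand-in (assumption) for a boolean CLI option helper."""
    value = params.get(param)
    if value is None:
        # Option not set by the user: emit nothing.
        return []
    chosen = true_value if value else false_value
    if separator:
        return [command_option + separator + chosen]
    return [command_option, chosen]


# Hypothetical parameter dicts, for illustration only.
mtime_on = bool_option({'updatetime': True}, '--remote-time',
                       'updatetime', 'true', 'false', '=')
mtime_off = bool_option({'updatetime': False}, '--remote-time',
                        'updatetime', 'true', 'false', '=')
mtime_unset = bool_option({}, '--remote-time',
                          'updatetime', 'true', 'false', '=')
```

Under these assumptions the flag is passed explicitly in both the enabled and disabled cases, and omitted only when the parameter is absent.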
  
diff --git a/youtube_dl/downloader/fragment.py b/youtube_dl/downloader/fragment.py

index f2e5733b6406603f9a52b26b65bd4bc3cc833fec..02f35459e82ddb0e39dad3b7790f2db7a90d85fa 100644 (file)
--- a/youtube_dl/downloader/fragment.py
+++ b/youtube_dl/downloader/fragment.py
@@ -190,12 +190,13 @@ class FragmentFD(FileDownloader):
          })
  
      def _start_frag_download(self, ctx):
+        resume_len = ctx['complete_frags_downloaded_bytes']
          total_frags = ctx['total_frags']
          # This dict stores the download progress, it's updated by the progress
          # hook
          state = {
              'status': 'downloading',
-            'downloaded_bytes': ctx['complete_frags_downloaded_bytes'],
+            'downloaded_bytes': resume_len,
              'fragment_index': ctx['fragment_index'],
              'fragment_count': total_frags,
              'filename': ctx['filename'],
@@ -234,8 +235,8 @@ class FragmentFD(FileDownloader):
                  state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
                  if not ctx['live']:
                      state['eta'] = self.calc_eta(
-                        start, time_now, estimated_size,
-                        state['downloaded_bytes'])
+                        start, time_now, estimated_size - resume_len,
+                        state['downloaded_bytes'] - resume_len)
                  state['speed'] = s.get('speed') or ctx.get('speed')
                  ctx['speed'] = state['speed']
                  ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
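Note on the hunk above: subtracting `resume_len` from both the estimated size and the downloaded byte count keeps bytes left over from a previous run out of the rate estimate. A minimal sketch of the effect follows. `calc_eta` here is a simplified elapsed-time/rate formula assumed for illustration (the real `FileDownloader.calc_eta` may differ in detail), and all numbers are hypothetical.

```python
def calc_eta(start, now, total, current):
    """Simplified ETA estimate (assumption): remaining bytes / average rate."""
    if total is None:
        return None
    elapsed = now - start
    if current == 0 or elapsed < 0.001:
        return None
    rate = float(current) / elapsed
    return int((float(total) - float(current)) / rate)


MIB = 1024 * 1024
resume_len = 80 * MIB                  # bytes already on disk from an earlier run
estimated_size = 100 * MIB             # estimated total size of the download
downloaded = resume_len + 5 * MIB      # 5 MiB fetched in this session
start, now = 0.0, 10.0                 # this session has run for 10 seconds

# Old call: resumed bytes inflate the apparent download rate.
eta_before = calc_eta(start, now, estimated_size, downloaded)

# New call: only bytes fetched in this session count toward the rate.
eta_after = calc_eta(start, now, estimated_size - resume_len,
                     downloaded - resume_len)
```

With these numbers the old call treats 85 MiB as downloaded in 10 seconds and reports about 1 second remaining, while the adjusted call reports 30 seconds for the remaining 15 MiB at 0.5 MiB/s.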
diff --git a/youtube_dl/downloader/ism.py b/youtube_dl/downloader/ism.py

index 063fcf4446447d566785c24f94df94d95553de16..1ca666b4a11e26dff4424ba802a09895443ca08c 100644 (file)
--- a/youtube_dl/downloader/ism.py
+++ b/youtube_dl/downloader/ism.py
@@ -146,7 +146,7 @@ def write_piff_header(stream, params):
              sps, pps = codec_private_data.split(u32.pack(1))[1:]
              avcc_payload = u8.pack(1)  # configuration version
              avcc_payload += sps[1:4]  # avc profile indication + profile compatibility + avc level indication
-            avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1))  # complete represenation (1) + reserved (11111) + length size minus one
+            avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1))  # complete representation (1) + reserved (11111) + length size minus one
              avcc_payload += u8.pack(1)  # reserved (0) + number of sps (0000001)
              avcc_payload += u16.pack(len(sps))
              avcc_payload += sps
diff --git a/youtube_dl/extractor/abcnews.py b/youtube_dl/extractor/abcnews.py

index cd29aca7789cd677081c0293848ff4b85d74db21..8b407bf9c6a2af1b8fc3a8e7c9e5f87068f508c5 100644 (file)
--- a/youtube_dl/extractor/abcnews.py
+++ b/youtube_dl/extractor/abcnews.py
@@ -15,10 +15,13 @@ class AbcNewsVideoIE(AMPIE):
      IE_NAME = 'abcnews:video'
      _VALID_URL = r'''(?x)
                      https?://
-                        abcnews\.go\.com/
                          (?:
-                            [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
-                            video/embed\?.*?\bid=
+                            abcnews\.go\.com/
+                            (?:
+                                [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
+                                video/embed\?.*?\bid=
+                            )|
+                            fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
                          )
                          (?P<id>\d+)
                      '''
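Note on the hunk above: the restructured `_VALID_URL` adds a second host alternative for fivethirtyeight embeds while keeping the two existing `abcnews.go.com` forms. The sketch below copies the pattern from the added lines and checks it with `re.match`. The `video_id` helper and the sample URLs are illustrative assumptions, not taken from the extractor's tests.

```python
import re

# Pattern copied from the "+" side of the hunk above.
VALID_URL = r'''(?x)
                https?://
                    (?:
                        abcnews\.go\.com/
                        (?:
                            [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
                            video/embed\?.*?\bid=
                        )|
                        fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
                    )
                    (?P<id>\d+)
                '''


def video_id(url):
    """Hypothetical helper: return the captured id, or None if no match."""
    match = re.match(VALID_URL, url)
    return match.group('id') if match else None


# Illustrative URLs (assumptions).
embed_id = video_id('http://abcnews.go.com/video/embed?id=46979033')
fte_id = video_id('https://fivethirtyeight.abcnews.go.com/video/embed/56150/56238736')
other_id = video_id('https://example.com/video/1')
```

In the fivethirtyeight form the first numeric path segment is skipped and the trailing number is captured as the video id.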
diff --git a/youtube_dl/extractor/adobepass.py b/youtube_dl/extractor/adobepass.py

index 1cf2dcbf35567bc6a47664e1bbf05234eecaf2fb..38dca1b0a4c6891edeec4c225b9afc173048e1d5 100644 (file)
--- a/youtube_dl/extractor/adobepass.py
+++ b/youtube_dl/extractor/adobepass.py
@@ -25,6 +25,11 @@ MSO_INFO = {
          'username_field': 'username',
          'password_field': 'password',
      },
+    'ATT': {
+        'name': 'AT&T U-verse',
+        'username_field': 'userid',
+        'password_field': 'password',
+    },
      'ATTOTT': {
          'name': 'DIRECTV NOW',
          'username_field': 'email',
diff --git a/youtube_dl/extractor/arte.py b/youtube_dl/extractor/arte.py

index ffc321821cd3a4a0ba9a62ad97b6f443d8ecacb3..2bd3bfe8a852159cf7aefa93e731bab995ca7d23 100644 (file)
--- a/youtube_dl/extractor/arte.py
+++ b/youtube_dl/extractor/arte.py
@@ -4,17 +4,10 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_parse_qs,
-    compat_str,
-    compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
  from ..utils import (
      ExtractorError,
-    find_xpath_attr,
-    get_element_by_attribute,
      int_or_none,
-    NO_DEFAULT,
      qualities,
      try_get,
      unified_strdate,
@@ -25,59 +18,7 @@ from ..utils import (
  # add tests.
  
  
-class ArteTvIE(InfoExtractor):
-    _VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
-    IE_NAME = 'arte.tv'
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        lang = mobj.group('lang')
-        video_id = mobj.group('id')
-
-        ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
-        ref_xml_url = ref_xml_url.replace('.html', ',view,asPlayerXml.xml')
-        ref_xml_doc = self._download_xml(
-            ref_xml_url, video_id, note='Downloading metadata')
-        config_node = find_xpath_attr(ref_xml_doc, './/video', 'lang', lang)
-        config_xml_url = config_node.attrib['ref']
-        config = self._download_xml(
-            config_xml_url, video_id, note='Downloading configuration')
-
-        formats = [{
-            'format_id': q.attrib['quality'],
-            # The playpath starts at 'mp4:', if we don't manually
-            # split the url, rtmpdump will incorrectly parse them
-            'url': q.text.split('mp4:', 1)[0],
-            'play_path': 'mp4:' + q.text.split('mp4:', 1)[1],
-            'ext': 'flv',
-            'quality': 2 if q.attrib['quality'] == 'hd' else 1,
-        } for q in config.findall('./urls/url')]
-        self._sort_formats(formats)
-
-        title = config.find('.//name').text
-        thumbnail = config.find('.//firstThumbnailUrl').text
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'formats': formats,
-        }
-
-
  class ArteTVBaseIE(InfoExtractor):
-    @classmethod
-    def _extract_url_info(cls, url):
-        mobj = re.match(cls._VALID_URL, url)
-        lang = mobj.group('lang')
-        query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
-        if 'vid' in query:
-            video_id = query['vid'][0]
-        else:
-            # This is not a real id, it can be for example AJT for the news
-            # http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
-            video_id = mobj.group('id')
-        return video_id, lang
-
      def _extract_from_json_url(self, json_url, video_id, lang, title=None):
          info = self._download_json(json_url, video_id)
          player_info = info['videoJsonPlayer']
@@ -108,13 +49,15 @@ class ArteTVBaseIE(InfoExtractor):
              'upload_date': unified_strdate(upload_date_str),
              'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
          }
-        qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
+        qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
  
          LANGS = {
              'fr': 'F',
              'de': 'A',
              'en': 'E[ANG]',
              'es': 'E[ESP]',
+            'it': 'E[ITA]',
+            'pl': 'E[POL]',
          }
  
          langcode = LANGS.get(lang, lang)
@@ -126,8 +69,8 @@ class ArteTVBaseIE(InfoExtractor):
              l = re.escape(langcode)
  
              # Language preference from most to least priority
-            # Reference: section 5.6.3 of
-            # http://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-05.pdf
+            # Reference: section 6.8 of
+            # https://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-07-1.pdf
              PREFERENCES = (
                  # original version in requested language, without subtitles
                  r'VO{0}$'.format(l),
@@ -193,274 +136,59 @@ class ArteTVBaseIE(InfoExtractor):
  
  class ArteTVPlus7IE(ArteTVBaseIE):
      IE_NAME = 'arte.tv:+7'
-    _VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/(?:[^/]+/)?(?P<lang>fr|de|en|es)/(?:videos/)?(?:[^/]+/)*(?P<id>[^/?#&]+)'
-
-    _TESTS = [{
-        'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
-        'only_matching': True,
-    }, {
-        'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.arte.tv/de/videos/048696-000-A/der-kluge-bauch-unser-zweites-gehirn',
-        'only_matching': True,
-    }]
-
-    @classmethod
-    def suitable(cls, url):
-        return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
-
-    def _real_extract(self, url):
-        video_id, lang = self._extract_url_info(url)
-        webpage = self._download_webpage(url, video_id)
-        return self._extract_from_webpage(webpage, video_id, lang)
-
-    def _extract_from_webpage(self, webpage, video_id, lang):
-        patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
-        ids = (video_id, '')
-        # some pages contain multiple videos (like
-        # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
-        # so we first try to look for json URLs that contain the video id from
-        # the 'vid' parameter.
-        patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
-        json_url = self._html_search_regex(
-            patterns, webpage, 'json vp url', default=None)
-        if not json_url:
-            def find_iframe_url(webpage, default=NO_DEFAULT):
-                return self._html_search_regex(
-                    r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
-                    webpage, 'iframe url', group='url', default=default)
-
-            iframe_url = find_iframe_url(webpage, None)
-            if not iframe_url:
-                embed_url = self._html_search_regex(
-                    r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
-                if embed_url:
-                    player = self._download_json(
-                        embed_url, video_id, 'Downloading player page')
-                    iframe_url = find_iframe_url(player['html'])
-            # en and es URLs produce react-based pages with different layout (e.g.
-            # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
-            if not iframe_url:
-                program = self._search_regex(
-                    r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
-                    webpage, 'program', default=None)
-                if program:
-                    embed_html = self._parse_json(program, video_id)
-                    if embed_html:
-                        iframe_url = find_iframe_url(embed_html['embed_html'])
-            if iframe_url:
-                json_url = compat_parse_qs(
-                    compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
-        if json_url:
-            title = self._search_regex(
-                r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
-                webpage, 'title', default=None, group='title')
-            return self._extract_from_json_url(json_url, video_id, lang, title=title)
-        # Different kind of embed URL (e.g.
-        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
-        entries = [
-            self.url_result(url)
-            for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
-        return self.playlist_result(entries)
-
-
-# It also uses the arte_vp_url url from the webpage to extract the information
-class ArteTVCreativeIE(ArteTVPlus7IE):
-    IE_NAME = 'arte.tv:creative'
-    _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
-
-    _TESTS = [{
-        'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
-        'info_dict': {
-            'id': '057405-001-A',
-            'ext': 'mp4',
-            'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
-            'upload_date': '20150716',
-        },
-    }, {
-        'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
-        'playlist_count': 11,
-        'add_ie': ['Youtube'],
-    }, {
-        'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
-        'only_matching': True,
-    }]
-
-
-class ArteTVInfoIE(ArteTVPlus7IE):
-    IE_NAME = 'arte.tv:info'
-    _VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
-
-    _TESTS = [{
-        'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
-        'info_dict': {
-            'id': '067528-000-A',
-            'ext': 'mp4',
-            'title': 'Service civique, un cache misère ?',
-            'upload_date': '20160403',
-        },
-    }]
-
-
-class ArteTVFutureIE(ArteTVPlus7IE):
-    IE_NAME = 'arte.tv:future'
-    _VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
  
      _TESTS = [{
-        'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
+        'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
          'info_dict': {
-            'id': '050940-028-A',
+            'id': '088501-000-A',
              'ext': 'mp4',
-            'title': 'Les écrevisses aussi peuvent être anxieuses',
-            'upload_date': '20140902',
+            'title': 'Mexico: Stealing Petrol to Survive',
+            'upload_date': '20190628',
          },
-    }, {
-        'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
-        'only_matching': True,
      }]
  
-
-class ArteTVDDCIE(ArteTVPlus7IE):
-    IE_NAME = 'arte.tv:ddc'
-    _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
-
-    _TESTS = []
-
      def _real_extract(self, url):
-        video_id, lang = self._extract_url_info(url)
-        if lang == 'folge':
-            lang = 'de'
-        elif lang == 'emission':
-            lang = 'fr'
-        webpage = self._download_webpage(url, video_id)
-        scriptElement = get_element_by_attribute('class', 'visu_video_block', webpage)
-        script_url = self._html_search_regex(r'src="(.*?)"', scriptElement, 'script url')
-        javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
-        json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
-        return self._extract_from_json_url(json_url, video_id, lang)
-
-
-class ArteTVConcertIE(ArteTVPlus7IE):
-    IE_NAME = 'arte.tv:concert'
-    _VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
-
-    _TESTS = [{
-        'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
-        'md5': '9ea035b7bd69696b67aa2ccaaa218161',
-        'info_dict': {
-            'id': '186',
-            'ext': 'mp4',
-            'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
-            'upload_date': '20140128',
-            'description': 'md5:486eb08f991552ade77439fe6d82c305',
-        },
-    }]
-
-
-class ArteTVCinemaIE(ArteTVPlus7IE):
-    IE_NAME = 'arte.tv:cinema'
-    _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
-
-    _TESTS = [{
-        'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
-        'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
-        'info_dict': {
-            'id': '062494-000-A',
-            'ext': 'mp4',
-            'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
-            'upload_date': '20150807',
-        },
-    }]
-
-
-class ArteTVMagazineIE(ArteTVPlus7IE):
-    IE_NAME = 'arte.tv:magazine'
-    _VALID_URL = r'https?://(?:www\.)?arte\.tv/magazine/[^/]+/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
-
-    _TESTS = [{
-        # Embedded via <iframe src="http://www.arte.tv/arte_vp/index.php?json_url=..."
-        'url': 'http://www.arte.tv/magazine/trepalium/fr/entretien-avec-le-realisateur-vincent-lannoo-trepalium',
-        'md5': '2a9369bcccf847d1c741e51416299f25',
-        'info_dict': {
-            'id': '065965-000-A',
-            'ext': 'mp4',
-            'title': 'Trepalium - Extrait Ep.01',
-            'upload_date': '20160121',
-        },
-    }, {
-        # Embedded via <iframe src="http://www.arte.tv/guide/fr/embed/054813-004-A/medium"
-        'url': 'http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium',
-        'md5': 'fedc64fc7a946110fe311634e79782ca',
-        'info_dict': {
-            'id': '054813-004_PLUS7-F',
-            'ext': 'mp4',
-            'title': 'Trepalium (4/6)',
-            'description': 'md5:10057003c34d54e95350be4f9b05cb40',
-            'upload_date': '20160218',
-        },
-    }, {
-        'url': 'http://www.arte.tv/magazine/metropolis/de/frank-woeste-german-paris-metropolis',
-        'only_matching': True,
-    }]
+        lang, video_id = re.match(self._VALID_URL, url).groups()
+        return self._extract_from_json_url(
+            'https://api.arte.tv/api/player/v1/config/%s/%s' % (lang, video_id),
+            video_id, lang)
  
  
  class ArteTVEmbedIE(ArteTVPlus7IE):
      IE_NAME = 'arte.tv:embed'
      _VALID_URL = r'''(?x)
-        http://www\.arte\.tv
-        /(?:playerv2/embed|arte_vp/index)\.php\?json_url=
+        https://www\.arte\.tv
+        /player/v3/index\.php\?json_url=
          (?P<json_url>
-            http://arte\.tv/papi/tvguide/videos/stream/player/
-            (?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*
+            https?://api\.arte\.tv/api/player/v1/config/
+            (?P<lang>[^/]+)/(?P<id>\d{6}-\d{3}-[AF])
          )
      '''
  
      _TESTS = []
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        lang = mobj.group('lang')
-        json_url = mobj.group('json_url')
+        json_url, lang, video_id = re.match(self._VALID_URL, url).groups()
          return self._extract_from_json_url(json_url, video_id, lang)
  
  
-class TheOperaPlatformIE(ArteTVPlus7IE):
-    IE_NAME = 'theoperaplatform'
-    _VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
-
-    _TESTS = [{
-        'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
-        'md5': '970655901fa2e82e04c00b955e9afe7b',
-        'info_dict': {
-            'id': '060338-009-A',
-            'ext': 'mp4',
-            'title': 'Verdi - OTELLO',
-            'upload_date': '20160927',
-        },
-    }]
-
-
  class ArteTVPlaylistIE(ArteTVBaseIE):
      IE_NAME = 'arte.tv:playlist'
-    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'
  
      _TESTS = [{
-        'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV',
+        'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/',
          'info_dict': {
-            'id': 'PL-013263',
-            'title': 'Areva & Uramin',
-            'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
+            'id': 'RC-016954',
+            'title': 'Earn a Living',
+            'description': 'md5:d322c55011514b3a7241f7fb80d494c2',
          },
          'playlist_mincount': 6,
-    }, {
-        'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
-        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        playlist_id, lang = self._extract_url_info(url)
+        lang, playlist_id = re.match(self._VALID_URL, url).groups()
          collection = self._download_json(
              'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
              % (lang, playlist_id), playlist_id)
diff --git a/youtube_dl/extractor/asiancrush.py b/youtube_dl/extractor/asiancrush.py
index 6d71c5ad5f8dc6a0277688ea947d64d03b06a0d0..0348e680c6456946fb582ffadc90fd3fb9fa2e11 100644
--- a/youtube_dl/extractor/asiancrush.py
+++ b/youtube_dl/extractor/asiancrush.py
@@ -5,14 +5,12 @@ import re
  
  from .common import InfoExtractor
  from .kaltura import KalturaIE
-from ..utils import (
-    extract_attributes,
-    remove_end,
-)
+from ..utils import extract_attributes
  
  
  class AsianCrushIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
+    _VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
+    _VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE
      _TESTS = [{
          'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
          'md5': 'c3b740e48d0ba002a42c0b72857beae6',
@@ -20,7 +18,7 @@ class AsianCrushIE(InfoExtractor):
              'id': '1_y4tmjm5r',
              'ext': 'mp4',
              'title': 'Women Who Flirt',
-            'description': 'md5:3db14e9186197857e7063522cb89a805',
+            'description': 'md5:7e986615808bcfb11756eb503a751487',
              'timestamp': 1496936429,
              'upload_date': '20170608',
              'uploader_id': 'craig@crifkin.com',
@@ -28,10 +26,27 @@ class AsianCrushIE(InfoExtractor):
      }, {
          'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
          'only_matching': True,
+    }, {
+        'url': 'https://www.yuyutv.com/video/013886v/the-act-of-killing/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.yuyutv.com/video/peep-show/013922v-warring-factions/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.midnightpulp.com/video/010400v/drifters/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.midnightpulp.com/video/mononoke/016378v-zashikiwarashi-part-1/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        host = mobj.group('host')
+        video_id = mobj.group('id')
  
          webpage = self._download_webpage(url, video_id)
  
@@ -51,7 +66,7 @@ class AsianCrushIE(InfoExtractor):
                  r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
  
          player = self._download_webpage(
-            'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
+            'https://api.%s/embeddedVideoPlayer' % host, video_id,
              query={'id': entry_id})
  
          kaltura_id = self._search_regex(
@@ -63,15 +78,23 @@ class AsianCrushIE(InfoExtractor):
                  r'/p(?:artner_id)?/(\d+)', player, 'partner id',
                  default='513551')
  
-        return self.url_result(
-            'kaltura:%s:%s' % (partner_id, kaltura_id),
-            ie=KalturaIE.ie_key(), video_id=kaltura_id,
-            video_title=title)
+        description = self._html_search_regex(
+            r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
+            webpage, 'description', fatal=False)
+
+        return {
+            '_type': 'url_transparent',
+            'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
+            'ie_key': KalturaIE.ie_key(),
+            'id': video_id,
+            'title': title,
+            'description': description,
+        }
  
  
  class AsianCrushPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b'
-    _TEST = {
+    _VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE
+    _TESTS = [{
          'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
          'info_dict': {
              'id': '12481',
@@ -79,7 +102,16 @@ class AsianCrushPlaylistIE(InfoExtractor):
              'description': 'md5:7addd7c5132a09fd4741152d96cce886',
          },
          'playlist_count': 20,
-    }
+    }, {
+        'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.midnightpulp.com/series/016375s/mononoke/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          playlist_id = self._match_id(url)
@@ -96,15 +128,15 @@ class AsianCrushPlaylistIE(InfoExtractor):
                  entries.append(self.url_result(
                      mobj.group('url'), ie=AsianCrushIE.ie_key()))
  
-        title = remove_end(
-            self._html_search_regex(
-                r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
-                'title', default=None) or self._og_search_title(
-                webpage, default=None) or self._html_search_meta(
-                'twitter:title', webpage, 'title',
-                default=None) or self._search_regex(
-                r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
-            ' | AsianCrush')
+        title = self._html_search_regex(
+            r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
+            'title', default=None) or self._og_search_title(
+            webpage, default=None) or self._html_search_meta(
+            'twitter:title', webpage, 'title',
+            default=None) or self._search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
+        if title:
+            title = re.sub(r'\s*\|\s*.+?$', '', title)
  
          description = self._og_search_description(
              webpage, default=None) or self._html_search_meta(
diff --git a/youtube_dl/extractor/bbc.py b/youtube_dl/extractor/bbc.py
index e76507951203b2f305a9fb161073e75611e5e919..901c5a54fb6f9d3320fbd0827a222c2c9e9676f0 100644
--- a/youtube_dl/extractor/bbc.py
+++ b/youtube_dl/extractor/bbc.py
@@ -40,6 +40,7 @@ class BBCCoUkIE(InfoExtractor):
                              iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
                              music/(?:clips|audiovideo/popular)[/#]|
                              radio/player/|
+                            sounds/play/|
                              events/[^/]+/play/[^/]+/
                          )
                          (?P<id>%s)(?!/(?:episodes|broadcasts|clips))
@@ -70,7 +71,7 @@ class BBCCoUkIE(InfoExtractor):
              'info_dict': {
                  'id': 'b039d07m',
                  'ext': 'flv',
-                'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
+                'title': 'Kaleidoscope, Leonard Cohen',
                  'description': 'The Canadian poet and songwriter reflects on his musical career.',
              },
              'params': {
@@ -220,6 +221,20 @@ class BBCCoUkIE(InfoExtractor):
                  # rtmp download
                  'skip_download': True,
              },
+        }, {
+            'url': 'https://www.bbc.co.uk/sounds/play/m0007jzb',
+            'note': 'Audio',
+            'info_dict': {
+                'id': 'm0007jz9',
+                'ext': 'mp4',
+                'title': 'BBC Proms, 2019, Prom 34: West–Eastern Divan Orchestra',
+                'description': "Live BBC Proms. West–Eastern Divan Orchestra with Daniel Barenboim and Martha Argerich.",
+                'duration': 9840,
+            },
+            'params': {
+                # rtmp download
+                'skip_download': True,
+            }
          }, {
              'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
              'only_matching': True,
@@ -609,7 +624,7 @@ class BBCIE(BBCCoUkIE):
          'url': 'http://www.bbc.com/news/world-europe-32668511',
          'info_dict': {
              'id': 'world-europe-32668511',
-            'title': 'Russia stages massive WW2 parade despite Western boycott',
+            'title': 'Russia stages massive WW2 parade',
              'description': 'md5:00ff61976f6081841f759a08bf78cc9c',
          },
          'playlist_count': 2,
diff --git a/youtube_dl/extractor/beampro.py b/youtube_dl/extractor/beampro.py
index e264a145ff54ad74071dea7b07cb57036757bb1a..86abdae00ce330d82bc4bbeee9612e13db0e1f7b 100644
--- a/youtube_dl/extractor/beampro.py
+++ b/youtube_dl/extractor/beampro.py
@@ -99,7 +99,7 @@ class BeamProLiveIE(BeamProBaseIE):
  
  class BeamProVodIE(BeamProBaseIE):
      IE_NAME = 'Mixer:vod'
-    _VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>\w+)'
+    _VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
      _TESTS = [{
          'url': 'https://mixer.com/willow8714?vod=2259830',
          'md5': 'b2431e6e8347dc92ebafb565d368b76b',
@@ -122,6 +122,9 @@ class BeamProVodIE(BeamProBaseIE):
      }, {
          'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
          'only_matching': True,
+    }, {
+        'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
+        'only_matching': True,
      }]
  
      @staticmethod
diff --git a/youtube_dl/extractor/beeg.py b/youtube_dl/extractor/beeg.py
index c15a0ac8fdb28b95f9399a843b45322a44967bc0..5788d13baae816ad69155d42f5f2d2b5ce837231 100644
--- a/youtube_dl/extractor/beeg.py
+++ b/youtube_dl/extractor/beeg.py
@@ -32,6 +32,10 @@ class BeegIE(InfoExtractor):
          # api/v6 v2
          'url': 'https://beeg.com/1941093077?t=911-1391',
          'only_matching': True,
+    }, {
+        # api/v6 v2 w/o t
+        'url': 'https://beeg.com/1277207756',
+        'only_matching': True,
      }, {
          'url': 'https://beeg.porn/video/5416503',
          'only_matching': True,
@@ -49,14 +53,17 @@ class BeegIE(InfoExtractor):
              r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage, 'beeg version',
              default='1546225636701')
  
-        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
-        t = qs.get('t', [''])[0].split('-')
-        if len(t) > 1:
+        if len(video_id) >= 10:
              query = {
                  'v': 2,
-                's': t[0],
-                'e': t[1],
              }
+            qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+            t = qs.get('t', [''])[0].split('-')
+            if len(t) > 1:
+                query.update({
+                    's': t[0],
+                    'e': t[1],
+                })
          else:
              query = {'v': 1}
  
diff --git a/youtube_dl/extractor/biobiochiletv.py b/youtube_dl/extractor/biobiochiletv.py
index b92031c8ab6bac3c5d5135743dda5883de080855..dc86c57c5df2e1efdb995cae9bd482f31b6c48fa 100644
--- a/youtube_dl/extractor/biobiochiletv.py
+++ b/youtube_dl/extractor/biobiochiletv.py
@@ -6,7 +6,6 @@ from ..utils import (
      ExtractorError,
      remove_end,
  )
-from .rudo import RudoIE
  
  
  class BioBioChileTVIE(InfoExtractor):
@@ -41,11 +40,15 @@ class BioBioChileTVIE(InfoExtractor):
      }, {
          'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
          'info_dict': {
-            'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
+            'id': 'b4xd0LK3SK',
              'ext': 'mp4',
-            'uploader': '(none)',
-            'upload_date': '20160708',
-            'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
+            # TODO: fix url_transparent information overriding
+            # 'uploader': 'Juan Pablo Echenique',
+            'title': 'Comentario Oscar Cáceres',
+        },
+        'params': {
+            # empty m3u8 manifest
+            'skip_download': True,
          },
      }, {
          'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
@@ -60,7 +63,9 @@ class BioBioChileTVIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        rudo_url = RudoIE._extract_url(webpage)
+        rudo_url = self._search_regex(
+            r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
+            webpage, 'embed URL', None, group='url')
          if not rudo_url:
              raise ExtractorError('No videos found')
  
@@ -68,7 +73,7 @@ class BioBioChileTVIE(InfoExtractor):
  
          thumbnail = self._og_search_thumbnail(webpage)
          uploader = self._html_search_regex(
-            r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
+            r'<a[^>]+href=["\'](?:https?://(?:busca|www)\.biobiochile\.cl)?/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
              webpage, 'uploader', fatal=False)
  
          return {
diff --git a/youtube_dl/extractor/bleacherreport.py b/youtube_dl/extractor/bleacherreport.py
index e829974ff1e5aa6e0b79c0567e2febe0fc683028..dc60224d00466f1943f2b7644efd4f4ba89a3d31 100644
--- a/youtube_dl/extractor/bleacherreport.py
+++ b/youtube_dl/extractor/bleacherreport.py
@@ -71,7 +71,7 @@ class BleacherReportIE(InfoExtractor):
          video = article_data.get('video')
          if video:
              video_type = video['type']
-            if video_type == 'cms.bleacherreport.com':
+            if video_type in ('cms.bleacherreport.com', 'vid.bleacherreport.com'):
                  info['url'] = 'http://bleacherreport.com/video_embed?id=%s' % video['id']
              elif video_type == 'ooyala.com':
                  info['url'] = 'ooyala:%s' % video['id']
@@ -87,9 +87,9 @@ class BleacherReportIE(InfoExtractor):
  
  
  class BleacherReportCMSIE(AMPIE):
-    _VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36})'
+    _VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
      _TESTS = [{
-        'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
+        'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
          'md5': '2e4b0a997f9228ffa31fada5c53d1ed1',
          'info_dict': {
              'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
@@ -101,6 +101,6 @@ class BleacherReportCMSIE(AMPIE):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        info = self._extract_feed_info('http://cms.bleacherreport.com/media/items/%s/akamai.json' % video_id)
+        info = self._extract_feed_info('http://vid.bleacherreport.com/videos/%s.akamai' % video_id)
          info['id'] = video_id
          return info
diff --git a/youtube_dl/extractor/common.py b/youtube_dl/extractor/common.py
index 9c3e9eec6fb91e000dc632736691b53162dc4d05..85978661793a77411419d619f654a33755d23d4b 100644
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -220,7 +220,7 @@ class InfoExtractor(object):
                          * "preference" (optional, int) - quality of the image
                          * "width" (optional, int)
                          * "height" (optional, int)
-                        * "resolution" (optional, string "{width}x{height"},
+                        * "resolution" (optional, string "{width}x{height}",
                                          deprecated)
                          * "filesize" (optional, int)
      thumbnail:      Full URL to a video thumbnail image.
diff --git a/youtube_dl/extractor/ctsnews.py b/youtube_dl/extractor/ctsnews.py
index d565335cf6c31a047b8882415afb4ea259578a06..679f1d92e95f3d78dcc945f727d53aee6182a93b 100644
--- a/youtube_dl/extractor/ctsnews.py
+++ b/youtube_dl/extractor/ctsnews.py
@@ -3,6 +3,7 @@ from __future__ import unicode_literals
  
  from .common import InfoExtractor
  from ..utils import unified_timestamp
+from .youtube import YoutubeIE
  
  
  class CtsNewsIE(InfoExtractor):
@@ -14,8 +15,8 @@ class CtsNewsIE(InfoExtractor):
          'info_dict': {
              'id': '201501291578109',
              'ext': 'mp4',
-            'title': '以色列.真主黨交火 3人死亡',
-            'description': '以色列和黎巴嫩真主黨，爆發五年最嚴重衝突，雙方砲轟交火，兩名以軍死亡，還有一名西班牙籍的聯合國維和人...',
+            'title': '以色列.真主黨交火 3人死亡 - 華視新聞網',
+            'description': '以色列和黎巴嫩真主黨，爆發五年最嚴重衝突，雙方砲轟交火，兩名以軍死亡，還有一名西班牙籍的聯合國維和人員也不幸罹難。大陸陝西、河南、安徽、江蘇和湖北五個省份出現大暴雪，嚴重影響陸空交通，不過九華山卻出現...',
              'timestamp': 1422528540,
              'upload_date': '20150129',
          }
@@ -26,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
          'info_dict': {
              'id': '201309031304098',
              'ext': 'mp4',
-            'title': '韓國31歲童顏男 貌如十多歲小孩',
+            'title': '韓國31歲童顏男 貌如十多歲小孩 - 華視新聞網',
              'description': '越有年紀的人，越希望看起來年輕一點，而南韓卻有一位31歲的男子，看起來像是11、12歲的小孩，身...',
              'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1378205880,
@@ -62,8 +63,7 @@ class CtsNewsIE(InfoExtractor):
              video_url = mp4_feed['source_url']
          else:
              self.to_screen('Not CTSPlayer video, trying Youtube...')
-            youtube_url = self._search_regex(
-                r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
+            youtube_url = YoutubeIE._extract_url(page)
  
              return self.url_result(youtube_url, ie='Youtube')
  
diff --git a/youtube_dl/extractor/dailymotion.py b/youtube_dl/extractor/dailymotion.py
index 3d3d78041a57fee341a8ded60cfe605c28fb687a..745971900b23f1cbf44ce286b9c4a4a7e527a3da 100644
--- a/youtube_dl/extractor/dailymotion.py
+++ b/youtube_dl/extractor/dailymotion.py
@@ -48,7 +48,14 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
  
  
  class DailymotionIE(DailymotionBaseInfoExtractor):
-    _VALID_URL = r'(?i)https?://(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|#)/)?video|swf)/(?P<id>[^/?_]+)'
+    _VALID_URL = r'''(?ix)
+                    https?://
+                        (?:
+                            (?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
+                            (?:www\.)?lequipe\.fr/video
+                        )
+                        /(?P<id>[^/?_]+)
+                    '''
      IE_NAME = 'dailymotion'
  
      _FORMATS = [
@@ -133,6 +140,12 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
      }, {
          'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun',
          'only_matching': True,
+    }, {
+        'url': 'https://www.lequipe.fr/video/x791mem',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
+        'only_matching': True,
      }]
  
      @staticmethod
diff --git a/youtube_dl/extractor/dbtv.py b/youtube_dl/extractor/dbtv.py
index f232f0dc536f612530e6ca7cfa0fde97e20b9467..aaedf2e3d36084d90699277a52aebb6c3a71a335 100644
--- a/youtube_dl/extractor/dbtv.py
+++ b/youtube_dl/extractor/dbtv.py
@@ -7,50 +7,51 @@ from .common import InfoExtractor
  
  
  class DBTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
+    _VALID_URL = r'https?://(?:www\.)?dagbladet\.no/video/(?:(?:embed|(?P<display_id>[^/]+))/)?(?P<id>[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8})'
      _TESTS = [{
-        'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
-        'md5': '2e24f67936517b143a234b4cadf792ec',
+        'url': 'https://www.dagbladet.no/video/PynxJnNWChE/',
+        'md5': 'b8f850ba1860adbda668d367f9b77699',
          'info_dict': {
-            'id': '3649835190001',
-            'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
+            'id': 'PynxJnNWChE',
              'ext': 'mp4',
              'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
-            'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
+            'description': 'md5:49cc8370e7d66e8a2ef15c3b4631fd3f',
              'thumbnail': r're:https?://.*\.jpg',
-            'timestamp': 1404039863,
-            'upload_date': '20140629',
-            'duration': 69.544,
-            'uploader_id': '1027729757001',
+            'upload_date': '20160916',
+            'duration': 69,
+            'uploader_id': 'UCk5pvsyZJoYJBd7_oFPTlRQ',
+            'uploader': 'Dagbladet',
          },
-        'add_ie': ['BrightcoveNew']
+        'add_ie': ['Youtube']
      }, {
-        'url': 'http://dbtv.no/3649835190001',
+        'url': 'https://www.dagbladet.no/video/embed/xlGmyIeN9Jo/?autoplay=false',
          'only_matching': True,
      }, {
-        'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
-        'only_matching': True,
-    }, {
-        'url': 'http://dbtv.no/vice/5000634109001',
-        'only_matching': True,
-    }, {
-        'url': 'http://dbtv.no/filmtrailer/3359293614001',
+        'url': 'https://www.dagbladet.no/video/truer-iran-bor-passe-dere/PalfB2Cw',
          'only_matching': True,
      }]
  
      @staticmethod
      def _extract_urls(webpage):
          return [url for _, url in re.findall(
-            r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1',
+            r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dagbladet\.no/video/embed/(?:[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8}).*?)\1',
              webpage)]
  
      def _real_extract(self, url):
-        video_id, display_id = re.match(self._VALID_URL, url).groups()
-
-        return {
+        display_id, video_id = re.match(self._VALID_URL, url).groups()
+        info = {
              '_type': 'url_transparent',
-            'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
              'id': video_id,
              'display_id': display_id,
-            'ie_key': 'BrightcoveNew',
          }
+        if len(video_id) == 11:
+            info.update({
+                'url': video_id,
+                'ie_key': 'Youtube',
+            })
+        else:
+            info.update({
+                'url': 'jwplatform:' + video_id,
+                'ie_key': 'JWPlatform',
+            })
+        return info
diff --git a/youtube_dl/extractor/discovery.py b/youtube_dl/extractor/discovery.py
index b70c307a75b520aa63e44fea69762cacc6bbc319..6a2712cc50429b7297a9d4fe9e1ec2d80177986e 100644
--- a/youtube_dl/extractor/discovery.py
+++ b/youtube_dl/extractor/discovery.py
@@ -5,23 +5,17 @@ import re
  import string
  
  from .discoverygo import DiscoveryGoBaseIE
-from ..compat import (
-    compat_str,
-    compat_urllib_parse_unquote,
-)
-from ..utils import (
-    ExtractorError,
-    try_get,
-)
+from ..compat import compat_urllib_parse_unquote
+from ..utils import ExtractorError
  from ..compat import compat_HTTPError
  
  
  class DiscoveryIE(DiscoveryGoBaseIE):
      _VALID_URL = r'''(?x)https?://
          (?P<site>
+            (?:(?:www|go)\.)?discovery|
              (?:www\.)?
                  (?:
-                    discovery|
                      investigationdiscovery|
                      discoverylife|
                      animalplanet|
@@ -40,15 +34,15 @@ class DiscoveryIE(DiscoveryGoBaseIE):
                      cookingchanneltv|
                      motortrend
                  )
-        )\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''
+        )\.com/tv-shows/(?P<show_slug>[^/]+)/(?:video|full-episode)s/(?P<id>[^./?#]+)'''
      _TESTS = [{
-        'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley',
+        'url': 'https://go.discovery.com/tv-shows/cash-cab/videos/riding-with-matthew-perry',
          'info_dict': {
-            'id': '5a2d9b4d6b66d17a5026e1fd',
+            'id': '5a2f35ce6b66d17a5026e29e',
              'ext': 'mp4',
-            'title': 'Dave Foley',
-            'description': 'md5:4b39bcafccf9167ca42810eb5f28b01f',
-            'duration': 608,
+            'title': 'Riding with Matthew Perry',
+            'description': 'md5:a34333153e79bc4526019a5129e7f878',
+            'duration': 84,
          },
          'params': {
              'skip_download': True,  # requires ffmpeg
@@ -56,20 +50,20 @@ class DiscoveryIE(DiscoveryGoBaseIE):
      }, {
          'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision',
          'only_matching': True,
+    }, {
+        'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road',
+        'only_matching': True,
+    }, {
+        # using `show_slug` is important to get the correct video data
+        'url': 'https://www.sciencechannel.com/tv-shows/mythbusters-on-science/full-episodes/christmas-special',
+        'only_matching': True,
      }]
      _GEO_COUNTRIES = ['US']
      _GEO_BYPASS = False
+    _API_BASE_URL = 'https://api.discovery.com/v1/'
  
      def _real_extract(self, url):
-        site, path, display_id = re.match(self._VALID_URL, url).groups()
-        webpage = self._download_webpage(url, display_id)
-
-        react_data = self._parse_json(self._search_regex(
-            r'window\.__reactTransmitPacket\s*=\s*({.+?});',
-            webpage, 'react data'), display_id)
-        content_blocks = react_data['layout'][path]['contentBlocks']
-        video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
-        video_id = video['id']
+        site, show_slug, display_id = re.match(self._VALID_URL, url).groups()
  
          access_token = None
          cookies = self._get_cookies(url)
@@ -79,27 +73,36 @@ class DiscoveryIE(DiscoveryGoBaseIE):
          if auth_storage_cookie and auth_storage_cookie.value:
              auth_storage = self._parse_json(compat_urllib_parse_unquote(
                  compat_urllib_parse_unquote(auth_storage_cookie.value)),
-                video_id, fatal=False) or {}
+                display_id, fatal=False) or {}
              access_token = auth_storage.get('a') or auth_storage.get('access_token')
  
          if not access_token:
              access_token = self._download_json(
-                'https://%s.com/anonymous' % site, display_id, query={
+                'https://%s.com/anonymous' % site, display_id,
+                'Downloading token JSON metadata', query={
                      'authRel': 'authorization',
-                    'client_id': try_get(
-                        react_data, lambda x: x['application']['apiClientId'],
-                        compat_str) or '3020a40c2356a645b4b4',
+                    'client_id': '3020a40c2356a645b4b4',
                      'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
                      'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
                  })['access_token']
  
-        try:
-            headers = self.geo_verification_headers()
-            headers['Authorization'] = 'Bearer ' + access_token
+        headers = self.geo_verification_headers()
+        headers['Authorization'] = 'Bearer ' + access_token
  
+        try:
+            video = self._download_json(
+                self._API_BASE_URL + 'content/videos',
+                display_id, 'Downloading content JSON metadata',
+                headers=headers, query={
+                    'embed': 'show.name',
+                    'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags',
+                    'slug': display_id,
+                    'show_slug': show_slug,
+                })[0]
+            video_id = video['id']
              stream = self._download_json(
-                'https://api.discovery.com/v1/streaming/video/' + video_id,
-                display_id, headers=headers)
+                self._API_BASE_URL + 'streaming/video/' + video_id,
+                display_id, 'Downloading streaming JSON metadata', headers=headers)
          except ExtractorError as e:
              if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
                  e_description = self._parse_json(
diff --git a/youtube_dl/extractor/dlive.py b/youtube_dl/extractor/dlive.py
new file mode 100644
index 0000000..d95c67a
--- /dev/null
+++ b/youtube_dl/extractor/dlive.py
@@ -0,0 +1,97 @@
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class DLiveVODIE(InfoExtractor):
+    IE_NAME = 'dlive:vod'
+    _VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'https://dlive.tv/p/pdp+3mTzOl4WR',
+        'info_dict': {
+            'id': '3mTzOl4WR',
+            'ext': 'mp4',
+            'title': 'Minecraft with james charles epic',
+            'upload_date': '20190701',
+            'timestamp': 1562011015,
+            'uploader_id': 'pdp',
+        }
+    }, {
+        'url': 'https://dlive.tv/p/pdpreplay+D-RD-xSZg',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        uploader_id, vod_id = re.match(self._VALID_URL, url).groups()
+        broadcast = self._download_json(
+            'https://graphigo.prd.dlive.tv/', vod_id,
+            data=json.dumps({'query': '''query {
+  pastBroadcast(permlink:"%s+%s") {
+    content
+    createdAt
+    length
+    playbackUrl
+    title
+    thumbnailUrl
+    viewCount
+  }
+}''' % (uploader_id, vod_id)}).encode())['data']['pastBroadcast']
+        title = broadcast['title']
+        formats = self._extract_m3u8_formats(
+            broadcast['playbackUrl'], vod_id, 'mp4', 'm3u8_native')
+        self._sort_formats(formats)
+        return {
+            'id': vod_id,
+            'title': title,
+            'uploader_id': uploader_id,
+            'formats': formats,
+            'description': broadcast.get('content'),
+            'thumbnail': broadcast.get('thumbnailUrl'),
+            'timestamp': int_or_none(broadcast.get('createdAt'), 1000),
+            'view_count': int_or_none(broadcast.get('viewCount')),
+        }
+
+
+class DLiveStreamIE(InfoExtractor):
+    IE_NAME = 'dlive:stream'
+    _VALID_URL = r'https?://(?:www\.)?dlive\.tv/(?!p/)(?P<id>[\w.-]+)'
+
+    def _real_extract(self, url):
+        display_name = self._match_id(url)
+        user = self._download_json(
+            'https://graphigo.prd.dlive.tv/', display_name,
+            data=json.dumps({'query': '''query {
+  userByDisplayName(displayname:"%s") {
+    livestream {
+      content
+      createdAt
+      title
+      thumbnailUrl
+      watchingCount
+    }
+    username
+  }
+}''' % display_name}).encode())['data']['userByDisplayName']
+        livestream = user['livestream']
+        title = livestream['title']
+        username = user['username']
+        formats = self._extract_m3u8_formats(
+            'https://live.prd.dlive.tv/hls/live/%s.m3u8' % username,
+            display_name, 'mp4')
+        self._sort_formats(formats)
+        return {
+            'id': display_name,
+            'title': self._live_title(title),
+            'uploader': display_name,
+            'uploader_id': username,
+            'formats': formats,
+            'description': livestream.get('content'),
+            'thumbnail': livestream.get('thumbnailUrl'),
+            'is_live': True,
+            'timestamp': int_or_none(livestream.get('createdAt'), 1000),
+            'view_count': int_or_none(livestream.get('watchingCount')),
+        }
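Note on the new dlive.py file: the VOD extractor splits the permlink in the URL into an uploader id and a VOD id, then sends both back to the GraphQL endpoint as a single `permlink` argument. The sketch below reproduces only that step outside of youtube-dl. The pattern is copied from `DLiveVODIE._VALID_URL`; the helper name `build_vod_request` and the shortened field list in the query are assumptions made for illustration, and no network request is performed.

```python
import json
import re

# Pattern copied from DLiveVODIE._VALID_URL in the diff above.
VOD_URL_RE = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[^/?#&]+)'


def build_vod_request(url):
    """Hypothetical helper (not part of youtube-dl).

    Returns (uploader_id, vod_id, body), where body is the JSON-encoded
    GraphQL payload as bytes, in the shape the extractor posts.
    """
    mobj = re.match(VOD_URL_RE, url)
    if mobj is None:
        raise ValueError('not a DLive VOD URL: %s' % url)
    uploader_id, vod_id = mobj.groups()
    # Shortened field list; the extractor requests more fields than this.
    query = 'query { pastBroadcast(permlink:"%s+%s") { title playbackUrl } }' % (
        uploader_id, vod_id)
    body = json.dumps({'query': query}).encode('utf-8')
    return uploader_id, vod_id, body
```

Because `uploader_id` is matched non-greedily, it stops at the first `+` in the path, so everything after that first `+` (up to `/`, `?`, `#` or `&`) becomes the VOD id.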
diff --git a/youtube_dl/extractor/einthusan.py b/youtube_dl/extractor/einthusan.py
index 4485bf8c1a8a25e2650ad3568dab7f8b54159ea2..4e0f8bc819c70730a476ca31cd4320cecdc25b3d 100644
--- a/youtube_dl/extractor/einthusan.py
+++ b/youtube_dl/extractor/einthusan.py
@@ -2,6 +2,7 @@
  from __future__ import unicode_literals
  
  import json
+import re
  
  from .common import InfoExtractor
  from ..compat import (
@@ -18,7 +19,7 @@ from ..utils import (
  
  
  class EinthusanIE(InfoExtractor):
-    _VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com|ca))/movie/watch/(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'https://einthusan.tv/movie/watch/9097/',
          'md5': 'ff0f7f2065031b8a2cf13a933731c035',
@@ -32,6 +33,12 @@ class EinthusanIE(InfoExtractor):
      }, {
          'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
          'only_matching': True,
+    }, {
+        'url': 'https://einthusan.com/movie/watch/9097/',
+        'only_matching': True,
+    }, {
+        'url': 'https://einthusan.ca/movie/watch/4E9n/?lang=hindi',
+        'only_matching': True,
      }]
  
      # reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
@@ -41,7 +48,9 @@ class EinthusanIE(InfoExtractor):
          )).decode('utf-8'), video_id)
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        host = mobj.group('host')
+        video_id = mobj.group('id')
  
          webpage = self._download_webpage(url, video_id)
  
@@ -53,7 +62,7 @@ class EinthusanIE(InfoExtractor):
          page_id = self._html_search_regex(
              '<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
          video_data = self._download_json(
-            'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
+            'https://%s/ajax/movie/watch/%s/' % (host, video_id), video_id,
              data=urlencode_postdata({
                  'xEvent': 'UIVideoPlayer.PingOutcome',
                  'xJson': json.dumps({
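Note on the einthusan.py change: the AJAX endpoint is no longer hard-coded to einthusan.tv but rebuilt from the host captured in the URL. A minimal sketch of that derivation follows, assuming only the standard library. The pattern is copied from the new `_VALID_URL`; the helper name `ajax_url` is hypothetical and does not exist in youtube-dl.

```python
import re

# Pattern copied from the updated EinthusanIE._VALID_URL in the diff above.
VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com|ca))/movie/watch/(?P<id>[^/?#&]+)'


def ajax_url(url):
    """Hypothetical helper: build the AJAX URL on the same host as the page."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        raise ValueError('unsupported URL: %s' % url)
    # Same string template as the '+' line in the hunk above.
    return 'https://%s/ajax/movie/watch/%s/' % (
        mobj.group('host'), mobj.group('id'))
```

Keeping the request on the host the user supplied presumably avoids cross-domain differences between the .tv, .com and .ca mirrors; the diff itself does not state the reason.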
diff --git a/youtube_dl/extractor/espn.py b/youtube_dl/extractor/espn.py
index 8cc9bd165a96185b264e6e5976ae347fe02e2437..6cf05e6da8204c1e7cb6f7790fcb24104c8341cd 100644
--- a/youtube_dl/extractor/espn.py
+++ b/youtube_dl/extractor/espn.py
@@ -216,17 +216,14 @@ class FiveThirtyEightIE(InfoExtractor):
      _TEST = {
          'url': 'http://fivethirtyeight.com/features/how-the-6-8-raiders-can-still-make-the-playoffs/',
          'info_dict': {
-            'id': '21846851',
-            'ext': 'mp4',
+            'id': '56032156',
+            'ext': 'flv',
              'title': 'FiveThirtyEight: The Raiders can still make the playoffs',
              'description': 'Neil Paine breaks down the simplest scenario that will put the Raiders into the playoffs at 8-8.',
-            'timestamp': 1513960621,
-            'upload_date': '20171222',
          },
          'params': {
              'skip_download': True,
          },
-        'expected_warnings': ['Unable to download f4m manifest'],
      }
  
      def _real_extract(self, url):
@@ -234,9 +231,8 @@ class FiveThirtyEightIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        video_id = self._search_regex(
-            r'data-video-id=["\'](?P<id>\d+)',
-            webpage, 'video id', group='id')
+        embed_url = self._search_regex(
+            r'<iframe[^>]+src=["\'](https?://fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/\d+)',
+            webpage, 'embed url')
  
-        return self.url_result(
-            'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())
+        return self.url_result(embed_url, 'AbcNewsVideo')
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
index 530474f3fc61282799a64d9c52589de3a0cc38a1..4adcae1e5a240bc81f53e0ed63f1ae6ca19b9442 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -58,17 +58,8 @@ from .ard import (
      ARDMediathekIE,
  )
  from .arte import (
-    ArteTvIE,
      ArteTVPlus7IE,
-    ArteTVCreativeIE,
-    ArteTVConcertIE,
-    ArteTVInfoIE,
-    ArteTVFutureIE,
-    ArteTVCinemaIE,
-    ArteTVDDCIE,
-    ArteTVMagazineIE,
      ArteTVEmbedIE,
-    TheOperaPlatformIE,
      ArteTVPlaylistIE,
  )
  from .asiancrush import (
@@ -404,11 +395,7 @@ from .frontendmasters import (
      FrontendMastersCourseIE
  )
  from .funimation import FunimationIE
-from .funk import (
-    FunkMixIE,
-    FunkChannelIE,
-)
-from .funnyordie import FunnyOrDieIE
+from .funk import FunkIE
  from .fusion import FusionIE
  from .fxnetworks import FXNetworksIE
  from .gaia import GaiaIE
@@ -592,6 +579,7 @@ from .linkedin import (
  )
  from .linuxacademy import LinuxAcademyIE
  from .litv import LiTVIE
+from .livejournal import LiveJournalIE
  from .liveleak import (
      LiveLeakIE,
      LiveLeakEmbedIE,
@@ -980,7 +968,6 @@ from .rts import RTSIE
  from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
  from .rtvnh import RTVNHIE
  from .rtvs import RTVSIE
-from .rudo import RudoIE
  from .ruhd import RUHDIE
  from .rutube import (
      RutubeIE,
@@ -1268,6 +1255,10 @@ from .udn import UDNEmbedIE
  from .ufctv import UFCTVIE
  from .uktvplay import UKTVPlayIE
  from .digiteka import DigitekaIE
+from .dlive import (
+    DLiveVODIE,
+    DLiveStreamIE,
+)
  from .umg import UMGDeIE
  from .unistra import UnistraIE
  from .unity import UnityIE
@@ -1434,6 +1425,7 @@ from .xfileshare import XFileShareIE
  from .xhamster import (
      XHamsterIE,
      XHamsterEmbedIE,
+    XHamsterUserIE,
  )
  from .xiami import (
      XiamiSongIE,
@@ -1457,6 +1449,7 @@ from .yahoo import (
      YahooSearchIE,
      YahooGyaOPlayerIE,
      YahooGyaOIE,
+    YahooJapanNewsIE,
  )
  from .yandexdisk import YandexDiskIE
  from .yandexmusic import (
diff --git a/youtube_dl/extractor/facebook.py b/youtube_dl/extractor/facebook.py
index 789dd79d5474ea8f7c9d18df244c686076e3e646..a3dcdca3e2bd424bd7a24a800bdf457ec37a033f 100644
--- a/youtube_dl/extractor/facebook.py
+++ b/youtube_dl/extractor/facebook.py
@@ -428,7 +428,7 @@ class FacebookIE(InfoExtractor):
          timestamp = int_or_none(self._search_regex(
              r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
              'timestamp', default=None))
-        thumbnail = self._og_search_thumbnail(webpage)
+        thumbnail = self._html_search_meta(['og:image', 'twitter:image'], webpage)
  
          view_count = parse_count(self._search_regex(
              r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count',
diff --git a/youtube_dl/extractor/fivetv.py b/youtube_dl/extractor/fivetv.py
index 9f9863746d3daacb8f008a36add7d237ea6ee12f..c4c0f1b3d1451a6779c53e53071e701d225f9bac 100644
--- a/youtube_dl/extractor/fivetv.py
+++ b/youtube_dl/extractor/fivetv.py
@@ -9,7 +9,7 @@ from ..utils import int_or_none
  
  class FiveTVIE(InfoExtractor):
      _VALID_URL = r'''(?x)
-                    http://
+                    https?://
                          (?:www\.)?5-tv\.ru/
                          (?:
                              (?:[^/]+/)+(?P<id>\d+)|
@@ -39,6 +39,7 @@ class FiveTVIE(InfoExtractor):
              'duration': 180,
          },
      }, {
+        # redirect to https://www.5-tv.ru/projects/1000095/izvestia-glavnoe/
          'url': 'http://www.5-tv.ru/glavnoe/#itemDetails',
          'info_dict': {
              'id': 'glavnoe',
@@ -46,6 +47,7 @@ class FiveTVIE(InfoExtractor):
              'title': r're:^Итоги недели с \d+ по \d+ \w+ \d{4} года$',
              'thumbnail': r're:^https?://.*\.jpg$',
          },
+        'skip': 'redirect to «Известия. Главное» project page',
      }, {
          'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/',
          'only_matching': True,
@@ -70,7 +72,7 @@ class FiveTVIE(InfoExtractor):
          webpage = self._download_webpage(url, video_id)
  
          video_url = self._search_regex(
-            [r'<div[^>]+?class="flowplayer[^>]+?data-href="([^"]+)"',
+            [r'<div[^>]+?class="(?:flow)?player[^>]+?data-href="([^"]+)"',
               r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"'],
              webpage, 'video url')
  
diff --git a/youtube_dl/extractor/funk.py b/youtube_dl/extractor/funk.py
index 7e1af95e0fcd86b72115ef60b4b6dabe52f94d77..81d1949fd22fac93e78460e47bd79a6a50ff0105 100644
--- a/youtube_dl/extractor/funk.py
+++ b/youtube_dl/extractor/funk.py
@@ -1,89 +1,21 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import itertools
  import re
  
  from .common import InfoExtractor
  from .nexx import NexxIE
-from ..compat import compat_str
  from ..utils import (
      int_or_none,
-    try_get,
+    str_or_none,
  )
  
  
-class FunkBaseIE(InfoExtractor):
-    _HEADERS = {
-        'Accept': '*/*',
-        'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
-        'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4',
-    }
-    _AUTH = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4'
-
-    @staticmethod
-    def _make_headers(referer):
-        headers = FunkBaseIE._HEADERS.copy()
-        headers['Referer'] = referer
-        return headers
-
-    def _make_url_result(self, video):
-        return {
-            '_type': 'url_transparent',
-            'url': 'nexx:741:%s' % video['sourceId'],
-            'ie_key': NexxIE.ie_key(),
-            'id': video['sourceId'],
-            'title': video.get('title'),
-            'description': video.get('description'),
-            'duration': int_or_none(video.get('duration')),
-            'season_number': int_or_none(video.get('seasonNr')),
-            'episode_number': int_or_none(video.get('episodeNr')),
-        }
-
-
-class FunkMixIE(FunkBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?funk\.net/mix/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
+class FunkIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?funk\.net/(?:channel|playlist)/[^/]+/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
      _TESTS = [{
-        'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/die-realste-kifferdoku-aller-zeiten',
-        'md5': '8edf617c2f2b7c9847dfda313f199009',
-        'info_dict': {
-            'id': '123748',
-            'ext': 'mp4',
-            'title': '"Die realste Kifferdoku aller Zeiten"',
-            'description': 'md5:c97160f5bafa8d47ec8e2e461012aa9d',
-            'timestamp': 1490274721,
-            'upload_date': '20170323',
-        },
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        mix_id = mobj.group('id')
-        alias = mobj.group('alias')
-
-        lists = self._download_json(
-            'https://www.funk.net/api/v3.1/curation/curatedLists/',
-            mix_id, headers=self._make_headers(url), query={
-                'size': 100,
-            })['_embedded']['curatedListList']
-
-        metas = next(
-            l for l in lists
-            if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas']
-        video = next(
-            meta['videoDataDelegate']
-            for meta in metas
-            if try_get(
-                meta, lambda x: x['videoDataDelegate']['alias'],
-                compat_str) == alias)
-
-        return self._make_url_result(video)
-
-
-class FunkChannelIE(FunkBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?funk\.net/channel/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
-    _TESTS = [{
-        'url': 'https://www.funk.net/channel/ba/die-lustigsten-instrumente-aus-dem-internet-teil-2',
+        'url': 'https://www.funk.net/channel/ba-793/die-lustigsten-instrumente-aus-dem-internet-teil-2-1155821',
+        'md5': '8dd9d9ab59b4aa4173b3197f2ea48e81',
          'info_dict': {
              'id': '1155821',
              'ext': 'mp4',
@@ -92,83 +24,26 @@ class FunkChannelIE(FunkBaseIE):
              'timestamp': 1514507395,
              'upload_date': '20171229',
          },
-        'params': {
-            'skip_download': True,
-        },
-    }, {
-        # only available via byIdList API
-        'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
-        'info_dict': {
-            'id': '205067',
-            'ext': 'mp4',
-            'title': 'Martin Sonneborn erklärt die EU',
-            'description': 'md5:050f74626e4ed87edf4626d2024210c0',
-            'timestamp': 1494424042,
-            'upload_date': '20170510',
-        },
-        'params': {
-            'skip_download': True,
-        },
+
      }, {
-        'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
+        'url': 'https://www.funk.net/playlist/neuesteVideos/kameras-auf-dem-fusion-festival-1618699',
          'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        channel_id = mobj.group('id')
-        alias = mobj.group('alias')
-
-        headers = self._make_headers(url)
-
-        video = None
-
-        # Id-based channels are currently broken on their side: webplayer
-        # tries to process them via byChannelAlias endpoint and fails
-        # predictably.
-        for page_num in itertools.count():
-            by_channel_alias = self._download_json(
-                'https://www.funk.net/api/v3.1/webapp/videos/byChannelAlias/%s'
-                % channel_id,
-                'Downloading byChannelAlias JSON page %d' % (page_num + 1),
-                headers=headers, query={
-                    'filterFsk': 'false',
-                    'sort': 'creationDate,desc',
-                    'size': 100,
-                    'page': page_num,
-                }, fatal=False)
-            if not by_channel_alias:
-                break
-            video_list = try_get(
-                by_channel_alias, lambda x: x['_embedded']['videoList'], list)
-            if not video_list:
-                break
-            try:
-                video = next(r for r in video_list if r.get('alias') == alias)
-                break
-            except StopIteration:
-                pass
-            if not try_get(
-                    by_channel_alias, lambda x: x['_links']['next']):
-                break
-
-        if not video:
-            by_id_list = self._download_json(
-                'https://www.funk.net/api/v3.0/content/videos/byIdList',
-                channel_id, 'Downloading byIdList JSON', headers=headers,
-                query={
-                    'ids': alias,
-                }, fatal=False)
-            if by_id_list:
-                video = try_get(by_id_list, lambda x: x['result'][0], dict)
-
-        if not video:
-            results = self._download_json(
-                'https://www.funk.net/api/v3.0/content/videos/filter',
-                channel_id, 'Downloading filter JSON', headers=headers, query={
-                    'channelId': channel_id,
-                    'size': 100,
-                })['result']
-            video = next(r for r in results if r.get('alias') == alias)
-
-        return self._make_url_result(video)
+        display_id, nexx_id = re.match(self._VALID_URL, url).groups()
+        video = self._download_json(
+            'https://www.funk.net/api/v4.0/videos/' + nexx_id, nexx_id)
+        return {
+            '_type': 'url_transparent',
+            'url': 'nexx:741:' + nexx_id,
+            'ie_key': NexxIE.ie_key(),
+            'id': nexx_id,
+            'title': video.get('title'),
+            'description': video.get('description'),
+            'duration': int_or_none(video.get('duration')),
+            'channel_id': str_or_none(video.get('channelId')),
+            'display_id': display_id,
+            'tags': video.get('tags'),
+            'thumbnail': video.get('imageUrlLandscape'),
+        }
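Note on the funk.py rewrite: the new `FunkIE` no longer pages through channel listings. It reads the numeric Nexx id directly from the trailing `-<digits>` part of the URL slug. The sketch below shows how the pattern splits a slug into `display_id` and `id`; the pattern and the `nexx:741:` prefix are taken from the diff, while the helper name `split_funk_url` is an assumption for illustration.

```python
import re

# Pattern copied from FunkIE._VALID_URL in the diff above.
VALID_URL = (r'https?://(?:www\.)?funk\.net/(?:channel|playlist)/[^/]+/'
             r'(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)')


def split_funk_url(url):
    """Hypothetical helper: return (display_id, nexx_id, nexx_url)."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        raise ValueError('unsupported URL: %s' % url)
    display_id, nexx_id = mobj.groups()
    # The extractor hands off to NexxIE through this pseudo-URL.
    return display_id, nexx_id, 'nexx:741:' + nexx_id
```

The `display_id` group is greedy, so it backtracks only as far as the last hyphen that is followed by digits. A slug such as `...-teil-2-1155821` therefore keeps the `-2` in the display id and yields `1155821` as the id.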
diff --git a/youtube_dl/extractor/funnyordie.py b/youtube_dl/extractor/funnyordie.py
deleted file mode 100644
index f85e7de..0000000
--- a/youtube_dl/extractor/funnyordie.py
+++ /dev/null
@@ -1,162 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    float_or_none,
-    int_or_none,
-    unified_timestamp,
-)
-
-
-class FunnyOrDieIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|articles|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
-    _TESTS = [{
-        'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
-        'md5': 'bcd81e0c4f26189ee09be362ad6e6ba9',
-        'info_dict': {
-            'id': '0732f586d7',
-            'ext': 'mp4',
-            'title': 'Heart-Shaped Box: Literal Video Version',
-            'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
-            'thumbnail': r're:^http:.*\.jpg$',
-            'uploader': 'DASjr',
-            'timestamp': 1317904928,
-            'upload_date': '20111006',
-            'duration': 318.3,
-        },
-    }, {
-        'url': 'http://www.funnyordie.com/embed/e402820827',
-        'info_dict': {
-            'id': 'e402820827',
-            'ext': 'mp4',
-            'title': 'Please Use This Song (Jon Lajoie)',
-            'description': 'Please use this to sell something.  www.jonlajoie.com',
-            'thumbnail': r're:^http:.*\.jpg$',
-            'timestamp': 1398988800,
-            'upload_date': '20140502',
-        },
-        'params': {
-            'skip_download': True,
-        },
-    }, {
-        'url': 'http://www.funnyordie.com/articles/ebf5e34fc8/10-hours-of-walking-in-nyc-as-a-man',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-
-        video_id = mobj.group('id')
-        webpage = self._download_webpage(url, video_id)
-
-        links = re.findall(r'<source src="([^"]+/v)[^"]+\.([^"]+)" type=\'video', webpage)
-        if not links:
-            raise ExtractorError('No media links available for %s' % video_id)
-
-        links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
-
-        m3u8_url = self._search_regex(
-            r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
-            webpage, 'm3u8 url', group='url')
-
-        formats = []
-
-        m3u8_formats = self._extract_m3u8_formats(
-            m3u8_url, video_id, 'mp4', 'm3u8_native',
-            m3u8_id='hls', fatal=False)
-        source_formats = list(filter(
-            lambda f: f.get('vcodec') != 'none', m3u8_formats))
-
-        bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)(?=[,/])', m3u8_url)]
-        bitrates.sort()
-
-        if source_formats:
-            self._sort_formats(source_formats)
-
-        for bitrate, f in zip(bitrates, source_formats or [{}] * len(bitrates)):
-            for path, ext in links:
-                ff = f.copy()
-                if ff:
-                    if ext != 'mp4':
-                        ff = dict(
-                            [(k, v) for k, v in ff.items()
-                             if k in ('height', 'width', 'format_id')])
-                    ff.update({
-                        'format_id': ff['format_id'].replace('hls', ext),
-                        'ext': ext,
-                        'protocol': 'http',
-                    })
-                else:
-                    ff.update({
-                        'format_id': '%s-%d' % (ext, bitrate),
-                        'vbr': bitrate,
-                    })
-                ff['url'] = self._proto_relative_url(
-                    '%s%d.%s' % (path, bitrate, ext))
-                formats.append(ff)
-        self._check_formats(formats, video_id)
-
-        formats.extend(m3u8_formats)
-        self._sort_formats(
-            formats, field_preference=('height', 'width', 'tbr', 'format_id'))
-
-        subtitles = {}
-        for src, src_lang in re.findall(r'<track kind="captions" src="([^"]+)" srclang="([^"]+)"', webpage):
-            subtitles[src_lang] = [{
-                'ext': src.split('/')[-1],
-                'url': 'http://www.funnyordie.com%s' % src,
-            }]
-
-        timestamp = unified_timestamp(self._html_search_meta(
-            'uploadDate', webpage, 'timestamp', default=None))
-
-        uploader = self._html_search_regex(
-            r'<h\d[^>]+\bclass=["\']channel-preview-name[^>]+>(.+?)</h',
-            webpage, 'uploader', default=None)
-
-        title, description, thumbnail, duration = [None] * 4
-
-        medium = self._parse_json(
-            self._search_regex(
-                r'jsonMedium\s*=\s*({.+?});', webpage, 'JSON medium',
-                default='{}'),
-            video_id, fatal=False)
-        if medium:
-            title = medium.get('title')
-            duration = float_or_none(medium.get('duration'))
-            if not timestamp:
-                timestamp = unified_timestamp(medium.get('publishDate'))
-
-        post = self._parse_json(
-            self._search_regex(
-                r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details',
-                default='{}'),
-            video_id, fatal=False)
-        if post:
-            if not title:
-                title = post.get('name')
-            description = post.get('description')
-            thumbnail = post.get('picture')
-
-        if not title:
-            title = self._og_search_title(webpage)
-        if not description:
-            description = self._og_search_description(webpage)
-        if not duration:
-            duration = int_or_none(self._html_search_meta(
-                ('video:duration', 'duration'), webpage, 'duration', default=False))
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'timestamp': timestamp,
-            'duration': duration,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
diff --git a/youtube_dl/extractor/gameinformer.py b/youtube_dl/extractor/gameinformer.py
index a2920a793ba45d3fef47eebba26bc3c19517b63c..f1b96c172edd9b80b974ec43aab8be1242e8f9a4 100644
--- a/youtube_dl/extractor/gameinformer.py
+++ b/youtube_dl/extractor/gameinformer.py
@@ -1,12 +1,19 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+from .brightcove import BrightcoveNewIE
  from .common import InfoExtractor
+from ..utils import (
+    clean_html,
+    get_element_by_class,
+    get_element_by_id,
+)
  
  
  class GameInformerIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>.+)\.aspx'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>[^.?&#]+)'
+    _TESTS = [{
+        # normal Brightcove embed code extracted with BrightcoveNewIE._extract_url
          'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx',
          'md5': '292f26da1ab4beb4c9099f1304d2b071',
          'info_dict': {
@@ -18,16 +25,25 @@ class GameInformerIE(InfoExtractor):
              'upload_date': '20150928',
              'uploader_id': '694940074001',
          },
-    }
+    }, {
+        # Brightcove id inside unique element with field--name-field-brightcove-video-id class
+        'url': 'https://www.gameinformer.com/video-feature/new-gameplay-today/2019/07/09/new-gameplay-today-streets-of-rogue',
+        'info_dict': {
+            'id': '6057111913001',
+            'ext': 'mp4',
+            'title': 'New Gameplay Today – Streets Of Rogue',
+            'timestamp': 1562699001,
+            'upload_date': '20190709',
+            'uploader_id': '694940074001',
+
+        },
+    }]
      BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s'
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
          webpage = self._download_webpage(
              url, display_id, headers=self.geo_verification_headers())
-        brightcove_id = self._search_regex(
-            [r'<[^>]+\bid=["\']bc_(\d+)', r"getVideo\('[^']+video_id=(\d+)"],
-            webpage, 'brightcove id')
-        return self.url_result(
-            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew',
-            brightcove_id)
+        brightcove_id = clean_html(get_element_by_class('field--name-field-brightcove-video-id', webpage) or get_element_by_id('video-source-content', webpage))
+        brightcove_url = self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id if brightcove_id else BrightcoveNewIE._extract_url(self, webpage)
+        return self.url_result(brightcove_url, 'BrightcoveNew', brightcove_id)
diff --git a/youtube_dl/extractor/generic.py b/youtube_dl/extractor/generic.py
index 77e2174609d817fa2934a2c56fb11cf22bd9193f..d1725d98b0c63f72031068a001ea150b54fd23ec 100644
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -2075,6 +2075,22 @@ class GenericIE(InfoExtractor):
              },
              'playlist_count': 6,
          },
+        {
+            # Squarespace video embed, 2019-08-28
+            'url': 'http://ootboxford.com',
+            'info_dict': {
+                'id': 'Tc7b_JGdZfw',
+                'title': 'Out of the Blue, at Childish Things 10',
+                'ext': 'mp4',
+                'description': 'md5:a83d0026666cf5ee970f8bd1cfd69c7f',
+                'uploader_id': 'helendouglashouse',
+                'uploader': 'Helen & Douglas House',
+                'upload_date': '20140328',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
          {
              # Zype embed
              'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
@@ -2226,7 +2242,7 @@ class GenericIE(InfoExtractor):
                  default_search = 'fixup_error'
  
              if default_search in ('auto', 'auto_warning', 'fixup_error'):
-                if '/' in url:
+                if re.match(r'^[^\s/]+\.[^\s/]+/', url):
                      self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
                      return self.url_result('http://' + url)
                  elif default_search != 'fixup_error':
@@ -2395,6 +2411,12 @@ class GenericIE(InfoExtractor):
          # Unescaping the whole page allows to handle those cases in a generic way
          webpage = compat_urllib_parse_unquote(webpage)
  
+        # Unescape squarespace embeds to be detected by generic extractor,
+        # see https://github.com/ytdl-org/youtube-dl/issues/21294
+        webpage = re.sub(
+            r'<div[^>]+class=[^>]*?\bsqs-video-wrapper\b[^>]*>',
+            lambda x: unescapeHTML(x.group(0)), webpage)
+
          # it's tempting to parse this further, but you would
          # have to take into account all the variations like
          #   Video Title - Site Name
diff --git a/youtube_dl/extractor/gfycat.py b/youtube_dl/extractor/gfycat.py
index eb6f8583644b77f0063d7222e9c8ac37f4d0dae2..bbe3cb283afcc4f382ff613f706488b2e7c7c564 100644
--- a/youtube_dl/extractor/gfycat.py
+++ b/youtube_dl/extractor/gfycat.py
@@ -11,7 +11,7 @@ from ..utils import (
  
  
  class GfycatIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
      _TESTS = [{
          'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
          'info_dict': {
@@ -44,6 +44,9 @@ class GfycatIE(InfoExtractor):
              'categories': list,
              'age_limit': 0,
          }
+    }, {
+        'url': 'https://gfycat.com/ru/RemarkableDrearyAmurstarfish',
+        'only_matching': True
      }, {
          'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull',
          'only_matching': True
diff --git a/youtube_dl/extractor/go.py b/youtube_dl/extractor/go.py
index 5916f9a8f20b1a92ae63b2f4709a9d8bb8997fd0..03e48f4ea4b93153ef445f0d9a66779144821361 100644
--- a/youtube_dl/extractor/go.py
+++ b/youtube_dl/extractor/go.py
@@ -34,9 +34,13 @@ class GoIE(AdobePassIE):
          'watchdisneyxd': {
              'brand': '009',
              'resource_id': 'DisneyXD',
+        },
+        'disneynow': {
+            'brand': '011',
+            'resource_id': 'Disney',
          }
      }
-    _VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|disneynow)\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
+    _VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
                   % '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
      _TESTS = [{
          'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
@@ -83,7 +87,9 @@ class GoIE(AdobePassIE):
              display_id)['video']
  
      def _real_extract(self, url):
-        sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
+        mobj = re.match(self._VALID_URL, url)
+        sub_domain = mobj.group('sub_domain') or mobj.group('sub_domain_2')
+        video_id, display_id = mobj.group('id', 'display_id')
          site_info = self._SITE_INFO.get(sub_domain, {})
          brand = site_info.get('brand')
          if not video_id or not site_info:
diff --git a/youtube_dl/extractor/kaltura.py b/youtube_dl/extractor/kaltura.py
index 639d7383727bbdb48681827908103361ce6828cc..0a733424c471e6a0d83b04b7773742353dda5c5a 100644
--- a/youtube_dl/extractor/kaltura.py
+++ b/youtube_dl/extractor/kaltura.py
@@ -103,6 +103,11 @@ class KalturaIE(InfoExtractor):
          {
              'url': 'https://www.kaltura.com:443/index.php/extwidget/preview/partner_id/1770401/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe?&flashvars[streamerType]=auto',
              'only_matching': True,
+        },
+        {
+            # unavailable source format
+            'url': 'kaltura:513551:1_66x4rg7o',
+            'only_matching': True,
          }
      ]
  
@@ -306,12 +311,17 @@ class KalturaIE(InfoExtractor):
                      f['fileExt'] = 'mp4'
              video_url = sign_url(
                  '%s/flavorId/%s' % (data_url, f['id']))
+            format_id = '%(fileExt)s-%(bitrate)s' % f
+            # Source format may not be available (e.g. kaltura:513551:1_66x4rg7o)
+            if f.get('isOriginal') is True and not self._is_valid_url(
+                    video_url, entry_id, format_id):
+                continue
              # audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
              # -f mp4-56)
              vcodec = 'none' if 'videoCodecId' not in f and f.get(
                  'frameRate') == 0 else f.get('videoCodecId')
              formats.append({
-                'format_id': '%(fileExt)s-%(bitrate)s' % f,
+                'format_id': format_id,
                  'ext': f.get('fileExt'),
                  'tbr': int_or_none(f['bitrate']),
                  'fps': int_or_none(f.get('frameRate')),
diff --git a/youtube_dl/extractor/lecturio.py b/youtube_dl/extractor/lecturio.py
index 24f78d928449b999f09acec8337297fa62da8f55..6ed7da4abaa7a2a45f924b4bf9f919261a40bec9 100644
--- a/youtube_dl/extractor/lecturio.py
+++ b/youtube_dl/extractor/lecturio.py
@@ -6,8 +6,8 @@ import re
  from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
+    clean_html,
      determine_ext,
-    extract_attributes,
      ExtractorError,
      float_or_none,
      int_or_none,
@@ -19,6 +19,7 @@ from ..utils import (
  
  
  class LecturioBaseIE(InfoExtractor):
+    _API_BASE_URL = 'https://app.lecturio.com/api/en/latest/html5/'
      _LOGIN_URL = 'https://app.lecturio.com/en/login'
      _NETRC_MACHINE = 'lecturio'
  
@@ -67,51 +68,56 @@ class LecturioIE(LecturioBaseIE):
      _VALID_URL = r'''(?x)
                      https://
                          (?:
-                            app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture|
-                            (?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag
+                            app\.lecturio\.com/([^/]+/(?P<nt>[^/?#&]+)\.lecture|(?:\#/)?lecture/c/\d+/(?P<id>\d+))|
+                            (?:www\.)?lecturio\.de/[^/]+/(?P<nt_de>[^/?#&]+)\.vortrag
                          )
                      '''
      _TESTS = [{
          'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
-        'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870',
+        'md5': '9a42cf1d8282a6311bf7211bbde26fde',
          'info_dict': {
              'id': '39634',
              'ext': 'mp4',
-            'title': 'Important Concepts and Terms – Introduction to Microbiology',
+            'title': 'Important Concepts and Terms — Introduction to Microbiology',
          },
          'skip': 'Requires lecturio account credentials',
      }, {
          'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag',
          'only_matching': True,
+    }, {
+        'url': 'https://app.lecturio.com/#/lecture/c/6434/39634',
+        'only_matching': True,
      }]
  
      _CC_LANGS = {
+        'Arabic': 'ar',
+        'Bulgarian': 'bg',
          'German': 'de',
          'English': 'en',
          'Spanish': 'es',
+        'Persian': 'fa',
          'French': 'fr',
+        'Japanese': 'ja',
          'Polish': 'pl',
+        'Pashto': 'ps',
          'Russian': 'ru',
      }
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('id') or mobj.group('id_de')
-
-        webpage = self._download_webpage(
-            'https://app.lecturio.com/en/lecture/%s/player.html' % display_id,
-            display_id)
-
-        lecture_id = self._search_regex(
-            r'lecture_id\s*=\s*(?:L_)?(\d+)', webpage, 'lecture id')
-
-        api_url = self._search_regex(
-            r'lectureDataLink\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
-            'api url', group='url')
-
-        video = self._download_json(api_url, display_id)
-
+        nt = mobj.group('nt') or mobj.group('nt_de')
+        lecture_id = mobj.group('id')
+        display_id = nt or lecture_id
+        api_path = 'lectures/' + lecture_id if lecture_id else 'lecture/' + nt + '.json'
+        video = self._download_json(
+            self._API_BASE_URL + api_path, display_id)
          title = video['title'].strip()
+        if not lecture_id:
+            pid = video.get('productId') or video.get('uid')
+            if pid:
+                spid = pid.split('_')
+                if spid and len(spid) == 2:
+                    lecture_id = spid[1]
  
          formats = []
          for format_ in video['content']['media']:
@@ -129,24 +135,30 @@ class LecturioIE(LecturioBaseIE):
                  continue
              label = str_or_none(format_.get('label'))
              filesize = int_or_none(format_.get('fileSize'))
-            formats.append({
+            f = {
                  'url': file_url,
                  'format_id': label,
                  'filesize': float_or_none(filesize, invscale=1000)
-            })
+            }
+            if label:
+                mobj = re.match(r'(\d+)p\s*\(([^)]+)\)', label)
+                if mobj:
+                    f.update({
+                        'format_id': mobj.group(2),
+                        'height': int(mobj.group(1)),
+                    })
+            formats.append(f)
          self._sort_formats(formats)
  
          subtitles = {}
          automatic_captions = {}
-        cc = self._parse_json(
-            self._search_regex(
-                r'subtitleUrls\s*:\s*({.+?})\s*,', webpage, 'subtitles',
-                default='{}'), display_id, fatal=False)
-        for cc_label, cc_url in cc.items():
-            cc_url = url_or_none(cc_url)
+        captions = video.get('captions') or []
+        for cc in captions:
+            cc_url = cc.get('url')
              if not cc_url:
                  continue
-            lang = self._search_regex(
+            cc_label = cc.get('translatedCode')
+            lang = cc.get('languageCode') or self._search_regex(
                  r'/([a-z]{2})_', cc_url, 'lang',
                  default=cc_label.split()[0] if cc_label else 'en')
              original_lang = self._search_regex(
@@ -160,7 +172,7 @@ class LecturioIE(LecturioBaseIE):
              })
  
          return {
-            'id': lecture_id,
+            'id': lecture_id or nt,
              'title': title,
              'formats': formats,
              'subtitles': subtitles,
@@ -169,37 +181,40 @@ class LecturioIE(LecturioBaseIE):
  
  
  class LecturioCourseIE(LecturioBaseIE):
-    _VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.course'
-    _TEST = {
+    _VALID_URL = r'https://app\.lecturio\.com/(?:[^/]+/(?P<nt>[^/?#&]+)\.course|(?:#/)?course/c/(?P<id>\d+))'
+    _TESTS = [{
          'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/',
          'info_dict': {
              'id': 'microbiology-introduction',
              'title': 'Microbiology: Introduction',
+            'description': 'md5:13da8500c25880c6016ae1e6d78c386a',
          },
          'playlist_count': 45,
          'skip': 'Requires lecturio account credentials',
-    }
+    }, {
+        'url': 'https://app.lecturio.com/#/course/c/6434',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
-
+        nt, course_id = re.match(self._VALID_URL, url).groups()
+        display_id = nt or course_id
+        api_path = 'courses/' + course_id if course_id else 'course/content/' + nt + '.json'
+        course = self._download_json(
+            self._API_BASE_URL + api_path, display_id)
          entries = []
-        for mobj in re.finditer(
-                r'(?s)<[^>]+\bdata-url=(["\'])(?:(?!\1).)+\.lecture\b[^>]+>',
-                webpage):
-            params = extract_attributes(mobj.group(0))
-            lecture_url = urljoin(url, params.get('data-url'))
-            lecture_id = params.get('data-id')
+        for lecture in course.get('lectures', []):
+            lecture_id = str_or_none(lecture.get('id'))
+            lecture_url = lecture.get('url')
+            if lecture_url:
+                lecture_url = urljoin(url, lecture_url)
+            else:
+                lecture_url = 'https://app.lecturio.com/#/lecture/c/%s/%s' % (course_id, lecture_id)
              entries.append(self.url_result(
                  lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
-
-        title = self._search_regex(
-            r'<span[^>]+class=["\']content-title[^>]+>([^<]+)', webpage,
-            'title', default=None)
-
-        return self.playlist_result(entries, display_id, title)
+        return self.playlist_result(
+            entries, display_id, course.get('title'),
+            clean_html(course.get('description')))
  
  
  class LecturioDeCourseIE(LecturioBaseIE):
diff --git a/youtube_dl/extractor/leeco.py b/youtube_dl/extractor/leeco.py
index 8dd1ce0d0e935888f3a90dedf92f113f2ef74f9b..7dc0ad7947a8b3ddfa7713f5f2fe231cf9a1ebf5 100644
--- a/youtube_dl/extractor/leeco.py
+++ b/youtube_dl/extractor/leeco.py
@@ -326,7 +326,7 @@ class LetvCloudIE(InfoExtractor):
              elif play_json.get('code'):
                  raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True)
              else:
-                raise ExtractorError('Letv cloud returned an unknwon error')
+                raise ExtractorError('Letv cloud returned an unknown error')
  
          def b64decode(s):
              return compat_b64decode(s).decode('utf-8')
diff --git a/youtube_dl/extractor/livejournal.py b/youtube_dl/extractor/livejournal.py
new file mode 100644
index 0000000..3a9f455
--- /dev/null
+++ b/youtube_dl/extractor/livejournal.py
@@ -0,0 +1,42 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import int_or_none
+
+
+class LiveJournalIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:[^.]+\.)?livejournal\.com/video/album/\d+.+?\bid=(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://andrei-bt.livejournal.com/video/album/407/?mode=view&id=51272',
+        'md5': 'adaf018388572ced8a6f301ace49d4b2',
+        'info_dict': {
+            'id': '1263729',
+            'ext': 'mp4',
+            'title': 'Истребители против БПЛА',
+            'upload_date': '20190624',
+            'timestamp': 1561406715,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        record = self._parse_json(self._search_regex(
+            r'Site\.page\s*=\s*({.+?});', webpage,
+            'page data'), video_id)['video']['record']
+        storage_id = compat_str(record['storageid'])
+        title = record.get('name')
+        if title:
+            # remove filename extension(.mp4, .mov, etc...)
+            title = title.rsplit('.', 1)[0]
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'title': title,
+            'thumbnail': record.get('thumbnail'),
+            'timestamp': int_or_none(record.get('timecreate')),
+            'url': 'eagleplatform:vc.videos.livejournal.com:' + storage_id,
+            'ie_key': 'EaglePlatform',
+        }
diff --git a/youtube_dl/extractor/lynda.py b/youtube_dl/extractor/lynda.py
index 3084c6dffc9cd3b32ede52906669299807a9ad20..b3d8653d078c73ce75a4de69d109d085a57f6308 100644
--- a/youtube_dl/extractor/lynda.py
+++ b/youtube_dl/extractor/lynda.py
@@ -117,6 +117,10 @@ class LyndaIE(LyndaBaseIE):
      }, {
          'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html',
          'only_matching': True,
+    }, {
+        # Status="NotFound", Message="Transcript not found"
+        'url': 'https://www.lynda.com/ASP-NET-tutorials/What-you-should-know/5034180/2811512-4.html',
+        'only_matching': True,
      }]
  
      def _raise_unavailable(self, video_id):
@@ -247,12 +251,17 @@ class LyndaIE(LyndaBaseIE):
  
      def _get_subtitles(self, video_id):
          url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
-        subs = self._download_json(url, None, False)
+        subs = self._download_webpage(
+            url, video_id, 'Downloading subtitles JSON', fatal=False)
+        if not subs or 'Status="NotFound"' in subs:
+            return {}
+        subs = self._parse_json(subs, video_id, fatal=False)
+        if not subs:
+            return {}
          fixed_subs = self._fix_subtitles(subs)
          if fixed_subs:
              return {'en': [{'ext': 'srt', 'data': fixed_subs}]}
-        else:
-            return {}
+        return {}
  
  
  class LyndaCourseIE(LyndaBaseIE):
diff --git a/youtube_dl/extractor/mgtv.py b/youtube_dl/extractor/mgtv.py
index 84137df502522a77561fb8d86ebfb8d7febb2b95..71fc3ec56da8684e49474633959ed72b2d3047b2 100644
--- a/youtube_dl/extractor/mgtv.py
+++ b/youtube_dl/extractor/mgtv.py
@@ -79,6 +79,10 @@ class MGTVIE(InfoExtractor):
                  'ext': 'mp4',
                  'tbr': tbr,
                  'protocol': 'm3u8_native',
+                'http_headers': {
+                    'Referer': url,
+                },
+                'format_note': stream.get('name'),
              })
          self._sort_formats(formats)
  
diff --git a/youtube_dl/extractor/openload.py b/youtube_dl/extractor/openload.py
index 11e92e47149aace86b5993987413038e48496005..679eaf6c312d0d4de0a3e11bee7076f6ec365039 100644
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -243,7 +243,13 @@ class PhantomJSwrapper(object):
  
  
  class OpenloadIE(InfoExtractor):
-    _DOMAINS = r'(?:openload\.(?:co|io|link|pw)|oload\.(?:tv|biz|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|press|pw|life|live|space|services|website)|oladblock\.(?:services|xyz|me)|openloed\.co)'
+    _DOMAINS = r'''
+                    (?:
+                        openload\.(?:co|io|link|pw)|
+                        oload\.(?:tv|best|biz|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|press|pw|life|live|space|services|website|vip)|
+                        oladblock\.(?:services|xyz|me)|openloed\.co
+                    )
+                '''
      _VALID_URL = r'''(?x)
                      https?://
                          (?P<host>
@@ -368,6 +374,9 @@ class OpenloadIE(InfoExtractor):
      }, {
          'url': 'https://oload.biz/f/bEk3Gp8ARr4/',
          'only_matching': True,
+    }, {
+        'url': 'https://oload.best/embed/kkz9JgVZeWc/',
+        'only_matching': True,
      }, {
          'url': 'https://oladblock.services/f/b8NWEgkqNLI/',
          'only_matching': True,
@@ -380,12 +389,15 @@ class OpenloadIE(InfoExtractor):
      }, {
          'url': 'https://openloed.co/f/b8NWEgkqNLI/',
          'only_matching': True,
+    }, {
+        'url': 'https://oload.vip/f/kUEfGclsU9o',
+        'only_matching': True,
      }]
  
      @classmethod
      def _extract_urls(cls, webpage):
          return re.findall(
-            r'<iframe[^>]+src=["\']((?:https?://)?%s/%s/[a-zA-Z0-9-_]+)'
+            r'(?x)<iframe[^>]+src=["\']((?:https?://)?%s/%s/[a-zA-Z0-9-_]+)'
              % (cls._DOMAINS, cls._EMBED_WORD), webpage)
  
      def _extract_decrypted_page(self, page_url, webpage, video_id):
@@ -451,7 +463,7 @@ class OpenloadIE(InfoExtractor):
  class VerystreamIE(OpenloadIE):
      IE_NAME = 'verystream'
  
-    _DOMAINS = r'(?:verystream\.com)'
+    _DOMAINS = r'(?:verystream\.com|woof\.tube)'
      _VALID_URL = r'''(?x)
                      https?://
                          (?P<host>
diff --git a/youtube_dl/extractor/packtpub.py b/youtube_dl/extractor/packtpub.py
index 1324137dfe44bef6bf09ca3899803beba88ddbe9..11ad3b3b8367b1b3655d7297cd08386de45323aa 100644
--- a/youtube_dl/extractor/packtpub.py
+++ b/youtube_dl/extractor/packtpub.py
@@ -5,26 +5,27 @@ import re
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_str,
+    # compat_str,
      compat_HTTPError,
  )
  from ..utils import (
      clean_html,
      ExtractorError,
-    remove_end,
+    # remove_end,
+    str_or_none,
      strip_or_none,
      unified_timestamp,
-    urljoin,
+    # urljoin,
  )
  
  
  class PacktPubBaseIE(InfoExtractor):
-    _PACKT_BASE = 'https://www.packtpub.com'
-    _MAPT_REST = '%s/mapt-rest' % _PACKT_BASE
+    # _PACKT_BASE = 'https://www.packtpub.com'
+    _STATIC_PRODUCTS_BASE = 'https://static.packt-cdn.com/products/'
  
  
  class PacktPubIE(PacktPubBaseIE):
-    _VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>[^/]+)/(?P<id>[^/]+)(?:/(?P<display_id>[^/?&#]+))?'
  
      _TESTS = [{
          'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
@@ -40,6 +41,9 @@ class PacktPubIE(PacktPubBaseIE):
      }, {
          'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro',
          'only_matching': True,
+    }, {
+        'url': 'https://subscription.packtpub.com/video/programming/9781838988906/p1/video1_1/business-card-project',
+        'only_matching': True,
      }]
      _NETRC_MACHINE = 'packtpub'
      _TOKEN = None
@@ -50,9 +54,9 @@ class PacktPubIE(PacktPubBaseIE):
              return
          try:
              self._TOKEN = self._download_json(
-                self._MAPT_REST + '/users/tokens', None,
+                'https://services.packtpub.com/auth-v1/users/tokens', None,
                  'Downloading Authorization Token', data=json.dumps({
-                    'email': username,
+                    'username': username,
                      'password': password,
                  }).encode())['data']['access']
          except ExtractorError as e:
@@ -61,54 +65,40 @@ class PacktPubIE(PacktPubBaseIE):
                  raise ExtractorError(message, expected=True)
              raise
  
-    def _handle_error(self, response):
-        if response.get('status') != 'success':
-            raise ExtractorError(
-                '% said: %s' % (self.IE_NAME, response['message']),
-                expected=True)
-
-    def _download_json(self, *args, **kwargs):
-        response = super(PacktPubIE, self)._download_json(*args, **kwargs)
-        self._handle_error(response)
-        return response
-
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        course_id, chapter_id, video_id = mobj.group(
-            'course_id', 'chapter_id', 'id')
+        course_id, chapter_id, video_id, display_id = re.match(self._VALID_URL, url).groups()
  
          headers = {}
          if self._TOKEN:
              headers['Authorization'] = 'Bearer ' + self._TOKEN
-        video = self._download_json(
-            '%s/users/me/products/%s/chapters/%s/sections/%s'
-            % (self._MAPT_REST, course_id, chapter_id, video_id), video_id,
-            'Downloading JSON video', headers=headers)['data']
-
-        content = video.get('content')
-        if not content:
-            self.raise_login_required('This video is locked')
-
-        video_url = content['file']
+        try:
+            video_url = self._download_json(
+                'https://services.packtpub.com/products-v1/products/%s/%s/%s' % (course_id, chapter_id, video_id), video_id,
+                'Downloading JSON video', headers=headers)['data']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
+                self.raise_login_required('This video is locked')
+            raise
  
-        metadata = self._download_json(
-            '%s/products/%s/chapters/%s/sections/%s/metadata'
-            % (self._MAPT_REST, course_id, chapter_id, video_id),
-            video_id)['data']
+        # TODO: find a better way to avoid duplicating course requests
+        # metadata = self._download_json(
+        #     '%s/products/%s/chapters/%s/sections/%s/metadata'
+        #     % (self._MAPT_REST, course_id, chapter_id, video_id),
+        #     video_id)['data']
  
-        title = metadata['pageTitle']
-        course_title = metadata.get('title')
-        if course_title:
-            title = remove_end(title, ' - %s' % course_title)
-        timestamp = unified_timestamp(metadata.get('publicationDate'))
-        thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
+        # title = metadata['pageTitle']
+        # course_title = metadata.get('title')
+        # if course_title:
+        #     title = remove_end(title, ' - %s' % course_title)
+        # timestamp = unified_timestamp(metadata.get('publicationDate'))
+        # thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
  
          return {
              'id': video_id,
              'url': video_url,
-            'title': title,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
+            'title': display_id or video_id,  # title,
+            # 'thumbnail': thumbnail,
+            # 'timestamp': timestamp,
          }
  
  
@@ -119,6 +109,7 @@ class PacktPubCourseIE(PacktPubBaseIE):
          'info_dict': {
              'id': '9781787122215',
              'title': 'Learn Nodejs by building 12 projects [Video]',
+            'description': 'md5:489da8d953f416e51927b60a1c7db0aa',
          },
          'playlist_count': 90,
      }, {
@@ -136,35 +127,38 @@ class PacktPubCourseIE(PacktPubBaseIE):
          url, course_id = mobj.group('url', 'id')
  
          course = self._download_json(
-            '%s/products/%s/metadata' % (self._MAPT_REST, course_id),
-            course_id)['data']
+            self._STATIC_PRODUCTS_BASE + '%s/toc' % course_id, course_id)
+        metadata = self._download_json(
+            self._STATIC_PRODUCTS_BASE + '%s/summary' % course_id,
+            course_id, fatal=False) or {}
  
          entries = []
-        for chapter_num, chapter in enumerate(course['tableOfContents'], 1):
-            if chapter.get('type') != 'chapter':
-                continue
-            children = chapter.get('children')
-            if not isinstance(children, list):
+        for chapter_num, chapter in enumerate(course['chapters'], 1):
+            chapter_id = str_or_none(chapter.get('id'))
+            sections = chapter.get('sections')
+            if not chapter_id or not isinstance(sections, list):
                  continue
              chapter_info = {
                  'chapter': chapter.get('title'),
                  'chapter_number': chapter_num,
-                'chapter_id': chapter.get('id'),
+                'chapter_id': chapter_id,
              }
-            for section in children:
-                if section.get('type') != 'section':
-                    continue
-                section_url = section.get('seoUrl')
-                if not isinstance(section_url, compat_str):
+            for section in sections:
+                section_id = str_or_none(section.get('id'))
+                if not section_id or section.get('contentType') != 'video':
                      continue
                  entry = {
                      '_type': 'url_transparent',
-                    'url': urljoin(url + '/', section_url),
+                    'url': '/'.join([url, chapter_id, section_id]),
                      'title': strip_or_none(section.get('title')),
                      'description': clean_html(section.get('summary')),
+                    'thumbnail': metadata.get('coverImage'),
+                    'timestamp': unified_timestamp(metadata.get('publicationDate')),
                      'ie_key': PacktPubIE.ie_key(),
                  }
                  entry.update(chapter_info)
                  entries.append(entry)
  
-        return self.playlist_result(entries, course_id, course.get('title'))
+        return self.playlist_result(
+            entries, course_id, metadata.get('title'),
+            clean_html(metadata.get('about')))
diff --git a/youtube_dl/extractor/peertube.py b/youtube_dl/extractor/peertube.py
index e03c3d1d3d61ec2fd981776fba2775464b9658d1..b50543e329f983a9fe06c0bda9c457bb2aa337b6 100644
--- a/youtube_dl/extractor/peertube.py
+++ b/youtube_dl/extractor/peertube.py
@@ -168,7 +168,7 @@ class PeerTubeIE(InfoExtractor):
      @staticmethod
      def _extract_peertube_url(webpage, source_url):
          mobj = re.match(
-            r'https?://(?P<host>[^/]+)/videos/watch/(?P<id>%s)'
+            r'https?://(?P<host>[^/]+)/videos/(?:watch|embed)/(?P<id>%s)'
              % PeerTubeIE._UUID_RE, source_url)
          if mobj and any(p in webpage for p in (
                  '<title>PeerTube<',
diff --git a/youtube_dl/extractor/philharmoniedeparis.py b/youtube_dl/extractor/philharmoniedeparis.py
index f723a2b3b507d5aa59428dd4f44057f7bbe3f655..03da64b116128f01893f047e747a1fb40af1e2df 100644
--- a/youtube_dl/extractor/philharmoniedeparis.py
+++ b/youtube_dl/extractor/philharmoniedeparis.py
@@ -14,7 +14,7 @@ class PhilharmonieDeParisIE(InfoExtractor):
      _VALID_URL = r'''(?x)
                      https?://
                          (?:
-                            live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)|
+                            live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|embed(?:app)?/|misc/Playlist\.ashx\?id=)|
                              pad\.philharmoniedeparis\.fr/doc/CIMU/
                          )
                          (?P<id>\d+)
@@ -40,6 +40,12 @@ class PhilharmonieDeParisIE(InfoExtractor):
      }, {
          'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr',
          'only_matching': True,
+    }, {
+        'url': 'https://live.philharmoniedeparis.fr/embedapp/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
+        'only_matching': True,
+    }, {
+        'url': 'https://live.philharmoniedeparis.fr/embed/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
+        'only_matching': True,
      }]
      _LIVE_URL = 'https://live.philharmoniedeparis.fr'
  
diff --git a/youtube_dl/extractor/piksel.py b/youtube_dl/extractor/piksel.py
index c0c276a506ef93c5e8af20363a0cb0cc7dc3a145..401298cb877f59ec46e896ee0c087d21f2c93924 100644
--- a/youtube_dl/extractor/piksel.py
+++ b/youtube_dl/extractor/piksel.py
@@ -18,15 +18,14 @@ class PikselIE(InfoExtractor):
      _VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)'
      _TESTS = [
          {
-            'url': 'http://player.piksel.com/v/nv60p12f',
-            'md5': 'd9c17bbe9c3386344f9cfd32fad8d235',
+            'url': 'http://player.piksel.com/v/ums2867l',
+            'md5': '34e34c8d89dc2559976a6079db531e85',
              'info_dict': {
-                'id': 'nv60p12f',
+                'id': 'ums2867l',
                  'ext': 'mp4',
-                'title': 'فن الحياة  - الحلقة 1',
-                'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور',
-                'timestamp': 1465231790,
-                'upload_date': '20160606',
+                'title': 'GX-005 with Caption',
+                'timestamp': 1481335659,
+                'upload_date': '20161210'
              }
          },
          {
@@ -39,7 +38,7 @@ class PikselIE(InfoExtractor):
                  'title': 'WAW- State of Washington vs. Donald J. Trump, et al',
                  'description': 'State of Washington vs. Donald J. Trump, et al, Case Number 17-CV-00141-JLR, TRO Hearing, Civil Rights Case, 02/3/2017, 1:00 PM (PST), Seattle Federal Courthouse, Seattle, WA, Judge James L. Robart presiding.',
                  'timestamp': 1486171129,
-                'upload_date': '20170204',
+                'upload_date': '20170204'
              }
          }
      ]
@@ -113,6 +112,13 @@ class PikselIE(InfoExtractor):
              })
          self._sort_formats(formats)
  
+        subtitles = {}
+        for caption in video_data.get('captions', []):
+            caption_url = caption.get('url')
+            if caption_url:
+                subtitles.setdefault(caption.get('locale', 'en'), []).append({
+                    'url': caption_url})
+
          return {
              'id': video_id,
              'title': title,
@@ -120,4 +126,5 @@ class PikselIE(InfoExtractor):
              'thumbnail': video_data.get('thumbnailUrl'),
              'timestamp': parse_iso8601(video_data.get('dateadd')),
              'formats': formats,
+            'subtitles': subtitles,
          }
diff --git a/youtube_dl/extractor/porn91.py b/youtube_dl/extractor/porn91.py
index 24c3600fe3224347a415226f0cfa73f38eaeca51..20eac647a87c10ad405c5dd9394bd6b04d66e7f9 100644
--- a/youtube_dl/extractor/porn91.py
+++ b/youtube_dl/extractor/porn91.py
@@ -39,7 +39,12 @@ class Porn91IE(InfoExtractor):
              r'<div id="viewvideo-title">([^<]+)</div>', webpage, 'title')
          title = title.replace('\n', '')
  
-        info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
+        video_link_url = self._search_regex(
+            r'<textarea[^>]+id=["\']fm-video_link[^>]+>([^<]+)</textarea>',
+            webpage, 'video link')
+        videopage = self._download_webpage(video_link_url, video_id)
+
+        info_dict = self._parse_html5_media_entries(url, videopage, video_id)[0]
  
          duration = parse_duration(self._search_regex(
              r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False))
diff --git a/youtube_dl/extractor/roosterteeth.py b/youtube_dl/extractor/roosterteeth.py
index 857434540d89a41dae80d9b4270e7bd6e6192acb..8d88ee4994827b8cc648f03986e2015a2c2a1751 100644
--- a/youtube_dl/extractor/roosterteeth.py
+++ b/youtube_dl/extractor/roosterteeth.py
@@ -4,32 +4,34 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_HTTPError,
+    compat_str,
+)
  from ..utils import (
      ExtractorError,
      int_or_none,
-    strip_or_none,
-    unescapeHTML,
+    str_or_none,
      urlencode_postdata,
  )
  
  
  class RoosterTeethIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/episode/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/(?:episode|watch)/(?P<id>[^/?#&]+)'
      _LOGIN_URL = 'https://roosterteeth.com/login'
      _NETRC_MACHINE = 'roosterteeth'
      _TESTS = [{
          'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
          'md5': 'e2bd7764732d785ef797700a2489f212',
          'info_dict': {
-            'id': '26576',
+            'id': '9156',
              'display_id': 'million-dollars-but-season-2-million-dollars-but-the-game-announcement',
              'ext': 'mp4',
-            'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement',
-            'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5',
+            'title': 'Million Dollars, But... The Game Announcement',
+            'description': 'md5:168a54b40e228e79f4ddb141e89fe4f5',
              'thumbnail': r're:^https?://.*\.png$',
              'series': 'Million Dollars, But...',
              'episode': 'Million Dollars, But... The Game Announcement',
-            'comment_count': int,
          },
      }, {
          'url': 'http://achievementhunter.roosterteeth.com/episode/off-topic-the-achievement-hunter-podcast-2016-i-didn-t-think-it-would-pass-31',
@@ -47,6 +49,9 @@ class RoosterTeethIE(InfoExtractor):
          # only available for FIRST members
          'url': 'http://roosterteeth.com/episode/rt-docs-the-world-s-greatest-head-massage-the-world-s-greatest-head-massage-an-asmr-journey-part-one',
          'only_matching': True,
+    }, {
+        'url': 'https://roosterteeth.com/watch/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
+        'only_matching': True,
      }]
  
      def _login(self):
@@ -89,60 +94,55 @@ class RoosterTeethIE(InfoExtractor):
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
-
-        episode = strip_or_none(unescapeHTML(self._search_regex(
-            (r'videoTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
-             r'<title>(?P<title>[^<]+)</title>'), webpage, 'title',
-            default=None, group='title')))
-
-        title = strip_or_none(self._og_search_title(
-            webpage, default=None)) or episode
-
-        m3u8_url = self._search_regex(
-            r'file\s*:\s*(["\'])(?P<url>http.+?\.m3u8.*?)\1',
-            webpage, 'm3u8 url', default=None, group='url')
-
-        if not m3u8_url:
-            if re.search(r'<div[^>]+class=["\']non-sponsor', webpage):
-                self.raise_login_required(
-                    '%s is only available for FIRST members' % display_id)
-
-            if re.search(r'<div[^>]+class=["\']golive-gate', webpage):
-                self.raise_login_required('%s is not available yet' % display_id)
-
-            raise ExtractorError('Unable to extract m3u8 URL')
+        api_episode_url = 'https://svod-be.roosterteeth.com/api/v1/episodes/%s' % display_id
+
+        try:
+            m3u8_url = self._download_json(
+                api_episode_url + '/videos', display_id,
+                'Downloading video JSON metadata')['data'][0]['attributes']['url']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                if self._parse_json(e.cause.read().decode(), display_id).get('access') is False:
+                    self.raise_login_required(
+                        '%s is only available for FIRST members' % display_id)
+            raise
  
          formats = self._extract_m3u8_formats(
-            m3u8_url, display_id, ext='mp4',
-            entry_protocol='m3u8_native', m3u8_id='hls')
+            m3u8_url, display_id, 'mp4', 'm3u8_native', m3u8_id='hls')
          self._sort_formats(formats)
  
-        description = strip_or_none(self._og_search_description(webpage))
-        thumbnail = self._proto_relative_url(self._og_search_thumbnail(webpage))
-
-        series = self._search_regex(
-            (r'<h2>More ([^<]+)</h2>', r'<a[^>]+>See All ([^<]+) Videos<'),
-            webpage, 'series', fatal=False)
-
-        comment_count = int_or_none(self._search_regex(
-            r'>Comments \((\d+)\)<', webpage,
-            'comment count', fatal=False))
-
-        video_id = self._search_regex(
-            (r'containerId\s*=\s*["\']episode-(\d+)\1',
-             r'<div[^<]+id=["\']episode-(\d+)'), webpage,
-            'video id', default=display_id)
+        episode = self._download_json(
+            api_episode_url, display_id,
+            'Downloading episode JSON metadata')['data'][0]
+        attributes = episode['attributes']
+        title = attributes.get('title') or attributes['display_title']
+        video_id = compat_str(episode['id'])
+
+        thumbnails = []
+        for image in episode.get('included', {}).get('images', []):
+            if image.get('type') == 'episode_image':
+                img_attributes = image.get('attributes') or {}
+                for k in ('thumb', 'small', 'medium', 'large'):
+                    img_url = img_attributes.get(k)
+                    if img_url:
+                        thumbnails.append({
+                            'id': k,
+                            'url': img_url,
+                        })
  
          return {
              'id': video_id,
              'display_id': display_id,
              'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'series': series,
-            'episode': episode,
-            'comment_count': comment_count,
+            'description': attributes.get('description') or attributes.get('caption'),
+            'thumbnails': thumbnails,
+            'series': attributes.get('show_title'),
+            'season_number': int_or_none(attributes.get('season_number')),
+            'season_id': attributes.get('season_id'),
+            'episode': title,
+            'episode_number': int_or_none(attributes.get('number')),
+            'episode_id': str_or_none(episode.get('uuid')),
              'formats': formats,
+            'channel_id': attributes.get('channel_id'),
+            'duration': int_or_none(attributes.get('length')),
          }
diff --git a/youtube_dl/extractor/rtlnl.py b/youtube_dl/extractor/rtlnl.py
index 0b5e55d16d30764fda7d4182c31d8eae9fb6f9cc..fadca8c175475e9b2bf7cb4c9bc0a7e2ed163fcc 100644
--- a/youtube_dl/extractor/rtlnl.py
+++ b/youtube_dl/extractor/rtlnl.py
@@ -32,7 +32,7 @@ class RtlNlIE(InfoExtractor):
              'duration': 1167.96,
          },
      }, {
-        # best format avaialble a3t
+        # best format available a3t
          'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
          'md5': 'dea7474214af1271d91ef332fb8be7ea',
          'info_dict': {
diff --git a/youtube_dl/extractor/rudo.py b/youtube_dl/extractor/rudo.py
deleted file mode 100644
index f036f67..0000000
--- a/youtube_dl/extractor/rudo.py
+++ /dev/null
@@ -1,53 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    js_to_json,
-    get_element_by_class,
-    unified_strdate,
-)
-
-
-class RudoIE(InfoExtractor):
-    _VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
-
-    _TEST = {
-        'url': 'http://rudo.video/vod/oTzw0MGnyG',
-        'md5': '2a03a5b32dd90a04c83b6d391cf7b415',
-        'info_dict': {
-            'id': 'oTzw0MGnyG',
-            'ext': 'mp4',
-            'title': 'Comentario Tomás Mosciatti',
-            'upload_date': '20160617',
-        },
-    }
-
-    @classmethod
-    def _extract_url(cls, webpage):
-        mobj = re.search(
-            r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
-            webpage)
-        if mobj:
-            return mobj.group('url')
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id, encoding='iso-8859-1')
-
-        jwplayer_data = self._parse_json(self._search_regex(
-            r'(?s)playerInstance\.setup\(({.+?})\)', webpage, 'jwplayer data'), video_id,
-            transform_source=lambda s: js_to_json(re.sub(r'encodeURI\([^)]+\)', '""', s)))
-
-        info_dict = self._parse_jwplayer_data(
-            jwplayer_data, video_id, require_title=False, m3u8_id='hls', mpd_id='dash')
-
-        info_dict.update({
-            'title': self._og_search_title(webpage),
-            'upload_date': unified_strdate(get_element_by_class('date', webpage)),
-        })
-
-        return info_dict
diff --git a/youtube_dl/extractor/safari.py b/youtube_dl/extractor/safari.py
index 8d4806794ae5f5e83e667a66447179269b0fb1a7..bd9ee1647d47d47bfc8d8341139c2bf953ecf158 100644
--- a/youtube_dl/extractor/safari.py
+++ b/youtube_dl/extractor/safari.py
@@ -68,9 +68,10 @@ class SafariBaseIE(InfoExtractor):
              raise ExtractorError(
                  'Unable to login: %s' % credentials, expected=True)
  
-        # oreilly serves two same groot_sessionid cookies in Set-Cookie header
-        # and expects first one to be actually set
-        self._apply_first_set_cookie_header(urlh, 'groot_sessionid')
+        # oreilly serves two same instances of the following cookies
+        # in Set-Cookie header and expects first one to be actually set
+        for cookie in ('groot_sessionid', 'orm-jwt', 'orm-rt'):
+            self._apply_first_set_cookie_header(urlh, cookie)
  
          _, urlh = self._download_webpage_handle(
              auth.get('redirect_uri') or next_uri, None, 'Completing login',)
diff --git a/youtube_dl/extractor/soundcloud.py b/youtube_dl/extractor/soundcloud.py
index 3a8626e0200471f7d90ec8042d43545348a44b55..05538f3d6b8733fe30f45ef642a9d730bbd39cd5 100644
--- a/youtube_dl/extractor/soundcloud.py
+++ b/youtube_dl/extractor/soundcloud.py
@@ -197,7 +197,7 @@ class SoundcloudIE(InfoExtractor):
                  'skip_download': True,
              },
          },
-        # not avaialble via api.soundcloud.com/i1/tracks/id/streams
+        # not available via api.soundcloud.com/i1/tracks/id/streams
          {
              'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
              'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
diff --git a/youtube_dl/extractor/spankbang.py b/youtube_dl/extractor/spankbang.py
index f11d728caac1e73e9fe746a87d95f4261fa11c0d..e040ada29b24542582f72f08f31b843d928af251 100644
--- a/youtube_dl/extractor/spankbang.py
+++ b/youtube_dl/extractor/spankbang.py
@@ -5,6 +5,7 @@ import re
  from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
+    merge_dicts,
      orderedSet,
      parse_duration,
      parse_resolution,
@@ -26,6 +27,8 @@ class SpankBangIE(InfoExtractor):
              'description': 'dillion harper masturbates on a bed',
              'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'silly2587',
+            'timestamp': 1422571989,
+            'upload_date': '20150129',
              'age_limit': 18,
          }
      }, {
@@ -106,31 +109,36 @@ class SpankBangIE(InfoExtractor):
  
              for format_id, format_url in stream.items():
                  if format_id.startswith(STREAM_URL_PREFIX):
+                    if format_url and isinstance(format_url, list):
+                        format_url = format_url[0]
                      extract_format(
                          format_id[len(STREAM_URL_PREFIX):], format_url)
  
          self._sort_formats(formats)
  
+        info = self._search_json_ld(webpage, video_id, default={})
+
          title = self._html_search_regex(
-            r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title')
+            r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title', default=None)
          description = self._search_regex(
              r'<div[^>]+\bclass=["\']bottom[^>]+>\s*<p>[^<]*</p>\s*<p>([^<]+)',
-            webpage, 'description', fatal=False)
-        thumbnail = self._og_search_thumbnail(webpage)
-        uploader = self._search_regex(
-            r'class="user"[^>]*><img[^>]+>([^<]+)',
+            webpage, 'description', default=None)
+        thumbnail = self._og_search_thumbnail(webpage, default=None)
+        uploader = self._html_search_regex(
+            (r'(?s)<li[^>]+class=["\']profile[^>]+>(.+?)</a>',
+             r'class="user"[^>]*><img[^>]+>([^<]+)'),
              webpage, 'uploader', default=None)
          duration = parse_duration(self._search_regex(
              r'<div[^>]+\bclass=["\']right_side[^>]+>\s*<span>([^<]+)',
-            webpage, 'duration', fatal=False))
+            webpage, 'duration', default=None))
          view_count = str_to_int(self._search_regex(
-            r'([\d,.]+)\s+plays', webpage, 'view count', fatal=False))
+            r'([\d,.]+)\s+plays', webpage, 'view count', default=None))
  
          age_limit = self._rta_search(webpage)
  
-        return {
+        return merge_dicts({
              'id': video_id,
-            'title': title,
+            'title': title or video_id,
              'description': description,
              'thumbnail': thumbnail,
              'uploader': uploader,
@@ -138,7 +146,8 @@ class SpankBangIE(InfoExtractor):
              'view_count': view_count,
              'formats': formats,
              'age_limit': age_limit,
-        }
+        }, info
+        )
  
  
  class SpankBangPlaylistIE(InfoExtractor):
diff --git a/youtube_dl/extractor/spike.py b/youtube_dl/extractor/spike.py
index 21b93a5b3c7e4d2289ce3261f6c737b0f737001e..7c11ea7aaf9306181fee00adab6080d5c40ac1d7 100644
--- a/youtube_dl/extractor/spike.py
+++ b/youtube_dl/extractor/spike.py
@@ -22,7 +22,7 @@ class BellatorIE(MTVServicesInfoExtractor):
          'only_matching': True,
      }]
  
-    _FEED_URL = 'http://www.spike.com/feeds/mrss/'
+    _FEED_URL = 'http://www.bellator.com/feeds/mrss/'
      _GEO_COUNTRIES = ['US']
  
  
diff --git a/youtube_dl/extractor/ted.py b/youtube_dl/extractor/ted.py
index 9b60cc462646da506f490ee1037d6dcf4d47fcbe..db5a4f44e6b9be254729256fc741d6f3eb3693f7 100644
--- a/youtube_dl/extractor/ted.py
+++ b/youtube_dl/extractor/ted.py
@@ -133,7 +133,7 @@ class TEDIE(InfoExtractor):
  
      def _extract_info(self, webpage):
          info_json = self._search_regex(
-            r'(?s)q\(\s*"\w+.init"\s*,\s*({.+})\)\s*</script>',
+            r'(?s)q\(\s*"\w+.init"\s*,\s*({.+?})\)\s*</script>',
              webpage, 'info json')
          return json.loads(info_json)
  
diff --git a/youtube_dl/extractor/tvigle.py b/youtube_dl/extractor/tvigle.py
index 3475ef4c3b91b69c136c53e251a25d3d902152f3..180259abac795e3f27e221c6c0cb95afdfe768ec 100644
--- a/youtube_dl/extractor/tvigle.py
+++ b/youtube_dl/extractor/tvigle.py
@@ -9,6 +9,8 @@ from ..utils import (
      float_or_none,
      int_or_none,
      parse_age_limit,
+    try_get,
+    url_or_none,
  )
  
  
@@ -23,11 +25,10 @@ class TvigleIE(InfoExtractor):
      _TESTS = [
          {
              'url': 'http://www.tvigle.ru/video/sokrat/',
-            'md5': '36514aed3657d4f70b4b2cef8eb520cd',
              'info_dict': {
                  'id': '1848932',
                  'display_id': 'sokrat',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Сократ',
                  'description': 'md5:d6b92ffb7217b4b8ebad2e7665253c17',
                  'duration': 6586,
@@ -37,7 +38,6 @@ class TvigleIE(InfoExtractor):
          },
          {
              'url': 'http://www.tvigle.ru/video/vladimir-vysotskii/vedushchii-teleprogrammy-60-minut-ssha-o-vladimire-vysotskom/',
-            'md5': 'e7efe5350dd5011d0de6550b53c3ba7b',
              'info_dict': {
                  'id': '5142516',
                  'ext': 'flv',
@@ -62,7 +62,7 @@ class TvigleIE(InfoExtractor):
              webpage = self._download_webpage(url, display_id)
              video_id = self._html_search_regex(
                  (r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
-                 r'var\s+cloudId\s*=\s*["\'](\d+)',
+                 r'cloudId\s*=\s*["\'](\d+)',
                   r'class="video-preview current_playing" id="(\d+)"'),
                  webpage, 'video id')
  
@@ -90,21 +90,40 @@ class TvigleIE(InfoExtractor):
          age_limit = parse_age_limit(item.get('ageRestrictions'))
  
          formats = []
-        for vcodec, fmts in item['videos'].items():
+        for vcodec, url_or_fmts in item['videos'].items():
              if vcodec == 'hls':
-                continue
-            for format_id, video_url in fmts.items():
-                if format_id == 'm3u8':
+                m3u8_url = url_or_none(url_or_fmts)
+                if not m3u8_url:
+                    continue
+                formats.extend(self._extract_m3u8_formats(
+                    m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif vcodec == 'dash':
+                mpd_url = url_or_none(url_or_fmts)
+                if not mpd_url:
+                    continue
+                formats.extend(self._extract_mpd_formats(
+                    mpd_url, video_id, mpd_id='dash', fatal=False))
+            else:
+                if not isinstance(url_or_fmts, dict):
                      continue
-                height = self._search_regex(
-                    r'^(\d+)[pP]$', format_id, 'height', default=None)
-                formats.append({
-                    'url': video_url,
-                    'format_id': '%s-%s' % (vcodec, format_id),
-                    'vcodec': vcodec,
-                    'height': int_or_none(height),
-                    'filesize': int_or_none(item.get('video_files_size', {}).get(vcodec, {}).get(format_id)),
-                })
+                for format_id, video_url in url_or_fmts.items():
+                    if format_id == 'm3u8':
+                        continue
+                    video_url = url_or_none(video_url)
+                    if not video_url:
+                        continue
+                    height = self._search_regex(
+                        r'^(\d+)[pP]$', format_id, 'height', default=None)
+                    filesize = int_or_none(try_get(
+                        item, lambda x: x['video_files_size'][vcodec][format_id]))
+                    formats.append({
+                        'url': video_url,
+                        'format_id': '%s-%s' % (vcodec, format_id),
+                        'vcodec': vcodec,
+                        'height': int_or_none(height),
+                        'filesize': filesize,
+                    })
          self._sort_formats(formats)
  
          return {
diff --git a/youtube_dl/extractor/tvland.py b/youtube_dl/extractor/tvland.py
index 957cf1ea2666ace07087ffd7d9e94810e87fe1e8..791144128beb79271614309cfbf7b86c49f58217 100644
--- a/youtube_dl/extractor/tvland.py
+++ b/youtube_dl/extractor/tvland.py
@@ -1,32 +1,35 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-from .mtv import MTVServicesInfoExtractor
+from .spike import ParamountNetworkIE
  
  
-class TVLandIE(MTVServicesInfoExtractor):
+class TVLandIE(ParamountNetworkIE):
      IE_NAME = 'tvland.com'
      _VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
      _FEED_URL = 'http://www.tvland.com/feeds/mrss/'
      _TESTS = [{
          # Geo-restricted. Without a proxy metadata are still there. With a
          # proxy it redirects to http://m.tvland.com/app/
-        'url': 'http://www.tvland.com/episodes/hqhps2/everybody-loves-raymond-the-invasion-ep-048',
+        'url': 'https://www.tvland.com/episodes/s04pzf/everybody-loves-raymond-the-dog-season-1-ep-19',
          'info_dict': {
-            'description': 'md5:80973e81b916a324e05c14a3fb506d29',
-            'title': 'The Invasion',
+            'description': 'md5:84928e7a8ad6649371fbf5da5e1ad75a',
+            'title': 'The Dog',
          },
-        'playlist': [],
+        'playlist_mincount': 5,
      }, {
-        'url': 'http://www.tvland.com/video-clips/zea2ev/younger-younger--hilary-duff---little-lies',
+        'url': 'https://www.tvland.com/video-clips/4n87f2/younger-a-first-look-at-younger-season-6',
          'md5': 'e2c6389401cf485df26c79c247b08713',
          'info_dict': {
-            'id': 'b8697515-4bbe-4e01-83d5-fa705ce5fa88',
+            'id': '891f7d3c-5b5b-4753-b879-b7ba1a601757',
              'ext': 'mp4',
-            'title': 'Younger|December 28, 2015|2|NO-EPISODE#|Younger: Hilary Duff - Little Lies',
-            'description': 'md5:7d192f56ca8d958645c83f0de8ef0269',
-            'upload_date': '20151228',
-            'timestamp': 1451289600,
+            'title': 'Younger|April 30, 2019|6|NO-EPISODE#|A First Look at Younger Season 6',
+            'description': 'md5:595ea74578d3a888ae878dfd1c7d4ab2',
+            'upload_date': '20190430',
+            'timestamp': 1556658000,
+        },
+        'params': {
+            'skip_download': True,
          },
      }, {
          'url': 'http://www.tvland.com/full-episodes/iu0hz6/younger-a-kiss-is-just-a-kiss-season-3-ep-301',
diff --git a/youtube_dl/extractor/tvn24.py b/youtube_dl/extractor/tvn24.py
index 6590e1fd01801f1825a0bb102197b7e2449b90dd..de0fb506316814c8562de514daddc5f99165cf6f 100644
--- a/youtube_dl/extractor/tvn24.py
+++ b/youtube_dl/extractor/tvn24.py
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
+    NO_DEFAULT,
      unescapeHTML,
  )
  
@@ -17,9 +18,21 @@ class TVN24IE(InfoExtractor):
              'id': '1584444',
              'ext': 'mp4',
              'title': '"Święta mają być wesołe, dlatego, ludziska, wszyscy pod jemiołę"',
-            'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości "Szkła kontaktowego".',
+            'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości Szkła kontaktowego.',
              'thumbnail': 're:https?://.*[.]jpeg',
          }
+    }, {
+        # different layout
+        'url': 'https://tvnmeteo.tvn24.pl/magazyny/maja-w-ogrodzie,13/odcinki-online,1,4,1,0/pnacza-ptaki-i-iglaki-odc-691-hgtv-odc-29,1771763.html',
+        'info_dict': {
+            'id': '1771763',
+            'ext': 'mp4',
+            'title': 'Pnącza, ptaki i iglaki (odc. 691 /HGTV odc. 29)',
+            'thumbnail': 're:https?://.*',
+        },
+        'params': {
+            'skip_download': True,
+        },
      }, {
          'url': 'http://fakty.tvn24.pl/ogladaj-online,60/53-konferencja-bezpieczenstwa-w-monachium,716431.html',
          'only_matching': True,
@@ -35,18 +48,21 @@ class TVN24IE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        display_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, display_id)
  
-        title = self._og_search_title(webpage)
+        title = self._og_search_title(
+            webpage, default=None) or self._search_regex(
+            r'<h\d+[^>]+class=["\']magazineItemHeader[^>]+>(.+?)</h',
+            webpage, 'title')
  
-        def extract_json(attr, name, fatal=True):
+        def extract_json(attr, name, default=NO_DEFAULT, fatal=True):
              return self._parse_json(
                  self._search_regex(
                      r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % attr, webpage,
-                    name, group='json', fatal=fatal) or '{}',
-                video_id, transform_source=unescapeHTML, fatal=fatal)
+                    name, group='json', default=default, fatal=fatal) or '{}',
+                display_id, transform_source=unescapeHTML, fatal=fatal)
  
          quality_data = extract_json('data-quality', 'formats')
  
@@ -59,16 +75,24 @@ class TVN24IE(InfoExtractor):
              })
          self._sort_formats(formats)
  
-        description = self._og_search_description(webpage)
+        description = self._og_search_description(webpage, default=None)
          thumbnail = self._og_search_thumbnail(
              webpage, default=None) or self._html_search_regex(
              r'\bdata-poster=(["\'])(?P<url>(?!\1).+?)\1', webpage,
              'thumbnail', group='url')
  
+        video_id = None
+
          share_params = extract_json(
-            'data-share-params', 'share params', fatal=False)
+            'data-share-params', 'share params', default=None)
          if isinstance(share_params, dict):
-            video_id = share_params.get('id') or video_id
+            video_id = share_params.get('id')
+
+        if not video_id:
+            video_id = self._search_regex(
+                r'data-vid-id=["\'](\d+)', webpage, 'video id',
+                default=None) or self._search_regex(
+                r',(\d+)\.html', url, 'video id', default=display_id)
  
          return {
              'id': video_id,
diff --git a/youtube_dl/extractor/twitch.py b/youtube_dl/extractor/twitch.py
index dc5ff29c3f20545c30ad2b185305e4731774849c..0500e33a65c92a9667f84ca20f1c72e832365cad 100644
--- a/youtube_dl/extractor/twitch.py
+++ b/youtube_dl/extractor/twitch.py
@@ -317,7 +317,7 @@ class TwitchVodIE(TwitchItemBaseIE):
              'Downloading %s access token' % self._ITEM_TYPE)
  
          formats = self._extract_m3u8_formats(
-            '%s/vod/%s?%s' % (
+            '%s/vod/%s.m3u8?%s' % (
                  self._USHER_BASE, item_id,
                  compat_urllib_parse_urlencode({
                      'allow_source': 'true',
diff --git a/youtube_dl/extractor/twitter.py b/youtube_dl/extractor/twitter.py
index 41d0b6be8c8654bba8db336e1032f95abdc72e05..cebb6238c561cbfdf77c5ada514d55c216c19a34 100644
--- a/youtube_dl/extractor/twitter.py
+++ b/youtube_dl/extractor/twitter.py
@@ -428,11 +428,22 @@ class TwitterIE(InfoExtractor):
          'params': {
              'skip_download': True,  # requires ffmpeg
          },
+    }, {
+        'url': 'https://twitter.com/foobar/status/1087791357756956680',
+        'info_dict': {
+            'id': '1087791357756956680',
+            'ext': 'mp4',
+            'title': 'Twitter - A new is coming.  Some of you got an opt-in to try it now. Check out the emoji button, quick keyboard shortcuts, upgraded trends, advanced search, and more. Let us know your thoughts!',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'description': 'md5:66d493500c013e3e2d434195746a7f78',
+            'uploader': 'Twitter',
+            'uploader_id': 'Twitter',
+            'duration': 61.567,
+        },
      }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        user_id = mobj.group('user_id')
          twid = mobj.group('id')
  
          webpage, urlh = self._download_webpage_handle(
@@ -441,8 +452,13 @@ class TwitterIE(InfoExtractor):
          if 'twitter.com/account/suspended' in urlh.geturl():
              raise ExtractorError('Account suspended by Twitter.', expected=True)
  
-        if user_id is None:
-            mobj = re.match(self._VALID_URL, urlh.geturl())
+        user_id = None
+
+        redirect_mobj = re.match(self._VALID_URL, urlh.geturl())
+        if redirect_mobj:
+            user_id = redirect_mobj.group('user_id')
+
+        if not user_id:
              user_id = mobj.group('user_id')
  
          username = remove_end(self._og_search_title(webpage), ' on Twitter')
diff --git a/youtube_dl/extractor/usanetwork.py b/youtube_dl/extractor/usanetwork.py
index 823340776a805eda47d6f0b833d8f6a481a58f17..54c7495ccd30bdd8df8b7250617330832b0c2a4b 100644
--- a/youtube_dl/extractor/usanetwork.py
+++ b/youtube_dl/extractor/usanetwork.py
@@ -1,11 +1,9 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .adobepass import AdobePassIE
  from ..utils import (
-    extract_attributes,
+    NO_DEFAULT,
      smuggle_url,
      update_url_query,
  )
@@ -31,22 +29,22 @@ class USANetworkIE(AdobePassIE):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
  
-        player_params = extract_attributes(self._search_regex(
-            r'(<div[^>]+data-usa-tve-player-container[^>]*>)', webpage, 'player params'))
-        video_id = player_params['data-mpx-guid']
-        title = player_params['data-episode-title']
+        def _x(name, default=NO_DEFAULT):
+            return self._search_regex(
+                r'data-%s\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % name,
+                webpage, name, default=default, group='value')
  
-        account_pid, path = re.search(
-            r'data-src="(?:https?)?//player\.theplatform\.com/p/([^/]+)/.*?/(media/guid/\d+/\d+)',
-            webpage).groups()
+        video_id = _x('mpx-guid')
+        title = _x('episode-title')
+        mpx_account_id = _x('mpx-account-id', '2304992029')
  
          query = {
              'mbr': 'true',
          }
-        if player_params.get('data-is-full-episode') == '1':
+        if _x('is-full-episode', None) == '1':
              query['manifest'] = 'm3u'
  
-        if player_params.get('data-entitlement') == 'auth':
+        if _x('is-entitlement', None) == '1':
              adobe_pass = {}
              drupal_settings = self._search_regex(
                  r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
@@ -57,7 +55,7 @@ class USANetworkIE(AdobePassIE):
                      adobe_pass = drupal_settings.get('adobePass', {})
              resource = self._get_mvpd_resource(
                  adobe_pass.get('adobePassResourceId', 'usa'),
-                title, video_id, player_params.get('data-episode-rating', 'TV-14'))
+                title, video_id, _x('episode-rating', 'TV-14'))
              query['auth'] = self._extract_mvpd_auth(
                  url, video_id, adobe_pass.get('adobePassRequestorId', 'usa'), resource)
  
@@ -65,11 +63,11 @@ class USANetworkIE(AdobePassIE):
          info.update({
              '_type': 'url_transparent',
              'url': smuggle_url(update_url_query(
-                'http://link.theplatform.com/s/%s/%s' % (account_pid, path),
+                'http://link.theplatform.com/s/HNK2IC/media/guid/%s/%s' % (mpx_account_id, video_id),
                  query), {'force_smil_url': True}),
              'id': video_id,
              'title': title,
-            'series': player_params.get('data-show-title'),
+            'series': _x('show-title', None),
              'episode': title,
              'ie_key': 'ThePlatform',
          })
diff --git a/youtube_dl/extractor/vimeo.py b/youtube_dl/extractor/vimeo.py
index b5b44a79a26317489e68c646c725bc2945b1f878..ddf375c6c2189f4eea655196a69c7e2339952527 100644
--- a/youtube_dl/extractor/vimeo.py
+++ b/youtube_dl/extractor/vimeo.py
@@ -2,12 +2,14 @@
  from __future__ import unicode_literals
  
  import base64
+import functools
  import json
  import re
  import itertools
  
  from .common import InfoExtractor
  from ..compat import (
+    compat_kwargs,
      compat_HTTPError,
      compat_str,
      compat_urlparse,
@@ -19,6 +21,7 @@ from ..utils import (
      int_or_none,
      merge_dicts,
      NO_DEFAULT,
+    OnDemandPagedList,
      parse_filesize,
      qualities,
      RegexNotFoundError,
@@ -98,6 +101,13 @@ class VimeoBaseInfoExtractor(InfoExtractor):
              webpage, 'vuid', group='vuid')
          return xsrft, vuid
  
+    def _extract_vimeo_config(self, webpage, video_id, *args, **kwargs):
+        vimeo_config = self._search_regex(
+            r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));',
+            webpage, 'vimeo config', *args, **compat_kwargs(kwargs))
+        if vimeo_config:
+            return self._parse_json(vimeo_config, video_id)
+
      def _set_vimeo_cookie(self, name, value):
          self._set_cookie('vimeo.com', name, value)
  
@@ -253,7 +263,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                              \.
                          )?
                          vimeo(?P<pro>pro)?\.com/
-                        (?!(?:channels|album)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
+                        (?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
                          (?:.*?/)?
                          (?:
                              (?:
@@ -580,11 +590,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
          # and latter we extract those that are Vimeo specific.
          self.report_extraction(video_id)
  
-        vimeo_config = self._search_regex(
-            r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));', webpage,
-            'vimeo config', default=None)
+        vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
          if vimeo_config:
-            seed_status = self._parse_json(vimeo_config, video_id).get('seed_status', {})
+            seed_status = vimeo_config.get('seed_status', {})
              if seed_status.get('state') == 'failed':
                  raise ExtractorError(
                      '%s said: %s' % (self.IE_NAME, seed_status['title']),
@@ -905,7 +913,7 @@ class VimeoUserIE(VimeoChannelIE):
  
  class VimeoAlbumIE(VimeoChannelIE):
      IE_NAME = 'vimeo:album'
-    _VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)(?:$|[?#]|/(?!video))'
+    _VALID_URL = r'https://vimeo\.com/(?:album|showcase)/(?P<id>\d+)(?:$|[?#]|/(?!video))'
      _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
      _TESTS = [{
          'url': 'https://vimeo.com/album/2632481',
@@ -925,21 +933,39 @@ class VimeoAlbumIE(VimeoChannelIE):
          'params': {
              'videopassword': 'youtube-dl',
          }
-    }, {
-        'url': 'https://vimeo.com/album/2632481/sort:plays/format:thumbnail',
-        'only_matching': True,
-    }, {
-        # TODO: respect page number
-        'url': 'https://vimeo.com/album/2632481/page:2/sort:plays/format:thumbnail',
-        'only_matching': True,
      }]
-
-    def _page_url(self, base_url, pagenum):
-        return '%s/page:%d/' % (base_url, pagenum)
+    _PAGE_SIZE = 100
+
+    def _fetch_page(self, album_id, authorization, hashed_pass, page):
+        api_page = page + 1
+        query = {
+            'fields': 'link',
+            'page': api_page,
+            'per_page': self._PAGE_SIZE,
+        }
+        if hashed_pass:
+            query['_hashed_pass'] = hashed_pass
+        videos = self._download_json(
+            'https://api.vimeo.com/albums/%s/videos' % album_id,
+            album_id, 'Downloading page %d' % api_page, query=query, headers={
+                'Authorization': 'jwt ' + authorization,
+            })['data']
+        for video in videos:
+            link = video.get('link')
+            if not link:
+                continue
+            yield self.url_result(link, VimeoIE.ie_key(), VimeoIE._match_id(link))
  
      def _real_extract(self, url):
          album_id = self._match_id(url)
-        return self._extract_videos(album_id, 'https://vimeo.com/album/%s' % album_id)
+        webpage = self._download_webpage(url, album_id)
+        webpage = self._login_list_password(url, album_id, webpage)
+        api_config = self._extract_vimeo_config(webpage, album_id)['api']
+        entries = OnDemandPagedList(functools.partial(
+            self._fetch_page, album_id, api_config['jwt'],
+            api_config.get('hashed_pass')), self._PAGE_SIZE)
+        return self.playlist_result(entries, album_id, self._html_search_regex(
+            r'<title>\s*(.+?)(?:\s+on Vimeo)?</title>', webpage, 'title', fatal=False))
  
  
  class VimeoGroupsIE(VimeoAlbumIE):
diff --git a/youtube_dl/extractor/voxmedia.py b/youtube_dl/extractor/voxmedia.py
index c7a0a88fe896ac408f0cf1b559929d98b7f6a8b7..b318e15d4b4da53fe0b42f3537d0fde66182299c 100644
--- a/youtube_dl/extractor/voxmedia.py
+++ b/youtube_dl/extractor/voxmedia.py
@@ -4,7 +4,10 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from .once import OnceIE
  from ..compat import compat_urllib_parse_unquote
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
  
  
  class VoxMediaVolumeIE(OnceIE):
@@ -13,18 +16,43 @@ class VoxMediaVolumeIE(OnceIE):
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
-        video_data = self._parse_json(self._search_regex(
-            r'Volume\.createVideo\(({.+})\s*,\s*{.*}\s*,\s*\[.*\]\s*,\s*{.*}\);', webpage, 'video data'), video_id)
+
+        setup = self._parse_json(self._search_regex(
+            r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
+        video_data = setup.get('video') or {}
+        info = {
+            'id': video_id,
+            'title': video_data.get('title_short'),
+            'description': video_data.get('description_long') or video_data.get('description_short'),
+            'thumbnail': video_data.get('brightcove_thumbnail')
+        }
+        asset = setup.get('asset') or setup.get('params') or {}
+
+        formats = []
+        hls_url = asset.get('hls_url')
+        if hls_url:
+            formats.extend(self._extract_m3u8_formats(
+                hls_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        mp4_url = asset.get('mp4_url')
+        if mp4_url:
+            tbr = self._search_regex(r'-(\d+)k\.', mp4_url, 'bitrate', default=None)
+            format_id = 'http'
+            if tbr:
+                format_id += '-' + tbr
+            formats.append({
+                'format_id': format_id,
+                'url': mp4_url,
+                'tbr': int_or_none(tbr),
+            })
+        if formats:
+            self._sort_formats(formats)
+            info['formats'] = formats
+            return info
+
          for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
              provider_video_id = video_data.get('%s_id' % provider_video_type)
              if not provider_video_id:
                  continue
-            info = {
-                'id': video_id,
-                'title': video_data.get('title_short'),
-                'description': video_data.get('description_long') or video_data.get('description_short'),
-                'thumbnail': video_data.get('brightcove_thumbnail')
-            }
              if provider_video_type == 'brightcove':
                  info['formats'] = self._extract_once_formats(provider_video_id)
                  self._sort_formats(info['formats'])
@@ -39,46 +67,49 @@ class VoxMediaVolumeIE(OnceIE):
  
  
  class VoxMediaIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:(?:theverge|vox|sbnation|eater|polygon|curbed|racked)\.com|recode\.net)/(?:[^/]+/)*(?P<id>[^/?]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:(?:theverge|vox|sbnation|eater|polygon|curbed|racked|funnyordie)\.com|recode\.net)/(?:[^/]+/)*(?P<id>[^/?]+)'
      _TESTS = [{
+        # Volume embed, Youtube
          'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of',
          'info_dict': {
-            'id': '11eXZobjrG8DCSTgrNjVinU-YmmdYjhe',
+            'id': 'j4mLW6x17VM',
              'ext': 'mp4',
-            'title': 'Google\'s new material design direction',
-            'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
+            'title': 'Material world: how Google discovered what software is made of',
+            'description': 'md5:dfc17e7715e3b542d66e33a109861382',
+            'upload_date': '20190710',
+            'uploader_id': 'TheVerge',
+            'uploader': 'The Verge',
          },
-        'add_ie': ['Ooyala'],
+        'add_ie': ['Youtube'],
      }, {
-        # data-ooyala-id
+        # Volume embed, Youtube
          'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
-        'md5': 'd744484ff127884cd2ba09e3fa604e4b',
+        'md5': '4c8f4a0937752b437c3ebc0ed24802b5',
          'info_dict': {
-            'id': 'RkZXU4cTphOCPDMZg5oEounJyoFI0g-B',
+            'id': 'Gy8Md3Eky38',
              'ext': 'mp4',
              'title': 'The Nexus 6: hands-on with Google\'s phablet',
-            'description': 'md5:87a51fe95ff8cea8b5bdb9ac7ae6a6af',
+            'description': 'md5:d9f0216e5fb932dd2033d6db37ac3f1d',
+            'uploader_id': 'TheVerge',
+            'upload_date': '20141021',
+            'uploader': 'The Verge',
          },
-        'add_ie': ['Ooyala'],
-        'skip': 'Video Not Found',
+        'add_ie': ['Youtube'],
+        'skip': 'similar to the previous test',
      }, {
-        # volume embed
+        # Volume embed, Youtube
          'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
          'info_dict': {
-            'id': 'wydzk3dDpmRz7PQoXRsTIX6XTkPjYL0b',
+            'id': 'YCjDnX-Xzhg',
              'ext': 'mp4',
-            'title': 'The new frontier of LGBTQ civil rights, explained',
-            'description': 'md5:0dc58e94a465cbe91d02950f770eb93f',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
+            'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
+            'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
+            'uploader_id': 'voxdotcom',
+            'upload_date': '20150915',
+            'uploader': 'Vox',
          },
-        'add_ie': ['Ooyala'],
+        'add_ie': ['Youtube'],
+        'skip': 'similar to the previous test',
      }, {
          # youtube embed
          'url': 'http://www.vox.com/2016/3/24/11291692/robot-dance',
@@ -93,6 +124,7 @@ class VoxMediaIE(InfoExtractor):
              'uploader': 'Vox',
          },
          'add_ie': ['Youtube'],
+        'skip': 'Page no longer contains videos',
      }, {
          # SBN.VideoLinkset.entryGroup multiple ooyala embeds
          'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
@@ -118,10 +150,11 @@ class VoxMediaIE(InfoExtractor):
                  'description': 'md5:e02d56b026d51aa32c010676765a690d',
              },
          }],
+        'skip': 'Page no longer contains videos',
      }, {
          # volume embed, Brightcove Once
          'url': 'https://www.recode.net/2014/6/17/11628066/post-post-pc-ceo-the-full-code-conference-video-of-microsofts-satya',
-        'md5': '01571a896281f77dc06e084138987ea2',
+        'md5': '2dbc77b8b0bff1894c2fce16eded637d',
          'info_dict': {
              'id': '1231c973d',
              'ext': 'mp4',
diff --git a/youtube_dl/extractor/vrv.py b/youtube_dl/extractor/vrv.py
index c814a8a4a3a04d83cf3c478331cb86bf31f0b592..6e51469b094363879749bf2926714662e26a1858 100644
--- a/youtube_dl/extractor/vrv.py
+++ b/youtube_dl/extractor/vrv.py
@@ -64,7 +64,15 @@ class VRVBaseIE(InfoExtractor):
  
      def _call_cms(self, path, video_id, note):
          if not self._CMS_SIGNING:
-            self._CMS_SIGNING = self._call_api('index', video_id, 'CMS Signing')['cms_signing']
+            index = self._call_api('index', video_id, 'CMS Signing')
+            self._CMS_SIGNING = index.get('cms_signing') or {}
+            if not self._CMS_SIGNING:
+                for signing_policy in index.get('signing_policies', []):
+                    signing_path = signing_policy.get('path')
+                    if signing_path and signing_path.startswith('/cms/'):
+                        name, value = signing_policy.get('name'), signing_policy.get('value')
+                        if name and value:
+                            self._CMS_SIGNING[name] = value
          return self._download_json(
              self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
              note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
diff --git a/youtube_dl/extractor/vzaar.py b/youtube_dl/extractor/vzaar.py
index 6000671c31bc399a405f03a0064996287d46fe3e..3336e6c152f80212468cc950c8da662ddf5998db 100644
--- a/youtube_dl/extractor/vzaar.py
+++ b/youtube_dl/extractor/vzaar.py
@@ -32,6 +32,10 @@ class VzaarIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'MP3',
          },
+    }, {
+        # with null videoTitle
+        'url': 'https://view.vzaar.com/20313539/download',
+        'only_matching': True,
      }]
  
      @staticmethod
@@ -45,7 +49,7 @@ class VzaarIE(InfoExtractor):
          video_data = self._download_json(
              'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
  
-        title = video_data['videoTitle']
+        title = video_data.get('videoTitle') or video_id
  
          formats = []
  
diff --git a/youtube_dl/extractor/xhamster.py b/youtube_dl/extractor/xhamster.py
index d268372e6f11e2c8d43626b3f1e9503f4f70cb4e..a5b94d2794166d452464b728ec40b2b258459c64 100644
--- a/youtube_dl/extractor/xhamster.py
+++ b/youtube_dl/extractor/xhamster.py
@@ -1,5 +1,6 @@
  from __future__ import unicode_literals
  
+import itertools
  import re
  
  from .common import InfoExtractor
@@ -8,6 +9,7 @@ from ..utils import (
      clean_html,
      determine_ext,
      dict_get,
+    extract_attributes,
      ExtractorError,
      int_or_none,
      parse_duration,
@@ -18,21 +20,21 @@ from ..utils import (
  
  
  class XHamsterIE(InfoExtractor):
+    _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'
      _VALID_URL = r'''(?x)
                      https?://
-                        (?:.+?\.)?xhamster\.(?:com|one)/
+                        (?:.+?\.)?%s/
                          (?:
                              movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
                              videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
                          )
-                    '''
-
+                    ''' % _DOMAINS
      _TESTS = [{
-        'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
-        'md5': '8281348b8d3c53d39fffb377d24eac4e',
+        'url': 'https://xhamster.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+        'md5': '98b4687efb1ffd331c4197854dc09e8f',
          'info_dict': {
              'id': '1509445',
-            'display_id': 'femaleagent_shy_beauty_takes_the_bait',
+            'display_id': 'femaleagent-shy-beauty-takes-the-bait',
              'ext': 'mp4',
              'title': 'FemaleAgent Shy beauty takes the bait',
              'timestamp': 1350194821,
@@ -40,13 +42,12 @@ class XHamsterIE(InfoExtractor):
              'uploader': 'Ruseful2011',
              'duration': 893,
              'age_limit': 18,
-            'categories': ['Fake Hub', 'Amateur', 'MILFs', 'POV', 'Beauti', 'Beauties', 'Beautiful', 'Boss', 'Office', 'Oral', 'Reality', 'Sexy', 'Taking'],
          },
      }, {
-        'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
+        'url': 'https://xhamster.com/videos/britney-spears-sexy-booty-2221348?hd=',
          'info_dict': {
              'id': '2221348',
-            'display_id': 'britney_spears_sexy_booty',
+            'display_id': 'britney-spears-sexy-booty',
              'ext': 'mp4',
              'title': 'Britney Spears  Sexy Booty',
              'timestamp': 1379123460,
@@ -54,13 +55,12 @@ class XHamsterIE(InfoExtractor):
              'uploader': 'jojo747400',
              'duration': 200,
              'age_limit': 18,
-            'categories': ['Britney Spears', 'Celebrities', 'HD Videos', 'Sexy', 'Sexy Booty'],
          },
          'params': {
              'skip_download': True,
          },
      }, {
-        # empty seo
+        # empty seo, unavailable via new URL schema
          'url': 'http://xhamster.com/movies/5667973/.html',
          'info_dict': {
              'id': '5667973',
@@ -71,7 +71,6 @@ class XHamsterIE(InfoExtractor):
              'uploader': 'parejafree',
              'duration': 72,
              'age_limit': 18,
-            'categories': ['Amateur', 'Blowjobs'],
          },
          'params': {
              'skip_download': True,
@@ -94,6 +93,18 @@ class XHamsterIE(InfoExtractor):
      }, {
          'url': 'https://xhamster.one/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
          'only_matching': True,
+    }, {
+        'url': 'https://xhamster.desi/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+        'only_matching': True,
+    }, {
+        'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+        'only_matching': True,
+    }, {
+        'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -285,7 +296,7 @@ class XHamsterIE(InfoExtractor):
  
  
  class XHamsterEmbedIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?xhamster\.com/xembed\.php\?video=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:.+?\.)?%s/xembed\.php\?video=(?P<id>\d+)' % XHamsterIE._DOMAINS
      _TEST = {
          'url': 'http://xhamster.com/xembed.php?video=3328539',
          'info_dict': {
@@ -322,3 +333,49 @@ class XHamsterEmbedIE(InfoExtractor):
              video_url = dict_get(vars, ('downloadLink', 'homepageLink', 'commentsLink', 'shareUrl'))
  
          return self.url_result(video_url, 'XHamster')
+
+
+class XHamsterUserIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:.+?\.)?%s/users/(?P<id>[^/?#&]+)' % XHamsterIE._DOMAINS
+    _TESTS = [{
+        # Paginated user profile
+        'url': 'https://xhamster.com/users/netvideogirls/videos',
+        'info_dict': {
+            'id': 'netvideogirls',
+        },
+        'playlist_mincount': 267,
+    }, {
+        # Non-paginated user profile
+        'url': 'https://xhamster.com/users/firatkaan/videos',
+        'info_dict': {
+            'id': 'firatkaan',
+        },
+        'playlist_mincount': 1,
+    }]
+
+    def _entries(self, user_id):
+        next_page_url = 'https://xhamster.com/users/%s/videos/1' % user_id
+        for pagenum in itertools.count(1):
+            page = self._download_webpage(
+                next_page_url, user_id, 'Downloading page %s' % pagenum)
+            for video_tag in re.findall(
+                    r'(<a[^>]+class=["\'].*?\bvideo-thumb__image-container[^>]+>)',
+                    page):
+                video = extract_attributes(video_tag)
+                video_url = url_or_none(video.get('href'))
+                if not video_url or not XHamsterIE.suitable(video_url):
+                    continue
+                video_id = XHamsterIE._match_id(video_url)
+                yield self.url_result(
+                    video_url, ie=XHamsterIE.ie_key(), video_id=video_id)
+            mobj = re.search(r'<a[^>]+data-page=["\']next[^>]+>', page)
+            if not mobj:
+                break
+            next_page = extract_attributes(mobj.group(0))
+            next_page_url = url_or_none(next_page.get('href'))
+            if not next_page_url:
+                break
+
+    def _real_extract(self, url):
+        user_id = self._match_id(url)
+        return self.playlist_result(self._entries(user_id), user_id)
diff --git a/youtube_dl/extractor/yahoo.py b/youtube_dl/extractor/yahoo.py
index a3b5f00c8696e6e45af3678a8bbe6f611a21af2a..e5ebdd1806ec30944d04ddb56707fea1228e2fae 100644
--- a/youtube_dl/extractor/yahoo.py
+++ b/youtube_dl/extractor/yahoo.py
@@ -1,12 +1,14 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import hashlib
  import itertools
  import json
  import re
  
  from .common import InfoExtractor, SearchInfoExtractor
  from ..compat import (
+    compat_str,
      compat_urllib_parse,
      compat_urlparse,
  )
@@ -18,7 +20,9 @@ from ..utils import (
      int_or_none,
      mimetype2ext,
      smuggle_url,
+    try_get,
      unescapeHTML,
+    url_or_none,
  )
  
  from .brightcove import (
@@ -556,3 +560,130 @@ class YahooGyaOIE(InfoExtractor):
                  'https://gyao.yahoo.co.jp/player/%s/' % video_id.replace(':', '/'),
                  YahooGyaOPlayerIE.ie_key(), video_id))
          return self.playlist_result(entries, program_id)
+
+
+class YahooJapanNewsIE(InfoExtractor):
+    IE_NAME = 'yahoo:japannews'
+    IE_DESC = 'Yahoo! Japan News'
+    _VALID_URL = r'https?://(?P<host>(?:news|headlines)\.yahoo\.co\.jp)[^\d]*(?P<id>\d[\d-]*\d)?'
+    _GEO_COUNTRIES = ['JP']
+    _TESTS = [{
+        'url': 'https://headlines.yahoo.co.jp/videonews/ann?a=20190716-00000071-ann-int',
+        'info_dict': {
+            'id': '1736242',
+            'ext': 'mp4',
+            'title': 'ムン大統領が対日批判を強化“現金化”効果は？（テレビ朝日系（ANN）） - Yahoo!ニュース',
+            'description': '韓国の元徴用工らを巡る裁判の原告が弁護士が差し押さえた三菱重工業の資産を売却して - Yahoo!ニュース(テレビ朝日系（ANN）)',
+            'thumbnail': r're:^https?://.*\.[a-zA-Z\d]{3,4}$',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # geo restricted
+        'url': 'https://headlines.yahoo.co.jp/hl?a=20190721-00000001-oxv-l04',
+        'only_matching': True,
+    }, {
+        'url': 'https://headlines.yahoo.co.jp/videonews/',
+        'only_matching': True,
+    }, {
+        'url': 'https://news.yahoo.co.jp',
+        'only_matching': True,
+    }, {
+        'url': 'https://news.yahoo.co.jp/byline/hashimotojunji/20190628-00131977/',
+        'only_matching': True,
+    }, {
+        'url': 'https://news.yahoo.co.jp/feature/1356',
+        'only_matching': True
+    }]
+
+    def _extract_formats(self, json_data, content_id):
+        formats = []
+
+        video_data = try_get(
+            json_data,
+            lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+            list)
+        for vid in video_data or []:
+            delivery = vid.get('delivery')
+            url = url_or_none(vid.get('Url'))
+            if not delivery or not url:
+                continue
+            elif delivery == 'hls':
+                formats.extend(
+                    self._extract_m3u8_formats(
+                        url, content_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
+            else:
+                formats.append({
+                    'url': url,
+                    'format_id': 'http-%s' % compat_str(vid.get('bitrate', '')),
+                    'height': int_or_none(vid.get('height')),
+                    'width': int_or_none(vid.get('width')),
+                    'tbr': int_or_none(vid.get('bitrate')),
+                })
+        self._remove_duplicate_formats(formats)
+        self._sort_formats(formats)
+
+        return formats
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        host = mobj.group('host')
+        display_id = mobj.group('id') or host
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._html_search_meta(
+            ['og:title', 'twitter:title'], webpage, 'title', default=None
+        ) or self._html_search_regex('<title>([^<]+)</title>', webpage, 'title')
+
+        if display_id == host:
+            # Headline page (w/ multiple BC playlists) ('news.yahoo.co.jp', 'headlines.yahoo.co.jp/videonews/', ...)
+            stream_plists = re.findall(r'plist=(\d+)', webpage) or re.findall(r'plist["\']:\s*["\']([^"\']+)', webpage)
+            entries = [
+                self.url_result(
+                    smuggle_url(
+                        'http://players.brightcove.net/5690807595001/HyZNerRl7_default/index.html?playlistId=%s' % plist_id,
+                        {'geo_countries': ['JP']}),
+                    ie='BrightcoveNew', video_id=plist_id)
+                for plist_id in stream_plists]
+            return self.playlist_result(entries, playlist_title=title)
+
+        # Article page
+        description = self._html_search_meta(
+            ['og:description', 'description', 'twitter:description'],
+            webpage, 'description', default=None)
+        thumbnail = self._og_search_thumbnail(
+            webpage, default=None) or self._html_search_meta(
+            'twitter:image', webpage, 'thumbnail', default=None)
+        space_id = self._search_regex([
+            r'<script[^>]+class=["\']yvpub-player["\'][^>]+spaceid=([^&"\']+)',
+            r'YAHOO\.JP\.srch\.\w+link\.onLoad[^;]+spaceID["\' ]*:["\' ]+([^"\']+)',
+            r'<!--\s+SpaceID=(\d+)'
+        ], webpage, 'spaceid')
+
+        content_id = self._search_regex(
+            r'<script[^>]+class=["\']yvpub-player["\'][^>]+contentid=(?P<contentid>[^&"\']+)',
+            webpage, 'contentid', group='contentid')
+
+        json_data = self._download_json(
+            'https://feapi-yvpub.yahooapis.jp/v1/content/%s' % content_id,
+            content_id,
+            query={
+                'appid': 'dj0zaiZpPVZMTVFJR0FwZWpiMyZzPWNvbnN1bWVyc2VjcmV0Jng9YjU-',
+                'output': 'json',
+                'space_id': space_id,
+                'domain': host,
+                'ak': hashlib.md5('_'.join((space_id, host)).encode()).hexdigest(),
+                'device_type': '1100',
+            })
+        formats = self._extract_formats(json_data, content_id)
+
+        return {
+            'id': content_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/yandexmusic.py b/youtube_dl/extractor/yandexmusic.py
index 1dfee59e96289a31b892dd1e9ccc8aeeb32e0821..08d35e04c968a497bc28f4c3207e8d63432bd8d4 100644
--- a/youtube_dl/extractor/yandexmusic.py
+++ b/youtube_dl/extractor/yandexmusic.py
@@ -10,6 +10,7 @@ from ..utils import (
      ExtractorError,
      int_or_none,
      float_or_none,
+    try_get,
  )
  
  
@@ -51,23 +52,43 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
      IE_DESC = 'Яндекс.Музыка - Трек'
      _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<album_id>\d+)/track/(?P<id>\d+)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://music.yandex.ru/album/540508/track/4878838',
          'md5': 'f496818aa2f60b6c0062980d2e00dc20',
          'info_dict': {
              'id': '4878838',
              'ext': 'mp3',
-            'title': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
+            'title': 'Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
              'filesize': 4628061,
              'duration': 193.04,
              'track': 'Gypsy Eyes 1',
              'album': 'Gypsy Soul',
              'album_artist': 'Carlo Ambrosio',
-            'artist': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari',
+            'artist': 'Carlo Ambrosio & Fabio Di Bari',
              'release_year': 2009,
          },
          'skip': 'Travis CI servers blocked by YandexMusic',
-    }
+    }, {
+        # multiple discs
+        'url': 'http://music.yandex.ru/album/3840501/track/705105',
+        'md5': 'ebe7b4e2ac7ac03fe11c19727ca6153e',
+        'info_dict': {
+            'id': '705105',
+            'ext': 'mp3',
+            'title': 'Hooverphonic - Sometimes',
+            'filesize': 5743386,
+            'duration': 239.27,
+            'track': 'Sometimes',
+            'album': 'The Best of Hooverphonic',
+            'album_artist': 'Hooverphonic',
+            'artist': 'Hooverphonic',
+            'release_year': 2016,
+            'genre': 'pop',
+            'disc_number': 2,
+            'track_number': 9,
+        },
+        'skip': 'Travis CI servers blocked by YandexMusic',
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
@@ -110,9 +131,21 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
              'abr': int_or_none(download_data.get('bitrate')),
          }
  
+        def extract_artist_name(artist):
+            decomposed = artist.get('decomposed')
+            if not isinstance(decomposed, list):
+                return artist['name']
+            parts = [artist['name']]
+            for element in decomposed:
+                if isinstance(element, dict) and element.get('name'):
+                    parts.append(element['name'])
+                elif isinstance(element, compat_str):
+                    parts.append(element)
+            return ''.join(parts)
+
          def extract_artist(artist_list):
              if artist_list and isinstance(artist_list, list):
-                artists_names = [a['name'] for a in artist_list if a.get('name')]
+                artists_names = [extract_artist_name(a) for a in artist_list if a.get('name')]
                  if artists_names:
                      return ', '.join(artists_names)
  
@@ -121,10 +154,17 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
              album = albums[0]
              if isinstance(album, dict):
                  year = album.get('year')
+                disc_number = int_or_none(try_get(
+                    album, lambda x: x['trackPosition']['volume']))
+                track_number = int_or_none(try_get(
+                    album, lambda x: x['trackPosition']['index']))
                  track_info.update({
                      'album': album.get('title'),
                      'album_artist': extract_artist(album.get('artists')),
                      'release_year': int_or_none(year),
+                    'genre': album.get('genre'),
+                    'disc_number': disc_number,
+                    'track_number': track_number,
                  })
  
          track_artist = extract_artist(track.get('artists'))
@@ -152,7 +192,7 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
      IE_DESC = 'Яндекс.Музыка - Альбом'
      _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<id>\d+)/?(\?|$)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://music.yandex.ru/album/540508',
          'info_dict': {
              'id': '540508',
@@ -160,7 +200,15 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
          },
          'playlist_count': 50,
          'skip': 'Travis CI servers blocked by YandexMusic',
-    }
+    }, {
+        'url': 'https://music.yandex.ru/album/3840501',
+        'info_dict': {
+            'id': '3840501',
+            'title': 'Hooverphonic - The Best of Hooverphonic (2016)',
+        },
+        'playlist_count': 33,
+        'skip': 'Travis CI servers blocked by YandexMusic',
+    }]
  
      def _real_extract(self, url):
          album_id = self._match_id(url)
@@ -169,7 +217,7 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
              'http://music.yandex.ru/handlers/album.jsx?album=%s' % album_id,
              album_id, 'Downloading album JSON')
  
-        entries = self._build_playlist(album['volumes'][0])
+        entries = self._build_playlist([track for volume in album['volumes'] for track in volume])
  
          title = '%s - %s' % (album['artists'][0]['name'], album['title'])
          year = album.get('year')
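The two behavioural changes in the yandexmusic.py patch above (joining an artist's `decomposed` parts into one display name, and collecting tracks from every volume instead of only the first) can be tried outside the extractor. The following is a minimal sketch, not the extractor itself: the sample `artist` and `volumes` values are hypothetical and only guess at the shape of the API payload from the updated test expectations, `flatten_volumes` is an illustrative name that does not appear in the patch, and plain `str` stands in for `compat_str`.

```python
def extract_artist_name(artist):
    """Join an artist's name with its optional 'decomposed' parts."""
    decomposed = artist.get('decomposed')
    if not isinstance(decomposed, list):
        return artist['name']
    parts = [artist['name']]
    for element in decomposed:
        if isinstance(element, dict) and element.get('name'):
            parts.append(element['name'])
        elif isinstance(element, str):  # the patch checks compat_str here
            parts.append(element)
    return ''.join(parts)


def flatten_volumes(volumes):
    """Collect the tracks of every volume (disc), not only the first one."""
    return [track for volume in volumes for track in volume]


# Hypothetical sample data; the real API payload may differ.
artist = {
    'name': 'Carlo Ambrosio',
    'decomposed': [' & ', {'name': 'Fabio Di Bari'}],
}
volumes = [[{'id': '1'}, {'id': '2'}], [{'id': '3'}]]

artist_name = extract_artist_name(artist)
track_ids = [t['id'] for t in flatten_volumes(volumes)]
```

With the old `album['volumes'][0]` lookup only the first inner list would have been used, which is presumably why multi-disc albums came back incomplete.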
diff --git a/youtube_dl/extractor/yandexvideo.py b/youtube_dl/extractor/yandexvideo.py
index 1aea9538310bb4cdffdc8adc5e60be15e904dcce..46529be05b65ee896af1e50c75d34a5528b74596 100644
--- a/youtube_dl/extractor/yandexvideo.py
+++ b/youtube_dl/extractor/yandexvideo.py
@@ -3,6 +3,7 @@ from __future__ import unicode_literals
  
  from .common import InfoExtractor
  from ..utils import (
+    determine_ext,
      int_or_none,
      url_or_none,
  )
@@ -47,6 +48,10 @@ class YandexVideoIE(InfoExtractor):
          # episode, sports
          'url': 'https://yandex.ru/?stream_channel=1538487871&stream_id=4132a07f71fb0396be93d74b3477131d',
          'only_matching': True,
+    }, {
+        # DASH with DRM
+        'url': 'https://yandex.ru/portal/video?from=morda&stream_id=485a92d94518d73a9d0ff778e13505f8',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -59,13 +64,22 @@ class YandexVideoIE(InfoExtractor):
                  'disable_trackings': 1,
              })['content']
  
-        m3u8_url = url_or_none(content.get('content_url')) or url_or_none(
+        content_url = url_or_none(content.get('content_url')) or url_or_none(
              content['streams'][0]['url'])
          title = content.get('title') or content.get('computed_title')
  
-        formats = self._extract_m3u8_formats(
-            m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
-            m3u8_id='hls')
+        ext = determine_ext(content_url)
+
+        if ext == 'm3u8':
+            formats = self._extract_m3u8_formats(
+                content_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls')
+        elif ext == 'mpd':
+            formats = self._extract_mpd_formats(
+                content_url, video_id, mpd_id='dash')
+        else:
+            formats = [{'url': content_url}]
+
          self._sort_formats(formats)
  
          description = content.get('description')
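The yandexvideo.py patch above replaces an unconditional HLS extraction with a dispatch on the extension of the content URL. A rough standalone sketch of that dispatch follows. Note that `guess_ext` is a hypothetical stand-in written for this example (it is not youtube-dl's `determine_ext` and may behave differently on edge cases), and `pick_manifest_kind` only reports which branch would be taken rather than extracting formats.

```python
import os.path
from urllib.parse import urlparse


def guess_ext(url):
    """Hypothetical stand-in for determine_ext: extension of the URL path."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lstrip('.').lower()


def pick_manifest_kind(content_url):
    """Report which extraction branch the patched code would take."""
    ext = guess_ext(content_url)
    if ext == 'm3u8':
        return 'hls'     # handled by _extract_m3u8_formats in the patch
    if ext == 'mpd':
        return 'dash'    # handled by _extract_mpd_formats in the patch
    return 'direct'      # falls back to a single {'url': content_url} format


kinds = [
    pick_manifest_kind('https://example.com/v/master.m3u8?sig=1'),
    pick_manifest_kind('https://example.com/v/manifest.mpd'),
    pick_manifest_kind('https://example.com/v/video.mp4'),
]
```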
diff --git a/youtube_dl/extractor/youtube.py b/youtube_dl/extractor/youtube.py
index b570d5bae9b6128afb544e16fc9aef292e64fa01..25d056b3c21ea3f98c5c36384e86d243dfd4a913 100644
--- a/youtube_dl/extractor/youtube.py
+++ b/youtube_dl/extractor/youtube.py
@@ -27,9 +27,11 @@ from ..compat import (
      compat_str,
  )
  from ..utils import (
+    bool_or_none,
      clean_html,
      dict_get,
      error_to_compat_str,
+    extract_attributes,
      ExtractorError,
      float_or_none,
      get_element_by_attribute,
@@ -116,6 +118,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
                  'f.req': json.dumps(f_req),
                  'flowName': 'GlifWebSignIn',
                  'flowEntry': 'ServiceLogin',
+                # TODO: reverse actual botguard identifier generation algo
+                'bgRequest': '["identifier",""]',
              })
              return self._download_json(
                  url, None, note=note, errnote=errnote,
@@ -321,17 +325,18 @@ class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
          for video_id, video_title in self.extract_videos_from_page(content):
              yield self.url_result(video_id, 'Youtube', video_id, video_title)
  
-    def extract_videos_from_page(self, page):
-        ids_in_page = []
-        titles_in_page = []
-        for mobj in re.finditer(self._VIDEO_RE, page):
+    def extract_videos_from_page_impl(self, video_re, page, ids_in_page, titles_in_page):
+        for mobj in re.finditer(video_re, page):
              # The link with index 0 is not the first video of the playlist (not sure if still actual)
              if 'index' in mobj.groupdict() and mobj.group('id') == '0':
                  continue
              video_id = mobj.group('id')
-            video_title = unescapeHTML(mobj.group('title'))
+            video_title = unescapeHTML(
+                mobj.group('title')) if 'title' in mobj.groupdict() else None
              if video_title:
                  video_title = video_title.strip()
+            if video_title == '► Play all':
+                video_title = None
              try:
                  idx = ids_in_page.index(video_id)
                  if video_title and not titles_in_page[idx]:
@@ -339,6 +344,12 @@ class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
              except ValueError:
                  ids_in_page.append(video_id)
                  titles_in_page.append(video_title)
+
+    def extract_videos_from_page(self, page):
+        ids_in_page = []
+        titles_in_page = []
+        self.extract_videos_from_page_impl(
+            self._VIDEO_RE, page, ids_in_page, titles_in_page)
          return zip(ids_in_page, titles_in_page)
  
  
@@ -368,11 +379,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                              (?:www\.)?hooktube\.com/|
                              (?:www\.)?yourepeat\.com/|
                              tube\.majestyc\.net/|
+                            # Invidious instances taken from https://github.com/omarroth/invidious/wiki/Invidious-Instances
                              (?:(?:www|dev)\.)?invidio\.us/|
-                            (?:www\.)?invidiou\.sh/|
-                            (?:www\.)?invidious\.snopyta\.org/|
+                            (?:(?:www|no)\.)?invidiou\.sh/|
+                            (?:(?:www|fi|de)\.)?invidious\.snopyta\.org/|
                              (?:www\.)?invidious\.kabi\.tk/|
+                            (?:www\.)?invidious\.enkirton\.net/|
+                            (?:www\.)?invidious\.13ad\.de/|
+                            (?:www\.)?invidious\.mastodon\.host/|
+                            (?:www\.)?invidious\.nixnet\.xyz/|
+                            (?:www\.)?tube\.poal\.co/|
                              (?:www\.)?vid\.wxzm\.sx/|
+                            (?:www\.)?yt\.elukerio\.org/|
                              youtube\.googleapis\.com/)                        # the various hostnames, with wildcard subdomains
                           (?:.*?\#/)?                                          # handle anchor (#/) redirect urls
                           (?:                                                  # the various things that can precede the ID:
@@ -1587,17 +1605,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          video_id = mobj.group(2)
          return video_id
  
-    def _extract_annotations(self, video_id):
-        return self._download_webpage(
-            'https://www.youtube.com/annotations_invideo', video_id,
-            note='Downloading annotations',
-            errnote='Unable to download video annotations', fatal=False,
-            query={
-                'features': 1,
-                'legacy': 1,
-                'video_id': video_id,
-            })
-
      @staticmethod
      def _extract_chapters(description, duration):
          if not description:
@@ -1692,6 +1699,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          def extract_token(v_info):
              return dict_get(v_info, ('account_playback_token', 'accountPlaybackToken', 'token'))
  
+        def extract_player_response(player_response, video_id):
+            pl_response = str_or_none(player_response)
+            if not pl_response:
+                return
+            pl_response = self._parse_json(pl_response, video_id, fatal=False)
+            if isinstance(pl_response, dict):
+                add_dash_mpd_pr(pl_response)
+                return pl_response
+
          player_response = {}
  
          # Get video info
@@ -1714,7 +1730,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  note='Refetching age-gated info webpage',
                  errnote='unable to download video info webpage')
              video_info = compat_parse_qs(video_info_webpage)
+            pl_response = video_info.get('player_response', [None])[0]
+            player_response = extract_player_response(pl_response, video_id)
              add_dash_mpd(video_info)
+            view_count = extract_view_count(video_info)
          else:
              age_gate = False
              video_info = None
@@ -1737,11 +1756,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      is_live = True
                  sts = ytplayer_config.get('sts')
                  if not player_response:
-                    pl_response = str_or_none(args.get('player_response'))
-                    if pl_response:
-                        pl_response = self._parse_json(pl_response, video_id, fatal=False)
-                        if isinstance(pl_response, dict):
-                            player_response = pl_response
+                    player_response = extract_player_response(args.get('player_response'), video_id)
              if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
                  add_dash_mpd_pr(player_response)
                  # We also try looking in get_video_info since it may contain different dashmpd
@@ -1773,9 +1788,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      get_video_info = compat_parse_qs(video_info_webpage)
                      if not player_response:
                          pl_response = get_video_info.get('player_response', [None])[0]
-                        if isinstance(pl_response, dict):
-                            player_response = pl_response
-                            add_dash_mpd_pr(player_response)
+                        player_response = extract_player_response(pl_response, video_id)
                      add_dash_mpd(get_video_info)
                      if view_count is None:
                          view_count = extract_view_count(get_video_info)
@@ -1798,9 +1811,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                          break
  
          def extract_unavailable_message():
-            return self._html_search_regex(
-                r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
-                video_webpage, 'unavailable message', default=None)
+            messages = []
+            for tag, kind in (('h1', 'message'), ('div', 'submessage')):
+                msg = self._html_search_regex(
+                    r'(?s)<{tag}[^>]+id=["\']unavailable-{kind}["\'][^>]*>(.+?)</{tag}>'.format(tag=tag, kind=kind),
+                    video_webpage, 'unavailable %s' % kind, default=None)
+                if msg:
+                    messages.append(msg)
+            if messages:
+                return '\n'.join(messages)
  
          if not video_info:
              unavailable_message = extract_unavailable_message()
@@ -1812,16 +1831,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          video_details = try_get(
              player_response, lambda x: x['videoDetails'], dict) or {}
  
-        # title
-        if 'title' in video_info:
-            video_title = video_info['title'][0]
-        elif 'title' in player_response:
-            video_title = video_details['title']
-        else:
+        video_title = video_info.get('title', [None])[0] or video_details.get('title')
+        if not video_title:
              self._downloader.report_warning('Unable to extract video title')
              video_title = '_'
  
-        # description
          description_original = video_description = get_element_by_id("eow-description", video_webpage)
          if video_description:
  
@@ -1846,11 +1860,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              ''', replace_url, video_description)
              video_description = clean_html(video_description)
          else:
-            fd_mobj = re.search(r'<meta name="description" content="([^"]+)"', video_webpage)
-            if fd_mobj:
-                video_description = unescapeHTML(fd_mobj.group(1))
-            else:
-                video_description = ''
+            video_description = self._html_search_meta('description', video_webpage) or video_details.get('shortDescription')
  
          if not smuggled_data.get('force_singlefeed', False):
              if not self._downloader.params.get('noplaylist'):
@@ -1888,6 +1898,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          if view_count is None and video_details:
              view_count = int_or_none(video_details.get('viewCount'))
  
+        if is_live is None:
+            is_live = bool_or_none(video_details.get('isLive'))
+
          # Check for "rental" videos
          if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:
              raise ExtractorError('"rental" videos not supported. See https://github.com/ytdl-org/youtube-dl/issues/359 for more information.', expected=True)
@@ -2090,9 +2103,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
                      formats.append(a_format)
              else:
-                error_message = clean_html(video_info.get('reason', [None])[0])
+                error_message = extract_unavailable_message()
+                if not error_message:
+                    error_message = clean_html(try_get(
+                        player_response, lambda x: x['playabilityStatus']['reason'],
+                        compat_str))
                  if not error_message:
-                    error_message = extract_unavailable_message()
+                    error_message = clean_html(
+                        try_get(video_info, lambda x: x['reason'][0], compat_str))
                  if error_message:
                      raise ExtractorError(error_message, expected=True)
                  raise ExtractorError('no conn, hlsvp, hlsManifestUrl or url_encoded_fmt_stream_map information found in video info')
@@ -2263,7 +2281,21 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          # annotations
          video_annotations = None
          if self._downloader.params.get('writeannotations', False):
-            video_annotations = self._extract_annotations(video_id)
+            xsrf_token = self._search_regex(
+                r'([\'"])XSRF_TOKEN\1\s*:\s*([\'"])(?P<xsrf_token>[A-Za-z0-9+/=]+)\2',
+                video_webpage, 'xsrf token', group='xsrf_token', fatal=False)
+            invideo_url = try_get(
+                player_response, lambda x: x['annotations'][0]['playerAnnotationsUrlsRenderer']['invideoUrl'], compat_str)
+            if xsrf_token and invideo_url:
+                xsrf_field_name = self._search_regex(
+                    r'([\'"])XSRF_FIELD_NAME\1\s*:\s*([\'"])(?P<xsrf_field_name>\w+)\2',
+                    video_webpage, 'xsrf field name',
+                    group='xsrf_field_name', default='session_token')
+                video_annotations = self._download_webpage(
+                    self._proto_relative_url(invideo_url),
+                    video_id, note='Downloading annotations',
+                    errnote='Unable to download video annotations', fatal=False,
+                    data=urlencode_postdata({xsrf_field_name: xsrf_token}))
  
          chapters = self._extract_chapters(description_original, video_duration)
  
@@ -2421,7 +2453,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
                          (%(playlist_id)s)
                       )""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
      _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s'
-    _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?'
+    _VIDEO_RE_TPL = r'href="\s*/watch\?v=%s(?:&amp;(?:[^"]*?index=(?P<index>\d+))?(?:[^>]+>(?P<title>[^<]+))?)?'
+    _VIDEO_RE = _VIDEO_RE_TPL % r'(?P<id>[0-9A-Za-z_-]{11})'
      IE_NAME = 'youtube:playlist'
      _TESTS = [{
          'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
@@ -2444,6 +2477,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          'info_dict': {
              'title': '29C3: Not my department',
              'id': 'PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
+            'uploader': 'Christiaan008',
+            'uploader_id': 'ChRiStIaAn008',
          },
          'playlist_count': 95,
      }, {
@@ -2452,6 +2487,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          'info_dict': {
              'title': '[OLD]Team Fortress 2 (Class-based LP)',
              'id': 'PLBB231211A4F62143',
+            'uploader': 'Wickydoo',
+            'uploader_id': 'Wickydoo',
          },
          'playlist_mincount': 26,
      }, {
@@ -2460,6 +2497,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          'info_dict': {
              'title': 'Uploads from Cauchemar',
              'id': 'UUBABnxM4Ar9ten8Mdjj1j0Q',
+            'uploader': 'Cauchemar',
+            'uploader_id': 'Cauchemar89',
          },
          'playlist_mincount': 799,
      }, {
@@ -2477,13 +2516,17 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          'info_dict': {
              'title': 'JODA15',
              'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
+            'uploader': 'milan',
+            'uploader_id': 'UCEI1-PVPcYXjB73Hfelbmaw',
          }
      }, {
          'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
          'playlist_mincount': 485,
          'info_dict': {
-            'title': '2017 華語最新單曲 (2/24更新)',
+            'title': '2018 Chinese New Singles (11/6 updated)',
              'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
+            'uploader': 'LBK',
+            'uploader_id': 'sdragonfang',
          }
      }, {
          'note': 'Embedded SWF player',
@@ -2492,13 +2535,16 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          'info_dict': {
              'title': 'JODA7',
              'id': 'YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ',
-        }
+        },
+        'skip': 'This playlist does not exist',
      }, {
          'note': 'Buggy playlist: the webpage has a "Load more" button but it doesn\'t have more videos',
          'url': 'https://www.youtube.com/playlist?list=UUXw-G3eDE9trcvY2sBMM_aA',
          'info_dict': {
              'title': 'Uploads from Interstellar Movie',
              'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
+            'uploader': 'Interstellar Movie',
+            'uploader_id': 'InterstellarMovie1',
          },
          'playlist_mincount': 21,
      }, {
@@ -2523,6 +2569,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          'params': {
              'skip_download': True,
          },
+        'skip': 'This video is not available.',
          'add_ie': [YoutubeIE.ie_key()],
      }, {
          'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
@@ -2534,7 +2581,6 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'uploader_id': 'backuspagemuseum',
              'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
              'upload_date': '20161008',
-            'license': 'Standard YouTube License',
              'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
              'categories': ['Nonprofits & Activism'],
              'tags': list,
@@ -2545,6 +2591,16 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'noplaylist': True,
              'skip_download': True,
          },
+    }, {
+        # https://github.com/ytdl-org/youtube-dl/issues/21844
+        'url': 'https://www.youtube.com/playlist?list=PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
+        'info_dict': {
+            'title': 'Data Analysis with Dr Mike Pound',
+            'id': 'PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
+            'uploader_id': 'Computerphile',
+            'uploader': 'Computerphile',
+        },
+        'playlist_mincount': 11,
      }, {
          'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
          'only_matching': True,
@@ -2563,6 +2619,34 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
      def _real_initialize(self):
          self._login()
  
+    def extract_videos_from_page(self, page):
+        ids_in_page = []
+        titles_in_page = []
+
+        for item in re.findall(
+                r'(<[^>]*\bdata-video-id\s*=\s*["\'][0-9A-Za-z_-]{11}[^>]+>)', page):
+            attrs = extract_attributes(item)
+            video_id = attrs['data-video-id']
+            video_title = unescapeHTML(attrs.get('data-title'))
+            if video_title:
+                video_title = video_title.strip()
+            ids_in_page.append(video_id)
+            titles_in_page.append(video_title)
+
+        # Fallback with old _VIDEO_RE
+        self.extract_videos_from_page_impl(
+            self._VIDEO_RE, page, ids_in_page, titles_in_page)
+
+        # Relaxed fallbacks
+        self.extract_videos_from_page_impl(
+            r'href="\s*/watch\?v\s*=\s*(?P<id>[0-9A-Za-z_-]{11})', page,
+            ids_in_page, titles_in_page)
+        self.extract_videos_from_page_impl(
+            r'data-video-ids\s*=\s*["\'](?P<id>[0-9A-Za-z_-]{11})', page,
+            ids_in_page, titles_in_page)
+
+        return zip(ids_in_page, titles_in_page)
+
      def _extract_mix(self, playlist_id):
          # The mixes are generated from a single video
          # the id of the playlist is just 'RD' + video_id
@@ -2711,6 +2795,8 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
          'info_dict': {
              'id': 'UUKfVa3S1e4PHvxWcwyMMg8w',
              'title': 'Uploads from lex will',
+            'uploader': 'lex will',
+            'uploader_id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
          }
      }, {
          'note': 'Age restricted channel',
@@ -2720,6 +2806,8 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
          'info_dict': {
              'id': 'UUs0ifCMCm1icqRbqhUINa0w',
              'title': 'Uploads from Deus Ex',
+            'uploader': 'Deus Ex',
+            'uploader_id': 'DeusExOfficial',
          },
      }, {
          'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA',
@@ -2804,6 +2892,8 @@ class YoutubeUserIE(YoutubeChannelIE):
          'info_dict': {
              'id': 'UUfX55Sx5hEFjoC3cNs6mCUQ',
              'title': 'Uploads from The Linux Foundation',
+            'uploader': 'The Linux Foundation',
+            'uploader_id': 'TheLinuxFoundation',
          }
      }, {
          # Only available via https://www.youtube.com/c/12minuteathlete/videos
@@ -2813,6 +2903,8 @@ class YoutubeUserIE(YoutubeChannelIE):
          'info_dict': {
              'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
              'title': 'Uploads from 12 Minute Athlete',
+            'uploader': '12 Minute Athlete',
+            'uploader_id': 'the12minuteathlete',
          }
      }, {
          'url': 'ytuser:phihag',
@@ -2906,7 +2998,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
          'playlist_mincount': 4,
          'info_dict': {
              'id': 'ThirstForScience',
-            'title': 'Thirst for Science',
+            'title': 'ThirstForScience',
          },
      }, {
          # with "Load more" button
@@ -2923,6 +3015,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
              'id': 'UCiU1dHvZObB2iP6xkJ__Icw',
              'title': 'Chem Player',
          },
+        'skip': 'Blocked',
      }]
  
  
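The relaxed `_VIDEO_RE_TPL` introduced in the youtube.py patch above makes both the `index` and `title` groups optional, so a bare `/watch?v=...` link still yields an ID. The sketch below exercises the same pattern with the standard `re` module. The `page` snippet and the `collect` helper are made up for illustration, and `collect` is simplified: unlike `extract_videos_from_page_impl` it does not unescape HTML or back-fill a missing title from a later duplicate.

```python
import re

# Pattern copied from the patch; the %s slot receives the ID group.
VIDEO_RE_TPL = r'href="\s*/watch\?v=%s(?:&amp;(?:[^"]*?index=(?P<index>\d+))?(?:[^>]+>(?P<title>[^<]+))?)?'
VIDEO_RE = VIDEO_RE_TPL % r'(?P<id>[0-9A-Za-z_-]{11})'

# Hypothetical page fragment: one full playlist link, one bare link.
page = (
    '<a href="/watch?v=abcdefghijk&amp;list=PL1&amp;index=3" class="x"> First </a>'
    '<a href="/watch?v=ABCDEFGHIJK">'
)


def collect(video_re, page):
    """Simplified collector: first occurrence of each ID wins."""
    ids, titles = [], []
    for mobj in re.finditer(video_re, page):
        video_id = mobj.group('id')
        title = mobj.group('title') if 'title' in mobj.groupdict() else None
        if title:
            title = title.strip()
        if video_id not in ids:
            ids.append(video_id)
            titles.append(title)
    return list(zip(ids, titles))


videos = collect(VIDEO_RE, page)
```

The second link carries no `&amp;` suffix at all, so the outer optional group is skipped and both `index` and `title` come back as `None`.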
diff --git a/youtube_dl/version.py b/youtube_dl/version.py
index 78fe543263b1dce11cf2a084ba7d903516e7b232..98fa3228606cde087ee04e696711958571711d23 100644
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
  from __future__ import unicode_literals
  
-__version__ = '2019.07.02'
+__version__ = '2019.09.01'
ChangeLog
README.md
README.txt
docs/supportedsites.md
youtube-dl
youtube-dl.1
youtube_dl/YoutubeDL.py
youtube_dl/__init__.py
youtube_dl/downloader/dash.py
youtube_dl/downloader/external.py
youtube_dl/downloader/fragment.py
youtube_dl/downloader/ism.py
youtube_dl/extractor/abcnews.py
youtube_dl/extractor/adobepass.py
youtube_dl/extractor/arte.py
youtube_dl/extractor/asiancrush.py
youtube_dl/extractor/bbc.py
youtube_dl/extractor/beampro.py
youtube_dl/extractor/beeg.py
youtube_dl/extractor/biobiochiletv.py
youtube_dl/extractor/bleacherreport.py
youtube_dl/extractor/common.py
youtube_dl/extractor/ctsnews.py
youtube_dl/extractor/dailymotion.py
youtube_dl/extractor/dbtv.py
youtube_dl/extractor/discovery.py
youtube_dl/extractor/dlive.py [new file with mode: 0644]
youtube_dl/extractor/einthusan.py
youtube_dl/extractor/espn.py
youtube_dl/extractor/extractors.py
youtube_dl/extractor/facebook.py
youtube_dl/extractor/fivetv.py
youtube_dl/extractor/funk.py
youtube_dl/extractor/funnyordie.py [deleted file]
youtube_dl/extractor/gameinformer.py
youtube_dl/extractor/generic.py
youtube_dl/extractor/gfycat.py
youtube_dl/extractor/go.py
youtube_dl/extractor/kaltura.py
youtube_dl/extractor/lecturio.py
youtube_dl/extractor/leeco.py
youtube_dl/extractor/livejournal.py [new file with mode: 0644]
youtube_dl/extractor/lynda.py
youtube_dl/extractor/mgtv.py
youtube_dl/extractor/openload.py
youtube_dl/extractor/packtpub.py
youtube_dl/extractor/peertube.py
youtube_dl/extractor/philharmoniedeparis.py
youtube_dl/extractor/piksel.py
youtube_dl/extractor/porn91.py
youtube_dl/extractor/roosterteeth.py
youtube_dl/extractor/rtlnl.py
youtube_dl/extractor/rudo.py [deleted file]
youtube_dl/extractor/safari.py
youtube_dl/extractor/soundcloud.py
youtube_dl/extractor/spankbang.py
youtube_dl/extractor/spike.py
youtube_dl/extractor/ted.py
youtube_dl/extractor/tvigle.py
youtube_dl/extractor/tvland.py
youtube_dl/extractor/tvn24.py
youtube_dl/extractor/twitch.py
youtube_dl/extractor/twitter.py
youtube_dl/extractor/usanetwork.py
youtube_dl/extractor/vimeo.py
youtube_dl/extractor/voxmedia.py
youtube_dl/extractor/vrv.py
youtube_dl/extractor/vzaar.py
youtube_dl/extractor/xhamster.py
youtube_dl/extractor/yahoo.py
youtube_dl/extractor/yandexmusic.py
youtube_dl/extractor/yandexvideo.py
youtube_dl/extractor/youtube.py
youtube_dl/version.py