New upstream version 2018.01.27

author Rogério Brito <rbrito@ime.usp.br>

Wed, 31 Jan 2018 04:22:35 +0000 (02:22 -0200)

committer Rogério Brito <rbrito@ime.usp.br>

Wed, 31 Jan 2018 04:22:35 +0000 (02:22 -0200)
diff --git a/AUTHORS b/AUTHORS

index 7e012247c511adddc3d91549602d14cc49b01afe..40215a5cf83f52e0e1252b9bba5fe80f9025c076 100644 (file)
--- a/AUTHORS
+++ b/AUTHORS
@@ -231,3 +231,5 @@ John Dong
  Tatsuyuki Ishi
  Daniel Weber
  Kay Bouché
+Yang Hongbo
+Lei Wang
diff --git a/ChangeLog b/ChangeLog

index bfffb1f5f410861037389853167563a43b41df1b..00c5c9c6be8f39a759ad1d0b2437b7139709114b 100644 (file)
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,92 @@
+version 2018.01.27
+
+Core
+* [extractor/common] Improve _json_ld for articles
+* Switch codebase to use compat_b64decode
++ [compat] Add compat_b64decode
+
+Extractors
++ [seznamzpravy] Add support for seznam.cz and seznamzpravy.cz (#14102, #14616)
+* [dplay] Bypass geo restriction
++ [dplay] Add support for disco-api videos (#15396)
+* [youtube] Extract precise error messages (#15284)
+* [teachertube] Capture and output error message
+* [teachertube] Fix and relax thumbnail extraction (#15403)
++ [prosiebensat1] Add another clip id regular expression (#15378)
+* [tbs] Update tokenizer url (#15395)
+* [mixcloud] Use compat_b64decode (#15394)
+- [thesixtyone] Remove extractor (#15341)
+
+
+version 2018.01.21
+
+Core
+* [extractor/common] Improve jwplayer DASH formats extraction (#9242, #15187)
+* [utils] Improve scientific notation handling in js_to_json (#14789)
+
+Extractors
++ [southparkdk] Add support for southparkstudios.nu
++ [southpark] Add support for collections (#14803)
+* [franceinter] Fix upload date extraction (#14996)
++ [rtvs] Add support for rtvs.sk (#9242, #15187)
+* [restudy] Fix extraction and extend URL regular expression (#15347)
+* [youtube:live] Improve live detection (#15365)
++ [springboardplatform] Add support for springboardplatform.com
+* [prosiebensat1] Add another clip id regular expression (#15290)
+- [ringtv] Remove extractor (#15345)
+
+
+version 2018.01.18
+
+Extractors
+* [soundcloud] Update client id (#15306)
+- [kamcord] Remove extractor (#15322)
++ [spiegel] Add support for nexx videos (#15285)
+* [twitch] Fix authentication and error capture (#14090, #15264)
+* [vk] Detect more errors due to copyright complaints (#15259)
+
+
+version 2018.01.14
+
+Extractors
+* [youtube] Fix live streams extraction (#15202)
+* [wdr] Bypass geo restriction
+* [wdr] Rework extractors (#14598)
++ [wdr] Add support for wdrmaus.de/elefantenseite (#14598)
++ [gamestar] Add support for gamepro.de (#3384)
+* [viafree] Skip rtmp formats (#15232)
++ [pandoratv] Add support for mobile URLs (#12441)
++ [pandoratv] Add support for new URL format (#15131)
++ [ximalaya] Add support for ximalaya.com (#14687)
++ [digg] Add support for digg.com (#15214)
+* [limelight] Tolerate empty pc formats (#15150, #15151, #15207)
+* [ndr:embed:base] Make separate formats extraction non fatal (#15203)
++ [weibo] Add extractor (#15079)
++ [ok] Add support for live streams
+* [canalplus] Fix extraction (#15072)
+* [bilibili] Fix extraction (#15188)
+
+
+version 2018.01.07
+
+Core
+* [utils] Fix youtube-dl under PyPy3 on Windows
+* [YoutubeDL] Output python implementation in debug header
+
+Extractors
++ [jwplatform] Add support for multiple embeds (#15192)
+* [mitele] Fix extraction (#15186)
++ [motherless] Add support for groups (#15124)
+* [lynda] Relax URL regular expression (#15185)
+* [soundcloud] Fallback to avatar picture for thumbnail (#12878)
+* [youku] Fix list extraction (#15135)
+* [openload] Fix extraction (#15166)
+* [lynda] Skip invalid subtitles (#15159)
+* [twitch] Pass video id to url_result when extracting playlist (#15139)
+* [rtve.es:alacarta] Fix extraction of some new URLs
+* [acast] Fix extraction (#15147)
+
+
  version 2017.12.31
  
  Core
diff --git a/README.md b/README.md

index 47b0640abfd179a13909835e2e05af589bb60e6d..eb05f848f73327eff3cde5f99c5715f23ef70b4b 100644 (file)
--- a/README.md
+++ b/README.md
@@ -46,7 +46,7 @@ Or with [MacPorts](https://www.macports.org/):
  Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
  
  # DESCRIPTION
-**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on Mac OS X. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
+**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
  
      youtube-dl [OPTIONS] URL [URL...]
  
@@ -863,7 +863,7 @@ Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
  
  In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
  
-Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, Mac OS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
+Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
  
  Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
  
diff --git a/README.txt b/README.txt

index 0a748ea7bab0075b8abd3984fc48685ed8d7d253..54b6137fbc9587ac2792fa7105d711dcbe1a874b 100644 (file)
--- a/README.txt
+++ b/README.txt
@@ -61,9 +61,9 @@ DESCRIPTION
  YOUTUBE-DL is a command-line program to download videos from YouTube.com
  and a few more sites. It requires the Python interpreter, version 2.6,
  2.7, or 3.2+, and it is not platform specific. It should work on your
-Unix box, on Windows or on Mac OS X. It is released to the public
-domain, which means you can modify it, redistribute it or use it however
-you like.
+Unix box, on Windows or on macOS. It is released to the public domain,
+which means you can modify it, redistribute it or use it however you
+like.
  
      youtube-dl [OPTIONS] URL [URL...]
  
@@ -1145,8 +1145,8 @@ first line of the cookies file must be either # HTTP Cookie File or
  # Netscape HTTP Cookie File. Make sure you have correct newline format
  in the cookies file and convert newlines if necessary to correspond with
  your OS, namely CRLF (\r\n) for Windows and LF (\n) for Unix and
-Unix-like systems (Linux, Mac OS, etc.). HTTP Error 400: Bad Request
-when using --cookies is a good sign of invalid newline format.
+Unix-like systems (Linux, macOS, etc.). HTTP Error 400: Bad Request when
+using --cookies is a good sign of invalid newline format.
  
  Passing cookies to youtube-dl is a good way to workaround login when a
  particular extractor does not implement it explicitly. Another use case
diff --git a/devscripts/install_jython.sh b/devscripts/install_jython.sh

new file mode 100755 (executable)

index 0000000..bafca4d
--- /dev/null
+++ b/devscripts/install_jython.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+wget http://central.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar
+java -jar jython-installer-2.7.1.jar -s -d "$HOME/jython"
+$HOME/jython/bin/jython -m pip install nose
diff --git a/docs/supportedsites.md b/docs/supportedsites.md

index 75bd5c922ac59e873d5ca998797b4ad3eadef8df..c15b5eec5b57557f853a4c1188faf1aa3cea31ca 100644 (file)
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -128,7 +128,7 @@
   - **CamdemyFolder**
   - **CamWithHer**
   - **canalc2.tv**
- - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
+ - **Canalplus**: mycanal.fr and piwiplus.fr
   - **Canvas**
   - **CanvasEen**: canvas.be and een.be
   - **CarambaTV**
@@ -210,6 +210,7 @@
   - **defense.gouv.fr**
   - **democracynow**
   - **DHM**: Filmarchiv - Deutsches Historisches Museum
+ - **Digg**
   - **DigitallySpeaking**
   - **Digiteka**
   - **Discovery**
@@ -382,7 +383,6 @@
   - **JWPlatform**
   - **Kakao**
   - **Kaltura**
- - **Kamcord**
   - **KanalPlay**: Kanal 5/9/11 Play
   - **Kankan**
   - **Karaoketv**
@@ -478,6 +478,7 @@
   - **Moniker**: allmyvideos.net and vidspot.net
   - **Morningstar**: morningstar.com
   - **Motherless**
+ - **MotherlessGroup**
   - **Motorsport**: motorsport.com
   - **MovieClips**
   - **MovieFap**
@@ -681,7 +682,6 @@
   - **revision**
   - **revision3:embed**
   - **RICE**
- - **RingTV**
   - **RMCDecouverte**
   - **RockstarGames**
   - **RoosterTeeth**
@@ -702,6 +702,7 @@
   - **rtve.es:live**: RTVE.es live streams
   - **rtve.es:television**
   - **RTVNH**
+ - **RTVS**
   - **Rudo**
   - **RUHD**
   - **RulePorn**
@@ -731,6 +732,8 @@
   - **ServingSys**
   - **Servus**
   - **Sexu**
+ - **SeznamZpravy**
+ - **SeznamZpravyArticle**
   - **Shahid**
   - **ShahidShow**
   - **Shared**: shared.sx
@@ -772,7 +775,7 @@
   - **Sport5**
   - **SportBoxEmbed**
   - **SportDeutschland**
- - **Sportschau**
+ - **SpringboardPlatform**
   - **Sprout**
   - **sr:mediathek**: Saarländischer Rundfunk
   - **SRGSSR**
@@ -821,7 +824,6 @@
   - **ThePlatform**
   - **ThePlatformFeed**
   - **TheScene**
- - **TheSixtyOne**
   - **TheStar**
   - **TheSun**
   - **TheWeatherChannel**
@@ -1001,10 +1003,14 @@
   - **WatchIndianPorn**: Watch Indian Porn
   - **WDR**
   - **wdr:mobile**
+ - **WDRElefant**
+ - **WDRPage**
   - **Webcaster**
   - **WebcasterFeed**
   - **WebOfStories**
   - **WebOfStoriesPlaylist**
+ - **Weibo**
+ - **WeiboMobile**
   - **WeiqiTV**: WQTV
   - **wholecloud**: WholeCloud
   - **Wimp**
@@ -1024,6 +1030,8 @@
   - **xiami:artist**: 虾米音乐 - 歌手
   - **xiami:collection**: 虾米音乐 - 精选集
   - **xiami:song**: 虾米音乐
+ - **ximalaya**: 喜马拉雅FM
+ - **ximalaya:album**: 喜马拉雅FM 专辑
   - **XMinus**
   - **XNXX**
   - **Xstream**
diff --git a/test/test_download.py b/test/test_download.py

index 209f5f6d673c00c77e040430fda4b5d9b87f9c13..ebe820dfc1990e4df6758795345375375402900b 100644 (file)
--- a/test/test_download.py
+++ b/test/test_download.py
@@ -92,8 +92,8 @@ class TestDownload(unittest.TestCase):
  def generator(test_case, tname):
  
      def test_template(self):
-        ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
-        other_ies = [get_info_extractor(ie_key) for ie_key in test_case.get('add_ie', [])]
+        ie = youtube_dl.extractor.get_info_extractor(test_case['name'])()
+        other_ies = [get_info_extractor(ie_key)() for ie_key in test_case.get('add_ie', [])]
          is_playlist = any(k.startswith('playlist') for k in test_case)
          test_cases = test_case.get(
              'playlist', [] if is_playlist else [test_case])
diff --git a/test/test_utils.py b/test/test_utils.py

index 0857c0fc0cef84b7e09a09ef77dc815d4a508cef..fdf6031f7db157b4989aafc345380ce184f91458 100644 (file)
--- a/test/test_utils.py
+++ b/test/test_utils.py
@@ -814,6 +814,9 @@ class TestUtil(unittest.TestCase):
          inp = '''{"duration": "00:01:07"}'''
          self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
  
+        inp = '''{segments: [{"offset":-3.885780586188048e-16,"duration":39.75000000000001}]}'''
+        self.assertEqual(js_to_json(inp), '''{"segments": [{"offset":-3.885780586188048e-16,"duration":39.75000000000001}]}''')
+
      def test_js_to_json_edgecases(self):
          on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
          self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@@ -885,6 +888,13 @@ class TestUtil(unittest.TestCase):
          on = js_to_json('{/*comment\n*/42/*comment\n*/:/*comment\n*/42/*comment\n*/}')
          self.assertEqual(json.loads(on), {'42': 42})
  
+        on = js_to_json('{42:4.2e1}')
+        self.assertEqual(json.loads(on), {'42': 42.0})
+
+    def test_js_to_json_malformed(self):
+        self.assertEqual(js_to_json('42a1'), '42"a1"')
+        self.assertEqual(js_to_json('42a-1'), '42"a"-1')
+
      def test_extract_attributes(self):
          self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
          self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
diff --git a/youtube-dl b/youtube-dl

index d00c30eba1c9fbb46745744ed936827f4a71d556..fb4e30dda41667da0ab86275791106f0e5717519 100755 (executable)

Binary files a/youtube-dl and b/youtube-dl differ
diff --git a/youtube-dl.1 b/youtube-dl.1

index 410fce138b3daa4ec54743527a7d1518a84840f3..3800a96a6515698da8f9d019bb7dbd64615abb09 100644 (file)
--- a/youtube-dl.1
+++ b/youtube-dl.1
@@ -11,7 +11,7 @@ youtube\-dl \- download videos from youtube.com or other video platforms
  YouTube.com and a few more sites.
  It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is
  not platform specific.
-It should work on your Unix box, on Windows or on Mac OS X.
+It should work on your Unix box, on Windows or on macOS.
  It is released to the public domain, which means you can modify it,
  redistribute it or use it however you like.
  .SH OPTIONS
@@ -1746,7 +1746,7 @@ Make sure you have correct newline
  format (https://en.wikipedia.org/wiki/Newline) in the cookies file and
  convert newlines if necessary to correspond with your OS, namely
  \f[C]CRLF\f[] (\f[C]\\r\\n\f[]) for Windows and \f[C]LF\f[]
-(\f[C]\\n\f[]) for Unix and Unix\-like systems (Linux, Mac OS, etc.).
+(\f[C]\\n\f[]) for Unix and Unix\-like systems (Linux, macOS, etc.).
  \f[C]HTTP\ Error\ 400:\ Bad\ Request\f[] when using \f[C]\-\-cookies\f[]
  is a good sign of invalid newline format.
  .PP
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py

index ace80f14b8bdac8dca7c3d87e08179ead17c5c02..97bd9c526dc60d71d569b625e2cebd7c6af46bd0 100755 (executable)
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -2233,8 +2233,16 @@ class YoutubeDL(object):
                  sys.exc_clear()
              except Exception:
                  pass
-        self._write_string('[debug] Python version %s - %s\n' % (
-            platform.python_version(), platform_name()))
+
+        def python_implementation():
+            impl_name = platform.python_implementation()
+            if impl_name == 'PyPy' and hasattr(sys, 'pypy_version_info'):
+                return impl_name + ' version %d.%d.%d' % sys.pypy_version_info[:3]
+            return impl_name
+
+        self._write_string('[debug] Python version %s (%s) - %s\n' % (
+            platform.python_version(), python_implementation(),
+            platform_name()))
  
          exe_versions = FFmpegPostProcessor.get_versions(self)
          exe_versions['rtmpdump'] = rtmpdump_version()
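The hunk above extends the debug header with the interpreter implementation. A minimal standalone sketch of the same idea follows. It assumes `platform.platform()` as a stand-in for youtube-dl's own `platform_name()` helper, and `debug_header` is a hypothetical name used only for illustration.

```python
import platform
import sys


def python_implementation():
    """Return the interpreter name, plus the PyPy release when available."""
    impl_name = platform.python_implementation()
    # sys.pypy_version_info only exists on PyPy, hence the hasattr guard.
    if impl_name == 'PyPy' and hasattr(sys, 'pypy_version_info'):
        return impl_name + ' version %d.%d.%d' % sys.pypy_version_info[:3]
    return impl_name


def debug_header():
    # Hypothetical helper: platform.platform() replaces platform_name() here.
    return '[debug] Python version %s (%s) - %s' % (
        platform.python_version(), python_implementation(),
        platform.platform())


print(debug_header())
```

The exact line printed depends on the interpreter and operating system in use.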
diff --git a/youtube_dl/aes.py b/youtube_dl/aes.py

index c5bb3c4ef1561847a1025a0b35095a2224582efe..461bb6d413a91bde8f408667d838088c5f8e11be 100644 (file)
--- a/youtube_dl/aes.py
+++ b/youtube_dl/aes.py
@@ -1,8 +1,8 @@
  from __future__ import unicode_literals
  
-import base64
  from math import ceil
  
+from .compat import compat_b64decode
  from .utils import bytes_to_intlist, intlist_to_bytes
  
  BLOCK_SIZE_BYTES = 16
@@ -180,7 +180,7 @@ def aes_decrypt_text(data, password, key_size_bytes):
      """
      NONCE_LENGTH_BYTES = 8
  
-    data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
+    data = bytes_to_intlist(compat_b64decode(data))
      password = bytes_to_intlist(password.encode('utf-8'))
  
      key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password))
diff --git a/youtube_dl/compat.py b/youtube_dl/compat.py

index 2a62248ef7bc6b9da459fea1e7a7e77a65e9ed95..646c9d79ccc8826a624c805853d5c02faf63f59d 100644 (file)
--- a/youtube_dl/compat.py
+++ b/youtube_dl/compat.py
@@ -1,14 +1,17 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import base64
  import binascii
  import collections
+import ctypes
  import email
  import getpass
  import io
  import itertools
  import optparse
  import os
+import platform
  import re
  import shlex
  import shutil
@@ -2906,14 +2909,44 @@ except ImportError:  # not 2.6+ or is 3.x
      except ImportError:
          compat_zip = zip
  
+
+if sys.version_info < (3, 3):
+    def compat_b64decode(s, *args, **kwargs):
+        if isinstance(s, compat_str):
+            s = s.encode('ascii')
+        return base64.b64decode(s, *args, **kwargs)
+else:
+    compat_b64decode = base64.b64decode
+
+
+if platform.python_implementation() == 'PyPy' and sys.pypy_version_info < (5, 4, 0):
+    # PyPy2 prior to version 5.4.0 expects byte strings as Windows function
+    # names, see the original PyPy issue [1] and the youtube-dl one [2].
+    # 1. https://bitbucket.org/pypy/pypy/issues/2360/windows-ctypescdll-typeerror-function-name
+    # 2. https://github.com/rg3/youtube-dl/pull/4392
+    def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
+        real = ctypes.WINFUNCTYPE(*args, **kwargs)
+
+        def resf(tpl, *args, **kwargs):
+            funcname, dll = tpl
+            return real((str(funcname), dll), *args, **kwargs)
+
+        return resf
+else:
+    def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
+        return ctypes.WINFUNCTYPE(*args, **kwargs)
+
+
  __all__ = [
      'compat_HTMLParseError',
      'compat_HTMLParser',
      'compat_HTTPError',
+    'compat_b64decode',
      'compat_basestring',
      'compat_chr',
      'compat_cookiejar',
      'compat_cookies',
+    'compat_ctypes_WINFUNCTYPE',
      'compat_etree_fromstring',
      'compat_etree_register_namespace',
      'compat_expanduser',
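The `compat_b64decode` hunk above lets callers pass either text or bytes. The standard library documents that `base64.b64decode` accepts ASCII-only text strings from Python 3.3 onwards, which is why the pre-3.3 branch encodes first. The sketch below mirrors that branch for Python 3 only. The name `b64decode_text_or_bytes` is hypothetical, and `str` stands in for `compat_str`.

```python
import base64


def b64decode_text_or_bytes(s, *args, **kwargs):
    # Hypothetical stand-in for the pre-3.3 branch of compat_b64decode:
    # text input is encoded to ASCII bytes before decoding.
    if isinstance(s, str):
        s = s.encode('ascii')
    return base64.b64decode(s, *args, **kwargs)


# Both call styles yield the same bytes, so call sites no longer need
# their own .encode('ascii') or .encode('utf-8') step.
print(b64decode_text_or_bytes('aGVsbG8='))   # b'hello'
print(b64decode_text_or_bytes(b'aGVsbG8='))  # b'hello'
```

This is the reason the extractor hunks in this commit can drop their explicit `.encode(...)` calls when switching to `compat_b64decode`.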
diff --git a/youtube_dl/downloader/f4m.py b/youtube_dl/downloader/f4m.py

index fdb80f42ae3fd61d76e9fe0d14274da1c127d289..15e71be9a4f482e2739419c9fc512d423976dd9e 100644 (file)
--- a/youtube_dl/downloader/f4m.py
+++ b/youtube_dl/downloader/f4m.py
@@ -1,12 +1,12 @@
  from __future__ import division, unicode_literals
  
-import base64
  import io
  import itertools
  import time
  
  from .fragment import FragmentFD
  from ..compat import (
+    compat_b64decode,
      compat_etree_fromstring,
      compat_urlparse,
      compat_urllib_error,
@@ -312,7 +312,7 @@ class F4mFD(FragmentFD):
              boot_info = self._get_bootstrap_from_url(bootstrap_url)
          else:
              bootstrap_url = None
-            bootstrap = base64.b64decode(node.text.encode('ascii'))
+            bootstrap = compat_b64decode(node.text)
              boot_info = read_bootstrap_info(bootstrap)
          return boot_info, bootstrap_url
  
@@ -349,7 +349,7 @@ class F4mFD(FragmentFD):
          live = boot_info['live']
          metadata_node = media.find(_add_ns('metadata'))
          if metadata_node is not None:
-            metadata = base64.b64decode(metadata_node.text.encode('ascii'))
+            metadata = compat_b64decode(metadata_node.text)
          else:
              metadata = None
  
diff --git a/youtube_dl/extractor/acast.py b/youtube_dl/extractor/acast.py

index 6dace305141423ec35d25abe3c4283476a762b2f..5871e72dca61cc64dd57833d891bd77e854e93df 100644 (file)
--- a/youtube_dl/extractor/acast.py
+++ b/youtube_dl/extractor/acast.py
@@ -8,7 +8,7 @@ from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
      int_or_none,
-    parse_iso8601,
+    unified_timestamp,
      OnDemandPagedList,
  )
  
@@ -32,7 +32,7 @@ class ACastIE(InfoExtractor):
      }, {
          # test with multiple blings
          'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
-        'md5': '55c0097badd7095f494c99a172f86501',
+        'md5': 'e87d5b8516cd04c0d81b6ee1caca28d0',
          'info_dict': {
              'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
              'ext': 'mp3',
@@ -40,23 +40,24 @@ class ACastIE(InfoExtractor):
              'timestamp': 1477346700,
              'upload_date': '20161024',
              'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
-            'duration': 2797,
+            'duration': 2766,
          }
      }]
  
      def _real_extract(self, url):
          channel, display_id = re.match(self._VALID_URL, url).groups()
          cast_data = self._download_json(
-            'https://embed.acast.com/api/acasts/%s/%s' % (channel, display_id), display_id)
+            'https://play-api.acast.com/splash/%s/%s' % (channel, display_id), display_id)
+        e = cast_data['result']['episode']
          return {
-            'id': compat_str(cast_data['id']),
+            'id': compat_str(e['id']),
              'display_id': display_id,
-            'url': [b['audio'] for b in cast_data['blings'] if b['type'] == 'BlingAudio'][0],
-            'title': cast_data['name'],
-            'description': cast_data.get('description'),
-            'thumbnail': cast_data.get('image'),
-            'timestamp': parse_iso8601(cast_data.get('publishingDate')),
-            'duration': int_or_none(cast_data.get('duration')),
+            'url': e['mediaUrl'],
+            'title': e['name'],
+            'description': e.get('description'),
+            'thumbnail': e.get('image'),
+            'timestamp': unified_timestamp(e.get('publishingDate')),
+            'duration': int_or_none(e.get('duration')),
          }
  
  
diff --git a/youtube_dl/extractor/adn.py b/youtube_dl/extractor/adn.py

index cffdab6ca4a4488caf06aa73579e363a7ede8fd4..64fb755da0f663670778453590afa75974f8895d 100644 (file)
--- a/youtube_dl/extractor/adn.py
+++ b/youtube_dl/extractor/adn.py
@@ -1,13 +1,15 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import json
  import os
  
  from .common import InfoExtractor
  from ..aes import aes_cbc_decrypt
-from ..compat import compat_ord
+from ..compat import (
+    compat_b64decode,
+    compat_ord,
+)
  from ..utils import (
      bytes_to_intlist,
      ExtractorError,
@@ -48,9 +50,9 @@ class ADNIE(InfoExtractor):
  
          # http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
          dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
-            bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
+            bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
              bytes_to_intlist(b'\x1b\xe0\x29\x61\x38\x94\x24\x00\x12\xbd\xc5\x80\xac\xce\xbe\xb0'),
-            bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
+            bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
          ))
          subtitles_json = self._parse_json(
              dec_subtitles[:-compat_ord(dec_subtitles[-1])].decode(),
diff --git a/youtube_dl/extractor/bigflix.py b/youtube_dl/extractor/bigflix.py

index b4ce767af6735321ab08769e4d2c87b716b93e65..28e3e59f670995b747fc028f99c285e2ef7e69aa 100644 (file)
--- a/youtube_dl/extractor/bigflix.py
+++ b/youtube_dl/extractor/bigflix.py
@@ -1,11 +1,13 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..compat import (
+    compat_b64decode,
+    compat_urllib_parse_unquote,
+)
  
  
  class BigflixIE(InfoExtractor):
@@ -39,8 +41,8 @@ class BigflixIE(InfoExtractor):
              webpage, 'title')
  
          def decode_url(quoted_b64_url):
-            return base64.b64decode(compat_urllib_parse_unquote(
-                quoted_b64_url).encode('ascii')).decode('utf-8')
+            return compat_b64decode(compat_urllib_parse_unquote(
+                quoted_b64_url)).decode('utf-8')
  
          formats = []
          for height, encoded_url in re.findall(
diff --git a/youtube_dl/extractor/bilibili.py b/youtube_dl/extractor/bilibili.py

index 1e57310d657fee2c47cb8e63a9868a36724f0f6a..beffcecd09f55ad4bd5365639ffb2d0459a624f2 100644 (file)
--- a/youtube_dl/extractor/bilibili.py
+++ b/youtube_dl/extractor/bilibili.py
@@ -102,6 +102,7 @@ class BiliBiliIE(InfoExtractor):
                      video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
              headers = {
                  'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
+                'Referer': url
              }
              headers.update(self.geo_verification_headers())
  
@@ -116,10 +117,15 @@ class BiliBiliIE(InfoExtractor):
          payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
          sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
  
+        headers = {
+            'Referer': url
+        }
+        headers.update(self.geo_verification_headers())
+
          video_info = self._download_json(
              'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
              video_id, note='Downloading video info page',
-            headers=self.geo_verification_headers())
+            headers=headers)
  
          if 'durl' not in video_info:
              self._report_error(video_info)
diff --git a/youtube_dl/extractor/canalplus.py b/youtube_dl/extractor/canalplus.py

index d8bf073f40cf171d547992679d295055f82ef386..51c11cb7e18d36a2144ae5b1614cfeb0f7a4be83 100644 (file)
--- a/youtube_dl/extractor/canalplus.py
+++ b/youtube_dl/extractor/canalplus.py
@@ -4,59 +4,36 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlparse
  from ..utils import (
-    dict_get,
      # ExtractorError,
      # HEADRequest,
      int_or_none,
      qualities,
-    remove_end,
      unified_strdate,
  )
  
  
  class CanalplusIE(InfoExtractor):
-    IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
-    _VALID_URL = r'''(?x)
-                        https?://
-                            (?:
-                                (?:
-                                    (?:(?:www|m)\.)?canalplus\.fr|
-                                    (?:www\.)?piwiplus\.fr|
-                                    (?:www\.)?d8\.tv|
-                                    (?:www\.)?c8\.fr|
-                                    (?:www\.)?d17\.tv|
-                                    (?:(?:football|www)\.)?cstar\.fr|
-                                    (?:www\.)?itele\.fr
-                                )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
-                                player\.canalplus\.fr/#/(?P<id>\d+)
-                            )
-
-                    '''
+    IE_DESC = 'mycanal.fr and piwiplus.fr'
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>mycanal|piwiplus)\.fr/(?:[^/]+/)*(?P<display_id>[^?/]+)(?:\.html\?.*\bvid=|/p/)(?P<id>\d+)'
      _VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
      _SITE_ID_MAP = {
-        'canalplus': 'cplus',
+        'mycanal': 'cplus',
          'piwiplus': 'teletoon',
-        'd8': 'd8',
-        'c8': 'd8',
-        'd17': 'd17',
-        'cstar': 'd17',
-        'itele': 'itele',
      }
  
      # Only works for direct mp4 URLs
      _GEO_COUNTRIES = ['FR']
  
      _TESTS = [{
-        'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
+        'url': 'https://www.mycanal.fr/d17-emissions/lolywood/p/1397061',
          'info_dict': {
-            'id': '1405510',
-            'display_id': 'pid1830-c-zapping',
+            'id': '1397061',
+            'display_id': 'lolywood',
              'ext': 'mp4',
-            'title': 'Zapping - 02/07/2016',
-            'description': 'Le meilleur de toutes les chaînes, tous les jours',
-            'upload_date': '20160702',
+            'title': 'Euro 2016 : Je préfère te prévenir - Lolywood - Episode 34',
+            'description': 'md5:7d97039d455cb29cdba0d652a0efaa5e',
+            'upload_date': '20160602',
          },
      }, {
          # geo restricted, bypassed
@@ -70,64 +47,12 @@ class CanalplusIE(InfoExtractor):
              'upload_date': '20140724',
          },
          'expected_warnings': ['HTTP Error 403: Forbidden'],
-    }, {
-        # geo restricted, bypassed
-        'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html?vid=1443684',
-        'md5': 'bb6f9f343296ab7ebd88c97b660ecf8d',
-        'info_dict': {
-            'id': '1443684',
-            'display_id': 'pid6318-videos-integrales',
-            'ext': 'mp4',
-            'title': 'Guess my iep ! - TPMP - 07/04/2017',
-            'description': 'md5:6f005933f6e06760a9236d9b3b5f17fa',
-            'upload_date': '20170407',
-        },
-        'expected_warnings': ['HTTP Error 403: Forbidden'],
-    }, {
-        'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
-        'info_dict': {
-            'id': '1420176',
-            'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
-            'ext': 'mp4',
-            'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
-            'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
-            'upload_date': '20161014',
-        },
-    }, {
-        'url': 'http://football.cstar.fr/cstar-minisite-foot/pid7566-feminines-videos.html?vid=1416769',
-        'info_dict': {
-            'id': '1416769',
-            'display_id': 'pid7566-feminines-videos',
-            'ext': 'mp4',
-            'title': 'France - Albanie : les temps forts de la soirée - 20/09/2016',
-            'description': 'md5:c3f30f2aaac294c1c969b3294de6904e',
-            'upload_date': '20160921',
-        },
-        'params': {
-            'skip_download': True,
-        },
-    }, {
-        'url': 'http://m.canalplus.fr/?vid=1398231',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
-        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-
-        site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
-
-        # Beware, some subclasses do not define an id group
-        display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
+        site, display_id, video_id = re.match(self._VALID_URL, url).groups()
  
-        webpage = self._download_webpage(url, display_id)
-        video_id = self._search_regex(
-            [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
-             r'id=["\']canal_video_player(?P<id>\d+)',
-             r'data-video=["\'](?P<id>\d+)'],
-            webpage, 'video id', default=mobj.group('vid'), group='id')
+        site_id = self._SITE_ID_MAP[site]
  
          info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
          video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
@@ -161,7 +86,7 @@ class CanalplusIE(InfoExtractor):
                      format_url + '?hdcore=2.11.3', video_id, f4m_id=format_id, fatal=False))
              else:
                  formats.append({
-                    # the secret extracted ya function in http://player.canalplus.fr/common/js/canalPlayer.js
+                    # the secret extracted from ya function in http://player.canalplus.fr/common/js/canalPlayer.js
                      'url': format_url + '?secret=pqzerjlsmdkjfoiuerhsdlfknaes',
                      'format_id': format_id,
                      'preference': preference(format_id),
diff --git a/youtube_dl/extractor/chilloutzone.py b/youtube_dl/extractor/chilloutzone.py
index d4769da75e021562d4229474585e048780e9d0d2..5aac212991f1625cde463936a94399e0b37ad8a9 100644
--- a/youtube_dl/extractor/chilloutzone.py
+++ b/youtube_dl/extractor/chilloutzone.py
@@ -1,11 +1,11 @@
  from __future__ import unicode_literals
  
  import re
-import base64
  import json
  
  from .common import InfoExtractor
  from .youtube import YoutubeIE
+from ..compat import compat_b64decode
  from ..utils import (
      clean_html,
      ExtractorError
@@ -58,7 +58,7 @@ class ChilloutzoneIE(InfoExtractor):
  
          base64_video_info = self._html_search_regex(
              r'var cozVidData = "(.+?)";', webpage, 'video data')
-        decoded_video_info = base64.b64decode(base64_video_info.encode('utf-8')).decode('utf-8')
+        decoded_video_info = compat_b64decode(base64_video_info).decode('utf-8')
          video_info_dict = json.loads(decoded_video_info)
  
          # get video information from dict
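Note on the `compat_b64decode` switch: the hunk above (and several below) drops the explicit `.encode(...)` before decoding. The helper's body is not part of this section of the diff, so the following is only a minimal sketch of the behavior the call sites appear to rely on, namely accepting either text or bytes. The function name `b64decode_text_or_bytes` is hypothetical and stands in for the real helper.

```python
import base64


def b64decode_text_or_bytes(s, *args, **kwargs):
    # Hypothetical stand-in for compat_b64decode (assumption, not the
    # library's code): encode text input to ASCII bytes first, then
    # defer to the standard library decoder.
    if isinstance(s, str):
        s = s.encode('ascii')
    return base64.b64decode(s, *args, **kwargs)


# A text (not bytes) base64 payload, as a regex match on a web page would yield.
encoded = base64.b64encode(b'{"title": "demo"}').decode('ascii')
decoded = b64decode_text_or_bytes(encoded).decode('utf-8')
```

With a helper of this shape, call sites no longer need to choose an encoding for the input themselves, which is what the `-`/`+` pairs in these hunks show.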
diff --git a/youtube_dl/extractor/chirbit.py b/youtube_dl/extractor/chirbit.py
index 4815b34be7832144075793217de77ba44b7c9471..8d75cdf199b7f7b8ac5321c1e58b7ef57d4f2622 100644
--- a/youtube_dl/extractor/chirbit.py
+++ b/youtube_dl/extractor/chirbit.py
@@ -1,10 +1,10 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_b64decode
  from ..utils import parse_duration
  
  
@@ -44,8 +44,7 @@ class ChirbitIE(InfoExtractor):
  
          # Reverse engineered from https://chirb.it/js/chirbit.player.js (look
          # for soundURL)
-        audio_url = base64.b64decode(
-            data_fd[::-1].encode('ascii')).decode('utf-8')
+        audio_url = compat_b64decode(data_fd[::-1]).decode('utf-8')
  
          title = self._search_regex(
              r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')
diff --git a/youtube_dl/extractor/common.py b/youtube_dl/extractor/common.py
index 5b6a09c0b7f732ef5e9361775e42719af0deb8b7..deafb48508fc7a0def88e5bd23fab558d37d8213 100644
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -1027,7 +1027,7 @@ class InfoExtractor(object):
                      part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
                      if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
                          info['series'] = unescapeHTML(part_of_series.get('name'))
-                elif item_type == 'Article':
+                elif item_type in ('Article', 'NewsArticle'):
                      info.update({
                          'timestamp': parse_iso8601(e.get('datePublished')),
                          'title': unescapeHTML(e.get('headline')),
@@ -2404,7 +2404,7 @@ class InfoExtractor(object):
                  formats.extend(self._extract_m3u8_formats(
                      source_url, video_id, 'mp4', entry_protocol='m3u8_native',
                      m3u8_id=m3u8_id, fatal=False))
-            elif ext == 'mpd':
+            elif source_type == 'dash' or ext == 'mpd':
                  formats.extend(self._extract_mpd_formats(
                      source_url, video_id, mpd_id=mpd_id, fatal=False))
              elif ext == 'smil':
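Note on the second `common.py` hunk: the DASH branch is now taken when the source declares itself as `dash`, even if the URL has no `.mpd` extension. A minimal sketch of that dispatch follows. The function `pick_manifest_kind`, its return labels, and the `hls` condition are illustrative assumptions; only the `source_type == 'dash' or ext == 'mpd'` test is taken from the diff.

```python
def pick_manifest_kind(source_type, source_url):
    # Crude extension guess for illustration: drop the query string and
    # take whatever follows the last dot.
    path = source_url.split('?', 1)[0]
    ext = path.rpartition('.')[2].lower()
    if source_type == 'hls' or ext == 'm3u8':  # assumed, not shown in the hunk
        return 'm3u8'
    if source_type == 'dash' or ext == 'mpd':  # condition from the hunk
        return 'mpd'
    return 'direct'


kind_by_type = pick_manifest_kind('dash', 'https://example.com/manifest')
kind_by_ext = pick_manifest_kind(None, 'https://example.com/v.mpd?x=1')
```

Before the change, the first call would have fallen through to the later branches because the URL carries no `.mpd` suffix.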
diff --git a/youtube_dl/extractor/crunchyroll.py b/youtube_dl/extractor/crunchyroll.py
index b92f2544799769abc9690a04d4d1090af435ca79..3efdc8c21dce170691333f8ee297d65c7c6d1e06 100644
--- a/youtube_dl/extractor/crunchyroll.py
+++ b/youtube_dl/extractor/crunchyroll.py
@@ -3,13 +3,13 @@ from __future__ import unicode_literals
  
  import re
  import json
-import base64
  import zlib
  
  from hashlib import sha1
  from math import pow, sqrt, floor
  from .common import InfoExtractor
  from ..compat import (
+    compat_b64decode,
      compat_etree_fromstring,
      compat_urllib_parse_urlencode,
      compat_urllib_request,
@@ -272,8 +272,8 @@ class CrunchyrollIE(CrunchyrollBaseIE):
      }
  
      def _decrypt_subtitles(self, data, iv, id):
-        data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
-        iv = bytes_to_intlist(base64.b64decode(iv.encode('utf-8')))
+        data = bytes_to_intlist(compat_b64decode(data))
+        iv = bytes_to_intlist(compat_b64decode(iv))
          id = int(id)
  
          def obfuscate_key_aux(count, modulo, start):
diff --git a/youtube_dl/extractor/daisuki.py b/youtube_dl/extractor/daisuki.py
index 5c9ac68a02590b3eccea83cd4d9a49ffca47a2e8..dbc1aa5d460aeb0c570bca7a3f988720327cc6d1 100644
--- a/youtube_dl/extractor/daisuki.py
+++ b/youtube_dl/extractor/daisuki.py
@@ -10,6 +10,7 @@ from ..aes import (
      aes_cbc_decrypt,
      aes_cbc_encrypt,
  )
+from ..compat import compat_b64decode
  from ..utils import (
      bytes_to_intlist,
      bytes_to_long,
@@ -93,7 +94,7 @@ class DaisukiMottoIE(InfoExtractor):
  
          rtn = self._parse_json(
              intlist_to_bytes(aes_cbc_decrypt(bytes_to_intlist(
-                base64.b64decode(encrypted_rtn)),
+                compat_b64decode(encrypted_rtn)),
                  aes_key, iv)).decode('utf-8').rstrip('\0'),
              video_id)
  
diff --git a/youtube_dl/extractor/digg.py b/youtube_dl/extractor/digg.py
new file mode 100644
index 0000000..913c175
--- /dev/null
+++ b/youtube_dl/extractor/digg.py
@@ -0,0 +1,56 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import js_to_json
+
+
+class DiggIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?digg\.com/video/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        # JWPlatform via provider
+        'url': 'http://digg.com/video/sci-fi-short-jonah-daniel-kaluuya-get-out',
+        'info_dict': {
+            'id': 'LcqvmS0b',
+            'ext': 'mp4',
+            'title': "'Get Out' Star Daniel Kaluuya Goes On 'Moby Dick'-Like Journey In Sci-Fi Short 'Jonah'",
+            'description': 'md5:541bb847648b6ee3d6514bc84b82efda',
+            'upload_date': '20180109',
+            'timestamp': 1515530551,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # Youtube via provider
+        'url': 'http://digg.com/video/dog-boat-seal-play',
+        'only_matching': True,
+    }, {
+        # vimeo as regular embed
+        'url': 'http://digg.com/video/dream-girl-short-film',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        info = self._parse_json(
+            self._search_regex(
+                r'(?s)video_info\s*=\s*({.+?});\n', webpage, 'video info',
+                default='{}'), display_id, transform_source=js_to_json,
+            fatal=False)
+
+        video_id = info.get('video_id')
+
+        if video_id:
+            provider = info.get('provider_name')
+            if provider == 'youtube':
+                return self.url_result(
+                    video_id, ie='Youtube', video_id=video_id)
+            elif provider == 'jwplayer':
+                return self.url_result(
+                    'jwplatform:%s' % video_id, ie='JWPlatform',
+                    video_id=video_id)
+
+        return self.url_result(url, 'Generic')
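Note on the new `digg.py`: the extractor pulls a `video_info` object out of the page and routes on `provider_name`, falling back to the generic extractor. The sketch below mirrors that flow with the standard library only. It assumes the embedded object is already strict JSON (the real code passes it through `js_to_json` first), and the helper names and sample page are hypothetical.

```python
import json
import re


def parse_video_info(webpage):
    # Same pattern as in the diff; an empty dict is returned when nothing matches.
    mobj = re.search(r'(?s)video_info\s*=\s*({.+?});\n', webpage)
    if not mobj:
        return {}
    try:
        return json.loads(mobj.group(1))
    except ValueError:
        return {}


def route(info, page_url):
    # Returns an (extractor key, target) pair, mirroring the url_result calls.
    video_id = info.get('video_id')
    if video_id:
        provider = info.get('provider_name')
        if provider == 'youtube':
            return ('Youtube', video_id)
        if provider == 'jwplayer':
            return ('JWPlatform', 'jwplatform:%s' % video_id)
    return ('Generic', page_url)


page = ('<script>\nvar video_info = {"video_id": "LcqvmS0b", '
        '"provider_name": "jwplayer"};\n</script>')
target = route(parse_video_info(page), 'http://digg.com/video/example')
```

Any page without a usable `video_info` object, or with a provider other than the two handled, ends up at the generic fallback.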
diff --git a/youtube_dl/extractor/dplay.py b/youtube_dl/extractor/dplay.py
index 76e784105451293705f0057dbfe5035d73bb5c1d..a08dace4318a05dfacd9c6e9ad94508934b615ca 100644
--- a/youtube_dl/extractor/dplay.py
+++ b/youtube_dl/extractor/dplay.py
@@ -12,25 +12,28 @@ from ..compat import (
      compat_urlparse,
  )
  from ..utils import (
+    determine_ext,
      ExtractorError,
+    float_or_none,
      int_or_none,
      remove_end,
      try_get,
      unified_strdate,
+    unified_timestamp,
      update_url_query,
      USER_AGENTS,
  )
  
  
  class DPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?P<domain>www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/(?:videoer/)?(?P<id>[^/]+/[^/?#]+)'
  
      _TESTS = [{
          # non geo restricted, via secure api, unsigned download hls URL
          'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
          'info_dict': {
              'id': '3172',
-            'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
+            'display_id': 'nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet',
              'ext': 'mp4',
              'title': 'Svensken lär sig njuta av livet',
              'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
@@ -48,7 +51,7 @@ class DPlayIE(InfoExtractor):
          'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
          'info_dict': {
              'id': '70816',
-            'display_id': 'season-6-episode-12',
+            'display_id': 'mig-og-min-mor/season-6-episode-12',
              'ext': 'mp4',
              'title': 'Episode 12',
              'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
@@ -65,6 +68,30 @@ class DPlayIE(InfoExtractor):
          # geo restricted, via direct unsigned hls URL
          'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
          'only_matching': True,
+    }, {
+        # disco-api
+        'url': 'https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7',
+        'info_dict': {
+            'id': '40206',
+            'display_id': 'i-kongens-klr/sesong-1-episode-7',
+            'ext': 'mp4',
+            'title': 'Episode 7',
+            'description': 'md5:e3e1411b2b9aebeea36a6ec5d50c60cf',
+            'duration': 2611.16,
+            'timestamp': 1516726800,
+            'upload_date': '20180123',
+            'series': 'I kongens klær',
+            'season_number': 1,
+            'episode_number': 7,
+        },
+        'params': {
+            'format': 'bestvideo',
+            'skip_download': True,
+        },
+    }, {
+        # geo restricted, bypassable via X-Forwarded-For
+        'url': 'https://www.dplay.dk/videoer/singleliv/season-5-episode-3',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -72,10 +99,81 @@ class DPlayIE(InfoExtractor):
          display_id = mobj.group('id')
          domain = mobj.group('domain')
  
+        self._initialize_geo_bypass([mobj.group('country').upper()])
+
          webpage = self._download_webpage(url, display_id)
  
          video_id = self._search_regex(
-            r'data-video-id=["\'](\d+)', webpage, 'video id')
+            r'data-video-id=["\'](\d+)', webpage, 'video id', default=None)
+
+        if not video_id:
+            host = mobj.group('host')
+            disco_base = 'https://disco-api.%s' % host
+            self._download_json(
+                '%s/token' % disco_base, display_id, 'Downloading token',
+                query={
+                    'realm': host.replace('.', ''),
+                })
+            video = self._download_json(
+                '%s/content/videos/%s' % (disco_base, display_id), display_id,
+                headers={
+                    'Referer': url,
+                    'x-disco-client': 'WEB:UNKNOWN:dplay-client:0.0.1',
+                }, query={
+                    'include': 'show'
+                })
+            video_id = video['data']['id']
+            info = video['data']['attributes']
+            title = info['name']
+            formats = []
+            for format_id, format_dict in self._download_json(
+                    '%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
+                    display_id)['data']['attributes']['streaming'].items():
+                if not isinstance(format_dict, dict):
+                    continue
+                format_url = format_dict.get('url')
+                if not format_url:
+                    continue
+                ext = determine_ext(format_url)
+                if format_id == 'dash' or ext == 'mpd':
+                    formats.extend(self._extract_mpd_formats(
+                        format_url, display_id, mpd_id='dash', fatal=False))
+                elif format_id == 'hls' or ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, display_id, 'mp4',
+                        entry_protocol='m3u8_native', m3u8_id='hls',
+                        fatal=False))
+                else:
+                    formats.append({
+                        'url': format_url,
+                        'format_id': format_id,
+                    })
+            self._sort_formats(formats)
+
+            series = None
+            try:
+                included = video.get('included')
+                if isinstance(included, list):
+                    show = next(e for e in included if e.get('type') == 'show')
+                    series = try_get(
+                        show, lambda x: x['attributes']['name'], compat_str)
+            except StopIteration:
+                pass
+
+            return {
+                'id': video_id,
+                'display_id': display_id,
+                'title': title,
+                'description': info.get('description'),
+                'duration': float_or_none(
+                    info.get('videoDuration'), scale=1000),
+                'timestamp': unified_timestamp(info.get('publishStart')),
+                'series': series,
+                'season_number': int_or_none(info.get('seasonNumber')),
+                'episode_number': int_or_none(info.get('episodeNumber')),
+                'age_limit': int_or_none(info.get('minimum_age')),
+                'formats': formats,
+            }
  
          info = self._download_json(
              'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
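Note on the reworked `_VALID_URL` in `dplay.py`: the new named groups feed the geo bypass, the disco-api base URL, and the `realm` query parameter. The sketch below applies the pattern from the diff to two of the test URLs. The `describe` helper is hypothetical and only collects the derived values in one place.

```python
import re

# Pattern copied from the new _VALID_URL in the hunk above.
VALID_URL = (r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/'
             r'(?:videoer/)?(?P<id>[^/]+/[^/?#]+)')


def describe(url):
    mobj = re.match(VALID_URL, url)
    host = mobj.group('host')
    return {
        'display_id': mobj.group('id'),
        'geo_country': mobj.group('country').upper(),
        'disco_base': 'https://disco-api.%s' % host,
        'realm': host.replace('.', ''),
    }


disco = describe('https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7')
legacy = describe('http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/')
```

The `display_id` now spans both path segments (show and episode), which is why the expected values in the existing tests were updated.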
diff --git a/youtube_dl/extractor/dumpert.py b/youtube_dl/extractor/dumpert.py
index c9fc9b5a9df65cd8681ce8e0933473ea9658202d..be2e3d37841b48ce7cf187b01ddd5236f85d7683 100644
--- a/youtube_dl/extractor/dumpert.py
+++ b/youtube_dl/extractor/dumpert.py
@@ -1,10 +1,10 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_b64decode
  from ..utils import (
      qualities,
      sanitized_Request,
@@ -42,7 +42,7 @@ class DumpertIE(InfoExtractor):
              r'data-files="([^"]+)"', webpage, 'data files')
  
          files = self._parse_json(
-            base64.b64decode(files_base64.encode('utf-8')).decode('utf-8'),
+            compat_b64decode(files_base64).decode('utf-8'),
              video_id)
  
          quality = qualities(['flv', 'mobile', 'tablet', '720p'])
diff --git a/youtube_dl/extractor/einthusan.py b/youtube_dl/extractor/einthusan.py
index 3f6268637c87a55d658b59f8a7f65a1d22c000f0..4485bf8c1a8a25e2650ad3568dab7f8b54159ea2 100644
--- a/youtube_dl/extractor/einthusan.py
+++ b/youtube_dl/extractor/einthusan.py
@@ -1,13 +1,13 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import json
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_urlparse,
+    compat_b64decode,
      compat_str,
+    compat_urlparse,
  )
  from ..utils import (
      extract_attributes,
@@ -36,9 +36,9 @@ class EinthusanIE(InfoExtractor):
  
      # reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
      def _decrypt(self, encrypted_data, video_id):
-        return self._parse_json(base64.b64decode((
+        return self._parse_json(compat_b64decode((
              encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
-        ).encode('ascii')).decode('utf-8'), video_id)
+        )).decode('utf-8'), video_id)
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
index e64defe624dd9158b4cb36320b76d2a867a2d602..b442256fee258b573fcc28e392c0d0ccb7a8ea1c 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -259,6 +259,7 @@ from .deezer import DeezerPlaylistIE
  from .democracynow import DemocracynowIE
  from .dfb import DFBIE
  from .dhm import DHMIE
+from .digg import DiggIE
  from .dotsub import DotsubIE
  from .douyutv import (
      DouyuShowIE,
@@ -489,7 +490,6 @@ from .jwplatform import JWPlatformIE
  from .jpopsukitv import JpopsukiIE
  from .kakao import KakaoIE
  from .kaltura import KalturaIE
-from .kamcord import KamcordIE
  from .kanalplay import KanalPlayIE
  from .kankan import KankanIE
  from .karaoketv import KaraoketvIE
@@ -609,7 +609,10 @@ from .mofosex import MofosexIE
  from .mojvideo import MojvideoIE
  from .moniker import MonikerIE
  from .morningstar import MorningstarIE
-from .motherless import MotherlessIE
+from .motherless import (
+    MotherlessIE,
+    MotherlessGroupIE
+)
  from .motorsport import MotorsportIE
  from .movieclips import MovieClipsIE
  from .moviezine import MoviezineIE
@@ -878,7 +881,6 @@ from .revision3 import (
      Revision3IE,
  )
  from .rice import RICEIE
-from .ringtv import RingTVIE
  from .rmcdecouverte import RMCDecouverteIE
  from .ro220 import Ro220IE
  from .rockstargames import RockstarGamesIE
@@ -898,6 +900,7 @@ from .rtp import RTPIE
  from .rts import RTSIE
  from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
  from .rtvnh import RTVNHIE
+from .rtvs import RTVSIE
  from .rudo import RudoIE
  from .ruhd import RUHDIE
  from .ruleporn import RulePornIE
@@ -930,6 +933,10 @@ from .servingsys import ServingSysIE
  from .servus import ServusIE
  from .sevenplus import SevenPlusIE
  from .sexu import SexuIE
+from .seznamzpravy import (
+    SeznamZpravyIE,
+    SeznamZpravyArticleIE,
+)
  from .shahid import (
      ShahidIE,
      ShahidShowIE,
@@ -987,7 +994,7 @@ from .stitcher import StitcherIE
  from .sport5 import Sport5IE
  from .sportbox import SportBoxEmbedIE
  from .sportdeutschland import SportDeutschlandIE
-from .sportschau import SportschauIE
+from .springboardplatform import SpringboardPlatformIE
  from .sprout import SproutIE
  from .srgssr import (
      SRGSSRIE,
@@ -1043,7 +1050,6 @@ from .theplatform import (
      ThePlatformFeedIE,
  )
  from .thescene import TheSceneIE
-from .thesixtyone import TheSixtyOneIE
  from .thestar import TheStarIE
  from .thesun import TheSunIE
  from .theweatherchannel import TheWeatherChannelIE
@@ -1285,6 +1291,8 @@ from .watchbox import WatchBoxIE
  from .watchindianporn import WatchIndianPornIE
  from .wdr import (
      WDRIE,
+    WDRPageIE,
+    WDRElefantIE,
      WDRMobileIE,
  )
  from .webcaster import (
@@ -1295,6 +1303,10 @@ from .webofstories import (
      WebOfStoriesIE,
      WebOfStoriesPlaylistIE,
  )
+from .weibo import (
+    WeiboIE, 
+    WeiboMobileIE
+)
  from .weiqitv import WeiqiTVIE
  from .wimp import WimpIE
  from .wistia import WistiaIE
@@ -1320,6 +1332,10 @@ from .xiami import (
      XiamiArtistIE,
      XiamiCollectionIE
  )
+from .ximalaya import (
+    XimalayaIE,
+    XimalayaAlbumIE
+)
  from .xminus import XMinusIE
  from .xnxx import XNXXIE
  from .xstream import XstreamIE
diff --git a/youtube_dl/extractor/franceinter.py b/youtube_dl/extractor/franceinter.py
index 707b9e00db02104a43e65ed8e0e94a3b2c7211c7..05806895c0a3300dd9db8e68efb15fcb70bb8006 100644
--- a/youtube_dl/extractor/franceinter.py
+++ b/youtube_dl/extractor/franceinter.py
@@ -33,7 +33,7 @@ class FranceInterIE(InfoExtractor):
          description = self._og_search_description(webpage)
  
          upload_date_str = self._search_regex(
-            r'class=["\']cover-emission-period["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
+            r'class=["\']\s*cover-emission-period\s*["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
              webpage, 'upload date', fatal=False)
          if upload_date_str:
              upload_date_list = upload_date_str.split()
diff --git a/youtube_dl/extractor/gamestar.py b/youtube_dl/extractor/gamestar.py
index e607d6ab8215db56afd3810613f24bb2debf63f5..f00dab2f355271dde290ffbf16bc641ae5ad88bd 100644
--- a/youtube_dl/extractor/gamestar.py
+++ b/youtube_dl/extractor/gamestar.py
@@ -1,6 +1,8 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..utils import (
      int_or_none,
@@ -9,27 +11,34 @@ from ..utils import (
  
  
  class GameStarIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?game(?P<site>pro|star)\.de/videos/.*,(?P<id>[0-9]+)\.html'
+    _TESTS = [{
          'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
-        'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
+        'md5': 'ee782f1f8050448c95c5cacd63bc851c',
          'info_dict': {
              'id': '76110',
              'ext': 'mp4',
              'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
              'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
              'thumbnail': r're:^https?://.*\.jpg$',
-            'timestamp': 1406542020,
+            'timestamp': 1406542380,
              'upload_date': '20140728',
-            'duration': 17
+            'duration': 17,
          }
-    }
+    }, {
+        'url': 'http://www.gamepro.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.gamestar.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
+        mobj = re.match(self._VALID_URL, url)
+        site = mobj.group('site')
+        video_id = mobj.group('id')
  
-        url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
+        webpage = self._download_webpage(url, video_id)
  
          # TODO: there are multiple ld+json objects in the webpage,
          # while _search_json_ld finds only the first one
@@ -37,16 +46,17 @@ class GameStarIE(InfoExtractor):
              r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>[^<]+VideoObject[^<]+)</script>',
              webpage, 'JSON-LD', group='json_ld'), video_id)
          info_dict = self._json_ld(json_ld, video_id)
-        info_dict['title'] = remove_end(info_dict['title'], ' - GameStar')
+        info_dict['title'] = remove_end(
+            info_dict['title'], ' - Game%s' % site.title())
  
-        view_count = json_ld.get('interactionCount')
+        view_count = int_or_none(json_ld.get('interactionCount'))
          comment_count = int_or_none(self._html_search_regex(
-            r'([0-9]+) Kommentare</span>', webpage, 'comment_count',
-            fatal=False))
+            r'<span>Kommentare</span>\s*<span[^>]+class=["\']count[^>]+>\s*\(\s*([0-9]+)',
+            webpage, 'comment count', fatal=False))
  
          info_dict.update({
              'id': video_id,
-            'url': url,
+            'url': 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id,
              'ext': 'mp4',
              'view_count': view_count,
              'comment_count': comment_count
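Note on the title handling in `gamestar.py`: the suffix to strip is now built from the `site` group, so one extractor covers both gamestar.de and gamepro.de. A minimal sketch under stated assumptions: `strip_site_suffix` is a hypothetical helper that reimplements the suffix removal inline, and the sample titles are made up.

```python
import re

# Pattern copied from the new _VALID_URL in the hunk above.
VALID_URL = r'https?://(?:www\.)?game(?P<site>pro|star)\.de/videos/.*,(?P<id>[0-9]+)\.html'


def strip_site_suffix(title, url):
    site = re.match(VALID_URL, url).group('site')
    # 'pro' -> ' - GamePro', 'star' -> ' - GameStar'
    suffix = ' - Game%s' % site.title()
    return title[:-len(suffix)] if title.endswith(suffix) else title


pro_title = strip_site_suffix(
    'Top 10 Indie-Spiele - GamePro',
    'http://www.gamepro.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html')
star_title = strip_site_suffix(
    'Hobbit 3 - GameStar',
    'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html')
```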
diff --git a/youtube_dl/extractor/generic.py b/youtube_dl/extractor/generic.py
index cc4c90b8cef194952e1b7d7517d6036329eb1c79..1d9da8115832126671233101dbc3b51759e63a33 100644
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -101,6 +101,7 @@ from .vzaar import VzaarIE
  from .channel9 import Channel9IE
  from .vshare import VShareIE
  from .mediasite import MediasiteIE
+from .springboardplatform import SpringboardPlatformIE
  
  
  class GenericIE(InfoExtractor):
@@ -1938,6 +1939,21 @@ class GenericIE(InfoExtractor):
                  'timestamp': 1474354800,
                  'upload_date': '20160920',
              }
+        },
+        {
+            'url': 'http://www.kidzworld.com/article/30935-trolls-the-beat-goes-on-interview-skylar-astin-and-amanda-leighton',
+            'info_dict': {
+                'id': '1731611',
+                'ext': 'mp4',
+                'title': 'Official Trailer | TROLLS: THE BEAT GOES ON!',
+                'description': 'md5:eb5f23826a027ba95277d105f248b825',
+                'timestamp': 1516100691,
+                'upload_date': '20180116',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'add_ie': [SpringboardPlatformIE.ie_key()],
          }
          # {
          #     # TODO: find another test
@@ -2708,9 +2724,9 @@ class GenericIE(InfoExtractor):
              return self.url_result(viewlift_url)
  
          # Look for JWPlatform embeds
-        jwplatform_url = JWPlatformIE._extract_url(webpage)
-        if jwplatform_url:
-            return self.url_result(jwplatform_url, 'JWPlatform')
+        jwplatform_urls = JWPlatformIE._extract_urls(webpage)
+        if jwplatform_urls:
+            return self.playlist_from_matches(jwplatform_urls, video_id, video_title, ie=JWPlatformIE.ie_key())
  
          # Look for Digiteka embeds
          digiteka_url = DigitekaIE._extract_url(webpage)
@@ -2906,6 +2922,12 @@ class GenericIE(InfoExtractor):
                  for mediasite_url in mediasite_urls]
              return self.playlist_result(entries, video_id, video_title)
  
+        springboardplatform_urls = SpringboardPlatformIE._extract_urls(webpage)
+        if springboardplatform_urls:
+            return self.playlist_from_matches(
+                springboardplatform_urls, video_id, video_title,
+                ie=SpringboardPlatformIE.ie_key())
+
          def merge_dicts(dict1, dict2):
              merged = {}
              for k, v in dict1.items():
diff --git a/youtube_dl/extractor/hotnewhiphop.py b/youtube_dl/extractor/hotnewhiphop.py
index 34163725f8c9562380a3ea30a17780e599f3b0a7..4703e1894f08f70057ace62fc1411980ea7728b4 100644
--- a/youtube_dl/extractor/hotnewhiphop.py
+++ b/youtube_dl/extractor/hotnewhiphop.py
@@ -1,8 +1,7 @@
  from __future__ import unicode_literals
  
-import base64
-
  from .common import InfoExtractor
+from ..compat import compat_b64decode
  from ..utils import (
      ExtractorError,
      HEADRequest,
@@ -48,7 +47,7 @@ class HotNewHipHopIE(InfoExtractor):
          if 'mediaKey' not in mkd:
              raise ExtractorError('Did not get a media key')
  
-        redirect_url = base64.b64decode(video_url_base64).decode('utf-8')
+        redirect_url = compat_b64decode(video_url_base64).decode('utf-8')
          redirect_req = HEADRequest(redirect_url)
          req = self._request_webpage(
              redirect_req, video_id,
diff --git a/youtube_dl/extractor/infoq.py b/youtube_dl/extractor/infoq.py
index c3e892feb1fd905b98f99b5670ee7c120b0b208d..391c2f5d015f970098fea710725e9be77e574cab 100644
--- a/youtube_dl/extractor/infoq.py
+++ b/youtube_dl/extractor/infoq.py
@@ -2,9 +2,8 @@
  
  from __future__ import unicode_literals
  
-import base64
-
  from ..compat import (
+    compat_b64decode,
      compat_urllib_parse_unquote,
      compat_urlparse,
  )
@@ -61,7 +60,7 @@ class InfoQIE(BokeCCBaseIE):
          encoded_id = self._search_regex(
              r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id', default=None)
  
-        real_id = compat_urllib_parse_unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
+        real_id = compat_urllib_parse_unquote(compat_b64decode(encoded_id).decode('utf-8'))
          playpath = 'mp4:' + real_id
  
          return [{
diff --git a/youtube_dl/extractor/jwplatform.py b/youtube_dl/extractor/jwplatform.py
index c9bcbb08f787ef74bea78c4b402ecb7b167556e7..63d0dc998cf1cf281dda3c27f3afaae84f4906c9 100644
--- a/youtube_dl/extractor/jwplatform.py
+++ b/youtube_dl/extractor/jwplatform.py
@@ -23,11 +23,14 @@ class JWPlatformIE(InfoExtractor):
  
      @staticmethod
      def _extract_url(webpage):
-        mobj = re.search(
-            r'<(?:script|iframe)[^>]+?src=["\'](?P<url>(?:https?:)?//content.jwplatform.com/players/[a-zA-Z0-9]{8})',
+        urls = JWPlatformIE._extract_urls(webpage)
+        return urls[0] if urls else None
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
              webpage)
-        if mobj:
-            return mobj.group('url')
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
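Note on `_extract_urls` in `jwplatform.py`: the single-match `re.search` becomes `re.findall`, and `_extract_url` is kept as a thin wrapper returning the first hit. The sketch below reproduces that pair as plain functions. The pattern is copied from the diff, while the sample markup and the second player id are made up for illustration.

```python
import re

# Pattern copied from the new _extract_urls in the hunk above.
JW_RE = (r'<(?:script|iframe)[^>]+?src=["\']'
         r'((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})')


def extract_urls(webpage):
    # All player URLs on the page, in document order.
    return re.findall(JW_RE, webpage)


def extract_url(webpage):
    # First player URL, or None when the page has no embed.
    urls = extract_urls(webpage)
    return urls[0] if urls else None


page = ('<script src="//content.jwplatform.com/players/LcqvmS0b-abcdefgh.js"></script>'
        '<iframe src="https://content.jwplatform.com/players/AbCd1234-abcdefgh.html"></iframe>')
found = extract_urls(page)
```

Returning every match is what lets the generic extractor build a playlist from pages with several embeds. The dots in the host name are also escaped now, so they no longer match arbitrary characters.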
diff --git a/youtube_dl/extractor/kamcord.py b/youtube_dl/extractor/kamcord.py
deleted file mode 100644
index b50120d..0000000
--- a/youtube_dl/extractor/kamcord.py
+++ /dev/null
@@ -1,71 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-    int_or_none,
-    qualities,
-)
-
-
-class KamcordIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
-    _TEST = {
-        'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
-        'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
-        'info_dict': {
-            'id': 'hNYRduDgWb4',
-            'ext': 'mp4',
-            'title': 'Drinking Madness',
-            'uploader': 'jacksfilms',
-            'uploader_id': '3044562',
-            'view_count': int,
-            'like_count': int,
-            'comment_count': int,
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        video = self._parse_json(
-            self._search_regex(
-                r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
-                webpage, 'video'),
-            video_id)['video']
-
-        title = video['title']
-
-        formats = self._extract_m3u8_formats(
-            video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
-        self._sort_formats(formats)
-
-        uploader = video.get('user', {}).get('username')
-        uploader_id = video.get('user', {}).get('id')
-
-        view_count = int_or_none(video.get('viewCount'))
-        like_count = int_or_none(video.get('heartCount'))
-        comment_count = int_or_none(video.get('messageCount'))
-
-        preference_key = qualities(('small', 'medium', 'large'))
-
-        thumbnails = [{
-            'url': thumbnail_url,
-            'id': thumbnail_id,
-            'preference': preference_key(thumbnail_id),
-        } for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
-            if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'view_count': view_count,
-            'like_count': like_count,
-            'comment_count': comment_count,
-            'thumbnails': thumbnails,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/leeco.py b/youtube_dl/extractor/leeco.py
index 0a07c1320993647dbc844b6d0bdfd4591ef2825d..ffe10154b7c6acd986f41ea0036d4b195fb003fb 100644
--- a/youtube_dl/extractor/leeco.py
+++ b/youtube_dl/extractor/leeco.py
@@ -1,7 +1,6 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import datetime
  import hashlib
  import re
@@ -9,6 +8,7 @@ import time
  
  from .common import InfoExtractor
  from ..compat import (
+    compat_b64decode,
      compat_ord,
      compat_str,
      compat_urllib_parse_urlencode,
@@ -329,7 +329,7 @@ class LetvCloudIE(InfoExtractor):
                  raise ExtractorError('Letv cloud returned an unknwon error')
  
          def b64decode(s):
-            return base64.b64decode(s.encode('utf-8')).decode('utf-8')
+            return compat_b64decode(s).decode('utf-8')
  
          formats = []
          for media in play_json['data']['video_info']['media'].values():
diff --git a/youtube_dl/extractor/limelight.py b/youtube_dl/extractor/limelight.py
index ad65b2759d7245d2129e238f703a8e66a6cb333c..2803d7e8df47c92003ccd11a23c951af1644a7a1 100644
--- a/youtube_dl/extractor/limelight.py
+++ b/youtube_dl/extractor/limelight.py
@@ -10,6 +10,7 @@ from ..utils import (
      float_or_none,
      int_or_none,
      smuggle_url,
+    try_get,
      unsmuggle_url,
      ExtractorError,
  )
@@ -220,6 +221,12 @@ class LimelightBaseIE(InfoExtractor):
              'subtitles': subtitles,
          }
  
+    def _extract_info_helper(self, pc, mobile, i, metadata):
+        return self._extract_info(
+            try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
+            try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
+            metadata)
+
  
  class LimelightMediaIE(LimelightBaseIE):
      IE_NAME = 'limelight'
@@ -282,10 +289,7 @@ class LimelightMediaIE(LimelightBaseIE):
              'getMobilePlaylistByMediaId', 'properties',
              smuggled_data.get('source_url'))
  
-        return self._extract_info(
-            pc['playlistItems'][0].get('streams', []),
-            mobile['mediaList'][0].get('mobileUrls', []) if mobile else [],
-            metadata)
+        return self._extract_info_helper(pc, mobile, 0, metadata)
  
  
  class LimelightChannelIE(LimelightBaseIE):
@@ -326,10 +330,7 @@ class LimelightChannelIE(LimelightBaseIE):
              'media', smuggled_data.get('source_url'))
  
          entries = [
-            self._extract_info(
-                pc['playlistItems'][i].get('streams', []),
-                mobile['mediaList'][i].get('mobileUrls', []) if mobile else [],
-                medias['media_list'][i])
+            self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
              for i in range(len(medias['media_list']))]
  
          return self.playlist_result(entries, channel_id, pc['title'])
diff --git a/youtube_dl/extractor/lynda.py b/youtube_dl/extractor/lynda.py
index 1b6f5091d36eecb0f241b95e38739d4b967d6c34..f5c7abc13eb431605355c080ade55da197fc04e3 100644
--- a/youtube_dl/extractor/lynda.py
+++ b/youtube_dl/extractor/lynda.py
@@ -94,7 +94,15 @@ class LyndaBaseIE(InfoExtractor):
  class LyndaIE(LyndaBaseIE):
      IE_NAME = 'lynda'
      IE_DESC = 'lynda.com videos'
-    _VALID_URL = r'https?://(?:www\.)?(?:lynda\.com|educourse\.ga)/(?:[^/]+/[^/]+/(?P<course_id>\d+)|player/embed)/(?P<id>\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?(?:lynda\.com|educourse\.ga)/
+                        (?:
+                            (?:[^/]+/){2,3}(?P<course_id>\d+)|
+                            player/embed
+                        )/
+                        (?P<id>\d+)
+                    '''
  
      _TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'
  
@@ -113,6 +121,9 @@ class LyndaIE(LyndaBaseIE):
      }, {
          'url': 'https://educourse.ga/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
          'only_matching': True,
+    }, {
+        'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html',
+        'only_matching': True,
      }]
  
      def _raise_unavailable(self, video_id):
@@ -244,8 +255,9 @@ class LyndaIE(LyndaBaseIE):
      def _get_subtitles(self, video_id):
          url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
          subs = self._download_json(url, None, False)
-        if subs:
-            return {'en': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]}
+        fixed_subs = self._fix_subtitles(subs)
+        if fixed_subs:
+            return {'en': [{'ext': 'srt', 'data': fixed_subs}]}
          else:
              return {}
  
@@ -256,7 +268,15 @@ class LyndaCourseIE(LyndaBaseIE):
  
      # Course link equals to welcome/introduction video link of same course
      # We will recognize it as course link
-    _VALID_URL = r'https?://(?:www|m)\.(?:lynda\.com|educourse\.ga)/(?P<coursepath>[^/]+/[^/]+/(?P<courseid>\d+))-\d\.html'
+    _VALID_URL = r'https?://(?:www|m)\.(?:lynda\.com|educourse\.ga)/(?P<coursepath>(?:[^/]+/){2,3}(?P<courseid>\d+))-2\.html'
+
+    _TESTS = [{
+        'url': 'https://www.lynda.com/Graphic-Design-tutorials/Grundlagen-guten-Gestaltung/393570-2.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Grundlagen-guten-Gestaltung/393570-2.html',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
diff --git a/youtube_dl/extractor/mangomolo.py b/youtube_dl/extractor/mangomolo.py
index dbd761a67864036cd84be24fa9ef432433c08202..482175a3412db859e32cb79893df1c830afa17e7 100644
--- a/youtube_dl/extractor/mangomolo.py
+++ b/youtube_dl/extractor/mangomolo.py
@@ -1,13 +1,12 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
-
  from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
-from ..utils import (
-    int_or_none,
+from ..compat import (
+    compat_b64decode,
+    compat_urllib_parse_unquote,
  )
+from ..utils import int_or_none
  
  
  class MangomoloBaseIE(InfoExtractor):
@@ -51,4 +50,4 @@ class MangomoloLiveIE(MangomoloBaseIE):
      _IS_LIVE = True
  
      def _get_real_id(self, page_id):
-        return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()
+        return compat_b64decode(compat_urllib_parse_unquote(page_id)).decode()
diff --git a/youtube_dl/extractor/mitele.py b/youtube_dl/extractor/mitele.py
index 964dc542cc06cc1128fa11693c1d9a4a64271672..42759eae87cecd940c7b8edd8b9aabbb578e9039 100644
--- a/youtube_dl/extractor/mitele.py
+++ b/youtube_dl/extractor/mitele.py
@@ -1,13 +1,13 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import json
  import uuid
  
  from .common import InfoExtractor
  from .ooyala import OoyalaIE
  from ..compat import (
      compat_str,
-    compat_urllib_parse_urlencode,
      compat_urlparse,
  )
  from ..utils import (
@@ -42,31 +42,33 @@ class MiTeleBaseIE(InfoExtractor):
                  duration = int_or_none(mmc.get('duration'))
              for location in mmc['locations']:
                  gat = self._proto_relative_url(location.get('gat'), 'http:')
-                bas = location.get('bas')
-                loc = location.get('loc')
+                gcp = location.get('gcp')
                  ogn = location.get('ogn')
-                if None in (gat, bas, loc, ogn):
+                if None in (gat, gcp, ogn):
                      continue
                  token_data = {
-                    'bas': bas,
-                    'icd': loc,
+                    'gcp': gcp,
                      'ogn': ogn,
-                    'sta': '0',
+                    'sta': 0,
                  }
                  media = self._download_json(
-                    '%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
-                    video_id, 'Downloading %s JSON' % location['loc'])
-                file_ = media.get('file')
-                if not file_:
+                    gat, video_id, data=json.dumps(token_data).encode('utf-8'),
+                    headers={
+                        'Content-Type': 'application/json;charset=utf-8',
+                        'Referer': url,
+                    })
+                stream = media.get('stream') or media.get('file')
+                if not stream:
                      continue
-                ext = determine_ext(file_)
+                ext = determine_ext(stream)
                  if ext == 'f4m':
                      formats.extend(self._extract_f4m_formats(
-                        file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
+                        stream + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
                          video_id, f4m_id='hds', fatal=False))
                  elif ext == 'm3u8':
                      formats.extend(self._extract_m3u8_formats(
-                        file_, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+                        stream, video_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
          self._sort_formats(formats)
  
          return {
diff --git a/youtube_dl/extractor/mixcloud.py b/youtube_dl/extractor/mixcloud.py
index 7b2bb6e20577929abb9097e2d05c13cfc141d4f9..a56b7690f8703cb873cba7d59cb097afee1b8121 100644
--- a/youtube_dl/extractor/mixcloud.py
+++ b/youtube_dl/extractor/mixcloud.py
@@ -1,12 +1,12 @@
  from __future__ import unicode_literals
  
-import base64
  import functools
  import itertools
  import re
  
  from .common import InfoExtractor
  from ..compat import (
+    compat_b64decode,
      compat_chr,
      compat_ord,
      compat_str,
@@ -79,7 +79,7 @@ class MixcloudIE(InfoExtractor):
  
          if encrypted_play_info is not None:
              # Decode
-            encrypted_play_info = base64.b64decode(encrypted_play_info)
+            encrypted_play_info = compat_b64decode(encrypted_play_info)
          else:
              # New path
              full_info_json = self._parse_json(self._html_search_regex(
@@ -109,7 +109,7 @@ class MixcloudIE(InfoExtractor):
              kpa_target = encrypted_play_info
          else:
              kps = ['https://', 'http://']
-            kpa_target = base64.b64decode(info_json['streamInfo']['url'])
+            kpa_target = compat_b64decode(info_json['streamInfo']['url'])
          for kp in kps:
              partial_key = self._decrypt_xor_cipher(kpa_target, kp)
              for quote in ["'", '"']:
@@ -165,7 +165,7 @@ class MixcloudIE(InfoExtractor):
                  format_url = stream_info.get(url_key)
                  if not format_url:
                      continue
-                decrypted = self._decrypt_xor_cipher(key, base64.b64decode(format_url))
+                decrypted = self._decrypt_xor_cipher(key, compat_b64decode(format_url))
                  if not decrypted:
                      continue
                  if url_key == 'hlsUrl':
diff --git a/youtube_dl/extractor/motherless.py b/youtube_dl/extractor/motherless.py
index 6fe3b6049b2917ed5d7b075d0ca2c7ae943c459f..e24396e791cfddf88e53a34639e563b54795694e 100644
--- a/youtube_dl/extractor/motherless.py
+++ b/youtube_dl/extractor/motherless.py
@@ -4,8 +4,11 @@ import datetime
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_urlparse
  from ..utils import (
      ExtractorError,
+    InAdvancePagedList,
+    orderedSet,
      str_to_int,
      unified_strdate,
  )
@@ -114,3 +117,86 @@ class MotherlessIE(InfoExtractor):
              'age_limit': age_limit,
              'url': video_url,
          }
+
+
+class MotherlessGroupIE(InfoExtractor):
+    _VALID_URL = 'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)'
+    _TESTS = [{
+        'url': 'http://motherless.com/g/movie_scenes',
+        'info_dict': {
+            'id': 'movie_scenes',
+            'title': 'Movie Scenes',
+            'description': 'Hot and sexy scenes from "regular" movies... '
+                           'Beautiful actresses fully nude... A looot of '
+                           'skin! :)Enjoy!',
+        },
+        'playlist_mincount': 662,
+    }, {
+        'url': 'http://motherless.com/gv/sex_must_be_funny',
+        'info_dict': {
+            'id': 'sex_must_be_funny',
+            'title': 'Sex must be funny',
+            'description': 'Sex can be funny. Wide smiles,laugh, games, fun of '
+                           'any kind!'
+        },
+        'playlist_mincount': 9,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return (False if MotherlessIE.suitable(url)
+                else super(MotherlessGroupIE, cls).suitable(url))
+
+    def _extract_entries(self, webpage, base):
+        entries = []
+        for mobj in re.finditer(
+                r'href="(?P<href>/[^"]+)"[^>]*>(?:\s*<img[^>]+alt="[^-]+-\s(?P<title>[^"]+)")?',
+                webpage):
+            video_url = compat_urlparse.urljoin(base, mobj.group('href'))
+            if not MotherlessIE.suitable(video_url):
+                continue
+            video_id = MotherlessIE._match_id(video_url)
+            title = mobj.group('title')
+            entries.append(self.url_result(
+                video_url, ie=MotherlessIE.ie_key(), video_id=video_id,
+                video_title=title))
+        # Alternative fallback
+        if not entries:
+            entries = [
+                self.url_result(
+                    compat_urlparse.urljoin(base, '/' + video_id),
+                    ie=MotherlessIE.ie_key(), video_id=video_id)
+                for video_id in orderedSet(re.findall(
+                    r'data-codename=["\']([A-Z0-9]+)', webpage))]
+        return entries
+
+    def _real_extract(self, url):
+        group_id = self._match_id(url)
+        page_url = compat_urlparse.urljoin(url, '/gv/%s' % group_id)
+        webpage = self._download_webpage(page_url, group_id)
+        title = self._search_regex(
+            r'<title>([\w\s]+\w)\s+-', webpage, 'title', fatal=False)
+        description = self._html_search_meta(
+            'description', webpage, fatal=False)
+        page_count = self._int(self._search_regex(
+            r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
+            webpage, 'page_count'), 'page_count')
+        PAGE_SIZE = 80
+
+        def _get_page(idx):
+            webpage = self._download_webpage(
+                page_url, group_id, query={'page': idx + 1},
+                note='Downloading page %d/%d' % (idx + 1, page_count)
+            )
+            for entry in self._extract_entries(webpage, url):
+                yield entry
+
+        playlist = InAdvancePagedList(_get_page, page_count, PAGE_SIZE)
+
+        return {
+            '_type': 'playlist',
+            'id': group_id,
+            'title': title,
+            'description': description,
+            'entries': playlist
+        }
diff --git a/youtube_dl/extractor/ndr.py b/youtube_dl/extractor/ndr.py
index 07528d140f38bfa68a0d04cb85978d1017bae547..aec2ea1331f3c909957e50d4166e7657618fa1a6 100644
--- a/youtube_dl/extractor/ndr.py
+++ b/youtube_dl/extractor/ndr.py
@@ -190,10 +190,12 @@ class NDREmbedBaseIE(InfoExtractor):
              ext = determine_ext(src, None)
              if ext == 'f4m':
                  formats.extend(self._extract_f4m_formats(
-                    src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, f4m_id='hds'))
+                    src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id,
+                    f4m_id='hds', fatal=False))
              elif ext == 'm3u8':
                  formats.extend(self._extract_m3u8_formats(
-                    src, video_id, 'mp4', m3u8_id='hls', entry_protocol='m3u8_native'))
+                    src, video_id, 'mp4', m3u8_id='hls',
+                    entry_protocol='m3u8_native', fatal=False))
              else:
                  quality = f.get('quality')
                  ff = {
diff --git a/youtube_dl/extractor/odnoklassniki.py b/youtube_dl/extractor/odnoklassniki.py
index 8e13bcf1fe4674e803f2a87f177980779fa62637..5c8b37e18bf6232bfd70c6221b6da925411b9fe7 100644
--- a/youtube_dl/extractor/odnoklassniki.py
+++ b/youtube_dl/extractor/odnoklassniki.py
@@ -19,11 +19,11 @@ from ..utils import (
  
  
  class OdnoklassnikiIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
+    _VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer|live)/(?P<id>[\d-]+)'
      _TESTS = [{
          # metadata in JSON
          'url': 'http://ok.ru/video/20079905452',
-        'md5': '6ba728d85d60aa2e6dd37c9e70fdc6bc',
+        'md5': '0b62089b479e06681abaaca9d204f152',
          'info_dict': {
              'id': '20079905452',
              'ext': 'mp4',
@@ -35,7 +35,6 @@ class OdnoklassnikiIE(InfoExtractor):
              'like_count': int,
              'age_limit': 0,
          },
-        'skip': 'Video has been blocked',
      }, {
          # metadataUrl
          'url': 'http://ok.ru/video/63567059965189-0?fromTime=5',
@@ -99,6 +98,9 @@ class OdnoklassnikiIE(InfoExtractor):
      }, {
          'url': 'http://mobile.ok.ru/video/20079905452',
          'only_matching': True,
+    }, {
+        'url': 'https://www.ok.ru/live/484531969818',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -184,6 +186,10 @@ class OdnoklassnikiIE(InfoExtractor):
              })
              return info
  
+        assert title
+        if provider == 'LIVE_TV_APP':
+            info['title'] = self._live_title(title)
+
          quality = qualities(('4', '0', '1', '2', '3', '5'))
  
          formats = [{
@@ -210,6 +216,20 @@ class OdnoklassnikiIE(InfoExtractor):
              if fmt_type:
                  fmt['quality'] = quality(fmt_type)
  
+        # Live formats
+        m3u8_url = metadata.get('hlsMasterPlaylistUrl')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8',
+                m3u8_id='hls', fatal=False))
+        rtmp_url = metadata.get('rtmpUrl')
+        if rtmp_url:
+            formats.append({
+                'url': rtmp_url,
+                'format_id': 'rtmp',
+                'ext': 'flv',
+            })
+
          self._sort_formats(formats)
  
          info['formats'] = formats
diff --git a/youtube_dl/extractor/ooyala.py b/youtube_dl/extractor/ooyala.py
index 52580baed3c49242be55529f8c75f5d110ff41f7..ad8bf03f86a8bf3511fdc864ba462337f7bb51ec 100644
--- a/youtube_dl/extractor/ooyala.py
+++ b/youtube_dl/extractor/ooyala.py
@@ -1,9 +1,13 @@
  from __future__ import unicode_literals
+
  import re
-import base64
  
  from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+    compat_b64decode,
+    compat_str,
+    compat_urllib_parse_urlencode,
+)
  from ..utils import (
      determine_ext,
      ExtractorError,
@@ -12,7 +16,6 @@ from ..utils import (
      try_get,
      unsmuggle_url,
  )
-from ..compat import compat_urllib_parse_urlencode
  
  
  class OoyalaBaseIE(InfoExtractor):
@@ -44,7 +47,7 @@ class OoyalaBaseIE(InfoExtractor):
                  url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
                  if not url_data:
                      continue
-                s_url = base64.b64decode(url_data.encode('ascii')).decode('utf-8')
+                s_url = compat_b64decode(url_data).decode('utf-8')
                  if not s_url or s_url in urls:
                      continue
                  urls.append(s_url)
diff --git a/youtube_dl/extractor/openload.py b/youtube_dl/extractor/openload.py
index b282bcfd9d2abc950c4a0e24fe808adfbf1a0e41..eaaaf8a081782ae597f2ed6a376c09fca6fbf5e5 100644
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -333,7 +333,11 @@ class OpenloadIE(InfoExtractor):
          webpage, _ = phantom.get(page_url, html=webpage, video_id=video_id, headers=headers)
  
          decoded_id = (get_element_by_id('streamurl', webpage) or
-                      get_element_by_id('streamuri', webpage))
+                      get_element_by_id('streamuri', webpage) or
+                      get_element_by_id('streamurj', webpage))
+
+        if not decoded_id:
+            raise ExtractorError('Can\'t find stream URL', video_id=video_id)
  
          video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id
  
diff --git a/youtube_dl/extractor/pandoratv.py b/youtube_dl/extractor/pandoratv.py
index fc7bd34112f8863b741a67e2b97d473c7fc55e55..538738c090a515c296969dab9762ee1d643d1382 100644
--- a/youtube_dl/extractor/pandoratv.py
+++ b/youtube_dl/extractor/pandoratv.py
@@ -1,6 +1,8 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..compat import (
      compat_str,
@@ -18,7 +20,14 @@ from ..utils import (
  class PandoraTVIE(InfoExtractor):
      IE_NAME = 'pandora.tv'
      IE_DESC = '판도라TV'
-    _VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?'
+    _VALID_URL = r'''(?x)
+                        https?://
+                            (?:
+                                (?:www\.)?pandora\.tv/view/(?P<user_id>[^/]+)/(?P<id>\d+)|  # new format
+                                (?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?|        # old format
+                                m\.pandora\.tv/?\?                                          # mobile
+                            )
+                    '''
      _TESTS = [{
          'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2',
          'info_dict': {
@@ -53,14 +62,25 @@ class PandoraTVIE(InfoExtractor):
              # Test metadata only
              'skip_download': True,
          },
+    }, {
+        'url': 'http://www.pandora.tv/view/mikakim/53294230#36797454_new',
+        'only_matching': True,
+    }, {
+        'url': 'http://m.pandora.tv/?c=view&ch_userid=mikakim&prgid=54600346',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
-        video_id = qs.get('prgid', [None])[0]
-        user_id = qs.get('ch_userid', [None])[0]
-        if any(not f for f in (video_id, user_id,)):
-            raise ExtractorError('Invalid URL', expected=True)
+        mobj = re.match(self._VALID_URL, url)
+        user_id = mobj.group('user_id')
+        video_id = mobj.group('id')
+
+        if not user_id or not video_id:
+            qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+            video_id = qs.get('prgid', [None])[0]
+            user_id = qs.get('ch_userid', [None])[0]
+            if any(not f for f in (video_id, user_id,)):
+                raise ExtractorError('Invalid URL', expected=True)
  
          data = self._download_json(
              'http://m.pandora.tv/?c=view&m=viewJsonApi&ch_userid=%s&prgid=%s'
diff --git a/youtube_dl/extractor/prosiebensat1.py b/youtube_dl/extractor/prosiebensat1.py
index d8a4bd2443304e2a890dbdcc873cb74a257d6a51..48757fd4f71de7ee0c2682f818f3976d72080741 100644
--- a/youtube_dl/extractor/prosiebensat1.py
+++ b/youtube_dl/extractor/prosiebensat1.py
@@ -344,6 +344,8 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
          r'clip[iI]d=(\d+)',
          r'clip[iI]d\s*=\s*["\'](\d+)',
          r"'itemImageUrl'\s*:\s*'/dynamic/thumbnails/full/\d+/(\d+)",
+        r'proMamsId&quot;\s*:\s*&quot;(\d+)',
+        r'proMamsId"\s*:\s*"(\d+)',
      ]
      _TITLE_REGEXES = [
          r'<h2 class="subtitle" itemprop="name">\s*(.+?)</h2>',
diff --git a/youtube_dl/extractor/restudy.py b/youtube_dl/extractor/restudy.py
index fd50065d4ad05d668919f2c0392e99f3ab637032..d47fb45ca5f7ce73e2905136f579cdaa6937fb32 100644
--- a/youtube_dl/extractor/restudy.py
+++ b/youtube_dl/extractor/restudy.py
@@ -5,8 +5,8 @@ from .common import InfoExtractor
  
  
  class RestudyIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?restudy\.dk/video/play/id/(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:(?:www|portal)\.)?restudy\.dk/video/[^/]+/id/(?P<id>[0-9]+)'
+    _TESTS = [{
          'url': 'https://www.restudy.dk/video/play/id/1637',
          'info_dict': {
              'id': '1637',
@@ -18,7 +18,10 @@ class RestudyIE(InfoExtractor):
              # rtmp download
              'skip_download': True,
          }
-    }
+    }, {
+        'url': 'https://portal.restudy.dk/video/leiden-frosteffekt/id/1637',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -29,7 +32,7 @@ class RestudyIE(InfoExtractor):
          description = self._og_search_description(webpage).strip()
  
          formats = self._extract_smil_formats(
-            'https://www.restudy.dk/awsmedia/SmilDirectory/video_%s.xml' % video_id,
+            'https://cdn.portal.restudy.dk/dynamic/themes/front/awsmedia/SmilDirectory/video_%s.xml' % video_id,
              video_id)
          self._sort_formats(formats)
  
diff --git a/youtube_dl/extractor/ringtv.py b/youtube_dl/extractor/ringtv.py
deleted file mode 100644
index 2c2c707..0000000
--- a/youtube_dl/extractor/ringtv.py
+++ /dev/null
@@ -1,44 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class RingTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ringtv\.craveonline\.com/(?P<type>news|videos/video)/(?P<id>[^/?#]+)'
-    _TEST = {
-        'url': 'http://ringtv.craveonline.com/news/310833-luis-collazo-says-victor-ortiz-better-not-quit-on-jan-30',
-        'md5': 'd25945f5df41cdca2d2587165ac28720',
-        'info_dict': {
-            'id': '857645',
-            'ext': 'mp4',
-            'title': 'Video: Luis Collazo says Victor Ortiz "better not quit on Jan. 30" - Ring TV',
-            'description': 'Luis Collazo is excited about his Jan. 30 showdown with fellow former welterweight titleholder Victor Ortiz at Barclays Center in his hometown of Brooklyn. The SuperBowl week fight headlines a Golden Boy Live! card on Fox Sports 1.',
-        }
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id').split('-')[0]
-        webpage = self._download_webpage(url, video_id)
-
-        if mobj.group('type') == 'news':
-            video_id = self._search_regex(
-                r'''(?x)<iframe[^>]+src="http://cms\.springboardplatform\.com/
-                        embed_iframe/[0-9]+/video/([0-9]+)/''',
-                webpage, 'real video ID')
-        title = self._og_search_title(webpage)
-        description = self._html_search_regex(
-            r'addthis:description="([^"]+)"',
-            webpage, 'description', fatal=False)
-        final_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/conversion/%s.mp4' % video_id
-        thumbnail_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/snapshots/%s.jpg' % video_id
-
-        return {
-            'id': video_id,
-            'url': final_url,
-            'title': title,
-            'thumbnail': thumbnail_url,
-            'description': description,
-        }
diff --git a/youtube_dl/extractor/rtl2.py b/youtube_dl/extractor/rtl2.py
index 666e90e9022892a3c91e64ca7fba83d59e524f92..18a327d81a44aa40640efdeeeb12283686d8a984 100644
--- a/youtube_dl/extractor/rtl2.py
+++ b/youtube_dl/extractor/rtl2.py
@@ -1,12 +1,12 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import re
  
  from .common import InfoExtractor
  from ..aes import aes_cbc_decrypt
  from ..compat import (
+    compat_b64decode,
      compat_ord,
      compat_str,
  )
@@ -142,11 +142,11 @@ class RTL2YouIE(RTL2YouBaseIE):
          stream_data = self._download_json(
              self._BACKWERK_BASE_URL + 'stream/video/' + video_id, video_id)
  
-        data, iv = base64.b64decode(stream_data['streamUrl']).decode().split(':')
+        data, iv = compat_b64decode(stream_data['streamUrl']).decode().split(':')
          stream_url = intlist_to_bytes(aes_cbc_decrypt(
-            bytes_to_intlist(base64.b64decode(data)),
+            bytes_to_intlist(compat_b64decode(data)),
              bytes_to_intlist(self._AES_KEY),
-            bytes_to_intlist(base64.b64decode(iv))
+            bytes_to_intlist(compat_b64decode(iv))
          ))
          if b'rtl2_you_video_not_found' in stream_url:
              raise ExtractorError('video not found', expected=True)
diff --git a/youtube_dl/extractor/rtve.py b/youtube_dl/extractor/rtve.py
index d9edf9da2f71cf6fbcf3c8bfd310c4818eced62c..ce9db0629459d23ea0cc870a1bde348d21e99e9a 100644
--- a/youtube_dl/extractor/rtve.py
+++ b/youtube_dl/extractor/rtve.py
@@ -7,6 +7,7 @@ import time
  
  from .common import InfoExtractor
  from ..compat import (
+    compat_b64decode,
      compat_struct_unpack,
  )
  from ..utils import (
@@ -21,7 +22,7 @@ from ..utils import (
  
  
  def _decrypt_url(png):
-    encrypted_data = base64.b64decode(png.encode('utf-8'))
+    encrypted_data = compat_b64decode(png)
      text_index = encrypted_data.find(b'tEXt')
      text_chunk = encrypted_data[text_index - 4:]
      length = compat_struct_unpack('!I', text_chunk[:4])[0]
@@ -31,6 +32,9 @@ def _decrypt_url(png):
      hash_index = data.index('#')
      alphabet_data = data[:hash_index]
      url_data = data[hash_index + 1:]
+    if url_data[0] == 'H' and url_data[3] == '%':
+        # remove useless HQ%% at the start
+        url_data = url_data[4:]
  
      alphabet = []
      e = 0
diff --git a/youtube_dl/extractor/rtvs.py b/youtube_dl/extractor/rtvs.py
new file mode 100644
index 0000000..6573b26
--- /dev/null
+++ b/youtube_dl/extractor/rtvs.py
@@ -0,0 +1,47 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class RTVSIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?rtvs\.sk/(?:radio|televizia)/archiv/\d+/(?P<id>\d+)'
+    _TESTS = [{
+        # radio archive
+        'url': 'http://www.rtvs.sk/radio/archiv/11224/414872',
+        'md5': '134d5d6debdeddf8a5d761cbc9edacb8',
+        'info_dict': {
+            'id': '414872',
+            'ext': 'mp3',
+            'title': 'Ostrov pokladov 1 časť.mp3'
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        # tv archive
+        'url': 'http://www.rtvs.sk/televizia/archiv/8249/63118',
+        'md5': '85e2c55cf988403b70cac24f5c086dc6',
+        'info_dict': {
+            'id': '63118',
+            'ext': 'mp4',
+            'title': 'Amaro Džives - Náš deň',
+            'description': 'Galavečer pri príležitosti Medzinárodného dňa Rómov.'
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        playlist_url = self._search_regex(
+            r'playlist["\']?\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+            'playlist url', group='url')
+
+        data = self._download_json(
+            playlist_url, video_id, 'Downloading playlist')[0]
+        return self._parse_jwplayer_data(data, video_id=video_id)
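Note on the rtvs.py extractor above: the playlist URL is located with a quote-aware pattern, where the opening quote is captured and a negative lookahead stops the match at the same quote character. A minimal sketch of that pattern in isolation, using only the standard re module; the function name and the sample markup in the usage line are hypothetical.

```python
import re

# Same pattern as in RTVSIE._real_extract: group 1 is the opening quote,
# and (?:(?!\1).)+ consumes characters until that same quote reappears.
PLAYLIST_RE = r'playlist["\']?\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1'


def find_playlist_url(webpage):
    """Return the quoted playlist URL from the page source, or None."""
    mobj = re.search(PLAYLIST_RE, webpage)
    return mobj.group('url') if mobj else None
```

For example, find_playlist_url('setup({"playlist": "//example.invalid/archive.json"})') returns the URL between the double quotes, and a single quote inside a double-quoted value does not end the match early.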
diff --git a/youtube_dl/extractor/seznamzpravy.py b/youtube_dl/extractor/seznamzpravy.py
new file mode 100644
index 0000000..cf32d1e
--- /dev/null
+++ b/youtube_dl/extractor/seznamzpravy.py
@@ -0,0 +1,170 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_parse_qs,
+    compat_str,
+    compat_urllib_parse_urlparse,
+)
+from ..utils import (
+    urljoin,
+    int_or_none,
+    parse_codecs,
+    try_get,
+)
+
+
+def _raw_id(src_url):
+    return compat_urllib_parse_urlparse(src_url).path.split('/')[-1]
+
+
+class SeznamZpravyIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?seznamzpravy\.cz/iframe/player\?.*\bsrc='
+    _TESTS = [{
+        'url': 'https://www.seznamzpravy.cz/iframe/player?duration=241&serviceSlug=zpravy&src=https%3A%2F%2Fv39-a.sdn.szn.cz%2Fv_39%2Fvmd%2F5999c902ea707c67d8e267a9%3Ffl%3Dmdk%2C432f65a0%7C&itemType=video&autoPlay=false&title=Sv%C4%9Bt%20bez%20obalu%3A%20%C4%8Ce%C5%A1t%C3%AD%20voj%C3%A1ci%20na%20mis%C3%ADch%20(kr%C3%A1tk%C3%A1%20verze)&series=Sv%C4%9Bt%20bez%20obalu&serviceName=Seznam%20Zpr%C3%A1vy&poster=%2F%2Fd39-a.sdn.szn.cz%2Fd_39%2Fc_img_F_I%2FR5puJ.jpeg%3Ffl%3Dcro%2C0%2C0%2C1920%2C1080%7Cres%2C1200%2C%2C1%7Cjpg%2C80%2C%2C1&width=1920&height=1080&cutFrom=0&cutTo=0&splVersion=VOD&contentId=170889&contextId=35990&showAdvert=true&collocation=&autoplayPossible=true&embed=&isVideoTooShortForPreroll=false&isVideoTooLongForPostroll=true&videoCommentOpKey=&videoCommentId=&version=4.0.76&dotService=zpravy&gemiusPrismIdentifier=bVc1ZIb_Qax4W2v5xOPGpMeCP31kFfrTzj0SqPTLh_b.Z7&zoneIdPreroll=seznam.pack.videospot&skipOffsetPreroll=5&sectionPrefixPreroll=%2Fzpravy',
+        'info_dict': {
+            'id': '170889',
+            'ext': 'mp4',
+            'title': 'Svět bez obalu: Čeští vojáci na misích (krátká verze)',
+            'thumbnail': r're:^https?://.*\.jpe?g',
+            'duration': 241,
+            'series': 'Svět bez obalu',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # with Location key
+        'url': 'https://www.seznamzpravy.cz/iframe/player?duration=null&serviceSlug=zpravy&src=https%3A%2F%2Flive-a.sdn.szn.cz%2Fv_39%2F59e468fe454f8472a96af9fa%3Ffl%3Dmdk%2C5c1e2840%7C&itemType=livevod&autoPlay=false&title=P%C5%99edseda%20KDU-%C4%8CSL%20Pavel%20B%C4%9Blobr%C3%A1dek%20ve%20volebn%C3%AD%20V%C3%BDzv%C4%9B%20Seznamu&series=V%C3%BDzva&serviceName=Seznam%20Zpr%C3%A1vy&poster=%2F%2Fd39-a.sdn.szn.cz%2Fd_39%2Fc_img_G_J%2FjTBCs.jpeg%3Ffl%3Dcro%2C0%2C0%2C1280%2C720%7Cres%2C1200%2C%2C1%7Cjpg%2C80%2C%2C1&width=16&height=9&cutFrom=0&cutTo=0&splVersion=VOD&contentId=185688&contextId=38489&showAdvert=true&collocation=&hideFullScreen=false&hideSubtitles=false&embed=&isVideoTooShortForPreroll=false&isVideoTooShortForPreroll2=false&isVideoTooLongForPostroll=false&fakePostrollZoneID=seznam.clanky.zpravy.preroll&fakePrerollZoneID=seznam.clanky.zpravy.preroll&videoCommentId=&trim=default_16x9&noPrerollVideoLength=30&noPreroll2VideoLength=undefined&noMidrollVideoLength=0&noPostrollVideoLength=999999&autoplayPossible=true&version=5.0.41&dotService=zpravy&gemiusPrismIdentifier=zD3g7byfW5ekpXmxTVLaq5Srjw5i4hsYo0HY1aBwIe..27&zoneIdPreroll=seznam.pack.videospot&skipOffsetPreroll=5&sectionPrefixPreroll=%2Fzpravy%2Fvyzva&zoneIdPostroll=seznam.pack.videospot&skipOffsetPostroll=5&sectionPrefixPostroll=%2Fzpravy%2Fvyzva&regression=false',
+        'info_dict': {
+            'id': '185688',
+            'ext': 'mp4',
+            'title': 'Předseda KDU-ČSL Pavel Bělobrádek ve volební Výzvě Seznamu',
+            'thumbnail': r're:^https?://.*\.jpe?g',
+            'series': 'Výzva',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [
+            mobj.group('url') for mobj in re.finditer(
+                r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?seznamzpravy\.cz/iframe/player\?.*?)\1',
+                webpage)]
+
+    def _extract_sdn_formats(self, sdn_url, video_id):
+        sdn_data = self._download_json(sdn_url, video_id)
+
+        if sdn_data.get('Location'):
+            sdn_url = sdn_data['Location']
+            sdn_data = self._download_json(sdn_url, video_id)
+
+        formats = []
+        mp4_formats = try_get(sdn_data, lambda x: x['data']['mp4'], dict) or {}
+        for format_id, format_data in mp4_formats.items():
+            relative_url = format_data.get('url')
+            if not relative_url:
+                continue
+
+            try:
+                width, height = format_data.get('resolution')
+            except (TypeError, ValueError):
+                width, height = None, None
+
+            f = {
+                'url': urljoin(sdn_url, relative_url),
+                'format_id': 'http-%s' % format_id,
+                'tbr': int_or_none(format_data.get('bandwidth'), scale=1000),
+                'width': int_or_none(width),
+                'height': int_or_none(height),
+            }
+            f.update(parse_codecs(format_data.get('codec')))
+            formats.append(f)
+
+        pls = sdn_data.get('pls', {})
+
+        def get_url(format_id):
+            return try_get(pls, lambda x: x[format_id]['url'], compat_str)
+
+        dash_rel_url = get_url('dash')
+        if dash_rel_url:
+            formats.extend(self._extract_mpd_formats(
+                urljoin(sdn_url, dash_rel_url), video_id, mpd_id='dash',
+                fatal=False))
+
+        hls_rel_url = get_url('hls')
+        if hls_rel_url:
+            formats.extend(self._extract_m3u8_formats(
+                urljoin(sdn_url, hls_rel_url), video_id, ext='mp4',
+                m3u8_id='hls', fatal=False))
+
+        self._sort_formats(formats)
+        return formats
+
+    def _real_extract(self, url):
+        params = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
+
+        src = params['src'][0]
+        title = params['title'][0]
+        video_id = params.get('contentId', [_raw_id(src)])[0]
+        formats = self._extract_sdn_formats(src + 'spl2,2,VOD', video_id)
+
+        duration = int_or_none(params.get('duration', [None])[0])
+        series = params.get('series', [None])[0]
+        thumbnail = params.get('poster', [None])[0]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'series': series,
+            'formats': formats,
+        }
+
+
+class SeznamZpravyArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?(?:seznam\.cz/zpravy|seznamzpravy\.cz)/clanek/(?:[^/?#&]+)-(?P<id>\d+)'
+    _API_URL = 'https://apizpravy.seznam.cz/'
+
+    _TESTS = [{
+        # two videos on one page, with SDN URL
+        'url': 'https://www.seznamzpravy.cz/clanek/jejich-svet-na-nas-utoci-je-lepsi-branit-se-na-jejich-pisecku-rika-reziser-a-major-v-zaloze-marhoul-35990',
+        'info_dict': {
+            'id': '35990',
+            'title': 'md5:6011c877a36905f28f271fcd8dcdb0f2',
+            'description': 'md5:933f7b06fa337a814ba199d3596d27ba',
+        },
+        'playlist_count': 2,
+    }, {
+        # video with live stream URL
+        'url': 'https://www.seznam.cz/zpravy/clanek/znovu-do-vlady-s-ano-pavel-belobradek-ve-volebnim-specialu-seznamu-38489',
+        'info_dict': {
+            'id': '38489',
+            'title': 'md5:8fa1afdc36fd378cf0eba2b74c5aca60',
+            'description': 'md5:428e7926a1a81986ec7eb23078004fb4',
+        },
+        'playlist_count': 1,
+    }]
+
+    def _real_extract(self, url):
+        article_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, article_id)
+
+        info = self._search_json_ld(webpage, article_id, default={})
+        print(info)
+
+        title = info.get('title') or self._og_search_title(webpage, fatal=False)
+        description = info.get('description') or self._og_search_description(webpage)
+
+        return self.playlist_result([
+            self.url_result(url, ie=SeznamZpravyIE.ie_key())
+            for url in SeznamZpravyIE._extract_urls(webpage)],
+            article_id, title, description)
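Note on the seznamzpravy.py extractor above: SeznamZpravyIE takes all of its metadata from the player URL's query string and only downloads the SDN JSON for formats. A minimal Python 3 sketch of that parsing step, using urllib.parse in place of the compat wrappers; player_params and its return shape are hypothetical, and the duration and thumbnail fields are left out for brevity.

```python
from urllib.parse import parse_qs, urlparse


def raw_id(src_url):
    # Same idea as _raw_id in the diff: last path component of the SDN URL.
    return urlparse(src_url).path.split('/')[-1]


def player_params(url):
    """Pull the fields SeznamZpravyIE needs out of an iframe player URL."""
    params = parse_qs(urlparse(url).query)  # values are lists of strings
    src = params['src'][0]
    return {
        # contentId wins; otherwise fall back to the id embedded in src
        'id': params.get('contentId', [raw_id(src)])[0],
        'title': params['title'][0],
        # the extractor appends this suffix before requesting the SDN JSON
        'sdn_url': src + 'spl2,2,VOD',
        'series': params.get('series', [None])[0],
    }
```

Because parse_qs percent-decodes values, src comes back as a usable URL without further unquoting. Separately, the print(info) call in SeznamZpravyArticleIE._real_extract appears to be leftover debugging output.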
diff --git a/youtube_dl/extractor/shared.py b/youtube_dl/extractor/shared.py
index 89e19e9277f42b69ac6f01b03c78ced00f3ec990..b2250afddd43ee01d9552ddd41a9fb5067c5335e 100644
--- a/youtube_dl/extractor/shared.py
+++ b/youtube_dl/extractor/shared.py
@@ -1,8 +1,7 @@
  from __future__ import unicode_literals
  
-import base64
-
  from .common import InfoExtractor
+from ..compat import compat_b64decode
  from ..utils import (
      ExtractorError,
      int_or_none,
@@ -22,8 +21,8 @@ class SharedBaseIE(InfoExtractor):
  
          video_url = self._extract_video_url(webpage, video_id, url)
  
-        title = base64.b64decode(self._html_search_meta(
-            'full:title', webpage, 'title').encode('utf-8')).decode('utf-8')
+        title = compat_b64decode(self._html_search_meta(
+            'full:title', webpage, 'title')).decode('utf-8')
          filesize = int_or_none(self._html_search_meta(
              'full:size', webpage, 'file size', fatal=False))
  
@@ -92,5 +91,4 @@ class VivoIE(SharedBaseIE):
                  r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
                  webpage, 'stream', group='url'),
              video_id,
-            transform_source=lambda x: base64.b64decode(
-                x.encode('ascii')).decode('utf-8'))[0]
+            transform_source=lambda x: compat_b64decode(x).decode('utf-8'))[0]
diff --git a/youtube_dl/extractor/soundcloud.py b/youtube_dl/extractor/soundcloud.py
index 8894f4b0c32ee58894be125dbbd62d02ca6ae8fe..97ff422f04b7f68e2040a01607f974567e46e744 100644
--- a/youtube_dl/extractor/soundcloud.py
+++ b/youtube_dl/extractor/soundcloud.py
@@ -136,9 +136,28 @@ class SoundcloudIE(InfoExtractor):
                  'license': 'all-rights-reserved',
              },
          },
+        # no album art, use avatar pic for thumbnail
+        {
+            'url': 'https://soundcloud.com/garyvee/sideways-prod-mad-real',
+            'md5': '59c7872bc44e5d99b7211891664760c2',
+            'info_dict': {
+                'id': '309699954',
+                'ext': 'mp3',
+                'title': 'Sideways (Prod. Mad Real)',
+                'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
+                'uploader': 'garyvee',
+                'upload_date': '20170226',
+                'duration': 207,
+                'thumbnail': r're:https?://.*\.jpg',
+                'license': 'all-rights-reserved',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
      ]
  
-    _CLIENT_ID = 'c6CU49JDMapyrQo06UxU9xouB9ZVzqCn'
+    _CLIENT_ID = 'DQskPX1pntALRzMp4HSxya3Mc0AO66Ro'
      _IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
  
      @staticmethod
@@ -160,7 +179,7 @@ class SoundcloudIE(InfoExtractor):
          name = full_title or track_id
          if quiet:
              self.report_extraction(name)
-        thumbnail = info.get('artwork_url')
+        thumbnail = info.get('artwork_url') or info.get('user', {}).get('avatar_url')
          if isinstance(thumbnail, compat_str):
              thumbnail = thumbnail.replace('-large', '-t500x500')
          ext = 'mp3'
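Note on the soundcloud.py thumbnail change above: the new line falls back to the uploader's avatar when a track has no artwork. A minimal standalone sketch of that fallback; pick_thumbnail is a hypothetical name, and the `or {}` guard is a variation on the diff, which uses info.get('user', {}) and would raise if the 'user' key were present but null. Whether the API ever returns that is not known from this diff.

```python
def pick_thumbnail(info):
    """Prefer track artwork, otherwise the uploader's avatar."""
    # `or {}` also covers a 'user' key that exists but holds None.
    thumbnail = info.get('artwork_url') or (info.get('user') or {}).get('avatar_url')
    if isinstance(thumbnail, str):
        # Same size upgrade as the extractor applies to the chosen URL.
        thumbnail = thumbnail.replace('-large', '-t500x500')
    return thumbnail
```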
diff --git a/youtube_dl/extractor/southpark.py b/youtube_dl/extractor/southpark.py
index d8ce416fc7d1a9ec2e3561752890d916f2bcf93a..da75a43a730334e70f5725a1a41357810cf704de 100644
--- a/youtube_dl/extractor/southpark.py
+++ b/youtube_dl/extractor/southpark.py
@@ -6,7 +6,7 @@ from .mtv import MTVServicesInfoExtractor
  
  class SouthParkIE(MTVServicesInfoExtractor):
      IE_NAME = 'southpark.cc.com'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
  
      _FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
  
@@ -20,6 +20,9 @@ class SouthParkIE(MTVServicesInfoExtractor):
              'timestamp': 1112760000,
              'upload_date': '20050406',
          },
+    }, {
+        'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
+        'only_matching': True,
      }]
  
  
@@ -41,7 +44,7 @@ class SouthParkEsIE(SouthParkIE):
  
  class SouthParkDeIE(SouthParkIE):
      IE_NAME = 'southpark.de'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.de/(?:clips|alle-episoden)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.de/(?:clips|alle-episoden|collections)/(?P<id>.+?)(\?|#|$))'
      _FEED_URL = 'http://www.southpark.de/feeds/video-player/mrss/'
  
      _TESTS = [{
@@ -70,12 +73,15 @@ class SouthParkDeIE(SouthParkIE):
              'description': 'Kyle will mit seinem kleinen Bruder Ike Videospiele spielen. Als der nicht mehr mit ihm spielen will, hat Kyle Angst, dass er die Kids von heute nicht mehr versteht.',
          },
          'playlist_count': 3,
+    }, {
+        'url': 'http://www.southpark.de/collections/2476/superhero-showdown/1',
+        'only_matching': True,
      }]
  
  
  class SouthParkNlIE(SouthParkIE):
      IE_NAME = 'southpark.nl'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
      _FEED_URL = 'http://www.southpark.nl/feeds/video-player/mrss/'
  
      _TESTS = [{
@@ -90,7 +96,7 @@ class SouthParkNlIE(SouthParkIE):
  
  class SouthParkDkIE(SouthParkIE):
      IE_NAME = 'southparkstudios.dk'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southparkstudios\.dk/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southparkstudios\.(?:dk|nu)/(?:clips|full-episodes|collections)/(?P<id>.+?)(\?|#|$))'
      _FEED_URL = 'http://www.southparkstudios.dk/feeds/video-player/mrss/'
  
      _TESTS = [{
@@ -100,4 +106,10 @@ class SouthParkDkIE(SouthParkIE):
              'description': 'Butters is convinced he\'s living in a virtual reality.',
          },
          'playlist_mincount': 3,
+    }, {
+        'url': 'http://www.southparkstudios.dk/collections/2476/superhero-showdown/1',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.southparkstudios.nu/collections/2476/superhero-showdown/1',
+        'only_matching': True,
      }]
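Note on the southpark.py patterns above: the widened _VALID_URL for SouthParkDkIE can be checked outside the test harness with the standard re module. This is a minimal sketch; match_id is a hypothetical stand-in for the extractor's own URL matching.

```python
import re

# Pattern copied from the new SouthParkDkIE._VALID_URL in the diff.
VALID_URL = (r'https?://(?:www\.)?(?P<url>southparkstudios\.(?:dk|nu)/'
             r'(?:clips|full-episodes|collections)/(?P<id>.+?)(\?|#|$))')


def match_id(url):
    """Return the 'id' group if the URL is accepted, else None."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```

The lazy `.+?` in the id group runs up to the first `?`, `#`, or end of string, so a collection URL yields the whole remaining path as its id.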
diff --git a/youtube_dl/extractor/spiegel.py b/youtube_dl/extractor/spiegel.py
index 84298fee4279bb1484f5e52edbfa03f0bc148b01..fc995e8c14da760dc33c706bfadba532bd86b05d 100644
--- a/youtube_dl/extractor/spiegel.py
+++ b/youtube_dl/extractor/spiegel.py
@@ -4,7 +4,10 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from .nexx import NexxEmbedIE
+from .nexx import (
+    NexxIE,
+    NexxEmbedIE,
+)
  from .spiegeltv import SpiegeltvIE
  from ..compat import compat_urlparse
  from ..utils import (
@@ -51,6 +54,10 @@ class SpiegelIE(InfoExtractor):
      }, {
          'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-iframe.html',
          'only_matching': True,
+    }, {
+        # nexx video
+        'url': 'http://www.spiegel.de/video/spiegel-tv-magazin-ueber-guellekrise-in-schleswig-holstein-video-99012776.html',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -61,6 +68,14 @@ class SpiegelIE(InfoExtractor):
          if SpiegeltvIE.suitable(handle.geturl()):
              return self.url_result(handle.geturl(), 'Spiegeltv')
  
+        nexx_id = self._search_regex(
+            r'nexxOmniaId\s*:\s*(\d+)', webpage, 'nexx id', default=None)
+        if nexx_id:
+            domain_id = NexxIE._extract_domain_id(webpage) or '748'
+            return self.url_result(
+                'nexx:%s:%s' % (domain_id, nexx_id), ie=NexxIE.ie_key(),
+                video_id=nexx_id)
+
          video_data = extract_attributes(self._search_regex(r'(<div[^>]+id="spVideoElements"[^>]+>)', webpage, 'video element', default=''))
  
          title = video_data.get('data-video-title') or get_element_by_attribute('class', 'module-title', webpage)
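Note on the spiegel.py hunk above: pages that embed a nexx player are now delegated to NexxIE through an internal `nexx:<domain>:<id>` URL. A minimal sketch of that hand-off, assuming the domain lookup (NexxIE._extract_domain_id in the diff) found nothing, so the hard-coded fallback '748' applies; nexx_url is a hypothetical name.

```python
import re


def nexx_url(webpage, domain_id='748'):
    """Build the internal nexx URL, or return None when no id is embedded."""
    mobj = re.search(r'nexxOmniaId\s*:\s*(\d+)', webpage)
    if not mobj:
        return None
    # '748' mirrors the fallback domain id used in the diff.
    return 'nexx:%s:%s' % (domain_id, mobj.group(1))
```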
diff --git a/youtube_dl/extractor/sportschau.py b/youtube_dl/extractor/sportschau.py
deleted file mode 100644
index 0d7925a..0000000
--- a/youtube_dl/extractor/sportschau.py
+++ /dev/null
@@ -1,38 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .wdr import WDRBaseIE
-from ..utils import get_element_by_attribute
-
-
-class SportschauIE(WDRBaseIE):
-    IE_NAME = 'Sportschau'
-    _VALID_URL = r'https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video-?(?P<id>[^/#?]+)\.html'
-    _TEST = {
-        'url': 'http://www.sportschau.de/uefaeuro2016/videos/video-dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100.html',
-        'info_dict': {
-            'id': 'mdb-1140188',
-            'display_id': 'dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100',
-            'ext': 'mp4',
-            'title': 'DFB-Team geht gut gelaunt ins Spiel gegen Polen',
-            'description': 'Vor dem zweiten Gruppenspiel gegen Polen herrscht gute Stimmung im deutschen Team. Insbesondere Bastian Schweinsteiger strotzt vor Optimismus nach seinem Tor gegen die Ukraine.',
-            'upload_date': '20160615',
-        },
-        'skip': 'Geo-restricted to Germany',
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-        title = get_element_by_attribute('class', 'headline', webpage)
-        description = self._html_search_meta('description', webpage, 'description')
-
-        info = self._extract_wdr_video(webpage, video_id)
-
-        info.update({
-            'title': title,
-            'description': description,
-        })
-
-        return info
diff --git a/youtube_dl/extractor/springboardplatform.py b/youtube_dl/extractor/springboardplatform.py
new file mode 100644
index 0000000..07d99b5
--- /dev/null
+++ b/youtube_dl/extractor/springboardplatform.py
@@ -0,0 +1,125 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    xpath_attr,
+    xpath_text,
+    xpath_element,
+    unescapeHTML,
+    unified_timestamp,
+)
+
+
+class SpringboardPlatformIE(InfoExtractor):
+    _VALID_URL = r'''(?x)
+                    https?://
+                        cms\.springboardplatform\.com/
+                        (?:
+                            (?:previews|embed_iframe)/(?P<index>\d+)/video/(?P<id>\d+)|
+                            xml_feeds_advanced/index/(?P<index_2>\d+)/rss3/(?P<id_2>\d+)
+                        )
+                    '''
+    _TESTS = [{
+        'url': 'http://cms.springboardplatform.com/previews/159/video/981017/0/0/1',
+        'md5': '5c3cb7b5c55740d482561099e920f192',
+        'info_dict': {
+            'id': '981017',
+            'ext': 'mp4',
+            'title': 'Redman "BUD like YOU" "Usher Good Kisser" REMIX',
+            'description': 'Redman "BUD like YOU" "Usher Good Kisser" REMIX',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'timestamp': 1409132328,
+            'upload_date': '20140827',
+            'duration': 193,
+        },
+    }, {
+        'url': 'http://cms.springboardplatform.com/embed_iframe/159/video/981017/rab007/rapbasement.com/1/1',
+        'only_matching': True,
+    }, {
+        'url': 'http://cms.springboardplatform.com/embed_iframe/20/video/1731611/ki055/kidzworld.com/10',
+        'only_matching': True,
+    }, {
+        'url': 'http://cms.springboardplatform.com/xml_feeds_advanced/index/159/rss3/981017/0/0/1/',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [
+            mobj.group('url')
+            for mobj in re.finditer(
+                r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//cms\.springboardplatform\.com/embed_iframe/\d+/video/\d+.*?)\1',
+                webpage)]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id') or mobj.group('id_2')
+        index = mobj.group('index') or mobj.group('index_2')
+
+        video = self._download_xml(
+            'http://cms.springboardplatform.com/xml_feeds_advanced/index/%s/rss3/%s'
+            % (index, video_id), video_id)
+
+        item = xpath_element(video, './/item', 'item', fatal=True)
+
+        content = xpath_element(
+            item, './{http://search.yahoo.com/mrss/}content', 'content',
+            fatal=True)
+        title = unescapeHTML(xpath_text(item, './title', 'title', fatal=True))
+
+        video_url = content.attrib['url']
+
+        if 'error_video.mp4' in video_url:
+            raise ExtractorError(
+                'Video %s no longer exists' % video_id, expected=True)
+
+        duration = int_or_none(content.get('duration'))
+        tbr = int_or_none(content.get('bitrate'))
+        filesize = int_or_none(content.get('fileSize'))
+        width = int_or_none(content.get('width'))
+        height = int_or_none(content.get('height'))
+
+        description = unescapeHTML(xpath_text(
+            item, './description', 'description'))
+        thumbnail = xpath_attr(
+            item, './{http://search.yahoo.com/mrss/}thumbnail', 'url',
+            'thumbnail')
+
+        timestamp = unified_timestamp(xpath_text(
+            item, './{http://cms.springboardplatform.com/namespaces.html}created',
+            'timestamp'))
+
+        formats = [{
+            'url': video_url,
+            'format_id': 'http',
+            'tbr': tbr,
+            'filesize': filesize,
+            'width': width,
+            'height': height,
+        }]
+
+        m3u8_format = formats[0].copy()
+        m3u8_format.update({
+            'url': re.sub(r'(https?://)cdn\.', r'\1hls.', video_url) + '.m3u8',
+            'ext': 'mp4',
+            'format_id': 'hls',
+            'protocol': 'm3u8_native',
+        })
+        formats.append(m3u8_format)
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'timestamp': timestamp,
+            'duration': duration,
+            'formats': formats,
+        }
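Note on the springboardplatform.py extractor above: the HLS format is derived from the progressive URL by rewriting the host label and appending a suffix, without any extra request. A minimal sketch of that rewrite; hls_url is a hypothetical name, and the hosts in the usage line are placeholders.

```python
import re


def hls_url(video_url):
    """Swap a 'cdn.' host label for 'hls.' and append '.m3u8'."""
    # Same substitution as in the diff; \1 keeps the original scheme.
    return re.sub(r'(https?://)cdn\.', r'\1hls.', video_url) + '.m3u8'
```

For example, hls_url('http://cdn.example.invalid/v/981017.mp4') gives 'http://hls.example.invalid/v/981017.mp4.m3u8'. When the host does not start with "cdn.", only the suffix is added, so the derived URL may not resolve; the extractor appends the HLS entry unconditionally.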
diff --git a/youtube_dl/extractor/tbs.py b/youtube_dl/extractor/tbs.py
index eab22c38f187c50334cb43f48535e8982ad847a3..edc31729d35f250d2b8267e87719ea54ce9480e3 100644
--- a/youtube_dl/extractor/tbs.py
+++ b/youtube_dl/extractor/tbs.py
@@ -58,7 +58,7 @@ class TBSIE(TurnerBaseIE):
                  continue
              if stream_data.get('playlistProtection') == 'spe':
                  m3u8_url = self._add_akamai_spe_token(
-                    'http://www.%s.com/service/token_spe' % site,
+                    'http://token.vgtf.net/token/token_spe',
                      m3u8_url, media_id, {
                          'url': url,
                          'site_name': site[:3].upper(),
diff --git a/youtube_dl/extractor/teachertube.py b/youtube_dl/extractor/teachertube.py
index f14713a78904c0e879571d6642061f3baa7a617a..1272078c50b8703aa906e055fc57842078542e95 100644
--- a/youtube_dl/extractor/teachertube.py
+++ b/youtube_dl/extractor/teachertube.py
@@ -5,8 +5,9 @@ import re
  
  from .common import InfoExtractor
  from ..utils import (
-    qualities,
      determine_ext,
+    ExtractorError,
+    qualities,
  )
  
  
@@ -17,6 +18,7 @@ class TeacherTubeIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?teachertube\.com/(viewVideo\.php\?video_id=|music\.php\?music_id=|video/(?:[\da-z-]+-)?|audio/)(?P<id>\d+)'
  
      _TESTS = [{
+        # flowplayer
          'url': 'http://www.teachertube.com/viewVideo.php?video_id=339997',
          'md5': 'f9434ef992fd65936d72999951ee254c',
          'info_dict': {
@@ -24,19 +26,10 @@ class TeacherTubeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Measures of dispersion from a frequency table',
              'description': 'Measures of dispersion from a frequency table',
-            'thumbnail': r're:http://.*\.jpg',
-        },
-    }, {
-        'url': 'http://www.teachertube.com/viewVideo.php?video_id=340064',
-        'md5': '0d625ec6bc9bf50f70170942ad580676',
-        'info_dict': {
-            'id': '340064',
-            'ext': 'mp4',
-            'title': 'How to Make Paper Dolls _ Paper Art Projects',
-            'description': 'Learn how to make paper dolls in this simple',
-            'thumbnail': r're:http://.*\.jpg',
+            'thumbnail': r're:https?://.*\.(?:jpg|png)',
          },
      }, {
+        # jwplayer
          'url': 'http://www.teachertube.com/music.php?music_id=8805',
          'md5': '01e8352006c65757caf7b961f6050e21',
          'info_dict': {
@@ -46,20 +39,21 @@ class TeacherTubeIE(InfoExtractor):
              'description': 'RADIJSKA EMISIJA ZRAKOPLOVNE TEHNI?KE ?KOLE P',
          },
      }, {
+        # unavailable video
          'url': 'http://www.teachertube.com/video/intro-video-schleicher-297790',
-        'md5': '9c79fbb2dd7154823996fc28d4a26998',
-        'info_dict': {
-            'id': '297790',
-            'ext': 'mp4',
-            'title': 'Intro Video - Schleicher',
-            'description': 'Intro Video - Why to flip, how flipping will',
-        },
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
+        error = self._search_regex(
+            r'<div\b[^>]+\bclass=["\']msgBox error[^>]+>([^<]+)', webpage,
+            'error', default=None)
+        if error:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
+
          title = self._html_search_meta('title', webpage, 'title', fatal=True)
          TITLE_SUFFIX = ' - TeacherTube'
          if title.endswith(TITLE_SUFFIX):
@@ -84,12 +78,16 @@ class TeacherTubeIE(InfoExtractor):
  
          self._sort_formats(formats)
  
+        thumbnail = self._og_search_thumbnail(
+            webpage, default=None) or self._html_search_meta(
+            'thumbnail', webpage)
+
          return {
              'id': video_id,
              'title': title,
-            'thumbnail': self._html_search_regex(r'\'image\'\s*:\s*["\']([^"\']+)["\']', webpage, 'thumbnail'),
-            'formats': formats,
              'description': description,
+            'thumbnail': thumbnail,
+            'formats': formats,
          }
  
  
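Note on the teachertube.py hunks above: the extractor now looks for an error box before parsing anything else and raises an expected ExtractorError with the site's message. A minimal sketch of the detection step only; find_error is a hypothetical name, the sample markup in the usage line is invented, and the .strip() is an addition not present in the diff.

```python
import re

# Pattern copied from the diff: a <div> whose class starts with
# "msgBox error", capturing the text up to the next tag.
ERROR_RE = r'<div\b[^>]+\bclass=["\']msgBox error[^>]+>([^<]+)'


def find_error(webpage):
    """Return the site's error message, or None if no error box is present."""
    mobj = re.search(ERROR_RE, webpage)
    return mobj.group(1).strip() if mobj else None
```

For example, find_error('<div class="msgBox error"> Video unavailable </div>') returns 'Video unavailable', which the extractor would wrap as '<IE_NAME> said: ...'.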
diff --git a/youtube_dl/extractor/teamcoco.py b/youtube_dl/extractor/teamcoco.py
index 75346393b017995098d08136df2cbffad1e1c6bb..9056c8cbc29a4431a364be6b3731f5aef2f06529 100644
--- a/youtube_dl/extractor/teamcoco.py
+++ b/youtube_dl/extractor/teamcoco.py
@@ -1,18 +1,20 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import base64
  import binascii
  import re
  import json
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_b64decode,
+    compat_ord,
+)
  from ..utils import (
      ExtractorError,
      qualities,
      determine_ext,
  )
-from ..compat import compat_ord
  
  
  class TeamcocoIE(InfoExtractor):
@@ -97,7 +99,7 @@ class TeamcocoIE(InfoExtractor):
              for i in range(len(cur_fragments)):
                  cur_sequence = (''.join(cur_fragments[i:] + cur_fragments[:i])).encode('ascii')
                  try:
-                    raw_data = base64.b64decode(cur_sequence)
+                    raw_data = compat_b64decode(cur_sequence)
                      if compat_ord(raw_data[0]) == compat_ord('{'):
                          return json.loads(raw_data.decode('utf-8'))
                  except (TypeError, binascii.Error, UnicodeDecodeError, ValueError):
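Note on the teamcoco.py hunk above: the surrounding loop tries every rotation of the scraped base64 fragments and accepts the first one that decodes to something starting with `{`. A minimal Python 3 sketch of that search, using base64.b64decode directly rather than the compat helper; decode_rotated is a hypothetical name and the `continue` on failure is an assumption about the elided handler body.

```python
import base64
import binascii
import json


def decode_rotated(fragments):
    """Try each rotation of the fragment list until one decodes to JSON."""
    for i in range(len(fragments)):
        candidate = ''.join(fragments[i:] + fragments[:i]).encode('ascii')
        try:
            raw = base64.b64decode(candidate)
            # A JSON object must start with '{'; other rotations decode
            # to shifted bytes and are skipped.
            if raw[:1] == b'{':
                return json.loads(raw.decode('utf-8'))
        except (TypeError, binascii.Error, UnicodeDecodeError, ValueError):
            continue
    return None
```

The check on the first byte is what lets the loop reject rotations that are valid base64 but begin mid-document.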
diff --git a/youtube_dl/extractor/thesixtyone.py b/youtube_dl/extractor/thesixtyone.py
deleted file mode 100644
index d63aef5..0000000
--- a/youtube_dl/extractor/thesixtyone.py
+++ /dev/null
@@ -1,106 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import unified_strdate
-
-
-class TheSixtyOneIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?thesixtyone\.com/
-        (?:.*?/)*
-        (?:
-            s|
-            song/comments/list|
-            song
-        )/(?:[^/]+/)?(?P<id>[A-Za-z0-9]+)/?$'''
-    _SONG_URL_TEMPLATE = 'http://thesixtyone.com/s/{0:}'
-    _SONG_FILE_URL_TEMPLATE = 'http://{audio_server:}/thesixtyone_production/audio/{0:}_stream'
-    _THUMBNAIL_URL_TEMPLATE = '{photo_base_url:}_desktop'
-    _TESTS = [
-        {
-            'url': 'http://www.thesixtyone.com/s/SrE3zD7s1jt/',
-            'md5': '821cc43b0530d3222e3e2b70bb4622ea',
-            'info_dict': {
-                'id': 'SrE3zD7s1jt',
-                'ext': 'mp3',
-                'title': 'CASIO - Unicorn War Mixtape',
-                'thumbnail': 're:^https?://.*_desktop$',
-                'upload_date': '20071217',
-                'duration': 3208,
-            }
-        },
-        {
-            'url': 'http://www.thesixtyone.com/song/comments/list/SrE3zD7s1jt',
-            'only_matching': True,
-        },
-        {
-            'url': 'http://www.thesixtyone.com/s/ULoiyjuJWli#/s/SrE3zD7s1jt/',
-            'only_matching': True,
-        },
-        {
-            'url': 'http://www.thesixtyone.com/#/s/SrE3zD7s1jt/',
-            'only_matching': True,
-        },
-        {
-            'url': 'http://www.thesixtyone.com/song/SrE3zD7s1jt/',
-            'only_matching': True,
-        },
-        {
-            'url': 'http://www.thesixtyone.com/maryatmidnight/song/StrawberriesandCream/yvWtLp0c4GQ/',
-            'only_matching': True,
-        },
-    ]
-
-    _DECODE_MAP = {
-        'x': 'a',
-        'm': 'b',
-        'w': 'c',
-        'q': 'd',
-        'n': 'e',
-        'p': 'f',
-        'a': '0',
-        'h': '1',
-        'e': '2',
-        'u': '3',
-        's': '4',
-        'i': '5',
-        'o': '6',
-        'y': '7',
-        'r': '8',
-        'c': '9'
-    }
-
-    def _real_extract(self, url):
-        song_id = self._match_id(url)
-
-        webpage = self._download_webpage(
-            self._SONG_URL_TEMPLATE.format(song_id), song_id)
-
-        song_data = self._parse_json(self._search_regex(
-            r'"%s":\s(\{.*?\})' % song_id, webpage, 'song_data'), song_id)
-
-        if self._search_regex(r'(t61\.s3_audio_load\s*=\s*1\.0;)', webpage, 's3_audio_load marker', default=None):
-            song_data['audio_server'] = 's3.amazonaws.com'
-        else:
-            song_data['audio_server'] = song_data['audio_server'] + '.thesixtyone.com'
-
-        keys = [self._DECODE_MAP.get(s, s) for s in song_data['key']]
-        url = self._SONG_FILE_URL_TEMPLATE.format(
-            "".join(reversed(keys)), **song_data)
-
-        formats = [{
-            'format_id': 'sd',
-            'url': url,
-            'ext': 'mp3',
-        }]
-
-        return {
-            'id': song_id,
-            'title': '{artist:} - {name:}'.format(**song_data),
-            'formats': formats,
-            'comment_count': song_data.get('comments_count'),
-            'duration': song_data.get('play_time'),
-            'like_count': song_data.get('score'),
-            'thumbnail': self._THUMBNAIL_URL_TEMPLATE.format(**song_data),
-            'upload_date': unified_strdate(song_data.get('publish_date')),
-        }
diff --git a/youtube_dl/extractor/tutv.py b/youtube_dl/extractor/tutv.py

index 822372ea14e52c08d0e95f81a4618c122b2232d8..362318b24b5df786cc98b9688d2859258e15178f 100644
--- a/youtube_dl/extractor/tutv.py
+++ b/youtube_dl/extractor/tutv.py
@@ -1,9 +1,10 @@
  from __future__ import unicode_literals
  
-import base64
-
  from .common import InfoExtractor
-from ..compat import compat_parse_qs
+from ..compat import (
+    compat_b64decode,
+    compat_parse_qs,
+)
  
  
  class TutvIE(InfoExtractor):
@@ -26,7 +27,7 @@ class TutvIE(InfoExtractor):
  
          data_content = self._download_webpage(
              'http://tu.tv/flvurl.php?codVideo=%s' % internal_id, video_id, 'Downloading video info')
-        video_url = base64.b64decode(compat_parse_qs(data_content)['kpt'][0].encode('utf-8')).decode('utf-8')
+        video_url = compat_b64decode(compat_parse_qs(data_content)['kpt'][0]).decode('utf-8')
  
          return {
              'id': internal_id,
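
Note on the hunk above: the call site no longer encodes the query-string value to bytes before decoding, which suggests compat_b64decode accepts text as well as bytes. A minimal sketch of that idea follows, assuming Python 3; b64decode_any is a hypothetical stand-in for illustration, not the actual helper from youtube_dl/compat.py, and the example URL is made up.

```python
import base64


def b64decode_any(s, *args, **kwargs):
    # Hypothetical stand-in for compat_b64decode: accept text or bytes.
    # Text is assumed to be plain ASCII base64 and is encoded first.
    if isinstance(s, str):
        s = s.encode('ascii')
    return base64.b64decode(s, *args, **kwargs)


# Round trip with a made-up URL, mirroring the 'kpt' value handling above.
encoded = base64.b64encode(b'http://example.com/video.flv').decode('ascii')
video_url = b64decode_any(encoded).decode('utf-8')
```

With a helper of this shape the caller only has to decode the resulting bytes, as the added line in the hunk does.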
diff --git a/youtube_dl/extractor/tvplay.py b/youtube_dl/extractor/tvplay.py

index 46132eda1f3700bc51df6633b96723b4b2874ebb..84597b55e0f6047a1dccd1905cb4771949b3cf00 100644
--- a/youtube_dl/extractor/tvplay.py
+++ b/youtube_dl/extractor/tvplay.py
@@ -273,6 +273,8 @@ class TVPlayIE(InfoExtractor):
                      'ext': ext,
                  }
                  if video_url.startswith('rtmp'):
+                    if smuggled_data.get('skip_rtmp'):
+                        continue
                      m = re.search(
                          r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
                      if not m:
@@ -434,6 +436,10 @@ class ViafreeIE(InfoExtractor):
          return self.url_result(
              smuggle_url(
                  'mtg:%s' % video_id,
-                {'geo_countries': [
-                    compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]]}),
+                {
+                    'geo_countries': [
+                        compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
+                    # rtmp host mtgfs.fplive.net for viafree is unresolvable
+                    'skip_rtmp': True,
+                }),
              ie=TVPlayIE.ie_key(), video_id=video_id)
diff --git a/youtube_dl/extractor/twitch.py b/youtube_dl/extractor/twitch.py

index bf57eac01f2f42dd5dd4d4fc7bdce55f08ee7513..1981b4d4a8064541c0fcf2faa04d863c72eebb14 100644
--- a/youtube_dl/extractor/twitch.py
+++ b/youtube_dl/extractor/twitch.py
@@ -85,10 +85,15 @@ class TwitchBaseIE(InfoExtractor):
                  if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
                      response = self._parse_json(
                          e.cause.read().decode('utf-8'), None)
-                    fail(response['message'])
+                    fail(response.get('message') or response['errors'][0])
                  raise
  
-            redirect_url = urljoin(post_url, response['redirect'])
+            if 'Authenticated successfully' in response.get('message', ''):
+                return None, None
+
+            redirect_url = urljoin(
+                post_url,
+                response.get('redirect') or response['redirect_path'])
              return self._download_webpage_handle(
                  redirect_url, None, 'Downloading login redirect page',
                  headers=headers)
@@ -106,6 +111,10 @@ class TwitchBaseIE(InfoExtractor):
                  'password': password,
              })
  
+        # Successful login
+        if not redirect_page:
+            return
+
          if re.search(r'(?i)<form[^>]+id="two-factor-submit"', redirect_page) is not None:
              # TODO: Add mechanism to request an SMS or phone call
              tfa_token = self._get_tfa_info('two-factor authentication token')
@@ -358,9 +367,16 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
                  break
              offset += limit
          return self.playlist_result(
-            [self.url_result(entry) for entry in orderedSet(entries)],
+            [self._make_url_result(entry) for entry in orderedSet(entries)],
              channel_id, channel_name)
  
+    def _make_url_result(self, url):
+        try:
+            video_id = 'v%s' % TwitchVodIE._match_id(url)
+            return self.url_result(url, TwitchVodIE.ie_key(), video_id=video_id)
+        except AssertionError:
+            return self.url_result(url)
+
      def _extract_playlist_page(self, response):
          videos = response.get('videos')
          return [video['url'] for video in videos] if videos else []
diff --git a/youtube_dl/extractor/vk.py b/youtube_dl/extractor/vk.py

index d4838b3e5f59410c7e3db3b99823261d51350925..b8ea50362fc83f9bcdb480a9625d0dc55cd28b52 100644
--- a/youtube_dl/extractor/vk.py
+++ b/youtube_dl/extractor/vk.py
@@ -318,9 +318,14 @@ class VKIE(VKBaseIE):
                  'You are trying to log in from an unusual location. You should confirm ownership at vk.com to log in with this IP.',
                  expected=True)
  
+        ERROR_COPYRIGHT = 'Video %s has been removed from public access due to rightholder complaint.'
+
          ERRORS = {
              r'>Видеозапись .*? была изъята из публичного доступа в связи с обращением правообладателя.<':
-            'Video %s has been removed from public access due to rightholder complaint.',
+            ERROR_COPYRIGHT,
+
+            r'>The video .*? was removed from public access by request of the copyright holder.<':
+            ERROR_COPYRIGHT,
  
              r'<!>Please log in or <':
              'Video %s is only available for registered users, '
diff --git a/youtube_dl/extractor/wdr.py b/youtube_dl/extractor/wdr.py

index 621de1e1efb73a9a377a46fe0fa702e595c3cde5..cf6f7c7ed6ab5bce442a6cb2079baf687c324417 100644
--- a/youtube_dl/extractor/wdr.py
+++ b/youtube_dl/extractor/wdr.py
@@ -4,49 +4,50 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
      determine_ext,
      ExtractorError,
      js_to_json,
      strip_jsonp,
+    try_get,
      unified_strdate,
      update_url_query,
      urlhandle_detect_ext,
  )
  
  
-class WDRBaseIE(InfoExtractor):
-    def _extract_wdr_video(self, webpage, display_id):
-        # for wdr.de the data-extension is in a tag with the class "mediaLink"
-        # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
-        # for wdrmaus, in a tag with the class "videoButton" (previously a link
-        # to the page in a multiline "videoLink"-tag)
-        json_metadata = self._html_search_regex(
-            r'''(?sx)class=
-                    (?:
-                        (["\'])(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b.*?\1[^>]+|
-                        (["\'])videoLink\b.*?\2[\s]*>\n[^\n]*
-                    )data-extension=(["\'])(?P<data>(?:(?!\3).)+)\3
-            ''',
-            webpage, 'media link', default=None, group='data')
-
-        if not json_metadata:
-            return
+class WDRIE(InfoExtractor):
+    _VALID_URL = r'https?://deviceids-medp\.wdr\.de/ondemand/\d+/(?P<id>\d+)\.js'
+    _GEO_COUNTRIES = ['DE']
+    _TEST = {
+        'url': 'http://deviceids-medp.wdr.de/ondemand/155/1557833.js',
+        'info_dict': {
+            'id': 'mdb-1557833',
+            'ext': 'mp4',
+            'title': 'Biathlon-Staffel verpasst Podest bei Olympia-Generalprobe',
+            'upload_date': '20180112',
+        },
+    }
  
-        media_link_obj = self._parse_json(json_metadata, display_id,
-                                          transform_source=js_to_json)
-        jsonp_url = media_link_obj['mediaObj']['url']
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
  
          metadata = self._download_json(
-            jsonp_url, display_id, transform_source=strip_jsonp)
+            url, video_id, transform_source=strip_jsonp)
+
+        is_live = metadata.get('mediaType') == 'live'
  
-        metadata_tracker_data = metadata['trackerData']
-        metadata_media_resource = metadata['mediaResource']
+        tracker_data = metadata['trackerData']
+        media_resource = metadata['mediaResource']
  
          formats = []
  
          # check if the metadata contains a direct URL to a file
-        for kind, media_resource in metadata_media_resource.items():
+        for kind, media_resource in media_resource.items():
              if kind not in ('dflt', 'alt'):
                  continue
  
@@ -57,13 +58,13 @@ class WDRBaseIE(InfoExtractor):
                  ext = determine_ext(medium_url)
                  if ext == 'm3u8':
                      formats.extend(self._extract_m3u8_formats(
-                        medium_url, display_id, 'mp4', 'm3u8_native',
+                        medium_url, video_id, 'mp4', 'm3u8_native',
                          m3u8_id='hls'))
                  elif ext == 'f4m':
                      manifest_url = update_url_query(
                          medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
                      formats.extend(self._extract_f4m_formats(
-                        manifest_url, display_id, f4m_id='hds', fatal=False))
+                        manifest_url, video_id, f4m_id='hds', fatal=False))
                  elif ext == 'smil':
                      formats.extend(self._extract_smil_formats(
                          medium_url, 'stream', fatal=False))
@@ -73,7 +74,7 @@ class WDRBaseIE(InfoExtractor):
                      }
                      if ext == 'unknown_video':
                          urlh = self._request_webpage(
-                            medium_url, display_id, note='Determining extension')
+                            medium_url, video_id, note='Determining extension')
                          ext = urlhandle_detect_ext(urlh)
                          a_format['ext'] = ext
                      formats.append(a_format)
@@ -81,30 +82,30 @@ class WDRBaseIE(InfoExtractor):
          self._sort_formats(formats)
  
          subtitles = {}
-        caption_url = metadata_media_resource.get('captionURL')
+        caption_url = metadata['mediaResource'].get('captionURL')
          if caption_url:
              subtitles['de'] = [{
                  'url': caption_url,
                  'ext': 'ttml',
              }]
  
-        title = metadata_tracker_data['trackerClipTitle']
+        title = tracker_data['trackerClipTitle']
  
          return {
-            'id': metadata_tracker_data.get('trackerClipId', display_id),
-            'display_id': display_id,
-            'title': title,
-            'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
+            'id': tracker_data.get('trackerClipId', video_id),
+            'title': self._live_title(title) if is_live else title,
+            'alt_title': tracker_data.get('trackerClipSubcategory'),
              'formats': formats,
              'subtitles': subtitles,
-            'upload_date': unified_strdate(metadata_tracker_data.get('trackerClipAirTime')),
+            'upload_date': unified_strdate(tracker_data.get('trackerClipAirTime')),
+            'is_live': is_live,
          }
  
  
-class WDRIE(WDRBaseIE):
+class WDRPageIE(InfoExtractor):
      _CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
-    _PAGE_REGEX = r'/(?:mediathek/)?[^/]+/(?P<type>[^/]+)/(?P<display_id>.+)\.html'
-    _VALID_URL = r'(?P<page_url>https?://(?:www\d\.)?wdr\d?\.de)' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
+    _PAGE_REGEX = r'/(?:mediathek/)?(?:[^/]+/)*(?P<display_id>[^/]+)\.html'
+    _VALID_URL = r'https?://(?:www\d?\.)?(?:wdr\d?|sportschau)\.de' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
  
      _TESTS = [
          {
@@ -124,6 +125,7 @@ class WDRIE(WDRBaseIE):
                      'ext': 'ttml',
                  }]},
              },
+            'skip': 'HTTP Error 404: Not Found',
          },
          {
              'url': 'http://www1.wdr.de/mediathek/audio/wdr3/wdr3-gespraech-am-samstag/audio-schriftstellerin-juli-zeh-100.html',
@@ -139,19 +141,17 @@ class WDRIE(WDRBaseIE):
                  'is_live': False,
                  'subtitles': {}
              },
+            'skip': 'HTTP Error 404: Not Found',
          },
          {
              'url': 'http://www1.wdr.de/mediathek/video/live/index.html',
              'info_dict': {
-                'id': 'mdb-103364',
+                'id': 'mdb-1406149',
                  'ext': 'mp4',
-                'display_id': 'index',
-                'title': r're:^WDR Fernsehen im Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+                'title': r're:^WDR Fernsehen im Livestream \(nur in Deutschland erreichbar\) [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
                  'alt_title': 'WDR Fernsehen Live',
-                'upload_date': None,
-                'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9',
+                'upload_date': '20150101',
                  'is_live': True,
-                'subtitles': {}
              },
              'params': {
                  'skip_download': True,  # m3u8 download
@@ -159,19 +159,18 @@ class WDRIE(WDRBaseIE):
          },
          {
              'url': 'http://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html',
-            'playlist_mincount': 8,
+            'playlist_mincount': 7,
              'info_dict': {
-                'id': 'aktuelle-stunde/aktuelle-stunde-120',
+                'id': 'aktuelle-stunde-120',
              },
          },
          {
              'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
              'info_dict': {
-                'id': 'mdb-1323501',
+                'id': 'mdb-1552552',
                  'ext': 'mp4',
                  'upload_date': 're:^[0-9]{8}$',
                  'title': 're:^Die Sendung mit der Maus vom [0-9.]{10}$',
-                'description': 'Die Seite mit der Maus -',
              },
              'skip': 'The id changes from week to week because of the new episode'
          },
@@ -183,7 +182,6 @@ class WDRIE(WDRBaseIE):
                  'ext': 'mp4',
                  'upload_date': '20130919',
                  'title': 'Sachgeschichte - Achterbahn ',
-                'description': 'Die Seite mit der Maus -',
              },
          },
          {
@@ -191,52 +189,114 @@ class WDRIE(WDRBaseIE):
              # Live stream, MD5 unstable
              'info_dict': {
                  'id': 'mdb-869971',
-                'ext': 'flv',
-                'title': 'COSMO Livestream',
-                'description': 'md5:2309992a6716c347891c045be50992e4',
+                'ext': 'mp4',
+                'title': r're:^COSMO Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
                  'upload_date': '20160101',
              },
+            'params': {
+                'skip_download': True,  # m3u8 download
+            }
+        },
+        {
+            'url': 'http://www.sportschau.de/handballem2018/handball-nationalmannschaft-em-stolperstein-vorrunde-100.html',
+            'info_dict': {
+                'id': 'mdb-1556012',
+                'ext': 'mp4',
+                'title': 'DHB-Vizepräsident Bob Hanning - "Die Weltspitze ist extrem breit"',
+                'upload_date': '20180111',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        {
+            'url': 'http://www.sportschau.de/handballem2018/audio-vorschau---die-handball-em-startet-mit-grossem-favoritenfeld-100.html',
+            'only_matching': True,
          }
      ]
  
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
-        url_type = mobj.group('type')
-        page_url = mobj.group('page_url')
          display_id = mobj.group('display_id')
          webpage = self._download_webpage(url, display_id)
  
-        info_dict = self._extract_wdr_video(webpage, display_id)
+        entries = []
+
+        # Article with several videos
  
-        if not info_dict:
+        # for wdr.de the data-extension is in a tag with the class "mediaLink"
+        # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
+        # for wdrmaus, in a tag with the class "videoButton" (previously a link
+        # to the page in a multiline "videoLink"-tag)
+        for mobj in re.finditer(
+            r'''(?sx)class=
+                    (?:
+                        (["\'])(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b.*?\1[^>]+|
+                        (["\'])videoLink\b.*?\2[\s]*>\n[^\n]*
+                    )data-extension=(["\'])(?P<data>(?:(?!\3).)+)\3
+                    ''', webpage):
+            media_link_obj = self._parse_json(
+                mobj.group('data'), display_id, transform_source=js_to_json,
+                fatal=False)
+            if not media_link_obj:
+                continue
+            jsonp_url = try_get(
+                media_link_obj, lambda x: x['mediaObj']['url'], compat_str)
+            if jsonp_url:
+                entries.append(self.url_result(jsonp_url, ie=WDRIE.ie_key()))
+
+        # Playlist (e.g. https://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html)
+        if not entries:
              entries = [
-                self.url_result(page_url + href[0], 'WDR')
-                for href in re.findall(
-                    r'<a href="(%s)"[^>]+data-extension=' % self._PAGE_REGEX,
-                    webpage)
+                self.url_result(
+                    compat_urlparse.urljoin(url, mobj.group('href')),
+                    ie=WDRPageIE.ie_key())
+                for mobj in re.finditer(
+                    r'<a[^>]+\bhref=(["\'])(?P<href>(?:(?!\1).)+)\1[^>]+\bdata-extension=',
+                    webpage) if re.match(self._PAGE_REGEX, mobj.group('href'))
              ]
  
-            if entries:  # Playlist page
-                return self.playlist_result(entries, playlist_id=display_id)
+        return self.playlist_result(entries, playlist_id=display_id)
  
-            raise ExtractorError('No downloadable streams found', expected=True)
  
-        is_live = url_type == 'live'
-
-        if is_live:
-            info_dict.update({
-                'title': self._live_title(info_dict['title']),
-                'upload_date': None,
-            })
-        elif 'upload_date' not in info_dict:
-            info_dict['upload_date'] = unified_strdate(self._html_search_meta('DC.Date', webpage, 'upload date'))
+class WDRElefantIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)wdrmaus\.de/elefantenseite/#(?P<id>.+)'
+    _TEST = {
+        'url': 'http://www.wdrmaus.de/elefantenseite/#folge_ostern_2015',
+        'info_dict': {
+            'title': 'Folge Oster-Spezial 2015',
+            'id': 'mdb-1088195',
+            'ext': 'mp4',
+            'age_limit': None,
+            'upload_date': '20150406'
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }
  
-        info_dict.update({
-            'description': self._html_search_meta('Description', webpage),
-            'is_live': is_live,
-        })
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
  
-        return info_dict
+        # Table of Contents seems to always be at this address, so fetch it directly.
+        # The website fetches configurationJS.php5, which links to tableOfContentsJS.php5.
+        table_of_contents = self._download_json(
+            'https://www.wdrmaus.de/elefantenseite/data/tableOfContentsJS.php5',
+            display_id)
+        if display_id not in table_of_contents:
+            raise ExtractorError(
+                'No entry in site\'s table of contents for this URL. '
+                'Is the fragment part of the URL (after the #) correct?',
+                expected=True)
+        xml_metadata_path = table_of_contents[display_id]['xmlPath']
+        xml_metadata = self._download_xml(
+            'https://www.wdrmaus.de/elefantenseite/' + xml_metadata_path,
+            display_id)
+        zmdb_url_element = xml_metadata.find('./movie/zmdb_url')
+        if zmdb_url_element is None:
+            raise ExtractorError(
+                '%s is not a video' % display_id, expected=True)
+        return self.url_result(zmdb_url_element.text, ie=WDRIE.ie_key())
  
  
  class WDRMobileIE(InfoExtractor):
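
Note on the WDRPageIE patterns above: the relaxed _PAGE_REGEX takes only the last path component as display_id, and _VALID_URL now also covers sportschau.de. The sketch below copies the three patterns from the hunk and checks them against URLs taken from the test cases; display_id_for is a hypothetical helper introduced only for this illustration.

```python
import re

# Patterns copied from the WDRPageIE hunk above.
CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
PAGE_REGEX = r'/(?:mediathek/)?(?:[^/]+/)*(?P<display_id>[^/]+)\.html'
VALID_URL = (r'https?://(?:www\d?\.)?(?:wdr\d?|sportschau)\.de'
             + PAGE_REGEX + '|' + CURRENT_MAUS_URL)


def display_id_for(url):
    # Hypothetical helper: returns the display_id group, or None when the
    # URL does not match or matches the wdrmaus alternative (no such group).
    mobj = re.match(VALID_URL, url)
    return mobj.group('display_id') if mobj else None


playlist_id = display_id_for(
    'http://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html')
sportschau_id = display_id_for(
    'http://www.sportschau.de/handballem2018/handball-nationalmannschaft-em-stolperstein-vorrunde-100.html')
maus_id = display_id_for('http://www.wdrmaus.de/aktuelle-sendung/index.php5')
```

The wdrmaus alternative matches but carries no display_id group, so the extractor would see None for that group on such URLs.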
diff --git a/youtube_dl/extractor/weibo.py b/youtube_dl/extractor/weibo.py

new file mode 100644

index 0000000..3cb4d71
--- /dev/null
+++ b/youtube_dl/extractor/weibo.py
@@ -0,0 +1,140 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+import json
+import random
+import re
+
+from ..compat import (
+    compat_parse_qs,
+    compat_str,
+)
+from ..utils import (
+    js_to_json,
+    strip_jsonp,
+    urlencode_postdata,
+)
+
+
+class WeiboIE(InfoExtractor):
+    _VALID_URL = r'https?://weibo\.com/[0-9]+/(?P<id>[a-zA-Z0-9]+)'
+    _TEST = {
+        'url': 'https://weibo.com/6275294458/Fp6RGfbff?type=comment',
+        'info_dict': {
+            'id': 'Fp6RGfbff',
+            'ext': 'mp4',
+            'title': 'You should have servants to massage you,... 来自Hosico_猫 - 微博',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        # to get Referer url for genvisitor
+        webpage, urlh = self._download_webpage_handle(url, video_id)
+
+        visitor_url = urlh.geturl()
+
+        if 'passport.weibo.com' in visitor_url:
+            # first visit
+            visitor_data = self._download_json(
+                'https://passport.weibo.com/visitor/genvisitor', video_id,
+                note='Generating first-visit data',
+                transform_source=strip_jsonp,
+                headers={'Referer': visitor_url},
+                data=urlencode_postdata({
+                    'cb': 'gen_callback',
+                    'fp': json.dumps({
+                        'os': '2',
+                        'browser': 'Gecko57,0,0,0',
+                        'fonts': 'undefined',
+                        'screenInfo': '1440*900*24',
+                        'plugins': '',
+                    }),
+                }))
+
+            tid = visitor_data['data']['tid']
+            cnfd = '%03d' % visitor_data['data']['confidence']
+
+            self._download_webpage(
+                'https://passport.weibo.com/visitor/visitor', video_id,
+                note='Running first-visit callback',
+                query={
+                    'a': 'incarnate',
+                    't': tid,
+                    'w': 2,
+                    'c': cnfd,
+                    'cb': 'cross_domain',
+                    'from': 'weibo',
+                    '_rand': random.random(),
+                })
+
+            webpage = self._download_webpage(
+                url, video_id, note='Revisiting webpage')
+
+        title = self._html_search_regex(
+            r'<title>(.+?)</title>', webpage, 'title')
+
+        video_formats = compat_parse_qs(self._search_regex(
+            r'video-sources=\\\"(.+?)\"', webpage, 'video_sources'))
+
+        formats = []
+        supported_resolutions = (480, 720)
+        for res in supported_resolutions:
+            vid_urls = video_formats.get(compat_str(res))
+            if not vid_urls or not isinstance(vid_urls, list):
+                continue
+
+            vid_url = vid_urls[0]
+            formats.append({
+                'url': vid_url,
+                'height': res,
+            })
+
+        self._sort_formats(formats)
+
+        uploader = self._og_search_property(
+            'nick-name', webpage, 'uploader', default=None)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'uploader': uploader,
+            'formats': formats
+        }
+
+
+class WeiboMobileIE(InfoExtractor):
+    _VALID_URL = r'https?://m\.weibo\.cn/status/(?P<id>[0-9]+)(\?.+)?'
+    _TEST = {
+        'url': 'https://m.weibo.cn/status/4189191225395228?wm=3333_2001&sourcetype=weixin&featurecode=newtitle&from=singlemessage&isappinstalled=0',
+        'info_dict': {
+            'id': '4189191225395228',
+            'ext': 'mp4',
+            'title': '午睡当然是要甜甜蜜蜜的啦',
+            'uploader': '柴犬柴犬'
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        # the mobile status page embeds the status data in $render_data
+        webpage = self._download_webpage(url, video_id, note='visit the page')
+
+        weibo_info = self._parse_json(self._search_regex(
+            r'var\s+\$render_data\s*=\s*\[({.*})\]\[0\]\s*\|\|\s*{};',
+            webpage, 'js_code', flags=re.DOTALL),
+            video_id, transform_source=js_to_json)
+
+        status_data = weibo_info.get('status', {})
+        page_info = status_data.get('page_info')
+        title = status_data['status_title']
+        uploader = status_data.get('user', {}).get('screen_name')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'uploader': uploader,
+            'url': page_info['media_info']['stream_url']
+        }
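
Note on WeiboIE above: the video-sources attribute is treated as a query string keyed by resolution, and only the 480 and 720 entries are turned into formats. The sketch below illustrates that selection step, assuming Python 3 and the standard library parse_qs in place of compat_parse_qs; pick_formats and the sample string are hypothetical and not taken from a real Weibo page.

```python
from urllib.parse import parse_qs


def pick_formats(video_sources, resolutions=(480, 720)):
    # Hypothetical helper mirroring the loop in WeiboIE._real_extract:
    # keep the first URL listed for each supported resolution.
    parsed = parse_qs(video_sources)
    formats = []
    for res in resolutions:
        urls = parsed.get(str(res))
        if not urls:
            continue
        formats.append({'url': urls[0], 'height': res})
    return formats


# Made-up sample; the empty 1080 entry is dropped by parse_qs by default.
sample = ('480=http%3A%2F%2Fexample.com%2Fa.mp4'
          '&720=http%3A%2F%2Fexample.com%2Fb.mp4&1080=')
formats = pick_formats(sample)
```

Resolutions missing from the attribute are skipped rather than treated as errors, which matches the `continue` in the extractor.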
diff --git a/youtube_dl/extractor/ximalaya.py b/youtube_dl/extractor/ximalaya.py

new file mode 100644

index 0000000..a912e54
--- /dev/null
+++ b/youtube_dl/extractor/ximalaya.py
@@ -0,0 +1,233 @@
+# coding: utf-8
+
+from __future__ import unicode_literals
+
+import itertools
+import re
+
+from .common import InfoExtractor
+
+
+class XimalayaBaseIE(InfoExtractor):
+    _GEO_COUNTRIES = ['CN']
+
+
+class XimalayaIE(XimalayaBaseIE):
+    IE_NAME = 'ximalaya'
+    IE_DESC = '喜马拉雅FM'
+    _VALID_URL = r'https?://(?:www\.|m\.)?ximalaya\.com/(?P<uid>[0-9]+)/sound/(?P<id>[0-9]+)'
+    _USER_URL_FORMAT = '%s://www.ximalaya.com/zhubo/%i/'
+    _TESTS = [
+        {
+            'url': 'http://www.ximalaya.com/61425525/sound/47740352/',
+            'info_dict': {
+                'id': '47740352',
+                'ext': 'm4a',
+                'uploader': '小彬彬爱听书',
+                'uploader_id': 61425525,
+                'uploader_url': 'http://www.ximalaya.com/zhubo/61425525/',
+                'title': '261.唐诗三百首.卷八.送孟浩然之广陵.李白',
+                'description': "contains:《送孟浩然之广陵》\n作者：李白\n故人西辞黄鹤楼，烟花三月下扬州。\n孤帆远影碧空尽，惟见长江天际流。",
+                'thumbnails': [
+                    {
+                        'name': 'cover_url',
+                        'url': r're:^https?://.*\.jpg$',
+                    },
+                    {
+                        'name': 'cover_url_142',
+                        'url': r're:^https?://.*\.jpg$',
+                        'width': 180,
+                        'height': 180
+                    }
+                ],
+                'categories': ['renwen', '人文'],
+                'duration': 93,
+                'view_count': int,
+                'like_count': int,
+            }
+        },
+        {
+            'url': 'http://m.ximalaya.com/61425525/sound/47740352/',
+            'info_dict': {
+                'id': '47740352',
+                'ext': 'm4a',
+                'uploader': '小彬彬爱听书',
+                'uploader_id': 61425525,
+                'uploader_url': 'http://www.ximalaya.com/zhubo/61425525/',
+                'title': '261.唐诗三百首.卷八.送孟浩然之广陵.李白',
+                'description': "contains:《送孟浩然之广陵》\n作者：李白\n故人西辞黄鹤楼，烟花三月下扬州。\n孤帆远影碧空尽，惟见长江天际流。",
+                'thumbnails': [
+                    {
+                        'name': 'cover_url',
+                        'url': r're:^https?://.*\.jpg$',
+                    },
+                    {
+                        'name': 'cover_url_142',
+                        'url': r're:^https?://.*\.jpg$',
+                        'width': 180,
+                        'height': 180
+                    }
+                ],
+                'categories': ['renwen', '人文'],
+                'duration': 93,
+                'view_count': int,
+                'like_count': int,
+            }
+        },
+        {
+            'url': 'https://www.ximalaya.com/11045267/sound/15705996/',
+            'info_dict': {
+                'id': '15705996',
+                'ext': 'm4a',
+                'uploader': '李延隆老师',
+                'uploader_id': 11045267,
+                'uploader_url': 'https://www.ximalaya.com/zhubo/11045267/',
+                'title': 'Lesson 1 Excuse me!',
+                'description': "contains:Listen to the tape then answer\xa0this question. Whose handbag is it?\n"
+                               "听录音，然后回答问题，这是谁的手袋？",
+                'thumbnails': [
+                    {
+                        'name': 'cover_url',
+                        'url': r're:^https?://.*\.jpg$',
+                    },
+                    {
+                        'name': 'cover_url_142',
+                        'url': r're:^https?://.*\.jpg$',
+                        'width': 180,
+                        'height': 180
+                    }
+                ],
+                'categories': ['train', '外语'],
+                'duration': 40,
+                'view_count': int,
+                'like_count': int,
+            }
+        },
+    ]
+
+    def _real_extract(self, url):
+
+        is_m = 'm.ximalaya' in url
+        scheme = 'https' if url.startswith('https') else 'http'
+
+        audio_id = self._match_id(url)
+        webpage = self._download_webpage(url, audio_id,
+                                         note='Download sound page for %s' % audio_id,
+                                         errnote='Unable to get sound page')
+
+        audio_info_file = '%s://m.ximalaya.com/tracks/%s.json' % (scheme, audio_id)
+        audio_info = self._download_json(audio_info_file, audio_id,
+                                         'Downloading info json %s' % audio_info_file,
+                                         'Unable to download info file')
+
+        formats = []
+        for bps, k in (('24k', 'play_path_32'), ('64k', 'play_path_64')):
+            if audio_info.get(k):
+                formats.append({
+                    'format_id': bps,
+                    'url': audio_info[k],
+                })
+
+        thumbnails = []
+        for k in audio_info.keys():
+            # cover picture keys look like 'cover_url', 'cover_url_142'
+            if k.startswith('cover_url'):
+                thumbnail = {'name': k, 'url': audio_info[k]}
+                if k == 'cover_url_142':
+                    thumbnail['width'] = 180
+                    thumbnail['height'] = 180
+                thumbnails.append(thumbnail)
+
+        audio_uploader_id = audio_info.get('uid')
+
+        if is_m:
+            audio_description = self._html_search_regex(r'(?s)<section\s+class=["\']content[^>]+>(.+?)</section>',
+                                                        webpage, 'audio_description', fatal=False)
+        else:
+            audio_description = self._html_search_regex(r'(?s)<div\s+class=["\']rich_intro[^>]*>(.+?</article>)',
+                                                        webpage, 'audio_description', fatal=False)
+
+        if not audio_description:
+            audio_description_file = '%s://www.ximalaya.com/sounds/%s/rich_intro' % (scheme, audio_id)
+            audio_description = self._download_webpage(audio_description_file, audio_id,
+                                                       note='Downloading description file %s' % audio_description_file,
+                                                       errnote='Unable to download descrip file',
+                                                       fatal=False)
+            audio_description = audio_description.strip() if audio_description else None
+
+        return {
+            'id': audio_id,
+            'uploader': audio_info.get('nickname'),
+            'uploader_id': audio_uploader_id,
+            'uploader_url': self._USER_URL_FORMAT % (scheme, audio_uploader_id) if audio_uploader_id else None,
+            'title': audio_info['title'],
+            'thumbnails': thumbnails,
+            'description': audio_description,
+            'categories': list(filter(None, (audio_info.get('category_name'), audio_info.get('category_title')))),
+            'duration': audio_info.get('duration'),
+            'view_count': audio_info.get('play_count'),
+            'like_count': audio_info.get('favorites_count'),
+            'formats': formats,
+        }
+
+
+class XimalayaAlbumIE(XimalayaBaseIE):
+    IE_NAME = 'ximalaya:album'
+    IE_DESC = '喜马拉雅FM 专辑'
+    _VALID_URL = r'https?://(?:www\.|m\.)?ximalaya\.com/(?P<uid>[0-9]+)/album/(?P<id>[0-9]+)'
+    _TEMPLATE_URL = '%s://www.ximalaya.com/%s/album/%s/'
+    _BASE_URL_TEMPL = '%s://www.ximalaya.com%s'
+    _LIST_VIDEO_RE = r'<a[^>]+?href="(?P<url>/%s/sound/(?P<id>\d+)/?)"[^>]+?title="(?P<title>[^>]+)">'
+    _TESTS = [{
+        'url': 'http://www.ximalaya.com/61425525/album/5534601/',
+        'info_dict': {
+            'title': '唐诗三百首（含赏析）',
+            'id': '5534601',
+        },
+        'playlist_count': 312,
+    }, {
+        'url': 'http://m.ximalaya.com/61425525/album/5534601',
+        'info_dict': {
+            'title': '唐诗三百首（含赏析）',
+            'id': '5534601',
+        },
+        'playlist_count': 312,
+    },
+    ]
+
+    def _real_extract(self, url):
+        self.scheme = scheme = 'https' if url.startswith('https') else 'http'
+
+        mobj = re.match(self._VALID_URL, url)
+        uid, playlist_id = mobj.group('uid'), mobj.group('id')
+
+        webpage = self._download_webpage(self._TEMPLATE_URL % (scheme, uid, playlist_id), playlist_id,
+                                         note='Download album page for %s' % playlist_id,
+                                         errnote='Unable to get album info')
+
+        title = self._html_search_regex(r'detailContent_title[^>]*><h1(?:[^>]+)?>([^<]+)</h1>',
+                                        webpage, 'title', fatal=False)
+
+        return self.playlist_result(self._entries(webpage, playlist_id, uid), playlist_id, title)
+
+    def _entries(self, page, playlist_id, uid):
+        html = page
+        for page_num in itertools.count(1):
+            for entry in self._process_page(html, uid):
+                yield entry
+
+            next_url = self._search_regex(r'<a\s+href=(["\'])(?P<more>[\S]+)\1[^>]+rel=(["\'])next\3',
+                                          html, 'list_next_url', default=None, group='more')
+            if not next_url:
+                break
+
+            next_full_url = self._BASE_URL_TEMPL % (self.scheme, next_url)
+            html = self._download_webpage(next_full_url, playlist_id)
+
+    def _process_page(self, html, uid):
+        find_from = html.index('album_soundlist')
+        for mobj in re.finditer(self._LIST_VIDEO_RE % uid, html[find_from:]):
+            yield self.url_result(self._BASE_URL_TEMPL % (self.scheme, mobj.group('url')),
+                                  XimalayaIE.ie_key(),
+                                  mobj.group('id'),
+                                  mobj.group('title'))
diff --git a/youtube_dl/extractor/youku.py b/youtube_dl/extractor/youku.py

index c7947d4a1165212d0bbeb1ad571d9b6a1c04590a..5b0b248cdd031588a958056e85f3efa9a6579eb4 100644 (file)
--- a/youtube_dl/extractor/youku.py
+++ b/youtube_dl/extractor/youku.py
@@ -245,13 +245,19 @@ class YoukuShowIE(InfoExtractor):
          #  No data-id value.
          'url': 'http://list.youku.com/show/id_zefbfbd61237fefbfbdef.html',
          'only_matching': True,
+    }, {
+        #  Wrong number of reload_id.
+        'url': 'http://list.youku.com/show/id_z20eb4acaf5c211e3b2ad.html',
+        'only_matching': True,
      }]
  
      def _extract_entries(self, playlist_data_url, show_id, note, query):
          query['callback'] = 'cb'
          playlist_data = self._download_json(
              playlist_data_url, show_id, query=query, note=note,
-            transform_source=lambda s: js_to_json(strip_jsonp(s)))['html']
+            transform_source=lambda s: js_to_json(strip_jsonp(s))).get('html')
+        if playlist_data is None:
+            return [None, None]
          drama_list = (get_element_by_class('p-drama-grid', playlist_data) or
                        get_element_by_class('p-drama-half-row', playlist_data))
          if drama_list is None:
@@ -291,8 +297,8 @@ class YoukuShowIE(InfoExtractor):
                      'id': page_config['showid'],
                      'stage': reload_id,
                  })
-            entries.extend(new_entries)
-
+            if new_entries is not None:
+                entries.extend(new_entries)
          desc = self._html_search_meta('description', webpage, fatal=False)
          playlist_title = desc.split(',')[0] if desc else None
          detail_li = get_element_by_class('p-intro', webpage)
diff --git a/youtube_dl/extractor/youtube.py b/youtube_dl/extractor/youtube.py

index 0919bef0e06ae1b839a7debc38bdf457dafae192..43051512bc1013640bc99d98437b3cf2a7b258a3 100644 (file)
--- a/youtube_dl/extractor/youtube.py
+++ b/youtube_dl/extractor/youtube.py
@@ -1596,6 +1596,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                          if 'token' not in video_info:
                              video_info = get_video_info
                          break
+
+        def extract_unavailable_message():
+            return self._html_search_regex(
+                r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
+                video_webpage, 'unavailable message', default=None)
+
          if 'token' not in video_info:
              if 'reason' in video_info:
                  if 'The uploader has not made this video available in your country.' in video_info['reason']:
@@ -1604,8 +1610,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      countries = regions_allowed.split(',') if regions_allowed else None
                      self.raise_geo_restricted(
                          msg=video_info['reason'][0], countries=countries)
+                reason = video_info['reason'][0]
+                if 'Invalid parameters' in reason:
+                    unavailable_message = extract_unavailable_message()
+                    if unavailable_message:
+                        reason = unavailable_message
                  raise ExtractorError(
-                    'YouTube said: %s' % video_info['reason'][0],
+                    'YouTube said: %s' % reason,
                      expected=True, video_id=video_id)
              else:
                  raise ExtractorError(
@@ -1810,7 +1821,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'url': video_info['conn'][0],
                  'player_url': player_url,
              }]
-        elif len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1:
+        elif not is_live and (len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1):
              encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts', [''])[0]
              if 'rtmpe%3Dyes' in encoded_url_map:
                  raise ExtractorError('rtmpe downloads are not supported, see https://github.com/rg3/youtube-dl/issues/343 for more information.', expected=True)
@@ -1953,9 +1964,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
                  formats.append(a_format)
          else:
-            unavailable_message = self._html_search_regex(
-                r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
-                video_webpage, 'unavailable message', default=None)
+            unavailable_message = extract_unavailable_message()
              if unavailable_message:
                  raise ExtractorError(unavailable_message, expected=True)
              raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
@@ -2530,10 +2539,11 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
          webpage = self._download_webpage(url, channel_id, fatal=False)
          if webpage:
              page_type = self._og_search_property(
-                'type', webpage, 'page type', default=None)
+                'type', webpage, 'page type', default='')
              video_id = self._html_search_meta(
                  'videoId', webpage, 'video id', default=None)
-            if page_type == 'video' and video_id and re.match(r'^[0-9A-Za-z_-]{11}$', video_id):
+            if page_type.startswith('video') and video_id and re.match(
+                    r'^[0-9A-Za-z_-]{11}$', video_id):
                  return self.url_result(video_id, YoutubeIE.ie_key())
          return self.url_result(base_url)
  
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py

index 2843a3dc06be1b7b4d27ecf6678cb0ebdb971070..2fe9cf585db817e1d86831c71ef28502b4a16ee5 100644 (file)
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -39,6 +39,7 @@ from .compat import (
      compat_HTMLParser,
      compat_basestring,
      compat_chr,
+    compat_ctypes_WINFUNCTYPE,
      compat_etree_fromstring,
      compat_expanduser,
      compat_html_entities,
@@ -1330,24 +1331,24 @@ def _windows_write_string(s, out):
      if fileno not in WIN_OUTPUT_IDS:
          return False
  
-    GetStdHandle = ctypes.WINFUNCTYPE(
+    GetStdHandle = compat_ctypes_WINFUNCTYPE(
          ctypes.wintypes.HANDLE, ctypes.wintypes.DWORD)(
-        (b'GetStdHandle', ctypes.windll.kernel32))
+        ('GetStdHandle', ctypes.windll.kernel32))
      h = GetStdHandle(WIN_OUTPUT_IDS[fileno])
  
-    WriteConsoleW = ctypes.WINFUNCTYPE(
+    WriteConsoleW = compat_ctypes_WINFUNCTYPE(
          ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE, ctypes.wintypes.LPWSTR,
          ctypes.wintypes.DWORD, ctypes.POINTER(ctypes.wintypes.DWORD),
-        ctypes.wintypes.LPVOID)((b'WriteConsoleW', ctypes.windll.kernel32))
+        ctypes.wintypes.LPVOID)(('WriteConsoleW', ctypes.windll.kernel32))
      written = ctypes.wintypes.DWORD(0)
  
-    GetFileType = ctypes.WINFUNCTYPE(ctypes.wintypes.DWORD, ctypes.wintypes.DWORD)((b'GetFileType', ctypes.windll.kernel32))
+    GetFileType = compat_ctypes_WINFUNCTYPE(ctypes.wintypes.DWORD, ctypes.wintypes.DWORD)(('GetFileType', ctypes.windll.kernel32))
      FILE_TYPE_CHAR = 0x0002
      FILE_TYPE_REMOTE = 0x8000
-    GetConsoleMode = ctypes.WINFUNCTYPE(
+    GetConsoleMode = compat_ctypes_WINFUNCTYPE(
          ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE,
          ctypes.POINTER(ctypes.wintypes.DWORD))(
-        (b'GetConsoleMode', ctypes.windll.kernel32))
+        ('GetConsoleMode', ctypes.windll.kernel32))
      INVALID_HANDLE_VALUE = ctypes.wintypes.DWORD(-1).value
  
      def not_a_console(handle):
@@ -2266,7 +2267,7 @@ def js_to_json(code):
          "(?:[^"\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^"\\]*"|
          '(?:[^'\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^'\\]*'|
          {comment}|,(?={skip}[\]}}])|
-        [a-zA-Z_][.a-zA-Z_0-9]*|
+        (?:(?<![0-9])[eE]|[a-df-zA-DF-Z_])[.a-zA-Z_0-9]*|
          \b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:{skip}:)?|
          [0-9]+(?={skip}:)
          '''.format(comment=COMMENT_RE, skip=SKIP_RE), fix_kv, code)
diff --git a/youtube_dl/version.py b/youtube_dl/version.py

index a3f84b9ea46ed2a801bc3edb6c34e3551699f61f..8a2b57ffba2eb4763695ae4a2c1af0fa27ff6484 100644 (file)
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
  from __future__ import unicode_literals
  
-__version__ = '2017.12.31'
+__version__ = '2018.01.27'
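
The `js_to_json` hunk in `youtube_dl/utils.py` above narrows the identifier alternative so that an `e` or `E` directly following a digit no longer starts an identifier. This appears to correspond to the ChangeLog entry about scientific notation handling. The following is a minimal sketch that isolates only that one alternative; it is not the full `js_to_json` regular expression. The names `OLD_IDENT`, `NEW_IDENT`, and `identifiers` are illustrative and do not exist in youtube-dl.

```python
import re

# The identifier alternative before and after the change, copied from the hunk.
# OLD_IDENT / NEW_IDENT are hypothetical names used only for this sketch.
OLD_IDENT = re.compile(r'[a-zA-Z_][.a-zA-Z_0-9]*')
NEW_IDENT = re.compile(r'(?:(?<![0-9])[eE]|[a-df-zA-DF-Z_])[.a-zA-Z_0-9]*')


def identifiers(pattern, code):
    """Return every substring the given alternative would treat as an identifier."""
    return pattern.findall(code)


sample = '{x: 1e3, enabled: true}'
old_hits = identifiers(OLD_IDENT, sample)  # also picks up the exponent part of 1e3
new_hits = identifiers(NEW_IDENT, sample)  # the exponent part is no longer matched
```

With the old alternative, the exponent `e3` of `1e3` is picked up as if it were an identifier; with the new one, the negative lookbehind `(?<![0-9])` rejects an `e` that follows a digit, while ordinary names beginning with `e` (such as `enabled`) still match.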
AUTHORS
ChangeLog
README.md
README.txt
devscripts/install_jython.sh [new file with mode: 0755]
docs/supportedsites.md
test/test_download.py
test/test_utils.py
youtube-dl
youtube-dl.1
youtube_dl/YoutubeDL.py
youtube_dl/aes.py
youtube_dl/compat.py
youtube_dl/downloader/f4m.py
youtube_dl/extractor/acast.py
youtube_dl/extractor/adn.py
youtube_dl/extractor/bigflix.py
youtube_dl/extractor/bilibili.py
youtube_dl/extractor/canalplus.py
youtube_dl/extractor/chilloutzone.py
youtube_dl/extractor/chirbit.py
youtube_dl/extractor/common.py
youtube_dl/extractor/crunchyroll.py
youtube_dl/extractor/daisuki.py
youtube_dl/extractor/digg.py [new file with mode: 0644]
youtube_dl/extractor/dplay.py
youtube_dl/extractor/dumpert.py
youtube_dl/extractor/einthusan.py
youtube_dl/extractor/extractors.py
youtube_dl/extractor/franceinter.py
youtube_dl/extractor/gamestar.py
youtube_dl/extractor/generic.py
youtube_dl/extractor/hotnewhiphop.py
youtube_dl/extractor/infoq.py
youtube_dl/extractor/jwplatform.py
youtube_dl/extractor/kamcord.py [deleted file]
youtube_dl/extractor/leeco.py
youtube_dl/extractor/limelight.py
youtube_dl/extractor/lynda.py
youtube_dl/extractor/mangomolo.py
youtube_dl/extractor/mitele.py
youtube_dl/extractor/mixcloud.py
youtube_dl/extractor/motherless.py
youtube_dl/extractor/ndr.py
youtube_dl/extractor/odnoklassniki.py
youtube_dl/extractor/ooyala.py
youtube_dl/extractor/openload.py
youtube_dl/extractor/pandoratv.py
youtube_dl/extractor/prosiebensat1.py
youtube_dl/extractor/restudy.py
youtube_dl/extractor/ringtv.py [deleted file]
youtube_dl/extractor/rtl2.py
youtube_dl/extractor/rtve.py
youtube_dl/extractor/rtvs.py [new file with mode: 0644]
youtube_dl/extractor/seznamzpravy.py [new file with mode: 0644]
youtube_dl/extractor/shared.py
youtube_dl/extractor/soundcloud.py
youtube_dl/extractor/southpark.py
youtube_dl/extractor/spiegel.py
youtube_dl/extractor/sportschau.py [deleted file]
youtube_dl/extractor/springboardplatform.py [new file with mode: 0644]
youtube_dl/extractor/tbs.py
youtube_dl/extractor/teachertube.py
youtube_dl/extractor/teamcoco.py
youtube_dl/extractor/thesixtyone.py [deleted file]
youtube_dl/extractor/tutv.py
youtube_dl/extractor/tvplay.py
youtube_dl/extractor/twitch.py
youtube_dl/extractor/vk.py
youtube_dl/extractor/wdr.py
youtube_dl/extractor/weibo.py [new file with mode: 0644]
youtube_dl/extractor/ximalaya.py [new file with mode: 0644]
youtube_dl/extractor/youku.py
youtube_dl/extractor/youtube.py
youtube_dl/utils.py
youtube_dl/version.py