Tatsuyuki Ishi
Daniel Weber
Kay Bouché
+Yang Hongbo
+Lei Wang
+version 2018.01.27
+
+Core
+* [extractor/common] Improve _json_ld for articles
+* Switch codebase to use compat_b64decode
++ [compat] Add compat_b64decode
+
+Extractors
++ [seznamzpravy] Add support for seznam.cz and seznamzpravy.cz (#14102, #14616)
+* [dplay] Bypass geo restriction
++ [dplay] Add support for disco-api videos (#15396)
+* [youtube] Extract precise error messages (#15284)
+* [teachertube] Capture and output error message
+* [teachertube] Fix and relax thumbnail extraction (#15403)
++ [prosiebensat1] Add another clip id regular expression (#15378)
+* [tbs] Update tokenizer url (#15395)
+* [mixcloud] Use compat_b64decode (#15394)
+- [thesixtyone] Remove extractor (#15341)
+
+
+version 2018.01.21
+
+Core
+* [extractor/common] Improve jwplayer DASH formats extraction (#9242, #15187)
+* [utils] Improve scientific notation handling in js_to_json (#14789)
+
+Extractors
++ [southparkdk] Add support for southparkstudios.nu
++ [southpark] Add support for collections (#14803)
+* [franceinter] Fix upload date extraction (#14996)
++ [rtvs] Add support for rtvs.sk (#9242, #15187)
+* [restudy] Fix extraction and extend URL regular expression (#15347)
+* [youtube:live] Improve live detection (#15365)
++ [springboardplatform] Add support for springboardplatform.com
+* [prosiebensat1] Add another clip id regular expression (#15290)
+- [ringtv] Remove extractor (#15345)
+
+
+version 2018.01.18
+
+Extractors
+* [soundcloud] Update client id (#15306)
+- [kamcord] Remove extractor (#15322)
++ [spiegel] Add support for nexx videos (#15285)
+* [twitch] Fix authentication and error capture (#14090, #15264)
+* [vk] Detect more errors due to copyright complaints (#15259)
+
+
+version 2018.01.14
+
+Extractors
+* [youtube] Fix live streams extraction (#15202)
+* [wdr] Bypass geo restriction
+* [wdr] Rework extractors (#14598)
++ [wdr] Add support for wdrmaus.de/elefantenseite (#14598)
++ [gamestar] Add support for gamepro.de (#3384)
+* [viafree] Skip rtmp formats (#15232)
++ [pandoratv] Add support for mobile URLs (#12441)
++ [pandoratv] Add support for new URL format (#15131)
++ [ximalaya] Add support for ximalaya.com (#14687)
++ [digg] Add support for digg.com (#15214)
+* [limelight] Tolerate empty pc formats (#15150, #15151, #15207)
+* [ndr:embed:base] Make separate formats extraction non fatal (#15203)
++ [weibo] Add extractor (#15079)
++ [ok] Add support for live streams
+* [canalplus] Fix extraction (#15072)
+* [bilibili] Fix extraction (#15188)
+
+
+version 2018.01.07
+
+Core
+* [utils] Fix youtube-dl under PyPy3 on Windows
+* [YoutubeDL] Output python implementation in debug header
+
+Extractors
++ [jwplatform] Add support for multiple embeds (#15192)
+* [mitele] Fix extraction (#15186)
++ [motherless] Add support for groups (#15124)
+* [lynda] Relax URL regular expression (#15185)
+* [soundcloud] Fallback to avatar picture for thumbnail (#12878)
+* [youku] Fix list extraction (#15135)
+* [openload] Fix extraction (#15166)
+* [lynda] Skip invalid subtitles (#15159)
+* [twitch] Pass video id to url_result when extracting playlist (#15139)
+* [rtve.es:alacarta] Fix extraction of some new URLs
+* [acast] Fix extraction (#15147)
+
+
version 2017.12.31
Core
Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
# DESCRIPTION
-**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on Mac OS X. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
+**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
youtube-dl [OPTIONS] URL [URL...]
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
-Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, Mac OS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
+Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to work around login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
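The header and newline requirements described above can be checked mechanically. The following is a minimal sketch only: `has_valid_header` and `normalize_cookie_newlines` are hypothetical helper names used for illustration and are not part of youtube-dl.

```python
import os

# The two first lines the README accepts for a cookies file.
VALID_HEADERS = (b'# HTTP Cookie File', b'# Netscape HTTP Cookie File')


def has_valid_header(data):
    """Return True if the first line of the cookie-file bytes is an accepted header."""
    lines = data.splitlines()
    return bool(lines) and lines[0] in VALID_HEADERS


def normalize_cookie_newlines(data, newline=os.linesep):
    """Return cookie-file bytes with every line ending replaced by `newline`.

    Hypothetical helper for illustration; not part of youtube-dl.
    """
    eol = newline.encode('ascii')
    # bytes.splitlines() splits on \n, \r\n and \r, so mixed files are handled.
    return b''.join(line + eol for line in data.splitlines())
```

Calling `normalize_cookie_newlines(data)` with the default argument targets the convention of the OS the script runs on, which is what the paragraph above asks for.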
YOUTUBE-DL is a command-line program to download videos from YouTube.com
and a few more sites. It requires the Python interpreter, version 2.6,
2.7, or 3.2+, and it is not platform specific. It should work on your
-Unix box, on Windows or on Mac OS X. It is released to the public
-domain, which means you can modify it, redistribute it or use it however
-you like.
+Unix box, on Windows or on macOS. It is released to the public domain,
+which means you can modify it, redistribute it or use it however you
+like.
youtube-dl [OPTIONS] URL [URL...]
# Netscape HTTP Cookie File. Make sure you have correct newline format
in the cookies file and convert newlines if necessary to correspond with
your OS, namely CRLF (\r\n) for Windows and LF (\n) for Unix and
-Unix-like systems (Linux, Mac OS, etc.). HTTP Error 400: Bad Request
-when using --cookies is a good sign of invalid newline format.
+Unix-like systems (Linux, macOS, etc.). HTTP Error 400: Bad Request when
+using --cookies is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to workaround login when a
particular extractor does not implement it explicitly. Another use case
--- /dev/null
+#!/bin/bash
+
+wget http://central.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar
+java -jar jython-installer-2.7.1.jar -s -d "$HOME/jython"
+$HOME/jython/bin/jython -m pip install nose
- **CamdemyFolder**
- **CamWithHer**
- **canalc2.tv**
- - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
+ - **Canalplus**: mycanal.fr and piwiplus.fr
- **Canvas**
- **CanvasEen**: canvas.be and een.be
- **CarambaTV**
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
+ - **Digg**
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **JWPlatform**
- **Kakao**
- **Kaltura**
- - **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan**
- **Karaoketv**
- **Moniker**: allmyvideos.net and vidspot.net
- **Morningstar**: morningstar.com
- **Motherless**
+ - **MotherlessGroup**
- **Motorsport**: motorsport.com
- **MovieClips**
- **MovieFap**
- **revision**
- **revision3:embed**
- **RICE**
- - **RingTV**
- **RMCDecouverte**
- **RockstarGames**
- **RoosterTeeth**
- **rtve.es:live**: RTVE.es live streams
- **rtve.es:television**
- **RTVNH**
+ - **RTVS**
- **Rudo**
- **RUHD**
- **RulePorn**
- **ServingSys**
- **Servus**
- **Sexu**
+ - **SeznamZpravy**
+ - **SeznamZpravyArticle**
- **Shahid**
- **ShahidShow**
- **Shared**: shared.sx
- **Sport5**
- **SportBoxEmbed**
- **SportDeutschland**
- - **Sportschau**
+ - **SpringboardPlatform**
- **Sprout**
- **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
- - **TheSixtyOne**
- **TheStar**
- **TheSun**
- **TheWeatherChannel**
- **WatchIndianPorn**: Watch Indian Porn
- **WDR**
- **wdr:mobile**
+ - **WDRElefant**
+ - **WDRPage**
- **Webcaster**
- **WebcasterFeed**
- **WebOfStories**
- **WebOfStoriesPlaylist**
+ - **Weibo**
+ - **WeiboMobile**
- **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
- **xiami:song**: 虾米音乐
+ - **ximalaya**: 喜马拉雅FM
+ - **ximalaya:album**: 喜马拉雅FM 专辑
- **XMinus**
- **XNXX**
- **Xstream**
def generator(test_case, tname):
def test_template(self):
- ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
- other_ies = [get_info_extractor(ie_key) for ie_key in test_case.get('add_ie', [])]
+ ie = youtube_dl.extractor.get_info_extractor(test_case['name'])()
+ other_ies = [get_info_extractor(ie_key)() for ie_key in test_case.get('add_ie', [])]
is_playlist = any(k.startswith('playlist') for k in test_case)
test_cases = test_case.get(
'playlist', [] if is_playlist else [test_case])
inp = '''{"duration": "00:01:07"}'''
self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
+ inp = '''{segments: [{"offset":-3.885780586188048e-16,"duration":39.75000000000001}]}'''
+ self.assertEqual(js_to_json(inp), '''{"segments": [{"offset":-3.885780586188048e-16,"duration":39.75000000000001}]}''')
+
def test_js_to_json_edgecases(self):
on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
on = js_to_json('{/*comment\n*/42/*comment\n*/:/*comment\n*/42/*comment\n*/}')
self.assertEqual(json.loads(on), {'42': 42})
+ on = js_to_json('{42:4.2e1}')
+ self.assertEqual(json.loads(on), {'42': 42.0})
+
+ def test_js_to_json_malformed(self):
+ self.assertEqual(js_to_json('42a1'), '42"a1"')
+ self.assertEqual(js_to_json('42a-1'), '42"a"-1')
+
def test_extract_attributes(self):
self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
YouTube.com and a few more sites.
It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is
not platform specific.
-It should work on your Unix box, on Windows or on Mac OS X.
+It should work on your Unix box, on Windows or on macOS.
It is released to the public domain, which means you can modify it,
redistribute it or use it however you like.
.SH OPTIONS
format (https://en.wikipedia.org/wiki/Newline) in the cookies file and
convert newlines if necessary to correspond with your OS, namely
\f[C]CRLF\f[] (\f[C]\\r\\n\f[]) for Windows and \f[C]LF\f[]
-(\f[C]\\n\f[]) for Unix and Unix\-like systems (Linux, Mac OS, etc.).
+(\f[C]\\n\f[]) for Unix and Unix\-like systems (Linux, macOS, etc.).
\f[C]HTTP\ Error\ 400:\ Bad\ Request\f[] when using \f[C]\-\-cookies\f[]
is a good sign of invalid newline format.
.PP
sys.exc_clear()
except Exception:
pass
- self._write_string('[debug] Python version %s - %s\n' % (
- platform.python_version(), platform_name()))
+
+ def python_implementation():
+ impl_name = platform.python_implementation()
+ if impl_name == 'PyPy' and hasattr(sys, 'pypy_version_info'):
+ return impl_name + ' version %d.%d.%d' % sys.pypy_version_info[:3]
+ return impl_name
+
+ self._write_string('[debug] Python version %s (%s) - %s\n' % (
+ platform.python_version(), python_implementation(),
+ platform_name()))
exe_versions = FFmpegPostProcessor.get_versions(self)
exe_versions['rtmpdump'] = rtmpdump_version()
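For readers who want to see what the new debug header line looks like outside of youtube-dl, here is a standalone sketch of the same logic. It assumes `platform.platform()` as a stand-in for youtube-dl's own `platform_name()` helper, and `debug_header_line` is a hypothetical name.

```python
import platform
import sys


def python_implementation():
    # Same logic as the hunk above: append the PyPy release when available.
    impl_name = platform.python_implementation()
    if impl_name == 'PyPy' and hasattr(sys, 'pypy_version_info'):
        return impl_name + ' version %d.%d.%d' % sys.pypy_version_info[:3]
    return impl_name


def debug_header_line():
    # platform.platform() is an assumption standing in for platform_name().
    return '[debug] Python version %s (%s) - %s' % (
        platform.python_version(), python_implementation(),
        platform.platform())


if __name__ == '__main__':
    print(debug_header_line())
```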
from __future__ import unicode_literals
-import base64
from math import ceil
+from .compat import compat_b64decode
from .utils import bytes_to_intlist, intlist_to_bytes
BLOCK_SIZE_BYTES = 16
"""
NONCE_LENGTH_BYTES = 8
- data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
+ data = bytes_to_intlist(compat_b64decode(data))
password = bytes_to_intlist(password.encode('utf-8'))
key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password))
# coding: utf-8
from __future__ import unicode_literals
+import base64
import binascii
import collections
+import ctypes
import email
import getpass
import io
import itertools
import optparse
import os
+import platform
import re
import shlex
import shutil
except ImportError:
compat_zip = zip
+
+if sys.version_info < (3, 3):
+    def compat_b64decode(s, *args, **kwargs):
+        if isinstance(s, compat_str):
+            s = s.encode('ascii')
+        return base64.b64decode(s, *args, **kwargs)
+else:
+    compat_b64decode = base64.b64decode
+
+
+if platform.python_implementation() == 'PyPy' and sys.pypy_version_info < (5, 4, 0):
+    # PyPy2 prior to version 5.4.0 expects byte strings as Windows function
+    # names, see the original PyPy issue [1] and the youtube-dl one [2].
+    # 1. https://bitbucket.org/pypy/pypy/issues/2360/windows-ctypescdll-typeerror-function-name
+    # 2. https://github.com/rg3/youtube-dl/pull/4392
+    def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
+        real = ctypes.WINFUNCTYPE(*args, **kwargs)
+
+        def resf(tpl, *args, **kwargs):
+            funcname, dll = tpl
+            return real((str(funcname), dll), *args, **kwargs)
+
+        return resf
+else:
+    def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
+        return ctypes.WINFUNCTYPE(*args, **kwargs)
+
+
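The point of `compat_b64decode` is that call sites no longer need to encode text before decoding it. A minimal Python 3 sketch of the pre-3.3 branch follows; `b64decode_any` is a hypothetical name chosen for illustration, and `str` stands in for `compat_str`.

```python
import base64


def b64decode_any(s, *args, **kwargs):
    # Mirrors the pre-3.3 branch above: text input is encoded as ASCII
    # first, bytes input is passed through unchanged.
    if isinstance(s, str):
        s = s.encode('ascii')
    return base64.b64decode(s, *args, **kwargs)


# Both spellings decode to the same bytes, which is why call sites can
# drop their explicit .encode('ascii') / .encode('utf-8') calls.
text_result = b64decode_any('aGVsbG8=')
bytes_result = b64decode_any(b'aGVsbG8=')
```

On Python 3.3 and later `base64.b64decode` already accepts ASCII-only text, which is why the `else` branch can alias it directly.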
__all__ = [
'compat_HTMLParseError',
'compat_HTMLParser',
'compat_HTTPError',
+ 'compat_b64decode',
'compat_basestring',
'compat_chr',
'compat_cookiejar',
'compat_cookies',
+ 'compat_ctypes_WINFUNCTYPE',
'compat_etree_fromstring',
'compat_etree_register_namespace',
'compat_expanduser',
from __future__ import division, unicode_literals
-import base64
import io
import itertools
import time
from .fragment import FragmentFD
from ..compat import (
+ compat_b64decode,
compat_etree_fromstring,
compat_urlparse,
compat_urllib_error,
boot_info = self._get_bootstrap_from_url(bootstrap_url)
else:
bootstrap_url = None
- bootstrap = base64.b64decode(node.text.encode('ascii'))
+ bootstrap = compat_b64decode(node.text)
boot_info = read_bootstrap_info(bootstrap)
return boot_info, bootstrap_url
live = boot_info['live']
metadata_node = media.find(_add_ns('metadata'))
if metadata_node is not None:
- metadata = base64.b64decode(metadata_node.text.encode('ascii'))
+ metadata = compat_b64decode(metadata_node.text)
else:
metadata = None
from ..compat import compat_str
from ..utils import (
int_or_none,
- parse_iso8601,
+ unified_timestamp,
OnDemandPagedList,
)
}, {
# test with multiple blings
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
- 'md5': '55c0097badd7095f494c99a172f86501',
+ 'md5': 'e87d5b8516cd04c0d81b6ee1caca28d0',
'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3',
'timestamp': 1477346700,
'upload_date': '20161024',
'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
- 'duration': 2797,
+ 'duration': 2766,
}
}]
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
cast_data = self._download_json(
- 'https://embed.acast.com/api/acasts/%s/%s' % (channel, display_id), display_id)
+ 'https://play-api.acast.com/splash/%s/%s' % (channel, display_id), display_id)
+ e = cast_data['result']['episode']
return {
- 'id': compat_str(cast_data['id']),
+ 'id': compat_str(e['id']),
'display_id': display_id,
- 'url': [b['audio'] for b in cast_data['blings'] if b['type'] == 'BlingAudio'][0],
- 'title': cast_data['name'],
- 'description': cast_data.get('description'),
- 'thumbnail': cast_data.get('image'),
- 'timestamp': parse_iso8601(cast_data.get('publishingDate')),
- 'duration': int_or_none(cast_data.get('duration')),
+ 'url': e['mediaUrl'],
+ 'title': e['name'],
+ 'description': e.get('description'),
+ 'thumbnail': e.get('image'),
+ 'timestamp': unified_timestamp(e.get('publishingDate')),
+ 'duration': int_or_none(e.get('duration')),
}
# coding: utf-8
from __future__ import unicode_literals
-import base64
import json
import os
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
-from ..compat import compat_ord
+from ..compat import (
+ compat_b64decode,
+ compat_ord,
+)
from ..utils import (
bytes_to_intlist,
ExtractorError,
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
- bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
+ bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(b'\x1b\xe0\x29\x61\x38\x94\x24\x00\x12\xbd\xc5\x80\xac\xce\xbe\xb0'),
- bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
+ bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
dec_subtitles[:-compat_ord(dec_subtitles[-1])].decode(),
# coding: utf-8
from __future__ import unicode_literals
-import base64
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..compat import (
+ compat_b64decode,
+ compat_urllib_parse_unquote,
+)
class BigflixIE(InfoExtractor):
webpage, 'title')
def decode_url(quoted_b64_url):
- return base64.b64decode(compat_urllib_parse_unquote(
- quoted_b64_url).encode('ascii')).decode('utf-8')
+ return compat_b64decode(compat_urllib_parse_unquote(
+ quoted_b64_url)).decode('utf-8')
formats = []
for height, encoded_url in re.findall(
video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
headers = {
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
+ 'Referer': url
}
headers.update(self.geo_verification_headers())
payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
+ headers = {
+ 'Referer': url
+ }
+ headers.update(self.geo_verification_headers())
+
video_info = self._download_json(
'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
video_id, note='Downloading video info page',
- headers=self.geo_verification_headers())
+ headers=headers)
if 'durl' not in video_info:
self._report_error(video_info)
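The request built in this hunk is signed by hashing the query string together with a secret. A rough sketch of that step in isolation, assuming placeholder values for the app key and secret (the real `_APP_KEY` and `_BILIBILI_KEY` are not shown in this diff) and a hypothetical function name:

```python
import hashlib


def signed_playurl(cid, app_key, secret, page_url):
    """Build the signed playurl request and its headers (illustrative only)."""
    payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (app_key, cid)
    # The signature is the MD5 hex digest of the payload followed by the secret.
    sign = hashlib.md5((payload + secret).encode('utf-8')).hexdigest()
    request_url = 'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign)
    # The hunk above adds a Referer header pointing at the page URL.
    headers = {'Referer': page_url}
    return request_url, headers
```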
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlparse
from ..utils import (
- dict_get,
# ExtractorError,
# HEADRequest,
int_or_none,
qualities,
- remove_end,
unified_strdate,
)
class CanalplusIE(InfoExtractor):
- IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
- _VALID_URL = r'''(?x)
- https?://
- (?:
- (?:
- (?:(?:www|m)\.)?canalplus\.fr|
- (?:www\.)?piwiplus\.fr|
- (?:www\.)?d8\.tv|
- (?:www\.)?c8\.fr|
- (?:www\.)?d17\.tv|
- (?:(?:football|www)\.)?cstar\.fr|
- (?:www\.)?itele\.fr
- )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
- player\.canalplus\.fr/#/(?P<id>\d+)
- )
-
- '''
+ IE_DESC = 'mycanal.fr and piwiplus.fr'
+ _VALID_URL = r'https?://(?:www\.)?(?P<site>mycanal|piwiplus)\.fr/(?:[^/]+/)*(?P<display_id>[^?/]+)(?:\.html\?.*\bvid=|/p/)(?P<id>\d+)'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = {
- 'canalplus': 'cplus',
+ 'mycanal': 'cplus',
'piwiplus': 'teletoon',
- 'd8': 'd8',
- 'c8': 'd8',
- 'd17': 'd17',
- 'cstar': 'd17',
- 'itele': 'itele',
}
# Only works for direct mp4 URLs
_GEO_COUNTRIES = ['FR']
_TESTS = [{
- 'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
+ 'url': 'https://www.mycanal.fr/d17-emissions/lolywood/p/1397061',
'info_dict': {
- 'id': '1405510',
- 'display_id': 'pid1830-c-zapping',
+ 'id': '1397061',
+ 'display_id': 'lolywood',
'ext': 'mp4',
- 'title': 'Zapping - 02/07/2016',
- 'description': 'Le meilleur de toutes les chaînes, tous les jours',
- 'upload_date': '20160702',
+ 'title': 'Euro 2016 : Je préfère te prévenir - Lolywood - Episode 34',
+ 'description': 'md5:7d97039d455cb29cdba0d652a0efaa5e',
+ 'upload_date': '20160602',
},
}, {
# geo restricted, bypassed
'upload_date': '20140724',
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
- }, {
- # geo restricted, bypassed
- 'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html?vid=1443684',
- 'md5': 'bb6f9f343296ab7ebd88c97b660ecf8d',
- 'info_dict': {
- 'id': '1443684',
- 'display_id': 'pid6318-videos-integrales',
- 'ext': 'mp4',
- 'title': 'Guess my iep ! - TPMP - 07/04/2017',
- 'description': 'md5:6f005933f6e06760a9236d9b3b5f17fa',
- 'upload_date': '20170407',
- },
- 'expected_warnings': ['HTTP Error 403: Forbidden'],
- }, {
- 'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
- 'info_dict': {
- 'id': '1420176',
- 'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
- 'ext': 'mp4',
- 'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
- 'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
- 'upload_date': '20161014',
- },
- }, {
- 'url': 'http://football.cstar.fr/cstar-minisite-foot/pid7566-feminines-videos.html?vid=1416769',
- 'info_dict': {
- 'id': '1416769',
- 'display_id': 'pid7566-feminines-videos',
- 'ext': 'mp4',
- 'title': 'France - Albanie : les temps forts de la soirée - 20/09/2016',
- 'description': 'md5:c3f30f2aaac294c1c969b3294de6904e',
- 'upload_date': '20160921',
- },
- 'params': {
- 'skip_download': True,
- },
- }, {
- 'url': 'http://m.canalplus.fr/?vid=1398231',
- 'only_matching': True,
- }, {
- 'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
- 'only_matching': True,
}]
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
-
- site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
-
- # Beware, some subclasses do not define an id group
- display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
+ site, display_id, video_id = re.match(self._VALID_URL, url).groups()
- webpage = self._download_webpage(url, display_id)
- video_id = self._search_regex(
- [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
- r'id=["\']canal_video_player(?P<id>\d+)',
- r'data-video=["\'](?P<id>\d+)'],
- webpage, 'video id', default=mobj.group('vid'), group='id')
+ site_id = self._SITE_ID_MAP[site]
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
format_url + '?hdcore=2.11.3', video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
- # the secret extracted ya function in http://player.canalplus.fr/common/js/canalPlayer.js
+ # the secret extracted from ya function in http://player.canalplus.fr/common/js/canalPlayer.js
'url': format_url + '?secret=pqzerjlsmdkjfoiuerhsdlfknaes',
'format_id': format_id,
'preference': preference(format_id),
from __future__ import unicode_literals
import re
-import base64
import json
from .common import InfoExtractor
from .youtube import YoutubeIE
+from ..compat import compat_b64decode
from ..utils import (
clean_html,
ExtractorError
base64_video_info = self._html_search_regex(
r'var cozVidData = "(.+?)";', webpage, 'video data')
- decoded_video_info = base64.b64decode(base64_video_info.encode('utf-8')).decode('utf-8')
+ decoded_video_info = compat_b64decode(base64_video_info).decode('utf-8')
video_info_dict = json.loads(decoded_video_info)
# get video information from dict
# coding: utf-8
from __future__ import unicode_literals
-import base64
import re
from .common import InfoExtractor
+from ..compat import compat_b64decode
from ..utils import parse_duration
# Reverse engineered from https://chirb.it/js/chirbit.player.js (look
# for soundURL)
- audio_url = base64.b64decode(
- data_fd[::-1].encode('ascii')).decode('utf-8')
+ audio_url = compat_b64decode(data_fd[::-1]).decode('utf-8')
title = self._search_regex(
r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')
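The decoding step in this hunk reverses the string before base64-decoding it. A small round-trip sketch, written for Python 3; both function names are hypothetical and the encoder exists only to build sample input:

```python
import base64


def decode_reversed_b64(data_fd):
    # Undo the string reversal, then base64-decode and read as UTF-8.
    return base64.b64decode(data_fd[::-1]).decode('utf-8')


def encode_reversed_b64(url):
    # Inverse of the above, used here only to produce test data.
    return base64.b64encode(url.encode('utf-8')).decode('ascii')[::-1]


sample = encode_reversed_b64('https://example.com/audio.mp3')
decoded = decode_reversed_b64(sample)
```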
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name'))
- elif item_type == 'Article':
+ elif item_type in ('Article', 'NewsArticle'):
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),
'title': unescapeHTML(e.get('headline')),
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=m3u8_id, fatal=False))
- elif ext == 'mpd':
+ elif source_type == 'dash' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, video_id, mpd_id=mpd_id, fatal=False))
elif ext == 'smil':
import re
import json
-import base64
import zlib
from hashlib import sha1
from math import pow, sqrt, floor
from .common import InfoExtractor
from ..compat import (
+ compat_b64decode,
compat_etree_fromstring,
compat_urllib_parse_urlencode,
compat_urllib_request,
}
def _decrypt_subtitles(self, data, iv, id):
- data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
- iv = bytes_to_intlist(base64.b64decode(iv.encode('utf-8')))
+ data = bytes_to_intlist(compat_b64decode(data))
+ iv = bytes_to_intlist(compat_b64decode(iv))
id = int(id)
def obfuscate_key_aux(count, modulo, start):
aes_cbc_decrypt,
aes_cbc_encrypt,
)
+from ..compat import compat_b64decode
from ..utils import (
bytes_to_intlist,
bytes_to_long,
rtn = self._parse_json(
intlist_to_bytes(aes_cbc_decrypt(bytes_to_intlist(
- base64.b64decode(encrypted_rtn)),
+ compat_b64decode(encrypted_rtn)),
aes_key, iv)).decode('utf-8').rstrip('\0'),
video_id)
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import js_to_json
+
+
+class DiggIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?digg\.com/video/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        # JWPlatform via provider
+        'url': 'http://digg.com/video/sci-fi-short-jonah-daniel-kaluuya-get-out',
+        'info_dict': {
+            'id': 'LcqvmS0b',
+            'ext': 'mp4',
+            'title': "'Get Out' Star Daniel Kaluuya Goes On 'Moby Dick'-Like Journey In Sci-Fi Short 'Jonah'",
+            'description': 'md5:541bb847648b6ee3d6514bc84b82efda',
+            'upload_date': '20180109',
+            'timestamp': 1515530551,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # Youtube via provider
+        'url': 'http://digg.com/video/dog-boat-seal-play',
+        'only_matching': True,
+    }, {
+        # vimeo as regular embed
+        'url': 'http://digg.com/video/dream-girl-short-film',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        info = self._parse_json(
+            self._search_regex(
+                r'(?s)video_info\s*=\s*({.+?});\n', webpage, 'video info',
+                default='{}'), display_id, transform_source=js_to_json,
+            fatal=False)
+
+        video_id = info.get('video_id')
+
+        if video_id:
+            provider = info.get('provider_name')
+            if provider == 'youtube':
+                return self.url_result(
+                    video_id, ie='Youtube', video_id=video_id)
+            elif provider == 'jwplayer':
+                return self.url_result(
+                    'jwplatform:%s' % video_id, ie='JWPlatform',
+                    video_id=video_id)
+
+        return self.url_result(url, 'Generic')
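The extractor above does not download anything itself; it only decides which other extractor should handle the video. That decision can be sketched as a pure function. `pick_delegate` is a hypothetical name, and treating a `None` result from the parse step as an empty dict is an assumption added here, not something the extractor does.

```python
def pick_delegate(info, page_url):
    """Return (url, extractor key) for the extractor that should take over."""
    info = info or {}  # assumption: tolerate a failed, non-fatal parse
    video_id = info.get('video_id')
    if video_id:
        provider = info.get('provider_name')
        if provider == 'youtube':
            return video_id, 'Youtube'
        if provider == 'jwplayer':
            return 'jwplatform:%s' % video_id, 'JWPlatform'
    # Unknown provider or no embedded video info: let the generic extractor try.
    return page_url, 'Generic'
```

Keeping the fallback to `'Generic'` last means regular embeds, such as the Vimeo test case, still resolve without a dedicated branch.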
compat_urlparse,
)
from ..utils import (
+ determine_ext,
ExtractorError,
+ float_or_none,
int_or_none,
remove_end,
try_get,
unified_strdate,
+ unified_timestamp,
update_url_query,
USER_AGENTS,
)
class DPlayIE(InfoExtractor):
- _VALID_URL = r'https?://(?P<domain>www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
+ _VALID_URL = r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/(?:videoer/)?(?P<id>[^/]+/[^/?#]+)'
_TESTS = [{
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': {
'id': '3172',
- 'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
+ 'display_id': 'nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet',
'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'info_dict': {
'id': '70816',
- 'display_id': 'season-6-episode-12',
+ 'display_id': 'mig-og-min-mor/season-6-episode-12',
'ext': 'mp4',
'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True,
+ }, {
+ # disco-api
+ 'url': 'https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7',
+ 'info_dict': {
+ 'id': '40206',
+ 'display_id': 'i-kongens-klr/sesong-1-episode-7',
+ 'ext': 'mp4',
+ 'title': 'Episode 7',
+ 'description': 'md5:e3e1411b2b9aebeea36a6ec5d50c60cf',
+ 'duration': 2611.16,
+ 'timestamp': 1516726800,
+ 'upload_date': '20180123',
+ 'series': 'I kongens klær',
+ 'season_number': 1,
+ 'episode_number': 7,
+ },
+ 'params': {
+ 'format': 'bestvideo',
+ 'skip_download': True,
+ },
+ }, {
+ # geo restricted, bypassable via X-Forwarded-For
+ 'url': 'https://www.dplay.dk/videoer/singleliv/season-5-episode-3',
+ 'only_matching': True,
}]
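The new `_VALID_URL` carries more information than the old one: the `host` and `country` groups feed the disco-api base URL, the token realm and the geo bypass, and `id` now spans two path segments. A sketch of how one URL decomposes, using the pattern from this diff; `split_dplay_url` and the returned dictionary keys are hypothetical names for illustration.

```python
import re

# Pattern copied from the new _VALID_URL in this diff.
DPLAY_URL_RE = re.compile(
    r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/'
    r'(?:videoer/)?(?P<id>[^/]+/[^/?#]+)')


def split_dplay_url(url):
    mobj = DPLAY_URL_RE.match(url)
    if mobj is None:
        return None
    host = mobj.group('host')
    return {
        'display_id': mobj.group('id'),
        # Country code passed to the geo bypass initialisation.
        'geo_countries': [mobj.group('country').upper()],
        # Values derived from the host for the disco-api requests.
        'disco_base': 'https://disco-api.%s' % host,
        'realm': host.replace('.', ''),
    }


parts = split_dplay_url('https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7')
```

This also explains why the expected `display_id` values in the older test cases changed: the show slug is now part of the match.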
def _real_extract(self, url):
display_id = mobj.group('id')
domain = mobj.group('domain')
+ self._initialize_geo_bypass([mobj.group('country').upper()])
+
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
- r'data-video-id=["\'](\d+)', webpage, 'video id')
+ r'data-video-id=["\'](\d+)', webpage, 'video id', default=None)
+
+ if not video_id:
+ host = mobj.group('host')
+ disco_base = 'https://disco-api.%s' % host
+ self._download_json(
+ '%s/token' % disco_base, display_id, 'Downloading token',
+ query={
+ 'realm': host.replace('.', ''),
+ })
+ video = self._download_json(
+ '%s/content/videos/%s' % (disco_base, display_id), display_id,
+ headers={
+ 'Referer': url,
+ 'x-disco-client': 'WEB:UNKNOWN:dplay-client:0.0.1',
+ }, query={
+ 'include': 'show'
+ })
+ video_id = video['data']['id']
+ info = video['data']['attributes']
+ title = info['name']
+ formats = []
+ for format_id, format_dict in self._download_json(
+ '%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
+ display_id)['data']['attributes']['streaming'].items():
+ if not isinstance(format_dict, dict):
+ continue
+ format_url = format_dict.get('url')
+ if not format_url:
+ continue
+ ext = determine_ext(format_url)
+ if format_id == 'dash' or ext == 'mpd':
+ formats.extend(self._extract_mpd_formats(
+ format_url, display_id, mpd_id='dash', fatal=False))
+ elif format_id == 'hls' or ext == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ format_url, display_id, 'mp4',
+ entry_protocol='m3u8_native', m3u8_id='hls',
+ fatal=False))
+ else:
+ formats.append({
+ 'url': format_url,
+ 'format_id': format_id,
+ })
+ self._sort_formats(formats)
+
+ series = None
+ try:
+ included = video.get('included')
+ if isinstance(included, list):
+ show = next(e for e in included if e.get('type') == 'show')
+ series = try_get(
+ show, lambda x: x['attributes']['name'], compat_str)
+ except StopIteration:
+ pass
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': title,
+ 'description': info.get('description'),
+ 'duration': float_or_none(
+ info.get('videoDuration'), scale=1000),
+ 'timestamp': unified_timestamp(info.get('publishStart')),
+ 'series': series,
+ 'season_number': int_or_none(info.get('seasonNumber')),
+ 'episode_number': int_or_none(info.get('episodeNumber')),
+ 'age_limit': int_or_none(info.get('minimum_age')),
+ 'formats': formats,
+ }
info = self._download_json(
'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
# coding: utf-8
from __future__ import unicode_literals
-import base64
import re
from .common import InfoExtractor
+from ..compat import compat_b64decode
from ..utils import (
qualities,
sanitized_Request,
r'data-files="([^"]+)"', webpage, 'data files')
files = self._parse_json(
- base64.b64decode(files_base64.encode('utf-8')).decode('utf-8'),
+ compat_b64decode(files_base64).decode('utf-8'),
video_id)
quality = qualities(['flv', 'mobile', 'tablet', '720p'])
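Note on the `compat_b64decode` hunks in this patch: they drop the explicit `.encode(...)` call that used to precede `base64.b64decode`. A minimal sketch of what such a helper could look like, assuming its only job is to accept text as well as bytes. The name `b64decode_text_or_bytes` is hypothetical and is not the project's actual implementation:

```python
import base64


def b64decode_text_or_bytes(s):
    """Decode base64 input given either as text or as bytes.

    Hypothetical illustration: text input is encoded to ASCII first, so the
    same call works on interpreters where b64decode only accepts bytes.
    """
    if not isinstance(s, bytes):
        s = s.encode('ascii')
    return base64.b64decode(s)


# Callers no longer need to encode before decoding.
decoded = b64decode_text_or_bytes('eyJhIjogMX0=').decode('utf-8')
print(decoded)
```

The result is always bytes, which is why the call sites in this patch still chain `.decode('utf-8')` where they need text.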
# coding: utf-8
from __future__ import unicode_literals
-import base64
import json
from .common import InfoExtractor
from ..compat import (
- compat_urlparse,
+ compat_b64decode,
compat_str,
+ compat_urlparse,
)
from ..utils import (
extract_attributes,
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
def _decrypt(self, encrypted_data, video_id):
- return self._parse_json(base64.b64decode((
+ return self._parse_json(compat_b64decode((
encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
- ).encode('ascii')).decode('utf-8'), video_id)
+ )).decode('utf-8'), video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
from .democracynow import DemocracynowIE
from .dfb import DFBIE
from .dhm import DHMIE
+from .digg import DiggIE
from .dotsub import DotsubIE
from .douyutv import (
DouyuShowIE,
from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE
from .kaltura import KalturaIE
-from .kamcord import KamcordIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
from .mojvideo import MojvideoIE
from .moniker import MonikerIE
from .morningstar import MorningstarIE
-from .motherless import MotherlessIE
+from .motherless import (
+ MotherlessIE,
+ MotherlessGroupIE
+)
from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE
Revision3IE,
)
from .rice import RICEIE
-from .ringtv import RingTVIE
from .rmcdecouverte import RMCDecouverteIE
from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
from .rtvnh import RTVNHIE
+from .rtvs import RTVSIE
from .rudo import RudoIE
from .ruhd import RUHDIE
from .ruleporn import RulePornIE
from .servus import ServusIE
from .sevenplus import SevenPlusIE
from .sexu import SexuIE
+from .seznamzpravy import (
+ SeznamZpravyIE,
+ SeznamZpravyArticleIE,
+)
from .shahid import (
ShahidIE,
ShahidShowIE,
from .sport5 import Sport5IE
from .sportbox import SportBoxEmbedIE
from .sportdeutschland import SportDeutschlandIE
-from .sportschau import SportschauIE
+from .springboardplatform import SpringboardPlatformIE
from .sprout import SproutIE
from .srgssr import (
SRGSSRIE,
ThePlatformFeedIE,
)
from .thescene import TheSceneIE
-from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .thesun import TheSunIE
from .theweatherchannel import TheWeatherChannelIE
from .watchindianporn import WatchIndianPornIE
from .wdr import (
WDRIE,
+ WDRPageIE,
+ WDRElefantIE,
WDRMobileIE,
)
from .webcaster import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
+from .weibo import (
+ WeiboIE,
+ WeiboMobileIE
+)
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
XiamiArtistIE,
XiamiCollectionIE
)
+from .ximalaya import (
+ XimalayaIE,
+ XimalayaAlbumIE
+)
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xstream import XstreamIE
description = self._og_search_description(webpage)
upload_date_str = self._search_regex(
- r'class=["\']cover-emission-period["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
+ r'class=["\']\s*cover-emission-period\s*["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
webpage, 'upload date', fatal=False)
if upload_date_str:
upload_date_list = upload_date_str.split()
# coding: utf-8
from __future__ import unicode_literals
+import re
+
from .common import InfoExtractor
from ..utils import (
int_or_none,
class GameStarIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?game(?P<site>pro|star)\.de/videos/.*,(?P<id>[0-9]+)\.html'
+ _TESTS = [{
'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
- 'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
+ 'md5': 'ee782f1f8050448c95c5cacd63bc851c',
'info_dict': {
'id': '76110',
'ext': 'mp4',
'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
'thumbnail': r're:^https?://.*\.jpg$',
- 'timestamp': 1406542020,
+ 'timestamp': 1406542380,
'upload_date': '20140728',
- 'duration': 17
+ 'duration': 17,
}
- }
+ }, {
+ 'url': 'http://www.gamepro.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.gamestar.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
- video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
+ mobj = re.match(self._VALID_URL, url)
+ site = mobj.group('site')
+ video_id = mobj.group('id')
- url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
+ webpage = self._download_webpage(url, video_id)
# TODO: there are multiple ld+json objects in the webpage,
# while _search_json_ld finds only the first one
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>[^<]+VideoObject[^<]+)</script>',
webpage, 'JSON-LD', group='json_ld'), video_id)
info_dict = self._json_ld(json_ld, video_id)
- info_dict['title'] = remove_end(info_dict['title'], ' - GameStar')
+ info_dict['title'] = remove_end(
+ info_dict['title'], ' - Game%s' % site.title())
- view_count = json_ld.get('interactionCount')
+ view_count = int_or_none(json_ld.get('interactionCount'))
comment_count = int_or_none(self._html_search_regex(
- r'([0-9]+) Kommentare</span>', webpage, 'comment_count',
- fatal=False))
+ r'<span>Kommentare</span>\s*<span[^>]+class=["\']count[^>]+>\s*\(\s*([0-9]+)',
+ webpage, 'comment count', fatal=False))
info_dict.update({
'id': video_id,
- 'url': url,
+ 'url': 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id,
'ext': 'mp4',
'view_count': view_count,
'comment_count': comment_count
from .channel9 import Channel9IE
from .vshare import VShareIE
from .mediasite import MediasiteIE
+from .springboardplatform import SpringboardPlatformIE
class GenericIE(InfoExtractor):
'timestamp': 1474354800,
'upload_date': '20160920',
}
+ },
+ {
+ 'url': 'http://www.kidzworld.com/article/30935-trolls-the-beat-goes-on-interview-skylar-astin-and-amanda-leighton',
+ 'info_dict': {
+ 'id': '1731611',
+ 'ext': 'mp4',
+ 'title': 'Official Trailer | TROLLS: THE BEAT GOES ON!',
+ 'description': 'md5:eb5f23826a027ba95277d105f248b825',
+ 'timestamp': 1516100691,
+ 'upload_date': '20180116',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ 'add_ie': [SpringboardPlatformIE.ie_key()],
}
# {
# # TODO: find another test
return self.url_result(viewlift_url)
# Look for JWPlatform embeds
- jwplatform_url = JWPlatformIE._extract_url(webpage)
- if jwplatform_url:
- return self.url_result(jwplatform_url, 'JWPlatform')
+ jwplatform_urls = JWPlatformIE._extract_urls(webpage)
+ if jwplatform_urls:
+ return self.playlist_from_matches(jwplatform_urls, video_id, video_title, ie=JWPlatformIE.ie_key())
# Look for Digiteka embeds
digiteka_url = DigitekaIE._extract_url(webpage)
for mediasite_url in mediasite_urls]
return self.playlist_result(entries, video_id, video_title)
+ springboardplatform_urls = SpringboardPlatformIE._extract_urls(webpage)
+ if springboardplatform_urls:
+ return self.playlist_from_matches(
+ springboardplatform_urls, video_id, video_title,
+ ie=SpringboardPlatformIE.ie_key())
+
def merge_dicts(dict1, dict2):
merged = {}
for k, v in dict1.items():
from __future__ import unicode_literals
-import base64
-
from .common import InfoExtractor
+from ..compat import compat_b64decode
from ..utils import (
ExtractorError,
HEADRequest,
if 'mediaKey' not in mkd:
raise ExtractorError('Did not get a media key')
- redirect_url = base64.b64decode(video_url_base64).decode('utf-8')
+ redirect_url = compat_b64decode(video_url_base64).decode('utf-8')
redirect_req = HEADRequest(redirect_url)
req = self._request_webpage(
redirect_req, video_id,
from __future__ import unicode_literals
-import base64
-
from ..compat import (
+ compat_b64decode,
compat_urllib_parse_unquote,
compat_urlparse,
)
encoded_id = self._search_regex(
r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id', default=None)
- real_id = compat_urllib_parse_unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
+ real_id = compat_urllib_parse_unquote(compat_b64decode(encoded_id).decode('utf-8'))
playpath = 'mp4:' + real_id
return [{
@staticmethod
def _extract_url(webpage):
- mobj = re.search(
- r'<(?:script|iframe)[^>]+?src=["\'](?P<url>(?:https?:)?//content.jwplatform.com/players/[a-zA-Z0-9]{8})',
+ urls = JWPlatformIE._extract_urls(webpage)
+ return urls[0] if urls else None
+
+ @staticmethod
+ def _extract_urls(webpage):
+ return re.findall(
+ r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
webpage)
- if mobj:
- return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
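Note on the JWPlatform hunk above: the single-match `re.search` is replaced by `re.findall`, and the old single-URL helper becomes a thin wrapper around the new one. A standalone sketch of that pattern, assuming a regular expression modelled on the one in the hunk; the function names and the sample HTML are hypothetical:

```python
import re

# Pattern modelled on the hunk above; it has exactly one capturing group.
PLAYER_SRC_RE = (
    r'<(?:script|iframe)[^>]+?src=["\']'
    r'((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})')


def extract_urls(webpage):
    # With a single capturing group, re.findall returns a list of that group.
    return re.findall(PLAYER_SRC_RE, webpage)


def extract_url(webpage):
    # Single-URL variant kept for older callers: first match or None.
    urls = extract_urls(webpage)
    return urls[0] if urls else None


sample = (
    '<iframe src="//content.jwplatform.com/players/abcd1234-xyz.html"></iframe>'
    '<script src="https://content.jwplatform.com/players/EFGH5678-p.js"></script>')
print(extract_urls(sample))
```

Returning a list lets the generic extractor build a playlist when a page embeds more than one player.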
+++ /dev/null
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
- int_or_none,
- qualities,
-)
-
-
-class KamcordIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
- _TEST = {
- 'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
- 'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
- 'info_dict': {
- 'id': 'hNYRduDgWb4',
- 'ext': 'mp4',
- 'title': 'Drinking Madness',
- 'uploader': 'jacksfilms',
- 'uploader_id': '3044562',
- 'view_count': int,
- 'like_count': int,
- 'comment_count': int,
- },
- }
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id)
-
- video = self._parse_json(
- self._search_regex(
- r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
- webpage, 'video'),
- video_id)['video']
-
- title = video['title']
-
- formats = self._extract_m3u8_formats(
- video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
- self._sort_formats(formats)
-
- uploader = video.get('user', {}).get('username')
- uploader_id = video.get('user', {}).get('id')
-
- view_count = int_or_none(video.get('viewCount'))
- like_count = int_or_none(video.get('heartCount'))
- comment_count = int_or_none(video.get('messageCount'))
-
- preference_key = qualities(('small', 'medium', 'large'))
-
- thumbnails = [{
- 'url': thumbnail_url,
- 'id': thumbnail_id,
- 'preference': preference_key(thumbnail_id),
- } for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
- if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
-
- return {
- 'id': video_id,
- 'title': title,
- 'uploader': uploader,
- 'uploader_id': uploader_id,
- 'view_count': view_count,
- 'like_count': like_count,
- 'comment_count': comment_count,
- 'thumbnails': thumbnails,
- 'formats': formats,
- }
# coding: utf-8
from __future__ import unicode_literals
-import base64
import datetime
import hashlib
import re
from .common import InfoExtractor
from ..compat import (
+ compat_b64decode,
compat_ord,
compat_str,
compat_urllib_parse_urlencode,
raise ExtractorError('Letv cloud returned an unknwon error')
def b64decode(s):
- return base64.b64decode(s.encode('utf-8')).decode('utf-8')
+ return compat_b64decode(s).decode('utf-8')
formats = []
for media in play_json['data']['video_info']['media'].values():
float_or_none,
int_or_none,
smuggle_url,
+ try_get,
unsmuggle_url,
ExtractorError,
)
'subtitles': subtitles,
}
+ def _extract_info_helper(self, pc, mobile, i, metadata):
+ return self._extract_info(
+ try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
+ try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
+ metadata)
+
class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight'
'getMobilePlaylistByMediaId', 'properties',
smuggled_data.get('source_url'))
- return self._extract_info(
- pc['playlistItems'][0].get('streams', []),
- mobile['mediaList'][0].get('mobileUrls', []) if mobile else [],
- metadata)
+ return self._extract_info_helper(pc, mobile, 0, metadata)
class LimelightChannelIE(LimelightBaseIE):
'media', smuggled_data.get('source_url'))
entries = [
- self._extract_info(
- pc['playlistItems'][i].get('streams', []),
- mobile['mediaList'][i].get('mobileUrls', []) if mobile else [],
- medias['media_list'][i])
+ self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
for i in range(len(medias['media_list']))]
return self.playlist_result(entries, channel_id, pc['title'])
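Note on the Limelight hunks above: the chained `pc['playlistItems'][i].get(...)` lookups are replaced by `try_get(...) or []`, so a missing key, a short list, or a `None` response no longer raises. A rough sketch of how a helper with that behaviour might work, assuming it only swallows lookup errors and checks the result type; `try_get_sketch` and `streams_for` are hypothetical names, not the project's code:

```python
def try_get_sketch(src, getter, expected_type=None):
    """Return getter(src), or None if the lookup fails or has the wrong type."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def streams_for(pc, i):
    # Fall back to an empty list whenever the nested lookup is unavailable.
    return try_get_sketch(
        pc, lambda x: x['playlistItems'][i]['streams'], list) or []


print(streams_for({'playlistItems': [{'streams': ['a', 'b']}]}, 0))
print(streams_for(None, 0))
```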
class LyndaIE(LyndaBaseIE):
IE_NAME = 'lynda'
IE_DESC = 'lynda.com videos'
- _VALID_URL = r'https?://(?:www\.)?(?:lynda\.com|educourse\.ga)/(?:[^/]+/[^/]+/(?P<course_id>\d+)|player/embed)/(?P<id>\d+)'
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:www\.)?(?:lynda\.com|educourse\.ga)/
+ (?:
+ (?:[^/]+/){2,3}(?P<course_id>\d+)|
+ player/embed
+ )/
+ (?P<id>\d+)
+ '''
_TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'
}, {
'url': 'https://educourse.ga/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
'only_matching': True,
+ }, {
+ 'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html',
+ 'only_matching': True,
}]
def _raise_unavailable(self, video_id):
def _get_subtitles(self, video_id):
url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
subs = self._download_json(url, None, False)
- if subs:
- return {'en': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]}
+ fixed_subs = self._fix_subtitles(subs)
+ if fixed_subs:
+ return {'en': [{'ext': 'srt', 'data': fixed_subs}]}
else:
return {}
# Course link equals to welcome/introduction video link of same course
# We will recognize it as course link
- _VALID_URL = r'https?://(?:www|m)\.(?:lynda\.com|educourse\.ga)/(?P<coursepath>[^/]+/[^/]+/(?P<courseid>\d+))-\d\.html'
+ _VALID_URL = r'https?://(?:www|m)\.(?:lynda\.com|educourse\.ga)/(?P<coursepath>(?:[^/]+/){2,3}(?P<courseid>\d+))-2\.html'
+
+ _TESTS = [{
+ 'url': 'https://www.lynda.com/Graphic-Design-tutorials/Grundlagen-guten-Gestaltung/393570-2.html',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Grundlagen-guten-Gestaltung/393570-2.html',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
# coding: utf-8
from __future__ import unicode_literals
-import base64
-
from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
-from ..utils import (
- int_or_none,
+from ..compat import (
+ compat_b64decode,
+ compat_urllib_parse_unquote,
)
+from ..utils import int_or_none
class MangomoloBaseIE(InfoExtractor):
_IS_LIVE = True
def _get_real_id(self, page_id):
- return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()
+ return compat_b64decode(compat_urllib_parse_unquote(page_id)).decode()
# coding: utf-8
from __future__ import unicode_literals
+import json
import uuid
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..compat import (
compat_str,
- compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
duration = int_or_none(mmc.get('duration'))
for location in mmc['locations']:
gat = self._proto_relative_url(location.get('gat'), 'http:')
- bas = location.get('bas')
- loc = location.get('loc')
+ gcp = location.get('gcp')
ogn = location.get('ogn')
- if None in (gat, bas, loc, ogn):
+ if None in (gat, gcp, ogn):
continue
token_data = {
- 'bas': bas,
- 'icd': loc,
+ 'gcp': gcp,
'ogn': ogn,
- 'sta': '0',
+ 'sta': 0,
}
media = self._download_json(
- '%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
- video_id, 'Downloading %s JSON' % location['loc'])
- file_ = media.get('file')
- if not file_:
+ gat, video_id, data=json.dumps(token_data).encode('utf-8'),
+ headers={
+ 'Content-Type': 'application/json;charset=utf-8',
+ 'Referer': url,
+ })
+ stream = media.get('stream') or media.get('file')
+ if not stream:
continue
- ext = determine_ext(file_)
+ ext = determine_ext(stream)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
- file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
+ stream + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
- file_, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+ stream, video_id, 'mp4', 'm3u8_native',
+ m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
from __future__ import unicode_literals
-import base64
import functools
import itertools
import re
from .common import InfoExtractor
from ..compat import (
+ compat_b64decode,
compat_chr,
compat_ord,
compat_str,
if encrypted_play_info is not None:
# Decode
- encrypted_play_info = base64.b64decode(encrypted_play_info)
+ encrypted_play_info = compat_b64decode(encrypted_play_info)
else:
# New path
full_info_json = self._parse_json(self._html_search_regex(
kpa_target = encrypted_play_info
else:
kps = ['https://', 'http://']
- kpa_target = base64.b64decode(info_json['streamInfo']['url'])
+ kpa_target = compat_b64decode(info_json['streamInfo']['url'])
for kp in kps:
partial_key = self._decrypt_xor_cipher(kpa_target, kp)
for quote in ["'", '"']:
format_url = stream_info.get(url_key)
if not format_url:
continue
- decrypted = self._decrypt_xor_cipher(key, base64.b64decode(format_url))
+ decrypted = self._decrypt_xor_cipher(key, compat_b64decode(format_url))
if not decrypted:
continue
if url_key == 'hlsUrl':
import re
from .common import InfoExtractor
+from ..compat import compat_urlparse
from ..utils import (
ExtractorError,
+ InAdvancePagedList,
+ orderedSet,
str_to_int,
unified_strdate,
)
'age_limit': age_limit,
'url': video_url,
}
+
+
+class MotherlessGroupIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)'
+ _TESTS = [{
+ 'url': 'http://motherless.com/g/movie_scenes',
+ 'info_dict': {
+ 'id': 'movie_scenes',
+ 'title': 'Movie Scenes',
+ 'description': 'Hot and sexy scenes from "regular" movies... '
+ 'Beautiful actresses fully nude... A looot of '
+ 'skin! :)Enjoy!',
+ },
+ 'playlist_mincount': 662,
+ }, {
+ 'url': 'http://motherless.com/gv/sex_must_be_funny',
+ 'info_dict': {
+ 'id': 'sex_must_be_funny',
+ 'title': 'Sex must be funny',
+ 'description': 'Sex can be funny. Wide smiles,laugh, games, fun of '
+ 'any kind!'
+ },
+ 'playlist_mincount': 9,
+ }]
+
+ @classmethod
+ def suitable(cls, url):
+ return (False if MotherlessIE.suitable(url)
+ else super(MotherlessGroupIE, cls).suitable(url))
+
+ def _extract_entries(self, webpage, base):
+ entries = []
+ for mobj in re.finditer(
+ r'href="(?P<href>/[^"]+)"[^>]*>(?:\s*<img[^>]+alt="[^-]+-\s(?P<title>[^"]+)")?',
+ webpage):
+ video_url = compat_urlparse.urljoin(base, mobj.group('href'))
+ if not MotherlessIE.suitable(video_url):
+ continue
+ video_id = MotherlessIE._match_id(video_url)
+ title = mobj.group('title')
+ entries.append(self.url_result(
+ video_url, ie=MotherlessIE.ie_key(), video_id=video_id,
+ video_title=title))
+ # Alternative fallback
+ if not entries:
+ entries = [
+ self.url_result(
+ compat_urlparse.urljoin(base, '/' + video_id),
+ ie=MotherlessIE.ie_key(), video_id=video_id)
+ for video_id in orderedSet(re.findall(
+ r'data-codename=["\']([A-Z0-9]+)', webpage))]
+ return entries
+
+ def _real_extract(self, url):
+ group_id = self._match_id(url)
+ page_url = compat_urlparse.urljoin(url, '/gv/%s' % group_id)
+ webpage = self._download_webpage(page_url, group_id)
+ title = self._search_regex(
+ r'<title>([\w\s]+\w)\s+-', webpage, 'title', fatal=False)
+ description = self._html_search_meta(
+ 'description', webpage, fatal=False)
+ page_count = self._int(self._search_regex(
+ r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
+ webpage, 'page_count'), 'page_count')
+ PAGE_SIZE = 80
+
+ def _get_page(idx):
+ webpage = self._download_webpage(
+ page_url, group_id, query={'page': idx + 1},
+ note='Downloading page %d/%d' % (idx + 1, page_count)
+ )
+ for entry in self._extract_entries(webpage, url):
+ yield entry
+
+ playlist = InAdvancePagedList(_get_page, page_count, PAGE_SIZE)
+
+ return {
+ '_type': 'playlist',
+ 'id': group_id,
+ 'title': title,
+ 'description': description,
+ 'entries': playlist
+ }
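Note on the MotherlessGroupIE hunk above: `_get_page` is a generator handed to a paged list, so group pages are downloaded only when their entries are actually consumed. A simplified sketch of that lazy behaviour using a plain generator; this is a swapped-in illustration, not the `InAdvancePagedList` class, and `iter_pages` and `fake_fetch` are hypothetical names:

```python
def iter_pages(fetch_page, page_count):
    """Yield entries page by page; a page is fetched only when reached."""
    for idx in range(page_count):
        for entry in fetch_page(idx):
            yield entry


fetched = []  # records which pages were requested


def fake_fetch(idx):
    # Stand-in for a network request returning the entries of one page.
    fetched.append(idx)
    return ['video-%d-%d' % (idx, n) for n in range(2)]


entries = iter_pages(fake_fetch, 3)
first = next(entries)
print(first, fetched)
```

Only the first page has been requested after taking one entry; the remaining pages are fetched as iteration continues.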
ext = determine_ext(src, None)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
- src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, f4m_id='hds'))
+ src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id,
+ f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
- src, video_id, 'mp4', m3u8_id='hls', entry_protocol='m3u8_native'))
+ src, video_id, 'mp4', m3u8_id='hls',
+ entry_protocol='m3u8_native', fatal=False))
else:
quality = f.get('quality')
ff = {
class OdnoklassnikiIE(InfoExtractor):
- _VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
+ _VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer|live)/(?P<id>[\d-]+)'
_TESTS = [{
# metadata in JSON
'url': 'http://ok.ru/video/20079905452',
- 'md5': '6ba728d85d60aa2e6dd37c9e70fdc6bc',
+ 'md5': '0b62089b479e06681abaaca9d204f152',
'info_dict': {
'id': '20079905452',
'ext': 'mp4',
'like_count': int,
'age_limit': 0,
},
- 'skip': 'Video has been blocked',
}, {
# metadataUrl
'url': 'http://ok.ru/video/63567059965189-0?fromTime=5',
}, {
'url': 'http://mobile.ok.ru/video/20079905452',
'only_matching': True,
+ }, {
+ 'url': 'https://www.ok.ru/live/484531969818',
+ 'only_matching': True,
}]
def _real_extract(self, url):
})
return info
+ assert title
+ if provider == 'LIVE_TV_APP':
+ info['title'] = self._live_title(title)
+
quality = qualities(('4', '0', '1', '2', '3', '5'))
formats = [{
if fmt_type:
fmt['quality'] = quality(fmt_type)
+ # Live formats
+ m3u8_url = metadata.get('hlsMasterPlaylistUrl')
+ if m3u8_url:
+ formats.extend(self._extract_m3u8_formats(
+ m3u8_url, video_id, 'mp4', entry_protocol='m3u8',
+ m3u8_id='hls', fatal=False))
+ rtmp_url = metadata.get('rtmpUrl')
+ if rtmp_url:
+ formats.append({
+ 'url': rtmp_url,
+ 'format_id': 'rtmp',
+ 'ext': 'flv',
+ })
+
self._sort_formats(formats)
info['formats'] = formats
from __future__ import unicode_literals
+
import re
-import base64
from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+ compat_b64decode,
+ compat_str,
+ compat_urllib_parse_urlencode,
+)
from ..utils import (
determine_ext,
ExtractorError,
try_get,
unsmuggle_url,
)
-from ..compat import compat_urllib_parse_urlencode
class OoyalaBaseIE(InfoExtractor):
url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
if not url_data:
continue
- s_url = base64.b64decode(url_data.encode('ascii')).decode('utf-8')
+ s_url = compat_b64decode(url_data).decode('utf-8')
if not s_url or s_url in urls:
continue
urls.append(s_url)
webpage, _ = phantom.get(page_url, html=webpage, video_id=video_id, headers=headers)
decoded_id = (get_element_by_id('streamurl', webpage) or
- get_element_by_id('streamuri', webpage))
+ get_element_by_id('streamuri', webpage) or
+ get_element_by_id('streamurj', webpage))
+
+ if not decoded_id:
+ raise ExtractorError('Can\'t find stream URL', video_id=video_id)
video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id
# coding: utf-8
from __future__ import unicode_literals
+import re
+
from .common import InfoExtractor
from ..compat import (
compat_str,
class PandoraTVIE(InfoExtractor):
IE_NAME = 'pandora.tv'
IE_DESC = '판도라TV'
- _VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?'
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:
+ (?:www\.)?pandora\.tv/view/(?P<user_id>[^/]+)/(?P<id>\d+)| # new format
+ (?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?| # old format
+ m\.pandora\.tv/?\? # mobile
+ )
+ '''
_TESTS = [{
'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2',
'info_dict': {
# Test metadata only
'skip_download': True,
},
+ }, {
+ 'url': 'http://www.pandora.tv/view/mikakim/53294230#36797454_new',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://m.pandora.tv/?c=view&ch_userid=mikakim&prgid=54600346',
+ 'only_matching': True,
}]
def _real_extract(self, url):
- qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
- video_id = qs.get('prgid', [None])[0]
- user_id = qs.get('ch_userid', [None])[0]
- if any(not f for f in (video_id, user_id,)):
- raise ExtractorError('Invalid URL', expected=True)
+ mobj = re.match(self._VALID_URL, url)
+ user_id = mobj.group('user_id')
+ video_id = mobj.group('id')
+
+ if not user_id or not video_id:
+ qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+ video_id = qs.get('prgid', [None])[0]
+ user_id = qs.get('ch_userid', [None])[0]
+ if any(not f for f in (video_id, user_id,)):
+ raise ExtractorError('Invalid URL', expected=True)
data = self._download_json(
'http://m.pandora.tv/?c=view&m=viewJsonApi&ch_userid=%s&prgid=%s'
r'clip[iI]d=(\d+)',
r'clip[iI]d\s*=\s*["\'](\d+)',
r"'itemImageUrl'\s*:\s*'/dynamic/thumbnails/full/\d+/(\d+)",
+ r'proMamsId&quot;\s*:\s*&quot;(\d+)',
+ r'proMamsId"\s*:\s*"(\d+)',
]
_TITLE_REGEXES = [
r'<h2 class="subtitle" itemprop="name">\s*(.+?)</h2>',
class RestudyIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?restudy\.dk/video/play/id/(?P<id>[0-9]+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:(?:www|portal)\.)?restudy\.dk/video/[^/]+/id/(?P<id>[0-9]+)'
+ _TESTS = [{
'url': 'https://www.restudy.dk/video/play/id/1637',
'info_dict': {
'id': '1637',
# rtmp download
'skip_download': True,
}
- }
+ }, {
+ 'url': 'https://portal.restudy.dk/video/leiden-frosteffekt/id/1637',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
video_id = self._match_id(url)
description = self._og_search_description(webpage).strip()
formats = self._extract_smil_formats(
- 'https://www.restudy.dk/awsmedia/SmilDirectory/video_%s.xml' % video_id,
+ 'https://cdn.portal.restudy.dk/dynamic/themes/front/awsmedia/SmilDirectory/video_%s.xml' % video_id,
video_id)
self._sort_formats(formats)
+++ /dev/null
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class RingTVIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?ringtv\.craveonline\.com/(?P<type>news|videos/video)/(?P<id>[^/?#]+)'
- _TEST = {
- 'url': 'http://ringtv.craveonline.com/news/310833-luis-collazo-says-victor-ortiz-better-not-quit-on-jan-30',
- 'md5': 'd25945f5df41cdca2d2587165ac28720',
- 'info_dict': {
- 'id': '857645',
- 'ext': 'mp4',
- 'title': 'Video: Luis Collazo says Victor Ortiz "better not quit on Jan. 30" - Ring TV',
- 'description': 'Luis Collazo is excited about his Jan. 30 showdown with fellow former welterweight titleholder Victor Ortiz at Barclays Center in his hometown of Brooklyn. The SuperBowl week fight headlines a Golden Boy Live! card on Fox Sports 1.',
- }
- }
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id').split('-')[0]
- webpage = self._download_webpage(url, video_id)
-
- if mobj.group('type') == 'news':
- video_id = self._search_regex(
- r'''(?x)<iframe[^>]+src="http://cms\.springboardplatform\.com/
- embed_iframe/[0-9]+/video/([0-9]+)/''',
- webpage, 'real video ID')
- title = self._og_search_title(webpage)
- description = self._html_search_regex(
- r'addthis:description="([^"]+)"',
- webpage, 'description', fatal=False)
- final_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/conversion/%s.mp4' % video_id
- thumbnail_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/snapshots/%s.jpg' % video_id
-
- return {
- 'id': video_id,
- 'url': final_url,
- 'title': title,
- 'thumbnail': thumbnail_url,
- 'description': description,
- }
# coding: utf-8
from __future__ import unicode_literals
-import base64
import re
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
from ..compat import (
+ compat_b64decode,
compat_ord,
compat_str,
)
stream_data = self._download_json(
self._BACKWERK_BASE_URL + 'stream/video/' + video_id, video_id)
- data, iv = base64.b64decode(stream_data['streamUrl']).decode().split(':')
+ data, iv = compat_b64decode(stream_data['streamUrl']).decode().split(':')
stream_url = intlist_to_bytes(aes_cbc_decrypt(
- bytes_to_intlist(base64.b64decode(data)),
+ bytes_to_intlist(compat_b64decode(data)),
bytes_to_intlist(self._AES_KEY),
- bytes_to_intlist(base64.b64decode(iv))
+ bytes_to_intlist(compat_b64decode(iv))
))
if b'rtl2_you_video_not_found' in stream_url:
raise ExtractorError('video not found', expected=True)
from .common import InfoExtractor
from ..compat import (
+ compat_b64decode,
compat_struct_unpack,
)
from ..utils import (
def _decrypt_url(png):
- encrypted_data = base64.b64decode(png.encode('utf-8'))
+ encrypted_data = compat_b64decode(png)
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index - 4:]
length = compat_struct_unpack('!I', text_chunk[:4])[0]
hash_index = data.index('#')
alphabet_data = data[:hash_index]
url_data = data[hash_index + 1:]
+ if url_data[0] == 'H' and url_data[3] == '%':
+ # remove useless HQ%% at the start
+ url_data = url_data[4:]
alphabet = []
e = 0
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class RTVSIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?rtvs\.sk/(?:radio|televizia)/archiv/\d+/(?P<id>\d+)'
+ _TESTS = [{
+ # radio archive
+ 'url': 'http://www.rtvs.sk/radio/archiv/11224/414872',
+ 'md5': '134d5d6debdeddf8a5d761cbc9edacb8',
+ 'info_dict': {
+ 'id': '414872',
+ 'ext': 'mp3',
+ 'title': 'Ostrov pokladov 1 časť.mp3'
+ },
+ 'params': {
+ 'skip_download': True,
+ }
+ }, {
+ # tv archive
+ 'url': 'http://www.rtvs.sk/televizia/archiv/8249/63118',
+ 'md5': '85e2c55cf988403b70cac24f5c086dc6',
+ 'info_dict': {
+ 'id': '63118',
+ 'ext': 'mp4',
+ 'title': 'Amaro Džives - Náš deň',
+ 'description': 'Galavečer pri príležitosti Medzinárodného dňa Rómov.'
+ },
+ 'params': {
+ 'skip_download': True,
+ }
+ }]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, video_id)
+
+ playlist_url = self._search_regex(
+ r'playlist["\']?\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+ 'playlist url', group='url')
+
+ data = self._download_json(
+ playlist_url, video_id, 'Downloading playlist')[0]
+ return self._parse_jwplayer_data(data, video_id=video_id)
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+ compat_parse_qs,
+ compat_str,
+ compat_urllib_parse_urlparse,
+)
+from ..utils import (
+ urljoin,
+ int_or_none,
+ parse_codecs,
+ try_get,
+)
+
+
+def _raw_id(src_url):
+ return compat_urllib_parse_urlparse(src_url).path.split('/')[-1]
+
+
+class SeznamZpravyIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?seznamzpravy\.cz/iframe/player\?.*\bsrc='
+ _TESTS = [{
+ 'url': 'https://www.seznamzpravy.cz/iframe/player?duration=241&serviceSlug=zpravy&src=https%3A%2F%2Fv39-a.sdn.szn.cz%2Fv_39%2Fvmd%2F5999c902ea707c67d8e267a9%3Ffl%3Dmdk%2C432f65a0%7C&itemType=video&autoPlay=false&title=Sv%C4%9Bt%20bez%20obalu%3A%20%C4%8Ce%C5%A1t%C3%AD%20voj%C3%A1ci%20na%20mis%C3%ADch%20(kr%C3%A1tk%C3%A1%20verze)&series=Sv%C4%9Bt%20bez%20obalu&serviceName=Seznam%20Zpr%C3%A1vy&poster=%2F%2Fd39-a.sdn.szn.cz%2Fd_39%2Fc_img_F_I%2FR5puJ.jpeg%3Ffl%3Dcro%2C0%2C0%2C1920%2C1080%7Cres%2C1200%2C%2C1%7Cjpg%2C80%2C%2C1&width=1920&height=1080&cutFrom=0&cutTo=0&splVersion=VOD&contentId=170889&contextId=35990&showAdvert=true&collocation=&autoplayPossible=true&embed=&isVideoTooShortForPreroll=false&isVideoTooLongForPostroll=true&videoCommentOpKey=&videoCommentId=&version=4.0.76&dotService=zpravy&gemiusPrismIdentifier=bVc1ZIb_Qax4W2v5xOPGpMeCP31kFfrTzj0SqPTLh_b.Z7&zoneIdPreroll=seznam.pack.videospot&skipOffsetPreroll=5&sectionPrefixPreroll=%2Fzpravy',
+ 'info_dict': {
+ 'id': '170889',
+ 'ext': 'mp4',
+ 'title': 'Svět bez obalu: Čeští vojáci na misích (krátká verze)',
+ 'thumbnail': r're:^https?://.*\.jpe?g',
+ 'duration': 241,
+ 'series': 'Svět bez obalu',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }, {
+ # with Location key
+ 'url': 'https://www.seznamzpravy.cz/iframe/player?duration=null&serviceSlug=zpravy&src=https%3A%2F%2Flive-a.sdn.szn.cz%2Fv_39%2F59e468fe454f8472a96af9fa%3Ffl%3Dmdk%2C5c1e2840%7C&itemType=livevod&autoPlay=false&title=P%C5%99edseda%20KDU-%C4%8CSL%20Pavel%20B%C4%9Blobr%C3%A1dek%20ve%20volebn%C3%AD%20V%C3%BDzv%C4%9B%20Seznamu&series=V%C3%BDzva&serviceName=Seznam%20Zpr%C3%A1vy&poster=%2F%2Fd39-a.sdn.szn.cz%2Fd_39%2Fc_img_G_J%2FjTBCs.jpeg%3Ffl%3Dcro%2C0%2C0%2C1280%2C720%7Cres%2C1200%2C%2C1%7Cjpg%2C80%2C%2C1&width=16&height=9&cutFrom=0&cutTo=0&splVersion=VOD&contentId=185688&contextId=38489&showAdvert=true&collocation=&hideFullScreen=false&hideSubtitles=false&embed=&isVideoTooShortForPreroll=false&isVideoTooShortForPreroll2=false&isVideoTooLongForPostroll=false&fakePostrollZoneID=seznam.clanky.zpravy.preroll&fakePrerollZoneID=seznam.clanky.zpravy.preroll&videoCommentId=&trim=default_16x9&noPrerollVideoLength=30&noPreroll2VideoLength=undefined&noMidrollVideoLength=0&noPostrollVideoLength=999999&autoplayPossible=true&version=5.0.41&dotService=zpravy&gemiusPrismIdentifier=zD3g7byfW5ekpXmxTVLaq5Srjw5i4hsYo0HY1aBwIe..27&zoneIdPreroll=seznam.pack.videospot&skipOffsetPreroll=5&sectionPrefixPreroll=%2Fzpravy%2Fvyzva&zoneIdPostroll=seznam.pack.videospot&skipOffsetPostroll=5&sectionPrefixPostroll=%2Fzpravy%2Fvyzva&regression=false',
+ 'info_dict': {
+ 'id': '185688',
+ 'ext': 'mp4',
+ 'title': 'Předseda KDU-ČSL Pavel Bělobrádek ve volební Výzvě Seznamu',
+ 'thumbnail': r're:^https?://.*\.jpe?g',
+ 'series': 'Výzva',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }]
+
+ @staticmethod
+ def _extract_urls(webpage):
+ return [
+ mobj.group('url') for mobj in re.finditer(
+ r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?seznamzpravy\.cz/iframe/player\?.*?)\1',
+ webpage)]
+
+ def _extract_sdn_formats(self, sdn_url, video_id):
+ sdn_data = self._download_json(sdn_url, video_id)
+
+ if sdn_data.get('Location'):
+ sdn_url = sdn_data['Location']
+ sdn_data = self._download_json(sdn_url, video_id)
+
+ formats = []
+ mp4_formats = try_get(sdn_data, lambda x: x['data']['mp4'], dict) or {}
+ for format_id, format_data in mp4_formats.items():
+ relative_url = format_data.get('url')
+ if not relative_url:
+ continue
+
+ try:
+ width, height = format_data.get('resolution')
+ except (TypeError, ValueError):
+ width, height = None, None
+
+ f = {
+ 'url': urljoin(sdn_url, relative_url),
+ 'format_id': 'http-%s' % format_id,
+ 'tbr': int_or_none(format_data.get('bandwidth'), scale=1000),
+ 'width': int_or_none(width),
+ 'height': int_or_none(height),
+ }
+ f.update(parse_codecs(format_data.get('codec')))
+ formats.append(f)
+
+ pls = sdn_data.get('pls', {})
+
+ def get_url(format_id):
+ return try_get(pls, lambda x: x[format_id]['url'], compat_str)
+
+ dash_rel_url = get_url('dash')
+ if dash_rel_url:
+ formats.extend(self._extract_mpd_formats(
+ urljoin(sdn_url, dash_rel_url), video_id, mpd_id='dash',
+ fatal=False))
+
+ hls_rel_url = get_url('hls')
+ if hls_rel_url:
+ formats.extend(self._extract_m3u8_formats(
+ urljoin(sdn_url, hls_rel_url), video_id, ext='mp4',
+ m3u8_id='hls', fatal=False))
+
+ self._sort_formats(formats)
+ return formats
+
+ def _real_extract(self, url):
+ params = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
+
+ src = params['src'][0]
+ title = params['title'][0]
+ video_id = params.get('contentId', [_raw_id(src)])[0]
+ formats = self._extract_sdn_formats(src + 'spl2,2,VOD', video_id)
+
+ duration = int_or_none(params.get('duration', [None])[0])
+ series = params.get('series', [None])[0]
+ thumbnail = params.get('poster', [None])[0]
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'thumbnail': thumbnail,
+ 'duration': duration,
+ 'series': series,
+ 'formats': formats,
+ }
+
+
+class SeznamZpravyArticleIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?(?:seznam\.cz/zpravy|seznamzpravy\.cz)/clanek/(?:[^/?#&]+)-(?P<id>\d+)'
+ _API_URL = 'https://apizpravy.seznam.cz/'
+
+ _TESTS = [{
+ # two videos on one page, with SDN URL
+ 'url': 'https://www.seznamzpravy.cz/clanek/jejich-svet-na-nas-utoci-je-lepsi-branit-se-na-jejich-pisecku-rika-reziser-a-major-v-zaloze-marhoul-35990',
+ 'info_dict': {
+ 'id': '35990',
+ 'title': 'md5:6011c877a36905f28f271fcd8dcdb0f2',
+ 'description': 'md5:933f7b06fa337a814ba199d3596d27ba',
+ },
+ 'playlist_count': 2,
+ }, {
+ # video with live stream URL
+ 'url': 'https://www.seznam.cz/zpravy/clanek/znovu-do-vlady-s-ano-pavel-belobradek-ve-volebnim-specialu-seznamu-38489',
+ 'info_dict': {
+ 'id': '38489',
+ 'title': 'md5:8fa1afdc36fd378cf0eba2b74c5aca60',
+ 'description': 'md5:428e7926a1a81986ec7eb23078004fb4',
+ },
+ 'playlist_count': 1,
+ }]
+
+ def _real_extract(self, url):
+ article_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, article_id)
+
+ info = self._search_json_ld(webpage, article_id, default={})
+
+ title = info.get('title') or self._og_search_title(webpage, fatal=False)
+ description = info.get('description') or self._og_search_description(webpage)
+
+ return self.playlist_result([
+ self.url_result(url, ie=SeznamZpravyIE.ie_key())
+ for url in SeznamZpravyIE._extract_urls(webpage)],
+ article_id, title, description)
from __future__ import unicode_literals
-import base64
-
from .common import InfoExtractor
+from ..compat import compat_b64decode
from ..utils import (
ExtractorError,
int_or_none,
video_url = self._extract_video_url(webpage, video_id, url)
- title = base64.b64decode(self._html_search_meta(
- 'full:title', webpage, 'title').encode('utf-8')).decode('utf-8')
+ title = compat_b64decode(self._html_search_meta(
+ 'full:title', webpage, 'title')).decode('utf-8')
filesize = int_or_none(self._html_search_meta(
'full:size', webpage, 'file size', fatal=False))
r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'stream', group='url'),
video_id,
- transform_source=lambda x: base64.b64decode(
- x.encode('ascii')).decode('utf-8'))[0]
+ transform_source=lambda x: compat_b64decode(x).decode('utf-8'))[0]
'license': 'all-rights-reserved',
},
},
+ # no album art, use avatar pic for thumbnail
+ {
+ 'url': 'https://soundcloud.com/garyvee/sideways-prod-mad-real',
+ 'md5': '59c7872bc44e5d99b7211891664760c2',
+ 'info_dict': {
+ 'id': '309699954',
+ 'ext': 'mp3',
+ 'title': 'Sideways (Prod. Mad Real)',
+ 'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
+ 'uploader': 'garyvee',
+ 'upload_date': '20170226',
+ 'duration': 207,
+ 'thumbnail': r're:https?://.*\.jpg',
+ 'license': 'all-rights-reserved',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ },
]
- _CLIENT_ID = 'c6CU49JDMapyrQo06UxU9xouB9ZVzqCn'
+ _CLIENT_ID = 'DQskPX1pntALRzMp4HSxya3Mc0AO66Ro'
_IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
@staticmethod
name = full_title or track_id
if quiet:
self.report_extraction(name)
- thumbnail = info.get('artwork_url')
+ thumbnail = info.get('artwork_url') or info.get('user', {}).get('avatar_url')
if isinstance(thumbnail, compat_str):
thumbnail = thumbnail.replace('-large', '-t500x500')
ext = 'mp3'
class SouthParkIE(MTVServicesInfoExtractor):
IE_NAME = 'southpark.cc.com'
- _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
+ _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
'timestamp': 1112760000,
'upload_date': '20050406',
},
+ }, {
+ 'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
+ 'only_matching': True,
}]
class SouthParkDeIE(SouthParkIE):
IE_NAME = 'southpark.de'
- _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.de/(?:clips|alle-episoden)/(?P<id>.+?)(\?|#|$))'
+ _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.de/(?:clips|alle-episoden|collections)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southpark.de/feeds/video-player/mrss/'
_TESTS = [{
'description': 'Kyle will mit seinem kleinen Bruder Ike Videospiele spielen. Als der nicht mehr mit ihm spielen will, hat Kyle Angst, dass er die Kids von heute nicht mehr versteht.',
},
'playlist_count': 3,
+ }, {
+ 'url': 'http://www.southpark.de/collections/2476/superhero-showdown/1',
+ 'only_matching': True,
}]
class SouthParkNlIE(SouthParkIE):
IE_NAME = 'southpark.nl'
- _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
+ _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southpark.nl/feeds/video-player/mrss/'
_TESTS = [{
class SouthParkDkIE(SouthParkIE):
IE_NAME = 'southparkstudios.dk'
- _VALID_URL = r'https?://(?:www\.)?(?P<url>southparkstudios\.dk/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
+ _VALID_URL = r'https?://(?:www\.)?(?P<url>southparkstudios\.(?:dk|nu)/(?:clips|full-episodes|collections)/(?P<id>.+?)(\?|#|$))'
_FEED_URL = 'http://www.southparkstudios.dk/feeds/video-player/mrss/'
_TESTS = [{
'description': 'Butters is convinced he\'s living in a virtual reality.',
},
'playlist_mincount': 3,
+ }, {
+ 'url': 'http://www.southparkstudios.dk/collections/2476/superhero-showdown/1',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.southparkstudios.nu/collections/2476/superhero-showdown/1',
+ 'only_matching': True,
}]
import re
from .common import InfoExtractor
-from .nexx import NexxEmbedIE
+from .nexx import (
+ NexxIE,
+ NexxEmbedIE,
+)
from .spiegeltv import SpiegeltvIE
from ..compat import compat_urlparse
from ..utils import (
}, {
'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-iframe.html',
'only_matching': True,
+ }, {
+ # nexx video
+ 'url': 'http://www.spiegel.de/video/spiegel-tv-magazin-ueber-guellekrise-in-schleswig-holstein-video-99012776.html',
+ 'only_matching': True,
}]
def _real_extract(self, url):
if SpiegeltvIE.suitable(handle.geturl()):
return self.url_result(handle.geturl(), 'Spiegeltv')
+ nexx_id = self._search_regex(
+ r'nexxOmniaId\s*:\s*(\d+)', webpage, 'nexx id', default=None)
+ if nexx_id:
+ domain_id = NexxIE._extract_domain_id(webpage) or '748'
+ return self.url_result(
+ 'nexx:%s:%s' % (domain_id, nexx_id), ie=NexxIE.ie_key(),
+ video_id=nexx_id)
+
video_data = extract_attributes(self._search_regex(r'(<div[^>]+id="spVideoElements"[^>]+>)', webpage, 'video element', default=''))
title = video_data.get('data-video-title') or get_element_by_attribute('class', 'module-title', webpage)
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .wdr import WDRBaseIE
-from ..utils import get_element_by_attribute
-
-
-class SportschauIE(WDRBaseIE):
- IE_NAME = 'Sportschau'
- _VALID_URL = r'https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video-?(?P<id>[^/#?]+)\.html'
- _TEST = {
- 'url': 'http://www.sportschau.de/uefaeuro2016/videos/video-dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100.html',
- 'info_dict': {
- 'id': 'mdb-1140188',
- 'display_id': 'dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100',
- 'ext': 'mp4',
- 'title': 'DFB-Team geht gut gelaunt ins Spiel gegen Polen',
- 'description': 'Vor dem zweiten Gruppenspiel gegen Polen herrscht gute Stimmung im deutschen Team. Insbesondere Bastian Schweinsteiger strotzt vor Optimismus nach seinem Tor gegen die Ukraine.',
- 'upload_date': '20160615',
- },
- 'skip': 'Geo-restricted to Germany',
- }
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id)
- title = get_element_by_attribute('class', 'headline', webpage)
- description = self._html_search_meta('description', webpage, 'description')
-
- info = self._extract_wdr_video(webpage, video_id)
-
- info.update({
- 'title': title,
- 'description': description,
- })
-
- return info
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+ int_or_none,
+ xpath_attr,
+ xpath_text,
+ xpath_element,
+ unescapeHTML,
+ unified_timestamp,
+)
+
+
+class SpringboardPlatformIE(InfoExtractor):
+ _VALID_URL = r'''(?x)
+ https?://
+ cms\.springboardplatform\.com/
+ (?:
+ (?:previews|embed_iframe)/(?P<index>\d+)/video/(?P<id>\d+)|
+ xml_feeds_advanced/index/(?P<index_2>\d+)/rss3/(?P<id_2>\d+)
+ )
+ '''
+ _TESTS = [{
+ 'url': 'http://cms.springboardplatform.com/previews/159/video/981017/0/0/1',
+ 'md5': '5c3cb7b5c55740d482561099e920f192',
+ 'info_dict': {
+ 'id': '981017',
+ 'ext': 'mp4',
+ 'title': 'Redman "BUD like YOU" "Usher Good Kisser" REMIX',
+ 'description': 'Redman "BUD like YOU" "Usher Good Kisser" REMIX',
+ 'thumbnail': r're:^https?://.*\.jpg$',
+ 'timestamp': 1409132328,
+ 'upload_date': '20140827',
+ 'duration': 193,
+ },
+ }, {
+ 'url': 'http://cms.springboardplatform.com/embed_iframe/159/video/981017/rab007/rapbasement.com/1/1',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://cms.springboardplatform.com/embed_iframe/20/video/1731611/ki055/kidzworld.com/10',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://cms.springboardplatform.com/xml_feeds_advanced/index/159/rss3/981017/0/0/1/',
+ 'only_matching': True,
+ }]
+
+ @staticmethod
+ def _extract_urls(webpage):
+ return [
+ mobj.group('url')
+ for mobj in re.finditer(
+ r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//cms\.springboardplatform\.com/embed_iframe/\d+/video/\d+.*?)\1',
+ webpage)]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id') or mobj.group('id_2')
+ index = mobj.group('index') or mobj.group('index_2')
+
+ video = self._download_xml(
+ 'http://cms.springboardplatform.com/xml_feeds_advanced/index/%s/rss3/%s'
+ % (index, video_id), video_id)
+
+ item = xpath_element(video, './/item', 'item', fatal=True)
+
+ content = xpath_element(
+ item, './{http://search.yahoo.com/mrss/}content', 'content',
+ fatal=True)
+ title = unescapeHTML(xpath_text(item, './title', 'title', fatal=True))
+
+ video_url = content.attrib['url']
+
+ if 'error_video.mp4' in video_url:
+ raise ExtractorError(
+ 'Video %s no longer exists' % video_id, expected=True)
+
+ duration = int_or_none(content.get('duration'))
+ tbr = int_or_none(content.get('bitrate'))
+ filesize = int_or_none(content.get('fileSize'))
+ width = int_or_none(content.get('width'))
+ height = int_or_none(content.get('height'))
+
+ description = unescapeHTML(xpath_text(
+ item, './description', 'description'))
+ thumbnail = xpath_attr(
+ item, './{http://search.yahoo.com/mrss/}thumbnail', 'url',
+ 'thumbnail')
+
+ timestamp = unified_timestamp(xpath_text(
+ item, './{http://cms.springboardplatform.com/namespaces.html}created',
+ 'timestamp'))
+
+ formats = [{
+ 'url': video_url,
+ 'format_id': 'http',
+ 'tbr': tbr,
+ 'filesize': filesize,
+ 'width': width,
+ 'height': height,
+ }]
+
+ m3u8_format = formats[0].copy()
+ m3u8_format.update({
+ 'url': re.sub(r'(https?://)cdn\.', r'\1hls.', video_url) + '.m3u8',
+ 'ext': 'mp4',
+ 'format_id': 'hls',
+ 'protocol': 'm3u8_native',
+ })
+ formats.append(m3u8_format)
+
+ self._sort_formats(formats)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'timestamp': timestamp,
+ 'duration': duration,
+ 'formats': formats,
+ }
continue
if stream_data.get('playlistProtection') == 'spe':
m3u8_url = self._add_akamai_spe_token(
- 'http://www.%s.com/service/token_spe' % site,
+ 'http://token.vgtf.net/token/token_spe',
m3u8_url, media_id, {
'url': url,
'site_name': site[:3].upper(),
from .common import InfoExtractor
from ..utils import (
- qualities,
determine_ext,
+ ExtractorError,
+ qualities,
)
_VALID_URL = r'https?://(?:www\.)?teachertube\.com/(viewVideo\.php\?video_id=|music\.php\?music_id=|video/(?:[\da-z-]+-)?|audio/)(?P<id>\d+)'
_TESTS = [{
+ # flowplayer
'url': 'http://www.teachertube.com/viewVideo.php?video_id=339997',
'md5': 'f9434ef992fd65936d72999951ee254c',
'info_dict': {
'ext': 'mp4',
'title': 'Measures of dispersion from a frequency table',
'description': 'Measures of dispersion from a frequency table',
- 'thumbnail': r're:http://.*\.jpg',
- },
- }, {
- 'url': 'http://www.teachertube.com/viewVideo.php?video_id=340064',
- 'md5': '0d625ec6bc9bf50f70170942ad580676',
- 'info_dict': {
- 'id': '340064',
- 'ext': 'mp4',
- 'title': 'How to Make Paper Dolls _ Paper Art Projects',
- 'description': 'Learn how to make paper dolls in this simple',
- 'thumbnail': r're:http://.*\.jpg',
+ 'thumbnail': r're:https?://.*\.(?:jpg|png)',
},
}, {
+ # jwplayer
'url': 'http://www.teachertube.com/music.php?music_id=8805',
'md5': '01e8352006c65757caf7b961f6050e21',
'info_dict': {
'description': 'RADIJSKA EMISIJA ZRAKOPLOVNE TEHNI?KE ?KOLE P',
},
}, {
+ # unavailable video
'url': 'http://www.teachertube.com/video/intro-video-schleicher-297790',
- 'md5': '9c79fbb2dd7154823996fc28d4a26998',
- 'info_dict': {
- 'id': '297790',
- 'ext': 'mp4',
- 'title': 'Intro Video - Schleicher',
- 'description': 'Intro Video - Why to flip, how flipping will',
- },
+ 'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
+ error = self._search_regex(
+ r'<div\b[^>]+\bclass=["\']msgBox error[^>]+>([^<]+)', webpage,
+ 'error', default=None)
+ if error:
+ raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
+
title = self._html_search_meta('title', webpage, 'title', fatal=True)
TITLE_SUFFIX = ' - TeacherTube'
if title.endswith(TITLE_SUFFIX):
self._sort_formats(formats)
+ thumbnail = self._og_search_thumbnail(
+ webpage, default=None) or self._html_search_meta(
+ 'thumbnail', webpage)
+
return {
'id': video_id,
'title': title,
- 'thumbnail': self._html_search_regex(r'\'image\'\s*:\s*["\']([^"\']+)["\']', webpage, 'thumbnail'),
- 'formats': formats,
'description': description,
+ 'thumbnail': thumbnail,
+ 'formats': formats,
}
# coding: utf-8
from __future__ import unicode_literals
-import base64
import binascii
import re
import json
from .common import InfoExtractor
+from ..compat import (
+ compat_b64decode,
+ compat_ord,
+)
from ..utils import (
ExtractorError,
qualities,
determine_ext,
)
-from ..compat import compat_ord
class TeamcocoIE(InfoExtractor):
for i in range(len(cur_fragments)):
cur_sequence = (''.join(cur_fragments[i:] + cur_fragments[:i])).encode('ascii')
try:
- raw_data = base64.b64decode(cur_sequence)
+ raw_data = compat_b64decode(cur_sequence)
if compat_ord(raw_data[0]) == compat_ord('{'):
return json.loads(raw_data.decode('utf-8'))
except (TypeError, binascii.Error, UnicodeDecodeError, ValueError):
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import unified_strdate
-
-
-class TheSixtyOneIE(InfoExtractor):
- _VALID_URL = r'''(?x)https?://(?:www\.)?thesixtyone\.com/
- (?:.*?/)*
- (?:
- s|
- song/comments/list|
- song
- )/(?:[^/]+/)?(?P<id>[A-Za-z0-9]+)/?$'''
- _SONG_URL_TEMPLATE = 'http://thesixtyone.com/s/{0:}'
- _SONG_FILE_URL_TEMPLATE = 'http://{audio_server:}/thesixtyone_production/audio/{0:}_stream'
- _THUMBNAIL_URL_TEMPLATE = '{photo_base_url:}_desktop'
- _TESTS = [
- {
- 'url': 'http://www.thesixtyone.com/s/SrE3zD7s1jt/',
- 'md5': '821cc43b0530d3222e3e2b70bb4622ea',
- 'info_dict': {
- 'id': 'SrE3zD7s1jt',
- 'ext': 'mp3',
- 'title': 'CASIO - Unicorn War Mixtape',
- 'thumbnail': 're:^https?://.*_desktop$',
- 'upload_date': '20071217',
- 'duration': 3208,
- }
- },
- {
- 'url': 'http://www.thesixtyone.com/song/comments/list/SrE3zD7s1jt',
- 'only_matching': True,
- },
- {
- 'url': 'http://www.thesixtyone.com/s/ULoiyjuJWli#/s/SrE3zD7s1jt/',
- 'only_matching': True,
- },
- {
- 'url': 'http://www.thesixtyone.com/#/s/SrE3zD7s1jt/',
- 'only_matching': True,
- },
- {
- 'url': 'http://www.thesixtyone.com/song/SrE3zD7s1jt/',
- 'only_matching': True,
- },
- {
- 'url': 'http://www.thesixtyone.com/maryatmidnight/song/StrawberriesandCream/yvWtLp0c4GQ/',
- 'only_matching': True,
- },
- ]
-
- _DECODE_MAP = {
- 'x': 'a',
- 'm': 'b',
- 'w': 'c',
- 'q': 'd',
- 'n': 'e',
- 'p': 'f',
- 'a': '0',
- 'h': '1',
- 'e': '2',
- 'u': '3',
- 's': '4',
- 'i': '5',
- 'o': '6',
- 'y': '7',
- 'r': '8',
- 'c': '9'
- }
-
- def _real_extract(self, url):
- song_id = self._match_id(url)
-
- webpage = self._download_webpage(
- self._SONG_URL_TEMPLATE.format(song_id), song_id)
-
- song_data = self._parse_json(self._search_regex(
- r'"%s":\s(\{.*?\})' % song_id, webpage, 'song_data'), song_id)
-
- if self._search_regex(r'(t61\.s3_audio_load\s*=\s*1\.0;)', webpage, 's3_audio_load marker', default=None):
- song_data['audio_server'] = 's3.amazonaws.com'
- else:
- song_data['audio_server'] = song_data['audio_server'] + '.thesixtyone.com'
-
- keys = [self._DECODE_MAP.get(s, s) for s in song_data['key']]
- url = self._SONG_FILE_URL_TEMPLATE.format(
- "".join(reversed(keys)), **song_data)
-
- formats = [{
- 'format_id': 'sd',
- 'url': url,
- 'ext': 'mp3',
- }]
-
- return {
- 'id': song_id,
- 'title': '{artist:} - {name:}'.format(**song_data),
- 'formats': formats,
- 'comment_count': song_data.get('comments_count'),
- 'duration': song_data.get('play_time'),
- 'like_count': song_data.get('score'),
- 'thumbnail': self._THUMBNAIL_URL_TEMPLATE.format(**song_data),
- 'upload_date': unified_strdate(song_data.get('publish_date')),
- }
from __future__ import unicode_literals
-import base64
-
from .common import InfoExtractor
-from ..compat import compat_parse_qs
+from ..compat import (
+ compat_b64decode,
+ compat_parse_qs,
+)
class TutvIE(InfoExtractor):
data_content = self._download_webpage(
'http://tu.tv/flvurl.php?codVideo=%s' % internal_id, video_id, 'Downloading video info')
- video_url = base64.b64decode(compat_parse_qs(data_content)['kpt'][0].encode('utf-8')).decode('utf-8')
+ video_url = compat_b64decode(compat_parse_qs(data_content)['kpt'][0]).decode('utf-8')
return {
'id': internal_id,
'ext': ext,
}
if video_url.startswith('rtmp'):
+ if smuggled_data.get('skip_rtmp'):
+ continue
m = re.search(
r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
if not m:
return self.url_result(
smuggle_url(
'mtg:%s' % video_id,
- {'geo_countries': [
- compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]]}),
+ {
+ 'geo_countries': [
+ compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
+ # rtmp host mtgfs.fplive.net for viafree is unresolvable
+ 'skip_rtmp': True,
+ }),
ie=TVPlayIE.ie_key(), video_id=video_id)
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
response = self._parse_json(
e.cause.read().decode('utf-8'), None)
- fail(response['message'])
+ fail(response.get('message') or response['errors'][0])
raise
- redirect_url = urljoin(post_url, response['redirect'])
+ if 'Authenticated successfully' in response.get('message', ''):
+ return None, None
+
+ redirect_url = urljoin(
+ post_url,
+ response.get('redirect') or response['redirect_path'])
return self._download_webpage_handle(
redirect_url, None, 'Downloading login redirect page',
headers=headers)
'password': password,
})
+ # Successful login
+ if not redirect_page:
+ return
+
if re.search(r'(?i)<form[^>]+id="two-factor-submit"', redirect_page) is not None:
# TODO: Add mechanism to request an SMS or phone call
tfa_token = self._get_tfa_info('two-factor authentication token')
break
offset += limit
return self.playlist_result(
- [self.url_result(entry) for entry in orderedSet(entries)],
+ [self._make_url_result(entry) for entry in orderedSet(entries)],
channel_id, channel_name)
+ def _make_url_result(self, url):
+ try:
+ video_id = 'v%s' % TwitchVodIE._match_id(url)
+ return self.url_result(url, TwitchVodIE.ie_key(), video_id=video_id)
+ except AssertionError:
+ return self.url_result(url)
+
def _extract_playlist_page(self, response):
videos = response.get('videos')
return [video['url'] for video in videos] if videos else []
'You are trying to log in from an unusual location. You should confirm ownership at vk.com to log in with this IP.',
expected=True)
+ ERROR_COPYRIGHT = 'Video %s has been removed from public access due to rightholder complaint.'
+
ERRORS = {
r'>Видеозапись .*? была изъята из публичного доступа в связи с обращением правообладателя.<':
- 'Video %s has been removed from public access due to rightholder complaint.',
+ ERROR_COPYRIGHT,
+
+ r'>The video .*? was removed from public access by request of the copyright holder.<':
+ ERROR_COPYRIGHT,
r'<!>Please log in or <':
'Video %s is only available for registered users, '
import re
from .common import InfoExtractor
+from ..compat import (
+ compat_str,
+ compat_urlparse,
+)
from ..utils import (
determine_ext,
ExtractorError,
js_to_json,
strip_jsonp,
+ try_get,
unified_strdate,
update_url_query,
urlhandle_detect_ext,
)
-class WDRBaseIE(InfoExtractor):
- def _extract_wdr_video(self, webpage, display_id):
- # for wdr.de the data-extension is in a tag with the class "mediaLink"
- # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
- # for wdrmaus, in a tag with the class "videoButton" (previously a link
- # to the page in a multiline "videoLink"-tag)
- json_metadata = self._html_search_regex(
- r'''(?sx)class=
- (?:
- (["\'])(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b.*?\1[^>]+|
- (["\'])videoLink\b.*?\2[\s]*>\n[^\n]*
- )data-extension=(["\'])(?P<data>(?:(?!\3).)+)\3
- ''',
- webpage, 'media link', default=None, group='data')
-
- if not json_metadata:
- return
+class WDRIE(InfoExtractor):
+ _VALID_URL = r'https?://deviceids-medp\.wdr\.de/ondemand/\d+/(?P<id>\d+)\.js'
+ _GEO_COUNTRIES = ['DE']
+ _TEST = {
+ 'url': 'http://deviceids-medp.wdr.de/ondemand/155/1557833.js',
+ 'info_dict': {
+ 'id': 'mdb-1557833',
+ 'ext': 'mp4',
+ 'title': 'Biathlon-Staffel verpasst Podest bei Olympia-Generalprobe',
+ 'upload_date': '20180112',
+ },
+ }
- media_link_obj = self._parse_json(json_metadata, display_id,
- transform_source=js_to_json)
- jsonp_url = media_link_obj['mediaObj']['url']
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
metadata = self._download_json(
- jsonp_url, display_id, transform_source=strip_jsonp)
+ url, video_id, transform_source=strip_jsonp)
+
+ is_live = metadata.get('mediaType') == 'live'
- metadata_tracker_data = metadata['trackerData']
- metadata_media_resource = metadata['mediaResource']
+ tracker_data = metadata['trackerData']
+ media_resource = metadata['mediaResource']
formats = []
# check if the metadata contains a direct URL to a file
- for kind, media_resource in metadata_media_resource.items():
+ for kind, media_resource in media_resource.items():
if kind not in ('dflt', 'alt'):
continue
ext = determine_ext(medium_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
- medium_url, display_id, 'mp4', 'm3u8_native',
+ medium_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls'))
elif ext == 'f4m':
manifest_url = update_url_query(
medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
formats.extend(self._extract_f4m_formats(
- manifest_url, display_id, f4m_id='hds', fatal=False))
+ manifest_url, video_id, f4m_id='hds', fatal=False))
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
medium_url, 'stream', fatal=False))
}
if ext == 'unknown_video':
urlh = self._request_webpage(
- medium_url, display_id, note='Determining extension')
+ medium_url, video_id, note='Determining extension')
ext = urlhandle_detect_ext(urlh)
a_format['ext'] = ext
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
- caption_url = metadata_media_resource.get('captionURL')
+ # read from the top-level dict: the format loop above rebinds media_resource
+ caption_url = metadata['mediaResource'].get('captionURL')
if caption_url:
subtitles['de'] = [{
'url': caption_url,
'ext': 'ttml',
}]
- title = metadata_tracker_data['trackerClipTitle']
+ title = tracker_data['trackerClipTitle']
return {
- 'id': metadata_tracker_data.get('trackerClipId', display_id),
- 'display_id': display_id,
- 'title': title,
- 'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
+ 'id': tracker_data.get('trackerClipId', video_id),
+ 'title': self._live_title(title) if is_live else title,
+ 'alt_title': tracker_data.get('trackerClipSubcategory'),
'formats': formats,
'subtitles': subtitles,
- 'upload_date': unified_strdate(metadata_tracker_data.get('trackerClipAirTime')),
+ 'upload_date': unified_strdate(tracker_data.get('trackerClipAirTime')),
+ 'is_live': is_live,
}
-class WDRIE(WDRBaseIE):
+class WDRPageIE(InfoExtractor):
_CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
- _PAGE_REGEX = r'/(?:mediathek/)?[^/]+/(?P<type>[^/]+)/(?P<display_id>.+)\.html'
- _VALID_URL = r'(?P<page_url>https?://(?:www\d\.)?wdr\d?\.de)' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
+ _PAGE_REGEX = r'/(?:mediathek/)?(?:[^/]+/)*(?P<display_id>[^/]+)\.html'
+ _VALID_URL = r'https?://(?:www\d?\.)?(?:wdr\d?|sportschau)\.de' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
_TESTS = [
{
'ext': 'ttml',
}]},
},
+ 'skip': 'HTTP Error 404: Not Found',
},
{
'url': 'http://www1.wdr.de/mediathek/audio/wdr3/wdr3-gespraech-am-samstag/audio-schriftstellerin-juli-zeh-100.html',
'is_live': False,
'subtitles': {}
},
+ 'skip': 'HTTP Error 404: Not Found',
},
{
'url': 'http://www1.wdr.de/mediathek/video/live/index.html',
'info_dict': {
- 'id': 'mdb-103364',
+ 'id': 'mdb-1406149',
'ext': 'mp4',
- 'display_id': 'index',
- 'title': r're:^WDR Fernsehen im Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+ 'title': r're:^WDR Fernsehen im Livestream \(nur in Deutschland erreichbar\) [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'alt_title': 'WDR Fernsehen Live',
- 'upload_date': None,
- 'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9',
+ 'upload_date': '20150101',
'is_live': True,
- 'subtitles': {}
},
'params': {
'skip_download': True, # m3u8 download
},
{
'url': 'http://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html',
- 'playlist_mincount': 8,
+ 'playlist_mincount': 7,
'info_dict': {
- 'id': 'aktuelle-stunde/aktuelle-stunde-120',
+ 'id': 'aktuelle-stunde-120',
},
},
{
'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
'info_dict': {
- 'id': 'mdb-1323501',
+ 'id': 'mdb-1552552',
'ext': 'mp4',
'upload_date': 're:^[0-9]{8}$',
'title': 're:^Die Sendung mit der Maus vom [0-9.]{10}$',
- 'description': 'Die Seite mit der Maus -',
},
'skip': 'The id changes from week to week because of the new episode'
},
'ext': 'mp4',
'upload_date': '20130919',
'title': 'Sachgeschichte - Achterbahn ',
- 'description': 'Die Seite mit der Maus -',
},
},
{
# Live stream, MD5 unstable
'info_dict': {
'id': 'mdb-869971',
- 'ext': 'flv',
- 'title': 'COSMO Livestream',
- 'description': 'md5:2309992a6716c347891c045be50992e4',
+ 'ext': 'mp4',
+ 'title': r're:^COSMO Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20160101',
},
+ 'params': {
+ 'skip_download': True, # m3u8 download
+ }
+ },
+ {
+ 'url': 'http://www.sportschau.de/handballem2018/handball-nationalmannschaft-em-stolperstein-vorrunde-100.html',
+ 'info_dict': {
+ 'id': 'mdb-1556012',
+ 'ext': 'mp4',
+ 'title': 'DHB-Vizepräsident Bob Hanning - "Die Weltspitze ist extrem breit"',
+ 'upload_date': '20180111',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ },
+ {
+ 'url': 'http://www.sportschau.de/handballem2018/audio-vorschau---die-handball-em-startet-mit-grossem-favoritenfeld-100.html',
+ 'only_matching': True,
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
- url_type = mobj.group('type')
- page_url = mobj.group('page_url')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
- info_dict = self._extract_wdr_video(webpage, display_id)
+ entries = []
+
+ # Article with several videos
- if not info_dict:
+ # for wdr.de the data-extension is in a tag with the class "mediaLink"
+ # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
+ # for wdrmaus, in a tag with the class "videoButton" (previously a link
+ # to the page in a multiline "videoLink"-tag)
+ for mobj in re.finditer(
+ r'''(?sx)class=
+ (?:
+ (["\'])(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b.*?\1[^>]+|
+ (["\'])videoLink\b.*?\2[\s]*>\n[^\n]*
+ )data-extension=(["\'])(?P<data>(?:(?!\3).)+)\3
+ ''', webpage):
+ media_link_obj = self._parse_json(
+ mobj.group('data'), display_id, transform_source=js_to_json,
+ fatal=False)
+ if not media_link_obj:
+ continue
+ jsonp_url = try_get(
+ media_link_obj, lambda x: x['mediaObj']['url'], compat_str)
+ if jsonp_url:
+ entries.append(self.url_result(jsonp_url, ie=WDRIE.ie_key()))
+
+ # Playlist (e.g. https://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html)
+ if not entries:
entries = [
- self.url_result(page_url + href[0], 'WDR')
- for href in re.findall(
- r'<a href="(%s)"[^>]+data-extension=' % self._PAGE_REGEX,
- webpage)
+ self.url_result(
+ compat_urlparse.urljoin(url, mobj.group('href')),
+ ie=WDRPageIE.ie_key())
+ for mobj in re.finditer(
+ r'<a[^>]+\bhref=(["\'])(?P<href>(?:(?!\1).)+)\1[^>]+\bdata-extension=',
+ webpage) if re.match(self._PAGE_REGEX, mobj.group('href'))
]
- if entries: # Playlist page
- return self.playlist_result(entries, playlist_id=display_id)
+ return self.playlist_result(entries, playlist_id=display_id)
- raise ExtractorError('No downloadable streams found', expected=True)
- is_live = url_type == 'live'
-
- if is_live:
- info_dict.update({
- 'title': self._live_title(info_dict['title']),
- 'upload_date': None,
- })
- elif 'upload_date' not in info_dict:
- info_dict['upload_date'] = unified_strdate(self._html_search_meta('DC.Date', webpage, 'upload date'))
+class WDRElefantIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)wdrmaus\.de/elefantenseite/#(?P<id>.+)'
+ _TEST = {
+ 'url': 'http://www.wdrmaus.de/elefantenseite/#folge_ostern_2015',
+ 'info_dict': {
+ 'title': 'Folge Oster-Spezial 2015',
+ 'id': 'mdb-1088195',
+ 'ext': 'mp4',
+ 'age_limit': None,
+ 'upload_date': '20150406'
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }
- info_dict.update({
- 'description': self._html_search_meta('Description', webpage),
- 'is_live': is_live,
- })
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
- return info_dict
+ # Table of Contents seems to always be at this address, so fetch it directly.
+ # The website fetches configurationJS.php5, which links to tableOfContentsJS.php5.
+ table_of_contents = self._download_json(
+ 'https://www.wdrmaus.de/elefantenseite/data/tableOfContentsJS.php5',
+ display_id)
+ if display_id not in table_of_contents:
+ raise ExtractorError(
+ 'No entry in site\'s table of contents for this URL. '
+ 'Is the fragment part of the URL (after the #) correct?',
+ expected=True)
+ xml_metadata_path = table_of_contents[display_id]['xmlPath']
+ xml_metadata = self._download_xml(
+ 'https://www.wdrmaus.de/elefantenseite/' + xml_metadata_path,
+ display_id)
+ zmdb_url_element = xml_metadata.find('./movie/zmdb_url')
+ if zmdb_url_element is None:
+ raise ExtractorError(
+ '%s is not a video' % display_id, expected=True)
+ return self.url_result(zmdb_url_element.text, ie=WDRIE.ie_key())
class WDRMobileIE(InfoExtractor):
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+import json
+import random
+import re
+
+from ..compat import (
+ compat_parse_qs,
+ compat_str,
+)
+from ..utils import (
+ js_to_json,
+ strip_jsonp,
+ urlencode_postdata,
+)
+
+
+class WeiboIE(InfoExtractor):
+ _VALID_URL = r'https?://weibo\.com/[0-9]+/(?P<id>[a-zA-Z0-9]+)'
+ _TEST = {
+ 'url': 'https://weibo.com/6275294458/Fp6RGfbff?type=comment',
+ 'info_dict': {
+ 'id': 'Fp6RGfbff',
+ 'ext': 'mp4',
+ 'title': 'You should have servants to massage you,... 来自Hosico_猫 - 微博',
+ }
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ # to get Referer url for genvisitor
+ webpage, urlh = self._download_webpage_handle(url, video_id)
+
+ visitor_url = urlh.geturl()
+
+ if 'passport.weibo.com' in visitor_url:
+ # first visit
+ visitor_data = self._download_json(
+ 'https://passport.weibo.com/visitor/genvisitor', video_id,
+ note='Generating first-visit data',
+ transform_source=strip_jsonp,
+ headers={'Referer': visitor_url},
+ data=urlencode_postdata({
+ 'cb': 'gen_callback',
+ 'fp': json.dumps({
+ 'os': '2',
+ 'browser': 'Gecko57,0,0,0',
+ 'fonts': 'undefined',
+ 'screenInfo': '1440*900*24',
+ 'plugins': '',
+ }),
+ }))
+
+ tid = visitor_data['data']['tid']
+ cnfd = '%03d' % visitor_data['data']['confidence']
+
+ self._download_webpage(
+ 'https://passport.weibo.com/visitor/visitor', video_id,
+ note='Running first-visit callback',
+ query={
+ 'a': 'incarnate',
+ 't': tid,
+ 'w': 2,
+ 'c': cnfd,
+ 'cb': 'cross_domain',
+ 'from': 'weibo',
+ '_rand': random.random(),
+ })
+
+ webpage = self._download_webpage(
+ url, video_id, note='Revisiting webpage')
+
+ title = self._html_search_regex(
+ r'<title>(.+?)</title>', webpage, 'title')
+
+ video_formats = compat_parse_qs(self._search_regex(
+ r'video-sources=\\\"(.+?)\"', webpage, 'video_sources'))
+
+ formats = []
+ supported_resolutions = (480, 720)
+ for res in supported_resolutions:
+ vid_urls = video_formats.get(compat_str(res))
+ if not vid_urls or not isinstance(vid_urls, list):
+ continue
+
+ vid_url = vid_urls[0]
+ formats.append({
+ 'url': vid_url,
+ 'height': res,
+ })
+
+ self._sort_formats(formats)
+
+ uploader = self._og_search_property(
+ 'nick-name', webpage, 'uploader', default=None)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'uploader': uploader,
+ 'formats': formats
+ }
+
+
+class WeiboMobileIE(InfoExtractor):
+ _VALID_URL = r'https?://m\.weibo\.cn/status/(?P<id>[0-9]+)(\?.+)?'
+ _TEST = {
+ 'url': 'https://m.weibo.cn/status/4189191225395228?wm=3333_2001&sourcetype=weixin&featurecode=newtitle&from=singlemessage&isappinstalled=0',
+ 'info_dict': {
+ 'id': '4189191225395228',
+ 'ext': 'mp4',
+ 'title': '午睡当然是要甜甜蜜蜜的啦',
+ 'uploader': '柴犬柴犬'
+ }
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ # to get Referer url for genvisitor
+ webpage = self._download_webpage(url, video_id, note='visit the page')
+
+ weibo_info = self._parse_json(self._search_regex(
+ r'var\s+\$render_data\s*=\s*\[({.*})\]\[0\]\s*\|\|\s*{};',
+ webpage, 'js_code', flags=re.DOTALL),
+ video_id, transform_source=js_to_json)
+
+ status_data = weibo_info.get('status', {})
+ page_info = status_data.get('page_info')
+ title = status_data['status_title']
+ uploader = status_data.get('user', {}).get('screen_name')
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'uploader': uploader,
+ 'url': page_info['media_info']['stream_url']
+ }
--- /dev/null
+# coding: utf-8
+
+from __future__ import unicode_literals
+
+import itertools
+import re
+
+from .common import InfoExtractor
+
+
+class XimalayaBaseIE(InfoExtractor):
+ _GEO_COUNTRIES = ['CN']
+
+
+class XimalayaIE(XimalayaBaseIE):
+ IE_NAME = 'ximalaya'
+ IE_DESC = '喜马拉雅FM'
+ _VALID_URL = r'https?://(?:www\.|m\.)?ximalaya\.com/(?P<uid>[0-9]+)/sound/(?P<id>[0-9]+)'
+ _USER_URL_FORMAT = '%s://www.ximalaya.com/zhubo/%i/'
+ _TESTS = [
+ {
+ 'url': 'http://www.ximalaya.com/61425525/sound/47740352/',
+ 'info_dict': {
+ 'id': '47740352',
+ 'ext': 'm4a',
+ 'uploader': '小彬彬爱听书',
+ 'uploader_id': 61425525,
+ 'uploader_url': 'http://www.ximalaya.com/zhubo/61425525/',
+ 'title': '261.唐诗三百首.卷八.送孟浩然之广陵.李白',
+ 'description': "contains:《送孟浩然之广陵》\n作者:李白\n故人西辞黄鹤楼,烟花三月下扬州。\n孤帆远影碧空尽,惟见长江天际流。",
+ 'thumbnails': [
+ {
+ 'name': 'cover_url',
+ 'url': r're:^https?://.*\.jpg$',
+ },
+ {
+ 'name': 'cover_url_142',
+ 'url': r're:^https?://.*\.jpg$',
+ 'width': 180,
+ 'height': 180
+ }
+ ],
+ 'categories': ['renwen', '人文'],
+ 'duration': 93,
+ 'view_count': int,
+ 'like_count': int,
+ }
+ },
+ {
+ 'url': 'http://m.ximalaya.com/61425525/sound/47740352/',
+ 'info_dict': {
+ 'id': '47740352',
+ 'ext': 'm4a',
+ 'uploader': '小彬彬爱听书',
+ 'uploader_id': 61425525,
+ 'uploader_url': 'http://www.ximalaya.com/zhubo/61425525/',
+ 'title': '261.唐诗三百首.卷八.送孟浩然之广陵.李白',
+ 'description': "contains:《送孟浩然之广陵》\n作者:李白\n故人西辞黄鹤楼,烟花三月下扬州。\n孤帆远影碧空尽,惟见长江天际流。",
+ 'thumbnails': [
+ {
+ 'name': 'cover_url',
+ 'url': r're:^https?://.*\.jpg$',
+ },
+ {
+ 'name': 'cover_url_142',
+ 'url': r're:^https?://.*\.jpg$',
+ 'width': 180,
+ 'height': 180
+ }
+ ],
+ 'categories': ['renwen', '人文'],
+ 'duration': 93,
+ 'view_count': int,
+ 'like_count': int,
+ }
+ },
+ {
+ 'url': 'https://www.ximalaya.com/11045267/sound/15705996/',
+ 'info_dict': {
+ 'id': '15705996',
+ 'ext': 'm4a',
+ 'uploader': '李延隆老师',
+ 'uploader_id': 11045267,
+ 'uploader_url': 'https://www.ximalaya.com/zhubo/11045267/',
+ 'title': 'Lesson 1 Excuse me!',
+ 'description': "contains:Listen to the tape then answer\xa0this question. Whose handbag is it?\n"
+ "听录音,然后回答问题,这是谁的手袋?",
+ 'thumbnails': [
+ {
+ 'name': 'cover_url',
+ 'url': r're:^https?://.*\.jpg$',
+ },
+ {
+ 'name': 'cover_url_142',
+ 'url': r're:^https?://.*\.jpg$',
+ 'width': 180,
+ 'height': 180
+ }
+ ],
+ 'categories': ['train', '外语'],
+ 'duration': 40,
+ 'view_count': int,
+ 'like_count': int,
+ }
+ },
+ ]
+
+ def _real_extract(self, url):
+
+ is_m = 'm.ximalaya' in url
+ scheme = 'https' if url.startswith('https') else 'http'
+
+ audio_id = self._match_id(url)
+ webpage = self._download_webpage(url, audio_id,
+ note='Download sound page for %s' % audio_id,
+ errnote='Unable to get sound page')
+
+ audio_info_file = '%s://m.ximalaya.com/tracks/%s.json' % (scheme, audio_id)
+ audio_info = self._download_json(audio_info_file, audio_id,
+ 'Downloading info json %s' % audio_info_file,
+ 'Unable to download info file')
+
+ formats = []
+ for bps, k in (('24k', 'play_path_32'), ('64k', 'play_path_64')):
+ if audio_info.get(k):
+ formats.append({
+ 'format_id': bps,
+ 'url': audio_info[k],
+ })
+
+ thumbnails = []
+ for k in audio_info.keys():
+ # cover pic keys look like: 'cover_url', 'cover_url_142'
+ if k.startswith('cover_url'):
+ thumbnail = {'name': k, 'url': audio_info[k]}
+ if k == 'cover_url_142':
+ thumbnail['width'] = 180
+ thumbnail['height'] = 180
+ thumbnails.append(thumbnail)
+
+ audio_uploader_id = audio_info.get('uid')
+
+ if is_m:
+ audio_description = self._html_search_regex(r'(?s)<section\s+class=["\']content[^>]+>(.+?)</section>',
+ webpage, 'audio_description', fatal=False)
+ else:
+ audio_description = self._html_search_regex(r'(?s)<div\s+class=["\']rich_intro[^>]*>(.+?</article>)',
+ webpage, 'audio_description', fatal=False)
+
+ if not audio_description:
+ audio_description_file = '%s://www.ximalaya.com/sounds/%s/rich_intro' % (scheme, audio_id)
+ audio_description = self._download_webpage(audio_description_file, audio_id,
+ note='Downloading description file %s' % audio_description_file,
+ errnote='Unable to download description file',
+ fatal=False)
+ audio_description = audio_description.strip() if audio_description else None
+
+ return {
+ 'id': audio_id,
+ 'uploader': audio_info.get('nickname'),
+ 'uploader_id': audio_uploader_id,
+ 'uploader_url': self._USER_URL_FORMAT % (scheme, audio_uploader_id) if audio_uploader_id else None,
+ 'title': audio_info['title'],
+ 'thumbnails': thumbnails,
+ 'description': audio_description,
+ 'categories': list(filter(None, (audio_info.get('category_name'), audio_info.get('category_title')))),
+ 'duration': audio_info.get('duration'),
+ 'view_count': audio_info.get('play_count'),
+ 'like_count': audio_info.get('favorites_count'),
+ 'formats': formats,
+ }
+
+
+class XimalayaAlbumIE(XimalayaBaseIE):
+ IE_NAME = 'ximalaya:album'
+ IE_DESC = '喜马拉雅FM 专辑'
+ _VALID_URL = r'https?://(?:www\.|m\.)?ximalaya\.com/(?P<uid>[0-9]+)/album/(?P<id>[0-9]+)'
+ _TEMPLATE_URL = '%s://www.ximalaya.com/%s/album/%s/'
+ _BASE_URL_TEMPL = '%s://www.ximalaya.com%s'
+ _LIST_VIDEO_RE = r'<a[^>]+?href="(?P<url>/%s/sound/(?P<id>\d+)/?)"[^>]+?title="(?P<title>[^>]+)">'
+ _TESTS = [{
+ 'url': 'http://www.ximalaya.com/61425525/album/5534601/',
+ 'info_dict': {
+ 'title': '唐诗三百首(含赏析)',
+ 'id': '5534601',
+ },
+ 'playlist_count': 312,
+ }, {
+ 'url': 'http://m.ximalaya.com/61425525/album/5534601',
+ 'info_dict': {
+ 'title': '唐诗三百首(含赏析)',
+ 'id': '5534601',
+ },
+ 'playlist_count': 312,
+ },
+ ]
+
+ def _real_extract(self, url):
+ self.scheme = scheme = 'https' if url.startswith('https') else 'http'
+
+ mobj = re.match(self._VALID_URL, url)
+ uid, playlist_id = mobj.group('uid'), mobj.group('id')
+
+ webpage = self._download_webpage(self._TEMPLATE_URL % (scheme, uid, playlist_id), playlist_id,
+ note='Download album page for %s' % playlist_id,
+ errnote='Unable to get album info')
+
+ title = self._html_search_regex(r'detailContent_title[^>]*><h1(?:[^>]+)?>([^<]+)</h1>',
+ webpage, 'title', fatal=False)
+
+ return self.playlist_result(self._entries(webpage, playlist_id, uid), playlist_id, title)
+
+ def _entries(self, page, playlist_id, uid):
+ html = page
+ for page_num in itertools.count(1):
+ for entry in self._process_page(html, uid):
+ yield entry
+
+ next_url = self._search_regex(r'<a\s+href=(["\'])(?P<more>[\S]+)\1[^>]+rel=(["\'])next\3',
+ html, 'list_next_url', default=None, group='more')
+ if not next_url:
+ break
+
+ next_full_url = self._BASE_URL_TEMPL % (self.scheme, next_url)
+ html = self._download_webpage(next_full_url, playlist_id)
+
+ def _process_page(self, html, uid):
+ find_from = html.index('album_soundlist')
+ for mobj in re.finditer(self._LIST_VIDEO_RE % uid, html[find_from:]):
+ yield self.url_result(self._BASE_URL_TEMPL % (self.scheme, mobj.group('url')),
+ XimalayaIE.ie_key(),
+ mobj.group('id'),
+ mobj.group('title'))
# No data-id value.
'url': 'http://list.youku.com/show/id_zefbfbd61237fefbfbdef.html',
'only_matching': True,
+ }, {
+ # Wrong number of reload_id.
+ 'url': 'http://list.youku.com/show/id_z20eb4acaf5c211e3b2ad.html',
+ 'only_matching': True,
}]
def _extract_entries(self, playlist_data_url, show_id, note, query):
query['callback'] = 'cb'
playlist_data = self._download_json(
playlist_data_url, show_id, query=query, note=note,
- transform_source=lambda s: js_to_json(strip_jsonp(s)))['html']
+ transform_source=lambda s: js_to_json(strip_jsonp(s))).get('html')
+ if playlist_data is None:
+ return [None, None]
drama_list = (get_element_by_class('p-drama-grid', playlist_data) or
get_element_by_class('p-drama-half-row', playlist_data))
if drama_list is None:
'id': page_config['showid'],
'stage': reload_id,
})
- entries.extend(new_entries)
-
+ if new_entries is not None:
+ entries.extend(new_entries)
desc = self._html_search_meta('description', webpage, fatal=False)
playlist_title = desc.split(',')[0] if desc else None
detail_li = get_element_by_class('p-intro', webpage)
if 'token' not in video_info:
video_info = get_video_info
break
+
+ def extract_unavailable_message():
+ return self._html_search_regex(
+ r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
+ video_webpage, 'unavailable message', default=None)
+
if 'token' not in video_info:
if 'reason' in video_info:
if 'The uploader has not made this video available in your country.' in video_info['reason']:
countries = regions_allowed.split(',') if regions_allowed else None
self.raise_geo_restricted(
msg=video_info['reason'][0], countries=countries)
+ reason = video_info['reason'][0]
+ if 'Invalid parameters' in reason:
+ unavailable_message = extract_unavailable_message()
+ if unavailable_message:
+ reason = unavailable_message
raise ExtractorError(
- 'YouTube said: %s' % video_info['reason'][0],
+ 'YouTube said: %s' % reason,
expected=True, video_id=video_id)
else:
raise ExtractorError(
'url': video_info['conn'][0],
'player_url': player_url,
}]
- elif len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1:
+ elif not is_live and (len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1):
encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts', [''])[0]
if 'rtmpe%3Dyes' in encoded_url_map:
raise ExtractorError('rtmpe downloads are not supported, see https://github.com/rg3/youtube-dl/issues/343 for more information.', expected=True)
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
formats.append(a_format)
else:
- unavailable_message = self._html_search_regex(
- r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
- video_webpage, 'unavailable message', default=None)
+ unavailable_message = extract_unavailable_message()
if unavailable_message:
raise ExtractorError(unavailable_message, expected=True)
raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
webpage = self._download_webpage(url, channel_id, fatal=False)
if webpage:
page_type = self._og_search_property(
- 'type', webpage, 'page type', default=None)
+ 'type', webpage, 'page type', default='')
video_id = self._html_search_meta(
'videoId', webpage, 'video id', default=None)
- if page_type == 'video' and video_id and re.match(r'^[0-9A-Za-z_-]{11}$', video_id):
+ if page_type.startswith('video') and video_id and re.match(
+ r'^[0-9A-Za-z_-]{11}$', video_id):
return self.url_result(video_id, YoutubeIE.ie_key())
return self.url_result(base_url)
compat_HTMLParser,
compat_basestring,
compat_chr,
+ compat_ctypes_WINFUNCTYPE,
compat_etree_fromstring,
compat_expanduser,
compat_html_entities,
if fileno not in WIN_OUTPUT_IDS:
return False
- GetStdHandle = ctypes.WINFUNCTYPE(
+ GetStdHandle = compat_ctypes_WINFUNCTYPE(
ctypes.wintypes.HANDLE, ctypes.wintypes.DWORD)(
- (b'GetStdHandle', ctypes.windll.kernel32))
+ ('GetStdHandle', ctypes.windll.kernel32))
h = GetStdHandle(WIN_OUTPUT_IDS[fileno])
- WriteConsoleW = ctypes.WINFUNCTYPE(
+ WriteConsoleW = compat_ctypes_WINFUNCTYPE(
ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE, ctypes.wintypes.LPWSTR,
ctypes.wintypes.DWORD, ctypes.POINTER(ctypes.wintypes.DWORD),
- ctypes.wintypes.LPVOID)((b'WriteConsoleW', ctypes.windll.kernel32))
+ ctypes.wintypes.LPVOID)(('WriteConsoleW', ctypes.windll.kernel32))
written = ctypes.wintypes.DWORD(0)
- GetFileType = ctypes.WINFUNCTYPE(ctypes.wintypes.DWORD, ctypes.wintypes.DWORD)((b'GetFileType', ctypes.windll.kernel32))
+ GetFileType = compat_ctypes_WINFUNCTYPE(ctypes.wintypes.DWORD, ctypes.wintypes.DWORD)(('GetFileType', ctypes.windll.kernel32))
FILE_TYPE_CHAR = 0x0002
FILE_TYPE_REMOTE = 0x8000
- GetConsoleMode = ctypes.WINFUNCTYPE(
+ GetConsoleMode = compat_ctypes_WINFUNCTYPE(
ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE,
ctypes.POINTER(ctypes.wintypes.DWORD))(
- (b'GetConsoleMode', ctypes.windll.kernel32))
+ ('GetConsoleMode', ctypes.windll.kernel32))
INVALID_HANDLE_VALUE = ctypes.wintypes.DWORD(-1).value
def not_a_console(handle):
"(?:[^"\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^"\\]*"|
'(?:[^'\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^'\\]*'|
{comment}|,(?={skip}[\]}}])|
- [a-zA-Z_][.a-zA-Z_0-9]*|
+ (?:(?<![0-9])[eE]|[a-df-zA-DF-Z_])[.a-zA-Z_0-9]*|
\b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:{skip}:)?|
[0-9]+(?={skip}:)
'''.format(comment=COMMENT_RE, skip=SKIP_RE), fix_kv, code)
from __future__ import unicode_literals
-__version__ = '2017.12.31'
+__version__ = '2018.01.27'