+version 2018.11.07
+
+Extractors
++ [youtube] Add another JS signature function name regex (#18091, #18093,
+ #18094)
+* [facebook] Fix tahoe request (#17171)
+* [cliphunter] Fix extraction (#18083)
++ [youtube:playlist] Add support for invidio.us (#18077)
+* [zattoo] Arrange API hosts for derived extractors (#18035)
++ [youtube] Add fallback metadata extraction from videoDetails (#18052)
+
+
+version 2018.11.03
+
+Core
+* [extractor/common] Ensure response handle is not prematurely closed before
+ it can be read if it matches expected_status (#17195, #17846, #17447)
+
+Extractors
+* [laola1tv:embed] Set correct stream access URL scheme (#16341)
++ [ehftv] Add support for ehftv.com (#15408)
+* [azmedien] Adapt to major site redesign (#17745, #17746)
++ [twitcasting] Add support for twitcasting.tv (#17981)
+* [orf:tvthek] Fix extraction (#17737, #17956, #18024)
++ [openload] Add support for oload.fun (#18045)
+* [njpwworld] Fix authentication (#17427)
++ [linkedin:learning] Add support for linkedin.com/learning (#13545)
+* [theplatform] Improve error detection (#13222)
+* [cnbc] Simplify extraction (#14280, #17110)
++ [cnbc] Add support for new URL schema (#14193)
+* [aparat] Improve extraction and extract more metadata (#17445, #18008)
+* [aparat] Fix extraction
+
+
+version 2018.10.29
+
+Core
++ [extractor/common] Add validation for JSON-LD URLs
+
+Extractors
++ [sportbox] Add support for matchtv.ru
+* [sportbox] Fix extraction (#17978)
+* [screencast] Fix extraction (#14590, #14617, #17990)
++ [openload] Add support for oload.icu
++ [ivi] Add support for ivi.tv
+* [crunchyroll] Improve extraction robustness (#17991)
+* [dailymail] Fix formats extraction (#17976)
+* [viewster] Reduce format requests
+* [cwtv] Handle API errors (#17905)
++ [rutube] Use geo verification headers (#17897)
++ [brightcove:legacy] Add fallbacks to brightcove:new (#13912)
+- [tv3] Remove extractor (#10461, #15339)
+* [ted] Fix extraction for HTTP and RTMP formats (#5941, #17572, #17894)
++ [openload] Add support for oload.cc (#17823)
++ [patreon] Extract post_file URL (#17792)
+* [patreon] Fix extraction (#14502, #10471)
+
+
+version 2018.10.05
+
+Extractors
+* [pluralsight] Improve authentication (#17762)
+* [dailymotion] Fix extraction (#17699)
+* [crunchyroll] Switch to HTTPS for RpcApi (#17749)
++ [philharmoniedeparis] Add support for pad.philharmoniedeparis.fr (#17705)
+* [philharmoniedeparis] Fix extraction (#17705)
++ [jamendo] Add support for licensing.jamendo.com (#17724)
++ [openload] Add support for oload.cloud (#17710)
+* [pluralsight] Fix subtitles extraction (#17726, #17728)
++ [vimeo] Add another config regular expression (#17690)
+* [spike] Fix Paramount Network extraction (#17677)
+* [hotstar] Fix extraction (#14694, #14931, #17637)
+
+
+version 2018.09.26
+
+Extractors
+* [pluralsight] Fix subtitles extraction (#17671)
+* [mediaset] Improve embed support (#17668)
++ [youtube] Add support for invidio.us (#17613)
++ [zattoo] Add support for more zattoo platform sites
+* [zattoo] Fix extraction (#17175, #17542)
+
+
+version 2018.09.18
+
+Core
++ [extractor/common] Introduce channel meta fields
+
+Extractors
+* [adobepass] Don't pollute default headers dict
+* [udemy] Don't pollute default headers dict
+* [twitch] Don't pollute default headers dict
+* [youtube] Don't pollute default query dict (#17593)
+* [crunchyroll] Prefer hardsubless formats and formats in locale language
+* [vrv] Make format ids deterministic
+* [vimeo] Fix ondemand playlist extraction (#14591)
++ [pornhub] Extract upload date (#17574)
++ [porntube] Extract channel meta fields
++ [vimeo] Extract channel meta fields
++ [youtube] Extract channel meta fields (#9676, #12939)
+* [porntube] Fix extraction (#17541)
+* [asiancrush] Fix extraction (#15630)
++ [twitch:clips] Extend URL regular expression (#17559)
++ [vzaar] Add support for HLS
+* [tube8] Fix metadata extraction (#17520)
+* [eporner] Extract JSON-LD (#17519)
+
+
version 2018.09.10
Core
- `timestamp` (numeric): UNIX timestamp of the moment the video became available
- `upload_date` (string): Video upload date (YYYYMMDD)
- `uploader_id` (string): Nickname or id of the video uploader
+ - `channel` (string): Full name of the channel the video is uploaded on
+ - `channel_id` (string): Id of the channel
- `location` (string): Physical location where the video was filmed
- `duration` (numeric): Length of the video in seconds
- `view_count` (numeric): How many users have watched the video on the platform
### Use safe conversion functions
-Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
+Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
+
+Use `url_or_none` for safe URL processing.
+
+Use `try_get` for safe metadata extraction from parsed JSON.
+
+Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
+
+#### More examples
+
+##### Safely extract optional description from parsed JSON
+```python
+description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
+```
+
+##### Safely extract more optional metadata
+```python
+video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
+description = video.get('summary')
+duration = float_or_none(video.get('durationMs'), scale=1000)
+view_count = int_or_none(video.get('views'))
+```
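+
+##### Safely extract an optional URL
+
+Building on the previous example, a minimal sketch (the `poster` field is illustrative): `url_or_none` returns the value only if it looks like a valid URL and `None` otherwise.
+```python
+thumbnail = url_or_none(video.get('poster'))
+```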
# EMBEDDING YOUTUBE-DL
available
- upload_date (string): Video upload date (YYYYMMDD)
- uploader_id (string): Nickname or id of the video uploader
+- channel (string): Full name of the channel the video is uploaded on
+- channel_id (string): Id of the channel
- location (string): Physical location where the video was filmed
- duration (numeric): Length of the video in seconds
- view_count (numeric): How many users have watched the video on the
Use safe conversion functions
-Wrap all extracted numeric data into safe functions from utils:
-int_or_none, float_or_none. Use them for string to number conversions as
-well.
+Wrap all extracted numeric data into safe functions from
+youtube_dl/utils.py: int_or_none, float_or_none. Use them for string to
+number conversions as well.
+
+Use url_or_none for safe URL processing.
+
+Use try_get for safe metadata extraction from parsed JSON.
+
+Explore youtube_dl/utils.py for more useful convenience functions.
+
+More examples
+
+Safely extract optional description from parsed JSON
+
+ description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
+
+Safely extract more optional metadata
+
+ video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
+ description = video.get('summary')
+ duration = float_or_none(video.get('durationMs'), scale=1000)
+ view_count = int_or_none(video.get('views'))
- **awaan:season**
- **awaan:video**
- **AZMedien**: AZ Medien videos
- - **AZMedienPlaylist**: AZ Medien playlists
- - **AZMedienShowPlaylist**: AZ Medien show playlists
- **BaiduVideo**: 百度视频
- **bambuser**
- **bambuser:channel**
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist**
+ - **BBVTV**
- **Beatport**
- **Beeg**
- **BehindKink**
- **Clyp**
- **cmt.com**
- **CNBC**
+ - **CNBCVideo**
- **CNN**
- **CNNArticle**
- **CNNBlogs**
- **EchoMsk**
- **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson
+ - **ehftv**
- **eHow**
+ - **EinsUndEinsTV**
- **Einthusan**
- **eitb.tv**
- **EllenTube**
- **EsriVideo**
- **Europa**
- **EveryonesMixtape**
+ - **EWETV**
- **ExpoTV**
- **Expressen**
- **ExtremeTube**
- **Gfycat**
- **GiantBomb**
- **Giga**
+ - **GlattvisionTV**
- **Glide**: Glide mobile video messages (glide.me)
- **Globo**
- **GloboArticle**
- **HitRecord**
- **HornBunny**
- **HotNewHipHop**
- - **HotStar**
+ - **hotstar**
- **hotstar:playlist**
- **Howcast**
- **HowStuffWorks**
- **limelight:channel**
- **limelight:channel_list**
- **LineTV**
+ - **linkedin:learning**
+ - **linkedin:learning:course**
- **LiTV**
- **LiveLeak**
- **LiveLeakEmbed**
- **Mixer:vod**
- **MLB**
- **Mnet**
+ - **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
- **Myvi**
- **MyVidster**
- **MyviEmbed**
+ - **MyVisionTV**
- **n-tv.de**
- **natgeo**
- **natgeo:episodeguide**
- **netease:program**: 网易云音乐 - 电台节目
- **netease:singer**: 网易云音乐 - 歌手
- **netease:song**: 网易云音乐
+ - **NetPlus**
- **Netzkino**
- **Newgrounds**
- **NewgroundsPlaylist**
- **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek
+ - **OsnatelTV**
- **PacktPub**
- **PacktPubCourse**
- **PandaTV**: 熊猫TV
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
+ - **QuantumTV**
- **Quickline**
- **QuicklineLive**
- **R7**
- **safari**: safaribooksonline.com online video
- **safari:api**
- **safari:course**: safaribooksonline.com online courses
+ - **SAKTV**
- **Sapo**: SAPO Vídeos
- **savefrom.net**
- **SBS**: sbs.com.au
- **Spiegeltv**
- **sport.francetvinfo.fr**
- **Sport5**
- - **SportBoxEmbed**
+ - **SportBox**
- **SportDeutschland**
- **SpringboardPlatform**
- **Sprout**
- **TV2**
- **tv2.hu**
- **TV2Article**
- - **TV3**
- **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+
- **TVA**
- **TVPlayer**
- **TVPlayHome**
- **Tweakers**
+ - **TwitCasting**
- **twitch:chapter**
- **twitch:clips**
- **twitch:profile**
- **vrv**
- **vrv:series**
- **VShare**
+ - **VTXTV**
- **vube**: Vube.com
- **VuClip**
- **VVVVID**
- **VyboryMos**
- **Vzaar**
- **Walla**
+ - **WalyTV**
- **washingtonpost**
- **washingtonpost:article**
- **wat.tv**
import os.path
import re
import types
+import ssl
import sys
import youtube_dl.extractor
real_warning(w)
ydl.report_warning = _report_warning
+
+
+def http_server_port(httpd):
+ if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
+ # In Jython SSLSocket is not a subclass of socket.socket
+ sock = httpd.socket.sock
+ else:
+ sock = httpd.socket
+ return sock.getsockname()[1]
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from test.helper import FakeYDL, expect_dict, expect_value
-from youtube_dl.compat import compat_etree_fromstring
+from test.helper import FakeYDL, expect_dict, expect_value, http_server_port
+from youtube_dl.compat import compat_etree_fromstring, compat_http_server
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
+import threading
+
+
+TEAPOT_RESPONSE_STATUS = 418
+TEAPOT_RESPONSE_BODY = "<h1>418 I'm a teapot</h1>"
+
+
+class InfoExtractorTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
+ def log_message(self, format, *args):
+ pass
+
+ def do_GET(self):
+ if self.path == '/teapot':
+ self.send_response(TEAPOT_RESPONSE_STATUS)
+ self.send_header('Content-Type', 'text/html; charset=utf-8')
+ self.end_headers()
+ self.wfile.write(TEAPOT_RESPONSE_BODY.encode())
+ else:
+ assert False
class TestIE(InfoExtractor):
for i in range(len(entries)):
expect_dict(self, entries[i], expected_entries[i])
+ def test_response_with_expected_status_returns_content(self):
+ # Checks for mitigations against the effects of
+ # <https://bugs.python.org/issue15002> that affect Python 3.4.1+, which
+ # manifest as `_download_webpage`, `_download_xml`, `_download_json`,
+ # or the underlying `_download_webpage_handle` returning no content
+ # when a response matches `expected_status`.
+
+ httpd = compat_http_server.HTTPServer(
+ ('127.0.0.1', 0), InfoExtractorTestRequestHandler)
+ port = http_server_port(httpd)
+ server_thread = threading.Thread(target=httpd.serve_forever)
+ server_thread.daemon = True
+ server_thread.start()
+
+ (content, urlh) = self.ie._download_webpage_handle(
+ 'http://127.0.0.1:%d/teapot' % port, None,
+ expected_status=TEAPOT_RESPONSE_STATUS)
+ self.assertEqual(content, TEAPOT_RESPONSE_BODY)
+
if __name__ == '__main__':
unittest.main()
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from test.helper import try_rm
+from test.helper import http_server_port, try_rm
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server
from youtube_dl.downloader.http import HttpFD
from youtube_dl.utils import encodeFilename
-import ssl
import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
-def http_server_port(httpd):
- if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
- # In Jython SSLSocket is not a subclass of socket.socket
- sock = httpd.socket.sock
- else:
- sock = httpd.socket
- return sock.getsockname()[1]
-
-
TEST_SIZE = 10 * 1024
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from test.helper import http_server_port
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server, compat_urllib_request
import ssl
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
-def http_server_port(httpd):
- if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
- # In Jython SSLSocket is not a subclass of socket.socket
- sock = httpd.socket.sock
- else:
- sock = httpd.socket
- return sock.getsockname()[1]
-
-
class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass
.IP \[bu] 2
\f[C]uploader_id\f[] (string): Nickname or id of the video uploader
.IP \[bu] 2
+\f[C]channel\f[] (string): Full name of the channel the video is
+uploaded on
+.IP \[bu] 2
+\f[C]channel_id\f[] (string): Id of the channel
+.IP \[bu] 2
\f[C]location\f[] (string): Physical location where the video was filmed
.IP \[bu] 2
\f[C]duration\f[] (numeric): Length of the video in seconds
.fi
.SS Use safe conversion functions
.PP
-Wrap all extracted numeric data into safe functions from \f[C]utils\f[]:
+Wrap all extracted numeric data into safe functions from
+\f[C]youtube_dl/utils.py\f[] (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py):
\f[C]int_or_none\f[], \f[C]float_or_none\f[].
Use them for string to number conversions as well.
+.PP
+Use \f[C]url_or_none\f[] for safe URL processing.
+.PP
+Use \f[C]try_get\f[] for safe metadata extraction from parsed JSON.
+.PP
+Explore
+\f[C]youtube_dl/utils.py\f[] (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py)
+for more useful convenience functions.
+.SS More examples
+.SS Safely extract optional description from parsed JSON
+.IP
+.nf
+\f[C]
+description\ =\ try_get(response,\ lambda\ x:\ x[\[aq]result\[aq]][\[aq]video\[aq]][0][\[aq]summary\[aq]],\ compat_str)
+\f[]
+.fi
+.SS Safely extract more optional metadata
+.IP
+.nf
+\f[C]
+video\ =\ try_get(response,\ lambda\ x:\ x[\[aq]result\[aq]][\[aq]video\[aq]][0],\ dict)\ or\ {}
+description\ =\ video.get(\[aq]summary\[aq])
+duration\ =\ float_or_none(video.get(\[aq]durationMs\[aq]),\ scale=1000)
+view_count\ =\ int_or_none(video.get(\[aq]views\[aq]))
+\f[]
+.fi
.SH EMBEDDING YOUTUBE\-DL
.PP
youtube\-dl makes the best effort to be a good command\-line program,
_DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
def _download_webpage_handle(self, *args, **kwargs):
- headers = kwargs.get('headers', {})
- headers.update(self.geo_verification_headers())
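+ # Start from a fresh geo verification headers dict and overlay the caller's
+ # headers, so the caller's dict is never mutated and explicit headers win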
+ headers = self.geo_verification_headers()
+ headers.update(kwargs.get('headers', {}))
kwargs['headers'] = headers
return super(AdobePassIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
from .common import InfoExtractor
from ..utils import (
int_or_none,
+ merge_dicts,
mimetype2ext,
url_or_none,
)
class AparatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
- _TEST = {
+ _TESTS = [{
'url': 'http://www.aparat.com/v/wP8On',
'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': {
'id': 'wP8On',
'ext': 'mp4',
'title': 'تیم گلکسی 11 - زومیت',
- 'age_limit': 0,
+ 'description': 'md5:096bdabcdcc4569f2b8a5e903a3b3028',
+ 'duration': 231,
+ 'timestamp': 1387394859,
+ 'upload_date': '20131218',
+ 'view_count': int,
},
- # 'skip': 'Extremely unreliable',
- }
+ }, {
+ # multiple formats
+ 'url': 'https://www.aparat.com/v/8dflw/',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
video_id = self._match_id(url)
- # Note: There is an easier-to-parse configuration at
- # http://www.aparat.com/video/video/config/videohash/%video_id
- # but the URL in there does not work
- webpage = self._download_webpage(
- 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
- video_id)
+ # Provides more metadata
+ webpage = self._download_webpage(url, video_id, fatal=False)
- title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title')
+ if not webpage:
+ # Note: There is an easier-to-parse configuration at
+ # http://www.aparat.com/video/video/config/videohash/%video_id
+ # but the URL in there does not work
+ webpage = self._download_webpage(
+ 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
+ video_id)
- file_list = self._parse_json(
+ options = self._parse_json(
self._search_regex(
- r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage,
- 'file list'),
+ r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
+ webpage, 'options', group='value'),
video_id)
+ player = options['plugins']['sabaPlayerPlugin']
+
formats = []
- for item in file_list[0]:
- file_url = url_or_none(item.get('file'))
- if not file_url:
- continue
- ext = mimetype2ext(item.get('type'))
- label = item.get('label')
- formats.append({
- 'url': file_url,
- 'ext': ext,
- 'format_id': label or ext,
- 'height': int_or_none(self._search_regex(
- r'(\d+)[pP]', label or '', 'height', default=None)),
- })
- self._sort_formats(formats)
+ for sources in player['multiSRC']:
+ for item in sources:
+ if not isinstance(item, dict):
+ continue
+ file_url = url_or_none(item.get('src'))
+ if not file_url:
+ continue
+ item_type = item.get('type')
+ if item_type == 'application/vnd.apple.mpegurl':
+ formats.extend(self._extract_m3u8_formats(
+ file_url, video_id, 'mp4',
+ entry_protocol='m3u8_native', m3u8_id='hls',
+ fatal=False))
+ else:
+ ext = mimetype2ext(item.get('type'))
+ label = item.get('label')
+ formats.append({
+ 'url': file_url,
+ 'ext': ext,
+ 'format_id': 'http-%s' % (label or ext),
+ 'height': int_or_none(self._search_regex(
+ r'(\d+)[pP]', label or '', 'height',
+ default=None)),
+ })
+ self._sort_formats(
+ formats, field_preference=('height', 'width', 'tbr', 'format_id'))
+
+ info = self._search_json_ld(webpage, video_id, default={})
- thumbnail = self._search_regex(
- r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
+ if not info.get('title'):
+ info['title'] = player['title']
- return {
+ return merge_dicts(info, {
'id': video_id,
- 'title': title,
- 'thumbnail': thumbnail,
- 'age_limit': self._family_friendly_search(webpage),
+ 'thumbnail': url_or_none(options.get('poster')),
+ 'duration': int_or_none(player.get('duration')),
'formats': formats,
- }
+ })
from ..utils import (
extract_attributes,
remove_end,
- urlencode_postdata,
)
def _real_extract(self, url):
video_id = self._match_id(url)
- data = self._download_json(
- 'https://www.asiancrush.com/wp-admin/admin-ajax.php', video_id,
- data=urlencode_postdata({
- 'postid': video_id,
- 'action': 'get_channel_kaltura_vars',
- }))
+ webpage = self._download_webpage(url, video_id)
- entry_id = data['entry_id']
+ entry_id, partner_id, title = [None] * 3
+
+ vars = self._parse_json(
+ self._search_regex(
+ r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars',
+ default='{}'), video_id, fatal=False)
+ if vars:
+ entry_id = vars.get('entry_id')
+ partner_id = vars.get('partner_id')
+ title = vars.get('vid_label')
+
+ if not entry_id:
+ entry_id = self._search_regex(
+ r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
+
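+ # The embedded player page exposes the canonical Kaltura entry id and,
+ # usually, the partner id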
+ player = self._download_webpage(
+ 'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
+ query={'id': entry_id})
+
+ kaltura_id = self._search_regex(
+ r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', player,
+ 'kaltura id', group='id')
+
+ if not partner_id:
+ partner_id = self._search_regex(
+ r'/p(?:artner_id)?/(\d+)', player, 'partner id',
+ default='513551')
return self.url_result(
- 'kaltura:%s:%s' % (data['partner_id'], entry_id),
- ie=KalturaIE.ie_key(), video_id=entry_id,
- video_title=data.get('vid_label'))
+ 'kaltura:%s:%s' % (partner_id, kaltura_id),
+ ie=KalturaIE.ie_key(), video_id=kaltura_id,
+ video_title=title)
class AsianCrushPlaylistIE(InfoExtractor):
# coding: utf-8
from __future__ import unicode_literals
+import json
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
-from ..utils import (
- get_element_by_class,
- get_element_by_id,
- strip_or_none,
- urljoin,
-)
-class AZMedienBaseIE(InfoExtractor):
- def _kaltura_video(self, partner_id, entry_id):
- return self.url_result(
- 'kaltura:%s:%s' % (partner_id, entry_id), ie=KalturaIE.ie_key(),
- video_id=entry_id)
-
-
-class AZMedienIE(AZMedienBaseIE):
+class AZMedienIE(InfoExtractor):
IE_DESC = 'AZ Medien videos'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
- (?:
+ (?P<host>
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
- [0-9]+-show-[^/\#]+
- (?:
- /[0-9]+-episode-[^/\#]+
- (?:
- /[0-9]+-segment-(?:[^/\#]+\#)?|
- \#
- )|
- \#
+ [^/]+/
+ (?P<id>
+ [^/]+-(?P<article_id>\d+)
)
- (?P<id>[^\#]+)
+ (?:
+ \#video=
+ (?P<kaltura_id>
+ [_0-9a-z]+
+ )
+ )?
'''
_TESTS = [{
- # URL with 'segment'
- 'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom',
+ 'url': 'https://www.telezueri.ch/sonntalk/bundesrats-vakanzen-eu-rahmenabkommen-133214569',
'info_dict': {
- 'id': '1_2444peh4',
+ 'id': '1_anruz3wy',
'ext': 'mp4',
- 'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom',
- 'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8',
- 'uploader_id': 'TeleZ?ri',
- 'upload_date': '20161218',
- 'timestamp': 1482084490,
+ 'title': 'Bundesrats-Vakanzen / EU-Rahmenabkommen',
+ 'description': 'md5:dd9f96751ec9c35e409a698a328402f3',
+ 'uploader_id': 'TVOnline',
+ 'upload_date': '20180930',
+ 'timestamp': 1538328802,
},
'params': {
'skip_download': True,
},
}, {
- # URL with 'segment' and fragment:
- 'url': 'http://www.telebaern.tv/118-show-news/14240-episode-dienstag-17-januar-2017/33666-segment-achtung-gefahr#zu-wenig-pflegerinnen-und-pfleger',
- 'only_matching': True
- }, {
- # URL with 'episode' and fragment:
- 'url': 'http://www.telem1.ch/47-show-sonntalk/13986-episode-soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz#soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz',
- 'only_matching': True
- }, {
- # URL with 'show' and fragment:
- 'url': 'http://www.telezueri.ch/66-show-sonntalk#burka-plakate-trump-putin-china-besuch',
+ 'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id)
-
- partner_id = self._search_regex(
- r'<script[^>]+src=["\'](?:https?:)?//(?:[^/]+\.)?kaltura\.com(?:/[^/]+)*/(?:p|partner_id)/([0-9]+)',
- webpage, 'kaltura partner id')
- entry_id = self._html_search_regex(
- r'<a[^>]+data-id=(["\'])(?P<id>(?:(?!\1).)+)\1[^>]+data-slug=["\']%s'
- % re.escape(video_id), webpage, 'kaltura entry id', group='id')
-
- return self._kaltura_video(partner_id, entry_id)
-
-
-class AZMedienPlaylistIE(AZMedienBaseIE):
- IE_DESC = 'AZ Medien playlists'
- _VALID_URL = r'''(?x)
- https?://
- (?:www\.)?
- (?:
- telezueri\.ch|
- telebaern\.tv|
- telem1\.ch
- )/
- (?P<id>[0-9]+-
- (?:
- show|
- topic|
- themen
- )-[^/\#]+
- (?:
- /[0-9]+-episode-[^/\#]+
- )?
- )$
- '''
-
- _TESTS = [{
- # URL with 'episode'
- 'url': 'http://www.telebaern.tv/118-show-news/13735-episode-donnerstag-15-dezember-2016',
- 'info_dict': {
- 'id': '118-show-news/13735-episode-donnerstag-15-dezember-2016',
- 'title': 'News - Donnerstag, 15. Dezember 2016',
- },
- 'playlist_count': 9,
- }, {
- # URL with 'themen'
- 'url': 'http://www.telem1.ch/258-themen-tele-m1-classics',
- 'info_dict': {
- 'id': '258-themen-tele-m1-classics',
- 'title': 'Tele M1 Classics',
- },
- 'playlist_mincount': 15,
- }, {
- # URL with 'topic', contains nested playlists
- 'url': 'http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen',
- 'only_matching': True,
- }, {
- # URL with 'show' only
- 'url': 'http://www.telezueri.ch/86-show-talktaeglich',
- 'only_matching': True
- }]
+ _PARTNER_ID = '1719221'
def _real_extract(self, url):
- show_id = self._match_id(url)
- webpage = self._download_webpage(url, show_id)
-
- entries = []
-
- partner_id = self._search_regex(
- r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
- webpage, 'kaltura partner id', default=None)
-
- if partner_id:
- entries = [
- self._kaltura_video(partner_id, m.group('id'))
- for m in re.finditer(
- r'data-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage)]
-
- if not entries:
- entries = [
- self.url_result(m.group('url'), ie=AZMedienIE.ie_key())
- for m in re.finditer(
- r'<a[^>]+data-real=(["\'])(?P<url>http.+?)\1', webpage)]
-
- if not entries:
- entries = [
- # May contain nested playlists (e.g. [1]) thus no explicit
- # ie_key
- # 1. http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen)
- self.url_result(urljoin(url, m.group('url')))
- for m in re.finditer(
- r'<a[^>]+name=[^>]+href=(["\'])(?P<url>/.+?)\1', webpage)]
-
- title = self._search_regex(
- r'episodeShareTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
- webpage, 'title',
- default=strip_or_none(get_element_by_id(
- 'video-title', webpage)), group='title')
-
- return self.playlist_result(entries, show_id, title)
-
-
-class AZMedienShowPlaylistIE(AZMedienBaseIE):
- IE_DESC = 'AZ Medien show playlists'
- _VALID_URL = r'''(?x)
- https?://
- (?:www\.)?
- (?:
- telezueri\.ch|
- telebaern\.tv|
- telem1\.ch
- )/
- (?:
- all-episodes|
- alle-episoden
- )/
- (?P<id>[^/?#&]+)
- '''
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ entry_id = mobj.group('kaltura_id')
+
+ if not entry_id:
+ webpage = self._download_webpage(url, video_id)
+ api_path = self._search_regex(
+ r'["\']apiPath["\']\s*:\s*["\']([^"^\']+)["\']',
+ webpage, 'api path')
+ api_url = 'https://www.%s%s' % (mobj.group('host'), api_path)
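+ # Resolve the article's Kaltura entry id via the site's GraphQL API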
+ payload = {
+ 'query': '''query VideoContext($articleId: ID!) {
+ article: node(id: $articleId) {
+ ... on Article {
+ mainAssetRelation {
+ asset {
+ ... on VideoAsset {
+ kalturaId
+ }
+ }
+ }
+ }
+ }
+ }''',
+ 'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
+ }
+ json_data = self._download_json(
+ api_url, video_id, headers={
+ 'Content-Type': 'application/json',
+ },
+ data=json.dumps(payload).encode())
+ entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
- _TEST = {
- 'url': 'http://www.telezueri.ch/all-episodes/astrotalk',
- 'info_dict': {
- 'id': 'astrotalk',
- 'title': 'TeleZüri: AstroTalk - alle episoden',
- 'description': 'md5:4c0f7e7d741d906004266e295ceb4a26',
- },
- 'playlist_mincount': 13,
- }
-
- def _real_extract(self, url):
- playlist_id = self._match_id(url)
- webpage = self._download_webpage(url, playlist_id)
- episodes = get_element_by_class('search-mobile-box', webpage)
- entries = [self.url_result(
- urljoin(url, m.group('url'))) for m in re.finditer(
- r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1', episodes)]
- title = self._og_search_title(webpage, fatal=False)
- description = self._og_search_description(webpage)
- return self.playlist_result(entries, playlist_id, title, description)
+ return self.url_result(
+ 'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
+ ie=KalturaIE.ie_key(), video_id=entry_id)
# coding: utf-8
from __future__ import unicode_literals
-import re
+import base64
import json
+import re
+import struct
from .common import InfoExtractor
from .adobepass import AdobePassIE
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
+ def _brightcove_new_url_result(self, publisher_id, video_id):
+ brightcove_new_url = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s' % (publisher_id, video_id)
+ return self.url_result(brightcove_new_url, BrightcoveNewIE.ie_key(), video_id)
+
def _get_video_info(self, video_id, query, referer=None):
headers = {}
linkBase = query.get('linkBaseURL')
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
'error message', default=None)
if error_msg is not None:
+ publisher_id = query.get('publisherId')
+ if publisher_id and publisher_id[0].isdigit():
+ publisher_id = publisher_id[0]
+ if not publisher_id:
+ player_key = query.get('playerKey')
+ if player_key and ',' in player_key[0]:
+ player_key = player_key[0]
+ else:
+ player_id = query.get('playerID')
+ if player_id and player_id[0].isdigit():
+ player_page = self._download_webpage(
+ 'http://link.brightcove.com/services/player/bcpid' + player_id[0],
+ video_id, headers=headers, fatal=False)
+ if player_page:
+ player_key = self._search_regex(
+ r'<param\s+name="playerKey"\s+value="([\w~,-]+)"',
+ player_page, 'player key', fatal=False)
+ if player_key:
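+ # The playerKey's second comma-separated field encodes the publisher id
+ # as a URL-safe base64 (with '~' standing in for '=') big-endian
+ # 64-bit integer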
+ enc_pub_id = player_key.split(',')[1].replace('~', '=')
+ publisher_id = struct.unpack('>Q', base64.urlsafe_b64decode(enc_pub_id))[0]
+ if publisher_id:
+ return self._brightcove_new_url_result(publisher_id, video_id)
raise ExtractorError(
'brightcove said: %s' % error_msg, expected=True)
else:
return ad_info
- if 'url' not in info and not info.get('formats'):
- raise ExtractorError('Unable to extract video url for %s' % video_id)
+ if not info.get('url') and not info.get('formats'):
+ uploader_id = info.get('uploader_id')
+ if uploader_id:
+ info.update(self._brightcove_new_url_result(uploader_id, video_id))
+ else:
+ raise ExtractorError('Unable to extract video url for %s' % video_id)
return info
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..utils import int_or_none
-
-
-_translation_table = {
- 'a': 'h', 'd': 'e', 'e': 'v', 'f': 'o', 'g': 'f', 'i': 'd', 'l': 'n',
- 'm': 'a', 'n': 'm', 'p': 'u', 'q': 't', 'r': 's', 'v': 'p', 'x': 'r',
- 'y': 'l', 'z': 'i',
- '$': ':', '&': '.', '(': '=', '^': '&', '=': '/',
-}
-
-
-def _decode(s):
- return ''.join(_translation_table.get(c, c) for c in s)
+from ..utils import (
+ int_or_none,
+ url_or_none,
+)
class CliphunterIE(InfoExtractor):
formats = []
for format_id, f in gexo_files.items():
- video_url = f.get('url')
+ video_url = url_or_none(f.get('url'))
if not video_url:
continue
fmt = f.get('fmt')
height = f.get('h')
format_id = '%s_%sp' % (fmt, height) if fmt and height else format_id
formats.append({
- 'url': _decode(video_url),
+ 'url': video_url,
'format_id': format_id,
'width': int_or_none(f.get('w')),
'height': int_or_none(height),
# coding: utf-8
from __future__ import unicode_literals
+
from .common import InfoExtractor
from ..utils import smuggle_url
{'force_smil_url': True}),
'id': video_id,
}
+
+
+class CNBCVideoIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?cnbc\.com/video/(?:[^/]+/)+(?P<id>[^./?#&]+)'
+ _TEST = {
+ 'url': 'https://www.cnbc.com/video/2018/07/19/trump-i-dont-necessarily-agree-with-raising-rates.html',
+ 'info_dict': {
+ 'id': '7000031301',
+ 'ext': 'mp4',
+ 'title': "Trump: I don't necessarily agree with raising rates",
+ 'description': 'md5:878d8f0b4ebb5bb1dda3514b91b49de3',
+ 'timestamp': 1531958400,
+ 'upload_date': '20180719',
+ 'uploader': 'NBCU-CNBC',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+ video_id = self._search_regex(
+ r'content_id["\']\s*:\s*["\'](\d+)', webpage, 'video id')
+ return self.url_result(
+ 'http://video.cnbc.com/gallery/?video=%s' % video_id,
+ CNBCIE.ie_key())
update_url_query,
urljoin,
url_basename,
+ url_or_none,
xpath_element,
xpath_text,
xpath_with_ns,
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
uploader_url: Full URL to a personal webpage of the video uploader.
+ channel: Full name of the channel the video is uploaded on.
+ Note that channel fields may or may not repeat uploader
+ fields. This depends on a particular extractor.
+ channel_id: Id of the channel.
+ channel_url: Full URL to a channel webpage.
location: Physical location where the video was filmed.
subtitles: The available subtitles as a dictionary in the format
{tag: subformats}. "tag" is usually a language code, and
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status):
+ # Retain reference to error to prevent file object from
+ # being closed before it can be read. Works around the
+ # effects of <https://bugs.python.org/issue15002>
+ # introduced in Python 3.4.1.
+ err.fp._error = err
return err.fp
if errnote is False:
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
info.update({
- 'url': e.get('contentUrl'),
+ 'url': url_or_none(e.get('contentUrl')),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
- 'thumbnail': e.get('thumbnailUrl') or e.get('thumbnailURL'),
+ 'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
# However, this is not always respected, for example, [2]
# contains EXT-X-STREAM-INF tag which references AUDIO
# rendition group but does not have CODECS and despite
- # referencing audio group an audio group, it represents
- # a complete (with audio and video) format. So, for such cases
- # we will ignore references to rendition groups and treat them
+ # referencing an audio group it represents a complete
+ # (with audio and video) format. So, for such cases we will
+ # ignore references to rendition groups and treat them
# as complete formats.
if audio_group_id and codecs and f.get('vcodec') != 'none':
audio_group = groups.get(audio_group_id)
import re
import json
+import xml.etree.ElementTree as etree
import zlib
from hashlib import sha1
data['req'] = 'RpcApi' + method
data = compat_urllib_parse_urlencode(data).encode('utf-8')
return self._download_xml(
- 'http://www.crunchyroll.com/xml/',
+ 'https://www.crunchyroll.com/xml/',
video_id, note, fatal=False, data=data, headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
'Downloading subtitles for ' + sub_name, data={
'subtitle_script_id': sub_id,
})
- if sub_doc is None:
+ if not isinstance(sub_doc, etree.Element):
continue
sid = sub_doc.get('id')
iv = xpath_text(sub_doc, 'iv', 'subtitle iv')
webpage, 'vilos media', default='{}'), video_id)
media_metadata = media.get('metadata') or {}
+ language = self._search_regex(
+ r'(?:vilos\.config\.player\.language|LOCALE)\s*=\s*(["\'])(?P<lang>(?:(?!\1).)+)\1',
+ webpage, 'language', default=None, group='lang')
+
video_title = self._html_search_regex(
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title')
formats = []
for stream in media.get('streams', []):
- formats.extend(self._extract_vrv_formats(
+ audio_lang = stream.get('audio_lang')
+ hardsub_lang = stream.get('hardsub_lang')
+ vrv_formats = self._extract_vrv_formats(
stream.get('url'), video_id, stream.get('format'),
- stream.get('audio_lang'), stream.get('hardsub_lang')))
+ audio_lang, hardsub_lang)
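+ # Prefer hardsub-free formats and bump formats whose audio or
+ # hardsub language matches the page locale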
+ for f in vrv_formats:
+ if not hardsub_lang:
+ f['preference'] = 1
+ language_preference = 0
+ if audio_lang == language:
+ language_preference += 1
+ if hardsub_lang == language:
+ language_preference += 1
+ if language_preference:
+ f['language_preference'] = language_preference
+ formats.extend(vrv_formats)
if not formats:
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
'video_quality': stream_quality,
'current_page': url,
})
- if streamdata is not None:
+ if isinstance(streamdata, etree.Element):
stream_info = streamdata.find('./{default}preload/stream_info')
if stream_info is not None:
stream_infos.append(stream_info)
'video_format': stream_format,
'video_encode_quality': stream_quality,
})
- if stream_info is not None:
+ if isinstance(stream_info, etree.Element):
stream_infos.append(stream_info)
for stream_info in stream_infos:
video_encode_id = xpath_text(stream_info, './video_encode_id')
'ext': 'flv',
})
formats.append(format_info)
- self._sort_formats(formats, ('height', 'width', 'tbr', 'fps'))
+ self._sort_formats(formats, ('preference', 'language_preference', 'height', 'width', 'tbr', 'fps'))
metadata = self._call_rpc_api(
'VideoPlayer_GetMediaMetadata', video_id,
series = self._html_search_regex(
r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
webpage, 'series', fatal=False)
- season = xpath_text(metadata, 'series_title')
- episode = xpath_text(metadata, 'episode_title') or media_metadata.get('title')
- episode_number = int_or_none(xpath_text(metadata, 'episode_number') or media_metadata.get('episode_number'))
+ season = episode = episode_number = duration = thumbnail = None
+
+ if isinstance(metadata, etree.Element):
+ season = xpath_text(metadata, 'series_title')
+ episode = xpath_text(metadata, 'episode_title')
+ episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
+ duration = float_or_none(media_metadata.get('duration'), 1000)
+ thumbnail = xpath_text(metadata, 'episode_image_url')
+
+ if not episode:
+ episode = media_metadata.get('title')
+ if not episode_number:
+ episode_number = int_or_none(media_metadata.get('episode_number'))
+ if not thumbnail:
+ thumbnail = media_metadata.get('thumbnail', {}).get('url')
season_number = int_or_none(self._search_regex(
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
'id': video_id,
'title': video_title,
'description': video_description,
- 'duration': float_or_none(media_metadata.get('duration'), 1000),
- 'thumbnail': xpath_text(metadata, 'episode_image_url') or media_metadata.get('thumbnail', {}).get('url'),
+ 'duration': duration,
+ 'thumbnail': thumbnail,
'uploader': video_uploader,
'upload_date': video_upload_date,
'series': series,
from .common import InfoExtractor
from ..utils import (
+ ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
def _real_extract(self, url):
video_id = self._match_id(url)
- video_data = self._download_json(
+ data = self._download_json(
'http://images.cwtv.com/feed/mobileapp/video-meta/apiversion_8/guid_' + video_id,
- video_id)['video']
+ video_id)
+ if data.get('result') != 'ok':
+ raise ExtractorError(data['msg'], expected=True)
+ video_data = data['video']
title = video_data['title']
mpx_url = video_data.get('mpx_url') or 'http://link.theplatform.com/s/cwtv/media/guid/2703454149/%s?formats=M3U' % video_id
'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id)
video_sources = self._download_json(sources_url, video_id)
+ body = video_sources.get('body')
+ if body:
+ video_sources = body
formats = []
for rendition in video_sources['renditions']:
parse_iso8601,
sanitized_Request,
str_to_int,
+ try_get,
unescapeHTML,
+ update_url_query,
+ url_or_none,
urlencode_postdata,
)
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
- player = self._parse_json(player_v5, video_id)
- metadata = player['metadata']
-
- if metadata.get('error', {}).get('type') == 'password_protected':
+ player = self._parse_json(player_v5, video_id, fatal=False) or {}
+ metadata = try_get(player, lambda x: x['metadata'], dict)
+ if not metadata:
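+ # The inline player config lacks metadata; fall back to the
+ # standalone metadata endpoint (templated URL when available)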
+ metadata_url = url_or_none(try_get(
+ player, lambda x: x['context']['metadata_template_url1']))
+ if metadata_url:
+ metadata_url = metadata_url.replace(':videoId', video_id)
+ else:
+ metadata_url = update_url_query(
+ 'https://www.dailymotion.com/player/metadata/video/%s'
+ % video_id, {
+ 'embedder': url,
+ 'integration': 'inline',
+ 'GK_PV5_NEON': '1',
+ })
+ metadata = self._download_json(
+ metadata_url, video_id, 'Downloading metadata JSON')
+
+ if try_get(metadata, lambda x: x['error']['type']) == 'password_protected':
password = self._downloader.params.get('videopassword')
if password:
r = int(metadata['id'][1:], 36)
encode_base_n,
ExtractorError,
int_or_none,
+ merge_dicts,
parse_duration,
str_to_int,
url_or_none,
'display_id': 'Infamous-Tiffany-Teen-Strip-Tease-Video',
'ext': 'mp4',
'title': 'Infamous Tiffany Teen Strip Tease Video',
+ 'description': 'md5:764f39abf932daafa37485eb46efa152',
+ 'timestamp': 1232520922,
+ 'upload_date': '20090121',
'duration': 1838,
'view_count': int,
'age_limit': 18,
},
}, {
# New (May 2016) URL layout
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
})
self._sort_formats(formats)
- duration = parse_duration(self._html_search_meta('duration', webpage))
+ json_ld = self._search_json_ld(webpage, display_id, default={})
+
+ duration = parse_duration(self._html_search_meta(
+ 'duration', webpage, default=None))
view_count = str_to_int(self._search_regex(
r'id="cinemaviews">\s*([0-9,]+)\s*<small>views',
webpage, 'view count', fatal=False))
- return {
+ return merge_dicts(json_ld, {
'id': video_id,
'display_id': display_id,
'title': title,
'view_count': view_count,
'formats': formats,
'age_limit': 18,
- }
+ })
AWAANLiveIE,
AWAANSeasonIE,
)
-from .azmedien import (
- AZMedienIE,
- AZMedienPlaylistIE,
- AZMedienShowPlaylistIE,
-)
+from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .clubic import ClubicIE
from .clyp import ClypIE
from .cmt import CMTIE
-from .cnbc import CNBCIE
+from .cnbc import (
+ CNBCIE,
+ CNBCVideoIE,
+)
from .cnn import (
CNNIE,
CNNBlogsIE,
from .laola1tv import (
Laola1TvEmbedIE,
Laola1TvIE,
+ EHFTVIE,
ITTFIE,
)
from .lci import LCIIE
LimelightChannelListIE,
)
from .line import LineTVIE
+from .linkedin import (
+ LinkedInLearningIE,
+ LinkedInLearningCourseIE,
+)
from .litv import LiTVIE
from .liveleak import (
LiveLeakIE,
)
from .stitcher import StitcherIE
from .sport5 import Sport5IE
-from .sportbox import SportBoxEmbedIE
+from .sportbox import SportBoxIE
from .sportdeutschland import SportDeutschlandIE
from .springboardplatform import SpringboardPlatformIE
from .sprout import SproutIE
TV2ArticleIE,
)
from .tv2hu import TV2HuIE
-from .tv3 import TV3IE
from .tv4 import TV4IE
from .tv5mondeplus import TV5MondePlusIE
from .tva import TVAIE
from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE
+from .twitcasting import TwitCastingIE
from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zattoo import (
+ BBVTVIE,
+ EinsUndEinsTVIE,
+ EWETVIE,
+ GlattvisionTVIE,
+ MNetTVIE,
+ MyVisionTVIE,
+ NetPlusIE,
+ OsnatelTVIE,
+ QuantumTVIE,
QuicklineIE,
QuicklineLiveIE,
+ SAKTVIE,
+ VTXTVIE,
+ WalyTVIE,
ZattooIE,
ZattooLiveIE,
)
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
- _VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true'
+ _VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=primary'
_TESTS = [{
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
import re
from .common import InfoExtractor
-from ..compat import compat_urlparse
+from ..compat import (
+ compat_b64decode,
+ compat_str,
+ compat_urllib_parse_unquote,
+ compat_urlparse,
+)
from ..utils import (
+ int_or_none,
parse_duration,
parse_iso8601,
+ str_or_none,
str_to_int,
+ try_get,
+ unified_timestamp,
+ url_or_none,
)
class FourTubeBaseIE(InfoExtractor):
+ _TKN_HOST = 'tkn.kodicdn.com'
+
+ def _extract_formats(self, url, video_id, media_id, sources):
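+ # Request per-resolution stream tokens from the token host; the response
+ # maps each requested height to a tokenized stream URL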
+ token_url = 'https://%s/%s/desktop/%s' % (
+ self._TKN_HOST, media_id, '+'.join(sources))
+
+ parsed_url = compat_urlparse.urlparse(url)
+ tokens = self._download_json(token_url, video_id, data=b'', headers={
+ 'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
+ 'Referer': url,
+ })
+ formats = [{
+ 'url': tokens[format]['token'],
+ 'format_id': format + 'p',
+ 'resolution': format + 'p',
+ 'quality': int(format),
+ } for format in sources]
+ self._sort_formats(formats)
+ return formats
+
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
kind, video_id, display_id = mobj.group('kind', 'id', 'display_id')
media_id = params[0]
sources = ['%s' % p for p in params[2]]
- token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format(
- media_id, '+'.join(sources))
-
- parsed_url = compat_urlparse.urlparse(url)
- tokens = self._download_json(token_url, video_id, data=b'', headers={
- 'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
- 'Referer': url,
- })
- formats = [{
- 'url': tokens[format]['token'],
- 'format_id': format + 'p',
- 'resolution': format + 'p',
- 'quality': int(format),
- } for format in sources]
- self._sort_formats(formats)
+ formats = self._extract_formats(url, video_id, media_id, sources)
return {
'id': video_id,
class PornTubeIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?porntube\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
_URL_TEMPLATE = 'https://www.porntube.com/videos/video_%s'
+ _TKN_HOST = 'tkn.porntube.com'
_TESTS = [{
'url': 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759',
'info_dict': {
'ext': 'mp4',
'title': 'Teen couple doing anal',
'uploader': 'Alexy',
- 'uploader_id': 'Alexy',
+ 'uploader_id': '91488',
'upload_date': '20150606',
'timestamp': 1433595647,
'duration': 5052,
'view_count': int,
'like_count': int,
- 'categories': list,
+ 'age_limit': 18,
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }, {
+ 'url': 'https://www.porntube.com/videos/squirting-teen-ballerina-ecg_1331406',
+ 'info_dict': {
+ 'id': '1331406',
+ 'ext': 'mp4',
+ 'title': 'Squirting Teen Ballerina on ECG',
+ 'uploader': 'Exploited College Girls',
+ 'uploader_id': '665',
+ 'channel': 'Exploited College Girls',
+ 'channel_id': '665',
+ 'upload_date': '20130920',
+ 'timestamp': 1379685485,
+ 'duration': 851,
+ 'view_count': int,
+ 'like_count': int,
'age_limit': 18,
},
'params': {
'only_matching': True,
}]
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id, display_id = mobj.group('id', 'display_id')
+
+ webpage = self._download_webpage(url, display_id)
+
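+ # INITIALSTATE holds base64 data that decodes to percent-encoded JSON;
+ # decode both layers and pick the embedded video object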
+ video = self._parse_json(
+ self._search_regex(
+ r'INITIALSTATE\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
+ webpage, 'data', group='value'), video_id,
+ transform_source=lambda x: compat_urllib_parse_unquote(
+ compat_b64decode(x).decode('utf-8')))['page']['video']
+
+ title = video['title']
+ media_id = video['mediaId']
+ sources = [compat_str(e['height'])
+ for e in video['encodings'] if e.get('height')]
+ formats = self._extract_formats(url, video_id, media_id, sources)
+
+ thumbnail = url_or_none(video.get('masterThumb'))
+ uploader = try_get(video, lambda x: x['user']['username'], compat_str)
+ uploader_id = str_or_none(try_get(
+ video, lambda x: x['user']['id'], int))
+ channel = try_get(video, lambda x: x['channel']['name'], compat_str)
+ channel_id = str_or_none(try_get(
+ video, lambda x: x['channel']['id'], int))
+ like_count = int_or_none(video.get('likes'))
+ dislike_count = int_or_none(video.get('dislikes'))
+ view_count = int_or_none(video.get('playsQty'))
+ duration = int_or_none(video.get('durationInSeconds'))
+ timestamp = unified_timestamp(video.get('publishedAt'))
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'formats': formats,
+ 'thumbnail': thumbnail,
+ 'uploader': uploader or channel,
+ 'uploader_id': uploader_id or channel_id,
+ 'channel': channel,
+ 'channel_id': channel_id,
+ 'timestamp': timestamp,
+ 'like_count': like_count,
+ 'dislike_count': dislike_count,
+ 'view_count': view_count,
+ 'duration': duration,
+ 'age_limit': 18,
+ }
+
class PornerBrosIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?pornerbros\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
from .ooyala import OoyalaIE
from .rutv import RUTVIE
from .tvc import TVCIE
-from .sportbox import SportBoxEmbedIE
+from .sportbox import SportBoxIE
from .smotri import SmotriIE
from .myvi import MyviIE
from .condenast import CondeNastIE
return self.url_result(tvc_url, 'TVC')
# Look for embedded SportBox player
- sportbox_urls = SportBoxEmbedIE._extract_urls(webpage)
+ sportbox_urls = SportBoxIE._extract_urls(webpage)
if sportbox_urls:
- return self.playlist_from_matches(sportbox_urls, video_id, video_title, ie='SportBoxEmbed')
+ return self.playlist_from_matches(sportbox_urls, video_id, video_title, ie=SportBoxIE.ie_key())
# Look for embedded XHamster player
xhamster_urls = XHamsterEmbedIE._extract_urls(webpage)
wapo_urls, video_id, video_title, ie=WashingtonPostIE.ie_key())
# Look for Mediaset embeds
- mediaset_urls = MediasetIE._extract_urls(webpage)
+ mediaset_urls = MediasetIE._extract_urls(self, webpage)
if mediaset_urls:
return self.playlist_from_matches(
mediaset_urls, video_id, video_title, ie=MediasetIE.ie_key())
# coding: utf-8
from __future__ import unicode_literals
-import re
+import hashlib
+import hmac
+import time
from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import compat_HTTPError
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
+ try_get,
)
class HotStarBaseIE(InfoExtractor):
- _GEO_COUNTRIES = ['IN']
-
- def _download_json(self, *args, **kwargs):
- response = super(HotStarBaseIE, self)._download_json(*args, **kwargs)
- if response['resultCode'] != 'OK':
- if kwargs.get('fatal'):
- raise ExtractorError(
- response['errorDescription'], expected=True)
- return None
- return response['resultObj']
-
- def _download_content_info(self, content_id):
- return self._download_json(
- 'https://account.hotstar.com/AVS/besc', content_id, query={
- 'action': 'GetAggregatedContentDetails',
- 'appVersion': '5.0.40',
- 'channel': 'PCTV',
- 'contentId': content_id,
- })['contentInfo'][0]
+ _AKAMAI_ENCRYPTION_KEY = b'\x05\xfc\x1a\x01\xca\xc9\x4b\xc4\x12\xfc\x53\x12\x07\x75\xf9\xee'
+
+ def _call_api(self, path, video_id, query_name='contentId'):
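+ # Build an Akamai-style hotstarauth token: start time, expiry and ACL,
+ # signed with HMAC-SHA256 using the static key above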
+ st = int(time.time())
+ exp = st + 6000
+ auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
+ auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
+ response = self._download_json(
+ 'https://api.hotstar.com/' + path,
+ video_id, headers={
+ 'hotstarauth': auth,
+ 'x-country-code': 'IN',
+ 'x-platform-code': 'JIO',
+ }, query={
+ query_name: video_id,
+ 'tas': 10000,
+ })
+ if response['statusCode'] != 'OK':
+ raise ExtractorError(
+ response['body']['message'], expected=True)
+ return response['body']['results']
class HotStarIE(HotStarBaseIE):
+ IE_NAME = 'hotstar'
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{
- 'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273',
+ 'url': 'https://www.hotstar.com/can-you-not-spread-rumours/1000076273',
'info_dict': {
'id': '1000076273',
'ext': 'mp4',
- 'title': 'On Air With AIB',
+ 'title': 'Can You Not Spread Rumours?',
'description': 'md5:c957d8868e9bc793ccb813691cc4c434',
- 'timestamp': 1447227000,
+ 'timestamp': 1447248600,
'upload_date': '20151111',
'duration': 381,
},
'url': 'http://www.hotstar.com/1000000515',
'only_matching': True,
}]
+ _GEO_BYPASS = False
def _real_extract(self, url):
video_id = self._match_id(url)
- video_data = self._download_content_info(video_id)
+ webpage = self._download_webpage(url, video_id)
+ app_state = self._parse_json(self._search_regex(
+ r'<script>window\.APP_STATE\s*=\s*({.+?})</script>',
+ webpage, 'app state'), video_id)
+ video_data = {}
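+ # APP_STATE maps page fragments to their initial state; pick the
+ # content entry matching this video id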
+ for v in app_state.values():
+ content = try_get(v, lambda x: x['initialState']['contentData']['content'], dict)
+ if content and content.get('contentId') == video_id:
+ video_data = content
- title = video_data['episodeTitle']
+ title = video_data['title']
- if video_data.get('encrypted') == 'Y':
+ if video_data.get('drmProtected'):
raise ExtractorError('This video is DRM protected.', expected=True)
formats = []
- for f in ('JIO',):
- format_data = self._download_json(
- 'http://getcdn.hotstar.com/AVS/besc',
- video_id, 'Downloading %s JSON metadata' % f,
- fatal=False, query={
- 'action': 'GetCDN',
- 'asJson': 'Y',
- 'channel': f,
- 'id': video_id,
- 'type': 'VOD',
- })
- if format_data:
- format_url = format_data.get('src')
- if not format_url:
- continue
- ext = determine_ext(format_url)
- if ext == 'm3u8':
- formats.extend(self._extract_m3u8_formats(
- format_url, video_id, 'mp4',
- m3u8_id='hls', fatal=False))
- elif ext == 'f4m':
- # produce broken files
- continue
- else:
- formats.append({
- 'url': format_url,
- 'width': int_or_none(format_data.get('width')),
- 'height': int_or_none(format_data.get('height')),
- })
+ format_data = self._call_api('h/v1/play', video_id)['item']
+ format_url = format_data['playbackUrl']
+ ext = determine_ext(format_url)
+ if ext == 'm3u8':
+ try:
+ formats.extend(self._extract_m3u8_formats(
+ format_url, video_id, 'mp4', m3u8_id='hls'))
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+ self.raise_geo_restricted(countries=['IN'])
+ raise
+ elif ext == 'f4m':
+ # f4m manifests produce broken files; skip them
+ pass
+ else:
+ formats.append({
+ 'url': format_url,
+ 'width': int_or_none(format_data.get('width')),
+ 'height': int_or_none(format_data.get('height')),
+ })
self._sort_formats(formats)
return {
'title': title,
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
- 'timestamp': int_or_none(video_data.get('broadcastDate')),
+ 'timestamp': int_or_none(video_data.get('broadcastDate') or video_data.get('startDate')),
'formats': formats,
+ 'channel': video_data.get('channelName'),
+ 'channel_id': video_data.get('channelId'),
+ 'series': video_data.get('showName'),
+ 'season': video_data.get('seasonName'),
+ 'season_number': int_or_none(video_data.get('seasonNo')),
+ 'season_id': video_data.get('seasonId'),
'episode': title,
- 'episode_number': int_or_none(video_data.get('episodeNumber')),
- 'series': video_data.get('contentTitle'),
+ 'episode_number': int_or_none(video_data.get('episodeNo')),
}
class HotStarPlaylistIE(HotStarBaseIE):
IE_NAME = 'hotstar:playlist'
- _VALID_URL = r'(?P<url>https?://(?:www\.)?hotstar\.com/tv/[^/]+/(?P<content_id>\d+))/(?P<type>[^/]+)/(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:www\.)?hotstar\.com/tv/[^/]+/s-\w+/list/[^/]+/t-(?P<id>\w+)'
_TESTS = [{
- 'url': 'http://www.hotstar.com/tv/pratidaan/14982/episodes/14812/9993',
+ 'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/popular-clips/t-3_2_26',
'info_dict': {
- 'id': '14812',
+ 'id': '3_2_26',
},
- 'playlist_mincount': 75,
+ 'playlist_mincount': 20,
}, {
- 'url': 'http://www.hotstar.com/tv/pratidaan/14982/popular-clips/9998/9998',
+ 'url': 'https://www.hotstar.com/tv/savdhaan-india/s-26/list/extras/t-2480',
'only_matching': True,
}]
- _ITEM_TYPES = {
- 'episodes': 'EPISODE',
- 'popular-clips': 'CLIPS',
- }
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- base_url = mobj.group('url')
- content_id = mobj.group('content_id')
- playlist_type = mobj.group('type')
-
- content_info = self._download_content_info(content_id)
- playlist_id = compat_str(content_info['categoryId'])
-
- collection = self._download_json(
- 'https://search.hotstar.com/AVS/besc', playlist_id, query={
- 'action': 'SearchContents',
- 'appVersion': '5.0.40',
- 'channel': 'PCTV',
- 'moreFilters': 'series:%s;' % playlist_id,
- 'query': '*',
- 'searchOrder': 'last_broadcast_date desc,year desc,title asc',
- 'type': self._ITEM_TYPES.get(playlist_type, 'EPISODE'),
- })
+ playlist_id = self._match_id(url)
+
+ collection = self._call_api('o/v1/tray/find', playlist_id, 'uqId')
entries = [
self.url_result(
- '%s/_/%s' % (base_url, video['contentId']),
+ 'https://www.hotstar.com/%s' % video['contentId'],
ie=HotStarIE.ie_key(), video_id=video['contentId'])
- for video in collection['response']['docs']
+ for video in collection['assets']['items']
if video.get('contentId')]
return self.playlist_result(entries, playlist_id)
class IviIE(InfoExtractor):
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
- _VALID_URL = r'https?://(?:www\.)?ivi\.ru/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:www\.)?ivi\.(?:ru|tv)/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'Only works from Russia',
- }
+ },
+ {
+ 'url': 'https://www.ivi.tv/watch/33560/',
+ 'only_matching': True,
+ },
]
# Sorted by quality
class JamendoIE(JamendoBaseIE):
- _VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)'
- _TEST = {
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:
+ licensing\.jamendo\.com/[^/]+|
+ (?:www\.)?jamendo\.com
+ )
+ /track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)
+ '''
+ _TESTS = [{
'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
'md5': '6e9e82ed6db98678f171c25a8ed09ffd',
'info_dict': {
'duration': 210,
'thumbnail': r're:^https?://.*\.jpg'
}
- }
+ }, {
+ 'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
mobj = self._VALID_URL_RE.match(url)
track_id = mobj.group('id')
display_id = mobj.group('display_id')
- webpage = self._download_webpage(url, display_id)
+ webpage = self._download_webpage(
+ 'https://www.jamendo.com/track/%s/%s' % (track_id, display_id),
+ display_id)
title, artist, track = self._extract_meta(webpage)
from __future__ import unicode_literals
import json
+import re
from .common import InfoExtractor
from ..utils import (
def _extract_token_url(self, stream_access_url, video_id, data):
return self._download_json(
- stream_access_url, video_id, headers={
+ self._proto_relative_url(stream_access_url, 'https:'), video_id,
+ headers={
'Content-Type': 'application/json',
}, data=json.dumps(data).encode())['data']['stream-access'][0]
}
-class Laola1TvIE(Laola1TvEmbedIE):
+class Laola1TvBaseIE(Laola1TvEmbedIE):
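+ # Shared extraction for laola1.tv and ehftv.com, which use the same player config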
+ def _extract_video(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+
+ if 'Dieser Livestream ist bereits beendet.' in webpage:
+ raise ExtractorError('This live stream has already finished.', expected=True)
+
+ conf = self._parse_json(self._search_regex(
+ r'(?s)conf\s*=\s*({.+?});', webpage, 'conf'),
+ display_id,
+ transform_source=lambda s: js_to_json(re.sub(r'shareurl:.+,', '', s)))
+ video_id = conf['videoid']
+
+ config = self._download_json(conf['configUrl'], video_id, query={
+ 'videoid': video_id,
+ 'partnerid': conf['partnerid'],
+ 'language': conf.get('language', ''),
+ 'portal': conf.get('portalid', ''),
+ })
+ error = config.get('error')
+ if error:
+ raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
+
+ video_data = config['video']
+ title = video_data['title']
+ is_live = video_data.get('isLivestream') and video_data.get('isLive')
+ meta = video_data.get('metaInformation') or {}
+ sports = meta.get('sports')
+ categories = sports.split(',') if sports else []
+
+ token_url = self._extract_token_url(
+ video_data['streamAccess'], video_id,
+ video_data['abo']['required'])
+
+ formats = self._extract_formats(token_url, video_id)
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': self._live_title(title) if is_live else title,
+ 'description': video_data.get('description'),
+ 'thumbnail': video_data.get('image'),
+ 'categories': categories,
+ 'formats': formats,
+ 'is_live': is_live,
+ }
+
+
+class Laola1TvIE(Laola1TvBaseIE):
IE_NAME = 'laola1tv'
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/[a-z]+-[a-z]+/[^/]+/(?P<id>[^/?#&]+)'
+
_TESTS = [{
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
'info_dict': {
}]
def _real_extract(self, url):
- display_id = self._match_id(url)
+ return self._extract_video(url)
- webpage = self._download_webpage(url, display_id)
- if 'Dieser Livestream ist bereits beendet.' in webpage:
- raise ExtractorError('This live stream has already finished.', expected=True)
-
- conf = self._parse_json(self._search_regex(
- r'(?s)conf\s*=\s*({.+?});', webpage, 'conf'),
- display_id, js_to_json)
-
- video_id = conf['videoid']
-
- config = self._download_json(conf['configUrl'], video_id, query={
- 'videoid': video_id,
- 'partnerid': conf['partnerid'],
- 'language': conf.get('language', ''),
- 'portal': conf.get('portalid', ''),
- })
- error = config.get('error')
- if error:
- raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
-
- video_data = config['video']
- title = video_data['title']
- is_live = video_data.get('isLivestream') and video_data.get('isLive')
- meta = video_data.get('metaInformation')
- sports = meta.get('sports')
- categories = sports.split(',') if sports else []
-
- token_url = self._extract_token_url(
- video_data['streamAccess'], video_id,
- video_data['abo']['required'])
+class EHFTVIE(Laola1TvBaseIE):
+ IE_NAME = 'ehftv'
+ _VALID_URL = r'https?://(?:www\.)?ehftv\.com/[a-z]+(?:-[a-z]+)?/[^/]+/(?P<id>[^/?#&]+)'
- formats = self._extract_formats(token_url, video_id)
+ _TESTS = [{
+ 'url': 'https://www.ehftv.com/int/video/paris-saint-germain-handball-pge-vive-kielce/1166761',
+ 'info_dict': {
+ 'id': '1166761',
+ 'display_id': 'paris-saint-germain-handball-pge-vive-kielce',
+ 'ext': 'mp4',
+ 'title': 'Paris Saint-Germain Handball - PGE Vive Kielce',
+ 'is_live': False,
+ 'categories': ['Handball'],
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }]
- return {
- 'id': video_id,
- 'display_id': display_id,
- 'title': self._live_title(title) if is_live else title,
- 'description': video_data.get('description'),
- 'thumbnail': video_data.get('image'),
- 'categories': categories,
- 'formats': formats,
- 'is_live': is_live,
- }
+ def _real_extract(self, url):
+ return self._extract_video(url)
class ITTFIE(InfoExtractor):
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+ float_or_none,
+ int_or_none,
+ urlencode_postdata,
+)
+
+
+class LinkedInLearningBaseIE(InfoExtractor):
+ _NETRC_MACHINE = 'linkedin'
+
+ def _call_api(self, course_slug, fields, video_slug=None, resolution=None):
+ query = {
+ 'courseSlug': course_slug,
+ 'fields': fields,
+ 'q': 'slugs',
+ }
+ sub = ''
+ if video_slug:
+ query.update({
+ 'videoSlug': video_slug,
+ 'resolution': '_%s' % resolution,
+ })
+ sub = ' %dp' % resolution
+ api_url = 'https://www.linkedin.com/learning-api/detailedCourses'
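+ # The JSESSIONID cookie value doubles as the Csrf-Token header for this API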
+ return self._download_json(
+ api_url, video_slug, 'Downloading%s JSON metadata' % sub, headers={
+ 'Csrf-Token': self._get_cookies(api_url)['JSESSIONID'].value,
+ }, query=query)['elements'][0]
+
+ def _get_video_id(self, urn, course_slug, video_slug):
+ if urn:
+ mobj = re.search(r'urn:li:lyndaCourse:\d+,(\d+)', urn)
+ if mobj:
+ return mobj.group(1)
+ return '%s/%s' % (course_slug, video_slug)
+
+ def _real_initialize(self):
+ email, password = self._get_login_info()
+ if email is None:
+ return
+
+ login_page = self._download_webpage(
+ 'https://www.linkedin.com/uas/login?trk=learning',
+ None, 'Downloading login page')
+ action_url = self._search_regex(
+ r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page, 'post url',
+ default='https://www.linkedin.com/uas/login-submit', group='url')
+ data = self._hidden_inputs(login_page)
+ data.update({
+ 'session_key': email,
+ 'session_password': password,
+ })
+ login_submit_page = self._download_webpage(
+ action_url, None, 'Logging in',
+ data=urlencode_postdata(data))
+ error = self._search_regex(
+ r'<span[^>]+class="error"[^>]*>\s*(.+?)\s*</span>',
+ login_submit_page, 'error', default=None)
+ if error:
+ raise ExtractorError(error, expected=True)
+
+
+class LinkedInLearningIE(LinkedInLearningBaseIE):
+ IE_NAME = 'linkedin:learning'
+ _VALID_URL = r'https?://(?:www\.)?linkedin\.com/learning/(?P<course_slug>[^/]+)/(?P<id>[^/?#]+)'
+ _TEST = {
+ 'url': 'https://www.linkedin.com/learning/programming-foundations-fundamentals/welcome?autoplay=true',
+ 'md5': 'a1d74422ff0d5e66a792deb996693167',
+ 'info_dict': {
+ 'id': '90426',
+ 'ext': 'mp4',
+ 'title': 'Welcome',
+ 'timestamp': 1430396150.82,
+ 'upload_date': '20150430',
+ },
+ }
+
+ def _real_extract(self, url):
+ course_slug, video_slug = re.match(self._VALID_URL, url).groups()
+
+ video_data = None
+ formats = []
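+ # Each resolution has to be requested separately; every call returns a single selectedVideo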
+ for width, height in ((640, 360), (960, 540), (1280, 720)):
+ video_data = self._call_api(
+ course_slug, 'selectedVideo', video_slug, height)['selectedVideo']
+
+ video_url_data = video_data.get('url') or {}
+ progressive_url = video_url_data.get('progressiveUrl')
+ if progressive_url:
+ formats.append({
+ 'format_id': 'progressive-%dp' % height,
+ 'url': progressive_url,
+ 'height': height,
+ 'width': width,
+ 'source_preference': 1,
+ })
+
+ title = video_data['title']
+
+ audio_url = video_data.get('audio', {}).get('progressiveUrl')
+ if audio_url:
+ formats.append({
+ 'abr': 64,
+ 'ext': 'm4a',
+ 'format_id': 'audio',
+ 'url': audio_url,
+ 'vcodec': 'none',
+ })
+
+ streaming_url = video_url_data.get('streamingUrl')
+ if streaming_url:
+ formats.extend(self._extract_m3u8_formats(
+ streaming_url, video_slug, 'mp4',
+ 'm3u8_native', m3u8_id='hls', fatal=False))
+
+ self._sort_formats(formats, ('width', 'height', 'source_preference', 'tbr', 'abr'))
+
+ return {
+ 'id': self._get_video_id(video_data.get('urn'), course_slug, video_slug),
+ 'title': title,
+ 'formats': formats,
+ 'thumbnail': video_data.get('defaultThumbnail'),
+ 'timestamp': float_or_none(video_data.get('publishedOn'), 1000),
+ 'duration': int_or_none(video_data.get('durationInSeconds')),
+ }
+
+
+class LinkedInLearningCourseIE(LinkedInLearningBaseIE):
+ IE_NAME = 'linkedin:learning:course'
+ _VALID_URL = r'https?://(?:www\.)?linkedin\.com/learning/(?P<id>[^/?#]+)'
+ _TEST = {
+ 'url': 'https://www.linkedin.com/learning/programming-foundations-fundamentals',
+ 'info_dict': {
+ 'id': 'programming-foundations-fundamentals',
+ 'title': 'Programming Foundations: Fundamentals',
+ 'description': 'md5:76e580b017694eb89dc8e8923fff5c86',
+ },
+ 'playlist_mincount': 61,
+ }
+
+ @classmethod
+ def suitable(cls, url):
+ return False if LinkedInLearningIE.suitable(url) else super(LinkedInLearningCourseIE, cls).suitable(url)
+
+ def _real_extract(self, url):
+ course_slug = self._match_id(url)
+ course_data = self._call_api(course_slug, 'chapters,description,title')
+
+ entries = []
+ for chapter in course_data.get('chapters', []):
+ chapter_title = chapter.get('title')
+ for video in chapter.get('videos', []):
+ video_slug = video.get('slug')
+ if not video_slug:
+ continue
+ entries.append({
+ '_type': 'url_transparent',
+ 'id': self._get_video_id(video.get('urn'), course_slug, video_slug),
+ 'title': video.get('title'),
+ 'url': 'https://www.linkedin.com/learning/%s/%s' % (course_slug, video_slug),
+ 'chapter': chapter_title,
+ 'ie_key': LinkedInLearningIE.ie_key(),
+ })
+
+ return self.playlist_result(
+ entries, course_slug,
+ course_data.get('title'),
+ course_data.get('description'))
import re
from .theplatform import ThePlatformBaseIE
+from ..compat import (
+ compat_parse_qs,
+ compat_str,
+ compat_urllib_parse_urlparse,
+)
from ..utils import (
ExtractorError,
int_or_none,
}]
@staticmethod
- def _extract_urls(webpage):
- return [
- mobj.group('url')
- for mobj in re.finditer(
- r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>https?://(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml\?.*?\bid=\d+.*?)\1',
- webpage)]
+ def _extract_urls(ie, webpage):
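+ # Needs an extractor instance (ie) to resolve embeds whose URL lacks a programGuid by following the redirect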
+ def _qs(url):
+ return compat_parse_qs(compat_urllib_parse_urlparse(url).query)
+
+ def _program_guid(qs):
+ return qs.get('programGuid', [None])[0]
+
+ entries = []
+ for mobj in re.finditer(
+ r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml.*?)\1',
+ webpage):
+ embed_url = mobj.group('url')
+ embed_qs = _qs(embed_url)
+ program_guid = _program_guid(embed_qs)
+ if program_guid:
+ entries.append(embed_url)
+ continue
+ video_id = embed_qs.get('id', [None])[0]
+ if not video_id:
+ continue
+ urlh = ie._request_webpage(
+ embed_url, video_id, note='Following embed URL redirect')
+ embed_url = compat_str(urlh.geturl())
+ program_guid = _program_guid(_qs(embed_url))
+ if program_guid:
+ entries.append(embed_url)
+ return entries
def _real_extract(self, url):
guid = self._match_id(url)
'skip': 'Requires login',
}
+ _LOGIN_URL = 'https://front.njpwworld.com/auth/login'
+
def _real_initialize(self):
self._login()
if not username:
return True
+ # Set up session (will set necessary cookies)
+ self._request_webpage(
+ 'https://njpwworld.com/', None, note='Setting up session')
+
webpage, urlh = self._download_webpage_handle(
- 'https://njpwworld.com/auth/login', None,
+ self._LOGIN_URL, None,
note='Logging in', errnote='Unable to login',
data=urlencode_postdata({'login_id': username, 'pw': password}),
- headers={'Referer': 'https://njpwworld.com/auth'})
+ headers={'Referer': 'https://front.njpwworld.com/auth'})
# /auth/login will return 302 for successful logins
- if urlh.geturl() == 'https://njpwworld.com/auth/login':
+ if urlh.geturl() == self._LOGIN_URL:
self.report_warning('unable to login')
return False
class OpenloadIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.(?:tv|stream|site|xyz|win|download))/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
+ _VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun))/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
_TESTS = [{
'url': 'https://openload.co/f/kUEfGclsU9o',
}, {
'url': 'https://oload.download/f/kUEfGclsU9o',
'only_matching': True,
+ }, {
+ 'url': 'https://oload.cloud/f/4ZDnBXRWiB8',
+ 'only_matching': True,
}, {
# Its title has not got its extension but url has it
'url': 'https://oload.download/f/N4Otkw39VCw/Tomb.Raider.2018.HDRip.XviD.AC3-EVO.avi.mp4',
'only_matching': True,
+ }, {
+ 'url': 'https://oload.cc/embed/5NEAbI2BDSk',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://oload.icu/f/-_i4y_F_Hs8',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://oload.fun/f/gb6G1H4sHXY',
+ 'only_matching': True,
}]
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
strip_jsonp,
unescapeHTML,
unified_strdate,
+ url_or_none,
)
webpage, 'playlist', group='json'),
playlist_id, transform_source=unescapeHTML)['playlist']['videos']
- def quality_to_int(s):
- m = re.search('([0-9]+)', s)
- if m is None:
- return -1
- return int(m.group(1))
-
entries = []
for sd in data_jsb:
video_id, title = sd.get('id'), sd.get('title')
if not video_id or not title:
continue
video_id = compat_str(video_id)
- formats = [{
- 'preference': -10 if fd['delivery'] == 'hls' else None,
- 'format_id': '%s-%s-%s' % (
- fd['delivery'], fd['quality'], fd['quality_string']),
- 'url': fd['src'],
- 'protocol': fd['protocol'],
- 'quality': quality_to_int(fd['quality']),
- } for fd in sd['sources']]
+ formats = []
+ for fd in sd['sources']:
+ src = url_or_none(fd.get('src'))
+ if not src:
+ continue
+ format_id_list = []
+ for key in ('delivery', 'quality', 'quality_string'):
+ value = fd.get(key)
+ if value:
+ format_id_list.append(value)
+ format_id = '-'.join(format_id_list)
+ ext = determine_ext(src)
+ if ext == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ src, video_id, 'mp4', m3u8_id=format_id))
+ elif ext == 'f4m':
+ formats.extend(self._extract_f4m_formats(
+ src, video_id, f4m_id=format_id))
+ else:
+ formats.append({
+ 'format_id': format_id,
+ 'url': src,
+ 'protocol': fd.get('protocol'),
+ })
# Check for geoblocking.
# There is a property is_geoprotection, but that's always false
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..utils import js_to_json
+from ..utils import (
+ clean_html,
+ determine_ext,
+ int_or_none,
+ parse_iso8601,
+)
class PatreonIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?patreon\.com/creation\?hid=(?P<id>[^&#]+)'
- _TESTS = [
- {
- 'url': 'http://www.patreon.com/creation?hid=743933',
- 'md5': 'e25505eec1053a6e6813b8ed369875cc',
- 'info_dict': {
- 'id': '743933',
- 'ext': 'mp3',
- 'title': 'Episode 166: David Smalley of Dogma Debate',
- 'uploader': 'Cognitive Dissonance Podcast',
- 'thumbnail': 're:^https?://.*$',
- },
+ _VALID_URL = r'https?://(?:www\.)?patreon\.com/(?:creation\?hid=|posts/(?:[\w-]+-)?)(?P<id>\d+)'
+ _TESTS = [{
+ 'url': 'http://www.patreon.com/creation?hid=743933',
+ 'md5': 'e25505eec1053a6e6813b8ed369875cc',
+ 'info_dict': {
+ 'id': '743933',
+ 'ext': 'mp3',
+ 'title': 'Episode 166: David Smalley of Dogma Debate',
+ 'description': 'md5:713b08b772cd6271b9f3906683cfacdf',
+ 'uploader': 'Cognitive Dissonance Podcast',
+ 'thumbnail': 're:^https?://.*$',
+ 'timestamp': 1406473987,
+ 'upload_date': '20140727',
+ },
+ }, {
+ 'url': 'http://www.patreon.com/creation?hid=754133',
+ 'md5': '3eb09345bf44bf60451b8b0b81759d0a',
+ 'info_dict': {
+ 'id': '754133',
+ 'ext': 'mp3',
+ 'title': 'CD 167 Extra',
+ 'uploader': 'Cognitive Dissonance Podcast',
+ 'thumbnail': 're:^https?://.*$',
},
- {
- 'url': 'http://www.patreon.com/creation?hid=754133',
- 'md5': '3eb09345bf44bf60451b8b0b81759d0a',
- 'info_dict': {
- 'id': '754133',
- 'ext': 'mp3',
- 'title': 'CD 167 Extra',
- 'uploader': 'Cognitive Dissonance Podcast',
- 'thumbnail': 're:^https?://.*$',
- },
+ 'skip': 'Patron-only content',
+ }, {
+ 'url': 'https://www.patreon.com/creation?hid=1682498',
+ 'info_dict': {
+ 'id': 'SU4fj_aEMVw',
+ 'ext': 'mp4',
+ 'title': 'I\'m on Patreon!',
+ 'uploader': 'TraciJHines',
+ 'thumbnail': 're:^https?://.*$',
+ 'upload_date': '20150211',
+ 'description': 'md5:c5a706b1f687817a3de09db1eb93acd4',
+ 'uploader_id': 'TraciJHines',
},
- {
- 'url': 'https://www.patreon.com/creation?hid=1682498',
- 'info_dict': {
- 'id': 'SU4fj_aEMVw',
- 'ext': 'mp4',
- 'title': 'I\'m on Patreon!',
- 'uploader': 'TraciJHines',
- 'thumbnail': 're:^https?://.*$',
- 'upload_date': '20150211',
- 'description': 'md5:c5a706b1f687817a3de09db1eb93acd4',
- 'uploader_id': 'TraciJHines',
- },
- 'params': {
- 'noplaylist': True,
- 'skip_download': True,
- }
+ 'params': {
+ 'noplaylist': True,
+ 'skip_download': True,
}
- ]
+ }, {
+ 'url': 'https://www.patreon.com/posts/episode-166-of-743933',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.patreon.com/posts/743933',
+ 'only_matching': True,
+ }]
# Currently Patreon exposes download URL via hidden CSS, so login is not
# needed. Keeping this commented for when this inevitably changes.
def _real_extract(self, url):
video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
- title = self._og_search_title(webpage).strip()
-
- attach_fn = self._html_search_regex(
- r'<div class="attach"><a target="_blank" href="([^"]+)">',
- webpage, 'attachment URL', default=None)
- embed = self._html_search_regex(
- r'<div[^>]+id="watchCreation"[^>]*>\s*<iframe[^>]+src="([^"]+)"',
- webpage, 'embedded URL', default=None)
-
- if attach_fn is not None:
- video_url = 'http://www.patreon.com' + attach_fn
- thumbnail = self._og_search_thumbnail(webpage)
- uploader = self._html_search_regex(
- r'<strong>(.*?)</strong> is creating', webpage, 'uploader')
- elif embed is not None:
- return self.url_result(embed)
- else:
- playlist = self._parse_json(self._search_regex(
- r'(?s)new\s+jPlayerPlaylist\(\s*\{\s*[^}]*},\s*(\[.*?,?\s*\])',
- webpage, 'playlist JSON'),
- video_id, transform_source=js_to_json)
- data = playlist[0]
- video_url = self._proto_relative_url(data['mp3'])
- thumbnail = self._proto_relative_url(data.get('cover'))
- uploader = data.get('artist')
-
- return {
+ post = self._download_json(
+ 'https://www.patreon.com/api/posts/' + video_id, video_id)
+ attributes = post['data']['attributes']
+ title = attributes['title'].strip()
+ image = attributes.get('image') or {}
+ info = {
'id': video_id,
- 'url': video_url,
- 'ext': 'mp3',
'title': title,
- 'uploader': uploader,
- 'thumbnail': thumbnail,
+ 'description': clean_html(attributes.get('content')),
+ 'thumbnail': image.get('large_url') or image.get('url'),
+ 'timestamp': parse_iso8601(attributes.get('published_at')),
+ 'like_count': int_or_none(attributes.get('like_count')),
+ 'comment_count': int_or_none(attributes.get('comment_count')),
}
+
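+ # Media URL preference: attachment, then post_file, then the embedded player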
+ def add_file(file_data):
+ file_url = file_data.get('url')
+ if file_url:
+ info.update({
+ 'url': file_url,
+ 'ext': determine_ext(file_data.get('name'), 'mp3'),
+ })
+
+ for i in post.get('included', []):
+ i_type = i.get('type')
+ if i_type == 'attachment':
+ add_file(i.get('attributes') or {})
+ elif i_type == 'user':
+ user_attributes = i.get('attributes')
+ if user_attributes:
+ info.update({
+ 'uploader': user_attributes.get('full_name'),
+ 'uploader_url': user_attributes.get('url'),
+ })
+
+ if not info.get('url'):
+ add_file(attributes.get('post_file') or {})
+
+ if not info.get('url'):
+ info.update({
+ '_type': 'url',
+ 'url': attributes['embed']['url'],
+ })
+
+ return info
from __future__ import unicode_literals
from .common import InfoExtractor
+from ..compat import compat_str
from ..utils import (
- float_or_none,
- int_or_none,
- parse_iso8601,
- xpath_text,
+ try_get,
+ urljoin,
)
class PhilharmonieDeParisIE(InfoExtractor):
IE_DESC = 'Philharmonie de Paris'
- _VALID_URL = r'https?://live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)(?P<id>\d+)'
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:
+ live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)|
+ pad\.philharmoniedeparis\.fr/doc/CIMU/
+ )
+ (?P<id>\d+)
+ '''
_TESTS = [{
+ 'url': 'http://pad.philharmoniedeparis.fr/doc/CIMU/1086697/jazz-a-la-villette-knower',
+ 'md5': 'a0a4b195f544645073631cbec166a2c2',
+ 'info_dict': {
+ 'id': '1086697',
+ 'ext': 'mp4',
+ 'title': 'Jazz à la Villette : Knower',
+ },
+ }, {
'url': 'http://live.philharmoniedeparis.fr/concert/1032066.html',
'info_dict': {
'id': '1032066',
- 'ext': 'flv',
- 'title': 'md5:d1f5585d87d041d07ce9434804bc8425',
- 'timestamp': 1428179400,
- 'upload_date': '20150404',
- 'duration': 6592.278,
+ 'title': 'md5:0a031b81807b3593cffa3c9a87a167a0',
},
- 'params': {
- # rtmp download
- 'skip_download': True,
- }
+ 'playlist_mincount': 2,
}, {
'url': 'http://live.philharmoniedeparis.fr/Concert/1030324.html',
'only_matching': True,
'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr',
'only_matching': True,
}]
+ _LIVE_URL = 'https://live.philharmoniedeparis.fr'
def _real_extract(self, url):
video_id = self._match_id(url)
- concert = self._download_xml(
- 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=%s' % video_id,
- video_id).find('./concert')
+ config = self._download_json(
+ '%s/otoPlayer/config.ashx' % self._LIVE_URL, video_id, query={
+ 'id': video_id,
+ 'lang': 'fr-FR',
+ })
- formats = []
- info_dict = {
- 'id': video_id,
- 'title': xpath_text(concert, './titre', 'title', fatal=True),
- 'formats': formats,
- }
-
- fichiers = concert.find('./fichiers')
- stream = fichiers.attrib['serveurstream']
- for fichier in fichiers.findall('./fichier'):
- info_dict['duration'] = float_or_none(fichier.get('timecodefin'))
- for quality, (format_id, suffix) in enumerate([('lq', ''), ('hq', '_hd')]):
- format_url = fichier.get('url%s' % suffix)
- if not format_url:
+ def extract_entry(source):
+ if not isinstance(source, dict):
+ return
+ title = source.get('title')
+ if not title:
+ return
+ files = source.get('files')
+ if not isinstance(files, dict):
+ return
+ format_urls = set()
+ formats = []
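+ # 'mobile' and 'desktop' may point at the same manifest, so deduplicate by URL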
+ for format_id in ('mobile', 'desktop'):
+ format_url = try_get(
+ files, lambda x: x[format_id]['file'], compat_str)
+ if not format_url or format_url in format_urls:
continue
- formats.append({
- 'url': stream,
- 'play_path': format_url,
- 'ext': 'flv',
- 'format_id': format_id,
- 'width': int_or_none(concert.get('largeur%s' % suffix)),
- 'height': int_or_none(concert.get('hauteur%s' % suffix)),
- 'quality': quality,
- })
- self._sort_formats(formats)
+ format_urls.add(format_url)
+ m3u8_url = urljoin(self._LIVE_URL, format_url)
+ formats.extend(self._extract_m3u8_formats(
+ m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False))
+ if not formats:
+ return
+ self._sort_formats(formats)
+ return {
+ 'title': title,
+ 'formats': formats,
+ }
+
+ thumbnail = urljoin(self._LIVE_URL, config.get('image'))
+
+ info = extract_entry(config)
+ if info:
+ info.update({
+ 'id': video_id,
+ 'thumbnail': thumbnail,
+ })
+ return info
- date, hour = concert.get('date'), concert.get('heure')
- if date and hour:
- info_dict['timestamp'] = parse_iso8601(
- '%s-%s-%sT%s:00' % (date[0:4], date[4:6], date[6:8], hour))
- elif date:
- info_dict['upload_date'] = date
+ entries = []
+ for num, chapter in enumerate(config['chapters'], start=1):
+ entry = extract_entry(chapter)
+ # extract_entry returns None for chapters without playable files
+ if not entry:
+ continue
+ entry['id'] = '%s-%d' % (video_id, num)
+ entries.append(entry)
- return info_dict
+ return self.playlist_result(entries, video_id, config.get('title'))
import json
import os
import random
+import re
from .common import InfoExtractor
from ..compat import (
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
- if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
+ if all(not re.search(p, response) for p in (
+ r'__INITIAL_STATE__', r'["\']currentUser["\']',
+ # new layout?
+ r'>\s*Sign out\s*<')):
BLOCKED = 'Your account has been blocked due to suspicious activity'
if BLOCKED in response:
raise ExtractorError(
raise ExtractorError('Unable to log in')
- def _get_subtitles(self, author, clip_idx, lang, name, duration, video_id):
- captions_post = {
- 'a': author,
- 'cn': clip_idx,
- 'lc': lang,
- 'm': name,
- }
- captions = self._download_json(
- '%s/player/retrieve-captions' % self._API_BASE, video_id,
- 'Downloading captions JSON', 'Unable to download captions JSON',
- fatal=False, data=json.dumps(captions_post).encode('utf-8'),
- headers={'Content-Type': 'application/json;charset=utf-8'})
+ def _get_subtitles(self, author, clip_idx, clip_id, lang, name, duration, video_id):
+ captions = None
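+ # Prefer the per-clip caption endpoint; fall back to the legacy retrieve-captions POST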
+ if clip_id:
+ captions = self._download_json(
+ '%s/transcript/api/v1/caption/json/%s/%s'
+ % (self._API_BASE, clip_id, lang), video_id,
+ 'Downloading captions JSON', 'Unable to download captions JSON',
+ fatal=False)
+ if not captions:
+ captions_post = {
+ 'a': author,
+ 'cn': int(clip_idx),
+ 'lc': lang,
+ 'm': name,
+ }
+ captions = self._download_json(
+ '%s/player/retrieve-captions' % self._API_BASE, video_id,
+ 'Downloading captions JSON', 'Unable to download captions JSON',
+ fatal=False, data=json.dumps(captions_post).encode('utf-8'),
+ headers={'Content-Type': 'application/json;charset=utf-8'})
if captions:
return {
lang: [{
# TODO: other languages?
subtitles = self.extract_subtitles(
- author, clip_idx, 'en', name, duration, display_id)
+ author, clip_idx, clip.get('clipId'), 'en', name, duration, display_id)
return {
'id': clip_id,
thumbnail = self._og_search_thumbnail(webpage)
timestamp = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp'))
- print(self._html_search_meta(
- 'duration', webpage))
duration = int_or_none(self._html_search_meta(
'duration', webpage), invscale=60)
view_count = int_or_none(self._html_search_meta(
'ext': 'mp4',
'title': 'Seductive Indian beauty strips down and fingers her pink pussy',
'uploader': 'Babes',
+ 'upload_date': '20130628',
'duration': 361,
'view_count': int,
'like_count': int,
'ext': 'mp4',
'title': '重庆婷婷女王足交',
'uploader': 'Unknown',
+ 'upload_date': '20150213',
'duration': 1753,
'view_count': int,
'like_count': int,
video_urls.append((video_url, None))
video_urls_set.add(video_url)
+ upload_date = None
formats = []
for video_url, height in video_urls:
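+ # Derive the upload date from the date-like path segment of a format URL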
+ if not upload_date:
+ upload_date = self._search_regex(
+ r'/(\d{6}/\d{2})/', video_url, 'upload date', default=None)
+ if upload_date:
+ upload_date = upload_date.replace('/', '')
tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
if mobj:
return {
'id': video_id,
'uploader': video_uploader,
+ 'upload_date': upload_date,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
('programma', 'nomeProgramma'), webpage, 'title')
description = unescapeHTML(self._html_search_meta(
('description', 'og:description'), webpage, 'description'))
- print(description)
entries = []
for mobj in re.finditer(
options = self._download_json(
'http://rutube.ru/api/play/options/%s/?format=json' % video_id,
- video_id, 'Downloading options JSON')
+ video_id, 'Downloading options JSON',
+ headers=self.geo_verification_headers())
formats = []
for format_id, format_url in options['video_balancer'].items():
r'src=(.*?)(?:$|&)', video_meta,
'meta tag video URL', default=None)
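+ # Additional fallbacks: the MediaContentUrl player setting, then the og:video meta tag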
+ if video_url is None:
+ video_url = self._html_search_regex(
+ r'MediaContentUrl["\']\s*:(["\'])(?P<url>(?:(?!\1).)+)\1',
+ webpage, 'video url', default=None, group='url')
+
+ if video_url is None:
+ video_url = self._html_search_meta(
+ 'og:video', webpage, default=None)
+
if video_url is None:
raise ExtractorError('Cannot find video')
_FEED_URL = 'http://www.paramountnetwork.com/feeds/mrss/'
_GEO_COUNTRIES = ['US']
+
+ def _extract_mgid(self, webpage):
+ cs = self._parse_json(self._search_regex(
+ r'window\.__DATA__\s*=\s*({.+})',
+ webpage, 'data'), None)['children']
+ c = next(c for c in cs if c.get('type') == 'VideoPlayer')
+ return c['props']['media']['video']['config']['uri']
determine_ext,
int_or_none,
js_to_json,
+ merge_dicts,
)
-class SportBoxEmbedIE(InfoExtractor):
- _VALID_URL = r'https?://news\.sportbox\.ru/vdl/player(?:/[^/]+/|\?.*?\bn?id=)(?P<id>\d+)'
+class SportBoxIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:news\.sportbox|matchtv)\.ru/vdl/player(?:/[^/]+/|\?.*?\bn?id=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://news.sportbox.ru/vdl/player/ci/211355',
'info_dict': {
- 'id': '211355',
+ 'id': '109158',
'ext': 'mp4',
- 'title': '211355',
+ 'title': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
+ 'description': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 292,
'view_count': int,
+ 'timestamp': 1426237001,
+ 'upload_date': '20150313',
},
'params': {
# m3u8 download
}, {
'url': 'https://news.sportbox.ru/vdl/player/media/193095',
'only_matching': True,
+ }, {
+ 'url': 'https://news.sportbox.ru/vdl/player/media/109158',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://matchtv.ru/vdl/player/media/109158',
+ 'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
- r'<iframe[^>]+src="(https?://news\.sportbox\.ru/vdl/player[^"]+)"',
+ r'<iframe[^>]+src="(https?://(?:news\.sportbox|matchtv)\.ru/vdl/player[^"]+)"',
webpage)
def _real_extract(self, url):
webpage = self._download_webpage(url, video_id)
- wjplayer_data = self._parse_json(
+ sources = self._parse_json(
self._search_regex(
- r'(?s)wjplayer\(({.+?})\);', webpage, 'wjplayer settings'),
+ r'(?s)playerOptions\.sources(?:WithRes)?\s*=\s*(\[.+?\])\s*;\s*\n',
+ webpage, 'sources'),
video_id, transform_source=js_to_json)
formats = []
- for source in wjplayer_data['sources']:
+ for source in sources:
src = source.get('src')
if not src:
continue
})
self._sort_formats(formats)
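+ # playerOptions also carries the canonical media id, poster and duration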
+ player = self._parse_json(
+ self._search_regex(
+ r'(?s)playerOptions\s*=\s*({.+?})\s*;\s*\n', webpage,
+ 'player options', default='{}'),
+ video_id, transform_source=js_to_json)
+ media_id = player['mediaId']
+
+ info = self._search_json_ld(webpage, media_id, default={})
+
view_count = int_or_none(self._search_regex(
r'Просмотров\s*:\s*(\d+)', webpage, 'view count', default=None))
- return {
- 'id': video_id,
- 'title': video_id,
- 'thumbnail': wjplayer_data.get('poster'),
- 'duration': int_or_none(wjplayer_data.get('duration')),
+ return merge_dicts(info, {
+ 'id': media_id,
+ 'title': self._og_search_title(webpage, default=None) or media_id,
+ 'thumbnail': player.get('poster'),
+ 'duration': int_or_none(player.get('duration')),
'view_count': view_count,
'formats': formats,
- }
+ })
http_url = None
for format_id, resources in resources_.items():
- if not isinstance(resources, dict):
- continue
if format_id == 'h264':
for resource in resources:
h264_url = resource.get('file')
'tbr': int_or_none(resource.get('bitrate')),
})
elif format_id == 'hls':
+ if not isinstance(resources, dict):
+ continue
stream_url = url_or_none(resources.get('stream'))
if not stream_url:
continue
smil_url, video_id, note=note, query={'format': 'SMIL'},
headers=self.geo_verification_headers())
error_element = find_xpath_attr(meta, _x('.//smil:ref'), 'src')
- if error_element is not None and error_element.attrib['src'].startswith(
- 'http://link.theplatform.%s/s/errorFiles/Unavailable.' % self._TP_TLD):
- raise ExtractorError(error_element.attrib['abstract'], expected=True)
+ if error_element is not None:
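+ # An 'exception' param on the error ref distinguishes geo blocking from other failures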
+ exception = find_xpath_attr(
+ error_element, _x('.//smil:param'), 'name', 'exception')
+ if exception is not None:
+ if exception.get('value') == 'GeoLocationBlocked':
+ self.raise_geo_restricted(error_element.attrib['abstract'])
+ elif error_element.attrib['src'].startswith(
+ 'http://link.theplatform.%s/s/errorFiles/Unavailable.'
+ % self._TP_TLD):
+ raise ExtractorError(
+ error_element.attrib['abstract'], expected=True)
smil_formats = self._parse_smil_formats(
meta, smil_url, video_id, namespace=default_ns,
r'videoTitle\s*=\s*"([^"]+)', webpage, 'title')
description = self._html_search_regex(
- r'>Description:</strong>\s*(.+?)\s*<', webpage, 'description', fatal=False)
+ r'(?s)Description:</dt>\s*<dd>(.+?)</dd>', webpage, 'description', fatal=False)
uploader = self._html_search_regex(
r'<span class="username">\s*(.+?)\s*<',
webpage, 'uploader', fatal=False)
dislike_count = int_or_none(self._search_regex(
r'rdownVar\s*=\s*"(\d+)"', webpage, 'dislike count', fatal=False))
view_count = str_to_int(self._search_regex(
- r'<strong>Views: </strong>([\d,\.]+)\s*</li>',
+ r'Views:\s*</dt>\s*<dd>([\d,\.]+)',
webpage, 'view count', fatal=False))
comment_count = str_to_int(self._search_regex(
r'<span id="allCommentsCount">(\d+)</span>',
webpage, 'comment count', fatal=False))
category = self._search_regex(
- r'Category:\s*</strong>\s*<a[^>]+href=[^>]+>([^<]+)',
+ r'Category:\s*</dt>\s*<dd>\s*<a[^>]+href=[^>]+>([^<]+)',
webpage, 'category', fatal=False)
categories = [category] if category else None
tags_str = self._search_regex(
- r'(?s)Tags:\s*</strong>(.+?)</(?!a)',
+ r'(?s)Tags:\s*</dt>\s*<dd>(.+?)</(?!a)',
webpage, 'tags', fatal=False)
tags = [t for t in re.findall(
r'<a[^>]+href=[^>]+>([^<]+)', tags_str)] if tags_str else None
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-
-class TV3IE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?tv3\.co\.nz/(?P<id>[^/]+)/tabid/\d+/articleID/\d+/MCat/\d+/Default\.aspx'
- _TEST = {
- 'url': 'http://www.tv3.co.nz/MOTORSPORT-SRS-SsangYong-Hampton-Downs-Round-3/tabid/3692/articleID/121615/MCat/2915/Default.aspx',
- 'info_dict': {
- 'id': '4659127992001',
- 'ext': 'mp4',
- 'title': 'CRC Motorsport: SRS SsangYong Hampton Downs Round 3 - S2015 Ep3',
- 'description': 'SsangYong Racing Series returns for Round 3 with drivers from New Zealand and Australia taking to the grid at Hampton Downs raceway.',
- 'uploader_id': '3812193411001',
- 'upload_date': '20151213',
- 'timestamp': 1449975272,
- },
- 'expected_warnings': [
- 'Failed to download MPD manifest'
- ],
- 'params': {
- # m3u8 download
- 'skip_download': True,
- },
- }
- BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/3812193411001/default_default/index.html?videoId=%s'
-
- def _real_extract(self, url):
- display_id = self._match_id(url)
- webpage = self._download_webpage(url, display_id)
- brightcove_id = self._search_regex(r'<param\s*name="@videoPlayer"\s*value="(\d+)"', webpage, 'brightcove id')
- return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+import re
+
+
+class TwitCastingIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:[^/]+\.)?twitcasting\.tv/(?P<uploader_id>[^/]+)/movie/(?P<id>\d+)'
+ _TEST = {
+ 'url': 'https://twitcasting.tv/ivetesangalo/movie/2357609',
+ 'md5': '745243cad58c4681dc752490f7540d7f',
+ 'info_dict': {
+ 'id': '2357609',
+ 'ext': 'mp4',
+ 'title': 'Recorded Live #2357609',
+ 'uploader_id': 'ivetesangalo',
+ 'description': "Moi! I'm live on TwitCasting from my iPhone.",
+ 'thumbnail': r're:^https?://.*\.jpg$',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ uploader_id = mobj.group('uploader_id')
+
+ webpage = self._download_webpage(url, video_id)
+
+ title = self._html_search_regex(
+ r'(?s)<[^>]+id=["\']movietitle[^>]+>(.+?)</',
+ webpage, 'title', default=None) or self._html_search_meta(
+ 'twitter:title', webpage, fatal=True)
+
+ m3u8_url = self._search_regex(
+ (r'data-movie-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
+ r'(["\'])(?P<url>http.+?\.m3u8.*?)\1'),
+ webpage, 'm3u8 url', group='url')
+
+ formats = self._extract_m3u8_formats(
+ m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls')
+
+ thumbnail = self._og_search_thumbnail(webpage)
+ description = self._og_search_description(
+ webpage, default=None) or self._html_search_meta(
+ 'twitter:description', webpage)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'uploader_id': uploader_id,
+ 'formats': formats,
+ }
expected=True)
def _call_api(self, path, item_id, *args, **kwargs):
- kwargs.setdefault('headers', {})['Client-ID'] = self._CLIENT_ID
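+ # Copy instead of setdefault so the caller's default headers dict is not mutated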
+ headers = kwargs.get('headers', {}).copy()
+ headers['Client-ID'] = self._CLIENT_ID
+ kwargs['headers'] = headers
response = self._download_json(
'%s/%s' % (self._API_BASE, path), item_id,
*args, **compat_kwargs(kwargs))
TwitchAllVideosIE,
TwitchUploadsIE,
TwitchPastBroadcastsIE,
- TwitchHighlightsIE))
+ TwitchHighlightsIE,
+ TwitchClipsIE))
else super(TwitchStreamIE, cls).suitable(url))
def _real_extract(self, url):
class TwitchClipsIE(TwitchBaseIE):
IE_NAME = 'twitch:clips'
- _VALID_URL = r'https?://clips\.twitch\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+ _VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:[^/]+/)*|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
# multiple formats
'url': 'https://clips.twitch.tv/rflegendary/UninterestedBeeDAESuppy',
'only_matching': True,
+ }, {
+ 'url': 'https://www.twitch.tv/sergeynixon/clip/StormyThankfulSproutFutureMan',
+ 'only_matching': True,
}]
def _real_extract(self, url):
raise ExtractorError(error_str, expected=True)
def _download_webpage_handle(self, *args, **kwargs):
- kwargs.setdefault('headers', {})['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
+ headers = kwargs.get('headers', {}).copy()
+ headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
+ kwargs['headers'] = headers
return super(UdemyIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
def concat(suffix, sep='-'):
return (base_format_id + '%s%s' % (sep, suffix)) if base_format_id else suffix
- for media_type in ('application/f4m+xml', 'application/x-mpegURL', 'video/mp4'):
- media = self._download_json(
- 'https://public-api.viewster.com/movies/%s/video' % entry_id,
- video_id, 'Downloading %s JSON' % concat(media_type, ' '), fatal=False, query={
- 'mediaType': media_type,
- 'language': audio,
- 'subtitle': subtitle,
- })
- if not media:
- continue
+ medias = self._download_json(
+ 'https://public-api.viewster.com/movies/%s/videos' % entry_id,
+ video_id, fatal=False, query={
+ 'mediaTypes': ['application/f4m+xml', 'application/x-mpegURL', 'video/mp4'],
+ 'language': audio,
+ 'subtitle': subtitle,
+ })
+ if not medias:
+ continue
+ for media in medias:
video_url = media.get('Uri')
if not video_url:
continue
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/atencio',
'uploader_id': 'atencio',
'uploader': 'Peter Atencio',
+ 'channel_id': 'keypeele',
+ 'channel_url': r're:https?://(?:www\.)?vimeo\.com/channels/keypeele',
'timestamp': 1380339469,
'upload_date': '20130928',
'duration': 187,
},
+ 'expected_warnings': ['Unable to download JSON metadata'],
},
{
'url': 'http://vimeo.com/76979871',
'url': 'https://vimeo.com/channels/tributes/6213729',
'info_dict': {
'id': '6213729',
- 'ext': 'mov',
+ 'ext': 'mp4',
'title': 'Vimeo Tribute: The Shining',
'uploader': 'Casey Donahue',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/caseydonahue',
'uploader_id': 'caseydonahue',
+ 'channel_url': r're:https?://(?:www\.)?vimeo\.com/channels/tributes',
+ 'channel_id': 'tributes',
'timestamp': 1250886430,
'upload_date': '20090821',
'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6',
if 'Referer' not in headers:
headers['Referer'] = url
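+ # /channels/<name>/... URLs carry the channel slug, reused for channel metadata below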
+ channel_id = self._search_regex(
+ r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
+
# Extract ID from URL
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
else:
config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
config_re.append(r'\bvar\s+r\s*=\s*({.+?})\s*;')
+ config_re.append(r'\bconfig\s*=\s*({.+?})\s*;')
config = self._search_regex(config_re, webpage, 'info section',
flags=re.DOTALL)
config = json.loads(config)
if config.get('view') == 4:
config = self._verify_player_video_password(redirect_url, video_id)
+ vod = config.get('video', {}).get('vod', {})
+
def is_rented():
if '>You rented this title.<' in webpage:
return True
if config.get('user', {}).get('purchased'):
return True
- label = try_get(
- config, lambda x: x['video']['vod']['purchase_options'][0]['label_string'], compat_str)
- if label and label.startswith('You rented this'):
- return True
+ for purchase_option in vod.get('purchase_options', []):
+ if purchase_option.get('purchased'):
+ return True
+ label = purchase_option.get('label_string')
+ if label and (label.startswith('You rented this') or label.endswith(' remaining')):
+ return True
return False
- if is_rented():
- feature_id = config.get('video', {}).get('vod', {}).get('feature_id')
+ if is_rented() and vod.get('is_trailer'):
+ feature_id = vod.get('feature_id')
if feature_id and not data.get('force_feature_id', False):
return self.url_result(smuggle_url(
'https://player.vimeo.com/player/%s' % feature_id,
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
webpage, 'license', default=None, group='license')
+ channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None
+
info_dict = {
'id': video_id,
'formats': formats,
'like_count': like_count,
'comment_count': comment_count,
'license': cc_license,
+ 'channel_id': channel_id,
+ 'channel_url': channel_url,
}
info_dict = merge_dicts(info_dict, info_dict_config, json_ld)
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
if not url or stream_format not in ('hls', 'dash'):
return []
- stream_id = hardsub_lang or audio_lang
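+ # Include both language tags so the resulting format ids are deterministic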
+ assert audio_lang or hardsub_lang
+ stream_id_list = []
+ if audio_lang:
+ stream_id_list.append('audio-%s' % audio_lang)
+ if hardsub_lang:
+ stream_id_list.append('hardsub-%s' % hardsub_lang)
+ stream_id = '-'.join(stream_id_list)
format_id = '%s-%s' % (stream_format, stream_id)
if stream_format == 'hls':
adaptive_formats = self._extract_m3u8_formats(
import re
from .common import InfoExtractor
+from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
+ unified_timestamp,
+ url_or_none,
)
class VzaarIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|view)\.)?vzaar\.com/(?:videos/)?(?P<id>\d+)'
_TESTS = [{
+ # HTTP and HLS
'url': 'https://vzaar.com/videos/1152805',
'md5': 'bde5ddfeb104a6c56a93a06b04901dbf',
'info_dict': {
video_id = self._match_id(url)
video_data = self._download_json(
'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
- source_url = video_data['sourceUrl']
- info = {
+ title = video_data['videoTitle']
+
+ formats = []
+
+ source_url = url_or_none(video_data.get('sourceUrl'))
+ if source_url:
+ f = {
+ 'url': source_url,
+ 'format_id': 'http',
+ }
+ if 'audio' in source_url:
+ f.update({
+ 'vcodec': 'none',
+ 'ext': 'mp3',
+ })
+ else:
+ f.update({
+ 'width': int_or_none(video_data.get('width')),
+ 'height': int_or_none(video_data.get('height')),
+ 'ext': 'mp4',
+ 'fps': float_or_none(video_data.get('fps')),
+ })
+ formats.append(f)
+
+ video_guid = video_data.get('guid')
+ usp = video_data.get('usp')
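+ # 'usp' supplies the query parameters appended to the fable.vzaar.com manifest URL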
+ if isinstance(video_guid, compat_str) and isinstance(usp, dict):
+ m3u8_url = ('http://fable.vzaar.com/v4/usp/%s/%s.ism/.m3u8?'
+ % (video_guid, video_id)) + '&'.join(
+ '%s=%s' % (k, v) for k, v in usp.items())
+ formats.extend(self._extract_m3u8_formats(
+ m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False))
+
+ self._sort_formats(formats)
+
+ return {
'id': video_id,
- 'title': video_data['videoTitle'],
- 'url': source_url,
+ 'title': title,
'thumbnail': self._proto_relative_url(video_data.get('poster')),
'duration': float_or_none(video_data.get('videoDuration')),
+ 'timestamp': unified_timestamp(video_data.get('ts')),
+ 'formats': formats,
}
- if 'audio' in source_url:
- info.update({
- 'vcodec': 'none',
- 'ext': 'mp3',
- })
- else:
- info.update({
- 'width': int_or_none(video_data.get('width')),
- 'height': int_or_none(video_data.get('height')),
- 'ext': 'mp4',
- })
- return info
remove_quotes,
remove_start,
smuggle_url,
+ str_or_none,
str_to_int,
try_get,
unescapeHTML,
return True
def _download_webpage_handle(self, *args, **kwargs):
- kwargs.setdefault('query', {})['disable_polymer'] = 'true'
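+ # Copy the query dict so shared defaults are not polluted across calls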
+ query = kwargs.get('query', {}).copy()
+ query['disable_polymer'] = 'true'
+ kwargs['query'] = query
return super(YoutubeBaseInfoExtractor, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
(?:www\.)?hooktube\.com/|
(?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/|
+ (?:www\.)?invidio\.us/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID:
'uploader': 'Philipp Hagemeister',
'uploader_id': 'phihag',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
+ 'channel_id': 'UCLqxVugv74EIW3VWh2NOa3Q',
+ 'channel_url': r're:https?://(?:www\.)?youtube\.com/channel/UCLqxVugv74EIW3VWh2NOa3Q',
'upload_date': '20121002',
'license': 'Standard YouTube License',
'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
'categories': ['Science & Technology'],
'tags': ['youtube-dl'],
'duration': 10,
+ 'view_count': int,
'like_count': int,
'dislike_count': int,
'start_time': 1,
'categories': ['Science & Technology'],
'tags': ['youtube-dl'],
'duration': 10,
+ 'view_count': int,
'like_count': int,
'dislike_count': int,
},
'url': 'https://www.youtube.com/watch?v=MuAGGZNfUkU&list=RDMM',
'only_matching': True,
},
+ {
+ 'url': 'https://invidio.us/watch?v=BaW_jenozKc',
+ 'only_matching': True,
+ },
]
def __init__(self, *args, **kwargs):
(r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*c\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
- r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\('),
+ r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
+ r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\('),
jscode, 'Initial JS player signature function name', group='sig')
jsi = JSInterpreter(jscode)
def extract_view_count(v_info):
return int_or_none(try_get(v_info, lambda x: x['view_count'][0]))
+ player_response = {}
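+ # Populated from the player args or get_video_info below; videoDetails is used as a metadata fallback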
+
# Get video info
embed_webpage = None
if re.search(r'player-age-gate-content">', video_webpage) is not None:
if args.get('livestream') == '1' or args.get('live_playback') == 1:
is_live = True
sts = ytplayer_config.get('sts')
+ if not player_response:
+ pl_response = str_or_none(args.get('player_response'))
+ if pl_response:
+ pl_response = self._parse_json(pl_response, video_id, fatal=False)
+ if isinstance(pl_response, dict):
+ player_response = pl_response
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
# We also try looking in get_video_info since it may contain different dashmpd
# URL that points to a DASH manifest with possibly different itag set (some itags
if not video_info_webpage:
continue
get_video_info = compat_parse_qs(video_info_webpage)
+ if not player_response:
+ pl_response = get_video_info.get('player_response', [None])[0]
+ # get_video_info values are strings, so parse before the dict check
+ if pl_response:
+ pl_response = self._parse_json(pl_response, video_id, fatal=False)
+ if isinstance(pl_response, dict):
+ player_response = pl_response
add_dash_mpd(get_video_info)
if view_count is None:
view_count = extract_view_count(get_video_info)
'"token" parameter not in video info for unknown reason',
video_id=video_id)
+ video_details = try_get(
+ player_response, lambda x: x['videoDetails'], dict) or {}
+
# title
if 'title' in video_info:
video_title = video_info['title'][0]
+ elif video_details.get('title'):
+ video_title = video_details['title']
else:
self._downloader.report_warning('Unable to extract video title')
video_title = '_'
if view_count is None:
view_count = extract_view_count(video_info)
+ if view_count is None and video_details:
+ view_count = int_or_none(video_details.get('viewCount'))
# Check for "rental" videos
if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:
raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
# uploader
- video_uploader = try_get(video_info, lambda x: x['author'][0], compat_str)
+ video_uploader = try_get(
+ video_info, lambda x: x['author'][0],
+ compat_str) or str_or_none(video_details.get('author'))
if video_uploader:
video_uploader = compat_urllib_parse_unquote_plus(video_uploader)
else:
else:
self._downloader.report_warning('unable to extract uploader nickname')
+ channel_id = self._html_search_meta(
+ 'channelId', video_webpage, 'channel id')
+ channel_url = 'http://www.youtube.com/channel/%s' % channel_id if channel_id else None
+
# thumbnail image
# We try first to get a high quality image:
m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
like_count = _extract_count('like')
dislike_count = _extract_count('dislike')
+ if view_count is None:
+ view_count = str_to_int(self._search_regex(
+ r'<[^>]+class=["\']watch-view-count[^>]+>\s*([\d,\s]+)', video_webpage,
+ 'view count', default=None))
+
# subtitles
video_subtitles = self.extract_subtitles(video_id, video_webpage)
automatic_captions = self.extract_automatic_captions(video_id, video_webpage)
video_duration = try_get(
video_info, lambda x: int_or_none(x['length_seconds'][0]))
+ if not video_duration:
+ video_duration = int_or_none(video_details.get('lengthSeconds'))
if not video_duration:
video_duration = parse_duration(self._html_search_meta(
'duration', video_webpage, 'video duration'))
'uploader': video_uploader,
'uploader_id': video_uploader_id,
'uploader_url': video_uploader_url,
+ 'channel_id': channel_id,
+ 'channel_url': channel_url,
'upload_date': upload_date,
'license': video_license,
'creator': video_creator or artist,
(?:https?://)?
(?:\w+\.)?
(?:
- youtube\.com/
+ (?:
+ youtube\.com|
+ invidio\.us
+ )
+ /
(?:
(?:course|view_play_list|my_playlists|artist|playlist|watch|embed/(?:videoseries|[0-9A-Za-z_-]{11}))
\? (?:.*?[&;])*? (?:p|a|list)=
'description': 'md5:507cdcb5a49ac0da37a920ece610be80',
'categories': ['People & Blogs'],
'tags': list,
+ 'view_count': int,
'like_count': int,
'dislike_count': int,
},
# music album playlist
'url': 'OLAK5uy_m4xAFdmMC5rX3Ji3g93pQe3hqLZw_9LhM',
'only_matching': True,
+ }, {
+ 'url': 'https://invidio.us/playlist?list=PLDIoUOhQQPlXr63I_vwF9GD8sAKh77dWU',
+ 'only_matching': True,
}]
def _real_initialize(self):
class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
IE_DESC = 'YouTube.com channels'
- _VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com)/channel/(?P<id>[0-9A-Za-z_-]+)'
+ _VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com|(?:www\.)?invidio\.us)/channel/(?P<id>[0-9A-Za-z_-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/channel/%s/videos'
_VIDEO_RE = r'(?:title="(?P<title>[^"]+)"[^>]+)?href="/watch\?v=(?P<id>[0-9A-Za-z_-]+)&?'
IE_NAME = 'youtube:channel'
'id': 'UUs0ifCMCm1icqRbqhUINa0w',
'title': 'Uploads from Deus Ex',
},
+ }, {
+ 'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA',
+ 'only_matching': True,
}]
@classmethod
)
-class ZattooBaseIE(InfoExtractor):
- _NETRC_MACHINE = 'zattoo'
- _HOST_URL = 'https://zattoo.com'
-
+class ZattooPlatformBaseIE(InfoExtractor):
_power_guide_hash = None
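+ # Platforms that serve the API from another host (e.g. a www. subdomain) set _API_HOST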
+ def _host_url(self):
+ return 'https://%s' % (self._API_HOST if hasattr(self, '_API_HOST') else self._HOST)
+
def _login(self):
username, password = self._get_login_info()
if not username or not password:
try:
data = self._download_json(
- '%s/zapi/v2/account/login' % self._HOST_URL, None, 'Logging in',
+ '%s/zapi/v2/account/login' % self._host_url(), None, 'Logging in',
data=urlencode_postdata({
'login': username,
'password': password,
'remember': 'true',
}), headers={
- 'Referer': '%s/login' % self._HOST_URL,
+ 'Referer': '%s/login' % self._host_url(),
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
})
except ExtractorError as e:
def _real_initialize(self):
webpage = self._download_webpage(
- self._HOST_URL, None, 'Downloading app token')
+ self._host_url(), None, 'Downloading app token')
app_token = self._html_search_regex(
r'appToken\s*=\s*(["\'])(?P<token>(?:(?!\1).)+?)\1',
webpage, 'app token', group='token')
# Will setup appropriate cookies
self._request_webpage(
- '%s/zapi/v2/session/hello' % self._HOST_URL, None,
+ '%s/zapi/v2/session/hello' % self._host_url(), None,
'Opening session', data=urlencode_postdata({
'client_app_token': app_token,
'uuid': compat_str(uuid4()),
def _extract_cid(self, video_id, channel_name):
channel_groups = self._download_json(
- '%s/zapi/v2/cached/channels/%s' % (self._HOST_URL,
+ '%s/zapi/v2/cached/channels/%s' % (self._host_url(),
self._power_guide_hash),
video_id, 'Downloading channel list',
query={'details': False})['channel_groups']
def _extract_cid_and_video_info(self, video_id):
data = self._download_json(
- '%s/zapi/program/details' % self._HOST_URL,
+ '%s/zapi/v2/cached/program/power_details/%s' % (
+ self._host_url(), self._power_guide_hash),
video_id,
'Downloading video information',
query={
- 'program_id': video_id,
- 'complete': True
+ 'program_ids': video_id,
+ 'complete': True,
})
- p = data['program']
+ p = data['programs'][0]
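+ # power_details responses use abbreviated keys: t=title, et=episode title, d=description, i_url=image, c=categories, g=tags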
cid = p['cid']
info_dict = {
'id': video_id,
- 'title': p.get('title') or p['episode_title'],
- 'description': p.get('description'),
- 'thumbnail': p.get('image_url'),
+ 'title': p.get('t') or p['et'],
+ 'description': p.get('d'),
+ 'thumbnail': p.get('i_url'),
'creator': p.get('channel_name'),
- 'episode': p.get('episode_title'),
- 'episode_number': int_or_none(p.get('episode_number')),
- 'season_number': int_or_none(p.get('season_number')),
+ 'episode': p.get('et'),
+ 'episode_number': int_or_none(p.get('e_no')),
+ 'season_number': int_or_none(p.get('s_no')),
'release_year': int_or_none(p.get('year')),
- 'categories': try_get(p, lambda x: x['categories'], list),
+ 'categories': try_get(p, lambda x: x['c'], list),
+ 'tags': try_get(p, lambda x: x['g'], list)
}
return cid, info_dict
if is_live:
postdata_common.update({'timeshift': 10800})
- url = '%s/zapi/watch/live/%s' % (self._HOST_URL, cid)
+ url = '%s/zapi/watch/live/%s' % (self._host_url(), cid)
elif record_id:
- url = '%s/zapi/watch/recording/%s' % (self._HOST_URL, record_id)
+ url = '%s/zapi/watch/recording/%s' % (self._host_url(), record_id)
else:
- url = '%s/zapi/watch/recall/%s/%s' % (self._HOST_URL, cid, video_id)
+ url = '%s/zapi/watch/recall/%s/%s' % (self._host_url(), cid, video_id)
formats = []
for stream_type in ('dash', 'hls', 'hls5', 'hds'):
return info_dict
-class QuicklineBaseIE(ZattooBaseIE):
+class QuicklineBaseIE(ZattooPlatformBaseIE):
_NETRC_MACHINE = 'quickline'
- _HOST_URL = 'https://mobiltv.quickline.com'
+ _HOST = 'mobiltv.quickline.com'
class QuicklineIE(QuicklineBaseIE):
- _VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<channel>[^/]+)/(?P<id>[0-9]+)'
+ _VALID_URL = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+)/(?P<id>[0-9]+)' % re.escape(QuicklineBaseIE._HOST)
_TEST = {
'url': 'https://mobiltv.quickline.com/watch/prosieben/130671867-maze-runner-die-auserwaehlten-in-der-brandwueste',
class QuicklineLiveIE(QuicklineBaseIE):
- _VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<id>[^/]+)'
+ _VALID_URL = r'https?://(?:www\.)?%s/watch/(?P<id>[^/]+)' % re.escape(QuicklineBaseIE._HOST)
_TEST = {
'url': 'https://mobiltv.quickline.com/watch/srf1',
return self._extract_video(channel_name, video_id, is_live=True)
+class ZattooBaseIE(ZattooPlatformBaseIE):
+ _NETRC_MACHINE = 'zattoo'
+ _HOST = 'zattoo.com'
+
+
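+# Builds a concrete _VALID_URL from the shared template, e.g.
+# _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, 'zattoo.com') yields
+# r'https?://(?:www\.)?zattoo\.com/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'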
+def _make_valid_url(tmpl, host):
+ return tmpl % re.escape(host)
+
+
class ZattooIE(ZattooBaseIE):
- _VALID_URL = r'https?://(?:www\.)?zattoo\.com/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'
+ _VALID_URL_TEMPLATE = r'https?://(?:www\.)?%s/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'
+ _VALID_URL = _make_valid_url(_VALID_URL_TEMPLATE, ZattooBaseIE._HOST)
# Since regular videos are only available for 7 days and recorded videos
# are only available to a specific user, we cannot have detailed tests.
def _real_extract(self, url):
channel_name = video_id = self._match_id(url)
return self._extract_video(channel_name, video_id, is_live=True)
+
+
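+# Each whitelabel platform below only differs in _NETRC_MACHINE, _HOST
+# (plus _API_HOST where API calls go to the www. subdomain rather than
+# the bare host; _host_url() prefers _API_HOST when it is defined) and
+# a _VALID_URL built from the shared template.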
+class NetPlusIE(ZattooIE):
+ _NETRC_MACHINE = 'netplus'
+ _HOST = 'netplus.tv'
+ _API_HOST = 'www.%s' % _HOST
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://www.netplus.tv/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class MNetTVIE(ZattooIE):
+ _NETRC_MACHINE = 'mnettv'
+ _HOST = 'tvplus.m-net.de'
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://tvplus.m-net.de/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class WalyTVIE(ZattooIE):
+ _NETRC_MACHINE = 'walytv'
+ _HOST = 'player.waly.tv'
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://player.waly.tv/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class BBVTVIE(ZattooIE):
+ _NETRC_MACHINE = 'bbvtv'
+ _HOST = 'bbv-tv.net'
+ _API_HOST = 'www.%s' % _HOST
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://www.bbv-tv.net/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class VTXTVIE(ZattooIE):
+ _NETRC_MACHINE = 'vtxtv'
+ _HOST = 'vtxtv.ch'
+ _API_HOST = 'www.%s' % _HOST
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://www.vtxtv.ch/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class MyVisionTVIE(ZattooIE):
+ _NETRC_MACHINE = 'myvisiontv'
+ _HOST = 'myvisiontv.ch'
+ _API_HOST = 'www.%s' % _HOST
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://www.myvisiontv.ch/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class GlattvisionTVIE(ZattooIE):
+ _NETRC_MACHINE = 'glattvisiontv'
+ _HOST = 'iptv.glattvision.ch'
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://iptv.glattvision.ch/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class SAKTVIE(ZattooIE):
+ _NETRC_MACHINE = 'saktv'
+ _HOST = 'saktv.ch'
+ _API_HOST = 'www.%s' % _HOST
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://www.saktv.ch/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class EWETVIE(ZattooIE):
+ _NETRC_MACHINE = 'ewetv'
+ _HOST = 'tvonline.ewe.de'
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://tvonline.ewe.de/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class QuantumTVIE(ZattooIE):
+ _NETRC_MACHINE = 'quantumtv'
+ _HOST = 'quantum-tv.com'
+ _API_HOST = 'www.%s' % _HOST
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://www.quantum-tv.com/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class OsnatelTVIE(ZattooIE):
+ _NETRC_MACHINE = 'osnateltv'
+ _HOST = 'tvonline.osnatel.de'
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://tvonline.osnatel.de/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
+
+
+class EinsUndEinsTVIE(ZattooIE):
+ _NETRC_MACHINE = '1und1tv'
+ _HOST = '1und1.tv'
+ _API_HOST = 'www.%s' % _HOST
+ _VALID_URL = _make_valid_url(ZattooIE._VALID_URL_TEMPLATE, _HOST)
+
+ _TESTS = [{
+ 'url': 'https://www.1und1.tv/watch/abc/123-abc',
+ 'only_matching': True,
+ }]
from __future__ import unicode_literals
-__version__ = '2018.09.10'
+__version__ = '2018.11.07'