+version 2019.09.01
+
+Core
++ [extractor/generic] Add support for squarespace embeds (#21294, #21802,
+ #21859)
++ [downloader/external] Respect mtime option for aria2c (#22242)
+
+Extractors
++ [xhamster:user] Add support for user pages (#16330, #18454)
++ [xhamster] Add support for more domains
++ [verystream] Add support for woof.tube (#22217)
++ [dailymotion] Add support for lequipe.fr (#21328, #22152)
++ [openload] Add support for oload.vip (#22205)
++ [bbccouk] Extend URL regular expression (#19200)
++ [youtube] Add support for invidious.nixnet.xyz and yt.elukerio.org (#22223)
+* [safari] Fix authentication (#22161, #22184)
+* [usanetwork] Fix extraction (#22105)
++ [einthusan] Add support for einthusan.ca (#22171)
+* [youtube] Improve unavailable message extraction (#22117)
++ [piksel] Extract subtitles (#20506)
+
+
+version 2019.08.13
+
+Core
+* [downloader/fragment] Fix ETA calculation of resumed download (#21992)
+* [YoutubeDL] Check annotations availability (#18582)
+
+Extractors
+* [youtube:playlist] Improve flat extraction (#21927)
+* [youtube] Fix annotations extraction (#22045)
++ [discovery] Extract series meta field (#21808)
+* [youtube] Improve error detection (#16445)
+* [vimeo] Fix album extraction (#1933, #15704, #15855, #18967, #21986)
++ [roosterteeth] Add support for watch URLs
+* [discovery] Limit video data by show slug (#21980)
+
+
+version 2019.08.02
+
+Extractors
++ [tvigle] Add support for HLS and DASH formats (#21967)
+* [tvigle] Fix extraction (#21967)
++ [yandexvideo] Add support for DASH formats (#21971)
+* [discovery] Use API call for video data extraction (#21808)
++ [mgtv] Extract format_note (#21881)
+* [tvn24] Fix metadata extraction (#21833, #21834)
+* [dlive] Relax URL regular expression (#21909)
++ [openload] Add support for oload.best (#21913)
+* [youtube] Improve metadata extraction for age gate content (#21943)
+
+
+version 2019.07.30
+
+Extractors
+* [youtube] Fix and improve title and description extraction (#21934)
+
+
+version 2019.07.27
+
+Extractors
++ [yahoo:japannews] Add support for yahoo.co.jp (#21698, #21265)
++ [discovery] Add support for go.discovery.com URLs
+* [youtube:playlist] Relax video regular expression (#21844)
+* [generic] Restrict --default-search schemeless URLs detection pattern
+ (#21842)
+* [vrv] Fix CMS signing query extraction (#21809)
+
+
+version 2019.07.16
+
+Extractors
++ [asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv
+ (#21281, #21290)
+* [kaltura] Check source format URL (#21290)
+* [ctsnews] Fix YouTube embeds extraction (#21678)
++ [einthusan] Add support for einthusan.com (#21748, #21775)
++ [youtube] Add support for invidious.mastodon.host (#21777)
++ [gfycat] Extend URL regular expression (#21779, #21780)
+* [youtube] Restrict is_live extraction (#21782)
+
+
+version 2019.07.14
+
+Extractors
+* [porn91] Fix extraction (#21312)
++ [yandexmusic] Extract track number and disk number (#21421)
++ [yandexmusic] Add support for multi disk albums (#21420, #21421)
+* [lynda] Handle missing subtitles (#20490, #20513)
++ [youtube] Add more invidious instances to URL regular expression (#21694)
+* [twitter] Improve uploader id extraction (#21705)
+* [spankbang] Fix and improve metadata extraction
+* [spankbang] Fix extraction (#21763, #21764)
++ [dlive] Add support for dlive.tv (#18080)
++ [livejournal] Add support for livejournal.com (#21526)
+* [roosterteeth] Fix free episode extraction (#16094)
+* [dbtv] Fix extraction
+* [bellator] Fix extraction
+- [rudo] Remove extractor (#18430, #18474)
+* [facebook] Fall back to twitter:image meta for thumbnail extraction (#21224)
+* [bleacherreport] Fix Bleacher Report CMS extraction
+* [espn] Fix fivethirtyeight.com extraction
+* [5tv] Relax video URL regular expression and support https URLs
+* [youtube] Fix is_live extraction (#21734)
+* [youtube] Fix authentication (#11270)
+
+
+version 2019.07.12
+
+Core
++ [adobepass] Add support for AT&T U-verse (mso ATT) (#13938, #21016)
+
+Extractors
++ [mgtv] Pass Referer HTTP header for format URLs (#21726)
++ [beeg] Add support for api/v6 v2 URLs without t argument (#21701)
+* [voxmedia:volume] Improve vox embed extraction (#16846)
+* [funnyordie] Move extraction to VoxMedia extractor (#16846)
+* [gameinformer] Fix extraction (#8895, #15363, #17206)
+* [funk] Fix extraction (#17915)
+* [packtpub] Relax lesson URL regular expression (#21695)
+* [packtpub] Fix extraction (#21268)
+* [philharmoniedeparis] Relax URL regular expression (#21672)
+* [peertube] Detect embed URLs in generic extraction (#21666)
+* [mixer:vod] Relax URL regular expression (#21657, #21658)
++ [lecturio] Add support for id based URLs (#21630)
++ [go] Add site info for disneynow (#21613)
+* [ted] Restrict info regular expression (#21631)
+* [twitch:vod] Update m3u8 URL (#21538, #21607)
+* [vzaar] Fix videos with empty title (#21606)
+* [tvland] Fix extraction (#21384)
+* [arte] Clean extractor (#15583, #21614)
+
+
version 2019.07.02
Core
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
+### Inline values
+
+Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
+
+#### Example
+
+Correct:
+
+```python
+title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+```
+
+Incorrect:
+
+```python
+TITLE_RE = r'<title>([^<]+)</title>'
+# ...some lines of code...
+title = self._html_search_regex(TITLE_RE, webpage, 'title')
+```
+
+### Collapse fallbacks
+
+Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
+
+#### Example
+
+Good:
+
+```python
+description = self._html_search_meta(
+ ['og:description', 'description', 'twitter:description'],
+ webpage, 'description', default=None)
+```
+
+Unwieldy:
+
+```python
+description = (
+ self._og_search_description(webpage, default=None)
+ or self._html_search_meta('description', webpage, default=None)
+ or self._html_search_meta('twitter:description', webpage, default=None))
+```
+
+Methods that support a list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
+
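As a rough illustration of why a list of patterns reads more cleanly than a chain of `or` fallbacks, the following standalone sketch mimics the idea. It is not youtube-dl's implementation: `search_first` and the sample `webpage` string are hypothetical and exist only for this example.

```python
import re


def search_first(patterns, string, default=None):
    """Return group 1 of the first pattern that matches, else default.

    Hypothetical helper for illustration only; not part of youtube-dl.
    """
    # Accept a single pattern as well as a list or tuple of patterns.
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        mobj = re.search(pattern, string)
        if mobj:
            return mobj.group(1)
    return default


# Made-up page content; only the plain "description" meta tag is present.
webpage = '<meta name="description" content="A short clip">'

description = search_first(
    [r'<meta property="og:description" content="([^"]+)"',
     r'<meta name="description" content="([^"]+)"',
     r'<meta name="twitter:description" content="([^"]+)"'],
    webpage)
print(description)
```

The patterns are tried in order, so the most preferred source goes first and the call site stays a single expression.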
+### Trailing parentheses
+
+Always place the closing parenthesis directly after the last argument instead of on a line of its own.
+
+#### Example
+
+Correct:
+
+```python
+ lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+ list)
+```
+
+Incorrect:
+
+```python
+ lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+ list,
+)
+```
+
### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
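
To show what such wrapping buys you, here is a simplified stand-in for this kind of safe conversion. The real helpers live in `youtube_dl/utils.py` and accept more options; `safe_int`, `safe_float` and the `meta` dictionary below are hypothetical names used only for this sketch.

```python
def safe_int(value, default=None):
    """Convert value to int, returning default instead of raising."""
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def safe_float(value, default=None):
    """Convert value to float, returning default instead of raising."""
    if value is None:
        return default
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# Metadata scraped from a site is often missing or malformed.
meta = {'view_count': '1024', 'duration': '12.5', 'like_count': 'n/a'}

view_count = safe_int(meta.get('view_count'))
duration = safe_float(meta.get('duration'))
like_count = safe_int(meta.get('like_count'))        # malformed value
comment_count = safe_int(meta.get('comment_count'))  # missing key
print(view_count, duration, like_count, comment_count)
```

A malformed or missing value yields `None` here rather than aborting the whole extraction, which is the behavior optional metadata fields need.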
'https://www.youtube.com/watch?v=FqZTN594JQw&list='
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
+Inline values
+
+Extracting variables is acceptable for reducing code duplication and
+improving readability of complex expressions. However, you should avoid
+extracting variables used only once and moving them to opposite parts of
+the extractor file, which makes reading the linear flow difficult.
+
+Example
+
+Correct:
+
+ title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+
+Incorrect:
+
+ TITLE_RE = r'<title>([^<]+)</title>'
+ # ...some lines of code...
+ title = self._html_search_regex(TITLE_RE, webpage, 'title')
+
+Collapse fallbacks
+
+Multiple fallback values can quickly become unwieldy. Collapse multiple
+fallback values into a single expression via a list of patterns.
+
+Example
+
+Good:
+
+ description = self._html_search_meta(
+ ['og:description', 'description', 'twitter:description'],
+ webpage, 'description', default=None)
+
+Unwieldy:
+
+ description = (
+ self._og_search_description(webpage, default=None)
+ or self._html_search_meta('description', webpage, default=None)
+ or self._html_search_meta('twitter:description', webpage, default=None))
+
+Methods that support a list of patterns are: _search_regex,
+_html_search_regex, _og_search_property, _html_search_meta.
+
+Trailing parentheses
+
+Always place the closing parenthesis directly after the last argument
+instead of on a line of its own.
+
+Example
+
+Correct:
+
+ lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+ list)
+
+Incorrect:
+
+ lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+ list,
+ )
+
Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from
- **ARD:mediathek**
- **ARDBetaMediathek**
- **Arkena**
- - **arte.tv**
- **arte.tv:+7**
- - **arte.tv:cinema**
- - **arte.tv:concert**
- - **arte.tv:creative**
- - **arte.tv:ddc**
- **arte.tv:embed**
- - **arte.tv:future**
- - **arte.tv:info**
- - **arte.tv:magazine**
- **arte.tv:playlist**
- **AsianCrush**
- **AsianCrushPlaylist**
- **DiscoveryNetworksDe**
- **DiscoveryVR**
- **Disney**
+ - **dlive:stream**
+ - **dlive:vod**
- **Dotsub**
- **DouyuShow**
- **DouyuTV**: 斗鱼
- **FrontendMastersCourse**
- **FrontendMastersLesson**
- **Funimation**
- - **FunkChannel**
- - **FunkMix**
- - **FunnyOrDie**
+ - **Funk**
- **Fusion**
- **Fux**
- **FXNetworks**
- **linkedin:learning:course**
- **LinuxAcademy**
- **LiTV**
+ - **LiveJournal**
- **LiveLeak**
- **LiveLeakEmbed**
- **livestream**
- **rtve.es:television**
- **RTVNH**
- **RTVS**
- - **Rudo**
- **RUHD**
- **rutube**: Rutube videos
- **rutube:channel**: Rutube channels
- **TF1**
- **TFO**
- **TheIntercept**
- - **theoperaplatform**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
- **XHamster**
- **XHamsterEmbed**
+ - **XHamsterUser**
- **xiami:album**: 虾米音乐 - 专辑
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
- **Yahoo**: Yahoo screen and movies
- **yahoo:gyao**
- **yahoo:gyao:player**
+ - **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
\[aq]PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4\[aq]
\f[]
.fi
+.SS Inline values
+.PP
+Extracting variables is acceptable for reducing code duplication and
+improving readability of complex expressions.
+However, you should avoid extracting variables used only once and moving
+them to opposite parts of the extractor file, which makes reading the
+linear flow difficult.
+.SS Example
+.PP
+Correct:
+.IP
+.nf
+\f[C]
+title\ =\ self._html_search_regex(r\[aq]<title>([^<]+)</title>\[aq],\ webpage,\ \[aq]title\[aq])
+\f[]
+.fi
+.PP
+Incorrect:
+.IP
+.nf
+\f[C]
+TITLE_RE\ =\ r\[aq]<title>([^<]+)</title>\[aq]
+#\ ...some\ lines\ of\ code...
+title\ =\ self._html_search_regex(TITLE_RE,\ webpage,\ \[aq]title\[aq])
+\f[]
+.fi
+.SS Collapse fallbacks
+.PP
+Multiple fallback values can quickly become unwieldy.
+Collapse multiple fallback values into a single expression via a list of
+patterns.
+.SS Example
+.PP
+Good:
+.IP
+.nf
+\f[C]
+description\ =\ self._html_search_meta(
+\ \ \ \ [\[aq]og:description\[aq],\ \[aq]description\[aq],\ \[aq]twitter:description\[aq]],
+\ \ \ \ webpage,\ \[aq]description\[aq],\ default=None)
+\f[]
+.fi
+.PP
+Unwieldy:
+.IP
+.nf
+\f[C]
+description\ =\ (
+\ \ \ \ self._og_search_description(webpage,\ default=None)
+\ \ \ \ or\ self._html_search_meta(\[aq]description\[aq],\ webpage,\ default=None)
+\ \ \ \ or\ self._html_search_meta(\[aq]twitter:description\[aq],\ webpage,\ default=None))
+\f[]
+.fi
+.PP
+Methods that support a list of patterns are: \f[C]_search_regex\f[],
+\f[C]_html_search_regex\f[], \f[C]_og_search_property\f[],
+\f[C]_html_search_meta\f[].
+.SS Trailing parentheses
+.PP
+Always place the closing parenthesis directly after the last argument
+instead of on a line of its own.
+.SS Example
+.PP
+Correct:
+.IP
+.nf
+\f[C]
+\ \ \ \ lambda\ x:\ x[\[aq]ResultSet\[aq]][\[aq]Result\[aq]][0][\[aq]VideoUrlSet\[aq]][\[aq]VideoUrl\[aq]],
+\ \ \ \ list)
+\f[]
+.fi
+.PP
+Incorrect:
+.IP
+.nf
+\f[C]
+\ \ \ \ lambda\ x:\ x[\[aq]ResultSet\[aq]][\[aq]Result\[aq]][0][\[aq]VideoUrlSet\[aq]][\[aq]VideoUrl\[aq]],
+\ \ \ \ list,
+)
+\f[]
+.fi
.SS Use convenience conversion and parsing functions
.PP
Wrap all extracted numeric data into safe functions from
annofn = replace_extension(filename, 'annotations.xml', info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
self.to_screen('[info] Video annotations are already present')
+ elif not info_dict.get('annotations'):
+ self.report_warning('There are no annotations to write.')
else:
try:
self.to_screen('[info] Writing video annotations to: ' + annofn)
if opts.verbose:
write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
except IOError:
- sys.exit('ERROR: batch file could not be read')
+ sys.exit('ERROR: batch file %s could not be read' % opts.batchfile)
all_urls = batch_urls + [url.strip() for url in args] # batch_urls are already stripped in read_batch_urls
_enc = preferredencoding()
all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
- # retried with the same request data this usually succeeds (1-2 attemps
+ # retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
cmd += self._option('--interface', 'source_address')
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
+ cmd += self._bool_option('--remote-time', 'updatetime', 'true', 'false', '=')
cmd += ['--', info_dict['url']]
return cmd
})
def _start_frag_download(self, ctx):
+ resume_len = ctx['complete_frags_downloaded_bytes']
total_frags = ctx['total_frags']
# This dict stores the download progress, it's updated by the progress
# hook
state = {
'status': 'downloading',
- 'downloaded_bytes': ctx['complete_frags_downloaded_bytes'],
+ 'downloaded_bytes': resume_len,
'fragment_index': ctx['fragment_index'],
'fragment_count': total_frags,
'filename': ctx['filename'],
state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
if not ctx['live']:
state['eta'] = self.calc_eta(
- start, time_now, estimated_size,
- state['downloaded_bytes'])
+ start, time_now, estimated_size - resume_len,
+ state['downloaded_bytes'] - resume_len)
state['speed'] = s.get('speed') or ctx.get('speed')
ctx['speed'] = state['speed']
ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
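
The effect of subtracting `resume_len` in the hunk above can be checked with a small standalone sketch. The `calc_eta` function below is a simplified stand-in written for this note, assuming the ETA is the remaining bytes divided by the average rate since `start`; the byte counts and timestamps are made up.

```python
def calc_eta(start, now, total, current):
    """Estimate remaining seconds from bytes transferred so far.

    Simplified stand-in for illustration; not the downloader's own method.
    """
    elapsed = now - start
    if total is None or current == 0 or elapsed < 0.001:
        return None
    rate = float(current) / elapsed
    return int((float(total) - float(current)) / rate)


estimated_size = 1000   # bytes in the whole file
resume_len = 600        # bytes already on disk before this session
downloaded_bytes = 700  # resume_len + 100 bytes fetched in this session
start, time_now = 0.0, 10.0

# Counting the resumed bytes inflates the apparent download rate.
eta_before = calc_eta(start, time_now, estimated_size, downloaded_bytes)
# Subtracting resume_len measures only this session's progress.
eta_after = calc_eta(
    start, time_now, estimated_size - resume_len,
    downloaded_bytes - resume_len)
print(eta_before, eta_after)
```

With these numbers the unadjusted call treats all 700 bytes as fetched in 10 seconds and reports 4 seconds remaining, while the adjusted call uses the 100 bytes actually fetched in this session and reports 30 seconds.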
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
- avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
+ avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete representation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps))
avcc_payload += sps
IE_NAME = 'abcnews:video'
_VALID_URL = r'''(?x)
https?://
- abcnews\.go\.com/
(?:
- [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
- video/embed\?.*?\bid=
+ abcnews\.go\.com/
+ (?:
+ [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
+ video/embed\?.*?\bid=
+ )|
+ fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
)
(?P<id>\d+)
'''
'username_field': 'username',
'password_field': 'password',
},
+ 'ATT': {
+ 'name': 'AT&T U-verse',
+ 'username_field': 'userid',
+ 'password_field': 'password',
+ },
'ATTOTT': {
'name': 'DIRECTV NOW',
'username_field': 'email',
import re
from .common import InfoExtractor
-from ..compat import (
- compat_parse_qs,
- compat_str,
- compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
from ..utils import (
ExtractorError,
- find_xpath_attr,
- get_element_by_attribute,
int_or_none,
- NO_DEFAULT,
qualities,
try_get,
unified_strdate,
# add tests.
-class ArteTvIE(InfoExtractor):
- _VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
- IE_NAME = 'arte.tv'
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- lang = mobj.group('lang')
- video_id = mobj.group('id')
-
- ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
- ref_xml_url = ref_xml_url.replace('.html', ',view,asPlayerXml.xml')
- ref_xml_doc = self._download_xml(
- ref_xml_url, video_id, note='Downloading metadata')
- config_node = find_xpath_attr(ref_xml_doc, './/video', 'lang', lang)
- config_xml_url = config_node.attrib['ref']
- config = self._download_xml(
- config_xml_url, video_id, note='Downloading configuration')
-
- formats = [{
- 'format_id': q.attrib['quality'],
- # The playpath starts at 'mp4:', if we don't manually
- # split the url, rtmpdump will incorrectly parse them
- 'url': q.text.split('mp4:', 1)[0],
- 'play_path': 'mp4:' + q.text.split('mp4:', 1)[1],
- 'ext': 'flv',
- 'quality': 2 if q.attrib['quality'] == 'hd' else 1,
- } for q in config.findall('./urls/url')]
- self._sort_formats(formats)
-
- title = config.find('.//name').text
- thumbnail = config.find('.//firstThumbnailUrl').text
- return {
- 'id': video_id,
- 'title': title,
- 'thumbnail': thumbnail,
- 'formats': formats,
- }
-
-
class ArteTVBaseIE(InfoExtractor):
- @classmethod
- def _extract_url_info(cls, url):
- mobj = re.match(cls._VALID_URL, url)
- lang = mobj.group('lang')
- query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
- if 'vid' in query:
- video_id = query['vid'][0]
- else:
- # This is not a real id, it can be for example AJT for the news
- # http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
- video_id = mobj.group('id')
- return video_id, lang
-
def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer']
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
- qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
+ qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
LANGS = {
'fr': 'F',
'de': 'A',
'en': 'E[ANG]',
'es': 'E[ESP]',
+ 'it': 'E[ITA]',
+ 'pl': 'E[POL]',
}
langcode = LANGS.get(lang, lang)
l = re.escape(langcode)
# Language preference from most to least priority
- # Reference: section 5.6.3 of
- # http://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-05.pdf
+ # Reference: section 6.8 of
+ # https://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-07-1.pdf
PREFERENCES = (
# original version in requested language, without subtitles
r'VO{0}$'.format(l),
class ArteTVPlus7IE(ArteTVBaseIE):
IE_NAME = 'arte.tv:+7'
- _VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/(?:[^/]+/)?(?P<lang>fr|de|en|es)/(?:videos/)?(?:[^/]+/)*(?P<id>[^/?#&]+)'
-
- _TESTS = [{
- 'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
- 'only_matching': True,
- }, {
- 'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
- 'only_matching': True,
- }, {
- 'url': 'http://www.arte.tv/de/videos/048696-000-A/der-kluge-bauch-unser-zweites-gehirn',
- 'only_matching': True,
- }]
-
- @classmethod
- def suitable(cls, url):
- return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
-
- def _real_extract(self, url):
- video_id, lang = self._extract_url_info(url)
- webpage = self._download_webpage(url, video_id)
- return self._extract_from_webpage(webpage, video_id, lang)
-
- def _extract_from_webpage(self, webpage, video_id, lang):
- patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
- ids = (video_id, '')
- # some pages contain multiple videos (like
- # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
- # so we first try to look for json URLs that contain the video id from
- # the 'vid' parameter.
- patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
- json_url = self._html_search_regex(
- patterns, webpage, 'json vp url', default=None)
- if not json_url:
- def find_iframe_url(webpage, default=NO_DEFAULT):
- return self._html_search_regex(
- r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
- webpage, 'iframe url', group='url', default=default)
-
- iframe_url = find_iframe_url(webpage, None)
- if not iframe_url:
- embed_url = self._html_search_regex(
- r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
- if embed_url:
- player = self._download_json(
- embed_url, video_id, 'Downloading player page')
- iframe_url = find_iframe_url(player['html'])
- # en and es URLs produce react-based pages with different layout (e.g.
- # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
- if not iframe_url:
- program = self._search_regex(
- r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
- webpage, 'program', default=None)
- if program:
- embed_html = self._parse_json(program, video_id)
- if embed_html:
- iframe_url = find_iframe_url(embed_html['embed_html'])
- if iframe_url:
- json_url = compat_parse_qs(
- compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
- if json_url:
- title = self._search_regex(
- r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
- webpage, 'title', default=None, group='title')
- return self._extract_from_json_url(json_url, video_id, lang, title=title)
- # Different kind of embed URL (e.g.
- # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
- entries = [
- self.url_result(url)
- for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
- return self.playlist_result(entries)
-
-
-# It also uses the arte_vp_url url from the webpage to extract the information
-class ArteTVCreativeIE(ArteTVPlus7IE):
- IE_NAME = 'arte.tv:creative'
- _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
-
- _TESTS = [{
- 'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
- 'info_dict': {
- 'id': '057405-001-A',
- 'ext': 'mp4',
- 'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
- 'upload_date': '20150716',
- },
- }, {
- 'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
- 'playlist_count': 11,
- 'add_ie': ['Youtube'],
- }, {
- 'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
- 'only_matching': True,
- }]
-
-
-class ArteTVInfoIE(ArteTVPlus7IE):
- IE_NAME = 'arte.tv:info'
- _VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
-
- _TESTS = [{
- 'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
- 'info_dict': {
- 'id': '067528-000-A',
- 'ext': 'mp4',
- 'title': 'Service civique, un cache misère ?',
- 'upload_date': '20160403',
- },
- }]
-
-
-class ArteTVFutureIE(ArteTVPlus7IE):
- IE_NAME = 'arte.tv:future'
- _VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
+ _VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
_TESTS = [{
- 'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
+ 'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'info_dict': {
- 'id': '050940-028-A',
+ 'id': '088501-000-A',
'ext': 'mp4',
- 'title': 'Les écrevisses aussi peuvent être anxieuses',
- 'upload_date': '20140902',
+ 'title': 'Mexico: Stealing Petrol to Survive',
+ 'upload_date': '20190628',
},
- }, {
- 'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
- 'only_matching': True,
}]
-
-class ArteTVDDCIE(ArteTVPlus7IE):
- IE_NAME = 'arte.tv:ddc'
- _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
-
- _TESTS = []
-
def _real_extract(self, url):
- video_id, lang = self._extract_url_info(url)
- if lang == 'folge':
- lang = 'de'
- elif lang == 'emission':
- lang = 'fr'
- webpage = self._download_webpage(url, video_id)
- scriptElement = get_element_by_attribute('class', 'visu_video_block', webpage)
- script_url = self._html_search_regex(r'src="(.*?)"', scriptElement, 'script url')
- javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
- json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
- return self._extract_from_json_url(json_url, video_id, lang)
-
-
-class ArteTVConcertIE(ArteTVPlus7IE):
- IE_NAME = 'arte.tv:concert'
- _VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
-
- _TESTS = [{
- 'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
- 'md5': '9ea035b7bd69696b67aa2ccaaa218161',
- 'info_dict': {
- 'id': '186',
- 'ext': 'mp4',
- 'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
- 'upload_date': '20140128',
- 'description': 'md5:486eb08f991552ade77439fe6d82c305',
- },
- }]
-
-
-class ArteTVCinemaIE(ArteTVPlus7IE):
- IE_NAME = 'arte.tv:cinema'
- _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
-
- _TESTS = [{
- 'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
- 'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
- 'info_dict': {
- 'id': '062494-000-A',
- 'ext': 'mp4',
- 'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
- 'upload_date': '20150807',
- },
- }]
-
-
-class ArteTVMagazineIE(ArteTVPlus7IE):
- IE_NAME = 'arte.tv:magazine'
- _VALID_URL = r'https?://(?:www\.)?arte\.tv/magazine/[^/]+/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
-
- _TESTS = [{
- # Embedded via <iframe src="http://www.arte.tv/arte_vp/index.php?json_url=..."
- 'url': 'http://www.arte.tv/magazine/trepalium/fr/entretien-avec-le-realisateur-vincent-lannoo-trepalium',
- 'md5': '2a9369bcccf847d1c741e51416299f25',
- 'info_dict': {
- 'id': '065965-000-A',
- 'ext': 'mp4',
- 'title': 'Trepalium - Extrait Ep.01',
- 'upload_date': '20160121',
- },
- }, {
- # Embedded via <iframe src="http://www.arte.tv/guide/fr/embed/054813-004-A/medium"
- 'url': 'http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium',
- 'md5': 'fedc64fc7a946110fe311634e79782ca',
- 'info_dict': {
- 'id': '054813-004_PLUS7-F',
- 'ext': 'mp4',
- 'title': 'Trepalium (4/6)',
- 'description': 'md5:10057003c34d54e95350be4f9b05cb40',
- 'upload_date': '20160218',
- },
- }, {
- 'url': 'http://www.arte.tv/magazine/metropolis/de/frank-woeste-german-paris-metropolis',
- 'only_matching': True,
- }]
+ lang, video_id = re.match(self._VALID_URL, url).groups()
+ return self._extract_from_json_url(
+ 'https://api.arte.tv/api/player/v1/config/%s/%s' % (lang, video_id),
+ video_id, lang)
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
- http://www\.arte\.tv
- /(?:playerv2/embed|arte_vp/index)\.php\?json_url=
+ https://www\.arte\.tv
+ /player/v3/index\.php\?json_url=
(?P<json_url>
- http://arte\.tv/papi/tvguide/videos/stream/player/
- (?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*
+ https?://api\.arte\.tv/api/player/v1/config/
+ (?P<lang>[^/]+)/(?P<id>\d{6}-\d{3}-[AF])
)
'''
_TESTS = []
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
- lang = mobj.group('lang')
- json_url = mobj.group('json_url')
+ json_url, lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(json_url, video_id, lang)
-class TheOperaPlatformIE(ArteTVPlus7IE):
- IE_NAME = 'theoperaplatform'
- _VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
-
- _TESTS = [{
- 'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
- 'md5': '970655901fa2e82e04c00b955e9afe7b',
- 'info_dict': {
- 'id': '060338-009-A',
- 'ext': 'mp4',
- 'title': 'Verdi - OTELLO',
- 'upload_date': '20160927',
- },
- }]
-
-
class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist'
- _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
+ _VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'
_TESTS = [{
- 'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV',
+ 'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/',
'info_dict': {
- 'id': 'PL-013263',
- 'title': 'Areva & Uramin',
- 'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
+ 'id': 'RC-016954',
+ 'title': 'Earn a Living',
+ 'description': 'md5:d322c55011514b3a7241f7fb80d494c2',
},
'playlist_mincount': 6,
- }, {
- 'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
- 'only_matching': True,
}]
def _real_extract(self, url):
- playlist_id, lang = self._extract_url_info(url)
+ lang, playlist_id = re.match(self._VALID_URL, url).groups()
collection = self._download_json(
'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
% (lang, playlist_id), playlist_id)
from .common import InfoExtractor
from .kaltura import KalturaIE
-from ..utils import (
- extract_attributes,
- remove_end,
-)
+from ..utils import extract_attributes
class AsianCrushIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
+ _VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
+ _VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
- 'description': 'md5:3db14e9186197857e7063522cb89a805',
+ 'description': 'md5:7e986615808bcfb11756eb503a751487',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
'only_matching': True,
+ }, {
+ 'url': 'https://www.yuyutv.com/video/013886v/the-act-of-killing/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.yuyutv.com/video/peep-show/013922v-warring-factions/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.midnightpulp.com/video/010400v/drifters/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.midnightpulp.com/video/mononoke/016378v-zashikiwarashi-part-1/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
+ 'only_matching': True,
}]
def _real_extract(self, url):
- video_id = self._match_id(url)
+ mobj = re.match(self._VALID_URL, url)
+ host = mobj.group('host')
+ video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
- 'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
+ 'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
- return self.url_result(
- 'kaltura:%s:%s' % (partner_id, kaltura_id),
- ie=KalturaIE.ie_key(), video_id=kaltura_id,
- video_title=title)
+ description = self._html_search_regex(
+ r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
+ webpage, 'description', fatal=False)
+
+ return {
+ '_type': 'url_transparent',
+ 'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
+ 'ie_key': KalturaIE.ie_key(),
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ }
class AsianCrushPlaylistIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b'
- _TEST = {
+ _VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE
+ _TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'info_dict': {
'id': '12481',
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
},
'playlist_count': 20,
- }
+ }, {
+ 'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.midnightpulp.com/series/016375s/mononoke/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
playlist_id = self._match_id(url)
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
- title = remove_end(
- self._html_search_regex(
- r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
- 'title', default=None) or self._og_search_title(
- webpage, default=None) or self._html_search_meta(
- 'twitter:title', webpage, 'title',
- default=None) or self._search_regex(
- r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
- ' | AsianCrush')
+ title = self._html_search_regex(
+ r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
+ 'title', default=None) or self._og_search_title(
+ webpage, default=None) or self._html_search_meta(
+ 'twitter:title', webpage, 'title',
+ default=None) or self._search_regex(
+ r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
+ if title:
+ title = re.sub(r'\s*\|\s*.+?$', '', title)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/(?:clips|audiovideo/popular)[/#]|
radio/player/|
+ sounds/play/|
events/[^/]+/play/[^/]+/
)
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
- 'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
+ 'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
},
'params': {
# rtmp download
'skip_download': True,
},
+ }, {
+ 'url': 'https://www.bbc.co.uk/sounds/play/m0007jzb',
+ 'note': 'Audio',
+ 'info_dict': {
+ 'id': 'm0007jz9',
+ 'ext': 'mp4',
+ 'title': 'BBC Proms, 2019, Prom 34: West–Eastern Divan Orchestra',
+ 'description': "Live BBC Proms. West–Eastern Divan Orchestra with Daniel Barenboim and Martha Argerich.",
+ 'duration': 9840,
+ },
+ 'params': {
+ # rtmp download
+ 'skip_download': True,
+ }
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
'url': 'http://www.bbc.com/news/world-europe-32668511',
'info_dict': {
'id': 'world-europe-32668511',
- 'title': 'Russia stages massive WW2 parade despite Western boycott',
+ 'title': 'Russia stages massive WW2 parade',
'description': 'md5:00ff61976f6081841f759a08bf78cc9c',
},
'playlist_count': 2,
class BeamProVodIE(BeamProBaseIE):
IE_NAME = 'Mixer:vod'
- _VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>\w+)'
+ _VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://mixer.com/willow8714?vod=2259830',
'md5': 'b2431e6e8347dc92ebafb565d368b76b',
}, {
'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
'only_matching': True,
+ }, {
+ 'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
+ 'only_matching': True,
}]
@staticmethod
# api/v6 v2
'url': 'https://beeg.com/1941093077?t=911-1391',
'only_matching': True,
+ }, {
+ # api/v6 v2 w/o t
+ 'url': 'https://beeg.com/1277207756',
+ 'only_matching': True,
}, {
'url': 'https://beeg.porn/video/5416503',
'only_matching': True,
r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage, 'beeg version',
default='1546225636701')
- qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
- t = qs.get('t', [''])[0].split('-')
- if len(t) > 1:
+ # IDs with 10 or more characters use API v2, where the `t` range is optional
+ if len(video_id) >= 10:
query = {
'v': 2,
- 's': t[0],
- 'e': t[1],
}
+ qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+ t = qs.get('t', [''])[0].split('-')
+ if len(t) > 1:
+ query.update({
+ 's': t[0],
+ 'e': t[1],
+ })
else:
query = {'v': 1}
ExtractorError,
remove_end,
)
-from .rudo import RudoIE
class BioBioChileTVIE(InfoExtractor):
}, {
'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
'info_dict': {
- 'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
+ 'id': 'b4xd0LK3SK',
'ext': 'mp4',
- 'uploader': '(none)',
- 'upload_date': '20160708',
- 'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
+ # TODO: fix url_transparent information overriding
+ # 'uploader': 'Juan Pablo Echenique',
+ 'title': 'Comentario Oscar Cáceres',
+ },
+ 'params': {
+ # empty m3u8 manifest
+ 'skip_download': True,
},
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
webpage = self._download_webpage(url, video_id)
- rudo_url = RudoIE._extract_url(webpage)
+ rudo_url = self._search_regex(
+ r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
+ webpage, 'embed URL', None, group='url')
if not rudo_url:
raise ExtractorError('No videos found')
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
- r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
+ r'<a[^>]+href=["\'](?:https?://(?:busca|www)\.biobiochile\.cl)?/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
return {
video = article_data.get('video')
if video:
video_type = video['type']
- if video_type == 'cms.bleacherreport.com':
+ if video_type in ('cms.bleacherreport.com', 'vid.bleacherreport.com'):
info['url'] = 'http://bleacherreport.com/video_embed?id=%s' % video['id']
elif video_type == 'ooyala.com':
info['url'] = 'ooyala:%s' % video['id']
class BleacherReportCMSIE(AMPIE):
- _VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36})'
+ _VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
_TESTS = [{
- 'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
+ 'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
'md5': '2e4b0a997f9228ffa31fada5c53d1ed1',
'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
def _real_extract(self, url):
video_id = self._match_id(url)
- info = self._extract_feed_info('http://cms.bleacherreport.com/media/items/%s/akamai.json' % video_id)
+ info = self._extract_feed_info('http://vid.bleacherreport.com/videos/%s.akamai' % video_id)
info['id'] = video_id
return info
* "preference" (optional, int) - quality of the image
* "width" (optional, int)
* "height" (optional, int)
- * "resolution" (optional, string "{width}x{height"},
+ * "resolution" (optional, string "{width}x{height}",
deprecated)
* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image.
from .common import InfoExtractor
from ..utils import unified_timestamp
+from .youtube import YoutubeIE
class CtsNewsIE(InfoExtractor):
'info_dict': {
'id': '201501291578109',
'ext': 'mp4',
- 'title': '以色列.真主黨交火 3人死亡',
- 'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人...',
+ 'title': '以色列.真主黨交火 3人死亡 - 華視新聞網',
+ 'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人員也不幸罹難。大陸陝西、河南、安徽、江蘇和湖北五個省份出現大暴雪,嚴重影響陸空交通,不過九華山卻出現...',
'timestamp': 1422528540,
'upload_date': '20150129',
}
'info_dict': {
'id': '201309031304098',
'ext': 'mp4',
- 'title': '韓國31歲童顏男 貌如十多歲小孩',
+ 'title': '韓國31歲童顏男 貌如十多歲小孩 - 華視新聞網',
'description': '越有年紀的人,越希望看起來年輕一點,而南韓卻有一位31歲的男子,看起來像是11、12歲的小孩,身...',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1378205880,
video_url = mp4_feed['source_url']
else:
self.to_screen('Not CTSPlayer video, trying Youtube...')
- youtube_url = self._search_regex(
- r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
+ youtube_url = YoutubeIE._extract_url(page)
return self.url_result(youtube_url, ie='Youtube')
class DailymotionIE(DailymotionBaseInfoExtractor):
- _VALID_URL = r'(?i)https?://(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|#)/)?video|swf)/(?P<id>[^/?_]+)'
+ _VALID_URL = r'''(?ix)
+ https?://
+ (?:
+ (?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
+ (?:www\.)?lequipe\.fr/video
+ )
+ /(?P<id>[^/?_]+)
+ '''
IE_NAME = 'dailymotion'
_FORMATS = [
}, {
'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun',
'only_matching': True,
+ }, {
+ 'url': 'https://www.lequipe.fr/video/x791mem',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
+ 'only_matching': True,
}]
@staticmethod
class DBTVIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
+ _VALID_URL = r'https?://(?:www\.)?dagbladet\.no/video/(?:(?:embed|(?P<display_id>[^/]+))/)?(?P<id>[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8})'
_TESTS = [{
- 'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
- 'md5': '2e24f67936517b143a234b4cadf792ec',
+ 'url': 'https://www.dagbladet.no/video/PynxJnNWChE/',
+ 'md5': 'b8f850ba1860adbda668d367f9b77699',
'info_dict': {
- 'id': '3649835190001',
- 'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
+ 'id': 'PynxJnNWChE',
'ext': 'mp4',
'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
- 'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
+ 'description': 'md5:49cc8370e7d66e8a2ef15c3b4631fd3f',
'thumbnail': r're:https?://.*\.jpg',
- 'timestamp': 1404039863,
- 'upload_date': '20140629',
- 'duration': 69.544,
- 'uploader_id': '1027729757001',
+ 'upload_date': '20160916',
+ 'duration': 69,
+ 'uploader_id': 'UCk5pvsyZJoYJBd7_oFPTlRQ',
+ 'uploader': 'Dagbladet',
},
- 'add_ie': ['BrightcoveNew']
+ 'add_ie': ['Youtube']
}, {
- 'url': 'http://dbtv.no/3649835190001',
+ 'url': 'https://www.dagbladet.no/video/embed/xlGmyIeN9Jo/?autoplay=false',
'only_matching': True,
}, {
- 'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
- 'only_matching': True,
- }, {
- 'url': 'http://dbtv.no/vice/5000634109001',
- 'only_matching': True,
- }, {
- 'url': 'http://dbtv.no/filmtrailer/3359293614001',
+ 'url': 'https://www.dagbladet.no/video/truer-iran-bor-passe-dere/PalfB2Cw',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
- r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1',
+ r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dagbladet\.no/video/embed/(?:[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8}).*?)\1',
webpage)]
def _real_extract(self, url):
- video_id, display_id = re.match(self._VALID_URL, url).groups()
-
- return {
+ display_id, video_id = re.match(self._VALID_URL, url).groups()
+ info = {
'_type': 'url_transparent',
- 'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
'id': video_id,
'display_id': display_id,
- 'ie_key': 'BrightcoveNew',
}
+ # 11-character IDs are YouTube video IDs; 8-character IDs are JW Platform media IDs
+ if len(video_id) == 11:
+ info.update({
+ 'url': video_id,
+ 'ie_key': 'Youtube',
+ })
+ else:
+ info.update({
+ 'url': 'jwplatform:' + video_id,
+ 'ie_key': 'JWPlatform',
+ })
+ return info
import string
from .discoverygo import DiscoveryGoBaseIE
-from ..compat import (
- compat_str,
- compat_urllib_parse_unquote,
-)
-from ..utils import (
- ExtractorError,
- try_get,
-)
+from ..compat import compat_urllib_parse_unquote
+from ..utils import ExtractorError
from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://
(?P<site>
+ (?:(?:www|go)\.)?discovery|
(?:www\.)?
(?:
- discovery|
investigationdiscovery|
discoverylife|
animalplanet|
cookingchanneltv|
motortrend
)
- )\.com(?P<path>/tv-shows/[^/]+/(?:video|full-episode)s/(?P<id>[^./?#]+))'''
+ )\.com/tv-shows/(?P<show_slug>[^/]+)/(?:video|full-episode)s/(?P<id>[^./?#]+)'''
_TESTS = [{
- 'url': 'https://www.discovery.com/tv-shows/cash-cab/videos/dave-foley',
+ 'url': 'https://go.discovery.com/tv-shows/cash-cab/videos/riding-with-matthew-perry',
'info_dict': {
- 'id': '5a2d9b4d6b66d17a5026e1fd',
+ 'id': '5a2f35ce6b66d17a5026e29e',
'ext': 'mp4',
- 'title': 'Dave Foley',
- 'description': 'md5:4b39bcafccf9167ca42810eb5f28b01f',
- 'duration': 608,
+ 'title': 'Riding with Matthew Perry',
+ 'description': 'md5:a34333153e79bc4526019a5129e7f878',
+ 'duration': 84,
},
'params': {
'skip_download': True, # requires ffmpeg
}, {
'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision',
'only_matching': True,
+ }, {
+ 'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road',
+ 'only_matching': True,
+ }, {
+ # using `show_slug` is important to get the correct video data
+ 'url': 'https://www.sciencechannel.com/tv-shows/mythbusters-on-science/full-episodes/christmas-special',
+ 'only_matching': True,
}]
_GEO_COUNTRIES = ['US']
_GEO_BYPASS = False
+ _API_BASE_URL = 'https://api.discovery.com/v1/'
def _real_extract(self, url):
- site, path, display_id = re.match(self._VALID_URL, url).groups()
- webpage = self._download_webpage(url, display_id)
-
- react_data = self._parse_json(self._search_regex(
- r'window\.__reactTransmitPacket\s*=\s*({.+?});',
- webpage, 'react data'), display_id)
- content_blocks = react_data['layout'][path]['contentBlocks']
- video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
- video_id = video['id']
+ site, show_slug, display_id = re.match(self._VALID_URL, url).groups()
access_token = None
cookies = self._get_cookies(url)
if auth_storage_cookie and auth_storage_cookie.value:
auth_storage = self._parse_json(compat_urllib_parse_unquote(
compat_urllib_parse_unquote(auth_storage_cookie.value)),
- video_id, fatal=False) or {}
+ display_id, fatal=False) or {}
access_token = auth_storage.get('a') or auth_storage.get('access_token')
if not access_token:
access_token = self._download_json(
- 'https://%s.com/anonymous' % site, display_id, query={
+ 'https://%s.com/anonymous' % site, display_id,
+ 'Downloading token JSON metadata', query={
'authRel': 'authorization',
- 'client_id': try_get(
- react_data, lambda x: x['application']['apiClientId'],
- compat_str) or '3020a40c2356a645b4b4',
+ 'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token']
- try:
- headers = self.geo_verification_headers()
- headers['Authorization'] = 'Bearer ' + access_token
+ headers = self.geo_verification_headers()
+ headers['Authorization'] = 'Bearer ' + access_token
+ try:
+ video = self._download_json(
+ self._API_BASE_URL + 'content/videos',
+ display_id, 'Downloading content JSON metadata',
+ headers=headers, query={
+ 'embed': 'show.name',
+ 'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags',
+ 'slug': display_id,
+ 'show_slug': show_slug,
+ })[0]
+ video_id = video['id']
stream = self._download_json(
- 'https://api.discovery.com/v1/streaming/video/' + video_id,
- display_id, headers=headers)
+ self._API_BASE_URL + 'streaming/video/' + video_id,
+ display_id, 'Downloading streaming JSON metadata', headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json(
--- /dev/null
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class DLiveVODIE(InfoExtractor):
+ IE_NAME = 'dlive:vod'
+ _VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[^/?#&]+)'
+ _TESTS = [{
+ 'url': 'https://dlive.tv/p/pdp+3mTzOl4WR',
+ 'info_dict': {
+ 'id': '3mTzOl4WR',
+ 'ext': 'mp4',
+ 'title': 'Minecraft with james charles epic',
+ 'upload_date': '20190701',
+ 'timestamp': 1562011015,
+ 'uploader_id': 'pdp',
+ }
+ }, {
+ 'url': 'https://dlive.tv/p/pdpreplay+D-RD-xSZg',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ uploader_id, vod_id = re.match(self._VALID_URL, url).groups()
+ broadcast = self._download_json(
+ 'https://graphigo.prd.dlive.tv/', vod_id,
+ data=json.dumps({'query': '''query {
+ pastBroadcast(permlink:"%s+%s") {
+ content
+ createdAt
+ length
+ playbackUrl
+ title
+ thumbnailUrl
+ viewCount
+ }
+}''' % (uploader_id, vod_id)}).encode())['data']['pastBroadcast']
+ title = broadcast['title']
+ formats = self._extract_m3u8_formats(
+ broadcast['playbackUrl'], vod_id, 'mp4', 'm3u8_native')
+ self._sort_formats(formats)
+ return {
+ 'id': vod_id,
+ 'title': title,
+ 'uploader_id': uploader_id,
+ 'formats': formats,
+ 'description': broadcast.get('content'),
+ 'thumbnail': broadcast.get('thumbnailUrl'),
+ 'timestamp': int_or_none(broadcast.get('createdAt'), 1000),
+ 'view_count': int_or_none(broadcast.get('viewCount')),
+ }
+
+
+class DLiveStreamIE(InfoExtractor):
+ IE_NAME = 'dlive:stream'
+ _VALID_URL = r'https?://(?:www\.)?dlive\.tv/(?!p/)(?P<id>[\w.-]+)'
+
+ def _real_extract(self, url):
+ display_name = self._match_id(url)
+ user = self._download_json(
+ 'https://graphigo.prd.dlive.tv/', display_name,
+ data=json.dumps({'query': '''query {
+ userByDisplayName(displayname:"%s") {
+ livestream {
+ content
+ createdAt
+ title
+ thumbnailUrl
+ watchingCount
+ }
+ username
+ }
+}''' % display_name}).encode())['data']['userByDisplayName']
+ livestream = user['livestream']
+ title = livestream['title']
+ username = user['username']
+ formats = self._extract_m3u8_formats(
+ 'https://live.prd.dlive.tv/hls/live/%s.m3u8' % username,
+ display_name, 'mp4')
+ self._sort_formats(formats)
+ return {
+ 'id': display_name,
+ 'title': self._live_title(title),
+ 'uploader': display_name,
+ 'uploader_id': username,
+ 'formats': formats,
+ 'description': livestream.get('content'),
+ 'thumbnail': livestream.get('thumbnailUrl'),
+ 'is_live': True,
+ 'timestamp': int_or_none(livestream.get('createdAt'), 1000),
+ 'view_count': int_or_none(livestream.get('watchingCount')),
+ }
from __future__ import unicode_literals
import json
+import re
from .common import InfoExtractor
from ..compat import (
class EinthusanIE(InfoExtractor):
- _VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
+ _VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com|ca))/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035',
}, {
'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
'only_matching': True,
+ }, {
+ 'url': 'https://einthusan.com/movie/watch/9097/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://einthusan.ca/movie/watch/4E9n/?lang=hindi',
+ 'only_matching': True,
}]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
)).decode('utf-8'), video_id)
def _real_extract(self, url):
- video_id = self._match_id(url)
+ mobj = re.match(self._VALID_URL, url)
+ host = mobj.group('host')
+ video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
page_id = self._html_search_regex(
'<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
video_data = self._download_json(
- 'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
+ 'https://%s/ajax/movie/watch/%s/' % (host, video_id), video_id,
data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({
_TEST = {
'url': 'http://fivethirtyeight.com/features/how-the-6-8-raiders-can-still-make-the-playoffs/',
'info_dict': {
- 'id': '21846851',
- 'ext': 'mp4',
+ 'id': '56032156',
+ 'ext': 'flv',
'title': 'FiveThirtyEight: The Raiders can still make the playoffs',
'description': 'Neil Paine breaks down the simplest scenario that will put the Raiders into the playoffs at 8-8.',
- 'timestamp': 1513960621,
- 'upload_date': '20171222',
},
'params': {
'skip_download': True,
},
- 'expected_warnings': ['Unable to download f4m manifest'],
}
def _real_extract(self, url):
webpage = self._download_webpage(url, video_id)
- video_id = self._search_regex(
- r'data-video-id=["\'](?P<id>\d+)',
- webpage, 'video id', group='id')
+ embed_url = self._search_regex(
+ r'<iframe[^>]+src=["\'](https?://fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/\d+)',
+ webpage, 'embed url')
- return self.url_result(
- 'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())
+ return self.url_result(embed_url, 'AbcNewsVideo')
ARDMediathekIE,
)
from .arte import (
- ArteTvIE,
ArteTVPlus7IE,
- ArteTVCreativeIE,
- ArteTVConcertIE,
- ArteTVInfoIE,
- ArteTVFutureIE,
- ArteTVCinemaIE,
- ArteTVDDCIE,
- ArteTVMagazineIE,
ArteTVEmbedIE,
- TheOperaPlatformIE,
ArteTVPlaylistIE,
)
from .asiancrush import (
FrontendMastersCourseIE
)
from .funimation import FunimationIE
-from .funk import (
- FunkMixIE,
- FunkChannelIE,
-)
-from .funnyordie import FunnyOrDieIE
+from .funk import FunkIE
from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE
)
from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE
+from .livejournal import LiveJournalIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
from .rtvnh import RTVNHIE
from .rtvs import RTVSIE
-from .rudo import RudoIE
from .ruhd import RUHDIE
from .rutube import (
RutubeIE,
from .ufctv import UFCTVIE
from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE
+from .dlive import (
+ DLiveVODIE,
+ DLiveStreamIE,
+)
from .umg import UMGDeIE
from .unistra import UnistraIE
from .unity import UnityIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
+ XHamsterUserIE,
)
from .xiami import (
XiamiSongIE,
YahooSearchIE,
YahooGyaOPlayerIE,
YahooGyaOIE,
+ YahooJapanNewsIE,
)
from .yandexdisk import YandexDiskIE
from .yandexmusic import (
timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None))
- thumbnail = self._og_search_thumbnail(webpage)
+ thumbnail = self._html_search_meta(['og:image', 'twitter:image'], webpage)
view_count = parse_count(self._search_regex(
r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count',
class FiveTVIE(InfoExtractor):
_VALID_URL = r'''(?x)
- http://
+ https?://
(?:www\.)?5-tv\.ru/
(?:
(?:[^/]+/)+(?P<id>\d+)|
'duration': 180,
},
}, {
+ # redirect to https://www.5-tv.ru/projects/1000095/izvestia-glavnoe/
'url': 'http://www.5-tv.ru/glavnoe/#itemDetails',
'info_dict': {
'id': 'glavnoe',
'title': r're:^Итоги недели с \d+ по \d+ \w+ \d{4} года$',
'thumbnail': r're:^https?://.*\.jpg$',
},
+ 'skip': 'redirect to «Известия. Главное» project page',
}, {
'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/',
'only_matching': True,
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
- [r'<div[^>]+?class="flowplayer[^>]+?data-href="([^"]+)"',
+ [r'<div[^>]+?class="(?:flow)?player[^>]+?data-href="([^"]+)"',
r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"'],
webpage, 'video url')
# coding: utf-8
from __future__ import unicode_literals
-import itertools
import re
from .common import InfoExtractor
from .nexx import NexxIE
-from ..compat import compat_str
from ..utils import (
int_or_none,
- try_get,
+ str_or_none,
)
-class FunkBaseIE(InfoExtractor):
- _HEADERS = {
- 'Accept': '*/*',
- 'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
- 'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4',
- }
- _AUTH = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4'
-
- @staticmethod
- def _make_headers(referer):
- headers = FunkBaseIE._HEADERS.copy()
- headers['Referer'] = referer
- return headers
-
- def _make_url_result(self, video):
- return {
- '_type': 'url_transparent',
- 'url': 'nexx:741:%s' % video['sourceId'],
- 'ie_key': NexxIE.ie_key(),
- 'id': video['sourceId'],
- 'title': video.get('title'),
- 'description': video.get('description'),
- 'duration': int_or_none(video.get('duration')),
- 'season_number': int_or_none(video.get('seasonNr')),
- 'episode_number': int_or_none(video.get('episodeNr')),
- }
-
-
-class FunkMixIE(FunkBaseIE):
- _VALID_URL = r'https?://(?:www\.)?funk\.net/mix/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
+class FunkIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?funk\.net/(?:channel|playlist)/[^/]+/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{
- 'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/die-realste-kifferdoku-aller-zeiten',
- 'md5': '8edf617c2f2b7c9847dfda313f199009',
- 'info_dict': {
- 'id': '123748',
- 'ext': 'mp4',
- 'title': '"Die realste Kifferdoku aller Zeiten"',
- 'description': 'md5:c97160f5bafa8d47ec8e2e461012aa9d',
- 'timestamp': 1490274721,
- 'upload_date': '20170323',
- },
- }]
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- mix_id = mobj.group('id')
- alias = mobj.group('alias')
-
- lists = self._download_json(
- 'https://www.funk.net/api/v3.1/curation/curatedLists/',
- mix_id, headers=self._make_headers(url), query={
- 'size': 100,
- })['_embedded']['curatedListList']
-
- metas = next(
- l for l in lists
- if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas']
- video = next(
- meta['videoDataDelegate']
- for meta in metas
- if try_get(
- meta, lambda x: x['videoDataDelegate']['alias'],
- compat_str) == alias)
-
- return self._make_url_result(video)
-
-
-class FunkChannelIE(FunkBaseIE):
- _VALID_URL = r'https?://(?:www\.)?funk\.net/channel/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
- _TESTS = [{
- 'url': 'https://www.funk.net/channel/ba/die-lustigsten-instrumente-aus-dem-internet-teil-2',
+ 'url': 'https://www.funk.net/channel/ba-793/die-lustigsten-instrumente-aus-dem-internet-teil-2-1155821',
+ 'md5': '8dd9d9ab59b4aa4173b3197f2ea48e81',
'info_dict': {
'id': '1155821',
'ext': 'mp4',
'timestamp': 1514507395,
'upload_date': '20171229',
},
- 'params': {
- 'skip_download': True,
- },
- }, {
- # only available via byIdList API
- 'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
- 'info_dict': {
- 'id': '205067',
- 'ext': 'mp4',
- 'title': 'Martin Sonneborn erklärt die EU',
- 'description': 'md5:050f74626e4ed87edf4626d2024210c0',
- 'timestamp': 1494424042,
- 'upload_date': '20170510',
- },
- 'params': {
- 'skip_download': True,
- },
}, {
- 'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
+ 'url': 'https://www.funk.net/playlist/neuesteVideos/kameras-auf-dem-fusion-festival-1618699',
'only_matching': True,
}]
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- channel_id = mobj.group('id')
- alias = mobj.group('alias')
-
- headers = self._make_headers(url)
-
- video = None
-
- # Id-based channels are currently broken on their side: webplayer
- # tries to process them via byChannelAlias endpoint and fails
- # predictably.
- for page_num in itertools.count():
- by_channel_alias = self._download_json(
- 'https://www.funk.net/api/v3.1/webapp/videos/byChannelAlias/%s'
- % channel_id,
- 'Downloading byChannelAlias JSON page %d' % (page_num + 1),
- headers=headers, query={
- 'filterFsk': 'false',
- 'sort': 'creationDate,desc',
- 'size': 100,
- 'page': page_num,
- }, fatal=False)
- if not by_channel_alias:
- break
- video_list = try_get(
- by_channel_alias, lambda x: x['_embedded']['videoList'], list)
- if not video_list:
- break
- try:
- video = next(r for r in video_list if r.get('alias') == alias)
- break
- except StopIteration:
- pass
- if not try_get(
- by_channel_alias, lambda x: x['_links']['next']):
- break
-
- if not video:
- by_id_list = self._download_json(
- 'https://www.funk.net/api/v3.0/content/videos/byIdList',
- channel_id, 'Downloading byIdList JSON', headers=headers,
- query={
- 'ids': alias,
- }, fatal=False)
- if by_id_list:
- video = try_get(by_id_list, lambda x: x['result'][0], dict)
-
- if not video:
- results = self._download_json(
- 'https://www.funk.net/api/v3.0/content/videos/filter',
- channel_id, 'Downloading filter JSON', headers=headers, query={
- 'channelId': channel_id,
- 'size': 100,
- })['result']
- video = next(r for r in results if r.get('alias') == alias)
-
- return self._make_url_result(video)
+ display_id, nexx_id = re.match(self._VALID_URL, url).groups()
+ video = self._download_json(
+ 'https://www.funk.net/api/v4.0/videos/' + nexx_id, nexx_id)
+ return {
+ '_type': 'url_transparent',
+ 'url': 'nexx:741:' + nexx_id,
+ 'ie_key': NexxIE.ie_key(),
+ 'id': nexx_id,
+ 'title': video.get('title'),
+ 'description': video.get('description'),
+ 'duration': int_or_none(video.get('duration')),
+ 'channel_id': str_or_none(video.get('channelId')),
+ 'display_id': display_id,
+ 'tags': video.get('tags'),
+ 'thumbnail': video.get('imageUrlLandscape'),
+ }
+++ /dev/null
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
- ExtractorError,
- float_or_none,
- int_or_none,
- unified_timestamp,
-)
-
-
-class FunnyOrDieIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|articles|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
- _TESTS = [{
- 'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
- 'md5': 'bcd81e0c4f26189ee09be362ad6e6ba9',
- 'info_dict': {
- 'id': '0732f586d7',
- 'ext': 'mp4',
- 'title': 'Heart-Shaped Box: Literal Video Version',
- 'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
- 'thumbnail': r're:^http:.*\.jpg$',
- 'uploader': 'DASjr',
- 'timestamp': 1317904928,
- 'upload_date': '20111006',
- 'duration': 318.3,
- },
- }, {
- 'url': 'http://www.funnyordie.com/embed/e402820827',
- 'info_dict': {
- 'id': 'e402820827',
- 'ext': 'mp4',
- 'title': 'Please Use This Song (Jon Lajoie)',
- 'description': 'Please use this to sell something. www.jonlajoie.com',
- 'thumbnail': r're:^http:.*\.jpg$',
- 'timestamp': 1398988800,
- 'upload_date': '20140502',
- },
- 'params': {
- 'skip_download': True,
- },
- }, {
- 'url': 'http://www.funnyordie.com/articles/ebf5e34fc8/10-hours-of-walking-in-nyc-as-a-man',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
-
- video_id = mobj.group('id')
- webpage = self._download_webpage(url, video_id)
-
- links = re.findall(r'<source src="([^"]+/v)[^"]+\.([^"]+)" type=\'video', webpage)
- if not links:
- raise ExtractorError('No media links available for %s' % video_id)
-
- links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
-
- m3u8_url = self._search_regex(
- r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
- webpage, 'm3u8 url', group='url')
-
- formats = []
-
- m3u8_formats = self._extract_m3u8_formats(
- m3u8_url, video_id, 'mp4', 'm3u8_native',
- m3u8_id='hls', fatal=False)
- source_formats = list(filter(
- lambda f: f.get('vcodec') != 'none', m3u8_formats))
-
- bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)(?=[,/])', m3u8_url)]
- bitrates.sort()
-
- if source_formats:
- self._sort_formats(source_formats)
-
- for bitrate, f in zip(bitrates, source_formats or [{}] * len(bitrates)):
- for path, ext in links:
- ff = f.copy()
- if ff:
- if ext != 'mp4':
- ff = dict(
- [(k, v) for k, v in ff.items()
- if k in ('height', 'width', 'format_id')])
- ff.update({
- 'format_id': ff['format_id'].replace('hls', ext),
- 'ext': ext,
- 'protocol': 'http',
- })
- else:
- ff.update({
- 'format_id': '%s-%d' % (ext, bitrate),
- 'vbr': bitrate,
- })
- ff['url'] = self._proto_relative_url(
- '%s%d.%s' % (path, bitrate, ext))
- formats.append(ff)
- self._check_formats(formats, video_id)
-
- formats.extend(m3u8_formats)
- self._sort_formats(
- formats, field_preference=('height', 'width', 'tbr', 'format_id'))
-
- subtitles = {}
- for src, src_lang in re.findall(r'<track kind="captions" src="([^"]+)" srclang="([^"]+)"', webpage):
- subtitles[src_lang] = [{
- 'ext': src.split('/')[-1],
- 'url': 'http://www.funnyordie.com%s' % src,
- }]
-
- timestamp = unified_timestamp(self._html_search_meta(
- 'uploadDate', webpage, 'timestamp', default=None))
-
- uploader = self._html_search_regex(
- r'<h\d[^>]+\bclass=["\']channel-preview-name[^>]+>(.+?)</h',
- webpage, 'uploader', default=None)
-
- title, description, thumbnail, duration = [None] * 4
-
- medium = self._parse_json(
- self._search_regex(
- r'jsonMedium\s*=\s*({.+?});', webpage, 'JSON medium',
- default='{}'),
- video_id, fatal=False)
- if medium:
- title = medium.get('title')
- duration = float_or_none(medium.get('duration'))
- if not timestamp:
- timestamp = unified_timestamp(medium.get('publishDate'))
-
- post = self._parse_json(
- self._search_regex(
- r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details',
- default='{}'),
- video_id, fatal=False)
- if post:
- if not title:
- title = post.get('name')
- description = post.get('description')
- thumbnail = post.get('picture')
-
- if not title:
- title = self._og_search_title(webpage)
- if not description:
- description = self._og_search_description(webpage)
- if not duration:
- duration = int_or_none(self._html_search_meta(
- ('video:duration', 'duration'), webpage, 'duration', default=False))
-
- return {
- 'id': video_id,
- 'title': title,
- 'description': description,
- 'thumbnail': thumbnail,
- 'uploader': uploader,
- 'timestamp': timestamp,
- 'duration': duration,
- 'formats': formats,
- 'subtitles': subtitles,
- }
# coding: utf-8
from __future__ import unicode_literals
+from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
+from ..utils import (
+ clean_html,
+ get_element_by_class,
+ get_element_by_id,
+)
class GameInformerIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>.+)\.aspx'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>[^.?&#]+)'
+ _TESTS = [{
+ # normal Brightcove embed code extracted with BrightcoveNewIE._extract_url
'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx',
'md5': '292f26da1ab4beb4c9099f1304d2b071',
'info_dict': {
'upload_date': '20150928',
'uploader_id': '694940074001',
},
- }
+ }, {
+ # Brightcove id inside unique element with field--name-field-brightcove-video-id class
+ 'url': 'https://www.gameinformer.com/video-feature/new-gameplay-today/2019/07/09/new-gameplay-today-streets-of-rogue',
+ 'info_dict': {
+ 'id': '6057111913001',
+ 'ext': 'mp4',
+ 'title': 'New Gameplay Today – Streets Of Rogue',
+ 'timestamp': 1562699001,
+ 'upload_date': '20190709',
+ 'uploader_id': '694940074001',
+
+ },
+ }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(
url, display_id, headers=self.geo_verification_headers())
- brightcove_id = self._search_regex(
- [r'<[^>]+\bid=["\']bc_(\d+)', r"getVideo\('[^']+video_id=(\d+)"],
- webpage, 'brightcove id')
- return self.url_result(
- self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew',
- brightcove_id)
+ brightcove_id = clean_html(get_element_by_class('field--name-field-brightcove-video-id', webpage) or get_element_by_id('video-source-content', webpage))
+ brightcove_url = self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id if brightcove_id else BrightcoveNewIE._extract_url(self, webpage)
+ return self.url_result(brightcove_url, 'BrightcoveNew', brightcove_id)
},
'playlist_count': 6,
},
+ {
+ # Squarespace video embed, 2019-08-28
+ 'url': 'http://ootboxford.com',
+ 'info_dict': {
+ 'id': 'Tc7b_JGdZfw',
+ 'title': 'Out of the Blue, at Childish Things 10',
+ 'ext': 'mp4',
+ 'description': 'md5:a83d0026666cf5ee970f8bd1cfd69c7f',
+ 'uploader_id': 'helendouglashouse',
+ 'uploader': 'Helen & Douglas House',
+ 'upload_date': '20140328',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ },
{
# Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
default_search = 'fixup_error'
if default_search in ('auto', 'auto_warning', 'fixup_error'):
- if '/' in url:
+ if re.match(r'^[^\s/]+\.[^\s/]+/', url):
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
elif default_search != 'fixup_error':
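Review note: the stricter check in this hunk can be exercised in isolation. The following is a minimal sketch, not youtube-dl code; `looks_like_schemeless_url` is a hypothetical helper that wraps the same regular expression to show which inputs are now treated as protocol-less URLs rather than as search terms.

```python
import re

# Same pattern as in the hunk: "host.tld/" with no whitespace or slashes
# before the first "/", so free text that merely contains "/" does not match.
SCHEMELESS_URL_RE = r'^[^\s/]+\.[^\s/]+/'


def looks_like_schemeless_url(url):
    """Hypothetical helper: True if url looks like 'example.com/...'."""
    return re.match(SCHEMELESS_URL_RE, url) is not None
```

With the previous `'/' in url` test, an input such as `AC/DC live` would have been retried as `http://AC/DC live`; with the regular expression it falls through to the remaining `default_search` handling.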
# Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse_unquote(webpage)
+ # Unescape Squarespace embeds so that the generic extractor can detect them,
+ # see https://github.com/ytdl-org/youtube-dl/issues/21294
+ webpage = re.sub(
+ r'<div[^>]+class=[^>]*?\bsqs-video-wrapper\b[^>]*>',
+ lambda x: unescapeHTML(x.group(0)), webpage)
+
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
class GfycatIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
+ _VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
_TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': {
'categories': list,
'age_limit': 0,
}
+ }, {
+ 'url': 'https://gfycat.com/ru/RemarkableDrearyAmurstarfish',
+ 'only_matching': True
}, {
'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull',
'only_matching': True
'watchdisneyxd': {
'brand': '009',
'resource_id': 'DisneyXD',
+ },
+ 'disneynow': {
+ 'brand': '011',
+ 'resource_id': 'Disney',
}
}
- _VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|disneynow)\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
+ _VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
% '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
_TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
display_id)['video']
def _real_extract(self, url):
- sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
+ mobj = re.match(self._VALID_URL, url)
+ sub_domain = mobj.group('sub_domain') or mobj.group('sub_domain_2')
+ video_id, display_id = mobj.group('id', 'display_id')
site_info = self._SITE_INFO.get(sub_domain, {})
brand = site_info.get('brand')
if not video_id or not site_info:
{
'url': 'https://www.kaltura.com:443/index.php/extwidget/preview/partner_id/1770401/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe?&flashvars[streamerType]=auto',
'only_matching': True,
+ },
+ {
+ # unavailable source format
+ 'url': 'kaltura:513551:1_66x4rg7o',
+ 'only_matching': True,
}
]
f['fileExt'] = 'mp4'
video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
+ format_id = '%(fileExt)s-%(bitrate)s' % f
+ # Source format may not be available (e.g. kaltura:513551:1_66x4rg7o)
+ if f.get('isOriginal') is True and not self._is_valid_url(
+ video_url, entry_id, format_id):
+ continue
# audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
# -f mp4-56)
vcodec = 'none' if 'videoCodecId' not in f and f.get(
'frameRate') == 0 else f.get('videoCodecId')
formats.append({
- 'format_id': '%(fileExt)s-%(bitrate)s' % f,
+ 'format_id': format_id,
'ext': f.get('fileExt'),
'tbr': int_or_none(f['bitrate']),
'fps': int_or_none(f.get('frameRate')),
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
+ clean_html,
determine_ext,
- extract_attributes,
ExtractorError,
float_or_none,
int_or_none,
class LecturioBaseIE(InfoExtractor):
+ _API_BASE_URL = 'https://app.lecturio.com/api/en/latest/html5/'
_LOGIN_URL = 'https://app.lecturio.com/en/login'
_NETRC_MACHINE = 'lecturio'
_VALID_URL = r'''(?x)
https://
(?:
- app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture|
- (?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag
+ app\.lecturio\.com/([^/]+/(?P<nt>[^/?#&]+)\.lecture|(?:\#/)?lecture/c/\d+/(?P<id>\d+))|
+ (?:www\.)?lecturio\.de/[^/]+/(?P<nt_de>[^/?#&]+)\.vortrag
)
'''
_TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
- 'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870',
+ 'md5': '9a42cf1d8282a6311bf7211bbde26fde',
'info_dict': {
'id': '39634',
'ext': 'mp4',
- 'title': 'Important Concepts and Terms – Introduction to Microbiology',
+ 'title': 'Important Concepts and Terms — Introduction to Microbiology',
},
'skip': 'Requires lecturio account credentials',
}, {
'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag',
'only_matching': True,
+ }, {
+ 'url': 'https://app.lecturio.com/#/lecture/c/6434/39634',
+ 'only_matching': True,
}]
_CC_LANGS = {
+ 'Arabic': 'ar',
+ 'Bulgarian': 'bg',
'German': 'de',
'English': 'en',
'Spanish': 'es',
+ 'Persian': 'fa',
'French': 'fr',
+ 'Japanese': 'ja',
'Polish': 'pl',
+ 'Pashto': 'ps',
'Russian': 'ru',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
- display_id = mobj.group('id') or mobj.group('id_de')
-
- webpage = self._download_webpage(
- 'https://app.lecturio.com/en/lecture/%s/player.html' % display_id,
- display_id)
-
- lecture_id = self._search_regex(
- r'lecture_id\s*=\s*(?:L_)?(\d+)', webpage, 'lecture id')
-
- api_url = self._search_regex(
- r'lectureDataLink\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
- 'api url', group='url')
-
- video = self._download_json(api_url, display_id)
-
+ nt = mobj.group('nt') or mobj.group('nt_de')
+ lecture_id = mobj.group('id')
+ display_id = nt or lecture_id
+ api_path = 'lectures/' + lecture_id if lecture_id else 'lecture/' + nt + '.json'
+ video = self._download_json(
+ self._API_BASE_URL + api_path, display_id)
title = video['title'].strip()
+ if not lecture_id:
+ pid = video.get('productId') or video.get('uid')
+ if pid:
+ spid = pid.split('_')
+ if spid and len(spid) == 2:
+ lecture_id = spid[1]
formats = []
for format_ in video['content']['media']:
continue
label = str_or_none(format_.get('label'))
filesize = int_or_none(format_.get('fileSize'))
- formats.append({
+ f = {
'url': file_url,
'format_id': label,
'filesize': float_or_none(filesize, invscale=1000)
- })
+ }
+ if label:
+ mobj = re.match(r'(\d+)p\s*\(([^)]+)\)', label)
+ if mobj:
+ f.update({
+ 'format_id': mobj.group(2),
+ 'height': int(mobj.group(1)),
+ })
+ formats.append(f)
self._sort_formats(formats)
subtitles = {}
automatic_captions = {}
- cc = self._parse_json(
- self._search_regex(
- r'subtitleUrls\s*:\s*({.+?})\s*,', webpage, 'subtitles',
- default='{}'), display_id, fatal=False)
- for cc_label, cc_url in cc.items():
- cc_url = url_or_none(cc_url)
+ captions = video.get('captions') or []
+ for cc in captions:
+ cc_url = cc.get('url')
if not cc_url:
continue
- lang = self._search_regex(
+ cc_label = cc.get('translatedCode')
+ lang = cc.get('languageCode') or self._search_regex(
r'/([a-z]{2})_', cc_url, 'lang',
default=cc_label.split()[0] if cc_label else 'en')
original_lang = self._search_regex(
})
return {
- 'id': lecture_id,
+ 'id': lecture_id or nt,
'title': title,
'formats': formats,
'subtitles': subtitles,
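Review note: the label handling added to the Lecturio format loop can be checked on its own. This is a sketch only; `parse_format_label` is a hypothetical helper, and the example label `720p (HD)` is an assumption about what the API returns, inferred from the regular expression rather than from real API output.

```python
import re


def parse_format_label(label):
    """Hypothetical helper mirroring the label handling in the format loop.

    A label shaped like '720p (HD)' yields the parenthesised text as
    format_id and the number before 'p' as height; any other label is
    kept unchanged as format_id.
    """
    fmt = {'format_id': label}
    if label:
        mobj = re.match(r'(\d+)p\s*\(([^)]+)\)', label)
        if mobj:
            fmt.update({
                'format_id': mobj.group(2),
                'height': int(mobj.group(1)),
            })
    return fmt
```

Setting `height` explicitly gives `_sort_formats` a numeric field to order by instead of relying on the free-form label.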
class LecturioCourseIE(LecturioBaseIE):
- _VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.course'
- _TEST = {
+ _VALID_URL = r'https://app\.lecturio\.com/(?:[^/]+/(?P<nt>[^/?#&]+)\.course|(?:#/)?course/c/(?P<id>\d+))'
+ _TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/',
'info_dict': {
'id': 'microbiology-introduction',
'title': 'Microbiology: Introduction',
+ 'description': 'md5:13da8500c25880c6016ae1e6d78c386a',
},
'playlist_count': 45,
'skip': 'Requires lecturio account credentials',
- }
+ }, {
+ 'url': 'https://app.lecturio.com/#/course/c/6434',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
- display_id = self._match_id(url)
-
- webpage = self._download_webpage(url, display_id)
-
+ nt, course_id = re.match(self._VALID_URL, url).groups()
+ display_id = nt or course_id
+ api_path = 'courses/' + course_id if course_id else 'course/content/' + nt + '.json'
+ course = self._download_json(
+ self._API_BASE_URL + api_path, display_id)
entries = []
- for mobj in re.finditer(
- r'(?s)<[^>]+\bdata-url=(["\'])(?:(?!\1).)+\.lecture\b[^>]+>',
- webpage):
- params = extract_attributes(mobj.group(0))
- lecture_url = urljoin(url, params.get('data-url'))
- lecture_id = params.get('data-id')
+ for lecture in course.get('lectures', []):
+ lecture_id = str_or_none(lecture.get('id'))
+ lecture_url = lecture.get('url')
+ if lecture_url:
+ lecture_url = urljoin(url, lecture_url)
+ else:
+ lecture_url = 'https://app.lecturio.com/#/lecture/c/%s/%s' % (course_id, lecture_id)
entries.append(self.url_result(
lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
-
- title = self._search_regex(
- r'<span[^>]+class=["\']content-title[^>]+>([^<]+)', webpage,
- 'title', default=None)
-
- return self.playlist_result(entries, display_id, title)
+ return self.playlist_result(
+ entries, display_id, course.get('title'),
+ clean_html(course.get('description')))
class LecturioDeCourseIE(LecturioBaseIE):
elif play_json.get('code'):
raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True)
else:
- raise ExtractorError('Letv cloud returned an unknwon error')
+ raise ExtractorError('Letv cloud returned an unknown error')
def b64decode(s):
return compat_b64decode(s).decode('utf-8')
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import int_or_none
+
+
+class LiveJournalIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:[^.]+\.)?livejournal\.com/video/album/\d+.+?\bid=(?P<id>\d+)'
+ _TEST = {
+ 'url': 'https://andrei-bt.livejournal.com/video/album/407/?mode=view&id=51272',
+ 'md5': 'adaf018388572ced8a6f301ace49d4b2',
+ 'info_dict': {
+ 'id': '1263729',
+ 'ext': 'mp4',
+ 'title': 'Истребители против БПЛА',
+ 'upload_date': '20190624',
+ 'timestamp': 1561406715,
+ }
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id)
+ record = self._parse_json(self._search_regex(
+ r'Site\.page\s*=\s*({.+?});', webpage,
+ 'page data'), video_id)['video']['record']
+ storage_id = compat_str(record['storageid'])
+ title = record.get('name')
+ if title:
+ # remove filename extension (.mp4, .mov, etc.)
+ title = title.rsplit('.', 1)[0]
+ return {
+ '_type': 'url_transparent',
+ 'id': video_id,
+ 'title': title,
+ 'thumbnail': record.get('thumbnail'),
+ 'timestamp': int_or_none(record.get('timecreate')),
+ 'url': 'eagleplatform:vc.videos.livejournal.com:' + storage_id,
+ 'ie_key': 'EaglePlatform',
+ }
}, {
'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html',
'only_matching': True,
+ }, {
+ # Status="NotFound", Message="Transcript not found"
+ 'url': 'https://www.lynda.com/ASP-NET-tutorials/What-you-should-know/5034180/2811512-4.html',
+ 'only_matching': True,
}]
def _raise_unavailable(self, video_id):
def _get_subtitles(self, video_id):
url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
- subs = self._download_json(url, None, False)
+ subs = self._download_webpage(
+ url, video_id, 'Downloading subtitles JSON', fatal=False)
+ if not subs or 'Status="NotFound"' in subs:
+ return {}
+ subs = self._parse_json(subs, video_id, fatal=False)
+ if not subs:
+ return {}
fixed_subs = self._fix_subtitles(subs)
if fixed_subs:
return {'en': [{'ext': 'srt', 'data': fixed_subs}]}
- else:
- return {}
+ return {}
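Review note: the reworked `_get_subtitles` fetches the transcript as text first, so a `Status="NotFound"` reply can be rejected before any JSON parsing is attempted. A minimal sketch of that order of checks, using the standard `json` module in place of `_parse_json`; `parse_transcript_response` is a hypothetical name.

```python
import json


def parse_transcript_response(body):
    """Return the parsed transcript, or None when nothing usable came back.

    body is the raw response text, or a falsy value if the download failed.
    """
    # Reject failed downloads and the "transcript not found" reply first.
    if not body or 'Status="NotFound"' in body:
        return None
    try:
        return json.loads(body)
    except ValueError:
        # Mirrors fatal=False: malformed JSON means no subtitles, not an error.
        return None
```

Each failure case maps to "no subtitles", matching the `return {}` paths in the hunk.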
class LyndaCourseIE(LyndaBaseIE):
'ext': 'mp4',
'tbr': tbr,
'protocol': 'm3u8_native',
+ 'http_headers': {
+ 'Referer': url,
+ },
+ 'format_note': stream.get('name'),
})
self._sort_formats(formats)
class OpenloadIE(InfoExtractor):
- _DOMAINS = r'(?:openload\.(?:co|io|link|pw)|oload\.(?:tv|biz|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|press|pw|life|live|space|services|website)|oladblock\.(?:services|xyz|me)|openloed\.co)'
+ _DOMAINS = r'''
+ (?:
+ openload\.(?:co|io|link|pw)|
+ oload\.(?:tv|best|biz|stream|site|xyz|win|download|cloud|cc|icu|fun|club|info|press|pw|life|live|space|services|website|vip)|
+ oladblock\.(?:services|xyz|me)|openloed\.co
+ )
+ '''
_VALID_URL = r'''(?x)
https?://
(?P<host>
}, {
'url': 'https://oload.biz/f/bEk3Gp8ARr4/',
'only_matching': True,
+ }, {
+ 'url': 'https://oload.best/embed/kkz9JgVZeWc/',
+ 'only_matching': True,
}, {
'url': 'https://oladblock.services/f/b8NWEgkqNLI/',
'only_matching': True,
}, {
'url': 'https://openloed.co/f/b8NWEgkqNLI/',
'only_matching': True,
+ }, {
+ 'url': 'https://oload.vip/f/kUEfGclsU9o',
+ 'only_matching': True,
}]
@classmethod
def _extract_urls(cls, webpage):
return re.findall(
- r'<iframe[^>]+src=["\']((?:https?://)?%s/%s/[a-zA-Z0-9-_]+)'
+ r'(?x)<iframe[^>]+src=["\']((?:https?://)?%s/%s/[a-zA-Z0-9-_]+)'
% (cls._DOMAINS, cls._EMBED_WORD), webpage)
def _extract_decrypted_page(self, page_url, webpage, video_id):
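Review note: once `_DOMAINS` becomes a multi-line, indented string, any pattern that interpolates it has to be compiled in verbose mode, which is why `_extract_urls` gains the `(?x)` prefix. The sketch below shows the difference with a shortened, illustrative domain list; `DOMAINS`, `EMBED_WORD`, and the sample page are assumptions made for the example.

```python
import re

# Multi-line alternation in the style of the new _DOMAINS value
# (domain list shortened for illustration).
DOMAINS = r'''
    (?:
        openload\.(?:co|io)|
        oload\.(?:tv|best|vip)
    )
'''
EMBED_WORD = 'embed'

# Without (?x) the newlines and spaces inside DOMAINS are literal characters.
plain = (r'<iframe[^>]+src=["\']((?:https?://)?%s/%s/[a-zA-Z0-9-_]+)'
         % (DOMAINS, EMBED_WORD))
# With (?x) that whitespace is ignored and the alternation works as intended.
verbose = (r'(?x)<iframe[^>]+src=["\']((?:https?://)?%s/%s/[a-zA-Z0-9-_]+)'
           % (DOMAINS, EMBED_WORD))

webpage = '<iframe src="https://oload.vip/embed/kUEfGclsU9o"></iframe>'
plain_matches = re.findall(plain, webpage)
verbose_matches = re.findall(verbose, webpage)
```

Here `plain_matches` is empty while `verbose_matches` contains the embed URL. The same applies to any other pattern that interpolates `_DOMAINS`.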
class VerystreamIE(OpenloadIE):
IE_NAME = 'verystream'
- _DOMAINS = r'(?:verystream\.com)'
+ _DOMAINS = r'(?:verystream\.com|woof\.tube)'
_VALID_URL = r'''(?x)
https?://
(?P<host>
from .common import InfoExtractor
from ..compat import (
- compat_str,
+ # compat_str,
compat_HTTPError,
)
from ..utils import (
clean_html,
ExtractorError,
- remove_end,
+ # remove_end,
+ str_or_none,
strip_or_none,
unified_timestamp,
- urljoin,
+ # urljoin,
)
class PacktPubBaseIE(InfoExtractor):
- _PACKT_BASE = 'https://www.packtpub.com'
- _MAPT_REST = '%s/mapt-rest' % _PACKT_BASE
+ # _PACKT_BASE = 'https://www.packtpub.com'
+ _STATIC_PRODUCTS_BASE = 'https://static.packt-cdn.com/products/'
class PacktPubIE(PacktPubBaseIE):
- _VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>[^/]+)/(?P<id>[^/]+)(?:/(?P<display_id>[^/?&#]+))?'
_TESTS = [{
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
}, {
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro',
'only_matching': True,
+ }, {
+ 'url': 'https://subscription.packtpub.com/video/programming/9781838988906/p1/video1_1/business-card-project',
+ 'only_matching': True,
}]
_NETRC_MACHINE = 'packtpub'
_TOKEN = None
return
try:
self._TOKEN = self._download_json(
- self._MAPT_REST + '/users/tokens', None,
+ 'https://services.packtpub.com/auth-v1/users/tokens', None,
'Downloading Authorization Token', data=json.dumps({
- 'email': username,
+ 'username': username,
'password': password,
}).encode())['data']['access']
except ExtractorError as e:
raise ExtractorError(message, expected=True)
raise
- def _handle_error(self, response):
- if response.get('status') != 'success':
- raise ExtractorError(
- '% said: %s' % (self.IE_NAME, response['message']),
- expected=True)
-
- def _download_json(self, *args, **kwargs):
- response = super(PacktPubIE, self)._download_json(*args, **kwargs)
- self._handle_error(response)
- return response
-
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- course_id, chapter_id, video_id = mobj.group(
- 'course_id', 'chapter_id', 'id')
+ course_id, chapter_id, video_id, display_id = re.match(self._VALID_URL, url).groups()
headers = {}
if self._TOKEN:
headers['Authorization'] = 'Bearer ' + self._TOKEN
- video = self._download_json(
- '%s/users/me/products/%s/chapters/%s/sections/%s'
- % (self._MAPT_REST, course_id, chapter_id, video_id), video_id,
- 'Downloading JSON video', headers=headers)['data']
-
- content = video.get('content')
- if not content:
- self.raise_login_required('This video is locked')
-
- video_url = content['file']
+ try:
+ video_url = self._download_json(
+ 'https://services.packtpub.com/products-v1/products/%s/%s/%s' % (course_id, chapter_id, video_id), video_id,
+ 'Downloading JSON video', headers=headers)['data']
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
+ self.raise_login_required('This video is locked')
+ raise
- metadata = self._download_json(
- '%s/products/%s/chapters/%s/sections/%s/metadata'
- % (self._MAPT_REST, course_id, chapter_id, video_id),
- video_id)['data']
+ # TODO: find a better way to avoid duplicating course requests
+ # metadata = self._download_json(
+ # '%s/products/%s/chapters/%s/sections/%s/metadata'
+ # % (self._MAPT_REST, course_id, chapter_id, video_id),
+ # video_id)['data']
- title = metadata['pageTitle']
- course_title = metadata.get('title')
- if course_title:
- title = remove_end(title, ' - %s' % course_title)
- timestamp = unified_timestamp(metadata.get('publicationDate'))
- thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
+ # title = metadata['pageTitle']
+ # course_title = metadata.get('title')
+ # if course_title:
+ # title = remove_end(title, ' - %s' % course_title)
+ # timestamp = unified_timestamp(metadata.get('publicationDate'))
+ # thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
return {
'id': video_id,
'url': video_url,
- 'title': title,
- 'thumbnail': thumbnail,
- 'timestamp': timestamp,
+ 'title': display_id or video_id, # title,
+ # 'thumbnail': thumbnail,
+ # 'timestamp': timestamp,
}
'info_dict': {
'id': '9781787122215',
'title': 'Learn Nodejs by building 12 projects [Video]',
+ 'description': 'md5:489da8d953f416e51927b60a1c7db0aa',
},
'playlist_count': 90,
}, {
url, course_id = mobj.group('url', 'id')
course = self._download_json(
- '%s/products/%s/metadata' % (self._MAPT_REST, course_id),
- course_id)['data']
+ self._STATIC_PRODUCTS_BASE + '%s/toc' % course_id, course_id)
+ metadata = self._download_json(
+ self._STATIC_PRODUCTS_BASE + '%s/summary' % course_id,
+ course_id, fatal=False) or {}
entries = []
- for chapter_num, chapter in enumerate(course['tableOfContents'], 1):
- if chapter.get('type') != 'chapter':
- continue
- children = chapter.get('children')
- if not isinstance(children, list):
+ for chapter_num, chapter in enumerate(course['chapters'], 1):
+ chapter_id = str_or_none(chapter.get('id'))
+ sections = chapter.get('sections')
+ if not chapter_id or not isinstance(sections, list):
continue
chapter_info = {
'chapter': chapter.get('title'),
'chapter_number': chapter_num,
- 'chapter_id': chapter.get('id'),
+ 'chapter_id': chapter_id,
}
- for section in children:
- if section.get('type') != 'section':
- continue
- section_url = section.get('seoUrl')
- if not isinstance(section_url, compat_str):
+ for section in sections:
+ section_id = str_or_none(section.get('id'))
+ if not section_id or section.get('contentType') != 'video':
continue
entry = {
'_type': 'url_transparent',
- 'url': urljoin(url + '/', section_url),
+ 'url': '/'.join([url, chapter_id, section_id]),
'title': strip_or_none(section.get('title')),
'description': clean_html(section.get('summary')),
+ 'thumbnail': metadata.get('coverImage'),
+ 'timestamp': unified_timestamp(metadata.get('publicationDate')),
'ie_key': PacktPubIE.ie_key(),
}
entry.update(chapter_info)
entries.append(entry)
- return self.playlist_result(entries, course_id, course.get('title'))
+ return self.playlist_result(
+ entries, course_id, metadata.get('title'),
+ clean_html(metadata.get('about')))
@staticmethod
def _extract_peertube_url(webpage, source_url):
mobj = re.match(
- r'https?://(?P<host>[^/]+)/videos/watch/(?P<id>%s)'
+ r'https?://(?P<host>[^/]+)/videos/(?:watch|embed)/(?P<id>%s)'
% PeerTubeIE._UUID_RE, source_url)
if mobj and any(p in webpage for p in (
'<title>PeerTube<',
_VALID_URL = r'''(?x)
https?://
(?:
- live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)|
+ live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|embed(?:app)?/|misc/Playlist\.ashx\?id=)|
pad\.philharmoniedeparis\.fr/doc/CIMU/
)
(?P<id>\d+)
}, {
'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr',
'only_matching': True,
+ }, {
+ 'url': 'https://live.philharmoniedeparis.fr/embedapp/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://live.philharmoniedeparis.fr/embed/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
+ 'only_matching': True,
}]
_LIVE_URL = 'https://live.philharmoniedeparis.fr'
_VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)'
_TESTS = [
{
- 'url': 'http://player.piksel.com/v/nv60p12f',
- 'md5': 'd9c17bbe9c3386344f9cfd32fad8d235',
+ 'url': 'http://player.piksel.com/v/ums2867l',
+ 'md5': '34e34c8d89dc2559976a6079db531e85',
'info_dict': {
- 'id': 'nv60p12f',
+ 'id': 'ums2867l',
'ext': 'mp4',
- 'title': 'فن الحياة - الحلقة 1',
- 'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور',
- 'timestamp': 1465231790,
- 'upload_date': '20160606',
+ 'title': 'GX-005 with Caption',
+ 'timestamp': 1481335659,
+ 'upload_date': '20161210'
}
},
{
'title': 'WAW- State of Washington vs. Donald J. Trump, et al',
'description': 'State of Washington vs. Donald J. Trump, et al, Case Number 17-CV-00141-JLR, TRO Hearing, Civil Rights Case, 02/3/2017, 1:00 PM (PST), Seattle Federal Courthouse, Seattle, WA, Judge James L. Robart presiding.',
'timestamp': 1486171129,
- 'upload_date': '20170204',
+ 'upload_date': '20170204'
}
}
]
})
self._sort_formats(formats)
+ subtitles = {}
+ for caption in video_data.get('captions', []):
+ caption_url = caption.get('url')
+ if caption_url:
+ subtitles.setdefault(caption.get('locale', 'en'), []).append({
+ 'url': caption_url})
+
return {
'id': video_id,
'title': title,
'thumbnail': video_data.get('thumbnailUrl'),
'timestamp': parse_iso8601(video_data.get('dateadd')),
'formats': formats,
+ 'subtitles': subtitles,
}
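Review note: the Piksel caption loop groups caption URLs by locale and falls back to `en` when a caption carries no locale. A standalone sketch of the same grouping follows; `build_subtitles` is a hypothetical helper, and the `captions or []` guard is an addition of this sketch, not part of the patch.

```python
def build_subtitles(captions):
    """Group caption URLs by locale, defaulting to 'en' if locale is absent."""
    subtitles = {}
    for caption in captions or []:
        caption_url = caption.get('url')
        if caption_url:
            # setdefault creates the per-locale list on first use.
            subtitles.setdefault(caption.get('locale', 'en'), []).append({
                'url': caption_url})
    return subtitles
```

Entries without a URL are skipped, so a locale appears in the result only if at least one downloadable caption exists for it.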
r'<div id="viewvideo-title">([^<]+)</div>', webpage, 'title')
title = title.replace('\n', '')
- info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
+ video_link_url = self._search_regex(
+ r'<textarea[^>]+id=["\']fm-video_link[^>]+>([^<]+)</textarea>',
+ webpage, 'video link')
+ videopage = self._download_webpage(video_link_url, video_id)
+
+ info_dict = self._parse_html5_media_entries(url, videopage, video_id)[0]
duration = parse_duration(self._search_regex(
r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False))
import re
from .common import InfoExtractor
+from ..compat import (
+ compat_HTTPError,
+ compat_str,
+)
from ..utils import (
ExtractorError,
int_or_none,
- strip_or_none,
- unescapeHTML,
+ str_or_none,
urlencode_postdata,
)
class RoosterTeethIE(InfoExtractor):
- _VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/episode/(?P<id>[^/?#&]+)'
+ _VALID_URL = r'https?://(?:.+?\.)?roosterteeth\.com/(?:episode|watch)/(?P<id>[^/?#&]+)'
_LOGIN_URL = 'https://roosterteeth.com/login'
_NETRC_MACHINE = 'roosterteeth'
_TESTS = [{
'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'md5': 'e2bd7764732d785ef797700a2489f212',
'info_dict': {
- 'id': '26576',
+ 'id': '9156',
'display_id': 'million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'ext': 'mp4',
- 'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement',
- 'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5',
+ 'title': 'Million Dollars, But... The Game Announcement',
+ 'description': 'md5:168a54b40e228e79f4ddb141e89fe4f5',
'thumbnail': r're:^https?://.*\.png$',
'series': 'Million Dollars, But...',
'episode': 'Million Dollars, But... The Game Announcement',
- 'comment_count': int,
},
}, {
'url': 'http://achievementhunter.roosterteeth.com/episode/off-topic-the-achievement-hunter-podcast-2016-i-didn-t-think-it-would-pass-31',
# only available for FIRST members
'url': 'http://roosterteeth.com/episode/rt-docs-the-world-s-greatest-head-massage-the-world-s-greatest-head-massage-an-asmr-journey-part-one',
'only_matching': True,
+ }, {
+ 'url': 'https://roosterteeth.com/watch/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
+ 'only_matching': True,
}]
def _login(self):
def _real_extract(self, url):
display_id = self._match_id(url)
-
- webpage = self._download_webpage(url, display_id)
-
- episode = strip_or_none(unescapeHTML(self._search_regex(
- (r'videoTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
- r'<title>(?P<title>[^<]+)</title>'), webpage, 'title',
- default=None, group='title')))
-
- title = strip_or_none(self._og_search_title(
- webpage, default=None)) or episode
-
- m3u8_url = self._search_regex(
- r'file\s*:\s*(["\'])(?P<url>http.+?\.m3u8.*?)\1',
- webpage, 'm3u8 url', default=None, group='url')
-
- if not m3u8_url:
- if re.search(r'<div[^>]+class=["\']non-sponsor', webpage):
- self.raise_login_required(
- '%s is only available for FIRST members' % display_id)
-
- if re.search(r'<div[^>]+class=["\']golive-gate', webpage):
- self.raise_login_required('%s is not available yet' % display_id)
-
- raise ExtractorError('Unable to extract m3u8 URL')
+ api_episode_url = 'https://svod-be.roosterteeth.com/api/v1/episodes/%s' % display_id
+
+ try:
+ m3u8_url = self._download_json(
+ api_episode_url + '/videos', display_id,
+ 'Downloading video JSON metadata')['data'][0]['attributes']['url']
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+ if self._parse_json(e.cause.read().decode(), display_id).get('access') is False:
+ self.raise_login_required(
+ '%s is only available for FIRST members' % display_id)
+ raise
formats = self._extract_m3u8_formats(
- m3u8_url, display_id, ext='mp4',
- entry_protocol='m3u8_native', m3u8_id='hls')
+ m3u8_url, display_id, 'mp4', 'm3u8_native', m3u8_id='hls')
self._sort_formats(formats)
- description = strip_or_none(self._og_search_description(webpage))
- thumbnail = self._proto_relative_url(self._og_search_thumbnail(webpage))
-
- series = self._search_regex(
- (r'<h2>More ([^<]+)</h2>', r'<a[^>]+>See All ([^<]+) Videos<'),
- webpage, 'series', fatal=False)
-
- comment_count = int_or_none(self._search_regex(
- r'>Comments \((\d+)\)<', webpage,
- 'comment count', fatal=False))
-
- video_id = self._search_regex(
- (r'containerId\s*=\s*["\']episode-(\d+)\1',
- r'<div[^<]+id=["\']episode-(\d+)'), webpage,
- 'video id', default=display_id)
+ episode = self._download_json(
+ api_episode_url, display_id,
+ 'Downloading episode JSON metadata')['data'][0]
+ attributes = episode['attributes']
+ title = attributes.get('title') or attributes['display_title']
+ video_id = compat_str(episode['id'])
+
+ thumbnails = []
+ for image in episode.get('included', {}).get('images', []):
+ if image.get('type') == 'episode_image':
+ img_attributes = image.get('attributes') or {}
+ for k in ('thumb', 'small', 'medium', 'large'):
+ img_url = img_attributes.get(k)
+ if img_url:
+ thumbnails.append({
+ 'id': k,
+ 'url': img_url,
+ })
return {
'id': video_id,
'display_id': display_id,
'title': title,
- 'description': description,
- 'thumbnail': thumbnail,
- 'series': series,
- 'episode': episode,
- 'comment_count': comment_count,
+ 'description': attributes.get('description') or attributes.get('caption'),
+ 'thumbnails': thumbnails,
+ 'series': attributes.get('show_title'),
+ 'season_number': int_or_none(attributes.get('season_number')),
+ 'season_id': attributes.get('season_id'),
+ 'episode': title,
+ 'episode_number': int_or_none(attributes.get('number')),
+ 'episode_id': str_or_none(episode.get('uuid')),
'formats': formats,
+ 'channel_id': attributes.get('channel_id'),
+ 'duration': int_or_none(attributes.get('length')),
}
'duration': 1167.96,
},
}, {
- # best format avaialble a3t
+ # best format available a3t
'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
'md5': 'dea7474214af1271d91ef332fb8be7ea',
'info_dict': {
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
- js_to_json,
- get_element_by_class,
- unified_strdate,
-)
-
-
-class RudoIE(InfoExtractor):
- _VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
-
- _TEST = {
- 'url': 'http://rudo.video/vod/oTzw0MGnyG',
- 'md5': '2a03a5b32dd90a04c83b6d391cf7b415',
- 'info_dict': {
- 'id': 'oTzw0MGnyG',
- 'ext': 'mp4',
- 'title': 'Comentario Tomás Mosciatti',
- 'upload_date': '20160617',
- },
- }
-
- @classmethod
- def _extract_url(cls, webpage):
- mobj = re.search(
- r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
- webpage)
- if mobj:
- return mobj.group('url')
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id, encoding='iso-8859-1')
-
- jwplayer_data = self._parse_json(self._search_regex(
- r'(?s)playerInstance\.setup\(({.+?})\)', webpage, 'jwplayer data'), video_id,
- transform_source=lambda s: js_to_json(re.sub(r'encodeURI\([^)]+\)', '""', s)))
-
- info_dict = self._parse_jwplayer_data(
- jwplayer_data, video_id, require_title=False, m3u8_id='hls', mpd_id='dash')
-
- info_dict.update({
- 'title': self._og_search_title(webpage),
- 'upload_date': unified_strdate(get_element_by_class('date', webpage)),
- })
-
- return info_dict
raise ExtractorError(
'Unable to login: %s' % credentials, expected=True)
- # oreilly serves two same groot_sessionid cookies in Set-Cookie header
- # and expects first one to be actually set
- self._apply_first_set_cookie_header(urlh, 'groot_sessionid')
+ # oreilly serves two identical instances of each of the following
+ # cookies in Set-Cookie header and expects first one to be actually set
+ for cookie in ('groot_sessionid', 'orm-jwt', 'orm-rt'):
+ self._apply_first_set_cookie_header(urlh, cookie)
_, urlh = self._download_webpage_handle(
auth.get('redirect_uri') or next_uri, None, 'Completing login',)
'skip_download': True,
},
},
- # not avaialble via api.soundcloud.com/i1/tracks/id/streams
+ # not available via api.soundcloud.com/i1/tracks/id/streams
{
'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
from .common import InfoExtractor
from ..utils import (
ExtractorError,
+ merge_dicts,
orderedSet,
parse_duration,
parse_resolution,
'description': 'dillion harper masturbates on a bed',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'silly2587',
+ 'timestamp': 1422571989,
+ 'upload_date': '20150129',
'age_limit': 18,
}
}, {
for format_id, format_url in stream.items():
if format_id.startswith(STREAM_URL_PREFIX):
+ if format_url and isinstance(format_url, list):
+ format_url = format_url[0]
extract_format(
format_id[len(STREAM_URL_PREFIX):], format_url)
self._sort_formats(formats)
+ info = self._search_json_ld(webpage, video_id, default={})
+
title = self._html_search_regex(
- r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title')
+ r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title', default=None)
description = self._search_regex(
r'<div[^>]+\bclass=["\']bottom[^>]+>\s*<p>[^<]*</p>\s*<p>([^<]+)',
- webpage, 'description', fatal=False)
- thumbnail = self._og_search_thumbnail(webpage)
- uploader = self._search_regex(
- r'class="user"[^>]*><img[^>]+>([^<]+)',
+ webpage, 'description', default=None)
+ thumbnail = self._og_search_thumbnail(webpage, default=None)
+ uploader = self._html_search_regex(
+ (r'(?s)<li[^>]+class=["\']profile[^>]+>(.+?)</a>',
+ r'class="user"[^>]*><img[^>]+>([^<]+)'),
webpage, 'uploader', default=None)
duration = parse_duration(self._search_regex(
r'<div[^>]+\bclass=["\']right_side[^>]+>\s*<span>([^<]+)',
- webpage, 'duration', fatal=False))
+ webpage, 'duration', default=None))
view_count = str_to_int(self._search_regex(
- r'([\d,.]+)\s+plays', webpage, 'view count', fatal=False))
+ r'([\d,.]+)\s+plays', webpage, 'view count', default=None))
age_limit = self._rta_search(webpage)
- return {
+ return merge_dicts({
'id': video_id,
- 'title': title,
+ 'title': title or video_id,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'view_count': view_count,
'formats': formats,
'age_limit': age_limit,
- }
+ }, info
+ )
class SpankBangPlaylistIE(InfoExtractor):
'only_matching': True,
}]
- _FEED_URL = 'http://www.spike.com/feeds/mrss/'
+ _FEED_URL = 'http://www.bellator.com/feeds/mrss/'
_GEO_COUNTRIES = ['US']
def _extract_info(self, webpage):
info_json = self._search_regex(
- r'(?s)q\(\s*"\w+.init"\s*,\s*({.+})\)\s*</script>',
+ r'(?s)q\(\s*"\w+.init"\s*,\s*({.+?})\)\s*</script>',
webpage, 'info json')
return json.loads(info_json)
float_or_none,
int_or_none,
parse_age_limit,
+ try_get,
+ url_or_none,
)
_TESTS = [
{
'url': 'http://www.tvigle.ru/video/sokrat/',
- 'md5': '36514aed3657d4f70b4b2cef8eb520cd',
'info_dict': {
'id': '1848932',
'display_id': 'sokrat',
- 'ext': 'flv',
+ 'ext': 'mp4',
'title': 'Сократ',
'description': 'md5:d6b92ffb7217b4b8ebad2e7665253c17',
'duration': 6586,
},
{
'url': 'http://www.tvigle.ru/video/vladimir-vysotskii/vedushchii-teleprogrammy-60-minut-ssha-o-vladimire-vysotskom/',
- 'md5': 'e7efe5350dd5011d0de6550b53c3ba7b',
'info_dict': {
'id': '5142516',
'ext': 'flv',
webpage = self._download_webpage(url, display_id)
video_id = self._html_search_regex(
(r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
- r'var\s+cloudId\s*=\s*["\'](\d+)',
+ r'cloudId\s*=\s*["\'](\d+)',
r'class="video-preview current_playing" id="(\d+)"'),
webpage, 'video id')
age_limit = parse_age_limit(item.get('ageRestrictions'))
formats = []
- for vcodec, fmts in item['videos'].items():
+ for vcodec, url_or_fmts in item['videos'].items():
if vcodec == 'hls':
- continue
- for format_id, video_url in fmts.items():
- if format_id == 'm3u8':
+ m3u8_url = url_or_none(url_or_fmts)
+ if not m3u8_url:
+ continue
+ formats.extend(self._extract_m3u8_formats(
+ m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False))
+ elif vcodec == 'dash':
+ mpd_url = url_or_none(url_or_fmts)
+ if not mpd_url:
+ continue
+ formats.extend(self._extract_mpd_formats(
+ mpd_url, video_id, mpd_id='dash', fatal=False))
+ else:
+ if not isinstance(url_or_fmts, dict):
continue
- height = self._search_regex(
- r'^(\d+)[pP]$', format_id, 'height', default=None)
- formats.append({
- 'url': video_url,
- 'format_id': '%s-%s' % (vcodec, format_id),
- 'vcodec': vcodec,
- 'height': int_or_none(height),
- 'filesize': int_or_none(item.get('video_files_size', {}).get(vcodec, {}).get(format_id)),
- })
+ for format_id, video_url in url_or_fmts.items():
+ if format_id == 'm3u8':
+ continue
+ video_url = url_or_none(video_url)
+ if not video_url:
+ continue
+ height = self._search_regex(
+ r'^(\d+)[pP]$', format_id, 'height', default=None)
+ filesize = int_or_none(try_get(
+ item, lambda x: x['video_files_size'][vcodec][format_id]))
+ formats.append({
+ 'url': video_url,
+ 'format_id': '%s-%s' % (vcodec, format_id),
+ 'vcodec': vcodec,
+ 'height': int_or_none(height),
+ 'filesize': filesize,
+ })
self._sort_formats(formats)
return {
# coding: utf-8
from __future__ import unicode_literals
-from .mtv import MTVServicesInfoExtractor
+from .spike import ParamountNetworkIE
-class TVLandIE(MTVServicesInfoExtractor):
+class TVLandIE(ParamountNetworkIE):
IE_NAME = 'tvland.com'
_VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://www.tvland.com/feeds/mrss/'
_TESTS = [{
# Geo-restricted. Without a proxy metadata are still there. With a
# proxy it redirects to http://m.tvland.com/app/
- 'url': 'http://www.tvland.com/episodes/hqhps2/everybody-loves-raymond-the-invasion-ep-048',
+ 'url': 'https://www.tvland.com/episodes/s04pzf/everybody-loves-raymond-the-dog-season-1-ep-19',
'info_dict': {
- 'description': 'md5:80973e81b916a324e05c14a3fb506d29',
- 'title': 'The Invasion',
+ 'description': 'md5:84928e7a8ad6649371fbf5da5e1ad75a',
+ 'title': 'The Dog',
},
- 'playlist': [],
+ 'playlist_mincount': 5,
}, {
- 'url': 'http://www.tvland.com/video-clips/zea2ev/younger-younger--hilary-duff---little-lies',
+ 'url': 'https://www.tvland.com/video-clips/4n87f2/younger-a-first-look-at-younger-season-6',
'md5': 'e2c6389401cf485df26c79c247b08713',
'info_dict': {
- 'id': 'b8697515-4bbe-4e01-83d5-fa705ce5fa88',
+ 'id': '891f7d3c-5b5b-4753-b879-b7ba1a601757',
'ext': 'mp4',
- 'title': 'Younger|December 28, 2015|2|NO-EPISODE#|Younger: Hilary Duff - Little Lies',
- 'description': 'md5:7d192f56ca8d958645c83f0de8ef0269',
- 'upload_date': '20151228',
- 'timestamp': 1451289600,
+ 'title': 'Younger|April 30, 2019|6|NO-EPISODE#|A First Look at Younger Season 6',
+ 'description': 'md5:595ea74578d3a888ae878dfd1c7d4ab2',
+ 'upload_date': '20190430',
+ 'timestamp': 1556658000,
+ },
+ 'params': {
+ 'skip_download': True,
},
}, {
'url': 'http://www.tvland.com/full-episodes/iu0hz6/younger-a-kiss-is-just-a-kiss-season-3-ep-301',
from .common import InfoExtractor
from ..utils import (
int_or_none,
+ NO_DEFAULT,
unescapeHTML,
)
'id': '1584444',
'ext': 'mp4',
'title': '"Święta mają być wesołe, dlatego, ludziska, wszyscy pod jemiołę"',
- 'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości "Szkła kontaktowego".',
+ 'description': 'Wyjątkowe orędzie Artura Andrusa, jednego z gości Szkła kontaktowego.',
'thumbnail': 're:https?://.*[.]jpeg',
}
+ }, {
+ # different layout
+ 'url': 'https://tvnmeteo.tvn24.pl/magazyny/maja-w-ogrodzie,13/odcinki-online,1,4,1,0/pnacza-ptaki-i-iglaki-odc-691-hgtv-odc-29,1771763.html',
+ 'info_dict': {
+ 'id': '1771763',
+ 'ext': 'mp4',
+ 'title': 'Pnącza, ptaki i iglaki (odc. 691 /HGTV odc. 29)',
+ 'thumbnail': 're:https?://.*',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
}, {
'url': 'http://fakty.tvn24.pl/ogladaj-online,60/53-konferencja-bezpieczenstwa-w-monachium,716431.html',
'only_matching': True,
}]
def _real_extract(self, url):
- video_id = self._match_id(url)
+ display_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
+ webpage = self._download_webpage(url, display_id)
- title = self._og_search_title(webpage)
+ title = self._og_search_title(
+ webpage, default=None) or self._search_regex(
+ r'<h\d+[^>]+class=["\']magazineItemHeader[^>]+>(.+?)</h',
+ webpage, 'title')
- def extract_json(attr, name, fatal=True):
+ def extract_json(attr, name, default=NO_DEFAULT, fatal=True):
return self._parse_json(
self._search_regex(
r'\b%s=(["\'])(?P<json>(?!\1).+?)\1' % attr, webpage,
- name, group='json', fatal=fatal) or '{}',
- video_id, transform_source=unescapeHTML, fatal=fatal)
+ name, group='json', default=default, fatal=fatal) or '{}',
+ display_id, transform_source=unescapeHTML, fatal=fatal)
quality_data = extract_json('data-quality', 'formats')
})
self._sort_formats(formats)
- description = self._og_search_description(webpage)
+ description = self._og_search_description(webpage, default=None)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_regex(
r'\bdata-poster=(["\'])(?P<url>(?!\1).+?)\1', webpage,
'thumbnail', group='url')
+ video_id = None
+
share_params = extract_json(
- 'data-share-params', 'share params', fatal=False)
+ 'data-share-params', 'share params', default=None)
if isinstance(share_params, dict):
- video_id = share_params.get('id') or video_id
+ video_id = share_params.get('id')
+
+ if not video_id:
+ video_id = self._search_regex(
+ r'data-vid-id=["\'](\d+)', webpage, 'video id',
+ default=None) or self._search_regex(
+ r',(\d+)\.html', url, 'video id', default=display_id)
return {
'id': video_id,
'Downloading %s access token' % self._ITEM_TYPE)
formats = self._extract_m3u8_formats(
- '%s/vod/%s?%s' % (
+ '%s/vod/%s.m3u8?%s' % (
self._USHER_BASE, item_id,
compat_urllib_parse_urlencode({
'allow_source': 'true',
'params': {
'skip_download': True, # requires ffmpeg
},
+ }, {
+ 'url': 'https://twitter.com/foobar/status/1087791357756956680',
+ 'info_dict': {
+ 'id': '1087791357756956680',
+ 'ext': 'mp4',
+ 'title': 'Twitter - A new is coming. Some of you got an opt-in to try it now. Check out the emoji button, quick keyboard shortcuts, upgraded trends, advanced search, and more. Let us know your thoughts!',
+ 'thumbnail': r're:^https?://.*\.jpg',
+ 'description': 'md5:66d493500c013e3e2d434195746a7f78',
+ 'uploader': 'Twitter',
+ 'uploader_id': 'Twitter',
+ 'duration': 61.567,
+ },
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
- user_id = mobj.group('user_id')
twid = mobj.group('id')
webpage, urlh = self._download_webpage_handle(
if 'twitter.com/account/suspended' in urlh.geturl():
raise ExtractorError('Account suspended by Twitter.', expected=True)
- if user_id is None:
- mobj = re.match(self._VALID_URL, urlh.geturl())
+ user_id = None
+
+ redirect_mobj = re.match(self._VALID_URL, urlh.geturl())
+ if redirect_mobj:
+ user_id = redirect_mobj.group('user_id')
+
+ if not user_id:
user_id = mobj.group('user_id')
username = remove_end(self._og_search_title(webpage), ' on Twitter')
# coding: utf-8
from __future__ import unicode_literals
-import re
-
from .adobepass import AdobePassIE
from ..utils import (
- extract_attributes,
+ NO_DEFAULT,
smuggle_url,
update_url_query,
)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
- player_params = extract_attributes(self._search_regex(
- r'(<div[^>]+data-usa-tve-player-container[^>]*>)', webpage, 'player params'))
- video_id = player_params['data-mpx-guid']
- title = player_params['data-episode-title']
+ def _x(name, default=NO_DEFAULT):
+ return self._search_regex(
+ r'data-%s\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % name,
+ webpage, name, default=default, group='value')
- account_pid, path = re.search(
- r'data-src="(?:https?)?//player\.theplatform\.com/p/([^/]+)/.*?/(media/guid/\d+/\d+)',
- webpage).groups()
+ video_id = _x('mpx-guid')
+ title = _x('episode-title')
+ mpx_account_id = _x('mpx-account-id', '2304992029')
query = {
'mbr': 'true',
}
- if player_params.get('data-is-full-episode') == '1':
+ if _x('is-full-episode', None) == '1':
query['manifest'] = 'm3u'
- if player_params.get('data-entitlement') == 'auth':
+ if _x('is-entitlement', None) == '1':
adobe_pass = {}
drupal_settings = self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
adobe_pass = drupal_settings.get('adobePass', {})
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'usa'),
- title, video_id, player_params.get('data-episode-rating', 'TV-14'))
+ title, video_id, _x('episode-rating', 'TV-14'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, adobe_pass.get('adobePassRequestorId', 'usa'), resource)
info.update({
'_type': 'url_transparent',
'url': smuggle_url(update_url_query(
- 'http://link.theplatform.com/s/%s/%s' % (account_pid, path),
+ 'http://link.theplatform.com/s/HNK2IC/media/guid/%s/%s' % (mpx_account_id, video_id),
query), {'force_smil_url': True}),
'id': video_id,
'title': title,
- 'series': player_params.get('data-show-title'),
+ 'series': _x('show-title', None),
'episode': title,
'ie_key': 'ThePlatform',
})
from __future__ import unicode_literals
import base64
+import functools
import json
import re
import itertools
from .common import InfoExtractor
from ..compat import (
+ compat_kwargs,
compat_HTTPError,
compat_str,
compat_urlparse,
int_or_none,
merge_dicts,
NO_DEFAULT,
+ OnDemandPagedList,
parse_filesize,
qualities,
RegexNotFoundError,
webpage, 'vuid', group='vuid')
return xsrft, vuid
+ def _extract_vimeo_config(self, webpage, video_id, *args, **kwargs):
+ vimeo_config = self._search_regex(
+ r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));',
+ webpage, 'vimeo config', *args, **compat_kwargs(kwargs))
+ if vimeo_config:
+ return self._parse_json(vimeo_config, video_id)
+
def _set_vimeo_cookie(self, name, value):
self._set_cookie('vimeo.com', name, value)
\.
)?
vimeo(?P<pro>pro)?\.com/
- (?!(?:channels|album)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
+ (?!(?:channels|album|showcase)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
(?:.*?/)?
(?:
(?:
# and latter we extract those that are Vimeo specific.
self.report_extraction(video_id)
- vimeo_config = self._search_regex(
- r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));', webpage,
- 'vimeo config', default=None)
+ vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
if vimeo_config:
- seed_status = self._parse_json(vimeo_config, video_id).get('seed_status', {})
+ seed_status = vimeo_config.get('seed_status', {})
if seed_status.get('state') == 'failed':
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, seed_status['title']),
class VimeoAlbumIE(VimeoChannelIE):
IE_NAME = 'vimeo:album'
- _VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)(?:$|[?#]|/(?!video))'
+ _VALID_URL = r'https://vimeo\.com/(?:album|showcase)/(?P<id>\d+)(?:$|[?#]|/(?!video))'
_TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
_TESTS = [{
'url': 'https://vimeo.com/album/2632481',
'params': {
'videopassword': 'youtube-dl',
}
- }, {
- 'url': 'https://vimeo.com/album/2632481/sort:plays/format:thumbnail',
- 'only_matching': True,
- }, {
- # TODO: respect page number
- 'url': 'https://vimeo.com/album/2632481/page:2/sort:plays/format:thumbnail',
- 'only_matching': True,
}]
-
- def _page_url(self, base_url, pagenum):
- return '%s/page:%d/' % (base_url, pagenum)
+ _PAGE_SIZE = 100
+
+ def _fetch_page(self, album_id, authorization, hashed_pass, page):
+ api_page = page + 1
+ query = {
+ 'fields': 'link',
+ 'page': api_page,
+ 'per_page': self._PAGE_SIZE,
+ }
+ if hashed_pass:
+ query['_hashed_pass'] = hashed_pass
+ videos = self._download_json(
+ 'https://api.vimeo.com/albums/%s/videos' % album_id,
+ album_id, 'Downloading page %d' % api_page, query=query, headers={
+ 'Authorization': 'jwt ' + authorization,
+ })['data']
+ for video in videos:
+ link = video.get('link')
+ if not link:
+ continue
+ yield self.url_result(link, VimeoIE.ie_key(), VimeoIE._match_id(link))
def _real_extract(self, url):
album_id = self._match_id(url)
- return self._extract_videos(album_id, 'https://vimeo.com/album/%s' % album_id)
+ webpage = self._download_webpage(url, album_id)
+ webpage = self._login_list_password(url, album_id, webpage)
+ api_config = self._extract_vimeo_config(webpage, album_id)['api']
+ entries = OnDemandPagedList(functools.partial(
+ self._fetch_page, album_id, api_config['jwt'],
+ api_config.get('hashed_pass')), self._PAGE_SIZE)
+ return self.playlist_result(entries, album_id, self._html_search_regex(
+ r'<title>\s*(.+?)(?:\s+on Vimeo)?</title>', webpage, 'title', fatal=False))
class VimeoGroupsIE(VimeoAlbumIE):
from .common import InfoExtractor
from .once import OnceIE
from ..compat import compat_urllib_parse_unquote
-from ..utils import ExtractorError
+from ..utils import (
+ ExtractorError,
+ int_or_none,
+)
class VoxMediaVolumeIE(OnceIE):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
- video_data = self._parse_json(self._search_regex(
- r'Volume\.createVideo\(({.+})\s*,\s*{.*}\s*,\s*\[.*\]\s*,\s*{.*}\);', webpage, 'video data'), video_id)
+
+ setup = self._parse_json(self._search_regex(
+ r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
+ video_data = setup.get('video') or {}
+ info = {
+ 'id': video_id,
+ 'title': video_data.get('title_short'),
+ 'description': video_data.get('description_long') or video_data.get('description_short'),
+ 'thumbnail': video_data.get('brightcove_thumbnail')
+ }
+ asset = setup.get('asset') or setup.get('params') or {}
+
+ formats = []
+ hls_url = asset.get('hls_url')
+ if hls_url:
+ formats.extend(self._extract_m3u8_formats(
+ hls_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+ mp4_url = asset.get('mp4_url')
+ if mp4_url:
+ tbr = self._search_regex(r'-(\d+)k\.', mp4_url, 'bitrate', default=None)
+ format_id = 'http'
+ if tbr:
+ format_id += '-' + tbr
+ formats.append({
+ 'format_id': format_id,
+ 'url': mp4_url,
+ 'tbr': int_or_none(tbr),
+ })
+ if formats:
+ self._sort_formats(formats)
+ info['formats'] = formats
+ return info
+
for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
provider_video_id = video_data.get('%s_id' % provider_video_type)
if not provider_video_id:
continue
- info = {
- 'id': video_id,
- 'title': video_data.get('title_short'),
- 'description': video_data.get('description_long') or video_data.get('description_short'),
- 'thumbnail': video_data.get('brightcove_thumbnail')
- }
if provider_video_type == 'brightcove':
info['formats'] = self._extract_once_formats(provider_video_id)
self._sort_formats(info['formats'])
class VoxMediaIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?(?:(?:theverge|vox|sbnation|eater|polygon|curbed|racked)\.com|recode\.net)/(?:[^/]+/)*(?P<id>[^/?]+)'
+ _VALID_URL = r'https?://(?:www\.)?(?:(?:theverge|vox|sbnation|eater|polygon|curbed|racked|funnyordie)\.com|recode\.net)/(?:[^/]+/)*(?P<id>[^/?]+)'
_TESTS = [{
+ # Volume embed, Youtube
'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of',
'info_dict': {
- 'id': '11eXZobjrG8DCSTgrNjVinU-YmmdYjhe',
+ 'id': 'j4mLW6x17VM',
'ext': 'mp4',
- 'title': 'Google\'s new material design direction',
- 'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2',
- },
- 'params': {
- # m3u8 download
- 'skip_download': True,
+ 'title': 'Material world: how Google discovered what software is made of',
+ 'description': 'md5:dfc17e7715e3b542d66e33a109861382',
+ 'upload_date': '20190710',
+ 'uploader_id': 'TheVerge',
+ 'uploader': 'The Verge',
},
- 'add_ie': ['Ooyala'],
+ 'add_ie': ['Youtube'],
}, {
- # data-ooyala-id
+ # Volume embed, Youtube
'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
- 'md5': 'd744484ff127884cd2ba09e3fa604e4b',
+ 'md5': '4c8f4a0937752b437c3ebc0ed24802b5',
'info_dict': {
- 'id': 'RkZXU4cTphOCPDMZg5oEounJyoFI0g-B',
+ 'id': 'Gy8Md3Eky38',
'ext': 'mp4',
'title': 'The Nexus 6: hands-on with Google\'s phablet',
- 'description': 'md5:87a51fe95ff8cea8b5bdb9ac7ae6a6af',
+ 'description': 'md5:d9f0216e5fb932dd2033d6db37ac3f1d',
+ 'uploader_id': 'TheVerge',
+ 'upload_date': '20141021',
+ 'uploader': 'The Verge',
},
- 'add_ie': ['Ooyala'],
- 'skip': 'Video Not Found',
+ 'add_ie': ['Youtube'],
+ 'skip': 'similar to the previous test',
}, {
- # volume embed
+ # Volume embed, Youtube
'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
'info_dict': {
- 'id': 'wydzk3dDpmRz7PQoXRsTIX6XTkPjYL0b',
+ 'id': 'YCjDnX-Xzhg',
'ext': 'mp4',
- 'title': 'The new frontier of LGBTQ civil rights, explained',
- 'description': 'md5:0dc58e94a465cbe91d02950f770eb93f',
- },
- 'params': {
- # m3u8 download
- 'skip_download': True,
+ 'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
+ 'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
+ 'uploader_id': 'voxdotcom',
+ 'upload_date': '20150915',
+ 'uploader': 'Vox',
},
- 'add_ie': ['Ooyala'],
+ 'add_ie': ['Youtube'],
+ 'skip': 'similar to the previous test',
}, {
# youtube embed
'url': 'http://www.vox.com/2016/3/24/11291692/robot-dance',
'uploader': 'Vox',
},
'add_ie': ['Youtube'],
+ 'skip': 'Page no longer contains videos',
}, {
# SBN.VideoLinkset.entryGroup multiple ooyala embeds
'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
'description': 'md5:e02d56b026d51aa32c010676765a690d',
},
}],
+ 'skip': 'Page no longer contains videos',
}, {
# volume embed, Brightcove Once
'url': 'https://www.recode.net/2014/6/17/11628066/post-post-pc-ceo-the-full-code-conference-video-of-microsofts-satya',
- 'md5': '01571a896281f77dc06e084138987ea2',
+ 'md5': '2dbc77b8b0bff1894c2fce16eded637d',
'info_dict': {
'id': '1231c973d',
'ext': 'mp4',
def _call_cms(self, path, video_id, note):
if not self._CMS_SIGNING:
- self._CMS_SIGNING = self._call_api('index', video_id, 'CMS Signing')['cms_signing']
+ index = self._call_api('index', video_id, 'CMS Signing')
+ self._CMS_SIGNING = index.get('cms_signing') or {}
+ if not self._CMS_SIGNING:
+ for signing_policy in index.get('signing_policies', []):
+ signing_path = signing_policy.get('path')
+ if signing_path and signing_path.startswith('/cms/'):
+ name, value = signing_policy.get('name'), signing_policy.get('value')
+ if name and value:
+ self._CMS_SIGNING[name] = value
return self._download_json(
self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
'ext': 'mp3',
'title': 'MP3',
},
+ }, {
+ # with null videoTitle
+ 'url': 'https://view.vzaar.com/20313539/download',
+ 'only_matching': True,
}]
@staticmethod
video_data = self._download_json(
'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
- title = video_data['videoTitle']
+ title = video_data.get('videoTitle') or video_id
formats = []
from __future__ import unicode_literals
+import itertools
import re
from .common import InfoExtractor
clean_html,
determine_ext,
dict_get,
+ extract_attributes,
ExtractorError,
int_or_none,
parse_duration,
class XHamsterIE(InfoExtractor):
+ _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'
_VALID_URL = r'''(?x)
https?://
- (?:.+?\.)?xhamster\.(?:com|one)/
+ (?:.+?\.)?%s/
(?:
movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
)
- '''
-
+ ''' % _DOMAINS
_TESTS = [{
- 'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
- 'md5': '8281348b8d3c53d39fffb377d24eac4e',
+ 'url': 'https://xhamster.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+ 'md5': '98b4687efb1ffd331c4197854dc09e8f',
'info_dict': {
'id': '1509445',
- 'display_id': 'femaleagent_shy_beauty_takes_the_bait',
+ 'display_id': 'femaleagent-shy-beauty-takes-the-bait',
'ext': 'mp4',
'title': 'FemaleAgent Shy beauty takes the bait',
'timestamp': 1350194821,
'uploader': 'Ruseful2011',
'duration': 893,
'age_limit': 18,
- 'categories': ['Fake Hub', 'Amateur', 'MILFs', 'POV', 'Beauti', 'Beauties', 'Beautiful', 'Boss', 'Office', 'Oral', 'Reality', 'Sexy', 'Taking'],
},
}, {
- 'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
+ 'url': 'https://xhamster.com/videos/britney-spears-sexy-booty-2221348?hd=',
'info_dict': {
'id': '2221348',
- 'display_id': 'britney_spears_sexy_booty',
+ 'display_id': 'britney-spears-sexy-booty',
'ext': 'mp4',
'title': 'Britney Spears Sexy Booty',
'timestamp': 1379123460,
'uploader': 'jojo747400',
'duration': 200,
'age_limit': 18,
- 'categories': ['Britney Spears', 'Celebrities', 'HD Videos', 'Sexy', 'Sexy Booty'],
},
'params': {
'skip_download': True,
},
}, {
- # empty seo
+ # empty seo, unavailable via new URL schema
'url': 'http://xhamster.com/movies/5667973/.html',
'info_dict': {
'id': '5667973',
'uploader': 'parejafree',
'duration': 72,
'age_limit': 18,
- 'categories': ['Amateur', 'Blowjobs'],
},
'params': {
'skip_download': True,
}, {
'url': 'https://xhamster.one/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
+ }, {
+ 'url': 'https://xhamster.desi/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
+ 'only_matching': True,
}]
def _real_extract(self, url):
class XHamsterEmbedIE(InfoExtractor):
- _VALID_URL = r'https?://(?:.+?\.)?xhamster\.com/xembed\.php\?video=(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:.+?\.)?%s/xembed\.php\?video=(?P<id>\d+)' % XHamsterIE._DOMAINS
_TEST = {
'url': 'http://xhamster.com/xembed.php?video=3328539',
'info_dict': {
video_url = dict_get(vars, ('downloadLink', 'homepageLink', 'commentsLink', 'shareUrl'))
return self.url_result(video_url, 'XHamster')
+
+
+class XHamsterUserIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:.+?\.)?%s/users/(?P<id>[^/?#&]+)' % XHamsterIE._DOMAINS
+ _TESTS = [{
+ # Paginated user profile
+ 'url': 'https://xhamster.com/users/netvideogirls/videos',
+ 'info_dict': {
+ 'id': 'netvideogirls',
+ },
+ 'playlist_mincount': 267,
+ }, {
+ # Non-paginated user profile
+ 'url': 'https://xhamster.com/users/firatkaan/videos',
+ 'info_dict': {
+ 'id': 'firatkaan',
+ },
+ 'playlist_mincount': 1,
+ }]
+
+ def _entries(self, user_id):
+ next_page_url = 'https://xhamster.com/users/%s/videos/1' % user_id
+ for pagenum in itertools.count(1):
+ page = self._download_webpage(
+ next_page_url, user_id, 'Downloading page %s' % pagenum)
+ for video_tag in re.findall(
+ r'(<a[^>]+class=["\'].*?\bvideo-thumb__image-container[^>]+>)',
+ page):
+ video = extract_attributes(video_tag)
+ video_url = url_or_none(video.get('href'))
+ if not video_url or not XHamsterIE.suitable(video_url):
+ continue
+ video_id = XHamsterIE._match_id(video_url)
+ yield self.url_result(
+ video_url, ie=XHamsterIE.ie_key(), video_id=video_id)
+ mobj = re.search(r'<a[^>]+data-page=["\']next[^>]+>', page)
+ if not mobj:
+ break
+ next_page = extract_attributes(mobj.group(0))
+ next_page_url = url_or_none(next_page.get('href'))
+ if not next_page_url:
+ break
+
+ def _real_extract(self, url):
+ user_id = self._match_id(url)
+ return self.playlist_result(self._entries(user_id), user_id)
# coding: utf-8
from __future__ import unicode_literals
+import hashlib
import itertools
import json
import re
from .common import InfoExtractor, SearchInfoExtractor
from ..compat import (
+ compat_str,
compat_urllib_parse,
compat_urlparse,
)
int_or_none,
mimetype2ext,
smuggle_url,
+ try_get,
unescapeHTML,
+ url_or_none,
)
from .brightcove import (
'https://gyao.yahoo.co.jp/player/%s/' % video_id.replace(':', '/'),
YahooGyaOPlayerIE.ie_key(), video_id))
return self.playlist_result(entries, program_id)
+
+
+class YahooJapanNewsIE(InfoExtractor):
+ IE_NAME = 'yahoo:japannews'
+ IE_DESC = 'Yahoo! Japan News'
+ _VALID_URL = r'https?://(?P<host>(?:news|headlines)\.yahoo\.co\.jp)[^\d]*(?P<id>\d[\d-]*\d)?'
+ _GEO_COUNTRIES = ['JP']
+ _TESTS = [{
+ 'url': 'https://headlines.yahoo.co.jp/videonews/ann?a=20190716-00000071-ann-int',
+ 'info_dict': {
+ 'id': '1736242',
+ 'ext': 'mp4',
+ 'title': 'ムン大統領が対日批判を強化“現金化”効果は?(テレビ朝日系(ANN)) - Yahoo!ニュース',
+ 'description': '韓国の元徴用工らを巡る裁判の原告が弁護士が差し押さえた三菱重工業の資産を売却して - Yahoo!ニュース(テレビ朝日系(ANN))',
+ 'thumbnail': r're:^https?://.*\.[a-zA-Z\d]{3,4}$',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }, {
+ # geo restricted
+ 'url': 'https://headlines.yahoo.co.jp/hl?a=20190721-00000001-oxv-l04',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://headlines.yahoo.co.jp/videonews/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://news.yahoo.co.jp',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://news.yahoo.co.jp/byline/hashimotojunji/20190628-00131977/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://news.yahoo.co.jp/feature/1356',
+ 'only_matching': True
+ }]
+
+ def _extract_formats(self, json_data, content_id):
+ formats = []
+
+ video_data = try_get(
+ json_data,
+ lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+ list)
+ for vid in video_data or []:
+ delivery = vid.get('delivery')
+ url = url_or_none(vid.get('Url'))
+ if not delivery or not url:
+ continue
+ elif delivery == 'hls':
+ formats.extend(
+ self._extract_m3u8_formats(
+ url, content_id, 'mp4', 'm3u8_native',
+ m3u8_id='hls', fatal=False))
+ else:
+ formats.append({
+ 'url': url,
+ 'format_id': 'http-%s' % compat_str(vid.get('bitrate', '')),
+ 'height': int_or_none(vid.get('height')),
+ 'width': int_or_none(vid.get('width')),
+ 'tbr': int_or_none(vid.get('bitrate')),
+ })
+ self._remove_duplicate_formats(formats)
+ self._sort_formats(formats)
+
+ return formats
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ host = mobj.group('host')
+ display_id = mobj.group('id') or host
+
+ webpage = self._download_webpage(url, display_id)
+
+ title = self._html_search_meta(
+ ['og:title', 'twitter:title'], webpage, 'title', default=None
+ ) or self._html_search_regex('<title>([^<]+)</title>', webpage, 'title')
+
+ if display_id == host:
+ # Headline page (w/ multiple BC playlists) ('news.yahoo.co.jp', 'headlines.yahoo.co.jp/videonews/', ...)
+ stream_plists = re.findall(r'plist=(\d+)', webpage) or re.findall(r'plist["\']:\s*["\']([^"\']+)', webpage)
+ entries = [
+ self.url_result(
+ smuggle_url(
+ 'http://players.brightcove.net/5690807595001/HyZNerRl7_default/index.html?playlistId=%s' % plist_id,
+ {'geo_countries': ['JP']}),
+ ie='BrightcoveNew', video_id=plist_id)
+ for plist_id in stream_plists]
+ return self.playlist_result(entries, playlist_title=title)
+
+ # Article page
+ description = self._html_search_meta(
+ ['og:description', 'description', 'twitter:description'],
+ webpage, 'description', default=None)
+ thumbnail = self._og_search_thumbnail(
+ webpage, default=None) or self._html_search_meta(
+ 'twitter:image', webpage, 'thumbnail', default=None)
+ space_id = self._search_regex([
+ r'<script[^>]+class=["\']yvpub-player["\'][^>]+spaceid=([^&"\']+)',
+ r'YAHOO\.JP\.srch\.\w+link\.onLoad[^;]+spaceID["\' ]*:["\' ]+([^"\']+)',
+ r'<!--\s+SpaceID=(\d+)'
+ ], webpage, 'spaceid')
+
+ content_id = self._search_regex(
+ r'<script[^>]+class=["\']yvpub-player["\'][^>]+contentid=(?P<contentid>[^&"\']+)',
+ webpage, 'contentid', group='contentid')
+
+ json_data = self._download_json(
+ 'https://feapi-yvpub.yahooapis.jp/v1/content/%s' % content_id,
+ content_id,
+ query={
+ 'appid': 'dj0zaiZpPVZMTVFJR0FwZWpiMyZzPWNvbnN1bWVyc2VjcmV0Jng9YjU-',
+ 'output': 'json',
+ 'space_id': space_id,
+ 'domain': host,
+ 'ak': hashlib.md5('_'.join((space_id, host)).encode()).hexdigest(),
+ 'device_type': '1100',
+ })
+ formats = self._extract_formats(json_data, content_id)
+
+ return {
+ 'id': content_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'formats': formats,
+ }
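
The `ak` query parameter above is the hex MD5 digest of the space ID and the host joined with an underscore. A minimal standalone sketch of that derivation follows; the helper name `make_access_key` and the sample values are hypothetical and are not part of the extractor.

```python
import hashlib


def make_access_key(space_id, host):
    """Return the hex MD5 digest of '<space_id>_<host>' (hypothetical helper)."""
    return hashlib.md5('_'.join((space_id, host)).encode()).hexdigest()


# Sample values are made up for illustration only.
key = make_access_key('2078710353', 'headlines.yahoo.co.jp')
print(len(key))  # an MD5 hex digest is always 32 characters long
```

Because the key depends only on values scraped from the page, it can be recomputed for every request and no state needs to be kept.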
ExtractorError,
int_or_none,
float_or_none,
+ try_get,
)
IE_DESC = 'Яндекс.Музыка - Трек'
_VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<album_id>\d+)/track/(?P<id>\d+)'
- _TEST = {
+ _TESTS = [{
'url': 'http://music.yandex.ru/album/540508/track/4878838',
'md5': 'f496818aa2f60b6c0062980d2e00dc20',
'info_dict': {
'id': '4878838',
'ext': 'mp3',
- 'title': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
+ 'title': 'Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
'filesize': 4628061,
'duration': 193.04,
'track': 'Gypsy Eyes 1',
'album': 'Gypsy Soul',
'album_artist': 'Carlo Ambrosio',
- 'artist': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari',
+ 'artist': 'Carlo Ambrosio & Fabio Di Bari',
'release_year': 2009,
},
'skip': 'Travis CI servers blocked by YandexMusic',
- }
+ }, {
+ # multiple discs
+ 'url': 'http://music.yandex.ru/album/3840501/track/705105',
+ 'md5': 'ebe7b4e2ac7ac03fe11c19727ca6153e',
+ 'info_dict': {
+ 'id': '705105',
+ 'ext': 'mp3',
+ 'title': 'Hooverphonic - Sometimes',
+ 'filesize': 5743386,
+ 'duration': 239.27,
+ 'track': 'Sometimes',
+ 'album': 'The Best of Hooverphonic',
+ 'album_artist': 'Hooverphonic',
+ 'artist': 'Hooverphonic',
+ 'release_year': 2016,
+ 'genre': 'pop',
+ 'disc_number': 2,
+ 'track_number': 9,
+ },
+ 'skip': 'Travis CI servers blocked by YandexMusic',
+ }]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
'abr': int_or_none(download_data.get('bitrate')),
}
+ def extract_artist_name(artist):
+ decomposed = artist.get('decomposed')
+ if not isinstance(decomposed, list):
+ return artist['name']
+ parts = [artist['name']]
+ for element in decomposed:
+ if isinstance(element, dict) and element.get('name'):
+ parts.append(element['name'])
+ elif isinstance(element, compat_str):
+ parts.append(element)
+ return ''.join(parts)
+
def extract_artist(artist_list):
if artist_list and isinstance(artist_list, list):
- artists_names = [a['name'] for a in artist_list if a.get('name')]
+ artists_names = [extract_artist_name(a) for a in artist_list if a.get('name')]
if artists_names:
return ', '.join(artists_names)
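
The artist-name handling above can be tried in isolation. The sketch below copies the logic of `extract_artist_name`, with plain `str` standing in for `compat_str` (so it assumes Python 3). The shape of the sample `decomposed` list is an assumption inferred from the updated test title, not a documented API response.

```python
def extract_artist_name(artist):
    # Without a 'decomposed' list, the plain name is all there is.
    decomposed = artist.get('decomposed')
    if not isinstance(decomposed, list):
        return artist['name']
    parts = [artist['name']]
    for element in decomposed:
        if isinstance(element, dict) and element.get('name'):
            # Nested artist entry
            parts.append(element['name'])
        elif isinstance(element, str):
            # Separator string such as ' & '
            parts.append(element)
    return ''.join(parts)


# Hypothetical sample data, assumed to resemble the service's JSON.
artist = {
    'name': 'Carlo Ambrosio',
    'decomposed': [' & ', {'name': 'Fabio Di Bari'}],
}
print(extract_artist_name(artist))  # Carlo Ambrosio & Fabio Di Bari
```

The parts are joined with an empty string because the separators are assumed to arrive as their own list elements.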
album = albums[0]
if isinstance(album, dict):
year = album.get('year')
+ disc_number = int_or_none(try_get(
+ album, lambda x: x['trackPosition']['volume']))
+ track_number = int_or_none(try_get(
+ album, lambda x: x['trackPosition']['index']))
track_info.update({
'album': album.get('title'),
'album_artist': extract_artist(album.get('artists')),
'release_year': int_or_none(year),
+ 'genre': album.get('genre'),
+ 'disc_number': disc_number,
+ 'track_number': track_number,
})
track_artist = extract_artist(track.get('artists'))
IE_DESC = 'Яндекс.Музыка - Альбом'
_VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<id>\d+)/?(\?|$)'
- _TEST = {
+ _TESTS = [{
'url': 'http://music.yandex.ru/album/540508',
'info_dict': {
'id': '540508',
},
'playlist_count': 50,
'skip': 'Travis CI servers blocked by YandexMusic',
- }
+ }, {
+ 'url': 'https://music.yandex.ru/album/3840501',
+ 'info_dict': {
+ 'id': '3840501',
+ 'title': 'Hooverphonic - The Best of Hooverphonic (2016)',
+ },
+ 'playlist_count': 33,
+ 'skip': 'Travis CI servers blocked by YandexMusic',
+ }]
def _real_extract(self, url):
album_id = self._match_id(url)
'http://music.yandex.ru/handlers/album.jsx?album=%s' % album_id,
album_id, 'Downloading album JSON')
- entries = self._build_playlist(album['volumes'][0])
+ entries = self._build_playlist([track for volume in album['volumes'] for track in volume])
title = '%s - %s' % (album['artists'][0]['name'], album['title'])
year = album.get('year')
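
The multi-volume change above replaces "first volume only" with a flattened list of all volumes. A small sketch of the difference, using a made-up `album` dictionary whose layout (a list of track lists under `volumes`) is assumed from the code:

```python
# Hypothetical album JSON: two discs, three tracks in total.
album = {
    'volumes': [
        [{'id': '1'}, {'id': '2'}],  # disc 1
        [{'id': '3'}],               # disc 2
    ],
}

# Old behaviour: only the first disc is used.
first_volume_only = album['volumes'][0]

# New behaviour: every track from every disc, in order.
all_tracks = [track for volume in album['volumes'] for track in volume]

print(len(first_volume_only), len(all_tracks))  # 2 3
```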
from .common import InfoExtractor
from ..utils import (
+ determine_ext,
int_or_none,
url_or_none,
)
# episode, sports
'url': 'https://yandex.ru/?stream_channel=1538487871&stream_id=4132a07f71fb0396be93d74b3477131d',
'only_matching': True,
+ }, {
+ # DASH with DRM
+ 'url': 'https://yandex.ru/portal/video?from=morda&stream_id=485a92d94518d73a9d0ff778e13505f8',
+ 'only_matching': True,
}]
def _real_extract(self, url):
'disable_trackings': 1,
})['content']
- m3u8_url = url_or_none(content.get('content_url')) or url_or_none(
+ content_url = url_or_none(content.get('content_url')) or url_or_none(
content['streams'][0]['url'])
title = content.get('title') or content.get('computed_title')
- formats = self._extract_m3u8_formats(
- m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
- m3u8_id='hls')
+ ext = determine_ext(content_url)
+
+ if ext == 'm3u8':
+ formats = self._extract_m3u8_formats(
+ content_url, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls')
+ elif ext == 'mpd':
+ formats = self._extract_mpd_formats(
+ content_url, video_id, mpd_id='dash')
+ else:
+ formats = [{'url': content_url}]
+
self._sort_formats(formats)
description = content.get('description')
compat_str,
)
from ..utils import (
+ bool_or_none,
clean_html,
dict_get,
error_to_compat_str,
+ extract_attributes,
ExtractorError,
float_or_none,
get_element_by_attribute,
'f.req': json.dumps(f_req),
'flowName': 'GlifWebSignIn',
'flowEntry': 'ServiceLogin',
+ # TODO: reverse actual botguard identifier generation algo
+ 'bgRequest': '["identifier",""]',
})
return self._download_json(
url, None, note=note, errnote=errnote,
for video_id, video_title in self.extract_videos_from_page(content):
yield self.url_result(video_id, 'Youtube', video_id, video_title)
- def extract_videos_from_page(self, page):
- ids_in_page = []
- titles_in_page = []
- for mobj in re.finditer(self._VIDEO_RE, page):
+ def extract_videos_from_page_impl(self, video_re, page, ids_in_page, titles_in_page):
+ for mobj in re.finditer(video_re, page):
# The link with index 0 is not the first video of the playlist (not sure if this is still the case)
if 'index' in mobj.groupdict() and mobj.group('id') == '0':
continue
video_id = mobj.group('id')
- video_title = unescapeHTML(mobj.group('title'))
+ video_title = unescapeHTML(
+ mobj.group('title')) if 'title' in mobj.groupdict() else None
if video_title:
video_title = video_title.strip()
+ if video_title == '► Play all':
+ video_title = None
try:
idx = ids_in_page.index(video_id)
if video_title and not titles_in_page[idx]:
except ValueError:
ids_in_page.append(video_id)
titles_in_page.append(video_title)
+
+ def extract_videos_from_page(self, page):
+ ids_in_page = []
+ titles_in_page = []
+ self.extract_videos_from_page_impl(
+ self._VIDEO_RE, page, ids_in_page, titles_in_page)
return zip(ids_in_page, titles_in_page)
(?:www\.)?hooktube\.com/|
(?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/|
+ # Invidious instances taken from https://github.com/omarroth/invidious/wiki/Invidious-Instances
(?:(?:www|dev)\.)?invidio\.us/|
- (?:www\.)?invidiou\.sh/|
- (?:www\.)?invidious\.snopyta\.org/|
+ (?:(?:www|no)\.)?invidiou\.sh/|
+ (?:(?:www|fi|de)\.)?invidious\.snopyta\.org/|
(?:www\.)?invidious\.kabi\.tk/|
+ (?:www\.)?invidious\.enkirton\.net/|
+ (?:www\.)?invidious\.13ad\.de/|
+ (?:www\.)?invidious\.mastodon\.host/|
+ (?:www\.)?invidious\.nixnet\.xyz/|
+ (?:www\.)?tube\.poal\.co/|
(?:www\.)?vid\.wxzm\.sx/|
+ (?:www\.)?yt\.elukerio\.org/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID:
video_id = mobj.group(2)
return video_id
- def _extract_annotations(self, video_id):
- return self._download_webpage(
- 'https://www.youtube.com/annotations_invideo', video_id,
- note='Downloading annotations',
- errnote='Unable to download video annotations', fatal=False,
- query={
- 'features': 1,
- 'legacy': 1,
- 'video_id': video_id,
- })
-
@staticmethod
def _extract_chapters(description, duration):
if not description:
def extract_token(v_info):
return dict_get(v_info, ('account_playback_token', 'accountPlaybackToken', 'token'))
+ def extract_player_response(player_response, video_id):
+ pl_response = str_or_none(player_response)
+ if not pl_response:
+ return
+ pl_response = self._parse_json(pl_response, video_id, fatal=False)
+ if isinstance(pl_response, dict):
+ add_dash_mpd_pr(pl_response)
+ return pl_response
+
player_response = {}
# Get video info
note='Refetching age-gated info webpage',
errnote='unable to download video info webpage')
video_info = compat_parse_qs(video_info_webpage)
+ pl_response = video_info.get('player_response', [None])[0]
+ player_response = extract_player_response(pl_response, video_id)
add_dash_mpd(video_info)
+ view_count = extract_view_count(video_info)
else:
age_gate = False
video_info = None
is_live = True
sts = ytplayer_config.get('sts')
if not player_response:
- pl_response = str_or_none(args.get('player_response'))
- if pl_response:
- pl_response = self._parse_json(pl_response, video_id, fatal=False)
- if isinstance(pl_response, dict):
- player_response = pl_response
+ player_response = extract_player_response(args.get('player_response'), video_id)
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
add_dash_mpd_pr(player_response)
# We also try looking in get_video_info since it may contain different dashmpd
get_video_info = compat_parse_qs(video_info_webpage)
if not player_response:
pl_response = get_video_info.get('player_response', [None])[0]
- if isinstance(pl_response, dict):
- player_response = pl_response
- add_dash_mpd_pr(player_response)
+ player_response = extract_player_response(pl_response, video_id)
add_dash_mpd(get_video_info)
if view_count is None:
view_count = extract_view_count(get_video_info)
break
def extract_unavailable_message():
- return self._html_search_regex(
- r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
- video_webpage, 'unavailable message', default=None)
+ messages = []
+ for tag, kind in (('h1', 'message'), ('div', 'submessage')):
+ msg = self._html_search_regex(
+ r'(?s)<{tag}[^>]+id=["\']unavailable-{kind}["\'][^>]*>(.+?)</{tag}>'.format(tag=tag, kind=kind),
+ video_webpage, 'unavailable %s' % kind, default=None)
+ if msg:
+ messages.append(msg)
+ if messages:
+ return '\n'.join(messages)
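
The message collection above can be sketched outside the extractor with plain `re`. This is a simplified stand-in that takes the page as a parameter: it uses `re.search` plus `strip()` in place of `_html_search_regex`, so it does not clean nested HTML. The sample markup is hypothetical.

```python
import re


def extract_unavailable_message(webpage):
    """Collect the unavailable message and submessage, if present."""
    messages = []
    for tag, kind in (('h1', 'message'), ('div', 'submessage')):
        mobj = re.search(
            r'(?s)<{tag}[^>]+id=["\']unavailable-{kind}["\'][^>]*>(.+?)</{tag}>'.format(
                tag=tag, kind=kind),
            webpage)
        if mobj:
            text = mobj.group(1).strip()
            if text:
                messages.append(text)
    if messages:
        return '\n'.join(messages)
    return None


# Hypothetical page fragment for illustration.
page = (
    '<h1 id="unavailable-message" class="message">Video unavailable</h1>'
    '<div id="unavailable-submessage" class="submessage">'
    'This video is private.</div>'
)
print(extract_unavailable_message(page))
```

When neither element is present the function returns `None`, which lets the caller fall back to other error sources.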
if not video_info:
unavailable_message = extract_unavailable_message()
video_details = try_get(
player_response, lambda x: x['videoDetails'], dict) or {}
- # title
- if 'title' in video_info:
- video_title = video_info['title'][0]
- elif 'title' in player_response:
- video_title = video_details['title']
- else:
+ video_title = video_info.get('title', [None])[0] or video_details.get('title')
+ if not video_title:
self._downloader.report_warning('Unable to extract video title')
video_title = '_'
- # description
description_original = video_description = get_element_by_id("eow-description", video_webpage)
if video_description:
''', replace_url, video_description)
video_description = clean_html(video_description)
else:
- fd_mobj = re.search(r'<meta name="description" content="([^"]+)"', video_webpage)
- if fd_mobj:
- video_description = unescapeHTML(fd_mobj.group(1))
- else:
- video_description = ''
+ video_description = self._html_search_meta('description', video_webpage) or video_details.get('shortDescription')
if not smuggled_data.get('force_singlefeed', False):
if not self._downloader.params.get('noplaylist'):
if view_count is None and video_details:
view_count = int_or_none(video_details.get('viewCount'))
+ if is_live is None:
+ is_live = bool_or_none(video_details.get('isLive'))
+
# Check for "rental" videos
if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:
raise ExtractorError('"rental" videos not supported. See https://github.com/ytdl-org/youtube-dl/issues/359 for more information.', expected=True)
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
formats.append(a_format)
else:
- error_message = clean_html(video_info.get('reason', [None])[0])
+ error_message = extract_unavailable_message()
+ if not error_message:
+ error_message = clean_html(try_get(
+ player_response, lambda x: x['playabilityStatus']['reason'],
+ compat_str))
if not error_message:
- error_message = extract_unavailable_message()
+ error_message = clean_html(
+ try_get(video_info, lambda x: x['reason'][0], compat_str))
if error_message:
raise ExtractorError(error_message, expected=True)
raise ExtractorError('no conn, hlsvp, hlsManifestUrl or url_encoded_fmt_stream_map information found in video info')
# annotations
video_annotations = None
if self._downloader.params.get('writeannotations', False):
- video_annotations = self._extract_annotations(video_id)
+ xsrf_token = self._search_regex(
+ r'([\'"])XSRF_TOKEN\1\s*:\s*([\'"])(?P<xsrf_token>[A-Za-z0-9+/=]+)\2',
+ video_webpage, 'xsrf token', group='xsrf_token', fatal=False)
+ invideo_url = try_get(
+ player_response, lambda x: x['annotations'][0]['playerAnnotationsUrlsRenderer']['invideoUrl'], compat_str)
+ if xsrf_token and invideo_url:
+ xsrf_field_name = self._search_regex(
+ r'([\'"])XSRF_FIELD_NAME\1\s*:\s*([\'"])(?P<xsrf_field_name>\w+)\2',
+ video_webpage, 'xsrf field name',
+ group='xsrf_field_name', default='session_token')
+ video_annotations = self._download_webpage(
+ self._proto_relative_url(invideo_url),
+ video_id, note='Downloading annotations',
+ errnote='Unable to download video annotations', fatal=False,
+ data=urlencode_postdata({xsrf_field_name: xsrf_token}))
chapters = self._extract_chapters(description_original, video_duration)
(%(playlist_id)s)
)""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
_TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s'
- _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?'
+ _VIDEO_RE_TPL = r'href="\s*/watch\?v=%s(?:&(?:[^"]*?index=(?P<index>\d+))?(?:[^>]+>(?P<title>[^<]+))?)?'
+ _VIDEO_RE = _VIDEO_RE_TPL % r'(?P<id>[0-9A-Za-z_-]{11})'
IE_NAME = 'youtube:playlist'
_TESTS = [{
'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
'info_dict': {
'title': '29C3: Not my department',
'id': 'PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
+ 'uploader': 'Christiaan008',
+ 'uploader_id': 'ChRiStIaAn008',
},
'playlist_count': 95,
}, {
'info_dict': {
'title': '[OLD]Team Fortress 2 (Class-based LP)',
'id': 'PLBB231211A4F62143',
+ 'uploader': 'Wickydoo',
+ 'uploader_id': 'Wickydoo',
},
'playlist_mincount': 26,
}, {
'info_dict': {
'title': 'Uploads from Cauchemar',
'id': 'UUBABnxM4Ar9ten8Mdjj1j0Q',
+ 'uploader': 'Cauchemar',
+ 'uploader_id': 'Cauchemar89',
},
'playlist_mincount': 799,
}, {
'info_dict': {
'title': 'JODA15',
'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
+ 'uploader': 'milan',
+ 'uploader_id': 'UCEI1-PVPcYXjB73Hfelbmaw',
}
}, {
'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
'playlist_mincount': 485,
'info_dict': {
- 'title': '2017 華語最新單曲 (2/24更新)',
+ 'title': '2018 Chinese New Singles (11/6 updated)',
'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
+ 'uploader': 'LBK',
+ 'uploader_id': 'sdragonfang',
}
}, {
'note': 'Embedded SWF player',
'info_dict': {
'title': 'JODA7',
'id': 'YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ',
- }
+ },
+ 'skip': 'This playlist does not exist',
}, {
'note': 'Buggy playlist: the webpage has a "Load more" button but it doesn\'t have more videos',
'url': 'https://www.youtube.com/playlist?list=UUXw-G3eDE9trcvY2sBMM_aA',
'info_dict': {
'title': 'Uploads from Interstellar Movie',
'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
+ 'uploader': 'Interstellar Movie',
+ 'uploader_id': 'InterstellarMovie1',
},
'playlist_mincount': 21,
}, {
'params': {
'skip_download': True,
},
+ 'skip': 'This video is not available.',
'add_ie': [YoutubeIE.ie_key()],
}, {
'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
'uploader_id': 'backuspagemuseum',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
'upload_date': '20161008',
- 'license': 'Standard YouTube License',
'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
'categories': ['Nonprofits & Activism'],
'tags': list,
'noplaylist': True,
'skip_download': True,
},
+ }, {
+ # https://github.com/ytdl-org/youtube-dl/issues/21844
+ 'url': 'https://www.youtube.com/playlist?list=PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
+ 'info_dict': {
+ 'title': 'Data Analysis with Dr Mike Pound',
+ 'id': 'PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
+ 'uploader_id': 'Computerphile',
+ 'uploader': 'Computerphile',
+ },
+ 'playlist_mincount': 11,
}, {
'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
'only_matching': True,
def _real_initialize(self):
self._login()
+ def extract_videos_from_page(self, page):
+ ids_in_page = []
+ titles_in_page = []
+
+ for item in re.findall(
+ r'(<[^>]*\bdata-video-id\s*=\s*["\'][0-9A-Za-z_-]{11}[^>]+>)', page):
+ attrs = extract_attributes(item)
+ video_id = attrs['data-video-id']
+ video_title = unescapeHTML(attrs.get('data-title'))
+ if video_title:
+ video_title = video_title.strip()
+ ids_in_page.append(video_id)
+ titles_in_page.append(video_title)
+
+ # Fallback with old _VIDEO_RE
+ self.extract_videos_from_page_impl(
+ self._VIDEO_RE, page, ids_in_page, titles_in_page)
+
+ # Relaxed fallbacks
+ self.extract_videos_from_page_impl(
+ r'href="\s*/watch\?v\s*=\s*(?P<id>[0-9A-Za-z_-]{11})', page,
+ ids_in_page, titles_in_page)
+ self.extract_videos_from_page_impl(
+ r'data-video-ids\s*=\s*["\'](?P<id>[0-9A-Za-z_-]{11})', page,
+ ids_in_page, titles_in_page)
+
+ return zip(ids_in_page, titles_in_page)
+
def _extract_mix(self, playlist_id):
# The mixes are generated from a single video
# the id of the playlist is just 'RD' + video_id
'info_dict': {
'id': 'UUKfVa3S1e4PHvxWcwyMMg8w',
'title': 'Uploads from lex will',
+ 'uploader': 'lex will',
+ 'uploader_id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
}
}, {
'note': 'Age restricted channel',
'info_dict': {
'id': 'UUs0ifCMCm1icqRbqhUINa0w',
'title': 'Uploads from Deus Ex',
+ 'uploader': 'Deus Ex',
+ 'uploader_id': 'DeusExOfficial',
},
}, {
'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA',
'info_dict': {
'id': 'UUfX55Sx5hEFjoC3cNs6mCUQ',
'title': 'Uploads from The Linux Foundation',
+ 'uploader': 'The Linux Foundation',
+ 'uploader_id': 'TheLinuxFoundation',
}
}, {
# Only available via https://www.youtube.com/c/12minuteathlete/videos
'info_dict': {
'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
'title': 'Uploads from 12 Minute Athlete',
+ 'uploader': '12 Minute Athlete',
+ 'uploader_id': 'the12minuteathlete',
}
}, {
'url': 'ytuser:phihag',
'playlist_mincount': 4,
'info_dict': {
'id': 'ThirstForScience',
- 'title': 'Thirst for Science',
+ 'title': 'ThirstForScience',
},
}, {
# with "Load more" button
'id': 'UCiU1dHvZObB2iP6xkJ__Icw',
'title': 'Chem Player',
},
+ 'skip': 'Blocked',
}]
from __future__ import unicode_literals
-__version__ = '2019.07.02'
+__version__ = '2019.09.01'