--all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific
one is requested
- -F, --list-formats List all available formats
+ -F, --list-formats List all available formats of specified
+ videos
--youtube-skip-dash-manifest Do not download the DASH manifests and
related data on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g.
## Subtitle Options:
--write-sub Write subtitle file
- --write-auto-sub Write automatic subtitle file (YouTube
- only)
+ --write-auto-sub Write automatically generated subtitle file
+ (YouTube only)
--all-subs Download all the available subtitles of the
video
--list-subs List all available subtitles for the video
Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering providing a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a web browser to the YouTube URL, solving the CAPTCHA, and restarting youtube-dl.
+### Do I need any other programs?
+
+youtube-dl works fine on its own on most sites. However, if you want to convert video/audio, you'll need [avconv](https://libav.org/) or [ffmpeg](https://www.ffmpeg.org/). On some sites - most notably YouTube - videos can be retrieved in a higher quality format without sound. youtube-dl will detect whether avconv/ffmpeg is present and automatically pick the best option.
+
+Videos or video formats streamed via the RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](http://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
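To make the detection step above concrete, here is a rough Python sketch of how such a check could work (the real logic lives in `youtube_dl/postprocessor/ffmpeg.py` and also parses version strings; `program_available` below is a hypothetical helper, not the project's API):

```python
import subprocess

def program_available(name):
    # Hypothetical helper for illustration; youtube-dl's real detection
    # lives in youtube_dl/postprocessor/ffmpeg.py and parses versions too.
    try:
        subprocess.call([name, '-version'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return True
    except OSError:
        return False

# Prefer whichever converter is present on PATH, as described above.
converter = next(
    (p for p in ('avconv', 'ffmpeg') if program_available(p)), None)
```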
+
### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
--all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific
one is requested
- -F, --list-formats List all available formats
+ -F, --list-formats List all available formats of specified
+ videos
--youtube-skip-dash-manifest Do not download the DASH manifests and
related data on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g.
Subtitle Options:
--write-sub Write subtitle file
- --write-auto-sub Write automatic subtitle file (YouTube
- only)
+ --write-auto-sub Write automatically generated subtitle file
+ (YouTube only)
--all-subs Download all the available subtitles of the
video
--list-subs List all available subtitles for the video
web browser to the YouTube URL, solving the CAPTCHA, and restarting
youtube-dl.
+Do I need any other programs?
+
+youtube-dl works fine on its own on most sites. However, if you want to
+convert video/audio, you'll need avconv or ffmpeg. On some sites - most
+notably YouTube - videos can be retrieved in a higher quality format
+without sound. youtube-dl will detect whether avconv/ffmpeg is present
+and automatically pick the best option.
+
+Videos or video formats streamed via the RTMP protocol can only be
+downloaded when rtmpdump is installed. Downloading MMS and RTSP videos
+requires either mplayer or mpv to be installed.
+
I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as vlc or
- **Bpb**: Bundeszentrale für politische Bildung
- **BR**: Bayerischer Rundfunk Mediathek
- **Break**
- - **Brightcove**
+ - **brightcove:legacy**
+ - **brightcove:new**
- **bt:article**: Bergens Tidende Articles
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed**
- **Discovery**
- **Dotsub**
- **DouyuTV**: 斗鱼
+ - **DPlay**
- **dramafever**
- **dramafever:series**
- **DRBonanza**
- **GodTube**
- **GoldenMoustache**
- **Golem**
- - **GorillaVid**: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net and filehoot.com
- **Goshgay**
- **Groupon**
- **Hark**
- **nowness:playlist**
- **nowness:series**
- **NowTV**
+ - **NowTVList**
- **nowvideo**: NowVideo
- **npo**: npo.nl and ntr.nl
- **npo.nl:live**
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- - **Quickscope**: Quick Scope
- **QuickVid**
- **R7**
- **radio.de**
- **soompi:show**
- **soundcloud**
- **soundcloud:playlist**
+ - **soundcloud:search**: Soundcloud search
- **soundcloud:set**
- **soundcloud:user**
- **soundgasm**
- **WSJ**: Wall Street Journal
- **XBef**
- **XboxClips**
+ - **XFileShare**: XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me
- **XHamster**
- **XHamsterEmbed**
- **XMinus**
- **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
+ - **youtube:user:playlists**: YouTube.com user playlists
- **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
- **Zapiks**
- **ZDF**
clean_html,
DateRange,
detect_exe_version,
+ determine_ext,
encodeFilename,
escape_rfc3986,
escape_url,
self.assertEqual(unescapeHTML('%20;'), '%20;')
self.assertEqual(unescapeHTML('&#x2F;'), '/')
self.assertEqual(unescapeHTML('&#47;'), '/')
- self.assertEqual(
- unescapeHTML('&eacute;'), 'é')
+ self.assertEqual(unescapeHTML('&eacute;'), 'é')
+ self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
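The unescapeHTML tests above pin down the expected behaviour, including leaving an out-of-range numeric reference such as `&#2013266066;` untouched. A simplified, self-contained sketch of that behaviour (the real implementation in `youtube_dl/utils.py` uses the full HTML entity table through the compat layer; the `named` dict below is a stand-in):

```python
import re

def unescapeHTML(s):
    # Simplified sketch, not the project's implementation.
    named = {'eacute': 'é'}  # stand-in for the full named-entity table

    def transform(m):
        entity = m.group(1)
        if entity in named:
            return named[entity]
        num = re.match(r'#(x?)([0-9a-fA-F]+)$', entity)
        if num:
            try:
                return chr(int(num.group(2), 16 if num.group(1) else 10))
            except (ValueError, OverflowError):
                pass  # out-of-range references like &#2013266066; stay as-is
        return m.group(0)  # leave anything unrecognized untouched

    return re.sub(r'&([^;]+);', transform, s)
```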
def test_daterange(self):
_20century = DateRange("19000101", "20000101")
self.assertEqual(unified_strdate('25-09-2014'), '20140925')
self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
+ def test_determine_ext(self):
+ self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
+ self.assertEqual(determine_ext('http://example.com/foo/bar/?download', None), None)
+ self.assertEqual(determine_ext('http://example.com/foo/bar.nonext/?download', None), None)
+ self.assertEqual(determine_ext('http://example.com/foo/bar/mp4?download', None), None)
+ self.assertEqual(determine_ext('http://example.com/foo/bar.m3u8//?download'), 'm3u8')
+
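The new `determine_ext` tests encode how query strings and trailing slashes should be handled. A minimal sketch consistent with them (the real helper in `youtube_dl/utils.py` has a much larger extension whitelist; `KNOWN_EXTENSIONS` is trimmed here for illustration):

```python
import re

KNOWN_EXTENSIONS = ('mp4', 'm4a', 'webm', 'flv', 'm3u8')  # trimmed stand-in

def determine_ext(url, default_ext='unknown_video'):
    if url is None:
        return default_ext
    # Strip the query string, then take everything after the last dot.
    guess = url.partition('?')[0].rpartition('.')[2]
    if re.match(r'^[A-Za-z0-9]+$', guess):
        return guess
    # Tolerate a trailing slash: http://example.com/foo/bar.mp4/?download
    elif guess.rstrip('/') in KNOWN_EXTENSIONS:
        return guess.rstrip('/')
    return default_ext
```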
def test_find_xpath_attr(self):
testxml = '''<root>
<node/>
.RE
.TP
.B \-F, \-\-list\-formats
-List all available formats
+List all available formats of specified videos
.RS
.RE
.TP
.RE
.TP
.B \-\-write\-auto\-sub
-Write automatic subtitle file (YouTube only)
+Write automatically generated subtitle file (YouTube only)
.RS
.RE
.TP
CAPTCHA (https://github.com/rg3/youtube-dl/issues/154), but at the
moment, your best course of action is pointing a web browser to the
YouTube URL, solving the CAPTCHA, and restarting youtube\-dl.
+.SS Do I need any other programs?
+.PP
+youtube\-dl works fine on its own on most sites.
+However, if you want to convert video/audio, you\[aq]ll need
+avconv (https://libav.org/) or ffmpeg (https://www.ffmpeg.org/).
+On some sites \- most notably YouTube \- videos can be retrieved in a
+higher quality format without sound.
+youtube\-dl will detect whether avconv/ffmpeg is present and
+automatically pick the best option.
+.PP
+Videos or video formats streamed via the RTMP protocol can only be
+downloaded when rtmpdump (https://rtmpdump.mplayerhq.hu/) is installed.
+Downloading MMS and RTSP videos requires either
+mplayer (http://mplayerhq.hu/) or mpv (https://mpv.io/) to be installed.
.SS I have downloaded a video but how can I play it?
.PP
Once the video is fully downloaded, use any video player, such as
complete --command youtube-dl --long-option format --short-option f --description 'Video format code, see the "FORMAT SELECTION" for all the info'
complete --command youtube-dl --long-option all-formats --description 'Download all available video formats'
complete --command youtube-dl --long-option prefer-free-formats --description 'Prefer free video formats unless a specific one is requested'
-complete --command youtube-dl --long-option list-formats --short-option F --description 'List all available formats'
+complete --command youtube-dl --long-option list-formats --short-option F --description 'List all available formats of specified videos'
complete --command youtube-dl --long-option youtube-include-dash-manifest
complete --command youtube-dl --long-option youtube-skip-dash-manifest --description 'Do not download the DASH manifests and related data on YouTube videos'
complete --command youtube-dl --long-option merge-output-format --description 'If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv. Ignored if no merge is required'
complete --command youtube-dl --long-option write-sub --description 'Write subtitle file'
-complete --command youtube-dl --long-option write-auto-sub --description 'Write automatic subtitle file (YouTube only)'
+complete --command youtube-dl --long-option write-auto-sub --description 'Write automatically generated subtitle file (YouTube only)'
complete --command youtube-dl --long-option all-subs --description 'Download all the available subtitles of the video'
complete --command youtube-dl --long-option list-subs --description 'List all available subtitles for the video'
complete --command youtube-dl --long-option sub-format --description 'Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"'
import ctypes
from .compat import (
+ compat_basestring,
compat_cookiejar,
compat_expanduser,
compat_get_terminal_size,
SameFileError,
sanitize_filename,
sanitize_path,
+ sanitized_Request,
std_headers,
subtitles_filename,
UnavailableVideoError,
writethumbnail: Write the thumbnail image to a file
write_all_thumbnails: Write all thumbnail formats to files
writesubtitles: Write the video subtitles to a file
- writeautomaticsub: Write the automatic subtitles to a file
+ writeautomaticsub: Write the automatically generated subtitles to a file
allsubtitles: Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
extra_info=extra)
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
+ self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
elif result_type == 'compat_list':
self.report_warning(
filter_parts.append(string)
def _remove_unused_ops(tokens):
- # Remove operators that we don't use and join them with the sourrounding strings
+ # Remove operators that we don't use and join them with the surrounding strings
# for example: 'mp4' '-' 'baseline' '-' '16x9' is converted to 'mp4-baseline-16x9'
ALLOWED_OPS = ('/', '+', ',', '(', ')')
last_string, last_start, last_end, last_line = None, None, None, None
return res
def _calc_cookies(self, info_dict):
- pr = compat_urllib_request.Request(info_dict['url'])
+ pr = sanitized_Request(info_dict['url'])
self.cookiejar.add_cookie_header(pr)
return pr.get_header('Cookie')
def urlopen(self, req):
""" Start an HTTP download """
+ if isinstance(req, compat_basestring):
+ req = sanitized_Request(req)
return self._opener.open(req, timeout=self._socket_timeout)
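The substitution that recurs throughout this changeset swaps `compat_urllib_request.Request` for `sanitized_Request`, a thin wrapper that cleans a URL before building the request object. A minimal sketch of such a wrapper, assuming the sanitization simply normalizes protocol-relative URLs (the exact rules in `youtube_dl/utils.py` may differ):

```python
from youtube_dl.compat import compat_urllib_request

def sanitize_url(url):
    # Assumed rule for illustration: make protocol-relative URLs absolute.
    return 'http:%s' % url if url.startswith('//') else url

def sanitized_Request(url, *args, **kwargs):
    # Same signature as compat_urllib_request.Request, URL cleaned first.
    return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs)
```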
def print_debug_header(self):
with YoutubeDL(ydl_opts) as ydl:
# Update version
if opts.update_self:
- update_self(ydl.to_screen, opts.verbose)
+ update_self(ydl.to_screen, opts.verbose, ydl._opener)
# Remove cache dir
if opts.rm_cachedir:
min_filesize: Skip files smaller than this size
max_filesize: Skip files larger than this size
xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
- (experimenatal)
+ (experimental)
external_downloader_args: A list of additional command-line arguments for the
external downloader.
import re
from .common import FileDownloader
-from ..compat import compat_urllib_request
+from ..utils import sanitized_Request
class DashSegmentsFD(FileDownloader):
def append_url_to_file(outf, target_url, target_name, remaining_bytes=None):
self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name))
- req = compat_urllib_request.Request(target_url)
+ req = sanitized_Request(target_url)
if remaining_bytes is not None:
req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))
# [http @ 00000000003d2fa0] No trailing CRLF found in HTTP header.
args += [
'-headers',
- ''.join('%s: %s\r\n' % (key, val) for key, val in info_dict['http_headers'].items())]
+ ''.join('%s: %s\r\n' % (key, val) for key, val in info_dict['http_headers'].items() if key.lower() != 'accept-encoding')]
args += ['-i', url, '-f', 'mp4', '-c', 'copy', '-bsf:a', 'aac_adtstoasc']
import re
from .common import FileDownloader
-from ..compat import (
- compat_urllib_request,
- compat_urllib_error,
-)
+from ..compat import compat_urllib_error
from ..utils import (
ContentTooShortError,
encodeFilename,
sanitize_open,
+ sanitized_Request,
)
add_headers = info_dict.get('http_headers')
if add_headers:
headers.update(add_headers)
- basic_request = compat_urllib_request.Request(url, None, headers)
- request = compat_urllib_request.Request(url, None, headers)
+ basic_request = sanitized_Request(url, None, headers)
+ request = sanitized_Request(url, None, headers)
is_test = self.params.get('test', False)
return False
# Download using rtmpdump. rtmpdump returns exit code 2 when
- # the connection was interrumpted and resuming appears to be
+ # the connection was interrupted and resuming appears to be
# possible. This is part of rtmpdump's normal usage, AFAIK.
basic_args = [
'rtmpdump', '--verbose', '-r', url,
from .bpb import BpbIE
from .br import BRIE
from .breakcom import BreakIE
-from .brightcove import BrightcoveIE
+from .brightcove import (
+ BrightcoveLegacyIE,
+ BrightcoveNewIE,
+)
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
from .dhm import DHMIE
from .dotsub import DotsubIE
from .douyutv import DouyuTVIE
+from .dplay import DPlayIE
from .dramafever import (
DramaFeverIE,
DramaFeverSeriesIE,
from .golem import GolemIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
-from .gorillavid import GorillaVidIE
from .goshgay import GoshgayIE
from .groupon import GrouponIE
from .hark import HarkIE
NownessPlaylistIE,
NownessSeriesIE,
)
-from .nowtv import NowTVIE
+from .nowtv import (
+ NowTVIE,
+ NowTVListIE,
+)
from .nowvideo import NowVideoIE
from .npo import (
NPOIE,
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
-from .periscope import (
- PeriscopeIE,
- QuickscopeIE,
-)
+from .periscope import PeriscopeIE
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
- SoundcloudPlaylistIE
+ SoundcloudPlaylistIE,
+ SoundcloudSearchIE
)
from .soundgasm import (
SoundgasmIE,
from .wsj import WSJIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
+from .xfileshare import XFileShareIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeUserIE,
+ YoutubeUserPlaylistsIE,
YoutubeWatchLaterIE,
)
from .zapiks import ZapiksIE
'description': 'As a birth attendant advocating for family planning, Remy is on the frontline of Tondo\'s battle with overcrowding.',
'uploader': 'Al Jazeera English',
},
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
'skip': 'Not accessible from Travis CI server',
}
'playerKey=AQ~~%2CAAAAmtVJIFk~%2CTVGOQ5ZTwJbeMWnq5d_H4MOM57xfzApc'
'&%40videoPlayer={0}'.format(brightcove_id)
),
- 'ie_key': 'Brightcove',
+ 'ie_key': 'BrightcoveLegacy',
}
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
int_or_none,
float_or_none,
+ sanitized_Request,
xpath_text,
ExtractorError,
)
'j_password': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
formats = []
for fmt in ['windows', 'android_tablet']:
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._URL_VIDEO_TEMPLATE.format(fmt, episode_id, timestamp_shifted, token))
request.add_header('User-Agent', self._USER_AGENT)
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_str,
)
from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
+ sanitized_Request,
)
'pass': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
'&sort=created&access_mode=0%2C1%2C2&limit={count}'
'&method=broadcast&format=json&vid_older_than={last}'
).format(user=user, count=self._STEP, last=last_id)
- req = compat_urllib_request.Request(req_url)
+ req = sanitized_Request(req_url)
# Without setting this header, we wouldn't get any result
req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
data = self._download_json(
_MEDIASELECTOR_URLS = [
# Provides HQ HLS streams with even better quality than pc mediaset but fails
# with geolocation in some cases when it's even not geo restricted at all (e.g.
- # http://www.bbc.co.uk/programmes/b06bp7lf)
+ # http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable.
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
]
return self._download_media_selector_url(
mediaselector_url % programme_id, programme_id)
except BBCCoUkIE.MediaSelectionError as e:
- if e.id in ('notukerror', 'geolocation'):
+ if e.id in ('notukerror', 'geolocation', 'selectionunavailable'):
last_exception = e
continue
self._raise_extractor_error(e)
media_selection = self._download_xml(
url, programme_id, 'Downloading media selection XML')
except ExtractorError as ee:
- if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
+ if isinstance(ee.cause, compat_HTTPError) and ee.cause.code in (403, 404):
media_selection = compat_etree_fromstring(ee.cause.read().decode('utf-8'))
else:
raise
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
clean_html,
int_or_none,
parse_iso8601,
+ sanitized_Request,
unescapeHTML,
xpath_text,
xpath_with_ns,
for lang, url in subtitles_urls.items():
# For some weird reason, blip.tv serves a video instead of subtitles
# when we request with a common UA
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('User-Agent', 'youtube-dl')
subtitles[lang] = [{
# The extension is 'srt' but it's actually an 'ass' file
class BloombergIE(InfoExtractor):
- _VALID_URL = r'https?://www\.bloomberg\.com/news/videos/[^/]+/(?P<id>[^/?#]+)'
+ _VALID_URL = r'https?://www\.bloomberg\.com/news/[^/]+/[^/]+/(?P<id>[^/?#]+)'
- _TEST = {
+ _TESTS = [{
'url': 'http://www.bloomberg.com/news/videos/b/aaeae121-5949-481e-a1ce-4562db6f5df2',
# The md5 checksum changes
'info_dict': {
'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
'description': 'md5:a8ba0302912d03d246979735c17d2761',
},
- }
+ }, {
+ 'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
name = self._match_id(url)
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
- compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
)
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
+ float_or_none,
+ js_to_json,
+ int_or_none,
+ parse_iso8601,
+ sanitized_Request,
unescapeHTML,
unsmuggle_url,
)
-class BrightcoveIE(InfoExtractor):
+class BrightcoveLegacyIE(InfoExtractor):
+ IE_NAME = 'brightcove:legacy'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
def _get_video_info(self, video_id, query_str, query, referer=None):
request_url = self._FEDERATED_URL_TEMPLATE % query_str
- req = compat_urllib_request.Request(request_url)
+ req = sanitized_Request(request_url)
linkBase = query.get('linkBaseURL')
if linkBase is not None:
referer = linkBase[0]
if 'url' not in info and not info.get('formats'):
raise ExtractorError('Unable to extract video url for %s' % info['id'])
return info
+
+
+class BrightcoveNewIE(InfoExtractor):
+ IE_NAME = 'brightcove:new'
+ _VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>\d+)'
+ _TESTS = [{
+ 'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001',
+ 'md5': 'c8100925723840d4b0d243f7025703be',
+ 'info_dict': {
+ 'id': '4463358922001',
+ 'ext': 'mp4',
+ 'title': 'Meet the man behind Popcorn Time',
+ 'description': 'md5:eac376a4fe366edc70279bfb681aea16',
+ 'duration': 165.768,
+ 'timestamp': 1441391203,
+ 'upload_date': '20150904',
+ 'uploader_id': '929656772001',
+ 'formats': 'mincount:22',
+ },
+ }, {
+ # with rtmp streams
+ 'url': 'http://players.brightcove.net/4036320279001/5d112ed9-283f-485f-a7f9-33f42e8bc042_default/index.html?videoId=4279049078001',
+ 'info_dict': {
+ 'id': '4279049078001',
+ 'ext': 'mp4',
+ 'title': 'Titansgrave: Chapter 0',
+ 'description': 'Titansgrave: Chapter 0',
+ 'duration': 1242.058,
+ 'timestamp': 1433556729,
+ 'upload_date': '20150606',
+ 'uploader_id': '4036320279001',
+ 'formats': 'mincount:41',
+ },
+ 'params': {
+ 'skip_download': True,
+ }
+ }]
+
+ @staticmethod
+ def _extract_urls(webpage):
+ # Reference:
+ # 1. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideoiniframe
+ # 2. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideousingjavascript)
+ # 3. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/embed-in-page.html
+
+ entries = []
+
+ # Look for iframe embeds [1]
+ for _, url in re.findall(
+ r'<iframe[^>]+src=(["\'])((?:https?:)//players\.brightcove\.net/\d+/[^/]+/index\.html.+?)\1', webpage):
+ entries.append(url)
+
+ # Look for embed_in_page embeds [2]
+ for video_id, account_id, player_id, embed in re.findall(
+ # According to the examples in [3], it's unclear whether the video
+ # id may be omitted and what to do when it is
+ r'''(?sx)
+ <video[^>]+
+ data-video-id=["\'](\d+)["\'][^>]*>.*?
+ </video>.*?
+ <script[^>]+
+ src=["\'](?:https?:)?//players\.brightcove\.net/
+ (\d+)/([\da-f-]+)_([^/]+)/index\.min\.js
+ ''', webpage):
+ entries.append(
+ 'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
+ % (account_id, player_id, embed, video_id))
+
+ return entries
+
+ def _real_extract(self, url):
+ account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()
+
+ webpage = self._download_webpage(
+ 'http://players.brightcove.net/%s/%s_%s/index.min.js'
+ % (account_id, player_id, embed), video_id)
+
+ policy_key = None
+
+ catalog = self._search_regex(
+ r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
+ if catalog:
+ catalog = self._parse_json(
+ js_to_json(catalog), video_id, fatal=False)
+ if catalog:
+ policy_key = catalog.get('policyKey')
+
+ if not policy_key:
+ policy_key = self._search_regex(
+ r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
+ webpage, 'policy key', group='pk')
+
+ req = sanitized_Request(
+ 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s'
+ % (account_id, video_id),
+ headers={'Accept': 'application/json;pk=%s' % policy_key})
+ json_data = self._download_json(req, video_id)
+
+ title = json_data['name']
+
+ formats = []
+ for source in json_data.get('sources', []):
+ source_type = source.get('type')
+ src = source.get('src')
+ if source_type == 'application/x-mpegURL':
+ if not src:
+ continue
+ m3u8_formats = self._extract_m3u8_formats(
+ src, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
+ else:
+ streaming_src = source.get('streaming_src')
+ stream_name, app_name = source.get('stream_name'), source.get('app_name')
+ if not src and not streaming_src and (not stream_name or not app_name):
+ continue
+ tbr = float_or_none(source.get('avg_bitrate'), 1000)
+ height = int_or_none(source.get('height'))
+ container = source.get('container')
+ f = {
+ 'tbr': tbr,
+ 'width': int_or_none(source.get('width')),
+ 'height': height,
+ 'filesize': int_or_none(source.get('size')),
+ 'container': container,
+ 'vcodec': source.get('codec'),
+ 'ext': container.lower() if container else None,
+ }
+
+ def build_format_id(kind):
+ format_id = kind
+ if tbr:
+ format_id += '-%dk' % int(tbr)
+ if height:
+ format_id += '-%dp' % height
+ return format_id
+
+ if src or streaming_src:
+ f.update({
+ 'url': src or streaming_src,
+ 'format_id': build_format_id('http' if src else 'http-streaming'),
+ 'preference': 2 if src else 1,
+ })
+ else:
+ f.update({
+ 'url': app_name,
+ 'play_path': stream_name,
+ 'format_id': build_format_id('rtmp'),
+ })
+ formats.append(f)
+ self._sort_formats(formats)
+
+ description = json_data.get('description')
+ thumbnail = json_data.get('thumbnail')
+ timestamp = parse_iso8601(json_data.get('published_at'))
+ duration = float_or_none(json_data.get('duration'), 1000)
+ tags = json_data.get('tags', [])
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'duration': duration,
+ 'timestamp': timestamp,
+ 'uploader_id': account_id,
+ 'formats': formats,
+ 'tags': tags,
+ }
from __future__ import unicode_literals
from .common import InfoExtractor
+from ..utils import (
+ sanitized_Request,
+ smuggle_url,
+)
class CBSIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
- webpage = self._download_webpage(url, display_id)
+ request = sanitized_Request(url)
+ # Requests with an Android UA are served higher quality (720p) streams (see
+ # https://github.com/rg3/youtube-dl/issues/7490)
+ request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)')
+ webpage = self._download_webpage(request, display_id)
real_id = self._search_regex(
[r"video\.settings\.pid\s*=\s*'([^']+)';", r"cbsplayer\.pid\s*=\s*'([^']+)';"],
webpage, 'real video ID')
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
- 'url': 'theplatform:%s' % real_id,
+ 'url': smuggle_url(
+ 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true&manifest=m3u' % real_id,
+ {'force_smil_url': True}),
'display_id': display_id,
}
'format_id': format_id,
}
if uri.startswith('rtmp'):
+ play_path = re.sub(
+ r'{slistFilePath}', '',
+ uri.split('<break>')[-1].split('{break}')[-1])
fmt.update({
'app': 'ondemand?auth=cbs',
- 'play_path': 'mp4:' + uri.split('<break>')[-1],
+ 'play_path': 'mp4:' + play_path,
'player_url': 'http://www.cbsnews.com/[[IMPORT]]/vidtech.cbsinteractive.com/player/3_3_0/CBSI_PLAYER_HD.swf',
'page_url': 'http://www.cbsnews.com',
'ext': 'flv',
from .common import InfoExtractor
from ..compat import (
- compat_urllib_request,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
from ..utils import (
ExtractorError,
float_or_none,
+ sanitized_Request,
)
'requestSource': 'iVysilani',
}
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=compat_urllib_parse.urlencode(data))
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
- req = compat_urllib_request.Request(compat_urllib_parse_unquote(playlist_url))
+ req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage)
import json
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
int_or_none,
+ sanitized_Request,
)
}
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
json.dumps(player_options_request))
request.add_header('Content-Type', 'application/json')
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlparse,
- compat_urllib_request,
compat_urlparse,
compat_str,
compat_etree_fromstring,
int_or_none,
RegexNotFoundError,
sanitize_filename,
+ sanitized_Request,
unescapeHTML,
unified_strdate,
url_basename,
if not media_nodes:
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
+ base_url = xpath_text(
+ manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'],
+ 'base URL', default=None)
+ if base_url:
+ base_url = base_url.strip()
for i, media_el in enumerate(media_nodes):
if manifest_version == '2.0':
media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
continue
manifest_url = (
media_url if media_url.startswith('http://') or media_url.startswith('https://')
- else ('/'.join(manifest_url.split('/')[:-1]) + '/' + media_url))
+ else ((base_url or '/'.join(manifest_url.split('/')[:-1])) + '/' + media_url))
# If media_url is itself a f4m manifest do the recursive extraction
# since bitrates in parent manifest (this one) and media_url manifest
# may differ leading to inability to resolve the format by requested
def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
int_or_none,
lowercase_escape,
remove_end,
+ sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_text,
'name': username,
'password': password,
})
- login_request = compat_urllib_request.Request(login_url, data)
+ login_request = sanitized_Request(login_url, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._download_webpage(login_request, None, False, 'Wrong login info')
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
- else compat_urllib_request.Request(url_or_request))
+ else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/rg3/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
'video_uploader', fatal=False)
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
- playerdata_req = compat_urllib_request.Request(playerdata_url)
+ playerdata_req = sanitized_Request(playerdata_url)
playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
- streamdata_req = compat_urllib_request.Request(
+ streamdata_req = sanitized_Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
from .common import InfoExtractor
-from ..compat import (
- compat_str,
- compat_urllib_request,
-)
+from ..compat import compat_str
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
parse_iso8601,
+ sanitized_Request,
str_to_int,
unescapeHTML,
)
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Cookie', 'family_filter=off; ff=off')
return request
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
int_or_none,
parse_iso8601,
+ sanitized_Request,
)
def _real_extract(self, url):
video_id = self._match_id(url)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
headers={'Origin': 'http://www.dcndigital.ae'})
--- /dev/null
+# encoding: utf-8
+from __future__ import unicode_literals
+
+import time
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class DPlayIE(InfoExtractor):
+ _VALID_URL = r'http://www\.dplay\.se/[^/]+/(?P<id>[^/?#]+)'
+
+ _TEST = {
+ 'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
+ 'info_dict': {
+ 'id': '3172',
+ 'ext': 'mp4',
+ 'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
+ 'title': 'Svensken lär sig njuta av livet',
+ 'duration': 2650,
+ },
+ }
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+ video_id = self._search_regex(
+ r'data-video-id="(\d+)"', webpage, 'video id')
+
+ info = self._download_json(
+ 'http://www.dplay.se/api/v2/ajax/videos?video_id=' + video_id,
+ video_id)['data'][0]
+
+ self._set_cookie(
+ 'secure.dplay.se', 'dsc-geo',
+ '{"countryCode":"NL","expiry":%d}' % ((time.time() + 20 * 60) * 1000))
+ # TODO: consider adding support for 'stream_type=hds', it seems to
+ # require setting some cookies
+ manifest_url = self._download_json(
+ 'https://secure.dplay.se/secure/api/v2/user/authorization/stream/%s?stream_type=hls' % video_id,
+ video_id, 'Getting manifest url for hls stream')['hls']
+ formats = self._extract_m3u8_formats(
+ manifest_url, video_id, ext='mp4', entry_protocol='m3u8_native')
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': info['title'],
+ 'formats': formats,
+ 'duration': int_or_none(info.get('video_metadata_length'), scale=1000),
+ }
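A quick way to exercise a freshly added extractor like this one is through the public `YoutubeDL` API; the URL below is the one from the `_TEST` above:

```python
from youtube_dl import YoutubeDL

with YoutubeDL() as ydl:
    # download=False returns the metadata dict without fetching the video.
    info = ydl.extract_info(
        'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/'
        'season-1-svensken-lar-sig-njuta-av-livet/', download=False)
    print(info['id'], info['title'])
```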
from ..compat import (
compat_HTTPError,
compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
determine_ext,
int_or_none,
parse_iso8601,
+ sanitized_Request,
)
'password': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
from __future__ import unicode_literals
import base64
+import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
-from ..utils import qualities
+from ..utils import (
+ qualities,
+ sanitized_Request,
+)
class DumpertIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?dumpert\.nl/(?:mediabase|embed)/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
+ _VALID_URL = r'(?P<protocol>https?)://(?:www\.)?dumpert\.nl/(?:mediabase|embed)/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://www.dumpert.nl/mediabase/6646981/951bc60f/',
'md5': '1b9318d7d5054e7dcb9dc7654f21d643',
}]
def _real_extract(self, url):
- video_id = self._match_id(url)
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ protocol = mobj.group('protocol')
- url = 'https://www.dumpert.nl/mediabase/' + video_id
- req = compat_urllib_request.Request(url)
+ url = '%s://www.dumpert.nl/mediabase/%s' % (protocol, video_id)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'nsfw=1; cpc=10')
webpage = self._download_webpage(req, video_id)
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
+ sanitized_Request,
)
hls_url = media.get('HLS_SURL')
if hls_url:
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://mam.eitb.eus/mam/REST/ServiceMultiweb/DomainRestrictedSecurity/TokenAuth/',
headers={'Referer': url})
token_data = self._download_json(
import json
from .common import InfoExtractor
-from ..compat import compat_urllib_request
-
from ..utils import (
determine_ext,
clean_html,
int_or_none,
float_or_none,
+ sanitized_Request,
)
video_id = ims_video['videoID']
key = ims_video['hash']
- config_req = compat_urllib_request.Request(
+ config_req = sanitized_Request(
'http://www.escapistmagazine.com/videos/'
'vidconfig.php?videoID=%s&hash=%s' % (video_id, key))
config_req.add_header('Referer', url)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
playlist_id = mobj.group('id')
pllist_url = 'http://everyonesmixtape.com/mixtape.php?a=getMixes&u=-1&linked=%s&explore=' % playlist_id
- pllist_req = compat_urllib_request.Request(pllist_url)
+ pllist_req = sanitized_Request(pllist_url)
pllist_req.add_header('X-Requested-With', 'XMLHttpRequest')
playlist_list = self._download_json(
raise ExtractorError('Playlist id not found')
pl_url = 'http://everyonesmixtape.com/mixtape.php?a=getMix&id=%s&userId=null&code=' % playlist_no
- pl_req = compat_urllib_request.Request(pl_url)
+ pl_req = sanitized_Request(pl_url)
pl_req.add_header('X-Requested-With', 'XMLHttpRequest')
playlist = self._download_json(
pl_req, playlist_id, note='Downloading playlist info')
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
int_or_none,
+ sanitized_Request,
str_to_int,
)
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
compat_str,
compat_urllib_error,
compat_urllib_parse_unquote,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
limit_length,
+ sanitized_Request,
urlencode_postdata,
get_element_by_id,
clean_html,
if useremail is None:
return
- login_page_req = compat_urllib_request.Request(self._LOGIN_URL)
+ login_page_req = sanitized_Request(self._LOGIN_URL)
login_page_req.add_header('Cookie', 'locale=en_US')
login_page = self._download_webpage(login_page_req, None,
note='Downloading login page',
'timezone': '-60',
'trynum': '1',
}
- request = compat_urllib_request.Request(self._LOGIN_URL, urlencode_postdata(login_form))
+ request = sanitized_Request(self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
try:
login_results = self._download_webpage(request, None,
r'name="h"\s+(?:\w+="[^"]+"\s+)*?value="([^"]+)"', login_results, 'h'),
'name_action_selected': 'dont_save',
}
- check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
+ check_req = sanitized_Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
check_response = self._download_webpage(check_req, None,
note='Confirming login')
from ..utils import (
encode_dict,
ExtractorError,
+ sanitized_Request,
)
}
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8')
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://secure.id.fc2.com/index.php?mode=login&switch_language=en', login_data)
login_results = self._download_webpage(request, None, note='Logging in', errnote='Unable to log in')
return False
# this redirect request is also needed to complete the login
- login_redir = compat_urllib_request.Request('http://id.fc2.com/?mode=redirect&login=done')
+ login_redir = sanitized_Request('http://id.fc2.com/?mode=redirect&login=done')
self._download_webpage(
login_redir, None, note='Login redirect', errnote='Login redirect failed')
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
ExtractorError,
find_xpath_attr,
+ sanitized_Request,
)
video_id = mobj.group('id')
video_uploader_id = mobj.group('uploader_id')
webpage_url = 'http://www.flickr.com/photos/' + video_uploader_id + '/' + video_id
- req = compat_urllib_request.Request(webpage_url)
+ req = sanitized_Request(webpage_url)
req.add_header(
'User-Agent',
# it needs a more recent version
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
parse_duration,
parse_iso8601,
+ sanitized_Request,
str_to_int,
)
b'Content-Type': b'application/x-www-form-urlencoded',
b'Origin': b'http://www.4tube.com',
}
- token_req = compat_urllib_request.Request(token_url, b'{}', headers)
+ token_req = sanitized_Request(token_url, b'{}', headers)
tokens = self._download_json(token_req, video_id)
formats = [{
'url': tokens[format]['token'],
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
- bitrates = self._html_search_regex(r'<source src="[^"]+/v,((?:\d+,)+)\.mp4\.csmil', webpage, 'video bitrates')
- bitrates = [int(b) for b in bitrates.rstrip(',').split(',')]
- bitrates.sort()
+ m3u8_url = self._search_regex(
+ r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8)\1',
+ webpage, 'm3u8 url', default=None, group='url')
formats = []
+
+ m3u8_formats = self._extract_m3u8_formats(
+ m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
+
+ bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)[,/]', m3u8_url)]
+ bitrates.sort()
+
for bitrate in bitrates:
for link in links:
formats.append({
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
remove_end,
HEADRequest,
+ sanitized_Request,
)
'password': password,
}
- request = compat_urllib_request.Request(login_url, compat_urllib_parse.urlencode(login_form))
+ request = sanitized_Request(login_url, compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._download_webpage(request, display_id, 'Logging in')
start_page = self._download_webpage(webpage_url, display_id, 'Getting authenticated video page')
from ..compat import (
compat_etree_fromstring,
compat_urllib_parse_unquote,
- compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
)
HEADRequest,
is_html,
orderedSet,
+ sanitized_Request,
smuggle_url,
unescapeHTML,
unified_strdate,
url_basename,
xpath_text,
)
-from .brightcove import BrightcoveIE
+from .brightcove import (
+ BrightcoveLegacyIE,
+ BrightcoveNewIE,
+)
from .nbc import NBCSportsVPlayerIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
# it also tests brightcove videos that need to set the 'Referer' in the
# http requests
{
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
'url': 'http://www.bfmtv.com/video/bfmbusiness/cours-bourse/cours-bourse-l-analyse-technique-154522/',
'info_dict': {
'id': '2765128793001',
'uploader': 'thestar.com',
'description': 'Mississauga resident David Farmer is still out of power as a result of the ice storm a month ago. To keep the house warm, Farmer cuts wood from his property for a wood burning stove downstairs.',
},
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
},
{
'url': 'http://www.championat.com/video/football/v/87/87499.html',
},
{
# https://github.com/rg3/youtube-dl/issues/3541
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
'url': 'http://www.kijk.nl/sbs6/leermijvrouwenkennen/videos/jqMiXKAYan2S/aflevering-1',
'info_dict': {
'id': '3866516442001',
'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
},
},
+ # Kaltura embed protected with referrer
+ {
+ 'url': 'http://www.disney.nl/disney-channel/filmpjes/achter-de-schermen#/videoId/violetta-achter-de-schermen-ruggero',
+ 'info_dict': {
+ 'id': '1_g4fbemnq',
+ 'ext': 'mp4',
+ 'title': 'Violetta - Achter De Schermen - Ruggero',
+ 'description': 'Achter de schermen met Ruggero',
+ 'timestamp': 1435133761,
+ 'upload_date': '20150624',
+ 'uploader_id': 'echojecka',
+ },
+ },
# Eagle.Platform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
'ext': 'mp4',
'title': 'cinemasnob',
},
+ },
+ # BrightcoveInPageEmbed embed
+ {
+ 'url': 'http://www.geekandsundry.com/tabletop-bonus-wils-final-thoughts-on-dread/',
+ 'info_dict': {
+ 'id': '4238694884001',
+ 'ext': 'flv',
+ 'title': 'Tabletop: Dread, Last Thoughts',
+ 'description': 'Tabletop: Dread, Last Thoughts',
+ 'duration': 51690,
+ },
+ },
+ # JWPlayer with M3U8
+ {
+ 'url': 'http://ren.tv/novosti/2015-09-25/sluchaynyy-prohozhiy-poymal-avtougonshchika-v-murmanske-video',
+ 'info_dict': {
+ 'id': 'playlist',
+ 'ext': 'mp4',
+ 'title': 'Случайный прохожий поймал автоугонщика в Мурманске. ВИДЕО | РЕН ТВ',
+ 'uploader': 'ren.tv',
+ },
+ 'params': {
+ # m3u8 downloads
+ 'skip_download': True,
+ }
}
]
full_response = None
if head_response is False:
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Accept-Encoding', '*')
full_response = self._request_webpage(request, video_id)
head_response = full_response
'%s on generic information extractor.' % ('Forcing' if force else 'Falling back'))
if not full_response:
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
# Some webservers may serve compressed content of rather big size (e.g. gzipped flac)
# making it impossible to download only chunk of the file (yet we need only 512kB to
# test whether it's HTML or not). According to youtube-dl default Accept-Encoding
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
- # Look for BrightCove:
- bc_urls = BrightcoveIE._extract_brightcove_urls(webpage)
+ # Look for Brightcove Legacy Studio embeds
+ bc_urls = BrightcoveLegacyIE._extract_brightcove_urls(webpage)
if bc_urls:
self.to_screen('Brightcove video detected.')
entries = [{
'_type': 'url',
'url': smuggle_url(bc_url, {'Referer': url}),
- 'ie_key': 'Brightcove'
+ 'ie_key': 'BrightcoveLegacy'
} for bc_url in bc_urls]
return {
'entries': entries,
}
+ # Look for Brightcove New Studio embeds
+ bc_urls = BrightcoveNewIE._extract_urls(webpage)
+ if bc_urls:
+ return _playlist_from_matches(bc_urls, ie='BrightcoveNew')
+
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_?[Ii]d'\s*:\s*'(?P<id>[^']+)',", webpage) or
re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage))
if mobj is not None:
- return self.url_result('kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(), 'Kaltura')
+ return self.url_result(smuggle_url(
+ 'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(),
+ {'source_url': url}), 'Kaltura')
# Look for Eagle.Platform embeds
mobj = re.search(
# Look for UDN embeds
mobj = re.search(
- r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._VALID_URL, webpage)
+ r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage)
if mobj is not None:
return self.url_result(
compat_urlparse.urljoin(url, mobj.group('url')), 'UDNEmbed')
entries = []
for video_url in found:
+ video_url = video_url.replace('\\/', '/')
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
# here's a fun little line of code for you:
video_id = os.path.splitext(video_id)[0]
+ entry_info_dict = {
+ 'id': video_id,
+ 'uploader': video_uploader,
+ 'title': video_title,
+ 'age_limit': age_limit,
+ }
+
ext = determine_ext(video_url)
if ext == 'smil':
- entries.append({
- 'id': video_id,
- 'formats': self._extract_smil_formats(video_url, video_id),
- 'uploader': video_uploader,
- 'title': video_title,
- 'age_limit': age_limit,
- })
+ entry_info_dict['formats'] = self._extract_smil_formats(video_url, video_id)
elif ext == 'xspf':
return self.playlist_result(self._extract_xspf_playlist(video_url, video_id), video_id)
+ elif ext == 'm3u8':
+ entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4')
else:
- entries.append({
- 'id': video_id,
- 'url': video_url,
- 'uploader': video_uploader,
- 'title': video_title,
- 'age_limit': age_limit,
- })
+ entry_info_dict['url'] = video_url
+
+ entries.append(entry_info_dict)
if len(entries) == 1:
return entries[0]
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
HEADRequest,
+ sanitized_Request,
str_to_int,
urlencode_postdata,
urlhandle_detect_ext,
r'intTrackId\s*=\s*(\d+)', webpage, 'track ID')
payload = urlencode_postdata({'tracks[]': track_id})
- req = compat_urllib_request.Request(self._PLAYLIST_URL, payload)
+ req = sanitized_Request(self._PLAYLIST_URL, payload)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
track = self._download_json(req, track_id, 'Downloading playlist')[0]
import base64
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
HEADRequest,
+ sanitized_Request,
)
('mediaType', 's'),
('mediaId', video_id),
])
- r = compat_urllib_request.Request(
+ r = sanitized_Request(
'http://www.hotnewhiphop.com/ajax/media/getActions/', data=reqdata)
r.add_header('Content-Type', 'application/x-www-form-urlencoded')
mkd = self._download_json(
import time
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
data = {'ax': 1, 'ts': time.time()}
data_encoded = compat_urllib_parse.urlencode(data)
complete_url = url + "?" + data_encoded
- request = compat_urllib_request.Request(complete_url)
+ request = sanitized_Request(complete_url)
response, urlh = self._download_webpage_handle(
request, track_id, 'Downloading webpage with the url')
cookie = urlh.headers.get('Set-Cookie', '')
title = track['song']
serve_url = "http://hypem.com/serve/source/%s/%s" % (track_id, key)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
serve_url, '', {'Content-Type': 'application/json'})
request.add_header('cookie', cookie)
song_data = self._download_json(request, track_id, 'Downloading metadata')
class InstagramIE(InfoExtractor):
- _VALID_URL = r'https://instagram\.com/p/(?P<id>[\da-zA-Z]+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)'
+ _TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
'info_dict': {
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
}
- }
+ }, {
+ 'url': 'https://instagram.com/p/-Cmh1cukG2/',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
video_id = self._match_id(url)
from math import floor
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
ExtractorError,
remove_end,
+ sanitized_Request,
)
(floor(random() * 1073741824), floor(random() * 1073741824))
)
- req = compat_urllib_request.Request(player_url)
+ req = sanitized_Request(player_url)
req.add_header('Referer', url)
playerpage = self._download_webpage(req, video_id)
import json
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
]
}
- request = compat_urllib_request.Request(api_url, json.dumps(data))
+ request = sanitized_Request(api_url, json.dumps(data))
video_json_page = self._download_webpage(
request, video_id, 'Downloading video JSON')
from __future__ import unicode_literals
import re
+import base64
from .common import InfoExtractor
-from ..compat import compat_urllib_parse
+from ..compat import (
+ compat_urllib_parse,
+ compat_urlparse,
+)
from ..utils import (
+ clean_html,
ExtractorError,
int_or_none,
+ unsmuggle_url,
)
video_id, actions, note='Downloading video info JSON')
def _real_extract(self, url):
+ url, smuggled_data = unsmuggle_url(url, {})
+
mobj = re.match(self._VALID_URL, url)
partner_id = mobj.group('partner_id_s') or mobj.group('partner_id') or mobj.group('partner_id_html5')
entry_id = mobj.group('id_s') or mobj.group('id') or mobj.group('id_html5')
info, source_data = self._get_video_info(entry_id, partner_id)
- formats = [{
- 'format_id': '%(fileExt)s-%(bitrate)s' % f,
- 'ext': f['fileExt'],
- 'tbr': f['bitrate'],
- 'fps': f.get('frameRate'),
- 'filesize_approx': int_or_none(f.get('size'), invscale=1024),
- 'container': f.get('containerFormat'),
- 'vcodec': f.get('videoCodecId'),
- 'height': f.get('height'),
- 'width': f.get('width'),
- 'url': '%s/flavorId/%s' % (info['dataUrl'], f['id']),
- } for f in source_data['flavorAssets']]
+ source_url = smuggled_data.get('source_url')
+ if source_url:
+ referrer = base64.b64encode(
+ '://'.join(compat_urlparse.urlparse(source_url)[:2])
+ .encode('utf-8')).decode('utf-8')
+ else:
+ referrer = None
+
+ formats = []
+ for f in source_data['flavorAssets']:
+ video_url = '%s/flavorId/%s' % (info['dataUrl'], f['id'])
+ if referrer:
+ video_url += '?referrer=%s' % referrer
+ formats.append({
+ 'format_id': '%(fileExt)s-%(bitrate)s' % f,
+ 'ext': f.get('fileExt'),
+ 'tbr': int_or_none(f['bitrate']),
+ 'fps': int_or_none(f.get('frameRate')),
+ 'filesize_approx': int_or_none(f.get('size'), invscale=1024),
+ 'container': f.get('containerFormat'),
+ 'vcodec': f.get('videoCodecId'),
+ 'height': int_or_none(f.get('height')),
+ 'width': int_or_none(f.get('width')),
+ 'url': video_url,
+ })
+ self._check_formats(formats, entry_id)
self._sort_formats(formats)
return {
'id': entry_id,
'title': info['name'],
'formats': formats,
- 'description': info.get('description'),
+ 'description': clean_html(info.get('description')),
'thumbnail': info.get('thumbnailUrl'),
'duration': info.get('duration'),
'timestamp': info.get('createdAt'),
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse_urlparse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse_urlparse
+from ..utils import sanitized_Request
class KeezMoviesIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_ord,
)
from ..utils import (
determine_ext,
ExtractorError,
parse_iso8601,
+ sanitized_Request,
int_or_none,
encode_data_uri,
)
'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.letv.com'
}
- play_json_req = compat_urllib_request.Request(
+ play_json_req = sanitized_Request(
'http://api.letv.com/mms/out/video/playJson?' + compat_urllib_parse.urlencode(params)
)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
clean_html,
int_or_none,
+ sanitized_Request,
)
self._login()
def _login(self):
- (username, password) = self._get_login_info()
+ username, password = self._get_login_info()
if username is None:
return
'remember': 'false',
'stayPut': 'false'
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
login_page = self._download_webpage(
request, None, 'Logging in as %s' % username)
'remember': 'false',
'stayPut': 'false',
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(confirm_form).encode('utf-8'))
login_page = self._download_webpage(
request, None,
raise ExtractorError('Unable to log in')
def _logout(self):
+ username, _ = self._get_login_info()
+ if username is None:
+ return
+
self._download_webpage(
'http://www.lynda.com/ajax/logout.aspx', None,
'Logging out', 'Unable to log out', fatal=False)
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_unquote,
- compat_urllib_request,
)
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
+ sanitized_Request,
)
'filters': '0',
'submit': "Continue - I'm over 18",
}
- request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
+ request = sanitized_Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self.report_age_confirmation()
self._download_webpage(request, None, False, 'Unable to confirm age')
return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
# Retrieve video webpage to extract further information
- req = compat_urllib_request.Request('http://www.metacafe.com/watch/%s/' % video_id)
+ req = sanitized_Request('http://www.metacafe.com/watch/%s/' % video_id)
# AnyClip videos require the flashversion cookie so that we get the link
# to the mp4 file
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
int_or_none,
parse_duration,
parse_filesize,
+ sanitized_Request,
)
('fileId', video_id),
('__RequestVerificationToken', token),
]
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://minhateca.com.br/action/License/Download',
data=compat_urllib_parse.urlencode(token_data))
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
import random
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
xpath_text,
int_or_none,
ExtractorError,
+ sanitized_Request,
)
'http://www.miomio.tv/mioplayer/mioplayerconfigfiles/xml.php?id=%s&r=%s' % (id, random.randint(100, 999)),
video_id)
- vid_config_request = compat_urllib_request.Request(
+ vid_config_request = sanitized_Request(
'http://www.miomio.tv/mioplayer/mioplayerconfigfiles/sina.php?{0}'.format(xml_config),
headers=http_headers)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
)
]
r_json = json.dumps(r)
post = compat_urllib_parse.urlencode({'r': r_json})
- req = compat_urllib_request.Request(self._API_URL, post)
+ req = sanitized_Request(self._API_URL, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
response = self._download_json(req, video_id)
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
- compat_urllib_request,
)
+from ..utils import sanitized_Request
class MofosexIE(InfoExtractor):
video_id = mobj.group('id')
url = 'http://www.' + mobj.group('url')
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
remove_start,
+ sanitized_Request,
)
orig_webpage, 'builtin URL', default=None, group='url')
if builtin_url:
- req = compat_urllib_request.Request(builtin_url)
+ req = sanitized_Request(builtin_url)
req.add_header('Referer', url)
webpage = self._download_webpage(req, video_id, 'Downloading builtin page')
title = self._og_search_title(orig_webpage).strip()
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
- req = compat_urllib_request.Request(url, post, headers)
+ req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
'hash': hash_key,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://mooshare.biz/%s' % video_id, compat_urllib_parse.urlencode(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
+from ..utils import sanitized_Request
class MovieClipsIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
# it doesn't work if it thinks the browser is too old

req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/43.0 (Chrome)')
webpage = self._download_webpage(req, display_id)
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_str,
)
from ..utils import (
find_xpath_attr,
fix_xml_ampersands,
HEADRequest,
+ sanitized_Request,
unescapeHTML,
url_basename,
RegexNotFoundError,
def _extract_mobile_video_formats(self, mtvn_id):
webpage_url = self._MOBILE_TEMPLATE % mtvn_id
- req = compat_urllib_request.Request(webpage_url)
+ req = sanitized_Request(webpage_url)
# Otherwise we get a webpage that would execute some javascript
req.add_header('User-Agent', 'curl/7')
webpage = self._download_webpage(req, mtvn_id,
compat_ord,
compat_urllib_parse,
compat_urllib_parse_unquote,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
mobj = re.search(r'data-video-service="/service/data/video/%s/config' % video_id, webpage)
if mobj is not None:
- request = compat_urllib_request.Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '')
+ request = sanitized_Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '')
response = self._download_webpage(request, video_id,
'Downloading video info')
info = json.loads(base64.b64decode(response).decode('utf-8'))
from .common import InfoExtractor
from ..compat import (
- compat_urllib_request,
compat_urllib_parse,
compat_str,
compat_itertools_count,
)
+from ..utils import sanitized_Request
class NetEaseMusicBaseIE(InfoExtractor):
if not details:
continue
formats.append({
- 'url': 'http://m1.music.126.net/%s/%s.%s' %
+ 'url': 'http://m5.music.126.net/%s/%s.%s' %
(cls._encrypt(details['dfsId']), details['dfsId'],
details['extension']),
'ext': details.get('extension'),
return int(round(ms / 1000.0))
def query_api(self, endpoint, video_id, note):
- req = compat_urllib_request.Request('%s%s' % (self._API_BASE, endpoint))
+ req = sanitized_Request('%s%s' % (self._API_BASE, endpoint))
req.add_header('Referer', self._API_BASE)
return self._download_json(req, video_id, note)
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse
+from ..utils import sanitized_Request
class NFBIE(InfoExtractor):
uploader = self._html_search_regex(r'<em class="director-name" itemprop="name">([^<]+)</em>',
page, 'director name', fatal=False)
- request = compat_urllib_request.Request('https://www.nfb.ca/film/%s/player_config' % video_id,
- compat_urllib_parse.urlencode({'getConfig': 'true'}).encode('ascii'))
+ request = sanitized_Request(
+ 'https://www.nfb.ca/film/%s/player_config' % video_id,
+ compat_urllib_parse.urlencode({'getConfig': 'true'}).encode('ascii'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
request.add_header('X-NFB-Referer', 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf')
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
+ sanitized_Request,
xpath_text,
determine_ext,
)
'password': password,
}
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8')
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://secure.nicovideo.jp/secure/login', login_data)
login_results = self._download_webpage(
request, None, note='Logging in', errnote='Unable to log in')
'k': thumb_play_key,
'v': video_id
})
- flv_info_request = compat_urllib_request.Request(
+ flv_info_request = sanitized_Request(
'http://ext.nicovideo.jp/thumb_watch', flv_info_data,
{'Content-Type': 'application/x-www-form-urlencoded'})
flv_info_webpage = self._download_webpage(
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
clean_html,
int_or_none,
float_or_none,
parse_iso8601,
+ sanitized_Request,
)
'username': username,
'password': password,
}
- request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
+ request = sanitized_Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
login = self._download_json(request, None, 'Logging in as %s' % username)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
ExtractorError,
+ sanitized_Request,
urlencode_postdata,
xpath_text,
xpath_with_ns,
'op': 'download1',
'method_free': 'Continue to Video',
}
- req = compat_urllib_request.Request(url, urlencode_postdata(fields))
+ req = sanitized_Request(url, urlencode_postdata(fields))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(req, video_id,
'Downloading download page')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
ExtractorError,
+ NO_DEFAULT,
+ encode_dict,
+ sanitized_Request,
+ urlencode_postdata,
)
}
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
+ video_id = self._match_id(url)
- page = self._download_webpage(
- 'http://%s/video/%s' % (self._HOST, video_id), video_id, 'Downloading video page')
+ url = 'http://%s/video/%s' % (self._HOST, video_id)
- if re.search(self._FILE_DELETED_REGEX, page) is not None:
- raise ExtractorError('Video %s does not exist' % video_id, expected=True)
+ webpage = self._download_webpage(
+ url, video_id, 'Downloading video page')
- filekey = self._search_regex(self._FILEKEY_REGEX, page, 'filekey')
+ if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
+ raise ExtractorError('Video %s does not exist' % video_id, expected=True)
- title = self._html_search_regex(self._TITLE_REGEX, page, 'title', fatal=False)
- description = self._html_search_regex(self._DESCRIPTION_REGEX, page, 'description', default='', fatal=False)
+ def extract_filekey(default=NO_DEFAULT):
+ return self._search_regex(
+ self._FILEKEY_REGEX, webpage, 'filekey', default=default)
+
+ filekey = extract_filekey(default=None)
+
+ if not filekey:
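+ # Assumed flow: without a filekey the page is a "continue to video"
+ # interstitial; repost its hidden form fields and re-extract below.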
+ fields = self._hidden_inputs(webpage)
+ post_url = self._search_regex(
+ r'<form[^>]+action=(["\'])(?P<url>.+?)\1', webpage,
+ 'post url', default=url, group='url')
+ if not post_url.startswith('http'):
+ post_url = compat_urlparse.urljoin(url, post_url)
+ request = sanitized_Request(
+ post_url, urlencode_postdata(encode_dict(fields)))
+ request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+ request.add_header('Referer', post_url)
+ webpage = self._download_webpage(
+ request, video_id, 'Downloading continue to the video page')
+
+ filekey = extract_filekey()
+
+ title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title', fatal=False)
+ description = self._html_search_regex(self._DESCRIPTION_REGEX, webpage, 'description', default='', fatal=False)
api_response = self._download_webpage(
'http://%s/api/player.api.php?key=%s&file=%s' % (self._HOST, filekey, video_id), video_id,
# encoding: utf-8
from __future__ import unicode_literals
-from .brightcove import BrightcoveIE
+from .brightcove import BrightcoveLegacyIE
from .common import InfoExtractor
-from ..utils import ExtractorError
-from ..compat import (
- compat_str,
- compat_urllib_request,
+from ..compat import compat_str
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
)
'http://www.nowness.com/iframe?id=%s' % video_id, video_id,
note='Downloading player JavaScript',
errnote='Unable to download player JavaScript')
- bc_url = BrightcoveIE._extract_brightcove_url(player_code)
+ bc_url = BrightcoveLegacyIE._extract_brightcove_url(player_code)
if bc_url is None:
raise ExtractorError('Could not find player definition')
- return self.url_result(bc_url, 'Brightcove')
+ return self.url_result(bc_url, 'BrightcoveLegacy')
elif source == 'vimeo':
return self.url_result('http://vimeo.com/%s' % video_id, 'Vimeo')
elif source == 'youtube':
def _api_request(self, url, request_path):
display_id = self._match_id(url)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://api.nowness.com/api/' + request_path % display_id,
headers={
'X-Nowness-Language': 'zh-cn' if 'cn.nowness.com' in url else 'en-us',
# coding: utf-8
from __future__ import unicode_literals
+import re
+
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
)
-class NowTVIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<id>.+?)/(?:player|preview)'
+class NowTVBaseIE(InfoExtractor):
+ _VIDEO_FIELDS = (
+ 'id', 'title', 'free', 'geoblocked', 'articleLong', 'articleShort',
+ 'broadcastStartDate', 'seoUrl', 'duration', 'files',
+ 'format.defaultImage169Format', 'format.defaultImage169Logo')
+
+ def _extract_video(self, info, display_id=None):
+ video_id = compat_str(info['id'])
+
+ files = info['files']
+ if not files:
+ if info.get('geoblocked', False):
+ raise ExtractorError(
+ 'Video %s is not available from your location due to geo restriction' % video_id,
+ expected=True)
+ if not info.get('free', True):
+ raise ExtractorError(
+ 'Video %s is not available for free' % video_id, expected=True)
+
+ formats = []
+ for item in files['items']:
+ if determine_ext(item['path']) != 'f4v':
+ continue
+ app, play_path = remove_start(item['path'], '/').split('/', 1)
+ formats.append({
+ 'url': 'rtmpe://fms.rtl.de',
+ 'app': app,
+ 'play_path': 'mp4:%s' % play_path,
+ 'ext': 'flv',
+ 'page_url': 'http://rtlnow.rtl.de',
+ 'player_url': 'http://cdn.static-fra.de/now/vodplayer.swf',
+ 'tbr': int_or_none(item.get('bitrate')),
+ })
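+ # For illustration (hypothetical item): a 'path' of '/nowtv/free/video.f4v'
+ # is split into app='nowtv' and play_path='mp4:free/video.f4v' for the
+ # rtmpe URL above.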
+ self._sort_formats(formats)
+
+ title = info['title']
+ description = info.get('articleLong') or info.get('articleShort')
+ timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
+ duration = parse_duration(info.get('duration'))
+
+ f = info.get('format', {})
+ thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id or info.get('seoUrl'),
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'timestamp': timestamp,
+ 'duration': duration,
+ 'formats': formats,
+ }
+
+
+class NowTVIE(NowTVBaseIE):
+ _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/(?:list/[^/]+/)?(?P<id>[^/]+)/(?:player|preview)'
_TESTS = [{
# rtl
'id': '203519',
'display_id': 'bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
'ext': 'flv',
- 'title': 'Die neuen Bauern und eine Hochzeit',
+ 'title': 'Inka Bause stellt die neuen Bauern vor',
'description': 'md5:e234e1ed6d63cf06be5c070442612e7e',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1432580700,
}]
def _real_extract(self, url):
- display_id = self._match_id(url)
- display_id_split = display_id.split('/')
- if len(display_id) > 2:
- display_id = '/'.join((display_id_split[0], display_id_split[-1]))
+ mobj = re.match(self._VALID_URL, url)
+ display_id = '%s/%s' % (mobj.group('show_id'), mobj.group('id'))
info = self._download_json(
- 'https://api.nowtv.de/v3/movies/%s?fields=id,title,free,geoblocked,articleLong,articleShort,broadcastStartDate,seoUrl,duration,format,files' % display_id,
- display_id)
+ 'https://api.nowtv.de/v3/movies/%s?fields=%s'
+ % (display_id, ','.join(self._VIDEO_FIELDS)), display_id)
- video_id = compat_str(info['id'])
+ return self._extract_video(info, display_id)
- files = info['files']
- if not files:
- if info.get('geoblocked', False):
- raise ExtractorError(
- 'Video %s is not available from your location due to geo restriction' % video_id,
- expected=True)
- if not info.get('free', True):
- raise ExtractorError(
- 'Video %s is not available for free' % video_id, expected=True)
- formats = []
- for item in files['items']:
- if determine_ext(item['path']) != 'f4v':
- continue
- app, play_path = remove_start(item['path'], '/').split('/', 1)
- formats.append({
- 'url': 'rtmpe://fms.rtl.de',
- 'app': app,
- 'play_path': 'mp4:%s' % play_path,
- 'ext': 'flv',
- 'page_url': 'http://rtlnow.rtl.de',
- 'player_url': 'http://cdn.static-fra.de/now/vodplayer.swf',
- 'tbr': int_or_none(item.get('bitrate')),
- })
- self._sort_formats(formats)
+class NowTVListIE(NowTVBaseIE):
+ _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/list/(?P<id>[^?/#&]+)$'
- title = info['title']
- description = info.get('articleLong') or info.get('articleShort')
- timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
- duration = parse_duration(info.get('duration'))
+ _SHOW_FIELDS = ('title', )
+ _SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
- f = info.get('format', {})
- thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
+ _TESTS = [{
+ 'url': 'http://www.nowtv.at/rtl/stern-tv/list/aktuell',
+ 'info_dict': {
+ 'id': '17006',
+ 'title': 'stern TV - Aktuell',
+ },
+ 'playlist_count': 1,
+ }, {
+ 'url': 'http://www.nowtv.at/rtl/das-supertalent/list/free-staffel-8',
+ 'info_dict': {
+ 'id': '20716',
+ 'title': 'Das Supertalent - FREE Staffel 8',
+ },
+ 'playlist_count': 14,
+ }]
- return {
- 'id': video_id,
- 'display_id': display_id,
- 'title': title,
- 'description': description,
- 'thumbnail': thumbnail,
- 'timestamp': timestamp,
- 'duration': duration,
- 'formats': formats,
- }
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ show_id = mobj.group('show_id')
+ season_id = mobj.group('id')
+
+ fields = []
+ fields.extend(self._SHOW_FIELDS)
+ fields.extend('formatTabs.%s' % field for field in self._SEASON_FIELDS)
+ fields.extend(
+ 'formatTabs.formatTabPages.container.movies.%s' % field
+ for field in self._VIDEO_FIELDS)
+
+ list_info = self._download_json(
+ 'https://api.nowtv.de/v3/formats/seo?fields=%s&name=%s.php'
+ % (','.join(fields), show_id),
+ season_id)
+
+ season = next(
+ season for season in list_info['formatTabs']['items']
+ if season.get('seoheadline') == season_id)
+
+ title = '%s - %s' % (list_info['title'], season['headline'])
+
+ entries = []
+ for container in season['formatTabPages']['items']:
+ for info in ((container.get('container') or {}).get('movies') or {}).get('items') or []:
+ entries.append(self._extract_video(info))
+
+ return self.playlist_result(
+ entries, compat_str(season.get('id') or season_id), title)
IE_NAME = 'nowvideo'
IE_DESC = 'NowVideo'
- _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:ch|ec|sx|eu|at|ag|co|li)'}
+ _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
- _HOST = 'www.nowvideo.ch'
+ _HOST = 'www.nowvideo.to'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_FILEKEY_REGEX = r'var fkzd="([^"]+)";'
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
parse_duration,
+ sanitized_Request,
unified_strdate,
)
formats = []
for dwnld_speed, format_id in [(0, '3gp'), (5, 'mp4')]:
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://m.nuvid.com/play/%s' % video_id)
request.add_header('Cookie', 'skip_download_page=1; dwnld_speed=%d; adv_show=1' % dwnld_speed)
webpage = self._download_webpage(
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..utils import (
- js_to_json,
-)
+from ..utils import js_to_json
class PatreonIE(InfoExtractor):
'password': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://www.patreon.com/processLogin',
compat_urllib_parse.urlencode(login_form).encode('utf-8')
)
# Article with embedded player (or direct video)
(?:www\.)?pbs\.org/(?:[^/]+/){2,5}(?P<presumptive_id>[^/]+?)(?:\.html)?/?(?:$|[?\#]) |
# Player
- video\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
+ (?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
)
'''
'params': {
'skip_download': True, # requires ffmpeg
},
+ },
+ {
+ 'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
+ 'only_matching': True,
}
]
_ERRORS = {
return self.playlist_result(entries, display_id)
info = self._download_json(
- 'http://video.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
+ 'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id)
formats = []
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
from ..utils import parse_iso8601
class PeriscopeIE(InfoExtractor):
IE_DESC = 'Periscope'
- _VALID_URL = r'https?://(?:www\.)?periscope\.tv/w/(?P<id>[^/?#]+)'
+ _VALID_URL = r'https?://(?:www\.)?periscope\.tv/[^/]+/(?P<id>[^/?#]+)'
# Example URLs for live broadcasts can be found here: http://onperiscope.com/
_TESTS = [{
'url': 'https://www.periscope.tv/w/aJUQnjY3MjA3ODF8NTYxMDIyMDl2zCg2pECBgwTqRpQuQD352EMPTKQjT4uqlM3cgWFA-g==',
}, {
'url': 'https://www.periscope.tv/w/1ZkKzPbMVggJv',
'only_matching': True,
+ }, {
+ 'url': 'https://www.periscope.tv/bastaakanoggano/1OdKrlkZZjOJX',
+ 'only_matching': True,
}]
def _call_api(self, method, value):
'thumbnails': thumbnails,
'formats': formats,
}
-
-
-class QuickscopeIE(InfoExtractor):
- IE_DESC = 'Quick Scope'
- _VALID_URL = r'https?://watchonperiscope\.com/broadcast/(?P<id>\d+)'
- _TEST = {
- 'url': 'https://watchonperiscope.com/broadcast/56180087',
- 'only_matching': True,
- }
-
- def _real_extract(self, url):
- broadcast_id = self._match_id(url)
- request = compat_urllib_request.Request(
- 'https://watchonperiscope.com/api/accessChannel', compat_urllib_parse.urlencode({
- 'broadcast_id': broadcast_id,
- 'entry_ticket': '',
- 'from_push': 'false',
- 'uses_sessions': 'true',
- }).encode('utf-8'))
- return self.url_result(
- self._download_json(request, broadcast_id)['share_url'], 'Periscope')
import os.path
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
- req = compat_urllib_request.Request(url, post, headers)
+ req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
from __future__ import unicode_literals
-import re
import json
+import random
+import collections
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
+ sanitized_Request,
)
-class PluralsightIE(InfoExtractor):
+class PluralsightBaseIE(InfoExtractor):
+ _API_BASE = 'http://app.pluralsight.com'
+
+
+class PluralsightIE(PluralsightBaseIE):
IE_NAME = 'pluralsight'
- _VALID_URL = r'https?://(?:www\.)?pluralsight\.com/training/player\?author=(?P<author>[^&]+)&name=(?P<name>[^&]+)(?:&mode=live)?&clip=(?P<clip>\d+)&course=(?P<course>[^&]+)'
- _LOGIN_URL = 'https://www.pluralsight.com/id/'
+ _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/training/player\?'
+ _LOGIN_URL = 'https://app.pluralsight.com/id/'
+
_NETRC_MACHINE = 'pluralsight'
- _TEST = {
+ _TESTS = [{
'url': 'http://www.pluralsight.com/training/player?author=mike-mckeown&name=hosting-sql-server-windows-azure-iaas-m7-mgmt&mode=live&clip=3&course=hosting-sql-server-windows-azure-iaas',
'md5': '4d458cf5cf4c593788672419a8dd4cf8',
'info_dict': {
'duration': 338,
},
'skip': 'Requires pluralsight account credentials',
- }
+ }, {
+ 'url': 'https://app.pluralsight.com/training/player?course=angularjs-get-started&author=scott-allen&name=angularjs-get-started-m1-introduction&clip=0&mode=live',
+ 'only_matching': True,
+ }, {
+ # available without pluralsight account
+ 'url': 'http://app.pluralsight.com/training/player?author=scott-allen&name=angularjs-get-started-m1-introduction&mode=live&clip=0&course=angularjs-get-started',
+ 'only_matching': True,
+ }]
def _real_initialize(self):
self._login()
def _login(self):
(username, password) = self._get_login_info()
if username is None:
- self.raise_login_required('Pluralsight account is required')
+ return
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
post_url, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
+ if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
+ raise ExtractorError('Unable to log in')
+
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- author = mobj.group('author')
- name = mobj.group('name')
- clip_id = mobj.group('clip')
- course = mobj.group('course')
+ qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+
+ author = qs.get('author', [None])[0]
+ name = qs.get('name', [None])[0]
+ clip_id = qs.get('clip', [None])[0]
+ course = qs.get('course', [None])[0]
+
+ if any(not f for f in (author, name, clip_id, course,)):
+ raise ExtractorError('Invalid URL', expected=True)
display_id = '%s-%s' % (name, clip_id)
webpage = self._download_webpage(url, display_id)
- collection = self._parse_json(
- self._search_regex(
- r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)',
- webpage, 'modules'),
- display_id)
+ modules = self._search_regex(
+ r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)',
+ webpage, 'modules', default=None)
+
+ if modules:
+ collection = self._parse_json(modules, display_id)
+ else:
+ # The webpage may be served in a different layout (see
+ # https://github.com/rg3/youtube-dl/issues/7607)
+ collection = self._parse_json(
+ self._search_regex(
+ r'var\s+initialState\s*=\s*({.+?});\n', webpage, 'initial state'),
+ display_id)['course']['modules']
module, clip = None, None
for module_ in collection:
- if module_.get('moduleName') == name:
+ if name in (module_.get('moduleName'), module_.get('name')):
module = module_
for clip_ in module_.get('clips', []):
clip_index = clip_.get('clipIndex')
+ if clip_index is None:
+ clip_index = clip_.get('index')
if clip_index is None:
continue
if compat_str(clip_index) == clip_id:
'high': {'width': 1024, 'height': 768},
}
+ AllowedQuality = collections.namedtuple('AllowedQuality', ['ext', 'qualities'])
+
ALLOWED_QUALITIES = (
- ('webm', ('high',)),
- ('mp4', ('low', 'medium', 'high',)),
+ AllowedQuality('webm', ('high',)),
+ AllowedQuality('mp4', ('low', 'medium', 'high',)),
)
+ # In order to minimize the number of calls to the ViewClip API and reduce
+ # the probability of being throttled or banned by Pluralsight, we request
+ # only a single format unless a formats listing was explicitly requested.
+ if self._downloader.params.get('listformats', False):
+ allowed_qualities = ALLOWED_QUALITIES
+ else:
+ def guess_allowed_qualities():
+ req_format = self._downloader.params.get('format') or 'best'
+ req_format_split = req_format.split('-')
+ if len(req_format_split) > 1:
+ req_ext, req_quality = req_format_split
+ for allowed_quality in ALLOWED_QUALITIES:
+ if req_ext == allowed_quality.ext and req_quality in allowed_quality.qualities:
+ return (AllowedQuality(req_ext, (req_quality, )), )
+ req_ext = 'webm' if self._downloader.params.get('prefer_free_formats') else 'mp4'
+ return (AllowedQuality(req_ext, ('high', )), )
+ allowed_qualities = guess_allowed_qualities()
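+ # Illustrative mapping (hypothetical --format values):
+ #   'webm-high'  -> (AllowedQuality('webm', ('high',)),)
+ #   'mp4-medium' -> (AllowedQuality('mp4', ('medium',)),)
+ #   anything else falls back to a single 'high' quality in mp4 (or webm
+ #   when --prefer-free-formats is set).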
+
formats = []
- for ext, qualities in ALLOWED_QUALITIES:
+ for ext, qualities in allowed_qualities:
for quality in qualities:
f = QUALITIES[quality].copy()
clip_post = {
'mt': ext,
'q': '%dx%d' % (f['width'], f['height']),
}
- request = compat_urllib_request.Request(
- 'http://www.pluralsight.com/training/Player/ViewClip',
+ request = sanitized_Request(
+ '%s/training/Player/ViewClip' % self._API_BASE,
json.dumps(clip_post).encode('utf-8'))
request.add_header('Content-Type', 'application/json;charset=utf-8')
format_id = '%s-%s' % (ext, quality)
clip_url = self._download_webpage(
request, display_id, 'Downloading %s URL' % format_id, fatal=False)
+
+ # Pluralsight tracks multiple sequential calls to the ViewClip API and starts
+ # returning 429 HTTP errors after some time (see
+ # https://github.com/rg3/youtube-dl/pull/6989). Moreover, it may even lead
+ # to an account ban (see https://github.com/rg3/youtube-dl/issues/6842).
+ # To somewhat reduce the probability of these consequences,
+ # we sleep for a random amount of time before each call to ViewClip.
+ self._sleep(
+ random.randint(2, 5), display_id,
+ '%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
+
if not clip_url:
continue
f.update({
}
-class PluralsightCourseIE(InfoExtractor):
+class PluralsightCourseIE(PluralsightBaseIE):
IE_NAME = 'pluralsight:course'
- _VALID_URL = r'https?://(?:www\.)?pluralsight\.com/courses/(?P<id>[^/]+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/(?:library/)?courses/(?P<id>[^/]+)'
+ _TESTS = [{
# Free course from Pluralsight Starter Subscription for Microsoft TechNet
# https://offers.pluralsight.com/technet?loc=zTS3z&prod=zOTprodz&tech=zOttechz&prog=zOTprogz&type=zSOz&media=zOTmediaz&country=zUSz
'url': 'http://www.pluralsight.com/courses/hosting-sql-server-windows-azure-iaas',
'description': 'md5:61b37e60f21c4b2f91dc621a977d0986',
},
'playlist_count': 31,
- }
+ }, {
+ # available without pluralsight account
+ 'url': 'https://www.pluralsight.com/courses/angularjs-get-started',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://app.pluralsight.com/library/courses/understanding-microsoft-azure-amazon-aws/table-of-contents',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
course_id = self._match_id(url)
# TODO: PSM cookie
course = self._download_json(
- 'http://www.pluralsight.com/data/course/%s' % course_id,
+ '%s/data/course/%s' % (self._API_BASE, course_id),
course_id, 'Downloading course JSON')
title = course['title']
description = course.get('description') or course.get('shortDescription')
course_data = self._download_json(
- 'http://www.pluralsight.com/data/course/content/%s' % course_id,
+ '%s/data/course/content/%s' % (self._API_BASE, course_id),
course_id, 'Downloading course data JSON')
entries = []
if not player_parameters:
continue
entries.append(self.url_result(
- 'http://www.pluralsight.com/training/player?%s' % player_parameters,
+ '%s/training/player?%s' % (self._API_BASE, player_parameters),
'Pluralsight'))
return self.playlist_result(entries, course_id, title, description)
webpage = self._download_webpage(url, display_id or video_id)
title = self._html_search_regex(
- r'<title>(.+) porn HD.+?</title>', webpage, 'title')
+ [r'<span[^>]+class=["\']video-name["\'][^>]*>([^<]+)',
+ r'<title>(.+?) - .*?[Pp]ornHD.*?</title>'], webpage, 'title')
description = self._html_search_regex(
r'<div class="description">([^<]+)</div>', webpage, 'description', fatal=False)
view_count = int_or_none(self._html_search_regex(
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
str_to_int,
)
from ..aes import (
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
import json
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
int_or_none,
+ sanitized_Request,
)
'authenticationSpaceKey': originAuthenticationSpaceKey,
'credentials': 'Clip Application',
}
- token_req = compat_urllib_request.Request(
+ token_req = sanitized_Request(
'https://api.aebn.net/auth/v1/token/primal',
data=json.dumps(token_req_data).encode('utf-8'))
token_req.add_header('Content-Type', 'application/json')
token = token_answer['tokenKey']
# Get video URL
- delivery_req = compat_urllib_request.Request(
+ delivery_req = sanitized_Request(
'https://api.aebn.net/delivery/v1/clips/%s/MP4' % video_id)
delivery_req.add_header('Authorization', token)
delivery_info = self._download_json(
video_url = delivery_info['mediaUrl']
# Get additional info (title etc.)
- info_req = compat_urllib_request.Request(
+ info_req = sanitized_Request(
'https://api.aebn.net/content/v1/clips/%s?expand='
'title,description,primaryImageNumber,startSecond,endSecond,'
'movie.title,movie.MovieId,movie.boxCoverFront,movie.stars,'
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
+from ..compat import compat_urllib_parse
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
)
-from ..utils import ExtractorError
class PrimeShareTVIE(InfoExtractor):
webpage, 'wait time', default=7)) + 1
self._sleep(wait_time, video_id)
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
url, compat_urllib_parse.urlencode(fields), headers)
video_page = self._download_webpage(
req, video_id, 'Downloading video page')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
determine_ext,
ExtractorError,
+ sanitized_Request,
)
fields = self._hidden_inputs(webpage)
post = compat_urllib_parse.urlencode(fields)
- req = compat_urllib_request.Request(url, post)
+ req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
from .common import InfoExtractor
from ..utils import (
+ sanitized_Request,
strip_jsonp,
unescapeHTML,
clean_html,
)
-from ..compat import compat_urllib_request
class QQMusicIE(InfoExtractor):
singer_desc = None
if singer_id:
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://s.plcloud.music.qq.com/fcgi-bin/fcg_get_singer_desc.fcg?utf8=1&outCharset=utf-8&format=xml&singerid=%s' % singer_id)
req.add_header(
'Referer', 'http://s.plcloud.music.qq.com/xhr_proxy_utf8.html')
import time
from .common import InfoExtractor
-from ..compat import compat_urllib_request, compat_urlparse
from ..utils import (
ExtractorError,
float_or_none,
remove_end,
+ sanitized_Request,
std_headers,
struct_unpack,
)
if info['state'] == 'DESPU':
raise ExtractorError('The video is no longer available', expected=True)
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id)
- png_request = compat_urllib_request.Request(png_url)
+ png_request = sanitized_Request(png_url)
png_request.add_header('Referer', url)
png = self._download_webpage(png_request, video_id, 'Downloading url information')
video_url = _decrypt_url(png)
if not video_url.endswith('.f4m'):
- auth_url = video_url.replace(
+ video_url = video_url.replace(
'resources/', 'auth/resources/'
).replace('.net.rtve', '.multimedia.cdn.rtve')
- video_path = self._download_webpage(
- auth_url, video_id, 'Getting video url')
- # Use mvod1.akcdn instead of flash.akamaihd.multimedia.cdn to get
- # the right Content-Length header and the mp4 format
- video_url = compat_urlparse.urljoin(
- 'http://mvod1.akcdn.rtve.es/', video_path)
subtitles = None
if info.get('sbtFile') is not None:
compat_str,
)
from ..utils import (
- ExtractorError,
+ determine_ext,
unified_strdate,
)
'http://rutube.ru/api/play/options/%s/?format=json' % video_id,
video_id, 'Downloading options JSON')
- m3u8_url = options['video_balancer'].get('m3u8')
- if m3u8_url is None:
- raise ExtractorError('Couldn\'t find m3u8 manifest url')
- formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+ formats = []
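+ # 'video_balancer' maps format ids to media URLs; an assumed example
+ # shape: {'m3u8': 'http://.../video.m3u8', 'default': 'http://.../video.mp4'}.
+ # Each entry is dispatched on its file extension below.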
+ for format_id, format_url in options['video_balancer'].items():
+ ext = determine_ext(format_url)
+ if ext == 'm3u8':
+ m3u8_formats = self._extract_m3u8_formats(
+ format_url, video_id, 'mp4', m3u8_id=format_id, fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
+ elif ext == 'f4m':
+ f4m_formats = self._extract_f4m_formats(
+ format_url, video_id, f4m_id=format_id, fatal=False)
+ if f4m_formats:
+ formats.extend(f4m_formats)
+ else:
+ formats.append({
+ 'url': format_url,
+ 'format_id': format_id,
+ })
+ self._sort_formats(formats)
return {
'id': video['id'],
class RutubeEmbedIE(InfoExtractor):
IE_NAME = 'rutube:embed'
IE_DESC = 'Rutube embedded videos'
- _VALID_URL = 'https?://rutube\.ru/video/embed/(?P<id>[0-9]+)'
+ _VALID_URL = 'https?://rutube\.ru/(?:video|play)/embed/(?P<id>[0-9]+)'
- _TEST = {
+ _TESTS = [{
'url': 'http://rutube.ru/video/embed/6722881?vk_puid37=&vk_puid38=',
'info_dict': {
'id': 'a10e53b86e8f349080f718582ce4c661',
'params': {
'skip_download': 'Requires ffmpeg',
},
- }
+ }, {
+ 'url': 'http://rutube.ru/play/embed/8083783',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
embed_id = self._match_id(url)
extract_formats(child)
elif child.tag.endswith('File'):
video_url = child.text
- if not video_url or video_url in processed_urls or 'NOT_USED' in video_url:
+ if (not video_url or video_url in processed_urls or
+ any(p in video_url for p in ('NOT_USED', 'NOT-USED'))):
return
processed_urls.append(video_url)
ext = determine_ext(video_url)
if ext == 'm3u8':
- formats.extend(self._extract_m3u8_formats(
- video_url, video_id, 'mp4', m3u8_id='hls'))
+ m3u8_formats = self._extract_m3u8_formats(
+ video_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
elif ext == 'f4m':
- formats.extend(self._extract_f4m_formats(
- video_url, video_id, f4m_id='hds'))
+ f4m_formats = self._extract_f4m_formats(
+ video_url, video_id, f4m_id='hds', fatal=False)
+ if f4m_formats:
+ formats.extend(f4m_formats)
else:
proto = compat_urllib_parse_urlparse(video_url).scheme
if not child.tag.startswith('HTTP') and proto != 'rtmp':
import re
from .common import InfoExtractor
-from .brightcove import BrightcoveIE
+from .brightcove import BrightcoveLegacyIE
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ sanitized_Request,
smuggle_url,
std_headers,
)
'next': '',
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form), headers=headers)
login_page = self._download_webpage(
request, None, 'Logging in as %s' % username)
'%s/%s/chapter-content/%s.html' % (self._API_BASE, course_id, part),
part)
- bc_url = BrightcoveIE._extract_brightcove_url(webpage)
+ bc_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
if not bc_url:
raise ExtractorError('Could not extract Brightcove URL from %s' % url, expected=True)
- return self.url_result(smuggle_url(bc_url, {'Referer': url}), 'Brightcove')
+ return self.url_result(smuggle_url(bc_url, {'Referer': url}), 'BrightcoveLegacy')
class SafariCourseIE(SafariBaseIE):
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
mimetype2ext,
+ sanitized_Request,
unified_strdate,
)
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'MediasitePlayerCaps=ClientPlugins=4')
webpage = self._download_webpage(req, video_id)
import base64
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
)
'Video %s does not exist' % video_id, expected=True)
download_form = self._hidden_inputs(webpage)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
url, compat_urllib_parse.urlencode(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
parse_duration,
+ sanitized_Request,
)
'method_free': 'Free'
}
post = compat_urllib_parse.urlencode(fields)
- req = compat_urllib_request.Request(url, post)
+ req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(req, video_id,
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse
+from ..utils import sanitized_Request
class SinaIE(InfoExtractor):
if mobj.group('token') is not None:
# The video id is in the redirected url
self.to_screen('Getting video id')
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.get_method = lambda: 'HEAD'
(_, urlh) = self._download_webpage_handle(request, 'NA', False)
return self._real_extract(urlh.geturl())
import uuid
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
unified_strdate,
)
if video_password:
video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://smotri.com/video/view/url/bot/', compat_urllib_parse.urlencode(video_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
'password': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
broadcast_url + '/?no_redirect=1', compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
broadcast_page = self._download_webpage(
from .common import InfoExtractor
from ..compat import (
compat_str,
- compat_urllib_request,
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
else:
base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid='
- req = compat_urllib_request.Request(base_data_url + vid_id)
+ req = sanitized_Request(base_data_url + vid_id)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
import re
import itertools
-from .common import InfoExtractor
+from .common import (
+ InfoExtractor,
+ SearchInfoExtractor
+)
from ..compat import (
compat_str,
compat_urlparse,
compat_urllib_parse,
)
from ..utils import (
+ encode_dict,
ExtractorError,
int_or_none,
unified_strdate,
'description': data.get('description'),
'entries': entries,
}
+
+
+class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
+ IE_NAME = 'soundcloud:search'
+ IE_DESC = 'Soundcloud search'
+ _MAX_RESULTS = float('inf')
+ _TESTS = [{
+ 'url': 'scsearch15:post-avant jazzcore',
+ 'info_dict': {
+ 'title': 'post-avant jazzcore',
+ },
+ 'playlist_count': 15,
+ }]
+
+ _SEARCH_KEY = 'scsearch'
+ _MAX_RESULTS_PER_PAGE = 200
+ _DEFAULT_RESULTS_PER_PAGE = 50
+ _API_V2_BASE = 'https://api-v2.soundcloud.com'
+
+ def _get_collection(self, endpoint, collection_id, **query):
+ limit = min(
+ query.get('limit', self._DEFAULT_RESULTS_PER_PAGE),
+ self._MAX_RESULTS_PER_PAGE)
+ query['limit'] = limit
+ query['client_id'] = self._CLIENT_ID
+ query['linked_partitioning'] = '1'
+ query['offset'] = 0
+ data = compat_urllib_parse.urlencode(encode_dict(query))
+ next_url = '{0}{1}?{2}'.format(self._API_V2_BASE, endpoint, data)
+
+ collected_results = 0
+
+ for i in itertools.count(1):
+ response = self._download_json(
+ next_url, collection_id, 'Downloading page {0}'.format(i),
+ 'Unable to download API page')
+
+ collection = response.get('collection', [])
+ if not collection:
+ break
+
+ collection = list(filter(bool, collection))
+ collected_results += len(collection)
+
+ for item in collection:
+ yield self.url_result(item['uri'], SoundcloudIE.ie_key())
+
+ if not collection or collected_results >= limit:
+ break
+
+ next_url = response.get('next_href')
+ if not next_url:
+ break
+
+ def _get_n_results(self, query, n):
+ tracks = self._get_collection('/search/tracks', query, limit=n, q=query)
+ return self.playlist_result(tracks, playlist_title=query)
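+
+ # A sketch of the first URL _get_collection builds for
+ # 'scsearch15:post-avant jazzcore' (client_id value and parameter order
+ # are illustrative):
+ #   https://api-v2.soundcloud.com/search/tracks?limit=15&client_id=XXXXX
+ #   &linked_partitioning=1&offset=0&q=post-avant+jazzcore
+ # Subsequent pages simply follow the 'next_href' returned by the API.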
import re
from .common import InfoExtractor
-from .brightcove import BrightcoveIE
+from .brightcove import BrightcoveLegacyIE
from ..utils import RegexNotFoundError, ExtractorError
class SpaceIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|m)\.)?space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html'
_TEST = {
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
'url': 'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
'info_dict': {
'id': '2780937028001',
brightcove_url = self._og_search_video_url(webpage)
except RegexNotFoundError:
# Other videos works fine with the info from the object
- brightcove_url = BrightcoveIE._extract_brightcove_url(webpage)
+ brightcove_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
if brightcove_url is None:
raise ExtractorError(
'The webpage does not contain a video', expected=True)
- return self.url_result(brightcove_url, BrightcoveIE.ie_key())
+ return self.url_result(brightcove_url, BrightcoveLegacyIE.ie_key())
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
- compat_urllib_request,
)
from ..utils import (
+ sanitized_Request,
str_to_int,
unified_strdate,
)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
- req = compat_urllib_request.Request('http://www.' + mobj.group('url'))
+ req = sanitized_Request('http://www.' + mobj.group('url'))
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
parse_iso8601,
+ sanitized_Request,
)
api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id)
- req = compat_urllib_request.Request(api_url, headers={
+ req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json',
'Referer': url,
})
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
+from ..utils import sanitized_Request
class StreamcloudIE(InfoExtractor):
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
- req = compat_urllib_request.Request(url, post, headers)
+ req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
import time
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
int_or_none,
+ sanitized_Request,
)
video_id = self._match_id(url)
api_path = '/episode/%s' % video_id
- req = compat_urllib_request.Request(self._API_URL + api_path)
+ req = sanitized_Request(self._API_URL + api_path)
req.add_header('Api-Password', _get_api_key(api_path))
data = self._download_json(req, video_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
clean_html,
ExtractorError,
float_or_none,
parse_iso8601,
+ sanitized_Request,
)
display_id = mobj.group('id')
playlist_url = self._API_URL.format(display_id)
- request = compat_urllib_request.Request(playlist_url)
+ request = sanitized_Request(playlist_url)
request.add_header('X-Requested-With', 'XMLHttpRequest')
request.add_header('Accept', 'application/json')
request.add_header('Referer', url)
'upload_date': '20150701',
'categories': ['Today/Shows/Orange Room', 'Today/Sections/Money', 'Today/Topics/Tech', "Today/Topics/Editor's picks"],
},
+ }, {
+ # From http://www.nbc.com/the-blacklist/video/sir-crispin-crandall/2928790?onid=137781#vc137781=1
+ # geo-restricted (US), HLS encrypted with AES-128
+ 'url': 'http://player.theplatform.com/p/NnzsPC/onsite_universal/select/media/guid/2410887629/2928790?fwsitesection=nbc_the_blacklist_video_library&autoPlay=true&carouselID=137781',
+ 'only_matching': True,
}]
@staticmethod
# There seems to be no pattern to the script filenames we are interested in,
# so try them one by one
for script in reversed(scripts):
- feed_script = self._download_webpage(script, video_id, 'Downloading feed script')
- feed_id = self._search_regex(r'defaultFeedId\s*:\s*"([^"]+)"', feed_script, 'default feed id', default=None)
+ feed_script = self._download_webpage(
+ self._proto_relative_url(script, 'http:'),
+ video_id, 'Downloading feed script')
+ feed_id = self._search_regex(
+ r'defaultFeedId\s*:\s*"([^"]+)"', feed_script,
+ 'default feed id', default=None)
if feed_id is not None:
break
if feed_id is None:
if smuggled_data.get('force_smil_url', False):
smil_url = url
+ # Explicitly specified SMIL (see https://github.com/rg3/youtube-dl/issues/7385)
+ elif '/guid/' in url:
+ webpage = self._download_webpage(url, video_id)
+ smil_url = self._search_regex(
+ r'<link[^>]+href=(["\'])(?P<url>.+?)\1[^>]+type=["\']application/smil\+xml',
+ webpage, 'smil url', group='url')
+ path = self._search_regex(
+ r'link\.theplatform\.com/s/((?:[^/?#&]+/)+[^/?#&]+)', smil_url, 'path')
+ smil_url += ('?' if '?' not in smil_url else '&') + 'formats=m3u,mpeg4&format=SMIL'
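+ # e.g. (illustrative) http://link.theplatform.com/s/<account>/<media>?formats=m3u,mpeg4&format=SMIL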
elif mobj.group('config'):
config_url = url + '&form=json'
config_url = config_url.replace('swf/', 'config/')
import re
from .common import InfoExtractor
-from .brightcove import BrightcoveIE
+from .brightcove import BrightcoveLegacyIE
from .discovery import DiscoveryIE
from ..compat import compat_urlparse
return {
'_type': 'url',
- 'url': BrightcoveIE._extract_brightcove_url(iframe),
- 'ie': BrightcoveIE.ie_key(),
+ 'url': BrightcoveLegacyIE._extract_brightcove_url(iframe),
+ 'ie': BrightcoveLegacyIE.ie_key(),
}
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse_urlparse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse_urlparse
from ..utils import (
int_or_none,
+ sanitized_Request,
str_to_int,
)
from ..aes import aes_decrypt_text
video_id = mobj.group('id')
display_id = mobj.group('display_id')
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, display_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
)
'password': password,
}
payload = compat_urllib_parse.urlencode(form_data).encode('utf-8')
- request = compat_urllib_request.Request(self._LOGIN_URL, payload)
+ request = sanitized_Request(self._LOGIN_URL, payload)
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
login_page = self._download_webpage(
request, None, False, 'Wrong login info')
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
+ sanitized_Request,
)
for cookie in self._downloader.cookiejar:
if cookie.name == 'api_token':
headers['Twitch-Api-Token'] = cookie.value
- request = compat_urllib_request.Request(url, headers=headers)
+ request = sanitized_Request(url, headers=headers)
response = super(TwitchBaseIE, self)._download_json(request, video_id, note)
self._handle_error(response)
return response
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(redirect_url, post_url)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
post_url, compat_urllib_parse.urlencode(encode_dict(login_form)).encode('utf-8'))
request.add_header('Referer', redirect_url)
response = self._download_webpage(
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
xpath_text,
remove_end,
+ int_or_none,
+ ExtractorError,
+ sanitized_Request,
)
_TESTS = [
{
'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
- 'md5': '7d2f6b4d2eb841a7ccc893d479bfceb4',
+ 'md5': '4fa26a35f9d1bf4b646590ba8e84be19',
'info_dict': {
'id': '560070183650213889',
'ext': 'mp4',
'uploader': 'OMG! Ubuntu!',
'uploader_id': 'omgubuntu',
},
+ 'add_ie': ['Youtube'],
+ },
+ {
+ 'url': 'https://twitter.com/i/cards/tfw/v1/665289828897005568',
+ 'md5': 'ab2745d0b0ce53319a534fccaa986439',
+ 'info_dict': {
+ 'id': 'iBb2x00UVlv',
+ 'ext': 'mp4',
+ 'upload_date': '20151113',
+ 'uploader_id': '1189339351084113920',
+ 'uploader': '@ArsenalTerje',
+ 'title': 'Vine by @ArsenalTerje',
+ },
+ 'add_ie': ['Vine'],
}
]
config = None
formats = []
for user_agent in USER_AGENTS:
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('User-Agent', user_agent)
webpage = self._download_webpage(request, video_id)
- youtube_url = self._html_search_regex(
- r'<iframe[^>]+src="((?:https?:)?//www.youtube.com/embed/[^"]+)"',
- webpage, 'youtube iframe', default=None)
- if youtube_url:
- return self.url_result(youtube_url, 'Youtube')
+ iframe_url = self._html_search_regex(
+ r'<iframe[^>]+src="((?:https?:)?//(?:www\.youtube\.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',
+ webpage, 'video iframe', default=None)
+ if iframe_url:
+ return self.url_result(iframe_url)
config = self._parse_json(self._html_search_regex(
r'data-player-config="([^"]+)"', webpage, 'data player config'),
_VALID_URL = r'https?://(?:www\.|m\.|mobile\.)?twitter\.com/(?P<user_id>[^/]+)/status/(?P<id>\d+)'
_TEMPLATE_URL = 'https://twitter.com/%s/status/%s'
- _TEST = {
+ _TESTS = [{
'url': 'https://twitter.com/freethenipple/status/643211948184596480',
- 'md5': '31cd83a116fc41f99ae3d909d4caf6a0',
+ 'md5': 'db6612ec5d03355953c3ca9250c97e5e',
'info_dict': {
'id': '643211948184596480',
'ext': 'mp4',
'uploader': 'FREE THE NIPPLE',
'uploader_id': 'freethenipple',
},
- }
+ }, {
+ 'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1',
+ 'md5': 'f36dcd5fb92bf7057f155e7d927eeb42',
+ 'info_dict': {
+ 'id': '657991469417025536',
+ 'ext': 'mp4',
+ 'title': 'Gifs - tu vai cai tu vai cai tu nao eh capaz disso tu vai cai',
+ 'description': 'Gifs on Twitter: "tu vai cai tu vai cai tu nao eh capaz disso tu vai cai https://t.co/tM46VHFlO5"',
+ 'thumbnail': 're:^https?://.*\.png',
+ 'uploader': 'Gifs',
+ 'uploader_id': 'giphz',
+ },
+ }, {
+ 'url': 'https://twitter.com/starwars/status/665052190608723968',
+ 'md5': '39b7199856dee6cd4432e72c74bc69d4',
+ 'info_dict': {
+ 'id': '665052190608723968',
+ 'ext': 'mp4',
+ 'title': 'Star Wars - A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens.',
+ 'description': 'Star Wars on Twitter: "A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens."',
+ 'uploader_id': 'starwars',
+ 'uploader': 'Star Wars',
+ },
+ }]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
username = remove_end(self._og_search_title(webpage), ' on Twitter')
- title = self._og_search_description(webpage).strip('').replace('\n', ' ')
+ title = description = self._og_search_description(webpage).replace('\n', ' ').strip('“”')
# strip 'https -_t.co_BJYgOjSeGA' junk from filenames
- mobj = re.match(r'“(.*)\s+(https?://[^ ]+)”', title)
- title, short_url = mobj.groups()
-
- card_id = self._search_regex(
- r'["\']/i/cards/tfw/v1/(\d+)', webpage, 'twitter card url')
- card_url = 'https://twitter.com/i/cards/tfw/v1/' + card_id
+ title = re.sub(r'\s+(https?://[^ ]+)', '', title)
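+ # e.g. (illustrative) 'FREE THE NIPPLE https://t.co/BJYgOjSeGA' -> 'FREE THE NIPPLE'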
- return {
- '_type': 'url_transparent',
- 'ie_key': 'TwitterCard',
+ info = {
'uploader_id': user_id,
'uploader': username,
- 'url': card_url,
'webpage_url': url,
- 'description': '%s on Twitter: "%s %s"' % (username, title, short_url),
+ 'description': '%s on Twitter: "%s"' % (username, description),
'title': username + ' - ' + title,
}
+
+ card_id = self._search_regex(
+ r'["\']/i/cards/tfw/v1/(\d+)', webpage, 'twitter card url', default=None)
+ if card_id:
+ card_url = 'https://twitter.com/i/cards/tfw/v1/' + card_id
+ info.update({
+ '_type': 'url_transparent',
+ 'ie_key': 'TwitterCard',
+ 'url': card_url,
+ })
+ return info
+
+ mobj = re.search(r'''(?x)
+ <video[^>]+class="animated-gif"[^>]+
+ (?:data-height="(?P<height>\d+)")?[^>]+
+ (?:data-width="(?P<width>\d+)")?[^>]+
+ (?:poster="(?P<poster>[^"]+)")?[^>]*>\s*
+ <source[^>]+video-src="(?P<url>[^"]+)"
+ ''', webpage)
+
+ if mobj:
+ info.update({
+ 'id': twid,
+ 'url': mobj.group('url'),
+ 'height': int_or_none(mobj.group('height')),
+ 'width': int_or_none(mobj.group('width')),
+ 'thumbnail': mobj.group('poster'),
+ })
+ return info
+
+ raise ExtractorError('There\'s no video in this tweet.')
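
The title cleanup above replaces the old capture-and-split approach: instead of requiring a trailing t.co link, any embedded link is simply stripped. A quick sketch of the substitution on the tweet text from the test case:

```python
import re

text = '“tu vai cai tu vai cai https://t.co/tM46VHFlO5”'.strip('“”')
# Drop the shortened URL (and the whitespace before it) from the title:
print(re.sub(r'\s+(https?://[^ ]+)', '', text))  # 'tu vai cai tu vai cai'
```
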
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
for header, value in headers.items():
url_or_request.add_header(header, value)
else:
- url_or_request = compat_urllib_request.Request(url_or_request, headers=headers)
+ url_or_request = sanitized_Request(url_or_request, headers=headers)
response = super(UdemyIE, self)._download_json(url_or_request, video_id, note)
self._handle_error(response)
'password': password.encode('utf-8'),
})
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Referer', self._ORIGIN_URL)
request.add_header('Origin', self._ORIGIN_URL)
class UDNEmbedIE(InfoExtractor):
IE_DESC = '聯合影音'
- _VALID_URL = r'https?://video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)'
+ _PROTOCOL_RELATIVE_VALID_URL = r'//video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)'
+ _VALID_URL = r'https?:' + _PROTOCOL_RELATIVE_VALID_URL
_TESTS = [{
'url': 'http://video.udn.com/embed/news/300040',
'md5': 'de06b4c90b042c128395a88f0384817e',
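
Splitting `_VALID_URL` this way lets embed-detection code reuse the protocol-relative part for scheme-less iframe `src` values. A small sketch of hypothetical standalone use:

```python
import re

_PROTOCOL_RELATIVE_VALID_URL = r'//video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)'
_VALID_URL = r'https?:' + _PROTOCOL_RELATIVE_VALID_URL

print(re.match(_VALID_URL, 'http://video.udn.com/embed/news/300040').group('id'))
# Scheme-less embed URLs can be matched by prepending an optional scheme:
print(re.match(r'(?:https?:)?' + _PROTOCOL_RELATIVE_VALID_URL,
               '//video.udn.com/embed/news/300040').group('id'))
```

Both calls print '300040'.
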
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
info_url = "http://vbox7.com/play/magare.do"
data = compat_urllib_parse.urlencode({'as3': '1', 'vid': video_id})
- info_request = compat_urllib_request.Request(info_url, data)
+ info_request = sanitized_Request(info_url, data)
info_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
info_response = self._download_webpage(info_request, video_id, 'Downloading info webpage')
if info_response is None:
import json
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
int_or_none,
ExtractorError,
+ sanitized_Request,
)
if 'class="adultwarning-container"' in webpage:
self.report_age_confirmation()
age_limit = 18
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Cookie', 'confirmedAdult=true')
webpage = self._download_webpage(request, video_id)
import json
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
ExtractorError,
parse_iso8601,
+ sanitized_Request,
)
@staticmethod
def make_json_request(url, data):
payload = json.dumps(data).encode('utf-8')
- req = compat_urllib_request.Request(url, payload)
+ req = sanitized_Request(url, payload)
req.add_header('Content-Type', 'application/json; charset=utf-8')
return req
import re
from .common import InfoExtractor
-from ..compat import (
- compat_etree_fromstring,
- compat_urllib_request,
-)
+from ..compat import compat_etree_fromstring
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
)
_SMIL_BASE_URL = 'http://smil.lvl3.vevo.com/'
def _real_initialize(self):
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://www.vevo.com/auth', data=b'')
webpage = self._download_webpage(
req, None,
from ..utils import (
float_or_none,
int_or_none,
-)
-from ..compat import (
- compat_urllib_request
+ sanitized_Request,
)
'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?video_id=%s&key=v0vhrt7bg2xq1vyxhkct' %
video_id)
headers = {'Referer': 'http://static.cdn-ec.viddler.com/js/arpeggio/v2/embed.html'}
- request = compat_urllib_request.Request(json_url, None, headers)
+ request = sanitized_Request(json_url, None, headers)
data = self._download_json(request, video_id)['video']
formats = []
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
+from ..utils import sanitized_Request
class VideoMegaIE(InfoExtractor):
video_id = self._match_id(url)
iframe_url = 'http://videomega.tv/cdn.php?ref=%s' % video_id
- req = compat_urllib_request.Request(iframe_url)
+ req = sanitized_Request(iframe_url)
req.add_header('Referer', url)
req.add_header('Cookie', 'noadvtday=0')
webpage = self._download_webpage(req, video_id)
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
- compat_urllib_request,
compat_urllib_parse,
compat_urllib_parse_unquote,
)
ExtractorError,
int_or_none,
parse_iso8601,
+ sanitized_Request,
HEADRequest,
)
_ACCEPT_HEADER = 'application/json, text/javascript, */*; q=0.01'
def _download_json(self, url, video_id, note='Downloading JSON metadata', fatal=True):
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Accept', self._ACCEPT_HEADER)
request.add_header('Auth-token', self._AUTH_TOKEN)
return super(ViewsterIE, self)._download_json(request, video_id, note, fatal=fatal)
import hashlib
import itertools
+from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
+ sanitized_Request,
)
-from ..compat import compat_urllib_request
-from .common import InfoExtractor
class VikiBaseIE(InfoExtractor):
hashlib.sha1
).hexdigest()
url = self._API_URL_TEMPLATE % (query, sig)
- return compat_urllib_request.Request(
+ return sanitized_Request(
url, json.dumps(post_data).encode('utf-8')) if post_data else url
def _call_api(self, path, video_id, note, timestamp=None, post_data=None):
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
InAdvancePagedList,
int_or_none,
RegexNotFoundError,
+ sanitized_Request,
smuggle_url,
std_headers,
unified_strdate,
'service': 'vimeo',
'token': token,
}))
- login_request = compat_urllib_request.Request(self._LOGIN_URL, data)
+ login_request = sanitized_Request(self._LOGIN_URL, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
- login_request.add_header('Cookie', 'vuid=%s' % vuid)
login_request.add_header('Referer', self._LOGIN_URL)
+ self._set_vimeo_cookie('vuid', vuid)
self._download_webpage(login_request, None, False, 'Wrong login info')
def _extract_xsrft_and_vuid(self, webpage):
webpage, 'vuid', group='vuid')
return xsrft, vuid
+ def _set_vimeo_cookie(self, name, value):
+ self._set_cookie('vimeo.com', name, value)
+
class VimeoIE(VimeoBaseInfoExtractor):
"""Information extractor for vimeo.com."""
'note': 'Video not completely processed, "failed" seed status',
'only_matching': True,
},
+ {
+ 'url': 'https://vimeo.com/groups/travelhd/videos/22439234',
+ 'only_matching': True,
+ },
]
@staticmethod
if url.startswith('http://'):
# vimeo only supports https now, but the user can give an http url
url = url.replace('http://', 'https://')
- password_request = compat_urllib_request.Request(url + '/password', data)
+ password_request = sanitized_Request(url + '/password', data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
- password_request.add_header('Cookie', 'clip_test2=1; vuid=%s' % vuid)
password_request.add_header('Referer', url)
+ self._set_vimeo_cookie('vuid', vuid)
return self._download_webpage(
password_request, video_id,
'Verifying the password', 'Wrong password')
raise ExtractorError('This video is protected by a password, use the --video-password option')
data = urlencode_postdata(encode_dict({'password': password}))
pass_url = url + '/check-password'
- password_request = compat_urllib_request.Request(pass_url, data)
+ password_request = sanitized_Request(pass_url, data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
return self._download_json(
password_request, video_id,
url = 'https://vimeo.com/' + video_id
# Retrieve video webpage to extract further information
- request = compat_urllib_request.Request(url, None, headers)
+ request = sanitized_Request(url, None, headers)
try:
webpage = self._download_webpage(request, video_id)
except ExtractorError as ee:
like_count = None
comment_count = None
- # Vimeo specific: extract request signature and timestamp
- sig = config['request']['signature']
- timestamp = config['request']['timestamp']
-
- # Vimeo specific: extract video codec and quality information
- # First consider quality, then codecs, then take everything
- codecs = [('vp6', 'flv'), ('vp8', 'flv'), ('h264', 'mp4')]
- files = {'hd': [], 'sd': [], 'other': []}
- config_files = config["video"].get("files") or config["request"].get("files")
- for codec_name, codec_extension in codecs:
- for quality in config_files.get(codec_name, []):
- format_id = '-'.join((codec_name, quality)).lower()
- key = quality if quality in files else 'other'
- video_url = None
- if isinstance(config_files[codec_name], dict):
- file_info = config_files[codec_name][quality]
- video_url = file_info.get('url')
- else:
- file_info = {}
- if video_url is None:
- video_url = "http://player.vimeo.com/play_redirect?clip_id=%s&sig=%s&time=%s&quality=%s&codecs=%s&type=moogaloop_local&embed_location=" \
- % (video_id, sig, timestamp, quality, codec_name.upper())
-
- files[key].append({
- 'ext': codec_extension,
- 'url': video_url,
- 'format_id': format_id,
- 'width': int_or_none(file_info.get('width')),
- 'height': int_or_none(file_info.get('height')),
- 'tbr': int_or_none(file_info.get('bitrate')),
- })
formats = []
- m3u8_url = config_files.get('hls', {}).get('all')
+ config_files = config['video'].get('files') or config['request'].get('files', {})
+ for f in config_files.get('progressive', []):
+ video_url = f.get('url')
+ if not video_url:
+ continue
+ formats.append({
+ 'url': video_url,
+ 'format_id': 'http-%s' % f.get('quality'),
+ 'width': int_or_none(f.get('width')),
+ 'height': int_or_none(f.get('height')),
+ 'fps': int_or_none(f.get('fps')),
+ 'tbr': int_or_none(f.get('bitrate')),
+ })
+ m3u8_url = config_files.get('hls', {}).get('url')
if m3u8_url:
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', 0, 'hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
- for key in ('other', 'sd', 'hd'):
- formats += files[key]
- self._sort_formats(formats)
+ # Bitrates are completely broken. A single m3u8 may contain entries in kbps and bps
+ # at the same time without actual units specified. This leads to wrong sorting.
+ self._sort_formats(formats, field_preference=('height', 'width', 'fps', 'format_id'))
subtitles = {}
text_tracks = config['request'].get('text_tracks')
password_path = self._search_regex(
r'action="([^"]+)"', login_form, 'password URL')
password_url = compat_urlparse.urljoin(page_url, password_path)
- password_request = compat_urllib_request.Request(password_url, post)
+ password_request = sanitized_Request(password_url, post)
password_request.add_header('Content-type', 'application/x-www-form-urlencoded')
- password_request.add_header('Cookie', 'vuid=%s' % vuid)
- self._set_cookie('vimeo.com', 'xsrft', token)
+ self._set_vimeo_cookie('vuid', vuid)
+ self._set_vimeo_cookie('xsrft', token)
return self._download_webpage(
password_request, list_id,
'Verifying the password', 'Wrong password')
- def _extract_videos(self, list_id, base_url):
- video_ids = []
+ def _title_and_entries(self, list_id, base_url):
for pagenum in itertools.count(1):
page_url = self._page_url(base_url, pagenum)
webpage = self._download_webpage(
if pagenum == 1:
webpage = self._login_list_password(page_url, list_id, webpage)
+ yield self._extract_list_title(webpage)
+
+ for video_id in re.findall(r'id="clip_(\d+?)"', webpage):
+ yield self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
- video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break
- entries = [self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
- for video_id in video_ids]
- return {'_type': 'playlist',
- 'id': list_id,
- 'title': self._extract_list_title(webpage),
- 'entries': entries,
- }
+ def _extract_videos(self, list_id, base_url):
+ title_and_entries = self._title_and_entries(list_id, base_url)
+ list_title = next(title_and_entries)
+ return self.playlist_result(title_and_entries, list_id, list_title)
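
The playlist refactor turns extraction into a single generator whose first item is the list title and whose remaining items are the entries, so pages are fetched lazily instead of being collected up front. A minimal mock of the same pattern (hypothetical names, no network):

```python
def title_and_entries():
    yield 'Some playlist'                 # first item: the title
    for video_id in ('100', '200', '300'):
        yield 'https://vimeo.com/%s' % video_id

gen = title_and_entries()
list_title = next(gen)   # pull the title off the front of the generator
print(list_title, list(gen))
# Some playlist ['https://vimeo.com/100', 'https://vimeo.com/200', 'https://vimeo.com/300']
```
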
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
class VimeoGroupsIE(VimeoAlbumIE):
IE_NAME = 'vimeo:group'
- _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)'
+ _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)(?:/(?!videos?/\d+)|$)'
_TESTS = [{
'url': 'https://vimeo.com/groups/rolexawards',
'info_dict': {
def _page_url(self, base_url, pagenum):
url = '%s/page:%d/' % (base_url, pagenum)
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
# Set the header to get a partial html page with the ids;
# the normal page doesn't contain them.
request.add_header('X-Requested-With', 'XMLHttpRequest')
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
orderedSet,
+ sanitized_Request,
str_to_int,
unescapeHTML,
unified_strdate,
'pass': password.encode('cp1251'),
})
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://login.vk.com/?act=login',
compat_urllib_parse.urlencode(login_form).encode('utf-8'))
login_page = self._download_webpage(
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
+from ..utils import sanitized_Request
class VodlockerIE(InfoExtractor):
if fields['op'] == 'download1':
self._sleep(3, video_id) # they do detect when requests happen too fast!
post = compat_urllib_parse.urlencode(fields)
- req = compat_urllib_request.Request(url, post)
+ req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
+ sanitized_Request,
)
def _real_extract(self, url):
display_id = self._match_id(url)
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
compat_urlparse.urljoin(url, '/talks/%s' % display_id))
# Older versions of Firefox get redirected to an "upgrade browser" page
req.add_header('User-Agent', 'youtube-dl')
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import compat_urllib_request
-from ..utils import ExtractorError
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
+)
class WistiaIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
- request = compat_urllib_request.Request(self._API_URL.format(video_id))
+ request = sanitized_Request(self._API_URL.format(video_id))
request.add_header('Referer', url) # Some videos require this.
data_json = self._download_json(request, video_id)
if data_json.get('error'):
'duration': duration,
'upload_date': upload_date,
'title': title,
- 'formats': formats,
'categories': categories,
}
-# -*- coding: utf-8 -*-
+# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
encode_dict,
int_or_none,
+ sanitized_Request,
)
-class GorillaVidIE(InfoExtractor):
- IE_DESC = 'GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net and filehoot.com'
+class XFileShareIE(InfoExtractor):
+ IE_DESC = 'XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me'
_VALID_URL = r'''(?x)
https?://(?P<host>(?:www\.)?
- (?:daclips\.in|gorillavid\.in|movpod\.in|fastvideo\.in|realvid\.net|filehoot\.com))/
+ (?:daclips\.in|gorillavid\.in|movpod\.in|fastvideo\.in|realvid\.net|filehoot\.com|vidto\.me))/
(?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:-[0-9]+x[0-9]+\.html)?
'''
'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4',
'thumbnail': 're:http://.*\.jpg',
}
+ }, {
+ 'url': 'http://vidto.me/ku5glz52nqe1.html',
+ 'info_dict': {
+ 'id': 'ku5glz52nqe1',
+ 'ext': 'mp4',
+ 'title': 'test'
+ }
}]
def _real_extract(self, url):
post = compat_urllib_parse.urlencode(encode_dict(fields))
- req = compat_urllib_request.Request(url, post)
+ req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(req, video_id, 'Downloading video page')
- title = self._search_regex(
- [r'style="z-index: [0-9]+;">([^<]+)</span>', r'<td nowrap>([^<]+)</td>', r'>Watch (.+) '],
- webpage, 'title', default=None) or self._og_search_title(webpage)
+ title = (self._search_regex(
+ [r'style="z-index: [0-9]+;">([^<]+)</span>',
+ r'<td nowrap>([^<]+)</td>',
+ r'>Watch (.+) ',
+ r'<h2 class="video-page-head">([^<]+)</h2>'],
+ webpage, 'title', default=None) or self._og_search_title(webpage)).strip()
video_url = self._search_regex(
- r'file\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'file url')
+ [r'file\s*:\s*["\'](http[^"\']+)["\'],',
+ r'file_link\s*=\s*\'(https?://[0-9a-zA-Z./\-_]+)'],
+ webpage, 'file url')
thumbnail = self._search_regex(
- r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', fatal=False)
+ r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None)
formats = [{
'format_id': 'sd',
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urllib_parse_unquote,
-)
+from ..compat import compat_urllib_parse_unquote
from ..utils import (
parse_duration,
+ sanitized_Request,
str_to_int,
)
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse_unquote,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse_unquote
from ..utils import (
clean_html,
ExtractorError,
determine_ext,
+ sanitized_Request,
)
'url': video_url,
}]
- android_req = compat_urllib_request.Request(url)
+ android_req = sanitized_Request(url)
android_req.add_header('User-Agent', self._ANDROID_USER_AGENT)
android_webpage = self._download_webpage(android_req, video_id, fatal=False)
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
int_or_none,
float_or_none,
+ sanitized_Request,
)
if len(tracks) < len(track_ids):
present_track_ids = set([compat_str(track['id']) for track in tracks if track.get('id')])
missing_track_ids = set(map(compat_str, track_ids)) - set(present_track_ids)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://music.yandex.ru/handlers/track-entries.jsx',
compat_urllib_parse.urlencode({
'entries': ','.join(missing_track_ids),
import base64
from .common import InfoExtractor
-from ..utils import ExtractorError
-
from ..compat import (
compat_urllib_parse,
compat_ord,
- compat_urllib_request,
+)
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
)
video_id = self._match_id(url)
def retrieve_data(req_url, note):
- req = compat_urllib_request.Request(req_url)
+ req = sanitized_Request(req_url)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
int_or_none,
+ sanitized_Request,
str_to_int,
unescapeHTML,
unified_strdate,
video_id = mobj.group('id')
display_id = mobj.group('display_id')
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(request, display_id)
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
- compat_urllib_request,
compat_urlparse,
compat_str,
)
orderedSet,
parse_duration,
remove_start,
+ sanitized_Request,
smuggle_url,
str_to_int,
unescapeHTML,
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('ascii')
- req = compat_urllib_request.Request(self._LOGIN_URL, login_data)
+ req = sanitized_Request(self._LOGIN_URL, login_data)
login_results = self._download_webpage(
req, None,
note='Logging in', errnote='unable to log in', fatal=False)
tfa_data = compat_urllib_parse.urlencode(encode_dict(tfa_form_strs)).encode('ascii')
- tfa_req = compat_urllib_request.Request(self._TWOFACTOR_URL, tfa_data)
+ tfa_req = sanitized_Request(self._TWOFACTOR_URL, tfa_data)
tfa_results = self._download_webpage(
tfa_req, None,
note='Submitting TFA code', errnote='unable to submit tfa', fatal=False)
return
-class YoutubePlaylistBaseInfoExtractor(InfoExtractor):
- # Extract the video ids from the playlist pages
+class YoutubeEntryListBaseInfoExtractor(InfoExtractor):
+ # Extract entries from a page with a "Load more" button
def _entries(self, page, playlist_id):
more_widget_html = content_html = page
for page_num in itertools.count(1):
- for video_id, video_title in self.extract_videos_from_page(content_html):
- yield self.url_result(
- video_id, 'Youtube', video_id=video_id,
- video_title=video_title)
+ for entry in self._process_page(content_html):
+ yield entry
mobj = re.search(r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html)
if not mobj:
break
more_widget_html = more['load_more_widget_html']
+
+class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
+ def _process_page(self, content):
+ for video_id, video_title in self.extract_videos_from_page(content):
+ yield self.url_result(video_id, 'Youtube', video_id, video_title)
+
def extract_videos_from_page(self, page):
ids_in_page = []
titles_in_page = []
return zip(ids_in_page, titles_in_page)
+class YoutubePlaylistsBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
+ def _process_page(self, content):
+ for playlist_id in re.findall(r'href="/?playlist\?list=(.+?)"', content):
+ yield self.url_result(
+ 'https://www.youtube.com/playlist?list=%s' % playlist_id, 'YoutubePlaylist')
+
+ def _real_extract(self, url):
+ playlist_id = self._match_id(url)
+ webpage = self._download_webpage(url, playlist_id)
+ title = self._og_search_title(webpage, fatal=False)
+ return self.playlist_result(self._entries(webpage, playlist_id), playlist_id, title)
+
+
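The split above is a small template-method refactor: the base class owns the "Load more" pagination loop, and subclasses only decide how a page turns into entries via `_process_page`. A self-contained mock of the shape (hypothetical classes, no network):

```python
import re

class EntryListBase(object):
    def _entries(self, pages):
        # Stand-in for the "Load more" loop: walk pages, delegate per-page parsing.
        for page in pages:
            for entry in self._process_page(page):
                yield entry

class PlaylistsPage(EntryListBase):
    def _process_page(self, content):
        for playlist_id in re.findall(r'href="/?playlist\?list=(.+?)"', content):
            yield 'https://www.youtube.com/playlist?list=%s' % playlist_id

pages = ['<a href="/playlist?list=PL123">A</a>', '<a href="/playlist?list=PL456">B</a>']
print(list(PlaylistsPage()._entries(pages)))
# ['https://www.youtube.com/playlist?list=PL123', 'https://www.youtube.com/playlist?list=PL456']
```
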
class YoutubeIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube.com'
_VALID_URL = r"""(?x)^
'title': 'Principal Sexually Assaults A Teacher - Episode 117 - 8th June 2012',
'description': 'md5:09b78bd971f1e3e289601dfba15ca4f7',
'uploader': 'SET India',
- 'uploader_id': 'setindia'
+ 'uploader_id': 'setindia',
+ 'age_limit': 18,
}
},
{
'info_dict': {
'id': 'lqQg6PlCWgI',
'ext': 'mp4',
- 'upload_date': '20120724',
+ 'upload_date': '20150827',
'uploader_id': 'olympic',
'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
'uploader': 'Olympics',
{
'url': 'http://vid.plus/FlRa-iH7PGw',
'only_matching': True,
- }
+ },
+ {
+ # Title with JS-like syntax "};" (see https://github.com/rg3/youtube-dl/issues/7468)
+ 'url': 'https://www.youtube.com/watch?v=lsguqyKfVQg',
+ 'info_dict': {
+ 'id': 'lsguqyKfVQg',
+ 'ext': 'mp4',
+ 'title': '{dark walk}; Loki/AC/Dishonored; collab w/Elflover21',
+ 'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
+ 'upload_date': '20151119',
+ 'uploader_id': 'IronSoulElf',
+ 'uploader': 'IronSoulElf',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ },
+ {
+ # Tags with '};' (see https://github.com/rg3/youtube-dl/issues/7468)
+ 'url': 'https://www.youtube.com/watch?v=Ms7iBXnlUO8',
+ 'only_matching': True,
+ },
]
def __init__(self, *args, **kwargs):
return {}
return sub_lang_list
+ def _get_ytplayer_config(self, video_id, webpage):
+ patterns = (
+ # User data may contain arbitrary character sequences that break
+ # regex-based JSON extraction: e.g. if the data contains '};', the
+ # second, laxer regex stops too early and won't capture the whole
+ # JSON. Work around this by trying the more specific regex first;
+ # proper quoted-string handling, to be implemented later, should
+ # replace this workaround (see
+ # https://github.com/rg3/youtube-dl/issues/7468,
+ # https://github.com/rg3/youtube-dl/pull/7599)
+ r';ytplayer\.config\s*=\s*({.+?});ytplayer',
+ r';ytplayer\.config\s*=\s*({.+?});',
+ )
+ config = self._search_regex(
+ patterns, webpage, 'ytplayer.config', default=None)
+ if config:
+ return self._parse_json(
+ uppercase_escape(config), video_id, fatal=False)
+
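To see why the two-pattern workaround matters, here is the failure mode on a made-up page fragment whose title contains '};' (the case from issue 7468):

```python
import re

webpage = ';ytplayer.config = {"args": {"title": "{dark walk};"}};ytplayer.load();'

lax = r';ytplayer\.config\s*=\s*({.+?});'
strict = r';ytplayer\.config\s*=\s*({.+?});ytplayer'

# The lax pattern stops at the first '};' and yields truncated, invalid JSON:
print(re.search(lax, webpage).group(1))     # '{"args": {"title": "{dark walk}'
# Anchoring on the ';ytplayer' that follows the config captures the whole object:
print(re.search(strict, webpage).group(1))  # '{"args": {"title": "{dark walk};"}}'
```
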
def _get_automatic_captions(self, video_id, webpage):
"""We need the webpage for getting the captions url, pass it as an
argument to speed up the process."""
self.to_screen('%s: Looking for automatic captions' % video_id)
- mobj = re.search(r';ytplayer.config = ({.*?});', webpage)
+ player_config = self._get_ytplayer_config(video_id, webpage)
err_msg = 'Couldn\'t find automatic captions for %s' % video_id
- if mobj is None:
+ if not player_config:
self._downloader.report_warning(err_msg)
return {}
- player_config = json.loads(mobj.group(1))
try:
args = player_config['args']
caption_url = args['ttsurl']
age_gate = False
video_info = None
# Try looking directly into the video webpage
- mobj = re.search(r';ytplayer\.config\s*=\s*({.*?});', video_webpage)
- if mobj:
- json_code = uppercase_escape(mobj.group(1))
- ytplayer_config = json.loads(json_code)
+ ytplayer_config = self._get_ytplayer_config(video_id, video_webpage)
+ if ytplayer_config:
args = ytplayer_config['args']
if args.get('url_encoded_fmt_stream_map'):
# Convert to the same format returned by compat_parse_qs
self.report_warning('Youtube gives an alert message: ' + match)
playlist_title = self._html_search_regex(
- r'(?s)<h1 class="pl-header-title[^"]*">\s*(.*?)\s*</h1>',
+ r'(?s)<h1 class="pl-header-title[^"]*"[^>]*>\s*(.*?)\s*</h1>',
page, 'title')
return self.playlist_result(self._entries(page, playlist_id), playlist_id, playlist_title)
return super(YoutubeUserIE, cls).suitable(url)
+class YoutubeUserPlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
+ IE_DESC = 'YouTube.com user playlists'
+ _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/user/(?P<id>[^/]+)/playlists'
+ IE_NAME = 'youtube:user:playlists'
+
+ _TESTS = [{
+ 'url': 'http://www.youtube.com/user/ThirstForScience/playlists',
+ 'playlist_mincount': 4,
+ 'info_dict': {
+ 'id': 'ThirstForScience',
+ 'title': 'Thirst for Science',
+ },
+ }, {
+ # with "Load more" button
+ 'url': 'http://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
+ 'playlist_mincount': 70,
+ 'info_dict': {
+ 'id': 'igorkle1',
+ 'title': 'Игорь Клейнер',
+ },
+ }]
+
+
class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
IE_DESC = 'YouTube.com searches'
# there doesn't appear to be a real limit, for example if you search for
}
-class YoutubeShowIE(InfoExtractor):
+class YoutubeShowIE(YoutubePlaylistsBaseInfoExtractor):
IE_DESC = 'YouTube.com (multi-season) shows'
_VALID_URL = r'https?://www\.youtube\.com/show/(?P<id>[^?#]*)'
IE_NAME = 'youtube:show'
}]
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- playlist_id = mobj.group('id')
- webpage = self._download_webpage(
- 'https://www.youtube.com/show/%s/playlists' % playlist_id, playlist_id, 'Downloading show webpage')
- # There's one playlist for each season of the show
- m_seasons = list(re.finditer(r'href="(/playlist\?list=.*?)"', webpage))
- self.to_screen('%s: Found %s seasons' % (playlist_id, len(m_seasons)))
- entries = [
- self.url_result(
- 'https://www.youtube.com' + season.group(1), 'YoutubePlaylist')
- for season in m_seasons
- ]
- title = self._og_search_title(webpage, fatal=False)
-
- return {
- '_type': 'playlist',
- 'id': playlist_id,
- 'title': title,
- 'entries': entries,
- }
+ playlist_id = self._match_id(url)
+ return super(YoutubeShowIE, self)._real_extract(
+ 'https://www.youtube.com/show/%s/playlists' % playlist_id)
class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
obj = {}
obj_m = re.search(
(r'(?:var\s+)?%s\s*=\s*\{' % re.escape(objname)) +
- r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\})*)' +
+ r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\}(?:,\s*)?)*)' +
r'\}\s*;',
self.code)
fields = obj_m.group('fields')
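
The extra `(?:,\s*)?` lets the fields group step over the commas that separate member functions, which the old pattern could not reliably do. A standalone check on a made-up JS object:

```python
import re

code = 'var xy = {a:function(p){return p;}, b:function(q){return q;}};'
obj_m = re.search(
    (r'(?:var\s+)?%s\s*=\s*\{' % re.escape('xy')) +
    r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\}(?:,\s*)?)*)' +
    r'\}\s*;',
    code)
print(obj_m.group('fields'))
# 'a:function(p){return p;}, b:function(q){return q;}'
```
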
video_format.add_option(
'-F', '--list-formats',
action='store_true', dest='listformats',
- help='List all available formats')
+ help='List all available formats of specified videos')
video_format.add_option(
'--youtube-include-dash-manifest',
action='store_true', dest='youtube_include_dash_manifest', default=True,
subtitles.add_option(
'--write-auto-sub', '--write-automatic-sub',
action='store_true', dest='writeautomaticsub', default=False,
- help='Write automatic subtitle file (YouTube only)')
+ help='Write automatically generated subtitle file (YouTube only)')
subtitles.add_option(
'--all-subs',
action='store_true', dest='allsubtitles', default=False,
import sys
from zipimport import zipimporter
-from .compat import (
- compat_str,
- compat_urllib_request,
-)
-from .utils import make_HTTPS_handler
+from .compat import compat_str
+
from .version import __version__
return True
-def update_self(to_screen, verbose):
+def update_self(to_screen, verbose, opener):
"""Update the program file with the latest version from the repository"""
UPDATE_URL = "https://rg3.github.io/youtube-dl/update/"
to_screen('It looks like you installed youtube-dl with a package manager, pip, setup.py or a tarball. Please use that to update.')
return
- https_handler = make_HTTPS_handler({})
- opener = compat_urllib_request.build_opener(https_handler)
-
# Check if there is a new version
try:
newversion = opener.open(VERSION_URL).read().decode('utf-8').strip()
return os.path.join(*sanitized_path)
+# Prepend protocol-relative URLs (//...) with an `http:` scheme to avoid
+# unwanted download failures caused by a missing protocol
+def sanitized_Request(url, *args, **kwargs):
+ return compat_urllib_request.Request(
+ 'http:%s' % url if url.startswith('//') else url, *args, **kwargs)
+
+
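A quick check of the new helper (hypothetical URL; `get_full_url()` is the standard urllib `Request` accessor). URLs that already carry a scheme pass through unchanged:

```python
req = sanitized_Request('//example.com/video.mp4')
print(req.get_full_url())  # 'http://example.com/video.mp4'

req = sanitized_Request('https://example.com/video.mp4')
print(req.get_full_url())  # 'https://example.com/video.mp4'
```
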
def orderedSet(iterable):
""" Remove all duplicates from the input iterable """
res = []
numstr = '0%s' % numstr
else:
base = 10
- return compat_chr(int(numstr, base))
+ # See https://github.com/rg3/youtube-dl/issues/7518
+ try:
+ return compat_chr(int(numstr, base))
+ except ValueError:
+ pass
# Unknown entity in name, return its literal representation
- return ('&%s;' % entity)
+ return '&%s;' % entity
def unescapeHTML(s):
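
With the `ValueError` guard in place, numeric character references beyond the Unicode range no longer crash `compat_chr` and are kept in their literal form instead. A quick sketch via `unescapeHTML` (the second value is a hypothetical out-of-range reference):

```python
print(unescapeHTML('&#47;'))          # '/' (valid code point, decoded)
print(unescapeHTML('&#2013266066;'))  # '&#2013266066;' (out of range, kept literal)
```
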
guess = url.partition('?')[0].rpartition('.')[2]
if re.match(r'^[A-Za-z0-9]+$', guess):
return guess
+ elif guess.rstrip('/') in (
+ 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'aac',
+ 'flv', 'f4v', 'f4a', 'f4b',
+ 'webm', 'ogg', 'ogv', 'oga', 'ogx', 'spx', 'opus',
+ 'mkv', 'mka', 'mk3d',
+ 'avi', 'divx',
+ 'mov',
+ 'asf', 'wmv', 'wma',
+ '3gp', '3g2',
+ 'mp3',
+ 'flac',
+ 'ape',
+ 'wav',
+ 'f4f', 'f4m', 'm3u8', 'smil'):
+ return guess.rstrip('/')
else:
return default_ext
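
The whitelist above lets a known container extension survive a trailing slash, which previously fell through to the default. A sketch, assuming this is `utils.determine_ext` with its usual `default_ext='unknown_video'`:

```python
print(determine_ext('http://example.com/video.mp4'))    # 'mp4' (plain match)
print(determine_ext('http://example.com/video.m3u8/'))  # 'm3u8' (trailing slash stripped)
print(determine_ext('http://example.com/video.foo/'))   # 'unknown_video' (not whitelisted)
```
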
def encode_dict(d, encoding='utf-8'):
- return dict((k.encode(encoding), v.encode(encoding)) for k, v in d.items())
+ def encode(v):
+ return v.encode(encoding) if isinstance(v, compat_basestring) else v
+ return dict((encode(k), encode(v)) for k, v in d.items())
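
`encode_dict` now encodes only string keys/values and passes everything else through, so dictionaries mixing strings with ints or already-encoded bytes no longer crash. For example (Python 3 shown):

```python
print(encode_dict({'user': 'müller', 'retries': 3}))
# {b'user': b'm\xc3\xbcller', b'retries': 3}
```
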
US_RATINGS = {
from __future__ import unicode_literals
-__version__ = '2015.11.10'
+__version__ = '2015.11.27.1'