default $XDG_CACHE_HOME/youtube-dl or ~/.cache
/youtube-dl .
--no-cache-dir Disable filesystem caching
+ --bidi-workaround Work around terminals that lack bidirectional
+ text support. Requires fribidi executable in PATH
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
--date DATE download only videos uploaded in this date
--datebefore DATE download only videos uploaded before this date
--dateafter DATE download only videos uploaded after this date
+ --min-views COUNT Do not download any videos with less than COUNT
+ views
+ --max-views COUNT Do not download any videos with more than COUNT
+ views
--no-playlist download only the currently playing video
--age-limit YEARS download only videos suitable for the given age
--download-archive FILE Download only videos not listed in the archive
--restrict-filenames Restrict filenames to only ASCII characters, and
avoid "&" and spaces in filenames
-a, --batch-file FILE file containing URLs to download ('-' for stdin)
+ --load-info FILE json file containing the video information
+ (created with the "--write-json" option
-w, --no-overwrites do not overwrite files
-c, --continue force resume of partially downloaded files. By
default, youtube-dl will resume downloads if
--get-id simulate, quiet but print id
--get-thumbnail simulate, quiet but print thumbnail URL
--get-description simulate, quiet but print video description
+ --get-duration simulate, quiet but print video length
--get-filename simulate, quiet but print output filename
--get-format simulate, quiet but print output format
-j, --dump-json simulate, quiet but print JSON information
# BUGS
-Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>
+Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues> . Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email.
-Please include:
-
-* Your exact command line, like `youtube-dl -t "http://www.youtube.com/watch?v=uHlDtZ6Oc3s&feature=channel_video_title"`. A common mistake is not to escape the `&`. Putting URLs in quotes should solve this problem.
-* If possible re-run the command with `--verbose`, and include the full output, it is really helpful to us.
-* The output of `youtube-dl --version`
-* The output of `python --version`
-* The name and version of your Operating System ("Ubuntu 11.04 x64" or "Windows 7 x64" is usually enough).
+Please include the full output of the command when run with `--verbose`. The output (including the first lines) contain important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
For discussions, join us in the irc channel #youtube-dl on freenode.
+
+When you submit a request, please re-read it once to avoid a couple of mistakes (you can and should use this as a checklist):
+
+### Is the description of the issue itself sufficient?
+
+We often get issue reports that we cannot really decipher. While in most cases we eventually get the required information after asking back multiple times, this poses an unnecessary drain on our resources. Many contributors, including myself, are also not native speakers, so we may misread some parts.
+
+So please elaborate on what feature you are requesting, or what bug you want to be fixed. Make sure that it's obvious
+
+- What the problem is
+- How it could be fixed
+- How your proposed solution would look like
+
+If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a commiter myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
+
+For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the -v flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
+
+Site support requests must contain an example URL. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
+
+### Are you using the latest version?
+
+Before reporting any issue, type youtube-dl -U. This should report that you're up-to-date. Ábout 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
+
+### Is the issue already documented?
+
+Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or at https://github.com/rg3/youtube-dl/search?type=Issues . If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
+
+### Why are existing options not enough?
+
+Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
+
+### Is there enough context in your bug report?
+
+People want to solve problems, and often think they do us a favor by breaking down their larger problems (e.g. wanting to skip already downloaded files) to a specific request (e.g. requesting us to look whether the file exists before downloading the info page). However, what often happens is that they break down the problem into two steps: One simple, and one impossible (or extremely complicated one).
+
+We are then presented with a very complicated request when the original problem could be solved far easier, e.g. by recording the downloaded video IDs in a separate file. To avoid this, you must include the greater context where it is non-obvious. In particular, every feature request that does not consist of adding support for a new site should contain a use case scenario that explains in what situation the missing feature would be useful.
+
+### Does the issue involve one problem, and one problem only?
+
+Some of our users seem to think there is a limit of issues they can or should open. There is no limit of issues they can or should open. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones.
+
+In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, Whitehouse podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
+
+### Is anyone going to need the feature?
+
+Only post features that you (or an incapicated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them.
default $XDG_CACHE_HOME/youtube-dl or ~/.cache
/youtube-dl .
--no-cache-dir Disable filesystem caching
+ --bidi-workaround Work around terminals that lack bidirectional
+ text support. Requires fribidi executable in PATH
Video Selection:
----------------
--date DATE download only videos uploaded in this date
--datebefore DATE download only videos uploaded before this date
--dateafter DATE download only videos uploaded after this date
+ --min-views COUNT Do not download any videos with less than COUNT
+ views
+ --max-views COUNT Do not download any videos with more than COUNT
+ views
--no-playlist download only the currently playing video
--age-limit YEARS download only videos suitable for the given age
--download-archive FILE Download only videos not listed in the archive
--restrict-filenames Restrict filenames to only ASCII characters, and
avoid "&" and spaces in filenames
-a, --batch-file FILE file containing URLs to download ('-' for stdin)
+ --load-info FILE json file containing the video information
+ (created with the "--write-json" option
-w, --no-overwrites do not overwrite files
-c, --continue force resume of partially downloaded files. By
default, youtube-dl will resume downloads if
--get-id simulate, quiet but print id
--get-thumbnail simulate, quiet but print thumbnail URL
--get-description simulate, quiet but print video description
+ --get-duration simulate, quiet but print video length
--get-filename simulate, quiet but print output filename
--get-format simulate, quiet but print output format
-j, --dump-json simulate, quiet but print JSON information
====
Bugs and suggestions should be reported at:
-https://github.com/rg3/youtube-dl/issues
+https://github.com/rg3/youtube-dl/issues . Unless you were prompted so
+or there is another pertinent reason (e.g. GitHub fails to accept the
+bug report), please do not send bug reports via personal email.
-Please include:
-
-- Your exact command line, like
- youtube-dl -t "http://www.youtube.com/watch?v=uHlDtZ6Oc3s&feature=channel_video_title".
- A common mistake is not to escape the &. Putting URLs in quotes
- should solve this problem.
-- If possible re-run the command with --verbose, and include the full
- output, it is really helpful to us.
-- The output of youtube-dl --version
-- The output of python --version
-- The name and version of your Operating System ("Ubuntu 11.04 x64" or
- "Windows 7 x64" is usually enough).
+Please include the full output of the command when run with --verbose.
+The output (including the first lines) contain important debugging
+information. Issues without the full output are often not reproducible
+and therefore do not get solved in short order, if ever.
For discussions, join us in the irc channel #youtube-dl on freenode.
+
+When you submit a request, please re-read it once to avoid a couple of
+mistakes (you can and should use this as a checklist):
+
+Is the description of the issue itself sufficient?
+
+We often get issue reports that we cannot really decipher. While in most
+cases we eventually get the required information after asking back
+multiple times, this poses an unnecessary drain on our resources. Many
+contributors, including myself, are also not native speakers, so we may
+misread some parts.
+
+So please elaborate on what feature you are requesting, or what bug you
+want to be fixed. Make sure that it's obvious
+
+- What the problem is
+- How it could be fixed
+- How your proposed solution would look like
+
+If your report is shorter than two lines, it is almost certainly missing
+some of these, which makes it hard for us to respond to it. We're often
+too polite to close the issue outright, but the missing info makes
+misinterpretation likely. As a commiter myself, I often get frustrated
+by these issues, since the only possible way for me to move forward on
+them is to ask for clarification over and over.
+
+For bug reports, this means that your report should contain the complete
+output of youtube-dl when called with the -v flag. The error message you
+get for (most) bugs even says so, but you would not believe how many of
+our bug reports do not contain this information.
+
+Site support requests must contain an example URL. An example URL is a
+URL you might want to download, like
+http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious
+video present. Except under very special circumstances, the main page of
+a video service (e.g. http://www.youtube.com/ ) is not an example URL.
+
+Are you using the latest version?
+
+Before reporting any issue, type youtube-dl -U. This should report that
+you're up-to-date. Ábout 20% of the reports we receive are already
+fixed, but people are using outdated versions. This goes for feature
+requests as well.
+
+Is the issue already documented?
+
+Make sure that someone has not already opened the issue you're trying to
+open. Search at the top of the window or at
+https://github.com/rg3/youtube-dl/search?type=Issues . If there is an
+issue, feel free to write something along the lines of "This affects me
+as well, with version 2015.01.01. Here is some more information on the
+issue: ...". While some issues may be old, a new post into them often
+spurs rapid activity.
+
+Why are existing options not enough?
+
+Before requesting a new feature, please have a quick peek at the list of
+supported options. Many feature requests are for features that actually
+exist already! Please, absolutely do show off your work in the issue
+report and detail how the existing similar options do not solve your
+problem.
+
+Is there enough context in your bug report?
+
+People want to solve problems, and often think they do us a favor by
+breaking down their larger problems (e.g. wanting to skip already
+downloaded files) to a specific request (e.g. requesting us to look
+whether the file exists before downloading the info page). However, what
+often happens is that they break down the problem into two steps: One
+simple, and one impossible (or extremely complicated one).
+
+We are then presented with a very complicated request when the original
+problem could be solved far easier, e.g. by recording the downloaded
+video IDs in a separate file. To avoid this, you must include the
+greater context where it is non-obvious. In particular, every feature
+request that does not consist of adding support for a new site should
+contain a use case scenario that explains in what situation the missing
+feature would be useful.
+
+Does the issue involve one problem, and one problem only?
+
+Some of our users seem to think there is a limit of issues they can or
+should open. There is no limit of issues they can or should open. While
+it may seem appealing to be able to dump all your issues into one
+ticket, that means that someone who solves one of your issues cannot
+mark the issue as closed. Typically, reporting a bunch of issues leads
+to the ticket lingering since nobody wants to attack that behemoth,
+until someone mercifully splits the issue into multiple ones.
+
+In particular, every site support request issue should only pertain to
+services at one site (generally under a common domain, but always using
+the same backend technology). Do not request support for vimeo user
+videos, Whitehouse podcasts, and Google Plus pages in the same issue.
+Also, make sure that you don't post bug reports alongside feature
+requests. As a rule of thumb, a feature request does not include outputs
+of youtube-dl that are not immediately related to the feature at hand.
+Do not post reports of a network error alongside the request for a new
+video service.
+
+Is anyone going to need the feature?
+
+Only post features that you (or an incapicated friend you can personally
+talk to) require. Do not post features because they seem like a good
+idea. If they are really useful, they will be requested by someone who
+requires them.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
+from youtube_dl import YoutubeDL
class YDL(FakeYDL):
self.assertEqual(test_dict['extractor'], 'Foo')
self.assertEqual(test_dict['playlist'], 'funny videos')
+ def test_prepare_filename(self):
+ info = {
+ u'id': u'1234',
+ u'ext': u'mp4',
+ u'width': None,
+ }
+ def fname(templ):
+ ydl = YoutubeDL({'outtmpl': templ})
+ return ydl.prepare_filename(info)
+ self.assertEqual(fname(u'%(id)s.%(ext)s'), u'1234.mp4')
+ self.assertEqual(fname(u'%(id)s-%(width)s.%(ext)s'), u'1234-NA.mp4')
+ # Replace missing fields with 'NA'
+ self.assertEqual(fname(u'%(uploader_date)s-%(id)s.%(ext)s'), u'NA-1234.mp4')
+
if __name__ == '__main__':
unittest.main()
from test.helper import get_testcases
from youtube_dl.extractor import (
+ FacebookIE,
gen_extractors,
JustinTVIE,
YoutubeIE,
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
+ def test_facebook_matching(self):
+ self.assertTrue(FacebookIE.suitable(u'https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
+
def test_no_duplicates(self):
ies = gen_extractors()
for tc in get_testcases():
url = tc['url']
for ie in ies:
- if type(ie).__name__ in ['GenericIE', tc['name'] + 'IE']:
+ if type(ie).__name__ in ('GenericIE', tc['name'] + 'IE'):
self.assertTrue(ie.suitable(url), '%s should match URL %r' % (type(ie).__name__, url))
else:
self.assertFalse(ie.suitable(url), '%s should not match URL %r' % (type(ie).__name__, url))
self.assertMatch('http://vimeo.com/channels/tributes', ['vimeo:channel'])
self.assertMatch('http://vimeo.com/user7108434', ['vimeo:user'])
+ # https://github.com/rg3/youtube-dl/issues/1930
+ def test_soundcloud_not_matching_sets(self):
+ self.assertMatch('http://soundcloud.com/floex/sets/gone-ep', ['soundcloud:set'])
if __name__ == '__main__':
unittest.main()
from youtube_dl.extractor import (
+ AcademicEarthCourseIE,
DailymotionPlaylistIE,
DailymotionUserIE,
VimeoChannelIE,
VimeoUserIE,
+ VimeoAlbumIE,
+ VimeoGroupsIE,
UstreamChannelIE,
SoundcloudSetIE,
SoundcloudUserIE,
BambuserChannelIE,
BandcampAlbumIE,
SmotriCommunityIE,
- SmotriUserIE
+ SmotriUserIE,
+ IviCompilationIE
)
self.assertEqual(result['title'], u'Nki')
self.assertTrue(len(result['entries']) > 65)
+ def test_vimeo_album(self):
+ dl = FakeYDL()
+ ie = VimeoAlbumIE(dl)
+ result = ie.extract('http://vimeo.com/album/2632481')
+ self.assertIsPlaylist(result)
+ self.assertEqual(result['title'], u'Staff Favorites: November 2013')
+ self.assertTrue(len(result['entries']) > 12)
+
+ def test_vimeo_groups(self):
+ dl = FakeYDL()
+ ie = VimeoGroupsIE(dl)
+ result = ie.extract('http://vimeo.com/groups/rolexawards')
+ self.assertIsPlaylist(result)
+ self.assertEqual(result['title'], u'Rolex Awards for Enterprise')
+ self.assertTrue(len(result['entries']) > 72)
+
def test_ustream_channel(self):
dl = FakeYDL()
ie = UstreamChannelIE(dl)
self.assertEqual(result['title'], u'Inspector')
self.assertTrue(len(result['entries']) >= 9)
+ def test_AcademicEarthCourse(self):
+ dl = FakeYDL()
+ ie = AcademicEarthCourseIE(dl)
+ result = ie.extract(u'http://academicearth.org/courses/building-dynamic-websites/')
+ self.assertIsPlaylist(result)
+ self.assertEqual(result['id'], u'building-dynamic-websites')
+ self.assertEqual(result['title'], u'Building Dynamic Websites')
+ self.assertEqual(result['description'], u"Today's websites are increasingly dynamic. Pages are no longer static HTML files but instead generated by scripts and database calls. User interfaces are more seamless, with technologies like Ajax replacing traditional page reloads. This course teaches students how to build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP), one of today's most popular frameworks. Students learn how to set up domain names with DNS, how to structure pages with XHTML and CSS, how to program in JavaScript and PHP, how to configure Apache and MySQL, how to design and query databases with SQL, how to use Ajax with both XML and JSON, and how to build mashups. The course explores issues of security, scalability, and cross-browser support and also discusses enterprise-level deployments of websites, including third-party hosting, virtualization, colocation in data centers, firewalling, and load-balancing.")
+ self.assertEqual(len(result['entries']), 10)
+
+ def test_ivi_compilation(self):
+ dl = FakeYDL()
+ ie = IviCompilationIE(dl)
+ result = ie.extract('http://www.ivi.ru/watch/dezhurnyi_angel')
+ self.assertIsPlaylist(result)
+ self.assertEqual(result['id'], u'dezhurnyi_angel')
+ self.assertEqual(result['title'], u'Дежурный ангел (2010 - 2012)')
+ self.assertTrue(len(result['entries']) >= 36)
+
+ def test_ivi_compilation_season(self):
+ dl = FakeYDL()
+ ie = IviCompilationIE(dl)
+ result = ie.extract('http://www.ivi.ru/watch/dezhurnyi_angel/season2')
+ self.assertIsPlaylist(result)
+ self.assertEqual(result['id'], u'dezhurnyi_angel/season2')
+ self.assertEqual(result['title'], u'Дежурный ангел (2010 - 2012) 2 сезон')
+ self.assertTrue(len(result['entries']) >= 20)
+
+
if __name__ == '__main__':
unittest.main()
#from youtube_dl.utils import htmlentity_transform
from youtube_dl.utils import (
- timeconvert,
- sanitize_filename,
- unescapeHTML,
- orderedSet,
DateRange,
- unified_strdate,
+ encodeFilename,
find_xpath_attr,
get_meta_content,
- xpath_with_ns,
+ orderedSet,
+ sanitize_filename,
+ shell_quote,
smuggle_url,
+ str_to_int,
+ timeconvert,
+ unescapeHTML,
+ unified_strdate,
unsmuggle_url,
- shell_quote,
- encodeFilename,
+ url_basename,
+ xpath_with_ns,
)
if sys.version_info < (3, 0):
args = ['ffmpeg', '-i', encodeFilename(u'ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), u"""ffmpeg -i 'ñ€ß'"'"'.mp4'""")
+ def test_str_to_int(self):
+ self.assertEqual(str_to_int('123,456'), 123456)
+ self.assertEqual(str_to_int('123.456'), 123456)
+
+ def test_url_basename(self):
+ self.assertEqual(url_basename(u'http://foo.de/'), u'')
+ self.assertEqual(url_basename(u'http://foo.de/bar/baz'), u'baz')
+ self.assertEqual(url_basename(u'http://foo.de/bar/baz?x=y'), u'baz')
+ self.assertEqual(url_basename(u'http://foo.de/bar/baz#x=y'), u'baz')
+ self.assertEqual(url_basename(u'http://foo.de/bar/baz/'), u'baz')
+ self.assertEqual(
+ url_basename(u'http://media.w3.org/2010/05/sintel/trailer.mp4'),
+ u'trailer.mp4')
if __name__ == '__main__':
unittest.main()
INFO_JSON_FILE = TEST_ID + '.info.json'
DESCRIPTION_FILE = TEST_ID + '.mp4.description'
EXPECTED_DESCRIPTION = u'''test chars: "'/\ä↭𝕐
+test URL: https://github.com/rg3/youtube-dl/issues/1892
This is a test video for youtube-dl.
YoutubeIE,
YoutubeChannelIE,
YoutubeShowIE,
+ YoutubeTopListIE,
)
original_video = entries[0]
self.assertEqual(original_video['id'], 'rjFaenf1T-Y')
+ def test_youtube_toplist(self):
+ dl = FakeYDL()
+ ie = YoutubeTopListIE(dl)
+ result = ie.extract('yttoplist:music:Top Tracks')
+ entries = result['entries']
+ self.assertTrue(len(entries) >= 5)
+
if __name__ == '__main__':
unittest.main()
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ default\ $XDG_CACHE_HOME/youtube\-dl\ or\ ~/.cache
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ /youtube\-dl\ .
\-\-no\-cache\-dir\ \ \ \ \ \ \ \ \ \ \ \ \ Disable\ filesystem\ caching
+\-\-bidi\-workaround\ \ \ \ \ \ \ \ \ \ Work\ around\ terminals\ that\ lack\ bidirectional
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ text\ support.\ Requires\ fribidi\ executable\ in\ PATH
\f[]
.fi
.SS Video Selection:
\-\-date\ DATE\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ download\ only\ videos\ uploaded\ in\ this\ date
\-\-datebefore\ DATE\ \ \ \ \ \ \ \ \ \ download\ only\ videos\ uploaded\ before\ this\ date
\-\-dateafter\ DATE\ \ \ \ \ \ \ \ \ \ \ download\ only\ videos\ uploaded\ after\ this\ date
+\-\-min\-views\ COUNT\ \ \ \ \ \ \ \ \ \ Do\ not\ download\ any\ videos\ with\ less\ than\ COUNT
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ views
+\-\-max\-views\ COUNT\ \ \ \ \ \ \ \ \ \ Do\ not\ download\ any\ videos\ with\ more\ than\ COUNT
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ views
\-\-no\-playlist\ \ \ \ \ \ \ \ \ \ \ \ \ \ download\ only\ the\ currently\ playing\ video
\-\-age\-limit\ YEARS\ \ \ \ \ \ \ \ \ \ download\ only\ videos\ suitable\ for\ the\ given\ age
\-\-download\-archive\ FILE\ \ \ \ Download\ only\ videos\ not\ listed\ in\ the\ archive
\-\-restrict\-filenames\ \ \ \ \ \ \ Restrict\ filenames\ to\ only\ ASCII\ characters,\ and
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ avoid\ "&"\ and\ spaces\ in\ filenames
\-a,\ \-\-batch\-file\ FILE\ \ \ \ \ \ file\ containing\ URLs\ to\ download\ (\[aq]\-\[aq]\ for\ stdin)
+\-\-load\-info\ FILE\ \ \ \ \ \ \ \ \ \ \ json\ file\ containing\ the\ video\ information
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (created\ with\ the\ "\-\-write\-json"\ option
\-w,\ \-\-no\-overwrites\ \ \ \ \ \ \ \ do\ not\ overwrite\ files
\-c,\ \-\-continue\ \ \ \ \ \ \ \ \ \ \ \ \ force\ resume\ of\ partially\ downloaded\ files.\ By
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ default,\ youtube\-dl\ will\ resume\ downloads\ if
\-\-get\-id\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ id
\-\-get\-thumbnail\ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ thumbnail\ URL
\-\-get\-description\ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ video\ description
+\-\-get\-duration\ \ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ video\ length
\-\-get\-filename\ \ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ output\ filename
\-\-get\-format\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ output\ format
\-j,\ \-\-dump\-json\ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ JSON\ information
.SH BUGS
.PP
Bugs and suggestions should be reported at:
-<https://github.com/rg3/youtube-dl/issues>
+<https://github.com/rg3/youtube-dl/issues> .
+Unless you were prompted so or there is another pertinent reason (e.g.
+GitHub fails to accept the bug report), please do not send bug reports
+via personal email.
.PP
-Please include:
-.IP \[bu] 2
-Your exact command line, like
-\f[C]youtube\-dl\ \-t\ "http://www.youtube.com/watch?v=uHlDtZ6Oc3s&feature=channel_video_title"\f[].
-A common mistake is not to escape the \f[C]&\f[].
-Putting URLs in quotes should solve this problem.
-.IP \[bu] 2
-If possible re\-run the command with \f[C]\-\-verbose\f[], and include
-the full output, it is really helpful to us.
+Please include the full output of the command when run with
+\f[C]\-\-verbose\f[].
+The output (including the first lines) contain important debugging
+information.
+Issues without the full output are often not reproducible and therefore
+do not get solved in short order, if ever.
+.PP
+For discussions, join us in the irc channel #youtube\-dl on freenode.
+.PP
+When you submit a request, please re\-read it once to avoid a couple of
+mistakes (you can and should use this as a checklist):
+.SS Is the description of the issue itself sufficient?
+.PP
+We often get issue reports that we cannot really decipher.
+While in most cases we eventually get the required information after
+asking back multiple times, this poses an unnecessary drain on our
+resources.
+Many contributors, including myself, are also not native speakers, so we
+may misread some parts.
+.PP
+So please elaborate on what feature you are requesting, or what bug you
+want to be fixed.
+Make sure that it\[aq]s obvious
.IP \[bu] 2
-The output of \f[C]youtube\-dl\ \-\-version\f[]
+What the problem is
.IP \[bu] 2
-The output of \f[C]python\ \-\-version\f[]
+How it could be fixed
.IP \[bu] 2
-The name and version of your Operating System ("Ubuntu 11.04 x64" or
-"Windows 7 x64" is usually enough).
+How your proposed solution would look like
.PP
-For discussions, join us in the irc channel #youtube\-dl on freenode.
+If your report is shorter than two lines, it is almost certainly missing
+some of these, which makes it hard for us to respond to it.
+We\[aq]re often too polite to close the issue outright, but the missing
+info makes misinterpretation likely.
+As a commiter myself, I often get frustrated by these issues, since the
+only possible way for me to move forward on them is to ask for
+clarification over and over.
+.PP
+For bug reports, this means that your report should contain the
+\f[I]complete\f[] output of youtube\-dl when called with the \-v flag.
+The error message you get for (most) bugs even says so, but you would
+not believe how many of our bug reports do not contain this information.
+.PP
+Site support requests must contain an example URL.
+An example URL is a URL you might want to download, like
+http://www.youtube.com/watch?v=BaW_jenozKc .
+There should be an obvious video present.
+Except under very special circumstances, the main page of a video
+service (e.g.
+http://www.youtube.com/ ) is \f[I]not\f[] an example URL.
+.SS Are you using the latest version?
+.PP
+Before reporting any issue, type youtube\-dl \-U.
+This should report that you\[aq]re up\-to\-date.
+Ábout 20% of the reports we receive are already fixed, but people are
+using outdated versions.
+This goes for feature requests as well.
+.SS Is the issue already documented?
+.PP
+Make sure that someone has not already opened the issue you\[aq]re
+trying to open.
+Search at the top of the window or at
+https://github.com/rg3/youtube\-dl/search?type=Issues .
+If there is an issue, feel free to write something along the lines of
+"This affects me as well, with version 2015.01.01.
+Here is some more information on the issue: ...".
+While some issues may be old, a new post into them often spurs rapid
+activity.
+.SS Why are existing options not enough?
+.PP
+Before requesting a new feature, please have a quick peek at the list of
+supported
+options (https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis).
+Many feature requests are for features that actually exist already!
+Please, absolutely do show off your work in the issue report and detail
+how the existing similar options do \f[I]not\f[] solve your problem.
+.SS Is there enough context in your bug report?
+.PP
+People want to solve problems, and often think they do us a favor by
+breaking down their larger problems (e.g.
+wanting to skip already downloaded files) to a specific request (e.g.
+requesting us to look whether the file exists before downloading the
+info page).
+However, what often happens is that they break down the problem into two
+steps: One simple, and one impossible (or extremely complicated one).
+.PP
+We are then presented with a very complicated request when the original
+problem could be solved far easier, e.g.
+by recording the downloaded video IDs in a separate file.
+To avoid this, you must include the greater context where it is
+non\-obvious.
+In particular, every feature request that does not consist of adding
+support for a new site should contain a use case scenario that explains
+in what situation the missing feature would be useful.
+.SS Does the issue involve one problem, and one problem only?
+.PP
+Some of our users seem to think there is a limit of issues they can or
+should open.
+There is no limit of issues they can or should open.
+While it may seem appealing to be able to dump all your issues into one
+ticket, that means that someone who solves one of your issues cannot
+mark the issue as closed.
+Typically, reporting a bunch of issues leads to the ticket lingering
+since nobody wants to attack that behemoth, until someone mercifully
+splits the issue into multiple ones.
+.PP
+In particular, every site support request issue should only pertain to
+services at one site (generally under a common domain, but always using
+the same backend technology).
+Do not request support for vimeo user videos, Whitehouse podcasts, and
+Google Plus pages in the same issue.
+Also, make sure that you don\[aq]t post bug reports alongside feature
+requests.
+As a rule of thumb, a feature request does not include outputs of
+youtube\-dl that are not immediately related to the feature at hand.
+Do not post reports of a network error alongside the request for a new
+video service.
+.SS Is anyone going to need the feature?
+.PP
+Only post features that you (or an incapicated friend you can personally
+talk to) require.
+Do not post features because they seem like a good idea.
+If they are really useful, they will be requested by someone who
+requires them.
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
prev="${COMP_WORDS[COMP_CWORD-1]}"
- opts="--help --version --update --ignore-errors --abort-on-error --dump-user-agent --user-agent --referer --list-extractors --extractor-descriptions --proxy --no-check-certificate --cache-dir --no-cache-dir --socket-timeout --playlist-start --playlist-end --match-title --reject-title --max-downloads --min-filesize --max-filesize --date --datebefore --dateafter --no-playlist --age-limit --download-archive --rate-limit --retries --buffer-size --no-resize-buffer --test --title --id --literal --auto-number --output --autonumber-size --restrict-filenames --batch-file --no-overwrites --continue --no-continue --cookies --no-part --no-mtime --write-description --write-info-json --write-annotations --write-thumbnail --quiet --simulate --skip-download --get-url --get-title --get-id --get-thumbnail --get-description --get-filename --get-format --dump-json --newline --no-progress --console-title --verbose --dump-intermediate-pages --write-pages --youtube-print-sig-code --format --all-formats --prefer-free-formats --max-quality --list-formats --write-sub --write-auto-sub --all-subs --list-subs --sub-format --sub-lang --username --password --netrc --video-password --extract-audio --audio-format --audio-quality --recode-video --keep-video --no-post-overwrites --embed-subs --add-metadata"
+ opts="--help --version --update --ignore-errors --abort-on-error --dump-user-agent --user-agent --referer --list-extractors --extractor-descriptions --proxy --no-check-certificate --cache-dir --no-cache-dir --socket-timeout --bidi-workaround --playlist-start --playlist-end --match-title --reject-title --max-downloads --min-filesize --max-filesize --date --datebefore --dateafter --min-views --max-views --no-playlist --age-limit --download-archive --rate-limit --retries --buffer-size --no-resize-buffer --test --title --id --literal --auto-number --output --autonumber-size --restrict-filenames --batch-file --load-info --no-overwrites --continue --no-continue --cookies --no-part --no-mtime --write-description --write-info-json --write-annotations --write-thumbnail --quiet --simulate --skip-download --get-url --get-title --get-id --get-thumbnail --get-description --get-duration --get-filename --get-format --dump-json --newline --no-progress --console-title --verbose --dump-intermediate-pages --write-pages --youtube-print-sig-code --format --all-formats --prefer-free-formats --max-quality --list-formats --write-sub --write-auto-sub --all-subs --list-subs --sub-format --sub-lang --username --password --netrc --video-password --extract-audio --audio-format --audio-quality --recode-video --keep-video --no-post-overwrites --embed-subs --add-metadata"
keywords=":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater :ythistory"
fileopts="-a|--batch-file|--download-archive|--cookies"
diropts="--cache-dir"
"""Report destination filename."""
self.to_screen(u'[download] Destination: ' + filename)
+ def _report_progress_status(self, msg, is_last_line=False):
+ fullmsg = u'[download] ' + msg
+ if self.params.get('progress_with_newline', False):
+ self.to_screen(fullmsg)
+ else:
+ if os.name == 'nt':
+ prev_len = getattr(self, '_report_progress_prev_line_length',
+ 0)
+ if prev_len > len(fullmsg):
+ fullmsg += u' ' * (prev_len - len(fullmsg))
+ self._report_progress_prev_line_length = len(fullmsg)
+ clear_line = u'\r'
+ else:
+ clear_line = (u'\r\x1b[K' if sys.stderr.isatty() else u'\r')
+ self.to_screen(clear_line + fullmsg, skip_eol=not is_last_line)
+ self.to_console_title(u'youtube-dl ' + msg)
+
def report_progress(self, percent, data_len_str, speed, eta):
"""Report download progress."""
if self.params.get('noprogress', False):
return
- clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
if eta is not None:
eta_str = self.format_eta(eta)
else:
else:
percent_str = 'Unknown %'
speed_str = self.format_speed(speed)
- if self.params.get('progress_with_newline', False):
- self.to_screen(u'[download] %s of %s at %s ETA %s' %
- (percent_str, data_len_str, speed_str, eta_str))
+
+ msg = (u'%s of %s at %s ETA %s' %
+ (percent_str, data_len_str, speed_str, eta_str))
+ self._report_progress_status(msg)
+
+ def report_progress_live_stream(self, downloaded_data_len, speed, elapsed):
+ if self.params.get('noprogress', False):
+ return
+ downloaded_str = format_bytes(downloaded_data_len)
+ speed_str = self.format_speed(speed)
+ elapsed_str = FileDownloader.format_seconds(elapsed)
+ msg = u'%s at %s (%s)' % (downloaded_str, speed_str, elapsed_str)
+ self._report_progress_status(msg)
+
+ def report_finish(self, data_len_str, tot_time):
+ """Report download finished."""
+ if self.params.get('noprogress', False):
+ self.to_screen(u'[download] Download completed')
else:
- self.to_screen(u'\r%s[download] %s of %s at %s ETA %s' %
- (clear_line, percent_str, data_len_str, speed_str, eta_str), skip_eol=True)
- self.to_console_title(u'youtube-dl - %s of %s at %s ETA %s' %
- (percent_str.strip(), data_len_str.strip(), speed_str.strip(), eta_str.strip()))
+ self._report_progress_status(
+ (u'100%% of %s in %s' %
+ (data_len_str, self.format_seconds(tot_time))),
+ is_last_line=True)
def report_resuming_byte(self, resume_len):
"""Report attempt to resume at given byte."""
"""Report it was impossible to resume download."""
self.to_screen(u'[download] Unable to resume')
- def report_finish(self, data_len_str, tot_time):
- """Report download finished."""
- if self.params.get('noprogress', False):
- self.to_screen(u'[download] Download completed')
- else:
- clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
- self.to_screen(u'\r%s[download] 100%% of %s in %s' %
- (clear_line, data_len_str, self.format_seconds(tot_time)))
-
- def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url, live):
+ def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url, live, conn):
def run_rtmpdump(args):
start = time.time()
resume_percent = None
'eta': eta,
'speed': speed,
})
- elif self.params.get('verbose', False):
- if not cursor_in_new_line:
- self.to_screen(u'')
- cursor_in_new_line = True
- self.to_screen(u'[rtmpdump] '+line)
+ else:
+ # no percent for live streams
+ mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec', line)
+ if mobj:
+ downloaded_data_len = int(float(mobj.group(1))*1024)
+ time_now = time.time()
+ speed = self.calc_speed(start, time_now, downloaded_data_len)
+ self.report_progress_live_stream(downloaded_data_len, speed, time_now - start)
+ cursor_in_new_line = False
+ self._hook_progress({
+ 'downloaded_bytes': downloaded_data_len,
+ 'tmpfilename': tmpfilename,
+ 'filename': filename,
+ 'status': 'downloading',
+ 'speed': speed,
+ })
+ elif self.params.get('verbose', False):
+ if not cursor_in_new_line:
+ self.to_screen(u'')
+ cursor_in_new_line = True
+ self.to_screen(u'[rtmpdump] '+line)
proc.wait()
if not cursor_in_new_line:
self.to_screen(u'')
basic_args += ['--stop', '1']
if live:
basic_args += ['--live']
+ if conn:
+ basic_args += ['--conn', conn]
args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)]
if sys.platform == 'win32' and sys.version_info < (3, 0):
info_dict.get('page_url', None),
info_dict.get('play_path', None),
info_dict.get('tc_url', None),
- info_dict.get('rtmp_live', False))
+ info_dict.get('rtmp_live', False),
+ info_dict.get('rtmp_conn', None))
# Attempt to download using mplayer
if url.startswith('mms') or url.startswith('rtsp'):
from __future__ import absolute_import
+import collections
import errno
import io
import json
from .utils import (
compat_cookiejar,
compat_http_client,
- compat_print,
compat_str,
compat_urllib_error,
compat_urllib_request,
encodeFilename,
ExtractorError,
format_bytes,
+ formatSeconds,
+ get_term_width,
locked_file,
make_HTTPS_handler,
MaxDownloadsReached,
subtitles_filename,
takewhile_inclusive,
UnavailableVideoError,
+ url_basename,
write_json_file,
write_string,
YoutubeDLHandler,
forcethumbnail: Force printing thumbnail URL.
forcedescription: Force printing description.
forcefilename: Force printing final filename.
+ forceduration: Force printing duration.
forcejson: Force printing info_dict as JSON.
simulate: Do not download the video files.
format: Video format code.
noplaylist: Download single video instead of a playlist if in doubt.
age_limit: An integer representing the user's age in years.
Unsuitable videos for the given age are skipped.
- download_archive: File name of a file where all downloads are recorded.
+ min_views: An integer representing the minimum view count the video
+ must have in order to not be skipped.
+ Videos without view count information are always
+ downloaded. None for no limit.
+ max_views: An integer representing the maximum view count.
+ Videos that are more popular than that are not
+ downloaded.
+ Videos without view count information are always
+ downloaded. None for no limit.
+ download_archive: File name of a file where all downloads are recorded.
Videos already present in the file are not downloaded
again.
cookiefile: File name where cookies should be read from and dumped to.
nocheckcertificate:Do not verify SSL certificates
proxy: URL of the proxy server to use
socket_timeout: Time to wait for unresponsive hosts, in seconds
+ bidi_workaround: Work around buggy terminals without bidirectional text
+ support, using fridibi
The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader:
self._download_retcode = 0
self._num_downloads = 0
self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
+ self._err_file = sys.stderr
self.params = {} if params is None else params
+ if params.get('bidi_workaround', False):
+ try:
+ import pty
+ master, slave = pty.openpty()
+ width = get_term_width()
+ if width is None:
+ width_args = []
+ else:
+ width_args = ['-w', str(width)]
+ self._fribidi = subprocess.Popen(
+ ['fribidi', '-c', 'UTF-8'] + width_args,
+ stdin=subprocess.PIPE,
+ stdout=slave,
+ stderr=self._err_file)
+ self._fribidi_channel = os.fdopen(master, 'rb')
+ except OSError as ose:
+ if ose.errno == 2:
+ self.report_warning(u'Could not find fribidi executable, ignoring --bidi-workaround . Make sure that fribidi is an executable file in one of the directories in your $PATH.')
+ else:
+ raise
+
if (sys.version_info >= (3,) and sys.platform != 'win32' and
sys.getfilesystemencoding() in ['ascii', 'ANSI_X3.4-1968']
and not params['restrictfilenames']):
self._pps.append(pp)
pp.set_downloader(self)
+ def _bidi_workaround(self, message):
+ if not hasattr(self, '_fribidi_channel'):
+ return message
+
+ assert type(message) == type(u'')
+ line_count = message.count(u'\n') + 1
+ self._fribidi.stdin.write((message + u'\n').encode('utf-8'))
+ self._fribidi.stdin.flush()
+ res = u''.join(self._fribidi_channel.readline().decode('utf-8')
+ for _ in range(line_count))
+ return res[:-len(u'\n')]
+
def to_screen(self, message, skip_eol=False):
+ """Print message to stdout if not in quiet mode."""
+ return self.to_stdout(message, skip_eol, check_quiet=True)
+
+ def to_stdout(self, message, skip_eol=False, check_quiet=False):
"""Print message to stdout if not in quiet mode."""
if self.params.get('logger'):
self.params['logger'].debug(message)
- elif not self.params.get('quiet', False):
+ elif not check_quiet or not self.params.get('quiet', False):
+ message = self._bidi_workaround(message)
terminator = [u'\n', u''][skip_eol]
output = message + terminator
+
write_string(output, self._screen_file)
def to_stderr(self, message):
if self.params.get('logger'):
self.params['logger'].error(message)
else:
+ message = self._bidi_workaround(message)
output = message + u'\n'
- if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
- output = output.encode(preferredencoding())
- sys.stderr.write(output)
+ write_string(output, self._err_file)
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
Print the message to stderr, it will be prefixed with 'WARNING:'
If stderr is a tty file the 'WARNING:' will be colored
'''
- if sys.stderr.isatty() and os.name != 'nt':
+ if self._err_file.isatty() and os.name != 'nt':
_msg_header = u'\033[0;33mWARNING:\033[0m'
else:
_msg_header = u'WARNING:'
Do the same as trouble, but prefixes the message with 'ERROR:', colored
in red if stderr is a tty file.
'''
- if sys.stderr.isatty() and os.name != 'nt':
+ if self._err_file.isatty() and os.name != 'nt':
_msg_header = u'\033[0;31mERROR:\033[0m'
else:
_msg_header = u'ERROR:'
error_message = u'%s %s' % (_msg_header, message)
self.trouble(error_message, tb)
- def report_writedescription(self, descfn):
- """ Report that the description file is being written """
- self.to_screen(u'[info] Writing video description to: ' + descfn)
-
- def report_writesubtitles(self, sub_filename):
- """ Report that the subtitles file is being written """
- self.to_screen(u'[info] Writing video subtitles to: ' + sub_filename)
-
- def report_writeinfojson(self, infofn):
- """ Report that the metadata file has been written """
- self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn)
-
- def report_writeannotations(self, annofn):
- """ Report that the annotations file has been written. """
- self.to_screen(u'[info] Writing video annotations to: ' + annofn)
-
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
try:
template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index']
sanitize = lambda k, v: sanitize_filename(
- u'NA' if v is None else compat_str(v),
+ compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k == u'id'))
template_dict = dict((k, sanitize(k, v))
- for k, v in template_dict.items())
+ for k, v in template_dict.items()
+ if v is not None)
+ template_dict = collections.defaultdict(lambda: u'NA', template_dict)
tmpl = os.path.expanduser(self.params['outtmpl'])
filename = tmpl % template_dict
return filename
- except KeyError as err:
- self.report_error(u'Erroneous output template')
- return None
except ValueError as err:
self.report_error(u'Error in output template: ' + str(err) + u' (encoding: ' + repr(preferredencoding()) + ')')
return None
def _match_entry(self, info_dict):
""" Returns None iff the file should be downloaded """
+ video_title = info_dict.get('title', info_dict.get('id', u'video'))
if 'title' in info_dict:
# This can happen when we're just evaluating the playlist
title = info_dict['title']
matchtitle = self.params.get('matchtitle', False)
if matchtitle:
if not re.search(matchtitle, title, re.IGNORECASE):
- return u'[download] "' + title + '" title did not match pattern "' + matchtitle + '"'
+ return u'"' + title + '" title did not match pattern "' + matchtitle + '"'
rejecttitle = self.params.get('rejecttitle', False)
if rejecttitle:
if re.search(rejecttitle, title, re.IGNORECASE):
if date is not None:
dateRange = self.params.get('daterange', DateRange())
if date not in dateRange:
- return u'[download] %s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
+ return u'%s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
+ view_count = info_dict.get('view_count', None)
+ if view_count is not None:
+ min_views = self.params.get('min_views')
+ if min_views is not None and view_count < min_views:
+ return u'Skipping %s, because it has not reached minimum view count (%d/%d)' % (video_title, view_count, min_views)
+ max_views = self.params.get('max_views')
+ if max_views is not None and view_count > max_views:
+ return u'Skipping %s, because it has exceeded the maximum view count (%d/%d)' % (video_title, view_count, max_views)
age_limit = self.params.get('age_limit')
if age_limit is not None:
if age_limit < info_dict.get('age_limit', 0):
return u'Skipping "' + title + '" because it is age restricted'
if self.in_download_archive(info_dict):
- return (u'%s has already been recorded in archive'
- % info_dict.get('title', info_dict.get('id', u'video')))
+ return u'%s has already been recorded in archive' % video_title
return None
@staticmethod
for key, value in extra_info.items():
info_dict.setdefault(key, value)
- def extract_info(self, url, download=True, ie_key=None, extra_info={}):
+ def extract_info(self, url, download=True, ie_key=None, extra_info={},
+ process=True):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
{
'extractor': ie.IE_NAME,
'webpage_url': url,
+ 'webpage_url_basename': url_basename(url),
'extractor_key': ie.ie_key(),
})
- return self.process_ie_result(ie_result, download, extra_info)
+ if process:
+ return self.process_ie_result(ie_result, download, extra_info)
+ else:
+ return ie_result
except ExtractorError as de: # An error we somewhat expected
self.report_error(compat_str(de), de.format_traceback())
break
download,
ie_key=ie_result.get('ie_key'),
extra_info=extra_info)
+ elif result_type == 'url_transparent':
+ # Use the information from the embedding page
+ info = self.extract_info(
+ ie_result['url'], ie_key=ie_result.get('ie_key'),
+ extra_info=extra_info, download=False, process=False)
+
+ def make_result(embedded_info):
+ new_result = ie_result.copy()
+ for f in ('_type', 'url', 'ext', 'player_url', 'formats',
+ 'entries', 'urlhandle', 'ie_key', 'duration',
+ 'subtitles', 'annotations', 'format',
+ 'thumbnail', 'thumbnails'):
+ if f in new_result:
+ del new_result[f]
+ if f in embedded_info:
+ new_result[f] = embedded_info[f]
+ return new_result
+ new_result = make_result(info)
+
+ assert new_result.get('_type') != 'url_transparent'
+ if new_result.get('_type') == 'compat_list':
+ new_result['entries'] = [
+ make_result(e) for e in new_result['entries']]
+
+ return self.process_ie_result(
+ new_result, download=download, extra_info=extra_info)
elif result_type == 'playlist':
-
# We process each entry in the playlist
playlist = ie_result.get('title', None) or ie_result.get('id', None)
self.to_screen(u'[download] Downloading playlist: %s' % playlist)
n_all_entries = len(ie_result['entries'])
playliststart = self.params.get('playliststart', 1) - 1
- playlistend = self.params.get('playlistend', -1)
-
+ playlistend = self.params.get('playlistend', None)
+ # For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
- entries = ie_result['entries'][playliststart:]
- else:
- entries = ie_result['entries'][playliststart:playlistend]
+ playlistend = None
+ entries = ie_result['entries'][playliststart:playlistend]
n_entries = len(entries)
- self.to_screen(u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" %
+ self.to_screen(
+ u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
for i, entry in enumerate(entries, 1):
'playlist_index': i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
+ 'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
{
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
+ 'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
})
return r
# Forced printings
if self.params.get('forcetitle', False):
- compat_print(info_dict['fulltitle'])
+ self.to_stdout(info_dict['fulltitle'])
if self.params.get('forceid', False):
- compat_print(info_dict['id'])
+ self.to_stdout(info_dict['id'])
if self.params.get('forceurl', False):
# For RTMP URLs, also include the playpath
- compat_print(info_dict['url'] + info_dict.get('play_path', u''))
+ self.to_stdout(info_dict['url'] + info_dict.get('play_path', u''))
if self.params.get('forcethumbnail', False) and info_dict.get('thumbnail') is not None:
- compat_print(info_dict['thumbnail'])
+ self.to_stdout(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and info_dict.get('description') is not None:
- compat_print(info_dict['description'])
+ self.to_stdout(info_dict['description'])
if self.params.get('forcefilename', False) and filename is not None:
- compat_print(filename)
+ self.to_stdout(filename)
+ if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
+ self.to_stdout(formatSeconds(info_dict['duration']))
if self.params.get('forceformat', False):
- compat_print(info_dict['format'])
+ self.to_stdout(info_dict['format'])
if self.params.get('forcejson', False):
- compat_print(json.dumps(info_dict))
+ info_dict['_filename'] = filename
+ self.to_stdout(json.dumps(info_dict))
# Do nothing else if in simulate mode
if self.params.get('simulate', False):
return
if self.params.get('writedescription', False):
- try:
- descfn = filename + u'.description'
- self.report_writedescription(descfn)
- with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
- descfile.write(info_dict['description'])
- except (KeyError, TypeError):
- self.report_warning(u'There\'s no description to write.')
- except (OSError, IOError):
- self.report_error(u'Cannot write description file ' + descfn)
- return
+ descfn = filename + u'.description'
+ if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(descfn)):
+ self.to_screen(u'[info] Video description is already present')
+ else:
+ try:
+ self.to_screen(u'[info] Writing video description to: ' + descfn)
+ with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
+ descfile.write(info_dict['description'])
+ except (KeyError, TypeError):
+ self.report_warning(u'There\'s no description to write.')
+ except (OSError, IOError):
+ self.report_error(u'Cannot write description file ' + descfn)
+ return
if self.params.get('writeannotations', False):
- try:
- annofn = filename + u'.annotations.xml'
- self.report_writeannotations(annofn)
- with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
- annofile.write(info_dict['annotations'])
- except (KeyError, TypeError):
- self.report_warning(u'There are no annotations to write.')
- except (OSError, IOError):
- self.report_error(u'Cannot write annotations file: ' + annofn)
- return
+ annofn = filename + u'.annotations.xml'
+ if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
+ self.to_screen(u'[info] Video annotations are already present')
+ else:
+ try:
+ self.to_screen(u'[info] Writing video annotations to: ' + annofn)
+ with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
+ annofile.write(info_dict['annotations'])
+ except (KeyError, TypeError):
+ self.report_warning(u'There are no annotations to write.')
+ except (OSError, IOError):
+ self.report_error(u'Cannot write annotations file: ' + annofn)
+ return
subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub')])
continue
try:
sub_filename = subtitles_filename(filename, sub_lang, sub_format)
- self.report_writesubtitles(sub_filename)
- with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
- subfile.write(sub)
+ if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
+ self.to_screen(u'[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
+ else:
+ self.to_screen(u'[info] Writing video subtitles to: ' + sub_filename)
+ with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
+ subfile.write(sub)
except (OSError, IOError):
self.report_error(u'Cannot write subtitles file ' + descfn)
return
if self.params.get('writeinfojson', False):
infofn = os.path.splitext(filename)[0] + u'.info.json'
- self.report_writeinfojson(infofn)
- try:
- json_info_dict = dict((k, v) for k, v in info_dict.items() if not k in ['urlhandle'])
- write_json_file(json_info_dict, encodeFilename(infofn))
- except (OSError, IOError):
- self.report_error(u'Cannot write metadata to JSON file ' + infofn)
- return
+ if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(infofn)):
+ self.to_screen(u'[info] Video description metadata is already present')
+ else:
+ self.to_screen(u'[info] Writing video description metadata as JSON to: ' + infofn)
+ try:
+ json_info_dict = dict((k, v) for k, v in info_dict.items() if not k in ['urlhandle'])
+ write_json_file(json_info_dict, encodeFilename(infofn))
+ except (OSError, IOError):
+ self.report_error(u'Cannot write metadata to JSON file ' + infofn)
+ return
if self.params.get('writethumbnail', False):
if info_dict.get('thumbnail') is not None:
thumb_format = determine_ext(info_dict['thumbnail'], u'jpg')
- thumb_filename = filename.rpartition('.')[0] + u'.' + thumb_format
- self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
- (info_dict['extractor'], info_dict['id']))
- try:
- uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
- with open(thumb_filename, 'wb') as thumbf:
- shutil.copyfileobj(uf, thumbf)
- self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
- (info_dict['extractor'], info_dict['id'], thumb_filename))
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- self.report_warning(u'Unable to download thumbnail "%s": %s' %
- (info_dict['thumbnail'], compat_str(err)))
+ thumb_filename = os.path.splitext(filename)[0] + u'.' + thumb_format
+ if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)):
+ self.to_screen(u'[%s] %s: Thumbnail is already present' %
+ (info_dict['extractor'], info_dict['id']))
+ else:
+ self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
+ (info_dict['extractor'], info_dict['id']))
+ try:
+ uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
+ with open(thumb_filename, 'wb') as thumbf:
+ shutil.copyfileobj(uf, thumbf)
+ self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
+ (info_dict['extractor'], info_dict['id'], thumb_filename))
+ except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
+ self.report_warning(u'Unable to download thumbnail "%s": %s' %
+ (info_dict['thumbnail'], compat_str(err)))
if not self.params.get('skip_download', False):
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
return self._download_retcode
+ def download_with_info_file(self, info_filename):
+ with io.open(info_filename, 'r', encoding='utf-8') as f:
+ info = json.load(f)
+ try:
+ self.process_ie_result(info, download=True)
+ except DownloadError:
+ webpage_url = info.get('webpage_url')
+ if webpage_url is not None:
+ self.report_warning(u'The info failed to download, trying with "%s"' % webpage_url)
+ return self.download([webpage_url])
+ else:
+ raise
+ return self._download_retcode
+
def post_process(self, filename, ie_info):
"""Run all the postprocessors on the given file."""
info = dict(ie_info)
'Anton Larionov',
'Takuya Tsuchida',
'Sergey M.',
+ 'Michael Orlitzky',
)
__license__ = 'Public Domain'
import random
import re
import shlex
-import subprocess
import sys
compat_print,
DateRange,
decodeOption,
- determine_ext,
+ get_term_width,
DownloadError,
get_cachedir,
MaxDownloadsReached,
preferredencoding,
SameFileError,
+ setproctitle,
std_headers,
write_string,
)
def _comma_separated_values_options_callback(option, opt_str, value, parser):
setattr(parser.values, option.dest, value.split(','))
- def _find_term_columns():
- columns = os.environ.get('COLUMNS', None)
- if columns:
- return int(columns)
-
- try:
- sp = subprocess.Popen(['stty', 'size'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
- out,err = sp.communicate()
- return int(out.split()[1])
- except:
- pass
- return None
-
def _hide_login_info(opts):
opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
max_help_position = 80
# No need to wrap help messages if we're on a wide console
- columns = _find_term_columns()
+ columns = get_term_width()
if columns: max_width = columns
fmt = optparse.IndentedHelpFormatter(width=max_width, max_help_position=max_help_position)
general.add_option(
'--socket-timeout', dest='socket_timeout',
type=float, default=None, help=optparse.SUPPRESS_HELP)
-
-
- selection.add_option('--playlist-start',
- dest='playliststart', metavar='NUMBER', help='playlist video to start at (default is %default)', default=1)
- selection.add_option('--playlist-end',
- dest='playlistend', metavar='NUMBER', help='playlist video to end at (default is last)', default=-1)
+ general.add_option(
+ '--bidi-workaround', dest='bidi_workaround', action='store_true',
+ help=u'Work around terminals that lack bidirectional text support. Requires fribidi executable in PATH')
+
+
+ selection.add_option(
+ '--playlist-start',
+ dest='playliststart', metavar='NUMBER', default=1, type=int,
+ help='playlist video to start at (default is %default)')
+ selection.add_option(
+ '--playlist-end',
+ dest='playlistend', metavar='NUMBER', default=None, type=int,
+ help='playlist video to end at (default is last)')
selection.add_option('--match-title', dest='matchtitle', metavar='REGEX',help='download only matching titles (regex or caseless sub-string)')
selection.add_option('--reject-title', dest='rejecttitle', metavar='REGEX',help='skip download for matching titles (regex or caseless sub-string)')
selection.add_option('--max-downloads', metavar='NUMBER',
selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None)
selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None)
selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None)
+ selection.add_option(
+ '--min-views', metavar='COUNT', dest='min_views',
+ default=None, type=int,
+ help="Do not download any videos with less than COUNT views",)
+ selection.add_option(
+ '--max-views', metavar='COUNT', dest='max_views',
+ default=None, type=int,
+ help="Do not download any videos with more than COUNT views",)
selection.add_option('--no-playlist', action='store_true', dest='noplaylist', help='download only the currently playing video', default=False)
selection.add_option('--age-limit', metavar='YEARS', dest='age_limit',
help='download only videos suitable for the given age',
verbosity.add_option('--get-description',
action='store_true', dest='getdescription',
help='simulate, quiet but print video description', default=False)
+ verbosity.add_option('--get-duration',
+ action='store_true', dest='getduration',
+ help='simulate, quiet but print video length', default=False)
verbosity.add_option('--get-filename',
action='store_true', dest='getfilename',
help='simulate, quiet but print output filename', default=False)
help='Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames', default=False)
filesystem.add_option('-a', '--batch-file',
dest='batchfile', metavar='FILE', help='file containing URLs to download (\'-\' for stdin)')
+ filesystem.add_option('--load-info',
+ dest='load_info_filename', metavar='FILE',
+ help='json file containing the video information (created with the "--write-json" option')
filesystem.add_option('-w', '--no-overwrites',
action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
filesystem.add_option('-c', '--continue',
return parser, opts, args
+
def _real_main(argv=None):
# Compatibility fixes for Windows
if sys.platform == 'win32':
# https://github.com/rg3/youtube-dl/issues/820
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
+ setproctitle(u'youtube-dl')
+
parser, opts, args = parseOpts(argv)
# Set user agent
for ie in sorted(extractors, key=lambda ie: ie.IE_NAME.lower()):
compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
matchedUrls = [url for url in all_urls if ie.suitable(url)]
- all_urls = [url for url in all_urls if url not in matchedUrls]
for mu in matchedUrls:
compat_print(u' ' + mu)
sys.exit(0)
if numeric_buffersize is None:
parser.error(u'invalid buffer size specified')
opts.buffersize = numeric_buffersize
- try:
- opts.playliststart = int(opts.playliststart)
- if opts.playliststart <= 0:
- raise ValueError(u'Playlist start must be positive')
- except (TypeError, ValueError):
- parser.error(u'invalid playlist start number specified')
- try:
- opts.playlistend = int(opts.playlistend)
- if opts.playlistend != -1 and (opts.playlistend <= 0 or opts.playlistend < opts.playliststart):
- raise ValueError(u'Playlist end must be greater than playlist start')
- except (TypeError, ValueError):
- parser.error(u'invalid playlist end number specified')
+ if opts.playliststart <= 0:
+ raise ValueError(u'Playlist start must be positive')
+ if opts.playlistend not in (-1, None) and opts.playlistend < opts.playliststart:
+ raise ValueError(u'Playlist end must be greater than playlist start')
if opts.extractaudio:
if opts.audioformat not in ['best', 'aac', 'mp3', 'm4a', 'opus', 'vorbis', 'wav']:
parser.error(u'invalid audio format specified')
or (opts.useid and u'%(id)s.%(ext)s')
or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
or u'%(title)s-%(id)s.%(ext)s')
- if '%(ext)s' not in outtmpl and opts.extractaudio:
+ if not os.path.splitext(outtmpl)[1] and opts.extractaudio:
parser.error(u'Cannot download a video and extract audio into the same'
- u' file! Use "%%(ext)s" instead of %r' %
- determine_ext(outtmpl, u''))
+ u' file! Use "{0}.%(ext)s" instead of "{0}" as the output'
+ u' template'.format(outtmpl))
+
+ any_printing = opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson
ydl_opts = {
'usenetrc': opts.usenetrc,
'username': opts.username,
'password': opts.password,
'videopassword': opts.videopassword,
- 'quiet': (opts.quiet or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.dumpjson),
+ 'quiet': (opts.quiet or any_printing),
'forceurl': opts.geturl,
'forcetitle': opts.gettitle,
'forceid': opts.getid,
'forcethumbnail': opts.getthumbnail,
'forcedescription': opts.getdescription,
+ 'forceduration': opts.getduration,
'forcefilename': opts.getfilename,
'forceformat': opts.getformat,
'forcejson': opts.dumpjson,
'simulate': opts.simulate,
- 'skip_download': (opts.skip_download or opts.simulate or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.dumpjson),
+ 'skip_download': (opts.skip_download or opts.simulate or any_printing),
'format': opts.format,
'format_limit': opts.format_limit,
'listformats': opts.listformats,
'keepvideo': opts.keepvideo,
'min_filesize': opts.min_filesize,
'max_filesize': opts.max_filesize,
+ 'min_views': opts.min_views,
+ 'max_views': opts.max_views,
'daterange': date,
'cachedir': opts.cachedir,
'youtube_print_sig_code': opts.youtube_print_sig_code,
'nocheckcertificate': opts.no_check_certificate,
'proxy': opts.proxy,
'socket_timeout': opts.socket_timeout,
+ 'bidi_workaround': opts.bidi_workaround,
}
with YoutubeDL(ydl_opts) as ydl:
update_self(ydl.to_screen, opts.verbose)
# Maybe do nothing
- if len(all_urls) < 1:
+ if (len(all_urls) < 1) and (opts.load_info_filename is None):
if not opts.update_self:
parser.error(u'you must provide at least one URL')
else:
sys.exit()
try:
- retcode = ydl.download(all_urls)
+ if opts.load_info_filename is not None:
+ retcode = ydl.download_with_info_file(opts.load_info_filename)
+ else:
+ retcode = ydl.download(all_urls)
except MaxDownloadsReached:
ydl.to_screen(u'--max-download limit reached, aborting.')
retcode = 101
-__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_decrypt_text']
+__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_cbc_decrypt', 'aes_decrypt_text']
import base64
from math import ceil
return decrypted_data
+def aes_cbc_decrypt(data, key, iv):
+ """
+ Decrypt with aes in CBC mode
+
+ @param {int[]} data cipher
+ @param {int[]} key 16/24/32-Byte cipher key
+ @param {int[]} iv 16-Byte IV
+ @returns {int[]} decrypted data
+ """
+ expanded_key = key_expansion(key)
+ block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
+
+ decrypted_data=[]
+ previous_cipher_block = iv
+ for i in range(block_count):
+ block = data[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES]
+ block += [0]*(BLOCK_SIZE_BYTES - len(block))
+
+ decrypted_block = aes_decrypt(block, expanded_key)
+ decrypted_data += xor(decrypted_block, previous_cipher_block)
+ previous_cipher_block = block
+ decrypted_data = decrypted_data[:len(data)]
+
+ return decrypted_data
+
def key_expansion(data):
"""
Generate key schedule
@returns {int[]} 16-Byte cipher
"""
rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1
-
+
data = xor(data, expanded_key[:BLOCK_SIZE_BYTES])
for i in range(1, rounds+1):
data = sub_bytes(data)
if i != rounds:
data = mix_columns(data)
data = xor(data, expanded_key[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES])
+
+ return data
+
+def aes_decrypt(data, expanded_key):
+ """
+ Decrypt one block with aes
+
+ @param {int[]} data 16-Byte cipher
+ @param {int[]} expanded_key 176/208/240-Byte expanded key
+ @returns {int[]} 16-Byte state
+ """
+ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1
+
+ for i in range(rounds, 0, -1):
+ data = xor(data, expanded_key[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES])
+ if i != rounds:
+ data = mix_columns_inv(data)
+ data = shift_rows_inv(data)
+ data = sub_bytes_inv(data)
+ data = xor(data, expanded_key[:BLOCK_SIZE_BYTES])
return data
0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16)
-MIX_COLUMN_MATRIX = ((2,3,1,1),
- (1,2,3,1),
- (1,1,2,3),
- (3,1,1,2))
+SBOX_INV = (0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38, 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb,
+ 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87, 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb,
+ 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d, 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e,
+ 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2, 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25,
+ 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92,
+ 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda, 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84,
+ 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a, 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06,
+ 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02, 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b,
+ 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea, 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73,
+ 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85, 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e,
+ 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89, 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b,
+ 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20, 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4,
+ 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31, 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f,
+ 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d, 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef,
+ 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0, 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61,
+ 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26, 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d)
+MIX_COLUMN_MATRIX = ((0x2,0x3,0x1,0x1),
+ (0x1,0x2,0x3,0x1),
+ (0x1,0x1,0x2,0x3),
+ (0x3,0x1,0x1,0x2))
+MIX_COLUMN_MATRIX_INV = ((0xE,0xB,0xD,0x9),
+ (0x9,0xE,0xB,0xD),
+ (0xD,0x9,0xE,0xB),
+ (0xB,0xD,0x9,0xE))
+RIJNDAEL_EXP_TABLE = (0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF, 0x1A, 0x2E, 0x72, 0x96, 0xA1, 0xF8, 0x13, 0x35,
+ 0x5F, 0xE1, 0x38, 0x48, 0xD8, 0x73, 0x95, 0xA4, 0xF7, 0x02, 0x06, 0x0A, 0x1E, 0x22, 0x66, 0xAA,
+ 0xE5, 0x34, 0x5C, 0xE4, 0x37, 0x59, 0xEB, 0x26, 0x6A, 0xBE, 0xD9, 0x70, 0x90, 0xAB, 0xE6, 0x31,
+ 0x53, 0xF5, 0x04, 0x0C, 0x14, 0x3C, 0x44, 0xCC, 0x4F, 0xD1, 0x68, 0xB8, 0xD3, 0x6E, 0xB2, 0xCD,
+ 0x4C, 0xD4, 0x67, 0xA9, 0xE0, 0x3B, 0x4D, 0xD7, 0x62, 0xA6, 0xF1, 0x08, 0x18, 0x28, 0x78, 0x88,
+ 0x83, 0x9E, 0xB9, 0xD0, 0x6B, 0xBD, 0xDC, 0x7F, 0x81, 0x98, 0xB3, 0xCE, 0x49, 0xDB, 0x76, 0x9A,
+ 0xB5, 0xC4, 0x57, 0xF9, 0x10, 0x30, 0x50, 0xF0, 0x0B, 0x1D, 0x27, 0x69, 0xBB, 0xD6, 0x61, 0xA3,
+ 0xFE, 0x19, 0x2B, 0x7D, 0x87, 0x92, 0xAD, 0xEC, 0x2F, 0x71, 0x93, 0xAE, 0xE9, 0x20, 0x60, 0xA0,
+ 0xFB, 0x16, 0x3A, 0x4E, 0xD2, 0x6D, 0xB7, 0xC2, 0x5D, 0xE7, 0x32, 0x56, 0xFA, 0x15, 0x3F, 0x41,
+ 0xC3, 0x5E, 0xE2, 0x3D, 0x47, 0xC9, 0x40, 0xC0, 0x5B, 0xED, 0x2C, 0x74, 0x9C, 0xBF, 0xDA, 0x75,
+ 0x9F, 0xBA, 0xD5, 0x64, 0xAC, 0xEF, 0x2A, 0x7E, 0x82, 0x9D, 0xBC, 0xDF, 0x7A, 0x8E, 0x89, 0x80,
+ 0x9B, 0xB6, 0xC1, 0x58, 0xE8, 0x23, 0x65, 0xAF, 0xEA, 0x25, 0x6F, 0xB1, 0xC8, 0x43, 0xC5, 0x54,
+ 0xFC, 0x1F, 0x21, 0x63, 0xA5, 0xF4, 0x07, 0x09, 0x1B, 0x2D, 0x77, 0x99, 0xB0, 0xCB, 0x46, 0xCA,
+ 0x45, 0xCF, 0x4A, 0xDE, 0x79, 0x8B, 0x86, 0x91, 0xA8, 0xE3, 0x3E, 0x42, 0xC6, 0x51, 0xF3, 0x0E,
+ 0x12, 0x36, 0x5A, 0xEE, 0x29, 0x7B, 0x8D, 0x8C, 0x8F, 0x8A, 0x85, 0x94, 0xA7, 0xF2, 0x0D, 0x17,
+ 0x39, 0x4B, 0xDD, 0x7C, 0x84, 0x97, 0xA2, 0xFD, 0x1C, 0x24, 0x6C, 0xB4, 0xC7, 0x52, 0xF6, 0x01)
+RIJNDAEL_LOG_TABLE = (0x00, 0x00, 0x19, 0x01, 0x32, 0x02, 0x1a, 0xc6, 0x4b, 0xc7, 0x1b, 0x68, 0x33, 0xee, 0xdf, 0x03,
+ 0x64, 0x04, 0xe0, 0x0e, 0x34, 0x8d, 0x81, 0xef, 0x4c, 0x71, 0x08, 0xc8, 0xf8, 0x69, 0x1c, 0xc1,
+ 0x7d, 0xc2, 0x1d, 0xb5, 0xf9, 0xb9, 0x27, 0x6a, 0x4d, 0xe4, 0xa6, 0x72, 0x9a, 0xc9, 0x09, 0x78,
+ 0x65, 0x2f, 0x8a, 0x05, 0x21, 0x0f, 0xe1, 0x24, 0x12, 0xf0, 0x82, 0x45, 0x35, 0x93, 0xda, 0x8e,
+ 0x96, 0x8f, 0xdb, 0xbd, 0x36, 0xd0, 0xce, 0x94, 0x13, 0x5c, 0xd2, 0xf1, 0x40, 0x46, 0x83, 0x38,
+ 0x66, 0xdd, 0xfd, 0x30, 0xbf, 0x06, 0x8b, 0x62, 0xb3, 0x25, 0xe2, 0x98, 0x22, 0x88, 0x91, 0x10,
+ 0x7e, 0x6e, 0x48, 0xc3, 0xa3, 0xb6, 0x1e, 0x42, 0x3a, 0x6b, 0x28, 0x54, 0xfa, 0x85, 0x3d, 0xba,
+ 0x2b, 0x79, 0x0a, 0x15, 0x9b, 0x9f, 0x5e, 0xca, 0x4e, 0xd4, 0xac, 0xe5, 0xf3, 0x73, 0xa7, 0x57,
+ 0xaf, 0x58, 0xa8, 0x50, 0xf4, 0xea, 0xd6, 0x74, 0x4f, 0xae, 0xe9, 0xd5, 0xe7, 0xe6, 0xad, 0xe8,
+ 0x2c, 0xd7, 0x75, 0x7a, 0xeb, 0x16, 0x0b, 0xf5, 0x59, 0xcb, 0x5f, 0xb0, 0x9c, 0xa9, 0x51, 0xa0,
+ 0x7f, 0x0c, 0xf6, 0x6f, 0x17, 0xc4, 0x49, 0xec, 0xd8, 0x43, 0x1f, 0x2d, 0xa4, 0x76, 0x7b, 0xb7,
+ 0xcc, 0xbb, 0x3e, 0x5a, 0xfb, 0x60, 0xb1, 0x86, 0x3b, 0x52, 0xa1, 0x6c, 0xaa, 0x55, 0x29, 0x9d,
+ 0x97, 0xb2, 0x87, 0x90, 0x61, 0xbe, 0xdc, 0xfc, 0xbc, 0x95, 0xcf, 0xcd, 0x37, 0x3f, 0x5b, 0xd1,
+ 0x53, 0x39, 0x84, 0x3c, 0x41, 0xa2, 0x6d, 0x47, 0x14, 0x2a, 0x9e, 0x5d, 0x56, 0xf2, 0xd3, 0xab,
+ 0x44, 0x11, 0x92, 0xd9, 0x23, 0x20, 0x2e, 0x89, 0xb4, 0x7c, 0xb8, 0x26, 0x77, 0x99, 0xe3, 0xa5,
+ 0x67, 0x4a, 0xed, 0xde, 0xc5, 0x31, 0xfe, 0x18, 0x0d, 0x63, 0x8c, 0x80, 0xc0, 0xf7, 0x70, 0x07)
def sub_bytes(data):
return [SBOX[x] for x in data]
+def sub_bytes_inv(data):
+ return [SBOX_INV[x] for x in data]
+
def rotate(data):
return data[1:] + [data[0]]
def xor(data1, data2):
return [x^y for x, y in zip(data1, data2)]
-def mix_column(data):
+def rijndael_mul(a, b):
+ if(a==0 or b==0):
+ return 0
+ return RIJNDAEL_EXP_TABLE[(RIJNDAEL_LOG_TABLE[a] + RIJNDAEL_LOG_TABLE[b]) % 0xFF]
+
+def mix_column(data, matrix):
data_mixed = []
for row in range(4):
mixed = 0
for column in range(4):
- addend = data[column]
- if MIX_COLUMN_MATRIX[row][column] in (2,3):
- addend <<= 1
- if addend > 0xff:
- addend &= 0xff
- addend ^= 0x1b
- if MIX_COLUMN_MATRIX[row][column] == 3:
- addend ^= data[column]
- mixed ^= addend & 0xff
+ # xor is (+) and (-)
+ mixed ^= rijndael_mul(data[column], matrix[row][column])
data_mixed.append(mixed)
return data_mixed
-def mix_columns(data):
+def mix_columns(data, matrix=MIX_COLUMN_MATRIX):
data_mixed = []
for i in range(4):
column = data[i*4 : (i+1)*4]
- data_mixed += mix_column(column)
+ data_mixed += mix_column(column, matrix)
return data_mixed
+def mix_columns_inv(data):
+ return mix_columns(data, MIX_COLUMN_MATRIX_INV)
+
def shift_rows(data):
data_shifted = []
for column in range(4):
data_shifted.append( data[((column + row) & 0b11) * 4 + row] )
return data_shifted
+def shift_rows_inv(data):
+ data_shifted = []
+ for column in range(4):
+ for row in range(4):
+ data_shifted.append( data[((column - row) & 0b11) * 4 + row] )
+ return data_shifted
+
def inc(data):
data = data[:] # copy
for i in range(len(data)-1,-1,-1):
-from .appletrailers import AppleTrailersIE
+from .academicearth import AcademicEarthCourseIE
from .addanime import AddAnimeIE
from .anitube import AnitubeIE
+from .aparat import AparatIE
+from .appletrailers import AppleTrailersIE
from .archiveorg import ArchiveOrgIE
from .ard import ARDIE
from .arte import (
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVFutureIE,
+ ArteTVDDCIE,
)
from .auengine import AUEngineIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
+from .blinkx import BlinkxIE
from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE
from .breakcom import BreakIE
from .c56 import C56IE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
+from .cbs import CBSIE
+from .channel9 import Channel9IE
from .cinemassacre import CinemassacreIE
from .clipfish import ClipfishIE
from .clipsyndicate import ClipsyndicateIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .condenast import CondeNastIE
from .criterion import CriterionIE
+from .crunchyroll import CrunchyrollIE
from .cspan import CSpanIE
from .d8 import D8IE
from .dailymotion import (
from .francetv import (
PluzzIE,
FranceTvInfoIE,
- France2IE,
+ FranceTVIE,
GenerationQuoiIE
)
from .freesound import FreesoundIE
from .infoq import InfoQIE
from .instagram import InstagramIE
from .internetvideoarchive import InternetVideoArchiveIE
+from .ivi import (
+ IviIE,
+ IviCompilationIE
+)
from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE
from .justintv import JustinTVIE
from .keek import KeekIE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE, LivestreamOriginalIE
+from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mit import TechTVMITIE, MITIE
from .naver import NaverIE
from .nba import NBAIE
from .nbc import NBCNewsIE
+from .ndtv import NDTVIE
from .newgrounds import NewgroundsIE
from .nhl import NHLIE, NHLVideocenterIE
from .niconico import NiconicoIE
+from .ninegag import NineGagIE
from .nowvideo import NowVideoIE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .pbs import PBSIE
from .photobucket import PhotobucketIE
from .podomatic import PodomaticIE
+from .pornhd import PornHdIE
from .pornhub import PornHubIE
from .pornotube import PornotubeIE
+from .pyvideo import PyvideoIE
+from .radiofrance import RadioFranceIE
from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .ringtv import RingTVIE
SmotriIE,
SmotriCommunityIE,
SmotriUserIE,
+ SmotriBroadcastIE,
)
from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tf1 import TF1IE
+from .theplatform import ThePlatformIE
from .thisav import ThisAVIE
from .toutv import TouTvIE
from .traileraddict import TrailerAddictIE
VimeoIE,
VimeoChannelIE,
VimeoUserIE,
+ VimeoAlbumIE,
+ VimeoGroupsIE,
)
from .vine import VineIE
from .viki import VikiIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE
from .wimp import WimpIE
+from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .xhamster import XHamsterIE
from .xnxx import XNXXIE
YoutubeWatchLaterIE,
YoutubeFavouritesIE,
YoutubeHistoryIE,
+ YoutubeTopListIE,
)
from .zdf import ZDFIE
--- /dev/null
+import re
+
+from .common import InfoExtractor
+
+
+class AcademicEarthCourseIE(InfoExtractor):
+ _VALID_URL = r'^https?://(?:www\.)?academicearth\.org/(?:courses|playlists)/(?P<id>[^?#/]+)'
+ IE_NAME = u'AcademicEarth:Course'
+
+ def _real_extract(self, url):
+ m = re.match(self._VALID_URL, url)
+ playlist_id = m.group('id')
+
+ webpage = self._download_webpage(url, playlist_id)
+ title = self._html_search_regex(
+ r'<h1 class="playlist-name">(.*?)</h1>', webpage, u'title')
+ description = self._html_search_regex(
+ r'<p class="excerpt">(.*?)</p>',
+ webpage, u'description', fatal=False)
+ urls = re.findall(
+ r'<h3 class="lecture-title"><a target="_blank" href="([^"]+)">',
+ webpage)
+ entries = [self.url_result(u) for u in urls]
+
+ return {
+ '_type': 'playlist',
+ 'id': playlist_id,
+ 'title': title,
+ 'description': description,
+ 'entries': entries,
+ }
class AddAnimeIE(InfoExtractor):
- _VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
+ _VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video\.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'AddAnime'
_TEST = {
u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
--- /dev/null
+#coding: utf-8
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+ HEADRequest,
+)
+
+
+class AparatIE(InfoExtractor):
+ _VALID_URL = r'^https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
+
+ _TEST = {
+ u'url': u'http://www.aparat.com/v/wP8On',
+ u'file': u'wP8On.mp4',
+ u'md5': u'6714e0af7e0d875c5a39c4dc4ab46ad1',
+ u'info_dict': {
+ u"title": u"تیم گلکسی 11 - زومیت",
+ },
+ #u'skip': u'Extremely unreliable',
+ }
+
+ def _real_extract(self, url):
+ m = re.match(self._VALID_URL, url)
+ video_id = m.group('id')
+
+ # Note: There is an easier-to-parse configuration at
+ # http://www.aparat.com/video/video/config/videohash/%video_id
+ # but the URL in there does not work
+ embed_url = (u'http://www.aparat.com/video/video/embed/videohash/' +
+ video_id + u'/vt/frame')
+ webpage = self._download_webpage(embed_url, video_id)
+
+ video_urls = re.findall(r'fileList\[[0-9]+\]\s*=\s*"([^"]+)"', webpage)
+ for i, video_url in enumerate(video_urls):
+ req = HEADRequest(video_url)
+ res = self._request_webpage(
+ req, video_id, note=u'Testing video URL %d' % i, errnote=False)
+ if res:
+ break
+ else:
+ raise ExtractorError(u'No working video URLs found')
+
+ title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, u'title')
+ thumbnail = self._search_regex(
+ r'\s+image:\s*"([^"]+)"', webpage, u'thumbnail', fatal=False)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'url': video_url,
+ 'ext': 'mp4',
+ 'thumbnail': thumbnail,
+ }
import re
-import xml.etree.ElementTree
import json
from .common import InfoExtractor
class AppleTrailersIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?trailers.apple.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
+ _VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_TEST = {
u"url": u"http://trailers.apple.com/trailers/wb/manofsteel/",
u"playlist": [
uploader_id = mobj.group('company')
playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
- playlist_snippet = self._download_webpage(playlist_url, movie)
- playlist_cleaned = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', playlist_snippet)
- playlist_cleaned = re.sub(r'<img ([^<]*?)>', r'<img \1/>', playlist_cleaned)
- # The ' in the onClick attributes are not escaped, it couldn't be parsed
- # with xml.etree.ElementTree.fromstring
- # like: http://trailers.apple.com/trailers/wb/gravity/
- def _clean_json(m):
- return u'iTunes.playURL(%s);' % m.group(1).replace('\'', ''')
- playlist_cleaned = re.sub(self._JSON_RE, _clean_json, playlist_cleaned)
- playlist_html = u'<html>' + playlist_cleaned + u'</html>'
+ def fix_html(s):
+ s = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', s)
+ s = re.sub(r'<img ([^<]*?)>', r'<img \1/>', s)
+ # The ' in the onClick attributes are not escaped, it couldn't be parsed
+ # like: http://trailers.apple.com/trailers/wb/gravity/
+ def _clean_json(m):
+ return u'iTunes.playURL(%s);' % m.group(1).replace('\'', ''')
+ s = re.sub(self._JSON_RE, _clean_json, s)
+ s = u'<html>' + s + u'</html>'
+ return s
+ doc = self._download_xml(playlist_url, movie, transform_source=fix_html)
- doc = xml.etree.ElementTree.fromstring(playlist_html)
playlist = []
for li in doc.findall('./div/ul/li'):
on_click = li.find('.//a').attrib['onClick']
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
- _VALID_URL = r'(?:https?://)?(?:www\.)?archive.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
+ _VALID_URL = r'(?:https?://)?(?:www\.)?archive\.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
_TEST = {
u"url": u"http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect",
u'file': u'XD300-23_68HighlightsAResearchCntAugHumanIntellect.ogv',
determine_ext,
get_element_by_id,
compat_str,
+ get_element_by_attribute,
)
# There are different sources of video in arte.tv, the extraction process
# add tests.
class ArteTvIE(InfoExtractor):
- _VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html'
- _LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
+ _VIDEOS_URL = r'(?:http://)?videos\.arte\.tv/(?P<lang>fr|de)/.*-(?P<id>.*?)\.html'
+ _LIVEWEB_URL = r'(?:http://)?liveweb\.arte\.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
_LIVE_URL = r'index-[0-9]+\.html$'
IE_NAME = u'arte.tv'
def _extract_from_webpage(self, webpage, video_id, lang):
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
+ return self._extract_from_json_url(json_url, video_id, lang)
+ def _extract_from_json_url(self, json_url, video_id, lang):
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
webpage = self._download_webpage(url, anchor_id)
row = get_element_by_id(anchor_id, webpage)
return self._extract_from_webpage(row, anchor_id, lang)
+
+
+class ArteTVDDCIE(ArteTVPlus7IE):
+ IE_NAME = u'arte.tv:ddc'
+ _VALID_URL = r'http?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
+
+ def _real_extract(self, url):
+ video_id, lang = self._extract_url_info(url)
+ if lang == 'folge':
+ lang = 'de'
+ elif lang == 'emission':
+ lang = 'fr'
+ webpage = self._download_webpage(url, video_id)
+ scriptElement = get_element_by_attribute('class', 'visu_video_block', webpage)
+ script_url = self._html_search_regex(r'src="(.*?)"', scriptElement, 'script url')
+ javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
+ json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
+ return self._extract_from_json_url(json_url, video_id, lang)
u"title": u"[Commie]The Legend of the Legendary Heroes - 03 - Replication Eye (Alpha Stigma)[F9410F5A]"
}
}
- _VALID_URL = r'(?:http://)?(?:www\.)?auengine\.com/embed.php\?.*?file=([^&]+).*?'
+ _VALID_URL = r'(?:http://)?(?:www\.)?auengine\.com/embed\.php\?.*?file=([^&]+).*?'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
class BambuserChannelIE(InfoExtractor):
IE_NAME = u'bambuser:channel'
- _VALID_URL = r'http://bambuser.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
+ _VALID_URL = r'https?://bambuser\.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
# The maximum number we can get with each request
_STEP = 50
--- /dev/null
+import datetime
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ remove_start,
+)
+
+
+class BlinkxIE(InfoExtractor):
+ _VALID_URL = r'^(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
+ _IE_NAME = u'blinkx'
+
+ _TEST = {
+ u'url': u'http://www.blinkx.com/ce/8aQUy7GVFYgFzpKhT0oqsilwOGFRVXk3R1ZGWWdGenBLaFQwb3FzaWx3OGFRVXk3R1ZGWWdGenB',
+ u'file': u'8aQUy7GV.mp4',
+ u'md5': u'2e9a07364af40163a908edbf10bb2492',
+ u'info_dict': {
+ u"title": u"Police Car Rolls Away",
+ u"uploader": u"stupidvideos.com",
+ u"upload_date": u"20131215",
+ u"description": u"A police car gently rolls away from a fight. Maybe it felt weird being around a confrontation and just had to get out of there!",
+ u"duration": 14.886,
+ u"thumbnails": [{
+ "width": 100,
+ "height": 76,
+ "url": "http://cdn.blinkx.com/stream/b/41/StupidVideos/20131215/1873969261/1873969261_tn_0.jpg",
+ }],
+ },
+ }
+
+ def _real_extract(self, url):
+ m = re.match(self._VALID_URL, url)
+ video_id = m.group('id')
+ display_id = video_id[:8]
+
+ api_url = (u'https://apib4.blinkx.com/api.php?action=play_video&' +
+ u'video=%s' % video_id)
+ data_json = self._download_webpage(api_url, display_id)
+ data = json.loads(data_json)['api']['results'][0]
+ dt = datetime.datetime.fromtimestamp(data['pubdate_epoch'])
+ upload_date = dt.strftime('%Y%m%d')
+
+ duration = None
+ thumbnails = []
+ formats = []
+ for m in data['media']:
+ if m['type'] == 'jpg':
+ thumbnails.append({
+ 'url': m['link'],
+ 'width': int(m['w']),
+ 'height': int(m['h']),
+ })
+ elif m['type'] == 'original':
+ duration = m['d']
+ elif m['type'] == 'youtube':
+ yt_id = m['link']
+ self.to_screen(u'Youtube video detected: %s' % yt_id)
+ return self.url_result(yt_id, 'Youtube', video_id=yt_id)
+ elif m['type'] in ('flv', 'mp4'):
+ vcodec = remove_start(m['vcodec'], 'ff')
+ acodec = remove_start(m['acodec'], 'ff')
+ format_id = (u'%s-%sk-%s' %
+ (vcodec,
+ (int(m['vbr']) + int(m['abr'])) // 1000,
+ m['w']))
+ formats.append({
+ 'format_id': format_id,
+ 'url': m['link'],
+ 'vcodec': vcodec,
+ 'acodec': acodec,
+ 'abr': int(m['abr']) // 1000,
+ 'vbr': int(m['vbr']) // 1000,
+ 'width': int(m['w']),
+ 'height': int(m['h']),
+ })
+ formats.sort(key=lambda f: (f['width'], f['vbr'], f['abr']))
+
+ return {
+ 'id': display_id,
+ 'fullid': video_id,
+ 'title': data['title'],
+ 'formats': formats,
+ 'uploader': data['channel_name'],
+ 'upload_date': upload_date,
+ 'description': data.get('description'),
+ 'thumbnails': thumbnails,
+ 'duration': duration,
+ }
url = 'http://blip.tv/play/g_%s' % api_mobj.group('video_id')
urlp = compat_urllib_parse_urlparse(url)
if urlp.path.startswith('/play/'):
- request = compat_urllib_request.Request(url)
- response = compat_urllib_request.urlopen(request)
+ response = self._request_webpage(url, None, False)
redirecturl = response.geturl()
rurlp = compat_urllib_parse_urlparse(redirecturl)
file_id = compat_parse_qs(rurlp.fragment)['file'][0].rpartition('/')[2]
request.add_header('User-Agent', 'iTunes/10.6.1')
self.report_extraction(mobj.group(1))
info = None
- try:
- urlh = compat_urllib_request.urlopen(request)
- if urlh.headers.get('Content-Type', '').startswith('video/'): # Direct download
- basename = url.split('/')[-1]
- title,ext = os.path.splitext(basename)
- title = title.decode('UTF-8')
- ext = ext.replace('.', '')
- self.report_direct_download(title)
- info = {
- 'id': title,
- 'url': url,
- 'uploader': None,
- 'upload_date': None,
- 'title': title,
- 'ext': ext,
- 'urlhandle': urlh
- }
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- raise ExtractorError(u'ERROR: unable to download video info webpage: %s' % compat_str(err))
+ urlh = self._request_webpage(request, None, False,
+ u'unable to download video info webpage')
+ if urlh.headers.get('Content-Type', '').startswith('video/'): # Direct download
+ basename = url.split('/')[-1]
+ title,ext = os.path.splitext(basename)
+ title = title.decode('UTF-8')
+ ext = ext.replace('.', '')
+ self.report_direct_download(title)
+ info = {
+ 'id': title,
+ 'url': url,
+ 'uploader': None,
+ 'upload_date': None,
+ 'title': title,
+ 'ext': ext,
+ 'urlhandle': urlh
+ }
if info is None: # Regular URL
try:
json_code_bytes = urlh.read()
class BloombergIE(InfoExtractor):
- _VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?).html'
+ _VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?)\.html'
_TEST = {
u'url': u'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
u'file': u'2371591881001.mp4',
- u'md5': u'8eccab865181d29ec2958f32a6a754f5',
+ u'md5': u'5423e113865d26e40624dce2e4b45d95',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'Mashable',
},
},
+ {
+ # test that the default referer works
+ # from http://national.ballet.ca/interact/video/Lost_in_Motion_II/
+ u'url': u'http://link.brightcove.com/services/player/bcpid756015033001?bckey=AQ~~,AAAApYJi_Ck~,GxhXCegT1Dp39ilhXuxMJxasUhVNZiil&bctid=2878862109001',
+ u'info_dict': {
+ u'id': u'2878862109001',
+ u'ext': u'mp4',
+ u'title': u'Lost in Motion II',
+ u'description': u'md5:363109c02998fee92ec02211bd8000df',
+ u'uploader': u'National Ballet of Canada',
+ },
+ },
]
@classmethod
videoPlayer = query.get('@videoPlayer')
if videoPlayer:
- return self._get_video_info(videoPlayer[0], query_str, query)
+ return self._get_video_info(videoPlayer[0], query_str, query,
+ # We set the original url as the default 'Referer' header
+ referer=url)
else:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
- def _get_video_info(self, video_id, query_str, query):
+ def _get_video_info(self, video_id, query_str, query, referer=None):
request_url = self._FEDERATED_URL_TEMPLATE % query_str
req = compat_urllib_request.Request(request_url)
linkBase = query.get('linkBaseURL')
if linkBase is not None:
- req.add_header('Referer', linkBase[0])
+ referer = linkBase[0]
+ if referer is not None:
+ req.add_header('Referer', referer)
webpage = self._download_webpage(req, video_id)
self.report_extraction(video_id)
--- /dev/null
+import re
+
+from .common import InfoExtractor
+
+
+class CBSIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?cbs\.com/shows/[^/]+/video/(?P<id>[^/]+)/.*'
+
+ _TEST = {
+ u'url': u'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
+ u'file': u'4JUVEwq3wUT7.flv',
+ u'info_dict': {
+ u'title': u'Connect Chat feat. Garth Brooks',
+ u'description': u'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
+ u'duration': 1495,
+ },
+ u'params': {
+ # rtmp download
+ u'skip_download': True,
+ },
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ webpage = self._download_webpage(url, video_id)
+ real_id = self._search_regex(
+ r"video\.settings\.pid\s*=\s*'([^']+)';",
+ webpage, u'real video ID')
+ return self.url_result(u'theplatform:%s' % real_id)
--- /dev/null
+# encoding: utf-8
+
+import re
+
+from .common import InfoExtractor
+from ..utils import ExtractorError
+
+class Channel9IE(InfoExtractor):
+ '''
+ Common extractor for channel9.msdn.com.
+
+ The type of provided URL (video or playlist) is determined according to
+ meta Search.PageType from web page HTML rather than URL itself, as it is
+ not always possible to do.
+ '''
+ IE_DESC = u'Channel 9'
+ IE_NAME = u'channel9'
+ _VALID_URL = r'^https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
+
+ _TESTS = [
+ {
+ u'url': u'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
+ u'file': u'Events_TechEd_Australia_2013_KOS002.mp4',
+ u'md5': u'bbd75296ba47916b754e73c3a4bbdf10',
+ u'info_dict': {
+ u'title': u'Developer Kick-Off Session: Stuff We Love',
+ u'description': u'md5:c08d72240b7c87fcecafe2692f80e35f',
+ u'duration': 4576,
+ u'thumbnail': u'http://media.ch9.ms/ch9/9d51/03902f2d-fc97-4d3c-b195-0bfe15a19d51/KOS002_220.jpg',
+ u'session_code': u'KOS002',
+ u'session_day': u'Day 1',
+ u'session_room': u'Arena 1A',
+ u'session_speakers': [ u'Ed Blankenship', u'Andrew Coates', u'Brady Gaster', u'Patrick Klug', u'Mads Kristensen' ],
+ },
+ },
+ {
+ u'url': u'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
+ u'file': u'posts_Self-service-BI-with-Power-BI-nuclear-testing.mp4',
+ u'md5': u'b43ee4529d111bc37ba7ee4f34813e68',
+ u'info_dict': {
+ u'title': u'Self-service BI with Power BI - nuclear testing',
+ u'description': u'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
+ u'duration': 1540,
+ u'thumbnail': u'http://media.ch9.ms/ch9/87e1/0300391f-a455-4c72-bec3-4422f19287e1/selfservicenuk_512.jpg',
+ u'authors': [ u'Mike Wilmot' ],
+ },
+ }
+ ]
+
+ _RSS_URL = 'http://channel9.msdn.com/%s/RSS'
+
+ # Sorted by quality
+ _known_formats = ['MP3', 'MP4', 'Mid Quality WMV', 'Mid Quality MP4', 'High Quality WMV', 'High Quality MP4']
+
+ def _restore_bytes(self, formatted_size):
+ if not formatted_size:
+ return 0
+ m = re.match(r'^(?P<size>\d+(?:\.\d+)?)\s+(?P<units>[a-zA-Z]+)', formatted_size)
+ if not m:
+ return 0
+ units = m.group('units')
+ try:
+ exponent = [u'B', u'KB', u'MB', u'GB', u'TB', u'PB', u'EB', u'ZB', u'YB'].index(units.upper())
+ except ValueError:
+ return 0
+ size = float(m.group('size'))
+ return int(size * (1024 ** exponent))
+
+ def _formats_from_html(self, html):
+ FORMAT_REGEX = r'''
+ (?x)
+ <a\s+href="(?P<url>[^"]+)">(?P<quality>[^<]+)</a>\s*
+ <span\s+class="usage">\((?P<note>[^\)]+)\)</span>\s*
+ (?:<div\s+class="popup\s+rounded">\s*
+ <h3>File\s+size</h3>\s*(?P<filesize>.*?)\s*
+ </div>)? # File size part may be missing
+ '''
+ # Extract known formats
+ formats = [{'url': x.group('url'),
+ 'format_id': x.group('quality'),
+ 'format_note': x.group('note'),
+ 'format': '%s (%s)' % (x.group('quality'), x.group('note')),
+ 'filesize': self._restore_bytes(x.group('filesize')), # File size is approximate
+ } for x in list(re.finditer(FORMAT_REGEX, html)) if x.group('quality') in self._known_formats]
+ # Sort according to known formats list
+ formats.sort(key=lambda fmt: self._known_formats.index(fmt['format_id']))
+ return formats
+
+ def _extract_title(self, html):
+ title = self._html_search_meta(u'title', html, u'title')
+ if title is None:
+ title = self._og_search_title(html)
+ TITLE_SUFFIX = u' (Channel 9)'
+ if title is not None and title.endswith(TITLE_SUFFIX):
+ title = title[:-len(TITLE_SUFFIX)]
+ return title
+
+ def _extract_description(self, html):
+ DESCRIPTION_REGEX = r'''(?sx)
+ <div\s+class="entry-content">\s*
+ <div\s+id="entry-body">\s*
+ (?P<description>.+?)\s*
+ </div>\s*
+ </div>
+ '''
+ m = re.search(DESCRIPTION_REGEX, html)
+ if m is not None:
+ return m.group('description')
+ return self._html_search_meta(u'description', html, u'description')
+
+ def _extract_duration(self, html):
+ m = re.search(r'data-video_duration="(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"', html)
+ return ((int(m.group('hours')) * 60 * 60) + (int(m.group('minutes')) * 60) + int(m.group('seconds'))) if m else None
+
+ def _extract_slides(self, html):
+ m = re.search(r'<a href="(?P<slidesurl>[^"]+)" class="slides">Slides</a>', html)
+ return m.group('slidesurl') if m is not None else None
+
+ def _extract_zip(self, html):
+ m = re.search(r'<a href="(?P<zipurl>[^"]+)" class="zip">Zip</a>', html)
+ return m.group('zipurl') if m is not None else None
+
+ def _extract_avg_rating(self, html):
+ m = re.search(r'<p class="avg-rating">Avg Rating: <span>(?P<avgrating>[^<]+)</span></p>', html)
+ return float(m.group('avgrating')) if m is not None else 0
+
+ def _extract_rating_count(self, html):
+ m = re.search(r'<div class="rating-count">\((?P<ratingcount>[^<]+)\)</div>', html)
+ return int(self._fix_count(m.group('ratingcount'))) if m is not None else 0
+
+ def _extract_view_count(self, html):
+ m = re.search(r'<li class="views">\s*<span class="count">(?P<viewcount>[^<]+)</span> Views\s*</li>', html)
+ return int(self._fix_count(m.group('viewcount'))) if m is not None else 0
+
+ def _extract_comment_count(self, html):
+ m = re.search(r'<li class="comments">\s*<a href="#comments">\s*<span class="count">(?P<commentcount>[^<]+)</span> Comments\s*</a>\s*</li>', html)
+ return int(self._fix_count(m.group('commentcount'))) if m is not None else 0
+
+ def _fix_count(self, count):
+ return int(str(count).replace(',', '')) if count is not None else None
+
+ def _extract_authors(self, html):
+ m = re.search(r'(?s)<li class="author">(.*?)</li>', html)
+ if m is None:
+ return None
+ return re.findall(r'<a href="/Niners/[^"]+">([^<]+)</a>', m.group(1))
+
+ def _extract_session_code(self, html):
+ m = re.search(r'<li class="code">\s*(?P<code>.+?)\s*</li>', html)
+ return m.group('code') if m is not None else None
+
+ def _extract_session_day(self, html):
+ m = re.search(r'<li class="day">\s*<a href="/Events/[^"]+">(?P<day>[^<]+)</a>\s*</li>', html)
+ return m.group('day') if m is not None else None
+
+ def _extract_session_room(self, html):
+ m = re.search(r'<li class="room">\s*(?P<room>.+?)\s*</li>', html)
+ return m.group('room') if m is not None else None
+
+ def _extract_session_speakers(self, html):
+ return re.findall(r'<a href="/Events/Speakers/[^"]+">([^<]+)</a>', html)
+
+ def _extract_content(self, html, content_path):
+ # Look for downloadable content
+ formats = self._formats_from_html(html)
+ slides = self._extract_slides(html)
+ zip_ = self._extract_zip(html)
+
+ # Nothing to download
+ if len(formats) == 0 and slides is None and zip_ is None:
+ self._downloader.report_warning(u'None of recording, slides or zip are available for %s' % content_path)
+ return
+
+ # Extract meta
+ title = self._extract_title(html)
+ description = self._extract_description(html)
+ thumbnail = self._og_search_thumbnail(html)
+ duration = self._extract_duration(html)
+ avg_rating = self._extract_avg_rating(html)
+ rating_count = self._extract_rating_count(html)
+ view_count = self._extract_view_count(html)
+ comment_count = self._extract_comment_count(html)
+
+ common = {'_type': 'video',
+ 'id': content_path,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'duration': duration,
+ 'avg_rating': avg_rating,
+ 'rating_count': rating_count,
+ 'view_count': view_count,
+ 'comment_count': comment_count,
+ }
+
+ result = []
+
+ if slides is not None:
+ d = common.copy()
+ d.update({ 'title': title + '-Slides', 'url': slides })
+ result.append(d)
+
+ if zip_ is not None:
+ d = common.copy()
+ d.update({ 'title': title + '-Zip', 'url': zip_ })
+ result.append(d)
+
+ if len(formats) > 0:
+ d = common.copy()
+ d.update({ 'title': title, 'formats': formats })
+ result.append(d)
+
+ return result
+
+ def _extract_entry_item(self, html, content_path):
+ contents = self._extract_content(html, content_path)
+ if contents is None:
+ return contents
+
+ authors = self._extract_authors(html)
+
+ for content in contents:
+ content['authors'] = authors
+
+ return contents
+
+ def _extract_session(self, html, content_path):
+ contents = self._extract_content(html, content_path)
+ if contents is None:
+ return contents
+
+ session_meta = {'session_code': self._extract_session_code(html),
+ 'session_day': self._extract_session_day(html),
+ 'session_room': self._extract_session_room(html),
+ 'session_speakers': self._extract_session_speakers(html),
+ }
+
+ for content in contents:
+ content.update(session_meta)
+
+ return contents
+
+ def _extract_list(self, content_path):
+ rss = self._download_xml(self._RSS_URL % content_path, content_path, u'Downloading RSS')
+ entries = [self.url_result(session_url.text, 'Channel9')
+ for session_url in rss.findall('./channel/item/link')]
+ title_text = rss.find('./channel/title').text
+ return self.playlist_result(entries, content_path, title_text)
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ content_path = mobj.group('contentpath')
+
+ webpage = self._download_webpage(url, content_path, u'Downloading web page')
+
+ page_type_m = re.search(r'<meta name="Search.PageType" content="(?P<pagetype>[^"]+)"/>', webpage)
+ if page_type_m is None:
+ raise ExtractorError(u'Search.PageType not found, don\'t know how to process this page', expected=True)
+
+ page_type = page_type_m.group('pagetype')
+ if page_type == 'List': # List page, may contain list of 'item'-like objects
+ return self._extract_list(content_path)
+ elif page_type == 'Entry.Item': # Any 'item'-like page, may contain downloadable content
+ return self._extract_entry_item(webpage, content_path)
+ elif page_type == 'Session': # Event session page, may contain downloadable content
+ return self._extract_session(webpage, content_path)
+ else:
+ raise ExtractorError(u'Unexpected Search.PageType %s' % page_type, expected=True)
\ No newline at end of file
import re
-import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
find_xpath_attr,
+ fix_xml_all_ampersand,
)
# it includes a required token
flvars = self._search_regex(r'flvars: "(.*?)"', js_player, u'flvars')
- playlist_page = self._download_webpage(
+ pdoc = self._download_xml(
'http://eplayer.clipsyndicate.com/osmf/playlist?%s' % flvars,
- video_id, u'Downloading video info')
- # Fix broken xml
- playlist_page = re.sub('&', '&', playlist_page)
- pdoc = xml.etree.ElementTree.fromstring(playlist_page.encode('utf-8'))
+ video_id, u'Downloading video info',
+ transform_source=fix_xml_all_ampersand)
track_doc = pdoc.find('trackList/track')
def find_param(name):
class ComedyCentralIE(MTVServicesInfoExtractor):
- _VALID_URL = r'http://www.comedycentral.com/(video-clips|episodes|cc-studios)/(?P<title>.*)'
+ _VALID_URL = r'https?://(?:www.)?comedycentral.com/(video-clips|episodes|cc-studios)/(?P<title>.*)'
_FEED_URL = u'http://comedycentral.com/feeds/mrss/'
_TEST = {
sanitize_filename,
unescapeHTML,
)
+_NO_DEFAULT = object()
class InfoExtractor(object):
The dictionaries must include the following fields:
id: Video identifier.
- url: Final video URL.
title: Video title, unescaped.
- ext: Video filename extension.
- Instead of url and ext, formats can also specified.
+ Additionally, it must contain either a formats entry or url and ext:
+
+ formats: A list of dictionaries for each format available, it must
+ be ordered from worst to best quality. Potential fields:
+ * url Mandatory. The URL of the video file
+ * ext Will be calculated from url if missing
+ * format A human-readable description of the format
+ ("mp4 container with h264/opus").
+ Calculated from the format_id, width, height.
+ and format_note fields if missing.
+ * format_id A short description of the format
+ ("mp4_h264_opus" or "19")
+ * format_note Additional info about the format
+ ("3D" or "DASH video")
+ * width Width of the video, if known
+ * height Height of the video, if known
+ * abr Average audio bitrate in KBit/s
+ * acodec Name of the audio codec in use
+ * vbr Average video bitrate in KBit/s
+ * vcodec Name of the video codec in use
+ * filesize The number of bytes, if known in advance
+ * player_url SWF Player URL (used for rtmpdump).
+ url: Final video URL.
+ ext: Video filename extension.
+ format: The video format, defaults to ext (used for --get-format)
+ player_url: SWF Player URL (used for rtmpdump).
+ urlhandle: [internal] The urlHandle to be used to download the file,
+ like returned by urllib.request.urlopen
The following fields are optional:
- format: The video format, defaults to ext (used for --get-format)
thumbnails: A list of dictionaries (with the entries "resolution" and
"url") for the varying thumbnails
thumbnail: Full URL to a video thumbnail image.
upload_date: Video upload date (YYYYMMDD).
uploader_id: Nickname or id of the video uploader.
location: Physical location of the video.
- player_url: SWF Player URL (used for rtmpdump).
subtitles: The subtitle file contents as a dictionary in the format
{language: subtitles}.
+ duration: Length of the video in seconds, as an integer.
view_count: How many users have watched the video on the platform.
- urlhandle: [internal] The urlHandle to be used to download the file,
- like returned by urllib.request.urlopen
+ like_count: Number of positive ratings of the video
+ dislike_count: Number of negative ratings of the video
+ comment_count: Number of comments on the video
age_limit: Age restriction for the video, as an integer (years)
- formats: A list of dictionaries for each format available, it must
- be ordered from worst to best quality. Potential fields:
- * url Mandatory. The URL of the video file
- * ext Will be calculated from url if missing
- * format A human-readable description of the format
- ("mp4 container with h264/opus").
- Calculated from the format_id, width, height.
- and format_note fields if missing.
- * format_id A short description of the format
- ("mp4_h264_opus" or "19")
- * format_note Additional info about the format
- ("3D" or "DASH video")
- * width Width of the video, if known
- * height Height of the video, if known
- * abr Average audio bitrate in KBit/s
- * acodec Name of the audio codec in use
- * vbr Average video bitrate in KBit/s
- * vcodec Name of the video codec in use
- * filesize The number of bytes, if known in advance
webpage_url: The url to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set
by YoutubeDL if it's missing)
def IE_NAME(self):
return type(self).__name__[:-2]
- def _request_webpage(self, url_or_request, video_id, note=None, errnote=None):
+ def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
""" Returns the response handle """
if note is None:
self.report_download_webpage(video_id)
elif note is not False:
- self.to_screen(u'%s: %s' % (video_id, note))
+ if video_id is None:
+ self.to_screen(u'%s' % (note,))
+ else:
+ self.to_screen(u'%s: %s' % (video_id, note))
try:
return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
+ if errnote is False:
+ return False
if errnote is None:
errnote = u'Unable to download webpage'
- raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2], cause=err)
+ errmsg = u'%s: %s' % (errnote, compat_str(err))
+ if fatal:
+ raise ExtractorError(errmsg, sys.exc_info()[2], cause=err)
+ else:
+ self._downloader.report_warning(errmsg)
+ return False
- def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None):
+ def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
""" Returns a tuple (page content as string, URL handle) """
# Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0]
- urlh = self._request_webpage(url_or_request, video_id, note, errnote)
+ urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal)
+ if urlh is False:
+ assert not fatal
+ return False
content_type = urlh.headers.get('Content-Type', '')
webpage_bytes = urlh.read()
m = re.match(r'[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+\s*;\s*charset=(.+)', content_type)
content = webpage_bytes.decode(encoding, 'replace')
return (content, urlh)
- def _download_webpage(self, url_or_request, video_id, note=None, errnote=None):
+ def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
""" Returns the data of the page as a string """
- return self._download_webpage_handle(url_or_request, video_id, note, errnote)[0]
+ res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal)
+ if res is False:
+ return res
+ else:
+ content, _ = res
+ return content
def _download_xml(self, url_or_request, video_id,
- note=u'Downloading XML', errnote=u'Unable to download XML'):
+ note=u'Downloading XML', errnote=u'Unable to download XML',
+ transform_source=None):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(url_or_request, video_id, note, errnote)
+ if transform_source:
+ xml_string = transform_source(xml_string)
return xml.etree.ElementTree.fromstring(xml_string.encode('utf-8'))
def to_screen(self, msg):
self.to_screen(u'Logging in')
#Methods for following #608
- def url_result(self, url, ie=None, video_id=None):
+ @staticmethod
+ def url_result(url, ie=None, video_id=None):
"""Returns a url that points to a page that should be processed"""
#TODO: ie should be the class used for getting the info
video_info = {'_type': 'url',
if video_id is not None:
video_info['id'] = video_id
return video_info
- def playlist_result(self, entries, playlist_id=None, playlist_title=None):
+ @staticmethod
+ def playlist_result(entries, playlist_id=None, playlist_title=None):
"""Returns a playlist"""
video_info = {'_type': 'playlist',
'entries': entries}
video_info['title'] = playlist_title
return video_info
- def _search_regex(self, pattern, string, name, default=None, fatal=True, flags=0):
+ def _search_regex(self, pattern, string, name, default=_NO_DEFAULT, fatal=True, flags=0):
"""
Perform a regex search on the given string, using a single or a list of
patterns returning the first matching group.
mobj = re.search(p, string, flags)
if mobj: break
- if sys.stderr.isatty() and os.name != 'nt':
+ if os.name != 'nt' and sys.stderr.isatty():
_name = u'\033[0;34m%s\033[0m' % name
else:
_name = name
if mobj:
# return the first matching group
return next(g for g in mobj.groups() if g is not None)
- elif default is not None:
+ elif default is not _NO_DEFAULT:
return default
elif fatal:
raise RegexNotFoundError(u'Unable to extract %s' % _name)
u'please report this issue on http://yt-dl.org/bug' % _name)
return None
- def _html_search_regex(self, pattern, string, name, default=None, fatal=True, flags=0):
+ def _html_search_regex(self, pattern, string, name, default=_NO_DEFAULT, fatal=True, flags=0):
"""
Like _search_regex, but strips HTML tags and unescapes entities.
"""
--- /dev/null
+# encoding: utf-8
+import re, base64, zlib
+from hashlib import sha1
+from math import pow, sqrt, floor
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+ compat_urllib_parse,
+ compat_urllib_request,
+ bytes_to_intlist,
+ intlist_to_bytes,
+ unified_strdate,
+ clean_html,
+)
+from ..aes import (
+ aes_cbc_decrypt,
+ inc,
+)
+
+class CrunchyrollIE(InfoExtractor):
+ _VALID_URL = r'(?:https?://)?(?:www\.)?(?P<url>crunchyroll\.com/[^/]*/[^/?&]*?(?P<video_id>[0-9]+))(?:[/?&]|$)'
+ _TESTS = [{
+ u'url': u'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
+ u'file': u'645513.flv',
+ #u'md5': u'b1639fd6ddfaa43788c85f6d1dddd412',
+ u'info_dict': {
+ u'title': u'Wanna be the Strongest in the World Episode 1 – An Idol-Wrestler is Born!',
+ u'description': u'md5:2d17137920c64f2f49981a7797d275ef',
+ u'thumbnail': u'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg',
+ u'uploader': u'Yomiuri Telecasting Corporation (YTV)',
+ u'upload_date': u'20131013',
+ },
+ u'params': {
+ # rtmp
+ u'skip_download': True,
+ },
+ }]
+
+ _FORMAT_IDS = {
+ u'360': (u'60', u'106'),
+ u'480': (u'61', u'106'),
+ u'720': (u'62', u'106'),
+ u'1080': (u'80', u'108'),
+ }
+
+ def _decrypt_subtitles(self, data, iv, id):
+ data = bytes_to_intlist(data)
+ iv = bytes_to_intlist(iv)
+ id = int(id)
+
+ def obfuscate_key_aux(count, modulo, start):
+ output = list(start)
+ for _ in range(count):
+ output.append(output[-1] + output[-2])
+ # cut off start values
+ output = output[2:]
+ output = list(map(lambda x: x % modulo + 33, output))
+ return output
+
+ def obfuscate_key(key):
+ num1 = int(floor(pow(2, 25) * sqrt(6.9)))
+ num2 = (num1 ^ key) << 5
+ num3 = key ^ num1
+ num4 = num3 ^ (num3 >> 3) ^ num2
+ prefix = intlist_to_bytes(obfuscate_key_aux(20, 97, (1, 2)))
+ shaHash = bytes_to_intlist(sha1(prefix + str(num4).encode(u'ascii')).digest())
+ # Extend 160 Bit hash to 256 Bit
+ return shaHash + [0] * 12
+
+ key = obfuscate_key(id)
+ class Counter:
+ __value = iv
+ def next_value(self):
+ temp = self.__value
+ self.__value = inc(self.__value)
+ return temp
+ decrypted_data = intlist_to_bytes(aes_cbc_decrypt(data, key, iv))
+ return zlib.decompress(decrypted_data)
+
+ def _convert_subtitles_to_srt(self, subtitles):
+ i=1
+ output = u''
+ for start, end, text in re.findall(r'<event [^>]*?start="([^"]+)" [^>]*?end="([^"]+)" [^>]*?text="([^"]+)"[^>]*?>', subtitles):
+ start = start.replace(u'.', u',')
+ end = end.replace(u'.', u',')
+ text = clean_html(text)
+ text = text.replace(u'\\N', u'\n')
+ if not text:
+ continue
+ output += u'%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
+ i+=1
+ return output
+
+ def _real_extract(self,url):
+ mobj = re.match(self._VALID_URL, url)
+
+ webpage_url = u'http://www.' + mobj.group('url')
+ video_id = mobj.group(u'video_id')
+ webpage = self._download_webpage(webpage_url, video_id)
+ note_m = self._html_search_regex(r'<div class="showmedia-trailer-notice">(.+?)</div>', webpage, u'trailer-notice', default=u'')
+ if note_m:
+ raise ExtractorError(note_m)
+
+ video_title = self._html_search_regex(r'<h1[^>]*>(.+?)</h1>', webpage, u'video_title', flags=re.DOTALL)
+ video_title = re.sub(r' {2,}', u' ', video_title)
+ video_description = self._html_search_regex(r'"description":"([^"]+)', webpage, u'video_description', default=u'')
+ if not video_description:
+ video_description = None
+ video_upload_date = self._html_search_regex(r'<div>Availability for free users:(.+?)</div>', webpage, u'video_upload_date', fatal=False, flags=re.DOTALL)
+ if video_upload_date:
+ video_upload_date = unified_strdate(video_upload_date)
+ video_uploader = self._html_search_regex(r'<div>\s*Publisher:(.+?)</div>', webpage, u'video_uploader', fatal=False, flags=re.DOTALL)
+
+ playerdata_url = compat_urllib_parse.unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, u'playerdata_url'))
+ playerdata_req = compat_urllib_request.Request(playerdata_url)
+ playerdata_req.data = compat_urllib_parse.urlencode({u'current_page': webpage_url})
+ playerdata_req.add_header(u'Content-Type', u'application/x-www-form-urlencoded')
+ playerdata = self._download_webpage(playerdata_req, video_id, note=u'Downloading media info')
+
+ stream_id = self._search_regex(r'<media_id>([^<]+)', playerdata, u'stream_id')
+ video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, u'thumbnail', fatal=False)
+
+ formats = []
+ for fmt in re.findall(r'\?p([0-9]{3,4})=1', webpage):
+ stream_quality, stream_format = self._FORMAT_IDS[fmt]
+ video_format = fmt+u'p'
+ streamdata_req = compat_urllib_request.Request(u'http://www.crunchyroll.com/xml/')
+ # urlencode doesn't work!
+ streamdata_req.data = u'req=RpcApiVideoEncode%5FGetStreamInfo&video%5Fencode%5Fquality='+stream_quality+u'&media%5Fid='+stream_id+u'&video%5Fformat='+stream_format
+ streamdata_req.add_header(u'Content-Type', u'application/x-www-form-urlencoded')
+ streamdata_req.add_header(u'Content-Length', str(len(streamdata_req.data)))
+ streamdata = self._download_webpage(streamdata_req, video_id, note=u'Downloading media info for '+video_format)
+ video_url = self._search_regex(r'<host>([^<]+)', streamdata, u'video_url')
+ video_play_path = self._search_regex(r'<file>([^<]+)', streamdata, u'video_play_path')
+ formats.append({
+ u'url': video_url,
+ u'play_path': video_play_path,
+ u'ext': 'flv',
+ u'format': video_format,
+ u'format_id': video_format,
+ })
+
+ subtitles = {}
+ for sub_id, sub_name in re.findall(r'\?ssid=([0-9]+)" title="([^"]+)', webpage):
+ sub_page = self._download_webpage(u'http://www.crunchyroll.com/xml/?req=RpcApiSubtitle_GetXml&subtitle_script_id='+sub_id,\
+ video_id, note=u'Downloading subtitles for '+sub_name)
+ id = self._search_regex(r'id=\'([0-9]+)', sub_page, u'subtitle_id', fatal=False)
+ iv = self._search_regex(r'<iv>([^<]+)', sub_page, u'subtitle_iv', fatal=False)
+ data = self._search_regex(r'<data>([^<]+)', sub_page, u'subtitle_data', fatal=False)
+ if not id or not iv or not data:
+ continue
+ id = int(id)
+ iv = base64.b64decode(iv)
+ data = base64.b64decode(data)
+
+ subtitle = self._decrypt_subtitles(data, iv, id).decode(u'utf-8')
+ lang_code = self._search_regex(r'lang_code=\'([^\']+)', subtitle, u'subtitle_lang_code', fatal=False)
+ if not lang_code:
+ continue
+ subtitles[lang_code] = self._convert_subtitles_to_srt(subtitle)
+
+ return {
+ u'id': video_id,
+ u'title': video_title,
+ u'description': video_description,
+ u'thumbnail': video_thumbnail,
+ u'uploader': video_uploader,
+ u'upload_date': video_upload_date,
+ u'subtitles': subtitles,
+ u'formats': formats,
+ }
)
class CSpanIE(InfoExtractor):
- _VALID_URL = r'http://www.c-spanvideo.org/program/(.*)'
+ _VALID_URL = r'http://www\.c-spanvideo\.org/program/(.*)'
_TEST = {
u'url': u'http://www.c-spanvideo.org/program/HolderonV',
u'file': u'315139.flv',
get_element_by_attribute,
get_element_by_id,
orderedSet,
+ str_to_int,
ExtractorError,
)
class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
"""Information Extractor for Dailymotion"""
- _VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
+ _VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(embed|#)/)?video/(?P<id>[^/?_]+)'
IE_NAME = u'dailymotion'
_FORMATS = [
# Extract id and simplified title from URL
mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group(1).split('_')[0].split('?')[0]
+ video_id = mobj.group('id')
url = 'http://www.dailymotion.com/video/%s' % video_id
self.to_screen(u'Vevo video detected: %s' % vevo_id)
return self.url_result(u'vevo:%s' % vevo_id, ie='Vevo')
- video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>',
- # Looking for official user
- r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'],
- webpage, 'video uploader', fatal=False)
age_limit = self._rta_search(webpage)
video_upload_date = None
self._list_available_subtitles(video_id, webpage)
return
+ view_count = self._search_regex(
+ r'video_views_count[^>]+>\s+([\d\.,]+)', webpage, u'view count', fatal=False)
+ if view_count is not None:
+ view_count = str_to_int(view_count)
+
return {
'id': video_id,
'formats': formats,
- 'uploader': video_uploader,
+ 'uploader': info['owner_screenname'],
'upload_date': video_upload_date,
'title': self._og_search_title(webpage),
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'],
'age_limit': age_limit,
+ 'view_count': view_count,
}
def _get_available_subtitles(self, video_id, webpage):
class DaumIE(InfoExtractor):
- _VALID_URL = r'https?://tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
IE_NAME = u'daum.net'
_TEST = {
class DreiSatIE(InfoExtractor):
IE_NAME = '3sat'
- _VALID_URL = r'(?:http://)?(?:www\.)?3sat.de/mediathek/index.php\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
+ _VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/index\.php\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TEST = {
u"url": u"http://www.3sat.de/mediathek/index.php?obj=36983",
u'file': u'36983.webm',
class EightTracksIE(InfoExtractor):
IE_NAME = '8tracks'
- _VALID_URL = r'https?://8tracks.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
+ _VALID_URL = r'https?://8tracks\.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
_TEST = {
u"name": u"EightTracks",
u"url": u"http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
IE_NAME = u'exfm'
IE_DESC = u'ex.fm'
_VALID_URL = r'(?:http://)?(?:www\.)?ex\.fm/song/([^/]+)'
- _SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream'
+ _SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
_TESTS = [
{
u'url': u'http://ex.fm/song/eh359',
class FacebookIE(InfoExtractor):
"""Information Extractor for Facebook"""
- _VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
+ _VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:[^#?]*#!/)?(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
_LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
_CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
_NETRC_MACHINE = 'facebook'
u'file': u'120708114770723.mp4',
u'md5': u'48975a41ccc4b7a581abd68651c1a5a8',
u'info_dict': {
- u"duration": 279,
+ u"duration": 279,
u"title": u"PEOPLE ARE AWESOME 2013"
}
}
class FazIE(InfoExtractor):
IE_NAME = u'faz.net'
- _VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+).html'
+ _VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+)\.html'
_TEST = {
u'url': u'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
class FKTVIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv'
- _VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
+ _VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik\.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
_TEST = {
u'url': u'http://fernsehkritik.tv/folge-1',
class FKTVPosteckeIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv:postecke'
- _VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/inline-video/postecke.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
+ _VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik\.tv/inline-video/postecke\.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
_TEST = {
u'url': u'http://fernsehkritik.tv/inline-video/postecke.php?iframe=true&width=625&height=440&ep=120',
u'file': u'0120.flv',
thumbnail_path = info.find('image').text
return {'id': video_id,
- 'ext': 'mp4',
+ 'ext': 'flv' if video_url.startswith('rtmp') else 'mp4',
'url': video_url,
'title': info.find('titre').text,
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = u'francetvinfo.fr'
- _VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+).html'
+ _VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+)\.html'
_TEST = {
u'url': u'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
return self._extract_video(video_id)
-class France2IE(FranceTVBaseInfoExtractor):
- IE_NAME = u'france2.fr'
- _VALID_URL = r'''(?x)https?://www\.france2\.fr/
+class FranceTVIE(FranceTVBaseInfoExtractor):
+ IE_NAME = u'francetv'
+ IE_DESC = u'France 2, 3, 4, 5 and Ô'
+ _VALID_URL = r'''(?x)https?://www\.france[2345o]\.fr/
(?:
- emissions/.*?/videos/(?P<id>\d+)
- | emission/(?P<key>[^/?]+)
+ emissions/.*?/(videos|emissions)/(?P<id>[^/?]+)
+ | (emissions?|jt)/(?P<key>[^/?]+)
)'''
- _TEST = {
- u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
- u'file': u'75540104.mp4',
- u'info_dict': {
- u'title': u'13h15, le samedi...',
- u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
+ _TESTS = [
+ # france2
+ {
+ u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
+ u'file': u'75540104.mp4',
+ u'info_dict': {
+ u'title': u'13h15, le samedi...',
+ u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
+ },
+ u'params': {
+ # m3u8 download
+ u'skip_download': True,
+ },
},
- u'params': {
- u'skip_download': True,
+ # france3
+ {
+ u'url': u'http://www.france3.fr/emissions/pieces-a-conviction/diffusions/13-11-2013_145575',
+ u'info_dict': {
+ u'id': u'000702326_CAPP_PicesconvictionExtrait313022013_120220131722_Au',
+ u'ext': u'flv',
+ u'title': u'Le scandale du prix des médicaments',
+ u'description': u'md5:1384089fbee2f04fc6c9de025ee2e9ce',
+ },
+ u'params': {
+ # rtmp download
+ u'skip_download': True,
+ },
},
- }
+ # france4
+ {
+ u'url': u'http://www.france4.fr/emissions/hero-corp/videos/rhozet_herocorp_bonus_1_20131106_1923_06112013172108_F4',
+ u'info_dict': {
+ u'id': u'rhozet_herocorp_bonus_1_20131106_1923_06112013172108_F4',
+ u'ext': u'flv',
+ u'title': u'Hero Corp Making of - Extrait 1',
+ u'description': u'md5:c87d54871b1790679aec1197e73d650a',
+ },
+ u'params': {
+ # rtmp download
+ u'skip_download': True,
+ },
+ },
+ # france5
+ {
+ u'url': u'http://www.france5.fr/emissions/c-a-dire/videos/92837968',
+ u'info_dict': {
+ u'id': u'92837968',
+ u'ext': u'mp4',
+ u'title': u'C à dire ?!',
+ u'description': u'md5:fb1db1cbad784dcce7c7a7bd177c8e2f',
+ },
+ u'params': {
+ # m3u8 download
+ u'skip_download': True,
+ },
+ },
+ # franceo
+ {
+ u'url': u'http://www.franceo.fr/jt/info-afrique/04-12-2013',
+ u'info_dict': {
+ u'id': u'92327925',
+ u'ext': u'mp4',
+ u'title': u'Infô-Afrique',
+ u'description': u'md5:ebf346da789428841bee0fd2a935ea55',
+ },
+ u'params': {
+ # m3u8 download
+ u'skip_download': True,
+ },
+ u'skip': u'The id changes frequently',
+ },
+ ]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj.group('key'):
webpage = self._download_webpage(url, mobj.group('key'))
- video_id = self._html_search_regex(
- r'''(?x)<div\s+class="video-player">\s*
+ id_res = [
+ (r'''(?x)<div\s+class="video-player">\s*
<a\s+href="http://videos.francetv.fr/video/([0-9]+)"\s+
- class="francetv-video-player">''',
- webpage, u'video ID')
+ class="francetv-video-player">'''),
+ (r'<a id="player_direct" href="http://info\.francetelevisions'
+ '\.fr/\?id-video=([^"/&]+)'),
+ (r'<a class="video" id="ftv_player_(.+?)"'),
+ ]
+ video_id = self._html_search_regex(id_res, webpage, u'video ID')
else:
video_id = mobj.group('id')
return self._extract_video(video_id)
class GamekingsIE(InfoExtractor):
- _VALID_URL = r'http?://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)'
+ _VALID_URL = r'http://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)'
_TEST = {
u"url": u"http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/",
u'file': u'20130811.mp4',
class GametrailersIE(MTVServicesInfoExtractor):
- _VALID_URL = r'http://www.gametrailers.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
-
+ _VALID_URL = r'http://www\.gametrailers\.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
_TEST = {
u'url': u'http://www.gametrailers.com/videos/zbvr8i/mirror-s-edge-2-e3-2013--debut-trailer',
u'file': u'70e9a5d7-cf25-4a10-9104-6f3e7342ae0d.mp4',
compat_urlparse,
ExtractorError,
+ HEADRequest,
smuggle_url,
unescapeHTML,
+ unified_strdate,
+ url_basename,
)
from .brightcove import BrightcoveIE
+from .ooyala import OoyalaIE
class GenericIE(InfoExtractor):
u'skip_download': True,
},
},
+ # Direct link to a video
+ {
+ u'url': u'http://media.w3.org/2010/05/sintel/trailer.mp4',
+ u'file': u'trailer.mp4',
+ u'md5': u'67d406c2bcb6af27fa886f31aa934bbe',
+ u'info_dict': {
+ u'id': u'trailer',
+ u'title': u'trailer',
+ u'upload_date': u'20100513',
+ }
+ },
+ # ooyala video
+ {
+ u'url': u'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219',
+ u'md5': u'5644c6ca5d5782c1d0d350dad9bd840c',
+ u'info_dict': {
+ u'id': u'BwY2RxaTrTkslxOfcan0UCf0YqyvWysJ',
+ u'ext': u'mp4',
+ u'title': u'2cc213299525360.mov', #that's what we get
+ },
+ },
]
def report_download_webpage(self, video_id):
"""Report information extraction."""
self._downloader.to_screen(u'[redirect] Following redirect to %s' % new_url)
- def _test_redirect(self, url):
+ def _send_head(self, url):
"""Check if it is a redirect, like url shorteners, in case return the new url."""
- class HeadRequest(compat_urllib_request.Request):
- def get_method(self):
- return "HEAD"
class HEADRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
"""
Subclass the HTTPRedirectHandler to make it use our
- HeadRequest also on the redirected URL
+ HEADRequest also on the redirected URL
"""
def redirect_request(self, req, fp, code, msg, headers, newurl):
if code in (301, 302, 303, 307):
newurl = newurl.replace(' ', '%20')
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type"))
- return HeadRequest(newurl,
+ return HEADRequest(newurl,
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
unverifiable=True)
compat_urllib_request.HTTPErrorProcessor, compat_urllib_request.HTTPSHandler]:
opener.add_handler(handler())
- response = opener.open(HeadRequest(url))
+ response = opener.open(HEADRequest(url))
if response is None:
raise ExtractorError(u'Invalid URL protocol')
- new_url = response.geturl()
-
- if url == new_url:
- return False
-
- self.report_following_redirect(new_url)
- return new_url
+ return response
def _real_extract(self, url):
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
+ video_id = os.path.splitext(url.split('/')[-1])[0]
try:
- new_url = self._test_redirect(url)
- if new_url:
- return [self.url_result(new_url)]
+ response = self._send_head(url)
+
+ # Check for redirect
+ new_url = response.geturl()
+ if url != new_url:
+ self.report_following_redirect(new_url)
+ return self.url_result(new_url)
+
+ # Check for direct link to a video
+ content_type = response.headers.get('Content-Type', '')
+ m = re.match(r'^(?P<type>audio|video|application(?=/ogg$))/(?P<format_id>.+)$', content_type)
+ if m:
+ upload_date = response.headers.get('Last-Modified')
+ if upload_date:
+ upload_date = unified_strdate(upload_date)
+ return {
+ 'id': video_id,
+ 'title': os.path.splitext(url_basename(url))[0],
+ 'formats': [{
+ 'format_id': m.group('format_id'),
+ 'url': url,
+ 'vcodec': u'none' if m.group('type') == 'audio' else None
+ }],
+ 'upload_date': upload_date,
+ }
+
except compat_urllib_error.HTTPError:
# This may be a stupid server that doesn't like HEAD, our UA, or so
pass
- video_id = url.split('/')[-1]
try:
webpage = self._download_webpage(url, video_id)
except ValueError:
# Site Name | Video Title
# Video Title - Tagline | Site Name
# and so on and so forth; it's just not practical
- video_title = self._html_search_regex(r'<title>(.*)</title>',
- webpage, u'video title', default=u'video', flags=re.DOTALL)
+ video_title = self._html_search_regex(
+ r'(?s)<title>(.*?)</title>', webpage, u'video title',
+ default=u'video')
+
+ # video uploader is domain name
+ video_uploader = self._search_regex(
+ r'^(?:https?://)?([^/]*)/.*', url, u'video uploader')
# Look for BrightCove:
bc_url = BrightcoveIE._extract_brightcove_url(webpage)
self.to_screen(u'Brightcove video detected.')
return self.url_result(bc_url, 'Brightcove')
- # Look for embedded Vimeo player
+ # Look for embedded (iframe) Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="(https?://player.vimeo.com/video/.+?)"', webpage)
if mobj:
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
+ # Look for embedded (swf embed) Vimeo player
+ mobj = re.search(
+ r'<embed[^>]+?src="(https?://(?:www\.)?vimeo.com/moogaloop.swf.+?)"', webpage)
+ if mobj:
+ return self.url_result(mobj.group(1), 'Vimeo')
+
# Look for embedded YouTube player
- matches = re.findall(
- r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?youtube.com/embed/.+?)\1', webpage)
+ matches = re.findall(r'''(?x)
+ (?:<iframe[^>]+?src=|embedSWF\(\s*)
+ (["\'])(?P<url>(?:https?:)?//(?:www\.)?youtube\.com/
+ (?:embed|v)/.+?)
+ \1''', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Youtube')
for tuppl in matches]
# Look for embedded Dailymotion player
matches = re.findall(
- r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion.com/embed/video/.+?)\1', webpage)
+ r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/embed/video/.+?)\1', webpage)
if matches:
urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Dailymotion')
for tuppl in matches]
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
+ # Look for embedded Wistia player
+ match = re.search(
+ r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:fast\.)?wistia\.net/embed/iframe/.+?)\1', webpage)
+ if match:
+ return {
+ '_type': 'url_transparent',
+ 'url': unescapeHTML(match.group('url')),
+ 'ie_key': 'Wistia',
+ 'uploader': video_uploader,
+ 'title': video_title,
+ 'id': video_id,
+ }
+
+ # Look for embedded blip.tv player
+ mobj = re.search(r'<meta\s[^>]*https?://api.blip.tv/\w+/redirect/\w+/(\d+)', webpage)
+ if mobj:
+ return self.url_result('http://blip.tv/seo/-'+mobj.group(1), 'BlipTV')
+ mobj = re.search(r'<(?:iframe|embed|object)\s[^>]*https?://(?:\w+\.)?blip.tv/(?:play/|api\.swf#)([a-zA-Z0-9]+)', webpage)
+ if mobj:
+ player_url = 'http://blip.tv/play/%s.x?p=1' % mobj.group(1)
+ player_page = self._download_webpage(player_url, mobj.group(1))
+ blip_video_id = self._search_regex(r'data-episode-id="(\d+)', player_page, u'blip_video_id', fatal=False)
+ if blip_video_id:
+ return self.url_result('http://blip.tv/seo/-'+blip_video_id, 'BlipTV')
+
# Look for Bandcamp pages with custom domain
mobj = re.search(r'<meta property="og:url"[^>]*?content="(.*?bandcamp\.com.*?)"', webpage)
if mobj is not None:
# Don't set the extractor because it can be a track url or an album
return self.url_result(burl)
+ # Look for embedded Vevo player
+ mobj = re.search(
+ r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:cache\.)?vevo\.com/.+?)\1', webpage)
+ if mobj is not None:
+ return self.url_result(mobj.group('url'))
+
+ # Look for Ooyala videos
+ mobj = re.search(r'player.ooyala.com/[^"?]+\?[^"]*?(?:embedCode|ec)=([^"&]+)', webpage)
+ if mobj is not None:
+ return OoyalaIE._build_url_result(mobj.group(1))
+
+ # Look for Aparat videos
+ mobj = re.search(r'<iframe src="(http://www.aparat.com/video/[^"]+)"', webpage)
+ if mobj is not None:
+ return self.url_result(mobj.group(1), 'Aparat')
+
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
# here's a fun little line of code for you:
video_id = os.path.splitext(video_id)[0]
- # video uploader is domain name
- video_uploader = self._search_regex(r'(?:https?://)?([^/]*)/.*',
- url, u'video uploader')
-
return {
'id': video_id,
'url': video_url,
'uploader': video_uploader,
- 'upload_date': None,
'title': video_title,
}
u'file': u'1435540.mp3',
u'md5': u'2c2cd2f76ef11a9b3b581e8b232f3d96',
u'info_dict': {
- u"title": u"Freddie Gibbs - Lay It Down"
+ u"title": u'Freddie Gibbs "Lay It Down"'
}
}
{
u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
u'info_dict': {
- u'title': u'GTA 5\'s Twisted Beauty in Super Slow Motion',
+ u'title': u'26 Twisted Moments from GTA 5 in Slow Motion',
u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
},
},
class OneUPIE(IGNIE):
"""Extractor for 1up.com, it uses the ign videos system."""
- _VALID_URL = r'https?://gamevideos.1up.com/(?P<type>video)/id/(?P<name_or_id>.+)'
+ _VALID_URL = r'https?://gamevideos\.1up\.com/(?P<type>video)/id/(?P<name_or_id>.+)'
IE_NAME = '1up.com'
_DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'
class ImdbIE(InfoExtractor):
IE_NAME = u'imdb'
IE_DESC = u'Internet Movie Database trailers'
- _VALID_URL = r'http://www\.imdb\.com/video/imdb/vi(?P<id>\d+)'
+ _VALID_URL = r'http://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
_TEST = {
u'url': u'http://www.imdb.com/video/imdb/vi2524815897',
u'ext': u'mp4',
u'title': u'Ice Age: Continental Drift Trailer (No. 2) - IMDb',
u'description': u'md5:9061c2219254e5d14e03c25c98e96a81',
- u'duration': 151,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
- webpage = self._download_webpage(url,video_id)
+ webpage = self._download_webpage('http://www.imdb.com/video/imdb/vi%s' % video_id, video_id)
descr = get_element_by_attribute('itemprop', 'description', webpage)
available_formats = re.findall(
r'case \'(?P<f_id>.*?)\' :$\s+url = \'(?P<path>.*?)\'', webpage,
flags=re.MULTILINE)
formats = []
for f_id, f_path in available_formats:
+ f_path = f_path.strip()
format_page = self._download_webpage(
compat_urlparse.urljoin(url, f_path),
u'Downloading info for %s format' % f_id)
formats.append({
'format_id': f_id,
'url': format_info['url'],
- 'height': int(info['titleObject']['encoding']['selected'][:-1]),
})
return {
'formats': formats,
'description': descr,
'thumbnail': format_info['slate'],
- 'duration': int(info['titleObject']['title']['duration_seconds']),
}
from .common import InfoExtractor
class InstagramIE(InfoExtractor):
- _VALID_URL = r'(?:http://)?instagram.com/p/(.*?)/'
+ _VALID_URL = r'(?:http://)?instagram\.com/p/(.*?)/'
_TEST = {
u'url': u'http://instagram.com/p/aye83DjauH/?foo=bar#abc',
u'file': u'aye83DjauH.mp4',
--- /dev/null
+# encoding: utf-8
+
+import re
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+ compat_urllib_request,
+ ExtractorError,
+)
+
+
+class IviIE(InfoExtractor):
+ IE_DESC = u'ivi.ru'
+ IE_NAME = u'ivi'
+ _VALID_URL = r'^https?://(?:www\.)?ivi\.ru/watch(?:/(?P<compilationid>[^/]+))?/(?P<videoid>\d+)'
+
+ _TESTS = [
+ # Single movie
+ {
+ u'url': u'http://www.ivi.ru/watch/53141',
+ u'file': u'53141.mp4',
+ u'md5': u'6ff5be2254e796ed346251d117196cf4',
+ u'info_dict': {
+ u'title': u'Иван Васильевич меняет профессию',
+ u'description': u'md5:14d8eda24e9d93d29b5857012c6d6346',
+ u'duration': 5498,
+ u'thumbnail': u'http://thumbs.ivi.ru/f20.vcp.digitalaccess.ru/contents/d/1/c3c885163a082c29bceeb7b5a267a6.jpg',
+ },
+ u'skip': u'Only works from Russia',
+ },
+ # Serial's serie
+ {
+ u'url': u'http://www.ivi.ru/watch/dezhurnyi_angel/74791',
+ u'file': u'74791.mp4',
+ u'md5': u'3e6cc9a848c1d2ebcc6476444967baa9',
+ u'info_dict': {
+ u'title': u'Дежурный ангел - 1 серия',
+ u'duration': 2490,
+ u'thumbnail': u'http://thumbs.ivi.ru/f7.vcp.digitalaccess.ru/contents/8/e/bc2f6c2b6e5d291152fdd32c059141.jpg',
+ },
+ u'skip': u'Only works from Russia',
+ }
+ ]
+
+ # Sorted by quality
+ _known_formats = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
+
+ # Sorted by size
+ _known_thumbnails = ['Thumb-120x90', 'Thumb-160', 'Thumb-640x480']
+
+ def _extract_description(self, html):
+ m = re.search(r'<meta name="description" content="(?P<description>[^"]+)"/>', html)
+ return m.group('description') if m is not None else None
+
+ def _extract_comment_count(self, html):
+ m = re.search(u'(?s)<a href="#" id="view-comments" class="action-button dim gradient">\s*Комментарии:\s*(?P<commentcount>\d+)\s*</a>', html)
+ return int(m.group('commentcount')) if m is not None else 0
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('videoid')
+
+ api_url = 'http://api.digitalaccess.ru/api/json/'
+
+ data = {u'method': u'da.content.get',
+ u'params': [video_id, {u'site': u's183',
+ u'referrer': u'http://www.ivi.ru/watch/%s' % video_id,
+ u'contentid': video_id
+ }
+ ]
+ }
+
+ request = compat_urllib_request.Request(api_url, json.dumps(data))
+
+ video_json_page = self._download_webpage(request, video_id, u'Downloading video JSON')
+ video_json = json.loads(video_json_page)
+
+ if u'error' in video_json:
+ error = video_json[u'error']
+ if error[u'origin'] == u'NoRedisValidData':
+ raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
+ raise ExtractorError(u'Unable to download video %s: %s' % (video_id, error[u'message']), expected=True)
+
+ result = video_json[u'result']
+
+ formats = [{'url': x[u'url'],
+ 'format_id': x[u'content_format']
+ } for x in result[u'files'] if x[u'content_format'] in self._known_formats]
+ formats.sort(key=lambda fmt: self._known_formats.index(fmt['format_id']))
+
+ if len(formats) == 0:
+ self._downloader.report_warning(u'No media links available for %s' % video_id)
+ return
+
+ duration = result[u'duration']
+ compilation = result[u'compilation']
+ title = result[u'title']
+
+ title = '%s - %s' % (compilation, title) if compilation is not None else title
+
+ previews = result[u'preview']
+ previews.sort(key=lambda fmt: self._known_thumbnails.index(fmt['content_format']))
+ thumbnail = previews[-1][u'url'] if len(previews) > 0 else None
+
+ video_page = self._download_webpage(url, video_id, u'Downloading video page')
+ description = self._extract_description(video_page)
+ comment_count = self._extract_comment_count(video_page)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'thumbnail': thumbnail,
+ 'description': description,
+ 'duration': duration,
+ 'comment_count': comment_count,
+ 'formats': formats,
+ }
+
+
+class IviCompilationIE(InfoExtractor):
+ IE_DESC = u'ivi.ru compilations'
+ IE_NAME = u'ivi:compilation'
+ _VALID_URL = r'^https?://(?:www\.)?ivi\.ru/watch/(?!\d+)(?P<compilationid>[a-z\d_-]+)(?:/season(?P<seasonid>\d+))?$'
+
+ def _extract_entries(self, html, compilation_id):
+ return [self.url_result('http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), 'Ivi')
+ for serie in re.findall(r'<strong><a href="/watch/%s/(\d+)">(?:[^<]+)</a></strong>' % compilation_id, html)]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ compilation_id = mobj.group('compilationid')
+ season_id = mobj.group('seasonid')
+
+ if season_id is not None: # Season link
+ season_page = self._download_webpage(url, compilation_id, u'Downloading season %s web page' % season_id)
+ playlist_id = '%s/season%s' % (compilation_id, season_id)
+ playlist_title = self._html_search_meta(u'title', season_page, u'title')
+ entries = self._extract_entries(season_page, compilation_id)
+ else: # Compilation link
+ compilation_page = self._download_webpage(url, compilation_id, u'Downloading compilation web page')
+ playlist_id = compilation_id
+ playlist_title = self._html_search_meta(u'title', compilation_page, u'title')
+ seasons = re.findall(r'<a href="/watch/%s/season(\d+)">[^<]+</a>' % compilation_id, compilation_page)
+ if len(seasons) == 0: # No seasons in this compilation
+ entries = self._extract_entries(compilation_page, compilation_id)
+ else:
+ entries = []
+ for season_id in seasons:
+ season_page = self._download_webpage('http://www.ivi.ru/watch/%s/season%s' % (compilation_id, season_id),
+ compilation_id, u'Downloading season %s web page' % season_id)
+ entries.extend(self._extract_entries(season_page, compilation_id))
+
+ return self.playlist_result(entries, playlist_id, playlist_title)
\ No newline at end of file
)
class JukeboxIE(InfoExtractor):
- _VALID_URL = r'^http://www\.jukebox?\..+?\/.+[,](?P<video_id>[a-z0-9\-]+).html'
+ _VALID_URL = r'^http://www\.jukebox?\..+?\/.+[,](?P<video_id>[a-z0-9\-]+)\.html'
_IFRAME = r'<iframe .*src="(?P<iframe>[^"]*)".*>'
_VIDEO_URL = r'"config":{"file":"(?P<video_url>http:[^"]+[.](?P<video_ext>[^.?]+)[?]mdtk=[0-9]+)"'
_TITLE = r'<h1 class="inline">(?P<title>[^<]+)</h1>.*<span id="infos_article_artist">(?P<artist>[^<]+)</span>'
class LiveLeakIE(InfoExtractor):
- _VALID_URL = r'^(?:http?://)?(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<video_id>[\w_]+)(?:.*)'
+ _VALID_URL = r'^(?:http://)?(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'liveleak'
_TEST = {
u'url': u'http://www.liveleak.com/view?i=757_1364311680',
class LivestreamIE(InfoExtractor):
IE_NAME = u'livestream'
- _VALID_URL = r'http://new.livestream.com/.*?/(?P<event_name>.*?)(/videos/(?P<id>\d+))?/?$'
+ _VALID_URL = r'http://new\.livestream\.com/.*?/(?P<event_name>.*?)(/videos/(?P<id>\d+))?/?$'
_TEST = {
u'url': u'http://new.livestream.com/CoheedandCambria/WebsterHall/videos/4719370',
u'file': u'4719370.mp4',
--- /dev/null
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+)
+
+
+class MDRIE(InfoExtractor):
+ _VALID_URL = r'^(?P<domain>(?:https?://)?(?:www\.)?mdr\.de)/mediathek/(?:.*)/(?P<type>video|audio)(?P<video_id>[^/_]+)_.*'
+
+ # No tests, MDR regularily deletes its videos
+
+ def _real_extract(self, url):
+ m = re.match(self._VALID_URL, url)
+ video_id = m.group('video_id')
+ domain = m.group('domain')
+
+ # determine title and media streams from webpage
+ html = self._download_webpage(url, video_id)
+
+ title = self._html_search_regex(r'<h2>(.*?)</h2>', html, u'title')
+ xmlurl = self._search_regex(
+ r'(/mediathek/(?:.+)/(?:video|audio)[0-9]+-avCustom.xml)', html, u'XML URL')
+
+ doc = self._download_xml(domain + xmlurl, video_id)
+ formats = []
+ for a in doc.findall('./assets/asset'):
+ url_el = a.find('.//progressiveDownloadUrl')
+ if url_el is None:
+ continue
+ abr = int(a.find('bitrateAudio').text) // 1000
+ media_type = a.find('mediaType').text
+ format = {
+ 'abr': abr,
+ 'filesize': int(a.find('fileSize').text),
+ 'url': url_el.text,
+ }
+
+ vbr_el = a.find('bitrateVideo')
+ if vbr_el is None:
+ format.update({
+ 'vcodec': 'none',
+ 'format_id': u'%s-%d' % (media_type, abr),
+ })
+ else:
+ vbr = int(vbr_el.text) // 1000
+ format.update({
+ 'vbr': vbr,
+ 'width': int(a.find('frameWidth').text),
+ 'height': int(a.find('frameHeight').text),
+ 'format_id': u'%s-%d' % (media_type, vbr),
+ })
+ formats.append(format)
+ formats.sort(key=lambda f: (f.get('vbr'), f['abr']))
+ if not formats:
+ raise ExtractorError(u'Could not find any valid formats')
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'formats': formats,
+ }
import re
-import socket
from .common import InfoExtractor
from ..utils import (
- compat_http_client,
compat_parse_qs,
- compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
- compat_str,
determine_ext,
ExtractorError,
)
u'age_limit': 18,
},
},
+ # cbs video
+ {
+ u'url': u'http://www.metacafe.com/watch/cb-0rOxMBabDXN6/samsung_galaxy_note_2_samsungs_next_generation_phablet/',
+ u'info_dict': {
+ u'id': u'0rOxMBabDXN6',
+ u'ext': u'flv',
+ u'title': u'Samsung Galaxy Note 2: Samsung\'s next-generation phablet',
+ u'description': u'md5:54d49fac53d26d5a0aaeccd061ada09d',
+ u'duration': 129,
+ },
+ u'params': {
+ # rtmp download
+ u'skip_download': True,
+ },
+ },
]
def _real_initialize(self):
# Retrieve disclaimer
- request = compat_urllib_request.Request(self._DISCLAIMER)
- try:
- self.report_disclaimer()
- compat_urllib_request.urlopen(request).read()
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- raise ExtractorError(u'Unable to retrieve disclaimer: %s' % compat_str(err))
+ self.report_disclaimer()
+ self._download_webpage(self._DISCLAIMER, None, False, u'Unable to retrieve disclaimer')
# Confirm age
disclaimer_form = {
}
request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
- try:
- self.report_age_confirmation()
- compat_urllib_request.urlopen(request).read()
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- raise ExtractorError(u'Unable to confirm age: %s' % compat_str(err))
+ self.report_age_confirmation()
+ self._download_webpage(request, None, False, u'Unable to confirm age')
def _real_extract(self, url):
# Extract id and simplified title from URL
video_id = mobj.group(1)
- # Check if video comes from YouTube
- mobj2 = re.match(r'^yt-(.*)$', video_id)
- if mobj2 is not None:
- return [self.url_result('http://www.youtube.com/watch?v=%s' % mobj2.group(1), 'Youtube')]
+ # the video may come from an external site
+ m_external = re.match('^(\w{2})-(.*)$', video_id)
+ if m_external is not None:
+ prefix, ext_id = m_external.groups()
+ # Check if video comes from YouTube
+ if prefix == 'yt':
+ return self.url_result('http://www.youtube.com/watch?v=%s' % ext_id, 'Youtube')
+ # CBS videos use theplatform.com
+ if prefix == 'cb':
+ return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
# Retrieve video webpage to extract further information
req = compat_urllib_request.Request('http://www.metacafe.com/watch/%s/' % video_id)
import re
-import xml.etree.ElementTree
import operator
from .common import InfoExtractor
+from ..utils import (
+ fix_xml_all_ampersand,
+)
class MetacriticIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
# The xml is not well formatted, there are raw '&'
- info_xml = self._download_webpage('http://www.metacritic.com/video_data?video=' + video_id,
- video_id, u'Downloading info xml').replace('&', '&')
- info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
+ info = self._download_xml('http://www.metacritic.com/video_data?video=' + video_id,
+ video_id, u'Downloading info xml', transform_source=fix_xml_all_ampersand)
clip = next(c for c in info.findall('playList/clip') if c.find('id').text == video_id)
formats = []
import json
import re
-import socket
from .common import InfoExtractor
from ..utils import (
- compat_http_client,
- compat_urllib_error,
- compat_urllib_request,
unified_strdate,
+ ExtractorError,
)
"""Returns 1st active url from list"""
for url in url_list:
try:
- compat_urllib_request.urlopen(url)
+ # We only want to know if the request succeed
+ # don't download the whole file
+ self._request_webpage(url, None, False)
return url
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error):
+ except ExtractorError:
url = None
return None
+ def _get_url(self, template_url):
+ return self.check_urls(template_url % i for i in range(30))
+
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
preview_url = self._search_regex(r'data-preview-url="(.+?)"', webpage, u'preview url')
song_url = preview_url.replace('/previews/', '/cloudcasts/originals/')
template_url = re.sub(r'(stream\d*)', 'stream%d', song_url)
- final_song_url = self.check_urls(template_url % i for i in range(30))
+ final_song_url = self._get_url(template_url)
+ if final_song_url is None:
+ self.to_screen('Trying with m4a extension')
+ template_url = template_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/')
+ final_song_url = self._get_url(template_url)
+ if final_song_url is None:
+ raise ExtractorError(u'Unable to extract track url')
return {
'id': track_id,
'title': info['name'],
'url': final_song_url,
- 'ext': 'mp3',
'description': info.get('description'),
'thumbnail': info['pictures'].get('extra_large'),
'uploader': info['user']['name'],
def _get_videos_info(self, uri):
video_id = self._id_from_uri(uri)
data = compat_urllib_parse.urlencode({'uri': uri})
- idoc = self._download_xml(self._FEED_URL +'?' + data, video_id,
- u'Downloading info')
+
+ def fix_ampersand(s):
+ """ Fix unencoded ampersand in XML """
+ return s.replace(u'& ', '& ')
+ idoc = self._download_xml(
+ self._FEED_URL + '?' + data, video_id,
+ u'Downloading info', transform_source=fix_ampersand)
return [self._get_video_info(item) for item in idoc.findall('.//item')]
class MTVIE(MTVServicesInfoExtractor):
- _VALID_URL = r'^https?://(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$'
+ _VALID_URL = r'''(?x)^https?://
+ (?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|
+ m\.mtv\.com/videos/video\.rbml\?.*?id=(?P<mgid>[^&]+))'''
_FEED_URL = 'http://www.mtv.com/player/embed/AS3/rss/'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
-
- webpage = self._download_webpage(url, video_id)
-
- # Some videos come from Vevo.com
- m_vevo = re.search(r'isVevoVideo = true;.*?vevoVideoId = "(.*?)";',
- webpage, re.DOTALL)
- if m_vevo:
- vevo_id = m_vevo.group(1);
- self.to_screen(u'Vevo video detected: %s' % vevo_id)
- return self.url_result('vevo:%s' % vevo_id, ie='Vevo')
-
- uri = self._html_search_regex(r'/uri/(.*?)\?', webpage, u'uri')
+ uri = mobj.group('mgid')
+ if uri is None:
+ webpage = self._download_webpage(url, video_id)
+
+ # Some videos come from Vevo.com
+ m_vevo = re.search(r'isVevoVideo = true;.*?vevoVideoId = "(.*?)";',
+ webpage, re.DOTALL)
+ if m_vevo:
+ vevo_id = m_vevo.group(1);
+ self.to_screen(u'Vevo video detected: %s' % vevo_id)
+ return self.url_result('vevo:%s' % vevo_id, ie='Vevo')
+
+ uri = self._html_search_regex(r'/uri/(.*?)\?', webpage, u'uri')
return self._get_videos_info(uri)
class MuzuTVIE(InfoExtractor):
- _VALID_URL = r'https?://www.muzu.tv/(.+?)/(.+?)/(?P<id>\d+)'
+ _VALID_URL = r'https?://www\.muzu\.tv/(.+?)/(.+?)/(?P<id>\d+)'
IE_NAME = u'muzu.tv'
_TEST = {
class MySpassIE(InfoExtractor):
- _VALID_URL = r'http://www.myspass.de/.*'
+ _VALID_URL = r'http://www\.myspass\.de/.*'
_TEST = {
u'url': u'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
u'file': u'11741.mp4',
class NaverIE(InfoExtractor):
- _VALID_URL = r'https?://tvcast\.naver\.com/v/(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:m\.)?tvcast\.naver\.com/v/(?P<id>\d+)'
_TEST = {
u'url': u'http://tvcast.naver.com/v/81652',
--- /dev/null
+import re
+
+from .common import InfoExtractor
+from ..utils import month_by_name
+
+
+class NDTVIE(InfoExtractor):
+ _VALID_URL = r'^https?://(?:www\.)?ndtv\.com/video/player/[^/]*/[^/]*/(?P<id>[a-z0-9]+)'
+
+ _TEST = {
+ u"url": u"http://www.ndtv.com/video/player/news/ndtv-exclusive-don-t-need-character-certificate-from-rahul-gandhi-says-arvind-kejriwal/300710",
+ u"file": u"300710.mp4",
+ u"md5": u"39f992dbe5fb531c395d8bbedb1e5e88",
+ u"info_dict": {
+ u"title": u"NDTV exclusive: Don't need character certificate from Rahul Gandhi, says Arvind Kejriwal",
+ u"description": u"In an exclusive interview to NDTV, Aam Aadmi Party's Arvind Kejriwal says it makes no difference to him that Rahul Gandhi said the Congress needs to learn from his party.",
+ u"upload_date": u"20131208",
+ u"duration": 1327,
+ u"thumbnail": u"http://i.ndtvimg.com/video/images/vod/medium/2013-12/big_300710_1386518307.jpg",
+ },
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+
+ webpage = self._download_webpage(url, video_id)
+
+ filename = self._search_regex(
+ r"__filename='([^']+)'", webpage, u'video filename')
+ video_url = (u'http://bitcast-b.bitgravity.com/ndtvod/23372/ndtv/%s' %
+ filename)
+
+ duration_str = filename = self._search_regex(
+ r"__duration='([^']+)'", webpage, u'duration', fatal=False)
+ duration = None if duration_str is None else int(duration_str)
+
+ date_m = re.search(r'''(?x)
+ <p\s+class="vod_dateline">\s*
+ Published\s+On:\s*
+ (?P<monthname>[A-Za-z]+)\s+(?P<day>[0-9]+),\s*(?P<year>[0-9]+)
+ ''', webpage)
+ upload_date = None
+ assert date_m
+ if date_m is not None:
+ month = month_by_name(date_m.group('monthname'))
+ if month is not None:
+ upload_date = '%s%02d%02d' % (
+ date_m.group('year'), month, int(date_m.group('day')))
+
+ description = self._og_search_description(webpage)
+ READ_MORE = u' (Read more)'
+ if description.endswith(READ_MORE):
+ description = description[:-len(READ_MORE)]
+
+ return {
+ 'id': video_id,
+ 'url': video_url,
+ 'title': self._og_search_title(webpage),
+ 'description': description,
+ 'thumbnail': self._og_search_thumbnail(webpage),
+ 'duration': duration,
+ 'upload_date': upload_date,
+ }
--- /dev/null
+import json
+import re
+
+from .common import InfoExtractor
+
+
+class NineGagIE(InfoExtractor):
+ IE_NAME = '9gag'
+ _VALID_URL = r'^https?://(?:www\.)?9gag\.tv/v/(?P<id>[0-9]+)'
+
+ _TEST = {
+ u"url": u"http://9gag.tv/v/1912",
+ u"file": u"1912.mp4",
+ u"info_dict": {
+ u"description": u"This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
+ u"title": u"\"People Are Awesome 2013\" Is Absolutely Awesome"
+ },
+ u'add_ie': [u'Youtube']
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+
+ webpage = self._download_webpage(url, video_id)
+ data_json = self._html_search_regex(r'''(?x)
+ <div\s*id="tv-video"\s*data-video-source="youtube"\s*
+ data-video-meta="([^"]+)"''', webpage, u'video metadata')
+
+ data = json.loads(data_json)
+
+ return {
+ '_type': 'url_transparent',
+ 'url': data['youtubeVideoId'],
+ 'ie_key': 'Youtube',
+ 'id': video_id,
+ 'title': data['title'],
+ 'description': data['description'],
+ 'view_count': int(data['view_count']),
+ 'like_count': int(data['statistic']['like']),
+ 'dislike_count': int(data['statistic']['dislike']),
+ 'thumbnail': data['thumbnail_url'],
+ }
def _url_for_embed_code(embed_code):
return 'http://player.ooyala.com/player.js?embedCode=%s' % embed_code
+ @classmethod
+ def _build_url_result(cls, embed_code):
+ return cls.url_result(cls._url_for_embed_code(embed_code),
+ ie=cls.ie_key())
+
def _extract_result(self, info, more_info):
return {'id': info['embedCode'],
'ext': 'mp4',
)
class ORFIE(InfoExtractor):
- _VALID_URL = r'https?://tvthek.orf.at/(programs/.+?/episodes|topics/.+?)/(?P<id>\d+)'
+ _VALID_URL = r'https?://tvthek\.orf\.at/(programs/.+?/episodes|topics/.+?)/(?P<id>\d+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
class PBSIE(InfoExtractor):
- _VALID_URL = r'https?://video.pbs.org/video/(?P<id>\d+)/?'
+ _VALID_URL = r'https?://video\.pbs\.org/video/(?P<id>\d+)/?'
_TEST = {
u'url': u'http://video.pbs.org/video/2365006249/',
--- /dev/null
+import re
+
+from .common import InfoExtractor
+from ..utils import compat_urllib_parse
+
+
+class PornHdIE(InfoExtractor):
+ _VALID_URL = r'(?:http://)?(?:www\.)?pornhd\.com/videos/(?P<video_id>[0-9]+)/(?P<video_title>.+)'
+ _TEST = {
+ u'url': u'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
+ u'file': u'1962.flv',
+ u'md5': u'35272469887dca97abd30abecc6cdf75',
+ u'info_dict': {
+ u"title": u"sierra-day-gets-his-cum-all-over-herself-hd-porn-video",
+ u"age_limit": 18,
+ }
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+
+ video_id = mobj.group('video_id')
+ video_title = mobj.group('video_title')
+
+ webpage = self._download_webpage(url, video_id)
+
+ video_url = self._html_search_regex(
+ r'&hd=(http.+?)&', webpage, u'video URL')
+ video_url = compat_urllib_parse.unquote(video_url)
+ age_limit = 18
+
+ return {
+ 'id': video_id,
+ 'url': video_url,
+ 'ext': 'flv',
+ 'title': video_title,
+ 'age_limit': age_limit,
+ }
)
class PornHubIE(InfoExtractor):
- _VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>pornhub\.com/view_video\.php\?viewkey=(?P<videoid>[0-9]+))'
+ _VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>pornhub\.com/view_video\.php\?viewkey=(?P<videoid>[0-9a-f]+))'
_TEST = {
u'url': u'http://www.pornhub.com/view_video.php?viewkey=648719015',
u'file': u'648719015.mp4',
--- /dev/null
+import re
+import os
+
+from .common import InfoExtractor
+
+
+class PyvideoIE(InfoExtractor):
+ _VALID_URL = r'(?:http://)?(?:www\.)?pyvideo\.org/video/(?P<id>\d+)/(.*)'
+ _TESTS = [{
+ u'url': u'http://pyvideo.org/video/1737/become-a-logging-expert-in-30-minutes',
+ u'file': u'24_4WWkSmNo.mp4',
+ u'md5': u'de317418c8bc76b1fd8633e4f32acbc6',
+ u'info_dict': {
+ u"title": u"Become a logging expert in 30 minutes",
+ u"description": u"md5:9665350d466c67fb5b1598de379021f7",
+ u"upload_date": u"20130320",
+ u"uploader": u"NextDayVideo",
+ u"uploader_id": u"NextDayVideo",
+ },
+ u'add_ie': ['Youtube'],
+ },
+ {
+ u'url': u'http://pyvideo.org/video/2542/gloriajw-spotifywitherikbernhardsson182m4v',
+ u'md5': u'5fe1c7e0a8aa5570330784c847ff6d12',
+ u'info_dict': {
+ u'id': u'2542',
+ u'ext': u'm4v',
+ u'title': u'Gloriajw-SpotifyWithErikBernhardsson182',
+ },
+ },
+ ]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ webpage = self._download_webpage(url, video_id)
+ m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', webpage)
+
+ if m_youtube is not None:
+ return self.url_result(m_youtube.group(1), 'Youtube')
+
+ title = self._html_search_regex(r'<div class="section">.*?<h3>([^>]+?)</h3>',
+ webpage, u'title', flags=re.DOTALL)
+ video_url = self._search_regex([r'<source src="(.*?)"',
+ r'<dt>Download</dt>.*?<a href="(.+?)"'],
+ webpage, u'video url', flags=re.DOTALL)
+ return {
+ 'id': video_id,
+ 'title': os.path.splitext(title)[0],
+ 'url': video_url,
+ }
--- /dev/null
+# coding: utf-8
+import re
+
+from .common import InfoExtractor
+
+
+class RadioFranceIE(InfoExtractor):
+ _VALID_URL = r'^https?://maison\.radiofrance\.fr/radiovisions/(?P<id>[^?#]+)'
+ IE_NAME = u'radiofrance'
+
+ _TEST = {
+ u'url': u'http://maison.radiofrance.fr/radiovisions/one-one',
+ u'file': u'one-one.ogg',
+ u'md5': u'bdbb28ace95ed0e04faab32ba3160daf',
+ u'info_dict': {
+ u"title": u"One to one",
+ u"description": u"Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
+ u"uploader": u"Thomas Hercouët",
+ },
+ }
+
+ def _real_extract(self, url):
+ m = re.match(self._VALID_URL, url)
+ video_id = m.group('id')
+
+ webpage = self._download_webpage(url, video_id)
+ title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, u'title')
+ description = self._html_search_regex(
+ r'<div class="bloc_page_wrapper"><div class="text">(.*?)</div>',
+ webpage, u'description', fatal=False)
+ uploader = self._html_search_regex(
+ r'<div class="credit"> © (.*?)</div>',
+ webpage, u'uploader', fatal=False)
+
+ formats_str = self._html_search_regex(
+ r'class="jp-jplayer[^"]*" data-source="([^"]+)">',
+ webpage, u'audio URLs')
+ formats = [
+ {
+ 'format_id': fm[0],
+ 'url': fm[1],
+ 'vcodec': 'none',
+ }
+ for fm in
+ re.findall(r"([a-z0-9]+)\s*:\s*'([^']+)'", formats_str)
+ ]
+ # No sorting, we don't know any more about these formats
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'formats': formats,
+ 'description': description,
+ 'uploader': uploader,
+ }
ExtractorError,
)
+
class RTLnowIE(InfoExtractor):
"""Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
- _VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl-now\.rtl\.de/|rtl2now\.rtl2\.de/|(?:www\.)?voxnow\.de/|(?:www\.)?rtlnitronow\.de/|(?:www\.)?superrtlnow\.de/|(?:www\.)?n-tvnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
+ _VALID_URL = r'(?:http://)?(?P<url>(?P<domain>rtl-now\.rtl\.de|rtl2now\.rtl2\.de|(?:www\.)?voxnow\.de|(?:www\.)?rtlnitronow\.de|(?:www\.)?superrtlnow\.de|(?:www\.)?n-tvnow\.de)/+[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
_TESTS = [{
u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
u'file': u'90419.flv',
u'info_dict': {
- u'upload_date': u'20070416',
+ u'upload_date': u'20070416',
u'title': u'Ahornallee - Folge 1 - Der Einzug',
u'description': u'Folge 1 - Der Einzug',
},
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
- video_page_url = u'http://' + mobj.group('base_url')
+ video_page_url = u'http://' + mobj.group('domain') + u'/'
video_id = mobj.group(u'video_id')
webpage = self._download_webpage(webpage_url, video_id)
class RutubeIE(InfoExtractor):
- _VALID_URL = r'https?://rutube.ru/video/(?P<long_id>\w+)'
+ _VALID_URL = r'https?://rutube\.ru/video/(?P<long_id>\w+)'
_TEST = {
u'url': u'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
class SlashdotIE(InfoExtractor):
- _VALID_URL = r'https?://tv.slashdot.org/video/\?embed=(?P<id>.*?)(&|$)'
+ _VALID_URL = r'https?://tv\.slashdot\.org/video/\?embed=(?P<id>.*?)(&|$)'
_TEST = {
u'add_ie': ['Ooyala'],
# encoding: utf-8
+import os.path
import re
import json
import hashlib
+import uuid
from .common import InfoExtractor
from ..utils import (
- determine_ext,
- ExtractorError
+ compat_urllib_parse,
+ compat_urllib_request,
+ ExtractorError,
+ url_basename,
)
# We will extract some from the video web page instead
video_page_url = 'http://' + mobj.group('url')
video_page = self._download_webpage(video_page_url, video_id, u'Downloading video page')
-
+
+ # Warning if video is unavailable
+ warning = self._html_search_regex(
+ r'<div class="videoUnModer">(.*?)</div>', video_page,
+ u'warning messagef', default=None)
+ if warning is not None:
+ self._downloader.report_warning(
+ u'Video %s may not be available; smotri said: %s ' %
+ (video_id, warning))
+
# Adult content
if re.search(u'EroConfirmText">', video_page) is not None:
self.report_age_confirmation()
# Extract the rest of meta data
video_title = self._search_meta(u'name', video_page, u'title')
if not video_title:
- video_title = video_url.rsplit('/', 1)[-1]
+ video_title = os.path.splitext(url_basename(video_url))[0]
video_description = self._search_meta(u'description', video_page)
END_TEXT = u' на сайте Smotri.com'
- if video_description.endswith(END_TEXT):
+ if video_description and video_description.endswith(END_TEXT):
video_description = video_description[:-len(END_TEXT)]
START_TEXT = u'Смотреть онлайн ролик '
- if video_description.startswith(START_TEXT):
+ if video_description and video_description.startswith(START_TEXT):
video_description = video_description[len(START_TEXT):]
video_thumbnail = self._search_meta(u'thumbnail', video_page)
upload_date_str = self._search_meta(u'uploadDate', video_page, u'upload date')
- upload_date_m = re.search(r'(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})T', upload_date_str)
- video_upload_date = (
- (
- upload_date_m.group('year') +
- upload_date_m.group('month') +
- upload_date_m.group('day')
+ if upload_date_str:
+ upload_date_m = re.search(r'(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})T', upload_date_str)
+ video_upload_date = (
+ (
+ upload_date_m.group('year') +
+ upload_date_m.group('month') +
+ upload_date_m.group('day')
+ )
+ if upload_date_m else None
)
- if upload_date_m else None
- )
+ else:
+ video_upload_date = None
duration_str = self._search_meta(u'duration', video_page)
- duration_m = re.search(r'T(?P<hours>[0-9]{2})H(?P<minutes>[0-9]{2})M(?P<seconds>[0-9]{2})S', duration_str)
- video_duration = (
- (
- (int(duration_m.group('hours')) * 60 * 60) +
- (int(duration_m.group('minutes')) * 60) +
- int(duration_m.group('seconds'))
+ if duration_str:
+ duration_m = re.search(r'T(?P<hours>[0-9]{2})H(?P<minutes>[0-9]{2})M(?P<seconds>[0-9]{2})S', duration_str)
+ video_duration = (
+ (
+ (int(duration_m.group('hours')) * 60 * 60) +
+ (int(duration_m.group('minutes')) * 60) +
+ int(duration_m.group('seconds'))
+ )
+ if duration_m else None
)
- if duration_m else None
- )
+ else:
+ video_duration = None
video_uploader = self._html_search_regex(
u'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info[^"]+">(.*?)</a>',
'uploader': video_uploader,
'upload_date': video_upload_date,
'uploader_id': video_uploader_id,
- 'video_duration': video_duration,
+ 'duration': video_duration,
'view_count': video_view_count,
'age_limit': 18 if adult_content else 0,
'video_page_url': video_page_url
u'user nickname')
return self.playlist_result(entries, user_id, user_nickname)
+
+
+class SmotriBroadcastIE(InfoExtractor):
+ IE_DESC = u'Smotri.com broadcasts'
+ IE_NAME = u'smotri:broadcast'
+ _VALID_URL = r'^https?://(?:www\.)?(?P<url>smotri\.com/live/(?P<broadcastid>[^/]+))/?.*'
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ broadcast_id = mobj.group('broadcastid')
+
+ broadcast_url = 'http://' + mobj.group('url')
+ broadcast_page = self._download_webpage(broadcast_url, broadcast_id, u'Downloading broadcast page')
+
+ if re.search(u'>Режиссер с логином <br/>"%s"<br/> <span>не существует<' % broadcast_id, broadcast_page) is not None:
+ raise ExtractorError(u'Broadcast %s does not exist' % broadcast_id, expected=True)
+
+ # Adult content
+ if re.search(u'EroConfirmText">', broadcast_page) is not None:
+
+ (username, password) = self._get_login_info()
+ if username is None:
+ raise ExtractorError(u'Erotic broadcasts allowed only for registered users, '
+ u'use --username and --password options to provide account credentials.', expected=True)
+
+ # Log in
+ login_form_strs = {
+ u'login-hint53': '1',
+ u'confirm_erotic': '1',
+ u'login': username,
+ u'password': password,
+ }
+ # Convert to UTF-8 *before* urlencode because Python 2.x's urlencode
+ # chokes on unicode
+ login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k,v in login_form_strs.items())
+ login_data = compat_urllib_parse.urlencode(login_form).encode('utf-8')
+ login_url = broadcast_url + '/?no_redirect=1'
+ request = compat_urllib_request.Request(login_url, login_data)
+ request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+ broadcast_page = self._download_webpage(
+ request, broadcast_id, note=u'Logging in and confirming age')
+
+ if re.search(u'>Неверный логин или пароль<', broadcast_page) is not None:
+ raise ExtractorError(u'Unable to log in: bad username or password', expected=True)
+
+ adult_content = True
+ else:
+ adult_content = False
+
+ ticket = self._html_search_regex(
+ u'window\.broadcast_control\.addFlashVar\\(\'file\', \'([^\']+)\'\\);',
+ broadcast_page, u'broadcast ticket')
+
+ url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
+
+ broadcast_password = self._downloader.params.get('videopassword', None)
+ if broadcast_password:
+ url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()
+
+ broadcast_json_page = self._download_webpage(url, broadcast_id, u'Downloading broadcast JSON')
+
+ try:
+ broadcast_json = json.loads(broadcast_json_page)
+
+ protected_broadcast = broadcast_json['_pass_protected'] == 1
+ if protected_broadcast and not broadcast_password:
+ raise ExtractorError(u'This broadcast is protected by a password, use the --video-password option', expected=True)
+
+ broadcast_offline = broadcast_json['is_play'] == 0
+ if broadcast_offline:
+ raise ExtractorError(u'Broadcast %s is offline' % broadcast_id, expected=True)
+
+ rtmp_url = broadcast_json['_server']
+ if not rtmp_url.startswith('rtmp://'):
+ raise ExtractorError(u'Unexpected broadcast rtmp URL')
+
+ broadcast_playpath = broadcast_json['_streamName']
+ broadcast_thumbnail = broadcast_json['_imgURL']
+ broadcast_title = broadcast_json['title']
+ broadcast_description = broadcast_json['description']
+ broadcaster_nick = broadcast_json['nick']
+ broadcaster_login = broadcast_json['login']
+ rtmp_conn = 'S:%s' % uuid.uuid4().hex
+ except KeyError:
+ if protected_broadcast:
+ raise ExtractorError(u'Bad broadcast password', expected=True)
+ raise ExtractorError(u'Unexpected broadcast JSON')
+
+ return {
+ 'id': broadcast_id,
+ 'url': rtmp_url,
+ 'title': broadcast_title,
+ 'thumbnail': broadcast_thumbnail,
+ 'description': broadcast_description,
+ 'uploader': broadcaster_nick,
+ 'uploader_id': broadcaster_login,
+ 'age_limit': 18 if adult_content else 0,
+ 'ext': 'flv',
+ 'play_path': broadcast_playpath,
+ 'rtmp_live': True,
+ 'rtmp_conn': rtmp_conn
+ }
+# encoding: utf-8
import json
import re
import itertools
"""
_VALID_URL = r'''^(?:https?://)?
- (?:(?:(?:www\.)?soundcloud\.com/([\w\d-]+)/([\w\d-]+)/?(?:[?].*)?$)
+ (?:(?:(?:www\.|m\.)?soundcloud\.com/
+ (?P<uploader>[\w\d-]+)/
+ (?!sets/)(?P<title>[\w\d-]+)/?
+ (?P<token>[^?]+?)?(?:[?].*)?$)
|(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+))
- |(?P<widget>w.soundcloud.com/player/?.*?url=.*)
+ |(?P<widget>w\.soundcloud\.com/player/?.*?url=.*)
)
'''
IE_NAME = u'soundcloud'
u'skip_download': True,
},
},
+ # private link
+ {
+ u'url': u'https://soundcloud.com/jaimemf/youtube-dl-test-video-a-y-baw/s-8Pjrp',
+ u'md5': u'aa0dd32bfea9b0c5ef4f02aacd080604',
+ u'info_dict': {
+ u'id': u'123998367',
+ u'ext': u'mp3',
+ u'title': u'Youtube - Dl Test Video \'\' Ä↭',
+ u'uploader': u'jaimeMF',
+ u'description': u'test chars: \"\'/\\ä↭',
+ u'upload_date': u'20131209',
+ },
+ },
+ # downloadable song
+ {
+ u'url': u'https://soundcloud.com/simgretina/just-your-problem-baby-1',
+ u'md5': u'56a8b69568acaa967b4c49f9d1d52d19',
+ u'info_dict': {
+ u'id': u'105614606',
+ u'ext': u'wav',
+ u'title': u'Just Your Problem Baby (Acapella)',
+ u'description': u'Vocals',
+ u'uploader': u'Sim Gretina',
+ u'upload_date': u'20130815',
+ },
+ },
]
_CLIENT_ID = 'b45b1aa10f1ac2941910a7f0d10f8e28'
def _resolv_url(cls, url):
return 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
- def _extract_info_dict(self, info, full_title=None, quiet=False):
+ def _extract_info_dict(self, info, full_title=None, quiet=False, secret_token=None):
track_id = compat_str(info['id'])
name = full_title or track_id
if quiet:
thumbnail = info['artwork_url']
if thumbnail is not None:
thumbnail = thumbnail.replace('-large', '-t500x500')
- ext = info.get('original_format', u'mp3')
+ ext = u'mp3'
result = {
'id': track_id,
'uploader': info['user']['username'],
track_id, self._CLIENT_ID))
result['formats'] = [{
'format_id': 'download',
- 'ext': ext,
+ 'ext': info.get('original_format', u'mp3'),
'url': format_url,
'vcodec': 'none',
}]
else:
# We have to retrieve the url
+ streams_url = ('http://api.soundcloud.com/i1/tracks/{0}/streams?'
+ 'client_id={1}&secret_token={2}'.format(track_id, self._IPHONE_CLIENT_ID, secret_token))
stream_json = self._download_webpage(
- 'http://api.soundcloud.com/i1/tracks/{0}/streams?client_id={1}'.format(track_id, self._IPHONE_CLIENT_ID),
+ streams_url,
track_id, u'Downloading track url')
formats = []
raise ExtractorError(u'Invalid URL: %s' % url)
track_id = mobj.group('track_id')
+ token = None
if track_id is not None:
info_json_url = 'http://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
full_title = track_id
return self.url_result(query['url'][0], ie='Soundcloud')
else:
# extract uploader (which is in the url)
- uploader = mobj.group(1)
+ uploader = mobj.group('uploader')
# extract simple title (uploader + slug of song title)
- slug_title = mobj.group(2)
- full_title = '%s/%s' % (uploader, slug_title)
+ slug_title = mobj.group('title')
+ token = mobj.group('token')
+ full_title = resolve_title = '%s/%s' % (uploader, slug_title)
+ if token:
+ resolve_title += '/%s' % token
self.report_resolve(full_title)
- url = 'http://soundcloud.com/%s/%s' % (uploader, slug_title)
+ url = 'http://soundcloud.com/%s' % resolve_title
info_json_url = self._resolv_url(url)
info_json = self._download_webpage(info_json_url, full_title, u'Downloading info JSON')
info = json.loads(info_json)
- return self._extract_info_dict(info, full_title)
+ return self._extract_info_dict(info, full_title, secret_token=token)
class SoundcloudSetIE(SoundcloudIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?soundcloud\.com/([\w\d-]+)/sets/([\w\d-]+)(?:[?].*)?$'
class SoundcloudUserIE(SoundcloudIE):
- _VALID_URL = r'https?://(www\.)?soundcloud.com/(?P<user>[^/]+)(/?(tracks/)?)?(\?.*)?$'
+ _VALID_URL = r'https?://(www\.)?soundcloud\.com/(?P<user>[^/]+)(/?(tracks/)?)?(\?.*)?$'
IE_NAME = u'soundcloud:user'
# it's in tests/test_playlists.py
class SpaceIE(InfoExtractor):
- _VALID_URL = r'https?://www\.space\.com/\d+-(?P<title>[^/\.\?]*?)-video.html'
+ _VALID_URL = r'https?://www\.space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html'
_TEST = {
u'add_ie': ['Brightcove'],
u'url': u'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
import re
-import socket
-import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
- compat_http_client,
- compat_str,
- compat_urllib_error,
- compat_urllib_request,
-
ExtractorError,
orderedSet,
unescapeHTML,
class StanfordOpenClassroomIE(InfoExtractor):
IE_NAME = u'stanfordoc'
IE_DESC = u'Stanford Open ClassRoom'
- _VALID_URL = r'^(?:https?://)?openclassroom.stanford.edu(?P<path>/?|(/MainFolder/(?:HomePage|CoursePage|VideoPage)\.php([?]course=(?P<course>[^&]+)(&video=(?P<video>[^&]+))?(&.*)?)?))$'
+ _VALID_URL = r'^(?:https?://)?openclassroom\.stanford\.edu(?P<path>/?|(/MainFolder/(?:HomePage|CoursePage|VideoPage)\.php([?]course=(?P<course>[^&]+)(&video=(?P<video>[^&]+))?(&.*)?)?))$'
_TEST = {
u'url': u'http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=PracticalUnix&video=intro-environment&speed=100',
u'file': u'PracticalUnix_intro-environment.mp4',
self.report_extraction(info['id'])
baseUrl = 'http://openclassroom.stanford.edu/MainFolder/courses/' + course + '/videos/'
xmlUrl = baseUrl + video + '.xml'
- try:
- metaXml = compat_urllib_request.urlopen(xmlUrl).read()
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- raise ExtractorError(u'Unable to download video info XML: %s' % compat_str(err))
- mdoc = xml.etree.ElementTree.fromstring(metaXml)
+ mdoc = self._download_xml(xmlUrl, info['id'])
try:
info['title'] = mdoc.findall('./title')[0].text
info['url'] = baseUrl + mdoc.findall('./videoFile')[0].text
'upload_date': None,
}
- self.report_download_webpage(info['id'])
rootURL = 'http://openclassroom.stanford.edu/MainFolder/HomePage.php'
- try:
- rootpage = compat_urllib_request.urlopen(rootURL).read()
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- raise ExtractorError(u'Unable to download course info page: ' + compat_str(err))
+ rootpage = self._download_webpage(rootURL, info['id'],
+ errnote=u'Unable to download course info page')
info['title'] = info['id']
class TF1IE(InfoExtractor):
"""TF1 uses the wat.tv player."""
- _VALID_URL = r'http://videos.tf1.fr/.*-(.*?).html'
+ _VALID_URL = r'http://videos\.tf1\.fr/.*-(.*?)\.html'
_TEST = {
u'url': u'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
u'file': u'10635995.mp4',
--- /dev/null
+import re
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+ xpath_with_ns,
+)
+
+_x = lambda p: xpath_with_ns(p, {'smil': 'http://www.w3.org/2005/SMIL21/Language'})
+
+
+class ThePlatformIE(InfoExtractor):
+ _VALID_URL = r'(?:https?://link\.theplatform\.com/s/[^/]+/|theplatform:)(?P<id>[^/\?]+)'
+
+ _TEST = {
+ # from http://www.metacafe.com/watch/cb-e9I_cZgTgIPd/blackberrys_big_bold_z30/
+ u'url': u'http://link.theplatform.com/s/dJ5BDC/e9I_cZgTgIPd/meta.smil?format=smil&Tracking=true&mbr=true',
+ u'info_dict': {
+ u'id': u'e9I_cZgTgIPd',
+ u'ext': u'flv',
+ u'title': u'Blackberry\'s big, bold Z30',
+ u'description': u'The Z30 is Blackberry\'s biggest, baddest mobile messaging device yet.',
+ u'duration': 247,
+ },
+ u'params': {
+ # rtmp download
+ u'skip_download': True,
+ },
+ }
+
+ def _get_info(self, video_id):
+ smil_url = ('http://link.theplatform.com/s/dJ5BDC/{0}/meta.smil?'
+ 'format=smil&mbr=true'.format(video_id))
+ meta = self._download_xml(smil_url, video_id)
+
+ try:
+ error_msg = next(
+ n.attrib['abstract']
+ for n in meta.findall(_x('.//smil:ref'))
+ if n.attrib.get('title') == u'Geographic Restriction')
+ except StopIteration:
+ pass
+ else:
+ raise ExtractorError(error_msg, expected=True)
+
+ info_url = 'http://link.theplatform.com/s/dJ5BDC/{0}?format=preview'.format(video_id)
+ info_json = self._download_webpage(info_url, video_id)
+ info = json.loads(info_json)
+
+ head = meta.find(_x('smil:head'))
+ body = meta.find(_x('smil:body'))
+ base_url = head.find(_x('smil:meta')).attrib['base']
+ switch = body.find(_x('smil:switch'))
+ formats = []
+ for f in switch.findall(_x('smil:video')):
+ attr = f.attrib
+ formats.append({
+ 'url': base_url,
+ 'play_path': 'mp4:' + attr['src'],
+ 'ext': 'flv',
+ 'width': int(attr['width']),
+ 'height': int(attr['height']),
+ 'vbr': int(attr['system-bitrate']),
+ })
+ formats.sort(key=lambda f: (f['height'], f['width'], f['vbr']))
+
+ return {
+ 'id': video_id,
+ 'title': info['title'],
+ 'formats': formats,
+ 'description': info['description'],
+ 'thumbnail': info['defaultThumbnailUrl'],
+ 'duration': info['duration']//1000,
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ return self._get_info(video_id)
from .common import InfoExtractor
class UnistraIE(InfoExtractor):
- _VALID_URL = r'http://utv.unistra.fr/(?:index|video).php\?id_video\=(\d+)'
+ _VALID_URL = r'http://utv\.unistra\.fr/(?:index|video)\.php\?id_video\=(\d+)'
_TEST = {
u'url': u'http://utv.unistra.fr/video.php?id_video=154',
_TEST = {
u'url': u'http://vbox7.com/play:249bb972c2',
u'file': u'249bb972c2.flv',
- u'md5': u'9c70d6d956f888bdc08c124acc120cfe',
+ u'md5': u'99f65c0c9ef9b682b97313e052734c3f',
u'info_dict': {
u"title": u"\u0421\u043c\u044f\u0445! \u0427\u0443\u0434\u043e - \u0447\u0438\u0441\u0442 \u0437\u0430 \u0441\u0435\u043a\u0443\u043d\u0434\u0438 - \u0421\u043a\u0440\u0438\u0442\u0430 \u043a\u0430\u043c\u0435\u0440\u0430"
}
)
class VeeHDIE(InfoExtractor):
- _VALID_URL = r'https?://veehd.com/video/(?P<id>\d+)'
+ _VALID_URL = r'https?://veehd\.com/video/(?P<id>\d+)'
_TEST = {
u'url': u'http://veehd.com/video/4686958',
Accepts urls from vevo.com or in the format 'vevo:{id}'
(currently used by MTVIE)
"""
- _VALID_URL = r'((http://www.vevo.com/watch/.*?/.*?/)|(vevo:))(?P<id>.*?)(\?|$)'
+ _VALID_URL = r'''(?x)
+ (?:https?://www\.vevo\.com/watch/(?:[^/]+/[^/]+/)?|
+ https?://cache\.vevo\.com/m/html/embed\.html\?video=|
+ https?://videoplayer\.vevo\.com/embed/embedded\?videoId=|
+ vevo:)
+ (?P<id>[^&?#]+)'''
_TESTS = [{
u'url': u'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
u'file': u'GB1101300280.mp4',
u"upload_date": u"20130624",
u"uploader": u"Hurts",
u"title": u"Somebody to Die For",
- u"duration": 230,
+ u"duration": 230.12,
u"width": 1920,
u"height": 1080,
}
class ViceIE(InfoExtractor):
- _VALID_URL = r'http://www.vice.com/.*?/(?P<name>.+)'
+ _VALID_URL = r'http://www\.vice\.com/.*?/(?P<name>.+)'
_TEST = {
u'url': u'http://www.vice.com/Fringes/cowboy-capitalists-part-1',
import re
from .common import InfoExtractor
-from ..utils import (
- determine_ext,
-)
class ViddlerIE(InfoExtractor):
- _VALID_URL = r'(?P<domain>https?://(?:www\.)?viddler.com)/(?:v|embed|player)/(?P<id>[a-z0-9]+)'
+ _VALID_URL = r'(?P<domain>https?://(?:www\.)?viddler\.com)/(?:v|embed|player)/(?P<id>[a-z0-9]+)'
_TEST = {
u"url": u"http://www.viddler.com/v/43903784",
u'file': u'43903784.mp4',
)
class VideofyMeIE(InfoExtractor):
- _VALID_URL = r'https?://(www.videofy.me/.+?|p.videofy.me/v)/(?P<id>\d+)(&|#|$)'
+ _VALID_URL = r'https?://(www\.videofy\.me/.+?|p\.videofy\.me/v)/(?P<id>\d+)(&|#|$)'
IE_NAME = u'videofy.me'
_TEST = {
u'params': {
u'skip_download': True,
},
+ u'skip': u'Test file has been deleted.',
}
def _real_extract(self, url):
unsmuggle_url,
)
+
class VimeoIE(InfoExtractor):
"""Information extractor for vimeo.com."""
# _VALID_URL matches Vimeo URLs
- _VALID_URL = r'(?P<proto>https?://)?(?:(?:www|(?P<player>player))\.)?vimeo(?P<pro>pro)?\.com/(?:(?:(?:groups|album)/[^/]+)|(?:.*?)/)?(?P<direct_link>play_redirect_hls\?clip_id=)?(?:videos?/)?(?P<id>[0-9]+)/?(?:[?].*)?(?:#.*)?$'
+ _VALID_URL = r'''(?x)
+ (?P<proto>https?://)?
+ (?:(?:www|(?P<player>player))\.)?
+ vimeo(?P<pro>pro)?\.com/
+ (?:.*?/)?
+ (?:(?:play_redirect_hls|moogaloop\.swf)\?clip_id=)?
+ (?:videos?/)?
+ (?P<id>[0-9]+)
+ /?(?:[?&].*)?(?:[#].*)?$'''
_NETRC_MACHINE = 'vimeo'
IE_NAME = u'vimeo'
_TESTS = [
def _real_initialize(self):
self._login()
- def _real_extract(self, url, new_video=True):
+ def _real_extract(self, url):
url, data = unsmuggle_url(url)
headers = std_headers
if data is not None:
config = json.loads(config_json)
except RegexNotFoundError:
# For pro videos or player.vimeo.com urls
- config = self._search_regex([r' = {config:({.+?}),assets:', r'(?:c|b)=({.+?});'],
- webpage, u'info section', flags=re.DOTALL)
+ # We try to find out to which variable is assigned the config dic
+ m_variable_name = re.search('(\w)\.video\.id', webpage)
+ if m_variable_name is not None:
+ config_re = r'%s=({.+?});' % re.escape(m_variable_name.group(1))
+ else:
+ config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
+ config = self._search_regex(config_re, webpage, u'info section',
+ flags=re.DOTALL)
config = json.loads(config)
except Exception as e:
if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
if mobj is not None:
video_upload_date = mobj.group(1) + mobj.group(2) + mobj.group(3)
+ try:
+ view_count = int(self._search_regex(r'UserPlays:(\d+)', webpage, u'view count'))
+ like_count = int(self._search_regex(r'UserLikes:(\d+)', webpage, u'like count'))
+ comment_count = int(self._search_regex(r'UserComments:(\d+)', webpage, u'comment count'))
+ except RegexNotFoundError:
+ # This info is only available in vimeo.com/{id} urls
+ view_count = None
+ like_count = None
+ comment_count = None
+
# Vimeo specific: extract request signature and timestamp
sig = config['request']['signature']
timestamp = config['request']['timestamp']
'description': video_description,
'formats': formats,
'webpage_url': url,
+ 'view_count': view_count,
+ 'like_count': like_count,
+ 'comment_count': comment_count,
}
_MORE_PAGES_INDICATOR = r'<a.+?rel="next"'
_TITLE_RE = r'<link rel="alternate"[^>]+?title="(.*?)"'
+ def _page_url(self, base_url, pagenum):
+ return '%s/videos/page:%d/' % (base_url, pagenum)
+
+ def _extract_list_title(self, webpage):
+ return self._html_search_regex(self._TITLE_RE, webpage, u'list title')
+
def _extract_videos(self, list_id, base_url):
video_ids = []
for pagenum in itertools.count(1):
webpage = self._download_webpage(
- '%s/videos/page:%d/' % (base_url, pagenum),list_id,
+ self._page_url(base_url, pagenum) ,list_id,
u'Downloading page %s' % pagenum)
video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
entries = [self.url_result('http://vimeo.com/%s' % video_id, 'Vimeo')
for video_id in video_ids]
- list_title = self._html_search_regex(self._TITLE_RE, webpage,
- u'list title')
return {'_type': 'playlist',
'id': list_id,
- 'title': list_title,
+ 'title': self._extract_list_title(webpage),
'entries': entries,
}
@classmethod
def suitable(cls, url):
- if VimeoChannelIE.suitable(url) or VimeoIE.suitable(url):
+ if VimeoChannelIE.suitable(url) or VimeoIE.suitable(url) or VimeoAlbumIE.suitable(url) or VimeoGroupsIE.suitable(url):
return False
return super(VimeoUserIE, cls).suitable(url)
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
return self._extract_videos(name, 'http://vimeo.com/%s' % name)
+
+
+class VimeoAlbumIE(VimeoChannelIE):
+ IE_NAME = u'vimeo:album'
+ _VALID_URL = r'(?:https?://)?vimeo.\com/album/(?P<id>\d+)'
+ _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
+
+ def _page_url(self, base_url, pagenum):
+ return '%s/page:%d/' % (base_url, pagenum)
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ album_id = mobj.group('id')
+ return self._extract_videos(album_id, 'http://vimeo.com/album/%s' % album_id)
+
+
+class VimeoGroupsIE(VimeoAlbumIE):
+ IE_NAME = u'vimeo:group'
+ _VALID_URL = r'(?:https?://)?vimeo.\com/groups/(?P<name>[^/]+)'
+
+ def _extract_list_title(self, webpage):
+ return self._og_search_title(webpage)
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ name = mobj.group('name')
+ return self._extract_videos(name, 'http://vimeo.com/groups/%s' % name)
class WatIE(InfoExtractor):
- _VALID_URL=r'http://www.wat.tv/.*-(?P<shortID>.*?)_.*?.html'
+ _VALID_URL=r'http://www\.wat\.tv/.*-(?P<shortID>.*?)_.*?\.html'
IE_NAME = 'wat.tv'
_TEST = {
u'url': u'http://www.wat.tv/video/world-war-philadelphia-vost-6bv55_2fjr7_.html',
u'file': u'deerfence.flv',
u'md5': u'8b215e2e0168c6081a1cf84b2846a2b5',
u'info_dict': {
- u"title": u"Watch Till End: Herd of deer jump over a fence."
+ u"title": u"Watch Till End: Herd of deer jump over a fence.",
+ u"description": u"These deer look as fluid as running water when they jump over this fence as a herd. This video is one that needs to be watched until the very end for the true majesty to be witnessed, but once it comes, it's sure to take your breath away.",
}
}
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
- title = self._search_regex(r'<meta name="description" content="(.+?)" />',webpage, 'video title')
- thumbnail_url = self._search_regex(r'<meta property="og\:image" content="(.+?)" />', webpage,'video thumbnail')
googleString = self._search_regex("googleCode = '(.*?)'", webpage, 'file url')
googleString = base64.b64decode(googleString).decode('ascii')
- final_url = self._search_regex('","(.*?)"', googleString,'final video url')
- ext = final_url.rpartition(u'.')[2]
-
- return [{
- 'id': video_id,
- 'url': final_url,
- 'ext': ext,
- 'title': title,
- 'thumbnail': thumbnail_url,
- }]
+ final_url = self._search_regex('","(.*?)"', googleString, u'final video url')
+ return {
+ 'id': video_id,
+ 'url': final_url,
+ 'title': self._og_search_title(webpage),
+ 'thumbnail': self._og_search_thumbnail(webpage),
+ 'description': self._og_search_description(webpage),
+ }
--- /dev/null
+import json
+import re
+
+from .common import InfoExtractor
+
+
+class WistiaIE(InfoExtractor):
+ _VALID_URL = r'^https?://(?:fast\.)?wistia\.net/embed/iframe/(?P<id>[a-z0-9]+)'
+
+ _TEST = {
+ u"url": u"http://fast.wistia.net/embed/iframe/sh7fpupwlt",
+ u"file": u"sh7fpupwlt.mov",
+ u"md5": u"cafeb56ec0c53c18c97405eecb3133df",
+ u"info_dict": {
+ u"title": u"cfh_resourceful_zdkh_final_1"
+ },
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+
+ webpage = self._download_webpage(url, video_id)
+ data_json = self._html_search_regex(
+ r'Wistia.iframeInit\((.*?), {}\);', webpage, u'video data')
+
+ data = json.loads(data_json)
+
+ formats = []
+ thumbnails = []
+ for atype, a in data['assets'].items():
+ if atype == 'still':
+ thumbnails.append({
+ 'url': a['url'],
+ 'resolution': '%dx%d' % (a['width'], a['height']),
+ })
+ continue
+ if atype == 'preview':
+ continue
+ formats.append({
+ 'format_id': atype,
+ 'url': a['url'],
+ 'width': a['width'],
+ 'height': a['height'],
+ 'filesize': a['size'],
+ 'ext': a['ext'],
+ })
+ formats.sort(key=lambda a: a['filesize'])
+
+ return {
+ 'id': video_id,
+ 'title': data['name'],
+ 'formats': formats,
+ 'thumbnails': thumbnails,
+ }
{
u'url': u'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
u'file': u'2221348.flv',
- u'md5': u'970a94178ca4118c5aa3aaea21211b81',
+ u'md5': u'e767b9475de189320f691f49c679c4c7',
u'info_dict': {
u"upload_date": u"20130914",
u"uploader_id": u"jojo747400",
return mobj.group('server')+'/key='+mobj.group('file')
def is_hd(webpage):
- return webpage.find('<div class=\'icon iconHD\'>') != -1
+ return webpage.find('<div class=\'icon iconHD\'') != -1
mobj = re.match(self._VALID_URL, url)
video_title = self._html_search_regex(r'<div class="p_5px[^>]*>([^<]+)', webpage, u'title')
video_uploader = self._html_search_regex(r'so_s\.addVariable\("owner_u", "([^"]+)', webpage, u'uploader', fatal=False)
- video_description = self._html_search_regex(r'<p class="video_description">([^<]+)', webpage, u'description', default=None)
+ video_description = self._html_search_regex(r'<p class="video_description">([^<]+)', webpage, u'description', fatal=False)
video_url= self._html_search_regex(r'var videoMp4 = "([^"]+)', webpage, u'video_url').replace('\\/', '/')
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
# The 'meta' field is not always in the video webpage, we request it
# from another page
long_id = info['id']
- return self._get_info(info['id'], video_id)
+ return self._get_info(long_id, video_id)
def _get_info(self, long_id, video_id):
query = ('SELECT * FROM yahoo.media.video.streams WHERE id="%s"'
class YouJizzIE(InfoExtractor):
- _VALID_URL = r'^(?:https?://)?(?:\w+\.)?youjizz\.com/videos/(?P<videoid>[^.]+).html$'
+ _VALID_URL = r'^(?:https?://)?(?:\w+\.)?youjizz\.com/videos/(?P<videoid>[^.]+)\.html$'
_TEST = {
u'url': u'http://www.youjizz.com/videos/zeichentrick-1-2189178.html',
u'file': u'2189178.flv',
import json
import os.path
import re
-import socket
import string
import struct
import traceback
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_chr,
- compat_http_client,
compat_parse_qs,
- compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
# If True it will raise an error if no login info is provided
_LOGIN_REQUIRED = False
- def report_lang(self):
- """Report attempt to set language."""
- self.to_screen(u'Setting language')
-
def _set_language(self):
- request = compat_urllib_request.Request(self._LANG_URL)
- try:
- self.report_lang()
- compat_urllib_request.urlopen(request).read()
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- self._downloader.report_warning(u'unable to set language: %s' % compat_str(err))
- return False
- return True
+ return bool(self._download_webpage(
+ self._LANG_URL, None,
+ note=u'Setting language', errnote='unable to set language',
+ fatal=False))
def _login(self):
(username, password) = self._get_login_info()
raise ExtractorError(u'No login info available, needed for using %s.' % self.IE_NAME, expected=True)
return False
- request = compat_urllib_request.Request(self._LOGIN_URL)
- try:
- login_page = compat_urllib_request.urlopen(request).read().decode('utf-8')
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- self._downloader.report_warning(u'unable to fetch login page: %s' % compat_str(err))
- return False
+ login_page = self._download_webpage(
+ self._LOGIN_URL, None,
+ note=u'Downloading login page',
+ errnote=u'unable to fetch login page', fatal=False)
+ if login_page is False:
+ return
galx = self._search_regex(r'(?s)<input.+?name="GALX".+?value="(.+?)"',
login_page, u'Login GALX parameter')
# chokes on unicode
login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k,v in login_form_strs.items())
login_data = compat_urllib_parse.urlencode(login_form).encode('ascii')
- request = compat_urllib_request.Request(self._LOGIN_URL, login_data)
- try:
- self.report_login()
- login_results = compat_urllib_request.urlopen(request).read().decode('utf-8')
- if re.search(r'(?i)<form[^>]* id="gaia_loginform"', login_results) is not None:
- self._downloader.report_warning(u'unable to log in: bad username or password')
- return False
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- self._downloader.report_warning(u'unable to log in: %s' % compat_str(err))
+
+ req = compat_urllib_request.Request(self._LOGIN_URL, login_data)
+ login_results = self._download_webpage(
+ req, None,
+ note=u'Logging in', errnote=u'unable to log in', fatal=False)
+ if login_results is False:
+ return False
+ if re.search(r'(?i)<form[^>]* id="gaia_loginform"', login_results) is not None:
+ self._downloader.report_warning(u'unable to log in: bad username or password')
return False
return True
def _confirm_age(self):
age_form = {
- 'next_url': '/',
- 'action_confirm': 'Confirm',
- }
- request = compat_urllib_request.Request(self._AGE_URL, compat_urllib_parse.urlencode(age_form))
- try:
- self.report_age_confirmation()
- compat_urllib_request.urlopen(request).read().decode('utf-8')
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- raise ExtractorError(u'Unable to confirm age: %s' % compat_str(err))
+ 'next_url': '/',
+ 'action_confirm': 'Confirm',
+ }
+ req = compat_urllib_request.Request(self._AGE_URL, compat_urllib_parse.urlencode(age_form))
+
+ self._download_webpage(
+ req, None,
+ note=u'Confirming age', errnote=u'Unable to confirm age')
return True
def _real_initialize(self):
super(YoutubeIE, self).__init__(*args, **kwargs)
self._player_cache = {}
- def report_video_webpage_download(self, video_id):
- """Report attempt to download video webpage."""
- self.to_screen(u'%s: Downloading video webpage' % video_id)
-
def report_video_info_webpage_download(self, video_id):
"""Report attempt to download video info webpage."""
self.to_screen(u'%s: Downloading video info webpage' % video_id)
video_id = self._extract_id(url)
# Get video webpage
- self.report_video_webpage_download(video_id)
url = 'https://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1' % video_id
- request = compat_urllib_request.Request(url)
- try:
- video_webpage_bytes = compat_urllib_request.urlopen(request).read()
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- raise ExtractorError(u'Unable to download video webpage: %s' % compat_str(err))
-
- video_webpage = video_webpage_bytes.decode('utf-8', 'ignore')
+ video_webpage = self._download_webpage(url, video_id)
# Attempt to extract SWF player URL
mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
else:
video_description = u''
+ def _extract_count(klass):
+ count = self._search_regex(
+ r'class="%s">([\d,]+)</span>' % re.escape(klass),
+ video_webpage, klass, default=None)
+ if count is not None:
+ return int(count.replace(',', ''))
+ return None
+ like_count = _extract_count(u'likes-count')
+ dislike_count = _extract_count(u'dislikes-count')
+
# subtitles
video_subtitles = self.extract_subtitles(video_id, video_webpage)
if 'length_seconds' not in video_info:
self._downloader.report_warning(u'unable to extract video duration')
- video_duration = ''
+ video_duration = None
else:
- video_duration = compat_urllib_parse.unquote_plus(video_info['length_seconds'][0])
+ video_duration = int(compat_urllib_parse.unquote_plus(video_info['length_seconds'][0]))
# annotations
video_annotations = None
'annotations': video_annotations,
'webpage_url': 'https://www.youtube.com/watch?v=%s' % video_id,
'view_count': view_count,
+ 'like_count': like_count,
+ 'dislike_count': dislike_count,
})
return results
\? (?:.*?&)*? (?:p|a|list)=
| p/
)
- ((?:PL|EC|UU|FL)?[0-9A-Za-z-_]{10,})
+ ((?:PL|EC|UU|FL|RD)?[0-9A-Za-z-_]{10,})
.*
|
- ((?:PL|EC|UU|FL)[0-9A-Za-z-_]{10,})
+ ((?:PL|EC|UU|FL|RD)[0-9A-Za-z-_]{10,})
)"""
_TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s&page=%s'
_MORE_PAGES_INDICATOR = r'data-link-type="next"'
def _extract_mix(self, playlist_id):
# The mixes are generated from a a single video
# the id of the playlist is just 'RD' + video_id
- url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[2:], playlist_id)
+ url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id)
webpage = self._download_webpage(url, playlist_id, u'Downloading Youtube mix')
title_span = (get_element_by_attribute('class', 'title long-title', webpage) or
get_element_by_attribute('class', 'title ', webpage))
else:
self.to_screen(u'Downloading playlist PL%s - add --no-playlist to just download video %s' % (playlist_id, video_id))
- if len(playlist_id) == 13: # 'RD' + 11 characters for the video id
+ if playlist_id.startswith('RD'):
# Mixes require a custom extraction process
return self._extract_mix(playlist_id)
+ if playlist_id.startswith('TL'):
+ raise ExtractorError(u'For downloading YouTube.com top lists, use '
+ u'the "yttoplist" keyword, for example "youtube-dl \'yttoplist:music:Top Tracks\'"', expected=True)
# Extract the video ids from the playlist pages
ids = []
return self.playlist_result(url_results, playlist_id, playlist_title)
+class YoutubeTopListIE(YoutubePlaylistIE):
+ IE_NAME = u'youtube:toplist'
+ IE_DESC = (u'YouTube.com top lists, "yttoplist:{channel}:{list title}"'
+ u' (Example: "yttoplist:music:Top Tracks")')
+ _VALID_URL = r'yttoplist:(?P<chann>.*?):(?P<title>.*?)$'
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ channel = mobj.group('chann')
+ title = mobj.group('title')
+ query = compat_urllib_parse.urlencode({'title': title})
+ playlist_re = 'href="([^"]+?%s[^"]+?)"' % re.escape(query)
+ channel_page = self._download_webpage('https://www.youtube.com/%s' % channel, title)
+ link = self._html_search_regex(playlist_re, channel_page, u'list')
+ url = compat_urlparse.urljoin('https://www.youtube.com/', link)
+
+ video_re = r'data-index="\d+".*?data-video-id="([0-9A-Za-z_-]{11})"'
+ ids = []
+ # sometimes the webpage doesn't contain the videos
+ # retry until we get them
+ for i in itertools.count(0):
+ msg = u'Downloading Youtube mix'
+ if i > 0:
+ msg += ', retry #%d' % i
+ webpage = self._download_webpage(url, title, msg)
+ ids = orderedSet(re.findall(video_re, webpage))
+ if ids:
+ break
+ url_results = self._ids_to_results(ids)
+ return self.playlist_result(url_results, playlist_title=title)
+
+
class YoutubeChannelIE(InfoExtractor):
IE_DESC = u'YouTube.com channels'
_VALID_URL = r"^(?:https?://)?(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com)/channel/([0-9A-Za-z_-]+)"
video_ids = []
url = 'https://www.youtube.com/channel/%s/videos' % channel_id
channel_page = self._download_webpage(url, channel_id)
- if re.search(r'channel-header-autogenerated-label', channel_page) is not None:
- autogenerated = True
- else:
- autogenerated = False
+ autogenerated = re.search(r'''(?x)
+ class="[^"]*?(?:
+ channel-header-autogenerated-label|
+ yt-channel-title-autogenerated
+ )[^"]*"''', channel_page) is not None
if autogenerated:
# The videos are contained in a single page
# page by page until there are no video ids - it means we got
# all of them.
- video_ids = []
+ url_results = []
for pagenum in itertools.count(0):
start_index = pagenum * self._GDATA_PAGE_SIZE + 1
break
# Extract video identifiers
- ids_in_page = []
- for entry in response['feed']['entry']:
- ids_in_page.append(entry['id']['$t'].split('/')[-1])
- video_ids.extend(ids_in_page)
+ entries = response['feed']['entry']
+ for entry in entries:
+ title = entry['title']['$t']
+ video_id = entry['id']['$t'].split('/')[-1]
+ url_results.append({
+ '_type': 'url',
+ 'url': video_id,
+ 'ie_key': 'Youtube',
+ 'id': 'video_id',
+ 'title': title,
+ })
# A little optimization - if current page is not
# "full", ie. does not contain PAGE_SIZE video ids then
# are no more ids on further pages - no need to query
# again.
- if len(ids_in_page) < self._GDATA_PAGE_SIZE:
+ if len(entries) < self._GDATA_PAGE_SIZE:
break
- url_results = [
- self.url_result(video_id, 'Youtube', video_id=video_id)
- for video_id in video_ids]
return self.playlist_result(url_results, playlist_title=username)
IE_NAME = u'youtube:search'
_SEARCH_KEY = 'ytsearch'
- def report_download_page(self, query, pagenum):
- """Report attempt to download search page with given number."""
- self._downloader.to_screen(u'[youtube] query "%s": Downloading page %s' % (query, pagenum))
-
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
limit = n
while (50 * pagenum) < limit:
- self.report_download_page(query, pagenum+1)
result_url = self._API_URL % (compat_urllib_parse.quote_plus(query), (50*pagenum)+1)
- request = compat_urllib_request.Request(result_url)
- try:
- data = compat_urllib_request.urlopen(request).read().decode('utf-8')
- except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
- raise ExtractorError(u'Unable to download API page: %s' % compat_str(err))
- api_response = json.loads(data)['data']
-
- if not 'items' in api_response:
+ data_json = self._download_webpage(
+ result_url, video_id=u'query "%s"' % query,
+ note=u'Downloading page %s' % (pagenum + 1),
+ errnote=u'Unable to download API page')
+ data = json.loads(data_json)
+ api_response = data['data']
+
+ if 'items' not in api_response:
raise ExtractorError(u'[youtube] No video results')
new_ids = list(video['id'] for video in api_response['items'])
try:
proto_pref = -PROTO_ORDER.index(format_m.group('proto'))
except ValueError:
- proto_pref = 999
+ proto_pref = -999
quality = fnode.find('./quality').text
QUALITY_ORDER = ['veryhigh', '300', 'high', 'med', 'low']
try:
quality_pref = -QUALITY_ORDER.index(quality)
except ValueError:
- quality_pref = 999
+ quality_pref = -999
abr = int(fnode.find('./audioBitrate').text) // 1000
vbr = int(fnode.find('./videoBitrate').text) // 1000
#!/usr/bin/env python
# -*- coding: utf-8 -*-
+import ctypes
import datetime
import email.utils
import errno
import re
import ssl
import socket
+import subprocess
import sys
import traceback
-import xml.etree.ElementTree
import zlib
try:
def connect(self):
sock = socket.create_connection((self.host, self.port), self.timeout)
- if self._tunnel_host:
+ if getattr(self, '_tunnel_host', False):
self.sock = sock
self._tunnel()
try:
return HTTPSHandlerV3()
else:
context = ssl.SSLContext(ssl.PROTOCOL_SSLv3)
- context.set_default_verify_paths()
-
context.verify_mode = (ssl.CERT_NONE
if opts_no_check_certificate
else ssl.CERT_REQUIRED)
+ context.set_default_verify_paths()
+ try:
+ context.load_default_certs()
+ except AttributeError:
+ pass # Python < 3.4
return compat_urllib_request.HTTPSHandler(context=context)
class ExtractorError(Exception):
upload_date = datetime.datetime.strptime(date_str, expression).strftime('%Y%m%d')
except:
pass
+ if upload_date is None:
+ timetuple = email.utils.parsedate_tz(date_str)
+ if timetuple:
+ upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
return upload_date
def determine_ext(url, default_ext=u'unknown_video'):
suffix = [u'B', u'KiB', u'MiB', u'GiB', u'TiB', u'PiB', u'EiB', u'ZiB', u'YiB'][exponent]
converted = float(bytes) / float(1024 ** exponent)
return u'%.2f%s' % (converted, suffix)
+
+
+def str_to_int(int_str):
+ int_str = re.sub(r'[,\.]', u'', int_str)
+ return int(int_str)
+
+
+def get_term_width():
+ columns = os.environ.get('COLUMNS', None)
+ if columns:
+ return int(columns)
+
+ try:
+ sp = subprocess.Popen(
+ ['stty', 'size'],
+ stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+ out, err = sp.communicate()
+ return int(out.split()[1])
+ except:
+ pass
+ return None
+
+
+def month_by_name(name):
+ """ Return the number of a month by (locale-independently) English name """
+
+ ENGLISH_NAMES = [
+ u'January', u'February', u'March', u'April', u'May', u'June',
+ u'July', u'August', u'September', u'October', u'November', u'December']
+ try:
+ return ENGLISH_NAMES.index(name) + 1
+ except ValueError:
+ return None
+
+
+def fix_xml_all_ampersand(xml_str):
+ """Replace all the '&' by '&' in XML"""
+ return xml_str.replace(u'&', u'&')
+
+
+def setproctitle(title):
+ assert isinstance(title, type(u''))
+ try:
+ libc = ctypes.cdll.LoadLibrary("libc.so.6")
+ except OSError:
+ return
+ title = title
+ buf = ctypes.create_string_buffer(len(title) + 1)
+ buf.value = title.encode('utf-8')
+ try:
+ libc.prctl(15, ctypes.byref(buf), 0, 0, 0)
+ except AttributeError:
+ return # Strange libc, just skip this
+
+
+def remove_start(s, start):
+ if s.startswith(start):
+ return s[len(start):]
+ return s
+
+
+def url_basename(url):
+ path = compat_urlparse.urlparse(url).path
+ return path.strip(u'/').split(u'/')[-1]
+
+
+class HEADRequest(compat_urllib_request.Request):
+ def get_method(self):
+ return "HEAD"
-__version__ = '2013.12.04'
+__version__ = '2013.12.23'