Raphaël G. Git Repositories - youtubedl/commitdiff
Imported Upstream version 2013.12.23
author    Rogério Brito <rbrito@ime.usp.br>
          Thu, 26 Dec 2013 18:41:27 +0000 (16:41 -0200)
committer Rogério Brito <rbrito@ime.usp.br>
          Thu, 26 Dec 2013 18:41:27 +0000 (16:41 -0200)
101 files changed:
README.md
README.txt
test/test_YoutubeDL.py
test/test_all_urls.py
test/test_playlists.py
test/test_utils.py
test/test_write_info_json.py
test/test_youtube_lists.py
youtube-dl
youtube-dl.1
youtube-dl.bash-completion
youtube_dl/FileDownloader.py
youtube_dl/YoutubeDL.py
youtube_dl/__init__.py
youtube_dl/aes.py
youtube_dl/extractor/__init__.py
youtube_dl/extractor/academicearth.py [new file with mode: 0644]
youtube_dl/extractor/addanime.py
youtube_dl/extractor/aparat.py [new file with mode: 0644]
youtube_dl/extractor/appletrailers.py
youtube_dl/extractor/archiveorg.py
youtube_dl/extractor/arte.py
youtube_dl/extractor/auengine.py
youtube_dl/extractor/bambuser.py
youtube_dl/extractor/blinkx.py [new file with mode: 0644]
youtube_dl/extractor/bliptv.py
youtube_dl/extractor/bloomberg.py
youtube_dl/extractor/brightcove.py
youtube_dl/extractor/cbs.py [new file with mode: 0644]
youtube_dl/extractor/channel9.py [new file with mode: 0644]
youtube_dl/extractor/clipsyndicate.py
youtube_dl/extractor/comedycentral.py
youtube_dl/extractor/common.py
youtube_dl/extractor/crunchyroll.py [new file with mode: 0644]
youtube_dl/extractor/cspan.py
youtube_dl/extractor/dailymotion.py
youtube_dl/extractor/daum.py
youtube_dl/extractor/dreisat.py
youtube_dl/extractor/eighttracks.py
youtube_dl/extractor/exfm.py
youtube_dl/extractor/facebook.py
youtube_dl/extractor/faz.py
youtube_dl/extractor/fktv.py
youtube_dl/extractor/francetv.py
youtube_dl/extractor/gamekings.py
youtube_dl/extractor/gametrailers.py
youtube_dl/extractor/generic.py
youtube_dl/extractor/hotnewhiphop.py
youtube_dl/extractor/ign.py
youtube_dl/extractor/imdb.py
youtube_dl/extractor/instagram.py
youtube_dl/extractor/ivi.py [new file with mode: 0644]
youtube_dl/extractor/jukebox.py
youtube_dl/extractor/liveleak.py
youtube_dl/extractor/livestream.py
youtube_dl/extractor/mdr.py [new file with mode: 0644]
youtube_dl/extractor/metacafe.py
youtube_dl/extractor/metacritic.py
youtube_dl/extractor/mixcloud.py
youtube_dl/extractor/mtv.py
youtube_dl/extractor/muzu.py
youtube_dl/extractor/myspass.py
youtube_dl/extractor/naver.py
youtube_dl/extractor/ndtv.py [new file with mode: 0644]
youtube_dl/extractor/ninegag.py [new file with mode: 0644]
youtube_dl/extractor/ooyala.py
youtube_dl/extractor/orf.py
youtube_dl/extractor/pbs.py
youtube_dl/extractor/pornhd.py [new file with mode: 0644]
youtube_dl/extractor/pornhub.py
youtube_dl/extractor/pyvideo.py [new file with mode: 0644]
youtube_dl/extractor/radiofrance.py [new file with mode: 0644]
youtube_dl/extractor/rtlnow.py
youtube_dl/extractor/rutube.py
youtube_dl/extractor/slashdot.py
youtube_dl/extractor/smotri.py
youtube_dl/extractor/soundcloud.py
youtube_dl/extractor/space.py
youtube_dl/extractor/stanfordoc.py
youtube_dl/extractor/tf1.py
youtube_dl/extractor/theplatform.py [new file with mode: 0644]
youtube_dl/extractor/unistra.py
youtube_dl/extractor/vbox7.py
youtube_dl/extractor/veehd.py
youtube_dl/extractor/vevo.py
youtube_dl/extractor/vice.py
youtube_dl/extractor/viddler.py
youtube_dl/extractor/videofyme.py
youtube_dl/extractor/videopremium.py
youtube_dl/extractor/vimeo.py
youtube_dl/extractor/wat.py
youtube_dl/extractor/wimp.py
youtube_dl/extractor/wistia.py [new file with mode: 0644]
youtube_dl/extractor/xhamster.py
youtube_dl/extractor/xtube.py
youtube_dl/extractor/yahoo.py
youtube_dl/extractor/youjizz.py
youtube_dl/extractor/youtube.py
youtube_dl/extractor/zdf.py
youtube_dl/utils.py
youtube_dl/version.py

index 029c418d16e332c73942bf6b60ac6470d8b8429f..caed9484672d8844890d43d98708f83044e6b3d2 100644 (file)
--- a/README.md
+++ b/README.md
@@ -38,6 +38,8 @@ which means you can modify it, redistribute it or use it however you like.
                                default $XDG_CACHE_HOME/youtube-dl or ~/.cache
                                /youtube-dl .
     --no-cache-dir             Disable filesystem caching
+    --bidi-workaround          Work around terminals that lack bidirectional
+                               text support. Requires fribidi executable in PATH
 
 ## Video Selection:
     --playlist-start NUMBER    playlist video to start at (default is 1)
@@ -54,6 +56,10 @@ which means you can modify it, redistribute it or use it however you like.
     --date DATE                download only videos uploaded in this date
     --datebefore DATE          download only videos uploaded before this date
     --dateafter DATE           download only videos uploaded after this date
+    --min-views COUNT          Do not download any videos with less than COUNT
+                               views
+    --max-views COUNT          Do not download any videos with more than COUNT
+                               views
     --no-playlist              download only the currently playing video
     --age-limit YEARS          download only videos suitable for the given age
     --download-archive FILE    Download only videos not listed in the archive
@@ -98,6 +104,8 @@ which means you can modify it, redistribute it or use it however you like.
     --restrict-filenames       Restrict filenames to only ASCII characters, and
                                avoid "&" and spaces in filenames
     -a, --batch-file FILE      file containing URLs to download ('-' for stdin)
+    --load-info FILE           json file containing the video information
+                               (created with the "--write-info-json" option)
     -w, --no-overwrites        do not overwrite files
     -c, --continue             force resume of partially downloaded files. By
                                default, youtube-dl will resume downloads if
@@ -123,6 +131,7 @@ which means you can modify it, redistribute it or use it however you like.
     --get-id                   simulate, quiet but print id
     --get-thumbnail            simulate, quiet but print thumbnail URL
     --get-description          simulate, quiet but print video description
+    --get-duration             simulate, quiet but print video length
     --get-filename             simulate, quiet but print output filename
     --get-format               simulate, quiet but print output format
     -j, --dump-json            simulate, quiet but print JSON information
@@ -274,14 +283,54 @@ This README file was originally written by Daniel Bolton (<https://github.com/db
 
 # BUGS
 
-Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>
+Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to do so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email.
 
-Please include:
-
-* Your exact command line, like `youtube-dl -t "http://www.youtube.com/watch?v=uHlDtZ6Oc3s&feature=channel_video_title"`. A common mistake is not to escape the `&`. Putting URLs in quotes should solve this problem.
-* If possible re-run the command with `--verbose`, and include the full output, it is really helpful to us.
-* The output of `youtube-dl --version`
-* The output of `python --version`
-* The name and version of your Operating System ("Ubuntu 11.04 x64" or "Windows 7 x64" is usually enough).
+Please include the full output of the command when run with `--verbose`. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
 
 For discussions, join us in the irc channel #youtube-dl on freenode.
+
+When you submit a request, please re-read it once to avoid a couple of mistakes (you can and should use this as a checklist):
+
+### Is the description of the issue itself sufficient?
+
+We often get issue reports that we cannot really decipher. While in most cases we eventually get the required information after asking back multiple times, this poses an unnecessary drain on our resources. Many contributors, including myself, are also not native speakers, so we may misread some parts.
+
+So please elaborate on what feature you are requesting, or what bug you want to be fixed. Make sure that it's obvious
+
+- What the problem is
+- How it could be fixed
+- What your proposed solution would look like
+
+If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
+
+For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the -v flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
+
+Site support requests must contain an example URL. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
+
### Are you using the latest version?
+
+Before reporting any issue, type youtube-dl -U. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
+
### Is the issue already documented?
+
+Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or at https://github.com/rg3/youtube-dl/search?type=Issues . If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
+
### Why are existing options not enough?
+
+Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
+
### Is there enough context in your bug report?
+
+People want to solve problems, and often think they do us a favor by breaking down their larger problems (e.g. wanting to skip already downloaded files) into a specific request (e.g. requesting us to look whether the file exists before downloading the info page). However, what often happens is that they break down the problem into two steps: one simple, and one impossible (or extremely complicated).
+
+We are then presented with a very complicated request when the original problem could be solved far more easily, e.g. by recording the downloaded video IDs in a separate file. To avoid this, you must include the greater context where it is non-obvious. In particular, every feature request that does not consist of adding support for a new site should contain a use case scenario that explains in what situation the missing feature would be useful.
+
### Does the issue involve one problem, and one problem only?
+
+Some of our users seem to think there is a limit on the number of issues they can or should open. There is no such limit. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones.
+
+In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, Whitehouse podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
+
### Is anyone going to need the feature?
+
+Only post features that you (or an incapacitated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them.
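
The new --min-views/--max-views switches map onto the min_views/max_views parameters added to YoutubeDL (see the YoutubeDL.py hunk at the end of this diff). A minimal sketch of driving them through the embedding API, assuming the 2013-era convention of registering extractors explicitly; the URL is the project's standard test video:

    from youtube_dl import YoutubeDL

    # Skip videos outside a view-count window. min_views/max_views are the
    # parameters documented in the YoutubeDL.py hunk below; None (the
    # default) means "no limit", and videos without view-count information
    # are always downloaded.
    ydl = YoutubeDL({
        'min_views': 1000,
        'max_views': 500000,
    })
    ydl.add_default_info_extractors()  # register all extractors before download()
    ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
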
index fc9f17e558e28b26a7459ddd519fb1cf84185be8..3645b8c6ce293feb0a921805377a50ef9e384e08 100644 (file)
@@ -45,6 +45,8 @@ OPTIONS
                                default $XDG_CACHE_HOME/youtube-dl or ~/.cache
                                /youtube-dl .
     --no-cache-dir             Disable filesystem caching
+    --bidi-workaround          Work around terminals that lack bidirectional
+                               text support. Requires fribidi executable in PATH
 
 Video Selection:
 ----------------
@@ -63,6 +65,10 @@ Video Selection:
     --date DATE                download only videos uploaded in this date
     --datebefore DATE          download only videos uploaded before this date
     --dateafter DATE           download only videos uploaded after this date
+    --min-views COUNT          Do not download any videos with less than COUNT
+                               views
+    --max-views COUNT          Do not download any videos with more than COUNT
+                               views
     --no-playlist              download only the currently playing video
     --age-limit YEARS          download only videos suitable for the given age
     --download-archive FILE    Download only videos not listed in the archive
@@ -111,6 +117,8 @@ Filesystem Options:
     --restrict-filenames       Restrict filenames to only ASCII characters, and
                                avoid "&" and spaces in filenames
     -a, --batch-file FILE      file containing URLs to download ('-' for stdin)
+    --load-info FILE           json file containing the video information
+                               (created with the "--write-info-json" option)
     -w, --no-overwrites        do not overwrite files
     -c, --continue             force resume of partially downloaded files. By
                                default, youtube-dl will resume downloads if
@@ -138,6 +146,7 @@ Verbosity / Simulation Options:
     --get-id                   simulate, quiet but print id
     --get-thumbnail            simulate, quiet but print thumbnail URL
     --get-description          simulate, quiet but print video description
+    --get-duration             simulate, quiet but print video length
     --get-filename             simulate, quiet but print output filename
     --get-format               simulate, quiet but print output format
     -j, --dump-json            simulate, quiet but print JSON information
@@ -354,19 +363,118 @@ BUGS
 ====
 
 Bugs and suggestions should be reported at:
-https://github.com/rg3/youtube-dl/issues
+https://github.com/rg3/youtube-dl/issues . Unless you were prompted to
+do so or there is another pertinent reason (e.g. GitHub fails to accept
+the bug report), please do not send bug reports via personal email.
 
-Please include:
-
--   Your exact command line, like
-    youtube-dl -t "http://www.youtube.com/watch?v=uHlDtZ6Oc3s&feature=channel_video_title".
-    A common mistake is not to escape the &. Putting URLs in quotes
-    should solve this problem.
--   If possible re-run the command with --verbose, and include the full
-    output, it is really helpful to us.
--   The output of youtube-dl --version
--   The output of python --version
--   The name and version of your Operating System ("Ubuntu 11.04 x64" or
-    "Windows 7 x64" is usually enough).
+Please include the full output of the command when run with --verbose.
+The output (including the first lines) contains important debugging
+information. Issues without the full output are often not reproducible
+and therefore do not get solved in short order, if ever.
 
 For discussions, join us in the irc channel #youtube-dl on freenode.
+
+When you submit a request, please re-read it once to avoid a couple of
+mistakes (you can and should use this as a checklist):
+
+Is the description of the issue itself sufficient?
+
+We often get issue reports that we cannot really decipher. While in most
+cases we eventually get the required information after asking back
+multiple times, this poses an unnecessary drain on our resources. Many
+contributors, including myself, are also not native speakers, so we may
+misread some parts.
+
+So please elaborate on what feature you are requesting, or what bug you
+want to be fixed. Make sure that it's obvious
+
+-   What the problem is
+-   How it could be fixed
+-   What your proposed solution would look like
+
+If your report is shorter than two lines, it is almost certainly missing
+some of these, which makes it hard for us to respond to it. We're often
+too polite to close the issue outright, but the missing info makes
+misinterpretation likely. As a committer myself, I often get frustrated
+by these issues, since the only possible way for me to move forward on
+them is to ask for clarification over and over.
+
+For bug reports, this means that your report should contain the complete
+output of youtube-dl when called with the -v flag. The error message you
+get for (most) bugs even says so, but you would not believe how many of
+our bug reports do not contain this information.
+
+Site support requests must contain an example URL. An example URL is a
+URL you might want to download, like
+http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious
+video present. Except under very special circumstances, the main page of
+a video service (e.g. http://www.youtube.com/ ) is not an example URL.
+
+Are you using the latest version?
+
+Before reporting any issue, type youtube-dl -U. This should report that
+you're up-to-date. About 20% of the reports we receive are already
+fixed, but people are using outdated versions. This goes for feature
+requests as well.
+
+Is the issue already documented?
+
+Make sure that someone has not already opened the issue you're trying to
+open. Search at the top of the window or at
+https://github.com/rg3/youtube-dl/search?type=Issues . If there is an
+issue, feel free to write something along the lines of "This affects me
+as well, with version 2015.01.01. Here is some more information on the
+issue: ...". While some issues may be old, a new post into them often
+spurs rapid activity.
+
+Why are existing options not enough?
+
+Before requesting a new feature, please have a quick peek at the list of
+supported options. Many feature requests are for features that actually
+exist already! Please, absolutely do show off your work in the issue
+report and detail how the existing similar options do not solve your
+problem.
+
+Is there enough context in your bug report?
+
+People want to solve problems, and often think they do us a favor by
+breaking down their larger problems (e.g. wanting to skip already
+downloaded files) into a specific request (e.g. requesting us to look
+whether the file exists before downloading the info page). However, what
+often happens is that they break down the problem into two steps: one
+simple, and one impossible (or extremely complicated).
+
+We are then presented with a very complicated request when the original
+problem could be solved far more easily, e.g. by recording the downloaded
+video IDs in a separate file. To avoid this, you must include the
+greater context where it is non-obvious. In particular, every feature
+request that does not consist of adding support for a new site should
+contain a use case scenario that explains in what situation the missing
+feature would be useful.
+
+Does the issue involve one problem, and one problem only?
+
+Some of our users seem to think there is a limit on the number of
+issues they can or should open. There is no such limit. While it may
+seem appealing to be able to dump all your issues into one
+ticket, that means that someone who solves one of your issues cannot
+mark the issue as closed. Typically, reporting a bunch of issues leads
+to the ticket lingering since nobody wants to attack that behemoth,
+until someone mercifully splits the issue into multiple ones.
+
+In particular, every site support request issue should only pertain to
+services at one site (generally under a common domain, but always using
+the same backend technology). Do not request support for vimeo user
+videos, Whitehouse podcasts, and Google Plus pages in the same issue.
+Also, make sure that you don't post bug reports alongside feature
+requests. As a rule of thumb, a feature request does not include outputs
+of youtube-dl that are not immediately related to the feature at hand.
+Do not post reports of a network error alongside the request for a new
+video service.
+
+Is anyone going to need the feature?
+
+Only post features that you (or an incapacitated friend you can personally
+talk to) require. Do not post features because they seem like a good
+idea. If they are really useful, they will be requested by someone who
+requires them.
index 58cf9c313607020d1493b420f8b93e18ccccd474..3100c362aa6940d2c557dffb5cabb0f5564ef4a8 100644 (file)
@@ -7,6 +7,7 @@ import unittest
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
 from test.helper import FakeYDL
+from youtube_dl import YoutubeDL
 
 
 class YDL(FakeYDL):
@@ -140,6 +141,20 @@ class TestFormatSelection(unittest.TestCase):
         self.assertEqual(test_dict['extractor'], 'Foo')
         self.assertEqual(test_dict['playlist'], 'funny videos')
 
+    def test_prepare_filename(self):
+        info = {
+            u'id': u'1234',
+            u'ext': u'mp4',
+            u'width': None,
+        }
+        def fname(templ):
+            ydl = YoutubeDL({'outtmpl': templ})
+            return ydl.prepare_filename(info)
+        self.assertEqual(fname(u'%(id)s.%(ext)s'), u'1234.mp4')
+        self.assertEqual(fname(u'%(id)s-%(width)s.%(ext)s'), u'1234-NA.mp4')
+        # Replace missing fields with 'NA'
+        self.assertEqual(fname(u'%(uploader_date)s-%(id)s.%(ext)s'), u'NA-1234.mp4')
+
 
 if __name__ == '__main__':
     unittest.main()
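
The new test pins down how prepare_filename treats missing or None fields; the implementing change (in the YoutubeDL.py hunk at the bottom of this diff) drops None values and falls back to u'NA' through a defaultdict. A standalone sketch of the same idea:

    import collections

    info = {'id': '1234', 'ext': 'mp4', 'width': None}

    # Drop None values, then let a defaultdict supply 'NA' for any field
    # the output template names but the info dict lacks. '%' formatting
    # looks keys up via __getitem__, so the default kicks in naturally.
    template_dict = dict((k, v) for k, v in info.items() if v is not None)
    template_dict = collections.defaultdict(lambda: 'NA', template_dict)

    print('%(id)s.%(ext)s' % template_dict)                     # 1234.mp4
    print('%(id)s-%(width)s.%(ext)s' % template_dict)           # 1234-NA.mp4
    print('%(uploader_date)s-%(id)s.%(ext)s' % template_dict)   # NA-1234.mp4
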
index 6b9764c67e98ba47a63b227d824bb97b82757b53..bd77b7c30149d556caa1237b4be4c06a56adc613 100644 (file)
@@ -10,6 +10,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from test.helper import get_testcases
 
 from youtube_dl.extractor import (
+    FacebookIE,
     gen_extractors,
     JustinTVIE,
     YoutubeIE,
@@ -87,12 +88,15 @@ class TestAllURLsMatching(unittest.TestCase):
         assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
         assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
 
+    def test_facebook_matching(self):
+        self.assertTrue(FacebookIE.suitable(u'https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
+
     def test_no_duplicates(self):
         ies = gen_extractors()
         for tc in get_testcases():
             url = tc['url']
             for ie in ies:
-                if type(ie).__name__ in ['GenericIE', tc['name'] + 'IE']:
+                if type(ie).__name__ in ('GenericIE', tc['name'] + 'IE'):
                     self.assertTrue(ie.suitable(url), '%s should match URL %r' % (type(ie).__name__, url))
                 else:
                     self.assertFalse(ie.suitable(url), '%s should not match URL %r' % (type(ie).__name__, url))
@@ -110,6 +114,9 @@ class TestAllURLsMatching(unittest.TestCase):
         self.assertMatch('http://vimeo.com/channels/tributes', ['vimeo:channel'])
         self.assertMatch('http://vimeo.com/user7108434', ['vimeo:user'])
 
+    # https://github.com/rg3/youtube-dl/issues/1930
+    def test_soundcloud_not_matching_sets(self):
+        self.assertMatch('http://soundcloud.com/floex/sets/gone-ep', ['soundcloud:set'])
 
 if __name__ == '__main__':
     unittest.main()
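
These matching tests go through each extractor's suitable() classmethod, which in youtube-dl's design tests the URL against the class's _VALID_URL regular expression. An illustrative sketch; the pattern below is hypothetical, and the real FacebookIE regex is more involved:

    import re

    class DemoFacebookIE(object):
        # Hypothetical pattern for illustration only.
        _VALID_URL = r'https?://(?:\w+\.)?facebook\.com/.*photo\.php\?v=(?P<id>\d+)'

        @classmethod
        def suitable(cls, url):
            """True if this extractor should handle the given URL."""
            return re.match(cls._VALID_URL, url) is not None

    print(DemoFacebookIE.suitable(
        'https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
    # True
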
index 00c950109111c709be0228f351220532c638467f..1b7b4e3d808cb936fa5fac07136049bd174a4490 100644 (file)
@@ -12,10 +12,13 @@ from test.helper import FakeYDL
 
 
 from youtube_dl.extractor import (
+    AcademicEarthCourseIE,
     DailymotionPlaylistIE,
     DailymotionUserIE,
     VimeoChannelIE,
     VimeoUserIE,
+    VimeoAlbumIE,
+    VimeoGroupsIE,
     UstreamChannelIE,
     SoundcloudSetIE,
     SoundcloudUserIE,
@@ -24,7 +27,8 @@ from youtube_dl.extractor import (
     BambuserChannelIE,
     BandcampAlbumIE,
     SmotriCommunityIE,
-    SmotriUserIE
+    SmotriUserIE,
+    IviCompilationIE
 )
 
 
@@ -65,6 +69,22 @@ class TestPlaylists(unittest.TestCase):
         self.assertEqual(result['title'], u'Nki')
         self.assertTrue(len(result['entries']) > 65)
 
+    def test_vimeo_album(self):
+        dl = FakeYDL()
+        ie = VimeoAlbumIE(dl)
+        result = ie.extract('http://vimeo.com/album/2632481')
+        self.assertIsPlaylist(result)
+        self.assertEqual(result['title'], u'Staff Favorites: November 2013')
+        self.assertTrue(len(result['entries']) > 12)
+
+    def test_vimeo_groups(self):
+        dl = FakeYDL()
+        ie = VimeoGroupsIE(dl)
+        result = ie.extract('http://vimeo.com/groups/rolexawards')
+        self.assertIsPlaylist(result)
+        self.assertEqual(result['title'], u'Rolex Awards for Enterprise')
+        self.assertTrue(len(result['entries']) > 72)
+
     def test_ustream_channel(self):
         dl = FakeYDL()
         ie = UstreamChannelIE(dl)
@@ -140,5 +160,34 @@ class TestPlaylists(unittest.TestCase):
         self.assertEqual(result['title'], u'Inspector')
         self.assertTrue(len(result['entries']) >= 9)
 
+    def test_AcademicEarthCourse(self):
+        dl = FakeYDL()
+        ie = AcademicEarthCourseIE(dl)
+        result = ie.extract(u'http://academicearth.org/courses/building-dynamic-websites/')
+        self.assertIsPlaylist(result)
+        self.assertEqual(result['id'], u'building-dynamic-websites')
+        self.assertEqual(result['title'], u'Building Dynamic Websites')
+        self.assertEqual(result['description'], u"Today's websites are increasingly dynamic. Pages are no longer static HTML files but instead generated by scripts and database calls. User interfaces are more seamless, with technologies like Ajax replacing traditional page reloads. This course teaches students how to build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP), one of today's most popular frameworks. Students learn how to set up domain names with DNS, how to structure pages with XHTML and CSS, how to program in JavaScript and PHP, how to configure Apache and MySQL, how to design and query databases with SQL, how to use Ajax with both XML and JSON, and how to build mashups. The course explores issues of security, scalability, and cross-browser support and also discusses enterprise-level deployments of websites, including third-party hosting, virtualization, colocation in data centers, firewalling, and load-balancing.")
+        self.assertEqual(len(result['entries']), 10)
+        
+    def test_ivi_compilation(self):
+        dl = FakeYDL()
+        ie = IviCompilationIE(dl)
+        result = ie.extract('http://www.ivi.ru/watch/dezhurnyi_angel')
+        self.assertIsPlaylist(result)
+        self.assertEqual(result['id'], u'dezhurnyi_angel')
+        self.assertEqual(result['title'], u'Дежурный ангел (2010 - 2012)')
+        self.assertTrue(len(result['entries']) >= 36)
+        
+    def test_ivi_compilation_season(self):
+        dl = FakeYDL()
+        ie = IviCompilationIE(dl)
+        result = ie.extract('http://www.ivi.ru/watch/dezhurnyi_angel/season2')
+        self.assertIsPlaylist(result)
+        self.assertEqual(result['id'], u'dezhurnyi_angel/season2')
+        self.assertEqual(result['title'], u'Дежурный ангел (2010 - 2012) 2 сезон')
+        self.assertTrue(len(result['entries']) >= 20)
+
+
 if __name__ == '__main__':
     unittest.main()
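
Each of these tests expects the extractor to return youtube-dl's standard playlist result; assertIsPlaylist (defined in this test file) checks the '_type' field. A rough sketch of the shape being asserted, with values taken from the ivi test above:

    # Illustrative shape of a playlist result, as exercised by the tests:
    result = {
        '_type': 'playlist',
        'id': u'dezhurnyi_angel',
        'title': u'Дежурный ангел (2010 - 2012)',
        'entries': [
            # one info dict (or 'url'-type stub) per video in the compilation
        ],
    }

    assert result['_type'] == 'playlist'   # what assertIsPlaylist verifies
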
index e9e590e749f131a0950c79bcf4fee1e9fb9004c2..e5778cd83ee9ea74e4786243f1e6279aed3697d3 100644 (file)
@@ -13,19 +13,21 @@ import xml.etree.ElementTree
 
 #from youtube_dl.utils import htmlentity_transform
 from youtube_dl.utils import (
-    timeconvert,
-    sanitize_filename,
-    unescapeHTML,
-    orderedSet,
     DateRange,
-    unified_strdate,
+    encodeFilename,
     find_xpath_attr,
     get_meta_content,
-    xpath_with_ns,
+    orderedSet,
+    sanitize_filename,
+    shell_quote,
     smuggle_url,
+    str_to_int,
+    timeconvert,
+    unescapeHTML,
+    unified_strdate,
     unsmuggle_url,
-    shell_quote,
-    encodeFilename,
+    url_basename,
+    xpath_with_ns,
 )
 
 if sys.version_info < (3, 0):
@@ -176,6 +178,19 @@ class TestUtil(unittest.TestCase):
         args = ['ffmpeg', '-i', encodeFilename(u'ñ€ß\'.mp4')]
         self.assertEqual(shell_quote(args), u"""ffmpeg -i 'ñ€ß'"'"'.mp4'""")
 
+    def test_str_to_int(self):
+        self.assertEqual(str_to_int('123,456'), 123456)
+        self.assertEqual(str_to_int('123.456'), 123456)
+
+    def test_url_basename(self):
+        self.assertEqual(url_basename(u'http://foo.de/'), u'')
+        self.assertEqual(url_basename(u'http://foo.de/bar/baz'), u'baz')
+        self.assertEqual(url_basename(u'http://foo.de/bar/baz?x=y'), u'baz')
+        self.assertEqual(url_basename(u'http://foo.de/bar/baz#x=y'), u'baz')
+        self.assertEqual(url_basename(u'http://foo.de/bar/baz/'), u'baz')
+        self.assertEqual(
+            url_basename(u'http://media.w3.org/2010/05/sintel/trailer.mp4'),
+            u'trailer.mp4')
 
 if __name__ == '__main__':
     unittest.main()
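
The url_basename cases above pin down the intended behavior: the last path component, with the query string, fragment, and any trailing slash ignored. One compact way to satisfy all of them (the actual utils.py implementation may differ, e.g. by using a regular expression):

    try:
        from urllib.parse import urlparse   # Python 3
    except ImportError:
        from urlparse import urlparse       # Python 2, as targeted in 2013

    def url_basename(url):
        # urlparse splits off query and fragment; stripping slashes makes
        # 'http://foo.de/bar/baz/' still yield 'baz'.
        path = urlparse(url).path
        return path.strip('/').rpartition('/')[2]

    assert url_basename(u'http://foo.de/') == u''
    assert url_basename(u'http://foo.de/bar/baz#x=y') == u'baz'
    assert url_basename(u'http://media.w3.org/2010/05/sintel/trailer.mp4') == u'trailer.mp4'
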
index d7177611b5e1a90aa3bdf612ae873336ff44d686..90426a559551571a7769c6137a76270a4ad47b1c 100644 (file)
@@ -33,6 +33,7 @@ TEST_ID = 'BaW_jenozKc'
 INFO_JSON_FILE = TEST_ID + '.info.json'
 DESCRIPTION_FILE = TEST_ID + '.mp4.description'
 EXPECTED_DESCRIPTION = u'''test chars:  "'/\ä↭𝕐
+test URL: https://github.com/rg3/youtube-dl/issues/1892
 
 This is a test video for youtube-dl.
 
index 95f07d129b61df97376c6699f05a656043be4773..d9fe5af4eec5fe791fce6d7fc06a6efb262213e4 100644 (file)
@@ -15,6 +15,7 @@ from youtube_dl.extractor import (
     YoutubeIE,
     YoutubeChannelIE,
     YoutubeShowIE,
+    YoutubeTopListIE,
 )
 
 
@@ -116,5 +117,12 @@ class TestYoutubeLists(unittest.TestCase):
         original_video = entries[0]
         self.assertEqual(original_video['id'], 'rjFaenf1T-Y')
 
+    def test_youtube_toplist(self):
+        dl = FakeYDL()
+        ie = YoutubeTopListIE(dl)
+        result = ie.extract('yttoplist:music:Top Tracks')
+        entries = result['entries']
+        self.assertTrue(len(entries) >= 5)
+
 if __name__ == '__main__':
     unittest.main()
index 899ca28bd80ce245f7afd7cbadc2eae53755a5ff..2e3e8a9b6cf644d8b656750bfe3386298e6609d3 100755 (executable)
Binary files a/youtube-dl and b/youtube-dl differ
index a172bf858dd95b9b5e55d353cd801b57643b308d..c99538c06871d2bd1765f1d50994d43c493dc2c6 100644 (file)
@@ -42,6 +42,8 @@ redistribute it or use it however you like.
 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ default\ $XDG_CACHE_HOME/youtube\-dl\ or\ ~/.cache
 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ /youtube\-dl\ .
 \-\-no\-cache\-dir\ \ \ \ \ \ \ \ \ \ \ \ \ Disable\ filesystem\ caching
+\-\-bidi\-workaround\ \ \ \ \ \ \ \ \ \ Work\ around\ terminals\ that\ lack\ bidirectional
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ text\ support.\ Requires\ fribidi\ executable\ in\ PATH
 \f[]
 .fi
 .SS Video Selection:
@@ -62,6 +64,10 @@ redistribute it or use it however you like.
 \-\-date\ DATE\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ download\ only\ videos\ uploaded\ in\ this\ date
 \-\-datebefore\ DATE\ \ \ \ \ \ \ \ \ \ download\ only\ videos\ uploaded\ before\ this\ date
 \-\-dateafter\ DATE\ \ \ \ \ \ \ \ \ \ \ download\ only\ videos\ uploaded\ after\ this\ date
+\-\-min\-views\ COUNT\ \ \ \ \ \ \ \ \ \ Do\ not\ download\ any\ videos\ with\ less\ than\ COUNT
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ views
+\-\-max\-views\ COUNT\ \ \ \ \ \ \ \ \ \ Do\ not\ download\ any\ videos\ with\ more\ than\ COUNT
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ views
 \-\-no\-playlist\ \ \ \ \ \ \ \ \ \ \ \ \ \ download\ only\ the\ currently\ playing\ video
 \-\-age\-limit\ YEARS\ \ \ \ \ \ \ \ \ \ download\ only\ videos\ suitable\ for\ the\ given\ age
 \-\-download\-archive\ FILE\ \ \ \ Download\ only\ videos\ not\ listed\ in\ the\ archive
@@ -114,6 +120,8 @@ redistribute it or use it however you like.
 \-\-restrict\-filenames\ \ \ \ \ \ \ Restrict\ filenames\ to\ only\ ASCII\ characters,\ and
 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ avoid\ "&"\ and\ spaces\ in\ filenames
 \-a,\ \-\-batch\-file\ FILE\ \ \ \ \ \ file\ containing\ URLs\ to\ download\ (\[aq]\-\[aq]\ for\ stdin)
+\-\-load\-info\ FILE\ \ \ \ \ \ \ \ \ \ \ json\ file\ containing\ the\ video\ information
+\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (created\ with\ the\ "\-\-write\-info\-json"\ option)
 \-w,\ \-\-no\-overwrites\ \ \ \ \ \ \ \ do\ not\ overwrite\ files
 \-c,\ \-\-continue\ \ \ \ \ \ \ \ \ \ \ \ \ force\ resume\ of\ partially\ downloaded\ files.\ By
 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ default,\ youtube\-dl\ will\ resume\ downloads\ if
@@ -143,6 +151,7 @@ redistribute it or use it however you like.
 \-\-get\-id\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ id
 \-\-get\-thumbnail\ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ thumbnail\ URL
 \-\-get\-description\ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ video\ description
+\-\-get\-duration\ \ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ video\ length
 \-\-get\-filename\ \ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ output\ filename
 \-\-get\-format\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ output\ format
 \-j,\ \-\-dump\-json\ \ \ \ \ \ \ \ \ \ \ \ simulate,\ quiet\ but\ print\ JSON\ information
@@ -383,23 +392,132 @@ domain.
 .SH BUGS
 .PP
 Bugs and suggestions should be reported at:
-<https://github.com/rg3/youtube-dl/issues>
+<https://github.com/rg3/youtube-dl/issues> .
+Unless you were prompted to do so or there is another pertinent reason (e.g.
+GitHub fails to accept the bug report), please do not send bug reports
+via personal email.
 .PP
-Please include:
-.IP \[bu] 2
-Your exact command line, like
-\f[C]youtube\-dl\ \-t\ "http://www.youtube.com/watch?v=uHlDtZ6Oc3s&feature=channel_video_title"\f[].
-A common mistake is not to escape the \f[C]&\f[].
-Putting URLs in quotes should solve this problem.
-.IP \[bu] 2
-If possible re\-run the command with \f[C]\-\-verbose\f[], and include
-the full output, it is really helpful to us.
+Please include the full output of the command when run with
+\f[C]\-\-verbose\f[].
+The output (including the first lines) contains important debugging
+information.
+Issues without the full output are often not reproducible and therefore
+do not get solved in short order, if ever.
+.PP
+For discussions, join us in the irc channel #youtube\-dl on freenode.
+.PP
+When you submit a request, please re\-read it once to avoid a couple of
+mistakes (you can and should use this as a checklist):
+.SS Is the description of the issue itself sufficient?
+.PP
+We often get issue reports that we cannot really decipher.
+While in most cases we eventually get the required information after
+asking back multiple times, this poses an unnecessary drain on our
+resources.
+Many contributors, including myself, are also not native speakers, so we
+may misread some parts.
+.PP
+So please elaborate on what feature you are requesting, or what bug you
+want to be fixed.
+Make sure that it\[aq]s obvious
 .IP \[bu] 2
-The output of \f[C]youtube\-dl\ \-\-version\f[]
+What the problem is
 .IP \[bu] 2
-The output of \f[C]python\ \-\-version\f[]
+How it could be fixed
 .IP \[bu] 2
-The name and version of your Operating System ("Ubuntu 11.04 x64" or
-"Windows 7 x64" is usually enough).
+What your proposed solution would look like
 .PP
-For discussions, join us in the irc channel #youtube\-dl on freenode.
+If your report is shorter than two lines, it is almost certainly missing
+some of these, which makes it hard for us to respond to it.
+We\[aq]re often too polite to close the issue outright, but the missing
+info makes misinterpretation likely.
+As a committer myself, I often get frustrated by these issues, since the
+only possible way for me to move forward on them is to ask for
+clarification over and over.
+.PP
+For bug reports, this means that your report should contain the
+\f[I]complete\f[] output of youtube\-dl when called with the \-v flag.
+The error message you get for (most) bugs even says so, but you would
+not believe how many of our bug reports do not contain this information.
+.PP
+Site support requests must contain an example URL.
+An example URL is a URL you might want to download, like
+http://www.youtube.com/watch?v=BaW_jenozKc .
+There should be an obvious video present.
+Except under very special circumstances, the main page of a video
+service (e.g.
+http://www.youtube.com/ ) is \f[I]not\f[] an example URL.
+.SS Are you using the latest version?
+.PP
+Before reporting any issue, type youtube\-dl \-U.
+This should report that you\[aq]re up\-to\-date.
+About 20% of the reports we receive are already fixed, but people are
+using outdated versions.
+This goes for feature requests as well.
+.SS Is the issue already documented?
+.PP
+Make sure that someone has not already opened the issue you\[aq]re
+trying to open.
+Search at the top of the window or at
+https://github.com/rg3/youtube\-dl/search?type=Issues .
+If there is an issue, feel free to write something along the lines of
+"This affects me as well, with version 2015.01.01.
+Here is some more information on the issue: ...".
+While some issues may be old, a new post into them often spurs rapid
+activity.
+.SS Why are existing options not enough?
+.PP
+Before requesting a new feature, please have a quick peek at the list of
+supported
+options (https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis).
+Many feature requests are for features that actually exist already!
+Please, absolutely do show off your work in the issue report and detail
+how the existing similar options do \f[I]not\f[] solve your problem.
+.SS Is there enough context in your bug report?
+.PP
+People want to solve problems, and often think they do us a favor by
+breaking down their larger problems (e.g.
+wanting to skip already downloaded files) into a specific request (e.g.
+requesting us to look whether the file exists before downloading the
+info page).
+However, what often happens is that they break down the problem into two
+steps: one simple, and one impossible (or extremely complicated).
+.PP
+We are then presented with a very complicated request when the original
+problem could be solved far more easily, e.g.
+by recording the downloaded video IDs in a separate file.
+To avoid this, you must include the greater context where it is
+non\-obvious.
+In particular, every feature request that does not consist of adding
+support for a new site should contain a use case scenario that explains
+in what situation the missing feature would be useful.
+.SS Does the issue involve one problem, and one problem only?
+.PP
+Some of our users seem to think there is a limit on the number of
+issues they can or should open.
+There is no such limit.
+While it may seem appealing to be able to dump all your issues into one
+ticket, that means that someone who solves one of your issues cannot
+mark the issue as closed.
+Typically, reporting a bunch of issues leads to the ticket lingering
+since nobody wants to attack that behemoth, until someone mercifully
+splits the issue into multiple ones.
+.PP
+In particular, every site support request issue should only pertain to
+services at one site (generally under a common domain, but always using
+the same backend technology).
+Do not request support for vimeo user videos, Whitehouse podcasts, and
+Google Plus pages in the same issue.
+Also, make sure that you don\[aq]t post bug reports alongside feature
+requests.
+As a rule of thumb, a feature request does not include outputs of
+youtube\-dl that are not immediately related to the feature at hand.
+Do not post reports of a network error alongside the request for a new
+video service.
+.SS Is anyone going to need the feature?
+.PP
+Only post features that you (or an incapacitated friend you can personally
+talk to) require.
+Do not post features because they seem like a good idea.
+If they are really useful, they will be requested by someone who
+requires them.
index 9c30e7b529f0f38c996d1c4ddce6b0ff7791e221..0cb7e92759b45c92fe1e10c9393a2c87f8e8044e 100644 (file)
@@ -4,7 +4,7 @@ __youtube_dl()
     COMPREPLY=()
     cur="${COMP_WORDS[COMP_CWORD]}"
     prev="${COMP_WORDS[COMP_CWORD-1]}"
-    opts="--help --version --update --ignore-errors --abort-on-error --dump-user-agent --user-agent --referer --list-extractors --extractor-descriptions --proxy --no-check-certificate --cache-dir --no-cache-dir --socket-timeout --playlist-start --playlist-end --match-title --reject-title --max-downloads --min-filesize --max-filesize --date --datebefore --dateafter --no-playlist --age-limit --download-archive --rate-limit --retries --buffer-size --no-resize-buffer --test --title --id --literal --auto-number --output --autonumber-size --restrict-filenames --batch-file --no-overwrites --continue --no-continue --cookies --no-part --no-mtime --write-description --write-info-json --write-annotations --write-thumbnail --quiet --simulate --skip-download --get-url --get-title --get-id --get-thumbnail --get-description --get-filename --get-format --dump-json --newline --no-progress --console-title --verbose --dump-intermediate-pages --write-pages --youtube-print-sig-code --format --all-formats --prefer-free-formats --max-quality --list-formats --write-sub --write-auto-sub --all-subs --list-subs --sub-format --sub-lang --username --password --netrc --video-password --extract-audio --audio-format --audio-quality --recode-video --keep-video --no-post-overwrites --embed-subs --add-metadata"
+    opts="--help --version --update --ignore-errors --abort-on-error --dump-user-agent --user-agent --referer --list-extractors --extractor-descriptions --proxy --no-check-certificate --cache-dir --no-cache-dir --socket-timeout --bidi-workaround --playlist-start --playlist-end --match-title --reject-title --max-downloads --min-filesize --max-filesize --date --datebefore --dateafter --min-views --max-views --no-playlist --age-limit --download-archive --rate-limit --retries --buffer-size --no-resize-buffer --test --title --id --literal --auto-number --output --autonumber-size --restrict-filenames --batch-file --load-info --no-overwrites --continue --no-continue --cookies --no-part --no-mtime --write-description --write-info-json --write-annotations --write-thumbnail --quiet --simulate --skip-download --get-url --get-title --get-id --get-thumbnail --get-description --get-duration --get-filename --get-format --dump-json --newline --no-progress --console-title --verbose --dump-intermediate-pages --write-pages --youtube-print-sig-code --format --all-formats --prefer-free-formats --max-quality --list-formats --write-sub --write-auto-sub --all-subs --list-subs --sub-format --sub-lang --username --password --netrc --video-password --extract-audio --audio-format --audio-quality --recode-video --keep-video --no-post-overwrites --embed-subs --add-metadata"
     keywords=":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater :ythistory"
     fileopts="-a|--batch-file|--download-archive|--cookies"
     diropts="--cache-dir"
index 3ff9716b33b22e39a0a6d925bfa33aba8fa092f9..47124932fc7e9ff3c40ec29d003757cdb20cf967 100644 (file)
@@ -204,11 +204,27 @@ class FileDownloader(object):
         """Report destination filename."""
         self.to_screen(u'[download] Destination: ' + filename)
 
+    def _report_progress_status(self, msg, is_last_line=False):
+        fullmsg = u'[download] ' + msg
+        if self.params.get('progress_with_newline', False):
+            self.to_screen(fullmsg)
+        else:
+            if os.name == 'nt':
+                prev_len = getattr(self, '_report_progress_prev_line_length',
+                                   0)
+                if prev_len > len(fullmsg):
+                    fullmsg += u' ' * (prev_len - len(fullmsg))
+                self._report_progress_prev_line_length = len(fullmsg)
+                clear_line = u'\r'
+            else:
+                clear_line = (u'\r\x1b[K' if sys.stderr.isatty() else u'\r')
+            self.to_screen(clear_line + fullmsg, skip_eol=not is_last_line)
+        self.to_console_title(u'youtube-dl ' + msg)
+
     def report_progress(self, percent, data_len_str, speed, eta):
         """Report download progress."""
         if self.params.get('noprogress', False):
             return
-        clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
         if eta is not None:
             eta_str = self.format_eta(eta)
         else:
@@ -218,14 +234,29 @@ class FileDownloader(object):
         else:
             percent_str = 'Unknown %'
         speed_str = self.format_speed(speed)
-        if self.params.get('progress_with_newline', False):
-            self.to_screen(u'[download] %s of %s at %s ETA %s' %
-                (percent_str, data_len_str, speed_str, eta_str))
+
+        msg = (u'%s of %s at %s ETA %s' %
+               (percent_str, data_len_str, speed_str, eta_str))
+        self._report_progress_status(msg)
+
+    def report_progress_live_stream(self, downloaded_data_len, speed, elapsed):
+        if self.params.get('noprogress', False):
+            return
+        downloaded_str = format_bytes(downloaded_data_len)
+        speed_str = self.format_speed(speed)
+        elapsed_str = FileDownloader.format_seconds(elapsed)
+        msg = u'%s at %s (%s)' % (downloaded_str, speed_str, elapsed_str)
+        self._report_progress_status(msg)
+
+    def report_finish(self, data_len_str, tot_time):
+        """Report download finished."""
+        if self.params.get('noprogress', False):
+            self.to_screen(u'[download] Download completed')
         else:
-            self.to_screen(u'\r%s[download] %s of %s at %s ETA %s' %
-                (clear_line, percent_str, data_len_str, speed_str, eta_str), skip_eol=True)
-        self.to_console_title(u'youtube-dl - %s of %s at %s ETA %s' %
-                (percent_str.strip(), data_len_str.strip(), speed_str.strip(), eta_str.strip()))
+            self._report_progress_status(
+                (u'100%% of %s in %s' %
+                 (data_len_str, self.format_seconds(tot_time))),
+                is_last_line=True)
 
     def report_resuming_byte(self, resume_len):
         """Report attempt to resume at given byte."""
@@ -246,16 +277,7 @@ class FileDownloader(object):
         """Report it was impossible to resume download."""
         self.to_screen(u'[download] Unable to resume')
 
-    def report_finish(self, data_len_str, tot_time):
-        """Report download finished."""
-        if self.params.get('noprogress', False):
-            self.to_screen(u'[download] Download completed')
-        else:
-            clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
-            self.to_screen(u'\r%s[download] 100%% of %s in %s' %
-                (clear_line, data_len_str, self.format_seconds(tot_time)))
-
-    def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url, live):
+    def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url, live, conn):
         def run_rtmpdump(args):
             start = time.time()
             resume_percent = None
@@ -301,11 +323,27 @@ class FileDownloader(object):
                         'eta': eta,
                         'speed': speed,
                     })
-                elif self.params.get('verbose', False):
-                    if not cursor_in_new_line:
-                        self.to_screen(u'')
-                    cursor_in_new_line = True
-                    self.to_screen(u'[rtmpdump] '+line)
+                else:
+                    # no percent for live streams
+                    mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec', line)
+                    if mobj:
+                        downloaded_data_len = int(float(mobj.group(1))*1024)
+                        time_now = time.time()
+                        speed = self.calc_speed(start, time_now, downloaded_data_len)
+                        self.report_progress_live_stream(downloaded_data_len, speed, time_now - start)
+                        cursor_in_new_line = False
+                        self._hook_progress({
+                            'downloaded_bytes': downloaded_data_len,
+                            'tmpfilename': tmpfilename,
+                            'filename': filename,
+                            'status': 'downloading',
+                            'speed': speed,
+                        })
+                    elif self.params.get('verbose', False):
+                        if not cursor_in_new_line:
+                            self.to_screen(u'')
+                        cursor_in_new_line = True
+                        self.to_screen(u'[rtmpdump] '+line)
             proc.wait()
             if not cursor_in_new_line:
                 self.to_screen(u'')
@@ -338,6 +376,8 @@ class FileDownloader(object):
             basic_args += ['--stop', '1']
         if live:
             basic_args += ['--live']
+        if conn:
+            basic_args += ['--conn', conn]
         args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)]
 
         if sys.platform == 'win32' and sys.version_info < (3, 0):
@@ -479,7 +519,8 @@ class FileDownloader(object):
                                                 info_dict.get('page_url', None),
                                                 info_dict.get('play_path', None),
                                                 info_dict.get('tc_url', None),
-                                                info_dict.get('rtmp_live', False))
+                                                info_dict.get('rtmp_live', False),
+                                                info_dict.get('rtmp_conn', None))
 
         # Attempt to download using mplayer
         if url.startswith('mms') or url.startswith('rtsp'):
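
For live RTMP streams rtmpdump prints no percentage, so the new branch above scrapes its kilobytes/seconds status line instead. A sketch of that parsing, reusing the regular expression from the hunk; the sample line is a plausible, made-up rtmpdump status:

    import re

    line = '3754.122 kB / 12.45 sec'   # hypothetical rtmpdump status line

    mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec', line)
    if mobj:
        # rtmpdump reports kilobytes with three decimals; convert to bytes.
        downloaded_data_len = int(float(mobj.group(1)) * 1024)
        print(downloaded_data_len)   # 3844220
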
index b68b110a461f50af2d163997d1acfa18ece3afaf..2a078adfbbc7f7aed7ca31a6aff85d0e6a9c19b2 100644 (file)
@@ -3,6 +3,7 @@
 
 from __future__ import absolute_import
 
+import collections
 import errno
 import io
 import json
@@ -22,7 +23,6 @@ if os.name == 'nt':
 from .utils import (
     compat_cookiejar,
     compat_http_client,
-    compat_print,
     compat_str,
     compat_urllib_error,
     compat_urllib_request,
@@ -34,6 +34,8 @@ from .utils import (
     encodeFilename,
     ExtractorError,
     format_bytes,
+    formatSeconds,
+    get_term_width,
     locked_file,
     make_HTTPS_handler,
     MaxDownloadsReached,
@@ -45,6 +47,7 @@ from .utils import (
     subtitles_filename,
     takewhile_inclusive,
     UnavailableVideoError,
+    url_basename,
     write_json_file,
     write_string,
     YoutubeDLHandler,
@@ -93,6 +96,7 @@ class YoutubeDL(object):
     forcethumbnail:    Force printing thumbnail URL.
     forcedescription:  Force printing description.
     forcefilename:     Force printing final filename.
+    forceduration:     Force printing duration.
     forcejson:         Force printing info_dict as JSON.
     simulate:          Do not download the video files.
     format:            Video format code.
@@ -126,13 +130,24 @@ class YoutubeDL(object):
     noplaylist:        Download single video instead of a playlist if in doubt.
     age_limit:         An integer representing the user's age in years.
                        Unsuitable videos for the given age are skipped.
-    download_archive:   File name of a file where all downloads are recorded.
+    min_views:         An integer representing the minimum view count the video
+                       must have in order to not be skipped.
+                       Videos without view count information are always
+                       downloaded. None for no limit.
+    max_views:         An integer representing the maximum view count.
+                       Videos that are more popular than that are not
+                       downloaded.
+                       Videos without view count information are always
+                       downloaded. None for no limit.
+    download_archive:  File name of a file where all downloads are recorded.
                        Videos already present in the file are not downloaded
                        again.
     cookiefile:        File name where cookies should be read from and dumped to.
     nocheckcertificate:Do not verify SSL certificates
     proxy:             URL of the proxy server to use
     socket_timeout:    Time to wait for unresponsive hosts, in seconds
+    bidi_workaround:   Work around buggy terminals without bidirectional text
+                       support, using fribidi
 
     The following parameters are not used by YoutubeDL itself, they are used by
     the FileDownloader:
@@ -156,8 +171,30 @@ class YoutubeDL(object):
         self._download_retcode = 0
         self._num_downloads = 0
         self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
+        self._err_file = sys.stderr
         self.params = {} if params is None else params
 
+        if params.get('bidi_workaround', False):
+            try:
+                import pty
+                master, slave = pty.openpty()
+                width = get_term_width()
+                if width is None:
+                    width_args = []
+                else:
+                    width_args = ['-w', str(width)]
+                self._fribidi = subprocess.Popen(
+                    ['fribidi', '-c', 'UTF-8'] + width_args,
+                    stdin=subprocess.PIPE,
+                    stdout=slave,
+                    stderr=self._err_file)
+                self._fribidi_channel = os.fdopen(master, 'rb')
+            except OSError as ose:
+                if ose.errno == 2:
+                    self.report_warning(u'Could not find fribidi executable, ignoring --bidi-workaround . Make sure that  fribidi  is an executable file in one of the directories in your $PATH.')
+                else:
+                    raise
+
         if (sys.version_info >= (3,) and sys.platform != 'win32' and
                 sys.getfilesystemencoding() in ['ascii', 'ANSI_X3.4-1968']
                 and not params['restrictfilenames']):
@@ -205,13 +242,31 @@ class YoutubeDL(object):
         self._pps.append(pp)
         pp.set_downloader(self)
 
+    def _bidi_workaround(self, message):
+        if not hasattr(self, '_fribidi_channel'):
+            return message
+
+        assert type(message) == type(u'')
+        line_count = message.count(u'\n') + 1
+        self._fribidi.stdin.write((message + u'\n').encode('utf-8'))
+        self._fribidi.stdin.flush()
+        res = u''.join(self._fribidi_channel.readline().decode('utf-8')
+                       for _ in range(line_count))
+        return res[:-len(u'\n')]
+
     def to_screen(self, message, skip_eol=False):
+        """Print message to stdout if not in quiet mode."""
+        return self.to_stdout(message, skip_eol, check_quiet=True)
+
+    def to_stdout(self, message, skip_eol=False, check_quiet=False):
         """Print message to stdout if not in quiet mode."""
         if self.params.get('logger'):
             self.params['logger'].debug(message)
-        elif not self.params.get('quiet', False):
+        elif not check_quiet or not self.params.get('quiet', False):
+            message = self._bidi_workaround(message)
             terminator = [u'\n', u''][skip_eol]
             output = message + terminator
+
             write_string(output, self._screen_file)
 
     def to_stderr(self, message):
@@ -220,10 +275,9 @@ class YoutubeDL(object):
         if self.params.get('logger'):
             self.params['logger'].error(message)
         else:
+            message = self._bidi_workaround(message)
             output = message + u'\n'
-            if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
-                output = output.encode(preferredencoding())
-            sys.stderr.write(output)
+            write_string(output, self._err_file)
 
     def to_console_title(self, message):
         if not self.params.get('consoletitle', False):
@@ -294,7 +348,7 @@ class YoutubeDL(object):
         Print the message to stderr, it will be prefixed with 'WARNING:'
         If stderr is a tty file the 'WARNING:' will be colored
         '''
-        if sys.stderr.isatty() and os.name != 'nt':
+        if self._err_file.isatty() and os.name != 'nt':
             _msg_header = u'\033[0;33mWARNING:\033[0m'
         else:
             _msg_header = u'WARNING:'
@@ -306,29 +360,13 @@ class YoutubeDL(object):
         Do the same as trouble, but prefixes the message with 'ERROR:', colored
         in red if stderr is a tty file.
         '''
-        if sys.stderr.isatty() and os.name != 'nt':
+        if self._err_file.isatty() and os.name != 'nt':
             _msg_header = u'\033[0;31mERROR:\033[0m'
         else:
             _msg_header = u'ERROR:'
         error_message = u'%s %s' % (_msg_header, message)
         self.trouble(error_message, tb)
 
-    def report_writedescription(self, descfn):
-        """ Report that the description file is being written """
-        self.to_screen(u'[info] Writing video description to: ' + descfn)
-
-    def report_writesubtitles(self, sub_filename):
-        """ Report that the subtitles file is being written """
-        self.to_screen(u'[info] Writing video subtitles to: ' + sub_filename)
-
-    def report_writeinfojson(self, infofn):
-        """ Report that the metadata file has been written """
-        self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn)
-
-    def report_writeannotations(self, annofn):
-        """ Report that the annotations file has been written. """
-        self.to_screen(u'[info] Writing video annotations to: ' + annofn)
-
     def report_file_already_downloaded(self, file_name):
         """Report file has already been fully downloaded."""
         try:
@@ -355,18 +393,17 @@ class YoutubeDL(object):
                 template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index']
 
             sanitize = lambda k, v: sanitize_filename(
-                u'NA' if v is None else compat_str(v),
+                compat_str(v),
                 restricted=self.params.get('restrictfilenames'),
                 is_id=(k == u'id'))
             template_dict = dict((k, sanitize(k, v))
-                                 for k, v in template_dict.items())
+                                 for k, v in template_dict.items()
+                                 if v is not None)
+            template_dict = collections.defaultdict(lambda: u'NA', template_dict)
 
             tmpl = os.path.expanduser(self.params['outtmpl'])
             filename = tmpl % template_dict
             return filename
-        except KeyError as err:
-            self.report_error(u'Erroneous output template')
-            return None
         except ValueError as err:
             self.report_error(u'Error in output template: ' + str(err) + u' (encoding: ' + repr(preferredencoding()) + ')')
             return None
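
After this change a template field that is None is treated the same as a field that is missing entirely: both fall through to the defaultdict and render as u'NA', instead of raising KeyError and aborting with 'Erroneous output template'. For instance:

    import collections

    info = {'id': 'abc123', 'ext': 'mp4', 'title': None}
    fields = dict((k, v) for k, v in info.items() if v is not None)
    fields = collections.defaultdict(lambda: u'NA', fields)
    # %-formatting with a mapping calls __getitem__, so __missing__ kicks in
    print(u'%(title)s-%(id)s.%(ext)s' % fields)  # NA-abc123.mp4
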
@@ -374,13 +411,14 @@ class YoutubeDL(object):
     def _match_entry(self, info_dict):
         """ Returns None iff the file should be downloaded """
 
+        video_title = info_dict.get('title', info_dict.get('id', u'video'))
         if 'title' in info_dict:
             # This can happen when we're just evaluating the playlist
             title = info_dict['title']
             matchtitle = self.params.get('matchtitle', False)
             if matchtitle:
                 if not re.search(matchtitle, title, re.IGNORECASE):
-                    return u'[download] "' + title + '" title did not match pattern "' + matchtitle + '"'
+                    return u'"' + title + '" title did not match pattern "' + matchtitle + '"'
             rejecttitle = self.params.get('rejecttitle', False)
             if rejecttitle:
                 if re.search(rejecttitle, title, re.IGNORECASE):
@@ -389,14 +427,21 @@ class YoutubeDL(object):
         if date is not None:
             dateRange = self.params.get('daterange', DateRange())
             if date not in dateRange:
-                return u'[download] %s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
+                return u'%s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
+        view_count = info_dict.get('view_count', None)
+        if view_count is not None:
+            min_views = self.params.get('min_views')
+            if min_views is not None and view_count < min_views:
+                return u'Skipping %s, because it has not reached minimum view count (%d/%d)' % (video_title, view_count, min_views)
+            max_views = self.params.get('max_views')
+            if max_views is not None and view_count > max_views:
+                return u'Skipping %s, because it has exceeded the maximum view count (%d/%d)' % (video_title, view_count, max_views)
         age_limit = self.params.get('age_limit')
         if age_limit is not None:
             if age_limit < info_dict.get('age_limit', 0):
                return u'Skipping "' + video_title + '" because it is age restricted'
         if self.in_download_archive(info_dict):
-            return (u'%s has already been recorded in archive'
-                    % info_dict.get('title', info_dict.get('id', u'video')))
+            return u'%s has already been recorded in archive' % video_title
         return None
 
     @staticmethod
@@ -405,7 +450,8 @@ class YoutubeDL(object):
         for key, value in extra_info.items():
             info_dict.setdefault(key, value)
 
-    def extract_info(self, url, download=True, ie_key=None, extra_info={}):
+    def extract_info(self, url, download=True, ie_key=None, extra_info={},
+                     process=True):
         '''
         Returns a list with a dictionary for each video we find.
         If 'download', also downloads the videos.
@@ -439,9 +485,13 @@ class YoutubeDL(object):
                     {
                         'extractor': ie.IE_NAME,
                         'webpage_url': url,
+                        'webpage_url_basename': url_basename(url),
                         'extractor_key': ie.ie_key(),
                     })
-                return self.process_ie_result(ie_result, download, extra_info)
+                if process:
+                    return self.process_ie_result(ie_result, download, extra_info)
+                else:
+                    return ie_result
             except ExtractorError as de: # An error we somewhat expected
                 self.report_error(compat_str(de), de.format_traceback())
                 break
@@ -474,8 +524,33 @@ class YoutubeDL(object):
                                      download,
                                      ie_key=ie_result.get('ie_key'),
                                      extra_info=extra_info)
+        elif result_type == 'url_transparent':
+            # Use the information from the embedding page
+            info = self.extract_info(
+                ie_result['url'], ie_key=ie_result.get('ie_key'),
+                extra_info=extra_info, download=False, process=False)
+
+            def make_result(embedded_info):
+                new_result = ie_result.copy()
+                for f in ('_type', 'url', 'ext', 'player_url', 'formats',
+                          'entries', 'urlhandle', 'ie_key', 'duration',
+                          'subtitles', 'annotations', 'format',
+                          'thumbnail', 'thumbnails'):
+                    if f in new_result:
+                        del new_result[f]
+                    if f in embedded_info:
+                        new_result[f] = embedded_info[f]
+                return new_result
+            new_result = make_result(info)
+
+            assert new_result.get('_type') != 'url_transparent'
+            if new_result.get('_type') == 'compat_list':
+                new_result['entries'] = [
+                    make_result(e) for e in new_result['entries']]
+
+            return self.process_ie_result(
+                new_result, download=download, extra_info=extra_info)
         elif result_type == 'playlist':
-
             # We process each entry in the playlist
             playlist = ie_result.get('title', None) or ie_result.get('id', None)
             self.to_screen(u'[download] Downloading playlist: %s' % playlist)
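
A 'url_transparent' result lets one extractor delegate the actual media extraction to another while keeping its own metadata; only the media-related fields listed in make_result are taken from the embedded result. A toy version of that merge (field list abbreviated):

    MEDIA_FIELDS = ('_type', 'url', 'ext', 'formats', 'thumbnail')

    def merge(outer, embedded):
        merged = outer.copy()
        for f in MEDIA_FIELDS:
            merged.pop(f, None)           # drop the outer value either way
            if f in embedded:
                merged[f] = embedded[f]   # prefer the embedded media info
        return merged

    outer = {'_type': 'url_transparent', 'title': u'Nice title', 'url': u'x'}
    embedded = {'_type': 'video', 'url': u'http://cdn.example.com/v.mp4',
                'ext': u'mp4'}
    print(merge(outer, embedded))  # title from outer, url/ext from embedded
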
@@ -484,16 +559,16 @@ class YoutubeDL(object):
 
             n_all_entries = len(ie_result['entries'])
             playliststart = self.params.get('playliststart', 1) - 1
-            playlistend = self.params.get('playlistend', -1)
-
+            playlistend = self.params.get('playlistend', None)
+            # For backwards compatibility, interpret -1 as whole list
             if playlistend == -1:
-                entries = ie_result['entries'][playliststart:]
-            else:
-                entries = ie_result['entries'][playliststart:playlistend]
+                playlistend = None
 
+            entries = ie_result['entries'][playliststart:playlistend]
             n_entries = len(entries)
 
-            self.to_screen(u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" %
+            self.to_screen(
+                u"[%s] playlist '%s': Collected %d video ids (downloading %d of them)" %
                 (ie_result['extractor'], playlist, n_all_entries, n_entries))
 
             for i, entry in enumerate(entries, 1):
@@ -503,6 +578,7 @@ class YoutubeDL(object):
                     'playlist_index': i + playliststart,
                     'extractor': ie_result['extractor'],
                     'webpage_url': ie_result['webpage_url'],
+                    'webpage_url_basename': url_basename(ie_result['webpage_url']),
                     'extractor_key': ie_result['extractor_key'],
                 }
 
@@ -523,6 +599,7 @@ class YoutubeDL(object):
                     {
                         'extractor': ie_result['extractor'],
                         'webpage_url': ie_result['webpage_url'],
+                        'webpage_url_basename': url_basename(ie_result['webpage_url']),
                         'extractor_key': ie_result['extractor_key'],
                     })
                 return r
@@ -666,22 +743,25 @@ class YoutubeDL(object):
 
         # Forced printings
         if self.params.get('forcetitle', False):
-            compat_print(info_dict['fulltitle'])
+            self.to_stdout(info_dict['fulltitle'])
         if self.params.get('forceid', False):
-            compat_print(info_dict['id'])
+            self.to_stdout(info_dict['id'])
         if self.params.get('forceurl', False):
             # For RTMP URLs, also include the playpath
-            compat_print(info_dict['url'] + info_dict.get('play_path', u''))
+            self.to_stdout(info_dict['url'] + info_dict.get('play_path', u''))
         if self.params.get('forcethumbnail', False) and info_dict.get('thumbnail') is not None:
-            compat_print(info_dict['thumbnail'])
+            self.to_stdout(info_dict['thumbnail'])
         if self.params.get('forcedescription', False) and info_dict.get('description') is not None:
-            compat_print(info_dict['description'])
+            self.to_stdout(info_dict['description'])
         if self.params.get('forcefilename', False) and filename is not None:
-            compat_print(filename)
+            self.to_stdout(filename)
+        if self.params.get('forceduration', False) and info_dict.get('duration') is not None:
+            self.to_stdout(formatSeconds(info_dict['duration']))
         if self.params.get('forceformat', False):
-            compat_print(info_dict['format'])
+            self.to_stdout(info_dict['format'])
         if self.params.get('forcejson', False):
-            compat_print(json.dumps(info_dict))
+            info_dict['_filename'] = filename
+            self.to_stdout(json.dumps(info_dict))
 
         # Do nothing else if in simulate mode
         if self.params.get('simulate', False):
@@ -699,28 +779,34 @@ class YoutubeDL(object):
             return
 
         if self.params.get('writedescription', False):
-            try:
-                descfn = filename + u'.description'
-                self.report_writedescription(descfn)
-                with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
-                    descfile.write(info_dict['description'])
-            except (KeyError, TypeError):
-                self.report_warning(u'There\'s no description to write.')
-            except (OSError, IOError):
-                self.report_error(u'Cannot write description file ' + descfn)
-                return
+            descfn = filename + u'.description'
+            if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(descfn)):
+                self.to_screen(u'[info] Video description is already present')
+            else:
+                try:
+                    self.to_screen(u'[info] Writing video description to: ' + descfn)
+                    with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
+                        descfile.write(info_dict['description'])
+                except (KeyError, TypeError):
+                    self.report_warning(u'There\'s no description to write.')
+                except (OSError, IOError):
+                    self.report_error(u'Cannot write description file ' + descfn)
+                    return
 
         if self.params.get('writeannotations', False):
-            try:
-                annofn = filename + u'.annotations.xml'
-                self.report_writeannotations(annofn)
-                with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
-                    annofile.write(info_dict['annotations'])
-            except (KeyError, TypeError):
-                self.report_warning(u'There are no annotations to write.')
-            except (OSError, IOError):
-                self.report_error(u'Cannot write annotations file: ' + annofn)
-                return
+            annofn = filename + u'.annotations.xml'
+            if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(annofn)):
+                self.to_screen(u'[info] Video annotations are already present')
+            else:
+                try:
+                    self.to_screen(u'[info] Writing video annotations to: ' + annofn)
+                    with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
+                        annofile.write(info_dict['annotations'])
+                except (KeyError, TypeError):
+                    self.report_warning(u'There are no annotations to write.')
+                except (OSError, IOError):
+                    self.report_error(u'Cannot write annotations file: ' + annofn)
+                    return
 
         subtitles_are_requested = any([self.params.get('writesubtitles', False),
                                        self.params.get('writeautomaticsub')])
@@ -736,38 +822,48 @@ class YoutubeDL(object):
                     continue
                 try:
                     sub_filename = subtitles_filename(filename, sub_lang, sub_format)
-                    self.report_writesubtitles(sub_filename)
-                    with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
-                            subfile.write(sub)
+                    if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
+                        self.to_screen(u'[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
+                    else:
+                        self.to_screen(u'[info] Writing video subtitles to: ' + sub_filename)
+                        with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
+                                subfile.write(sub)
                 except (OSError, IOError):
                    self.report_error(u'Cannot write subtitles file ' + sub_filename)
                     return
 
         if self.params.get('writeinfojson', False):
             infofn = os.path.splitext(filename)[0] + u'.info.json'
-            self.report_writeinfojson(infofn)
-            try:
-                json_info_dict = dict((k, v) for k, v in info_dict.items() if not k in ['urlhandle'])
-                write_json_file(json_info_dict, encodeFilename(infofn))
-            except (OSError, IOError):
-                self.report_error(u'Cannot write metadata to JSON file ' + infofn)
-                return
+            if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(infofn)):
+                self.to_screen(u'[info] Video description metadata is already present')
+            else:
+                self.to_screen(u'[info] Writing video description metadata as JSON to: ' + infofn)
+                try:
+                    json_info_dict = dict((k, v) for k, v in info_dict.items() if not k in ['urlhandle'])
+                    write_json_file(json_info_dict, encodeFilename(infofn))
+                except (OSError, IOError):
+                    self.report_error(u'Cannot write metadata to JSON file ' + infofn)
+                    return
 
         if self.params.get('writethumbnail', False):
             if info_dict.get('thumbnail') is not None:
                 thumb_format = determine_ext(info_dict['thumbnail'], u'jpg')
-                thumb_filename = filename.rpartition('.')[0] + u'.' + thumb_format
-                self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
-                               (info_dict['extractor'], info_dict['id']))
-                try:
-                    uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
-                    with open(thumb_filename, 'wb') as thumbf:
-                        shutil.copyfileobj(uf, thumbf)
-                    self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
-                        (info_dict['extractor'], info_dict['id'], thumb_filename))
-                except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-                    self.report_warning(u'Unable to download thumbnail "%s": %s' %
-                        (info_dict['thumbnail'], compat_str(err)))
+                thumb_filename = os.path.splitext(filename)[0] + u'.' + thumb_format
+                if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)):
+                    self.to_screen(u'[%s] %s: Thumbnail is already present' %
+                                   (info_dict['extractor'], info_dict['id']))
+                else:
+                    self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
+                                   (info_dict['extractor'], info_dict['id']))
+                    try:
+                        uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
+                        with open(thumb_filename, 'wb') as thumbf:
+                            shutil.copyfileobj(uf, thumbf)
+                        self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
+                            (info_dict['extractor'], info_dict['id'], thumb_filename))
+                    except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
+                        self.report_warning(u'Unable to download thumbnail "%s": %s' %
+                            (info_dict['thumbnail'], compat_str(err)))
 
         if not self.params.get('skip_download', False):
             if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
@@ -812,6 +908,20 @@ class YoutubeDL(object):
 
         return self._download_retcode
 
+    def download_with_info_file(self, info_filename):
+        with io.open(info_filename, 'r', encoding='utf-8') as f:
+            info = json.load(f)
+        try:
+            self.process_ie_result(info, download=True)
+        except DownloadError:
+            webpage_url = info.get('webpage_url')
+            if webpage_url is not None:
+                self.report_warning(u'Downloading from the saved info failed; trying again with "%s"' % webpage_url)
+                return self.download([webpage_url])
+            else:
+                raise
+        return self._download_retcode
+
     def post_process(self, filename, ie_info):
         """Run all the postprocessors on the given file."""
         info = dict(ie_info)
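
This is the backend for the new --load-info option (see the option-parsing changes below): a .info.json written by --write-info-json can be replayed without contacting the extractor again, and if the recorded formats have gone stale the saved webpage_url is retried from scratch. A hedged usage sketch (file name illustrative):

    from youtube_dl import YoutubeDL

    with YoutubeDL({'outtmpl': u'%(title)s-%(id)s.%(ext)s'}) as ydl:
        retcode = ydl.download_with_info_file(u'Some_video-abc123.info.json')
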
index d2446b6706a6eb239cf52a00fb775ef0eb9cac9f..63437301b6fb43f360856646184f7161e2d76c3b 100644 (file)
@@ -37,6 +37,7 @@ __authors__  = (
     'Anton Larionov',
     'Takuya Tsuchida',
     'Sergey M.',
+    'Michael Orlitzky',
 )
 
 __license__ = 'Public Domain'
@@ -48,7 +49,6 @@ import os
 import random
 import re
 import shlex
-import subprocess
 import sys
 
 
@@ -56,12 +56,13 @@ from .utils import (
     compat_print,
     DateRange,
     decodeOption,
-    determine_ext,
+    get_term_width,
     DownloadError,
     get_cachedir,
     MaxDownloadsReached,
     preferredencoding,
     SameFileError,
+    setproctitle,
     std_headers,
     write_string,
 )
@@ -113,19 +114,6 @@ def parseOpts(overrideArguments=None):
     def _comma_separated_values_options_callback(option, opt_str, value, parser):
         setattr(parser.values, option.dest, value.split(','))
 
-    def _find_term_columns():
-        columns = os.environ.get('COLUMNS', None)
-        if columns:
-            return int(columns)
-
-        try:
-            sp = subprocess.Popen(['stty', 'size'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
-            out,err = sp.communicate()
-            return int(out.split()[1])
-        except:
-            pass
-        return None
-
     def _hide_login_info(opts):
         opts = list(opts)
         for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
@@ -140,7 +128,7 @@ def parseOpts(overrideArguments=None):
     max_help_position = 80
 
     # No need to wrap help messages if we're on a wide console
-    columns = _find_term_columns()
+    columns = get_term_width()
     if columns: max_width = columns
 
     fmt = optparse.IndentedHelpFormatter(width=max_width, max_help_position=max_help_position)
@@ -204,12 +192,19 @@ def parseOpts(overrideArguments=None):
     general.add_option(
         '--socket-timeout', dest='socket_timeout',
         type=float, default=None, help=optparse.SUPPRESS_HELP)
-
-
-    selection.add_option('--playlist-start',
-            dest='playliststart', metavar='NUMBER', help='playlist video to start at (default is %default)', default=1)
-    selection.add_option('--playlist-end',
-            dest='playlistend', metavar='NUMBER', help='playlist video to end at (default is last)', default=-1)
+    general.add_option(
+        '--bidi-workaround', dest='bidi_workaround', action='store_true',
+        help=u'Work around terminals that lack bidirectional text support. Requires fribidi executable in PATH')
+
+
+    selection.add_option(
+        '--playlist-start',
+        dest='playliststart', metavar='NUMBER', default=1, type=int,
+        help='playlist video to start at (default is %default)')
+    selection.add_option(
+        '--playlist-end',
+        dest='playlistend', metavar='NUMBER', default=None, type=int,
+        help='playlist video to end at (default is last)')
     selection.add_option('--match-title', dest='matchtitle', metavar='REGEX',help='download only matching titles (regex or caseless sub-string)')
     selection.add_option('--reject-title', dest='rejecttitle', metavar='REGEX',help='skip download for matching titles (regex or caseless sub-string)')
     selection.add_option('--max-downloads', metavar='NUMBER',
@@ -220,6 +215,14 @@ def parseOpts(overrideArguments=None):
     selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None)
     selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None)
     selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None)
+    selection.add_option(
+        '--min-views', metavar='COUNT', dest='min_views',
+        default=None, type=int,
+        help='Do not download any videos with fewer than COUNT views')
+    selection.add_option(
+        '--max-views', metavar='COUNT', dest='max_views',
+        default=None, type=int,
+        help='Do not download any videos with more than COUNT views')
     selection.add_option('--no-playlist', action='store_true', dest='noplaylist', help='download only the currently playing video', default=False)
     selection.add_option('--age-limit', metavar='YEARS', dest='age_limit',
                          help='download only videos suitable for the given age',
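
These flags plug into the view-count checks added to _match_entry earlier in this commit; programmatically the same filter is just two parameters (bounds illustrative):

    from youtube_dl import YoutubeDL

    # Skip videos with fewer than 1,000 or more than 500,000 views;
    # videos that report no view count at all are always downloaded.
    ydl = YoutubeDL({'min_views': 1000, 'max_views': 500000})
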
@@ -300,6 +303,9 @@ def parseOpts(overrideArguments=None):
     verbosity.add_option('--get-description',
             action='store_true', dest='getdescription',
             help='simulate, quiet but print video description', default=False)
+    verbosity.add_option('--get-duration',
+            action='store_true', dest='getduration',
+            help='simulate, quiet but print video length', default=False)
     verbosity.add_option('--get-filename',
             action='store_true', dest='getfilename',
             help='simulate, quiet but print output filename', default=False)
@@ -360,6 +366,9 @@ def parseOpts(overrideArguments=None):
             help='Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames', default=False)
     filesystem.add_option('-a', '--batch-file',
             dest='batchfile', metavar='FILE', help='file containing URLs to download (\'-\' for stdin)')
+    filesystem.add_option('--load-info',
+            dest='load_info_filename', metavar='FILE',
+            help='json file containing the video information (created with the "--write-info-json" option)')
     filesystem.add_option('-w', '--no-overwrites',
             action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
     filesystem.add_option('-c', '--continue',
@@ -467,12 +476,15 @@ def parseOpts(overrideArguments=None):
 
     return parser, opts, args
 
+
 def _real_main(argv=None):
     # Compatibility fixes for Windows
     if sys.platform == 'win32':
         # https://github.com/rg3/youtube-dl/issues/820
         codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
 
+    setproctitle(u'youtube-dl')
+
     parser, opts, args = parseOpts(argv)
 
     # Set user agent
@@ -512,7 +524,6 @@ def _real_main(argv=None):
         for ie in sorted(extractors, key=lambda ie: ie.IE_NAME.lower()):
             compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
             matchedUrls = [url for url in all_urls if ie.suitable(url)]
-            all_urls = [url for url in all_urls if url not in matchedUrls]
             for mu in matchedUrls:
                 compat_print(u'  ' + mu)
         sys.exit(0)
@@ -567,18 +578,10 @@ def _real_main(argv=None):
         if numeric_buffersize is None:
             parser.error(u'invalid buffer size specified')
         opts.buffersize = numeric_buffersize
-    try:
-        opts.playliststart = int(opts.playliststart)
-        if opts.playliststart <= 0:
-            raise ValueError(u'Playlist start must be positive')
-    except (TypeError, ValueError):
-        parser.error(u'invalid playlist start number specified')
-    try:
-        opts.playlistend = int(opts.playlistend)
-        if opts.playlistend != -1 and (opts.playlistend <= 0 or opts.playlistend < opts.playliststart):
-            raise ValueError(u'Playlist end must be greater than playlist start')
-    except (TypeError, ValueError):
-        parser.error(u'invalid playlist end number specified')
+    if opts.playliststart <= 0:
+        parser.error(u'Playlist start must be positive')
+    if opts.playlistend not in (-1, None) and opts.playlistend < opts.playliststart:
+        parser.error(u'Playlist end must be greater than playlist start')
     if opts.extractaudio:
         if opts.audioformat not in ['best', 'aac', 'mp3', 'm4a', 'opus', 'vorbis', 'wav']:
             parser.error(u'invalid audio format specified')
@@ -611,27 +614,30 @@ def _real_main(argv=None):
             or (opts.useid and u'%(id)s.%(ext)s')
             or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
             or u'%(title)s-%(id)s.%(ext)s')
-    if '%(ext)s' not in outtmpl and opts.extractaudio:
+    if not os.path.splitext(outtmpl)[1] and opts.extractaudio:
         parser.error(u'Cannot download a video and extract audio into the same'
-                     u' file! Use "%%(ext)s" instead of %r' %
-                     determine_ext(outtmpl, u''))
+                     u' file! Use "{0}.%(ext)s" instead of "{0}" as the output'
+                     u' template'.format(outtmpl))
+
+    any_printing = (opts.geturl or opts.gettitle or opts.getid or
+                    opts.getthumbnail or opts.getdescription or
+                    opts.getfilename or opts.getformat or
+                    opts.getduration or opts.dumpjson)
 
     ydl_opts = {
         'usenetrc': opts.usenetrc,
         'username': opts.username,
         'password': opts.password,
         'videopassword': opts.videopassword,
-        'quiet': (opts.quiet or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.dumpjson),
+        'quiet': (opts.quiet or any_printing),
         'forceurl': opts.geturl,
         'forcetitle': opts.gettitle,
         'forceid': opts.getid,
         'forcethumbnail': opts.getthumbnail,
         'forcedescription': opts.getdescription,
+        'forceduration': opts.getduration,
         'forcefilename': opts.getfilename,
         'forceformat': opts.getformat,
         'forcejson': opts.dumpjson,
         'simulate': opts.simulate,
-        'skip_download': (opts.skip_download or opts.simulate or opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.dumpjson),
+        'skip_download': (opts.skip_download or opts.simulate or any_printing),
         'format': opts.format,
         'format_limit': opts.format_limit,
         'listformats': opts.listformats,
@@ -675,6 +681,8 @@ def _real_main(argv=None):
         'keepvideo': opts.keepvideo,
         'min_filesize': opts.min_filesize,
         'max_filesize': opts.max_filesize,
+        'min_views': opts.min_views,
+        'max_views': opts.max_views,
         'daterange': date,
         'cachedir': opts.cachedir,
         'youtube_print_sig_code': opts.youtube_print_sig_code,
@@ -684,6 +692,7 @@ def _real_main(argv=None):
         'nocheckcertificate': opts.no_check_certificate,
         'proxy': opts.proxy,
         'socket_timeout': opts.socket_timeout,
+        'bidi_workaround': opts.bidi_workaround,
     }
 
     with YoutubeDL(ydl_opts) as ydl:
@@ -706,14 +715,17 @@ def _real_main(argv=None):
             update_self(ydl.to_screen, opts.verbose)
 
         # Maybe do nothing
-        if len(all_urls) < 1:
+        if (len(all_urls) < 1) and (opts.load_info_filename is None):
             if not opts.update_self:
                 parser.error(u'you must provide at least one URL')
             else:
                 sys.exit()
 
         try:
-            retcode = ydl.download(all_urls)
+            if opts.load_info_filename is not None:
+                retcode = ydl.download_with_info_file(opts.load_info_filename)
+            else:
+                retcode = ydl.download(all_urls)
         except MaxDownloadsReached:
             ydl.to_screen(u'--max-download limit reached, aborting.')
             retcode = 101
index 9a0c93fa6f4efb415f7e6dad25239a4c219a2542..e9c5e21521d66baa177986e8ca878e3fc1a75461 100644 (file)
@@ -1,4 +1,4 @@
-__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_decrypt_text']
+__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_cbc_decrypt', 'aes_decrypt_text']
 
 import base64
 from math import ceil
@@ -32,6 +32,31 @@ def aes_ctr_decrypt(data, key, counter):
     
     return decrypted_data
 
+def aes_cbc_decrypt(data, key, iv):
+    """
+    Decrypt with aes in CBC mode
+    
+    @param {int[]} data        cipher
+    @param {int[]} key         16/24/32-Byte cipher key
+    @param {int[]} iv          16-Byte IV
+    @returns {int[]}           decrypted data
+    """
+    expanded_key = key_expansion(key)
+    block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
+    
+    decrypted_data = []
+    previous_cipher_block = iv
+    for i in range(block_count):
+        block = data[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES]
+        block += [0]*(BLOCK_SIZE_BYTES - len(block))
+        
+        decrypted_block = aes_decrypt(block, expanded_key)
+        decrypted_data += xor(decrypted_block, previous_cipher_block)
+        previous_cipher_block = block
+    decrypted_data = decrypted_data[:len(data)]
+    
+    return decrypted_data
+
 def key_expansion(data):
     """
     Generate key schedule
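
The module operates on lists of ints rather than byte strings, so a caller converts on the way in and out. A minimal sketch with dummy key/IV/ciphertext values:

    from youtube_dl.aes import aes_cbc_decrypt

    key = list(range(16))       # 16-byte AES-128 key (dummy)
    iv = [0] * 16               # 16-byte IV (dummy)
    ciphertext = [0x12] * 16    # one 16-byte block (dummy)

    plaintext = aes_cbc_decrypt(ciphertext, key, iv)
    print(bytes(bytearray(plaintext)))
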
@@ -75,7 +100,7 @@ def aes_encrypt(data, expanded_key):
     @returns {int[]}             16-Byte cipher
     """
     rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1
-    
+
     data = xor(data, expanded_key[:BLOCK_SIZE_BYTES])
     for i in range(1, rounds+1):
         data = sub_bytes(data)
@@ -83,6 +108,26 @@ def aes_encrypt(data, expanded_key):
         if i != rounds:
             data = mix_columns(data)
         data = xor(data, expanded_key[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES])
+
+    return data
+
+def aes_decrypt(data, expanded_key):
+    """
+    Decrypt one block with aes
+    
+    @param {int[]} data          16-Byte cipher
+    @param {int[]} expanded_key  176/208/240-Byte expanded key
+    @returns {int[]}             16-Byte state
+    """
+    rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1
+    
+    for i in range(rounds, 0, -1):
+        data = xor(data, expanded_key[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES])
+        if i != rounds:
+            data = mix_columns_inv(data)
+        data = shift_rows_inv(data)
+        data = sub_bytes_inv(data)
+    data = xor(data, expanded_key[:BLOCK_SIZE_BYTES])
     
     return data
 
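
aes_decrypt walks the rounds of the existing aes_encrypt in reverse, applying the inverse of each transform, so the two are exact inverses under the same expanded key. A quick self-check:

    from youtube_dl.aes import aes_encrypt, aes_decrypt, key_expansion

    expanded = key_expansion(list(range(16)))   # dummy 16-byte key
    block = list(range(16))                     # dummy 16-byte block
    assert aes_decrypt(aes_encrypt(block, expanded), expanded) == block
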
@@ -139,14 +184,69 @@ SBOX = (0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B,
         0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
         0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
         0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16)
-MIX_COLUMN_MATRIX = ((2,3,1,1),
-                     (1,2,3,1),
-                     (1,1,2,3),
-                     (3,1,1,2))
+SBOX_INV = (0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38, 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb,
+            0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87, 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb,
+            0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d, 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e,
+            0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2, 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25,
+            0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92,
+            0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda, 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84,
+            0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a, 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06,
+            0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02, 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b,
+            0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea, 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73,
+            0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85, 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e,
+            0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89, 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b,
+            0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20, 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4,
+            0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31, 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f,
+            0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d, 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef,
+            0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0, 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61,
+            0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26, 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d)
+MIX_COLUMN_MATRIX = ((0x2,0x3,0x1,0x1),
+                     (0x1,0x2,0x3,0x1),
+                     (0x1,0x1,0x2,0x3),
+                     (0x3,0x1,0x1,0x2))
+MIX_COLUMN_MATRIX_INV = ((0xE,0xB,0xD,0x9),
+                         (0x9,0xE,0xB,0xD),
+                         (0xD,0x9,0xE,0xB),
+                         (0xB,0xD,0x9,0xE))
+RIJNDAEL_EXP_TABLE = (0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF, 0x1A, 0x2E, 0x72, 0x96, 0xA1, 0xF8, 0x13, 0x35,
+                      0x5F, 0xE1, 0x38, 0x48, 0xD8, 0x73, 0x95, 0xA4, 0xF7, 0x02, 0x06, 0x0A, 0x1E, 0x22, 0x66, 0xAA,
+                      0xE5, 0x34, 0x5C, 0xE4, 0x37, 0x59, 0xEB, 0x26, 0x6A, 0xBE, 0xD9, 0x70, 0x90, 0xAB, 0xE6, 0x31,
+                      0x53, 0xF5, 0x04, 0x0C, 0x14, 0x3C, 0x44, 0xCC, 0x4F, 0xD1, 0x68, 0xB8, 0xD3, 0x6E, 0xB2, 0xCD,
+                      0x4C, 0xD4, 0x67, 0xA9, 0xE0, 0x3B, 0x4D, 0xD7, 0x62, 0xA6, 0xF1, 0x08, 0x18, 0x28, 0x78, 0x88,
+                      0x83, 0x9E, 0xB9, 0xD0, 0x6B, 0xBD, 0xDC, 0x7F, 0x81, 0x98, 0xB3, 0xCE, 0x49, 0xDB, 0x76, 0x9A,
+                      0xB5, 0xC4, 0x57, 0xF9, 0x10, 0x30, 0x50, 0xF0, 0x0B, 0x1D, 0x27, 0x69, 0xBB, 0xD6, 0x61, 0xA3,
+                      0xFE, 0x19, 0x2B, 0x7D, 0x87, 0x92, 0xAD, 0xEC, 0x2F, 0x71, 0x93, 0xAE, 0xE9, 0x20, 0x60, 0xA0,
+                      0xFB, 0x16, 0x3A, 0x4E, 0xD2, 0x6D, 0xB7, 0xC2, 0x5D, 0xE7, 0x32, 0x56, 0xFA, 0x15, 0x3F, 0x41,
+                      0xC3, 0x5E, 0xE2, 0x3D, 0x47, 0xC9, 0x40, 0xC0, 0x5B, 0xED, 0x2C, 0x74, 0x9C, 0xBF, 0xDA, 0x75,
+                      0x9F, 0xBA, 0xD5, 0x64, 0xAC, 0xEF, 0x2A, 0x7E, 0x82, 0x9D, 0xBC, 0xDF, 0x7A, 0x8E, 0x89, 0x80,
+                      0x9B, 0xB6, 0xC1, 0x58, 0xE8, 0x23, 0x65, 0xAF, 0xEA, 0x25, 0x6F, 0xB1, 0xC8, 0x43, 0xC5, 0x54,
+                      0xFC, 0x1F, 0x21, 0x63, 0xA5, 0xF4, 0x07, 0x09, 0x1B, 0x2D, 0x77, 0x99, 0xB0, 0xCB, 0x46, 0xCA,
+                      0x45, 0xCF, 0x4A, 0xDE, 0x79, 0x8B, 0x86, 0x91, 0xA8, 0xE3, 0x3E, 0x42, 0xC6, 0x51, 0xF3, 0x0E,
+                      0x12, 0x36, 0x5A, 0xEE, 0x29, 0x7B, 0x8D, 0x8C, 0x8F, 0x8A, 0x85, 0x94, 0xA7, 0xF2, 0x0D, 0x17,
+                      0x39, 0x4B, 0xDD, 0x7C, 0x84, 0x97, 0xA2, 0xFD, 0x1C, 0x24, 0x6C, 0xB4, 0xC7, 0x52, 0xF6, 0x01)
+RIJNDAEL_LOG_TABLE = (0x00, 0x00, 0x19, 0x01, 0x32, 0x02, 0x1a, 0xc6, 0x4b, 0xc7, 0x1b, 0x68, 0x33, 0xee, 0xdf, 0x03,
+                      0x64, 0x04, 0xe0, 0x0e, 0x34, 0x8d, 0x81, 0xef, 0x4c, 0x71, 0x08, 0xc8, 0xf8, 0x69, 0x1c, 0xc1,
+                      0x7d, 0xc2, 0x1d, 0xb5, 0xf9, 0xb9, 0x27, 0x6a, 0x4d, 0xe4, 0xa6, 0x72, 0x9a, 0xc9, 0x09, 0x78,
+                      0x65, 0x2f, 0x8a, 0x05, 0x21, 0x0f, 0xe1, 0x24, 0x12, 0xf0, 0x82, 0x45, 0x35, 0x93, 0xda, 0x8e,
+                      0x96, 0x8f, 0xdb, 0xbd, 0x36, 0xd0, 0xce, 0x94, 0x13, 0x5c, 0xd2, 0xf1, 0x40, 0x46, 0x83, 0x38,
+                      0x66, 0xdd, 0xfd, 0x30, 0xbf, 0x06, 0x8b, 0x62, 0xb3, 0x25, 0xe2, 0x98, 0x22, 0x88, 0x91, 0x10,
+                      0x7e, 0x6e, 0x48, 0xc3, 0xa3, 0xb6, 0x1e, 0x42, 0x3a, 0x6b, 0x28, 0x54, 0xfa, 0x85, 0x3d, 0xba,
+                      0x2b, 0x79, 0x0a, 0x15, 0x9b, 0x9f, 0x5e, 0xca, 0x4e, 0xd4, 0xac, 0xe5, 0xf3, 0x73, 0xa7, 0x57,
+                      0xaf, 0x58, 0xa8, 0x50, 0xf4, 0xea, 0xd6, 0x74, 0x4f, 0xae, 0xe9, 0xd5, 0xe7, 0xe6, 0xad, 0xe8,
+                      0x2c, 0xd7, 0x75, 0x7a, 0xeb, 0x16, 0x0b, 0xf5, 0x59, 0xcb, 0x5f, 0xb0, 0x9c, 0xa9, 0x51, 0xa0,
+                      0x7f, 0x0c, 0xf6, 0x6f, 0x17, 0xc4, 0x49, 0xec, 0xd8, 0x43, 0x1f, 0x2d, 0xa4, 0x76, 0x7b, 0xb7,
+                      0xcc, 0xbb, 0x3e, 0x5a, 0xfb, 0x60, 0xb1, 0x86, 0x3b, 0x52, 0xa1, 0x6c, 0xaa, 0x55, 0x29, 0x9d,
+                      0x97, 0xb2, 0x87, 0x90, 0x61, 0xbe, 0xdc, 0xfc, 0xbc, 0x95, 0xcf, 0xcd, 0x37, 0x3f, 0x5b, 0xd1,
+                      0x53, 0x39, 0x84, 0x3c, 0x41, 0xa2, 0x6d, 0x47, 0x14, 0x2a, 0x9e, 0x5d, 0x56, 0xf2, 0xd3, 0xab,
+                      0x44, 0x11, 0x92, 0xd9, 0x23, 0x20, 0x2e, 0x89, 0xb4, 0x7c, 0xb8, 0x26, 0x77, 0x99, 0xe3, 0xa5,
+                      0x67, 0x4a, 0xed, 0xde, 0xc5, 0x31, 0xfe, 0x18, 0x0d, 0x63, 0x8c, 0x80, 0xc0, 0xf7, 0x70, 0x07)
 
 def sub_bytes(data):
     return [SBOX[x] for x in data]
 
+def sub_bytes_inv(data):
+    return [SBOX_INV[x] for x in data]
+
 def rotate(data):
     return data[1:] + [data[0]]
 
@@ -160,30 +260,31 @@ def key_schedule_core(data, rcon_iteration):
 def xor(data1, data2):
     return [x^y for x, y in zip(data1, data2)]
 
-def mix_column(data):
+def rijndael_mul(a, b):
+    if(a==0 or b==0):
+        return 0
+    return RIJNDAEL_EXP_TABLE[(RIJNDAEL_LOG_TABLE[a] + RIJNDAEL_LOG_TABLE[b]) % 0xFF]
+
+def mix_column(data, matrix):
     data_mixed = []
     for row in range(4):
         mixed = 0
         for column in range(4):
-            addend = data[column]
-            if MIX_COLUMN_MATRIX[row][column] in (2,3):
-                addend <<= 1
-                if addend > 0xff:
-                    addend &= 0xff
-                    addend ^= 0x1b
-                if MIX_COLUMN_MATRIX[row][column] == 3:
-                    addend ^= data[column]
-            mixed ^= addend & 0xff
+            # in GF(2^8), xor serves as both addition and subtraction
+            mixed ^= rijndael_mul(data[column], matrix[row][column])
         data_mixed.append(mixed)
     return data_mixed
 
-def mix_columns(data):
+def mix_columns(data, matrix=MIX_COLUMN_MATRIX):
     data_mixed = []
     for i in range(4):
         column = data[i*4 : (i+1)*4]
-        data_mixed += mix_column(column)
+        data_mixed += mix_column(column, matrix)
     return data_mixed
 
+def mix_columns_inv(data):
+    return mix_columns(data, MIX_COLUMN_MATRIX_INV)
+
 def shift_rows(data):
     data_shifted = []
     for column in range(4):
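
rijndael_mul replaces the old shift-and-conditionally-xor doubling with general multiplication in GF(2^8) via log/exp tables, which is needed because the inverse MixColumns matrix uses coefficients up to 0xE rather than just 1, 2 and 3. It reproduces the worked example in FIPS-197 section 4.2, {57} x {83} = {c1}:

    from youtube_dl.aes import rijndael_mul

    assert rijndael_mul(0x57, 0x83) == 0xC1
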
@@ -191,6 +292,13 @@ def shift_rows(data):
             data_shifted.append( data[((column + row) & 0b11) * 4 + row] )
     return data_shifted
 
+def shift_rows_inv(data):
+    data_shifted = []
+    for column in range(4):
+        for row in range(4):
+            data_shifted.append( data[((column - row) & 0b11) * 4 + row] )
+    return data_shifted
+
 def inc(data):
     data = data[:] # copy
     for i in range(len(data)-1,-1,-1):
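
shift_rows_inv simply rotates in the opposite direction ((column - row) instead of (column + row)), so composing the two is the identity on a 16-byte state:

    from youtube_dl.aes import shift_rows, shift_rows_inv

    state = list(range(16))
    assert shift_rows_inv(shift_rows(state)) == state
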
index bd996483b4e0d0469ad5a0f04aec4bc078788a13..a39a1e2f49803161913442236244b1910d27755c 100644 (file)
@@ -1,6 +1,8 @@
-from .appletrailers import AppleTrailersIE
+from .academicearth import AcademicEarthCourseIE
 from .addanime import AddAnimeIE
 from .anitube import AnitubeIE
+from .aparat import AparatIE
+from .appletrailers import AppleTrailersIE
 from .archiveorg import ArchiveOrgIE
 from .ard import ARDIE
 from .arte import (
@@ -8,10 +10,12 @@ from .arte import (
     ArteTVPlus7IE,
     ArteTVCreativeIE,
     ArteTVFutureIE,
+    ArteTVDDCIE,
 )
 from .auengine import AUEngineIE
 from .bambuser import BambuserIE, BambuserChannelIE
 from .bandcamp import BandcampIE, BandcampAlbumIE
+from .blinkx import BlinkxIE
 from .bliptv import BlipTVIE, BlipTVUserIE
 from .bloomberg import BloombergIE
 from .breakcom import BreakIE
@@ -19,6 +23,8 @@ from .brightcove import BrightcoveIE
 from .c56 import C56IE
 from .canalplus import CanalplusIE
 from .canalc2 import Canalc2IE
+from .cbs import CBSIE
+from .channel9 import Channel9IE
 from .cinemassacre import CinemassacreIE
 from .clipfish import ClipfishIE
 from .clipsyndicate import ClipsyndicateIE
@@ -27,6 +33,7 @@ from .collegehumor import CollegeHumorIE
 from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
 from .condenast import CondeNastIE
 from .criterion import CriterionIE
+from .crunchyroll import CrunchyrollIE
 from .cspan import CSpanIE
 from .d8 import D8IE
 from .dailymotion import (
@@ -56,7 +63,7 @@ from .flickr import FlickrIE
 from .francetv import (
     PluzzIE,
     FranceTvInfoIE,
-    France2IE,
+    FranceTVIE,
     GenerationQuoiIE
 )
 from .freesound import FreesoundIE
@@ -77,6 +84,10 @@ from .ina import InaIE
 from .infoq import InfoQIE
 from .instagram import InstagramIE
 from .internetvideoarchive import InternetVideoArchiveIE
+from .ivi import (
+    IviIE,
+    IviCompilationIE
+)
 from .jeuxvideo import JeuxVideoIE
 from .jukebox import JukeboxIE
 from .justintv import JustinTVIE
@@ -86,6 +97,7 @@ from .kickstarter import KickStarterIE
 from .keek import KeekIE
 from .liveleak import LiveLeakIE
 from .livestream import LivestreamIE, LivestreamOriginalIE
+from .mdr import MDRIE
 from .metacafe import MetacafeIE
 from .metacritic import MetacriticIE
 from .mit import TechTVMITIE, MITIE
@@ -99,17 +111,22 @@ from .myvideo import MyVideoIE
 from .naver import NaverIE
 from .nba import NBAIE
 from .nbc import NBCNewsIE
+from .ndtv import NDTVIE
 from .newgrounds import NewgroundsIE
 from .nhl import NHLIE, NHLVideocenterIE
 from .niconico import NiconicoIE
+from .ninegag import NineGagIE
 from .nowvideo import NowVideoIE
 from .ooyala import OoyalaIE
 from .orf import ORFIE
 from .pbs import PBSIE
 from .photobucket import PhotobucketIE
 from .podomatic import PodomaticIE
+from .pornhd import PornHdIE
 from .pornhub import PornHubIE
 from .pornotube import PornotubeIE
+from .pyvideo import PyvideoIE
+from .radiofrance import RadioFranceIE
 from .rbmaradio import RBMARadioIE
 from .redtube import RedTubeIE
 from .ringtv import RingTVIE
@@ -125,6 +142,7 @@ from .smotri import (
     SmotriIE,
     SmotriCommunityIE,
     SmotriUserIE,
+    SmotriBroadcastIE,
 )
 from .sohu import SohuIE
 from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
@@ -144,6 +162,7 @@ from .teamcoco import TeamcocoIE
 from .techtalks import TechTalksIE
 from .ted import TEDIE
 from .tf1 import TF1IE
+from .theplatform import ThePlatformIE
 from .thisav import ThisAVIE
 from .toutv import TouTvIE
 from .traileraddict import TrailerAddictIE
@@ -168,6 +187,8 @@ from .vimeo import (
     VimeoIE,
     VimeoChannelIE,
     VimeoUserIE,
+    VimeoAlbumIE,
+    VimeoGroupsIE,
 )
 from .vine import VineIE
 from .viki import VikiIE
@@ -176,6 +197,7 @@ from .wat import WatIE
 from .websurg import WeBSurgIE
 from .weibo import WeiboIE
 from .wimp import WimpIE
+from .wistia import WistiaIE
 from .worldstarhiphop import WorldStarHipHopIE
 from .xhamster import XHamsterIE
 from .xnxx import XNXXIE
@@ -203,6 +225,7 @@ from .youtube import (
     YoutubeWatchLaterIE,
     YoutubeFavouritesIE,
     YoutubeHistoryIE,
+    YoutubeTopListIE,
 )
 from .zdf import ZDFIE
 
diff --git a/youtube_dl/extractor/academicearth.py b/youtube_dl/extractor/academicearth.py
new file mode 100644 (file)
index 0000000..ac05f82
--- /dev/null
@@ -0,0 +1,31 @@
+import re
+
+from .common import InfoExtractor
+
+
+class AcademicEarthCourseIE(InfoExtractor):
+    _VALID_URL = r'^https?://(?:www\.)?academicearth\.org/(?:courses|playlists)/(?P<id>[^?#/]+)'
+    IE_NAME = u'AcademicEarth:Course'
+
+    def _real_extract(self, url):
+        m = re.match(self._VALID_URL, url)
+        playlist_id = m.group('id')
+
+        webpage = self._download_webpage(url, playlist_id)
+        title = self._html_search_regex(
+            r'<h1 class="playlist-name">(.*?)</h1>', webpage, u'title')
+        description = self._html_search_regex(
+            r'<p class="excerpt">(.*?)</p>',
+            webpage, u'description', fatal=False)
+        urls = re.findall(
+            r'<h3 class="lecture-title"><a target="_blank" href="([^"]+)">',
+            webpage)
+        entries = [self.url_result(u) for u in urls]
+
+        return {
+            '_type': 'playlist',
+            'id': playlist_id,
+            'title': title,
+            'description': description,
+            'entries': entries,
+        }
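
The extractor returns a '_type': 'playlist' result whose entries are bare url_results, so each lecture link is handed back to YoutubeDL for a second round of extraction by whatever extractor matches it. A hedged usage sketch (the course slug is illustrative):

    from youtube_dl import YoutubeDL

    ydl = YoutubeDL({'quiet': True})
    course = ydl.extract_info(
        u'http://academicearth.org/playlists/laws-of-nature/', download=False)
    print(course['title'], len(course['entries']))
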
index b99d4b96689c23a13379d4392484c3763ce0e36f..a3a1b999df25da791617c46a793843b2fd6ddc99 100644 (file)
@@ -13,7 +13,7 @@ from ..utils import (
 
 class AddAnimeIE(InfoExtractor):
 
-    _VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
+    _VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video\.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
     IE_NAME = u'AddAnime'
     _TEST = {
         u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
diff --git a/youtube_dl/extractor/aparat.py b/youtube_dl/extractor/aparat.py
new file mode 100644 (file)
index 0000000..7e93bc4
--- /dev/null
@@ -0,0 +1,56 @@
+#coding: utf-8
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    HEADRequest,
+)
+
+
+class AparatIE(InfoExtractor):
+    _VALID_URL = r'^https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
+
+    _TEST = {
+        u'url': u'http://www.aparat.com/v/wP8On',
+        u'file': u'wP8On.mp4',
+        u'md5': u'6714e0af7e0d875c5a39c4dc4ab46ad1',
+        u'info_dict': {
+            u"title": u"تیم گلکسی 11 - زومیت",
+        },
+        #u'skip': u'Extremely unreliable',
+    }
+
+    def _real_extract(self, url):
+        m = re.match(self._VALID_URL, url)
+        video_id = m.group('id')
+
+        # Note: There is an easier-to-parse configuration at
+        # http://www.aparat.com/video/video/config/videohash/%video_id
+        # but the URL in there does not work
+        embed_url = (u'http://www.aparat.com/video/video/embed/videohash/' +
+                     video_id + u'/vt/frame')
+        webpage = self._download_webpage(embed_url, video_id)
+
+        video_urls = re.findall(r'fileList\[[0-9]+\]\s*=\s*"([^"]+)"', webpage)
+        for i, video_url in enumerate(video_urls):
+            req = HEADRequest(video_url)
+            res = self._request_webpage(
+                req, video_id, note=u'Testing video URL %d' % i, errnote=False)
+            if res:
+                break
+        else:
+            raise ExtractorError(u'No working video URLs found')
+
+        title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, u'title')
+        thumbnail = self._search_regex(
+            r'\s+image:\s*"([^"]+)"', webpage, u'thumbnail', fatal=False)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'url': video_url,
+            'ext': 'mp4',
+            'thumbnail': thumbnail,
+        }
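
Since the fileList URLs are frequently dead, the extractor probes each candidate with a HEAD request and keeps the first that answers; the for/else makes the ExtractorError fire only when no candidate worked. The same idiom in isolation (probe and URLs are stand-ins):

    def probe(url):
        # stand-in for the extractor's HEADRequest round-trip
        return url.endswith(u'b.mp4')

    candidates = [u'http://example.com/a.mp4', u'http://example.com/b.mp4']
    for i, video_url in enumerate(candidates):
        if probe(video_url):
            break
    else:  # runs only if the loop never hit `break`
        raise Exception(u'No working video URLs found')
    print(video_url)  # http://example.com/b.mp4
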
index 4befff3942cd5f17fddb48bfb3b4c7f7623af1d6..ef5644aa54fe28002dc4d8c76308941c264252e3 100644 (file)
@@ -1,5 +1,4 @@
 import re
-import xml.etree.ElementTree
 import json
 
 from .common import InfoExtractor
@@ -10,7 +9,7 @@ from ..utils import (
 
 
 class AppleTrailersIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?trailers.apple.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
     _TEST = {
         u"url": u"http://trailers.apple.com/trailers/wb/manofsteel/",
         u"playlist": [
@@ -65,18 +64,18 @@ class AppleTrailersIE(InfoExtractor):
         uploader_id = mobj.group('company')
 
         playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
-        playlist_snippet = self._download_webpage(playlist_url, movie)
-        playlist_cleaned = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', playlist_snippet)
-        playlist_cleaned = re.sub(r'<img ([^<]*?)>', r'<img \1/>', playlist_cleaned)
-        # The ' in the onClick attributes are not escaped, it couldn't be parsed
-        # with xml.etree.ElementTree.fromstring
-        # like: http://trailers.apple.com/trailers/wb/gravity/
-        def _clean_json(m):
-            return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
-        playlist_cleaned = re.sub(self._JSON_RE, _clean_json, playlist_cleaned)
-        playlist_html = u'<html>' + playlist_cleaned + u'</html>'
+        def fix_html(s):
+            s = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', s)
+            s = re.sub(r'<img ([^<]*?)>', r'<img \1/>', s)
+            # The ' in the onClick attributes are not escaped, it couldn't be parsed
+            # like: http://trailers.apple.com/trailers/wb/gravity/
+            def _clean_json(m):
+                return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
+            s = re.sub(self._JSON_RE, _clean_json, s)
+            s = u'<html>' + s + u'</html>'
+            return s
+        doc = self._download_xml(playlist_url, movie, transform_source=fix_html)
 
-        doc = xml.etree.ElementTree.fromstring(playlist_html)
         playlist = []
         for li in doc.findall('./div/ul/li'):
             on_click = li.find('.//a').attrib['onClick']
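
Folding the clean-up into a transform_source callable lets _download_xml fetch, repair and parse in one step. Roughly how such a hook works, assuming the helper applies the callable to the page body before parsing (toy stand-in below, since the real helper also does the HTTP fetch):

    import re
    import xml.etree.ElementTree as ET

    def download_xml(source, transform_source=None):
        if transform_source:
            source = transform_source(source)
        return ET.fromstring(source)

    page = u'<ul><li><img src="x.jpg"></li></ul>'
    doc = download_xml(page, transform_source=lambda s: re.sub(
        r'<img ([^<]*?)>', r'<img \1/>', s))  # close the void <img> tag
    print(doc.find('./li/img').attrib['src'])  # x.jpg
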
index 3ae0aebb1275f0a4b1bed0c1dda3d969c0672a87..8bb546410f7a7486bdaa964bc724cf2c501e8851 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 class ArchiveOrgIE(InfoExtractor):
     IE_NAME = 'archive.org'
     IE_DESC = 'archive.org videos'
-    _VALID_URL = r'(?:https?://)?(?:www\.)?archive.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
+    _VALID_URL = r'(?:https?://)?(?:www\.)?archive\.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
     _TEST = {
         u"url": u"http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect",
         u'file': u'XD300-23_68HighlightsAResearchCntAugHumanIntellect.ogv',
index 8b62ee774cc021d4b77e97aced8034f72d4d64e4..9254fbfe0de5cb9138cb50deeb4719f94c18f92e 100644 (file)
@@ -10,6 +10,7 @@ from ..utils import (
     determine_ext,
     get_element_by_id,
     compat_str,
+    get_element_by_attribute,
 )
 
 # There are different sources of video in arte.tv, the extraction process 
@@ -17,8 +18,8 @@ from ..utils import (
 # add tests.
 
 class ArteTvIE(InfoExtractor):
-    _VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html'
-    _LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
+    _VIDEOS_URL = r'(?:http://)?videos\.arte\.tv/(?P<lang>fr|de)/.*-(?P<id>.*?)\.html'
+    _LIVEWEB_URL = r'(?:http://)?liveweb\.arte\.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
     _LIVE_URL = r'index-[0-9]+\.html$'
 
     IE_NAME = u'arte.tv'
@@ -142,7 +143,9 @@ class ArteTVPlus7IE(InfoExtractor):
 
     def _extract_from_webpage(self, webpage, video_id, lang):
         json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
+        return self._extract_from_json_url(json_url, video_id, lang)
 
+    def _extract_from_json_url(self, json_url, video_id, lang):
         json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
         self.report_extraction(video_id)
         info = json.loads(json_info)
@@ -257,3 +260,21 @@ class ArteTVFutureIE(ArteTVPlus7IE):
         webpage = self._download_webpage(url, anchor_id)
         row = get_element_by_id(anchor_id, webpage)
         return self._extract_from_webpage(row, anchor_id, lang)
+
+
+class ArteTVDDCIE(ArteTVPlus7IE):
+    IE_NAME = u'arte.tv:ddc'
+    _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
+
+    def _real_extract(self, url):
+        video_id, lang = self._extract_url_info(url)
+        if lang == 'folge':
+            lang = 'de'
+        elif lang == 'emission':
+            lang = 'fr'
+        webpage = self._download_webpage(url, video_id)
+        script_element = get_element_by_attribute('class', 'visu_video_block', webpage)
+        script_url = self._html_search_regex(r'src="(.*?)"', script_element, 'script url')
+        player_generator = self._download_webpage(script_url, video_id, 'Downloading JavaScript player generator')
+        json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", player_generator, 'json url')
+        return self._extract_from_json_url(json_url, video_id, lang)
index 95c038003b431dc48ac3bb89dcc03f8aa39ea07f..bcccc0b7a54c8b03b84a3e1303672509577faa66 100644 (file)
@@ -16,7 +16,7 @@ class AUEngineIE(InfoExtractor):
             u"title": u"[Commie]The Legend of the Legendary Heroes - 03 - Replication Eye (Alpha Stigma)[F9410F5A]"
         }
     }
-    _VALID_URL = r'(?:http://)?(?:www\.)?auengine\.com/embed.php\?.*?file=([^&]+).*?'
+    _VALID_URL = r'(?:http://)?(?:www\.)?auengine\.com/embed\.php\?.*?file=([^&]+).*?'
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
index b80508efed09a7ccece8e6980706e7083d3b96e9..d48c0c38d0ecfc787ce364e015d5a53260b922d4 100644 (file)
@@ -54,7 +54,7 @@ class BambuserIE(InfoExtractor):
 
 class BambuserChannelIE(InfoExtractor):
     IE_NAME = u'bambuser:channel'
-    _VALID_URL = r'http://bambuser.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
+    _VALID_URL = r'https?://bambuser\.com/channel/(?P<user>.*?)(?:/|#|\?|$)'
     # The maximum number we can get with each request
     _STEP = 50
 
diff --git a/youtube_dl/extractor/blinkx.py b/youtube_dl/extractor/blinkx.py
new file mode 100644 (file)
index 0000000..144ce64
--- /dev/null
@@ -0,0 +1,90 @@
+import datetime
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    remove_start,
+)
+
+
+class BlinkxIE(InfoExtractor):
+    _VALID_URL = r'^(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
+    IE_NAME = u'blinkx'
+
+    _TEST = {
+        u'url': u'http://www.blinkx.com/ce/8aQUy7GVFYgFzpKhT0oqsilwOGFRVXk3R1ZGWWdGenBLaFQwb3FzaWx3OGFRVXk3R1ZGWWdGenB',
+        u'file': u'8aQUy7GV.mp4',
+        u'md5': u'2e9a07364af40163a908edbf10bb2492',
+        u'info_dict': {
+            u"title": u"Police Car Rolls Away",
+            u"uploader": u"stupidvideos.com",
+            u"upload_date": u"20131215",
+            u"description": u"A police car gently rolls away from a fight. Maybe it felt weird being around a confrontation and just had to get out of there!",
+            u"duration": 14.886,
+            u"thumbnails": [{
+                "width": 100,
+                "height": 76,
+                "url": "http://cdn.blinkx.com/stream/b/41/StupidVideos/20131215/1873969261/1873969261_tn_0.jpg",
+            }],
+        },
+    }
+
+    def _real_extract(self, url):
+        m = re.match(self._VALID_URL, url)
+        video_id = m.group('id')
+        display_id = video_id[:8]
+
+        api_url = (u'https://apib4.blinkx.com/api.php?action=play_video&' +
+                   u'video=%s' % video_id)
+        data_json = self._download_webpage(api_url, display_id)
+        data = json.loads(data_json)['api']['results'][0]
+        dt = datetime.datetime.fromtimestamp(data['pubdate_epoch'])
+        upload_date = dt.strftime('%Y%m%d')
+
+        duration = None
+        thumbnails = []
+        formats = []
+        for m in data['media']:
+            if m['type'] == 'jpg':
+                thumbnails.append({
+                    'url': m['link'],
+                    'width': int(m['w']),
+                    'height': int(m['h']),
+                })
+            elif m['type'] == 'original':
+                duration = m['d']
+            elif m['type'] == 'youtube':
+                yt_id = m['link']
+                self.to_screen(u'Youtube video detected: %s' % yt_id)
+                return self.url_result(yt_id, 'Youtube', video_id=yt_id)
+            elif m['type'] in ('flv', 'mp4'):
+                vcodec = remove_start(m['vcodec'], 'ff')
+                acodec = remove_start(m['acodec'], 'ff')
+                format_id = (u'%s-%sk-%s' %
+                             (vcodec,
+                              (int(m['vbr']) + int(m['abr'])) // 1000,
+                              m['w']))
+                formats.append({
+                    'format_id': format_id,
+                    'url': m['link'],
+                    'vcodec': vcodec,
+                    'acodec': acodec,
+                    'abr': int(m['abr']) // 1000,
+                    'vbr': int(m['vbr']) // 1000,
+                    'width': int(m['w']),
+                    'height': int(m['h']),
+                })
+        formats.sort(key=lambda f: (f['width'], f['vbr'], f['abr']))
+
+        return {
+            'id': display_id,
+            'fullid': video_id,
+            'title': data['title'],
+            'formats': formats,
+            'uploader': data['channel_name'],
+            'upload_date': upload_date,
+            'description': data.get('description'),
+            'thumbnails': thumbnails,
+            'duration': duration,
+        }
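
For illustration, here is what the format naming above produces for one hypothetical media entry (all values are made up; remove_start is restated inline so the snippet is self-contained):

    def remove_start(s, start):
        # same behaviour as youtube_dl.utils.remove_start
        return s[len(start):] if s.startswith(start) else s

    m = {'type': 'mp4', 'vcodec': 'ffh264', 'acodec': 'ffaac',
         'vbr': 764000, 'abr': 128000, 'w': 480, 'h': 268}
    vcodec = remove_start(m['vcodec'], 'ff')  # 'h264'
    acodec = remove_start(m['acodec'], 'ff')  # 'aac'
    format_id = '%s-%sk-%s' % (vcodec, (int(m['vbr']) + int(m['abr'])) // 1000, m['w'])
    print(format_id)  # h264-892k-480
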
index 493504f75082f7b7605121acbfd88dbb621e84fb..5e33a69df42fcbaa1b17f1737d66f5841ca50318 100644 (file)
@@ -51,8 +51,7 @@ class BlipTVIE(InfoExtractor):
             url = 'http://blip.tv/play/g_%s' % api_mobj.group('video_id')
         urlp = compat_urllib_parse_urlparse(url)
         if urlp.path.startswith('/play/'):
-            request = compat_urllib_request.Request(url)
-            response = compat_urllib_request.urlopen(request)
+            response = self._request_webpage(url, None, False)
             redirecturl = response.geturl()
             rurlp = compat_urllib_parse_urlparse(redirecturl)
             file_id = compat_parse_qs(rurlp.fragment)['file'][0].rpartition('/')[2]
@@ -69,25 +68,23 @@ class BlipTVIE(InfoExtractor):
         request.add_header('User-Agent', 'iTunes/10.6.1')
         self.report_extraction(mobj.group(1))
         info = None
-        try:
-            urlh = compat_urllib_request.urlopen(request)
-            if urlh.headers.get('Content-Type', '').startswith('video/'): # Direct download
-                basename = url.split('/')[-1]
-                title,ext = os.path.splitext(basename)
-                title = title.decode('UTF-8')
-                ext = ext.replace('.', '')
-                self.report_direct_download(title)
-                info = {
-                    'id': title,
-                    'url': url,
-                    'uploader': None,
-                    'upload_date': None,
-                    'title': title,
-                    'ext': ext,
-                    'urlhandle': urlh
-                }
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            raise ExtractorError(u'ERROR: unable to download video info webpage: %s' % compat_str(err))
+        urlh = self._request_webpage(request, None, False,
+            u'unable to download video info webpage')
+        if urlh.headers.get('Content-Type', '').startswith('video/'): # Direct download
+            basename = url.split('/')[-1]
+            title, ext = os.path.splitext(basename)
+            title = title.decode('UTF-8')
+            ext = ext.replace('.', '')
+            self.report_direct_download(title)
+            info = {
+                'id': title,
+                'url': url,
+                'uploader': None,
+                'upload_date': None,
+                'title': title,
+                'ext': ext,
+                'urlhandle': urlh
+            }
         if info is None: # Regular URL
             try:
                 json_code_bytes = urlh.read()
index 3666a780b9209da0125d319e3851f40a05bc4e4f..755d9c9ef2a093289df91409097320908ea06df7 100644 (file)
@@ -4,7 +4,7 @@ from .common import InfoExtractor
 
 
 class BloombergIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?).html'
+    _VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?)\.html'
 
     _TEST = {
         u'url': u'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
index 66fe0ac9ade6fad80d77f0429c136c2d022af16d..f7f0041c0872f84349d2ee060ef8ada9aed9d6bd 100644 (file)
@@ -26,7 +26,7 @@ class BrightcoveIE(InfoExtractor):
             # From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
             u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
             u'file': u'2371591881001.mp4',
-            u'md5': u'8eccab865181d29ec2958f32a6a754f5',
+            u'md5': u'5423e113865d26e40624dce2e4b45d95',
             u'note': u'Test Brightcove downloads and detection in GenericIE',
             u'info_dict': {
                 u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
@@ -55,6 +55,18 @@ class BrightcoveIE(InfoExtractor):
                 u'uploader': u'Mashable',
             },
         },
+        {
+            # test that the default referer works
+            # from http://national.ballet.ca/interact/video/Lost_in_Motion_II/
+            u'url': u'http://link.brightcove.com/services/player/bcpid756015033001?bckey=AQ~~,AAAApYJi_Ck~,GxhXCegT1Dp39ilhXuxMJxasUhVNZiil&bctid=2878862109001',
+            u'info_dict': {
+                u'id': u'2878862109001',
+                u'ext': u'mp4',
+                u'title': u'Lost in Motion II',
+                u'description': u'md5:363109c02998fee92ec02211bd8000df',
+                u'uploader': u'National Ballet of Canada',
+            },
+        },
     ]
 
     @classmethod
@@ -118,17 +130,21 @@ class BrightcoveIE(InfoExtractor):
 
         videoPlayer = query.get('@videoPlayer')
         if videoPlayer:
-            return self._get_video_info(videoPlayer[0], query_str, query)
+            return self._get_video_info(videoPlayer[0], query_str, query,
+                # We set the original url as the default 'Referer' header
+                referer=url)
         else:
             player_key = query['playerKey']
             return self._get_playlist_info(player_key[0])
 
-    def _get_video_info(self, video_id, query_str, query):
+    def _get_video_info(self, video_id, query_str, query, referer=None):
         request_url = self._FEDERATED_URL_TEMPLATE % query_str
         req = compat_urllib_request.Request(request_url)
         linkBase = query.get('linkBaseURL')
         if linkBase is not None:
-            req.add_header('Referer', linkBase[0])
+            referer = linkBase[0]
+        if referer is not None:
+            req.add_header('Referer', referer)
         webpage = self._download_webpage(req, video_id)
 
         self.report_extraction(video_id)
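
The referer plumbing above reduces to: an explicit linkBaseURL from the query wins, otherwise fall back to the page URL that embedded the player. A rough standalone equivalent using the Python 3 stdlib rather than the compat wrappers (URL values are illustrative, taken from the test above):

    import urllib.request

    def build_info_request(request_url, query, referer=None):
        req = urllib.request.Request(request_url)
        link_base = query.get('linkBaseURL')
        if link_base is not None:
            referer = link_base[0]       # an explicit linkBaseURL takes precedence
        if referer is not None:
            req.add_header('Referer', referer)
        return req

    req = build_info_request(
        'http://c.brightcove.com/services/viewer/htmlFederated',
        query={}, referer='http://national.ballet.ca/interact/video/Lost_in_Motion_II/')
    print(req.get_header('Referer'))
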
diff --git a/youtube_dl/extractor/cbs.py b/youtube_dl/extractor/cbs.py
new file mode 100644 (file)
index 0000000..ac03158
--- /dev/null
@@ -0,0 +1,30 @@
+import re
+
+from .common import InfoExtractor
+
+
+class CBSIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?cbs\.com/shows/[^/]+/video/(?P<id>[^/]+)/.*'
+
+    _TEST = {
+        u'url': u'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
+        u'file': u'4JUVEwq3wUT7.flv',
+        u'info_dict': {
+            u'title': u'Connect Chat feat. Garth Brooks',
+            u'description': u'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
+            u'duration': 1495,
+        },
+        u'params': {
+            # rtmp download
+            u'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        webpage = self._download_webpage(url, video_id)
+        real_id = self._search_regex(
+            r"video\.settings\.pid\s*=\s*'([^']+)';",
+            webpage, u'real video ID')
+        return self.url_result(u'theplatform:%s' % real_id)
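
The CBS extractor never touches the media itself: it scrapes the platform ID out of the page and delegates through a theplatform: pseudo-URL. What that final url_result call hands back is just a pointer dict (sketch using the ID from the test above; see the url_result helper further down in common.py):

    real_id = '4JUVEwq3wUT7'
    result = {'_type': 'url', 'url': u'theplatform:%s' % real_id}
    # YoutubeDL sees _type == 'url' and re-dispatches it, here to ThePlatformIE
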
diff --git a/youtube_dl/extractor/channel9.py b/youtube_dl/extractor/channel9.py
new file mode 100644 (file)
index 0000000..ae70ea2
--- /dev/null
@@ -0,0 +1,267 @@
+# encoding: utf-8
+
+import re
+
+from .common import InfoExtractor
+from ..utils import ExtractorError
+
+class Channel9IE(InfoExtractor):
+    '''
+    Common extractor for channel9.msdn.com.
+
+    Whether a URL points to a video or a playlist is determined from the
+    Search.PageType meta tag in the page HTML rather than from the URL itself,
+    as the URL alone is not always enough to tell.
+    '''
+    IE_DESC = u'Channel 9'
+    IE_NAME = u'channel9'
+    _VALID_URL = r'^https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
+
+    _TESTS = [
+        {
+            u'url': u'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
+            u'file': u'Events_TechEd_Australia_2013_KOS002.mp4',
+            u'md5': u'bbd75296ba47916b754e73c3a4bbdf10',
+            u'info_dict': {
+                u'title': u'Developer Kick-Off Session: Stuff We Love',
+                u'description': u'md5:c08d72240b7c87fcecafe2692f80e35f',
+                u'duration': 4576,
+                u'thumbnail': u'http://media.ch9.ms/ch9/9d51/03902f2d-fc97-4d3c-b195-0bfe15a19d51/KOS002_220.jpg',
+                u'session_code': u'KOS002',
+                u'session_day': u'Day 1',
+                u'session_room': u'Arena 1A',
+                u'session_speakers': [ u'Ed Blankenship', u'Andrew Coates', u'Brady Gaster', u'Patrick Klug', u'Mads Kristensen' ],
+            },
+        },
+        {
+            u'url': u'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
+            u'file': u'posts_Self-service-BI-with-Power-BI-nuclear-testing.mp4',
+            u'md5': u'b43ee4529d111bc37ba7ee4f34813e68',
+            u'info_dict': {
+                u'title': u'Self-service BI with Power BI - nuclear testing',
+                u'description': u'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
+                u'duration': 1540,
+                u'thumbnail': u'http://media.ch9.ms/ch9/87e1/0300391f-a455-4c72-bec3-4422f19287e1/selfservicenuk_512.jpg',
+                u'authors': [ u'Mike Wilmot' ],
+            },
+        }
+    ]
+
+    _RSS_URL = 'http://channel9.msdn.com/%s/RSS'
+
+    # Sorted by quality
+    _known_formats = ['MP3', 'MP4', 'Mid Quality WMV', 'Mid Quality MP4', 'High Quality WMV', 'High Quality MP4']
+
+    def _restore_bytes(self, formatted_size):
+        if not formatted_size:
+            return 0
+        m = re.match(r'^(?P<size>\d+(?:\.\d+)?)\s+(?P<units>[a-zA-Z]+)', formatted_size)
+        if not m:
+            return 0
+        units = m.group('units')
+        try:
+            exponent = [u'B', u'KB', u'MB', u'GB', u'TB', u'PB', u'EB', u'ZB', u'YB'].index(units.upper())
+        except ValueError:
+            return 0
+        size = float(m.group('size'))
+        return int(size * (1024 ** exponent))
+
+    def _formats_from_html(self, html):
+        FORMAT_REGEX = r'''(?x)
+            <a\s+href="(?P<url>[^"]+)">(?P<quality>[^<]+)</a>\s*
+            <span\s+class="usage">\((?P<note>[^\)]+)\)</span>\s*
+            (?:<div\s+class="popup\s+rounded">\s*
+            <h3>File\s+size</h3>\s*(?P<filesize>.*?)\s*
+            </div>)?                                                # File size part may be missing
+        '''
+        # Extract known formats
+        formats = [{'url': x.group('url'),
+                 'format_id': x.group('quality'),
+                 'format_note': x.group('note'),
+                 'format': '%s (%s)' % (x.group('quality'), x.group('note')), 
+                 'filesize': self._restore_bytes(x.group('filesize')), # File size is approximate
+                 } for x in list(re.finditer(FORMAT_REGEX, html)) if x.group('quality') in self._known_formats]
+        # Sort according to known formats list
+        formats.sort(key=lambda fmt: self._known_formats.index(fmt['format_id']))
+        return formats
+
+    def _extract_title(self, html):
+        title = self._html_search_meta(u'title', html, u'title')
+        if title is None:           
+            title = self._og_search_title(html)
+            TITLE_SUFFIX = u' (Channel 9)'
+            if title is not None and title.endswith(TITLE_SUFFIX):
+                title = title[:-len(TITLE_SUFFIX)]
+        return title
+
+    def _extract_description(self, html):
+        DESCRIPTION_REGEX = r'''(?sx)
+            <div\s+class="entry-content">\s*
+            <div\s+id="entry-body">\s*
+            (?P<description>.+?)\s*
+            </div>\s*
+            </div>
+        '''
+        m = re.search(DESCRIPTION_REGEX, html)
+        if m is not None:
+            return m.group('description')
+        return self._html_search_meta(u'description', html, u'description')
+
+    def _extract_duration(self, html):
+        m = re.search(r'data-video_duration="(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"', html)
+        return ((int(m.group('hours')) * 60 * 60) + (int(m.group('minutes')) * 60) + int(m.group('seconds'))) if m else None
+
+    def _extract_slides(self, html):
+        m = re.search(r'<a href="(?P<slidesurl>[^"]+)" class="slides">Slides</a>', html)
+        return m.group('slidesurl') if m is not None else None
+
+    def _extract_zip(self, html):
+        m = re.search(r'<a href="(?P<zipurl>[^"]+)" class="zip">Zip</a>', html)
+        return m.group('zipurl') if m is not None else None
+
+    def _extract_avg_rating(self, html):
+        m = re.search(r'<p class="avg-rating">Avg Rating: <span>(?P<avgrating>[^<]+)</span></p>', html)
+        return float(m.group('avgrating')) if m is not None else 0
+
+    def _extract_rating_count(self, html):
+        m = re.search(r'<div class="rating-count">\((?P<ratingcount>[^<]+)\)</div>', html)
+        return int(self._fix_count(m.group('ratingcount'))) if m is not None else 0
+
+    def _extract_view_count(self, html):
+        m = re.search(r'<li class="views">\s*<span class="count">(?P<viewcount>[^<]+)</span> Views\s*</li>', html)
+        return int(self._fix_count(m.group('viewcount'))) if m is not None else 0
+
+    def _extract_comment_count(self, html):
+        m = re.search(r'<li class="comments">\s*<a href="#comments">\s*<span class="count">(?P<commentcount>[^<]+)</span> Comments\s*</a>\s*</li>', html)
+        return int(self._fix_count(m.group('commentcount'))) if m is not None else 0
+
+    def _fix_count(self, count):
+        return int(str(count).replace(',', '')) if count is not None else None
+
+    def _extract_authors(self, html):
+        m = re.search(r'(?s)<li class="author">(.*?)</li>', html)
+        if m is None:
+            return None
+        return re.findall(r'<a href="/Niners/[^"]+">([^<]+)</a>', m.group(1))
+
+    def _extract_session_code(self, html):
+        m = re.search(r'<li class="code">\s*(?P<code>.+?)\s*</li>', html)
+        return m.group('code') if m is not None else None
+
+    def _extract_session_day(self, html):
+        m = re.search(r'<li class="day">\s*<a href="/Events/[^"]+">(?P<day>[^<]+)</a>\s*</li>', html)
+        return m.group('day') if m is not None else None
+
+    def _extract_session_room(self, html):
+        m = re.search(r'<li class="room">\s*(?P<room>.+?)\s*</li>', html)
+        return m.group('room') if m is not None else None
+
+    def _extract_session_speakers(self, html):
+        return re.findall(r'<a href="/Events/Speakers/[^"]+">([^<]+)</a>', html)
+
+    def _extract_content(self, html, content_path):
+        # Look for downloadable content        
+        formats = self._formats_from_html(html)
+        slides = self._extract_slides(html)
+        zip_ = self._extract_zip(html)
+
+        # Nothing to download
+        if len(formats) == 0 and slides is None and zip_ is None:
+            self._downloader.report_warning(u'None of the recording, slides or zip is available for %s' % content_path)
+            return
+
+        # Extract meta
+        title = self._extract_title(html)
+        description = self._extract_description(html)
+        thumbnail = self._og_search_thumbnail(html)
+        duration = self._extract_duration(html)
+        avg_rating = self._extract_avg_rating(html)
+        rating_count = self._extract_rating_count(html)
+        view_count = self._extract_view_count(html)
+        comment_count = self._extract_comment_count(html)
+
+        common = {'_type': 'video',
+                  'id': content_path,
+                  'description': description,
+                  'thumbnail': thumbnail,
+                  'duration': duration,
+                  'avg_rating': avg_rating,
+                  'rating_count': rating_count,
+                  'view_count': view_count,
+                  'comment_count': comment_count,
+                }
+
+        result = []
+
+        if slides is not None:
+            d = common.copy()
+            d.update({ 'title': title + '-Slides', 'url': slides })
+            result.append(d)
+
+        if zip_ is not None:
+            d = common.copy()
+            d.update({ 'title': title + '-Zip', 'url': zip_ })
+            result.append(d)
+
+        if len(formats) > 0:
+            d = common.copy()
+            d.update({ 'title': title, 'formats': formats })
+            result.append(d)
+
+        return result
+
+    def _extract_entry_item(self, html, content_path):
+        contents = self._extract_content(html, content_path)
+        if contents is None:
+            return contents
+
+        authors = self._extract_authors(html)
+
+        for content in contents:
+            content['authors'] = authors
+
+        return contents
+
+    def _extract_session(self, html, content_path):
+        contents = self._extract_content(html, content_path)
+        if contents is None:
+            return contents
+
+        session_meta = {'session_code': self._extract_session_code(html),
+                        'session_day': self._extract_session_day(html),
+                        'session_room': self._extract_session_room(html),
+                        'session_speakers': self._extract_session_speakers(html),
+                        }
+
+        for content in contents:
+            content.update(session_meta)
+
+        return contents
+
+    def _extract_list(self, content_path):
+        rss = self._download_xml(self._RSS_URL % content_path, content_path, u'Downloading RSS')
+        entries = [self.url_result(session_url.text, 'Channel9')
+                   for session_url in rss.findall('./channel/item/link')]
+        title_text = rss.find('./channel/title').text
+        return self.playlist_result(entries, content_path, title_text)
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        content_path = mobj.group('contentpath')
+
+        webpage = self._download_webpage(url, content_path, u'Downloading web page')
+
+        page_type_m = re.search(r'<meta name="Search.PageType" content="(?P<pagetype>[^"]+)"/>', webpage)
+        if page_type_m is None:
+            raise ExtractorError(u'Search.PageType not found, don\'t know how to process this page', expected=True)
+
+        page_type = page_type_m.group('pagetype')
+        if page_type == 'List':         # List page, may contain list of 'item'-like objects
+            return self._extract_list(content_path)
+        elif page_type == 'Entry.Item': # Any 'item'-like page, may contain downloadable content
+            return self._extract_entry_item(webpage, content_path)
+        elif page_type == 'Session':    # Event session page, may contain downloadable content
+            return self._extract_session(webpage, content_path)
+        else:
+            raise ExtractorError(u'Unexpected Search.PageType %s' % page_type, expected=True)
\ No newline at end of file
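
As a quick check of the size parsing above, _restore_bytes turns the human-readable file sizes shown on the page back into byte counts; a standalone restatement of the same arithmetic:

    import re

    def restore_bytes(formatted_size):
        m = re.match(r'^(?P<size>\d+(?:\.\d+)?)\s+(?P<units>[a-zA-Z]+)', formatted_size or '')
        if not m:
            return 0
        units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']
        try:
            exponent = units.index(m.group('units').upper())
        except ValueError:
            return 0
        return int(float(m.group('size')) * (1024 ** exponent))

    print(restore_bytes('1.5 MB'))   # 1572864
    print(restore_bytes('512 KB'))   # 524288
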
index d4fc869732a8ae15e60f0963ef3418abac1a9201..c60089ad353274adaa380671cee9d4e3ce2e2718 100644 (file)
@@ -1,9 +1,9 @@
 import re
-import xml.etree.ElementTree
 
 from .common import InfoExtractor
 from ..utils import (
     find_xpath_attr,
+    fix_xml_all_ampersand,
 )
 
 
@@ -30,12 +30,10 @@ class ClipsyndicateIE(InfoExtractor):
         # it includes a required token
         flvars = self._search_regex(r'flvars: "(.*?)"', js_player, u'flvars')
 
-        playlist_page = self._download_webpage(
+        pdoc = self._download_xml(
             'http://eplayer.clipsyndicate.com/osmf/playlist?%s' % flvars,
-            video_id, u'Downloading video info') 
-        # Fix broken xml
-        playlist_page = re.sub('&', '&amp;', playlist_page)
-        pdoc = xml.etree.ElementTree.fromstring(playlist_page.encode('utf-8'))
+            video_id, u'Downloading video info',
+            transform_source=fix_xml_all_ampersand) 
 
         track_doc = pdoc.find('trackList/track')
         def find_param(name):
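
The ad-hoc re.sub is now a named transform_source helper. A sketch of the idea, assuming fix_xml_all_ampersand performs the same blanket escaping the removed inline code did:

    import xml.etree.ElementTree as ET

    def fix_xml_all_ampersand(xml_str):
        # assumption: mirrors the removed inline re.sub('&', '&amp;', ...)
        return xml_str.replace('&', '&amp;')

    # raw '&' in the URL makes this not well-formed XML as served
    broken = '<playlist><track><location>http://e.example/v.flv?a=1&b=2</location></track></playlist>'
    pdoc = ET.fromstring(fix_xml_all_ampersand(broken).encode('utf-8'))
    print(pdoc.find('track/location').text)  # http://e.example/v.flv?a=1&b=2
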
index 53579aa2703e78150c14dfe5dfd35e6240310952..a54ce3ee7c44727a9e56b1ab8359bd099b48bb35 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 
 
 class ComedyCentralIE(MTVServicesInfoExtractor):
-    _VALID_URL = r'http://www.comedycentral.com/(video-clips|episodes|cc-studios)/(?P<title>.*)'
+    _VALID_URL = r'https?://(?:www.)?comedycentral.com/(video-clips|episodes|cc-studios)/(?P<title>.*)'
     _FEED_URL = u'http://comedycentral.com/feeds/mrss/'
 
     _TEST = {
index 1b049082de5bbc9541a6513acc4124a478b87ea0..ba46a7bc77d17ed4bcf4dcf7764b1d39f4799958 100644 (file)
@@ -18,6 +18,7 @@ from ..utils import (
     sanitize_filename,
     unescapeHTML,
 )
+_NO_DEFAULT = object()
 
 
 class InfoExtractor(object):
@@ -34,15 +35,39 @@ class InfoExtractor(object):
     The dictionaries must include the following fields:
 
     id:             Video identifier.
-    url:            Final video URL.
     title:          Video title, unescaped.
-    ext:            Video filename extension.
 
-    Instead of url and ext, formats can also specified.
+    Additionally, it must contain either a formats entry or url and ext:
+
+    formats:        A list of dictionaries for each format available, it must
+                    be ordered from worst to best quality. Potential fields:
+                    * url        Mandatory. The URL of the video file
+                    * ext        Will be calculated from url if missing
+                    * format     A human-readable description of the format
+                                 ("mp4 container with h264/opus").
+                                 Calculated from the format_id, width, height,
+                                 and format_note fields if missing.
+                    * format_id  A short description of the format
+                                 ("mp4_h264_opus" or "19")
+                    * format_note Additional info about the format
+                                 ("3D" or "DASH video")
+                    * width      Width of the video, if known
+                    * height     Height of the video, if known
+                    * abr        Average audio bitrate in KBit/s
+                    * acodec     Name of the audio codec in use
+                    * vbr        Average video bitrate in KBit/s
+                    * vcodec     Name of the video codec in use
+                    * filesize   The number of bytes, if known in advance
+                    * player_url SWF Player URL (used for rtmpdump).
+    url:            Final video URL.
+    ext:            Video filename extension.
+    format:         The video format, defaults to ext (used for --get-format)
+    player_url:     SWF Player URL (used for rtmpdump).
+    urlhandle:      [internal] The urlHandle to be used to download the file,
+                    like returned by urllib.request.urlopen
 
     The following fields are optional:
 
-    format:         The video format, defaults to ext (used for --get-format)
     thumbnails:     A list of dictionaries (with the entries "resolution" and
                     "url") for the varying thumbnails
     thumbnail:      Full URL to a video thumbnail image.
@@ -51,32 +76,14 @@ class InfoExtractor(object):
     upload_date:    Video upload date (YYYYMMDD).
     uploader_id:    Nickname or id of the video uploader.
     location:       Physical location of the video.
-    player_url:     SWF Player URL (used for rtmpdump).
     subtitles:      The subtitle file contents as a dictionary in the format
                     {language: subtitles}.
+    duration:       Length of the video in seconds, as an integer.
     view_count:     How many users have watched the video on the platform.
-    urlhandle:      [internal] The urlHandle to be used to download the file,
-                    like returned by urllib.request.urlopen
+    like_count:     Number of positive ratings of the video
+    dislike_count:  Number of negative ratings of the video
+    comment_count:  Number of comments on the video
     age_limit:      Age restriction for the video, as an integer (years)
-    formats:        A list of dictionaries for each format available, it must
-                    be ordered from worst to best quality. Potential fields:
-                    * url       Mandatory. The URL of the video file
-                    * ext       Will be calculated from url if missing
-                    * format    A human-readable description of the format
-                                ("mp4 container with h264/opus").
-                                Calculated from the format_id, width, height.
-                                and format_note fields if missing.
-                    * format_id A short description of the format
-                                ("mp4_h264_opus" or "19")
-                    * format_note Additional info about the format
-                                ("3D" or "DASH video")
-                    * width     Width of the video, if known
-                    * height    Height of the video, if known
-                    * abr       Average audio bitrate in KBit/s
-                    * acodec    Name of the audio codec in use
-                    * vbr       Average video bitrate in KBit/s
-                    * vcodec    Name of the video codec in use
-                    * filesize  The number of bytes, if known in advance
     webpage_url:    The url to the video webpage, if given to youtube-dl it
                     should allow to get the same result again. (It will be set
                     by YoutubeDL if it's missing)
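
Under the reworded contract, a minimal result an extractor may return looks like this (an illustrative dict, not taken from any real extractor; note the formats list is ordered worst to best):

    info = {
        'id': '12345',
        'title': 'Some video',
        'formats': [
            {'url': 'http://example.com/low.mp4', 'ext': 'mp4',
             'format_id': '18', 'width': 640, 'height': 360},
            {'url': 'http://example.com/high.mp4', 'ext': 'mp4',
             'format_id': '22', 'width': 1280, 'height': 720},
        ],
    }
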
@@ -151,27 +158,40 @@ class InfoExtractor(object):
     def IE_NAME(self):
         return type(self).__name__[:-2]
 
-    def _request_webpage(self, url_or_request, video_id, note=None, errnote=None):
+    def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
         """ Returns the response handle """
         if note is None:
             self.report_download_webpage(video_id)
         elif note is not False:
-            self.to_screen(u'%s: %s' % (video_id, note))
+            if video_id is None:
+                self.to_screen(u'%s' % (note,))
+            else:
+                self.to_screen(u'%s: %s' % (video_id, note))
         try:
             return self._downloader.urlopen(url_or_request)
         except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
+            if errnote is False:
+                return False
             if errnote is None:
                 errnote = u'Unable to download webpage'
-            raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2], cause=err)
+            errmsg = u'%s: %s' % (errnote, compat_str(err))
+            if fatal:
+                raise ExtractorError(errmsg, sys.exc_info()[2], cause=err)
+            else:
+                self._downloader.report_warning(errmsg)
+                return False
 
-    def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None):
+    def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
         """ Returns a tuple (page content as string, URL handle) """
 
         # Strip hashes from the URL (#1038)
         if isinstance(url_or_request, (compat_str, str)):
             url_or_request = url_or_request.partition('#')[0]
 
-        urlh = self._request_webpage(url_or_request, video_id, note, errnote)
+        urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal)
+        if urlh is False:
+            assert not fatal
+            return False
         content_type = urlh.headers.get('Content-Type', '')
         webpage_bytes = urlh.read()
         m = re.match(r'[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+\s*;\s*charset=(.+)', content_type)
@@ -206,14 +226,22 @@ class InfoExtractor(object):
         content = webpage_bytes.decode(encoding, 'replace')
         return (content, urlh)
 
-    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None):
+    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
         """ Returns the data of the page as a string """
-        return self._download_webpage_handle(url_or_request, video_id, note, errnote)[0]
+        res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal)
+        if res is False:
+            return res
+        else:
+            content, _ = res
+            return content
 
     def _download_xml(self, url_or_request, video_id,
-                      note=u'Downloading XML', errnote=u'Unable to download XML'):
+                      note=u'Downloading XML', errnote=u'Unable to download XML',
+                      transform_source=None):
         """Return the xml as an xml.etree.ElementTree.Element"""
         xml_string = self._download_webpage(url_or_request, video_id, note, errnote)
+        if transform_source:
+            xml_string = transform_source(xml_string)
         return xml.etree.ElementTree.fromstring(xml_string.encode('utf-8'))
 
     def to_screen(self, msg):
@@ -237,7 +265,8 @@ class InfoExtractor(object):
         self.to_screen(u'Logging in')
 
     #Methods for following #608
-    def url_result(self, url, ie=None, video_id=None):
+    @staticmethod
+    def url_result(url, ie=None, video_id=None):
         """Returns a url that points to a page that should be processed"""
         #TODO: ie should be the class used for getting the info
         video_info = {'_type': 'url',
@@ -246,7 +275,8 @@ class InfoExtractor(object):
         if video_id is not None:
             video_info['id'] = video_id
         return video_info
-    def playlist_result(self, entries, playlist_id=None, playlist_title=None):
+    @staticmethod
+    def playlist_result(entries, playlist_id=None, playlist_title=None):
         """Returns a playlist"""
         video_info = {'_type': 'playlist',
                       'entries': entries}
@@ -256,7 +286,7 @@ class InfoExtractor(object):
             video_info['title'] = playlist_title
         return video_info
 
-    def _search_regex(self, pattern, string, name, default=None, fatal=True, flags=0):
+    def _search_regex(self, pattern, string, name, default=_NO_DEFAULT, fatal=True, flags=0):
         """
         Perform a regex search on the given string, using a single or a list of
         patterns returning the first matching group.
@@ -270,7 +300,7 @@ class InfoExtractor(object):
                 mobj = re.search(p, string, flags)
                 if mobj: break
 
-        if sys.stderr.isatty() and os.name != 'nt':
+        if os.name != 'nt' and sys.stderr.isatty():
             _name = u'\033[0;34m%s\033[0m' % name
         else:
             _name = name
@@ -278,7 +308,7 @@ class InfoExtractor(object):
         if mobj:
             # return the first matching group
             return next(g for g in mobj.groups() if g is not None)
-        elif default is not None:
+        elif default is not _NO_DEFAULT:
             return default
         elif fatal:
             raise RegexNotFoundError(u'Unable to extract %s' % _name)
@@ -287,7 +317,7 @@ class InfoExtractor(object):
                 u'please report this issue on http://yt-dl.org/bug' % _name)
             return None
 
-    def _html_search_regex(self, pattern, string, name, default=None, fatal=True, flags=0):
+    def _html_search_regex(self, pattern, string, name, default=_NO_DEFAULT, fatal=True, flags=0):
         """
         Like _search_regex, but strips HTML tags and unescapes entities.
         """
diff --git a/youtube_dl/extractor/crunchyroll.py b/youtube_dl/extractor/crunchyroll.py
new file mode 100644 (file)
index 0000000..2b66bdd
--- /dev/null
@@ -0,0 +1,171 @@
+# encoding: utf-8
+import base64
+import re
+import zlib
+from hashlib import sha1
+from math import pow, sqrt, floor
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    compat_urllib_parse,
+    compat_urllib_request,
+    bytes_to_intlist,
+    intlist_to_bytes,
+    unified_strdate,
+    clean_html,
+)
+from ..aes import (
+    aes_cbc_decrypt,
+    inc,
+)
+
+class CrunchyrollIE(InfoExtractor):
+    _VALID_URL = r'(?:https?://)?(?:www\.)?(?P<url>crunchyroll\.com/[^/]*/[^/?&]*?(?P<video_id>[0-9]+))(?:[/?&]|$)'
+    _TESTS = [{
+        u'url': u'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
+        u'file': u'645513.flv',
+        #u'md5': u'b1639fd6ddfaa43788c85f6d1dddd412',
+        u'info_dict': {
+            u'title': u'Wanna be the Strongest in the World Episode 1 – An Idol-Wrestler is Born!',
+            u'description': u'md5:2d17137920c64f2f49981a7797d275ef',
+            u'thumbnail': u'http://img1.ak.crunchyroll.com/i/spire1-tmb/20c6b5e10f1a47b10516877d3c039cae1380951166_full.jpg',
+            u'uploader': u'Yomiuri Telecasting Corporation (YTV)',
+            u'upload_date': u'20131013',
+        },
+        u'params': {
+            # rtmp
+            u'skip_download': True,
+        },
+    }]
+
+    _FORMAT_IDS = {
+        u'360': (u'60', u'106'),
+        u'480': (u'61', u'106'),
+        u'720': (u'62', u'106'),
+        u'1080': (u'80', u'108'),
+    }
+
+    def _decrypt_subtitles(self, data, iv, id):
+        data = bytes_to_intlist(data)
+        iv = bytes_to_intlist(iv)
+        id = int(id)
+
+        def obfuscate_key_aux(count, modulo, start):
+            output = list(start)
+            for _ in range(count):
+                output.append(output[-1] + output[-2])
+            # cut off start values
+            output = output[2:]
+            output = list(map(lambda x: x % modulo + 33, output))
+            return output
+
+        def obfuscate_key(key):
+            num1 = int(floor(pow(2, 25) * sqrt(6.9)))
+            num2 = (num1 ^ key) << 5
+            num3 = key ^ num1
+            num4 = num3 ^ (num3 >> 3) ^ num2
+            prefix = intlist_to_bytes(obfuscate_key_aux(20, 97, (1, 2)))
+            shaHash = bytes_to_intlist(sha1(prefix + str(num4).encode(u'ascii')).digest())
+            # Extend 160 Bit hash to 256 Bit
+            return shaHash + [0] * 12
+        
+        key = obfuscate_key(id)
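+        # Note: this Counter class appears to be unused; the call below does plain CBC decryption.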
+        class Counter:
+            __value = iv
+            def next_value(self):
+                temp = self.__value
+                self.__value = inc(self.__value)
+                return temp
+        decrypted_data = intlist_to_bytes(aes_cbc_decrypt(data, key, iv))
+        return zlib.decompress(decrypted_data)
+
+    def _convert_subtitles_to_srt(self, subtitles):
+        i = 1
+        output = u''
+        for start, end, text in re.findall(r'<event [^>]*?start="([^"]+)" [^>]*?end="([^"]+)" [^>]*?text="([^"]+)"[^>]*?>', subtitles):
+            start = start.replace(u'.', u',')
+            end = end.replace(u'.', u',')
+            text = clean_html(text)
+            text = text.replace(u'\\N', u'\n')
+            if not text:
+                continue
+            output += u'%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
+            i += 1
+        return output
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+
+        webpage_url = u'http://www.' + mobj.group('url')
+        video_id = mobj.group(u'video_id')
+        webpage = self._download_webpage(webpage_url, video_id)
+        note_m = self._html_search_regex(r'<div class="showmedia-trailer-notice">(.+?)</div>', webpage, u'trailer-notice', default=u'')
+        if note_m:
+            raise ExtractorError(note_m)
+
+        video_title = self._html_search_regex(r'<h1[^>]*>(.+?)</h1>', webpage, u'video_title', flags=re.DOTALL)
+        video_title = re.sub(r' {2,}', u' ', video_title)
+        video_description = self._html_search_regex(r'"description":"([^"]+)', webpage, u'video_description', default=u'')
+        if not video_description:
+            video_description = None
+        video_upload_date = self._html_search_regex(r'<div>Availability for free users:(.+?)</div>', webpage, u'video_upload_date', fatal=False, flags=re.DOTALL)
+        if video_upload_date:
+            video_upload_date = unified_strdate(video_upload_date)
+        video_uploader = self._html_search_regex(r'<div>\s*Publisher:(.+?)</div>', webpage, u'video_uploader', fatal=False, flags=re.DOTALL)
+
+        playerdata_url = compat_urllib_parse.unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, u'playerdata_url'))
+        playerdata_req = compat_urllib_request.Request(playerdata_url)
+        playerdata_req.data = compat_urllib_parse.urlencode({u'current_page': webpage_url})
+        playerdata_req.add_header(u'Content-Type', u'application/x-www-form-urlencoded')
+        playerdata = self._download_webpage(playerdata_req, video_id, note=u'Downloading media info')
+        
+        stream_id = self._search_regex(r'<media_id>([^<]+)', playerdata, u'stream_id')
+        video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, u'thumbnail', fatal=False)
+
+        formats = []
+        for fmt in re.findall(r'\?p([0-9]{3,4})=1', webpage):
+            stream_quality, stream_format = self._FORMAT_IDS[fmt]
+            video_format = fmt+u'p'
+            streamdata_req = compat_urllib_request.Request(u'http://www.crunchyroll.com/xml/')
+            # urlencode doesn't work here: the API apparently expects %5F-escaped underscores, so build the body by hand
+            streamdata_req.data = u'req=RpcApiVideoEncode%5FGetStreamInfo&video%5Fencode%5Fquality='+stream_quality+u'&media%5Fid='+stream_id+u'&video%5Fformat='+stream_format
+            streamdata_req.add_header(u'Content-Type', u'application/x-www-form-urlencoded')
+            streamdata_req.add_header(u'Content-Length', str(len(streamdata_req.data)))
+            streamdata = self._download_webpage(streamdata_req, video_id, note=u'Downloading media info for '+video_format)
+            video_url = self._search_regex(r'<host>([^<]+)', streamdata, u'video_url')
+            video_play_path = self._search_regex(r'<file>([^<]+)', streamdata, u'video_play_path')
+            formats.append({
+                u'url': video_url,
+                u'play_path':   video_play_path,
+                u'ext': 'flv',
+                u'format': video_format,
+                u'format_id': video_format,
+            })
+
+        subtitles = {}
+        for sub_id, sub_name in re.findall(r'\?ssid=([0-9]+)" title="([^"]+)', webpage):
+            sub_page = self._download_webpage(u'http://www.crunchyroll.com/xml/?req=RpcApiSubtitle_GetXml&subtitle_script_id='+sub_id,
+                                              video_id, note=u'Downloading subtitles for '+sub_name)
+            id = self._search_regex(r'id=\'([0-9]+)', sub_page, u'subtitle_id', fatal=False)
+            iv = self._search_regex(r'<iv>([^<]+)', sub_page, u'subtitle_iv', fatal=False)
+            data = self._search_regex(r'<data>([^<]+)', sub_page, u'subtitle_data', fatal=False)
+            if not id or not iv or not data:
+                continue
+            id = int(id)
+            iv = base64.b64decode(iv)
+            data = base64.b64decode(data)
+
+            subtitle = self._decrypt_subtitles(data, iv, id).decode(u'utf-8')
+            lang_code = self._search_regex(r'lang_code=\'([^\']+)', subtitle, u'subtitle_lang_code', fatal=False)
+            if not lang_code:
+                continue
+            subtitles[lang_code] = self._convert_subtitles_to_srt(subtitle)
+
+        return {
+            u'id':          video_id,
+            u'title':       video_title,
+            u'description': video_description,
+            u'thumbnail':   video_thumbnail,
+            u'uploader':    video_uploader,
+            u'upload_date': video_upload_date,
+            u'subtitles':   subtitles,
+            u'formats':     formats,
+        }
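
To make the SRT conversion above concrete, this is the block format it emits for one subtitle event (timing and text invented for the example):

    entries = [('0:00:01.33', '0:00:03.10', 'An Idol-Wrestler is Born!')]
    output = ''
    for i, (start, end, text) in enumerate(entries, 1):
        # SRT uses comma decimal separators in timestamps
        output += '%d\n%s --> %s\n%s\n\n' % (
            i, start.replace('.', ','), end.replace('.', ','), text)
    print(output)
    # 1
    # 0:00:01,33 --> 0:00:03,10
    # An Idol-Wrestler is Born!
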
index 7bf03c584c7388b162c9b3912a4aa0f410ed5b22..d5730684dc497b37d7ff57098f5a156ff620e40e 100644 (file)
@@ -6,7 +6,7 @@ from ..utils import (
 )
 
 class CSpanIE(InfoExtractor):
-    _VALID_URL = r'http://www.c-spanvideo.org/program/(.*)'
+    _VALID_URL = r'http://www\.c-spanvideo\.org/program/(.*)'
     _TEST = {
         u'url': u'http://www.c-spanvideo.org/program/HolderonV',
         u'file': u'315139.flv',
index 71f5e03eea393b7733bf3bfeb4f2eeea5b21eb85..6685c94a3d6b283e0b7f2240ebfcf35ce462edc2 100644 (file)
@@ -11,6 +11,7 @@ from ..utils import (
     get_element_by_attribute,
     get_element_by_id,
     orderedSet,
+    str_to_int,
 
     ExtractorError,
 )
@@ -27,7 +28,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
 class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
     """Information Extractor for Dailymotion"""
 
-    _VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
+    _VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(embed|#)/)?video/(?P<id>[^/?_]+)'
     IE_NAME = u'dailymotion'
 
     _FORMATS = [
@@ -80,7 +81,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
         # Extract id and simplified title from URL
         mobj = re.match(self._VALID_URL, url)
 
-        video_id = mobj.group(1).split('_')[0].split('?')[0]
+        video_id = mobj.group('id')
 
         url = 'http://www.dailymotion.com/video/%s' % video_id
 
@@ -100,10 +101,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
             self.to_screen(u'Vevo video detected: %s' % vevo_id)
             return self.url_result(u'vevo:%s' % vevo_id, ie='Vevo')
 
-        video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>',
-                                             # Looking for official user
-                                             r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'],
-                                            webpage, 'video uploader', fatal=False)
         age_limit = self._rta_search(webpage)
 
         video_upload_date = None
@@ -146,15 +143,21 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
             self._list_available_subtitles(video_id, webpage)
             return
 
+        view_count = self._search_regex(
+            r'video_views_count[^>]+>\s+([\d\.,]+)', webpage, u'view count', fatal=False)
+        if view_count is not None:
+            view_count = str_to_int(view_count)
+
         return {
             'id':       video_id,
             'formats': formats,
-            'uploader': video_uploader,
+            'uploader': info['owner_screenname'],
             'upload_date':  video_upload_date,
             'title':    self._og_search_title(webpage),
             'subtitles':    video_subtitles,
             'thumbnail': info['thumbnail_url'],
             'age_limit': age_limit,
+            'view_count': view_count,
         }
 
     def _get_available_subtitles(self, video_id, webpage):
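
The view counter on the page is a formatted string, hence the str_to_int pass. A sketch of that helper's behaviour, assuming it simply strips digit separators (as youtube_dl.utils did at the time):

    import re

    def str_to_int(int_str):
        # assumption: mirrors youtube_dl.utils.str_to_int of this era
        return int(re.sub(r'[,\.]', '', int_str))

    print(str_to_int('1,234,567'))  # 1234567
    print(str_to_int('1.234.567'))  # 1234567 (European digit grouping)
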
index d418ce4a8a29c122e811c96aac76d388c790b560..4876ecb4812710e2509eec8fc19f00dac60d2fde 100644 (file)
@@ -9,7 +9,7 @@ from ..utils import (
 
 
 class DaumIE(InfoExtractor):
-    _VALID_URL = r'https?://tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
     IE_NAME = u'daum.net'
 
     _TEST = {
index 24ce794255211112eafadaf2b5a629716b90aa5e..cb7226f82a6af167569286918a56cce64e796150 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 
 class DreiSatIE(InfoExtractor):
     IE_NAME = '3sat'
-    _VALID_URL = r'(?:http://)?(?:www\.)?3sat.de/mediathek/index.php\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
+    _VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/index\.php\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
     _TEST = {
         u"url": u"http://www.3sat.de/mediathek/index.php?obj=36983",
         u'file': u'36983.webm',
index f21ef88530d2f8913b4b35d9c03fc4fc14de7ddc..88f5526b8a59491cc6cd40b48fe9451b3fc2d12b 100644 (file)
@@ -10,7 +10,7 @@ from ..utils import (
 
 class EightTracksIE(InfoExtractor):
     IE_NAME = '8tracks'
-    _VALID_URL = r'https?://8tracks.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
+    _VALID_URL = r'https?://8tracks\.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
     _TEST = {
         u"name": u"EightTracks",
         u"url": u"http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
index a51d79b08c656144c3f67d853fcae8fe52bc6e1f..682901d16227e088e203bd01656db21cc2f70dda 100644 (file)
@@ -8,7 +8,7 @@ class ExfmIE(InfoExtractor):
     IE_NAME = u'exfm'
     IE_DESC = u'ex.fm'
     _VALID_URL = r'(?:http://)?(?:www\.)?ex\.fm/song/([^/]+)'
-    _SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream'
+    _SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
     _TESTS = [
         {
             u'url': u'http://ex.fm/song/eh359',
index 3b210710e3695ec3aa940b335d9868a281d7740a..4556079c8ad5edce7a6a3efe29989299d719ed28 100644 (file)
@@ -17,7 +17,7 @@ from ..utils import (
 class FacebookIE(InfoExtractor):
     """Information Extractor for Facebook"""
 
-    _VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
+    _VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:[^#?]*#!/)?(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
     _LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
     _CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
     _NETRC_MACHINE = 'facebook'
@@ -27,7 +27,7 @@ class FacebookIE(InfoExtractor):
         u'file': u'120708114770723.mp4',
         u'md5': u'48975a41ccc4b7a581abd68651c1a5a8',
         u'info_dict': {
-            u"duration": 279, 
+            u"duration": 279,
             u"title": u"PEOPLE ARE AWESOME 2013"
         }
     }
index d0dfde694b4d93f7249f2dd3a326ecb0bdca98dd..c6ab6952e84dc9074816f28ebb7fe6d8ce02cb47 100644 (file)
@@ -9,7 +9,7 @@ from ..utils import (
 
 class FazIE(InfoExtractor):
     IE_NAME = u'faz.net'
-    _VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+).html'
+    _VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+)\.html'
 
     _TEST = {
         u'url': u'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
index dba1a8dc262979b5afce987211bab2f14e502dba..d7048c8c1ae7e6ba149552a7b32ec2ab42c8a3f2 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 
 class FKTVIE(InfoExtractor):
     IE_NAME = u'fernsehkritik.tv'
-    _VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
+    _VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik\.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
 
     _TEST = {
         u'url': u'http://fernsehkritik.tv/folge-1',
@@ -52,7 +52,7 @@ class FKTVIE(InfoExtractor):
 
 class FKTVPosteckeIE(InfoExtractor):
     IE_NAME = u'fernsehkritik.tv:postecke'
-    _VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/inline-video/postecke.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
+    _VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik\.tv/inline-video/postecke\.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
     _TEST = {
         u'url': u'http://fernsehkritik.tv/inline-video/postecke.php?iframe=true&width=625&height=440&ep=120',
         u'file': u'0120.flv',
index 6e1971043b3853b9fe54e682473a61621c9989e2..ad85bc16d7796cfcf42331a05bb0392e773f70c5 100644 (file)
@@ -21,7 +21,7 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
         thumbnail_path = info.find('image').text
 
         return {'id': video_id,
-                'ext': 'mp4',
+                'ext': 'flv' if video_url.startswith('rtmp') else 'mp4',
                 'url': video_url,
                 'title': info.find('titre').text,
                 'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
@@ -45,7 +45,7 @@ class PluzzIE(FranceTVBaseInfoExtractor):
 
 class FranceTvInfoIE(FranceTVBaseInfoExtractor):
     IE_NAME = u'francetvinfo.fr'
-    _VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+).html'
+    _VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+)\.html'
 
     _TEST = {
         u'url': u'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
@@ -66,35 +66,101 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
         return self._extract_video(video_id)
 
 
-class France2IE(FranceTVBaseInfoExtractor):
-    IE_NAME = u'france2.fr'
-    _VALID_URL = r'''(?x)https?://www\.france2\.fr/
+class FranceTVIE(FranceTVBaseInfoExtractor):
+    IE_NAME = u'francetv'
+    IE_DESC = u'France 2, 3, 4, 5 and Ô'
+    _VALID_URL = r'''(?x)https?://www\.france[2345o]\.fr/
         (?:
-            emissions/.*?/videos/(?P<id>\d+)
-        |   emission/(?P<key>[^/?]+)
+            emissions/.*?/(videos|emissions)/(?P<id>[^/?]+)
+        |   (emissions?|jt)/(?P<key>[^/?]+)
         )'''
 
-    _TEST = {
-        u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
-        u'file': u'75540104.mp4',
-        u'info_dict': {
-            u'title': u'13h15, le samedi...',
-            u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
+    _TESTS = [
+        # france2
+        {
+            u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
+            u'file': u'75540104.mp4',
+            u'info_dict': {
+                u'title': u'13h15, le samedi...',
+                u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
+            },
+            u'params': {
+                # m3u8 download
+                u'skip_download': True,
+            },
         },
-        u'params': {
-            u'skip_download': True,
+        # france3
+        {
+            u'url': u'http://www.france3.fr/emissions/pieces-a-conviction/diffusions/13-11-2013_145575',
+            u'info_dict': {
+                u'id': u'000702326_CAPP_PicesconvictionExtrait313022013_120220131722_Au',
+                u'ext': u'flv',
+                u'title': u'Le scandale du prix des médicaments',
+                u'description': u'md5:1384089fbee2f04fc6c9de025ee2e9ce',
+            },
+            u'params': {
+                # rtmp download
+                u'skip_download': True,
+            },
         },
-    }
+        # france4
+        {
+            u'url': u'http://www.france4.fr/emissions/hero-corp/videos/rhozet_herocorp_bonus_1_20131106_1923_06112013172108_F4',
+            u'info_dict': {
+                u'id': u'rhozet_herocorp_bonus_1_20131106_1923_06112013172108_F4',
+                u'ext': u'flv',
+                u'title': u'Hero Corp Making of - Extrait 1',
+                u'description': u'md5:c87d54871b1790679aec1197e73d650a',
+            },
+            u'params': {
+                # rtmp download
+                u'skip_download': True,
+            },
+        },
+        # france5
+        {
+            u'url': u'http://www.france5.fr/emissions/c-a-dire/videos/92837968',
+            u'info_dict': {
+                u'id': u'92837968',
+                u'ext': u'mp4',
+                u'title': u'C à dire ?!',
+                u'description': u'md5:fb1db1cbad784dcce7c7a7bd177c8e2f',
+            },
+            u'params': {
+                # m3u8 download
+                u'skip_download': True,
+            },
+        },
+        # franceo
+        {
+            u'url': u'http://www.franceo.fr/jt/info-afrique/04-12-2013',
+            u'info_dict': {
+                u'id': u'92327925',
+                u'ext': u'mp4',
+                u'title': u'Infô-Afrique',
+                u'description': u'md5:ebf346da789428841bee0fd2a935ea55',
+            },
+            u'params': {
+                # m3u8 download
+                u'skip_download': True,
+            },
+            u'skip': u'The id changes frequently',
+        },
+    ]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         if mobj.group('key'):
             webpage = self._download_webpage(url, mobj.group('key'))
-            video_id = self._html_search_regex(
-                r'''(?x)<div\s+class="video-player">\s*
+            id_res = [
+                (r'''(?x)<div\s+class="video-player">\s*
                     <a\s+href="http://videos.francetv.fr/video/([0-9]+)"\s+
-                    class="francetv-video-player">''',
-                webpage, u'video ID')
+                    class="francetv-video-player">'''),
+                (r'<a id="player_direct" href="http://info\.francetelevisions'
+                 '\.fr/\?id-video=([^"/&]+)'),
+                (r'<a class="video" id="ftv_player_(.+?)"'),
+            ]
+            video_id = self._html_search_regex(id_res, webpage, u'video ID')
         else:
             video_id = mobj.group('id')
         return self._extract_video(video_id)
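
The rewritten extractor hands `_html_search_regex` a list of patterns and uses the first one that matches, covering the different page layouts above. A minimal standalone sketch of that fallback idea (the `first_match` helper and sample HTML are illustrative, not youtube-dl API):

    import re

    def first_match(patterns, html):
        # Try each pattern in order; return the first capture that hits.
        for pattern in patterns:
            m = re.search(pattern, html)
            if m:
                return m.group(1)
        return None

    html = '<a class="video" id="ftv_player_92837968">'
    print(first_match([
        r'href="http://videos\.francetv\.fr/video/([0-9]+)"',
        r'<a class="video" id="ftv_player_(.+?)"',
    ], html))
    # -> 92837968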
index c91669b0ebaeac6085ba010c242dc571b28e57e1..a3a5251fe5711173ccb3986c263994d560345bf8 100644 (file)
@@ -4,7 +4,7 @@ from .common import InfoExtractor
 
 
 class GamekingsIE(InfoExtractor):
-    _VALID_URL = r'http?://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)'
+    _VALID_URL = r'http://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)'
     _TEST = {
         u"url": u"http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/",
         u'file': u'20130811.mp4',
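
This hunk, like many below, escapes the literal dots in `_VALID_URL` and drops the stray `http?`, which made the `p` optional so that even `htt://` matched. A quick demonstration of what the unescaped pattern let through (the look-alike URL is fabricated):

    import re

    old = r'http?://www.gamekings.tv/videos/(?P<name>[0-9a-z\-]+)'
    new = r'http://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)'

    lookalike = 'htt://wwwXgamekingsYtv/videos/some-review'
    print(bool(re.match(old, lookalike)))  # True: '.' matches any char, 'p' is optional
    print(bool(re.match(new, lookalike)))  # False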
index 3a8bef250fa8eddd89af54291228ec3909c1453c..d82a5d4b2a30578298080f03a8bba5f502e48f20 100644 (file)
@@ -4,8 +4,7 @@ from .mtv import MTVServicesInfoExtractor
 
 
 class GametrailersIE(MTVServicesInfoExtractor):
-    _VALID_URL = r'http://www.gametrailers.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
-
+    _VALID_URL = r'http://www\.gametrailers\.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
     _TEST = {
         u'url': u'http://www.gametrailers.com/videos/zbvr8i/mirror-s-edge-2-e3-2013--debut-trailer',
         u'file': u'70e9a5d7-cf25-4a10-9104-6f3e7342ae0d.mp4',
index 10ae06263ef1349ff6526575feb61cb71591f3cc..7a14c98f9b6ef9d550606c72c330d0730ec1233e 100644 (file)
@@ -11,10 +11,14 @@ from ..utils import (
     compat_urlparse,
 
     ExtractorError,
+    HEADRequest,
     smuggle_url,
     unescapeHTML,
+    unified_strdate,
+    url_basename,
 )
 from .brightcove import BrightcoveIE
+from .ooyala import OoyalaIE
 
 
 class GenericIE(InfoExtractor):
@@ -71,6 +75,27 @@ class GenericIE(InfoExtractor):
                 u'skip_download': True,
             },
         },
+        # Direct link to a video
+        {
+            u'url': u'http://media.w3.org/2010/05/sintel/trailer.mp4',
+            u'file': u'trailer.mp4',
+            u'md5': u'67d406c2bcb6af27fa886f31aa934bbe',
+            u'info_dict': {
+                u'id': u'trailer',
+                u'title': u'trailer',
+                u'upload_date': u'20100513',
+            }
+        },
+        # ooyala video
+        {
+            u'url': u'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219',
+            u'md5': u'5644c6ca5d5782c1d0d350dad9bd840c',
+            u'info_dict': {
+                u'id': u'BwY2RxaTrTkslxOfcan0UCf0YqyvWysJ',
+                u'ext': u'mp4',
+                u'title': u'2cc213299525360.mov', # that's what we get
+            },
+        },
     ]
 
     def report_download_webpage(self, video_id):
@@ -83,23 +108,20 @@ class GenericIE(InfoExtractor):
         """Report following a redirect."""
         self._downloader.to_screen(u'[redirect] Following redirect to %s' % new_url)
 
-    def _test_redirect(self, url):
+    def _send_head(self, url):
         """Send a HEAD request to url, following redirects, and return the response."""
-        class HeadRequest(compat_urllib_request.Request):
-            def get_method(self):
-                return "HEAD"
 
         class HEADRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
             """
             Subclass the HTTPRedirectHandler to make it use our
-            HeadRequest also on the redirected URL
+            HEADRequest also on the redirected URL
             """
             def redirect_request(self, req, fp, code, msg, headers, newurl):
                 if code in (301, 302, 303, 307):
                     newurl = newurl.replace(' ', '%20')
                     newheaders = dict((k,v) for k,v in req.headers.items()
                                       if k.lower() not in ("content-length", "content-type"))
-                    return HeadRequest(newurl,
+                    return HEADRequest(newurl,
                                        headers=newheaders,
                                        origin_req_host=req.get_origin_req_host(),
                                        unverifiable=True)
@@ -128,32 +150,49 @@ class GenericIE(InfoExtractor):
                         compat_urllib_request.HTTPErrorProcessor, compat_urllib_request.HTTPSHandler]:
             opener.add_handler(handler())
 
-        response = opener.open(HeadRequest(url))
+        response = opener.open(HEADRequest(url))
         if response is None:
             raise ExtractorError(u'Invalid URL protocol')
-        new_url = response.geturl()
-
-        if url == new_url:
-            return False
-
-        self.report_following_redirect(new_url)
-        return new_url
+        return response
 
     def _real_extract(self, url):
         parsed_url = compat_urlparse.urlparse(url)
         if not parsed_url.scheme:
             self._downloader.report_warning('The URL doesn\'t specify the protocol; trying with http')
             return self.url_result('http://' + url)
+        video_id = os.path.splitext(url.split('/')[-1])[0]
 
         try:
-            new_url = self._test_redirect(url)
-            if new_url:
-                return [self.url_result(new_url)]
+            response = self._send_head(url)
+
+            # Check for redirect
+            new_url = response.geturl()
+            if url != new_url:
+                self.report_following_redirect(new_url)
+                return self.url_result(new_url)
+
+            # Check for direct link to a video
+            content_type = response.headers.get('Content-Type', '')
+            m = re.match(r'^(?P<type>audio|video|application(?=/ogg$))/(?P<format_id>.+)$', content_type)
+            if m:
+                upload_date = response.headers.get('Last-Modified')
+                if upload_date:
+                    upload_date = unified_strdate(upload_date)
+                return {
+                    'id': video_id,
+                    'title': os.path.splitext(url_basename(url))[0],
+                    'formats': [{
+                        'format_id': m.group('format_id'),
+                        'url': url,
+                        'vcodec': u'none' if m.group('type') == 'audio' else None
+                    }],
+                    'upload_date': upload_date,
+                }
+
         except compat_urllib_error.HTTPError:
             # The server may not like HEAD requests or our User-Agent; fall back to GET
             pass
 
-        video_id = url.split('/')[-1]
         try:
             webpage = self._download_webpage(url, video_id)
         except ValueError:
@@ -169,8 +208,13 @@ class GenericIE(InfoExtractor):
         #   Site Name | Video Title
         #   Video Title - Tagline | Site Name
         # and so on and so forth; it's just not practical
-        video_title = self._html_search_regex(r'<title>(.*)</title>',
-            webpage, u'video title', default=u'video', flags=re.DOTALL)
+        video_title = self._html_search_regex(
+            r'(?s)<title>(.*?)</title>', webpage, u'video title',
+            default=u'video')
+
+        # video uploader is domain name
+        video_uploader = self._search_regex(
+            r'^(?:https?://)?([^/]*)/.*', url, u'video uploader')
 
         # Look for BrightCove:
         bc_url = BrightcoveIE._extract_brightcove_url(webpage)
@@ -178,7 +222,7 @@ class GenericIE(InfoExtractor):
             self.to_screen(u'Brightcove video detected.')
             return self.url_result(bc_url, 'Brightcove')
 
-        # Look for embedded Vimeo player
+        # Look for embedded (iframe) Vimeo player
         mobj = re.search(
             r'<iframe[^>]+?src="(https?://player.vimeo.com/video/.+?)"', webpage)
         if mobj:
@@ -186,9 +230,18 @@ class GenericIE(InfoExtractor):
             surl = smuggle_url(player_url, {'Referer': url})
             return self.url_result(surl, 'Vimeo')
 
+        # Look for embedded (swf embed) Vimeo player
+        mobj = re.search(
+            r'<embed[^>]+?src="(https?://(?:www\.)?vimeo.com/moogaloop.swf.+?)"', webpage)
+        if mobj:
+            return self.url_result(mobj.group(1), 'Vimeo')
+
         # Look for embedded YouTube player
-        matches = re.findall(
-            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?youtube.com/embed/.+?)\1', webpage)
+        matches = re.findall(r'''(?x)
+            (?:<iframe[^>]+?src=|embedSWF\(\s*)
+            (["\'])(?P<url>(?:https?:)?//(?:www\.)?youtube\.com/
+                (?:embed|v)/.+?)
+            \1''', webpage)
         if matches:
             urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Youtube')
                      for tuppl in matches]
@@ -197,13 +250,38 @@ class GenericIE(InfoExtractor):
 
         # Look for embedded Dailymotion player
         matches = re.findall(
-            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion.com/embed/video/.+?)\1', webpage)
+            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/embed/video/.+?)\1', webpage)
         if matches:
             urlrs = [self.url_result(unescapeHTML(tuppl[1]), 'Dailymotion')
                      for tuppl in matches]
             return self.playlist_result(
                 urlrs, playlist_id=video_id, playlist_title=video_title)
 
+        # Look for embedded Wistia player
+        match = re.search(
+            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:fast\.)?wistia\.net/embed/iframe/.+?)\1', webpage)
+        if match:
+            return {
+                '_type': 'url_transparent',
+                'url': unescapeHTML(match.group('url')),
+                'ie_key': 'Wistia',
+                'uploader': video_uploader,
+                'title': video_title,
+                'id': video_id,
+            }
+
+        # Look for embedded blip.tv player
+        mobj = re.search(r'<meta\s[^>]*https?://api.blip.tv/\w+/redirect/\w+/(\d+)', webpage)
+        if mobj:
+            return self.url_result('http://blip.tv/seo/-'+mobj.group(1), 'BlipTV')
+        mobj = re.search(r'<(?:iframe|embed|object)\s[^>]*https?://(?:\w+\.)?blip.tv/(?:play/|api\.swf#)([a-zA-Z0-9]+)', webpage)
+        if mobj:
+            player_url = 'http://blip.tv/play/%s.x?p=1' % mobj.group(1)
+            player_page = self._download_webpage(player_url, mobj.group(1))
+            blip_video_id = self._search_regex(r'data-episode-id="(\d+)', player_page, u'blip_video_id', fatal=False)
+            if blip_video_id:
+                return self.url_result('http://blip.tv/seo/-'+blip_video_id, 'BlipTV')
+
         # Look for Bandcamp pages with custom domain
         mobj = re.search(r'<meta property="og:url"[^>]*?content="(.*?bandcamp\.com.*?)"', webpage)
         if mobj is not None:
@@ -211,6 +289,22 @@ class GenericIE(InfoExtractor):
             # Don't set the extractor because it can be a track url or an album
             return self.url_result(burl)
 
+        # Look for embedded Vevo player
+        mobj = re.search(
+            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:cache\.)?vevo\.com/.+?)\1', webpage)
+        if mobj is not None:
+            return self.url_result(mobj.group('url'))
+
+        # Look for Ooyala videos
+        mobj = re.search(r'player.ooyala.com/[^"?]+\?[^"]*?(?:embedCode|ec)=([^"&]+)', webpage)
+        if mobj is not None:
+            return OoyalaIE._build_url_result(mobj.group(1))
+
+        # Look for Aparat videos
+        mobj = re.search(r'<iframe src="(http://www.aparat.com/video/[^"]+)"', webpage)
+        if mobj is not None:
+            return self.url_result(mobj.group(1), 'Aparat')
+
         # Start with something easy: JW Player in SWFObject
         mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
         if mobj is None:
@@ -247,14 +341,9 @@ class GenericIE(InfoExtractor):
         # here's a fun little line of code for you:
         video_id = os.path.splitext(video_id)[0]
 
-        # video uploader is domain name
-        video_uploader = self._search_regex(r'(?:https?://)?([^/]*)/.*',
-            url, u'video uploader')
-
         return {
             'id':       video_id,
             'url':      video_url,
             'uploader': video_uploader,
-            'upload_date':  None,
             'title':    video_title,
         }
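
The reworked generic extractor issues one HEAD request and uses it twice: a changed final URL means a redirect to hand back, and an audio/video Content-Type means a direct media link, with Last-Modified doubling as the upload date. A self-contained sketch of that sniffing step using the Python 3 stdlib rather than the compat wrappers (the helper name and return shape are illustrative):

    import re
    import urllib.request

    def sniff(url):
        resp = urllib.request.urlopen(
            urllib.request.Request(url, method='HEAD'))
        if resp.geturl() != url:
            return {'redirect': resp.geturl()}  # hand the new URL to another extractor
        ctype = resp.headers.get('Content-Type', '')
        m = re.match(r'^(?P<type>audio|video|application(?=/ogg$))/(?P<format_id>.+)$', ctype)
        if m is None:
            return None  # an HTML page: fall through to webpage parsing
        return {
            'url': url,
            'format_id': m.group('format_id'),
            'vcodec': 'none' if m.group('type') == 'audio' else None,
        }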
index 3798118a7fc491f9b2437878cf9d99df1f05b5ec..0ee74fb38410a4acce1e15c7a9ce98d80409012e 100644 (file)
@@ -11,7 +11,7 @@ class HotNewHipHopIE(InfoExtractor):
         u'file': u'1435540.mp3',
         u'md5': u'2c2cd2f76ef11a9b3b581e8b232f3d96',
         u'info_dict': {
-            u"title": u"Freddie Gibbs - Lay It Down"
+            u"title": u'Freddie Gibbs "Lay It Down"'
         }
     }
 
index c52146f7d716dd02ba34230e9fbb7c4dfe5ac15d..381af91e42d4c9f642b35643107f5dafd026aad9 100644 (file)
@@ -44,7 +44,7 @@ class IGNIE(InfoExtractor):
                 {
                     u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
                     u'info_dict': {
-                        u'title': u'GTA 5\'s Twisted Beauty in Super Slow Motion',
+                        u'title': u'26 Twisted Moments from GTA 5 in Slow Motion',
                         u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
                     },
                 },
@@ -103,7 +103,7 @@ class IGNIE(InfoExtractor):
 class OneUPIE(IGNIE):
     """Extractor for 1up.com; it uses the IGN videos system."""
 
-    _VALID_URL = r'https?://gamevideos.1up.com/(?P<type>video)/id/(?P<name_or_id>.+)'
+    _VALID_URL = r'https?://gamevideos\.1up\.com/(?P<type>video)/id/(?P<name_or_id>.+)'
     IE_NAME = '1up.com'
 
     _DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'
index d8e9712a7acd39db97c8a55b2551137ca0e56a41..e5332cce820ca239c915da402107a77143f0484b 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 class ImdbIE(InfoExtractor):
     IE_NAME = u'imdb'
     IE_DESC = u'Internet Movie Database trailers'
-    _VALID_URL = r'http://www\.imdb\.com/video/imdb/vi(?P<id>\d+)'
+    _VALID_URL = r'http://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
 
     _TEST = {
         u'url': u'http://www.imdb.com/video/imdb/vi2524815897',
@@ -21,20 +21,20 @@ class ImdbIE(InfoExtractor):
             u'ext': u'mp4',
             u'title': u'Ice Age: Continental Drift Trailer (No. 2) - IMDb',
             u'description': u'md5:9061c2219254e5d14e03c25c98e96a81',
-            u'duration': 151,
         }
     }
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        webpage = self._download_webpage(url,video_id)
+        webpage = self._download_webpage('http://www.imdb.com/video/imdb/vi%s' % video_id, video_id)
         descr = get_element_by_attribute('itemprop', 'description', webpage)
         available_formats = re.findall(
             r'case \'(?P<f_id>.*?)\' :$\s+url = \'(?P<path>.*?)\'', webpage,
             flags=re.MULTILINE)
         formats = []
         for f_id, f_path in available_formats:
+            f_path = f_path.strip()
             format_page = self._download_webpage(
                 compat_urlparse.urljoin(url, f_path),
                 u'Downloading info for %s format' % f_id)
@@ -46,7 +46,6 @@ class ImdbIE(InfoExtractor):
             formats.append({
                 'format_id': f_id,
                 'url': format_info['url'],
-                'height': int(info['titleObject']['encoding']['selected'][:-1]),
             })
 
         return {
@@ -55,5 +54,4 @@ class ImdbIE(InfoExtractor):
             'formats': formats,
             'description': descr,
             'thumbnail': format_info['slate'],
-            'duration': int(info['titleObject']['title']['duration_seconds']),
         }
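
The available formats are scraped out of a JavaScript switch statement with a `re.MULTILINE` pattern, so the `$` anchors at each line end; the new `f_path.strip()` then cleans any stray whitespace from the captured path. A worked example of that anchoring against a fabricated page fragment:

    import re

    page = """switch (ec) {
        case 'HD 720' :
            url = '/video/imdb/vi123/hd720';
        case 'SD' :
            url = '/video/imdb/vi123/sd';
    }"""

    print(re.findall(
        r"case '(?P<f_id>.*?)' :$\s+url = '(?P<path>.*?)'",
        page, flags=re.MULTILINE))
    # -> [('HD 720', '/video/imdb/vi123/hd720'), ('SD', '/video/imdb/vi123/sd')]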
index 213aac428451bfcb860585b26de0e1c43abc732d..660573d022d267b1dfbf0d7274083f5ae47e9953 100644 (file)
@@ -3,7 +3,7 @@ import re
 from .common import InfoExtractor
 
 class InstagramIE(InfoExtractor):
-    _VALID_URL = r'(?:http://)?instagram.com/p/(.*?)/'
+    _VALID_URL = r'(?:http://)?instagram\.com/p/(.*?)/'
     _TEST = {
         u'url': u'http://instagram.com/p/aye83DjauH/?foo=bar#abc',
         u'file': u'aye83DjauH.mp4',
diff --git a/youtube_dl/extractor/ivi.py b/youtube_dl/extractor/ivi.py
new file mode 100644 (file)
index 0000000..4bdf55f
--- /dev/null
@@ -0,0 +1,154 @@
+# encoding: utf-8
+
+import re
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+    compat_urllib_request,
+    ExtractorError,
+)
+
+
+class IviIE(InfoExtractor):
+    IE_DESC = u'ivi.ru'
+    IE_NAME = u'ivi'
+    _VALID_URL = r'^https?://(?:www\.)?ivi\.ru/watch(?:/(?P<compilationid>[^/]+))?/(?P<videoid>\d+)'
+
+    _TESTS = [
+        # Single movie
+        {
+            u'url': u'http://www.ivi.ru/watch/53141',
+            u'file': u'53141.mp4',
+            u'md5': u'6ff5be2254e796ed346251d117196cf4',
+            u'info_dict': {
+                u'title': u'Иван Васильевич меняет профессию',
+                u'description': u'md5:14d8eda24e9d93d29b5857012c6d6346',
+                u'duration': 5498,
+                u'thumbnail': u'http://thumbs.ivi.ru/f20.vcp.digitalaccess.ru/contents/d/1/c3c885163a082c29bceeb7b5a267a6.jpg',
+            },
+            u'skip': u'Only works from Russia',
+        },
+        # An episode of a series
+        {
+            u'url': u'http://www.ivi.ru/watch/dezhurnyi_angel/74791',
+            u'file': u'74791.mp4',
+            u'md5': u'3e6cc9a848c1d2ebcc6476444967baa9',
+            u'info_dict': {
+                u'title': u'Дежурный ангел - 1 серия',
+                u'duration': 2490,
+                u'thumbnail': u'http://thumbs.ivi.ru/f7.vcp.digitalaccess.ru/contents/8/e/bc2f6c2b6e5d291152fdd32c059141.jpg',
+            },
+            u'skip': u'Only works from Russia',
+         }
+    ]
+    
+    # Sorted by quality
+    _known_formats = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
+
+    # Sorted by size
+    _known_thumbnails = ['Thumb-120x90', 'Thumb-160', 'Thumb-640x480']
+
+    def _extract_description(self, html):
+        m = re.search(r'<meta name="description" content="(?P<description>[^"]+)"/>', html)
+        return m.group('description') if m is not None else None
+
+    def _extract_comment_count(self, html):
+        m = re.search(u'(?s)<a href="#" id="view-comments" class="action-button dim gradient">\s*Комментарии:\s*(?P<commentcount>\d+)\s*</a>', html)
+        return int(m.group('commentcount')) if m is not None else 0
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('videoid')
+
+        api_url = 'http://api.digitalaccess.ru/api/json/'
+
+        data = {u'method': u'da.content.get',
+                u'params': [video_id, {u'site': u's183',
+                                       u'referrer': u'http://www.ivi.ru/watch/%s' % video_id,
+                                       u'contentid': video_id
+                                    }
+                            ]
+                }
+
+        request = compat_urllib_request.Request(api_url, json.dumps(data))
+
+        video_json_page = self._download_webpage(request, video_id, u'Downloading video JSON')
+        video_json = json.loads(video_json_page)
+
+        if u'error' in video_json:
+            error = video_json[u'error']
+            if error[u'origin'] == u'NoRedisValidData':
+                raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
+            raise ExtractorError(u'Unable to download video %s: %s' % (video_id, error[u'message']), expected=True)
+
+        result = video_json[u'result']
+
+        formats = [{'url': x[u'url'],
+                    'format_id': x[u'content_format']
+                    } for x in result[u'files'] if x[u'content_format'] in self._known_formats]
+        formats.sort(key=lambda fmt: self._known_formats.index(fmt['format_id']))
+
+        if len(formats) == 0:
+            self._downloader.report_warning(u'No media links available for %s' % video_id)
+            return
+
+        duration = result[u'duration']
+        compilation = result[u'compilation']
+        title = result[u'title']
+
+        title = '%s - %s' % (compilation, title) if compilation is not None else title  
+
+        previews = result[u'preview']
+        previews.sort(key=lambda fmt: self._known_thumbnails.index(fmt['content_format']))
+        thumbnail = previews[-1][u'url'] if len(previews) > 0 else None
+
+        video_page = self._download_webpage(url, video_id, u'Downloading video page')
+        description = self._extract_description(video_page)
+        comment_count = self._extract_comment_count(video_page)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'description': description,
+            'duration': duration,
+            'comment_count': comment_count,
+            'formats': formats,
+        }
+
+
+class IviCompilationIE(InfoExtractor):
+    IE_DESC = u'ivi.ru compilations'
+    IE_NAME = u'ivi:compilation'
+    _VALID_URL = r'^https?://(?:www\.)?ivi\.ru/watch/(?!\d+)(?P<compilationid>[a-z\d_-]+)(?:/season(?P<seasonid>\d+))?$'
+
+    def _extract_entries(self, html, compilation_id):
+        return [self.url_result('http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), 'Ivi')
+                for serie in re.findall(r'<strong><a href="/watch/%s/(\d+)">(?:[^<]+)</a></strong>' % compilation_id, html)]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        compilation_id = mobj.group('compilationid')
+        season_id = mobj.group('seasonid')
+
+        if season_id is not None: # Season link
+            season_page = self._download_webpage(url, compilation_id, u'Downloading season %s web page' % season_id)
+            playlist_id = '%s/season%s' % (compilation_id, season_id)
+            playlist_title = self._html_search_meta(u'title', season_page, u'title')
+            entries = self._extract_entries(season_page, compilation_id)
+        else: # Compilation link            
+            compilation_page = self._download_webpage(url, compilation_id, u'Downloading compilation web page')
+            playlist_id = compilation_id
+            playlist_title = self._html_search_meta(u'title', compilation_page, u'title')
+            seasons = re.findall(r'<a href="/watch/%s/season(\d+)">[^<]+</a>' % compilation_id, compilation_page)
+            if len(seasons) == 0: # No seasons in this compilation
+                entries = self._extract_entries(compilation_page, compilation_id)
+            else:
+                entries = []
+                for season_id in seasons:
+                    season_page = self._download_webpage('http://www.ivi.ru/watch/%s/season%s' % (compilation_id, season_id),
+                                                         compilation_id, u'Downloading season %s web page' % season_id)
+                    entries.extend(self._extract_entries(season_page, compilation_id))
+
+        return self.playlist_result(entries, playlist_id, playlist_title)
\ No newline at end of file
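
The ivi API names its encodings but exposes no bitrates, so the extractor ranks formats against the hand-maintained `_known_formats` ladder (worst first, best last, matching youtube-dl's ordering convention). The list-index sort in isolation, with made-up URLs:

    KNOWN = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo',
             'FLV-hi', 'MP4-hi', 'MP4-SHQ']

    files = [
        {'content_format': 'MP4-hi', 'url': 'http://example.invalid/hi'},
        {'content_format': 'MP4-lo', 'url': 'http://example.invalid/lo'},
        {'content_format': 'exotic', 'url': 'http://example.invalid/x'},  # unknown: dropped
    ]

    formats = [{'url': f['url'], 'format_id': f['content_format']}
               for f in files if f['content_format'] in KNOWN]
    formats.sort(key=lambda f: KNOWN.index(f['format_id']))
    print([f['format_id'] for f in formats])  # ['MP4-lo', 'MP4-hi']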
index c7bb234fe9eec9bd848f2a19c4307722cc4bbca0..592c64e1de0a47299770ef838095abf1f0988bcc 100644 (file)
@@ -8,7 +8,7 @@ from ..utils import (
 )
 
 class JukeboxIE(InfoExtractor):
-    _VALID_URL = r'^http://www\.jukebox?\..+?\/.+[,](?P<video_id>[a-z0-9\-]+).html'
+    _VALID_URL = r'^http://www\.jukebox?\..+?\/.+[,](?P<video_id>[a-z0-9\-]+)\.html'
     _IFRAME = r'<iframe .*src="(?P<iframe>[^"]*)".*>'
     _VIDEO_URL = r'"config":{"file":"(?P<video_url>http:[^"]+[.](?P<video_ext>[^.?]+)[?]mdtk=[0-9]+)"'
     _TITLE = r'<h1 class="inline">(?P<title>[^<]+)</h1>.*<span id="infos_article_artist">(?P<artist>[^<]+)</span>'
index dd062a14e736ba84b3aacb9d3bf426bca4c8f86f..5ae57a77c65c84d559946a651a3612fab86c8535 100644 (file)
@@ -8,7 +8,7 @@ from ..utils import (
 
 class LiveLeakIE(InfoExtractor):
 
-    _VALID_URL = r'^(?:http?://)?(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<video_id>[\w_]+)(?:.*)'
+    _VALID_URL = r'^(?:http://)?(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<video_id>[\w_]+)(?:.*)'
     IE_NAME = u'liveleak'
     _TEST = {
         u'url': u'http://www.liveleak.com/view?i=757_1364311680',
index 9bc35b115033ce641e4435ebb807c6e1c93c975e..1dcd1fb2de42894d80c494185caeb600540b02da 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 
 class LivestreamIE(InfoExtractor):
     IE_NAME = u'livestream'
-    _VALID_URL = r'http://new.livestream.com/.*?/(?P<event_name>.*?)(/videos/(?P<id>\d+))?/?$'
+    _VALID_URL = r'http://new\.livestream\.com/.*?/(?P<event_name>.*?)(/videos/(?P<id>\d+))?/?$'
     _TEST = {
         u'url': u'http://new.livestream.com/CoheedandCambria/WebsterHall/videos/4719370',
         u'file': u'4719370.mp4',
diff --git a/youtube_dl/extractor/mdr.py b/youtube_dl/extractor/mdr.py
new file mode 100644 (file)
index 0000000..08ce064
--- /dev/null
@@ -0,0 +1,63 @@
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+)
+
+
+class MDRIE(InfoExtractor):
+    _VALID_URL = r'^(?P<domain>(?:https?://)?(?:www\.)?mdr\.de)/mediathek/(?:.*)/(?P<type>video|audio)(?P<video_id>[^/_]+)_.*'
+    
+    # No tests; MDR regularly deletes its videos
+
+    def _real_extract(self, url):
+        m = re.match(self._VALID_URL, url)
+        video_id = m.group('video_id')
+        domain = m.group('domain')
+
+        # determine title and media streams from webpage
+        html = self._download_webpage(url, video_id)
+
+        title = self._html_search_regex(r'<h2>(.*?)</h2>', html, u'title')
+        xmlurl = self._search_regex(
+            r'(/mediathek/(?:.+)/(?:video|audio)[0-9]+-avCustom.xml)', html, u'XML URL')
+
+        doc = self._download_xml(domain + xmlurl, video_id)
+        formats = []
+        for a in doc.findall('./assets/asset'):
+            url_el = a.find('.//progressiveDownloadUrl')
+            if url_el is None:
+                continue
+            abr = int(a.find('bitrateAudio').text) // 1000
+            media_type = a.find('mediaType').text
+            format = {
+                'abr': abr,
+                'filesize': int(a.find('fileSize').text),
+                'url': url_el.text,
+            }
+
+            vbr_el = a.find('bitrateVideo')
+            if vbr_el is None:
+                format.update({
+                    'vcodec': 'none',
+                    'format_id': u'%s-%d' % (media_type, abr),
+                })
+            else:
+                vbr = int(vbr_el.text) // 1000
+                format.update({
+                    'vbr': vbr,
+                    'width': int(a.find('frameWidth').text),
+                    'height': int(a.find('frameHeight').text),
+                    'format_id': u'%s-%d' % (media_type, vbr),
+                })
+            formats.append(format)
+        formats.sort(key=lambda f: (f.get('vbr'), f['abr']))
+        if not formats:
+            raise ExtractorError(u'Could not find any valid formats')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+        }
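
The avCustom.xml asset list mixes audio-only and video assets; the extractor tells them apart by whether a `bitrateVideo` element exists. A trimmed sketch against a made-up document of the same shape:

    import xml.etree.ElementTree as ET

    doc = ET.fromstring("""<root><assets>
      <asset>
        <mediaType>MP4</mediaType>
        <bitrateAudio>128000</bitrateAudio>
        <progressiveDownloadUrl>http://example.invalid/a.mp4</progressiveDownloadUrl>
      </asset>
    </assets></root>""")

    for a in doc.findall('./assets/asset'):
        url_el = a.find('.//progressiveDownloadUrl')
        if url_el is None:
            continue  # no downloadable stream for this asset
        fmt = {'url': url_el.text,
               'abr': int(a.find('bitrateAudio').text) // 1000}
        if a.find('bitrateVideo') is None:
            fmt['vcodec'] = 'none'  # audio-only asset
        print(fmt)  # {'url': 'http://example.invalid/a.mp4', 'abr': 128, 'vcodec': 'none'}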
index 91480ba875d5fff781ce08a47c41a3824e94e910..99d3c83a5e4c3a31d71e9f487e13930af498dc1f 100644 (file)
@@ -1,14 +1,10 @@
 import re
-import socket
 
 from .common import InfoExtractor
 from ..utils import (
-    compat_http_client,
     compat_parse_qs,
-    compat_urllib_error,
     compat_urllib_parse,
     compat_urllib_request,
-    compat_str,
     determine_ext,
     ExtractorError,
 )
@@ -69,6 +65,21 @@ class MetacafeIE(InfoExtractor):
             u'age_limit': 18,
         },
     },
+    # cbs video
+    {
+        u'url': u'http://www.metacafe.com/watch/cb-0rOxMBabDXN6/samsung_galaxy_note_2_samsungs_next_generation_phablet/',
+        u'info_dict': {
+            u'id': u'0rOxMBabDXN6',
+            u'ext': u'flv',
+            u'title': u'Samsung Galaxy Note 2: Samsung\'s next-generation phablet',
+            u'description': u'md5:54d49fac53d26d5a0aaeccd061ada09d',
+            u'duration': 129,
+        },
+        u'params': {
+            # rtmp download
+            u'skip_download': True,
+        },
+    },
     ]
 
 
@@ -78,12 +89,8 @@ class MetacafeIE(InfoExtractor):
 
     def _real_initialize(self):
         # Retrieve disclaimer
-        request = compat_urllib_request.Request(self._DISCLAIMER)
-        try:
-            self.report_disclaimer()
-            compat_urllib_request.urlopen(request).read()
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            raise ExtractorError(u'Unable to retrieve disclaimer: %s' % compat_str(err))
+        self.report_disclaimer()
+        self._download_webpage(self._DISCLAIMER, None, False, u'Unable to retrieve disclaimer')
 
         # Confirm age
         disclaimer_form = {
@@ -92,11 +99,8 @@ class MetacafeIE(InfoExtractor):
             }
         request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        try:
-            self.report_age_confirmation()
-            compat_urllib_request.urlopen(request).read()
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            raise ExtractorError(u'Unable to confirm age: %s' % compat_str(err))
+        self.report_age_confirmation()
+        self._download_webpage(request, None, False, u'Unable to confirm age')
 
     def _real_extract(self, url):
         # Extract id and simplified title from URL
@@ -106,10 +110,16 @@ class MetacafeIE(InfoExtractor):
 
         video_id = mobj.group(1)
 
-        # Check if video comes from YouTube
-        mobj2 = re.match(r'^yt-(.*)$', video_id)
-        if mobj2 is not None:
-            return [self.url_result('http://www.youtube.com/watch?v=%s' % mobj2.group(1), 'Youtube')]
+        # the video may come from an external site
+        m_external = re.match('^(\w{2})-(.*)$', video_id)
+        if m_external is not None:
+            prefix, ext_id = m_external.groups()
+            # Check if video comes from YouTube
+            if prefix == 'yt':
+                return self.url_result('http://www.youtube.com/watch?v=%s' % ext_id, 'Youtube')
+            # CBS videos use theplatform.com
+            if prefix == 'cb':
+                return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
 
         # Retrieve video webpage to extract further information
         req = compat_urllib_request.Request('http://www.metacafe.com/watch/%s/' % video_id)
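
Metacafe marks externally hosted videos with a two-letter ID prefix: `yt-` for YouTube, and now also `cb-` for CBS clips served through theplatform.com. Generalizing that dispatch as a table keeps the branches flat; a sketch (the mapping shape is illustrative, not the extractor's actual structure):

    import re

    EXTERNAL = {
        'yt': ('http://www.youtube.com/watch?v=%s', 'Youtube'),
        'cb': ('theplatform:%s', 'ThePlatform'),
    }

    def resolve(video_id):
        m = re.match(r'^(\w{2})-(.*)$', video_id)
        if m and m.group(1) in EXTERNAL:
            template, ie = EXTERNAL[m.group(1)]
            return template % m.group(2), ie
        return None  # a plain Metacafe ID

    print(resolve('cb-0rOxMBabDXN6'))  # ('theplatform:0rOxMBabDXN6', 'ThePlatform')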
index 6b95b4998852ac61d1061e0dcf6c3f442772fee2..e560c1d354d8b03a05133bf1458ce8d28b84b7bc 100644 (file)
@@ -1,8 +1,10 @@
 import re
-import xml.etree.ElementTree
 import operator
 
 from .common import InfoExtractor
+from ..utils import (
+    fix_xml_all_ampersand,
+)
 
 
 class MetacriticIE(InfoExtractor):
@@ -23,9 +25,8 @@ class MetacriticIE(InfoExtractor):
         video_id = mobj.group('id')
         webpage = self._download_webpage(url, video_id)
         # The XML is not well-formed: it contains raw '&' characters
-        info_xml = self._download_webpage('http://www.metacritic.com/video_data?video=' + video_id,
-            video_id, u'Downloading info xml').replace('&', '&amp;')
-        info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
+        info = self._download_xml('http://www.metacritic.com/video_data?video=' + video_id,
+            video_id, u'Downloading info xml', transform_source=fix_xml_all_ampersand)
 
         clip = next(c for c in info.findall('playList/clip') if c.find('id').text == video_id)
         formats = []
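
Because the feed contains bare '&' characters, the XML parser rejects it outright; the `transform_source` hook lets the raw body be repaired before parsing. A sketch of a transform that escapes only the bare ampersands, leaving existing entities alone (illustrative; not the youtube-dl utility itself):

    import re
    import xml.etree.ElementTree as ET

    def escape_bare_ampersands(xml_str):
        # '&' becomes '&amp;' unless it already starts an entity.
        return re.sub(r'&(?!(?:#\d+|#x[0-9a-fA-F]+|\w+);)', '&amp;', xml_str)

    broken = '<clip><title>Trailers & Videos</title></clip>'
    info = ET.fromstring(escape_bare_ampersands(broken))
    print(info.find('title').text)  # Trailers & Videos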
index e2baf44d7e15032022e6b304ace2bf8ef11a09b2..125d81551c26ea67eff82f2d2189bd058d16b873 100644 (file)
@@ -1,13 +1,10 @@
 import json
 import re
-import socket
 
 from .common import InfoExtractor
 from ..utils import (
-    compat_http_client,
-    compat_urllib_error,
-    compat_urllib_request,
     unified_strdate,
+    ExtractorError,
 )
 
 
@@ -31,13 +28,18 @@ class MixcloudIE(InfoExtractor):
         """Returns 1st active url from list"""
         for url in url_list:
             try:
-                compat_urllib_request.urlopen(url)
+                # We only want to know whether the request succeeds;
+                # don't download the whole file
+                self._request_webpage(url, None, False)
                 return url
-            except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error):
+            except ExtractorError:
                 url = None
 
         return None
 
+    def _get_url(self, template_url):
+        return self.check_urls(template_url % i for i in range(30))
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
 
@@ -53,13 +55,18 @@ class MixcloudIE(InfoExtractor):
         preview_url = self._search_regex(r'data-preview-url="(.+?)"', webpage, u'preview url')
         song_url = preview_url.replace('/previews/', '/cloudcasts/originals/')
         template_url = re.sub(r'(stream\d*)', 'stream%d', song_url)
-        final_song_url = self.check_urls(template_url % i for i in range(30))
+        final_song_url = self._get_url(template_url)
+        if final_song_url is None:
+            self.to_screen('Trying with m4a extension')
+            template_url = template_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/')
+            final_song_url = self._get_url(template_url)
+        if final_song_url is None:
+            raise ExtractorError(u'Unable to extract track url')
 
         return {
             'id': track_id,
             'title': info['name'],
             'url': final_song_url,
-            'ext': 'mp3',
             'description': info.get('description'),
             'thumbnail': info['pictures'].get('extra_large'),
             'uploader': info['user']['name'],
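
The track URL is found by probing numbered mirrors (stream0 through stream29) until one answers, and the refactored `_get_url` makes it cheap to re-run the whole sweep against the .m4a layout before failing. A compact sketch of that probe loop with the Python 3 stdlib and invented hostnames:

    import urllib.error
    import urllib.request

    def first_live_url(template, count=30):
        """Return the first stream%d URL that answers a HEAD request, else None."""
        for i in range(count):
            url = template % i
            try:
                urllib.request.urlopen(urllib.request.Request(url, method='HEAD'))
                return url
            except urllib.error.URLError:
                continue
        return None

    song = first_live_url('http://stream%d.example.invalid/track.mp3')
    if song is None:  # fall back to the m4a layout before giving up
        song = first_live_url('http://stream%d.example.invalid/m4a/64/track.m4a')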
index 6b3feb560768f96c4d5b3bb3adc0989ecf1c1d4f..ed11f521aa02aa3fe421b8fc743b0a26b1e1cdd0 100644 (file)
@@ -82,13 +82,20 @@ class MTVServicesInfoExtractor(InfoExtractor):
     def _get_videos_info(self, uri):
         video_id = self._id_from_uri(uri)
         data = compat_urllib_parse.urlencode({'uri': uri})
-        idoc = self._download_xml(self._FEED_URL +'?' + data, video_id,
-                                         u'Downloading info')
+
+        def fix_ampersand(s):
+            """ Fix unencoded ampersand in XML """
+            return s.replace(u'& ', '&amp; ')
+        idoc = self._download_xml(
+            self._FEED_URL + '?' + data, video_id,
+            u'Downloading info', transform_source=fix_ampersand)
         return [self._get_video_info(item) for item in idoc.findall('.//item')]
 
 
 class MTVIE(MTVServicesInfoExtractor):
-    _VALID_URL = r'^https?://(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$'
+    _VALID_URL = r'''(?x)^https?://
+        (?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|
+           m\.mtv\.com/videos/video\.rbml\?.*?id=(?P<mgid>[^&]+))'''
 
     _FEED_URL = 'http://www.mtv.com/player/embed/AS3/rss/'
 
@@ -122,16 +129,17 @@ class MTVIE(MTVServicesInfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('videoid')
-
-        webpage = self._download_webpage(url, video_id)
-
-        # Some videos come from Vevo.com
-        m_vevo = re.search(r'isVevoVideo = true;.*?vevoVideoId = "(.*?)";',
-                           webpage, re.DOTALL)
-        if m_vevo:
-            vevo_id = m_vevo.group(1);
-            self.to_screen(u'Vevo video detected: %s' % vevo_id)
-            return self.url_result('vevo:%s' % vevo_id, ie='Vevo')
-
-        uri = self._html_search_regex(r'/uri/(.*?)\?', webpage, u'uri')
+        uri = mobj.group('mgid')
+        if uri is None:
+            webpage = self._download_webpage(url, video_id)
+    
+            # Some videos come from Vevo.com
+            m_vevo = re.search(r'isVevoVideo = true;.*?vevoVideoId = "(.*?)";',
+                               webpage, re.DOTALL)
+            if m_vevo:
+                vevo_id = m_vevo.group(1)
+                self.to_screen(u'Vevo video detected: %s' % vevo_id)
+                return self.url_result('vevo:%s' % vevo_id, ie='Vevo')
+    
+            uri = self._html_search_regex(r'/uri/(.*?)\?', webpage, u'uri')
         return self._get_videos_info(uri)
index 03e31ea1c9ed98fd59c72504dffb2fa37c80edb7..1772b7f9ae43c2eaef57a15a5b3df5d9e7244213 100644 (file)
@@ -9,7 +9,7 @@ from ..utils import (
 
 
 class MuzuTVIE(InfoExtractor):
-    _VALID_URL = r'https?://www.muzu.tv/(.+?)/(.+?)/(?P<id>\d+)'
+    _VALID_URL = r'https?://www\.muzu\.tv/(.+?)/(.+?)/(?P<id>\d+)'
     IE_NAME = u'muzu.tv'
 
     _TEST = {
index 0067bf134fb416596c5db6948060ede7881421fa..4becddee604b4ec60a7ffd44c0619a07d31c2514 100644 (file)
@@ -9,7 +9,7 @@ from ..utils import (
 
 
 class MySpassIE(InfoExtractor):
-    _VALID_URL = r'http://www.myspass.de/.*'
+    _VALID_URL = r'http://www\.myspass\.de/.*'
     _TEST = {
         u'url': u'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
         u'file': u'11741.mp4',
index c012ec0cfacb2afea6b395c5c87509f53ed58614..4cab30631956b903682fc2de7aa5dd551bcdd4a3 100644 (file)
@@ -9,7 +9,7 @@ from ..utils import (
 
 
 class NaverIE(InfoExtractor):
-    _VALID_URL = r'https?://tvcast\.naver\.com/v/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:m\.)?tvcast\.naver\.com/v/(?P<id>\d+)'
 
     _TEST = {
         u'url': u'http://tvcast.naver.com/v/81652',
diff --git a/youtube_dl/extractor/ndtv.py b/youtube_dl/extractor/ndtv.py
new file mode 100644 (file)
index 0000000..d81df3c
--- /dev/null
@@ -0,0 +1,64 @@
+import re
+
+from .common import InfoExtractor
+from ..utils import month_by_name
+
+
+class NDTVIE(InfoExtractor):
+    _VALID_URL = r'^https?://(?:www\.)?ndtv\.com/video/player/[^/]*/[^/]*/(?P<id>[a-z0-9]+)'
+
+    _TEST = {
+        u"url": u"http://www.ndtv.com/video/player/news/ndtv-exclusive-don-t-need-character-certificate-from-rahul-gandhi-says-arvind-kejriwal/300710",
+        u"file": u"300710.mp4",
+        u"md5": u"39f992dbe5fb531c395d8bbedb1e5e88",
+        u"info_dict": {
+            u"title": u"NDTV exclusive: Don't need character certificate from Rahul Gandhi, says Arvind Kejriwal",
+            u"description": u"In an exclusive interview to NDTV, Aam Aadmi Party's Arvind Kejriwal says it makes no difference to him that Rahul Gandhi said the Congress needs to learn from his party.",
+            u"upload_date": u"20131208",
+            u"duration": 1327,
+            u"thumbnail": u"http://i.ndtvimg.com/video/images/vod/medium/2013-12/big_300710_1386518307.jpg",
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+
+        webpage = self._download_webpage(url, video_id)
+
+        filename = self._search_regex(
+            r"__filename='([^']+)'", webpage, u'video filename')
+        video_url = (u'http://bitcast-b.bitgravity.com/ndtvod/23372/ndtv/%s' %
+                     filename)
+
+        duration_str = self._search_regex(
+            r"__duration='([^']+)'", webpage, u'duration', fatal=False)
+        duration = None if duration_str is None else int(duration_str)
+
+        date_m = re.search(r'''(?x)
+            <p\s+class="vod_dateline">\s*
+                Published\s+On:\s*
+                (?P<monthname>[A-Za-z]+)\s+(?P<day>[0-9]+),\s*(?P<year>[0-9]+)
+            ''', webpage)
+        upload_date = None
+        if date_m is not None:
+            month = month_by_name(date_m.group('monthname'))
+            if month is not None:
+                upload_date = '%s%02d%02d' % (
+                    date_m.group('year'), month, int(date_m.group('day')))
+
+        description = self._og_search_description(webpage)
+        READ_MORE = u' (Read more)'
+        if description.endswith(READ_MORE):
+            description = description[:-len(READ_MORE)]
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'title': self._og_search_title(webpage),
+            'description': description,
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'duration': duration,
+            'upload_date': upload_date,
+        }
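
The upload date is rebuilt from a human-readable "Published On: December 8, 2013" dateline via youtube-dl's `month_by_name` helper. The same conversion as a standalone function (month table inlined for the sketch):

    import re

    MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
              'August', 'September', 'October', 'November', 'December']

    def upload_date_from_dateline(text):
        m = re.search(r'Published\s+On:\s*([A-Za-z]+)\s+(\d+),\s*(\d+)', text)
        if m is None or m.group(1) not in MONTHS:
            return None
        return '%s%02d%02d' % (m.group(3), MONTHS.index(m.group(1)) + 1,
                               int(m.group(2)))

    print(upload_date_from_dateline('Published On: December 8, 2013'))  # 20131208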
diff --git a/youtube_dl/extractor/ninegag.py b/youtube_dl/extractor/ninegag.py
new file mode 100644 (file)
index 0000000..ea986c0
--- /dev/null
@@ -0,0 +1,43 @@
+import json
+import re
+
+from .common import InfoExtractor
+
+
+class NineGagIE(InfoExtractor):
+    IE_NAME = '9gag'
+    _VALID_URL = r'^https?://(?:www\.)?9gag\.tv/v/(?P<id>[0-9]+)'
+
+    _TEST = {
+        u"url": u"http://9gag.tv/v/1912",
+        u"file": u"1912.mp4",
+        u"info_dict": {
+            u"description": u"This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
+            u"title": u"\"People Are Awesome 2013\" Is Absolutely Awesome"
+        },
+        u'add_ie': [u'Youtube']
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+
+        webpage = self._download_webpage(url, video_id)
+        data_json = self._html_search_regex(r'''(?x)
+            <div\s*id="tv-video"\s*data-video-source="youtube"\s*
+                data-video-meta="([^"]+)"''', webpage, u'video metadata')
+
+        data = json.loads(data_json)
+
+        return {
+            '_type': 'url_transparent',
+            'url': data['youtubeVideoId'],
+            'ie_key': 'Youtube',
+            'id': video_id,
+            'title': data['title'],
+            'description': data['description'],
+            'view_count': int(data['view_count']),
+            'like_count': int(data['statistic']['like']),
+            'dislike_count': int(data['statistic']['dislike']),
+            'thumbnail': data['thumbnail_url'],
+        }
index 1f7b4d2e7e9fa79ef9f81f71f190f943c35dd3a5..d08e47734c217864a93062acbe87e2e658c57779 100644 (file)
@@ -22,6 +22,11 @@ class OoyalaIE(InfoExtractor):
     def _url_for_embed_code(embed_code):
         return 'http://player.ooyala.com/player.js?embedCode=%s' % embed_code
 
+    @classmethod
+    def _build_url_result(cls, embed_code):
+        return cls.url_result(cls._url_for_embed_code(embed_code),
+            ie=cls.ie_key())
+
     def _extract_result(self, info, more_info):
         return {'id': info['embedCode'],
                 'ext': 'mp4',
index cfca2a06352287038ff367e0f83fa67bd4cee782..b42eae89aca1bdc894e29a876d06e4c5d49564a0 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 )
 
 class ORFIE(InfoExtractor):
-    _VALID_URL = r'https?://tvthek.orf.at/(programs/.+?/episodes|topics/.+?)/(?P<id>\d+)'
+    _VALID_URL = r'https?://tvthek\.orf\.at/(programs/.+?/episodes|topics/.+?)/(?P<id>\d+)'
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
index 65462d867027b67f3cf8a26d6e6fa7a545471fe7..25f019231e8f98b49666f6d6a74400d494305b82 100644 (file)
@@ -5,7 +5,7 @@ from .common import InfoExtractor
 
 
 class PBSIE(InfoExtractor):
-    _VALID_URL = r'https?://video.pbs.org/video/(?P<id>\d+)/?'
+    _VALID_URL = r'https?://video\.pbs\.org/video/(?P<id>\d+)/?'
 
     _TEST = {
         u'url': u'http://video.pbs.org/video/2365006249/',
diff --git a/youtube_dl/extractor/pornhd.py b/youtube_dl/extractor/pornhd.py
new file mode 100644 (file)
index 0000000..71abd50
--- /dev/null
@@ -0,0 +1,38 @@
+import re
+
+from .common import InfoExtractor
+from ..utils import compat_urllib_parse
+
+
+class PornHdIE(InfoExtractor):
+    _VALID_URL = r'(?:http://)?(?:www\.)?pornhd\.com/videos/(?P<video_id>[0-9]+)/(?P<video_title>.+)'
+    _TEST = {
+        u'url': u'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
+        u'file': u'1962.flv',
+        u'md5': u'35272469887dca97abd30abecc6cdf75',
+        u'info_dict': {
+            u"title": u"sierra-day-gets-his-cum-all-over-herself-hd-porn-video",
+            u"age_limit": 18,
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+
+        video_id = mobj.group('video_id')
+        video_title = mobj.group('video_title')
+
+        webpage = self._download_webpage(url, video_id)
+
+        video_url = self._html_search_regex(
+            r'&hd=(http.+?)&', webpage, u'video URL')
+        video_url = compat_urllib_parse.unquote(video_url)
+        age_limit = 18
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'ext': 'flv',
+            'title': video_title,
+            'age_limit': age_limit,
+        }
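
The stream URL travels percent-encoded inside an `&hd=...&` parameter and has to be unquoted before use. The decode step in isolation (the import dance covers both Python lines; the URL is fabricated):

    try:
        from urllib.parse import unquote  # Python 3
    except ImportError:
        from urllib import unquote        # Python 2

    encoded = 'http%3A%2F%2Fcdn.example.invalid%2Fvideos%2F1962.flv'
    print(unquote(encoded))  # http://cdn.example.invalid/videos/1962.flv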
index 8b3471919565d4c7044d51eb24e8ef01cc8e77fc..d9135c6b929765e87b13e58f6fd6af5567c55199 100644 (file)
@@ -12,7 +12,7 @@ from ..aes import (
 )
 
 class PornHubIE(InfoExtractor):
-    _VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>pornhub\.com/view_video\.php\?viewkey=(?P<videoid>[0-9]+))'
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>pornhub\.com/view_video\.php\?viewkey=(?P<videoid>[0-9a-f]+))'
     _TEST = {
         u'url': u'http://www.pornhub.com/view_video.php?viewkey=648719015',
         u'file': u'648719015.mp4',
diff --git a/youtube_dl/extractor/pyvideo.py b/youtube_dl/extractor/pyvideo.py
new file mode 100644 (file)
index 0000000..3305459
--- /dev/null
@@ -0,0 +1,51 @@
+import re
+import os
+
+from .common import InfoExtractor
+
+
+class PyvideoIE(InfoExtractor):
+    _VALID_URL = r'(?:http://)?(?:www\.)?pyvideo\.org/video/(?P<id>\d+)/(.*)'
+    _TESTS = [{
+        u'url': u'http://pyvideo.org/video/1737/become-a-logging-expert-in-30-minutes',
+        u'file': u'24_4WWkSmNo.mp4',
+        u'md5': u'de317418c8bc76b1fd8633e4f32acbc6',
+        u'info_dict': {
+            u"title": u"Become a logging expert in 30 minutes",
+            u"description": u"md5:9665350d466c67fb5b1598de379021f7",
+            u"upload_date": u"20130320",
+            u"uploader": u"NextDayVideo",
+            u"uploader_id": u"NextDayVideo",
+        },
+        u'add_ie': ['Youtube'],
+    },
+    {
+        u'url': u'http://pyvideo.org/video/2542/gloriajw-spotifywitherikbernhardsson182m4v',
+        u'md5': u'5fe1c7e0a8aa5570330784c847ff6d12',
+        u'info_dict': {
+            u'id': u'2542',
+            u'ext': u'm4v',
+            u'title': u'Gloriajw-SpotifyWithErikBernhardsson182',
+        },
+    },
+    ]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        webpage = self._download_webpage(url, video_id)
+        m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', webpage)
+
+        if m_youtube is not None:
+            return self.url_result(m_youtube.group(1), 'Youtube')
+
+        title = self._html_search_regex(r'<div class="section">.*?<h3>([^>]+?)</h3>',
+            webpage, u'title', flags=re.DOTALL)
+        video_url = self._search_regex([r'<source src="(.*?)"',
+            r'<dt>Download</dt>.*?<a href="(.+?)"'],
+            webpage, u'video url', flags=re.DOTALL)
+        return {
+            'id': video_id,
+            'title': os.path.splitext(title)[0],
+            'url': video_url,
+        }
diff --git a/youtube_dl/extractor/radiofrance.py b/youtube_dl/extractor/radiofrance.py
new file mode 100644 (file)
index 0000000..34652f6
--- /dev/null
@@ -0,0 +1,55 @@
+# coding: utf-8
+import re
+
+from .common import InfoExtractor
+
+
+class RadioFranceIE(InfoExtractor):
+    _VALID_URL = r'^https?://maison\.radiofrance\.fr/radiovisions/(?P<id>[^?#]+)'
+    IE_NAME = u'radiofrance'
+
+    _TEST = {
+        u'url': u'http://maison.radiofrance.fr/radiovisions/one-one',
+        u'file': u'one-one.ogg',
+        u'md5': u'bdbb28ace95ed0e04faab32ba3160daf',
+        u'info_dict': {
+            u"title": u"One to one",
+            u"description": u"Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
+            u"uploader": u"Thomas Hercouët",
+        },
+    }
+
+    def _real_extract(self, url):
+        m = re.match(self._VALID_URL, url)
+        video_id = m.group('id')
+
+        webpage = self._download_webpage(url, video_id)
+        title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, u'title')
+        description = self._html_search_regex(
+            r'<div class="bloc_page_wrapper"><div class="text">(.*?)</div>',
+            webpage, u'description', fatal=False)
+        uploader = self._html_search_regex(
+            r'<div class="credit">&nbsp;&nbsp;&copy;&nbsp;(.*?)</div>',
+            webpage, u'uploader', fatal=False)
+
+        formats_str = self._html_search_regex(
+            r'class="jp-jplayer[^"]*" data-source="([^"]+)">',
+            webpage, u'audio URLs')
+        formats = [
+            {
+                'format_id': fm[0],
+                'url': fm[1],
+                'vcodec': 'none',
+            }
+            for fm in
+            re.findall(r"([a-z0-9]+)\s*:\s*'([^']+)'", formats_str)
+        ]
+        # No sorting, we don't know any more about these formats
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'description': description,
+            'uploader': uploader,
+        }
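
The jPlayer widget's `data-source` attribute carries a JavaScript-style map of codec names to URLs, which the extractor splits with a single findall of `name: 'url'` pairs. A worked example against a fabricated attribute value:

    import re

    formats_str = "ogg: 'http://example.invalid/one-one.ogg', mp3: 'http://example.invalid/one-one.mp3'"

    formats = [{'format_id': name, 'url': url, 'vcodec': 'none'}
               for name, url in re.findall(r"([a-z0-9]+)\s*:\s*'([^']+)'", formats_str)]
    print([f['format_id'] for f in formats])  # ['ogg', 'mp3']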
index 2f238de35832d61222331cf423e2691d8de52721..ccf0b1546452bbe85837ca1de837f7321a0bec0c 100644 (file)
@@ -7,14 +7,15 @@ from ..utils import (
     ExtractorError,
 )
 
+
 class RTLnowIE(InfoExtractor):
     """Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
-    _VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl-now\.rtl\.de/|rtl2now\.rtl2\.de/|(?:www\.)?voxnow\.de/|(?:www\.)?rtlnitronow\.de/|(?:www\.)?superrtlnow\.de/|(?:www\.)?n-tvnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
+    _VALID_URL = r'(?:http://)?(?P<url>(?P<domain>rtl-now\.rtl\.de|rtl2now\.rtl2\.de|(?:www\.)?voxnow\.de|(?:www\.)?rtlnitronow\.de|(?:www\.)?superrtlnow\.de|(?:www\.)?n-tvnow\.de)/+[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
     _TESTS = [{
         u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
         u'file': u'90419.flv',
         u'info_dict': {
-            u'upload_date': u'20070416', 
+            u'upload_date': u'20070416',
             u'title': u'Ahornallee - Folge 1 - Der Einzug',
             u'description': u'Folge 1 - Der Einzug',
         },
@@ -81,7 +82,7 @@ class RTLnowIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
 
         webpage_url = u'http://' + mobj.group('url')
-        video_page_url = u'http://' + mobj.group('base_url')
+        video_page_url = u'http://' + mobj.group('domain') + u'/'
         video_id = mobj.group(u'video_id')
 
         webpage = self._download_webpage(webpage_url, video_id)
index a18034fe26411288bf13f49ea86ee617317fc780..e3e9bc07ffbf9cfbfb6a092f6f88583a31a012fb 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 
 
 class RutubeIE(InfoExtractor):
-    _VALID_URL = r'https?://rutube.ru/video/(?P<long_id>\w+)'
+    _VALID_URL = r'https?://rutube\.ru/video/(?P<long_id>\w+)'
 
     _TEST = {
         u'url': u'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
index f5003c7f91bc78d10a63d25604537e5e77f9fdb8..d68646d24bf80c31e31ec71c6d7a4fe872f8b033 100644 (file)
@@ -4,7 +4,7 @@ from .common import InfoExtractor
 
 
 class SlashdotIE(InfoExtractor):
-    _VALID_URL = r'https?://tv.slashdot.org/video/\?embed=(?P<id>.*?)(&|$)'
+    _VALID_URL = r'https?://tv\.slashdot\.org/video/\?embed=(?P<id>.*?)(&|$)'
 
     _TEST = {
         u'add_ie': ['Ooyala'],
index f035a3214c60461ff2c35ee14faf159d4c6d44f5..beea58d6317727133f85b74c14097445cf785dc5 100644 (file)
@@ -1,13 +1,17 @@
 # encoding: utf-8
 
+import os.path
 import re
 import json
 import hashlib
+import uuid
 
 from .common import InfoExtractor
 from ..utils import (
-    determine_ext,
-    ExtractorError
+    compat_urllib_parse,
+    compat_urllib_request,
+    ExtractorError,
+    url_basename,
 )
 
 
@@ -130,7 +134,16 @@ class SmotriIE(InfoExtractor):
         # We will extract some from the video web page instead
         video_page_url = 'http://' + mobj.group('url')
         video_page = self._download_webpage(video_page_url, video_id, u'Downloading video page')
-        
+
+        # Warning if video is unavailable
+        warning = self._html_search_regex(
+            r'<div class="videoUnModer">(.*?)</div>', video_page,
+            u'warning message', default=None)
+        if warning is not None:
+            self._downloader.report_warning(
+                u'Video %s may not be available; smotri said: %s' %
+                (video_id, warning))
+
         # Adult content
         if re.search(u'EroConfirmText">', video_page) is not None:
             self.report_age_confirmation()
@@ -146,38 +159,44 @@ class SmotriIE(InfoExtractor):
         # Extract the rest of meta data
         video_title = self._search_meta(u'name', video_page, u'title')
         if not video_title:
-            video_title = video_url.rsplit('/', 1)[-1]
+            video_title = os.path.splitext(url_basename(video_url))[0]
 
         video_description = self._search_meta(u'description', video_page)
         END_TEXT = u' на сайте Smotri.com'
-        if video_description.endswith(END_TEXT):
+        if video_description and video_description.endswith(END_TEXT):
             video_description = video_description[:-len(END_TEXT)]
         START_TEXT = u'Смотреть онлайн ролик '
-        if video_description.startswith(START_TEXT):
+        if video_description and video_description.startswith(START_TEXT):
             video_description = video_description[len(START_TEXT):]
         video_thumbnail = self._search_meta(u'thumbnail', video_page)
 
         upload_date_str = self._search_meta(u'uploadDate', video_page, u'upload date')
-        upload_date_m = re.search(r'(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})T', upload_date_str)
-        video_upload_date = (
-            (
-                upload_date_m.group('year') +
-                upload_date_m.group('month') +
-                upload_date_m.group('day')
+        if upload_date_str:
+            upload_date_m = re.search(r'(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})T', upload_date_str)
+            video_upload_date = (
+                (
+                    upload_date_m.group('year') +
+                    upload_date_m.group('month') +
+                    upload_date_m.group('day')
+                )
+                if upload_date_m else None
             )
-            if upload_date_m else None
-        )
+        else:
+            video_upload_date = None
         
         duration_str = self._search_meta(u'duration', video_page)
-        duration_m = re.search(r'T(?P<hours>[0-9]{2})H(?P<minutes>[0-9]{2})M(?P<seconds>[0-9]{2})S', duration_str)
-        video_duration = (
-            (
-                (int(duration_m.group('hours')) * 60 * 60) +
-                (int(duration_m.group('minutes')) * 60) +
-                int(duration_m.group('seconds'))
+        if duration_str:
+            duration_m = re.search(r'T(?P<hours>[0-9]{2})H(?P<minutes>[0-9]{2})M(?P<seconds>[0-9]{2})S', duration_str)
+            video_duration = (
+                (
+                    (int(duration_m.group('hours')) * 60 * 60) +
+                    (int(duration_m.group('minutes')) * 60) +
+                    int(duration_m.group('seconds'))
+                )
+                if duration_m else None
             )
-            if duration_m else None
-        )
+        else:
+            video_duration = None
         
         video_uploader = self._html_search_regex(
             u'<div class="DescrUser"><div>Автор.*?onmouseover="popup_user_info[^"]+">(.*?)</a>',
@@ -200,7 +219,7 @@ class SmotriIE(InfoExtractor):
             'uploader': video_uploader,
             'upload_date': video_upload_date,
             'uploader_id': video_uploader_id,
-            'video_duration': video_duration,
+            'duration': video_duration,
             'view_count': video_view_count,
             'age_limit': 18 if adult_content else 0,
             'video_page_url': video_page_url
@@ -250,3 +269,105 @@ class SmotriUserIE(InfoExtractor):
             u'user nickname')
 
         return self.playlist_result(entries, user_id, user_nickname)
+
+
+class SmotriBroadcastIE(InfoExtractor):
+    IE_DESC = u'Smotri.com broadcasts'
+    IE_NAME = u'smotri:broadcast'
+    _VALID_URL = r'^https?://(?:www\.)?(?P<url>smotri\.com/live/(?P<broadcastid>[^/]+))/?.*'
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        broadcast_id = mobj.group('broadcastid')
+
+        broadcast_url = 'http://' + mobj.group('url')
+        broadcast_page = self._download_webpage(broadcast_url, broadcast_id, u'Downloading broadcast page')
+
+        if re.search(u'>Режиссер с логином <br/>"%s"<br/> <span>не существует<' % broadcast_id, broadcast_page) is not None:
+            raise ExtractorError(u'Broadcast %s does not exist' % broadcast_id, expected=True)
+
+        # Adult content
+        if re.search(u'EroConfirmText">', broadcast_page) is not None:
+
+            (username, password) = self._get_login_info()
+            if username is None:
+                raise ExtractorError(u'Erotic broadcasts are allowed only for registered users, '
+                    u'use the --username and --password options to provide account credentials.', expected=True)
+
+            # Log in
+            login_form_strs = {
+                u'login-hint53': '1',
+                u'confirm_erotic': '1',
+                u'login': username,
+                u'password': password,
+            }
+            # Convert to UTF-8 *before* urlencode because Python 2.x's urlencode
+            # chokes on unicode
+            login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k,v in login_form_strs.items())
+            login_data = compat_urllib_parse.urlencode(login_form).encode('utf-8')
+            login_url = broadcast_url + '/?no_redirect=1'
+            request = compat_urllib_request.Request(login_url, login_data)
+            request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+            broadcast_page = self._download_webpage(
+                request, broadcast_id, note=u'Logging in and confirming age')
+
+            if re.search(u'>Неверный логин или пароль<', broadcast_page) is not None:
+                raise ExtractorError(u'Unable to log in: bad username or password', expected=True)
+
+            adult_content = True
+        else:
+            adult_content = False
+
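+        # The player page passes a 'file' flashvar: a ticket that is exchanged for the stream URL below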
+        ticket = self._html_search_regex(
+            u'window\.broadcast_control\.addFlashVar\\(\'file\', \'([^\']+)\'\\);',
+            broadcast_page, u'broadcast ticket')
+
+        url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
+
+        broadcast_password = self._downloader.params.get('videopassword', None)
+        if broadcast_password:
+            url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()
+
+        broadcast_json_page = self._download_webpage(url, broadcast_id, u'Downloading broadcast JSON')
+
+        try:
+            broadcast_json = json.loads(broadcast_json_page)
+
+            protected_broadcast = broadcast_json['_pass_protected'] == 1
+            if protected_broadcast and not broadcast_password:
+                raise ExtractorError(u'This broadcast is protected by a password, use the --video-password option', expected=True)
+
+            broadcast_offline = broadcast_json['is_play'] == 0
+            if broadcast_offline:
+                raise ExtractorError(u'Broadcast %s is offline' % broadcast_id, expected=True)
+
+            rtmp_url = broadcast_json['_server']
+            if not rtmp_url.startswith('rtmp://'):
+                raise ExtractorError(u'Unexpected broadcast rtmp URL')
+
+            broadcast_playpath = broadcast_json['_streamName']
+            broadcast_thumbnail = broadcast_json['_imgURL']
+            broadcast_title = broadcast_json['title']
+            broadcast_description = broadcast_json['description']
+            broadcaster_nick = broadcast_json['nick']
+            broadcaster_login = broadcast_json['login']
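+            # Presumably a per-session identifier, passed as an extra RTMP connection argument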
+            rtmp_conn = 'S:%s' % uuid.uuid4().hex
+        except KeyError:
+            if protected_broadcast:
+                raise ExtractorError(u'Bad broadcast password', expected=True)
+            raise ExtractorError(u'Unexpected broadcast JSON')
+
+        return {
+            'id': broadcast_id,
+            'url': rtmp_url,
+            'title': broadcast_title,
+            'thumbnail': broadcast_thumbnail,
+            'description': broadcast_description,
+            'uploader': broadcaster_nick,
+            'uploader_id': broadcaster_login,
+            'age_limit': 18 if adult_content else 0,
+            'ext': 'flv',
+            'play_path': broadcast_playpath,
+            'rtmp_live': True,
+            'rtmp_conn': rtmp_conn
+        }
index 3a19ab17222831d87ffde4992e5712b01359e6eb..e22ff9c387ab0e01c1e6fcb1da793af877f37a5c 100644 (file)
@@ -1,3 +1,4 @@
+# encoding: utf-8
 import json
 import re
 import itertools
@@ -23,9 +24,12 @@ class SoundcloudIE(InfoExtractor):
      """
 
     _VALID_URL = r'''^(?:https?://)?
-                    (?:(?:(?:www\.)?soundcloud\.com/([\w\d-]+)/([\w\d-]+)/?(?:[?].*)?$)
+                    (?:(?:(?:www\.|m\.)?soundcloud\.com/
+                            (?P<uploader>[\w\d-]+)/
+                            (?!sets/)(?P<title>[\w\d-]+)/?
+                            (?P<token>[^?]+?)?(?:[?].*)?$)
                        |(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+))
-                       |(?P<widget>w.soundcloud.com/player/?.*?url=.*)
+                       |(?P<widget>w\.soundcloud\.com/player/?.*?url=.*)
                     )
                     '''
     IE_NAME = u'soundcloud'
@@ -56,6 +60,32 @@ class SoundcloudIE(InfoExtractor):
                 u'skip_download': True,
             },
         },
+        # private link
+        {
+            u'url': u'https://soundcloud.com/jaimemf/youtube-dl-test-video-a-y-baw/s-8Pjrp',
+            u'md5': u'aa0dd32bfea9b0c5ef4f02aacd080604',
+            u'info_dict': {
+                u'id': u'123998367',
+                u'ext': u'mp3',
+                u'title': u'Youtube - Dl Test Video \'\' Ä↭',
+                u'uploader': u'jaimeMF',
+                u'description': u'test chars:  \"\'/\\ä↭',
+                u'upload_date': u'20131209',
+            },
+        },
+        # downloadable song
+        {
+            u'url': u'https://soundcloud.com/simgretina/just-your-problem-baby-1',
+            u'md5': u'56a8b69568acaa967b4c49f9d1d52d19',
+            u'info_dict': {
+                u'id': u'105614606',
+                u'ext': u'wav',
+                u'title': u'Just Your Problem Baby (Acapella)',
+                u'description': u'Vocals',
+                u'uploader': u'Sim Gretina',
+                u'upload_date': u'20130815',
+            },
+        },
     ]
 
     _CLIENT_ID = 'b45b1aa10f1ac2941910a7f0d10f8e28'
@@ -73,7 +103,7 @@ class SoundcloudIE(InfoExtractor):
     def _resolv_url(cls, url):
         return 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
 
-    def _extract_info_dict(self, info, full_title=None, quiet=False):
+    def _extract_info_dict(self, info, full_title=None, quiet=False, secret_token=None):
         track_id = compat_str(info['id'])
         name = full_title or track_id
         if quiet:
@@ -82,7 +112,7 @@ class SoundcloudIE(InfoExtractor):
         thumbnail = info['artwork_url']
         if thumbnail is not None:
             thumbnail = thumbnail.replace('-large', '-t500x500')
-        ext = info.get('original_format', u'mp3')
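+        # Streamed copies are served as mp3; the original format is only meaningful for the download URL below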
+        ext = u'mp3'
         result = {
             'id': track_id,
             'uploader': info['user']['username'],
@@ -98,14 +128,16 @@ class SoundcloudIE(InfoExtractor):
                     track_id, self._CLIENT_ID))
             result['formats'] = [{
                 'format_id': 'download',
-                'ext': ext,
+                'ext': info.get('original_format', u'mp3'),
                 'url': format_url,
                 'vcodec': 'none',
             }]
         else:
             # We have to retrieve the url
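+            # secret_token comes from private share URLs and unlocks private tracks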
+            streams_url = ('http://api.soundcloud.com/i1/tracks/{0}/streams?'
+                'client_id={1}&secret_token={2}'.format(track_id, self._IPHONE_CLIENT_ID, secret_token))
             stream_json = self._download_webpage(
-                'http://api.soundcloud.com/i1/tracks/{0}/streams?client_id={1}'.format(track_id, self._IPHONE_CLIENT_ID),
+                streams_url,
                 track_id, u'Downloading track url')
 
             formats = []
@@ -157,6 +189,7 @@ class SoundcloudIE(InfoExtractor):
             raise ExtractorError(u'Invalid URL: %s' % url)
 
         track_id = mobj.group('track_id')
+        token = None
         if track_id is not None:
             info_json_url = 'http://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
             full_title = track_id
@@ -165,19 +198,22 @@ class SoundcloudIE(InfoExtractor):
             return self.url_result(query['url'][0], ie='Soundcloud')
         else:
             # extract uploader (which is in the url)
-            uploader = mobj.group(1)
+            uploader = mobj.group('uploader')
             # extract simple title (uploader + slug of song title)
-            slug_title =  mobj.group(2)
-            full_title = '%s/%s' % (uploader, slug_title)
+            slug_title = mobj.group('title')
+            token = mobj.group('token')
+            full_title = resolve_title = '%s/%s' % (uploader, slug_title)
+            if token:
+                resolve_title += '/%s' % token
     
             self.report_resolve(full_title)
     
-            url = 'http://soundcloud.com/%s/%s' % (uploader, slug_title)
+            url = 'http://soundcloud.com/%s' % resolve_title
             info_json_url = self._resolv_url(url)
         info_json = self._download_webpage(info_json_url, full_title, u'Downloading info JSON')
 
         info = json.loads(info_json)
-        return self._extract_info_dict(info, full_title)
+        return self._extract_info_dict(info, full_title, secret_token=token)
 
 class SoundcloudSetIE(SoundcloudIE):
     _VALID_URL = r'^(?:https?://)?(?:www\.)?soundcloud\.com/([\w\d-]+)/sets/([\w\d-]+)(?:[?].*)?$'
@@ -217,7 +253,7 @@ class SoundcloudSetIE(SoundcloudIE):
 
 
 class SoundcloudUserIE(SoundcloudIE):
-    _VALID_URL = r'https?://(www\.)?soundcloud.com/(?P<user>[^/]+)(/?(tracks/)?)?(\?.*)?$'
+    _VALID_URL = r'https?://(www\.)?soundcloud\.com/(?P<user>[^/]+)(/?(tracks/)?)?(\?.*)?$'
     IE_NAME = u'soundcloud:user'
 
     # it's in tests/test_playlists.py
index 0d32a068895e1c0a53cd23c61d6cdc233d82826d..11455e0fa212f3ab6ec2b9cb258f2824346a2862 100644 (file)
@@ -6,7 +6,7 @@ from ..utils import RegexNotFoundError, ExtractorError
 
 
 class SpaceIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.space\.com/\d+-(?P<title>[^/\.\?]*?)-video.html'
+    _VALID_URL = r'https?://www\.space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html'
     _TEST = {
         u'add_ie': ['Brightcove'],
         u'url': u'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
index b27838bf9dc5ea430f01b054eb152d78fb946d0d..44c52c718e2090cca8f28c6542d67ae4585332e6 100644 (file)
@@ -1,14 +1,7 @@
 import re
-import socket
-import xml.etree.ElementTree
 
 from .common import InfoExtractor
 from ..utils import (
-    compat_http_client,
-    compat_str,
-    compat_urllib_error,
-    compat_urllib_request,
-
     ExtractorError,
     orderedSet,
     unescapeHTML,
@@ -18,7 +11,7 @@ from ..utils import (
 class StanfordOpenClassroomIE(InfoExtractor):
     IE_NAME = u'stanfordoc'
     IE_DESC = u'Stanford Open ClassRoom'
-    _VALID_URL = r'^(?:https?://)?openclassroom.stanford.edu(?P<path>/?|(/MainFolder/(?:HomePage|CoursePage|VideoPage)\.php([?]course=(?P<course>[^&]+)(&video=(?P<video>[^&]+))?(&.*)?)?))$'
+    _VALID_URL = r'^(?:https?://)?openclassroom\.stanford\.edu(?P<path>/?|(/MainFolder/(?:HomePage|CoursePage|VideoPage)\.php([?]course=(?P<course>[^&]+)(&video=(?P<video>[^&]+))?(&.*)?)?))$'
     _TEST = {
         u'url': u'http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=PracticalUnix&video=intro-environment&speed=100',
         u'file': u'PracticalUnix_intro-environment.mp4',
@@ -45,11 +38,7 @@ class StanfordOpenClassroomIE(InfoExtractor):
             self.report_extraction(info['id'])
             baseUrl = 'http://openclassroom.stanford.edu/MainFolder/courses/' + course + '/videos/'
             xmlUrl = baseUrl + video + '.xml'
-            try:
-                metaXml = compat_urllib_request.urlopen(xmlUrl).read()
-            except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-                raise ExtractorError(u'Unable to download video info XML: %s' % compat_str(err))
-            mdoc = xml.etree.ElementTree.fromstring(metaXml)
+            mdoc = self._download_xml(xmlUrl, info['id'])
             try:
                 info['title'] = mdoc.findall('./title')[0].text
                 info['url'] = baseUrl + mdoc.findall('./videoFile')[0].text
@@ -95,12 +84,9 @@ class StanfordOpenClassroomIE(InfoExtractor):
                 'upload_date': None,
             }
 
-            self.report_download_webpage(info['id'])
             rootURL = 'http://openclassroom.stanford.edu/MainFolder/HomePage.php'
-            try:
-                rootpage = compat_urllib_request.urlopen(rootURL).read()
-            except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-                raise ExtractorError(u'Unable to download course info page: ' + compat_str(err))
+            rootpage = self._download_webpage(rootURL, info['id'],
+                errnote=u'Unable to download course info page')
 
             info['title'] = info['id']
 
index 772134a128e6f75d3a15d4fbb4ee37a776edfe10..2c5c88be8ede5ae6d0fa9f3c4e540cddb13190b6 100644 (file)
@@ -7,7 +7,7 @@ from .common import InfoExtractor
 
 class TF1IE(InfoExtractor):
     """TF1 uses the wat.tv player."""
-    _VALID_URL = r'http://videos.tf1.fr/.*-(.*?).html'
+    _VALID_URL = r'http://videos\.tf1\.fr/.*-(.*?)\.html'
     _TEST = {
         u'url': u'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
         u'file': u'10635995.mp4',
diff --git a/youtube_dl/extractor/theplatform.py b/youtube_dl/extractor/theplatform.py
new file mode 100644 (file)
index 0000000..cec6526
--- /dev/null
@@ -0,0 +1,80 @@
+import re
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    xpath_with_ns,
+)
+
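+# Qualify XPath expressions with the SMIL 2.1 namespace used by theplatform's metadata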
+_x = lambda p: xpath_with_ns(p, {'smil': 'http://www.w3.org/2005/SMIL21/Language'})
+
+
+class ThePlatformIE(InfoExtractor):
+    _VALID_URL = r'(?:https?://link\.theplatform\.com/s/[^/]+/|theplatform:)(?P<id>[^/\?]+)'
+
+    _TEST = {
+        # from http://www.metacafe.com/watch/cb-e9I_cZgTgIPd/blackberrys_big_bold_z30/
+        u'url': u'http://link.theplatform.com/s/dJ5BDC/e9I_cZgTgIPd/meta.smil?format=smil&Tracking=true&mbr=true',
+        u'info_dict': {
+            u'id': u'e9I_cZgTgIPd',
+            u'ext': u'flv',
+            u'title': u'Blackberry\'s big, bold Z30',
+            u'description': u'The Z30 is Blackberry\'s biggest, baddest mobile messaging device yet.',
+            u'duration': 247,
+        },
+        u'params': {
+            # rtmp download
+            u'skip_download': True,
+        },
+    }
+
+    def _get_info(self, video_id):
+        smil_url = ('http://link.theplatform.com/s/dJ5BDC/{0}/meta.smil?'
+            'format=smil&mbr=true'.format(video_id))
+        meta = self._download_xml(smil_url, video_id)
+
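+        # Geo-blocked videos carry a ref node titled 'Geographic Restriction' with the error text in 'abstract'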
+        try:
+            error_msg = next(
+                n.attrib['abstract']
+                for n in meta.findall(_x('.//smil:ref'))
+                if n.attrib.get('title') == u'Geographic Restriction')
+        except StopIteration:
+            pass
+        else:
+            raise ExtractorError(error_msg, expected=True)
+
+        info_url = 'http://link.theplatform.com/s/dJ5BDC/{0}?format=preview'.format(video_id)
+        info_json = self._download_webpage(info_url, video_id)
+        info = json.loads(info_json)
+
+        head = meta.find(_x('smil:head'))
+        body = meta.find(_x('smil:body'))
+        base_url = head.find(_x('smil:meta')).attrib['base']
+        switch = body.find(_x('smil:switch'))
+        formats = []
+        for f in switch.findall(_x('smil:video')):
+            attr = f.attrib
+            formats.append({
+                'url': base_url,
+                'play_path': 'mp4:' + attr['src'],
+                'ext': 'flv',
+                'width': int(attr['width']),
+                'height': int(attr['height']),
+                'vbr': int(attr['system-bitrate']),
+            })
+        formats.sort(key=lambda f: (f['height'], f['width'], f['vbr']))
+
+        return {
+            'id': video_id,
+            'title': info['title'],
+            'formats': formats,
+            'description': info['description'],
+            'thumbnail': info['defaultThumbnailUrl'],
+            'duration': info['duration']//1000,
+        }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        return self._get_info(video_id)
index 516e18914e0f81a3fc0c7a137afaa487d63b7045..474610eec79483da01c14ca3e1d985b7aa8fd49a 100644 (file)
@@ -3,7 +3,7 @@ import re
 from .common import InfoExtractor
 
 class UnistraIE(InfoExtractor):
-    _VALID_URL = r'http://utv.unistra.fr/(?:index|video).php\?id_video\=(\d+)'
+    _VALID_URL = r'http://utv\.unistra\.fr/(?:index|video)\.php\?id_video\=(\d+)'
 
     _TEST = {
         u'url': u'http://utv.unistra.fr/video.php?id_video=154',
index 4f803bcd3c02c69bc390fa0784e8622007f9db49..5a136a9527613e2fb076c1169b3ef5e90c24eafd 100644 (file)
@@ -15,7 +15,7 @@ class Vbox7IE(InfoExtractor):
     _TEST = {
         u'url': u'http://vbox7.com/play:249bb972c2',
         u'file': u'249bb972c2.flv',
-        u'md5': u'9c70d6d956f888bdc08c124acc120cfe',
+        u'md5': u'99f65c0c9ef9b682b97313e052734c3f',
         u'info_dict': {
             u"title": u"\u0421\u043c\u044f\u0445! \u0427\u0443\u0434\u043e - \u0447\u0438\u0441\u0442 \u0437\u0430 \u0441\u0435\u043a\u0443\u043d\u0434\u0438 - \u0421\u043a\u0440\u0438\u0442\u0430 \u043a\u0430\u043c\u0435\u0440\u0430"
         }
index 3a99a29c6520ba6824a9060264d678a5cf31e6e6..3cf8c853d2e466e00228d7eb3cb0f33d664beb9b 100644 (file)
@@ -9,7 +9,7 @@ from ..utils import (
 )
 
 class VeeHDIE(InfoExtractor):
-    _VALID_URL = r'https?://veehd.com/video/(?P<id>\d+)'
+    _VALID_URL = r'https?://veehd\.com/video/(?P<id>\d+)'
 
     _TEST = {
         u'url': u'http://veehd.com/video/4686958',
index 4378b17800f1df78275d68a9525ca95585dc8b9d..a4b26a26f4132840c57700fad96785dfb390a8db 100644 (file)
@@ -15,7 +15,12 @@ class VevoIE(InfoExtractor):
     Accepts urls from vevo.com or in the format 'vevo:{id}'
     (currently used by MTVIE)
     """
-    _VALID_URL = r'((http://www.vevo.com/watch/.*?/.*?/)|(vevo:))(?P<id>.*?)(\?|$)'
+    _VALID_URL = r'''(?x)
+        (?:https?://www\.vevo\.com/watch/(?:[^/]+/[^/]+/)?|
+           https?://cache\.vevo\.com/m/html/embed\.html\?video=|
+           https?://videoplayer\.vevo\.com/embed/embedded\?videoId=|
+           vevo:)
+        (?P<id>[^&?#]+)'''
     _TESTS = [{
         u'url': u'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
         u'file': u'GB1101300280.mp4',
@@ -24,7 +29,7 @@ class VevoIE(InfoExtractor):
             u"upload_date": u"20130624",
             u"uploader": u"Hurts",
             u"title": u"Somebody to Die For",
-            u"duration": 230,
+            u"duration": 230.12,
             u"width": 1920,
             u"height": 1080,
         }
index 6b93afa50f765a0be24265b3154ee8f670f68312..87812d6afa6db12558fbd5abd314f2046f29bdd4 100644 (file)
@@ -6,7 +6,7 @@ from ..utils import ExtractorError
 
 
 class ViceIE(InfoExtractor):
-    _VALID_URL = r'http://www.vice.com/.*?/(?P<name>.+)'
+    _VALID_URL = r'http://www\.vice\.com/.*?/(?P<name>.+)'
 
     _TEST = {
         u'url': u'http://www.vice.com/Fringes/cowboy-capitalists-part-1',
index 75335dfb8797e83c7413f0c8bee86603ed429847..9328ef4a2121f091c256e9324d0de0e8b7dcbecd 100644 (file)
@@ -2,13 +2,10 @@ import json
 import re
 
 from .common import InfoExtractor
-from ..utils import (
-    determine_ext,
-)
 
 
 class ViddlerIE(InfoExtractor):
-    _VALID_URL = r'(?P<domain>https?://(?:www\.)?viddler.com)/(?:v|embed|player)/(?P<id>[a-z0-9]+)'
+    _VALID_URL = r'(?P<domain>https?://(?:www\.)?viddler\.com)/(?:v|embed|player)/(?P<id>[a-z0-9]+)'
     _TEST = {
         u"url": u"http://www.viddler.com/v/43903784",
         u'file': u'43903784.mp4',
index 912802d9aa22082f2f39148db7920a6287c74ec6..f75169041b4f958b9f345daba99a4a1ba575cf4e 100644 (file)
@@ -7,7 +7,7 @@ from ..utils import (
 )
 
 class VideofyMeIE(InfoExtractor):
-    _VALID_URL = r'https?://(www.videofy.me/.+?|p.videofy.me/v)/(?P<id>\d+)(&|#|$)'
+    _VALID_URL = r'https?://(www\.videofy\.me/.+?|p\.videofy\.me/v)/(?P<id>\d+)(&|#|$)'
     IE_NAME = u'videofy.me'
 
     _TEST = {
index acae81448e38e3b362fcfdd93b4a6dcd9cc5f7d0..65463c73324ca83ab87b45bc33d569c3fe881163 100644 (file)
@@ -15,6 +15,7 @@ class VideoPremiumIE(InfoExtractor):
         u'params': {
             u'skip_download': True,
         },
+        u'skip': u'Test file has been deleted.',
     }
 
     def _real_extract(self, url):
index f27763ae2ff110051613243ef2543e3349d6b019..c3623fcbe6b01493c5ec2115f4fe5f2d32737e59 100644 (file)
@@ -16,11 +16,20 @@ from ..utils import (
     unsmuggle_url,
 )
 
+
 class VimeoIE(InfoExtractor):
     """Information extractor for vimeo.com."""
 
     # _VALID_URL matches Vimeo URLs
-    _VALID_URL = r'(?P<proto>https?://)?(?:(?:www|(?P<player>player))\.)?vimeo(?P<pro>pro)?\.com/(?:(?:(?:groups|album)/[^/]+)|(?:.*?)/)?(?P<direct_link>play_redirect_hls\?clip_id=)?(?:videos?/)?(?P<id>[0-9]+)/?(?:[?].*)?(?:#.*)?$'
+    _VALID_URL = r'''(?x)
+        (?P<proto>https?://)?
+        (?:(?:www|(?P<player>player))\.)?
+        vimeo(?P<pro>pro)?\.com/
+        (?:.*?/)?
+        (?:(?:play_redirect_hls|moogaloop\.swf)\?clip_id=)?
+        (?:videos?/)?
+        (?P<id>[0-9]+)
+        /?(?:[?&].*)?(?:[#].*)?$'''
     _NETRC_MACHINE = 'vimeo'
     IE_NAME = u'vimeo'
     _TESTS = [
@@ -115,7 +124,7 @@ class VimeoIE(InfoExtractor):
     def _real_initialize(self):
         self._login()
 
-    def _real_extract(self, url, new_video=True):
+    def _real_extract(self, url):
         url, data = unsmuggle_url(url)
         headers = std_headers
         if data is not None:
@@ -151,8 +160,14 @@ class VimeoIE(InfoExtractor):
                 config = json.loads(config_json)
             except RegexNotFoundError:
                 # For pro videos or player.vimeo.com urls
-                config = self._search_regex([r' = {config:({.+?}),assets:', r'(?:c|b)=({.+?});'],
-                    webpage, u'info section', flags=re.DOTALL)
+                # Try to find out which variable the config dict is assigned to
+                m_variable_name = re.search(r'(\w)\.video\.id', webpage)
+                if m_variable_name is not None:
+                    config_re = r'%s=({.+?});' % re.escape(m_variable_name.group(1))
+                else:
+                    config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
+                config = self._search_regex(config_re, webpage, u'info section',
+                    flags=re.DOTALL)
                 config = json.loads(config)
         except Exception as e:
             if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
@@ -196,6 +211,16 @@ class VimeoIE(InfoExtractor):
         if mobj is not None:
             video_upload_date = mobj.group(1) + mobj.group(2) + mobj.group(3)
 
+        try:
+            view_count = int(self._search_regex(r'UserPlays:(\d+)', webpage, u'view count'))
+            like_count = int(self._search_regex(r'UserLikes:(\d+)', webpage, u'like count'))
+            comment_count = int(self._search_regex(r'UserComments:(\d+)', webpage, u'comment count'))
+        except RegexNotFoundError:
+            # This info is only available in vimeo.com/{id} urls
+            view_count = None
+            like_count = None
+            comment_count = None
+
         # Vimeo specific: extract request signature and timestamp
         sig = config['request']['signature']
         timestamp = config['request']['timestamp']
@@ -242,6 +267,9 @@ class VimeoIE(InfoExtractor):
             'description':  video_description,
             'formats': formats,
             'webpage_url': url,
+            'view_count': view_count,
+            'like_count': like_count,
+            'comment_count': comment_count,
         }
 
 
@@ -251,11 +279,17 @@ class VimeoChannelIE(InfoExtractor):
     _MORE_PAGES_INDICATOR = r'<a.+?rel="next"'
     _TITLE_RE = r'<link rel="alternate"[^>]+?title="(.*?)"'
 
+    def _page_url(self, base_url, pagenum):
+        return '%s/videos/page:%d/' % (base_url, pagenum)
+
+    def _extract_list_title(self, webpage):
+        return self._html_search_regex(self._TITLE_RE, webpage, u'list title')
+
     def _extract_videos(self, list_id, base_url):
         video_ids = []
         for pagenum in itertools.count(1):
             webpage = self._download_webpage(
-                '%s/videos/page:%d/' % (base_url, pagenum),list_id,
+                self._page_url(base_url, pagenum), list_id,
                 u'Downloading page %s' % pagenum)
             video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage))
             if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
@@ -263,11 +297,9 @@ class VimeoChannelIE(InfoExtractor):
 
         entries = [self.url_result('http://vimeo.com/%s' % video_id, 'Vimeo')
                    for video_id in video_ids]
-        list_title = self._html_search_regex(self._TITLE_RE, webpage,
-            u'list title')
         return {'_type': 'playlist',
                 'id': list_id,
-                'title': list_title,
+                'title': self._extract_list_title(webpage),
                 'entries': entries,
                 }
 
@@ -284,7 +316,7 @@ class VimeoUserIE(VimeoChannelIE):
 
     @classmethod
     def suitable(cls, url):
-        if VimeoChannelIE.suitable(url) or VimeoIE.suitable(url):
+        if VimeoChannelIE.suitable(url) or VimeoIE.suitable(url) or VimeoAlbumIE.suitable(url) or VimeoGroupsIE.suitable(url):
             return False
         return super(VimeoUserIE, cls).suitable(url)
 
@@ -292,3 +324,30 @@ class VimeoUserIE(VimeoChannelIE):
         mobj = re.match(self._VALID_URL, url)
         name = mobj.group('name')
         return self._extract_videos(name, 'http://vimeo.com/%s' % name)
+
+
+class VimeoAlbumIE(VimeoChannelIE):
+    IE_NAME = u'vimeo:album'
+    _VALID_URL = r'(?:https?://)?vimeo\.com/album/(?P<id>\d+)'
+    _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
+
+    def _page_url(self, base_url, pagenum):
+        return '%s/page:%d/' % (base_url, pagenum)
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        album_id = mobj.group('id')
+        return self._extract_videos(album_id, 'http://vimeo.com/album/%s' % album_id)
+
+
+class VimeoGroupsIE(VimeoAlbumIE):
+    IE_NAME = u'vimeo:group'
+    _VALID_URL = r'(?:https?://)?vimeo\.com/groups/(?P<name>[^/]+)'
+
+    def _extract_list_title(self, webpage):
+        return self._og_search_title(webpage)
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        name = mobj.group('name')
+        return self._extract_videos(name, 'http://vimeo.com/groups/%s' % name)
index 29c25f0e309c7d4179d1226ed0a079a0d17fcba6..4fab6c6e8511711047e3ba9143452397a0aca0fa 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 
 
 class WatIE(InfoExtractor):
-    _VALID_URL=r'http://www.wat.tv/.*-(?P<shortID>.*?)_.*?.html'
+    _VALID_URL=r'http://www\.wat\.tv/.*-(?P<shortID>.*?)_.*?\.html'
     IE_NAME = 'wat.tv'
     _TEST = {
         u'url': u'http://www.wat.tv/video/world-war-philadelphia-vost-6bv55_2fjr7_.html',
index b9c3b13f918f5fbffa73010cbefb46543dac9586..82a626e0eb866e4924f32f7809e33d59d9261168 100644 (file)
@@ -11,7 +11,8 @@ class WimpIE(InfoExtractor):
         u'file': u'deerfence.flv',
         u'md5': u'8b215e2e0168c6081a1cf84b2846a2b5',
         u'info_dict': {
-            u"title": u"Watch Till End: Herd of deer jump over a fence."
+            u"title": u"Watch Till End: Herd of deer jump over a fence.",
+            u"description": u"These deer look as fluid as running water when they jump over this fence as a herd. This video is one that needs to be watched until the very end for the true majesty to be witnessed, but once it comes, it's sure to take your breath away.",
         }
     }
 
@@ -19,18 +20,14 @@ class WimpIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group(1)
         webpage = self._download_webpage(url, video_id)
-        title = self._search_regex(r'<meta name="description" content="(.+?)" />',webpage, 'video title')
-        thumbnail_url = self._search_regex(r'<meta property="og\:image" content="(.+?)" />', webpage,'video thumbnail')
         googleString = self._search_regex("googleCode = '(.*?)'", webpage, 'file url')
         googleString = base64.b64decode(googleString).decode('ascii')
-        final_url = self._search_regex('","(.*?)"', googleString,'final video url')
-        ext = final_url.rpartition(u'.')[2]
-
-        return [{
-            'id':        video_id,
-            'url':       final_url,
-            'ext':       ext,
-            'title':     title,
-            'thumbnail': thumbnail_url,
-        }]
+        final_url = self._search_regex('","(.*?)"', googleString, u'final video url')
 
+        return {
+            'id': video_id,
+            'url': final_url,
+            'title': self._og_search_title(webpage),
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'description': self._og_search_description(webpage),
+        }
diff --git a/youtube_dl/extractor/wistia.py b/youtube_dl/extractor/wistia.py
new file mode 100644 (file)
index 0000000..e1748c2
--- /dev/null
@@ -0,0 +1,55 @@
+import json
+import re
+
+from .common import InfoExtractor
+
+
+class WistiaIE(InfoExtractor):
+    _VALID_URL = r'^https?://(?:fast\.)?wistia\.net/embed/iframe/(?P<id>[a-z0-9]+)'
+
+    _TEST = {
+        u"url": u"http://fast.wistia.net/embed/iframe/sh7fpupwlt",
+        u"file": u"sh7fpupwlt.mov",
+        u"md5": u"cafeb56ec0c53c18c97405eecb3133df",
+        u"info_dict": {
+            u"title": u"cfh_resourceful_zdkh_final_1"
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+
+        webpage = self._download_webpage(url, video_id)
+        data_json = self._html_search_regex(
+            r'Wistia\.iframeInit\((.*?), {}\);', webpage, u'video data')
+
+        data = json.loads(data_json)
+
+        formats = []
+        thumbnails = []
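+        # 'assets' maps asset types to files: stills become thumbnails, previews are skipped, the rest are formats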
+        for atype, a in data['assets'].items():
+            if atype == 'still':
+                thumbnails.append({
+                    'url': a['url'],
+                    'resolution': '%dx%d' % (a['width'], a['height']),
+                })
+                continue
+            if atype == 'preview':
+                continue
+            formats.append({
+                'format_id': atype,
+                'url': a['url'],
+                'width': a['width'],
+                'height': a['height'],
+                'filesize': a['size'],
+                'ext': a['ext'],
+            })
+        formats.sort(key=lambda a: a['filesize'])
+
+        return {
+            'id': video_id,
+            'title': data['name'],
+            'formats': formats,
+            'thumbnails': thumbnails,
+        }
index 279f75e7a1f5b860e81d955c33bb58fcea092cbc..ef9997ee4456f4ec1aafdbcd915ae3b670ce1489 100644 (file)
@@ -26,7 +26,7 @@ class XHamsterIE(InfoExtractor):
     {
         u'url': u'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
         u'file': u'2221348.flv',
-        u'md5': u'970a94178ca4118c5aa3aaea21211b81',
+        u'md5': u'e767b9475de189320f691f49c679c4c7',
         u'info_dict': {
             u"upload_date": u"20130914",
             u"uploader_id": u"jojo747400",
@@ -46,7 +46,7 @@ class XHamsterIE(InfoExtractor):
                 return mobj.group('server')+'/key='+mobj.group('file')
 
         def is_hd(webpage):
-            return webpage.find('<div class=\'icon iconHD\'>') != -1
+            return webpage.find('<div class=\'icon iconHD\'') != -1
 
         mobj = re.match(self._VALID_URL, url)
 
index e3458d2bd4abaa196190f886afce2e9ac05df191..1a6a7688d435bd275777aeb4ba5425cf56d00267 100644 (file)
@@ -32,7 +32,7 @@ class XTubeIE(InfoExtractor):
 
         video_title = self._html_search_regex(r'<div class="p_5px[^>]*>([^<]+)', webpage, u'title')
         video_uploader = self._html_search_regex(r'so_s\.addVariable\("owner_u", "([^"]+)', webpage, u'uploader', fatal=False)
-        video_description = self._html_search_regex(r'<p class="video_description">([^<]+)', webpage, u'description', default=None)
+        video_description = self._html_search_regex(r'<p class="video_description">([^<]+)', webpage, u'description', fatal=False)
         video_url= self._html_search_regex(r'var videoMp4 = "([^"]+)', webpage, u'video_url').replace('\\/', '/')
         path = compat_urllib_parse_urlparse(video_url).path
         extension = os.path.splitext(path)[1][1:]
index e457c4707a8feda7c3d0709c18671282b6da3814..5c9c361b9ee5658d307a7759040b855a3e794cf1 100644 (file)
@@ -47,7 +47,7 @@ class YahooIE(InfoExtractor):
         # The 'meta' field is not always in the video webpage, we request it
         # from another page
         long_id = info['id']
-        return self._get_info(info['id'], video_id)
+        return self._get_info(long_id, video_id)
 
     def _get_info(self, long_id, video_id):
         query = ('SELECT * FROM yahoo.media.video.streams WHERE id="%s"'
index 1fcc518acde9dbb08fef1ccb42a9ee7ae550967a..e971b5b4b3a32b801edb12efb429c8228417c307 100644 (file)
@@ -7,7 +7,7 @@ from ..utils import (
 
 
 class YouJizzIE(InfoExtractor):
-    _VALID_URL = r'^(?:https?://)?(?:\w+\.)?youjizz\.com/videos/(?P<videoid>[^.]+).html$'
+    _VALID_URL = r'^(?:https?://)?(?:\w+\.)?youjizz\.com/videos/(?P<videoid>[^.]+)\.html$'
     _TEST = {
         u'url': u'http://www.youjizz.com/videos/zeichentrick-1-2189178.html',
         u'file': u'2189178.flv',
index 7fff761bd0b5a7835c5b4a11c3a1d15ac67567d8..a68576547e85f344d7ccaa78092fc0146b2e935e 100644 (file)
@@ -7,7 +7,6 @@ import itertools
 import json
 import os.path
 import re
-import socket
 import string
 import struct
 import traceback
@@ -17,9 +16,7 @@ from .common import InfoExtractor, SearchInfoExtractor
 from .subtitles import SubtitlesInfoExtractor
 from ..utils import (
     compat_chr,
-    compat_http_client,
     compat_parse_qs,
-    compat_urllib_error,
     compat_urllib_parse,
     compat_urllib_request,
     compat_urlparse,
@@ -45,19 +42,11 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
     # If True it will raise an error if no login info is provided
     _LOGIN_REQUIRED = False
 
-    def report_lang(self):
-        """Report attempt to set language."""
-        self.to_screen(u'Setting language')
-
     def _set_language(self):
-        request = compat_urllib_request.Request(self._LANG_URL)
-        try:
-            self.report_lang()
-            compat_urllib_request.urlopen(request).read()
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            self._downloader.report_warning(u'unable to set language: %s' % compat_str(err))
-            return False
-        return True
+        return bool(self._download_webpage(
+            self._LANG_URL, None,
+            note=u'Setting language', errnote='unable to set language',
+            fatal=False))
 
     def _login(self):
         (username, password) = self._get_login_info()
@@ -67,12 +56,12 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
                 raise ExtractorError(u'No login info available, needed for using %s.' % self.IE_NAME, expected=True)
             return False
 
-        request = compat_urllib_request.Request(self._LOGIN_URL)
-        try:
-            login_page = compat_urllib_request.urlopen(request).read().decode('utf-8')
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            self._downloader.report_warning(u'unable to fetch login page: %s' % compat_str(err))
-            return False
+        login_page = self._download_webpage(
+            self._LOGIN_URL, None,
+            note=u'Downloading login page',
+            errnote=u'unable to fetch login page', fatal=False)
+        if login_page is False:
+            return
 
         galx = self._search_regex(r'(?s)<input.+?name="GALX".+?value="(.+?)"',
                                   login_page, u'Login GALX parameter')
@@ -102,29 +91,28 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
         # chokes on unicode
         login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k,v in login_form_strs.items())
         login_data = compat_urllib_parse.urlencode(login_form).encode('ascii')
-        request = compat_urllib_request.Request(self._LOGIN_URL, login_data)
-        try:
-            self.report_login()
-            login_results = compat_urllib_request.urlopen(request).read().decode('utf-8')
-            if re.search(r'(?i)<form[^>]* id="gaia_loginform"', login_results) is not None:
-                self._downloader.report_warning(u'unable to log in: bad username or password')
-                return False
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            self._downloader.report_warning(u'unable to log in: %s' % compat_str(err))
+
+        req = compat_urllib_request.Request(self._LOGIN_URL, login_data)
+        login_results = self._download_webpage(
+            req, None,
+            note=u'Logging in', errnote=u'unable to log in', fatal=False)
+        if login_results is False:
+            return False
+        if re.search(r'(?i)<form[^>]* id="gaia_loginform"', login_results) is not None:
+            self._downloader.report_warning(u'unable to log in: bad username or password')
             return False
         return True
 
     def _confirm_age(self):
         age_form = {
-                'next_url':     '/',
-                'action_confirm':   'Confirm',
-                }
-        request = compat_urllib_request.Request(self._AGE_URL, compat_urllib_parse.urlencode(age_form))
-        try:
-            self.report_age_confirmation()
-            compat_urllib_request.urlopen(request).read().decode('utf-8')
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            raise ExtractorError(u'Unable to confirm age: %s' % compat_str(err))
+            'next_url': '/',
+            'action_confirm': 'Confirm',
+        }
+        req = compat_urllib_request.Request(self._AGE_URL, compat_urllib_parse.urlencode(age_form))
+
+        self._download_webpage(
+            req, None,
+            note=u'Confirming age', errnote=u'Unable to confirm age')
         return True
 
     def _real_initialize(self):
@@ -388,10 +376,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
         super(YoutubeIE, self).__init__(*args, **kwargs)
         self._player_cache = {}
 
-    def report_video_webpage_download(self, video_id):
-        """Report attempt to download video webpage."""
-        self.to_screen(u'%s: Downloading video webpage' % video_id)
-
     def report_video_info_webpage_download(self, video_id):
         """Report attempt to download video info webpage."""
         self.to_screen(u'%s: Downloading video info webpage' % video_id)
@@ -1258,15 +1242,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
         video_id = self._extract_id(url)
 
         # Get video webpage
-        self.report_video_webpage_download(video_id)
         url = 'https://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1' % video_id
-        request = compat_urllib_request.Request(url)
-        try:
-            video_webpage_bytes = compat_urllib_request.urlopen(request).read()
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            raise ExtractorError(u'Unable to download video webpage: %s' % compat_str(err))
-
-        video_webpage = video_webpage_bytes.decode('utf-8', 'ignore')
+        video_webpage = self._download_webpage(url, video_id)
 
         # Attempt to extract SWF player URL
         mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
@@ -1383,6 +1360,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
             else:
                 video_description = u''
 
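+        # Like/dislike counts are rendered in the watch page as e.g. <span class="likes-count">1,234</span>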
+        def _extract_count(klass):
+            count = self._search_regex(
+                r'class="%s">([\d,]+)</span>' % re.escape(klass),
+                video_webpage, klass, default=None)
+            if count is not None:
+                return int(count.replace(',', ''))
+            return None
+        like_count = _extract_count(u'likes-count')
+        dislike_count = _extract_count(u'dislikes-count')
+
         # subtitles
         video_subtitles = self.extract_subtitles(video_id, video_webpage)
 
@@ -1392,9 +1379,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
 
         if 'length_seconds' not in video_info:
             self._downloader.report_warning(u'unable to extract video duration')
-            video_duration = ''
+            video_duration = None
         else:
-            video_duration = compat_urllib_parse.unquote_plus(video_info['length_seconds'][0])
+            video_duration = int(compat_urllib_parse.unquote_plus(video_info['length_seconds'][0]))
 
         # annotations
         video_annotations = None
@@ -1515,6 +1502,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
                 'annotations':  video_annotations,
                 'webpage_url': 'https://www.youtube.com/watch?v=%s' % video_id,
                 'view_count': view_count,
+                'like_count': like_count,
+                'dislike_count': dislike_count,
             })
         return results
 
@@ -1529,10 +1518,10 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
                            \? (?:.*?&)*? (?:p|a|list)=
                         |  p/
                         )
-                        ((?:PL|EC|UU|FL)?[0-9A-Za-z-_]{10,})
+                        ((?:PL|EC|UU|FL|RD)?[0-9A-Za-z-_]{10,})
                         .*
                      |
-                        ((?:PL|EC|UU|FL)[0-9A-Za-z-_]{10,})
+                        ((?:PL|EC|UU|FL|RD)[0-9A-Za-z-_]{10,})
                      )"""
     _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s&page=%s'
     _MORE_PAGES_INDICATOR = r'data-link-type="next"'
@@ -1554,7 +1543,7 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
     def _extract_mix(self, playlist_id):
         # The mixes are generated from a a single video
         # the id of the playlist is just 'RD' + video_id
-        url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[2:], playlist_id)
+        url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id)
         webpage = self._download_webpage(url, playlist_id, u'Downloading Youtube mix')
         title_span = (get_element_by_attribute('class', 'title long-title', webpage) or
             get_element_by_attribute('class', 'title ', webpage))
@@ -1582,9 +1571,12 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
             else:
                 self.to_screen(u'Downloading playlist PL%s - add --no-playlist to just download video %s' % (playlist_id, video_id))
 
-        if len(playlist_id) == 13:  # 'RD' + 11 characters for the video id
+        if playlist_id.startswith('RD'):
             # Mixes require a custom extraction process
             return self._extract_mix(playlist_id)
+        if playlist_id.startswith('TL'):
+            raise ExtractorError(u'For downloading YouTube.com top lists, use '
+                u'the "yttoplist" keyword, for example "youtube-dl \'yttoplist:music:Top Tracks\'"', expected=True)
 
         # Extract the video ids from the playlist pages
         ids = []
@@ -1607,6 +1599,38 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
         return self.playlist_result(url_results, playlist_id, playlist_title)
 
 
+class YoutubeTopListIE(YoutubePlaylistIE):
+    IE_NAME = u'youtube:toplist'
+    IE_DESC = (u'YouTube.com top lists, "yttoplist:{channel}:{list title}"'
+        u' (Example: "yttoplist:music:Top Tracks")')
+    _VALID_URL = r'yttoplist:(?P<chann>.*?):(?P<title>.*?)$'
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        channel = mobj.group('chann')
+        title = mobj.group('title')
+        query = compat_urllib_parse.urlencode({'title': title})
+        playlist_re = 'href="([^"]+?%s[^"]+?)"' % re.escape(query)
+        channel_page = self._download_webpage('https://www.youtube.com/%s' % channel, title)
+        link = self._html_search_regex(playlist_re, channel_page, u'list')
+        url = compat_urlparse.urljoin('https://www.youtube.com/', link)
+
+        video_re = r'data-index="\d+".*?data-video-id="([0-9A-Za-z_-]{11})"'
+        ids = []
+        # sometimes the webpage doesn't contain the videos
+        # retry until we get them
+        for i in itertools.count(0):
+            msg = u'Downloading Youtube top list'
+            if i > 0:
+                msg += ', retry #%d' % i
+            webpage = self._download_webpage(url, title, msg)
+            ids = orderedSet(re.findall(video_re, webpage))
+            if ids:
+                break
+        url_results = self._ids_to_results(ids)
+        return self.playlist_result(url_results, playlist_title=title)
+
+
 class YoutubeChannelIE(InfoExtractor):
     IE_DESC = u'YouTube.com channels'
     _VALID_URL = r"^(?:https?://)?(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com)/channel/([0-9A-Za-z_-]+)"
@@ -1632,10 +1656,11 @@ class YoutubeChannelIE(InfoExtractor):
         video_ids = []
         url = 'https://www.youtube.com/channel/%s/videos' % channel_id
         channel_page = self._download_webpage(url, channel_id)
-        if re.search(r'channel-header-autogenerated-label', channel_page) is not None:
-            autogenerated = True
-        else:
-            autogenerated = False
+        autogenerated = re.search(r'''(?x)
+                class="[^"]*?(?:
+                    channel-header-autogenerated-label|
+                    yt-channel-title-autogenerated
+                )[^"]*"''', channel_page) is not None
 
         if autogenerated:
             # The videos are contained in a single page
@@ -1692,7 +1717,7 @@ class YoutubeUserIE(InfoExtractor):
         # page by page until there are no video ids - it means we got
         # all of them.
 
-        video_ids = []
+        url_results = []
 
         for pagenum in itertools.count(0):
             start_index = pagenum * self._GDATA_PAGE_SIZE + 1
@@ -1710,10 +1735,17 @@ class YoutubeUserIE(InfoExtractor):
                 break
 
             # Extract video identifiers
-            ids_in_page = []
-            for entry in response['feed']['entry']:
-                ids_in_page.append(entry['id']['$t'].split('/')[-1])
-            video_ids.extend(ids_in_page)
+            entries = response['feed']['entry']
+            for entry in entries:
+                title = entry['title']['$t']
+                video_id = entry['id']['$t'].split('/')[-1]
+                url_results.append({
+                    '_type': 'url',
+                    'url': video_id,
+                    'ie_key': 'Youtube',
+                    'id': video_id,
+                    'title': title,
+                })
 
             # A little optimization - if current page is not
             # "full", ie. does not contain PAGE_SIZE video ids then
@@ -1721,12 +1753,9 @@ class YoutubeUserIE(InfoExtractor):
             # are no more ids on further pages - no need to query
             # again.
 
-            if len(ids_in_page) < self._GDATA_PAGE_SIZE:
+            if len(entries) < self._GDATA_PAGE_SIZE:
                 break
 
-        url_results = [
-            self.url_result(video_id, 'Youtube', video_id=video_id)
-            for video_id in video_ids]
         return self.playlist_result(url_results, playlist_title=username)
 
 
@@ -1737,10 +1766,6 @@ class YoutubeSearchIE(SearchInfoExtractor):
     IE_NAME = u'youtube:search'
     _SEARCH_KEY = 'ytsearch'
 
-    def report_download_page(self, query, pagenum):
-        """Report attempt to download search page with given number."""
-        self._downloader.to_screen(u'[youtube] query "%s": Downloading page %s' % (query, pagenum))
-
     def _get_n_results(self, query, n):
         """Get a specified number of results for a query"""
 
@@ -1749,16 +1774,15 @@ class YoutubeSearchIE(SearchInfoExtractor):
         limit = n
 
         while (50 * pagenum) < limit:
-            self.report_download_page(query, pagenum+1)
             result_url = self._API_URL % (compat_urllib_parse.quote_plus(query), (50*pagenum)+1)
-            request = compat_urllib_request.Request(result_url)
-            try:
-                data = compat_urllib_request.urlopen(request).read().decode('utf-8')
-            except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-                raise ExtractorError(u'Unable to download API page: %s' % compat_str(err))
-            api_response = json.loads(data)['data']
-
-            if not 'items' in api_response:
+            data_json = self._download_webpage(
+                result_url, video_id=u'query "%s"' % query,
+                note=u'Downloading page %s' % (pagenum + 1),
+                errnote=u'Unable to download API page')
+            data = json.loads(data_json)
+            api_response = data['data']
+
+            if 'items' not in api_response:
                 raise ExtractorError(u'[youtube] No video results')
 
             new_ids = list(video['id'] for video in api_response['items'])
index 689f19735456e2a05defe8f2eb49c5b2f4848580..35ece354a6ecdf7ba5705184d4ceff22d57eb3d4 100644 (file)
@@ -73,14 +73,14 @@ class ZDFIE(InfoExtractor):
             try:
                 proto_pref = -PROTO_ORDER.index(format_m.group('proto'))
             except ValueError:
-                proto_pref = 999
+                proto_pref = -999
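+                # Preferences are negated list indices (0 is best), so unknown values get -999 to sort last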
 
             quality = fnode.find('./quality').text
             QUALITY_ORDER = ['veryhigh', '300', 'high', 'med', 'low']
             try:
                 quality_pref = -QUALITY_ORDER.index(quality)
             except ValueError:
-                quality_pref = 999
+                quality_pref = -999
 
             abr = int(fnode.find('./audioBitrate').text) // 1000
             vbr = int(fnode.find('./videoBitrate').text) // 1000
index c486ef8ecfef9772aaabdb3863a2814349a296b7..2e48f187e665dad81caa663efdb9d0c33f088936 100644 (file)
@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 
+import ctypes
 import datetime
 import email.utils
 import errno
@@ -15,9 +16,9 @@ import platform
 import re
 import ssl
 import socket
+import subprocess
 import sys
 import traceback
-import xml.etree.ElementTree
 import zlib
 
 try:
@@ -548,7 +549,7 @@ def make_HTTPS_handler(opts_no_check_certificate):
 
             def connect(self):
                 sock = socket.create_connection((self.host, self.port), self.timeout)
-                if self._tunnel_host:
+                if getattr(self, '_tunnel_host', False):
                     self.sock = sock
                     self._tunnel()
                 try:
@@ -562,11 +563,14 @@ def make_HTTPS_handler(opts_no_check_certificate):
         return HTTPSHandlerV3()
     else:
         context = ssl.SSLContext(ssl.PROTOCOL_SSLv3)
-        context.set_default_verify_paths()
-        
         context.verify_mode = (ssl.CERT_NONE
                                if opts_no_check_certificate
                                else ssl.CERT_REQUIRED)
+        context.set_default_verify_paths()
+        try:
+            context.load_default_certs()
+        except AttributeError:
+            pass  # Python < 3.4
         return compat_urllib_request.HTTPSHandler(context=context)
 
 class ExtractorError(Exception):
@@ -763,6 +767,10 @@ def unified_strdate(date_str):
             upload_date = datetime.datetime.strptime(date_str, expression).strftime('%Y%m%d')
         except:
             pass
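+    # Fall back to RFC 2822 dates (e.g. 'Tue, 24 Dec 2013 10:00:00 +0100')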
+    if upload_date is None:
+        timetuple = email.utils.parsedate_tz(date_str)
+        if timetuple:
+            upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
     return upload_date
 
 def determine_ext(url, default_ext=u'unknown_video'):
@@ -1021,3 +1029,72 @@ def format_bytes(bytes):
     suffix = [u'B', u'KiB', u'MiB', u'GiB', u'TiB', u'PiB', u'EiB', u'ZiB', u'YiB'][exponent]
     converted = float(bytes) / float(1024 ** exponent)
     return u'%.2f%s' % (converted, suffix)
+
+
+def str_to_int(int_str):
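+    """Convert a string like '1,000,000' (or '1.000.000') to an int."""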
+    int_str = re.sub(r'[,\.]', u'', int_str)
+    return int(int_str)
+
+
+def get_term_width():
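+    """Return the width of the terminal in columns, or None if it cannot be determined."""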
+    columns = os.environ.get('COLUMNS', None)
+    if columns:
+        return int(columns)
+
+    try:
+        sp = subprocess.Popen(
+            ['stty', 'size'],
+            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+        out, err = sp.communicate()
+        return int(out.split()[1])
+    except:
+        pass
+    return None
+
+
+def month_by_name(name):
+    """ Return the number of a month by (locale-independently) English name """
+
+    ENGLISH_NAMES = [
+        u'January', u'February', u'March', u'April', u'May', u'June',
+        u'July', u'August', u'September', u'October', u'November', u'December']
+    try:
+        return ENGLISH_NAMES.index(name) + 1
+    except ValueError:
+        return None
+
+
+def fix_xml_all_ampersand(xml_str):
+    """Replace all the '&' by '&amp;' in XML"""
+    return xml_str.replace(u'&', u'&amp;')
+
+
+def setproctitle(title):
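+    """Set the process title via prctl(PR_SET_NAME); a no-op where libc is unavailable."""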
+    assert isinstance(title, type(u''))
+    try:
+        libc = ctypes.cdll.LoadLibrary("libc.so.6")
+    except OSError:
+        return
+    buf = ctypes.create_string_buffer(len(title) + 1)
+    buf.value = title.encode('utf-8')
+    try:
+        libc.prctl(15, ctypes.byref(buf), 0, 0, 0)
+    except AttributeError:
+        return  # Strange libc, just skip this
+
+
+def remove_start(s, start):
+    if s.startswith(start):
+        return s[len(start):]
+    return s
+
+
+def url_basename(url):
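+    """Return the last path component of a URL, ignoring trailing slashes."""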
+    path = compat_urlparse.urlparse(url).path
+    return path.strip(u'/').split(u'/')[-1]
+
+
+class HEADRequest(compat_urllib_request.Request):
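+    """Request subclass that issues HTTP HEAD requests."""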
+    def get_method(self):
+        return "HEAD"
index 68b30bfd4a4ec455f3dad230e1fc30c353d807ca..24855bceb5094ed3e31175dfd901073f2624bade 100644 (file)
@@ -1,2 +1,2 @@
 
-__version__ = '2013.12.04'
+__version__ = '2013.12.23'