New upstream version 2017.02.07

author Rogério Brito <rbrito@ime.usp.br>

Thu, 9 Feb 2017 07:09:12 +0000 (05:09 -0200)

committer Rogério Brito <rbrito@ime.usp.br>

Thu, 9 Feb 2017 07:12:11 +0000 (05:12 -0200)
diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md

deleted file mode 100644

index 36559dd..0000000
--- a/.github/ISSUE_TEMPLATE.md
+++ /dev/null
@@ -1,58 +0,0 @@
-## Please follow the guide below
-
-- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
-- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
-- Use *Preview* tab to see how your issue will actually look like
-
----
-
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.12.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.12.01**
-
-### Before submitting an *issue* make sure you have:
-- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
-- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
-
-### What is the purpose of your *issue*?
-- [ ] Bug report (encountered problems with youtube-dl)
-- [ ] Site support request (request for adding support for a new site)
-- [ ] Feature request (request for a new functionality)
-- [ ] Question
-- [ ] Other
-
----
-
-### The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your *issue*
-
----
-
-### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
-
-Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
-```
-$ youtube-dl -v <your command line>
-[debug] System config: []
-[debug] User config: []
-[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
-[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.12.01
-[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
-[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
-[debug] Proxy map: {}
-...
-<end of log>
-```
-
----
-
-### If the purpose of this *issue* is a *site support request* please provide all kinds of example URLs support for which should be included (replace following example URLs by **yours**):
-- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
-- Single video: https://youtu.be/BaW_jenozKc
-- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
-
----
-
-### Description of your *issue*, suggested solution and other information
-
-Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* requires account credentials please provide them or explain how one can obtain them.
diff --git a/.github/ISSUE_TEMPLATE_tmpl.md b/.github/ISSUE_TEMPLATE_tmpl.md

deleted file mode 100644

index ab99681..0000000
--- a/.github/ISSUE_TEMPLATE_tmpl.md
+++ /dev/null
@@ -1,58 +0,0 @@
-## Please follow the guide below
-
-- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
-- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
-- Use *Preview* tab to see how your issue will actually look like
-
----
-
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**
-
-### Before submitting an *issue* make sure you have:
-- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
-- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
-
-### What is the purpose of your *issue*?
-- [ ] Bug report (encountered problems with youtube-dl)
-- [ ] Site support request (request for adding support for a new site)
-- [ ] Feature request (request for a new functionality)
-- [ ] Question
-- [ ] Other
-
----
-
-### The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your *issue*
-
----
-
-### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
-
-Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
-```
-$ youtube-dl -v <your command line>
-[debug] System config: []
-[debug] User config: []
-[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
-[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version %(version)s
-[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
-[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
-[debug] Proxy map: {}
-...
-<end of log>
-```
-
----
-
-### If the purpose of this *issue* is a *site support request* please provide all kinds of example URLs support for which should be included (replace following example URLs by **yours**):
-- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
-- Single video: https://youtu.be/BaW_jenozKc
-- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
-
----
-
-### Description of your *issue*, suggested solution and other information
-
-Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* requires account credentials please provide them or explain how one can obtain them.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md

deleted file mode 100644

index 46fa26f..0000000
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ /dev/null
@@ -1,27 +0,0 @@
-## Please follow the guide below
-
-- You will be asked some questions, please read them **carefully** and answer honestly
-- Put an `x` into all the boxes [ ] relevant to your *pull request* (like that [x])
-- Use *Preview* tab to see how your *pull request* will actually look like
-
----
-
-### Before submitting a *pull request* make sure you have:
-- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
-- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
-
-### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
-- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
-- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
-
-### What is the purpose of your *pull request*?
-- [ ] Bug fix
-- [ ] Improvement
-- [ ] New extractor
-- [ ] New feature
-
----
-
-### Description of your *pull request* and other information
-
-Explanation of your *pull request* in arbitrary form goes here. Please make sure the description explains the purpose and effect of your *pull request* and is worded well enough to be understood. Provide as much context and examples as possible.
diff --git a/.gitignore b/.gitignore

deleted file mode 100644

index 9ce4b5e..0000000
--- a/.gitignore
+++ /dev/null
@@ -1,48 +0,0 @@
-*.pyc
-*.pyo
-*.class
-*~
-*.DS_Store
-wine-py2exe/
-py2exe.log
-*.kate-swp
-build/
-dist/
-MANIFEST
-README.txt
-youtube-dl.1
-youtube-dl.bash-completion
-youtube-dl.fish
-youtube_dl/extractor/lazy_extractors.py
-youtube-dl
-youtube-dl.exe
-youtube-dl.tar.gz
-.coverage
-cover/
-updates_key.pem
-*.egg-info
-*.srt
-*.sbv
-*.vtt
-*.flv
-*.mp4
-*.m4a
-*.m4v
-*.mp3
-*.3gp
-*.wav
-*.ape
-*.mkv
-*.swf
-*.part
-*.swp
-test/testdata
-test/local_parameters.json
-.tox
-youtube-dl.zsh
-
-# IntelliJ related files
-.idea
-*.iml
-
-tmp/
diff --git a/.travis.yml b/.travis.yml

deleted file mode 100644

index c74c9cc..0000000
--- a/.travis.yml
+++ /dev/null
@@ -1,18 +0,0 @@
-language: python
-python:
-  - "2.6"
-  - "2.7"
-  - "3.2"
-  - "3.3"
-  - "3.4"
-  - "3.5"
-sudo: false
-script: nosetests test --verbose
-notifications:
-  email:
-    - filippo.valsorda@gmail.com
-    - yasoob.khld@gmail.com
-#  irc:
-#    channels:
-#      - "irc.freenode.org#youtube-dl"
-#    skip_join: true
diff --git a/AUTHORS b/AUTHORS

deleted file mode 100644

index 4a6f7e1..0000000
--- a/AUTHORS
+++ /dev/null
@@ -1,192 +0,0 @@
-Ricardo Garcia Gonzalez
-Danny Colligan
-Benjamin Johnson
-Vasyl' Vavrychuk
-Witold Baryluk
-Paweł Paprota
-Gergely Imreh
-Rogério Brito
-Philipp Hagemeister
-Sören Schulze
-Kevin Ngo
-Ori Avtalion
-shizeeg
-Filippo Valsorda
-Christian Albrecht
-Dave Vasilevsky
-Jaime Marquínez Ferrándiz
-Jeff Crouse
-Osama Khalid
-Michael Walter
-M. Yasoob Ullah Khalid
-Julien Fraichard
-Johny Mo Swag
-Axel Noack
-Albert Kim
-Pierre Rudloff
-Huarong Huo
-Ismael Mejía
-Steffan Donal
-Andras Elso
-Jelle van der Waa
-Marcin Cieślak
-Anton Larionov
-Takuya Tsuchida
-Sergey M.
-Michael Orlitzky
-Chris Gahan
-Saimadhav Heblikar
-Mike Col
-Oleg Prutz
-pulpe
-Andreas Schmitz
-Michael Kaiser
-Niklas Laxström
-David Triendl
-Anthony Weems
-David Wagner
-Juan C. Olivares
-Mattias Harrysson
-phaer
-Sainyam Kapoor
-Nicolas Évrard
-Jason Normore
-Hoje Lee
-Adam Thalhammer
-Georg Jähnig
-Ralf Haring
-Koki Takahashi
-Ariset Llerena
-Adam Malcontenti-Wilson
-Tobias Bell
-Naglis Jonaitis
-Charles Chen
-Hassaan Ali
-Dobrosław Żybort
-David Fabijan
-Sebastian Haas
-Alexander Kirk
-Erik Johnson
-Keith Beckman
-Ole Ernst
-Aaron McDaniel (mcd1992)
-Magnus Kolstad
-Hari Padmanaban
-Carlos Ramos
-5moufl
-lenaten
-Dennis Scheiba
-Damon Timm
-winwon
-Xavier Beynon
-Gabriel Schubiner
-xantares
-Jan Matějka
-Mauroy Sébastien
-William Sewell
-Dao Hoang Son
-Oskar Jauch
-Matthew Rayfield
-t0mm0
-Tithen-Firion
-Zack Fernandes
-cryptonaut
-Adrian Kretz
-Mathias Rav
-Petr Kutalek
-Will Glynn
-Max Reimann
-Cédric Luthi
-Thijs Vermeir
-Joel Leclerc
-Christopher Krooss
-Ondřej Caletka
-Dinesh S
-Johan K. Jensen
-Yen Chi Hsuan
-Enam Mijbah Noor
-David Luhmer
-Shaya Goldberg
-Paul Hartmann
-Frans de Jonge
-Robin de Rooij
-Ryan Schmidt
-Leslie P. Polzer
-Duncan Keall
-Alexander Mamay
-Devin J. Pohly
-Eduardo Ferro Aldama
-Jeff Buchbinder
-Amish Bhadeshia
-Joram Schrijver
-Will W.
-Mohammad Teimori Pabandi
-Roman Le Négrate
-Matthias Küch
-Julian Richen
-Ping O.
-Mister Hat
-Peter Ding
-jackyzy823
-George Brighton
-Remita Amine
-Aurélio A. Heckert
-Bernhard Minks
-sceext
-Zach Bruggeman
-Tjark Saul
-slangangular
-Behrouz Abbasi
-ngld
-nyuszika7h
-Shaun Walbridge
-Lee Jenkins
-Anssi Hannula
-Lukáš Lalinský
-Qijiang Fan
-Rémy Léone
-Marco Ferragina
-reiv
-Muratcan Simsek
-Evan Lu
-flatgreen
-Brian Foley
-Vignesh Venkat
-Tom Gijselinck
-Founder Fang
-Andrew Alexeyew
-Saso Bezlaj
-Erwin de Haan
-Jens Wille
-Robin Houtevelts
-Patrick Griffis
-Aidan Rowe
-mutantmonkey
-Ben Congdon
-Kacper Michajłow
-José Joaquín Atria
-Viťas Strádal
-Kagami Hiiragi
-Philip Huppert
-blahgeek
-Kevin Deldycke
-inondle
-Tomáš Čech
-Déstin Reed
-Roman Tsiupa
-Artur Krysiak
-Jakub Adam Wieczorek
-Aleksandar Topuzović
-Nehal Patel
-Rob van Bekkum
-Petr Zvoníček
-Pratyush Singh
-Aleksander Nitecki
-Sebastian Blunt
-Matěj Cepl
-Xie Yanbo
-Philip Xu
-John Hawkinson
-Rich Leeper
-Zhong Jianxin
-Thor77
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md

deleted file mode 100644

index 495955b..0000000
--- a/CONTRIBUTING.md
+++ /dev/null
@@ -1,298 +0,0 @@
-**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
-```
-$ youtube-dl -v <your command line>
-[debug] System config: []
-[debug] User config: []
-[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
-[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2015.12.06
-[debug] Git HEAD: 135392e
-[debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
-[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
-[debug] Proxy map: {}
-...
-```
-**Do not post screenshots of verbose logs; only plain text is acceptable.**
-
-The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
-
-Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist):
-
-### Is the description of the issue itself sufficient?
-
-We often get issue reports that we cannot really decipher. While in most cases we eventually get the required information after asking back multiple times, this poses an unnecessary drain on our resources. Many contributors, including myself, are also not native speakers, so we may misread some parts.
-
-So please elaborate on what feature you are requesting, or what bug you want to be fixed. Make sure that it's obvious
-
-- What the problem is
-- How it could be fixed
-- How your proposed solution would look like
-
-If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
-
-For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the `-v` flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
-
-If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
-
-**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
-
-###  Are you using the latest version?
-
-Before reporting any issue, type `youtube-dl -U`. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
-
-###  Is the issue already documented?
-
-Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/rg3/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
-
-###  Why are existing options not enough?
-
-Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
-
-###  Is there enough context in your bug report?
-
-People want to solve problems, and often think they do us a favor by breaking down their larger problems (e.g. wanting to skip already downloaded files) to a specific request (e.g. requesting us to look whether the file exists before downloading the info page). However, what often happens is that they break down the problem into two steps: One simple, and one impossible (or extremely complicated one).
-
-We are then presented with a very complicated request when the original problem could be solved far easier, e.g. by recording the downloaded video IDs in a separate file. To avoid this, you must include the greater context where it is non-obvious. In particular, every feature request that does not consist of adding support for a new site should contain a use case scenario that explains in what situation the missing feature would be useful.
-
-###  Does the issue involve one problem, and one problem only?
-
-Some of our users seem to think there is a limit of issues they can or should open. There is no limit of issues they can or should open. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones.
-
-In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, Whitehouse podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
-
-###  Is anyone going to need the feature?
-
-Only post features that you (or an incapacitated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them.
-
-###  Is your question about youtube-dl?
-
-It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
-
-# DEVELOPER INSTRUCTIONS
-
-Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
-
-To run youtube-dl as a developer, you don't need to build anything either. Simply execute
-
-    python -m youtube_dl
-
-To run the test, simply invoke your favorite test runner, or execute a test file directly; any of the following work:
-
-    python -m unittest discover
-    python test/test_download.py
-    nosetests
-
-If you want to create a build of youtube-dl yourself, you'll need
-
-* python
-* make (only GNU make is supported)
-* pandoc
-* zip
-* nosetests
-
-### Adding support for a new site
-
-If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](README.md#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites thus pull requests adding support for them **will be rejected**.
-
-After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
-
-1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
-2. Check out the source code with:
-
-        git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
-
-3. Start a new git branch with
-
-        cd youtube-dl
-        git checkout -b yourextractor
-
-4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
-
-    ```python
-    # coding: utf-8
-    from __future__ import unicode_literals
-
-    from .common import InfoExtractor
-
-
-    class YourExtractorIE(InfoExtractor):
-        _VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
-        _TEST = {
-            'url': 'http://yourextractor.com/watch/42',
-            'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
-            'info_dict': {
-                'id': '42',
-                'ext': 'mp4',
-                'title': 'Video title goes here',
-                'thumbnail': 're:^https?://.*\.jpg$',
-                # TODO more properties, either as:
-                # * A value
-                # * MD5 checksum; start the string with md5:
-                # * A regular expression; start the string with re:
-                # * Any Python type (for example int or float)
-            }
-        }
-
-        def _real_extract(self, url):
-            video_id = self._match_id(url)
-            webpage = self._download_webpage(url, video_id)
-
-            # TODO more code goes here, for example ...
-            title = self._html_search_regex(r'<h1>(.+?)</h1>', webpage, 'title')
-
-            return {
-                'id': video_id,
-                'title': title,
-                'description': self._og_search_description(webpage),
-                'uploader': self._search_regex(r'<div[^>]+id="uploader"[^>]*>([^<]+)<', webpage, 'uploader', fatal=False),
-                # TODO more properties (see youtube_dl/extractor/common.py)
-            }
-    ```
-5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
-6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
-9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
-
-        $ git add youtube_dl/extractor/extractors.py
-        $ git add youtube_dl/extractor/yourextractor.py
-        $ git commit -m '[yourextractor] Add new extractor'
-        $ git push origin yourextractor
-
-10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
-
-In any case, thank you very much for your contributions!
-
-## youtube-dl coding conventions
-
-This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
-
-Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non rolling release distros may never receive an update at all.
-
-### Mandatory and optional metafields
-
-For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
-
- - `id` (media identifier)
- - `title` (media title)
- - `url` (media download URL) or `formats`
-
-In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
-
-[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
-
-#### Example
-
-Say you have some source dictionary `meta` that you've fetched as JSON with HTTP request and it has a key `summary`:
-
-```python
-meta = self._download_json(url, video_id)
-```
-    
-Assume at this point `meta`'s layout is:
-
-```python
-{
-    ...
-    "summary": "some fancy summary text",
-    ...
-}
-```
-
-Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
-
-```python
-description = meta.get('summary')  # correct
-```
-
-and not like:
-
-```python
-description = meta['summary']  # incorrect
-```
-
-The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some later time but with the former approach extraction will just go ahead with `description` set to `None` which is perfectly fine (remember `None` is equivalent to the absence of data).
-
-Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
-
-```python
-description = self._search_regex(
-    r'<span[^>]+id="title"[^>]*>([^<]+)<',
-    webpage, 'description', fatal=False)
-```
-
-With `fatal` set to `False` if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
-
-You can also pass `default=<some fallback value>`, for example:
-
-```python
-description = self._search_regex(
-    r'<span[^>]+id="title"[^>]*>([^<]+)<',
-    webpage, 'description', default=None)
-```
-
-On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
- 
-### Provide fallbacks
-
-When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
-
-#### Example
-
-Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
-
-```python
-title = meta['title']
-```
-
-If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
-
-Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
-
-```python
-title = meta.get('title') or self._og_search_title(webpage)
-```
-
-This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
-
-### Make regular expressions flexible
-
-When using regular expressions try to write them fuzzy and flexible.
- 
-#### Example
-
-Say you need to extract `title` from the following HTML code:
-
-```html
-<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
-```
-
-The code for that task should look similar to:
-
-```python
-title = self._search_regex(
-    r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
-```
-
-Or even better:
-
-```python
-title = self._search_regex(
-    r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
-    webpage, 'title', group='title')
-```
-
-Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute: 
-
-The code definitely should not look like:
-
-```python
-title = self._search_regex(
-    r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
-    webpage, 'title', group='title')
-```
-
-### Use safe conversion functions
-
-Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
-
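
Note on the removed "Use safe conversion functions" guideline above: the sketch below illustrates the idea with simplified stand-ins, not with the code in youtube-dl's `utils` module. The real `int_or_none` and `float_or_none` accept additional parameters that are not reproduced here, and the `meta` dict and its keys are hypothetical.

```python
def int_or_none(value, default=None):
    """Simplified stand-in: int(value), or `default` if missing or malformed."""
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def float_or_none(value, default=None):
    """Simplified stand-in: float(value), or `default` if missing or malformed."""
    if value is None or value == '':
        return default
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# Hypothetical metadata as it might arrive from a site's JSON: numbers are
# often delivered as strings, and optional fields may be null or absent.
meta = {'view_count': '1024', 'duration': '12.5', 'like_count': None}

view_count = int_or_none(meta.get('view_count'))        # string converted to int
duration = float_or_none(meta.get('duration'))          # string converted to float
like_count = int_or_none(meta.get('like_count'))        # null stays None
comment_count = int_or_none(meta.get('comment_count'))  # absent key stays None
```

With wrappers of this kind, a missing or malformed optional number yields `None` rather than raising, so extraction of the mandatory fields can continue.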
diff --git a/ChangeLog b/ChangeLog

index a91de7b63d91b99d606be44cb27e4e72f4e251b2..7e2afaacf411c744af2ba54530fb28ab91d8b338 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,382 @@
+version 2017.02.07
+
+Core
+* [extractor/common] Fix audio only with audio group in m3u8 (#11995)
++ [downloader/fragment] Respect --no-part
+* [extractor/common] Speed-up HTML5 media entries extraction (#11979)
+
+Extractors
+* [pornhub] Fix extraction (#11997)
++ [canalplus] Add support for cstar.fr (#11990)
++ [extractor/generic] Improve RTMP support (#11993)
++ [gaskrank] Add support for gaskrank.tv (#11685)
+* [bandcamp] Fix extraction for incomplete albums (#11727)
+* [iwara] Fix extraction (#11781)
+* [googledrive] Fix extraction on Python 3.6
++ [videopress] Add support for videopress.com
++ [afreecatv] Extract RTMP formats
+
+
+version 2017.02.04.1
+
+Extractors
++ [twitch:stream] Add support for player.twitch.tv (#11971)
+* [radiocanada] Fix extraction for toutv rtmp formats
+
+
+version 2017.02.04
+
+Core
++ Add --playlist-random to shuffle playlists (#11889, #11901)
+* [utils] Improve comments processing in js_to_json (#11947)
+* [utils] Handle single-line comments in js_to_json
+* [downloader/external:ffmpeg] Minimize the use of aac_adtstoasc filter
+
+Extractors
++ [piksel] Add another app token pattern (#11969)
++ [vk] Capture and output author blocked error message (#11965)
++ [turner] Fix secure HLS formats downloading with ffmpeg (#11358, #11373,
+  #11800)
++ [drtv] Add support for live and radio sections (#1827, #3427)
+* [myspace] Fix extraction and extract HLS and HTTP formats
++ [youtube] Add format info for itag 325 and 328
+* [vine] Fix extraction (#11955)
+- [sportbox] Remove extractor (#11954)
++ [filmon] Add support for filmon.com (#11187)
++ [infoq] Add audio only formats (#11565)
+* [douyutv] Improve room id regular expression (#11931)
+* [iprima] Fix extraction (#11920, #11896)
+* [youtube] Fix ytsearch when cookies are provided (#11924)
+* [go] Relax video id regular expression (#11937)
+* [facebook] Fix title extraction (#11941)
++ [youtube:playlist] Recognize TL playlists (#11945)
++ [bilibili] Support new Bangumi URLs (#11845)
++ [cbc:watch] Extract audio codec for audio only formats (#11893)
++ [elpais] Fix extraction for some URLs (#11765)
+
+
+version 2017.02.01
+
+Extractors
++ [facebook] Add another fallback extraction scenario (#11926)
+* [prosiebensat1] Fix extraction of descriptions (#11810, #11929)
+- [crunchyroll] Remove ScaledBorderAndShadow settings (#9028)
++ [vimeo] Extract upload timestamp
++ [vimeo] Extract license (#8726, #11880)
++ [nrk:series] Add support for series (#11571, #11711)
+
+
+version 2017.01.31
+
+Core
++ [compat] Add compat_etree_register_namespace
+
+Extractors
+* [youtube] Fix extraction for domainless player URLs (#11890, #11891, #11892,
+  #11894, #11895, #11897, #11900, #11903, #11904, #11906, #11907, #11909,
+  #11913, #11914, #11915, #11916, #11917, #11918, #11919)
++ [vimeo] Extract both mixed and separated DASH formats
++ [ruutu] Extract DASH formats
+* [itv] Fix extraction for python 2.6
+
+
+version 2017.01.29
+
+Core
+* [extractor/common] Fix initialization template (#11605, #11825)
++ [extractor/common] Document fragment_base_url and fragment's path fields
+* [extractor/common] Fix duration per DASH segment (#11868)
++ Introduce --autonumber-start option for initial value of %(autonumber)s
+  template (#727, #2702, #9362, #10457, #10529, #11862)
+
+Extractors
++ [azmedien:playlist] Add support for topic and themen playlists (#11817)
+* [npo] Fix subtitles extraction
++ [itv] Extract subtitles
++ [itv] Add support for itv.com (#9240)
++ [mtv81] Add support for mtv81.com (#7619)
++ [vlive] Add support for channels (#11826)
++ [kaltura] Add fallback for fileExt
++ [kaltura] Improve uploader_id extraction
++ [konserthusetplay] Add support for rspoplay.se (#11828)
+
+
+version 2017.01.28
+
+Core
+* [utils] Improve parse_duration
+
+Extractors
+* [crunchyroll] Improve series and season metadata extraction (#11832)
+* [soundcloud] Improve formats extraction and extract audio bitrate
++ [soundcloud] Extract HLS formats
+* [soundcloud] Fix track URL extraction (#11852)
++ [twitch:vod] Expand URL regular expressions (#11846)
+* [aenetworks] Fix season episodes extraction (#11669)
++ [tva] Add support for videos.tva.ca (#11842)
+* [jamendo] Improve and extract more metadata (#11836)
++ [disney] Add support for Disney sites (#7409, #11801, #4975, #11000)
+* [vevo] Remove request to old API and catch API v2 errors
++ [cmt,mtv,southpark] Add support for episode URLs (#11837)
++ [youtube] Add fallback for duration extraction (#11841)
+
+
+version 2017.01.25
+
+Extractors
++ [openload] Fallback video extension to mp4
++ [extractor/generic] Add support for Openload embeds (#11536, #11812)
+* [srgssr] Fix rts video extraction (#11831)
++ [afreecatv:global] Add support for afreeca.tv (#11807)
++ [crackle] Extract vtt subtitles
++ [crackle] Extract multiple resolutions for thumbnails
++ [crackle] Add support for mobile URLs
++ [konserthusetplay] Extract subtitles (#11823)
++ [konserthusetplay] Add support for HLS videos (#11823)
+* [vimeo:review] Fix config URL extraction (#11821)
+
+
+version 2017.01.24
+
+Extractors
+* [pluralsight] Fix extraction (#11820)
++ [nextmedia] Add support for NextTV (壹電視)
+* [24video] Fix extraction (#11811)
+* [youtube:playlist] Fix nonexistent and private playlist detection (#11604)
++ [chirbit] Extract uploader (#11809)
+
+
+version 2017.01.22
+
+Extractors
++ [pornflip] Add support for pornflip.com (#11556, #11795)
+* [chaturbate] Fix extraction (#11797, #11802)
++ [azmedien] Add support for AZ Medien sites (#11784, #11785)
++ [nextmedia] Support redirected URLs
++ [vimeo:channel] Extract videos' titles for playlist entries (#11796)
++ [youtube] Extract episode metadata (#9695, #11774)
++ [cspan] Support Ustream embedded videos (#11547)
++ [1tv] Add support for HLS videos (#11786)
+* [uol] Fix extraction (#11770)
+* [mtv] Relax triforce feed regular expression (#11766)
+
+
+version 2017.01.18
+
+Extractors
+* [bilibili] Fix extraction (#11077)
++ [canalplus] Add fallback for video id (#11764)
+* [20min] Fix extraction (#11683, #11751)
+* [imdb] Extend URL regular expression (#11744)
++ [naver] Add support for tv.naver.com links (#11743)
+
+
+version 2017.01.16
+
+Core
+* [options] Apply custom config to final composite configuration (#11741)
+* [YoutubeDL] Improve protocol auto determining (#11720)
+
+Extractors
+* [xiami] Relax URL regular expressions
+* [xiami] Improve track metadata extraction (#11699)
++ [limelight] Check hand-make direct HTTP links
++ [limelight] Add support for direct HTTP links at video.llnw.net (#11737)
++ [brightcove] Recognize another player ID pattern (#11688)
++ [niconico] Support login via cookies (#7968)
+* [yourupload] Fix extraction (#11601)
++ [beam:live] Add support for beam.pro live streams (#10702, #11596)
+* [vevo] Improve geo restriction detection
++ [dramafever] Add support for URLs with language code (#11714)
+* [cbc] Improve playlist support (#11704)
+
+
+version 2017.01.14
+
+Core
++ [common] Add ability to customize akamai manifest host
++ [utils] Add more date formats
+
+Extractors
+- [mtv] Eliminate _transform_rtmp_url
+* [mtv] Generalize triforce mgid extraction
++ [cmt] Add support for full episodes and video clips (#11623)
++ [mitele] Extract DASH formats
++ [ooyala] Add support for videos with embedToken (#11684)
+* [mixcloud] Fix extraction (#11674)
+* [openload] Fix extraction (#10408)
+* [tv4] Improve extraction (#11698)
+* [freesound] Fix and improve extraction (#11602)
++ [nick] Add support for beta.nick.com (#11655)
+* [mtv,cc] Use HLS by default with native HLS downloader (#11641)
+* [mtv] Fix non-HLS extraction
+
+
+version 2017.01.10
+
+Extractors
+* [youtube] Fix extraction (#11663, #11664)
++ [inc] Add support for inc.com (#11277, #11647)
++ [youtube] Add itag 212 (#11575)
++ [egghead:course] Add support for egghead.io courses
+
+
+version 2017.01.08
+
+Core
+* Fix "invalid escape sequence" errors under Python 3.6 (#11581)
+
+Extractors
++ [hitrecord] Add support for hitrecord.org (#10867, #11626)
+- [videott] Remove extractor
+* [swrmediathek] Improve extraction
+- [sharesix] Remove extractor
+- [aol:features] Remove extractor
+* [sendtonews] Improve info extraction
+* [3sat,phoenix] Fix extraction (#11619)
+* [comedycentral/mtv] Add support for HLS videos (#11600)
+* [discoverygo] Fix JSON data parsing (#11219, #11522)
+
+
+version 2017.01.05
+
+Extractors
++ [zdf] Fix extraction (#11055, #11063)
+* [pornhub:playlist] Improve extraction (#11594)
++ [cctv] Add support for ncpa-classic.com (#11591)
++ [tunein] Add support for embeds (#11579)
+
+
+version 2017.01.02
+
+Extractors
+* [cctv] Improve extraction (#879, #6753, #8541)
++ [nrktv:episodes] Add support for episodes (#11571)
++ [arkena] Add support for video.arkena.com (#11568)
+
+
+version 2016.12.31
+
+Core
++ Introduce --config-location option for custom configuration files (#6745,
+  #10648)
+
+Extractors
++ [twitch] Add support for player.twitch.tv (#11535, #11537)
++ [videa] Add support for videa.hu (#8181, #11133)
+* [vk] Fix postlive videos extraction
+* [vk] Extract from playerParams (#11555)
+- [freevideo] Remove extractor (#11515)
++ [showroomlive] Add support for showroom-live.com (#11458)
+* [xhamster] Fix duration extraction (#11549)
+* [rtve:live] Fix extraction (#11529)
+* [brightcove:legacy] Improve embeds detection (#11523)
++ [twitch] Add support for rechat messages (#11524)
+* [acast] Fix audio and timestamp extraction (#11521)
+
+
+version 2016.12.22
+
+Core
+* [extractor/common] Improve detection of video-only formats in m3u8
+  manifests (#11507)
+
+Extractors
++ [theplatform] Pass geo verification headers to SMIL request (#10146)
++ [viu] Pass geo verification headers to auth request
+* [rtl2] Extract more formats and metadata
+* [vbox7] Skip malformed JSON-LD (#11501)
+* [uplynk] Force downloading using native HLS downloader (#11496)
++ [laola1] Add support for another extraction scenario (#11460)
+
+
+version 2016.12.20
+
+Core
+* [extractor/common] Improve fragment URL construction for DASH media
+* [extractor/common] Fix codec information extraction for mixed audio/video
+  DASH media (#11490)
+
+Extractors
+* [vbox7] Fix extraction (#11494)
++ [uktvplay] Add support for uktvplay.uktv.co.uk (#11027)
++ [piksel] Add support for player.piksel.com (#11246)
++ [vimeo] Add support for DASH formats
+* [vimeo] Fix extraction for HLS formats (#11490)
+* [kaltura] Fix wrong widget ID in some cases (#11480)
++ [nrktv:direkte] Add support for live streams (#11488)
+* [pbs] Fix extraction for geo restricted videos (#7095)
+* [brightcove:new] Skip widevine classic videos
++ [viu] Add support for viu.com (#10607, #11329)
+
+
+version 2016.12.18
+
+Core
++ [extractor/common] Recognize DASH formats in html5 media entries
+
+Extractors
++ [ccma] Add support for ccma.cat (#11359)
+* [laola1tv] Improve extraction
++ [laola1tv] Add support embed URLs (#11460)
+* [nbc] Fix extraction for MSNBC videos (#11466)
+* [twitch] Adapt to new videos pages URL schema (#11469)
++ [meipai] Add support for meipai.com (#10718)
+* [jwplatform] Improve subtitles and duration extraction
++ [ondemandkorea] Add support for ondemandkorea.com (#10772)
++ [vvvvid] Add support for vvvvid.it (#5915)
+
+
+version 2016.12.15
+
+Core
++ [utils] Add convenience urljoin
+
+Extractors
++ [openload] Recognize oload.tv URLs (#10408)
++ [facebook] Recognize .onion URLs (#11443)
+* [vlive] Fix extraction (#11375, #11383)
++ [canvas] Extract DASH formats
++ [melonvod] Add support for vod.melon.com (#11419)
+
+
+version 2016.12.12
+
+Core
++ [utils] Add common user agents map
++ [common] Recognize HLS manifests that contain video only formats (#11394)
+
+Extractors
++ [dplay] Use Safari user agent for HLS (#11418)
++ [facebook] Detect login required error message
+* [facebook] Improve video selection (#11390)
++ [canalplus] Add another video id pattern (#11399)
+* [mixcloud] Relax URL regular expression (#11406)
+* [ctvnews] Relax URL regular expression (#11394)
++ [rte] Capture and output error message (#7746, #10498)
++ [prosiebensat1] Add support for DASH formats
+* [srgssr] Improve extraction for geo restricted videos (#11089)
+* [rts] Improve extraction for geo restricted videos (#4989)
+
+
+version 2016.12.09
+
+Core
+* [socks] Fix error reporting (#11355)
+
+Extractors
+* [openload] Fix extraction (#10408)
+* [pandoratv] Fix extraction (#11023)
++ [telebruxelles] Add support for emission URLs
+* [telebruxelles] Extract all formats
++ [bloomberg] Add another video id regular expression (#11371)
+* [fusion] Update ooyala id regular expression (#11364)
++ [1tv] Add support for playlists (#11335)
+* [1tv] Improve extraction (#11335)
++ [aenetworks] Extract more formats (#11321)
++ [thisoldhouse] Recognize /tv-episode/ URLs (#11271)
+
+
  version 2016.12.01
  
  Extractors
diff --git a/README.md b/README.md

index ea9131c3ab88e944c34b7bf50dc87fad8113bc20..89876bd7adcaab93bc30268f3b77632412ba47d7 100644
--- a/README.md
+++ b/README.md
@@ -29,7 +29,7 @@ Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.ex
  
  You can also use pip:
  
-    sudo pip install --upgrade youtube-dl
+    sudo -H pip install --upgrade youtube-dl
      
  This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
  
@@ -44,11 +44,7 @@ Or with [MacPorts](https://www.macports.org/):
  Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
  
  # DESCRIPTION
-**youtube-dl** is a command-line program to download videos from
-YouTube.com and a few more sites. It requires the Python interpreter, version
-2.6, 2.7, or 3.2+, and it is not platform specific. It should work on
-your Unix box, on Windows or on Mac OS X. It is released to the public domain,
-which means you can modify it, redistribute it or use it however you like.
+**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on Mac OS X. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
  
      youtube-dl [OPTIONS] URL [URL...]
  
@@ -84,13 +80,14 @@ which means you can modify it, redistribute it or use it however you like.
                                       configuration in ~/.config/youtube-
                                       dl/config (%APPDATA%/youtube-dl/config.txt
                                       on Windows)
+    --config-location PATH           Location of the configuration file; either
+                                     the path to the config or its containing
+                                     directory.
      --flat-playlist                  Do not extract the videos of a playlist,
                                       only list them.
      --mark-watched                   Mark videos watched (YouTube only)
      --no-mark-watched                Do not mark videos watched (YouTube only)
      --no-color                       Do not emit color codes in output
-    --abort-on-unavailable-fragment  Abort downloading when some fragment is not
-                                     available
  
  ## Network Options:
      --proxy URL                      Use the specified HTTP/HTTPS/SOCKS proxy.
@@ -100,16 +97,13 @@ which means you can modify it, redistribute it or use it however you like.
                                       string (--proxy "") for direct connection
      --socket-timeout SECONDS         Time to wait before giving up, in seconds
      --source-address IP              Client-side IP address to bind to
-                                     (experimental)
      -4, --force-ipv4                 Make all connections via IPv4
-                                     (experimental)
      -6, --force-ipv6                 Make all connections via IPv6
-                                     (experimental)
      --geo-verification-proxy URL     Use this proxy to verify the IP address for
                                       some geo-restricted sites. The default
                                       proxy specified by --proxy (or none, if the
                                       options is not present) is used for the
-                                     actual downloading. (experimental)
+                                     actual downloading.
  
  ## Video Selection:
      --playlist-start NUMBER          Playlist video to start at (default is 1)
@@ -140,23 +134,23 @@ which means you can modify it, redistribute it or use it however you like.
                                       COUNT views
      --max-views COUNT                Do not download any videos with more than
                                       COUNT views
-    --match-filter FILTER            Generic video filter (experimental).
-                                     Specify any key (see help for -o for a list
-                                     of available keys) to match if the key is
-                                     present, !key to check if the key is not
-                                     present,key > NUMBER (like "comment_count >
-                                     12", also works with >=, <, <=, !=, =) to
-                                     compare against a number, and & to require
-                                     multiple matches. Values which are not
-                                     known are excluded unless you put a
-                                     question mark (?) after the operator.For
-                                     example, to only match videos that have
-                                     been liked more than 100 times and disliked
-                                     less than 50 times (or the dislike
-                                     functionality is not available at the given
-                                     service), but who also have a description,
-                                     use --match-filter "like_count > 100 &
-                                     dislike_count <? 50 & description" .
+    --match-filter FILTER            Generic video filter. Specify any key (see
+                                     help for -o for a list of available keys)
+                                     to match if the key is present, !key to
+                                     check if the key is not present,key >
+                                     NUMBER (like "comment_count > 12", also
+                                     works with >=, <, <=, !=, =) to compare
+                                     against a number, and & to require multiple
+                                     matches. Values which are not known are
+                                     excluded unless you put a question mark (?)
+                                     after the operator.For example, to only
+                                     match videos that have been liked more than
+                                     100 times and disliked less than 50 times
+                                     (or the dislike functionality is not
+                                     available at the given service), but who
+                                     also have a description, use --match-filter
+                                     "like_count > 100 & dislike_count <? 50 &
+                                     description" .
      --no-playlist                    Download only the video, if the URL refers
                                       to a video and a playlist.
      --yes-playlist                   Download the playlist, if the URL refers to
@@ -179,6 +173,8 @@ which means you can modify it, redistribute it or use it however you like.
                                       only)
      --skip-unavailable-fragments     Skip unavailable fragments (DASH and
                                       hlsnative only)
+    --abort-on-unavailable-fragment  Abort downloading when some fragment is not
+                                     available
      --buffer-size SIZE               Size of download buffer (e.g. 1024 or 16K)
                                       (default is 1024)
      --no-resize-buffer               Do not automatically adjust the buffer
@@ -186,8 +182,9 @@ which means you can modify it, redistribute it or use it however you like.
                                       automatically resized from an initial value
                                       of SIZE.
      --playlist-reverse               Download playlist videos in reverse order
+    --playlist-random                Download playlist videos in random order
      --xattr-set-filesize             Set file xattribute ytdl.filesize with
-                                     expected filesize (experimental)
+                                     expected file size (experimental)
      --hls-prefer-native              Use the native HLS downloader instead of
                                       ffmpeg
      --hls-prefer-ffmpeg              Use ffmpeg instead of the native HLS
@@ -211,7 +208,9 @@ which means you can modify it, redistribute it or use it however you like.
      --autonumber-size NUMBER         Specify the number of digits in
                                       %(autonumber)s when it is present in output
                                       filename template or --auto-number option
-                                     is given
+                                     is given (default is 5)
+    --autonumber-start NUMBER        Specify the start value for %(autonumber)s
+                                     (default is 1)
      --restrict-filenames             Restrict filenames to only ASCII
                                       characters, and avoid "&" and spaces in
                                       filenames
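
Note on the hunk above: a minimal sketch of how `--autonumber-size` and `--autonumber-start` might combine when `%(autonumber)s` is rendered. The defaults (5 digits, start value 1) come from the help text; the zero padding, the one-per-download increment, and the function name `autonumber` are assumptions made for illustration, not youtube-dl's actual code.

```python
def autonumber(download_index, start=1, size=5):
    """Render the counter for the Nth download of a run (1-based index).

    Assumes the counter is zero-padded to `size` digits and grows by one
    per download, beginning at `start`.
    """
    return '%0*d' % (size, start + download_index - 1)


first = autonumber(1)                      # defaults from the help text
third = autonumber(3, start=10, size=3)    # custom start value and width
```

Under these assumptions `first` is `'00001'` and `third` is `'012'`.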
@@ -354,7 +353,7 @@ which means you can modify it, redistribute it or use it however you like.
      -u, --username USERNAME          Login with this account ID
      -p, --password PASSWORD          Account password. If this option is left
                                       out, youtube-dl will ask interactively.
-    -2, --twofactor TWOFACTOR        Two-factor auth code
+    -2, --twofactor TWOFACTOR        Two-factor authentication code
      -n, --netrc                      Use .netrc authentication data
      --video-password PASSWORD        Video password (vimeo, smotri, youku)
  
@@ -375,7 +374,7 @@ which means you can modify it, redistribute it or use it however you like.
                                       avprobe)
      --audio-format FORMAT            Specify audio format: "best", "aac",
                                       "vorbis", "mp3", "m4a", "opus", or "wav";
-                                     "best" by default
+                                     "best" by default; No effect without -x
      --audio-quality QUALITY          Specify ffmpeg/avconv audio quality, insert
                                       a value between 0 (better) and 9 (worse)
                                       for VBR or a specific bitrate like 128K
@@ -447,6 +446,8 @@ Note that options in configuration file are just the same options aka switches u
  
  You can use `--ignore-config` if you want to disable the configuration file for a particular youtube-dl run.
  
+You can also use `--config-location` if you want to use custom configuration file for a particular youtube-dl run.
+
  ### Authentication with `.netrc` file
  
  You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
@@ -638,7 +639,7 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
   - `acodec`: Name of the audio codec in use
   - `vcodec`: Name of the video codec in use
   - `container`: Name of the container format
- - `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
+ - `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `m3u8`, or `m3u8_native`)
   - `format_id`: A short description of the format
  
  Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
@@ -664,7 +665,7 @@ $ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'
  # Download best format available but not better that 480p
  $ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'
  
-# Download best video only format but no bigger that 50 MB
+# Download best video only format but no bigger than 50 MB
  $ youtube-dl -f 'best[filesize<50M]'
  
  # Download best format available via direct link over HTTP/HTTPS protocol
@@ -744,7 +745,7 @@ Most people asking this question are not aware that youtube-dl now defaults to d
  
  ### I get HTTP error 402 when trying to download a video. What's this?
  
-Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
+Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a web browser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
  
  ### Do I need any other programs?
  
@@ -756,7 +757,7 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [
  
  Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/).
  
-### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
+### I extracted a video URL with `-g`, but it does not play on another machine / in my web browser.
  
  It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies and/or HTTP headers. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl. You can also get necessary cookies and HTTP headers from JSON output obtained with `--dump-json`.
  
@@ -840,7 +841,7 @@ Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
  
  In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
  
-Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
+Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, Mac OS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
  
  Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
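
Note on the cookies hunk above: a minimal sketch, using only the Python standard library, of checking the two requirements the README names (the expected first line and a consistent newline format). The helper names `inspect_cookies` and `normalize_newlines` and the sample cookie line are hypothetical; youtube-dl itself does not necessarily perform these checks this way.

```python
VALID_HEADERS = ('# HTTP Cookie File', '# Netscape HTTP Cookie File')


def inspect_cookies(data):
    """Return (header_ok, newline_style) for the raw bytes of a cookies file."""
    lines = data.splitlines()
    first = lines[0].decode('utf-8', 'replace').strip() if lines else ''
    header_ok = first in VALID_HEADERS
    if b'\r\n' in data:
        style = 'CRLF'
    elif b'\r' in data:
        style = 'CR'
    elif b'\n' in data:
        style = 'LF'
    else:
        style = None
    return header_ok, style


def normalize_newlines(data, newline=b'\n'):
    """Rewrite every line ending in `data` to `newline`."""
    unified = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    return unified.replace(b'\n', newline)


# Hypothetical file contents exported on Windows (CRLF line endings).
sample = (b'# Netscape HTTP Cookie File\r\n'
          b'.example.com\tTRUE\t/\tFALSE\t0\tname\tvalue\r\n')

before = inspect_cookies(sample)
after = inspect_cookies(normalize_newlines(sample))
```

Here `before` reports CRLF line endings and `after` reports LF, with the header line accepted in both cases.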
  
@@ -932,7 +933,7 @@ If you want to create a build of youtube-dl yourself, you'll need
  
  If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](README.md#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites thus pull requests adding support for them **will be rejected**.
  
-After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
+After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
  
  1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
  2. Check out the source code with:
@@ -962,7 +963,7 @@ After you have ensured this site is distributing it's content legally, you can f
                  'id': '42',
                  'ext': 'mp4',
                  'title': 'Video title goes here',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  # TODO more properties, either as:
                  # * A value
                  # * MD5 checksum; start the string with md5:
@@ -1037,7 +1038,7 @@ Assume at this point `meta`'s layout is:
  }
  ```
  
-Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
+Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional meta field you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
  
  ```python
  description = meta.get('summary')  # correct
@@ -1149,7 +1150,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
      ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
  ```
  
-Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
+Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L129-L279). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
  
  Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
  
@@ -1252,7 +1253,7 @@ We are then presented with a very complicated request when the original problem
  
  Some of our users seem to think there is a limit of issues they can or should open. There is no limit of issues they can or should open. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones.
  
-In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, Whitehouse podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
+In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, White house podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
  
  ###  Is anyone going to need the feature?
  
diff --git a/README.txt b/README.txt

new file mode 100644 (file)

index 0000000..24d4314
--- /dev/null
+++ b/README.txt
@@ -0,0 +1,1760 @@
+youtube-dl - download videos from youtube.com or other video platforms
+
+-   INSTALLATION
+-   DESCRIPTION
+-   OPTIONS
+-   CONFIGURATION
+-   OUTPUT TEMPLATE
+-   FORMAT SELECTION
+-   VIDEO SELECTION
+-   FAQ
+-   DEVELOPER INSTRUCTIONS
+-   EMBEDDING YOUTUBE-DL
+-   BUGS
+-   COPYRIGHT
+
+
+
+INSTALLATION
+
+
+To install it right away for all UNIX users (Linux, OS X, etc.), type:
+
+    sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
+    sudo chmod a+rx /usr/local/bin/youtube-dl
+
+If you do not have curl, you can alternatively use a recent wget:
+
+    sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
+    sudo chmod a+rx /usr/local/bin/youtube-dl
+
+Windows users can download an .exe file and place it in any location on
+their PATH except for %SYSTEMROOT%\System32 (e.g. DO NOT put it in
+C:\Windows\System32).
+
+You can also use pip:
+
+    sudo -H pip install --upgrade youtube-dl
+
+This command will update youtube-dl if you have already installed it.
+See the pypi page for more information.
+
+OS X users can install youtube-dl with Homebrew:
+
+    brew install youtube-dl
+
+Or with MacPorts:
+
+    sudo port install youtube-dl
+
+Alternatively, refer to the developer instructions for how to check out
+and work with the git repository. For further options, including PGP
+signatures, see the youtube-dl Download Page.
+
+
+
+DESCRIPTION
+
+
+YOUTUBE-DL is a command-line program to download videos from YouTube.com
+and a few more sites. It requires the Python interpreter, version 2.6,
+2.7, or 3.2+, and it is not platform specific. It should work on your
+Unix box, on Windows or on Mac OS X. It is released to the public
+domain, which means you can modify it, redistribute it or use it however
+you like.
+
+    youtube-dl [OPTIONS] URL [URL...]
+
+
+
+OPTIONS
+
+
+    -h, --help                       Print this help text and exit
+    --version                        Print program version and exit
+    -U, --update                     Update this program to latest version. Make
+                                     sure that you have sufficient permissions
+                                     (run with sudo if needed)
+    -i, --ignore-errors              Continue on download errors, for example to
+                                     skip unavailable videos in a playlist
+    --abort-on-error                 Abort downloading of further videos (in the
+                                     playlist or the command line) if an error
+                                     occurs
+    --dump-user-agent                Display the current browser identification
+    --list-extractors                List all supported extractors
+    --extractor-descriptions         Output descriptions of all supported
+                                     extractors
+    --force-generic-extractor        Force extraction to use the generic
+                                     extractor
+    --default-search PREFIX          Use this prefix for unqualified URLs. For
+                                     example "gvsearch2:" downloads two videos
+                                     from google videos for youtube-dl "large
+                                     apple". Use the value "auto" to let
+                                     youtube-dl guess ("auto_warning" to emit a
+                                     warning when guessing). "error" just throws
+                                     an error. The default value "fixup_error"
+                                     repairs broken URLs, but emits an error if
+                                     this is not possible instead of searching.
+    --ignore-config                  Do not read configuration files. When given
+                                     in the global configuration file
+                                     /etc/youtube-dl.conf: Do not read the user
+                                     configuration in ~/.config/youtube-
+                                     dl/config (%APPDATA%/youtube-dl/config.txt
+                                     on Windows)
+    --config-location PATH           Location of the configuration file; either
+                                     the path to the config or its containing
+                                     directory.
+    --flat-playlist                  Do not extract the videos of a playlist,
+                                     only list them.
+    --mark-watched                   Mark videos watched (YouTube only)
+    --no-mark-watched                Do not mark videos watched (YouTube only)
+    --no-color                       Do not emit color codes in output
+
+
+Network Options:
+
+    --proxy URL                      Use the specified HTTP/HTTPS/SOCKS proxy.
+                                     To enable experimental SOCKS proxy, specify
+                                     a proper scheme. For example
+                                     socks5://127.0.0.1:1080/. Pass in an empty
+                                     string (--proxy "") for direct connection
+    --socket-timeout SECONDS         Time to wait before giving up, in seconds
+    --source-address IP              Client-side IP address to bind to
+    -4, --force-ipv4                 Make all connections via IPv4
+    -6, --force-ipv6                 Make all connections via IPv6
+    --geo-verification-proxy URL     Use this proxy to verify the IP address for
+                                     some geo-restricted sites. The default
+                                     proxy specified by --proxy (or none, if the
+                                     option is not present) is used for the
+                                     actual downloading.
+
+
+Video Selection:
+
+    --playlist-start NUMBER          Playlist video to start at (default is 1)
+    --playlist-end NUMBER            Playlist video to end at (default is last)
+    --playlist-items ITEM_SPEC       Playlist video items to download. Specify
+                                     indices of the videos in the playlist
+                                     separated by commas like: "--playlist-items
+                                     1,2,5,8" if you want to download videos
+                                     indexed 1, 2, 5, 8 in the playlist. You can
+                                     specify range: "--playlist-items
+                                     1-3,7,10-13", it will download the videos
+                                     at index 1, 2, 3, 7, 10, 11, 12 and 13.
+    --match-title REGEX              Download only matching titles (regex or
+                                     caseless sub-string)
+    --reject-title REGEX             Skip download for matching titles (regex or
+                                     caseless sub-string)
+    --max-downloads NUMBER           Abort after downloading NUMBER files
+    --min-filesize SIZE              Do not download any videos smaller than
+                                     SIZE (e.g. 50k or 44.6m)
+    --max-filesize SIZE              Do not download any videos larger than SIZE
+                                     (e.g. 50k or 44.6m)
+    --date DATE                      Download only videos uploaded on this date
+    --datebefore DATE                Download only videos uploaded on or before
+                                     this date (i.e. inclusive)
+    --dateafter DATE                 Download only videos uploaded on or after
+                                     this date (i.e. inclusive)
+    --min-views COUNT                Do not download any videos with less than
+                                     COUNT views
+    --max-views COUNT                Do not download any videos with more than
+                                     COUNT views
+    --match-filter FILTER            Generic video filter. Specify any key (see
+                                     help for -o for a list of available keys)
+                                     to match if the key is present, !key to
+                                     check if the key is not present, key >
+                                     NUMBER (like "comment_count > 12", also
+                                     works with >=, <, <=, !=, =) to compare
+                                     against a number, and & to require multiple
+                                     matches. Values which are not known are
+                                     excluded unless you put a question mark (?)
+                                     after the operator. For example, to only
+                                     match videos that have been liked more than
+                                     100 times and disliked less than 50 times
+                                     (or the dislike functionality is not
+                                     available at the given service), but who
+                                     also have a description, use --match-filter
+                                     "like_count > 100 & dislike_count <? 50 &
+                                     description" .
+    --no-playlist                    Download only the video, if the URL refers
+                                     to a video and a playlist.
+    --yes-playlist                   Download the playlist, if the URL refers to
+                                     a video and a playlist.
+    --age-limit YEARS                Download only videos suitable for the given
+                                     age
+    --download-archive FILE          Download only videos not listed in the
+                                     archive file. Record the IDs of all
+                                     downloaded videos in it.
+    --include-ads                    Download advertisements as well
+                                     (experimental)
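
The ITEM_SPEC syntax described for --playlist-items above can be illustrated with a short Python sketch. The function name parse_playlist_items is hypothetical and is not part of youtube-dl; the sketch only assumes the comma and range rules stated in the help text and does no error handling.

```python
def parse_playlist_items(spec):
    """Expand an ITEM_SPEC such as "1-3,7,10-13" into 1-based indices."""
    indices = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            # A range like "10-13" includes both endpoints.
            start, end = part.split('-', 1)
            indices.extend(range(int(start), int(end) + 1))
        else:
            indices.append(int(part))
    return indices


print(parse_playlist_items('1-3,7,10-13'))
# prints [1, 2, 3, 7, 10, 11, 12, 13]
```

This matches the expansion given in the option description: indices 1, 2, 3, 7, 10, 11, 12 and 13.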
+
+
+Download Options:
+
+    -r, --limit-rate RATE            Maximum download rate in bytes per second
+                                     (e.g. 50K or 4.2M)
+    -R, --retries RETRIES            Number of retries (default is 10), or
+                                     "infinite".
+    --fragment-retries RETRIES       Number of retries for a fragment (default
+                                     is 10), or "infinite" (DASH and hlsnative
+                                     only)
+    --skip-unavailable-fragments     Skip unavailable fragments (DASH and
+                                     hlsnative only)
+    --abort-on-unavailable-fragment  Abort downloading when some fragment is not
+                                     available
+    --buffer-size SIZE               Size of download buffer (e.g. 1024 or 16K)
+                                     (default is 1024)
+    --no-resize-buffer               Do not automatically adjust the buffer
+                                     size. By default, the buffer size is
+                                     automatically resized from an initial value
+                                     of SIZE.
+    --playlist-reverse               Download playlist videos in reverse order
+    --playlist-random                Download playlist videos in random order
+    --xattr-set-filesize             Set file xattribute ytdl.filesize with
+                                     expected file size (experimental)
+    --hls-prefer-native              Use the native HLS downloader instead of
+                                     ffmpeg
+    --hls-prefer-ffmpeg              Use ffmpeg instead of the native HLS
+                                     downloader
+    --hls-use-mpegts                 Use the mpegts container for HLS videos,
+                                     allowing the video to be played while
+                                     downloading (some players may not be able
+                                     to play it)
+    --external-downloader COMMAND    Use the specified external downloader.
+                                     Currently supports
+                                     aria2c,avconv,axel,curl,ffmpeg,httpie,wget
+    --external-downloader-args ARGS  Give these arguments to the external
+                                     downloader
+
+
+Filesystem Options:
+
+    -a, --batch-file FILE            File containing URLs to download ('-' for
+                                     stdin)
+    --id                             Use only video ID in file name
+    -o, --output TEMPLATE            Output filename template, see the "OUTPUT
+                                     TEMPLATE" for all the info
+    --autonumber-size NUMBER         Specify the number of digits in
+                                     %(autonumber)s when it is present in output
+                                     filename template or --auto-number option
+                                     is given (default is 5)
+    --autonumber-start NUMBER        Specify the start value for %(autonumber)s
+                                     (default is 1)
+    --restrict-filenames             Restrict filenames to only ASCII
+                                     characters, and avoid "&" and spaces in
+                                     filenames
+    -A, --auto-number                [deprecated; use -o
+                                     "%(autonumber)s-%(title)s.%(ext)s" ] Number
+                                     downloaded files starting from 00000
+    -t, --title                      [deprecated] Use title in file name
+                                     (default)
+    -l, --literal                    [deprecated] Alias of --title
+    -w, --no-overwrites              Do not overwrite files
+    -c, --continue                   Force resume of partially downloaded files.
+                                     By default, youtube-dl will resume
+                                     downloads if possible.
+    --no-continue                    Do not resume partially downloaded files
+                                     (restart from beginning)
+    --no-part                        Do not use .part files - write directly
+                                     into output file
+    --no-mtime                       Do not use the Last-modified header to set
+                                     the file modification time
+    --write-description              Write video description to a .description
+                                     file
+    --write-info-json                Write video metadata to a .info.json file
+    --write-annotations              Write video annotations to a
+                                     .annotations.xml file
+    --load-info-json FILE            JSON file containing the video information
+                                     (created with the "--write-info-json"
+                                     option)
+    --cookies FILE                   File to read cookies from and dump cookie
+                                     jar in
+    --cache-dir DIR                  Location in the filesystem where youtube-dl
+                                     can store some downloaded information
+                                     permanently. By default
+                                     $XDG_CACHE_HOME/youtube-dl or
+                                     ~/.cache/youtube-dl . At the moment, only
+                                     YouTube player files (for videos with
+                                     obfuscated signatures) are cached, but that
+                                     may change.
+    --no-cache-dir                   Disable filesystem caching
+    --rm-cache-dir                   Delete all filesystem cache files
+
+
+Thumbnail images:
+
+    --write-thumbnail                Write thumbnail image to disk
+    --write-all-thumbnails           Write all thumbnail image formats to disk
+    --list-thumbnails                Simulate and list all available thumbnail
+                                     formats
+
+
+Verbosity / Simulation Options:
+
+    -q, --quiet                      Activate quiet mode
+    --no-warnings                    Ignore warnings
+    -s, --simulate                   Do not download the video and do not write
+                                     anything to disk
+    --skip-download                  Do not download the video
+    -g, --get-url                    Simulate, quiet but print URL
+    -e, --get-title                  Simulate, quiet but print title
+    --get-id                         Simulate, quiet but print id
+    --get-thumbnail                  Simulate, quiet but print thumbnail URL
+    --get-description                Simulate, quiet but print video description
+    --get-duration                   Simulate, quiet but print video length
+    --get-filename                   Simulate, quiet but print output filename
+    --get-format                     Simulate, quiet but print output format
+    -j, --dump-json                  Simulate, quiet but print JSON information.
+                                     See --output for a description of available
+                                     keys.
+    -J, --dump-single-json           Simulate, quiet but print JSON information
+                                     for each command-line argument. If the URL
+                                     refers to a playlist, dump the whole
+                                     playlist information in a single line.
+    --print-json                     Be quiet and print the video information as
+                                     JSON (video is still being downloaded).
+    --newline                        Output progress bar as new lines
+    --no-progress                    Do not print progress bar
+    --console-title                  Display progress in console titlebar
+    -v, --verbose                    Print various debugging information
+    --dump-pages                     Print downloaded pages encoded using base64
+                                     to debug problems (very verbose)
+    --write-pages                    Write downloaded intermediary pages to
+                                     files in the current directory to debug
+                                     problems
+    --print-traffic                  Display sent and read HTTP traffic
+    -C, --call-home                  Contact the youtube-dl server for debugging
+    --no-call-home                   Do NOT contact the youtube-dl server for
+                                     debugging
+
+
+Workarounds:
+
+    --encoding ENCODING              Force the specified encoding (experimental)
+    --no-check-certificate           Suppress HTTPS certificate validation
+    --prefer-insecure                Use an unencrypted connection to retrieve
+                                     information about the video. (Currently
+                                     supported only for YouTube)
+    --user-agent UA                  Specify a custom user agent
+    --referer URL                    Specify a custom referer, use if the video
+                                     access is restricted to one domain
+    --add-header FIELD:VALUE         Specify a custom HTTP header and its value,
+                                     separated by a colon ':'. You can use this
+                                     option multiple times
+    --bidi-workaround                Work around terminals that lack
+                                     bidirectional text support. Requires bidiv
+                                     or fribidi executable in PATH
+    --sleep-interval SECONDS         Number of seconds to sleep before each
+                                     download when used alone or a lower bound
+                                     of a range for randomized sleep before each
+                                     download (minimum possible number of
+                                     seconds to sleep) when used along with
+                                     --max-sleep-interval.
+    --max-sleep-interval SECONDS     Upper bound of a range for randomized sleep
+                                     before each download (maximum possible
+                                     number of seconds to sleep). Must only be
+                                     used along with --min-sleep-interval.
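
The interaction between --sleep-interval and --max-sleep-interval described above can be sketched in Python. This is only an illustration of the documented behaviour, assuming a uniform random choice between the two bounds; the function name pick_sleep_interval is hypothetical.

```python
import random


def pick_sleep_interval(min_seconds, max_seconds=None):
    """Return how long to sleep before a download, in seconds."""
    if max_seconds is None:
        # Only --sleep-interval given: always sleep exactly this long.
        return min_seconds
    # Both bounds given: pick a random duration between them (assumed uniform).
    return random.uniform(min_seconds, max_seconds)


print(pick_sleep_interval(5))       # fixed interval
print(pick_sleep_interval(2, 6))    # some value between 2 and 6
```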
+
+
+Video Format Options:
+
+    -f, --format FORMAT              Video format code, see the "FORMAT
+                                     SELECTION" for all the info
+    --all-formats                    Download all available video formats
+    --prefer-free-formats            Prefer free video formats unless a specific
+                                     one is requested
+    -F, --list-formats               List all available formats of requested
+                                     videos
+    --youtube-skip-dash-manifest     Do not download the DASH manifests and
+                                     related data on YouTube videos
+    --merge-output-format FORMAT     If a merge is required (e.g.
+                                     bestvideo+bestaudio), output to given
+                                     container format. One of mkv, mp4, ogg,
+                                     webm, flv. Ignored if no merge is required
+
+
+Subtitle Options:
+
+    --write-sub                      Write subtitle file
+    --write-auto-sub                 Write automatically generated subtitle file
+                                     (YouTube only)
+    --all-subs                       Download all the available subtitles of the
+                                     video
+    --list-subs                      List all available subtitles for the video
+    --sub-format FORMAT              Subtitle format, accepts formats
+                                     preference, for example: "srt" or
+                                     "ass/srt/best"
+    --sub-lang LANGS                 Languages of the subtitles to download
+                                     (optional) separated by commas, use --list-
+                                     subs for available language tags
+
+
+Authentication Options:
+
+    -u, --username USERNAME          Login with this account ID
+    -p, --password PASSWORD          Account password. If this option is left
+                                     out, youtube-dl will ask interactively.
+    -2, --twofactor TWOFACTOR        Two-factor authentication code
+    -n, --netrc                      Use .netrc authentication data
+    --video-password PASSWORD        Video password (vimeo, smotri, youku)
+
+
+Adobe Pass Options:
+
+    --ap-mso MSO                     Adobe Pass multiple-system operator (TV
+                                     provider) identifier, use --ap-list-mso for
+                                     a list of available MSOs
+    --ap-username USERNAME           Multiple-system operator account login
+    --ap-password PASSWORD           Multiple-system operator account password.
+                                     If this option is left out, youtube-dl will
+                                     ask interactively.
+    --ap-list-mso                    List all supported multiple-system
+                                     operators
+
+
+Post-processing Options:
+
+    -x, --extract-audio              Convert video files to audio-only files
+                                     (requires ffmpeg or avconv and ffprobe or
+                                     avprobe)
+    --audio-format FORMAT            Specify audio format: "best", "aac",
+                                     "vorbis", "mp3", "m4a", "opus", or "wav";
+                                     "best" by default; No effect without -x
+    --audio-quality QUALITY          Specify ffmpeg/avconv audio quality, insert
+                                     a value between 0 (better) and 9 (worse)
+                                     for VBR or a specific bitrate like 128K
+                                     (default 5)
+    --recode-video FORMAT            Encode the video to another format if
+                                     necessary (currently supported:
+                                     mp4|flv|ogg|webm|mkv|avi)
+    --postprocessor-args ARGS        Give these arguments to the postprocessor
+    -k, --keep-video                 Keep the video file on disk after the post-
+                                     processing; the video is erased by default
+    --no-post-overwrites             Do not overwrite post-processed files; the
+                                     post-processed files are overwritten by
+                                     default
+    --embed-subs                     Embed subtitles in the video (only for mp4,
+                                     webm and mkv videos)
+    --embed-thumbnail                Embed thumbnail in the audio as cover art
+    --add-metadata                   Write metadata to the video file
+    --metadata-from-title FORMAT     Parse additional metadata like song title /
+                                     artist from the video title. The format
+                                     syntax is the same as --output, the parsed
+                                     parameters replace existing values.
+                                     Additional templates: %(album)s,
+                                     %(artist)s. Example: --metadata-from-title
+                                     "%(artist)s - %(title)s" matches a title
+                                     like "Coldplay - Paradise"
+    --xattrs                         Write metadata to the video file's xattrs
+                                     (using dublin core and xdg standards)
+    --fixup POLICY                   Automatically correct known faults of the
+                                     file. One of never (do nothing), warn (only
+                                     emit a warning), detect_or_warn (the
+                                     default; fix file if we can, warn
+                                     otherwise)
+    --prefer-avconv                  Prefer avconv over ffmpeg for running the
+                                     postprocessors (default)
+    --prefer-ffmpeg                  Prefer ffmpeg over avconv for running the
+                                     postprocessors
+    --ffmpeg-location PATH           Location of the ffmpeg/avconv binary;
+                                     either the path to the binary or its
+                                     containing directory.
+    --exec CMD                       Execute a command on the file after
+                                     downloading, similar to find's -exec
+                                     syntax. Example: --exec 'adb push {}
+                                     /sdcard/Music/ && rm {}'
+    --convert-subs FORMAT            Convert the subtitles to other format
+                                     (currently supported: srt|ass|vtt)
+
+
+
+CONFIGURATION
+
+
+You can configure youtube-dl by placing any supported command line
+option in a configuration file. On Linux and OS X, the system wide
+configuration file is located at /etc/youtube-dl.conf and the user wide
+configuration file at ~/.config/youtube-dl/config. On Windows, the user
+wide configuration file locations are %APPDATA%\youtube-dl\config.txt or
+C:\Users\<user name>\youtube-dl.conf. Note that the configuration file
+may not exist by default, so you may need to create it yourself.
+
+For example, with the following configuration file youtube-dl will
+always extract the audio, not copy the mtime, use a proxy and save all
+videos under Movies directory in your home directory:
+
+    # Lines starting with # are comments
+
+    # Always extract audio
+    -x
+
+    # Do not copy the mtime
+    --no-mtime
+
+    # Use this proxy
+    --proxy 127.0.0.1:3128
+
+    # Save all videos under Movies directory in your home directory
+    -o ~/Movies/%(title)s.%(ext)s
+
+Note that options in a configuration file are the same options (aka
+switches) used in regular command line calls, so there MUST BE NO
+WHITESPACE after - or --, e.g. -o or --proxy but not - o or -- proxy.
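
To see why the whitespace rule matters, here is a minimal Python sketch of how a configuration file like the example above could be turned into an argument list. It assumes shell-style splitting with # comments via the standard shlex module; the helper name config_to_args is hypothetical and this is not presented as youtube-dl's actual loader.

```python
import shlex

EXAMPLE_CONFIG = """\
# Lines starting with # are comments

# Always extract audio
-x

# Do not copy the mtime
--no-mtime

# Use this proxy
--proxy 127.0.0.1:3128

# Save all videos under Movies directory in your home directory
-o ~/Movies/%(title)s.%(ext)s
"""


def config_to_args(text):
    """Split each line like a shell would, dropping # comments."""
    args = []
    for line in text.splitlines():
        args.extend(shlex.split(line, comments=True))
    return args


print(config_to_args(EXAMPLE_CONFIG))
# A stray space turns one switch into two unrelated tokens:
print(shlex.split('- o'))
```

Under this assumption, "- o" yields the two tokens "-" and "o" rather than the single switch "-o", which is why the space must be avoided.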
+
+You can use --ignore-config if you want to disable the configuration
+file for a particular youtube-dl run.
+
+You can also use --config-location if you want to use a custom
+configuration file for a particular youtube-dl run.
+
+Authentication with .netrc file
+
+You may also want to configure automatic credentials storage for
+extractors that support authentication (by providing login and password
+with --username and --password) in order not to pass credentials as
+command line arguments on every youtube-dl execution and prevent
+tracking plain text passwords in the shell command history. You can
+achieve this using a .netrc file on a per extractor basis. For that you
+will need to create a .netrc file in your $HOME and restrict permissions
+to read/write by only you:
+
+    touch $HOME/.netrc
+    chmod a-rwx,u+rw $HOME/.netrc
+
+After that you can add credentials for an extractor in the following
+format, where _extractor_ is the name of the extractor in lowercase:
+
+    machine <extractor> login <login> password <password>
+
+For example:
+
+    machine youtube login myaccount@gmail.com password my_youtube_password
+    machine twitch login my_twitch_account_name password my_twitch_password
+
+To activate authentication with the .netrc file you should pass --netrc
+to youtube-dl or place it in the configuration file.
+
+On Windows you may also need to set up the %HOME% environment variable
+manually.
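+
+If you want to check that a .netrc file has the expected shape, Python's
+standard netrc module can parse it. The sketch below is an illustration
+only: it writes the example entries from above to a throwaway temporary
+file rather than your real $HOME/.netrc, and the credentials are the
+same placeholders.
+

```python
import netrc
import os
import tempfile

NETRC_TEXT = (
    "machine youtube login myaccount@gmail.com password my_youtube_password\n"
    "machine twitch login my_twitch_account_name password my_twitch_password\n"
)

# Write a throwaway file instead of touching the real ~/.netrc.
fd, path = tempfile.mkstemp(suffix=".netrc")
with os.fdopen(fd, "w") as handle:
    handle.write(NETRC_TEXT)
os.chmod(path, 0o600)  # owner read/write only, like chmod a-rwx,u+rw

try:
    entries = netrc.netrc(path)
    # authenticators() returns a (login, account, password) tuple.
    login, _account, password = entries.authenticators("youtube")
    # A machine with no entry (and no default entry) yields None.
    unknown = entries.authenticators("vimeo")
finally:
    os.remove(path)

print(login, password, unknown)
```
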
+
+
+
+OUTPUT TEMPLATE
+
+
+The -o option allows users to indicate a template for the output file
+names.
+
+TL;DR: skip ahead to the output template examples.
+
+The basic usage is not to set any template arguments when downloading a
+single file, like in youtube-dl -o funny_video.flv "http://some/video".
+However, it may contain special sequences that will be replaced when
+downloading each video. The special sequences have the format %(NAME)s.
+To clarify, that is a percent symbol followed by a name in parentheses,
+followed by a lowercase S. Allowed names are:
+
+-   id: Video identifier
+-   title: Video title
+-   url: Video URL
+-   ext: Video filename extension
+-   alt_title: A secondary title of the video
+-   display_id: An alternative identifier for the video
+-   uploader: Full name of the video uploader
+-   license: License name the video is licensed under
+-   creator: The creator of the video
+-   release_date: The date (YYYYMMDD) when the video was released
+-   timestamp: UNIX timestamp of the moment the video became available
+-   upload_date: Video upload date (YYYYMMDD)
+-   uploader_id: Nickname or id of the video uploader
+-   location: Physical location where the video was filmed
+-   duration: Length of the video in seconds
+-   view_count: How many users have watched the video on the platform
+-   like_count: Number of positive ratings of the video
+-   dislike_count: Number of negative ratings of the video
+-   repost_count: Number of reposts of the video
+-   average_rating: Average rating given by users, the scale used
+    depends on the webpage
+-   comment_count: Number of comments on the video
+-   age_limit: Age restriction for the video (years)
+-   format: A human-readable description of the format
+-   format_id: Format code specified by --format
+-   format_note: Additional info about the format
+-   width: Width of the video
+-   height: Height of the video
+-   resolution: Textual description of width and height
+-   tbr: Average bitrate of audio and video in KBit/s
+-   abr: Average audio bitrate in KBit/s
+-   acodec: Name of the audio codec in use
+-   asr: Audio sampling rate in Hertz
+-   vbr: Average video bitrate in KBit/s
+-   fps: Frame rate
+-   vcodec: Name of the video codec in use
+-   container: Name of the container format
+-   filesize: The number of bytes, if known in advance
+-   filesize_approx: An estimate for the number of bytes
+-   protocol: The protocol that will be used for the actual download
+-   extractor: Name of the extractor
+-   extractor_key: Key name of the extractor
+-   epoch: Unix epoch when creating the file
+-   autonumber: Five-digit number that will be increased with each
+    download, starting at zero
+-   playlist: Name or id of the playlist that contains the video
+-   playlist_index: Index of the video in the playlist padded with
+    leading zeros according to the total length of the playlist
+-   playlist_id: Playlist identifier
+-   playlist_title: Playlist title
+
+Available for the video that belongs to some logical chapter or section:
+
+-   chapter: Name or title of the chapter the video belongs to
+-   chapter_number: Number of the chapter the video belongs to
+-   chapter_id: Id of the chapter the video belongs to
+
+Available for the video that is an episode of some series or programme:
+
+-   series: Title of the series or programme the video episode belongs
+    to
+-   season: Title of the season the video episode belongs to
+-   season_number: Number of the season the video episode belongs to
+-   season_id: Id of the season the video episode belongs to
+-   episode: Title of the video episode
+-   episode_number: Number of the video episode within a season
+-   episode_id: Id of the video episode
+
+Available for the media that is a track or a part of a music album:
+
+-   track: Title of the track
+-   track_number: Number of the track within an album or a disc
+-   track_id: Id of the track
+-   artist: Artist(s) of the track
+-   genre: Genre(s) of the track
+-   album: Title of the album the track belongs to
+-   album_type: Type of the album
+-   album_artist: List of all artists who appeared on the album
+-   disc_number: Number of the disc or other physical medium the track
+    belongs to
+-   release_year: Year (YYYY) when the album was released
+
+Each aforementioned sequence when referenced in an output template will
+be replaced by the actual value corresponding to the sequence name. Note
+that some of the sequences are not guaranteed to be present since they
+depend on the metadata obtained by a particular extractor. Such
+sequences will be replaced with NA.
+
+For example, for -o %(title)s-%(id)s.%(ext)s and an mp4 video with
+title youtube-dl test video and id BaW_jenozKc, this will result in a
+youtube-dl test video-BaW_jenozKc.mp4 file created in the current
+directory.
+
+Output templates can also contain an arbitrary hierarchical path, e.g.
+-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' which will
+result in downloading each video in a directory corresponding to this
+path template. Any missing directory will be automatically created for
+you.
+
+To use percent literals in an output template use %%. To output to
+stdout use -o -.
+
+The current default template is %(title)s-%(id)s.%(ext)s.
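+
+The %(NAME)s sequences have the same shape as Python's %-style string
+formatting with a mapping, so the substitution rules described above can
+be tried out directly. The following is a minimal sketch, not a
+description of youtube-dl's internals: expand_template is a hypothetical
+helper, and the metadata dictionary is made up for the example.
+

```python
import collections


def expand_template(template, info):
    """Fill %(NAME)s sequences from info; unknown names become 'NA'."""
    fields = collections.defaultdict(lambda: "NA", info)
    return template % fields


info = {"title": "youtube-dl test video", "id": "BaW_jenozKc", "ext": "mp4"}

# All three names are present in the metadata.
print(expand_template("%(title)s-%(id)s.%(ext)s", info))
# 'uploader' is missing, so it is replaced with NA.
print(expand_template("%(uploader)s/%(title)s.%(ext)s", info))
# %% produces a literal percent sign.
print(expand_template("100%% %(title)s.%(ext)s", info))
```
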
+
+In some cases, you don't want special characters such as 中, spaces, or
+&, for example when transferring the downloaded file to a Windows
+system or passing the filename through an 8bit-unsafe channel. In these
+cases, add the --restrict-filenames flag to get a shorter title.
+
+Output template and Windows batch files
+
+If you are using an output template inside a Windows batch file then you
+must escape plain percent characters (%) by doubling, so that
+-o "%(title)s-%(id)s.%(ext)s" should become
+-o "%%(title)s-%%(id)s.%%(ext)s". However you should not touch %'s that
+are not plain characters, e.g. environment variables for expansion
+should stay intact: -o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s".
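+
+If you generate batch files from a script, the doubling rule can be
+applied mechanically. The sketch below is one possible approach, with
+escape_for_batch being a hypothetical helper: it doubles only a % that
+is directly followed by an opening parenthesis, so %VAR% style
+environment variables are left alone. It deliberately does not handle a
+literal %% already present in the template.
+

```python
import re


def escape_for_batch(template):
    """Double each % that opens a %(NAME)s sequence; leave %VAR% alone."""
    return re.sub(r"%(?=\()", "%%", template)


print(escape_for_batch("%(title)s-%(id)s.%(ext)s"))
print(escape_for_batch(r"C:\%HOMEPATH%\Desktop\%(title)s.%(ext)s"))
```
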
+
+Output template examples
+
+Note that on Windows you may need to use double quotes instead of single.
+
+    $ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc
+    youtube-dl test video ''_ä↭𝕐.mp4    # All kinds of weird characters
+
+    $ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc --restrict-filenames
+    youtube-dl_test_video_.mp4          # A simple file name
+
+    # Download YouTube playlist videos in separate directory indexed by video order in a playlist
+    $ youtube-dl -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re
+
+    # Download all playlists of YouTube channel/user keeping each playlist in separate directory:
+    $ youtube-dl -o '%(uploader)s/%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/user/TheLinuxFoundation/playlists
+
+    # Download Udemy course keeping each chapter in separate directory under MyVideos directory in your home
+    $ youtube-dl -u user -p password -o '~/MyVideos/%(playlist)s/%(chapter_number)s - %(chapter)s/%(title)s.%(ext)s' https://www.udemy.com/java-tutorial/
+
+    # Download entire series season keeping each series and each season in separate directory under C:/MyVideos
+    $ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" http://videomore.ru/kino_v_detalayah/5_sezon/367617
+
+    # Stream the video being downloaded to stdout
+    $ youtube-dl -o - BaW_jenozKc
+
+
+
+FORMAT SELECTION
+
+
+By default youtube-dl tries to download the best available quality, i.e.
+if you want the best quality you DON'T NEED to pass any special options,
+youtube-dl will guess it for you by DEFAULT.
+
+But sometimes you may want to download in a different format, for
+example when you are on a slow or intermittent connection. The key
+mechanism for achieving this is so-called _format selection_, with
+which you can explicitly specify the desired format, select formats
+based on some criterion or criteria, set up precedence and much more.
+
+The general syntax for format selection is --format FORMAT or shorter
+-f FORMAT where FORMAT is a _selector expression_, i.e. an expression
+that describes format or formats you would like to download.
+
+TL;DR: skip ahead to the format selection examples.
+
+The simplest case is requesting a specific format, for example with
+-f 22 you can download the format with format code equal to 22. You can
+get the list of available format codes for a particular video using
+--list-formats or -F. Note that these format codes are
+extractor-specific.
+
+You can also use a file extension (currently 3gp, aac, flv, m4a, mp3,
+mp4, ogg, wav, webm are supported) to download the best quality format
+of a particular file extension served as a single file, e.g. -f webm
+will download the best quality format with the webm extension served as
+a single file.
+
+You can also use special names to select particular edge case formats:
+
+-   best: Select the best quality format represented by a single file
+    with video and audio.
+-   worst: Select the worst quality format represented by a single file
+    with video and audio.
+-   bestvideo: Select the best quality video-only format (e.g. DASH
+    video). May not be available.
+-   worstvideo: Select the worst quality video-only format. May not be
+    available.
+-   bestaudio: Select the best quality audio-only format. May not be
+    available.
+-   worstaudio: Select the worst quality audio-only format. May not be
+    available.
+
+For example, to download the worst quality video-only format you can use
+-f worstvideo.
+
+If you want to download multiple videos and they don't have the same
+formats available, you can specify the order of preference using
+slashes. Note that slash is left-associative, i.e. formats on the left
+hand side are preferred, for example -f 22/17/18 will download format 22
+if it's available, otherwise it will download format 17 if it's
+available, otherwise it will download format 18 if it's available,
+otherwise it will complain that no suitable formats are available for
+download.
+
+If you want to download several formats of the same video use a comma as
+a separator, e.g. -f 22,17,18 will download all these three formats, of
+course if they are available. Or a more sophisticated example combined
+with the precedence feature: -f 136/137/mp4/bestvideo,140/m4a/bestaudio.
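+
+The interplay of slashes and commas can be pictured with a small sketch.
+It is a simplified model under stated assumptions, not youtube-dl's
+selector parser: select_formats is a hypothetical helper that only
+understands plain format codes (no names such as mp4 or bestvideo, no
+filters, no grouping), and it assumes a comma-separated group is simply
+skipped when none of its alternatives is available.
+

```python
def select_formats(selector, available):
    """Resolve a selector made of format codes, '/' and ',' (sketch only)."""
    chosen = []
    for group in selector.split(","):
        # Within a group, the leftmost available alternative wins.
        for code in group.split("/"):
            if code in available:
                chosen.append(code)
                break
    if not chosen:
        raise ValueError("no suitable formats are available")
    return chosen


print(select_formats("22/17/18", {"17", "18"}))
print(select_formats("22,17,18", {"22", "18"}))
print(select_formats("136/137,140/141", {"137", "141"}))
```
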
+
+You can also filter the video formats by putting a condition in
+brackets, as in -f "best[height=720]" (or -f "[filesize>10M]").
+
+The following numeric meta fields can be used with comparisons <, <=, >,
+>=, = (equals), != (not equals):
+
+-   filesize: The number of bytes, if known in advance
+-   width: Width of the video, if known
+-   height: Height of the video, if known
+-   tbr: Average bitrate of audio and video in KBit/s
+-   abr: Average audio bitrate in KBit/s
+-   vbr: Average video bitrate in KBit/s
+-   asr: Audio sampling rate in Hertz
+-   fps: Frame rate
+
+Filtering also works with the comparisons = (equals), != (not equals),
+^= (begins with), $= (ends with), *= (contains) and the following
+string meta fields:
+
+-   ext: File extension
+-   acodec: Name of the audio codec in use
+-   vcodec: Name of the video codec in use
+-   container: Name of the container format
+-   protocol: The protocol that will be used for the actual download,
+    lower-case (http, https, rtsp, rtmp, rtmpe, mms, f4m, ism, m3u8, or
+    m3u8_native)
+-   format_id: A short description of the format
+
+Note that none of the aforementioned meta fields are guaranteed to be
+present since this solely depends on the metadata obtained by a
+particular extractor, i.e. the metadata offered by the video hoster.
+
+Formats for which the value is not known are excluded unless you put a
+question mark (?) after the operator. You can combine format filters, so
+-f "[height <=? 720][tbr>500]" selects up to 720p videos (or videos
+where the height is not known) with a bitrate of at least 500 KBit/s.
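+
+The effect of combined numeric filters and of the question mark can be
+sketched as follows. This is an illustrative model only: apply_filters
+is a hypothetical helper, it handles plain numeric values (no size
+suffixes such as 10M and no string operators), and the format
+dictionaries are invented for the example.
+

```python
import operator
import re

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

# Two-character operators are listed first so "<=" is not read as "<".
FILTER_RE = re.compile(
    r"\[\s*(?P<key>\w+)\s*(?P<op><=|>=|!=|<|>|=)(?P<unknown_ok>\?)?"
    r"\s*(?P<value>\d+(?:\.\d+)?)\s*\]")


def apply_filters(spec, formats):
    """Keep the formats that satisfy every [key op value] filter in spec."""
    filters = [m.groupdict() for m in FILTER_RE.finditer(spec)]
    kept = []
    for fmt in formats:
        for flt in filters:
            actual = fmt.get(flt["key"])
            if actual is None:
                if flt["unknown_ok"]:
                    continue  # '?' lets formats with an unknown value pass
                break
            if not OPS[flt["op"]](actual, float(flt["value"])):
                break
        else:
            kept.append(fmt)
    return kept


formats = [
    {"id": "a", "height": 720, "tbr": 600},
    {"id": "b", "height": 1080, "tbr": 900},
    {"id": "c", "tbr": 700},                  # height unknown
    {"id": "d", "height": 480, "tbr": 300},
]
kept_ids = [f["id"] for f in apply_filters("[height <=? 720][tbr>500]", formats)]
print(kept_ids)
```
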
+
+You can merge the video and audio of two formats into a single file
+using -f <video-format>+<audio-format> (requires ffmpeg or avconv
+installed), for example -f bestvideo+bestaudio will download the best
+video-only format, the best audio-only format and mux them together with
+ffmpeg/avconv.
+
+Format selectors can also be grouped using parentheses, for example if
+you want to download the best mp4 and webm formats with a height lower
+than 480 you can use -f '(mp4,webm)[height<480]'.
+
+Since the end of April 2015 and version 2015.04.26, youtube-dl uses
+-f bestvideo+bestaudio/best as the default format selection (see #5447,
+#5456). If ffmpeg or avconv are installed this results in downloading
+bestvideo and bestaudio separately and muxing them together into a
+single file giving the best overall quality available. Otherwise it
+falls back to best and results in downloading the best available quality
+served as a single file. best is also needed for videos that don't come
+from YouTube because they don't provide the audio and video in two
+different files. If you want to only download some DASH formats (for
+example if you are not interested in getting videos with a resolution
+higher than 1080p), you can add
+-f bestvideo[height<=?1080]+bestaudio/best to your configuration file.
+Note that if you use youtube-dl to stream to stdout (and most likely to
+pipe it to your media player then), i.e. you explicitly specify output
+template as -o -, youtube-dl still uses -f best format selection in
+order to start content delivery immediately to your player and not to
+wait until bestvideo and bestaudio are downloaded and muxed.
+
+If you want to preserve the old format selection behavior (prior to
+youtube-dl 2015.04.26), i.e. you want to download the best available
+quality media served as a single file, you should explicitly specify
+your choice with -f best. You may want to add it to the configuration
+file in order not to type it every time you run youtube-dl.
+
+Format selection examples
+
+Note that on Windows you may need to use double quotes instead of single.
+
+    # Download best mp4 format available or any other best if no mp4 available
+    $ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'
+
+    # Download best format available but not better than 480p
+    $ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'
+
+    # Download best video only format but no bigger than 50 MB
+    $ youtube-dl -f 'best[filesize<50M]'
+
+    # Download best format available via direct link over HTTP/HTTPS protocol
+    $ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
+
+    # Download the best video format and the best audio format without merging them
+    $ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
+
+Note that in the last example, an output template is recommended as
+bestvideo and bestaudio may have the same file name.
+
+
+
+VIDEO SELECTION
+
+
+Videos can be filtered by their upload date using the options --date,
+--datebefore or --dateafter. They accept dates in two formats:
+
+-   Absolute dates: Dates in the format YYYYMMDD.
+-   Relative dates: Dates in the format
+    (now|today)[+-][0-9](day|week|month|year)(s)?
+
+Examples:
+
+    # Download only the videos uploaded in the last 6 months
+    $ youtube-dl --dateafter now-6months
+
+    # Download only the videos uploaded on January 1, 1970
+    $ youtube-dl --date 19700101
+
+    # Download only the videos uploaded in the 200x decade
+    $ youtube-dl --dateafter 20000101 --datebefore 20091231
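+
+To make the two date formats concrete, here is a small sketch that turns
+either form into a calendar date. It is not youtube-dl's own parser:
+resolve_date is a hypothetical helper, it accepts multi-digit amounts,
+and it assumes the rough approximation of 30 days per month and 365 days
+per year.
+

```python
import datetime
import re

RELATIVE_RE = re.compile(r"(now|today)([+-])(\d+)(day|week|month|year)s?$")
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}  # approximation


def resolve_date(text, today=None):
    """Resolve an absolute (YYYYMMDD) or relative date string to a date."""
    today = today or datetime.date.today()
    if text in ("now", "today"):
        return today
    match = RELATIVE_RE.match(text)
    if match:
        _, sign, amount, unit = match.groups()
        delta = datetime.timedelta(days=int(amount) * DAYS_PER_UNIT[unit])
        return today - delta if sign == "-" else today + delta
    # Anything else is treated as an absolute date.
    return datetime.datetime.strptime(text, "%Y%m%d").date()


reference = datetime.date(2017, 2, 9)
print(resolve_date("now-6months", today=reference))
print(resolve_date("today+1week", today=reference))
print(resolve_date("19700101"))
```
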
+
+
+
+FAQ
+
+
+How do I update youtube-dl?
+
+If you've followed our manual installation instructions, you can simply
+run youtube-dl -U (or, on Linux, sudo youtube-dl -U).
+
+If you have used pip, a simple sudo pip install -U youtube-dl is
+sufficient to update.
+
+If you have installed youtube-dl using a package manager like _apt-get_
+or _yum_, use the standard system update mechanism to update. Note that
+distribution packages are often outdated. As a rule of thumb, youtube-dl
+releases at least once a month, and often weekly or even daily. Simply
+go to http://yt-dl.org/ to find out the current version. Unfortunately,
+there is nothing we youtube-dl developers can do if your distribution
+serves a really outdated version. You can (and should) complain to your
+distribution in their bugtracker or support forum.
+
+As a last resort, you can also uninstall the version installed by your
+package manager and follow our manual installation instructions. For
+that, remove the distribution's package, with a line like
+
+    sudo apt-get remove -y youtube-dl
+
+Afterwards, simply follow our manual installation instructions:
+
+    sudo wget https://yt-dl.org/latest/youtube-dl -O /usr/local/bin/youtube-dl
+    sudo chmod a+x /usr/local/bin/youtube-dl
+    hash -r
+
+Again, from then on you'll be able to update with sudo youtube-dl -U.
+
+youtube-dl is extremely slow to start on Windows
+
+Add a file exclusion for youtube-dl.exe in Windows Defender settings.
+
+I'm getting an error Unable to extract OpenGraph title on YouTube playlists
+
+YouTube changed their playlist format in March 2014 and later on, so
+you'll need at least youtube-dl 2014.07.25 to download all YouTube
+videos.
+
+If you have installed youtube-dl with a package manager, pip, setup.py
+or a tarball, please use that to update. Note that Ubuntu packages do
+not seem to get updated anymore. Since we are not affiliated with
+Ubuntu, there is little we can do. Feel free to report bugs to the
+Ubuntu packaging people - all they have to do is update the package to a
+somewhat recent version. See above for a way to update.
+
+I'm getting an error when trying to use output template: error: using output template conflicts with using title, video ID or auto number
+
+Make sure you are not using -o with any of these options -t, --title,
+--id, -A or --auto-number set in command line or in a configuration
+file. Remove the latter if any.
+
+Do I always have to pass -citw?
+
+By default, youtube-dl intends to have the best options (incidentally,
+if you have a convincing case that these should be different, please
+file an issue where you explain that). Therefore, it is unnecessary and
+sometimes harmful to copy long option strings from webpages. In
+particular, the only option out of -citw that is regularly useful is -i.
+
+Can you please put the -b option back?
+
+Most people asking this question are not aware that youtube-dl now
+defaults to downloading the highest available quality as reported by
+YouTube, which will be 1080p or 720p in some cases, so you no longer
+need the -b option. For some specific videos, maybe YouTube does not
+report them to be available in a specific high quality format you're
+interested in. In that case, simply request it with the -f option and
+youtube-dl will try to download it.
+
+I get HTTP error 402 when trying to download a video. What's this?
+
+Apparently YouTube requires you to pass a CAPTCHA test if you download
+too much. We're considering providing a way to let you solve the
+CAPTCHA, but at the moment, your best course of action is pointing a web
+browser to the YouTube URL, solving the CAPTCHA, and restarting
+youtube-dl.
+
+Do I need any other programs?
+
+youtube-dl works fine on its own on most sites. However, if you want to
+convert video/audio, you'll need avconv or ffmpeg. On some sites - most
+notably YouTube - videos can be retrieved in a higher quality format
+without sound. youtube-dl will detect whether avconv/ffmpeg is present
+and automatically pick the best option.
+
+Videos or video formats streamed via RTMP protocol can only be
+downloaded when rtmpdump is installed. Downloading MMS and RTSP videos
+requires either mplayer or mpv to be installed.
+
+I have downloaded a video but how can I play it?
+
+Once the video is fully downloaded, use any video player, such as mpv,
+vlc or mplayer.
+
+I extracted a video URL with -g, but it does not play on another machine / in my web browser.
+
+It depends a lot on the service. In many cases, requests for the video
+(to download/play it) must come from the same IP address and with the
+same cookies and/or HTTP headers. Use the --cookies option to write the
+required cookies into a file, and advise your downloader to read cookies
+from that file. Some sites also require a common user agent to be used;
+use --dump-user-agent to see the one in use by youtube-dl. You can also
+get necessary cookies and HTTP headers from JSON output obtained with
+--dump-json.
+
+It may be beneficial to use IPv6; in some cases, the restrictions are
+only applied to IPv4. Some services (sometimes only for a subset of
+videos) do not restrict the video URL by IP address, cookie, or
+user-agent, but these are the exception rather than the rule.
+
+Please bear in mind that some URL protocols are NOT supported by
+browsers out of the box, including RTMP. If you are using -g, your own
+downloader must support these as well.
+
+If you want to play the video on a machine that is not running
+youtube-dl, you can relay the video content from the machine that runs
+youtube-dl. You can use -o - to let youtube-dl stream a video to stdout,
+or simply allow the player to download the files written by youtube-dl
+in turn.
+
+ERROR: no fmt_url_map or conn information found in video info
+
+YouTube has switched to a new video info format in July 2011 which is
+not supported by old versions of youtube-dl. See above for how to update
+youtube-dl.
+
+ERROR: unable to download video
+
+YouTube requires an additional signature since September 2012 which is
+not supported by old versions of youtube-dl. See above for how to update
+youtube-dl.
+
+Video URL contains an ampersand and I'm getting some strange output [1] 2839 or 'v' is not recognized as an internal or external command
+
+That's actually the output from your shell. Since the ampersand is one
+of the special shell characters, it's interpreted by the shell,
+preventing you from passing the whole URL to youtube-dl. To prevent your
+shell from interpreting the ampersands (or any other special characters)
+you have to either put the whole URL in quotes or escape them with a
+backslash (which approach will work depends on your shell).
+
+For example, if your URL is
+https://www.youtube.com/watch?t=4&v=BaW_jenozKc you should end up with
+the following command:
+
+    youtube-dl 'https://www.youtube.com/watch?t=4&v=BaW_jenozKc'
+
+or
+
+    youtube-dl https://www.youtube.com/watch?t=4\&v=BaW_jenozKc
+
+For Windows you have to use the double quotes:
+
+    youtube-dl "https://www.youtube.com/watch?t=4&v=BaW_jenozKc"
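+
+If you build the command line from a script rather than typing it, the
+quoting can be delegated to Python's shlex.quote. The sketch below only
+covers POSIX-style shells; it does not apply to the Windows command
+prompt, where double quotes are needed as described above.
+

```python
import shlex

url = "https://www.youtube.com/watch?t=4&v=BaW_jenozKc"

quoted = shlex.quote(url)  # wraps the URL in single quotes
command = "youtube-dl " + quoted

print(command)
# Splitting the command the way a POSIX shell would yields the URL intact.
print(shlex.split(command))
```
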
+
+ExtractorError: Could not find JS function u'OF'
+
+In February 2015, the new YouTube player contained a character sequence
+in a string that was misinterpreted by old versions of youtube-dl. See
+above for how to update youtube-dl.
+
+HTTP Error 429: Too Many Requests or 402: Payment Required
+
+These two error codes indicate that the service is blocking your IP
+address because of overuse. Contact the service and ask them to unblock
+your IP address, or - if you have acquired a whitelisted IP address
+already - use the --proxy or --source-address options to select another
+IP address.
+
+SyntaxError: Non-ASCII character
+
+The error
+
+    File "youtube-dl", line 2
+    SyntaxError: Non-ASCII character '\x93' ...
+
+means you're using an outdated version of Python. Please update to
+Python 2.6 or 2.7.
+
+What is this binary file? Where has the code gone?
+
+Since June 2012 (#342) youtube-dl is packed as an executable zipfile;
+simply unzip it (might need renaming to youtube-dl.zip first on some
+systems) or clone the git repository, as laid out above. If you modify
+the code, you can run it by executing the __main__.py file. To recompile
+the executable, run make youtube-dl.
+
+The exe throws an error due to missing MSVCR100.dll
+
+To run the exe you first need to install the Microsoft Visual C++ 2010
+Redistributable Package (x86).
+
+On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?
+
+If you put youtube-dl and ffmpeg in the same directory that you're
+running the command from, it will work, but that's rather cumbersome.
+
+To make a different directory work - either for ffmpeg, or for
+youtube-dl, or for both - simply create the directory (say, C:\bin, or
+C:\Users\<User name>\bin), put all the executables directly in there,
+and then set your PATH environment variable to include that directory.
+
+From then on, after restarting your shell, you will be able to access
+both youtube-dl and ffmpeg (and youtube-dl will be able to find ffmpeg)
+by simply typing youtube-dl or ffmpeg, no matter what directory you're
+in.
+
+How do I put downloads into a specific folder?
+
+Use the -o option to specify an output template, for example
+-o "/home/user/videos/%(title)s-%(id)s.%(ext)s". If you want this for
+all of your downloads, put the option into your configuration file.
+
+How do I download a video starting with a -?
+
+Either prepend http://www.youtube.com/watch?v= or separate the ID from
+the options with --:
+
+    youtube-dl -- -wNyEUrxzFU
+    youtube-dl "http://www.youtube.com/watch?v=-wNyEUrxzFU"
+
+How do I pass cookies to youtube-dl?
+
+Use the --cookies option, for example
+--cookies /path/to/cookies/file.txt.
+
+In order to extract cookies from a browser, use any conforming browser
+extension for exporting cookies. For example, cookies.txt (for Chrome)
+or Export Cookies (for Firefox).
+
+Note that the cookies file must be in Mozilla/Netscape format and the
+first line of the cookies file must be either # HTTP Cookie File or
+# Netscape HTTP Cookie File. Make sure you have correct newline format
+in the cookies file and convert newlines if necessary to correspond with
+your OS, namely CRLF (\r\n) for Windows and LF (\n) for Unix and
+Unix-like systems (Linux, Mac OS, etc.). HTTP Error 400: Bad Request
+when using --cookies is a good sign of invalid newline format.
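+
+One way to sanity-check a cookies file before handing it to youtube-dl
+is to load it with Python's standard http.cookiejar.MozillaCookieJar,
+which reads the same Mozilla/Netscape format. The sketch below is only
+an illustration: the domain, cookie name and value are made up, and the
+file is a throwaway temporary file.
+

```python
import http.cookiejar
import os
import tempfile

# Header line, then one tab-separated cookie record:
# domain, subdomain flag, path, secure flag, expiry, name, value
COOKIE_FILE_TEXT = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t2147483647\tsession\tabc123\n"
)

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write(COOKIE_FILE_TEXT)

try:
    jar = http.cookiejar.MozillaCookieJar()
    # load() raises http.cookiejar.LoadError if the header line is wrong.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    cookies = [(c.domain, c.name, c.value) for c in jar]
finally:
    os.remove(path)

print(cookies)
```
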
+
+Passing cookies to youtube-dl is a good way to work around login when a
+particular extractor does not implement it explicitly. Another use case
+is working around the CAPTCHA that some websites require you to solve in
+particular cases in order to get access (e.g. YouTube, CloudFlare).
+
+How do I stream directly to media player?
+
+You will first need to tell youtube-dl to stream media to stdout with
+-o -, and also tell your media player to read from stdin (it must be
+capable of this for streaming) and then pipe the former to the latter.
+For example, streaming to vlc can be achieved with:
+
+    youtube-dl -o - "http://www.youtube.com/watch?v=BaW_jenozKc" | vlc -
+
+How do I download only new videos from a playlist?
+
+Use the download-archive feature. With this feature you should initially
+download the complete playlist with
+--download-archive /path/to/download/archive/file.txt that will record
+identifiers of all the videos in a special file. Each subsequent run
+with the same --download-archive will download only new videos and skip
+all videos that have been downloaded before. Note that only successful
+downloads are recorded in the file.
+
+For example, at first,
+
+    youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
+
+will download the complete PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re playlist
+and create a file archive.txt. Each subsequent run will only download
+new videos if any:
+
+    youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
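+
+The bookkeeping behind this feature can be modelled in a few lines. The
+sketch below is a simplified illustration, not youtube-dl's
+implementation: download_new and the download callback are hypothetical,
+and it assumes the archive holds one video identifier per line.
+

```python
import os
import tempfile


def download_new(video_ids, archive_path, download):
    """Download ids not yet in the archive; record each success."""
    seen = set()
    if os.path.exists(archive_path):
        with open(archive_path) as archive:
            seen = {line.strip() for line in archive if line.strip()}
    fetched = []
    for video_id in video_ids:
        if video_id in seen:
            continue  # downloaded on an earlier run
        if download(video_id):  # only successful downloads are recorded
            with open(archive_path, "a") as archive:
                archive.write(video_id + "\n")
            seen.add(video_id)
            fetched.append(video_id)
    return fetched


archive_file = os.path.join(tempfile.mkdtemp(), "archive.txt")
# First run: the download of "b" fails, so it is not recorded.
first = download_new(["a", "b", "c"], archive_file, lambda vid: vid != "b")
# Second run: "a" and "c" are skipped, "b" is retried, "d" is new.
second = download_new(["a", "b", "c", "d"], archive_file, lambda vid: True)
print(first, second)
```
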
+
+Should I add --hls-prefer-native into my config?
+
+When youtube-dl detects an HLS video, it can download it either with the
+built-in downloader or ffmpeg. Since many HLS streams are slightly
+invalid and ffmpeg/youtube-dl each handle some invalid cases better than
+the other, there is an option to switch the downloader if needed.
+
+When youtube-dl knows that one particular downloader works better for a
+given website, that downloader will be picked. Otherwise, youtube-dl
+will pick the best downloader for general compatibility, which at the
+moment happens to be ffmpeg. This choice may change in future versions
+of youtube-dl, with improvements of the built-in downloader and/or
+ffmpeg.
+
+In particular, the generic extractor (used when your website is not in
+the list of supported sites by youtube-dl) cannot mandate one specific
+downloader.
+
+If you put either --hls-prefer-native or --hls-prefer-ffmpeg into your
+configuration, a different subset of videos will fail to download
+correctly. Instead, it is much better to file an issue or a pull request
+which details why the native or the ffmpeg HLS downloader is a better
+choice for your use case.
+
+Can you add support for this anime video site, or site which shows current movies for free?
+
+As a matter of policy (as well as legality), youtube-dl does not include
+support for services that specialize in infringing copyright. As a rule
+of thumb, if you cannot easily find a video that the service is quite
+obviously allowed to distribute (i.e. that has been uploaded by the
+creator, the creator's distributor, or is published under a free
+license), the service is probably unfit for inclusion to youtube-dl.
+
+A note on the service that they don't host the infringing content, but
+just link to those who do, is evidence that the service should NOT be
+included into youtube-dl. The same goes for any DMCA note when the whole
+front page of the service is filled with videos they are not allowed to
+distribute. A "fair use" note is equally unconvincing if the service
+shows copyright-protected videos in full without authorization.
+
+Support requests for services that DO purchase the rights to distribute
+their content are perfectly fine though. If in doubt, you can simply
+include a source that mentions the legitimate purchase of content.
+
+How can I speed up work on my issue?
+
+(Also known as: Help, my important issue is not being solved!) The
+youtube-dl core developer team is quite small. While we do our best to
+solve as many issues as possible, sometimes that can take quite a while.
+To speed up your issue, here's what you can do:
+
+First of all, please do report the issue at our issue tracker. That
+allows us to coordinate all efforts by users and developers, and serves
+as a unified point. Unfortunately, the youtube-dl project has grown too
+large to use personal email as an effective communication channel.
+
+Please read the bug reporting instructions below. A lot of bugs lack all
+the necessary information. If you can, offer proxy, VPN, or shell access
+to the youtube-dl developers. If you are able to, test the issue from
+multiple computers in multiple countries to exclude local censorship or
+misconfiguration issues.
+
+If nobody is interested in solving your issue, you are welcome to take
+matters into your own hands and submit a pull request (or coerce/pay
+somebody else to do so).
+
+Feel free to bump the issue from time to time by writing a small comment
+("Issue is still present in youtube-dl version ...from France, but fixed
+from Belgium"), but please not more than once a month. Please do not
+declare your issue as important or urgent.
+
+How can I detect whether a given URL is supported by youtube-dl?
+
+For one, have a look at the list of supported sites. Note that it can
+sometimes happen that the site changes its URL scheme (say, from
+http://example.com/video/1234567 to http://example.com/v/1234567 ) and
+youtube-dl reports a URL of a service in that list as unsupported. In
+that case, simply report a bug.
+
+It is _not_ possible to detect whether a URL is supported or not. That's
+because youtube-dl contains a generic extractor which matches ALL URLs.
+You may be tempted to disable, exclude, or remove the generic extractor,
+but the generic extractor not only allows users to extract videos from
+lots of websites that embed a video from another service, but may also
+be used to extract video from a service that it's hosting itself.
+Therefore, we neither recommend nor support disabling, excluding, or
+removing the generic extractor.
+
+If you want to find out whether a given URL is supported, simply call
+youtube-dl with it. If you get no videos back, chances are the URL is
+either not referring to a video or unsupported. You can find out which
+by examining the output (if you run youtube-dl on the console) or
+catching an UnsupportedError exception if you run it from a Python
+program.
+
+
+
+WHY DO I NEED TO GO THROUGH THAT MUCH RED TAPE WHEN FILING BUGS?
+
+
+Before we had the issue template, despite our extensive bug reporting
+instructions, about 80% of the issue reports we got were useless, for
+instance because people used ancient versions hundreds of releases old,
+because of simple syntactic errors (not in youtube-dl but in general
+shell usage), because the problem was already reported multiple times
+before, because people did not actually read an error message, even if
+it said "please install ffmpeg", because people did not mention the URL
+they were trying to download and many more simple, easy-to-avoid
+problems, many of which were totally unrelated to youtube-dl.
+
+youtube-dl is an open-source project manned by too few volunteers, so
+we'd rather spend time fixing bugs where we are certain none of those
+simple problems apply, and where we can be reasonably confident to be
+able to reproduce the issue without asking the reporter repeatedly. As
+such, the output of youtube-dl -v YOUR_URL_HERE is really all that's
+required to file an issue. The issue template also guides you through
+some basic steps you can do, such as checking that your version of
+youtube-dl is current.
+
+
+
+DEVELOPER INSTRUCTIONS
+
+
+Most users do not need to build youtube-dl and can download the builds
+or get them from their distribution.
+
+To run youtube-dl as a developer, you don't need to build anything
+either. Simply execute
+
+    python -m youtube_dl
+
+To run the tests, simply invoke your favorite test runner, or execute a
+test file directly; any of the following work:
+
+    python -m unittest discover
+    python test/test_download.py
+    nosetests
+
+If you want to create a build of youtube-dl yourself, you'll need
+
+-   python
+-   make (only GNU make is supported)
+-   pandoc
+-   zip
+-   nosetests
+
+Adding support for a new site
+
+If you want to add support for a new site, first of all MAKE SURE this
+site is NOT DEDICATED TO COPYRIGHT INFRINGEMENT. youtube-dl does NOT
+SUPPORT such sites, thus pull requests adding support for them WILL BE
+REJECTED.
+
+After you have ensured this site is distributing its content legally,
+you can follow this quick list (assuming your service is called
+yourextractor):
+
+1.  Fork this repository
+2.  Check out the source code with:
+
+        git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
+
+3.  Start a new git branch with
+
+        cd youtube-dl
+        git checkout -b yourextractor
+
+4.  Start with this simple template and save it to
+    youtube_dl/extractor/yourextractor.py:
+
+        # coding: utf-8
+        from __future__ import unicode_literals
+
+        from .common import InfoExtractor
+
+
+        class YourExtractorIE(InfoExtractor):
+            _VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
+            _TEST = {
+                'url': 'http://yourextractor.com/watch/42',
+                'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
+                'info_dict': {
+                    'id': '42',
+                    'ext': 'mp4',
+                    'title': 'Video title goes here',
+                    'thumbnail': r're:^https?://.*\.jpg$',
+                    # TODO more properties, either as:
+                    # * A value
+                    # * MD5 checksum; start the string with md5:
+                    # * A regular expression; start the string with re:
+                    # * Any Python type (for example int or float)
+                }
+            }
+
+            def _real_extract(self, url):
+                video_id = self._match_id(url)
+                webpage = self._download_webpage(url, video_id)
+
+                # TODO more code goes here, for example ...
+                title = self._html_search_regex(r'<h1>(.+?)</h1>', webpage, 'title')
+
+                return {
+                    'id': video_id,
+                    'title': title,
+                    'description': self._og_search_description(webpage),
+                    'uploader': self._search_regex(r'<div[^>]+id="uploader"[^>]*>([^<]+)<', webpage, 'uploader', fatal=False),
+                    # TODO more properties (see youtube_dl/extractor/common.py)
+                }
+
+5.  Add an import in youtube_dl/extractor/extractors.py.
+6.  Run python test/test_download.py TestDownload.test_YourExtractor.
+    This _should fail_ at first, but you can continually re-run it until
+    you're done. If you decide to add more than one test, then rename
+    _TEST to _TESTS and make it into a list of dictionaries. The tests
+    will then be named TestDownload.test_YourExtractor,
+    TestDownload.test_YourExtractor_1,
+    TestDownload.test_YourExtractor_2, etc.
+7.  Have a look at youtube_dl/extractor/common.py for possible helper
+    methods and a detailed description of what your extractor should and
+    may return. Add tests and code for as many as you want.
+8.  Make sure your code follows youtube-dl coding conventions and check
+    the code with flake8. Also make sure your code works under all
+    Python versions claimed supported by youtube-dl, namely 2.6, 2.7,
+    and 3.2+.
+9.  When the tests pass, add the new files and commit them and push the
+    result, like this:
+
+        $ git add youtube_dl/extractor/extractors.py
+        $ git add youtube_dl/extractor/yourextractor.py
+        $ git commit -m '[yourextractor] Add new extractor'
+        $ git push origin yourextractor
+
+10. Finally, create a pull request. We'll then review and merge it.
+
+In any case, thank you very much for your contributions!
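
Before running the full download test from step 6, it can help to confirm
that the _VALID_URL pattern from the template matches the URLs you expect.
The following is a minimal sketch using only the standard re module. The
helper name match_id is hypothetical (it imitates the idea of
InfoExtractor._match_id but is not that method), and the URLs are
placeholders:

```python
import re

# Pattern copied from the template in step 4.
VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'


def match_id(url):
    """Return the captured id, or None if the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


# Two URL shapes that should be accepted, and one that should not.
print(match_id('http://yourextractor.com/watch/42'))
print(match_id('https://www.yourextractor.com/watch/1234567'))
print(match_id('http://yourextractor.com/v/42'))
```

A URL that comes back as None here would not be routed to your extractor,
so it is worth trying every URL shape the site actually uses.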
+
+
+youtube-dl coding conventions
+
+This section introduces guidelines for writing idiomatic, robust and
+future-proof extractor code.
+
+Extractors are very fragile by nature since they depend on the layout of
+the source data provided by 3rd party media hosters out of your control
+and this layout tends to change. As an extractor implementer your task
+is not only to write code that will extract media links and metadata
+correctly but also to minimize dependency on the source's layout and
+even to make the code foresee potential future changes and be ready for
+that. This is important because it will allow the extractor not to break
+on minor layout changes thus keeping old youtube-dl versions working.
+Even though this breakage issue is easily fixed by emitting a new
+version of youtube-dl with a fix incorporated, all the previous versions
+become broken in all repositories and distros' packages that may not be
+so prompt in fetching the update from us. Needless to say, some non
+rolling release distros may never receive an update at all.
+
+Mandatory and optional metafields
+
+For extraction to work youtube-dl relies on metadata your extractor
+extracts and provides to youtube-dl expressed by an information
+dictionary or simply _info dict_. Only the following meta fields in the
+_info dict_ are considered mandatory for a successful extraction process
+by youtube-dl:
+
+-   id (media identifier)
+-   title (media title)
+-   url (media download URL) or formats
+
+In fact only the last option is technically mandatory (i.e. if you can't
+figure out the download location of the media the extraction does not
+make any sense). But by convention youtube-dl also treats id and title
+as mandatory. Thus the aforementioned metafields are the critical data
+that the extraction does not make any sense without and if any of them
+fail to be extracted then the extractor is considered completely broken.
+
+Any field apart from the aforementioned ones is considered OPTIONAL.
+That means that extraction should be TOLERANT to situations when sources
+for these fields can potentially be unavailable (even if they are always
+available at the moment) and FUTURE-PROOF in order not to break the
+extraction of general purpose mandatory fields.
+
+Example
+
+Say you have some source dictionary meta that you've fetched as JSON
+with an HTTP request and it has a key summary:
+
+    meta = self._download_json(url, video_id)
+
+Assume at this point meta's layout is:
+
+    {
+        ...
+        "summary": "some fancy summary text",
+        ...
+    }
+
+Assume you want to extract summary and put it into the resulting info
+dict as description. Since description is an optional meta field, you
+should be prepared for this key to be missing from the meta dict, so
+you should extract it like:
+
+    description = meta.get('summary')  # correct
+
+and not like:
+
+    description = meta['summary']  # incorrect
+
+The latter will break the extraction process with a KeyError if summary
+disappears from meta at some later time, but with the former approach
+extraction will just go ahead with description set to None, which is
+perfectly fine (remember None is equivalent to the absence of data).
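
The difference between the two lookups can be seen with a plain
dictionary, independent of youtube-dl. In this small sketch the meta
contents are made up, and the summary key is deliberately absent:

```python
# A source dictionary in which the optional 'summary' key has disappeared.
meta = {'title': 'some title'}

# Tolerant lookup: a missing key simply yields None.
description = meta.get('summary')

# Strict lookup: a missing key raises KeyError and would abort extraction.
try:
    meta['summary']
    strict_lookup_failed = False
except KeyError:
    strict_lookup_failed = True

print(description, strict_lookup_failed)
```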
+
+Similarly, you should pass fatal=False when extracting optional data
+from a webpage with _search_regex, _html_search_regex or similar
+methods, for instance:
+
+    description = self._search_regex(
+        r'<span[^>]+id="title"[^>]*>([^<]+)<',
+        webpage, 'description', fatal=False)
+
+With fatal set to False if _search_regex fails to extract description it
+will emit a warning and continue extraction.
+
+You can also pass default=<some fallback value>, for example:
+
+    description = self._search_regex(
+        r'<span[^>]+id="title"[^>]*>([^<]+)<',
+        webpage, 'description', default=None)
+
+On failure this code will silently continue the extraction with
+description set to None. That is useful for metafields that may or may
+not be present.
+
+Provide fallbacks
+
+When extracting metadata try to do so from multiple sources. For example
+if title is present in several places, try extracting from at least some
+of them. This makes it more future-proof in case some of the sources
+become unavailable.
+
+Example
+
+Say meta from the previous example has a title and you are about to
+extract it. Since title is a mandatory meta field you should end up with
+something like:
+
+    title = meta['title']
+
+If title disappears from meta in future due to some changes on the
+hoster's side the extraction would fail since title is mandatory. That's
+expected.
+
+Assume that you have another source you can extract title from, for
+example og:title HTML meta of a webpage. In this case you can provide a
+fallback scenario:
+
+    title = meta.get('title') or self._og_search_title(webpage)
+
+This code will try to extract from meta first and if it fails it will
+try extracting og:title from a webpage.
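
The fallback chain relies on Python's `or`, which evaluates its right-hand
side only when the left-hand side is missing or empty. A self-contained
sketch of that behaviour follows; the og_title function is a hypothetical
stand-in for self._og_search_title, and the webpage string is invented:

```python
import re


def og_title(webpage):
    """Hypothetical stand-in for self._og_search_title(webpage)."""
    mobj = re.search(
        r'<meta[^>]+property="og:title"[^>]+content="([^"]+)"', webpage)
    return mobj.group(1) if mobj else None


webpage = '<meta property="og:title" content="Title from the page">'

# Primary source available: the JSON value wins.
meta = {'title': 'Title from JSON'}
title = meta.get('title') or og_title(webpage)

# Primary source gone: the og:title fallback is used instead.
meta = {}
fallback_title = meta.get('title') or og_title(webpage)

print(title)
print(fallback_title)
```

Note that `or` also falls through on an empty string, which is usually
what you want for a title.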
+
+Make regular expressions flexible
+
+When using regular expressions try to write them fuzzy and flexible.
+
+Example
+
+Say you need to extract title from the following HTML code:
+
+    <span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
+
+The code for that task should look similar to:
+
+    title = self._search_regex(
+        r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
+
+Or even better:
+
+    title = self._search_regex(
+        r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
+        webpage, 'title', group='title')
+
+Note how this tolerates potential changes in the style attribute's value
+or a switch from double quotes to single quotes for the class attribute.
+
+The code definitely should not look like:
+
+    title = self._search_regex(
+        r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
+        webpage, 'title')
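
The contrast between the flexible and the rigid pattern can be checked
directly with the standard re module. This is only a sketch: the
"changed" markup is a hypothetical redesign of the page, invented here
to show what a minor layout change might look like.

```python
import re

FLEXIBLE = r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)'
RIGID = (r'<span style="position: absolute; left: 910px; width: 90px; '
         r'float: right; z-index: 9999;" class="title">(.*?)</span>')

original = ('<span style="position: absolute; left: 910px; width: 90px; '
            'float: right; z-index: 9999;" class="title">'
            'some fancy title</span>')
# Hypothetical redesign: new style value, single-quoted class attribute.
changed = "<span style=\"left: 10px;\" class='title'>some fancy title</span>"

for html in (original, changed):
    flexible = re.search(FLEXIBLE, html)
    rigid = re.search(RIGID, html)
    # The flexible pattern matches both; the rigid one only the original.
    print(flexible.group('title'), rigid is not None)
```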
+
+Use safe conversion functions
+
+Wrap all extracted numeric data into safe functions from utils:
+int_or_none, float_or_none. Use them for string to number conversions as
+well.
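
To illustrate why safe conversion matters, here is a simplified stand-in
written for this example. It is NOT the implementation found in
youtube_dl/utils.py, and the name to_int_or_none is hypothetical; in an
extractor you would import the real int_or_none and float_or_none instead.

```python
def to_int_or_none(value, default=None):
    """Convert value to int, returning default when that is not possible."""
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        # Unparseable input is treated like missing data.
        return default


print(to_int_or_none('42'))
print(to_int_or_none(None))
print(to_int_or_none('n/a'))
```

A bare int(meta['view_count']) would raise on any of the last two inputs
and break extraction over an optional field.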
+
+
+
+EMBEDDING YOUTUBE-DL
+
+
+youtube-dl makes the best effort to be a good command-line program, and
+thus should be callable from any programming language. If you encounter
+any problems parsing its output, feel free to create a report.
+
+From a Python program, you can embed youtube-dl in a more powerful
+fashion, like this:
+
+    from __future__ import unicode_literals
+    import youtube_dl
+
+    ydl_opts = {}
+    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
+        ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
+
+Most likely, you'll want to use various options. For a list of options
+available, have a look at youtube_dl/YoutubeDL.py. For a start, if you
+want to intercept youtube-dl's output, set a logger object.
+
+Here's a more complete example of a program that outputs only errors
+(and a short message after the download is finished), and
+downloads/converts the video to an mp3 file:
+
+    from __future__ import unicode_literals
+    import youtube_dl
+
+
+    class MyLogger(object):
+        def debug(self, msg):
+            pass
+
+        def warning(self, msg):
+            pass
+
+        def error(self, msg):
+            print(msg)
+
+
+    def my_hook(d):
+        if d['status'] == 'finished':
+            print('Done downloading, now converting ...')
+
+
+    ydl_opts = {
+        'format': 'bestaudio/best',
+        'postprocessors': [{
+            'key': 'FFmpegExtractAudio',
+            'preferredcodec': 'mp3',
+            'preferredquality': '192',
+        }],
+        'logger': MyLogger(),
+        'progress_hooks': [my_hook],
+    }
+    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
+        ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
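
A progress hook can report more than the finished state. The sketch below
formats a status dictionary into one line; it assumes the keys filename,
downloaded_bytes and total_bytes, which is how the hook dictionary is
described in youtube_dl/YoutubeDL.py as far as this example is aware, so
check that file for the authoritative list. The helper name
format_progress and the sample dictionaries are made up, and the hook is
exercised by hand here rather than through a real download.

```python
def format_progress(d):
    """Return a one-line summary for a progress-hook status dictionary."""
    if d['status'] == 'finished':
        return 'Finished %s' % d.get('filename')
    if d['status'] == 'downloading':
        done = d.get('downloaded_bytes')
        total = d.get('total_bytes')
        # total_bytes may be missing or None when the size is unknown.
        if done is not None and total:
            return '%.1f%% of %s' % (100.0 * done / total, d.get('filename'))
        return 'Downloading %s' % d.get('filename')
    return 'Status: %s' % d['status']


def my_hook(d):
    print(format_progress(d))


# Hand-made status dictionaries standing in for what a download would send.
my_hook({'status': 'downloading', 'filename': 'a.mp4',
         'downloaded_bytes': 512, 'total_bytes': 2048})
my_hook({'status': 'downloading', 'filename': 'a.mp4'})
my_hook({'status': 'finished', 'filename': 'a.mp4'})
```

Passing my_hook in the progress_hooks list, as in the example above, would
wire it into an actual download.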
+
+
+
+BUGS
+
+
+Bugs and suggestions should be reported at:
+https://github.com/rg3/youtube-dl/issues. Unless you were prompted to or
+there is another pertinent reason (e.g. GitHub fails to accept the bug
+report), please do not send bug reports via personal email. For
+discussions, join us in the IRC channel #youtube-dl on freenode
+(webchat).
+
+PLEASE INCLUDE THE FULL OUTPUT OF YOUTUBE-DL WHEN RUN WITH -v, i.e. ADD
+-v flag to YOUR COMMAND LINE, copy the WHOLE output and post it in the
+issue body wrapped in ``` for better formatting. It should look similar
+to this:
+
+    $ youtube-dl -v <your command line>
+    [debug] System config: []
+    [debug] User config: []
+    [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
+    [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
+    [debug] youtube-dl version 2015.12.06
+    [debug] Git HEAD: 135392e
+    [debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
+    [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
+    [debug] Proxy map: {}
+    ...
+
+DO NOT POST SCREENSHOTS OF VERBOSE LOGS; ONLY PLAIN TEXT IS ACCEPTABLE.
+
+The output (including the first lines) contains important debugging
+information. Issues without the full output are often not reproducible
+and therefore do not get solved in short order, if ever.
+
+Please re-read your issue once again to avoid a couple of common
+mistakes (you can and should use this as a checklist):
+
+Is the description of the issue itself sufficient?
+
+We often get issue reports that we cannot really decipher. While in most
+cases we eventually get the required information after asking back
+multiple times, this poses an unnecessary drain on our resources. Many
+contributors, including myself, are also not native speakers, so we may
+misread some parts.
+
+So please elaborate on what feature you are requesting, or what bug you
+want to be fixed. Make sure that it's obvious
+
+-   What the problem is
+-   How it could be fixed
+-   What your proposed solution would look like
+
+If your report is shorter than two lines, it is almost certainly missing
+some of these, which makes it hard for us to respond to it. We're often
+too polite to close the issue outright, but the missing info makes
+misinterpretation likely. As a committer myself, I often get frustrated
+by these issues, since the only possible way for me to move forward on
+them is to ask for clarification over and over.
+
+For bug reports, this means that your report should contain the
+_complete_ output of youtube-dl when called with the -v flag. The error
+message you get for (most) bugs even says so, but you would not believe
+how many of our bug reports do not contain this information.
+
+If your server has multiple IPs or you suspect censorship, adding
+--call-home may be a good idea to get more diagnostics. If the error is
+ERROR: Unable to extract ... and you cannot reproduce it from multiple
+countries, add --dump-pages (warning: this will yield a rather large
+output, redirect it to the file log.txt by adding >log.txt 2>&1 to your
+command-line) or upload the .dump files you get when you add
+--write-pages somewhere.
+
+SITE SUPPORT REQUESTS MUST CONTAIN AN EXAMPLE URL. An example URL is a
+URL you might want to download, like
+http://www.youtube.com/watch?v=BaW_jenozKc. There should be an obvious
+video present. Except under very special circumstances, the main page of
+a video service (e.g. http://www.youtube.com/) is _not_ an example URL.
+
+Are you using the latest version?
+
+Before reporting any issue, type youtube-dl -U. This should report that
+you're up-to-date. About 20% of the reports we receive are already
+fixed, but people are using outdated versions. This goes for feature
+requests as well.
+
+Is the issue already documented?
+
+Make sure that someone has not already opened the issue you're trying to
+open. Search at the top of the window or browse the GitHub Issues of
+this repository. If there is an issue, feel free to write something
+along the lines of "This affects me as well, with version 2015.01.01.
+Here is some more information on the issue: ...". While some issues may
+be old, a new post into them often spurs rapid activity.
+
+Why are existing options not enough?
+
+Before requesting a new feature, please have a quick peek at the list of
+supported options. Many feature requests are for features that actually
+exist already! Please, absolutely do show off your work in the issue
+report and detail how the existing similar options do _not_ solve your
+problem.
+
+Is there enough context in your bug report?
+
+People want to solve problems, and often think they do us a favor by
+breaking down their larger problems (e.g. wanting to skip already
+downloaded files) to a specific request (e.g. requesting us to look
+whether the file exists before downloading the info page). However, what
+often happens is that they break down the problem into two steps: One
+simple, and one impossible (or extremely complicated one).
+
+We are then presented with a very complicated request when the original
+problem could be solved far easier, e.g. by recording the downloaded
+video IDs in a separate file. To avoid this, you must include the
+greater context where it is non-obvious. In particular, every feature
+request that does not consist of adding support for a new site should
+contain a use case scenario that explains in what situation the missing
+feature would be useful.
+
+Does the issue involve one problem, and one problem only?
+
+Some of our users seem to think there is a limit of issues they can or
+should open. There is no limit of issues they can or should open. While
+it may seem appealing to be able to dump all your issues into one
+ticket, that means that someone who solves one of your issues cannot
+mark the issue as closed. Typically, reporting a bunch of issues leads
+to the ticket lingering since nobody wants to attack that behemoth,
+until someone mercifully splits the issue into multiple ones.
+
+In particular, every site support request issue should only pertain to
+services at one site (generally under a common domain, but always using
+the same backend technology). Do not request support for vimeo user
+videos, White house podcasts, and Google Plus pages in the same issue.
+Also, make sure that you don't post bug reports alongside feature
+requests. As a rule of thumb, a feature request does not include outputs
+of youtube-dl that are not immediately related to the feature at hand.
+Do not post reports of a network error alongside the request for a new
+video service.
+
+Is anyone going to need the feature?
+
+Only post features that you (or an incapacitated friend you can
+personally talk to) require. Do not post features because they seem like
+a good idea. If they are really useful, they will be requested by
+someone who requires them.
+
+Is your question about youtube-dl?
+
+It may sound strange, but some bug reports we receive are completely
+unrelated to youtube-dl and relate to a different, or even the
+reporter's own, application. Please make sure that you are actually
+using youtube-dl. If you are using a UI for youtube-dl, report the bug
+to the maintainer of the actual application providing the UI. On the
+other hand, if your UI for youtube-dl fails in some way you believe is
+related to youtube-dl, by all means, go ahead and report the bug.
+
+
+
+COPYRIGHT
+
+
+youtube-dl is released into the public domain by the copyright holders.
+
+This README file was originally written by Daniel Bolton and is likewise
+released into the public domain.
diff --git a/devscripts/buildserver.py b/devscripts/buildserver.py

index fc99c3213dddf985cfcf4fe74584cc09eeaf3175..1344b4d87b554b690fa8d5f0fab5462b7397aaea 100644 (file)
--- a/devscripts/buildserver.py
+++ b/devscripts/buildserver.py
@@ -424,8 +424,6 @@ class BuildHTTPRequestHandler(compat_http_server.BaseHTTPRequestHandler):
                      self.send_header('Content-Length', len(msg))
                      self.end_headers()
                      self.wfile.write(msg)
-                except HTTPError as e:
-                    self.send_response(e.code, str(e))
              else:
                  self.send_response(500, 'Unknown build method "%s"' % action)
          else:
diff --git a/docs/supportedsites.md b/docs/supportedsites.md

index edb76d9cc9eef838d1146a487e80fbc35a9f5ae7..2d82cc321cbbc4e608382ab0160a4bda50b282fe 100644 (file)
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -33,7 +33,8 @@
   - **AdobeTVVideo**
   - **AdultSwim**
   - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- - **AfreecaTV**: afreecatv.com
+ - **afreecatv**: afreecatv.com
+ - **afreecatv:global**: afreecatv.com
   - **AirMozilla**
   - **AlJazeera**
   - **Allocine**
@@ -74,6 +75,8 @@
   - **awaan:live**
   - **awaan:season**
   - **awaan:video**
+ - **AZMedien**: AZ Medien videos
+ - **AZMedienPlaylist**: AZ Medien playlists
   - **Azubu**
   - **AzubuLive**
   - **BaiduVideo**: 百度视频
@@ -81,11 +84,13 @@
   - **bambuser:channel**
   - **Bandcamp**
   - **Bandcamp:album**
+ - **bangumi.bilibili.com**: BiliBili番剧
   - **bbc**: BBC
   - **bbc.co.uk**: BBC iPlayer
   - **bbc.co.uk:article**: BBC articles
   - **bbc.co.uk:iplayer:playlist**
   - **bbc.co.uk:playlist**
+ - **Beam:live**
   - **Beatport**
   - **Beeg**
   - **BehindKink**
@@ -131,7 +136,8 @@
   - **cbsnews**: CBS News
   - **cbsnews:livevideo**: CBS News Live Videos
   - **CBSSports**
- - **CCTV**
+ - **CCMA**
+ - **CCTV**: 央视网
   - **CDA**
   - **CeskaTelevize**
   - **channel9**: Channel 9
@@ -197,6 +203,7 @@
   - **Digiteka**
   - **Discovery**
   - **DiscoveryGo**
+ - **Disney**
   - **Dotsub**
   - **DouyuTV**: 斗鱼
   - **DPlay**
@@ -205,7 +212,8 @@
   - **DRBonanza**
   - **Dropbox**
   - **DrTuber**
- - **DRTV**
+ - **drtv**
+ - **drtv:live**
   - **Dumpert**
   - **dvtv**: http://video.aktualne.cz/
   - **dw**
@@ -213,6 +221,7 @@
   - **EaglePlatform**
   - **EbaumsWorld**
   - **EchoMsk**
+ - **egghead:course**: egghead.io course
   - **eHow**
   - **Einthusan**
   - **eitb.tv**
@@ -239,8 +248,9 @@
   - **fc2**
   - **fc2:embed**
   - **Fczenit**
- - **features.aol.com**
   - **fernsehkritik.tv**
+ - **filmon**
+ - **filmon:channel**
   - **Firstpost**
   - **FiveTV**
   - **Flickr**
@@ -262,7 +272,6 @@
   - **francetvinfo.fr**
   - **Freesound**
   - **freespeech.org**
- - **FreeVideo**
   - **Funimation**
   - **FunnyOrDie**
   - **Fusion**
@@ -273,6 +282,7 @@
   - **Gamersyde**
   - **GameSpot**
   - **GameStar**
+ - **Gaskrank**
   - **Gazeta**
   - **GDCVault**
   - **generic**: Generic downloader that works on some sites
@@ -304,6 +314,7 @@
   - **history:topic**: History.com Topic
   - **hitbox**
   - **hitbox:live**
+ - **HitRecord**
   - **HornBunny**
   - **HotNewHipHop**
   - **HotStar**
@@ -321,6 +332,7 @@
   - **Imgur**
   - **ImgurAlbum**
   - **Ina**
+ - **Inc**
   - **Indavideo**
   - **IndavideoEmbed**
   - **InfoQ**
@@ -330,6 +342,7 @@
   - **IPrima**
   - **iqiyi**: 爱奇艺
   - **Ir90Tv**
+ - **ITV**
   - **ivi**: ivi.ru
   - **ivi:compilation**: ivi.ru compilations
   - **ivideon**: Ivideon TV
@@ -364,7 +377,8 @@
   - **kuwo:singer**: 酷我音乐 - 歌手
   - **kuwo:song**: 酷我音乐
   - **la7.it**
- - **Laola1Tv**
+ - **laola1tv**
+ - **laola1tv:embed**
   - **LCI**
   - **Lcp**
   - **LcpPlay**
@@ -402,6 +416,8 @@
   - **MatchTV**
   - **MDR**: MDR.DE and KiKA
   - **media.ccc.de**
+ - **Meipai**: 美拍
+ - **MelonVOD**
   - **META**
   - **metacafe**
   - **Metacritic**
@@ -435,6 +451,7 @@
   - **mtg**: MTG services
   - **mtv**
   - **mtv.de**
+ - **mtv81**
   - **mtv:video**
   - **mtvservices:embedded**
   - **MuenchenTV**: münchen.tv
@@ -477,6 +494,7 @@
   - **Newstube**
   - **NextMedia**: 蘋果日報
   - **NextMediaActionNews**: 蘋果日報 - 動新聞
+ - **NextTV**: 壹電視
   - **nfb**: National Film Board of Canada
   - **nfl.com**
   - **NhkVod**
@@ -513,6 +531,9 @@
   - **NRKPlaylist**
   - **NRKSkole**: NRK Skole
   - **NRKTV**: NRK TV and NRK Radio
+ - **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte
+ - **NRKTVEpisodes**
+ - **NRKTVSeries**
   - **ntv.ru**
   - **Nuvid**
   - **NYTimes**
@@ -523,6 +544,7 @@
   - **Odnoklassniki**
   - **OktoberfestTV**
   - **on.aol.com**
+ - **OnDemandKorea**
   - **onet.tv**
   - **onet.tv:channel**
   - **OnionStudios**
@@ -546,6 +568,7 @@
   - **PhilharmonieDeParis**: Philharmonie de Paris
   - **phoenix.de**
   - **Photobucket**
+ - **Piksel**
   - **Pinkbike**
   - **Pladform**
   - **play.fm**
@@ -562,6 +585,7 @@
   - **PolskieRadio**
   - **PolskieRadioCategory**
   - **PornCom**
+ - **PornFlip**
   - **PornHd**
   - **PornHub**: PornHub and Thumbzilla
   - **PornHubPlaylist**
@@ -643,7 +667,6 @@
   - **screen.yahoo:search**: Yahoo screen search
   - **Screencast**
   - **ScreencastOMatic**
- - **ScreenJunkies**
   - **Seeker**
   - **SenateISVP**
   - **SendtoNews**
@@ -651,7 +674,7 @@
   - **Sexu**
   - **Shahid**
   - **Shared**: shared.sx
- - **ShareSix**
+ - **ShowRoomLive**
   - **Sina**
   - **SixPlay**
   - **skynewsarabia:article**
@@ -685,7 +708,6 @@
   - **Spiegeltv**
   - **Spike**
   - **Sport5**
- - **SportBox**
   - **SportBoxEmbed**
   - **SportDeutschland**
   - **Sportschau**
@@ -771,6 +793,7 @@
   - **TV2Article**
   - **TV3**
   - **TV4**: tv4.se and tv4play.se
+ - **TVA**
   - **TVANouvelles**
   - **TVANouvellesArticle**
   - **TVC**
@@ -784,10 +807,13 @@
   - **Tweakers**
   - **twitch:chapter**
   - **twitch:clips**
- - **twitch:past_broadcasts**
   - **twitch:profile**
   - **twitch:stream**
   - **twitch:video**
+ - **twitch:videos:all**
+ - **twitch:videos:highlights**
+ - **twitch:videos:past-broadcasts**
+ - **twitch:videos:uploads**
   - **twitch:vod**
   - **twitter**
   - **twitter:amplify**
@@ -795,6 +821,7 @@
   - **udemy**
   - **udemy:course**
   - **UDNEmbed**: 聯合影音
+ - **UKTVPlay**
   - **Unistra**
   - **uol.com.br**
   - **uplynk**
@@ -823,6 +850,7 @@
   - **ViceShow**
   - **Vidbit**
   - **Viddler**
+ - **Videa**
   - **video.google:search**: Google Video search
   - **video.mit.edu**
   - **VideoDetective**
@@ -832,7 +860,7 @@
   - **videomore:season**
   - **videomore:video**
   - **VideoPremium**
- - **VideoTt**: video.tt - Your True Tube (Currently broken)
+ - **VideoPress**
   - **videoweed**: VideoWeed
   - **Vidio**
   - **vidme**
@@ -859,11 +887,15 @@
   - **Vimple**: Vimple - one-click video hosting
   - **Vine**
   - **vine:user**
+ - **Viu**
+ - **viu:ott**
+ - **viu:playlist**
   - **Vivo**: vivo.sx
   - **vk**: VK
   - **vk:uservideos**: VK - User's Videos
   - **vk:wallpost**
   - **vlive**
+ - **vlive:channel**
   - **Vodlocker**
   - **VODPlatform**
   - **VoiceRepublic**
@@ -873,6 +905,7 @@
   - **VRT**
   - **vube**: Vube.com
   - **VuClip**
+ - **VVVVID**
   - **VyboryMos**
   - **Vzaar**
   - **Walla**
diff --git a/setup.cfg b/setup.cfg

deleted file mode 100644 (file)

index 2dc06ff..0000000
--- a/setup.cfg
+++ /dev/null
@@ -1,6 +0,0 @@
-[wheel]
-universal = True
-
-[flake8]
-exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
-ignore = E402,E501,E731
diff --git a/test/test_utils.py b/test/test_utils.py

index 2e3cd0179db9dd97792fafa695f7e7a043542a38..edc712f0741576c852be2b528f95dcf81f309bfc 100644 (file)
--- a/test/test_utils.py
+++ b/test/test_utils.py
@@ -70,6 +70,7 @@ from youtube_dl.utils import (
      lowercase_escape,
      url_basename,
      base_url,
+    urljoin,
      urlencode_postdata,
      urshift,
      update_url_query,
@@ -294,6 +295,9 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
          self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
          self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
+        self.assertEqual(unified_strdate('July 15th, 2013'), '20130715')
+        self.assertEqual(unified_strdate('September 1st, 2013'), '20130901')
+        self.assertEqual(unified_strdate('Sep 2nd, 2013'), '20130902')
  
      def test_unified_timestamps(self):
          self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@@ -445,6 +449,23 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(base_url('http://foo.de/bar/baz'), 'http://foo.de/bar/')
          self.assertEqual(base_url('http://foo.de/bar/baz?x=z/x/c'), 'http://foo.de/bar/')
  
+    def test_urljoin(self):
+        self.assertEqual(urljoin('http://foo.de/', '/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('//foo.de/', '/a/b/c.txt'), '//foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de/', 'a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de', '/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de', 'a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de/', 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de/', '//foo.de/a/b/c.txt'), '//foo.de/a/b/c.txt')
+        self.assertEqual(urljoin(None, 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin(None, '//foo.de/a/b/c.txt'), '//foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('', 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin(['foobar'], 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de/', None), None)
+        self.assertEqual(urljoin('http://foo.de/', ''), None)
+        self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
+        self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
+
      def test_parse_age_limit(self):
          self.assertEqual(parse_age_limit(None), None)
          self.assertEqual(parse_age_limit(False), None)
@@ -489,6 +510,7 @@ class TestUtil(unittest.TestCase):
          self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
          self.assertEqual(parse_duration('87 Min.'), 5220)
          self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
+        self.assertEqual(parse_duration('PT00H03M30SZ'), 210)
  
      def test_fix_xml_ampersands(self):
          self.assertEqual(
@@ -763,12 +785,27 @@ class TestUtil(unittest.TestCase):
          on = js_to_json('["abc", "def",]')
          self.assertEqual(json.loads(on), ['abc', 'def'])
  
+        on = js_to_json('[/*comment\n*/"abc"/*comment\n*/,/*comment\n*/"def",/*comment\n*/]')
+        self.assertEqual(json.loads(on), ['abc', 'def'])
+
+        on = js_to_json('[//comment\n"abc" //comment\n,//comment\n"def",//comment\n]')
+        self.assertEqual(json.loads(on), ['abc', 'def'])
+
          on = js_to_json('{"abc": "def",}')
          self.assertEqual(json.loads(on), {'abc': 'def'})
  
+        on = js_to_json('{/*comment\n*/"abc"/*comment\n*/:/*comment\n*/"def"/*comment\n*/,/*comment\n*/}')
+        self.assertEqual(json.loads(on), {'abc': 'def'})
+
          on = js_to_json('{ 0: /* " \n */ ",]" , }')
          self.assertEqual(json.loads(on), {'0': ',]'})
  
+        on = js_to_json('{ /*comment\n*/0/*comment\n*/: /* " \n */ ",]" , }')
+        self.assertEqual(json.loads(on), {'0': ',]'})
+
+        on = js_to_json('{ 0: // comment\n1 }')
+        self.assertEqual(json.loads(on), {'0': 1})
+
          on = js_to_json(r'["<p>x<\/p>"]')
          self.assertEqual(json.loads(on), ['<p>x</p>'])
  
@@ -778,15 +815,27 @@ class TestUtil(unittest.TestCase):
          on = js_to_json("['a\\\nb']")
          self.assertEqual(json.loads(on), ['ab'])
  
+        on = js_to_json("/*comment\n*/[/*comment\n*/'a\\\nb'/*comment\n*/]/*comment\n*/")
+        self.assertEqual(json.loads(on), ['ab'])
+
          on = js_to_json('{0xff:0xff}')
          self.assertEqual(json.loads(on), {'255': 255})
  
+        on = js_to_json('{/*comment\n*/0xff/*comment\n*/:/*comment\n*/0xff/*comment\n*/}')
+        self.assertEqual(json.loads(on), {'255': 255})
+
          on = js_to_json('{077:077}')
          self.assertEqual(json.loads(on), {'63': 63})
  
+        on = js_to_json('{/*comment\n*/077/*comment\n*/:/*comment\n*/077/*comment\n*/}')
+        self.assertEqual(json.loads(on), {'63': 63})
+
          on = js_to_json('{42:42}')
          self.assertEqual(json.loads(on), {'42': 42})
  
+        on = js_to_json('{/*comment\n*/42/*comment\n*/:/*comment\n*/42/*comment\n*/}')
+        self.assertEqual(json.loads(on), {'42': 42})
+
      def test_extract_attributes(self):
          self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
          self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
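Note on the `test_urljoin` cases added above: read together, they suggest a helper that returns `None` for a missing or non-string path, passes through a path that is already absolute or protocol-relative, and otherwise resolves the path against the base. The following is only a minimal sketch of that behaviour for Python 3. The name `urljoin_sketch` is hypothetical and this is not the implementation shipped in `youtube_dl/utils.py`. Returning `None` for a relative path with an unusable base is an assumption, since the tests do not cover that case.

```python
import re
from urllib.parse import urljoin as _stdlib_urljoin


def urljoin_sketch(base, path):
    """Hypothetical helper modelled on the expectations in test_urljoin."""
    # A missing, empty, or non-string path cannot be joined.
    if not isinstance(path, str) or not path:
        return None
    # A path that is already absolute or protocol-relative is returned as-is,
    # whatever the base is.
    if re.match(r'^(?:https?:)?//', path):
        return path
    # Assumption: a relative path needs a usable base URL; otherwise give up.
    if not isinstance(base, str) or not re.match(r'^(?:https?:)?//', base):
        return None
    # Delegate the actual resolution to the standard library.
    return _stdlib_urljoin(base, path)
```

For instance, `urljoin_sketch('http://foo.de/', 'a/b/c.txt')` resolves to `'http://foo.de/a/b/c.txt'`, while `urljoin_sketch('http://foo.de/', None)` yields `None`.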
diff --git a/tox.ini b/tox.ini

deleted file mode 100644 (file)

index 9c4e4a3..0000000
--- a/tox.ini
+++ /dev/null
@@ -1,14 +0,0 @@
-[tox]
-envlist = py26,py27,py33,py34,py35
-[testenv]
-deps =
-   nose
-   coverage
-# We need a valid $HOME for test_compat_expanduser
-passenv = HOME
-defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
-    --exclude test_subtitles.py --exclude test_write_annotations.py
-    --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
-    --exclude test_socks.py
-commands = nosetests --verbose {posargs:{[testenv]defaultargs}}  # --with-coverage --cover-package=youtube_dl --cover-html
-                                               # test.test_download:TestDownload.test_NowVideo
diff --git a/youtube-dl b/youtube-dl

new file mode 100755 (executable)

index 0000000..2b3870d

Binary files /dev/null and b/youtube-dl differ
diff --git a/youtube-dl.1 b/youtube-dl.1

new file mode 100644 (file)

index 0000000..38c41e2
--- /dev/null
+++ b/youtube-dl.1
@@ -0,0 +1,2490 @@
+.TH "YOUTUBE\-DL" "1" "" "" ""
+.SH NAME
+.PP
+youtube\-dl \- download videos from youtube.com or other video platforms
+.SH SYNOPSIS
+.PP
+\f[B]youtube\-dl\f[] [OPTIONS] URL [URL...]
+.SH DESCRIPTION
+.PP
+\f[B]youtube\-dl\f[] is a command\-line program to download videos from
+YouTube.com and a few more sites.
+It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is
+not platform specific.
+It should work on your Unix box, on Windows or on Mac OS X.
+It is released to the public domain, which means you can modify it,
+redistribute it or use it however you like.
+.SH OPTIONS
+.TP
+.B \-h, \-\-help
+Print this help text and exit
+.RS
+.RE
+.TP
+.B \-\-version
+Print program version and exit
+.RS
+.RE
+.TP
+.B \-U, \-\-update
+Update this program to latest version.
+Make sure that you have sufficient permissions (run with sudo if needed)
+.RS
+.RE
+.TP
+.B \-i, \-\-ignore\-errors
+Continue on download errors, for example to skip unavailable videos in a
+playlist
+.RS
+.RE
+.TP
+.B \-\-abort\-on\-error
+Abort downloading of further videos (in the playlist or the command
+line) if an error occurs
+.RS
+.RE
+.TP
+.B \-\-dump\-user\-agent
+Display the current browser identification
+.RS
+.RE
+.TP
+.B \-\-list\-extractors
+List all supported extractors
+.RS
+.RE
+.TP
+.B \-\-extractor\-descriptions
+Output descriptions of all supported extractors
+.RS
+.RE
+.TP
+.B \-\-force\-generic\-extractor
+Force extraction to use the generic extractor
+.RS
+.RE
+.TP
+.B \-\-default\-search \f[I]PREFIX\f[]
+Use this prefix for unqualified URLs.
+For example "gvsearch2:" downloads two videos from google videos for
+youtube\-dl "large apple".
+Use the value "auto" to let youtube\-dl guess ("auto_warning" to emit a
+warning when guessing).
+"error" just throws an error.
+The default value "fixup_error" repairs broken URLs, but emits an error
+if this is not possible instead of searching.
+.RS
+.RE
+.TP
+.B \-\-ignore\-config
+Do not read configuration files.
+When given in the global configuration file /etc/youtube\-dl.conf: Do
+not read the user configuration in ~/.config/youtube\-dl/config
+(%APPDATA%/youtube\-dl/config.txt on Windows)
+.RS
+.RE
+.TP
+.B \-\-config\-location \f[I]PATH\f[]
+Location of the configuration file; either the path to the config or its
+containing directory.
+.RS
+.RE
+.TP
+.B \-\-flat\-playlist
+Do not extract the videos of a playlist, only list them.
+.RS
+.RE
+.TP
+.B \-\-mark\-watched
+Mark videos watched (YouTube only)
+.RS
+.RE
+.TP
+.B \-\-no\-mark\-watched
+Do not mark videos watched (YouTube only)
+.RS
+.RE
+.TP
+.B \-\-no\-color
+Do not emit color codes in output
+.RS
+.RE
+.SS Network Options:
+.TP
+.B \-\-proxy \f[I]URL\f[]
+Use the specified HTTP/HTTPS/SOCKS proxy.
+To enable experimental SOCKS proxy, specify a proper scheme.
+For example socks5://127.0.0.1:1080/.
+Pass in an empty string (\-\-proxy "") for direct connection
+.RS
+.RE
+.TP
+.B \-\-socket\-timeout \f[I]SECONDS\f[]
+Time to wait before giving up, in seconds
+.RS
+.RE
+.TP
+.B \-\-source\-address \f[I]IP\f[]
+Client\-side IP address to bind to
+.RS
+.RE
+.TP
+.B \-4, \-\-force\-ipv4
+Make all connections via IPv4
+.RS
+.RE
+.TP
+.B \-6, \-\-force\-ipv6
+Make all connections via IPv6
+.RS
+.RE
+.TP
+.B \-\-geo\-verification\-proxy \f[I]URL\f[]
+Use this proxy to verify the IP address for some geo\-restricted sites.
+The default proxy specified by \-\-proxy (or none, if the option is not
+present) is used for the actual downloading.
+.RS
+.RE
+.SS Video Selection:
+.TP
+.B \-\-playlist\-start \f[I]NUMBER\f[]
+Playlist video to start at (default is 1)
+.RS
+.RE
+.TP
+.B \-\-playlist\-end \f[I]NUMBER\f[]
+Playlist video to end at (default is last)
+.RS
+.RE
+.TP
+.B \-\-playlist\-items \f[I]ITEM_SPEC\f[]
+Playlist video items to download.
+Specify indices of the videos in the playlist separated by commas like:
+"\-\-playlist\-items 1,2,5,8" if you want to download videos indexed 1,
+2, 5, 8 in the playlist.
+You can specify range: "\-\-playlist\-items 1\-3,7,10\-13", it will
+download the videos at index 1, 2, 3, 7, 10, 11, 12 and 13.
+.RS
+.RE
+.TP
+.B \-\-match\-title \f[I]REGEX\f[]
+Download only matching titles (regex or caseless sub\-string)
+.RS
+.RE
+.TP
+.B \-\-reject\-title \f[I]REGEX\f[]
+Skip download for matching titles (regex or caseless sub\-string)
+.RS
+.RE
+.TP
+.B \-\-max\-downloads \f[I]NUMBER\f[]
+Abort after downloading NUMBER files
+.RS
+.RE
+.TP
+.B \-\-min\-filesize \f[I]SIZE\f[]
+Do not download any videos smaller than SIZE (e.g.
+50k or 44.6m)
+.RS
+.RE
+.TP
+.B \-\-max\-filesize \f[I]SIZE\f[]
+Do not download any videos larger than SIZE (e.g.
+50k or 44.6m)
+.RS
+.RE
+.TP
+.B \-\-date \f[I]DATE\f[]
+Download only videos uploaded on this date
+.RS
+.RE
+.TP
+.B \-\-datebefore \f[I]DATE\f[]
+Download only videos uploaded on or before this date (i.e.
+inclusive)
+.RS
+.RE
+.TP
+.B \-\-dateafter \f[I]DATE\f[]
+Download only videos uploaded on or after this date (i.e.
+inclusive)
+.RS
+.RE
+.TP
+.B \-\-min\-views \f[I]COUNT\f[]
+Do not download any videos with fewer than COUNT views
+.RS
+.RE
+.TP
+.B \-\-max\-views \f[I]COUNT\f[]
+Do not download any videos with more than COUNT views
+.RS
+.RE
+.TP
+.B \-\-match\-filter \f[I]FILTER\f[]
+Generic video filter.
+Specify any key (see help for \-o for a list of available keys) to match
+if the key is present, !key to check if the key is not present, key >
+NUMBER (like "comment_count > 12", also works with >=, <, <=, !=, =) to
+compare against a number, and & to require multiple matches.
+Values which are not known are excluded unless you put a question mark
+(?) after the operator.
+For example, to only match videos that have been liked more than 100
+times and disliked fewer than 50 times (or where the dislike
+functionality is not available at the given service), but that also
+have a description, use
+\-\-match\-filter "like_count > 100 & dislike_count <? 50 & description".
+.RS
+.RE
+.TP
+.B \-\-no\-playlist
+Download only the video, if the URL refers to a video and a playlist.
+.RS
+.RE
+.TP
+.B \-\-yes\-playlist
+Download the playlist, if the URL refers to a video and a playlist.
+.RS
+.RE
+.TP
+.B \-\-age\-limit \f[I]YEARS\f[]
+Download only videos suitable for the given age
+.RS
+.RE
+.TP
+.B \-\-download\-archive \f[I]FILE\f[]
+Download only videos not listed in the archive file.
+Record the IDs of all downloaded videos in it.
+.RS
+.RE
+.TP
+.B \-\-include\-ads
+Download advertisements as well (experimental)
+.RS
+.RE
+.SS Download Options:
+.TP
+.B \-r, \-\-limit\-rate \f[I]RATE\f[]
+Maximum download rate in bytes per second (e.g.
+50K or 4.2M)
+.RS
+.RE
+.TP
+.B \-R, \-\-retries \f[I]RETRIES\f[]
+Number of retries (default is 10), or "infinite".
+.RS
+.RE
+.TP
+.B \-\-fragment\-retries \f[I]RETRIES\f[]
+Number of retries for a fragment (default is 10), or "infinite" (DASH
+and hlsnative only)
+.RS
+.RE
+.TP
+.B \-\-skip\-unavailable\-fragments
+Skip unavailable fragments (DASH and hlsnative only)
+.RS
+.RE
+.TP
+.B \-\-abort\-on\-unavailable\-fragment
+Abort downloading when some fragment is not available
+.RS
+.RE
+.TP
+.B \-\-buffer\-size \f[I]SIZE\f[]
+Size of download buffer (e.g.
+1024 or 16K) (default is 1024)
+.RS
+.RE
+.TP
+.B \-\-no\-resize\-buffer
+Do not automatically adjust the buffer size.
+By default, the buffer size is automatically resized from an initial
+value of SIZE.
+.RS
+.RE
+.TP
+.B \-\-playlist\-reverse
+Download playlist videos in reverse order
+.RS
+.RE
+.TP
+.B \-\-playlist\-random
+Download playlist videos in random order
+.RS
+.RE
+.TP
+.B \-\-xattr\-set\-filesize
+Set file xattribute ytdl.filesize with expected file size (experimental)
+.RS
+.RE
+.TP
+.B \-\-hls\-prefer\-native
+Use the native HLS downloader instead of ffmpeg
+.RS
+.RE
+.TP
+.B \-\-hls\-prefer\-ffmpeg
+Use ffmpeg instead of the native HLS downloader
+.RS
+.RE
+.TP
+.B \-\-hls\-use\-mpegts
+Use the mpegts container for HLS videos, which allows playing the video
+while downloading (some players may not be able to play it)
+.RS
+.RE
+.TP
+.B \-\-external\-downloader \f[I]COMMAND\f[]
+Use the specified external downloader.
+Currently supports aria2c,avconv,axel,curl,ffmpeg,httpie,wget
+.RS
+.RE
+.TP
+.B \-\-external\-downloader\-args \f[I]ARGS\f[]
+Give these arguments to the external downloader
+.RS
+.RE
+.SS Filesystem Options:
+.TP
+.B \-a, \-\-batch\-file \f[I]FILE\f[]
+File containing URLs to download (\[aq]\-\[aq] for stdin)
+.RS
+.RE
+.TP
+.B \-\-id
+Use only video ID in file name
+.RS
+.RE
+.TP
+.B \-o, \-\-output \f[I]TEMPLATE\f[]
+Output filename template, see the "OUTPUT TEMPLATE" for all the info
+.RS
+.RE
+.TP
+.B \-\-autonumber\-size \f[I]NUMBER\f[]
+Specify the number of digits in %(autonumber)s when it is present in
+output filename template or \-\-auto\-number option is given (default is
+5)
+.RS
+.RE
+.TP
+.B \-\-autonumber\-start \f[I]NUMBER\f[]
+Specify the start value for %(autonumber)s (default is 1)
+.RS
+.RE
+.TP
+.B \-\-restrict\-filenames
+Restrict filenames to only ASCII characters, and avoid "&" and spaces in
+filenames
+.RS
+.RE
+.TP
+.B \-A, \-\-auto\-number
+[deprecated; use \-o "%(autonumber)s\-%(title)s.%(ext)s" ] Number
+downloaded files starting from 00000
+.RS
+.RE
+.TP
+.B \-t, \-\-title
+[deprecated] Use title in file name (default)
+.RS
+.RE
+.TP
+.B \-l, \-\-literal
+[deprecated] Alias of \-\-title
+.RS
+.RE
+.TP
+.B \-w, \-\-no\-overwrites
+Do not overwrite files
+.RS
+.RE
+.TP
+.B \-c, \-\-continue
+Force resume of partially downloaded files.
+By default, youtube\-dl will resume downloads if possible.
+.RS
+.RE
+.TP
+.B \-\-no\-continue
+Do not resume partially downloaded files (restart from beginning)
+.RS
+.RE
+.TP
+.B \-\-no\-part
+Do not use .part files \- write directly into output file
+.RS
+.RE
+.TP
+.B \-\-no\-mtime
+Do not use the Last\-modified header to set the file modification time
+.RS
+.RE
+.TP
+.B \-\-write\-description
+Write video description to a .description file
+.RS
+.RE
+.TP
+.B \-\-write\-info\-json
+Write video metadata to a .info.json file
+.RS
+.RE
+.TP
+.B \-\-write\-annotations
+Write video annotations to a .annotations.xml file
+.RS
+.RE
+.TP
+.B \-\-load\-info\-json \f[I]FILE\f[]
+JSON file containing the video information (created with the
+"\-\-write\-info\-json" option)
+.RS
+.RE
+.TP
+.B \-\-cookies \f[I]FILE\f[]
+File to read cookies from and dump cookie jar in
+.RS
+.RE
+.TP
+.B \-\-cache\-dir \f[I]DIR\f[]
+Location in the filesystem where youtube\-dl can store some downloaded
+information permanently.
+By default $XDG_CACHE_HOME/youtube\-dl or ~/.cache/youtube\-dl .
+At the moment, only YouTube player files (for videos with obfuscated
+signatures) are cached, but that may change.
+.RS
+.RE
+.TP
+.B \-\-no\-cache\-dir
+Disable filesystem caching
+.RS
+.RE
+.TP
+.B \-\-rm\-cache\-dir
+Delete all filesystem cache files
+.RS
+.RE
+.SS Thumbnail images:
+.TP
+.B \-\-write\-thumbnail
+Write thumbnail image to disk
+.RS
+.RE
+.TP
+.B \-\-write\-all\-thumbnails
+Write all thumbnail image formats to disk
+.RS
+.RE
+.TP
+.B \-\-list\-thumbnails
+Simulate and list all available thumbnail formats
+.RS
+.RE
+.SS Verbosity / Simulation Options:
+.TP
+.B \-q, \-\-quiet
+Activate quiet mode
+.RS
+.RE
+.TP
+.B \-\-no\-warnings
+Ignore warnings
+.RS
+.RE
+.TP
+.B \-s, \-\-simulate
+Do not download the video and do not write anything to disk
+.RS
+.RE
+.TP
+.B \-\-skip\-download
+Do not download the video
+.RS
+.RE
+.TP
+.B \-g, \-\-get\-url
+Simulate, quiet but print URL
+.RS
+.RE
+.TP
+.B \-e, \-\-get\-title
+Simulate, quiet but print title
+.RS
+.RE
+.TP
+.B \-\-get\-id
+Simulate, quiet but print id
+.RS
+.RE
+.TP
+.B \-\-get\-thumbnail
+Simulate, quiet but print thumbnail URL
+.RS
+.RE
+.TP
+.B \-\-get\-description
+Simulate, quiet but print video description
+.RS
+.RE
+.TP
+.B \-\-get\-duration
+Simulate, quiet but print video length
+.RS
+.RE
+.TP
+.B \-\-get\-filename
+Simulate, quiet but print output filename
+.RS
+.RE
+.TP
+.B \-\-get\-format
+Simulate, quiet but print output format
+.RS
+.RE
+.TP
+.B \-j, \-\-dump\-json
+Simulate, quiet but print JSON information.
+See \-\-output for a description of available keys.
+.RS
+.RE
+.TP
+.B \-J, \-\-dump\-single\-json
+Simulate, quiet but print JSON information for each command\-line
+argument.
+If the URL refers to a playlist, dump the whole playlist information in
+a single line.
+.RS
+.RE
+.TP
+.B \-\-print\-json
+Be quiet and print the video information as JSON (video is still being
+downloaded).
+.RS
+.RE
+.TP
+.B \-\-newline
+Output progress bar as new lines
+.RS
+.RE
+.TP
+.B \-\-no\-progress
+Do not print progress bar
+.RS
+.RE
+.TP
+.B \-\-console\-title
+Display progress in console titlebar
+.RS
+.RE
+.TP
+.B \-v, \-\-verbose
+Print various debugging information
+.RS
+.RE
+.TP
+.B \-\-dump\-pages
+Print downloaded pages encoded using base64 to debug problems (very
+verbose)
+.RS
+.RE
+.TP
+.B \-\-write\-pages
+Write downloaded intermediary pages to files in the current directory to
+debug problems
+.RS
+.RE
+.TP
+.B \-\-print\-traffic
+Display sent and read HTTP traffic
+.RS
+.RE
+.TP
+.B \-C, \-\-call\-home
+Contact the youtube\-dl server for debugging
+.RS
+.RE
+.TP
+.B \-\-no\-call\-home
+Do NOT contact the youtube\-dl server for debugging
+.RS
+.RE
+.SS Workarounds:
+.TP
+.B \-\-encoding \f[I]ENCODING\f[]
+Force the specified encoding (experimental)
+.RS
+.RE
+.TP
+.B \-\-no\-check\-certificate
+Suppress HTTPS certificate validation
+.RS
+.RE
+.TP
+.B \-\-prefer\-insecure
+Use an unencrypted connection to retrieve information about the video.
+(Currently supported only for YouTube)
+.RS
+.RE
+.TP
+.B \-\-user\-agent \f[I]UA\f[]
+Specify a custom user agent
+.RS
+.RE
+.TP
+.B \-\-referer \f[I]URL\f[]
+Specify a custom referer, use if the video access is restricted to one
+domain
+.RS
+.RE
+.TP
+.B \-\-add\-header \f[I]FIELD:VALUE\f[]
+Specify a custom HTTP header and its value, separated by a colon
+\[aq]:\[aq].
+You can use this option multiple times
+.RS
+.RE
+.TP
+.B \-\-bidi\-workaround
+Work around terminals that lack bidirectional text support.
+Requires bidiv or fribidi executable in PATH
+.RS
+.RE
+.TP
+.B \-\-sleep\-interval \f[I]SECONDS\f[]
+Number of seconds to sleep before each download when used alone or a
+lower bound of a range for randomized sleep before each download
+(minimum possible number of seconds to sleep) when used along with
+\-\-max\-sleep\-interval.
+.RS
+.RE
+.TP
+.B \-\-max\-sleep\-interval \f[I]SECONDS\f[]
+Upper bound of a range for randomized sleep before each download
+(maximum possible number of seconds to sleep).
+Must only be used along with \-\-min\-sleep\-interval.
+.RS
+.RE
+.SS Video Format Options:
+.TP
+.B \-f, \-\-format \f[I]FORMAT\f[]
+Video format code, see the "FORMAT SELECTION" for all the info
+.RS
+.RE
+.TP
+.B \-\-all\-formats
+Download all available video formats
+.RS
+.RE
+.TP
+.B \-\-prefer\-free\-formats
+Prefer free video formats unless a specific one is requested
+.RS
+.RE
+.TP
+.B \-F, \-\-list\-formats
+List all available formats of requested videos
+.RS
+.RE
+.TP
+.B \-\-youtube\-skip\-dash\-manifest
+Do not download the DASH manifests and related data on YouTube videos
+.RS
+.RE
+.TP
+.B \-\-merge\-output\-format \f[I]FORMAT\f[]
+If a merge is required (e.g.
+bestvideo+bestaudio), output to given container format.
+One of mkv, mp4, ogg, webm, flv.
+Ignored if no merge is required
+.RS
+.RE
+.SS Subtitle Options:
+.TP
+.B \-\-write\-sub
+Write subtitle file
+.RS
+.RE
+.TP
+.B \-\-write\-auto\-sub
+Write automatically generated subtitle file (YouTube only)
+.RS
+.RE
+.TP
+.B \-\-all\-subs
+Download all the available subtitles of the video
+.RS
+.RE
+.TP
+.B \-\-list\-subs
+List all available subtitles for the video
+.RS
+.RE
+.TP
+.B \-\-sub\-format \f[I]FORMAT\f[]
+Subtitle format, accepts formats preference, for example: "srt" or
+"ass/srt/best"
+.RS
+.RE
+.TP
+.B \-\-sub\-lang \f[I]LANGS\f[]
+Languages of the subtitles to download (optional) separated by commas,
+use \-\-list\-subs for available language tags
+.RS
+.RE
+.SS Authentication Options:
+.TP
+.B \-u, \-\-username \f[I]USERNAME\f[]
+Login with this account ID
+.RS
+.RE
+.TP
+.B \-p, \-\-password \f[I]PASSWORD\f[]
+Account password.
+If this option is left out, youtube\-dl will ask interactively.
+.RS
+.RE
+.TP
+.B \-2, \-\-twofactor \f[I]TWOFACTOR\f[]
+Two\-factor authentication code
+.RS
+.RE
+.TP
+.B \-n, \-\-netrc
+Use .netrc authentication data
+.RS
+.RE
+.TP
+.B \-\-video\-password \f[I]PASSWORD\f[]
+Video password (vimeo, smotri, youku)
+.RS
+.RE
+.SS Adobe Pass Options:
+.TP
+.B \-\-ap\-mso \f[I]MSO\f[]
+Adobe Pass multiple\-system operator (TV provider) identifier, use
+\-\-ap\-list\-mso for a list of available MSOs
+.RS
+.RE
+.TP
+.B \-\-ap\-username \f[I]USERNAME\f[]
+Multiple\-system operator account login
+.RS
+.RE
+.TP
+.B \-\-ap\-password \f[I]PASSWORD\f[]
+Multiple\-system operator account password.
+If this option is left out, youtube\-dl will ask interactively.
+.RS
+.RE
+.TP
+.B \-\-ap\-list\-mso
+List all supported multiple\-system operators
+.RS
+.RE
+.SS Post\-processing Options:
+.TP
+.B \-x, \-\-extract\-audio
+Convert video files to audio\-only files (requires ffmpeg or avconv and
+ffprobe or avprobe)
+.RS
+.RE
+.TP
+.B \-\-audio\-format \f[I]FORMAT\f[]
+Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or
+"wav"; "best" by default; No effect without \-x
+.RS
+.RE
+.TP
+.B \-\-audio\-quality \f[I]QUALITY\f[]
+Specify ffmpeg/avconv audio quality, insert a value between 0 (better)
+and 9 (worse) for VBR or a specific bitrate like 128K (default 5)
+.RS
+.RE
+.TP
+.B \-\-recode\-video \f[I]FORMAT\f[]
+Encode the video to another format if necessary (currently supported:
+mp4|flv|ogg|webm|mkv|avi)
+.RS
+.RE
+.TP
+.B \-\-postprocessor\-args \f[I]ARGS\f[]
+Give these arguments to the postprocessor
+.RS
+.RE
+.TP
+.B \-k, \-\-keep\-video
+Keep the video file on disk after the post\-processing; the video is
+erased by default
+.RS
+.RE
+.TP
+.B \-\-no\-post\-overwrites
+Do not overwrite post\-processed files; the post\-processed files are
+overwritten by default
+.RS
+.RE
+.TP
+.B \-\-embed\-subs
+Embed subtitles in the video (only for mp4, webm and mkv videos)
+.RS
+.RE
+.TP
+.B \-\-embed\-thumbnail
+Embed thumbnail in the audio as cover art
+.RS
+.RE
+.TP
+.B \-\-add\-metadata
+Write metadata to the video file
+.RS
+.RE
+.TP
+.B \-\-metadata\-from\-title \f[I]FORMAT\f[]
+Parse additional metadata like song title / artist from the video title.
+The format syntax is the same as \-\-output, the parsed parameters
+replace existing values.
+Additional templates: %(album)s, %(artist)s.
+Example: \-\-metadata\-from\-title "%(artist)s \- %(title)s" matches a
+title like "Coldplay \- Paradise"
+.RS
+.RE
+.TP
+.B \-\-xattrs
+Write metadata to the video file\[aq]s xattrs (using dublin core and xdg
+standards)
+.RS
+.RE
+.TP
+.B \-\-fixup \f[I]POLICY\f[]
+Automatically correct known faults of the file.
+One of never (do nothing), warn (only emit a warning), detect_or_warn
+(the default; fix file if we can, warn otherwise)
+.RS
+.RE
+.TP
+.B \-\-prefer\-avconv
+Prefer avconv over ffmpeg for running the postprocessors (default)
+.RS
+.RE
+.TP
+.B \-\-prefer\-ffmpeg
+Prefer ffmpeg over avconv for running the postprocessors
+.RS
+.RE
+.TP
+.B \-\-ffmpeg\-location \f[I]PATH\f[]
+Location of the ffmpeg/avconv binary; either the path to the binary or
+its containing directory.
+.RS
+.RE
+.TP
+.B \-\-exec \f[I]CMD\f[]
+Execute a command on the file after downloading, similar to find\[aq]s
+\-exec syntax.
+Example: \-\-exec \[aq]adb push {} /sdcard/Music/ && rm {}\[aq]
+.RS
+.RE
+.TP
+.B \-\-convert\-subs \f[I]FORMAT\f[]
+Convert the subtitles to other format (currently supported: srt|ass|vtt)
+.RS
+.RE
+.SH CONFIGURATION
+.PP
+You can configure youtube\-dl by placing any supported command line
+option in a configuration file.
+On Linux and OS X, the system wide configuration file is located at
+\f[C]/etc/youtube\-dl.conf\f[] and the user wide configuration file at
+\f[C]~/.config/youtube\-dl/config\f[].
+On Windows, the user wide configuration file locations are
+\f[C]%APPDATA%\\youtube\-dl\\config.txt\f[] or
+\f[C]C:\\Users\\<user\ name>\\youtube\-dl.conf\f[].
+Note that by default the configuration file may not exist, so you may need to
+create it yourself.
+.PP
+For example, with the following configuration file youtube\-dl will
+always extract the audio, not copy the mtime, use a proxy and save all
+videos under \f[C]Movies\f[] directory in your home directory:
+.IP
+.nf
+\f[C]
+#\ Lines\ starting\ with\ #\ are\ comments
+
+#\ Always\ extract\ audio
+\-x
+
+#\ Do\ not\ copy\ the\ mtime
+\-\-no\-mtime
+
+#\ Use\ this\ proxy
+\-\-proxy\ 127.0.0.1:3128
+
+#\ Save\ all\ videos\ under\ Movies\ directory\ in\ your\ home\ directory
+\-o\ ~/Movies/%(title)s.%(ext)s
+\f[]
+.fi
+.PP
+Note that options in the configuration file are the same options (aka
+switches) used in regular command line calls, so there \f[B]must be no
+whitespace\f[] after \f[C]\-\f[] or \f[C]\-\-\f[], e.g.
+\f[C]\-o\f[] or \f[C]\-\-proxy\f[] but not \f[C]\-\ o\f[] or
+\f[C]\-\-\ proxy\f[].
+.PP
+You can use \f[C]\-\-ignore\-config\f[] if you want to disable the
+configuration file for a particular youtube\-dl run.
+.PP
+You can also use \f[C]\-\-config\-location\f[] if you want to use a custom
+configuration file for a particular youtube\-dl run.
+.SS Authentication with \f[C]\&.netrc\f[] file
+.PP
+You may also want to configure automatic credentials storage for
+extractors that support authentication (by providing login and password
+with \f[C]\-\-username\f[] and \f[C]\-\-password\f[]) in order not to
+pass credentials as command line arguments on every youtube\-dl
+execution and prevent tracking plain text passwords in the shell command
+history.
+You can achieve this using a \f[C]\&.netrc\f[]
+file (http://stackoverflow.com/tags/.netrc/info) on a per extractor
+basis.
+For that you will need to create a \f[C]\&.netrc\f[] file in your
+\f[C]$HOME\f[] and restrict permissions to read/write by only you:
+.IP
+.nf
+\f[C]
+touch\ $HOME/.netrc
+chmod\ a\-rwx,u+rw\ $HOME/.netrc
+\f[]
+.fi
+.PP
+After that you can add credentials for an extractor in the following
+format, where \f[I]extractor\f[] is the name of the extractor in
+lowercase:
+.IP
+.nf
+\f[C]
+machine\ <extractor>\ login\ <login>\ password\ <password>
+\f[]
+.fi
+.PP
+For example:
+.IP
+.nf
+\f[C]
+machine\ youtube\ login\ myaccount\@gmail.com\ password\ my_youtube_password
+machine\ twitch\ login\ my_twitch_account_name\ password\ my_twitch_password
+\f[]
+.fi
+.PP
+To activate authentication with the \f[C]\&.netrc\f[] file you should
+pass \f[C]\-\-netrc\f[] to youtube\-dl or place it in the configuration
+file (#configuration).
+.PP
+On Windows you may also need to set up the \f[C]%HOME%\f[] environment
+variable manually.
+.SH OUTPUT TEMPLATE
+.PP
+The \f[C]\-o\f[] option allows users to indicate a template for the
+output file names.
+.PP
+\f[B]tl;dr:\f[] navigate me to examples (#output-template-examples).
+.PP
+The basic usage is not to set any template arguments when downloading a
+single file, like in
+\f[C]youtube\-dl\ \-o\ funny_video.flv\ "http://some/video"\f[].
+However, it may contain special sequences that will be replaced when
+downloading each video.
+The special sequences have the format \f[C]%(NAME)s\f[].
+To clarify, that is a percent symbol followed by a name in parentheses,
+followed by a lowercase S.
+Allowed names are:
+.IP \[bu] 2
+\f[C]id\f[]: Video identifier
+.IP \[bu] 2
+\f[C]title\f[]: Video title
+.IP \[bu] 2
+\f[C]url\f[]: Video URL
+.IP \[bu] 2
+\f[C]ext\f[]: Video filename extension
+.IP \[bu] 2
+\f[C]alt_title\f[]: A secondary title of the video
+.IP \[bu] 2
+\f[C]display_id\f[]: An alternative identifier for the video
+.IP \[bu] 2
+\f[C]uploader\f[]: Full name of the video uploader
+.IP \[bu] 2
+\f[C]license\f[]: License name the video is licensed under
+.IP \[bu] 2
+\f[C]creator\f[]: The creator of the video
+.IP \[bu] 2
+\f[C]release_date\f[]: The date (YYYYMMDD) when the video was released
+.IP \[bu] 2
+\f[C]timestamp\f[]: UNIX timestamp of the moment the video became
+available
+.IP \[bu] 2
+\f[C]upload_date\f[]: Video upload date (YYYYMMDD)
+.IP \[bu] 2
+\f[C]uploader_id\f[]: Nickname or id of the video uploader
+.IP \[bu] 2
+\f[C]location\f[]: Physical location where the video was filmed
+.IP \[bu] 2
+\f[C]duration\f[]: Length of the video in seconds
+.IP \[bu] 2
+\f[C]view_count\f[]: How many users have watched the video on the
+platform
+.IP \[bu] 2
+\f[C]like_count\f[]: Number of positive ratings of the video
+.IP \[bu] 2
+\f[C]dislike_count\f[]: Number of negative ratings of the video
+.IP \[bu] 2
+\f[C]repost_count\f[]: Number of reposts of the video
+.IP \[bu] 2
+\f[C]average_rating\f[]: Average rating given by users, the scale used
+depends on the webpage
+.IP \[bu] 2
+\f[C]comment_count\f[]: Number of comments on the video
+.IP \[bu] 2
+\f[C]age_limit\f[]: Age restriction for the video (years)
+.IP \[bu] 2
+\f[C]format\f[]: A human\-readable description of the format
+.IP \[bu] 2
+\f[C]format_id\f[]: Format code specified by \f[C]\-\-format\f[]
+.IP \[bu] 2
+\f[C]format_note\f[]: Additional info about the format
+.IP \[bu] 2
+\f[C]width\f[]: Width of the video
+.IP \[bu] 2
+\f[C]height\f[]: Height of the video
+.IP \[bu] 2
+\f[C]resolution\f[]: Textual description of width and height
+.IP \[bu] 2
+\f[C]tbr\f[]: Average bitrate of audio and video in KBit/s
+.IP \[bu] 2
+\f[C]abr\f[]: Average audio bitrate in KBit/s
+.IP \[bu] 2
+\f[C]acodec\f[]: Name of the audio codec in use
+.IP \[bu] 2
+\f[C]asr\f[]: Audio sampling rate in Hertz
+.IP \[bu] 2
+\f[C]vbr\f[]: Average video bitrate in KBit/s
+.IP \[bu] 2
+\f[C]fps\f[]: Frame rate
+.IP \[bu] 2
+\f[C]vcodec\f[]: Name of the video codec in use
+.IP \[bu] 2
+\f[C]container\f[]: Name of the container format
+.IP \[bu] 2
+\f[C]filesize\f[]: The number of bytes, if known in advance
+.IP \[bu] 2
+\f[C]filesize_approx\f[]: An estimate for the number of bytes
+.IP \[bu] 2
+\f[C]protocol\f[]: The protocol that will be used for the actual
+download
+.IP \[bu] 2
+\f[C]extractor\f[]: Name of the extractor
+.IP \[bu] 2
+\f[C]extractor_key\f[]: Key name of the extractor
+.IP \[bu] 2
+\f[C]epoch\f[]: Unix epoch when creating the file
+.IP \[bu] 2
+\f[C]autonumber\f[]: Five\-digit number that will be increased with each
+download, starting at zero
+.IP \[bu] 2
+\f[C]playlist\f[]: Name or id of the playlist that contains the video
+.IP \[bu] 2
+\f[C]playlist_index\f[]: Index of the video in the playlist padded with
+leading zeros according to the total length of the playlist
+.IP \[bu] 2
+\f[C]playlist_id\f[]: Playlist identifier
+.IP \[bu] 2
+\f[C]playlist_title\f[]: Playlist title
+.PP
+Available for the video that belongs to some logical chapter or section:
+\- \f[C]chapter\f[]: Name or title of the chapter the video belongs to
+\- \f[C]chapter_number\f[]: Number of the chapter the video belongs to
+\- \f[C]chapter_id\f[]: Id of the chapter the video belongs to
+.PP
+Available for the video that is an episode of some series or programme:
+\- \f[C]series\f[]: Title of the series or programme the video episode
+belongs to \- \f[C]season\f[]: Title of the season the video episode
+belongs to \- \f[C]season_number\f[]: Number of the season the video
+episode belongs to \- \f[C]season_id\f[]: Id of the season the video
+episode belongs to \- \f[C]episode\f[]: Title of the video episode \-
+\f[C]episode_number\f[]: Number of the video episode within a season \-
+\f[C]episode_id\f[]: Id of the video episode
+.PP
+Available for the media that is a track or a part of a music album:
+.IP \[bu] 2
+\f[C]track\f[]: Title of the track
+.IP \[bu] 2
+\f[C]track_number\f[]: Number of the track within an album or a disc
+.IP \[bu] 2
+\f[C]track_id\f[]: Id of the track
+.IP \[bu] 2
+\f[C]artist\f[]: Artist(s) of the track
+.IP \[bu] 2
+\f[C]genre\f[]: Genre(s) of the track
+.IP \[bu] 2
+\f[C]album\f[]: Title of the album the track belongs to
+.IP \[bu] 2
+\f[C]album_type\f[]: Type of the album
+.IP \[bu] 2
+\f[C]album_artist\f[]: List of all artists that appeared on the album
+.IP \[bu] 2
+\f[C]disc_number\f[]: Number of the disc or other physical medium the
+track belongs to
+.IP \[bu] 2
+\f[C]release_year\f[]: Year (YYYY) when the album was released
+.PP
+Each aforementioned sequence when referenced in an output template will
+be replaced by the actual value corresponding to the sequence name.
+Note that some of the sequences are not guaranteed to be present since
+they depend on the metadata obtained by a particular extractor.
+Such sequences will be replaced with \f[C]NA\f[].
+.PP
+For example, for \f[C]\-o\ %(title)s\-%(id)s.%(ext)s\f[] and an mp4
+video with title \f[C]youtube\-dl\ test\ video\f[] and id
+\f[C]BaW_jenozKc\f[], this will result in a
+\f[C]youtube\-dl\ test\ video\-BaW_jenozKc.mp4\f[] file created in the
+current directory.
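+.PP
+As a rough illustration of the substitution described above, the
+following Python sketch expands the same template with ordinary
+\f[C]%\f[] string formatting.
+The \f[C]expand_template\f[] helper and the sample metadata are
+hypothetical and only mimic the documented behaviour, including the
+\f[C]NA\f[] fallback; this is not youtube\-dl\[aq]s actual code.

```python
import collections


def expand_template(template, info):
    """Expand %(name)s sequences, using 'NA' for missing fields (hypothetical helper)."""
    # defaultdict supplies 'NA' whenever the template names an absent key.
    fields = collections.defaultdict(lambda: 'NA', info)
    return template % fields


# Sample metadata, made up for illustration.
info = {'title': 'youtube-dl test video', 'id': 'BaW_jenozKc', 'ext': 'mp4'}

filename = expand_template('%(title)s-%(id)s.%(ext)s', info)
missing = expand_template('%(uploader)s/%(title)s.%(ext)s', info)
literal = expand_template('100%% %(title)s.%(ext)s', info)
print(filename)
print(missing)
print(literal)
```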
+.PP
+Output templates can also contain an arbitrary hierarchical path, e.g.
+\f[C]\-o\ \[aq]%(playlist)s/%(playlist_index)s\ \-\ %(title)s.%(ext)s\[aq]\f[]
+which will result in downloading each video in a directory corresponding
+to this path template.
+Any missing directory will be automatically created for you.
+.PP
+To use percent literals in an output template use \f[C]%%\f[].
+To output to stdout use \f[C]\-o\ \-\f[].
+.PP
+The current default template is \f[C]%(title)s\-%(id)s.%(ext)s\f[].
+.PP
+In some cases, you don\[aq]t want special characters such as 中, spaces,
+or & in the filename, for example when transferring the downloaded file
+to a Windows system or passing the filename through an 8bit\-unsafe
+channel.
+In these cases, add the \f[C]\-\-restrict\-filenames\f[] flag to get a
+shorter title.
+.SS Output template and Windows batch files
+.PP
+If you are using an output template inside a Windows batch file then you
+must escape plain percent characters (\f[C]%\f[]) by doubling, so that
+\f[C]\-o\ "%(title)s\-%(id)s.%(ext)s"\f[] should become
+\f[C]\-o\ "%%(title)s\-%%(id)s.%%(ext)s"\f[].
+However, you should not touch \f[C]%\f[]\[aq]s that are not plain
+characters, e.g.
+environment variables for expansion should stay intact:
+\f[C]\-o\ "C:\\%HOMEPATH%\\Desktop\\%%(title)s.%%(ext)s"\f[].
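+.PP
+The doubling rule can be sketched in Python as follows.
+The \f[C]escape_for_batch\f[] helper is hypothetical and not part of
+youtube\-dl; it assumes that only percent signs directly followed by an
+opening parenthesis belong to the output template, and it does not
+handle a literal \f[C]%%\f[] already present in the template.

```python
import re


def escape_for_batch(template):
    """Double the percent sign of each %(name) sequence (hypothetical helper)."""
    # Only a '%' directly followed by '(' starts an output template sequence;
    # %HOMEPATH%-style environment variables are left for cmd.exe to expand.
    return re.sub(r'%(?=\()', '%%', template)


plain = r'C:\%HOMEPATH%\Desktop\%(title)s.%(ext)s'
escaped = escape_for_batch(plain)
print(escaped)
```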
+.SS Output template examples
+.PP
+Note that on Windows you may need to use double quotes instead of single
+quotes.
+.IP
+.nf
+\f[C]
+$\ youtube\-dl\ \-\-get\-filename\ \-o\ \[aq]%(title)s.%(ext)s\[aq]\ BaW_jenozKc
+youtube\-dl\ test\ video\ \[aq]\[aq]_ä↭𝕐.mp4\ \ \ \ #\ All\ kinds\ of\ weird\ characters
+
+$\ youtube\-dl\ \-\-get\-filename\ \-o\ \[aq]%(title)s.%(ext)s\[aq]\ BaW_jenozKc\ \-\-restrict\-filenames
+youtube\-dl_test_video_.mp4\ \ \ \ \ \ \ \ \ \ #\ A\ simple\ file\ name
+
+#\ Download\ YouTube\ playlist\ videos\ in\ separate\ directory\ indexed\ by\ video\ order\ in\ a\ playlist
+$\ youtube\-dl\ \-o\ \[aq]%(playlist)s/%(playlist_index)s\ \-\ %(title)s.%(ext)s\[aq]\ https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re
+
+#\ Download\ all\ playlists\ of\ YouTube\ channel/user\ keeping\ each\ playlist\ in\ separate\ directory:
+$\ youtube\-dl\ \-o\ \[aq]%(uploader)s/%(playlist)s/%(playlist_index)s\ \-\ %(title)s.%(ext)s\[aq]\ https://www.youtube.com/user/TheLinuxFoundation/playlists
+
+#\ Download\ Udemy\ course\ keeping\ each\ chapter\ in\ separate\ directory\ under\ MyVideos\ directory\ in\ your\ home
+$\ youtube\-dl\ \-u\ user\ \-p\ password\ \-o\ \[aq]~/MyVideos/%(playlist)s/%(chapter_number)s\ \-\ %(chapter)s/%(title)s.%(ext)s\[aq]\ https://www.udemy.com/java\-tutorial/
+
+#\ Download\ entire\ series\ season\ keeping\ each\ series\ and\ each\ season\ in\ separate\ directory\ under\ C:/MyVideos
+$\ youtube\-dl\ \-o\ "C:/MyVideos/%(series)s/%(season_number)s\ \-\ %(season)s/%(episode_number)s\ \-\ %(episode)s.%(ext)s"\ http://videomore.ru/kino_v_detalayah/5_sezon/367617
+
+#\ Stream\ the\ video\ being\ downloaded\ to\ stdout
+$\ youtube\-dl\ \-o\ \-\ BaW_jenozKc
+\f[]
+.fi
+.SH FORMAT SELECTION
+.PP
+By default youtube\-dl tries to download the best available quality,
+i.e.
+if you want the best quality you \f[B]don\[aq]t need\f[] to pass any
+special options; youtube\-dl will guess it for you by \f[B]default\f[].
+.PP
+But sometimes you may want to download in a different format, for
+example when you are on a slow or intermittent connection.
+The key mechanism for achieving this is so\-called \f[I]format
+selection\f[], with which you can explicitly specify the desired format,
+select formats based on some criterion or criteria, set up precedence
+and much more.
+.PP
+The general syntax for format selection is \f[C]\-\-format\ FORMAT\f[]
+or shorter \f[C]\-f\ FORMAT\f[] where \f[C]FORMAT\f[] is a \f[I]selector
+expression\f[], i.e.
+an expression that describes format or formats you would like to
+download.
+.PP
+\f[B]tl;dr:\f[] navigate me to examples (#format-selection-examples).
+.PP
+The simplest case is requesting a specific format, for example with
+\f[C]\-f\ 22\f[] you can download the format with format code equal to
+22.
+You can get the list of available format codes for particular video
+using \f[C]\-\-list\-formats\f[] or \f[C]\-F\f[].
+Note that these format codes are extractor specific.
+.PP
+You can also use a file extension (currently \f[C]3gp\f[], \f[C]aac\f[],
+\f[C]flv\f[], \f[C]m4a\f[], \f[C]mp3\f[], \f[C]mp4\f[], \f[C]ogg\f[],
+\f[C]wav\f[], \f[C]webm\f[] are supported) to download the best quality
+format of a particular file extension served as a single file, e.g.
+\f[C]\-f\ webm\f[] will download the best quality format with the
+\f[C]webm\f[] extension served as a single file.
+.PP
+You can also use special names to select particular edge case formats:
+.IP \[bu] 2
+\f[C]best\f[]: Select the best quality format represented by a single
+file with video and audio.
+.IP \[bu] 2
+\f[C]worst\f[]: Select the worst quality format represented by a
+single file with video and audio.
+.IP \[bu] 2
+\f[C]bestvideo\f[]: Select the best quality video\-only format (e.g.
+DASH video).
+May not be available.
+.IP \[bu] 2
+\f[C]worstvideo\f[]: Select the worst quality video\-only format.
+May not be available.
+.IP \[bu] 2
+\f[C]bestaudio\f[]: Select the best quality audio\-only format.
+May not be available.
+.IP \[bu] 2
+\f[C]worstaudio\f[]: Select the worst quality audio\-only format.
+May not be available.
+.PP
+For example, to download the worst quality video\-only format you can
+use \f[C]\-f\ worstvideo\f[].
+.PP
+If you want to download multiple videos and they don\[aq]t have the same
+formats available, you can specify the order of preference using
+slashes.
+Note that slash is left\-associative, i.e.
+formats on the left hand side are preferred, for example
+\f[C]\-f\ 22/17/18\f[] will download format 22 if it\[aq]s available,
+otherwise it will download format 17 if it\[aq]s available, otherwise it
+will download format 18 if it\[aq]s available, otherwise it will
+complain that no suitable formats are available for download.
+.PP
+If you want to download several formats of the same video use a comma as
+a separator, e.g.
+\f[C]\-f\ 22,17,18\f[] will download all three of these formats,
+provided they are available.
+Or a more sophisticated example combined with the precedence feature:
+\f[C]\-f\ 136/137/mp4/bestvideo,140/m4a/bestaudio\f[].
+.PP
+You can also filter the video formats by putting a condition in
+brackets, as in \f[C]\-f\ "best[height=720]"\f[] (or
+\f[C]\-f\ "[filesize>10M]"\f[]).
+.PP
+The following numeric meta fields can be used with comparisons
+\f[C]<\f[], \f[C]<=\f[], \f[C]>\f[], \f[C]>=\f[], \f[C]=\f[] (equals),
+\f[C]!=\f[] (not equals):
+.IP \[bu] 2
+\f[C]filesize\f[]: The number of bytes, if known in advance
+.IP \[bu] 2
+\f[C]width\f[]: Width of the video, if known
+.IP \[bu] 2
+\f[C]height\f[]: Height of the video, if known
+.IP \[bu] 2
+\f[C]tbr\f[]: Average bitrate of audio and video in KBit/s
+.IP \[bu] 2
+\f[C]abr\f[]: Average audio bitrate in KBit/s
+.IP \[bu] 2
+\f[C]vbr\f[]: Average video bitrate in KBit/s
+.IP \[bu] 2
+\f[C]asr\f[]: Audio sampling rate in Hertz
+.IP \[bu] 2
+\f[C]fps\f[]: Frame rate
+.PP
+Filtering also works for the comparisons \f[C]=\f[] (equals),
+\f[C]!=\f[] (not equals), \f[C]^=\f[] (begins with), \f[C]$=\f[] (ends
+with), \f[C]*=\f[] (contains) and the following string meta fields:
+.IP \[bu] 2
+\f[C]ext\f[]: File extension
+.IP \[bu] 2
+\f[C]acodec\f[]: Name of the audio codec in use
+.IP \[bu] 2
+\f[C]vcodec\f[]: Name of the video codec in use
+.IP \[bu] 2
+\f[C]container\f[]: Name of the container format
+.IP \[bu] 2
+\f[C]protocol\f[]: The protocol that will be used for the actual
+download, lower\-case (\f[C]http\f[], \f[C]https\f[], \f[C]rtsp\f[],
+\f[C]rtmp\f[], \f[C]rtmpe\f[], \f[C]mms\f[], \f[C]f4m\f[], \f[C]ism\f[],
+\f[C]m3u8\f[], or \f[C]m3u8_native\f[])
+.IP \[bu] 2
+\f[C]format_id\f[]: A short description of the format
+.PP
+Note that none of the aforementioned meta fields are guaranteed to be
+present since this solely depends on the metadata obtained by particular
+extractor, i.e.
+the metadata offered by the video hoster.
+.PP
+Formats for which the value is not known are excluded unless you put a
+question mark (\f[C]?\f[]) after the operator.
+You can combine format filters, so
+\f[C]\-f\ "[height\ <=?\ 720][tbr>500]"\f[] selects up to 720p videos
+(or videos where the height is not known) with a bitrate greater than
+500 KBit/s.
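+.PP
+The effect of the question mark can be sketched in Python as follows.
+The \f[C]passes\f[] and \f[C]select\f[] helpers and the format entries
+are hypothetical and only model the filtering step described above, not
+youtube\-dl\[aq]s own selector parser or its choice of the best
+remaining format.

```python
import operator

OPERATORS = {
    '<': operator.lt, '<=': operator.le,
    '>': operator.gt, '>=': operator.ge,
    '=': operator.eq, '!=': operator.ne,
}


def passes(fmt, field, op, value, allow_unknown=False):
    """Apply one numeric filter such as [height<=?720] to a format dict."""
    actual = fmt.get(field)
    if actual is None:
        # Unknown values are excluded unless the operator carried a '?'.
        return allow_unknown
    return OPERATORS[op](actual, value)


def select(formats):
    """Rough equivalent of the filters in -f "[height <=? 720][tbr>500]"."""
    return [
        f for f in formats
        if passes(f, 'height', '<=', 720, allow_unknown=True)
        and passes(f, 'tbr', '>', 500)
    ]


# Made-up format entries for illustration.
formats = [
    {'format_id': 'a', 'height': 1080, 'tbr': 2500},
    {'format_id': 'b', 'height': 720, 'tbr': 1200},
    {'format_id': 'c', 'height': None, 'tbr': 800},
    {'format_id': 'd', 'height': 480, 'tbr': None},
    {'format_id': 'e', 'height': 360, 'tbr': 400},
]

selected_ids = [f['format_id'] for f in select(formats)]
print(selected_ids)
```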
+.PP
+You can merge the video and audio of two formats into a single file
+using \f[C]\-f\ <video\-format>+<audio\-format>\f[] (requires ffmpeg or
+avconv installed), for example \f[C]\-f\ bestvideo+bestaudio\f[] will
+download the best video\-only format, the best audio\-only format and
+mux them together with ffmpeg/avconv.
+.PP
+Format selectors can also be grouped using parentheses, for example if
+you want to download the best mp4 and webm formats with a height lower
+than 480 you can use \f[C]\-f\ \[aq](mp4,webm)[height<480]\[aq]\f[].
+.PP
+Since the end of April 2015 and version 2015.04.26, youtube\-dl uses
+\f[C]\-f\ bestvideo+bestaudio/best\f[] as the default format selection
+(see #5447 (https://github.com/rg3/youtube-dl/issues/5447),
+#5456 (https://github.com/rg3/youtube-dl/issues/5456)).
+If ffmpeg or avconv is installed this results in downloading
+\f[C]bestvideo\f[] and \f[C]bestaudio\f[] separately and muxing them
+together into a single file giving the best overall quality available.
+Otherwise it falls back to \f[C]best\f[] and results in downloading the
+best available quality served as a single file.
+\f[C]best\f[] is also needed for videos that don\[aq]t come from YouTube
+because they don\[aq]t provide the audio and video in two different
+files.
+If you want to only download some DASH formats (for example if you are
+not interested in getting videos with a resolution higher than 1080p),
+you can add \f[C]\-f\ bestvideo[height<=?1080]+bestaudio/best\f[] to
+your configuration file.
+Note that if you use youtube\-dl to stream to \f[C]stdout\f[] (and most
+likely to pipe it to your media player then), i.e.
+you explicitly specify output template as \f[C]\-o\ \-\f[], youtube\-dl
+still uses \f[C]\-f\ best\f[] format selection in order to start content
+delivery immediately to your player and not to wait until
+\f[C]bestvideo\f[] and \f[C]bestaudio\f[] are downloaded and muxed.
+.PP
+If you want to preserve the old format selection behavior (prior to
+youtube\-dl 2015.04.26), i.e.
+you want to download the best available quality media served as a single
+file, you should explicitly specify your choice with \f[C]\-f\ best\f[].
+You may want to add it to the configuration file (#configuration) in
+order not to type it every time you run youtube\-dl.
+.SS Format selection examples
+.PP
+Note that on Windows you may need to use double quotes instead of single
+quotes.
+.IP
+.nf
+\f[C]
+#\ Download\ best\ mp4\ format\ available\ or\ any\ other\ best\ if\ no\ mp4\ available
+$\ youtube\-dl\ \-f\ \[aq]bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best\[aq]
+
+#\ Download\ best\ format\ available\ but\ not\ better\ than\ 480p
+$\ youtube\-dl\ \-f\ \[aq]bestvideo[height<=480]+bestaudio/best[height<=480]\[aq]
+
+#\ Download\ best\ single\-file\ format\ (video\ and\ audio)\ but\ smaller\ than\ 50\ MB
+$\ youtube\-dl\ \-f\ \[aq]best[filesize<50M]\[aq]
+
+#\ Download\ best\ format\ available\ via\ direct\ link\ over\ HTTP/HTTPS\ protocol
+$\ youtube\-dl\ \-f\ \[aq](bestvideo+bestaudio/best)[protocol^=http]\[aq]
+
+#\ Download\ the\ best\ video\ format\ and\ the\ best\ audio\ format\ without\ merging\ them
+$\ youtube\-dl\ \-f\ \[aq]bestvideo,bestaudio\[aq]\ \-o\ \[aq]%(title)s.f%(format_id)s.%(ext)s\[aq]
+\f[]
+.fi
+.PP
+Note that in the last example, an output template is recommended as
+bestvideo and bestaudio may have the same file name.
+.SH VIDEO SELECTION
+.PP
+Videos can be filtered by their upload date using the options
+\f[C]\-\-date\f[], \f[C]\-\-datebefore\f[] or \f[C]\-\-dateafter\f[].
+They accept dates in two formats:
+.IP \[bu] 2
+Absolute dates: Dates in the format \f[C]YYYYMMDD\f[].
+.IP \[bu] 2
+Relative dates: Dates in the format
+\f[C](now|today)[+\-][0\-9]+(day|week|month|year)(s)?\f[]
+.PP
+Examples:
+.IP
+.nf
+\f[C]
+#\ Download\ only\ the\ videos\ uploaded\ in\ the\ last\ 6\ months
+$\ youtube\-dl\ \-\-dateafter\ now\-6months
+
+#\ Download\ only\ the\ videos\ uploaded\ on\ January\ 1,\ 1970
+$\ youtube\-dl\ \-\-date\ 19700101
+
+#\ Download\ only\ the\ videos\ uploaded\ in\ the\ 200x\ decade
+$\ youtube\-dl\ \-\-dateafter\ 20000101\ \-\-datebefore\ 20091231
+\f[]
+.fi
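+.PP
+How such date specifications might be resolved can be sketched in
+Python as follows.
+The \f[C]parse_date\f[] and \f[C]uploaded_in_range\f[] helpers are
+hypothetical; approximating a month as 30 days and a year as 365 days,
+and treating both bounds as inclusive, are assumptions of this sketch
+rather than documented behaviour.

```python
import datetime
import re


def parse_date(spec, today):
    """Resolve an absolute or relative date specification (illustrative only)."""
    if spec in ('now', 'today'):
        return today
    match = re.match(r'(now|today)([+-])(\d+)(day|week|month|year)s?$', spec)
    if match:
        sign, amount, unit = match.group(2), int(match.group(3)), match.group(4)
        # Assumption of this sketch: months and years are approximated in days.
        days = amount * {'day': 1, 'week': 7, 'month': 30, 'year': 365}[unit]
        delta = datetime.timedelta(days=days)
        return today - delta if sign == '-' else today + delta
    # Otherwise expect an absolute date in YYYYMMDD form.
    return datetime.datetime.strptime(spec, '%Y%m%d').date()


def uploaded_in_range(upload_date, after, before, today):
    """Check a YYYYMMDD upload date against --dateafter/--datebefore style bounds."""
    day = parse_date(upload_date, today)
    return parse_date(after, today) <= day <= parse_date(before, today)


today = datetime.date(2017, 2, 9)
six_months_ago = parse_date('now-6months', today)
in_2000s = uploaded_in_range('20050615', '20000101', '20091231', today)
print(six_months_ago, in_2000s)
```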
+.SH FAQ
+.SS How do I update youtube\-dl?
+.PP
+If you\[aq]ve followed our manual installation
+instructions (http://rg3.github.io/youtube-dl/download.html), you can
+simply run \f[C]youtube\-dl\ \-U\f[] (or, on Linux,
+\f[C]sudo\ youtube\-dl\ \-U\f[]).
+.PP
+If you have used pip, a simple
+\f[C]sudo\ pip\ install\ \-U\ youtube\-dl\f[] is sufficient to update.
+.PP
+If you have installed youtube\-dl using a package manager like
+\f[I]apt\-get\f[] or \f[I]yum\f[], use the standard system update
+mechanism to update.
+Note that distribution packages are often outdated.
+As a rule of thumb, youtube\-dl releases at least once a month, and
+often weekly or even daily.
+Simply go to http://yt\-dl.org/ to find out the current version.
+Unfortunately, there is nothing we youtube\-dl developers can do if your
+distribution serves a really outdated version.
+You can (and should) complain to your distribution in their bugtracker
+or support forum.
+.PP
+As a last resort, you can also uninstall the version installed by your
+package manager and follow our manual installation instructions.
+For that, remove the distribution\[aq]s package, with a line like
+.IP
+.nf
+\f[C]
+sudo\ apt\-get\ remove\ \-y\ youtube\-dl
+\f[]
+.fi
+.PP
+Afterwards, simply follow our manual installation
+instructions (http://rg3.github.io/youtube-dl/download.html):
+.IP
+.nf
+\f[C]
+sudo\ wget\ https://yt\-dl.org/latest/youtube\-dl\ \-O\ /usr/local/bin/youtube\-dl
+sudo\ chmod\ a+x\ /usr/local/bin/youtube\-dl
+hash\ \-r
+\f[]
+.fi
+.PP
+Again, from then on you\[aq]ll be able to update with
+\f[C]sudo\ youtube\-dl\ \-U\f[].
+.SS youtube\-dl is extremely slow to start on Windows
+.PP
+Add a file exclusion for \f[C]youtube\-dl.exe\f[] in Windows Defender
+settings.
+.SS I\[aq]m getting an error \f[C]Unable\ to\ extract\ OpenGraph\ title\f[] on YouTube playlists
+.PP
+YouTube changed their playlist format in March 2014 and later on, so
+you\[aq]ll need at least youtube\-dl 2014.07.25 to download all YouTube
+videos.
+.PP
+If you have installed youtube\-dl with a package manager, pip, setup.py
+or a tarball, please use that to update.
+Note that Ubuntu packages do not seem to get updated anymore.
+Since we are not affiliated with Ubuntu, there is little we can do.
+Feel free to report
+bugs (https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to
+the Ubuntu packaging
+people (mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl)
+\- all they have to do is update the package to a somewhat recent
+version.
+See above for a way to update.
+.SS I\[aq]m getting an error when trying to use output template: \f[C]error:\ using\ output\ template\ conflicts\ with\ using\ title,\ video\ ID\ or\ auto\ number\f[]
+.PP
+Make sure you are not using \f[C]\-o\f[] together with any of the
+options \f[C]\-t\f[], \f[C]\-\-title\f[], \f[C]\-\-id\f[], \f[C]\-A\f[]
+or \f[C]\-\-auto\-number\f[], whether set on the command line or in a
+configuration file.
+Remove the latter options if any are present.
+.SS Do I always have to pass \f[C]\-citw\f[]?
+.PP
+By default, youtube\-dl intends to have the best options (incidentally,
+if you have a convincing case that these should be different, please
+file an issue where you explain that (https://yt-dl.org/bug)).
+Therefore, it is unnecessary and sometimes harmful to copy long option
+strings from webpages.
+In particular, the only option out of \f[C]\-citw\f[] that is regularly
+useful is \f[C]\-i\f[].
+.SS Can you please put the \f[C]\-b\f[] option back?
+.PP
+Most people asking this question are not aware that youtube\-dl now
+defaults to downloading the highest available quality as reported by
+YouTube, which will be 1080p or 720p in some cases, so you no longer
+need the \f[C]\-b\f[] option.
+For some specific videos, maybe YouTube does not report them to be
+available in a specific high quality format you\[aq]re interested in.
+In that case, simply request it with the \f[C]\-f\f[] option and
+youtube\-dl will try to download it.
+.SS I get HTTP error 402 when trying to download a video. What\[aq]s this?
+.PP
+Apparently YouTube requires you to pass a CAPTCHA test if you download
+too much.
+We\[aq]re considering providing a way to let you solve the
+CAPTCHA (https://github.com/rg3/youtube-dl/issues/154), but at the
+moment, your best course of action is pointing a web browser to the
+YouTube URL, solving the CAPTCHA, and restarting youtube\-dl.
+.SS Do I need any other programs?
+.PP
+youtube\-dl works fine on its own on most sites.
+However, if you want to convert video/audio, you\[aq]ll need
+avconv (https://libav.org/) or ffmpeg (https://www.ffmpeg.org/).
+On some sites \- most notably YouTube \- videos can be retrieved in a
+higher quality format without sound.
+youtube\-dl will detect whether avconv/ffmpeg is present and
+automatically pick the best option.
+.PP
+Videos or video formats streamed via RTMP protocol can only be
+downloaded when rtmpdump (https://rtmpdump.mplayerhq.hu/) is installed.
+Downloading MMS and RTSP videos requires either
+mplayer (http://mplayerhq.hu/) or mpv (https://mpv.io/) to be installed.
+.SS I have downloaded a video but how can I play it?
+.PP
+Once the video is fully downloaded, use any video player, such as
+mpv (https://mpv.io/), vlc (http://www.videolan.org/) or
+mplayer (http://www.mplayerhq.hu/).
+.SS I extracted a video URL with \f[C]\-g\f[], but it does not play on another machine / in my web browser.
+.PP
+It depends a lot on the service.
+In many cases, requests for the video (to download/play it) must come
+from the same IP address and with the same cookies and/or HTTP headers.
+Use the \f[C]\-\-cookies\f[] option to write the required cookies into a
+file, and advise your downloader to read cookies from that file.
+Some sites also require a common user agent to be used; use
+\f[C]\-\-dump\-user\-agent\f[] to see the one in use by youtube\-dl.
+You can also get necessary cookies and HTTP headers from JSON output
+obtained with \f[C]\-\-dump\-json\f[].
+.PP
+It may be beneficial to use IPv6; in some cases, the restrictions are
+only applied to IPv4.
+Some services (sometimes only for a subset of videos) do not restrict
+the video URL by IP address, cookie, or user\-agent, but these are the
+exception rather than the rule.
+.PP
+Please bear in mind that some URL protocols are \f[B]not\f[] supported
+by browsers out of the box, including RTMP.
+If you are using \f[C]\-g\f[], your own downloader must support these as
+well.
+.PP
+If you want to play the video on a machine that is not running
+youtube\-dl, you can relay the video content from the machine that runs
+youtube\-dl.
+You can use \f[C]\-o\ \-\f[] to let youtube\-dl stream a video to
+stdout, or simply allow the player to download the files written by
+youtube\-dl in turn.
+.SS ERROR: no fmt_url_map or conn information found in video info
+.PP
+YouTube switched to a new video info format in July 2011 which is
+not supported by old versions of youtube\-dl.
+See above (#how-do-i-update-youtube-dl) for how to update youtube\-dl.
+.SS ERROR: unable to download video
+.PP
+YouTube has required an additional signature since September 2012 which is
+not supported by old versions of youtube\-dl.
+See above (#how-do-i-update-youtube-dl) for how to update youtube\-dl.
+.SS Video URL contains an ampersand and I\[aq]m getting some strange output \f[C][1]\ 2839\f[] or \f[C]\[aq]v\[aq]\ is\ not\ recognized\ as\ an\ internal\ or\ external\ command\f[]
+.PP
+That\[aq]s actually the output from your shell.
+Since ampersand is one of the special shell characters it\[aq]s
+interpreted by the shell preventing you from passing the whole URL to
+youtube\-dl.
+To prevent your shell from interpreting the ampersands (or any other
+special characters) you have to either put the whole URL in quotes or
+escape them with a backslash (which approach will work depends on your
+shell).
+.PP
+For example, if your URL is
+https://www.youtube.com/watch?t=4&v=BaW_jenozKc you should end up with
+the following command:
+.PP
+\f[C]youtube\-dl\ \[aq]https://www.youtube.com/watch?t=4&v=BaW_jenozKc\[aq]\f[]
+.PP
+or
+.PP
+\f[C]youtube\-dl\ https://www.youtube.com/watch?t=4\\&v=BaW_jenozKc\f[]
+.PP
+For Windows you have to use the double quotes:
+.PP
+\f[C]youtube\-dl\ "https://www.youtube.com/watch?t=4&v=BaW_jenozKc"\f[]
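+.PP
+When the command line is assembled by a program rather than typed by
+hand, the same quoting can be produced automatically.
+The following Python sketch uses the standard library\[aq]s
+\f[C]shlex.quote\f[], which targets POSIX shells only and does not apply
+to the Windows command prompt.

```python
import shlex

url = 'https://www.youtube.com/watch?t=4&v=BaW_jenozKc'

# shlex.quote wraps the URL in single quotes because '&' is special to POSIX shells.
command = 'youtube-dl ' + shlex.quote(url)
print(command)

# Passing an argument list to subprocess avoids the shell entirely, e.g.
# subprocess.call(['youtube-dl', url]) needs no quoting at all.
```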
+.SS ExtractorError: Could not find JS function u\[aq]OF\[aq]
+.PP
+In February 2015, the new YouTube player contained a character sequence
+in a string that was misinterpreted by old versions of youtube\-dl.
+See above (#how-do-i-update-youtube-dl) for how to update youtube\-dl.
+.SS HTTP Error 429: Too Many Requests or 402: Payment Required
+.PP
+These two error codes indicate that the service is blocking your IP
+address because of overuse.
+Contact the service and ask them to unblock your IP address, or \- if
+you have acquired a whitelisted IP address already \- use the
+\f[C]\-\-proxy\f[] or \f[C]\-\-source\-address\f[]
+options (#network-options) to select another IP address.
+.SS SyntaxError: Non\-ASCII character
+.PP
+The error
+.IP
+.nf
+\f[C]
+File\ "youtube\-dl",\ line\ 2
+SyntaxError:\ Non\-ASCII\ character\ \[aq]\\x93\[aq]\ ...
+\f[]
+.fi
+.PP
+means you\[aq]re using an outdated version of Python.
+Please update to Python 2.6 or 2.7.
+.SS What is this binary file? Where has the code gone?
+.PP
+Since June 2012 (#342 (https://github.com/rg3/youtube-dl/issues/342))
+youtube\-dl is packed as an executable zipfile, simply unzip it (might
+need renaming to \f[C]youtube\-dl.zip\f[] first on some systems) or
+clone the git repository, as laid out above.
+If you modify the code, you can run it by executing the
+\f[C]__main__.py\f[] file.
+To recompile the executable, run \f[C]make\ youtube\-dl\f[].
+.SS The exe throws an error due to missing \f[C]MSVCR100.dll\f[]
+.PP
+To run the exe you first need to install the Microsoft Visual C++ 2010
+Redistributable Package
+(x86) (https://www.microsoft.com/en-US/download/details.aspx?id=5555).
+.SS On Windows, how should I set up ffmpeg and youtube\-dl? Where should I put the exe files?
+.PP
+If you put youtube\-dl and ffmpeg in the same directory that you\[aq]re
+running the command from, it will work, but that\[aq]s rather
+cumbersome.
+.PP
+To make a different directory work \- either for ffmpeg, or for
+youtube\-dl, or for both \- simply create the directory (say,
+\f[C]C:\\bin\f[], or \f[C]C:\\Users\\<User\ name>\\bin\f[]), put all the
+executables directly in there, and then set your PATH environment
+variable (https://www.java.com/en/download/help/path.xml) to include
+that directory.
+.PP
+From then on, after restarting your shell, you will be able to access
+both youtube\-dl and ffmpeg (and youtube\-dl will be able to find
+ffmpeg) by simply typing \f[C]youtube\-dl\f[] or \f[C]ffmpeg\f[], no
+matter what directory you\[aq]re in.
+.SS How do I put downloads into a specific folder?
+.PP
+Use the \f[C]\-o\f[] option to specify an output template (#output-template),
+for example \f[C]\-o\ "/home/user/videos/%(title)s\-%(id)s.%(ext)s"\f[].
+If you want this for all of your downloads, put the option into your
+configuration file (#configuration).
+.SS How do I download a video starting with a \f[C]\-\f[]?
+.PP
+Either prepend \f[C]http://www.youtube.com/watch?v=\f[] or separate the
+ID from the options with \f[C]\-\-\f[]:
+.IP
+.nf
+\f[C]
+youtube\-dl\ \-\-\ \-wNyEUrxzFU
+youtube\-dl\ "http://www.youtube.com/watch?v=\-wNyEUrxzFU"
+\f[]
+.fi
+.SS How do I pass cookies to youtube\-dl?
+.PP
+Use the \f[C]\-\-cookies\f[] option, for example
+\f[C]\-\-cookies\ /path/to/cookies/file.txt\f[].
+.PP
+In order to extract cookies from a browser, use any conforming browser
+extension for exporting cookies.
+For example,
+cookies.txt (https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg)
+(for Chrome) or Export
+Cookies (https://addons.mozilla.org/en-US/firefox/addon/export-cookies/)
+(for Firefox).
+.PP
+Note that the cookies file must be in Mozilla/Netscape format and the
+first line of the cookies file must be either
+\f[C]#\ HTTP\ Cookie\ File\f[] or
+\f[C]#\ Netscape\ HTTP\ Cookie\ File\f[].
+Make sure you have correct newline
+format (https://en.wikipedia.org/wiki/Newline) in the cookies file and
+convert newlines if necessary to correspond with your OS, namely
+\f[C]CRLF\f[] (\f[C]\\r\\n\f[]) for Windows and \f[C]LF\f[]
+(\f[C]\\n\f[]) for Unix and Unix\-like systems (Linux, Mac OS, etc.).
+\f[C]HTTP\ Error\ 400:\ Bad\ Request\f[] when using \f[C]\-\-cookies\f[]
+is a good sign of invalid newline format.
+.PP
+Passing cookies to youtube\-dl is a good way to work around login when a
+particular extractor does not implement it explicitly.
+Another use case is working around
+CAPTCHA (https://en.wikipedia.org/wiki/CAPTCHA) some websites require
+you to solve in particular cases in order to get access (e.g.
+YouTube, CloudFlare).
+.SS How do I stream directly to media player?
+.PP
+You will first need to tell youtube\-dl to stream media to stdout with
+\f[C]\-o\ \-\f[], and also tell your media player to read from stdin (it
+must be capable of this for streaming) and then pipe the former to the
+latter.
+For example, streaming to vlc (http://www.videolan.org/) can be achieved
+with:
+.IP
+.nf
+\f[C]
+youtube\-dl\ \-o\ \-\ "http://www.youtube.com/watch?v=BaW_jenozKc"\ |\ vlc\ \-
+\f[]
+.fi
+.SS How do I download only new videos from a playlist?
+.PP
+Use the download\-archive feature.
+With this feature you should initially download the complete playlist
+with \f[C]\-\-download\-archive\ /path/to/download/archive/file.txt\f[]
+that will record identifiers of all the videos in a special file.
+Each subsequent run with the same \f[C]\-\-download\-archive\f[] will
+download only new videos and skip all videos that have been downloaded
+before.
+Note that only successful downloads are recorded in the file.
+.PP
+For example, at first,
+.IP
+.nf
+\f[C]
+youtube\-dl\ \-\-download\-archive\ archive.txt\ "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
+\f[]
+.fi
+.PP
+will download the complete \f[C]PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re\f[]
+playlist and create a file \f[C]archive.txt\f[].
+Each subsequent run will only download new videos if any:
+.IP
+.nf
+\f[C]
+youtube\-dl\ \-\-download\-archive\ archive.txt\ "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
+\f[]
+.fi
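+.PP
+The bookkeeping behind this feature can be sketched in Python as
+follows.
+The \f[C]load_archive\f[] and \f[C]download_new\f[] helpers, the
+stand\-in downloader and the one\-identifier\-per\-line file layout are
+hypothetical simplifications, not youtube\-dl\[aq]s actual
+implementation.

```python
import os
import tempfile


def load_archive(path):
    """Return the set of entries already recorded in the archive file."""
    if not os.path.exists(path):
        return set()
    with open(path) as archive:
        return {line.strip() for line in archive if line.strip()}


def download_new(video_ids, path, download):
    """Download only unrecorded videos; record each one after it succeeds."""
    seen = load_archive(path)
    fetched = []
    for video_id in video_ids:
        if video_id in seen:
            continue  # already downloaded in an earlier run
        if download(video_id):
            # Only successful downloads are appended to the archive.
            with open(path, 'a') as archive:
                archive.write(video_id + '\n')
            seen.add(video_id)
            fetched.append(video_id)
    return fetched


archive_path = os.path.join(tempfile.mkdtemp(), 'archive.txt')
fake_download = lambda video_id: video_id != 'broken'  # stand-in downloader

first_run = download_new(['a', 'broken', 'b'], archive_path, fake_download)
second_run = download_new(['a', 'broken', 'b', 'c'], archive_path, fake_download)
print(first_run, second_run)
```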
+.SS Should I add \f[C]\-\-hls\-prefer\-native\f[] into my config?
+.PP
+When youtube\-dl detects an HLS video, it can download it either with
+the built\-in downloader or ffmpeg.
+Since many HLS streams are slightly invalid and ffmpeg/youtube\-dl each
+handle some invalid cases better than the other, there is an option to
+switch the downloader if needed.
+.PP
+When youtube\-dl knows that one particular downloader works better for a
+given website, that downloader will be picked.
+Otherwise, youtube\-dl will pick the best downloader for general
+compatibility, which at the moment happens to be ffmpeg.
+This choice may change in future versions of youtube\-dl, with
+improvements of the built\-in downloader and/or ffmpeg.
+.PP
+In particular, the generic extractor (used when your website is not in
+the list of sites supported by
+youtube\-dl (http://rg3.github.io/youtube-dl/supportedsites.html))
+cannot mandate one specific downloader.
+.PP
+If you put either \f[C]\-\-hls\-prefer\-native\f[] or
+\f[C]\-\-hls\-prefer\-ffmpeg\f[] into your configuration, a different
+subset of videos will fail to download correctly.
+Instead, it is much better to file an issue (https://yt-dl.org/bug) or a
+pull request which details why the native or the ffmpeg HLS downloader
+is a better choice for your use case.
+.SS Can you add support for this anime video site, or site which shows current movies for free?
+.PP
+As a matter of policy (as well as legality), youtube\-dl does not
+include support for services that specialize in infringing copyright.
+As a rule of thumb, if you cannot easily find a video that the service
+is quite obviously allowed to distribute (i.e.
+that has been uploaded by the creator, the creator\[aq]s distributor, or
+is published under a free license), the service is probably unfit for
+inclusion in youtube\-dl.
+.PP
+A note on the service that they don\[aq]t host the infringing content,
+but just link to those who do, is evidence that the service should
+\f[B]not\f[] be included into youtube\-dl.
+The same goes for any DMCA note when the whole front page of the service
+is filled with videos they are not allowed to distribute.
+A "fair use" note is equally unconvincing if the service shows
+copyright\-protected videos in full without authorization.
+.PP
+Support requests for services that \f[B]do\f[] purchase the rights to
+distribute their content are perfectly fine though.
+If in doubt, you can simply include a source that mentions the
+legitimate purchase of content.
+.SS How can I speed up work on my issue?
+.PP
+(Also known as: Help, my important issue is not being solved!) The
+youtube\-dl core developer team is quite small.
+While we do our best to solve as many issues as possible, sometimes that
+can take quite a while.
+To speed up your issue, here\[aq]s what you can do:
+.PP
+First of all, please do report the issue at our issue
+tracker (https://yt-dl.org/bugs).
+That allows us to coordinate all efforts by users and developers, and
+serves as a unified point.
+Unfortunately, the youtube\-dl project has grown too large to use
+personal email as an effective communication channel.
+.PP
+Please read the bug reporting instructions (#bugs) below.
+A lot of bugs lack all the necessary information.
+If you can, offer proxy, VPN, or shell access to the youtube\-dl
+developers.
+If you are able to, test the issue from multiple computers in multiple
+countries to exclude local censorship or misconfiguration issues.
+.PP
+If nobody is interested in solving your issue, you are welcome to take
+matters into your own hands and submit a pull request (or coerce/pay
+somebody else to do so).
+.PP
+Feel free to bump the issue from time to time by writing a small comment
+("Issue is still present in youtube\-dl version ...from France, but
+fixed from Belgium"), but please not more than once a month.
+Please do not declare your issue as \f[C]important\f[] or
+\f[C]urgent\f[].
+.SS How can I detect whether a given URL is supported by youtube\-dl?
+.PP
+For one, have a look at the list of supported
+sites (docs/supportedsites.md).
+Note that it can sometimes happen that the site changes its URL scheme
+(say, from http://example.com/video/1234567 to
+http://example.com/v/1234567 ) and youtube\-dl reports a URL of a
+service in that list as unsupported.
+In that case, simply report a bug.
+.PP
+It is \f[I]not\f[] possible to detect whether a URL is supported or not.
+That\[aq]s because youtube\-dl contains a generic extractor which
+matches \f[B]all\f[] URLs.
+You may be tempted to disable, exclude, or remove the generic extractor,
+but the generic extractor not only allows users to extract videos from
+lots of websites that embed a video from another service, but may also
+be used to extract video from a service that it\[aq]s hosting itself.
+Therefore, we neither recommend nor support disabling, excluding, or
+removing the generic extractor.
+.PP
+If you want to find out whether a given URL is supported, simply call
+youtube\-dl with it.
+If you get no videos back, chances are the URL is either not referring
+to a video or unsupported.
+You can find out which by examining the output (if you run youtube\-dl
+on the console) or catching an \f[C]UnsupportedError\f[] exception if
+you run it from a Python program.
+.SS Why do I need to go through that much red tape when filing bugs?
+.PP
+Before we had the issue template, despite our extensive bug reporting
+instructions (#bugs), about 80% of the issue reports we got were
+useless, for instance because people used ancient versions hundreds of
+releases old, because of simple syntactic errors (not in youtube\-dl but
+in general shell usage), because the problem was already reported
+multiple times before, because people did not actually read an error
+message, even if it said "please install ffmpeg", because people did not
+mention the URL they were trying to download, and many more simple,
+easy\-to\-avoid problems, many of which were totally unrelated to
+youtube\-dl.
+.PP
+youtube\-dl is an open\-source project manned by too few volunteers, so
+we\[aq]d rather spend time fixing bugs where we are certain none of
+those simple problems apply, and where we can be reasonably confident
+that we can reproduce the issue without asking the reporter repeatedly.
+As such, the output of \f[C]youtube\-dl\ \-v\ YOUR_URL_HERE\f[] is
+really all that\[aq]s required to file an issue.
+The issue template also guides you through some basic steps you can do,
+such as checking that your version of youtube\-dl is current.
+.SH DEVELOPER INSTRUCTIONS
+.PP
+Most users do not need to build youtube\-dl and can download the
+builds (http://rg3.github.io/youtube-dl/download.html) or get them from
+their distribution.
+.PP
+To run youtube\-dl as a developer, you don\[aq]t need to build anything
+either.
+Simply execute
+.IP
+.nf
+\f[C]
+python\ \-m\ youtube_dl
+\f[]
+.fi
+.PP
+To run the tests, simply invoke your favorite test runner, or execute a
+test file directly; any of the following work:
+.IP
+.nf
+\f[C]
+python\ \-m\ unittest\ discover
+python\ test/test_download.py
+nosetests
+\f[]
+.fi
+.PP
+If you want to create a build of youtube\-dl yourself, you\[aq]ll need
+.IP \[bu] 2
+python
+.IP \[bu] 2
+make (only GNU make is supported)
+.IP \[bu] 2
+pandoc
+.IP \[bu] 2
+zip
+.IP \[bu] 2
+nosetests
+.SS Adding support for a new site
+.PP
+If you want to add support for a new site, first of all \f[B]make
+sure\f[] this site is \f[B]not dedicated to copyright
+infringement (README.md#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)\f[].
+youtube\-dl does \f[B]not support\f[] such sites, so pull requests
+adding support for them \f[B]will be rejected\f[].
+.PP
+After you have ensured this site is distributing its content legally,
+you can follow this quick list (assuming your service is called
+\f[C]yourextractor\f[]):
+.IP " 1." 4
+Fork this repository (https://github.com/rg3/youtube-dl/fork)
+.IP " 2." 4
+Check out the source code with:
+.RS 4
+.IP
+.nf
+\f[C]
+git\ clone\ git\@github.com:YOUR_GITHUB_USERNAME/youtube\-dl.git
+\f[]
+.fi
+.RE
+.IP " 3." 4
+Start a new git branch with
+.RS 4
+.IP
+.nf
+\f[C]
+cd\ youtube\-dl
+git\ checkout\ \-b\ yourextractor
+\f[]
+.fi
+.RE
+.IP " 4." 4
+Start with this simple template and save it to
+\f[C]youtube_dl/extractor/yourextractor.py\f[]:
+.RS 4
+.IP
+.nf
+\f[C]
+#\ coding:\ utf\-8
+from\ __future__\ import\ unicode_literals
+
+from\ .common\ import\ InfoExtractor
+
+
+class\ YourExtractorIE(InfoExtractor):
+\ \ \ \ _VALID_URL\ =\ r\[aq]https?://(?:www\\.)?yourextractor\\.com/watch/(?P<id>[0\-9]+)\[aq]
+\ \ \ \ _TEST\ =\ {
+\ \ \ \ \ \ \ \ \[aq]url\[aq]:\ \[aq]http://yourextractor.com/watch/42\[aq],
+\ \ \ \ \ \ \ \ \[aq]md5\[aq]:\ \[aq]TODO:\ md5\ sum\ of\ the\ first\ 10241\ bytes\ of\ the\ video\ file\ (use\ \-\-test)\[aq],
+\ \ \ \ \ \ \ \ \[aq]info_dict\[aq]:\ {
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]id\[aq]:\ \[aq]42\[aq],
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]ext\[aq]:\ \[aq]mp4\[aq],
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]title\[aq]:\ \[aq]Video\ title\ goes\ here\[aq],
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]thumbnail\[aq]:\ r\[aq]re:^https?://.*\\.jpg$\[aq],
+\ \ \ \ \ \ \ \ \ \ \ \ #\ TODO\ more\ properties,\ either\ as:
+\ \ \ \ \ \ \ \ \ \ \ \ #\ *\ A\ value
+\ \ \ \ \ \ \ \ \ \ \ \ #\ *\ MD5\ checksum;\ start\ the\ string\ with\ md5:
+\ \ \ \ \ \ \ \ \ \ \ \ #\ *\ A\ regular\ expression;\ start\ the\ string\ with\ re:
+\ \ \ \ \ \ \ \ \ \ \ \ #\ *\ Any\ Python\ type\ (for\ example\ int\ or\ float)
+\ \ \ \ \ \ \ \ }
+\ \ \ \ }
+
+\ \ \ \ def\ _real_extract(self,\ url):
+\ \ \ \ \ \ \ \ video_id\ =\ self._match_id(url)
+\ \ \ \ \ \ \ \ webpage\ =\ self._download_webpage(url,\ video_id)
+
+\ \ \ \ \ \ \ \ #\ TODO\ more\ code\ goes\ here,\ for\ example\ ...
+\ \ \ \ \ \ \ \ title\ =\ self._html_search_regex(r\[aq]<h1>(.+?)</h1>\[aq],\ webpage,\ \[aq]title\[aq])
+
+\ \ \ \ \ \ \ \ return\ {
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]id\[aq]:\ video_id,
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]title\[aq]:\ title,
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]description\[aq]:\ self._og_search_description(webpage),
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]uploader\[aq]:\ self._search_regex(r\[aq]<div[^>]+id="uploader"[^>]*>([^<]+)<\[aq],\ webpage,\ \[aq]uploader\[aq],\ fatal=False),
+\ \ \ \ \ \ \ \ \ \ \ \ #\ TODO\ more\ properties\ (see\ youtube_dl/extractor/common.py)
+\ \ \ \ \ \ \ \ }
+\f[]
+.fi
+.RE
+.IP " 5." 4
+Add an import in
+\f[C]youtube_dl/extractor/extractors.py\f[] (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
+.IP " 6." 4
+Run
+\f[C]python\ test/test_download.py\ TestDownload.test_YourExtractor\f[].
+This \f[I]should fail\f[] at first, but you can continually re\-run it
+until you\[aq]re done.
+If you decide to add more than one test, then rename \f[C]_TEST\f[] to
+\f[C]_TESTS\f[] and make it into a list of dictionaries.
+The tests will then be named \f[C]TestDownload.test_YourExtractor\f[],
+\f[C]TestDownload.test_YourExtractor_1\f[],
+\f[C]TestDownload.test_YourExtractor_2\f[], etc.
+.IP " 7." 4
+Have a look at
+\f[C]youtube_dl/extractor/common.py\f[] (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py)
+for possible helper methods and a detailed description of what your
+extractor should and may
+return (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252).
+Add tests and code for as many as you want.
+.IP " 8." 4
+Make sure your code follows youtube\-dl coding
+conventions (#youtube-dl-coding-conventions) and check the code with
+flake8 (https://pypi.python.org/pypi/flake8).
+Also make sure your code works under all Python (http://www.python.org/)
+versions claimed supported by youtube\-dl, namely 2.6, 2.7, and 3.2+.
+.IP " 9." 4
+When the tests pass, add (http://git-scm.com/docs/git-add) the new files
+and commit (http://git-scm.com/docs/git-commit) them and
+push (http://git-scm.com/docs/git-push) the result, like this:
+.RS 4
+.IP
+.nf
+\f[C]
+$\ git\ add\ youtube_dl/extractor/extractors.py
+$\ git\ add\ youtube_dl/extractor/yourextractor.py
+$\ git\ commit\ \-m\ \[aq][yourextractor]\ Add\ new\ extractor\[aq]
+$\ git\ push\ origin\ yourextractor
+\f[]
+.fi
+.RE
+.IP "10." 4
+Finally, create a pull
+request (https://help.github.com/articles/creating-a-pull-request).
+We\[aq]ll then review and merge it.
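+.PP
+As a rough illustration of the \f[C]_TESTS\f[] form mentioned in step 6,
+the class attribute might look like the following fragment.
+The URLs and field values are placeholders for the hypothetical
+\f[C]yourextractor\f[] service, not real test data:
+.IP
+.nf
+\f[C]
+\ \ \ \ _TESTS\ =\ [{
+\ \ \ \ \ \ \ \ #\ first\ test;\ run\ as\ TestDownload.test_YourExtractor
+\ \ \ \ \ \ \ \ \[aq]url\[aq]:\ \[aq]http://yourextractor.com/watch/42\[aq],
+\ \ \ \ \ \ \ \ \[aq]info_dict\[aq]:\ {
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]id\[aq]:\ \[aq]42\[aq],
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]ext\[aq]:\ \[aq]mp4\[aq],
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]title\[aq]:\ \[aq]Video\ title\ goes\ here\[aq],
+\ \ \ \ \ \ \ \ },
+\ \ \ \ },\ {
+\ \ \ \ \ \ \ \ #\ placeholder\ second\ test;\ run\ as\ TestDownload.test_YourExtractor_1
+\ \ \ \ \ \ \ \ \[aq]url\[aq]:\ \[aq]http://yourextractor.com/watch/43\[aq],
+\ \ \ \ \ \ \ \ \[aq]info_dict\[aq]:\ {
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]id\[aq]:\ \[aq]43\[aq],
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]ext\[aq]:\ \[aq]mp4\[aq],
+\ \ \ \ \ \ \ \ \ \ \ \ \[aq]title\[aq]:\ \[aq]Another\ video\ title\[aq],
+\ \ \ \ \ \ \ \ },
+\ \ \ \ }]
+\f[]
+.fi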
+.PP
+In any case, thank you very much for your contributions!
+.SS youtube\-dl coding conventions
+.PP
+This section introduces guidelines for writing idiomatic, robust and
+future\-proof extractor code.
+.PP
+Extractors are very fragile by nature since they depend on the layout of
+the source data provided by 3rd party media hosters out of your control
+and this layout tends to change.
+As an extractor implementer your task is not only to write code that
+will extract media links and metadata correctly but also to minimize
+dependency on the source\[aq]s layout and even to make the code foresee
+potential future changes and be ready for that.
+This is important because it will allow the extractor not to break on
+minor layout changes, thus keeping old youtube\-dl versions working.
+Even though such breakage is easily fixed by releasing a new
+version of youtube\-dl with the fix incorporated, all the previous
+versions remain broken in all repositories and distro packages
+that may not be so prompt in fetching the update from us.
+Needless to say, some non\-rolling\-release distros may never receive an
+update at all.
+.SS Mandatory and optional metafields
+.PP
+For extraction to work youtube\-dl relies on metadata your extractor
+extracts and provides to youtube\-dl expressed by an information
+dictionary (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257)
+or simply \f[I]info dict\f[].
+Only the following meta fields in the \f[I]info dict\f[] are considered
+mandatory for a successful extraction process by youtube\-dl:
+.IP \[bu] 2
+\f[C]id\f[] (media identifier)
+.IP \[bu] 2
+\f[C]title\f[] (media title)
+.IP \[bu] 2
+\f[C]url\f[] (media download URL) or \f[C]formats\f[]
+.PP
+In fact only the last option is technically mandatory (i.e.
+if you can\[aq]t figure out the download location of the media the
+extraction does not make any sense).
+But by convention youtube\-dl also treats \f[C]id\f[] and \f[C]title\f[]
+as mandatory.
+Thus the aforementioned metafields are the critical data without which
+extraction does not make any sense; if any of them fails to be
+extracted, the extractor is considered completely broken.
+.PP
+Any
+field (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257)
+apart from the aforementioned ones is considered \f[B]optional\f[].
+That means that extraction should be \f[B]tolerant\f[] of situations
+where sources for these fields can potentially be unavailable (even if
+they are always available at the moment) and \f[B]future\-proof\f[] in
+order not to break the extraction of general purpose mandatory fields.
+.SS Example
+.PP
+Say you have some source dictionary \f[C]meta\f[] that you\[aq]ve
+fetched as JSON with an HTTP request and it has a key \f[C]summary\f[]:
+.IP
+.nf
+\f[C]
+meta\ =\ self._download_json(url,\ video_id)
+\f[]
+.fi
+.PP
+Assume at this point \f[C]meta\f[]\[aq]s layout is:
+.IP
+.nf
+\f[C]
+{
+\ \ \ \ ...
+\ \ \ \ "summary":\ "some\ fancy\ summary\ text",
+\ \ \ \ ...
+}
+\f[]
+.fi
+.PP
+Assume you want to extract \f[C]summary\f[] and put it into the
+resulting info dict as \f[C]description\f[].
+Since \f[C]description\f[] is an optional meta field you should be
+prepared for this key to be missing from the \f[C]meta\f[] dict, so you
+should extract it like:
+.IP
+.nf
+\f[C]
+description\ =\ meta.get(\[aq]summary\[aq])\ \ #\ correct
+\f[]
+.fi
+.PP
+and not like:
+.IP
+.nf
+\f[C]
+description\ =\ meta[\[aq]summary\[aq]]\ \ #\ incorrect
+\f[]
+.fi
+.PP
+The latter will break the extraction process with a \f[C]KeyError\f[] if
+\f[C]summary\f[] disappears from \f[C]meta\f[] at some later time, but
+with the former approach extraction will just go ahead with
+\f[C]description\f[] set to \f[C]None\f[], which is perfectly fine
+(remember \f[C]None\f[] is equivalent to the absence of data).
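+.PP
+The same caution applies when the optional value is nested.
+As a sketch, assume \f[C]meta\f[] may carry a hypothetical
+\f[C]author\f[] object with a \f[C]name\f[] key (both names are
+assumptions made for illustration, not taken from any real service),
+and that \f[C]author\f[], when present, is a dictionary:
+.IP
+.nf
+\f[C]
+#\ tolerate\ a\ missing\ or\ null\ \[aq]author\[aq]\ object\ as\ well\ as\ a\ missing\ \[aq]name\[aq]
+uploader\ =\ (meta.get(\[aq]author\[aq])\ or\ {}).get(\[aq]name\[aq])
+\f[]
+.fi
+.PP
+If either level is absent, \f[C]uploader\f[] ends up as \f[C]None\f[]
+and extraction continues.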
+.PP
+Similarly, you should pass \f[C]fatal=False\f[] when extracting optional
+data from a webpage with \f[C]_search_regex\f[],
+\f[C]_html_search_regex\f[] or similar methods, for instance:
+.IP
+.nf
+\f[C]
+description\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span[^>]+id="title"[^>]*>([^<]+)<\[aq],
+\ \ \ \ webpage,\ \[aq]description\[aq],\ fatal=False)
+\f[]
+.fi
+.PP
+With \f[C]fatal\f[] set to \f[C]False\f[], if \f[C]_search_regex\f[]
+fails to extract \f[C]description\f[], it will emit a warning and
+continue extraction.
+.PP
+You can also pass \f[C]default=<some\ fallback\ value>\f[], for example:
+.IP
+.nf
+\f[C]
+description\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span[^>]+id="title"[^>]*>([^<]+)<\[aq],
+\ \ \ \ webpage,\ \[aq]description\[aq],\ default=None)
+\f[]
+.fi
+.PP
+On failure this code will silently continue the extraction with
+\f[C]description\f[] set to \f[C]None\f[].
+That is useful for metafields that may or may not be present.
+.SS Provide fallbacks
+.PP
+When extracting metadata try to do so from multiple sources.
+For example if \f[C]title\f[] is present in several places, try
+extracting from at least some of them.
+This makes it more future\-proof in case some of the sources become
+unavailable.
+.SS Example
+.PP
+Say \f[C]meta\f[] from the previous example has a \f[C]title\f[] and you
+are about to extract it.
+Since \f[C]title\f[] is a mandatory meta field you should end up with
+something like:
+.IP
+.nf
+\f[C]
+title\ =\ meta[\[aq]title\[aq]]
+\f[]
+.fi
+.PP
+If \f[C]title\f[] disappears from \f[C]meta\f[] in the future due to some
+changes on the hoster\[aq]s side the extraction would fail since
+\f[C]title\f[] is mandatory.
+That\[aq]s expected.
+.PP
+Assume that you have another source you can extract \f[C]title\f[]
+from, for example \f[C]og:title\f[] HTML meta of a \f[C]webpage\f[].
+In this case you can provide a fallback scenario:
+.IP
+.nf
+\f[C]
+title\ =\ meta.get(\[aq]title\[aq])\ or\ self._og_search_title(webpage)
+\f[]
+.fi
+.PP
+This code will try to extract from \f[C]meta\f[] first and if it fails
+it will try extracting \f[C]og:title\f[] from a \f[C]webpage\f[].
+.SS Make regular expressions flexible
+.PP
+When using regular expressions try to write them fuzzy and flexible.
+.SS Example
+.PP
+Say you need to extract \f[C]title\f[] from the following HTML code:
+.IP
+.nf
+\f[C]
+<span\ style="position:\ absolute;\ left:\ 910px;\ width:\ 90px;\ float:\ right;\ z\-index:\ 9999;"\ class="title">some\ fancy\ title</span>
+\f[]
+.fi
+.PP
+The code for that task should look similar to:
+.IP
+.nf
+\f[C]
+title\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span[^>]+class="title"[^>]*>([^<]+)\[aq],\ webpage,\ \[aq]title\[aq])
+\f[]
+.fi
+.PP
+Or even better:
+.IP
+.nf
+\f[C]
+title\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span[^>]+class=(["\\\[aq]])title\\1[^>]*>(?P<title>[^<]+)\[aq],
+\ \ \ \ webpage,\ \[aq]title\[aq],\ group=\[aq]title\[aq])
+\f[]
+.fi
+.PP
+Note how you tolerate potential changes in the \f[C]style\f[]
+attribute\[aq]s value or a switch from double quotes to single quotes
+for the \f[C]class\f[] attribute.
+.PP
+The code definitely should not look like:
+.IP
+.nf
+\f[C]
+title\ =\ self._search_regex(
+\ \ \ \ r\[aq]<span\ style="position:\ absolute;\ left:\ 910px;\ width:\ 90px;\ float:\ right;\ z\-index:\ 9999;"\ class="title">(.*?)</span>\[aq],
+\ \ \ \ webpage,\ \[aq]title\[aq])
+\f[]
+.fi
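+.PP
+To see why the flexible form is preferable, the pattern can be tried
+outside youtube\-dl with the standard \f[C]re\f[] module.
+In this sketch the changed markup is an invented example of what a
+hoster might serve after a redesign:
+.IP
+.nf
+\f[C]
+import\ re
+
+pattern\ =\ r\[aq]<span[^>]+class=(["\\\[aq]])title\\1[^>]*>(?P<title>[^<]+)\[aq]
+
+#\ hypothetical\ markup\ after\ a\ redesign:\ no\ style\ attribute,\ single\ quotes
+changed\ =\ "<span\ class=\[aq]title\[aq]>some\ fancy\ title</span>"
+
+mobj\ =\ re.search(pattern,\ changed)
+print(mobj.group(\[aq]title\[aq])\ if\ mobj\ else\ None)
+\f[]
+.fi
+.PP
+The flexible pattern still finds the title here, whereas the rigid
+pattern above no longer matches.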
+.SS Use safe conversion functions
+.PP
+Wrap all extracted numeric data into safe functions from \f[C]utils\f[]:
+\f[C]int_or_none\f[], \f[C]float_or_none\f[].
+Use them for string to number conversions as well.
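+.PP
+As a minimal sketch, assuming a source dictionary \f[C]meta\f[] with
+hypothetical \f[C]views\f[] and \f[C]duration\f[] keys, an extractor
+module might use them like this:
+.IP
+.nf
+\f[C]
+from\ ..utils\ import\ (
+\ \ \ \ float_or_none,
+\ \ \ \ int_or_none,
+)
+
+#\ inside\ _real_extract():\ a\ missing\ key\ yields\ None\ instead\ of\ an\ exception
+view_count\ =\ int_or_none(meta.get(\[aq]views\[aq]))
+duration\ =\ float_or_none(meta.get(\[aq]duration\[aq]))
+\f[]
+.fi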
+.SH EMBEDDING YOUTUBE\-DL
+.PP
+youtube\-dl makes the best effort to be a good command\-line program,
+and thus should be callable from any programming language.
+If you encounter any problems parsing its output, feel free to create a
+report (https://github.com/rg3/youtube-dl/issues/new).
+.PP
+From a Python program, you can embed youtube\-dl in a more powerful
+fashion, like this:
+.IP
+.nf
+\f[C]
+from\ __future__\ import\ unicode_literals
+import\ youtube_dl
+
+ydl_opts\ =\ {}
+with\ youtube_dl.YoutubeDL(ydl_opts)\ as\ ydl:
+\ \ \ \ ydl.download([\[aq]http://www.youtube.com/watch?v=BaW_jenozKc\[aq]])
+\f[]
+.fi
+.PP
+Most likely, you\[aq]ll want to use various options.
+For a list of options available, have a look at
+\f[C]youtube_dl/YoutubeDL.py\f[] (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L129-L279).
+For a start, if you want to intercept youtube\-dl\[aq]s output, set a
+\f[C]logger\f[] object.
+.PP
+Here\[aq]s a more complete example of a program that outputs only errors
+(and a short message after the download is finished), and
+downloads/converts the video to an mp3 file:
+.IP
+.nf
+\f[C]
+from\ __future__\ import\ unicode_literals
+import\ youtube_dl
+
+
+class\ MyLogger(object):
+\ \ \ \ def\ debug(self,\ msg):
+\ \ \ \ \ \ \ \ pass
+
+\ \ \ \ def\ warning(self,\ msg):
+\ \ \ \ \ \ \ \ pass
+
+\ \ \ \ def\ error(self,\ msg):
+\ \ \ \ \ \ \ \ print(msg)
+
+
+def\ my_hook(d):
+\ \ \ \ if\ d[\[aq]status\[aq]]\ ==\ \[aq]finished\[aq]:
+\ \ \ \ \ \ \ \ print(\[aq]Done\ downloading,\ now\ converting\ ...\[aq])
+
+
+ydl_opts\ =\ {
+\ \ \ \ \[aq]format\[aq]:\ \[aq]bestaudio/best\[aq],
+\ \ \ \ \[aq]postprocessors\[aq]:\ [{
+\ \ \ \ \ \ \ \ \[aq]key\[aq]:\ \[aq]FFmpegExtractAudio\[aq],
+\ \ \ \ \ \ \ \ \[aq]preferredcodec\[aq]:\ \[aq]mp3\[aq],
+\ \ \ \ \ \ \ \ \[aq]preferredquality\[aq]:\ \[aq]192\[aq],
+\ \ \ \ }],
+\ \ \ \ \[aq]logger\[aq]:\ MyLogger(),
+\ \ \ \ \[aq]progress_hooks\[aq]:\ [my_hook],
+}
+with\ youtube_dl.YoutubeDL(ydl_opts)\ as\ ydl:
+\ \ \ \ ydl.download([\[aq]http://www.youtube.com/watch?v=BaW_jenozKc\[aq]])
+\f[]
+.fi
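+.PP
+Progress hooks also receive intermediate updates while a download is
+running.
+The following sketch, a variation on \f[C]my_hook\f[] above, prints a
+rough percentage; \f[C]progress_hook\f[] is a name made up for this
+example, and the byte counts are read defensively since the total size
+may be unknown:
+.IP
+.nf
+\f[C]
+def\ progress_hook(d):
+\ \ \ \ #\ only\ react\ to\ intermediate\ progress\ updates
+\ \ \ \ if\ d[\[aq]status\[aq]]\ !=\ \[aq]downloading\[aq]:
+\ \ \ \ \ \ \ \ return
+\ \ \ \ done\ =\ d.get(\[aq]downloaded_bytes\[aq])
+\ \ \ \ total\ =\ d.get(\[aq]total_bytes\[aq])\ or\ d.get(\[aq]total_bytes_estimate\[aq])
+\ \ \ \ if\ done\ is\ not\ None\ and\ total:
+\ \ \ \ \ \ \ \ print(\[aq]%.1f%%\ of\ %s\[aq]\ %\ (100.0\ *\ done\ /\ total,\ d.get(\[aq]filename\[aq])))
+\f[]
+.fi
+.PP
+It would be registered the same way, by adding it to the
+\f[C]progress_hooks\f[] list in \f[C]ydl_opts\f[].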
+.SH BUGS
+.PP
+Bugs and suggestions should be reported at:
+<https://github.com/rg3/youtube-dl/issues>.
+Unless you were prompted to or there is another pertinent reason (e.g.
+GitHub fails to accept the bug report), please do not send bug reports
+via personal email.
+For discussions, join us in the IRC channel
+#youtube\-dl (irc://chat.freenode.net/#youtube-dl) on freenode
+(webchat (http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
+.PP
+\f[B]Please include the full output of youtube\-dl when run with
+\f[C]\-v\f[]\f[], i.e.
+\f[B]add\f[] \f[C]\-v\f[] flag to \f[B]your command line\f[], copy the
+\f[B]whole\f[] output and post it in the issue body wrapped in ``` for
+better formatting.
+It should look similar to this:
+.IP
+.nf
+\f[C]
+$\ youtube\-dl\ \-v\ <your\ command\ line>
+[debug]\ System\ config:\ []
+[debug]\ User\ config:\ []
+[debug]\ Command\-line\ args:\ [u\[aq]\-v\[aq],\ u\[aq]http://www.youtube.com/watch?v=BaW_jenozKcj\[aq]]
+[debug]\ Encodings:\ locale\ cp1251,\ fs\ mbcs,\ out\ cp866,\ pref\ cp1251
+[debug]\ youtube\-dl\ version\ 2015.12.06
+[debug]\ Git\ HEAD:\ 135392e
+[debug]\ Python\ version\ 2.6.6\ \-\ Windows\-2003Server\-5.2.3790\-SP2
+[debug]\ exe\ versions:\ ffmpeg\ N\-75573\-g1d0487f,\ ffprobe\ N\-75573\-g1d0487f,\ rtmpdump\ 2.4
+[debug]\ Proxy\ map:\ {}
+\&...
+\f[]
+.fi
+.PP
+\f[B]Do not post screenshots of verbose logs; only plain text is
+acceptable.\f[]
+.PP
+The output (including the first lines) contains important debugging
+information.
+Issues without the full output are often not reproducible and therefore
+do not get solved in short order, if ever.
+.PP
+Please re\-read your issue once again to avoid a couple of common
+mistakes (you can and should use this as a checklist):
+.SS Is the description of the issue itself sufficient?
+.PP
+We often get issue reports that we cannot really decipher.
+While in most cases we eventually get the required information after
+asking back multiple times, this poses an unnecessary drain on our
+resources.
+Many contributors, including myself, are also not native speakers, so we
+may misread some parts.
+.PP
+So please elaborate on what feature you are requesting, or what bug you
+want to be fixed.
+Make sure that it\[aq]s obvious
+.IP \[bu] 2
+What the problem is
+.IP \[bu] 2
+How it could be fixed
+.IP \[bu] 2
+What your proposed solution would look like
+.PP
+If your report is shorter than two lines, it is almost certainly missing
+some of these, which makes it hard for us to respond to it.
+We\[aq]re often too polite to close the issue outright, but the missing
+info makes misinterpretation likely.
+As a committer myself, I often get frustrated by these issues, since the
+only possible way for me to move forward on them is to ask for
+clarification over and over.
+.PP
+For bug reports, this means that your report should contain the
+\f[I]complete\f[] output of youtube\-dl when called with the
+\f[C]\-v\f[] flag.
+The error message you get for (most) bugs even says so, but you would
+not believe how many of our bug reports do not contain this information.
+.PP
+If your server has multiple IPs or you suspect censorship, adding
+\f[C]\-\-call\-home\f[] may be a good idea to get more diagnostics.
+If the error is \f[C]ERROR:\ Unable\ to\ extract\ ...\f[] and you cannot
+reproduce it from multiple countries, add \f[C]\-\-dump\-pages\f[]
+(warning: this will yield a rather large output, redirect it to the file
+\f[C]log.txt\f[] by adding \f[C]>log.txt\ 2>&1\f[] to your
+command\-line) or upload the \f[C]\&.dump\f[] files you get when you add
+\f[C]\-\-write\-pages\f[] somewhere (https://gist.github.com/).
+.PP
+\f[B]Site support requests must contain an example URL\f[].
+An example URL is a URL you might want to download, like
+\f[C]http://www.youtube.com/watch?v=BaW_jenozKc\f[].
+There should be an obvious video present.
+Except under very special circumstances, the main page of a video
+service (e.g.
+\f[C]http://www.youtube.com/\f[]) is \f[I]not\f[] an example URL.
+.SS Are you using the latest version?
+.PP
+Before reporting any issue, type \f[C]youtube\-dl\ \-U\f[].
+This should report that you\[aq]re up\-to\-date.
+About 20% of the reports we receive are already fixed, but people are
+using outdated versions.
+This goes for feature requests as well.
+.SS Is the issue already documented?
+.PP
+Make sure that someone has not already opened the issue you\[aq]re
+trying to open.
+Search at the top of the window or browse the GitHub
+Issues (https://github.com/rg3/youtube-dl/search?type=Issues) of this
+repository.
+If there is an issue, feel free to write something along the lines of
+"This affects me as well, with version 2015.01.01.
+Here is some more information on the issue: ...".
+While some issues may be old, a new post into them often spurs rapid
+activity.
+.SS Why are existing options not enough?
+.PP
+Before requesting a new feature, please have a quick peek at the list of
+supported
+options (https://github.com/rg3/youtube-dl/blob/master/README.md#options).
+Many feature requests are for features that actually exist already!
+Please, absolutely do show off your work in the issue report and detail
+how the existing similar options do \f[I]not\f[] solve your problem.
+.SS Is there enough context in your bug report?
+.PP
+People want to solve problems, and often think they do us a favor by
+breaking down their larger problems (e.g.
+wanting to skip already downloaded files) to a specific request (e.g.
+requesting us to look whether the file exists before downloading the
+info page).
+However, what often happens is that they break down the problem into two
+steps: one simple, and one impossible (or extremely complicated).
+.PP
+We are then presented with a very complicated request when the original
+problem could be solved far more easily, e.g.
+by recording the downloaded video IDs in a separate file.
+To avoid this, you must include the greater context where it is
+non\-obvious.
+In particular, every feature request that does not consist of adding
+support for a new site should contain a use case scenario that explains
+in what situation the missing feature would be useful.
+.SS Does the issue involve one problem, and one problem only?
+.PP
+Some of our users seem to think there is a limit on the number of issues
+they can or should open.
+There is no such limit.
+While it may seem appealing to be able to dump all your issues into one
+ticket, that means that someone who solves one of your issues cannot
+mark the issue as closed.
+Typically, reporting a bunch of issues leads to the ticket lingering
+since nobody wants to attack that behemoth, until someone mercifully
+splits the issue into multiple ones.
+.PP
+In particular, every site support request issue should only pertain to
+services at one site (generally under a common domain, but always using
+the same backend technology).
+Do not request support for Vimeo user videos, White House podcasts, and
+Google Plus pages in the same issue.
+Also, make sure that you don\[aq]t post bug reports alongside feature
+requests.
+As a rule of thumb, a feature request does not include outputs of
+youtube\-dl that are not immediately related to the feature at hand.
+Do not post reports of a network error alongside the request for a new
+video service.
+.SS Is anyone going to need the feature?
+.PP
+Only post features that you (or an incapacitated friend you can
+personally talk to) require.
+Do not post features because they seem like a good idea.
+If they are really useful, they will be requested by someone who
+requires them.
+.SS Is your question about youtube\-dl?
+.PP
+It may sound strange, but some bug reports we receive are completely
+unrelated to youtube\-dl and relate to a different, or even the
+reporter\[aq]s own, application.
+Please make sure that you are actually using youtube\-dl.
+If you are using a UI for youtube\-dl, report the bug to the maintainer
+of the actual application providing the UI.
+On the other hand, if your UI for youtube\-dl fails in some way you
+believe is related to youtube\-dl, by all means, go ahead and report the
+bug.
+.SH COPYRIGHT
+.PP
+youtube\-dl is released into the public domain by the copyright holders.
+.PP
+This README file was originally written by Daniel
+Bolton (https://github.com/dbbolton) and is likewise released into the
+public domain.
diff --git a/youtube-dl.bash-completion b/youtube-dl.bash-completion
new file mode 100644
index 0000000..b9e0d2e
--- /dev/null
+++ b/youtube-dl.bash-completion
@@ -0,0 +1,29 @@
+__youtube_dl()
+{
+    local cur prev opts fileopts diropts keywords
+    COMPREPLY=()
+    cur="${COMP_WORDS[COMP_CWORD]}"
+    prev="${COMP_WORDS[COMP_CWORD-1]}"
+    opts="--help --version --update --ignore-errors --abort-on-error --dump-user-agent --list-extractors --extractor-descriptions --force-generic-extractor --default-search --ignore-config --config-location --flat-playlist --mark-watched --no-mark-watched --no-color --proxy --socket-timeout --source-address --force-ipv4 --force-ipv6 --geo-verification-proxy --cn-verification-proxy --playlist-start --playlist-end --playlist-items --match-title --reject-title --max-downloads --min-filesize --max-filesize --date --datebefore --dateafter --min-views --max-views --match-filter --no-playlist --yes-playlist --age-limit --download-archive --include-ads --limit-rate --retries --fragment-retries --skip-unavailable-fragments --abort-on-unavailable-fragment --buffer-size --no-resize-buffer --test --playlist-reverse --playlist-random --xattr-set-filesize --hls-prefer-native --hls-prefer-ffmpeg --hls-use-mpegts --external-downloader --external-downloader-args --batch-file --id --output --autonumber-size --autonumber-start --restrict-filenames --auto-number --title --literal --no-overwrites --continue --no-continue --no-part --no-mtime --write-description --write-info-json --write-annotations --load-info-json --cookies --cache-dir --no-cache-dir --rm-cache-dir --write-thumbnail --write-all-thumbnails --list-thumbnails --quiet --no-warnings --simulate --skip-download --get-url --get-title --get-id --get-thumbnail --get-description --get-duration --get-filename --get-format --dump-json --dump-single-json --print-json --newline --no-progress --console-title --verbose --dump-pages --write-pages --youtube-print-sig-code --print-traffic --call-home --no-call-home --encoding --no-check-certificate --prefer-insecure --user-agent --referer --add-header --bidi-workaround --sleep-interval --max-sleep-interval --format --all-formats --prefer-free-formats --list-formats --youtube-include-dash-manifest --youtube-skip-dash-manifest --merge-output-format --write-sub --write-auto-sub --all-subs --list-subs --sub-format --sub-lang --username --password --twofactor --netrc --video-password --ap-mso --ap-username --ap-password --ap-list-mso --extract-audio --audio-format --audio-quality --recode-video --postprocessor-args --keep-video --no-post-overwrites --embed-subs --embed-thumbnail --add-metadata --metadata-from-title --xattrs --fixup --prefer-avconv --prefer-ffmpeg --ffmpeg-location --exec --convert-subs"
+    keywords=":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater :ythistory"
+    fileopts="-a|--batch-file|--download-archive|--cookies|--load-info"
+    diropts="--cache-dir"
+
+    if [[ ${prev} =~ ${fileopts} ]]; then
+        COMPREPLY=( $(compgen -f -- ${cur}) )
+        return 0
+    elif [[ ${prev} =~ ${diropts} ]]; then
+        COMPREPLY=( $(compgen -d -- ${cur}) )
+        return 0
+    fi
+
+    if [[ ${cur} =~ : ]]; then
+        COMPREPLY=( $(compgen -W "${keywords}" -- ${cur}) )
+        return 0
+    elif [[ ${cur} == * ]] ; then
+        COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
+        return 0
+    fi
+}
+
+complete -F __youtube_dl youtube-dl
diff --git a/youtube-dl.fish b/youtube-dl.fish
new file mode 100644
index 0000000..067680f
--- /dev/null
+++ b/youtube-dl.fish
@@ -0,0 +1,161 @@
+
+complete --command youtube-dl --long-option help --short-option h --description 'Print this help text and exit'
+complete --command youtube-dl --long-option version --description 'Print program version and exit'
+complete --command youtube-dl --long-option update --short-option U --description 'Update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)'
+complete --command youtube-dl --long-option ignore-errors --short-option i --description 'Continue on download errors, for example to skip unavailable videos in a playlist'
+complete --command youtube-dl --long-option abort-on-error --description 'Abort downloading of further videos (in the playlist or the command line) if an error occurs'
+complete --command youtube-dl --long-option dump-user-agent --description 'Display the current browser identification'
+complete --command youtube-dl --long-option list-extractors --description 'List all supported extractors'
+complete --command youtube-dl --long-option extractor-descriptions --description 'Output descriptions of all supported extractors'
+complete --command youtube-dl --long-option force-generic-extractor --description 'Force extraction to use the generic extractor'
+complete --command youtube-dl --long-option default-search --description 'Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". Use the value "auto" to let youtube-dl guess ("auto_warning" to emit a warning when guessing). "error" just throws an error. The default value "fixup_error" repairs broken URLs, but emits an error if this is not possible instead of searching.'
+complete --command youtube-dl --long-option ignore-config --description 'Do not read configuration files. When given in the global configuration file /etc/youtube-dl.conf: Do not read the user configuration in ~/.config/youtube-dl/config (%APPDATA%/youtube-dl/config.txt on Windows)'
+complete --command youtube-dl --long-option config-location --description 'Location of the configuration file; either the path to the config or its containing directory.'
+complete --command youtube-dl --long-option flat-playlist --description 'Do not extract the videos of a playlist, only list them.'
+complete --command youtube-dl --long-option mark-watched --description 'Mark videos watched (YouTube only)'
+complete --command youtube-dl --long-option no-mark-watched --description 'Do not mark videos watched (YouTube only)'
+complete --command youtube-dl --long-option no-color --description 'Do not emit color codes in output'
+complete --command youtube-dl --long-option proxy --description 'Use the specified HTTP/HTTPS/SOCKS proxy. To enable experimental SOCKS proxy, specify a proper scheme. For example socks5://127.0.0.1:1080/. Pass in an empty string (--proxy "") for direct connection'
+complete --command youtube-dl --long-option socket-timeout --description 'Time to wait before giving up, in seconds'
+complete --command youtube-dl --long-option source-address --description 'Client-side IP address to bind to'
+complete --command youtube-dl --long-option force-ipv4 --short-option 4 --description 'Make all connections via IPv4'
+complete --command youtube-dl --long-option force-ipv6 --short-option 6 --description 'Make all connections via IPv6'
+complete --command youtube-dl --long-option geo-verification-proxy --description 'Use this proxy to verify the IP address for some geo-restricted sites. The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading.'
+complete --command youtube-dl --long-option cn-verification-proxy
+complete --command youtube-dl --long-option playlist-start --description 'Playlist video to start at (default is %default)'
+complete --command youtube-dl --long-option playlist-end --description 'Playlist video to end at (default is last)'
+complete --command youtube-dl --long-option playlist-items --description 'Playlist video items to download. Specify indices of the videos in the playlist separated by commas like: "--playlist-items 1,2,5,8" if you want to download videos indexed 1, 2, 5, 8 in the playlist. You can specify range: "--playlist-items 1-3,7,10-13", it will download the videos at index 1, 2, 3, 7, 10, 11, 12 and 13.'
+complete --command youtube-dl --long-option match-title --description 'Download only matching titles (regex or caseless sub-string)'
+complete --command youtube-dl --long-option reject-title --description 'Skip download for matching titles (regex or caseless sub-string)'
+complete --command youtube-dl --long-option max-downloads --description 'Abort after downloading NUMBER files'
+complete --command youtube-dl --long-option min-filesize --description 'Do not download any videos smaller than SIZE (e.g. 50k or 44.6m)'
+complete --command youtube-dl --long-option max-filesize --description 'Do not download any videos larger than SIZE (e.g. 50k or 44.6m)'
+complete --command youtube-dl --long-option date --description 'Download only videos uploaded in this date'
+complete --command youtube-dl --long-option datebefore --description 'Download only videos uploaded on or before this date (i.e. inclusive)'
+complete --command youtube-dl --long-option dateafter --description 'Download only videos uploaded on or after this date (i.e. inclusive)'
+complete --command youtube-dl --long-option min-views --description 'Do not download any videos with less than COUNT views'
+complete --command youtube-dl --long-option max-views --description 'Do not download any videos with more than COUNT views'
+complete --command youtube-dl --long-option match-filter --description 'Generic video filter. Specify any key (see help for -o for a list of available keys) to match if the key is present, !key to check if the key is not present,key > NUMBER (like "comment_count > 12", also works with >=, <, <=, !=, =) to compare against a number, and & to require multiple matches. Values which are not known are excluded unless you put a question mark (?) after the operator.For example, to only match videos that have been liked more than 100 times and disliked less than 50 times (or the dislike functionality is not available at the given service), but who also have a description, use --match-filter "like_count > 100 & dislike_count <? 50 & description" .'
+complete --command youtube-dl --long-option no-playlist --description 'Download only the video, if the URL refers to a video and a playlist.'
+complete --command youtube-dl --long-option yes-playlist --description 'Download the playlist, if the URL refers to a video and a playlist.'
+complete --command youtube-dl --long-option age-limit --description 'Download only videos suitable for the given age'
+complete --command youtube-dl --long-option download-archive --description 'Download only videos not listed in the archive file. Record the IDs of all downloaded videos in it.' --require-parameter
+complete --command youtube-dl --long-option include-ads --description 'Download advertisements as well (experimental)'
+complete --command youtube-dl --long-option limit-rate --short-option r --description 'Maximum download rate in bytes per second (e.g. 50K or 4.2M)'
+complete --command youtube-dl --long-option retries --short-option R --description 'Number of retries (default is %default), or "infinite".'
+complete --command youtube-dl --long-option fragment-retries --description 'Number of retries for a fragment (default is %default), or "infinite" (DASH and hlsnative only)'
+complete --command youtube-dl --long-option skip-unavailable-fragments --description 'Skip unavailable fragments (DASH and hlsnative only)'
+complete --command youtube-dl --long-option abort-on-unavailable-fragment --description 'Abort downloading when some fragment is not available'
+complete --command youtube-dl --long-option buffer-size --description 'Size of download buffer (e.g. 1024 or 16K) (default is %default)'
+complete --command youtube-dl --long-option no-resize-buffer --description 'Do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.'
+complete --command youtube-dl --long-option test
+complete --command youtube-dl --long-option playlist-reverse --description 'Download playlist videos in reverse order'
+complete --command youtube-dl --long-option playlist-random --description 'Download playlist videos in random order'
+complete --command youtube-dl --long-option xattr-set-filesize --description 'Set file xattribute ytdl.filesize with expected file size (experimental)'
+complete --command youtube-dl --long-option hls-prefer-native --description 'Use the native HLS downloader instead of ffmpeg'
+complete --command youtube-dl --long-option hls-prefer-ffmpeg --description 'Use ffmpeg instead of the native HLS downloader'
+complete --command youtube-dl --long-option hls-use-mpegts --description 'Use the mpegts container for HLS videos, allowing to play the video while downloading (some players may not be able to play it)'
+complete --command youtube-dl --long-option external-downloader --description 'Use the specified external downloader. Currently supports aria2c,avconv,axel,curl,ffmpeg,httpie,wget'
+complete --command youtube-dl --long-option external-downloader-args --description 'Give these arguments to the external downloader'
+complete --command youtube-dl --long-option batch-file --short-option a --description 'File containing URLs to download ('"'"'-'"'"' for stdin)' --require-parameter
+complete --command youtube-dl --long-option id --description 'Use only video ID in file name'
+complete --command youtube-dl --long-option output --short-option o --description 'Output filename template, see the "OUTPUT TEMPLATE" for all the info'
+complete --command youtube-dl --long-option autonumber-size --description 'Specify the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given (default is %default)'
+complete --command youtube-dl --long-option autonumber-start --description 'Specify the start value for %(autonumber)s (default is %default)'
+complete --command youtube-dl --long-option restrict-filenames --description 'Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames'
+complete --command youtube-dl --long-option auto-number --short-option A --description '[deprecated; use -o "%(autonumber)s-%(title)s.%(ext)s" ] Number downloaded files starting from 00000'
+complete --command youtube-dl --long-option title --short-option t --description '[deprecated] Use title in file name (default)'
+complete --command youtube-dl --long-option literal --short-option l --description '[deprecated] Alias of --title'
+complete --command youtube-dl --long-option no-overwrites --short-option w --description 'Do not overwrite files'
+complete --command youtube-dl --long-option continue --short-option c --description 'Force resume of partially downloaded files. By default, youtube-dl will resume downloads if possible.'
+complete --command youtube-dl --long-option no-continue --description 'Do not resume partially downloaded files (restart from beginning)'
+complete --command youtube-dl --long-option no-part --description 'Do not use .part files - write directly into output file'
+complete --command youtube-dl --long-option no-mtime --description 'Do not use the Last-modified header to set the file modification time'
+complete --command youtube-dl --long-option write-description --description 'Write video description to a .description file'
+complete --command youtube-dl --long-option write-info-json --description 'Write video metadata to a .info.json file'
+complete --command youtube-dl --long-option write-annotations --description 'Write video annotations to a .annotations.xml file'
+complete --command youtube-dl --long-option load-info-json --description 'JSON file containing the video information (created with the "--write-info-json" option)'
+complete --command youtube-dl --long-option cookies --description 'File to read cookies from and dump cookie jar in' --require-parameter
+complete --command youtube-dl --long-option cache-dir --description 'Location in the filesystem where youtube-dl can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl . At the moment, only YouTube player files (for videos with obfuscated signatures) are cached, but that may change.'
+complete --command youtube-dl --long-option no-cache-dir --description 'Disable filesystem caching'
+complete --command youtube-dl --long-option rm-cache-dir --description 'Delete all filesystem cache files'
+complete --command youtube-dl --long-option write-thumbnail --description 'Write thumbnail image to disk'
+complete --command youtube-dl --long-option write-all-thumbnails --description 'Write all thumbnail image formats to disk'
+complete --command youtube-dl --long-option list-thumbnails --description 'Simulate and list all available thumbnail formats'
+complete --command youtube-dl --long-option quiet --short-option q --description 'Activate quiet mode'
+complete --command youtube-dl --long-option no-warnings --description 'Ignore warnings'
+complete --command youtube-dl --long-option simulate --short-option s --description 'Do not download the video and do not write anything to disk'
+complete --command youtube-dl --long-option skip-download --description 'Do not download the video'
+complete --command youtube-dl --long-option get-url --short-option g --description 'Simulate, quiet but print URL'
+complete --command youtube-dl --long-option get-title --short-option e --description 'Simulate, quiet but print title'
+complete --command youtube-dl --long-option get-id --description 'Simulate, quiet but print id'
+complete --command youtube-dl --long-option get-thumbnail --description 'Simulate, quiet but print thumbnail URL'
+complete --command youtube-dl --long-option get-description --description 'Simulate, quiet but print video description'
+complete --command youtube-dl --long-option get-duration --description 'Simulate, quiet but print video length'
+complete --command youtube-dl --long-option get-filename --description 'Simulate, quiet but print output filename'
+complete --command youtube-dl --long-option get-format --description 'Simulate, quiet but print output format'
+complete --command youtube-dl --long-option dump-json --short-option j --description 'Simulate, quiet but print JSON information. See --output for a description of available keys.'
+complete --command youtube-dl --long-option dump-single-json --short-option J --description 'Simulate, quiet but print JSON information for each command-line argument. If the URL refers to a playlist, dump the whole playlist information in a single line.'
+complete --command youtube-dl --long-option print-json --description 'Be quiet and print the video information as JSON (video is still being downloaded).'
+complete --command youtube-dl --long-option newline --description 'Output progress bar as new lines'
+complete --command youtube-dl --long-option no-progress --description 'Do not print progress bar'
+complete --command youtube-dl --long-option console-title --description 'Display progress in console titlebar'
+complete --command youtube-dl --long-option verbose --short-option v --description 'Print various debugging information'
+complete --command youtube-dl --long-option dump-pages --description 'Print downloaded pages encoded using base64 to debug problems (very verbose)'
+complete --command youtube-dl --long-option write-pages --description 'Write downloaded intermediary pages to files in the current directory to debug problems'
+complete --command youtube-dl --long-option youtube-print-sig-code
+complete --command youtube-dl --long-option print-traffic --description 'Display sent and read HTTP traffic'
+complete --command youtube-dl --long-option call-home --short-option C --description 'Contact the youtube-dl server for debugging'
+complete --command youtube-dl --long-option no-call-home --description 'Do NOT contact the youtube-dl server for debugging'
+complete --command youtube-dl --long-option encoding --description 'Force the specified encoding (experimental)'
+complete --command youtube-dl --long-option no-check-certificate --description 'Suppress HTTPS certificate validation'
+complete --command youtube-dl --long-option prefer-insecure --description 'Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)'
+complete --command youtube-dl --long-option user-agent --description 'Specify a custom user agent'
+complete --command youtube-dl --long-option referer --description 'Specify a custom referer, use if the video access is restricted to one domain'
+complete --command youtube-dl --long-option add-header --description 'Specify a custom HTTP header and its value, separated by a colon '"'"':'"'"'. You can use this option multiple times'
+complete --command youtube-dl --long-option bidi-workaround --description 'Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH'
+complete --command youtube-dl --long-option sleep-interval --description 'Number of seconds to sleep before each download when used alone or a lower bound of a range for randomized sleep before each download (minimum possible number of seconds to sleep) when used along with --max-sleep-interval.'
+complete --command youtube-dl --long-option max-sleep-interval --description 'Upper bound of a range for randomized sleep before each download (maximum possible number of seconds to sleep). Must only be used along with --min-sleep-interval.'
+complete --command youtube-dl --long-option format --short-option f --description 'Video format code, see the "FORMAT SELECTION" for all the info'
+complete --command youtube-dl --long-option all-formats --description 'Download all available video formats'
+complete --command youtube-dl --long-option prefer-free-formats --description 'Prefer free video formats unless a specific one is requested'
+complete --command youtube-dl --long-option list-formats --short-option F --description 'List all available formats of requested videos'
+complete --command youtube-dl --long-option youtube-include-dash-manifest
+complete --command youtube-dl --long-option youtube-skip-dash-manifest --description 'Do not download the DASH manifests and related data on YouTube videos'
+complete --command youtube-dl --long-option merge-output-format --description 'If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv. Ignored if no merge is required'
+complete --command youtube-dl --long-option write-sub --description 'Write subtitle file'
+complete --command youtube-dl --long-option write-auto-sub --description 'Write automatically generated subtitle file (YouTube only)'
+complete --command youtube-dl --long-option all-subs --description 'Download all the available subtitles of the video'
+complete --command youtube-dl --long-option list-subs --description 'List all available subtitles for the video'
+complete --command youtube-dl --long-option sub-format --description 'Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"'
+complete --command youtube-dl --long-option sub-lang --description 'Languages of the subtitles to download (optional) separated by commas, use --list-subs for available language tags'
+complete --command youtube-dl --long-option username --short-option u --description 'Login with this account ID'
+complete --command youtube-dl --long-option password --short-option p --description 'Account password. If this option is left out, youtube-dl will ask interactively.'
+complete --command youtube-dl --long-option twofactor --short-option 2 --description 'Two-factor authentication code'
+complete --command youtube-dl --long-option netrc --short-option n --description 'Use .netrc authentication data'
+complete --command youtube-dl --long-option video-password --description 'Video password (vimeo, smotri, youku)'
+complete --command youtube-dl --long-option ap-mso --description 'Adobe Pass multiple-system operator (TV provider) identifier, use --ap-list-mso for a list of available MSOs'
+complete --command youtube-dl --long-option ap-username --description 'Multiple-system operator account login'
+complete --command youtube-dl --long-option ap-password --description 'Multiple-system operator account password. If this option is left out, youtube-dl will ask interactively.'
+complete --command youtube-dl --long-option ap-list-mso --description 'List all supported multiple-system operators'
+complete --command youtube-dl --long-option extract-audio --short-option x --description 'Convert video files to audio-only files (requires ffmpeg or avconv and ffprobe or avprobe)'
+complete --command youtube-dl --long-option audio-format --description 'Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "%default" by default; No effect without -x'
+complete --command youtube-dl --long-option audio-quality --description 'Specify ffmpeg/avconv audio quality, insert a value between 0 (better) and 9 (worse) for VBR or a specific bitrate like 128K (default %default)'
+complete --command youtube-dl --long-option recode-video --description 'Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv|avi)' --arguments 'mp4 flv ogg webm mkv' --exclusive
+complete --command youtube-dl --long-option postprocessor-args --description 'Give these arguments to the postprocessor'
+complete --command youtube-dl --long-option keep-video --short-option k --description 'Keep the video file on disk after the post-processing; the video is erased by default'
+complete --command youtube-dl --long-option no-post-overwrites --description 'Do not overwrite post-processed files; the post-processed files are overwritten by default'
+complete --command youtube-dl --long-option embed-subs --description 'Embed subtitles in the video (only for mp4, webm and mkv videos)'
+complete --command youtube-dl --long-option embed-thumbnail --description 'Embed thumbnail in the audio as cover art'
+complete --command youtube-dl --long-option add-metadata --description 'Write metadata to the video file'
+complete --command youtube-dl --long-option metadata-from-title --description 'Parse additional metadata like song title / artist from the video title. The format syntax is the same as --output, the parsed parameters replace existing values. Additional templates: %(album)s, %(artist)s. Example: --metadata-from-title "%(artist)s - %(title)s" matches a title like "Coldplay - Paradise"'
+complete --command youtube-dl --long-option xattrs --description 'Write metadata to the video file'"'"'s xattrs (using dublin core and xdg standards)'
+complete --command youtube-dl --long-option fixup --description 'Automatically correct known faults of the file. One of never (do nothing), warn (only emit a warning), detect_or_warn (the default; fix file if we can, warn otherwise)'
+complete --command youtube-dl --long-option prefer-avconv --description 'Prefer avconv over ffmpeg for running the postprocessors (default)'
+complete --command youtube-dl --long-option prefer-ffmpeg --description 'Prefer ffmpeg over avconv for running the postprocessors'
+complete --command youtube-dl --long-option ffmpeg-location --description 'Location of the ffmpeg/avconv binary; either the path to the binary or its containing directory.'
+complete --command youtube-dl --long-option exec --description 'Execute a command on the file after downloading, similar to find'"'"'s -exec syntax. Example: --exec '"'"'adb push {} /sdcard/Music/ && rm {}'"'"''
+complete --command youtube-dl --long-option convert-subs --description 'Convert the subtitles to other format (currently supported: srt|ass|vtt)'
+
+
+complete --command youtube-dl --arguments ":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater :ythistory"
diff --git a/youtube-dl.plugin.zsh b/youtube-dl.plugin.zsh
deleted file mode 100644
index 4edab52..0000000
--- a/youtube-dl.plugin.zsh
+++ /dev/null
@@ -1,24 +0,0 @@
-# This allows the youtube-dl command to be installed in ZSH using antigen.
-# Antigen is a bundle manager. It allows you to enhance the functionality of
-# your zsh session by installing bundles and themes easily.
-
-# Antigen documentation:
-# http://antigen.sharats.me/
-# https://github.com/zsh-users/antigen
-
-# Install youtube-dl:
-# antigen bundle rg3/youtube-dl
-# Bundles installed by antigen are available for use immediately.
-
-# Update youtube-dl (and all other antigen bundles):
-# antigen update
-
-# The antigen command will download the git repository to a folder and then
-# execute an enabling script (this file). The complete process for loading the
-# code is documented here:
-# https://github.com/zsh-users/antigen#notes-on-writing-plugins
-
-# This specific script just aliases youtube-dl to the python script that this
-# library provides. This requires updating the PYTHONPATH to ensure that the
-# full set of code can be located.
-alias youtube-dl="PYTHONPATH=$(dirname $0) $(dirname $0)/bin/youtube-dl"
diff --git a/youtube-dl.zsh b/youtube-dl.zsh
new file mode 100644
index 0000000..a0fe383
--- /dev/null
+++ b/youtube-dl.zsh
@@ -0,0 +1,28 @@
+#compdef youtube-dl
+
+__youtube_dl() {
+    local curcontext="$curcontext" fileopts diropts cur prev
+    typeset -A opt_args
+    fileopts="--download-archive|-a|--batch-file|--load-info-json|--load-info|--cookies"
+    diropts="--cache-dir"
+    cur=$words[CURRENT]
+    case $cur in
+        :)
+            _arguments '*: :(::ytfavorites ::ytrecommended ::ytsubscriptions ::ytwatchlater ::ythistory)'
+        ;;
+        *)
+            prev=$words[CURRENT-1]
+            if [[ ${prev} =~ ${fileopts} ]]; then
+                _path_files
+            elif [[ ${prev} =~ ${diropts} ]]; then
+                _path_files -/
+            elif [[ ${prev} == "--recode-video" ]]; then
+                _arguments '*: :(mp4 flv ogg webm mkv)'
+            else
+                _arguments '*: :(--help --version --update --ignore-errors --abort-on-error --dump-user-agent --list-extractors --extractor-descriptions --force-generic-extractor --default-search --ignore-config --config-location --flat-playlist --mark-watched --no-mark-watched --no-color --proxy --socket-timeout --source-address --force-ipv4 --force-ipv6 --geo-verification-proxy --cn-verification-proxy --playlist-start --playlist-end --playlist-items --match-title --reject-title --max-downloads --min-filesize --max-filesize --date --datebefore --dateafter --min-views --max-views --match-filter --no-playlist --yes-playlist --age-limit --download-archive --include-ads --limit-rate --retries --fragment-retries --skip-unavailable-fragments --abort-on-unavailable-fragment --buffer-size --no-resize-buffer --test --playlist-reverse --playlist-random --xattr-set-filesize --hls-prefer-native --hls-prefer-ffmpeg --hls-use-mpegts --external-downloader --external-downloader-args --batch-file --id --output --autonumber-size --autonumber-start --restrict-filenames --auto-number --title --literal --no-overwrites --continue --no-continue --no-part --no-mtime --write-description --write-info-json --write-annotations --load-info-json --cookies --cache-dir --no-cache-dir --rm-cache-dir --write-thumbnail --write-all-thumbnails --list-thumbnails --quiet --no-warnings --simulate --skip-download --get-url --get-title --get-id --get-thumbnail --get-description --get-duration --get-filename --get-format --dump-json --dump-single-json --print-json --newline --no-progress --console-title --verbose --dump-pages --write-pages --youtube-print-sig-code --print-traffic --call-home --no-call-home --encoding --no-check-certificate --prefer-insecure --user-agent --referer --add-header --bidi-workaround --sleep-interval --max-sleep-interval --format --all-formats --prefer-free-formats --list-formats --youtube-include-dash-manifest --youtube-skip-dash-manifest --merge-output-format --write-sub --write-auto-sub --all-subs --list-subs --sub-format --sub-lang --username --password --twofactor --netrc --video-password --ap-mso --ap-username --ap-password --ap-list-mso --extract-audio --audio-format --audio-quality --recode-video --postprocessor-args --keep-video --no-post-overwrites --embed-subs --embed-thumbnail --add-metadata --metadata-from-title --xattrs --fixup --prefer-avconv --prefer-ffmpeg --ffmpeg-location --exec --convert-subs)'
+            fi
+        ;;
+    esac
+}
+
+__youtube_dl
\ No newline at end of file
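Note on youtube-dl.zsh: the completion function tests the previous word against the fileopts alternation with =~, which, as far as I know, is an unanchored regular-expression match in zsh. The sketch below restates that test in Python to show the likely consequence: any previous word that merely contains "-a" would also be treated as a file-taking option. The function names and the anchored variant are hypothetical and are not part of the commit.

```python
import re

# Same alternation as the `fileopts` variable in youtube-dl.zsh.
FILEOPTS = "--download-archive|-a|--batch-file|--load-info-json|--load-info|--cookies"


def unanchored_match(prev_word):
    """Mimic an unanchored regex test: true if any alternative occurs anywhere."""
    return re.search(FILEOPTS, prev_word) is not None


def anchored_match(prev_word):
    """Hypothetical stricter variant: the whole word must equal one alternative."""
    return re.fullmatch(FILEOPTS, prev_word) is not None


if __name__ == "__main__":
    for word in ("-a", "--cookies", "--all-subs", "--quiet"):
        print(word, unanchored_match(word), anchored_match(word))
```

If the assumption about =~ holds, anchoring the pattern (for example with ^(...)$ in the zsh source) would restrict file completion to the listed options only.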
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py
index 53f20ac2cb1bd16398e160db329004b49d6bf424..a7bf5a1b06766094cc06e85fd40c6fa64c2cc64b 100755
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -24,6 +24,7 @@ import sys
  import time
  import tokenize
  import traceback
+import random
  
  from .compat import (
      compat_basestring,
@@ -159,6 +160,7 @@ class YoutubeDL(object):
      playlistend:       Playlist item to end at.
      playlist_items:    Specific indices of playlist to download.
      playlistreverse:   Download playlist items in reverse order.
+    playlistrandom:    Download playlist items in random order.
      matchtitle:        Download only matching titles.
      rejecttitle:       Reject downloads for matching titles.
      logger:            Log messages to a logging.Logger instance.
@@ -584,7 +586,7 @@ class YoutubeDL(object):
              if autonumber_size is None:
                  autonumber_size = 5
              autonumber_templ = '%0' + str(autonumber_size) + 'd'
-            template_dict['autonumber'] = autonumber_templ % self._num_downloads
+            template_dict['autonumber'] = autonumber_templ % (self.params.get('autonumber_start', 1) - 1 + self._num_downloads)
              if template_dict.get('playlist_index') is not None:
                  template_dict['playlist_index'] = '%0*d' % (len(str(template_dict['n_entries'])), template_dict['playlist_index'])
              if template_dict.get('resolution') is None:
@@ -842,6 +844,9 @@ class YoutubeDL(object):
              if self.params.get('playlistreverse', False):
                  entries = entries[::-1]
  
+            if self.params.get('playlistrandom', False):
+                random.shuffle(entries)
+
              for i, entry in enumerate(entries, 1):
                  self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
                  extra = {
@@ -1339,7 +1344,7 @@ class YoutubeDL(object):
                  format['format_id'] = compat_str(i)
              else:
                  # Sanitize format_id from characters used in format selector expression
-                format['format_id'] = re.sub('[\s,/+\[\]()]', '_', format['format_id'])
+                format['format_id'] = re.sub(r'[\s,/+\[\]()]', '_', format['format_id'])
              format_id = format['format_id']
              if format_id not in formats_dict:
                  formats_dict[format_id] = []
@@ -1363,7 +1368,7 @@ class YoutubeDL(object):
                  format['ext'] = determine_ext(format['url']).lower()
              # Automatically determine protocol if missing (useful for format
              # selection purposes)
-            if 'protocol' not in format:
+            if format.get('protocol') is None:
                  format['protocol'] = determine_protocol(format)
              # Add HTTP headers, so that external programs can use them from the
              # json output
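Note on youtube_dl/YoutubeDL.py: two of the hunks above change small computations that are easy to check in isolation. The sketch below is a standalone restatement, not the class's actual methods: autonumber() and sanitize_format_id() are hypothetical helper names, and num_downloads stands in for self._num_downloads, which is assumed here to be 1 for the first file.

```python
import re


def autonumber(num_downloads, autonumber_size=None, autonumber_start=1):
    """Zero-padded counter, offset so the first download gets autonumber_start."""
    if autonumber_size is None:
        autonumber_size = 5  # same fallback width as in the hunk above
    templ = '%0' + str(autonumber_size) + 'd'
    return templ % (autonumber_start - 1 + num_downloads)


def sanitize_format_id(format_id):
    """Replace characters that have meaning in format selector expressions."""
    # The raw string keeps \s, \[ and \] as regex escapes, not string escapes.
    return re.sub(r'[\s,/+\[\]()]', '_', format_id)


if __name__ == "__main__":
    print(autonumber(1))                        # default start and width
    print(autonumber(3, autonumber_size=3, autonumber_start=10))
    print(sanitize_format_id('hls 720p/a+b'))
```

With the default start of 1 the offset is zero, so the output matches the previous behaviour; a start of 10 numbers the third download as 12.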
diff --git a/youtube_dl/__init__.py b/youtube_dl/__init__.py
index 6850d95e1ff359453571a6ac635d6ffa99ae038f..5c5b8094bc1b6cbc68b09eba85b06a63a2ee785d 100644
--- a/youtube_dl/__init__.py
+++ b/youtube_dl/__init__.py
@@ -133,6 +133,12 @@ def _real_main(argv=None):
          parser.error('TV Provider account username missing\n')
      if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
          parser.error('using output template conflicts with using title, video ID or auto number')
+    if opts.autonumber_size is not None:
+        if opts.autonumber_size <= 0:
+            parser.error('auto number size must be positive')
+    if opts.autonumber_start is not None:
+        if opts.autonumber_start < 0:
+            parser.error('auto number start must be positive or 0')
      if opts.usetitle and opts.useid:
          parser.error('using title conflicts with using video ID')
      if opts.username is not None and opts.password is None:
@@ -321,6 +327,7 @@ def _real_main(argv=None):
          'listformats': opts.listformats,
          'outtmpl': outtmpl,
          'autonumber_size': opts.autonumber_size,
+        'autonumber_start': opts.autonumber_start,
          'restrictfilenames': opts.restrictfilenames,
          'ignoreerrors': opts.ignoreerrors,
          'force_generic_extractor': opts.force_generic_extractor,
@@ -337,6 +344,7 @@ def _real_main(argv=None):
          'playliststart': opts.playliststart,
          'playlistend': opts.playlistend,
          'playlistreverse': opts.playlist_reverse,
+        'playlistrandom': opts.playlist_random,
          'noplaylist': opts.noplaylist,
          'logtostderr': opts.outtmpl == '-',
          'consoletitle': opts.consoletitle,
@@ -405,7 +413,7 @@ def _real_main(argv=None):
          'postprocessor_args': postprocessor_args,
          'cn_verification_proxy': opts.cn_verification_proxy,
          'geo_verification_proxy': opts.geo_verification_proxy,
-
+        'config_location': opts.config_location,
      }
  
      with YoutubeDL(ydl_opts) as ydl:
diff --git a/youtube_dl/compat.py b/youtube_dl/compat.py
index 83ee7e25747532c61f344aaea921021690669f61..7189020192601c289f47eafbda40feefd14cde6c 100644
--- a/youtube_dl/compat.py
+++ b/youtube_dl/compat.py
@@ -2344,7 +2344,7 @@ try:
      from urllib.parse import unquote_plus as compat_urllib_parse_unquote_plus
  except ImportError:  # Python 2
      _asciire = (compat_urllib_parse._asciire if hasattr(compat_urllib_parse, '_asciire')
-                else re.compile('([\x00-\x7f]+)'))
+                else re.compile(r'([\x00-\x7f]+)'))
  
      # HACK: The following are the correct unquote_to_bytes, unquote and unquote_plus
      # implementations from cpython 3.4.3's stdlib. Python 2's version
@@ -2529,6 +2529,24 @@ else:
                  el.text = el.text.decode('utf-8')
          return doc
  
+if hasattr(etree, 'register_namespace'):
+    compat_etree_register_namespace = etree.register_namespace
+else:
+    def compat_etree_register_namespace(prefix, uri):
+        """Register a namespace prefix.
+        The registry is global, and any existing mapping for either the
+        given prefix or the namespace URI will be removed.
+        *prefix* is the namespace prefix, *uri* is a namespace uri. Tags and
+        attributes in this namespace will be serialized with prefix if possible.
+        ValueError is raised if prefix is reserved or is invalid.
+        """
+        if re.match(r"ns\d+$", prefix):
+            raise ValueError("Prefix format reserved for internal use")
+        for k, v in list(etree._namespace_map.items()):
+            if k == uri or v == prefix:
+                del etree._namespace_map[k]
+        etree._namespace_map[uri] = prefix
+
  if sys.version_info < (2, 7):
      # Here comes the crazy part: In 2.6, if the xpath is a unicode,
      # .//node does not match if a node is a direct child of . !
@@ -2865,6 +2883,7 @@ __all__ = [
      'compat_cookiejar',
      'compat_cookies',
      'compat_etree_fromstring',
+    'compat_etree_register_namespace',
      'compat_expanduser',
      'compat_get_terminal_size',
      'compat_getenv',
diff --git a/youtube_dl/downloader/external.py b/youtube_dl/downloader/external.py
index 5d3e5d8d3d748d98ea187e8eca4444c5504e07fb..41e37261d034bbb61dc1932fd08283952e4bee25 100644
--- a/youtube_dl/downloader/external.py
+++ b/youtube_dl/downloader/external.py
@@ -17,6 +17,7 @@ from ..utils import (
      encodeArgument,
      handle_youtubedl_headers,
      check_executable,
+    is_outdated_version,
  )
  
  
@@ -198,6 +199,15 @@ class FFmpegFD(ExternalFD):
  
          args = [ffpp.executable, '-y']
  
+        seekable = info_dict.get('_seekable')
+        if seekable is not None:
+            # setting -seekable prevents ffmpeg from guessing if the server
+            # supports seeking(by adding the header `Range: bytes=0-`), which
+            # can cause problems in some cases
+            # https://github.com/rg3/youtube-dl/issues/11800#issuecomment-275037127
+            # http://trac.ffmpeg.org/ticket/6125#comment:10
+            args += ['-seekable', '1' if seekable else '0']
+
          args += self._configuration_args()
  
          # start_time = info_dict.get('start_time') or 0
@@ -264,7 +274,9 @@ class FFmpegFD(ExternalFD):
              if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
                  args += ['-f', 'mpegts']
              else:
-                args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
+                args += ['-f', 'mp4']
+                if (ffpp.basename == 'ffmpeg' and is_outdated_version(ffpp._versions['ffmpeg'], '3.2')) and (not info_dict.get('acodec') or info_dict['acodec'].split('.')[0] in ('aac', 'mp4a')):
+                    args += ['-bsf:a', 'aac_adtstoasc']
          elif protocol == 'rtmp':
              args += ['-f', 'flv']
          else:
diff --git a/youtube_dl/downloader/fragment.py b/youtube_dl/downloader/fragment.py
index 60df627a65dfc589899f009fa5df9ce76a441ae5..56f97526676cda29b8c3b15de0e07cb676cc8573 100644
--- a/youtube_dl/downloader/fragment.py
+++ b/youtube_dl/downloader/fragment.py
@@ -61,6 +61,7 @@ class FragmentFD(FileDownloader):
                  'noprogress': True,
                  'ratelimit': self.params.get('ratelimit'),
                  'retries': self.params.get('retries', 0),
+                'nopart': self.params.get('nopart', False),
                  'test': self.params.get('test', False),
              }
          )
diff --git a/youtube_dl/downloader/hls.py b/youtube_dl/downloader/hls.py
index 7373ec05fd0d4a1d983f48668229b21d98977581..4989abce12ee236e5c528778e5b95f67d92e165e 100644
--- a/youtube_dl/downloader/hls.py
+++ b/youtube_dl/downloader/hls.py
@@ -65,6 +65,9 @@ class HlsFD(FragmentFD):
          s = manifest.decode('utf-8', 'ignore')
  
          if not self.can_download(s, info_dict):
+            if info_dict.get('extra_param_to_segment_url'):
+                self.report_error('pycrypto not found. Please install it.')
+                return False
              self.report_warning(
                  'hlsnative has detected features it does not support, '
                  'extraction will be delegated to ffmpeg')
diff --git a/youtube_dl/extractor/abcnews.py b/youtube_dl/extractor/abcnews.py
index 6ae5d9a96ac6919ab1ea1ae906bf510018d5578b..4f56c4c11935ee85a9412a39de20138bb83cc33d 100644
--- a/youtube_dl/extractor/abcnews.py
+++ b/youtube_dl/extractor/abcnews.py
@@ -23,7 +23,7 @@ class AbcNewsVideoIE(AMPIE):
              'title': '\'This Week\' Exclusive: Iran\'s Foreign Minister Zarif',
              'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
              'duration': 180,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              # m3u8 download
@@ -59,7 +59,7 @@ class AbcNewsIE(InfoExtractor):
              'display_id': 'dramatic-video-rare-death-job-america',
              'title': 'Occupational Hazards',
              'description': 'Nightline investigates the dangers that lurk at various jobs.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20100428',
              'timestamp': 1272412800,
          },
diff --git a/youtube_dl/extractor/abcotvs.py b/youtube_dl/extractor/abcotvs.py
index 054bb05964910c3d521eb6615661c1843239290f..76e98132b9d18514e54ed37e11df61089a75678c 100644
--- a/youtube_dl/extractor/abcotvs.py
+++ b/youtube_dl/extractor/abcotvs.py
@@ -23,7 +23,7 @@ class ABCOTVSIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'East Bay museum celebrates vintage synthesizers',
                  'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'timestamp': 1421123075,
                  'upload_date': '20150113',
                  'uploader': 'Jonathan Bloom',
diff --git a/youtube_dl/extractor/acast.py b/youtube_dl/extractor/acast.py
index 94ce88c834f5ce1575b36f839f02fdf43f96e046..6dace305141423ec35d25abe3c4283476a762b2f 100644
--- a/youtube_dl/extractor/acast.py
+++ b/youtube_dl/extractor/acast.py
@@ -8,6 +8,7 @@ from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
      int_or_none,
+    parse_iso8601,
      OnDemandPagedList,
  )
  
@@ -15,18 +16,33 @@ from ..utils import (
  class ACastIE(InfoExtractor):
      IE_NAME = 'acast'
      _VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<channel>[^/]+)/(?P<id>[^/#?]+)'
-    _TEST = {
+    _TESTS = [{
+        # test with one bling
          'url': 'https://www.acast.com/condenasttraveler/-where-are-you-taipei-101-taiwan',
          'md5': 'ada3de5a1e3a2a381327d749854788bb',
          'info_dict': {
              'id': '57de3baa-4bb0-487e-9418-2692c1277a34',
              'ext': 'mp3',
              'title': '"Where Are You?": Taipei 101, Taiwan',
-            'timestamp': 1196172000000,
+            'timestamp': 1196172000,
+            'upload_date': '20071127',
              'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
              'duration': 211,
          }
-    }
+    }, {
+        # test with multiple blings
+        'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
+        'md5': '55c0097badd7095f494c99a172f86501',
+        'info_dict': {
+            'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
+            'ext': 'mp3',
+            'title': '2. Raggarmordet - Röster ur det förflutna',
+            'timestamp': 1477346700,
+            'upload_date': '20161024',
+            'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
+            'duration': 2797,
+        }
+    }]
  
      def _real_extract(self, url):
          channel, display_id = re.match(self._VALID_URL, url).groups()
@@ -35,11 +51,11 @@ class ACastIE(InfoExtractor):
          return {
              'id': compat_str(cast_data['id']),
              'display_id': display_id,
-            'url': cast_data['blings'][0]['audio'],
+            'url': [b['audio'] for b in cast_data['blings'] if b['type'] == 'BlingAudio'][0],
              'title': cast_data['name'],
              'description': cast_data.get('description'),
              'thumbnail': cast_data.get('image'),
-            'timestamp': int_or_none(cast_data.get('publishingDate')),
+            'timestamp': parse_iso8601(cast_data.get('publishingDate')),
              'duration': int_or_none(cast_data.get('duration')),
          }
  
diff --git a/youtube_dl/extractor/adobetv.py b/youtube_dl/extractor/adobetv.py
index 5ae16fa16809b557e74e133a4a7811d396b1c2c2..008c98e51ead3ffcad7bb350fcf928a945b91e35 100644
--- a/youtube_dl/extractor/adobetv.py
+++ b/youtube_dl/extractor/adobetv.py
@@ -30,7 +30,7 @@ class AdobeTVIE(AdobeTVBaseIE):
              'ext': 'mp4',
              'title': 'Quick Tip - How to Draw a Circle Around an Object in Photoshop',
              'description': 'md5:99ec318dc909d7ba2a1f2b038f7d2311',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'upload_date': '20110914',
              'duration': 60,
              'view_count': int,
diff --git a/youtube_dl/extractor/aenetworks.py b/youtube_dl/extractor/aenetworks.py
index 6adb6d824c00ec733afaf1bbe1b243f7d623b647..c97317400ea1f660674d34ca22f4177365198d59 100644
--- a/youtube_dl/extractor/aenetworks.py
+++ b/youtube_dl/extractor/aenetworks.py
@@ -26,7 +26,7 @@ class AENetworksIE(AENetworksBaseIE):
      _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
      _TESTS = [{
          'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
-        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
+        'md5': 'a97a65f7e823ae10e9244bc5433d5fe6',
          'info_dict': {
              'id': '22253814',
              'ext': 'mp4',
@@ -87,7 +87,7 @@ class AENetworksIE(AENetworksBaseIE):
                      self._html_search_meta('aetn:SeriesTitle', webpage))
              elif url_parts_len == 2:
                  entries = []
-                for episode_item in re.findall(r'(?s)<div[^>]+class="[^"]*episode-item[^"]*"[^>]*>', webpage):
+                for episode_item in re.findall(r'(?s)<[^>]+class="[^"]*(?:episode|program)-item[^"]*"[^>]*>', webpage):
                      episode_attributes = extract_attributes(episode_item)
                      episode_url = compat_urlparse.urljoin(
                          url, episode_attributes['data-canonical'])
@@ -99,7 +99,7 @@ class AENetworksIE(AENetworksBaseIE):
  
          query = {
              'mbr': 'true',
-            'assetTypes': 'medium_video_s3'
+            'assetTypes': 'high_video_s3'
          }
          video_id = self._html_search_meta('aetn:VideoID', webpage)
          media_url = self._search_regex(
@@ -155,7 +155,7 @@ class HistoryTopicIE(AENetworksBaseIE):
              'id': 'world-war-i-history',
              'title': 'World War I History',
          },
-        'playlist_mincount': 24,
+        'playlist_mincount': 23,
      }, {
          'url': 'http://www.history.com/topics/world-war-i-history/videos',
          'only_matching': True,
@@ -193,7 +193,8 @@ class HistoryTopicIE(AENetworksBaseIE):
              return self.theplatform_url_result(
                  release_url, video_id, {
                      'mbr': 'true',
-                    'switch': 'hls'
+                    'switch': 'hls',
+                    'assetTypes': 'high_video_ak',
                  })
          else:
              webpage = self._download_webpage(url, topic_id)
@@ -203,6 +204,7 @@ class HistoryTopicIE(AENetworksBaseIE):
                  entries.append(self.theplatform_url_result(
                      video_attributes['data-release-url'], video_attributes['data-id'], {
                          'mbr': 'true',
-                        'switch': 'hls'
+                        'switch': 'hls',
+                        'assetTypes': 'high_video_ak',
                      }))
              return self.playlist_result(entries, topic_id, get_element_by_attribute('class', 'show-title', webpage))
diff --git a/youtube_dl/extractor/afreecatv.py b/youtube_dl/extractor/afreecatv.py
index 75b36699363609876c755d4c120ec195aa81ec3a..e0a0f7c57b83c7a715e7a39c16a8a71df1cbb500 100644
--- a/youtube_dl/extractor/afreecatv.py
+++ b/youtube_dl/extractor/afreecatv.py
@@ -18,6 +18,7 @@ from ..utils import (
  
  
  class AfreecaTVIE(InfoExtractor):
+    IE_NAME = 'afreecatv'
      IE_DESC = 'afreecatv.com'
      _VALID_URL = r'''(?x)
                      https?://
@@ -143,3 +144,107 @@ class AfreecaTVIE(InfoExtractor):
                  expected=True)
  
          return info
+
+
+class AfreecaTVGlobalIE(AfreecaTVIE):
+    IE_NAME = 'afreecatv:global'
+    _VALID_URL = r'https?://(?:www\.)?afreeca\.tv/(?P<channel_id>\d+)(?:/v/(?P<video_id>\d+))?'
+    _TESTS = [{
+        'url': 'http://afreeca.tv/36853014/v/58301',
+        'info_dict': {
+            'id': '58301',
+            'title': 'tryhard top100',
+            'uploader_id': '36853014',
+            'uploader': 'makgi Hearthstone Live!',
+        },
+        'playlist_count': 3,
+    }]
+
+    def _real_extract(self, url):
+        channel_id, video_id = re.match(self._VALID_URL, url).groups()
+        video_type = 'video' if video_id else 'live'
+        query = {
+            'pt': 'view',
+            'bid': channel_id,
+        }
+        if video_id:
+            query['vno'] = video_id
+        video_data = self._download_json(
+            'http://api.afreeca.tv/%s/view_%s.php' % (video_type, video_type),
+            video_id or channel_id, query=query)['channel']
+
+        if video_data.get('result') != 1:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, video_data['remsg']))
+
+        title = video_data['title']
+
+        info = {
+            'thumbnail': video_data.get('thumb'),
+            'view_count': int_or_none(video_data.get('vcnt')),
+            'age_limit': int_or_none(video_data.get('grade')),
+            'uploader_id': channel_id,
+            'uploader': video_data.get('cname'),
+        }
+
+        if video_id:
+            entries = []
+            for i, f in enumerate(video_data.get('flist', [])):
+                video_key = self.parse_video_key(f.get('key', ''))
+                f_url = f.get('file')
+                if not video_key or not f_url:
+                    continue
+                entries.append({
+                    'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
+                    'title': title,
+                    'upload_date': video_key.get('upload_date'),
+                    'duration': int_or_none(f.get('length')),
+                    'url': f_url,
+                    'protocol': 'm3u8_native',
+                    'ext': 'mp4',
+                })
+
+            info.update({
+                'id': video_id,
+                'title': title,
+                'duration': int_or_none(video_data.get('length')),
+            })
+            if len(entries) > 1:
+                info['_type'] = 'multi_video'
+                info['entries'] = entries
+            elif len(entries) == 1:
+                i = entries[0].copy()
+                i.update(info)
+                info = i
+        else:
+            formats = []
+            for s in video_data.get('strm', []):
+                s_url = s.get('purl')
+                if not s_url:
+                    continue
+                stype = s.get('stype')
+                if stype == 'HLS':
+                    formats.extend(self._extract_m3u8_formats(
+                        s_url, channel_id, 'mp4', m3u8_id=stype, fatal=False))
+                elif stype == 'RTMP':
+                    format_id = [stype]
+                    label = s.get('label')
+                    if label:
+                        format_id.append(label)
+                    formats.append({
+                        'format_id': '-'.join(format_id),
+                        'url': s_url,
+                        'tbr': int_or_none(s.get('bps')),
+                        'height': int_or_none(s.get('brt')),
+                        'ext': 'flv',
+                        'rtmp_live': True,
+                    })
+            self._sort_formats(formats)
+
+            info.update({
+                'id': channel_id,
+                'title': self._live_title(title),
+                'is_live': True,
+                'formats': formats,
+            })
+
+        return info
diff --git a/youtube_dl/extractor/airmozilla.py b/youtube_dl/extractor/airmozilla.py
index f8e70f4e580746093d97e3d2d596d008ed3e6c15..0e069187994d0b9d25463d2d2f3cdb6c74ce5406 100644
--- a/youtube_dl/extractor/airmozilla.py
+++ b/youtube_dl/extractor/airmozilla.py
@@ -20,7 +20,7 @@ class AirMozillaIE(InfoExtractor):
              'id': '6x4q2w',
              'ext': 'mp4',
              'title': 'Privacy Lab - a meetup for privacy minded people in San Francisco',
-            'thumbnail': 're:https?://vid\.ly/(?P<id>[0-9a-z-]+)/poster',
+            'thumbnail': r're:https?://vid\.ly/(?P<id>[0-9a-z-]+)/poster',
              'description': 'Brings together privacy professionals and others interested in privacy at for-profits, non-profits, and NGOs in an effort to contribute to the state of the ecosystem...',
              'timestamp': 1422487800,
              'upload_date': '20150128',
diff --git a/youtube_dl/extractor/allocine.py b/youtube_dl/extractor/allocine.py
index 517b06def4d2ff690628eece4b1e85e647aea267..90f11d39f5393528e75ff641342cfda8c8706e0b 100644
--- a/youtube_dl/extractor/allocine.py
+++ b/youtube_dl/extractor/allocine.py
@@ -21,7 +21,7 @@ class AllocineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Astérix - Le Domaine des Dieux Teaser VF',
              'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.allocine.fr/video/player_gen_cmedia=19540403&cfilm=222257.html',
@@ -32,7 +32,7 @@ class AllocineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Planes 2 Bande-annonce VF',
              'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html',
@@ -43,7 +43,7 @@ class AllocineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Dragons 2 - Bande annonce finale VF',
              'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.allocine.fr/video/video-19550147/',
@@ -53,7 +53,7 @@ class AllocineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger',
              'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }]
  
diff --git a/youtube_dl/extractor/alphaporno.py b/youtube_dl/extractor/alphaporno.py
index c34719d1fefb6afd45e8775c3683913799380714..3a6d99f6bfd050e7c204a08fe32fbf230a8ff694 100644
--- a/youtube_dl/extractor/alphaporno.py
+++ b/youtube_dl/extractor/alphaporno.py
@@ -19,7 +19,7 @@ class AlphaPornoIE(InfoExtractor):
              'display_id': 'sensual-striptease-porn-with-samantha-alexandra',
              'ext': 'mp4',
              'title': 'Sensual striptease porn with Samantha Alexandra',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'timestamp': 1418694611,
              'upload_date': '20141216',
              'duration': 387,
diff --git a/youtube_dl/extractor/aol.py b/youtube_dl/extractor/aol.py
index 2cdee33200232dc69c1755213fc2f8298c6c8fa3..b50f454ee0ca661aa3cf93ab05121bf8857eeca8 100644
--- a/youtube_dl/extractor/aol.py
+++ b/youtube_dl/extractor/aol.py
@@ -12,7 +12,7 @@ from ..utils import (
  
  class AolIE(InfoExtractor):
      IE_NAME = 'on.aol.com'
-    _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)'
+    _VALID_URL = r'(?:aol-video:|https?://(?:(?:www|on)\.)?aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)'
  
      _TESTS = [{
          # video with 5min ID
@@ -33,7 +33,7 @@ class AolIE(InfoExtractor):
          }
      }, {
          # video with vidible ID
-        'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
+        'url': 'http://www.aol.com/video/view/netflix-is-raising-rates/5707d6b8e4b090497b04f706/',
          'info_dict': {
              'id': '5707d6b8e4b090497b04f706',
              'ext': 'mp4',
@@ -108,30 +108,3 @@ class AolIE(InfoExtractor):
              'uploader': video_data.get('videoOwner'),
              'formats': formats,
          }
-
-
-class AolFeaturesIE(InfoExtractor):
-    IE_NAME = 'features.aol.com'
-    _VALID_URL = r'https?://features\.aol\.com/video/(?P<id>[^/?#]+)'
-
-    _TESTS = [{
-        'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts',
-        'md5': '7db483bb0c09c85e241f84a34238cc75',
-        'info_dict': {
-            'id': '519507715',
-            'ext': 'mp4',
-            'title': 'What To Watch - February 17, 2016',
-        },
-        'add_ie': ['FiveMin'],
-        'params': {
-            # encrypted m3u8 download
-            'skip_download': True,
-        },
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        return self.url_result(self._search_regex(
-            r'<script type="text/javascript" src="(https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js[^"]+)"',
-            webpage, '5min embed url'), 'FiveMin')
diff --git a/youtube_dl/extractor/ard.py b/youtube_dl/extractor/ard.py
index 35f3656f11d7579a1f67cd0ac6e9c06a37c44917..2d5599456688eba9756e28c2ffe9dbae48decb2c 100644
--- a/youtube_dl/extractor/ard.py
+++ b/youtube_dl/extractor/ard.py
@@ -253,7 +253,7 @@ class ARDIE(InfoExtractor):
              'duration': 2600,
              'title': 'Die Story im Ersten: Mission unter falscher Flagge',
              'upload_date': '20140804',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'skip': 'HTTP Error 404: Not Found',
      }
diff --git a/youtube_dl/extractor/arkena.py b/youtube_dl/extractor/arkena.py
index d45cae301df005f455aad3ec6aeda5ed87d5b50e..50ffb442dd051be347e2c79c2d4a11dacb9f574b 100644
--- a/youtube_dl/extractor/arkena.py
+++ b/youtube_dl/extractor/arkena.py
@@ -4,8 +4,10 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_urlparse
  from ..utils import (
      determine_ext,
+    ExtractorError,
      float_or_none,
      int_or_none,
      mimetype2ext,
@@ -15,7 +17,13 @@ from ..utils import (
  
  
  class ArkenaIE(InfoExtractor):
-    _VALID_URL = r'https?://play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)'
+    _VALID_URL = r'''(?x)
+                        https?://
+                            (?:
+                                video\.arkena\.com/play2/embed/player\?|
+                                play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)
+                            )
+                        '''
      _TESTS = [{
          'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
          'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
@@ -37,6 +45,9 @@ class ArkenaIE(InfoExtractor):
      }, {
          'url': 'http://play.arkena.com/embed/avp/v1/player/media/327336/darkmatter/131064/',
          'only_matching': True,
+    }, {
+        'url': 'http://video.arkena.com/play2/embed/player?accountId=472718&mediaId=35763b3b-00090078-bf604299&pageStyling=styled',
+        'only_matching': True,
      }]
  
      @staticmethod
@@ -53,6 +64,14 @@ class ArkenaIE(InfoExtractor):
          video_id = mobj.group('id')
          account_id = mobj.group('account_id')
  
+        # Handle http://video.arkena.com/play2/embed/player URL
+        if not video_id:
+            qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+            video_id = qs.get('mediaId', [None])[0]
+            account_id = qs.get('accountId', [None])[0]
+            if not video_id or not account_id:
+                raise ExtractorError('Invalid URL', expected=True)
+
          playlist = self._download_json(
              'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_'
              % (video_id, account_id),
diff --git a/youtube_dl/extractor/atresplayer.py b/youtube_dl/extractor/atresplayer.py
index d2f3889645f9b9324deb0eda00d4f6b67ab32dc1..e3c669830343bb4f698dc342adebbd764877fd4b 100644
--- a/youtube_dl/extractor/atresplayer.py
+++ b/youtube_dl/extractor/atresplayer.py
@@ -30,7 +30,7 @@ class AtresPlayerIE(InfoExtractor):
                  'title': 'Especial Solidario de Nochebuena',
                  'description': 'md5:e2d52ff12214fa937107d21064075bf1',
                  'duration': 5527.6,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'This video is only available for registered users'
          },
@@ -43,7 +43,7 @@ class AtresPlayerIE(InfoExtractor):
                  'title': 'David Bustamante',
                  'description': 'md5:f33f1c0a05be57f6708d4dd83a3b81c6',
                  'duration': 1439.0,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
diff --git a/youtube_dl/extractor/atttechchannel.py b/youtube_dl/extractor/atttechchannel.py
index b01d35bb24bb45ee76741a672926372720b81ede..8f93fb353471dbd6516a540b0621d7c292720e05 100644
--- a/youtube_dl/extractor/atttechchannel.py
+++ b/youtube_dl/extractor/atttechchannel.py
@@ -14,7 +14,7 @@ class ATTTechChannelIE(InfoExtractor):
              'ext': 'flv',
              'title': 'AT&T Archives : The UNIX System: Making Computers Easier to Use',
              'description': 'A 1982 film about UNIX is the foundation for software in use around Bell Labs and AT&T.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140127',
          },
          'params': {
diff --git a/youtube_dl/extractor/audioboom.py b/youtube_dl/extractor/audioboom.py
index d7d1c6306443b77dd7161b3c07480ad16c14ffa5..8fc5f65c67a94417498fd4480e87d50ba49c4766 100644
--- a/youtube_dl/extractor/audioboom.py
+++ b/youtube_dl/extractor/audioboom.py
@@ -17,7 +17,7 @@ class AudioBoomIE(InfoExtractor):
              'description': 'Guest:   Nate Davis - NFL free agency,   Guest:   Stan Gans',
              'duration': 2245.72,
              'uploader': 'Steve Czaban',
-            'uploader_url': 're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
+            'uploader_url': r're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
          }
      }, {
          'url': 'https://audioboom.com/posts/4279833-3-09-2016-czaban-hour-3?t=0',
diff --git a/youtube_dl/extractor/azmedien.py b/youtube_dl/extractor/azmedien.py
new file mode 100644
index 0000000..cbc3ed5
--- /dev/null
+++ b/youtube_dl/extractor/azmedien.py
@@ -0,0 +1,172 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from .kaltura import KalturaIE
+from ..utils import (
+    get_element_by_id,
+    strip_or_none,
+    urljoin,
+)
+
+
+class AZMedienBaseIE(InfoExtractor):
+    def _kaltura_video(self, partner_id, entry_id):
+        return self.url_result(
+            'kaltura:%s:%s' % (partner_id, entry_id), ie=KalturaIE.ie_key(),
+            video_id=entry_id)
+
+
+class AZMedienIE(AZMedienBaseIE):
+    IE_DESC = 'AZ Medien videos'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            telezueri\.ch|
+                            telebaern\.tv|
+                            telem1\.ch
+                        )/
+                        [0-9]+-show-[^/\#]+
+                        (?:
+                            /[0-9]+-episode-[^/\#]+
+                            (?:
+                                /[0-9]+-segment-(?:[^/\#]+\#)?|
+                                \#
+                            )|
+                            \#
+                        )
+                        (?P<id>[^\#]+)
+                    '''
+
+    _TESTS = [{
+        # URL with 'segment'
+        'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom',
+        'info_dict': {
+            'id': '1_2444peh4',
+            'ext': 'mov',
+            'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom',
+            'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8',
+            'uploader_id': 'TeleZ?ri',
+            'upload_date': '20161218',
+            'timestamp': 1482084490,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # URL with 'segment' and fragment:
+        'url': 'http://www.telebaern.tv/118-show-news/14240-episode-dienstag-17-januar-2017/33666-segment-achtung-gefahr#zu-wenig-pflegerinnen-und-pfleger',
+        'only_matching': True
+    }, {
+        # URL with 'episode' and fragment:
+        'url': 'http://www.telem1.ch/47-show-sonntalk/13986-episode-soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz#soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz',
+        'only_matching': True
+    }, {
+        # URL with 'show' and fragment:
+        'url': 'http://www.telezueri.ch/66-show-sonntalk#burka-plakate-trump-putin-china-besuch',
+        'only_matching': True
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        partner_id = self._search_regex(
+            r'<script[^>]+src=["\'](?:https?:)?//(?:[^/]+\.)?kaltura\.com(?:/[^/]+)*/(?:p|partner_id)/([0-9]+)',
+            webpage, 'kaltura partner id')
+        entry_id = self._html_search_regex(
+            r'<a[^>]+data-id=(["\'])(?P<id>(?:(?!\1).)+)\1[^>]+data-slug=["\']%s'
+            % re.escape(video_id), webpage, 'kaltura entry id', group='id')
+
+        return self._kaltura_video(partner_id, entry_id)
+
+
+class AZMedienPlaylistIE(AZMedienBaseIE):
+    IE_DESC = 'AZ Medien playlists'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            telezueri\.ch|
+                            telebaern\.tv|
+                            telem1\.ch
+                        )/
+                        (?P<id>[0-9]+-
+                            (?:
+                                show|
+                                topic|
+                                themen
+                            )-[^/\#]+
+                            (?:
+                                /[0-9]+-episode-[^/\#]+
+                            )?
+                        )$
+                    '''
+
+    _TESTS = [{
+        # URL with 'episode'
+        'url': 'http://www.telebaern.tv/118-show-news/13735-episode-donnerstag-15-dezember-2016',
+        'info_dict': {
+            'id': '118-show-news/13735-episode-donnerstag-15-dezember-2016',
+            'title': 'News - Donnerstag, 15. Dezember 2016',
+        },
+        'playlist_count': 9,
+    }, {
+        # URL with 'themen'
+        'url': 'http://www.telem1.ch/258-themen-tele-m1-classics',
+        'info_dict': {
+            'id': '258-themen-tele-m1-classics',
+            'title': 'Tele M1 Classics',
+        },
+        'playlist_mincount': 15,
+    }, {
+        # URL with 'topic', contains nested playlists
+        'url': 'http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen',
+        'only_matching': True,
+    }, {
+        # URL with 'show' only
+        'url': 'http://www.telezueri.ch/86-show-talktaeglich',
+        'only_matching': True
+    }]
+
+    def _real_extract(self, url):
+        show_id = self._match_id(url)
+        webpage = self._download_webpage(url, show_id)
+
+        entries = []
+
+        partner_id = self._search_regex(
+            r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
+            webpage, 'kaltura partner id', default=None)
+
+        if partner_id:
+            entries = [
+                self._kaltura_video(partner_id, m.group('id'))
+                for m in re.finditer(
+                    r'data-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage)]
+
+        if not entries:
+            entries = [
+                self.url_result(m.group('url'), ie=AZMedienIE.ie_key())
+                for m in re.finditer(
+                    r'<a[^>]+data-real=(["\'])(?P<url>http.+?)\1', webpage)]
+
+        if not entries:
+            entries = [
+                # May contain nested playlists (e.g. [1]) thus no explicit
+                # ie_key
+                # 1. http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen
+                self.url_result(urljoin(url, m.group('url')))
+                for m in re.finditer(
+                    r'<a[^>]+name=[^>]+href=(["\'])(?P<url>/.+?)\1', webpage)]
+
+        title = self._search_regex(
+            r'episodeShareTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
+            webpage, 'title',
+            default=strip_or_none(get_element_by_id(
+                'video-title', webpage)), group='title')
+
+        return self.playlist_result(entries, show_id, title)
diff --git a/youtube_dl/extractor/azubu.py b/youtube_dl/extractor/azubu.py
index 1eebf5dfd48d31654f3612bbb058c0dd7aa9e030..3ba2f00d39dc7836a3bfeddb8ca1da3e929e6e88 100644
--- a/youtube_dl/extractor/azubu.py
+++ b/youtube_dl/extractor/azubu.py
@@ -21,7 +21,7 @@ class AzubuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': '2014 HOT6 CUP LAST BIG MATCH Ro8 Day 1',
                  'description': 'md5:d06bdea27b8cc4388a90ad35b5c66c01',
-                'thumbnail': 're:^https?://.*\.jpe?g',
+                'thumbnail': r're:^https?://.*\.jpe?g',
                  'timestamp': 1417523507.334,
                  'upload_date': '20141202',
                  'duration': 9988.7,
@@ -38,7 +38,7 @@ class AzubuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Fnatic at Worlds 2014: Toyz - "I love Rekkles, he has amazing mechanics"',
                  'description': 'md5:4a649737b5f6c8b5c5be543e88dc62af',
-                'thumbnail': 're:^https?://.*\.jpe?g',
+                'thumbnail': r're:^https?://.*\.jpe?g',
                  'timestamp': 1410530893.320,
                  'upload_date': '20140912',
                  'duration': 172.385,
diff --git a/youtube_dl/extractor/bandcamp.py b/youtube_dl/extractor/bandcamp.py
index 88c590e98388d5f6058dd71ffb97f4f0254f0c5b..056e06376667e02b34c8efa7b2565be51e4625a4 100644
--- a/youtube_dl/extractor/bandcamp.py
+++ b/youtube_dl/extractor/bandcamp.py
@@ -209,6 +209,15 @@ class BandcampAlbumIE(InfoExtractor):
              'id': 'entropy-ep',
          },
          'playlist_mincount': 3,
+    }, {
+        # not all tracks have songs
+        'url': 'https://insulters.bandcamp.com/album/we-are-the-plague',
+        'info_dict': {
+            'id': 'we-are-the-plague',
+            'title': 'WE ARE THE PLAGUE',
+            'uploader_id': 'insulters',
+        },
+        'playlist_count': 2,
      }]
  
      def _real_extract(self, url):
@@ -217,12 +226,16 @@ class BandcampAlbumIE(InfoExtractor):
          album_id = mobj.group('album_id')
          playlist_id = album_id or uploader_id
          webpage = self._download_webpage(url, playlist_id)
-        tracks_paths = re.findall(r'<a href="(.*?)" itemprop="url">', webpage)
-        if not tracks_paths:
+        track_elements = re.findall(
+            r'(?s)<div[^>]*>(.*?<a[^>]+href="([^"]+?)"[^>]+itemprop="url"[^>]*>.*?)</div>', webpage)
+        if not track_elements:
              raise ExtractorError('The page doesn\'t contain any tracks')
+        # Only tracks with duration info have songs
          entries = [
              self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
-            for t_path in tracks_paths]
+            for elem_content, t_path in track_elements
+            if self._html_search_meta('duration', elem_content, default=None)]
+
          title = self._html_search_regex(
              r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
              webpage, 'title', fatal=False)
diff --git a/youtube_dl/extractor/beampro.py b/youtube_dl/extractor/beampro.py
new file mode 100644
index 0000000..f3a9e32
--- /dev/null
+++ b/youtube_dl/extractor/beampro.py
@@ -0,0 +1,73 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    clean_html,
+    compat_str,
+    int_or_none,
+    parse_iso8601,
+    try_get,
+)
+
+
+class BeamProLiveIE(InfoExtractor):
+    IE_NAME = 'Beam:live'
+    _VALID_URL = r'https?://(?:\w+\.)?beam\.pro/(?P<id>[^/?#&]+)'
+    _RATINGS = {'family': 0, 'teen': 13, '18+': 18}
+    _TEST = {
+        'url': 'http://www.beam.pro/niterhayven',
+        'info_dict': {
+            'id': '261562',
+            'ext': 'mp4',
+            'title': 'Introducing The Witcher 3 //  The Grind Starts Now!',
+            'description': 'md5:0b161ac080f15fe05d18a07adb44a74d',
+            'thumbnail': r're:https://.*\.jpg$',
+            'timestamp': 1483477281,
+            'upload_date': '20170103',
+            'uploader': 'niterhayven',
+            'uploader_id': '373396',
+            'age_limit': 18,
+            'is_live': True,
+            'view_count': int,
+        },
+        'skip': 'niterhayven is offline',
+        'params': {
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        channel_name = self._match_id(url)
+
+        chan = self._download_json(
+            'https://beam.pro/api/v1/channels/%s' % channel_name, channel_name)
+
+        if chan.get('online') is False:
+            raise ExtractorError(
+                '{0} is offline'.format(channel_name), expected=True)
+
+        channel_id = chan['id']
+
+        formats = self._extract_m3u8_formats(
+            'https://beam.pro/api/v1/channels/%s/manifest.m3u8' % channel_id,
+            channel_name, ext='mp4', m3u8_id='hls', fatal=False)
+        self._sort_formats(formats)
+
+        user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
+
+        return {
+            'id': compat_str(chan.get('id') or channel_name),
+            'title': self._live_title(chan.get('name') or channel_name),
+            'description': clean_html(chan.get('description')),
+            'thumbnail': try_get(chan, lambda x: x['thumbnail']['url'], compat_str),
+            'timestamp': parse_iso8601(chan.get('updatedAt')),
+            'uploader': chan.get('token') or try_get(
+                chan, lambda x: x['user']['username'], compat_str),
+            'uploader_id': compat_str(user_id) if user_id else None,
+            'age_limit': self._RATINGS.get(chan.get('audience')),
+            'is_live': True,
+            'view_count': int_or_none(chan.get('viewersTotal')),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/bet.py b/youtube_dl/extractor/bet.py
index 1f8ef030380c5fb548d14cc8e944c8dad1fca900..d7ceaa85e45c7da4ffbf3909b1e00f8ffd9ac75c 100644
--- a/youtube_dl/extractor/bet.py
+++ b/youtube_dl/extractor/bet.py
@@ -17,7 +17,7 @@ class BetIE(MTVServicesInfoExtractor):
                  'description': 'President Obama urges persistence in confronting racism and bias.',
                  'duration': 1534,
                  'upload_date': '20141208',
-                'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'thumbnail': r're:(?i)^https?://.*\.jpg$',
                  'subtitles': {
                      'en': 'mincount:2',
                  }
@@ -37,7 +37,7 @@ class BetIE(MTVServicesInfoExtractor):
                  'description': 'A BET News special.',
                  'duration': 1696,
                  'upload_date': '20141125',
-                'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'thumbnail': r're:(?i)^https?://.*\.jpg$',
                  'subtitles': {
                      'en': 'mincount:2',
                  }
diff --git a/youtube_dl/extractor/bild.py b/youtube_dl/extractor/bild.py
index 1a0184861d20d7674badc042bfc44fbda6c9718b..b8dfbd42b429beb8f530076b1efdee571ec4d855 100644
--- a/youtube_dl/extractor/bild.py
+++ b/youtube_dl/extractor/bild.py
@@ -19,7 +19,7 @@ class BildIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Das können die  neuen iPads',
              'description': 'md5:a4058c4fa2a804ab59c00d7244bbf62f',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 196,
          }
      }
diff --git a/youtube_dl/extractor/bilibili.py b/youtube_dl/extractor/bilibili.py
index 2d174e6f9a81da7412cd58ac316c7b5924dcde78..80dd8382e4e8758274e3a7ba2418479ee3d2fbbc 100644
--- a/youtube_dl/extractor/bilibili.py
+++ b/youtube_dl/extractor/bilibili.py
@@ -5,19 +5,27 @@ import hashlib
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_parse_qs
+from ..compat import (
+    compat_parse_qs,
+    compat_urlparse,
+)
  from ..utils import (
+    ExtractorError,
      int_or_none,
      float_or_none,
+    parse_iso8601,
+    smuggle_url,
+    strip_jsonp,
      unified_timestamp,
+    unsmuggle_url,
      urlencode_postdata,
  )
  
  
  class BiliBiliIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.bilibili.tv/video/av1074402/',
          'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
          'info_dict': {
@@ -28,29 +36,65 @@ class BiliBiliIE(InfoExtractor):
              'duration': 308.315,
              'timestamp': 1398012660,
              'upload_date': '20140420',
-            'thumbnail': 're:^https?://.+\.jpg',
+            'thumbnail': r're:^https?://.+\.jpg',
              'uploader': '菊子桑',
              'uploader_id': '156160',
          },
-    }
+    }, {
+        # Tested in BiliBiliBangumiIE
+        'url': 'http://bangumi.bilibili.com/anime/1869/play#40062',
+        'only_matching': True,
+    }, {
+        'url': 'http://bangumi.bilibili.com/anime/5802/play#100643',
+        'md5': '3f721ad1e75030cc06faf73587cfec57',
+        'info_dict': {
+            'id': '100643',
+            'ext': 'mp4',
+            'title': 'CHAOS;CHILD',
+            'description': '如果你是神明，并且能够让妄想成为现实。那你会进行怎么样的妄想？是淫靡的世界？独裁社会？毁灭性的制裁？还是……2015年，涩谷。从6年前发生的大灾害“涩谷地震”之后复兴了的这个街区里新设立的私立高中...',
+        },
+        'skip': 'Geo-restricted to China',
+    }]
+
+    _APP_KEY = '84956560bc028eb7'
+    _BILIBILI_KEY = '94aba54af9065f71de72f5508f1cd42e'
  
-    _APP_KEY = '6f90a59ac58a4123'
-    _BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
+    def _report_error(self, result):
+        if 'message' in result:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, result['message']), expected=True)
+        elif 'code' in result:
+            raise ExtractorError('%s returns error %d' % (self.IE_NAME, result['code']), expected=True)
+        else:
+            raise ExtractorError('Can\'t extract Bangumi episode ID')
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        url, smuggled_data = unsmuggle_url(url, {})
+
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        anime_id = mobj.group('anime_id')
          webpage = self._download_webpage(url, video_id)
  
-        if 'anime/v' not in url:
+        if 'anime/' not in url:
              cid = compat_parse_qs(self._search_regex(
                  [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
                   r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
                  webpage, 'player parameters'))['cid'][0]
          else:
+            if 'no_bangumi_tip' not in smuggled_data:
+                self.to_screen('Downloading episode %s. To download all videos in anime %s, re-run youtube-dl with %s' % (
+                    video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
+            headers = {
+                'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
+            }
+            headers.update(self.geo_verification_headers())
+
              js = self._download_json(
                  'http://bangumi.bilibili.com/web_api/get_source', video_id,
                  data=urlencode_postdata({'episode_id': video_id}),
-                headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
+                headers=headers)
+            if 'result' not in js:
+                self._report_error(js)
              cid = js['result']['cid']
  
          payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
@@ -58,7 +102,11 @@ class BiliBiliIE(InfoExtractor):
  
          video_info = self._download_json(
              'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
-            video_id, note='Downloading video info page')
+            video_id, note='Downloading video info page',
+            headers=self.geo_verification_headers())
+
+        if 'durl' not in video_info:
+            self._report_error(video_info)
  
          entries = []
  
@@ -85,7 +133,7 @@ class BiliBiliIE(InfoExtractor):
          title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
          description = self._html_search_meta('description', webpage)
          timestamp = unified_timestamp(self._html_search_regex(
-            r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False))
+            r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', default=None))
          thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
  
          # TODO 'view_count' requires deobfuscating Javascript
@@ -99,7 +147,7 @@ class BiliBiliIE(InfoExtractor):
          }
  
          uploader_mobj = re.search(
-            r'<a[^>]+href="https?://space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
+            r'<a[^>]+href="(?:https?:)?//space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
              webpage)
          if uploader_mobj:
              info.update({
@@ -123,3 +171,70 @@ class BiliBiliIE(InfoExtractor):
                  'description': description,
                  'entries': entries,
              }
+
+
+class BiliBiliBangumiIE(InfoExtractor):
+    _VALID_URL = r'https?://bangumi\.bilibili\.com/anime/(?P<id>\d+)'
+
+    IE_NAME = 'bangumi.bilibili.com'
+    IE_DESC = 'BiliBili番剧'
+
+    _TESTS = [{
+        'url': 'http://bangumi.bilibili.com/anime/1869',
+        'info_dict': {
+            'id': '1869',
+            'title': '混沌武士',
+            'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
+        },
+        'playlist_count': 26,
+    }, {
+        'url': 'http://bangumi.bilibili.com/anime/1869',
+        'info_dict': {
+            'id': '1869',
+            'title': '混沌武士',
+            'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
+        },
+        'playlist': [{
+            'md5': '91da8621454dd58316851c27c68b0c13',
+            'info_dict': {
+                'id': '40062',
+                'ext': 'mp4',
+                'title': '混沌武士',
+                'description': '故事发生在日本的江户时代。风是一个小酒馆的打工女。一日，酒馆里来了一群恶霸，虽然他们的举动令风十分不满，但是毕竟风只是一届女流，无法对他们采取什么行动，只能在心里嘟哝。这时，酒家里又进来了个“不良份子...',
+                'timestamp': 1414538739,
+                'upload_date': '20141028',
+                'episode': '疾风怒涛 Tempestuous Temperaments',
+                'episode_number': 1,
+            },
+        }],
+        'params': {
+            'playlist_items': '1',
+        },
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if BiliBiliIE.suitable(url) else super(BiliBiliBangumiIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        bangumi_id = self._match_id(url)
+
+        # Sometimes this API returns a JSONP response
+        season_info = self._download_json(
+            'http://bangumi.bilibili.com/jsonp/seasoninfo/%s.ver' % bangumi_id,
+            bangumi_id, transform_source=strip_jsonp)['result']
+
+        entries = [{
+            '_type': 'url_transparent',
+            'url': smuggle_url(episode['webplay_url'], {'no_bangumi_tip': 1}),
+            'ie_key': BiliBiliIE.ie_key(),
+            'timestamp': parse_iso8601(episode.get('update_time'), delimiter=' '),
+            'episode': episode.get('index_title'),
+            'episode_number': int_or_none(episode.get('index')),
+        } for episode in season_info['episodes']]
+
+        entries = sorted(entries, key=lambda entry: entry.get('episode_number'))
+
+        return self.playlist_result(
+            entries, bangumi_id,
+            season_info.get('bangumi_title'), season_info.get('evaluate'))
diff --git a/youtube_dl/extractor/biobiochiletv.py b/youtube_dl/extractor/biobiochiletv.py
index 7608c0a085b3c656277b03f61d19c1f60ea8d4f1..b92031c8ab6bac3c5d5135743dda5883de080855 100644
--- a/youtube_dl/extractor/biobiochiletv.py
+++ b/youtube_dl/extractor/biobiochiletv.py
@@ -19,7 +19,7 @@ class BioBioChileTVIE(InfoExtractor):
              'id': 'sobre-camaras-y-camarillas-parlamentarias',
              'ext': 'mp4',
              'title': 'Sobre Cámaras y camarillas parlamentarias',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Fernando Atria',
          },
          'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
@@ -31,7 +31,7 @@ class BioBioChileTVIE(InfoExtractor):
              'id': 'natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades',
              'ext': 'mp4',
              'title': 'Natalia Valdebenito repasa a diputado Hasbún: Pasó a la categoría de hablar brutalidades',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Piangella Obrador',
          },
          'params': {
diff --git a/youtube_dl/extractor/bloomberg.py b/youtube_dl/extractor/bloomberg.py
index 2a8cd64b99d2551da9777aaa259d356d8cad51ed..c5e11e8eb81151ca8dd07be04c1fe2005a26bfa9 100644
--- a/youtube_dl/extractor/bloomberg.py
+++ b/youtube_dl/extractor/bloomberg.py
@@ -45,7 +45,8 @@ class BloombergIE(InfoExtractor):
          name = self._match_id(url)
          webpage = self._download_webpage(url, name)
          video_id = self._search_regex(
-            r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
+            (r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
+             r'videoId\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
              webpage, 'id', group='url', default=None)
          if not video_id:
              bplayer_data = self._parse_json(self._search_regex(
diff --git a/youtube_dl/extractor/breakcom.py b/youtube_dl/extractor/breakcom.py
index 725859b4d2d554df91ff4793a2b3d245f02c8996..5a87c2661910303d638351a0f5155dd20db35793 100644
--- a/youtube_dl/extractor/breakcom.py
+++ b/youtube_dl/extractor/breakcom.py
@@ -1,9 +1,9 @@
  from __future__ import unicode_literals
  
  import re
-import json
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
      int_or_none,
      parse_age_limit,
@@ -11,7 +11,7 @@ from ..utils import (
  
  
  class BreakIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>break|screenjunkies)\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
      _TESTS = [{
          'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
          'info_dict': {
@@ -20,45 +20,124 @@ class BreakIE(InfoExtractor):
              'title': 'When Girls Act Like D-Bags',
              'age_limit': 13,
          }
+    }, {
+        'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
+        'md5': '5c2b686bec3d43de42bde9ec047536b0',
+        'info_dict': {
+            'id': '2841915',
+            'display_id': 'best-quentin-tarantino-movie',
+            'ext': 'mp4',
+            'title': 'Best Quentin Tarantino Movie',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'duration': 3671,
+            'age_limit': 13,
+            'tags': list,
+        },
+    }, {
+        'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight',
+        'info_dict': {
+            'id': '2348808',
+            'display_id': 'honest-trailers-the-dark-knight',
+            'ext': 'mp4',
+            'title': 'Honest Trailers - The Dark Knight',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
+            'age_limit': 10,
+            'tags': list,
+        },
+    }, {
+        # requires subscription but worked around
+        'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285',
+        'info_dict': {
+            'id': '3003285',
+            'display_id': 'knocking-dead-ep-1-the-show-so-far',
+            'ext': 'mp4',
+            'title': 'State of The Dead Recap: Knocking Dead Pilot',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'duration': 3307,
+            'age_limit': 13,
+            'tags': list,
+        },
      }, {
          'url': 'http://www.break.com/video/ugc/baby-flex-2773063',
          'only_matching': True,
      }]
  
+    _DEFAULT_BITRATES = (48, 150, 320, 496, 864, 2240, 3264)
+
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        site, display_id, video_id = re.match(self._VALID_URL, url).groups()
+
+        if not video_id:
+            webpage = self._download_webpage(url, display_id)
+            video_id = self._search_regex(
+                (r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'),
+                webpage, 'video id')
+
          webpage = self._download_webpage(
-            'http://www.break.com/embed/%s' % video_id, video_id)
-        info = json.loads(self._search_regex(
-            r'var embedVars = ({.*})\s*?</script>',
-            webpage, 'info json', flags=re.DOTALL))
+            'http://www.%s.com/embed/%s' % (site, video_id),
+            display_id, 'Downloading video embed page')
+        embed_vars = self._parse_json(
+            self._search_regex(
+                r'(?s)embedVars\s*=\s*({.+?})\s*</script>', webpage, 'embed vars'),
+            display_id)
  
-        youtube_id = info.get('youtubeId')
+        youtube_id = embed_vars.get('youtubeId')
          if youtube_id:
              return self.url_result(youtube_id, 'Youtube')
  
-        formats = [{
-            'url': media['uri'] + '?' + info['AuthToken'],
-            'tbr': media['bitRate'],
-            'width': media['width'],
-            'height': media['height'],
-        } for media in info['media'] if media.get('mediaPurpose') == 'play']
+        title = embed_vars['contentName']
  
-        if not formats:
+        formats = []
+        bitrates = []
+        for f in embed_vars.get('media', []):
+            if not f.get('uri') or f.get('mediaPurpose') != 'play':
+                continue
+            bitrate = int_or_none(f.get('bitRate'))
+            if bitrate:
+                bitrates.append(bitrate)
              formats.append({
-                'url': info['videoUri']
+                'url': f['uri'],
+                'format_id': 'http-%d' % bitrate if bitrate else 'http',
+                'width': int_or_none(f.get('width')),
+                'height': int_or_none(f.get('height')),
+                'tbr': bitrate,
+                'format': 'mp4',
              })
  
-        self._sort_formats(formats)
+        if not bitrates:
+            # When subscriptionLevel > 0, i.e. plus subscription is required
+            # media list will be empty. However, hds and hls uris are still
+            # available. We can grab them assuming bitrates to be default.
+            bitrates = self._DEFAULT_BITRATES
+
+        auth_token = embed_vars.get('AuthToken')
  
-        duration = int_or_none(info.get('videoLengthInSeconds'))
-        age_limit = parse_age_limit(info.get('audienceRating'))
+        def construct_manifest_url(base_url, ext):
+            pieces = [base_url]
+            pieces.extend([compat_str(b) for b in bitrates])
+            pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
+            return ','.join(pieces)
+
+        if bitrates and auth_token:
+            hds_url = embed_vars.get('hdsUri')
+            if hds_url:
+                formats.extend(self._extract_f4m_formats(
+                    construct_manifest_url(hds_url, 'f4m'),
+                    display_id, f4m_id='hds', fatal=False))
+            hls_url = embed_vars.get('hlsUri')
+            if hls_url:
+                formats.extend(self._extract_m3u8_formats(
+                    construct_manifest_url(hls_url, 'm3u8'),
+                    display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+        self._sort_formats(formats)
  
          return {
              'id': video_id,
-            'title': info['contentName'],
-            'thumbnail': info['thumbUri'],
-            'duration': duration,
-            'age_limit': age_limit,
+            'display_id': display_id,
+            'title': title,
+            'thumbnail': embed_vars.get('thumbUri'),
+            'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None,
+            'age_limit': parse_age_limit(embed_vars.get('audienceRating')),
+            'tags': embed_vars.get('tags', '').split(','),
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/brightcove.py b/youtube_dl/extractor/brightcove.py
index 945cf19e8bce0f1f9576d26abc455c9795a250d3..5c6e99da134efe150962dd979cbf36e391527d3a 100644
--- a/youtube_dl/extractor/brightcove.py
+++ b/youtube_dl/extractor/brightcove.py
@@ -179,7 +179,7 @@ class BrightcoveLegacyIE(InfoExtractor):
  
          params = {}
  
-        playerID = find_param('playerID')
+        playerID = find_param('playerID') or find_param('playerId')
          if playerID is None:
              raise ExtractorError('Cannot find player ID')
          params['playerID'] = playerID
@@ -204,7 +204,7 @@ class BrightcoveLegacyIE(InfoExtractor):
          #   // build Brightcove <object /> XML
          # }
          m = re.search(
-            r'''(?x)customBC.\createVideo\(
+            r'''(?x)customBC\.createVideo\(
                  .*?                                                  # skipping width and height
                  ["\'](?P<playerID>\d+)["\']\s*,\s*                   # playerID
                  ["\'](?P<playerKey>AQ[^"\']{48})[^"\']*["\']\s*,\s*  # playerKey begins with AQ and is 50 characters
@@ -232,13 +232,16 @@ class BrightcoveLegacyIE(InfoExtractor):
          """Return a list of all Brightcove URLs from the webpage """
  
          url_m = re.search(
-            r'<meta\s+property=[\'"]og:video[\'"]\s+content=[\'"](https?://(?:secure|c)\.brightcove.com/[^\'"]+)[\'"]',
-            webpage)
+            r'''(?x)
+                <meta\s+
+                    (?:property|itemprop)=([\'"])(?:og:video|embedURL)\1[^>]+
+                    content=([\'"])(?P<url>https?://(?:secure|c)\.brightcove.com/(?:(?!\2).)+)\2
+            ''', webpage)
          if url_m:
-            url = unescapeHTML(url_m.group(1))
+            url = unescapeHTML(url_m.group('url'))
              # Some sites don't add it, we can't download with this url, for example:
              # http://www.ktvu.com/videos/news/raw-video-caltrain-releases-video-of-man-almost/vCTZdY/
-            if 'playerKey' in url or 'videoId' in url:
+            if 'playerKey' in url or 'videoId' in url or 'idVideo' in url:
                  return [url]
  
          matches = re.findall(
@@ -259,7 +262,7 @@ class BrightcoveLegacyIE(InfoExtractor):
          url, smuggled_data = unsmuggle_url(url, {})
  
          # Change the 'videoId' and others field to '@videoPlayer'
-        url = re.sub(r'(?<=[?&])(videoI(d|D)|bctid)', '%40videoPlayer', url)
+        url = re.sub(r'(?<=[?&])(videoI(d|D)|idVideo|bctid)', '%40videoPlayer', url)
          # Change bckey (used by bcove.me urls) to playerKey
          url = re.sub(r'(?<=[?&])bckey', 'playerKey', url)
          mobj = re.match(self._VALID_URL, url)
@@ -548,7 +551,7 @@ class BrightcoveNewIE(InfoExtractor):
              container = source.get('container')
              ext = mimetype2ext(source.get('type'))
              src = source.get('src')
-            if ext == 'ism':
+            if ext == 'ism' or container == 'WVM':
                  continue
              elif ext == 'm3u8' or container == 'M2TS':
                  if not src:
diff --git a/youtube_dl/extractor/byutv.py b/youtube_dl/extractor/byutv.py
index 4be175d7039dd845f7c961af552bc1153b73598e..8ef089653d80e75bcf9eb89b81a9c82f08e7971d 100644
--- a/youtube_dl/extractor/byutv.py
+++ b/youtube_dl/extractor/byutv.py
@@ -16,7 +16,7 @@ class BYUtvIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Season 5 Episode 5',
              'description': 'md5:e07269172baff037f8e8bf9956bc9747',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1486.486,
          },
          'params': {
diff --git a/youtube_dl/extractor/camdemy.py b/youtube_dl/extractor/camdemy.py
index d4e6fbdce029b8267450b9d50d3b41556a47664d..8f0c6c545c35312813bb461b3218f868a660fdd2 100644
--- a/youtube_dl/extractor/camdemy.py
+++ b/youtube_dl/extractor/camdemy.py
@@ -26,7 +26,7 @@ class CamdemyIE(InfoExtractor):
              'id': '5181',
              'ext': 'mp4',
              'title': 'Ch1-1 Introduction, Signals (02-23-2012)',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'creator': 'ss11spring',
              'duration': 1591,
              'upload_date': '20130114',
@@ -41,7 +41,7 @@ class CamdemyIE(InfoExtractor):
              'id': '13885',
              'ext': 'mp4',
              'title': 'EverCam + Camdemy QuickStart',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:2a9f989c2b153a2342acee579c6e7db6',
              'creator': 'evercam',
              'duration': 318,
diff --git a/youtube_dl/extractor/canalplus.py b/youtube_dl/extractor/canalplus.py
index 1c3c41d26619ec2fa347c4a75093b2a1cf7003a2..4b9fa2ddf8c42ce642c3e9395746a3858dd7060c 100644
--- a/youtube_dl/extractor/canalplus.py
+++ b/youtube_dl/extractor/canalplus.py
@@ -27,6 +27,7 @@ class CanalplusIE(InfoExtractor):
                                      (?:www\.)?d8\.tv|
                                      (?:www\.)?c8\.fr|
                                      (?:www\.)?d17\.tv|
+                                    (?:(?:football|www)\.)?cstar\.fr|
                                      (?:www\.)?itele\.fr
                                  )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
                                  player\.canalplus\.fr/#/(?P<id>\d+)
@@ -40,6 +41,7 @@ class CanalplusIE(InfoExtractor):
          'd8': 'd8',
          'c8': 'd8',
          'd17': 'd17',
+        'cstar': 'd17',
          'itele': 'itele',
      }
  
@@ -86,6 +88,19 @@ class CanalplusIE(InfoExtractor):
              'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
              'upload_date': '20161014',
          },
+    }, {
+        'url': 'http://football.cstar.fr/cstar-minisite-foot/pid7566-feminines-videos.html?vid=1416769',
+        'info_dict': {
+            'id': '1416769',
+            'display_id': 'pid7566-feminines-videos',
+            'ext': 'mp4',
+            'title': 'France - Albanie : les temps forts de la soirée - 20/09/2016',
+            'description': 'md5:c3f30f2aaac294c1c969b3294de6904e',
+            'upload_date': '20160921',
+        },
+        'params': {
+            'skip_download': True,
+        },
      }, {
          'url': 'http://m.canalplus.fr/?vid=1398231',
          'only_matching': True,
@@ -105,8 +120,9 @@ class CanalplusIE(InfoExtractor):
          webpage = self._download_webpage(url, display_id)
          video_id = self._search_regex(
              [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
-             r'id=["\']canal_video_player(?P<id>\d+)'],
-            webpage, 'video id', group='id')
+             r'id=["\']canal_video_player(?P<id>\d+)',
+             r'data-video=["\'](?P<id>\d+)'],
+            webpage, 'video id', default=mobj.group('vid'), group='id')
  
          info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
          video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
diff --git a/youtube_dl/extractor/canvas.py b/youtube_dl/extractor/canvas.py
index d183d5d527fb8ab4163b16fcaffd0aeedbf0dd0c..544c6657c12e53afee6d0e1a2916a9a47d606415 100644
--- a/youtube_dl/extractor/canvas.py
+++ b/youtube_dl/extractor/canvas.py
@@ -17,7 +17,7 @@ class CanvasIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'De afspraak veilt voor de Warmste Week',
              'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 49.02,
          }
      }, {
@@ -29,7 +29,7 @@ class CanvasIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Pieter 0167',
              'description': 'md5:943cd30f48a5d29ba02c3a104dc4ec4e',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 2553.08,
              'subtitles': {
                  'nl': [{
@@ -48,7 +48,7 @@ class CanvasIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Herbekijk Sorry voor alles',
              'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 3788.06,
          },
          'params': {
@@ -89,6 +89,9 @@ class CanvasIE(InfoExtractor):
              elif format_type == 'HDS':
                  formats.extend(self._extract_f4m_formats(
                      format_url, display_id, f4m_id=format_type, fatal=False))
+            elif format_type == 'MPEG_DASH':
+                formats.extend(self._extract_mpd_formats(
+                    format_url, display_id, mpd_id=format_type, fatal=False))
              else:
                  formats.append({
                      'format_id': format_type,
diff --git a/youtube_dl/extractor/carambatv.py b/youtube_dl/extractor/carambatv.py
index 66c0f900a402664653a846e9b39fc44c1da2853e..9ba909a918755b5a02f8d8fe684a6f73a438ada9 100644
--- a/youtube_dl/extractor/carambatv.py
+++ b/youtube_dl/extractor/carambatv.py
@@ -21,7 +21,7 @@ class CarambaTVIE(InfoExtractor):
              'id': '191910501',
              'ext': 'mp4',
              'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 2678.31,
          },
      }, {
@@ -69,7 +69,7 @@ class CarambaTVPageIE(InfoExtractor):
              'id': '475222',
              'ext': 'flv',
              'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              # duration reported by videomore is incorrect
              'duration': int,
          },
diff --git a/youtube_dl/extractor/cbc.py b/youtube_dl/extractor/cbc.py
index d71fddf58a068461cd2d377b31e4c3981d6c2b3d..cf678e7f843225f00a69546c59ba27a2b9c93c3d 100644
--- a/youtube_dl/extractor/cbc.py
+++ b/youtube_dl/extractor/cbc.py
@@ -90,36 +90,49 @@ class CBCIE(InfoExtractor):
              },
          }],
          'skip': 'Geo-restricted to Canada',
+    }, {
+        # multiple CBC.APP.Caffeine.initInstance(...)
+        'url': 'http://www.cbc.ca/news/canada/calgary/dog-indoor-exercise-winter-1.3928238',
+        'info_dict': {
+            'title': 'Keep Rover active during the deep freeze with doggie pushups and other fun indoor tasks',
+            'id': 'dog-indoor-exercise-winter-1.3928238',
+        },
+        'playlist_mincount': 6,
      }]
  
      @classmethod
      def suitable(cls, url):
          return False if CBCPlayerIE.suitable(url) else super(CBCIE, cls).suitable(url)
  
+    def _extract_player_init(self, player_init, display_id):
+        player_info = self._parse_json(player_init, display_id, js_to_json)
+        media_id = player_info.get('mediaId')
+        if not media_id:
+            clip_id = player_info['clipId']
+            feed = self._download_json(
+                'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
+                clip_id, fatal=False)
+            if feed:
+                media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
+            if not media_id:
+                media_id = self._download_json(
+                    'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
+                    clip_id)['entries'][0]['id'].split('/')[-1]
+        return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
+
      def _real_extract(self, url):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
-        player_init = self._search_regex(
-            r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage, 'player init',
-            default=None)
-        if player_init:
-            player_info = self._parse_json(player_init, display_id, js_to_json)
-            media_id = player_info.get('mediaId')
-            if not media_id:
-                clip_id = player_info['clipId']
-                feed = self._download_json(
-                    'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
-                    clip_id, fatal=False)
-                if feed:
-                    media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
-                if not media_id:
-                    media_id = self._download_json(
-                        'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
-                        clip_id)['entries'][0]['id'].split('/')[-1]
-            return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
-        else:
-            entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
-            return self.playlist_result(entries)
+        entries = [
+            self._extract_player_init(player_init, display_id)
+            for player_init in re.findall(r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage)]
+        entries.extend([
+            self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
+            for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)])
+        return self.playlist_result(
+            entries, display_id,
+            self._og_search_title(webpage, fatal=False),
+            self._og_search_description(webpage))
  
  
  class CBCPlayerIE(InfoExtractor):
@@ -283,11 +296,12 @@ class CBCWatchVideoIE(CBCWatchBaseIE):
          formats = self._extract_m3u8_formats(re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url), video_id, 'mp4', fatal=False)
          if len(formats) < 2:
              formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
-        # Despite metadata in m3u8 all video+audio formats are
-        # actually video-only (no audio)
          for f in formats:
-            if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
-                f['acodec'] = 'none'
+            format_id = f.get('format_id')
+            if format_id.startswith('AAC'):
+                f['acodec'] = 'aac'
+            elif format_id.startswith('AC3'):
+                f['acodec'] = 'ac-3'
          self._sort_formats(formats)
  
          info = {
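
Note on the CBCIE change above: a rough standalone sketch of collecting every
CBC.APP.Caffeine.initInstance(...) call on a page instead of only the first.
The regex is the one from the patch. The sample page and the function name
find_player_inits are hypothetical, and json.loads stands in for the
extractor's js_to_json-based parsing, so the sample uses strict JSON.

```python
import json
import re

INIT_RE = r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);'


def find_player_inits(webpage):
    """Return one parsed dict per initInstance(...) call found in the page."""
    # re.findall with a single capture group returns the captured strings.
    return [json.loads(blob) for blob in re.findall(INIT_RE, webpage)]


sample_page = (
    '<script>CBC.APP.Caffeine.initInstance({"mediaId": "111"});</script>'
    '<script>CBC.APP.Caffeine.initInstance({"clipId": "222"});</script>'
)
players = find_player_inits(sample_page)
```

Each parsed dict would then be turned into its own playlist entry, which is
why the new test expects a playlist rather than a single video.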
diff --git a/youtube_dl/extractor/cbsnews.py b/youtube_dl/extractor/cbsnews.py
index 91b0f5fa94c7ba919e01fd097cbdfc71fe6992b4..17bb9af4fe2a8a0066611a2bbc2d090ad7cf5e30 100644
--- a/youtube_dl/extractor/cbsnews.py
+++ b/youtube_dl/extractor/cbsnews.py
@@ -39,7 +39,7 @@ class CBSNewsIE(CBSIE):
                  'upload_date': '20140404',
                  'timestamp': 1396650660,
                  'uploader': 'CBSI-NEW',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 205,
                  'subtitles': {
                      'en': [{
diff --git a/youtube_dl/extractor/ccc.py b/youtube_dl/extractor/ccc.py
index 8f7f09e22dad6eda3ca08edfbf9edc118146e893..73470214412b542adad72f1227e66fd341742e82 100644
--- a/youtube_dl/extractor/ccc.py
+++ b/youtube_dl/extractor/ccc.py
@@ -19,7 +19,7 @@ class CCCIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Introduction to Processor Design',
              'description': 'md5:df55f6d073d4ceae55aae6f2fd98a0ac',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20131228',
              'timestamp': 1388188800,
              'duration': 3710,
@@ -32,7 +32,7 @@ class CCCIE(InfoExtractor):
      def _real_extract(self, url):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
-        event_id = self._search_regex("data-id='(\d+)'", webpage, 'event id')
+        event_id = self._search_regex(r"data-id='(\d+)'", webpage, 'event id')
          event_data = self._download_json('https://media.ccc.de/public/events/%s' % event_id, event_id)
  
          formats = []
diff --git a/youtube_dl/extractor/ccma.py b/youtube_dl/extractor/ccma.py
new file mode 100644
index 0000000..39938c9
--- /dev/null
+++ b/youtube_dl/extractor/ccma.py
@@ -0,0 +1,99 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    parse_iso8601,
+    clean_html,
+)
+
+
+class CCMAIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?ccma\.cat/(?:[^/]+/)*?(?P<type>video|audio)/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.ccma.cat/tv3/alacarta/lespot-de-la-marato-de-tv3/lespot-de-la-marato-de-tv3/video/5630208/',
+        'md5': '7296ca43977c8ea4469e719c609b0871',
+        'info_dict': {
+            'id': '5630208',
+            'ext': 'mp4',
+            'title': 'L\'espot de La Marató de TV3',
+            'description': 'md5:f12987f320e2f6e988e9908e4fe97765',
+            'timestamp': 1470918540,
+            'upload_date': '20160811',
+        }
+    }, {
+        'url': 'http://www.ccma.cat/catradio/alacarta/programa/el-consell-de-savis-analitza-el-derbi/audio/943685/',
+        'md5': 'fa3e38f269329a278271276330261425',
+        'info_dict': {
+            'id': '943685',
+            'ext': 'mp3',
+            'title': 'El Consell de Savis analitza el derbi',
+            'description': 'md5:e2a3648145f3241cb9c6b4b624033e53',
+            'upload_date': '20171205',
+            'timestamp': 1512507300,
+        }
+    }]
+
+    def _real_extract(self, url):
+        media_type, media_id = re.match(self._VALID_URL, url).groups()
+        media_data = {}
+        formats = []
+        profiles = ['pc'] if media_type == 'audio' else ['mobil', 'pc']
+        for i, profile in enumerate(profiles):
+            md = self._download_json('http://dinamics.ccma.cat/pvideo/media.jsp', media_id, query={
+                'media': media_type,
+                'idint': media_id,
+                'profile': profile,
+            }, fatal=False)
+            if md:
+                media_data = md
+                media_url = media_data.get('media', {}).get('url')
+                if media_url:
+                    formats.append({
+                        'format_id': profile,
+                        'url': media_url,
+                        'quality': i,
+                    })
+        self._sort_formats(formats)
+
+        informacio = media_data['informacio']
+        title = informacio['titol']
+        durada = informacio.get('durada', {})
+        duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
+        timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc'))
+
+        subtitles = {}
+        subtitols = media_data.get('subtitols', {})
+        if subtitols:
+            sub_url = subtitols.get('url')
+            if sub_url:
+                subtitles.setdefault(
+                    subtitols.get('iso') or subtitols.get('text') or 'ca', []).append({
+                        'url': sub_url,
+                    })
+
+        thumbnails = []
+        imatges = media_data.get('imatges', {})
+        if imatges:
+            thumbnail_url = imatges.get('url')
+            if thumbnail_url:
+                thumbnails = [{
+                    'url': thumbnail_url,
+                    'width': int_or_none(imatges.get('amplada')),
+                    'height': int_or_none(imatges.get('alcada')),
+                }]
+
+        return {
+            'id': media_id,
+            'title': title,
+            'description': clean_html(informacio.get('descripcio')),
+            'duration': duration,
+            'timestamp': timestamp,
+            'thumbnails': thumbnails,
+            'subtitles': subtitles,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/cctv.py b/youtube_dl/extractor/cctv.py
index 72a72cb73502ad242b150ea897512ba5426207c2..c76f361c684e96e992f5fea9f9ffbcd5a114a6ad 100644
--- a/youtube_dl/extractor/cctv.py
+++ b/youtube_dl/extractor/cctv.py
@@ -4,50 +4,188 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import float_or_none
+from ..compat import compat_str
+from ..utils import (
+    float_or_none,
+    try_get,
+    unified_timestamp,
+)
  
  
  class CCTVIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:.+?\.)?
-        (?:
-            cctv\.(?:com|cn)|
-            cntv\.cn
-        )/
-        (?:
-            video/[^/]+/(?P<id>[0-9a-f]{32})|
-            \d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
-        )'''
+    IE_DESC = '央视网'
+    _VALID_URL = r'https?://(?:(?:[^/]+)\.(?:cntv|cctv)\.(?:com|cn)|(?:www\.)?ncpa-classic\.com)/(?:[^/]+/)*?(?P<id>[^/?#&]+?)(?:/index)?(?:\.s?html|[?#&]|$)'
      _TESTS = [{
-        'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
-        'md5': '819c7b49fc3927d529fb4cd555621823',
+        # fo.addVariable("videoCenterId","id")
+        'url': 'http://sports.cntv.cn/2016/02/12/ARTIaBRxv4rTT1yWf1frW2wi160212.shtml',
+        'md5': 'd61ec00a493e09da810bf406a078f691',
          'info_dict': {
-            'id': '454368eb19ad44a1925bf1eb96140a61',
+            'id': '5ecdbeab623f4973b40ff25f18b174e8',
              'ext': 'mp4',
-            'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1',
-        }
+            'title': '[NBA]二少联手砍下46分 雷霆主场击败鹈鹕（快讯）',
+            'description': 'md5:7e14a5328dc5eb3d1cd6afbbe0574e95',
+            'duration': 98,
+            'uploader': 'songjunjie',
+            'timestamp': 1455279956,
+            'upload_date': '20160212',
+        },
+    }, {
+        # var guid = "id"
+        'url': 'http://tv.cctv.com/2016/02/05/VIDEUS7apq3lKrHG9Dncm03B160205.shtml',
+        'info_dict': {
+            'id': 'efc5d49e5b3b4ab2b34f3a502b73d3ae',
+            'ext': 'mp4',
+            'title': '[赛车]“车王”舒马赫恢复情况成谜（快讯）',
+            'description': '2月4日，蒙特泽莫罗透露了关于“车王”舒马赫恢复情况，但情况是否属实遭到了质疑。',
+            'duration': 37,
+            'uploader': 'shujun',
+            'timestamp': 1454677291,
+            'upload_date': '20160205',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # changePlayer('id')
+        'url': 'http://english.cntv.cn/special/four_comprehensives/index.shtml',
+        'info_dict': {
+            'id': '4bb9bb4db7a6471ba85fdeda5af0381e',
+            'ext': 'mp4',
+            'title': 'NHnews008 ANNUAL POLITICAL SEASON',
+            'description': 'Four Comprehensives',
+            'duration': 60,
+            'uploader': 'zhangyunlei',
+            'timestamp': 1425385521,
+            'upload_date': '20150303',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # loadvideo('id')
+        'url': 'http://cctv.cntv.cn/lm/tvseries_russian/yilugesanghua/index.shtml',
+        'info_dict': {
+            'id': 'b15f009ff45c43968b9af583fc2e04b2',
+            'ext': 'mp4',
+            'title': 'Путь，усыпанный космеями Серия 1',
+            'description': 'Путь, усыпанный космеями',
+            'duration': 2645,
+            'uploader': 'renxue',
+            'timestamp': 1477479241,
+            'upload_date': '20161026',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # var initMyAray = 'id'
+        'url': 'http://www.ncpa-classic.com/2013/05/22/VIDE1369219508996867.shtml',
+        'info_dict': {
+            'id': 'a194cfa7f18c426b823d876668325946',
+            'ext': 'mp4',
+            'title': '小泽征尔音乐塾 音乐梦想无国界',
+            'duration': 2173,
+            'timestamp': 1369248264,
+            'upload_date': '20130522',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # var ids = ["id"]
+        'url': 'http://www.ncpa-classic.com/clt/more/416/index.shtml',
+        'info_dict': {
+            'id': 'a8606119a4884588a79d81c02abecc16',
+            'ext': 'mp3',
+            'title': '来自维也纳的新年贺礼',
+            'description': 'md5:f13764ae8dd484e84dd4b39d5bcba2a7',
+            'duration': 1578,
+            'uploader': 'djy',
+            'timestamp': 1482942419,
+            'upload_date': '20161228',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'expected_warnings': ['Failed to download m3u8 information'],
+    }, {
+        'url': 'http://ent.cntv.cn/2016/01/18/ARTIjprSSJH8DryTVr5Bx8Wb160118.shtml',
+        'only_matching': True,
+    }, {
+        'url': 'http://tv.cntv.cn/video/C39296/e0210d949f113ddfb38d31f00a4e5c44',
+        'only_matching': True,
+    }, {
+        'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
+        'only_matching': True,
      }, {
          'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml',
          'only_matching': True,
      }, {
          'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44',
-        'only_matching': True
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        video_id, display_id = re.match(self._VALID_URL, url).groups()
-        if not video_id:
-            webpage = self._download_webpage(url, display_id)
-            video_id = self._search_regex(
-                r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})',
-                webpage, 'video_id')
-        api_data = self._download_json(
-            'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id)
-        m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url'])
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        video_id = self._search_regex(
+            [r'var\s+guid\s*=\s*["\']([\da-fA-F]+)',
+             r'videoCenterId["\']\s*,\s*["\']([\da-fA-F]+)',
+             r'changePlayer\s*\(\s*["\']([\da-fA-F]+)',
+             r'load[Vv]ideo\s*\(\s*["\']([\da-fA-F]+)',
+             r'var\s+initMyAray\s*=\s*["\']([\da-fA-F]+)',
+             r'var\s+ids\s*=\s*\[["\']([\da-fA-F]+)'],
+            webpage, 'video id')
+
+        data = self._download_json(
+            'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do', video_id,
+            query={
+                'pid': video_id,
+                'url': url,
+                'idl': 32,
+                'idlr': 32,
+                'modifyed': 'false',
+            })
+
+        title = data['title']
+
+        formats = []
+
+        video = data.get('video')
+        if isinstance(video, dict):
+            for quality, chapters_key in enumerate(('lowChapters', 'chapters')):
+                video_url = try_get(
+                    video, lambda x: x[chapters_key][0]['url'], compat_str)
+                if video_url:
+                    formats.append({
+                        'url': video_url,
+                        'format_id': 'http',
+                        'quality': quality,
+                        'preference': -1,
+                    })
+
+        hls_url = try_get(data, lambda x: x['hls_url'], compat_str)
+        if hls_url:
+            hls_url = re.sub(r'maxbr=\d+&?', '', hls_url)
+            formats.extend(self._extract_m3u8_formats(
+                hls_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        self._sort_formats(formats)
+
+        uploader = data.get('editer_name')
+        description = self._html_search_meta(
+            'description', webpage, default=None)
+        timestamp = unified_timestamp(data.get('f_pgmtime'))
+        duration = float_or_none(try_get(video, lambda x: x['totalLength']))
  
          return {
              'id': video_id,
-            'title': api_data['title'],
-            'formats': self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False),
-            'duration': float_or_none(api_data.get('video', {}).get('totalLength')),
+            'title': title,
+            'description': description,
+            'uploader': uploader,
+            'timestamp': timestamp,
+            'duration': duration,
+            'formats': formats,
          }
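
Note on the CCTVIE rewrite above: the new code passes a list of patterns to
_search_regex. A simplified standalone sketch of that "try each pattern in
order" idea follows. The helper first_match is hypothetical and is not the
youtube-dl implementation, and only three of the six patterns are reproduced.

```python
import re

ID_PATTERNS = [
    r'var\s+guid\s*=\s*["\']([\da-fA-F]+)',
    r'videoCenterId["\']\s*,\s*["\']([\da-fA-F]+)',
    r'changePlayer\s*\(\s*["\']([\da-fA-F]+)',
]


def first_match(patterns, text):
    """Return group 1 of the first pattern that matches, else None."""
    for pattern in patterns:
        m = re.search(pattern, text)
        if m:
            return m.group(1)
    return None


page = 'fo.addVariable("videoCenterId","5ecdbeab623f4973b40ff25f18b174e8");'
video_id = first_match(ID_PATTERNS, page)
```

Pattern order matters here: an earlier pattern wins even when a later one
would also match somewhere on the page.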
diff --git a/youtube_dl/extractor/cda.py b/youtube_dl/extractor/cda.py
index e00bdaf66a6d9eb6ac051cc169cabbf02844770b..ae7af2f0e3c432dd5c9f75401d787f0b2b27d083 100755
--- a/youtube_dl/extractor/cda.py
+++ b/youtube_dl/extractor/cda.py
@@ -24,7 +24,7 @@ class CDAIE(InfoExtractor):
              'height': 720,
              'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
              'description': 'md5:269ccd135d550da90d1662651fcb9772',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'average_rating': float,
              'duration': 39
          }
@@ -36,7 +36,7 @@ class CDAIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Lądowanie na lotnisku na Maderze',
              'description': 'md5:60d76b71186dcce4e0ba6d4bbdb13e1a',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'crash404',
              'view_count': int,
              'average_rating': float,
diff --git a/youtube_dl/extractor/ceskatelevize.py b/youtube_dl/extractor/ceskatelevize.py
index 4ec79d19dd9db6402752ee65d462631985009cbf..4f88c31ad2af53fe07df449e384137689f65c17d 100644
--- a/youtube_dl/extractor/ceskatelevize.py
+++ b/youtube_dl/extractor/ceskatelevize.py
@@ -25,7 +25,7 @@ class CeskaTelevizeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Hyde Park Civilizace',
              'description': 'md5:fe93f6eda372d150759d11644ebbfb4a',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 3350,
          },
          'params': {
@@ -39,7 +39,7 @@ class CeskaTelevizeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Hyde Park Civilizace: Bonus 01 - En',
              'description': 'English Subtittles',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 81.3,
          },
          'params': {
@@ -52,7 +52,7 @@ class CeskaTelevizeIE(InfoExtractor):
          'info_dict': {
              'id': 402,
              'ext': 'mp4',
-            'title': 're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
              'is_live': True,
          },
          'params': {
@@ -80,7 +80,7 @@ class CeskaTelevizeIE(InfoExtractor):
                  'id': '61924494877068022',
                  'ext': 'mp4',
                  'title': 'Queer: Bogotart (Queer)',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 1558.3,
              },
          }],
diff --git a/youtube_dl/extractor/channel9.py b/youtube_dl/extractor/channel9.py
index 34d4e61569b110b49998768f13bb81cdda75bd75..865dbcaba5016eb957c41bac43966e2a75044f0b 100644
--- a/youtube_dl/extractor/channel9.py
+++ b/youtube_dl/extractor/channel9.py
@@ -31,7 +31,7 @@ class Channel9IE(InfoExtractor):
              'title': 'Developer Kick-Off Session: Stuff We Love',
              'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
              'duration': 4576,
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'session_code': 'KOS002',
              'session_day': 'Day 1',
              'session_room': 'Arena 1A',
@@ -47,7 +47,7 @@ class Channel9IE(InfoExtractor):
              'title': 'Self-service BI with Power BI - nuclear testing',
              'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
              'duration': 1540,
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'authors': ['Mike Wilmot'],
          },
      }, {
@@ -59,7 +59,7 @@ class Channel9IE(InfoExtractor):
              'title': 'Ranges for the Standard Library',
              'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
              'duration': 5646,
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
          'params': {
              'skip_download': True,
diff --git a/youtube_dl/extractor/charlierose.py b/youtube_dl/extractor/charlierose.py
index 4bf2cf7b0ce7648ec2786952220b452c20201a52..2d517f23194b023a7fcd0fd917a63ac08a6fa80c 100644
--- a/youtube_dl/extractor/charlierose.py
+++ b/youtube_dl/extractor/charlierose.py
@@ -13,7 +13,7 @@ class CharlieRoseIE(InfoExtractor):
              'id': '27996',
              'ext': 'mp4',
              'title': 'Remembering Zaha Hadid',
-            'thumbnail': 're:^https?://.*\.jpg\?\d+',
+            'thumbnail': r're:^https?://.*\.jpg\?\d+',
              'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.',
              'subtitles': {
                  'en': [{
diff --git a/youtube_dl/extractor/chaturbate.py b/youtube_dl/extractor/chaturbate.py
index 29a8820d5835b1b3cf7aca3840705a2fb2f2e1e3..8fbc91c1fbae17f8c46adfe1cb947ff64f5d11b4 100644
--- a/youtube_dl/extractor/chaturbate.py
+++ b/youtube_dl/extractor/chaturbate.py
@@ -1,5 +1,7 @@
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..utils import ExtractorError
  
@@ -31,30 +33,35 @@ class ChaturbateIE(InfoExtractor):
  
          webpage = self._download_webpage(url, video_id)
  
-        m3u8_url = self._search_regex(
-            r'src=(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage,
-            'playlist', default=None, group='url')
+        m3u8_formats = [(m.group('id').lower(), m.group('url')) for m in re.finditer(
+            r'hlsSource(?P<id>.+?)\s*=\s*(?P<q>["\'])(?P<url>http.+?)(?P=q)', webpage)]
  
-        if not m3u8_url:
+        if not m3u8_formats:
              error = self._search_regex(
                  [r'<span[^>]+class=(["\'])desc_span\1[^>]*>(?P<error>[^<]+)</span>',
                   r'<div[^>]+id=(["\'])defchat\1[^>]*>\s*<p><strong>(?P<error>[^<]+)<'],
                  webpage, 'error', group='error', default=None)
              if not error:
-                if any(p not in webpage for p in (
+                if any(p in webpage for p in (
                          self._ROOM_OFFLINE, 'offline_tipping', 'tip_offline')):
                      error = self._ROOM_OFFLINE
              if error:
                  raise ExtractorError(error, expected=True)
              raise ExtractorError('Unable to find stream URL')
  
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+        formats = []
+        for m3u8_id, m3u8_url in m3u8_formats:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, ext='mp4',
+                # ffmpeg skips segments for fast m3u8
+                preference=-10 if m3u8_id == 'fast' else None,
+                m3u8_id=m3u8_id, fatal=False, live=True))
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': self._live_title(video_id),
-            'thumbnail': 'https://cdn-s.highwebmedia.com/uHK3McUtGCG3SMFcd4ZJsRv8/roomimage/%s.jpg' % video_id,
+            'thumbnail': 'https://roomimg.stream.highwebmedia.com/ri/%s.jpg' % video_id,
              'age_limit': self._rta_search(webpage),
              'is_live': True,
              'formats': formats,
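The format discovery in the chaturbate.py hunk above can be tried in isolation. This is only a minimal sketch: the regular expression is copied from the diff, but the helper name `find_hls_sources` and the sample page markup are invented for illustration and may not match what the site actually serves.

```python
import re

# Regular expression copied from the diff above.
HLS_SOURCE_RE = r'hlsSource(?P<id>.+?)\s*=\s*(?P<q>["\'])(?P<url>http.+?)(?P=q)'


def find_hls_sources(webpage):
    """Return (format id, playlist URL) pairs found in the page (hypothetical helper)."""
    return [
        (m.group('id').lower(), m.group('url'))
        for m in re.finditer(HLS_SOURCE_RE, webpage)]


# Invented sample markup; the real page layout is an assumption.
SAMPLE_PAGE = '''
var hlsSourceFast = 'https://example.com/live/fast/playlist.m3u8';
var hlsSourceSlow = "https://example.com/live/slow/playlist.m3u8";
'''

sources = find_hls_sources(SAMPLE_PAGE)

# Mirror the diff: the "fast" variant gets a low preference, others the default.
preferences = {fid: (-10 if fid == 'fast' else None) for fid, _ in sources}
```

Because the quote character is captured in the `q` group and reused via `(?P=q)`, both single-quoted and double-quoted assignments are picked up by the same pattern.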
diff --git a/youtube_dl/extractor/chirbit.py b/youtube_dl/extractor/chirbit.py
index f35df143a604695c0b1fe7b0e33d7384192d1d98..4815b34be7832144075793217de77ba44b7c9471 100644
--- a/youtube_dl/extractor/chirbit.py
+++ b/youtube_dl/extractor/chirbit.py
@@ -19,6 +19,7 @@ class ChirbitIE(InfoExtractor):
              'title': 'md5:f542ea253f5255240be4da375c6a5d7e',
              'description': 'md5:f24a4e22a71763e32da5fed59e47c770',
              'duration': 306,
+            'uploader': 'Gerryaudio',
          },
          'params': {
              'skip_download': True,
@@ -54,6 +55,9 @@ class ChirbitIE(InfoExtractor):
          duration = parse_duration(self._search_regex(
              r'class=["\']c-length["\'][^>]*>([^<]+)',
              webpage, 'duration', fatal=False))
+        uploader = self._search_regex(
+            r'id=["\']chirbit-username["\'][^>]*>([^<]+)',
+            webpage, 'uploader', fatal=False)
  
          return {
              'id': audio_id,
@@ -61,6 +65,7 @@ class ChirbitIE(InfoExtractor):
              'title': title,
              'description': description,
              'duration': duration,
+            'uploader': uploader,
          }
  
  
diff --git a/youtube_dl/extractor/cliphunter.py b/youtube_dl/extractor/cliphunter.py
index 252c2e846969c96d733911f2f471054286ec0777..ab651d1c8632fe08c29d44334cb4ba4a6ea2fddf 100644
--- a/youtube_dl/extractor/cliphunter.py
+++ b/youtube_dl/extractor/cliphunter.py
@@ -30,7 +30,7 @@ class CliphunterIE(InfoExtractor):
              'id': '1012420',
              'ext': 'flv',
              'title': 'Fun Jynx Maze solo',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'age_limit': 18,
          },
          'skip': 'Video gone',
@@ -41,7 +41,7 @@ class CliphunterIE(InfoExtractor):
              'id': '2019449',
              'ext': 'mp4',
              'title': 'ShesNew - My booty girlfriend, Victoria Paradice\'s pussy filled with jizz',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'age_limit': 18,
          },
      }]
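The `r` prefix added to the thumbnail patterns does not change the resulting string: `\.` is not a recognized escape sequence, so Python keeps the backslash either way. The raw literal simply avoids relying on that fallback, for which Python 3.6 and later emit a DeprecationWarning. A small check, assuming the test harness strips the leading `re:` and matches the remainder as a regular expression (the URLs are invented):

```python
import re

pattern = r're:^https?://.*\.jpg$'

# The raw literal keeps the backslash, so the regex engine sees an escaped dot.
regex = pattern[len('re:'):]
assert regex == '^https?://.*\\.jpg$'

# An escaped dot only matches a literal '.', not an arbitrary character.
matches = re.match(regex, 'http://example.com/thumb.jpg') is not None
rejects = re.match(regex, 'http://example.com/thumbXjpg') is None
```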
diff --git a/youtube_dl/extractor/clipsyndicate.py b/youtube_dl/extractor/clipsyndicate.py
index 0b6ad895fd7841e70b7dc0dd136052ff0459dd3c..6cdb42f5a4ae56ab9fd1202716c62cadc8bc456d 100644
--- a/youtube_dl/extractor/clipsyndicate.py
+++ b/youtube_dl/extractor/clipsyndicate.py
@@ -18,7 +18,7 @@ class ClipsyndicateIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Brick Briscoe',
              'duration': 612,
-            'thumbnail': 're:^https?://.+\.jpg',
+            'thumbnail': r're:^https?://.+\.jpg',
          },
      }, {
          'url': 'http://chic.clipsyndicate.com/video/play/5844117/shark_attack',
diff --git a/youtube_dl/extractor/clubic.py b/youtube_dl/extractor/clubic.py
index f7ee3a8f8ebe4715b2d2a5f4634bc50836cc33f7..98f9cb596955621d458461c42a6e7adb7638b9db 100644
--- a/youtube_dl/extractor/clubic.py
+++ b/youtube_dl/extractor/clubic.py
@@ -19,7 +19,7 @@ class ClubicIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Clubic Week 2.0 : le FBI se lance dans la photo d\u0092identité',
              'description': 're:Gueule de bois chez Nokia. Le constructeur a indiqué cette.*',
-            'thumbnail': 're:^http://img\.clubic\.com/.*\.jpg$',
+            'thumbnail': r're:^http://img\.clubic\.com/.*\.jpg$',
          }
      }, {
          'url': 'http://www.clubic.com/video/video-clubic-week-2-0-apple-iphone-6s-et-plus-mais-surtout-le-pencil-469792.html',
diff --git a/youtube_dl/extractor/cmt.py b/youtube_dl/extractor/cmt.py
index 7d3e9b0c9ce89fff9b8094f2d86beaa5fb35e7e0..e701fbeab8231cb90ec54a1c439e6bae42ec703d 100644
--- a/youtube_dl/extractor/cmt.py
+++ b/youtube_dl/extractor/cmt.py
@@ -1,13 +1,11 @@
  from __future__ import unicode_literals
  
  from .mtv import MTVIE
-from ..utils import ExtractorError
  
  
  class CMTIE(MTVIE):
      IE_NAME = 'cmt.com'
-    _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
-    _FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
+    _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows|(?:full-)?episodes|video-clips)/(?P<id>[^/]+)'
  
      _TESTS = [{
          'url': 'http://www.cmt.com/videos/garth-brooks/989124/the-call-featuring-trisha-yearwood.jhtml#artist=30061',
@@ -33,17 +31,24 @@ class CMTIE(MTVIE):
      }, {
          'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
          'only_matching': True,
+    }, {
+        'url': 'http://www.cmt.com/full-episodes/537qb3/nashville-the-wayfaring-stranger-season-5-ep-501',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.cmt.com/video-clips/t9e4ci/nashville-juliette-in-2-minutes',
+        'only_matching': True,
      }]
  
-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        if 'error_not_available.swf' in rtmp_video_url:
-            raise ExtractorError(
-                '%s said: video is not available' % cls.IE_NAME, expected=True)
-
-        return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)
-
      def _extract_mgid(self, webpage):
-        return self._search_regex(
+        mgid = self._search_regex(
              r'MTVN\.VIDEO\.contentUri\s*=\s*([\'"])(?P<mgid>.+?)\1',
-            webpage, 'mgid', group='mgid')
+            webpage, 'mgid', group='mgid', default=None)
+        if not mgid:
+            mgid = self._extract_triforce_mgid(webpage)
+        return mgid
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        mgid = self._extract_mgid(webpage)
+        return self.url_result('http://media.mtvnservices.com/embed/%s' % mgid)
diff --git a/youtube_dl/extractor/collegerama.py b/youtube_dl/extractor/collegerama.py
index f9e84193d95a8ebd2e49331a34c91b04ad95c649..18c7347668a55ccfdce4aed30a8052829cbf24a9 100644
--- a/youtube_dl/extractor/collegerama.py
+++ b/youtube_dl/extractor/collegerama.py
@@ -21,7 +21,7 @@ class CollegeRamaIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Een nieuwe wereld: waarden, bewustzijn en techniek van de mensheid 2.0.',
                  'description': '',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 7713.088,
                  'timestamp': 1413309600,
                  'upload_date': '20141014',
diff --git a/youtube_dl/extractor/comedycentral.py b/youtube_dl/extractor/comedycentral.py
index 0239dfd84d776d45e8457d0357fceacb5d1d7467..4cac294153f166b676f1bdcdb2d47ee9e5fdf693 100644
--- a/youtube_dl/extractor/comedycentral.py
+++ b/youtube_dl/extractor/comedycentral.py
@@ -48,15 +48,7 @@ class ComedyCentralFullEpisodesIE(MTVServicesInfoExtractor):
      def _real_extract(self, url):
          playlist_id = self._match_id(url)
          webpage = self._download_webpage(url, playlist_id)
-
-        feed_json = self._search_regex(r'var triforceManifestFeed\s*=\s*(\{.+?\});\n', webpage, 'triforce feeed')
-        feed = self._parse_json(feed_json, playlist_id)
-        zones = feed['manifest']['zones']
-
-        video_zone = zones['t2_lc_promo1']
-        feed = self._download_json(video_zone['feed'], playlist_id)
-        mgid = feed['result']['data']['id']
-
+        mgid = self._extract_triforce_mgid(webpage, data_zone='t2_lc_promo1')
          videos_info = self._get_videos_info(mgid)
          return videos_info
  
@@ -79,7 +71,7 @@ class ToshIE(MTVServicesInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
                  'description': 'Tosh asked fans to share their summer plans.',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  # It's really reported to be published on year 2077
                  'upload_date': '20770610',
                  'timestamp': 3390510600,
@@ -93,12 +85,6 @@ class ToshIE(MTVServicesInfoExtractor):
          'only_matching': True,
      }]
  
-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        new_urls = super(ToshIE, cls)._transform_rtmp_url(rtmp_video_url)
-        new_urls['rtmp'] = rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm')
-        return new_urls
-
  
  class ComedyCentralTVIE(MTVServicesInfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'
diff --git a/youtube_dl/extractor/common.py b/youtube_dl/extractor/common.py
index 05c51fac9b0b4162fb126cb79a79d871b591ead8..0b4e2ac207b049d934c6b65fdf9317f856466b0f 100644
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -59,6 +59,7 @@ from ..utils import (
      parse_m3u8_attributes,
      extract_attributes,
      parse_codecs,
+    urljoin,
  )
  
  
@@ -120,9 +121,19 @@ class InfoExtractor(object):
                                   download, lower-case.
                                   "http", "https", "rtsp", "rtmp", "rtmpe",
                                   "m3u8", "m3u8_native" or "http_dash_segments".
-                    * fragments  A list of fragments of the fragmented media,
-                                 with the following entries:
-                                 * "url" (mandatory) - fragment's URL
+                    * fragment_base_url
+                                 Base URL for fragments. Each fragment's path
+                                 value (if present) will be relative to
+                                 this URL.
+                    * fragments  A list of fragments of a fragmented media.
+                                 Each fragment entry must contain either a URL
+                                 or a path. If a URL is present it should be
+                                 considered by a client. Otherwise both path and
+                                 fragment_base_url must be present. Here is
+                                 the list of all potential fields:
+                                 * "url" - fragment's URL
+                                 * "path" - fragment's path relative to
+                                            fragment_base_url
                                   * "duration" (optional, int or float)
                                   * "filesize" (optional, int)
                      * preference Order number of this format. If this field is
@@ -188,9 +199,10 @@ class InfoExtractor(object):
      uploader_url:   Full URL to a personal webpage of the video uploader.
      location:       Physical location where the video was filmed.
      subtitles:      The available subtitles as a dictionary in the format
-                    {language: subformats}. "subformats" is a list sorted from
-                    lower to higher preference, each element is a dictionary
-                    with the "ext" entry and one of:
+                    {tag: subformats}. "tag" is usually a language code, and
+                    "subformats" is a list sorted from lower to higher
+                    preference, each element is a dictionary with the "ext"
+                    entry and one of:
                          * "data": The subtitles file contents
                          * "url": A URL pointing to the subtitles file
                      "ext" will be calculated from URL if missing
@@ -1013,13 +1025,13 @@ class InfoExtractor(object):
                  unique_formats.append(f)
          formats[:] = unique_formats
  
-    def _is_valid_url(self, url, video_id, item='video'):
+    def _is_valid_url(self, url, video_id, item='video', headers={}):
          url = self._proto_relative_url(url, scheme='http:')
          # For now assume non HTTP(S) URLs always valid
          if not (url.startswith('http://') or url.startswith('https://')):
              return True
          try:
-            self._request_webpage(url, video_id, 'Checking %s URL' % item)
+            self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
              return True
          except ExtractorError as e:
              if isinstance(e.cause, compat_urllib_error.URLError):
@@ -1224,6 +1236,7 @@ class InfoExtractor(object):
                  'protocol': entry_protocol,
                  'preference': preference,
              }]
+        audio_in_video_stream = {}
          last_info = {}
          last_media = {}
          for line in m3u8_doc.splitlines():
@@ -1233,25 +1246,32 @@ class InfoExtractor(object):
                  media = parse_m3u8_attributes(line)
                  media_type = media.get('TYPE')
                  if media_type in ('VIDEO', 'AUDIO'):
+                    group_id = media.get('GROUP-ID')
                      media_url = media.get('URI')
                      if media_url:
                          format_id = []
-                        for v in (media.get('GROUP-ID'), media.get('NAME')):
+                        for v in (group_id, media.get('NAME')):
                              if v:
                                  format_id.append(v)
-                        formats.append({
+                        f = {
                              'format_id': '-'.join(format_id),
                              'url': format_url(media_url),
                              'language': media.get('LANGUAGE'),
-                            'vcodec': 'none' if media_type == 'AUDIO' else None,
                              'ext': ext,
                              'protocol': entry_protocol,
                              'preference': preference,
-                        })
+                        }
+                        if media_type == 'AUDIO':
+                            f['vcodec'] = 'none'
+                            if group_id and not audio_in_video_stream.get(group_id):
+                                audio_in_video_stream[group_id] = False
+                        formats.append(f)
                      else:
                          # When there is no URI in EXT-X-MEDIA let this tag's
                          # data be used by regular URI lines below
                          last_media = media
+                        if media_type == 'AUDIO' and group_id:
+                            audio_in_video_stream[group_id] = True
              elif line.startswith('#') or not line.strip():
                  continue
              else:
@@ -1295,6 +1315,9 @@ class InfoExtractor(object):
                          'abr': abr,
                      })
                  f.update(parse_codecs(last_info.get('CODECS')))
+                if audio_in_video_stream.get(last_info.get('AUDIO')) is False and f['vcodec'] != 'none':
+                    # TODO: update acodec for audio only formats with the same GROUP-ID
+                    f['acodec'] = 'none'
                  formats.append(f)
                  last_info = {}
                  last_media = {}
@@ -1614,21 +1637,16 @@ class InfoExtractor(object):
                  segment_template = element.find(_add_ns('SegmentTemplate'))
                  if segment_template is not None:
                      extract_common(segment_template)
-                    media_template = segment_template.get('media')
-                    if media_template:
-                        ms_info['media_template'] = media_template
+                    media = segment_template.get('media')
+                    if media:
+                        ms_info['media'] = media
                      initialization = segment_template.get('initialization')
                      if initialization:
-                        ms_info['initialization_url'] = initialization
+                        ms_info['initialization'] = initialization
                      else:
                          extract_Initialization(segment_template)
              return ms_info
  
-        def combine_url(base_url, target_url):
-            if re.match(r'^https?://', target_url):
-                return target_url
-            return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
-
          mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
          formats = []
          for period in mpd_doc.findall(_add_ns('Period')):
@@ -1668,6 +1686,7 @@ class InfoExtractor(object):
                          lang = representation_attrib.get('lang')
                          url_el = representation.find(_add_ns('BaseURL'))
                          filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
+                        bandwidth = int_or_none(representation_attrib.get('bandwidth'))
                          f = {
                              'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
                              'url': base_url,
@@ -1675,23 +1694,41 @@ class InfoExtractor(object):
                              'ext': mimetype2ext(mime_type),
                              'width': int_or_none(representation_attrib.get('width')),
                              'height': int_or_none(representation_attrib.get('height')),
-                            'tbr': int_or_none(representation_attrib.get('bandwidth'), 1000),
+                            'tbr': int_or_none(bandwidth, 1000),
                              'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
                              'fps': int_or_none(representation_attrib.get('frameRate')),
-                            'vcodec': 'none' if content_type == 'audio' else representation_attrib.get('codecs'),
-                            'acodec': 'none' if content_type == 'video' else representation_attrib.get('codecs'),
                              'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
                              'format_note': 'DASH %s' % content_type,
                              'filesize': filesize,
                          }
+                        f.update(parse_codecs(representation_attrib.get('codecs')))
                          representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
-                        if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
  
-                            media_template = representation_ms_info['media_template']
-                            media_template = media_template.replace('$RepresentationID$', representation_id)
-                            media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
-                            media_template = re.sub(r'\$(Number|Bandwidth|Time)%([^$]+)\$', r'%(\1)\2', media_template)
-                            media_template.replace('$$', '$')
+                        def prepare_template(template_name, identifiers):
+                            t = representation_ms_info[template_name]
+                            t = t.replace('$RepresentationID$', representation_id)
+                            t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
+                            t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
+                            t = t.replace('$$', '$')
+                            return t
+
+                        # @initialization is a regular template like @media one
+                        # so it should be handled just the same way (see
+                        # https://github.com/rg3/youtube-dl/issues/11605)
+                        if 'initialization' in representation_ms_info:
+                            initialization_template = prepare_template(
+                                'initialization',
+                                # As per [1, 5.3.9.4.2, Table 15, page 54] $Number$ and
+                                # $Time$ shall not be included for @initialization thus
+                                # only $Bandwidth$ remains
+                                ('Bandwidth', ))
+                            representation_ms_info['initialization_url'] = initialization_template % {
+                                'Bandwidth': bandwidth,
+                            }
+
+                        if 'segment_urls' not in representation_ms_info and 'media' in representation_ms_info:
+
+                            media_template = prepare_template('media', ('Number', 'Bandwidth', 'Time'))
  
                              # As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
                              # can't be used at the same time
@@ -1703,7 +1740,7 @@ class InfoExtractor(object):
                                  representation_ms_info['fragments'] = [{
                                      'url': media_template % {
                                          'Number': segment_number,
-                                        'Bandwidth': int_or_none(representation_attrib.get('bandwidth')),
+                                        'Bandwidth': bandwidth,
                                      },
                                      'duration': segment_duration,
                                  } for segment_number in range(
@@ -1721,7 +1758,7 @@ class InfoExtractor(object):
                                  def add_segment_url():
                                      segment_url = media_template % {
                                          'Time': segment_time,
-                                        'Bandwidth': int_or_none(representation_attrib.get('bandwidth')),
+                                        'Bandwidth': bandwidth,
                                          'Number': segment_number,
                                      }
                                      representation_ms_info['fragments'].append({
@@ -1744,14 +1781,16 @@ class InfoExtractor(object):
                              # Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
                              # or any YouTube dashsegments video
                              fragments = []
-                            s_num = 0
-                            for segment_url in representation_ms_info['segment_urls']:
-                                s = representation_ms_info['s'][s_num]
+                            segment_index = 0
+                            timescale = representation_ms_info['timescale']
+                            for s in representation_ms_info['s']:
+                                duration = float_or_none(s['d'], timescale)
                                  for r in range(s.get('r', 0) + 1):
                                      fragments.append({
-                                        'url': segment_url,
-                                        'duration': float_or_none(s['d'], representation_ms_info['timescale']),
+                                        'url': representation_ms_info['segment_urls'][segment_index],
+                                        'duration': duration,
                                      })
+                                    segment_index += 1
                              representation_ms_info['fragments'] = fragments
                          # NB: MPD manifest may contain direct URLs to unfragmented media.
                          # No fragments key is present in this case.
@@ -1761,13 +1800,13 @@ class InfoExtractor(object):
                                  'protocol': 'http_dash_segments',
                              })
                              if 'initialization_url' in representation_ms_info:
-                                initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
+                                initialization_url = representation_ms_info['initialization_url']
                                  if not f.get('url'):
                                      f['url'] = initialization_url
                                  f['fragments'].append({'url': initialization_url})
                              f['fragments'].extend(representation_ms_info['fragments'])
                              for fragment in f['fragments']:
-                                fragment['url'] = combine_url(base_url, fragment['url'])
+                                fragment['url'] = urljoin(base_url, fragment['url'])
                          try:
                              existing_format = next(
                                  fo for fo in formats
@@ -1881,7 +1920,7 @@ class InfoExtractor(object):
                  })
          return formats
  
-    def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
+    def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None):
          def absolute_url(video_url):
              return compat_urlparse.urljoin(base_url, video_url)
  
@@ -1898,11 +1937,16 @@ class InfoExtractor(object):
  
          def _media_formats(src, cur_media_type):
              full_url = absolute_url(src)
-            if determine_ext(full_url) == 'm3u8':
+            ext = determine_ext(full_url)
+            if ext == 'm3u8':
                  is_plain_url = False
                  formats = self._extract_m3u8_formats(
                      full_url, video_id, ext='mp4',
                      entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
+            elif ext == 'mpd':
+                is_plain_url = False
+                formats = self._extract_mpd_formats(
+                    full_url, video_id, mpd_id=mpd_id)
              else:
                  is_plain_url = True
                  formats = [{
@@ -1915,7 +1959,12 @@ class InfoExtractor(object):
          media_tags = [(media_tag, media_type, '')
                        for media_tag, media_type
                        in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
-        media_tags.extend(re.findall(r'(?s)(<(?P<tag>video|audio)[^>]*>)(.*?)</(?P=tag)>', webpage))
+        media_tags.extend(re.findall(
+            # We only allow video|audio followed by a whitespace or '>'.
+            # Allowing more characters may end up in significant slow down (see
+            # https://github.com/rg3/youtube-dl/issues/11979, example URL:
+            # http://www.porntrex.com/maps/videositemap.xml).
+            r'(?s)(<(?P<tag>video|audio)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
          for media_tag, media_type, media_content in media_tags:
              media_info = {
                  'formats': [],
@@ -1955,10 +2004,13 @@ class InfoExtractor(object):
                  entries.append(media_info)
          return entries
  
-    def _extract_akamai_formats(self, manifest_url, video_id):
+    def _extract_akamai_formats(self, manifest_url, video_id, hosts={}):
          formats = []
          hdcore_sign = 'hdcore=3.7.0'
-        f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
+        f4m_url = re.sub(r'(https?://[^/]+)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
+        hds_host = hosts.get('hds')
+        if hds_host:
+            f4m_url = re.sub(r'(https?://)[^/]+', r'\1' + hds_host, f4m_url)
          if 'hdcore=' not in f4m_url:
              f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
          f4m_formats = self._extract_f4m_formats(
@@ -1966,7 +2018,10 @@ class InfoExtractor(object):
          for entry in f4m_formats:
              entry.update({'extra_param_to_segment_url': hdcore_sign})
          formats.extend(f4m_formats)
-        m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
+        m3u8_url = re.sub(r'(https?://[^/]+)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
+        hls_host = hosts.get('hls')
+        if hls_host:
+            m3u8_url = re.sub(r'(https?://)[^/]+', r'\1' + hls_host, m3u8_url)
          formats.extend(self._extract_m3u8_formats(
              m3u8_url, video_id, 'mp4', 'm3u8_native',
              m3u8_id='hls', fatal=False))
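The DASH template handling in the common.py hunks above can be illustrated on its own. The sketch below reimplements the idea of `prepare_template` as a free function; the extra `representation_id` parameter, the sample template string, and the sample values are assumptions made for illustration and are not taken from any real manifest.

```python
import re


def prepare_template(template, representation_id, identifiers):
    """Turn a DASH $...$ template into a Python %-format string (sketch)."""
    t = template.replace('$RepresentationID$', representation_id)
    ids = '|'.join(identifiers)
    # Plain identifier: $Number$ -> %(Number)d
    t = re.sub(r'\$(%s)\$' % ids, r'%(\1)d', t)
    # Identifier with a width tag: $Number%05d$ -> %(Number)05d
    t = re.sub(r'\$(%s)%%([^$]+)\$' % ids, r'%(\1)\2', t)
    # Unescape $$ to a literal $.
    t = t.replace('$$', '$')
    return t


# Hypothetical media template and representation ID.
media = prepare_template(
    'video_$RepresentationID$/seg_$Number%05d$_$Bandwidth$.m4s',
    'v1', ('Number', 'Bandwidth', 'Time'))

# Fill in one segment, as the fragment list comprehension in the diff does.
url = media % {'Number': 7, 'Bandwidth': 800000}
```

Passing only `('Bandwidth', )` as identifiers, as the diff does for `@initialization`, leaves any `$Number$` or `$Time$` placeholder untouched rather than formatting it.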
diff --git a/youtube_dl/extractor/coub.py b/youtube_dl/extractor/coub.py
index a901b8d2223fb7606538d8dcd98e19905ff3889c..5fa1f006b82675d299d1cef30fbe2108496256d5 100644
--- a/youtube_dl/extractor/coub.py
+++ b/youtube_dl/extractor/coub.py
@@ -20,7 +20,7 @@ class CoubIE(InfoExtractor):
              'id': '5u5n1',
              'ext': 'mp4',
              'title': 'The Matrix Moonwalk',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 4.6,
              'timestamp': 1428527772,
              'upload_date': '20150408',
diff --git a/youtube_dl/extractor/crackle.py b/youtube_dl/extractor/crackle.py
index cc68f1c0082674eaf850c2a0c1e3d6ae0f670d74..377fb45e9d2bcd70c1a6aa6d835331636708c215 100644
--- a/youtube_dl/extractor/crackle.py
+++ b/youtube_dl/extractor/crackle.py
@@ -6,7 +6,7 @@ from ..utils import int_or_none
  
  
  class CrackleIE(InfoExtractor):
-    _VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
+    _VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
      _TEST = {
          'url': 'http://www.crackle.com/comedians-in-cars-getting-coffee/2498934',
          'info_dict': {
@@ -14,7 +14,7 @@ class CrackleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Everybody Respects A Bloody Nose',
              'description': 'Jerry is kaffeeklatsching in L.A. with funnyman J.B. Smoove (Saturday Night Live, Real Husbands of Hollywood). They’re headed for brew at 10 Speed Coffee in a 1964 Studebaker Avanti.',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 906,
              'series': 'Comedians In Cars Getting Coffee',
              'season_number': 8,
@@ -31,8 +31,32 @@ class CrackleIE(InfoExtractor):
          }
      }
  
+    _THUMBNAIL_RES = [
+        (120, 90),
+        (208, 156),
+        (220, 124),
+        (220, 220),
+        (240, 180),
+        (250, 141),
+        (315, 236),
+        (320, 180),
+        (360, 203),
+        (400, 300),
+        (421, 316),
+        (460, 330),
+        (460, 460),
+        (462, 260),
+        (480, 270),
+        (587, 330),
+        (640, 480),
+        (700, 330),
+        (700, 394),
+        (854, 480),
+        (1024, 1024),
+        (1920, 1080),
+    ]
+
      # extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
-    _THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
      _MEDIA_FILE_SLOTS = {
          'c544.flv': {
              'width': 544,
@@ -61,17 +85,25 @@ class CrackleIE(InfoExtractor):
  
          item = self._download_xml(
              'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
-            video_id).find('i')
+            video_id, headers=self.geo_verification_headers()).find('i')
          title = item.attrib['t']
  
          subtitles = {}
          formats = self._extract_m3u8_formats(
              'http://content.uplynk.com/ext/%s/%s.m3u8' % (config_doc.attrib['strUplynkOwnerId'], video_id),
              video_id, 'mp4', m3u8_id='hls', fatal=None)
-        thumbnail = None
+        thumbnails = []
          path = item.attrib.get('p')
          if path:
-            thumbnail = self._THUMBNAIL_TEMPLATE % path
+            for width, height in self._THUMBNAIL_RES:
+                res = '%dx%d' % (width, height)
+                thumbnails.append({
+                    'id': res,
+                    'url': 'http://images-us-am.crackle.com/%stnl_%s.jpg' % (path, res),
+                    'width': width,
+                    'height': height,
+                    'resolution': res,
+                })
              http_base_url = 'http://ahttp.crackle.com/' + path
              for mfs_path, mfs_info in self._MEDIA_FILE_SLOTS.items():
                  formats.append({
@@ -86,10 +118,11 @@ class CrackleIE(InfoExtractor):
                  if locale and v:
                      if locale not in subtitles:
                          subtitles[locale] = []
-                    subtitles[locale] = [{
-                        'url': '%s/%s%s_%s.xml' % (config_doc.attrib['strSubtitleServer'], path, locale, v),
-                        'ext': 'ttml',
-                    }]
+                    for url_ext, ext in (('vtt', 'vtt'), ('xml', 'tt')):
+                        subtitles.setdefault(locale, []).append({
+                            'url': '%s/%s%s_%s.%s' % (config_doc.attrib['strSubtitleServer'], path, locale, v, url_ext),
+                            'ext': ext,
+                        })
          self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
  
          return {
@@ -100,7 +133,7 @@ class CrackleIE(InfoExtractor):
              'series': item.attrib.get('sn'),
              'season_number': int_or_none(item.attrib.get('se')),
              'episode_number': int_or_none(item.attrib.get('ep')),
-            'thumbnail': thumbnail,
+            'thumbnails': thumbnails,
              'subtitles': subtitles,
              'formats': formats,
          }
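
Note on the crackle.py hunks above: the single 1920x1080 thumbnail template is replaced by one entry per known resolution. A minimal standalone sketch of that loop follows. `build_thumbnails` and the sample path are hypothetical names used only for illustration; the real code runs inside `CrackleIE._real_extract` and iterates over the full `_THUMBNAIL_RES` list.

```python
# Subset of the _THUMBNAIL_RES list added in the hunk above.
THUMBNAIL_RES = [(120, 90), (640, 480), (1920, 1080)]


def build_thumbnails(path, resolutions=THUMBNAIL_RES):
    """Hypothetical helper: build one thumbnail dict per resolution."""
    thumbnails = []
    for width, height in resolutions:
        res = '%dx%d' % (width, height)
        thumbnails.append({
            'id': res,
            'url': 'http://images-us-am.crackle.com/%stnl_%s.jpg' % (path, res),
            'width': width,
            'height': height,
            'resolution': res,
        })
    return thumbnails


# 'some/media/path/' is a made-up stand-in for item.attrib.get('p').
thumbs = build_thumbnails('some/media/path/')
```

Returning a `thumbnails` list instead of a single `thumbnail` string leaves the choice of size to the caller.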
diff --git a/youtube_dl/extractor/criterion.py b/youtube_dl/extractor/criterion.py

index cf6a5d6cbe906443b1db592616cd89926860bbdd..f7815b905d13910e0a931f2609fa015c9ac3f00a 100644 (file)
--- a/youtube_dl/extractor/criterion.py
+++ b/youtube_dl/extractor/criterion.py
@@ -14,7 +14,7 @@ class CriterionIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Le Samouraï',
              'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/crooksandliars.py b/youtube_dl/extractor/crooksandliars.py

index 443eb7691c7b9c2402942db21e521d17550167f9..7fb782db7ce930a47fc9a75730409aec805c18ac 100644 (file)
--- a/youtube_dl/extractor/crooksandliars.py
+++ b/youtube_dl/extractor/crooksandliars.py
@@ -16,7 +16,7 @@ class CrooksAndLiarsIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Fox & Friends Says Protecting Atheists From Discrimination Is Anti-Christian!',
              'description': 'md5:e1a46ad1650e3a5ec7196d432799127f',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1428207000,
              'upload_date': '20150405',
              'uploader': 'Heather',
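
Note on the recurring `'re:...'` to `r're:...'` change in this commit: in a non-raw string literal `\.` is not a recognised escape sequence. Python keeps the backslash, so the value is unchanged, but newer Python 3 releases warn about it. The raw-string prefix avoids the warning. The sketch below assumes the test suite treats an expected value starting with `re:` as a regular expression; `matches_expected` is a hypothetical stand-in for that check, not the project's helper.

```python
import re

# Raw string: the backslash reaches the regex engine untouched.
expected = r're:^https?://.*\.jpg'


def matches_expected(expected, got):
    """Hypothetical stand-in: 're:' prefix means 'match as a regex'."""
    if expected.startswith('re:'):
        return re.match(expected[len('re:'):], got) is not None
    return expected == got


ok = matches_expected(expected, 'http://example.com/thumb.jpg')
bad_scheme = matches_expected(expected, 'ftp://example.com/thumb.jpg')
no_dot = matches_expected(expected, 'http://example.com/thumbXjpg')
```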
diff --git a/youtube_dl/extractor/crunchyroll.py b/youtube_dl/extractor/crunchyroll.py

index 8d5b69f68d3ddb345dc67487db998cf164b2765c..109d1c5a864f283a01b2b2baaed784384776a5c1 100644 (file)
--- a/youtube_dl/extractor/crunchyroll.py
+++ b/youtube_dl/extractor/crunchyroll.py
@@ -142,7 +142,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
              'ext': 'flv',
              'title': 'Culture Japan Episode 1 – Rebuilding Japan after the 3.11',
              'description': 'md5:2fbc01f90b87e8e9137296f37b461c12',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Danny Choo Network',
              'upload_date': '20120213',
          },
@@ -158,7 +158,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
              'ext': 'mp4',
              'title': 'Re:ZERO -Starting Life in Another World- Episode 5 – The Morning of Our Promise Is Still Distant',
              'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'TV TOKYO',
              'upload_date': '20160508',
          },
@@ -166,6 +166,25 @@ class CrunchyrollIE(CrunchyrollBaseIE):
              # m3u8 download
              'skip_download': True,
          },
+    }, {
+        'url': 'http://www.crunchyroll.com/konosuba-gods-blessing-on-this-wonderful-world/episode-1-give-me-deliverance-from-this-judicial-injustice-727589',
+        'info_dict': {
+            'id': '727589',
+            'ext': 'mp4',
+            'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 – Give Me Deliverance from this Judicial Injustice!",
+            'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'uploader': 'Kadokawa Pictures Inc.',
+            'upload_date': '20170118',
+            'series': "KONOSUBA -God's blessing on this wonderful world!",
+            'season_number': 2,
+            'episode': 'Give Me Deliverance from this Judicial Injustice!',
+            'episode_number': 1,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
      }, {
          'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
          'only_matching': True,
@@ -236,8 +255,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
          output += 'WrapStyle: %s\n' % sub_root.attrib['wrap_style']
          output += 'PlayResX: %s\n' % sub_root.attrib['play_res_x']
          output += 'PlayResY: %s\n' % sub_root.attrib['play_res_y']
-        output += """ScaledBorderAndShadow: no
-
+        output += """
  [V4+ Styles]
  Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
  """
@@ -439,6 +457,18 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
  
          subtitles = self.extract_subtitles(video_id, webpage)
  
+        # webpage provide more accurate data than series_title from XML
+        series = self._html_search_regex(
+            r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)',
+            webpage, 'series', default=xpath_text(metadata, 'series_title'))
+
+        episode = xpath_text(metadata, 'episode_title')
+        episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
+
+        season_number = int_or_none(self._search_regex(
+            r'(?s)<h4[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h4>\s*<h4>\s*Season (\d+)',
+            webpage, 'season number', default=None))
+
          return {
              'id': video_id,
              'title': video_title,
@@ -446,9 +476,10 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
              'thumbnail': xpath_text(metadata, 'episode_image_url'),
              'uploader': video_uploader,
              'upload_date': video_upload_date,
-            'series': xpath_text(metadata, 'series_title'),
-            'episode': xpath_text(metadata, 'episode_title'),
-            'episode_number': int_or_none(xpath_text(metadata, 'episode_number')),
+            'series': series,
+            'season_number': season_number,
+            'episode': episode,
+            'episode_number': episode_number,
              'subtitles': subtitles,
              'formats': formats,
          }
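
Note on the crunchyroll.py metadata hunk above: the series title and season number are now scraped from the page markup, with the XML `series_title` kept as a fallback. The sketch below applies the two regexes from the hunk to a made-up HTML fragment. The fragment is an assumption about the page layout, written only to exercise the patterns, and plain `re.search` stands in for `_html_search_regex` / `_search_regex`.

```python
import re

# Made-up fragment; the real page markup may differ.
webpage = '''
<h4 id="showmedia_about_episode_num" class="x">
  <a href="/konosuba">KONOSUBA</a>
</h4>
<h4>
  Season 2, Episode 1
</h4>
'''

SERIES_RE = r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)'
SEASON_RE = (r'(?s)<h4[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h4>'
             r'\s*<h4>\s*Season (\d+)')

series_match = re.search(SERIES_RE, webpage)
season_match = re.search(SEASON_RE, webpage)

# Fall back to None when the page does not carry the information.
series = series_match.group(1) if series_match else None
season_number = int(season_match.group(1)) if season_match else None
```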
diff --git a/youtube_dl/extractor/cspan.py b/youtube_dl/extractor/cspan.py

index 7e5d4f2276385a363eade175dba78519cea515fe..d4576160b4489e599e4ca7dabc1e18c9d685610f 100644 (file)
--- a/youtube_dl/extractor/cspan.py
+++ b/youtube_dl/extractor/cspan.py
@@ -12,6 +12,7 @@ from ..utils import (
      ExtractorError,
  )
  from .senateisvp import SenateISVPIE
+from .ustream import UstreamIE
  
  
  class CSpanIE(InfoExtractor):
@@ -22,14 +23,13 @@ class CSpanIE(InfoExtractor):
          'md5': '94b29a4f131ff03d23471dd6f60b6a1d',
          'info_dict': {
              'id': '315139',
-            'ext': 'mp4',
              'title': 'Attorney General Eric Holder on Voting Rights Act Decision',
-            'description': 'Attorney General Eric Holder speaks to reporters following the Supreme Court decision in [Shelby County v. Holder], in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced.',
          },
+        'playlist_mincount': 2,
          'skip': 'Regularly fails on travis, for unknown reasons',
      }, {
          'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
-        'md5': '8e5fbfabe6ad0f89f3012a7943c1287b',
+        # md5 is unstable
          'info_dict': {
              'id': 'c4486943',
              'ext': 'mp4',
@@ -38,14 +38,11 @@ class CSpanIE(InfoExtractor):
          }
      }, {
          'url': 'http://www.c-span.org/video/?318608-1/gm-ignition-switch-recall',
-        'md5': '2ae5051559169baadba13fc35345ae74',
          'info_dict': {
              'id': '342759',
-            'ext': 'mp4',
              'title': 'General Motors Ignition Switch Recall',
-            'duration': 14848,
-            'description': 'md5:118081aedd24bf1d3b68b3803344e7f3'
          },
+        'playlist_mincount': 6,
      }, {
          # Video from senate.gov
          'url': 'http://www.c-span.org/video/?104517-1/immigration-reforms-needed-protect-skilled-american-workers',
@@ -57,12 +54,30 @@ class CSpanIE(InfoExtractor):
          'params': {
              'skip_download': True,  # m3u8 downloads
          }
+    }, {
+        # Ustream embedded video
+        'url': 'https://www.c-span.org/video/?114917-1/armed-services',
+        'info_dict': {
+            'id': '58428542',
+            'ext': 'flv',
+            'title': 'USHR07 Armed Services Committee',
+            'description': 'hsas00-2118-20150204-1000et-07\n\n\nUSHR07 Armed Services Committee',
+            'timestamp': 1423060374,
+            'upload_date': '20150204',
+            'uploader': 'HouseCommittee',
+            'uploader_id': '12987475',
+        },
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
          video_type = None
          webpage = self._download_webpage(url, video_id)
+
+        ustream_url = UstreamIE._extract_url(webpage)
+        if ustream_url:
+            return self.url_result(ustream_url, UstreamIE.ie_key())
+
          # We first look for clipid, because clipprog always appears before
          patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')]
          results = list(filter(None, (re.search(p, webpage) for p in patterns)))
diff --git a/youtube_dl/extractor/ctsnews.py b/youtube_dl/extractor/ctsnews.py

index 83ca90c3b68a66c8c612bd29cda89ae6d91f1478..d565335cf6c31a047b8882415afb4ea259578a06 100644 (file)
--- a/youtube_dl/extractor/ctsnews.py
+++ b/youtube_dl/extractor/ctsnews.py
@@ -28,7 +28,7 @@ class CtsNewsIE(InfoExtractor):
              'ext': 'mp4',
              'title': '韓國31歲童顏男 貌如十多歲小孩',
              'description': '越有年紀的人，越希望看起來年輕一點，而南韓卻有一位31歲的男子，看起來像是11、12歲的小孩，身...',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1378205880,
              'upload_date': '20130903',
          }
@@ -41,7 +41,7 @@ class CtsNewsIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'iPhone6熱銷 蘋果財報亮眼',
              'description': 'md5:f395d4f485487bb0f992ed2c4b07aa7d',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20150128',
              'uploader_id': 'TBSCTS',
              'uploader': '中華電視公司',
diff --git a/youtube_dl/extractor/ctvnews.py b/youtube_dl/extractor/ctvnews.py

index 1023b61300b4d381a0f5019e2a3a04cbc77adc8a..55a127b7696e5d5dbb845709451c1b05b8df7211 100644 (file)
--- a/youtube_dl/extractor/ctvnews.py
+++ b/youtube_dl/extractor/ctvnews.py
@@ -8,7 +8,7 @@ from ..utils import orderedSet
  
  
  class CTVNewsIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
+    _VALID_URL = r'https?://(?:.+?\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
      _TESTS = [{
          'url': 'http://www.ctvnews.ca/video?clipId=901995',
          'md5': '10deb320dc0ccb8d01d34d12fc2ea672',
@@ -40,6 +40,9 @@ class CTVNewsIE(InfoExtractor):
      }, {
          'url': 'http://www.ctvnews.ca/canadiens-send-p-k-subban-to-nashville-in-blockbuster-trade-1.2967231',
          'only_matching': True,
+    }, {
+        'url': 'http://vancouverisland.ctvnews.ca/video?clipId=761241',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
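
Note on the ctvnews.py `_VALID_URL` change above: the host part is widened from an optional `www.` to any subdomain, which is what the new `vancouverisland.ctvnews.ca` test URL needs. A quick check of both patterns, copied from the hunk, against that URL:

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
NEW_VALID_URL = r'https?://(?:.+?\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'

url = 'http://vancouverisland.ctvnews.ca/video?clipId=761241'

old_match = re.match(OLD_VALID_URL, url)   # regional subdomain is rejected
new_match = re.match(NEW_VALID_URL, url)   # regional subdomain is accepted
clip_id = new_match.group('id')
```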
diff --git a/youtube_dl/extractor/cultureunplugged.py b/youtube_dl/extractor/cultureunplugged.py

index 9f26fa5878777d3302383646ad581056f429841a..bcdf27323edc795e75b91488c3989dfd5552d455 100644 (file)
--- a/youtube_dl/extractor/cultureunplugged.py
+++ b/youtube_dl/extractor/cultureunplugged.py
@@ -21,7 +21,7 @@ class CultureUnpluggedIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'The Next, Best West',
              'description': 'md5:0423cd00833dea1519cf014e9d0903b1',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'creator': 'Coldstream Creative',
              'duration': 2203,
              'view_count': int,
diff --git a/youtube_dl/extractor/dailymotion.py b/youtube_dl/extractor/dailymotion.py

index 4a3314ea7d4fc2df95543cda554d32a8caf586ac..31bf5faf6605553cdcd79f670285a554e711364f 100644 (file)
--- a/youtube_dl/extractor/dailymotion.py
+++ b/youtube_dl/extractor/dailymotion.py
@@ -58,7 +58,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News',
                  'description': 'Several come bundled with the Steam Controller.',
-                'thumbnail': 're:^https?:.*\.(?:jpg|png)$',
+                'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
                  'duration': 74,
                  'timestamp': 1425657362,
                  'upload_date': '20150306',
diff --git a/youtube_dl/extractor/daum.py b/youtube_dl/extractor/daum.py

index 732b4362a96488e67f4b1858f83429a85e877555..76f0218923536b29550c9384ce8348baf05289d5 100644 (file)
--- a/youtube_dl/extractor/daum.py
+++ b/youtube_dl/extractor/daum.py
@@ -32,7 +32,7 @@ class DaumIE(InfoExtractor):
              'title': '마크 헌트 vs 안토니오 실바',
              'description': 'Mark Hunt vs Antonio Silva',
              'upload_date': '20131217',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 2117,
              'view_count': int,
              'comment_count': int,
@@ -45,7 +45,7 @@ class DaumIE(InfoExtractor):
              'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118',
              'description': 'md5:79794514261164ff27e36a21ad229fc5',
              'upload_date': '20150604',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 154,
              'view_count': int,
              'comment_count': int,
@@ -61,7 +61,7 @@ class DaumIE(InfoExtractor):
              'title': '01-Korean War ( Trouble on the horizon )',
              'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름',
              'upload_date': '20080223',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 249,
              'view_count': int,
              'comment_count': int,
@@ -139,7 +139,7 @@ class DaumClipIE(InfoExtractor):
              'title': 'DOTA 2GETHER 시즌2 6회 - 2부',
              'description': 'DOTA 2GETHER 시즌2 6회 - 2부',
              'upload_date': '20130831',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 3868,
              'view_count': int,
          },
diff --git a/youtube_dl/extractor/dbtv.py b/youtube_dl/extractor/dbtv.py

index 6d880d43d6507077018f9489749947d83a36f64b..f232f0dc536f612530e6ca7cfa0fde97e20b9467 100644 (file)
--- a/youtube_dl/extractor/dbtv.py
+++ b/youtube_dl/extractor/dbtv.py
@@ -17,7 +17,7 @@ class DBTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
              'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
              'timestamp': 1404039863,
              'upload_date': '20140629',
              'duration': 69.544,
diff --git a/youtube_dl/extractor/dctp.py b/youtube_dl/extractor/dctp.py

index 14ba88715887caeb9144e68384417b2e7b518b07..00fbbff2fa35d2212d521a43e0e1b41b281d477b 100644 (file)
--- a/youtube_dl/extractor/dctp.py
+++ b/youtube_dl/extractor/dctp.py
@@ -17,7 +17,7 @@ class DctpTvIE(InfoExtractor):
              'title': 'Videoinstallation für eine Kaufhausfassade',
              'description': 'Kurzfilm',
              'upload_date': '20110407',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/deezer.py b/youtube_dl/extractor/deezer.py

index 7a07f3267db874649e5bcc5228a1c7881ebe19d3..ec87b94dbcc74ae60e05d1c6f43a6e4429cbb721 100644 (file)
--- a/youtube_dl/extractor/deezer.py
+++ b/youtube_dl/extractor/deezer.py
@@ -19,7 +19,7 @@ class DeezerPlaylistIE(InfoExtractor):
              'id': '176747451',
              'title': 'Best!',
              'uploader': 'Anonymous',
-            'thumbnail': 're:^https?://cdn-images.deezer.com/images/cover/.*\.jpg$',
+            'thumbnail': r're:^https?://cdn-images.deezer.com/images/cover/.*\.jpg$',
          },
          'playlist_count': 30,
          'skip': 'Only available in .de',
diff --git a/youtube_dl/extractor/dhm.py b/youtube_dl/extractor/dhm.py

index 44e0c5d4d7094cf965555431e39387a78bdb6f83..aee72a6ed1e2daac661887b2ed225e898635c71b 100644 (file)
--- a/youtube_dl/extractor/dhm.py
+++ b/youtube_dl/extractor/dhm.py
@@ -17,7 +17,7 @@ class DHMIE(InfoExtractor):
              'title': 'MARSHALL PLAN AT WORK IN WESTERN GERMANY, THE',
              'description': 'md5:1fabd480c153f97b07add61c44407c82',
              'duration': 660,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://www.dhm.de/filmarchiv/02-mapping-the-wall/peter-g/rolle-1/',
@@ -26,7 +26,7 @@ class DHMIE(InfoExtractor):
              'id': 'rolle-1',
              'ext': 'flv',
              'title': 'ROLLE 1',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }]
  
diff --git a/youtube_dl/extractor/digiteka.py b/youtube_dl/extractor/digiteka.py

index 7bb79ffda0bbeda00ea103e59ab19ab746196de3..3dfde0d8c772746821afea8b16d0f5d9d8dc1cfb 100644 (file)
--- a/youtube_dl/extractor/digiteka.py
+++ b/youtube_dl/extractor/digiteka.py
@@ -36,7 +36,7 @@ class DigitekaIE(InfoExtractor):
              'id': 's8uk0r',
              'ext': 'mp4',
              'title': 'Loi sur la fin de vie: le texte prévoit un renforcement des directives anticipées',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 74,
              'upload_date': '20150317',
              'timestamp': 1426604939,
@@ -50,7 +50,7 @@ class DigitekaIE(InfoExtractor):
              'id': 'xvpfp8',
              'ext': 'mp4',
              'title': 'Two - C\'est La Vie (clip)',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 233,
              'upload_date': '20150224',
              'timestamp': 1424760500,
diff --git a/youtube_dl/extractor/discoverygo.py b/youtube_dl/extractor/discoverygo.py

index c4e83b2c3790670ec7d6c1b7c9cca4e47b4d7779..2042493a8c7836ecae4efd23005101cf805116a7 100644 (file)
--- a/youtube_dl/extractor/discoverygo.py
+++ b/youtube_dl/extractor/discoverygo.py
@@ -6,7 +6,6 @@ from ..utils import (
      extract_attributes,
      int_or_none,
      parse_age_limit,
-    unescapeHTML,
      ExtractorError,
  )
  
@@ -49,7 +48,7 @@ class DiscoveryGoIE(InfoExtractor):
                  webpage, 'video container'))
  
          video = self._parse_json(
-            unescapeHTML(container.get('data-video') or container.get('data-json')),
+            container.get('data-video') or container.get('data-json'),
              display_id)
  
          title = video['name']
diff --git a/youtube_dl/extractor/disney.py b/youtube_dl/extractor/disney.py

new file mode 100644 (file)

index 0000000..396873c
--- /dev/null
+++ b/youtube_dl/extractor/disney.py
@@ -0,0 +1,115 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+    compat_str,
+    determine_ext,
+)
+
+
+class DisneyIE(InfoExtractor):
+    _VALID_URL = r'''(?x)
+        https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})'''
+    _TESTS = [{
+        'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
+        'info_dict': {
+            'id': '545ed1857afee5a0ec239977',
+            'ext': 'mp4',
+            'title': 'Moana - Trailer',
+            'description': 'A fun adventure for the entire Family!  Bring home Moana on Digital HD Feb 21 & Blu-ray March 7',
+            'upload_date': '20170112',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2',
+        'only_matching': True,
+    }, {
+        'url': 'http://video.en.disneyme.com/watch/future-worm/robo-carp-2001-544b66002aa7353cdd3f5114',
+        'only_matching': True,
+    }, {
+        'url': 'http://video.disneyturkiye.com.tr/izle/7c-7-cuceler/kimin-sesi-zaten-5456f3d015f6b36c8afdd0e2',
+        'only_matching': True,
+    }, {
+        'url': 'http://disneyjunior.disney.com/embed/546a4798ddba3d1612e4005d',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        domain, video_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(
+            'http://%s/embed/%s' % (domain, video_id), video_id)
+        video_data = self._parse_json(self._search_regex(
+            r'Disney\.EmbedVideo=({.+});', webpage, 'embed data'), video_id)['video']
+
+        for external in video_data.get('externals', []):
+            if external.get('source') == 'vevo':
+                return self.url_result('vevo:' + external['data_id'], 'Vevo')
+
+        title = video_data['title']
+
+        formats = []
+        for flavor in video_data.get('flavors', []):
+            flavor_format = flavor.get('format')
+            flavor_url = flavor.get('url')
+            if not flavor_url or not re.match(r'https?://', flavor_url):
+                continue
+            tbr = int_or_none(flavor.get('bitrate'))
+            if tbr == 99999:
+                formats.extend(self._extract_m3u8_formats(
+                    flavor_url, video_id, 'mp4', m3u8_id=flavor_format, fatal=False))
+                continue
+            format_id = []
+            if flavor_format:
+                format_id.append(flavor_format)
+            if tbr:
+                format_id.append(compat_str(tbr))
+            ext = determine_ext(flavor_url)
+            if flavor_format == 'applehttp' or ext == 'm3u8':
+                ext = 'mp4'
+            width = int_or_none(flavor.get('width'))
+            height = int_or_none(flavor.get('height'))
+            formats.append({
+                'format_id': '-'.join(format_id),
+                'url': flavor_url,
+                'width': width,
+                'height': height,
+                'tbr': tbr,
+                'ext': ext,
+                'vcodec': 'none' if (width == 0 and height == 0) else None,
+            })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for caption in video_data.get('captions', []):
+            caption_url = caption.get('url')
+            caption_format = caption.get('format')
+            if not caption_url or caption_format.startswith('unknown'):
+                continue
+            subtitles.setdefault(caption.get('language', 'en'), []).append({
+                'url': caption_url,
+                'ext': {
+                    'webvtt': 'vtt',
+                }.get(caption_format, caption_format),
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description') or video_data.get('short_desc'),
+            'thumbnail': video_data.get('thumb') or video_data.get('thumb_secure'),
+            'duration': int_or_none(video_data.get('duration_sec')),
+            'upload_date': unified_strdate(video_data.get('publish_date')),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
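
Note on the new disney.py extractor above: `_real_extract` pulls both the host and the 24-character video id out of the URL, then always fetches the `/embed/` page on the same host. The sketch below reproduces only that step with the `_VALID_URL` pattern from the file. `embed_url` is a hypothetical helper name, and no network request is made.

```python
import re

VALID_URL = r'''(?x)
    https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})'''


def embed_url(url):
    """Hypothetical helper: map any supported URL to its embed page."""
    domain, video_id = re.match(VALID_URL, url).groups()
    return 'http://%s/embed/%s' % (domain, video_id)


watch = embed_url(
    'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977')
embed = embed_url('http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097')
```

Normalising to the embed page means one parsing path for `Disney.EmbedVideo` regardless of which regional site the link came from.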
diff --git a/youtube_dl/extractor/douyutv.py b/youtube_dl/extractor/douyutv.py

index e366e17e68139288543243667d637544488a6a23..91159441369121773a5b3a5b02b5ecc9e9ee01fd 100644 (file)
--- a/youtube_dl/extractor/douyutv.py
+++ b/youtube_dl/extractor/douyutv.py
@@ -18,7 +18,7 @@ from ..utils import (
  
  class DouyuTVIE(InfoExtractor):
      IE_DESC = '斗鱼'
-    _VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?P<id>[A-Za-z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?:[^/]+/)*(?P<id>[A-Za-z0-9]+)'
      _TESTS = [{
          'url': 'http://www.douyutv.com/iseven',
          'info_dict': {
@@ -26,8 +26,8 @@ class DouyuTVIE(InfoExtractor):
              'display_id': 'iseven',
              'ext': 'flv',
              'title': 're:^清晨醒脑！T-ara根本停不下来！ [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'description': 're:.*m7show@163\.com.*',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': r're:.*m7show@163\.com.*',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': '7师傅',
              'is_live': True,
          },
@@ -42,7 +42,7 @@ class DouyuTVIE(InfoExtractor):
              'ext': 'flv',
              'title': 're:^小漠从零单排记！——CSOL2躲猫猫 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
              'description': 'md5:746a2f7a253966a06755a912f0acc0d2',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'douyu小漠',
              'is_live': True,
          },
@@ -57,8 +57,8 @@ class DouyuTVIE(InfoExtractor):
              'display_id': '17732',
              'ext': 'flv',
              'title': 're:^清晨醒脑！T-ara根本停不下来！ [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'description': 're:.*m7show@163\.com.*',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': r're:.*m7show@163\.com.*',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': '7师傅',
              'is_live': True,
          },
@@ -68,6 +68,10 @@ class DouyuTVIE(InfoExtractor):
      }, {
          'url': 'http://www.douyu.com/xiaocang',
          'only_matching': True,
+    }, {
+        # \"room_id\"
+        'url': 'http://www.douyu.com/t/lpl',
+        'only_matching': True,
      }]
  
      # Decompile core.swf in webpage by ffdec "Search SWFs in memory". core.swf
@@ -82,7 +86,7 @@ class DouyuTVIE(InfoExtractor):
          else:
              page = self._download_webpage(url, video_id)
              room_id = self._html_search_regex(
-                r'"room_id"\s*:\s*(\d+),', page, 'room id')
+                r'"room_id\\?"\s*:\s*(\d+),', page, 'room id')
  
          room = self._download_json(
              'http://m.douyu.com/html5/live?roomId=%s' % room_id, video_id,
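
Note on the douyutv.py `room_id` regex change above: the added `\\?` lets the pattern accept a key whose closing quote is backslash-escaped, as the `# \"room_id\"` test comment hints. The page snippets below are made up for illustration, and `re.search` stands in for `_html_search_regex`.

```python
import re

OLD_RE = r'"room_id"\s*:\s*(\d+),'
NEW_RE = r'"room_id\\?"\s*:\s*(\d+),'

# Made-up page fragments: plain JSON and JSON embedded in a JS string.
plain = '{"room_id":17732,"room_name":"x"}'
escaped = r'{\"room_id\":17732,\"room_name\":\"x\"}'

old_on_escaped = re.search(OLD_RE, escaped)          # misses the escaped form
new_on_plain = re.search(NEW_RE, plain).group(1)
new_on_escaped = re.search(NEW_RE, escaped).group(1)
```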
diff --git a/youtube_dl/extractor/dplay.py b/youtube_dl/extractor/dplay.py

index 5790553f38ca29107bad44317fedb271dce0883a..32028bc3b79b61d249ad4bccaebadf745f9f942a 100644 (file)
--- a/youtube_dl/extractor/dplay.py
+++ b/youtube_dl/extractor/dplay.py
@@ -8,6 +8,7 @@ import time
  from .common import InfoExtractor
  from ..compat import compat_urlparse
  from ..utils import (
+    USER_AGENTS,
      int_or_none,
      update_url_query,
  )
@@ -102,10 +103,16 @@ class DPlayIE(InfoExtractor):
                      manifest_url, video_id, ext='mp4',
                      entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
                  # Sometimes final URLs inside m3u8 are unsigned, let's fix this
-                # ourselves
+                # ourselves. Also fragments' URLs are only served signed for
+                # Safari user agent.
                  query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
                  for m3u8_format in m3u8_formats:
-                    m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
+                    m3u8_format.update({
+                        'url': update_url_query(m3u8_format['url'], query),
+                        'http_headers': {
+                            'User-Agent': USER_AGENTS['Safari'],
+                        },
+                    })
                  formats.extend(m3u8_formats)
              elif protocol == 'hds':
                  formats.extend(self._extract_f4m_formats(
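
Note on the dplay.py hunk above: each HLS variant URL inherits the signed query string of the master manifest and now also carries a Safari `User-Agent` header. The sketch below shows the same idea with the standard library only. `merge_query` is a hypothetical stand-in for `update_url_query`, the URLs and token are made up, and `SAFARI_UA` is a placeholder rather than the real `USER_AGENTS['Safari']` value.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

SAFARI_UA = 'placeholder-safari-user-agent'  # not the real UA string


def merge_query(url, query):
    """Hypothetical stand-in for update_url_query: add params to a URL."""
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    qs.update(query)
    return urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))


manifest_url = 'http://example.com/master.m3u8?hdnea=token123'
query = parse_qs(urlparse(manifest_url).query)

m3u8_format = {'url': 'http://example.com/variant_1.m3u8'}
m3u8_format.update({
    'url': merge_query(m3u8_format['url'], query),
    'http_headers': {'User-Agent': SAFARI_UA},
})
```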
diff --git a/youtube_dl/extractor/dramafever.py b/youtube_dl/extractor/dramafever.py

index c115956121a242920ec8016e8c9f3558c34060c6..bcd9fe2a039550d36af3f1a63cb3cf8cc583cb2a 100644 (file)
--- a/youtube_dl/extractor/dramafever.py
+++ b/youtube_dl/extractor/dramafever.py
@@ -66,7 +66,7 @@ class DramaFeverBaseIE(AMPIE):
  
  class DramaFeverIE(DramaFeverBaseIE):
      IE_NAME = 'dramafever'
-    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
+    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/(?:[^/]+/)?drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
      _TESTS = [{
          'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
          'info_dict': {
@@ -76,7 +76,7 @@ class DramaFeverIE(DramaFeverBaseIE):
              'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
              'episode': 'Episode 1',
              'episode_number': 1,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1404336058,
              'upload_date': '20140702',
              'duration': 343,
@@ -94,7 +94,7 @@ class DramaFeverIE(DramaFeverBaseIE):
              'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91',
              'episode': 'Mnet Asian Music Awards 2015 - Part 3',
              'episode_number': 4,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1450213200,
              'upload_date': '20151215',
              'duration': 5602,
@@ -103,6 +103,9 @@ class DramaFeverIE(DramaFeverBaseIE):
              # m3u8 download
              'skip_download': True,
          },
+    }, {
+        'url': 'https://www.dramafever.com/zh-cn/drama/4972/15/Doctor_Romantic/',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -148,7 +151,7 @@ class DramaFeverIE(DramaFeverBaseIE):
  
  class DramaFeverSeriesIE(DramaFeverBaseIE):
      IE_NAME = 'dramafever:series'
-    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+)(?:/(?:(?!\d+(?:/|$)).+)?)?$'
+    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/(?:[^/]+/)?drama/(?P<id>[0-9]+)(?:/(?:(?!\d+(?:/|$)).+)?)?$'
      _TESTS = [{
          'url': 'http://www.dramafever.com/drama/4512/Cooking_with_Shin/',
          'info_dict': {
diff --git a/youtube_dl/extractor/drbonanza.py b/youtube_dl/extractor/drbonanza.py
index 01271f8f06ff91b22680314d644485fe94434391..79ec212c890471bd72a0e88eaf7bec0da70af124 100644
--- a/youtube_dl/extractor/drbonanza.py
+++ b/youtube_dl/extractor/drbonanza.py
@@ -20,7 +20,7 @@ class DRBonanzaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Talkshowet - Leonard Cohen',
              'description': 'md5:8f34194fb30cd8c8a30ad8b27b70c0ca',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
              'timestamp': 1295537932,
              'upload_date': '20110120',
              'duration': 3664,
@@ -36,7 +36,7 @@ class DRBonanzaIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'EM fodbold 1992 Danmark - Tyskland finale Transmission',
              'description': 'md5:501e5a195749480552e214fbbed16c4e',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
              'timestamp': 1223274900,
              'upload_date': '20081006',
              'duration': 7369,
diff --git a/youtube_dl/extractor/dreisat.py b/youtube_dl/extractor/dreisat.py
index 908c9e514c41ea72bac0e6f6ede41def4ba0b20b..f138025d5564b27bef7d09c2d74d7aefffd8cfdc 100644
--- a/youtube_dl/extractor/dreisat.py
+++ b/youtube_dl/extractor/dreisat.py
@@ -2,10 +2,19 @@ from __future__ import unicode_literals
  
  import re
  
-from .zdf import ZDFIE
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+    xpath_text,
+    determine_ext,
+    qualities,
+    float_or_none,
+    ExtractorError,
+)
  
  
-class DreiSatIE(ZDFIE):
+class DreiSatIE(InfoExtractor):
      IE_NAME = '3sat'
      _VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
      _TESTS = [
@@ -31,6 +40,163 @@ class DreiSatIE(ZDFIE):
          },
      ]
  
+    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
+        param_groups = {}
+        for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
+            group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace'))
+            params = {}
+            for param in param_group:
+                params[param.get('name')] = param.get('value')
+            param_groups[group_id] = params
+
+        formats = []
+        for video in smil.findall(self._xpath_ns('.//video', namespace)):
+            src = video.get('src')
+            if not src:
+                continue
+            bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
+            group_id = video.get('paramGroup')
+            param_group = param_groups[group_id]
+            for proto in param_group['protocols'].split(','):
+                formats.append({
+                    'url': '%s://%s' % (proto, param_group['host']),
+                    'app': param_group['app'],
+                    'play_path': src,
+                    'ext': 'flv',
+                    'format_id': '%s-%d' % (proto, bitrate),
+                    'tbr': bitrate,
+                })
+        self._sort_formats(formats)
+        return formats
+
+    def extract_from_xml_url(self, video_id, xml_url):
+        doc = self._download_xml(
+            xml_url, video_id,
+            note='Downloading video info',
+            errnote='Failed to download video info')
+
+        status_code = doc.find('./status/statuscode')
+        if status_code is not None and status_code.text != 'ok':
+            code = status_code.text
+            if code == 'notVisibleAnymore':
+                message = 'Video %s is not available' % video_id
+            else:
+                message = '%s returned error: %s' % (self.IE_NAME, code)
+            raise ExtractorError(message, expected=True)
+
+        title = doc.find('.//information/title').text
+        description = xpath_text(doc, './/information/detail', 'description')
+        duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration'))
+        uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
+        uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
+        upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))
+
+        def xml_to_thumbnails(fnode):
+            thumbnails = []
+            for node in fnode:
+                thumbnail_url = node.text
+                if not thumbnail_url:
+                    continue
+                thumbnail = {
+                    'url': thumbnail_url,
+                }
+                if 'key' in node.attrib:
+                    m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key'])
+                    if m:
+                        thumbnail['width'] = int(m.group(1))
+                        thumbnail['height'] = int(m.group(2))
+                thumbnails.append(thumbnail)
+            return thumbnails
+
+        thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage'))
+
+        format_nodes = doc.findall('.//formitaeten/formitaet')
+        quality = qualities(['veryhigh', 'high', 'med', 'low'])
+
+        def get_quality(elem):
+            return quality(xpath_text(elem, 'quality'))
+        format_nodes.sort(key=get_quality)
+        format_ids = []
+        formats = []
+        for fnode in format_nodes:
+            video_url = fnode.find('url').text
+            is_available = 'http://www.metafilegenerator' not in video_url
+            if not is_available:
+                continue
+            format_id = fnode.attrib['basetype']
+            quality = xpath_text(fnode, './quality', 'quality')
+            format_m = re.match(r'''(?x)
+                (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
+                (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
+            ''', format_id)
+
+            ext = determine_ext(video_url, None) or format_m.group('container')
+            if ext not in ('smil', 'f4m', 'm3u8'):
+                format_id = format_id + '-' + quality
+            if format_id in format_ids:
+                continue
+
+            if ext == 'meta':
+                continue
+            elif ext == 'smil':
+                formats.extend(self._extract_smil_formats(
+                    video_url, video_id, fatal=False))
+            elif ext == 'm3u8':
+                # the certificates are misconfigured (see
+                # https://github.com/rg3/youtube-dl/issues/8665)
+                if video_url.startswith('https://'):
+                    continue
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    video_url, video_id, f4m_id=format_id, fatal=False))
+            else:
+                proto = format_m.group('proto').lower()
+
+                abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000)
+                vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000)
+
+                width = int_or_none(xpath_text(fnode, './width', 'width'))
+                height = int_or_none(xpath_text(fnode, './height', 'height'))
+
+                filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize'))
+
+                format_note = ''
+                if not format_note:
+                    format_note = None
+
+                formats.append({
+                    'format_id': format_id,
+                    'url': video_url,
+                    'ext': ext,
+                    'acodec': format_m.group('acodec'),
+                    'vcodec': format_m.group('vcodec'),
+                    'abr': abr,
+                    'vbr': vbr,
+                    'width': width,
+                    'height': height,
+                    'filesize': filesize,
+                    'format_note': format_note,
+                    'protocol': proto,
+                    '_available': is_available,
+                })
+            format_ids.append(format_id)
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'thumbnails': thumbnails,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'upload_date': upload_date,
+            'formats': formats,
+        }
+
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('id')
diff --git a/youtube_dl/extractor/drtuber.py b/youtube_dl/extractor/drtuber.py
index 22da8e48105e5e8ee81a9cc948c67f6ec7d72eb8..1eca82b3b46ae47e511b0f2f3f8bd6bb505cdc23 100644
--- a/youtube_dl/extractor/drtuber.py
+++ b/youtube_dl/extractor/drtuber.py
@@ -22,7 +22,7 @@ class DrTuberIE(InfoExtractor):
              'like_count': int,
              'comment_count': int,
              'categories': ['Babe', 'Blonde', 'Erotic', 'Outdoor', 'Softcore', 'Solo'],
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          }
      }, {
diff --git a/youtube_dl/extractor/drtv.py b/youtube_dl/extractor/drtv.py
index 88d096b307cdf6d484ef6b89253f6cdbcb82deb0..e966d7483cdc2193cfc96d2bdd808c90e515f821 100644
--- a/youtube_dl/extractor/drtv.py
+++ b/youtube_dl/extractor/drtv.py
@@ -9,12 +9,13 @@ from ..utils import (
      mimetype2ext,
      parse_iso8601,
      remove_end,
+    update_url_query,
  )
  
  
  class DRTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
-
+    _VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio/ondemand)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
+    IE_NAME = 'drtv'
      _TESTS = [{
          'url': 'https://www.dr.dk/tv/se/boern/ultra/klassen-ultra/klassen-darlig-taber-10',
          'md5': '25e659cccc9a2ed956110a299fdf5983',
@@ -79,9 +80,10 @@ class DRTVIE(InfoExtractor):
          subtitles = {}
  
          for asset in data['Assets']:
-            if asset.get('Kind') == 'Image':
+            kind = asset.get('Kind')
+            if kind == 'Image':
                  thumbnail = asset.get('Uri')
-            elif asset.get('Kind') == 'VideoResource':
+            elif kind in ('VideoResource', 'AudioResource'):
                  duration = float_or_none(asset.get('DurationInMilliseconds'), 1000)
                  restricted_to_denmark = asset.get('RestrictedToDenmark')
                  spoken_subtitles = asset.get('Target') == 'SpokenSubtitles'
@@ -96,9 +98,13 @@ class DRTVIE(InfoExtractor):
                          preference = -1
                          format_id += '-spoken-subtitles'
                      if target == 'HDS':
-                        formats.extend(self._extract_f4m_formats(
+                        f4m_formats = self._extract_f4m_formats(
                              uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43',
-                            video_id, preference, f4m_id=format_id))
+                            video_id, preference, f4m_id=format_id)
+                        if kind == 'AudioResource':
+                            for f in f4m_formats:
+                                f['vcodec'] = 'none'
+                        formats.extend(f4m_formats)
                      elif target == 'HLS':
                          formats.extend(self._extract_m3u8_formats(
                              uri, video_id, 'mp4', entry_protocol='m3u8_native',
@@ -112,6 +118,7 @@ class DRTVIE(InfoExtractor):
                              'format_id': format_id,
                              'tbr': int_or_none(bitrate),
                              'ext': link.get('FileFormat'),
+                            'vcodec': 'none' if kind == 'AudioResource' else None,
                          })
                  subtitles_list = asset.get('SubtitlesList')
                  if isinstance(subtitles_list, list):
@@ -144,3 +151,58 @@ class DRTVIE(InfoExtractor):
              'formats': formats,
              'subtitles': subtitles,
          }
+
+
+class DRTVLiveIE(InfoExtractor):
+    IE_NAME = 'drtv:live'
+    _VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv|TV)/live/(?P<id>[\da-z-]+)'
+    _TEST = {
+        'url': 'https://www.dr.dk/tv/live/dr1',
+        'info_dict': {
+            'id': 'dr1',
+            'ext': 'mp4',
+            'title': 're:^DR1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        channel_id = self._match_id(url)
+        channel_data = self._download_json(
+            'https://www.dr.dk/mu-online/api/1.0/channel/' + channel_id,
+            channel_id)
+        title = self._live_title(channel_data['Title'])
+
+        formats = []
+        for streaming_server in channel_data.get('StreamingServers', []):
+            server = streaming_server.get('Server')
+            if not server:
+                continue
+            link_type = streaming_server.get('LinkType')
+            for quality in streaming_server.get('Qualities', []):
+                for stream in quality.get('Streams', []):
+                    stream_path = stream.get('Stream')
+                    if not stream_path:
+                        continue
+                    stream_url = update_url_query(
+                        '%s/%s' % (server, stream_path), {'b': ''})
+                    if link_type == 'HLS':
+                        formats.extend(self._extract_m3u8_formats(
+                            stream_url, channel_id, 'mp4',
+                            m3u8_id=link_type, fatal=False, live=True))
+                    elif link_type == 'HDS':
+                        formats.extend(self._extract_f4m_formats(update_url_query(
+                            '%s/%s' % (server, stream_path), {'hdcore': '3.7.0'}),
+                            channel_id, f4m_id=link_type, fatal=False))
+        self._sort_formats(formats)
+
+        return {
+            'id': channel_id,
+            'title': title,
+            'thumbnail': channel_data.get('PrimaryImageUri'),
+            'formats': formats,
+            'is_live': True,
+        }
diff --git a/youtube_dl/extractor/dumpert.py b/youtube_dl/extractor/dumpert.py
index e5aadcd25ccccb6f9838d0bd1417edc2fbe3bd0f..c9fc9b5a9df65cd8681ce8e0933473ea9658202d 100644
--- a/youtube_dl/extractor/dumpert.py
+++ b/youtube_dl/extractor/dumpert.py
@@ -21,7 +21,7 @@ class DumpertIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Ik heb nieuws voor je',
              'description': 'Niet schrikken hoor',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://www.dumpert.nl/embed/6675421/dc440fe7/',
diff --git a/youtube_dl/extractor/eagleplatform.py b/youtube_dl/extractor/eagleplatform.py
index c2f593eca201a42f7023cc64d4237b5052fbc722..76d39adac5faa9912f42d271def60a46128be3f7 100644
--- a/youtube_dl/extractor/eagleplatform.py
+++ b/youtube_dl/extractor/eagleplatform.py
@@ -31,7 +31,7 @@ class EaglePlatformIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Навальный вышел на свободу',
              'description': 'md5:d97861ac9ae77377f3f20eaf9d04b4f5',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 87,
              'view_count': int,
              'age_limit': 0,
@@ -45,7 +45,7 @@ class EaglePlatformIE(InfoExtractor):
              'id': '12820',
              'ext': 'mp4',
              'title': "'O Sole Mio",
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 216,
              'view_count': int,
          },
diff --git a/youtube_dl/extractor/egghead.py b/youtube_dl/extractor/egghead.py
new file mode 100644
index 0000000..db92146
--- /dev/null
+++ b/youtube_dl/extractor/egghead.py
@@ -0,0 +1,39 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class EggheadCourseIE(InfoExtractor):
+    IE_DESC = 'egghead.io course'
+    IE_NAME = 'egghead:course'
+    _VALID_URL = r'https://egghead\.io/courses/(?P<id>[a-zA-Z_0-9-]+)'
+    _TEST = {
+        'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
+        'playlist_count': 29,
+        'info_dict': {
+            'id': 'professor-frisby-introduces-composable-functional-javascript',
+            'title': 'Professor Frisby Introduces Composable Functional JavaScript',
+            'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$',
+        },
+    }
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        webpage = self._download_webpage(url, playlist_id)
+
+        title = self._html_search_regex(r'<h1 class="title">([^<]+)</h1>', webpage, 'title')
+        ul = self._search_regex(r'(?s)<ul class="series-lessons-list">(.*?)</ul>', webpage, 'session list')
+
+        found = re.findall(r'(?s)<a class="[^"]*"\s*href="([^"]+)">\s*<li class="item', ul)
+        entries = [self.url_result(m) for m in found]
+
+        return {
+            '_type': 'playlist',
+            'id': playlist_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'entries': entries,
+        }
diff --git a/youtube_dl/extractor/einthusan.py b/youtube_dl/extractor/einthusan.py
index 443865ad27ba96eea8f78c56d14b72a54bc86389..6ca07a13d736b3909269aa1314d6e868150f8aa0 100644
--- a/youtube_dl/extractor/einthusan.py
+++ b/youtube_dl/extractor/einthusan.py
@@ -19,7 +19,7 @@ class EinthusanIE(InfoExtractor):
                  'id': '2447',
                  'ext': 'mp4',
                  'title': 'Ek Villain',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'description': 'md5:9d29fc91a7abadd4591fb862fa560d93',
              }
          },
@@ -30,7 +30,7 @@ class EinthusanIE(InfoExtractor):
                  'id': '1671',
                  'ext': 'mp4',
                  'title': 'Soodhu Kavvuum',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
              }
          },
diff --git a/youtube_dl/extractor/elpais.py b/youtube_dl/extractor/elpais.py
index 8c725a4e631860584781b116e72b02dd05813fc2..99e00cf3c68ea93fc00d5301e1e6be5567a72bff 100644
--- a/youtube_dl/extractor/elpais.py
+++ b/youtube_dl/extractor/elpais.py
@@ -2,7 +2,7 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import unified_strdate
+from ..utils import strip_jsonp, unified_strdate
  
  
  class ElPaisIE(InfoExtractor):
@@ -29,6 +29,16 @@ class ElPaisIE(InfoExtractor):
              'description': 'Que sí, que las cápsulas son cómodas. Pero si le pides algo más a la vida, quizá deberías aprender a usar bien la cafetera italiana. No tienes más que ver este vídeo y seguir sus siete normas básicas.',
              'upload_date': '20160303',
          }
+    }, {
+        'url': 'http://elpais.com/elpais/2017/01/26/ciencia/1485456786_417876.html',
+        'md5': '9c79923a118a067e1a45789e1e0b0f9c',
+        'info_dict': {
+            'id': '1485456786_417876',
+            'ext': 'mp4',
+            'title': 'Hallado un barco de la antigua Roma que naufragó en Baleares hace 1.800 años',
+            'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas',
+            'upload_date': '20170127',
+        },
      }]
  
      def _real_extract(self, url):
@@ -37,8 +47,15 @@ class ElPaisIE(InfoExtractor):
  
          prefix = self._html_search_regex(
              r'var\s+url_cache\s*=\s*"([^"]+)";', webpage, 'URL prefix')
-        video_suffix = self._search_regex(
-            r"(?:URLMediaFile|urlVideo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", webpage, 'video URL')
+        id_multimedia = self._search_regex(
+            r"id_multimedia\s*=\s*'([^']+)'", webpage, 'ID multimedia', default=None)
+        if id_multimedia:
+            url_info = self._download_json(
+                'http://elpais.com/vdpep/1/?pepid=' + id_multimedia, video_id, transform_source=strip_jsonp)
+            video_suffix = url_info['mp4']
+        else:
+            video_suffix = self._search_regex(
+                r"(?:URLMediaFile|urlVideo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", webpage, 'video URL')
          video_url = prefix + video_suffix
          thumbnail_suffix = self._search_regex(
              r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
diff --git a/youtube_dl/extractor/eroprofile.py b/youtube_dl/extractor/eroprofile.py
index 297f8a6f5fa4371415554bfe6c44d0745c262491..c08643a17cb99a92dd508201ad5c1ca69fd863ad 100644
--- a/youtube_dl/extractor/eroprofile.py
+++ b/youtube_dl/extractor/eroprofile.py
@@ -22,7 +22,7 @@ class EroProfileIE(InfoExtractor):
              'display_id': 'sexy-babe-softcore',
              'ext': 'm4v',
              'title': 'sexy babe softcore',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
              'age_limit': 18,
          }
      }, {
@@ -32,7 +32,7 @@ class EroProfileIE(InfoExtractor):
              'id': '1133519',
              'ext': 'm4v',
              'title': 'Try It On Pee_cut_2.wmv - 4shared.com - file sharing - download movie file',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
              'age_limit': 18,
          },
          'skip': 'Requires login',
diff --git a/youtube_dl/extractor/escapist.py b/youtube_dl/extractor/escapist.py
index a3d7bbbcb3f45a4c098397d0622fc59324412fcc..4d8a3c13467b8478b6c2a4a91bae8679a778e062 100644
--- a/youtube_dl/extractor/escapist.py
+++ b/youtube_dl/extractor/escapist.py
@@ -45,7 +45,7 @@ class EscapistIE(InfoExtractor):
              'ext': 'mp4',
              'description': "Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.",
              'title': "Breaking Down Baldur's Gate",
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 264,
              'uploader': 'The Escapist',
          }
@@ -57,7 +57,7 @@ class EscapistIE(InfoExtractor):
              'ext': 'mp4',
              'description': 'This week, Zero Punctuation reviews Evolve.',
              'title': 'Evolve - One vs Multiplayer',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 304,
              'uploader': 'The Escapist',
          }
diff --git a/youtube_dl/extractor/esri.py b/youtube_dl/extractor/esri.py
index d4205d7fbde331e3bb9fc94275da143575ebd454..e9dcaeb1dd165f86f8eac0a78a9147fa62ada1ab 100644
--- a/youtube_dl/extractor/esri.py
+++ b/youtube_dl/extractor/esri.py
@@ -22,7 +22,7 @@ class EsriVideoIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'ArcGIS Online - Developing Applications',
              'description': 'Jeremy Bartley demonstrates how to develop applications with ArcGIS Online.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 185,
              'upload_date': '20120419',
          }
diff --git a/youtube_dl/extractor/europa.py b/youtube_dl/extractor/europa.py
index adc43919e72aa48fa052db641e5412c6dae9b999..1efc0b2ec04bc874fee5744803e4549dc9058cd1 100644
--- a/youtube_dl/extractor/europa.py
+++ b/youtube_dl/extractor/europa.py
@@ -23,7 +23,7 @@ class EuropaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'TRADE - Wikileaks on TTIP',
              'description': 'NEW  LIVE EC Midday press briefing of 11/08/2015',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20150811',
              'duration': 34,
              'view_count': int,
diff --git a/youtube_dl/extractor/expotv.py b/youtube_dl/extractor/expotv.py
index ef11962f35035617a589e91cde5db43659099f66..95a8977821d3c292470e42f0f9170674ed9a6aa2 100644
--- a/youtube_dl/extractor/expotv.py
+++ b/youtube_dl/extractor/expotv.py
@@ -17,7 +17,7 @@ class ExpoTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'NYX Butter Lipstick Little Susie',
              'description': 'Goes on like butter, but looks better!',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Stephanie S.',
              'upload_date': '20150520',
              'view_count': int,
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
index 46d007b7d50d7b3916e3dacc897547dfc924446d..12cda36ccfc1088274a93377b976c0084f0e6c33 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -30,7 +30,10 @@ from .aenetworks import (
      AENetworksIE,
      HistoryTopicIE,
  )
-from .afreecatv import AfreecaTVIE
+from .afreecatv import (
+    AfreecaTVIE,
+    AfreecaTVGlobalIE,
+)
  from .airmozilla import AirMozillaIE
  from .aljazeera import AlJazeeraIE
  from .alphaporno import AlphaPornoIE
@@ -38,10 +41,7 @@ from .amcnetworks import AMCNetworksIE
  from .animeondemand import AnimeOnDemandIE
  from .anitube import AnitubeIE
  from .anysex import AnySexIE
-from .aol import (
-    AolIE,
-    AolFeaturesIE,
-)
+from .aol import AolIE
  from .allocine import AllocineIE
  from .aparat import AparatIE
  from .appleconnect import AppleConnectIE
@@ -80,6 +80,10 @@ from .awaan import (
      AWAANLiveIE,
      AWAANSeasonIE,
  )
+from .azmedien import (
+    AZMedienIE,
+    AZMedienPlaylistIE,
+)
  from .azubu import AzubuIE, AzubuLiveIE
  from .baidu import BaiduVideoIE
  from .bambuser import BambuserIE, BambuserChannelIE
@@ -91,6 +95,7 @@ from .bbc import (
      BBCCoUkPlaylistIE,
      BBCIE,
  )
+from .beampro import BeamProLiveIE
  from .beeg import BeegIE
  from .behindkink import BehindKinkIE
  from .bellmedia import BellMediaIE
@@ -98,7 +103,10 @@ from .beatport import BeatportIE
  from .bet import BetIE
  from .bigflix import BigflixIE
  from .bild import BildIE
-from .bilibili import BiliBiliIE
+from .bilibili import (
+    BiliBiliIE,
+    BiliBiliBangumiIE,
+)
  from .biobiochiletv import BioBioChileTVIE
  from .biqle import BIQLEIE
  from .bleacherreport import (
@@ -150,6 +158,7 @@ from .cbsnews import (
  )
  from .cbssports import CBSSportsIE
  from .ccc import CCCIE
+from .ccma import CCMAIE
  from .cctv import CCTVIE
  from .cda import CDAIE
  from .ceskatelevize import CeskaTelevizeIE
@@ -239,12 +248,16 @@ from .dramafever import (
  from .dreisat import DreiSatIE
  from .drbonanza import DRBonanzaIE
  from .drtuber import DrTuberIE
-from .drtv import DRTVIE
+from .drtv import (
+    DRTVIE,
+    DRTVLiveIE,
+)
  from .dvtv import DVTVIE
  from .dumpert import DumpertIE
  from .defense import DefenseGouvFrIE
  from .discovery import DiscoveryIE
  from .discoverygo import DiscoveryGoIE
+from .disney import DisneyIE
  from .dispeak import DigitallySpeakingIE
  from .dropbox import DropboxIE
  from .dw import (
@@ -254,6 +267,7 @@ from .dw import (
  from .eagleplatform import EaglePlatformIE
  from .ebaumsworld import EbaumsWorldIE
  from .echomsk import EchoMskIE
+from .egghead import EggheadCourseIE
  from .ehow import EHowIE
  from .eighttracks import EightTracksIE
  from .einthusan import EinthusanIE
@@ -288,6 +302,10 @@ from .fc2 import (
      FC2EmbedIE,
  )
  from .fczenit import FczenitIE
+from .filmon import (
+    FilmOnIE,
+    FilmOnChannelIE,
+)
  from .firstpost import FirstpostIE
  from .firsttv import FirstTVIE
  from .fivemin import FiveMinIE
@@ -319,7 +337,6 @@ from .francetv import (
  )
  from .freesound import FreesoundIE
  from .freespeech import FreespeechIE
-from .freevideo import FreeVideoIE
  from .funimation import FunimationIE
  from .funnyordie import FunnyOrDieIE
  from .fusion import FusionIE
@@ -332,6 +349,7 @@ from .gameone import (
  from .gamersyde import GamersydeIE
  from .gamespot import GameSpotIE
  from .gamestar import GameStarIE
+from .gaskrank import GaskrankIE
  from .gazeta import GazetaIE
  from .gdcvault import GDCVaultIE
  from .generic import GenericIE
@@ -369,6 +387,7 @@ from .hgtv import (
  )
  from .historicfilms import HistoricFilmsIE
  from .hitbox import HitboxIE, HitboxLiveIE
+from .hitrecord import HitRecordIE
  from .hornbunny import HornBunnyIE
  from .hotnewhiphop import HotNewHipHopIE
  from .hotstar import HotStarIE
@@ -396,6 +415,7 @@ from .imgur import (
      ImgurAlbumIE,
  )
  from .ina import InaIE
+from .inc import IncIE
  from .indavideo import (
      IndavideoIE,
      IndavideoEmbedIE,
@@ -406,6 +426,7 @@ from .internetvideoarchive import InternetVideoArchiveIE
  from .iprima import IPrimaIE
  from .iqiyi import IqiyiIE
  from .ir90tv import Ir90TvIE
+from .itv import ITVIE
  from .ivi import (
      IviIE,
      IviCompilationIE
@@ -446,7 +467,10 @@ from .kuwo import (
      KuwoMvIE,
  )
  from .la7 import LA7IE
-from .laola1tv import Laola1TvIE
+from .laola1tv import (
+    Laola1TvEmbedIE,
+    Laola1TvIE,
+)
  from .lci import LCIIE
  from .lcp import (
      LcpPlayIE,
@@ -498,6 +522,8 @@ from .mangomolo import (
  )
  from .matchtv import MatchTVIE
  from .mdr import MDRIE
+from .meipai import MeipaiIE
+from .melonvod import MelonVODIE
  from .meta import METAIE
  from .metacafe import MetacafeIE
  from .metacritic import MetacriticIE
@@ -539,6 +565,7 @@ from .mtv import (
      MTVVideoIE,
      MTVServicesEmbeddedIE,
      MTVDEIE,
+    MTV81IE,
  )
  from .muenchentv import MuenchenTVIE
  from .musicplayon import MusicPlayOnIE
@@ -588,6 +615,7 @@ from .nextmedia import (
      NextMediaIE,
      NextMediaActionNewsIE,
      AppleDailyIE,
+    NextTVIE,
  )
  from .nfb import NFBIE
  from .nfl import NFLIE
@@ -649,6 +677,9 @@ from .nrk import (
      NRKPlaylistIE,
      NRKSkoleIE,
      NRKTVIE,
+    NRKTVDirekteIE,
+    NRKTVEpisodesIE,
+    NRKTVSeriesIE,
  )
  from .ntvde import NTVDeIE
  from .ntvru import NTVRuIE
@@ -661,6 +692,7 @@ from .nzz import NZZIE
  from .odatv import OdaTVIE
  from .odnoklassniki import OdnoklassnikiIE
  from .oktoberfesttv import OktoberfestTVIE
+from .ondemandkorea import OnDemandKoreaIE
  from .onet import (
      OnetIE,
      OnetChannelIE,
@@ -691,6 +723,7 @@ from .periscope import (
  from .philharmoniedeparis import PhilharmonieDeParisIE
  from .phoenix import PhoenixIE
  from .photobucket import PhotobucketIE
+from .piksel import PikselIE
  from .pinkbike import PinkbikeIE
  from .pladform import PladformIE
  from .playfm import PlayFMIE
@@ -710,6 +743,7 @@ from .polskieradio import (
  )
  from .porn91 import Porn91IE
  from .porncom import PornComIE
+from .pornflip import PornFlipIE
  from .pornhd import PornHdIE
  from .pornhub import (
      PornHubIE,
@@ -804,7 +838,6 @@ from .sbs import SBSIE
  from .scivee import SciVeeIE
  from .screencast import ScreencastIE
  from .screencastomatic import ScreencastOMaticIE
-from .screenjunkies import ScreenJunkiesIE
  from .seeker import SeekerIE
  from .senateisvp import SenateISVPIE
  from .sendtonews import SendtoNewsIE
@@ -815,7 +848,7 @@ from .shared import (
      SharedIE,
      VivoIE,
  )
-from .sharesix import ShareSixIE
+from .showroomlive import ShowRoomLiveIE
  from .sina import SinaIE
  from .sixplay import SixPlayIE
  from .skynewsarabia import (
@@ -859,10 +892,7 @@ from .spiegeltv import SpiegeltvIE
  from .spike import SpikeIE
  from .stitcher import StitcherIE
  from .sport5 import Sport5IE
-from .sportbox import (
-    SportBoxIE,
-    SportBoxEmbedIE,
-)
+from .sportbox import SportBoxEmbedIE
  from .sportdeutschland import SportDeutschlandIE
  from .sportschau import SportschauIE
  from .srgssr import (
@@ -966,6 +996,7 @@ from .tv2 import (
  )
  from .tv3 import TV3IE
  from .tv4 import TV4IE
+from .tva import TVAIE
  from .tvanouvelles import (
      TVANouvellesIE,
      TVANouvellesArticleIE,
@@ -998,7 +1029,10 @@ from .twitch import (
      TwitchChapterIE,
      TwitchVodIE,
      TwitchProfileIE,
+    TwitchAllVideosIE,
+    TwitchUploadsIE,
      TwitchPastBroadcastsIE,
+    TwitchHighlightsIE,
      TwitchStreamIE,
      TwitchClipsIE,
  )
@@ -1012,6 +1046,7 @@ from .udemy import (
      UdemyCourseIE
  )
  from .udn import UDNEmbedIE
+from .uktvplay import UKTVPlayIE
  from .digiteka import DigitekaIE
  from .unistra import UnistraIE
  from .uol import UOLIE
@@ -1051,6 +1086,7 @@ from .vice import (
  from .viceland import VicelandIE
  from .vidbit import VidbitIE
  from .viddler import ViddlerIE
+from .videa import VideaIE
  from .videodetective import VideoDetectiveIE
  from .videofyme import VideofyMeIE
  from .videomega import VideoMegaIE
@@ -1060,7 +1096,7 @@ from .videomore import (
      VideomoreSeasonIE,
  )
  from .videopremium import VideoPremiumIE
-from .videott import VideoTtIE
+from .videopress import VideoPressIE
  from .vidio import VidioIE
  from .vidme import (
      VidmeIE,
@@ -1095,12 +1131,20 @@ from .viki import (
      VikiIE,
      VikiChannelIE,
  )
+from .viu import (
+    ViuIE,
+    ViuPlaylistIE,
+    ViuOTTIE,
+)
  from .vk import (
      VKIE,
      VKUserVideosIE,
      VKWallPostIE,
  )
-from .vlive import VLiveIE
+from .vlive import (
+    VLiveIE,
+    VLiveChannelIE
+)
  from .vodlocker import VodlockerIE
  from .vodplatform import VODPlatformIE
  from .voicerepublic import VoiceRepublicIE
@@ -1109,6 +1153,7 @@ from .vporn import VpornIE
  from .vrt import VRTIE
  from .vube import VubeIE
  from .vuclip import VuClipIE
+from .vvvvid import VVVVIDIE
  from .vyborymos import VyboryMosIE
  from .vzaar import VzaarIE
  from .walla import WallaIE
diff --git a/youtube_dl/extractor/facebook.py b/youtube_dl/extractor/facebook.py
index b4d38e5c258b830e192bcfa2639f2074d9217434..b325c82004b8aedc612cf3656c54816dcaf48e94 100644
--- a/youtube_dl/extractor/facebook.py
+++ b/youtube_dl/extractor/facebook.py
@@ -12,14 +12,16 @@ from ..compat import (
      compat_urllib_parse_unquote_plus,
  )
  from ..utils import (
+    clean_html,
      error_to_compat_str,
      ExtractorError,
+    get_element_by_id,
      int_or_none,
+    js_to_json,
      limit_length,
      sanitized_Request,
+    try_get,
      urlencode_postdata,
-    get_element_by_id,
-    clean_html,
  )
  
  
@@ -27,7 +29,7 @@ class FacebookIE(InfoExtractor):
      _VALID_URL = r'''(?x)
                  (?:
                      https?://
-                        (?:[\w-]+\.)?facebook\.com/
+                        (?:[\w-]+\.)?(?:facebook\.com|facebookcorewwwi\.onion)/
                          (?:[^#]*?\#!/)?
                          (?:
                              (?:
@@ -71,7 +73,7 @@ class FacebookIE(InfoExtractor):
          'info_dict': {
              'id': '274175099429670',
              'ext': 'mp4',
-            'title': 'Facebook video #274175099429670',
+            'title': 'Asif Nawab Butt posted a video to his Timeline.',
              'uploader': 'Asif Nawab Butt',
              'upload_date': '20140506',
              'timestamp': 1399398998,
@@ -150,6 +152,9 @@ class FacebookIE(InfoExtractor):
      }, {
          'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/',
          'only_matching': True,
+    }, {
+        'url': 'https://www.facebookcorewwwi.onion/video.php?v=274175099429670',
+        'only_matching': True,
      }]
  
      @staticmethod
@@ -240,12 +245,30 @@ class FacebookIE(InfoExtractor):
  
          video_data = None
  
+        def extract_video_data(instances):
+            for item in instances:
+                if item[1][0] == 'VideoConfig':
+                    video_item = item[2][0]
+                    if video_item.get('video_id') == video_id:
+                        return video_item['videoData']
+
          server_js_data = self._parse_json(self._search_regex(
-            r'handleServerJS\(({.+})(?:\);|,")', webpage, 'server js data', default='{}'), video_id)
-        for item in server_js_data.get('instances', []):
-            if item[1][0] == 'VideoConfig':
-                video_data = item[2][0]['videoData']
-                break
+            r'handleServerJS\(({.+})(?:\);|,")', webpage,
+            'server js data', default='{}'), video_id, fatal=False)
+
+        if server_js_data:
+            video_data = extract_video_data(server_js_data.get('instances', []))
+
+        if not video_data:
+            server_js_data = self._parse_json(
+                self._search_regex(
+                    r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+stream_pagelet',
+                    webpage, 'js data', default='{}'),
+                video_id, transform_source=js_to_json, fatal=False)
+            if server_js_data:
+                video_data = extract_video_data(try_get(
+                    server_js_data, lambda x: x['jsmods']['instances'],
+                    list) or [])
  
          if not video_data:
              if not fatal_if_no_video:
@@ -255,6 +278,8 @@ class FacebookIE(InfoExtractor):
                  raise ExtractorError(
                      'The video is not available, Facebook said: "%s"' % m_msg.group(1),
                      expected=True)
+            elif '>You must log in to continue' in webpage:
+                self.raise_login_required()
              else:
                  raise ExtractorError('Cannot parse data')
  
@@ -293,10 +318,16 @@ class FacebookIE(InfoExtractor):
              video_title = self._html_search_regex(
                  r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
                  webpage, 'alternative title', default=None)
-            video_title = limit_length(video_title, 80)
          if not video_title:
+            video_title = self._html_search_meta(
+                'description', webpage, 'title')
+        if video_title:
+            video_title = limit_length(video_title, 80)
+        else:
              video_title = 'Facebook video #%s' % video_id
-        uploader = clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage))
+        uploader = clean_html(get_element_by_id(
+            'fbPhotoPageAuthorName', webpage)) or self._search_regex(
+            r'ownerName\s*:\s*"([^"]+)"', webpage, 'uploader', fatal=False)
          timestamp = int_or_none(self._search_regex(
              r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
              'timestamp', default=None))
diff --git a/youtube_dl/extractor/fc2.py b/youtube_dl/extractor/fc2.py
index c032d4d0282cc7907b08ec42de9ac842dd4a34c2..448647d727159d97b2f940e76136888af1abc64a 100644
--- a/youtube_dl/extractor/fc2.py
+++ b/youtube_dl/extractor/fc2.py
@@ -133,7 +133,7 @@ class FC2EmbedIE(InfoExtractor):
              'id': '201403223kCqB3Ez',
              'ext': 'flv',
              'title': 'プリズン･ブレイク S1-01 マイケル 【吹替】',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/filmon.py b/youtube_dl/extractor/filmon.py
new file mode 100644
index 0000000..f775fe0
--- /dev/null
+++ b/youtube_dl/extractor/filmon.py
@@ -0,0 +1,178 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_HTTPError,
+)
+from ..utils import (
+    qualities,
+    strip_or_none,
+    int_or_none,
+    ExtractorError,
+)
+
+
+class FilmOnIE(InfoExtractor):
+    IE_NAME = 'filmon'
+    _VALID_URL = r'(?:https?://(?:www\.)?filmon\.com/vod/view/|filmon:)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.filmon.com/vod/view/24869-0-plan-9-from-outer-space',
+        'info_dict': {
+            'id': '24869',
+            'ext': 'mp4',
+            'title': 'Plan 9 From Outer Space',
+            'description': 'Dead human, zombies and vampires',
+        },
+    }, {
+        'url': 'https://www.filmon.com/vod/view/2825-1-popeye-series-1',
+        'info_dict': {
+            'id': '2825',
+            'title': 'Popeye Series 1',
+            'description': 'The original series of Popeye.',
+        },
+        'playlist_mincount': 8,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        try:
+            response = self._download_json(
+                'https://www.filmon.com/api/vod/movie?id=%s' % video_id,
+                video_id)['response']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError):
+                errmsg = self._parse_json(e.cause.read().decode(), video_id)['reason']
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, errmsg), expected=True)
+            raise
+
+        title = response['title']
+        description = strip_or_none(response.get('description'))
+
+        if response.get('type_id') == 1:
+            entries = [self.url_result('filmon:' + episode_id) for episode_id in response.get('episodes', [])]
+            return self.playlist_result(entries, video_id, title, description)
+
+        QUALITY = qualities(('low', 'high'))
+        formats = []
+        for format_id, stream in response.get('streams', {}).items():
+            stream_url = stream.get('url')
+            if not stream_url:
+                continue
+            formats.append({
+                'format_id': format_id,
+                'url': stream_url,
+                'ext': 'mp4',
+                'quality': QUALITY(stream.get('quality')),
+                'protocol': 'm3u8_native',
+            })
+        self._sort_formats(formats)
+
+        thumbnails = []
+        poster = response.get('poster', {})
+        thumbs = poster.get('thumbs', {})
+        thumbs['poster'] = poster
+        for thumb_id, thumb in thumbs.items():
+            thumb_url = thumb.get('url')
+            if not thumb_url:
+                continue
+            thumbnails.append({
+                'id': thumb_id,
+                'url': thumb_url,
+                'width': int_or_none(thumb.get('width')),
+                'height': int_or_none(thumb.get('height')),
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'description': description,
+            'thumbnails': thumbnails,
+        }
+
+
+class FilmOnChannelIE(InfoExtractor):
+    IE_NAME = 'filmon:channel'
+    _VALID_URL = r'https?://(?:www\.)?filmon\.com/(?:tv|channel)/(?P<id>[a-z0-9-]+)'
+    _TESTS = [{
+        # VOD
+        'url': 'http://www.filmon.com/tv/sports-haters',
+        'info_dict': {
+            'id': '4190',
+            'ext': 'mp4',
+            'title': 'Sports Haters',
+            'description': 'md5:dabcb4c1d9cfc77085612f1a85f8275d',
+        },
+    }, {
+        # LIVE
+        'url': 'https://www.filmon.com/channel/filmon-sports',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.filmon.com/tv/2894',
+        'only_matching': True,
+    }]
+
+    _THUMBNAIL_RES = [
+        ('logo', 56, 28),
+        ('big_logo', 106, 106),
+        ('extra_big_logo', 300, 300),
+    ]
+
+    def _real_extract(self, url):
+        channel_id = self._match_id(url)
+
+        try:
+            channel_data = self._download_json(
+                'http://www.filmon.com/api-v2/channel/' + channel_id, channel_id)['data']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError):
+                errmsg = self._parse_json(e.cause.read().decode(), channel_id)['message']
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, errmsg), expected=True)
+            raise
+
+        channel_id = compat_str(channel_data['id'])
+        is_live = not channel_data.get('is_vod') and not channel_data.get('is_vox')
+        title = channel_data['title']
+
+        QUALITY = qualities(('low', 'high'))
+        formats = []
+        for stream in channel_data.get('streams', []):
+            stream_url = stream.get('url')
+            if not stream_url:
+                continue
+            if not is_live:
+                formats.extend(self._extract_wowza_formats(
+                    stream_url, channel_id, skip_protocols=['dash', 'rtmp', 'rtsp']))
+                continue
+            quality = stream.get('quality')
+            formats.append({
+                'format_id': quality,
+                # this is an m3u8 stream, but we are deliberately not using _extract_m3u8_formats
+                # because it doesn't have bitrate variants anyway
+                'url': stream_url,
+                'ext': 'mp4',
+                'quality': QUALITY(quality),
+            })
+        self._sort_formats(formats)
+
+        thumbnails = []
+        for name, width, height in self._THUMBNAIL_RES:
+            thumbnails.append({
+                'id': name,
+                'url': 'http://static.filmon.com/assets/channels/%s/%s.png' % (channel_id, name),
+                'width': width,
+                'height': height,
+            })
+
+        return {
+            'id': channel_id,
+            'display_id': channel_data.get('alias'),
+            'title': self._live_title(title) if is_live else title,
+            'description': channel_data.get('description'),
+            'thumbnails': thumbnails,
+            'formats': formats,
+            'is_live': is_live,
+        }
diff --git a/youtube_dl/extractor/firsttv.py b/youtube_dl/extractor/firsttv.py
index 6b662cc3cd78e4acf661af473f2374b5ec2af05c..081c7184233d3e79d0a2d684bd693631b7600eb2 100644
--- a/youtube_dl/extractor/firsttv.py
+++ b/youtube_dl/extractor/firsttv.py
@@ -2,7 +2,10 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
      int_or_none,
      qualities,
@@ -22,9 +25,8 @@ class FirstTVIE(InfoExtractor):
          'info_dict': {
              'id': '40049',
              'ext': 'mp4',
-            'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
-            'description': 'md5:36a39c1d19618fec57d12efe212a8370',
-            'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
+            'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
+            'thumbnail': r're:^https?://.*\.(?:jpg|JPG)$',
              'upload_date': '20150212',
              'duration': 2694,
          },
@@ -34,9 +36,8 @@ class FirstTVIE(InfoExtractor):
          'info_dict': {
              'id': '364746',
              'ext': 'mp4',
-            'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
-            'description': 'md5:a242eea0031fd180a4497d52640a9572',
-            'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
+            'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
+            'thumbnail': r're:^https?://.*\.(?:jpg|JPG)$',
              'upload_date': '20160407',
              'duration': 179,
              'formats': 'mincount:3',
@@ -44,6 +45,17 @@ class FirstTVIE(InfoExtractor):
          'params': {
              'skip_download': True,
          },
+    }, {
+        'url': 'http://www.1tv.ru/news/issue/2016-12-01/14:00',
+        'info_dict': {
+            'id': '14:00',
+            'title': 'Выпуск новостей в 14:00   1 декабря 2016 года. Новости. Первый канал',
+            'description': 'md5:2e921b948f8c1ff93901da78ebdb1dfd',
+        },
+        'playlist_count': 13,
+    }, {
+        'url': 'http://www.1tv.ru/shows/tochvtoch-supersezon/vystupleniya/evgeniy-dyatlov-vladimir-vysockiy-koni-priveredlivye-toch-v-toch-supersezon-fragment-vypuska-ot-06-11-2016',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -51,43 +63,91 @@ class FirstTVIE(InfoExtractor):
  
          webpage = self._download_webpage(url, display_id)
          playlist_url = compat_urlparse.urljoin(url, self._search_regex(
-            r'data-playlist-url="([^"]+)', webpage, 'playlist url'))
+            r'data-playlist-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
+            webpage, 'playlist url', group='url'))
+
+        parsed_url = compat_urlparse.urlparse(playlist_url)
+        qs = compat_urlparse.parse_qs(parsed_url.query)
+        item_ids = qs.get('videos_ids[]') or qs.get('news_ids[]')
+
+        items = self._download_json(playlist_url, display_id)
+
+        if item_ids:
+            items = [
+                item for item in items
+                if item.get('uid') and compat_str(item['uid']) in item_ids]
+        else:
+            items = [items[0]]
+
+        entries = []
+        QUALITIES = ('ld', 'sd', 'hd', )
+
+        for item in items:
+            title = item['title']
+            quality = qualities(QUALITIES)
+            formats = []
+            path = None
+            for f in item.get('mbr', []):
+                src = f.get('src')
+                if not src or not isinstance(src, compat_str):
+                    continue
+                tbr = int_or_none(self._search_regex(
+                    r'_(\d{3,})\.mp4', src, 'tbr', default=None))
+                if not path:
+                    path = self._search_regex(
+                        r'//[^/]+/(.+?)_\d+\.mp4', src,
+                        'm3u8 path', default=None)
+                formats.append({
+                    'url': src,
+                    'format_id': f.get('name'),
+                    'tbr': tbr,
+                    'source_preference': quality(f.get('name')),
+                })
+            # m3u8 URL format is reverse engineered from [1] (search for
+            # master.m3u8). dashEdges (that is currently balancer-vod.1tv.ru)
+            # is taken from [2].
+            # 1. http://static.1tv.ru/player/eump1tv-current/eump-1tv.all.min.js?rnd=9097422834:formatted
+            # 2. http://static.1tv.ru/player/eump1tv-config/config-main.js?rnd=9097422834
+            if not path and len(formats) == 1:
+                path = self._search_regex(
+                    r'//[^/]+/(.+?$)', formats[0]['url'],
+                    'm3u8 path', default=None)
+            if path:
+                if len(formats) == 1:
+                    m3u8_path = ','
+                else:
+                    tbrs = [compat_str(t) for t in sorted(f['tbr'] for f in formats)]
+                    m3u8_path = '_,%s,%s' % (','.join(tbrs), '.mp4')
+                formats.extend(self._extract_m3u8_formats(
+                    'http://balancer-vod.1tv.ru/%s%s.urlset/master.m3u8'
+                    % (path, m3u8_path),
+                    display_id, 'mp4',
+                    entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+            self._sort_formats(formats)
+
+            thumbnail = item.get('poster') or self._og_search_thumbnail(webpage)
+            duration = int_or_none(item.get('duration') or self._html_search_meta(
+                'video:duration', webpage, 'video duration', fatal=False))
+            upload_date = unified_strdate(self._html_search_meta(
+                'ya:ovs:upload_date', webpage, 'upload date', default=None))
  
-        item = self._download_json(playlist_url, display_id)[0]
-        video_id = item['id']
-        quality = qualities(('ld', 'sd', 'hd', ))
-        formats = []
-        for f in item.get('mbr', []):
-            src = f.get('src')
-            if not src:
-                continue
-            fname = f.get('name')
-            formats.append({
-                'url': src,
-                'format_id': fname,
-                'quality': quality(fname),
+            entries.append({
+                'id': compat_str(item.get('id') or item['uid']),
+                'thumbnail': thumbnail,
+                'title': title,
+                'upload_date': upload_date,
+                'duration': int_or_none(duration),
+                'formats': formats
              })
-        self._sort_formats(formats)
  
          title = self._html_search_regex(
              (r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
               r"'title'\s*:\s*'([^']+)'"),
-            webpage, 'title', default=None) or item['title']
+            webpage, 'title', default=None) or self._og_search_title(
+            webpage, default=None)
          description = self._html_search_regex(
              r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
              webpage, 'description', default=None) or self._html_search_meta(
-            'description', webpage, 'description')
-        duration = int_or_none(self._html_search_meta(
-            'video:duration', webpage, 'video duration', fatal=False))
-        upload_date = unified_strdate(self._html_search_meta(
-            'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
+            'description', webpage, 'description', default=None)
  
-        return {
-            'id': video_id,
-            'thumbnail': item.get('poster') or self._og_search_thumbnail(webpage),
-            'title': title,
-            'description': description,
-            'upload_date': upload_date,
-            'duration': int_or_none(duration),
-            'formats': formats
-        }
+        return self.playlist_result(entries, display_id, title, description)
diff --git a/youtube_dl/extractor/fivetv.py b/youtube_dl/extractor/fivetv.py
index 13fbc4da2c6fbc7c535c49a66e2a64f9dc042511..15736c9fe91e6d5a860641bcbd3be49636b83b47 100644
--- a/youtube_dl/extractor/fivetv.py
+++ b/youtube_dl/extractor/fivetv.py
@@ -25,7 +25,7 @@ class FiveTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Россияне выбрали имя для общенациональной платежной системы',
              'description': 'md5:a8aa13e2b7ad36789e9f77a74b6de660',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 180,
          },
      }, {
@@ -35,7 +35,7 @@ class FiveTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': '3D принтер',
              'description': 'md5:d76c736d29ef7ec5c0cf7d7c65ffcb41',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 180,
          },
      }, {
@@ -44,7 +44,7 @@ class FiveTVIE(InfoExtractor):
              'id': 'glavnoe',
              'ext': 'mp4',
              'title': 'Итоги недели с 8 по 14 июня 2015 года',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/',
diff --git a/youtube_dl/extractor/fktv.py b/youtube_dl/extractor/fktv.py
index a3a2915998dc1cc2fca8f5ccdf6cec6cac0d528b..2958452f470bca7f7322fa9dcdccca66f525cee0 100644
--- a/youtube_dl/extractor/fktv.py
+++ b/youtube_dl/extractor/fktv.py
@@ -19,7 +19,7 @@ class FKTVIE(InfoExtractor):
              'id': '1',
              'ext': 'mp4',
              'title': 'Folge 1 vom 10. April 2007',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/flipagram.py b/youtube_dl/extractor/flipagram.py
index 1902a23938eb0452eb83908ef6f12f8576bf0617..b7be40f1b90f4f7ace02ba2b2687fe1e2e61ce30 100644
--- a/youtube_dl/extractor/flipagram.py
+++ b/youtube_dl/extractor/flipagram.py
@@ -81,7 +81,7 @@ class FlipagramIE(InfoExtractor):
              'filesize': int_or_none(cover.get('size')),
          } for cover in flipagram.get('covers', []) if cover.get('url')]
  
-        # Note that this only retrieves comments that are initally loaded.
+        # Note that this only retrieves comments that are initially loaded.
          # For videos with large amounts of comments, most won't be retrieved.
          comments = []
          for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
diff --git a/youtube_dl/extractor/foxgay.py b/youtube_dl/extractor/foxgay.py
index 39174fcecca44b54ce42a174f59f3d14fbec2592..e887ae48869426617fdbf797182cef93f97ac2ef 100644
--- a/youtube_dl/extractor/foxgay.py
+++ b/youtube_dl/extractor/foxgay.py
@@ -20,7 +20,7 @@ class FoxgayIE(InfoExtractor):
              'title': 'Fuck Turkish-style',
              'description': 'md5:6ae2d9486921891efe89231ace13ffdf',
              'age_limit': 18,
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/foxnews.py b/youtube_dl/extractor/foxnews.py
index 229bcb175789ee78b12ae71dbcca811de69d9b65..dc0662f74ce5a84d59aa94333ee14d56a592cda2 100644
--- a/youtube_dl/extractor/foxnews.py
+++ b/youtube_dl/extractor/foxnews.py
@@ -22,7 +22,7 @@ class FoxNewsIE(AMPIE):
                  'duration': 265,
                  'timestamp': 1304411491,
                  'upload_date': '20110503',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -36,7 +36,7 @@ class FoxNewsIE(AMPIE):
                  'duration': 292,
                  'timestamp': 1417662047,
                  'upload_date': '20141204',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  # m3u8 download
@@ -111,7 +111,7 @@ class FoxNewsInsiderIE(InfoExtractor):
              'description': 'Is campus censorship getting out of control?',
              'timestamp': 1472168725,
              'upload_date': '20160825',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/franceculture.py b/youtube_dl/extractor/franceculture.py
index 56048ffc21e8de8810b7e6b10122cc621927fbba..b98da692cb23ccc1a6de7a8657f0d8331640280f 100644
--- a/youtube_dl/extractor/franceculture.py
+++ b/youtube_dl/extractor/franceculture.py
@@ -17,7 +17,7 @@ class FranceCultureIE(InfoExtractor):
              'display_id': 'rendez-vous-au-pays-des-geeks',
              'ext': 'mp3',
              'title': 'Rendez-vous au pays des geeks',
-            'thumbnail': 're:^https?://.*\\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140301',
              'vcodec': 'none',
          }
diff --git a/youtube_dl/extractor/francetv.py b/youtube_dl/extractor/francetv.py
index e7068d1aed9573199211a29a91486bd72e9aecd0..48d43ae58e80bd3b054068e59f4e43464e31ec0f 100644
--- a/youtube_dl/extractor/francetv.py
+++ b/youtube_dl/extractor/francetv.py
@@ -168,7 +168,7 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
              'id': 'NI_173343',
              'ext': 'mp4',
              'title': 'Les entreprises familiales : le secret de la réussite',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
              'timestamp': 1433273139,
              'upload_date': '20150602',
          },
@@ -184,7 +184,7 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
              'ext': 'mp4',
              'title': 'Olivier Monthus, réalisateur de "Bretagne, le choix de l’Armor"',
              'description': 'md5:a3264114c9d29aeca11ced113c37b16c',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
              'timestamp': 1458300695,
              'upload_date': '20160318',
          },
diff --git a/youtube_dl/extractor/freesound.py b/youtube_dl/extractor/freesound.py
index 5ff62af2a33d1743709bdb076dc0c80be0e3156b..138b6bc58cf9aa282c8afc3b6498ba84884197fc 100644
--- a/youtube_dl/extractor/freesound.py
+++ b/youtube_dl/extractor/freesound.py
@@ -3,10 +3,16 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    get_element_by_class,
+    get_element_by_id,
+    unified_strdate,
+)
  
  
  class FreesoundIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?freesound\.org/people/([^/]+)/sounds/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?freesound\.org/people/[^/]+/sounds/(?P<id>[^/]+)'
      _TEST = {
          'url': 'http://www.freesound.org/people/miklovan/sounds/194503/',
          'md5': '12280ceb42c81f19a515c745eae07650',
@@ -14,26 +20,60 @@ class FreesoundIE(InfoExtractor):
              'id': '194503',
              'ext': 'mp3',
              'title': 'gulls in the city.wav',
-            'uploader': 'miklovan',
              'description': 'the sounds of seagulls in the city',
+            'duration': 130.233,
+            'uploader': 'miklovan',
+            'upload_date': '20130715',
+            'tags': list,
          }
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        music_id = mobj.group('id')
-        webpage = self._download_webpage(url, music_id)
-        title = self._html_search_regex(
-            r'<div id="single_sample_header">.*?<a href="#">(.+?)</a>',
-            webpage, 'music title', flags=re.DOTALL)
+        audio_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, audio_id)
+
+        audio_url = self._og_search_property('audio', webpage, 'song url')
+        title = self._og_search_property('audio:title', webpage, 'song title')
+
          description = self._html_search_regex(
-            r'<div id="sound_description">(.*?)</div>', webpage, 'description',
-            fatal=False, flags=re.DOTALL)
+            r'(?s)id=["\']sound_description["\'][^>]*>(.+?)</div>',
+            webpage, 'description', fatal=False)
+
+        duration = float_or_none(
+            get_element_by_class('duration', webpage), scale=1000)
+
+        upload_date = unified_strdate(get_element_by_id('sound_date', webpage))
+        uploader = self._og_search_property(
+            'audio:artist', webpage, 'uploader', fatal=False)
+
+        channels = self._html_search_regex(
+            r'Channels</dt><dd>(.+?)</dd>', webpage,
+            'channels info', fatal=False)
+
+        tags_str = get_element_by_class('tags', webpage)
+        tags = re.findall(r'<a[^>]+>([^<]+)', tags_str) if tags_str else None
+
+        audio_urls = [audio_url]
+
+        LQ_FORMAT = '-lq.mp3'
+        if LQ_FORMAT in audio_url:
+            audio_urls.append(audio_url.replace(LQ_FORMAT, '-hq.mp3'))
+
+        formats = [{
+            'url': format_url,
+            'format_note': channels,
+            'quality': quality,
+        } for quality, format_url in enumerate(audio_urls)]
+        self._sort_formats(formats)
  
          return {
-            'id': music_id,
+            'id': audio_id,
              'title': title,
-            'url': self._og_search_property('audio', webpage, 'music url'),
-            'uploader': self._og_search_property('audio:artist', webpage, 'music uploader'),
              'description': description,
+            'duration': duration,
+            'uploader': uploader,
+            'upload_date': upload_date,
+            'tags': tags,
+            'formats': formats,
          }
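Reviewer note on the freesound.py hunk above: the format list is built from a single og:audio URL, and the high-quality entry is derived by string substitution rather than read from the page. The following is a minimal standalone sketch of that construction, not the extractor itself. The function name `build_formats` and the example.invalid URL are illustrative assumptions; only the `-lq.mp3` / `-hq.mp3` suffixes and the dictionary keys come from the diff.

```python
def build_formats(audio_url, channels=None):
    """Rebuild the format list from the hunk above for one audio URL.

    Hypothetical helper for illustration; the real code lives inline in
    FreesoundIE._real_extract and also calls self._sort_formats().
    """
    audio_urls = [audio_url]

    lq_suffix = '-lq.mp3'
    if lq_suffix in audio_url:
        # The HQ URL is guessed by substitution, so it is only as reliable
        # as the site's naming scheme.
        audio_urls.append(audio_url.replace(lq_suffix, '-hq.mp3'))

    # enumerate() gives the LQ entry quality 0 and the derived HQ entry
    # quality 1, so the HQ entry ranks higher.
    return [{
        'url': format_url,
        'format_note': channels,
        'quality': quality,
    } for quality, format_url in enumerate(audio_urls)]


formats = build_formats('http://example.invalid/previews/194503-lq.mp3', 'Stereo')
single = build_formats('http://example.invalid/previews/194503.mp3')
```

When the URL does not contain the low-quality suffix, the list has a single entry and no HQ variant is guessed.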
diff --git a/youtube_dl/extractor/freevideo.py b/youtube_dl/extractor/freevideo.py
deleted file mode 100644
index cd8423a..0000000
--- a/youtube_dl/extractor/freevideo.py
+++ /dev/null
@@ -1,38 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import ExtractorError
-
-
-class FreeVideoIE(InfoExtractor):
-    _VALID_URL = r'^https?://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])'
-
-    _TEST = {
-        'url': 'http://www.freevideo.cz/vase-videa/vysukany-zadecek-22033.html',
-        'info_dict': {
-            'id': 'vysukany-zadecek-22033',
-            'ext': 'mp4',
-            'title': 'vysukany-zadecek-22033',
-            'age_limit': 18,
-        },
-        'skip': 'Blocked outside .cz',
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage, handle = self._download_webpage_handle(url, video_id)
-        if '//www.czechav.com/' in handle.geturl():
-            raise ExtractorError(
-                'Access to freevideo is blocked from your location',
-                expected=True)
-
-        video_url = self._search_regex(
-            r'\s+url: "(http://[a-z0-9-]+.cdn.freevideo.cz/stream/.*?/video.mp4)"',
-            webpage, 'video URL')
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': video_id,
-            'age_limit': 18,
-        }
diff --git a/youtube_dl/extractor/funimation.py b/youtube_dl/extractor/funimation.py
index 0ad0d9b6a9fe789228487e861139fa2166d88767..eba00cd5acc0c8d931173a5f85e2e1fa03c2f78f 100644
--- a/youtube_dl/extractor/funimation.py
+++ b/youtube_dl/extractor/funimation.py
@@ -29,7 +29,7 @@ class FunimationIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Air - 1 - Breeze',
              'description': 'md5:1769f43cd5fc130ace8fd87232207892',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
          },
          'skip': 'Access without user interaction is forbidden by CloudFlare, and video removed',
      }, {
@@ -40,7 +40,7 @@ class FunimationIE(InfoExtractor):
              'ext': 'mp4',
              'title': '.hack//SIGN - 1 - Role Play',
              'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
          },
          'skip': 'Access without user interaction is forbidden by CloudFlare',
      }, {
@@ -51,7 +51,7 @@ class FunimationIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Attack on Titan: Junior High - Broadcast Dub Preview',
              'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803',
-            'thumbnail': 're:https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:https?://.*\.(?:jpg|png)',
          },
          'skip': 'Access without user interaction is forbidden by CloudFlare',
      }]
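Reviewer note on the recurring `'re:...'` to `r're:...'` change: the string value is the same either way, because `\.` is not a recognized escape and the backslash is kept. The raw prefix matters because Python 3.6 and later warn about unrecognized escapes in non-raw literals. The sketch below shows the equivalence using a hand-doubled backslash in place of the old spelling, so that the example itself raises no warning. The `re:` prefix is left out because it is a marker for the test harness rather than part of the pattern, and the example.invalid URLs are assumptions.

```python
import re

# Raw literal: the backslash reaches the regex engine untouched.
THUMBNAIL_RE = r'^https?://.*\.jpe?g$'

# Same pattern as a non-raw literal with the backslash doubled by hand.
ESCAPED_RE = '^https?://.*\\.jpe?g$'

# Both spellings produce the identical string object contents.
same_value = THUMBNAIL_RE == ESCAPED_RE

# The escaped dot must match a literal '.', not any character.
matches = re.match(THUMBNAIL_RE, 'http://example.invalid/thumb.jpeg') is not None
rejects = re.match(THUMBNAIL_RE, 'http://example.invalid/thumbXjpg') is None
```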
diff --git a/youtube_dl/extractor/funnyordie.py b/youtube_dl/extractor/funnyordie.py
index f2928b5fecb68df5429a90ca079faac6faf93bb2..81c0ce9a360d3f28476905849565ff341c26b883 100644
--- a/youtube_dl/extractor/funnyordie.py
+++ b/youtube_dl/extractor/funnyordie.py
@@ -17,7 +17,7 @@ class FunnyOrDieIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Heart-Shaped Box: Literal Video Version',
              'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
-            'thumbnail': 're:^http:.*\.jpg$',
+            'thumbnail': r're:^http:.*\.jpg$',
          },
      }, {
          'url': 'http://www.funnyordie.com/embed/e402820827',
@@ -26,7 +26,7 @@ class FunnyOrDieIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Please Use This Song (Jon Lajoie)',
              'description': 'Please use this to sell something.  www.jonlajoie.com',
-            'thumbnail': 're:^http:.*\.jpg$',
+            'thumbnail': r're:^http:.*\.jpg$',
          },
          'params': {
              'skip_download': True,
diff --git a/youtube_dl/extractor/fusion.py b/youtube_dl/extractor/fusion.py
index b4ab4cbb7e0e86ea9a5ff43a02dfa0ec61e3d25e..ede729b5262c286c347b544fe7493bea020b5afd 100644
--- a/youtube_dl/extractor/fusion.py
+++ b/youtube_dl/extractor/fusion.py
@@ -29,7 +29,7 @@ class FusionIE(InfoExtractor):
          webpage = self._download_webpage(url, display_id)
  
          ooyala_code = self._search_regex(
-            r'data-video-id=(["\'])(?P<code>.+?)\1',
+            r'data-ooyala-id=(["\'])(?P<code>(?:(?!\1).)+)\1',
              webpage, 'ooyala code', group='code')
  
          return OoyalaIE._build_url_result(ooyala_code)
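Reviewer note on the fusion.py hunk above: besides switching the attribute name, the new pattern replaces `.+?` with `(?:(?!\1).)+`, which consumes characters only while the next one is not the opening quote. One plausible motivation, stated here as an assumption, is that the value can then never run past its own closing quote, for example when the attribute is empty. The sketch below exercises the pattern from the diff on made-up HTML; `find_code` is a hypothetical helper.

```python
import re

# Pattern copied from the '+' line of the hunk above.
OOYALA_RE = r'data-ooyala-id=(["\'])(?P<code>(?:(?!\1).)+)\1'


def find_code(html):
    """Return the quoted attribute value, or None when nothing matches."""
    mobj = re.search(OOYALA_RE, html)
    return mobj.group('code') if mobj else None


# The other kind of quote is allowed inside the value.
mixed = find_code('<div data-ooyala-id="abc\'def">')

# An empty attribute does not match at all, instead of swallowing the
# markup that follows it.
empty = find_code('<div data-ooyala-id="" class="player">')
```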
diff --git a/youtube_dl/extractor/gamersyde.py b/youtube_dl/extractor/gamersyde.py
index d545e01bb8db7a9694efa9d691f817cc9e394357..a218a6944d149d86a549db18a84d6f2ee31b796e 100644
--- a/youtube_dl/extractor/gamersyde.py
+++ b/youtube_dl/extractor/gamersyde.py
@@ -20,7 +20,7 @@ class GamersydeIE(InfoExtractor):
              'ext': 'mp4',
              'duration': 372,
              'title': 'Bloodborne - Birth of a hero',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/gamespot.py b/youtube_dl/extractor/gamespot.py
index 4e859e09aa16b7608ee103e851f0d0928bbfdb30..682c49e797aab0d63e5ecd7a7ab75e2a65f71e34 100644
--- a/youtube_dl/extractor/gamespot.py
+++ b/youtube_dl/extractor/gamespot.py
@@ -63,7 +63,7 @@ class GameSpotIE(OnceIE):
              streams, ('progressive_hd', 'progressive_high', 'progressive_low'))
          if progressive_url and manifest_url:
              qualities_basename = self._search_regex(
-                '/([^/]+)\.csmil/',
+                r'/([^/]+)\.csmil/',
                  manifest_url, 'qualities basename', default=None)
              if qualities_basename:
                  QUALITIES_RE = r'((,\d+)+,?)'
diff --git a/youtube_dl/extractor/gamestar.py b/youtube_dl/extractor/gamestar.py
index 55a34604af2cd2bca83ebc2c7957f1f4eb7401f1..e607d6ab8215db56afd3810613f24bb2debf63f5 100644
--- a/youtube_dl/extractor/gamestar.py
+++ b/youtube_dl/extractor/gamestar.py
@@ -18,7 +18,7 @@ class GameStarIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
              'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1406542020,
              'upload_date': '20140728',
              'duration': 17
diff --git a/youtube_dl/extractor/gaskrank.py b/youtube_dl/extractor/gaskrank.py
new file mode 100644
index 0000000..36ba7d8
--- /dev/null
+++ b/youtube_dl/extractor/gaskrank.py
@@ -0,0 +1,123 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    int_or_none,
+    js_to_json,
+    unified_strdate,
+)
+
+
+class GaskrankIE(InfoExtractor):
+    """InfoExtractor for gaskrank.tv"""
+    _VALID_URL = r'https?://(?:www\.)?gaskrank\.tv/tv/(?P<categories>[^/]+)/(?P<id>[^/]+)\.html?'
+    _TESTS = [
+        {
+            'url': 'http://www.gaskrank.tv/tv/motorrad-fun/strike-einparken-durch-anfaenger-crash-mit-groesserem-flurschaden.htm',
+            'md5': '1ae88dbac97887d85ebd1157a95fc4f9',
+            'info_dict': {
+                'id': '201601/26955',
+                'ext': 'mp4',
+                'title': 'Strike! Einparken können nur Männer - Flurschaden hält sich in Grenzen *lol*',
+                'thumbnail': r're:^https?://.*\.jpg$',
+                'categories': ['motorrad-fun'],
+                'display_id': 'strike-einparken-durch-anfaenger-crash-mit-groesserem-flurschaden',
+                'uploader_id': 'Bikefun',
+                'upload_date': '20170110',
+                'uploader_url': None,
+            }
+        },
+        {
+            'url': 'http://www.gaskrank.tv/tv/racing/isle-of-man-tt-2011-michael-du-15920.htm',
+            'md5': 'c33ee32c711bc6c8224bfcbe62b23095',
+            'info_dict': {
+                'id': '201106/15920',
+                'ext': 'mp4',
+                'title': 'Isle of Man - Michael Dunlop vs Guy Martin - schwindelig kucken',
+                'thumbnail': r're:^https?://.*\.jpg$',
+                'categories': ['racing'],
+                'display_id': 'isle-of-man-tt-2011-michael-du-15920',
+                'uploader_id': 'IOM',
+                'upload_date': '20160506',
+                'uploader_url': 'www.iomtt.com',
+            }
+        }
+    ]
+
+    def _real_extract(self, url):
+        """extract information from gaskrank.tv"""
+        def fix_json(code):
+            """Removes trailing comma in json: {{},} --> {{}}"""
+            return re.sub(r',\s*}', r'}', js_to_json(code))
+
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        categories = [re.match(self._VALID_URL, url).group('categories')]
+        title = self._search_regex(
+            r'movieName\s*:\s*\'([^\']*)\'',
+            webpage, 'title')
+        thumbnail = self._search_regex(
+            r'poster\s*:\s*\'([^\']*)\'',
+            webpage, 'thumbnail', default=None)
+
+        mobj = re.search(
+            r'Video von:\s*(?P<uploader_id>[^|]*?)\s*\|\s*vom:\s*(?P<upload_date>[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9])',
+            webpage)
+        groups = mobj.groupdict() if mobj is not None else {}
+        uploader_id = groups.get('uploader_id')
+        upload_date = unified_strdate(groups.get('upload_date'))
+
+        uploader_url = self._search_regex(
+            r'Homepage:\s*<[^>]*>(?P<uploader_url>[^<]*)',
+            webpage, 'uploader_url', default=None)
+        tags = re.findall(
+            r'/tv/tags/[^/]+/"\s*>(?P<tag>[^<]*?)<',
+            webpage)
+
+        view_count = self._search_regex(
+            r'class\s*=\s*"gkRight"(?:[^>]*>\s*<[^>]*)*icon-eye-open(?:[^>]*>\s*<[^>]*)*>\s*(?P<view_count>[0-9\.]*)',
+            webpage, 'view_count', default=None)
+        if view_count:
+            view_count = int_or_none(view_count.replace('.', ''))
+
+        average_rating = self._search_regex(
+            r'itemprop\s*=\s*"ratingValue"[^>]*>\s*(?P<average_rating>[0-9,]+)',
+            webpage, 'average_rating')
+        if average_rating:
+            average_rating = float_or_none(average_rating.replace(',', '.'))
+
+        playlist = self._parse_json(
+            self._search_regex(
+                r'playlist\s*:\s*\[([^\]]*)\]',
+                webpage, 'playlist', default='{}'),
+            display_id, transform_source=fix_json, fatal=False)
+
+        video_id = self._search_regex(
+            r'https?://movies\.gaskrank\.tv/([^-]*?)(-[^\.]*)?\.mp4',
+            playlist.get('0').get('src'), 'video id')
+
+        formats = []
+        for key in playlist:
+            formats.append({
+                'url': playlist[key]['src'],
+                'format_id': key,
+                'quality': playlist[key].get('quality')})
+        self._sort_formats(formats, field_preference=['format_id'])
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'thumbnail': thumbnail,
+            'categories': categories,
+            'display_id': display_id,
+            'uploader_id': uploader_id,
+            'upload_date': upload_date,
+            'uploader_url': uploader_url,
+            'tags': tags,
+            'view_count': view_count,
+            'average_rating': average_rating,
+        }
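Reviewer note on `fix_json` in the gaskrank.py hunk above: the helper removes a comma that directly precedes a closing brace so that the player configuration parses as JSON. The sketch below isolates that one substitution with the standard `json` module. The `js_to_json` step is omitted, and the sample payload and the name `strip_trailing_commas` are assumptions for illustration.

```python
import json
import re


def strip_trailing_commas(code):
    """Drop a comma that directly precedes a closing brace: {{},} -> {{}}.

    Same substitution as fix_json in the hunk above, without js_to_json.
    It is a textual rewrite, so it would also alter ',}' inside a string
    value and does not touch a comma before ']'.
    """
    return re.sub(r',\s*}', '}', code)


# Made-up payload shaped like the playlist the extractor expects:
# string keys mapping to objects with 'src' and 'quality'.
raw = '{"0": {"src": "http://example.invalid/a.mp4", "quality": "hd",},}'

playlist = json.loads(strip_trailing_commas(raw))
```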
diff --git a/youtube_dl/extractor/gazeta.py b/youtube_dl/extractor/gazeta.py
index 18ef5c252a9adc0ac2a1e6ae6806d2ea9b5b2546..57c67a4510f428c2c9f466532d8ddc1c19e2c809 100644
--- a/youtube_dl/extractor/gazeta.py
+++ b/youtube_dl/extractor/gazeta.py
@@ -16,7 +16,7 @@ class GazetaIE(InfoExtractor):
              'ext': 'mp4',
              'title': '«70–80 процентов гражданских в Донецке на грани голода»',
              'description': 'md5:38617526050bd17b234728e7f9620a71',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
          'skip': 'video not found',
      }, {
diff --git a/youtube_dl/extractor/generic.py b/youtube_dl/extractor/generic.py
index 3949c8bf7d5f3088b076b78f321fb6657075aded..1c233f038143bbdcda4630dc31c02faa0428bd6c 100644
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -29,6 +29,7 @@ from ..utils import (
      UnsupportedError,
      xpath_text,
  )
+from .commonprotocols import RtmpIE
  from .brightcove import (
      BrightcoveLegacyIE,
      BrightcoveNewIE,
@@ -73,8 +74,15 @@ from .kaltura import KalturaIE
  from .eagleplatform import EaglePlatformIE
  from .facebook import FacebookIE
  from .soundcloud import SoundcloudIE
+from .tunein import TuneInBaseIE
  from .vbox7 import Vbox7IE
  from .dbtv import DBTVIE
+from .piksel import PikselIE
+from .videa import VideaIE
+from .twentymin import TwentyMinutenIE
+from .ustream import UstreamIE
+from .openload import OpenloadIE
+from .videopress import VideoPressIE
  
  
  class GenericIE(InfoExtractor):
@@ -236,7 +244,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Tikibad ontruimd wegens brand',
                  'description': 'md5:05ca046ff47b931f9b04855015e163a4',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 33,
              },
              'params': {
@@ -297,7 +305,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'upload_date': '20130224',
                  'uploader_id': 'TheVerge',
-                'description': 're:^Chris Ziegler takes a look at the\.*',
+                'description': r're:^Chris Ziegler takes a look at the\.*',
                  'uploader': 'The Verge',
                  'title': 'First Firefox OS phones side-by-side',
              },
@@ -343,10 +351,10 @@ class GenericIE(InfoExtractor):
              },
              'skip': 'There is a limit of 200 free downloads / month for the test song',
          },
-        # embedded brightcove video
-        # it also tests brightcove videos that need to set the 'Referer' in the
-        # http requests
          {
+            # embedded brightcove video
+            # it also tests brightcove videos that need to set the 'Referer'
+            # in the http requests
              'add_ie': ['BrightcoveLegacy'],
              'url': 'http://www.bfmtv.com/video/bfmbusiness/cours-bourse/cours-bourse-l-analyse-technique-154522/',
              'info_dict': {
@@ -360,6 +368,24 @@ class GenericIE(InfoExtractor):
                  'skip_download': True,
              },
          },
+        {
+            # embedded with itemprop embedURL and video id spelled as `idVideo`
+            'add_ie': ['BrightcoveLegacy'],
+            'url': 'http://bfmbusiness.bfmtv.com/mediaplayer/chroniques/olivier-delamarche/',
+            'info_dict': {
+                'id': '5255628253001',
+                'ext': 'mp4',
+                'title': 'md5:37c519b1128915607601e75a87995fc0',
+                'description': 'md5:37f7f888b434bb8f8cc8dbd4f7a4cf26',
+                'uploader': 'BFM BUSINESS',
+                'uploader_id': '876450612001',
+                'timestamp': 1482255315,
+                'upload_date': '20161220',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
          {
              # https://github.com/rg3/youtube-dl/issues/2253
              'url': 'http://bcove.me/i6nfkrc3',
@@ -401,6 +427,26 @@ class GenericIE(InfoExtractor):
                  'skip_download': True,  # m3u8 download
              },
          },
+        {
+            # Brightcove with alternative playerID key
+            'url': 'http://www.nature.com/nmeth/journal/v9/n7/fig_tab/nmeth.2062_SV1.html',
+            'info_dict': {
+                'id': 'nmeth.2062_SV1',
+                'title': 'Simultaneous multiview imaging of the Drosophila syncytial blastoderm : Quantitative high-speed imaging of entire developing embryos with simultaneous multiview light-sheet microscopy : Nature Methods : Nature Research',
+            },
+            'playlist': [{
+                'info_dict': {
+                    'id': '2228375078001',
+                    'ext': 'mp4',
+                    'title': 'nmeth.2062-sv1',
+                    'description': 'nmeth.2062-sv1',
+                    'timestamp': 1363357591,
+                    'upload_date': '20130315',
+                    'uploader': 'Nature Publishing Group',
+                    'uploader_id': '1964492299001',
+                },
+            }],
+        },
          # ooyala video
          {
              'url': 'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219',
@@ -518,7 +564,7 @@ class GenericIE(InfoExtractor):
                  'id': 'f4dafcad-ff21-423d-89b5-146cfd89fa1e',
                  'ext': 'mp4',
                  'title': 'Ужастики, русский трейлер (2015)',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 153,
              }
          },
@@ -546,17 +592,6 @@ class GenericIE(InfoExtractor):
                  'description': 'md5:8145d19d320ff3e52f28401f4c4283b9',
              }
          },
-        # Embedded Ustream video
-        {
-            'url': 'http://www.american.edu/spa/pti/nsa-privacy-janus-2014.cfm',
-            'md5': '27b99cdb639c9b12a79bca876a073417',
-            'info_dict': {
-                'id': '45734260',
-                'ext': 'flv',
-                'uploader': 'AU SPA:  The NSA and Privacy',
-                'title': 'NSA and Privacy Forum Debate featuring General Hayden and Barton Gellman'
-            }
-        },
          # nowvideo embed hidden behind percent encoding
          {
              'url': 'http://www.waoanime.tv/the-super-dimension-fortress-macross-episode-1/',
@@ -738,7 +773,7 @@ class GenericIE(InfoExtractor):
                  'duration': 48,
                  'timestamp': 1401537900,
                  'upload_date': '20140531',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          # Wistia embed
@@ -808,6 +843,21 @@ class GenericIE(InfoExtractor):
              },
              'playlist_mincount': 7,
          },
+        # TuneIn station embed
+        {
+            'url': 'http://radiocnrv.com/promouvoir-radio-cnrv/',
+            'info_dict': {
+                'id': '204146',
+                'ext': 'mp3',
+                'title': 'CNRV',
+                'location': 'Paris, France',
+                'is_live': True,
+            },
+            'params': {
+                # Live stream
+                'skip_download': True,
+            },
+        },
          # Livestream embed
          {
              'url': 'http://www.esa.int/Our_Activities/Space_Science/Rosetta/Philae_comet_touch-down_webcast',
@@ -898,6 +948,19 @@ class GenericIE(InfoExtractor):
                  'title': 'Webinar: Using Discovery, The National Archives’ online catalogue',
              },
          },
+        # jwplayer rtmp
+        {
+            'url': 'http://www.suffolk.edu/sjc/',
+            'info_dict': {
+                'id': 'sjclive',
+                'ext': 'flv',
+                'title': 'Massachusetts Supreme Judicial Court Oral Arguments',
+                'uploader': 'www.suffolk.edu',
+            },
+            'params': {
+                'skip_download': True,
+            }
+        },
          # rtl.nl embed
          {
              'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
@@ -972,6 +1035,20 @@ class GenericIE(InfoExtractor):
                  'skip_download': True,
              }
          },
+        {
+            # Kaltura embedded, some fileExt broken (#11480)
+            'url': 'http://www.cornell.edu/video/nima-arkani-hamed-standard-models-of-particle-physics',
+            'info_dict': {
+                'id': '1_sgtvehim',
+                'ext': 'mp4',
+                'title': 'Our "Standard Models" of particle physics and cosmology',
+                'description': 'md5:67ea74807b8c4fea92a6f38d6d323861',
+                'timestamp': 1321158993,
+                'upload_date': '20111113',
+                'uploader_id': 'kps1',
+            },
+            'add_ie': ['Kaltura'],
+        },
          # Eagle.Platform embed (generic URL)
          {
              'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -981,7 +1058,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Навальный вышел на свободу',
                  'description': 'md5:d97861ac9ae77377f3f20eaf9d04b4f5',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 87,
                  'view_count': int,
                  'age_limit': 0,
@@ -995,7 +1072,7 @@ class GenericIE(InfoExtractor):
                  'id': '12820',
                  'ext': 'mp4',
                  'title': "'O Sole Mio",
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 216,
                  'view_count': int,
              },
@@ -1008,7 +1085,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Тайны перевала Дятлова • 1 серия 2 часть',
                  'description': 'Документальный сериал-расследование одной из самых жутких тайн ХХ века',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 694,
                  'age_limit': 0,
              },
@@ -1020,7 +1097,7 @@ class GenericIE(InfoExtractor):
                  'id': '3519514',
                  'ext': 'mp4',
                  'title': 'Joe Dirt 2 Beautiful Loser Teaser Trailer',
-                'thumbnail': 're:^https?://.*\.png$',
+                'thumbnail': r're:^https?://.*\.png$',
                  'duration': 45.115,
              },
          },
@@ -1103,7 +1180,7 @@ class GenericIE(InfoExtractor):
                  'id': '300346',
                  'ext': 'mp4',
                  'title': '中一中男師變性 全校師生力挺',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  # m3u8 download
@@ -1149,7 +1226,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Sauvons les abeilles ! - Le débat',
                  'description': 'md5:d9082128b1c5277987825d684939ca26',
-                'thumbnail': 're:^https?://.*\.jpe?g$',
+                'thumbnail': r're:^https?://.*\.jpe?g$',
                  'timestamp': 1434970506,
                  'upload_date': '20150622',
                  'uploader': 'Public Sénat',
@@ -1163,7 +1240,7 @@ class GenericIE(InfoExtractor):
                  'id': '2855',
                  'ext': 'mp4',
                  'title': 'Don’t Understand Bitcoin? This Man Will Mumble An Explanation At You',
-                'thumbnail': 're:^https?://.*\.jpe?g$',
+                'thumbnail': r're:^https?://.*\.jpe?g$',
                  'uploader': 'ClickHole',
                  'uploader_id': 'clickhole',
              }
@@ -1389,6 +1466,44 @@ class GenericIE(InfoExtractor):
              },
              'playlist_mincount': 3,
          },
+        {
+            # Videa embeds
+            'url': 'http://forum.dvdtalk.com/movie-talk/623756-deleted-magic-star-wars-ot-deleted-alt-scenes-docu-style.html',
+            'info_dict': {
+                'id': '623756-deleted-magic-star-wars-ot-deleted-alt-scenes-docu-style',
+                'title': 'Deleted Magic - Star Wars: OT Deleted / Alt. Scenes Docu. Style - DVD Talk Forum',
+            },
+            'playlist_mincount': 2,
+        },
+        {
+            # 20 minuten embed
+            'url': 'http://www.20min.ch/schweiz/news/story/So-kommen-Sie-bei-Eis-und-Schnee-sicher-an-27032552',
+            'info_dict': {
+                'id': '523629',
+                'ext': 'mp4',
+                'title': 'So kommen Sie bei Eis und Schnee sicher an',
+                'description': 'md5:117c212f64b25e3d95747e5276863f7d',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'add_ie': [TwentyMinutenIE.ie_key()],
+        },
+        {
+            # VideoPress embed
+            'url': 'https://en.support.wordpress.com/videopress/',
+            'info_dict': {
+                'id': 'OcobLTqC',
+                'ext': 'm4v',
+                'title': 'IMG_5786',
+                'timestamp': 1435711927,
+                'upload_date': '20150701',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'add_ie': [VideoPressIE.ie_key()],
+        }
          # {
          #     # TODO: find another test
          #     # http://schema.org/VideoObject
@@ -1880,7 +1995,14 @@ class GenericIE(InfoExtractor):
                  re.search(r'SBN\.VideoLinkset\.ooyala\([\'"](?P<ec>.{32})[\'"]\)', webpage) or
                  re.search(r'data-ooyala-video-id\s*=\s*[\'"](?P<ec>.{32})[\'"]', webpage))
          if mobj is not None:
-            return OoyalaIE._build_url_result(smuggle_url(mobj.group('ec'), {'domain': url}))
+            embed_token = self._search_regex(
+                r'embedToken[\'"]?\s*:\s*[\'"]([^\'"]+)',
+                webpage, 'ooyala embed token', default=None)
+            return OoyalaIE._build_url_result(smuggle_url(
+                mobj.group('ec'), {
+                    'domain': url,
+                    'embed_token': embed_token,
+                }))
  
          # Look for multiple Ooyala embeds on SBN network websites
          mobj = re.search(r'SBN\.VideoLinkset\.entryGroup\((\[.*?\])', webpage)
@@ -2011,10 +2133,9 @@ class GenericIE(InfoExtractor):
              return self.url_result(mobj.group('url'), 'TED')
  
          # Look for embedded Ustream videos
-        mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
-        if mobj is not None:
-            return self.url_result(mobj.group('url'), 'Ustream')
+        ustream_url = UstreamIE._extract_url(webpage)
+        if ustream_url:
+            return self.url_result(ustream_url, UstreamIE.ie_key())
  
          # Look for embedded arte.tv player
          mobj = re.search(
@@ -2045,6 +2166,11 @@ class GenericIE(InfoExtractor):
          if soundcloud_urls:
              return _playlist_from_matches(soundcloud_urls, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
  
+        # Look for tunein player
+        tunein_urls = TuneInBaseIE._extract_urls(webpage)
+        if tunein_urls:
+            return _playlist_from_matches(tunein_urls)
+
          # Look for embedded mtvservices player
          mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
          if mtvservices_url:
@@ -2211,6 +2337,11 @@ class GenericIE(InfoExtractor):
          if arkena_url:
              return self.url_result(arkena_url, ArkenaIE.ie_key())
  
+        # Look for Piksel embeds
+        piksel_url = PikselIE._extract_url(webpage)
+        if piksel_url:
+            return self.url_result(piksel_url, PikselIE.ie_key())
+
          # Look for Limelight embeds
          mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage)
          if mobj:
@@ -2320,6 +2451,29 @@ class GenericIE(InfoExtractor):
          if dbtv_urls:
              return _playlist_from_matches(dbtv_urls, ie=DBTVIE.ie_key())
  
+        # Look for Videa embeds
+        videa_urls = VideaIE._extract_urls(webpage)
+        if videa_urls:
+            return _playlist_from_matches(videa_urls, ie=VideaIE.ie_key())
+
+        # Look for 20 minuten embeds
+        twentymin_urls = TwentyMinutenIE._extract_urls(webpage)
+        if twentymin_urls:
+            return _playlist_from_matches(
+                twentymin_urls, ie=TwentyMinutenIE.ie_key())
+
+        # Look for Openload embeds
+        openload_urls = OpenloadIE._extract_urls(webpage)
+        if openload_urls:
+            return _playlist_from_matches(
+                openload_urls, ie=OpenloadIE.ie_key())
+
+        # Look for VideoPress embeds
+        videopress_urls = VideoPressIE._extract_urls(webpage)
+        if videopress_urls:
+            return _playlist_from_matches(
+                videopress_urls, ie=VideoPressIE.ie_key())
+
          # Looking for http://schema.org/VideoObject
          json_ld = self._search_json_ld(
              webpage, video_id, default={}, expected_type='VideoObject')
@@ -2347,6 +2501,8 @@ class GenericIE(InfoExtractor):
          def check_video(vurl):
              if YoutubeIE.suitable(vurl):
                  return True
+            if RtmpIE.suitable(vurl):
+                return True
              vpath = compat_urlparse.urlparse(vurl).path
              vext = determine_ext(vpath)
              return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml', 'js')
@@ -2454,6 +2610,15 @@ class GenericIE(InfoExtractor):
                  'age_limit': age_limit,
              }
  
+            if RtmpIE.suitable(video_url):
+                entry_info_dict.update({
+                    '_type': 'url_transparent',
+                    'ie_key': RtmpIE.ie_key(),
+                    'url': video_url,
+                })
+                entries.append(entry_info_dict)
+                continue
+
              ext = determine_ext(video_url)
              if ext == 'smil':
                  entry_info_dict['formats'] = self._extract_smil_formats(video_url, video_id)
diff --git a/youtube_dl/extractor/giantbomb.py b/youtube_dl/extractor/giantbomb.py
index 87cd19147d707c50606c43eecb54aef828ba778b..29b684d35875031c0b5a256e0f12cf0695b90353 100644
--- a/youtube_dl/extractor/giantbomb.py
+++ b/youtube_dl/extractor/giantbomb.py
@@ -23,7 +23,7 @@ class GiantBombIE(InfoExtractor):
              'title': 'Quick Look: Destiny: The Dark Below',
              'description': 'md5:0aa3aaf2772a41b91d44c63f30dfad24',
              'duration': 2399,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/giga.py b/youtube_dl/extractor/giga.py
index 28eb733e2bac89818a54952f77b342fec6ebe4ff..5a9992a278580478655ebe7856d8183b2a56e58d 100644
--- a/youtube_dl/extractor/giga.py
+++ b/youtube_dl/extractor/giga.py
@@ -24,7 +24,7 @@ class GigaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Anime Awesome: Chihiros Reise ins Zauberland – Das Beste kommt zum Schluss',
              'description': 'md5:afdf5862241aded4718a30dff6a57baf',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 578,
              'timestamp': 1414749706,
              'upload_date': '20141031',
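Review note on the `r're:...'` changes that recur throughout this commit: the pattern text itself is unchanged; only the literal becomes a raw string, so `\.` no longer depends on Python passing an unrecognised escape sequence through untouched (newer Python versions warn about such escapes). The sketch below assumes that the test harness treats an expected value starting with `re:` as a regular expression to match against the extracted field. `matches_expected` and the sample URLs are hypothetical stand-ins, not youtube-dl code.

```python
import re

# Raw literal as introduced by the diff; the backslash reaches `re` verbatim.
EXPECTED_THUMBNAIL = r're:^https?://.*\.jpg$'


def matches_expected(expected, got):
    """Hypothetical stand-in for a 're:'-prefixed expectation check."""
    if expected.startswith('re:'):
        return re.match(expected[len('re:'):], got) is not None
    return expected == got


# The raw literal is the same text as the explicitly escaped non-raw spelling.
same_text = EXPECTED_THUMBNAIL == 're:^https?://.*\\.jpg$'

# Sample URLs are made up for illustration.
ok = matches_expected(EXPECTED_THUMBNAIL, 'https://example.com/thumbs/1.jpg')
bad = matches_expected(EXPECTED_THUMBNAIL, 'https://example.com/thumbs/1.png')
```

Since the two spellings denote the same string, the change should not alter which thumbnails pass the test.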
diff --git a/youtube_dl/extractor/glide.py b/youtube_dl/extractor/glide.py
index f0d951396fdba4f74027e81af629f7c27c253f9a..d94dfbf09307b44ddfd6b1576ebca67eb6b6f349 100644
--- a/youtube_dl/extractor/glide.py
+++ b/youtube_dl/extractor/glide.py
@@ -14,7 +14,7 @@ class GlideIE(InfoExtractor):
              'id': 'UZF8zlmuQbe4mr+7dCiQ0w==',
              'ext': 'mp4',
              'title': "Damon's Glide message",
-            'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
+            'thumbnail': r're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/go.py b/youtube_dl/extractor/go.py
index c7776b1868e617cf78e37dd4c4b3bc742e89e8b7..a34779b169ddf852d3378389f07189c1b051d38c 100644
--- a/youtube_dl/extractor/go.py
+++ b/youtube_dl/extractor/go.py
@@ -43,7 +43,10 @@ class GoIE(InfoExtractor):
          sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
          if not video_id:
              webpage = self._download_webpage(url, display_id)
-            video_id = self._search_regex(r'data-video-id=["\']VDKA(\w+)', webpage, 'video id')
+            video_id = self._search_regex(
+                # There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
+                # from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
+                r'data-video-id=["\']*VDKA(\w+)', webpage, 'video id')
          brand = self._BRANDS[sub_domain]
          video_data = self._download_json(
              'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (brand, video_id),
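Review note on the `data-video-id` change above: the old pattern accepts exactly one quote character before `VDKA`, while the new one accepts any run of quotes (including none). A minimal sketch using only the standard `re` module; the `nested` sample mirrors the example in the diff comment, and `plain` is an illustrative assumption.

```python
import re

OLD_PATTERN = r'data-video-id=["\']VDKA(\w+)'
NEW_PATTERN = r'data-video-id=["\']*VDKA(\w+)'

# Attribute value wrapped in an extra pair of quotes, as in the diff comment.
nested = 'data-video-id="\'VDKA3609139\'"'
# Ordinary single-quoted-once attribute value (illustrative).
plain = 'data-video-id="VDKA3609139"'

old_nested = re.search(OLD_PATTERN, nested)   # fails: second quote blocks VDKA
new_nested = re.search(NEW_PATTERN, nested)   # matches past both quotes
new_plain = re.search(NEW_PATTERN, plain)     # still matches the simple case
```

Note that `*` also lets an unquoted attribute value match, which the old pattern rejected.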
diff --git a/youtube_dl/extractor/godtube.py b/youtube_dl/extractor/godtube.py
index 363dc66086e350af241959f2b547004ebd07d6db..92efd16b3e6234d9d64392a14ba47ac3f315a942 100644
--- a/youtube_dl/extractor/godtube.py
+++ b/youtube_dl/extractor/godtube.py
@@ -23,7 +23,7 @@ class GodTubeIE(InfoExtractor):
                  'timestamp': 1205712000,
                  'uploader': 'beverlybmusic',
                  'upload_date': '20080317',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
      ]
diff --git a/youtube_dl/extractor/googledrive.py b/youtube_dl/extractor/googledrive.py
index 766fc26d0f01145bdd2456a221940fa60ece6953..fec36cbbb7f43d2b8b37370aec270f543e8f257d 100644
--- a/youtube_dl/extractor/googledrive.py
+++ b/youtube_dl/extractor/googledrive.py
@@ -6,6 +6,7 @@ from .common import InfoExtractor
  from ..utils import (
      ExtractorError,
      int_or_none,
+    lowercase_escape,
  )
  
  
@@ -13,12 +14,12 @@ class GoogleDriveIE(InfoExtractor):
      _VALID_URL = r'https?://(?:(?:docs|drive)\.google\.com/(?:uc\?.*?id=|file/d/)|video\.google\.com/get_player\?.*?docid=)(?P<id>[a-zA-Z0-9_-]{28,})'
      _TESTS = [{
          'url': 'https://drive.google.com/file/d/0ByeS4oOUV-49Zzh4R1J6R09zazQ/edit?pli=1',
-        'md5': '881f7700aec4f538571fa1e0eed4a7b6',
+        'md5': 'd109872761f7e7ecf353fa108c0dbe1e',
          'info_dict': {
              'id': '0ByeS4oOUV-49Zzh4R1J6R09zazQ',
              'ext': 'mp4',
              'title': 'Big Buck Bunny.mp4',
-            'duration': 46,
+            'duration': 45,
          }
      }, {
          # video id is longer than 28 characters
@@ -55,7 +56,7 @@ class GoogleDriveIE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(
-            'http://docs.google.com/file/d/%s' % video_id, video_id, encoding='unicode_escape')
+            'http://docs.google.com/file/d/%s' % video_id, video_id)
  
          reason = self._search_regex(r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None)
          if reason:
@@ -74,7 +75,7 @@ class GoogleDriveIE(InfoExtractor):
              resolution = fmt.split('/')[1]
              width, height = resolution.split('x')
              formats.append({
-                'url': fmt_url,
+                'url': lowercase_escape(fmt_url),
                  'format_id': fmt_id,
                  'resolution': resolution,
                  'width': int_or_none(width),
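Review note on the googledrive hunks: the page is no longer decoded with `encoding='unicode_escape'` as a whole; instead only each format URL is passed through `lowercase_escape`, presumably so that just the `\uXXXX` sequences in the URL are unescaped. The sketch below is a simplified stand-in for that helper (named `unescape_lower_u` here to make clear it is not the library function), and the sample URL is made up.

```python
import codecs
import re

_decode_unicode_escape = codecs.getdecoder('unicode_escape')


def unescape_lower_u(s):
    """Replace each literal \\uXXXX sequence with the character it encodes."""
    return re.sub(
        r'\\u[0-9a-fA-F]{4}',
        lambda m: _decode_unicode_escape(m.group(0))[0],
        s)


# Hypothetical format URL as it might appear inside escaped page data.
fmt_url = r'https://example.invalid/videoplayback?id\u003dabc\u0026itag\u003d18'
clean_url = unescape_lower_u(fmt_url)
```

Text without `\uXXXX` sequences passes through unchanged, so applying the helper per URL is harmless for already-clean values.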
diff --git a/youtube_dl/extractor/goshgay.py b/youtube_dl/extractor/goshgay.py
index 74e1720ee325da8fb4c011eddec342fe2de62d9b..377981d3e41ca76c29daeecbb5045928dff87a43 100644
--- a/youtube_dl/extractor/goshgay.py
+++ b/youtube_dl/extractor/goshgay.py
@@ -19,7 +19,7 @@ class GoshgayIE(InfoExtractor):
              'id': '299069',
              'ext': 'flv',
              'title': 'DIESEL SFW XXX Video',
-            'thumbnail': 're:^http://.*\.jpg$',
+            'thumbnail': r're:^http://.*\.jpg$',
              'duration': 80,
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/hbo.py b/youtube_dl/extractor/hbo.py
index cbf774377b7261c326bd71f5db2d5de8216be5f4..8116ad9bd42f840bc5875070d5f40e8d904b7abb 100644
--- a/youtube_dl/extractor/hbo.py
+++ b/youtube_dl/extractor/hbo.py
@@ -120,7 +120,7 @@ class HBOIE(HBOBaseIE):
              'id': '1437839',
              'ext': 'mp4',
              'title': 'Ep. 64 Clip: Encryption',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 1072,
          }
      }
@@ -141,7 +141,7 @@ class HBOEpisodeIE(HBOBaseIE):
              'display_id': 'ep-52-inside-the-episode',
              'ext': 'mp4',
              'title': 'Ep. 52: Inside the Episode',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 240,
          },
      }, {
diff --git a/youtube_dl/extractor/hearthisat.py b/youtube_dl/extractor/hearthisat.py
index 2564538820e7d534adc24fd8c967ee44490e0dc3..18c2520120463ebf17253f0696275f2ea2736d66 100644
--- a/youtube_dl/extractor/hearthisat.py
+++ b/youtube_dl/extractor/hearthisat.py
@@ -25,7 +25,7 @@ class HearThisAtIE(InfoExtractor):
              'id': '150939',
              'ext': 'wav',
              'title': 'Moofi - Dr. Kreep',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1421564134,
              'description': 'Listen to Dr. Kreep by Moofi on hearthis.at - Modular, Eurorack, Mutable Intruments Braids, Valhalla-DSP',
              'upload_date': '20150118',
@@ -46,7 +46,7 @@ class HearThisAtIE(InfoExtractor):
              'description': 'Listen to DJ Jim Hopkins -  Totally Bitchin\' 80\'s Dance Mix! by TwitchSF on hearthis.at - Dance',
              'upload_date': '20160328',
              'timestamp': 1459186146,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'comment_count': int,
              'view_count': int,
              'like_count': int,
diff --git a/youtube_dl/extractor/heise.py b/youtube_dl/extractor/heise.py
index 278d9f527fd41c8e1e2c180a9ae455a23fbef1fc..1629cdb8d5a7ca584321474cb160f9907884dd69 100644
--- a/youtube_dl/extractor/heise.py
+++ b/youtube_dl/extractor/heise.py
@@ -29,7 +29,7 @@ class HeiseIE(InfoExtractor):
              'timestamp': 1411812600,
              'upload_date': '20140927',
              'description': 'In uplink-Episode 3.3 geht es darum, wie man sich von Cloud-Anbietern emanzipieren kann, worauf man beim Kauf einer Tastatur achten sollte und was Smartphones über uns verraten.',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
          }
      }
  
diff --git a/youtube_dl/extractor/hellporno.py b/youtube_dl/extractor/hellporno.py
index 10da1406787c51fc56d7fb55f120bb6908a5b49a..0ee8ea712c72e618a4d7544f26c376e94fcaf70d 100644
--- a/youtube_dl/extractor/hellporno.py
+++ b/youtube_dl/extractor/hellporno.py
@@ -20,7 +20,7 @@ class HellPornoIE(InfoExtractor):
              'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
              'ext': 'mp4',
              'title': 'Dixie is posing with naked ass very erotic',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          }
      }, {
diff --git a/youtube_dl/extractor/historicfilms.py b/youtube_dl/extractor/historicfilms.py
index 6a36933ac2c98ada87b21af4089aa158d42a3112..56343e98fb6fe33b7d714289c60db47156f48ef2 100644
--- a/youtube_dl/extractor/historicfilms.py
+++ b/youtube_dl/extractor/historicfilms.py
@@ -14,7 +14,7 @@ class HistoricFilmsIE(InfoExtractor):
              'ext': 'mov',
              'title': 'Historic Films: GP-7',
              'description': 'md5:1a86a0f3ac54024e419aba97210d959a',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 2096,
          },
      }
diff --git a/youtube_dl/extractor/hitbox.py b/youtube_dl/extractor/hitbox.py
index ff797438dec12303aab55af0e29aac8bd35229c5..e21ebb8fb4057ac6b6d226b3c0f99501b3340cdf 100644
--- a/youtube_dl/extractor/hitbox.py
+++ b/youtube_dl/extractor/hitbox.py
@@ -25,7 +25,7 @@ class HitboxIE(InfoExtractor):
              'alt_title': 'hitboxlive - Aug 9th #6',
              'description': '',
              'ext': 'mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 215.1666,
              'resolution': 'HD 720p',
              'uploader': 'hitboxlive',
@@ -163,7 +163,7 @@ class HitboxLiveIE(HitboxIE):
              if cdn.get('rtmpSubscribe') is True:
                  continue
              base_url = cdn.get('netConnectionUrl')
-            host = re.search('.+\.([^\.]+\.[^\./]+)/.+', base_url).group(1)
+            host = re.search(r'.+\.([^\.]+\.[^\./]+)/.+', base_url).group(1)
              if base_url not in servers:
                  servers.append(base_url)
                  for stream in cdn.get('bitrates'):
diff --git a/youtube_dl/extractor/hitrecord.py b/youtube_dl/extractor/hitrecord.py
new file mode 100644
index 0000000..01a6946
--- /dev/null
+++ b/youtube_dl/extractor/hitrecord.py
@@ -0,0 +1,68 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    clean_html,
+    float_or_none,
+    int_or_none,
+    try_get,
+)
+
+
+class HitRecordIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?hitrecord\.org/records/(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://hitrecord.org/records/2954362',
+        'md5': 'fe1cdc2023bce0bbb95c39c57426aa71',
+        'info_dict': {
+            'id': '2954362',
+            'ext': 'mp4',
+            'title': 'A Very Different World (HITRECORD x ACLU)',
+            'description': 'md5:e62defaffab5075a5277736bead95a3d',
+            'duration': 139.327,
+            'timestamp': 1471557582,
+            'upload_date': '20160818',
+            'uploader': 'Zuzi.C12',
+            'uploader_id': '362811',
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+            'tags': list,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'https://hitrecord.org/api/web/records/%s' % video_id, video_id)
+
+        title = video['title']
+        video_url = video['source_url']['mp4_url']
+
+        tags = None
+        tags_list = try_get(video, lambda x: x['tags'], list)
+        if tags_list:
+            tags = [
+                t['text']
+                for t in tags_list
+                if isinstance(t, dict) and t.get('text') and
+                isinstance(t['text'], compat_str)]
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'title': title,
+            'description': clean_html(video.get('body')),
+            'duration': float_or_none(video.get('duration'), 1000),
+            'timestamp': int_or_none(video.get('created_at_i')),
+            'uploader': try_get(
+                video, lambda x: x['user']['username'], compat_str),
+            'uploader_id': try_get(
+                video, lambda x: compat_str(x['user']['id'])),
+            'view_count': int_or_none(video.get('total_views_count')),
+            'like_count': int_or_none(video.get('hearts_count')),
+            'comment_count': int_or_none(video.get('comments_count')),
+            'tags': tags,
+        }
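Review note on the new HitRecord extractor: optional fields are read through `try_get`, so a missing or mistyped key yields `None` instead of an exception, and tags are kept only when they are dicts with a non-empty string `text`. The sketch below uses a simplified stand-in for `try_get` and plain `str` in place of `compat_str`; the response dict is hypothetical (field names follow the extractor, values are made up).

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in for the helper imported from ..utils."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


# Hypothetical API response used only for illustration.
video = {
    'user': {'username': 'example-user', 'id': 12345},
    'tags': [{'text': 'animation'}, {'text': None}, 'stray', {'text': 'aclu'}],
}

tags = None
tags_list = try_get(video, lambda x: x['tags'], list)
if tags_list:
    # Same filter as the extractor: dict entries with a non-empty string text.
    tags = [
        t['text']
        for t in tags_list
        if isinstance(t, dict) and t.get('text') and
        isinstance(t['text'], str)]

uploader = try_get(video, lambda x: x['user']['username'], str)
uploader_id = try_get(video, lambda x: str(x['user']['id']))
# A response without a 'user' object degrades to None rather than raising.
missing = try_get({}, lambda x: x['user']['username'], str)
```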
diff --git a/youtube_dl/extractor/hornbunny.py b/youtube_dl/extractor/hornbunny.py
index 0615f06af4139acbd3164f5aaac1ab2ede4cdc27..c458a959d9767c47eeaaf7a05f5c853637b6ab34 100644
--- a/youtube_dl/extractor/hornbunny.py
+++ b/youtube_dl/extractor/hornbunny.py
@@ -20,7 +20,7 @@ class HornBunnyIE(InfoExtractor):
              'duration': 550,
              'age_limit': 18,
              'view_count': int,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/howstuffworks.py b/youtube_dl/extractor/howstuffworks.py
index 65ba2a48b069bd67d2b3382f2d87bc1160145612..2be68abad0af91f1b508bc2cfa6e984ac39dbfd0 100644
--- a/youtube_dl/extractor/howstuffworks.py
+++ b/youtube_dl/extractor/howstuffworks.py
@@ -21,7 +21,7 @@ class HowStuffWorksIE(InfoExtractor):
                  'title': 'Cool Jobs - Iditarod Musher',
                  'description': 'Cold sleds, freezing temps and warm dog breath... an Iditarod musher\'s dream. Kasey-Dee Gardner jumps on a sled to find out what the big deal is.',
                  'display_id': 'cool-jobs-iditarod-musher',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 161,
              },
              'skip': 'Video broken',
@@ -34,7 +34,7 @@ class HowStuffWorksIE(InfoExtractor):
                  'title': 'Survival Zone: Food and Water In the Savanna',
                  'description': 'Learn how to find both food and water while trekking in the African savannah. In this video from the Discovery Channel.',
                  'display_id': 'survival-zone-food-and-water-in-the-savanna',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -45,7 +45,7 @@ class HowStuffWorksIE(InfoExtractor):
                  'title': 'Sword Swallowing #1 by Dan Meyer',
                  'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International <www.swordswallow.org>',
                  'display_id': 'sword-swallowing-1-by-dan-meyer',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
diff --git a/youtube_dl/extractor/huajiao.py b/youtube_dl/extractor/huajiao.py
index cec0df09a1e78dcff6d2ed4118200e96a55b0050..4ca275dda18e45e18fd628a0c8a5104fd6cfb560 100644
--- a/youtube_dl/extractor/huajiao.py
+++ b/youtube_dl/extractor/huajiao.py
@@ -20,7 +20,7 @@ class HuajiaoIE(InfoExtractor):
              'title': '#新人求关注#',
              'description': 're:.*',
              'duration': 2424.0,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1475866459,
              'upload_date': '20161007',
              'uploader': 'Penny_余姿昀',
diff --git a/youtube_dl/extractor/huffpost.py b/youtube_dl/extractor/huffpost.py
index 059073749e67605464b6159b9391f71eb5a6052d..97e36f0568f45c0da495cdb54851405a12e51fc7 100644
--- a/youtube_dl/extractor/huffpost.py
+++ b/youtube_dl/extractor/huffpost.py
@@ -52,7 +52,7 @@ class HuffPostIE(InfoExtractor):
  
          thumbnails = []
          for url in filter(None, data['images'].values()):
-            m = re.match('.*-([0-9]+x[0-9]+)\.', url)
+            m = re.match(r'.*-([0-9]+x[0-9]+)\.', url)
              if not m:
                  continue
              thumbnails.append({
diff --git a/youtube_dl/extractor/imdb.py b/youtube_dl/extractor/imdb.py
index f0fc8d49a4ad50c128d124534fc37141cb510ba6..f95c00c7330f3db4b5354161804460cbc2bb53d0 100644
--- a/youtube_dl/extractor/imdb.py
+++ b/youtube_dl/extractor/imdb.py
@@ -13,7 +13,7 @@ from ..utils import (
  class ImdbIE(InfoExtractor):
      IE_NAME = 'imdb'
      IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-)vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-|videoplayer/)vi(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'http://www.imdb.com/video/imdb/vi2524815897',
@@ -32,6 +32,9 @@ class ImdbIE(InfoExtractor):
      }, {
          'url': 'http://www.imdb.com/title/tt1667889/#lb-vi2524815897',
          'only_matching': True,
+    }, {
+        'url': 'http://www.imdb.com/videoplayer/vi1562949145',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
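Review note on the `_VALID_URL` change above: the new `videoplayer/` alternative can be checked against the three URLs that appear in `_TESTS`. A minimal sketch with the standard `re` module; both patterns are copied from the hunk, and nothing else from youtube-dl is assumed.

```python
import re

OLD_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-)vi(?P<id>\d+)'
NEW_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-|videoplayer/)vi(?P<id>\d+)'

urls = [
    'http://www.imdb.com/video/imdb/vi2524815897',
    'http://www.imdb.com/title/tt1667889/#lb-vi2524815897',
    'http://www.imdb.com/videoplayer/vi1562949145',
]

# Every test URL should yield its numeric id under the new pattern.
ids = [re.match(NEW_VALID_URL, u).group('id') for u in urls]

# The old pattern does not recognise the videoplayer form.
old_videoplayer = re.match(OLD_VALID_URL, urls[2])
```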
diff --git a/youtube_dl/extractor/inc.py b/youtube_dl/extractor/inc.py
new file mode 100644
index 0000000..241ec83
--- /dev/null
+++ b/youtube_dl/extractor/inc.py
@@ -0,0 +1,41 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .kaltura import KalturaIE
+
+
+class IncIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?inc\.com/(?:[^/]+/)+(?P<id>[^.]+)\.html'
+    _TESTS = [{
+        'url': 'http://www.inc.com/tip-sheet/bill-gates-says-these-5-books-will-make-you-smarter.html',
+        'md5': '7416739c9c16438c09fa35619d6ba5cb',
+        'info_dict': {
+            'id': '1_wqig47aq',
+            'ext': 'mov',
+            'title': 'Bill Gates Says These 5 Books Will Make You Smarter',
+            'description': 'md5:bea7ff6cce100886fc1995acb743237e',
+            'timestamp': 1474414430,
+            'upload_date': '20160920',
+            'uploader_id': 'video@inc.com',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.inc.com/video/david-whitford/founders-forum-tripadvisor-steve-kaufer-most-enjoyable-moment-for-entrepreneur.html',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        partner_id = self._search_regex(
+            r'var\s+_?bizo_data_partner_id\s*=\s*["\'](\d+)', webpage, 'partner id')
+
+        kaltura_id = self._parse_json(self._search_regex(
+            r'pageInfo\.videos\s*=\s*\[(.+)\];', webpage, 'kaltura id'),
+            display_id)['vid_kaltura_id']
+
+        return self.url_result(
+            'kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
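Review note on the new Inc extractor: it scrapes a partner id and a Kaltura entry id from inline JavaScript and hands off to the Kaltura extractor through a `kaltura:<partner>:<entry>` URL. The sketch below replays the two regexes on a hypothetical page fragment; the partner id `1034971` is made up, and the entry id is the one from the test's `info_dict`. It uses plain `re` and `json` rather than the `_search_regex` and `_parse_json` helpers.

```python
import json
import re

# Hypothetical inline script; only the two assignments matter here.
webpage = '''
<script>
var _bizo_data_partner_id = "1034971";
pageInfo.videos = [{"vid_kaltura_id": "1_wqig47aq", "title": "Example"}];
</script>
'''

partner_id = re.search(
    r'var\s+_?bizo_data_partner_id\s*=\s*["\'](\d+)', webpage).group(1)

# The capture group holds the text between the square brackets.
videos_json = re.search(
    r'pageInfo\.videos\s*=\s*\[(.+)\];', webpage).group(1)
kaltura_id = json.loads(videos_json)['vid_kaltura_id']

kaltura_url = 'kaltura:%s:%s' % (partner_id, kaltura_id)
```

Because the capture excludes the brackets, the captured text parses as a single JSON object only when the array holds one entry; a page listing several videos would need different handling.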
diff --git a/youtube_dl/extractor/indavideo.py b/youtube_dl/extractor/indavideo.py
index c6f080484a99f43614f104ead8023e8e57609cda..11cf3c60964fe55c21282ecccf48a7d80ae4bac5 100644
--- a/youtube_dl/extractor/indavideo.py
+++ b/youtube_dl/extractor/indavideo.py
@@ -19,7 +19,7 @@ class IndavideoEmbedIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Cicatánc',
              'description': '',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'cukiajanlo',
              'uploader_id': '83729',
              'timestamp': 1439193826,
@@ -102,7 +102,7 @@ class IndavideoIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Vicces cica',
              'description': 'Játszik a tablettel. :D',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Jet_Pack',
              'uploader_id': '491217',
              'timestamp': 1390821212,
diff --git a/youtube_dl/extractor/infoq.py b/youtube_dl/extractor/infoq.py
index cca0b8a9323c0d2412c65610a3acb3ef2943ba6f..9fb71e8effe107c6e182c3537cd634b7ac21e9bb 100644
--- a/youtube_dl/extractor/infoq.py
+++ b/youtube_dl/extractor/infoq.py
@@ -4,7 +4,10 @@ from __future__ import unicode_literals
  
  import base64
  
-from ..compat import compat_urllib_parse_unquote
+from ..compat import (
+    compat_urllib_parse_unquote,
+    compat_urlparse,
+)
  from ..utils import determine_ext
  from .bokecc import BokeCCBaseIE
  
@@ -33,9 +36,21 @@ class InfoQIE(BokeCCBaseIE):
              'ext': 'flv',
              'description': 'md5:308d981fb28fa42f49f9568322c683ff',
          },
+    }, {
+        'url': 'https://www.infoq.com/presentations/Simple-Made-Easy',
+        'md5': '0e34642d4d9ef44bf86f66f6399672db',
+        'info_dict': {
+            'id': 'Simple-Made-Easy',
+            'title': 'Simple Made Easy',
+            'ext': 'mp3',
+            'description': 'md5:3e0e213a8bbd074796ef89ea35ada25b',
+        },
+        'params': {
+            'format': 'bestaudio',
+        },
      }]
  
-    def _extract_rtmp_videos(self, webpage):
+    def _extract_rtmp_video(self, webpage):
          # The server URL is hardcoded
          video_url = 'rtmpe://video.infoq.com/cfx/st/'
  
@@ -47,28 +62,53 @@ class InfoQIE(BokeCCBaseIE):
          playpath = 'mp4:' + real_id
  
          return [{
-            'format_id': 'rtmp',
+            'format_id': 'rtmp_video',
              'url': video_url,
              'ext': determine_ext(playpath),
              'play_path': playpath,
          }]
  
-    def _extract_http_videos(self, webpage):
-        http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
-
+    def _extract_cookies(self, webpage):
          policy = self._search_regex(r'InfoQConstants.scp\s*=\s*\'([^\']+)\'', webpage, 'policy')
          signature = self._search_regex(r'InfoQConstants.scs\s*=\s*\'([^\']+)\'', webpage, 'signature')
          key_pair_id = self._search_regex(r'InfoQConstants.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id')
+        return 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % (
+            policy, signature, key_pair_id)
  
+    def _extract_http_video(self, webpage):
+        http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
          return [{
-            'format_id': 'http',
+            'format_id': 'http_video',
              'url': http_video_url,
              'http_headers': {
-                'Cookie': 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % (
-                    policy, signature, key_pair_id),
+                'Cookie': self._extract_cookies(webpage)
              },
          }]
  
+    def _extract_http_audio(self, webpage, video_id):
+        fields = self._hidden_inputs(webpage)
+        http_audio_url = fields.get('filename')
+        if http_audio_url is None:
+            return []
+
+        cookies_header = {'Cookie': self._extract_cookies(webpage)}
+
+        # base URL is found in the Location header in the response returned by
+        # GET https://www.infoq.com/mp3download.action?filename=... when logged in.
+        http_audio_url = compat_urlparse.urljoin('http://res.infoq.com/downloads/mp3downloads/', http_audio_url)
+
+        # audio files sometimes seem to be missing even if there is a download link,
+        # so probe the URL to make sure
+        if not self._is_valid_url(http_audio_url, video_id, headers=cookies_header):
+            return []
+
+        return [{
+            'format_id': 'http_audio',
+            'url': http_audio_url,
+            'vcodec': 'none',
+            'http_headers': cookies_header,
+        }]
+
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
@@ -80,7 +120,10 @@ class InfoQIE(BokeCCBaseIE):
              # for China videos, HTTP video URL exists but always fails with 403
              formats = self._extract_bokecc_formats(webpage, video_id)
          else:
-            formats = self._extract_rtmp_videos(webpage) + self._extract_http_videos(webpage)
+            formats = (
+                self._extract_rtmp_video(webpage) +
+                self._extract_http_video(webpage) +
+                self._extract_http_audio(webpage, video_id))
  
          self._sort_formats(formats)
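Review note on the InfoQ refactoring: the CloudFront cookie string is now built once in `_extract_cookies` and shared by the HTTP video and audio formats, and the audio URL is resolved against a fixed base. The sketch below reproduces both steps with the standard library only; `build_cloudfront_cookie`, the placeholder credential values, and the relative file name are hypothetical.

```python
from urllib.parse import urljoin


def build_cloudfront_cookie(policy, signature, key_pair_id):
    """Assemble the Cookie header value in the format used by the diff."""
    return 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % (
        policy, signature, key_pair_id)


# Placeholder values; the extractor scrapes the real ones from the page.
cookies_header = {
    'Cookie': build_cloudfront_cookie('POLICY', 'SIGNATURE', 'KEYPAIR'),
}

# Relative file name as it might appear in the hidden 'filename' input.
audio_url = urljoin(
    'http://res.infoq.com/downloads/mp3downloads/',
    'presentations/example.mp3')

audio_format = {
    'format_id': 'http_audio',
    'url': audio_url,
    'vcodec': 'none',  # marks the format as audio-only
    'http_headers': cookies_header,
}
```

Sharing one header dict keeps the probe request and the eventual download consistent with each other.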
  
diff --git a/youtube_dl/extractor/instagram.py b/youtube_dl/extractor/instagram.py
index 196407b063a9393b94c759be6c8080de9a494277..98f408c18650cf8393869432a861a3486575b533 100644
--- a/youtube_dl/extractor/instagram.py
+++ b/youtube_dl/extractor/instagram.py
@@ -22,7 +22,7 @@ class InstagramIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Video by naomipq',
              'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1371748545,
              'upload_date': '20130620',
              'uploader_id': 'naomipq',
@@ -38,7 +38,7 @@ class InstagramIE(InfoExtractor):
              'id': 'BA-pQFBG8HZ',
              'ext': 'mp4',
              'title': 'Video by britneyspears',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1453760977,
              'upload_date': '20160125',
              'uploader_id': 'britneyspears',
@@ -169,7 +169,7 @@ class InstagramUserIE(InfoExtractor):
                  'id': '614605558512799803_462752227',
                  'ext': 'mp4',
                  'title': '#Porsche Intelligent Performance.',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'uploader': 'Porsche',
                  'uploader_id': 'porsche',
                  'timestamp': 1387486713,
diff --git a/youtube_dl/extractor/iprima.py b/youtube_dl/extractor/iprima.py
index da2cdc656ac90f15a575eceabf33309b084c8f28..0fe5768834cef9faed9226ebc8418661306f2b54 100644
--- a/youtube_dl/extractor/iprima.py
+++ b/youtube_dl/extractor/iprima.py
@@ -65,7 +65,7 @@ class IPrimaIE(InfoExtractor):
  
          options = self._parse_json(
              self._search_regex(
-                r'(?s)var\s+playerOptions\s*=\s*({.+?});',
+                r'(?s)(?:TDIPlayerOptions|playerOptions)\s*=\s*({.+?});\s*\]\]',
                  playerpage, 'player options', default='{}'),
              video_id, transform_source=js_to_json, fatal=False)
          if options:
diff --git a/youtube_dl/extractor/ir90tv.py b/youtube_dl/extractor/ir90tv.py
index 214bcd5b59c1a95a7a34ebc2acd87b2dc6f76454..d5a3f6fa5dbbf0d962da53df48948af4fa1d7521 100644
--- a/youtube_dl/extractor/ir90tv.py
+++ b/youtube_dl/extractor/ir90tv.py
@@ -14,7 +14,7 @@ class Ir90TvIE(InfoExtractor):
              'id': '95719',
              'ext': 'mp4',
              'title': 'شایعات نقل و انتقالات مهم فوتبال اروپا 94/02/18',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://www.90tv.ir/video/95719/%D8%B4%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA-%D9%86%D9%82%D9%84-%D9%88-%D8%A7%D9%86%D8%AA%D9%82%D8%A7%D9%84%D8%A7%D8%AA-%D9%85%D9%87%D9%85-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7-940218',
diff --git a/youtube_dl/extractor/itv.py b/youtube_dl/extractor/itv.py
new file mode 100644
index 0000000..b0d8604
--- /dev/null
+++ b/youtube_dl/extractor/itv.py
@@ -0,0 +1,196 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import uuid
+import xml.etree.ElementTree as etree
+import json
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_etree_register_namespace,
+)
+from ..utils import (
+    extract_attributes,
+    xpath_with_ns,
+    xpath_element,
+    xpath_text,
+    int_or_none,
+    parse_duration,
+    ExtractorError,
+    determine_ext,
+)
+
+
+class ITVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)'
+    _TEST = {
+        'url': 'http://www.itv.com/hub/mr-bean-animated-series/2a2936a0053',
+        'info_dict': {
+            'id': '2a2936a0053',
+            'ext': 'flv',
+            'title': 'Home Movie',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        params = extract_attributes(self._search_regex(
+            r'(?s)(<[^>]+id="video"[^>]*>)', webpage, 'params'))
+
+        ns_map = {
+            'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
+            'tem': 'http://tempuri.org/',
+            'itv': 'http://schemas.datacontract.org/2004/07/Itv.BB.Mercury.Common.Types',
+            'com': 'http://schemas.itv.com/2009/05/Common',
+        }
+        for ns, full_ns in ns_map.items():
+            compat_etree_register_namespace(ns, full_ns)
+
+        def _add_ns(name):
+            return xpath_with_ns(name, ns_map)
+
+        def _add_sub_element(element, name):
+            return etree.SubElement(element, _add_ns(name))
+
+        req_env = etree.Element(_add_ns('soapenv:Envelope'))
+        _add_sub_element(req_env, 'soapenv:Header')
+        body = _add_sub_element(req_env, 'soapenv:Body')
+        get_playlist = _add_sub_element(body, 'tem:GetPlaylist')
+        request = _add_sub_element(get_playlist, 'tem:request')
+        _add_sub_element(request, 'itv:ProductionId').text = params['data-video-id']
+        _add_sub_element(request, 'itv:RequestGuid').text = compat_str(uuid.uuid4()).upper()
+        vodcrid = _add_sub_element(request, 'itv:Vodcrid')
+        _add_sub_element(vodcrid, 'com:Id')
+        _add_sub_element(request, 'itv:Partition')
+        user_info = _add_sub_element(get_playlist, 'tem:userInfo')
+        _add_sub_element(user_info, 'itv:Broadcaster').text = 'Itv'
+        _add_sub_element(user_info, 'itv:DM')
+        _add_sub_element(user_info, 'itv:RevenueScienceValue')
+        _add_sub_element(user_info, 'itv:SessionId')
+        _add_sub_element(user_info, 'itv:SsoToken')
+        _add_sub_element(user_info, 'itv:UserToken')
+        site_info = _add_sub_element(get_playlist, 'tem:siteInfo')
+        _add_sub_element(site_info, 'itv:AdvertisingRestriction').text = 'None'
+        _add_sub_element(site_info, 'itv:AdvertisingSite').text = 'ITV'
+        _add_sub_element(site_info, 'itv:AdvertisingType').text = 'Any'
+        _add_sub_element(site_info, 'itv:Area').text = 'ITVPLAYER.VIDEO'
+        _add_sub_element(site_info, 'itv:Category')
+        _add_sub_element(site_info, 'itv:Platform').text = 'DotCom'
+        _add_sub_element(site_info, 'itv:Site').text = 'ItvCom'
+        device_info = _add_sub_element(get_playlist, 'tem:deviceInfo')
+        _add_sub_element(device_info, 'itv:ScreenSize').text = 'Big'
+        player_info = _add_sub_element(get_playlist, 'tem:playerInfo')
+        _add_sub_element(player_info, 'itv:Version').text = '2'
+
+        headers = self.geo_verification_headers()
+        headers.update({
+            'Content-Type': 'text/xml; charset=utf-8',
+            'SOAPAction': 'http://tempuri.org/PlaylistService/GetPlaylist',
+        })
+        resp_env = self._download_xml(
+            params['data-playlist-url'], video_id,
+            headers=headers, data=etree.tostring(req_env))
+        playlist = xpath_element(resp_env, './/Playlist')
+        if playlist is None:
+            fault_string = xpath_text(resp_env, './/faultstring')
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, fault_string))
+        title = xpath_text(playlist, 'EpisodeTitle', fatal=True)
+        video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True)
+        media_files = xpath_element(video_element, 'MediaFiles', fatal=True)
+        rtmp_url = media_files.attrib['base']
+
+        formats = []
+        for media_file in media_files.findall('MediaFile'):
+            play_path = xpath_text(media_file, 'URL')
+            if not play_path:
+                continue
+            tbr = int_or_none(media_file.get('bitrate'), 1000)
+            formats.append({
+                'format_id': 'rtmp' + ('-%d' % tbr if tbr else ''),
+                'url': rtmp_url,
+                'play_path': play_path,
+                'tbr': tbr,
+                'ext': 'flv',
+            })
+
+        ios_playlist_url = params.get('data-video-playlist')
+        hmac = params.get('data-video-hmac')
+        if ios_playlist_url and hmac:
+            headers = self.geo_verification_headers()
+            headers.update({
+                'Accept': 'application/vnd.itv.vod.playlist.v2+json',
+                'Content-Type': 'application/json',
+                'hmac': hmac.upper(),
+            })
+            ios_playlist = self._download_json(
+                ios_playlist_url, video_id, data=json.dumps({
+                    'user': {
+                        'itvUserId': '',
+                        'entitlements': [],
+                        'token': ''
+                    },
+                    'device': {
+                        'manufacturer': 'Apple',
+                        'model': 'iPad',
+                        'os': {
+                            'name': 'iPhone OS',
+                            'version': '9.3',
+                            'type': 'ios'
+                        }
+                    },
+                    'client': {
+                        'version': '4.1',
+                        'id': 'browser'
+                    },
+                    'variantAvailability': {
+                        'featureset': {
+                            'min': ['hls', 'aes'],
+                            'max': ['hls', 'aes']
+                        },
+                        'platformTag': 'mobile'
+                    }
+                }).encode(), headers=headers, fatal=False)
+            if ios_playlist:
+                video_data = ios_playlist.get('Playlist', {}).get('Video', {})
+                ios_base_url = video_data.get('Base')
+                for media_file in video_data.get('MediaFiles', []):
+                    href = media_file.get('Href')
+                    if not href:
+                        continue
+                    if ios_base_url:
+                        href = ios_base_url + href
+                    ext = determine_ext(href)
+                    if ext == 'm3u8':
+                        formats.extend(self._extract_m3u8_formats(href, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                    else:
+                        formats.append({
+                            'url': href,
+                        })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for caption_url in video_element.findall('ClosedCaptioningURIs/URL'):
+            if not caption_url.text:
+                continue
+            ext = determine_ext(caption_url.text, 'ttml')
+            subtitles.setdefault('en', []).append({
+                'url': caption_url.text,
+                'ext': 'ttml' if ext == 'xml' else ext,
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'subtitles': subtitles,
+            'episode_title': title,
+            'episode_number': int_or_none(xpath_text(playlist, 'EpisodeNumber')),
+            'series': xpath_text(playlist, 'ProgrammeTitle'),
+            'duration': parse_duration(xpath_text(playlist, 'Duration')),
+        }
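Review note on the ITV extractor above: the SOAP request is assembled by expanding prefixed names such as `soapenv:Envelope` into ElementTree's `{uri}local` form. The following is a minimal standalone sketch of that idea using only the standard library. `add_ns` and `build_envelope` are hypothetical names for illustration; `add_ns` is a simplified stand-in for `xpath_with_ns` and handles single-step names only, and only a subset of the request elements is built.

```python
import uuid
import xml.etree.ElementTree as etree

# Prefix-to-URI map, copied from the extractor above (subset).
NS_MAP = {
    'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
    'tem': 'http://tempuri.org/',
    'itv': 'http://schemas.datacontract.org/2004/07/Itv.BB.Mercury.Common.Types',
}


def add_ns(name, ns_map=NS_MAP):
    # Hypothetical helper: turn 'prefix:Local' into '{uri}Local'.
    prefix, local = name.split(':', 1)
    return '{%s}%s' % (ns_map[prefix], local)


def build_envelope(production_id):
    # Registering the prefixes makes the serializer emit 'soapenv:' etc.
    # instead of auto-generated 'ns0:' prefixes.
    for prefix, uri in NS_MAP.items():
        etree.register_namespace(prefix, uri)
    envelope = etree.Element(add_ns('soapenv:Envelope'))
    etree.SubElement(envelope, add_ns('soapenv:Header'))
    body = etree.SubElement(envelope, add_ns('soapenv:Body'))
    get_playlist = etree.SubElement(body, add_ns('tem:GetPlaylist'))
    request = etree.SubElement(get_playlist, add_ns('tem:request'))
    etree.SubElement(request, add_ns('itv:ProductionId')).text = production_id
    etree.SubElement(request, add_ns('itv:RequestGuid')).text = str(uuid.uuid4()).upper()
    return etree.tostring(envelope)
```

Calling `build_envelope('some-production-id')` returns the serialized request as bytes, suitable for use as a POST body.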
diff --git a/youtube_dl/extractor/ivi.py b/youtube_dl/extractor/ivi.py

index 7c8cb21c2c5619b4809f5daf8605958a808eccb9..3d3c15024457e30d2002a3ee19e6eeab8a29ee4d 100644 (file)
--- a/youtube_dl/extractor/ivi.py
+++ b/youtube_dl/extractor/ivi.py
@@ -28,7 +28,7 @@ class IviIE(InfoExtractor):
                  'title': 'Иван Васильевич меняет профессию',
                  'description': 'md5:b924063ea1677c8fe343d8a72ac2195f',
                  'duration': 5498,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'Only works from Russia',
          },
@@ -46,7 +46,7 @@ class IviIE(InfoExtractor):
                  'episode': 'Дело Гольдберга (1 часть)',
                  'episode_number': 1,
                  'duration': 2655,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'Only works from Russia',
          },
@@ -60,7 +60,7 @@ class IviIE(InfoExtractor):
                  'title': 'Кукла',
                  'description': 'md5:ffca9372399976a2d260a407cc74cce6',
                  'duration': 5599,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'Only works from Russia',
          }
diff --git a/youtube_dl/extractor/iwara.py b/youtube_dl/extractor/iwara.py

index 8d7e7f4721f3e315a16b3bef5bb1c2a788a14429..a7514fc80b3dc64636a9f53b7abc8d7672cb0546 100644 (file)
--- a/youtube_dl/extractor/iwara.py
+++ b/youtube_dl/extractor/iwara.py
@@ -3,14 +3,18 @@ from __future__ import unicode_literals
  
  from .common import InfoExtractor
  from ..compat import compat_urllib_parse_urlparse
-from ..utils import remove_end
+from ..utils import (
+    int_or_none,
+    mimetype2ext,
+    remove_end,
+)
  
  
  class IwaraIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.|ecchi\.)?iwara\.tv/videos/(?P<id>[a-zA-Z0-9]+)'
      _TESTS = [{
          'url': 'http://iwara.tv/videos/amVwUl1EHpAD9RD',
-        'md5': '1d53866b2c514b23ed69e4352fdc9839',
+        # md5 is unstable
          'info_dict': {
              'id': 'amVwUl1EHpAD9RD',
              'ext': 'mp4',
@@ -23,17 +27,17 @@ class IwaraIE(InfoExtractor):
          'info_dict': {
              'id': '0B1LvuHnL-sRFNXB1WHNqbGw4SXc',
              'ext': 'mp4',
-            'title': '[3D Hentai] Kyonyu Ã\83\x97 Genkai Ã\83\x97 Emaki Shinobi Girls.mp4',
+            'title': '[3D Hentai] Kyonyu Ã\97 Genkai Ã\97 Emaki Shinobi Girls.mp4',
              'age_limit': 18,
          },
          'add_ie': ['GoogleDrive'],
      }, {
          'url': 'http://www.iwara.tv/videos/nawkaumd6ilezzgq',
-        'md5': '1d85f1e5217d2791626cff5ec83bb189',
+        # md5 is unstable
          'info_dict': {
              'id': '6liAP9s2Ojc',
              'ext': 'mp4',
-            'age_limit': 0,
+            'age_limit': 18,
              'title': '[MMD] Do It Again Ver.2 [1080p 60FPS] (Motion,Camera,Wav+DL)',
              'description': 'md5:590c12c0df1443d833fbebe05da8c47a',
              'upload_date': '20160910',
@@ -52,9 +56,9 @@ class IwaraIE(InfoExtractor):
          # ecchi is 'sexy' in Japanese
          age_limit = 18 if hostname.split('.')[0] == 'ecchi' else 0
  
-        entries = self._parse_html5_media_entries(url, webpage, video_id)
+        video_data = self._download_json('http://www.iwara.tv/api/video/%s' % video_id, video_id)
  
-        if not entries:
+        if not video_data:
              iframe_url = self._html_search_regex(
                  r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1',
                  webpage, 'iframe URL', group='url')
@@ -67,11 +71,25 @@ class IwaraIE(InfoExtractor):
          title = remove_end(self._html_search_regex(
              r'<title>([^<]+)</title>', webpage, 'title'), ' | Iwara')
  
-        info_dict = entries[0]
-        info_dict.update({
+        formats = []
+        for a_format in video_data:
+            format_id = a_format.get('resolution')
+            height = int_or_none(self._search_regex(
+                r'(\d+)p', format_id, 'height', default=None))
+            formats.append({
+                'url': a_format['uri'],
+                'format_id': format_id,
+                'ext': mimetype2ext(a_format.get('mime')) or 'mp4',
+                'height': height,
+                'width': int_or_none(height / 9.0 * 16.0 if height else None),
+                'quality': 1 if format_id == 'Source' else 0,
+            })
+
+        self._sort_formats(formats)
+
+        return {
              'id': video_id,
              'title': title,
              'age_limit': age_limit,
-        })
-
-        return info_dict
+            'formats': formats,
+        }
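Review note on the Iwara change above: the new format loop reads the height from a label such as `720p` and derives the width by assuming a 16:9 frame. The sketch below isolates that calculation so the rounding behaviour is easy to see. `guess_dimensions` is a hypothetical name, and the 16:9 assumption comes from the diff rather than from any guarantee about the actual videos.

```python
import re


def guess_dimensions(resolution):
    # Pull the number out of labels like '720p'; labels such as 'Source'
    # carry no height, so both values stay unknown.
    m = re.search(r'(\d+)p', resolution or '')
    if not m:
        return None, None
    height = int(m.group(1))
    # Assume a 16:9 frame; int() truncates any fractional width.
    width = int(height / 9.0 * 16.0)
    return width, height
```

Heights that are not multiples of 9 give a truncated width (for example 480 yields 853), so the derived width is an estimate, not a measured value.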
diff --git a/youtube_dl/extractor/izlesene.py b/youtube_dl/extractor/izlesene.py

index aa0728abc0155fa6abbe8e2a88de18dd89d85138..b1d72177d5acef2c48a82f7df18081005199b47e 100644 (file)
--- a/youtube_dl/extractor/izlesene.py
+++ b/youtube_dl/extractor/izlesene.py
@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
                  'description': 'md5:253753e2655dde93f59f74b572454f6d',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'uploader_id': 'pelikzzle',
                  'timestamp': int,
                  'upload_date': '20140702',
@@ -44,7 +44,7 @@ class IzleseneIE(InfoExtractor):
                  'id': '17997',
                  'ext': 'mp4',
                  'title': 'Tarkan Dortmund 2006 Konseri',
-                'thumbnail': 're:^https://.*\.jpg',
+                'thumbnail': r're:^https://.*\.jpg',
                  'uploader_id': 'parlayankiz',
                  'timestamp': int,
                  'upload_date': '20061112',
diff --git a/youtube_dl/extractor/jamendo.py b/youtube_dl/extractor/jamendo.py

index ee9acac09a4c14f02fc4857d9e22005a1dcd7951..595d7a5b75a25d7e5ac41b29c9052e22cc531e66 100644 (file)
--- a/youtube_dl/extractor/jamendo.py
+++ b/youtube_dl/extractor/jamendo.py
@@ -5,9 +5,27 @@ import re
  
  from ..compat import compat_urlparse
  from .common import InfoExtractor
-
-
-class JamendoIE(InfoExtractor):
+from ..utils import parse_duration
+
+
+class JamendoBaseIE(InfoExtractor):
+    def _extract_meta(self, webpage, fatal=True):
+        title = self._og_search_title(
+            webpage, default=None) or self._search_regex(
+            r'<title>([^<]+)', webpage,
+            'title', default=None)
+        if title:
+            title = self._search_regex(
+                r'(.+?)\s*\|\s*Jamendo Music', title, 'title', default=None)
+        if not title:
+            title = self._html_search_meta(
+                'name', webpage, 'title', fatal=fatal)
+        mobj = re.search(r'(.+) - (.+)', title or '')
+        artist, second = mobj.groups() if mobj else [None] * 2
+        return title, artist, second
+
+
+class JamendoIE(JamendoBaseIE):
      _VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)'
      _TEST = {
          'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
@@ -16,8 +34,11 @@ class JamendoIE(InfoExtractor):
              'id': '196219',
              'display_id': 'stories-from-emona-i',
              'ext': 'flac',
-            'title': 'Stories from Emona I',
-            'thumbnail': 're:^https?://.*\.jpg'
+            'title': 'Maya Filipič - Stories from Emona I',
+            'artist': 'Maya Filipič',
+            'track': 'Stories from Emona I',
+            'duration': 210,
+            'thumbnail': r're:^https?://.*\.jpg'
          }
      }
  
@@ -28,7 +49,7 @@ class JamendoIE(InfoExtractor):
  
          webpage = self._download_webpage(url, display_id)
  
-        title = self._html_search_meta('name', webpage, 'title')
+        title, artist, track = self._extract_meta(webpage)
  
          formats = [{
              'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@@ -46,37 +67,47 @@ class JamendoIE(InfoExtractor):
  
          thumbnail = self._html_search_meta(
              'image', webpage, 'thumbnail', fatal=False)
+        duration = parse_duration(self._search_regex(
+            r'<span[^>]+itemprop=["\']duration["\'][^>]+content=["\'](.+?)["\']',
+            webpage, 'duration', fatal=False))
  
          return {
              'id': track_id,
              'display_id': display_id,
              'thumbnail': thumbnail,
              'title': title,
+            'duration': duration,
+            'artist': artist,
+            'track': track,
              'formats': formats
          }
  
  
-class JamendoAlbumIE(InfoExtractor):
+class JamendoAlbumIE(JamendoBaseIE):
      _VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)/(?P<display_id>[\w-]+)'
      _TEST = {
          'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
          'info_dict': {
              'id': '121486',
-            'title': 'Duck On Cover'
+            'title': 'Shearer - Duck On Cover'
          },
          'playlist': [{
              'md5': 'e1a2fcb42bda30dfac990212924149a8',
              'info_dict': {
                  'id': '1032333',
                  'ext': 'flac',
-                'title': 'Warmachine'
+                'title': 'Shearer - Warmachine',
+                'artist': 'Shearer',
+                'track': 'Warmachine',
              }
          }, {
              'md5': '1f358d7b2f98edfe90fd55dac0799d50',
              'info_dict': {
                  'id': '1032330',
                  'ext': 'flac',
-                'title': 'Without Your Ghost'
+                'title': 'Shearer - Without Your Ghost',
+                'artist': 'Shearer',
+                'track': 'Without Your Ghost',
              }
          }],
          'params': {
@@ -90,18 +121,18 @@ class JamendoAlbumIE(InfoExtractor):
  
          webpage = self._download_webpage(url, mobj.group('display_id'))
  
-        title = self._html_search_meta('name', webpage, 'title')
-
-        entries = [
-            self.url_result(
-                compat_urlparse.urljoin(url, m.group('path')),
-                ie=JamendoIE.ie_key(),
-                video_id=self._search_regex(
-                    r'/track/(\d+)', m.group('path'),
-                    'track id', default=None))
-            for m in re.finditer(
-                r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
-                webpage)
-        ]
+        title, artist, album = self._extract_meta(webpage, fatal=False)
+
+        entries = [{
+            '_type': 'url_transparent',
+            'url': compat_urlparse.urljoin(url, m.group('path')),
+            'ie_key': JamendoIE.ie_key(),
+            'id': self._search_regex(
+                r'/track/(\d+)', m.group('path'), 'track id', default=None),
+            'artist': artist,
+            'album': album,
+        } for m in re.finditer(
+            r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
+            webpage)]
  
          return self.playlist_result(entries, album_id, title)
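Review note on `JamendoBaseIE._extract_meta` above: the title is split into artist and track (or album) with the pattern `(.+) - (.+)`. A small standalone sketch of that split follows; `split_title` is a hypothetical name, and the behaviour shown relies only on Python's `re` module.

```python
import re


def split_title(title):
    # Same pattern as in the diff: the first group is greedy, so the split
    # happens at the LAST ' - ' separator in the string.
    mobj = re.search(r'(.+) - (.+)', title or '')
    return mobj.groups() if mobj else (None, None)
```

Because the first group is greedy, a title containing more than one ` - ` assigns everything before the last separator to the artist, which may or may not match the site's intent.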
diff --git a/youtube_dl/extractor/jove.py b/youtube_dl/extractor/jove.py

index cf73cd7533177d028cee83a2a013914b93f64b15..f9a034b78e41a8ec4b998f956c25c47715d6b1c7 100644 (file)
--- a/youtube_dl/extractor/jove.py
+++ b/youtube_dl/extractor/jove.py
@@ -21,7 +21,7 @@ class JoveIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Electrode Positioning and Montage in Transcranial Direct Current Stimulation',
                  'description': 'md5:015dd4509649c0908bc27f049e0262c6',
-                'thumbnail': 're:^https?://.*\.png$',
+                'thumbnail': r're:^https?://.*\.png$',
                  'upload_date': '20110523',
              }
          },
@@ -33,7 +33,7 @@ class JoveIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Culturing Caenorhabditis elegans in Axenic Liquid Media and Creation of Transgenic Worms by Microparticle Bombardment',
                  'description': 'md5:35ff029261900583970c4023b70f1dc9',
-                'thumbnail': 're:^https?://.*\.png$',
+                'thumbnail': r're:^https?://.*\.png$',
                  'upload_date': '20140802',
              }
          },
diff --git a/youtube_dl/extractor/jwplatform.py b/youtube_dl/extractor/jwplatform.py

index 5d56e0a28bd55b93153a92446834ba440ad59572..aff7ab49a9500c8bdabe78fac393eb30ef827db5 100644 (file)
--- a/youtube_dl/extractor/jwplatform.py
+++ b/youtube_dl/extractor/jwplatform.py
@@ -11,6 +11,7 @@ from ..utils import (
      int_or_none,
      js_to_json,
      mimetype2ext,
+    urljoin,
  )
  
  
@@ -110,10 +111,14 @@ class JWPlatformBaseIE(InfoExtractor):
              tracks = video_data.get('tracks')
              if tracks and isinstance(tracks, list):
                  for track in tracks:
-                    if track.get('file') and track.get('kind') == 'captions':
-                        subtitles.setdefault(track.get('label') or 'en', []).append({
-                            'url': self._proto_relative_url(track['file'])
-                        })
+                    if track.get('kind') != 'captions':
+                        continue
+                    track_url = urljoin(base_url, track.get('file'))
+                    if not track_url:
+                        continue
+                    subtitles.setdefault(track.get('label') or 'en', []).append({
+                        'url': self._proto_relative_url(track_url)
+                    })
  
              entries.append({
                  'id': this_video_id,
@@ -121,7 +126,7 @@ class JWPlatformBaseIE(InfoExtractor):
                  'description': video_data.get('description'),
                  'thumbnail': self._proto_relative_url(video_data.get('image')),
                  'timestamp': int_or_none(video_data.get('pubdate')),
-                'duration': float_or_none(jwplayer_data.get('duration')),
+                'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
                  'subtitles': subtitles,
                  'formats': formats,
              })
diff --git a/youtube_dl/extractor/kaltura.py b/youtube_dl/extractor/kaltura.py

index 91bc3a0a7c0af4690cf1a16713de1e76bccaa67a..5ef382f9f730091c079ab5083e0ab87f4677c407 100644 (file)
--- a/youtube_dl/extractor/kaltura.py
+++ b/youtube_dl/extractor/kaltura.py
@@ -107,7 +107,7 @@ class KalturaIE(InfoExtractor):
                          (?P<q1>['\"])wid(?P=q1)\s*:\s*
                          (?P<q2>['\"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
                          (?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
-                        (?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4),
+                        (?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
                  """, webpage) or
              re.search(
                  r'''(?xs)
@@ -266,6 +266,12 @@ class KalturaIE(InfoExtractor):
              # skip for now.
              if f.get('fileExt') == 'chun':
                  continue
+            if not f.get('fileExt'):
+                # QT indicates QuickTime; some videos have broken fileExt
+                if f.get('containerFormat') == 'qt':
+                    f['fileExt'] = 'mov'
+                else:
+                    f['fileExt'] = 'mp4'
              video_url = sign_url(
                  '%s/flavorId/%s' % (data_url, f['id']))
              # audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
@@ -316,6 +322,6 @@ class KalturaIE(InfoExtractor):
              'thumbnail': info.get('thumbnailUrl'),
              'duration': info.get('duration'),
              'timestamp': info.get('createdAt'),
-            'uploader_id': info.get('userId'),
+            'uploader_id': info.get('userId') if info.get('userId') != 'None' else None,
              'view_count': info.get('plays'),
          }
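Review note on the Kaltura change above: flavors with a missing `fileExt` now fall back to `mov` for QuickTime containers and `mp4` otherwise, and a literal `'None'` user id is treated as absent. The sketch below restates both rules as pure functions. `guess_file_ext` and `clean_uploader_id` are hypothetical names, and the input dictionaries are illustrative, not real API responses.

```python
def guess_file_ext(flavor):
    # Keep an existing extension; otherwise infer it from the container.
    ext = flavor.get('fileExt')
    if ext:
        return ext
    return 'mov' if flavor.get('containerFormat') == 'qt' else 'mp4'


def clean_uploader_id(info):
    # The API may return the string 'None' instead of omitting the field.
    user_id = info.get('userId')
    return user_id if user_id != 'None' else None
```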
diff --git a/youtube_dl/extractor/karrierevideos.py b/youtube_dl/extractor/karrierevideos.py

index c05263e6165159320376939c252af7dea7aeadb2..4e9eb67bf24690571176299de5ff900c7496fec8 100644 (file)
--- a/youtube_dl/extractor/karrierevideos.py
+++ b/youtube_dl/extractor/karrierevideos.py
@@ -20,7 +20,7 @@ class KarriereVideosIE(InfoExtractor):
              'ext': 'flv',
              'title': 'AltenpflegerIn',
              'description': 'md5:dbadd1259fde2159a9b28667cb664ae2',
-            'thumbnail': 're:^http://.*\.png',
+            'thumbnail': r're:^http://.*\.png',
          },
          'params': {
              # rtmp download
@@ -34,7 +34,7 @@ class KarriereVideosIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Väterkarenz und neue Chancen für Mütter - "Baby - was nun?"',
              'description': 'md5:97092c6ad1fd7d38e9d6a5fdeb2bcc33',
-            'thumbnail': 're:^http://.*\.png',
+            'thumbnail': r're:^http://.*\.png',
          },
          'params': {
              # rtmp download
diff --git a/youtube_dl/extractor/keezmovies.py b/youtube_dl/extractor/keezmovies.py

index 588a4d0ec4eda6e38817b26f192536c40a172f3e..e83115e2a6c7b7a63be5237340ca0845272f8c03 100644 (file)
--- a/youtube_dl/extractor/keezmovies.py
+++ b/youtube_dl/extractor/keezmovies.py
@@ -27,7 +27,7 @@ class KeezMoviesIE(InfoExtractor):
              'display_id': 'petite-asian-lady-mai-playing-in-bathtub',
              'ext': 'mp4',
              'title': 'Petite Asian Lady Mai Playing In Bathtub',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'view_count': int,
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/ketnet.py b/youtube_dl/extractor/ketnet.py

index eb0a160089b395736a1370171ca7460e32f4e7e2..fb9c2dbd47789ae6f0457a4b2724c53875d14753 100644 (file)
--- a/youtube_dl/extractor/ketnet.py
+++ b/youtube_dl/extractor/ketnet.py
@@ -13,7 +13,7 @@ class KetnetIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Gluur mee op de filmset en op Pennenzakkenrock',
              'description': 'Gluur mee met Ghost Rockers op de filmset',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016',
diff --git a/youtube_dl/extractor/konserthusetplay.py b/youtube_dl/extractor/konserthusetplay.py

index 55291c66ff066733a8610abe5acc65b1e0daf7f3..c11cbcf4757238642639cb6fac454ce98bb4a5c5 100644 (file)
--- a/youtube_dl/extractor/konserthusetplay.py
+++ b/youtube_dl/extractor/konserthusetplay.py
@@ -2,29 +2,31 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
+    determine_ext,
      float_or_none,
      int_or_none,
  )
  
  
  class KonserthusetPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?konserthusetplay\.se/\?.*\bm=(?P<id>[^&]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?(?:konserthusetplay|rspoplay)\.se/\?.*\bm=(?P<id>[^&]+)'
+    _TESTS = [{
          'url': 'http://www.konserthusetplay.se/?m=CKDDnlCY-dhWAAqiMERd-A',
+        'md5': 'e3fd47bf44e864bd23c08e487abe1967',
          'info_dict': {
              'id': 'CKDDnlCY-dhWAAqiMERd-A',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Orkesterns instrument: Valthornen',
              'description': 'md5:f10e1f0030202020396a4d712d2fa827',
              'thumbnail': 're:^https?://.*$',
-            'duration': 398.8,
+            'duration': 398.76,
          },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
-    }
+    }, {
+        'url': 'http://rspoplay.se/?m=elWuEH34SMKvaO4wO_cHBw',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -42,12 +44,18 @@ class KonserthusetPlayIE(InfoExtractor):
          player_config = media['playerconfig']
          playlist = player_config['playlist']
  
-        source = next(f for f in playlist if f.get('bitrates'))
+        source = next(f for f in playlist if f.get('bitrates') or f.get('provider'))
  
          FORMAT_ID_REGEX = r'_([^_]+)_h264m\.mp4'
  
          formats = []
  
+        m3u8_url = source.get('url')
+        if m3u8_url and determine_ext(m3u8_url) == 'm3u8':
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+
          fallback_url = source.get('fallbackUrl')
          fallback_format_id = None
          if fallback_url:
@@ -97,6 +105,13 @@ class KonserthusetPlayIE(InfoExtractor):
          thumbnail = media.get('image')
          duration = float_or_none(media.get('duration'), 1000)
  
+        subtitles = {}
+        captions = source.get('captionsAvailableLanguages')
+        if isinstance(captions, dict):
+            for lang, subtitle_url in captions.items():
+                if lang != 'none' and isinstance(subtitle_url, compat_str):
+                    subtitles.setdefault(lang, []).append({'url': subtitle_url})
+
          return {
              'id': video_id,
              'title': title,
@@ -104,4 +119,5 @@ class KonserthusetPlayIE(InfoExtractor):
              'thumbnail': thumbnail,
              'duration': duration,
              'formats': formats,
+            'subtitles': subtitles,
          }
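Review note on the KonserthusetPlay change above: subtitles are collected from `captionsAvailableLanguages` only when it is a dict, skipping the `none` pseudo-language and any non-string URL. A minimal sketch of that filtering is shown here; `collect_subtitles` is a hypothetical name, `str` stands in for `compat_str` on Python 3, and the sample input is invented for illustration.

```python
def collect_subtitles(captions):
    subtitles = {}
    # Anything other than a dict (None, list, string) yields no subtitles.
    if isinstance(captions, dict):
        for lang, subtitle_url in captions.items():
            # 'none' marks the "subtitles off" entry; non-string URLs are skipped.
            if lang != 'none' and isinstance(subtitle_url, str):
                subtitles.setdefault(lang, []).append({'url': subtitle_url})
    return subtitles
```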
diff --git a/youtube_dl/extractor/krasview.py b/youtube_dl/extractor/krasview.py

index cf8876fa1f2321e7b020e2e773452f82df1bd2f1..d27d052ff0c11937a910aa689f6583ea5a3c8148 100644 (file)
--- a/youtube_dl/extractor/krasview.py
+++ b/youtube_dl/extractor/krasview.py
@@ -23,7 +23,7 @@ class KrasViewIE(InfoExtractor):
              'title': 'Снег, лёд, заносы',
              'description': 'Снято в городе Нягань, в Ханты-Мансийском автономном округе.',
              'duration': 27,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
          'params': {
              'skip_download': 'Not accessible from Travis CI server',
diff --git a/youtube_dl/extractor/kusi.py b/youtube_dl/extractor/kusi.py

index 2e66e8cf9d791abe27d908e04e48fd6cd3bfd4dc..6a7e3baa70cc019b497828ab84d176e55216356f 100644 (file)
--- a/youtube_dl/extractor/kusi.py
+++ b/youtube_dl/extractor/kusi.py
@@ -27,7 +27,7 @@ class KUSIIE(InfoExtractor):
              'duration': 223.586,
              'upload_date': '20160826',
              'timestamp': 1472233118,
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          },
      }, {
          'url': 'http://kusi.com/video?clipId=12203019',
diff --git a/youtube_dl/extractor/laola1tv.py b/youtube_dl/extractor/laola1tv.py

index 2fab38079aac0c5f20a1772d52fa52642cb520bf..3190b187c9dfb8fa9204e9761b47ded0c17f5f2d 100644 (file)
--- a/youtube_dl/extractor/laola1tv.py
+++ b/youtube_dl/extractor/laola1tv.py
@@ -1,25 +1,115 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlencode,
-    compat_urlparse,
-)
  from ..utils import (
      ExtractorError,
-    sanitized_Request,
      unified_strdate,
      urlencode_postdata,
      xpath_element,
      xpath_text,
+    urljoin,
+    update_url_query,
  )
  
  
+class Laola1TvEmbedIE(InfoExtractor):
+    IE_NAME = 'laola1tv:embed'
+    _VALID_URL = r'https?://(?:www\.)?laola1\.tv/titanplayer\.php\?.*?\bvideoid=(?P<id>\d+)'
+    _TEST = {
+        # flashvars.premium = "false";
+        'url': 'https://www.laola1.tv/titanplayer.php?videoid=708065&type=V&lang=en&portal=int&customer=1024',
+        'info_dict': {
+            'id': '708065',
+            'ext': 'mp4',
+            'title': 'MA Long CHN - FAN Zhendong CHN',
+            'uploader': 'ITTF - International Table Tennis Federation',
+            'upload_date': '20161211',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        flash_vars = self._search_regex(
+            r'(?s)flashvars\s*=\s*({.+?});', webpage, 'flash vars')
+
+        def get_flashvar(x, *args, **kwargs):
+            flash_var = self._search_regex(
+                r'%s\s*:\s*"([^"]+)"' % x,
+                flash_vars, x, default=None)
+            if not flash_var:
+                flash_var = self._search_regex([
+                    r'flashvars\.%s\s*=\s*"([^"]+)"' % x,
+                    r'%s\s*=\s*"([^"]+)"' % x],
+                    webpage, x, *args, **kwargs)
+            return flash_var
+
+        hd_doc = self._download_xml(
+            'http://www.laola1.tv/server/hd_video.php', video_id, query={
+                'play': get_flashvar('streamid'),
+                'partner': get_flashvar('partnerid'),
+                'portal': get_flashvar('portalid'),
+                'lang': get_flashvar('sprache'),
+                'v5ident': '',
+            })
+
+        _v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
+        title = _v('title', fatal=True)
+
+        token_url = None
+        premium = get_flashvar('premium', default=None)
+        if premium:
+            token_url = update_url_query(
+                _v('url', fatal=True), {
+                    'timestamp': get_flashvar('timestamp'),
+                    'auth': get_flashvar('auth'),
+                })
+        else:
+            data_abo = urlencode_postdata(
+                dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(','))))
+            token_url = self._download_json(
+                'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access',
+                video_id, query={
+                    'videoId': _v('id'),
+                    'target': self._search_regex(r'vs_target = (\d+);', webpage, 'vs target'),
+                    'label': _v('label'),
+                    'area': _v('area'),
+                }, data=data_abo)['data']['stream-access'][0]
+
+        token_doc = self._download_xml(
+            token_url, video_id, 'Downloading token',
+            headers=self.geo_verification_headers())
+
+        token_attrib = xpath_element(token_doc, './/token').attrib
+
+        if token_attrib['status'] != '0':
+            raise ExtractorError(
+                'Token error: %s' % token_attrib['comment'], expected=True)
+
+        formats = self._extract_akamai_formats(
+            '%s?hdnea=%s' % (token_attrib['url'], token_attrib['auth']),
+            video_id)
+        self._sort_formats(formats)
+
+        categories_str = _v('meta_sports')
+        categories = categories_str.split(',') if categories_str else []
+        is_live = _v('islive') == 'true'
+
+        return {
+            'id': video_id,
+            'title': self._live_title(title) if is_live else title,
+            'upload_date': unified_strdate(_v('time_date')),
+            'uploader': _v('meta_organisation'),
+            'categories': categories,
+            'is_live': is_live,
+            'formats': formats,
+        }
+
+
  class Laola1TvIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/(?P<kind>[^/]+)/(?P<slug>[^/?#&]+)'
+    IE_NAME = 'laola1tv'
+    _VALID_URL = r'https?://(?:www\.)?laola1\.tv/[a-z]+-[a-z]+/[^/]+/(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
          'info_dict': {
@@ -67,85 +157,20 @@ class Laola1TvIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('slug')
-        kind = mobj.group('kind')
-        lang = mobj.group('lang')
-        portal = mobj.group('portal')
+        display_id = self._match_id(url)
  
          webpage = self._download_webpage(url, display_id)
  
          if 'Dieser Livestream ist bereits beendet.' in webpage:
              raise ExtractorError('This live stream has already finished.', expected=True)
  
-        iframe_url = self._search_regex(
+        iframe_url = urljoin(url, self._search_regex(
              r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
-            webpage, 'iframe url')
-
-        video_id = self._search_regex(
-            r'videoid=(\d+)', iframe_url, 'video id')
-
-        iframe = self._download_webpage(compat_urlparse.urljoin(
-            url, iframe_url), display_id, 'Downloading iframe')
-
-        partner_id = self._search_regex(
-            r'partnerid\s*:\s*(["\'])(?P<partner_id>.+?)\1',
-            iframe, 'partner id', group='partner_id')
-
-        hd_doc = self._download_xml(
-            'http://www.laola1.tv/server/hd_video.php?%s'
-            % compat_urllib_parse_urlencode({
-                'play': video_id,
-                'partner': partner_id,
-                'portal': portal,
-                'lang': lang,
-                'v5ident': '',
-            }), display_id)
-
-        _v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
-        title = _v('title', fatal=True)
-
-        VS_TARGETS = {
-            'video': '2',
-            'livestream': '17',
-        }
-
-        req = sanitized_Request(
-            'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access?%s' %
-            compat_urllib_parse_urlencode({
-                'videoId': video_id,
-                'target': VS_TARGETS.get(kind, '2'),
-                'label': _v('label'),
-                'area': _v('area'),
-            }),
-            urlencode_postdata(
-                dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(',')))))
-
-        token_url = self._download_json(req, display_id)['data']['stream-access'][0]
-        token_doc = self._download_xml(token_url, display_id, 'Downloading token')
-
-        token_attrib = xpath_element(token_doc, './/token').attrib
-        token_auth = token_attrib['auth']
-
-        if token_auth in ('blocked', 'restricted', 'error'):
-            raise ExtractorError(
-                'Token error: %s' % token_attrib['comment'], expected=True)
-
-        formats = self._extract_f4m_formats(
-            '%s?hdnea=%s&hdcore=3.2.0' % (token_attrib['url'], token_auth),
-            video_id, f4m_id='hds')
-        self._sort_formats(formats)
-
-        categories_str = _v('meta_sports')
-        categories = categories_str.split(',') if categories_str else []
+            webpage, 'iframe url'))
  
          return {
-            'id': video_id,
+            '_type': 'url',
              'display_id': display_id,
-            'title': title,
-            'upload_date': unified_strdate(_v('time_date')),
-            'uploader': _v('meta_organisation'),
-            'categories': categories,
-            'is_live': _v('islive') == 'true',
-            'formats': formats,
+            'url': iframe_url,
+            'ie_key': 'Laola1TvEmbed',
          }
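Note on the embed extractor's `get_flashvar` helper above: it looks inside the captured `flashvars` object first and only then falls back to assignments elsewhere in the page. The following is a minimal standalone sketch of that lookup order using plain `re`. The function signature and the sample page are hypothetical, and unlike the real helper (which delegates to `_search_regex`) this version simply returns a default when nothing matches.

```python
import re


def get_flashvar(name, flash_vars, webpage, default=None):
    """Look up a flash variable, mirroring the two-stage search in the patch."""
    # Stage 1: name: "value" inside the flashvars object literal.
    m = re.search(r'%s\s*:\s*"([^"]+)"' % name, flash_vars)
    if m:
        return m.group(1)
    # Stage 2: assignments anywhere on the page, most specific pattern first.
    for pattern in (r'flashvars\.%s\s*=\s*"([^"]+)"' % name,
                    r'%s\s*=\s*"([^"]+)"' % name):
        m = re.search(pattern, webpage)
        if m:
            return m.group(1)
    return default


# Hypothetical page content, for illustration only.
webpage = ('var flashvars = {streamid: "123", partnerid: "22"};\n'
           'flashvars.portalid = "de";')
flash_vars = re.search(r'(?s)flashvars\s*=\s*({.+?});', webpage).group(1)

stream_id = get_flashvar('streamid', flash_vars, webpage)
portal_id = get_flashvar('portalid', flash_vars, webpage)
language = get_flashvar('sprache', flash_vars, webpage)
```

Here `streamid` is resolved in stage 1, `portalid` only in stage 2, and `sprache` falls through to the default.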
diff --git a/youtube_dl/extractor/leeco.py b/youtube_dl/extractor/leeco.py
index c48a5aad17ad36324b3cf70956d0ed234ffa522b..4321f90c87febbf44b4bedec97a0ba3d6a3e3b49 100644
--- a/youtube_dl/extractor/leeco.py
+++ b/youtube_dl/extractor/leeco.py
@@ -386,8 +386,8 @@ class LetvCloudIE(InfoExtractor):
          return formats
  
      def _real_extract(self, url):
-        uu_mobj = re.search('uu=([\w]+)', url)
-        vu_mobj = re.search('vu=([\w]+)', url)
+        uu_mobj = re.search(r'uu=([\w]+)', url)
+        vu_mobj = re.search(r'vu=([\w]+)', url)
  
          if not uu_mobj or not vu_mobj:
              raise ExtractorError('Invalid URL: %s' % url, expected=True)
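Note on the raw-string change in this hunk, which recurs throughout the commit: the `r` prefix keeps backslashes intact so the regex engine, not the Python parser, interprets them. A minimal sketch with the standard `re` module follows; the sample URL is made up for illustration.

```python
import re

# Hypothetical URL, for illustration only.
url = 'http://example.com/player?uu=abc123&vu=def456'

# With the r prefix the two characters backslash + w reach the regex engine.
# Without it, '\w' only works because Python leaves unknown escapes alone,
# and Python 3.6+ emits a deprecation warning for such literals.
uu_mobj = re.search(r'uu=([\w]+)', url)
vu_mobj = re.search(r'vu=([\w]+)', url)
uu = uu_mobj.group(1)
vu = vu_mobj.group(1)

# A recognised escape shows the real hazard: '\b' is a single backspace
# character, while r'\b' is the two-character word-boundary token.
plain_len = len('\b')
raw_len = len(r'\b')
```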
diff --git a/youtube_dl/extractor/lemonde.py b/youtube_dl/extractor/lemonde.py
index be66fff0390e184cc1fd1c8dfa5ccd155664b760..42568f315ed1b6818907f1236705ebdeed2c02cb 100644
--- a/youtube_dl/extractor/lemonde.py
+++ b/youtube_dl/extractor/lemonde.py
@@ -12,7 +12,7 @@ class LemondeIE(InfoExtractor):
              'id': 'lqm3kl',
              'ext': 'mp4',
              'title': "Comprendre l'affaire Bygmalion en 5 minutes",
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 320,
              'upload_date': '20160119',
              'timestamp': 1453194778,
diff --git a/youtube_dl/extractor/libraryofcongress.py b/youtube_dl/extractor/libraryofcongress.py
index 0a94366fd8059b093d6b6600e380d829b4fe34c7..40295a30b51f733b637c651cc8a434ede14f517a 100644
--- a/youtube_dl/extractor/libraryofcongress.py
+++ b/youtube_dl/extractor/libraryofcongress.py
@@ -25,7 +25,7 @@ class LibraryOfCongressIE(InfoExtractor):
              'id': '90716351',
              'ext': 'mp4',
              'title': "Pa's trip to Mars",
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 0,
              'view_count': int,
          },
diff --git a/youtube_dl/extractor/libsyn.py b/youtube_dl/extractor/libsyn.py
index d375695f5a26dbc072455777487ed239820c1ec6..4750b03a3fb2f47818858338b7eb9a8b4889c012 100644
--- a/youtube_dl/extractor/libsyn.py
+++ b/youtube_dl/extractor/libsyn.py
@@ -41,7 +41,7 @@ class LibsynIE(InfoExtractor):
  
          formats = [{
              'url': media_url,
-        } for media_url in set(re.findall('var\s+mediaURL(?:Libsyn)?\s*=\s*"([^"]+)"', webpage))]
+        } for media_url in set(re.findall(r'var\s+mediaURL(?:Libsyn)?\s*=\s*"([^"]+)"', webpage))]
  
          podcast_title = self._search_regex(
              r'<h2>([^<]+)</h2>', webpage, 'podcast title', default=None)
diff --git a/youtube_dl/extractor/lifenews.py b/youtube_dl/extractor/lifenews.py
index afce2010eafadc3ceaab1eaa7d846e5e6360d547..42e263bfaba76f97a4318ab5624872fabda435a2 100644
--- a/youtube_dl/extractor/lifenews.py
+++ b/youtube_dl/extractor/lifenews.py
@@ -176,7 +176,7 @@ class LifeEmbedIE(InfoExtractor):
              'id': 'e50c2dec2867350528e2574c899b8291',
              'ext': 'mp4',
              'title': 'e50c2dec2867350528e2574c899b8291',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          }
      }, {
          # with 1080p
diff --git a/youtube_dl/extractor/limelight.py b/youtube_dl/extractor/limelight.py
index b7bfa7a6d524e4a5ebd190947b52a369a211e753..e635f3c4dc46c6407a166dec9ab2ef06981b6221 100644
--- a/youtube_dl/extractor/limelight.py
+++ b/youtube_dl/extractor/limelight.py
@@ -59,14 +59,26 @@ class LimelightBaseIE(InfoExtractor):
                      format_id = 'rtmp'
                      if stream.get('videoBitRate'):
                          format_id += '-%d' % int_or_none(stream['videoBitRate'])
-                    http_url = 'http://cpl.delvenetworks.com/' + rtmp.group('playpath')[4:]
-                    urls.append(http_url)
-                    http_fmt = fmt.copy()
-                    http_fmt.update({
-                        'url': http_url,
-                        'format_id': format_id.replace('rtmp', 'http'),
-                    })
-                    formats.append(http_fmt)
+                    http_format_id = format_id.replace('rtmp', 'http')
+
+                    CDN_HOSTS = (
+                        ('delvenetworks.com', 'cpl.delvenetworks.com'),
+                        ('video.llnw.net', 's2.content.video.llnw.net'),
+                    )
+                    for cdn_host, http_host in CDN_HOSTS:
+                        if cdn_host not in rtmp.group('host').lower():
+                            continue
+                        http_url = 'http://%s/%s' % (http_host, rtmp.group('playpath')[4:])
+                        urls.append(http_url)
+                        if self._is_valid_url(http_url, video_id, http_format_id):
+                            http_fmt = fmt.copy()
+                            http_fmt.update({
+                                'url': http_url,
+                                'format_id': http_format_id,
+                            })
+                            formats.append(http_fmt)
+                            break
+
                      fmt.update({
                          'url': rtmp.group('url'),
                          'play_path': rtmp.group('playpath'),
@@ -164,7 +176,7 @@ class LimelightMediaIE(LimelightBaseIE):
              'ext': 'mp4',
              'title': 'HaP and the HB Prince Trailer',
              'description': 'md5:8005b944181778e313d95c1237ddb640',
-            'thumbnail': 're:^https?://.*\.jpeg$',
+            'thumbnail': r're:^https?://.*\.jpeg$',
              'duration': 144.23,
              'timestamp': 1244136834,
              'upload_date': '20090604',
@@ -181,7 +193,7 @@ class LimelightMediaIE(LimelightBaseIE):
              'id': 'a3e00274d4564ec4a9b29b9466432335',
              'ext': 'mp4',
              'title': '3Play Media Overview Video',
-            'thumbnail': 're:^https?://.*\.jpeg$',
+            'thumbnail': r're:^https?://.*\.jpeg$',
              'duration': 78.101,
              'timestamp': 1338929955,
              'upload_date': '20120605',
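Note on the `CDN_HOSTS` loop in the limelight.py hunk above: each RTMP host is mapped to an HTTP mirror by substring match, and the first candidate that validates is used. The sketch below reproduces only the mapping step. The function name and sample hosts are hypothetical, and the network check (`_is_valid_url` in the patch) is deliberately left out so the example runs offline.

```python
# Mapping copied from the patch: (substring of the RTMP host, HTTP host).
CDN_HOSTS = (
    ('delvenetworks.com', 'cpl.delvenetworks.com'),
    ('video.llnw.net', 's2.content.video.llnw.net'),
)


def http_candidates(rtmp_host, playpath):
    """Return the HTTP URLs worth probing for an RTMP stream, in table order."""
    candidates = []
    for cdn_host, http_host in CDN_HOSTS:
        if cdn_host not in rtmp_host.lower():
            continue
        # As in the patch, the first four characters of the playpath are
        # dropped (assumed here to be a prefix such as 'mp4:').
        candidates.append('http://%s/%s' % (http_host, playpath[4:]))
    return candidates


# Hypothetical inputs, for illustration only.
delve = http_candidates('CSL.DelveNetworks.com', 'mp4:media/a/b.mp4')
other = http_candidates('example.org', 'mp4:media/a/b.mp4')
```

Keeping the table as data makes it cheap to add another CDN later without touching the loop.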
diff --git a/youtube_dl/extractor/litv.py b/youtube_dl/extractor/litv.py
index ded717cf2823f6b999d310eafe72801ff507daa3..337b1b15cf9d783750fa225b5538edfbc1fcfde2 100644
--- a/youtube_dl/extractor/litv.py
+++ b/youtube_dl/extractor/litv.py
@@ -31,7 +31,7 @@ class LiTVIE(InfoExtractor):
              'id': 'VOD00041610',
              'ext': 'mp4',
              'title': '花千骨第1集',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'description': 'md5:c7017aa144c87467c4fb2909c4b05d6f',
              'episode_number': 1,
          },
@@ -80,7 +80,7 @@ class LiTVIE(InfoExtractor):
          webpage = self._download_webpage(url, video_id)
  
          program_info = self._parse_json(self._search_regex(
-            'var\s+programInfo\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
+            r'var\s+programInfo\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
              video_id)
  
          season_list = list(program_info.get('seasonList', {}).values())
diff --git a/youtube_dl/extractor/liveleak.py b/youtube_dl/extractor/liveleak.py
index b84e4dd6c20415a1f8bf7a562a61f80bbdbd5786..c7de65353e616dc0d5f2ee1b0128c059d6f4f933 100644
--- a/youtube_dl/extractor/liveleak.py
+++ b/youtube_dl/extractor/liveleak.py
@@ -18,7 +18,7 @@ class LiveLeakIE(InfoExtractor):
              'description': 'extremely bad day for this guy..!',
              'uploader': 'ljfriel2',
              'title': 'Most unlucky car accident',
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
      }, {
          'url': 'http://www.liveleak.com/view?i=f93_1390833151',
@@ -29,7 +29,7 @@ class LiveLeakIE(InfoExtractor):
              'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
              'uploader': 'ARD_Stinkt',
              'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
      }, {
          'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
@@ -52,7 +52,7 @@ class LiveLeakIE(InfoExtractor):
              'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
              'uploader': 'bony333',
              'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
      }, {
          # Covers https://github.com/rg3/youtube-dl/pull/10664#issuecomment-247439521
diff --git a/youtube_dl/extractor/livestream.py b/youtube_dl/extractor/livestream.py
index bc7894bf13ed29963aa1dad7880cf8549be1ca77..c863413bf008baa6baf0233b8185a10ea119d091 100644
--- a/youtube_dl/extractor/livestream.py
+++ b/youtube_dl/extractor/livestream.py
@@ -37,7 +37,7 @@ class LivestreamIE(InfoExtractor):
              'duration': 5968.0,
              'like_count': int,
              'view_count': int,
-            'thumbnail': 're:^http://.*\.jpg$'
+            'thumbnail': r're:^http://.*\.jpg$'
          }
      }, {
          'url': 'http://new.livestream.com/tedx/cityenglish',
diff --git a/youtube_dl/extractor/lnkgo.py b/youtube_dl/extractor/lnkgo.py
index fd23b0b43fa91af1e828c9038f83ac228b93aa94..068378c9c509a0483650feac42a8ffe92cc60328 100644
--- a/youtube_dl/extractor/lnkgo.py
+++ b/youtube_dl/extractor/lnkgo.py
@@ -22,7 +22,7 @@ class LnkGoIE(InfoExtractor):
              'description': 'md5:d82a5e36b775b7048617f263a0e3475e',
              'age_limit': 7,
              'duration': 3019,
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          },
          'params': {
              'skip_download': True,  # HLS download
@@ -37,7 +37,7 @@ class LnkGoIE(InfoExtractor):
              'description': 'md5:7352d113a242a808676ff17e69db6a69',
              'age_limit': 18,
              'duration': 346,
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          },
          'params': {
              'skip_download': True,  # HLS download
diff --git a/youtube_dl/extractor/lynda.py b/youtube_dl/extractor/lynda.py
index f4dcfd93fa760878566568636d9c2b864b6c7556..da94eab561b91d6b70675911e432b5750d5d5b04 100644
--- a/youtube_dl/extractor/lynda.py
+++ b/youtube_dl/extractor/lynda.py
@@ -73,7 +73,7 @@ class LyndaBaseIE(InfoExtractor):
  
          # Already logged in
          if any(re.search(p, signin_page) for p in (
-                'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
+                r'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
              return
  
          # Step 2: submit email
diff --git a/youtube_dl/extractor/matchtv.py b/youtube_dl/extractor/matchtv.py
index 33b0b539fa9dfde80274d983aa003ea7b39e6622..bc9933a8134eea759918aad3475dd42c7f6b406e 100644
--- a/youtube_dl/extractor/matchtv.py
+++ b/youtube_dl/extractor/matchtv.py
@@ -14,7 +14,7 @@ class MatchTVIE(InfoExtractor):
          'info_dict': {
              'id': 'matchtv-live',
              'ext': 'flv',
-            'title': 're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'title': r're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
              'is_live': True,
          },
          'params': {
diff --git a/youtube_dl/extractor/mdr.py b/youtube_dl/extractor/mdr.py
index 2100583df46ab7955846f8e3b08467d13ed3440e..6e4290aadd6e0d3543d5b48d5df39577032437e2 100644
--- a/youtube_dl/extractor/mdr.py
+++ b/youtube_dl/extractor/mdr.py
@@ -72,7 +72,7 @@ class MDRIE(InfoExtractor):
  
          data_url = self._search_regex(
              r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
-            webpage, 'data url', group='url').replace('\/', '/')
+            webpage, 'data url', group='url').replace(r'\/', '/')
  
          doc = self._download_xml(
              compat_urlparse.urljoin(url, data_url), video_id)
diff --git a/youtube_dl/extractor/meipai.py b/youtube_dl/extractor/meipai.py
new file mode 100644
index 0000000..c8eacb4
--- /dev/null
+++ b/youtube_dl/extractor/meipai.py
@@ -0,0 +1,104 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    unified_timestamp,
+)
+
+
+class MeipaiIE(InfoExtractor):
+    IE_DESC = '美拍'
+    _VALID_URL = r'https?://(?:www\.)?meipai.com/media/(?P<id>[0-9]+)'
+    _TESTS = [{
+        # regular uploaded video
+        'url': 'http://www.meipai.com/media/531697625',
+        'md5': 'e3e9600f9e55a302daecc90825854b4f',
+        'info_dict': {
+            'id': '531697625',
+            'ext': 'mp4',
+            'title': '#葉子##阿桑##余姿昀##超級女聲#',
+            'description': '#葉子##阿桑##余姿昀##超級女聲#',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'duration': 152,
+            'timestamp': 1465492420,
+            'upload_date': '20160609',
+            'view_count': 35511,
+            'creator': '她她-TATA',
+            'tags': ['葉子', '阿桑', '余姿昀', '超級女聲'],
+        }
+    }, {
+        # record of live streaming
+        'url': 'http://www.meipai.com/media/585526361',
+        'md5': 'ff7d6afdbc6143342408223d4f5fb99a',
+        'info_dict': {
+            'id': '585526361',
+            'ext': 'mp4',
+            'title': '姿昀和善願 練歌練琴啦😁😁😁',
+            'description': '姿昀和善願 練歌練琴啦😁😁😁',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'duration': 5975,
+            'timestamp': 1474311799,
+            'upload_date': '20160919',
+            'view_count': 1215,
+            'creator': '她她-TATA',
+        }
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._og_search_title(
+            webpage, default=None) or self._html_search_regex(
+            r'<title[^>]*>([^<]+)</title>', webpage, 'title')
+
+        formats = []
+
+        # recorded playback of live streaming
+        m3u8_url = self._html_search_regex(
+            r'file:\s*encodeURIComponent\((["\'])(?P<url>(?:(?!\1).)+)\1\)',
+            webpage, 'm3u8 url', group='url', default=None)
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        if not formats:
+            # regular uploaded video
+            video_url = self._search_regex(
+                r'data-video=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'video url',
+                group='url', default=None)
+            if video_url:
+                formats.append({
+                    'url': video_url,
+                    'format_id': 'http',
+                })
+
+        timestamp = unified_timestamp(self._og_search_property(
+            'video:release_date', webpage, 'release date', fatal=False))
+
+        tags = self._og_search_property(
+            'video:tag', webpage, 'tags', default='').split(',')
+
+        view_count = int_or_none(self._html_search_meta(
+            'interactionCount', webpage, 'view count'))
+        duration = parse_duration(self._html_search_meta(
+            'duration', webpage, 'duration'))
+        creator = self._og_search_property(
+            'video:director', webpage, 'creator', fatal=False)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'duration': duration,
+            'timestamp': timestamp,
+            'view_count': view_count,
+            'creator': creator,
+            'tags': tags,
+            'formats': formats,
+        }
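Note on the quoted-attribute pattern used twice in meipai.py above: group 1 captures the opening quote, and `(?:(?!\1).)+` consumes characters until that same quote reappears, so either quote style is accepted and the other kind may appear inside the value. A minimal sketch with plain `re` follows; the helper name and the HTML snippets are hypothetical.

```python
import re

# Pattern copied from the patch.
VIDEO_URL_RE = r'data-video=(["\'])(?P<url>(?:(?!\1).)+)\1'


def extract_video_url(html):
    """Return the data-video attribute value, or None when it is absent."""
    m = re.search(VIDEO_URL_RE, html)
    return m.group('url') if m else None


# Hypothetical markup, for illustration only.
html_double = '<div data-video="http://example.com/v.mp4?t=\'1\'" class="p">'
html_single = "<div data-video='http://example.com/v.mp4'>"

from_double = extract_video_url(html_double)
from_single = extract_video_url(html_single)
missing = extract_video_url('<div class="p">')
```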
diff --git a/youtube_dl/extractor/melonvod.py b/youtube_dl/extractor/melonvod.py
new file mode 100644
index 0000000..bd8cf13
--- /dev/null
+++ b/youtube_dl/extractor/melonvod.py
@@ -0,0 +1,72 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    urljoin,
+)
+
+
+class MelonVODIE(InfoExtractor):
+    _VALID_URL = r'https?://vod\.melon\.com/video/detail2\.html?\?.*?mvId=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://vod.melon.com/video/detail2.htm?mvId=50158734',
+        'info_dict': {
+            'id': '50158734',
+            'ext': 'mp4',
+            'title': "Jessica 'Wonderland' MV Making Film",
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'artist': 'Jessica (제시카)',
+            'upload_date': '20161212',
+            'duration': 203,
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        play_info = self._download_json(
+            'http://vod.melon.com/video/playerInfo.json', video_id,
+            note='Downloading player info JSON', query={'mvId': video_id})
+
+        title = play_info['mvInfo']['MVTITLE']
+
+        info = self._download_json(
+            'http://vod.melon.com/delivery/streamingInfo.json', video_id,
+            note='Downloading streaming info JSON',
+            query={
+                'contsId': video_id,
+                'contsType': 'VIDEO',
+            })
+
+        stream_info = info['streamingInfo']
+
+        formats = self._extract_m3u8_formats(
+            stream_info['encUrl'], video_id, 'mp4', m3u8_id='hls')
+        self._sort_formats(formats)
+
+        artist_list = play_info.get('artistList')
+        artist = None
+        if isinstance(artist_list, list):
+            artist = ', '.join(
+                [a['ARTISTNAMEWEBLIST']
+                 for a in artist_list if a.get('ARTISTNAMEWEBLIST')])
+
+        thumbnail = urljoin(info.get('staticDomain'), stream_info.get('imgPath'))
+
+        duration = int_or_none(stream_info.get('playTime'))
+        upload_date = stream_info.get('mvSvcOpenDt', '')[:8] or None
+
+        return {
+            'id': video_id,
+            'title': title,
+            'artist': artist,
+            'thumbnail': thumbnail,
+            'upload_date': upload_date,
+            'duration': duration,
+            'formats': formats
+        }
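Note on two details of melonvod.py above: `html?` in `_VALID_URL` makes the final `l` optional, which is why the test URL ending in `.htm` matches, and `upload_date` is taken as the first eight characters of `mvSvcOpenDt`. The sketch below checks both with plain `re`. The helper names and the sample timestamp are assumptions, since the actual format of `mvSvcOpenDt` is not shown in the patch.

```python
import re

# Pattern copied from the patch.
VALID_URL = r'https?://vod\.melon\.com/video/detail2\.html?\?.*?mvId=(?P<id>[0-9]+)'


def match_id(url):
    """Return the mvId from a Melon VOD URL, or None if the URL does not match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


def upload_date_from(stream_info):
    """First eight characters of mvSvcOpenDt, or None when the key is missing."""
    # A key that is present with a None value would raise TypeError here,
    # exactly as the expression in the patch would.
    return stream_info.get('mvSvcOpenDt', '')[:8] or None


htm_id = match_id('http://vod.melon.com/video/detail2.htm?mvId=50158734')
html_id = match_id('http://vod.melon.com/video/detail2.html?mvId=50158734')
# Hypothetical timestamp value, for illustration only.
date = upload_date_from({'mvSvcOpenDt': '20161212103000'})
no_date = upload_date_from({})
```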
diff --git a/youtube_dl/extractor/metacafe.py b/youtube_dl/extractor/metacafe.py
index e6e7659a1de0ebe86f48a4128192de5d14d6d586..9880924e692380fffde3d0c776da329225de4ef8 100644
--- a/youtube_dl/extractor/metacafe.py
+++ b/youtube_dl/extractor/metacafe.py
@@ -133,7 +133,7 @@ class MetacafeIE(InfoExtractor):
          video_id, display_id = re.match(self._VALID_URL, url).groups()
  
          # the video may come from an external site
-        m_external = re.match('^(\w{2})-(.*)$', video_id)
+        m_external = re.match(r'^(\w{2})-(.*)$', video_id)
          if m_external is not None:
              prefix, ext_id = m_external.groups()
              # Check if video comes from YouTube
diff --git a/youtube_dl/extractor/mgoon.py b/youtube_dl/extractor/mgoon.py
index 94bc87b00797951f932394166d6c5b8f5c3e6d1a..7bb473900fcdb39dccbabbed325350fc04349c4f 100644
--- a/youtube_dl/extractor/mgoon.py
+++ b/youtube_dl/extractor/mgoon.py
@@ -27,7 +27,7 @@ class MgoonIE(InfoExtractor):
                  'upload_date': '20131220',
                  'ext': 'mp4',
                  'title': 'md5:543aa4c27a4931d371c3f433e8cebebc',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              }
          },
          {
diff --git a/youtube_dl/extractor/mgtv.py b/youtube_dl/extractor/mgtv.py
index e0bb5d208856a121f40f533fcacf3b7bd98d13ea..659ede8c2254d6ce524953c298c8a69b0b13d745 100644
--- a/youtube_dl/extractor/mgtv.py
+++ b/youtube_dl/extractor/mgtv.py
@@ -18,7 +18,7 @@ class MGTVIE(InfoExtractor):
              'title': '我是歌手第四季双年巅峰会：韩红李玟“双王”领军对抗',
              'description': '我是歌手第四季双年巅峰会',
              'duration': 7461,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          # no tbr extracted from stream_url
diff --git a/youtube_dl/extractor/minhateca.py b/youtube_dl/extractor/minhateca.py
index e6730b75a68d27c16e694fedaac088d27a0ab1ec..dccc542497692ac6aa14a6e36f6b96b0aad7741d 100644
--- a/youtube_dl/extractor/minhateca.py
+++ b/youtube_dl/extractor/minhateca.py
@@ -19,7 +19,7 @@ class MinhatecaIE(InfoExtractor):
              'id': '125848331',
              'ext': 'mp4',
              'title': 'youtube-dl test video',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'filesize_approx': 1530000,
              'duration': 9,
              'view_count': int,
diff --git a/youtube_dl/extractor/ministrygrid.py b/youtube_dl/extractor/ministrygrid.py
index 10190d5f6e1f3f55b3274855c7614bea62b620e5..8ad9239c50519b15bb4f4db3f41bde9d199759ac 100644
--- a/youtube_dl/extractor/ministrygrid.py
+++ b/youtube_dl/extractor/ministrygrid.py
@@ -17,7 +17,7 @@ class MinistryGridIE(InfoExtractor):
              'id': '3453494717001',
              'ext': 'mp4',
              'title': 'The Gospel by Numbers',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'upload_date': '20140410',
              'description': 'Coming soon from T4G 2014!',
              'uploader_id': '2034960640001',
diff --git a/youtube_dl/extractor/mitele.py b/youtube_dl/extractor/mitele.py
index f577836be6dd01c7dc4b315c0295a973fe308a2d..79e0b8ada1aaefeb90b479967d0f9e2197818bff 100644
--- a/youtube_dl/extractor/mitele.py
+++ b/youtube_dl/extractor/mitele.py
@@ -90,7 +90,7 @@ class MiTeleIE(InfoExtractor):
              'season_id': 'diario_de_t14_11981',
              'episode': 'Programa 144',
              'episode_number': 3,
-            'thumbnail': 're:(?i)^https?://.*\.jpg$',
+            'thumbnail': r're:(?i)^https?://.*\.jpg$',
              'duration': 2913,
          },
          'add_ie': ['Ooyala'],
@@ -108,7 +108,7 @@ class MiTeleIE(InfoExtractor):
              'season_id': 'cuarto_milenio_t06_12715',
              'episode': 'Programa 226',
              'episode_number': 24,
-            'thumbnail': 're:(?i)^https?://.*\.jpg$',
+            'thumbnail': r're:(?i)^https?://.*\.jpg$',
              'duration': 7313,
          },
          'params': {
@@ -190,7 +190,7 @@ class MiTeleIE(InfoExtractor):
          return {
              '_type': 'url_transparent',
              # for some reason only HLS is supported
-            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8'}),
+            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}),
              'id': video_id,
              'title': title,
              'description': description,
diff --git a/youtube_dl/extractor/mixcloud.py b/youtube_dl/extractor/mixcloud.py
index 560fe188b675a619785332eea285484fa85154bf..a24b3165a49670444024f4503877efa3467b8dbc 100644
--- a/youtube_dl/extractor/mixcloud.py
+++ b/youtube_dl/extractor/mixcloud.py
@@ -16,13 +16,12 @@ from ..utils import (
      clean_html,
      ExtractorError,
      OnDemandPagedList,
-    parse_count,
      str_to_int,
  )
  
  
  class MixcloudIE(InfoExtractor):
-    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
+    _VALID_URL = r'https?://(?:(?:www|beta|m)\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
      IE_NAME = 'mixcloud'
  
      _TESTS = [{
@@ -34,9 +33,8 @@ class MixcloudIE(InfoExtractor):
              'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
              'uploader': 'Daniel Holbach',
              'uploader_id': 'dholbach',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
              'view_count': int,
-            'like_count': int,
          },
      }, {
          'url': 'http://www.mixcloud.com/gillespeterson/caribou-7-inch-vinyl-mix-chat/',
@@ -49,8 +47,10 @@ class MixcloudIE(InfoExtractor):
              'uploader_id': 'gillespeterson',
              'thumbnail': 're:https?://.*',
              'view_count': int,
-            'like_count': int,
          },
+    }, {
+        'url': 'https://beta.mixcloud.com/RedLightRadio/nosedrip-15-red-light-radio-01-18-2016/',
+        'only_matching': True,
      }]
  
      # See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
@@ -86,26 +86,18 @@ class MixcloudIE(InfoExtractor):
  
          song_url = play_info['stream_url']
  
-        PREFIX = (
-            r'm-play-on-spacebar[^>]+'
-            r'(?:\s+[a-zA-Z0-9-]+(?:="[^"]+")?)*?\s+')
-        title = self._html_search_regex(
-            PREFIX + r'm-title="([^"]+)"', webpage, 'title')
+        title = self._html_search_regex(r'm-title="([^"]+)"', webpage, 'title')
          thumbnail = self._proto_relative_url(self._html_search_regex(
-            PREFIX + r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail',
-            fatal=False))
+            r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail', fatal=False))
          uploader = self._html_search_regex(
-            PREFIX + r'm-owner-name="([^"]+)"',
-            webpage, 'uploader', fatal=False)
+            r'm-owner-name="([^"]+)"', webpage, 'uploader', fatal=False)
          uploader_id = self._search_regex(
              r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False)
          description = self._og_search_description(webpage)
-        like_count = parse_count(self._search_regex(
-            r'\bbutton-favorite[^>]+>.*?<span[^>]+class=["\']toggle-number[^>]+>\s*([^<]+)',
-            webpage, 'like count', default=None))
          view_count = str_to_int(self._search_regex(
              [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
-             r'/listeners/?">([0-9,.]+)</a>'],
+             r'/listeners/?">([0-9,.]+)</a>',
+             r'm-tooltip=["\']([\d,.]+) plays'],
              webpage, 'play count', default=None))
  
          return {
@@ -117,7 +109,6 @@ class MixcloudIE(InfoExtractor):
              'uploader': uploader,
              'uploader_id': uploader_id,
              'view_count': view_count,
-            'like_count': like_count,
          }
  
  
diff --git a/youtube_dl/extractor/mlb.py b/youtube_dl/extractor/mlb.py
index e242b897f2b63cf624805c7564cf7e2f02a9d16b..59cd4b8389f28a72f9d16df70edfa64a7ce2ba40 100644
--- a/youtube_dl/extractor/mlb.py
+++ b/youtube_dl/extractor/mlb.py
@@ -37,7 +37,7 @@ class MLBIE(InfoExtractor):
                  'duration': 66,
                  'timestamp': 1405980600,
                  'upload_date': '20140721',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -51,7 +51,7 @@ class MLBIE(InfoExtractor):
                  'duration': 46,
                  'timestamp': 1405105800,
                  'upload_date': '20140711',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -65,7 +65,7 @@ class MLBIE(InfoExtractor):
                  'duration': 488,
                  'timestamp': 1405399936,
                  'upload_date': '20140715',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -79,7 +79,7 @@ class MLBIE(InfoExtractor):
                  'duration': 52,
                  'timestamp': 1405390722,
                  'upload_date': '20140715',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
diff --git a/youtube_dl/extractor/mnet.py b/youtube_dl/extractor/mnet.py
index e3f42e7bdae7503d69d7585a407c3d68e333086e..6a85dcbd522cfb087499daab81482fac84d75a0a 100644
--- a/youtube_dl/extractor/mnet.py
+++ b/youtube_dl/extractor/mnet.py
@@ -22,7 +22,7 @@ class MnetIE(InfoExtractor):
              'timestamp': 1451564040,
              'age_limit': 0,
              'thumbnails': 'mincount:5',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'ext': 'flv',
          },
          'params': {
diff --git a/youtube_dl/extractor/moevideo.py b/youtube_dl/extractor/moevideo.py
index 91ee9c4e95204718cb069fe1dc36908821b7af6d..44bcc498254dc2503b8ce103b59d1c207c44df66 100644
--- a/youtube_dl/extractor/moevideo.py
+++ b/youtube_dl/extractor/moevideo.py
@@ -30,7 +30,7 @@ class MoeVideoIE(InfoExtractor):
                  'ext': 'flv',
                  'title': 'Sink cut out machine',
                  'description': 'md5:f29ff97b663aefa760bf7ca63c8ca8a8',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'width': 540,
                  'height': 360,
                  'duration': 179,
@@ -46,7 +46,7 @@ class MoeVideoIE(InfoExtractor):
                  'ext': 'flv',
                  'title': 'Operacion Condor.',
                  'description': 'md5:7e68cb2fcda66833d5081c542491a9a3',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'width': 480,
                  'height': 296,
                  'duration': 6027,
diff --git a/youtube_dl/extractor/mofosex.py b/youtube_dl/extractor/mofosex.py
index e3bbe5aa8997694f62a07d8a2e0c383aa64daae1..54716f5c7af1dc15e9b0a5b5174b08ba68782bce 100644
--- a/youtube_dl/extractor/mofosex.py
+++ b/youtube_dl/extractor/mofosex.py
@@ -18,7 +18,7 @@ class MofosexIE(KeezMoviesIE):
              'display_id': 'amateur-teen-playing-and-masturbating-318131',
              'ext': 'mp4',
              'title': 'amateur teen playing and masturbating',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20121114',
              'view_count': int,
              'like_count': int,
diff --git a/youtube_dl/extractor/mojvideo.py b/youtube_dl/extractor/mojvideo.py
index 0ba435dc5597219e9b569c441e58e3a196e1bfbe..165e658c94424268f3c8df0987dffbdff474723a 100644
--- a/youtube_dl/extractor/mojvideo.py
+++ b/youtube_dl/extractor/mojvideo.py
@@ -20,7 +20,7 @@ class MojvideoIE(InfoExtractor):
              'display_id': 'v-avtu-pred-mano-rdecelaska-alfi-nipic',
              'ext': 'mp4',
              'title': 'V avtu pred mano rdečelaska - Alfi Nipič',
-            'thumbnail': 're:^http://.*\.jpg$',
+            'thumbnail': r're:^http://.*\.jpg$',
              'duration': 242,
          }
      }
diff --git a/youtube_dl/extractor/motherless.py b/youtube_dl/extractor/motherless.py
index 5e1a8a71a93aa28962d7f260af966d10cf8e9f7a..6fe3b6049b2917ed5d7b075d0ca2c7ae943c459f 100644
--- a/youtube_dl/extractor/motherless.py
+++ b/youtube_dl/extractor/motherless.py
@@ -23,7 +23,7 @@ class MotherlessIE(InfoExtractor):
              'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
              'upload_date': '20100913',
              'uploader_id': 'famouslyfuckedup',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'age_limit': 18,
          }
      }, {
@@ -37,7 +37,7 @@ class MotherlessIE(InfoExtractor):
                             'game', 'hairy'],
              'upload_date': '20140622',
              'uploader_id': 'Sulivana7x',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'age_limit': 18,
          },
          'skip': '404',
@@ -51,7 +51,7 @@ class MotherlessIE(InfoExtractor):
              'categories': ['superheroine heroine  superher'],
              'upload_date': '20140827',
              'uploader_id': 'shade0230',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'age_limit': 18,
          }
      }, {
diff --git a/youtube_dl/extractor/movieclips.py b/youtube_dl/extractor/movieclips.py
index 30c206f9b61e22d3e029a68979643fc6ee7de635..5453da1acfe19a103d97a026b725b81e86d7859d 100644
--- a/youtube_dl/extractor/movieclips.py
+++ b/youtube_dl/extractor/movieclips.py
@@ -20,7 +20,7 @@ class MovieClipsIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Warcraft Trailer 1',
              'description': 'Watch Trailer 1 from Warcraft (2016). Legendary’s WARCRAFT is a 3D epic adventure of world-colliding conflict based.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1446843055,
              'upload_date': '20151106',
              'uploader': 'Movieclips',
diff --git a/youtube_dl/extractor/moviezine.py b/youtube_dl/extractor/moviezine.py
index 478e3996743d1eca8434a786b58c4bd799a7dc55..85cc6e22f59cd5af0e194dee675beb3bc69a9369 100644
--- a/youtube_dl/extractor/moviezine.py
+++ b/youtube_dl/extractor/moviezine.py
@@ -16,7 +16,7 @@ class MoviezineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Oculus - Trailer 1',
              'description': 'md5:40cc6790fc81d931850ca9249b40e8a4',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }
  
diff --git a/youtube_dl/extractor/movingimage.py b/youtube_dl/extractor/movingimage.py
index bb789c32edb45e78e9806faaae169af09826135e..4f62d628a24dbf65db894dce2dc24e56fde7403a 100644
--- a/youtube_dl/extractor/movingimage.py
+++ b/youtube_dl/extractor/movingimage.py
@@ -18,7 +18,7 @@ class MovingImageIE(InfoExtractor):
              'title': 'SHETLAND WOOL',
              'description': 'md5:c5afca6871ad59b4271e7704fe50ab04',
              'duration': 900,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/msn.py b/youtube_dl/extractor/msn.py
index d75ce8b3b510b68ca0dfe754d8fcf1741e6cbd9d..1473bcf4845d4b8470393686a7bd9baf1df0e398 100644
--- a/youtube_dl/extractor/msn.py
+++ b/youtube_dl/extractor/msn.py
@@ -78,11 +78,6 @@ class MSNIE(InfoExtractor):
                  m3u8_formats = self._extract_m3u8_formats(
                      format_url, display_id, 'mp4',
                      m3u8_id='hls', fatal=False)
-                # Despite metadata in m3u8 all video+audio formats are
-                # actually video-only (no audio)
-                for f in m3u8_formats:
-                    if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
-                        f['acodec'] = 'none'
                  formats.extend(m3u8_formats)
              else:
                  formats.append({
diff --git a/youtube_dl/extractor/mtv.py b/youtube_dl/extractor/mtv.py
index 03351917e71cdfbfb98ecb329eecad9500b288e4..8acea1461a662dc40840526c4efabcbe7a7c29b0 100644
--- a/youtube_dl/extractor/mtv.py
+++ b/youtube_dl/extractor/mtv.py
@@ -13,11 +13,11 @@ from ..utils import (
      fix_xml_ampersands,
      float_or_none,
      HEADRequest,
-    NO_DEFAULT,
      RegexNotFoundError,
      sanitized_Request,
      strip_or_none,
      timeconvert,
+    try_get,
      unescapeHTML,
      update_url_query,
      url_basename,
@@ -42,15 +42,6 @@ class MTVServicesInfoExtractor(InfoExtractor):
          # Remove the templates, like &device={device}
          return re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', url)
  
-    # This was originally implemented for ComedyCentral, but it also works here
-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp\..+?/.*)$', rtmp_video_url)
-        if not m:
-            return {'rtmp': rtmp_video_url}
-        base = 'http://viacommtvstrmfs.fplive.net/'
-        return {'http': base + m.group('finalid')}
-
      def _get_feed_url(self, uri):
          return self._FEED_URL
  
@@ -77,7 +68,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
          url = re.sub(r'.+pxE=mp4', 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=0+_pxK=18639+_pxE=mp4', url, 1)
          return [{'url': url, 'ext': 'mp4'}]
  
-    def _extract_video_formats(self, mdoc, mtvn_id):
+    def _extract_video_formats(self, mdoc, mtvn_id, video_id):
          if re.match(r'.*/(error_country_block\.swf|geoblock\.mp4|copyright_error\.flv(?:\?geo\b.+?)?)$', mdoc.find('.//src').text) is not None:
              if mtvn_id is not None and self._MOBILE_TEMPLATE is not None:
                  self.to_screen('The normal version is not available from your '
@@ -88,21 +79,33 @@ class MTVServicesInfoExtractor(InfoExtractor):
  
          formats = []
          for rendition in mdoc.findall('.//rendition'):
-            try:
-                _, _, ext = rendition.attrib['type'].partition('/')
-                rtmp_video_url = rendition.find('./src').text
-                if rtmp_video_url.endswith('siteunavail.png'):
-                    continue
-                new_urls = self._transform_rtmp_url(rtmp_video_url)
-                formats.extend([{
-                    'ext': 'flv' if new_url.startswith('rtmp') else ext,
-                    'url': new_url,
-                    'format_id': '-'.join(filter(None, [kind, rendition.get('bitrate')])),
-                    'width': int(rendition.get('width')),
-                    'height': int(rendition.get('height')),
-                } for kind, new_url in new_urls.items()])
-            except (KeyError, TypeError):
-                raise ExtractorError('Invalid rendition field.')
+            if rendition.get('method') == 'hls':
+                hls_url = rendition.find('./src').text
+                formats.extend(self._extract_m3u8_formats(
+                    hls_url, video_id, ext='mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls'))
+            else:
+                # fms
+                try:
+                    _, _, ext = rendition.attrib['type'].partition('/')
+                    rtmp_video_url = rendition.find('./src').text
+                    if 'error_not_available.swf' in rtmp_video_url:
+                        raise ExtractorError(
+                            '%s said: video is not available' % self.IE_NAME,
+                            expected=True)
+                    if rtmp_video_url.endswith('siteunavail.png'):
+                        continue
+                    formats.extend([{
+                        'ext': 'flv' if rtmp_video_url.startswith('rtmp') else ext,
+                        'url': rtmp_video_url,
+                        'format_id': '-'.join(filter(None, [
+                            'rtmp' if rtmp_video_url.startswith('rtmp') else None,
+                            rendition.get('bitrate')])),
+                        'width': int(rendition.get('width')),
+                        'height': int(rendition.get('height')),
+                    }])
+                except (KeyError, TypeError):
+                    raise ExtractorError('Invalid rendition field.')
          self._sort_formats(formats)
          return formats
  
@@ -118,15 +121,17 @@ class MTVServicesInfoExtractor(InfoExtractor):
              } for typographic in transcript.findall('./typographic')]
          return subtitles
  
-    def _get_video_info(self, itemdoc):
+    def _get_video_info(self, itemdoc, use_hls=True):
          uri = itemdoc.find('guid').text
          video_id = self._id_from_uri(uri)
          self.report_extraction(video_id)
          content_el = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content')))
          mediagen_url = self._remove_template_parameter(content_el.attrib['url'])
+        mediagen_url = mediagen_url.replace('device={device}', '')
          if 'acceptMethods' not in mediagen_url:
              mediagen_url += '&' if '?' in mediagen_url else '?'
-            mediagen_url += 'acceptMethods=fms'
+            mediagen_url += 'acceptMethods='
+            mediagen_url += 'hls' if use_hls else 'fms'
  
          mediagen_doc = self._download_xml(mediagen_url, video_id,
                                            'Downloading video urls')
@@ -167,9 +172,11 @@ class MTVServicesInfoExtractor(InfoExtractor):
          if mtvn_id_node is not None:
              mtvn_id = mtvn_id_node.text
  
+        formats = self._extract_video_formats(mediagen_doc, mtvn_id, video_id)
+
          return {
              'title': title,
-            'formats': self._extract_video_formats(mediagen_doc, mtvn_id),
+            'formats': formats,
              'subtitles': self._extract_subtitles(mediagen_doc, mtvn_id),
              'id': video_id,
              'thumbnail': self._get_thumbnail_url(uri, itemdoc),
@@ -184,13 +191,13 @@ class MTVServicesInfoExtractor(InfoExtractor):
              data['lang'] = self._LANG
          return data
  
-    def _get_videos_info(self, uri):
+    def _get_videos_info(self, uri, use_hls=True):
          video_id = self._id_from_uri(uri)
          feed_url = self._get_feed_url(uri)
          info_url = update_url_query(feed_url, self._get_feed_query(uri))
-        return self._get_videos_info_from_url(info_url, video_id)
+        return self._get_videos_info_from_url(info_url, video_id, use_hls)
  
-    def _get_videos_info_from_url(self, url, video_id):
+    def _get_videos_info_from_url(self, url, video_id, use_hls=True):
          idoc = self._download_xml(
              url, video_id,
              'Downloading info', transform_source=fix_xml_ampersands)
@@ -199,10 +206,31 @@ class MTVServicesInfoExtractor(InfoExtractor):
          description = xpath_text(idoc, './channel/description')
  
          return self.playlist_result(
-            [self._get_video_info(item) for item in idoc.findall('.//item')],
+            [self._get_video_info(item, use_hls) for item in idoc.findall('.//item')],
              playlist_title=title, playlist_description=description)
  
-    def _extract_mgid(self, webpage, default=NO_DEFAULT):
+    def _extract_triforce_mgid(self, webpage, data_zone=None, video_id=None):
+        triforce_feed = self._parse_json(self._search_regex(
+            r'triforceManifestFeed\s*=\s*({.+?})\s*;\s*\n', webpage,
+            'triforce feed', default='{}'), video_id, fatal=False)
+
+        data_zone = self._search_regex(
+            r'data-zone=(["\'])(?P<zone>.+?_lc_promo.*?)\1', webpage,
+            'data zone', default=data_zone, group='zone')
+
+        feed_url = try_get(
+            triforce_feed, lambda x: x['manifest']['zones'][data_zone]['feed'],
+            compat_str)
+        if not feed_url:
+            return
+
+        feed = self._download_json(feed_url, video_id, fatal=False)
+        if not feed:
+            return
+
+        return try_get(feed, lambda x: x['result']['data']['id'], compat_str)
+
+    def _extract_mgid(self, webpage):
          try:
              # the url can be http://media.mtvnservices.com/fb/{mgid}.swf
              # or http://media.mtvnservices.com/{mgid}
@@ -222,7 +250,11 @@ class MTVServicesInfoExtractor(InfoExtractor):
              sm4_embed = self._html_search_meta(
                  'sm4:video:embed', webpage, 'sm4 embed', default='')
              mgid = self._search_regex(
-                r'embed/(mgid:.+?)["\'&?/]', sm4_embed, 'mgid', default=default)
+                r'embed/(mgid:.+?)["\'&?/]', sm4_embed, 'mgid', default=None)
+
+        if not mgid:
+            mgid = self._extract_triforce_mgid(webpage)
+
          return mgid
  
      def _real_extract(self, url):
@@ -272,7 +304,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
  
  class MTVIE(MTVServicesInfoExtractor):
      IE_NAME = 'mtv'
-    _VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|full-episodes)/(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
      _FEED_URL = 'http://www.mtv.com/feeds/mrss/'
  
      _TESTS = [{
@@ -289,9 +321,41 @@ class MTVIE(MTVServicesInfoExtractor):
      }, {
          'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
          'only_matching': True,
+    }, {
+        'url': 'http://www.mtv.com/episodes/g8xu7q/teen-mom-2-breaking-the-wall-season-7-ep-713',
+        'only_matching': True,
      }]
  
  
+class MTV81IE(InfoExtractor):
+    IE_NAME = 'mtv81'
+    _VALID_URL = r'https?://(?:www\.)?mtv81\.com/videos/(?P<id>[^/?#.]+)'
+
+    _TEST = {
+        'url': 'http://www.mtv81.com/videos/artist-to-watch/the-godfather-of-japanese-hip-hop-segment-1/',
+        'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
+        'info_dict': {
+            'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
+            'ext': 'mp4',
+            'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
+            'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
+            'timestamp': 1468846800,
+            'upload_date': '20160718',
+        },
+    }
+
+    def _extract_mgid(self, webpage):
+        return self._search_regex(
+            r'getTheVideo\((["\'])(?P<id>mgid:.+?)\1', webpage,
+            'mgid', group='id')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        mgid = self._extract_mgid(webpage)
+        return self.url_result('http://media.mtvnservices.com/embed/%s' % mgid)
+
+
  class MTVVideoIE(MTVServicesInfoExtractor):
      IE_NAME = 'mtv:video'
      _VALID_URL = r'''(?x)^https?://
diff --git a/youtube_dl/extractor/muenchentv.py b/youtube_dl/extractor/muenchentv.py
index d9f17613633d245283f5f5745acca2feb273cbf5..2cc2bf229b3bee21fa5b79e40af2a666cd4ccea1 100644
--- a/youtube_dl/extractor/muenchentv.py
+++ b/youtube_dl/extractor/muenchentv.py
@@ -22,7 +22,7 @@ class MuenchenTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 're:^münchen.tv-Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
              'is_live': True,
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          },
          'params': {
              'skip_download': True,
diff --git a/youtube_dl/extractor/mwave.py b/youtube_dl/extractor/mwave.py
index fea1caf478b2a862ae3a028b4a80041b734a5e1b..a67276596f0cf148d2f944a1f1375831a18eca4a 100644
--- a/youtube_dl/extractor/mwave.py
+++ b/youtube_dl/extractor/mwave.py
@@ -18,7 +18,7 @@ class MwaveIE(InfoExtractor):
              'id': '168859',
              'ext': 'flv',
              'title': '[M COUNTDOWN] SISTAR - SHAKE IT',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'M COUNTDOWN',
              'duration': 206,
              'view_count': int,
@@ -70,7 +70,7 @@ class MwaveMeetGreetIE(InfoExtractor):
              'id': '173294',
              'ext': 'flv',
              'title': '[MEET&GREET] Park BoRam',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Mwave',
              'duration': 3634,
              'view_count': int,
diff --git a/youtube_dl/extractor/myspace.py b/youtube_dl/extractor/myspace.py
index ab32e632e34375561980f168834443754f606383..f281238c93cd0f184b5a5213e67fb16ac82bc8c7 100644
--- a/youtube_dl/extractor/myspace.py
+++ b/youtube_dl/extractor/myspace.py
@@ -17,9 +17,10 @@ class MySpaceIE(InfoExtractor):
      _TESTS = [
          {
              'url': 'https://myspace.com/fiveminutestothestage/video/little-big-town/109594919',
+            'md5': '9c1483c106f4a695c47d2911feed50a7',
              'info_dict': {
                  'id': '109594919',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Little Big Town',
                  'description': 'This country quartet was all smiles while playing a sold out show at the Pacific Amphitheatre in Orange County, California.',
                  'uploader': 'Five Minutes to the Stage',
@@ -27,37 +28,30 @@ class MySpaceIE(InfoExtractor):
                  'timestamp': 1414108751,
                  'upload_date': '20141023',
              },
-            'params': {
-                # rtmp download
-                'skip_download': True,
-            },
          },
          # songs
          {
              'url': 'https://myspace.com/killsorrow/music/song/of-weakened-soul...-93388656-103880681',
+            'md5': '1d7ee4604a3da226dd69a123f748b262',
              'info_dict': {
                  'id': '93388656',
-                'ext': 'flv',
+                'ext': 'm4a',
                  'title': 'Of weakened soul...',
                  'uploader': 'Killsorrow',
                  'uploader_id': 'killsorrow',
              },
-            'params': {
-                # rtmp download
-                'skip_download': True,
-            },
          }, {
-            'add_ie': ['Vevo'],
+            'add_ie': ['Youtube'],
              'url': 'https://myspace.com/threedaysgrace/music/song/animal-i-have-become-28400208-28218041',
              'info_dict': {
-                'id': 'USZM20600099',
-                'ext': 'mp4',
-                'title': 'Animal I Have Become',
-                'uploader': 'Three Days Grace',
-                'timestamp': int,
-                'upload_date': '20060502',
+                'id': 'xqds0B_meys',
+                'ext': 'webm',
+                'title': 'Three Days Grace - Animal I Have Become',
+                'description': 'md5:8bd86b3693e72a077cf863a8530c54bb',
+                'uploader': 'ThreeDaysGraceVEVO',
+                'uploader_id': 'ThreeDaysGraceVEVO',
+                'upload_date': '20091002',
              },
-            'skip': 'VEVO is only available in some countries',
          }, {
              'add_ie': ['Youtube'],
              'url': 'https://myspace.com/starset2/music/song/first-light-95799905-106964426',
@@ -76,24 +70,46 @@ class MySpaceIE(InfoExtractor):
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('id')
+        is_song = mobj.group('mediatype').startswith('music/song')
          webpage = self._download_webpage(url, video_id)
          player_url = self._search_regex(
-            r'playerSwf":"([^"?]*)', webpage, 'player URL')
+            r'videoSwf":"([^"?]*)', webpage, 'player URL', fatal=False)
  
-        def rtmp_format_from_stream_url(stream_url, width=None, height=None):
-            rtmp_url, play_path = stream_url.split(';', 1)
-            return {
-                'format_id': 'rtmp',
-                'url': rtmp_url,
-                'play_path': play_path,
-                'player_url': player_url,
-                'protocol': 'rtmp',
-                'ext': 'flv',
-                'width': width,
-                'height': height,
-            }
+        def formats_from_stream_urls(stream_url, hls_stream_url, http_stream_url, width=None, height=None):
+            formats = []
+            vcodec = 'none' if is_song else None
+            if hls_stream_url:
+                formats.append({
+                    'format_id': 'hls',
+                    'url': hls_stream_url,
+                    'protocol': 'm3u8_native',
+                    'ext': 'm4a' if is_song else 'mp4',
+                    'vcodec': vcodec,
+                })
+            if stream_url and player_url:
+                rtmp_url, play_path = stream_url.split(';', 1)
+                formats.append({
+                    'format_id': 'rtmp',
+                    'url': rtmp_url,
+                    'play_path': play_path,
+                    'player_url': player_url,
+                    'protocol': 'rtmp',
+                    'ext': 'flv',
+                    'width': width,
+                    'height': height,
+                    'vcodec': vcodec,
+                })
+            if http_stream_url:
+                formats.append({
+                    'format_id': 'http',
+                    'url': http_stream_url,
+                    'width': width,
+                    'height': height,
+                    'vcodec': vcodec,
+                })
+            return formats
  
-        if mobj.group('mediatype').startswith('music/song'):
+        if is_song:
              # songs don't store any useful info in the 'context' variable
              song_data = self._search_regex(
                  r'''<button.*data-song-id=(["\'])%s\1.*''' % video_id,
@@ -108,8 +124,10 @@ class MySpaceIE(InfoExtractor):
                  return self._search_regex(
                      r'''data-%s=([\'"])(?P<data>.*?)\1''' % name,
                      song_data, name, default='', group='data')
-            stream_url = search_data('stream-url')
-            if not stream_url:
+            formats = formats_from_stream_urls(
+                search_data('stream-url'), search_data('hls-stream-url'),
+                search_data('http-stream-url'))
+            if not formats:
                  vevo_id = search_data('vevo-id')
                  youtube_id = search_data('youtube-id')
                  if vevo_id:
@@ -121,6 +139,7 @@ class MySpaceIE(InfoExtractor):
                  else:
                      raise ExtractorError(
                          'Found song but don\'t know how to download it')
+            self._sort_formats(formats)
              return {
                  'id': video_id,
                  'title': self._og_search_title(webpage),
@@ -128,27 +147,16 @@ class MySpaceIE(InfoExtractor):
                  'uploader_id': search_data('artist-username'),
                  'thumbnail': self._og_search_thumbnail(webpage),
                  'duration': int_or_none(search_data('duration')),
-                'formats': [rtmp_format_from_stream_url(stream_url)]
+                'formats': formats,
              }
          else:
              video = self._parse_json(self._search_regex(
                  r'context = ({.*?});', webpage, 'context'),
                  video_id)['video']
-            formats = []
-            hls_stream_url = video.get('hlsStreamUrl')
-            if hls_stream_url:
-                formats.append({
-                    'format_id': 'hls',
-                    'url': hls_stream_url,
-                    'protocol': 'm3u8_native',
-                    'ext': 'mp4',
-                })
-            stream_url = video.get('streamUrl')
-            if stream_url:
-                formats.append(rtmp_format_from_stream_url(
-                    stream_url,
-                    int_or_none(video.get('width')),
-                    int_or_none(video.get('height'))))
+            formats = formats_from_stream_urls(
+                video.get('streamUrl'), video.get('hlsStreamUrl'),
+                video.get('mp4StreamUrl'), int_or_none(video.get('width')),
+                int_or_none(video.get('height')))
              self._sort_formats(formats)
              return {
                  'id': video_id,
diff --git a/youtube_dl/extractor/myvi.py b/youtube_dl/extractor/myvi.py
index 4c65be122fd536c6e39fd9f10052e8933e96417d..621ae74a7930cbaefb1f5c867de27d70499fe5d6 100644
--- a/youtube_dl/extractor/myvi.py
+++ b/youtube_dl/extractor/myvi.py
@@ -27,7 +27,7 @@ class MyviIE(SprutoBaseIE):
              'id': 'f16b2bbd-cde8-481c-a981-7cd48605df43',
              'ext': 'mp4',
              'title': 'хозяин жизни',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 25,
          },
      }, {
diff --git a/youtube_dl/extractor/myvideo.py b/youtube_dl/extractor/myvideo.py
index 6d447a4935e49cd3c4f7525fff6ffe5e9883656e..6bb64eb63c52018c34650a6361e7d70cad2459e0 100644
--- a/youtube_dl/extractor/myvideo.py
+++ b/youtube_dl/extractor/myvideo.py
@@ -160,7 +160,7 @@ class MyVideoIE(InfoExtractor):
          else:
              video_playpath = ''
  
-        video_swfobj = self._search_regex('swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj')
+        video_swfobj = self._search_regex(r'swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj')
          video_swfobj = compat_urllib_parse_unquote(video_swfobj)
  
          video_title = self._html_search_regex("<h1(?: class='globalHd')?>(.*?)</h1>",
diff --git a/youtube_dl/extractor/naver.py b/youtube_dl/extractor/naver.py
index 055070ff54fd8990c2e58ab1d6df037b19f3a029..e8131333f8458505b7a323f378c5fee848414934 100644
--- a/youtube_dl/extractor/naver.py
+++ b/youtube_dl/extractor/naver.py
@@ -12,10 +12,10 @@ from ..utils import (
  
  
  class NaverIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:m\.)?tvcast\.naver\.com/v/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'
  
      _TESTS = [{
-        'url': 'http://tvcast.naver.com/v/81652',
+        'url': 'http://tv.naver.com/v/81652',
          'info_dict': {
              'id': '81652',
              'ext': 'mp4',
@@ -24,7 +24,7 @@ class NaverIE(InfoExtractor):
              'upload_date': '20130903',
          },
      }, {
-        'url': 'http://tvcast.naver.com/v/395837',
+        'url': 'http://tv.naver.com/v/395837',
          'md5': '638ed4c12012c458fefcddfd01f173cd',
          'info_dict': {
              'id': '395837',
@@ -34,6 +34,9 @@ class NaverIE(InfoExtractor):
              'upload_date': '20150519',
          },
          'skip': 'Georestricted',
+    }, {
+        'url': 'http://tvcast.naver.com/v/81652',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/nbc.py b/youtube_dl/extractor/nbc.py
index 7f1bd9229303ec0390c9d10937374a0cc986790b..434a94de49b9f1623385f20386feaa1b34da75fe 100644
--- a/youtube_dl/extractor/nbc.py
+++ b/youtube_dl/extractor/nbc.py
@@ -9,6 +9,7 @@ from ..utils import (
      lowercase_escape,
      smuggle_url,
      unescapeHTML,
+    update_url_query,
  )
  
  
@@ -208,7 +209,7 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
              'md5': 'af1adfa51312291a017720403826bb64',
              'info_dict': {
-                'id': '269389891880',
+                'id': 'p_tweet_snow_140529',
                  'ext': 'mp4',
                  'title': 'How Twitter Reacted To The Snowden Interview',
                  'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
@@ -232,7 +233,7 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
              'md5': '73135a2e0ef819107bbb55a5a9b2a802',
              'info_dict': {
-                'id': '394064451844',
+                'id': 'nn_netcast_150204',
                  'ext': 'mp4',
                  'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
                  'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
@@ -245,7 +246,7 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
              'md5': 'a49e173825e5fcd15c13fc297fced39d',
              'info_dict': {
-                'id': '529953347624',
+                'id': 'x_lon_vwhorn_150922',
                  'ext': 'mp4',
                  'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
                  'description': 'md5:c8be487b2d80ff0594c005add88d8351',
@@ -258,7 +259,7 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
              'md5': '118d7ca3f0bea6534f119c68ef539f71',
              'info_dict': {
-                'id': '669831235788',
+                'id': 'tdy_al_space_160420',
                  'ext': 'mp4',
                  'title': 'See the aurora borealis from space in stunning new NASA video',
                  'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
@@ -271,15 +272,14 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
              'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
              'info_dict': {
-                'id': '314487875924',
+                'id': 'n_hayes_Aimm_140801_272214',
                  'ext': 'mp4',
                  'title': 'The chaotic GOP immigration vote',
                  'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'timestamp': 1406937606,
                  'upload_date': '20140802',
                  'uploader': 'NBCU-NEWS',
-                'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
              },
          },
          {
@@ -311,28 +311,41 @@ class NBCNewsIE(ThePlatformIE):
          else:
              # "feature" and "nightly-news" pages use theplatform.com
              video_id = mobj.group('mpx_id')
-            if not video_id.isdigit():
-                webpage = self._download_webpage(url, video_id)
-                info = None
-                bootstrap_json = self._search_regex(
-                    [r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
-                     r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"'],
-                    webpage, 'bootstrap json', default=None)
+            webpage = self._download_webpage(url, video_id)
+
+            filter_param = 'byId'
+            bootstrap_json = self._search_regex(
+                [r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
+                 r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"',
+                 r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);'],
+                webpage, 'bootstrap json', default=None)
+            if bootstrap_json:
                  bootstrap = self._parse_json(
                      bootstrap_json, video_id, transform_source=unescapeHTML)
+
+                info = None
                  if 'results' in bootstrap:
                      info = bootstrap['results'][0]['video']
                  elif 'video' in bootstrap:
                      info = bootstrap['video']
+                elif 'msnbcVideoInfo' in bootstrap:
+                    info = bootstrap['msnbcVideoInfo']['meta']
+                elif 'msnbcThePlatform' in bootstrap:
+                    info = bootstrap['msnbcThePlatform']['videoPlayer']['video']
                  else:
                      info = bootstrap
-                video_id = info['mpxId']
+
+                if 'guid' in info:
+                    video_id = info['guid']
+                    filter_param = 'byGuid'
+                elif 'mpxId' in info:
+                    video_id = info['mpxId']
  
              return {
                  '_type': 'url_transparent',
                  'id': video_id,
                  # http://feed.theplatform.com/f/2E2eJC/nbcnews also works
-                'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
+                'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {filter_param: video_id}),
                  'ie_key': 'ThePlatformFeed',
              }
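The nbc.py hunk above prefers a `guid` over an `mpxId` and switches the feed filter from `byId` to `byGuid` accordingly. A minimal standalone sketch of that selection follows. It assumes `add_query` is an acceptable stand-in for youtube-dl's `update_url_query`; `build_feed_url` is likewise a hypothetical name used only here.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# Feed endpoint copied from the hunk above.
FEED_URL = 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews'


def add_query(url, query):
    """Stand-in for update_url_query: merge `query` into the URL's query string."""
    parts = urlparse(url)
    merged = parse_qs(parts.query)
    merged.update(query)
    return urlunparse(parts._replace(query=urlencode(merged, doseq=True)))


def build_feed_url(info, fallback_id):
    """Prefer the GUID filter, then mpxId, else keep the id taken from the page URL."""
    video_id, filter_param = fallback_id, 'byId'
    if 'guid' in info:
        video_id, filter_param = info['guid'], 'byGuid'
    elif 'mpxId' in info:
        video_id = info['mpxId']
    return video_id, add_query(FEED_URL, {filter_param: video_id})


guid_result = build_feed_url(
    {'guid': 'nn_netcast_150204', 'mpxId': '394064451844'}, '394064451844')
```

Building the query through a helper rather than `%` formatting keeps the parameter name variable, which is the point of the `filter_param` change in the patch.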
  
diff --git a/youtube_dl/extractor/ndr.py b/youtube_dl/extractor/ndr.py
index e3b0da2e966eb9486ab5307a933c51d74f2a14ba..07528d140f38bfa68a0d04cb85978d1017bae547 100644
--- a/youtube_dl/extractor/ndr.py
+++ b/youtube_dl/extractor/ndr.py
@@ -302,7 +302,7 @@ class NDREmbedIE(NDREmbedBaseIE):
          'info_dict': {
              'id': 'livestream217',
              'ext': 'flv',
-            'title': 're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'title': r're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
              'is_live': True,
              'upload_date': '20150910',
          },
@@ -367,7 +367,7 @@ class NJoyEmbedIE(NDREmbedBaseIE):
          'info_dict': {
              'id': 'webradioweltweit100',
              'ext': 'mp3',
-            'title': 're:^N-JOY Weltweit \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'title': r're:^N-JOY Weltweit \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
              'is_live': True,
              'uploader': 'njoy',
              'upload_date': '20150810',
diff --git a/youtube_dl/extractor/ndtv.py b/youtube_dl/extractor/ndtv.py
index 96528f6499d1e02c5208e61fe8abd1f606b29392..255f608783edad0aa3838de028dd8ea07d9ae1b0 100644
--- a/youtube_dl/extractor/ndtv.py
+++ b/youtube_dl/extractor/ndtv.py
@@ -21,7 +21,7 @@ class NDTVIE(InfoExtractor):
              'description': 'md5:ab2d4b4a6056c5cb4caa6d729deabf02',
              'upload_date': '20131208',
              'duration': 1327,
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
          },
      }
  
diff --git a/youtube_dl/extractor/netzkino.py b/youtube_dl/extractor/netzkino.py
index 0d165a82ad53ac8ac16ca8943c934db9fb28b720..aec3026b12755e38d3ace0e8978a72409dcc562c 100644
--- a/youtube_dl/extractor/netzkino.py
+++ b/youtube_dl/extractor/netzkino.py
@@ -25,7 +25,7 @@ class NetzkinoIE(InfoExtractor):
              'comments': 'mincount:3',
              'description': 'md5:1eddeacc7e62d5a25a2d1a7290c64a28',
              'upload_date': '20120813',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'timestamp': 1344858571,
              'age_limit': 12,
          },
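Many hunks in this patch only add an `r` prefix to regex literals such as the `thumbnail` expectations. The sketch below suggests why that is behavior-preserving: the string value is unchanged, but newer Python versions may emit an invalid-escape warning for the non-raw spelling of `\.`. The names `RAW`, `OLD_SOURCE` and `compile_old_spelling` are illustrative assumptions, and the `re:` prefix used by the test data is left out so that only the pattern itself is compared.

```python
import re
import warnings

# New spelling from the "+" lines, without the test harness's 're:' prefix.
RAW = r'^https?://.*\.jpg$'

# Source text of the old, non-raw spelling. Compiling it at runtime lets us
# observe any diagnostic instead of triggering it when this file is parsed.
OLD_SOURCE = r"""OLD = '^https?://.*\.jpg$'"""


def compile_old_spelling():
    """Return (value, warning categories) produced by the non-raw literal."""
    namespace = {}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        exec(compile(OLD_SOURCE, '<old-spelling>', 'exec'), namespace)
    return namespace['OLD'], [w.category for w in caught]


old_value, categories = compile_old_spelling()
same_value = old_value == RAW
matches = re.match(RAW, 'http://example.com/thumb.jpg') is not None
```

Whether `categories` is non-empty depends on the interpreter version, so the sketch only records it rather than relying on a particular warning class.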
diff --git a/youtube_dl/extractor/nextmedia.py b/youtube_dl/extractor/nextmedia.py
index dee9056d39e9bb0076d390054006c6dd4246afae..680f03aad4b318a70806555ac14d57a4bdfd05e0 100644
--- a/youtube_dl/extractor/nextmedia.py
+++ b/youtube_dl/extractor/nextmedia.py
@@ -2,7 +2,15 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import parse_iso8601
+from ..compat import compat_urlparse
+from ..utils import (
+    clean_html,
+    get_element_by_class,
+    int_or_none,
+    parse_iso8601,
+    remove_start,
+    unified_timestamp,
+)
  
  
  class NextMediaIE(InfoExtractor):
@@ -15,7 +23,7 @@ class NextMediaIE(InfoExtractor):
              'id': '53109199',
              'ext': 'mp4',
              'title': '【佔領金鐘】50外國領事議員撐場 讚學生勇敢香港有希望',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:28222b9912b6665a21011b034c70fcc7',
              'timestamp': 1415456273,
              'upload_date': '20141108',
@@ -30,6 +38,12 @@ class NextMediaIE(InfoExtractor):
          return self._extract_from_nextmedia_page(news_id, url, page)
  
      def _extract_from_nextmedia_page(self, news_id, url, page):
+        redirection_url = self._search_regex(
+            r'window\.location\.href\s*=\s*([\'"])(?P<url>(?!\1).+)\1',
+            page, 'redirection URL', default=None, group='url')
+        if redirection_url:
+            return self.url_result(compat_urlparse.urljoin(url, redirection_url))
+
          title = self._fetch_title(page)
          video_url = self._search_regex(self._URL_PATTERN, page, 'video url')
  
@@ -76,7 +90,7 @@ class NextMediaActionNewsIE(NextMediaIE):
              'id': '19009428',
              'ext': 'mp4',
              'title': '【壹週刊】細10年男友偷食　50歲邵美琪再失戀',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:cd802fad1f40fd9ea178c1e2af02d659',
              'timestamp': 1421791200,
              'upload_date': '20150120',
@@ -93,7 +107,7 @@ class NextMediaActionNewsIE(NextMediaIE):
  
  class AppleDailyIE(NextMediaIE):
      IE_DESC = '臺灣蘋果日報'
-    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews|actionnews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
+    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/[^/]+/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
      _TESTS = [{
          'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
          'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -101,7 +115,7 @@ class AppleDailyIE(NextMediaIE):
              'id': '36354694',
              'ext': 'mp4',
              'title': '周亭羽走過摩鐵陰霾2男陪吃 九把刀孤寒看醫生',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:2acd430e59956dc47cd7f67cb3c003f4',
              'upload_date': '20150128',
          }
@@ -112,7 +126,7 @@ class AppleDailyIE(NextMediaIE):
              'id': '550549',
              'ext': 'mp4',
              'title': '不滿被踩腳　山東兩大媽一路打下車',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:175b4260c1d7c085993474217e4ab1b4',
              'upload_date': '20150128',
          }
@@ -123,7 +137,7 @@ class AppleDailyIE(NextMediaIE):
              'id': '5003671',
              'ext': 'mp4',
              'title': '20正妹熱舞　《刀龍傳說Online》火辣上市',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:23c0aac567dc08c9c16a3161a2c2e3cd',
              'upload_date': '20150128',
          },
@@ -150,13 +164,17 @@ class AppleDailyIE(NextMediaIE):
              'id': '35770334',
              'ext': 'mp4',
              'title': '咖啡占卜測 XU裝熟指數',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:7b859991a6a4fedbdf3dd3b66545c748',
              'upload_date': '20140417',
          },
      }, {
          'url': 'http://www.appledaily.com.tw/actionnews/appledaily/7/20161003/960588/',
          'only_matching': True,
+    }, {
+        # Redirected from http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694
+        'url': 'http://ent.appledaily.com.tw/section/article/headline/20150128/36354694',
+        'only_matching': True,
      }]
  
      _URL_PATTERN = r'\{url: \'(.+)\'\}'
@@ -173,3 +191,48 @@ class AppleDailyIE(NextMediaIE):
  
      def _fetch_description(self, page):
          return self._html_search_meta('description', page, 'news description')
+
+
+class NextTVIE(InfoExtractor):
+    IE_DESC = '壹電視'
+    _VALID_URL = r'https?://(?:www\.)?nexttv\.com\.tw/(?:[^/]+/)+(?P<id>\d+)'
+
+    _TEST = {
+        'url': 'http://www.nexttv.com.tw/news/realtime/politics/11779671',
+        'info_dict': {
+            'id': '11779671',
+            'ext': 'mp4',
+            'title': '「超收稅」近4千億！　藍議員籲發消費券',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'timestamp': 1484825400,
+            'upload_date': '20170119',
+            'view_count': int,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._html_search_regex(
+            r'<h1[^>]*>([^<]+)</h1>', webpage, 'title')
+
+        data = self._hidden_inputs(webpage)
+
+        video_url = data['ntt-vod-src-detailview']
+
+        date_str = get_element_by_class('date', webpage)
+        timestamp = unified_timestamp(date_str + '+0800') if date_str else None
+
+        view_count = int_or_none(remove_start(
+            clean_html(get_element_by_class('click', webpage)), '點閱：'))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'url': video_url,
+            'thumbnail': data.get('ntt-vod-img-src'),
+            'timestamp': timestamp,
+            'view_count': view_count,
+        }
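The redirect handling added to `_extract_from_nextmedia_page` in the nextmedia.py diff can be exercised on its own. In this sketch the regex is copied from the patch, `urljoin` from the standard library stands in for `compat_urlparse.urljoin`, and the function name `find_redirect` and the sample page are made up for illustration.

```python
import re
from urllib.parse import urljoin

# Regex copied from the "+" lines: a quoted value assigned to window.location.href.
REDIRECT_RE = r'window\.location\.href\s*=\s*([\'"])(?P<url>(?!\1).+)\1'


def find_redirect(page, base_url):
    """Hypothetical helper: return the absolute redirect target found in `page`, or None."""
    m = re.search(REDIRECT_RE, page)
    if not m:
        return None
    return urljoin(base_url, m.group('url'))


BASE = 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694'
PAGE = '<script>window.location.href = "/section/article/headline/20150128/36354694";</script>'
target = find_redirect(PAGE, BASE)
```

The backreference `\1` makes the closing quote match whichever quote character opened the value, and `(?!\1)` rejects an empty string.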
diff --git a/youtube_dl/extractor/nfl.py b/youtube_dl/extractor/nfl.py
index 3930d16f16e4d295e9afeb84f88eb36dc7ffc30b..460deb162df7994caa389b1f37c4174cec3fbf78 100644
--- a/youtube_dl/extractor/nfl.py
+++ b/youtube_dl/extractor/nfl.py
@@ -72,7 +72,7 @@ class NFLIE(InfoExtractor):
              'description': 'md5:56323bfb0ac4ee5ab24bd05fdf3bf478',
              'upload_date': '20140921',
              'timestamp': 1411337580,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://prod.www.steelers.clubs.nfl.com/video-and-audio/videos/LIVE_Post_Game_vs_Browns/9d72f26a-9e2b-4718-84d3-09fb4046c266',
@@ -84,7 +84,7 @@ class NFLIE(InfoExtractor):
              'description': 'md5:6a97f7e5ebeb4c0e69a418a89e0636e8',
              'upload_date': '20131229',
              'timestamp': 1388354455,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://www.nfl.com/news/story/0ap3000000467586/article/patriots-seahawks-involved-in-lategame-skirmish',
diff --git a/youtube_dl/extractor/nick.py b/youtube_dl/extractor/nick.py
index 7672845bfd0c6ebbc08ef326f024f4a02bb44a71..08a75929e1e049249759f94e6179cfa8932ba87c 100644
--- a/youtube_dl/extractor/nick.py
+++ b/youtube_dl/extractor/nick.py
@@ -10,7 +10,7 @@ from ..utils import update_url_query
  class NickIE(MTVServicesInfoExtractor):
      # None of videos on the website are still alive?
      IE_NAME = 'nick.com'
-    _VALID_URL = r'https?://(?:www\.)?nick(?:jr)?\.com/(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:(?:www|beta)\.)?nick(?:jr)?\.com/(?:[^/]+/)?(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
      _FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
      _TESTS = [{
          'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
@@ -57,6 +57,9 @@ class NickIE(MTVServicesInfoExtractor):
      }, {
          'url': 'http://www.nickjr.com/paw-patrol/videos/pups-save-a-goldrush-s3-ep302-full-episode/',
          'only_matching': True,
+    }, {
+        'url': 'http://beta.nick.com/nicky-ricky-dicky-and-dawn/videos/nicky-ricky-dicky-dawn-301-full-episode/',
+        'only_matching': True,
      }]
  
      def _get_feed_query(self, uri):
diff --git a/youtube_dl/extractor/niconico.py b/youtube_dl/extractor/niconico.py
index a104e33f8bdea73540779e41db45d92c1249668a..8baac23e4b16643a59e6a83570862b6a5afd2b45 100644
--- a/youtube_dl/extractor/niconico.py
+++ b/youtube_dl/extractor/niconico.py
@@ -7,7 +7,6 @@ import datetime
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_urllib_parse_urlencode,
      compat_urlparse,
  )
  from ..utils import (
@@ -40,6 +39,7 @@ class NiconicoIE(InfoExtractor):
              'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
              'duration': 33,
          },
+        'skip': 'Requires an account',
      }, {
          # File downloaded with and without credentials are different, so omit
          # the md5 field
@@ -55,6 +55,7 @@ class NiconicoIE(InfoExtractor):
              'timestamp': 1304065916,
              'duration': 209,
          },
+        'skip': 'Requires an account',
      }, {
          # 'video exists but is marked as "deleted"
          # md5 is unstable
@@ -65,9 +66,10 @@ class NiconicoIE(InfoExtractor):
              'description': 'deleted',
              'title': 'ドラえもんエターナル第3話「決戦第3新東京市」＜前編＞',
              'upload_date': '20071224',
-            'timestamp': 1198527840,  # timestamp field has different value if logged in
+            'timestamp': int,  # timestamp field has different value if logged in
              'duration': 304,
          },
+        'skip': 'Requires an account',
      }, {
          'url': 'http://www.nicovideo.jp/watch/so22543406',
          'info_dict': {
@@ -79,13 +81,12 @@ class NiconicoIE(InfoExtractor):
              'upload_date': '20140104',
              'uploader': 'アニメロチャンネル',
              'uploader_id': '312',
-        }
+        },
+        'skip': 'The viewing period of the video you were searching for has expired.',
      }]
  
      _VALID_URL = r'https?://(?:www\.|secure\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
      _NETRC_MACHINE = 'niconico'
-    # Determine whether the downloader used authentication to download video
-    _AUTHENTICATED = False
  
      def _real_initialize(self):
          self._login()
@@ -109,8 +110,6 @@ class NiconicoIE(InfoExtractor):
          if re.search(r'(?i)<h1 class="mb8p4">Log in error</h1>', login_results) is not None:
              self._downloader.report_warning('unable to log in: bad username or password')
              return False
-        # Successful login
-        self._AUTHENTICATED = True
          return True
  
      def _real_extract(self, url):
@@ -128,35 +127,19 @@ class NiconicoIE(InfoExtractor):
              'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id, video_id,
              note='Downloading video info page')
  
-        if self._AUTHENTICATED:
-            # Get flv info
-            flv_info_webpage = self._download_webpage(
-                'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
-                video_id, 'Downloading flv info')
-        else:
-            # Get external player info
-            ext_player_info = self._download_webpage(
-                'http://ext.nicovideo.jp/thumb_watch/' + video_id, video_id)
-            thumb_play_key = self._search_regex(
-                r'\'thumbPlayKey\'\s*:\s*\'(.*?)\'', ext_player_info, 'thumbPlayKey')
-
-            # Get flv info
-            flv_info_data = compat_urllib_parse_urlencode({
-                'k': thumb_play_key,
-                'v': video_id
-            })
-            flv_info_request = sanitized_Request(
-                'http://ext.nicovideo.jp/thumb_watch', flv_info_data,
-                {'Content-Type': 'application/x-www-form-urlencoded'})
-            flv_info_webpage = self._download_webpage(
-                flv_info_request, video_id,
-                note='Downloading flv info', errnote='Unable to download flv info')
+        # Get flv info
+        flv_info_webpage = self._download_webpage(
+            'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
+            video_id, 'Downloading flv info')
  
          flv_info = compat_urlparse.parse_qs(flv_info_webpage)
          if 'url' not in flv_info:
              if 'deleted' in flv_info:
                  raise ExtractorError('The video has been deleted.',
                                       expected=True)
+            elif 'closed' in flv_info:
+                raise ExtractorError('Niconico videos now require logging in',
+                                     expected=True)
              else:
                  raise ExtractorError('Unable to find video URL')
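After this change the niconico extractor always calls the `getflv` API and distinguishes three failure cases from the form-encoded reply. A rough standalone sketch of that branching follows. `ExtractionFailed` is a hypothetical stand-in for youtube-dl's `ExtractorError`, and the sample response bodies are invented, since the patch does not show real ones.

```python
from urllib.parse import parse_qs


class ExtractionFailed(Exception):
    """Hypothetical stand-in for youtube-dl's ExtractorError."""


def video_url_from_flv_info(body):
    """Pick the video URL out of a form-encoded getflv reply, or explain why it is absent."""
    flv_info = parse_qs(body)
    if 'url' in flv_info:
        return flv_info['url'][0]
    if 'deleted' in flv_info:
        raise ExtractionFailed('The video has been deleted.')
    if 'closed' in flv_info:
        raise ExtractionFailed('Niconico videos now require logging in')
    raise ExtractionFailed('Unable to find video URL')


# Invented reply body, for illustration only.
ok_url = video_url_from_flv_info('thread_id=1&url=http%3A%2F%2Fexample.com%2Fv.flv')
```

The new `closed` branch turns an anonymous request into an expected, user-facing error instead of the generic "Unable to find video URL".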
  
diff --git a/youtube_dl/extractor/nosvideo.py b/youtube_dl/extractor/nosvideo.py
index eab816e4916bc2fae7d72cde598cb5b5f69bfde4..53c500c351690cf82872f3c724d6a5c60a6b5f93 100644
--- a/youtube_dl/extractor/nosvideo.py
+++ b/youtube_dl/extractor/nosvideo.py
@@ -17,7 +17,7 @@ _x = lambda p: xpath_with_ns(p, {'xspf': 'http://xspf.org/ns/0/'})
  
  class NosVideoIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?nosvideo\.com/' + \
-                 '(?:embed/|\?v=)(?P<id>[A-Za-z0-9]{12})/?'
+                 r'(?:embed/|\?v=)(?P<id>[A-Za-z0-9]{12})/?'
      _PLAYLIST_URL = 'http://nosvideo.com/xml/{xml_id:s}.xml'
      _FILE_DELETED_REGEX = r'<b>File Not Found</b>'
      _TEST = {
@@ -27,7 +27,7 @@ class NosVideoIE(InfoExtractor):
              'id': 'mu8fle7g7rpq',
              'ext': 'mp4',
              'title': 'big_buck_bunny_480p_surround-fix.avi.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/nova.py b/youtube_dl/extractor/nova.py
index 103952345aa98ed186515452baf2f945409ffdaa..06cb8cb3f5a867104a8bd67abd9ce60bbc1df25f 100644
--- a/youtube_dl/extractor/nova.py
+++ b/youtube_dl/extractor/nova.py
@@ -21,7 +21,7 @@ class NovaIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Duel: Michal Hrdlička a Petr Suchoň',
              'description': 'md5:d0cc509858eee1b1374111c588c6f5d5',
-            'thumbnail': 're:^https?://.*\.(?:jpg)',
+            'thumbnail': r're:^https?://.*\.(?:jpg)',
          },
          'params': {
              # rtmp download
@@ -36,7 +36,7 @@ class NovaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Podzemní nemocnice v pražské Krči',
              'description': 'md5:f0a42dd239c26f61c28f19e62d20ef53',
-            'thumbnail': 're:^https?://.*\.(?:jpg)',
+            'thumbnail': r're:^https?://.*\.(?:jpg)',
          }
      }, {
          'url': 'http://novaplus.nova.cz/porad/policie-modrava/video/5591-policie-modrava-15-dil-blondynka-na-hrbitove',
@@ -46,7 +46,7 @@ class NovaIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Policie Modrava - 15. díl - Blondýnka na hřbitově',
              'description': 'md5:dc24e50be5908df83348e50d1431295e',  # Make sure this description is clean of html tags
-            'thumbnail': 're:^https?://.*\.(?:jpg)',
+            'thumbnail': r're:^https?://.*\.(?:jpg)',
          },
          'params': {
              # rtmp download
@@ -58,7 +58,7 @@ class NovaIE(InfoExtractor):
              'id': '1756858',
              'ext': 'flv',
              'title': 'Televizní noviny - 30. 5. 2015',
-            'thumbnail': 're:^https?://.*\.(?:jpg)',
+            'thumbnail': r're:^https?://.*\.(?:jpg)',
              'upload_date': '20150530',
          },
          'params': {
@@ -72,7 +72,7 @@ class NovaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Zaklínač 3: Divoký hon',
              'description': 're:.*Pokud se stejně jako my nemůžete.*',
-            'thumbnail': 're:https?://.*\.jpg(\?.*)?',
+            'thumbnail': r're:https?://.*\.jpg(\?.*)?',
              'upload_date': '20150521',
          },
          'params': {
diff --git a/youtube_dl/extractor/novamov.py b/youtube_dl/extractor/novamov.py
index 3bbd4735502e113fcc46a07981ff5863c52fef15..829c71960442a1159efd729f30704d046e92babd 100644
--- a/youtube_dl/extractor/novamov.py
+++ b/youtube_dl/extractor/novamov.py
@@ -24,7 +24,7 @@ class NovaMovIE(InfoExtractor):
                                  )
                                  (?P<id>[a-z\d]{13})
                              '''
-    _VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'}
+    _VALID_URL = _VALID_URL_TEMPLATE % {'host': r'novamov\.com'}
  
      _HOST = 'www.novamov.com'
  
@@ -104,7 +104,7 @@ class WholeCloudIE(NovaMovIE):
      IE_NAME = 'wholecloud'
      IE_DESC = 'WholeCloud'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': '(?:wholecloud\.net|movshare\.(?:net|sx|ag))'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'(?:wholecloud\.net|movshare\.(?:net|sx|ag))'}
  
      _HOST = 'www.wholecloud.net'
  
@@ -128,7 +128,7 @@ class NowVideoIE(NovaMovIE):
      IE_NAME = 'nowvideo'
      IE_DESC = 'NowVideo'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
  
      _HOST = 'www.nowvideo.to'
  
@@ -152,7 +152,7 @@ class VideoWeedIE(NovaMovIE):
      IE_NAME = 'videoweed'
      IE_DESC = 'VideoWeed'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'videoweed\.(?:es|com)'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'videoweed\.(?:es|com)'}
  
      _HOST = 'www.videoweed.es'
  
@@ -176,7 +176,7 @@ class CloudTimeIE(NovaMovIE):
      IE_NAME = 'cloudtime'
      IE_DESC = 'CloudTime'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'cloudtime\.to'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'cloudtime\.to'}
  
      _HOST = 'www.cloudtime.to'
  
@@ -190,7 +190,7 @@ class AuroraVidIE(NovaMovIE):
      IE_NAME = 'auroravid'
      IE_DESC = 'AuroraVid'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'auroravid\.to'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'auroravid\.to'}
  
      _HOST = 'www.auroravid.to'
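The novamov.py hunks substitute a per-site host pattern into a shared `_VALID_URL_TEMPLATE`. The sketch below illustrates the mechanism with a simplified, hypothetical template, since the real one is only partly visible in the diff. The host patterns are copied from the "+" lines and the sample URLs are assumptions.

```python
import re

# Simplified, hypothetical template; only the id group is taken from the patch.
TEMPLATE = r'''(?x)
    https?://(?:www\.)?%(host)s/video/
    (?P<id>[a-z\d]{13})
'''

# Host patterns copied from the "+" lines; raw strings keep "\." intact.
NOVAMOV = TEMPLATE % {'host': r'novamov\.com'}
NOWVIDEO = TEMPLATE % {'host': r'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}

novamov_id = re.match(NOVAMOV, 'http://www.novamov.com/video/4rurhn9x446jj').group('id')
nowvideo_id = re.match(NOWVIDEO, 'http://www.nowvideo.sx/video/f1d6fce9a968b').group('id')
```

Because the host fragment is itself a regex, each subclass can list several top-level domains without touching the shared template.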
  
diff --git a/youtube_dl/extractor/nowness.py b/youtube_dl/extractor/nowness.py
index 7e53463164b281e84a349a6fc382f5e203f278a4..b6c5ee6e417e12d35731e45e5755435867a3b67b 100644
--- a/youtube_dl/extractor/nowness.py
+++ b/youtube_dl/extractor/nowness.py
@@ -62,7 +62,7 @@ class NownessIE(NownessBaseIE):
              'ext': 'mp4',
              'title': 'Candor: The Art of Gesticulation',
              'description': 'Candor: The Art of Gesticulation',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1446745676,
              'upload_date': '20151105',
              'uploader_id': '2385340575001',
@@ -76,7 +76,7 @@ class NownessIE(NownessBaseIE):
              'ext': 'mp4',
              'title': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
              'description': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1407315371,
              'upload_date': '20140806',
              'uploader_id': '2385340575001',
@@ -91,7 +91,7 @@ class NownessIE(NownessBaseIE):
              'ext': 'mp4',
              'title': 'Bleu, Blanc, Rouge - A Godard Supercut',
              'description': 'md5:f0ea5f1857dffca02dbd37875d742cec',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'upload_date': '20150607',
              'uploader': 'Cinema Sem Lei',
              'uploader_id': 'cinemasemlei',
diff --git a/youtube_dl/extractor/nowtv.py b/youtube_dl/extractor/nowtv.py
index 916a102bfc381cbfe9d2baf83ceb5d39241cd69d..e43b37136e13f2547b43bc474ae3654d36af6595 100644
--- a/youtube_dl/extractor/nowtv.py
+++ b/youtube_dl/extractor/nowtv.py
@@ -83,7 +83,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Inka Bause stellt die neuen Bauern vor',
              'description': 'md5:e234e1ed6d63cf06be5c070442612e7e',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432580700,
              'upload_date': '20150525',
              'duration': 2786,
@@ -101,7 +101,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Berlin - Tag & Nacht (Folge 934)',
              'description': 'md5:c85e88c2e36c552dfe63433bc9506dd0',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432666800,
              'upload_date': '20150526',
              'duration': 2641,
@@ -119,7 +119,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Hals- und Beinbruch',
              'description': 'md5:b50d248efffe244e6f56737f0911ca57',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432415400,
              'upload_date': '20150523',
              'duration': 2742,
@@ -137,7 +137,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Angst!',
              'description': 'md5:30cbc4c0b73ec98bcd73c9f2a8c17c4e',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1222632900,
              'upload_date': '20080928',
              'duration': 3025,
@@ -155,7 +155,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Thema u.a.: Der erste Blick: Die Apple Watch',
              'description': 'md5:4312b6c9d839ffe7d8caf03865a531af',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432751700,
              'upload_date': '20150527',
              'duration': 1083,
@@ -173,7 +173,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': "Büro-Fall / Chihuahua 'Joel'",
              'description': 'md5:e62cb6bf7c3cc669179d4f1eb279ad8d',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432408200,
              'upload_date': '20150523',
              'duration': 3092,
diff --git a/youtube_dl/extractor/noz.py b/youtube_dl/extractor/noz.py
index c47a33d1570537aedc0dc3c2415a5f138fbdc5bb..ccafd77232b5f1e69f61dd5a99e904c7775a51a7 100644
--- a/youtube_dl/extractor/noz.py
+++ b/youtube_dl/extractor/noz.py
@@ -24,7 +24,7 @@ class NozIE(InfoExtractor):
              'duration': 215,
              'title': '3:2 - Deutschland gewinnt Badminton-Länderspiel in Melle',
              'description': 'Vor rund 370 Zuschauern gewinnt die deutsche Badminton-Nationalmannschaft am Donnerstag ein EM-Vorbereitungsspiel gegen Frankreich in Melle. Video Moritz Frankenberg.',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
          },
      }]
  
diff --git a/youtube_dl/extractor/npo.py b/youtube_dl/extractor/npo.py
index c91f5846171be2a720523a4531313703d18920fd..9624371450d0d555a8d843aac84346eae4b236fc 100644
--- a/youtube_dl/extractor/npo.py
+++ b/youtube_dl/extractor/npo.py
@@ -241,7 +241,7 @@ class NPOIE(NPOBaseIE):
          if metadata.get('tt888') == 'ja':
              subtitles['nl'] = [{
                  'ext': 'vtt',
-                'url': 'http://e.omroep.nl/tt888/%s' % video_id,
+                'url': 'http://tt888.omroep.nl/tt888/%s' % video_id,
              }]
  
          return {
diff --git a/youtube_dl/extractor/nrk.py b/youtube_dl/extractor/nrk.py
index c89aac63ee90f133074d8ade8b7af23cf020f148..fc3c0cd3ccb25ab8c41fdb1b8e9b424458c93209 100644
--- a/youtube_dl/extractor/nrk.py
+++ b/youtube_dl/extractor/nrk.py
@@ -48,6 +48,13 @@ class NRKBaseIE(InfoExtractor):
  
          entries = []
  
+        conviva = data.get('convivaStatistics') or {}
+        live = (data.get('mediaElementType') == 'Live' or
+                data.get('isLive') is True or conviva.get('isLive'))
+
+        def make_title(t):
+            return self._live_title(t) if live else t
+
          media_assets = data.get('mediaAssets')
          if media_assets and isinstance(media_assets, list):
              def video_id_and_title(idx):
@@ -61,6 +68,13 @@ class NRKBaseIE(InfoExtractor):
                  if not formats:
                      continue
                  self._sort_formats(formats)
+
+                # Some f4m streams may not work with hdcore in fragments' URLs
+                for f in formats:
+                    extra_param = f.get('extra_param_to_segment_url')
+                    if extra_param and 'hdcore' in extra_param:
+                        del f['extra_param_to_segment_url']
+
                  entry_id, entry_title = video_id_and_title(num)
                  duration = parse_duration(asset.get('duration'))
                  subtitles = {}
@@ -72,7 +86,7 @@ class NRKBaseIE(InfoExtractor):
                          })
                  entries.append({
                      'id': asset.get('carrierId') or entry_id,
-                    'title': entry_title,
+                    'title': make_title(entry_title),
                      'duration': duration,
                      'subtitles': subtitles,
                      'formats': formats,
@@ -87,7 +101,7 @@ class NRKBaseIE(InfoExtractor):
                  duration = parse_duration(data.get('duration'))
                  entries = [{
                      'id': video_id,
-                    'title': title,
+                    'title': make_title(title),
                      'duration': duration,
                      'formats': formats,
                  }]
@@ -111,10 +125,25 @@ class NRKBaseIE(InfoExtractor):
                      message_type, message_type)),
                  expected=True)
  
-        conviva = data.get('convivaStatistics') or {}
          series = conviva.get('seriesName') or data.get('seriesTitle')
          episode = conviva.get('episodeName') or data.get('episodeNumberOrDate')
  
+        season_number = None
+        episode_number = None
+        if data.get('mediaElementType') == 'Episode':
+            _season_episode = data.get('scoresStatistics', {}).get('springStreamStream') or \
+                data.get('relativeOriginUrl', '')
+            EPISODENUM_RE = [
+                r'/s(?P<season>\d{,2})e(?P<episode>\d{,2})\.',
+                r'/sesong-(?P<season>\d{,2})/episode-(?P<episode>\d{,2})',
+            ]
+            season_number = int_or_none(self._search_regex(
+                EPISODENUM_RE, _season_episode, 'season number',
+                default=None, group='season'))
+            episode_number = int_or_none(self._search_regex(
+                EPISODENUM_RE, _season_episode, 'episode number',
+                default=None, group='episode'))
+
          thumbnails = None
          images = data.get('images')
          if images and isinstance(images, dict):
@@ -127,11 +156,15 @@ class NRKBaseIE(InfoExtractor):
                  } for image in web_images if image.get('imageUrl')]
  
          description = data.get('description')
+        category = data.get('mediaAnalytics', {}).get('category')
  
          common_info = {
              'description': description,
              'series': series,
              'episode': episode,
+            'season_number': season_number,
+            'episode_number': episode_number,
+            'categories': [category] if category else None,
              'age_limit': parse_age_limit(data.get('legalAge')),
              'thumbnails': thumbnails,
          }
@@ -194,7 +227,15 @@ class NRKIE(NRKBaseIE):
  
  class NRKTVIE(NRKBaseIE):
      IE_DESC = 'NRK TV and NRK Radio'
-    _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
+    _EPISODE_RE = r'(?P<id>[a-zA-Z]{4}\d{8})'
+    _VALID_URL = r'''(?x)
+                        https?://
+                            (?:tv|radio)\.nrk(?:super)?\.no/
+                            (?:serie/[^/]+|program)/
+                            (?![Ee]pisodes)%s
+                            (?:/\d{2}-\d{2}-\d{4})?
+                            (?:\#del=(?P<part_id>\d+))?
+                    ''' % _EPISODE_RE
      _API_HOST = 'psapi-we.nrk.no'
  
      _TESTS = [{
@@ -206,63 +247,145 @@ class NRKTVIE(NRKBaseIE):
              'title': '20 spørsmål 23.05.2014',
              'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
              'duration': 1741,
+            'series': '20 spørsmål - TV',
+            'episode': '23.05.2014',
          },
      }, {
          'url': 'https://tv.nrk.no/program/mdfp15000514',
-        'md5': '43d0be26663d380603a9cf0c24366531',
          'info_dict': {
              'id': 'MDFP15000514CA',
              'ext': 'mp4',
              'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting 24.05.2014',
              'description': 'md5:89290c5ccde1b3a24bb8050ab67fe1db',
              'duration': 4605,
+            'series': 'Kunnskapskanalen',
+            'episode': '24.05.2014',
+        },
+        'params': {
+            'skip_download': True,
          },
      }, {
          # single playlist video
          'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
-        'md5': 'adbd1dbd813edaf532b0a253780719c2',
          'info_dict': {
              'id': 'MSPO40010515-part2',
              'ext': 'flv',
              'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
              'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
          },
-        'skip': 'Only works from Norway',
+        'params': {
+            'skip_download': True,
+        },
+        'expected_warnings': ['Video is geo restricted'],
+        'skip': 'particular part is not supported currently',
      }, {
          'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
          'playlist': [{
-            'md5': '9480285eff92d64f06e02a5367970a7a',
              'info_dict': {
-                'id': 'MSPO40010515-part1',
-                'ext': 'flv',
-                'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
-                'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
+                'id': 'MSPO40010515AH',
+                'ext': 'mp4',
+                'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 1)',
+                'description': 'md5:c03aba1e917561eface5214020551b7a',
+                'duration': 772,
+                'series': 'Tour de Ski',
+                'episode': '06.01.2015',
+            },
+            'params': {
+                'skip_download': True,
              },
          }, {
-            'md5': 'adbd1dbd813edaf532b0a253780719c2',
              'info_dict': {
-                'id': 'MSPO40010515-part2',
-                'ext': 'flv',
-                'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
-                'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
+                'id': 'MSPO40010515BH',
+                'ext': 'mp4',
+                'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 2)',
+                'description': 'md5:c03aba1e917561eface5214020551b7a',
+                'duration': 6175,
+                'series': 'Tour de Ski',
+                'episode': '06.01.2015',
+            },
+            'params': {
+                'skip_download': True,
              },
          }],
          'info_dict': {
              'id': 'MSPO40010515',
-            'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
-            'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
-            'duration': 6947.52,
+            'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015',
+            'description': 'md5:c03aba1e917561eface5214020551b7a',
+        },
+        'expected_warnings': ['Video is geo restricted'],
+    }, {
+        'url': 'https://tv.nrk.no/serie/anno/KMTE50001317/sesong-3/episode-13',
+        'info_dict': {
+            'id': 'KMTE50001317AA',
+            'ext': 'mp4',
+            'title': 'Anno 13:30',
+            'description': 'md5:11d9613661a8dbe6f9bef54e3a4cbbfa',
+            'duration': 2340,
+            'series': 'Anno',
+            'episode': '13:30',
+            'season_number': 3,
+            'episode_number': 13,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://tv.nrk.no/serie/nytt-paa-nytt/MUHH46000317/27-01-2017',
+        'info_dict': {
+            'id': 'MUHH46000317AA',
+            'ext': 'mp4',
+            'title': 'Nytt på Nytt 27.01.2017',
+            'description': 'md5:5358d6388fba0ea6f0b6d11c48b9eb4b',
+            'duration': 1796,
+            'series': 'Nytt på nytt',
+            'episode': '27.01.2017',
+        },
+        'params': {
+            'skip_download': True,
          },
-        'skip': 'Only works from Norway',
      }, {
          'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
          'only_matching': True,
      }]
  
  
-class NRKPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P<id>[^/]+)'
+class NRKTVDirekteIE(NRKTVIE):
+    IE_DESC = 'NRK TV Direkte and NRK Radio Direkte'
+    _VALID_URL = r'https?://(?:tv|radio)\.nrk\.no/direkte/(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/direkte/nrk1',
+        'only_matching': True,
+    }, {
+        'url': 'https://radio.nrk.no/direkte/p1_oslo_akershus',
+        'only_matching': True,
+    }]
+
+
+class NRKPlaylistBaseIE(InfoExtractor):
+    def _extract_description(self, webpage):
+        pass
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        entries = [
+            self.url_result('nrk:%s' % video_id, NRKIE.ie_key())
+            for video_id in re.findall(self._ITEM_RE, webpage)
+        ]
+
+        playlist_title = self._extract_title(webpage)
+        playlist_description = self._extract_description(webpage)
+
+        return self.playlist_result(
+            entries, playlist_id, playlist_title, playlist_description)
+
  
+class NRKPlaylistIE(NRKPlaylistBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P<id>[^/]+)'
+    _ITEM_RE = r'class="[^"]*\brich\b[^"]*"[^>]+data-video-id="([^"]+)"'
      _TESTS = [{
          'url': 'http://www.nrk.no/troms/gjenopplev-den-historiske-solformorkelsen-1.12270763',
          'info_dict': {
@@ -281,23 +404,86 @@ class NRKPlaylistIE(InfoExtractor):
          'playlist_count': 5,
      }]
  
+    def _extract_title(self, webpage):
+        return self._og_search_title(webpage, fatal=False)
+
+    def _extract_description(self, webpage):
+        return self._og_search_description(webpage)
+
+
+class NRKTVEpisodesIE(NRKPlaylistBaseIE):
+    _VALID_URL = r'https?://tv\.nrk\.no/program/[Ee]pisodes/[^/]+/(?P<id>\d+)'
+    _ITEM_RE = r'data-episode=["\']%s' % NRKTVIE._EPISODE_RE
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/program/episodes/nytt-paa-nytt/69031',
+        'info_dict': {
+            'id': '69031',
+            'title': 'Nytt på nytt, sesong: 201210',
+        },
+        'playlist_count': 4,
+    }]
+
+    def _extract_title(self, webpage):
+        return self._html_search_regex(
+            r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
+
+
+class NRKTVSeriesIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
+    _ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/groenn-glede',
+        'info_dict': {
+            'id': 'groenn-glede',
+            'title': 'Grønn glede',
+            'description': 'md5:7576e92ae7f65da6993cf90ee29e4608',
+        },
+        'playlist_mincount': 9,
+    }, {
+        'url': 'http://tv.nrksuper.no/serie/labyrint',
+        'info_dict': {
+            'id': 'labyrint',
+            'title': 'Labyrint',
+            'description': 'md5:58afd450974c89e27d5a19212eee7115',
+        },
+        'playlist_mincount': 3,
+    }, {
+        'url': 'https://tv.nrk.no/serie/broedrene-dal-og-spektralsteinene',
+        'only_matching': True,
+    }, {
+        'url': 'https://tv.nrk.no/serie/saving-the-human-race',
+        'only_matching': True,
+    }, {
+        'url': 'https://tv.nrk.no/serie/postmann-pat',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if NRKTVIE.suitable(url) else super(NRKTVSeriesIE, cls).suitable(url)
+
      def _real_extract(self, url):
-        playlist_id = self._match_id(url)
+        series_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, playlist_id)
+        webpage = self._download_webpage(url, series_id)
  
          entries = [
-            self.url_result('nrk:%s' % video_id, 'NRK')
-            for video_id in re.findall(
-                r'class="[^"]*\brich\b[^"]*"[^>]+data-video-id="([^"]+)"',
-                webpage)
+            self.url_result(
+                'https://tv.nrk.no/program/Episodes/{series}/{season}'.format(
+                    series=series_id, season=season_id))
+            for season_id in re.findall(self._ITEM_RE, webpage)
          ]
  
-        playlist_title = self._og_search_title(webpage)
-        playlist_description = self._og_search_description(webpage)
+        title = self._html_search_meta(
+            'seriestitle', webpage,
+            'title', default=None) or self._og_search_title(
+            webpage, fatal=False)
  
-        return self.playlist_result(
-            entries, playlist_id, playlist_title, playlist_description)
+        description = self._html_search_meta(
+            'series_description', webpage,
+            'description', default=None) or self._og_search_description(webpage)
+
+        return self.playlist_result(entries, series_id, title, description)
  
  
  class NRKSkoleIE(InfoExtractor):
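Reviewer note on the season/episode lookup added to `NRKBaseIE` in the nrk.py hunks above: a standalone sketch of how the two `EPISODENUM_RE` patterns behave, using only the standard `re` module. The patterns are copied from the diff; the `season_and_episode` helper and the sample paths are hypothetical. Because `\d{,2}` also matches zero digits, an empty capture is mapped to `None` here, which is an assumption about the intent of wrapping the result in `int_or_none`.

```python
import re

# Patterns copied from the diff; the first pattern that matches wins.
EPISODENUM_RE = [
    r'/s(?P<season>\d{,2})e(?P<episode>\d{,2})\.',
    r'/sesong-(?P<season>\d{,2})/episode-(?P<episode>\d{,2})',
]


def season_and_episode(path):
    """Hypothetical helper: return (season, episode) as ints, or None each."""
    for pattern in EPISODENUM_RE:
        m = re.search(pattern, path)
        if m:
            season, episode = m.group('season'), m.group('episode')
            # An empty capture (zero digits) is treated as "unknown".
            return (int(season) if season else None,
                    int(episode) if episode else None)
    return (None, None)
```

For the test URL added in this commit, the path `/serie/anno/KMTE50001317/sesong-3/episode-13` is not matched by the first pattern (no literal dot follows `/se`), so the second pattern supplies season 3 and episode 13.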
diff --git a/youtube_dl/extractor/ntvde.py b/youtube_dl/extractor/ntvde.py
index d28a8154247f75cbc612f7999083cd60275c5a88..101a5374ccd9c780d6f34e8ff82ef67148e8a0cb 100644
--- a/youtube_dl/extractor/ntvde.py
+++ b/youtube_dl/extractor/ntvde.py
@@ -22,7 +22,7 @@ class NTVDeIE(InfoExtractor):
          'info_dict': {
              'id': '14438086',
              'ext': 'mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'title': 'Schnee und Glätte führen zu zahlreichen Unfällen und Staus',
              'alt_title': 'Winterchaos auf deutschen Straßen',
              'description': 'Schnee und Glätte sorgen deutschlandweit für einen chaotischen Start in die Woche: Auf den Straßen kommt es zu kilometerlangen Staus und Dutzenden Glätteunfällen. In Düsseldorf und München wirbelt der Schnee zudem den Flugplan durcheinander. Dutzende Flüge landen zu spät, einige fallen ganz aus.',
diff --git a/youtube_dl/extractor/ntvru.py b/youtube_dl/extractor/ntvru.py
index 7d7a785ab10e7b71ceb4729a012ebb574c7752d5..4f9cedb84a47a8481b2c4058c5a59483b9a613bd 100644
--- a/youtube_dl/extractor/ntvru.py
+++ b/youtube_dl/extractor/ntvru.py
@@ -21,7 +21,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
              'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 136,
          },
      }, {
@@ -32,7 +32,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
              'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 172,
          },
      }, {
@@ -43,7 +43,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': '«Сегодня». 21 марта 2014 года. 16:00',
              'description': '«Сегодня». 21 марта 2014 года. 16:00',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 1496,
          },
      }, {
@@ -54,7 +54,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Остросюжетный фильм «Кома»',
              'description': 'Остросюжетный фильм «Кома»',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 5592,
          },
      }, {
@@ -65,7 +65,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': '«Дело врачей»: «Деревце жизни»',
              'description': '«Дело врачей»: «Деревце жизни»',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 2590,
          },
      }]
diff --git a/youtube_dl/extractor/oktoberfesttv.py b/youtube_dl/extractor/oktoberfesttv.py
index 50fbbc79c12761449adc70e74a58f0442f5b9cfa..a914068f958943eddb79a992413d86bad1c343fa 100644
--- a/youtube_dl/extractor/oktoberfesttv.py
+++ b/youtube_dl/extractor/oktoberfesttv.py
@@ -13,7 +13,7 @@ class OktoberfestTVIE(InfoExtractor):
              'id': 'hb-zelt',
              'ext': 'mp4',
              'title': 're:^Live-Kamera: Hofbräuzelt [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'is_live': True,
          },
          'params': {
diff --git a/youtube_dl/extractor/ondemandkorea.py b/youtube_dl/extractor/ondemandkorea.py
new file mode 100644
index 0000000..de1d6b0
--- /dev/null
+++ b/youtube_dl/extractor/ondemandkorea.py
@@ -0,0 +1,60 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .jwplatform import JWPlatformBaseIE
+from ..utils import (
+    ExtractorError,
+    js_to_json,
+)
+
+
+class OnDemandKoreaIE(JWPlatformBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
+    _TEST = {
+        'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',
+        'info_dict': {
+            'id': 'ask-us-anything-e43',
+            'ext': 'mp4',
+            'title': 'Ask Us Anything : E43',
+            'thumbnail': r're:^https?://.*\.jpg$',
+        },
+        'params': {
+            'skip_download': 'm3u8 download'
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id, fatal=False)
+
+        if not webpage:
+            # Page sometimes returns captcha page with HTTP 403
+            raise ExtractorError(
+                'Unable to access page. You may have been blocked.',
+                expected=True)
+
+        if 'msg_block_01.png' in webpage:
+            self.raise_geo_restricted(
+                'This content is not available in your region')
+
+        if 'This video is only available to ODK PLUS members.' in webpage:
+            raise ExtractorError(
+                'This video is only available to ODK PLUS members.',
+                expected=True)
+
+        title = self._og_search_title(webpage)
+
+        jw_config = self._parse_json(
+            self._search_regex(
+                r'(?s)jwplayer\(([\'"])(?:(?!\1).)+\1\)\.setup\s*\((?P<options>.+?)\);',
+                webpage, 'jw config', group='options'),
+            video_id, transform_source=js_to_json)
+        info = self._parse_jwplayer_data(
+            jw_config, video_id, require_title=False, m3u8_id='hls',
+            base_url=url)
+
+        info.update({
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage),
+        })
+        return info
diff --git a/youtube_dl/extractor/onionstudios.py b/youtube_dl/extractor/onionstudios.py
index 6fb1a3fcc0bd565677b232adcb883b3649715dde..1d336cf3069d8aae29eeb4e90a7c3f20241cab2e 100644
--- a/youtube_dl/extractor/onionstudios.py
+++ b/youtube_dl/extractor/onionstudios.py
@@ -22,7 +22,7 @@ class OnionStudiosIE(InfoExtractor):
              'id': '2937',
              'ext': 'mp4',
              'title': 'Hannibal charges forward, stops for a cocktail',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'The A.V. Club',
              'uploader_id': 'the-av-club',
          },
diff --git a/youtube_dl/extractor/ooyala.py b/youtube_dl/extractor/ooyala.py
index c2807d0f61b2ab5134944bd0c79b2030df80d3a1..84be2b1e3fe9fec47e915c13b14a8f23c0383232 100644
--- a/youtube_dl/extractor/ooyala.py
+++ b/youtube_dl/extractor/ooyala.py
@@ -18,7 +18,7 @@ class OoyalaBaseIE(InfoExtractor):
      _CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
      _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s?'
  
-    def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None):
+    def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None, embed_token=None):
          content_tree = self._download_json(content_tree_url, video_id)['content_tree']
          metadata = content_tree[list(content_tree)[0]]
          embed_code = metadata['embed_code']
@@ -29,7 +29,8 @@ class OoyalaBaseIE(InfoExtractor):
              self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
              compat_urllib_parse_urlencode({
                  'domain': domain,
-                'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds',
+                'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds,dash,smooth',
+                'embedToken': embed_token,
              }), video_id)
  
          cur_auth_data = auth_data['authorization_data'][embed_code]
@@ -52,6 +53,12 @@ class OoyalaBaseIE(InfoExtractor):
                  elif delivery_type == 'hds' or ext == 'f4m':
                      formats.extend(self._extract_f4m_formats(
                          s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
+                elif delivery_type == 'dash' or ext == 'mpd':
+                    formats.extend(self._extract_mpd_formats(
+                        s_url, embed_code, mpd_id='dash', fatal=False))
+                elif delivery_type == 'smooth':
+                    formats.extend(self._extract_ism_formats(
+                        s_url, embed_code, ism_id='mss', fatal=False))
                  elif ext == 'smil':
                      formats.extend(self._extract_smil_formats(
                          s_url, embed_code, fatal=False))
@@ -146,8 +153,9 @@ class OoyalaIE(OoyalaBaseIE):
          embed_code = self._match_id(url)
          domain = smuggled_data.get('domain')
          supportedformats = smuggled_data.get('supportedformats')
+        embed_token = smuggled_data.get('embed_token')
          content_tree_url = self._CONTENT_TREE_BASE + 'embed_code/%s/%s' % (embed_code, embed_code)
-        return self._extract(content_tree_url, embed_code, domain, supportedformats)
+        return self._extract(content_tree_url, embed_code, domain, supportedformats, embed_token)
  
  
  class OoyalaExternalIE(OoyalaBaseIE):
diff --git a/youtube_dl/extractor/openload.py b/youtube_dl/extractor/openload.py
index 7f19b1ba5c3c355977c694334694b51f71a9840c..32289d8976dcf602839546d179c06ef224a79b20 100644
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -1,25 +1,18 @@
  # coding: utf-8
-from __future__ import unicode_literals, division
+from __future__ import unicode_literals
  
  import re
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_chr,
-    compat_ord,
-)
+from ..compat import compat_chr
  from ..utils import (
      determine_ext,
      ExtractorError,
  )
-from ..jsinterp import (
-    JSInterpreter,
-    _NAME_RE
-)
  
  
  class OpenloadIE(InfoExtractor):
-    _VALID_URL = r'https?://openload\.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
+    _VALID_URL = r'https?://(?:openload\.(?:co|io)|oload\.tv)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
  
      _TESTS = [{
          'url': 'https://openload.co/f/kUEfGclsU9o',
@@ -28,7 +21,7 @@ class OpenloadIE(InfoExtractor):
              'id': 'kUEfGclsU9o',
              'ext': 'mp4',
              'title': 'skyrim_no-audio_1080.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'https://openload.co/embed/rjC09fkPLYs',
@@ -36,7 +29,7 @@ class OpenloadIE(InfoExtractor):
              'id': 'rjC09fkPLYs',
              'ext': 'mp4',
              'title': 'movie.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'subtitles': {
                  'en': [{
                      'ext': 'vtt',
@@ -60,45 +53,16 @@ class OpenloadIE(InfoExtractor):
          # for title and ext
          'url': 'https://openload.co/embed/Sxz5sADo82g/',
          'only_matching': True,
+    }, {
+        'url': 'https://oload.tv/embed/KnG-kKZdcfY/',
+        'only_matching': True,
      }]
  
-    def openload_decode(self, txt):
-        symbol_dict = {
-            '(ﾟДﾟ) [ﾟΘﾟ]': '_',
-            '(ﾟДﾟ) [ﾟωﾟﾉ]': 'a',
-            '(ﾟДﾟ) [ﾟΘﾟﾉ]': 'b',
-            '(ﾟДﾟ) [\'c\']': 'c',
-            '(ﾟДﾟ) [ﾟｰﾟﾉ]': 'd',
-            '(ﾟДﾟ) [ﾟДﾟﾉ]': 'e',
-            '(ﾟДﾟ) [1]': 'f',
-            '(ﾟДﾟ) [\'o\']': 'o',
-            '(oﾟｰﾟo)': 'u',
-            '(ﾟДﾟ) [\'c\']': 'c',
-            '((ﾟｰﾟ) + (o^_^o))': '7',
-            '((o^_^o) +(o^_^o) +(c^_^o))': '6',
-            '((ﾟｰﾟ) + (ﾟΘﾟ))': '5',
-            '(-~3)': '4',
-            '(-~-~1)': '3',
-            '(-~1)': '2',
-            '(-~0)': '1',
-            '((c^_^o)-(c^_^o))': '0',
-        }
-        delim = '(ﾟДﾟ)[ﾟεﾟ]+'
-        end_token = '(ﾟДﾟ)[ﾟoﾟ]'
-        symbols = '|'.join(map(re.escape, symbol_dict.keys()))
-        txt = re.sub('(%s)\+\s?' % symbols, lambda m: symbol_dict[m.group(1)], txt)
-        ret = ''
-        for aacode in re.findall(r'{0}\+\s?{1}(.*?){0}'.format(re.escape(end_token), re.escape(delim)), txt):
-            for aachar in aacode.split(delim):
-                if aachar.isdigit():
-                    ret += compat_chr(int(aachar, 8))
-                else:
-                    m = re.match(r'^u([\da-f]{4})$', aachar)
-                    if m:
-                        ret += compat_chr(int(m.group(1), 16))
-                    else:
-                        self.report_warning("Cannot decode: %s" % aachar)
-        return ret
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+src=["\']((?:https?://)?(?:openload\.(?:co|io)|oload\.tv)/embed/[a-zA-Z0-9-_]+)',
+            webpage)
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -107,36 +71,21 @@ class OpenloadIE(InfoExtractor):
          if 'File not found' in webpage or 'deleted by the owner' in webpage:
              raise ExtractorError('File not found', expected=True)
  
-        # The following decryption algorithm is written by @yokrysty and
-        # declared to be freely used in youtube-dl
-        # See https://github.com/rg3/youtube-dl/issues/10408
-        enc_data = self._html_search_regex(
-            r'<span[^>]*>([^<]+)</span>\s*<span[^>]*>[^<]+</span>\s*<span[^>]+id="streamurl"',
-            webpage, 'encrypted data')
+        ol_id = self._search_regex(
+            '<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
+            webpage, 'openload ID')
  
-        enc_code = self._html_search_regex(r'<script[^>]+>(ﾟωﾟ[^<]+)</script>',
-                                           webpage, 'encrypted code')
+        first_three_chars = int(float(ol_id[0:][:3]))
+        fifth_char = int(float(ol_id[3:5]))
+        urlcode = ''
+        num = 5
  
-        js_code = self.openload_decode(enc_code)
-        jsi = JSInterpreter(js_code)
+        while num < len(ol_id):
+            urlcode += compat_chr(int(float(ol_id[num:][:3])) +
+                                  first_three_chars - fifth_char * int(float(ol_id[num + 3:][:2])))
+            num += 5
  
-        m_offset_fun = self._search_regex(r'slice\(0\s*-\s*(%s)\(\)' % _NAME_RE, js_code, 'javascript offset function')
-        m_diff_fun = self._search_regex(r'charCodeAt\(0\)\s*\+\s*(%s)\(\)' % _NAME_RE, js_code, 'javascript diff function')
-
-        offset = jsi.call_function(m_offset_fun)
-        diff = jsi.call_function(m_diff_fun)
-
-        video_url_chars = []
-
-        for idx, c in enumerate(enc_data):
-            j = compat_ord(c)
-            if j >= 33 and j <= 126:
-                j = ((j + 14) % 94) + 33
-            if idx == len(enc_data) - offset:
-                j += diff
-            video_url_chars += compat_chr(j)
-
-        video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
+        video_url = 'https://openload.co/stream/' + urlcode
  
          title = self._og_search_title(webpage, default=None) or self._search_regex(
              r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
@@ -152,8 +101,7 @@ class OpenloadIE(InfoExtractor):
              'thumbnail': self._og_search_thumbnail(webpage, default=None),
              'url': video_url,
              # Seems all videos have extensions in their titles
-            'ext': determine_ext(title),
+            'ext': determine_ext(title, 'mp4'),
              'subtitles': subtitles,
          }
-
          return info_dict
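Reviewer note on the new `OpenloadIE._real_extract` logic in the openload.py hunks above: a standalone sketch of the digit-string decoding, written from the arithmetic in the diff. The layout it assumes is inferred from that code: the first three digits and the next two digits act as keys, and every following five-digit group holds a three-digit value plus a two-digit multiplier. `decode_openload_id` and `encode_for_demo` are hypothetical names; the encoder exists only to produce a round-trip input and says nothing about how the site generates its IDs. The built-in `chr` is used here in place of `compat_chr`.

```python
def decode_openload_id(ol_id):
    """Decode a digit string using the arithmetic shown in the diff."""
    key_a = int(ol_id[0:3])   # 'first_three_chars' in the diff
    key_b = int(ol_id[3:5])   # 'fifth_char' in the diff
    chars = []
    for pos in range(5, len(ol_id), 5):
        value = int(ol_id[pos:pos + 3])
        factor = int(ol_id[pos + 3:pos + 5])
        chars.append(chr(value + key_a - key_b * factor))
    return ''.join(chars)


def encode_for_demo(text, key_a=100, key_b=10, factor=5):
    """Hypothetical inverse, only for building test input.

    With the default keys it works for characters whose code point is at
    least 50, so that every value fits in three non-negative digits.
    """
    groups = ['%03d%02d' % (key_a, key_b)]
    for ch in text:
        value = ord(ch) - key_a + key_b * factor
        groups.append('%03d%02d' % (value, factor))
    return ''.join(groups)
```

As in the diff, a trailing group shorter than five digits is not handled specially; an empty slice would make `int` raise `ValueError`.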
diff --git a/youtube_dl/extractor/orf.py b/youtube_dl/extractor/orf.py
index b4cce7ea9334c7bbaf9e617932189504dcd25121..1e2c54e68c3eb16b0ee8e9afb7b50b07ae429207 100644
--- a/youtube_dl/extractor/orf.py
+++ b/youtube_dl/extractor/orf.py
@@ -247,7 +247,7 @@ class ORFIPTVIE(InfoExtractor):
              'title': 'Weitere Evakuierungen um Vulkan Calbuco',
              'description': 'md5:d689c959bdbcf04efeddedbf2299d633',
              'duration': 68.197,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20150425',
          },
      }
diff --git a/youtube_dl/extractor/pandoratv.py b/youtube_dl/extractor/pandoratv.py
index 2b07958bb1f5815a162dadadff4f450f7ea0e97d..89c95fffb6e6ca7279d927424c19be9325e2ac4d 100644
--- a/youtube_dl/extractor/pandoratv.py
+++ b/youtube_dl/extractor/pandoratv.py
@@ -11,6 +11,7 @@ from ..utils import (
      float_or_none,
      parse_duration,
      str_to_int,
+    urlencode_postdata,
  )
  
  
@@ -25,7 +26,7 @@ class PandoraTVIE(InfoExtractor):
              'ext': 'flv',
              'title': '頭を撫でてくれる？',
              'description': '頭を撫でてくれる？',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 39,
              'upload_date': '20151218',
              'uploader': 'カワイイ動物まとめ',
@@ -56,6 +57,22 @@ class PandoraTVIE(InfoExtractor):
                  r'^v(\d+)[Uu]rl$', format_id, 'height', default=None)
              if not height:
                  continue
+
+            play_url = self._download_json(
+                'http://m.pandora.tv/?c=api&m=play_url', video_id,
+                data=urlencode_postdata({
+                    'prgid': video_id,
+                    'runtime': info.get('runtime'),
+                    'vod_url': format_url,
+                }),
+                headers={
+                    'Origin': url,
+                    'Content-Type': 'application/x-www-form-urlencoded',
+                })
+            format_url = play_url.get('url')
+            if not format_url:
+                continue
+
              formats.append({
                  'format_id': '%sp' % height,
                  'url': format_url,
diff --git a/youtube_dl/extractor/pbs.py b/youtube_dl/extractor/pbs.py
index b490ef74c5fb768751d4598ff88e70a13d41c060..6baed773fc6bf741a69f1baf222148065ef169c4 100644
--- a/youtube_dl/extractor/pbs.py
+++ b/youtube_dl/extractor/pbs.py
@@ -236,7 +236,7 @@ class PBSIE(InfoExtractor):
                  'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
                  'description': 'md5:657897370e09e2bc6bf0f8d2cd313c6b',
                  'duration': 6559,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -249,7 +249,7 @@ class PBSIE(InfoExtractor):
                  'description': 'md5:c741d14e979fc53228c575894094f157',
                  'title': 'NOVA - Killer Typhoon',
                  'duration': 3172,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'upload_date': '20140122',
                  'age_limit': 10,
              },
@@ -270,7 +270,7 @@ class PBSIE(InfoExtractor):
                  'title': 'American Experience - Death and the Civil War, Chapter 1',
                  'description': 'md5:67fa89a9402e2ee7d08f53b920674c18',
                  'duration': 682,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  'skip_download': True,  # requires ffmpeg
@@ -286,7 +286,7 @@ class PBSIE(InfoExtractor):
                  'title': 'FRONTLINE - United States of Secrets (Part One)',
                  'description': 'md5:55756bd5c551519cc4b7703e373e217e',
                  'duration': 6851,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -302,7 +302,7 @@ class PBSIE(InfoExtractor):
                  'title': "A Chef's Life - Season 3, Ep. 5: Prickly Business",
                  'description': 'md5:c0ff7475a4b70261c7e58f493c2792a5',
                  'duration': 1480,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -315,7 +315,7 @@ class PBSIE(InfoExtractor):
                  'title': 'FRONTLINE - The Atomic Artists',
                  'description': 'md5:f677e4520cfacb4a5ce1471e31b57800',
                  'duration': 723,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  'skip_download': True,  # requires ffmpeg
@@ -330,7 +330,7 @@ class PBSIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'FRONTLINE - Netanyahu at War',
                  'duration': 6852,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'formats': 'mincount:8',
              },
          },
@@ -350,6 +350,15 @@ class PBSIE(InfoExtractor):
          410: 'This video has expired and is no longer available for online streaming.',
      }
  
+    def _real_initialize(self):
+        cookie = (self._download_json(
+            'http://localization.services.pbs.org/localize/auto/cookie/',
+            None, headers=self.geo_verification_headers(), fatal=False) or {}).get('cookie')
+        if cookie:
+            station = self._search_regex(r'#?s=\["([^"]+)"', cookie, 'station')
+            if station:
+                self._set_cookie('.pbs.org', 'pbsol.station', station)
+
      def _extract_webpage(self, url):
          mobj = re.match(self._VALID_URL, url)
  
@@ -476,7 +485,8 @@ class PBSIE(InfoExtractor):
  
              redirect_info = self._download_json(
                  '%s?format=json' % redirect['url'], display_id,
-                'Downloading %s video url info' % (redirect_id or num))
+                'Downloading %s video url info' % (redirect_id or num),
+                headers=self.geo_verification_headers())
  
              if redirect_info['status'] == 'error':
                  raise ExtractorError(
@@ -558,7 +568,7 @@ class PBSIE(InfoExtractor):
          # Try turning it to 'program - title' naming scheme if possible
          alt_title = info.get('program', {}).get('title')
          if alt_title:
-            info['title'] = alt_title + ' - ' + re.sub(r'^' + alt_title + '[\s\-:]+', '', info['title'])
+            info['title'] = alt_title + ' - ' + re.sub(r'^' + alt_title + r'[\s\-:]+', '', info['title'])
  
          description = info.get('description') or info.get(
              'program', {}).get('description') or description
diff --git a/youtube_dl/extractor/people.py b/youtube_dl/extractor/people.py
index 9ecdbc13b7535765222b422c815e0dc78f2f69b9..6ca95715eec0a2ac74bd5529d65f9a1e6347211f 100644
--- a/youtube_dl/extractor/people.py
+++ b/youtube_dl/extractor/people.py
@@ -14,7 +14,7 @@ class PeopleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Astronaut Love Triangle Victim Speaks Out: “The Crime in 2007 Hasn’t Defined Us”',
              'description': 'Colleen Shipman speaks to PEOPLE for the first time about life after the attack',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 246.318,
              'timestamp': 1458720585,
              'upload_date': '20160323',
diff --git a/youtube_dl/extractor/phoenix.py b/youtube_dl/extractor/phoenix.py
index ac009f60f7785ea4efaaa7b0c867c10a998e877e..e435c28e171b9a25d3cc839b7b67a6c0d27e2272 100644
--- a/youtube_dl/extractor/phoenix.py
+++ b/youtube_dl/extractor/phoenix.py
@@ -1,9 +1,9 @@
  from __future__ import unicode_literals
  
-from .zdf import ZDFIE
+from .dreisat import DreiSatIE
  
  
-class PhoenixIE(ZDFIE):
+class PhoenixIE(DreiSatIE):
      IE_NAME = 'phoenix.de'
      _VALID_URL = r'''(?x)https?://(?:www\.)?phoenix\.de/content/
          (?:
diff --git a/youtube_dl/extractor/piksel.py b/youtube_dl/extractor/piksel.py
new file mode 100644
index 0000000..c0c276a
--- /dev/null
+++ b/youtube_dl/extractor/piksel.py
@@ -0,0 +1,123 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    dict_get,
+    int_or_none,
+    unescapeHTML,
+    parse_iso8601,
+)
+
+
+class PikselIE(InfoExtractor):
+    _VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)'
+    _TESTS = [
+        {
+            'url': 'http://player.piksel.com/v/nv60p12f',
+            'md5': 'd9c17bbe9c3386344f9cfd32fad8d235',
+            'info_dict': {
+                'id': 'nv60p12f',
+                'ext': 'mp4',
+                'title': 'فن الحياة  - الحلقة 1',
+                'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور',
+                'timestamp': 1465231790,
+                'upload_date': '20160606',
+            }
+        },
+        {
+            # Original source: http://www.uscourts.gov/cameras-courts/state-washington-vs-donald-j-trump-et-al
+            'url': 'https://player.piksel.com/v/v80kqp41',
+            'md5': '753ddcd8cc8e4fa2dda4b7be0e77744d',
+            'info_dict': {
+                'id': 'v80kqp41',
+                'ext': 'mp4',
+                'title': 'WAW- State of Washington vs. Donald J. Trump, et al',
+                'description': 'State of Washington vs. Donald J. Trump, et al, Case Number 17-CV-00141-JLR, TRO Hearing, Civil Rights Case, 02/3/2017, 1:00 PM (PST), Seattle Federal Courthouse, Seattle, WA, Judge James L. Robart presiding.',
+                'timestamp': 1486171129,
+                'upload_date': '20170204',
+            }
+        }
+    ]
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=["\'](?P<url>(?:https?:)?//player\.piksel\.com/v/[a-z0-9]+)',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        app_token = self._search_regex([
+            r'clientAPI\s*:\s*"([^"]+)"',
+            r'data-de-api-key\s*=\s*"([^"]+)"'
+        ], webpage, 'app token')
+        response = self._download_json(
+            'http://player.piksel.com/ws/ws_program/api/%s/mode/json/apiv/5' % app_token,
+            video_id, query={
+                'v': video_id
+            })['response']
+        failure = response.get('failure')
+        if failure:
+            raise ExtractorError(response['failure']['reason'], expected=True)
+        video_data = response['WsProgramResponse']['program']['asset']
+        title = video_data['title']
+
+        formats = []
+
+        m3u8_url = dict_get(video_data, [
+            'm3u8iPadURL',
+            'ipadM3u8Url',
+            'm3u8AndroidURL',
+            'm3u8iPhoneURL',
+            'iphoneM3u8Url'])
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        asset_type = dict_get(video_data, ['assetType', 'asset_type'])
+        for asset_file in video_data.get('assetFiles', []):
+            # TODO: extract rtmp formats
+            http_url = asset_file.get('http_url')
+            if not http_url:
+                continue
+            tbr = None
+            vbr = int_or_none(asset_file.get('videoBitrate'), 1024)
+            abr = int_or_none(asset_file.get('audioBitrate'), 1024)
+            if asset_type == 'video':
+                tbr = vbr + abr
+            elif asset_type == 'audio':
+                tbr = abr
+
+            format_id = ['http']
+            if tbr:
+                format_id.append(compat_str(tbr))
+
+            formats.append({
+                'format_id': '-'.join(format_id),
+                'url': unescapeHTML(http_url),
+                'vbr': vbr,
+                'abr': abr,
+                'width': int_or_none(asset_file.get('videoWidth')),
+                'height': int_or_none(asset_file.get('videoHeight')),
+                'filesize': int_or_none(asset_file.get('filesize')),
+                'tbr': tbr,
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'thumbnail': video_data.get('thumbnailUrl'),
+            'timestamp': parse_iso8601(video_data.get('dateadd')),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/pinkbike.py b/youtube_dl/extractor/pinkbike.py
index a52210fabf538bff34cc39029f07778755873b65..6a4580d54c8733166316c80a2361499e85eb9baf 100644
--- a/youtube_dl/extractor/pinkbike.py
+++ b/youtube_dl/extractor/pinkbike.py
@@ -23,7 +23,7 @@ class PinkbikeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Brandon Semenuk - RAW 100',
              'description': 'Official release: www.redbull.ca/rupertwalker',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 100,
              'upload_date': '20150406',
              'uploader': 'revelco',
diff --git a/youtube_dl/extractor/pladform.py b/youtube_dl/extractor/pladform.py
index 77e1211d6095cf17464ce09a27b756157b4931e9..e38c7618e4d29177721f21a36479b7cbd3d0cf28 100644
--- a/youtube_dl/extractor/pladform.py
+++ b/youtube_dl/extractor/pladform.py
@@ -34,7 +34,7 @@ class PladformIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Тайны перевала Дятлова • 1 серия 2 часть',
              'description': 'Документальный сериал-расследование одной из самых жутких тайн ХХ века',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 694,
              'age_limit': 0,
          },
diff --git a/youtube_dl/extractor/playtvak.py b/youtube_dl/extractor/playtvak.py
index 1e8096a259ad5568d87b96bd566f646ae641862f..391e1bd09ca5677d196c0f67a86c0cb1421b2158 100644
--- a/youtube_dl/extractor/playtvak.py
+++ b/youtube_dl/extractor/playtvak.py
@@ -25,7 +25,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Vyžeňte vosy a sršně ze zahrady',
              'description': 'md5:f93d398691044d303bc4a3de62f3e976',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'duration': 279,
              'timestamp': 1438732860,
              'upload_date': '20150805',
@@ -38,7 +38,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'flv',
              'title': 're:^Přímý přenos iDNES.cz [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
              'description': 'Sledujte provoz na ranveji Letiště Václava Havla v Praze',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'is_live': True,
          },
          'params': {
@@ -52,7 +52,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Zavřeli jsme mraženou pizzu do auta. Upekla se',
              'description': 'md5:01e73f02329e2e5760bd5eed4d42e3c2',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'duration': 39,
              'timestamp': 1438969140,
              'upload_date': '20150807',
@@ -66,7 +66,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Táhni! Demonstrace proti imigrantům budila emoce',
              'description': 'md5:97c81d589a9491fbfa323c9fa3cca72c',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'timestamp': 1439052180,
              'upload_date': '20150808',
              'is_live': False,
@@ -79,7 +79,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Recesisté udělali z billboardu kolotoč',
              'description': 'md5:7369926049588c3989a66c9c1a043c4c',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'timestamp': 1415725500,
              'upload_date': '20141111',
              'is_live': False,
diff --git a/youtube_dl/extractor/playvid.py b/youtube_dl/extractor/playvid.py
index 79c2db08541e93d1d377c53c3e8adc415f4302e2..4aef186ea22b4dab1be50a0bdd6dbcbbcae1e2b1 100644
--- a/youtube_dl/extractor/playvid.py
+++ b/youtube_dl/extractor/playvid.py
@@ -34,7 +34,7 @@ class PlayvidIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Ellen Euro Cutie Blond Takes a Sexy Survey Get Facial in The Park',
              'age_limit': 18,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }]
  
diff --git a/youtube_dl/extractor/playwire.py b/youtube_dl/extractor/playwire.py
index 0bc7431189a0eed819fb85a6fbbdc1558a4b84ed..4d96a10a7156225140eee7b1332a282316efb928 100644
--- a/youtube_dl/extractor/playwire.py
+++ b/youtube_dl/extractor/playwire.py
@@ -18,7 +18,7 @@ class PlaywireIE(InfoExtractor):
              'id': '3353705',
              'ext': 'mp4',
              'title': 'S04_RM_UCL_Rus',
-            'thumbnail': 're:^https?://.*\.png$',
+            'thumbnail': r're:^https?://.*\.png$',
              'duration': 145.94,
          },
      }, {
diff --git a/youtube_dl/extractor/pluralsight.py b/youtube_dl/extractor/pluralsight.py
index 0ffd41ecd3b73bdaaba3b27cd1638cdf0383103e..5c798e874837ff1704650fa991d5b07cde8ab210 100644
--- a/youtube_dl/extractor/pluralsight.py
+++ b/youtube_dl/extractor/pluralsight.py
@@ -157,13 +157,10 @@ class PluralsightIE(PluralsightBaseIE):
  
          display_id = '%s-%s' % (name, clip_id)
  
-        parsed_url = compat_urlparse.urlparse(url)
-
-        payload_url = compat_urlparse.urlunparse(parsed_url._replace(
-            netloc='app.pluralsight.com', path='player/api/v1/payload'))
-
          course = self._download_json(
-            payload_url, display_id, headers={'Referer': url})['payload']['course']
+            'https://app.pluralsight.com/player/user/api/v1/player/payload',
+            display_id, data=urlencode_postdata({'courseId': course_name}),
+            headers={'Referer': url})
  
          collection = course['modules']
  
diff --git a/youtube_dl/extractor/polskieradio.py b/youtube_dl/extractor/polskieradio.py
index 5ff173774a410bf0eba85069f6f1ed33cd583e7b..2ac1fcb0bc90f500696ca0dc29db9f4c911f0d27 100644
--- a/youtube_dl/extractor/polskieradio.py
+++ b/youtube_dl/extractor/polskieradio.py
@@ -36,7 +36,7 @@ class PolskieRadioIE(InfoExtractor):
                  'timestamp': 1456594200,
                  'upload_date': '20160227',
                  'duration': 2364,
-                'thumbnail': 're:^https?://static\.prsa\.pl/images/.*\.jpg$'
+                'thumbnail': r're:^https?://static\.prsa\.pl/images/.*\.jpg$'
              },
          }],
      }, {
diff --git a/youtube_dl/extractor/porncom.py b/youtube_dl/extractor/porncom.py
index d85e0294df62d7540304f2a8e87c4f989fcc2e07..8218c7d3bf7ddc8cc7de74f2fc5d2d838cecc982 100644
--- a/youtube_dl/extractor/porncom.py
+++ b/youtube_dl/extractor/porncom.py
@@ -22,7 +22,7 @@ class PornComIE(InfoExtractor):
              'display_id': 'teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec',
              'ext': 'mp4',
              'title': 'Teen grabs a dildo and fucks her pussy live on 1hottie, I rec',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 551,
              'view_count': int,
              'age_limit': 18,
diff --git a/youtube_dl/extractor/pornflip.py b/youtube_dl/extractor/pornflip.py
new file mode 100644
index 0000000..a4a5d39
--- /dev/null
+++ b/youtube_dl/extractor/pornflip.py
@@ -0,0 +1,92 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_parse_qs,
+    compat_str,
+)
+from ..utils import (
+    int_or_none,
+    try_get,
+    unified_timestamp,
+)
+
+
+class PornFlipIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?pornflip\.com/(?:v|embed)/(?P<id>[0-9A-Za-z]{11})'
+    _TESTS = [{
+        'url': 'https://www.pornflip.com/v/wz7DfNhMmep',
+        'md5': '98c46639849145ae1fd77af532a9278c',
+        'info_dict': {
+            'id': 'wz7DfNhMmep',
+            'ext': 'mp4',
+            'title': '2 Amateurs swallow make his dream cumshots true',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'duration': 112,
+            'timestamp': 1481655502,
+            'upload_date': '20161213',
+            'uploader_id': '106786',
+            'uploader': 'figifoto',
+            'view_count': int,
+            'age_limit': 18,
+        }
+    }, {
+        'url': 'https://www.pornflip.com/embed/wz7DfNhMmep',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'https://www.pornflip.com/v/%s' % video_id, video_id)
+
+        flashvars = compat_parse_qs(self._search_regex(
+            r'<embed[^>]+flashvars=(["\'])(?P<flashvars>(?:(?!\1).)+)\1',
+            webpage, 'flashvars', group='flashvars'))
+
+        title = flashvars['video_vars[title]'][0]
+
+        def flashvar(kind):
+            return try_get(
+                flashvars, lambda x: x['video_vars[%s]' % kind][0], compat_str)
+
+        formats = []
+        for key, value in flashvars.items():
+            if not (value and isinstance(value, list)):
+                continue
+            format_url = value[0]
+            if key == 'video_vars[hds_manifest]':
+                formats.extend(self._extract_mpd_formats(
+                    format_url, video_id, mpd_id='dash', fatal=False))
+                continue
+            height = self._search_regex(
+                r'video_vars\[video_urls\]\[(\d+)', key, 'height', default=None)
+            if not height:
+                continue
+            formats.append({
+                'url': format_url,
+                'format_id': 'http-%s' % height,
+                'height': int_or_none(height),
+            })
+        self._sort_formats(formats)
+
+        uploader = self._html_search_regex(
+            (r'<span[^>]+class="name"[^>]*>\s*<a[^>]+>\s*<strong>(?P<uploader>[^<]+)',
+             r'<meta[^>]+content=(["\'])[^>]*\buploaded by (?P<uploader>.+?)\1'),
+            webpage, 'uploader', fatal=False, group='uploader')
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'title': title,
+            'thumbnail': flashvar('big_thumb'),
+            'duration': int_or_none(flashvar('duration')),
+            'timestamp': unified_timestamp(self._html_search_meta(
+                'uploadDate', webpage, 'timestamp')),
+            'uploader_id': flashvar('author_id'),
+            'uploader': uploader,
+            'view_count': int_or_none(flashvar('views')),
+            'age_limit': 18,
+        }
diff --git a/youtube_dl/extractor/pornhd.py b/youtube_dl/extractor/pornhd.py
index 8df12eec0d44c371d99b536b55694cfd2211f9d0..842317e6c9cc2312064fae4e61e5703352aa0096 100644
--- a/youtube_dl/extractor/pornhd.py
+++ b/youtube_dl/extractor/pornhd.py
@@ -21,7 +21,7 @@ class PornHdIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Restroom selfie masturbation',
              'description': 'md5:3748420395e03e31ac96857a8f125b2b',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'view_count': int,
              'age_limit': 18,
          }
@@ -35,7 +35,7 @@ class PornHdIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Sierra loves doing laundry',
              'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'view_count': int,
              'age_limit': 18,
          },
diff --git a/youtube_dl/extractor/pornhub.py b/youtube_dl/extractor/pornhub.py
index 40dbe6967fac2126b7bf6e6a1245768b3c039c8e..017f6c55219ff3db0cf9bf745f74031882f12c54 100644
--- a/youtube_dl/extractor/pornhub.py
+++ b/youtube_dl/extractor/pornhub.py
@@ -156,7 +156,12 @@ class PornHubIE(InfoExtractor):
          comment_count = self._extract_count(
              r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
  
-        video_urls = list(map(compat_urllib_parse_unquote, re.findall(r"player_quality_[0-9]{3}p\s*=\s*'([^']+)'", webpage)))
+        video_urls = []
+        for quote, video_url in re.findall(
+                r'player_quality_[0-9]{3,4}p\s*=\s*(["\'])(.+?)\1;', webpage):
+            video_urls.append(compat_urllib_parse_unquote(re.sub(
+                r'{0}\s*\+\s*{0}'.format(quote), '', video_url)))
+
          if webpage.find('"encrypted":true') != -1:
              password = compat_urllib_parse_unquote_plus(
                  self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
@@ -229,7 +234,14 @@ class PornHubPlaylistBaseIE(InfoExtractor):
  
          webpage = self._download_webpage(url, playlist_id)
  
-        entries = self._extract_entries(webpage)
+        # Only process container div with main playlist content skipping
+        # drop-down menu that uses similar pattern for videos (see
+        # https://github.com/rg3/youtube-dl/issues/11594).
+        container = self._search_regex(
+            r'(?s)(<div[^>]+class=["\']container.+)', webpage,
+            'container', default=webpage)
+
+        entries = self._extract_entries(container)
  
          playlist = self._parse_json(
              self._search_regex(
@@ -243,12 +255,12 @@ class PornHubPlaylistBaseIE(InfoExtractor):
  class PornHubPlaylistIE(PornHubPlaylistBaseIE):
      _VALID_URL = r'https?://(?:www\.)?pornhub\.com/playlist/(?P<id>\d+)'
      _TESTS = [{
-        'url': 'http://www.pornhub.com/playlist/6201671',
+        'url': 'http://www.pornhub.com/playlist/4667351',
          'info_dict': {
-            'id': '6201671',
-            'title': 'P0p4',
+            'id': '4667351',
+            'title': 'Nataly Hot',
          },
-        'playlist_mincount': 35,
+        'playlist_mincount': 2,
      }]
  
  
diff --git a/youtube_dl/extractor/pornotube.py b/youtube_dl/extractor/pornotube.py
index 63816c3588cebe889e77a24a928cc789ef07c7d5..1b5b9a320dcd31a0f28ad6ed8a20555008072d88 100644
--- a/youtube_dl/extractor/pornotube.py
+++ b/youtube_dl/extractor/pornotube.py
@@ -19,7 +19,7 @@ class PornotubeIE(InfoExtractor):
              'description': 'md5:a8304bef7ef06cb4ab476ca6029b01b0',
              'categories': ['Adult Humor', 'Blondes'],
              'uploader': 'Alpha Blue Archives',
-            'thumbnail': 're:^https?://.*\\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1417582800,
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/pornovoisines.py b/youtube_dl/extractor/pornovoisines.py
index 58f557e3995f25a3787018150c953cb088e4fe81..b6b71069d31070644afd46e2d92a939e4b33f744 100644
--- a/youtube_dl/extractor/pornovoisines.py
+++ b/youtube_dl/extractor/pornovoisines.py
@@ -23,7 +23,7 @@ class PornoVoisinesIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Recherche appartement',
              'description': 'md5:fe10cb92ae2dd3ed94bb4080d11ff493',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140925',
              'duration': 120,
              'view_count': int,
diff --git a/youtube_dl/extractor/pornoxo.py b/youtube_dl/extractor/pornoxo.py
index 3c9087f2dfe3caa30c879f4905e857a046fd789c..1a0cce7e0274bb4a06bf9b0604d9ebdf75cf3df5 100644
--- a/youtube_dl/extractor/pornoxo.py
+++ b/youtube_dl/extractor/pornoxo.py
@@ -20,7 +20,7 @@ class PornoXOIE(JWPlatformBaseIE):
              'display_id': 'striptease-from-sexy-secretary',
              'description': 'md5:0ee35252b685b3883f4a1d38332f9980',
              'categories': list,  # NSFW
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          }
      }
diff --git a/youtube_dl/extractor/presstv.py b/youtube_dl/extractor/presstv.py
index 2da93ed348671a363120d03cb626a4e9d808fd9d..b5c279203b9486e765f6e79f2d9ce8b67acf73b1 100644
--- a/youtube_dl/extractor/presstv.py
+++ b/youtube_dl/extractor/presstv.py
@@ -19,7 +19,7 @@ class PressTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Organic mattresses used to clean waste water',
              'upload_date': '20160409',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'description': 'md5:20002e654bbafb6908395a5c0cfcd125'
          }
      }
diff --git a/youtube_dl/extractor/promptfile.py b/youtube_dl/extractor/promptfile.py
index d40cca06f989b7c99329e1650497a06e9a6390e4..23ac93d7e248bce034fcb221d26089d8be412ee2 100644
--- a/youtube_dl/extractor/promptfile.py
+++ b/youtube_dl/extractor/promptfile.py
@@ -20,7 +20,7 @@ class PromptFileIE(InfoExtractor):
              'id': '86D1CE8462-576CAAE416',
              'ext': 'mp4',
              'title': 'oceans.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/prosiebensat1.py b/youtube_dl/extractor/prosiebensat1.py
index 7cc07a2ad5b88c51aa9f5d339839fd743727e17e..5091d8456faf3a4841ba770ea408aa75be806f83 100644
--- a/youtube_dl/extractor/prosiebensat1.py
+++ b/youtube_dl/extractor/prosiebensat1.py
@@ -85,6 +85,9 @@ class ProSiebenSat1BaseIE(InfoExtractor):
                      formats.extend(self._extract_m3u8_formats(
                          source_url, clip_id, 'mp4', 'm3u8_native',
                          m3u8_id='hls', fatal=False))
+                elif mimetype == 'application/dash+xml':
+                    formats.extend(self._extract_mpd_formats(
+                        source_url, clip_id, mpd_id='dash', fatal=False))
                  else:
                      tbr = fix_bitrate(source['bitrate'])
                      if protocol in ('rtmp', 'rtmpe'):
@@ -144,16 +147,12 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
              'url': 'http://www.prosieben.de/tv/circus-halligalli/videos/218-staffel-2-episode-18-jahresrueckblick-ganze-folge',
              'info_dict': {
                  'id': '2104602',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Episode 18 - Staffel 2',
                  'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
                  'upload_date': '20131231',
                  'duration': 5845.04,
              },
-            'params': {
-                # rtmp download
-                'skip_download': True,
-            },
          },
          {
              'url': 'http://www.prosieben.de/videokatalog/Gesellschaft/Leben/Trends/video-Lady-Umstyling-f%C3%BCr-Audrina-Rebekka-Audrina-Fergen-billig-aussehen-Battal-Modica-700544.html',
@@ -255,7 +254,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
              'url': 'http://www.the-voice-of-germany.de/video/31-andreas-kuemmert-rocket-man-clip',
              'info_dict': {
                  'id': '2572814',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Andreas Kümmert: Rocket Man',
                  'description': 'md5:6ddb02b0781c6adf778afea606652e38',
                  'upload_date': '20131017',
@@ -269,7 +268,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
              'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
              'info_dict': {
                  'id': '2156342',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Kurztrips zum Valentinstag',
                  'description': 'Romantischer Kurztrip zum Valentinstag? Nina Heinemann verrät, was sich hier wirklich lohnt.',
                  'duration': 307.24,
@@ -286,12 +285,13 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
                  'description': 'md5:63b8963e71f481782aeea877658dec84',
              },
              'playlist_count': 2,
+            'skip': 'This video is unavailable',
          },
          {
              'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
              'info_dict': {
                  'id': '4187506',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Best of Circus HalliGalli',
                  'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
                  'upload_date': '20151229',
@@ -372,7 +372,9 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
          title = self._html_search_regex(self._TITLE_REGEXES, webpage, 'title')
          info = self._extract_video_info(url, clip_id)
          description = self._html_search_regex(
-            self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
+            self._DESCRIPTION_REGEXES, webpage, 'description', default=None)
+        if description is None:
+            description = self._og_search_description(webpage)
          thumbnail = self._og_search_thumbnail(webpage)
          upload_date = unified_strdate(self._html_search_regex(
              self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
@@ -391,7 +393,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
              self._PLAYLIST_ID_REGEXES, webpage, 'playlist id')
          playlist = self._parse_json(
              self._search_regex(
-                'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
+                r'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
                  webpage, 'playlist'),
              playlist_id)
          entries = []
diff --git a/youtube_dl/extractor/qqmusic.py b/youtube_dl/extractor/qqmusic.py
index 37cb9e2c9dded7c9fa6e1e9eeef4ebeccdf9b4a9..17c27da46da7576205afba3c53254728f711d974 100644
--- a/youtube_dl/extractor/qqmusic.py
+++ b/youtube_dl/extractor/qqmusic.py
@@ -29,7 +29,7 @@ class QQMusicIE(InfoExtractor):
              'release_date': '20141227',
              'creator': '林俊杰',
              'description': 'md5:d327722d0361576fde558f1ac68a7065',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'note': 'There is no mp3-320 version of this song.',
@@ -42,7 +42,7 @@ class QQMusicIE(InfoExtractor):
              'release_date': '20050626',
              'creator': '李季美',
              'description': 'md5:46857d5ed62bc4ba84607a805dccf437',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'note': 'lyrics not in .lrc format',
@@ -54,7 +54,7 @@ class QQMusicIE(InfoExtractor):
              'release_date': '19970225',
              'creator': 'Dark Funeral',
              'description': 'md5:ed14d5bd7ecec19609108052c25b2c11',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              'skip_download': True,
diff --git a/youtube_dl/extractor/r7.py b/youtube_dl/extractor/r7.py
index 069dbfaed0638e396d024ec81d5142d18f9ad90f..ed38c77ebb6bdeaacabff4b565fe121ee86d07fb 100644
--- a/youtube_dl/extractor/r7.py
+++ b/youtube_dl/extractor/r7.py
@@ -23,7 +23,7 @@ class R7IE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Policiais humilham suspeito à beira da morte: "Morre com dignidade"',
              'description': 'md5:01812008664be76a6479aa58ec865b72',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 98,
              'like_count': int,
              'view_count': int,
diff --git a/youtube_dl/extractor/radiobremen.py b/youtube_dl/extractor/radiobremen.py
index 0aa8d059bf81dffd28df727650b20aafc49302eb..2c35f9845177b6bda4dd352f207ee2e36efdcb8d 100644
--- a/youtube_dl/extractor/radiobremen.py
+++ b/youtube_dl/extractor/radiobremen.py
@@ -20,7 +20,7 @@ class RadioBremenIE(InfoExtractor):
              'duration': 178,
              'width': 512,
              'title': 'Druck auf Patrick Öztürk',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'description': 'Gegen den SPD-Bürgerschaftsabgeordneten Patrick Öztürk wird wegen Beihilfe zum gewerbsmäßigen Betrug ermittelt. Am Donnerstagabend sollte er dem Vorstand des SPD-Unterbezirks Bremerhaven dazu Rede und Antwort stehen.',
          },
      }
diff --git a/youtube_dl/extractor/radiocanada.py b/youtube_dl/extractor/radiocanada.py
index 321917ad0810c6ddfe1d8586ba31570251fe012e..3b40002a8f4bf7480e15f253fc8b17d0c2d6e7ca 100644
--- a/youtube_dl/extractor/radiocanada.py
+++ b/youtube_dl/extractor/radiocanada.py
@@ -54,9 +54,8 @@ class RadioCanadaIE(InfoExtractor):
              raise ExtractorError('This video is DRM protected.', expected=True)
  
          device_types = ['ipad']
-        if app_code != 'toutv':
-            device_types.append('flash')
          if not smuggled_data:
+            device_types.append('flash')
              device_types.append('android')
  
          formats = []
@@ -103,7 +102,7 @@ class RadioCanadaIE(InfoExtractor):
                          continue
                      f_url = re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url)
                      protocol = determine_protocol({'url': f_url})
-                    formats.append({
+                    f = {
                          'format_id': '%s-%d' % (protocol, tbr),
                          'url': f_url,
                          'ext': 'flv' if protocol == 'rtmp' else ext,
@@ -111,7 +110,14 @@ class RadioCanadaIE(InfoExtractor):
                          'width': int_or_none(url_e.get('width')),
                          'height': int_or_none(url_e.get('height')),
                          'tbr': tbr,
-                    })
+                    }
+                    mobj = re.match(r'(?P<url>rtmp://[^/]+/[^/]+)/(?P<playpath>[^?]+)(?P<auth>\?.+)', f_url)
+                    if mobj:
+                        f.update({
+                            'url': mobj.group('url') + mobj.group('auth'),
+                            'play_path': mobj.group('playpath'),
+                        })
+                    formats.append(f)
                      if protocol == 'rtsp':
                          base_url = self._search_regex(
                              r'rtsp://([^?]+)', f_url, 'base url', default=None)
diff --git a/youtube_dl/extractor/radiode.py b/youtube_dl/extractor/radiode.py
index aa5f6f8ad41d1dcdb3cb975e2fcf883c8d3ac7f9..2c06c8b1e416c4d8c0d98f3a917c0823064458f8 100644
--- a/youtube_dl/extractor/radiode.py
+++ b/youtube_dl/extractor/radiode.py
@@ -13,7 +13,7 @@ class RadioDeIE(InfoExtractor):
              'ext': 'mp3',
              'title': 're:^NDR 2 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
              'description': 'md5:591c49c702db1a33751625ebfb67f273',
-            'thumbnail': 're:^https?://.*\.png',
+            'thumbnail': r're:^https?://.*\.png',
              'is_live': True,
          },
          'params': {
diff --git a/youtube_dl/extractor/radiojavan.py b/youtube_dl/extractor/radiojavan.py
index ec4fa6e602ea779dd6d3a530ea6cfb639eee3cf4..a53ad97a56ef9000ea5ed65fbf0e24276b03f6f3 100644
--- a/youtube_dl/extractor/radiojavan.py
+++ b/youtube_dl/extractor/radiojavan.py
@@ -18,7 +18,7 @@ class RadioJavanIE(InfoExtractor):
              'id': 'chaartaar-ashoobam',
              'ext': 'mp4',
              'title': 'Chaartaar - Ashoobam',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
              'upload_date': '20150215',
              'view_count': int,
              'like_count': int,
diff --git a/youtube_dl/extractor/rai.py b/youtube_dl/extractor/rai.py
index dc640b1bcb58ddb79c89e5f2346a5bc5c63a3547..41afbd9afa5472fdbd782db06f392abd587cf570 100644
--- a/youtube_dl/extractor/rai.py
+++ b/youtube_dl/extractor/rai.py
@@ -120,7 +120,7 @@ class RaiTVIE(RaiBaseIE):
                  'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
                  'upload_date': '20140407',
                  'duration': 6160,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              }
          },
          {
@@ -133,7 +133,7 @@ class RaiTVIE(RaiBaseIE):
                  'title': 'TG PRIMO TEMPO',
                  'upload_date': '20140612',
                  'duration': 1758,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'Geo-restricted to Italy',
          },
@@ -169,7 +169,7 @@ class RaiTVIE(RaiBaseIE):
                  'description': 'md5:364b604f7db50594678f483353164fb8',
                  'upload_date': '20140923',
                  'duration': 386,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              }
          },
      ]
diff --git a/youtube_dl/extractor/rbmaradio.py b/youtube_dl/extractor/rbmaradio.py
index 471928ef86b5d434953fc694eef0bb7da8edd334..53b82fba3964b519fe3829ad1f7384755e943b7c 100644
--- a/youtube_dl/extractor/rbmaradio.py
+++ b/youtube_dl/extractor/rbmaradio.py
@@ -22,7 +22,7 @@ class RBMARadioIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Main Stage - Ford & Lopatin',
              'description': 'md5:4f340fb48426423530af5a9d87bd7b91',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 2452,
              'timestamp': 1307103164,
              'upload_date': '20110603',
diff --git a/youtube_dl/extractor/reuters.py b/youtube_dl/extractor/reuters.py
index 961d504eb261cf4fd05eb50f16401abde5076d42..9dc482d21634965b3a16be80c08ee3b2952eee6c 100644
--- a/youtube_dl/extractor/reuters.py
+++ b/youtube_dl/extractor/reuters.py
@@ -32,7 +32,7 @@ class ReutersIE(InfoExtractor):
              webpage, 'video data'))
  
          def get_json_value(key, fatal=False):
-            return self._search_regex('"%s"\s*:\s*"([^"]+)"' % key, video_data, key, fatal=fatal)
+            return self._search_regex(r'"%s"\s*:\s*"([^"]+)"' % key, video_data, key, fatal=fatal)
  
          title = unescapeHTML(get_json_value('title', fatal=True))
          mmid, fid = re.search(r',/(\d+)\?f=(\d+)', get_json_value('flv', fatal=True)).groups()
diff --git a/youtube_dl/extractor/reverbnation.py b/youtube_dl/extractor/reverbnation.py
index 4875009e5cafd68867b67393d36d90625e5f29c8..4cb99c244c34369902d085a60b068c5991d37ad8 100644
--- a/youtube_dl/extractor/reverbnation.py
+++ b/youtube_dl/extractor/reverbnation.py
@@ -18,7 +18,7 @@ class ReverbNationIE(InfoExtractor):
              'title': 'MONA LISA',
              'uploader': 'ALKILADOS',
              'uploader_id': '216429',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
      }]
  
diff --git a/youtube_dl/extractor/ro220.py b/youtube_dl/extractor/ro220.py
index 962b524e94d2bddd12781b8dace06a1b28bc2c71..69934ef2b042903afeb4ca6bada108ec39caf2d1 100644
--- a/youtube_dl/extractor/ro220.py
+++ b/youtube_dl/extractor/ro220.py
@@ -14,7 +14,7 @@ class Ro220IE(InfoExtractor):
              'id': 'LYV6doKo7f',
              'ext': 'mp4',
              'title': 'Luati-le Banii sez 4 ep 1',
-            'description': 're:^Iata-ne reveniti dupa o binemeritata vacanta\. +Va astept si pe Facebook cu pareri si comentarii.$',
+            'description': r're:^Iata-ne reveniti dupa o binemeritata vacanta\. +Va astept si pe Facebook cu pareri si comentarii.$',
          }
      }
  
diff --git a/youtube_dl/extractor/rockstargames.py b/youtube_dl/extractor/rockstargames.py
index 48128e219bf468a1e3a01ec0f1116304ced1221d..cd6904bc935ef4c3a308cee47cff56602a8de691 100644
--- a/youtube_dl/extractor/rockstargames.py
+++ b/youtube_dl/extractor/rockstargames.py
@@ -18,7 +18,7 @@ class RockstarGamesIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Further Adventures in Finance and Felony Trailer',
              'description': 'md5:6d31f55f30cb101b5476c4a379e324a3',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1464876000,
              'upload_date': '20160602',
          }
diff --git a/youtube_dl/extractor/roosterteeth.py b/youtube_dl/extractor/roosterteeth.py
index f5b2f560c7f70c4e341aaf38f9718ea4994b811b..46dfc78f5edac0e9e8ef66f37efa4bbd7afcf3ea 100644
--- a/youtube_dl/extractor/roosterteeth.py
+++ b/youtube_dl/extractor/roosterteeth.py
@@ -26,7 +26,7 @@ class RoosterTeethIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement',
              'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5',
-            'thumbnail': 're:^https?://.*\.png$',
+            'thumbnail': r're:^https?://.*\.png$',
              'series': 'Million Dollars, But...',
              'episode': 'Million Dollars, But... The Game Announcement',
              'comment_count': int,
diff --git a/youtube_dl/extractor/rottentomatoes.py b/youtube_dl/extractor/rottentomatoes.py
index 1d404d20aa8b2223c68cada46e4bfe87613eb6ae..14c8e823698174f60890d9c27535e1dce40c9ce6 100644
--- a/youtube_dl/extractor/rottentomatoes.py
+++ b/youtube_dl/extractor/rottentomatoes.py
@@ -14,7 +14,7 @@ class RottenTomatoesIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Toy Story 3',
              'description': 'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/rte.py b/youtube_dl/extractor/rte.py
index ebe563ebb89e86e28a6bf55669cd066aca44d851..a6fac6c35d00327c2858f9aead301845c4af572a 100644
--- a/youtube_dl/extractor/rte.py
+++ b/youtube_dl/extractor/rte.py
@@ -4,118 +4,31 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_HTTPError
  from ..utils import (
      float_or_none,
      parse_iso8601,
      unescapeHTML,
+    ExtractorError,
  )
  
  
-class RteIE(InfoExtractor):
-    IE_NAME = 'rte'
-    IE_DESC = 'Raidió Teilifís Éireann TV'
-    _VALID_URL = r'https?://(?:www\.)?rte\.ie/player/[^/]{2,3}/show/[^/]+/(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://www.rte.ie/player/ie/show/iwitness-862/10478715/',
-        'info_dict': {
-            'id': '10478715',
-            'ext': 'flv',
-            'title': 'Watch iWitness  online',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'iWitness : The spirit of Ireland, one voice and one minute at a time.',
-            'duration': 60.046,
-        },
-        'params': {
-            'skip_download': 'f4m fails with --test atm'
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._og_search_title(webpage)
-        description = self._html_search_meta('description', webpage, 'description')
-        duration = float_or_none(self._html_search_meta(
-            'duration', webpage, 'duration', fatal=False), 1000)
-
-        thumbnail = None
-        thumbnail_meta = self._html_search_meta('thumbnail', webpage)
-        if thumbnail_meta:
-            thumbnail_id = self._search_regex(
-                r'uri:irus:(.+)', thumbnail_meta,
-                'thumbnail id', fatal=False)
-            if thumbnail_id:
-                thumbnail = 'http://img.rasset.ie/%s.jpg' % thumbnail_id
-
-        feeds_url = self._html_search_meta('feeds-prefix', webpage, 'feeds url') + video_id
-        json_string = self._download_json(feeds_url, video_id)
-
-        # f4m_url = server + relative_url
-        f4m_url = json_string['shows'][0]['media:group'][0]['rte:server'] + json_string['shows'][0]['media:group'][0]['url']
-        f4m_formats = self._extract_f4m_formats(f4m_url, video_id)
-        self._sort_formats(f4m_formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': f4m_formats,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-        }
-
-
-class RteRadioIE(InfoExtractor):
-    IE_NAME = 'rte:radio'
-    IE_DESC = 'Raidió Teilifís Éireann radio'
-    # Radioplayer URLs have two distinct specifier formats,
-    # the old format #!rii=<channel_id>:<id>:<playable_item_id>:<date>:
-    # the new format #!rii=b<channel_id>_<id>_<playable_item_id>_<date>_
-    # where the IDs are int/empty, the date is DD-MM-YYYY, and the specifier may be truncated.
-    # An <id> uniquely defines an individual recording, and is the only part we require.
-    _VALID_URL = r'https?://(?:www\.)?rte\.ie/radio/utils/radioplayer/rteradioweb\.html#!rii=(?:b?[0-9]*)(?:%3A|:|%5F|_)(?P<id>[0-9]+)'
-
-    _TESTS = [{
-        # Old-style player URL; HLS and RTMPE formats
-        'url': 'http://www.rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=16:10507902:2414:27-12-2015:',
-        'info_dict': {
-            'id': '10507902',
-            'ext': 'mp4',
-            'title': 'Gloria',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'md5:9ce124a7fb41559ec68f06387cabddf0',
-            'timestamp': 1451203200,
-            'upload_date': '20151227',
-            'duration': 7230.0,
-        },
-        'params': {
-            'skip_download': 'f4m fails with --test atm'
-        }
-    }, {
-        # New-style player URL; RTMPE formats only
-        'url': 'http://rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=b16_3250678_8861_06-04-2012_',
-        'info_dict': {
-            'id': '3250678',
-            'ext': 'flv',
-            'title': 'The Lyric Concert with Paul Herriott',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': '',
-            'timestamp': 1333742400,
-            'upload_date': '20120406',
-            'duration': 7199.016,
-        },
-        'params': {
-            'skip_download': 'f4m fails with --test atm'
-        }
-    }]
-
+class RteBaseIE(InfoExtractor):
      def _real_extract(self, url):
          item_id = self._match_id(url)
  
-        json_string = self._download_json(
-            'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=' + item_id,
-            item_id)
+        try:
+            json_string = self._download_json(
+                'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=' + item_id,
+                item_id)
+        except ExtractorError as ee:
+            if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
+                error_info = self._parse_json(ee.cause.read().decode(), item_id, fatal=False)
+                if error_info:
+                    raise ExtractorError(
+                        '%s said: %s' % (self.IE_NAME, error_info['message']),
+                        expected=True)
+            raise
  
          # NB the string values in the JSON are stored using XML escaping(!)
          show = json_string['shows'][0]
@@ -163,3 +76,67 @@ class RteRadioIE(InfoExtractor):
              'duration': duration,
              'formats': formats,
          }
+
+
+class RteIE(RteBaseIE):
+    IE_NAME = 'rte'
+    IE_DESC = 'Raidió Teilifís Éireann TV'
+    _VALID_URL = r'https?://(?:www\.)?rte\.ie/player/[^/]{2,3}/show/[^/]+/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.rte.ie/player/ie/show/iwitness-862/10478715/',
+        'md5': '4a76eb3396d98f697e6e8110563d2604',
+        'info_dict': {
+            'id': '10478715',
+            'ext': 'mp4',
+            'title': 'iWitness',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': 'The spirit of Ireland, one voice and one minute at a time.',
+            'duration': 60.046,
+            'upload_date': '20151012',
+            'timestamp': 1444694160,
+        },
+    }
+
+
+class RteRadioIE(RteBaseIE):
+    IE_NAME = 'rte:radio'
+    IE_DESC = 'Raidió Teilifís Éireann radio'
+    # Radioplayer URLs have two distinct specifier formats,
+    # the old format #!rii=<channel_id>:<id>:<playable_item_id>:<date>:
+    # the new format #!rii=b<channel_id>_<id>_<playable_item_id>_<date>_
+    # where the IDs are int/empty, the date is DD-MM-YYYY, and the specifier may be truncated.
+    # An <id> uniquely defines an individual recording, and is the only part we require.
+    _VALID_URL = r'https?://(?:www\.)?rte\.ie/radio/utils/radioplayer/rteradioweb\.html#!rii=(?:b?[0-9]*)(?:%3A|:|%5F|_)(?P<id>[0-9]+)'
+
+    _TESTS = [{
+        # Old-style player URL; HLS and RTMPE formats
+        'url': 'http://www.rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=16:10507902:2414:27-12-2015:',
+        'md5': 'c79ccb2c195998440065456b69760411',
+        'info_dict': {
+            'id': '10507902',
+            'ext': 'mp4',
+            'title': 'Gloria',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': 'md5:9ce124a7fb41559ec68f06387cabddf0',
+            'timestamp': 1451203200,
+            'upload_date': '20151227',
+            'duration': 7230.0,
+        },
+    }, {
+        # New-style player URL; RTMPE formats only
+        'url': 'http://rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=b16_3250678_8861_06-04-2012_',
+        'info_dict': {
+            'id': '3250678',
+            'ext': 'flv',
+            'title': 'The Lyric Concert with Paul Herriott',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': '',
+            'timestamp': 1333742400,
+            'upload_date': '20120406',
+            'duration': 7199.016,
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }]
diff --git a/youtube_dl/extractor/rtl2.py b/youtube_dl/extractor/rtl2.py
index cb4ee88033ba1d761faac452de724a6c44f08503..721ee733ce38c7e4a95c7806b10bbbd346166451 100644
--- a/youtube_dl/extractor/rtl2.py
+++ b/youtube_dl/extractor/rtl2.py
@@ -2,7 +2,9 @@
  from __future__ import unicode_literals
  
  import re
+
  from .common import InfoExtractor
+from ..utils import int_or_none
  
  
  class RTL2IE(InfoExtractor):
@@ -13,7 +15,7 @@ class RTL2IE(InfoExtractor):
              'id': 'folge-203-0',
              'ext': 'f4v',
              'title': 'GRIP sucht den Sommerkönig',
-            'description': 'Matthias, Det und Helge treten gegeneinander an.'
+            'description': 'md5:e3adbb940fd3c6e76fa341b8748b562f'
          },
          'params': {
              # rtmp download
@@ -25,7 +27,7 @@ class RTL2IE(InfoExtractor):
              'id': '21040-anna-erwischt-alex',
              'ext': 'mp4',
              'title': 'Anna erwischt Alex!',
-            'description': 'Anna ist Alex\' Tochter bei Köln 50667.'
+            'description': 'Anna nimmt ihrem Vater nicht ab, dass er nicht spielt. Und tatsächlich erwischt sie ihn auf frischer Tat.'
          },
          'params': {
              # rtmp download
@@ -52,34 +54,47 @@ class RTL2IE(InfoExtractor):
                  r'vico_id\s*:\s*([0-9]+)', webpage, 'vico_id')
              vivi_id = self._html_search_regex(
                  r'vivi_id\s*:\s*([0-9]+)', webpage, 'vivi_id')
-        info_url = 'http://www.rtl2.de/video/php/get_video.php?vico_id=' + vico_id + '&vivi_id=' + vivi_id
  
-        info = self._download_json(info_url, video_id)
+        info = self._download_json(
+            'http://www.rtl2.de/sites/default/modules/rtl2/mediathek/php/get_video_jw.php',
+            video_id, query={
+                'vico_id': vico_id,
+                'vivi_id': vivi_id,
+            })
          video_info = info['video']
          title = video_info['titel']
-        description = video_info.get('beschreibung')
-        thumbnail = video_info.get('image')
  
-        download_url = video_info['streamurl']
-        download_url = download_url.replace('\\', '')
-        stream_url = 'mp4:' + self._html_search_regex(r'ondemand/(.*)', download_url, 'stream URL')
-        rtmp_conn = ['S:connect', 'O:1', 'NS:pageUrl:' + url, 'NB:fpad:0', 'NN:videoFunction:1', 'O:0']
+        formats = []
+
+        rtmp_url = video_info.get('streamurl')
+        if rtmp_url:
+            rtmp_url = rtmp_url.replace('\\', '')
+            stream_url = 'mp4:' + self._html_search_regex(r'/ondemand/(.+)', rtmp_url, 'stream URL')
+            rtmp_conn = ['S:connect', 'O:1', 'NS:pageUrl:' + url, 'NB:fpad:0', 'NN:videoFunction:1', 'O:0']
+
+            formats.append({
+                'format_id': 'rtmp',
+                'url': rtmp_url,
+                'play_path': stream_url,
+                'player_url': 'http://www.rtl2.de/flashplayer/vipo_player.swf',
+                'page_url': url,
+                'flash_version': 'LNX 11,2,202,429',
+                'rtmp_conn': rtmp_conn,
+                'no_resume': True,
+                'preference': 1,
+            })
+
+        m3u8_url = video_info.get('streamurl_hls')
+        if m3u8_url:
+            formats.extend(self._extract_akamai_formats(m3u8_url, video_id))
  
-        formats = [{
-            'url': download_url,
-            'play_path': stream_url,
-            'player_url': 'http://www.rtl2.de/flashplayer/vipo_player.swf',
-            'page_url': url,
-            'flash_version': 'LNX 11,2,202,429',
-            'rtmp_conn': rtmp_conn,
-            'no_resume': True,
-        }]
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': title,
-            'thumbnail': thumbnail,
-            'description': description,
+            'thumbnail': video_info.get('image'),
+            'description': video_info.get('beschreibung'),
+            'duration': int_or_none(video_info.get('duration')),
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/rtlnl.py b/youtube_dl/extractor/rtlnl.py
index f0250af8a4c0e06e2642c9cced9f6b330dc1d4f6..54076de280f4e4ae24321c875f553cfcd146a574 100644
--- a/youtube_dl/extractor/rtlnl.py
+++ b/youtube_dl/extractor/rtlnl.py
@@ -40,7 +40,7 @@ class RtlNlIE(InfoExtractor):
              'ext': 'mp4',
              'timestamp': 1424039400,
              'title': 'RTL Nieuws - Nieuwe beelden Kopenhagen: chaos direct na aanslag',
-            'thumbnail': 're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed$',
+            'thumbnail': r're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed$',
              'upload_date': '20150215',
              'description': 'Er zijn nieuwe beelden vrijgegeven die vlak na de aanslag in Kopenhagen zijn gemaakt. Op de video is goed te zien hoe omstanders zich bekommeren om één van de slachtoffers, terwijl de eerste agenten ter plaatse komen.',
          }
@@ -52,7 +52,7 @@ class RtlNlIE(InfoExtractor):
              'id': 'f536aac0-1dc3-4314-920e-3bd1c5b3811a',
              'ext': 'mp4',
              'title': 'RTL Nieuws - Meer beelden van overval juwelier',
-            'thumbnail': 're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a$',
+            'thumbnail': r're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a$',
              'timestamp': 1437233400,
              'upload_date': '20150718',
              'duration': 30.474,
diff --git a/youtube_dl/extractor/rtp.py b/youtube_dl/extractor/rtp.py
index 82b323cdd4e40b027d3a6c2c06e9ea9d58b171e2..533ee27cbefcb1bb19bc2bedf06163a0c64d74e2 100644
--- a/youtube_dl/extractor/rtp.py
+++ b/youtube_dl/extractor/rtp.py
@@ -16,7 +16,7 @@ class RTPIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Paixões Cruzadas',
              'description': 'As paixões musicais de António Cartaxo e António Macedo',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
          'params': {
              # rtmp download
diff --git a/youtube_dl/extractor/rts.py b/youtube_dl/extractor/rts.py
index 3cc32847b7d0ffb937465a4b5f2d9f33f864bc09..48f17b828c1a8c6b0494ad1edb2bc3f914b3417a 100644
--- a/youtube_dl/extractor/rts.py
+++ b/youtube_dl/extractor/rts.py
@@ -4,27 +4,24 @@ from __future__ import unicode_literals
  import re
  
  from .srgssr import SRGSSRIE
-from ..compat import (
-    compat_str,
-    compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
  from ..utils import (
      int_or_none,
      parse_duration,
      parse_iso8601,
      unescapeHTML,
-    xpath_text,
+    determine_ext,
  )
  
  
  class RTSIE(SRGSSRIE):
      IE_DESC = 'RTS.ch'
-    _VALID_URL = r'rts:(?P<rts_id>\d+)|https?://(?:www\.)?rts\.ch/(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html'
+    _VALID_URL = r'rts:(?P<rts_id>\d+)|https?://(?:.+?\.)?rts\.ch/(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html'
  
      _TESTS = [
          {
              'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
-            'md5': 'f254c4b26fb1d3c183793d52bc40d3e7',
+            'md5': 'ff7f8450a90cf58dacb64e29707b4a8e',
              'info_dict': {
                  'id': '3449373',
                  'display_id': 'les-enfants-terribles',
@@ -35,38 +32,20 @@ class RTSIE(SRGSSRIE):
                  'uploader': 'Divers',
                  'upload_date': '19680921',
                  'timestamp': -40280400,
-                'thumbnail': 're:^https?://.*\.image',
+                'thumbnail': r're:^https?://.*\.image',
                  'view_count': int,
              },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            }
          },
          {
              'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html',
-            'md5': 'f1077ac5af686c76528dc8d7c5df29ba',
              'info_dict': {
-                'id': '5742494',
-                'display_id': '5742494',
-                'ext': 'mp4',
-                'duration': 3720,
-                'title': 'Les yeux dans les cieux - Mon homard au Canada',
-                'description': 'md5:d22ee46f5cc5bac0912e5a0c6d44a9f7',
-                'uploader': 'Passe-moi les jumelles',
-                'upload_date': '20140404',
-                'timestamp': 1396635300,
-                'thumbnail': 're:^https?://.*\.image',
-                'view_count': int,
+                'id': '5624065',
+                'title': 'Passe-moi les jumelles',
              },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            }
+            'playlist_mincount': 4,
          },
          {
              'url': 'http://www.rts.ch/video/sport/hockey/5745975-1-2-kloten-fribourg-5-2-second-but-pour-gotteron-par-kwiatowski.html',
-            'md5': 'b4326fecd3eb64a458ba73c73e91299d',
              'info_dict': {
                  'id': '5745975',
                  'display_id': '1-2-kloten-fribourg-5-2-second-but-pour-gotteron-par-kwiatowski',
@@ -77,14 +56,18 @@ class RTSIE(SRGSSRIE):
                  'uploader': 'Hockey',
                  'upload_date': '20140403',
                  'timestamp': 1396556882,
-                'thumbnail': 're:^https?://.*\.image',
+                'thumbnail': r're:^https?://.*\.image',
                  'view_count': int,
              },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
              'skip': 'Blocked outside Switzerland',
          },
          {
              'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html',
-            'md5': '9f713382f15322181bb366cc8c3a4ff0',
+            'md5': '1bae984fe7b1f78e94abc74e802ed99f',
              'info_dict': {
                  'id': '5745356',
                  'display_id': 'londres-cachee-par-un-epais-smog',
@@ -92,16 +75,12 @@ class RTSIE(SRGSSRIE):
                  'duration': 33,
                  'title': 'Londres cachée par un épais smog',
                  'description': 'Un important voile de smog recouvre Londres depuis mercredi, provoqué par la pollution et du sable du Sahara.',
-                'uploader': 'Le Journal en continu',
+                'uploader': 'L\'actu en vidéo',
                  'upload_date': '20140403',
                  'timestamp': 1396537322,
-                'thumbnail': 're:^https?://.*\.image',
+                'thumbnail': r're:^https?://.*\.image',
                  'view_count': int,
              },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            }
          },
          {
              'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html',
@@ -125,6 +104,10 @@ class RTSIE(SRGSSRIE):
                  'title': 'Hockey: Davos décroche son 31e titre de champion de Suisse',
              },
              'playlist_mincount': 5,
+        },
+        {
+            'url': 'http://pages.rts.ch/emissions/passe-moi-les-jumelles/5624065-entre-ciel-et-mer.html',
+            'only_matching': True,
          }
      ]
  
@@ -142,19 +125,32 @@ class RTSIE(SRGSSRIE):
  
          # media_id extracted out of URL is not always a real id
          if 'video' not in all_info and 'audio' not in all_info:
-            page = self._download_webpage(url, display_id)
+            entries = []
  
-            # article with videos on rhs
-            videos = re.findall(
-                r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:([^"]+)"',
-                page)
-            if not videos:
+            for item in all_info.get('items', []):
+                item_url = item.get('url')
+                if not item_url:
+                    continue
+                entries.append(self.url_result(item_url, 'RTS'))
+
+            if not entries:
+                page, urlh = self._download_webpage_handle(url, display_id)
+                if re.match(self._VALID_URL, urlh.geturl()).group('id') != media_id:
+                    return self.url_result(urlh.geturl(), 'RTS')
+
+                # article with videos on rhs
                  videos = re.findall(
-                    r'(?s)<iframe[^>]+class="srg-player"[^>]+src="[^"]+urn:([^"]+)"',
+                    r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:([^"]+)"',
                      page)
-            if videos:
-                entries = [self.url_result('srgssr:%s' % video_urn, 'SRGSSR') for video_urn in videos]
-                return self.playlist_result(entries, media_id, self._og_search_title(page))
+                if not videos:
+                    videos = re.findall(
+                        r'(?s)<iframe[^>]+class="srg-player"[^>]+src="[^"]+urn:([^"]+)"',
+                        page)
+                if videos:
+                    entries = [self.url_result('srgssr:%s' % video_urn, 'SRGSSR') for video_urn in videos]
+
+            if entries:
+                return self.playlist_result(entries, media_id, all_info.get('title'))
  
              internal_id = self._html_search_regex(
                  r'<(?:video|audio) data-id="([0-9]+)"', page,
@@ -168,36 +164,29 @@ class RTSIE(SRGSSRIE):
  
          info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio']
  
-        upload_timestamp = parse_iso8601(info.get('broadcast_date'))
-        duration = info.get('duration') or info.get('cutout') or info.get('cutduration')
-        if isinstance(duration, compat_str):
-            duration = parse_duration(duration)
-        view_count = info.get('plays')
-        thumbnail = unescapeHTML(info.get('preview_image_url'))
+        title = info['title']
  
          def extract_bitrate(url):
              return int_or_none(self._search_regex(
                  r'-([0-9]+)k\.', url, 'bitrate', default=None))
  
          formats = []
-        for format_id, format_url in info['streams'].items():
-            if format_id == 'hds_sd' and 'hds' in info['streams']:
+        streams = info.get('streams', {})
+        for format_id, format_url in streams.items():
+            if format_id == 'hds_sd' and 'hds' in streams:
                  continue
-            if format_id == 'hls_sd' and 'hls' in info['streams']:
+            if format_id == 'hls_sd' and 'hls' in streams:
                  continue
-            if format_url.endswith('.f4m'):
-                token = self._download_xml(
-                    'http://tp.srgssr.ch/token/akahd.xml?stream=%s/*' % compat_urllib_parse_urlparse(format_url).path,
-                    media_id, 'Downloading %s token' % format_id)
-                auth_params = xpath_text(token, './/authparams', 'auth params')
-                if not auth_params:
-                    continue
-                formats.extend(self._extract_f4m_formats(
-                    '%s?%s&hdcore=3.4.0&plugin=aasp-3.4.0.132.66' % (format_url, auth_params),
-                    media_id, f4m_id=format_id, fatal=False))
-            elif format_url.endswith('.m3u8'):
-                formats.extend(self._extract_m3u8_formats(
-                    format_url, media_id, 'mp4', 'm3u8_native', m3u8_id=format_id, fatal=False))
+            ext = determine_ext(format_url)
+            if ext in ('m3u8', 'f4m'):
+                format_url = self._get_tokenized_src(format_url, media_id, format_id)
+                if ext == 'f4m':
+                    formats.extend(self._extract_f4m_formats(
+                        format_url + ('?' if '?' not in format_url else '&') + 'hdcore=3.4.0',
+                        media_id, f4m_id=format_id, fatal=False))
+                else:
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, media_id, 'mp4', 'm3u8_native', m3u8_id=format_id, fatal=False))
              else:
                  formats.append({
                      'format_id': format_id,
@@ -205,25 +194,37 @@ class RTSIE(SRGSSRIE):
                      'tbr': extract_bitrate(format_url),
                  })
  
-        if 'media' in info:
-            formats.extend([{
-                'format_id': '%s-%sk' % (media['ext'], media['rate']),
-                'url': 'http://download-video.rts.ch/%s' % media['url'],
-                'tbr': media['rate'] or extract_bitrate(media['url']),
-            } for media in info['media'] if media.get('rate')])
+        for media in info.get('media', []):
+            media_url = media.get('url')
+            if not media_url or re.match(r'https?://', media_url):
+                continue
+            rate = media.get('rate')
+            ext = media.get('ext') or determine_ext(media_url, 'mp4')
+            format_id = ext
+            if rate:
+                format_id += '-%dk' % rate
+            formats.append({
+                'format_id': format_id,
+                'url': 'http://download-video.rts.ch/' + media_url,
+                'tbr': rate or extract_bitrate(media_url),
+            })
  
          self._check_formats(formats, media_id)
          self._sort_formats(formats)
  
+        duration = info.get('duration') or info.get('cutout') or info.get('cutduration')
+        if isinstance(duration, compat_str):
+            duration = parse_duration(duration)
+
          return {
              'id': media_id,
              'display_id': display_id,
              'formats': formats,
-            'title': info['title'],
+            'title': title,
              'description': info.get('intro'),
              'duration': duration,
-            'view_count': view_count,
+            'view_count': int_or_none(info.get('plays')),
              'uploader': info.get('programName'),
-            'timestamp': upload_timestamp,
-            'thumbnail': thumbnail,
+            'timestamp': parse_iso8601(info.get('broadcast_date')),
+            'thumbnail': unescapeHTML(info.get('preview_image_url')),
          }
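Note on the rts.py hunk above: the rewritten `media` loop skips entries without a URL or with an absolute URL, and adds the bitrate suffix to `format_id` only when a rate is present. The following is a minimal standalone sketch of that logic, not the extractor's actual code. The name `build_media_formats` and the plain `'mp4'` fallback are illustrative assumptions; the real code falls back to `determine_ext` and `extract_bitrate`.

```python
import re


def build_media_formats(media_list, base_url='http://download-video.rts.ch/'):
    """Hypothetical helper mirroring the new loop over info.get('media', [])."""
    formats = []
    for media in media_list or []:
        media_url = media.get('url')
        # Skip entries with no URL or with an already absolute URL,
        # as the patched loop does.
        if not media_url or re.match(r'https?://', media_url):
            continue
        rate = media.get('rate')
        # Assumption: plain default here; the patch uses determine_ext(media_url, 'mp4').
        ext = media.get('ext') or 'mp4'
        format_id = ext
        if rate:
            format_id += '-%dk' % rate
        formats.append({
            'format_id': format_id,
            'url': base_url + media_url,
            'tbr': rate,  # the patch additionally falls back to extract_bitrate()
        })
    return formats
```

Unlike the removed list comprehension, an entry without a `rate` is no longer dropped; it simply gets a `format_id` without the `-<rate>k` suffix.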
diff --git a/youtube_dl/extractor/rtve.py b/youtube_dl/extractor/rtve.py
index 6a43b036e924470055aea3910d1c5ea807483fdb..746677a24892f61249d32757ba5e4cac92d1f756 100644
--- a/youtube_dl/extractor/rtve.py
+++ b/youtube_dl/extractor/rtve.py
@@ -209,7 +209,10 @@ class RTVELiveIE(InfoExtractor):
          title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)
  
          vidplayer_id = self._search_regex(
-            r'playerId=player([0-9]+)', webpage, 'internal video ID')
+            (r'playerId=player([0-9]+)',
+             r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
+             r'data-id=["\'](\d+)'),
+            webpage, 'internal video ID')
          png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
          png = self._download_webpage(png_url, video_id, 'Downloading url information')
          m3u8_url = _decrypt_url(png)
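Note on the rtve.py hunk above: `_search_regex` is now given a tuple of patterns, which are tried in order until one matches. The sketch below illustrates that fallback behaviour outside the extractor framework. `search_first` is a hypothetical stand-in, not the library function, and the sample markup in the usage note is invented for illustration.

```python
import re

# The three patterns from the patch, in the order they are tried.
PATTERNS = (
    r'playerId=player([0-9]+)',
    r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
    r'data-id=["\'](\d+)',
)


def search_first(patterns, text):
    """Return group 1 of the first pattern that matches, or None."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return None
```

For example, `search_first(PATTERNS, '<div class="x live_mod y" id="a" data-assetid="1694255">')` falls through the first pattern and matches the second. Ordering the patterns from most to least specific keeps the generic `data-id` pattern as a last resort.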
diff --git a/youtube_dl/extractor/rtvnh.py b/youtube_dl/extractor/rtvnh.py
index f6454c6b0082ed431fa74de49dd5881d3b0b7a0f..6a00f7007221e40fd06293e9c60d31409641d94c 100644
--- a/youtube_dl/extractor/rtvnh.py
+++ b/youtube_dl/extractor/rtvnh.py
@@ -14,7 +14,7 @@ class RTVNHIE(InfoExtractor):
              'id': '131946',
              'ext': 'mp4',
              'title': 'Grote zoektocht in zee bij Zandvoort naar vermiste vrouw',
-            'thumbnail': 're:^https?:.*\.jpg$'
+            'thumbnail': r're:^https?:.*\.jpg$'
          }
      }
  
diff --git a/youtube_dl/extractor/rudo.py b/youtube_dl/extractor/rudo.py
index 9a330c1961b75f662caa457fabe231e6aa4bcb8a..3bfe934d82c9db7fe7ff8b1b8820d4c96ca0cb32 100644
--- a/youtube_dl/extractor/rudo.py
+++ b/youtube_dl/extractor/rudo.py
@@ -28,7 +28,7 @@ class RudoIE(JWPlatformBaseIE):
      @classmethod
      def _extract_url(self, webpage):
          mobj = re.search(
-            '<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
+            r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
              webpage)
          if mobj:
              return mobj.group('url')
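Note on the `r` prefix added in the rudo.py hunk above and in many test dictionaries in this commit: a sequence such as `\.` in a non-raw string literal is an unrecognized escape, which Python keeps as-is but has flagged with a deprecation warning since 3.6. A raw string yields the same characters without relying on that behaviour, so the change is presumably cosmetic at runtime. The sketch below also shows how a `re:`-prefixed expected value can be checked. `expected_matches` is a simplified, hypothetical stand-in for the test helper, not its actual implementation.

```python
import re

# Raw and escaped spellings produce the same two characters: backslash, dot.
RAW_DOT = r'\.'
ESCAPED_DOT = '\\.'

# An expected value in the style used by the test dictionaries.
THUMBNAIL = r're:^https?://.*\.jpg$'


def expected_matches(expected, got):
    """Compare a value against an expectation; a 're:' prefix marks a regex."""
    if expected.startswith('re:'):
        return re.match(expected[len('re:'):], got) is not None
    return expected == got
```

With these definitions, `expected_matches(THUMBNAIL, 'http://example.com/a.jpg')` is true, while a URL ending in `.png` is rejected.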
diff --git a/youtube_dl/extractor/ruhd.py b/youtube_dl/extractor/ruhd.py
index ce631b46c30bcd2eda03c798d61bed616f41e0b4..2b830cf477eef731caef1f2a6cddf10ef3efa14c 100644
--- a/youtube_dl/extractor/ruhd.py
+++ b/youtube_dl/extractor/ruhd.py
@@ -14,7 +14,7 @@ class RUHDIE(InfoExtractor):
              'ext': 'divx',
              'title': 'КОТ бааааам',
              'description': 'классный кот)',
-            'thumbnail': 're:^http://.*\.jpg$',
+            'thumbnail': r're:^http://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/ruutu.py b/youtube_dl/extractor/ruutu.py
index 6db3e3e9328b754f4a1f0ef149b33489857a3cd5..20d01754a17998f90c64f33cf76693028dd57103 100644
--- a/youtube_dl/extractor/ruutu.py
+++ b/youtube_dl/extractor/ruutu.py
@@ -23,7 +23,7 @@ class RuutuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Oletko aina halunnut tietää mitä tapahtuu vain hetki ennen lähetystä? - Nyt se selvisi!',
                  'description': 'md5:cfc6ccf0e57a814360df464a91ff67d6',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 114,
                  'age_limit': 0,
              },
@@ -36,7 +36,7 @@ class RuutuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Superpesis: katso koko kausi Ruudussa',
                  'description': 'md5:bfb7336df2a12dc21d18fa696c9f8f23',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 40,
                  'age_limit': 0,
              },
@@ -49,7 +49,7 @@ class RuutuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Osa 1: Mikael Jungner',
                  'description': 'md5:7d90f358c47542e3072ff65d7b1bcffe',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'age_limit': 0,
              },
          },
@@ -81,6 +81,9 @@ class RuutuIE(InfoExtractor):
                      elif ext == 'f4m':
                          formats.extend(self._extract_f4m_formats(
                              video_url, video_id, f4m_id='hds', fatal=False))
+                    elif ext == 'mpd':
+                        formats.extend(self._extract_mpd_formats(
+                            video_url, video_id, mpd_id='dash', fatal=False))
                      else:
                          proto = compat_urllib_parse_urlparse(video_url).scheme
                          if not child.tag.startswith('HTTP') and proto != 'rtmp':
diff --git a/youtube_dl/extractor/savefrom.py b/youtube_dl/extractor/savefrom.py
index 5b7367b94119792661506624264b121503cc6858..30f9cf8245856398239e88e5e8454c1df4bd8c3f 100644
--- a/youtube_dl/extractor/savefrom.py
+++ b/youtube_dl/extractor/savefrom.py
@@ -20,7 +20,7 @@ class SaveFromIE(InfoExtractor):
              'upload_date': '20120816',
              'uploader': 'Howcast',
              'uploader_id': 'Howcast',
-            'description': 're:(?s).* Hi, my name is Rene Dreifuss\. And I\'m here to show you some MMA.*',
+            'description': r're:(?s).* Hi, my name is Rene Dreifuss\. And I\'m here to show you some MMA.*',
          },
          'params': {
              'skip_download': True
diff --git a/youtube_dl/extractor/sbs.py b/youtube_dl/extractor/sbs.py
index 43131fb7e5ce82d69d25bf639ce6c2bffe35182a..845712a7640afe9f675757c2f711830c9c79a00f 100644
--- a/youtube_dl/extractor/sbs.py
+++ b/youtube_dl/extractor/sbs.py
@@ -22,7 +22,7 @@ class SBSIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Dingo Conservation (The Feed)',
              'description': 'md5:f250a9856fca50d22dec0b5b8015f8a5',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'duration': 308,
              'timestamp': 1408613220,
              'upload_date': '20140821',
diff --git a/youtube_dl/extractor/screencast.py b/youtube_dl/extractor/screencast.py
index ed9de964841e52c1e5753556d6b9e53339ba23c3..62a6a8337ccf5d247a38be29cf93b5b72f36dfd7 100644
--- a/youtube_dl/extractor/screencast.py
+++ b/youtube_dl/extractor/screencast.py
@@ -21,7 +21,7 @@ class ScreencastIE(InfoExtractor):
              'ext': 'm4v',
              'title': 'Color Measurement with Ocean Optics Spectrometers',
              'description': 'md5:240369cde69d8bed61349a199c5fb153',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
          }
      }, {
          'url': 'http://www.screencast.com/t/V2uXehPJa1ZI',
@@ -31,7 +31,7 @@ class ScreencastIE(InfoExtractor):
              'ext': 'mov',
              'title': 'The Amadeus Spectrometer',
              'description': 're:^In this video, our friends at.*To learn more about Amadeus, visit',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
          }
      }, {
          'url': 'http://www.screencast.com/t/aAB3iowa',
@@ -41,7 +41,7 @@ class ScreencastIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Google Earth Export',
              'description': 'Provides a demo of a CommunityViz export to Google Earth, one of the 3D viewing options.',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
          }
      }, {
          'url': 'http://www.screencast.com/t/X3ddTrYh',
@@ -51,7 +51,7 @@ class ScreencastIE(InfoExtractor):
              'ext': 'wmv',
              'title': 'Toolkit 6 User Group Webinar (2014-03-04) - Default Judgment and First Impression',
              'description': 'md5:7b9f393bc92af02326a5c5889639eab0',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
          }
      }, {
          'url': 'http://screencast.com/t/aAB3iowa',
diff --git a/youtube_dl/extractor/screencastomatic.py b/youtube_dl/extractor/screencastomatic.py
index 7a88a42cd84dbfd9f343567dffb5f462c10329b7..94a2a37d20696fa3ffc65b6f1df04cab42c7785d 100644
--- a/youtube_dl/extractor/screencastomatic.py
+++ b/youtube_dl/extractor/screencastomatic.py
@@ -14,7 +14,7 @@ class ScreencastOMaticIE(JWPlatformBaseIE):
              'id': 'c2lD3BeOPl',
              'ext': 'mp4',
              'title': 'Welcome to 3-4 Philosophy @ DECV!',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
              'duration': 369.163,
          }
diff --git a/youtube_dl/extractor/screenjunkies.py b/youtube_dl/extractor/screenjunkies.py
deleted file mode 100644
index 02e574c..0000000
--- a/youtube_dl/extractor/screenjunkies.py
+++ /dev/null
@@ -1,138 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-    int_or_none,
-    parse_age_limit,
-)
-
-
-class ScreenJunkiesIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?screenjunkies\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
-    _TESTS = [{
-        'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
-        'md5': '5c2b686bec3d43de42bde9ec047536b0',
-        'info_dict': {
-            'id': '2841915',
-            'display_id': 'best-quentin-tarantino-movie',
-            'ext': 'mp4',
-            'title': 'Best Quentin Tarantino Movie',
-            'thumbnail': 're:^https?://.*\.jpg',
-            'duration': 3671,
-            'age_limit': 13,
-            'tags': list,
-        },
-    }, {
-        'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight',
-        'info_dict': {
-            'id': '2348808',
-            'display_id': 'honest-trailers-the-dark-knight',
-            'ext': 'mp4',
-            'title': "Honest Trailers: 'The Dark Knight'",
-            'thumbnail': 're:^https?://.*\.jpg',
-            'age_limit': 10,
-            'tags': list,
-        },
-    }, {
-        # requires subscription but worked around
-        'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285',
-        'info_dict': {
-            'id': '3003285',
-            'display_id': 'knocking-dead-ep-1-the-show-so-far',
-            'ext': 'mp4',
-            'title': 'Knocking Dead Ep 1: State of The Dead Recap',
-            'thumbnail': 're:^https?://.*\.jpg',
-            'duration': 3307,
-            'age_limit': 13,
-            'tags': list,
-        },
-    }]
-
-    _DEFAULT_BITRATES = (48, 150, 496, 864, 2240)
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-
-        if not video_id:
-            webpage = self._download_webpage(url, display_id)
-            video_id = self._search_regex(
-                (r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'),
-                webpage, 'video id')
-
-        webpage = self._download_webpage(
-            'http://www.screenjunkies.com/embed/%s' % video_id,
-            display_id, 'Downloading video embed page')
-        embed_vars = self._parse_json(
-            self._search_regex(
-                r'(?s)embedVars\s*=\s*({.+?})\s*</script>', webpage, 'embed vars'),
-            display_id)
-
-        title = embed_vars['contentName']
-
-        formats = []
-        bitrates = []
-        for f in embed_vars.get('media', []):
-            if not f.get('uri') or f.get('mediaPurpose') != 'play':
-                continue
-            bitrate = int_or_none(f.get('bitRate'))
-            if bitrate:
-                bitrates.append(bitrate)
-            formats.append({
-                'url': f['uri'],
-                'format_id': 'http-%d' % bitrate if bitrate else 'http',
-                'width': int_or_none(f.get('width')),
-                'height': int_or_none(f.get('height')),
-                'tbr': bitrate,
-                'format': 'mp4',
-            })
-
-        if not bitrates:
-            # When subscriptionLevel > 0, i.e. plus subscription is required
-            # media list will be empty. However, hds and hls uris are still
-            # available. We can grab them assuming bitrates to be default.
-            bitrates = self._DEFAULT_BITRATES
-
-        auth_token = embed_vars.get('AuthToken')
-
-        def construct_manifest_url(base_url, ext):
-            pieces = [base_url]
-            pieces.extend([compat_str(b) for b in bitrates])
-            pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
-            return ','.join(pieces)
-
-        if bitrates and auth_token:
-            hds_url = embed_vars.get('hdsUri')
-            if hds_url:
-                f4m_formats = self._extract_f4m_formats(
-                    construct_manifest_url(hds_url, 'f4m'),
-                    display_id, f4m_id='hds', fatal=False)
-                if len(f4m_formats) == len(bitrates):
-                    for f, bitrate in zip(f4m_formats, bitrates):
-                        if not f.get('tbr'):
-                            f['format_id'] = 'hds-%d' % bitrate
-                            f['tbr'] = bitrate
-                # TODO: fix f4m downloader to handle manifests without bitrates if possible
-                # formats.extend(f4m_formats)
-
-            hls_url = embed_vars.get('hlsUri')
-            if hls_url:
-                formats.extend(self._extract_m3u8_formats(
-                    construct_manifest_url(hls_url, 'm3u8'),
-                    display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'thumbnail': embed_vars.get('thumbUri'),
-            'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None,
-            'age_limit': parse_age_limit(embed_vars.get('audienceRating')),
-            'tags': embed_vars.get('tags', '').split(','),
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/senateisvp.py b/youtube_dl/extractor/senateisvp.py
index 35540c082ef2f7c4d6fa9cf9ce8acf404bc33a8c..387a4f7f6952adcb6d1954106ce580f44cde6e6f 100644
--- a/youtube_dl/extractor/senateisvp.py
+++ b/youtube_dl/extractor/senateisvp.py
@@ -55,7 +55,7 @@ class SenateISVPIE(InfoExtractor):
              'id': 'judiciary031715',
              'ext': 'mp4',
              'title': 'Integrated Senate Video Player',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/sendtonews.py b/youtube_dl/extractor/sendtonews.py
index 2dbe490bba7717a7719290113f26ed5c795ae218..9880a5a78c1f4b18d41c55e1899405dbdb98e7dc 100644
--- a/youtube_dl/extractor/sendtonews.py
+++ b/youtube_dl/extractor/sendtonews.py
@@ -8,6 +8,9 @@ from ..utils import (
      float_or_none,
      parse_iso8601,
      update_url_query,
+    int_or_none,
+    determine_protocol,
+    unescapeHTML,
  )
  
  
@@ -20,18 +23,18 @@ class SendtoNewsIE(JWPlatformBaseIE):
          'info_dict': {
              'id': 'GxfCe0Zo7D-175909-5588'
          },
-        'playlist_count': 9,
+        'playlist_count': 8,
          # test the first video only to prevent lengthy tests
          'playlist': [{
              'info_dict': {
-                'id': '198180',
+                'id': '240385',
                  'ext': 'mp4',
-                'title': 'Recap: CLE 5, LAA 4',
-                'description': '8/14/16: Naquin, Almonte lead Indians in 5-4 win',
-                'duration': 57.343,
-                'thumbnail': 're:https?://.*\.jpg$',
-                'upload_date': '20160815',
-                'timestamp': 1471221961,
+                'title': 'Indians introduce Encarnacion',
+                'description': 'Indians president of baseball operations Chris Antonetti and Edwin Encarnacion discuss the slugger\'s three-year contract with Cleveland',
+                'duration': 137.898,
+                'thumbnail': r're:https?://.*\.jpg$',
+                'upload_date': '20170105',
+                'timestamp': 1483649762,
              },
          }],
          'params': {
@@ -64,7 +67,20 @@ class SendtoNewsIE(JWPlatformBaseIE):
          for video in playlist_data['playlistData'][0]:
              info_dict = self._parse_jwplayer_data(
                  video['jwconfiguration'],
-                require_title=False, rtmp_params={'no_resume': True})
+                require_title=False, m3u8_id='hls', rtmp_params={'no_resume': True})
+
+            for f in info_dict['formats']:
+                if f.get('tbr'):
+                    continue
+                tbr = int_or_none(self._search_regex(
+                    r'/(\d+)k/', f['url'], 'bitrate', default=None))
+                if not tbr:
+                    continue
+                f.update({
+                    'format_id': '%s-%d' % (determine_protocol(f), tbr),
+                    'tbr': tbr,
+                })
+            self._sort_formats(info_dict['formats'], ('tbr', 'height', 'width', 'format_id'))
  
              thumbnails = []
              if video.get('thumbnailUrl'):
@@ -78,8 +94,8 @@ class SendtoNewsIE(JWPlatformBaseIE):
                      'url': video['smThumbnailUrl'],
                  })
              info_dict.update({
-                'title': video['S_headLine'],
-                'description': video.get('S_fullStory'),
+                'title': video['S_headLine'].strip(),
+                'description': unescapeHTML(video.get('S_fullStory')),
                  'thumbnails': thumbnails,
                  'duration': float_or_none(video.get('SM_length')),
                  'timestamp': parse_iso8601(video.get('S_sysDate'), delimiter=' '),
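Note on the sendtonews.py hunks above: formats that arrive without a `tbr` get one recovered from a `/<number>k/` path segment in their URL, and their `format_id` is rebuilt from the protocol and that bitrate. The sketch below reproduces the idea in isolation. `fill_missing_bitrates` is a hypothetical name, and deriving the protocol from the URL scheme is a simplifying assumption standing in for `determine_protocol`.

```python
import re


def bitrate_from_url(url):
    """Return the integer N from a '/Nk/' path segment, or None if absent."""
    match = re.search(r'/(\d+)k/', url)
    return int(match.group(1)) if match else None


def fill_missing_bitrates(formats):
    """Fill 'tbr' and 'format_id' in place for formats lacking a bitrate."""
    for f in formats:
        if f.get('tbr'):
            continue  # keep bitrates that are already known
        tbr = bitrate_from_url(f['url'])
        if not tbr:
            continue  # nothing recoverable from this URL
        # Assumption: use the URL scheme unless a protocol is already set.
        proto = f.get('protocol') or f['url'].split(':', 1)[0]
        f.update({
            'format_id': '%s-%d' % (proto, tbr),
            'tbr': tbr,
        })
    return formats
```

Formats whose URL carries no bitrate segment are left untouched, so the subsequent sort still has to cope with a missing `tbr`.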
diff --git a/youtube_dl/extractor/sexu.py b/youtube_dl/extractor/sexu.py
index a99b2a8e7be1bc9de8a01d6ae2de6fb36055703c..5e22ea73029b3d254d76fa0722c8041daa17a6fd 100644
--- a/youtube_dl/extractor/sexu.py
+++ b/youtube_dl/extractor/sexu.py
@@ -14,7 +14,7 @@ class SexuIE(InfoExtractor):
              'title': 'md5:4d05a19a5fc049a63dbbaf05fb71d91b',
              'description': 'md5:2b75327061310a3afb3fbd7d09e2e403',
              'categories': list,  # NSFW
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          }
      }
diff --git a/youtube_dl/extractor/sharesix.py b/youtube_dl/extractor/sharesix.py
deleted file mode 100644
index 9cce5ce..0000000
--- a/youtube_dl/extractor/sharesix.py
+++ /dev/null
@@ -1,91 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    parse_duration,
-    sanitized_Request,
-    urlencode_postdata,
-)
-
-
-class ShareSixIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?sharesix\.com/(?:f/)?(?P<id>[0-9a-zA-Z]+)'
-    _TESTS = [
-        {
-            'url': 'http://sharesix.com/f/OXjQ7Y6',
-            'md5': '9e8e95d8823942815a7d7c773110cc93',
-            'info_dict': {
-                'id': 'OXjQ7Y6',
-                'ext': 'mp4',
-                'title': 'big_buck_bunny_480p_surround-fix.avi',
-                'duration': 596,
-                'width': 854,
-                'height': 480,
-            },
-        },
-        {
-            'url': 'http://sharesix.com/lfrwoxp35zdd',
-            'md5': 'dd19f1435b7cec2d7912c64beeee8185',
-            'info_dict': {
-                'id': 'lfrwoxp35zdd',
-                'ext': 'flv',
-                'title': 'WhiteBoard___a_Mac_vs_PC_Parody_Cartoon.mp4.flv',
-                'duration': 65,
-                'width': 1280,
-                'height': 720,
-            },
-        }
-    ]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        fields = {
-            'method_free': 'Free'
-        }
-        post = urlencode_postdata(fields)
-        req = sanitized_Request(url, post)
-        req.add_header('Content-type', 'application/x-www-form-urlencoded')
-
-        webpage = self._download_webpage(req, video_id,
-                                         'Downloading video page')
-
-        video_url = self._search_regex(
-            r"var\slnk1\s=\s'([^']+)'", webpage, 'video URL')
-        title = self._html_search_regex(
-            r'(?s)<dt>Filename:</dt>.+?<dd>(.+?)</dd>', webpage, 'title')
-        duration = parse_duration(
-            self._search_regex(
-                r'(?s)<dt>Length:</dt>.+?<dd>(.+?)</dd>',
-                webpage,
-                'duration',
-                fatal=False
-            )
-        )
-
-        m = re.search(
-            r'''(?xs)<dt>Width\sx\sHeight</dt>.+?
-                     <dd>(?P<width>\d+)\sx\s(?P<height>\d+)</dd>''',
-            webpage
-        )
-        width = height = None
-        if m:
-            width, height = int(m.group('width')), int(m.group('height'))
-
-        formats = [{
-            'format_id': 'sd',
-            'url': video_url,
-            'width': width,
-            'height': height,
-        }]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'duration': duration,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/showroomlive.py b/youtube_dl/extractor/showroomlive.py
new file mode 100644
index 0000000..efd9d56
--- /dev/null
+++ b/youtube_dl/extractor/showroomlive.py
@@ -0,0 +1,84 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    urljoin,
+)
+
+
+class ShowRoomLiveIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?showroom-live\.com/(?!onlive|timetable|event|campaign|news|ranking|room)(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://www.showroom-live.com/48_Nana_Okada',
+        'only_matching': True,
+    }
+
+    def _real_extract(self, url):
+        broadcaster_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, broadcaster_id)
+
+        room_id = self._search_regex(
+            (r'SrGlobal\.roomId\s*=\s*(\d+)',
+             r'(?:profile|room)\?room_id\=(\d+)'), webpage, 'room_id')
+
+        room = self._download_json(
+            urljoin(url, '/api/room/profile?room_id=%s' % room_id),
+            broadcaster_id)
+
+        is_live = room.get('is_onlive')
+        if is_live is not True:
+            raise ExtractorError('%s is offline' % broadcaster_id, expected=True)
+
+        uploader = room.get('performer_name') or broadcaster_id
+        title = room.get('room_name') or room.get('main_name') or uploader
+
+        streaming_url_list = self._download_json(
+            urljoin(url, '/api/live/streaming_url?room_id=%s' % room_id),
+            broadcaster_id)['streaming_url_list']
+
+        formats = []
+        for stream in streaming_url_list:
+            stream_url = stream.get('url')
+            if not stream_url:
+                continue
+            stream_type = stream.get('type')
+            if stream_type == 'hls':
+                m3u8_formats = self._extract_m3u8_formats(
+                    stream_url, broadcaster_id, ext='mp4', m3u8_id='hls',
+                    live=True)
+                for f in m3u8_formats:
+                    f['quality'] = int_or_none(stream.get('quality', 100))
+                formats.extend(m3u8_formats)
+            elif stream_type == 'rtmp':
+                stream_name = stream.get('stream_name')
+                if not stream_name:
+                    continue
+                formats.append({
+                    'url': stream_url,
+                    'play_path': stream_name,
+                    'page_url': url,
+                    'player_url': 'https://www.showroom-live.com/assets/swf/v3/ShowRoomLive.swf',
+                    'rtmp_live': True,
+                    'ext': 'flv',
+                    'format_id': 'rtmp',
+                    'format_note': stream.get('label'),
+                    'quality': int_or_none(stream.get('quality', 100)),
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': compat_str(room.get('live_id') or broadcaster_id),
+            'title': self._live_title(title),
+            'description': room.get('description'),
+            'timestamp': int_or_none(room.get('current_live_started_at')),
+            'uploader': uploader,
+            'uploader_id': broadcaster_id,
+            'view_count': int_or_none(room.get('view_num')),
+            'formats': formats,
+            'is_live': True,
+        }
diff --git a/youtube_dl/extractor/skysports.py b/youtube_dl/extractor/skysports.py
index 9dc78c7d2b27748fd2e1c083e7b265448cbff500..4ca9f6b3c811f59ef11eb82d173554341f3ab66d 100644
--- a/youtube_dl/extractor/skysports.py
+++ b/youtube_dl/extractor/skysports.py
@@ -2,18 +2,19 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..utils import strip_or_none
  
  
  class SkySportsIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?skysports\.com/watch/video/(?P<id>[0-9]+)'
      _TEST = {
          'url': 'http://www.skysports.com/watch/video/10328419/bale-its-our-time-to-shine',
-        'md5': 'c44a1db29f27daf9a0003e010af82100',
+        'md5': '77d59166cddc8d3cb7b13e35eaf0f5ec',
          'info_dict': {
              'id': '10328419',
-            'ext': 'flv',
-            'title': 'Bale: Its our time to shine',
-            'description': 'md5:9fd1de3614d525f5addda32ac3c482c9',
+            'ext': 'mp4',
+            'title': 'Bale: It\'s our time to shine',
+            'description': 'md5:e88bda94ae15f7720c5cb467e777bb6d',
          },
          'add_ie': ['Ooyala'],
      }
@@ -28,6 +29,6 @@ class SkySportsIE(InfoExtractor):
              'url': 'ooyala:%s' % self._search_regex(
                  r'data-video-id="([^"]+)"', webpage, 'ooyala id'),
              'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
+            'description': strip_or_none(self._og_search_description(webpage)),
              'ie_key': 'Ooyala',
          }
diff --git a/youtube_dl/extractor/slutload.py b/youtube_dl/extractor/slutload.py
index 18cc7721e142c7493bbebdfcb59f621e3fedaf4f..7145d285a0244acdbdb58dfbf330d2c75391dea3 100644
--- a/youtube_dl/extractor/slutload.py
+++ b/youtube_dl/extractor/slutload.py
@@ -13,7 +13,7 @@ class SlutloadIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'virginie baisee en cam',
              'age_limit': 18,
-            'thumbnail': 're:https?://.*?\.jpg'
+            'thumbnail': r're:https?://.*?\.jpg'
          }
      }
  
diff --git a/youtube_dl/extractor/smotri.py b/youtube_dl/extractor/smotri.py
index def46abda45c5d4899f3c3e5a3fb775592efdfa6..370fa887968128281a6286f78a1fdf4bf59f7b9f 100644
--- a/youtube_dl/extractor/smotri.py
+++ b/youtube_dl/extractor/smotri.py
@@ -81,7 +81,7 @@ class SmotriIE(InfoExtractor):
                  'uploader': 'psavari1',
                  'uploader_id': 'psavari1',
                  'upload_date': '20081103',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  'videopassword': '223322',
@@ -117,7 +117,7 @@ class SmotriIE(InfoExtractor):
                  'uploader': 'вАся',
                  'uploader_id': 'asya_prosto',
                  'upload_date': '20081218',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'age_limit': 18,
              },
              'params': {
diff --git a/youtube_dl/extractor/snotr.py b/youtube_dl/extractor/snotr.py
index 4819fe5b4b6322cc02e9e1fdd4c128cbe28e55b0..f773547483fbf7828118b7b3ff2e537e05b9628c 100644
--- a/youtube_dl/extractor/snotr.py
+++ b/youtube_dl/extractor/snotr.py
@@ -22,7 +22,7 @@ class SnotrIE(InfoExtractor):
              'duration': 248,
              'filesize_approx': 40700000,
              'description': 'A drone flying through Fourth of July Fireworks',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'expected_warnings': ['description'],
      }, {
@@ -34,7 +34,7 @@ class SnotrIE(InfoExtractor):
              'duration': 126,
              'filesize_approx': 8500000,
              'description': 'The top 10 George W. Bush moments, brought to you by David Letterman!',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }]
  
diff --git a/youtube_dl/extractor/soundcloud.py b/youtube_dl/extractor/soundcloud.py
index 5a201eaa890347d384849565d98d05ac914ddc39..b3aa4ce26ab95933b40f3606c86b8ae6cefc531b 100644
--- a/youtube_dl/extractor/soundcloud.py
+++ b/youtube_dl/extractor/soundcloud.py
@@ -173,46 +173,54 @@ class SoundcloudIE(InfoExtractor):
              })
  
          # We have to retrieve the url
-        streams_url = ('http://api.soundcloud.com/i1/tracks/{0}/streams?'
-                       'client_id={1}&secret_token={2}'.format(track_id, self._IPHONE_CLIENT_ID, secret_token))
          format_dict = self._download_json(
-            streams_url,
-            track_id, 'Downloading track url')
+            'http://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
+            track_id, 'Downloading track url', query={
+                'client_id': self._CLIENT_ID,
+                'secret_token': secret_token,
+            })
  
          for key, stream_url in format_dict.items():
+            abr = int_or_none(self._search_regex(
+                r'_(\d+)_url', key, 'audio bitrate', default=None))
              if key.startswith('http'):
-                formats.append({
+                stream_formats = [{
                      'format_id': key,
                      'ext': ext,
                      'url': stream_url,
-                    'vcodec': 'none',
-                })
+                }]
              elif key.startswith('rtmp'):
                  # The url doesn't have an rtmp app, we have to extract the playpath
                  url, path = stream_url.split('mp3:', 1)
-                formats.append({
+                stream_formats = [{
                      'format_id': key,
                      'url': url,
                      'play_path': 'mp3:' + path,
                      'ext': 'flv',
-                    'vcodec': 'none',
-                })
-
-            if not formats:
-                # We fallback to the stream_url in the original info, this
-                # cannot be always used, sometimes it can give an HTTP 404 error
-                formats.append({
-                    'format_id': 'fallback',
-                    'url': info['stream_url'] + '?client_id=' + self._CLIENT_ID,
-                    'ext': ext,
-                    'vcodec': 'none',
-                })
-
-            for f in formats:
-                if f['format_id'].startswith('http'):
-                    f['protocol'] = 'http'
-                if f['format_id'].startswith('rtmp'):
-                    f['protocol'] = 'rtmp'
+                }]
+            elif key.startswith('hls'):
+                stream_formats = self._extract_m3u8_formats(
+                    stream_url, track_id, 'mp3', entry_protocol='m3u8_native',
+                    m3u8_id=key, fatal=False)
+            else:
+                continue
+
+            for f in stream_formats:
+                f['abr'] = abr
+
+            formats.extend(stream_formats)
+
+        if not formats:
+            # We fallback to the stream_url in the original info, this
+            # cannot be always used, sometimes it can give an HTTP 404 error
+            formats.append({
+                'format_id': 'fallback',
+                'url': info['stream_url'] + '?client_id=' + self._CLIENT_ID,
+                'ext': ext,
+            })
+
+        for f in formats:
+            f['vcodec'] = 'none'
  
          self._check_formats(formats, track_id)
          self._sort_formats(formats)
diff --git a/youtube_dl/extractor/soundgasm.py b/youtube_dl/extractor/soundgasm.py
index 3a4ddf57ea369a0b250a4d786738e0ea4db9e1dd..e004e2c5ab12705c8d9ff5e12b25f53579539c72 100644
--- a/youtube_dl/extractor/soundgasm.py
+++ b/youtube_dl/extractor/soundgasm.py
@@ -27,7 +27,7 @@ class SoundgasmIE(InfoExtractor):
          webpage = self._download_webpage(url, display_id)
          audio_url = self._html_search_regex(
              r'(?s)m4a\:\s"([^"]+)"', webpage, 'audio URL')
-        audio_id = re.split('\/|\.', audio_url)[-2]
+        audio_id = re.split(r'\/|\.', audio_url)[-2]
          description = self._html_search_regex(
              r'(?s)<li>Description:\s(.*?)<\/li>', webpage, 'description',
              fatal=False)
diff --git a/youtube_dl/extractor/southpark.py b/youtube_dl/extractor/southpark.py
index 08f8c5744a84dffda03904afd30d44cac42f2917..d8ce416fc7d1a9ec2e3561752890d916f2bcf93a 100644
--- a/youtube_dl/extractor/southpark.py
+++ b/youtube_dl/extractor/southpark.py
@@ -6,7 +6,7 @@ from .mtv import MTVServicesInfoExtractor
  
  class SouthParkIE(MTVServicesInfoExtractor):
      IE_NAME = 'southpark.cc.com'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
  
      _FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
  
@@ -75,7 +75,7 @@ class SouthParkDeIE(SouthParkIE):
  
  class SouthParkNlIE(SouthParkIE):
      IE_NAME = 'southpark.nl'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
      _FEED_URL = 'http://www.southpark.nl/feeds/video-player/mrss/'
  
      _TESTS = [{
diff --git a/youtube_dl/extractor/spankbang.py b/youtube_dl/extractor/spankbang.py
index 186d22b7d1608b01bb0a3d45082403e6a58bb05e..123c33ac36e275d8b624c8830153235c3a4ef338 100644
--- a/youtube_dl/extractor/spankbang.py
+++ b/youtube_dl/extractor/spankbang.py
@@ -15,7 +15,7 @@ class SpankBangIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'fantasy solo',
              'description': 'Watch fantasy solo free HD porn video - 05 minutes - dillion harper masturbates on a bed free adult movies.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'silly2587',
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/spankwire.py b/youtube_dl/extractor/spankwire.py
index 92a7120a3242e732ceb58f51b4391a5efbc569d8..44d8fa52f3071ca00971624db81ce4ad6b2141e3 100644
--- a/youtube_dl/extractor/spankwire.py
+++ b/youtube_dl/extractor/spankwire.py
@@ -85,7 +85,7 @@ class SpankwireIE(InfoExtractor):
              r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
          heights = [int(video[0]) for video in videos]
          video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
-        if webpage.find('flashvars\.encrypted = "true"') != -1:
+        if webpage.find(r'flashvars\.encrypted = "true"') != -1:
              password = self._search_regex(
                  r'flashvars\.video_title = "([^"]+)',
                  webpage, 'password').replace('+', ' ')
diff --git a/youtube_dl/extractor/spiegeltv.py b/youtube_dl/extractor/spiegeltv.py
index 034bd47ff617bdc96d572b7065b3af03c7117468..e1cfb869834cf0d50b04a11c5fc137d7c9afcad8 100644
--- a/youtube_dl/extractor/spiegeltv.py
+++ b/youtube_dl/extractor/spiegeltv.py
@@ -18,7 +18,7 @@ class SpiegeltvIE(InfoExtractor):
              'ext': 'm4v',
              'title': 'Flug MH370',
              'description': 'Das Rätsel um die Boeing 777 der Malaysia-Airlines',
-            'thumbnail': 're:http://.*\.jpg$',
+            'thumbnail': r're:http://.*\.jpg$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/spike.py b/youtube_dl/extractor/spike.py
index abfee3ece451dd4cfb4a45bd83fd7e29d2004c00..c59896a17905c006eabb40271d846eebe7908a66 100644
--- a/youtube_dl/extractor/spike.py
+++ b/youtube_dl/extractor/spike.py
@@ -46,7 +46,7 @@ class SpikeIE(MTVServicesInfoExtractor):
      _CUSTOM_URL_REGEX = re.compile(r'spikenetworkapp://([^/]+/[-a-fA-F0-9]+)')
  
      def _extract_mgid(self, webpage):
-        mgid = super(SpikeIE, self)._extract_mgid(webpage, default=None)
+        mgid = super(SpikeIE, self)._extract_mgid(webpage)
          if mgid is None:
              url_parts = self._search_regex(self._CUSTOM_URL_REGEX, webpage, 'episode_id')
              video_type, episode_id = url_parts.split('/', 1)
diff --git a/youtube_dl/extractor/sport5.py b/youtube_dl/extractor/sport5.py
index 7e67833062d0a21d2c663b1b5d24246d653f0116..a417b5a4ef0ddf302f11dc36f430572192e64262 100644
--- a/youtube_dl/extractor/sport5.py
+++ b/youtube_dl/extractor/sport5.py
@@ -41,7 +41,7 @@ class Sport5IE(InfoExtractor):
  
          webpage = self._download_webpage(url, media_id)
  
-        video_id = self._html_search_regex('clipId=([\w-]+)', webpage, 'video id')
+        video_id = self._html_search_regex(r'clipId=([\w-]+)', webpage, 'video id')
  
          metadata = self._download_xml(
              'http://sport5-metadata-rr-d.nsacdn.com/vod/vod/%s/HDS/metadata.xml' % video_id,
diff --git a/youtube_dl/extractor/sportbox.py b/youtube_dl/extractor/sportbox.py
index e5c28ae890ee61536052a5716677d486d0a5b43e..e7bd5bf91921752757e1bc9beb390798b86a9c8a 100644
--- a/youtube_dl/extractor/sportbox.py
+++ b/youtube_dl/extractor/sportbox.py
@@ -4,65 +4,7 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
-from ..utils import (
-    js_to_json,
-    unified_strdate,
-)
-
-
-class SportBoxIE(InfoExtractor):
-    _VALID_URL = r'https?://news\.sportbox\.ru/(?:[^/]+/)+spbvideo_NI\d+_(?P<display_id>.+)'
-    _TESTS = [{
-        'url': 'http://news.sportbox.ru/Vidy_sporta/Avtosport/Rossijskij/spbvideo_NI483529_Gonka-2-zaezd-Obyedinenniy-2000-klassi-Turing-i-S',
-        'md5': 'ff56a598c2cf411a9a38a69709e97079',
-        'info_dict': {
-            'id': '80822',
-            'ext': 'mp4',
-            'title': 'Гонка 2  заезд ««Объединенный 2000»: классы Туринг и Супер-продакшн',
-            'description': 'md5:3d72dc4a006ab6805d82f037fdc637ad',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'upload_date': '20140928',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
-    }, {
-        'url': 'http://news.sportbox.ru/Vidy_sporta/billiard/spbvideo_NI486287_CHempionat-mira-po-dinamichnoy-piramide-4',
-        'only_matching': True,
-    }, {
-        'url': 'http://news.sportbox.ru/video/no_ads/spbvideo_NI536574_V_Novorossijske_proshel_detskij_turnir_Pole_slavy_bojevoj?ci=211355',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('display_id')
-
-        webpage = self._download_webpage(url, display_id)
-
-        player = self._search_regex(
-            r'src="/?(vdl/player/[^"]+)"', webpage, 'player')
-
-        title = self._html_search_regex(
-            [r'"nodetitle"\s*:\s*"([^"]+)"', r'class="node-header_{1,2}title">([^<]+)'],
-            webpage, 'title')
-        description = self._og_search_description(webpage) or self._html_search_meta(
-            'description', webpage, 'description')
-        thumbnail = self._og_search_thumbnail(webpage)
-        upload_date = unified_strdate(self._html_search_meta(
-            'dateCreated', webpage, 'upload date'))
-
-        return {
-            '_type': 'url_transparent',
-            'url': compat_urlparse.urljoin(url, '/%s' % player),
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'upload_date': upload_date,
-        }
+from ..utils import js_to_json
  
  
  class SportBoxEmbedIE(InfoExtractor):
@@ -73,7 +15,7 @@ class SportBoxEmbedIE(InfoExtractor):
              'id': '211355',
              'ext': 'mp4',
              'title': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/sportdeutschland.py b/youtube_dl/extractor/sportdeutschland.py
index a9927f6e29d1d52463cefc3503414305bec0e919..a3c35a899a2186f1e937771cd0e34df408b2d361 100644
--- a/youtube_dl/extractor/sportdeutschland.py
+++ b/youtube_dl/extractor/sportdeutschland.py
@@ -20,8 +20,8 @@ class SportDeutschlandIE(InfoExtractor):
              'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen',
              'categories': ['Badminton'],
              'view_count': int,
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
              'timestamp': int,
              'upload_date': 're:^201408[23][0-9]$',
          },
@@ -38,7 +38,7 @@ class SportDeutschlandIE(InfoExtractor):
              'timestamp': 1408976060,
              'duration': 2732,
              'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'view_count': int,
              'categories': ['Li-Ning Badminton WM 2014'],
  
diff --git a/youtube_dl/extractor/srgssr.py b/youtube_dl/extractor/srgssr.py
index 246970c4d98a7d4592deadc1c7744c1504ccefef..319a48a7a543dfcfade0cb91726103a66d864711 100644
--- a/youtube_dl/extractor/srgssr.py
+++ b/youtube_dl/extractor/srgssr.py
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_urllib_parse_urlparse
  from ..utils import (
      ExtractorError,
      parse_iso8601,
@@ -23,6 +24,16 @@ class SRGSSRIE(InfoExtractor):
          'STARTDATE': 'This video is not yet available. Please try again later.',
      }
  
+    def _get_tokenized_src(self, url, video_id, format_id):
+        sp = compat_urllib_parse_urlparse(url).path.split('/')
+        token = self._download_json(
+            'http://tp.srgssr.ch/akahd/token?acl=/%s/%s/*' % (sp[1], sp[2]),
+            video_id, 'Downloading %s token' % format_id, fatal=False) or {}
+        auth_params = token.get('token', {}).get('authparams')
+        if auth_params:
+            url += '?' + auth_params
+        return url
+
      def get_media_data(self, bu, media_type, media_id):
          media_data = self._download_json(
              'http://il.srgssr.ch/integrationlayer/1.0/ue/%s/%s/play/%s.json' % (bu, media_type, media_id),
@@ -37,9 +48,6 @@ class SRGSSRIE(InfoExtractor):
      def _real_extract(self, url):
          bu, media_type, media_id = re.match(self._VALID_URL, url).groups()
  
-        if bu == 'rts':
-            return self.url_result('rts:%s' % media_id, 'RTS')
-
          media_data = self.get_media_data(bu, media_type, media_id)
  
          metadata = media_data['AssetMetadatas']['AssetMetadata'][0]
@@ -61,14 +69,16 @@ class SRGSSRIE(InfoExtractor):
                  asset_url = asset['text']
                  quality = asset['@quality']
                  format_id = '%s-%s' % (protocol, quality)
-                if protocol == 'HTTP-HDS':
-                    formats.extend(self._extract_f4m_formats(
-                        asset_url + '?hdcore=3.4.0', media_id,
-                        f4m_id=format_id, fatal=False))
-                elif protocol == 'HTTP-HLS':
-                    formats.extend(self._extract_m3u8_formats(
-                        asset_url, media_id, 'mp4', 'm3u8_native',
-                        m3u8_id=format_id, fatal=False))
+                if protocol.startswith('HTTP-HDS') or protocol.startswith('HTTP-HLS'):
+                    asset_url = self._get_tokenized_src(asset_url, media_id, format_id)
+                    if protocol.startswith('HTTP-HDS'):
+                        formats.extend(self._extract_f4m_formats(
+                            asset_url + ('?' if '?' not in asset_url else '&') + 'hdcore=3.4.0',
+                            media_id, f4m_id=format_id, fatal=False))
+                    elif protocol.startswith('HTTP-HLS'):
+                        formats.extend(self._extract_m3u8_formats(
+                            asset_url, media_id, 'mp4', 'm3u8_native',
+                            m3u8_id=format_id, fatal=False))
                  else:
                      formats.append({
                          'format_id': format_id,
@@ -94,10 +104,10 @@ class SRGSSRPlayIE(InfoExtractor):
  
      _TESTS = [{
          'url': 'http://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-        'md5': '4cd93523723beff51bb4bee974ee238d',
+        'md5': 'da6b5b3ac9fa4761a942331cef20fcb3',
          'info_dict': {
              'id': '28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-            'ext': 'm4v',
+            'ext': 'mp4',
              'upload_date': '20130701',
              'title': 'Snowden beantragt Asyl in Russland',
              'timestamp': 1372713995,
@@ -140,7 +150,7 @@ class SRGSSRPlayIE(InfoExtractor):
              'uploader': '19h30',
              'upload_date': '20141201',
              'timestamp': 1417458600,
-            'thumbnail': 're:^https?://.*\.image',
+            'thumbnail': r're:^https?://.*\.image',
              'view_count': int,
          },
          'params': {
diff --git a/youtube_dl/extractor/srmediathek.py b/youtube_dl/extractor/srmediathek.py
index b03272f7a273e8a3726adb03d805bd2a449849bf..28baf901c9f021c15544f099f78dd5d5a6b9165c 100644
--- a/youtube_dl/extractor/srmediathek.py
+++ b/youtube_dl/extractor/srmediathek.py
@@ -20,7 +20,7 @@ class SRMediathekIE(ARDMediathekIE):
              'ext': 'mp4',
              'title': 'sportarena (26.10.2014)',
              'description': 'Ringen: KSV Köllerbach gegen Aachen-Walheim; Frauen-Fußball: 1. FC Saarbrücken gegen Sindelfingen; Motorsport: Rallye in Losheim; dazu: Interview mit Timo Bernhard; Turnen: TG Saar; Reitsport: Deutscher Voltigier-Pokal; Badminton: Interview mit Michael Fuchs ',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'skip': 'no longer available',
      }, {
diff --git a/youtube_dl/extractor/stanfordoc.py b/youtube_dl/extractor/stanfordoc.py
index 4a3d8bb8f267b588c59e2f16b208955a70d362d9..cce65fb1014d3595670707d8009832ea37f448dc 100644
--- a/youtube_dl/extractor/stanfordoc.py
+++ b/youtube_dl/extractor/stanfordoc.py
@@ -66,7 +66,7 @@ class StanfordOpenClassroomIE(InfoExtractor):
                  r'(?s)<description>([^<]+)</description>',
                  coursepage, 'description', fatal=False)
  
-            links = orderedSet(re.findall('<a href="(VideoPage.php\?[^"]+)">', coursepage))
+            links = orderedSet(re.findall(r'<a href="(VideoPage.php\?[^"]+)">', coursepage))
              info['entries'] = [self.url_result(
                  'http://openclassroom.stanford.edu/MainFolder/%s' % unescapeHTML(l)
              ) for l in links]
@@ -84,7 +84,7 @@ class StanfordOpenClassroomIE(InfoExtractor):
              rootpage = self._download_webpage(rootURL, info['id'],
                                                errnote='Unable to download course info page')
  
-            links = orderedSet(re.findall('<a href="(CoursePage.php\?[^"]+)">', rootpage))
+            links = orderedSet(re.findall(r'<a href="(CoursePage.php\?[^"]+)">', rootpage))
              info['entries'] = [self.url_result(
                  'http://openclassroom.stanford.edu/MainFolder/%s' % unescapeHTML(l)
              ) for l in links]
diff --git a/youtube_dl/extractor/stitcher.py b/youtube_dl/extractor/stitcher.py
index 0f8782d038c9fdadf903b05479ff468a039c6aa4..97d1ff6811b27140c77932a766b7cb9d3dbfe7b6 100644
--- a/youtube_dl/extractor/stitcher.py
+++ b/youtube_dl/extractor/stitcher.py
@@ -22,7 +22,7 @@ class StitcherIE(InfoExtractor):
              'title': 'Machine Learning Mastery and Cancer Clusters',
              'description': 'md5:55163197a44e915a14a1ac3a1de0f2d3',
              'duration': 1604,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
      }, {
          'url': 'http://www.stitcher.com/podcast/panoply/vulture-tv/e/the-rare-hourlong-comedy-plus-40846275?autoplay=true',
@@ -33,7 +33,7 @@ class StitcherIE(InfoExtractor):
              'title': "The CW's 'Crazy Ex-Girlfriend'",
              'description': 'md5:04f1e2f98eb3f5cbb094cea0f9e19b17',
              'duration': 2235,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
          'params': {
              'skip_download': True,
diff --git a/youtube_dl/extractor/streamable.py b/youtube_dl/extractor/streamable.py
index 2c26fa689003c6203399eca293c32c8998636ea5..e973c867c1a23eeeacbbc706269e50900f4f60b9 100644
--- a/youtube_dl/extractor/streamable.py
+++ b/youtube_dl/extractor/streamable.py
@@ -21,7 +21,7 @@ class StreamableIE(InfoExtractor):
                  'id': 'dnd1',
                  'ext': 'mp4',
                  'title': 'Mikel Oiarzabal scores to make it 0-3 for La Real against Espanyol',
-                'thumbnail': 're:https?://.*\.jpg$',
+                'thumbnail': r're:https?://.*\.jpg$',
                  'uploader': 'teabaker',
                  'timestamp': 1454964157.35115,
                  'upload_date': '20160208',
@@ -37,7 +37,7 @@ class StreamableIE(InfoExtractor):
                  'id': 'moo',
                  'ext': 'mp4',
                  'title': '"Please don\'t eat me!"',
-                'thumbnail': 're:https?://.*\.jpg$',
+                'thumbnail': r're:https?://.*\.jpg$',
                  'timestamp': 1426115495,
                  'upload_date': '20150311',
                  'duration': 12,
diff --git a/youtube_dl/extractor/streetvoice.py b/youtube_dl/extractor/streetvoice.py
index e529051d100b8024007229200648ea259b3d1677..91612c7f22d260c8544cd0ead31dd830daab0424 100644
--- a/youtube_dl/extractor/streetvoice.py
+++ b/youtube_dl/extractor/streetvoice.py
@@ -16,7 +16,7 @@ class StreetVoiceIE(InfoExtractor):
              'ext': 'mp3',
              'title': '輸',
              'description': 'Crispy脆樂團 - 輸',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 260,
              'upload_date': '20091018',
              'uploader': 'Crispy脆樂團',
diff --git a/youtube_dl/extractor/sunporno.py b/youtube_dl/extractor/sunporno.py
index ef9be7926866f6420d802f14cfdf83b3a9e4f69b..68051169b974d7bc748238566be1c31734eb8ed7 100644
--- a/youtube_dl/extractor/sunporno.py
+++ b/youtube_dl/extractor/sunporno.py
@@ -21,7 +21,7 @@ class SunPornoIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'md5:0a400058e8105d39e35c35e7c5184164',
              'description': 'md5:a31241990e1bd3a64e72ae99afb325fb',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 302,
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/svt.py b/youtube_dl/extractor/svt.py
index fb0a4b24ef5bf65ff13ca2288395f09540e71d48..10cf808857e231cee482434010161a93eee85027 100644
--- a/youtube_dl/extractor/svt.py
+++ b/youtube_dl/extractor/svt.py
@@ -129,7 +129,7 @@ class SVTPlayIE(SVTBaseIE):
              'ext': 'mp4',
              'title': 'Flygplan till Haile Selassie',
              'duration': 3527,
-            'thumbnail': 're:^https?://.*[\.-]jpg$',
+            'thumbnail': r're:^https?://.*[\.-]jpg$',
              'age_limit': 0,
              'subtitles': {
                  'sv': [{
diff --git a/youtube_dl/extractor/swrmediathek.py b/youtube_dl/extractor/swrmediathek.py
index 6d69f7686b37bd2b39b6362373eadefedef0b932..0f615979e132278c91deef05cc4f0e812c158f3d 100644
--- a/youtube_dl/extractor/swrmediathek.py
+++ b/youtube_dl/extractor/swrmediathek.py
@@ -1,10 +1,12 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..utils import parse_duration
+from ..utils import (
+    parse_duration,
+    int_or_none,
+    determine_protocol,
+)
  
  
  class SWRMediathekIE(InfoExtractor):
@@ -18,7 +20,7 @@ class SWRMediathekIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'SWR odysso',
              'description': 'md5:2012e31baad36162e97ce9eb3f157b8a',
-            'thumbnail': 're:^http:.*\.jpg$',
+            'thumbnail': r're:^http:.*\.jpg$',
              'duration': 2602,
              'upload_date': '20140515',
              'uploader': 'SWR Fernsehen',
@@ -32,12 +34,13 @@ class SWRMediathekIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Nachtcafé - Alltagsdroge Alkohol - zwischen Sektempfang und Komasaufen',
              'description': 'md5:e0a3adc17e47db2c23aab9ebc36dbee2',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'duration': 5305,
              'upload_date': '20140516',
              'uploader': 'SWR Fernsehen',
              'uploader_id': '990030',
          },
+        'skip': 'redirect to http://swrmediathek.de/index.htm?hinweis=swrlink',
      }, {
          'url': 'http://swrmediathek.de/player.htm?show=bba23e10-cb93-11e3-bf7f-0026b975f2e6',
          'md5': '4382e4ef2c9d7ce6852535fa867a0dd3',
@@ -46,59 +49,67 @@ class SWRMediathekIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Saša Stanišic: Vor dem Fest',
              'description': 'md5:5b792387dc3fbb171eb709060654e8c9',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'duration': 3366,
              'upload_date': '20140520',
              'uploader': 'SWR 2',
              'uploader_id': '284670',
-        }
+        },
+        'skip': 'redirect to http://swrmediathek.de/index.htm?hinweis=swrlink',
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
  
          video = self._download_json(
-            'http://swrmediathek.de/AjaxEntry?ekey=%s' % video_id, video_id, 'Downloading video JSON')
+            'http://swrmediathek.de/AjaxEntry?ekey=%s' % video_id,
+            video_id, 'Downloading video JSON')
  
          attr = video['attr']
-        media_type = attr['entry_etype']
+        title = attr['entry_title']
+        media_type = attr.get('entry_etype')
  
          formats = []
-        for entry in video['sub']:
-            if entry['name'] != 'entry_media':
+        for entry in video.get('sub', []):
+            if entry.get('name') != 'entry_media':
                  continue
  
-            entry_attr = entry['attr']
-            codec = entry_attr['val0']
-            quality = int(entry_attr['val1'])
-
-            fmt = {
-                'url': entry_attr['val2'],
-                'quality': quality,
-            }
-
-            if media_type == 'Video':
-                fmt.update({
-                    'format_note': ['144p', '288p', '544p', '720p'][quality - 1],
-                    'vcodec': codec,
-                })
-            elif media_type == 'Audio':
-                fmt.update({
-                    'acodec': codec,
+            entry_attr = entry.get('attr', {})
+            f_url = entry_attr.get('val2')
+            if not f_url:
+                continue
+            codec = entry_attr.get('val0')
+            if codec == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    f_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif codec == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    f_url + '?hdcore=3.7.0', video_id,
+                    f4m_id='hds', fatal=False))
+            else:
+                formats.append({
+                    'format_id': determine_protocol({'url': f_url}),
+                    'url': f_url,
+                    'quality': int_or_none(entry_attr.get('val1')),
+                    'vcodec': codec if media_type == 'Video' else 'none',
+                    'acodec': codec if media_type == 'Audio' else None,
                  })
-            formats.append(fmt)
-
          self._sort_formats(formats)
  
+        upload_date = None
+        entry_pdatet = attr.get('entry_pdatet')
+        if entry_pdatet:
+            upload_date = entry_pdatet[:-4]
+
          return {
              'id': video_id,
-            'title': attr['entry_title'],
-            'description': attr['entry_descl'],
-            'thumbnail': attr['entry_image_16_9'],
-            'duration': parse_duration(attr['entry_durat']),
-            'upload_date': attr['entry_pdatet'][:-4],
-            'uploader': attr['channel_title'],
-            'uploader_id': attr['channel_idkey'],
+            'title': title,
+            'description': attr.get('entry_descl'),
+            'thumbnail': attr.get('entry_image_16_9'),
+            'duration': parse_duration(attr.get('entry_durat')),
+            'upload_date': upload_date,
+            'uploader': attr.get('channel_title'),
+            'uploader_id': attr.get('channel_idkey'),
              'formats': formats,
          }
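Note on the swrmediathek.py hunks above: the patch swaps hard `attr[...]` and `entry[...]` lookups for `.get()` calls and wraps the quality field in `int_or_none`, so a missing optional field no longer aborts extraction. The following is only a minimal standalone sketch of that pattern. The `int_or_none` here is a simplified stand-in (the real helper lives in `youtube_dl/utils.py` and takes more parameters), and `collect_media_entries` and the `sample` payload are hypothetical names and data invented for illustration.

```python
def int_or_none(value, default=None):
    # Simplified stand-in for youtube_dl.utils.int_or_none (assumption:
    # the real helper accepts additional arguments that are omitted here).
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (ValueError, TypeError):
        return default


def collect_media_entries(video):
    # Hypothetical helper mirroring the loop in the patch: entries that are
    # not media, or that carry no URL, are skipped instead of raising KeyError.
    results = []
    for entry in video.get('sub', []):
        if entry.get('name') != 'entry_media':
            continue
        entry_attr = entry.get('attr', {})
        f_url = entry_attr.get('val2')
        if not f_url:
            continue
        results.append({
            'url': f_url,
            'quality': int_or_none(entry_attr.get('val1')),
            'codec': entry_attr.get('val0'),
        })
    return results


# Made-up payload, for illustration only.
sample = {
    'sub': [
        {'name': 'entry_media',
         'attr': {'val0': 'h264', 'val1': '3', 'val2': 'http://example.com/a.mp4'}},
        {'name': 'entry_media', 'attr': {'val0': 'h264'}},  # no URL: skipped
        {'name': 'entry_other', 'attr': {'val2': 'http://example.com/b.mp4'}},
    ],
}
entries = collect_media_entries(sample)
```

In this sketch only the first entry survives; the other two are dropped quietly, which matches the `continue` branches added by the patch.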
diff --git a/youtube_dl/extractor/tagesschau.py b/youtube_dl/extractor/tagesschau.py
index 8670cee28d381de6011e3187db3024bcc40519de..c351b754594a08be2f585f901c3a71ac425bcfd7 100644
--- a/youtube_dl/extractor/tagesschau.py
+++ b/youtube_dl/extractor/tagesschau.py
@@ -23,7 +23,7 @@ class TagesschauPlayerIE(InfoExtractor):
              'id': '179517',
              'ext': 'mp4',
              'title': 'Marie Kristin Boese, ARD Berlin, über den zukünftigen Kurs der AfD',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
              'formats': 'mincount:6',
          },
      }, {
@@ -33,7 +33,7 @@ class TagesschauPlayerIE(InfoExtractor):
              'id': '29417',
              'ext': 'mp3',
              'title': 'Trabi - Bye, bye Rennpappe',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
              'formats': 'mincount:2',
          },
      }, {
@@ -135,7 +135,7 @@ class TagesschauIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Regierungsumbildung in Athen: Neue Minister in Griechenland vereidigt',
              'description': '18.07.2015 20:10 Uhr',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
          },
      }, {
          'url': 'http://www.tagesschau.de/multimedia/sendung/ts-5727.html',
@@ -145,7 +145,7 @@ class TagesschauIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Sendung: tagesschau \t04.12.2014 20:00 Uhr',
              'description': 'md5:695c01bfd98b7e313c501386327aea59',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
          },
      }, {
          # exclusive audio
@@ -156,7 +156,7 @@ class TagesschauIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Trabi - Bye, bye Rennpappe',
              'description': 'md5:8687dda862cbbe2cfb2df09b56341317',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
          },
      }, {
          # audio in article
@@ -167,7 +167,7 @@ class TagesschauIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Viele Baustellen für neuen BND-Chef',
              'description': 'md5:1e69a54be3e1255b2b07cdbce5bcd8b4',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
          },
      }, {
          'url': 'http://www.tagesschau.de/inland/afd-parteitag-135.html',
diff --git a/youtube_dl/extractor/tass.py b/youtube_dl/extractor/tass.py
index 5293393efc219526b61fe04ff12ff25f1d49b33c..6d336da788b8f3b51b3c61d3d0a2215388f0342a 100644
--- a/youtube_dl/extractor/tass.py
+++ b/youtube_dl/extractor/tass.py
@@ -21,7 +21,7 @@ class TassIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Посетителям московского зоопарка показали красную панду',
                  'description': 'Приехавшую из Дублина Зейну можно увидеть в павильоне "Кошки тропиков"',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
diff --git a/youtube_dl/extractor/tdslifeway.py b/youtube_dl/extractor/tdslifeway.py
index 4d1f5c8016063ce1d18e5152e479b788e6152c25..101c6ee31a97bfa9841aeebffb911f662853d89c 100644
--- a/youtube_dl/extractor/tdslifeway.py
+++ b/youtube_dl/extractor/tdslifeway.py
@@ -13,7 +13,7 @@ class TDSLifewayIE(InfoExtractor):
              'id': '3453494717001',
              'ext': 'mp4',
              'title': 'The Gospel by Numbers',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'upload_date': '20140410',
              'description': 'Coming soon from T4G 2014!',
              'uploader_id': '2034960640001',
diff --git a/youtube_dl/extractor/teachertube.py b/youtube_dl/extractor/teachertube.py
index df5d5556fadf82c8dc680643389fdeccf989793f..f14713a78904c0e879571d6642061f3baa7a617a 100644
--- a/youtube_dl/extractor/teachertube.py
+++ b/youtube_dl/extractor/teachertube.py
@@ -24,7 +24,7 @@ class TeacherTubeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Measures of dispersion from a frequency table',
              'description': 'Measures of dispersion from a frequency table',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.teachertube.com/viewVideo.php?video_id=340064',
@@ -34,7 +34,7 @@ class TeacherTubeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'How to Make Paper Dolls _ Paper Art Projects',
              'description': 'Learn how to make paper dolls in this simple',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.teachertube.com/music.php?music_id=8805',
diff --git a/youtube_dl/extractor/ted.py b/youtube_dl/extractor/ted.py
index 451cde76d2e757fcdfb30ad96847b16aa4d156ff..1b1afab32c349be119f3db8c19c6ed68e5c5ccce 100644
--- a/youtube_dl/extractor/ted.py
+++ b/youtube_dl/extractor/ted.py
@@ -47,7 +47,7 @@ class TEDIE(InfoExtractor):
              'id': 'tSVI8ta_P4w',
              'ext': 'mp4',
              'title': 'Vishal Sikka: The beauty and power of algorithms',
-            'thumbnail': 're:^https?://.+\.jpg',
+            'thumbnail': r're:^https?://.+\.jpg',
              'description': 'md5:6261fdfe3e02f4f579cbbfc00aff73f4',
              'upload_date': '20140122',
              'uploader_id': 'TEDInstitute',
@@ -189,7 +189,7 @@ class TEDIE(InfoExtractor):
                          'format_id': '%s-%sk' % (format_id, bitrate),
                          'tbr': bitrate,
                      })
-                    if re.search('\d+k', h264_url):
+                    if re.search(r'\d+k', h264_url):
                          http_url = h264_url
              elif format_id == 'rtmp':
                  streamer = talk_info.get('streamer')
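Note on the `r'...'` prefixes added throughout this commit, including the `re.search(r'\d+k', h264_url)` line in ted.py above: the change does not alter the pattern text. In an ordinary string literal Python already leaves an unrecognised escape such as `\d` or `\.` untouched, but Python 3.6 began emitting a DeprecationWarning for such escapes, which is presumably the motivation here. A small sketch, using a made-up URL:

```python
import re

# A raw literal and an explicitly escaped ordinary literal are the same string.
assert r'\d+k' == '\\d+k'
assert r're:^https?://.*\.jpg$' == 're:^https?://.*\\.jpg$'

# Made-up URL for illustration; the pattern is the one from the ted.py hunk.
h264_url = 'http://example.com/talks/sample-talk-600k.mp4'
has_bitrate = re.search(r'\d+k', h264_url) is not None
matched_text = re.search(r'\d+k', h264_url).group(0)
```

The raw form keeps each pattern readable as a regular expression and avoids the warning without any change in behaviour.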
diff --git a/youtube_dl/extractor/telebruxelles.py b/youtube_dl/extractor/telebruxelles.py
index eefecc490c5d13476259497e79f7a3ebe68caee7..5886e9c1bb7e0c4e9b192480ac2cfa48118ffe2a 100644
--- a/youtube_dl/extractor/telebruxelles.py
+++ b/youtube_dl/extractor/telebruxelles.py
@@ -7,33 +7,30 @@ from .common import InfoExtractor
  
  
  class TeleBruxellesIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:telebruxelles|bx1)\.be/(news|sport|dernier-jt)/?(?P<id>[^/#?]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:telebruxelles|bx1)\.be/(news|sport|dernier-jt|emission)/?(?P<id>[^/#?]+)'
      _TESTS = [{
-        'url': 'http://www.telebruxelles.be/news/auditions-devant-parlement-francken-galant-tres-attendus/',
-        'md5': '59439e568c9ee42fb77588b2096b214f',
+        'url': 'http://bx1.be/news/que-risque-lauteur-dune-fausse-alerte-a-la-bombe/',
+        'md5': 'a2a67a5b1c3e8c9d33109b902f474fd9',
          'info_dict': {
-            'id': '11942',
-            'display_id': 'auditions-devant-parlement-francken-galant-tres-attendus',
-            'ext': 'flv',
-            'title': 'Parlement : Francken et Galant répondent aux interpellations de l’opposition',
-            'description': 're:Les auditions des ministres se poursuivent*'
-        },
-        'params': {
-            'skip_download': 'requires rtmpdump'
+            'id': '158856',
+            'display_id': 'que-risque-lauteur-dune-fausse-alerte-a-la-bombe',
+            'ext': 'mp4',
+            'title': 'Que risque l’auteur d’une fausse alerte à la bombe ?',
+            'description': 'md5:3cf8df235d44ebc5426373050840e466',
          },
      }, {
-        'url': 'http://www.telebruxelles.be/sport/basket-brussels-bat-mons-80-74/',
-        'md5': '181d3fbdcf20b909309e5aef5c6c6047',
+        'url': 'http://bx1.be/sport/futsal-schaerbeek-sincline-5-3-a-thulin/',
+        'md5': 'dfe07ecc9c153ceba8582ac912687675',
          'info_dict': {
-            'id': '10091',
-            'display_id': 'basket-brussels-bat-mons-80-74',
-            'ext': 'flv',
-            'title': 'Basket : le Brussels bat Mons 80-74',
-            'description': 're:^Ils l\u2019on fait ! En basket, le B*',
-        },
-        'params': {
-            'skip_download': 'requires rtmpdump'
+            'id': '158433',
+            'display_id': 'futsal-schaerbeek-sincline-5-3-a-thulin',
+            'ext': 'mp4',
+            'title': 'Futsal : Schaerbeek s’incline 5-3 à Thulin',
+            'description': 'md5:fd013f1488d5e2dceb9cebe39e2d569b',
          },
+    }, {
+        'url': 'http://bx1.be/emission/bxenf1-gastronomie/',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -50,13 +47,13 @@ class TeleBruxellesIE(InfoExtractor):
              r'file\s*:\s*"(rtmp://[^/]+/vod/mp4:"\s*\+\s*"[^"]+"\s*\+\s*".mp4)"',
              webpage, 'RTMP url')
          rtmp_url = re.sub(r'"\s*\+\s*"', '', rtmp_url)
+        formats = self._extract_wowza_formats(rtmp_url, article_id or display_id)
+        self._sort_formats(formats)
  
          return {
              'id': article_id or display_id,
              'display_id': display_id,
              'title': title,
              'description': description,
-            'url': rtmp_url,
-            'ext': 'flv',
-            'rtmp_live': True  # if rtmpdump is not called with "--live" argument, the download is blocked and can be completed
+            'formats': formats,
          }
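Note on the telebruxelles.py hunk above: the RTMP URL is embedded in the page as a JavaScript string concatenation, so the extractor first captures the pieces together with their `" + "` joiners and then strips the joiners with `re.sub`. A minimal sketch of those two steps, reusing the two patterns shown in the hunk; the `webpage` fragment and its host name are made up, and the real page markup may differ:

```python
import re

# Made-up page fragment; only the overall shape follows the pattern in the hunk.
webpage = 'file: "rtmp://media.example.invalid/vod/mp4:" + "clip_158856" + ".mp4"'

# Step 1: capture the concatenated pieces, quotes and plus signs included.
mobj = re.search(
    r'file\s*:\s*"(rtmp://[^/]+/vod/mp4:"\s*\+\s*"[^"]+"\s*\+\s*".mp4)"',
    webpage)
raw_url = mobj.group(1)

# Step 2: remove every '" + "' joiner to obtain one plain URL.
rtmp_url = re.sub(r'"\s*\+\s*"', '', raw_url)
```

After this commit the resulting URL is handed to `_extract_wowza_formats` rather than returned directly as a single `flv` format.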
diff --git a/youtube_dl/extractor/telegraaf.py b/youtube_dl/extractor/telegraaf.py
index 58078c531d151e319fb7e707d8116a730507962b..0f576c1aba1a01491f657d8d892291cfe7934f21 100644
--- a/youtube_dl/extractor/telegraaf.py
+++ b/youtube_dl/extractor/telegraaf.py
@@ -17,7 +17,7 @@ class TelegraafIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Tikibad ontruimd wegens brand',
              'description': 'md5:05ca046ff47b931f9b04855015e163a4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 33,
          },
          'params': {
diff --git a/youtube_dl/extractor/telemb.py b/youtube_dl/extractor/telemb.py
index 1bbd0e7bdfbf7148ce6c82c57c3e5113263626d3..9bcac4ec008239b5f0a11bea385f674d545bf2d9 100644
--- a/youtube_dl/extractor/telemb.py
+++ b/youtube_dl/extractor/telemb.py
@@ -19,7 +19,7 @@ class TeleMBIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Mons - Cook with Danielle : des cours de cuisine en anglais ! - Les reportages',
                  'description': 'md5:bc5225f47b17c309761c856ad4776265',
-                'thumbnail': 're:^http://.*\.(?:jpg|png)$',
+                'thumbnail': r're:^http://.*\.(?:jpg|png)$',
              }
          },
          {
@@ -32,7 +32,7 @@ class TeleMBIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Havré - Incendie mortel - Les reportages',
                  'description': 'md5:5e54cb449acb029c2b7734e2d946bd4a',
-                'thumbnail': 're:^http://.*\.(?:jpg|png)$',
+                'thumbnail': r're:^http://.*\.(?:jpg|png)$',
              }
          },
      ]
diff --git a/youtube_dl/extractor/telewebion.py b/youtube_dl/extractor/telewebion.py
index 7786b281371181b8e42378cac766946fdf59b762..1207b1a1b8cdcc5fc5b3d1c71b51c54ba1c300e4 100644
--- a/youtube_dl/extractor/telewebion.py
+++ b/youtube_dl/extractor/telewebion.py
@@ -13,7 +13,7 @@ class TelewebionIE(InfoExtractor):
              'id': '1263668',
              'ext': 'mp4',
              'title': 'قرعه\u200cکشی لیگ قهرمانان اروپا',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'view_count': int,
          },
          'params': {
diff --git a/youtube_dl/extractor/theplatform.py b/youtube_dl/extractor/theplatform.py
index cfbf7f4e1562c78ea1d5ae44437694a5325eb70b..192d8fa292e0a6f360929590274d06b4745fb8f6 100644
--- a/youtube_dl/extractor/theplatform.py
+++ b/youtube_dl/extractor/theplatform.py
@@ -33,7 +33,9 @@ _x = lambda p: xpath_with_ns(p, {'smil': default_ns})
  
  class ThePlatformBaseIE(OnceIE):
      def _extract_theplatform_smil(self, smil_url, video_id, note='Downloading SMIL data'):
-        meta = self._download_xml(smil_url, video_id, note=note, query={'format': 'SMIL'})
+        meta = self._download_xml(
+            smil_url, video_id, note=note, query={'format': 'SMIL'},
+            headers=self.geo_verification_headers())
          error_element = find_xpath_attr(meta, _x('.//smil:ref'), 'src')
          if error_element is not None and error_element.attrib['src'].startswith(
                  'http://link.theplatform.com/s/errorFiles/Unavailable.'):
@@ -154,7 +156,7 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
              'title': 'iPhone Siri’s sassy response to a math question has people talking',
              'description': 'md5:a565d1deadd5086f3331d57298ec6333',
              'duration': 83.0,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1435752600,
              'upload_date': '20150701',
              'uploader': 'NBCU-NEWS',
@@ -295,7 +297,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
              'ext': 'mp4',
              'title': 'The Biden factor: will Joe run in 2016?',
              'description': 'Could Vice President Joe Biden be preparing a 2016 campaign? Mark Halperin and Sam Stein weigh in.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140208',
              'timestamp': 1391824260,
              'duration': 467.0,
diff --git a/youtube_dl/extractor/thisamericanlife.py b/youtube_dl/extractor/thisamericanlife.py
index 36493a5de06cb0401b78fc5f1ecf2fca59208cb7..91e45f2c3def81545454e20e0b5e07617fa54030 100644
--- a/youtube_dl/extractor/thisamericanlife.py
+++ b/youtube_dl/extractor/thisamericanlife.py
@@ -13,7 +13,7 @@ class ThisAmericanLifeIE(InfoExtractor):
              'ext': 'm4a',
              'title': '487: Harper High School, Part One',
              'description': 'md5:ee40bdf3fb96174a9027f76dbecea655',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://www.thisamericanlife.org/play_full.php?play=487',
diff --git a/youtube_dl/extractor/thisoldhouse.py b/youtube_dl/extractor/thisoldhouse.py
index 7629f0d10e4ebc40bf25b0f02f52b2524ab9e303..197258df141b4b6864afa0e4c1df7d0db431f64e 100644
--- a/youtube_dl/extractor/thisoldhouse.py
+++ b/youtube_dl/extractor/thisoldhouse.py
@@ -5,10 +5,10 @@ from .common import InfoExtractor
  
  
  class ThisOldHouseIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to)/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
      _TESTS = [{
          'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
-        'md5': '568acf9ca25a639f0c4ff905826b662f',
+        'md5': '946f05bbaa12a33f9ae35580d2dfcfe3',
          'info_dict': {
              'id': '2REGtUDQ',
              'ext': 'mp4',
@@ -20,6 +20,9 @@ class ThisOldHouseIE(InfoExtractor):
      }, {
          'url': 'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
          'only_matching': True,
+    }, {
+        'url': 'https://www.thisoldhouse.com/tv-episode/ask-toh-shelf-rough-electric',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
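Note on the thisoldhouse.py hunks above: adding `tv-episode` to the alternation in `_VALID_URL` is what lets the new `only_matching` test URL be claimed by this extractor. A quick standalone check of the updated pattern against the three test URLs from the diff; `match_id` is a hypothetical helper written for this sketch, not the extractor's own method:

```python
import re

# Updated pattern, copied from the hunk above.
_VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'


def match_id(url):
    # Hypothetical helper: return the 'id' group, or None when the URL
    # does not match the pattern at all.
    mobj = re.match(_VALID_URL, url)
    return mobj.group('id') if mobj else None


urls = [
    'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
    'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
    'https://www.thisoldhouse.com/tv-episode/ask-toh-shelf-rough-electric',
]
ids = [match_id(u) for u in urls]
```

A URL under any other path segment still returns `None` in this sketch, so the change widens the match only by the one new section name.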
diff --git a/youtube_dl/extractor/tinypic.py b/youtube_dl/extractor/tinypic.py
index c43cace24d5bfd107328944d0bd290594ec06b3f..bc2def508c41afbc62bd2148eb12c4a8be0a649e 100644
--- a/youtube_dl/extractor/tinypic.py
+++ b/youtube_dl/extractor/tinypic.py
@@ -34,7 +34,7 @@ class TinyPicIE(InfoExtractor):
          webpage = self._download_webpage(url, video_id, 'Downloading page')
  
          mobj = re.search(r'(?m)fo\.addVariable\("file",\s"(?P<fileid>[\da-z]+)"\);\n'
-                         '\s+fo\.addVariable\("s",\s"(?P<serverid>\d+)"\);', webpage)
+                         r'\s+fo\.addVariable\("s",\s"(?P<serverid>\d+)"\);', webpage)
          if mobj is None:
              raise ExtractorError('Video %s does not exist' % video_id, expected=True)
  
diff --git a/youtube_dl/extractor/tnaflix.py b/youtube_dl/extractor/tnaflix.py
index 77d56b8ca87306a66c22a7e41c5d01de6bba9cb6..7e6ec3430bda4bd042d0b598ad2c7ef4dea53e77 100644
--- a/youtube_dl/extractor/tnaflix.py
+++ b/youtube_dl/extractor/tnaflix.py
@@ -91,7 +91,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
          formats = []
  
          def extract_video_url(vl):
-            return re.sub('speed=\d+', 'speed=', unescapeHTML(vl.text))
+            return re.sub(r'speed=\d+', 'speed=', unescapeHTML(vl.text))
  
          video_link = cfg_xml.find('./videoLink')
          if video_link is not None:
@@ -174,7 +174,7 @@ class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
              'display_id': '6538',
              'ext': 'mp4',
              'title': 'Educational xxx video',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          },
          'params': {
@@ -209,7 +209,7 @@ class TNAFlixIE(TNAFlixNetworkBaseIE):
              'display_id': 'Carmella-Decesare-striptease',
              'ext': 'mp4',
              'title': 'Carmella Decesare - striptease',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 91,
              'age_limit': 18,
              'categories': ['Porn Stars'],
@@ -224,7 +224,7 @@ class TNAFlixIE(TNAFlixNetworkBaseIE):
              'ext': 'mp4',
              'title': 'Educational xxx video',
              'description': 'md5:b4fab8f88a8621c8fabd361a173fe5b8',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 164,
              'age_limit': 18,
              'uploader': 'bobwhite39',
@@ -250,7 +250,7 @@ class EMPFlixIE(TNAFlixNetworkBaseIE):
              'ext': 'mp4',
              'title': 'Amateur Finger Fuck',
              'description': 'Amateur solo finger fucking.',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 83,
              'age_limit': 18,
              'uploader': 'cwbike',
@@ -280,7 +280,7 @@ class MovieFapIE(TNAFlixNetworkBaseIE):
              'ext': 'mp4',
              'title': 'Experienced MILF Amazing Handjob',
              'description': 'Experienced MILF giving an Amazing Handjob',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
              'uploader': 'darvinfred06',
              'view_count': int,
@@ -298,7 +298,7 @@ class MovieFapIE(TNAFlixNetworkBaseIE):
              'ext': 'flv',
              'title': 'Jeune Couple Russe',
              'description': 'Amateur',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
              'uploader': 'whiskeyjar',
              'view_count': int,
diff --git a/youtube_dl/extractor/tudou.py b/youtube_dl/extractor/tudou.py
index bb8b8e23424e7943f2133028aca187d4fcffeab9..2aae55e7e8f8742b471e4f8ffe94ab2ae79bae25 100644
--- a/youtube_dl/extractor/tudou.py
+++ b/youtube_dl/extractor/tudou.py
@@ -23,7 +23,7 @@ class TudouIE(InfoExtractor):
              'id': '159448201',
              'ext': 'f4v',
              'title': '卡马乔国足开大脚长传冲吊集锦',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1372113489000,
              'description': '卡马乔卡家军，开大脚先进战术不完全集锦！',
              'duration': 289.04,
@@ -36,7 +36,7 @@ class TudouIE(InfoExtractor):
              'id': '117049447',
              'ext': 'f4v',
              'title': 'La Sylphide-Bolshoi-Ekaterina Krysanova & Vyacheslav Lopatin 2012',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1349207518000,
              'description': 'md5:294612423894260f2dcd5c6c04fe248b',
              'duration': 5478.33,
diff --git a/youtube_dl/extractor/tumblr.py b/youtube_dl/extractor/tumblr.py
index ebe411e12aa5fa44e201dcaefc52e839e5b2d212..786143525d4d7cf4455ec59eff20a5e3a88dc4ea 100644
--- a/youtube_dl/extractor/tumblr.py
+++ b/youtube_dl/extractor/tumblr.py
@@ -17,7 +17,7 @@ class TumblrIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'tatiana maslany news, Orphan Black || DVD extra - behind the scenes ↳...',
              'description': 'md5:37db8211e40b50c7c44e95da14f630b7',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          }
      }, {
          'url': 'http://5sostrum.tumblr.com/post/90208453769/yall-forgetting-the-greatest-keek-of-them-all',
@@ -27,7 +27,7 @@ class TumblrIE(InfoExtractor):
              'ext': 'mp4',
              'title': '5SOS STRUM ;]',
              'description': 'md5:dba62ac8639482759c8eb10ce474586a',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          }
      }, {
          'url': 'http://hdvideotest.tumblr.com/post/130323439814/test-description-for-my-hd-video',
@@ -37,7 +37,7 @@ class TumblrIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'HD Video Testing \u2014 Test description for my HD video',
              'description': 'md5:97cc3ab5fcd27ee4af6356701541319c',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
          'params': {
              'format': 'hd',
@@ -92,7 +92,7 @@ class TumblrIE(InfoExtractor):
              'title': 'Video by victoriassecret',
              'description': 'Invisibility or flight…which superpower would YOU choose? #VSFashionShow #ThisOrThat',
              'uploader_id': 'victoriassecret',
-            'thumbnail': 're:^https?://.*\.jpg'
+            'thumbnail': r're:^https?://.*\.jpg'
          },
          'add_ie': ['Instagram'],
      }]
diff --git a/youtube_dl/extractor/tunein.py b/youtube_dl/extractor/tunein.py
index ae4cfaec29b493c3b8b8e11705629901a07a2bf2..7e51de89ed6082d35737142e85efb19726b03985 100644
--- a/youtube_dl/extractor/tunein.py
+++ b/youtube_dl/extractor/tunein.py
@@ -11,6 +11,12 @@ from ..compat import compat_urlparse
  class TuneInBaseIE(InfoExtractor):
      _API_BASE_URL = 'http://tunein.com/tuner/tune/'
  
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+src=["\'](?P<url>(?:https?://)?tunein\.com/embed/player/[pst]\d+)',
+            webpage)
+
      def _real_extract(self, url):
          content_id = self._match_id(url)
  
@@ -69,82 +75,83 @@ class TuneInClipIE(TuneInBaseIE):
      _VALID_URL = r'https?://(?:www\.)?tunein\.com/station/.*?audioClipId\=(?P<id>\d+)'
      _API_URL_QUERY = '?tuneType=AudioClip&audioclipId=%s'
  
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/station/?stationId=246119&audioClipId=816',
-            'md5': '99f00d772db70efc804385c6b47f4e77',
-            'info_dict': {
-                'id': '816',
-                'title': '32m',
-                'ext': 'mp3',
-            },
+    _TESTS = [{
+        'url': 'http://tunein.com/station/?stationId=246119&audioClipId=816',
+        'md5': '99f00d772db70efc804385c6b47f4e77',
+        'info_dict': {
+            'id': '816',
+            'title': '32m',
+            'ext': 'mp3',
          },
-    ]
+    }]
  
  
  class TuneInStationIE(TuneInBaseIE):
      IE_NAME = 'tunein:station'
-    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-s|station/.*?StationId\=)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-s|station/.*?StationId=|embed/player/s)(?P<id>\d+)'
      _API_URL_QUERY = '?tuneType=Station&stationId=%s'
  
      @classmethod
      def suitable(cls, url):
          return False if TuneInClipIE.suitable(url) else super(TuneInStationIE, cls).suitable(url)
  
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/radio/Jazz24-885-s34682/',
-            'info_dict': {
-                'id': '34682',
-                'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2',
-                'ext': 'mp3',
-                'location': 'Tacoma, WA',
-            },
-            'params': {
-                'skip_download': True,  # live stream
-            },
+    _TESTS = [{
+        'url': 'http://tunein.com/radio/Jazz24-885-s34682/',
+        'info_dict': {
+            'id': '34682',
+            'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2',
+            'ext': 'mp3',
+            'location': 'Tacoma, WA',
+        },
+        'params': {
+            'skip_download': True,  # live stream
          },
-    ]
+    }, {
+        'url': 'http://tunein.com/embed/player/s6404/',
+        'only_matching': True,
+    }]
  
  
  class TuneInProgramIE(TuneInBaseIE):
      IE_NAME = 'tunein:program'
-    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-p|program/.*?ProgramId\=)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-p|program/.*?ProgramId=|embed/player/p)(?P<id>\d+)'
      _API_URL_QUERY = '?tuneType=Program&programId=%s'
  
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/radio/Jazz-24-p2506/',
-            'info_dict': {
-                'id': '2506',
-                'title': 'Jazz 24 on 91.3 WUKY-HD3',
-                'ext': 'mp3',
-                'location': 'Lexington, KY',
-            },
-            'params': {
-                'skip_download': True,  # live stream
-            },
+    _TESTS = [{
+        'url': 'http://tunein.com/radio/Jazz-24-p2506/',
+        'info_dict': {
+            'id': '2506',
+            'title': 'Jazz 24 on 91.3 WUKY-HD3',
+            'ext': 'mp3',
+            'location': 'Lexington, KY',
          },
-    ]
+        'params': {
+            'skip_download': True,  # live stream
+        },
+    }, {
+        'url': 'http://tunein.com/embed/player/p191660/',
+        'only_matching': True,
+    }]
  
  
  class TuneInTopicIE(TuneInBaseIE):
      IE_NAME = 'tunein:topic'
-    _VALID_URL = r'https?://(?:www\.)?tunein\.com/topic/.*?TopicId\=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:topic/.*?TopicId=|embed/player/t)(?P<id>\d+)'
      _API_URL_QUERY = '?tuneType=Topic&topicId=%s'
  
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/topic/?TopicId=101830576',
-            'md5': 'c31a39e6f988d188252eae7af0ef09c9',
-            'info_dict': {
-                'id': '101830576',
-                'title': 'Votez pour moi du 29 octobre 2015 (29/10/15)',
-                'ext': 'mp3',
-                'location': 'Belgium',
-            },
+    _TESTS = [{
+        'url': 'http://tunein.com/topic/?TopicId=101830576',
+        'md5': 'c31a39e6f988d188252eae7af0ef09c9',
+        'info_dict': {
+            'id': '101830576',
+            'title': 'Votez pour moi du 29 octobre 2015 (29/10/15)',
+            'ext': 'mp3',
+            'location': 'Belgium',
          },
-    ]
+    }, {
+        'url': 'http://tunein.com/embed/player/t101830576/',
+        'only_matching': True,
+    }]
  
  
  class TuneInShortenerIE(InfoExtractor):
diff --git a/youtube_dl/extractor/turbo.py b/youtube_dl/extractor/turbo.py
index 7ae63a4992a74368ec8b5f6a266a298cb6776b79..25aa9c58e522ec0cdecceeeb296f129c8da92a2d 100644
--- a/youtube_dl/extractor/turbo.py
+++ b/youtube_dl/extractor/turbo.py
@@ -24,7 +24,7 @@ class TurboIE(InfoExtractor):
              'duration': 3715,
              'title': 'Turbo du 07/09/2014 : Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia... ',
              'description': 'Turbo du 07/09/2014 : Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia...',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
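Note on the `r're:...'` change above: the sketch below illustrates why the test value is switched to a raw string. It assumes, as the `re:` prefix suggests, that the test harness treats the remainder of the value as a regular expression; the helper name `escape_warnings` and the example.com URLs are hypothetical. The string value itself does not change; only the invalid `\.` escape in the source text goes away.

```python
import re
import warnings


def escape_warnings(source):
    """Compile `source` and return the names of any warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        compile(source, '<demo>', 'exec')
    return [w.category.__name__ for w in caught]


# Source text of the old and the new spelling of the test value.
old_source = "thumbnail = 're:^https?://.*\\.jpg$'"
new_source = "thumbnail = r're:^https?://.*\\.jpg$'"

# Which warning (if any) the old spelling triggers depends on the Python version.
old_warnings = escape_warnings(old_source)
new_warnings = escape_warnings(new_source)

# Both spellings evaluate to the same string.
old_ns, new_ns = {}, {}
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    exec(old_source, old_ns)
exec(new_source, new_ns)
same_value = old_ns['thumbnail'] == new_ns['thumbnail']

# Strip the 're:' prefix to obtain the pattern the value stands for.
pattern = new_ns['thumbnail'][len('re:'):]
print(same_value, old_warnings, new_warnings)
```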
diff --git a/youtube_dl/extractor/turner.py b/youtube_dl/extractor/turner.py
index 57ffedb87f815d4aef87065a235a89871001be41..1c0be9fc6aa97260622b1148763a57dbf25ce50f 100644
--- a/youtube_dl/extractor/turner.py
+++ b/youtube_dl/extractor/turner.py
@@ -100,9 +100,13 @@ class TurnerBaseIE(AdobePassIE):
                  formats.extend(self._extract_smil_formats(
                      video_url, video_id, fatal=False))
              elif ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
+                m3u8_formats = self._extract_m3u8_formats(
                      video_url, video_id, 'mp4',
-                    m3u8_id=format_id or 'hls', fatal=False))
+                    m3u8_id=format_id or 'hls', fatal=False)
+                if '/secure/' in video_url and '?hdnea=' in video_url:
+                    for f in m3u8_formats:
+                        f['_seekable'] = False
+                formats.extend(m3u8_formats)
              elif ext == 'f4m':
                  formats.extend(self._extract_f4m_formats(
                      update_url_query(video_url, {'hdcore': '3.7.0'}),
diff --git a/youtube_dl/extractor/tv2.py b/youtube_dl/extractor/tv2.py
index bd28267b0cb6a0154133c98f567c24f054b5459a..d5071e8a5faf72a25c6b21cdfb2987b6a731fb32 100644
--- a/youtube_dl/extractor/tv2.py
+++ b/youtube_dl/extractor/tv2.py
@@ -126,7 +126,7 @@ class TV2ArticleIE(InfoExtractor):
  
          if not assets:
              # New embed pattern
-            for v in re.findall('TV2ContentboxVideo\(({.+?})\)', webpage):
+            for v in re.findall(r'TV2ContentboxVideo\(({.+?})\)', webpage):
                  video = self._parse_json(
                      v, playlist_id, transform_source=js_to_json, fatal=False)
                  if not video:
diff --git a/youtube_dl/extractor/tv4.py b/youtube_dl/extractor/tv4.py
index 5d2d8f13239e6ac5b10f5506143216301e5d4ecf..ad79db92beb3825dc1293b047acf7c61ca99386a 100644
--- a/youtube_dl/extractor/tv4.py
+++ b/youtube_dl/extractor/tv4.py
@@ -4,11 +4,10 @@ from __future__ import unicode_literals
  from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
-    ExtractorError,
      int_or_none,
      parse_iso8601,
      try_get,
-    update_url_query,
+    determine_ext,
  )
  
  
@@ -28,24 +27,24 @@ class TV4IE(InfoExtractor):
      _TESTS = [
          {
              'url': 'http://www.tv4.se/kalla-fakta/klipp/kalla-fakta-5-english-subtitles-2491650',
-            'md5': '909d6454b87b10a25aa04c4bdd416a9b',
+            'md5': 'cb837212f342d77cec06e6dad190e96d',
              'info_dict': {
                  'id': '2491650',
                  'ext': 'mp4',
                  'title': 'Kalla Fakta 5 (english subtitles)',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'timestamp': int,
                  'upload_date': '20131125',
              },
          },
          {
              'url': 'http://www.tv4play.se/iframe/video/3054113',
-            'md5': '77f851c55139ffe0ebd41b6a5552489b',
+            'md5': 'cb837212f342d77cec06e6dad190e96d',
              'info_dict': {
                  'id': '3054113',
                  'ext': 'mp4',
                  'title': 'Så här jobbar ficktjuvarna - se avslöjande bilder',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'description': 'Unika bilder avslöjar hur turisternas fickor vittjas mitt på Stockholms central. Två experter på ficktjuvarna avslöjar knepen du ska se upp för.',
                  'timestamp': int,
                  'upload_date': '20150130',
@@ -75,11 +74,10 @@ class TV4IE(InfoExtractor):
          # If is_geo_restricted is true, it doesn't necessarily mean we can't download it
          if info.get('is_geo_restricted'):
              self.report_warning('This content might not be available in your country due to licensing restrictions.')
-        if info.get('requires_subscription'):
-            raise ExtractorError('This content requires subscription.', expected=True)
  
          title = info['title']
  
+        subtitles = {}
          formats = []
          # http formats are linked with unresolvable host
          for kind in ('hls', ''):
@@ -87,26 +85,41 @@ class TV4IE(InfoExtractor):
                  'https://prima.tv4play.se/api/web/asset/%s/play.json' % video_id,
                  video_id, 'Downloading sources JSON', query={
                      'protocol': kind,
-                    'videoFormat': 'MP4+WEBVTTS+WEBVTT',
+                    'videoFormat': 'MP4+WEBVTT',
                  })
-            item = try_get(data, lambda x: x['playback']['items']['item'], dict)
-            manifest_url = item.get('url')
-            if not isinstance(manifest_url, compat_str):
+            items = try_get(data, lambda x: x['playback']['items']['item'])
+            if not items:
                  continue
-            if kind == 'hls':
-                formats.extend(self._extract_m3u8_formats(
-                    manifest_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id=kind, fatal=False))
-            else:
-                formats.extend(self._extract_f4m_formats(
-                    update_url_query(manifest_url, {'hdcore': '3.8.0'}),
-                    video_id, f4m_id='hds', fatal=False))
+            if isinstance(items, dict):
+                items = [items]
+            for item in items:
+                manifest_url = item.get('url')
+                if not isinstance(manifest_url, compat_str):
+                    continue
+                ext = determine_ext(manifest_url)
+                if ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        manifest_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                        m3u8_id=kind, fatal=False))
+                elif ext == 'f4m':
+                    formats.extend(self._extract_akamai_formats(
+                        manifest_url, video_id, {
+                            'hls': 'tv4play-i.akamaihd.net',
+                        }))
+                elif ext == 'webvtt':
+                    subtitles = self._merge_subtitles(
+                        subtitles, {
+                            'sv': [{
+                                'url': manifest_url,
+                                'ext': 'vtt',
+                            }]})
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': title,
              'formats': formats,
+            'subtitles': subtitles,
              'description': info.get('description'),
              'timestamp': parse_iso8601(info.get('broadcast_date_time')),
              'duration': int_or_none(info.get('duration')),
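Note on the TV4 hunk above: the new loop accepts either a single item dict or a list of items and then dispatches on the manifest URL's extension. A minimal sketch of that shape follows. `guess_ext` is a hypothetical stand-in for youtube-dl's `determine_ext`, the bucket names are illustrative, and the example.com URLs are made up; the real extractor calls its format helpers instead of collecting URLs.

```python
import posixpath
from urllib.parse import urlparse


def guess_ext(url):
    """Hypothetical stand-in for determine_ext: extension of the URL path."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lstrip('.')


def classify_items(items):
    """Normalise `items` to a list and bucket manifest URLs by extension."""
    if not items:
        return {}
    if isinstance(items, dict):
        # A single item arrives as a bare dict; wrap it so one loop handles both.
        items = [items]
    kinds = {'m3u8': 'hls', 'f4m': 'akamai', 'webvtt': 'subtitles'}
    buckets = {}
    for item in items:
        url = item.get('url')
        if not isinstance(url, str):
            continue
        kind = kinds.get(guess_ext(url))
        if kind:
            buckets.setdefault(kind, []).append(url)
    return buckets


single = classify_items({'url': 'http://example.com/a/master.m3u8?x=1'})
several = classify_items([
    {'url': 'http://example.com/s.webvtt'},
    {'url': None},
    {'url': 'http://example.com/m.f4m'},
])
print(single, several)
```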
diff --git a/youtube_dl/extractor/tva.py b/youtube_dl/extractor/tva.py
new file mode 100644
index 0000000..3ced098
--- /dev/null
+++ b/youtube_dl/extractor/tva.py
@@ -0,0 +1,54 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    smuggle_url,
+)
+
+
+class TVAIE(InfoExtractor):
+    _VALID_URL = r'https?://videos\.tva\.ca/episode/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://videos.tva.ca/episode/85538',
+        'info_dict': {
+            'id': '85538',
+            'ext': 'mp4',
+            'title': 'Épisode du 25 janvier 2017',
+            'description': 'md5:e9e7fb5532ab37984d2dc87229cadf98',
+            'upload_date': '20170126',
+            'timestamp': 1485442329,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            "https://d18jmrhziuoi7p.cloudfront.net/isl/api/v1/dataservice/Items('%s')" % video_id,
+            video_id, query={
+                '$expand': 'Metadata,CustomId',
+                '$select': 'Metadata,Id,Title,ShortDescription,LongDescription,CreatedDate,CustomId,AverageUserRating,Categories,ShowName',
+                '$format': 'json',
+            })
+        metadata = video_data.get('Metadata', {})
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'title': video_data['Title'],
+            'url': smuggle_url('ooyala:' + video_data['CustomId'], {'supportedformats': 'm3u8,hds'}),
+            'description': video_data.get('LongDescription') or video_data.get('ShortDescription'),
+            'series': video_data.get('ShowName'),
+            'episode': metadata.get('EpisodeTitle'),
+            'episode_number': int_or_none(metadata.get('EpisodeNumber')),
+            'categories': video_data.get('Categories'),
+            'average_rating': video_data.get('AverageUserRating'),
+            'timestamp': parse_iso8601(video_data.get('CreatedDate')),
+            'ie_key': 'Ooyala',
+        }
diff --git a/youtube_dl/extractor/tvc.py b/youtube_dl/extractor/tvc.py
index 4065354ddde2c63698908dfac81dc98cac77e79d..008f64cc2e6486cf779f482c24d86f03a740d939 100644
--- a/youtube_dl/extractor/tvc.py
+++ b/youtube_dl/extractor/tvc.py
@@ -19,7 +19,7 @@ class TVCIE(InfoExtractor):
              'id': '74622',
              'ext': 'mp4',
              'title': 'События. "События". Эфир от 22.05.2015 14:30',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1122,
          },
      }
@@ -72,7 +72,7 @@ class TVCArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'События. "События". Эфир от 22.05.2015 14:30',
              'description': 'md5:ad7aa7db22903f983e687b8a3e98c6dd',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1122,
          },
      }, {
@@ -82,7 +82,7 @@ class TVCArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Эксперты: в столице встал вопрос о максимально безопасных остановках',
              'description': 'md5:f2098f71e21f309e89f69b525fd9846e',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 278,
          },
      }, {
@@ -92,7 +92,7 @@ class TVCArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Ещё не поздно. Эфир от 03.08.2013',
              'description': 'md5:51fae9f3f8cfe67abce014e428e5b027',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 3316,
          },
      }]
diff --git a/youtube_dl/extractor/tweakers.py b/youtube_dl/extractor/tweakers.py
index 7a9386cde3d9e0e5d78bfd368d47819430c53e85..2b10d9bcaec909caa303bb33c1621527a3299797 100644
--- a/youtube_dl/extractor/tweakers.py
+++ b/youtube_dl/extractor/tweakers.py
@@ -18,7 +18,7 @@ class TweakersIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'New Nintendo 3DS XL - Op alle fronten beter',
              'description': 'md5:3789b21fed9c0219e9bcaacd43fab280',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
              'duration': 386,
              'uploader_id': 's7JeEm',
          }
diff --git a/youtube_dl/extractor/twentyfourvideo.py b/youtube_dl/extractor/twentyfourvideo.py
index af92b713b08e22343f84a282d3db59b355623f04..a983ebf05ac512242415a3052fbd172668ff060e 100644
--- a/youtube_dl/extractor/twentyfourvideo.py
+++ b/youtube_dl/extractor/twentyfourvideo.py
@@ -12,7 +12,7 @@ from ..utils import (
  
  class TwentyFourVideoIE(InfoExtractor):
      IE_NAME = '24video'
-    _VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx|sex)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'http://www.24video.net/video/view/1044982',
@@ -22,7 +22,7 @@ class TwentyFourVideoIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Эротика каменного века',
              'description': 'Как смотрели порно в каменном веке.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'SUPERTELO',
              'duration': 31,
              'timestamp': 1275937857,
@@ -43,7 +43,7 @@ class TwentyFourVideoIE(InfoExtractor):
          video_id = self._match_id(url)
  
          webpage = self._download_webpage(
-            'http://www.24video.net/video/view/%s' % video_id, video_id)
+            'http://www.24video.sex/video/view/%s' % video_id, video_id)
  
          title = self._og_search_title(webpage)
          description = self._html_search_regex(
@@ -69,11 +69,11 @@ class TwentyFourVideoIE(InfoExtractor):
  
          # Sets some cookies
          self._download_xml(
-            r'http://www.24video.net/video/xml/%s?mode=init' % video_id,
+            r'http://www.24video.sex/video/xml/%s?mode=init' % video_id,
              video_id, 'Downloading init XML')
  
          video_xml = self._download_xml(
-            'http://www.24video.net/video/xml/%s?mode=play' % video_id,
+            'http://www.24video.sex/video/xml/%s?mode=play' % video_id,
              video_id, 'Downloading video XML')
  
          video = xpath_element(video_xml, './/video', 'video', fatal=True)
diff --git a/youtube_dl/extractor/twentymin.py b/youtube_dl/extractor/twentymin.py
index b721ecb0a106a710b6d140d7d21309307196a684..4fd1aa4bfbdaea2ec5abbac2161f6aea25e5fbbd 100644
--- a/youtube_dl/extractor/twentymin.py
+++ b/youtube_dl/extractor/twentymin.py
@@ -4,91 +4,88 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import remove_end
+from ..utils import (
+    int_or_none,
+    try_get,
+)
  
  
  class TwentyMinutenIE(InfoExtractor):
      IE_NAME = '20min'
-    _VALID_URL = r'https?://(?:www\.)?20min\.ch/(?:videotv/*\?.*\bvid=(?P<id>\d+)|(?:[^/]+/)*(?P<display_id>[^/#?]+))'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?20min\.ch/
+                        (?:
+                            videotv/*\?.*?\bvid=|
+                            videoplayer/videoplayer\.html\?.*?\bvideoId@
+                        )
+                        (?P<id>\d+)
+                    '''
      _TESTS = [{
-        # regular video
          'url': 'http://www.20min.ch/videotv/?vid=469148&cid=2',
-        'md5': 'b52d6bc6ea6398e6a38f12cfd418149c',
+        'md5': 'e7264320db31eed8c38364150c12496e',
          'info_dict': {
              'id': '469148',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': '85 000 Franken für 15 perfekte Minuten',
-            'description': 'Was die Besucher vom Silvesterzauber erwarten können. (Video: Alice Grosjean/Murat Temel)',
-            'thumbnail': 'http://thumbnails.20min-tv.ch/server063/469148/frame-72-469148.jpg'
-        }
-    }, {
-        # news article with video
-        'url': 'http://www.20min.ch/schweiz/news/story/-Wir-muessen-mutig-nach-vorne-schauen--22050469',
-        'md5': 'cd4cbb99b94130cff423e967cd275e5e',
-        'info_dict': {
-            'id': '469408',
-            'display_id': '-Wir-muessen-mutig-nach-vorne-schauen--22050469',
-            'ext': 'flv',
-            'title': '«Wir müssen mutig nach vorne schauen»',
-            'description': 'Kein Land sei innovativer als die Schweiz, sagte Johann Schneider-Ammann in seiner Neujahrsansprache. Das Land müsse aber seine Hausaufgaben machen.',
-            'thumbnail': 'http://www.20min.ch/images/content/2/2/0/22050469/10/teaserbreit.jpg'
+            'thumbnail': r're:https?://.*\.jpg$',
          },
-        'skip': '"This video is no longer available" is shown both on the web page and in the downloaded file.',
      }, {
-        # YouTube embed
-        'url': 'http://www.20min.ch/ro/sports/football/story/Il-marque-une-bicyclette-de-plus-de-30-metres--21115184',
-        'md5': 'cec64d59aa01c0ed9dbba9cf639dd82f',
+        'url': 'http://www.20min.ch/videoplayer/videoplayer.html?params=client@twentyDE|videoId@523629',
          'info_dict': {
-            'id': 'ivM7A7SpDOs',
+            'id': '523629',
              'ext': 'mp4',
-            'title': 'GOLAZO DE CHILENA DE JAVI GÓMEZ, FINALISTA AL BALÓN DE CLM 2016',
-            'description': 'md5:903c92fbf2b2f66c09de514bc25e9f5a',
-            'upload_date': '20160424',
-            'uploader': 'RTVCM Castilla-La Mancha',
-            'uploader_id': 'RTVCM',
+            'title': 'So kommen Sie bei Eis und Schnee sicher an',
+            'description': 'md5:117c212f64b25e3d95747e5276863f7d',
+            'thumbnail': r're:https?://.*\.jpg$',
+        },
+        'params': {
+            'skip_download': True,
          },
-        'add_ie': ['Youtube'],
      }, {
          'url': 'http://www.20min.ch/videotv/?cid=44&vid=468738',
          'only_matching': True,
-    }, {
-        'url': 'http://www.20min.ch/ro/sortir/cinema/story/Grandir-au-bahut--c-est-dur-18927411',
-        'only_matching': True,
      }]
  
+    @staticmethod
+    def _extract_urls(webpage):
+        return [m.group('url') for m in re.finditer(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:www\.)?20min\.ch/videoplayer/videoplayer\.html\?.*?\bvideoId@\d+.*?)\1',
+            webpage)]
+
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id') or video_id
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'http://api.20min.ch/video/%s/show' % video_id,
+            video_id)['content']
  
-        webpage = self._download_webpage(url, display_id)
+        title = video['title']
  
-        youtube_url = self._html_search_regex(
-            r'<iframe[^>]+src="((?:https?:)?//www\.youtube\.com/embed/[^"]+)"',
-            webpage, 'YouTube embed URL', default=None)
-        if youtube_url is not None:
-            return self.url_result(youtube_url, 'Youtube')
+        formats = [{
+            'format_id': format_id,
+            'url': 'http://podcast.20min-tv.ch/podcast/20min/%s%s.mp4' % (video_id, p),
+            'quality': quality,
+        } for quality, (format_id, p) in enumerate([('sd', ''), ('hd', 'h')])]
+        self._sort_formats(formats)
  
-        title = self._html_search_regex(
-            r'<h1>.*?<span>(.+?)</span></h1>',
-            webpage, 'title', default=None)
-        if not title:
-            title = remove_end(re.sub(
-                r'^20 [Mm]inuten.*? -', '', self._og_search_title(webpage)), ' - News')
+        description = video.get('lead')
+        thumbnail = video.get('thumbnail')
  
-        if not video_id:
-            video_id = self._search_regex(
-                r'"file\d?"\s*,\s*\"(\d+)', webpage, 'video id')
+        def extract_count(kind):
+            return try_get(
+                video,
+                lambda x: int_or_none(x['communityobject']['thumbs_%s' % kind]))
  
-        description = self._html_search_meta(
-            'description', webpage, 'description')
-        thumbnail = self._og_search_thumbnail(webpage)
+        like_count = extract_count('up')
+        dislike_count = extract_count('down')
  
          return {
              'id': video_id,
-            'display_id': display_id,
-            'url': 'http://speed.20min-tv.ch/%sm.flv' % video_id,
              'title': title,
              'description': description,
              'thumbnail': thumbnail,
+            'like_count': like_count,
+            'dislike_count': dislike_count,
+            'formats': formats,
          }
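Note on the 20min hunk above: the rewritten extractor derives both format URLs from the video id alone, and uses the position in the list as the `quality` rank. The sketch below isolates that comprehension. The function name `build_formats` is hypothetical; the URL template and the `sd`/`hd` suffixes are taken from the patch.

```python
def build_formats(video_id):
    """Build the sd/hd format entries; a later list position means higher quality."""
    return [{
        'format_id': format_id,
        'url': 'http://podcast.20min-tv.ch/podcast/20min/%s%s.mp4' % (video_id, suffix),
        'quality': quality,
    } for quality, (format_id, suffix) in enumerate([('sd', ''), ('hd', 'h')])]


formats = build_formats('469148')
best = max(formats, key=lambda f: f['quality'])
print(best['format_id'], best['url'])
```

Because `enumerate` assigns 0 to `sd` and 1 to `hd`, sorting by `quality` places the `hd` entry last, which is presumably why the patch can hand the list straight to `_sort_formats`.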
diff --git a/youtube_dl/extractor/twitch.py b/youtube_dl/extractor/twitch.py
index 77414a242d68f2309985235f0418d47c77194417..bbba394b0ede953f60c179970c8b41d4013f69c2 100644
--- a/youtube_dl/extractor/twitch.py
+++ b/youtube_dl/extractor/twitch.py
@@ -22,6 +22,7 @@ from ..utils import (
      orderedSet,
      parse_duration,
      parse_iso8601,
+    update_url_query,
      urlencode_postdata,
  )
  
@@ -205,7 +206,14 @@ class TwitchChapterIE(TwitchItemBaseIE):
  
  class TwitchVodIE(TwitchItemBaseIE):
      IE_NAME = 'twitch:vod'
-    _VALID_URL = r'%s/[^/]+/v/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:www\.)?twitch\.tv/(?:[^/]+/v|videos)/|
+                            player\.twitch\.tv/\?.*?\bvideo=v
+                        )
+                        (?P<id>\d+)
+                    '''
      _ITEM_TYPE = 'vod'
      _ITEM_SHORTCUT = 'v'
  
@@ -215,7 +223,7 @@ class TwitchVodIE(TwitchItemBaseIE):
              'id': 'v6528877',
              'ext': 'mp4',
              'title': 'LCK Summer Split - Week 6 Day 1',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 17208,
              'timestamp': 1435131709,
              'upload_date': '20150624',
@@ -235,7 +243,7 @@ class TwitchVodIE(TwitchItemBaseIE):
              'id': 'v11230755',
              'ext': 'mp4',
              'title': 'Untitled Broadcast',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1638,
              'timestamp': 1439746708,
              'upload_date': '20150816',
@@ -248,6 +256,12 @@ class TwitchVodIE(TwitchItemBaseIE):
              'skip_download': True,
          },
          'skip': 'HTTP Error 404: Not Found',
+    }, {
+        'url': 'http://player.twitch.tv/?t=5m10s&video=v6528877',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.twitch.tv/videos/6528877',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -279,6 +293,18 @@ class TwitchVodIE(TwitchItemBaseIE):
          if 't' in query:
              info['start_time'] = parse_duration(query['t'][0])
  
+        if info.get('timestamp') is not None:
+            info['subtitles'] = {
+                'rechat': [{
+                    'url': update_url_query(
+                        'https://rechat.twitch.tv/rechat-messages', {
+                            'video_id': 'v%s' % item_id,
+                            'start': info['timestamp'],
+                        }),
+                    'ext': 'json',
+                }],
+            }
+
          return info
  
  
@@ -300,7 +326,7 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
              response = self._call_api(
                  self._PLAYLIST_PATH % (channel_id, offset, limit),
                  channel_id,
-                'Downloading %s videos JSON page %s'
+                'Downloading %s JSON page %s'
                  % (self._PLAYLIST_TYPE, counter_override or counter))
              page_entries = self._extract_playlist_page(response)
              if not page_entries:
@@ -350,25 +376,85 @@ class TwitchProfileIE(TwitchPlaylistBaseIE):
      }
  
  
-class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
-    IE_NAME = 'twitch:past_broadcasts'
-    _VALID_URL = r'%s/(?P<id>[^/]+)/profile/past_broadcasts/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
-    _PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcasts=true'
+class TwitchVideosBaseIE(TwitchPlaylistBaseIE):
+    _VALID_URL_VIDEOS_BASE = r'%s/(?P<id>[^/]+)/videos' % TwitchBaseIE._VALID_URL_BASE
+    _PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcast_type='
+
+
+class TwitchAllVideosIE(TwitchVideosBaseIE):
+    IE_NAME = 'twitch:videos:all'
+    _VALID_URL = r'%s/all' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
+    _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive,upload,highlight'
+    _PLAYLIST_TYPE = 'all videos'
+
+    _TEST = {
+        'url': 'https://www.twitch.tv/spamfish/videos/all',
+        'info_dict': {
+            'id': 'spamfish',
+            'title': 'Spamfish',
+        },
+        'playlist_mincount': 869,
+    }
+
+
+class TwitchUploadsIE(TwitchVideosBaseIE):
+    IE_NAME = 'twitch:videos:uploads'
+    _VALID_URL = r'%s/uploads' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
+    _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'upload'
+    _PLAYLIST_TYPE = 'uploads'
+
+    _TEST = {
+        'url': 'https://www.twitch.tv/spamfish/videos/uploads',
+        'info_dict': {
+            'id': 'spamfish',
+            'title': 'Spamfish',
+        },
+        'playlist_mincount': 0,
+    }
+
+
+class TwitchPastBroadcastsIE(TwitchVideosBaseIE):
+    IE_NAME = 'twitch:videos:past-broadcasts'
+    _VALID_URL = r'%s/past-broadcasts' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
+    _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive'
      _PLAYLIST_TYPE = 'past broadcasts'
  
      _TEST = {
-        'url': 'http://www.twitch.tv/spamfish/profile/past_broadcasts',
+        'url': 'https://www.twitch.tv/spamfish/videos/past-broadcasts',
+        'info_dict': {
+            'id': 'spamfish',
+            'title': 'Spamfish',
+        },
+        'playlist_mincount': 0,
+    }
+
+
+class TwitchHighlightsIE(TwitchVideosBaseIE):
+    IE_NAME = 'twitch:videos:highlights'
+    _VALID_URL = r'%s/highlights' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
+    _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'highlight'
+    _PLAYLIST_TYPE = 'highlights'
+
+    _TEST = {
+        'url': 'https://www.twitch.tv/spamfish/videos/highlights',
          'info_dict': {
              'id': 'spamfish',
              'title': 'Spamfish',
          },
-        'playlist_mincount': 54,
+        'playlist_mincount': 805,
      }
  
  
  class TwitchStreamIE(TwitchBaseIE):
      IE_NAME = 'twitch:stream'
-    _VALID_URL = r'%s/(?P<id>[^/#?]+)/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:www\.)?twitch\.tv/|
+                            player\.twitch\.tv/\?.*?\bchannel=
+                        )
+                        (?P<id>[^/#?]+)
+                    '''
  
      _TESTS = [{
          'url': 'http://www.twitch.tv/shroomztv',
@@ -392,8 +478,25 @@ class TwitchStreamIE(TwitchBaseIE):
      }, {
          'url': 'http://www.twitch.tv/miracle_doto#profile-0',
          'only_matching': True,
+    }, {
+        'url': 'https://player.twitch.tv/?channel=lotsofs',
+        'only_matching': True,
      }]
  
+    @classmethod
+    def suitable(cls, url):
+        return (False
+                if any(ie.suitable(url) for ie in (
+                    TwitchVideoIE,
+                    TwitchChapterIE,
+                    TwitchVodIE,
+                    TwitchProfileIE,
+                    TwitchAllVideosIE,
+                    TwitchUploadsIE,
+                    TwitchPastBroadcastsIE,
+                    TwitchHighlightsIE))
+                else super(TwitchStreamIE, cls).suitable(url))
+
      def _real_extract(self, url):
          channel_id = self._match_id(url)
  
@@ -474,7 +577,7 @@ class TwitchClipsIE(InfoExtractor):
              'id': 'AggressiveCobraPoooound',
              'ext': 'mp4',
              'title': 'EA Play 2016 Live from the Novo Theatre',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'creator': 'EA',
              'uploader': 'stereotype_',
              'uploader_id': 'stereotype_',
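Note on the Twitch hunks above: the new `TwitchStreamIE._VALID_URL` no longer anchors at the end of the URL, so on its own it would also match VOD URLs such as `/videos/<id>`. That appears to be the reason for the `suitable()` override, which defers to the more specific extractors first. The sketch below reproduces the two patterns from the patch and imitates that ordering with a hypothetical `pick_extractor` helper; it is not how youtube-dl's dispatcher is implemented, and it covers only two of the extractors involved.

```python
import re

# Pattern copied from the new TwitchVodIE._VALID_URL.
VOD_URL = re.compile(r'''(?x)
    https?://
        (?:
            (?:www\.)?twitch\.tv/(?:[^/]+/v|videos)/|
            player\.twitch\.tv/\?.*?\bvideo=v
        )
        (?P<id>\d+)
    ''')

# Pattern copied from the new TwitchStreamIE._VALID_URL.
STREAM_URL = re.compile(r'''(?x)
    https?://
        (?:
            (?:www\.)?twitch\.tv/|
            player\.twitch\.tv/\?.*?\bchannel=
        )
        (?P<id>[^/#?]+)
    ''')


def pick_extractor(url):
    """Return (extractor name, id), trying the VOD pattern before the stream one."""
    m = VOD_URL.match(url)
    if m:
        return 'twitch:vod', m.group('id')
    m = STREAM_URL.match(url)
    if m:
        return 'twitch:stream', m.group('id')
    return None


# Without the ordering, the stream pattern would claim this VOD URL.
overlap = STREAM_URL.match('https://www.twitch.tv/videos/6528877').group('id')
print(overlap, pick_extractor('https://www.twitch.tv/videos/6528877'))
```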
diff --git a/youtube_dl/extractor/twitter.py b/youtube_dl/extractor/twitter.py
index ac0b221b4f5ab02c33f1776389cba849bf00ae2b..37e3bc4129fdc43033556d5ab9941965f70b3b8c 100644
--- a/youtube_dl/extractor/twitter.py
+++ b/youtube_dl/extractor/twitter.py
@@ -34,7 +34,7 @@ class TwitterCardIE(TwitterBaseIE):
                  'id': '560070183650213889',
                  'ext': 'mp4',
                  'title': 'Twitter Card',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 30.033,
              }
          },
@@ -45,7 +45,7 @@ class TwitterCardIE(TwitterBaseIE):
                  'id': '623160978427936768',
                  'ext': 'mp4',
                  'title': 'Twitter Card',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 80.155,
              },
          },
@@ -82,7 +82,7 @@ class TwitterCardIE(TwitterBaseIE):
                  'id': '705235433198714880',
                  'ext': 'mp4',
                  'title': 'Twitter web player',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
              },
          }, {
              'url': 'https://twitter.com/i/videos/752274308186120192',
@@ -201,7 +201,7 @@ class TwitterIE(InfoExtractor):
              'id': '643211948184596480',
              'ext': 'mp4',
              'title': 'FREE THE NIPPLE - FTN supporters on Hollywood Blvd today!',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'description': 'FREE THE NIPPLE on Twitter: "FTN supporters on Hollywood Blvd today! http://t.co/c7jHH749xJ"',
              'uploader': 'FREE THE NIPPLE',
              'uploader_id': 'freethenipple',
@@ -217,7 +217,7 @@ class TwitterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Gifs - tu vai cai tu vai cai tu nao eh capaz disso tu vai cai',
              'description': 'Gifs on Twitter: "tu vai cai tu vai cai tu nao eh capaz disso tu vai cai https://t.co/tM46VHFlO5"',
-            'thumbnail': 're:^https?://.*\.png',
+            'thumbnail': r're:^https?://.*\.png',
              'uploader': 'Gifs',
              'uploader_id': 'giphz',
          },
@@ -257,7 +257,7 @@ class TwitterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
              'description': 'JG on Twitter: "BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'uploader': 'JG',
              'uploader_id': 'jaydingeer',
          },
diff --git a/youtube_dl/extractor/udn.py b/youtube_dl/extractor/udn.py
index 57dd73aef6f6254f22cdcd814e2d76b20c75b847..daf45d0b4e1a3710832875f79e160ebc759849dd 100644
--- a/youtube_dl/extractor/udn.py
+++ b/youtube_dl/extractor/udn.py
@@ -23,7 +23,7 @@ class UDNEmbedIE(InfoExtractor):
              'id': '300040',
              'ext': 'mp4',
              'title': '生物老師男變女 全校挺"做自己"',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/uktvplay.py b/youtube_dl/extractor/uktvplay.py
new file mode 100644
index 0000000..2137502
--- /dev/null
+++ b/youtube_dl/extractor/uktvplay.py
@@ -0,0 +1,33 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class UKTVPlayIE(InfoExtractor):
+    _VALID_URL = r'https?://uktvplay\.uktv\.co\.uk/.+?\?.*?\bvideo=(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://uktvplay.uktv.co.uk/shows/world-at-war/c/200/watch-online/?video=2117008346001',
+        'md5': '',
+        'info_dict': {
+            'id': '2117008346001',
+            'ext': 'mp4',
+            'title': 'Pincers',
+            'description': 'Pincers',
+            'uploader_id': '1242911124001',
+            'upload_date': '20130124',
+            'timestamp': 1359049267,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'expected_warnings': ['Failed to download MPD manifest']
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911124001/H1xnMOqP_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            self.BRIGHTCOVE_URL_TEMPLATE % video_id,
+            'BrightcoveNew', video_id)
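Note on the new uktvplay.py extractor: it only pulls the numeric `video` parameter out of the page URL and hands a Brightcove player URL to another extractor. A standalone sketch of that mapping, reusing the two strings from the file; `brightcove_url_for` is a hypothetical name, and the real class delegates through `url_result` rather than returning a plain string.

```python
import re

# Copied from the new extractor.
VALID_URL = r'https?://uktvplay\.uktv\.co\.uk/.+?\?.*?\bvideo=(?P<id>\d+)'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911124001/H1xnMOqP_default/index.html?videoId=%s'


def brightcove_url_for(url):
    """Return the Brightcove player URL for a UKTV Play URL, or None."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    return BRIGHTCOVE_URL_TEMPLATE % mobj.group('id')
```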
diff --git a/youtube_dl/extractor/uol.py b/youtube_dl/extractor/uol.py
index c27c643871a5c741a11a134474c2dba20dc91e5e..e67083004789f250faf842ee31fc2b343ad54754 100644
--- a/youtube_dl/extractor/uol.py
+++ b/youtube_dl/extractor/uol.py
@@ -84,12 +84,27 @@ class UOLIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        if not video_id.isdigit():
-            embed_page = self._download_webpage('https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id, video_id)
-            video_id = self._search_regex(r'mediaId=(\d+)', embed_page, 'media id')
+        media_id = None
+
+        if video_id.isdigit():
+            media_id = video_id
+
+        if not media_id:
+            embed_page = self._download_webpage(
+                'https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id,
+                video_id, 'Downloading embed page', fatal=False)
+            if embed_page:
+                media_id = self._search_regex(
+                    (r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'),
+                    embed_page, 'media id', default=None)
+
+        if not media_id:
+            webpage = self._download_webpage(url, video_id)
+            media_id = self._search_regex(r'mediaId=(\d+)', webpage, 'media id')
+
          video_data = self._download_json(
-            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % video_id,
-            video_id)['item']
+            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % media_id,
+            media_id)['item']
          title = video_data['title']
  
          query = {
@@ -118,7 +133,7 @@ class UOLIE(InfoExtractor):
              tags.append(tag_description)
  
          return {
-            'id': video_id,
+            'id': media_id,
              'title': title,
              'description': clean_html(video_data.get('desMedia')),
              'thumbnail': video_data.get('thumbnail'),
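Note on the uol.py change: the media id is now resolved through a chain of fallbacks (numeric URL id, then the embed page, then the original page). A simplified sketch of that order with the page bodies passed in as strings; `find_media_id` is a hypothetical name, and unlike the extractor it returns `None` instead of raising when nothing matches.

```python
import re


def find_media_id(video_id, embed_page=None, webpage=None):
    """Resolve a numeric media id using the fallback order of the hunk above."""
    # 1. A purely numeric id is already the media id.
    if video_id.isdigit():
        return video_id
    # 2. Try the embed page with both patterns, in order.
    if embed_page:
        for pattern in (r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'):
            mobj = re.search(pattern, embed_page)
            if mobj:
                return mobj.group(1)
    # 3. Fall back to the original page.
    if webpage:
        mobj = re.search(r'mediaId=(\d+)', webpage)
        if mobj:
            return mobj.group(1)
    return None
```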
diff --git a/youtube_dl/extractor/uplynk.py b/youtube_dl/extractor/uplynk.py
index 2cd22cf8a1afa51403b3b9801ca7dd08c03503a9..f06bf5b127fd0f352937d79fa6d5267fcb7cdb26 100644
--- a/youtube_dl/extractor/uplynk.py
+++ b/youtube_dl/extractor/uplynk.py
@@ -30,7 +30,9 @@ class UplynkIE(InfoExtractor):
      def _extract_uplynk_info(self, uplynk_content_url):
          path, external_id, video_id, session_id = re.match(UplynkIE._VALID_URL, uplynk_content_url).groups()
          display_id = video_id or external_id
-        formats = self._extract_m3u8_formats('http://content.uplynk.com/%s.m3u8' % path, display_id, 'mp4')
+        formats = self._extract_m3u8_formats(
+            'http://content.uplynk.com/%s.m3u8' % path,
+            display_id, 'mp4', 'm3u8_native')
          if session_id:
              for f in formats:
                  f['extra_param_to_segment_url'] = 'pbs=' + session_id
diff --git a/youtube_dl/extractor/urort.py b/youtube_dl/extractor/urort.py
index 8872cfcb2795ab0bfb9db1ad5418eb61dd0dffc6..8f6edab4b1f21b241b41accfe4cafefc2dd0092f 100644
--- a/youtube_dl/extractor/urort.py
+++ b/youtube_dl/extractor/urort.py
@@ -21,7 +21,7 @@ class UrortIE(InfoExtractor):
              'id': '33124-24',
              'ext': 'mp3',
              'title': 'The Bomb',
-            'thumbnail': 're:^https?://.+\.jpg',
+            'thumbnail': r're:^https?://.+\.jpg',
              'uploader': 'Gerilja',
              'uploader_id': 'Gerilja',
              'upload_date': '20100323',
diff --git a/youtube_dl/extractor/ustream.py b/youtube_dl/extractor/ustream.py
index 0c06bf36bd5f76cabecc47e699ad56a45ba63a4a..5737d4d16c853193f5beb08b2eff7126c6be3d3c 100644
--- a/youtube_dl/extractor/ustream.py
+++ b/youtube_dl/extractor/ustream.py
@@ -69,6 +69,13 @@ class UstreamIE(InfoExtractor):
          },
      }]
  
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
+        if mobj is not None:
+            return mobj.group('url')
+
      def _get_stream_info(self, url, video_id, app_id_ver, extra_note=None):
          def num_to_hex(n):
              return hex(n)[2:]
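Note on the new `_extract_url` helper in ustream.py: it looks for an embed iframe and returns its `src`. A standalone sketch using the same pattern; `extract_ustream_url` and the HTML snippets in the usage are hypothetical. As written, the pattern accepts only `http://` embed URLs.

```python
import re

# Same pattern as in the hunk above.
IFRAME_RE = r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1'


def extract_ustream_url(webpage):
    """Return the first Ustream embed URL found in the page, or None."""
    mobj = re.search(IFRAME_RE, webpage)
    if mobj is not None:
        return mobj.group('url')
    return None
```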
diff --git a/youtube_dl/extractor/ustudio.py b/youtube_dl/extractor/ustudio.py
index 3484a204658e1f09d472c0b31026ec6621121f1f..56509beedc027ae9eae3f3f36d91be32238d729a 100644
--- a/youtube_dl/extractor/ustudio.py
+++ b/youtube_dl/extractor/ustudio.py
@@ -22,7 +22,7 @@ class UstudioIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'San Francisco: Golden Gate Bridge',
              'description': 'md5:23925500697f2c6d4830e387ba51a9be',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20111107',
              'uploader': 'Tony Farley',
          }
diff --git a/youtube_dl/extractor/varzesh3.py b/youtube_dl/extractor/varzesh3.py
index 84698371a8ab2daf77faae1684141eb32425f232..f474ed73f861910d9c593510a4aff6be8244e903 100644
--- a/youtube_dl/extractor/varzesh3.py
+++ b/youtube_dl/extractor/varzesh3.py
@@ -22,7 +22,7 @@ class Varzesh3IE(InfoExtractor):
              'ext': 'mp4',
              'title': '۵ واکنش برتر دروازه‌بانان؛هفته ۲۶ بوندسلیگا',
              'description': 'فصل ۲۰۱۵-۲۰۱۴',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'skip': 'HTTP 404 Error',
      }, {
@@ -67,7 +67,7 @@ class Varzesh3IE(InfoExtractor):
              webpage, display_id, default=None)
          if video_id is None:
              video_id = self._search_regex(
-                'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id',
+                r'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id',
                  default=display_id)
  
          return {
diff --git a/youtube_dl/extractor/vbox7.py b/youtube_dl/extractor/vbox7.py
index a1e0851b7424e4c73cd34b72c02f16bc1905b6ce..bef6394626d4eca25785d35199c1092f69f45b54 100644
--- a/youtube_dl/extractor/vbox7.py
+++ b/youtube_dl/extractor/vbox7.py
@@ -4,11 +4,22 @@ from __future__ import unicode_literals
  import re
  
  from .common import InfoExtractor
-from ..utils import urlencode_postdata
+from ..utils import ExtractorError
  
  
  class Vbox7IE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vbox7\.com/(?:play:|emb/external\.php\?.*?\bvid=)(?P<id>[\da-fA-F]+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:[^/]+\.)?vbox7\.com/
+                        (?:
+                            play:|
+                            (?:
+                                emb/external\.php|
+                                player/ext\.swf
+                            )\?.*?\bvid=
+                        )
+                        (?P<id>[\da-fA-F]+)
+                    '''
      _TESTS = [{
          'url': 'http://vbox7.com/play:0946fff23c',
          'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
@@ -16,6 +27,14 @@ class Vbox7IE(InfoExtractor):
              'id': '0946fff23c',
              'ext': 'mp4',
              'title': 'Борисов: Притеснен съм за бъдещето на България',
+            'description': 'По думите му е опасно страната ни да бъде обявена за "сигурна"',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'timestamp': 1470982814,
+            'upload_date': '20160812',
+            'uploader': 'zdraveibulgaria',
+        },
+        'params': {
+            'proxy': '127.0.0.1:8118',
          },
      }, {
          'url': 'http://vbox7.com/play:249bb972c2',
@@ -29,12 +48,15 @@ class Vbox7IE(InfoExtractor):
      }, {
          'url': 'http://vbox7.com/emb/external.php?vid=a240d20f9c&autoplay=1',
          'only_matching': True,
+    }, {
+        'url': 'http://i49.vbox7.com/player/ext.swf?vid=0946fff23c&autoplay=1',
+        'only_matching': True,
      }]
  
      @staticmethod
      def _extract_url(webpage):
          mobj = re.search(
-            '<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
+            r'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
              webpage)
          if mobj:
              return mobj.group('url')
@@ -42,33 +64,41 @@ class Vbox7IE(InfoExtractor):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        webpage = self._download_webpage(
-            'http://vbox7.com/play:%s' % video_id, video_id)
-
-        title = self._html_search_regex(
-            r'<title>(.+?)</title>', webpage, 'title').split('/')[0].strip()
+        response = self._download_json(
+            'https://www.vbox7.com/ajax/video/nextvideo.php?vid=%s' % video_id,
+            video_id)
  
-        video_url = self._search_regex(
-            r'src\s*:\s*(["\'])(?P<url>.+?.mp4.*?)\1',
-            webpage, 'video url', default=None, group='url')
+        if 'error' in response:
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, response['error']), expected=True)
  
-        thumbnail_url = self._og_search_thumbnail(webpage)
+        video = response['options']
  
-        if not video_url:
-            info_response = self._download_webpage(
-                'http://vbox7.com/play/magare.do', video_id,
-                'Downloading info webpage',
-                data=urlencode_postdata({'as3': '1', 'vid': video_id}),
-                headers={'Content-Type': 'application/x-www-form-urlencoded'})
-            final_url, thumbnail_url = map(
-                lambda x: x.split('=')[1], info_response.split('&'))
+        title = video['title']
+        video_url = video['src']
  
          if '/na.mp4' in video_url:
              self.raise_geo_restricted()
  
-        return {
+        uploader = video.get('uploader')
+
+        webpage = self._download_webpage(
+            'http://vbox7.com/play:%s' % video_id, video_id, fatal=None)
+
+        info = {}
+
+        if webpage:
+            info = self._search_json_ld(
+                webpage.replace('"/*@context"', '"@context"'), video_id,
+                fatal=False)
+
+        info.update({
              'id': video_id,
-            'url': self._proto_relative_url(video_url, 'http:'),
              'title': title,
-            'thumbnail': thumbnail_url,
-        }
+            'url': video_url,
+            'uploader': uploader,
+            'thumbnail': self._proto_relative_url(
+                info.get('thumbnail') or self._og_search_thumbnail(webpage),
+                'http:'),
+        })
+        return info
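Note on the rewritten vbox7.py `_VALID_URL`: the verbose pattern now accepts any subdomain and the `player/ext.swf` embed in addition to the two earlier URL forms. A standalone sketch that checks it against the URLs listed in `_TESTS`; `vbox7_id` is a hypothetical helper name.

```python
import re

# Same verbose pattern as in the hunk above.
VALID_URL = r'''(?x)
    https?://
        (?:[^/]+\.)?vbox7\.com/
        (?:
            play:|
            (?:
                emb/external\.php|
                player/ext\.swf
            )\?.*?\bvid=
        )
        (?P<id>[\da-fA-F]+)
    '''


def vbox7_id(url):
    """Return the hexadecimal video id from a vbox7 URL, or None."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```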
diff --git a/youtube_dl/extractor/vessel.py b/youtube_dl/extractor/vessel.py
index 6b9c227db7a8a88e89b2df8efd3e067613bf605a..80a643dfe6d6a7a160cb4035b52b9a95b03769cf 100644
--- a/youtube_dl/extractor/vessel.py
+++ b/youtube_dl/extractor/vessel.py
@@ -24,7 +24,7 @@ class VesselIE(InfoExtractor):
              'id': 'HDN7G5UMs',
              'ext': 'mp4',
              'title': 'Nvidia GeForce GTX Titan X - The Best Video Card on the Market?',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20150317',
              'description': 'Did Nvidia pull out all the stops on the Titan X, or does its performance leave something to be desired?',
              'timestamp': int,
diff --git a/youtube_dl/extractor/vevo.py b/youtube_dl/extractor/vevo.py
index d82261e5eec5f3c575bc48f23b23c64aa0355f83..c4e37f694426c175b1f33d7795ca01baf6f7547b 100644
--- a/youtube_dl/extractor/vevo.py
+++ b/youtube_dl/extractor/vevo.py
@@ -4,9 +4,9 @@ import re
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_etree_fromstring,
      compat_str,
      compat_urlparse,
+    compat_HTTPError,
  )
  from ..utils import (
      ExtractorError,
@@ -140,21 +140,6 @@ class VevoIE(VevoBaseIE):
          'url': 'http://www.vevo.com/watch/INS171400764',
          'only_matching': True,
      }]
-    _SMIL_BASE_URL = 'http://smil.lvl3.vevo.com'
-    _SOURCE_TYPES = {
-        0: 'youtube',
-        1: 'brightcove',
-        2: 'http',
-        3: 'hls_ios',
-        4: 'hls',
-        5: 'smil',  # http
-        7: 'f4m_cc',
-        8: 'f4m_ak',
-        9: 'f4m_l3',
-        10: 'ism',
-        13: 'smil',  # rtmp
-        18: 'dash',
-    }
      _VERSIONS = {
          0: 'youtube',  # only in AuthenticateVideo videoVersions
          1: 'level3',
@@ -163,41 +148,6 @@ class VevoIE(VevoBaseIE):
          4: 'amazon',
      }
  
-    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
-        formats = []
-        els = smil.findall('.//{http://www.w3.org/2001/SMIL20/Language}video')
-        for el in els:
-            src = el.attrib['src']
-            m = re.match(r'''(?xi)
-                (?P<ext>[a-z0-9]+):
-                (?P<path>
-                    [/a-z0-9]+     # The directory and main part of the URL
-                    _(?P<tbr>[0-9]+)k
-                    _(?P<width>[0-9]+)x(?P<height>[0-9]+)
-                    _(?P<vcodec>[a-z0-9]+)
-                    _(?P<vbr>[0-9]+)
-                    _(?P<acodec>[a-z0-9]+)
-                    _(?P<abr>[0-9]+)
-                    \.[a-z0-9]+  # File extension
-                )''', src)
-            if not m:
-                continue
-
-            format_url = self._SMIL_BASE_URL + m.group('path')
-            formats.append({
-                'url': format_url,
-                'format_id': 'smil_' + m.group('tbr'),
-                'vcodec': m.group('vcodec'),
-                'acodec': m.group('acodec'),
-                'tbr': int(m.group('tbr')),
-                'vbr': int(m.group('vbr')),
-                'abr': int(m.group('abr')),
-                'ext': m.group('ext'),
-                'width': int(m.group('width')),
-                'height': int(m.group('height')),
-            })
-        return formats
-
      def _initialize_api(self, video_id):
          req = sanitized_Request(
              'http://www.vevo.com/auth', data=b'')
@@ -206,7 +156,7 @@ class VevoIE(VevoBaseIE):
              note='Retrieving oauth token',
              errnote='Unable to retrieve oauth token')
  
-        if 'THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION' in webpage:
+        if re.search(r'(?i)THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION', webpage):
              self.raise_geo_restricted(
                  '%s said: This page is currently unavailable in your region' % self.IE_NAME)
  
@@ -214,148 +164,91 @@ class VevoIE(VevoBaseIE):
          self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['access_token']
  
      def _call_api(self, path, *args, **kwargs):
-        return self._download_json(self._api_url_template % path, *args, **kwargs)
+        try:
+            data = self._download_json(self._api_url_template % path, *args, **kwargs)
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError):
+                errors = self._parse_json(e.cause.read().decode(), None)['errors']
+                error_message = ', '.join([error['message'] for error in errors])
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
+            raise
+        return data
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        json_url = 'http://api.vevo.com/VideoService/AuthenticateVideo?isrc=%s' % video_id
-        response = self._download_json(
-            json_url, video_id, 'Downloading video info',
-            'Unable to download info', fatal=False) or {}
-        video_info = response.get('video') or {}
+        self._initialize_api(video_id)
+
+        video_info = self._call_api(
+            'video/%s' % video_id, video_id, 'Downloading api video info',
+            'Failed to download video info')
+
+        video_versions = self._call_api(
+            'video/%s/streams' % video_id, video_id,
+            'Downloading video versions info',
+            'Failed to download video versions info',
+            fatal=False)
+
+        # Some videos are only available via webpage (e.g.
+        # https://github.com/rg3/youtube-dl/issues/9366)
+        if not video_versions:
+            webpage = self._download_webpage(url, video_id)
+            video_versions = self._extract_json(webpage, video_id, 'streams')[video_id][0]
+
+        uploader = None
          artist = None
          featured_artist = None
-        uploader = None
-        view_count = None
+        artists = video_info.get('artists')
+        for curr_artist in artists:
+            if curr_artist.get('role') == 'Featured':
+                featured_artist = curr_artist['name']
+            else:
+                artist = uploader = curr_artist['name']
+
          formats = []
+        for video_version in video_versions:
+            version = self._VERSIONS.get(video_version['version'])
+            version_url = video_version.get('url')
+            if not version_url:
+                continue
  
-        if not video_info:
-            try:
-                self._initialize_api(video_id)
-            except ExtractorError:
-                ytid = response.get('errorInfo', {}).get('ytid')
-                if ytid:
-                    self.report_warning(
-                        'Video is geoblocked, trying with the YouTube video %s' % ytid)
-                    return self.url_result(ytid, 'Youtube', ytid)
-
-                raise
-
-            video_info = self._call_api(
-                'video/%s' % video_id, video_id, 'Downloading api video info',
-                'Failed to download video info')
-
-            video_versions = self._call_api(
-                'video/%s/streams' % video_id, video_id,
-                'Downloading video versions info',
-                'Failed to download video versions info',
-                fatal=False)
-
-            # Some videos are only available via webpage (e.g.
-            # https://github.com/rg3/youtube-dl/issues/9366)
-            if not video_versions:
-                webpage = self._download_webpage(url, video_id)
-                video_versions = self._extract_json(webpage, video_id, 'streams')[video_id][0]
-
-            timestamp = parse_iso8601(video_info.get('releaseDate'))
-            artists = video_info.get('artists')
-            for curr_artist in artists:
-                if curr_artist.get('role') == 'Featured':
-                    featured_artist = curr_artist['name']
-                else:
-                    artist = uploader = curr_artist['name']
-            view_count = int_or_none(video_info.get('views', {}).get('total'))
-
-            for video_version in video_versions:
-                version = self._VERSIONS.get(video_version['version'])
-                version_url = video_version.get('url')
-                if not version_url:
+            if '.ism' in version_url:
+                continue
+            elif '.mpd' in version_url:
+                formats.extend(self._extract_mpd_formats(
+                    version_url, video_id, mpd_id='dash-%s' % version,
+                    note='Downloading %s MPD information' % version,
+                    errnote='Failed to download %s MPD information' % version,
+                    fatal=False))
+            elif '.m3u8' in version_url:
+                formats.extend(self._extract_m3u8_formats(
+                    version_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls-%s' % version,
+                    note='Downloading %s m3u8 information' % version,
+                    errnote='Failed to download %s m3u8 information' % version,
+                    fatal=False))
+            else:
+                m = re.search(r'''(?xi)
+                    _(?P<width>[0-9]+)x(?P<height>[0-9]+)
+                    _(?P<vcodec>[a-z0-9]+)
+                    _(?P<vbr>[0-9]+)
+                    _(?P<acodec>[a-z0-9]+)
+                    _(?P<abr>[0-9]+)
+                    \.(?P<ext>[a-z0-9]+)''', version_url)
+                if not m:
                      continue
  
-                if '.ism' in version_url:
-                    continue
-                elif '.mpd' in version_url:
-                    formats.extend(self._extract_mpd_formats(
-                        version_url, video_id, mpd_id='dash-%s' % version,
-                        note='Downloading %s MPD information' % version,
-                        errnote='Failed to download %s MPD information' % version,
-                        fatal=False))
-                elif '.m3u8' in version_url:
-                    formats.extend(self._extract_m3u8_formats(
-                        version_url, video_id, 'mp4', 'm3u8_native',
-                        m3u8_id='hls-%s' % version,
-                        note='Downloading %s m3u8 information' % version,
-                        errnote='Failed to download %s m3u8 information' % version,
-                        fatal=False))
-                else:
-                    m = re.search(r'''(?xi)
-                        _(?P<width>[0-9]+)x(?P<height>[0-9]+)
-                        _(?P<vcodec>[a-z0-9]+)
-                        _(?P<vbr>[0-9]+)
-                        _(?P<acodec>[a-z0-9]+)
-                        _(?P<abr>[0-9]+)
-                        \.(?P<ext>[a-z0-9]+)''', version_url)
-                    if not m:
-                        continue
-
-                    formats.append({
-                        'url': version_url,
-                        'format_id': 'http-%s-%s' % (version, video_version['quality']),
-                        'vcodec': m.group('vcodec'),
-                        'acodec': m.group('acodec'),
-                        'vbr': int(m.group('vbr')),
-                        'abr': int(m.group('abr')),
-                        'ext': m.group('ext'),
-                        'width': int(m.group('width')),
-                        'height': int(m.group('height')),
-                    })
-        else:
-            timestamp = int_or_none(self._search_regex(
-                r'/Date\((\d+)\)/',
-                video_info['releaseDate'], 'release date', fatal=False),
-                scale=1000)
-            artists = video_info.get('mainArtists')
-            if artists:
-                artist = uploader = artists[0]['artistName']
-
-            featured_artists = video_info.get('featuredArtists')
-            if featured_artists:
-                featured_artist = featured_artists[0]['artistName']
-
-            smil_parsed = False
-            for video_version in video_info['videoVersions']:
-                version = self._VERSIONS.get(video_version['version'])
-                if version == 'youtube':
-                    continue
-                else:
-                    source_type = self._SOURCE_TYPES.get(video_version['sourceType'])
-                    renditions = compat_etree_fromstring(video_version['data'])
-                    if source_type == 'http':
-                        for rend in renditions.findall('rendition'):
-                            attr = rend.attrib
-                            formats.append({
-                                'url': attr['url'],
-                                'format_id': 'http-%s-%s' % (version, attr['name']),
-                                'height': int_or_none(attr.get('frameheight')),
-                                'width': int_or_none(attr.get('frameWidth')),
-                                'tbr': int_or_none(attr.get('totalBitrate')),
-                                'vbr': int_or_none(attr.get('videoBitrate')),
-                                'abr': int_or_none(attr.get('audioBitrate')),
-                                'vcodec': attr.get('videoCodec'),
-                                'acodec': attr.get('audioCodec'),
-                            })
-                    elif source_type == 'hls':
-                        formats.extend(self._extract_m3u8_formats(
-                            renditions.find('rendition').attrib['url'], video_id,
-                            'mp4', 'm3u8_native', m3u8_id='hls-%s' % version,
-                            note='Downloading %s m3u8 information' % version,
-                            errnote='Failed to download %s m3u8 information' % version,
-                            fatal=False))
-                    elif source_type == 'smil' and version == 'level3' and not smil_parsed:
-                        formats.extend(self._extract_smil_formats(
-                            renditions.find('rendition').attrib['url'], video_id, False))
-                        smil_parsed = True
+                formats.append({
+                    'url': version_url,
+                    'format_id': 'http-%s-%s' % (version, video_version['quality']),
+                    'vcodec': m.group('vcodec'),
+                    'acodec': m.group('acodec'),
+                    'vbr': int(m.group('vbr')),
+                    'abr': int(m.group('abr')),
+                    'ext': m.group('ext'),
+                    'width': int(m.group('width')),
+                    'height': int(m.group('height')),
+                })
          self._sort_formats(formats)
  
          track = video_info['title']
@@ -376,17 +269,15 @@ class VevoIE(VevoBaseIE):
          else:
              age_limit = None
  
-        duration = video_info.get('duration')
-
          return {
              'id': video_id,
              'title': title,
              'formats': formats,
              'thumbnail': video_info.get('imageUrl') or video_info.get('thumbnailUrl'),
-            'timestamp': timestamp,
+            'timestamp': parse_iso8601(video_info.get('releaseDate')),
              'uploader': uploader,
-            'duration': duration,
-            'view_count': view_count,
+            'duration': int_or_none(video_info.get('duration')),
+            'view_count': int_or_none(video_info.get('views', {}).get('total')),
              'age_limit': age_limit,
              'track': track,
              'artist': uploader,
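Note on the vevo.py format loop: for plain HTTP renditions (the final `else` branch), resolution, codecs, and bitrates are read from the file name itself. A standalone sketch of that parsing step with the same pattern; `parse_rendition` and the example URLs in the usage are hypothetical, and the `format_id` field is left out because it depends on API data.

```python
import re

# Same pattern as the final branch of the format loop above.
RENDITION_RE = re.compile(r'''(?xi)
    _(?P<width>[0-9]+)x(?P<height>[0-9]+)
    _(?P<vcodec>[a-z0-9]+)
    _(?P<vbr>[0-9]+)
    _(?P<acodec>[a-z0-9]+)
    _(?P<abr>[0-9]+)
    \.(?P<ext>[a-z0-9]+)''')


def parse_rendition(version_url):
    """Build a format dict from a rendition URL, or return None if it does not fit."""
    m = RENDITION_RE.search(version_url)
    if not m:
        return None
    return {
        'url': version_url,
        'vcodec': m.group('vcodec'),
        'acodec': m.group('acodec'),
        'vbr': int(m.group('vbr')),
        'abr': int(m.group('abr')),
        'ext': m.group('ext'),
        'width': int(m.group('width')),
        'height': int(m.group('height')),
    }
```

URLs that do not follow the naming scheme yield `None`, which corresponds to the `continue` in the loop.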
diff --git a/youtube_dl/extractor/vgtv.py b/youtube_dl/extractor/vgtv.py
index 3b38ac700296a2eef8c12f0b45406f54785d7684..8a574bc269789e14f3dcadd6167c5caaa46e49e3 100644
--- a/youtube_dl/extractor/vgtv.py
+++ b/youtube_dl/extractor/vgtv.py
@@ -61,7 +61,7 @@ class VGTVIE(XstreamIE):
                  'ext': 'mp4',
                  'title': 'Hevnen er søt: Episode 10 - Abu',
                  'description': 'md5:e25e4badb5f544b04341e14abdc72234',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 648.000,
                  'timestamp': 1404626400,
                  'upload_date': '20140706',
@@ -76,7 +76,7 @@ class VGTVIE(XstreamIE):
                  'ext': 'flv',
                  'title': 'OPPTAK: VGTV følger EM-kvalifiseringen',
                  'description': 'md5:3772d9c0dc2dff92a886b60039a7d4d3',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 9103.0,
                  'timestamp': 1410113864,
                  'upload_date': '20140907',
@@ -96,7 +96,7 @@ class VGTVIE(XstreamIE):
                  'ext': 'mp4',
                  'title': 'V75 fra Solvalla 30.05.15',
                  'description': 'md5:b3743425765355855f88e096acc93231',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 25966,
                  'timestamp': 1432975582,
                  'upload_date': '20150530',
@@ -200,7 +200,7 @@ class VGTVIE(XstreamIE):
              format_info = {
                  'url': mp4_url,
              }
-            mobj = re.search('(\d+)_(\d+)_(\d+)', mp4_url)
+            mobj = re.search(r'(\d+)_(\d+)_(\d+)', mp4_url)
              if mobj:
                  tbr = int(mobj.group(3))
                  format_info.update({
@@ -246,7 +246,7 @@ class BTArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Alrekstad internat',
              'description': 'md5:dc81a9056c874fedb62fc48a300dac58',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 191,
              'timestamp': 1289991323,
              'upload_date': '20101117',
diff --git a/youtube_dl/extractor/vidbit.py b/youtube_dl/extractor/vidbit.py
index e7ac5a8425bbccad286fb5f49b9d6ca040ed0cfe..91f45b7cc78d38451591a652a0472f6ba35c0383 100644
--- a/youtube_dl/extractor/vidbit.py
+++ b/youtube_dl/extractor/vidbit.py
@@ -20,7 +20,7 @@ class VidbitIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Intro to VidBit',
              'description': 'md5:5e0d6142eec00b766cbf114bfd3d16b7',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'upload_date': '20160618',
              'view_count': int,
              'comment_count': int,
diff --git a/youtube_dl/extractor/viddler.py b/youtube_dl/extractor/viddler.py
index 8d92aee878d3ad0c0d5725db755451c88e527f66..67808e7e623c420fc7507efa0a57d550dd64655a 100644
--- a/youtube_dl/extractor/viddler.py
+++ b/youtube_dl/extractor/viddler.py
@@ -26,7 +26,7 @@ class ViddlerIE(InfoExtractor):
              'timestamp': 1335371429,
              'upload_date': '20120425',
              'duration': 100.89,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'view_count': int,
              'comment_count': int,
              'categories': ['video content', 'high quality video', 'video made easy', 'how to produce video with limited resources', 'viddler'],
diff --git a/youtube_dl/extractor/videa.py b/youtube_dl/extractor/videa.py
new file mode 100644
index 0000000..311df58
--- /dev/null
+++ b/youtube_dl/extractor/videa.py
@@ -0,0 +1,97 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    mimetype2ext,
+    parse_codecs,
+    xpath_element,
+    xpath_text,
+)
+
+
+class VideaIE(InfoExtractor):
+    _VALID_URL = r'''(?x)
+                    https?://
+                        videa\.hu/
+                        (?:
+                            videok/(?:[^/]+/)*[^?#&]+-|
+                            player\?.*?\bv=|
+                            player/v/
+                        )
+                        (?P<id>[^?#&]+)
+                    '''
+    _TESTS = [{
+        'url': 'http://videa.hu/videok/allatok/az-orult-kigyasz-285-kigyot-kigyo-8YfIAjxwWGwT8HVQ',
+        'md5': '97a7af41faeaffd9f1fc864a7c7e7603',
+        'info_dict': {
+            'id': '8YfIAjxwWGwT8HVQ',
+            'ext': 'mp4',
+            'title': 'Az őrült kígyász 285 kígyót enged szabadon',
+            'thumbnail': 'http://videa.hu/static/still/1.4.1.1007274.1204470.3',
+            'duration': 21,
+        },
+    }, {
+        'url': 'http://videa.hu/videok/origo/jarmuvek/supercars-elozes-jAHDWfWSJH5XuFhH',
+        'only_matching': True,
+    }, {
+        'url': 'http://videa.hu/player?v=8YfIAjxwWGwT8HVQ',
+        'only_matching': True,
+    }, {
+        'url': 'http://videa.hu/player/v/8YfIAjxwWGwT8HVQ?autoplay=1',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [url for _, url in re.findall(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//videa\.hu/player\?.*?\bv=.+?)\1',
+            webpage)]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        info = self._download_xml(
+            'http://videa.hu/videaplayer_get_xml.php', video_id,
+            query={'v': video_id})
+
+        video = xpath_element(info, './/video', 'video', fatal=True)
+        sources = xpath_element(info, './/video_sources', 'sources', fatal=True)
+
+        title = xpath_text(video, './title', fatal=True)
+
+        formats = []
+        for source in sources.findall('./video_source'):
+            source_url = source.text
+            if not source_url:
+                continue
+            f = parse_codecs(source.get('codecs'))
+            f.update({
+                'url': source_url,
+                'ext': mimetype2ext(source.get('mimetype')) or 'mp4',
+                'format_id': source.get('name'),
+                'width': int_or_none(source.get('width')),
+                'height': int_or_none(source.get('height')),
+            })
+            formats.append(f)
+        self._sort_formats(formats)
+
+        thumbnail = xpath_text(video, './poster_src')
+        duration = int_or_none(xpath_text(video, './duration'))
+
+        age_limit = None
+        is_adult = xpath_text(video, './is_adult_content', default=None)
+        if is_adult:
+            age_limit = 18 if is_adult == '1' else 0
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'age_limit': age_limit,
+            'formats': formats,
+        }
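The three URL shapes accepted by `VideaIE._VALID_URL` can be checked outside youtube-dl. This is a sketch only: `match_id` is a hypothetical stand-in for the extractor's `_match_id`, and the URLs are the ones listed in `_TESTS` above.

```python
import re

# Same pattern as VideaIE._VALID_URL; (?x) lets it span several lines.
VALID_URL = r'''(?x)
    https?://
        videa\.hu/
        (?:
            videok/(?:[^/]+/)*[^?#&]+-|
            player\?.*?\bv=|
            player/v/
        )
        (?P<id>[^?#&]+)
    '''


def match_id(url):
    """Return the captured video id, or None if the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


page_id = match_id(
    'http://videa.hu/videok/allatok/'
    'az-orult-kigyasz-285-kigyot-kigyo-8YfIAjxwWGwT8HVQ')
query_id = match_id('http://videa.hu/player?v=8YfIAjxwWGwT8HVQ')
path_id = match_id('http://videa.hu/player/v/8YfIAjxwWGwT8HVQ?autoplay=1')
other_id = match_id('http://example.com/player?v=8YfIAjxwWGwT8HVQ')
```

For page URLs the greedy `[^?#&]+-` consumes the slug up to its last hyphen, so only the trailing token is captured as the id.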
diff --git a/youtube_dl/extractor/videomega.py b/youtube_dl/extractor/videomega.py
index 4f0dcd18c7f28ab17aec58c814d53fd8ae21e7ac..c02830dddcd838dbfc8070c17a009d5924a00f7e 100644
--- a/youtube_dl/extractor/videomega.py
+++ b/youtube_dl/extractor/videomega.py
@@ -19,7 +19,7 @@ class VideoMegaIE(InfoExtractor):
              'id': 'AOSQBJYKIDDIKYJBQSOA',
              'ext': 'mp4',
              'title': '1254207',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://videomega.tv/cdn.php?ref=AOSQBJYKIDDIKYJBQSOA&width=1070&height=600',
diff --git a/youtube_dl/extractor/videomore.py b/youtube_dl/extractor/videomore.py
index 7f25665864c696757903deeb582a64f16eec0d85..9b56630de3516b000436d57ca6e6cbcc580cc0ec 100644
--- a/youtube_dl/extractor/videomore.py
+++ b/youtube_dl/extractor/videomore.py
@@ -23,7 +23,7 @@ class VideomoreIE(InfoExtractor):
              'title': 'Кино в деталях 5 сезон В гостях Алексей Чумаков и Юлия Ковальчук',
              'series': 'Кино в деталях',
              'episode': 'В гостях Алексей Чумаков и Юлия Ковальчук',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 2910,
              'view_count': int,
              'comment_count': int,
@@ -37,7 +37,7 @@ class VideomoreIE(InfoExtractor):
              'title': 'Молодежка 2 сезон 40 серия',
              'series': 'Молодежка',
              'episode': '40 серия',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 2809,
              'view_count': int,
              'comment_count': int,
@@ -53,7 +53,7 @@ class VideomoreIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Промо Команда проиграла из-за Бакина?',
              'episode': 'Команда проиграла из-за Бакина?',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 29,
              'age_limit': 16,
              'view_count': int,
@@ -145,7 +145,7 @@ class VideomoreVideoIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Ёлки 3',
              'description': '',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 5579,
              'age_limit': 6,
              'view_count': int,
@@ -168,7 +168,7 @@ class VideomoreVideoIE(InfoExtractor):
              'ext': 'flv',
              'title': '1 серия. Здравствуй, Аквавилль!',
              'description': 'md5:c6003179538b5d353e7bcd5b1372b2d7',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 754,
              'age_limit': 6,
              'view_count': int,
diff --git a/youtube_dl/extractor/videopress.py b/youtube_dl/extractor/videopress.py
new file mode 100644
index 0000000..049db25
--- /dev/null
+++ b/youtube_dl/extractor/videopress.py
@@ -0,0 +1,99 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import random
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    determine_ext,
+    float_or_none,
+    parse_age_limit,
+    qualities,
+    try_get,
+    unified_timestamp,
+    urljoin,
+)
+
+
+class VideoPressIE(InfoExtractor):
+    _VALID_URL = r'https?://videopress\.com/embed/(?P<id>[\da-zA-Z]+)'
+    _TESTS = [{
+        'url': 'https://videopress.com/embed/kUJmAcSf',
+        'md5': '706956a6c875873d51010921310e4bc6',
+        'info_dict': {
+            'id': 'kUJmAcSf',
+            'ext': 'mp4',
+            'title': 'VideoPress Demo',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'duration': 634.6,
+            'timestamp': 1434983935,
+            'upload_date': '20150622',
+            'age_limit': 0,
+        },
+    }, {
+        # 17+, requires birth_* params
+        'url': 'https://videopress.com/embed/iH3gstfZ',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+src=["\']((?:https?://)?videopress\.com/embed/[\da-zA-Z]+)',
+            webpage)
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'https://public-api.wordpress.com/rest/v1.1/videos/%s' % video_id,
+            video_id, query={
+                'birth_month': random.randint(1, 12),
+                'birth_day': random.randint(1, 31),
+                'birth_year': random.randint(1950, 1995),
+            })
+
+        title = video['title']
+
+        def base_url(scheme):
+            return try_get(
+                video, lambda x: x['file_url_base'][scheme], compat_str)
+
+        base_url = base_url('https') or base_url('http')
+
+        QUALITIES = ('std', 'dvd', 'hd')
+        quality = qualities(QUALITIES)
+
+        formats = []
+        for format_id, f in video['files'].items():
+            if not isinstance(f, dict):
+                continue
+            for ext, path in f.items():
+                if ext in ('mp4', 'ogg'):
+                    formats.append({
+                        'url': urljoin(base_url, path),
+                        'format_id': '%s-%s' % (format_id, ext),
+                        'ext': determine_ext(path, ext),
+                        'quality': quality(format_id),
+                    })
+        original_url = try_get(video, lambda x: x['original'], compat_str)
+        if original_url:
+            formats.append({
+                'url': original_url,
+                'format_id': 'original',
+                'quality': len(QUALITIES),
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video.get('description'),
+            'thumbnail': video.get('poster'),
+            'duration': float_or_none(video.get('duration'), 1000),
+            'timestamp': unified_timestamp(video.get('upload_date')),
+            'age_limit': parse_age_limit(video.get('rating')),
+            'formats': formats,
+        }
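The `quality` values assigned in `VideoPressIE` rank the known renditions and place the original upload above all of them. A rough sketch of that ordering follows; the local `qualities` is a stand-in written for this example rather than an import from `youtube_dl.utils`, the format ids are invented, and the real `_sort_formats` weighs more fields than `quality` alone.

```python
def qualities(quality_ids):
    """Stand-in ranking helper: position in quality_ids, or -1 if unknown."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


QUALITIES = ('std', 'dvd', 'hd')
quality = qualities(QUALITIES)

# Hypothetical formats; 'original' uses len(QUALITIES) as in the extractor.
formats = [
    {'format_id': 'hd-mp4', 'quality': quality('hd')},
    {'format_id': 'original', 'quality': len(QUALITIES)},
    {'format_id': 'std-mp4', 'quality': quality('std')},
    {'format_id': 'mystery-mp4', 'quality': quality('mystery')},
]

# Worst to best, using quality as the only sort key.
ranked = [f['format_id'] for f in sorted(formats, key=lambda f: f['quality'])]
```

Using `len(QUALITIES)` for the original file keeps it ranked highest even if more named qualities are added to the tuple later.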
diff --git a/youtube_dl/extractor/videott.py b/youtube_dl/extractor/videott.py
deleted file mode 100644
index 0f79871..0000000
--- a/youtube_dl/extractor/videott.py
+++ /dev/null
@@ -1,65 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-import base64
-
-from .common import InfoExtractor
-from ..utils import (
-    unified_strdate,
-    int_or_none,
-)
-
-
-class VideoTtIE(InfoExtractor):
-    _WORKING = False
-    ID_NAME = 'video.tt'
-    IE_DESC = 'video.tt - Your True Tube'
-    _VALID_URL = r'https?://(?:www\.)?video\.tt/(?:(?:video|embed)/|watch_video\.php\?v=)(?P<id>[\da-zA-Z]{9})'
-
-    _TESTS = [{
-        'url': 'http://www.video.tt/watch_video.php?v=amd5YujV8',
-        'md5': 'b13aa9e2f267effb5d1094443dff65ba',
-        'info_dict': {
-            'id': 'amd5YujV8',
-            'ext': 'flv',
-            'title': 'Motivational video Change your mind in just 2.50 mins',
-            'description': '',
-            'upload_date': '20130827',
-            'uploader': 'joseph313',
-        }
-    }, {
-        'url': 'http://video.tt/embed/amd5YujV8',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        settings = self._download_json(
-            'http://www.video.tt/player_control/settings.php?v=%s' % video_id, video_id,
-            'Downloading video JSON')['settings']
-
-        video = settings['video_details']['video']
-
-        formats = [
-            {
-                'url': base64.b64decode(res['u'].encode('utf-8')).decode('utf-8'),
-                'ext': 'flv',
-                'format_id': res['l'],
-            } for res in settings['res'] if res['u']
-        ]
-
-        return {
-            'id': video_id,
-            'title': video['title'],
-            'description': video['description'],
-            'thumbnail': settings['config']['thumbnail'],
-            'upload_date': unified_strdate(video['added']),
-            'uploader': video['owner'],
-            'view_count': int_or_none(video['view_count']),
-            'comment_count': None if video.get('comment_count') == '--' else int_or_none(video['comment_count']),
-            'like_count': int_or_none(video['liked']),
-            'dislike_count': int_or_none(video['disliked']),
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/vidio.py b/youtube_dl/extractor/vidio.py
index 6898042de728f533367e2bedc50abc710939c1d6..4e4b4e38caaf920eacc7e29b487d9a9ad26d90cc 100644
--- a/youtube_dl/extractor/vidio.py
+++ b/youtube_dl/extractor/vidio.py
@@ -18,7 +18,7 @@ class VidioIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'DJ_AMBRED - Booyah (Live 2015)',
              'description': 'md5:27dc15f819b6a78a626490881adbadf8',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 149,
              'like_count': int,
          },
diff --git a/youtube_dl/extractor/vidme.py b/youtube_dl/extractor/vidme.py
index b1156d531aba6793fc7ce7dda9649950d922f606..e9ff336c4f5cb2e5a4b08fe5a97aa9993bdf87e0 100644
--- a/youtube_dl/extractor/vidme.py
+++ b/youtube_dl/extractor/vidme.py
@@ -23,7 +23,7 @@ class VidmeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Fishing for piranha - the easy way',
              'description': 'source: https://www.facebook.com/photo.php?v=312276045600871',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1406313244,
              'upload_date': '20140725',
              'age_limit': 0,
@@ -39,7 +39,7 @@ class VidmeIE(InfoExtractor):
              'id': 'Gc6M',
              'ext': 'mp4',
              'title': 'O Mere Dil ke chain - Arnav and Khushi VM',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1441211642,
              'upload_date': '20150902',
              'uploader': 'SunshineM',
@@ -61,7 +61,7 @@ class VidmeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'The Carver',
              'description': 'md5:e9c24870018ae8113be936645b93ba3c',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1433203629,
              'upload_date': '20150602',
              'uploader': 'Thomas',
@@ -82,7 +82,7 @@ class VidmeIE(InfoExtractor):
              'id': 'Wmur',
              'ext': 'mp4',
              'title': 'naked smoking & stretching',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1430931613,
              'upload_date': '20150506',
              'uploader': 'naked-yogi',
@@ -115,7 +115,7 @@ class VidmeIE(InfoExtractor):
              'id': 'e5g',
              'ext': 'mp4',
              'title': 'Video upload (e5g)',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1401480195,
              'upload_date': '20140530',
              'uploader': None,
diff --git a/youtube_dl/extractor/viewlift.py b/youtube_dl/extractor/viewlift.py
index 19500eba84f1bf7b4fdf7c59b8b56ca7e5b91efc..18735cfb23d907e4fb83882cda10dda7eb6f41ba 100644
--- a/youtube_dl/extractor/viewlift.py
+++ b/youtube_dl/extractor/viewlift.py
@@ -14,7 +14,7 @@ from ..utils import (
  
  
  class ViewLiftBaseIE(InfoExtractor):
-    _DOMAINS_REGEX = '(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|monumentalsportsnetwork|vayafilm)\.com|kesari\.tv'
+    _DOMAINS_REGEX = r'(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|monumentalsportsnetwork|vayafilm)\.com|kesari\.tv'
  
  
  class ViewLiftEmbedIE(ViewLiftBaseIE):
@@ -110,7 +110,7 @@ class ViewLiftIE(ViewLiftBaseIE):
              'ext': 'mp4',
              'title': 'Lost for Life',
              'description': 'md5:fbdacc8bb6b455e464aaf98bc02e1c82',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 4489,
              'categories': ['Documentary', 'Crime', 'Award Winning', 'Festivals']
          }
@@ -123,7 +123,7 @@ class ViewLiftIE(ViewLiftBaseIE):
              'ext': 'mp4',
              'title': 'India',
              'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 979,
              'categories': ['Documentary', 'Sports', 'Politics']
          }
@@ -160,7 +160,7 @@ class ViewLiftIE(ViewLiftBaseIE):
  
          snag = self._parse_json(
              self._search_regex(
-                'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'),
+                r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'),
              display_id)
  
          for item in snag:
diff --git a/youtube_dl/extractor/viewster.py b/youtube_dl/extractor/viewster.py
index a93196a0772fd5588dd2f55c327427d00e814eb4..52dd95e2fe19041f01e49b1a4a3f9566c1f0e60b 100644
--- a/youtube_dl/extractor/viewster.py
+++ b/youtube_dl/extractor/viewster.py
@@ -157,7 +157,7 @@ class ViewsterIE(InfoExtractor):
                          formats.extend(m3u8_formats)
                  else:
                      qualities_basename = self._search_regex(
-                        '/([^/]+)\.csmil/',
+                        r'/([^/]+)\.csmil/',
                          manifest_url, 'qualities basename', default=None)
                      if not qualities_basename:
                          continue
diff --git a/youtube_dl/extractor/viidea.py b/youtube_dl/extractor/viidea.py
index a4f914d1449ad1ad4fd38938fe81591a703d6120..4adcd183030438762f4a82be90adb171e1a38d34 100644
--- a/youtube_dl/extractor/viidea.py
+++ b/youtube_dl/extractor/viidea.py
@@ -40,7 +40,7 @@ class ViideaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Automatics, robotics and biocybernetics',
              'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'timestamp': 1372349289,
              'upload_date': '20130627',
              'duration': 565,
@@ -58,7 +58,7 @@ class ViideaIE(InfoExtractor):
              'ext': 'flv',
              'title': 'NLP at Google',
              'description': 'md5:fc7a6d9bf0302d7cc0e53f7ca23747b3',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'timestamp': 1284375600,
              'upload_date': '20100913',
              'duration': 5352,
@@ -74,7 +74,7 @@ class ViideaIE(InfoExtractor):
              'id': '23181',
              'title': 'Deep Learning Summer School, Montreal 2015',
              'description': 'md5:0533a85e4bd918df52a01f0e1ebe87b7',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'timestamp': 1438560000,
          },
          'playlist_count': 30,
@@ -85,7 +85,7 @@ class ViideaIE(InfoExtractor):
              'id': '9737',
              'display_id': 'mlss09uk_bishop_ibi',
              'title': 'Introduction To Bayesian Inference',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'timestamp': 1251622800,
          },
          'playlist': [{
@@ -94,7 +94,7 @@ class ViideaIE(InfoExtractor):
                  'display_id': 'mlss09uk_bishop_ibi_part1',
                  'ext': 'wmv',
                  'title': 'Introduction To Bayesian Inference (Part 1)',
-                'thumbnail': 're:http://.*\.jpg',
+                'thumbnail': r're:http://.*\.jpg',
                  'duration': 4622,
                  'timestamp': 1251622800,
                  'upload_date': '20090830',
@@ -105,7 +105,7 @@ class ViideaIE(InfoExtractor):
                  'display_id': 'mlss09uk_bishop_ibi_part2',
                  'ext': 'wmv',
                  'title': 'Introduction To Bayesian Inference (Part 2)',
-                'thumbnail': 're:http://.*\.jpg',
+                'thumbnail': r're:http://.*\.jpg',
                  'duration': 5641,
                  'timestamp': 1251622800,
                  'upload_date': '20090830',
diff --git a/youtube_dl/extractor/vimeo.py b/youtube_dl/extractor/vimeo.py
index 51c69a80c216889315a4c5fe070572100c13dd36..61cc469bf27b58bfc70eb8bd036737ec0a4cb66c 100644
--- a/youtube_dl/extractor/vimeo.py
+++ b/youtube_dl/extractor/vimeo.py
@@ -21,12 +21,12 @@ from ..utils import (
      sanitized_Request,
      smuggle_url,
      std_headers,
-    unified_strdate,
+    try_get,
+    unified_timestamp,
      unsmuggle_url,
      urlencode_postdata,
      unescapeHTML,
      parse_filesize,
-    try_get,
  )
  
  
@@ -92,29 +92,30 @@ class VimeoBaseInfoExtractor(InfoExtractor):
      def _vimeo_sort_formats(self, formats):
          # Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
          # at the same time without actual units specified. This lead to wrong sorting.
-        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
+        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
  
      def _parse_config(self, config, video_id):
+        video_data = config['video']
          # Extract title
-        video_title = config['video']['title']
+        video_title = video_data['title']
  
          # Extract uploader, uploader_url and uploader_id
-        video_uploader = config['video'].get('owner', {}).get('name')
-        video_uploader_url = config['video'].get('owner', {}).get('url')
+        video_uploader = video_data.get('owner', {}).get('name')
+        video_uploader_url = video_data.get('owner', {}).get('url')
          video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
  
          # Extract video thumbnail
-        video_thumbnail = config['video'].get('thumbnail')
+        video_thumbnail = video_data.get('thumbnail')
          if video_thumbnail is None:
-            video_thumbs = config['video'].get('thumbs')
+            video_thumbs = video_data.get('thumbs')
              if video_thumbs and isinstance(video_thumbs, dict):
                  _, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
  
          # Extract video duration
-        video_duration = int_or_none(config['video'].get('duration'))
+        video_duration = int_or_none(video_data.get('duration'))
  
          formats = []
-        config_files = config['video'].get('files') or config['request'].get('files', {})
+        config_files = video_data.get('files') or config['request'].get('files', {})
          for f in config_files.get('progressive', []):
              video_url = f.get('url')
              if not video_url:
@@ -127,10 +128,33 @@ class VimeoBaseInfoExtractor(InfoExtractor):
                  'fps': int_or_none(f.get('fps')),
                  'tbr': int_or_none(f.get('bitrate')),
              })
-        m3u8_url = config_files.get('hls', {}).get('url')
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+
+        for files_type in ('hls', 'dash'):
+            for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
+                manifest_url = cdn_data.get('url')
+                if not manifest_url:
+                    continue
+                format_id = '%s-%s' % (files_type, cdn_name)
+                if files_type == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        manifest_url, video_id, 'mp4',
+                        'm3u8_native', m3u8_id=format_id,
+                        note='Downloading %s m3u8 information' % cdn_name,
+                        fatal=False))
+                elif files_type == 'dash':
+                    mpd_pattern = r'/%s/(?:sep/)?video/' % video_id
+                    mpd_manifest_urls = []
+                    if re.search(mpd_pattern, manifest_url):
+                        for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
+                            mpd_manifest_urls.append((format_id + suffix, re.sub(
+                                mpd_pattern, '/%s/%s/' % (video_id, repl), manifest_url)))
+                    else:
+                        mpd_manifest_urls = [(format_id, manifest_url)]
+                    for f_id, m_url in mpd_manifest_urls:
+                        formats.extend(self._extract_mpd_formats(
+                            m_url.replace('/master.json', '/master.mpd'), video_id, f_id,
+                            'Downloading %s MPD information' % cdn_name,
+                            fatal=False))
  
          subtitles = {}
          text_tracks = config['request'].get('text_tracks')
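The DASH branch added in the hunk above derives up to two manifest URLs per CDN entry. The sketch below isolates that rewriting step under stated assumptions: `dash_manifest_candidates` is a hypothetical helper name, and the CDN URL is invented for illustration rather than taken from a real Vimeo config.

```python
import re


def dash_manifest_candidates(manifest_url, video_id, format_id):
    """Return (format_id, mpd_url) pairs derived from one CDN manifest URL."""
    mpd_pattern = r'/%s/(?:sep/)?video/' % video_id
    if re.search(mpd_pattern, manifest_url):
        # Try both the combined and the separate-streams ('sep') layout.
        candidates = [
            (format_id + suffix,
             re.sub(mpd_pattern, '/%s/%s/' % (video_id, repl), manifest_url))
            for suffix, repl in (('', 'video'), ('_sep', 'sep/video'))]
    else:
        candidates = [(format_id, manifest_url)]
    # The config points at a JSON manifest; the MPD sits next to it.
    return [(f_id, m_url.replace('/master.json', '/master.mpd'))
            for f_id, m_url in candidates]


result = dash_manifest_candidates(
    'https://cdn.example.com/123/sep/video/1,2/master.json?base64_init=1',
    '123', 'dash-cdn')
```

Each candidate is then fetched with `fatal=False` in the extractor, so a layout that does not exist on a given CDN is skipped instead of aborting extraction.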
@@ -189,11 +213,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
                  'description': 'md5:2d3305bad981a06ff79f027f19865021',
+                'timestamp': 1355990239,
                  'upload_date': '20121220',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user7108434',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user7108434',
                  'uploader_id': 'user7108434',
                  'uploader': 'Filippo Valsorda',
                  'duration': 10,
+                'license': 'by-sa',
              },
          },
          {
@@ -203,7 +229,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
              'info_dict': {
                  'id': '68093876',
                  'ext': 'mp4',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/openstreetmapus',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/openstreetmapus',
                  'uploader_id': 'openstreetmapus',
                  'uploader': 'OpenStreetMap US',
                  'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography',
@@ -220,7 +246,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
                  'uploader': 'The BLN & Business of Software',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
                  'uploader_id': 'theblnbusinessofsoftware',
                  'duration': 3610,
                  'description': None,
@@ -234,12 +260,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'id': '68375962',
                  'ext': 'mp4',
                  'title': 'youtube-dl password protected test video',
+                'timestamp': 1371200155,
                  'upload_date': '20130614',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user18948128',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user18948128',
                  'uploader_id': 'user18948128',
                  'uploader': 'Jaime Marquínez Ferrándiz',
                  'duration': 10,
-                'description': 'This is "youtube-dl password protected test video" by  on Vimeo, the home for high quality videos and the people who love them.',
+                'description': 'md5:dca3ea23adb29ee387127bc4ddfce63f',
              },
              'params': {
                  'videopassword': 'youtube-dl',
@@ -253,10 +280,11 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Key & Peele: Terrorist Interrogation',
                  'description': 'md5:8678b246399b070816b12313e8b4eb5c',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/atencio',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/atencio',
                  'uploader_id': 'atencio',
                  'uploader': 'Peter Atencio',
-                'upload_date': '20130927',
+                'timestamp': 1380339469,
+                'upload_date': '20130928',
                  'duration': 187,
              },
          },
@@ -268,8 +296,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'The New Vimeo Player (You Know, For Videos)',
                  'description': 'md5:2ec900bf97c3f389378a96aee11260ea',
+                'timestamp': 1381846109,
                  'upload_date': '20131015',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/staff',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/staff',
                  'uploader_id': 'staff',
                  'uploader': 'Vimeo Staff',
                  'duration': 62,
@@ -284,21 +313,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Pier Solar OUYA Official Trailer',
                  'uploader': 'Tulio Gonçalves',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user28849593',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user28849593',
                  'uploader_id': 'user28849593',
              },
          },
          {
              # contains original format
              'url': 'https://vimeo.com/33951933',
-            'md5': '2d9f5475e0537f013d0073e812ab89e6',
+            'md5': '53c688fa95a55bf4b7293d37a89c5c53',
              'info_dict': {
                  'id': '33951933',
                  'ext': 'mp4',
                  'title': 'FOX CLASSICS - Forever Classic ID - A Full Minute',
                  'uploader': 'The DMCI',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/dmci',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/dmci',
                  'uploader_id': 'dmci',
+                'timestamp': 1324343742,
                  'upload_date': '20111220',
                  'description': 'md5:ae23671e82d05415868f7ad1aec21147',
              },
@@ -309,11 +339,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
              'url': 'https://vimeo.com/channels/tributes/6213729',
              'info_dict': {
                  'id': '6213729',
-                'ext': 'mp4',
+                'ext': 'mov',
                  'title': 'Vimeo Tribute: The Shining',
                  'uploader': 'Casey Donahue',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/caseydonahue',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/caseydonahue',
                  'uploader_id': 'caseydonahue',
+                'timestamp': 1250886430,
                  'upload_date': '20090821',
                  'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6',
              },
@@ -323,7 +354,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
              'expected_warnings': ['Unable to download JSON metadata'],
          },
          {
-            # redirects to ondemand extractor and should be passed throught it
+            # redirects to ondemand extractor and should be passed through it
              # for successful extraction
              'url': 'https://vimeo.com/73445910',
              'info_dict': {
@@ -331,7 +362,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'The Reluctant Revolutionary',
                  'uploader': '10Ft Films',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/tenfootfilms',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/tenfootfilms',
                  'uploader_id': 'tenfootfilms',
              },
              'params': {
@@ -462,6 +493,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
                      '%s said: %s' % (self.IE_NAME, seed_status['title']),
                      expected=True)
  
+        cc_license = None
+        timestamp = None
+
          # Extract the config JSON
          try:
              try:
@@ -475,14 +509,18 @@ class VimeoIE(VimeoBaseInfoExtractor):
                      vimeo_clip_page_config = self._search_regex(
                          r'vimeo\.clip_page_config\s*=\s*({.+?});', webpage,
                          'vimeo clip page config')
-                    config_url = self._parse_json(
-                        vimeo_clip_page_config, video_id)['player']['config_url']
+                    page_config = self._parse_json(vimeo_clip_page_config, video_id)
+                    config_url = page_config['player']['config_url']
+                    cc_license = page_config.get('cc_license')
+                    timestamp = try_get(
+                        page_config, lambda x: x['clip']['uploaded_on'],
+                        compat_str)
                  config_json = self._download_webpage(config_url, video_id)
                  config = json.loads(config_json)
              except RegexNotFoundError:
                  # For pro videos or player.vimeo.com urls
                  # We try to find out to which variable is assigned the config dic
-                m_variable_name = re.search('(\w)\.video\.id', webpage)
+                m_variable_name = re.search(r'(\w)\.video\.id', webpage)
                  if m_variable_name is not None:
                      config_re = r'%s=({[^}].+?});' % re.escape(m_variable_name.group(1))
                  else:
@@ -545,10 +583,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
              self._downloader.report_warning('Cannot find video description')
  
          # Extract upload date
-        video_upload_date = None
-        mobj = re.search(r'<time[^>]+datetime="([^"]+)"', webpage)
-        if mobj is not None:
-            video_upload_date = unified_strdate(mobj.group(1))
+        if not timestamp:
+            timestamp = self._search_regex(
+                r'<time[^>]+datetime="([^"]+)"', webpage,
+                'timestamp', default=None)
  
          try:
              view_count = int(self._search_regex(r'UserPlays:(\d+)', webpage, 'view count'))
@@ -585,15 +623,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
          info_dict = self._parse_config(config, video_id)
          formats.extend(info_dict['formats'])
          self._vimeo_sort_formats(formats)
+
+        if not cc_license:
+            cc_license = self._search_regex(
+                r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
+                webpage, 'license', default=None, group='license')
+
          info_dict.update({
              'id': video_id,
              'formats': formats,
-            'upload_date': video_upload_date,
+            'timestamp': unified_timestamp(timestamp),
              'description': video_description,
              'webpage_url': url,
              'view_count': view_count,
              'like_count': like_count,
              'comment_count': comment_count,
+            'license': cc_license,
          })
  
          return info_dict
@@ -611,9 +656,12 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor):
              'ext': 'mp4',
              'title': 'המעבדה - במאי יותם פלדמן',
              'uploader': 'גם סרטים',
-            'uploader_url': 're:https?://(?:www\.)?vimeo\.com/gumfilms',
+            'uploader_url': r're:https?://(?:www\.)?vimeo\.com/gumfilms',
              'uploader_id': 'gumfilms',
          },
+        'params': {
+            'format': 'best[protocol=https]',
+        },
      }, {
          # requires Referer to be passed along with og:video:url
          'url': 'https://vimeo.com/ondemand/36938/126682985',
@@ -622,7 +670,7 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor):
              'ext': 'mp4',
              'title': 'Rävlock, rätt läte på rätt plats',
              'uploader': 'Lindroth & Norin',
-            'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user14430847',
+            'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user14430847',
              'uploader_id': 'user14430847',
          },
          'params': {
@@ -712,12 +760,12 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
              # Try extracting href first since not all videos are available via
              # short https://vimeo.com/id URL (e.g. https://vimeo.com/channels/tributes/6213729)
              clips = re.findall(
-                r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)', webpage)
+                r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)(?:[^>]+\btitle="([^"]+)")?', webpage)
              if clips:
-                for video_id, video_url in clips:
+                for video_id, video_url, video_title in clips:
                      yield self.url_result(
                          compat_urlparse.urljoin(base_url, video_url),
-                        VimeoIE.ie_key(), video_id=video_id)
+                        VimeoIE.ie_key(), video_id=video_id, video_title=video_title)
              # More relaxed fallback
              else:
                  for video_id in re.findall(r'id=["\']clip_(\d+)', webpage):
@@ -842,7 +890,7 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
              'title': 're:(?i)^Death by dogma versus assembling agile . Sander Hoogendoorn',
              'uploader': 'DevWeek Events',
              'duration': 2773,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader_id': 'user22258446',
          }
      }, {
@@ -866,10 +914,14 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
  
      def _get_config_url(self, webpage_url, video_id, video_password_verified=False):
          webpage = self._download_webpage(webpage_url, video_id)
-        data = self._parse_json(self._search_regex(
-            r'window\s*=\s*_extend\(window,\s*({.+?})\);', webpage, 'data',
-            default=NO_DEFAULT if video_password_verified else '{}'), video_id)
-        config_url = data.get('vimeo_esi', {}).get('config', {}).get('configUrl')
+        config_url = self._html_search_regex(
+            r'data-config-url=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+            'config URL', default=None, group='url')
+        if not config_url:
+            data = self._parse_json(self._search_regex(
+                r'window\s*=\s*_extend\(window,\s*({.+?})\);', webpage, 'data',
+                default=NO_DEFAULT if video_password_verified else '{}'), video_id)
+            config_url = data.get('vimeo_esi', {}).get('config', {}).get('configUrl')
          if config_url is None:
              self._verify_video_password(webpage_url, video_id, webpage)
              config_url = self._get_config_url(
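Note on the license lookup in the VimeoIE hunks above: the added regex uses a backreference so that the href value may be wrapped in either quote style, and it is only consulted when the page config did not already supply cc_license. Below is a minimal standalone sketch of that logic using only the standard re module. find_license and the sample markup are hypothetical illustrations, not part of youtube-dl; the extractor itself goes through self._search_regex.

```python
import re

# Pattern copied from the hunk above: group 1 captures the opening quote,
# and (?:(?!\1).)+ consumes characters until that same quote reappears.
LICENSE_RE = r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1'


def find_license(webpage, cc_license=None):
    """Return cc_license if already set, otherwise search the page markup."""
    if cc_license:
        return cc_license
    mobj = re.search(LICENSE_RE, webpage)
    return mobj.group('license') if mobj else None


# Hypothetical sample markup, not taken from a real Vimeo page.
sample = '<link rel="license" href="https://creativecommons.org/licenses/by/3.0/">'
print(find_license(sample))
```

Because the closing delimiter is \1 rather than a fixed character, a double quote inside a single-quoted href does not end the match early.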
diff --git a/youtube_dl/extractor/vimple.py b/youtube_dl/extractor/vimple.py
index 7fd9b777b4b6bb88cd08e9e625f74f41e8775092..c74b437668c62c7ac38d085433532c994ab8fbb7 100644
--- a/youtube_dl/extractor/vimple.py
+++ b/youtube_dl/extractor/vimple.py
@@ -37,7 +37,7 @@ class VimpleIE(SprutoBaseIE):
              'ext': 'mp4',
              'title': 'Sunset',
              'duration': 20,
-            'thumbnail': 're:https?://.*?\.jpg',
+            'thumbnail': r're:https?://.*?\.jpg',
          },
      }, {
          'url': 'http://player.vimple.ru/iframe/52e1beec-1314-4a83-aeac-c61562eadbf9',
diff --git a/youtube_dl/extractor/vine.py b/youtube_dl/extractor/vine.py
index 0183f052a599f411a48053360ee41670e758f7af..4957a07f7bde0797f1f8b6f4c4ef9297f8e915e3 100644
--- a/youtube_dl/extractor/vine.py
+++ b/youtube_dl/extractor/vine.py
@@ -6,8 +6,9 @@ import itertools
  
  from .common import InfoExtractor
  from ..utils import (
+    determine_ext,
      int_or_none,
-    unified_strdate,
+    unified_timestamp,
  )
  
  
@@ -20,50 +21,16 @@ class VineIE(InfoExtractor):
              'id': 'b9KOOWX7HUx',
              'ext': 'mp4',
              'title': 'Chicken.',
-            'alt_title': 'Vine by Jack Dorsey',
+            'alt_title': 'Vine by Jack',
+            'timestamp': 1368997951,
              'upload_date': '20130519',
-            'uploader': 'Jack Dorsey',
+            'uploader': 'Jack',
              'uploader_id': '76',
              'view_count': int,
              'like_count': int,
              'comment_count': int,
              'repost_count': int,
          },
-    }, {
-        'url': 'https://vine.co/v/MYxVapFvz2z',
-        'md5': '7b9a7cbc76734424ff942eb52c8f1065',
-        'info_dict': {
-            'id': 'MYxVapFvz2z',
-            'ext': 'mp4',
-            'title': 'Fuck Da Police #Mikebrown #justice #ferguson #prayforferguson #protesting #NMOS14',
-            'alt_title': 'Vine by Mars Ruiz',
-            'upload_date': '20140815',
-            'uploader': 'Mars Ruiz',
-            'uploader_id': '1102363502380728320',
-            'view_count': int,
-            'like_count': int,
-            'comment_count': int,
-            'repost_count': int,
-        },
-    }, {
-        'url': 'https://vine.co/v/bxVjBbZlPUH',
-        'md5': 'ea27decea3fa670625aac92771a96b73',
-        'info_dict': {
-            'id': 'bxVjBbZlPUH',
-            'ext': 'mp4',
-            'title': '#mw3 #ac130 #killcam #angelofdeath',
-            'alt_title': 'Vine by Z3k3',
-            'upload_date': '20130430',
-            'uploader': 'Z3k3',
-            'uploader_id': '936470460173008896',
-            'view_count': int,
-            'like_count': int,
-            'comment_count': int,
-            'repost_count': int,
-        },
-    }, {
-        'url': 'https://vine.co/oembed/MYxVapFvz2z.json',
-        'only_matching': True,
      }, {
          'url': 'https://vine.co/v/e192BnZnZ9V',
          'info_dict': {
@@ -71,6 +38,7 @@ class VineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'ยิ้ม~ เขิน~ อาย~ น่าร้ากอ้ะ >//< @n_whitewo @orlameena #lovesicktheseries  #lovesickseason2',
              'alt_title': 'Vine by Pimry_zaa',
+            'timestamp': 1436057405,
              'upload_date': '20150705',
              'uploader': 'Pimry_zaa',
              'uploader_id': '1135760698325307392',
@@ -82,43 +50,60 @@ class VineIE(InfoExtractor):
          'params': {
              'skip_download': True,
          },
+    }, {
+        'url': 'https://vine.co/v/MYxVapFvz2z',
+        'only_matching': True,
+    }, {
+        'url': 'https://vine.co/v/bxVjBbZlPUH',
+        'only_matching': True,
+    }, {
+        'url': 'https://vine.co/oembed/MYxVapFvz2z.json',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        webpage = self._download_webpage('https://vine.co/v/' + video_id, video_id)
-
-        data = self._parse_json(
-            self._search_regex(
-                r'window\.POST_DATA\s*=\s*({.+?});\s*</script>',
-                webpage, 'vine data'),
-            video_id)
-
-        data = data[list(data.keys())[0]]
-
-        formats = [{
-            'format_id': '%(format)s-%(rate)s' % f,
-            'vcodec': f.get('format'),
-            'quality': f.get('rate'),
-            'url': f['videoUrl'],
-        } for f in data['videoUrls'] if f.get('videoUrl')]
  
+        data = self._download_json(
+            'https://archive.vine.co/posts/%s.json' % video_id, video_id)
+
+        def video_url(kind):
+            for url_suffix in ('Url', 'URL'):
+                format_url = data.get('video%s%s' % (kind, url_suffix))
+                if format_url:
+                    return format_url
+
+        formats = []
+        for quality, format_id in enumerate(('low', '', 'dash')):
+            format_url = video_url(format_id.capitalize())
+            if not format_url:
+                continue
+            # DASH link returns plain mp4
+            if format_id == 'dash' and determine_ext(format_url) == 'mpd':
+                formats.extend(self._extract_mpd_formats(
+                    format_url, video_id, mpd_id='dash', fatal=False))
+            else:
+                formats.append({
+                    'url': format_url,
+                    'format_id': format_id or 'standard',
+                    'quality': quality,
+                })
          self._sort_formats(formats)
  
          username = data.get('username')
  
          return {
              'id': video_id,
-            'title': data.get('description') or self._og_search_title(webpage),
-            'alt_title': 'Vine by %s' % username if username else self._og_search_description(webpage, default=None),
+            'title': data.get('description'),
+            'alt_title': 'Vine by %s' % username if username else None,
              'thumbnail': data.get('thumbnailUrl'),
-            'upload_date': unified_strdate(data.get('created')),
+            'timestamp': unified_timestamp(data.get('created')),
              'uploader': username,
              'uploader_id': data.get('userIdStr'),
-            'view_count': int_or_none(data.get('loops', {}).get('count')),
-            'like_count': int_or_none(data.get('likes', {}).get('count')),
-            'comment_count': int_or_none(data.get('comments', {}).get('count')),
-            'repost_count': int_or_none(data.get('reposts', {}).get('count')),
+            'view_count': int_or_none(data.get('loops')),
+            'like_count': int_or_none(data.get('likes')),
+            'comment_count': int_or_none(data.get('comments')),
+            'repost_count': int_or_none(data.get('reposts')),
              'formats': formats,
          }
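Note on the new VineIE format loop: enumerate() over ('low', '', 'dash') supplies the quality rank, and the empty string maps to the unprefixed key, labelled 'standard'. The sketch below reproduces that loop as a standalone function under stated assumptions. build_formats and the sample post are hypothetical, and the MPD branch is left out because it depends on the extractor's _extract_mpd_formats helper.

```python
def build_formats(data):
    """Collect plain-URL formats from a Vine archive post dict."""

    def video_url(kind):
        # As in the hunk above, the key suffix may be spelled "Url" or "URL".
        for url_suffix in ('Url', 'URL'):
            format_url = data.get('video%s%s' % (kind, url_suffix))
            if format_url:
                return format_url
        return None

    formats = []
    # enumerate() yields the quality rank: low=0, standard=1, dash=2.
    for quality, format_id in enumerate(('low', '', 'dash')):
        # ''.capitalize() stays '', so the standard key is 'videoUrl' or 'videoURL'.
        format_url = video_url(format_id.capitalize())
        if not format_url:
            continue
        formats.append({
            'url': format_url,
            'format_id': format_id or 'standard',
            'quality': quality,
        })
    return formats


# Hypothetical post data; the URLs are placeholders.
post = {
    'videoLowURL': 'https://example.com/low.mp4',
    'videoUrl': 'https://example.com/standard.mp4',
}
for f in build_formats(post):
    print(f['format_id'], f['quality'], f['url'])
```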
  
diff --git a/youtube_dl/extractor/viu.py b/youtube_dl/extractor/viu.py
new file mode 100644
index 0000000..3fd889c
--- /dev/null
+++ b/youtube_dl/extractor/viu.py
@@ -0,0 +1,249 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
+
+
+class ViuBaseIE(InfoExtractor):
+    def _real_initialize(self):
+        viu_auth_res = self._request_webpage(
+            'https://www.viu.com/api/apps/v2/authenticate', None,
+            'Requesting Viu auth', query={
+                'acct': 'test',
+                'appid': 'viu_desktop',
+                'fmt': 'json',
+                'iid': 'guest',
+                'languageid': 'default',
+                'platform': 'desktop',
+                'userid': 'guest',
+                'useridtype': 'guest',
+                'ver': '1.0'
+            }, headers=self.geo_verification_headers())
+        self._auth_token = viu_auth_res.info()['X-VIU-AUTH']
+
+    def _call_api(self, path, *args, **kwargs):
+        headers = self.geo_verification_headers()
+        headers.update({
+            'X-VIU-AUTH': self._auth_token
+        })
+        headers.update(kwargs.get('headers', {}))
+        kwargs['headers'] = headers
+        response = self._download_json(
+            'https://www.viu.com/api/' + path, *args, **kwargs)['response']
+        if response.get('status') != 'success':
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, response['message']), expected=True)
+        return response
+
+
+class ViuIE(ViuBaseIE):
+    _VALID_URL = r'(?:viu:|https?://www\.viu\.com/[a-z]{2}/media/)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.viu.com/en/media/1116705532?containerId=playlist-22168059',
+        'info_dict': {
+            'id': '1116705532',
+            'ext': 'mp4',
+            'title': 'Citizen Khan - Ep 1',
+            'description': 'md5:d7ea1604f49e5ba79c212c551ce2110e',
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        },
+        'skip': 'Geo-restricted to India',
+    }, {
+        'url': 'https://www.viu.com/en/media/1130599965',
+        'info_dict': {
+            'id': '1130599965',
+            'ext': 'mp4',
+            'title': 'Jealousy Incarnate - Episode 1',
+            'description': 'md5:d3d82375cab969415d2720b6894361e9',
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        },
+        'skip': 'Geo-restricted to Indonesia',
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video_data = self._call_api(
+            'clip/load', video_id, 'Downloading video data', query={
+                'appid': 'viu_desktop',
+                'fmt': 'json',
+                'id': video_id
+            })['item'][0]
+
+        title = video_data['title']
+
+        m3u8_url = None
+        url_path = video_data.get('urlpathd') or video_data.get('urlpath')
+        tdirforwhole = video_data.get('tdirforwhole')
+        # #EXT-X-BYTERANGE is not supported by native hls downloader
+        # and ffmpeg (#10955)
+        # hls_file = video_data.get('hlsfile')
+        hls_file = video_data.get('jwhlsfile')
+        if url_path and tdirforwhole and hls_file:
+            m3u8_url = '%s/%s/%s' % (url_path, tdirforwhole, hls_file)
+        else:
+            # m3u8_url = re.sub(
+            #     r'(/hlsc_)[a-z]+(\d+\.m3u8)',
+            #     r'\1whe\2', video_data['href'])
+            m3u8_url = video_data['href']
+        formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for key, value in video_data.items():
+            mobj = re.match(r'^subtitle_(?P<lang>[^_]+)_(?P<ext>(vtt|srt))', key)
+            if not mobj:
+                continue
+            subtitles.setdefault(mobj.group('lang'), []).append({
+                'url': value,
+                'ext': mobj.group('ext')
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'series': video_data.get('moviealbumshowname'),
+            'episode': title,
+            'episode_number': int_or_none(video_data.get('episodeno')),
+            'duration': int_or_none(video_data.get('duration')),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
+
+
+class ViuPlaylistIE(ViuBaseIE):
+    IE_NAME = 'viu:playlist'
+    _VALID_URL = r'https?://www\.viu\.com/[^/]+/listing/playlist-(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://www.viu.com/en/listing/playlist-22461380',
+        'info_dict': {
+            'id': '22461380',
+            'title': 'The Good Wife',
+        },
+        'playlist_count': 16,
+        'skip': 'Geo-restricted to Indonesia',
+    }
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        playlist_data = self._call_api(
+            'container/load', playlist_id,
+            'Downloading playlist info', query={
+                'appid': 'viu_desktop',
+                'fmt': 'json',
+                'id': 'playlist-' + playlist_id
+            })['container']
+
+        entries = []
+        for item in playlist_data.get('item', []):
+            item_id = item.get('id')
+            if not item_id:
+                continue
+            item_id = compat_str(item_id)
+            entries.append(self.url_result(
+                'viu:' + item_id, 'Viu', item_id))
+
+        return self.playlist_result(
+            entries, playlist_id, playlist_data.get('title'))
+
+
+class ViuOTTIE(InfoExtractor):
+    IE_NAME = 'viu:ott'
+    _VALID_URL = r'https?://(?:www\.)?viu\.com/ott/(?P<country_code>[a-z]{2})/[a-z]{2}-[a-z]{2}/vod/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.viu.com/ott/sg/en-us/vod/3421/The%20Prime%20Minister%20and%20I',
+        'info_dict': {
+            'id': '3421',
+            'ext': 'mp4',
+            'title': 'A New Beginning',
+            'description': 'md5:1e7486a619b6399b25ba6a41c0fe5b2c',
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        },
+        'skip': 'Geo-restricted to Singapore',
+    }, {
+        'url': 'http://www.viu.com/ott/hk/zh-hk/vod/7123/%E5%A4%A7%E4%BA%BA%E5%A5%B3%E5%AD%90',
+        'info_dict': {
+            'id': '7123',
+            'ext': 'mp4',
+            'title': '這就是我的生活之道',
+            'description': 'md5:4eb0d8b08cf04fcdc6bbbeb16043434f',
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        },
+        'skip': 'Geo-restricted to Hong Kong',
+    }]
+
+    def _real_extract(self, url):
+        country_code, video_id = re.match(self._VALID_URL, url).groups()
+
+        product_data = self._download_json(
+            'http://www.viu.com/ott/%s/index.php' % country_code, video_id,
+            'Downloading video info', query={
+                'r': 'vod/ajax-detail',
+                'platform_flag_label': 'web',
+                'product_id': video_id,
+            })['data']
+
+        video_data = product_data.get('current_product')
+        if not video_data:
+            raise ExtractorError('This video is not available in your region.', expected=True)
+
+        stream_data = self._download_json(
+            'https://d1k2us671qcoau.cloudfront.net/distribute_web_%s.php' % country_code,
+            video_id, 'Downloading stream info', query={
+                'ccs_product_id': video_data['ccs_product_id'],
+            })['data']['stream']
+
+        stream_sizes = stream_data.get('size', {})
+        formats = []
+        for vid_format, stream_url in stream_data.get('url', {}).items():
+            height = int_or_none(self._search_regex(
+                r's(\d+)p', vid_format, 'height', default=None))
+            formats.append({
+                'format_id': vid_format,
+                'url': stream_url,
+                'height': height,
+                'ext': 'mp4',
+                'filesize': int_or_none(stream_sizes.get(vid_format))
+            })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for sub in video_data.get('subtitle', []):
+            sub_url = sub.get('url')
+            if not sub_url:
+                continue
+            subtitles.setdefault(sub.get('name'), []).append({
+                'url': sub_url,
+                'ext': 'srt',
+            })
+
+        title = video_data['synopsis'].strip()
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'series': product_data.get('series', {}).get('name'),
+            'episode': title,
+            'episode_number': int_or_none(video_data.get('number')),
+            'duration': int_or_none(stream_data.get('duration')),
+            'thumbnail': video_data.get('cover_image_url'),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
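Note on the subtitle handling in ViuIE: subtitle URLs are found by matching the keys of the clip dict against a subtitle_<lang>_<ext> pattern, then grouped per language. A minimal standalone sketch of that grouping follows. collect_subtitles and the sample keys are hypothetical; the key names are inferred from the regex in the new file, not from a real API response.

```python
import re

# Same pattern as in ViuIE._real_extract: language code, then vtt or srt.
SUBTITLE_KEY_RE = r'^subtitle_(?P<lang>[^_]+)_(?P<ext>(vtt|srt))'


def collect_subtitles(video_data):
    """Group subtitle URLs by language, based on the dict's key names."""
    subtitles = {}
    for key, value in video_data.items():
        mobj = re.match(SUBTITLE_KEY_RE, key)
        if not mobj:
            continue  # not a subtitle key
        subtitles.setdefault(mobj.group('lang'), []).append({
            'url': value,
            'ext': mobj.group('ext'),
        })
    return subtitles


# Hypothetical clip data with placeholder URLs.
clip = {
    'title': 'Example episode',
    'subtitle_en_srt': 'https://example.com/en.srt',
    'subtitle_en_vtt': 'https://example.com/en.vtt',
    'subtitle_id_srt': 'https://example.com/id.srt',
}
print(collect_subtitles(clip))
```

setdefault keeps one list per language, so a language offered in both formats ends up with two entries.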
diff --git a/youtube_dl/extractor/vk.py b/youtube_dl/extractor/vk.py
index 1990e7093acabb2dce11faebfddd220e8d88392b..7c42a4f54864eeb370c6a4583daae950262b45b8 100644
--- a/youtube_dl/extractor/vk.py
+++ b/youtube_dl/extractor/vk.py
@@ -245,7 +245,7 @@ class VKIE(VKBaseIE):
              },
          },
          {
-            # finished live stream, live_mp4
+            # finished live stream, postlive_mp4
              'url': 'https://vk.com/videos-387766?z=video-387766_456242764%2Fpl_-387766_-2',
              'md5': '90d22d051fccbbe9becfccc615be6791',
              'info_dict': {
@@ -258,7 +258,7 @@ class VKIE(VKBaseIE):
              },
          },
          {
-            # live stream, hls and rtmp links,most likely already finished live
+            # live stream, hls and rtmp links, most likely already finished live
              # stream by the time you are reading this comment
              'url': 'https://vk.com/video-140332_456239111',
              'only_matching': True,
@@ -281,6 +281,11 @@ class VKIE(VKBaseIE):
          {
              'url': 'http://new.vk.com/video205387401_165548505',
              'only_matching': True,
+        },
+        {
+            # This video is no longer available, because its author has been blocked.
+            'url': 'https://vk.com/video-10639516_456240611',
+            'only_matching': True,
          }
      ]
  
@@ -328,6 +333,12 @@ class VKIE(VKBaseIE):
  
              r'<!>Access denied':
              'Access denied to video %s.',
+
+            r'<!>Видеозапись недоступна, так как её автор был заблокирован.':
+            'Video %s is no longer available, because its author has been blocked.',
+
+            r'<!>This video is no longer available, because its author has been blocked.':
+            'Video %s is no longer available, because its author has been blocked.',
          }
  
          for error_re, error_msg in ERRORS.items():
@@ -378,12 +389,24 @@ class VKIE(VKBaseIE):
          if not data:
              data = self._parse_json(
                  self._search_regex(
-                    r'<!json>\s*({.+?})\s*<!>', info_page, 'json'),
-                video_id)['player']['params'][0]
+                    r'<!json>\s*({.+?})\s*<!>', info_page, 'json', default='{}'),
+                video_id)
+            if data:
+                data = data['player']['params'][0]
+
+        if not data:
+            data = self._parse_json(
+                self._search_regex(
+                    r'var\s+playerParams\s*=\s*({.+?})\s*;\s*\n', info_page,
+                    'player params'),
+                video_id)['params'][0]
  
          title = unescapeHTML(data['md_title'])
  
-        if data.get('live') == 2:
+        # 2 = live
+        # 3 = post live (finished live)
+        is_live = data.get('live') == 2
+        if is_live:
              title = self._live_title(title)
  
          timestamp = unified_timestamp(self._html_search_regex(
@@ -398,7 +421,8 @@ class VKIE(VKBaseIE):
          for format_id, format_url in data.items():
              if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//', 'rtmp')):
                  continue
-            if format_id.startswith(('url', 'cache')) or format_id in ('extra_data', 'live_mp4'):
+            if (format_id.startswith(('url', 'cache')) or
+                    format_id in ('extra_data', 'live_mp4', 'postlive_mp4')):
                  height = int_or_none(self._search_regex(
                      r'^(?:url|cache)(\d+)', format_id, 'height', default=None))
                  formats.append({
@@ -408,8 +432,9 @@ class VKIE(VKBaseIE):
                  })
              elif format_id == 'hls':
                  formats.extend(self._extract_m3u8_formats(
-                    format_url, video_id, 'mp4', m3u8_id=format_id,
-                    fatal=False, live=True))
+                    format_url, video_id, 'mp4',
+                    entry_protocol='m3u8' if is_live else 'm3u8_native',
+                    m3u8_id=format_id, fatal=False, live=is_live))
              elif format_id == 'rtmp':
                  formats.append({
                      'format_id': format_id,
@@ -427,6 +452,7 @@ class VKIE(VKBaseIE):
              'duration': data.get('duration'),
              'timestamp': timestamp,
              'view_count': view_count,
+            'is_live': is_live,
          }
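Note on the VKIE changes: postlive_mp4 is now treated as a direct media URL, and the HLS branch picks its protocol from the live flag (2 = live, 3 = post live). The sketch below restates both decisions as standalone functions. pick_direct_formats and live_state are hypothetical names, str stands in for compat_str on the assumption of Python 3, and the fields of the appended dict are an assumption because the diff elides that part of the code.

```python
import re

# IDs that carry a direct media URL; 'postlive_mp4' is the one added by this commit.
DIRECT_FORMAT_IDS = ('extra_data', 'live_mp4', 'postlive_mp4')


def pick_direct_formats(data):
    """Return direct-URL formats found in a player params dict."""
    formats = []
    for format_id, format_url in data.items():
        # Skip non-string values and strings that do not look like URLs.
        if not isinstance(format_url, str) or not format_url.startswith(('http', '//', 'rtmp')):
            continue
        if format_id.startswith(('url', 'cache')) or format_id in DIRECT_FORMAT_IDS:
            # Height is encoded in IDs such as 'url720' or 'cache480'.
            mobj = re.search(r'^(?:url|cache)(\d+)', format_id)
            formats.append({
                'format_id': format_id,
                'url': format_url,
                'height': int(mobj.group(1)) if mobj else None,
            })
    return formats


def live_state(data):
    """Return (is_live, HLS entry protocol) for a player params dict."""
    is_live = data.get('live') == 2
    return is_live, ('m3u8' if is_live else 'm3u8_native')


# Hypothetical player params with placeholder URLs.
params = {
    'md_title': 'Example title',
    'duration': 10,
    'url720': 'https://example.com/720.mp4',
    'postlive_mp4': 'https://example.com/post.mp4',
    'hls': 'https://example.com/index.m3u8',
    'live': 3,
}
print(pick_direct_formats(params))
print(live_state(params))
```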
  
  
diff --git a/youtube_dl/extractor/vlive.py b/youtube_dl/extractor/vlive.py
index acf9fda487f6143906b8162a158c1cf9f53fec68..b9718901b8339e3d5fee85a15d20514b038b66aa 100644
--- a/youtube_dl/extractor/vlive.py
+++ b/youtube_dl/extractor/vlive.py
@@ -2,16 +2,23 @@
  from __future__ import unicode_literals
  
  import re
+import time
+import itertools
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_str,
+)
  from ..utils import (
      dict_get,
      ExtractorError,
      float_or_none,
      int_or_none,
      remove_start,
+    try_get,
+    urlencode_postdata,
  )
-from ..compat import compat_urllib_parse_urlencode
  
  
  class VLiveIE(InfoExtractor):
@@ -48,17 +55,23 @@ class VLiveIE(InfoExtractor):
          webpage = self._download_webpage(
              'http://www.vlive.tv/video/%s' % video_id, video_id)
  
-        video_params = self._search_regex(
-            r'\bvlive\.video\.init\(([^)]+)\)',
-            webpage, 'video params')
-        status, _, _, live_params, long_video_id, key = re.split(
-            r'"\s*,\s*"', video_params)[2:8]
+        VIDEO_PARAMS_RE = r'\bvlive\.video\.init\(([^)]+)'
+        VIDEO_PARAMS_FIELD = 'video params'
+
+        params = self._parse_json(self._search_regex(
+            VIDEO_PARAMS_RE, webpage, VIDEO_PARAMS_FIELD, default=''), video_id,
+            transform_source=lambda s: '[' + s + ']', fatal=False)
+
+        if not params or len(params) < 7:
+            params = self._search_regex(
+                VIDEO_PARAMS_RE, webpage, VIDEO_PARAMS_FIELD)
+            params = [p.strip(r'"') for p in re.split(r'\s*,\s*', params)]
+
+        status, long_video_id, key = params[2], params[5], params[6]
          status = remove_start(status, 'PRODUCT_')
  
          if status == 'LIVE_ON_AIR' or status == 'BIG_EVENT_ON_AIR':
-            live_params = self._parse_json('"%s"' % live_params, video_id)
-            live_params = self._parse_json(live_params, video_id)
-            return self._live(video_id, webpage, live_params)
+            return self._live(video_id, webpage)
          elif status == 'VOD_ON_AIR' or status == 'BIG_EVENT_INTRO':
              if long_video_id and key:
                  return self._replay(video_id, webpage, long_video_id, key)
@@ -89,7 +102,22 @@ class VLiveIE(InfoExtractor):
              'thumbnail': thumbnail,
          }
  
-    def _live(self, video_id, webpage, live_params):
+    def _live(self, video_id, webpage):
+        init_page = self._download_webpage(
+            'http://www.vlive.tv/video/init/view',
+            video_id, note='Downloading live webpage',
+            data=urlencode_postdata({'videoSeq': video_id}),
+            headers={
+                'Referer': 'http://www.vlive.tv/video/%s' % video_id,
+                'Content-Type': 'application/x-www-form-urlencoded'
+            })
+
+        live_params = self._search_regex(
+            r'"liveStreamInfo"\s*:\s*(".*"),',
+            init_page, 'live stream info')
+        live_params = self._parse_json(live_params, video_id)
+        live_params = self._parse_json(live_params, video_id)
+
          formats = []
          for vid in live_params.get('resolutions', []):
              formats.extend(self._extract_m3u8_formats(
@@ -98,10 +126,14 @@ class VLiveIE(InfoExtractor):
                  fatal=False, live=True))
          self._sort_formats(formats)
  
-        return dict(self._get_common_fields(webpage),
-                    id=video_id,
-                    formats=formats,
-                    is_live=True)
+        info = self._get_common_fields(webpage)
+        info.update({
+            'title': self._live_title(info['title']),
+            'id': video_id,
+            'formats': formats,
+            'is_live': True,
+        })
+        return info
  
      def _replay(self, video_id, webpage, long_video_id, key):
          playinfo = self._download_json(
@@ -135,8 +167,97 @@ class VLiveIE(InfoExtractor):
                      'ext': 'vtt',
                      'url': caption['source']}]
  
-        return dict(self._get_common_fields(webpage),
-                    id=video_id,
-                    formats=formats,
-                    view_count=view_count,
-                    subtitles=subtitles)
+        info = self._get_common_fields(webpage)
+        info.update({
+            'id': video_id,
+            'formats': formats,
+            'view_count': view_count,
+            'subtitles': subtitles,
+        })
+        return info
+
+
+class VLiveChannelIE(InfoExtractor):
+    IE_NAME = 'vlive:channel'
+    _VALID_URL = r'https?://channels\.vlive\.tv/(?P<id>[0-9A-Z]+)'
+    _TEST = {
+        'url': 'http://channels.vlive.tv/FCD4B',
+        'info_dict': {
+            'id': 'FCD4B',
+            'title': 'MAMAMOO',
+        },
+        'playlist_mincount': 110
+    }
+    _APP_ID = '8c6cc7b45d2568fb668be6e05b6e5a3b'
+
+    def _real_extract(self, url):
+        channel_code = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'http://channels.vlive.tv/%s/video' % channel_code, channel_code)
+
+        app_id = None
+
+        app_js_url = self._search_regex(
+            r'<script[^>]+src=(["\'])(?P<url>http.+?/app\.js.*?)\1',
+            webpage, 'app js', default=None, group='url')
+
+        if app_js_url:
+            app_js = self._download_webpage(
+                app_js_url, channel_code, 'Downloading app JS', fatal=False)
+            if app_js:
+                app_id = self._search_regex(
+                    r'Global\.VFAN_APP_ID\s*=\s*[\'"]([^\'"]+)[\'"]',
+                    app_js, 'app id', default=None)
+
+        app_id = app_id or self._APP_ID
+
+        channel_info = self._download_json(
+            'http://api.vfan.vlive.tv/vproxy/channelplus/decodeChannelCode',
+            channel_code, note='Downloading decode channel code',
+            query={
+                'app_id': app_id,
+                'channelCode': channel_code,
+                '_': int(time.time())
+            })
+
+        channel_seq = channel_info['result']['channelSeq']
+        channel_name = None
+        entries = []
+
+        for page_num in itertools.count(1):
+            video_list = self._download_json(
+                'http://api.vfan.vlive.tv/vproxy/channelplus/getChannelVideoList',
+                channel_code, note='Downloading channel list page #%d' % page_num,
+                query={
+                    'app_id': app_id,
+                    'channelSeq': channel_seq,
+                    'maxNumOfRows': 1000,
+                    '_': int(time.time()),
+                    'pageNo': page_num
+                }
+            )
+
+            if not channel_name:
+                channel_name = try_get(
+                    video_list,
+                    lambda x: x['result']['channelInfo']['channelName'],
+                    compat_str)
+
+            videos = try_get(
+                video_list, lambda x: x['result']['videoList'], list)
+            if not videos:
+                break
+
+            for video in videos:
+                video_id = video.get('videoSeq')
+                if not video_id:
+                    continue
+                video_id = compat_str(video_id)
+                entries.append(
+                    self.url_result(
+                        'http://www.vlive.tv/video/%s' % video_id,
+                        ie=VLiveIE.ie_key(), video_id=video_id))
+
+        return self.playlist_result(
+            entries, channel_code, channel_name)
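
Note on the paging loop in VLiveChannelIE above: it walks page numbers with itertools.count(1) and stops at the first page that returns no video list. A minimal sketch of that pattern follows, assuming a hypothetical fetch_page callable in place of the _download_json call; FAKE_PAGES and fake_fetch are illustrative only and are not part of the patch.

```python
import itertools


def collect_entries(fetch_page):
    """Gather video ids page by page until a page comes back empty."""
    entries = []
    for page_num in itertools.count(1):
        videos = fetch_page(page_num)
        if not videos:
            # An empty or missing list marks the end of the channel.
            break
        for video in videos:
            video_id = video.get('videoSeq')
            if not video_id:
                # Skip malformed items instead of failing the whole playlist.
                continue
            entries.append(str(video_id))
    return entries


# Hypothetical stand-in for the remote API (assumption, not real data).
FAKE_PAGES = {
    1: [{'videoSeq': 11}, {'videoSeq': None}],
    2: [{'videoSeq': 12}],
}


def fake_fetch(page_num):
    return FAKE_PAGES.get(page_num, [])


print(collect_entries(fake_fetch))
```

Stopping on the first empty page avoids having to know the total page count in advance, at the cost of one extra request at the end.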
diff --git a/youtube_dl/extractor/vodlocker.py b/youtube_dl/extractor/vodlocker.py

index bbfa6e5f26f6043af52ae168b6e2cebb7463edfc..02c9617d297926b75b62cfdd5a4b33f085a9741a 100644 (file)
--- a/youtube_dl/extractor/vodlocker.py
+++ b/youtube_dl/extractor/vodlocker.py
@@ -20,7 +20,7 @@ class VodlockerIE(InfoExtractor):
              'id': 'e8wvyzz4sl42',
              'ext': 'mp4',
              'title': 'Germany vs Brazil',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }]
  
diff --git a/youtube_dl/extractor/voicerepublic.py b/youtube_dl/extractor/voicerepublic.py

index 4f1a99a8989d736c1de572e6372b022544102f87..59e1359c48628af9b4c53bedc337fa6b9b3d1396 100644 (file)
--- a/youtube_dl/extractor/voicerepublic.py
+++ b/youtube_dl/extractor/voicerepublic.py
@@ -26,7 +26,7 @@ class VoiceRepublicIE(InfoExtractor):
              'ext': 'm4a',
              'title': 'Watching the Watchers: Building a Sousveillance State',
              'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
-            'thumbnail': 're:^https?://.*\.(?:png|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:png|jpg)$',
              'duration': 1800,
              'view_count': int,
          }
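
Note on the r'' prefix added to the thumbnail patterns in this commit: the raw and non-raw spellings produce the same string, but a bare '\.' in a normal string literal is an unrecognized escape sequence, which Python 3.6 and later report with a DeprecationWarning. That is presumably the motivation for the change. The sketch below compares the two spellings; matches_expected is a hypothetical helper that only imitates the 're:' prefix convention of the test dictionaries and is not the project's test code.

```python
import re

# Same pattern text, written two ways.
escaped = 're:^https?://.*\\.(?:png|jpg)$'   # backslash doubled by hand
raw = r're:^https?://.*\.(?:png|jpg)$'       # raw string keeps the backslash


def matches_expected(expected, actual):
    """Hypothetical check: treat an 're:' prefix as a regular expression."""
    if expected.startswith('re:'):
        return re.match(expected[3:], actual) is not None
    return expected == actual


print(escaped == raw)
print(matches_expected(raw, 'https://example.com/a.png'))
```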
diff --git a/youtube_dl/extractor/vporn.py b/youtube_dl/extractor/vporn.py

index 1557a0e0406ebfb75c2b5b4583c74f05c5dd2cc7..858ac9e71422548f688600450e3f7cc0a630c500 100644 (file)
--- a/youtube_dl/extractor/vporn.py
+++ b/youtube_dl/extractor/vporn.py
@@ -7,6 +7,7 @@ from ..utils import (
      ExtractorError,
      parse_duration,
      str_to_int,
+    urljoin,
  )
  
  
@@ -22,7 +23,7 @@ class VpornIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Violet on her 19th birthday',
                  'description': 'Violet dances in front of the camera which is sure to get you horny.',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'uploader': 'kileyGrope',
                  'categories': ['Masturbation', 'Teen'],
                  'duration': 393,
@@ -40,7 +41,7 @@ class VpornIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Hana Shower',
                  'description': 'Hana showers at the bathroom.',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'uploader': 'Hmmmmm',
                  'categories': ['Big Boobs', 'Erotic', 'Teen', 'Female', '720p'],
                  'duration': 588,
@@ -66,10 +67,9 @@ class VpornIE(InfoExtractor):
          description = self._html_search_regex(
              r'class="(?:descr|description_txt)">(.*?)</div>',
              webpage, 'description', fatal=False)
-        thumbnail = self._html_search_regex(
-            r'flashvars\.imageUrl\s*=\s*"([^"]+)"', webpage, 'description', fatal=False, default=None)
-        if thumbnail:
-            thumbnail = 'http://www.vporn.com' + thumbnail
+        thumbnail = urljoin('http://www.vporn.com', self._html_search_regex(
+            r'flashvars\.imageUrl\s*=\s*"([^"]+)"', webpage, 'thumbnail',
+            default=None))
  
          uploader = self._html_search_regex(
              r'(?s)Uploaded by:.*?<a href="/user/[^"]+"[^>]*>(.+?)</a>',
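
Note on the urljoin change in VpornIE above: the explicit `if thumbnail:` guard is dropped, so the patch appears to rely on the utils helper returning None when the regex finds nothing (default=None). The sketch below shows that idea with a hypothetical join_or_none built on the standard library; it is an assumption about the intended behaviour, not youtube-dl's actual urljoin implementation.

```python
from urllib.parse import urljoin as std_urljoin


def join_or_none(base, path):
    """Hypothetical None-tolerant join (not youtube-dl's urljoin)."""
    if not isinstance(path, str) or not path:
        # Nothing was extracted, so there is no URL to build.
        return None
    return std_urljoin(base, path)


print(join_or_none('http://www.vporn.com', '/thumbs/1.jpg'))
print(join_or_none('http://www.vporn.com', None))
```

Folding the None check into the join keeps the call site to a single expression.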
diff --git a/youtube_dl/extractor/vube.py b/youtube_dl/extractor/vube.py

index 10ca6acb12469f85267405f9431b9508c0537e57..8ce3a6b81b6ed69e5fd4960ab4721ecd45bf750e 100644 (file)
--- a/youtube_dl/extractor/vube.py
+++ b/youtube_dl/extractor/vube.py
@@ -26,7 +26,7 @@ class VubeIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Best Drummer Ever [HD]',
                  'description': 'md5:2d63c4b277b85c2277761c2cf7337d71',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'uploader': 'William',
                  'timestamp': 1406876915,
                  'upload_date': '20140801',
@@ -45,7 +45,7 @@ class VubeIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Chiara Grispo - Price Tag by Jessie J',
                  'description': 'md5:8ea652a1f36818352428cb5134933313',
-                'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f\.jpg$',
+                'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f\.jpg$',
                  'uploader': 'Chiara.Grispo',
                  'timestamp': 1388743358,
                  'upload_date': '20140103',
@@ -65,7 +65,7 @@ class VubeIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'My 7 year old Sister and I singing "Alive" by Krewella',
                  'description': 'md5:40bcacb97796339f1690642c21d56f4a',
-                'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102265d5a9f-0f17-4f6b-5753-adf08484ee1e\.jpg$',
+                'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102265d5a9f-0f17-4f6b-5753-adf08484ee1e\.jpg$',
                  'uploader': 'Seraina',
                  'timestamp': 1396492438,
                  'upload_date': '20140403',
@@ -84,7 +84,7 @@ class VubeIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Frozen - Let It Go Cover by Siren Gene',
                  'description': 'My rendition of "Let It Go" originally sung by Idina Menzel.',
-                'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/10283ab622a-86c9-4681-51f2-30d1f65774af\.jpg$',
+                'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/10283ab622a-86c9-4681-51f2-30d1f65774af\.jpg$',
                  'uploader': 'Siren',
                  'timestamp': 1395448018,
                  'upload_date': '20140322',
diff --git a/youtube_dl/extractor/vvvvid.py b/youtube_dl/extractor/vvvvid.py

new file mode 100644 (file)

index 0000000..d44ec85
--- /dev/null
+++ b/youtube_dl/extractor/vvvvid.py
@@ -0,0 +1,140 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    str_or_none,
+)
+
+
+class VVVVIDIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?vvvvid\.it/#!(?:show|anime|film|series)/(?P<show_id>\d+)/[^/]+/(?P<season_id>\d+)/(?P<id>[0-9]+)'
+    _TESTS = [{
+        # video_type == 'video/vvvvid'
+        'url': 'https://www.vvvvid.it/#!show/434/perche-dovrei-guardarlo-di-dario-moccia/437/489048/ping-pong',
+        'md5': 'b8d3cecc2e981adc3835adf07f6df91b',
+        'info_dict': {
+            'id': '489048',
+            'ext': 'mp4',
+            'title': 'Ping Pong',
+        },
+    }, {
+        # video_type == 'video/rcs'
+        'url': 'https://www.vvvvid.it/#!show/376/death-note-live-action/377/482493/episodio-01',
+        'md5': '33e0edfba720ad73a8782157fdebc648',
+        'info_dict': {
+            'id': '482493',
+            'ext': 'mp4',
+            'title': 'Episodio 01',
+        },
+    }]
+    _conn_id = None
+
+    def _real_initialize(self):
+        self._conn_id = self._download_json(
+            'https://www.vvvvid.it/user/login',
+            None, headers=self.geo_verification_headers())['data']['conn_id']
+
+    def _real_extract(self, url):
+        show_id, season_id, video_id = re.match(self._VALID_URL, url).groups()
+        response = self._download_json(
+            'https://www.vvvvid.it/vvvvid/ondemand/%s/season/%s' % (show_id, season_id),
+            video_id, headers=self.geo_verification_headers(), query={
+                'conn_id': self._conn_id,
+            })
+        if response['result'] == 'error':
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, response['message']), expected=True)
+
+        vid = int(video_id)
+        video_data = list(filter(
+            lambda episode: episode.get('video_id') == vid, response['data']))[0]
+        formats = []
+
+        # vvvvid embed_info decryption algorithm is reverse engineered from function $ds(h) at vvvvid.js
+        def ds(h):
+            g = "MNOPIJKL89+/4567UVWXQRSTEFGHABCDcdefYZabstuvopqr0123wxyzklmnghij"
+
+            def f(m):
+                l = []
+                o = 0
+                b = False
+                m_len = len(m)
+                while ((not b) and o < m_len):
+                    n = m[o] << 2
+                    o += 1
+                    k = -1
+                    j = -1
+                    if o < m_len:
+                        n += m[o] >> 4
+                        o += 1
+                        if o < m_len:
+                            k = (m[o - 1] << 4) & 255
+                            k += m[o] >> 2
+                            o += 1
+                            if o < m_len:
+                                j = (m[o - 1] << 6) & 255
+                                j += m[o]
+                                o += 1
+                            else:
+                                b = True
+                        else:
+                            b = True
+                    else:
+                        b = True
+                    l.append(n)
+                    if k != -1:
+                        l.append(k)
+                    if j != -1:
+                        l.append(j)
+                return l
+
+            c = []
+            for e in h:
+                c.append(g.index(e))
+
+            c_len = len(c)
+            for e in range(c_len * 2 - 1, -1, -1):
+                a = c[e % c_len] ^ c[(e + 1) % c_len]
+                c[e % c_len] = a
+
+            c = f(c)
+            d = ''
+            for e in c:
+                d += chr(e)
+
+            return d
+
+        for quality in ('_sd', ''):
+            embed_code = video_data.get('embed_info' + quality)
+            if not embed_code:
+                continue
+            embed_code = ds(embed_code)
+            video_type = video_data.get('video_type')
+            if video_type in ('video/rcs', 'video/kenc'):
+                formats.extend(self._extract_akamai_formats(
+                    embed_code, video_id))
+            else:
+                formats.extend(self._extract_wowza_formats(
+                    'http://sb.top-ix.org/videomg/_definst_/mp4:%s/playlist.m3u8' % embed_code, video_id))
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_data['title'],
+            'formats': formats,
+            'thumbnail': video_data.get('thumbnail'),
+            'duration': int_or_none(video_data.get('length')),
+            'series': video_data.get('show_title'),
+            'season_id': season_id,
+            'season_number': video_data.get('season_number'),
+            'episode_id': str_or_none(video_data.get('id')),
+            'episode_number': int_or_none(video_data.get('number')),
+            'episode_title': video_data['title'],
+            'view_count': int_or_none(video_data.get('views')),
+            'like_count': int_or_none(video_data.get('video_likes')),
+        }
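
Note on the inner function f in ds above: it repacks 6-bit values into 8-bit values, which is the same bit layout that base64 decoding uses. The sketch below shows only that repacking step for complete groups of four values, checked against the standard alphabet. It leaves out the custom alphabet g, the XOR pass, and the handling of a trailing partial group, so it is an illustration and not a replacement for ds. STD_ALPHABET and sextets_to_bytes are names introduced here.

```python
import base64

# Standard base64 alphabet, used here only to build test input.
STD_ALPHABET = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789+/')


def sextets_to_bytes(sextets):
    """Repack complete groups of four 6-bit values into three bytes."""
    out = bytearray()
    for i in range(0, len(sextets) - 3, 4):
        a, b, c, d = sextets[i:i + 4]
        out.append((a << 2) | (b >> 4))          # 6 bits of a + top 2 of b
        out.append(((b << 4) & 255) | (c >> 2))  # low 4 of b + top 4 of c
        out.append(((c << 6) & 255) | d)         # low 2 of c + 6 bits of d
    return bytes(out)


indices = [STD_ALPHABET.index(ch) for ch in 'TWFu']
print(sextets_to_bytes(indices))
print(base64.b64decode('TWFu'))
```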
diff --git a/youtube_dl/extractor/walla.py b/youtube_dl/extractor/walla.py

index 8b9488340368ea0292fa2614336778099c9eb11e..cbb54867244839e0447324f5a57e07cef2f6c646 100644 (file)
--- a/youtube_dl/extractor/walla.py
+++ b/youtube_dl/extractor/walla.py
@@ -20,7 +20,7 @@ class WallaIE(InfoExtractor):
              'ext': 'flv',
              'title': 'וואן דיירקשן: ההיסטריה',
              'description': 'md5:de9e2512a92442574cdb0913c49bc4d8',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 3600,
          },
          'params': {
diff --git a/youtube_dl/extractor/watchindianporn.py b/youtube_dl/extractor/watchindianporn.py

index 5d3b5bdb4cb904acabea0864dd76be8a0cc62c30..ed099beea632b7eed16e93a7b386b6aec3089778 100644 (file)
--- a/youtube_dl/extractor/watchindianporn.py
+++ b/youtube_dl/extractor/watchindianporn.py
@@ -22,7 +22,7 @@ class WatchIndianPornIE(InfoExtractor):
              'display_id': 'hot-milf-from-kerala-shows-off-her-gorgeous-large-breasts-on-camera',
              'ext': 'mp4',
              'title': 'Hot milf from kerala shows off her gorgeous large breasts on camera',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'LoveJay',
              'upload_date': '20160428',
              'duration': 226,
diff --git a/youtube_dl/extractor/webcaster.py b/youtube_dl/extractor/webcaster.py

index 7486cb347d8523a932c2bddd56f26c9a95f78641..e4b65f54f5278009476f096c09f68fd69ccff478 100644 (file)
--- a/youtube_dl/extractor/webcaster.py
+++ b/youtube_dl/extractor/webcaster.py
@@ -20,7 +20,7 @@ class WebcasterIE(InfoExtractor):
              'id': 'c8cefd240aa593681c8d068cff59f407_hd',
              'ext': 'mp4',
              'title': 'Сибирь - Нефтехимик. Лучшие моменты первого периода',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://bl.webcaster.pro/media/start/free_6246c7a4453ac4c42b4398f840d13100_hd/2_2991109016/e8d0d82587ef435480118f9f9c41db41/4635726126',
diff --git a/youtube_dl/extractor/webofstories.py b/youtube_dl/extractor/webofstories.py

index 7aea47ed52f7f64032034ab43d51dbe524bff2b3..1eb1f67024acfca7948d7450f2bca849069190cc 100644 (file)
--- a/youtube_dl/extractor/webofstories.py
+++ b/youtube_dl/extractor/webofstories.py
@@ -19,7 +19,7 @@ class WebOfStoriesIE(InfoExtractor):
              'id': '4536',
              'ext': 'mp4',
              'title': 'The temperature of the sun',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'Hans Bethe talks about calculating the temperature of the sun',
              'duration': 238,
          }
@@ -30,7 +30,7 @@ class WebOfStoriesIE(InfoExtractor):
              'id': '55908',
              'ext': 'mp4',
              'title': 'The story of Gemmata obscuriglobus',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'Planctomycete talks about The story of Gemmata obscuriglobus',
              'duration': 169,
          },
@@ -42,7 +42,7 @@ class WebOfStoriesIE(InfoExtractor):
              'id': '54215',
              'ext': 'mp4',
              'title': '"A Leg to Stand On"',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'Oliver Sacks talks about the death and resurrection of a limb',
              'duration': 97,
          },
@@ -134,7 +134,7 @@ class WebOfStoriesPlaylistIE(InfoExtractor):
  
          entries = [
              self.url_result('http://www.webofstories.com/play/%s' % video_number, 'WebOfStories')
-            for video_number in set(re.findall('href="/playAll/%s\?sId=(\d+)"' % playlist_id, webpage))
+            for video_number in set(re.findall(r'href="/playAll/%s\?sId=(\d+)"' % playlist_id, webpage))
          ]
  
          title = self._search_regex(
diff --git a/youtube_dl/extractor/weiqitv.py b/youtube_dl/extractor/weiqitv.py

index 8e09156c26c58b4cc184dbe97e679ee9b8dfa47f..7e0befd3922b15194fbcf5e142419c1c62ae46c3 100644 (file)
--- a/youtube_dl/extractor/weiqitv.py
+++ b/youtube_dl/extractor/weiqitv.py
@@ -37,11 +37,11 @@ class WeiqiTVIE(InfoExtractor):
          page = self._download_webpage(url, media_id)
  
          info_json_str = self._search_regex(
-            'var\s+video\s*=\s*(.+});', page, 'info json str')
+            r'var\s+video\s*=\s*(.+});', page, 'info json str')
          info_json = self._parse_json(info_json_str, media_id)
  
          letvcloud_url = self._search_regex(
-            'var\s+letvurl\s*=\s*"([^"]+)', page, 'letvcloud url')
+            r'var\s+letvurl\s*=\s*"([^"]+)', page, 'letvcloud url')
  
          return {
              '_type': 'url_transparent',
diff --git a/youtube_dl/extractor/xbef.py b/youtube_dl/extractor/xbef.py

index e4a2baad22534d772a90b8ec5832c11833f10281..4c41e98b27ff1ede17b9d0d803f18f5de3ebafe3 100644 (file)
--- a/youtube_dl/extractor/xbef.py
+++ b/youtube_dl/extractor/xbef.py
@@ -14,7 +14,7 @@ class XBefIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'md5:7358a9faef8b7b57acda7c04816f170e',
              'age_limit': 18,
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
          }
      }
  
diff --git a/youtube_dl/extractor/xfileshare.py b/youtube_dl/extractor/xfileshare.py

index de344bad25309c03b1d7378ceb6b3968c2d4c47a..e616adce3ab3333291a316d19c224c846006feea 100644 (file)
--- a/youtube_dl/extractor/xfileshare.py
+++ b/youtube_dl/extractor/xfileshare.py
@@ -44,7 +44,7 @@ class XFileShareIE(InfoExtractor):
              'id': '06y9juieqpmi',
              'ext': 'mp4',
              'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
@@ -56,7 +56,7 @@ class XFileShareIE(InfoExtractor):
              'id': '3rso4kdn6f9m',
              'ext': 'mp4',
              'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          }
      }, {
          'url': 'http://movpod.in/0wguyyxi1yca',
@@ -67,7 +67,7 @@ class XFileShareIE(InfoExtractor):
              'id': '3ivfabn7573c',
              'ext': 'mp4',
              'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
          'skip': 'Video removed',
      }, {
diff --git a/youtube_dl/extractor/xhamster.py b/youtube_dl/extractor/xhamster.py

index bd8e1af2e0f6c25fc44aea36c23b813b092b4438..36a8c98407bcaf8466660279f2e444df48083a3a 100644 (file)
--- a/youtube_dl/extractor/xhamster.py
+++ b/youtube_dl/extractor/xhamster.py
@@ -5,8 +5,8 @@ import re
  from .common import InfoExtractor
  from ..utils import (
      dict_get,
-    float_or_none,
      int_or_none,
+    parse_duration,
      unified_strdate,
  )
  
@@ -22,7 +22,7 @@ class XHamsterIE(InfoExtractor):
              'title': 'FemaleAgent Shy beauty takes the bait',
              'upload_date': '20121014',
              'uploader': 'Ruseful2011',
-            'duration': 893.52,
+            'duration': 893,
              'age_limit': 18,
          },
      }, {
@@ -33,7 +33,7 @@ class XHamsterIE(InfoExtractor):
              'title': 'Britney Spears  Sexy Booty',
              'upload_date': '20130914',
              'uploader': 'jojo747400',
-            'duration': 200.48,
+            'duration': 200,
              'age_limit': 18,
          },
          'params': {
@@ -48,7 +48,7 @@ class XHamsterIE(InfoExtractor):
              'title': '....',
              'upload_date': '20160208',
              'uploader': 'parejafree',
-            'duration': 72.0,
+            'duration': 72,
              'age_limit': 18,
          },
          'params': {
@@ -101,9 +101,9 @@ class XHamsterIE(InfoExtractor):
               r'''<video[^>]+poster=(?P<q>["'])(?P<thumbnail>.+?)(?P=q)[^>]*>'''],
              webpage, 'thumbnail', fatal=False, group='thumbnail')
  
-        duration = float_or_none(self._search_regex(
-            r'(["\'])duration\1\s*:\s*(["\'])(?P<duration>.+?)\2',
-            webpage, 'duration', fatal=False, group='duration'))
+        duration = parse_duration(self._search_regex(
+            r'Runtime:\s*</span>\s*([\d:]+)', webpage,
+            'duration', fatal=False))
  
          view_count = int_or_none(self._search_regex(
              r'content=["\']User(?:View|Play)s:(\d+)',
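
Note on the duration change in XHamsterIE above: the value is now read from a "Runtime:" label as clock-style text and passed to parse_duration, which is why the expected durations in the tests become whole seconds. The sketch below uses a hypothetical clock_to_seconds helper to show the arithmetic. The real parse_duration accepts more formats, and the sample input '14:53' is an assumption chosen to match the updated test value of 893.

```python
def clock_to_seconds(text):
    """Hypothetical simplification: 'MM:SS' or 'H:MM:SS' to whole seconds."""
    if not text:
        return None
    seconds = 0
    for part in text.split(':'):
        # Each colon shifts the running total up by a factor of 60.
        seconds = seconds * 60 + int(part)
    return seconds


print(clock_to_seconds('14:53'))
```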
diff --git a/youtube_dl/extractor/xiami.py b/youtube_dl/extractor/xiami.py

index 86abef25704bbc9e8a1494ecc2146b5c5bfabe32..d017e03de2092c8726bdad7e86b364b57e44e136 100644 (file)
--- a/youtube_dl/extractor/xiami.py
+++ b/youtube_dl/extractor/xiami.py
@@ -16,7 +16,9 @@ class XiamiBaseIE(InfoExtractor):
          return webpage
  
      def _extract_track(self, track, track_id=None):
-        title = track['title']
+        track_name = track.get('songName') or track.get('name') or track['subName']
+        artist = track.get('artist') or track.get('artist_name') or track.get('singers')
+        title = '%s - %s' % (artist, track_name) if artist else track_name
          track_url = self._decrypt(track['location'])
  
          subtitles = {}
@@ -31,9 +33,10 @@ class XiamiBaseIE(InfoExtractor):
              'thumbnail': track.get('pic') or track.get('album_pic'),
              'duration': int_or_none(track.get('length')),
              'creator': track.get('artist', '').split(';')[0],
-            'track': title,
-            'album': track.get('album_name'),
-            'artist': track.get('artist'),
+            'track': track_name,
+            'track_number': int_or_none(track.get('track')),
+            'album': track.get('album_name') or track.get('title'),
+            'artist': artist,
              'subtitles': subtitles,
          }
  
@@ -68,14 +71,14 @@ class XiamiBaseIE(InfoExtractor):
  class XiamiSongIE(XiamiBaseIE):
      IE_NAME = 'xiami:song'
      IE_DESC = '虾米音乐'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'http://www.xiami.com/song/1775610518',
          'md5': '521dd6bea40fd5c9c69f913c232cb57e',
          'info_dict': {
              'id': '1775610518',
              'ext': 'mp3',
-            'title': 'Woman',
+            'title': 'HONNE - Woman',
              'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
              'duration': 265,
              'creator': 'HONNE',
@@ -95,7 +98,7 @@ class XiamiSongIE(XiamiBaseIE):
          'info_dict': {
              'id': '1775256504',
              'ext': 'mp3',
-            'title': '悟空',
+            'title': '戴荃 - 悟空',
              'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
              'duration': 200,
              'creator': '戴荃',
@@ -109,6 +112,26 @@ class XiamiSongIE(XiamiBaseIE):
              },
          },
          'skip': 'Georestricted',
+    }, {
+        'url': 'http://www.xiami.com/song/1775953850',
+        'info_dict': {
+            'id': '1775953850',
+            'ext': 'mp3',
+            'title': 'До Скону - Чума Пожирает Землю',
+            'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
+            'duration': 683,
+            'creator': 'До Скону',
+            'track': 'Чума Пожирает Землю',
+            'track_number': 7,
+            'album': 'Ад',
+            'artist': 'До Скону',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.xiami.com/song/xLHGwgd07a1',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -124,7 +147,7 @@ class XiamiPlaylistBaseIE(XiamiBaseIE):
  class XiamiAlbumIE(XiamiPlaylistBaseIE):
      IE_NAME = 'xiami:album'
      IE_DESC = '虾米音乐 - 专辑'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/album/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/album/(?P<id>[^/?#&]+)'
      _TYPE = '1'
      _TESTS = [{
          'url': 'http://www.xiami.com/album/2100300444',
@@ -136,28 +159,34 @@ class XiamiAlbumIE(XiamiPlaylistBaseIE):
      }, {
          'url': 'http://www.xiami.com/album/512288?spm=a1z1s.6843761.1110925389.6.hhE9p9',
          'only_matching': True,
+    }, {
+        'url': 'http://www.xiami.com/album/URVDji2a506',
+        'only_matching': True,
      }]
  
  
  class XiamiArtistIE(XiamiPlaylistBaseIE):
      IE_NAME = 'xiami:artist'
      IE_DESC = '虾米音乐 - 歌手'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/artist/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/artist/(?P<id>[^/?#&]+)'
      _TYPE = '2'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.xiami.com/artist/2132?spm=0.0.0.0.dKaScp',
          'info_dict': {
              'id': '2132',
          },
          'playlist_count': 20,
          'skip': 'Georestricted',
-    }
+    }, {
+        'url': 'http://www.xiami.com/artist/bC5Tk2K6eb99',
+        'only_matching': True,
+    }]
  
  
  class XiamiCollectionIE(XiamiPlaylistBaseIE):
      IE_NAME = 'xiami:collection'
      IE_DESC = '虾米音乐 - 精选集'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/collect/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/collect/(?P<id>[^/?#&]+)'
      _TYPE = '3'
      _TEST = {
          'url': 'http://www.xiami.com/collect/156527391?spm=a1z1s.2943601.6856193.12.4jpBnr',
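
Note on the Xiami changes above: two things change together. The id group is widened from digits only to any run of characters up to '/', '?', '#' or '&', and titles become 'artist - track' when an artist is known. The sketch below shows both using the song pattern and test URL from the patch; build_title is a hypothetical name for the inline expression in _extract_track.

```python
import re

OLD_SONG_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[0-9]+)'
NEW_SONG_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[^/?#&]+)'


def build_title(artist, track_name):
    """Prefix the track name with the artist when one is available."""
    return '%s - %s' % (artist, track_name) if artist else track_name


url = 'http://www.xiami.com/song/xLHGwgd07a1'
print(re.match(OLD_SONG_URL, url))
print(re.match(NEW_SONG_URL, url).group('id'))
print(build_title('HONNE', 'Woman'))
```

The negated character class still stops before a query string, so numeric URLs with a trailing ?spm=... parameter keep yielding the bare id.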
diff --git a/youtube_dl/extractor/xuite.py b/youtube_dl/extractor/xuite.py

index 4b9c1ee9c5222f48c5634184f703baa062cf3ae9..e0818201a2b9ff122904fcbbf7a775a770c5dc5c 100644 (file)
--- a/youtube_dl/extractor/xuite.py
+++ b/youtube_dl/extractor/xuite.py
@@ -24,7 +24,7 @@ class XuiteIE(InfoExtractor):
              'id': '3860914',
              'ext': 'mp3',
              'title': '孤單南半球-歐德陽',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 247.246,
              'timestamp': 1314932940,
              'upload_date': '20110902',
@@ -40,7 +40,7 @@ class XuiteIE(InfoExtractor):
              'id': '25925099',
              'ext': 'mp4',
              'title': 'BigBuckBunny_320x180',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 596.458,
              'timestamp': 1454242500,
              'upload_date': '20160131',
@@ -58,7 +58,7 @@ class XuiteIE(InfoExtractor):
              'ext': 'mp4',
              'title': '暗殺教室 02',
              'description': '字幕:【極影字幕社】',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1384.907,
              'timestamp': 1421481240,
              'upload_date': '20150117',
diff --git a/youtube_dl/extractor/yesjapan.py b/youtube_dl/extractor/yesjapan.py

index 112a6c030138e6c7d0e58619d40f3af012e13362..681338c96a2c743362773e6ea036a5dc64824326 100644 (file)
--- a/youtube_dl/extractor/yesjapan.py
+++ b/youtube_dl/extractor/yesjapan.py
@@ -21,7 +21,7 @@ class YesJapanIE(InfoExtractor):
              'ext': 'mp4',
              'timestamp': 1416391590,
              'upload_date': '20141119',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/yinyuetai.py b/youtube_dl/extractor/yinyuetai.py

index 834d860af32871678f8fef5afd345353273ebc19..1fd8d35c637224a8609b23c87db3206a24987a63 100644 (file)
--- a/youtube_dl/extractor/yinyuetai.py
+++ b/youtube_dl/extractor/yinyuetai.py
@@ -18,7 +18,7 @@ class YinYueTaiIE(InfoExtractor):
              'title': '少女时代_PARTY_Music Video Teaser',
              'creator': '少女时代',
              'duration': 25,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://v.yinyuetai.com/video/h5/2322376',
diff --git a/youtube_dl/extractor/ynet.py b/youtube_dl/extractor/ynet.py

index 0d943c3432a57570afc229ab4efce66b2118f763..c4ae4d88eb0f64dd24d63c7e882ab49552a962c2 100644 (file)
--- a/youtube_dl/extractor/ynet.py
+++ b/youtube_dl/extractor/ynet.py
@@ -17,7 +17,7 @@ class YnetIE(InfoExtractor):
                  'id': 'L-11659-99244',
                  'ext': 'flv',
                  'title': 'איש לא יודע מאיפה באנו',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
              }
          }, {
              'url': 'http://hot.ynet.co.il/home/0,7340,L-8859-84418,00.html',
@@ -25,7 +25,7 @@ class YnetIE(InfoExtractor):
                  'id': 'L-8859-84418',
                  'ext': 'flv',
                  'title': "צפו: הנשיקה הלוהטת של תורגי' ויוליה פלוטקין",
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
              }
          }
      ]
diff --git a/youtube_dl/extractor/youporn.py b/youtube_dl/extractor/youporn.py

index 0265a64a7d3c014001b2d0e81789f0e904b32d62..34ab878a4167580b50ef5b3d5bee94e58b02fe41 100644 (file)
--- a/youtube_dl/extractor/youporn.py
+++ b/youtube_dl/extractor/youporn.py
@@ -24,7 +24,7 @@ class YouPornIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
              'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Ask Dan And Jennifer',
              'upload_date': '20101221',
              'average_rating': int,
@@ -43,7 +43,7 @@ class YouPornIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Big Tits Awesome Brunette On amazing webcam show',
              'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Unknown',
              'upload_date': '20111125',
              'average_rating': int,
diff --git a/youtube_dl/extractor/yourupload.py b/youtube_dl/extractor/yourupload.py
index 4e25d6f22312a0dca9f1997baa3bacd1c3fd263d..9fa77283899ceb005b6fed8f4ebcb626d38241ce 100644
--- a/youtube_dl/extractor/yourupload.py
+++ b/youtube_dl/extractor/yourupload.py
@@ -2,44 +2,37 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..utils import urljoin
  
  
  class YourUploadIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?
-        (?:yourupload\.com/watch|
-           embed\.yourupload\.com|
-           embed\.yucache\.net
-        )/(?P<id>[A-Za-z0-9]+)
-        '''
-    _TESTS = [
-        {
-            'url': 'http://yourupload.com/watch/14i14h',
-            'md5': '5e2c63385454c557f97c4c4131a393cd',
-            'info_dict': {
-                'id': '14i14h',
-                'ext': 'mp4',
-                'title': 'BigBuckBunny_320x180.mp4',
-                'thumbnail': 're:^https?://.*\.jpe?g',
-            }
-        },
-        {
-            'url': 'http://embed.yourupload.com/14i14h',
-            'only_matching': True,
-        },
-        {
-            'url': 'http://embed.yucache.net/14i14h?client_file_id=803349',
-            'only_matching': True,
-        },
-    ]
+    _VALID_URL = r'https?://(?:www\.)?(?:yourupload\.com/(?:watch|embed)|embed\.yourupload\.com)/(?P<id>[A-Za-z0-9]+)'
+    _TESTS = [{
+        'url': 'http://yourupload.com/watch/14i14h',
+        'md5': '5e2c63385454c557f97c4c4131a393cd',
+        'info_dict': {
+            'id': '14i14h',
+            'ext': 'mp4',
+            'title': 'BigBuckBunny_320x180.mp4',
+            'thumbnail': r're:^https?://.*\.jpe?g',
+        }
+    }, {
+        'url': 'http://www.yourupload.com/embed/14i14h',
+        'only_matching': True,
+    }, {
+        'url': 'http://embed.yourupload.com/14i14h',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        embed_url = 'http://embed.yucache.net/{0:}'.format(video_id)
+        embed_url = 'http://www.yourupload.com/embed/%s' % video_id
+
          webpage = self._download_webpage(embed_url, video_id)
  
          title = self._og_search_title(webpage)
-        video_url = self._og_search_video_url(webpage)
+        video_url = urljoin(embed_url, self._og_search_video_url(webpage))
          thumbnail = self._og_search_thumbnail(webpage, default=None)
  
          return {
diff --git a/youtube_dl/extractor/youtube.py b/youtube_dl/extractor/youtube.py
index bd24a28389bf847f1e72451e232bfab0807809d8..76710931ae5e6a292af767f3f57685ad0be98cac 100644
--- a/youtube_dl/extractor/youtube.py
+++ b/youtube_dl/extractor/youtube.py
@@ -40,6 +40,7 @@ from ..utils import (
      sanitized_Request,
      smuggle_url,
      str_to_int,
+    try_get,
      unescapeHTML,
      unified_strdate,
      unsmuggle_url,
@@ -316,6 +317,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          '137': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
          '138': {'ext': 'mp4', 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},  # Height can vary (https://github.com/rg3/youtube-dl/issues/4559)
          '160': {'ext': 'mp4', 'height': 144, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '212': {'ext': 'mp4', 'height': 480, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
          '264': {'ext': 'mp4', 'height': 1440, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
          '298': {'ext': 'mp4', 'height': 720, 'format_note': 'DASH video', 'vcodec': 'h264', 'fps': 60, 'preference': -40},
          '299': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'vcodec': 'h264', 'fps': 60, 'preference': -40},
@@ -327,6 +329,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          '141': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 256, 'preference': -50, 'container': 'm4a_dash'},
          '256': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'preference': -50, 'container': 'm4a_dash'},
          '258': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'preference': -50, 'container': 'm4a_dash'},
+        '325': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'dtse', 'preference': -50, 'container': 'm4a_dash'},
+        '328': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'ec-3', 'preference': -50, 'container': 'm4a_dash'},
  
          # Dash webm
          '167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
@@ -376,12 +380,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'youtube-dl test video "\'/\\ä↭𝕐',
                  'uploader': 'Philipp Hagemeister',
                  'uploader_id': 'phihag',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
                  'upload_date': '20121002',
                  'license': 'Standard YouTube License',
                  'description': 'test chars:  "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
                  'categories': ['Science & Technology'],
                  'tags': ['youtube-dl'],
+                'duration': 10,
                  'like_count': int,
                  'dislike_count': int,
                  'start_time': 1,
@@ -401,9 +406,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
                           'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
                           'iconic ep', 'iconic', 'love', 'it'],
+                'duration': 180,
                  'uploader': 'Icona Pop',
                  'uploader_id': 'IconaPop',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IconaPop',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IconaPop',
                  'license': 'Standard YouTube License',
                  'creator': 'Icona Pop',
              }
@@ -418,9 +424,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'Justin Timberlake - Tunnel Vision (Explicit)',
                  'alt_title': 'Tunnel Vision',
                  'description': 'md5:64249768eec3bc4276236606ea996373',
+                'duration': 419,
                  'uploader': 'justintimberlakeVEVO',
                  'uploader_id': 'justintimberlakeVEVO',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
                  'license': 'Standard YouTube License',
                  'creator': 'Justin Timberlake',
                  'age_limit': 18,
@@ -437,7 +444,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'description': 'md5:09b78bd971f1e3e289601dfba15ca4f7',
                  'uploader': 'SET India',
                  'uploader_id': 'setindia',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/setindia',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/setindia',
                  'license': 'Standard YouTube License',
                  'age_limit': 18,
              }
@@ -451,12 +458,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'youtube-dl test video "\'/\\ä↭𝕐',
                  'uploader': 'Philipp Hagemeister',
                  'uploader_id': 'phihag',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
                  'upload_date': '20121002',
                  'license': 'Standard YouTube License',
                  'description': 'test chars:  "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
                  'categories': ['Science & Technology'],
                  'tags': ['youtube-dl'],
+                'duration': 10,
                  'like_count': int,
                  'dislike_count': int,
              },
@@ -472,7 +480,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'm4a',
                  'upload_date': '20121002',
                  'uploader_id': '8KVIDEO',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/8KVIDEO',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/8KVIDEO',
                  'description': '',
                  'uploader': '8KVIDEO',
                  'license': 'Standard YouTube License',
@@ -492,6 +500,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'm4a',
                  'title': 'Afrojack, Spree Wilson - The Spark ft. Spree Wilson',
                  'description': 'md5:12e7067fa6735a77bdcbb58cb1187d2d',
+                'duration': 244,
                  'uploader': 'AfrojackVEVO',
                  'uploader_id': 'AfrojackVEVO',
                  'upload_date': '20131011',
@@ -511,6 +520,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'Taylor Swift - Shake It Off',
                  'alt_title': 'Shake It Off',
                  'description': 'md5:95f66187cd7c8b2c13eb78e1223b63c3',
+                'duration': 242,
                  'uploader': 'TaylorSwiftVEVO',
                  'uploader_id': 'TaylorSwiftVEVO',
                  'upload_date': '20140818',
@@ -528,10 +538,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'info_dict': {
                  'id': 'T4XJQO3qol8',
                  'ext': 'mp4',
+                'duration': 219,
                  'upload_date': '20100909',
                  'uploader': 'The Amazing Atheist',
                  'uploader_id': 'TheAmazingAtheist',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
                  'license': 'Standard YouTube License',
                  'title': 'Burning Everyone\'s Koran',
                  'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
@@ -544,10 +555,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'id': 'HtVdAasjOgU',
                  'ext': 'mp4',
                  'title': 'The Witcher 3: Wild Hunt - The Sword Of Destiny Trailer',
-                'description': 're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
+                'description': r're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
+                'duration': 142,
                  'uploader': 'The Witcher',
                  'uploader_id': 'WitcherGame',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
                  'upload_date': '20140605',
                  'license': 'Standard YouTube License',
                  'age_limit': 18,
@@ -561,9 +573,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Dedication To My Ex (Miss That) (Lyric Video)',
                  'description': 'md5:33765bb339e1b47e7e72b5490139bb41',
+                'duration': 247,
                  'uploader': 'LloydVEVO',
                  'uploader_id': 'LloydVEVO',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
                  'upload_date': '20110629',
                  'license': 'Standard YouTube License',
                  'age_limit': 18,
@@ -575,9 +588,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'info_dict': {
                  'id': '__2ABJjxzNo',
                  'ext': 'mp4',
+                'duration': 266,
                  'upload_date': '20100430',
                  'uploader_id': 'deadmau5',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/deadmau5',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5',
                  'creator': 'deadmau5',
                  'description': 'md5:12c56784b8032162bb936a5f76d55360',
                  'uploader': 'deadmau5',
@@ -595,9 +609,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'info_dict': {
                  'id': 'lqQg6PlCWgI',
                  'ext': 'mp4',
+                'duration': 6085,
                  'upload_date': '20150827',
                  'uploader_id': 'olympic',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/olympic',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic',
                  'license': 'Standard YouTube License',
                  'description': 'HO09  - Women -  GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
                  'uploader': 'Olympic',
@@ -614,9 +629,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'id': '_b-2C3KPAM0',
                  'ext': 'mp4',
                  'stretched_ratio': 16 / 9.,
+                'duration': 85,
                  'upload_date': '20110310',
                  'uploader_id': 'AllenMeow',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
                  'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯',
                  'uploader': '孫艾倫',
                  'license': 'Standard YouTube License',
@@ -648,9 +664,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'md5:7b81415841e02ecd4313668cde88737a',
                  'description': 'md5:116377fd2963b81ec4ce64b542173306',
+                'duration': 220,
                  'upload_date': '20150625',
                  'uploader_id': 'dorappi2000',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
                  'uploader': 'dorappi2000',
                  'license': 'Standard YouTube License',
                  'formats': 'mincount:32',
@@ -690,10 +707,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      'ext': 'mp4',
                      'title': 'teamPGP: Rocket League Noob Stream (Main Camera)',
                      'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                    'duration': 7335,
                      'upload_date': '20150721',
                      'uploader': 'Beer Games Beer',
                      'uploader_id': 'beergamesbeer',
-                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
                      'license': 'Standard YouTube License',
                  },
              }, {
@@ -702,10 +720,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      'ext': 'mp4',
                      'title': 'teamPGP: Rocket League Noob Stream (kreestuh)',
                      'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                    'duration': 7337,
                      'upload_date': '20150721',
                      'uploader': 'Beer Games Beer',
                      'uploader_id': 'beergamesbeer',
-                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
                      'license': 'Standard YouTube License',
                  },
              }, {
@@ -714,10 +733,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      'ext': 'mp4',
                      'title': 'teamPGP: Rocket League Noob Stream (grizzle)',
                      'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                    'duration': 7337,
                      'upload_date': '20150721',
                      'uploader': 'Beer Games Beer',
                      'uploader_id': 'beergamesbeer',
-                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
                      'license': 'Standard YouTube License',
                  },
              }, {
@@ -726,10 +746,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      'ext': 'mp4',
                      'title': 'teamPGP: Rocket League Noob Stream (zim)',
                      'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                    'duration': 7334,
                      'upload_date': '20150721',
                      'uploader': 'Beer Games Beer',
                      'uploader_id': 'beergamesbeer',
-                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
                      'license': 'Standard YouTube License',
                  },
              }],
@@ -767,9 +788,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': '{dark walk}; Loki/AC/Dishonored; collab w/Elflover21',
                  'alt_title': 'Dark Walk',
                  'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
+                'duration': 133,
                  'upload_date': '20151119',
                  'uploader_id': 'IronSoulElf',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
                  'uploader': 'IronSoulElf',
                  'license': 'Standard YouTube License',
                  'creator': 'Todd Haberman, Daniel Law Heath & Aaron Kaplan',
@@ -808,10 +830,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'md5:e41008789470fc2533a3252216f1c1d1',
                  'description': 'md5:a677553cf0840649b731a3024aeff4cc',
+                'duration': 721,
                  'upload_date': '20150127',
                  'uploader_id': 'BerkmanCenter',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter',
-                'uploader': 'BerkmanCenter',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter',
+                'uploader': 'The Berkman Klein Center for Internet & Society',
                  'license': 'Creative Commons Attribution license (reuse allowed)',
              },
              'params': {
@@ -826,10 +849,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Democratic Socialism and Foreign Policy | Bernie Sanders',
                  'description': 'md5:dda0d780d5a6e120758d1711d062a867',
+                'duration': 4060,
                  'upload_date': '20151119',
                  'uploader': 'Bernie 2016',
                  'uploader_id': 'UCH1dpzjCEiGAt8CXkryhkZg',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg',
                  'license': 'Creative Commons Attribution license (reuse allowed)',
              },
              'params': {
@@ -856,12 +880,42 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'upload_date': '20150811',
                  'uploader': 'FlixMatrix',
                  'uploader_id': 'FlixMatrixKaravan',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/FlixMatrixKaravan',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/FlixMatrixKaravan',
                  'license': 'Standard YouTube License',
              },
              'params': {
                  'skip_download': True,
              },
+        },
+        {
+            # YouTube Red video with episode data
+            'url': 'https://www.youtube.com/watch?v=iqKdEhx-dD4',
+            'info_dict': {
+                'id': 'iqKdEhx-dD4',
+                'ext': 'mp4',
+                'title': 'Isolation - Mind Field (Ep 1)',
+                'description': 'md5:8013b7ddea787342608f63a13ddc9492',
+                'duration': 2085,
+                'upload_date': '20170118',
+                'uploader': 'Vsauce',
+                'uploader_id': 'Vsauce',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Vsauce',
+                'license': 'Standard YouTube License',
+                'series': 'Mind Field',
+                'season_number': 1,
+                'episode_number': 1,
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'expected_warnings': [
+                'Skipping DASH manifest',
+            ],
+        },
+        {
+            # itag 212
+            'url': '1t24XAntNCY',
+            'only_matching': True,
          }
      ]
  
@@ -976,8 +1030,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
  
      def _parse_sig_js(self, jscode):
          funcname = self._search_regex(
-            r'\.sig\|\|([a-zA-Z0-9$]+)\(', jscode,
-            'Initial JS player signature function name')
+            (r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
+             r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\('),
+            jscode, 'Initial JS player signature function name', group='sig')
  
          jsi = JSInterpreter(jscode)
          initial_function = jsi.extract_function(funcname)
@@ -998,6 +1053,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
  
          if player_url.startswith('//'):
              player_url = 'https:' + player_url
+        elif not re.match(r'https?://', player_url):
+            player_url = compat_urlparse.urljoin(
+                'https://www.youtube.com', player_url)
          try:
              player_id = (player_url, self._signature_cache_id(s))
              if player_id not in self._player_cache:
@@ -1448,6 +1506,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          else:
              video_alt_title = video_creator = None
  
+        m_episode = re.search(
+            r'<div[^>]+id="watch7-headline"[^>]*>\s*<span[^>]*>.*?>(?P<series>[^<]+)</a></b>\s*S(?P<season>\d+)\s*•\s*E(?P<episode>\d+)</span>',
+            video_webpage)
+        if m_episode:
+            series = m_episode.group('series')
+            season_number = int(m_episode.group('season'))
+            episode_number = int(m_episode.group('episode'))
+        else:
+            series = season_number = episode_number = None
+
          m_cat_container = self._search_regex(
              r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
              video_webpage, 'categories', default=None)
@@ -1476,11 +1544,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          video_subtitles = self.extract_subtitles(video_id, video_webpage)
          automatic_captions = self.extract_automatic_captions(video_id, video_webpage)
  
-        if 'length_seconds' not in video_info:
-            self._downloader.report_warning('unable to extract video duration')
-            video_duration = None
-        else:
-            video_duration = int(compat_urllib_parse_unquote_plus(video_info['length_seconds'][0]))
+        video_duration = try_get(
+            video_info, lambda x: int_or_none(x['length_seconds'][0]))
+        if not video_duration:
+            video_duration = parse_duration(self._html_search_meta(
+                'duration', video_webpage, 'video duration'))
  
          # annotations
          video_annotations = None
@@ -1737,6 +1805,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'is_live': is_live,
              'start_time': start_time,
              'end_time': end_time,
+            'series': series,
+            'season_number': season_number,
+            'episode_number': episode_number,
          }
  
  
@@ -1788,13 +1859,13 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
                              youtu\.be/[0-9A-Za-z_-]{11}\?.*?\blist=
                          )
                          (
-                            (?:PL|LL|EC|UU|FL|RD|UL)?[0-9A-Za-z-_]{10,}
+                            (?:PL|LL|EC|UU|FL|RD|UL|TL)?[0-9A-Za-z-_]{10,}
                              # Top tracks, they can also include dots
                              |(?:MC)[\w\.]*
                          )
                          .*
                       |
-                        ((?:PL|LL|EC|UU|FL|RD|UL)[0-9A-Za-z-_]{10,})
+                        ((?:PL|LL|EC|UU|FL|RD|UL|TL)[0-9A-Za-z-_]{10,})
                       )"""
      _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s&disable_polymer=true'
      _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?'
@@ -1813,6 +1884,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'title': 'YDL_Empty_List',
          },
          'playlist_count': 0,
+        'skip': 'This playlist is private',
      }, {
          'note': 'Playlist with deleted videos (#651). As a bonus, the video #51 is also twice in this list.',
          'url': 'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
@@ -1844,6 +1916,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'id': 'PLtPgu7CB4gbY9oDN3drwC3cMbJggS7dKl',
          },
          'playlist_count': 2,
+        'skip': 'This playlist is private',
      }, {
          'note': 'embedded',
          'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
@@ -1877,7 +1950,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'title': "Smiley's People 01 detective, Adventure Series, Action",
              'uploader': 'STREEM',
              'uploader_id': 'UCyPhqAZgwYWZfxElWVbVJng',
-            'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCyPhqAZgwYWZfxElWVbVJng',
+            'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCyPhqAZgwYWZfxElWVbVJng',
              'upload_date': '20150526',
              'license': 'Standard YouTube License',
              'description': 'md5:507cdcb5a49ac0da37a920ece610be80',
@@ -1898,7 +1971,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'title': 'Small Scale Baler and Braiding Rugs',
              'uploader': 'Backus-Page House Museum',
              'uploader_id': 'backuspagemuseum',
-            'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
+            'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
              'upload_date': '20161008',
              'license': 'Standard YouTube License',
              'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
@@ -1914,6 +1987,9 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
      }, {
          'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
          'only_matching': True,
+    }, {
+        'url': 'TLGGrESM50VT6acwMjAyMjAxNw',
+        'only_matching': True,
      }]
  
      def _real_initialize(self):
@@ -1955,14 +2031,18 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
          url = self._TEMPLATE_URL % playlist_id
          page = self._download_webpage(url, playlist_id)
  
-        for match in re.findall(r'<div class="yt-alert-message">([^<]+)</div>', page):
+        # the yt-alert-message now has tabindex attribute (see https://github.com/rg3/youtube-dl/issues/11604)
+        for match in re.findall(r'<div class="yt-alert-message"[^>]*>([^<]+)</div>', page):
              match = match.strip()
              # Check if the playlist exists or is private
-            if re.match(r'[^<]*(The|This) playlist (does not exist|is private)[^<]*', match):
-                raise ExtractorError(
-                    'The playlist doesn\'t exist or is private, use --username or '
-                    '--netrc to access it.',
-                    expected=True)
+            mobj = re.match(r'[^<]*(?:The|This) playlist (?P<reason>does not exist|is private)[^<]*', match)
+            if mobj:
+                reason = mobj.group('reason')
+                message = 'This playlist %s' % reason
+                if 'private' in reason:
+                    message += ', use --username or --netrc to access it'
+                message += '.'
+                raise ExtractorError(message, expected=True)
              elif re.match(r'[^<]*Invalid parameters[^<]*', match):
                  raise ExtractorError(
                      'Invalid parameters. Maybe URL is incorrect.',
@@ -2186,7 +2266,7 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
              'title': 'The Young Turks - Live Main Show',
              'uploader': 'The Young Turks',
              'uploader_id': 'TheYoungTurks',
-            'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheYoungTurks',
+            'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheYoungTurks',
              'upload_date': '20150715',
              'license': 'Standard YouTube License',
              'description': 'md5:438179573adcdff3c97ebb1ee632b891',
@@ -2270,18 +2350,18 @@ class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
          videos = []
          limit = n
  
+        url_query = {
+            'search_query': query.encode('utf-8'),
+        }
+        url_query.update(self._EXTRA_QUERY_ARGS)
+        result_url = 'https://www.youtube.com/results?' + compat_urllib_parse_urlencode(url_query)
+
          for pagenum in itertools.count(1):
-            url_query = {
-                'search_query': query.encode('utf-8'),
-                'page': pagenum,
-                'spf': 'navigate',
-            }
-            url_query.update(self._EXTRA_QUERY_ARGS)
-            result_url = 'https://www.youtube.com/results?' + compat_urllib_parse_urlencode(url_query)
              data = self._download_json(
                  result_url, video_id='query "%s"' % query,
                  note='Downloading page %s' % pagenum,
-                errnote='Unable to download API page')
+                errnote='Unable to download API page',
+                query={'spf': 'navigate'})
              html_content = data[1]['body']['content']
  
              if 'class="search-message' in html_content:
@@ -2293,6 +2373,12 @@ class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
              videos += new_videos
              if not new_videos or len(videos) > limit:
                  break
+            next_link = self._html_search_regex(
+                r'href="(/results\?[^"]*\bsp=[^"]+)"[^>]*>\s*<span[^>]+class="[^"]*\byt-uix-button-content\b[^"]*"[^>]*>Next',
+                html_content, 'next link', default=None)
+            if next_link is None:
+                break
+            result_url = compat_urlparse.urljoin('https://www.youtube.com/', next_link)
  
          if len(videos) > n:
              videos = videos[:n]
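
Reviewer note: the YoutubeSearchIE hunks above stop building a `page=N` query and instead follow the "Next" link found in each results page. A minimal standalone sketch of that loop follows. It assumes a hypothetical `fetch` callable in place of `_download_json`, uses a simplified version of the regex, and skips the HTML unescaping that `_html_search_regex` performs in the real code.

```python
import re
from urllib.parse import urljoin

# Simplified stand-in for the pattern in the hunk: capture the href of the
# anchor whose button label is "Next".
NEXT_LINK_RE = re.compile(
    r'href="(/results\?[^"]*\bsp=[^"]+)"[^>]*>\s*<span[^>]*>Next')


def iter_result_pages(first_url, fetch):
    """Yield (url, html) per results page, following "Next" links.

    `fetch` is a hypothetical callable mapping a URL to that page's HTML.
    """
    url = first_url
    while url is not None:
        html = fetch(url)
        yield url, html
        match = NEXT_LINK_RE.search(html)
        # Stop when the page carries no "Next" link, as the patched loop does.
        url = urljoin(first_url, match.group(1)) if match else None
```

Resolving the link against the first URL mirrors the `compat_urlparse.urljoin('https://www.youtube.com/', next_link)` call in the patch.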
diff --git a/youtube_dl/extractor/zapiks.py b/youtube_dl/extractor/zapiks.py
index 22a9a57e882be49109c00036fa3559410b4e334f..bacb82eeeb2a549edbb0cbf6d0a67e07f28b595b 100644
--- a/youtube_dl/extractor/zapiks.py
+++ b/youtube_dl/extractor/zapiks.py
@@ -24,7 +24,7 @@ class ZapiksIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'EP2S3 - Bon Appétit - Eh bé viva les pyrénées con!',
                  'description': 'md5:7054d6f6f620c6519be1fe710d4da847',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 528,
                  'timestamp': 1359044972,
                  'upload_date': '20130124',
diff --git a/youtube_dl/extractor/zdf.py b/youtube_dl/extractor/zdf.py
index 2ef17727592405b7bb20b378403d82470b52ce2f..a365923fbbadc093a484a972c85cf6070f1d2765 100644
--- a/youtube_dl/extractor/zdf.py
+++ b/youtube_dl/extractor/zdf.py
@@ -1,262 +1,312 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import functools
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
-    int_or_none,
-    unified_strdate,
-    OnDemandPagedList,
-    xpath_text,
      determine_ext,
+    int_or_none,
+    NO_DEFAULT,
+    orderedSet,
+    parse_codecs,
      qualities,
-    float_or_none,
-    ExtractorError,
+    try_get,
+    unified_timestamp,
+    update_url_query,
+    urljoin,
  )
  
  
-class ZDFIE(InfoExtractor):
-    _VALID_URL = r'(?:zdf:|zdf:video:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/(.*beitrag/(?:video/)?))(?P<id>[0-9]+)(?:/[^/?]+)?(?:\?.*)?'
+class ZDFBaseIE(InfoExtractor):
+    def _call_api(self, url, player, referrer, video_id):
+        return self._download_json(
+            url, video_id, 'Downloading JSON content',
+            headers={
+                'Referer': referrer,
+                'Api-Auth': 'Bearer %s' % player['apiToken'],
+            })
+
+    def _extract_player(self, webpage, video_id, fatal=True):
+        return self._parse_json(
+            self._search_regex(
+                r'(?s)data-zdfplayer-jsb=(["\'])(?P<json>{.+?})\1', webpage,
+                'player JSON', default='{}' if not fatal else NO_DEFAULT,
+                group='json'),
+            video_id)
+
+
+class ZDFIE(ZDFBaseIE):
+    _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html'
+    _QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh')
  
      _TESTS = [{
-        'url': 'http://www.zdf.de/ZDFmediathek/beitrag/video/2037704/ZDFspezial---Ende-des-Machtpokers--?bc=sts;stt',
+        'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html',
          'info_dict': {
-            'id': '2037704',
-            'ext': 'webm',
-            'title': 'ZDFspezial - Ende des Machtpokers',
-            'description': 'Union und SPD haben sich auf einen Koalitionsvertrag geeinigt. Aber was bedeutet das für die Bürger? Sehen Sie hierzu das ZDFspezial "Ende des Machtpokers - Große Koalition für Deutschland".',
-            'duration': 1022,
-            'uploader': 'spezial',
-            'uploader_id': '225948',
-            'upload_date': '20131127',
-        },
-        'skip': 'Videos on ZDF.de are depublicised in short order',
+            'id': 'zdfmediathek-trailer-100',
+            'ext': 'mp4',
+            'title': 'Die neue ZDFmediathek',
+            'description': 'md5:3003d36487fb9a5ea2d1ff60beb55e8d',
+            'duration': 30,
+            'timestamp': 1477627200,
+            'upload_date': '20161028',
+        }
+    }, {
+        'url': 'https://www.zdf.de/filme/taunuskrimi/die-lebenden-und-die-toten-1---ein-taunuskrimi-100.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.zdf.de/dokumentation/planet-e/planet-e-uebersichtsseite-weitere-dokumentationen-von-planet-e-100.html',
+        'only_matching': True,
      }]
  
-    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
-        param_groups = {}
-        for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
-            group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace'))
-            params = {}
-            for param in param_group:
-                params[param.get('name')] = param.get('value')
-            param_groups[group_id] = params
+    @staticmethod
+    def _extract_subtitles(src):
+        subtitles = {}
+        for caption in try_get(src, lambda x: x['captions'], list) or []:
+            subtitle_url = caption.get('uri')
+            if subtitle_url and isinstance(subtitle_url, compat_str):
+                lang = caption.get('language', 'deu')
+                subtitles.setdefault(lang, []).append({
+                    'url': subtitle_url,
+                })
+        return subtitles
+
+    def _extract_format(self, video_id, formats, format_urls, meta):
+        format_url = meta.get('url')
+        if not format_url or not isinstance(format_url, compat_str):
+            return
+        if format_url in format_urls:
+            return
+        format_urls.add(format_url)
+        mime_type = meta.get('mimeType')
+        ext = determine_ext(format_url)
+        if mime_type == 'application/x-mpegURL' or ext == 'm3u8':
+            formats.extend(self._extract_m3u8_formats(
+                format_url, video_id, 'mp4', m3u8_id='hls',
+                entry_protocol='m3u8_native', fatal=False))
+        elif mime_type == 'application/f4m+xml' or ext == 'f4m':
+            formats.extend(self._extract_f4m_formats(
+                update_url_query(format_url, {'hdcore': '3.7.0'}), video_id, f4m_id='hds', fatal=False))
+        else:
+            f = parse_codecs(meta.get('mimeCodec'))
+            format_id = ['http']
+            for p in (meta.get('type'), meta.get('quality')):
+                if p and isinstance(p, compat_str):
+                    format_id.append(p)
+            f.update({
+                'url': format_url,
+                'format_id': '-'.join(format_id),
+                'format_note': meta.get('quality'),
+                'language': meta.get('language'),
+                'quality': qualities(self._QUALITIES)(meta.get('quality')),
+                'preference': -10,
+            })
+            formats.append(f)
+
+    def _extract_entry(self, url, content, video_id):
+        title = content.get('title') or content['teaserHeadline']
+
+        t = content['mainVideoContent']['http://zdf.de/rels/target']
+
+        ptmd_path = t.get('http://zdf.de/rels/streams/ptmd')
+
+        if not ptmd_path:
+            ptmd_path = t[
+                'http://zdf.de/rels/streams/ptmd-template'].replace(
+                '{playerId}', 'portal')
+
+        ptmd = self._download_json(urljoin(url, ptmd_path), video_id)
  
          formats = []
-        for video in smil.findall(self._xpath_ns('.//video', namespace)):
-            src = video.get('src')
-            if not src:
+        track_uris = set()
+        for p in ptmd['priorityList']:
+            formitaeten = p.get('formitaeten')
+            if not isinstance(formitaeten, list):
                  continue
-            bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
-            group_id = video.get('paramGroup')
-            param_group = param_groups[group_id]
-            for proto in param_group['protocols'].split(','):
-                formats.append({
-                    'url': '%s://%s' % (proto, param_group['host']),
-                    'app': param_group['app'],
-                    'play_path': src,
-                    'ext': 'flv',
-                    'format_id': '%s-%d' % (proto, bitrate),
-                    'tbr': bitrate,
-                })
+            for f in formitaeten:
+                f_qualities = f.get('qualities')
+                if not isinstance(f_qualities, list):
+                    continue
+                for quality in f_qualities:
+                    tracks = try_get(quality, lambda x: x['audio']['tracks'], list)
+                    if not tracks:
+                        continue
+                    for track in tracks:
+                        self._extract_format(
+                            video_id, formats, track_uris, {
+                                'url': track.get('uri'),
+                                'type': f.get('type'),
+                                'mimeType': f.get('mimeType'),
+                                'quality': quality.get('quality'),
+                                'language': track.get('language'),
+                            })
          self._sort_formats(formats)
-        return formats
-
-    def extract_from_xml_url(self, video_id, xml_url):
-        doc = self._download_xml(
-            xml_url, video_id,
-            note='Downloading video info',
-            errnote='Failed to download video info')
-
-        status_code = doc.find('./status/statuscode')
-        if status_code is not None and status_code.text != 'ok':
-            code = status_code.text
-            if code == 'notVisibleAnymore':
-                message = 'Video %s is not available' % video_id
-            else:
-                message = '%s returned error: %s' % (self.IE_NAME, code)
-            raise ExtractorError(message, expected=True)
-
-        title = doc.find('.//information/title').text
-        description = xpath_text(doc, './/information/detail', 'description')
-        duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration'))
-        uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
-        uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
-        upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))
-        subtitles = {}
-        captions_url = doc.find('.//caption/url')
-        if captions_url is not None:
-            subtitles['de'] = [{
-                'url': captions_url.text,
-                'ext': 'ttml',
-            }]
-
-        def xml_to_thumbnails(fnode):
-            thumbnails = []
-            for node in fnode:
-                thumbnail_url = node.text
-                if not thumbnail_url:
+
+        thumbnails = []
+        layouts = try_get(
+            content, lambda x: x['teaserImageRef']['layouts'], dict)
+        if layouts:
+            for layout_key, layout_url in layouts.items():
+                if not isinstance(layout_url, compat_str):
                      continue
                  thumbnail = {
-                    'url': thumbnail_url,
+                    'url': layout_url,
+                    'format_id': layout_key,
                  }
-                if 'key' in node.attrib:
-                    m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key'])
-                    if m:
-                        thumbnail['width'] = int(m.group(1))
-                        thumbnail['height'] = int(m.group(2))
+                mobj = re.search(r'(?P<width>\d+)x(?P<height>\d+)', layout_key)
+                if mobj:
+                    thumbnail.update({
+                        'width': int(mobj.group('width')),
+                        'height': int(mobj.group('height')),
+                    })
                  thumbnails.append(thumbnail)
-            return thumbnails
  
-        thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage'))
+        return {
+            'id': video_id,
+            'title': title,
+            'description': content.get('leadParagraph') or content.get('teasertext'),
+            'duration': int_or_none(t.get('duration')),
+            'timestamp': unified_timestamp(content.get('editorialDate')),
+            'thumbnails': thumbnails,
+            'subtitles': self._extract_subtitles(ptmd),
+            'formats': formats,
+        }
  
-        format_nodes = doc.findall('.//formitaeten/formitaet')
-        quality = qualities(['veryhigh', 'high', 'med', 'low'])
+    def _extract_regular(self, url, player, video_id):
+        content = self._call_api(player['content'], player, url, video_id)
+        return self._extract_entry(player['content'], content, video_id)
  
-        def get_quality(elem):
-            return quality(xpath_text(elem, 'quality'))
-        format_nodes.sort(key=get_quality)
-        format_ids = []
-        formats = []
-        for fnode in format_nodes:
-            video_url = fnode.find('url').text
-            is_available = 'http://www.metafilegenerator' not in video_url
-            if not is_available:
-                continue
-            format_id = fnode.attrib['basetype']
-            quality = xpath_text(fnode, './quality', 'quality')
-            format_m = re.match(r'''(?x)
-                (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
-                (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
-            ''', format_id)
-
-            ext = determine_ext(video_url, None) or format_m.group('container')
-            if ext not in ('smil', 'f4m', 'm3u8'):
-                format_id = format_id + '-' + quality
-            if format_id in format_ids:
-                continue
+    def _extract_mobile(self, video_id):
+        document = self._download_json(
+            'https://zdf-cdn.live.cellular.de/mediathekV2/document/%s' % video_id,
+            video_id)['document']
  
-            if ext == 'meta':
-                continue
-            elif ext == 'smil':
-                formats.extend(self._extract_smil_formats(
-                    video_url, video_id, fatal=False))
-            elif ext == 'm3u8':
-                # the certificates are misconfigured (see
-                # https://github.com/rg3/youtube-dl/issues/8665)
-                if video_url.startswith('https://'):
-                    continue
-                formats.extend(self._extract_m3u8_formats(
-                    video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
-            elif ext == 'f4m':
-                formats.extend(self._extract_f4m_formats(
-                    video_url, video_id, f4m_id=format_id, fatal=False))
-            else:
-                proto = format_m.group('proto').lower()
-
-                abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000)
-                vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000)
-
-                width = int_or_none(xpath_text(fnode, './width', 'width'))
-                height = int_or_none(xpath_text(fnode, './height', 'height'))
-
-                filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize'))
-
-                format_note = ''
-                if not format_note:
-                    format_note = None
-
-                formats.append({
-                    'format_id': format_id,
-                    'url': video_url,
-                    'ext': ext,
-                    'acodec': format_m.group('acodec'),
-                    'vcodec': format_m.group('vcodec'),
-                    'abr': abr,
-                    'vbr': vbr,
-                    'width': width,
-                    'height': height,
-                    'filesize': filesize,
-                    'format_note': format_note,
-                    'protocol': proto,
-                    '_available': is_available,
-                })
-            format_ids.append(format_id)
+        title = document['titel']
  
+        formats = []
+        format_urls = set()
+        for f in document['formitaeten']:
+            self._extract_format(video_id, formats, format_urls, f)
          self._sort_formats(formats)
  
+        thumbnails = []
+        teaser_bild = document.get('teaserBild')
+        if isinstance(teaser_bild, dict):
+            for thumbnail_key, thumbnail in teaser_bild.items():
+                thumbnail_url = try_get(
+                    thumbnail, lambda x: x['url'], compat_str)
+                if thumbnail_url:
+                    thumbnails.append({
+                        'url': thumbnail_url,
+                        'id': thumbnail_key,
+                        'width': int_or_none(thumbnail.get('width')),
+                        'height': int_or_none(thumbnail.get('height')),
+                    })
+
          return {
              'id': video_id,
              'title': title,
-            'description': description,
-            'duration': duration,
+            'description': document.get('beschreibung'),
+            'duration': int_or_none(document.get('length')),
+            'timestamp': unified_timestamp(try_get(
+                document, lambda x: x['meta']['editorialDate'], compat_str)),
              'thumbnails': thumbnails,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'upload_date': upload_date,
+            'subtitles': self._extract_subtitles(document),
              'formats': formats,
-            'subtitles': subtitles,
          }
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        xml_url = 'http://www.zdf.de/ZDFmediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
-        return self.extract_from_xml_url(video_id, xml_url)
  
+        webpage = self._download_webpage(url, video_id, fatal=False)
+        if webpage:
+            player = self._extract_player(webpage, url, fatal=False)
+            if player:
+                return self._extract_regular(url, player, video_id)
+
+        return self._extract_mobile(video_id)
  
-class ZDFChannelIE(InfoExtractor):
-    _VALID_URL = r'(?:zdf:topic:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/.*kanaluebersicht/(?:[^/]+/)?)(?P<id>[0-9]+)'
+
+class ZDFChannelIE(ZDFBaseIE):
+    _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
      _TESTS = [{
-        'url': 'http://www.zdf.de/ZDFmediathek#/kanaluebersicht/1586442/sendung/Titanic',
+        'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio',
          'info_dict': {
-            'id': '1586442',
+            'id': 'das-aktuelle-sportstudio',
+            'title': 'das aktuelle sportstudio | ZDF',
          },
-        'playlist_count': 3,
-    }, {
-        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/aktuellste/332',
-        'only_matching': True,
+        'playlist_count': 21,
      }, {
-        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/meist-gesehen/332',
-        'only_matching': True,
+        'url': 'https://www.zdf.de/dokumentation/planet-e',
+        'info_dict': {
+            'id': 'planet-e',
+            'title': 'planet e.',
+        },
+        'playlist_count': 4,
      }, {
-        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/_/1798716?bc=nrt;nrm?flash=off',
+        'url': 'https://www.zdf.de/filme/taunuskrimi/',
          'only_matching': True,
      }]
-    _PAGE_SIZE = 50
-
-    def _fetch_page(self, channel_id, page):
-        offset = page * self._PAGE_SIZE
-        xml_url = (
-            'http://www.zdf.de/ZDFmediathek/xmlservice/web/aktuellste?ak=web&offset=%d&maxLength=%d&id=%s'
-            % (offset, self._PAGE_SIZE, channel_id))
-        doc = self._download_xml(
-            xml_url, channel_id,
-            note='Downloading channel info',
-            errnote='Failed to download channel info')
-
-        title = doc.find('.//information/title').text
-        description = doc.find('.//information/detail').text
-        for asset in doc.findall('.//teasers/teaser'):
-            a_type = asset.find('./type').text
-            a_id = asset.find('./details/assetId').text
-            if a_type not in ('video', 'topic'):
-                continue
-            yield {
-                '_type': 'url',
-                'playlist_title': title,
-                'playlist_description': description,
-                'url': 'zdf:%s:%s' % (a_type, a_id),
-            }
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ZDFIE.suitable(url) else super(ZDFChannelIE, cls).suitable(url)
  
      def _real_extract(self, url):
          channel_id = self._match_id(url)
-        entries = OnDemandPagedList(
-            functools.partial(self._fetch_page, channel_id), self._PAGE_SIZE)
  
-        return {
-            '_type': 'playlist',
-            'id': channel_id,
-            'entries': entries,
-        }
+        webpage = self._download_webpage(url, channel_id)
+
+        entries = [
+            self.url_result(item_url, ie=ZDFIE.ie_key())
+            for item_url in orderedSet(re.findall(
+                r'data-plusbar-url=["\'](http.+?\.html)', webpage))]
+
+        return self.playlist_result(
+            entries, channel_id, self._og_search_title(webpage, fatal=False))
+
+        r"""
+        player = self._extract_player(webpage, channel_id)
+
+        channel_id = self._search_regex(
+            r'docId\s*:\s*(["\'])(?P<id>(?!\1).+?)\1', webpage,
+            'channel id', group='id')
+
+        channel = self._call_api(
+            'https://api.zdf.de/content/documents/%s.json' % channel_id,
+            player, url, channel_id)
+
+        items = []
+        for module in channel['module']:
+            for teaser in try_get(module, lambda x: x['teaser'], list) or []:
+                t = try_get(
+                    teaser, lambda x: x['http://zdf.de/rels/target'], dict)
+                if not t:
+                    continue
+                items.extend(try_get(
+                    t,
+                    lambda x: x['resultsWithVideo']['http://zdf.de/rels/search/results'],
+                    list) or [])
+            items.extend(try_get(
+                module,
+                lambda x: x['filterRef']['resultsWithVideo']['http://zdf.de/rels/search/results'],
+                list) or [])
+
+        entries = []
+        entry_urls = set()
+        for item in items:
+            t = try_get(item, lambda x: x['http://zdf.de/rels/target'], dict)
+            if not t:
+                continue
+            sharing_url = t.get('http://zdf.de/rels/sharing-url')
+            if not sharing_url or not isinstance(sharing_url, compat_str):
+                continue
+            if sharing_url in entry_urls:
+                continue
+            entry_urls.add(sharing_url)
+            entries.append(self.url_result(
+                sharing_url, ie=ZDFIE.ie_key(), video_id=t.get('id')))
+
+        return self.playlist_result(entries, channel_id, channel.get('title'))
+        """
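
Reviewer note: the rewritten ZDF extractor leans on `try_get` to walk nested JSON without raising on missing or mistyped nodes. The sketch below is a simplified stand-in for that helper (not imported from youtube_dl), plus a hypothetical `collect_track_uris` function that walks the same `priorityList` / `formitaeten` / `qualities` / `audio.tracks` path as `_extract_entry` and deduplicates URIs the way `_extract_format` does with its `format_urls` set.

```python
def try_get(src, getter, expected_type=None):
    """Return getter(src), or None if the lookup fails or the type is unexpected."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def collect_track_uris(ptmd):
    """Collect unique track URIs from a PTMD-like dict, skipping malformed nodes."""
    seen, uris = set(), []
    for p in try_get(ptmd, lambda x: x['priorityList'], list) or []:
        for f in try_get(p, lambda x: x['formitaeten'], list) or []:
            for quality in try_get(f, lambda x: x['qualities'], list) or []:
                tracks = try_get(quality, lambda x: x['audio']['tracks'], list)
                for track in tracks or []:
                    uri = try_get(track, lambda x: x['uri'], str)
                    if uri and uri not in seen:
                        seen.add(uri)
                        uris.append(uri)
    return uris
```

The `or []` idiom keeps each loop flat: a missing key, a `None` node, or a value of the wrong type all collapse to an empty iteration instead of an exception.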
diff --git a/youtube_dl/extractor/zingmp3.py b/youtube_dl/extractor/zingmp3.py
index 0f0e9d0eb9b1ac945934b11a134d143d82b19fb0..adfdcaabf6cb32ba9671db628007eeecbeb31b49 100644
--- a/youtube_dl/extractor/zingmp3.py
+++ b/youtube_dl/extractor/zingmp3.py
@@ -95,7 +95,7 @@ class ZingMp3IE(ZingMp3BaseInfoExtractor):
              'id': 'ZWZB9WAB',
              'title': 'Xa Mãi Xa',
              'ext': 'mp3',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
diff --git a/youtube_dl/jsinterp.py b/youtube_dl/jsinterp.py
index a8df4aef0a2553222d45b9f38131a2945470d412..24cdec28c6cb2332232212d6bcf39d03edc27c7a 100644
--- a/youtube_dl/jsinterp.py
+++ b/youtube_dl/jsinterp.py
@@ -213,7 +213,7 @@ class JSInterpreter(object):
      def extract_object(self, objname):
          obj = {}
          obj_m = re.search(
-            (r'(?:var\s+)?%s\s*=\s*\{' % re.escape(objname)) +
+            (r'(?<!this\.)%s\s*=\s*\{' % re.escape(objname)) +
              r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\}(?:,\s*)?)*)' +
              r'\}\s*;',
              self.code)
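
Reviewer note: the jsinterp change swaps the optional `var` prefix for a negative lookbehind, so that a property assignment such as `this.ab={...}` is no longer mistaken for the object definition. The sketch below isolates just the opening part of the pattern. The function names and the JavaScript snippet are made up for illustration.

```python
import re


def find_object_old(code, objname):
    # Pre-patch prefix: an optional "var", which also matches "this.<name>={".
    return re.search(r'(?:var\s+)?%s\s*=\s*\{' % re.escape(objname), code)


def find_object_new(code, objname):
    # Post-patch prefix: reject a match that is directly preceded by "this.".
    return re.search(r'(?<!this\.)%s\s*=\s*\{' % re.escape(objname), code)


# Hypothetical player code with a decoy property assignment first.
js_code = 'this.ab={x:1}; var ab={swap:function(a){return a}};'
old_match = find_object_old(js_code, 'ab')
new_match = find_object_new(js_code, 'ab')
```

Here `old_match` lands on the decoy inside `this.ab`, while `new_match` lands on the `var ab` definition. Python lookbehinds must be fixed-width, which `this\.` satisfies.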
diff --git a/youtube_dl/options.py b/youtube_dl/options.py
index 53497fbc6f60a945b6350ce36e352a8eb6ef1f2c..349f44778e799af0c58b249939f14958036080a1 100644
--- a/youtube_dl/options.py
+++ b/youtube_dl/options.py
@@ -178,6 +178,10 @@ def parseOpts(overrideArguments=None):
          'When given in the global configuration file /etc/youtube-dl.conf: '
          'Do not read the user configuration in ~/.config/youtube-dl/config '
          '(%APPDATA%/youtube-dl/config.txt on Windows)')
+    general.add_option(
+        '--config-location',
+        dest='config_location', metavar='PATH',
+        help='Location of the configuration file; either the path to the config or its containing directory.')
      general.add_option(
          '--flat-playlist',
          action='store_const', dest='extract_flat', const='in_playlist',
@@ -212,23 +216,23 @@ def parseOpts(overrideArguments=None):
      network.add_option(
          '--source-address',
          metavar='IP', dest='source_address', default=None,
-        help='Client-side IP address to bind to (experimental)',
+        help='Client-side IP address to bind to',
      )
      network.add_option(
          '-4', '--force-ipv4',
          action='store_const', const='0.0.0.0', dest='source_address',
-        help='Make all connections via IPv4 (experimental)',
+        help='Make all connections via IPv4',
      )
      network.add_option(
          '-6', '--force-ipv6',
          action='store_const', const='::', dest='source_address',
-        help='Make all connections via IPv6 (experimental)',
+        help='Make all connections via IPv6',
      )
      network.add_option(
          '--geo-verification-proxy',
          dest='geo_verification_proxy', default=None, metavar='URL',
          help='Use this proxy to verify the IP address for some geo-restricted sites. '
-        'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
+        'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading.'
      )
      network.add_option(
          '--cn-verification-proxy',
@@ -293,7 +297,7 @@ def parseOpts(overrideArguments=None):
          '--match-filter',
          metavar='FILTER', dest='match_filter', default=None,
          help=(
-            'Generic video filter (experimental). '
+            'Generic video filter. '
              'Specify any key (see help for -o for a list of available keys) to'
              ' match if the key is present, '
              '!key to check if the key is not present,'
@@ -341,7 +345,7 @@ def parseOpts(overrideArguments=None):
      authentication.add_option(
          '-2', '--twofactor',
          dest='twofactor', metavar='TWOFACTOR',
-        help='Two-factor auth code')
+        help='Two-factor authentication code')
      authentication.add_option(
          '-n', '--netrc',
          action='store_true', dest='usenetrc', default=False,
@@ -446,7 +450,7 @@ def parseOpts(overrideArguments=None):
          '--skip-unavailable-fragments',
          action='store_true', dest='skip_unavailable_fragments', default=True,
          help='Skip unavailable fragments (DASH and hlsnative only)')
-    general.add_option(
+    downloader.add_option(
          '--abort-on-unavailable-fragment',
          action='store_false', dest='skip_unavailable_fragments',
          help='Abort downloading when some fragment is not available')
@@ -466,10 +470,14 @@ def parseOpts(overrideArguments=None):
          '--playlist-reverse',
          action='store_true',
          help='Download playlist videos in reverse order')
+    downloader.add_option(
+        '--playlist-random',
+        action='store_true',
+        help='Download playlist videos in random order')
      downloader.add_option(
          '--xattr-set-filesize',
          dest='xattr_set_filesize', action='store_true',
-        help='Set file xattribute ytdl.filesize with expected filesize (experimental)')
+        help='Set file xattribute ytdl.filesize with expected file size (experimental)')
      downloader.add_option(
          '--hls-prefer-native',
          dest='hls_prefer_native', action='store_true', default=None,
@@ -657,8 +665,12 @@ def parseOpts(overrideArguments=None):
          help=('Output filename template, see the "OUTPUT TEMPLATE" for all the info'))
      filesystem.add_option(
          '--autonumber-size',
-        dest='autonumber_size', metavar='NUMBER',
-        help='Specify the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given')
+        dest='autonumber_size', metavar='NUMBER', default=5, type=int,
+        help='Specify the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given (default is %default)')
+    filesystem.add_option(
+        '--autonumber-start',
+        dest='autonumber_start', metavar='NUMBER', default=1, type=int,
+        help='Specify the start value for %(autonumber)s (default is %default)')
      filesystem.add_option(
          '--restrict-filenames',
          action='store_true', dest='restrictfilenames', default=False,
@@ -747,7 +759,7 @@ def parseOpts(overrideArguments=None):
          help='Convert video files to audio-only files (requires ffmpeg or avconv and ffprobe or avprobe)')
      postproc.add_option(
          '--audio-format', metavar='FORMAT', dest='audioformat', default='best',
-        help='Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "%default" by default')
+        help='Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "%default" by default; No effect without -x')
      postproc.add_option(
          '--audio-quality', metavar='QUALITY',
          dest='audioquality', default='5',
@@ -845,22 +857,32 @@ def parseOpts(overrideArguments=None):
              return conf
  
          command_line_conf = compat_conf(sys.argv[1:])
-
-        if '--ignore-config' in command_line_conf:
-            system_conf = []
-            user_conf = []
+        opts, args = parser.parse_args(command_line_conf)
+
+        system_conf = user_conf = custom_conf = []
+
+        if '--config-location' in command_line_conf:
+            location = compat_expanduser(opts.config_location)
+            if os.path.isdir(location):
+                location = os.path.join(location, 'youtube-dl.conf')
+            if not os.path.exists(location):
+                parser.error('config-location %s does not exist.' % location)
+            custom_conf = _readOptions(location)
+        elif '--ignore-config' in command_line_conf:
+            pass
          else:
              system_conf = _readOptions('/etc/youtube-dl.conf')
-            if '--ignore-config' in system_conf:
-                user_conf = []
-            else:
+            if '--ignore-config' not in system_conf:
                  user_conf = _readUserConf()
-        argv = system_conf + user_conf + command_line_conf
  
+        argv = system_conf + user_conf + custom_conf + command_line_conf
          opts, args = parser.parse_args(argv)
          if opts.verbose:
-            write_string('[debug] System config: ' + repr(_hide_login_info(system_conf)) + '\n')
-            write_string('[debug] User config: ' + repr(_hide_login_info(user_conf)) + '\n')
-            write_string('[debug] Command-line args: ' + repr(_hide_login_info(command_line_conf)) + '\n')
+            for conf_label, conf in (
+                    ('System config', system_conf),
+                    ('User config', user_conf),
+                    ('Custom config', custom_conf),
+                    ('Command-line args', command_line_conf)):
+                write_string('[debug] %s: %s\n' % (conf_label, repr(_hide_login_info(conf))))
  
      return parser, opts, args
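
Note on the hunk above: the patched `parseOpts` builds `argv` as `system_conf + user_conf + custom_conf + command_line_conf`, so for single-valued options the source that comes last in the list takes effect. The sketch below illustrates that ordering with a stand-alone `optparse` parser. The helper names `build_parser` and `effective_options` are hypothetical and not part of youtube-dl, and only one option is modelled.

```python
import optparse


def build_parser():
    # Hypothetical stand-in for the real parser; it models a single option.
    parser = optparse.OptionParser()
    parser.add_option('--audio-format', dest='audioformat', default='best')
    return parser


def effective_options(system_conf, user_conf, custom_conf, command_line_conf):
    # Same concatenation order as in the patched parseOpts: later entries
    # override earlier ones for options that store a single value.
    argv = system_conf + user_conf + custom_conf + command_line_conf
    opts, _args = build_parser().parse_args(argv)
    return opts


if __name__ == '__main__':
    opts = effective_options(
        system_conf=['--audio-format', 'mp3'],
        user_conf=[],
        custom_conf=['--audio-format', 'opus'],
        command_line_conf=[])
    print(opts.audioformat)  # opus: the custom config is parsed after the system config
```

In the real code the system and user lists stay empty whenever `--config-location` is given, so this sketch only shows the precedence rule, not the branch logic.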
diff --git a/youtube_dl/postprocessor/metadatafromtitle.py b/youtube_dl/postprocessor/metadatafromtitle.py
index 920573da9d8f472b8fdd8681cab0be1c6331afb7..164edd3a820af4d0c3d1af48b9cf81a6b5460e9b 100644
--- a/youtube_dl/postprocessor/metadatafromtitle.py
+++ b/youtube_dl/postprocessor/metadatafromtitle.py
@@ -12,7 +12,7 @@ class MetadataFromTitlePP(PostProcessor):
          self._titleregex = self.format_to_regex(titleformat)
  
      def format_to_regex(self, fmt):
-        """
+        r"""
          Converts a string like
             '%(title)s - %(artist)s'
          to a regex like
diff --git a/youtube_dl/socks.py b/youtube_dl/socks.py
index 63d19b3a5214221afa71a6d43bde36a39c13cd4b..0f5d7bdb2128b17c2e1dba3144ff01d9b3d2f06a 100644
--- a/youtube_dl/socks.py
+++ b/youtube_dl/socks.py
@@ -55,12 +55,12 @@ class Socks5AddressType(object):
      ATYP_IPV6 = 0x04
  
  
-class ProxyError(IOError):
+class ProxyError(socket.error):
      ERR_SUCCESS = 0x00
  
      def __init__(self, code=None, msg=None):
          if code is not None and msg is None:
-            msg = self.CODES.get(code) and 'unknown error'
+            msg = self.CODES.get(code) or 'unknown error'
          super(ProxyError, self).__init__(code, msg)
  
  
@@ -123,7 +123,7 @@ class sockssocket(socket.socket):
          while len(data) < cnt:
              cur = self.recv(cnt - len(data))
              if not cur:
-                raise IOError('{0} bytes missing'.format(cnt - len(data)))
+                raise EOFError('{0} bytes missing'.format(cnt - len(data)))
              data += cur
          return data
  
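
Note on the `ProxyError.__init__` change above: with `and`, a known code yields the literal fallback string and an unknown code yields `None`, which is the reverse of the intent; `or` returns the table entry when there is one and the fallback otherwise. A minimal sketch of the difference follows. The `CODES` table here is a one-entry illustrative subset, and the two helper functions are hypothetical names introduced only for this comparison.

```python
# Illustrative one-entry table; the real class defines its own CODES mapping.
CODES = {0x01: 'general SOCKS server failure'}


def message_with_and(code):
    # Old behaviour: a truthy lookup result is discarded in favour of the
    # right-hand operand, and a missing code short-circuits to None.
    return CODES.get(code) and 'unknown error'


def message_with_or(code):
    # New behaviour: use the table entry if present, else the fallback.
    return CODES.get(code) or 'unknown error'


if __name__ == '__main__':
    for code in (0x01, 0x7f):
        print(hex(code), repr(message_with_and(code)), repr(message_with_or(code)))
```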
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 9595bcf9f120ea4d24133e3f7399e637d14ac035..67a847ebad8238fc4f368f46b336b80e6caa3673 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -86,6 +86,11 @@ std_headers = {
  }
  
  
+USER_AGENTS = {
+    'Safari': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27',
+}
+
+
  NO_DEFAULT = object()
  
  ENGLISH_MONTH_NAMES = [
@@ -123,7 +128,13 @@ DATE_FORMATS = (
      '%d %B %Y',
      '%d %b %Y',
      '%B %d %Y',
+    '%B %dst %Y',
+    '%B %dnd %Y',
+    '%B %dth %Y',
      '%b %d %Y',
+    '%b %dst %Y',
+    '%b %dnd %Y',
+    '%b %dth %Y',
      '%b %dst %Y %I:%M',
      '%b %dnd %Y %I:%M',
      '%b %dth %Y %I:%M',
@@ -132,6 +143,7 @@ DATE_FORMATS = (
      '%Y/%m/%d',
      '%Y/%m/%d %H:%M',
      '%Y/%m/%d %H:%M:%S',
+    '%Y-%m-%d %H:%M',
      '%Y-%m-%d %H:%M:%S',
      '%Y-%m-%d %H:%M:%S.%f',
      '%d.%m.%Y %H:%M',
@@ -496,7 +508,7 @@ def sanitize_path(s):
      if drive_or_unc:
          norm_path.pop(0)
      sanitized_path = [
-        path_part if path_part in ['.', '..'] else re.sub('(?:[/<>:"\\|\\\\?\\*]|[\s.]$)', '#', path_part)
+        path_part if path_part in ['.', '..'] else re.sub(r'(?:[/<>:"\|\\?\*]|[\s.]$)', '#', path_part)
          for path_part in norm_path]
      if drive_or_unc:
          sanitized_path.insert(0, drive_or_unc + os.path.sep)
@@ -1178,7 +1190,7 @@ def date_from_str(date_str):
          return today
      if date_str == 'yesterday':
          return today - datetime.timedelta(days=1)
-    match = re.match('(now|today)(?P<sign>[+-])(?P<time>\d+)(?P<unit>day|week|month|year)(s)?', date_str)
+    match = re.match(r'(now|today)(?P<sign>[+-])(?P<time>\d+)(?P<unit>day|week|month|year)(s)?', date_str)
      if match is not None:
          sign = match.group('sign')
          time = int(match.group('time'))
@@ -1695,6 +1707,16 @@ def base_url(url):
      return re.match(r'https?://[^?#&]+/', url).group()
  
  
+def urljoin(base, path):
+    if not isinstance(path, compat_str) or not path:
+        return None
+    if re.match(r'^(?:https?:)?//', path):
+        return path
+    if not isinstance(base, compat_str) or not re.match(r'^(?:https?:)?//', base):
+        return None
+    return compat_urlparse.urljoin(base, path)
+
+
  class HEADRequest(compat_urllib_request.Request):
      def get_method(self):
          return 'HEAD'
@@ -1751,7 +1773,7 @@ def parse_duration(s):
      s = s.strip()
  
      days, hours, mins, secs, ms = [None] * 5
-    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$', s)
+    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?Z?$', s)
      if m:
          days, hours, mins, secs, ms = m.groups()
      else:
@@ -1768,11 +1790,11 @@ def parse_duration(s):
                  )?
                  (?:
                      (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
-                )?$''', s)
+                )?Z?$''', s)
          if m:
              days, hours, mins, secs, ms = m.groups()
          else:
-            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)$', s)
+            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
              if m:
                  hours, mins = m.groups()
              else:
@@ -2081,11 +2103,18 @@ def strip_jsonp(code):
  
  
  def js_to_json(code):
+    COMMENT_RE = r'/\*(?:(?!\*/).)*?\*/|//[^\n]*'
+    SKIP_RE = r'\s*(?:{comment})?\s*'.format(comment=COMMENT_RE)
+    INTEGER_TABLE = (
+        (r'(?s)^(0[xX][0-9a-fA-F]+){skip}:?$'.format(skip=SKIP_RE), 16),
+        (r'(?s)^(0+[0-7]+){skip}:?$'.format(skip=SKIP_RE), 8),
+    )
+
      def fix_kv(m):
          v = m.group(0)
          if v in ('true', 'false', 'null'):
              return v
-        elif v.startswith('/*') or v == ',':
+        elif v.startswith('/*') or v.startswith('//') or v == ',':
              return ""
  
          if v[0] in ("'", '"'):
@@ -2096,11 +2125,6 @@ def js_to_json(code):
                  '\\x': '\\u00',
              }.get(m.group(0), m.group(0)), v[1:-1])
  
-        INTEGER_TABLE = (
-            (r'^(0[xX][0-9a-fA-F]+)\s*:?$', 16),
-            (r'^(0+[0-7]+)\s*:?$', 8),
-        )
-
          for regex, base in INTEGER_TABLE:
              im = re.match(regex, v)
              if im:
@@ -2112,11 +2136,11 @@ def js_to_json(code):
      return re.sub(r'''(?sx)
          "(?:[^"\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^"\\]*"|
          '(?:[^'\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^'\\]*'|
-        /\*.*?\*/|,(?=\s*[\]}])|
+        {comment}|,(?={skip}[\]}}])|
          [a-zA-Z_][.a-zA-Z_0-9]*|
-        \b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:\s*:)?|
-        [0-9]+(?=\s*:)
-        ''', fix_kv, code)
+        \b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:{skip}:)?|
+        [0-9]+(?={skip}:)
+        '''.format(comment=COMMENT_RE, skip=SKIP_RE), fix_kv, code)
  
  
  def qualities(quality_ids):
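
Note on the `urljoin` helper added to utils.py above: it returns `None` for a missing or non-string path, passes through paths that are already absolute or protocol-relative, and otherwise requires an http(s) or protocol-relative base. The sketch below is a Python 3 transcription for experimentation only. It assumes `str` in place of `compat_str` and `urllib.parse.urljoin` in place of `compat_urlparse.urljoin`, so it is not the project's own code.

```python
import re
from urllib.parse import urljoin as _std_urljoin


def urljoin(base, path):
    # Assumption: str replaces compat_str, urllib.parse replaces compat_urlparse.
    if not isinstance(path, str) or not path:
        return None
    # Already absolute (http/https) or protocol-relative: return unchanged.
    if re.match(r'^(?:https?:)?//', path):
        return path
    # A relative path needs a usable http(s) or protocol-relative base.
    if not isinstance(base, str) or not re.match(r'^(?:https?:)?//', base):
        return None
    return _std_urljoin(base, path)


if __name__ == '__main__':
    print(urljoin('http://foo.de/', '/a/b/c.txt'))   # http://foo.de/a/b/c.txt
    print(urljoin('http://foo.de/a/b', 'c.txt'))     # http://foo.de/a/c.txt
    print(urljoin(None, '//foo.de/a/b/c.txt'))       # //foo.de/a/b/c.txt
    print(urljoin(None, 'a/b/c.txt'))                # None
```

Returning `None` instead of raising lets callers chain the helper on optional fields without guarding each lookup.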
diff --git a/youtube_dl/version.py b/youtube_dl/version.py
index 1acb630af245e1288773118670821d846bde5bdd..a73e9d89c11d33c6999313fbeaea010aa432401f 100644
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
  from __future__ import unicode_literals
  
-__version__ = '2016.12.01'
+__version__ = '2017.02.07'