+If you want to add support for a new site, first of all MAKE SURE this
+site is NOT DEDICATED TO COPYRIGHT INFRINGEMENT. youtube-dl does NOT
+SUPPORT such sites thus pull requests adding support for them WILL BE
+REJECTED.
+
+After you have ensured this site is distributing its content legally,
+you can follow this quick list (assuming your service is called
+yourextractor):
+
+1. Fork this repository
+2. Check out the source code with:
+
+ git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
+
+3. Start a new git branch with
+
+ cd youtube-dl
+ git checkout -b yourextractor
+
+4. Start with this simple template and save it to
+ youtube_dl/extractor/yourextractor.py:
+
+ # coding: utf-8
+ from __future__ import unicode_literals
+
+ from .common import InfoExtractor
+
+
+ class YourExtractorIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
+ _TEST = {
+ 'url': 'http://yourextractor.com/watch/42',
+ 'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
+ 'info_dict': {
+ 'id': '42',
+ 'ext': 'mp4',
+ 'title': 'Video title goes here',
+ 'thumbnail': r're:^https?://.*\.jpg$',
+ # TODO more properties, either as:
+ # * A value
+ # * MD5 checksum; start the string with md5:
+ # * A regular expression; start the string with re:
+ # * Any Python type (for example int or float)
+ }
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id)
+
+ # TODO more code goes here, for example ...
+ title = self._html_search_regex(r'<h1>(.+?)</h1>', webpage, 'title')
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': self._og_search_description(webpage),
+ 'uploader': self._search_regex(r'<div[^>]+id="uploader"[^>]*>([^<]+)<', webpage, 'uploader', fatal=False),
+ # TODO more properties (see youtube_dl/extractor/common.py)
+ }
+
+5. Add an import in youtube_dl/extractor/extractors.py.
+6. Run python test/test_download.py TestDownload.test_YourExtractor.
+ This _should fail_ at first, but you can continually re-run it until
+ you're done. If you decide to add more than one test, then rename
+ _TEST to _TESTS and make it into a list of dictionaries. The tests
+ will then be named TestDownload.test_YourExtractor,
+ TestDownload.test_YourExtractor_1,
+ TestDownload.test_YourExtractor_2, etc.
+7. Have a look at youtube_dl/extractor/common.py for possible helper
+ methods and a detailed description of what your extractor should and
+ may return. Add tests and code for as many as you want.
+8. Make sure your code follows youtube-dl coding conventions and check
+ the code with flake8. Also make sure your code works under all
+ Python versions claimed supported by youtube-dl, namely 2.6, 2.7,
+ and 3.2+.
+9. When the tests pass, add the new files and commit them and push the
+ result, like this:
+
+ $ git add youtube_dl/extractor/extractors.py
+ $ git add youtube_dl/extractor/yourextractor.py
+ $ git commit -m '[yourextractor] Add new extractor'
+ $ git push origin yourextractor
+
+10. Finally, create a pull request. We'll then review and merge it.
+
+In any case, thank you very much for your contributions!
+
+
+youtube-dl coding conventions
+
+This section introduces a guide lines for writing idiomatic, robust and
+future-proof extractor code.
+
+Extractors are very fragile by nature since they depend on the layout of
+the source data provided by 3rd party media hosters out of your control
+and this layout tends to change. As an extractor implementer your task
+is not only to write code that will extract media links and metadata
+correctly but also to minimize dependency on the source's layout and
+even to make the code foresee potential future changes and be ready for
+that. This is important because it will allow the extractor not to break
+on minor layout changes thus keeping old youtube-dl versions working.
+Even though this breakage issue is easily fixed by emitting a new
+version of youtube-dl with a fix incorporated, all the previous versions
+become broken in all repositories and distros' packages that may not be
+so prompt in fetching the update from us. Needless to say, some non
+rolling release distros may never receive an update at all.
+
+Mandatory and optional metafields
+
+For extraction to work youtube-dl relies on metadata your extractor
+extracts and provides to youtube-dl expressed by an information
+dictionary or simply _info dict_. Only the following meta fields in the
+_info dict_ are considered mandatory for a successful extraction process
+by youtube-dl:
+
+- id (media identifier)
+- title (media title)
+- url (media download URL) or formats
+
+In fact only the last option is technically mandatory (i.e. if you can't
+figure out the download location of the media the extraction does not
+make any sense). But by convention youtube-dl also treats id and title
+as mandatory. Thus the aforementioned metafields are the critical data
+that the extraction does not make any sense without and if any of them
+fail to be extracted then the extractor is considered completely broken.
+
+Any field apart from the aforementioned ones are considered OPTIONAL.
+That means that extraction should be TOLERANT to situations when sources
+for these fields can potentially be unavailable (even if they are always
+available at the moment) and FUTURE-PROOF in order not to break the
+extraction of general purpose mandatory fields.
+
+Example
+
+Say you have some source dictionary meta that you've fetched as JSON
+with HTTP request and it has a key summary:
+
+ meta = self._download_json(url, video_id)
+
+Assume at this point meta's layout is:
+
+ {
+ ...
+ "summary": "some fancy summary text",
+ ...
+ }
+
+Assume you want to extract summary and put it into the resulting info
+dict as description. Since description is an optional meta field you
+should be ready that this key may be missing from the meta dict, so that
+you should extract it like:
+
+ description = meta.get('summary') # correct
+
+and not like:
+
+ description = meta['summary'] # incorrect
+
+The latter will break extraction process with KeyError if summary
+disappears from meta at some later time but with the former approach
+extraction will just go ahead with description set to None which is
+perfectly fine (remember None is equivalent to the absence of data).
+
+Similarly, you should pass fatal=False when extracting optional data
+from a webpage with _search_regex, _html_search_regex or similar
+methods, for instance:
+
+ description = self._search_regex(
+ r'<span[^>]+id="title"[^>]*>([^<]+)<',
+ webpage, 'description', fatal=False)
+
+With fatal set to False if _search_regex fails to extract description it
+will emit a warning and continue extraction.
+
+You can also pass default=<some fallback value>, for example:
+
+ description = self._search_regex(
+ r'<span[^>]+id="title"[^>]*>([^<]+)<',
+ webpage, 'description', default=None)
+
+On failure this code will silently continue the extraction with
+description set to None. That is useful for metafields that may or may
+not be present.
+
+Provide fallbacks
+
+When extracting metadata try to do so from multiple sources. For example
+if title is present in several places, try extracting from at least some
+of them. This makes it more future-proof in case some of the sources
+become unavailable.
+
+Example
+
+Say meta from the previous example has a title and you are about to
+extract it. Since title is a mandatory meta field you should end up with
+something like:
+
+ title = meta['title']
+
+If title disappears from meta in future due to some changes on the
+hoster's side the extraction would fail since title is mandatory. That's
+expected.
+
+Assume that you have some another source you can extract title from, for
+example og:title HTML meta of a webpage. In this case you can provide a
+fallback scenario:
+
+ title = meta.get('title') or self._og_search_title(webpage)
+
+This code will try to extract from meta first and if it fails it will
+try extracting og:title from a webpage.
+
+Make regular expressions flexible
+
+When using regular expressions try to write them fuzzy and flexible.
+
+Example
+
+Say you need to extract title from the following HTML code:
+
+ <span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
+
+The code for that task should look similar to:
+
+ title = self._search_regex(
+ r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
+
+Or even better:
+
+ title = self._search_regex(
+ r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
+ webpage, 'title', group='title')
+
+Note how you tolerate potential changes in the style attribute's value
+or switch from using double quotes to single for class attribute:
+
+The code definitely should not look like:
+
+ title = self._search_regex(
+ r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
+ webpage, 'title', group='title')
+
+Use safe conversion functions
+
+Wrap all extracted numeric data into safe functions from utils:
+int_or_none, float_or_none. Use them for string to number conversions as
+well.
+
+
+
+EMBEDDING YOUTUBE-DL
+
+
+youtube-dl makes the best effort to be a good command-line program, and
+thus should be callable from any programming language. If you encounter
+any problems parsing its output, feel free to create a report.
+
+From a Python program, you can embed youtube-dl in a more powerful
+fashion, like this:
+
+ from __future__ import unicode_literals
+ import youtube_dl
+
+ ydl_opts = {}
+ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
+ ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
+
+Most likely, you'll want to use various options. For a list of options
+available, have a look at youtube_dl/YoutubeDL.py. For a start, if you
+want to intercept youtube-dl's output, set a logger object.
+
+Here's a more complete example of a program that outputs only errors
+(and a short message after the download is finished), and
+downloads/converts the video to an mp3 file:
+
+ from __future__ import unicode_literals
+ import youtube_dl
+
+
+ class MyLogger(object):
+ def debug(self, msg):
+ pass
+
+ def warning(self, msg):
+ pass
+
+ def error(self, msg):
+ print(msg)
+
+
+ def my_hook(d):
+ if d['status'] == 'finished':
+ print('Done downloading, now converting ...')
+
+
+ ydl_opts = {
+ 'format': 'bestaudio/best',
+ 'postprocessors': [{
+ 'key': 'FFmpegExtractAudio',
+ 'preferredcodec': 'mp3',
+ 'preferredquality': '192',
+ }],
+ 'logger': MyLogger(),
+ 'progress_hooks': [my_hook],
+ }
+ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
+ ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
+
+