X-Git-Url: https://git.rapsys.eu/youtubedl/blobdiff_plain/22bc55bffeb45b7d2f3056ae863eb3228e6507e8..540a20889afafc522db30a941b5cb2f97eb885ff:/README.txt?ds=inline

diff --git a/README.txt b/README.txt
index a0f20fd..045e0b1 100644
--- a/README.txt
+++ b/README.txt
@@ -576,8 +576,8 @@ However, it may contain special sequences that will be replaced when
 downloading each video. The special sequences may be formatted according
 to python string formatting operations. For example, %(NAME)s or
 %(NAME)05d. To clarify, that is a percent symbol followed by a name in
-parentheses, followed by a formatting operations. Allowed names along
-with sequence type are:
+parentheses, followed by formatting operations. Allowed names along with
+sequence type are:

 - id (string): Video identifier
 - title (string): Video title
@@ -594,6 +594,8 @@ with sequence type are:
   available
 - upload_date (string): Video upload date (YYYYMMDD)
 - uploader_id (string): Nickname or id of the video uploader
+- channel (string): Full name of the channel the video is uploaded on
+- channel_id (string): Id of the channel
 - location (string): Physical location where the video was filmed
 - duration (numeric): Length of the video in seconds
 - view_count (numeric): How many users have watched the video on the
@@ -777,15 +779,20 @@ of a particular file extension served as a single file, e.g. -f webm
 will download the best quality format with the webm extension served as
 a single file.

-You can also use special names to select particular edge case formats: -
-best: Select the best quality format represented by a single file with
-video and audio. - worst: Select the worst quality format represented by
-a single file with video and audio. - bestvideo: Select the best quality
-video-only format (e.g. DASH video). May not be available. - worstvideo:
-Select the worst quality video-only format. May not be available. -
-bestaudio: Select the best quality audio only-format. May not be
-available. - worstaudio: Select the worst quality audio only-format. May
-not be available.
+You can also use special names to select particular edge case formats:
+
+- best: Select the best quality format represented by a single file
+  with video and audio.
+- worst: Select the worst quality format represented by a single file
+  with video and audio.
+- bestvideo: Select the best quality video-only format (e.g. DASH
+  video). May not be available.
+- worstvideo: Select the worst quality video-only format. May not be
+  available.
+- bestaudio: Select the best quality audio only-format. May not be
+  available.
+- worstaudio: Select the worst quality audio only-format. May not be
+  available.

 For example, to download the worst quality video-only format you can use
 -f worstvideo.
@@ -808,20 +815,31 @@ You can also filter the video formats by putting a condition in
 brackets, as in -f "best[height=720]" (or -f "[filesize>10M]").

 The following numeric meta fields can be used with comparisons <, <=, >,
->=, = (equals), != (not equals): - filesize: The number of bytes, if
-known in advance - width: Width of the video, if known - height: Height
-of the video, if known - tbr: Average bitrate of audio and video in
-KBit/s - abr: Average audio bitrate in KBit/s - vbr: Average video
-bitrate in KBit/s - asr: Audio sampling rate in Hertz - fps: Frame rate
-
-Also filtering work for comparisons = (equals), != (not equals), ^=
-(begins with), $= (ends with), *= (contains) and following string meta
-fields: - ext: File extension - acodec: Name of the audio codec in use -
-vcodec: Name of the video codec in use - container: Name of the
-container format - protocol: The protocol that will be used for the
-actual download, lower-case (http, https, rtsp, rtmp, rtmpe, mms, f4m,
-ism, http_dash_segments, m3u8, or m3u8_native) - format_id: A short
-description of the format
+>=, = (equals), != (not equals):
+
+- filesize: The number of bytes, if known in advance
+- width: Width of the video, if known
+- height: Height of the video, if known
+- tbr: Average bitrate of audio and video in KBit/s
+- abr: Average audio bitrate in KBit/s
+- vbr: Average video bitrate in KBit/s
+- asr: Audio sampling rate in Hertz
+- fps: Frame rate
+
+Also filtering work for comparisons = (equals), ^= (starts with), $=
+(ends with), *= (contains) and following string meta fields:
+
+- ext: File extension
+- acodec: Name of the audio codec in use
+- vcodec: Name of the video codec in use
+- container: Name of the container format
+- protocol: The protocol that will be used for the actual download,
+  lower-case (http, https, rtsp, rtmp, rtmpe, mms, f4m, ism,
+  http_dash_segments, m3u8, or m3u8_native)
+- format_id: A short description of the format
+
+Any string comparison may be prefixed with negation ! in order to
+produce an opposite comparison, e.g. !*= (does not contain).

 Note that none of the aforementioned meta fields are guaranteed to be
 present since this solely depends on the metadata obtained by particular
@@ -874,7 +892,7 @@ single.
     # Download best mp4 format available or any other best if no mp4 available
     $ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'

-    # Download best format available but not better that 480p
+    # Download best format available but no better than 480p
     $ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'

     # Download best video only format but no bigger than 50 MB
@@ -1417,18 +1435,21 @@ yourextractor):
    methods and a detailed description of what your extractor should and
    may return. Add tests and code for as many as you want.
 8. Make sure your code follows youtube-dl coding conventions and check
-   the code with flake8. Also make sure your code works under all
-   Python versions claimed supported by youtube-dl, namely 2.6, 2.7,
-   and 3.2+.
-9. When the tests pass, add the new files and commit them and push the
+   the code with flake8:
+
+        $ flake8 youtube_dl/extractor/yourextractor.py
+
+9. Make sure your code works under all Python versions claimed
+   supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
+10. When the tests pass, add the new files and commit them and push the
     result, like this:

-        $ git add youtube_dl/extractor/extractors.py
-        $ git add youtube_dl/extractor/yourextractor.py
-        $ git commit -m '[yourextractor] Add new extractor'
-        $ git push origin yourextractor
+        $ git add youtube_dl/extractor/extractors.py
+        $ git add youtube_dl/extractor/yourextractor.py
+        $ git commit -m '[yourextractor] Add new extractor'
+        $ git push origin yourextractor

-10. Finally, create a pull request. We'll then review and merge it.
+11. Finally, create a pull request. We'll then review and merge it.

 In any case, thank you very much for your contributions!

@@ -1557,9 +1578,31 @@ fallback scenario:
 This code will try to extract from meta first and if it fails it will
 try extracting og:title from a webpage.

-Make regular expressions flexible
+Regular expressions
+
+Don't capture groups you don't use
+
+Capturing group must be an indication that it's used somewhere in the
+code. Any group that is not used must be non capturing.
+
+Example
+
+Don't capture id attribute name here since you can't use it for anything
+anyway.

-When using regular expressions try to write them fuzzy and flexible.
+Correct:
+
+    r'(?:id|ID)=(?P<id>\d+)'
+
+Incorrect:
+
+    r'(id|ID)=(?P<id>\d+)'
+
+Make regular expressions relaxed and flexible
+
+When using regular expressions try to write them fuzzy, relaxed and
+flexible, skipping insignificant parts that are more likely to change,
+allowing both single and double quotes for quoted values and so on.

 Example

@@ -1587,11 +1630,113 @@ The code definitely should not look like:
         r'(.*?)',
         webpage, 'title', group='title')

-Use safe conversion functions
+Long lines policy
+
+There is a soft limit to keep lines of code under 80 characters long.
+This means it should be respected if possible and if it does not make
+readability and code maintenance worse.
+
+For example, you should NEVER split long string literals like URLs or
+some other often copied entities over multiple lines to fit this limit:
+
+Correct:
+
+    'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
+
+Incorrect:
+
+    'https://www.youtube.com/watch?v=FqZTN594JQw&list='
+    'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
+
+Inline values
+
+Extracting variables is acceptable for reducing code duplication and
+improving readability of complex expressions. However, you should avoid
+extracting variables used only once and moving them to opposite parts of
+the extractor file, which makes reading the linear flow difficult.
+
+Example
+
+Correct:
+
+    title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+
+Incorrect:
+
+    TITLE_RE = r'<title>([^<]+)</title>'
+    # ...some lines of code...
+    title = self._html_search_regex(TITLE_RE, webpage, 'title')
+
+Collapse fallbacks
+
+Multiple fallback values can quickly become unwieldy. Collapse multiple
+fallback values into a single expression via a list of patterns.
+
+Example
+
+Good:
+
+    description = self._html_search_meta(
+        ['og:description', 'description', 'twitter:description'],
+        webpage, 'description', default=None)
+
+Unwieldy:
+
+    description = (
+        self._og_search_description(webpage, default=None)
+        or self._html_search_meta('description', webpage, default=None)
+        or self._html_search_meta('twitter:description', webpage, default=None))
+
+Methods supporting list of patterns are: _search_regex,
+_html_search_regex, _og_search_property, _html_search_meta.
+
+Trailing parentheses
+
+Always move trailing parentheses after the last argument.
+
+Example
+
+Correct:
+
+    lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+    list)
+
+Incorrect:
+
+    lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
+    list,
+    )
+
+Use convenience conversion and parsing functions
+
+Wrap all extracted numeric data into safe functions from
+youtube_dl/utils.py: int_or_none, float_or_none. Use them for string to
+number conversions as well.
+
+Use url_or_none for safe URL processing.
+
+Use try_get for safe metadata extraction from parsed JSON.
+
+Use unified_strdate for uniform upload_date or any YYYYMMDD meta field
+extraction, unified_timestamp for uniform timestamp extraction,
+parse_filesize for filesize extraction, parse_count for count meta
+fields extraction, parse_resolution, parse_duration for duration
+extraction, parse_age_limit for age_limit extraction.
+
+Explore youtube_dl/utils.py for more useful convenience functions.
+
+More examples
+
+Safely extract optional description from parsed JSON
+
+    description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
+
+Safely extract more optional metadata

-Wrap all extracted numeric data into safe functions from utils:
-int_or_none, float_or_none. Use them for string to number conversions as
-well.
+    video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
+    description = video.get('summary')
+    duration = float_or_none(video.get('durationMs'), scale=1000)
+    view_count = int_or_none(video.get('views'))



@@ -1659,9 +1804,9 @@ BUGS


 Bugs and suggestions should be reported at:
-https://github.com/rg3/youtube-dl/issues. Unless you were prompted to or
-there is another pertinent reason (e.g. GitHub fails to accept the bug
-report), please do not send bug reports via personal email. For
+https://github.com/ytdl-org/youtube-dl/issues. Unless you were prompted
+to or there is another pertinent reason (e.g. GitHub fails to accept the
+bug report), please do not send bug reports via personal email. For
 discussions, join us in the IRC channel #youtube-dl on freenode
 (webchat).
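
Note on the "Use convenience conversion and parsing functions" text added by this change: the guideline is that a missing or malformed field should come back as None rather than raise. The sketch below illustrates that behaviour only. It does not use the code from youtube_dl/utils.py; int_or_none, float_or_none and try_get here are simplified stand-ins with reduced signatures (an assumption made for illustration), and the response dictionary is invented sample data.

```python
# Simplified stand-ins written for this note. They are NOT the
# youtube_dl/utils.py implementations; signatures are reduced on purpose.

def int_or_none(value, default=None):
    """Return int(value), or default when the value cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def float_or_none(value, scale=1, default=None):
    """Return float(value) / scale, or default when conversion fails."""
    try:
        return float(value) / scale
    except (TypeError, ValueError):
        return default


def try_get(src, getter, expected_type=None):
    """Apply getter to src; return None on lookup errors or a type mismatch."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


# Invented sample data: 'views' is missing a usable value on purpose.
response = {
    'result': {
        'video': [{'summary': 'A clip', 'durationMs': '93500', 'views': None}],
    },
}

video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
description = video.get('summary')
duration = float_or_none(video.get('durationMs'), scale=1000)
view_count = int_or_none(video.get('views'))

# An empty response falls back to {} instead of raising KeyError.
missing = try_get({}, lambda x: x['result']['video'][0], dict) or {}
```

With this sample data the string '93500' becomes a duration in seconds, while the absent view count and the empty response both degrade to None or an empty dict without an exception, which is the pattern the "More examples" snippets in the diff rely on.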