- cached_path.cached_path(url_or_filename: Union[str, os.PathLike], cache_dir: Optional[Union[str, os.PathLike]] = None, extract_archive: bool = False, force_extract: bool = False) -> str
Given something that might be a URL or local path, determine which. If it’s a remote resource, download the file and cache it, and then return the path to the cached file. If it’s already a local path, make sure the file exists and return the path.
For URLs, the following schemes are all supported out-of-the-box:
s3 for objects on AWS S3,
gs for objects on Google Cloud Storage (GCS), and
hf for objects or repositories on HuggingFace Hub.
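As a rough illustration of the "URL or local path" decision described above, the following standard-library sketch classifies an input by its scheme. It is not the library's implementation: the classify() helper and the KNOWN_SCHEMES set are hypothetical names, and including http and https in that set is an assumption made for the example.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

# Schemes from the list above; "http" and "https" are an assumption
# added for illustration only.
KNOWN_SCHEMES = {"s3", "gs", "hf", "http", "https"}


def classify(url_or_filename):
    """Hypothetical helper: return "remote" for a known URL scheme, else "local"."""
    text = os.fspath(url_or_filename)  # accepts str or os.PathLike
    scheme = urlparse(text).scheme.lower()
    # Unknown schemes (including Windows drive letters such as "C:")
    # are treated as local paths.
    return "remote" if scheme in KNOWN_SCHEMES else "local"


print(classify("s3://my-bucket/data.json"))
print(classify(Path("data") / "train.csv"))
```

Checking against an explicit set of schemes, rather than testing for any non-empty scheme, keeps Windows drive letters from being mistaken for URLs.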
You can also extend cached_path() to handle more schemes with
To download a file over
To download an object on GCS:
To download the PyTorch weights for the model epwalsh/bert-xsmall-dummy on HuggingFace, you could do:
For paths or URLs that point to a tarfile or zipfile, you can append the path to a specific file within the archive to url_or_filename, preceded by a “!”. The archive will be automatically extracted (provided you set extract_archive to True), returning the local path to the specific file. For example:
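The “!” convention can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the library's own code: split_archive_member() and resolve_local_zip() are hypothetical helpers, only local zip files are handled, and the archive is built on the fly so the example runs without network access.

```python
import tempfile
import zipfile
from pathlib import Path


def split_archive_member(url_or_filename):
    """Split "archive!member" into (archive, member); member is None without a "!"."""
    archive, sep, member = str(url_or_filename).partition("!")
    return (archive, member) if sep else (archive, None)


def resolve_local_zip(url_or_filename, extract_dir):
    """Hypothetical helper: extract a local zip and return the member's path as a str."""
    archive, member = split_archive_member(url_or_filename)
    if member is None:
        return archive
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(extract_dir)
    target = Path(extract_dir) / member
    if not target.exists():
        raise FileNotFoundError(f"{member} not found in {archive}")
    return str(target)


# Build a throwaway archive so the sketch is self-contained.
workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "data.zip"
with zipfile.ZipFile(archive_path, "w") as zf:
    zf.writestr("docs/readme.txt", "hello")

resolved = resolve_local_zip(f"{archive_path}!docs/readme.txt", workdir / "extracted")
print(Path(resolved).read_text())  # prints "hello"
```

With the real function, the equivalent call would pass the same "archive!member" string as url_or_filename together with extract_archive=True.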
url_or_filename – A URL or path to parse and possibly download.
cache_dir – The directory to cache downloads. If not specified, the global default cache directory will be used (~/.cache/cached_path). This can be set to something else with
extract_archive – If True, then zip or tar.gz archives will be automatically extracted, in which case the path to the extracted directory is returned.
force_extract – If True and the file is an archive file, it will be extracted regardless of whether or not the extracted directory already exists.
Use this flag with caution! This can lead to race conditions if used from multiple processes on the same file.
The local path to the (potentially cached) resource.
The return type is always a str even if the original argument was a
- Return type: str
FileNotFoundError – If the resource cannot be found locally or remotely.
ValueError – When the URL is invalid.
Other errors – Other error types are possible as well depending on the client used to fetch the resource.
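The local-path behaviour documented above (check that the file exists, return a str, raise FileNotFoundError otherwise) can be illustrated with a small stand-in. resolve_local() is a hypothetical helper written for this sketch, not part of the library, and the file names are placeholders.

```python
import os
import tempfile
from pathlib import Path


def resolve_local(url_or_filename):
    """Hypothetical stand-in for the local-path branch: check existence, return str."""
    path = os.fspath(url_or_filename)  # a Path argument becomes a plain str here
    if not os.path.exists(path):
        raise FileNotFoundError(f"file {path} not found")
    return path


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "weights.bin"
    existing.write_bytes(b"\x00")

    result = resolve_local(existing)  # Path in, str out
    print(type(result).__name__)      # prints "str"

    try:
        resolve_local(Path(tmp) / "missing.bin")
    except FileNotFoundError as err:
        print(f"not found: {err}")
```

Callers of the real function would typically wrap the call in the same try/except FileNotFoundError pattern, keeping in mind that other error types may surface depending on the client used to fetch the resource.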