cached_path()
- cached_path.cached_path(url_or_filename, cache_dir=None, extract_archive=False, force_extract=False, quiet=False, progress=None, headers=None)
Given something that might be a URL or local path, determine which. If it’s a remote resource, download the file and cache it, and then return the path to the cached file. If it’s already a local path, make sure the file exists and return the path.
For URLs, the following schemes are all supported out-of-the-box:
http and https, s3 for objects on AWS S3, gs for objects on Google Cloud Storage (GCS), and hf for objects or repositories on HuggingFace Hub.
If you have Beaker-py installed you can also use URLs of the form:
beaker://{user_name}/{dataset_name}/{file_path}.
You can also extend cached_path() to handle more schemes with add_scheme_client().
Examples
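As a rough illustration of the "URL or local path" decision described above, the sketch below classifies a string by its scheme using only the standard library. The names classify_resource and SUPPORTED_SCHEMES are hypothetical and are not part of cached_path; the library's own detection logic may differ, and beaker is left out because it depends on Beaker-py being installed.

```python
from urllib.parse import urlparse

# Schemes the documentation lists as supported out of the box.
SUPPORTED_SCHEMES = {"http", "https", "s3", "gs", "hf"}


def classify_resource(url_or_filename: str) -> str:
    """Return "remote" for a supported URL scheme, otherwise "local".

    Hypothetical helper for illustration only; not the library's logic.
    """
    scheme = urlparse(url_or_filename).scheme.lower()
    return "remote" if scheme in SUPPORTED_SCHEMES else "local"
```

Anything without one of the listed schemes, including a bare file name such as model.tar.gz, is treated as a local path in this sketch.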
To download a file over https:
cached_path("https://github.com/allenai/cached_path/blob/main/README.md")
To download an object on GCS:
cached_path("gs://allennlp-public-models/lerc-2020-11-18.tar.gz")
To download the PyTorch weights for the model epwalsh/bert-xsmall-dummy on HuggingFace, you could do:
cached_path("hf://epwalsh/bert-xsmall-dummy/pytorch_model.bin")
For paths or URLs that point to a tarfile or zipfile, you can append the path to a specific file within the archive to the url_or_filename, preceded by a “!”. The archive will be automatically extracted (provided you set extract_archive to True), returning the local path to the specific file. For example:
cached_path("model.tar.gz!weights.th", extract_archive=True)
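To make the “!” convention concrete, here is a minimal standard-library sketch of how such a string separates into an archive part and a member part. The helper split_archive_member is hypothetical and only illustrates the syntax; it assumes the split happens at the first “!” and is not how cached_path is necessarily implemented.

```python
from typing import Optional, Tuple


def split_archive_member(url_or_filename: str) -> Tuple[str, Optional[str]]:
    """Split "archive!member" into (archive, member).

    Hypothetical helper for illustration; member is None when no "!" is present.
    """
    # partition() splits at the first "!" and reports whether one was found.
    archive, sep, member = url_or_filename.partition("!")
    return archive, (member if sep else None)
```

With the example above, "model.tar.gz!weights.th" separates into the archive model.tar.gz and the member weights.th.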
- Parameters:
url_or_filename (Union[str, PathLike]) – A URL or path to parse and possibly download.
cache_dir (Union[str, PathLike, None], default: None) – The directory to cache downloads. If not specified, the global default cache directory will be used (~/.cache/cached_path). This can be set to something else with set_cache_dir().
extract_archive (bool, default: False) – If True, then zip or tar.gz archives will be automatically extracted, in which case the extracted directory is returned.
force_extract (bool, default: False) – If True and the file is an archive file, it will be extracted regardless of whether or not the extracted directory already exists.
Caution
Use this flag with caution! This can lead to race conditions if used from multiple processes on the same file.
quiet (bool, default: False) – If True, progress displays won’t be printed.
progress (Optional[Progress], default: None) – A custom progress display to use. If not set and quiet=False, a default display from get_download_progress() will be used.
headers (Optional[Dict[str, str]], default: None) – Custom headers to add to HTTP requests. Example: {"Authorization": "Bearer YOUR_TOKEN"} for private resources. Only used for HTTP/HTTPS resources.
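The following sketch shows one way the cache_dir, quiet, and headers parameters might be combined for a private HTTP(S) resource. The wrapper names build_auth_headers and fetch_private are hypothetical, and the token is a placeholder; only the keyword arguments come from the parameter list above.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def build_auth_headers(token: str) -> Dict[str, str]:
    """Build the Authorization header in the form shown for `headers`."""
    return {"Authorization": f"Bearer {token}"}


def fetch_private(
    url: str,
    token: str,
    cache_dir: Optional[Union[str, Path]] = None,
) -> Path:
    """Fetch a private HTTP(S) resource without printing progress displays."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from cached_path import cached_path

    local_path = cached_path(
        url,
        cache_dir=cache_dir,  # None falls back to the global default cache directory
        quiet=True,  # suppress progress displays
        headers=build_auth_headers(token),  # only used for HTTP/HTTPS resources
    )
    # Wrapped in Path defensively so callers get a consistent type.
    return Path(local_path)
```

Because headers are only used for HTTP/HTTPS resources, passing them alongside an s3, gs, or hf URL should not be expected to authenticate the request.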
- Returns:
The local path to the (potentially cached) resource.
- Return type:
- Raises:
FileNotFoundError – If the resource cannot be found locally or remotely.
ValueError – When the URL is invalid.
Other errors – Other error types are possible as well depending on the client used to fetch the resource.
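A minimal sketch of handling the two documented exceptions follows. The wrapper fetch_or_none and its fetch parameter are hypothetical; fetch accepts any callable that follows the same contract, so the handling can be exercised without network access. Other client-specific errors are deliberately left to propagate.

```python
from typing import Any, Callable, Optional


def fetch_or_none(
    resource: str,
    fetch: Optional[Callable[[str], Any]] = None,
) -> Optional[Any]:
    """Return fetch(resource), or None if it is missing or the URL is invalid."""
    if fetch is None:
        # Lazy import: the real function is only needed when no stand-in is given.
        from cached_path import cached_path as fetch
    try:
        return fetch(resource)
    except FileNotFoundError:
        # The resource cannot be found locally or remotely.
        print(f"not found: {resource}")
    except ValueError:
        # The URL is invalid.
        print(f"invalid URL: {resource}")
    return None
```

Calling fetch_or_none("gs://allennlp-public-models/lerc-2020-11-18.tar.gz") with no second argument would use cached_path() itself, assuming the package is installed.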