cached_path()
- cached_path.cached_path(url_or_filename: Union[str, os.PathLike], cache_dir: Optional[Union[str, os.PathLike]] = None, extract_archive: bool = False, force_extract: bool = False) -> str
Given something that might be a URL or local path, determine which. If it’s a remote resource, download the file and cache it, and then return the path to the cached file. If it’s already a local path, make sure the file exists and return the path.
For URLs, the following schemes are all supported out-of-the-box: http and https, s3 for objects on AWS S3, gs for objects on Google Cloud Storage (GCS), and hf for objects or repositories on HuggingFace Hub.
You can also extend cached_path() to handle more schemes with add_scheme_client().
Examples
To download a file over https:
cached_path("https://github.com/allenai/cached_path/blob/main/README.md")
To download an object on GCS:
cached_path("gs://allennlp-public-models/lerc-2020-11-18.tar.gz")
To download the PyTorch weights for the model epwalsh/bert-xsmall-dummy on HuggingFace, you could do:
cached_path("hf://epwalsh/bert-xsmall-dummy/pytorch_model.bin")
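The URL-versus-local-path decision described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's own logic: the helper name looks_remote and the REMOTE_SCHEMES set are assumptions made for illustration, and the set only contains the schemes listed in this page.

```python
from urllib.parse import urlparse

# Schemes this page lists as supported out of the box.
REMOTE_SCHEMES = {"http", "https", "s3", "gs", "hf"}


def looks_remote(url_or_filename: str) -> bool:
    """Hypothetical helper: True if the string uses one of the listed schemes."""
    return urlparse(url_or_filename).scheme in REMOTE_SCHEMES


print(looks_remote("gs://allennlp-public-models/lerc-2020-11-18.tar.gz"))
print(looks_remote("hf://epwalsh/bert-xsmall-dummy/pytorch_model.bin"))
print(looks_remote("model.tar.gz"))
```

Anything that does not match a known scheme is treated as a local path here; schemes registered later through add_scheme_client() are not covered by this sketch.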
For paths or URLs that point to a tarfile or zipfile, you can append the path to a specific file within the archive to the url_or_filename, preceded by a “!”. The archive will be automatically extracted (provided you set extract_archive to True), returning the local path to the specific file. For example:
cached_path("model.tar.gz!weights.th", extract_archive=True)
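To make the “!” convention concrete, here is a rough standard-library illustration of what it amounts to for a local tar archive: split the argument at “!”, extract the archive once, and return the path to the requested member as a str. The helper resolve_archive_member and the "-extracted" directory naming are hypothetical and chosen only for this sketch; they are not how cached_path() lays out its cache.

```python
import tarfile
import tempfile
from pathlib import Path


def resolve_archive_member(spec: str, extract_root: Path) -> str:
    """Hypothetical helper mimicking the "archive!member" convention for local tar files."""
    archive, _, member = spec.partition("!")
    # Directory name is an assumption made for this sketch only.
    target_dir = extract_root / (Path(archive).name + "-extracted")
    if not target_dir.exists():
        target_dir.mkdir(parents=True)
        with tarfile.open(archive) as tar:
            tar.extractall(target_dir)
    # Without a member, return the extraction directory itself.
    return str(target_dir / member) if member else str(target_dir)


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    weights = tmp_path / "weights.th"
    weights.write_text("dummy weights")

    # Build a small model.tar.gz containing a single weights.th file.
    archive_path = tmp_path / "model.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(weights, arcname="weights.th")

    resolved = resolve_archive_member(f"{archive_path}!weights.th", tmp_path / "cache")
    print(Path(resolved).read_text())
```

The sketch skips extraction when the target directory already exists, which loosely mirrors the default behaviour that force_extract overrides.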
- Parameters
url_or_filename – A URL or path to parse and possibly download.
cache_dir – The directory to cache downloads. If not specified, the global default cache directory will be used (~/.cache/cached_path). This can be set to something else with set_cache_dir().
extract_archive – If True, then zip or tar.gz archives will be automatically extracted, in which case the directory is returned.
force_extract – If True and the file is an archive file, it will be extracted regardless of whether or not the extracted directory already exists.
Caution
Use this flag with caution! This can lead to race conditions if used from multiple processes on the same file.
- Returns
The local path to the (potentially cached) resource.
Important
The return type is always a str even if the original argument was a Path.
- Return type
str
str
- Raises
FileNotFoundError – If the resource cannot be found locally or remotely.
ValueError – When the URL is invalid.
Other errors – Other error types are possible as well depending on the client used to fetch the resource.
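One way to handle the two documented exceptions is a thin wrapper around the call. The sketch below is illustrative only: fetch_or_none is a hypothetical helper, and it takes the fetch function as an argument so that it can be exercised without network access. The stand-in fake_fetch simply simulates a missing resource; in real use you would presumably pass cached_path.cached_path instead.

```python
from typing import Callable, Optional


def fetch_or_none(resource: str, fetch: Callable[[str], str]) -> Optional[str]:
    """Hypothetical wrapper: return a local path, or None on the documented errors."""
    try:
        return str(fetch(resource))
    except FileNotFoundError:
        # Raised when the resource cannot be found locally or remotely.
        print(f"not found: {resource}")
    except ValueError as err:
        # Raised when the URL is invalid.
        print(f"invalid URL {resource!r}: {err}")
    return None


def fake_fetch(resource: str) -> str:
    # Stand-in for cached_path(); always reports a missing resource.
    raise FileNotFoundError(resource)


print(fetch_or_none("does-not-exist.txt", fake_fetch))
```

Any other exception type, such as one raised by the client used to fetch the resource, is deliberately left to propagate to the caller.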