`cached_path()`

cached_path.cached_path(url_or_filename: Union[str, os.PathLike], cache_dir: Optional[Union[str, os.PathLike]] = None, extract_archive: bool = False, force_extract: bool = False) → str

Given something that might be a URL or local path, determine which. If it’s a remote resource, download the file and cache it, and then return the path to the cached file. If it’s already a local path, make sure the file exists and return the path.
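The URL-or-local-path decision can be sketched with the standard library. This is a minimal illustration, not cached_path's implementation. The helper names `is_remote` and `resolve_local` are hypothetical. The sketch also assumes that "remote" means exactly the built-in schemes this page lists, and that a leading `~` in a local path is expanded.

```python
import os
from pathlib import Path
from typing import Union
from urllib.parse import urlparse

# Assumption: treat only the built-in schemes documented on this page as remote.
REMOTE_SCHEMES = {"http", "https", "s3", "gs", "hf"}


def is_remote(url_or_filename: Union[str, os.PathLike]) -> bool:
    """Hypothetical helper: True if the argument is a URL with a known remote scheme."""
    return urlparse(os.fspath(url_or_filename)).scheme in REMOTE_SCHEMES


def resolve_local(url_or_filename: Union[str, os.PathLike]) -> str:
    """Hypothetical local-path branch: check the file exists, return its path as a str."""
    path = Path(url_or_filename).expanduser()  # expand a leading "~"
    if not path.exists():
        raise FileNotFoundError(f"file {path} not found")
    return str(path)  # always a str, even when given a Path
```

For example, `is_remote("gs://allennlp-public-models/lerc-2020-11-18.tar.gz")` is `True`, while `is_remote("model.tar.gz")` is `False`. The second case would go down the local branch.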

For URLs, the following schemes are all supported out-of-the-box:

http and https,
s3 for objects on AWS S3,
gs for objects on Google Cloud Storage (GCS), and
hf for objects or repositories on the HuggingFace Hub.

You can also extend cached_path() to handle more schemes with add_scheme_client().
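The general idea behind per-scheme handlers can be sketched as a small registry that dispatches on the URL scheme. cached_path's real extension point is add_scheme_client(), whose interface is documented separately. The `register_scheme`/`fetch` registry and the `mem://` scheme below are hypothetical, simplified stand-ins that only illustrate the dispatch pattern.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical registry (NOT cached_path's API): scheme -> function returning the resource bytes.
Fetcher = Callable[[str], bytes]
_FETCHERS: Dict[str, Fetcher] = {}


def register_scheme(scheme: str, fetcher: Fetcher) -> None:
    """Register (or replace) the handler for one URL scheme."""
    _FETCHERS[scheme] = fetcher


def fetch(url: str) -> bytes:
    """Look up the handler for the URL's scheme and call it."""
    scheme = urlparse(url).scheme
    try:
        fetcher = _FETCHERS[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme {scheme!r} in {url!r}") from None
    return fetcher(url)


# Example: a hypothetical in-memory "mem://" scheme, handy in tests.
_MEMORY = {"mem://greeting": b"hello"}
register_scheme("mem", lambda url: _MEMORY[url])
```

Keeping the handlers in a lookup table means a new scheme can be supported without touching the dispatch code. That is the same motivation for having a pluggable client mechanism in the first place.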

Examples

To download a file over https:

cached_path("https://github.com/allenai/cached_path/blob/main/README.md")

To download an object on GCS:

cached_path("gs://allennlp-public-models/lerc-2020-11-18.tar.gz")

To download the PyTorch weights for the model epwalsh/bert-xsmall-dummy on HuggingFace, you could do:

cached_path("hf://epwalsh/bert-xsmall-dummy/pytorch_model.bin")

For paths or URLs that point to a tarfile or zipfile, you can append the path to a specific file within the archive to url_or_filename, preceded by a “!”. Provided you set extract_archive to True, the archive will be extracted automatically and the local path to that specific file is returned. For example:

cached_path("model.tar.gz!weights.th", extract_archive=True)
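A minimal sketch of splitting an argument that uses the “!” syntax into the archive part and the member path. The helper name `split_archive_member` is hypothetical. The sketch assumes the first “!” is the separator, so the archive path itself must not contain one, and cached_path's own parsing may differ.

```python
from typing import Optional, Tuple


def split_archive_member(url_or_filename: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split "archive!member" into (archive, member).

    The member is None when there is no "!" or nothing follows it.
    """
    if "!" not in url_or_filename:
        return url_or_filename, None
    archive, member = url_or_filename.split("!", 1)  # assumption: first "!" separates
    return archive, member or None
```

After the archive part has been resolved and extracted to some directory, the member's local path would be `os.path.join(extracted_dir, member)`.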

Parameters

url_or_filename – A URL or path to parse and possibly download.
cache_dir – The directory in which to cache downloads. If not specified, the global default cache directory will be used (~/.cache/cached_path). The global default can be changed with set_cache_dir().
extract_archive – If True, zip or tar.gz archives will be automatically extracted, in which case the path to the extracted directory is returned.
force_extract – If True and the file is an archive file, it will be extracted regardless of whether or not the extracted directory already exists.

Caution

Use this flag with caution! This can lead to race conditions if used from multiple processes on the same file.
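The extract-once behavior of extract_archive and force_extract can be sketched with the standard library's zipfile and tarfile modules. This is not cached_path's implementation. The function name `extract_to_dir` and the caller-chosen extraction directory are hypothetical. The sketch performs the same unguarded "check, then delete and re-extract" sequence that the caution above warns about, so it is not safe to run concurrently on the same archive.

```python
import os
import shutil
import tarfile
import zipfile


def extract_to_dir(archive_path: str, extraction_dir: str, force_extract: bool = False) -> str:
    """Hypothetical helper: extract a zip or tar archive into extraction_dir and return it."""
    # Identify the archive type before touching the filesystem.
    is_zip = zipfile.is_zipfile(archive_path)
    if not is_zip and not tarfile.is_tarfile(archive_path):
        raise ValueError(f"{archive_path} is not a zip or tar archive")

    if os.path.isdir(extraction_dir):
        if not force_extract:
            return extraction_dir  # reuse the earlier extraction
        shutil.rmtree(extraction_dir)  # force_extract: discard it and extract again

    os.makedirs(extraction_dir)
    if is_zip:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(extraction_dir)
    else:
        with tarfile.open(archive_path) as archive:  # handles .tar, .tar.gz, etc.
            # The "data" filter rejects unsafe members; older Pythons lack it.
            if hasattr(tarfile, "data_filter"):
                archive.extractall(extraction_dir, filter="data")
            else:
                archive.extractall(extraction_dir)
    return extraction_dir
```

One common way to reduce the race is to extract into a temporary directory and then rename it into place, ideally while holding a file lock. That mitigation is not shown here.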

Returns

The local path to the (potentially cached) resource.

Important

The return type is always a str even if the original argument was a Path.

Return type

str

Raises

FileNotFoundError – If the resource cannot be found locally or remotely.
ValueError – If the URL is invalid.
Other errors – Other error types are possible as well depending on the client used to fetch the resource.
