cached_path()

cached_path.cached_path(url_or_filename, cache_dir=None, extract_archive=False, force_extract=False, quiet=False, progress=None)

Given something that might be a URL or local path, determine which. If it’s a remote resource, download the file and cache it, and then return the path to the cached file. If it’s already a local path, make sure the file exists and return the path.
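The URL-versus-path decision described above can be sketched with the standard library alone. This is an illustration of the idea, not the library's implementation: the names `REMOTE_SCHEMES`, `classify`, and `resolve_local` are hypothetical, and the scheme set is an assumption made for the example.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical scheme set, chosen for illustration only.
REMOTE_SCHEMES = {"http", "https", "gs", "hf"}


def classify(url_or_filename: str) -> str:
    """Return "remote" for a URL with a known scheme, otherwise "local"."""
    scheme = urlparse(str(url_or_filename)).scheme
    return "remote" if scheme in REMOTE_SCHEMES else "local"


def resolve_local(path: str) -> Path:
    """Return the path if it exists, otherwise raise FileNotFoundError."""
    p = Path(path).expanduser()
    if not p.exists():
        raise FileNotFoundError(f"file {p} not found")
    return p


print(classify("gs://allennlp-public-models/lerc-2020-11-18.tar.gz"))  # remote
print(classify("model.tar.gz"))  # local
```

Checking the scheme against an explicit set, rather than testing only for a non-empty scheme, keeps Windows paths such as `C:\data\file.txt` on the local branch, since `urlparse` reports the drive letter as a scheme.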

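For URLs, the following schemes are all supported out-of-the-box:

  • http and https,

  • s3 for objects on AWS S3,

  • gs for objects on Google Cloud Storage (GCS), and

  • hf for objects or repositories on HuggingFace Hub.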

If you have Beaker-py installed you can also use URLs of the form: beaker://{user_name}/{dataset_name}/{file_path}.

You can also extend cached_path() to handle more schemes with add_scheme_client().

Examples

To download a file over https:

cached_path("https://github.com/allenai/cached_path/blob/main/README.md")

To download an object on GCS:

cached_path("gs://allennlp-public-models/lerc-2020-11-18.tar.gz")

To download the PyTorch weights for the model epwalsh/bert-xsmall-dummy on HuggingFace, you could do:

cached_path("hf://epwalsh/bert-xsmall-dummy/pytorch_model.bin")

For paths or URLs that point to a tarfile or zipfile, you can append the path to a specific file within the archive to the url_or_filename, preceded by a “!”. The archive will be automatically extracted (provided you set extract_archive to True), and the local path to that specific file is returned. For example:

cached_path("model.tar.gz!weights.th", extract_archive=True)
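
To make the “!” convention concrete, here is a minimal standard-library sketch of how such a string could be split and resolved. It is not how cached_path does it internally: `split_archive_spec` and `extract_member` are hypothetical helpers, and the sketch handles tar archives only.

```python
import tarfile
import tempfile
from pathlib import Path


def split_archive_spec(spec: str):
    """Split "archive!member" into (archive, member); member is None without "!"."""
    archive, sep, member = spec.partition("!")
    return archive, (member if sep else None)


def extract_member(spec: str, out_dir: Path) -> Path:
    """Extract the tar archive named in spec; return the member path or the directory."""
    archive, member = split_archive_spec(spec)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(out_dir)
    return out_dir / member if member else out_dir


# Demo: build a small archive, then resolve "archive!member".
work = Path(tempfile.mkdtemp())
(work / "weights.th").write_text("dummy weights")
with tarfile.open(work / "model.tar.gz", "w:gz") as tar:
    tar.add(work / "weights.th", arcname="weights.th")

spec = str(work / "model.tar.gz") + "!weights.th"
weights = extract_member(spec, work / "extracted")
print(weights.read_text())  # dummy weights
```

Without a “!” suffix the sketch returns the extraction directory itself, which mirrors the behaviour documented for extract_archive.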

Parameters:
  • url_or_filename (Union[str, PathLike]) – A URL or path to parse and possibly download.

  • cache_dir (Union[PathLike, str, None], default: None) – The directory to cache downloads. If not specified, the global default cache directory will be used (~/.cache/cached_path). This can be set to something else with set_cache_dir().

  • extract_archive (bool, default: False) – If True, zip or tar.gz archives are automatically extracted, and the path to the extracted directory is returned.

  • force_extract (bool, default: False) –

    If True and the file is an archive file, it will be extracted regardless of whether or not the extracted directory already exists.

    Caution

    Use this flag with caution! This can lead to race conditions if used from multiple processes on the same file.

  • quiet (bool, default: False) – If True, progress displays won’t be printed.

  • progress (Optional[Progress], default: None) – A custom progress display to use. If not set and quiet=False, a default display from get_download_progress() will be used.
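
The difference between extract_archive and force_extract, and the reason for the caution on the latter, can be illustrated with a small standard-library sketch. It assumes a zip archive and a hypothetical helper named `extract_if_needed`. It is not the library's implementation.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_if_needed(archive: Path, target: Path, force: bool = False) -> Path:
    """Extract archive into target, reusing an existing directory unless force is set."""
    if target.is_dir() and not force:
        return target  # reuse the earlier extraction
    if target.exists():
        # Between this removal and the end of extraction, another process
        # reading from target would see missing or partial files.
        shutil.rmtree(target)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


# Demo: a forced extraction discards local changes; a normal one keeps them.
work = Path(tempfile.mkdtemp())
with zipfile.ZipFile(work / "data.zip", "w") as zf:
    zf.writestr("a.txt", "original")

out = extract_if_needed(work / "data.zip", work / "data")
(out / "a.txt").write_text("edited")

extract_if_needed(work / "data.zip", work / "data")
kept = (out / "a.txt").read_text()  # still "edited"

extract_if_needed(work / "data.zip", work / "data", force=True)
restored = (out / "a.txt").read_text()  # back to "original"
```

The window between removing the old directory and finishing the new extraction is where concurrent readers can fail, which is why forcing extraction from several processes on the same file is risky.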

Returns:

The local path to the (potentially cached) resource.

Return type:

pathlib.Path

Raises:
  • FileNotFoundError – If the resource cannot be found locally or remotely.

  • ValueError – When the URL is invalid.

  • Other errors – Other error types are possible as well depending on the client used to fetch the resource.
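
A caller that wants to treat a missing or malformed resource as a soft failure might wrap the call as sketched below. The helper `fetch_or_none` and its `fetch` parameter are hypothetical, introduced so the error handling can be exercised without network access. By default the helper imports and calls cached_path().

```python
from pathlib import Path
from typing import Callable, Optional


def fetch_or_none(resource: str, fetch: Optional[Callable[..., Path]] = None) -> Optional[Path]:
    """Return the local path for resource, or None if it is missing or invalid."""
    if fetch is None:
        # Imported lazily so the helper can be tried with a stand-in fetcher.
        from cached_path import cached_path as fetch
    try:
        return Path(fetch(resource))
    except FileNotFoundError:
        print(f"not found: {resource}")
    except ValueError as err:
        print(f"invalid URL {resource!r}: {err}")
    return None


def always_missing(resource: str) -> Path:
    """Stand-in fetcher that simulates a resource that does not exist."""
    raise FileNotFoundError(resource)


print(fetch_or_none("hf://some-user/some-model/missing.bin", fetch=always_missing))  # None
```

Errors raised by the underlying client (network or credential problems, for example) are deliberately not caught here, so they still propagate to the caller.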