Utilities

from cached_path import *

Functions

cached_path.set_cache_dir(cache_dir: Union[str, os.PathLike]) -> None

Set the global default cache directory.

cached_path.get_cache_dir() -> Union[str, os.PathLike]

Get the global default cache directory.

cached_path.file_friendly_logging(on: bool = True) -> None

Turn on (or off) file-friendly logging globally.

cached_path.add_scheme_client(client: Type[cached_path.schemes.scheme_client.SchemeClient]) -> None

Add a new SchemeClient.

This can be used to extend cached_path.cached_path() to handle custom schemes, or handle existing schemes differently.

cached_path.resource_to_filename(resource: str, etag: Optional[str] = None) -> str

Convert a resource into a hashed filename in a repeatable way. If etag is specified, append its hash to the resource’s hash, delimited by a period.

This is essentially the inverse of filename_to_url().
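The naming scheme described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the function name sketch_resource_to_filename is hypothetical, and the choice of SHA-256 over the UTF-8 bytes is an assumption made for the example.

```python
from hashlib import sha256
from typing import Optional


def sketch_resource_to_filename(resource: str, etag: Optional[str] = None) -> str:
    """Hypothetical stand-in that mirrors the documented behaviour."""
    # Hashing the resource gives the same filename for the same input every time.
    # SHA-256 is an assumption for this sketch.
    filename = sha256(resource.encode("utf-8")).hexdigest()
    if etag:
        # Append the hash of the ETag, separated from the resource hash by a period.
        filename += "." + sha256(etag.encode("utf-8")).hexdigest()
    return filename


plain = sketch_resource_to_filename("https://example.com/model.bin")
versioned = sketch_resource_to_filename("https://example.com/model.bin", etag="v1")
print(plain)
print(versioned)
```

Because a hash cannot be reversed, a filename alone does not reveal the original URL, which is consistent with filename_to_url() relying on metadata stored alongside the cached file.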

cached_path.filename_to_url(filename: str, cache_dir: Optional[Union[str, os.PathLike]] = None) -> Tuple[str, Optional[str]]

Return the URL and etag (which may be None) stored for filename. Raises FileNotFoundError if filename or its stored metadata do not exist.

This is essentially the inverse of resource_to_filename().

cached_path.find_latest_cached(url: str, cache_dir: Optional[Union[str, os.PathLike]] = None) -> Optional[str]

Get the path to the latest cached version of a given resource.

cached_path.check_tarfile(tar_file: tarfile.TarFile)

Tar files can contain files outside of the extraction directory, or symlinks that point outside the extraction directory. We also don’t want any block devices, FIFOs, or other special file types extracted. This checks for those issues and raises an exception if there is a problem.
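The checks described for check_tarfile() can be sketched as follows. This is an illustrative version written against the standard tarfile module, not the library's code: the names sketch_check_tarfile and make_archive, the dest parameter, and the use of ValueError are all assumptions made for the example.

```python
import io
import os
import tarfile
import tempfile


def sketch_check_tarfile(tar: tarfile.TarFile, dest: str) -> None:
    """Raise ValueError if extracting `tar` into `dest` looks unsafe (illustrative only)."""
    base = os.path.realpath(dest)

    def inside(path: str) -> bool:
        # True when `path` resolves to `base` or to something beneath it.
        return os.path.commonpath([base, os.path.realpath(path)]) == base

    for member in tar.getmembers():
        # Reject block/character devices, FIFOs, and other special types.
        if not (member.isfile() or member.isdir() or member.issym() or member.islnk()):
            raise ValueError(f"unsupported member type: {member.name}")
        target = os.path.join(base, member.name)
        if not inside(target):
            raise ValueError(f"member escapes extraction directory: {member.name}")
        if member.issym():
            # A symlink target is resolved relative to the link's own directory.
            link_target = os.path.join(os.path.dirname(target), member.linkname)
        elif member.islnk():
            # A hard link target is resolved relative to the archive root.
            link_target = os.path.join(base, member.linkname)
        else:
            continue
        if not inside(link_target):
            raise ValueError(f"link points outside extraction directory: {member.name}")


def make_archive(name: str, kind: bytes = tarfile.REGTYPE, linkname: str = "") -> tarfile.TarFile:
    """Build a one-member, in-memory archive for demonstration purposes."""
    info = tarfile.TarInfo(name)
    info.type = kind
    info.linkname = linkname
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        tar.addfile(info)
    buffer.seek(0)
    return tarfile.open(fileobj=buffer, mode="r")


with tempfile.TemporaryDirectory() as dest:
    # A well-behaved archive passes silently.
    sketch_check_tarfile(make_archive("data/weights.bin"), dest)
    try:
        sketch_check_tarfile(make_archive("../evil.txt"), dest)
    except ValueError as err:
        print(f"rejected: {err}")
```

Resolving every path with os.path.realpath before comparing keeps the check from being fooled by ".." segments or absolute member names.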

Classes

class cached_path.SchemeClient(resource: str)

A client used for caching remote resources corresponding to URLs with a particular scheme.

Subclasses must define the scheme class variable and implement get_etag() and get_resource().

Important

Take care when implementing subclasses to raise the right error types from get_etag() and get_resource().

connection_error_types: ClassVar[Tuple[BaseException, ...]] = (<class 'requests.exceptions.ConnectionError'>,)

Subclasses can override this to define error types that will be treated as retriable connection errors.

get_etag() -> Optional[str]

Get the ETag or an equivalent version identifier associated with the resource.

Returns

The ETag as a str or None if there is no ETag associated with the resource.

Return type

Optional[str]

Raises
  • FileNotFoundError – If the resource doesn’t exist.

  • Connection error – Any error type defined in SchemeClient.connection_error_types will be treated as a retriable connection error.

  • Other errors – Any other error type can be raised, in which case cached_path() will log the error and move on to try to fetch the resource without the ETag.

get_resource(temp_file: IO) -> None

Download the resource to the given temporary file.

Raises
  • FileNotFoundError – If the resource doesn’t exist.

  • Connection error – Any error type defined in SchemeClient.connection_error_types will be treated as a retriable connection error.

  • Other errors – Any other error type can be raised, in which case cached_path() will fail and propagate the error.

scheme: ClassVar[Union[str, Tuple[str, ...]]] = ()

The scheme or schemes that the client will be used for (e.g. “http”).
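A subclass following the rules above might look like the sketch below. Everything specific to it is hypothetical: the "mem" scheme, the FAKE_STORE dictionary standing in for a remote service, and the InMemoryClient name. So that the snippet runs even where the library is not installed, it falls back to a small stand-in base class that mirrors only the interface documented here.

```python
import hashlib
import io
from typing import IO, Dict, Optional

try:
    from cached_path import SchemeClient
except ImportError:
    # Stand-in with the documented interface, used only when the library is absent.
    class SchemeClient:
        scheme = ()
        connection_error_types = ()

        def __init__(self, resource: str) -> None:
            self.resource = resource


# Hypothetical in-memory "remote" store used only for this illustration.
FAKE_STORE: Dict[str, bytes] = {"mem://greeting.txt": b"hello"}


class InMemoryClient(SchemeClient):
    # Hypothetical scheme: this client would handle resources like "mem://...".
    scheme = "mem"
    # Error types listed here are the ones treated as retriable connection errors.
    connection_error_types = (ConnectionError,)

    def get_etag(self) -> Optional[str]:
        if self.resource not in FAKE_STORE:
            # The documented signal for a resource that does not exist.
            raise FileNotFoundError(self.resource)
        # A content hash serves as the "equivalent version identifier".
        return hashlib.sha256(FAKE_STORE[self.resource]).hexdigest()

    def get_resource(self, temp_file: IO) -> None:
        if self.resource not in FAKE_STORE:
            raise FileNotFoundError(self.resource)
        # Write the resource's bytes into the temporary file that was handed in.
        temp_file.write(FAKE_STORE[self.resource])


client = InMemoryClient("mem://greeting.txt")
buffer = io.BytesIO()
client.get_resource(buffer)
print(client.get_etag(), buffer.getvalue())
```

With the library installed, passing InMemoryClient to add_scheme_client() is the documented way to make cached_path() consider it for matching resources. Raising FileNotFoundError for missing resources, rather than a generic exception, is what lets the caller tell "does not exist" apart from a retriable failure.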