eradiate.data.SafeOnlineDataStore

class eradiate.data.SafeOnlineDataStore(base_url, path, registry_fname='registry.txt', attempts=3)

Bases: DataStore

Serve files located online, with integrity check.

Parameters:
  • base_url (str) – URL to the online storage location.

  • path (path-like) – Path to the local cache location.

  • registry_fname (path-like, default: 'registry.txt') – Path to the registry file, relative to path.

  • attempts (int, default: 3) – Number of download attempts to make before giving up because of connection errors or a hash mismatch.

Fields:
  • manager (pooch.Pooch) – The Pooch instance used to manage downloaded content.

  • registry_fname (Path) – Path to the registry file, relative to path.

Notes

This class is a thin wrapper around a pooch.Pooch instance, which is exposed as the manager field.
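The attempts parameter describes a retry policy: a download is repeated after a connection error or a hash mismatch until the attempt budget is exhausted. The following is a minimal standard-library sketch of such a policy, not the implementation used by this class or by Pooch. The function name download_with_retries, the download callable and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def download_with_retries(download, expected_sha256, attempts=3):
    """Call `download()` until its payload matches `expected_sha256`.

    `download` is a hypothetical zero-argument callable returning bytes;
    it stands in for the actual network request.
    """
    last_error = None
    for _ in range(attempts):
        try:
            payload = download()
        except ConnectionError as exc:
            # Connection problems consume one attempt and trigger a retry.
            last_error = exc
            continue
        if hashlib.sha256(payload).hexdigest() == expected_sha256:
            return payload
        # A corrupted payload also consumes one attempt.
        last_error = ValueError("hash mismatch")
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error
```

The integrity check runs on every attempt, so a corrupted payload is never returned to the caller.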

fetch(filename, downloader=None)

Fetch a file from the data store. This method wraps pooch.Pooch.fetch() and automatically selects compressed files when they are available.

Parameters:
  • filename (path-like) – File name to fetch from the local storage, relative to the storage root.

  • downloader (callable(), optional) – A callable that will be called to download a given URL to a provided local file name. This is mostly useful to display progress bars during download.

Returns:

path (Path) – Absolute path where the retrieved resource is located.

Notes

If a compressed resource exists, it will be served automatically. For instance, if "foo.nc" is requested and foo.nc.gz is registered, the latter will be downloaded, decompressed and served as "foo.nc".
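The decompression step described in this note can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the code path used by fetch(): the helper name serve_decompressed is hypothetical, and the sketch only covers gzip files that are already present on disk.

```python
import gzip
import shutil
from pathlib import Path


def serve_decompressed(compressed_path):
    """Decompress e.g. `foo.nc.gz` next to itself and return the path to `foo.nc`."""
    compressed_path = Path(compressed_path)
    # Dropping the last suffix turns "foo.nc.gz" into "foo.nc".
    target = compressed_path.with_suffix("")
    with gzip.open(compressed_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target.resolve()
```

The compressed file is left in place, which matches the behaviour described for purge(keep="registered"): the registered archive can be kept while the decompressed copy is disposable.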

is_registered(filename, allow_compressed=True)

Check if a file is registered, with an option to look for compressed data.

Parameters:
  • filename (path-like) – File name to look up in the registry, relative to the storage root.

  • allow_compressed (bool, optional) – If True, a query for foo.bar will also look for the gzip-compressed file name foo.bar.gz. If both are registered, the compressed file takes precedence.

Returns:

path (Path) – The file name which matched filename.

Raises:

ValueError – If filename could not be matched with any entry in the registry.
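The lookup rule documented above (compressed name first, then the plain name, otherwise ValueError) can be sketched as follows. The function match_registered and the plain dict standing in for the registry are assumptions made for illustration; the actual method may differ in its details.

```python
from pathlib import Path


def match_registered(filename, registry, allow_compressed=True):
    """Return the registry entry matching `filename`, preferring `<filename>.gz`."""
    filename = str(filename)
    # The compressed variant takes precedence when it is allowed and registered.
    if allow_compressed and filename + ".gz" in registry:
        return Path(filename + ".gz")
    if filename in registry:
        return Path(filename)
    raise ValueError(f"file '{filename}' is not in the registry")
```

With a registry that holds both "foo.nc" and "foo.nc.gz", the sketch returns "foo.nc.gz" unless allow_compressed is set to False.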

purge(keep=None)

Purge the local storage location. By default, the entire contents of the directory are deleted.

Parameters:

keep ("registered" or list of str, optional) – If set to "registered", files in the registry, as well as the registry file itself, will not be deleted. Finer control is possible by passing a list of exclusion rules (paths relative to the store’s local storage root, shell wildcards allowed).

Notes

Passing keep="registered" keeps registered files to minimize the amount of data to be downloaded upon future queries to the data store. This means, for instance, that if data is registered and downloaded as a compressed file, then served decompressed, the compressed file will be kept, while the decompressed file will be deleted.

Warning

This is a destructive operation: make sure you know what you’re doing!
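To preview which files a given keep value would spare, the exclusion logic can be approximated with shell-style matching from the standard library. This is a sketch only: the function files_to_delete and its registered argument are hypothetical, and the matching rules applied by purge() itself may differ. Nothing is deleted here; the function only returns the candidates.

```python
from fnmatch import fnmatch


def files_to_delete(files, keep=None, registered=()):
    """Return the entries of `files` that no exclusion rule protects.

    `registered` is a hypothetical list of registered file names (including
    the registry file), used only when keep == "registered".
    """
    if keep is None:
        rules = []  # default: nothing is protected
    elif keep == "registered":
        rules = list(registered)
    else:
        rules = list(keep)
    return [f for f in files if not any(fnmatch(f, rule) for rule in rules)]
```

For a store containing "a.nc", "a.nc.gz" and "registry.txt", where the last two are registered, keep="registered" leaves only the decompressed "a.nc" as a deletion candidate.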

registry_delete()

Delete the registry file.

registry_fetch()

Get the absolute path to the registry file and make sure that it is written to the local cache.

registry_files(filter=None)

Get a list of registered files.

Parameters:

filter (callable(), optional) – A filter function taking a file path as a single string argument and returning a Boolean. Filenames for which the filter returns True will be returned.

Returns:

files (list of str) – List of registered files.
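A filter is any callable that takes a file name string and returns a Boolean. The sketch below applies such a filter to a hard-coded list of names. The function select_files and the sample names are assumptions made for illustration; with a real data store, the same lambda would be passed as the filter argument of registry_files().

```python
def select_files(registered, filter=None):
    """Return the names in `registered` for which `filter` returns True."""
    if filter is None:
        return list(registered)
    return [name for name in registered if filter(name)]


# Hypothetical registry contents, used only to exercise the filter.
registry_names = ["spectra/a.nc", "spectra/b.nc.gz", "README.md"]
netcdf_only = select_files(registry_names, filter=lambda name: name.endswith(".nc"))
```

The compressed entry "spectra/b.nc.gz" is not selected here because the filter is applied to the names as they are registered.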

registry_reload(delete=False)

Reload the registry file from the local cache.

Parameters:

delete (bool, optional) – If True, the existing registry file will be deleted and downloaded again.

property base_url

Address of the remote storage location.

Type:

str

property path

Absolute path to the local data storage folder.

Type:

Path

property registry

Registry contents.

Type:

dict

property registry_path

Absolute path to the registry file.

Type:

Path