ngs_tools.utils

Module Contents

Classes

suppress_stdout_stderr

A context manager for doing a "deep suppression" of stdout and stderr in Python.

ParallelWithProgress

Wrapper around joblib.Parallel that uses tqdm to print execution progress.

TqdmUpTo

Wrapper around tqdm() so that it can be used with urlretrieve().

FileWrapper

Generic wrapper class for file-formats. Used to wrap file-format-specific implementations of reading and writing entries.

Functions

retry(→ Any)

Utility function to retry a function some number of times, with optional exponential backoff.

retry_decorator(→ Callable)

Function decorator to retry a function on exceptions.

run_executable(→ Union[subprocess.Popen, ...)

Execute a single shell command.

is_remote(→ bool)

Check if a string is a remote URL.

is_gzip(→ bool)

Check if a file is Gzipped by checking the magic string.

open_as_text(→ TextIO)

Open a (possibly gzipped) file in text mode.

decompress_gzip(→ str)

Decompress a gzip file to provided file path.

compress_gzip(→ str)

Compress a file into gzip.

concatenate_files(*paths, out_path)

Concatenates an arbitrary number of files into one file.

concatenate_files_as_text(→ str)

Concatenates an arbitrary number of files into one TEXT file.

download_file(→ str)

Download a remote file to the provided path while displaying a progress bar.

stream_file(→ str)

A context manager that creates a FIFO file to use for piping remote files into processes.

all_exists(→ bool)

Check whether all provided paths exist.

mkstemp([dir, delete])

Wrapper for tempfile.mkstemp() that automatically closes the OS-level file descriptor.

write_pickle(→ str)

Pickle a Python object and compress with Gzip.

read_pickle(→ object)

Load a Python pickle that was compressed with Gzip.

flatten_dictionary(→ Generator[Tuple[tuple, object], ...)

Generator that flattens the given dictionary into 2-element tuples containing keys and values.

flatten_iter(→ Generator[object, None, None])

Generator that flattens the given iterable, except for strings.

merge_dictionaries(→ dict)

Merge two dictionaries, applying an arbitrary function f to duplicate keys.

flatten_dict_values(→ list)

Extract all values from a nested dictionary.

set_executable(path)

Set the permissions of a file to be executable.

class ngs_tools.utils.suppress_stdout_stderr

A context manager for doing a “deep suppression” of stdout and stderr in Python, i.e., it suppresses all printed output, even if the print originates in a compiled C/Fortran sub-function.

This will not suppress raised exceptions, since exceptions are printed to stderr just before a script exits, and after the context manager has exited (at least, I think that is why it lets exceptions through). https://github.com/facebook/prophet/issues/223

__enter__()
__exit__(*_)
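
A minimal sketch of how descriptor-level suppression of this kind is commonly done, assuming a POSIX-like system. The class name `suppress_fds` is hypothetical, and this is not necessarily how ngs_tools implements it. Redirecting file descriptors 1 and 2 (rather than `sys.stdout`) is what silences output from compiled code.

```python
import os


class suppress_fds:
    """Hypothetical sketch: silence fd 1 and fd 2 at the OS level."""

    def __enter__(self):
        # Keep duplicates of the real stdout/stderr so they can be restored.
        self._saved = [os.dup(1), os.dup(2)]
        self._null = os.open(os.devnull, os.O_RDWR)
        # Point both descriptors at the null device; compiled code that
        # writes straight to fd 1 or fd 2 is silenced as well.
        os.dup2(self._null, 1)
        os.dup2(self._null, 2)
        return self

    def __exit__(self, *_):
        # Restore the original descriptors and release the temporaries.
        os.dup2(self._saved[0], 1)
        os.dup2(self._saved[1], 2)
        for fd in (*self._saved, self._null):
            os.close(fd)


with suppress_fds():
    os.write(1, b"this line is discarded\n")
```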

ngs_tools.utils.retry(function: Callable, retries: int, args: Optional[tuple] = None, kwargs: Optional[dict] = None, retry_every: Optional[int] = None, backoff: bool = False, exceptions: Optional[Tuple[Exception]] = None) → Any

Utility function to retry a function some number of times, with optional exponential backoff.

Parameters:
  • function – Function to retry

  • retries – Number of times to retry

  • args – Function arguments

  • kwargs – Dictionary of keyword arguments

  • retry_every – Time to wait in seconds between retries. Defaults to no wait time.

  • backoff – Whether to use exponential backoff between retries

  • exceptions – Tuple of exceptions to expect. Defaults to all exceptions.

Returns:

Whatever function returns
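
The retry loop described above can be sketched with the standard library alone. This is a hypothetical re-implementation, not the ngs_tools source: the name `retry_sketch`, the reading of `retries` as the total number of attempts, and the doubling backoff schedule are all assumptions.

```python
import time
from typing import Any, Callable, Optional, Tuple, Type


def retry_sketch(
    function: Callable,
    retries: int,
    args: Optional[tuple] = None,
    kwargs: Optional[dict] = None,
    retry_every: Optional[float] = None,
    backoff: bool = False,
    exceptions: Optional[Tuple[Type[BaseException], ...]] = None,
) -> Any:
    args = args or ()
    kwargs = kwargs or {}
    caught = exceptions or (Exception,)  # default: catch every Exception
    wait = retry_every
    for attempt in range(1, retries + 1):
        try:
            return function(*args, **kwargs)
        except caught:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            if wait:
                time.sleep(wait)
                if backoff:
                    wait *= 2  # assumed schedule: double the wait each time
    raise ValueError("retries must be at least 1")


attempts = []


def flaky() -> str:
    # Fails twice, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("transient failure")
    return "ok"


result = retry_sketch(flaky, retries=5, exceptions=(IOError,))
```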

ngs_tools.utils.retry_decorator(retries: int, retry_every: Optional[int] = None, backoff: bool = False, exceptions: Optional[Tuple[Exception]] = None) → Callable

Function decorator to retry a function on exceptions.

Parameters:
  • retries – Number of times to retry

  • retry_every – Time to wait in seconds between retries. Defaults to no wait time.

  • backoff – Whether to use exponential backoff between retries

  • exceptions – Tuple of exceptions to expect. Defaults to all exceptions.
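
As a rough illustration of the decorator form, the following sketch wraps a function so that it is re-invoked when one of the listed exceptions is raised. The name `retry_decorator_sketch` is hypothetical, and treating `retries` as the total number of attempts is an assumption.

```python
import functools
import time
from typing import Callable, Optional, Tuple, Type


def retry_decorator_sketch(
    retries: int,
    retry_every: Optional[float] = None,
    backoff: bool = False,
    exceptions: Optional[Tuple[Type[BaseException], ...]] = None,
) -> Callable:
    caught = exceptions or (Exception,)

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            wait = retry_every
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except caught:
                    if attempt == retries:
                        raise
                    if wait:
                        time.sleep(wait)
                        wait = wait * 2 if backoff else wait

        return wrapper

    return decorator


calls = {"n": 0}


@retry_decorator_sketch(retries=3, exceptions=(KeyError,))
def lookup() -> int:
    # Fails on the first call only.
    calls["n"] += 1
    if calls["n"] < 2:
        raise KeyError("not ready")
    return 42


value = lookup()
```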

ngs_tools.utils.run_executable(command: List[str], stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, wait: bool = True, stream: bool = True, quiet: bool = False, returncode: int = 0, alias: bool = True) → Union[subprocess.Popen, Tuple[subprocess.Popen, List[str], List[str]]]

Execute a single shell command.

Parameters:
  • command – A list representing a single shell command

  • stdin – Object to pass into the stdin argument for subprocess.Popen. Defaults to None

  • stdout – Object to pass into the stdout argument for subprocess.Popen. Defaults to subprocess.PIPE

  • stderr – Object to pass into the stderr argument for subprocess.Popen. Defaults to subprocess.PIPE

  • wait – Whether to wait until the command has finished. Defaults to True

  • stream – Whether to stream the output to the command line. Defaults to True

  • quiet – Whether to suppress all output to the command line and skip the return code check. Defaults to False

  • returncode – The return code expected if the command runs as intended. Defaults to 0

  • alias – Whether to use the basename of the first element of command. Defaults to True

Returns:

A tuple of (the spawned process, list of strings printed to stdout, list of strings printed to stderr) if wait=True. Otherwise, just the spawned process.

Raises:

subprocess.CalledProcessError – If not quiet and the process exited with a return code other than returncode
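
A simplified sketch of the wait-and-check behavior, assuming the blocking case only (`wait=True`) and ignoring the `stream`, `quiet`, and `alias` options. `run_sketch` is a hypothetical name, and returning the captured output as lists of lines is inferred from the return annotation.

```python
import subprocess
import sys
from typing import List, Tuple


def run_sketch(
    command: List[str], returncode: int = 0
) -> Tuple[subprocess.Popen, List[str], List[str]]:
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    out, err = process.communicate()  # wait for the command to finish
    if process.returncode != returncode:
        raise subprocess.CalledProcessError(process.returncode, command, out, err)
    return process, out.splitlines(), err.splitlines()


# Use the running interpreter as a portable stand-in for a real executable.
process, out_lines, err_lines = run_sketch([sys.executable, "-c", "print('hello')"])
```

Passing the command as a list, as the documented signature does, avoids shell quoting issues because no shell is involved.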

class ngs_tools.utils.ParallelWithProgress(pbar: Optional[tqdm.tqdm] = None, total: Optional[int] = None, desc: Optional[str] = None, disable: bool = False, *args, **kwargs)

Bases: joblib.Parallel

Wrapper around joblib.Parallel that uses tqdm to print execution progress. Taken from https://stackoverflow.com/a/61900501

__call__(*args, **kwargs)
print_progress()

ngs_tools.utils.is_remote(path: str) → bool

Check if a string is a remote URL.

Parameters:

path – string to check

Returns:

True or False
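
One plausible way to make this check, shown only as a sketch: parse the string and look at its URL scheme. The function name and the particular set of schemes treated as remote are assumptions, not taken from the library.

```python
from urllib.parse import urlparse


def is_remote_sketch(path: str) -> bool:
    # Assumption: a path is "remote" when it carries a network URL scheme.
    return urlparse(path).scheme in ("http", "https", "ftp")


remote = is_remote_sketch("https://example.com/reads.fastq.gz")
local = is_remote_sketch("/data/reads.fastq.gz")
```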

ngs_tools.utils.is_gzip(path: str) → bool

Check if a file is Gzipped by checking the magic string.

Parameters:

path – path to file

Returns:

True or False
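
Gzip streams begin with the two bytes 0x1f 0x8b, so the magic-string check can be sketched as below. `is_gzip_sketch` is a hypothetical name for illustration.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_sketch(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "example.txt.gz")
txt_path = os.path.join(tmp_dir, "example.txt")
with gzip.open(gz_path, "wb") as f:
    f.write(b"ACGT\n")
with open(txt_path, "wb") as f:
    f.write(b"ACGT\n")
```

Checking the content rather than the file extension means a gzipped file is still recognized when it lacks a `.gz` suffix.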

ngs_tools.utils.open_as_text(path: str, mode: typing_extensions.Literal['r', 'w']) → TextIO

Open a (possibly gzipped) file in text mode.

Parameters:
  • path – Path to file

  • mode – Mode to open file in. Either r for read or w for write.

Returns:

Opened file pointer that supports read and write functions.
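
A sketch of transparent text-mode opening. How the library decides whether to compress when writing is not stated here, so using the `.gz` suffix for that decision is an assumption, as is the name `open_as_text_sketch`.

```python
import gzip
import os
import tempfile
from typing import TextIO


def open_as_text_sketch(path: str, mode: str) -> TextIO:
    # Assumption: when writing, the ".gz" suffix decides whether to compress;
    # when reading, the two-byte gzip magic number does.
    if mode == "w":
        gzipped = path.endswith(".gz")
    else:
        with open(path, "rb") as f:
            gzipped = f.read(2) == b"\x1f\x8b"
    # gzip.open needs an explicit "t" to return str instead of bytes.
    return gzip.open(path, mode + "t") if gzipped else open(path, mode)


text_path = os.path.join(tempfile.mkdtemp(), "reads.txt.gz")
with open_as_text_sketch(text_path, "w") as f:
    f.write("ACGT\n")
with open_as_text_sketch(text_path, "r") as f:
    content = f.read()
```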

ngs_tools.utils.decompress_gzip(gzip_path: str, out_path: str) → str

Decompress a gzip file to provided file path.

Parameters:
  • gzip_path – Path to gzip file

  • out_path – Path to decompressed file

Returns:

Path to decompressed file

ngs_tools.utils.compress_gzip(file_path: str, out_path: str) → str

Compress a file into gzip.

Parameters:
  • file_path – Path to file

  • out_path – Path to compressed file

Returns:

Path to compressed file
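
Both gzip helpers amount to streaming bytes between a plain file and a gzip file. The following sketch, with hypothetical function names, shows a round trip using only the standard library.

```python
import gzip
import os
import shutil
import tempfile


def compress_sketch(file_path: str, out_path: str) -> str:
    with open(file_path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks, not all at once
    return out_path


def decompress_sketch(gzip_path: str, out_path: str) -> str:
    with gzip.open(gzip_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


gz_dir = tempfile.mkdtemp()
original = os.path.join(gz_dir, "reads.txt")
with open(original, "wb") as f:
    f.write(b"ACGT\n" * 100)

compressed = compress_sketch(original, os.path.join(gz_dir, "reads.txt.gz"))
restored = decompress_sketch(compressed, os.path.join(gz_dir, "restored.txt"))
```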

ngs_tools.utils.concatenate_files(*paths: str, out_path: str)

Concatenates an arbitrary number of files into one file.

Parameters:
  • *paths – An arbitrary number of paths to files

  • out_path – Path to place concatenated file

Returns:

Path to concatenated file

ngs_tools.utils.concatenate_files_as_text(*paths: str, out_path: str) → str

Concatenates an arbitrary number of files into one TEXT file.

Only supports plaintext and gzip files.

Parameters:
  • *paths – An arbitrary number of paths to files

  • out_path – Path to place concatenated file

Returns:

Path to concatenated file
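
To illustrate the text variant, this sketch reads each input as text (decompressing gzip inputs on the fly) and writes everything to a single file. The helper names are hypothetical, and writing the output uncompressed is an assumption.

```python
import gzip
import os
import tempfile


def _open_text(path: str):
    # Detect gzip input by its two-byte magic number.
    with open(path, "rb") as f:
        gzipped = f.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if gzipped else open(path, "r")


def concatenate_as_text_sketch(*paths: str, out_path: str) -> str:
    with open(out_path, "w") as out:
        for path in paths:
            with _open_text(path) as f:
                for line in f:
                    out.write(line)
    return out_path


work_dir = tempfile.mkdtemp()
plain = os.path.join(work_dir, "a.txt")
zipped = os.path.join(work_dir, "b.txt.gz")
with open(plain, "w") as f:
    f.write("line1\n")
with gzip.open(zipped, "wt") as f:
    f.write("line2\n")

merged_path = concatenate_as_text_sketch(
    plain, zipped, out_path=os.path.join(work_dir, "merged.txt")
)
```

Because `out_path` follows `*paths` in the signature, it can only be passed by keyword.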

class ngs_tools.utils.TqdmUpTo

Bases: tqdm.tqdm

Wrapper around tqdm() so that it can be used with urlretrieve(). https://github.com/tqdm/tqdm/blob/master/examples/tqdm_wget.py

update_to(b=1, bsize=1, tsize=None)

ngs_tools.utils.download_file(url: str, path: str) → str

Download a remote file to the provided path while displaying a progress bar.

Parameters:
  • url – Remote url

  • path – Local path to download the file to

Returns:

Path to downloaded file

ngs_tools.utils.stream_file(url: str, path: str) → str

A context manager that creates a FIFO file to use for piping remote files into processes. This function must be used as a context manager (the with keyword) so that any exceptions in the streaming thread may be captured.

This function spawns a new thread to download the remote file into a FIFO file object. FIFO file objects are only supported on Unix systems.

Parameters:
  • url – Url to the file

  • path – Path to place FIFO file

Yields:

Path to FIFO file

Raises:

OSError – If the operating system does not support FIFO
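
The FIFO-plus-thread pattern can be sketched without any network access. In this hypothetical stand-in, `fifo_sketch` feeds a fixed byte payload into the FIFO instead of downloading a URL, and it omits the propagation of errors from the streaming thread that the description mentions. It requires a Unix-like system.

```python
import os
import tempfile
import threading
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def fifo_sketch(payload: bytes, path: str) -> Iterator[str]:
    if not hasattr(os, "mkfifo"):
        raise OSError("FIFO files are not supported on this system")
    os.mkfifo(path)

    def produce() -> None:
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(path, "wb") as fifo:
            fifo.write(payload)

    thread = threading.Thread(target=produce, daemon=True)
    thread.start()
    try:
        yield path
    finally:
        thread.join(timeout=5)
        os.remove(path)  # clean up the FIFO once the consumer is done


fifo_path = os.path.join(tempfile.mkdtemp(), "stream.fifo")
with fifo_sketch(b"ACGT\n", fifo_path) as p:
    # Any consumer that reads from a path could be used here.
    with open(p, "rb") as reader:
        received = reader.read()
```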

ngs_tools.utils.all_exists(*paths: str) → bool

Check whether all provided paths exist.

Parameters:

*paths – paths to files

Returns:

True if all files exist, False otherwise

class ngs_tools.utils.FileWrapper(path: str, mode: typing_extensions.Literal['r', 'w'] = 'r')

Generic wrapper class for file-formats. Used to wrap file-format-specific implementations of reading and writing entries. This class is not designed to be initialized directly. Instead, it should be inherited by children that implement the read and write methods appropriately.

The file is opened as soon as the class is initialized. This class can also be used as a context manager, so that a with block safely closes the file pointer.

path

Path to the file

mode

Open mode. Either r or w.

fp

File pointer

closed

Whether the file has been closed

property is_remote: bool
property is_gzip: bool
property closed: bool
__del__()
__enter__()
__exit__(*args, **kwargs)
__iter__()
_open()

Open the file

close()

Close the (possibly already-closed) file

reset()

Reset this wrapper by first closing the file and re-running initialization, which re-opens the file.

tell() → int

Get the current location of the file pointer

abstract read() → Any

Read a single entry. This method must be overridden by children.

abstract write(entry: Any)

Write a single entry. This method must be overridden by children.
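
The subclassing pattern described for this class can be illustrated with a self-contained stand-in. `WrapperSketch` and `LineFile` are hypothetical names and only mirror the documented shape (immediate open, context-manager support, abstract read and write); they are not the library's code.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any


class WrapperSketch(ABC):
    def __init__(self, path: str, mode: str = "r"):
        self.path = path
        self.mode = mode
        self.fp = open(path, mode)  # opened immediately on initialization

    @property
    def closed(self) -> bool:
        return self.fp.closed

    def close(self) -> None:
        self.fp.close()  # closing an already-closed file is a no-op

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        self.close()

    @abstractmethod
    def read(self) -> Any:
        """Read a single entry."""

    @abstractmethod
    def write(self, entry: Any) -> None:
        """Write a single entry."""


class LineFile(WrapperSketch):
    """Child class where one entry is one line of text."""

    def read(self) -> str:
        return self.fp.readline().rstrip("\n")

    def write(self, entry: str) -> None:
        self.fp.write(entry + "\n")


entries_path = os.path.join(tempfile.mkdtemp(), "entries.txt")
with LineFile(entries_path, "w") as writer:
    writer.write("first")
    writer.write("second")
with LineFile(entries_path, "r") as line_file:
    entries = [line_file.read(), line_file.read()]
```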

ngs_tools.utils.mkstemp(dir: Optional[str] = None, delete: bool = False)

Wrapper for tempfile.mkstemp() that automatically closes the OS-level file descriptor. This function behaves like tempfile.mkdtemp() but for files.

Parameters:
  • dir – Directory in which to create the temporary file. This value is passed as the dir kwarg of tempfile.mkstemp(). Defaults to None.

  • delete – Whether to delete the temporary file before returning. Defaults to False.

Returns:

path to the temporary file
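
tempfile.mkstemp() returns an open OS-level file descriptor together with the path, and the caller is responsible for closing it. A minimal sketch of a wrapper that does so, using the hypothetical name `mkstemp_sketch`:

```python
import os
import tempfile
from typing import Optional


def mkstemp_sketch(dir: Optional[str] = None, delete: bool = False) -> str:
    fd, path = tempfile.mkstemp(dir=dir)
    os.close(fd)  # release the descriptor so it does not leak
    if delete:
        os.remove(path)  # keep only the unique name
    return path


kept = mkstemp_sketch()
removed = mkstemp_sketch(delete=True)
```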

ngs_tools.utils.write_pickle(obj: object, path: str, *args, **kwargs) → str

Pickle a Python object and compress with Gzip.

Any additional arguments and keyword arguments are passed to pickle.dump().

Parameters:
  • obj – Object to pickle

  • path – Path to save pickle

Returns:

Saved pickle path

ngs_tools.utils.read_pickle(path: str) → object

Load a Python pickle that was compressed with Gzip.

Parameters:

path – Path to pickle

Returns:

Unpickled object
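
The pair of pickle helpers can be sketched by combining gzip.open() with pickle.dump() and pickle.load(). The function names below are hypothetical.

```python
import gzip
import os
import pickle
import tempfile


def write_pickle_sketch(obj: object, path: str, *args, **kwargs) -> str:
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, *args, **kwargs)  # extra arguments are forwarded
    return path


def read_pickle_sketch(path: str) -> object:
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


record = {"sample": "S1", "counts": [3, 1, 4]}
pickle_path = write_pickle_sketch(
    record, os.path.join(tempfile.mkdtemp(), "record.pkl.gz")
)
loaded = read_pickle_sketch(pickle_path)
```

As with any pickle, only load files from sources you trust, since unpickling can execute arbitrary code.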

ngs_tools.utils.flatten_dictionary(d: dict, keys: Optional[tuple] = None) → Generator[Tuple[tuple, object], None, None]

Generator that flattens the given dictionary into 2-element tuples containing keys and values. For nested dictionaries, the keys are appended into a tuple.

Parameters:
  • d – Dictionary to flatten

  • keys – Previous keys, defaults to None. Used exclusively for recursion.

Yields:

Flattened dictionary as (keys, value)
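
A small recursive generator shows the shape of the output. This is a sketch under the hypothetical name `flatten_dictionary_sketch`, not the library source.

```python
from typing import Generator, Optional, Tuple


def flatten_dictionary_sketch(
    d: dict, keys: Optional[tuple] = None
) -> Generator[Tuple[tuple, object], None, None]:
    keys = keys or ()
    for key, value in d.items():
        path = keys + (key,)  # keys accumulated from the outermost level
        if isinstance(value, dict):
            yield from flatten_dictionary_sketch(value, path)
        else:
            yield path, value


nested = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
flat_items = list(flatten_dictionary_sketch(nested))
# flat_items == [(("a", "b"), 1), (("a", "c", "d"), 2), (("e",), 3)]
```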

ngs_tools.utils.flatten_iter(it: Iterable) → Generator[object, None, None]

Generator that flattens the given iterable, except for strings.

Parameters:

it – Iterable to flatten

Yields:

Flattened iterable elements
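
Strings need special handling because a string is itself an iterable of strings, so naive recursion would never terminate. The sketch below, under a hypothetical name, also treats bytes as atomic, which is an assumption.

```python
from collections.abc import Iterable
from typing import Generator


def flatten_iter_sketch(it: Iterable) -> Generator[object, None, None]:
    for item in it:
        # Recurse into nested iterables, but keep str/bytes whole.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_iter_sketch(item)
        else:
            yield item


flat_elements = list(flatten_iter_sketch([1, [2, (3, "ab")], "cd"]))
# flat_elements == [1, 2, 3, "ab", "cd"]
```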

ngs_tools.utils.merge_dictionaries(d1: dict, d2: dict, f: Callable[[object, object], object] = add, default: object = 0) → dict

Merge two dictionaries, applying an arbitrary function f to duplicate keys. Dictionaries may be nested.

Parameters:
  • d1 – First dictionary

  • d2 – Second dictionary

  • f – Merge function. It should take two arguments and return one value. Defaults to addition (+)

  • default – Default value or callable to use for keys not present in either dictionary, defaults to 0

Returns:

Merged dictionary
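
A hedged sketch of a nested merge. It assumes that `default` stands in for a key that is missing from one of the two dictionaries, that both inputs share the same nesting shape, and it does not handle a callable `default`. The name `merge_sketch` is hypothetical and the library's behavior may differ.

```python
from operator import add
from typing import Callable


def merge_sketch(d1: dict, d2: dict, f: Callable = add, default: object = 0) -> dict:
    merged = {}
    # Visit every key from either dictionary exactly once.
    for key in list(d1) + [k for k in d2 if k not in d1]:
        v1, v2 = d1.get(key), d2.get(key)
        if isinstance(v1, dict) or isinstance(v2, dict):
            merged[key] = merge_sketch(
                v1 if isinstance(v1, dict) else {},
                v2 if isinstance(v2, dict) else {},
                f,
                default,
            )
        else:
            # Assumption: a key missing on one side is replaced by `default`.
            merged[key] = f(d1.get(key, default), d2.get(key, default))
    return merged


counts1 = {"geneA": {"exon": 2, "intron": 1}, "geneB": 4}
counts2 = {"geneA": {"exon": 3}, "geneC": 5}
merged_counts = merge_sketch(counts1, counts2)
# merged_counts == {"geneA": {"exon": 5, "intron": 1}, "geneB": 4, "geneC": 5}
```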

ngs_tools.utils.flatten_dict_values(d: dict) → list

Extract all values from a nested dictionary.

Parameters:

d – Nested dictionary from which to extract values

Returns:

All values from the dictionary as a list
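
For illustration, value extraction from a nested dictionary can be written as a short recursion. `flatten_dict_values_sketch` is a hypothetical name, and leaving non-dictionary containers (such as lists) untouched is an assumption.

```python
def flatten_dict_values_sketch(d: dict) -> list:
    values = []
    for value in d.values():
        if isinstance(value, dict):
            values.extend(flatten_dict_values_sketch(value))
        else:
            values.append(value)
    return values


all_values = flatten_dict_values_sketch({"a": {"b": 1, "c": {"d": 2}}, "e": 3})
# all_values == [1, 2, 3]
```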

ngs_tools.utils.set_executable(path: str)

Set the permissions of a file to be executable.
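
A minimal sketch of setting the executable bits on top of a file's existing permissions, assuming a POSIX-style permission model. The name `set_executable_sketch` is hypothetical, and granting execute permission to user, group, and others is an assumption about which bits are set.

```python
import os
import stat
import tempfile


def set_executable_sketch(path: str) -> None:
    mode = os.stat(path).st_mode
    # Add the execute bits without disturbing the read/write bits.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


fd, script_path = tempfile.mkstemp()
os.close(fd)
set_executable_sketch(script_path)
```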