ngs_tools.utils
Module Contents
Classes
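suppress_stdout_stderr – A context manager for doing a "deep suppression" of stdout and stderr in Python.
ParallelWithProgress – Wrapper around joblib.Parallel that uses tqdm to print execution progress.
TqdmUpTo – Wrapper around tqdm() so that it can be used with urlretrieve().
FileWrapper – Generic wrapper class for file-formats. Used to wrap file-format-specific implementations of reading and writing entries.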
Functions
retry – Utility function to retry a function some number of times, with optional exponential backoff.
retry_decorator – Function decorator to retry a function on exceptions.
run_executable – Execute a single shell command.
is_remote – Check if a string is a remote URL.
is_gzip – Check if a file is Gzipped by checking the magic string.
open_as_text – Open a (possibly gzipped) file in text mode.
decompress_gzip – Decompress a gzip file to provided file path.
compress_gzip – Compress a file into gzip.
concatenate_files – Concatenates an arbitrary number of files into one file.
concatenate_files_as_text – Concatenates an arbitrary number of files into one TEXT file.
download_file – Download a remote file to the provided path while displaying a progress bar.
stream_file – A context manager that creates a FIFO file to use for piping remote files into processes.
all_exists – Check whether all provided paths exist.
mkstemp – Wrapper for tempfile.mkstemp() that automatically closes the OS-level file descriptor.
write_pickle – Pickle a Python object and compress with Gzip.
read_pickle – Load a Python pickle that was compressed with Gzip.
flatten_dictionary – Generator that flattens the given dictionary into 2-element tuples containing keys and values.
flatten_iter – Generator that flattens the given iterable, except for strings.
merge_dictionaries – Merge two dictionaries, applying an arbitrary function f to duplicate keys.
flatten_dict_values – Extract all values from a nested dictionary.
set_executable – Set the permissions of a file to be executable.
- class ngs_tools.utils.suppress_stdout_stderr
A context manager for doing a “deep suppression” of stdout and stderr in Python, i.e. it suppresses all printed output, even if the print originates in a compiled C/Fortran sub-function.
This will not suppress raised exceptions, since exceptions are printed to stderr just before a script exits, and after the context manager has exited (at least, I think that is why it lets exceptions through). See https://github.com/facebook/prophet/issues/223
- __enter__()
- __exit__(*_)
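The "deep suppression" described above is usually achieved by redirecting the OS-level file descriptors rather than `sys.stdout`. The following is a minimal sketch of that technique, assuming a POSIX-like system; `SuppressOutputSketch` is a hypothetical name and this is not the ngs_tools source.

```python
import os


class SuppressOutputSketch:
    """Hypothetical illustration of descriptor-level output suppression."""

    def __enter__(self):
        self._null_fd = os.open(os.devnull, os.O_RDWR)
        # Keep copies of the real stdout (1) and stderr (2) descriptors.
        self._saved = (os.dup(1), os.dup(2))
        # Point both descriptors at the null device.
        os.dup2(self._null_fd, 1)
        os.dup2(self._null_fd, 2)
        return self

    def __exit__(self, *_):
        # Restore the original descriptors, then release the temporary ones.
        os.dup2(self._saved[0], 1)
        os.dup2(self._saved[1], 2)
        for fd in (self._null_fd, *self._saved):
            os.close(fd)


with SuppressOutputSketch():
    # A raw write to descriptor 1 is swallowed, as output from C code would be.
    os.write(1, b"this never reaches the terminal\n")
    redirected = os.path.samestat(os.fstat(1), os.stat(os.devnull))
```

Because the redirection happens below Python's own buffering, text already buffered in `sys.stdout` may still appear after the block exits unless it is flushed first.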
- ngs_tools.utils.retry(function: Callable, retries: int, args: Optional[tuple] = None, kwargs: Optional[dict] = None, retry_every: Optional[int] = None, backoff: bool = False, exceptions: Optional[Tuple[Exception]] = None) -> Any
Utility function to retry a function some number of times, with optional exponential backoff.
- Parameters:
function – Function to retry
retries – Number of times to retry
args – Function arguments
kwargs – Dictionary of keyword arguments
retry_every – Time to wait in seconds between retries. Defaults to no wait time.
backoff – Whether to use exponential backoff between retries
exceptions – Tuple of exceptions to expect. Defaults to all exceptions.
- Returns:
Whatever function returns
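The retry-with-backoff behavior can be sketched as follows. This is a hypothetical re-implementation, not the ngs_tools source: it assumes `retries` is the total number of attempts and that backoff doubles the wait each time, neither of which the documentation confirms.

```python
import time
from typing import Any, Callable, Optional, Tuple, Type


def retry_sketch(
    function: Callable,
    retries: int,
    args: Optional[tuple] = None,
    kwargs: Optional[dict] = None,
    retry_every: Optional[float] = None,
    backoff: bool = False,
    exceptions: Optional[Tuple[Type[BaseException], ...]] = None,
) -> Any:
    """Hypothetical illustration of retrying with optional backoff."""
    args = args or ()
    kwargs = kwargs or {}
    exceptions = exceptions or (Exception,)
    wait = retry_every
    for attempt in range(retries):
        try:
            return function(*args, **kwargs)
        except exceptions:
            # Out of attempts: let the last exception propagate.
            if attempt == retries - 1:
                raise
            if wait:
                time.sleep(wait)
                if backoff:
                    wait *= 2  # assumed doubling; the real factor may differ


calls = []


def flaky() -> str:
    """Fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("not yet")
    return "ok"


result = retry_sketch(flaky, retries=5, exceptions=(ValueError,))
```

Restricting `exceptions` to the errors known to be transient keeps genuine bugs from being silently retried.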
- ngs_tools.utils.retry_decorator(retries: int, retry_every: Optional[int] = None, backoff: bool = False, exceptions: Optional[Tuple[Exception]] = None) -> Callable
Function decorator to retry a function on exceptions.
- Parameters:
retries – Number of times to retry
retry_every – Time to wait in seconds between retries. Defaults to no wait time.
backoff – Whether to use exponential backoff between retries
exceptions – Tuple of exceptions to expect. Defaults to all exceptions.
- ngs_tools.utils.run_executable(command: List[str], stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, wait: bool = True, stream: bool = True, quiet: bool = False, returncode: int = 0, alias: bool = True) -> Union[subprocess.Popen, Tuple[subprocess.Popen, List[str], List[str]]]
Execute a single shell command.
- Parameters:
command – A list representing a single shell command
stdin – Object to pass into the stdin argument for subprocess.Popen. Defaults to None
stdout – Object to pass into the stdout argument for subprocess.Popen. Defaults to subprocess.PIPE
stderr – Object to pass into the stderr argument for subprocess.Popen. Defaults to subprocess.PIPE
wait – Whether to wait until the command has finished. Defaults to True
stream – Whether to stream the output to the command line. Defaults to True
quiet – Whether to not display anything to the command line and not check the return code. Defaults to False
returncode – The return code expected if the command runs as intended. Defaults to 0
alias – Whether to use the basename of the first element of command. Defaults to True
- Returns:
A tuple of (the spawned process, string printed to stdout, string printed to stderr) if wait=True. Otherwise, just the spawned process.
- Raises:
subprocess.CalledProcessError – If not quiet and the process exited with an exit code other than returncode
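The core pattern (spawn a process, capture its output, compare the exit code against the expected one) can be sketched with the standard library alone. `run_sketch` is a hypothetical helper that omits streaming, `quiet`, and `alias`; it is not the ngs_tools implementation.

```python
import subprocess
import sys
from typing import List, Tuple


def run_sketch(command: List[str], returncode: int = 0) -> Tuple[subprocess.Popen, str, str]:
    """Hypothetical illustration: run a command and validate its exit code."""
    p = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() waits for the process and drains both pipes.
    out, err = p.communicate()
    if p.returncode != returncode:
        raise subprocess.CalledProcessError(p.returncode, command, output=out, stderr=err)
    return p, out, err


# Use the current interpreter so the example does not depend on external tools.
proc, out, err = run_sketch([sys.executable, "-c", "print('hello')"])
```

Passing the command as a list avoids shell interpolation of its arguments.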
- class ngs_tools.utils.ParallelWithProgress(pbar: Optional[tqdm.tqdm] = None, total: Optional[int] = None, desc: Optional[str] = None, disable: bool = False, *args, **kwargs)
Bases: joblib.Parallel
Wrapper around joblib.Parallel that uses tqdm to print execution progress. Taken from https://stackoverflow.com/a/61900501
- __call__(*args, **kwargs)
- print_progress()
- ngs_tools.utils.is_remote(path: str) -> bool
Check if a string is a remote URL.
- Parameters:
path – string to check
- Returns:
True or False
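One plausible way to make this check is with `urllib.parse.urlparse`. The sketch below assumes that "remote" means the string has both a scheme and a network location; the actual criterion used by ngs_tools may differ, and `is_remote_sketch` is a hypothetical name.

```python
from urllib.parse import urlparse


def is_remote_sketch(path: str) -> bool:
    """Hypothetical illustration: True if the string looks like a URL."""
    parsed = urlparse(path)
    # Requiring a netloc keeps Windows drive letters ("C:\\...") from
    # being mistaken for a URL scheme.
    return bool(parsed.scheme and parsed.netloc)


remote = is_remote_sketch("https://example.com/reads.fastq.gz")
local = is_remote_sketch("/data/reads.fastq.gz")
```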
- ngs_tools.utils.is_gzip(path: str) -> bool
Check if a file is Gzipped by checking the magic string.
- Parameters:
path – path to file
- Returns:
True or False
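Every gzip stream begins with the two bytes `0x1f 0x8b`, so the magic-string check amounts to reading the first two bytes of the file. A minimal sketch, with `is_gzip_sketch` as a hypothetical name:

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_sketch(path: str) -> bool:
    """Hypothetical illustration: compare the file header to the gzip magic."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Build one plain and one gzipped file to exercise the check.
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "reads.txt")
gz_path = os.path.join(tmp_dir, "reads.txt.gz")
with open(plain_path, "w") as f:
    f.write("ACGT\n")
with gzip.open(gz_path, "wt") as f:
    f.write("ACGT\n")

plain_result = is_gzip_sketch(plain_path)
gz_result = is_gzip_sketch(gz_path)
```

Checking the header rather than the file extension means a mislabeled file is still detected correctly.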
- ngs_tools.utils.open_as_text(path: str, mode: typing_extensions.Literal['r', 'w']) -> TextIO
Open a (possibly gzipped) file in text mode.
- Parameters:
path – Path to file
mode – Mode to open file in. Either r for read or w for write.
- Returns:
Opened file pointer that supports read and write functions.
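Opening a possibly gzipped file in text mode can be sketched with `gzip.open` and the `t` mode suffix. The sketch assumes that a `.gz` extension decides whether gzip is used; how ngs_tools actually makes that decision is not stated here, and `open_as_text_sketch` is a hypothetical name.

```python
import gzip
import os
import tempfile
from typing import TextIO


def open_as_text_sketch(path: str, mode: str) -> TextIO:
    """Hypothetical illustration: text-mode open for plain or gzip files."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if path.endswith(".gz"):  # assumption: the extension marks gzip files
        return gzip.open(path, mode + "t")
    return open(path, mode)


gz_path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
with open_as_text_sketch(gz_path, "w") as f:
    f.write("line one\n")
with open_as_text_sketch(gz_path, "r") as f:
    contents = f.read()
```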
- ngs_tools.utils.decompress_gzip(gzip_path: str, out_path: str) -> str
Decompress a gzip file to provided file path.
- Parameters:
gzip_path – Path to gzip file
out_path – Path to decompressed file
- Returns:
Path to decompressed file
- ngs_tools.utils.compress_gzip(file_path: str, out_path: str) -> str
Compress a file into gzip.
- Parameters:
file_path – Path to file
out_path – Path to compressed file
- Returns:
Path to compressed file
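Both gzip helpers reduce to streaming bytes between a plain file and a `gzip.open` handle. A minimal round-trip sketch, using hypothetical function names rather than the ngs_tools source:

```python
import gzip
import os
import shutil
import tempfile


def compress_gzip_sketch(file_path: str, out_path: str) -> str:
    """Hypothetical illustration: stream a file into a gzip archive."""
    with open(file_path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def decompress_gzip_sketch(gzip_path: str, out_path: str) -> str:
    """Hypothetical illustration: stream a gzip archive back to a plain file."""
    with gzip.open(gzip_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


tmp_dir = tempfile.mkdtemp()
original = os.path.join(tmp_dir, "data.txt")
with open(original, "wb") as f:
    f.write(b"ACGT\n" * 100)

compressed = compress_gzip_sketch(original, original + ".gz")
restored = decompress_gzip_sketch(compressed, os.path.join(tmp_dir, "restored.txt"))
with open(restored, "rb") as f:
    restored_bytes = f.read()
```

`shutil.copyfileobj` copies in chunks, so neither direction loads the whole file into memory.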
- ngs_tools.utils.concatenate_files(*paths: str, out_path: str)
Concatenates an arbitrary number of files into one file.
- Parameters:
*paths – An arbitrary number of paths to files
out_path – Path to place concatenated file
- Returns:
Path to concatenated file
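Byte-level concatenation can be sketched as below. `concatenate_files_sketch` is a hypothetical name, and the keyword-only `out_path` mirrors the signature shown above.

```python
import os
import shutil
import tempfile


def concatenate_files_sketch(*paths: str, out_path: str) -> str:
    """Hypothetical illustration: append each input file to out_path in order."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)
    return out_path


tmp_dir = tempfile.mkdtemp()
parts = []
for name, text in (("a.txt", b"first\n"), ("b.txt", b"second\n")):
    part = os.path.join(tmp_dir, name)
    with open(part, "wb") as f:
        f.write(text)
    parts.append(part)

combined = concatenate_files_sketch(*parts, out_path=os.path.join(tmp_dir, "all.txt"))
with open(combined, "rb") as f:
    combined_bytes = f.read()
```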
- ngs_tools.utils.concatenate_files_as_text(*paths: str, out_path: str) -> str
Concatenates an arbitrary number of files into one TEXT file.
Only supports plaintext and gzip files.
- Parameters:
*paths – An arbitrary number of paths to files
out_path – Path to place concatenated file
- Returns:
Path to concatenated file
- class ngs_tools.utils.TqdmUpTo
Bases: tqdm.tqdm
Wrapper around tqdm() so that it can be used with urlretrieve(). https://github.com/tqdm/tqdm/blob/master/examples/tqdm_wget.py
- update_to(b=1, bsize=1, tsize=None)
- ngs_tools.utils.download_file(url: str, path: str) -> str
Download a remote file to the provided path while displaying a progress bar.
- Parameters:
url – Remote url
path – Local path to download the file to
- Returns:
Path to downloaded file
- ngs_tools.utils.stream_file(url: str, path: str) -> str
A context manager that creates a FIFO file to use for piping remote files into processes. This function must be used as a context manager (the with keyword) so that any exceptions in the streaming thread may be captured.
This function spawns a new thread to download the remote file into a FIFO file object. FIFO file objects are only supported on Unix systems.
- Parameters:
url – Url to the file
path – Path to place FIFO file
- Yields:
Path to FIFO file
- Raises:
OSError – If the operating system does not support FIFO
- ngs_tools.utils.all_exists(*paths: str) -> bool
Check whether all provided paths exist.
- Parameters:
*paths – paths to files
- Returns:
True if all files exist, False otherwise
- class ngs_tools.utils.FileWrapper(path: str, mode: typing_extensions.Literal['r', 'w'] = 'r')
Generic wrapper class for file-formats. Used to wrap file-format-specific implementations of reading and writing entries. This class is not designed to be initialized directly. Instead, it should be inherited by child classes that implement the read and write methods appropriately.
The file is opened as soon as the class is initialized. This class can also be used as a context manager to safely close the file pointer with a with block.
- path
Path to the file
- mode
Open mode. Either r or w.
- fp
File pointer
- closed
Whether the file has been closed
- property is_remote: bool
- property is_gzip: bool
- property closed: bool
- __del__()
- __enter__()
- __exit__(*args, **kwargs)
- __iter__()
- _open()
Open the file
- close()
Close the (possibly already-closed) file
- reset()
Reset this wrapper by first closing the file and re-running initialization, which re-opens the file.
- tell() -> int
Get the current location of the file pointer
- abstract read() -> Any
Read a single entry. This method must be overridden by children.
- abstract write(entry: Any)
Write a single entry. This method must be overridden by children.
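The inheritance pattern described for this class can be sketched with a self-contained stand-in. `FileWrapperSketch` and `LineFile` are hypothetical names, and the base class below reproduces only a small part of the documented behavior; it is not the ngs_tools class.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any


class FileWrapperSketch(ABC):
    """Hypothetical stand-in for the base class; not the ngs_tools source."""

    def __init__(self, path: str, mode: str = "r"):
        self.path = path
        self.mode = mode
        self.fp = open(path, mode)  # opened as soon as the object is created

    @property
    def closed(self) -> bool:
        return self.fp.closed

    def close(self) -> None:
        # Safe to call on an already-closed file.
        if not self.closed:
            self.fp.close()

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        self.close()

    @abstractmethod
    def read(self) -> Any: ...

    @abstractmethod
    def write(self, entry: Any) -> None: ...


class LineFile(FileWrapperSketch):
    """Hypothetical child format: one entry per line."""

    def read(self) -> str:
        return self.fp.readline().rstrip("\n")

    def write(self, entry: str) -> None:
        self.fp.write(f"{entry}\n")


path = os.path.join(tempfile.mkdtemp(), "entries.txt")
with LineFile(path, "w") as f:
    f.write("first")
    f.write("second")
with LineFile(path) as f:
    entries = [f.read(), f.read()]
```

Declaring `read` and `write` as abstract means that instantiating the base class directly raises `TypeError`, which enforces the "not designed to be initialized directly" rule.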
- ngs_tools.utils.mkstemp(dir: Optional[str] = None, delete: bool = False)
Wrapper for tempfile.mkstemp() that automatically closes the OS-level file descriptor. This function behaves like tempfile.mkdtemp() but for files.
- Parameters:
dir – Directory to create the temporary file. This value is passed as the dir kwarg of tempfile.mkstemp(). Defaults to None.
delete – Whether to delete the temporary file before returning. Defaults to False.
- Returns:
path to the temporary file
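`tempfile.mkstemp()` returns an open OS-level descriptor alongside the path, and forgetting to close it leaks a descriptor. The wrapper's behavior can be sketched as follows, with `mkstemp_sketch` as a hypothetical name:

```python
import os
import tempfile
from typing import Optional


def mkstemp_sketch(dir: Optional[str] = None, delete: bool = False) -> str:
    """Hypothetical illustration: create a temp file and return only its path."""
    fd, path = tempfile.mkstemp(dir=dir)
    os.close(fd)  # release the descriptor so callers only deal with the path
    if delete:
        # Leaves just a unique, currently unused path.
        os.remove(path)
    return path


kept = mkstemp_sketch()
removed = mkstemp_sketch(delete=True)
kept_exists = os.path.exists(kept)
removed_exists = os.path.exists(removed)
```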
- ngs_tools.utils.write_pickle(obj: object, path: str, *args, **kwargs) -> str
Pickle a Python object and compress with Gzip.
Any additional arguments and keyword arguments are passed to pickle.dump().
- Parameters:
obj – Object to pickle
path – Path to save pickle
- Returns:
Saved pickle path
- ngs_tools.utils.read_pickle(path: str) -> object
Load a Python pickle that was compressed with Gzip.
- Parameters:
path – Path to pickle
- Returns:
Unpickled object
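Writing and reading a gzip-compressed pickle can be sketched by handing a `gzip.open` file object to `pickle`. The function names below are hypothetical, and this is an illustration rather than the ngs_tools source.

```python
import gzip
import os
import pickle
import tempfile


def write_pickle_sketch(obj: object, path: str, *args, **kwargs) -> str:
    """Hypothetical illustration: pickle an object into a gzip stream."""
    with gzip.open(path, "wb") as f:
        # Extra arguments (e.g. protocol=...) are forwarded to pickle.dump().
        pickle.dump(obj, f, *args, **kwargs)
    return path


def read_pickle_sketch(path: str) -> object:
    """Hypothetical illustration: load a gzip-compressed pickle."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


data = {"sample": "A1", "counts": [3, 1, 4]}
pkl_path = write_pickle_sketch(data, os.path.join(tempfile.mkdtemp(), "data.pkl.gz"), protocol=4)
loaded = read_pickle_sketch(pkl_path)
```

As with any pickle, only load files from sources you trust.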
- ngs_tools.utils.flatten_dictionary(d: dict, keys: Optional[tuple] = None) -> Generator[Tuple[tuple, object], None, None]
Generator that flattens the given dictionary into 2-element tuples containing keys and values. For nested dictionaries, the keys are appended into a tuple.
- Parameters:
d – Dictionary to flatten
keys – Previous keys, defaults to None. Used exclusively for recursion.
- Yields:
Flattened dictionary as (keys, value)
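The recursive flattening described above can be sketched as a generator that carries the path of keys seen so far. `flatten_dictionary_sketch` is a hypothetical name.

```python
from typing import Generator, Optional, Tuple


def flatten_dictionary_sketch(
    d: dict, keys: Optional[tuple] = None
) -> Generator[Tuple[tuple, object], None, None]:
    """Hypothetical illustration: yield (key_path, value) for every leaf."""
    keys = keys or ()
    for key, value in d.items():
        path = keys + (key,)
        if isinstance(value, dict):
            # Recurse, passing along the keys accumulated so far.
            yield from flatten_dictionary_sketch(value, path)
        else:
            yield path, value


nested = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
flat = list(flatten_dictionary_sketch(nested))
# flat == [(("a", "b"), 1), (("a", "c", "d"), 2), (("e",), 3)]
```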
- ngs_tools.utils.flatten_iter(it: Iterable) -> Generator[object, None, None]
Generator that flattens the given iterable, except for strings.
- Parameters:
it – Iterable to flatten
- Yields:
Flattened iterable elements
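Strings need special handling because they are themselves iterable, and each character is again a string, so naive recursion would never terminate. A minimal sketch, with `flatten_iter_sketch` as a hypothetical name and the treatment of `bytes` as an added assumption:

```python
from collections.abc import Iterable
from typing import Generator


def flatten_iter_sketch(it: Iterable) -> Generator[object, None, None]:
    """Hypothetical illustration: flatten nested iterables, keeping strings whole."""
    for item in it:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_iter_sketch(item)
        else:
            yield item


flattened = list(flatten_iter_sketch([1, [2, (3, "ab")], "cd"]))
# flattened == [1, 2, 3, "ab", "cd"]
```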
- ngs_tools.utils.merge_dictionaries(d1: dict, d2: dict, f: Callable[[object, object], object] = add, default: object = 0) -> dict
Merge two dictionaries, applying an arbitrary function f to duplicate keys. Dictionaries may be nested.
- Parameters:
d1 – First dictionary
d2 – Second dictionary
f – Merge function. This function should take two arguments and return one, defaults to +
default – Default value or callable to use for keys not present in either dictionary, defaults to 0
- Returns:
Merged dictionary
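A nested merge of this kind can be sketched recursively. The sketch makes two assumptions that the documentation does not confirm: when a key is missing from one dictionary, `default` stands in for the missing side, and a callable `default` is not supported. `merge_sketch` is a hypothetical name.

```python
from operator import add
from typing import Callable


def merge_sketch(d1: dict, d2: dict, f: Callable = add, default: object = 0) -> dict:
    """Hypothetical illustration: merge nested dictionaries with a combine function."""
    merged = {}
    # Keep d1's key order, then append keys that appear only in d2.
    for key in list(d1) + [k for k in d2 if k not in d1]:
        v1 = d1.get(key, default)
        v2 = d2.get(key, default)
        if isinstance(v1, dict) or isinstance(v2, dict):
            # Recurse into nested dictionaries; a missing side becomes {}.
            merged[key] = merge_sketch(
                v1 if isinstance(v1, dict) else {},
                v2 if isinstance(v2, dict) else {},
                f,
                default,
            )
        else:
            merged[key] = f(v1, v2)
    return merged


totals = merge_sketch({"a": 1, "n": {"x": 2}}, {"a": 10, "n": {"x": 3, "y": 4}})
# totals == {"a": 11, "n": {"x": 5, "y": 4}}
```

With the default `f` (addition) and `default=0`, a key present on only one side keeps its value unchanged.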
- ngs_tools.utils.flatten_dict_values(d: dict) -> list
Extract all values from a nested dictionary.
- Parameters:
d – Nested dictionary from which to extract values
- Returns:
All values from the dictionary as a list
- ngs_tools.utils.set_executable(path: str)
Set the permissions of a file to be executable.
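Marking a file as executable can be sketched with `os.chmod` by adding the execute bits to the file's current mode. Which bits ngs_tools sets (owner only, or owner, group, and others) is an assumption here, and `set_executable_sketch` is a hypothetical name.

```python
import os
import stat
import tempfile


def set_executable_sketch(path: str) -> None:
    """Hypothetical illustration: add execute permission to an existing file."""
    mode = os.stat(path).st_mode
    # OR-ing keeps the existing read/write bits intact.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


fd, script_path = tempfile.mkstemp(suffix=".sh")
os.close(fd)
set_executable_sketch(script_path)
is_executable = bool(os.stat(script_path).st_mode & stat.S_IXUSR)
```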