ngs_tools.chemistry.Chemistry

Module Contents

Classes

SubSequenceDefinition

Definition of a subsequence. This class is used to parse a subsequence out from a list of sequences.

SubSequenceParser

Class that uses a collection of SubSequenceDefinition instances to parse an entire subsequence from a list of strings.

Chemistry

Base class to represent any kind of chemistry.

SequencingStrand

Generic enumeration.

SequencingChemistry

Base class to represent a sequencing chemistry.

Attributes

WHITELISTS_DIR

ngs_tools.chemistry.Chemistry.WHITELISTS_DIR

exception ngs_tools.chemistry.Chemistry.SubSequenceDefinitionError

Bases: Exception

Common base class for all non-exit exceptions.

class ngs_tools.chemistry.Chemistry.SubSequenceDefinition(index: int, start: Optional[int] = None, length: Optional[int] = None)

Definition of a subsequence. This class is used to parse a subsequence out from a list of sequences.

TODO: anchoring

_index

Sequence index to use (from a list of sequences); for internal use only. Use index instead.

_start

Starting position of the subsequence; for internal use only. Use start instead.

_length

Length of the subsequence; for internal use only. Use length instead.

property index: int

Sequence index

property start: Optional[int]

Substring starting position

property end: Optional[int]

Substring end position. None if start or length is None.

property length: Optional[int]

Substring length. None if not provided on initialization.

is_overlapping(other: SubSequenceDefinition) → bool

Whether this subsequence overlaps with another subsequence.

Parameters:

other – The other SubSequenceDefinition instance to compare to

Returns:

True if they are overlapping. False otherwise.
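The overlap test can be illustrated with a small interval check. This is a sketch only: `definitions_overlap` and the `(index, start, length)` tuple form are hypothetical and not part of ngs_tools, and the handling of a missing start or length (treated as "from position 0" and "open-ended") is an assumption rather than documented behavior.

```python
from typing import Optional, Tuple

# Hypothetical tuple form (index, start, length), used only for this sketch.
Definition = Tuple[int, Optional[int], Optional[int]]


def definitions_overlap(a: Definition, b: Definition) -> bool:
    """Return True if two subsequence definitions cover a common position."""
    index_a, start_a, length_a = a
    index_b, start_b, length_b = b
    if index_a != index_b:
        # Definitions on different sequences cannot overlap.
        return False
    # Assumption: a missing start means position 0, and a missing length
    # means the definition extends to the end of the sequence.
    start_a = start_a or 0
    start_b = start_b or 0
    end_a = float("inf") if length_a is None else start_a + length_a
    end_b = float("inf") if length_b is None else start_b + length_b
    # Standard half-open interval intersection test.
    return start_a < end_b and start_b < end_a


adjacent = definitions_overlap((0, 0, 16), (0, 16, 10))
intersecting = definitions_overlap((0, 0, 16), (0, 10, 10))
different_index = definitions_overlap((0, 0, 16), (1, 0, 16))
whole_sequence = definitions_overlap((1, None, None), (1, 5, 3))
```

Adjacent ranges such as a 16-base barcode followed directly by a 10-base UMI do not count as overlapping under this half-open convention.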

parse(s: List[str]) → str

Parse the given list of strings according to the arguments used to initialize this instance. If start and length were not provided, the result is simply the entire string at index index. Otherwise, the substring of length length starting at position start is extracted from the string at index index.

Parameters:

s – List of strings to parse

Returns:

The parsed string
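The slicing rule described above can be sketched in plain Python. The helper `parse_subsequence` is a hypothetical stand-in written for illustration and is not part of ngs_tools; it only mirrors the documented roles of index, start, and length. The case of a start without a length is not described in the text, so its handling here is an assumption.

```python
from typing import List, Optional


def parse_subsequence(
    s: List[str],
    index: int,
    start: Optional[int] = None,
    length: Optional[int] = None,
) -> str:
    """Hypothetical stand-in mirroring the documented parse() behavior."""
    sequence = s[index]
    if start is None:
        # No start given: return the entire string at `index`.
        return sequence
    # Assumption: a start without a length runs to the end of the string.
    end = None if length is None else start + length
    return sequence[start:end]


reads = ["AAACCCGGGTTT", "ACGTACGT"]
whole = parse_subsequence(reads, index=1)
barcode = parse_subsequence(reads, index=0, start=3, length=6)
```

Here `whole` is the full second string and `barcode` is the six characters of the first string beginning at position 3.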

__eq__(other: SubSequenceDefinition)

Return self==value.

__repr__()

Return repr(self).

__str__()

Return str(self).

exception ngs_tools.chemistry.Chemistry.SubSequenceParserError

Bases: Exception

Common base class for all non-exit exceptions.

class ngs_tools.chemistry.Chemistry.SubSequenceParser(*definitions: SubSequenceDefinition)

Class that uses a collection of SubSequenceDefinition instances to parse an entire subsequence from a list of strings.

_definitions

List of SubSequenceDefinition instances; for internal use only.

property definitions

is_overlapping(other: SubSequenceParser) → bool

Whether this parser overlaps with another parser. Checks all pairwise combinations and returns True if any two SubSequenceDefinition instances overlap.

Parameters:

other – The other SubSequenceParser instance to compare to

Returns:

True if they are overlapping. False otherwise.

parse(sequences: List[str], concatenate: bool = False) → Union[str, Tuple[str]]

Iteratively constructs a full subsequence by applying each SubSequenceDefinition in _definitions on the list of provided sequences. If concatenate=False, then this function returns a tuple of length equal to the number of definitions. Each element of the tuple is a string that was parsed by each definition. Otherwise, all the parsed strings are concatenated into a single string.

Parameters:
  • sequences – List of sequences to parse

  • concatenate – Whether or not to concatenate the parsed strings. Defaults to False.

Returns:

Concatenated parsed sequence (if concatenate=True). Otherwise, a tuple of parsed strings.
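The effect of the concatenate flag can be seen in a minimal sketch. The function `parse_with_definitions` and the `(index, start, length)` tuples are hypothetical stand-ins for a parser and its definitions; they are not the ngs_tools implementation.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical tuple form (index, start, length), used only for this sketch.
Definition = Tuple[int, Optional[int], Optional[int]]


def parse_with_definitions(
    sequences: List[str],
    definitions: List[Definition],
    concatenate: bool = False,
) -> Union[str, Tuple[str, ...]]:
    """Apply each definition in order and collect the extracted strings."""
    parts = []
    for index, start, length in definitions:
        sequence = sequences[index]
        if start is None:
            parts.append(sequence)
        else:
            end = None if length is None else start + length
            parts.append(sequence[start:end])
    # One string if concatenate=True, otherwise one element per definition.
    return "".join(parts) if concatenate else tuple(parts)


sequences = ["AAAACCCCGG", "TTTTGGGG"]
definitions = [(0, 0, 4), (1, 4, 4)]
as_tuple = parse_with_definitions(sequences, definitions)
as_string = parse_with_definitions(sequences, definitions, concatenate=True)
```

The tuple form keeps one entry per definition, which is useful when the pieces (for example, split barcodes) need to be inspected separately.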

parse_reads(reads: List[ngs_tools.fastq.Read.Read], concatenate: bool = False) → Tuple[Union[str, Tuple[str]], Union[str, Tuple[str]]]

Behaves identically to parse(), but operates on a list of ngs_tools.fastq.Read instances. parse() is called on the read sequences and qualities separately.

Parameters:
  • reads – List of reads to parse

  • concatenate – Whether or not to concatenate the parsed strings. Defaults to False.

Returns:

  • Parsed sequence from read sequences

  • Parsed sequence from quality sequences
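A rough sketch of parsing sequences and qualities separately follows. The `Read` tuple here is a hypothetical minimal record, not the ngs_tools.fastq.Read class, and its field names are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple, Union


class Read(NamedTuple):
    # Hypothetical minimal read record; field names are assumptions.
    name: str
    sequence: str
    qualities: str


def parse_strings(strings, definitions, concatenate=False):
    """Extract (index, start, length) slices from a list of strings."""
    parts = tuple(
        strings[index][start:start + length]
        for index, start, length in definitions
    )
    return "".join(parts) if concatenate else parts


def parse_reads(reads: List[Read], definitions, concatenate=False):
    """Run the same extraction on sequences and on qualities."""
    sequences = [read.sequence for read in reads]
    qualities = [read.qualities for read in reads]
    return (
        parse_strings(sequences, definitions, concatenate),
        parse_strings(qualities, definitions, concatenate),
    )


reads = [Read("r1", "ACGTAC", "IIIIFF"), Read("r1", "TTGGCC", "FFFF##")]
definitions = [(0, 0, 4), (1, 2, 2)]
parsed_sequence, parsed_quality = parse_reads(
    reads, definitions, concatenate=True
)
```

Because the same definitions are applied to both strings, each extracted base stays aligned with its quality character.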

__eq__(other: SubSequenceParser)

Check whether this parser equals another. The order of definitions must also be equal.

__iter__()

__len__()

__getitem__(i)

__repr__()

Return repr(self).

__str__()

Return str(self).

exception ngs_tools.chemistry.Chemistry.ChemistryError

Bases: Exception

Common base class for all non-exit exceptions.

class ngs_tools.chemistry.Chemistry.Chemistry(name: str, description: str, files: Optional[Dict[str, str]] = None)

Bases: abc.ABC

Base class to represent any kind of chemistry.

_name

Chemistry name; for internal use only. Use name instead.

_description

Chemistry description; for internal use only. Use description instead.

_files

Dictionary containing files related to this chemistry. For internal use only.

property name: str

Chemistry name

property description: str

Chemistry description

has_file(name: str) → bool

Whether _files contains a file with the specified name

get_file(name: str) → bool

Get a file path by its name

class ngs_tools.chemistry.Chemistry.SequencingStrand

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

UNSTRANDED = 0
FORWARD = 1
REVERSE

class ngs_tools.chemistry.Chemistry.SequencingChemistry(n: int, strand: SequencingStrand, parsers: Dict[str, SubSequenceParser], *args, **kwargs)

Bases: Chemistry

Base class to represent a sequencing chemistry.

property n: int

Number of sequences to parse at once

property parsers: Dict[str, SubSequenceParser]

Retrieve a copy of the _parsers dictionary.

property strand: SequencingStrand

Retrieve the strandedness of the chemistry.

property lengths: Tuple[int, ...]

The expected length for each sequence, based on parsers. None indicates any length is expected.
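One plausible way to derive such lengths from a set of parsers is sketched below. This is an assumption about the logic, not the library's code: `expected_lengths` and the `(index, start, length)` tuples are hypothetical, and the rule used is that the expected length of a sequence is the largest end position among its definitions, or None as soon as any definition on that sequence has no fixed length.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tuple form (index, start, length), used only for this sketch.
Definition = Tuple[int, Optional[int], Optional[int]]


def expected_lengths(
    n: int, parsers: Dict[str, List[Definition]]
) -> Tuple[Optional[int], ...]:
    """Compute one expected length per sequence; None means any length."""
    lengths: List[Optional[int]] = [0] * n
    for definitions in parsers.values():
        for index, start, length in definitions:
            if lengths[index] is None:
                continue  # already marked as open-ended
            if length is None:
                lengths[index] = None
            else:
                end = (start or 0) + length
                lengths[index] = max(lengths[index], end)
    return tuple(lengths)


parsers = {
    "barcode": [(0, 0, 16)],
    "umi": [(0, 16, 10)],
    "cdna": [(1, None, None)],
}
lengths = expected_lengths(2, parsers)
```

In this example the first sequence must hold a 16-base barcode and a 10-base UMI, while the second may be of any length.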

property has_barcode: bool

abstract property barcode_parser: SubSequenceParser

property has_umi: bool

abstract property umi_parser: SubSequenceParser

property has_whitelist: bool

abstract property whitelist_path

get_parser(name: str) → SubSequenceParser

Get a SubSequenceParser by its name

has_parser(name: str) → bool

Whether _parsers contains a parser with the specified name

reorder(reordering: List[int]) → Chemistry

Reorder the file indices according to the reordering list. The file currently at index i is moved to index reordering[i].

Parameters:

reordering – List describing how to reorder file indices, where the file at index i of this chemistry will now be at index reordering[i].

Returns:

A new Chemistry instance (or the subclass)
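The index mapping can be illustrated on bare definitions. The function `reorder_definitions` and the `(index, start, length)` tuples are hypothetical; the sketch only shows the documented rule that index i becomes reordering[i], and returns new tuples rather than modifying the input.

```python
from typing import List, Optional, Tuple

# Hypothetical tuple form (index, start, length), used only for this sketch.
Definition = Tuple[int, Optional[int], Optional[int]]


def reorder_definitions(
    definitions: List[Definition], reordering: List[int]
) -> List[Definition]:
    """Map each definition's file index i to reordering[i]."""
    return [
        (reordering[index], start, length)
        for index, start, length in definitions
    ]


definitions = [(0, 0, 16), (0, 16, 10), (1, None, None)]
# Swap the two files: what was file 0 becomes file 1 and vice versa.
swapped = reorder_definitions(definitions, [1, 0])
```

A swap like this would be relevant when the input files are supplied in the opposite order from the one the chemistry expects.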

parse(sequences: List[str], concatenate: bool = False) → Dict[str, Union[str, Tuple[str]]]

Parse a list of strings using the parsers in _parsers and return a dictionary with keys corresponding to those in _parsers.

Parameters:
  • sequences – List of strings

  • concatenate – Whether or not to concatenate the parsed strings. Defaults to False.

Returns:

Dictionary containing parsed strings

Raises:

ChemistryError – If the number of sequences does not equal n
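A minimal sketch of this dictionary-building behavior, including the length check, is shown below. Everything in it is a stand-in defined for the example: the local `ChemistryError`, the function `parse_chemistry`, and the parser names "barcode", "umi", and "cdna" are assumptions, not the library's own objects.

```python
from typing import Dict, List, Tuple, Union


class ChemistryError(Exception):
    """Local stand-in for the documented exception, for this sketch only."""


def parse_chemistry(
    sequences: List[str],
    parsers: Dict[str, List[Tuple[int, int, int]]],
    n: int,
    concatenate: bool = False,
) -> Dict[str, Union[str, Tuple[str, ...]]]:
    """Apply every named parser and return results keyed by parser name."""
    if len(sequences) != n:
        raise ChemistryError(f"expected {n} sequences, got {len(sequences)}")
    result = {}
    for name, definitions in parsers.items():
        parts = tuple(
            sequences[index][start:start + length]
            for index, start, length in definitions
        )
        result[name] = "".join(parts) if concatenate else parts
    return result


parsers = {"barcode": [(0, 0, 4)], "umi": [(0, 4, 3)], "cdna": [(1, 0, 6)]}
parsed = parse_chemistry(
    ["AAAACCCT", "GGGTTTAA"], parsers, n=2, concatenate=True
)

try:
    parse_chemistry(["AAAACCCT"], parsers, n=2)
    raised = False
except ChemistryError:
    raised = True
```

Passing a single sequence to a chemistry that expects two triggers the error before any parsing takes place.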

parse_reads(reads: List[ngs_tools.fastq.Read.Read], concatenate: bool = False, check_name: bool = True) → Dict[str, Tuple[Union[str, Tuple[str]], Union[str, Tuple[str]]]]

Behaves identically to parse() but on a list of ngs_tools.fastq.Read instances. The resulting dictionary contains tuple values, where the first element corresponds to the parsed read sequences, while the second corresponds to the parsed quality strings.

Parameters:
  • reads – List of ngs_tools.fastq.Read instances

  • concatenate – Whether or not to concatenate the parsed strings. Defaults to False.

  • check_name – If True, raises ChemistryError if the reads do not all have the same name. Defaults to True.

Returns:

Dictionary containing tuples of parsed read sequences and quality strings

Raises:

ChemistryError – If the number of sequences does not equal n, or check_name=True and not all reads have the same name.
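The name check can be pictured as a simple set comparison. This is a sketch under stated assumptions: the `Read` tuple, the local `ChemistryError`, and the helper `check_read_names` are all hypothetical, and how the library normalizes read names (for example, mate suffixes) is not covered here.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record; field names are assumptions.
    name: str
    sequence: str
    qualities: str


class ChemistryError(Exception):
    """Local stand-in for the documented exception, for this sketch only."""


def check_read_names(reads: List[Read]) -> None:
    """Raise if the reads passed together do not share one name."""
    names = {read.name for read in reads}
    if len(names) > 1:
        raise ChemistryError(f"reads have different names: {sorted(names)}")


matching = [Read("r1", "ACGT", "IIII"), Read("r1", "TTGG", "FFFF")]
check_read_names(matching)  # passes silently

mismatched = [Read("r1", "ACGT", "IIII"), Read("r2", "TTGG", "FFFF")]
try:
    check_read_names(mismatched)
    name_error = False
except ChemistryError:
    name_error = True
```

A check of this kind guards against FASTQ files that have fallen out of sync, where the i-th records no longer belong to the same fragment.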

__eq__(other: Chemistry)

Check the equality of two chemistries by comparing each parser.

__str__()

Return str(self).

__repr__()

Return repr(self).

to_kallisto_bus_arguments() → Dict[str, str]

Convert this sequencing chemistry definition to arguments that can be used as input to kallisto bus. https://www.kallistobus.tools/

Returns:

A dictionary of argument-to-value mappings. For this particular function, the dictionary has a single -x key whose value is a custom technology definition string, as specified in the kallisto manual.
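As a rough illustration of what such a dictionary might look like, the sketch below builds a technology string from bare `(index, start, length)` tuples. It assumes the string consists of comma-separated file,start,stop triplets joined by colons in barcode:UMI:sequence order, with a stop of 0 meaning "to the end of the read"; the kallisto manual remains the authoritative source for the format. The function name and tuple form are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tuple form (index, start, length), used only for this sketch.
Definition = Tuple[int, Optional[int], Optional[int]]


def to_technology_arguments(
    barcode: List[Definition],
    umi: List[Definition],
    cdna: List[Definition],
) -> Dict[str, str]:
    """Build a {"-x": ...} mapping under the assumed triplet layout."""

    def fmt(definitions: List[Definition]) -> str:
        triplets = []
        for index, start, length in definitions:
            begin = start or 0
            # Assumption: stop 0 means "read to the end of the sequence".
            stop = 0 if length is None else begin + length
            triplets.append(f"{index},{begin},{stop}")
        return ",".join(triplets)

    return {"-x": ":".join([fmt(barcode), fmt(umi), fmt(cdna)])}


args = to_technology_arguments(
    barcode=[(0, 0, 16)],
    umi=[(0, 16, 10)],
    cdna=[(1, None, None)],
)
```

For a 16-base barcode and 10-base UMI on the first file and cDNA on the second, the sketch yields the value `0,0,16:0,16,26:1,0,0`.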

to_starsolo_arguments() → Dict[str, str]

Convert this sequencing chemistry definition to arguments that can be used as input to STARsolo. https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md

Returns:

A dictionary of argument-to-value mappings.