ngs_tools.chemistry.Chemistry
Module Contents
Classes
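SubSequenceDefinition | Definition of a subsequence. This class is used to parse a subsequence out from a list of sequences.
SubSequenceParser | Class that uses a collection of SubSequenceDefinition instances to parse an entire subsequence from a list of strings.
Chemistry | Base class to represent any kind of chemistry.
SequencingStrand | Generic enumeration.
SequencingChemistry | Base class to represent a sequencing chemistry.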
Attributes
- ngs_tools.chemistry.Chemistry.WHITELISTS_DIR
- exception ngs_tools.chemistry.Chemistry.SubSequenceDefinitionError
Bases:
Exception
Common base class for all non-exit exceptions.
- class ngs_tools.chemistry.Chemistry.SubSequenceDefinition(index: int, start: Optional[int] = None, length: Optional[int] = None)
Definition of a subsequence. This class is used to parse a subsequence out from a list of sequences.
TODO: anchoring
- _length
Length of the subsequence; for internal use only. Use length instead.
- property index: int
Sequence index
- property start: Optional[int]
Substring starting position
- property length: Optional[int]
Substring length. None if not provided on initialization.
- is_overlapping(other: SubSequenceDefinition) bool
Whether this subsequence overlaps with another subsequence.
- Parameters:
other – The other SubSequenceDefinition instance to compare to
- Returns:
True if they are overlapping. False otherwise.
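The overlap check can be pictured with a small standalone sketch. It does not use ngs_tools; the function name `ranges_overlap` is hypothetical, and the rules it encodes are assumptions rather than the library's confirmed behavior: definitions on different sequence indices never overlap, a missing start means position 0, and a missing length means "to the end of the sequence".

```python
from typing import Optional


def ranges_overlap(
    index_a: int, start_a: Optional[int], length_a: Optional[int],
    index_b: int, start_b: Optional[int], length_b: Optional[int],
) -> bool:
    """Hypothetical overlap test between two (index, start, length) definitions."""
    # Assumption: subsequences taken from different sequences cannot overlap.
    if index_a != index_b:
        return False
    begin_a = start_a or 0
    begin_b = start_b or 0
    # Assumption: no length means the range is open-ended.
    end_a = float("inf") if length_a is None else begin_a + length_a
    end_b = float("inf") if length_b is None else begin_b + length_b
    # Two half-open intervals intersect when each starts before the other ends.
    return begin_a < end_b and begin_b < end_a


adjacent = ranges_overlap(0, 0, 16, 0, 16, 10)          # barcode then UMI, back to back
intersecting = ranges_overlap(0, 0, 16, 0, 10, 10)      # second range starts inside the first
different_reads = ranges_overlap(0, 0, 16, 1, 0, 16)    # same positions, different sequence
whole_vs_part = ranges_overlap(0, None, None, 0, 5, 3)  # whole sequence vs. a slice of it
```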
- parse(s: List[str]) str
Parse the given list of strings according to the arguments used to initialize this instance. If start and length were not provided, then this is simply the entire string at index index. Otherwise, the substring from position start of length length is extracted from the string at index index.
- Parameters:
s – List of strings to parse
- Returns:
The parsed string
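The extraction rule described for parse() amounts to plain string slicing. The sketch below is a standalone illustration that does not import ngs_tools; `parse_subsequence` is a hypothetical helper, and the handling of a start without a length (slice to the end) is an assumption.

```python
from typing import List, Optional


def parse_subsequence(
    sequences: List[str],
    index: int,
    start: Optional[int] = None,
    length: Optional[int] = None,
) -> str:
    """Hypothetical stand-in mirroring the documented parse() behavior."""
    s = sequences[index]
    # Neither start nor length given: return the entire string at `index`.
    if start is None and length is None:
        return s
    begin = start or 0
    # Assumption: a start without a length extracts up to the end of the string.
    end = None if length is None else begin + length
    return s[begin:end]


reads = ["ACGTACGTAAAA", "TTTTGGGGCCCC"]
whole = parse_subsequence(reads, 1)
barcode = parse_subsequence(reads, 0, start=0, length=8)
tail = parse_subsequence(reads, 0, start=8)
```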
- __eq__(other: SubSequenceDefinition)
Return self==value.
- __repr__()
Return repr(self).
- __str__()
Return str(self).
- exception ngs_tools.chemistry.Chemistry.SubSequenceParserError
Bases:
Exception
Common base class for all non-exit exceptions.
- class ngs_tools.chemistry.Chemistry.SubSequenceParser(*definitions: SubSequenceDefinition)
Class that uses a collection of SubSequenceDefinition instances to parse an entire subsequence from a list of strings.
- _definitions
List of SubSequenceDefinition instances; for internal use only.
- property definitions
- is_overlapping(other: SubSequenceParser) bool
Whether this parser overlaps with another parser. Checks all pairwise combinations and returns True if any two SubSequenceDefinition instances overlap.
- Parameters:
other – The other SubSequenceParser instance to compare to
- Returns:
True if they are overlapping. False otherwise.
- parse(sequences: List[str], concatenate: bool = False) Union[str, Tuple[str]]
Iteratively constructs a full subsequence by applying each SubSequenceDefinition in _definitions on the list of provided sequences. If concatenate=False, then this function returns a tuple of length equal to the number of definitions, where each element is the string parsed by the corresponding definition. Otherwise, all the parsed strings are concatenated into a single string.
- Parameters:
sequences – List of sequences to parse
concatenate – Whether or not to concatenate the parsed strings. Defaults to False.
- Returns:
Concatenated parsed sequence (if concatenate=True). Otherwise, a tuple of parsed strings.
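The difference between the two return shapes can be sketched without the library. Here each definition is modeled as a plain (index, start, length) tuple and `parse_all` is a hypothetical function; both are illustrative assumptions, not the ngs_tools implementation.

```python
from typing import List, Optional, Tuple, Union

# Assumed stand-in for SubSequenceDefinition: (index, start, length).
Definition = Tuple[Optional[int], Optional[int], Optional[int]]


def parse_all(
    sequences: List[str],
    definitions: List[Definition],
    concatenate: bool = False,
) -> Union[str, Tuple[str, ...]]:
    """Apply each definition in order and collect the extracted pieces."""
    parts = []
    for index, start, length in definitions:
        s = sequences[index]
        begin = start or 0
        end = None if length is None else begin + length
        parts.append(s[begin:end])
    # One string when concatenating, otherwise one tuple element per definition.
    return "".join(parts) if concatenate else tuple(parts)


sequences = ["AAAACCCCGGGG", "TTTTTTTT"]
definitions = [(0, 0, 4), (0, 8, 4)]  # two separated blocks on the first sequence
as_tuple = parse_all(sequences, definitions)
as_string = parse_all(sequences, definitions, concatenate=True)
```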
- parse_reads(reads: List[ngs_tools.fastq.Read.Read], concatenate: bool = False) Tuple[Union[str, Tuple[str]], Union[str, Tuple[str]]]
Behaves identically to parse(), but operates on a list of ngs_tools.fastq.Read instances. parse() is called on the read sequences and qualities separately.
- Parameters:
reads – List of reads to parse
concatenate – Whether or not to concatenate the parsed strings. Defaults to False.
- Returns:
A tuple of two elements: the parsed sequence from the read sequences, and the parsed sequence from the quality strings.
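A rough picture of parsing sequences and qualities separately is given below. `SketchRead` is a hypothetical stand-in for ngs_tools.fastq.Read (its attribute names are assumptions), and `extract` and `parse_reads_sketch` are illustrative helpers, not library functions.

```python
from typing import List, NamedTuple, Optional, Tuple, Union

Definition = Tuple[int, Optional[int], Optional[int]]  # assumed (index, start, length)


class SketchRead(NamedTuple):
    """Hypothetical minimal read: a name, bases, and a quality string."""
    name: str
    sequence: str
    qualities: str


def extract(strings: List[str], definitions: List[Definition], concatenate: bool):
    parts = []
    for index, start, length in definitions:
        begin = start or 0
        end = None if length is None else begin + length
        parts.append(strings[index][begin:end])
    return "".join(parts) if concatenate else tuple(parts)


def parse_reads_sketch(
    reads: List[SketchRead],
    definitions: List[Definition],
    concatenate: bool = False,
) -> Tuple[Union[str, Tuple[str, ...]], Union[str, Tuple[str, ...]]]:
    # The same definitions are applied twice: once to bases, once to qualities.
    sequences = [read.sequence for read in reads]
    qualities = [read.qualities for read in reads]
    return (
        extract(sequences, definitions, concatenate),
        extract(qualities, definitions, concatenate),
    )


reads = [
    SketchRead("r1", "AAAACCCC", "IIIIFFFF"),
    SketchRead("r1", "GGGGTTTT", "FFFF####"),
]
definitions = [(0, 0, 4), (1, 4, 4)]
parsed_seq, parsed_qual = parse_reads_sketch(reads, definitions, concatenate=True)
```

Because both passes use the same positions, each parsed base stays aligned with its quality character.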
- __eq__(other: SubSequenceParser)
Check whether this parser equals another. The order of definitions must also be equal.
- __iter__()
- __len__()
- __getitem__(i)
- __repr__()
Return repr(self).
- __str__()
Return str(self).
- exception ngs_tools.chemistry.Chemistry.ChemistryError
Bases:
Exception
Common base class for all non-exit exceptions.
- class ngs_tools.chemistry.Chemistry.Chemistry(name: str, description: str, files: Optional[Dict[str, str]] = None)
Bases:
abc.ABC
Base class to represent any kind of chemistry.
- _description
Chemistry description; for internal use only. Use description instead.
- _files
Dictionary containing files related to this chemistry. For internal use only.
- property name: str
Chemistry name
- property description: str
Chemistry description
- get_file(name: str) bool
Get a file path by its name
- class ngs_tools.chemistry.Chemistry.SequencingStrand
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- UNSTRANDED = 0
- FORWARD = 1
- REVERSE
- class ngs_tools.chemistry.Chemistry.SequencingChemistry(n: int, strand: SequencingStrand, parsers: Dict[str, SubSequenceParser], *args, **kwargs)
Bases:
Chemistry
Base class to represent a sequencing chemistry.
- property n: int
Number of sequences to parse at once
- property parsers: Dict[str, SubSequenceParser]
Retrieve a copy of the _parsers dictionary.
- property strand: SequencingStrand
Retrieve the strandedness of the chemistry.
- property lengths: Tuple[int, Ellipsis]
The expected length for each sequence, based on parsers. None indicates any length is expected.
- property has_barcode: bool
- abstract property barcode_parser: SubSequenceParser
- property has_umi: bool
- abstract property umi_parser: SubSequenceParser
- property has_whitelist: bool
- abstract property whitelist_path
- get_parser(name: str) SubSequenceParser
Get a SubSequenceParser by its name
- has_parser(name: str) bool
Whether _parsers contains a parser with the specified name
- reorder(reordering: List[int]) Chemistry
Reorder the file indices according to the reordering list. The file currently at index i is moved to index reordering[i].
- Parameters:
reordering – List describing how to reorder file indices, where the file at index i of this chemistry will now be at index reordering[i].
- Returns:
A new Chemistry instance (or the subclass)
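The index mapping described for reorder() can be illustrated with plain dictionaries. This sketch only models which file index each parser reads from; `reorder_indices` and the parser names are hypothetical, and the real method returns a new chemistry object rather than a dictionary.

```python
from typing import Dict, List


def reorder_indices(file_index_by_parser: Dict[str, int], reordering: List[int]) -> Dict[str, int]:
    """Map each old file index i to its new position reordering[i]."""
    return {name: reordering[i] for name, i in file_index_by_parser.items()}


# Assumed layout: barcode and UMI on file 0, cDNA on file 1.
original = {"barcode": 0, "umi": 0, "cdna": 1}
# Swap the two files: old index 0 becomes 1, old index 1 becomes 0.
swapped = reorder_indices(original, [1, 0])
# The identity reordering leaves every index unchanged.
unchanged = reorder_indices(original, [0, 1])
```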
- parse(sequences: List[str], concatenate: bool = False) Dict[str, Union[str, Tuple[str]]]
Parse a list of strings using the parsers in _parsers and return a dictionary with keys corresponding to those in _parsers.
- Parameters:
sequences – List of strings
concatenate – Whether or not to concatenate the parsed strings. Defaults to False.
- Returns:
Dictionary containing parsed strings
- Raises:
ChemistryError – If the number of sequences does not equal n
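The dictionary-shaped result and the count check can be sketched as follows. This is a standalone approximation: `SketchChemistryError` stands in for ChemistryError, parsers are modeled as lists of (index, start, length) tuples, and `parse_with_parsers` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple, Union

Definition = Tuple[int, Optional[int], Optional[int]]  # assumed (index, start, length)


class SketchChemistryError(Exception):
    """Local stand-in for the documented ChemistryError."""


def extract(sequences: List[str], definitions: List[Definition], concatenate: bool):
    parts = []
    for index, start, length in definitions:
        begin = start or 0
        end = None if length is None else begin + length
        parts.append(sequences[index][begin:end])
    return "".join(parts) if concatenate else tuple(parts)


def parse_with_parsers(
    sequences: List[str],
    n: int,
    parsers: Dict[str, List[Definition]],
    concatenate: bool = False,
) -> Dict[str, Union[str, Tuple[str, ...]]]:
    # Reject input whose length does not match the expected number of sequences.
    if len(sequences) != n:
        raise SketchChemistryError(f"expected {n} sequences, got {len(sequences)}")
    # One entry per named parser, keyed exactly like the parsers dictionary.
    return {name: extract(sequences, defs, concatenate) for name, defs in parsers.items()}


parsers = {"barcode": [(0, 0, 4)], "umi": [(0, 4, 3)], "cdna": [(1, None, None)]}
result = parse_with_parsers(["AAAACCCGG", "TTTTGGGG"], 2, parsers, concatenate=True)
```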
- parse_reads(reads: List[ngs_tools.fastq.Read.Read], concatenate: bool = False, check_name: bool = True) Dict[str, Tuple[Union[str, Tuple[str]], Union[str, Tuple[str]]]]
Behaves identically to parse() but on a list of ngs_tools.fastq.Read instances. The resulting dictionary contains tuple values, where the first element corresponds to the parsed read sequences, while the second corresponds to the parsed quality strings.
- Parameters:
reads – List of ngs_tools.fastq.Read instances
concatenate – Whether or not to concatenate the parsed strings. Defaults to False.
check_name – If True, raises ChemistryError if the reads do not all have the same name. Defaults to True.
- Returns:
Dictionary containing tuples of parsed read sequences and quality strings
- Raises:
ChemistryError – If the number of sequences does not equal n, or check_name=True and not all reads have the same name.
- __str__()
Return str(self).
- __repr__()
Return repr(self).
- to_kallisto_bus_arguments() Dict[str, str]
Convert this spatial chemistry definition to arguments that can be used as input to kallisto bus. https://www.kallistobus.tools/
- Returns:
A Dictionary of arguments-to-value mappings. For this particular function, the dictionary has a single -x key and the value is a custom technology definition string, as specified in the kallisto manual.
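As an illustration of what such a -x value could look like, the sketch below builds a string in kallisto's custom technology layout, where barcode, UMI, and sequence groups are separated by colons and each group holds file,start,stop triplets with a stop of 0 meaning the end of the read. The helper names and the (index, start, length) tuples are hypothetical, and the actual ngs_tools implementation may build the string differently.

```python
from typing import Dict, List, Optional, Tuple

Definition = Tuple[int, Optional[int], Optional[int]]  # assumed (index, start, length)


def to_triplet(index: int, start: Optional[int], length: Optional[int]) -> str:
    """Render one definition as a file,start,stop triplet."""
    begin = start or 0
    # A stop of 0 denotes "until the end of the read" in this layout.
    end = 0 if length is None else begin + length
    return f"{index},{begin},{end}"


def build_bus_arguments(
    barcode: List[Definition],
    umi: List[Definition],
    cdna: List[Definition],
) -> Dict[str, str]:
    """Join barcode, UMI, and cDNA groups with ':' under a single -x key."""
    groups = [
        ",".join(to_triplet(*definition) for definition in definitions)
        for definitions in (barcode, umi, cdna)
    ]
    return {"-x": ":".join(groups)}


# Assumed layout: 16 bp barcode then 10 bp UMI on file 0, whole of file 1 as cDNA.
arguments = build_bus_arguments(
    barcode=[(0, 0, 16)],
    umi=[(0, 16, 10)],
    cdna=[(1, None, None)],
)
```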
- to_starsolo_arguments() Dict[str, str]
Converts this spatial chemistry definition to arguments that can be used as input to STARsolo. https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md
- Returns:
A Dictionary of arguments-to-value mappings.