ngs_tools.bam
Module Contents
Functions
map_bam() – Generator to map an arbitrary function to every read and return its return values.
apply_bam() – Apply an arbitrary function to every read in a BAM.
count_bam() – Count the number of BAM entries.
split_bam() – Split a BAM into many parts, either by the number of reads or by an arbitrary function.
tag_bam_with_fastq() – Add tags to BAM entries using sequences from one or more FASTQ files.
filter_bam() – Filter a BAM by applying the given function to each pysam.AlignedSegment object.
- exception ngs_tools.bam.BamError
Bases: Exception
Common base class for all non-exit exceptions.
- ngs_tools.bam.map_bam(bam_path: str, map_func: Callable[[pysam.AlignedSegment], Any], n_threads: int = 1, show_progress: bool = False)
Generator to map an arbitrary function to every read and return its return values.
- Parameters:
bam_path – Path to the BAM file
map_func – Function that takes a pysam.AlignedSegment object and returns some value
n_threads – Number of threads to use. Defaults to 1.
show_progress – Whether to display a progress bar. Defaults to False.
- Yields:
map_func applied to each read in the BAM file
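As an illustration, here is a minimal sketch of using map_bam to build a read-length histogram. The helper names read_length and length_histogram are hypothetical and not part of ngs_tools; the callback relies only on the query_sequence attribute of pysam.AlignedSegment.

```python
from collections import Counter
from types import SimpleNamespace


def read_length(read):
    """map_func: return the length of the read sequence (0 if absent)."""
    seq = read.query_sequence
    return len(seq) if seq is not None else 0


def length_histogram(bam_path):
    """Hypothetical helper: count how many reads have each length."""
    # Imported here so read_length can be tried without ngs_tools installed.
    from ngs_tools import bam

    # map_bam is a generator, so Counter consumes the values one by one.
    return Counter(bam.map_bam(bam_path, read_length))


# Quick check of the callback with a stand-in object (not a real pysam read).
fake_read = SimpleNamespace(query_sequence="ACGTAC")
print(read_length(fake_read))  # prints 6
```

Because map_bam yields values lazily, results can be aggregated without holding every return value in memory at once.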
- ngs_tools.bam.apply_bam(bam_path: str, apply_func: Callable[[pysam.AlignedSegment], Optional[pysam.AlignedSegment]], out_path: str, n_threads: int = 1, show_progress: bool = False)
Apply an arbitrary function to every read in a BAM. Reads for which the function returns None are not written to the output BAM.
- Parameters:
bam_path – Path to the BAM file
apply_func – Function that takes a pysam.AlignedSegment object and returns either a pysam.AlignedSegment object or None
out_path – Path to output BAM file
n_threads – Number of threads to use. Defaults to 1.
show_progress – Whether to display a progress bar. Defaults to False.
- Returns:
Path to written BAM
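The sketch below shows one way an apply_func might both drop and modify reads: unmapped reads are discarded by returning None, and the remaining reads receive a tag before being returned. The function names, the FakeRead stand-in, and the "XL" tag are illustrative assumptions; set_tag, is_unmapped, and query_length are existing pysam.AlignedSegment members.

```python
class FakeRead:
    """Stand-in for pysam.AlignedSegment, for demonstration only."""

    def __init__(self, is_unmapped, query_length):
        self.is_unmapped = is_unmapped
        self.query_length = query_length
        self.tags = {}

    def set_tag(self, tag, value):
        self.tags[tag] = value


def annotate_or_drop(read):
    """apply_func: drop unmapped reads, tag the rest with their length."""
    if read.is_unmapped:
        return None  # None means the read is not written to the output BAM
    read.set_tag("XL", read.query_length)  # "XL" is an arbitrary example tag
    return read


def write_annotated(bam_path, out_path):
    """Hypothetical wrapper around apply_bam."""
    # Imported here so annotate_or_drop can be tried without ngs_tools.
    from ngs_tools import bam

    return bam.apply_bam(bam_path, annotate_or_drop, out_path)


kept = annotate_or_drop(FakeRead(is_unmapped=False, query_length=10))
print(kept.tags)  # prints {'XL': 10}
```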
- ngs_tools.bam.count_bam(bam_path: str, filter_func: Optional[Callable[[pysam.AlignedSegment], bool]] = None, n_threads: int = 1, show_progress: bool = False) -> int
Count the number of BAM entries. Optionally, a function may be provided to only count certain alignments.
- Parameters:
bam_path – Path to BAM
filter_func – Function that takes a pysam.AlignedSegment object and returns True for reads to be counted and False otherwise
n_threads – Number of threads to use. Defaults to 1.
show_progress – Whether to display a progress bar. Defaults to False.
- Returns:
Number of alignments in BAM
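For example, a filter_func could restrict the count to mapped, primary alignments. This is a sketch only: is_primary_mapped and count_primary are hypothetical names, while is_unmapped, is_secondary, and is_supplementary are standard pysam.AlignedSegment flags.

```python
from types import SimpleNamespace


def is_primary_mapped(read):
    """filter_func: True only for mapped, primary alignments."""
    return not (read.is_unmapped or read.is_secondary or read.is_supplementary)


def count_primary(bam_path):
    """Hypothetical helper returning (all entries, primary mapped entries)."""
    # Imported here so the filter can be tried without ngs_tools installed.
    from ngs_tools import bam

    total = bam.count_bam(bam_path)
    primary = bam.count_bam(bam_path, filter_func=is_primary_mapped)
    return total, primary


# Stand-in object (not a real pysam read) to exercise the filter.
secondary = SimpleNamespace(
    is_unmapped=False, is_secondary=True, is_supplementary=False
)
print(is_primary_mapped(secondary))  # prints False
```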
- ngs_tools.bam.split_bam(bam_path: str, split_prefix: str, split_func: Optional[Callable[[pysam.AlignedSegment], str]] = None, n: Optional[int] = None, n_threads: int = 1, check_pair_groups: bool = True, show_progress: bool = False) -> Dict[str, Tuple[str, int]]
Split a BAM into many parts, either by the number of reads or by an arbitrary function. Exactly one of split_func or n must be provided. Read pairs are always written to the same file.
This function makes two passes through the BAM file. The first pass identifies which reads must be written together (i.e. are pairs). The second pass extracts the reads and writes them to the appropriate split.
The following procedure is used to identify pairs. 1) The .is_paired property is checked to be True. 2) If the read is unaligned, at most one other unaligned read with the same read name is allowed to be in the BAM. This other read is its mate. If the read is aligned, it should have the HI BAM tag indicating the alignment index. If no HI tag is present, it is assumed that only one alignment is present for each read pair. If any of these constraints are not met, an exception is raised.
- Parameters:
bam_path – Path to the BAM file
split_prefix – File path prefix to all the split BAMs
split_func – Function that takes a pysam.AlignedSegment object and returns a string ID that is used to group reads into splits. All reads with a given ID will be written to a single BAM. Defaults to None.
n – Number of BAMs to split into. Defaults to None.
n_threads – Number of threads to use. Only affects reading. Writing is still serialized. Defaults to 1.
check_pair_groups – When using split_func, make sure that paired reads are assigned the same ID (and thus are split into the same BAM). Defaults to True.
show_progress – Whether to display a progress bar. Defaults to False.
- Returns:
Dictionary keyed by either the string ID of each split (if split_func is used) or the split index (if n is used). Each value is a tuple whose first element is the path to a split BAM and whose second element is the number of BAM entries written to that split.
- Raises:
BamError – If any pair constraints are not met.
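A minimal sketch of splitting by a per-read ID and reading the returned dictionary follows. It assumes reads carry a "CB" tag (a cell barcode convention used by some pipelines, not something ngs_tools requires); barcode_id, total_entries, split_by_barcode, and FakeRead are hypothetical names. Both mates of a pair would need to share the same tag value for check_pair_groups to pass.

```python
class FakeRead:
    """Stand-in for pysam.AlignedSegment, for demonstration only."""

    def __init__(self, tags):
        self._tags = tags

    def has_tag(self, tag):
        return tag in self._tags

    def get_tag(self, tag):
        return self._tags[tag]


def barcode_id(read):
    """split_func: group reads by their "CB" tag (assumed to be present)."""
    return read.get_tag("CB") if read.has_tag("CB") else "no_barcode"


def total_entries(splits):
    """Sum the entry counts in a dict shaped like split_bam's return value."""
    return sum(count for _path, count in splits.values())


def split_by_barcode(bam_path, split_prefix):
    """Hypothetical wrapper around split_bam."""
    # Imported here so the helpers can be tried without ngs_tools installed.
    from ngs_tools import bam

    splits = bam.split_bam(bam_path, split_prefix, split_func=barcode_id)
    for split_id, (path, count) in splits.items():
        print(f"{split_id}: {count} entries in {path}")
    return splits


print(barcode_id(FakeRead({"CB": "AAAC"})))  # prints AAAC
```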
- ngs_tools.bam.tag_bam_with_fastq(bam_path: str, fastq_path: Union[str, List[str]], tag_func: Union[Callable[[ngs_tools.fastq.Read], dict], List[Callable[[ngs_tools.fastq.Read], dict]]], out_path: str, check_name: bool = True, n_threads: int = 1, show_progress: bool = False)
Add tags to BAM entries using sequences from one or more FASTQ files.
Internally, this function calls apply_bam().
Note
The tag keys returned by tag_func must be unique and at most 2 characters long.
- Parameters:
bam_path – Path to the BAM file
fastq_path – Path to FASTQ file. This option may be a list to extract tags from multiple FASTQ files. In this case, tag_func must also be a list of functions.
tag_func – Function that takes a ngs_tools.fastq.Read object and returns a dictionary of tags. When multiple FASTQs are being parsed simultaneously, each function needs to produce unique dictionary keys. Additionally, BAM tag keys may be at most 2 characters long. However, neither of these conditions is checked, in the interest of runtime.
out_path – Path to output BAM file
check_name – Whether to raise a BamError if a read in the BAM is not found in the FASTQ
n_threads – Number of threads to use. Defaults to 1.
show_progress – Whether to display a progress bar. Defaults to False.
- Returns:
Path to written BAM
- Raises:
BamError – If only one of fastq_path and tag_func is a list, if both are lists but they have different lengths, or if check_name=True but there are missing tags.
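The following sketch shows what a tag_func might look like when a FASTQ read begins with a barcode followed by a UMI. Several things here are assumptions rather than facts from this page: that the ngs_tools.fastq.Read object exposes the bases through a sequence attribute, the 16 + 12 base layout, and the choice of "CR" and "UR" as tag keys. The names barcode_umi_tags and tag_with_barcodes are hypothetical.

```python
from types import SimpleNamespace

BARCODE_LEN = 16  # assumed read layout, not defined by ngs_tools
UMI_LEN = 12  # assumed read layout, not defined by ngs_tools


def barcode_umi_tags(read):
    """tag_func: return a dict of two-character tag keys to values.

    Assumes `read.sequence` holds the FASTQ bases as a string.
    """
    seq = read.sequence
    return {
        "CR": seq[:BARCODE_LEN],
        "UR": seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN],
    }


def tag_with_barcodes(bam_path, fastq_path, out_path):
    """Hypothetical wrapper around tag_bam_with_fastq for a single FASTQ."""
    # Imported here so the tag function can be tried without ngs_tools.
    from ngs_tools import bam

    return bam.tag_bam_with_fastq(bam_path, fastq_path, barcode_umi_tags, out_path)


# Stand-in object (not a real ngs_tools.fastq.Read) to exercise the function.
fake = SimpleNamespace(sequence="A" * 16 + "C" * 12 + "GG")
print(sorted(barcode_umi_tags(fake)))  # prints ['CR', 'UR']
```

Since key uniqueness and key length are not validated by the library, a quick check like the one above is a cheap way to catch mistakes before processing a large BAM.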
- ngs_tools.bam.filter_bam(bam_path: str, filter_func: Callable[[pysam.AlignedSegment], bool], out_path: str, n_threads: int = 1, show_progress: bool = False)
Filter a BAM by applying the given function to each pysam.AlignedSegment object. When the function returns False, the read is not written to the output BAM.
Internally, this function calls apply_bam().
- Parameters:
bam_path – Path to the BAM file
filter_func – Function that takes a pysam.AlignedSegment object and returns False for reads to be filtered out
out_path – Path to output BAM file
n_threads – Number of threads to use. Defaults to 1.
show_progress – Whether to display a progress bar. Defaults to False.
- Returns:
Path to written BAM
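A short sketch of a mapping-quality filter is given below. The threshold and the names is_high_quality, as_apply_func, and keep_high_quality are illustrative. The as_apply_func adapter shows one plausible way a boolean filter could be expressed under the apply_bam contract (return the read to keep it, None to drop it); it is not taken from the library's source.

```python
from types import SimpleNamespace

MIN_MAPQ = 30  # illustrative threshold, chosen for this example only


def is_high_quality(read):
    """filter_func: keep mapped reads with mapping quality >= MIN_MAPQ."""
    return (not read.is_unmapped) and read.mapping_quality >= MIN_MAPQ


def as_apply_func(filter_func):
    """Adapt a boolean filter to the apply_bam contract (a sketch)."""

    def apply_func(read):
        return read if filter_func(read) else None

    return apply_func


def keep_high_quality(bam_path, out_path):
    """Hypothetical wrapper around filter_bam."""
    # Imported here so the filter can be tried without ngs_tools installed.
    from ngs_tools import bam

    return bam.filter_bam(bam_path, is_high_quality, out_path)


# Stand-in object (not a real pysam read) to exercise the filter.
low = SimpleNamespace(is_unmapped=False, mapping_quality=3)
print(as_apply_func(is_high_quality)(low))  # prints None
```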