ngs_tools.fastq

Submodules

Package Contents

Functions

fastq_to_bam(→ str)

Convert a FASTQ to an unmapped BAM.

fastqs_to_bam(→ str)

Convert FASTQs to an unmapped BAM according to an arbitrary function.

fastqs_to_bam_with_chemistry(→ str)

Convert FASTQs to an unmapped BAM according to the provided ngs_tools.chemistry.Chemistry instance.

ngs_tools.fastq.fastq_to_bam(fastq_path: str, bam_path: str, name: Optional[str] = None, n_threads: int = 1, show_progress: bool = False) str

Convert a FASTQ to an unmapped BAM.

Parameters:
  • fastq_path – Path to the input FASTQ

  • bam_path – Path to the output BAM

  • name – Name for this set of reads. Defaults to None. If not provided, a random string is generated by calling shortuuid.uuid(). This value is added as the read group (RG tag) for all the reads in the BAM.

  • n_threads – Number of threads to use. Defaults to 1.

  • show_progress – Whether to display a progress bar. Defaults to False.

Returns:

Path to BAM
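To illustrate what an "unmapped BAM" record contains, here is a minimal, stdlib-only sketch. It is not the library's implementation: it emits plain-text SAM lines instead of binary BAM (the real function writes BAM), and it uses uuid.uuid4().hex as a stand-in for shortuuid.uuid(). The helper names read_fastq and fastq_to_unmapped_sam are hypothetical. Each read gets FLAG 4 (unmapped), placeholder alignment fields, and an RG tag pointing at a single @RG header line, matching the description of the name parameter above.

```python
import io
import uuid
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, qualities) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qualities = handle.readline().rstrip("\n")
        # Read name is the header up to the first whitespace, without '@'.
        yield header[1:].split()[0], sequence, qualities


def fastq_to_unmapped_sam(handle: TextIO, name: Optional[str] = None) -> List[str]:
    """Return SAM lines for unmapped reads, all tagged with one read group."""
    # Stand-in for shortuuid.uuid(): any random string works as a read group ID.
    name = name or uuid.uuid4().hex
    lines = ["@HD\tVN:1.6\tSO:unsorted", f"@RG\tID:{name}"]
    for qname, seq, qual in read_fastq(handle):
        # FLAG 4 = unmapped; RNAME/CIGAR/RNEXT are '*' and positions are 0.
        fields = [qname, "4", "*", "0", "0", "*", "*", "0", "0",
                  seq, qual, f"RG:Z:{name}"]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    fq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nFFFF\n")
    for line in fastq_to_unmapped_sam(fq, name="sample1"):
        print(line)
```

Because every record carries the same RG value, downstream tools can recover which input a read came from even after several unmapped BAMs are merged.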

ngs_tools.fastq.fastqs_to_bam(fastq_paths: List[str], parse_func: Callable[[Tuple[Read.Read, ...], pysam.AlignmentHeader], pysam.AlignedSegment], bam_path: str, name: Optional[str] = None, n_threads: int = 1, show_progress: bool = False) str

Convert FASTQs to an unmapped BAM according to an arbitrary function.

Parameters:
  • fastq_paths – List of FASTQ paths.

  • parse_func – Function that accepts two arguments: a tuple of ngs_tools.fastq.Read objects (one from each FASTQ) and a pysam.AlignmentHeader object. It returns a new pysam.AlignedSegment to write into the BAM. The second argument must be passed as the header argument when constructing the new pysam.AlignedSegment. If the function returns None, the read is not written to the BAM.

  • bam_path – Path to the output BAM

  • name – Name for this set of reads. Defaults to None. If not provided, a random string is generated by calling shortuuid.uuid(). This value is added as the read group (RG tag) for all the reads in the BAM.

  • n_threads – Number of threads to use. Defaults to 1.

  • show_progress – Whether to display a progress bar. Defaults to False.

Returns:

Path to BAM
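The parse_func contract can be sketched without pysam. In this stdlib-only illustration, Read is a hypothetical stand-in dataclass (not the real ngs_tools.fastq.Read), a plain dict stands in for both pysam.AlignmentHeader and pysam.AlignedSegment, and convert is a hypothetical driver. It shows the behavior described above: one read from each FASTQ is passed together as a tuple, and any read for which the function returns None is skipped. The example function barcode_and_cdna, which takes an 8-base barcode from R1 and the cDNA from R2, is likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for ngs_tools.fastq.Read (not its real definition)."""
    name: str
    sequence: str
    qualities: str


# Hypothetical stand-in for pysam.AlignedSegment: a plain dict of fields.
Record = Dict[str, str]
ParseFunc = Callable[[Tuple[Read, ...], Dict[str, str]], Optional[Record]]


def convert(fastqs: List[Iterable[Read]], parse_func: ParseFunc,
            header: Dict[str, str]) -> List[Record]:
    """Feed one read from each FASTQ to parse_func; drop reads it rejects."""
    records = []
    # zip() advances all FASTQs in lockstep, so each tuple holds mate reads.
    for reads in zip(*fastqs):
        record = parse_func(reads, header)
        if record is not None:  # None means "do not write this read"
            records.append(record)
    return records


def barcode_and_cdna(reads: Tuple[Read, ...], header: Dict[str, str]) -> Optional[Record]:
    """Example parse_func: barcode from R1, cDNA from R2; skip N-containing barcodes."""
    r1, r2 = reads
    barcode = r1.sequence[:8]
    if "N" in barcode:
        return None
    return {"qname": r2.name, "seq": r2.sequence, "qual": r2.qualities,
            "CB": barcode, "RG": header["rg"]}
```

With the real function, parse_func would instead build a pysam.AlignedSegment, passing the pysam.AlignmentHeader it receives as the segment's header, as the parameter description requires.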

ngs_tools.fastq.fastqs_to_bam_with_chemistry(fastq_paths: List[str], chemistry: ngs_tools.chemistry.Chemistry, tag_map: Dict[str, Tuple[str, str]], bam_path: str, name: Optional[str] = None, sequence_key: str = 'cdna', n_threads: int = 1, show_progress: bool = False) str

Convert FASTQs to an unmapped BAM according to the provided ngs_tools.chemistry.Chemistry instance.

Note that split features (for example, a barcode whose bases are located at multiple positions in the reads) are concatenated into a single sequence.
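Concatenating a split feature amounts to slicing each of its segments and joining them in order, applying the same slices to the quality string so sequence and qualities stay aligned. The sketch below assumes a hypothetical representation of a feature as a list of (fastq_index, start, stop) spans; it does not reflect how ngs_tools.chemistry.Chemistry actually stores its parsers.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of a split feature: (fastq_index, start, stop) spans.
Span = Tuple[int, int, int]


def extract_feature(sequences: Sequence[str], qualities: Sequence[str],
                    spans: List[Span]) -> Tuple[str, str]:
    """Concatenate every span of a feature, in the order the spans are listed."""
    seq = "".join(sequences[i][start:stop] for i, start, stop in spans)
    qual = "".join(qualities[i][start:stop] for i, start, stop in spans)
    return seq, qual
```

A barcode split across two FASTQs would then be spans such as [(0, 4, 8), (1, 4, 8)], yielding one 8-base barcode with matching qualities.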

Parameters:
  • fastq_paths – List of FASTQ paths. The order must match that of the chemistry.

  • chemistry – ngs_tools.chemistry.Chemistry instance to use to parse the reads.

  • tag_map – Mapping of parser names to BAM tags. Each key is a parser name, and each value must be a tuple of (sequence BAM tag, quality BAM tag): the first tag stores the parsed nucleotide sequence, and the second stores its quality scores.

  • bam_path – Path to the output BAM

  • name – Name for this set of reads. Defaults to None. If not provided, a random string is generated by calling shortuuid.uuid(). This value is added as the read group (RG tag) for all the reads in the BAM.

  • sequence_key – Parser key to use as the actual alignment sequence. Defaults to cdna.

  • n_threads – Number of threads to use. Defaults to 1.

  • show_progress – Whether to display a progress bar. Defaults to False.
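The interplay of tag_map and sequence_key can be sketched as follows. This is an illustration under assumptions, not the library's code: parsed features are modeled as a hypothetical dict from parser name to a (sequence, qualities) pair, and apply_tag_map is a hypothetical helper. The feature named by sequence_key becomes the read's own sequence, and every tag_map entry contributes one sequence tag and one quality tag. The tag names in the comments (CR/CY, UR/UY) follow common single-cell conventions and are only examples.

```python
from typing import Dict, List, Tuple

Feature = Tuple[str, str]  # (sequence, qualities) extracted by one parser


def apply_tag_map(features: Dict[str, Feature],
                  tag_map: Dict[str, Tuple[str, str]],
                  sequence_key: str = "cdna") -> Tuple[Feature, List[Tuple[str, str]]]:
    """Split parsed features into the alignment sequence and a list of BAM tags."""
    # The sequence_key feature becomes the read's SEQ/QUAL fields.
    sequence = features[sequence_key]
    tags = []
    for parser_name, (seq_tag, qual_tag) in tag_map.items():
        seq, qual = features[parser_name]
        tags.append((seq_tag, seq))    # e.g. CR -> barcode bases
        tags.append((qual_tag, qual))  # e.g. CY -> barcode qualities
    return sequence, tags
```

For instance, tag_map={"barcode": ("CR", "CY"), "umi": ("UR", "UY")} produces four tags per read, while the cdna feature supplies the alignment sequence.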

Returns:

Path to BAM

Raises:

FastqError – If the number of FASTQs provided does not match the number required by the specified chemistry, if the tag map contains keys that are not parser names of the chemistry, or if the tag map assigns the same BAM tag more than once.
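The three failure cases listed under Raises can be checked up front, before any reads are parsed. The sketch below is a hedged illustration only: FastqError is defined locally as a stand-in, n_fastqs and parser_names are hypothetical substitutes for whatever the Chemistry object exposes, and two interpretations are assumptions, namely that the FASTQ count must match exactly and that the "multiple BAM tags" case means one tag reused across entries.

```python
from typing import Dict, List, Set, Tuple


class FastqError(Exception):
    """Local stand-in for the FastqError named in the Raises section."""


def validate_inputs(fastq_paths: List[str], n_fastqs: int, parser_names: Set[str],
                    tag_map: Dict[str, Tuple[str, str]]) -> None:
    """Mirror the three documented failure cases (interpretations assumed)."""
    # Assumption: the chemistry requires exactly n_fastqs inputs.
    if len(fastq_paths) != n_fastqs:
        raise FastqError(f"expected {n_fastqs} FASTQs, got {len(fastq_paths)}")
    unknown = set(tag_map) - parser_names
    if unknown:
        raise FastqError(f"tag map keys not in chemistry: {sorted(unknown)}")
    # Assumption: "multiple BAM tags" means the same tag appears more than once.
    tags = [tag for pair in tag_map.values() for tag in pair]
    if len(tags) != len(set(tags)):
        raise FastqError("tag map assigns the same BAM tag more than once")
```

Failing fast like this avoids discovering a misconfigured tag map only after a large FASTQ has been partially converted.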