ngs_tools.fasta

Package Contents

Functions

split_genomic_fasta_to_cdna(→ str)

Split a genomic FASTA into cDNA by using gene and transcript information

split_genomic_fasta_to_intron(→ str)

Split a genomic FASTA into introns by using gene and transcript information

split_genomic_fasta_to_nascent(→ str)

Split a genomic FASTA into nascent transcripts by using gene information
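
The three functions are typically fed from one parsed GTF. The following is a minimal usage sketch, not an official recipe. It assumes that ngs_tools.gtf.genes_and_transcripts_from_gtf() accepts the GTF path as its first argument and returns a (gene_infos, transcript_infos) pair; the wrapper name build_references and the output file names are hypothetical.

```python
import os


def build_references(gtf_path: str, fasta_path: str, out_dir: str):
    """Hypothetical wrapper: write cDNA, intron, and nascent FASTAs to out_dir."""
    # Imported lazily so this sketch can be loaded without ngs_tools installed.
    from ngs_tools import fasta, gtf

    # Assumption: the call returns a (gene_infos, transcript_infos) pair.
    gene_infos, transcript_infos = gtf.genes_and_transcripts_from_gtf(gtf_path)

    cdna_path = fasta.split_genomic_fasta_to_cdna(
        fasta_path, os.path.join(out_dir, "cdna.fa"), gene_infos, transcript_infos
    )
    intron_path = fasta.split_genomic_fasta_to_intron(
        fasta_path,
        os.path.join(out_dir, "intron.fa"),
        gene_infos,
        transcript_infos,
        flank=30,
    )
    nascent_path = fasta.split_genomic_fasta_to_nascent(
        fasta_path, os.path.join(out_dir, "nascent.fa"), gene_infos
    )
    # Each function returns the path to the FASTA it wrote.
    return cdna_path, intron_path, nascent_path
```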

ngs_tools.fasta.split_genomic_fasta_to_cdna(fasta_path: str, out_path: str, gene_infos: dict, transcript_infos: dict, show_progress: bool = False) → str

Split a genomic FASTA into cDNA by using gene and transcript information extracted from a GTF.

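Parameters:
  • fasta_path – Path to FASTA containing genomic sequences

  • out_path – Path to output FASTA that will contain cDNA sequences

  • gene_infos – Dictionary containing gene information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()

  • transcript_infos – Dictionary containing transcript information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()

  • show_progress – Whether to display a progress bar. Defaults to False.

Returns: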

Path to written FASTA
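
To illustrate what producing cDNA from a genomic sequence involves, here is a minimal sketch that splices a transcript's exons together and reverse-complements the result for minus-strand transcripts. It is not the library's implementation: the helper names are hypothetical, and exon coordinates are assumed to be 0-based, half-open intervals.

```python
# Hypothetical helpers for illustration only; not part of ngs_tools.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice_exons(chrom_seq: str, exons: list, strand: str) -> str:
    """Concatenate exon sequences in genomic order.

    Assumes each exon is a (start, end) pair, 0-based and half-open.
    """
    spliced = "".join(chrom_seq[start:end] for start, end in sorted(exons))
    # Minus-strand transcripts are read from the opposite strand.
    return reverse_complement(spliced) if strand == "-" else spliced


chrom = "AAACCCGGGTTT"
exons = [(6, 9), (0, 3)]  # deliberately unsorted
print(splice_exons(chrom, exons, "+"))  # AAAGGG
print(splice_exons(chrom, exons, "-"))  # CCCTTT
```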

ngs_tools.fasta.split_genomic_fasta_to_intron(fasta_path: str, out_path: str, gene_infos: dict, transcript_infos: dict, flank: int = 30, show_progress: bool = False) → str

Split a genomic FASTA into introns by using gene and transcript information extracted from a GTF. Optionally append flanking sequences and collapse introns that have overlapping flanking regions.

Parameters:
  • fasta_path – Path to FASTA containing genomic sequences

  • out_path – Path to output FASTA that will contain intron sequences

  • gene_infos – Dictionary containing gene information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()

  • transcript_infos – Dictionary containing transcript information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()

  • flank – Number of flanking bases to include for each intron. Defaults to 30.

  • show_progress – Whether to display a progress bar. Defaults to False.

Returns:

Path to written FASTA
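
The effect of flank can be pictured as widening each intron on both sides and then merging any widened introns that overlap. The sketch below shows one way to do that; it is not the library's implementation. The helper names are hypothetical, coordinates are assumed to be 0-based and half-open, and only strictly overlapping intervals are merged (the library's exact rule may differ).

```python
# Hypothetical helpers for illustration only; not part of ngs_tools.
def introns_from_exons(exons: list) -> list:
    """Return the gaps between consecutive exons as (start, end) intervals."""
    ordered = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]


def flank_and_collapse(introns: list, flank: int, chrom_len: int) -> list:
    """Extend each intron by `flank` bases and merge overlapping results."""
    extended = sorted(
        (max(0, start - flank), min(chrom_len, end + flank))
        for start, end in introns
    )
    merged = []
    for start, end in extended:
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: grow it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


exons = [(0, 10), (20, 30), (100, 110)]
introns = introns_from_exons(exons)
print(introns)                              # [(10, 20), (30, 100)]
print(flank_and_collapse(introns, 2, 200))  # [(8, 22), (28, 102)]
print(flank_and_collapse(introns, 8, 200))  # [(2, 108)]
```

With a small flank the two introns stay separate; with a flank wider than half the intervening exon, their flanking regions overlap and they collapse into one interval.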

ngs_tools.fasta.split_genomic_fasta_to_nascent(fasta_path: str, out_path: str, gene_infos: dict, suffix='', show_progress: bool = False) → str

Split a genomic FASTA into nascent transcripts by using gene information extracted from a GTF.

Parameters:
  • fasta_path – Path to FASTA containing genomic sequences

  • out_path – Path to output FASTA that will contain nascent transcript sequences

  • gene_infos – Dictionary containing gene information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()

  • suffix – Suffix to append to output FASTA entry names. Defaults to an empty string.

  • show_progress – Whether to display a progress bar. Defaults to False.

Returns:

Path to written FASTA
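
A nascent transcript covers the whole gene span, introns included, so only gene-level coordinates are needed. The sketch below shows the idea and how a suffix could distinguish these entries from spliced ones. It is not the library's implementation: the helper name, the header layout, and the 0-based half-open coordinates are assumptions.

```python
# Hypothetical helper for illustration only; not part of ngs_tools.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def nascent_entry(
    chrom_seq: str,
    gene_id: str,
    start: int,
    end: int,
    strand: str,
    suffix: str = "",
) -> str:
    """Format one FASTA entry spanning the full gene, introns included.

    Assumes (start, end) is 0-based and half-open.
    """
    seq = chrom_seq[start:end]
    if strand == "-":
        # Read minus-strand genes from the opposite strand.
        seq = seq.translate(_COMPLEMENT)[::-1]
    # The suffix is appended to the entry name in the header line.
    return f">{gene_id}{suffix}\n{seq}\n"


chrom = "AAACCCGGGTTT"
print(nascent_entry(chrom, "GENE1", 0, 6, "+"), end="")
print(nascent_entry(chrom, "GENE1", 0, 6, "-", suffix="-I"), end="")
```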