ngs_tools.fasta
Package Contents
Functions
split_genomic_fasta_to_cdna | Split a genomic FASTA into cDNA by using gene and transcript information
split_genomic_fasta_to_intron | Split a genomic FASTA into introns by using gene and transcript information
split_genomic_fasta_to_nascent | Split a genomic FASTA into nascent transcripts by using gene information
- ngs_tools.fasta.split_genomic_fasta_to_cdna(fasta_path: str, out_path: str, gene_infos: dict, transcript_infos: dict, show_progress: bool = False) -> str
Split a genomic FASTA into cDNA sequences, using gene and transcript information extracted from a GTF.
- Parameters:
fasta_path – Path to FASTA containing genomic sequences
out_path – Path to output FASTA that will contain cDNA sequences
gene_infos – Dictionary containing gene information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()
transcript_infos – Dictionary containing transcript information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()
show_progress – Whether to display a progress bar. Defaults to False.
- Returns:
Path to written FASTA
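
The idea behind cDNA extraction can be sketched in plain Python. This is only an illustration of the concept, not the library's implementation: `splice_cdna`, `reverse_complement`, the 0-based half-open exon coordinates, and the toy sequence are all hypothetical, and the layout of `gene_infos` and `transcript_infos` is not assumed here.

```python
from typing import List, Tuple

# Translation table used to complement DNA bases (hypothetical helper).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice_cdna(chrom_seq: str, exons: List[Tuple[int, int]], strand: str) -> str:
    """Concatenate exon sequences in genomic order.

    Assumes exons are 0-based, half-open (start, end) intervals on the
    chromosome. Minus-strand transcripts are reverse complemented.
    """
    spliced = "".join(chrom_seq[start:end] for start, end in sorted(exons))
    return reverse_complement(spliced) if strand == "-" else spliced


# Toy chromosome with two exons separated by a three-base intron.
chrom = "AAACCCGGGTTT"
exons = [(0, 3), (6, 9)]
plus_cdna = splice_cdna(chrom, exons, "+")
minus_cdna = splice_cdna(chrom, exons, "-")
```

In this toy case the intron `CCC` is dropped and the two exons are joined, which is the essential difference between a cDNA entry and the underlying genomic sequence.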
- ngs_tools.fasta.split_genomic_fasta_to_intron(fasta_path: str, out_path: str, gene_infos: dict, transcript_infos: dict, flank: int = 30, show_progress: bool = False) -> str
Split a genomic FASTA into intron sequences, using gene and transcript information extracted from a GTF. Optionally append flanking sequences to each intron and collapse introns whose flanking regions overlap.
- Parameters:
fasta_path – Path to FASTA containing genomic sequences
out_path – Path to output FASTA that will contain intron sequences
gene_infos – Dictionary containing gene information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()
transcript_infos – Dictionary containing transcript information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()
flank – Number of flanking bases to include for each intron. Defaults to 30.
show_progress – Whether to display a progress bar. Defaults to False.
- Returns:
Path to written FASTA
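
The effect of `flank` and of collapsing overlapping flanking regions can be illustrated with a small interval sketch. It is not the library's code: `introns_with_flank`, the `chrom_length` argument, the 0-based half-open coordinates, and the choice to merge intervals that merely touch are assumptions made for this example.

```python
from typing import List, Tuple


def introns_with_flank(
    exons: List[Tuple[int, int]], flank: int, chrom_length: int
) -> List[Tuple[int, int]]:
    """Return flank-extended intron intervals, merging those that overlap.

    Assumes exons are 0-based, half-open (start, end) intervals belonging
    to one transcript.
    """
    exons = sorted(exons)
    # Introns are the gaps between consecutive exons.
    introns = [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        if next_start > prev_end
    ]
    merged: List[Tuple[int, int]] = []
    for start, end in introns:
        # Extend by the flank, clamped to the chromosome boundaries.
        start, end = max(0, start - flank), min(chrom_length, end + flank)
        if merged and start <= merged[-1][1]:
            # Flanks overlap (or touch) the previous interval: collapse them.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


exons = [(0, 10), (50, 55), (100, 120)]
collapsed = introns_with_flank(exons, flank=5, chrom_length=200)
separate = introns_with_flank(exons, flank=2, chrom_length=200)
```

With a five-base flank the two introns around the short middle exon grow into each other and become a single interval, while a two-base flank leaves them as separate entries.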
- ngs_tools.fasta.split_genomic_fasta_to_nascent(fasta_path: str, out_path: str, gene_infos: dict, suffix='', show_progress: bool = False) -> str
Split a genomic FASTA into nascent transcript sequences, using gene information extracted from a GTF.
- Parameters:
fasta_path – Path to FASTA containing genomic sequences
out_path – Path to output FASTA that will contain nascent transcript sequences
gene_infos – Dictionary containing gene information, as returned by ngs_tools.gtf.genes_and_transcripts_from_gtf()
suffix – Suffix to append to output FASTA entry names. Defaults to an empty string.
show_progress – Whether to display a progress bar. Defaults to False.
- Returns:
Path to written FASTA
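
As a rough illustration of what a nascent transcript entry is, the sketch below takes the full span of a gene (exons and introns together) and appends a suffix to the entry name. It is a conceptual example only: `nascent_entry`, the coordinate convention, the strand handling, and the `-I` suffix are hypothetical and are not taken from the library.

```python
# Translation table used to complement DNA bases (hypothetical helper).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def nascent_entry(
    chrom_seq: str, gene_id: str, start: int, end: int, strand: str, suffix: str = ""
) -> str:
    """Format one FASTA entry covering the whole gene span.

    Assumes a 0-based, half-open (start, end) gene interval; minus-strand
    genes are reverse complemented.
    """
    seq = chrom_seq[start:end]
    if strand == "-":
        seq = reverse_complement(seq)
    # The suffix is appended directly to the entry name.
    return f">{gene_id}{suffix}\n{seq}\n"


chrom = "AAACCCGGGTTT"
entry = nascent_entry(chrom, "GENE1", 0, 6, "-", suffix="-I")
```

A suffix like this is one way to keep nascent entries distinguishable from cDNA entries that share the same gene identifier when both FASTA files are later combined.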