gretel package¶
Submodules¶
gretel.gretel module¶
-
gretel.gretel.
generate_path
(n_snps, hansel, original_hansel, debug_hpos=None)[source]¶ Explore and generate the most likely path (haplotype) through the observed Hansel structure.
Parameters: - n_snps (int) – The number of variants.
- hansel (
hansel.hansel.Hansel
) – The Hansel structure currently being explored by Gretel. - original_hansel (
hansel.hansel.Hansel
) – A copy of the Hansel structure created by Gretel, before any reweighting.
Returns: - Path (list{str} or None) – The sequence of variants that represent the completed path (or haplotype), or None if one could not be successfully constructed.
- Path Probabilities (dict{str, float}) – The hp_original (orignal Hansel) and hp_current (current Hansel) joint probabilities of the variants in the returned path occurring together in the given order.
- Minimum Marginal (float) – The smallest marginal distribution observed across selected variants.
-
gretel.gretel.
reweight_hansel_from_path
(hansel, path, ratio)[source]¶ Given a completed path, reweight the applicable pairwise observations in the Hansel structure.
Parameters: hansel (
hansel.hansel.Hansel
) – The Hansel structure currently being explored by Gretel.path (list{str}) – The ordered sequence of selected variants.
ratio (float) – The proportion of evidence to remove from each paired observation that was considered to recover the provided path.
It is recommended this be the smallest marginal distribution observed across selected variants.
i.e. For each selected variant in the path, note the value of the marginal distribution for the probability of observing that particular variant at that genomic position. Parameterise the minimum value of those marginals.
Returns: Spent Observations – The sum of removed observations from the Hansel structure.
Return type:
gretel.util module¶
-
gretel.util.
get_ref_len_from_bam
(bam_path, target_contig)[source]¶ Fetch the length of a given reference sequence from a
pysam.AlignmentFile
.Parameters: - bam_path (str) – Path to the BAM alignment
- target_contig (str) – The name of the contig for which to recover haplotypes.
Returns: end_pos – The 1-indexed genomic position at which to stop considering variants.
Return type:
-
gretel.util.
load_fasta
(fa_path)[source]¶ A convenient wrapper function for constructing a
pysam.FastaFile
Parameters: fa_path (str) – Path to FASTA Returns: FASTA File Interface Return type: pysam.FastaFile
-
gretel.util.
load_from_bam
(bam_path, target_contig, start_pos, end_pos, vcf_handler, use_end_sentinels=False, n_threads=1, debug_reads=False, debug_pos=False, stepper='samtools')[source]¶ Load variants observed in a
pysam.AlignmentFile
to an instance ofhansel.hansel.Hansel
.Parameters: bam_path (str) – Path to the BAM alignment
target_contig (str) – The name of the contig for which to recover haplotypes.
start_pos (int) – The 1-indexed genomic position from which to begin considering variants.
end_pos (int) – The 1-indexed genomic position at which to stop considering variants.
vcf_handler (dict{str, any}) – Variant metadata, as provided by
gretel.gretel.process_vcf()
.use_end_sentinels (boolean, optional(default=False)) – Whether or not to append an additional pairwise observation between the final variant on a read towards a sentinel.
Note
Experimental This feature is for testing purposes, currently it is recommended that the flag be left at the default of False. However, some data sets report minor performance improvements for some haplotypes when set to True. This flag may be removed at any time without warning.
n_threads (int, optional(default=1)) – Number of threads to spawn for reading the BAM
debug_reads (list{str}, optional) – A list of read names for which to print out debugging information
debug_pos (list{int}, optional) – A list of positions for which to print out debugging information
stepper (str, optional(default=samtools)) – The pysam pileup stepper to use
Returns: Hansel
Return type:
-
gretel.util.
process_vcf
(vcf_path, contig_name, start_pos, end_pos)[source]¶ Parse a VCF to extract the genomic positions of called variants.
Parameters: - vcf_path (str) – Path to the VCF file.
- contig_name (str) – Name of the target contig on which variants were called.
- start_pos (int) – The 1-indexed genomic position from which to begin considering variants.
- end_pos (int) – The 1-indexed genomic position at which to stop considering variants.
Returns: Gretel Metastructure – A collection of structures used for the execution of Gretel. The currently used keys are:
- N : int
The number of observed SNPs
- snp_fwd : dict{int, int}
A reverse lookup from the n’th variant, to its genomic position on the contig
- snp_rev : dict{int, int}
A forward lookup to translate the n’th genomic position to its i’th SNP rank
- region : list{int}
A masked representation of the target contig, positive values are variant positions
Return type: