gretel package
Submodules
gretel.gretel module
gretel.gretel.append_path(path, next_m, next_v)
Append a selected variant to a given path.
Deprecated since version 1.0: This method is somewhat of a stub, and it may be deprecated without notice in a future release.
Parameters:
- path (list{str}) – The current sequence of variants representing a path (haplotype) in progress.
- next_m (str) – The symbol to append to the path.
- next_v (float) – The marginal probability of next_m at the current position.
Raises: Exception – Raised if next_m is None.
gretel.gretel.generate_path(n_snps, hansel, original_hansel)
Explore and generate the most likely path (haplotype) through the observed Hansel structure.
Parameters:
- n_snps (int) – The number of variants.
- hansel (hansel.hansel.Hansel) – The Hansel structure currently being explored by Gretel.
- original_hansel (hansel.hansel.Hansel) – A copy of the Hansel structure created by Gretel, before any reweighting.
Returns:
- Path (list{str} or None) – The sequence of variants that represents the completed path (or haplotype), or None if one could not be constructed.
- Path Probabilities (dict{str, float}) – The unweighted (original Hansel) and weighted (current Hansel) joint probabilities of the variants in the returned path occurring together in the given order.
- Minimum Marginal (float) – The smallest marginal probability observed across the selected variants.
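The step-by-step idea behind path generation can be illustrated on a toy model. This is not Gretel's implementation: the hypothetical `greedy_path` function below uses a plain list of adjacent-variant counts as a stand-in for a Hansel structure, conditions only on the previously selected variant (Gretel considers up to L positions back), and picks the best-supported symbol at each step. Because it keeps no unreweighted copy, it reports a single joint probability, plus the smallest per-step probability as a stand-in for the Minimum Marginal.

```python
from collections import Counter
from typing import List, Optional, Tuple


def greedy_path(counts: List[Counter]) -> Tuple[Optional[List[str]], float, float]:
    """Toy greedy walk over adjacent-variant evidence (not Gretel's algorithm).

    counts[i][(a, b)] is a hypothetical count of reads carrying symbol a at
    variant i and symbol b at variant i + 1 (variants are 0-indexed).
    Returns (path or None, joint probability, minimum per-step probability).
    """
    if not counts:
        return None, 0.0, 0.0  # the toy needs at least two variants
    # Distribution of the first variant, summed over its pairings with the second.
    dist: Counter = Counter()
    for (a, _b), c in counts[0].items():
        dist[a] += c
    path: List[str] = []
    joint, min_marginal = 1.0, 1.0
    for i in range(len(counts) + 1):
        if i > 0:
            # Distribution at variant i, conditioned only on the previous selection.
            head = path[-1]
            dist = Counter({b: c for (a, b), c in counts[i - 1].items() if a == head})
        total = sum(dist.values())
        if total == 0:
            return None, 0.0, 0.0  # dead end: no evidence extends the path
        # Pick the best-supported symbol (ties broken by the larger symbol).
        symbol, c = max(dist.items(), key=lambda kv: (kv[1], kv[0]))
        p = c / total
        path.append(symbol)
        joint *= p
        min_marginal = min(min_marginal, p)
    return path, joint, min_marginal
```

Returning None on a dead end mirrors the documented behaviour of returning None when no path can be constructed.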
gretel.gretel.process_bam(vcf_handler, bam_path, contig_name, start_pos, end_pos, L, use_end_sentinels, n_threads)
Initialise a Hansel structure and load variants from a BAM.
Parameters:
- vcf_handler (dict{str, any}) – Variant metadata, as provided by gretel.gretel.process_vcf().
- bam_path (str) – Path to the alignment BAM.
- contig_name (str) – The name of the contig for which to recover haplotypes.
- start_pos (int) – The 1-indexed genomic position from which to begin considering variants.
- end_pos (int) – The 1-indexed genomic position at which to stop considering variants.
- L (int) – The Gretel L-parameter, controlling the number of positions back from the head of the current path (including the head) to consider when calculating conditional probabilities.
- use_end_sentinels (boolean) – Whether or not to add an extra pairwise observation between the final variant on a read and a sentinel.
- n_threads (int) – Number of threads to spawn for reading the BAM.
Returns: Gretel Metastructure – A collection of structures used for the execution of Gretel. The currently used keys are:
- read_support (hansel.hansel.Hansel) – The Hansel structure.
- read_support_o (hansel.hansel.Hansel) – A copy of the Hansel structure, stored so that its observations are never reweighted.
- meta (dict{str, any}) – A dictionary of metadata returned from the BAM parsing, such as a list of the number of variants that each read spans.
Return type: dict
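Keeping both read_support and read_support_o means reweighting can only ever touch the working copy. A minimal sketch of that pattern, assuming a plain dict of toy pair counts in place of a real hansel.hansel.Hansel object; the helper `build_metastructure` and the meta key `read_spans` are hypothetical names used only for illustration.

```python
import copy
from typing import Any, Dict


def build_metastructure(pair_counts: Dict[Any, float], meta: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble a Gretel-style metastructure from toy evidence (not a real Hansel object)."""
    return {
        "read_support": pair_counts,                   # explored and reweighted
        "read_support_o": copy.deepcopy(pair_counts),  # independent copy, never reweighted
        "meta": meta,
    }


# Usage: reweighting the working structure leaves the original copy untouched.
ms = build_metastructure({("A", "C"): 10.0}, {"read_spans": [2, 2, 1]})
ms["read_support"][("A", "C")] *= 0.5
```

A deep copy (rather than a second reference to the same object) is what guarantees the two structures diverge once reweighting starts.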
gretel.gretel.process_vcf(vcf_path, contig_name, start_pos, end_pos)
Parse a VCF to extract the genomic positions of called variants.
Parameters:
- vcf_path (str) – Path to the VCF file.
- contig_name (str) – Name of the target contig on which variants were called.
- start_pos (int) – The 1-indexed genomic position from which to begin considering variants.
- end_pos (int) – The 1-indexed genomic position at which to stop considering variants.
Returns: Gretel Metastructure – A collection of structures used for the execution of Gretel. The currently used keys are:
- N (int) – The number of observed SNPs.
- snp_fwd (dict{int, int}) – A lookup from the n’th variant (its SNP rank) to its genomic position on the contig.
- snp_rev (dict{int, int}) – A reverse lookup from a variant’s genomic position to its SNP rank.
- region (list{int}) – A masked representation of the target contig, in which positive values mark variant positions.
Return type: dict
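The relationship between snp_fwd, snp_rev and region can be sketched from a list of variant positions. This skips VCF parsing entirely (a real parser might read records with pysam.VariantFile), and several details are assumptions rather than Gretel's documented behaviour: the hypothetical `build_vcf_handler` counts ranks from 1, and region is a list indexed by genomic position that stores each variant's rank.

```python
from typing import Any, Dict, Iterable, List


def build_vcf_handler(positions: Iterable[int], start_pos: int, end_pos: int) -> Dict[str, Any]:
    """Build toy N / snp_fwd / snp_rev / region structures from 1-indexed positions."""
    # Keep variants inside the inclusive [start_pos, end_pos] window, in genomic order.
    kept = sorted(p for p in set(positions) if start_pos <= p <= end_pos)
    snp_fwd: Dict[int, int] = {}
    snp_rev: Dict[int, int] = {}
    # Index 0 is unused so that list indices match 1-indexed genomic positions.
    region: List[int] = [0] * (end_pos + 1)
    for rank, pos in enumerate(kept, start=1):
        snp_fwd[rank] = pos  # n'th variant -> genomic position
        snp_rev[pos] = rank  # genomic position -> SNP rank
        region[pos] = rank   # a positive value marks a variant position
    return {"N": len(kept), "snp_fwd": snp_fwd, "snp_rev": snp_rev, "region": region}


vcf = build_vcf_handler([120, 15, 42, 300], start_pos=10, end_pos=200)
```

Here position 300 falls outside the window and is dropped, so the three remaining variants receive ranks 1 to 3 in genomic order.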
gretel.gretel.reweight_hansel_from_path(hansel, path, ratio)
Given a completed path, reweight the applicable pairwise observations in the Hansel structure.
Parameters:
- hansel (hansel.hansel.Hansel) – The Hansel structure currently being explored by Gretel.
- path (list{str}) – The ordered sequence of selected variants.
- ratio (float) – The proportion of evidence to remove from each paired observation that was considered to recover the provided path. It is recommended that this be the smallest marginal probability observed across the selected variants: for each selected variant in the path, note the marginal probability of observing that variant at its genomic position, then pass the minimum of those values. This is the same quantity as the Minimum Marginal returned by gretel.gretel.generate_path().
Returns: Spent Observations – The sum of observations removed from the Hansel structure.
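Reweighting with the recommended ratio can be sketched on a toy dict of pairwise weights. The hypothetical `reweight_from_path` below is not the Hansel implementation: it only reweights adjacent pairs along the path (Gretel may consider pairs further apart, up to L), but it shows how ratio is chosen as the minimum of the selected variants' marginals and how the spent total is accumulated.

```python
from typing import Dict, List, Tuple

# Hypothetical toy evidence: (i, a, j, b) -> weight of symbol a at variant i
# observed together with symbol b at variant j (variants are 0-indexed).
Pairs = Dict[Tuple[int, str, int, str], float]


def reweight_from_path(pairs: Pairs, path: List[str], ratio: float) -> float:
    """Remove `ratio` of the evidence for each adjacent pair used by `path`; return the total removed."""
    spent = 0.0
    for i in range(len(path) - 1):
        key = (i, path[i], i + 1, path[i + 1])
        if key in pairs:
            removed = pairs[key] * ratio
            pairs[key] -= removed
            spent += removed
    return spent


pairs = {(0, "A", 1, "C"): 8.0, (1, "C", 2, "G"): 6.0, (1, "C", 2, "T"): 2.0}
marginals = [0.8, 1.0, 0.75]  # per-variant probabilities of the selected symbols
spent = reweight_from_path(pairs, ["A", "C", "G"], min(marginals))
```

Using the minimum marginal removes only as much evidence as the weakest link of the path can justify, leaving the remainder available for later haplotypes; here 10.5 units are spent and the unused C/T pair keeps its full weight.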
gretel.util module
gretel.util.load_fasta(fa_path)
A convenience wrapper for constructing a pysam.FastaFile.
Parameters:
- fa_path (str) – Path to the FASTA file.
Returns: FASTA file interface
Return type: pysam.FastaFile
gretel.util.load_from_bam(bam_path, target_contig, start_pos, end_pos, vcf_handler, use_end_sentinels=False, n_threads=1)
Load variants observed in a pysam.AlignmentFile into an instance of hansel.hansel.Hansel.
Parameters:
- bam_path (str) – Path to the BAM alignment.
- target_contig (str) – The name of the contig for which to recover haplotypes.
- start_pos (int) – The 1-indexed genomic position from which to begin considering variants.
- end_pos (int) – The 1-indexed genomic position at which to stop considering variants.
- vcf_handler (dict{str, any}) – Variant metadata, as provided by gretel.gretel.process_vcf().
- use_end_sentinels (boolean, optional (default=False)) – Whether or not to add an extra pairwise observation between the final variant on a read and a sentinel.
  Note: Experimental. This feature is for testing purposes, and it is currently recommended that the flag be left at the default of False. However, on some data sets, setting it to True has given minor performance improvements for some haplotypes. This flag may be removed at any time without warning.
- n_threads (int, optional (default=1)) – Number of threads to spawn for reading the BAM.
Returns: Metadata – A dictionary of metadata that may be useful later. It is primarily used to return a list of integers describing the number of variants covered by each read in the provided alignment BAM.
Return type: dict{str, any}
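How reads become pairwise observations, and where the per-read span counts in the returned metadata come from, can be sketched without a BAM. In this toy, each read is a hypothetical dict from 1-indexed genomic position to observed base, every pair of called variants on the same read counts as one co-observation, and the end sentinel is modelled by pairing a read's final variant with a placeholder symbol at the next rank. The function `load_reads`, the `SENTINEL` marker and the `read_spans` key are all illustrative assumptions, not Gretel's or Hansel's internals.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

SENTINEL = "_"  # hypothetical placeholder symbol standing in for the end sentinel


def load_reads(
    reads: List[Dict[int, str]],
    snp_rev: Dict[int, int],
    use_end_sentinels: bool = False,
) -> Tuple[Counter, Dict[str, Any]]:
    """Count pairwise co-observations of variants on each read (toy, not Hansel)."""
    pairs: Counter = Counter()
    spans: List[int] = []
    for read in reads:
        # Translate genomic positions to SNP ranks, keeping only called variants.
        obs = sorted((snp_rev[pos], base) for pos, base in read.items() if pos in snp_rev)
        spans.append(len(obs))  # number of variants this read spans
        # Every pair of variants on the same read is one observation of them together.
        for x in range(len(obs)):
            for y in range(x + 1, len(obs)):
                (i, a), (j, b) = obs[x], obs[y]
                pairs[(i, a, j, b)] += 1
        if use_end_sentinels and obs:
            # Optionally pair the read's final variant with the sentinel.
            i, a = obs[-1]
            pairs[(i, a, i + 1, SENTINEL)] += 1
    return pairs, {"read_spans": spans}
```

For example, with snp_rev = {15: 1, 42: 2, 120: 3}, a read covering positions 15, 42 and 120 contributes three pairs and a span of 3, while a position absent from snp_rev (not a called variant) is ignored.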