gretel package

Submodules

gretel.cmd module

gretel.cmd.main()[source]

Gretel: A metagenomic haplotyper.

gretel.gretel module

gretel.gretel.append_path(path, next_m, next_v)[source]

Append a selected variant to a given path. .. deprecated:: 1.0

This method is somewhat of a stub. It is likely to be deprecated at no notice in future.
Parameters:
  • path (list{str}) – The current sequence of variants representing a path (haplotype) in progress.
  • next_m (str) – The symbol to append to the path.
  • next_v (float) – The marginal probability of next_m at the current position.
Raises:

Exception – Raised if next_m is None.

gretel.gretel.generate_path(n_snps, hansel, original_hansel)[source]

Explore and generate the most likely path (haplotype) through the observed Hansel structure.

Parameters:
  • n_snps (int) – The number of variants.
  • hansel (hansel.hansel.Hansel) – The Hansel structure currently being explored by Gretel.
  • original_hansel (hansel.hansel.Hansel) – A copy of the Hansel structure created by Gretel, before any reweighting.
Returns:

  • Path (list{str} or None) – The sequence of variants that represent the completed path (or haplotype), or None if one could not be successfully constructed.
  • Path Probabilities (dict{str, float}) – The unweighted (orignal Hansel) and weighted (current Hansel) joint probabilities of the variants in the returned path occurring together in the given order.
  • Minimum Marginal (float) – The smallest marginal distribution observed across selected variants.

gretel.gretel.process_bam(vcf_handler, bam_path, contig_name, start_pos, end_pos, L, use_end_sentinels, n_threads)[source]

Initialise a Hansel structure and load variants from a BAM.

Parameters:
  • vcf_handler (dict{str, any}) – Variant metadata, as provided by gretel.gretel.process_vcf().
  • bam_path (str) – Path to the alignment BAM.
  • contig_name (str) – The name of the contig for which to recover haplotypes.
  • start_pos (int) – The 1-indexed genomic position from which to begin considering variants.
  • end_pos (int) – The 1-indexed genomic position at which to stop considering variants.
  • L (int) – The Gretel L-parameter, controlling the number of positions back from the head of the current path (including the head) to consider when calculating conditional probabilities.
  • use_end_sentinels (boolean) – Whether or not to append an additional pairwise observation between the final variant on a read towards a sentinel.
  • n_threads (int) – Number of threads to spawn for reading the BAM
Returns:

Gretel Metastructure – A collection of structures used for the execution of Gretel. The currently used keys are:

read_support : hansel.hansel.Hansel

The Hansel structure.

read_support_o : hansel.hansel.Hansel

A copy of the Hansel structure stored with the intention of not reweighting its observations.

meta : dict{str, any}

A dictionary of metadata returned from the BAM parsing, such as a list of the number of variants that each read spans.

Return type:

dict

gretel.gretel.process_vcf(vcf_path, contig_name, start_pos, end_pos)[source]

Parse a VCF to extract the genomic positions of called variants.

Parameters:
  • vcf_path (str) – Path to the VCF file.
  • contig_name (str) – Name of the target contig on which variants were called.
  • start_pos (int) – The 1-indexed genomic position from which to begin considering variants.
  • end_pos (int) – The 1-indexed genomic position at which to stop considering variants.
Returns:

Gretel Metastructure – A collection of structures used for the execution of Gretel. The currently used keys are:

N : int

The number of observed SNPs

snp_fwd : dict{int, int}

A reverse lookup from the n’th variant, to its genomic position on the contig

snp_rev : dict{int, int}

A forward lookup to translate the n’th genomic position to its i’th SNP rank

region : list{int}

A masked representation of the target contig, positive values are variant positions

Return type:

dict

gretel.gretel.reweight_hansel_from_path(hansel, path, ratio)[source]

Given a completed path, reweight the applicable pairwise observations in the Hansel structure.

Parameters:
  • hansel (hansel.hansel.Hansel) – The Hansel structure currently being explored by Gretel.

  • path (list{str}) – The ordered sequence of selected variants.

  • ratio (float) – The proportion of evidence to remove from each paired observation that was considered to recover the provided path.

    It is recommended this be the smallest marginal distribution observed across selected variants.

    i.e. For each selected variant in the path, note the value of the marginal distribution for the probability of observing that particular variant at that genomic position. Parameterise the minimum value of those marginals.

Returns:

Spent Observations – The sum of removed observations from the Hansel structure.

Return type:

float

gretel.util module

gretel.util.load_fasta(fa_path)[source]

A convenient wrapper function for constructing a pysam.FastaFile

Parameters:fa_path (str) – Path to FASTA
Returns:FASTA File Interface
Return type:pysam.FastaFile
gretel.util.load_from_bam(bam_path, target_contig, start_pos, end_pos, vcf_handler, use_end_sentinels=False, n_threads=1)[source]

Load variants observed in a pysam.AlignmentFile to an instance of hansel.hansel.Hansel.

Parameters:
  • bam_path (str) – Path to the BAM alignment

  • target_contig (str) – The name of the contig for which to recover haplotypes.

  • start_pos (int) – The 1-indexed genomic position from which to begin considering variants.

  • end_pos (int) – The 1-indexed genomic position at which to stop considering variants.

  • vcf_handler (dict{str, any}) – Variant metadata, as provided by gretel.gretel.process_vcf().

  • use_end_sentinels (boolean, optional(default=False)) – Whether or not to append an additional pairwise observation between the final variant on a read towards a sentinel.

    Note

    Experimental This feature is for testing purposes, currently it is recommended that the flag be left at the default of False. However, some data sets report minor performance improvements for some haplotypes when set to True. This flag may be removed at any time without warning.

  • n_threads (int, optional(default=1)) – Number of threads to spawn for reading the BAM

Returns:

Metadata – A dictionary of metadata that may come in useful later. Primarily used to return a list of integers describing the number of variants covered by each read in the provided alignment BAM.

Return type:

dict{str, any}

Module contents