gretel package

Submodules

gretel.cmd module

gretel.cmd.main()[source]

Gretel: A metagenomic haplotyper.

gretel.gretel module

gretel.gretel.generate_path(n_snps, hansel, original_hansel, debug_hpos=None)[source]

Explore and generate the most likely path (haplotype) through the observed Hansel structure.

Parameters:
  • n_snps (int) – The number of variants.
  • hansel (hansel.hansel.Hansel) – The Hansel structure currently being explored by Gretel.
  • original_hansel (hansel.hansel.Hansel) – A copy of the Hansel structure created by Gretel, before any reweighting.
Returns:

  • Path (list{str} or None) – The sequence of variants that represent the completed path (or haplotype), or None if one could not be successfully constructed.
  • Path Probabilities (dict{str, float}) – The hp_original (orignal Hansel) and hp_current (current Hansel) joint probabilities of the variants in the returned path occurring together in the given order.
  • Minimum Marginal (float) – The smallest marginal distribution observed across selected variants.

gretel.gretel.reweight_hansel_from_path(hansel, path, ratio)[source]

Given a completed path, reweight the applicable pairwise observations in the Hansel structure.

Parameters:
  • hansel (hansel.hansel.Hansel) – The Hansel structure currently being explored by Gretel.

  • path (list{str}) – The ordered sequence of selected variants.

  • ratio (float) – The proportion of evidence to remove from each paired observation that was considered to recover the provided path.

    It is recommended this be the smallest marginal distribution observed across selected variants.

    i.e. For each selected variant in the path, note the value of the marginal distribution for the probability of observing that particular variant at that genomic position. Parameterise the minimum value of those marginals.

Returns:

Spent Observations – The sum of removed observations from the Hansel structure.

Return type:

float

gretel.util module

gretel.util.get_ref_len_from_bam(bam_path, target_contig)[source]

Fetch the length of a given reference sequence from a pysam.AlignmentFile.

Parameters:
  • bam_path (str) – Path to the BAM alignment
  • target_contig (str) – The name of the contig for which to recover haplotypes.
Returns:

end_pos – The 1-indexed genomic position at which to stop considering variants.

Return type:

int

gretel.util.load_fasta(fa_path)[source]

A convenient wrapper function for constructing a pysam.FastaFile

Parameters:fa_path (str) – Path to FASTA
Returns:FASTA File Interface
Return type:pysam.FastaFile
gretel.util.load_from_bam(bam_path, target_contig, start_pos, end_pos, vcf_handler, use_end_sentinels=False, n_threads=1, debug_reads=False, debug_pos=False, stepper='samtools')[source]

Load variants observed in a pysam.AlignmentFile to an instance of hansel.hansel.Hansel.

Parameters:
  • bam_path (str) – Path to the BAM alignment

  • target_contig (str) – The name of the contig for which to recover haplotypes.

  • start_pos (int) – The 1-indexed genomic position from which to begin considering variants.

  • end_pos (int) – The 1-indexed genomic position at which to stop considering variants.

  • vcf_handler (dict{str, any}) – Variant metadata, as provided by gretel.gretel.process_vcf().

  • use_end_sentinels (boolean, optional(default=False)) – Whether or not to append an additional pairwise observation between the final variant on a read towards a sentinel.

    Note

    Experimental This feature is for testing purposes, currently it is recommended that the flag be left at the default of False. However, some data sets report minor performance improvements for some haplotypes when set to True. This flag may be removed at any time without warning.

  • n_threads (int, optional(default=1)) – Number of threads to spawn for reading the BAM

  • debug_reads (list{str}, optional) – A list of read names for which to print out debugging information

  • debug_pos (list{int}, optional) – A list of positions for which to print out debugging information

  • stepper (str, optional(default=samtools)) – The pysam pileup stepper to use

Returns:

Hansel

Return type:

hansel.hansel.Hansel

gretel.util.process_vcf(vcf_path, contig_name, start_pos, end_pos)[source]

Parse a VCF to extract the genomic positions of called variants.

Parameters:
  • vcf_path (str) – Path to the VCF file.
  • contig_name (str) – Name of the target contig on which variants were called.
  • start_pos (int) – The 1-indexed genomic position from which to begin considering variants.
  • end_pos (int) – The 1-indexed genomic position at which to stop considering variants.
Returns:

Gretel Metastructure – A collection of structures used for the execution of Gretel. The currently used keys are:

N : int

The number of observed SNPs

snp_fwd : dict{int, int}

A reverse lookup from the n’th variant, to its genomic position on the contig

snp_rev : dict{int, int}

A forward lookup to translate the n’th genomic position to its i’th SNP rank

region : list{int}

A masked representation of the target contig, positive values are variant positions

Return type:

dict

Module contents