API Reference

amino_acid_seq(length, frequencies)

Generates an amino acid sequence given frequencies of each amino acid.

Parameters:
  • length (int) – The length of the amino acid sequence to generate.
  • frequencies (dict) – A dictionary containing a mapping of each amino acid to its frequency.

Note

The sum of all the values in frequencies must be 1.

Returns:An amino acid sequence with the given frequencies.
Return type:str
Raises:ValueError – When the length of the sequence is invalid or when the probabilities do not sum to 1.

Example

>>> from Bio import SeqIO
>>> seq = SeqIO.read("beta_lactamase.fasta", "fasta").seq
>>> frequencies = k_mer_frequencies(seq, 1)
>>> amino_acid_seq(25, frequencies)
amino_acids_to_codons(aa_seq, codon_frequencies, genetic_code=11)

Generates a DNA representation of an amino acid sequence.

Parameters:
  • aa_seq (str) – The amino acids to convert to DNA.
  • codon_frequencies (dict) – A dictionary of codon frequencies for each amino acid. For each amino acid, the sum of the frequencies of its codons must be 1.
  • genetic_code (int, optional) – The genetic code to use when converting to DNA. Defaults to 11, the standard genetic code.
Returns:

A DNA sequence with the given codon usage.

Return type:

str

Example

>>> from Bio import SeqIO
>>> seq = SeqIO.read("sequence.fasta", "fasta").seq
>>> frequencies = codon_frequencies(seq)
>>> amino_acids_to_codons("INQTEL", frequencies)
'ATAAATCAAACCGAACTT'
codon_frequencies(seq, mode='absolute', genetic_code=11)

Calculates the frequency of each codon

Absolute mode is such that the total of the dictionary’s values is equal to one. Relative mode is such that the sum of each amino acid’s codons’ frequencies is equal to one.

Parameters:
  • seq (str or list) – The DNA sequence(s).
  • mode (str, optional) – One of “absolute” or “relative”. Defaults to “absolute”
  • genetic_code (int, optional) – The genetic code to use when converting to DNA. Defaults to 11, the standard genetic code.
Returns:

The codon frequencies of each codon.

Return type:

dict

Raises:

ValueError – When the sequence length is not divisible into codons, i.e. when sequence length is not divisible by three.

Example

>>> from Bio import SeqIO
>>> seq = SeqIO.read("sequence.fasta", "fasta").seq
>>> codon_frequencies(seq)
{'AAA': 0.016129032258064516,
 'AAC': 0.016129032258064516,
 'AAG': 0.016129032258064516,
 'AAT': 0.016129032258064516,
 'ACA': 0,
 'ACC': 0.016129032258064516,
 'ACG': 0,
 'ACT': 0.016129032258064516,
 'AGA': 0,
 'AGC': 0,
 'AGG': 0,
 'AGT': 0,
 'ATA': 0.016129032258064516,
 'ATC': 0,
 'ATG': 0.03225806451612903,
 'ATT': 0,
 'CAA': 0.04838709677419355,
 'CAC': 0,
 'CAG': 0,
 'CAT': 0.03225806451612903,
 'CCA': 0.03225806451612903,
 'CCC': 0.016129032258064516,
 'CCG': 0.04838709677419355,
 'CCT': 0.03225806451612903,
 'CGA': 0.016129032258064516,
 'CGC': 0,
 'CGG': 0.03225806451612903,
 'CGT': 0,
 'CTA': 0,
 'CTC': 0,
 'CTG': 0.03225806451612903,
 'CTT': 0.0967741935483871,
 'GAA': 0.016129032258064516,
 'GAC': 0,
 'GAG': 0,
 'GAT': 0.016129032258064516,
 'GCA': 0,
 'GCC': 0.06451612903225806,
 'GCG': 0.016129032258064516,
 'GCT': 0,
 'GGA': 0.03225806451612903,
 'GGC': 0.016129032258064516,
 'GGG': 0.016129032258064516,
 'GGT': 0,
 'GTA': 0,
 'GTC': 0.016129032258064516,
 'GTG': 0,
 'GTT': 0.04838709677419355,
 'TAA': 0,
 'TAC': 0,
 'TAG': 0,
 'TAT': 0.03225806451612903,
 'TCA': 0.016129032258064516,
 'TCC': 0.03225806451612903,
 'TCG': 0,
 'TCT': 0.03225806451612903,
 'TGA': 0,
 'TGC': 0.016129032258064516,
 'TGG': 0.016129032258064516,
 'TGT': 0,
 'TTA': 0,
 'TTC': 0.04838709677419355,
 'TTG': 0,
 'TTT': 0.03225806451612903}
k_mer_frequencies(seq, k, include_missing=True, vector=False, codons=False, genetic_code=11)

Calculates relative frequencies of each k-mer in the sequence.

Parameters:
  • seq (str or list) – The sequence(s) to for which to generate k-mer frequencies.
  • k (int or list) – the length of the k-mer(s).
  • include_missing (bool, optional) – If True, include missing k-mers as having a frequency of 0. Only supports DNA k-mers. Defaults to False.
  • vector (bool, optional) – Return a 1-D Numpy array of the k-mer frequencies, ordered by k-mers alphabetically. If True, include_missing must also be True. Defaults to False.
  • codons (bool, optional) – Whether to include a codon usage entry in the resulting dictionary. Defaults to False.
  • genetic_code (int, optional) – The genetic code to use when converting to DNA. Defaults to 11, the standard genetic code.
Returns:

A dict in which the keys are k values and the values are dictionaries mapping k-mers to floats of their frequencies.

Return type:

dict

Raises:
  • ValueError – When an invalid value of k is provided or include_missing is False and vector is True.
  • ValueError – When codons and vector are both True.
  • ValueError – When k or seq is not provided.

Example

>>> k_mer_frequencies("INQTEL", 1, include_missing=False)
{1: {'E': 0.16666666666666666,
     'I': 0.16666666666666666,
     'L': 0.16666666666666666,
     'N': 0.16666666666666666,
     'Q': 0.16666666666666666,
     'T': 0.16666666666666666}}
>>> k_mer_frequencies("GATGATGGC", [1, 2], include_missing=False)
{1: {'A': 0.2222222222222222,
     'C': 0.1111111111111111,
     'G': 0.4444444444444444,
     'T': 0.2222222222222222},
 2: {'AT': 0.25, 'GA': 0.25, 'GC': 0.125, 'GG': 0.125, 'TG': 0.25}}
>>> k_mer_frequencies(["A", "T"], 1, include_missing=False)
{1: {'A': 0.5, 'T': 0.5}}
>>> k_mer_frequencies("GATGATGGC", 2, include_missing=True)
{2: {'AA': 0,
     'AC': 0,
     'AG': 0,
     'AT': 0.25,
     'CA': 0,
     'CC': 0,
     'CG': 0,
     'CT': 0,
     'GA': 0.25,
     'GC': 0.125,
     'GG': 0.125,
     'GT': 0,
     'TA': 0,
     'TC': 0,
     'TG': 0.25,
     'TT': 0}}
>>> k_mer_frequencies("GATGATGGC", 2, include_missing=True, vector=True)
array([0.   , 0.   , 0.   , 0.25 , 0.   , 0.   , 0.   , 0.   , 0.25 ,
       0.125, 0.125, 0.   , 0.   , 0.   , 0.25 , 0.   ])
k_mers(seq, k)

Yields all k-mers in the input sequence with repeats.

Parameters:
  • seq (str) – The sequence for which to generate k-mers.
  • k (int) – the length of the k-mers.
Yields:

str – the next k-mer

Raises:

ValueError – When the value of k is less than the length of the sequence, k <= 0, or len(seq) is 0.

Example

>>> list(k_mers("GATTACA", 1))
['G', 'A', 'T', 'T', 'A', 'C', 'A']
>>> list(k_mers("GATTACA", 2))
['GA', 'AT', 'TT', 'TA', 'AC', 'CA']
>>> list(k_mers("GATTACA", 3))
['GAT', 'ATT', 'TTA', 'TAC', 'ACA']
>>> list(k_mers("GATTACA", 4))
['GATT', 'ATTA', 'TTAC', 'TACA']
>>> k_mers("GATTACA", 4)
<generator object k_mers at 0x10831d258>
generate(target_params, aa_seq, population_size=100, mutation_probability=0.3, crossover_probability=0.8, max_gens_since_improvement=50, improvement_rel_threshold=0.0, genetic_code=11, verbose=False, mode='ED', fitness_function=None)

Generate a sequence matching \(k\)-mer usage.

Parameters:
  • target_params (dict) – The parameters to optimize towards. Should be of the format {\(k_n\): {\(k_{n1}\): 0.2, \(k_{n2}\): 0.3,…}…}. Pass absolute codon usage with "codons" as the key.
  • aa_seq (str) – The amino acid sequence for the optimized sequence.
  • population_size (int, optional) – The size of the population for the genetic algorithm. Defaults to 100.
  • mutation_probability (float, optional) – The likelihood of changing each member of each generation. Defaults to 0.3.
  • crossover_probability (float, optional) – The likelihood of each member of the population undergoing crossover. Defaults to 0.8.
  • max_gens_since_improvement (int, optional) – The number of generations of no improvement after which to stop optimization. Defaults to 50.
  • improvement_rel_threshold (float, optional) – The minimum ftness improvement in precentage for which to reset the value of generations since improvement. Defaults to 0, meaning that any improvement resets the counter. Greater values will result in earlier stopping.
  • genetic_code (int, optional) – The genetic code to use. Defaults to 11, the standard genetic code.
  • verbose (bool, optional) – Whether to print the generation number, generations since improvement, and fitness. Defaults to false.
  • mode (str, optional) – Whether to use Jensen-Shannon Divergence or Euclidean distance. Defaults to "ED". Use "JSD" for Jensen-Shannon Divergence.
  • fitness_function (function, optional) – A function by which to measure the fitness of a potential sequence. Must take a vector representing a sequence and return a float, with lower scores indicating greater fitness. Defaults to None, in which case it uses either JSD or ED.
Returns:

The generated sequence.

Return type:

str

visualize(k_mers, target_freqs, optimized_freqs, original_freqs=None, title='Freqgen Optimization Results', plot_height=400, plot_width=1200, show=True, filepath='freqgen.html', codons=False)

Creates a visualization of the results of a Freqgen optimization.

Note

Currently, this function does not support the visualization of codon optimization and k-mer optimization simultaneously.

Parameters:
  • k_mers (list) – A list of the k-mers to use as the labels for the x-axis.
  • target_freqs (list) – A list of the target frequencies in the same order as the k_mers argument.
  • optimized_freqs (list) – A list of the resultant frequencies in the same order as the k_mers argument.
  • original_freqs (list, optional) – A list of the original frequencies in the same order as the k_mers argument.
  • title (str, optional) – A title to use for the graph. Defaults to “Freqgen Optimization Results”.
  • plot_height (int, optional) – The height for the graph. Defaults to 400.
  • plot_width (int, optional) – The width for the graph. Defaults to 1200.
  • show (bool, optional) – Whether to show the plot or simply return it. Defaults to True.
  • filepath (str, optional) – The output filepath. Defaults to “freqgen.html”.
  • codons (bool, optional) – Whether codons are included in the input vectors. If they are, the x-axis legend will updated accordingly.

Note

Codons must be denoted with a * in the k_mers argument. For example, the codon GAG should be passed as GAG*

Returns:A Bokeh figure containing the bar graph.
Return type:bokeh.plotting.figure.Figure