Freqgen

Freqgen is a tool for generating coding DNA sequences that match a specified amino acid sequence or amino acid usage profile, GC content, codon usage bias, and/or \(k\)-mer usage bias. To do this, Freqgen uses a genetic algorithm to search the space of possible DNA sequences for those that most closely match the target parameters.
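
To make the idea of a genetic algorithm over coding sequences concrete, here is a minimal, standard-library-only sketch. It is not Freqgen's implementation and does not use Freqgen's API: the function names (`optimize`, `fitness`, `mutate`, `crossover`), the single GC-content objective, and all parameter values are assumptions chosen for illustration. Each candidate is a list of codons that always encodes the same protein, so only synonymous changes are explored.

```python
import itertools
import random

# Standard genetic code, enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}
# Map each amino acid to the codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate an in-frame DNA string (length divisible by 3)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_content(codons):
    seq = "".join(codons)
    return (seq.count("G") + seq.count("C")) / len(seq)


def fitness(codons, target_gc):
    # Higher is better: negative distance from the target GC fraction.
    return -abs(gc_content(codons) - target_gc)


def crossover(a, b, rng):
    # Single-point crossover at a codon boundary keeps the protein intact.
    if len(a) < 2:
        return list(a)
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(codons, protein, rng, rate):
    # Swap a codon for a random synonymous codon with probability `rate`.
    return [
        rng.choice(SYNONYMS[aa]) if rng.random() < rate else codon
        for codon, aa in zip(codons, protein)
    ]


def optimize(protein, target_gc, generations=50, pop_size=40, rate=0.05, seed=0):
    """Hypothetical toy optimizer for a non-empty protein string."""
    rng = random.Random(seed)
    population = [
        [rng.choice(SYNONYMS[aa]) for aa in protein] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the fitter half unchanged and refill with mutated offspring.
        population.sort(key=lambda ind: fitness(ind, target_gc), reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   protein, rng, rate)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return "".join(max(population, key=lambda ind: fitness(ind, target_gc)))


protein = "MKTAYIAKQRQISFVKSHFSRQ"
dna = optimize(protein, target_gc=0.5)
assert translate(dna) == protein  # the encoded protein never changes
```

Because the fitter half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. Matching several statistics at once would amount to replacing `fitness` with a combined distance over all targets; how Freqgen weighs them is not shown here.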

Features

  • Supports both CLI and Python module usage
  • Thoroughly documented with examples
  • Leverages NumPy for C-optimized number crunching
  • Can simultaneously match multiple DNA statistics

Installation

Simply run:

$ pip install freqgen

Or, to get the latest (but not necessarily stable) development version:

$ pip install git+https://github.com/Lab41/freqgen.git

Five-second CLI tutorial

The basic flow of Freqgen can be summarized in three steps, plus an optional visualization step:

  1. Generate a new amino acid sequence based on the amino acid usage profile of a set of reference sequences. If you already have a specific amino acid sequence in mind (e.g. for synthetic biology applications), skip this step:

    $ freqgen aa reference_sequences.fna -o new_sequence.faa -l LENGTH
    
  2. Create a YAML file containing the \(k\)-mer frequencies that the DNA encoding the amino acid sequence should match:

    $ freqgen featurize reference_sequences.fna -k INT -o reference_freqs.yaml
    
  3. Generate the DNA sequence coding for the amino acid sequence:

    $ freqgen -t reference_freqs.yaml -s new_sequence.faa -v -o optimized.fna
    
  4. Visualize the results of the optimization (optional):

    $ freqgen visualize --target reference_freqs.yaml --optimized optimized.fna
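
As a rough picture of what the featurize step summarizes, the sketch below computes \(k\)-mer frequencies for a set of DNA sequences in plain Python. It is an illustration only, not Freqgen's code: the function name `kmer_frequencies` is hypothetical, and counting overlapping windows with a stride of one base is an assumption. Freqgen's own counting convention and the layout of its YAML output may differ.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k):
    """Return the relative frequency of every DNA k-mer across `sequences`.

    Assumption for illustration: windows overlap (stride of one base), and
    windows containing characters other than A, C, G, T are skipped.
    """
    alphabet = set("ACGT")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= alphabet:
                counts[kmer] += 1
    total = sum(counts.values())
    # Report all 4**k possible k-mers so unseen ones appear with frequency 0.
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product("ACGT", repeat=k)
    }


freqs = kmer_frequencies(["ATGC"], k=2)
# "ATGC" contains three overlapping 2-mers: AT, TG, and GC.
assert abs(freqs["AT"] - 1 / 3) < 1e-9
assert freqs["AA"] == 0.0
```

A dictionary like `freqs` is the kind of target an optimizer would compare candidate sequences against, for example by summing the absolute differences between target and observed frequencies.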
    

Citation

To be determined.
