

Freqgen is a tool for generating coding DNA sequences that match a specified amino acid sequence or amino acid usage frequencies, GC content, codon usage bias, and/or \(k\)-mer usage bias. To accomplish this, Freqgen uses genetic algorithms to efficiently search the solution space of possible DNA sequences for the ones that most closely match the desired parameters.
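To make the genetic-algorithm idea concrete, here is a minimal sketch that searches over synonymous codons for a fixed peptide so that the resulting DNA approaches a target GC content. It is an illustration only, not Freqgen's implementation: the function names (`optimize`, `fitness`, `gc_content`), the truncated codon table, the population size, and the mutation rate are all assumptions chosen for brevity.

```python
import random

# Subset of the standard genetic code, just enough for the demo peptide.
SYNONYMS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
}


def gc_content(codons):
    """Fraction of G and C bases in a list of codons."""
    dna = "".join(codons)
    return (dna.count("G") + dna.count("C")) / len(dna)


def fitness(codons, target_gc):
    """Distance from the target GC content (lower is better)."""
    return abs(gc_content(codons) - target_gc)


def optimize(protein, target_gc, generations=200, pop_size=30, seed=0):
    """Hypothetical helper: evolve a codon choice for `protein`."""
    rng = random.Random(seed)
    # Start from random synonymous encodings of the same peptide.
    population = [
        [rng.choice(SYNONYMS[aa]) for aa in protein] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=lambda c: fitness(c, target_gc))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # Crossover at a codon boundary, so the peptide is preserved.
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(protein))
            child = a[:cut] + b[cut:]
            # Mutation: occasionally swap in another synonymous codon.
            child = [
                rng.choice(SYNONYMS[aa]) if rng.random() < 0.05 else codon
                for aa, codon in zip(protein, child)
            ]
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda c: fitness(c, target_gc))
    return "".join(best)


dna = optimize("MKLGALGAK", target_gc=0.5)
print(dna, round(gc_content([dna]), 3))
```

Because crossover and mutation only ever exchange synonymous codons, every candidate encodes the same peptide, and the search is free to optimize DNA-level statistics alone.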


  • Supports both CLI and Python module usage
  • Thoroughly documented with examples
  • Leverages NumPy for C-optimized number crunching
  • Can simultaneously match multiple DNA statistics


Simply run:

$ pip install freqgen

Or, to get the latest (but not necessarily stable) development version:

$ pip install git+

Five-second CLI tutorial

The basic flow of Freqgen can be summarized in three steps, plus an optional visualization step:

  1. Generate a new amino acid sequence based on the amino acid usage profile of reference sequences. If you already have a specific amino acid sequence in mind (e.g. for synthetic biology uses), skip this step:

    $ freqgen aa reference_sequences.fna -o new_sequence.faa -l LENGTH
  2. Create a YAML file containing the target \(k\)-mer frequencies that the DNA encoding the amino acid sequence should match:

    $ freqgen featurize reference_sequences.fna -k INT -o reference_freqs.yaml
  3. Generate the DNA sequence coding for the amino acid sequence:

    $ freqgen -t reference_freqs.yaml -s new_sequence.faa -v -o optimized.fna
  4. Visualize the results of the optimization (optional):

    $ freqgen visualize --target reference_freqs.yaml --optimized optimized.fna
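As a rough picture of what the featurization step computes, the sketch below counts overlapping \(k\)-mers in a set of reference sequences and normalizes the counts to frequencies. It is not Freqgen's code: the names `kmer_frequencies` and `distance` are hypothetical, the layout of the YAML file that `freqgen featurize` writes is not shown here, and the Euclidean distance is only one assumed way of comparing a candidate sequence against a target profile.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seqs, k):
    """Relative frequency of every DNA k-mer across `seqs` (hypothetical helper)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # Slide a window of width k over the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Report all 4**k possible k-mers so that profiles share the same keys.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def distance(profile_a, profile_b):
    """Euclidean distance between two profiles with identical keys (assumed metric)."""
    return sum((profile_a[key] - profile_b[key]) ** 2 for key in profile_a) ** 0.5


target = kmer_frequencies(["ATGAAACTGGGTGCT", "ATGGCTCTGAAAGGT"], k=2)
candidate = kmer_frequencies(["ATGAAGTTAGGAGCA"], k=2)
print(round(distance(target, candidate), 4))
```

A smaller distance means the candidate's \(k\)-mer usage is closer to that of the reference sequences, which is the kind of quantity an optimizer can minimize.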


To be determined.
