Freqgen
Freqgen is a tool to generate coding DNA sequences with specified amino acid usage frequencies or sequence, GC content, codon usage bias, and/or \(k\)-mer usage bias. To accomplish this, Freqgen uses genetic algorithms to efficiently search the solution space of possible DNA sequences to find ones that most closely match the desired parameters.
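To make the genetic-algorithm idea concrete, here is a minimal sketch in plain Python. It is not Freqgen's implementation: it targets only GC content, and every name in it (`optimize_gc`, `mutate`, `crossover`, the population size, the mutation rate) is a hypothetical choice made for illustration. It assumes the standard genetic code and keeps the amino acid sequence fixed by swapping codons only for synonymous ones.

```python
import itertools
import random

# Standard genetic code (NCBI translation table 1), codons ordered over "TCAG".
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product("TCAG", repeat=3), AMINO_ACIDS)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_content(dna):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in dna) / len(dna)


def mutate(codons, protein, rng, rate=0.1):
    """Replace each codon with a random synonymous codon with probability `rate`."""
    return [
        rng.choice(SYNONYMS[aa]) if rng.random() < rate else codon
        for codon, aa in zip(codons, protein)
    ]


def crossover(a, b, rng):
    """Single-point crossover at a codon boundary, so the protein is unchanged."""
    if len(a) < 2:
        return list(a)
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def optimize_gc(protein, target_gc, generations=200, population_size=50, seed=0):
    """Toy genetic algorithm: find a DNA encoding of `protein` near `target_gc`."""
    rng = random.Random(seed)

    def fitness(codons):
        # Lower is better: distance between actual and desired GC content.
        return abs(gc_content("".join(codons)) - target_gc)

    population = [
        [rng.choice(SYNONYMS[aa]) for aa in protein]
        for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: population_size // 2]  # keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), protein, rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return "".join(min(population, key=fitness))
```

Calling `optimize_gc("MKTAYIAKQR", target_gc=0.5)` returns a 30-base DNA string that still translates to the same protein. Matching several statistics at once, as Freqgen does, would amount to combining several such distance terms into one fitness function.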
Features
- Supports both CLI and Python module usage
- Thoroughly documented with examples
- Leverages NumPy for C-optimized number crunching
- Can simultaneously match multiple DNA statistics
Installation
Simply run:
$ pip install freqgen
Or, to get the latest (but not necessarily stable) development version:
$ pip install git+https://github.com/Lab41/freqgen.git
Five-second CLI tutorial
The basic flow of Freqgen can be summarized in three steps, plus an optional fourth step for visualization:
Generate a new amino acid sequence based on the amino acid usage profile of reference sequences. If you already have a specific amino acid sequence in mind (e.g. for synthetic biology uses), skip this step:
$ freqgen aa reference_sequences.fna -o new_sequence.faa -l LENGTH
Create a YAML file containing the target \(k\)-mer frequencies that the DNA encoding the amino acid sequence should match:
$ freqgen featurize reference_sequences.fna -k INT -o reference_freqs.yaml
Generate the DNA sequence coding for the amino acid sequence:
$ freqgen -t reference_freqs.yaml -s new_sequence.faa -v -o optimized.fna
Visualize the results of the optimization (optional):
$ freqgen visualize --target reference_freqs.yaml --optimized optimized.fna
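For intuition about what the first two steps compute, here is a rough sketch in plain Python. It does not use Freqgen's API, and the function names (`amino_acid_profile`, `sample_protein`, `kmer_frequencies`) are hypothetical. It assumes the standard genetic code, excludes stop codons from the amino acid profile, and counts overlapping \(k\)-mers; Freqgen's own conventions may differ.

```python
import itertools
import random
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons ordered over "TCAG".
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


def amino_acid_profile(reference_dna):
    """Relative amino acid usage across reference coding sequences (stops excluded)."""
    counts = Counter()
    for dna in reference_dna:
        counts.update(aa for aa in translate(dna) if aa != "*")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def sample_protein(profile, length, seed=None):
    """Draw a random amino acid sequence whose expected usage follows `profile`."""
    rng = random.Random(seed)
    residues = list(profile)
    weights = [profile[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def kmer_frequencies(dna, k):
    """Relative frequency of each overlapping k-mer in `dna`."""
    if not 1 <= k <= len(dna):
        raise ValueError("k must be between 1 and the sequence length")
    counts = Counter(dna[i:i + k] for i in range(len(dna) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in sorted(counts.items())}


# Example usage with a made-up reference sequence.
profile = amino_acid_profile(["ATGGCCGCCTAA"])
new_protein = sample_protein(profile, length=20, seed=0)
target_freqs = kmer_frequencies("ATGGCCATG", k=2)
```

In this sketch, `profile` plays the role of the amino acid usage profile behind `freqgen aa`, and `target_freqs` is the kind of mapping that `freqgen featurize` would write to YAML. The exact layout of Freqgen's YAML file is not shown here.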
Citation
To be determined.