Dereplication of a FASTA file


Purpose

This tool takes a FASTA-formatted sequence file and removes all duplicate sequences (clones with 100% sequence identity), keeping one representative of each. It returns a ZIP file that contains three files: the dereplicated sequences, a names file for running Mothur on the dereplicated sequences, and a two-column multiplicity file. In the multiplicity file, the first column holds the tag of the representative sequence and the second column holds the "multiplicity" of that sequence, namely the number of clones of that sequence, counting the representative itself. For example:

S11235 1
S81321 2
S34558 17
S91442 1
S33377 34
...


For instance, if every sequence in your FASTA file is unique, the second column will contain only 1s. You can use this two-column file to create a UniFrac environment file.
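
The dereplication described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation, which is not shown on this page. The function names (read_fasta, dereplicate, multiplicity_lines, names_lines) are hypothetical, and several choices are assumptions: the first tag seen for a sequence is taken as its representative, sequences are compared case-insensitively, and the two columns of the multiplicity file are separated by a single space. The names file lines follow Mothur's two-column layout, with the representative tag, a tab, and then a comma-separated list of all tags in the group.

```python
def read_fasta(lines):
    """Yield (tag, sequence) pairs from an iterable of FASTA lines."""
    tag, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if tag is not None:
                yield tag, "".join(chunks)
            # Assumption: the tag is the first whitespace-delimited word of the header.
            header = line[1:].split()
            tag, chunks = (header[0] if header else ""), []
        elif tag is not None:
            chunks.append(line)
    if tag is not None:
        yield tag, "".join(chunks)


def dereplicate(records):
    """Group tags by identical sequence; the first tag seen is the representative."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for tag, seq in records:
        # Assumption: identity is checked case-insensitively.
        groups.setdefault(seq.upper(), []).append(tag)
    return groups


def multiplicity_lines(groups):
    """One 'representative multiplicity' line per unique sequence."""
    return [f"{tags[0]} {len(tags)}" for tags in groups.values()]


def names_lines(groups):
    """One Mothur-style names line per unique sequence: representative, tab, all tags."""
    return [f"{tags[0]}\t{','.join(tags)}" for tags in groups.values()]


# Made-up three-sequence input: S1 and S2 are clones of each other, S3 is unique.
example = """>S1
ACGT
>S2
ACGT
>S3
GGCC
""".splitlines()

groups = dereplicate(read_fasta(example))
print("\n".join(multiplicity_lines(groups)))
# S1 2
# S3 1
```

Here S1 represents two clones (itself and S2), so its multiplicity is 2, while S3 has multiplicity 1.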

Input Form

* denotes mandatory parameters

Sequence file*:
