FASTA format
FASTA format is a standard format for encoding DNA or protein sequences. A FASTA file may contain one or more sequences. Each sequence is described by a title line followed by one or more data lines. The title line begins with a right angle bracket (">") followed by a label. The label ends at the first whitespace character; everything after that on the title line is considered a comment. The data lines begin right after the title line and contain the sequence characters in order. Each data line except the last should be exactly 60 letters long, although many programs allow some flexibility on line length.
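The rules above (title line, label, comment, data lines) can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the names `FastaRecord` and `parse_fasta` are hypothetical, the comment text in the example is invented for demonstration, and the sketch assumes that blank lines may be skipped and that data lines of any length are accepted.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class FastaRecord(NamedTuple):
    label: str     # text between '>' and the first whitespace character
    comment: str   # remainder of the title line (may be empty)
    sequence: str  # all data lines joined in order


def parse_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per sequence found in an iterable of lines."""
    label: Optional[str] = None
    comment = ""
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new title line closes the previous record, if any.
            if label is not None:
                yield FastaRecord(label, comment, "".join(chunks))
            parts = line[1:].split(None, 1)
            label = parts[0] if parts else ""
            comment = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif label is not None:
            chunks.append(line)  # data line belonging to the current record
    if label is not None:
        yield FastaRecord(label, comment, "".join(chunks))


# The comment after the label is made up for this example.
example = """>fig|393133.3.rna.2 hypothetical comment text
gggttgttagctcagttggtagagcagctgactcttaatcagcgggtcgggggttcgaaa
ccctcacaaccca
"""
records = list(parse_fasta(example.splitlines()))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so a large file does not have to be read into memory first.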
The first protein-encoding gene of Staph aureus MRSA 252 is shown in FASTA format. The letters in this example are amino acid codes. The box below shows a FASTA file containing multiple RNA genes from Listeria monocytogenes 10403S. In this case, the letters are DNA nucleotide codes, and the file extension would be either ".fasta" or ".fna" (for FASTA nucleic acid). When the sequences are amino acids, the file extension is either ".fasta" or ".faa" (for FASTA amino acid).
>fig|393133.3.rna.1
ggagaaatacccaagtccggctgaaggggacagactcgaaatctgttaggtggtgtatgc
cgcgccggggttcgaatccccgtttctccg
>fig|393133.3.rna.2
gggttgttagctcagttggtagagcagctgactcttaatcagcgggtcgggggttcgaaa
ccctcacaaccca
>fig|393133.3.rna.3
gcccatatagttaaacggatataacaagcccctcctaagggctagttcgtggttcgattc
cgcgtatgggcg
>fig|393133.3.rna.4
gccgctttagctcagttggtagagcacttccatggtaaggaaggggtcgtcggttcaaat
ccgacaagtggct
>fig|393133.3.rna.5
gtcctgatagctcagctggatagagcaacggccttctaagccgtcggtcgggggttcgaa
tccctctcaggacg
>fig|393133.3.rna.6
gagccgttagctcagttggtagagcatctgacttttaatcagagggtcgctggttcgaac
ccagcacggctca
>fig|393133.3.rna.7
gccggcttagctcagttggtagagcaactgatttgtaatcagtaggtcgcgagttcgact
cttgcagccggca
>fig|393133.3.rna.8
ggggaagtactcaagtggctgaagaggtgcccctgctaagggtataggtcgctcgcgcgg
cgcgagggttcaaatccctccttctccg
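Going the other way, a record with this layout can be produced by wrapping the sequence at 60 letters per data line. The following is a minimal sketch: the function name `format_fasta` and its parameters are hypothetical, and the default width of 60 simply follows the convention described in the text.

```python
def format_fasta(label: str, sequence: str, comment: str = "", width: int = 60) -> str:
    """Return one FASTA record as text, with data lines wrapped at `width` letters."""
    if not label or any(ch.isspace() for ch in label):
        # The label ends at the first whitespace, so it cannot contain any.
        raise ValueError("label must be non-empty and contain no whitespace")
    if width < 1:
        raise ValueError("width must be at least 1")
    title = ">" + label + (" " + comment if comment else "")
    # Slice the sequence into consecutive chunks; only the last may be shorter.
    data_lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([title] + data_lines) + "\n"


# Sequence taken from record fig|393133.3.rna.2 in the listing.
seq = (
    "gggttgttagctcagttggtagagcagctgactcttaatcagcgggtcgggggttcgaaa"
    "ccctcacaaccca"
)
text = format_fasta("fig|393133.3.rna.2", seq)
print(text, end="")
```

Writing several records to one file is then a matter of concatenating the returned strings, for example into a ".fna" file for nucleotide sequences.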