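The Clustal programs are probably alignment tools that everyone already knows well. ClustalX is the graphical version (the one most people run on Windows), and ClustalW is the command-line version. The latter is the one more commonly used in bioinformatics work, and it is today's topic.

Most people who have used ClustalW assume it is an interactive program (three years ago I thought so too). Its menu options are clear, but it seems to handle only one dataset at a time, as shown in the figure below:

Recently a junior labmate asked me whether there was a way to align a whole batch of homologous protein sequence groups, because aligning one group after another by hand in ClustalW was far too tedious. I recommended a few other alignment tools, such as MAFFT, which supports multithreading. But ClustalW itself can also be run fully automatically in batch.

On the command line, typing just clustalw brings up the menu shown in the figure above. Instead, try adding -HELP after it, that is, run:

clustalw -HELP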
You will get a long list of parameters (see the appendix below). With these parameters you can run clustalw non-interactively by passing options directly on the command line. So if you need to align many groups of protein or nucleotide sequences, just generate the commands (or a script containing them) in advance, for example with a Perl script, and run them. By now it should be clear how to batch-align sequences with clustalw.

CLUSTAL 2.0.12 Multiple Sequence Alignments

DATA (sequences)
-INFILE=file.ext :input sequences.
-PROFILE1=file.ext and -PROFILE2=file.ext :profiles (old alignment).

VERBS (do things)
-OPTIONS :list the command line parameters
-HELP or -CHECK :outline the command line params.
-FULLHELP :output full help content.
-ALIGN :do full multiple alignment.
-TREE :calculate NJ tree.
-PIM :output percent identity matrix (while calculating the tree)
-BOOTSTRAP(=n) :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
-CONVERT :output the input sequences in a different file format.

PARAMETERS (set things)

***General settings:****
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE :use FAST algorithm for the alignment guide tree
-TYPE= :PROTEIN or DNA sequences
-NEGATIVE :protein alignment with negative values in matrix
-OUTFILE= :sequence alignment file name
-OUTPUT= :GCG, GDE, PHYLIP, PIR or NEXUS
-OUTORDER= :INPUT or ALIGNED
-CASE :LOWER or UPPER (for GDE output only)
-SEQNOS= :OFF or ON (for Clustal output only)
-SEQNO_RANGE= :OFF or ON (NEW: for all output formats)
-RANGE=m,n :sequence range to write starting m to m+n
-MAXSEQLEN=n :maximum allowed input sequence length
-QUIET :Reduce console output to minimum
-STATS= :Log some alignment statistics to file

***Fast Pairwise Alignments:***
-KTUPLE=n :word size
-TOPDIAGS=n :number of best diags.
-WINDOW=n :window around best diags.
-PAIRGAP=n :gap penalty
-SCORE :PERCENT or ABSOLUTE

***Slow Pairwise Alignments:***
-PWMATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f :gap opening penalty
-PWGAPEXT=f :gap extension penalty

***Multiple Alignments:***
-NEWTREE= :file for new guide tree
-USETREE= :file for old guide tree
-MATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f :gap opening penalty
-GAPEXT=f :gap extension penalty
-ENDGAPS :no end gap separation pen.
-GAPDIST=n :gap separation pen. range
-NOPGAP :residue-specific gaps off
-NOHGAP :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res.
-MAXDIV=n :% ident. for delay
-TYPE= :PROTEIN or DNA
-TRANSWEIGHT=f :transitions weighting
-ITERATION= :NONE or TREE or ALIGNMENT
-NUMITER=n :maximum number of iterations to perform
-NOWEIGHTS :disable sequence weighting

***Profile Alignments:***
-PROFILE :Merge two alignments by profile alignment
-NEWTREE1= :file for new guide tree for profile1
-NEWTREE2= :file for new guide tree for profile2
-USETREE1= :file for old guide tree for profile1
-USETREE2= :file for old guide tree for profile2

***Sequence to Profile Alignments:***
-SEQUENCES :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE= :file for new guide tree
-USETREE= :file for old guide tree

***Structure Alignments:***
-NOSECSTR1 :do not use secondary structure-gap penalty mask for profile 1
-NOSECSTR2 :do not use secondary structure-gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE :output in alignment file
-HELIXGAP=n :gap penalty for helix core residues
-STRANDGAP=n :gap penalty for strand core residues
-LOOPGAP=n :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n :number of residues outside strand to be treated as terminal

***Trees:***
-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n :seed number for bootstraps.
-KIMURA :use Kimura's correction.
-TOSSGAPS :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display
-CLUSTERING= :NJ or UPGMA
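Back to batch alignment: the parameters above are all you need to turn the group-by-group routine into a script. The post suggests generating the commands with Perl; the sketch below does the same thing in Python instead. It rests on a few assumptions of mine, not the original author's: each homologous group is stored in its own FASTA file in one directory, the executable is called clustalw (some installations name it clustalw2), and the helper names build_command and run_all are hypothetical. Only options from the listing above are used: -INFILE, -ALIGN, -TYPE, -OUTFILE and -QUIET.

```python
"""Batch ClustalW: one non-interactive clustalw run per input FASTA file.

Assumptions (not from the original post): one homologous group per FASTA
file, and the executable is named "clustalw" (sometimes "clustalw2").
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(infile, outdir, seqtype="PROTEIN", exe="clustalw"):
    """Return the argument list for aligning one group without the menus."""
    infile = Path(infile)
    outfile = Path(outdir) / (infile.stem + ".aln")
    return [
        exe,
        f"-INFILE={infile}",     # input sequences
        "-ALIGN",                # do a full multiple alignment
        f"-TYPE={seqtype}",      # PROTEIN or DNA
        f"-OUTFILE={outfile}",   # alignment file name
        "-QUIET",                # reduce console output to a minimum
    ]


def run_all(indir, outdir, pattern="*.fasta", seqtype="PROTEIN",
            jobs=1, dry_run=False):
    """Align every file in indir matching pattern; return the commands.

    With dry_run=True nothing is executed or created, so the generated
    commands can be checked first.
    """
    files = sorted(Path(indir).glob(pattern))
    commands = [build_command(f, outdir, seqtype) for f in files]
    if dry_run:
        return commands
    Path(outdir).mkdir(parents=True, exist_ok=True)
    # Each clustalw process aligns one group; jobs > 1 runs several
    # independent groups at the same time as separate processes.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # check=True raises CalledProcessError if any run fails.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return commands
```

For example, run_all("groups", "alignments", dry_run=True) only lists the commands it would run, and run_all("groups", "alignments", jobs=4) then aligns every groups/*.fasta file and writes alignments/<name>.aln. Running several groups side by side is simply a way to keep more CPU cores busy; it does not make any single ClustalW alignment faster, which is where MAFFT's own multithreading comes in.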