String processing in R ← 糗世界

 panhoy 2015-11-20
> library(Biostrings)
> dna<-DNAString("TCTCCCAACCCTTGTACCAGT")
> Biostrings::dna2rna(dna)
  21-letter "RNAString" instance
seq: UCUCCCAACCCUUGUACCAGU
> rna<-transcribe(dna)
> rna
  21-letter "RNAString" instance
seq: AGAGGGUUGGGAACAUGGUCA
> rna2dna(rna)
  21-letter "DNAString" instance
seq: AGAGGGTTGGGAACATGGTCA
> cD<-cDNA(rna)
> cD
  21-letter "DNAString" instance
seq: TCTCCCAACCCTTGTACCAGT
> codons(rna)
  Views on a 21-letter RNAString subject
subject: AGAGGGUUGGGAACAUGGUCA
views:
    start end width
[1]     1   3     3 [AGA]
[2]     4   6     3 [GGG]
[3]     7   9     3 [UUG]
[4]    10  12     3 [GGA]
[5]    13  15     3 [ACA]
[6]    16  18     3 [UGG]
[7]    19  21     3 [UCA]
> AA<-translate(rna)
> AA
  7-letter "AAString" instance
seq: RGLGTWS
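
To make the codons() and translate() steps above less of a black box, here is a minimal base-R sketch of the same idea: cut the RNA string into triplets and look each one up. The names split_codons and codon_table are made up for this illustration, and the table holds only the seven codons that occur in this sequence (taken from the standard genetic code), so it is not a substitute for translate().

```r
# Base-R sketch: split an RNA string into codons and translate them.
# split_codons and codon_table are hypothetical names for this example only.
rna <- "AGAGGGUUGGGAACAUGGUCA"

split_codons <- function(x) {
  # Ignore trailing letters that do not fill a complete triplet
  n <- nchar(x) - nchar(x) %% 3
  if (n == 0) return(character(0))
  starts <- seq(1, n, by = 3)
  substring(x, starts, starts + 2)
}

# Partial lookup table: only the codons present in `rna` (standard genetic code)
codon_table <- c(AGA = "R", GGG = "G", UUG = "L", GGA = "G",
                 ACA = "T", UGG = "W", UCA = "S")

codons_vec <- split_codons(rna)
protein <- paste(codon_table[codons_vec], collapse = "")

print(codons_vec)
print(protein)
```

A codon missing from the partial table would come back as NA, which is one reason to prefer translate() on real data.
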
> complement(dna)
  21-letter "DNAString" instance
seq: AGAGGGTTGGGAACATGGTCA
> reverseComplement(dna)
  21-letter "DNAString" instance
seq: ACTGGTACAAGGGTTGGGAGA
> reverse(dna)
  21-letter "DNAString" instance
seq: TGACCATGTTCCCAACCCTCT
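
The conversions in the session above are all simple letter mappings and reversals, and the base-R sketch below reproduces them on a plain character string so the results can be checked by eye. The helper names (dna_to_rna, complement_dna, reverse_string) are invented for this example. As far as I know, dna2rna(), rna2dna(), transcribe() and cDNA() are no longer available in more recent Biostrings releases, where RNAString(dna), DNAString(rna) and complement() are meant to cover the same ground; treat that as an assumption and check the documentation of your installed version.

```r
# Base-R sketch of the letter-level operations behind the Biostrings calls.
# All helper names here are hypothetical, not part of Biostrings.
dna <- "TCTCCCAACCCTTGTACCAGT"

# T -> U, as dna2rna() did in the session above
dna_to_rna <- function(x) chartr("T", "U", x)

# A<->T and C<->G, as complement() does for unambiguous DNA letters
complement_dna <- function(x) chartr("ACGT", "TGCA", x)

# Reverse the order of the letters, as reverse() does
reverse_string <- function(x) paste(rev(strsplit(x, "")[[1]]), collapse = "")

rna_same_strand <- dna_to_rna(dna)
comp            <- complement_dna(dna)
transcribed     <- dna_to_rna(comp)      # same letters as transcribe(dna) above
revcomp         <- reverse_string(comp)
rev_dna         <- reverse_string(dna)

print(c(rna_same_strand, comp, transcribed, revcomp, rev_dna))
```

This only handles the four base letters; Biostrings also maps IUPAC ambiguity codes, which this sketch leaves unchanged.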
 
> library("BSgenome.Hsapiens.UCSC.hg19")
Loading required package: BSgenome
Loading required package: GenomicRanges
> Hsapiens
Human genome
| 
| organism: Homo sapiens (Human)
| provider: UCSC
| provider version: hg19
| release date: Feb. 2009
| release name: Genome Reference Consortium GRCh37
| 
| single sequences (see '?seqnames'):
|   chr1                   chr2                   chr3                 
|   chr4                   chr5                   chr6                 
|   chr7                   chr8                   chr9                 
|   chr10                  chr11                  chr12                
|   chr13                  chr14                  chr15                
|   chr16                  chr17                  chr18                
|   chr19                  chr20                  chr21                
|   chr22                  chrX                   chrY                 
|   chrM                   chr1_gl000191_random   chr1_gl000192_random 
|   chr4_ctg9_hap1         chr4_gl000193_random   chr4_gl000194_random 
|   chr6_apd_hap1          chr6_cox_hap2          chr6_dbb_hap3        
|   chr6_mann_hap4         chr6_mcf_hap5          chr6_qbl_hap6        
|   chr6_ssto_hap7         chr7_gl000195_random   chr8_gl000196_random 
|   chr8_gl000197_random   chr9_gl000198_random   chr9_gl000199_random 
|   chr9_gl000200_random   chr9_gl000201_random   chr11_gl000202_random
|   chr17_ctg5_hap1        chr17_gl000203_random  chr17_gl000204_random
|   chr17_gl000205_random  chr17_gl000206_random  chr18_gl000207_random
|   chr19_gl000208_random  chr19_gl000209_random  chr21_gl000210_random
|   chrUn_gl000211         chrUn_gl000212         chrUn_gl000213       
|   chrUn_gl000214         chrUn_gl000215         chrUn_gl000216       
|   chrUn_gl000217         chrUn_gl000218         chrUn_gl000219       
|   chrUn_gl000220         chrUn_gl000221         chrUn_gl000222       
|   chrUn_gl000223         chrUn_gl000224         chrUn_gl000225       
|   chrUn_gl000226         chrUn_gl000227         chrUn_gl000228       
|   chrUn_gl000229         chrUn_gl000230         chrUn_gl000231       
|   chrUn_gl000232         chrUn_gl000233         chrUn_gl000234       
|   chrUn_gl000235         chrUn_gl000236         chrUn_gl000237       
|   chrUn_gl000238         chrUn_gl000239         chrUn_gl000240       
|   chrUn_gl000241         chrUn_gl000242         chrUn_gl000243       
|   chrUn_gl000244         chrUn_gl000245         chrUn_gl000246       
|   chrUn_gl000247         chrUn_gl000248         chrUn_gl000249       
| 
| multiple sequences (see '?mseqnames'):
|   upstream1000  upstream2000  upstream5000  
| 
| (use the '$' or '[[' operator to access a given sequence)
> chr22NoN<-mask(Hsapiens$chr22,"N")
> alphabetFrequency(Hsapiens$chr22, baseOnly=TRUE)
      A       C       G       T   other 
9094775 8375984 8369235 9054551       0 
> alphabetFrequency(Hsapiens$chr22)
      A       C       G       T       M       R       W       S 
9094775 8375984 8369235 9054551       0       0       0       0 
      Y       K       V       H       D       B       N       - 
      0       0       0       0       0       0       0       0 
      + 
      0 
> hasOnlyBaseLetters(Hsapiens$chr22)
[1] TRUE
> uniqueLetters(Hsapiens$chr22)
[1] "A" "C" "G" "T"
> GC_content <- letterFrequency(Hsapiens$chr22, letters="CG")
> GC_content
     C|G 
16745219 
> GC_content <- letterFrequency(Hsapiens$chr22, letters="CG")/letterFrequency(Hsapiens$chr22, letters="ACGT")
> GC_content
      C|G 
0.4798807
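
The last step divides the G+C count by the A+C+G+T count rather than by the sequence length, so letters such as N do not dilute the ratio. The base-R sketch below shows that calculation on short strings and re-derives the chr22 figure from the counts printed above. The function name gc_fraction is made up for this example.

```r
# Base-R sketch: GC fraction over unambiguous bases only.
# gc_fraction is a hypothetical helper, not a Biostrings function.
gc_fraction <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  gc   <- sum(bases %in% c("G", "C"))
  acgt <- sum(bases %in% c("A", "C", "G", "T"))
  if (acgt == 0) return(NA_real_)   # no countable bases at all
  gc / acgt
}

short_gc  <- gc_fraction("TCTCCCAACCCTTGTACCAGT")  # 11 of 21 letters are G or C
with_n_gc <- gc_fraction("ACGTNNNN")               # N is ignored in the denominator

# Same ratio from the chr22 counts reported by alphabetFrequency() above
chr22_counts <- c(A = 9094775, C = 8375984, G = 8369235, T = 9054551)
chr22_gc <- sum(chr22_counts[c("C", "G")]) / sum(chr22_counts)

print(c(short_gc, with_n_gc, chr22_gc))
```

Dividing by nchar() instead would give 0.25 for "ACGTNNNN", which is why the denominator matters on sequences with gaps.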
