Cell Ranger aligns reads with STAR, which performs splicing-aware alignment of reads to the genome. It then uses the transcript annotation GTF to classify each read as exonic, intronic, or intergenic, and records whether the read aligns confidently to the genome. A read is exonic if at least 50% of it intersects an exon, intronic if it is non-exonic and intersects an intron, and intergenic otherwise.
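The classification rule can be illustrated with a small sketch. This is not Cell Ranger's implementation: it treats a read as one contiguous interval on a single transcript, ignores strand and spliced alignments, and assumes the exons do not overlap each other. The names `classify_read`, `introns_between`, and `overlap` are hypothetical.

```python
from typing import List, Tuple

# 1-based inclusive (start, end), the same convention GTF uses.
Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def introns_between(exons: List[Interval]) -> List[Interval]:
    """Gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    return [
        (left[1] + 1, right[0] - 1)
        for left, right in zip(ordered, ordered[1:])
        if right[0] - left[1] > 1
    ]


def classify_read(read: Interval, exons: List[Interval]) -> str:
    """Apply the 50% rule: exonic, else intronic, else intergenic."""
    read_len = read[1] - read[0] + 1
    # Assumes non-overlapping exons, so summing does not double count bases.
    exonic_bases = sum(overlap(read, exon) for exon in exons)
    if 2 * exonic_bases >= read_len:
        return "exonic"
    if any(overlap(read, intron) > 0 for intron in introns_between(exons)):
        return "intronic"
    return "intergenic"


if __name__ == "__main__":
    exons = [(100, 200), (300, 400)]
    for read in [(150, 249), (160, 259), (500, 599)]:
        print(read, classify_read(read, exons))
```

With the two toy exons above, a 100-base read that shares 51 bases with the first exon is called exonic, while one sharing only 41 bases falls through to the intron check.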
Basic annotation information (GTF columns):
| Column | Name | Description |
|---|---|---|
| 1 | Chromosome | Must refer to a chromosome/contig in the genome FASTA. |
| 2 | Source | Unused. |
| 3 | Feature | `cellranger count` only uses rows where this field is `exon`. |
| 4 | Start | Start position on the reference (1-based inclusive). |
| 5 | End | End position on the reference (1-based inclusive). |
| 6 | Score | Unused. |
| 7 | Strand | Strandedness of this feature on the reference: `+` or `-`. |
| 8 | Frame | Unused. |
| 9 | Attributes | A semicolon-delimited list of key-value pairs of the form `key "value"`. The attribute keys `transcript_id` and `gene_id` are required; `gene_name` is optional and may be non-unique, but if present it will be preferentially displayed in reports. |
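To make the column layout concrete, here is a minimal sketch of reading GTF lines, keeping only `exon` rows, and checking the required attributes. It is an illustration under simple assumptions (tab-separated lines, quoted attribute values), not how Cell Ranger parses the file; `exon_records`, `parse_attributes`, and `display_name` are hypothetical names.

```python
import re
from typing import Dict, Iterable, Iterator, NamedTuple


class ExonRecord(NamedTuple):
    chrom: str
    start: int  # 1-based inclusive
    end: int    # 1-based inclusive
    strand: str
    attributes: Dict[str, str]


# Matches pairs of the form: key "value"
# Unquoted values (some GTFs write e.g. exon_number 1) are ignored here.
_ATTR_RE = re.compile(r'([A-Za-z_][\w.]*)\s+"([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn column 9 into a dict; a repeated key keeps its last value."""
    return dict(_ATTR_RE.findall(field))


def exon_records(lines: Iterable[str]) -> Iterator[ExonRecord]:
    """Yield one record per exon row, skipping comments and other features."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        missing = {"gene_id", "transcript_id"} - attrs.keys()
        if missing:
            raise ValueError(f"exon row lacks required attribute(s): {sorted(missing)}")
        yield ExonRecord(cols[0], int(cols[3]), int(cols[4]), cols[6], attrs)


def display_name(record: ExonRecord) -> str:
    """Prefer gene_name when present, otherwise fall back to gene_id."""
    return record.attributes.get("gene_name", record.attributes["gene_id"])
```

A typical use would be `for rec in exon_records(open("genes.gtf")): ...`, where the file name is only an example.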
    genome_output/
    ├── fasta
    │   └── genome.fa
    ├── genes
    │   └── genes.gtf
    ├── pickle
    │   └── genes.pickle
    ├── reference.json
    └── star    # STAR genome index folder
For the genome sequence, include all major chromosomes, unplaced and unlocalized scaffolds, but do not include patches and alternative haplotypes.
- In Ensembl, the recommended genome file to download is annotated as "primary assembly."
- In NCBI, it is "no alternative - analysis set."
For the GTF file, genes must be annotated with feature type 'exon' (column 3).
- Prior to mkref, GTF annotation files from Ensembl and NCBI are typically filtered with `mkgtf` to include only a subset of the annotated gene biotypes.
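The idea behind biotype filtering can be sketched in a few lines. This is not `mkgtf` itself, only an illustration of keeping rows whose gene biotype is in an allowed set. It assumes the biotype is stored under the attribute key `gene_biotype` (Ensembl-style; other sources may use a different key), and `filter_by_biotype` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator, Set

# Assumption: the biotype lives in an attribute written as gene_biotype "value".
_BIOTYPE_RE = re.compile(r'\bgene_biotype\s+"([^"]*)"')


def filter_by_biotype(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield header lines plus rows whose gene_biotype is in `keep`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # pass header/comment lines through unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip blank or malformed rows
        match = _BIOTYPE_RE.search(cols[8])
        # Rows without a gene_biotype attribute are dropped in this sketch.
        if match and match.group(1) in keep:
            yield line
```

For example, `filter_by_biotype(open("genes.gtf"), {"protein_coding", "lncRNA"})` would stream the retained lines, which could then be written to a new file before building the reference. The file name and the biotype set are placeholders.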
See also: Creating a Reference Package with cellranger mkref (10x Genomics documentation).