昨天写了一篇(单倍型的显著性分析)的博文,里面介绍了为什么GWAS分析后,要进行单倍型的显著性分析,简而言之,如果显著性位点在block中,以block为代表进行利用,可以进行PRS(多基因评分)或者MAS(分子标记辅助选择。数据是plink格式的map和ped格式的基因型数据。使用plink1.9,通过参数--blocks计算单倍型$ plink --file a19 --blocks no-pheno-req PLINK v1.90b4.6 64-bit (15 Aug 2017) www.cog-genomics.org/plink/1.9/ (C) 2005-2017 Shaun Purcell, Christopher Chang GNU General Public License v3 Logging to plink.log. Options in effect: --blocks no-pheno-req --file a19
15488 MB RAM detected; reserving 7744 MB for main workspace. .ped scan complete (for binary autoconversion). Performing single-pass .bed write (54 variants, 310 people). --file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam written. 54 variants loaded from .bim file. 310 people (0 males, 0 females, 310 ambiguous) loaded from .fam. Ambiguous sex IDs written to plink.nosex . Using 1 thread (no multithreaded calculations invoked). Before main variant filters, 310 founders and 0 nonfounders present. Calculating allele frequencies... done. 54 variants and 310 people pass filters and QC. Note: No phenotypes present. --blocks: 8 haploblocks written to plink.blocks . Extra block details written to plink.blocks.det . Longest span: 102.68kb.
]$ cat plink.blocks.det CHR BP1 BP2 KB NSNPS SNPS 19 26609728 26619327 9.6 4 QTLSaanen19.135_O|QTLSaanen19.137_O|QTLSaanen19.26_O|QTLSaanen19.28_O 19 26647834 26659230 11.397 2 QTLSaanen19.32|GoatD01.005522 19 26818809 26824298 5.49 2 QTLSaanen19.45_O|QTLSaanen19.46 19 27170279 27214437 44.159 5 snp24006-scaffold244048787|QTLSaanen19.49|QTLSaanen19.50_O|QTLSaanen19.51|QTLSaanen19.53_O 19 27220123 27228780 8.658 2 QTLSaanen19.54_O|QTLSaanen19.55 19 27480793 27482976 2.184 2 snp24012-scaffold244-1259949|19_27482976_AF-PAKI 19 27502643 27605322 102.68 15 QTLSaanen19.63_O|QTLSaanen19.64|QTLSaanen19.65_O|snp24013-scaffold244-1309587|QTLSaanen19.67|QTLSaanen19.68|QTLSaanen19.69_O|snp24014-scaffold244-1338180|QTLSaanen19.70_O|QTLSaanen19.71|QTLSaanen19.74|QTLSaanen19.76_O|QTLSaanen19.77_O|QTLSaanen19.78_O|snp24015-scaffold244-1384770 19 27618016 27623534 5.519 3 QTLSaanen19.79|QTLSaanen19.79_O|QTLSaanen19.80
使用plink1.07,参数:--hap-freq$ plink1.07 --file a19 --hap plink.blocks --noweb --hap-freq
@----------------------------------------------------------@ | PLINK! | v1.07 | 10/Aug/2009 | |----------------------------------------------------------| | (C) 2009 Shaun Purcell, GNU General Public License, v2 | |----------------------------------------------------------| | For documentation, citation & bug-report instructions: | | http://pngu.mgh./purcell/plink/ | @----------------------------------------------------------@
Skipping web check... [ --noweb ] Writing this text to log file [ plink.log ] Analysis started: Wed Apr 2 07:56:49 2025
Options in effect: --file a19 --hap plink.blocks --noweb --hap-freq
54 (of 54) markers to be included from [ a19.map ] Warning, found 310 individuals with ambiguous sex codes These individuals will be set to missing ( or use --allow-no-sex ) Writing list of these individuals to [ plink.nosex ] 310 individuals read from [ a19.ped ] 0 individuals with nonmissing phenotypes Assuming a disease phenotype (1=unaff, 2=aff, 0=miss) Missing phenotype value is also -9 0 cases, 0 controls and 310 missing 0 males, 0 females, and 310 of unspecified sex Before frequency and genotyping pruning, there are 54 SNPs 310 founders and 0 non-founders found Total genotyping rate in remaining individuals is 1 0 SNPs failed missingness test ( GENO > 1 ) 0 SNPs failed frequency test ( MAF < 0 ) After frequency and genotyping pruning, there are 54 SNPs After filtering, 0 cases, 0 controls and 310 missing After filtering, 0 males, 0 females, and 310 of unspecified sex
Read 8 haplotypes from [ plink.blocks ] Estimating haplotype frequencies/phases ( MHF >= 0.01 ) Considering phases P(H|G) >= 0.01 Requiring per individual per haplotype missingness < 0.5 Writing haplotype frequencies to [ plink.frq.hap ] 8 out of 8 haplotypes phased Analysis finished: Wed Apr 2 07:56:49 2025
$ cat plink.frq.hap LOCUS HAPLOTYPE F H1 ACCC 0.05323 H1 GATC 0.35 H1 GATT 0.5968 H2 GC 0.4821 H2 AT 0.4885 H2 GT 0.01956 H3 CA 0.1339 H3 AG 0.8661 H4 CGGTT 0.1758 H4 TAGCC 0.01867 H4 CAACC 0.0477 H4 TAACC 0.7442 H5 GT 0.1774 H5 AC 0.4613 H5 GC 0.3613 H6 AT 0.1806 H6 GT 0.1435 H6 GC 0.6758 H7 TGCCAATCTTACAAT 0.1871 H7 ATTTGGTTCCCAGGC 0.25 H7 ATTTGGCTCCCAGGC 0.1581 H7 ATTTGGTCTCACAAT 0.04355 H7 ATTTGGTCTTACAAT 0.3323 H7 ATTTGGTTCCACAAT 0.01774 H8 AAG 0.1371 H8 GGT 0.1242 H8 AGT 0.3323 H8 AAT 0.4065
可以看到,H1~H8是八个block,每个block里面包括不同的单倍型以及对应的频率,比如第一个block,包括3个类型:对应的频率分别是:0.05323, 0.35和0.5968把对应的基因型提取出来,确定每个样本的单倍型类型,结合表型数据,就可以进行单倍型间的显著性分析了,进而绘制箱线图、小提琴图等图了。测试数据上传到知识星球了:(https://t./ML3ox)
|