2 Data QC
2.1 Sample QC
Required Dataset (one of the following):
data.frame file from the Data Input page
SNP post-QC data.frame file from the subpage Data QC/SNP QC
Step 1: Get Summary
First, obtain the sample summary statistics (missing rate and heterozygosity rate) by clicking both Summary buttons. The results are displayed once each calculation finishes.
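To make the two sample statistics concrete, here is a minimal sketch of how they could be computed for one sample. It assumes genotypes are coded as 0, 1, or 2 copies of the alternate allele with `None` for a missing call, and that the heterozygosity rate is taken over called genotypes only. The function name `sample_summary` and the encoding are illustrative assumptions, not the tool's internal implementation.

```python
from typing import Dict, List, Optional

# Assumed encoding: 0/1/2 = copies of the alternate allele, None = missing call.
Genotype = Optional[int]


def sample_summary(genotypes: List[Genotype]) -> Dict[str, float]:
    """Return the missing rate and heterozygosity rate for one sample."""
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    nan = float("nan")
    # Missing rate: fraction of SNPs with no genotype call for this sample.
    missing_rate = (n_total - len(called)) / n_total if n_total else nan
    # Heterozygosity rate: fraction of called genotypes that are heterozygous
    # (assumption: missing calls are excluded from the denominator).
    het_rate = sum(1 for g in called if g == 1) / len(called) if called else nan
    return {"missing_rate": missing_rate, "het_rate": het_rate}


sample_a = [0, 1, 2, None, 1, 0, 0, 2]
summary_a = sample_summary(sample_a)
print(summary_a)
```

In this toy sample, one of eight calls is missing and two of the seven called genotypes are heterozygous.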
2.2 SNP QC
Required Dataset (one of the following):
data.frame file from the Data Input page
Sample post-QC data.frame file from the subpage Data QC/Sample QC
Step 1: Get Summary
First, obtain the SNP summary statistics [missing rate, minor allele frequency (MAF), heterozygosity rate, and Hardy-Weinberg equilibrium (HWE)] by clicking all Summary buttons. The results are displayed once each calculation finishes.
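As an illustration of what these four statistics measure, the sketch below computes them for a single SNP across samples. It assumes the same 0/1/2 genotype coding with `None` for missing calls, and it uses a one-degree-of-freedom chi-square goodness-of-fit test for HWE. The tool may use a different HWE test (for example an exact test), so treat the function `snp_summary` as a hypothetical stand-in rather than the tool's method.

```python
import math
from typing import Dict, List, Optional


def snp_summary(genotypes: List[Optional[int]]) -> Dict[str, float]:
    """Missing rate, MAF, heterozygosity rate, and HWE p-value for one SNP."""
    nan = float("nan")
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return {"missing_rate": 1.0, "maf": nan, "het_rate": nan, "hwe_p": nan}

    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the reference allele
    q = 1 - p                        # frequency of the alternate allele
    maf = min(p, q)

    if maf == 0:
        hwe_p = nan  # HWE is not testable for a monomorphic SNP
    else:
        observed = [n_aa, n_ab, n_bb]
        expected = [n * p * p, 2 * n * p * q, n * q * q]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Survival function of a chi-square distribution with 1 df.
        hwe_p = math.erfc(math.sqrt(chi2 / 2))

    return {
        "missing_rate": (n_total - n) / n_total,
        "maf": maf,
        "het_rate": n_ab / n,
        "hwe_p": hwe_p,
    }


snp_stats = snp_summary([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, None])
print(snp_stats)
```

A SNP whose genotype counts match the Hardy-Weinberg proportions exactly (for example 25, 50, and 25) gives a chi-square statistic of 0 and therefore a p-value of 1.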
Step 2: SNP QC
Adjust the thresholds and click the SNP QC by Thresholds button. This will generate the Post-QC data.frame file.
Note: If you prefer not to perform QC based on SNP missing rate, MAF, or heterozygosity rate, set the missing rate threshold to 1, the MAF threshold to 0, and the heterozygosity rate range to 0 and 1. Additionally, leave the ‘Do SNP QC by HWE’ checkbox unticked to skip QC based on SNP HWE.
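The effect of these "neutral" threshold values can be sketched as a simple filter. The example assumes that each SNP already has the four summary statistics, that thresholds are compared inclusively, and that HWE filtering is skipped when no significance level is given. The function name `passes_snp_qc`, its parameters, and the example values are hypothetical and only illustrate the logic.

```python
from typing import Dict, Optional, Tuple


def passes_snp_qc(
    stats: Dict[str, float],
    max_missing: float = 1.0,                     # 1 disables the missing-rate filter
    min_maf: float = 0.0,                         # 0 disables the MAF filter
    het_range: Tuple[float, float] = (0.0, 1.0),  # (0, 1) disables the heterozygosity filter
    hwe_alpha: Optional[float] = None,            # None mimics leaving the HWE checkbox unticked
) -> bool:
    """Return True if a SNP survives all enabled thresholds (inclusive bounds assumed)."""
    if stats["missing_rate"] > max_missing:
        return False
    if stats["maf"] < min_maf:
        return False
    low, high = het_range
    if not (low <= stats["het_rate"] <= high):
        return False
    if hwe_alpha is not None and stats["hwe_p"] < hwe_alpha:
        return False
    return True


# Made-up summary statistics for two SNPs.
snps = {
    "snp1": {"missing_rate": 0.02, "maf": 0.30, "het_rate": 0.40, "hwe_p": 0.6},
    "snp2": {"missing_rate": 0.25, "maf": 0.01, "het_rate": 0.02, "hwe_p": 1e-8},
}

kept_neutral = [name for name, s in snps.items() if passes_snp_qc(s)]
kept_strict = [
    name
    for name, s in snps.items()
    if passes_snp_qc(s, max_missing=0.1, min_maf=0.05, hwe_alpha=1e-6)
]
print(kept_neutral, kept_strict)
```

With the neutral settings every SNP is retained, whereas the stricter settings drop the second SNP.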
2.3 SNP Density
Required Datasets (both of the following):
Site Info. (RDS) of the current data.frame, downloadable from the Data Input or Data QC pages.
Chromosome Info. (CSV): Reference genome information of the current study.
Click here: Download an example of Chromosome Info. (CSV).
➡️ Example: Chromosome Info. of rice (Data source: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034140825.1/)
Chr | Start | End |
---|---|---|
Chr01 | 0 | 43929697 |
Chr02 | 0 | 36447916 |
Chr03 | 0 | 37399924 |
Chr04 | 0 | 36078568 |
Chr05 | 0 | 30400764 |
Chr06 | 0 | 32122276 |
Chr07 | 0 | 29936421 |
Chr08 | 0 | 28605474 |
Chr09 | 0 | 27474823 |
Chr10 | 0 | 23931887 |
Chr11 | 0 | 31111469 |
Chr12 | 0 | 28271460 |
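As a quick sanity check on a Chromosome Info. file, the sketch below parses a CSV with the same three columns and reports how many windows each chromosome would be split into for a given window size. It assumes the CSV header is exactly `Chr,Start,End` as in the table above. The helper `window_counts` is hypothetical and only two rows of the rice example are included for brevity.

```python
import csv
import io
import math
from typing import Dict

# Two rows copied from the rice example above; a real file lists every chromosome.
CHROM_CSV = """Chr,Start,End
Chr01,0,43929697
Chr10,0,23931887
"""


def window_counts(csv_text: str, window_kb: int) -> Dict[str, int]:
    """Number of windows per chromosome for a window size given in kb."""
    window_bp = window_kb * 1000
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        length = int(row["End"]) - int(row["Start"])
        # The last window may be shorter than window_bp, hence the ceiling.
        counts[row["Chr"]] = math.ceil(length / window_bp)
    return counts


counts = window_counts(CHROM_CSV, window_kb=500)
print(counts)
```

With a 500 kb window, Chr01 is covered by 88 windows and Chr10 by 48.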
Steps:
Upload Site Info. (RDS) and Chromosome Info. (CSV).
Choose a window size in kilobases (kb).
Click the Summary button. This will calculate the density of SNPs across the genome.
Outputs:
SNP Density Plot (PDF): An ideogram visualizing SNP density across the genome within a defined window size. A gradient color palette represents SNP density: green for lower densities, yellow for medium densities, and red for higher densities. Grey marks regions with no SNPs.
SNP Density (CSV): A table detailing SNP density across each chromosome, with the following columns:
bp_over_SNPs: The number of base pairs (bp) per SNP in each window, representing the average spacing between SNPs.
SNPs_over_1000bp: The number of SNPs per 1,000 bp in each window, providing a normalized measure of SNP density across the genome.
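The two density columns are reciprocal views of the same count, which the following sketch illustrates for one chromosome. It assumes SNP positions are given in bp, that windows are consecutive and non-overlapping, and that both values are computed from the actual width of each window. The function `snp_density` and the toy positions are hypothetical; the column names mirror those in the CSV output.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Union

Row = Dict[str, Union[int, float, None]]


def snp_density(positions: List[int], start: int, end: int, window_bp: int) -> List[Row]:
    """Per-window SNP counts and the two density measures for one chromosome."""
    n_windows = math.ceil((end - start) / window_bp)
    # Assign each SNP to a window; a SNP at the very end goes in the last window.
    counts = Counter(
        min((pos - start) // window_bp, n_windows - 1)
        for pos in positions
        if start <= pos <= end
    )
    rows = []
    for i in range(n_windows):
        w_start = start + i * window_bp
        w_end = min(w_start + window_bp, end)
        width = w_end - w_start
        n = counts.get(i, 0)
        bp_over_snps: Optional[float] = width / n if n else None  # undefined with no SNPs
        rows.append({
            "window_start": w_start,
            "window_end": w_end,
            "n_snps": n,
            "bp_over_SNPs": bp_over_snps,
            "SNPs_over_1000bp": n / width * 1000,
        })
    return rows


density_rows = snp_density([100, 2500, 2600, 9000], start=0, end=10000, window_bp=5000)
for row in density_rows:
    print(row)
```

Here the first 5,000 bp window holds three SNPs (about 1,667 bp per SNP, or 0.6 SNPs per 1,000 bp) and the second holds one (5,000 bp per SNP, or 0.2 SNPs per 1,000 bp).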
