4 Data QC
➡️ This section contains three subpages: Sample QC, SNP QC, and SNP Density. Use them to assess the quality of samples and SNPs in the data.frame and to visualize SNP density across the genome.
4.1 Sample QC
Required Dataset (one of the following):
data.frame file from the Data Input page
SNP post-QC data.frame file from the subpage Data QC/SNP QC
Step 1: Get Summary
Click both Summary buttons to obtain the sample summary statistics (missing rate and heterozygosity rate). The results are displayed on the page.
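To make the two sample statistics concrete, here is a minimal sketch of how they could be computed outside the app. It assumes genotypes are coded as 0, 1, or 2 copies of the alternate allele with `None` for a missing call, and that the heterozygosity rate is taken over non-missing calls only. The function name `sample_summary` and the coding scheme are illustrative assumptions, not the app's internals.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def sample_summary(calls: Sequence[Genotype]) -> dict:
    """Return the missing rate and heterozygosity rate for one sample."""
    n_total = len(calls)
    called = [g for g in calls if g is not None]
    n_missing = n_total - len(called)
    n_het = sum(1 for g in called if g == 1)
    return {
        # Fraction of SNPs with no genotype call for this sample.
        "missing_rate": n_missing / n_total if n_total else float("nan"),
        # Fraction of heterozygous calls among the non-missing calls.
        "het_rate": n_het / len(called) if called else float("nan"),
    }


# Toy data: two samples genotyped at five SNPs.
genotypes = {
    "sample_A": [0, 1, 2, None, 1],
    "sample_B": [0, 0, None, None, 2],
}
summary = {name: sample_summary(calls) for name, calls in genotypes.items()}
for name, stats in summary.items():
    print(name, stats)
```

Samples with an unusually high missing rate or an extreme heterozygosity rate are the ones a sample-level threshold would typically flag.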
4.2 SNP QC
Required Dataset (one of the following):
data.frame file from the Data Input page
Sample post-QC data.frame file from the subpage Data QC/Sample QC
Step 1: Get Summary
Click all Summary buttons to obtain the SNP summary statistics: missing rate, minor allele frequency (MAF), heterozygosity rate, and Hardy-Weinberg equilibrium (HWE). The results are displayed on the page.
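The four SNP statistics can be illustrated with a short sketch for a single SNP. It assumes the same 0/1/2 genotype coding with `None` for missing calls, and it uses a one-degree-of-freedom chi-square test for HWE as a stand-in; the app may use a different test (for example an exact test), so treat `snp_summary` as a hypothetical helper rather than the app's method.

```python
import math
from collections import Counter


def snp_summary(calls):
    """Summarize one SNP: missing rate, MAF, heterozygosity rate, HWE chi-square test."""
    n_total = len(calls)
    counts = Counter(g for g in calls if g is not None)
    n0, n1, n2 = counts[0], counts[1], counts[2]
    n_called = n0 + n1 + n2
    nan = float("nan")
    stats = {
        "missing_rate": (n_total - n_called) / n_total if n_total else nan,
        "maf": nan, "het_rate": nan, "hwe_chi2": nan, "hwe_p": nan,
    }
    if n_called == 0:
        return stats
    alt_freq = (n1 + 2 * n2) / (2 * n_called)
    stats["maf"] = min(alt_freq, 1 - alt_freq)
    stats["het_rate"] = n1 / n_called
    # The HWE test is undefined for monomorphic SNPs (an expected count is zero).
    if 0 < alt_freq < 1:
        ref_freq = 1 - alt_freq
        expected = (
            n_called * ref_freq ** 2,
            2 * n_called * ref_freq * alt_freq,
            n_called * alt_freq ** 2,
        )
        chi2 = sum((o - e) ** 2 / e for o, e in zip((n0, n1, n2), expected))
        stats["hwe_chi2"] = chi2
        # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
        stats["hwe_p"] = math.erfc(math.sqrt(chi2 / 2))
    return stats


in_hwe = snp_summary([0, 0, 0, 0, 1, 1, 1, 1, 2, None])
all_het = snp_summary([1] * 10)
print(in_hwe)
print(all_het)
```

The first toy SNP has genotype counts that match the HWE expectation, so its p-value is close to 1; the second is heterozygous in every sample and gets a small p-value.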
Step 2: SNP QC
Adjust the thresholds and click the SNP QC by Thresholds button. This will generate the Post-QC data.frame file.
Note: To skip QC based on SNP missing rate, MAF, or heterozygosity rate, set the corresponding threshold to a value that excludes nothing: missing rate to 1, MAF to 0, and the heterozygosity rate range to 0 and 1. To skip QC based on SNP HWE, leave the ‘Do SNP QC by HWE’ checkbox unticked.
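The effect of these neutral settings can be sketched as a simple filter. The function `passes_snp_qc`, its parameter names, and the inclusive comparisons are assumptions made for illustration; the app's exact comparison rules are not documented here.

```python
def passes_snp_qc(stats, max_missing=1.0, min_maf=0.0,
                  het_range=(0.0, 1.0), hwe_alpha=None):
    """Return True if one SNP's summary statistics pass every active filter."""
    het_low, het_high = het_range
    if stats["missing_rate"] > max_missing:
        return False
    if stats["maf"] < min_maf:
        return False
    if not (het_low <= stats["het_rate"] <= het_high):
        return False
    # hwe_alpha=None mirrors leaving the HWE checkbox unticked.
    if hwe_alpha is not None and stats["hwe_p"] < hwe_alpha:
        return False
    return True


# Hypothetical per-SNP summary statistics.
snps = {
    "snp1": {"missing_rate": 0.02, "maf": 0.30, "het_rate": 0.40, "hwe_p": 0.50},
    "snp2": {"missing_rate": 0.40, "maf": 0.01, "het_rate": 0.02, "hwe_p": 0.80},
    "snp3": {"missing_rate": 0.00, "maf": 0.25, "het_rate": 1.00, "hwe_p": 1e-8},
}

# Defaults are the neutral values from the note, so nothing is removed.
kept_neutral = [name for name, s in snps.items() if passes_snp_qc(s)]

# Stricter, purely illustrative thresholds.
kept_strict = [
    name for name, s in snps.items()
    if passes_snp_qc(s, max_missing=0.1, min_maf=0.05,
                     het_range=(0.0, 0.9), hwe_alpha=1e-6)
]
print(kept_neutral, kept_strict)
```

Because missing rate, MAF, and heterozygosity rate all lie between 0 and 1, the default arguments cannot exclude any SNP, which is why those values act as "off" switches.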
4.3 SNP Density
Required Datasets (both of the following):
Site Info. (RDS) of the current data.frame, downloadable from Data Input or Data QC pages.
Chromosome Info. (CSV): Reference genome information of the current study.
Click here: Download an example of Chromosome Info. (CSV).
Example: The Chromosome Info. of rice (Data source: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034140825.1/)
Chr    Start  End
Chr01  0      43929697
Chr02  0      36447916
Chr03  0      37399924
Chr04  0      36078568
Chr05  0      30400764
Chr06  0      32122276
Chr07  0      29936421
Chr08  0      28605474
Chr09  0      27474823
Chr10  0      23931887
Chr11  0      31111469
Chr12  0      28271460
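Before uploading, a Chromosome Info. file can be sanity-checked with a few lines of code. This sketch assumes a comma-separated file whose header row is exactly `Chr,Start,End`; the helper `read_chromosome_info` is hypothetical, and the inline text stands in for a real file.

```python
import csv
import io

# Stand-in for a Chromosome Info. (CSV) file; the first two rice chromosomes.
example_csv = """Chr,Start,End
Chr01,0,43929697
Chr02,0,36447916
"""


def read_chromosome_info(handle):
    """Parse Chr/Start/End rows into {chromosome: (start, end)} and validate them."""
    chromosomes = {}
    for row in csv.DictReader(handle):
        start, end = int(row["Start"]), int(row["End"])
        if end <= start:
            raise ValueError(f"{row['Chr']}: End must be greater than Start")
        if row["Chr"] in chromosomes:
            raise ValueError(f"{row['Chr']}: duplicate chromosome name")
        chromosomes[row["Chr"]] = (start, end)
    return chromosomes


chrom_info = read_chromosome_info(io.StringIO(example_csv))
print(chrom_info)
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.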
Steps:
Upload Site Info. (RDS) and Chromosome Info. (CSV).
Choose a window size in kilobases (kb).
Click the Summary button. This will calculate the density of SNPs across the genome.
Outputs:
SNP Density Plot (PDF): An ideogram visualizing SNP density across the genome at the chosen window size. A gradient color palette represents SNP density: green for lower densities, yellow for medium densities, and red for higher densities. Grey marks windows that contain no SNPs.
SNP Density (CSV): A table detailing SNP density across each chromosome. It includes the following columns:
bp_over_SNPs: The total base pairs (bp) per SNP in each window, representing the average spacing between SNPs.
SNPs_over_1000bp: The number of SNPs per 1,000 base pairs, providing a normalized measure of SNP density across the genome.
SNP Density Complete!
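The two density columns can be reproduced with a small windowing sketch for a single chromosome. It assumes 1-based SNP positions, windows counted from the chromosome's Start value, a final window that may be shorter than the chosen size, and `inf` as the spacing for windows with no SNPs. The function `snp_density` and the extra columns `window_start`, `window_end`, and `n_snps` are illustrative and may differ from what the app writes.

```python
import math
from collections import Counter


def snp_density(positions, chr_start, chr_end, window_kb):
    """Count SNPs per window and derive bp_over_SNPs and SNPs_over_1000bp."""
    window_bp = int(window_kb * 1000)
    n_windows = math.ceil((chr_end - chr_start) / window_bp)
    # Window i covers positions chr_start + i*window_bp + 1 .. chr_start + (i+1)*window_bp.
    counts = Counter((pos - chr_start - 1) // window_bp for pos in positions)
    rows = []
    for i in range(n_windows):
        win_start = chr_start + i * window_bp
        win_end = min(win_start + window_bp, chr_end)
        width = win_end - win_start
        n = counts.get(i, 0)
        rows.append({
            "window_start": win_start,
            "window_end": win_end,
            "n_snps": n,
            # Average spacing between SNPs; infinite when the window is empty.
            "bp_over_SNPs": width / n if n else float("inf"),
            # SNP count normalized to 1,000 bp.
            "SNPs_over_1000bp": n * 1000 / width,
        })
    return rows


# Toy chromosome of 12,000 bp with five SNPs and a 5 kb window.
rows = snp_density([100, 2500, 4999, 5001, 11000],
                   chr_start=0, chr_end=12000, window_kb=5)
for row in rows:
    print(row)
```

The two measures are reciprocal up to the factor of 1,000: a window with one SNP per 5,000 bp has a bp_over_SNPs of 5000 and a SNPs_over_1000bp of 0.2.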