2 Data QC

➡️ This section contains three subpages: Sample QC, SNP QC, and SNP Density. They let you assess the quality of samples and SNPs in the data.frame and visualize SNP density across the genome.

2.1 Sample QC

Required Dataset (one of the following):

  • data.frame file from the Data Input page

  • SNP post-QC data.frame file from the subpage Data QC/SNP QC


Step 1: Get Summary

First, obtain the sample summary statistics (missing rate and heterozygosity rate) by clicking both Summary buttons; the results will then be displayed.
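To make the two sample statistics concrete, here is a minimal Python sketch of how they could be computed for one sample. It is an illustration only, not the app's implementation: the genotype coding (0, 1, 2 for the number of alternate alleles, `None` for a missing call) and the function name `sample_summary` are assumptions.

```python
from typing import List, Optional

# Assumed coding: 0/1/2 = number of alternate alleles, None = missing call.
Genotype = Optional[int]


def sample_summary(genotypes: List[Genotype]) -> dict:
    """Return missing rate and heterozygosity rate for one sample (hypothetical helper)."""
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    # Missing rate: share of SNPs with no genotype call for this sample.
    missing_rate = (n_total - len(called)) / n_total if n_total else float("nan")
    # Heterozygosity rate: share of heterozygous calls among the non-missing calls.
    het_rate = sum(1 for g in called if g == 1) / len(called) if called else float("nan")
    return {"missing_rate": missing_rate, "het_rate": het_rate}


# One sample genotyped at eight SNPs, two of them missing.
stats = sample_summary([0, 1, 2, None, 1, 0, None, 2])
print(stats)
```

In this example, two of the eight calls are missing and two of the six remaining calls are heterozygous.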


Step 2: Sample QC

Adjust the thresholds and click the Sample QC by Thresholds button. This will generate the Post-QC data.frame file.

Note: If you prefer not to perform sample QC by sample missing rate or heterozygosity rate, please set the threshold to 0.


Outputs:

  • data.frame (RDS): Updated data.frame file. It is required for downstream analyses, so please download and save it!

  • Site Info. (RDS): Updated SNP site information file. It is required for downstream analyses, so please download and save it!

Sample QC Complete!


2.2 SNP QC

Required Dataset (one of the following):

  • data.frame file from the Data Input page

  • Sample post-QC data.frame file from the subpage Data QC/Sample QC


Step 1: Get Summary

First, obtain the SNP summary statistics [missing rate, minor allele frequency (MAF), heterozygosity rate, and Hardy-Weinberg equilibrium (HWE)] by clicking all Summary buttons; the results will then be displayed.
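The four SNP statistics can be sketched in Python for a single SNP as follows. This is a hedged illustration, not the app's implementation: the genotype coding (0, 1, 2 for the number of alternate alleles, `None` for missing), the function name `snp_summary`, and the use of a chi-square goodness-of-fit test with one degree of freedom for HWE are all assumptions. The app may use a different HWE test, such as an exact test.

```python
import math
from typing import List, Optional


def snp_summary(calls: List[Optional[int]]) -> dict:
    """Summary statistics for one SNP across all samples (hypothetical helper)."""
    n_total = len(calls)
    called = [g for g in calls if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("SNP has no called genotypes")

    n_ref_hom, n_het, n_alt_hom = called.count(0), called.count(1), called.count(2)
    p = (2 * n_alt_hom + n_het) / (2 * n)  # alternate allele frequency
    q = 1.0 - p

    # Chi-square test: observed genotype counts vs. HWE expectations.
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))

    return {
        "missing_rate": (n_total - n) / n_total,
        "maf": min(p, q),
        "het_rate": n_het / n,
        "hwe_chi2": chi2,
        "hwe_p": hwe_p,
    }


# Ten samples, one missing call; the called genotypes are 4 / 4 / 1.
stats = snp_summary([0, 0, 0, 0, 1, 1, 1, 1, 2, None])
print(stats)
```

The example genotype counts (4 reference homozygotes, 4 heterozygotes, 1 alternate homozygote) match the HWE expectation for an allele frequency of 1/3, so the chi-square statistic is essentially zero.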


Step 2: SNP QC

Adjust the thresholds and click the SNP QC by Thresholds button. This will generate the Post-QC data.frame file.

Note: If you prefer not to perform QC based on SNP missing rate, MAF, or heterozygosity rate, set the missing rate threshold to 1, the MAF threshold to 0, and the heterozygosity rate range to 0 and 1, respectively. Additionally, leave the ‘Do SNP QC by HWE’ checkbox unticked to skip QC based on SNP HWE.
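The following Python sketch shows why the values in the note switch the filters off: with a maximum missing rate of 1, a minimum MAF of 0, and a heterozygosity range of 0 to 1, no SNP can fail. It assumes inclusive comparisons and uses hypothetical names (`passes_snp_qc`, `hwe_p_min`) and made-up statistics; the app's exact comparison rules are not documented here.

```python
from typing import Optional, Tuple


def passes_snp_qc(
    stats: dict,
    max_missing: float = 1.0,                    # 1 disables the missing rate filter
    min_maf: float = 0.0,                        # 0 disables the MAF filter
    het_range: Tuple[float, float] = (0.0, 1.0), # (0, 1) disables the heterozygosity filter
    hwe_p_min: Optional[float] = None,           # None mimics leaving the HWE checkbox unticked
) -> bool:
    """Return True if a SNP survives all active filters (hypothetical helper)."""
    if stats["missing_rate"] > max_missing:
        return False
    if stats["maf"] < min_maf:
        return False
    low, high = het_range
    if not (low <= stats["het_rate"] <= high):
        return False
    if hwe_p_min is not None and stats["hwe_p"] < hwe_p_min:
        return False
    return True


# Illustrative per-SNP statistics (invented values).
snps = {
    "snp1": {"missing_rate": 0.02, "maf": 0.30, "het_rate": 0.40, "hwe_p": 0.80},
    "snp2": {"missing_rate": 0.25, "maf": 0.10, "het_rate": 0.15, "hwe_p": 0.50},
    "snp3": {"missing_rate": 0.00, "maf": 0.01, "het_rate": 0.02, "hwe_p": 1e-8},
}

kept_all = [name for name, s in snps.items() if passes_snp_qc(s)]
kept_strict = [name for name, s in snps.items()
               if passes_snp_qc(s, max_missing=0.1, min_maf=0.05)]
kept_hwe = [name for name, s in snps.items() if passes_snp_qc(s, hwe_p_min=1e-6)]
print(kept_all, kept_strict, kept_hwe)
```

With the neutral defaults all three SNPs are kept; tightening the missing rate and MAF thresholds keeps only `snp1`, and enabling only the HWE filter removes `snp3`.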


Outputs:

  • data.frame (RDS): Updated data.frame file. It is required for downstream analyses, so please download and save it!

  • Site Info. (RDS): Updated SNP site information file. It is required for downstream analyses, so please download and save it!

SNP QC Complete!


2.3 SNP Density

Required Datasets (both of the following):

  • Site Info. (RDS) of the current data.frame, downloadable from Data Input or Data QC pages.

  • Chromosome Info. (CSV): Reference genome information of the current study.

    Click here: Download an example of Chromosome Info. (CSV).

➡️ Example: Chromosome Info. of rice (Data source: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034140825.1/)
Chr     Start   End
Chr01   0       43929697
Chr02   0       36447916
Chr03   0       37399924
Chr04   0       36078568
Chr05   0       30400764
Chr06   0       32122276
Chr07   0       29936421
Chr08   0       28605474
Chr09   0       27474823
Chr10   0       23931887
Chr11   0       31111469
Chr12   0       28271460

Steps:

  1. Upload Site Info. (RDS) and Chromosome Info. (CSV).

  2. Choose a window size in kilobases (kb).

  3. Click the Summary button. This will calculate the density of SNPs across the genome.


Outputs:

  • SNP Density Plot (PDF): An ideogram visualizing SNP density across the genome within a defined window size. A gradient color palette represents varying SNP densities: green for lower densities, yellow for medium densities, and red for higher densities, with grey indicating regions with no SNPs.

  • SNP Density (CSV): A table detailing SNP density across each chromosome. It includes the following columns:

    • bp_over_SNPs: The total base pairs (bp) per SNP in each window, representing the average spacing between SNPs.

    • SNPs_over_1000bp: The number of SNPs per 1,000 base pairs, providing a normalized measure of SNP density across the genome.
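The windowed density calculation can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the app's code: SNP sites are given as (chromosome, position) pairs, chromosome information as a mapping from chromosome name to (Start, End), and the function name `snp_density` and the `SNPs` column are hypothetical. Only `bp_over_SNPs` and `SNPs_over_1000bp` come from the output description above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def snp_density(
    sites: List[Tuple[str, int]],
    chrom_info: Dict[str, Tuple[int, int]],
    window_kb: int,
) -> List[dict]:
    """Count SNPs per fixed-size window and derive two density measures."""
    window_bp = window_kb * 1000

    # Number of windows per chromosome (ceiling division; the last window may be shorter).
    n_windows = {c: -(-(end - start) // window_bp) for c, (start, end) in chrom_info.items()}

    counts: Counter = Counter()
    for chrom, pos in sites:
        start, _ = chrom_info[chrom]
        # Clamp so a SNP located exactly at the chromosome end falls in the last window.
        idx = min((pos - start) // window_bp, n_windows[chrom] - 1)
        counts[(chrom, idx)] += 1

    rows = []
    for chrom, (start, end) in chrom_info.items():
        for i in range(n_windows[chrom]):
            w_start = start + i * window_bp
            w_end = min(w_start + window_bp, end)
            length = w_end - w_start
            n = counts.get((chrom, i), 0)
            rows.append({
                "Chr": chrom,
                "Start": w_start,
                "End": w_end,
                "SNPs": n,
                # Average spacing between SNPs; undefined when the window has no SNPs.
                "bp_over_SNPs": length / n if n else None,
                # SNP count normalized to 1,000 bp.
                "SNPs_over_1000bp": n * 1000 / length,
            })
    return rows


# Toy chromosome of 2,500 bp with four SNPs and a 1 kb window.
rows = snp_density(
    sites=[("Chr01", 100), ("Chr01", 900), ("Chr01", 1500), ("Chr01", 2500)],
    chrom_info={"Chr01": (0, 2500)},
    window_kb=1,
)
for row in rows:
    print(row)
```

Windows with no SNPs get `None` for `bp_over_SNPs` in this sketch, which corresponds to the grey regions described for the density plot.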

SNP Density Complete!