4 Data QC

➡️ This section contains three subpages: Sample QC, SNP QC, and SNP Density. Use them to assess the quality of samples and SNPs in the data.frame and to visualize SNP density across the genome.

4.1 Sample QC

Required Dataset (one of the following):
  • data.frame file from the Data Input page

  • SNP post-QC data.frame file from the subpage Data QC/SNP QC

Step 1: Get Summary

First, click both Summary buttons to obtain the sample summary statistics (missing rate and heterozygosity rate) and view the results.
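
As a rough illustration of what these two statistics measure, the sketch below computes them for a toy genotype set. It assumes genotypes are coded 0, 1, 2 (alternate-allele count) with None for missing calls, and that the heterozygosity rate is taken over non-missing calls only. The function name sample_summary is hypothetical; this is not the app's own code.

```python
def sample_summary(genotypes):
    """Return (missing_rate, heterozygosity_rate) for one sample.

    genotypes: list of calls coded 0, 1, 2, or None for missing (assumed coding).
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    missing_rate = (n_total - len(called)) / n_total if n_total else 0.0
    # Heterozygosity is taken over non-missing calls only (an assumption).
    het_rate = sum(1 for g in called if g == 1) / len(called) if called else 0.0
    return missing_rate, het_rate


# Toy data: each entry is one sample's calls across four SNPs.
samples = {
    "S1": [0, 1, 2, None],
    "S2": [1, 1, 0, 0],
}
summary = {name: sample_summary(calls) for name, calls in samples.items()}
```

Here S1 has one missing call out of four, and one heterozygous call among its three called genotypes.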

Step 2: Sample QC

Adjust the thresholds and click the Sample QC by Thresholds button. This will generate the Post-QC data.frame file.

Note: To skip sample QC by missing rate or by heterozygosity rate, set the corresponding threshold to 0.
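
The threshold logic can be sketched as follows. The guide does not spell out the exact rule, so this assumes a sample is removed when its rate exceeds a non-zero threshold, and that a threshold of 0 switches the filter off as the note above describes. keep_sample and its parameters are hypothetical names.

```python
def keep_sample(missing_rate, het_rate, max_missing=0.0, max_het=0.0):
    """Decide whether a sample passes QC.

    A threshold of 0 disables that filter (per the note above); otherwise
    the sample is dropped when its rate exceeds the threshold (assumed rule).
    """
    if max_missing > 0 and missing_rate > max_missing:
        return False
    if max_het > 0 and het_rate > max_het:
        return False
    return True


# (missing rate, heterozygosity rate) per sample, toy values.
stats = {"S1": (0.25, 0.33), "S2": (0.0, 0.5), "S3": (0.6, 0.1)}

# Filter on missing rate only; heterozygosity filtering is switched off with 0.
kept = [name for name, (miss, het) in stats.items()
        if keep_sample(miss, het, max_missing=0.5, max_het=0.0)]
```

With these toy values only S3 is removed, because its missing rate is above 0.5.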

Outputs:
  • data.frame (RDS): Updated data.frame file. It is required for downstream analyses, so please download and save it!

  • Site Info. (RDS): Updated SNP site information file. It is required for downstream analyses, so please download and save it!

Sample QC Complete!

4.2 SNP QC

Required Dataset (one of the following):
  • data.frame file from the Data Input page

  • Sample post-QC data.frame file from the subpage Data QC/Sample QC

Step 1: Get Summary

First, click all Summary buttons to obtain the SNP summary statistics [missing rate, minor allele frequency (MAF), heterozygosity rate, and Hardy-Weinberg equilibrium (HWE)] and view the results.
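
To make the four statistics concrete, here is a minimal sketch for a single SNP. It assumes genotypes are coded 0, 1, 2 (alternate-allele count) with None for missing calls, and it uses a chi-square goodness-of-fit test with one degree of freedom for HWE. The app may use a different test (for example an exact test), and snp_summary is a hypothetical name.

```python
import math


def snp_summary(genotypes):
    """Summarize one SNP across samples.

    genotypes: calls coded 0, 1, 2 (alternate-allele count) or None for
    missing (assumed coding). Returns a dict of the four statistics.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    missing_rate = 1 - n / len(genotypes)
    if n == 0:
        return {"missing": missing_rate, "maf": None, "het": None, "hwe_p": None}
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the reference allele
    q = 1 - p
    maf = min(p, q)
    het = n_ab / n
    # Chi-square goodness-of-fit test against HWE expectations (1 df).
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    if min(expected) == 0:
        hwe_p = 1.0  # monomorphic SNP: no departure can be tested
    else:
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return {"missing": missing_rate, "maf": maf, "het": het, "hwe_p": hwe_p}


stats = snp_summary([0, 0, 1, 1, 2, None, 0, 1, 1, 2])
```

A small HWE p-value indicates that the observed genotype counts depart from the proportions expected under Hardy-Weinberg equilibrium.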

Step 2: SNP QC

Adjust the thresholds and click the SNP QC by Thresholds button. This will generate the Post-QC data.frame file.

Note: To skip QC based on SNP missing rate, MAF, or heterozygosity rate, set the missing rate threshold to 1, the MAF threshold to 0, and the heterozygosity rate thresholds to 0 and 1. To skip QC based on SNP HWE, leave the ‘Do SNP QC by HWE’ checkbox unticked.
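
The settings in the note are consistent with filters of the following form. This is a sketch under assumptions, not the app's implementation: the function keep_snp, the layout of the stats dict, and the HWE p-value cutoff of 1e-6 are all hypothetical.

```python
def keep_snp(stats, max_missing=1.0, min_maf=0.0, het_range=(0.0, 1.0),
             use_hwe=False, hwe_p_min=1e-6):
    """Return True if a SNP passes all active filters.

    stats: dict with keys "missing", "maf", "het", "hwe_p" (hypothetical layout).
    The defaults mirror the "no filtering" settings described in the note.
    """
    if stats["missing"] > max_missing:
        return False
    if stats["maf"] < min_maf:
        return False
    low, high = het_range
    if not (low <= stats["het"] <= high):
        return False
    # The HWE filter only applies when explicitly enabled (checkbox ticked).
    if use_hwe and stats["hwe_p"] < hwe_p_min:
        return False
    return True


snps = {
    "snp1": {"missing": 0.02, "maf": 0.30, "het": 0.40, "hwe_p": 0.50},
    "snp2": {"missing": 0.40, "maf": 0.01, "het": 0.02, "hwe_p": 1e-9},
}
kept_default = [k for k, s in snps.items() if keep_snp(s)]
kept_strict = [k for k, s in snps.items()
               if keep_snp(s, max_missing=0.1, min_maf=0.05, use_hwe=True)]
```

With the default arguments every SNP is kept, which is why those values effectively disable each filter.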

Outputs:
  • data.frame (RDS): Updated data.frame file. It is required for downstream analyses, so please download and save it!

  • Site Info. (RDS): Updated SNP site information file. It is required for downstream analyses, so please download and save it!

SNP QC Complete!

4.3 SNP Density

Required Datasets (both of the following):
  • Site Info. (RDS) of the current data.frame, downloadable from Data Input or Data QC pages.

  • Chromosome Info. (CSV): Reference genome information of the current study.

    Click here: Download an example of Chromosome Info.(CSV).

    ▼ Example: The Chromosome Info. of rice (Data source: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034140825.1/)

    Chr Start End
    Chr01 0 43929697
    Chr02 0 36447916
    Chr03 0 37399924
    Chr04 0 36078568
    Chr05 0 30400764
    Chr06 0 32122276
    Chr07 0 29936421
    Chr08 0 28605474
    Chr09 0 27474823
    Chr10 0 23931887
    Chr11 0 31111469
    Chr12 0 28271460

Steps:
  1. Upload Site Info. (RDS) and Chromosome Info. (CSV).

  2. Choose a window size in kilobases (kb).

  3. Click the Summary button. This will calculate the density of SNPs across the genome.
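
The windowed counting behind step 3 can be sketched as follows. The sketch assumes the Chromosome Info. CSV is comma-separated with the headers Chr, Start, and End, that windows are fixed-size and start at position 0, and that SNP sites are available as (chromosome, position) pairs. The site positions below are made up for illustration.

```python
import csv
import io
from collections import Counter

# Chromosome Info. in CSV form (first two rice chromosomes from the example).
chrom_csv = "Chr,Start,End\nChr01,0,43929697\nChr02,0,36447916\n"
chrom_end = {row["Chr"]: int(row["End"])
             for row in csv.DictReader(io.StringIO(chrom_csv))}

# Hypothetical SNP sites as (chromosome, position in bp).
sites = [("Chr01", 1200), ("Chr01", 480000), ("Chr01", 510000), ("Chr02", 999999)]

window_kb = 500
window_bp = window_kb * 1000

# Count SNPs per (chromosome, window index); windows start at position 0.
counts = Counter((chrom, pos // window_bp) for chrom, pos in sites)

# Number of windows per chromosome, rounding up to cover the last partial window.
n_windows = {chrom: -(-end // window_bp) for chrom, end in chrom_end.items()}
```

Windows that never appear in counts contain no SNPs, which corresponds to the grey regions of the density plot.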

Outputs:
  • SNP Density Plot (PDF): An ideogram visualizing SNP density across the genome within a defined window size. A gradient color palette represents varying SNP densities: green for lower densities, yellow for medium densities, and red for higher densities, with grey indicating regions with no SNPs.

  • SNP Density (CSV): A table detailing SNP density across each chromosome. Columns include:

    bp_over_SNPs: The total base pairs (bp) per SNP in each window, representing the average spacing between SNPs.

    SNPs_over_1000bp: The number of SNPs per 1,000 base pairs, providing a normalized measure of SNP density across the genome.
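
The two density columns are reciprocal views of the same count. A minimal sketch, assuming each value is computed from a SNP count and a region length in bp; density_metrics is a hypothetical helper, and returning None for an empty region is a choice made here, not documented app behavior.

```python
def density_metrics(n_snps, region_bp):
    """Return (bp_over_SNPs, SNPs_over_1000bp) for one region.

    bp_over_SNPs is undefined for a region with no SNPs, so None is returned.
    """
    bp_over_snps = region_bp / n_snps if n_snps > 0 else None
    snps_over_1000bp = n_snps * 1000 / region_bp
    return bp_over_snps, snps_over_1000bp


# 250 SNPs in a 500 kb region.
bp_per_snp, snps_per_kb = density_metrics(250, 500_000)
# → 2000.0 bp per SNP, 0.5 SNPs per 1,000 bp
```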

SNP Density Complete!