6 Population Structure

➡️ This section contains seven subpages (PCA, DAPC, UPGMA Tree, NJ Tree, Kinship, Scatter Plot Plus, and Tree Plot Plus) that let you run various population structure analyses and customize your plots.

6.1 PCA (Principal Component Analysis)

A widely used method to uncover underlying population structure by reducing the dimensionality of genetic data.

Required Dataset:
  • data.frame
One Step:
  1. Click the Run PCA button to generate PCA plots and the following downloadable files.

Note: You can upload the Group Info. (from Population Structure/DAPC) or Core Sample Info. (from Core Collection/Core Sample Set) to classify individuals and color them in the PCA Scatter Plot.

Outputs:
  • PCA Scatter Plot (PDF): A scatter plot showing the distribution of samples based on principal components, with each dot representing an individual.

  • PC Explained Variance Plot (PDF): Visualizes the variance explained by each principal component.

  • Explained Variance (CSV): Contains the explained variance of each principal component.

  • PCA Transformed Data (CSV): Dataset transformed into principal components, with samples as rows and principal components as columns.

  • PCA Object (RDS): Contains all PCA results for future use and reproducibility, and can be used as input data in the Population Structure/Scatter Plot Plus subpage.

PCA Complete!

6.2 DAPC (Discriminant Analysis of Principal Components)

A multivariate method for identifying and visualizing genetic clusters by combining PCA and Linear Discriminant Analysis (LDA) (Jombart, Devillard, and Balloux 2010). For more information, visit https://adegenet.r-forge.r-project.org/files/tutorial-dapc.pdf.

Required Dataset:
  • genind
Step 1: Cluster Identification
  1. Click the Run DAPC I button to determine the optimal number of clusters (the lowest BIC value indicates the optimal number of clusters).

Note: By default, cluster identification retains the PC axes that capture up to 80% of the total variance. You can refer to the “PC Explained Variance Plot” in the Population Structure/PCA subpage.
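To make the 80% rule concrete, here is a minimal Python sketch (not the app's own code). It assumes that "up to 80%" means the smallest number of leading PCs whose cumulative share of variance reaches 80%; the app's exact rule may differ. The function name and the example variances are hypothetical.

```python
from itertools import accumulate


def n_pcs_for_variance(variances, threshold=0.80):
    """Return the smallest number of leading PCs whose cumulative
    share of the total variance reaches `threshold`.

    `variances` holds the per-PC variances in decreasing order.
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = accumulate(v / total for v in variances)
    for n, share in enumerate(cumulative, start=1):
        # Small tolerance so a share of exactly 0.80 is not lost to rounding.
        if share >= threshold - 1e-12:
            return n
    return len(variances)


# Hypothetical per-PC variances, for illustration only.
example_variances = [5.0, 2.0, 1.5, 1.0, 0.5]
n_keep = n_pcs_for_variance(example_variances)
print(n_keep)
```

With these made-up values the cumulative shares are 50%, 70%, and 85%, so three PCs are kept.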

Step 2: DAPC Analysis
  1. Choose the number of clusters (K) based on the “Bayesian Information Criterion (BIC) Plot”.

  2. Click the Run DAPC II button to generate DAPC plots and the following downloadable files.
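The "lowest BIC" rule from Step 1 can be sketched in a few lines of Python. This is an illustration only: the BIC values are invented, and the function name is hypothetical. When several K values have nearly the same BIC, the plot itself is usually a better guide than a strict minimum.

```python
def best_k_by_bic(bic_by_k):
    """Return the K with the lowest BIC; ties go to the smaller K."""
    if not bic_by_k:
        raise ValueError("no BIC values supplied")
    # Iterating over sorted K values makes min() prefer the smaller K on ties.
    return min(sorted(bic_by_k), key=lambda k: bic_by_k[k])


# Invented BIC values for K = 1..5, for illustration only.
example_bic = {1: 612.4, 2: 580.1, 3: 561.7, 4: 563.9, 5: 570.2}
chosen_k = best_k_by_bic(example_bic)
print(chosen_k)
```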

Note: You can download the “DAPC Object” and upload it to the Population Structure/Scatter Plot Plus subpage to customize your 2D and 3D scatter plots.

Outputs:
  • Bayesian Information Criterion (BIC) Plot (PDF): Visual representation of the BIC for model selection.

  • Density Plot of First & Second Discriminant Function (PDF): Displays the density of the first and second discriminant functions, with each row bar representing an individual.

  • DAPC Scatter Plot (PDF): A scatter plot showing the distribution of samples based on discriminant functions (x-axis: first discriminant function; y-axis: second discriminant function), with each dot representing an individual.

  • DAPC Membership Probability Plot (PDF): Visualizes membership probabilities of individuals in different groups, with each row bar representing an individual.

  • DAPC Group Info. (CSV): Contains the group assignments for each individual based on DAPC. This file is used in various subpages.

  • DAPC Transformed Data (CSV): Dataset transformed into discriminant functions with samples as rows and discriminant functions as columns.

  • DAPC Object (RDS): Contains all results from the DAPC analysis for future reproducibility. It can be used as input data in the Population Structure/Scatter Plot Plus and Core Collection/Core SNP Set subpages.

DAPC Complete!

6.3 UPGMA (Unweighted Pair Group Method with Arithmetic mean) Tree

A classic approach for constructing rooted trees based on genetic distance data. The UPGMA tree is generated with the poppr and ggtree packages (Yu et al. 2016; Kamvar, Tabima, and Grünwald 2014).
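For readers who want to see the idea behind UPGMA, the following is a minimal Python sketch of the algorithm on a small distance matrix. It is not the poppr code that the app calls; the function name, the sample names, and the distances are hypothetical.

```python
def upgma(names, dist):
    """Cluster `names` with UPGMA from a symmetric distance matrix.

    Returns (tree, root_height), where `tree` is a nested tuple of names.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    height = 0.0
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance to every other cluster is the size-weighted average.
        new_d = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            new_d[(c, next_id)] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
            )
        # Forget distances that involve the two merged clusters.
        d = {pair: v for pair, v in d.items()
             if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        height = d_ab / 2  # the new node sits halfway between a and b
        next_id += 1
    (tree, _), = clusters.values()
    return tree, height


# Hypothetical pairwise genetic distances among four samples.
example_names = ["A", "B", "C", "D"]
example_dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
example_tree, example_height = upgma(example_names, example_dist)
print(example_tree, example_height)
```

A and B are merged first (distance 2), then C joins them, and D joins last. Because every merge places the new node at half the merge distance, the resulting tree is rooted and all tips are equally far from the root.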

Required Dataset:
  • genlight
Steps:
  1. Choose the number of bootstrap replicates, which will be used for assessing the confidence of the branching structure.

  2. Click the Run UPGMA button to generate the tree plot.
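Bootstrap support for a tree is commonly obtained by resampling loci with replacement, rebuilding the tree for each replicate, and counting how often each branch reappears. The Python sketch below shows only the resampling step, under the assumption that samples are rows and loci are columns. The function name and the genotype values are hypothetical, and this is not the app's own code.

```python
import random


def bootstrap_loci(genotypes, rng):
    """Return one bootstrap replicate: the same samples (rows), with
    loci (columns) drawn at random with replacement."""
    n_loci = len(genotypes[0])
    picks = [rng.randrange(n_loci) for _ in range(n_loci)]
    return [[row[k] for k in picks] for row in genotypes]


# Hypothetical genotypes: 3 samples x 4 loci, coded 0/1/2.
example_genotypes = [
    [0, 1, 2, 2],
    [0, 1, 2, 0],
    [2, 1, 0, 0],
]
replicate = bootstrap_loci(example_genotypes, random.Random(1))
```

Each replicate keeps the number of samples and loci unchanged; some loci appear more than once and others not at all.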

Note: You can download the “UPGMA Object” and upload it to the Population Structure/Tree Plot Plus subpage to customize your phylogenetic tree.

Outputs:
  • UPGMA Phylogenetic Tree (PDF): A rooted UPGMA tree with a user-defined layout style.

  • UPGMA Object (RDS): Contains all information of the UPGMA tree, and can be used as input data in the Population Structure/Tree Plot Plus subpage.

UPGMA Tree Complete!

6.4 NJ (Neighbor-Joining) Tree

A method for building unrooted trees using genetic distance data. The NJ tree is generated with the ape and ggtree packages (Paradis and Schliep 2018; Yu et al. 2016).
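As an illustration of how neighbor-joining chooses which pair to join, the Python sketch below computes the standard Q criterion, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from taxon i to all others. It performs only the first join of the algorithm and is not the ape code that the app calls. The function name and the distance matrix are hypothetical.

```python
def nj_first_join(names, dist):
    """Return the first pair joined by neighbor-joining and the branch
    lengths from each member of the pair to the new internal node."""
    n = len(names)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small when i and j are close to each other
            # but far from everything else.
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    i, j = best_pair
    len_i = dist[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return names[i], names[j], len_i, len_j


# Hypothetical distances among five taxa.
example_names = ["a", "b", "c", "d", "e"]
example_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
first_join = nj_first_join(example_names, example_dist)
print(first_join)
```

In this example the pair (a, b) is joined first even though d and e have the smallest raw distance. The two branch lengths also differ (2 and 3), which reflects that NJ does not assume equal rates along all lineages and therefore yields an unrooted tree.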

Required Dataset:
  • genlight
One Step:
  1. Click the Run NJ button to generate the tree plot.

Note: You can download the “NJ Object” and upload it to the Population Structure/Tree Plot Plus subpage to customize your phylogenetic tree.

Outputs:
  • NJ Phylogenetic Tree (PDF): An unrooted NJ tree with a user-defined layout style.

  • NJ Object (RDS): Contains all information of the NJ tree, and can be used as input data in the Population Structure/Tree Plot Plus subpage.

NJ Tree Complete!

6.5 Kinship Analysis

A statistical method for assessing genetic relationships and relatedness among individuals based on shared alleles (Kang et al. 2010). The kinship matrix is generated with the statgenGWAS package. For more information, visit https://rdrr.io/cran/statgenGWAS/man/kinship.html.

Required Dataset:
  • data.frame
Steps:
  1. Upload Group Info. from Population Structure/DAPC (optional). If uploaded, the order of samples will follow the group assignment; otherwise, it will follow the order of the original VCF data.

  2. Choose a method to run kinship analysis.

  3. Click the Run Kinship button to generate the kinship matrix.
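To show what a kinship matrix based on shared alleles looks like, here is a minimal Python sketch of an identity-by-state (IBS) similarity. It assumes biallelic genotypes coded 0/1/2 with no missing values. The scaling used by statgenGWAS for its own methods may differ, and the function name and genotypes are hypothetical.

```python
def ibs_kinship(genotypes):
    """Return an n x n matrix of IBS similarities.

    `genotypes` is a list of per-sample lists coded 0/1/2 (copies of
    the alternate allele). Each entry is the average proportion of
    alleles shared identical-by-state across markers, from 0 to 1.
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    kin = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # At one marker, two samples share 2 - |g_i - g_j| alleles.
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            kin[i][j] = kin[j][i] = shared / (2 * n_markers)
    return kin


# Hypothetical genotypes: 3 samples x 4 markers.
example_genotypes = [
    [0, 1, 2, 2],
    [0, 1, 2, 0],
    [2, 1, 0, 0],
]
example_kinship = ibs_kinship(example_genotypes)
for row in example_kinship:
    print(row)
```

The matrix is symmetric with 1.0 on the diagonal; samples 1 and 2, which differ at a single marker, get the highest off-diagonal value.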

Outputs:
  • Kinship Matrix Plot (PDF): A visual representation of the kinship matrix.

  • Kinship Matrix (RDS): Contains the kinship matrix data.

Note: This kinship matrix can be used directly as input for the GAPIT package in genome-wide association studies (GWAS), helping to control for confounding effects.

Kinship Analysis Complete!

6.6 Scatter Plot Plus

Customize your scatter plot based on the results from Population Structure/PCA or Population Structure/DAPC.

Required Files:
  • PCA Object (PCA_prcomp_Object.rds file) or DAPC Object (DAPC_dapc_Object.rds file)
  • Group and Other Info. (modifiable from DAPC_Group_Info.csv)

Note: You can add more information about samples by adding new variables to the Group Info. file. Ensure that the sample order remains unchanged.
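If you prefer to add a variable programmatically rather than in a spreadsheet, the Python sketch below appends one column to a CSV while keeping the row order untouched. The column names ("Sample", "Group", "Region") and the values are hypothetical; check the header of your own DAPC_Group_Info.csv before adapting it.

```python
import csv
import io


def add_column(csv_text, new_name, values):
    """Append a column to CSV text without changing the row order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if len(values) != len(rows):
        raise ValueError("need exactly one value per sample row")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(reader.fieldnames) + [new_name],
        lineterminator="\n",
    )
    writer.writeheader()
    # Rows are written back in the order they were read.
    for row, value in zip(rows, values):
        row[new_name] = value
        writer.writerow(row)
    return out.getvalue()


# Hypothetical Group Info. content, for illustration only.
example_csv = "Sample,Group\nS1,1\nS2,2\nS3,1\n"
updated_csv = add_column(example_csv, "Region", ["North", "South", "North"])
print(updated_csv)
```

The function refuses to run if the number of new values does not match the number of samples, which helps catch accidental misalignment.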


Steps:
  1. Upload PCA or DAPC Object (RDS)

  2. Upload Group and Other Info. (CSV)

  3. Click the Run Scatter Plot button to generate the 2D and 3D interactive scatter plots.

  4. Customize the scatter plot and click the Run Scatter Plot button again.

Note: The scatter plots are downloaded as HTML files and can be opened with browsers like Chrome or Edge.

Outputs:
  • 2D Scatter Plot (HTML): Two-dimensional interactive scatter plot with user-defined attributes.

  • 3D Scatter Plot (HTML): Three-dimensional interactive scatter plot with user-defined attributes.

Scatter Plot Plus Complete!

6.7 Tree Plot Plus

Customize your phylogenetic tree plot based on the results from Population Structure/UPGMA or Population Structure/NJ.

Required Files:
  • UPGMA Object (UPGMA_phylo_Object.rds file) or NJ Object (NJ_phylo_Object.rds file)
  • Group and Other Info. (modifiable from DAPC_Group_Info.csv)

Note: You can add more information about samples by adding new variables to the Group Info. file. Ensure that the sample order remains unchanged.

Steps:
  1. Upload UPGMA or NJ Object (RDS)

  2. Upload Group and Other Info. (CSV)

  3. Click the Run Tree Plot button to generate the tree plot.

  4. Customize the tree plot and click the Run Tree Plot button again.

Outputs:
  • Phylogenetic Tree Plot (PDF): A phylogenetic tree plot with user-defined layout style and attributes.

Tree Plot Plus Complete!

References

Jombart, Thibaut, Sébastien Devillard, and François Balloux. 2010. “Discriminant Analysis of Principal Components: A New Method for the Analysis of Genetically Structured Populations.” BMC Genetics 11 (1): 94. https://doi.org/10.1186/1471-2156-11-94.
Kamvar, Zhian N., Javier F. Tabima, and Niklaus J. Grünwald. 2014. “Poppr: An R Package for Genetic Analysis of Populations with Clonal, Partially Clonal, and/or Sexual Reproduction.” PeerJ 2 (March): e281. https://doi.org/10.7717/peerj.281.
Kang, Hyun Min, Jae Hoon Sul, Susan K Service, Noah A Zaitlen, Sit-yee Kong, Nelson B Freimer, Chiara Sabatti, and Eleazar Eskin. 2010. “Variance Component Model to Account for Sample Structure in Genome-Wide Association Studies.” Nature Genetics 42 (4): 348–54. https://doi.org/10.1038/ng.548.
Paradis, Emmanuel, and Klaus Schliep. 2018. “Ape 5.0: An Environment for Modern Phylogenetics and Evolutionary Analyses in R.” Edited by Russell Schwartz. Bioinformatics 35 (3): 526–28. https://doi.org/10.1093/bioinformatics/bty633.
Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2016. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Edited by Greg McInerny. Methods in Ecology and Evolution 8 (1): 28–36. https://doi.org/10.1111/2041-210x.12628.