Software

For QC and statistical analyses we will use plink 1.9, and for data visualization we will use R.


We will be working with plink, a free, open-source whole-genome association analysis toolset designed to perform a range of basic, large-scale analyses in a computationally efficient manner. Although plink focuses on the analysis of genotype/phenotype data, it is widely used in population genetics (popgen) because it offers many features for data manipulation and basic statistics, and because many popgen tools expect input files in plink format (e.g. fastStructure, ADMIXTURE). plink parses each command line as a collection of flags (each of which starts with two dashes, --) plus parameters (which immediately follow a flag).
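To make the flag/parameter structure concrete, here is a minimal sketch of a plink call. The prefix mydata is a hypothetical placeholder, not a file from this tutorial; the flags --bfile, --freq and --out are standard plink 1.9 flags. The command is only run if plink and the input file are actually available.

```shell
# Hypothetical dataset prefix; replace it with your own file name.
prefix="mydata"
out="${prefix}_freq"

# --bfile <prefix>  flag with one parameter: read <prefix>.bed/.bim/.fam
# --freq            flag used here without a parameter: allele frequencies
# --out <prefix>    flag with one parameter: prefix of the output files
if command -v plink >/dev/null 2>&1 && [ -f "${prefix}.bed" ]; then
    plink --bfile "$prefix" --freq --out "$out"
else
    echo "skipping: plink or ${prefix}.bed not available"
fi
```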

plink can either read
1. text-format files (.ped + .map) or
2. binary files (.bed + .bim + .fam).

Because reading large text files can be time-consuming, it is recommended to use binary files.
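If your data arrive in text format, a one-off conversion to binary files could look like the sketch below. The prefix mydata and the output name mydata_binary are assumptions made for illustration; --file and --make-bed are the plink 1.9 flags for reading .ped/.map input and writing .bed/.bim/.fam output.

```shell
# Hypothetical prefix of a text-format dataset (mydata.ped + mydata.map).
prefix="mydata"

if command -v plink >/dev/null 2>&1 && [ -f "${prefix}.ped" ] && [ -f "${prefix}.map" ]; then
    # --file reads <prefix>.ped/.map; --make-bed writes .bed/.bim/.fam
    plink --file "$prefix" --make-bed --out "${prefix}_binary"
    status="converted"
else
    status="skipped"
    echo "plink or ${prefix}.ped/.map not found; nothing converted"
fi
```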

plink data formats (Marees et al. 2017)

Text plink data consist of two files (which should have matching names and be stored together):
1. .ped contains information on the individuals and their genotypes;
2. .map contains information on the genetic markers.
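To get a feel for the two text files, the sketch below writes a tiny made-up dataset (three individuals, two markers; all IDs and genotypes are invented) and inspects it with standard shell tools. In a .ped file the first six columns are family ID, individual ID, paternal ID, maternal ID, sex and phenotype, followed by two allele columns per marker; a .map file has one line per marker with chromosome, marker ID, genetic distance and base-pair position.

```shell
# Work in a scratch directory so nothing in the project is overwritten.
workdir=$(mktemp -d)

# Toy .ped: FID IID PID MID SEX PHENO, then two alleles per marker.
cat > "$workdir/toy.ped" <<'EOF'
FAM1 IND1 0 0 1 2 A A G T
FAM2 IND2 0 0 2 1 A C G G
FAM3 IND3 0 0 1 1 C C T T
EOF

# Toy .map: chromosome, marker ID, genetic distance, base-pair position.
cat > "$workdir/toy.map" <<'EOF'
1 snp1 0 1000
1 snp2 0 2000
EOF

n_ind=$(wc -l < "$workdir/toy.ped" | tr -d ' ')     # one line per individual
n_snp=$(wc -l < "$workdir/toy.map" | tr -d ' ')     # one line per marker
n_cols=$(awk 'NR==1 {print NF}' "$workdir/toy.ped") # 6 + 2 * number of markers

echo "individuals: $n_ind, markers: $n_snp, .ped columns: $n_cols"
```

The number of columns in the .ped file should always equal 6 plus twice the number of lines in the .map file, which is a quick sanity check for text-format data.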

Binary plink data consist of three files (one binary and two text files, which should have matching names and be stored together):
1. .bed contains the genotypes in binary form (the individual IDs themselves are stored in the .fam file),
2. .fam contains information on the individuals,
3. .bim contains information on the genetic markers.

Analyses that use covariates often require a fourth file containing the values of these covariates for each individual.
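As an illustration, a covariate file for plink 1.9 is a plain text table whose first two columns are the family ID (FID) and individual ID (IID), followed by one column per covariate; such a file is passed to plink with the --covar flag. The sketch below writes a made-up example (the covariates AGE and PC1 and all values are invented) and checks that every row has the same number of fields.

```shell
# Scratch file holding a toy covariate table (all values are invented).
covar=$(mktemp)
cat > "$covar" <<'EOF'
FID IID AGE PC1
FAM1 IND1 34 0.012
FAM2 IND2 51 -0.034
FAM3 IND3 46 0.007
EOF

# A well-formed table has exactly one distinct field count across its rows.
n_fields=$(awk '{print NF}' "$covar" | sort -u | wc -l | tr -d ' ')
echo "distinct field counts: $n_fields"
```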

Note

On the evop-login server, PLINK v1.9 can be run by typing plink1.9. For simplicity, let's create an alias so that we can call plink1.9 by simply typing plink. To do so, we add alias plink='plink1.9' to our .bashrc file. We can do this directly from the command line with echo and the append operator >> (be careful to run the append command only once, otherwise the same line will be added to .bashrc again).

# add plink alias in .bashrc    
echo "alias plink='plink1.9'" >> ~/.bashrc

# source .bashrc to apply the changes
source ~/.bashrc

# check if plink is added to the list of aliases
alias
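If you are worried about appending the alias more than once, one possible safeguard is to add the line only when it is not already present. The helper add_line_once below is a hypothetical function written for this sketch, not part of plink or bash, and it is demonstrated on a scratch file rather than on the real .bashrc.

```shell
# Hypothetical helper: append a line to a file only if it is not there yet.
add_line_once() {
    # $1 = line to add, $2 = target file
    touch "$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern literally
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Demonstration on a scratch file; for real use pass "$HOME/.bashrc" instead.
demo=$(mktemp)
add_line_once "alias plink='plink1.9'" "$demo"
add_line_once "alias plink='plink1.9'" "$demo"   # second call changes nothing

# Count how many lines of the file mention plink1.9.
grep -c "plink1.9" "$demo"
```

However often the function is called with the same line, the file ends up containing it exactly once.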

R

The focus of this tutorial is on GWAS, so we place less emphasis on the steps involving data visualization in R. For this reason, the R scripts are provided ready to be run from the command line. However, we encourage you to take a look at each of them and try to understand what they do. Most of the R scripts are written in tidyverse syntax, and in order to run them you will need to install several R packages.

Required R packages

  • tidyverse
  • ggpubr
  • qqman
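Before running the R scripts, it can be useful to check from the command line which of these packages are already installed. The loop below is a minimal sketch that assumes Rscript is on your PATH; it only reports the status of each package and does not install anything.

```shell
# Packages required by the R scripts of this tutorial.
pkgs="tidyverse ggpubr qqman"

if command -v Rscript >/dev/null 2>&1; then
    for p in $pkgs; do
        # requireNamespace() returns FALSE for a missing package, so
        # stopifnot() fails and Rscript exits with a non-zero status.
        if Rscript -e "stopifnot(requireNamespace('$p', quietly = TRUE))" >/dev/null 2>&1; then
            echo "$p: installed"
        else
            echo "$p: missing"
        fi
    done
else
    echo "Rscript not found on PATH"
fi
```

Any package reported as missing can then be installed from within R with install.packages().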