Poster Presentation The 45th Lorne Conference on Protein Structure and Function 2020

Understanding differences in mutation tolerance between populations at the sequence and structural level (#329)

Elston N Dsouza (1, 2), Michael A Silk (1, 2), Moshe Olshansky (1), David B Ascher (1, 2)
  1. Baker Institute, South Yarra, VIC, Australia
  2. Department of Biochemistry and Molecular Biology, Bio21, Parkville, VIC, Australia

Genomic sequencing has the potential to revolutionise personalised medicine. Decreasing time and costs have led to exponential increases in the amount of data available, yet over 80% of it is from people of European background. Because there are large differences between ethnicities, this bias has complicated the widespread translation of approaches developed using these data. We aimed to identify regions in proteins across the human proteome that vary in selective pressure across different ethnic populations. We classified 184,648 genomic sequences from gnomAD v2 and UK Biobank into 8 distinct ethnicities. Using the Missense Tolerance Ratio (MTR), a direct estimate of purifying selection acting on a given region, we measured selective pressure at both the gene and protein residue level for each ethnic population. Genes under differing selective pressure between ethnic populations featured significant deviations from neutral MTR scores. A statistical post-hoc analysis and multivariate outlier detection identified 132 canonical genes that have unique selective profiles across different ethnicities. These have been mapped to protein 3D structures to better understand the structural and mechanistic consequences, and to provide a clearer picture of how selective pressure interplays with phenotype. Phenotypic data and disease status may be predicted from our population-specific MTR scores using machine learning models, thereby highlighting differences in traits and disease susceptibility while incorporating ethnic-specific selective effects.
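
As an illustration of the residue-level measurement described above, the following is a minimal Python sketch of a sliding-window MTR calculation. It assumes the commonly published form of the ratio: the observed missense fraction (missense over missense plus synonymous variants seen in a population) divided by the same fraction computed over all possible single-nucleotide changes in the window. The function names, the per-codon count inputs, and the default window of 31 codons are assumptions made for this example and are not taken from the abstract; a per-population analysis would simply repeat the calculation with that population's observed counts.

```python
from typing import List, Optional, Sequence


def mtr(obs_mis: int, obs_syn: int, poss_mis: int, poss_syn: int) -> Optional[float]:
    """Missense Tolerance Ratio for one window (hypothetical helper).

    Returns None when the ratio is undefined, i.e. when no variants were
    observed or no missense changes are possible in the window.
    """
    obs_total = obs_mis + obs_syn
    poss_total = poss_mis + poss_syn
    if obs_total == 0 or poss_total == 0 or poss_mis == 0:
        return None
    observed_fraction = obs_mis / obs_total
    expected_fraction = poss_mis / poss_total
    # Values near 1 are consistent with neutrality; values below 1
    # suggest purifying selection against missense variation.
    return observed_fraction / expected_fraction


def sliding_mtr(
    obs_mis: Sequence[int],
    obs_syn: Sequence[int],
    poss_mis: Sequence[int],
    poss_syn: Sequence[int],
    window: int = 31,  # assumed window size in codons
) -> List[Optional[float]]:
    """Per-codon MTR using a centred window, truncated at the protein ends."""
    n = len(obs_mis)
    half = window // 2
    scores: List[Optional[float]] = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        scores.append(
            mtr(
                sum(obs_mis[lo:hi]),
                sum(obs_syn[lo:hi]),
                sum(poss_mis[lo:hi]),
                sum(poss_syn[lo:hi]),
            )
        )
    return scores
```

Running `sliding_mtr` once per population on the same gene yields one score profile per population, and those profiles can then be compared position by position.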