For researchers

One kind of genetic analysis we do at Nucleus is known as “polygenic scores” (PGS) or “polygenic risk scores” (PRS). Our reports refer to this analysis as “common genetic scores.” Common genetic scores are calculated from the combined impact of a user’s common genetic variants on their disease risk or trait expression. The effect of each common variant is estimated from large genome-wide association studies (GWAS) (1). GWAS, often designed as case/control studies, seek to understand how specific DNA differences, usually single nucleotide polymorphisms (SNPs), are associated with a particular trait or disease.

Let’s dive deeper into how Nucleus calculates these scores, how they are translated into absolute risk scores, and how they are integrated with classical Mendelian analysis.

01

Calculating a Z-score

The first step in calculating a common genetic score is counting, for each variant associated with a particular disease or trait, the number of risk alleles an individual carries (0, 1, or 2). Each count is then multiplied by that variant’s reported effect size, and the products are summed across all variants.

This value is then transformed such that when measured in many individuals in the general population, the average score is zero, and the standard deviation in the population equals 1 (‘unit-variance’). This effectively makes every common genetic score returned to Nucleus customers a “Z” or “standardized” score.
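The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not Nucleus's pipeline: the variant identifiers, effect sizes, and four-person reference panel are made up, and the centering and scaling here use an empirical reference sample for simplicity.

```python
from statistics import mean, pstdev

def raw_score(dosages, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant)."""
    return sum(dosages[v] * beta for v, beta in effect_sizes.items())

def z_score(raw, reference_raw_scores):
    """Center and scale a raw score against a reference population."""
    mu = mean(reference_raw_scores)
    sigma = pstdev(reference_raw_scores)  # population standard deviation
    return (raw - mu) / sigma

# Hypothetical variants and effect sizes, for illustration only.
effect_sizes = {"rs_a": 0.10, "rs_b": 0.25, "rs_c": -0.05}

# Hypothetical reference panel: risk-allele counts per person.
reference_panel = [
    {"rs_a": 0, "rs_b": 1, "rs_c": 2},
    {"rs_a": 1, "rs_b": 0, "rs_c": 1},
    {"rs_a": 2, "rs_b": 1, "rs_c": 0},
    {"rs_a": 1, "rs_b": 2, "rs_c": 1},
]
reference_scores = [raw_score(d, effect_sizes) for d in reference_panel]

user = {"rs_a": 2, "rs_b": 2, "rs_c": 0}
user_z = z_score(raw_score(user, effect_sizes), reference_scores)
print(f"user Z-score: {user_z:.2f}")
```

Applying `z_score` to the reference scores themselves yields a mean of zero and a standard deviation of one, which is the property that makes the result a standardized score.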

The scaling and centering are based on allele frequencies taken from relevant and user-matched populations, primarily the 1000 Genomes Project (2,3). This means that Z-scores, as given, should be interpreted as being relative to users of the same ancestry group. Given that most GWAS are primarily performed with people of white European ancestry, this approach is one way Nucleus is addressing the current ancestry biases in polygenic scores (4).
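One common way to center and scale from allele frequencies alone is sketched below. It assumes Hardy-Weinberg equilibrium and independent variants, so that each variant's allele count has mean 2p and variance 2p(1 - p); whether Nucleus uses exactly this formula is not stated here. The population labels, frequencies, and effect sizes are hypothetical.

```python
from math import sqrt

def expected_mean(freqs, betas):
    """Expected raw score: sum over variants of 2 * p * beta."""
    return sum(2 * freqs[v] * betas[v] for v in betas)

def expected_sd(freqs, betas):
    """Expected standard deviation: sqrt(sum of 2 * p * (1 - p) * beta^2)."""
    return sqrt(sum(2 * freqs[v] * (1 - freqs[v]) * betas[v] ** 2 for v in betas))

def z_from_frequencies(dosages, freqs, betas):
    """Standardize a raw score using population allele frequencies."""
    raw = sum(dosages[v] * betas[v] for v in betas)
    return (raw - expected_mean(freqs, betas)) / expected_sd(freqs, betas)

# Hypothetical effect sizes and risk-allele frequencies in two populations.
betas = {"rs_a": 0.10, "rs_b": 0.25}
freqs_by_population = {
    "POP_1": {"rs_a": 0.50, "rs_b": 0.20},
    "POP_2": {"rs_a": 0.10, "rs_b": 0.60},
}

dosages = {"rs_a": 1, "rs_b": 1}  # the same genotype in both comparisons
z_pop1 = z_from_frequencies(dosages, freqs_by_population["POP_1"], betas)
z_pop2 = z_from_frequencies(dosages, freqs_by_population["POP_2"], betas)
print(f"Z relative to POP_1: {z_pop1:.2f}, relative to POP_2: {z_pop2:.2f}")
```

The same genotype produces different Z-scores depending on the reference population, which is why the score is only meaningful relative to a matched ancestry group.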

The exact source of the GWAS association data differs for each report. Sources include scores from the Polygenic Score Catalog (5), in-house trained scores, and scores based directly on published GWAS results. This is explained in greater detail in Folkersen et al. (2020) (6), but the guiding principle is to always provide the most recent, well-established, and predictive scores for each disease and trait.

02

Calculating absolute risk scores

Importantly, the Z-score indicates the magnitude of genetic risk relative to people of the same ancestry group — it is not a disease risk prediction as defined in Wand et al. (2021) (7).

Disease risk predictions are given in terms of percentages. In a specific group of people with a given set of genetic (Z-scores) and non-genetic factors (e.g., age, BMI, sex at birth), an absolute risk score model predicts that a certain percentage of the group will have or develop a trait or disease. The continuous calibration and optimization of these disease risk predictions are currently based on large public biobank databases.
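To make the idea concrete, here is a minimal sketch of an absolute risk model. It assumes a logistic regression form, which is a common choice but is not confirmed as the model Nucleus uses, and every coefficient below is hypothetical rather than fitted to any biobank.

```python
from math import exp

# Hypothetical coefficients, for illustration only; not fitted to any data.
COEFFICIENTS = {
    "intercept": -2.0,      # log-odds at Z = 0 and the reference age
    "per_sd_prs": 0.4,      # log-odds change per one unit of Z-score
    "per_year_age": 0.03,   # log-odds change per year above the reference age
    "reference_age": 50,
}

def absolute_risk(z_score, age, coef=COEFFICIENTS):
    """Predicted probability of disease from a Z-score and one non-genetic factor."""
    log_odds = (
        coef["intercept"]
        + coef["per_sd_prs"] * z_score
        + coef["per_year_age"] * (age - coef["reference_age"])
    )
    return 1.0 / (1.0 + exp(-log_odds))

average = absolute_risk(z_score=0.0, age=50)
elevated = absolute_risk(z_score=2.0, age=50)
print(f"average genetic risk: {average:.1%}, Z = 2: {elevated:.1%}")
```

The output is a probability between 0 and 1, so it can be read as the share of people with the same inputs who are expected to have or develop the condition.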

It is important to note that the calibration considers ancestry: the PRS Z-score’s effect on the disease risk prediction is modulated according to the genetic distance of the user’s ancestry from the ancestry of the training population (8). This is the second way Nucleus is addressing the well-known ancestry biases in polygenic scores.
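One simple way to picture this modulation is to shrink the weight given to the Z-score as genetic distance grows. The sketch below assumes a linear attenuation with a made-up slope and an abstract distance unit; the actual functional form and parameters used in calibration are not given here.

```python
def attenuation(distance, slope=0.5):
    """Hypothetical weight in [0, 1] that shrinks as genetic distance grows."""
    return max(0.0, 1.0 - slope * distance)

def modulated_prs_term(z_score, per_sd_prs, distance):
    """Log-odds contribution of the Z-score after ancestry attenuation."""
    return per_sd_prs * attenuation(distance) * z_score

# Same Z-score, evaluated at increasing (hypothetical) genetic distances
# from the training population.
for distance in (0.0, 0.5, 1.0):
    term = modulated_prs_term(z_score=2.0, per_sd_prs=0.4, distance=distance)
    print(f"distance {distance:.1f}: PRS contribution {term:.2f}")
```

With this design, a user whose ancestry matches the training population gets the full effect of the score, while a more distant user's prediction leans more heavily on non-genetic factors.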

03

Classical Mendelian analysis

Finally, Nucleus Premium, in addition to polygenic scores, analyzes a customer’s whole-genome sequence for rare pathogenic and likely pathogenic variants (collectively called “high-effect variants” in our reports) in genes associated with a particular disease or trait. This addresses the well-known problem of over-reliance on only one “type” of genetics and bridges the gap between rare and common genetic variant analysis (9).

Variants are classified using an internally validated combination of tools within the Ensembl Variant Effect Predictor (VEP), publicly and privately available databases such as ClinVar and HGMD, and variant interpretation guidelines provided by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology. If a pathogenic or likely pathogenic variant is found, the customer receives a different, more specialized report that focuses on the risks associated with that particular variant.
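The final routing step can be sketched as a filter over already-classified variants. The `Variant` record, gene names, and report labels below are hypothetical, and the classification itself (the VEP, database, and guideline work described above) is assumed to have happened upstream.

```python
from dataclasses import dataclass

# The two classification tiers that count as "high-effect" in the reports.
HIGH_EFFECT = {"pathogenic", "likely pathogenic"}

@dataclass(frozen=True)
class Variant:
    gene: str            # hypothetical gene symbol
    identifier: str      # hypothetical variant label
    classification: str  # tier assigned by upstream interpretation

def high_effect_variants(variants, genes_of_interest):
    """Keep pathogenic or likely pathogenic variants in the relevant genes."""
    return [
        v for v in variants
        if v.gene in genes_of_interest and v.classification.lower() in HIGH_EFFECT
    ]

def choose_report(variants, genes_of_interest):
    """Route to the specialized report when any high-effect variant is found."""
    if high_effect_variants(variants, genes_of_interest):
        return "high-effect variant report"
    return "standard report"

calls = [
    Variant("GENE_A", "var_1", "Likely pathogenic"),
    Variant("GENE_A", "var_2", "benign"),
    Variant("GENE_B", "var_3", "pathogenic"),
]
print(choose_report(calls, genes_of_interest={"GENE_A"}))
```

Restricting the filter to `genes_of_interest` reflects that each report looks only at genes associated with its particular disease or trait.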

Heritability

Taken together, this approach of combining common variant genetics, non-genetic factors and rare variant genetics in each report provides a highly comprehensive view of someone’s health and future well-being.

When it comes to understanding the overall effect of genetics on any given disease or trait, we have an important upper bound in the form of broad-sense heritability (H²). H² can be calculated for any trait or disease. It is a percentage that indicates how much of the variation in any given disease or trait can theoretically be explained by DNA.
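As a worked illustration, broad-sense heritability is the ratio of genetic variance to total phenotypic variance. The sketch below uses the simplest textbook decomposition, in which total variance is the sum of a genetic and an environmental component with no covariance or interaction between them; the variance values are made up.

```python
def broad_sense_heritability(genetic_variance, environmental_variance):
    """H^2 = V_G / V_P, with V_P = V_G + V_E (no covariance or interaction terms)."""
    total_variance = genetic_variance + environmental_variance
    if total_variance <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return genetic_variance / total_variance

# Hypothetical variance components for a trait.
h2 = broad_sense_heritability(genetic_variance=3.0, environmental_variance=1.0)
print(f"H² = {h2:.0%}")
```

Because the genetic component can never exceed the total, the result always falls between 0% and 100%, which is why it serves as an upper bound on what any genetic score could explain.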

Broad-sense heritability is included within each disease report. The goal is to reiterate the point that DNA is an essential tool when studying health and well-being, and has a strong influence on someone’s life. However, it is not a deterministic crystal ball — even if one day we know everything about how genomics affects diseases and traits.

References

1

Lee SH, van der Werf JH, Hayes BJ, et al. Predicting unobserved phenotypes for complex traits from whole-genome SNP data. PLoS Genet. 2008 Oct;4(10):e1000231. doi: 10.1371/journal.pgen.1000231. Epub 2008 Oct 24. PMID: 18949033.

2

1000 Genomes Project Consortium; Auton A, Brooks LD, Durbin RM, et al. A global reference for human genetic variation. Nature. 2015 Oct 1;526(7571):68-74. doi: 10.1038/nature15393. PMID: 26432245.

4

Martin AR, Kanai M, Kamatani Y, et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019 Apr;51(4):584-591. doi: 10.1038/s41588-019-0379-x. Epub 2019 Mar 29. Erratum in: Nat Genet. 2021 May;53(5):763. PMID: 30926966.

6

Folkersen L, Pain O, Ingason A, et al. Impute.me: An open-source, non-profit tool for using data from direct-to-consumer genetic testing to calculate and interpret polygenic risk scores. Front Genet. 2020 Jun 30;11:578. doi: 10.3389/fgene.2020.00578. PMID: 32714365.

7

Wand H, Lambert SA, Tamburro C, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021 Mar;591(7849):211-219. doi: 10.1038/s41586-021-03243-6. Epub 2021 Mar 10. PMID: 33692554; PMCID: PMC8609771.

8

Privé F, Aschard H, Carmi S, et al. Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am J Hum Genet. 2022 Jan 6;109(1):12-23. doi: 10.1016/j.ajhg.2021.11.008. Erratum in: Am J Hum Genet. 2022 Feb 3;109(2):373. PMID: 34995502.

9

Lacaze P, Manchanda R, Green RC. Prioritizing the detection of rare pathogenic variants in population screening. Nat Rev Genet. 2023 Jan 13. doi: 10.1038/s41576-022-00571-9. Epub ahead of print. PMID: 36639513.

LAST UPDATED 02/02/2024

HIPAA-COMPLIANT

CLIA-CERTIFIED

CAP-accredited

Made in the U.S.A.

© 2024 Nucleus Genomics, Inc.
