New polygenic risk score data and new features in the CVDKP

Today there are several exciting developments for the Cardiovascular Disease Knowledge Portal. We now provide open access to files specifying risk scores for five major complex diseases, as described in a paper published today. Additionally, new interfaces that simplify the interpretation of genetic association data have been added to the CVDKP, making it easier to pinpoint variants and datasets that are informative for a disease or phenotype of interest.

Genome-wide polygenic risk score (GPS) variant weight files available in the CVDKP

One promise of the genomic era is that we will be able to predict from people's genotypes whether they are at risk of developing disease. Although this is now possible for some monogenic diseases, prediction of genetic risk for polygenic diseases has been more challenging. But a new paper, published today in Nature Genetics by Amit Khera, Mark Chaffin, and colleagues, brings us much closer to that goal for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer.

The authors developed methods for calculating genome-wide polygenic risk scores (GPS) for those five diseases, using sophisticated algorithms to derive parameters for calculating the GPS based on large, imputed GWAS datasets. Validation and testing of the GPS scoring methods were facilitated by the availability of huge numbers of samples in data provided by the UK Biobank: the researchers were able to use a validation set of over 120,000 samples and a testing set of nearly 290,000.

For one of these diseases, coronary artery disease (CAD), monogenic predictors were already known: carriers of mutations causing familial hypercholesterolemia have a higher risk of CAD than the general population. The GPS predictor for CAD had far greater reach, identifying 20-fold more individuals at comparable or greater risk than are found by screening for the monogenic mutations. Applying all five GPS predictors to the test population, the researchers found that over 19% of the subjects had 3-fold or greater increased risk for at least one of the five diseases.

The ability to classify patients by polygenic risk score has several important implications. Most importantly, an individual's risk can be determined long before symptoms begin, even at birth, so preventive measures such as a healthy lifestyle or statin use can be started well before any damage has occurred. Scoring genetic risk will also help researchers select appropriate populations for clinical trials, and could help physicians determine which patients would benefit most from disease screening.

These GPS predictors are necessarily tailored for people of European ancestry, since sufficiently large sample sizes are currently only available for this population. The authors stress that this underscores the need for genetic data from other populations, so that these methods with their potentially major impacts on human health can be applied worldwide.

To help promote research on disease risk prediction, the authors have provided files listing the variants and weights for each of the five GPS. These files may be downloaded from the Data page of the CVDKP.
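For readers who want a feel for how such a weight file is used, here is a minimal sketch of the general idea behind a polygenic score: a weighted sum of effect-allele dosages. The column names (`variant_id`, `effect_allele`, `weight`), the variant identifiers, and the toy numbers are all assumptions made for illustration. They are not the layout of the downloadable files, and this is not the authors' scoring pipeline.

```python
import csv
import io


def load_weights(handle):
    """Read per-variant weights from a tab-delimited file.

    The column names used here are hypothetical; adjust them to match
    the header of the file you actually download.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["variant_id"], row["effect_allele"]): float(row["weight"])
        for row in reader
    }


def polygenic_score(weights, dosages):
    """Sum weight * effect-allele dosage over variants found in both inputs.

    Returns the score and the number of variants that contributed.
    """
    score = 0.0
    used = 0
    for key, weight in weights.items():
        dosage = dosages.get(key)
        if dosage is None:
            continue  # variant not genotyped or imputed for this person
        score += weight * dosage
        used += 1
    return score, used


# Toy weight file (made-up variants and weights, for illustration only).
example_file = io.StringIO(
    "variant_id\teffect_allele\tweight\n"
    "1:1000:A:G\tG\t0.10\n"
    "2:2000:C:T\tT\t-0.05\n"
    "3:3000:G:A\tA\t0.20\n"
)
weights = load_weights(example_file)

# One person's effect-allele dosages (0 to 2); the third variant is missing.
dosages = {("1:1000:A:G", "G"): 2.0, ("2:2000:C:T", "T"): 1.0}

score, used = polygenic_score(weights, dosages)
print(f"score = {score:.3f} from {used} of {len(weights)} variants")
```

In practice a raw score like this only becomes interpretable once it is compared with the distribution of scores in a reference population, and the dosages must be counted for the same effect allele that the weight file specifies.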

New CVDKP features help distill knowledge from data

We are pleased to announce four new features in the CVDKP:
  • "Clumping" variants by linkage disequilibrium
Manhattan plots, which show the significance of variant associations for a particular phenotype plotted against their chromosomal locations, are available from the "View full genetic association results for a phenotype" menu on the CVDKP home page. Now, when viewing a Manhattan plot, you may select a linkage disequilibrium (LD) threshold (r² = 0.1, 0.2, 0.4, 0.6, or 0.8) to reduce the number of linked variants shown on the plot for each association signal.
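To illustrate what an LD threshold does, here is a minimal sketch of greedy LD clumping as it is commonly done in GWAS analysis: take the most significant variant as an index, fold in every remaining variant whose r² with it meets the threshold, and repeat. This is a generic version of the technique, not the portal's implementation, and the variant names, p-values, and r² values are invented for the example.

```python
def clump(p_values, r2, threshold):
    """Greedy LD clumping.

    p_values:  dict mapping variant name -> association p-value
    r2:        dict mapping frozenset({a, b}) -> pairwise r^2
               (pairs that are absent are treated as r^2 = 0)
    threshold: variants with r^2 >= threshold to an index variant
               are folded into that index variant's clump

    Returns a list of (index_variant, [clumped_variants]) tuples,
    ordered from the most to the least significant index variant.
    """
    ordered = sorted(p_values, key=p_values.get)  # smallest p-value first
    assigned = set()
    clumps = []
    for index in ordered:
        if index in assigned:
            continue  # already folded into a more significant clump
        assigned.add(index)
        members = []
        for other in ordered:
            if other in assigned:
                continue
            if r2.get(frozenset((index, other)), 0.0) >= threshold:
                members.append(other)
                assigned.add(other)
        clumps.append((index, members))
    return clumps


# Toy data (hypothetical variants and values).
p_values = {"var_A": 1e-12, "var_B": 1e-9, "var_C": 1e-8, "var_D": 1e-6}
r2 = {
    frozenset(("var_A", "var_B")): 0.9,
    frozenset(("var_A", "var_C")): 0.3,
    frozenset(("var_C", "var_D")): 0.7,
}

for threshold in (0.8, 0.2):
    index_variants = [index for index, _ in clump(p_values, r2, threshold)]
    print(f"r2 >= {threshold}: plot {index_variants}")
```

With this toy data, the stricter threshold of 0.8 leaves three index variants while 0.2 leaves two, which matches the intent of the feature: a lower threshold groups more loosely linked variants together and leaves fewer points on the plot.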
  • New Region page
The Gene page of the CVDKP integrates and summarizes information about the associations of variants across the region of a gene. Now, you can see the same integrated summary for any region of the genome, not just the areas surrounding protein-coding genes. Simply enter a chromosome and coordinates in the home page search box.


The resulting page resembles a Gene page. The traffic light integrates all associations across the region to give you an immediate indication of whether there are significant associations found in any of the datasets in the CVDKP. Further down the page, tools and displays let you drill down to the specifics for a phenotype or variant of interest. This new Region page provides a way to explore any part of the genome in great detail.
  • PheWAS graphic on the Variant page
Previously, the Variant page of the CVDKP displayed significant associations for each variant in a graphic that showed a color-coded box for each phenotype-dataset combination. But the rapidly increasing number of phenotypes has made this view unsustainably large. In its place, we have incorporated a phenome-wide association study (PheWAS) visualization developed at the University of Michigan. The graphic shows at a glance which phenotype associations are most significant for a particular variant. Mouse over a point to see more details.


  • All Associations graphic on the Variant page
The PheWAS graphic distills variant associations in order to highlight the most significant ones. But suppose you want to drill down to the details and explore associations in every dataset, viewing parameters like sample size, odds ratio, and more? There's a graphic for that too: our new All Associations interactive graphic, located in the "Associations across all datasets" section of the Variant page. Start by using keywords to filter phenotypes. Filtering allows you to view one specific phenotype, several related phenotypes, or phenotypes in a broad category, such as lipid phenotypes; both the graphic and the table below it change in response to phenotype filtering. There are also options to filter by setting ranges of p-values, sample sizes, or both.

The graph plots p-value (vertical axis) vs. dataset sample size (horizontal axis) for each association. Points in the graph are triangular; whether the triangle points up or down indicates a positive or negative direction of effect, respectively. Mousing over a point shows you more details about the association and the dataset. This graphic can help you evaluate whether an association is likely to be real. As shown in the illustration below, a genuine signal should increase in significance (i.e., decrease in p-value) with increasing sample size.
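The reasoning behind that last point can be sketched numerically. Under a simple additive model for a quantitative trait, the expected test statistic for a true effect grows roughly with the square root of the sample size, so the expected p-value shrinks as datasets get larger, while a null variant gains nothing from extra samples. The effect size, allele frequency, and sample sizes below are arbitrary illustrative values, and the formula is a textbook approximation rather than anything computed by the portal.

```python
import math


def expected_p_value(beta, maf, n):
    """Approximate two-sided p-value expected for a true additive effect.

    beta: per-allele effect in trait standard-deviation units
    maf:  minor allele frequency
    n:    sample size

    Uses the approximation z ~ |beta| * sqrt(2 * maf * (1 - maf) * n),
    which assumes a small effect on a standardized quantitative trait.
    """
    z = abs(beta) * math.sqrt(2.0 * maf * (1.0 - maf) * n)
    # Two-sided normal tail probability: P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


sample_sizes = [5_000, 20_000, 80_000, 320_000]
p_values = [expected_p_value(beta=0.03, maf=0.25, n=n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"N = {n:>7,}  expected p ~ {p:.1e}")
```

In a real dataset the observed p-values scatter around this expectation, but an association that is strong in a small dataset and absent in much larger ones deserves skepticism.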





Like the rest of the CVDKP, these features are under continuous development. Please give them a try and let us know what you think!
