Health care is a misnomer for our medical system; it should be called sick care. Doctors mostly make their money when we are sick. What if doctors really could prevent disease? Well, they can, but you need to be prepared to do the work, because disease prevention is about:
- Lifestyle (what you eat, your weight and how much you exercise—covered here)
- Exposure to disease (wash your hands)
- Keeping great medical records (don’t get me started on a. doctors keeping paper files, b. doctors making it difficult to get your medical records (push them) and c. electronic medical record systems having different formats for the data (more))
- Documenting and understanding your genome (DNA)
This set of notes will dig into item 4: your genome! My hope is to explain this subject in a way that lets you understand how to get your genome data, view it at a high level, view the details, and begin to understand the inner workings of your genetic makeup, so that you see the value of leaving your ‘sick care’ doctor behind and finding a true personalized ‘health care’ MD.
Step 1: Have your genome mapped
There are many low-cost direct-to-consumer DNA mapping sites and this linked article will explain a few options for you to consider (here is another). I personally like 23andMe ($199 USD) because it does a great job of explaining DNA to a novice and a professional, they seek FDA approval, and the site allows you to download your data.
Let’s first cover a few standard definitions to make sure we are all on the same page:
- DNA (deoxyribonucleic acid) – A molecule composed of two chains that coil around each other to form a double helix carrying the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms and many viruses.
- Chromosome – A DNA molecule with part or all of the genetic material (genome) of an organism. Human cells have 23 pairs of chromosomes (22 pairs of autosomes and one pair of sex chromosomes), giving a total of 46.
- Genes – From 23andMe, “Genes are segments of DNA that tell your body how to function and what traits to express. People have about 22,000 genes in their genome. Most of these come in duplicate – one copy from your mother and one from your father. Everyone has the same set of genes, but each one can vary by a few letters (bases) between people. These “variants” can lead to differences in the way you look, how you respond to stimuli, and whether or not you are predisposed to certain diseases.”
Once you get your data back from one of these direct-to-consumer genome mapping sites you will have access to their portal. I’m going to use 23andMe as the example, but many are similar. When you get your report, you can easily go to the ‘Health’ section and see what it is reporting. It will look something like the following:
Step 2: Download your raw genome data to a safe, password-protected, and encrypted location.
If you are using 23andMe you can download your raw data using these instructions. If you know what you are looking for you can also dig into your raw data here (more on this later).
But what can you do with your raw genome data?
WARNING: This is where things get a bit tricky. There are five very important things to know:
- Some sites (like Promethease) list all the SNP markers associated with different traits and diseases, as curated from SNPedia. (From 23andMe, “A marker is a specific location in the genome where a genetic sequence has been shown to vary between people. Markers are denoted by a unique identifier, most often an “rs number”.”)
- Drawing any conclusion from this reporting is often frowned upon by geneticists. Some SNPs are strongly associated with a disease, and these are typically the ones 23andMe has FDA approval to report. For example, mutations in the BRCA1 gene increase the risk of breast cancer; Angelina Jolie is just one of the thousands of women who chose bilateral prophylactic mastectomy to mitigate the increased risk of a BRCA1 mutation. Most common diseases, however, are not really affected by any single SNP.
- The best analysis uses the compound effect of many SNPs, with the understanding that each contributes only a small effect. This concept is called polygenic risk scoring (PRS). It allows scientists to take anyone’s genome and calculate that person’s aggregate risk for certain diseases, even without one of the known major mutations. A polygenic risk score is the total score of all the minor gene variations that increase disease risk. This is a powerful upgrade to your doctor’s ability to predict disease in any given patient: doctors are no longer in the dark with only the family history to guide them. (here, here, here and here are 4 great articles on PRS)
- From the Norton & Elaine Sarnoff Center for Jewish Genetics: “Your “risk score” is not an absolute determinant of your health, personal lifestyle choices have an effect – You’ve heard of nature versus nurture. If an individual with a high “risk score” acts on preventative care advice, they may decrease their risk of having a genetic disease. The opposite can be true of someone with a low “risk score”. (Read more: How do your genes and the environment interact?)”
- Be careful of companies that target-market supplements or programs at gene variants. Always check with a licensed medical doctor (MD) before taking any action.
Step 3: Mapping your raw data to the SNPedia database (but heed warning #2 above)
I am going to use the Promethease site. A report is $12, and it can directly connect your 23andMe DNA data with the SNPedia human genetics wiki. It also provides information on the effects of genetic variants on phenotypes, and the information is sourced from peer-reviewed scientific publications. (A phenotype is the composite of an organism’s observable characteristics or traits, including its physical form and structure; its developmental processes; its biochemical and physiological properties; its behavior; and the products of behavior, for example a bird’s nest. An organism’s phenotype results from two basic factors: the expression of its genetic code and the influence of environmental factors.) Keep in mind that the match against the SNPedia database may be wrong, as the raw data is not held to the same quality level as the data in an FDA-approved report from 23andMe.
The report only takes 5 to 10 minutes to generate, and you will get it via email as a zip file and via their website. It will look like the figure below, where you have a search panel on the right and the data on the left. In the example below, you can see three things:
- The SNP (Single Nucleotide Polymorphism) marker, here rs1333049. From 23andMe, “A marker (SNP) is a specific location in the genome where a genetic sequence has been shown to vary between people. Markers are denoted by a unique identifier, most often an “rs number”, or “rsid”.”
- The Position. From 23andMe, “If you stretched out all of the DNA in a chromosome from end to end, you could count the position of each letter (A,C,T,G) relative to the first one in the sequence. This count is referred to as a genome coordinate or position. 23andMe uses the same coordinates as the National Center for Biotechnology Information (NCBI), build 37.”
- The Magnitude. From SNPedia.com, “Magnitude is a subjective measure of interest varying from 0 to 10. Over time it should be adjusted up or down by the community.” The range is from 0 (you have the common genotype) to 10 (significant information).
You probably only want to review magnitude 3 and above.
If you click on the SNP marker hyperlink rs1333049 you will be taken to the details page in the WIKI.
From the page above on the far right, you have links to many great sites including Ensembl and 23andMe’s detail pages.
Once on the 23andMe page you can also see the Variant (From 23andMe, “At any position in the genome that varies, there is more than one possible version (or variant) of the DNA sequence. For example, some people might have an A at a certain position, whereas other people might have a T.” Genetic variations, or variants, are the differences that make each person’s genome unique. DNA sequencing identifies an individual’s variants by comparing the DNA sequence of an individual to the DNA sequence of a reference genome maintained by the Genome Reference Consortium (GRC).) and Your genotype at a marker (From 23andMe, “Your genotype at a marker is the combination of variants that you have at that position on both chromosomes’ copies. For example, if you have the A on one chromosome copy and a T on the other one, your genotype is AT. Some chromosomes don’t come in pairs (i.e. the mitochondrial chromosome and, for the most part, the X and Y chromosomes in men), so your genotype can sometimes be a single letter.”)
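To make the marker, position, and genotype definitions concrete, here is a minimal Python sketch that reads a 23andMe-style raw data file and looks up one marker. It assumes the download is a tab-separated text file with `#` comment lines followed by four columns (rsid, chromosome, position, genotype); check the header of your own file before relying on that. The function name and the sample rows are made up for illustration and are not real results.

```python
import csv
import io

def parse_raw_genome(lines):
    """Map rsid -> (chromosome, position, genotype) from 23andMe-style lines."""
    # Skip blank lines and the '#' comment header (assumed file layout).
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    markers = {}
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) != 4 or row[0] == "rsid":
            continue  # ignore malformed rows and an uncommented column header
        rsid, chromosome, position, genotype = row
        markers[rsid] = (chromosome, int(position), genotype)
    return markers

# Illustrative rows only. The second marker is invented to show a
# single-letter genotype on an unpaired chromosome.
SAMPLE = """# rsid\tchromosome\tposition\tgenotype
rs1333049\t9\t22125503\tCG
rs0000001\tMT\t150\tT
"""

markers = parse_raw_genome(io.StringIO(SAMPLE))
print(markers["rs1333049"])
```

With a real download you would pass an open file instead of the sample text, for example `with open("genome.txt") as fh: markers = parse_raw_genome(fh)`, where `genome.txt` is a placeholder for wherever you saved your data.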
There are several other tools out there to get information on each one of the SNP markers. One of the best is found here at NIH.gov. With this, you can search for many research articles per SNP marker.
Now that you have all that data, please reread Warning #2 above!
Step 4: Map your data to known polygenic algorithms
These sites are reported to be working with polygenic risk scores:
- https://www.thehonestgene.org (Ancestry and BMI predictions)
- https://DNA.land (Ancestry)
Keep in mind that this is a relatively new science that has been enabled by the mapping of the human genome. The research is coming out fast. As an example, Sekar Kathiresan and his colleagues at Harvard University and the Broad Institute have been focused on variations linked to coronary artery disease, atrial fibrillation (an irregular heart rate), type 2 diabetes, inflammatory bowel disease, and breast cancer. They developed an algorithm that could use all this information on a disease’s genetic variants to produce a polygenic risk score, a single number that would indicate a person’s risk of developing each disease based on their genomic data. Their algorithm identified 20 times more people at high risk of a heart attack than did the traditional method of just looking for the variant that indicates inherited high cholesterol. If more people know they’re at risk, they can go on medication or start making lifestyle changes to prevent the onset of the disease. You can get a copy of the report here or here.
As an example, here is data from Impute.me, a non-profit (please donate) genetics analysis site run by independent academics since August 2015. Their design goal is to provide analysis at the cutting edge of what is currently known and possible in genetics research. A central part of their site is the creation of a guidebook for personal genome analysis. This book provides more in-depth explanations for many of the concepts involved, and it’s highly recommended as a guide to accompany your analysis. (New: Updates to the site will be announced at twitter).
Let’s go into a couple of interesting things you can do with their site. Note that I am using the text below directly from the Impute.me website.
A polygenic risk score is a value that gives a summary of a large number of different SNPs, each of which contributes a little to disease risk. The higher the value, the higher the risk of developing the disease. Of course, the interpretation of this risk depends a lot on other factors as well: how heritable the disease is, how much of this heritability we can explain with known SNPs, and, not least, what the risk of disease would be for you otherwise, i.e. without taking the genetic component into account. Because the polygenic risk score is only a risk modifier, knowledge of these three other values is required if you want to know what your overall risk is, i.e. the chance in percent. This calculator cannot provide that. But it can provide a view of the known genetic component of your disease risk, based on all the SNPs that we know are associated with the disease. This, we believe, makes it a better choice for complex diseases than the one-SNP-at-a-time analysis typically seen in consumer genetics.
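The idea of adding up many small effects can be sketched in a few lines of Python. This is not Impute.me's actual algorithm, just a generic illustration: the raw score is the sum of each SNP's effect size times the number of risk alleles you carry, and the Z-score compares that to a population average. The population mean and spread here are derived from allele frequencies under the simplifying assumptions that the SNPs are independent and in Hardy-Weinberg equilibrium. The SNP names, effect sizes, and frequencies are all invented.

```python
import math

# Hypothetical SNP table: rsid -> (risk allele, effect size, risk-allele frequency).
# These numbers are invented for illustration and do not come from any study.
SNP_WEIGHTS = {
    "rsA": ("G", 0.20, 0.30),
    "rsB": ("T", 0.10, 0.50),
    "rsC": ("C", 0.05, 0.10),
}

def raw_score(genotypes):
    """Sum of effect size times the number of risk alleles carried (0, 1, or 2)."""
    # Markers missing from `genotypes` are simply skipped in this sketch;
    # a real tool would need to handle missing data more carefully.
    return sum(
        beta * genotypes[rsid].count(allele)
        for rsid, (allele, beta, _freq) in SNP_WEIGHTS.items()
        if rsid in genotypes
    )

def population_mean_and_sd():
    """Expected score and spread for independent SNPs in Hardy-Weinberg equilibrium."""
    mean = sum(beta * 2 * freq for _a, beta, freq in SNP_WEIGHTS.values())
    variance = sum(
        beta ** 2 * 2 * freq * (1 - freq) for _a, beta, freq in SNP_WEIGHTS.values()
    )
    return mean, math.sqrt(variance)

def z_score(genotypes):
    """How many standard deviations the raw score sits from the population mean."""
    mean, sd = population_mean_and_sd()
    return (raw_score(genotypes) - mean) / sd

my_genotypes = {"rsA": "GG", "rsB": "AT", "rsC": "AA"}
print(round(raw_score(my_genotypes), 2), round(z_score(my_genotypes), 2))
```

A positive Z-score means more risk alleles (weighted by effect) than the average person in the assumed population. As the text above stresses, it is only a risk modifier and says nothing about the chance in percent.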
If you upload your 23andMe data, after a couple of days you will have access to this site and a unique ID that will be good for 2 weeks.
This is a module that can visualize the entire compendium of human disease – at each point showing relevant genetic findings. The goal is to illustrate how to present genetic data depending on a medical status.
Diseases where one mutation has a strong medical effect on you are luckily rare. For the majority of people, learning from our genes is instead a matter of risk modification and weak predictions. For a healthy adult, these are typically of little practical use. However, the assumption changes drastically if you are not healthy: if you are being evaluated for a given disease anyway, it may very well be useful to know if a different but medically related diagnosis has a particularly high or low risk.
For example, if a person is suffering from mental problems but has not yet been properly evaluated for any specific diagnosis, then genetic risk information for all diseases related to mental problems may become useful knowledge, because the information can then serve as a guiding point in the difficult challenge of a first diagnosis. Similar examples can be made for virtually all areas of early medical evaluation.
It is the purpose of the module to help with this: by forcing browsing into pre-defined sets of disease areas, the algorithm provides you only with genetic information that is relevant to your current medical status. Nothing more, nothing less. Risk scores relevant to the medical area you are interested in will be shown. Fluke signals from irrelevant disorders will not. The details behind all information given here can be explored in the remaining modules of the site, as indicated when you click on each of the colored bubbles above. As such, this module can serve as an entryway into the entire site, depending on your context and interest.
In the root of the tree, we find ‘feeling fine’, which is always a neutral color: People who feel fine don’t need to worry about their genetic risk scores. However, when selecting ‘heading to hospital’, climbing up the tree, the genetic risk scores are revealed as they become relevant. More of the thinking behind this module is explained in this short animation-video from 2017.
The overview of rare disease variants found in this module is not the most extensive single-SNP effects available online. They are shown here because they are all well-supported strong genetic effects, for a selection of rare inherited diseases where microarray analysis made sense. This was the reason these SNPs were included in the 2016-version of the 23andme health.
Especially the last part, that microarray analysis made sense, is very important when analyzing the genetics of rare disease. The microarray technology used in consumer genetics is not optimal because the really strong mutations typically are not measured on a microarray; DNA sequencing is required to detect them. Therefore, microarray analysis of rare disease effects has many problems with false negative results. There are many further details to this discussion; chapter 3.5 in this book is a good place to seek more information.
Nonetheless, the 2016-selection of microarray-measurable SNPs made by 23andme still is reasonably relevant to report, particularly for the carrier-information. For non-23andme users, this module has the additional benefit of translating the data for proprietary 23andme SNPs, with the caveat that because the SNPs are very rare they are often hard to impute.
This is a test of a systematic approach to drug-response SNPs. Most of the known drug-response-associated genetics concern liver enzymes (e.g. CYP2C19) and their break-down of drug metabolites. These are well characterized elsewhere already. The focus of this module is to integrate systematic multi-SNP profiles beyond liver enzymes and provide estimates of drug-response.
To illustrate how this works, the module shows the calculations that take place for a number of drug response predictions, both on a per-drug level and on a per-SNP level, corresponding to the first and the second table. The first table summarizes per-drug calculation whenever possible. If possible, a Z-score is calculated in the same way as also described in the complex disease module. If not, it is indicated as ‘not calculated’. In that case, it is necessary to look at the second table for comments on the individual SNPs from the input studies. The Z-score approach takes information from many SNPs, and can, therefore, be considered as more thorough, of course depending on the underlying scientific study.
Most SNPs in the genome are not actually found within a gene: they are ‘intergenic’. When talking about a gene mutation, however, as is done in popular media, the meaning is most often a SNP that alters the sequence of a gene. Because of selection pressure throughout our evolution, these are rare. They are also often the focus of scientific studies using DNA-sequencing technology to discover the causes of rare diseases. Interestingly, however, many of us actually have these ‘gene-breaking’ SNPs while nonetheless being perfectly healthy. The imputation technology used with this site gives the opportunity to identify a number of these based just on genotyping microarray results. If you give your ID code to this module, a table of all measured missense and nonsense mutations will be presented.
Interpretation of the table can be done in many ways and, unlike other modules, this does not give ‘one true answer’. One method is to search for SNPs where you have one or two copies of the non-common allele and then investigate the consequence using other resources such as dbSNP or ExAC. Note, however, that the definition of ‘common’ is very dependent on ethnicity: in this browser, common just means the allele most often found in impute.me users. It is therefore recommended to check the ethnic distribution in e.g. the 1000 Genomes browser. Another help provided is the PolyPhen and SIFT scores, which can give an indication of the consequence. Ultimately, the goal of this is to satisfy your curiosity about the state of your functional genes. If you happen to find out that you carry two copies of a completely deleterious mutation (nonsense mutation) but otherwise feel healthy, feel free to contact us. By being healthy in spite of a specific broken gene, you’d be helping to complete our view of genes and how they work.
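The "one or two copies of the non-common allele" search described above can be sketched as a small Python filter. The row layout, the marker names, and the idea that the table gives you a single "common allele" per SNP are assumptions made for this illustration; the real table on the site may be organized differently.

```python
def non_common_copies(genotype, common_allele):
    """Number of alleles in the genotype that differ from the common allele."""
    return sum(1 for allele in genotype if allele != common_allele)

# Hypothetical rows: (rsid, consequence, common allele, your genotype).
rows = [
    ("rsX1", "missense", "A", "AA"),
    ("rsX2", "nonsense", "C", "CT"),
    ("rsX3", "missense", "G", "TT"),
]

# Keep only the SNPs where at least one copy differs from the common allele.
to_investigate = [
    (rsid, consequence, non_common_copies(genotype, common))
    for rsid, consequence, common, genotype in rows
    if non_common_copies(genotype, common) > 0
]
print(to_investigate)
```

Each marker that survives the filter is only a starting point for looking things up in the other resources mentioned above, since ‘common’ depends on which population you compare against.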
Thousands of mutations in the BRCA1 and BRCA2 genes have been documented. 23andMe reports data for three mutations that account for much of inherited breast cancer, but other possible mutations in these two genes are not included in the 23andMe report. Many can only be detected by sequencing, such as from Myriad Genetics. However, dozens of extra possible mutations of interest can be reached with imputation analysis. The following lists your genotype for the three directly measured 23andMe SNPs as well as all other SNPs in the two genes that are either missense or nonsense. For interpretation, we recommend reading more about PolyPhen, SIFT scores, and ClinVar.
If ClinVar is indicated as pathogenic, the SNP is measured in your genome, and your genotype is not the genotype indicated as normal, then this indicates a potential problem. The list is sorted according to the ClinVar variable by default.
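The three-part rule above can be written as a small Python check. This is only a sketch of the logic as stated in the text: the function name and arguments are hypothetical, and it assumes that an unmeasured marker shows up either as a missing value or as a dash-filled genotype such as `--`.

```python
def is_potential_problem(clinvar, genotype, normal_genotype):
    """True only if all three conditions from the rule hold."""
    # Assumption: a missing value or a genotype containing '-' means "not measured".
    measured = genotype is not None and "-" not in genotype
    return (
        clinvar.lower() == "pathogenic"
        and measured
        # Compare sorted letters so that "AG" and "GA" count as the same genotype.
        and sorted(genotype) != sorted(normal_genotype)
    )

print(is_potential_problem("Pathogenic", "AG", "GG"))
print(is_potential_problem("Pathogenic", "--", "GG"))
```

A result of `True` here means "worth raising with a medical specialist", not a diagnosis; imputed genotypes for rare variants can be wrong.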
UK Bio-bank Calculator
A study of about half a million UK residents, known as the UK Biobank, has recently been published. This module allows the calculation of a genetic risk score for any of the published traits.
Now that you have all that data please reread Warning #4 above.
Step 5. Make a plan.
If you are at high risk for coronary artery disease, see a cardiologist. If you are at high risk for breast cancer, mental illness, eye problems, etc., see a medical (MD) specialist.
…but be wary of Warning #5 above: don’t go see a “quack” and don’t self-medicate!
…but also do your homework and find out whether the specialist is up to date on the latest research. For example, if you see a psychiatrist for ADHD, make sure they are trained in epigenetics.
- Book: Understanding your DNA
- Reddit: 23andMe, Promethease, SNPedia
- PRSice: (nih article) (download) (manual)
- PLINK (download) (using 23andMe data)
- Google Genomics (and 23andMe data)