Every time I visit the doctor, I feel like I live in the dark ages

Health care is a misnomer for our medical system. It should be called sick care. Doctors mostly make their money when we are sick. What if doctors really could prevent disease? Well, they can, but you need to be prepared to do the work, because disease prevention is about:

  1. Lifestyle (what you eat, your weight and how much you exercise—covered here)
  2. Exposure to disease (wash your hands)
  3. Keeping great medical records (don’t get me started on a. doctors keeping paper files, b. doctors making it difficult to get your medical records (push them) and c. electronic medical record systems having different formats for the data (more))
  4. Documenting and understanding your genome (DNA)

This set of notes will dig into “4”: your genome! My hope is to explain this subject in a way that helps you understand how to get your genome data, view it at a high level, view the details, and begin to understand the inner workings of your genetic makeup, so that you see the value of leaving your ‘sick care’ doctor behind and finding a true personalized ‘health care’ MD.

Step 1: Have your genome mapped

There are many low-cost direct-to-consumer DNA mapping sites and this linked article will explain a few options for you to consider (here is another).  I personally like 23andMe ($199 USD) because it does a great job of explaining DNA to a novice and a professional, they seek FDA approval, and the site allows you to download your data.

Let’s first cover a few standard definitions to make sure we are all on the same page:

  • DNA (deoxyribonucleic acid) – A molecule composed of two chains that coil around each other to form a double helix carrying the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms and many viruses.
  • Chromosome – a DNA molecule with part or all the genetic material (genome) of an organism. Human cells have 23 pairs of chromosomes (22 pairs of autosomes and one pair of sex chromosomes), giving a total of 46.
  • Genes – From 23andMe, “Genes are segments of DNA that tell your body how to function and what traits to express. People have about 22,000 genes in their genome. Most of these come in duplicate – one copy from your mother and one from your father. Everyone has the same set of genes, but each one can vary by a few letters (bases) between people. These “variants” can lead to differences in the way you look, how you respond to stimuli, and whether or not you are predisposed to certain diseases.”

Once you get your data back from one of these direct-to-consumer genome mapping sites you will have access to their portal.  I’m going to use 23andMe as the example, but many are similar. When you get your report, you can easily go to the ‘Health’ section and see what it is reporting. It will look something like the following:

Step 2: Download your raw genome data to a safe, password-protected and encrypted location.

If you are using 23andMe you can download your raw data using these instructions.  If you know what you are looking for you can also dig into your raw data here (more on this later).
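For readers who want to look inside the downloaded file themselves, here is a minimal Python sketch. It assumes the raw data is a tab-separated text file with comment lines starting with `#` and four columns (rsid, chromosome, position, genotype), and that `--` marks a position that could not be read; check the header of your own download before relying on this. The function name `parse_raw_genome` and the sample rows are made up for illustration.

```python
def parse_raw_genome(lines):
    """Return {rsid: (chromosome, position, genotype)} from raw-data lines.

    Assumes four tab-separated columns and '#' comment lines (an assumption
    about the file layout; verify it against your own download).
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip anything that does not look like a data row
        rsid, chromosome, position, genotype = fields
        genotypes[rsid] = (chromosome, int(position), genotype)
    return genotypes


# Made-up rows for illustration only; these are not real markers.
SAMPLE = """# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t12345\tAA
rs0000002\t9\t67890\tCG
i0000003\tMT\t100\tT
rs0000004\t2\t555\t--
"""

genome = parse_raw_genome(SAMPLE.splitlines())

# Keep only markers that were actually read ('--' is assumed to mean no-call).
called = {rsid: row for rsid, row in genome.items() if "-" not in row[2]}
print(len(genome), "markers parsed,", len(called), "with a genotype call")
```

With a real file you would pass an open file object instead of `SAMPLE.splitlines()`, and keep that file in the same encrypted location as the download.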

But what can you do with your raw genome data?

WARNING: This is where things get a bit tricky. There are five very important things to know:

  1. If you upload your raw data to a website (here is a list of websites and another), you need to be very careful about things like who is behind the site, what their country of origin is, what their security policy is and, most important, what their privacy policy is.
  2. Some sites (like Promethease) list all the SNP markers associated with different traits and diseases, as curated from SNPedia. (From 23andMe: “A marker is a specific location in the genome where a genetic sequence has been shown to vary between people. Markers are denoted by a unique identifier, most often an “rs number”.”) Drawing any conclusion from this reporting is often frowned upon by geneticists. Some SNPs are strongly associated with a disease, and these are typically the ones 23andMe has FDA approval to report. An example is BRCA1/2: mutations in the BRCA1 gene increase the risk of breast cancer, and Angelina Jolie is just one of the thousands of women who chose bilateral prophylactic mastectomy to mitigate the increased risk of a BRCA1 mutation. Most common diseases, however, are not strongly affected by any single SNP.
  3. The best analysis uses the compound effect of many SNPs, with the understanding that each contributes only a small effect. This concept is called polygenic risk scoring (PRS). A polygenic risk score is the total score of all the minor gene variations that increase disease risk, so scientists can take anyone’s genome and calculate that person’s aggregate risk for certain diseases even without one of the known major mutations. This is a powerful upgrade to your doctor’s ability to predict disease in any given patient: doctors are no longer in the dark with only the family history to guide them. (here, here, here and here are 4 great articles on PRS)
  4. From the Norton & Elaine Sarnoff Center for Jewish Genetics: “Your “risk score” is not an absolute determinant of your health, personal lifestyle choices have an effect – You’ve heard of nature versus nurture. If an individual with a high “risk score” acts on preventative care advice, they may decrease their risk of having a genetic disease. The opposite can be true of someone with a low “risk score”. (Read more: How do your genes and the environment interact?)”
  5. Be careful of companies that market supplements or programs targeted at gene variants. Always check with a licensed medical doctor (MD) before taking any action.
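The polygenic risk scoring idea in item 3 can be sketched in a few lines of Python. This is only an illustration of the "many small effects added together" concept, assuming the common additive form where each SNP contributes its effect size times the number of risk alleles you carry. The rsids, risk alleles and effect sizes below are invented, and published scores use thousands of SNPs with weights taken from large studies.

```python
def polygenic_risk_score(genotypes, weights):
    """Add up effect_size * (copies of the risk allele) over all scored SNPs.

    genotypes: {rsid: genotype string such as 'AG'}
    weights:   {rsid: (risk_allele, effect_size)}  (hypothetical values here)
    Returns (score, number_of_snps_used).
    """
    score = 0.0
    used = 0
    for rsid, (risk_allele, effect_size) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None or "-" in genotype:
            continue  # SNP not measured or not called, so it adds nothing
        score += effect_size * genotype.count(risk_allele)
        used += 1
    return score, used


# Invented weights for illustration only; not taken from any study.
HYPOTHETICAL_WEIGHTS = {
    "rs0000001": ("A", 0.10),
    "rs0000002": ("G", 0.05),
    "rs0000003": ("T", 0.20),
}

my_genotypes = {"rs0000001": "AA", "rs0000002": "CG", "rs0000003": "--"}
score, used = polygenic_risk_score(my_genotypes, HYPOTHETICAL_WEIGHTS)
print(f"raw score {score:.2f} from {used} of {len(HYPOTHETICAL_WEIGHTS)} SNPs")
```

A raw score like this means little by itself. It only becomes informative when compared with the scores of many other people, and even then Warning #4 applies.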

Step 3: Mapping your raw data to the SNPedia database (but heed warning #2 above)

I am going to use the Promethease site.  A report is $12, and it can directly connect your 23andMe DNA data with the SNPedia human genetics wiki. It also provides information on the effects of genetic variants on Phenotypes (the composite of the organism’s observable characteristics or traits, including its physical form and structure; its developmental processes; its biochemical and physiological properties; its behavior, and the products of behavior, for example, a bird’s nest. An organism’s phenotype results from two basic factors: the expression of an organism’s genetic code, and the influence of environmental factors.) and the information is sourced from peer-reviewed scientific publications. Keep in mind that the match against the SNPedia database may be wrong, as the raw data is not held to the same quality level as that which is part of an FDA approved report from 23andMe.

The report only takes 5 to 10 minutes to generate and you will get it via email as a zip file and via their website.  It will look like the figure below where you have a search panel on the right and the data on the left.  In the example below, you can see the SNP (Single Nucleotide Polymorphism) marker is rs1333049 (From 23andMe, “A marker (SNP) is a specific location in the genome where a genetic sequence has been shown to vary between people. Markers are denoted by a unique identifier, most often an “rs number”, or “rsid”.”).  You will also see the Position (From 23andMe, “If you stretched out all of the DNA in a chromosome from end to end, you could count the position of each letter (A,C,T,G) relative to the first one in the sequence. This count is referred to as a genome coordinate or position. 23andMe uses the same coordinates as the National Center for Biotechnology Information (NCBI), build 37.”).  You will also see the Magnitude (From SNPedia.com, “Magnitude is a subjective measure of interest varying from 0 to 10. Over time it should be adjusted up or down by the community.” The range is from 0 (you have the common genotype) to 10 (significant information).)  You probably only want to review magnitude 3 and above.

If you click on the SNP marker hyperlink rs1333049, you will be taken to the details page in the wiki.

From the page above on the far right, you have links to many great sites including Ensembl and 23andMe’s detail pages.

You can see more about the Ensembl tool in the article published at https://scottsuhy.com/2018/12/11/dmd-going-one-level-deeper-a-personal-problem/

Once on the 23andMe page you can also see the Variant (From 23andMe, “At any position in the genome that varies, there is more than one possible version (or variant) of the DNA sequence. For example, some people might have an A at a certain position, whereas other people might have a T.” Genetic variations, or variants, are the differences that make each person’s genome unique. DNA sequencing identifies an individual’s variants by comparing the DNA sequence of an individual to the DNA sequence of a reference genome maintained by the Genome Reference Consortium (GRC).) and Your genotype at a marker (From 23andMe, “Your genotype at a marker is the combination of variants that you have at that position on both chromosomes’ copies. For example, if you have the A on one chromosome copy and a T on the other one, your genotype is AT. Some chromosomes don’t come in pairs (i.e. the mitochondrial chromosome and, for the most part, the X and Y chromosomes in men), so your genotype can sometimes be a single letter.”)

There are several other tools out there to get information on each one of the SNP markers.  One of the best is found here at NIH.gov.  With this, you can search for many research articles per SNP marker.

Now that you have all that data, please reread Warning #2 above!

Step 4: Map your data to known polygenic algorithms

These sites are reported to be working with polygenic risk scores:

Keep in mind that this is a relatively new science that has been enabled by the mapping of the human genome. The research is coming out fast. As an example, Sekar Kathiresan and his colleagues at Harvard University and the Broad Institute have been focused on variations linked to coronary artery disease, atrial fibrillation (an irregular heart rate), type 2 diabetes, inflammatory bowel disease, and breast cancer.  They developed an algorithm that could use all this information on a disease’s genetic variants to produce a polygenic risk score, a single number that would indicate a person’s risk of developing each disease based on their genomic data. Their algorithm identified 20 times more people at high risk of a heart attack than did the traditional method of just looking for the variant that indicates inherited high cholesterol. If more people know they’re at risk, they can go on medication or start making lifestyle changes to prevent the onset of the disease.  You can get a copy of the report here or here.

As an example, here is data from Impute.me, a non-profit (please donate) genetics analysis site run by independent academics since August 2015. Their design goal is to provide analysis at the cutting edge of what is currently known and possible in genetics research. A central part of their site is the creation of a guidebook for personal genome analysis. This book provides more in-depth explanations for many of the concepts involved and it’s highly recommended as a guide to accompany your analysis. (New: Updates to the site will be announced on Twitter).

Let’s go into a couple of interesting things you can do with their site. Note that the text below is taken directly from the Impute.me website.

Complex Disease

A polygenic risk score is a value that gives a summary of a large number of different SNPs, each of which contributes a little to disease risk. The higher the value, the higher the risk of developing the disease. Of course, the interpretation of this risk depends a lot on other factors as well: how heritable the disease is, how much of this heritability we can explain with known SNPs and, not least, what the risk of disease would be for you otherwise, i.e. without taking the genetic component into account. Because the polygenic risk score is only a risk-modifier, knowledge of these three other values is required if you want to know what your overall risk is, i.e. the chance in percent. This calculator cannot provide that. But it can provide a view of the known genetic component of your disease risk, based on all the SNPs that we know are associated with the disease. This, we believe, makes it a better choice for complex diseases than the one-SNP-at-a-time analysis typically seen in consumer genetics.

If you upload your 23andMe data, after a couple of days you will have access to this site and a unique ID that will be good for 2 weeks.

Precision Medicine

This is a module that can visualize the entire compendium of human disease – at each point showing relevant genetic findings. The goal is to illustrate how to present genetic data depending on a medical status.

Diseases where one mutation has a strong medical effect on you are luckily rare. For the majority of people, learning from our genes is instead a matter of risk modifications and weak predictions. For a healthy adult, these are typically of little practical use. However, the assumption changes drastically if you are not healthy: if you are being evaluated for a given disease anyway, it may very well be useful to know if a different but medically related diagnosis has a particularly high or low risk.

For example, if a person is suffering from mental problems but has not yet been properly evaluated for any specific diagnosis, then genetic risk information for all diseases related to mental problems may become useful knowledge, because the information can then serve as a guiding point in the difficult challenge of a first diagnosis. Similar examples can be made for virtually all areas of early medical evaluation.

It is the purpose of the module to help with this: by forcing browsing into pre-defined sets of disease areas, the algorithm provides you only with genetic information that is relevant to your current medical status. Nothing more, nothing less. Risk scores relevant to the medical area you are interested in will be shown. Fluke signals from irrelevant disorders will not. The details behind all information given here can be explored in the remaining modules of the site, as indicated when you click on each of the colored bubbles above. As such, this module can serve as an entryway into the entire site, depending on your context and interest.

In the root of the tree, we find ‘feeling fine’, which is always a neutral color: People who feel fine don’t need to worry about their genetic risk scores. However, when selecting ‘heading to hospital’, climbing up the tree, the genetic risk scores are revealed as they become relevant. More of the thinking behind this module is explained in this short animation-video from 2017.

Rare Disease

The overview of rare disease variants found in this module is not the most extensive single-SNP effects available online. They are shown here because they are all well-supported strong genetic effects, for a selection of rare inherited diseases where microarray analysis made sense. This was the reason these SNPs were included in the 2016-version of the 23andme health.

Especially the last part, that microarray analysis made sense, is very important when analyzing the genetics of rare disease: the microarray technology used in consumer genetics is not optimal because the really strong mutations typically are not measured on a microarray. DNA sequencing is required to detect them. Therefore, microarray analysis of rare disease effects has many problems with false negative results. There are many further details to this discussion; chapter 3.5 in this book is a good place to seek more information.

Nonetheless, the 2016-selection of microarray-measurable SNPs made by 23andme still is reasonably relevant to report, particularly for the carrier-information. For non-23andme users, this module has the additional benefit of translating the data for proprietary 23andme SNPs, with the caveat that because the SNPs are very rare they are often hard to impute.

Drug Response

This is a test of a systematic approach to drug-response SNPs. Most of the known drug-response-associated genetics concern liver enzymes (e.g. CYP2C19) and their break-down of drug metabolites. These are well characterized elsewhere already. The focus of this module is to integrate systematic multi-SNP profiles beyond liver enzymes and provide estimates of drug-response.

To illustrate how this works, the module shows the calculations that take place for a number of drug response predictions, both on a per-drug level and on a per-SNP level, corresponding to the first and the second table. The first table summarizes per-drug calculation whenever possible. If possible, a Z-score is calculated in the same way as also described in the complex disease module. If not, it is indicated as ‘not calculated’. In that case, it is necessary to look at the second table for comments on the individual SNPs from the input studies. The Z-score approach takes information from many SNPs, and can, therefore, be considered as more thorough, of course depending on the underlying scientific study.
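A Z-score of this kind is usually obtained by comparing one person's score with the spread of scores in a reference group. The sketch below assumes that standard definition (score minus the group mean, divided by the group standard deviation) and a roughly bell-shaped distribution for the percentile step. The site's exact calculation is not shown in these notes, and the reference scores here are invented.

```python
from statistics import NormalDist, mean, pstdev


def z_score(score, reference_scores):
    """How many standard deviations a score sits above the reference mean."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference scores have no spread")
    return (score - mu) / sigma


def percentile_from_z(z):
    """Percent of a normal population expected to score below this Z-score."""
    return NormalDist().cdf(z) * 100


# Invented reference scores for illustration only.
reference = [0.1, 0.2, 0.3, 0.4, 0.5]
z = z_score(0.5, reference)
print(f"Z-score {z:.2f}, roughly the {percentile_from_z(z):.0f}th percentile")
```

A Z-score near 0 means an average genetic score for that trait; it says nothing about the non-genetic factors discussed under Complex Disease.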

Gene Mutations

Most SNPs in the genome are not actually found within a gene: they are ‘intergenic’. When talking about a gene mutation, however, as is done in popular media, most often the meaning is a SNP that alters the sequence of a gene. Because of selection pressure throughout our evolution, these are rare. Also, they are often the focus of scientific studies using DNA-sequencing technology to discover the causes of rare diseases. However, interestingly, many of us actually have these ‘gene-breaking’ SNPs while nonetheless being perfectly healthy. The imputation technology used with this site gives the opportunity to identify a number of these based just on genotyping microarray results. If you give your ID-code to this module, a table of all measured missense and nonsense mutations will be presented.

Interpretation of the table can be done in many ways and, unlike other modules, this one does not give ‘one true answer’. One method is to search for SNPs where you have one or two copies of the non-common allele and then investigate the consequence using other resources such as dbSNP or ExAC. Note, however, that the definition of ‘common’ is very dependent on ethnicity: in this browser, common just means the allele most often found in impute.me users. It is therefore recommended to check the ethnic distribution in e.g. the 1000 Genomes browser. Another help provided is the PolyPhen and SIFT scores, which can give an indication of the consequence. Ultimately, the goal of this is to satisfy your curiosity about the state of your functional genes. If you happen to find out that you carry two copies of a completely deleterious mutation (a nonsense mutation) but otherwise feel healthy, feel free to contact us. By being healthy in spite of a specific broken gene, you’d be helping to complete our view of genes and how they work.

BRCA

Thousands of mutations in the BRCA1 and BRCA2 genes have been documented. 23andMe reports data for three mutations that account for much of inherited breast cancer, but other possible mutations in these two genes are not included in the 23andMe report. Many can only be detected by sequencing, such as from Myriad Genetics. However, dozens of extra possible mutations of interest can be reached with imputation analysis. The following lists your genotype for the three directly measured 23andMe SNPs as well as all other SNPs in the two genes that are either missense or nonsense. For interpretation, we recommend reading more about PolyPhen and SIFT scores, and ClinVar.

If ClinVar is indicated as pathogenic, the SNP is measured in your genome, and your genotype is not the genotype indicated as normal, then this indicates a potential problem. The list is sorted according to the ClinVar variable by default.
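The three-part rule in the paragraph above can be written as a small filter. This is a sketch only: the field names (`clinvar`, `measured`, `genotype`, `normal_genotype`) and the rows are hypothetical stand-ins for whatever the table actually contains, and a flagged row is a reason to talk to a specialist, not a diagnosis.

```python
def is_potential_problem(row):
    """Apply the rule: pathogenic AND measured AND genotype differs from normal."""
    return (
        row["clinvar"] == "pathogenic"
        and row["measured"]
        and row["genotype"] != row["normal_genotype"]
    )


# Hypothetical table rows for illustration only; not real BRCA variants.
rows = [
    {"rsid": "rs0000001", "clinvar": "pathogenic", "measured": True,
     "genotype": "AG", "normal_genotype": "GG"},
    {"rsid": "rs0000002", "clinvar": "pathogenic", "measured": True,
     "genotype": "CC", "normal_genotype": "CC"},
    {"rsid": "rs0000003", "clinvar": "benign", "measured": True,
     "genotype": "AT", "normal_genotype": "TT"},
    {"rsid": "rs0000004", "clinvar": "pathogenic", "measured": False,
     "genotype": "AG", "normal_genotype": "GG"},
]

flagged = [row["rsid"] for row in rows if is_potential_problem(row)]
print(flagged)
```

Only the first row meets all three conditions; the fourth is excluded because an imputed (not measured) genotype is treated here as too uncertain to flag.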

UK Bio-bank Calculator

A study of ~½ million UK residents, known as the UK biobank, has recently been published. This module allows the calculation of a genetic risk score for any of the published traits.

Now that you have all that data please reread Warning #4 above.

Step 5. Make a plan.

If you are at high risk for coronary artery disease, see a cardiologist. If you are at high risk for breast cancer, mental illness, eye problems, etc., see a medical (MD) specialist.

…but be wary of Warning #5 above: don’t go see a “quack” and don’t self-medicate!

…but also do your homework and find out whether the specialist is up to date on the latest and greatest. For example, if you see a psychiatrist for ADHD, make sure they are trained in epigenetics.

Other Information:

Other Tools:

DMD, Going one level deeper – a personal problem…

I’ll keep these notes updated as I learn more about Duchenne Muscular Dystrophy (DMD) and the progress toward a cure.

I love Wired. They have incredible content for people interested in STEM, but after I read an article I’m often left with the feeling that I grasped the basics but didn’t really understand the details, and I think it may be because I didn’t listen as well as I should have during high school biology. For example, this article from August 2018 on DMD was very interesting to me because I have young relatives with the disease.

Basically the article says the following:

  • Some King Charles Spaniels have a mutation on their X chromosomes, in a gene that codes for a muscle protein called dystrophin, much like humans suffering from DMD.
  • Eric Olson from the University of Texas Southwestern Medical Center has successfully halted the progression of the disease in some of the dogs using a gene editing tool known as CRISPR but there is still a lot of work to be done (additional longer-term canine studies to test for safety) before human trials would be safe.
  • “Olson found a way to target an error-prone hot spot on exon 51, which he figured could, with a single slice, benefit approximately 13 percent of DMD patients.”
  • Olson licensed the technology and founded a startup called Exonics Therapeutics along with the CureDuchenne group (who invested $2M) and The Column Group (who invested $40M).
  • One of the challenges is figuring out how to manufacture enough viral delivery vehicles to inject CRISPR into all the muscles in the human body.

I get the basics and I should just move on but I can’t... I need to know more.  The new technology fascinates me: What is CRISPR and how does it work?  What is gene editing? What is a viral delivery vehicle?  What is dystrophin? ...but then there are also items I should understand but I don’t (items that I know I learned in high school but I’ve forgotten or never really grasped at the time): What’s a chromosome? What’s a gene?  What’s an Exon? What’s a protein and why is it important? …and how do dogs relate to humans? 

So the journey begins, and it shows how I think and my limitations :-). I know I won’t understand what Exonics does without understanding CRISPR/Cas9. I won’t understand CRISPR/Cas9 without understanding ‘gene editing’. I won’t understand ‘gene editing’ without understanding chromosomes & genes. I won’t understand chromosomes & genes without understanding DNA. I won’t understand DNA without understanding cells. I won’t understand cells without understanding proteins. I won’t understand proteins without understanding molecules and atoms. Hopefully, you get the point. Most people know when to stop. Me? Unfortunately, I need to go one step further, and I constantly find myself realizing I didn’t retain much of what I learned in high school. Then it becomes a bit of a puzzle. Some people like Sudoku; I like science. Unfortunately, I’m not a scientist, but I do have the passion (and motivation) to learn about this subject.

Let’s start with the basic definitions (YES, high school biology)–humans only:

What’s a cell

The cell is the smallest unit of life. The human body has more than 10 trillion cells. A cell has a membrane that contains receptors (proteins) that detect external signals (e.g. hormones). Inside the membrane are the cytoplasm (all the stuff inside the cell, like amino acids, that performs functions) and the nucleus.

We have to take a detour to high school chemistry for a second: What are molecules and atoms?

An atom is the smallest unit of matter. It contains a nucleus (protons and neutrons) and electrons. The number of atoms in the human body is staggering (here).

A molecule is 2 or more atoms held together by chemical bonds.  Much of the research references molecular formulas so you need to understand them. 

A molecular formula (example ‘a’) is a representation of a molecule that uses chemical symbols to indicate the types of atoms followed by subscripts to show the number of atoms of each type in the molecule. (A subscript is used only when more than one atom of a given type is present.)
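To make the notation concrete, here is a small Python sketch that reads a simple molecular formula and counts the atoms of each type, applying the rule that a missing subscript means one atom. It is a toy reader written for these notes (the function name `atom_counts` is made up) and it does not handle parentheses or charges.

```python
import re


def atom_counts(formula):
    """Count atoms in a simple molecular formula such as 'CH4' or 'C6H12O6'.

    An element symbol is one capital letter plus an optional lowercase
    letter; a missing number after the symbol means a single atom.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


print(atom_counts("CH4"))      # methane
print(atom_counts("C6H12O6"))  # glucose
```

For methane this gives one carbon and four hydrogens, which is exactly what the subscripts in the written formula say.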

The structural formula (example ‘b’) for a compound gives the same information as its molecular formula (the types and numbers of atoms in the molecule) but also shows how the atoms are connected in the molecule. The lines represent bonds that hold the atoms together. A chemical bond is an attraction between atoms or ions that holds them together in a molecule.

Examples A and B are both formulae for methane, which contains one carbon atom and four hydrogen atoms.  Here are other examples for your reference:

A typical human cell has somewhere around 42 million protein molecules. You can also find the number of molecules in the human body (here).

What is DNA (Deoxyribonucleic acid)?

DNA (and RNA) are nucleic acids, and, along with lipids, proteins and carbohydrates, constitute the four major macromolecules essential for all known forms of life.

Specifically, DNA is a molecule composed of two chains that coil around each other to form a double helix carrying the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms.  All the cells in a person’s body have the same DNA and the same genes. However, the difference between cells in different tissues and organs is that the “expression” of the genes differs between cells. Expression generally means that the message from the DNA is being copied and made into protein. For example, liver cells have different proteins than skin cells, even though their DNA is the same.

DNA is made up of nucleotides (sugar, phosphates and nitrogenous bases). There are 4 types of nitrogenous bases: Thymine (T), Adenine (A), Guanine (G) and Cytosine (C).

“A” bonds only with “T” and “C” only bonds with “G”
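Because of this pairing rule, knowing one strand of the double helix tells you the other. A minimal Python sketch (the function names are mine, chosen for illustration):

```python
# Base-pairing rule: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base that sits opposite each letter of the strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the opposite strand read in its own direction.

    The two strands of DNA run in opposite directions, so the partner
    strand is conventionally written reversed.
    """
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```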

What is RNA (Ribonucleic acid)? 

RNA is a molecule essential in various biological roles in coding, decoding, regulation, and expression of genes. Like DNA, RNA is assembled as a chain of nucleotides, but unlike DNA it is more often found in nature as a single strand folded onto itself. Cellular organisms use messenger RNA (mRNA) to convey genetic information (using the nitrogenous bases guanine, uracil, adenine, and cytosine, denoted by the letters G, U, A, and C) that directs the synthesis of specific proteins. Many viruses encode their genetic information using an RNA genome.
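The letter difference between DNA and mRNA is easy to show in code. The sketch below assumes you start from the DNA coding strand (the strand that reads the same as the message), in which case the mRNA has the same letters with U in place of T. The function name is made up for illustration.

```python
def transcribe(coding_strand):
    """Return the mRNA letters for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGAG"))  # AUGGAG
```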

What is a chromosome?

A chromosome is a DNA molecule that contains part of a human’s genetic material. A human cell nucleus contains 23 pairs (46 total) of chromosomes, which are long strands of DNA tightly wound into coils (note that sperm and egg cells contain only 23 total chromosomes). If you unwound each cell’s DNA, it would be about 6 feet long.

What is a gene?

A gene is a sequence (section) of DNA or RNA that codes for a molecule that has a function. The cell uses a set of rules (the genetic code) to translate the information encoded within DNA or mRNA sequences into proteins.

Genes can be turned ‘on’ or ‘off’, and they sit among long stretches of non-coding DNA (sometimes called ‘junk DNA’).

Human beings have roughly 20,500 genes, all coiled up in DNA, housed in each cell. That’s 20,500 places where the machinery of human life can be altered.

Genes are divided into sections called exons and introns (junk DNA). Exons are the sections of DNA that code for the protein and they are interspersed with introns.

The HUGO Gene Nomenclature Committee (HGNC) designates an official name and symbol (an abbreviation of the name) for each known human gene. The Committee has named more than 13,000 of the estimated 20,000 to 25,000 genes in the human genome.

Genes can also mutate…  Although the human genome consists of 3 billion nucleotides, changes in even a single base pair can result in dramatic physiological malfunctions.  For example, sickle-cell anemia is a disease caused by the alteration of a single nucleotide in the gene for the beta chain of the hemoglobin protein (the oxygen-carrying protein that makes blood red) and that is all it takes to turn a normal hemoglobin gene into a sickle-cell hemoglobin gene. This single nucleotide change alters only one amino acid in the protein chain– the results are devastating! Beta hemoglobin is a single chain of 147 amino acids, but because of the single-base mutation, the sixth amino acid in the chain is valine, rather than glutamic acid. Note below that ‘Wild-Type’ is the normal hemoglobin. 

To understand amino acids like valine and glutamic acid you need to understand the codon table found here:
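A codon table is just a lookup from three-letter mRNA "words" to amino acids, so it can be sketched as a small dictionary. The block below includes only five entries from the standard table, enough to show how swapping one letter (GAG to GUG) changes glutamic acid into valine, as in the sickle-cell example. The two short sequences are illustrative fragments, not the full hemoglobin message.

```python
# A small subset of the standard codon table (mRNA codon -> amino acid).
CODON_SUBSET = {
    "AUG": "Met",
    "CCU": "Pro",
    "GAG": "Glu",   # glutamic acid
    "GUG": "Val",   # valine
    "UAA": "Stop",
}


def translate(mrna):
    """Read mRNA three letters at a time and look up each codon.

    Codons missing from the subset table are reported as '?'.
    """
    amino_acids = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_SUBSET.get(mrna[i:i + 3], "?")
        if amino_acid == "Stop":
            break
        amino_acids.append(amino_acid)
    return amino_acids


normal = "CCUGAGGAG"   # illustrative fragment
sickle = "CCUGUGGAG"   # same fragment with a single A -> U change
print(translate(normal))  # ['Pro', 'Glu', 'Glu']
print(translate(sickle))  # ['Pro', 'Val', 'Glu']
```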

Gene Sequencing

DNA sequencing is the process of determining the order of nucleotides in DNA.  DNA molecules are incredibly long and consist of billions of nitrogen bases. In fact, if all the DNA bases of the human genome were typed as A, C, T, and G, the 3 billion letters would fill 4,000 books of 500 pages each.  The Human Genome Project was the effort to map all the human nucleotides and genes. 

The sickle-cell gene mentioned above is HBB. Sequences can also be compared across species: for example, if you were to compare the human sequence of the CLLU1 gene to that of a chimp or a macaque it would look like the following:

Tools

There are 2 common genome browsers (and several others): one from Ensembl and another from the University of California Santa Cruz Genomics Institute.

Let’s look at an Ensembl example:


Within the chromosome you can view the detail of a region (1) and inspect the genes (2).  For example, here (3) you can see the CLLU1 gene.

The sequence will provide the order of nucleotides in the gene and you can begin to see the sequence from the chimp / macaque example from above (1). 
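Once you have two sequences lined up letter by letter, comparing them is a simple count. The sketch below uses two short made-up sequences (not real human or chimp data) and assumes they are already aligned and of equal length; real comparisons first need an alignment step to handle insertions and deletions.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions where two aligned, equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def differences(seq_a, seq_b):
    """List (position, letter_a, letter_b) wherever the sequences differ."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Made-up toy sequences for illustration only.
species_one = "ATGGTGCATC"
species_two = "ATGGTGCACC"
print(percent_identity(species_one, species_two))  # 90.0
print(differences(species_one, species_two))       # [(8, 'T', 'C')]
```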

With that backdrop, we can now begin to understand the content in the Wired article.

What is CRISPR/Cas9 and gene editing? 

The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) method is based on a natural system used by bacteria to protect themselves from infection by viruses. When a bacterium detects the presence of viral DNA, it produces two types of short RNA, one of which contains a sequence that matches that of the invading virus. These two RNAs form a complex with a protein enzyme called Cas9. Cas9 can cut DNA (think of Cas9 as a set of molecular scissors). When the matching sequence, known as a “guide” RNA, finds its matching target within the viral genome, Cas9 cuts the target DNA, disabling the virus.

Cas9 can be engineered to cut any DNA sequence (not just viral DNA) at a precise location by changing the guide RNA to match the target DNA. Once inside the nucleus of the cell, the RNA-Cas9 complex will locate and lock on to a short sequence known as the PAM (Protospacer Adjacent Motif). Cas9 will then unzip the DNA and match it to its guide RNA, and if the match is complete, Cas9 will use its tiny molecular scissors to cut the DNA. If a new piece of DNA carrying the desired sequence is supplied along with the CRISPR system, then once the cut has been made this new DNA can pair up with the cut ends, recombining and replacing the original sequence with the new version.

Here is the basic process:

  1. Build the guide RNA (gRNA).  This guide RNA will direct the protein (Cas9) to its target DNA sequence. The guide RNA consists of a tracrRNA (a scaffold sequence necessary for Cas-binding) and a crRNA sequence (a user-defined ∼20 nucleotide spacer) that is identical to the target. The crRNA can be any ∼20 nucleotide DNA sequence, provided it meets two conditions:
    • The sequence is unique compared to the rest of the genome.
    • The target is present immediately adjacent to the Protospacer Adjacent Motif (PAM). The PAM sequence is essential for target binding, but the exact sequence depends on which Cas protein you use (check out the list of additional Cas proteins and PAM sequences).
  2. Guide RNA + Cas9. Once expressed, the Cas9 protein and the gRNA form a complex through interactions between the gRNA scaffold and surface-exposed positively-charged grooves on Cas9. Cas9 undergoes a conformational change upon gRNA binding that shifts the molecule from an inactive, non-DNA binding entity into an active DNA-binding entity. Importantly, the spacer region of the gRNA remains free to interact with target DNA.
  3. Bind. Once the Cas9-gRNA complex finds a DNA target, the seed sequence (8-10 bases at the 3′ end of the gRNA targeting sequence) will begin to bind to the target DNA. If the seed and target DNA sequences match, the gRNA will continue to bind to the target DNA in a 3′ to 5′ direction.
  4. Cut. Once Cas9 binds to the target DNA it cuts the target DNA ∼3-4 nucleotides upstream of the PAM sequence.
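  5. Repair (NHEJ or HDR). Once the CRISPR system has made the cut, the cell repairs the break using one of two pathways. If a new piece of DNA (a repair template) is supplied, it can pair up with the cut ends, recombining and replacing the original sequence with the new version.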
    • The efficient but error-prone non-homologous end joining (NHEJ) pathway
    • The less efficient but high-fidelity homology-directed repair (HDR) pathway
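
To make the five steps a bit more concrete, here is a toy Python sketch of my own; it is not from any CRISPR design tool. It assumes the common SpCas9 PAM of NGG (the text above notes that the PAM depends on the Cas protein), a spacer of exactly 20 nucleotides, a cut 3 nucleotides upstream of the PAM, and a 10-base seed. It only looks at one strand, writes the guide in DNA letters rather than RNA, and the function names (find_targets, is_unique, seed_matches, cut, repair_nhej, repair_hdr) are made up for illustration. Real repair is far messier than these two one-liners.

```python
import re

# Assumption: SpCas9-style PAM "NGG" right after a 20-nt target (steps 1 and 4).
# The lookahead lets us find overlapping candidate sites.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_targets(dna):
    """Step 1: list (start, spacer, pam, cut_index) for every 20-nt site followed by NGG."""
    dna = dna.upper()
    hits = []
    for m in SITE.finditer(dna):
        start = m.start()
        cut_index = start + 20 - 3  # step 4: cut 3 nt upstream of the PAM
        hits.append((start, m.group(1), m.group(2), cut_index))
    return hits


def is_unique(spacer, genome):
    """Step 1, first condition: the target appears only once (forward strand only here)."""
    return len(re.findall(f"(?={spacer})", genome.upper())) == 1


def seed_matches(spacer, site, seed_len=10):
    """Step 3 (toy): the PAM-proximal seed at the 3' end is compared first."""
    return spacer[-seed_len:] == site[-seed_len:]


def cut(dna, cut_index):
    """Step 4: split the sequence at the break."""
    return dna[:cut_index], dna[cut_index:]


def repair_nhej(left, right):
    """Step 5 (toy NHEJ): rejoin the ends but lose one base, mimicking a small indel."""
    return left[:-1] + right


def repair_hdr(left, right, template):
    """Step 5 (toy HDR): copy a supplied template in at the break."""
    return left + template + right


if __name__ == "__main__":
    # Made-up sequence: 4 filler bases, a 20-nt target, the PAM "TGG", 4 filler bases.
    dna = "AAATGATTACAGATTACAGATTACTGGAATT"
    for start, spacer, pam, cut_index in find_targets(dna):
        left, right = cut(dna, cut_index)
        print("target:", spacer, "PAM:", pam, "unique:", is_unique(spacer, dna))
        print("seed ok:", seed_matches(spacer, dna[start:start + 20]))
        print("NHEJ:", repair_nhej(left, right))
        print("HDR: ", repair_hdr(left, right, "CCC"))
```

The point of the sketch is the order of operations: pick a unique 20-letter target that sits next to a PAM, check the seed, cut a few letters before the PAM, then either patch the break sloppily (NHEJ) or patch it from a template (HDR).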

CRISPR can also be used to target many genes at once, which is helpful for complex diseases that are caused not by a single mutation but by many genes acting together.

If you want to geek out, you can try CRISPR yourself by ordering a kit here.  … Here is a YouTube video that shows the basics.  If you want to go very deep on CRISPR, read this PMC article.

What is Dystrophin and how is it important to Duchenne Muscular Dystrophy (DMD)?

In the study published in Science, a team led by Eric Olson at the University of Texas Southwestern Medical Center used CRISPR to successfully modify the DNA of four young dogs, reversing the molecular defect responsible for the canine version of DMD.

The dystrophin gene (view it in Ensembl) is the largest in the human genome, and there are thousands of different mutations that can all result in the disease. Olson found a way to target an error-prone hot spot on exon 51 (Ensembl), which he figured could, with a single slice, benefit approximately 13 percent of DMD patients.

One challenge, however, is that manufacturing enough viral delivery vehicles to carry CRISPR into all the muscles of the human body is difficult and expensive.

What is Exonics doing?

From PMC Oct 2018 Gene editing restores dystrophin expression in a canine model of Duchenne muscular dystrophy

From ScienceMag.org Oct 2018 “We used adeno-associated viruses to deliver CRISPR gene editing components to four dogs and examined dystrophin protein expression…” “dystrophin was restored to levels ranging from 3 to 90% of normal, depending on muscle type. In cardiac muscle, dystrophin levels in the dog receiving the highest dose reached 92% of normal. The treated dogs also showed improved muscle histology.” You can purchase the full report for $30 here.

From PMC Nov 2017 Single-cut genome editing restores dystrophin expression in a new mouse model of muscular dystrophy

From the funding press release Nov 2017: “Exonics has used SingleCut CRISPR to genetically repair and restore dystrophin, the key protein missing in children with Duchenne.”

From ScienceMag.org April 2017 CRISPR-Cpf1 correction of muscular dystrophy mutations in human cardiomyocytes and mice “pathophysiological hallmarks of muscular dystrophy were corrected in mdx mice following Cpf1-mediated germline editing”

These folks at Exonics are heroes!