Complementary DNA

In genetics, complementary DNA (cDNA) is DNA synthesized from a single-stranded RNA (e.g., messenger RNA (mRNA) or microRNA (miRNA)) template in a reaction catalyzed by the enzyme reverse transcriptase. cDNA is often used to clone eukaryotic genes in prokaryotes. When scientists want to express a specific protein in a cell that does not normally express that protein (i.e., heterologous expression), they will transfer the cDNA that codes for the protein to the recipient cell. In molecular biology, cDNA is also generated to analyze transcriptomic profiles in bulk tissue, single cells, or single nuclei in assays such as microarrays and RNA-seq.

cDNA is also produced naturally by retroviruses (such as HIV-1, HIV-2, simian immunodeficiency virus, etc.) and then integrated into the host's genome, where it creates a provirus.[1]

The term cDNA is also used, typically in a bioinformatics context, to refer to an mRNA transcript's sequence, expressed as DNA bases (deoxy-GCAT) rather than RNA bases (GCAU).

RNA serves as a template for cDNA synthesis.[2] In cellular life, cDNA is generated by viruses and retrotransposons for integration of RNA into target genomic DNA. In molecular biology, RNA is purified from source material after genomic DNA, proteins and other cellular components are removed. cDNA is then synthesized through in vitro reverse transcription.[3]

RNA is transcribed from genomic DNA in host cells and is extracted by first lysing cells then purifying RNA utilizing widely-used methods such as phenol-chloroform, silica column, and bead-based RNA extraction methods.[4] Extraction methods vary depending on the source material. For example, extracting RNA from plant tissue requires additional reagents, such as polyvinylpyrrolidone (PVP), to remove phenolic compounds, carbohydrates, and other compounds that will otherwise render RNA unusable.[5] To remove DNA and proteins, enzymes such as DNase and Proteinase K are used for degradation.[6] Importantly, RNA integrity is maintained by inactivating RNases with chaotropic agents such as guanidinium isothiocyanate, sodium dodecyl sulphate (SDS), phenol or chloroform. Total RNA is then separated from other cellular components and precipitated with alcohol. Various commercial kits exist for simple and rapid RNA extractions for specific applications.[7] Additional bead-based methods can be used to isolate specific sub-types of RNA (e.g. mRNA and microRNA) based on size or unique RNA regions.[8][9]

Using a reverse transcriptase enzyme and purified RNA templates, one strand of cDNA is produced (first-strand cDNA synthesis). The M-MLV reverse transcriptase from the Moloney murine leukemia virus is commonly used due to its reduced RNase H activity suited for transcription of longer RNAs.[10] The AMV reverse transcriptase from the avian myeloblastosis virus may also be used for RNA templates with strong secondary structures (i.e. high melting temperature).[11] cDNA is commonly generated from mRNA for gene expression analyses such as RT-qPCR and RNA-seq.[12] mRNA is selectively reverse transcribed using oligo-dT primers that are the reverse complement of the poly-adenylated tail on the 3' end of all mRNA. An optimized mixture of oligo-dT and random hexamer primers increases the chance of obtaining full-length cDNA while reducing 5' or 3' bias.[13] Ribosomal RNA may also be depleted to enrich both mRNA and non-poly-adenylated transcripts such as some non-coding RNA.[14]

The result of first-strand syntheses, RNA-DNA hybrids, can be processed through multiple second-strand synthesis methods or processed directly in downstream assays.[15][16] An early method known as hairpin-primed synthesis relied on hairpin formation on the 3' end of the first-strand cDNA to prime second-strand synthesis. However, priming is random and hairpin hydrolysis leads to loss of information. The Gubler and Hoffman Procedure uses E. Coli RNase H to nick mRNA that is replaced with E. Coli DNA Polymerase I and sealed with E. Coli DNA Ligase. An optimization of this procedure relies on low RNase H activity of M-MLV to nick mRNA with remaining RNA later removed by adding RNase H after DNA Polymerase translation of the second-strand cDNA. This prevents lost sequence information at the 5' end of the mRNA.

Complementary DNA is often used in gene cloning or as gene probes or in the creation of a cDNA library. When scientists transfer a gene from one cell into another cell in order to express the new genetic material as a protein in the recipient cell, the cDNA will be added to the recipient (rather than the entire gene), because the DNA for an entire gene may include DNA that does not code for the protein or that interrupts the coding sequence of the protein (e.g., introns). Partial sequences of cDNAs are often obtained as expressed sequence tags.

With amplification of DNA sequences via polymerase chain reaction (PCR) now commonplace, one will typically conduct reverse transcription as an initial step, followed by PCR to obtain an exact sequence of cDNA for intra-cellular expression. This is achieved by designing sequence-specific DNA primers that hybridize to the 5' and 3' ends of a cDNA region coding for a protein. Once amplified, the sequence can be cut at each end with nucleases and inserted into one of many small circular DNA sequences known as expression vectors. Such vectors allow for self-replication, inside the cells, and potentially integration in the host DNA. They typically also contain a strong promoter to drive transcription of the target cDNA into mRNA, which is then translated into protein.

On 13 June 2013, the United States Supreme Court ruled in the case of that while naturally occurring human genes cannot be patented, cDNA is patent eligible because it does not occur naturally.[17]

cDNA is also used to study gene expression via methods such as RNA-seq or RT-qPCR.[18][19][20] For sequencing, RNA must be fragmented due to sequencing platform size limitations. Additionally, second-strand synthesized cDNA must be ligated with adapters that allow cDNA fragments to be PCR amplified and bind to sequencing flow cells. Gene-specific analysis methods commonly use microarrays and RT-qPCR to quantify cDNA levels via fluorometric and other methods.

Some viruses also use cDNA to turn their viral RNA into mRNA (viral RNA → cDNA → mRNA). The mRNA is used to make viral proteins to take over the host cell.

An example of this first step from viral DNA to cDNA can be seen in the HIV cycle of infection. Here, the host cell membrane becomes attached to the virus’ lipid envelope which allows the viral capsid with two copies of viral genome RNA to enter the host. The cDNA copy is then made through reverse transcription of the viral RNA, a process facilitated by the chaperone CypA and a viral capsid associated reverse transcriptase.[21]

cDNA is also generated by retrotransposons in eukaryotic genomes. Retrotransposons are mobile genetic elements that move themselves within, and sometimes between, genomes via RNA intermediates. This mechanism is shared with viruses with the exclusion of the generation of infectious particles.[22][23]

Mark D. Adams et al. “Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project.” Science (American Association for the Advancement of Science) 252.5013 (1991): 1651–1656. Web.

Philip M. Murphy, and H. Lee Tiffany. “Cloning of Complementary DNA Encoding a Functional Human Interleukin-8 Receptor.” Science (American Association for the Advancement of Science) 253.5025 (1991): 1280–1283. Web.