Complete Guide to Understanding Single-Cell RNA-Seq
By Anne-Sophie Ay-Berthomieu, Ph.D.
March 4, 2021
With the rise of next-generation sequencing (NGS), omics analysis has become mainstream. In particular, more and more researchers include transcriptomic data in their publications. RNA-Seq can identify all the transcripts in a sample and has largely replaced other methods such as quantitative RT-PCR and microarrays for transcriptome-wide studies. Bulk RNA-Seq is very informative and can lead to the identification of the molecular mechanisms underlying physiological or pathological processes.
However, for researchers working with heterogeneous samples, including biopsies or animal organs, bulk RNA-Seq is limited because it averages expression over all the cells in the sample. To overcome this limitation, single-cell technologies have emerged and have been adapted to different omics protocols, including single-cell RNA sequencing (scRNA-Seq).
Here, we tell the story of scRNA-Seq, highlight the benefits and limitations of the technology, and explain the protocols.
What is RNA-Seq?
Early transcriptomic analyses relied on RT-PCR / RT-qPCR and microarrays. While RT-PCR / RT-qPCR measures the expression level of one known transcript at a time, microarrays cover a wide range of targets. However, both methods measure only predetermined transcripts, thereby limiting the discovery of new processes. RNA-Seq, in contrast, can sequence the whole transcriptome without prior knowledge of the sequences, which makes it applicable to non-model organisms whose genome is unknown. RNA-Seq also gives precise information about transcription and mRNA splicing by identifying exon-exon junctions, alternative splicing events, and transcription start sites (TSS) with single-base resolution. Moreover, RNA-Seq can reveal sequence variations, including SNPs. Compared to microarrays, RNA-Seq displays little to no background noise and is quantitative over a wide dynamic range, whereas microarrays perform poorly for transcripts expressed at very low or very high levels.
The RNA-Seq protocol starts with total RNA isolation, during which DNA and solvent contamination must be avoided. Ribosomal RNA (rRNA) represents a high proportion of total RNA and would otherwise use up precious sequencing reads, so it has to be excluded next. Depending on which kinds of RNA are to be detected, either mRNA is enriched by poly(A) selection, or rRNA is removed by ribosomal depletion, which keeps non-polyadenylated transcripts in the sample. NGS library preparation, like PCR, works on DNA, so the RNA is then converted into cDNA by reverse transcription. Sequencing indexes are added by ligation, and the library is amplified by PCR and sequenced. The single-cell version of this protocol is detailed later in the article.
Development of Bulk and Single-Cell RNA-Seq Assays
The initial development of RNA-Seq greatly benefited the study of small non-coding RNAs. Microarrays were a poor fit for these RNAs because the oligonucleotide probes on the arrays were designed from existing genome annotations, mostly coding genes. Small RNAs were also too short to be captured efficiently on a chip. One of the first studies to sequence RNA by pyrosequencing was led by Bartel DP. at MIT, using C. elegans samples. This study, which sequenced ~400K small RNAs, enabled the discovery of 18 new microRNAs, thousands of siRNAs, and a third class of small RNAs called 21U-RNAs. The 21U-RNAs are 21 nucleotides long, begin with a 5’ uridine, and map to two regions of chromosome IV, either between coding genes or within introns.
The miRNome was further studied by the same group in Drosophila by integrating computational predictions of new microRNAs with sequencing of small RNAs. Using pyrosequencing, almost 50% of the predicted miRNAs were identified and 59 novel genes were discovered. The same sequencing data led to another publication in which Ruby JG. et al. identified an alternative pathway for miRNA biogenesis. It turns out that some intronic miRNAs bypass Drosha cleavage: the spliced intron itself serves as the pre-miRNA. These introns were called “mirtrons.” Fourteen mirtrons were discovered in Drosophila and four in C. elegans. At the same time, Lai EC.’s group from the Memorial Sloan-Kettering Cancer Center in New York confirmed this alternative miRNA biogenesis pathway in another publication.
Large-scale RNA-Seq experiments brought many new findings not only to the field of non-coding RNA but also to transcriptomics in general, by improving genome annotations. They revealed that a larger portion of the genome is transcribed than previously expected. However, as with other bulk assays, bulk RNA-Seq is limited for heterogeneous samples, such as organs or biopsies, and for dynamic processes such as development and differentiation. In 2011, Linnarsson S.’s team developed a protocol to barcode individual cells during the reverse transcription step. Single cells (ES R1 and MEF cells) were loaded into a 96-well plate and lysed. RNAs were reverse transcribed to generate cDNA. Unique helper oligos were then used to incorporate a well-specific sequence at the 3’ end of the cDNA. Despite difficulties in detecting alternative splicing, the measured mRNA expression levels were similar to qPCR results. By generating a two-dimensional cell map, the researchers highlighted gene expression patterns specific to each cell line.
The quantity of starting material needed for bulk RNA-Seq can also be an issue. In 2009, Tang F. et al. published the first protocol for scRNA-Seq. They were able to analyze mRNA expression in a single mouse blastomere. They detected 75% more genes than with microarrays and discovered 1,753 splice junctions. They further analyzed the effects of miRNAs on mRNA expression by knocking out Dicer1 or Ago2 in oocytes. They observed an upregulation of more than 1,500 genes in each knockout, with 619 genes in common.
These two pioneering publications introduced a technology that is still under development. Whereas bulk RNA-Seq is a straightforward way to analyze the average transcriptome of a sample, scRNA-Seq gives more information about how the transcriptome varies between cells during evolving processes, healthy or pathological.
How Do Single-Cell RNA-Seq Protocols Work?
The scRNA-Seq workflow is broadly similar to the scATAC-Seq workflow, except that no nuclei isolation step is needed, since the RNAs of interest are located in the cytoplasm. Cells are isolated, and their RNAs are tagged with a barcode and reverse transcribed. All RNAs from the same cell are tagged with the same barcode. Cells can then be pooled together and the cDNA amplified to prepare the sequencing library.
Single Cell Partitioning
Cell preparation is the first and most important part of the scRNA-Seq protocol. To obtain a single-cell suspension, cells need to be gently mixed by pipetting to break up clumps and must be filtered to remove debris. High-quality starting material is key to obtaining sensitive scRNA-Seq data. Whereas the preparation can be easy for cell lines or suspension cells, it is more challenging for tissues. Indeed, the high temperatures required for a standard protease digestion can activate different pathways and thus completely alter the transcriptome profile. However, successful protocols that preserve the transcriptome profile are well documented in several publications for tumors, bone, bone marrow, and fresh-frozen tissues.
The partitioning of single-cell suspensions can be done either by limiting dilution in a 96-well plate or by taking advantage of existing platforms. The most common are the 10X Genomics Chromium™ Single Cell, the Bio-Rad ddSEQ™ Single-Cell Isolator, the 1CellBio inDrop™, and the Dolomite µEncapsulator. These systems use a microfluidic system to isolate single cells together with unique barcodes and reverse transcription reagents inside either gel beads or oil droplets.
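To see why single-cell suspensions are loaded at low density, the partitioning step can be sketched numerically. The example below assumes that the number of cells landing in a well or droplet follows a Poisson distribution, a common simplification that is not stated in the protocols above. The function names and the example loading densities are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of finding exactly k cells in a partition (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def loading_summary(mean_cells_per_partition: float) -> dict:
    """Fractions of empty, single-cell, and multi-cell partitions."""
    p_empty = poisson_pmf(0, mean_cells_per_partition)
    p_single = poisson_pmf(1, mean_cells_per_partition)
    p_multiplet = 1.0 - p_empty - p_single
    return {"empty": p_empty, "single": p_single, "multiplet": p_multiplet}


# Hypothetical loading densities (average number of cells per partition).
for mean in (0.05, 0.1, 0.5, 1.0):
    s = loading_summary(mean)
    print(
        f"mean={mean:<4} empty={s['empty']:.3f} "
        f"single={s['single']:.3f} multiplet={s['multiplet']:.4f}"
    )
```

Under this model, lowering the loading density reduces the share of partitions holding more than one cell, at the cost of leaving most partitions empty.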
Another strategy is split-pooling: instead of isolating single cells and barcoding them individually, samples are subjected to multiple rounds of aliquoting, barcoding, and pooling, so that each cell ends up with a unique combination of indexes.
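The arithmetic behind split-pooling can be sketched as follows. The example assumes that, in every round, each cell is assigned to a well uniformly at random and independently of the other cells. The function names, the 96-well format, and the number of rounds are illustrative choices, not parameters of a specific published protocol.

```python
def barcode_combinations(wells_per_round: int, rounds: int) -> int:
    """Number of distinct index combinations after all split-pool rounds."""
    return wells_per_round ** rounds


def expected_collision_fraction(n_cells: int, n_combinations: int) -> float:
    """Expected fraction of cells that share their combination with
    at least one other cell, under uniform random assignment."""
    if n_cells < 1:
        return 0.0
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)


# Illustrative setup: three rounds of barcoding in 96-well plates.
combos = barcode_combinations(wells_per_round=96, rounds=3)
print(f"{combos} possible combinations")
for n_cells in (1_000, 10_000, 100_000):
    frac = expected_collision_fraction(n_cells, combos)
    print(f"{n_cells} cells: about {frac:.2%} expected to share a combination")
```

The number of combinations grows exponentially with the number of rounds, which is why a few rounds in standard plates can index far more cells than there are wells.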
Reverse Transcription - Barcoding - Library Prep - Sequencing
Once cells are isolated with the barcodes and the RT reagents, they are lysed to release cellular mRNA. As in bulk RNA-Seq, rRNA would otherwise hijack most of the NGS reads, so it has to be excluded. To capture mRNA only, most protocols use poly-dT primers, which anneal to the poly(A) tail. These primers also carry a unique molecular identifier (UMI), a short random sequence that tags each captured molecule so that PCR duplicates can later be recognized. Poly-dT-primed mRNAs are reverse transcribed into cDNA. At this step, the transcriptome of each cell is individually stamped with its barcode. Then, the cDNA is amplified by PCR and sequencing indexes are incorporated. Depending on the protocol, the barcodes can be added at the amplification step instead of the RT step. The libraries are then ready to be pooled for NGS.
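How cell barcodes and unique molecular identifiers are used after sequencing can be illustrated with a small sketch. It assumes that each read has already been reduced to a (cell barcode, UMI, gene) triple. The read list, barcode sequences, and function name are made up for illustration. Reads that share all three values are treated as PCR copies of the same original molecule and counted once.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads into per-cell, per-gene molecule counts.

    reads: iterable of (cell_barcode, umi, gene) tuples.
    Returns {cell_barcode: {gene: number_of_distinct_umis}}.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(cell_barcode, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell_barcode, gene), umis in umis_seen.items():
        counts[cell_barcode][gene] = len(umis)
    return dict(counts)


# Toy data: two cells, hypothetical barcodes and UMIs.
reads = [
    ("AAAC", "GGT", "Actb"),
    ("AAAC", "GGT", "Actb"),   # same cell, UMI, and gene: a PCR duplicate
    ("AAAC", "TCA", "Actb"),   # different UMI: a second molecule
    ("AAAC", "GGT", "Gapdh"),  # same UMI on another gene: counted separately
    ("CCGT", "GGT", "Actb"),   # same UMI in another cell: counted separately
]
counts = count_molecules(reads)
print(counts)
```

This sketch collapses only exact UMI matches. Sequencing errors in the barcode or UMI are ignored here and would need additional handling on real data.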
The analysis of non-polyadenylated mRNAs or non-coding RNAs is more challenging and requires specific protocols. Most of these protocols use random primers for the reverse transcription, together with special conditions under which ribosomal RNAs are not reverse transcribed efficiently. Fan X. et al. showed that with a low starting RNA concentration and specific lysis and RT conditions, rRNAs represent only 1.5% of the total RNAs identified with scRNA-Seq.
Many all-inclusive solutions exist to analyze scRNA-Seq data. They cover quality control (QC) as well as the interpretation of the data. Companies selling reagents for single-cell experiments usually also offer software to analyze the sequencing data (10X Genomics Loupe or Fluidigm Singular).
QC mainly results in the exclusion of cells with poor-quality data. Once the data are filtered, the goal is to align the reads to the genome, quantify the expression of every transcript in each cell, and cluster the cells by expression profile. A range of algorithms and computational approaches is available for clustering scRNA-Seq data.
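A cell-level QC filter of the kind described above can be sketched in a few lines. The two criteria used here, the number of detected genes and the fraction of counts from mitochondrial genes, are commonly used but are an assumption of this example. The thresholds, the mouse-style "mt-" gene prefix, and the toy counts are hypothetical and would need tuning on real data.

```python
def qc_metrics(gene_counts, mito_prefix="mt-"):
    """Summarize one cell: total counts, detected genes, mitochondrial fraction."""
    total = sum(gene_counts.values())
    detected = sum(1 for count in gene_counts.values() if count > 0)
    mito = sum(c for gene, c in gene_counts.items() if gene.startswith(mito_prefix))
    return {
        "total_counts": total,
        "genes_detected": detected,
        "mito_fraction": mito / total if total else 0.0,
    }


def filter_cells(counts, min_genes=2, max_mito_fraction=0.2):
    """Keep cells with enough detected genes and a low mitochondrial fraction."""
    kept = {}
    for cell, gene_counts in counts.items():
        metrics = qc_metrics(gene_counts)
        if (metrics["genes_detected"] >= min_genes
                and metrics["mito_fraction"] <= max_mito_fraction):
            kept[cell] = gene_counts
    return kept


# Toy count table: {cell: {gene: count}}.
counts = {
    "cell_1": {"Actb": 10, "Gapdh": 5, "mt-Co1": 1},  # passes both criteria
    "cell_2": {"Actb": 1, "mt-Co1": 9},               # mostly mitochondrial
    "cell_3": {"Actb": 3},                            # too few genes detected
}
kept = filter_cells(counts)
print(sorted(kept))
```

The toy threshold of two detected genes only makes sense for this three-gene example. Real thresholds are usually chosen after inspecting the distribution of each metric across all cells.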
Advantages & Limitations of scRNA-Seq Assays
scRNA-Seq is a very powerful method that allows transcriptomic analysis of heterogeneous tissues or dynamic processes in a single experiment. Samples can be prepared directly for the scRNA-Seq protocol, without microdissection or FACS sorting. Depending on the number of cells and the sequencing depth, scRNA-Seq can be sensitive enough to detect a population representing less than 1% of the total cells.
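The dependence of rare-population detection on the number of profiled cells can be made concrete with a binomial calculation. The sketch below assumes that cells are sampled independently and that a population is "detected" once at least a chosen minimum number of its cells is captured. Both the function name and the example numbers are hypothetical, and sequencing depth is not modeled.

```python
import math


def prob_at_least(k_min: int, n_cells: int, frequency: float) -> float:
    """Probability of capturing at least k_min cells from a population
    present at the given frequency when n_cells are profiled."""
    p_fewer = sum(
        math.comb(n_cells, k) * frequency ** k * (1.0 - frequency) ** (n_cells - k)
        for k in range(k_min)
    )
    return 1.0 - p_fewer


# Illustrative question: chance of capturing at least 10 cells
# from a population that makes up 1% of the sample.
for n_cells in (500, 1_000, 2_000, 5_000):
    p = prob_at_least(k_min=10, n_cells=n_cells, frequency=0.01)
    print(f"{n_cells} cells profiled: P(at least 10 rare cells) = {p:.3f}")
```

The same function can be used the other way around, by increasing n_cells until the probability reaches the level required for a planned experiment.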
Because scRNA-Seq is so sensitive, some of the protocol steps have to be handled with care. The preparation of the single-cell suspension can be challenging, especially for tissues or cells surrounded by an extracellular matrix. The protocols used to isolate these cells can by themselves modify the transcriptome, so the sequencing data may reflect these manipulations rather than an endogenous cellular process. In contrast, for bulk RNA-Seq, tissues can be lysed directly in TRIzol, freezing the transcriptome at the very first step of the extraction. scRNA-Seq also detects fewer transcripts per cell than bulk RNA-Seq detects per sample, and this incomplete coverage can bias quantification. This flaw can be partly corrected by bioinformatic analysis. Finally, most scRNA-Seq protocols capture only polyadenylated RNAs, which excludes non-polyadenylated transcripts, including many non-coding RNAs.
scRNA-Seq is more expensive and more time-consuming than bulk RNA-Seq, but a single experiment yields the transcriptome profiles of several cell populations.
Discoveries Enabled by scRNA-Seq
scRNA-Seq has already been used in several high-impact studies, taking advantage of this sensitive technology to decipher pathological and biological processes.
Tumorigenesis is an evolving process, and the cells within a single tumor can be at different stages. Small-cell lung carcinoma (SCLC) is a highly invasive tumor type with poor outcomes. To understand chemoresistance in patients suffering from SCLC, Stewart AC. et al. generated circulating tumor cell (CTC)-derived xenografts and analyzed tumor heterogeneity by scRNA-Seq. In chemotherapy-resistant tumors, scRNA-Seq highlighted intratumoral heterogeneity, with activation of potential resistance pathways and of the epithelial-to-mesenchymal transition. The same kind of profile was found in post-relapse tumors, suggesting that several populations using different mechanisms of resistance to chemotherapy coexist in a tumor. Altogether, these results emphasize the need for combination chemotherapy to counteract the plurality of resistance mechanisms.
During development, cells commit to differentiation pathways and acquire specific functions. The complexity of this process depends on the organ. In the airway, O2 is transported to the alveoli to enter the blood circulation. The main cell types of the airway epithelium are basal progenitor cells, secretory club cells, and ciliated cells. Rarer cell types also exist, including solitary neuroendocrine (NE) cells, goblet cells, and tuft cells. Rajagopal J.’s lab combined scRNA-Seq and lineage tracing to study the murine tracheal epithelium. They discovered a new cluster of cells similar to the ionocytes found in Xenopus and zebrafish skin. These cells highly express the cystic fibrosis gene, Cftr. The researchers confirmed the developmental trajectories that had already been identified in the airways and highlighted unique structures they called “hillocks.” Hillock cells are characterized by the expression of genes associated with cellular adhesion and squamous epithelial differentiation. With lineage tracing, they showed that tuft cells, neuroendocrine cells, and ionocytes are all derived from basal progenitor cells.
Understanding the Brain
Most neurodegenerative diseases are still incurable. Deciphering how brain tissue works and how it declines is currently one of the most important fields of research in the world. Microglia, the parenchymal macrophages of the CNS, have been shown to be widely involved in brain development and neurological diseases. Li Q. et al. analyzed mouse microglia and myeloid cells at different developmental stages using scRNA-Seq. Whereas adult microglia were quite homogeneous regardless of the brain region, early postnatal microglia showed interesting heterogeneity. The scientists discovered proliferative-region-associated microglia (PAM) in the developing white matter, which display a transcriptomic profile similar to that of degenerative disease-associated microglia. PAM appear transiently in the white matter during myelination. Further investigations are needed to fully understand the role of PAM in physiological and pathological brain development.
The Future of Single-Cell RNA-Seq Assays
In 2020, more than 500 research articles used scRNA-Seq technology, suggesting that single-cell technology is becoming widely adopted in research labs. Single-cell technology has paved the way to the discovery of under-represented cell populations, signaling pathways, and pathological mechanisms.
Above, we related scRNA-Seq to new discoveries in cancer, development, and neurological diseases, but this is just a glimpse of the possibilities of this technology. Single-cell approaches have already been adapted to other omics experiments, including ATAC-Seq and whole-genome sequencing (WGS). Combining all these protocols will give a more complete picture of how the cell works in a physiological or pathological context, covering genomic mutations, chromatin accessibility, and the transcriptome.
With the development of ready-to-use single-cell stations, with adapted reagents and bioinformatic software, single-cell technology could also be used for personalized medicine. Identifying deregulated pathways could help anticipate chemoresistance in cancer and help select the correct dosage on the first attempt.
More broadly, single-cell RNA-Seq will bring a lot of new knowledge in basic biology as well as a better understanding of major diseases such as diabetes, Alzheimer’s disease, and cancer.
Summary: Single Cell Assays are Taking Research to the Next Level
Just as next-generation sequencing has largely replaced Sanger sequencing for large-scale projects, RNA-Seq has supplanted microarrays and opened the door to wide transcriptomic analysis. Now RNA-Seq has been adapted to single-cell technology. The possibility of analyzing the transcriptome in heterogeneous populations and during dynamic processes, without cell-sorting or purification steps, unlocks a whole new world of understanding.
scRNA-Seq is one of the tools already available for researchers. The combination of several single-cell technologies will allow scientists to address new questions by deciphering complex mechanisms in heterogeneous samples.
With the development of commercially available single-cell platforms, more and more labs will be equipped to run this kind of experiment. We look forward to the fantastic discoveries coming out of these studies.
About the author
Anne-Sophie Ay-Berthomieu, Ph.D.
Anne-Sophie was born in the south of France and grew up between the Mediterranean Sea and the Pyrenean Mountains. She grew up as a science fiction fan, leading her to specialize in molecular biology and genetics during graduate school at the University of Lyon, France (secretly hoping her research would give her superpowers!). After living in different places for work, she is back in Lyon, France where she shares her time between her husband, her family, and her friends. During her free time, Anne-Sophie challenges herself with hiking, climbing, racing, and traveling in foreign countries – while waiting for her superpowers to grow!
Contact Anne-Sophie on LinkedIn with any questions, or to tell her about your superpowers.