Spatial Transcriptomics

Recent developments in transcriptomics technologies (detection of RNA) enable quantification of the RNA content of tissues and single cells. In particular, spatial transcriptomics technologies that detect both transcripts and their spatial location have become available to researchers. The major breakthroughs in the field have been the spatial resolution (< 1 micrometer) and the ability to detect all transcripts simultaneously (genome-wide). The major advantage of high-resolution spatial transcriptomics methods over single cell or single nuclei transcriptomics is the ability to investigate all cells, including rare cells, in a single tissue section while maintaining information on the cellular environment and neighboring cells. This is especially valuable in neuroscience, where the complexity and molecular diversity of the brain make it extremely difficult to generate detailed and complete maps of protein expression using ‘bulk’ or single nuclei transcriptomics.

Stereo-seq data on the Human Protein Atlas

The brain resource now contains the first spatial transcriptomics data of the human ‘healthy’ cerebral cortex (frontal cortex). Based on single nuclei transcriptomics, we determined the likelihood of transcripts being expressed in the same cell-type. By creating a transcript-to-transcript matrix, we can predict the location of transcripts or, by combining several cell-type markers, predict the cell-type for each location in the spatial transcriptomics data.

Figure 1: Integration of Single Nuclei and Spatial Transcriptomics Data for Cell Segmentation and Transcript Location Imputation.
A) Co-expression Analysis: Single nuclei transcriptomics data are used to assess the co-expression of genes. By scaling these values using a z-score, we determine the likelihood of genes being co-expressed within the same cell.
B) RNA Location Imputation: This co-expression information enables imputation of RNA locations.
For each protein-coding gene at every tissue spot, we calculate the support for that transcript by summing the z-score values of all neighboring transcripts.
C) Identification of Cell-Type Marker Genes: Genes are clustered based on their co-expression profiles (all genes to all genes). These clusters were annotated using known cell-type markers. Clusters containing gene sets enriched in neurons, astrocytes, oligodendrocytes, microglia, and vasculature-associated cells were selected.
D) Creating cell-type masks: Using the accumulated gene-gene co-expression data, we calculate the likelihood of each tissue spot belonging to one of the annotated clusters, based on the identity of neighboring transcripts.
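
The steps in the Figure 1 caption can be sketched on toy data. This is only an illustration under stated assumptions, not the HPA pipeline: the caption does not name the co-expression measure or how the z-score is applied, so Pearson correlation and row-wise scaling are used as stand-ins, and the gene names (N1, N2, A1, A2), counts, and cluster labels are invented.

```python
import numpy as np

# Toy single-nuclei count matrix (rows: nuclei, columns: genes).
# Gene names and counts are invented for illustration only.
genes = ["N1", "N2", "A1", "A2"]
counts = np.array([
    [5, 4, 0, 0],
    [6, 5, 0, 1],
    [4, 6, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 6],
    [0, 0, 6, 5],
], dtype=float)


def coexpression_zscores(counts):
    """Gene-by-gene Pearson correlation, z-scaled within each gene's row.

    Assumption: correlation and row-wise scaling are stand-ins; the source
    does not specify the co-expression measure or the scaling axis.
    """
    corr = np.corrcoef(counts, rowvar=False)
    mean = corr.mean(axis=1, keepdims=True)
    std = corr.std(axis=1, keepdims=True)
    return (corr - mean) / std


def support_at_spot(z, neighbor_counts):
    """Support per gene: sum of z[g, j] over every neighboring transcript j."""
    return z @ neighbor_counts


def assign_cell_type(support, clusters):
    """Pick the cluster whose marker genes have the highest mean support."""
    scores = {name: float(support[idx].mean()) for name, idx in clusters.items()}
    return max(scores, key=scores.get), scores


z = coexpression_zscores(counts)

# Transcripts found around one tissue spot: three of N1 and one of N2.
neighbor_counts = np.array([3, 1, 0, 0], dtype=float)
support = support_at_spot(z, neighbor_counts)

# Hypothetical annotated clusters, given as column indices into `genes`.
clusters = {"neuron-like": [0, 1], "astrocyte-like": [2, 3]}
label, scores = assign_cell_type(support, clusters)
print(dict(zip(genes, support.round(2))), label)
```

In this toy case the spot is surrounded by N1 and N2 transcripts, so the genes that co-vary with them receive positive support and the spot is assigned to the "neuron-like" cluster. A gene that was never detected at the spot (here N2 could equally be absent) would still receive support from its co-expressed neighbors, which is the idea behind the imputation.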

Creating cell-type masks and quantification of protein expression

In the first version of the Stereo-seq resource, the locations of vascular cells, astrocytes, oligodendrocytes, microglia and neurons were determined using sets of marker genes with elevated expression in the corresponding cell-types. This makes it possible to link every location within the spatial transcriptomics data to a main cell type. For each protein-coding transcript, the number of counts in each cell-type is represented as bars. It should be noted that spatial diffusion and overlapping cells introduce a level of noise.

Human cerebral cortex panels: Astrocytes, Oligodendrocytes, Microglia, Neurons, Vasculature
Human cerebellum panels: Purkinje cells, Astrocytes, Oligodendrocytes, Microglia, Neurons, Vasculature
Figure 2: Overview of the cell type masks.

Available samples

A summary of spatially resolved protein expression in the layers of the cerebral cortex (frontal cortex) is presented in the cerebral cortex summary, including the top genes with elevated expression in astrocytes, neurons, oligodendrocytes, microglia, and vasculature-associated cells. Data on gene expression in the layers of the cerebellum are presented in the cerebellum summary.

Imputation of transcript location

The 1 by 1 cm Stereo-seq chip has approximately 400 million points. On average, we detect about 100 million transcripts. Plotting only the detected transcripts gives a sparse image with little information about the tissue. For this version, we used the single nuclei transcriptomics data to predict the likelihood of a transcript being present at every location. The color of each pixel indicates the cell-type mask of the predicted location. Proteins with elevated expression in a single cell type are mostly predicted in tissue areas assigned to that cell type. It should be noted that both the total number of captured transcripts and the area covered are dominated by neurons. As a result, many imputed transcript locations show a neuron-dominated picture, mainly because non-neuronal cell-types cover less area and have lower transcript counts.
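
The chip figures above, together with the 5 micrometer search radius given in the Figure 3 caption, imply a point spacing and a typical neighborhood size. The sketch below works through that arithmetic and shows one way to collect the transcripts around a location. It assumes the points lie on a regular square grid and that transcripts are spread uniformly, neither of which is stated in the text; the function name `neighbor_gene_counts` and the example coordinates are hypothetical.

```python
import math
import numpy as np

# Figures quoted in the text.
chip_side_um = 10_000.0   # 1 cm expressed in micrometers
n_points = 400e6          # approximate number of points on the chip
n_transcripts = 100e6     # approximate number of detected transcripts
radius_um = 5.0           # search radius from the Figure 3 caption

# Assumption: points form a regular square grid covering the chip.
points_per_side = math.sqrt(n_points)
pitch_um = chip_side_um / points_per_side
mean_per_point = n_transcripts / n_points

# Number of grid points inside a disk of the search radius (center included).
r_steps = int(round(radius_um / pitch_um))
offsets = np.arange(-r_steps, r_steps + 1)
dx, dy = np.meshgrid(offsets, offsets)
points_in_disk = int(np.count_nonzero(dx**2 + dy**2 <= r_steps**2))

# Assumption: uniform transcript density, which real tissue does not have.
expected_neighbors = points_in_disk * mean_per_point


def neighbor_gene_counts(xy_um, gene_idx, center_um, radius_um, n_genes):
    """Count transcripts per gene within radius_um of center_um."""
    d2 = ((xy_um - center_um) ** 2).sum(axis=1)
    inside = d2 <= radius_um ** 2
    return np.bincount(gene_idx[inside], minlength=n_genes)


# Hypothetical transcripts: (x, y) in micrometers and an integer gene index.
xy = np.array([[10.0, 10.0], [12.0, 11.0], [13.0, 13.0], [30.0, 30.0]])
gene = np.array([0, 1, 0, 2])
counts = neighbor_gene_counts(xy, gene, np.array([10.0, 10.0]), radius_um, n_genes=3)

print(pitch_um, mean_per_point, points_in_disk, expected_neighbors, counts)
```

Under these assumptions the spacing is 0.5 micrometer, consistent with the sub-micrometer resolution mentioned earlier, and only about one point in four carries a transcript. A 5 micrometer disk covers 317 grid points, so a prediction at one location would draw on roughly 79 neighboring transcripts on average, with far fewer in sparsely covered regions.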

Figure 3: Imputation of transcript location based on co-expression data. For each pixel that contains transcripts, an area with a radius of 5 micrometer is explored to calculate the likelihood of expression of all protein-coding genes at that location. These are not real measurements but predictions.

Current limitations, challenges and future perspectives

The field of spatial transcriptomics is relatively new, and with the latest developments we can now, in theory, perform single cell spatial transcriptomics analysis.

Spatial transcriptomics analysis of brain samples: The major technical challenges in molecular neuroscience have a biological origin. The brain has 1) many cell-types and cell states, 2) cell-types of different sizes and morphologies, 3) cell-types with different levels of overall transcriptional activity, and 4) differences between sub-types or cell states that often amount to a small fraction of the total transcriptome. To create a molecular map of the brain at single cell resolution, we need to 1) capture all cell-types and cell states in large enough numbers to generate the statistical power needed to compare between them, 2) create strategies to compare cell-types with different morphologies and total transcript content, and 3) create sensitive assays to capture the minor but biologically relevant differences between sub-types and cell states.

The HPA approach to map the molecular and cellular landscape of the brain: In the field of high-resolution transcriptomics (resolution < single cell), several methods have been developed to group individual counts into single cells. Many of the currently used methods are image-based: they use a nuclear staining to define the cell center and a radius approach to link detected transcripts to these cells. This approach works for many tissues, especially if the cells in these tissues have similar (round) shapes and similar total numbers of transcripts.
For the analysis of the brain, these methods are not well suited and have difficulty capturing cells with a low number of transcripts (e.g. microglia, vascular cells), especially if these are near neurons, which have a high number of transcripts. In the Human Protein Atlas project, we therefore use co-expression to link individual spots to a cell-type, creating a cell-type mask. In the current version, we demonstrate how this can be used to identify protein-coding transcripts elevated in one of the five major cell-types and how to impute the possible location of transcripts in the cerebral cortex.

Future perspective: The data presented in the current version are based on main cell-type masks and the main tissue layers but have not reached single cell resolution and do not provide information on sub-types (e.g. cortical layers, interneurons etc.). The next tasks are to 1) capture and link gene counts to define the molecular signatures of all cell types and cell states in tissue layers and 2) add more nervous and peripheral datasets to the Protein Atlas.

Relevant links and publications

Ståhl PL et al., Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. (2016)
Chen A et al., Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. (2022)
Liu L et al., Spatiotemporal omics for biology and medicine. Cell. (2024)