The cell cycle dependent transcriptome and proteome
The cell cycle is an ordered and tightly regulated series of events over which the cell grows and divides into two daughter cells. It consists of four stages, during which the cell increases in size (G1), replicates its genome (S), increases further in size and prepares for mitosis (G2), and finally goes through mitosis as well as cytokinesis (M). Depending on external and internal signals, the cell may also exit the replicative cell cycle from G1 and enter a non-replicative resting state (G0). Dysregulation of the cell cycle is known to have devastating consequences, such as uncontrolled cell proliferation, genomic instability (Malumbres M et al. (2009)), and cancer (Massagué J. (2004); Hartwell LH et al. (1994)). Therefore, the cell cycle needs to be tightly controlled, while at the same time remaining responsive to various intracellular and extracellular signals (Barnum KJ et al. (2014)). The cell cycle control system involves an intricate network of proteins that are tightly regulated by mechanisms such as transcriptional regulation (Weinberg RA. (1995)), protein post-translational modifications (PTMs) (Morgan DO. (1995)), and protein degradation (Teixeira LK et al. (2013); King RW et al. (1996)).
In asynchronous cell cultures, the cell cycle is a fundamental source of cell-to-cell variation in both transcript and protein abundances (Cho RJ et al. (2001); Whitfield ML et al. (2002); Boström J et al. (2017); Lane KR et al. (2013); Ohta S et al. (2010); Ly T et al. (2014); Pagliuca FW et al. (2011); Ly T et al. (2015)). The Subcellular Section provides a resource to explore protein heterogeneity at the single-cell level in unperturbed, log-phase growing cells. Among the 13041 genes in the Subcellular Section, about a quarter (3193) show cell-to-cell variation in the expression level and/or spatial distribution of the encoded protein(s) in at least one cell line in the regular ICC-IF pipeline. For a subset of these genes, the temporal protein and RNA expression patterns have been further characterized in individual cells using the Fluorescent Ubiquitination-based Cell Cycle Indicator (FUCCI) U-2 OS cell line (Mahdessian D et al. (2021)). In this study, 311 of the genes now present in the Subcellular Section were found to encode proteins whose expression correlates with progression through interphase. In addition, there are currently 354 genes encoding proteins that are defined as cell cycle dependent (CCD) by their localization to mitotic structures. Because some proteins belong to both groups, this gives a total of 640 CCD proteins rather than 665. Single-cell RNA sequencing of FUCCI U-2 OS cells sorted according to cell cycle phase has also identified 529 genes that encode CCD transcripts. This spatially resolved proteomic map of the cell cycle has been integrated into the Subcellular Section to provide a resource for molecular insights into the human cell cycle and cellular proliferation.
Single-cell variation in the Subcellular Section
Genetically identical cells may exhibit differences in their patterns of gene and protein expression. This phenomenon is often referred to as cell-to-cell variation or single-cell variation (SCV). While it is hypothesized that there is an underlying functional importance to this variability, the scale and significance of variations at the single-cell level remain poorly understood (Dueck H et al. (2016)). Environmental changes, DNA damage, cell cycle progression, and stochasticity are examples of factors that may cause changes in RNA and protein expression within isogenic cell populations, and thus serve as sources of single-cell heterogeneity (Snijder B et al. (2011)). This may give individual cells distinct phenotypic characteristics, providing each with a molecular and phenotypic fingerprint. Identification of all human proteins that display single-cell variation lays a foundation for characterizing the driving forces of single-cell heterogeneity, and for understanding its functional consequences.
In an immunofluorescence (IF) image, single-cell protein variations can be observed as differences in the staining intensity or spatial distribution between cells, as exemplified in Figure 1. Interestingly, as many as 3193 of all human proteins localized in the Subcellular Section show single-cell variations (Thul PJ et al. (2017)). Of these, 3074 proteins show variations in expression level (staining intensity), 206 proteins show variations in spatial distribution, and 87 proteins show both types of variation.
Figure 1. Examples of proteins showing single-cell variation. GTPBP8 is a GTP binding protein (detected in U-2 OS cells). CLCN6 is a chloride transport protein (detected in U-2 OS cells). INCENP is a component of the chromosomal passenger complex (CPC) that is a key regulator of mitosis (detected in MCF7 cells). RACGAP1 has a key role in controlling cell growth and cell division (detected in U-2 OS cells). RRM2 provides precursors necessary for DNA synthesis (detected in U-2 OS cells). KIF20A is a mitotic kinesin required for cytokinesis (detected in U-2 OS cells). DUSP18 and DUSP19 are phosphatases (detected in A-431 and SK-MEL-30 cells, respectively). CCNB1 is a key regulator of the cell cycle at the G2/M transition (detected in U-2 OS cells). The target protein is shown in green, microtubules in red, and the nucleus in blue.
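The counts for the two types of single-cell variation are consistent by inclusion-exclusion: the 87 proteins with both types are counted once in each category, so they are subtracted once from the sum. A minimal Python check follows; the function names are illustrative only and not part of any HPA tool.

```python
def union_size(n_a: int, n_b: int, n_both: int) -> int:
    """Size of the union of two overlapping sets (inclusion-exclusion)."""
    return n_a + n_b - n_both


def overlap_size(n_a: int, n_b: int, n_union: int) -> int:
    """Infer the overlap of two sets from their sizes and the size of their union."""
    return n_a + n_b - n_union


# Variation in expression level (3074), in spatial distribution (206), in both (87)
total_scv = union_size(3074, 206, 87)
print(total_scv)  # 3193

# Interphase CCD (311) and mitotic-structure CCD (354) proteins, 640 in total
print(overlap_size(311, 354, 640))  # 25
```

Applied to the cell cycle dependent counts in the introduction (311 interphase and 354 mitotic, 640 in total), the same arithmetic implies that about 25 genes fall into both groups, assuming the total is a simple set union.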
Single-cell variation is most commonly observed for proteins in the nucleoplasm, cytosol, vesicles, nucleoli and mitochondria (Figure 2). Gene Ontology (GO)-based enrichment analysis of genes encoding proteins with single-cell variation at the protein level reveals enrichment of GO terms describing numerous biological processes, including DNA repair, translation, apoptosis, transcription, cell cycle progression and metabolism (Figure 3). The enriched terms for the GO domain Molecular Function describe many different enzymatic activities as well as binding to DNA, RNA and chromatin (Figure 4).
Figure 2. Localizations of proteins showing single-cell variations to the different organelles, grouped by meta-compartments.
Figure 3. Gene Ontology-based enrichment analysis for genes encoding proteins with single-cell variations, showing the significantly enriched terms for the GO domain Biological Process. Each bar is clickable and gives a search result of proteins that belong to the selected category.
Figure 4. Gene Ontology-based enrichment analysis for genes encoding proteins with single-cell variations, showing the significantly enriched terms for the GO domain Molecular Function. Each bar is clickable and gives a search result of proteins that belong to the selected category.
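GO enrichment analyses commonly ask whether a term is over-represented among the genes of interest using a one-sided hypergeometric test. The sketch below shows that test with the Python standard library only. It is an illustration of the general technique, not the exact procedure behind Figures 3 and 4, which is not described here; the counts in the example are hypothetical. In practice, p-values across many GO terms are also corrected for multiple testing.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric.

    population: background genes (e.g. all genes in the section)
    successes:  background genes annotated with the GO term
    draws:      genes in the set of interest (e.g. genes with single-cell variation)
    k:          genes in the set of interest that carry the GO term
    """
    upper = min(successes, draws)
    # Sum the probabilities of observing k or more annotated genes in the set
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


# Hypothetical counts for illustration only, not HPA results
p = hypergeom_sf(k=150, population=13041, successes=400, draws=3193)
print(f"p = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids overflow for gene-set sizes of this order; for large-scale analyses a library routine such as SciPy's hypergeometric survival function would usually be preferred.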
Interphase proteogenomics in single cells
Previous studies of transcript and protein abundance in different phases of the human cell cycle have revealed variations in the expression of 400-1,200 genes (Cho RJ et al. (2001); Whitfield ML et al. (2002); Boström J et al. (2017)) and 300-700 proteins (Lane KR et al. (2013); Ohta S et al. (2010); Ly T et al. (2014); Pagliuca FW et al. (2011); Ly T et al. (2015)). However, cell synchronization is known to alter gene expression (Cooper S et al. (2007)), cell morphology and metabolism (Davis PK et al. (2001)), and it precludes the discovery of expression changes within cell cycle phases. Single-cell RNA sequencing has allowed the analysis of transcriptional changes without the need for synchronization and has enabled the discovery of additional cell cycle regulated genes (Domenighetti G et al. (1988); Scialdone A et al. (2015)). However, studies of cell cycle dependent (CCD) variations in protein expression at the single-cell level have been lacking due to technological limitations.
The HPA Subcellular Section now includes a targeted single-cell transcriptomic analysis, as well as proteomic imaging (i.e., imaging proteogenomics, Figure 5) of 1137 variable proteins that are expressed in FUCCI U-2 OS cells (Sakaue-Sawano A et al. (2008); Mahdessian D et al. (2021)). This cell line expresses a pair of fluorescently tagged marker proteins, Cdt1 tagged with red fluorescent protein (RFP) and Geminin tagged with green fluorescent protein (GFP), which enable visualization of interphase progression in individual cells. The intensities of the RFP- and GFP-tagged cell cycle markers can be used to create a linear representation of cell cycle pseudotime, so that protein and RNA expression in individual cells can be plotted along an axis representing progression through interphase.
Figure 5. Schematic overview of the single-cell imaging proteogenomic workflow. U-2 OS FUCCI cells express two fluorescently tagged cell cycle markers, CDT1 during G1 phase (red, RFP-tagged) and Geminin during S and G2 phases (green, GFP-tagged); these markers are co-expressed during the G1-S transition (yellow). By fitting a polar model to the red and green fluorescence intensities, a linear representation of cell cycle pseudotime is obtained. Independent measurements of RNA and protein expression are compared after pseudotime alignment of individual cells.
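The idea of a polar model can be illustrated with a simplified sketch: log-transform both marker intensities, center each channel, take each cell's polar angle, and measure it from a chosen starting point so that pseudotime runs from 0 to 1. This is not the published implementation; centering on channel means, the function name and the `start_angle` value are assumptions made for illustration.

```python
import math


def fucci_pseudotime(red, green, start_angle=-math.pi / 2):
    """Map paired FUCCI marker intensities to pseudotime values in [0, 1).

    Simplified polar sketch: log10-transform both channels (intensities must
    be positive), center each on its mean, take the polar angle of every cell
    (red on the x axis, green on the y axis) and measure it counter-clockwise
    from `start_angle`, a hypothetical choice for the mitosis-to-G1 boundary.
    """
    log_r = [math.log10(x) for x in red]
    log_g = [math.log10(x) for x in green]
    mean_r = sum(log_r) / len(log_r)
    mean_g = sum(log_g) / len(log_g)
    times = []
    for r, g in zip(log_r, log_g):
        theta = math.atan2(g - mean_g, r - mean_r)  # angle in (-pi, pi]
        # Counter-clockwise distance from the start angle, scaled to [0, 1)
        times.append(((theta - start_angle) % (2 * math.pi)) / (2 * math.pi))
    return times


# Idealized cells: G1 (red only), G1/S (both), S/G2 (green only), late M/early G1 (neither)
red = [1000, 1000, 10, 10]
green = [10, 1000, 1000, 10]
print(fucci_pseudotime(red, green))  # approximately [0.125, 0.375, 0.625, 0.875]
```

With red on the x axis and green on the y axis, increasing angle follows the marker sequence red, yellow, green, then neither, which is why the sketch measures angles counter-clockwise.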
The single-cell RNA-sequencing data from the FUCCI U-2 OS cells enables analysis of RNA abundance in relation to cell cycle progression. This analysis has led to the identification of 529 genes that show variance in RNA expression levels that correlate to interphase cell cycle progression.
In the single-cell proteomic imaging analysis, 311 proteins display variation in expression levels that temporally correlates with interphase progression through G1, S and G2. These cell cycle dependent (CCD) proteins include known cell cycle regulators, such as the cyclin CCNB1 and ANLN, which is required for cytokinesis, but also novel CCD proteins, such as DUSP18 (Figure 6). However, most of the variable proteins (826 of 1137) show cell-to-cell variations that are largely unexplained by cell cycle progression (non-CCD). This opens up intriguing avenues for further exploration of the stochastic or deterministic factors that govern these variations, as well as the role of spatiotemporal proteome dynamics in regulating other cellular states and functions.
Figure 6. Examples of temporal expression profiles for single cell protein (blue) and RNA (orange) expression. The boxplot shows a mock-up bulk proteomic experiment.
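One way to ask how much of a gene's cell-to-cell variation is explained by cell cycle progression is to order cells by pseudotime, smooth expression with a moving average, and compare the variance of the smoothed trend with the total variance. The sketch below follows that idea under stated assumptions: the window size, the function names and the 0.08 cutoff are hypothetical, and a real analysis would also need a significance test, for example against randomly shuffled pseudotimes.

```python
from statistics import pvariance

CCD_CUTOFF = 0.08  # hypothetical threshold, for illustration only


def fraction_variance_explained(pseudotime, expression, window=5):
    """Share of expression variance captured by a smooth pseudotime trend.

    Cells are ordered by pseudotime, expression is smoothed with a centered
    moving average (truncated at the ends), and the variance of the smoothed
    values is divided by the variance of the raw values.
    """
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    values = [expression[i] for i in order]
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    total = pvariance(values)
    return pvariance(smoothed) / total if total > 0 else 0.0


def is_ccd(pseudotime, expression, window=5, cutoff=CCD_CUTOFF):
    """Call a gene CCD if its pseudotime trend explains enough of the variance."""
    return fraction_variance_explained(pseudotime, expression, window) >= cutoff


# Toy example: a steady rise along pseudotime versus cell-to-cell alternation
pt = [i / 99 for i in range(100)]
trend = list(pt)                          # expression follows pseudotime
noise = [(-1) ** i for i in range(100)]   # no pseudotime structure
print(is_ccd(pt, trend), is_ccd(pt, noise))  # True False
```

The same scoring can be applied to RNA or protein measurements once cells have been placed on a common pseudotime axis.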
Proteins in mitotic structures
In addition to proteins that show single-cell variations due to progression through interphase, there are 354 genes in the Subcellular Section encoding proteins that are defined as cell cycle dependent (CCD) because they localize to mitotic structures, including mitotic chromosomes (70), mitotic spindle (89), kinetochores (5), cytokinetic bridge (160), midbody (56), midbody ring (30) and cleavage furrow (1). Because a protein can localize to more than one of these structures, the counts add up to more than 354. Examples of these can be seen in Figure 7.
Figure 7. Example images of proteins localized to mitotic substructures: KIF20A to cleavage furrow, TAF1D, TACC3, KIF11 and CKAP2L to mitotic spindle, BIRC5 to cytokinetic bridge, DVL3 and CTTNBP2 to midbody ring, and SGO1 to kinetochores.
Localizations of the cell cycle dependent proteome
In total, there are 640 genes encoding variable proteins that have been identified as cell cycle dependent (CCD) and 826 genes encoding variable proteins that have been identified as cell cycle independent (non-CCD) in the Subcellular Section. The high resolution of the HPA Subcellular Section dataset allows us to compare the subcellular localizations of proteins with CCD and non-CCD variability in expression (Figure 8). Larger fractions of the CCD proteins are found in mitotic structures, while larger fractions of the non-CCD variable proteins localize to compartments such as the cytosol, mitochondria and plasma membrane. Almost half of the CCD variable proteins reside in the nuclear meta-compartment, which includes the nucleoplasm, nuclear speckles, nuclear bodies and nucleoli. This agrees with the central role of the nucleus in the replication and segregation of DNA during the cell cycle.
Figure 8. Bar plot showing the subcellular localizations enriched for CCD proteins (blue) and non-CCD proteins (red) relative to the proteome mapped in the HPA.
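Figure 8 compares each localization's share among CCD or non-CCD proteins with its share in the whole mapped proteome. A common way to express such a comparison is a fold enrichment: the fraction of the subset annotated to a compartment divided by the fraction of all mapped proteins annotated to it. The sketch below assumes that a protein can carry several localizations, so fractions are computed per compartment and need not sum to one; the gene names and annotations in the example are hypothetical placeholders.

```python
from collections import Counter


def fold_enrichment(subset, localizations):
    """Fold enrichment of each compartment in `subset` relative to all proteins.

    subset:        set of gene names (e.g. CCD proteins)
    localizations: dict mapping every mapped gene to a set of compartments
    Returns {compartment: (fraction in subset) / (fraction in all genes)}.
    """
    all_counts = Counter(c for locs in localizations.values() for c in locs)
    sub_counts = Counter(c for g in subset for c in localizations[g])
    n_all, n_sub = len(localizations), len(subset)
    return {
        comp: (sub_counts[comp] / n_sub) / (all_counts[comp] / n_all)
        for comp in all_counts
    }


# Hypothetical toy data, not HPA annotations
locs = {
    "GENE_A": {"Nucleoplasm"},
    "GENE_B": {"Nucleoplasm", "Mitotic spindle"},
    "GENE_C": {"Cytosol"},
    "GENE_D": {"Cytosol", "Mitochondria"},
}
enrichment = fold_enrichment({"GENE_A", "GENE_B"}, locs)
print(enrichment["Nucleoplasm"], enrichment["Cytosol"])  # 2.0 0.0
```

Values above 1 mark compartments over-represented in the subset; a significance test such as the hypergeometric test would still be needed to call an enrichment.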
Temporal delay between RNA and protein
Previous studies have shown that many RNAs peak in expression in the G1 phase, which is also the longest phase of the cell cycle (Boström J et al. (2017); Grant GD et al. (2013)). Among the 529 genes whose RNA expression correlates with the cell cycle in FUCCI U-2 OS cells, 248 peak in G1. In contrast, most proteins with cell cycle dependent expression (241) peak towards the end of interphase, corresponding to late S and G2 (Figure 9). This seems to reflect a temporal delay between RNA and protein expression (Mahdessian D et al. (2021)).
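Given RNA and protein profiles for the same gene on a common pseudotime grid, the delay can be summarized as the distance from the RNA peak to the protein peak. Because pseudotime is cyclic, the sketch wraps the difference into [0, 1). The grid, the example profiles and the use of a plain argmax are assumptions for illustration; noisy single-cell measurements would normally be smoothed before locating peaks.

```python
def peak_time(grid, profile):
    """Pseudotime at which a (smoothed) expression profile is highest."""
    best = max(range(len(profile)), key=lambda i: profile[i])
    return grid[best]


def rna_to_protein_delay(grid, rna, protein):
    """Cyclic delay from the RNA peak to the protein peak, in pseudotime units."""
    return (peak_time(grid, protein) - peak_time(grid, rna)) % 1.0


grid = [i / 10 for i in range(10)]          # hypothetical pseudotime bins 0.0 ... 0.9
rna = [1, 3, 5, 3, 1, 0, 0, 0, 0, 0]        # hypothetical profile peaking early (G1)
protein = [0, 0, 0, 1, 2, 4, 6, 8, 5, 1]    # hypothetical profile peaking late (S/G2)
print(rna_to_protein_delay(grid, rna, protein))  # approximately 0.5
```

Tallying the bins in which peaks fall across all genes yields summaries such as the share of CCD transcripts peaking in G1 (248 of 529, about 47%).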