All transcripts of all genes have been analyzed regarding the location(s) of the corresponding protein, based on prediction methods for signal peptides and transmembrane regions.
Genes with at least one transcript predicted to encode a secreted protein, according to prediction methods or to UniProt location data, have been further annotated and classified with the aim of determining whether the corresponding protein(s) are secreted or are actually retained in intracellular locations or membrane-attached.
Remaining genes, with no transcript predicted to encode a secreted protein, are assigned the prediction-based location(s).
The annotated location overrules the predicted location, so that a gene encoding a predicted secreted protein that has been annotated as intracellular will have intracellular as the final location.
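The override rule above can be sketched in a few lines. This is only an illustration of the stated logic, not the Human Protein Atlas implementation; the function name `final_locations` and the use of string sets for locations are assumptions made for the example.

```python
from typing import Optional, Set


def final_locations(predicted: Set[str],
                    annotated: Optional[Set[str]] = None) -> Set[str]:
    """Return the final location(s) for a gene (hypothetical helper).

    `predicted` holds the prediction-based location(s).
    `annotated` holds the manually annotated location(s); per the text,
    only genes with a predicted-secreted transcript receive an annotation,
    so it is None (or empty) for all other genes.
    """
    if annotated:
        # The annotated location overrules the predicted location.
        return set(annotated)
    # No annotation: fall back to the prediction-based location(s).
    return set(predicted)
```

For example, a gene predicted as secreted but annotated as intracellular ends up with intracellular as its final location, while a gene without annotation keeps its predicted locations.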
Number of protein-coding transcripts from the gene as defined by Ensembl.
HUMAN PROTEIN ATLAS INFORMATION
Summary of RNA expression and protein localization based on data generated within the Human Protein Atlas project.
Cell line expression cluster
The RNA data was used to cluster genes according to their expression across cell lines. Clusters contain genes that have similar expression patterns, and each cluster has been manually annotated to describe common features in terms of function and specificity.
RNA specificity category based on RNA sequencing data from all cell lines in the Human Protein Atlas. Genes are classified into five different categories (enriched, group enriched, enhanced, low specificity and not detected) according to their RNA expression levels across the panel of cell lines.
Cell line enhanced (A549, BEWO, HBEC3-KT, hTERT-HME1, RT4, SiHa, SK-BR-3)
Cell line distribution
RNA distribution category based on RNA sequencing data from all cell lines in the Human Protein Atlas. Genes are classified into five different categories (detected in all, detected in many, detected in some, detected in single and not detected) according to their pattern of detected RNA expression across the panel of cell lines.
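A minimal sketch of how the five distribution categories could be derived from per-cell-line nTPM values. The text above does not state the detection cutoff or the boundary between "some" and "many"; the cutoff of 1 nTPM and the one-third boundary used here are assumptions for illustration, and `distribution_category` is a hypothetical name.

```python
from typing import Mapping


def distribution_category(ntpm: Mapping[str, float],
                          cutoff: float = 1.0) -> str:
    """Classify a gene by how many cell lines show detected expression.

    `ntpm` maps cell line name -> nTPM value.
    `cutoff` is an ASSUMED detection threshold (not given in the text).
    The one-third split between "some" and "many" is also an assumption.
    """
    n_total = len(ntpm)
    n_detected = sum(1 for value in ntpm.values() if value >= cutoff)

    if n_detected == 0:
        return "not detected"
    if n_detected == 1:
        return "detected in single"
    if n_detected == n_total:
        return "detected in all"
    if n_detected * 3 >= n_total:
        # At least a third of the cell lines, but not all of them.
        return "detected in many"
    return "detected in some"
```

With these assumed thresholds, a gene detected in 2 of 7 cell lines would be classified as "detected in some".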
Evidence score for genes based on UniProt protein existence (UniProt evidence); a Human Protein Atlas antibody- or RNA-based score (HPA evidence); and evidence based on PeptideAtlas (MS evidence). The available scores are evidence at protein level, evidence at transcript level, no evidence, or not available.
RNA expression data as normalized transcript per million (nTPM) values of tissue culture cell lines. The analyzed cell lines are divided into 16 color-coded groups according to the organ they were obtained from. Detailed information about a specific cell line is revealed by hovering over the corresponding bar in the chart. More information and cell line data can be found in the Cell line section.
The RNA data was used to cluster genes according to their expression across samples. The resulting clusters have been manually annotated to describe common features in terms of function and specificity. The annotation of the cluster is displayed together with a confidence score of the gene's assignment to the cluster. The confidence is calculated as the fraction of times the gene was assigned to this cluster in repeated calculations and is reported between 0 and 1, where 1 is the highest possible confidence. The clustering results are shown in a UMAP, where the cluster this gene was assigned to is highlighted as a colored area in which most of the cluster genes reside. A table shows the 15 most similar genes in terms of expression profile.
KRT7 is part of cluster 15 RT4 - Unknown function with confidence
Confidence is the fraction of times a gene was assigned to the cluster in repeated clustering, and therefore reflects how strongly the gene is associated with the cluster. A confidence of 1 indicates that the gene was assigned to this cluster in all repeated clusterings.
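The confidence definition above amounts to a simple fraction. The sketch below assumes that cluster labels have already been matched across the repeated clusterings (how that matching is done is not described here); `assignment_confidence` and its parameters are hypothetical names.

```python
from typing import Hashable, Sequence


def assignment_confidence(final_cluster: Hashable,
                          repeated_assignments: Sequence[Hashable]) -> float:
    """Fraction of repeated clusterings that placed the gene in `final_cluster`.

    `repeated_assignments` holds one cluster label per repeated run and is
    ASSUMED to use labels comparable with `final_cluster`.
    The result lies between 0 and 1, where 1 means the gene was assigned
    to this cluster in every run.
    """
    if not repeated_assignments:
        raise ValueError("at least one repeated clustering is required")
    hits = sum(1 for label in repeated_assignments if label == final_cluster)
    return hits / len(repeated_assignments)
```

For instance, a gene assigned to cluster 15 in three of four runs would get `assignment_confidence(15, [15, 15, 7, 15])`, which evaluates to 0.75.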