Explore the Subcellular location UMAP

Subcellular methods

Uniform Manifold Approximation and Projection (UMAP) is an analytical technique for reducing the dimensionality of a data set (Becht E et al. (2018)). The subcellular location UMAP is generated from the large collection of confocal microscopy images showing the subcellular localization patterns of human proteins. A machine learning model trained to classify the subcellular locations in these images is used to extract 1024 features from each image in the subcellular section of the Human Protein Atlas (Ouyang W et al. (2019)). The dimensionality of this dataset is then reduced by UMAP, and the result is displayed in a two- or three-dimensional scatter plot in which each data point represents one image. This tool provides a new way to visualize and explore the high-dimensional protein localization data that makes up the subcellular resource, and to find images that group together based on similarity of these features. Coloring the data points according to subcellular localization shows that images of proteins localizing to the same compartment tend to cluster together. Overlaying the UMAP projection with other data can help you find new staining patterns and identify interesting groups of genes in a large and complex data set.
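The reduction step described above can be sketched in Python. This is only an illustration under stated assumptions, not the pipeline used to build the resource: the feature vectors are random placeholders rather than output from the classification model, the function names are hypothetical, and the projection relies on the third-party umap-learn and NumPy packages with default UMAP settings, which may differ from the settings actually used.

```python
import random

N_FEATURES = 1024  # feature-vector length per image, as stated in the text


def make_placeholder_features(n_images, seed=0):
    """Return random vectors standing in for the model's image features (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(n_images)]


def embed(features, n_components=2, seed=42):
    """Project one feature vector per image to 2 or 3 dimensions with UMAP."""
    if n_components not in (2, 3):
        raise ValueError("the plot is two- or three-dimensional")
    if any(len(row) != N_FEATURES for row in features):
        raise ValueError(f"each image needs {N_FEATURES} features")
    # Imported lazily so the checks above work without the optional packages.
    import numpy as np
    import umap  # provided by the umap-learn package

    # Default UMAP settings are an assumption; only the output dimension is set.
    reducer = umap.UMAP(n_components=n_components, random_state=seed)
    return reducer.fit_transform(np.asarray(features, dtype=float))


if __name__ == "__main__":
    coords = embed(make_placeholder_features(200))
    print(coords.shape)  # one row of 2-D coordinates per image
```

Each row of the returned array corresponds to one image, so it can be plotted directly as one point in the scatter plot.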

Clicking a data point in the plot displays the corresponding image together with information about gene name, cell line, annotated subcellular location(s), and antibody. The legend below the UMAP can be used to toggle the different subcellular locations on and off in the UMAP. Click on one location in the legend to only display data points for images with an annotation of that structure. You can select multiple subcellular locations at the same time. Clicking again on one of the selected subcellular locations will deselect it, while clicking on Clear filter will reset and display all data points in the UMAP again. Images with annotations of multiple locations, representing multilocalizing proteins, are shown in grey.
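The legend behaviour described above can be summarised as a small sketch. It is a hypothetical model of the interaction, not the site's actual code: the class and function names, the gene and cell line labels, and the palette are placeholders, and showing an image when it matches any of several selected locations is an assumption, since the text does not say how multiple selections are combined.

```python
from dataclasses import dataclass

GREY = "grey"  # multilocalizing images are shown in grey, per the text


@dataclass(frozen=True)
class ImagePoint:
    gene: str
    cell_line: str
    locations: tuple  # annotated subcellular location(s) for this image


def point_color(point, palette):
    """Grey for images annotated with several locations, else the location's colour."""
    if len(point.locations) > 1:
        return GREY
    return palette[point.locations[0]]


def toggle(selected, location):
    """Clicking a legend entry selects it; clicking it again deselects it."""
    return selected ^ {location}


def visible(points, selected):
    """Show all points when nothing is selected (Clear filter), else matching ones.

    Assumption: a point is shown if it carries ANY of the selected locations.
    """
    if not selected:
        return list(points)
    return [p for p in points if selected & set(p.locations)]


if __name__ == "__main__":
    points = [
        ImagePoint("GENE_A", "line_1", ("Mitochondria",)),
        ImagePoint("GENE_B", "line_1", ("Nucleoplasm", "Cytosol")),
        ImagePoint("GENE_C", "line_2", ("Nucleoplasm",)),
    ]
    selected = toggle(set(), "Nucleoplasm")
    print([p.gene for p in visible(points, selected)])  # ['GENE_B', 'GENE_C']
```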

A strength of the HPA database is the gene-centric integration of a large collection of different datasets. The search function allows you to search for an individual gene, but also to perform complex filtering of the data points in the UMAP. Using pre-defined search terms, images can be filtered based on general gene information (e.g. gene name or chromosome location) as well as data from the different sections of the HPA (e.g. tissue expression or prognostic cancer association). Read more about how to use the search function here.



Legend - click to toggle in UMAP


Multilocalizing

Cytoplasm

Actin filaments

Aggresome

Centriolar satellite

Centrosome

Cleavage furrow

Cytokinetic bridge

Cytoplasmic bodies

Cytosol

Focal adhesion sites

Intermediate filaments

Microtubule ends

Microtubules

Midbody

Midbody ring

Mitochondria

Mitotic spindle

Rods & Rings

Endomembrane system

Cell Junctions

Endoplasmic reticulum

Endosomes

Golgi apparatus

Lipid droplets

Lysosomes

Peroxisomes

Plasma membrane

Vesicles

Nucleus

Kinetochore

Mitotic chromosome

Nuclear bodies

Nuclear membrane

Nuclear speckles

Nucleoli

Nucleoli fibrillar center

Nucleoli rim

Nucleoplasm