Downloadable data

  Programmatic access
If you want to programmatically access a subset of the data, more information can be found on the help page.
 
  Search results
The data files listed here include data available in the Human Protein Atlas version 21.1. A subset of this data, limited to the genes in the current search result, can also be downloaded from the Search page in different formats: XML, RDF, TSV and JSON.
 
  Single entry
Data for a single entry can be accessed in XML, RDF (trig), TSV or JSON format by appending the corresponding format extension to the Ensembl ID, as in the URLs below:
http://www.proteinatlas.org/ENSG00000134057.xml
http://www.proteinatlas.org/ENSG00000134057.trig
http://www.proteinatlas.org/ENSG00000134057.tsv
http://www.proteinatlas.org/ENSG00000134057.json
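A minimal Python sketch of building these URLs from an Ensembl ID. The helper names `entry_url` and `fetch_entry` are hypothetical, and only the four extensions listed above are assumed to be valid.

```python
from urllib.request import urlopen

BASE_URL = "http://www.proteinatlas.org"
# Format extensions listed above for single-entry downloads.
FORMATS = ("xml", "trig", "tsv", "json")


def entry_url(ensembl_id: str, fmt: str) -> str:
    """Return the download URL for one gene entry in the given format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/{ensembl_id}.{fmt}"


def fetch_entry(ensembl_id: str, fmt: str) -> bytes:
    """Download the raw entry bytes (needs network access)."""
    with urlopen(entry_url(ensembl_id, fmt)) as response:
        return response.read()
```

For example, `fetch_entry("ENSG00000134057", "json")` returns the JSON document as bytes, which can then be decoded with `json.loads`.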

 
  Archived data
As of version 13 of the Human Protein Atlas, the site can be reached using the URL structure "http://vXX.proteinatlas.org", where XX is the version number. For example, version 13 of the Human Protein Atlas has the URL http://v13.proteinatlas.org.
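For scripts that should stay pinned to one release, the versioned host name can be derived from the version number. This is a small sketch; the lower bound of 13 follows the sentence above, and `archive_url` is a hypothetical helper.

```python
def archive_url(version: int) -> str:
    """Return the base URL of a specific Human Protein Atlas release."""
    # Versioned host names are described as available from version 13 onwards.
    if version < 13:
        raise ValueError("versioned URLs start at version 13")
    return f"http://v{version}.proteinatlas.org"
```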

 
1 Normal tissue data
Expression profiles for proteins in human tissues based on immunohistochemistry using tissue microarrays. The tab-separated file includes Ensembl gene identifier ("Gene"), tissue name ("Tissue"), annotated cell type ("Cell type"), expression value ("Level") and the gene reliability of the expression value ("Reliability"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

normal_tissue.tsv.zip
TSV-file (zip compressed), 5.3 MB
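The zipped TSV files can be read with the Python standard library without unpacking them to disk. A minimal sketch, assuming each archive holds a single TSV file with a header row; `read_zipped_tsv` and `tissue_levels` are hypothetical helpers that use the column names listed above.

```python
import csv
import io
import zipfile


def read_zipped_tsv(source, member=None):
    """Yield rows (dicts keyed by header) from a TSV file inside a zip archive.

    `source` may be a path or a binary file object. If `member` is None,
    the first file in the archive is read (assumption: one TSV per zip).
    """
    with zipfile.ZipFile(source) as zf:
        name = member or zf.namelist()[0]
        with zf.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text, delimiter="\t")


def tissue_levels(source, gene):
    """Collect (tissue, cell type, level) annotations for one Ensembl gene ID."""
    return [
        (row["Tissue"], row["Cell type"], row["Level"])
        for row in read_zipped_tsv(source)
        if row["Gene"] == gene
    ]
```

For example, `tissue_levels("normal_tissue.tsv.zip", "ENSG00000134057")` returns one tuple per annotated tissue and cell type.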
 
2 Pathology data
Staining profiles for proteins in human tumor tissue based on immunohistochemistry using tissue microarrays, and log-rank p values from Kaplan-Meier analysis of the correlation between mRNA expression level and patient survival. The tab-separated file includes Ensembl gene identifier ("Gene"), gene name ("Gene name"), tumor name ("Cancer"), the number of patients annotated for each staining level ("High", "Medium", "Low" and "Not detected") and log-rank p values for the correlation between mRNA expression and patient survival ("prognostic - favorable", "unprognostic - favorable", "prognostic - unfavorable", "unprognostic - unfavorable"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

pathology.tsv.zip
TSV-file (zip compressed), 3.4 MB
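As an illustration of working with the patient counts, the sketch below turns one row (for example, a dict produced by `csv.DictReader`) into the fraction of annotated patients per staining level. Treating empty cells as zero patients is an assumption, and `staining_fractions` is a hypothetical helper.

```python
LEVELS = ("High", "Medium", "Low", "Not detected")


def staining_fractions(row):
    """Return {level: fraction of annotated patients}, or None if no patients."""
    # Empty cells are treated as zero patients (assumption).
    counts = {level: int(row.get(level) or 0) for level in LEVELS}
    total = sum(counts.values())
    if total == 0:
        return None
    return {level: n / total for level, n in counts.items()}
```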
 
3 Subcellular location data
Subcellular location of proteins based on immunofluorescently stained cells. The tab-separated file includes the following columns: Ensembl gene identifier ("Gene"), gene name ("Gene name"), gene reliability score ("Reliability"), enhanced locations ("Enhanced"), supported locations ("Supported"), approved locations ("Approved"), uncertain locations ("Uncertain"), locations with single-cell variation in intensity ("Single-cell variation intensity"), locations with spatial single-cell variation ("Single-cell variation spatial"), locations with observed cell cycle dependency, where the type can be one or more of biological definition, custom data or correlation ("Cell cycle dependency"), and Gene Ontology Cellular Component term identifier ("GO id").
The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

subcellular_location.tsv.zip
TSV-file (zip compressed), 211.4 KB
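A single cell in the location columns can hold several locations. A minimal sketch that splits them, assuming multiple locations in one cell are separated by semicolons (check this against the downloaded file); `locations_by_reliability` is a hypothetical helper.

```python
LOCATION_COLUMNS = ("Enhanced", "Supported", "Approved", "Uncertain")


def locations_by_reliability(row):
    """Map each reliability column to a list of its locations.

    Assumes multiple locations in one cell are separated by ';'.
    """
    return {
        col: [loc.strip() for loc in (row.get(col) or "").split(";") if loc.strip()]
        for col in LOCATION_COLUMNS
    }
```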
 
4 RNA consensus tissue gene data
Consensus transcript expression levels summarized per gene in 54 tissues based on transcriptomics data from HPA and GTEx. The consensus normalized expression ("nTPM") value is calculated as the maximum nTPM value for each gene in the two data sources. For tissues with multiple sub-tissues (brain regions, lymphoid tissues and intestine) the maximum of all sub-tissues is used for the tissue type. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Tissue") and normalized expression ("nTPM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_tissue_consensus.tsv.zip
TSV-file (zip compressed), 5.4 MB
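The consensus rule described above (the maximum over the two sources, then the maximum over sub-tissues) can be sketched for a single gene as follows. The input dictionaries and the sub-tissue-to-tissue mapping `tissue_of` are hypothetical. Because taking a maximum is order-independent, both steps collapse into one pass.

```python
def consensus_ntpm(hpa, gtex, tissue_of):
    """Combine per-sample nTPM values from two sources into tissue-level values.

    hpa, gtex: dicts mapping sample (sub-tissue) name -> nTPM for one gene.
    tissue_of: hypothetical mapping from sub-tissue name -> tissue type;
               names missing from it are treated as their own tissue.
    """
    result = {}
    for source in (hpa, gtex):
        for sample, value in source.items():
            tissue = tissue_of.get(sample, sample)
            # Keep the maximum seen so far for this tissue type.
            result[tissue] = max(result.get(tissue, value), value)
    return result
```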
 
5 RNA HPA tissue gene data
Transcript expression levels summarized per gene in 256 tissues based on RNA-seq. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Tissue"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("nTPM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.
RNA sequencing data for human tissue

rna_tissue_hpa.tsv.zip
TSV-file (zip compressed), 39.9 MB
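The tissue files are in long format, with one row per gene and tissue. A minimal sketch of pivoting such rows (for example, from `csv.DictReader`) into a gene-by-tissue lookup of one value column; the helper name `to_matrix` is hypothetical.

```python
from collections import defaultdict


def to_matrix(rows, value="nTPM"):
    """Pivot long-format rows into {gene: {tissue: value}}."""
    matrix = defaultdict(dict)
    for row in rows:
        matrix[row["Gene"]][row["Tissue"]] = float(row[value])
    return dict(matrix)
```

Passing `value="TPM"` or `value="pTPM"` selects one of the other value columns instead.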
 
6 RNA GTEx tissue gene data
Transcript expression levels summarized per gene in 37 tissues based on RNA-seq. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Tissue"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("nTPM"). The data was obtained from GTEx and is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_tissue_gtex.tsv.zip
TSV-file (zip compressed), 5.8 MB
 
7 RNA FANTOM tissue gene data
Transcript expression levels summarized per gene in 60 tissues based on CAGE data. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Tissue"), tags per million ("Tags per million"), scaled-tags per million ("Scaled tags per million") and normalized expression ("nTPM"). The data was obtained from FANTOM5 and is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_tissue_fantom.tsv.zip
TSV-file (zip compressed), 9.5 MB
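Tags per million follows the usual library-size normalization: each gene's CAGE tag count is divided by the total tag count of the sample and multiplied by one million. A small sketch of that definition for one sample, where the total is approximated by the sum over the genes passed in. It does not reproduce the scaling behind the "Scaled tags per million" column, and `tags_per_million` is a hypothetical helper.

```python
def tags_per_million(tag_counts):
    """Scale raw tag counts (gene -> count) so they sum to one million."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("sample has no tags")
    return {gene: n / total * 1_000_000 for gene, n in tag_counts.items()}
```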
 
8 RNA single cell type data
Transcript expression levels summarized per gene in 76 cell types from 26 datasets. The tab-separated file includes Ensembl gene identifier ("Gene"), gene name ("Gene name"), analysed sample ("Cell type") and normalized expression ("nTPM"). Information about the external datasets and processing of the data can be found here.

rna_single_cell_type.tsv.zip
TSV-file (zip compressed), 7.6 MB
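A common first question for this file is which cell type expresses a gene most highly. A minimal sketch over rows with the columns named above; ties keep the first row seen, and `top_cell_type` is a hypothetical helper.

```python
def top_cell_type(rows):
    """Return {gene: (cell type, nTPM)} with the highest nTPM per gene."""
    best = {}
    for row in rows:
        gene, cell, ntpm = row["Gene"], row["Cell type"], float(row["nTPM"])
        # Replace only on a strictly higher value, so ties keep the first row.
        if gene not in best or ntpm > best[gene][1]:
            best[gene] = (cell, ntpm)
    return best
```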
 
9 RNA single cell type tissue cluster data
Transcript expression levels summarized per gene and cluster in 26 datasets. The tab-separated file includes Ensembl gene identifier ("Gene"), gene name ("Gene name"), tissue ("Tissue"), analysed sample ("Cell type"), cluster ("Cluster"), read count ("Read count") and protein-coding transcripts per million ("pTPM"). Information about the external datasets and processing of the data can be found here.

rna_single_cell_type_tissue.tsv.zip
TSV-file (zip compressed), 44.8 MB
 
10 RNA single cell read count data
Read count per gene and cell in 26 datasets. The tab-separated file is in matrix format, with Ensembl gene identifiers as columns and single cells as rows. The file also includes the columns tissue ("Tissue"), cell ("Cell") and cluster ("Cluster"). Information about the external datasets and processing of the data can be found here.

rna_single_cell_read_count.tsv.zip
TSV-file (zip compressed), 1.6 GB
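At 1.6 GB compressed, this matrix is too large to load comfortably into memory. A sketch that streams one gene column straight from the zip archive, assuming the archive holds a single TSV file whose header row contains the "Tissue", "Cell" and "Cluster" columns alongside one column per Ensembl gene identifier, and that counts are stored as integers; `gene_counts` is a hypothetical helper.

```python
import csv
import io
import zipfile


def gene_counts(source, gene):
    """Yield (tissue, cell, cluster, read count) for one gene, row by row."""
    with zipfile.ZipFile(source) as zf:
        # Assumption: the archive contains a single TSV file.
        with zf.open(zf.namelist()[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter="\t")
            header = next(reader)
            col = {name: i for i, name in enumerate(header)}
            for row in reader:
                yield (row[col["Tissue"]], row[col["Cell"]],
                       row[col["Cluster"]], int(row[col[gene]]))
```

Because rows are read lazily, memory use stays roughly constant regardless of the number of cells.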
 
11 RNA GTEx brain region gene data
Transcript expression levels summarized per gene in 10 brain regions based on RNA-seq. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Brain region"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("nTPM"). The data was obtained from GTEx and is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_brain_gtex.tsv.zip
TSV-file (zip compressed), 1.6 MB
 
12 RNA FANTOM brain region gene data
Transcript expression levels summarized per gene in 14 brain regions based on CAGE data. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Brain region"), tags per million ("Tags per million"), scaled-tags per million ("Scaled tags per million") and normalized expression ("nTPM"). The data was obtained from FANTOM5 and is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_brain_fantom.tsv.zip
TSV-file (zip compressed), 2.3 MB
 
13 RNA pig brain region gene data
Transcript expression levels summarized per gene in 15 brain regions based on RNA-seq. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Brain region"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("nTPM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_pig_brain_hpa.tsv.zip
TSV-file (zip compressed), 2 MB
 
14 RNA pig brain subregion sample gene data
Transcript expression levels summarized per gene in 32 brain subregions per sample based on RNA-seq. The tab-separated file includes Ensembl gene identifier of the pig gene ("Gene"), main region ("Main region"), subregion ("Sub region"), animal ("Animal"), transcripts per million ("TPM") and protein-coding transcripts per million ("pTPM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_pig_brain_sample_hpa.tsv.zip
TSV-file (zip compressed), 72.3 MB
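The sample-level files list each animal separately. A minimal sketch that averages TPM across animals for one gene per main region and subregion, using the pig column names above. The mouse sample file names its subregion column "Subregion", hence the `sub_col` parameter. The helper name is hypothetical, and the mean is just one possible summary.

```python
from collections import defaultdict
from statistics import mean


def mean_tpm_per_subregion(rows, gene, sub_col="Sub region"):
    """Average TPM for one gene across animals, keyed by (main region, subregion)."""
    values = defaultdict(list)
    for row in rows:
        if row["Gene"] == gene:
            values[(row["Main region"], row[sub_col])].append(float(row["TPM"]))
    return {key: mean(vals) for key, vals in values.items()}
```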
 
15 RNA mouse brain region gene data
Transcript expression levels summarized per gene in 13 brain regions based on RNA-seq. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Brain region"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("nTPM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_mouse_brain_hpa.tsv.zip
TSV-file (zip compressed), 1.8 MB
 
16 RNA mouse brain subregion sample gene data
Transcript expression levels summarized per gene in 19 brain subregions per sample based on RNA-seq. The tab-separated file includes Ensembl gene identifier of the mouse gene ("Gene"), main brain region ("Main region"), subregion ("Subregion"), animal ("Animal"), transcripts per million ("TPM") and protein-coding transcripts per million ("pTPM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_mouse_brain_sample_hpa.tsv.zip
TSV-file (zip compressed), 35.9 MB
 
17 RNA Allen mouse brain region gene data
Transcript expression levels summarized per gene in 11 brain regions based on in situ hybridization (ISH). The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Tissue") and expression energy ("Expression energy"). The data was obtained from the Allen Brain Atlas and is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_mouse_brain_allen.tsv.zip
TSV-file (zip compressed), 704.8 KB
 
18 RNA HPA blood cell gene data
Transcript expression levels summarized per gene in 18 cell types and total PBMC. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Blood cell"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("nTPM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_blood_cell.tsv.zip
TSV-file (zip compressed), 2.6 MB
 
19 RNA HPA blood cell sample gene data
Transcript expression levels summarized per gene in 109 blood cell samples. "rna_blood_cell_sample.tsv.zip" includes Ensembl gene identifier ("Gene"), analysed sample ("Blood cell"), donor ("Donor"), transcripts per million ("TPM") and protein-coding transcripts per million ("pTPM"). "rna_blood_cell_sample_tpm_m.tsv.zip" is in matrix format and only includes TPM. The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_blood_cell_sample.tsv.zip
TSV-file (zip compressed), 32.8 MB
rna_blood_cell_sample_tpm_m.tsv.zip
TSV-file (zip compressed), 5 MB
 
20 RNA Monaco blood cell gene data
Transcript expression levels summarized per gene in 30 blood cell types. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Blood cell"), transcripts per million ("TPM") and protein-coding transcripts per million ("pTPM"). The data was obtained from the Monaco publication and is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_blood_cell_monaco.tsv.zip
TSV-file (zip compressed), 3.1 MB
 
21 RNA Schmiedel blood cell gene data
Transcript expression levels summarized per gene in 15 blood cell types. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Blood cell") and transcripts per million ("TPM"). The data was obtained from the Schmiedel publication and is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_blood_cell_schmiedel.tsv.zip
TSV-file (zip compressed), 1.5 MB
 
22 RNA HPA cell line gene data
Transcript expression levels summarized per gene in 69 cell lines. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Cell line"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("nTPM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.
RNA sequencing data for human cell lines

rna_celline.tsv.zip
TSV-file (zip compressed), 10.3 MB
 
23 RNA TCGA cancer sample gene data
Transcript expression levels summarized per gene in 7932 samples from 17 different cancer types. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Sample"), cancer type ("Cancer") and fragments per kilobase of transcript per million mapped reads ("FPKM"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

rna_cancer_sample.tsv.zip
TSV-file (zip compressed), 1.1 GB
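FPKM values are normalized for gene length and sequencing depth, but unlike TPM they do not sum to a fixed total per sample. If TPM-scale values are wanted, the standard conversion for gene $i$ within a single sample $s$, summing over all genes $j$ quantified in that sample, is:

```latex
\mathrm{TPM}_{i,s} = \frac{\mathrm{FPKM}_{i,s}}{\sum_{j} \mathrm{FPKM}_{j,s}} \times 10^{6}
```

The conversion is exact only when the sum covers every gene in the sample. Values converted this way are not necessarily directly comparable with the TPM columns of the other files, which may come from different processing pipelines.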
 
24 RNA isoform data
Transcript expression levels in 141 cell line samples, 1506 human tissue samples, 105 GTEx retina samples, 328 PBMC samples, 109 blood cell samples, 144 pig brain tissue samples and 75 mouse brain tissue samples based on RNA-seq. The tab-separated file includes Ensembl gene identifier ("Gene"), Ensembl transcript identifier ("Transcript"), transcripts per million ("TPM") and estimated counts ("est_counts") of the analysed sample ("sample_name.sample_id"). The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.

transcript_rna_celline.tsv.zip
TSV-file (zip compressed), 116.9 MB
transcript_rna_tissue.tsv.zip
TSV-file (zip compressed), 1.4 GB
transcript_rna_gtexretina.tsv.zip
TSV-file (zip compressed), 91.8 MB
transcript_rna_pbmc.tsv.zip
TSV-file (zip compressed), 285.2 MB
transcript_rna_bloodcells.tsv.zip
TSV-file (zip compressed), 78.3 MB
transcript_rna_pigbrain.tsv.zip
TSV-file (zip compressed), 51.7 MB
transcript_rna_mousebrain.tsv.zip
TSV-file (zip compressed), 39 MB
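Because TPM is additive within a sample, a gene-level TPM can be recovered by summing the TPM values of the gene's transcripts. A minimal sketch; the exact naming of the per-sample TPM columns in these files is not spelled out here, so the column is passed in as `tpm_col`, and the helper name is hypothetical.

```python
from collections import defaultdict


def gene_tpm_from_transcripts(rows, tpm_col):
    """Sum transcript-level TPM into gene-level TPM for one sample column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Gene"]] += float(row[tpm_col])
    return dict(totals)
```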
 
25 Data from the Human Protein Atlas in tab-separated format
This file contains a subset of the data in the Human Protein Atlas version 21.1 corresponding to the data seen in the search result. This data can also be downloaded for a resulting gene set when using the search function (via the TSV link on the result page).

proteinatlas.tsv.zip
TSV-file (zip compressed), 11.6 MB
 
26 Data from the Human Protein Atlas in json format
This file contains the same subset of the data as proteinatlas.tsv above, but in JSON format, which may be more convenient for third-party web APIs. This data can also be downloaded for a resulting gene set using the search function (via the Download: Custom TSV/JSON link on the result page).

proteinatlas.json.gz
JSON-file (gzip compressed), 27.9 MB
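The gzip-compressed JSON can be read with the standard `gzip` and `json` modules. A minimal sketch, assuming the file holds a JSON array with one object per gene and that each object has an `"Ensembl"` field (verify the field names against the downloaded data); `parse_entries`, `load_entries` and `index_by` are hypothetical helpers.

```python
import gzip
import json


def parse_entries(raw: bytes):
    """Decompress and parse gzip-compressed JSON bytes."""
    return json.loads(gzip.decompress(raw))


def load_entries(path):
    """Read a .json.gz file fully into memory and parse it."""
    with open(path, "rb") as fh:
        return parse_entries(fh.read())


def index_by(entries, key="Ensembl"):
    """Map the value of `key` to its entry; entries without the key are skipped."""
    return {entry[key]: entry for entry in entries if key in entry}
```

For example, `index_by(load_entries("proteinatlas.json.gz"))` gives a lookup from Ensembl ID to the entry for that gene, under the assumptions above.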
 
27 Data from the Human Protein Atlas in XML format
The XML file contains most of the data in the Human Protein Atlas version 21.1, including protein expression data (in normal and tumor tissues and in cell lines), antigen sequences, Western blot data for antibodies, protein array data for antibodies, RNA-seq data, external references such as UniProt identifiers, and more. The data is based on Ensembl version 103.38. The file structure is presented in the XSD-schema. This data can also be downloaded for a resulting gene set when using the search function (via the XML link on the result page).
The XML file presented here is compressed with gzip due to its size. It can be uncompressed with an archive program like 7‑zip.

proteinatlas.xml.gz
XML-file (gzip compressed), 911.1 MB
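At roughly 0.9 GB compressed, the XML file is best processed as a stream rather than parsed into a single tree. A sketch using `xml.etree.ElementTree.iterparse`, assuming one `<entry>` element per gene with a `<name>` child and an `<identifier db="Ensembl" id="...">` child, and no XML namespace on these tags. These element and attribute names are assumptions to check against the XSD schema, and `iter_entries` is a hypothetical helper.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_entries(source):
    """Yield (gene name, Ensembl ID) for each <entry> in a gzip-compressed XML file.

    `source` may be a path, a binary file object, or the compressed bytes.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    with gzip.open(source, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag != "entry":
                continue
            ident = elem.find("identifier[@db='Ensembl']")
            ensembl_id = ident.get("id") if ident is not None else None
            yield elem.findtext("name"), ensembl_id
            # Drop the processed entry's children to keep memory bounded.
            elem.clear()
```

Because each `<entry>` is cleared after it has been handled, memory use grows only slowly even for the full file.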
 
28 Data from the Human Protein Atlas in RDF format
This file contains a subset of the data in the Human Protein Atlas version 21.1 corresponding to the tissue annotations on gene level. This data can also be downloaded for a resulting gene set when using the search function (via the RDF link on the result page). This RDF release is a beta and will be extended and developed in coming releases. We thank Mark Thompson, Rajaram Kaliyaperumal and Eelke van der Horst (LUMC, The Netherlands), and Christine Chichester (SIB, Switzerland) for providing templates for generating the first beta release of HPA nanopublications. Their contribution was made possible by IMI project Open PHACTS and EU FP7 project RD-Connect. This beta was developed within an ELIXIR collaboration.

proteinatlas.trig.gz
RDF trig-file (gzip compressed), 82.1 MB
 
29 Cell graphic
Schematic cell containing all structures annotated within the Human Protein Atlas.

cell.svg
SVG-file (vectorized graphic), 545.5 KB