Structure - Methods summary

Summary

The Structure section contains information about the three-dimensional structure of human proteins. The predicted 3D structure from the AlphaFold Protein Structure Database, developed by DeepMind and EMBL-EBI, is shown for genes with Ensembl transcripts corresponding to the UniProt entry used for prediction. The Protein Browser can be used to select splice variants and display protein-related features such as known antigen sequences, transmembrane regions and InterPro domains on the structures. The amino acid positions of population variants and variants with known clinical relevance in the Ensembl variation database can also be displayed.

Key publications

Jumper J et al. (2021) "Highly accurate protein structure prediction with AlphaFold" Nature 596(7873):583-589.

Varadi M et al. (2022) "AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models" Nucleic Acids Research 50(D1):D439-D444.

What can you learn from the Structure Section?

Learn about:

  • the predicted 3D structure of proteins
  • the known missense variants with clinical significance
  • the known missense variants in the population
  • the antigen structure for the majority of the antibodies

How has the data been generated?

The predicted 3D protein structures are retrieved from the AlphaFold Protein Structure Database developed by DeepMind and EMBL-EBI. The AI system AlphaFold is a machine learning approach in which the primary amino acid sequence and aligned sequences of homologues, together with physical and biological knowledge about related protein structures, are incorporated into the design of a deep learning algorithm that directly predicts the 3D structure of a protein.

The structures included have a UniProt ID mapped to an Ensembl gene in our gene set, with at least one transcript having 100% identity to the structure sequence.
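The inclusion rule can be illustrated with a minimal Python sketch. It assumes that "100% identity" means the translated transcript sequence equals the structure sequence exactly over its full length. The function names, transcript IDs and sequences are hypothetical and are not taken from the actual pipeline.

```python
from typing import Dict, List


def exact_match_transcripts(structure_seq: str, transcripts: Dict[str, str]) -> List[str]:
    """Return the IDs of transcripts whose protein sequence equals structure_seq.

    Assumption: 100% identity is treated as exact, full-length string equality.
    """
    target = structure_seq.strip().upper()
    return sorted(
        tid for tid, seq in transcripts.items() if seq.strip().upper() == target
    )


def keep_structure(structure_seq: str, transcripts: Dict[str, str]) -> bool:
    """A structure is kept if at least one transcript matches it exactly."""
    return len(exact_match_transcripts(structure_seq, transcripts)) > 0


# Hypothetical example data: transcript ID -> translated protein sequence.
transcripts = {
    "ENST_A": "MKTAYIAKQR",
    "ENST_B": "MKTAYIAKQRQISFVK",
}

print(exact_match_transcripts("MKTAYIAKQR", transcripts))
print(keep_structure("MKTAYIAKQX", transcripts))
```

In this sketch a splice variant that is longer or shorter than the structure sequence does not count as a match, even if the overlapping part is identical.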

The population and clinical variant data are incorporated from the Ensembl variation database. For variants with clinical relevance, only variants with the clinical significance terms "pathogenic" and "likely pathogenic" were included.
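As a rough illustration of this filter, the Python sketch below keeps only variants annotated with one of the two terms. The record layout (a dict with "id" and a "clinical_significance" list) and the variant IDs are assumptions made for the example, not the schema of the Ensembl variation database.

```python
from typing import Dict, List

# The two clinical significance terms named in the text above.
INCLUDED_TERMS = {"pathogenic", "likely pathogenic"}


def is_clinically_relevant(variant: Dict) -> bool:
    """True if the variant carries at least one of the included terms."""
    terms = {t.strip().lower() for t in variant.get("clinical_significance", [])}
    return bool(terms & INCLUDED_TERMS)


def filter_clinical_variants(variants: List[Dict]) -> List[Dict]:
    """Keep only variants annotated as pathogenic or likely pathogenic."""
    return [v for v in variants if is_clinically_relevant(v)]


# Hypothetical variant records for illustration only.
variants = [
    {"id": "var1", "clinical_significance": ["pathogenic"]},
    {"id": "var2", "clinical_significance": ["benign", "likely benign"]},
    {"id": "var3", "clinical_significance": ["Likely pathogenic", "uncertain significance"]},
    {"id": "var4"},  # no clinical annotation at all
]

print([v["id"] for v in filter_clinical_variants(variants)])
```

Terms are compared as whole strings, so an annotation such as "likely benign" is not mistaken for one of the included terms.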

All structures are displayed using the NGL Viewer.

What is presented in the section?

On the gene summary page of the Structure section, predicted 3D protein structures can be displayed and explored. In the drop-down panel, the available experimental structures for each protein can be selected and displayed. Check boxes allow the display of antigen sequences and the positions of population and/or clinical variants, and the structures can be colored according to B-factor, residue index or chain name.
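The three coloring properties correspond to fields of the fixed-width ATOM records in a PDB-format file. The Python sketch below reads the chain name, residue index and B-factor of each C-alpha atom. It works independently of the NGL Viewer and is not how the viewer itself does the coloring. The helper atom_line and its values are hypothetical and only build demo records. In AlphaFold model files the B-factor field holds the per-residue confidence score (pLDDT), so coloring by B-factor reflects prediction confidence for those models.

```python
from typing import Dict, Tuple


def ca_bfactors(pdb_text: str) -> Dict[Tuple[str, int], float]:
    """Map (chain name, residue index) to the B-factor of the C-alpha atom.

    Uses the fixed PDB columns: atom name 13-16, chain 22,
    residue number 23-26, B-factor 61-66 (1-based).
    """
    values: Dict[Tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue_index = int(line[22:26])
        values[(chain, residue_index)] = float(line[60:66])
    return values


def atom_line(serial: int, name: str, res: str, chain: str, resseq: int, b: float) -> str:
    """Hypothetical helper: format a minimal ATOM record with dummy coordinates."""
    return (
        f"ATOM  {serial:>5}  {name:<3} {res:>3} {chain}{resseq:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{b:>6.2f}"
    )


# Made-up demo records: two residues on chain A.
demo_pdb = "\n".join(
    [
        atom_line(1, "N", "MET", "A", 1, 35.50),
        atom_line(2, "CA", "MET", "A", 1, 36.25),
        atom_line(3, "CA", "LYS", "A", 2, 91.00),
    ]
)

for (chain, index), bfactor in ca_bfactors(demo_pdb).items():
    print(chain, index, bfactor)
```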