A crowd-sourced competition to extract single-cell features from bioimages


In an article in Nature Methods, Emma Lundberg and coworkers described the results of a crowd-sourced competition (Human Protein Atlas - Single-Cell Classification) aimed at developing machine learning models to label single-cell protein patterns in fluorescence images.

Protein localization is crucial to the understanding of biological networks, and it is well known that the expression and localization of proteins vary not only between cell types but also within genetically identical cell populations. Subcellular protein localization therefore needs to be assessed at the single-cell level.

The competition was hosted on the Kaggle platform and attracted 757 participating teams and more than 19,000 submissions over a period of three months. The task was to develop computational models that classify the subcellular protein localization pattern of every single cell in a microscopy image, given a training set carrying only image-level labels from the standard HPA annotation pipeline. This training set consisted of almost 22,000 confocal microscopy images covering 17 human cell lines and proteins encoded by almost 8,000 different genes.
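The core difficulty of this setup is weak supervision: labels exist per image, while predictions are needed per cell. One common way to bridge the gap, illustrated here as a minimal sketch and not the pipeline of any particular team, is to average a model's class activation map (CAM) over each segmented cell to obtain crude per-cell scores. The function name, the toy CAM, and the mask are hypothetical.

```python
import numpy as np

def cell_scores_from_cam(cam, cell_mask):
    """Average a class activation map over each segmented cell.

    cam: (H, W) float array, activation for one localization class.
    cell_mask: (H, W) int array, 0 = background, 1..N = cell instance IDs.
    Returns {cell_id: mean activation}, a crude per-cell class score.
    """
    scores = {}
    for cell_id in np.unique(cell_mask):
        if cell_id == 0:  # skip background pixels
            continue
        scores[int(cell_id)] = float(cam[cell_mask == cell_id].mean())
    return scores

# Toy example: a 4x4 activation map with two segmented cells.
cam = np.array([[1.0, 1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 0.5, 0.5],
                [0.0, 0.0, 0.5, 0.5]])
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 2, 2],
                 [0, 0, 2, 2]])
print(cell_scores_from_cam(cam, mask))  # → {1: 1.0, 2: 0.5}
```

In practice, competition entries replaced each piece of this toy with a learned component: a trained classifier produces the activation maps and an instance segmentation model produces the cell masks.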

To tackle the challenge, most teams used either a cell-level or an image-level approach, with deep neural networks as the backbones of their models. The winning team introduced a new approach to fair activation based on Puzzle-CAM: a model combining image-level and cell-level prediction pipelines (the Fair Cell Activation Network, FCAN), followed by per-cell prediction with a Swin Transformer.
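The Puzzle-CAM idea underlying the winning entry is to regularize activation maps so that the map computed on the full image agrees with the map reassembled from its tiles, discouraging the network from relying too heavily on global context. The sketch below illustrates only that consistency measure with a toy, context-dependent "CAM" (activations normalized by the input's global mean); the function names are hypothetical and this is not the FCAN implementation itself.

```python
import numpy as np

def toy_cam(img):
    # Stand-in for a network's class activation map: activations
    # normalized by the mean intensity of the whole input, so the
    # output depends on global context (as real CAMs do).
    return img / (img.mean() + 1e-8)

def puzzle_consistency(img):
    """Puzzle-CAM-style regularizer (sketch): L1 gap between the CAM
    of the full image and the CAM reassembled from its quadrants."""
    h, w = img.shape  # assumes even height and width
    full = toy_cam(img)
    merged = np.zeros_like(full)
    for i in (0, 1):
        for j in (0, 1):
            ys = slice(i * h // 2, (i + 1) * h // 2)
            xs = slice(j * w // 2, (j + 1) * w // 2)
            merged[ys, xs] = toy_cam(img[ys, xs])
    return np.abs(full - merged).mean()

uniform = np.ones((4, 4))
varied = np.ones((4, 4))
varied[:2, :2] = 2.0  # one bright quadrant
print(puzzle_consistency(uniform))  # → 0.0 (tiles match the full image)
print(puzzle_consistency(varied))   # > 0: global context shifts the CAM
```

During training, a term like this is added to the classification loss, pushing the full-image and tiled activations toward agreement.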

These types of models will help shed light on single-cell spatial variability within images, on the heterogeneity of spatial protein expression across cells, and on biological processes in cells, providing a better understanding of dynamic protein functions in different organelles.

Link to article