High-throughput single cell data analysis - A tutorial

G.H. Tinnevelt; K. Wouters; G.J. Postma; R. Folcarelli; J.J. Jansen

doi:10.1016/j.aca.2021.338872

Corresponding author: G.H. Tinnevelt

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

White blood cells protect the body against disease but may also cause chronic inflammation, autoimmune diseases or leukemia. There are many different white blood cell types, whose identity and function can be studied by measuring their protein expression. High-throughput analytical instruments were therefore developed to measure multiple proteins on millions of single cells. The rich biochemical information in these data can only be fully extracted using multivariate statistics. Here we give an overview of the most essential steps for multivariate data analysis of single cell data. We used white blood cells (immunology) as a case study, but a similar approach may be used in environmental or biotechnology research. The first step is analyzing the study design and subsequently formulating a research question. The three main designs are immunophenotyping (finding different cell types), cell activation and rare cell discovery. When preparing the data, it is essential to consider the design and to focus on the cell type of interest by removing all unwanted events. After pre-processing, the tens of thousands to millions of single cells per sample need to be converted into a cellular distribution. For immunophenotyping, a clustering method such as Self-Organizing Maps is useful; for cell activation, a model that describes the covariance, such as Principal Component Analysis, is useful. In rare cell discovery, it is useful to first model all common cells and remove them in order to find the rare cells. Finally, discriminant analysis based on the cellular distribution may highlight which cell (sub)types differ between groups.
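
The step of converting the single cells of a sample into a cellular distribution can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that node positions (for example the nodes of an already trained Self-Organizing Map) are available, assigns each cell to its nearest node, and reports the fraction of cells per node. The names `nearest_node` and `cellular_distribution`, the two-marker data and the node coordinates are all hypothetical.

```python
from collections import Counter
from math import dist


def nearest_node(cell, nodes):
    """Return the index of the node closest to one cell (Euclidean distance)."""
    return min(range(len(nodes)), key=lambda i: dist(cell, nodes[i]))


def cellular_distribution(cells, nodes):
    """Return the fraction of a sample's cells assigned to each node."""
    if not cells:
        raise ValueError("a sample must contain at least one cell")
    counts = Counter(nearest_node(cell, nodes) for cell in cells)
    return [counts.get(i, 0) / len(cells) for i in range(len(nodes))]


# Hypothetical node positions, assumed to come from a previously trained model.
nodes = [(0.0, 0.0), (5.0, 5.0)]

# Hypothetical pre-processed samples: each cell is (marker_1, marker_2).
samples = {
    "control": [(0.1, 0.2), (0.3, -0.1), (4.8, 5.1), (0.0, 0.4)],
    "patient": [(5.2, 4.9), (4.7, 5.3), (0.2, 0.1), (5.0, 5.0)],
}

# One fixed-length vector per sample, regardless of how many cells it holds.
distributions = {
    name: cellular_distribution(cells, nodes) for name, cells in samples.items()
}
```

Each sample is reduced to a vector of equal length, so samples with different cell counts become comparable and can be passed to a discriminant analysis of the kind the abstract mentions.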

Original language	English
Article number	338872
Number of pages	14
Journal	Analytica Chimica Acta
Volume	1185
DOIs	https://doi.org/10.1016/j.aca.2021.338872
Publication status	Published - 15 Nov 2021

Keywords

ARTIFICIAL NEURAL NETWORKS
AUTOMATED IDENTIFICATION
COMPENSATION
FLOW-CYTOMETRY DATA
FLUORESCENCE
MASS CYTOMETRY
ORGANIZING MAPS
PARTIAL LEAST-SQUARES
SOLVING CHEMICAL PROBLEMS
VISUALIZATION

Access to Document

DOI: 10.1016/j.aca.2021.338872
Licence: CC BY-NC-ND

Cite this

@article{a2bb3fbb4ee64492be129bcdf00883af,

title = "High-throughput single cell data analysis - A tutorial",

abstract = "White blood cells protect the body against disease but may also cause chronic inflammation, auto-immune diseases or leukemia. There are many different white blood cell types whose identity and function can be studied by measuring their protein expression. Therefore, high-throughput analytical instruments were developed to measure multiple proteins on millions of single cells. The information-rich biochemistry information may only be fully extracted using multivariate statistics. Here we show an overview of the most essential steps for multivariate data analysis of single cell data. We used white blood cells (immunology) as a case study, but a similar approach may be used in environment or biotech research. The first step is analyzing the study design and subsequently formulating a research question. The three main designs are immunophenotyping (finding different cell types), cell activation and rare cell discovery. When preparing the data it is essential to consider the design and focus on the cell type of interest by removing all unwanted events. After pre-processing, the ten-thousands to millions of single cells per sample need to be converted into a cellular distribution. For immunophenotyping a clustering method such as Self-Organizing Maps is useful and for cell activation a model that describes the covariance such as Principal Component Analysis is useful. In rare cell discovery it is useful to first model all common cells and remove them to find the rare cells. Finally discriminant analysis based on the cellular distribution may highlight which cell (sub)types are different between groups.",

keywords = "ARTIFICIAL NEURAL NETWORKS, AUTOMATED IDENTIFICATION, COMPENSATION, FLOW-CYTOMETRY DATA, FLUORESCENCE, MASS CYTOMETRY, ORGANIZING MAPS, PARTIAL LEAST-SQUARES, SOLVING CHEMICAL PROBLEMS, VISUALIZATION",

author = "G.H. Tinnevelt and K. Wouters and G.J. Postma and R. Folcarelli and J.J. Jansen",

note = "Publisher Copyright: {\textcopyright} 2021 The Authors",

year = "2021",

month = nov,

day = "15",

doi = "10.1016/j.aca.2021.338872",

language = "English",

volume = "1185",

journal = "Analytica Chimica Acta",

issn = "0003-2670",

publisher = "Elsevier Science",

}

TY - JOUR

T1 - High-throughput single cell data analysis - A tutorial

AU - Tinnevelt, G.H.

AU - Wouters, K.

AU - Postma, G.J.

AU - Folcarelli, R.

AU - Jansen, J.J.

PY - 2021/11/15

Y1 - 2021/11/15

N2 - White blood cells protect the body against disease but may also cause chronic inflammation, auto-immune diseases or leukemia. There are many different white blood cell types whose identity and function can be studied by measuring their protein expression. Therefore, high-throughput analytical instruments were developed to measure multiple proteins on millions of single cells. The information-rich biochemistry information may only be fully extracted using multivariate statistics. Here we show an overview of the most essential steps for multivariate data analysis of single cell data. We used white blood cells (immunology) as a case study, but a similar approach may be used in environment or biotech research. The first step is analyzing the study design and subsequently formulating a research question. The three main designs are immunophenotyping (finding different cell types), cell activation and rare cell discovery. When preparing the data it is essential to consider the design and focus on the cell type of interest by removing all unwanted events. After pre-processing, the ten-thousands to millions of single cells per sample need to be converted into a cellular distribution. For immunophenotyping a clustering method such as Self-Organizing Maps is useful and for cell activation a model that describes the covariance such as Principal Component Analysis is useful. In rare cell discovery it is useful to first model all common cells and remove them to find the rare cells. Finally discriminant analysis based on the cellular distribution may highlight which cell (sub)types are different between groups.

KW - ARTIFICIAL NEURAL NETWORKS

KW - AUTOMATED IDENTIFICATION

KW - COMPENSATION

KW - FLOW-CYTOMETRY DATA

KW - FLUORESCENCE

KW - MASS CYTOMETRY

KW - ORGANIZING MAPS

KW - PARTIAL LEAST-SQUARES

KW - SOLVING CHEMICAL PROBLEMS

KW - VISUALIZATION

U2 - 10.1016/j.aca.2021.338872

DO - 10.1016/j.aca.2021.338872

M3 - Article

C2 - 34711307

SN - 0003-2670

VL - 1185

JO - Analytica Chimica Acta

JF - Analytica Chimica Acta

M1 - 338872

ER -