FlowCap
This page contains supplementary information for our submission to Dream6 challenge 4, "Classification of AML". For information about this challenge, please visit the Dream6 homepage. Below, we describe our approach and provide downloads of the program used to make the predictions for the challenge and of its source code.
Method description
We base our classifier on the following assumptions:
- each experiment (tube) is an independent indication of whether the patient suffers from AML or not
- for each cell in a tube, we may decide independently whether this cell is infected or not
- the number of cells classified as infected differs substantially between patients suffering from AML and healthy patients
Following these assumptions we take a step-wise approach using Jstacs:
1. We build a classifier that returns the probability that a specific cell from a specific tube is infected. Such a classifier is learned independently for each kind of tube (i.e., each selection of markers), where the measurements of all cells of AML patients are used as foreground (positive) and the measurements of all cells of healthy patients are used as background (negative). The log-values of the measurements are modeled by normal distributions, and the parameters are learned by the maximum conditional likelihood principle.
2. For each patient, we compute the fraction of cells in a tube classified as infected. For each patient we obtain a series of 8 such fractions, one for each tube.
3. We create another classifier that works on these 8 values using logistic regression, and learn its parameters by the maximum supervised posterior principle based on the labeling of patients. The output of this classifier is the final prediction.
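Steps 1 and 2 can be illustrated with a minimal Python sketch. It is not the Jstacs implementation used for the challenge: the function names `cell_posterior` and `fraction_classified_aml` are hypothetical, the means and standard deviations are assumed to be given (parameter learning by maximum conditional likelihood is not shown), and equal class priors are assumed by default.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log-density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def cell_posterior(cell, fg, bg, prior_fg=0.5):
    """P(AML | cell) for one cell, given its raw (positive) measurements.

    fg and bg are lists of (mean, sd) pairs, one pair per measurement,
    describing the log-values under the foreground (AML) and background
    (healthy) class. The parameter values are assumed to be known here.
    """
    log_values = [math.log(v) for v in cell]
    score_fg = math.log(prior_fg) + sum(
        log_normal_pdf(x, m, s) for x, (m, s) in zip(log_values, fg))
    score_bg = math.log(1.0 - prior_fg) + sum(
        log_normal_pdf(x, m, s) for x, (m, s) in zip(log_values, bg))
    # Normalise the two class scores in a numerically stable way.
    top = max(score_fg, score_bg)
    e_fg = math.exp(score_fg - top)
    e_bg = math.exp(score_bg - top)
    return e_fg / (e_fg + e_bg)

def fraction_classified_aml(cells, fg, bg, threshold=0.5):
    """Fraction of cells in one tube with P(AML | cell) above the threshold."""
    hits = sum(1 for cell in cells if cell_posterior(cell, fg, bg) > threshold)
    return hits / len(cells)
```

Calling `fraction_classified_aml` once per tube for a patient yields the series of 8 fractions described in step 2.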
For the predictions on the unlabeled data, we use the trained classifiers and follow the same protocol as before: classify each cell in each tube, compute the fraction of cells classified as infected, and use the logistic regression classifier for the final prediction.
This approach can be summarized by the following pseudo code:
Pseudo code
Parameter learning
For each tube do
    Load the measurements for each cell in this tube;
    Compute log-values for all measurements;
    Create a sample based on the log-values of the individual cells and label all cells stemming from AML patients as foreground class and all cells from healthy patients as background class;
    Create a classifier based on 7 independent normal distributions, corresponding to the two scatter and five antibody measurements, for the foreground and background class each;
    Estimate the parameters of the classifier (i.e., means and standard deviations) from this sample using the maximum conditional likelihood (MCL) learning principle;
    For each patient, classify the log-measurements of all cells of this patient and compute the fraction of cells (later denoted as patient posterior) with a probability P(AML | cell) > 0.5;
Done;
For each patient do
    Create a sequence of the 8 patient posteriors;
Done;
Create a sample from these sequences with labels according to the patients' state of health;
Create a classifier based on logistic regression;
Estimate the parameters of the classifier from this sample using the maximum supervised posterior (MSP) learning principle and a product normal prior with standard deviation 1;
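The last learning step can be sketched in Python as follows. This is an illustration under stated assumptions, not the Jstacs code used for the challenge: the function names are hypothetical, the objective is the conditional log-likelihood plus the log of a zero-mean normal prior on all parameters (including the bias, which is our assumption), and plain gradient ascent with a fixed step size stands in for whatever numerical optimizer was actually used.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_msp(rows, labels, prior_sd=1.0, step=0.1, iterations=2000):
    """Fit logistic regression by maximising log-likelihood plus log-prior.

    rows:   one sequence of patient posteriors per patient
    labels: 1 for AML patients, 0 for healthy patients
    Returns the weights, with the bias stored as the last element.
    """
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(iterations):
        # Gradient of the log of the product normal prior: -w / sd^2.
        grad = [-w / (prior_sd ** 2) for w in weights]
        # Gradient of the conditional log-likelihood.
        for x, y in zip(rows, labels):
            p = sigmoid(weights[-1] + sum(w * v for w, v in zip(weights, x)))
            err = y - p
            for j, v in enumerate(x):
                grad[j] += err * v
            grad[-1] += err
        # Fixed-step gradient ascent, scaled by the number of patients.
        weights = [w + step * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def predict_aml_probability(weights, x):
    """P(AML | patient) for one sequence of patient posteriors."""
    return sigmoid(weights[-1] + sum(w * v for w, v in zip(weights, x)))
```

With a prior standard deviation of 1, the weights are pulled noticeably towards zero, so the predicted probabilities stay away from 0 and 1 when only few patients are available.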
Prediction
For each tube do
    Load the measurements for each cell in this tube;
    Compute log-values for all measurements;
    Create a sample based on the log-values of the individual cells;
    Classify the log-measurements of all cells of a patient using the previously trained classifier for this tube and compute the fraction of cells (later denoted as patient posterior) with a probability P(AML | cell) > 0.5;
Done;
For each patient do
    Create a sequence of the 8 patient posteriors;
Done;
Finally, use the classifier based on logistic regression to obtain the final prediction based on this sequence of patient posteriors.
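The prediction protocol for a single patient can be sketched in Python as shown here. The names `patient_posteriors` and `final_prediction` are hypothetical, the per-tube classifiers are assumed to be callables that return P(AML | cell), and the weights and bias of the logistic regression are assumed to come from the learning step. None of this mirrors the Jstacs API.

```python
import math

def patient_posteriors(cells_by_tube, classifiers, threshold=0.5):
    """One value per tube: the fraction of cells with P(AML | cell) > threshold.

    cells_by_tube: list with one list of cells per tube (8 tubes in the challenge)
    classifiers:   list with one callable per tube, mapping a cell to P(AML | cell)
    """
    fractions = []
    for cells, posterior in zip(cells_by_tube, classifiers):
        hits = sum(1 for cell in cells if posterior(cell) > threshold)
        fractions.append(hits / len(cells))
    return fractions

def final_prediction(fractions, weights, bias):
    """Logistic function applied to the weighted patient posteriors."""
    z = bias + sum(w * f for w, f in zip(weights, fractions))
    return 1.0 / (1.0 + math.exp(-z))
```

The order of the tubes must be the same in `cells_by_tube`, in `classifiers`, and in the weights, since each weight belongs to one specific tube.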
Download
- Binaries, including the XML of the classifier used in the challenge
- Sources (compiling them additionally requires the Jstacs 2.0 sources)
- XML of the classifier used in the challenge