Extracting functional information on genes and processes from large expression datasets requires analysis methods that can handle such volumes of data computationally, can be tuned to specific research questions, and construct classifiers that are not overly specific to the dataset at hand. To satisfy these requirements, a stepwise procedure combining elements of principal component analysis and discriminant analysis was developed to retrieve genes involved in processes of interest and to classify samples based on those genes. In a global expression dataset of 300 gene knock-outs in Saccharomyces cerevisiae, the procedure successfully classified samples whose knocked-out genes have similar 'cellular component' Gene Ontology annotations, using expression signatures of limited numbers of genes. The genes discriminating 'mitochondrion' from the other subgroups were evaluated in more detail. The thiamine pathway turned out to be one of the processes involved, and it was successfully evaluated in a logistic model to predict whether yeast knock-outs were mitochondrial or not. Furthermore, this pathway is biologically related to the mitochondrial system. These results strongly indicate that the approach is effective and efficient in extracting meaningful information from large microarray experiments and in assigning functions to as yet uncharacterized genes. Copyright (c) 2007 John Wiley & Sons, Ltd.