Learning robust classifiers from sparse data is a challenging research problem in the areas of information retrieval, web mining, text categorization, topic identification, and natural language processing. For many real-world applications it is often too expensive to gather enough labeled samples to train high-performance classifiers. In this paper we propose a novel maximal figure-of-merit (MFoM) learning approach to robust classifier design that optimizes any performance measure of interest (e.g. accuracy, recall, precision, or the F_1 measure) for any target classifier. By embedding the overall training objective into the decision functions used by the classifiers, the proposed MFoM approach learns the classifier parameters in a decision-feedback manner that takes both positive and negative training samples into account, and therefore reduces the required size of the training set. To solve the resulting highly nonlinear optimization problem, a generalized probabilistic descent (GPD) algorithm is used. The proposed MFoM learning approach has three desirable properties compared with conventional learning approaches: (a) it is a metric-oriented approach to classifier design; (b) the optimized metric is consistent between the training and evaluation sets; and (c) it is more robust and less sensitive to data variation. We evaluate the MFoM learning approach on the Reuters-21578 text categorization task. In all the experiments, we employ a simple binary decision tree classifier with a linear discriminant function at each node of the tree. Our experimental results indicate that classifiers obtained by maximizing the F_1 measure give significantly better performance and enhanced robustness than those trained with most conventional approaches, including the support vector machine (SVM) approach.
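The core idea above can be illustrated with a minimal sketch: replace the hard 0/1 error indicator with a sigmoid of the classifier score so that soft true-positive, false-positive, and false-negative counts become differentiable, and then maximize the resulting smoothed F_1 directly. This is an illustrative toy, not the paper's algorithm: it uses a single linear discriminant, finite-difference gradients, and batch gradient ascent in place of the sample-by-sample GPD procedure, and all function names and hyperparameters (`alpha`, `lr`) are assumptions of this sketch.

```python
import numpy as np

def smooth_f1(w, b, X, y, alpha=5.0):
    """Differentiable approximation of the F1 measure for a linear
    classifier. y is in {0, 1}; a sigmoid of the signed score acts as a
    soft misclassification indicator (close to 1 when the sample is on
    the wrong side of the boundary, close to 0 otherwise)."""
    s = X @ w + b
    z = np.clip(alpha * np.where(y == 1, s, -s), -50.0, 50.0)
    l = 1.0 / (1.0 + np.exp(z))          # soft per-sample error
    tp = np.sum((1.0 - l)[y == 1])       # soft true positives
    fn = np.sum(l[y == 1])               # soft false negatives
    fp = np.sum(l[y == 0])               # soft false positives
    return 2.0 * tp / (2.0 * tp + fp + fn + 1e-12)

def train_mfom_linear(X, y, lr=0.2, epochs=300, alpha=5.0):
    """Metric-oriented training sketch: ascend the gradient of the
    smoothed F1. Finite differences keep the toy short; a real
    implementation would use analytic gradients (GPD in the paper)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b, eps = 0.0, 1e-5
    for _ in range(epochs):
        grad_w = np.zeros_like(w)
        for i in range(len(w)):
            wp, wm = w.copy(), w.copy()
            wp[i] += eps
            wm[i] -= eps
            grad_w[i] = (smooth_f1(wp, b, X, y, alpha)
                         - smooth_f1(wm, b, X, y, alpha)) / (2 * eps)
        grad_b = (smooth_f1(w, b + eps, X, y, alpha)
                  - smooth_f1(w, b - eps, X, y, alpha)) / (2 * eps)
        w += lr * grad_w
        b += lr * grad_b
    return w, b
```

Because the smoothed objective counts both the soft false positives (negative samples) and soft false negatives (positive samples), every training sample contributes to the update, which is the decision-feedback property the abstract refers to.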
Moreover, when considering only categories with fewer than 30 training samples, the MFoM-based classifiers achieve significant improvements in macro-averaged and micro-averaged F_1, reaching 0.361 and 0.537 compared to the 0.251 and 0.429 obtained by the linear SVM classifiers, respectively. To demonstrate the generality of the MFoM method for designing classifiers tuned to optimize any performance metric, we designed and compared three MFoM classifiers based on the chosen metrics of precision, recall, and F_1. The results clearly show that performance on a chosen metric is more consistent between the training and evaluation stages, and that each classifier indeed optimizes its chosen metric at evaluation time.
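The metric-swapping experiment described above hinges on the fact that the same soft counts support any figure of merit: one can plug smoothed precision or smoothed recall into the training objective in place of smoothed F_1. The sketch below only shows the alternative objectives, under the same assumptions (linear scores, sigmoid soft errors, illustrative names) as before.

```python
import numpy as np

def soft_counts(w, b, X, y, alpha=5.0):
    """Soft TP/FP/FN counts from sigmoid misclassification indicators."""
    s = X @ w + b
    z = np.clip(alpha * np.where(y == 1, s, -s), -50.0, 50.0)
    l = 1.0 / (1.0 + np.exp(z))
    tp = np.sum((1.0 - l)[y == 1])
    fp = np.sum(l[y == 0])
    fn = np.sum(l[y == 1])
    return tp, fp, fn

def smooth_precision(w, b, X, y, alpha=5.0):
    """Differentiable precision: soft TP / (soft TP + soft FP)."""
    tp, fp, _ = soft_counts(w, b, X, y, alpha)
    return tp / (tp + fp + 1e-12)

def smooth_recall(w, b, X, y, alpha=5.0):
    """Differentiable recall: soft TP / (soft TP + soft FN)."""
    tp, _, fn = soft_counts(w, b, X, y, alpha)
    return tp / (tp + fn + 1e-12)
```

Maximizing `smooth_precision` penalizes soft false positives only, while `smooth_recall` penalizes soft false negatives only, which is why a classifier trained on one metric tends to score best on that same metric at evaluation time.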