TR-IIS-07-013    Fulltext


Global and Componentwise Extrapolations for Accelerating Training of Bayesian Networks and Conditional Random Fields

Chun-Nan Hsu, Han-Shen Huang, Bo-Hou Yang, and Yu-Ming Chang

Abstract

The triple jump extrapolation method is an effective approximation of Aitken’s acceleration that can accelerate the convergence of many parameter learning algorithms, including EM and generalized iterative scaling. It has two options — global and componentwise extrapolation. Empirical studies showed that neither can dominate the other in all cases and it is not known which one is better under what condition. In this paper, we investigate this problem and conclude that, when the Jacobian is (block) diagonal, componentwise extrapolation will be more effective. We derive two hints that allow us to determine the block diagonality. The first hint is that when we have a highly sparse data set, the Jacobian of the EM mapping for training a Bayesian network will be block diagonal. The second hint is that the block diagonality of the Jacobian of the GIS mapping for training CRF is negatively correlated with the strength of feature dependencies. We empirically verify these hints with controlled and real-world data sets. The results show that our hints can accurately predict which method will be superior. The results also show that both global and componentwise extrapolation can provide substantial acceleration. In particular, when applied to train large-scale CRF models, the GIS variant accelerated by componentwise extrapolation not only outperforms its global extrapolation counterpart, as our hint predicts, but can also compete with limited-memory BFGS (L-BFGS), the de facto standard for CRF training, in terms of both computational efficiency and F-scores.

Keywords: Bayesian Networks, Conditional Random Fields, Expectation-Maximization (EM) Algorithm, Generalized Iterative Scaling, Triple-Jump Extrapolation