 |
|
IDEAL-Q: An automated tool for label-free quantitation analysis using an efficient peptide alignment approach and spectral data validation
|
|
Authors: Chih-Chiang Tsou, Chia-Feng Tsai, Ying-Hao Tsui, Putty-Reddy Sudhir, Yi-Ting Wang, Yu-Ju Chen, Jeou-Yuan Chen, Ting-Yi Sung, Wen-Lian Hsu
|
 |
 |
| Sung, Ting-Yi |
Hsu, Wen-Lian |
|
Abstract: In this paper, we present a fully automated tool, called IDEAL-Q, for label-free quantitation analysis. It accepts raw data in the standard mzXML format as well as search results from major search engines, including Mascot, SEQUEST, and X!Tandem, as input data. To quantify as many identified peptides as possible, IDEAL-Q uses an efficient algorithm to predict the elution time of a peptide unidentified in a specific LC-MS/MS run, but identified in other runs. Then, the predicted elution time is used to detect peak clusters of the assigned peptide. Detected peptide peaks are processed by statistical and computational methods and further validated by Signal-to-noise ratio, Charge state, and Isotopic distribution criteria (SCI validation) to filter out noisy data. IDEAL-Q's performance has been evaluated by several experiments. First, a serially diluted protein mixed with E. coli lysate showed a high correlation with expected ratios and demonstrated good linearity (R2=0.996). In a biological replicate experiment on the THP-1 cell lysate, IDEAL-Q quantified 87% (1,672 peptides) of all identified peptides, surpassing the 45.7% (909 peptides) achieved by the conventional identity-based approach, which only quantifies peptides identified in all LC-MS/MS runs. Manual validation on all 11,940 peptide ions in 6 replicate LC-MS/MS runs revealed that 97.8% of the peptide ions were correctly aligned and 93.3% were correctly validated by SCI. Thus, the mean of the protein ratio, 1.00±0.05, demonstrates IDEAL-Q's high accuracy without human intervention. Finally, IDEAL-Q was applied again to the biological replicate experiment with an additional SDS-PAGE fractionation step, to show its compatibility for label-free experiments with fractionation. For flexible workflow design, IDEAL-Q supports different fractionation strategies and various normalization schemes, including multiple spiked internal standards. User-friendly interfaces are provided to facilitate convenient inspection, validation, and modification of quantitation results. In summary, IDEAL-Q is an efficient, user-friendly, and robust quantitation tool.
|
|
|
A binarization method with learning-built rules for document images produced by cameras
|
|
Authors: C.-H. Chou, W.-H. Lin, and F. Chang
|
 |
| Chang, Fu |
|
Abstract: In this paper, we propose a novel binarization method for document images produced
by cameras. Such images often have varying degrees of brightness and require
more careful treatment than merely applying a statistical method to obtain a
threshold value. To resolve the problem, our method divides an image into several
regions and decides how to binarize each region. The decision rules are derived
from a learning process that takes training images as input. Tests on images produced
under normal and inadequate illumination conditions show that our method
yields better visual quality and better OCR performance than three global binarization
methods and four locally adaptive binarization methods.
|
|
|
A hub-attachment based method to detect functional modules from confidence-scored protein interactions and expression profiles
|
|
Authors: Chin, C. H., Chen, S. H., Ho, C. W., Ko, M. T.* Lin, C. Y.*
|
 |
 |
| Lin, Chung-Yen |
Ko, Ming-Tat |
|
Abstract: Background
Many research results show that the biological systems are composed of functional modules. Members in the same module usually have common functions. This is useful information to understand how biological systems work. Therefore, detecting functional modules is an important research topic in the post-genome era. One of functional module detecting methods is to find dense regions in Protein-Protein Interaction (PPI) networks. Most of current methods neglect confidence-scores of interactions, and pay little attention on using gene expression data to improve their results.
Results
In this paper, we propose a novel hub-attachment based method to detect functional modules from confidence-scored protein interactions and expression profiles, and we name it HUNTER. Our method not only can extract functional modules from a weighted PPI network, but also use gene expression data as optional input to increase the quality of outcomes. Using HUNTER on yeast data, we found it can discover more novel components related with RNA polymerase complex than those existed methods from yeast interactome. And these new components show the close relationship with polymerase after functional analysis on Gene Ontology.
Conclusion
A C++ implementation of our prediction method, dataset and supplementary material are available at http://hub.iis.sinica.edu.tw/Hunter/ webcite. Our proposed HUNTER method has been applied on yeast data, and the empirical results show that our method can accurately identify functional modules. Such useful application derived from our algorithm can reconstruct the biological machinery, identify undiscovered components and decipher common sub-modules inside these complexes like RNA polymerases I, II, III.
|
|
|
Phylogenetic reconstruction by Automatic Likelihood Model Selector (PALM) : A Framework for Phylogenetic Analysis with the Best Substitution Model
|
|
Authors: Chen, S.H., Su,S. Y., Lo, C. Z., Huang, T. J., Kuo, B. H., Lin, C. Y.*
|
 |
| Lin, Chung-Yen |
|
Abstract: Selecting a good model for inferring relatedness and the tree topology for a given
sequence set is crucial in phylogenetic analysis. However, it is a time-consuming,
computing-intensive and bewildered practice that relies on the knowledge of
substitution model theories and the skills to run through all possible combinations of
several separate programs. To make the whole analysis smoothly and avoid tedious
manipulations of various programs, we build an intuitive framework named
as Phylogentic Reconstruction by Automatic Likelihood Model selector (PALM)
with the most convinced, updated algorithms and models in a seamless way. PALM
integrates ClustalW, PhyML, Modeltest, Prottest, and our own developed works to
test the fitness of 24 substitution models (JC69, K80, F81, HKY, TrN, GTR with
combination of +G, +I) for DNA sequences and 96 substitution models (JTT, MtREV,
MtMam, MtArt, Dayhoff, WAG, RtREV, CpREV, Blosum62, VT, HIVb, HIVw with
combination of +G, +I, +F) for protein sequences according to the ranked score
maximum likelihood in various criterion (AIC, AICc, BIC and LnL). For speedy
computation, we adopt parallel calculation in estimating of maximum likelihood for
protein sequences. The output files generated during analysis including bootstrapping
results and the concluded tree topology are provided for download. The PALM tree
viewer is a versatile tool for operating tree topology and restarting analysis iteratively.
Options of branch length values, bootstrapping score, swapping of branching,
removing nodes, adding new sequences for new analysis can be performed just in a
mouse-click way.
PALM simplifies the computing task and result management of a phylogenetic
analysis process. It is freely accessible at http://palm.iis.sinica.edu.tw.
|
|
|
Impact of DNA binding position variants on yeast gene expression
|
|
Authors: Swamy, B. S. K., Cho, C.Y., Chiang, S., Tsai, Z. T.Y., and Tsai, H.K
|
 |
| Tsai, H.K |
|
Abstract: Transcription factors (TFs) regulate gene expression by binding to specific binding sites (TFBSs) in gene promoters. TFBS motifs may contain one or more variable positions. Although the prevailing assumption is that nucleotide variants at such positions are functionally equivalent, there is increasing evidence that such variants play a role in regulation of gene expression. In this article, we propose a method for studying the relationship between the expression of target genes and nucleotide variants in TFBS motifs at a genome-wide scale in Saccharomyces cerevisiae, especially the combinatorial effects of variants at two positions. Our analysis shows that nucleotide variations in more than one-third of variable positions and in 20% of dependent position pairs are highly correlated to gene expression. We define such positions as 'functional'. However, some positions are only functional as dependent pairs, but not individually. In addition, a significant proportion of the functional positions have been well conserved across all yeast-related species studied. We also find that some positions require the presence of co-occurring TFs, while others maintain their functionality in the absence of a co-occurring TF. Our analysis supports the importance of nucleotide variants at variable positions of TFBSs in gene regulation.
|
|
|
Algebra of programming in Agda: dependent types for relational program derivation
|
|
Authors: S-C. Mu, H-S. Ko, and P. Jansson
|
 |
| Shin-Cheng Mu |
|
Abstract: Relational program derivation is the technique of stepwise refining a relational specification to a program by algebraic rules. The program thus obtained is correct by construction. Meanwhile, dependent type theory is rich enough to express various correctness properties to be verified by the type checker. We have developed a library, AoPA (Algebra of Programming in Agda), to encode relational derivations in the dependently typed programming language Agda. A program is coupled with an algebraic derivation whose correctness is guaranteed by the type system. Two non-trivial examples are presented: an optimisation problem and a derivation of quicksort in which well-founded recursion is used to model terminating hylomorphisms in a language with inductive types.
|
|
|
Building GML-native Web-based Geographic Information Systems
|
|
Authors: Chia-Hsin Huang, Tyng-Ruey Chuang, Dong-Po Deng, and Hahn-Ming Lee
|
 |
| Tyng-Ruey Chuang |
|
Abstract: Disaster response systems are designed to facilitate decision-making based on large amounts of heterogeneous geographic information. Most geographic information systems (GISs) use relational databases to manipulate information efficiently. However, they suffer from interoperability issues because they need to expend significant effort mapping heterogeneous geographic information, which may have complicated structures, into relational data models, and vice versa. Geography Markup Language (GML) is regarded as a standard for expressing, storing, and exchanging geospatial data, and has been applied to help solve interoperability problems. Interestingly, no GIS has been built on native XML/GML technologies so far. There are two possible reasons for this: current XML processors are incapable of processing geospatial information, and they are inefficient in manipulating large XML documents. In this paper, we resolve these two difficulties and move forward to realizing GML-native web-based geographic information systems.
|
|
|
Computation and communication schedule optimization for data-sharing tasks on uniprocessor
|
|
Authors: Jan-Jan Wu, , En-Jan Chou, Pangfeng Liu
|
 |
| Jan-Jan Wu |
|
Abstract: Almost every computation task requires input data in order to find a solution. This is not a problem for a centralized system because data is usually available locally. However, in a parallel and distributed system, e.g., computation grids, the data may be in remote sites and must be transferred to the local site before the computation can proceed. As a result, the interleaved sequence of data transfer and job execution has a significant impact on the overall computational efficiency. In this paper, we analyze the computational complexity of the shared-data job scheduling problem on uniprocessor, with and without consideration of the storage capacity constraint on the local site.
We show that if there is an upper bound on the server capacity, the problem is NP-complete, even when each job depends on at most two data items. For the case where there is no upper bound on the server capacity, we show that there exists an efficient algorithm that can provide an optimal job schedule when each job depends on at most two data items. We also propose an efficient heuristic algorithm that can determine good schedules for cases where there is no limit on the amount of data a job may access. The reported experiment results demonstrate that this heuristic algorithm performs very well, and derives near optimal solutions.
|
|
|
Learning to binarize document images
|
|
Authors: C.-H. Chou, W.-H. Lin, and F. Chang
|
 |
| Fu Chang |
|
Abstract: Document images produced by cameras often have varying degrees of brightness. To resolve the problem, we propose a method that divides an image into sever-al regions and decides what binarization action to take on each region based on the rules that are derived from a learning process. Since each region can allow more than one action to take, we are dealing with a multi-label and multi-class classification problem that can be solved effectively by support vector machines. Tests on images produced under normal and inade-quate illumination conditions show that our method yields better OCR performance than three global bina-rization methods and four locally adaptive binariza-tion methods.
|
|
|
Social Tagging, Online Communication, and Peircean Semiotics: A Conceptual Framework
|
|
Authors: Wei-Ching Huang and Tyng-Ruey Chuang
|
 |
| Tyng-Ruey Chuang |
|
Abstract: One of the recent web developments has focused on the opportunities it presents for social tagging through user participation and collaboration. As a result, social tagging has changed the traditional online communication process. The interpretation of tagging between humans and machines may create new problems if essential questions about how social tagging corresponds to online communications, what objects the tags refer to, who the interpreters are, and why they are engaged are not explored systematically. Since such reasoning is an interpretation of social tagging among humans, tags and machines, it is a complex issue that calls for deep reflection. In this paper, we investigate the relevance of the potential problems raised by social tagging through the framework of C.S. Peirce's semiotics. We find that general phenomena of social tagging can be well classified by Peirce's 10 classes of signs for reasoning. This suggests that regarding social tagging as a sign and systematically analyzing the interpretation are positively associated with the 10 classes of signs. Peircean semiotics can be used to examine the dynamics and determinants of tagging; hence, the various uses of this categorization schema may have implications for the design and development of information systems and web applications.
|
|
|
Tree decomposition for large-scale SVMs: experimental and theoretical results
|
|
Authors: F. Chang, C.-Y. Guo, X.-R. Lin, C.-J. Lu
|
 |
| Fu Chang |
|
Abstract: To handle problems created by large data sets, we propose a method that uses a decision tree to decompose a data space and trains SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training applied to the remaining data points. Second, it is efficient for seeking the parameter values that maximize the validation accuracy, which helps maintain good test accuracy. Third, we can provide a generalization error bound for the classifier derived by the tree decomposition method. For experiment data sets whose size can be handled by current non-linear, or kernel-based SVM training techniques, the proposed method can speed up the training by a factor of thousands, and still achieve comparable test accuracy.
|
|
|
Elliptive Curve Method on Video Cards
|
|
Authors: D. J. Bernstein, T.-R. Chen, C.-M. Cheng, T. Lange, and B.-Y. Yang
|
 |
| B.-Y. Yang |
|
Abstract: This paper reports record-setting performance for the ellipticcurve
method of integer factorization: for example, 604.99 curves/second
for ECM stage 1 with B1 = 8192 for 280-bit integers on a single PC.
The state-of-the-art GMP-ECM software handles 171.42 curves/second
for ECM stage 1 with B1 = 8192 for 280-bit integers using all four cores
of a 2.4GHz Core 2 Quad Q6600.
The extra speed takes advantage of extra hardware, specifically two
NVIDIA GTX 280 graphics cards, using a new ECM implementation
introduced in this paper. Our implementation uses Edwards curves, relies
on new parallel addition formulas, and is carefully tuned for the
highly parallel GPU architecture. On a single GTX 280 the implementation
performs 22.66 million modular multiplications per second for a
general 280-bit modulus. GMP-ECM, using all four cores of a Q6600,
performs 17.91 million multiplications per second.
This paper also reports speeds on other graphics processors: for example,
2414 280-bit elliptic-curve scalar multiplications per second on an
older NVIDIA 8800 GTS (G80), again for a general 280-bit modulus. For
comparison, the CHES 2008 paper ¡§Exploiting the Power of GPUs for
Asymmetric Cryptography¡¨ reported 1412 elliptic-curve scalar multiplications
per second on the same graphics processor despite having fewer
bits in the scalar (224 instead of 280), fewer bits in the modulus (224
instead of 280), and a special modulus (2224 - 296 + 1).
|
|
|
OneClick: A Framework for Measuring Network Quality of Experience
|
|
Authors: Kuan-Ta Chen, Cheng Chun Tu, and Wei-Cheng Xiao
|
 |
| Kuan-Ta Chen |
|
Abstract: As the service requirements of network applications shift from high throughput to high media quality, interactivity,
and responsiveness, the definition of QoE (Quality of Experience)
has become multidimensional. Although it may not be difficult to
measure individual dimensions of the QoE, how to capture users¡¦
overall perceptions when they are using network applications
remains an open question.
|
|
|
Model-based Clustering by Probabilistic Self-organizing Maps
|
|
Authors: Shih-Sian Cheng, Hsin-Chia Fu, and Hsin-Min Wang
|
 |
| Hsin-Min Wang |
|
Abstract: In this paper, we consider the learning process of a probabilistic self-organizing map (PbSOM) as a model-based data clustering procedure that preserves the topological relationships between data clusters in a neural network. Based on this concept, we develop a coupling-likelihood mixture model for the PbSOM that extends the reference vectors in Kohonen's self-organizing map (SOM) to multivariate Gaussian distributions. We also derive three expectation-maximization (EM)-type algorithms, called the SOCEM, SOEM, and SODAEM algorithms, for learning the model (PbSOM) based on the maximum-likelihood criterion. SOCEM is derived by using the classification EM (CEM) algorithm to maximize the classification likelihood; SOEM is derived by using the EM algorithm to maximize the mixture likelihood; and SODAEM is a deterministic annealing (DA) variant of SOCEM and SOEM. Moreover, by shrinking the neighborhood size, SOCEM and SOEM can be interpreted, respectively, as DA variants of the CEM and EM algorithms for Gaussian model-based clustering. The experimental results show that the proposed PbSOM learning algorithms achieve comparable data clustering performance to that of the deterministic annealing EM (DAEM) approach, while maintaining the topology-preserving property.
|
|
|
Predicting helix-helix interactions from residue contacts in membrane proteins
|
|
Authors: Allan Lo, Yi-Yuan Chiu, Einar Andreas Rodland, Ping-Chiang Lyu, Ting-Yi Sung, and Wen-Lian Hsu
|
 |
| Wen-Lian Hsu |
|
Abstract:
Motivation: Helix-helix interactions play a critical role in the structure assembly, stability and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction.
Results: We present a new hierarchical framework for contact prediction, with an application in membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence and their pairing relationships are further predicted in the second level. Statistical analyses on contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves the conventional direct approach in prediction accuracy, with 80% reduction of input for prediction. Furthermore, the predicted contacts are then used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity and specificity are 56%, 40% and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix-packing in membrane proteins.
Availability: http://bio-cluster.iis.sinica.edu.tw/TMhit/
Contact: tsung@iis.sinica.edu.tw
|
|
|
A Neural Network Model for Constructing Endophenotypes of Common Complex Diseases - An Application to Male Young-onset Hypertension Microarray Data
|
|
Authors: Ke-Shiuan Lynn, Li-Lan Li, Yen-Ju Lin, Chiuen-Huei Wang, Shu-Hui Sheng, Ju-Hwa Lin, Wayne Liao, Wen-Lian Hsu and Wen-Harn Pan
|
 |
| Wen-Lian Hsu |
|
Abstract:
Motivation: Identification of disease-related genes using high-throughput microarray data is more difficult for complex diseases as compared with monogenic ones. We hypothesized that an endophenotype derived from transcriptional data is associated with a set of genes corresponding to a pathway cluster. We assumed that a complex disease is associated with multiple endophenotypes and can be induced by their up/downregulated gene expression patterns. Thus, a neural network model was adopted to simulate the gene–endophenotype–disease relationship in which endophenotypes were represented by hidden nodes.
Results: We successfully constructed a three-endophenotype model for Taiwanese hypertensive males with high identification accuracy. Of the three endophenotypes, one is strongly protective, another is weakly protective and the third is highly correlated with developing young-onset male hypertension. Sixteen of the involved 101 genes were highly and consistently influential to the endophenotypes. Identification of SLC4A5, SLC5A10 and LDOC1 indicated that sodium/bicarbonate transport, sodium/glucose transport and cell-proliferation regulation may play important upstream roles and identification of BNIP1, APOBEC3F and LDOC1 suggested that apoptosis, innate immune response and cell-proliferation regulation may play important downstream roles in hypertension. The involved genes not only provide insights into the mechanism of hypertension but should also be considered in future gene mapping endeavors.
Availability: Microarray data and test program are available at http://ms.iis.sinica.edu.tw/microarray/index.htm
Contact: pan@ibms.sinica.edu.tw or hsu@iis.sinica.edu.tw
|
|
|
Effect of Network Quality on Player Departure Behavior in Online Games
|
|
Authors: Kuan-Ta Chen, Polly Huang, and Chin-Laung Lei
|
 |
| Kuan-Ta Chen |
|
Abstract:
Understanding the impact of network conditions on player satisfaction, which is one of the major concerns of network game designers, is a popular research topic. Of the various ways to gauge user satisfaction, in this paper, we focus on how network quality affects a player's decision to leave a game prematurely. To answer this question, we analyze a 1,356-million-packet trace from a large commercial Massively Multiplayer Online Role-Playing Game (MMORPG) called ShenZhou Online. We show that both network delay and network loss significantly affect a player's decision to leave a game prematurely. It is feasible to predict whether players will quit prematurely based on the network conditions they experience. The proposed model can determine the relative impact of different types of network impairment. For our traces, the degrees of player intolerance to network delay, delay jitter, client packet loss, and server packet loss are in the proportion of 1:2:4:3 approximately. The model can also be used to make system design decisions. Through simulations, we show that by prioritizing server processing according to the goodness of network conditions, employing dejitter buffers, or replacing TCP with a more lightweight transport protocol, the probability of premature departure can be significantly reduced. In this way, we demonstrate how our model of players' network experience provides feedback for the design of online games.
|
|
|
Optimizing server placement for parallel I/O in switch-based clusters
|
|
Authors: Jan-Jan Wu, Yi-Fang Lin, Da-Wei Wang, Chien-Min Wang
|
 |
| Jan-Jan Wu |
|
Abstract:
In this paper, we consider how to optimize I/O server placement in order to improve parallel I/O performance in switch-based clusters. The significant advances in cluster networks in recent years have made it practical to connect tens of thousands of hosts via networks that have enormous and scalable total capacity, and in which communications between a host and any other host incur the same cost. The same cost property frees users from consideration of network contention and allows them to concentrate on load-balancing issues. We formulate the server placement problem on a cluster that has the same cost property as a weighted bipartite matching with the goal of balancing the workload on the I/O nodes. To find an optimal solution to this problem, we propose an O(n^3^2m(logn+logm)) algorithm, called Load Balance Matching (LBM), where n is the number of compute nodes and m is the number of I/O servers. We also investigate server placement for general clusters in which multiple same-cost subclusters are interconnected to form a large cluster. This class of clusters typically adopt irregular topologies that allow the construction of scalable systems with an incremental expansion capability. Also, due to the limited bandwidth on network links between subclusters, network link contention is a major concern when distributing servers over the entire network. We show that finding an optimal placement strategy for general clusters with the goal of minimizing link contention is computationally intractable. To resolve this problem, we propose a hierarchical strategy that places servers in two steps. First, to minimize link contention, we decide which subcluster each server should be assigned to. We propose a tree-based heuristic algorithm, called Load Balance Traversing (LBT), to solve this problem. In the second step, the LBM algorithm decides the location of each server within a subcluster. Our simulation results demonstrate that LBT achieves a significant improvement in parallel I/O performance over four other algorithms, and is near-optimal in some cases.
|
|
|
A bi-prototype theory of facial attractiveness
|
|
Authors: Fu Chang and Chien-Hsing Chou
|
 |
| Fu Chang |
|
Abstract:
The attractiveness of human faces can be predicted with a high degree of accu-racy if we represent them as feature vectors and compute their relative distances from two prototypes, namely, the average of attractive faces and the average of unattractive faces. Moreover, if we select the key features for facial representa-tion, the degree of attractiveness, defined in terms of the relative distances, exhi-bits a high degree of correlation with the average rating scores given by human assessors. These findings motivate a bi-prototype theory that relates facial at-tractiveness to the averages of attractive and unattractive faces, rather than the average of all faces, as previously hypothesized by some researchers.
|
|
|
Global and componentwise extrapolations for accelerating training of Bayesian networks and conditional random fields
|
|
Authors: Han-Shen Huang, Bo-Hou Yang, Yu-Ming Chang, Chun-Nan Hsu
|
 |
| Chun-Nan Hsu |
|
Abstract:
Abstract The triple jump extrapolation method is an effective approximation of Aitken?™s acceleration that can accelerate the convergence of many algorithms for data mining, including EM and generalized iterative scaling (GIS). It has two options?”global and componentwise extrapolation. Empirical studies showed that neither can dominate the other and it is not known which one is better under what condition. In this paper, we investigate this problem and conclude that, when the Jacobian is (block) diagonal, componentwise extrapolation will be more effective. We derive two hints to determine the block diagonality. The first hint is that when we have a highly sparse data set, the Jacobian of the EM mapping for training a Bayesian network will be block diagonal. The second is that the block diagonality of the Jacobian of the GIS mapping for training CRF is negatively correlated with the strength of feature dependencies. We empirically verify these hints with controlled and real-world data sets and show that our hints can accurately predict which method will be superior. We also show that both global and componentwise extrapolation can provide substantial acceleration. In particular, when applied to train large-scale CRF models, the GIS variant accelerated by componentwise extrapolation not only outperforms its global extrapolation counterpart, as our hint predicts, but can also compete with limited-memory BFGS (L-BFGS), the de facto standard for CRF training, in terms of both computational efficiency and F-scores. Though none of the above methods are as fast as stochastic gradient descent (SGD), careful tuning is required for SGD and the results given in this paper provide a useful foundation for automatic tuning.
|
|
|
Data Broadcast with Adaptive Network Coding in Heterogeneous Wireless Networks
|
|
Authors: De-Nian Yang and Ming-Syan Chen
|
|
Abstract:
In this paper, we propose a new data broadcast mechanism with network coding in heterogeneous wireless networks. Our mechanism adaptively clusters the mobile hosts in fewer cells to minimize the bandwidth consumption. In addition, we adaptively code the data according to the data temporarily stored in each mobile host with a distributed manner. Our mechanism allows each delivered message to be coded from only a subset of data to further reduce the number of required messages. We formulate the cell selection and broadcast coding problem with integer programming and prove that the problem is NP-hard. We design a distributed algorithm based on Lagrangean relaxation. Our algorithm needs no server to record the location, queried, and stored information of receivers. Moreover, our algorithm is adaptive to the dynamic group membership, mobility, queried, and stored data of receivers.
|
|
|
AdHoc Probe: End to end Capacity Probing in Ad Hoc Networks
|
|
Authors: Ling-Jyh Chen, Tony Sun, Guang Yang, M. Y. Sanadidi, and Mario Gerla
|
 |
| Ling-Jyh Chen |
|
Abstract:
Knowledge of end-to-end path capacity is useful for video/audio stream adaptation, network management and overlay design. Capacity estimation in wired and last-hop wireless networks has been extensively investigated, but a thorough and systematic study in ad hoc, multihop wireless networks is still lacking. Yet the rate of a wireless link can change dynamically (and rapidly) due to changes in interference, distance or energy optimization policy. Timely knowledge of path capacity is key to efficient routing, traffic management and application deployment. In this paper, we present AdHoc Probe, a packet-pair based technique, to estimate end-to-end path capacity in ad hoc wireless networks. We apply AdHoc Probe to path capacity estimation in auto rate wireless networks with variable displacement and interference; and, in remote wireless networks across the Internet. Using analysis, simulation and testbed experiments, we show AdHoc Probe can withstand mobility and is able to trace the rate adaptation of wireless networks timely and correctly. AdHoc Probe is simpler, faster and much less intrusive than current schemes.
|
|
|
A Probabilistic Generative Framework for Extractive Broadcast News Speech Summarization
|
|
Authors: Yi-Ting Chen, Berlin Chen, Hsin-Min Wang
|
 |
| Hsin-Min Wang |
|
Abstract:
In this paper, we consider extractive summarization of broadcast news speech and propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, namely literal term matching and concept matching, are thoroughly investigated. We explore the use of the language model (LM) and the relevance model (RM) for literal term matching, while the sentence topical mixture model (STMM) and the word topical mixture model (WTMM) are used for concept matching. In addition, the lexical and prosodic features, as well as the relevance information of spoken sentences, are properly incorporated for the estimation of the sentence prior probability. An elegant feature of our proposed framework is that both the sentence generative probability and the sentence prior probability can be estimated in an unsupervised manner, without the need for handcrafted document-summary pairs. The experiments were performed on Chinese broadcast news collected in Taiwan, and very encouraging results were obtained.
|
|
|
Historical Data : 2009 , 2008 , 2007
|
|
 |
|
|