Abstract: Consider a cloud computing scenario where the server is resource-abundant and is capable of finishing the designated tasks.
It is envisioned that secure media applications with privacy preservation will be taken seriously.
In view of the fact that scale-invariant feature transform (SIFT) has been widely adopted in various fields, this paper is the first to target the importance of privacy-preserving SIFT (PPSIFT) and to address the problem of secure SIFT feature extraction and representation in the encrypted domain.
As all of the operations in SIFT must be moved to the encrypted domain, we propose a privacy-preserving realization of the SIFT method based on homomorphic encryption.
We show, through a security analysis based on the discrete logarithm problem and RSA, that PPSIFT is secure against ciphertext-only attacks and known-plaintext attacks.
Experimental results obtained from different case studies demonstrate that the proposed homomorphic encryption-based privacy-preserving SIFT performs comparably to the original SIFT and that our method is useful in SIFT-based privacy-preserving applications.
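To illustrate what computing in the encrypted domain can look like, the following is a minimal sketch of an additively homomorphic scheme. The abstract does not name the cryptosystem, so the use of Paillier here is an assumption, as are all function names and the toy primes, which are far too small for real security. The sketch shows a server computing a weighted sum of encrypted pixel values, the kind of linear operation a filtering step needs, without seeing the plaintext.

```python
import math
import random

def keygen(p=1789, q=1861):
    # Toy primes for illustration only (assumption); real keys need large primes.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return (n, n + 1), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    n, _ = pub
    return (c1 * c2) % (n * n)

def scale_encrypted(pub, c, k):
    # Raising a ciphertext to the power k multiplies the plaintext by k.
    n, _ = pub
    return pow(c, k, n * n)

# Client encrypts pixels; server applies integer filter weights blindly.
pub, priv = keygen()
pixels = [12, 40, 7]
weights = [1, 2, 1]
encrypted = [encrypt(pub, v) for v in pixels]
acc = encrypt(pub, 0)
for c, w in zip(encrypted, weights):
    acc = add_encrypted(pub, acc, scale_encrypted(pub, c, w))
weighted_sum = decrypt(pub, priv, acc)
```

Only the key holder can recover `weighted_sum`; real-valued filter weights would have to be scaled to integers first, which is one reason an encrypted-domain result may only approximate the plaintext one.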
Abstract: evaluated by a series of experiments, for which we have very
encouraging results.
Abstract: However, NoSQL databases are not suitable for social computing, for the following three reasons. First, data records stored in a NoSQL database are assumed to be independent of each other, whereas most data in social computing are closely related. For example, a tagged picture can relate to multiple users, and possibly to their other data as well. Second, structure queries are very common in social computing, but they are very difficult to express concisely in a NoSQL database. Social network application developers often have to use complex SQL queries to explore data relationships. These queries usually involve joining multiple tables, which may degrade performance severely. For example, most open source implementations of column-family databases only support indexing on the primary key, so joining multiple tables on a secondary key causes extensive scans of the data and is therefore inefficient. Finally, it is very common for a social application to traverse a graph of persons based on their relationships. However, the current query interface of NoSQL databases is not adequate to support graph traversal, because NoSQL supports only very primitive operations such as get and set, and it is not intuitive to express graph traversal with these primitive operations.
Graph databases provide better support for social computing. Data relations are first-class citizens in a graph database, because a graph database is optimized for structured queries. To support a large-scale social computing application, we need a large-scale graph database, which in turn requires a distributed graph data store and a distributed data processing system. To the best of our knowledge, no open-source, cloud-ready distributed graph database has been available for social computing. This lack of support for large-scale social computing motivates us to develop our own distributed graph database. Our system consists of a distributed graph data store and a parallel graph processing system. The graph data store provides indexing on nodes and edges for processing efficiency, and a user-friendly data manipulation interface that facilitates graph data processing. The parallel graph processing system provides the capability to process extremely large social networks.
We conduct experiments to demonstrate the efficiency of our system by running representative applications on real-world large-scale social networks, including YouTube, Flickr, LiveJournal, and Orkut. Experimental results indicate that our system outperforms the Hadoop file system in subgraph computation on social networks.
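The traversal problem described above can be sketched as follows. This is a hypothetical illustration, not the interface of the system in the abstract: a plain dictionary stands in for a key-value store that offers only get and set, and the function name and sample data are assumptions. With such a store, the application itself must write the breadth-first loop that a graph database could expose as a single query.

```python
from collections import deque

def friends_within(adjacency, start, max_hops):
    """Return {person: hop distance} for everyone within max_hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_hops:
            continue  # do not expand beyond the hop limit
        # Each .get() models one primitive lookup against the key-value store.
        for neighbor in adjacency.get(person, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[person] + 1
                queue.append(neighbor)
    del dist[start]
    return dist

# Hypothetical friendship data keyed by user name.
store = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
two_hop = friends_within(store, "ann", 2)
```

Every visited person costs one lookup, so in a distributed key-value store a multi-hop query turns into many round trips unless the store indexes edges and evaluates the traversal close to the data.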

Abstract:
Secretome analysis is important in pathogen studies. A fundamental and convenient way to identify secreted proteins is to first predict signal peptides, which are essential for protein secretion. However, signal peptides are highly complex functional sequences that are easily confused with transmembrane domains. Such confusion would obviously affect the discovery of secreted proteins. Transmembrane proteins are important drug targets, but very few transmembrane protein structures have been determined experimentally; hence, prediction of the structures is essential. In the field of structure prediction, researchers do not make assumptions about organisms, so there is a need for a general signal peptide predictor. To improve signal peptide prediction without prior knowledge of the associated organisms, we present a machine-learning method, called SVMSignal, which uses biochemical properties as features, as well as features acquired from a novel encoding, to capture biochemical profile patterns for learning the structures of signal peptides directly.
We tested SVMSignal and five popular methods on two benchmark datasets from the SPdb and UniProt/Swiss-Prot databases, respectively. Although SVMSignal was trained on an old dataset, it performed well, and the results demonstrate that learning the structures of signal peptides directly is a promising approach. We also utilized SVMSignal to analyze proteomes in the entire HAMAP microbial database. Finally, we conducted a comparative study of secretome analysis on seven tuberculosis-related strains selected from the HAMAP database. We identified ten potential secreted proteins, two of which are drug resistant and four are potential transmembrane proteins. SVMSignal is publicly available at http://bio-cluster.iis.sinica.edu.tw/SVMSignal. It provides user-friendly interfaces and visualizations, and the prediction results are available for download.
Abstract: The algorithm relies on Lagrange multipliers to optimally distribute the number of states for each node of the multinomial lattice. We also present experimental results that demonstrate the effectiveness and efficiency of our algorithm in comparison with Monte Carlo simulations.
Abstract: Motivation: Gene regulation involves complicated mechanisms such as cooperativity between a set of transcription factors (TFs). Previous studies have used target genes shared by two TFs as a clue to infer TF–TF interactions. However, this task remains challenging because the target genes with low binding affinity are frequently omitted by experimental data, especially when a single strict threshold is employed. This article aims at improving the accuracy of inferring TF–TF interactions by incorporating motif discovery as a fundamental step when detecting overlapping targets of TFs based on ChIP-chip data.
Results: The proposed method, simTFBS, outperforms three naïve methods that adopt fixed thresholds when inferring TF–TF interactions based on ChIP-chip data. In addition, simTFBS is compared with two advanced methods and demonstrates its advantages in predicting TF–TF interactions. By comparing simTFBS with predictions based on the set of available annotated yeast TF binding motifs, we demonstrate that the good performance of simTFBS indeed comes from the additional motifs found by the proposed procedures.
Abstract: In this paper, we present CloudBrush, a parallel algorithm that runs on the MapReduce framework of cloud computing for de novo assembly of high-throughput sequencing data. The algorithm uses Myers’s bi-directed string graphs as its basis and consists of two main stages: graph construction and graph simplification. First, a vertex is defined for each non-redundant sequence read. We present a prefix-and-extend algorithm to identify overlaps between pairs of reads and to reduce transitive edges. The graph is further simplified using conventional operations, including linear path compression, dead-end tip removal, and bubble removal. We also present a new operation, similar neighbour detection and edge adjustment (SNEE), to detect and simplify braid structures in the string graph. In addition, we prune edges from one side of a node if at least one of these edges is not similar to the others. After this pruning, all paths in a remaining connected subgraph correspond to similar subsequences of the underlying genome. We then traverse each connected subgraph to find a long path supported by a sufficient number of reads to represent the subgraph.
Preliminary results show that the CloudBrush assembler, compared with Contrail and Edena on the sequencing data of E. coli genomes, may yield longer contigs.
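The graph construction stage can be illustrated with a small single-machine sketch. It is not the prefix-and-extend algorithm or its MapReduce formulation: it uses a naive all-pairs suffix-prefix comparison, ignores reverse complements (so the graph is directed rather than bi-directed), and all function names and the sample reads are assumptions. It shows only what an overlap edge is and what removing a transitive edge means.

```python
def overlap_length(a, b, min_overlap):
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    longest = min(len(a), len(b))
    for length in range(longest, max(min_overlap, 1) - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0

def build_overlap_graph(reads, min_overlap):
    """One vertex per read; edge a -> b is weighted by the overlap length."""
    graph = {read: {} for read in reads}
    for a in reads:
        for b in reads:
            if a != b:
                length = overlap_length(a, b, min_overlap)
                if length:
                    graph[a][b] = length
    return graph

def remove_transitive_edges(graph):
    """Drop a -> c whenever edges a -> b and b -> c both exist (naive rule)."""
    reduced = {a: dict(targets) for a, targets in graph.items()}
    for a, targets in graph.items():
        for b in targets:
            for c in graph[b]:
                if c in targets:
                    reduced[a].pop(c, None)
    return reduced

# Hypothetical reads of length 6 sampled every 2 bases from ACGTTGCAAGGC.
reads = ["ACGTTG", "GTTGCA", "TGCAAG", "CAAGGC"]
graph = build_overlap_graph(reads, min_overlap=2)
reduced = remove_transitive_edges(graph)
```

In this example the full graph contains the shortcut edges `ACGTTG -> TGCAAG` and `GTTGCA -> CAAGGC`; after reduction, only the chain of consecutive reads remains, which is the linear path that path compression would then merge into a contig.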
Abstract: Most sequence alignment tools can successfully align protein sequences with high levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly for distantly related sequences (<20% identity). In this range of identity, alignments optimized to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using only sequence information. Currently available methods differ essentially in how they measure the similarity between aligned residues: substitution matrices, Fourier transforms, sophisticated profile-profile functions, or, more recently, consistency-based approaches.
In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words that are found across a sizeable fraction of the considered dataset and are supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins.
We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.
Abstract: A set S ⊆ V is a power dominating set (PDS) of a graph G = (V, E) if every vertex and every edge in G can be observed based on the observation rules of power system monitoring. The power domination problem involves minimizing the cardinality of a PDS of a graph. We consider this combinatorial optimization problem and present a linear time algorithm for finding the minimum PDS of an interval graph if the interval ordering of the graph is provided. In addition, we show that the algorithm, which runs in Θ(n log n) time, where n is the number of intervals, is asymptotically optimal if the interval ordering is not given. We also show that the results hold for the class of circular-arc graphs.
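The observation rules can be made concrete with a small checker. The sketch below assumes the commonly used vertex-only formulation of the rules (a chosen vertex observes itself and its neighbours, and an observed vertex with exactly one unobserved neighbour causes that neighbour to become observed). It only verifies a candidate set on an arbitrary graph and is not the linear time interval-graph algorithm of the abstract. The function name and example graphs are assumptions.

```python
def is_power_dominating_set(adjacency, chosen):
    """Check whether `chosen` observes every vertex of the graph."""
    # Domination step: each chosen vertex observes itself and its neighbours.
    observed = set(chosen)
    for v in chosen:
        observed.update(adjacency[v])
    # Propagation step: repeat until no further vertex can be observed.
    changed = True
    while changed:
        changed = False
        for v in list(observed):
            unobserved = [u for u in adjacency[v] if u not in observed]
            if len(unobserved) == 1:
                observed.add(unobserved[0])
                changed = True
    return observed == set(adjacency)

# A path 1-2-3-4-5: a single end vertex suffices, by repeated propagation.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
# A star with centre 0: a leaf alone gets stuck, the centre alone succeeds.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

path_ok = is_power_dominating_set(path, {1})
leaf_ok = is_power_dominating_set(star, {1})
centre_ok = is_power_dominating_set(star, {0})
```

The star example shows why propagation is weaker than ordinary domination: from leaf 1 the centre is observed, but it has two unobserved neighbours, so nothing further can be inferred.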
Abstract: Motivation: Metagenomics involves sampling and studying the genetic materials in microbial communities. Several statistical methods have been proposed for comparative analysis of microbial community compositions. Most of the methods are based on the estimated abundances of taxonomic units or functional groups from metagenomic samples. However, such estimated abundances might deviate from the true abundances in habitats due to sampling biases and other systematic artifacts in metagenomic data processing.
Results: We developed the MetaRank scheme to convert abundances into ranks. MetaRank employs a series of statistical hypothesis tests to compare abundances within a microbial community and determine their ranks. We applied MetaRank to synthetic samples and real metagenomes. The results confirm that MetaRank can reduce the effects of sampling biases and clarify the characteristics of metagenomes in comparative studies of microbial communities. Therefore, MetaRank provides a useful rank-based approach to analyzing microbiomes.