Brochure 2020

collected time-course transcriptome data isolated from hypertrophic murine hearts with or without transverse aorta banding surgery at different times. By analyzing the entire transcriptome dataset and reconstruction of transcription factor co-expression networks, we discovered that global genetic changes began soon after cardiac pressure overload, i.e., before morphological changes from cardiac hypertrophy.

II. Glycan synthesis, Proteomics and Proteogenomics

AI Approach for Glycan Synthesis. Carbohydrates play important roles in organisms, and glycan synthesis is essential to understand drug action and vaccine developments. The world-renowned programmable one-pot oligosaccharide synthesis approach developed by Dr. Chi-Huey Wong was the first automatic means of synthesizing a large number of oligosaccharides rapidly. Optimer, the early software for that one-pot method, provides synthetic blueprints by searching against a Building Block Library (BBL), also designed and synthesized by Wong. By ordering BBLs with appropriate relative reactivity values (RRVs), glycans can be synthesized efficiently and effectively in "one pot". However, there are two limitations to this method: (1) there are only ~150 BBLs with measured RRVs in the library, and (2) the current one-pot method can only synthesize small oligosaccharides due to RRV ordering requirements. To overcome the first limitation, we proposed the concept of "virtual BBLs". More than 50,000 virtually-generated BBLs, whose RRVs are predicted by machine learning (with >0.97 PCC accuracy), provide more synthetic solutions for chemists. To overcome the second limitation, we proposed the concept of "hierarchical one-pot synthesis" by recursively composing fragments using a one-pot approach and treating fragments as new BBLs in the library. Our new program, Auto-CHO, provides synthetic solutions for complex glycans by multiple hierarchical one-pot operations. This research has made great strides in glycan synthesis by computer simulation.

Figure 1: The one-pot synthesis illustration of Globo-H by Auto-CHO.

Bioinformatics for Proteomics. Studying proteomics is important because proteins perform various functions in cells and are targets for drugs to tackle disease. Mass Spectrometry (MS) has become the predominant technology for large-scale proteomic research. The main tasks of MS-based proteomics analysis are protein identification and quantitation. We have developed computational methods and implemented software tools for these two tasks. First, analyses of MS data involve a series of steps, even when using the popular Trans-Proteomic Pipeline (TPP). To reduce the need for manual interventions to launch each step, we have developed a software tool, called WinProphet, which seamlessly integrates with TPP and other external command-line programs to allow users to configure, manage, and automatically execute pipelines through graphical interfaces. The constructed pipelines can be exported as XML files together with all of their parameter settings for reusability and portability. Second, isobaric-labeling techniques, such as the TMT 10plex reagent set, are being widely adopted in MS experiments to perform relative quantitation of proteins in multiple samples, e.g., tumor and adjacent normal tissues. We have developed a new tool called Multi-Q 2 (totally different from the previous version Multi-Q) for isobaric-labeling quantitation analysis that exhibits improved quantitation accuracy and coverage. In addition, we have also conducted many analyses on different normalization schemes and other settings to ensure better quantitation accuracy.

The Human Proteome Project (HPP) organized by the Human Proteome Organization aims to characterize the entire human proteome. Its main goal in recent years has been focused on detecting missing proteins, i.e., those that have not been experimentally detected at the protein level. In recent years, we have been conducting research on HPP. First, we explored challenges from the informatics perspective why missing proteins are hard to detect from MS experiments. Second, since rigorous HPP data interpretation guidelines must be adopted to claim identification of a missing protein, we investigated the effect of isobaric substitution on missing protein identification. Third, given that unique in silico-digested peptides are crucial to identify a missing protein, we developed a web server called iHPDM, which contains a comprehensive proteolytic peptide database constructed from human proteins digested by 15 protease combinations to facilitate selection of proteases used in MS experiments. In addition, we have also helped collaborators within Academia Sinica to analyze their MS data and identify missing proteins according to HPP data interpretation guidelines.

Bioinformatics for Proteogenomics. Protein-coding region variations, including single amino acid variations (SAVs), indels, and alternative splicing junctions, may cause or have been linked to particular cancers. For example, SAV L858R in Epidermal Growth Factor Receptor has been observed at the genomic level in lung cancer patients in Taiwan. Proteogenomics research that can validate such variations at the protein level by means of MS experiments is receiving increased attention. However, such research must surmount two main challenges. First, sufficient SAV-harboring protein sequences must be generated to construct a customized database for identifying SAV variant peptides from MS data. Second, even when variant peptides have been identified from MS data, it is necessary to validate them, for instance, to check whether they can be obtained from known peptides by isobaric substitution. To tackle the first issue, we formulated the problem of generating a minimized number of SAV-harboring protein sequences to contain all possible combinations of SAVs as the classical set covering problem, and proposed an efficient algorithm, called MinProtMaxVP, to solve it. In the future, we plan to develop a software tool for generating a customized target-decoy database for variant peptide identification based on MinProtMaxVP and appropriate decoy generation methods. To tackle the second issue, we have investigated the effect of isobaric substitution on variant peptide identification, in addition to missing protein identification, and have also proposed a so-called LeTE-fusion pipeline to evaluate the possibility of detecting variant peptides. In the future, we will further propose several methods for data validation to ensure rigorous identification of variant peptides. Our results can hopefully be applied to research on proteogenomic characterization of different cancers.
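
To make the set covering formulation mentioned above more concrete, the following is a minimal Python sketch of the textbook greedy heuristic for set cover. It is not MinProtMaxVP, whose details are not given here; it only illustrates the problem shape. The function name greedy_set_cover, the labels vp1 to vp5 (standing for variant peptides that must be represented), and the candidate sequences seq_A to seq_D are all hypothetical toy data chosen for illustration.

```python
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[str],
                     candidates: Dict[str, Set[str]]) -> List[str]:
    """Select candidate names until every element of `universe` is covered.

    Textbook greedy heuristic (an approximation, not an exact minimum):
    repeatedly take the candidate covering the most still-uncovered
    elements. Ties go to the alphabetically smallest candidate name.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # max() returns the first maximal item, so iterating over sorted
        # names makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError(f"cannot cover: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy instance: each candidate protein sequence "contains"
# some of the variant peptides vp1..vp5 that the database must represent.
variant_peptides = {"vp1", "vp2", "vp3", "vp4", "vp5"}
candidate_sequences = {
    "seq_A": {"vp1", "vp2", "vp3"},
    "seq_B": {"vp3", "vp4"},
    "seq_C": {"vp4", "vp5"},
    "seq_D": {"vp1", "vp5"},
}

selected = greedy_set_cover(variant_peptides, candidate_sequences)
print(selected)  # ['seq_A', 'seq_C']
```

In this toy instance two sequences suffice to represent all five variant peptides. The greedy rule gives no guarantee of a minimum in general, which is one reason a dedicated algorithm may be preferable for real SAV data.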
