Research Laboratories

Data Mining and Machine Learning Laboratory

Research Faculty

De-Nian Yang / Chair, Research Fellow
Yuan-Hao Chang, Research Fellow
Meng-Chang Chen, Research Fellow
Mi-Yen Yeh, Research Fellow
Mark Liao, Distinguished Research Fellow

In this era of data availability, various types of data (e.g., sensory, trajectory, transactions,
multimedia, social networks, Web browsing logs, etc.) are being generated at a monumental
rate. Due to the abundant and inexpensive nature of hardware and networks, the time has
never been better to explore all possible means of utilizing such data to enhance existing
applications or to investigate new technologies for solving difficult problems. The Data Mining
and Machine Learning Laboratory was formed with the main objectives of initiating innovative
research and strengthening scientific and technological excellence in: (1) effective collection,
representation, storage, processing, and analysis of massive data; and (2) exploring data mining
technologies to efficiently and effectively discover valuable knowledge within various types
of data. Currently, the research of this group focuses on the following areas: I. Modeling and
Prediction for Real-Time Bidding on Online Display Advertising; II. Social Network Analysis and
Query Processing; III. Efficient Data Management for Large-scale Computer Systems with
Write-constrained Memory and Storage Devices; and IV. Composite Neural Network: Theory and Its
Application to PM2.5 Prediction.

I. Modeling and Prediction for Real-Time Bidding on Online Display Advertising

Online display advertising is now programmatic. The ad impression displayed for each
website visitor may be different and is dynamically determined by a mechanism called Real-
Time Bidding (RTB), which brokers interactions between publishers and advertisers. In the
RTB environment, advertisers rely heavily on having a good prediction model for ad click-
through rates to effectively and efficiently target potential customers and offer reasonable
bids. We aim to design an appropriate learning method from historical bidding data with
incomplete labels so that advertisers can have an effective model to accurately predict ad
click-through rates and establish a good bidding strategy under budgetary constraints.
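
As a rough illustration of how a click-through-rate prediction can feed a bid under a budget, the sketch below bids the predicted CTR times an assumed value per click and stops spending once the budget is exhausted. It is not the laboratory's method and does not address the incomplete-label problem. The second-price auction rule, the names BidRequest and run_campaign, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BidRequest:
    predicted_ctr: float  # output of some CTR model (assumed to be given)
    market_price: float   # highest competing bid; the price paid on a win


def run_campaign(requests, value_per_click, budget):
    """Bid predicted_ctr * value_per_click per request, capped by remaining budget.

    Assumes a second-price auction: we win when our bid exceeds the
    market price, and we then pay the market price.
    """
    spent = 0.0
    wins = 0
    expected_clicks = 0.0
    for req in requests:
        bid = req.predicted_ctr * value_per_click
        bid = min(bid, budget - spent)  # never bid more than what is left
        if bid > req.market_price:
            spent += req.market_price
            wins += 1
            expected_clicks += req.predicted_ctr
    return {"spent": spent, "wins": wins, "expected_clicks": expected_clicks}


# Hypothetical bid log: (predicted CTR, market price)
log = [
    BidRequest(0.02, 0.5),
    BidRequest(0.001, 0.5),
    BidRequest(0.05, 1.0),
    BidRequest(0.04, 1.0),
]
result = run_campaign(log, value_per_click=50.0, budget=2.0)
print(result)
```

In this toy run the last request is lost only because the remaining budget caps the bid below the market price, which shows why the accuracy of the CTR model and the pacing of the budget have to be considered together.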

II. Social Network Analysis and Query Processing

The popularity of live streaming has led to explosive growth in new video content and
social communities on emerging platforms such as Facebook Live and Twitch. In addition to
allowing users to create streaming channels on various topics in real-time (e.g., news, sports,
games), these new platforms support two unique features: (1) multi-streaming, with viewers
on these platforms being able to follow multiple streams of live events simultaneously; and
(2) live interactions, enabling viewers to chat with each other and the broadcaster and send
virtual gifts to the broadcaster in real-time as events unfold. However, existing approaches
for selecting live streaming channels still focus on satisfying the individual preferences of
users, without considering the need to accommodate real-time social interactions among
viewers and to diversify the content of streams. Therefore, we are formulating a new Social-
aware Diverse and Preferred Live Streaming Channel Query (SDSQ) that jointly selects a set
of diverse and preferred live streaming channels and a group of socially tight-knit viewers.
We have demonstrated that SDSQ is NP-hard and inapproximable within any factor, and
have designed SDSSel, a 2-approximation algorithm with a guaranteed error bound. We
are performing a user study on Twitch to validate the need for SDSQ and the usefulness
of SDSSel. Moreover, the explosive growth in the popularity of social networking has led to
problematic usage. An increasing number of social network mental disorders (SNMDs),
such as Cyber-Relationship Addiction, Information Overload, and Net Compulsion, have
recently been documented. Currently, symptoms of these mental disorders are usually
observed passively, resulting in delayed clinical intervention. We argue that mining online
social behaviors provides an opportunity to actively identify SNMDs at an early stage. It is
