The World Wide Web is one of the largest information resources. To enable higher levels of automation, transforming human-readable Web data into structured formats has been an important goal of the Semantic Web. Over the past decades, information extraction from the static Web (free text) and the deep Web (semi-structured data) has attracted considerable research attention from both academia and industry. Information extraction from static pages can be used to populate Web knowledge bases (KBs) such as DBpedia and YAGO, while Web KBs can in turn guide information extraction. Deep Web data extraction, on the other hand, can further speed up the extraction of data instances that share a relational schema (even though one wrapper must be constructed for each website) and can help create Web KBs when combined with proper ontology learning. In this talk, we will introduce two information extraction tools, WebETL and DS4NER, for the static and deep Web respectively, and demonstrate their applications to event search, disaster report management, and POI search.
Dr. Chia-Hui Chang is a full Professor at National Central University, Taiwan. She obtained her Ph.D. in Computer Science and Information Engineering from National Taiwan University, Taiwan, in 1999. Her research interests focus on information extraction, Web intelligence, data mining, machine learning, and system integration. Dr. Chang has published more than 80 papers in refereed conferences and journals, including WWW, PAKDD, TKDE, and IEEE Intelligent Systems. She served as an area co-chair for ACL 2017 and NAACL 2018, and as a PC member for ICDE, CIKM, PAKDD, AAAI, ICTIR, and others. She is also an executive director of the Taiwan Association for Artificial Intelligence (TAAI) and currently vice president of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP).