The Dark Web Archiving Project: Web Sites, Forums, and Multimedia Contents

 

 

Hsinchun Chen

Professor

Department of Management Information Systems, University of Arizona, USA

Homepage: http://ai.bpa.arizona.edu/hchen/

 

Abstract:

The international and political landscape has been significantly altered since September 11, 2001. Cultural, religious, and ideological conflicts have surfaced dramatically in recent years. Some of these conflicts are accelerated by advances in various digital, information, and communication technologies. Extremism and terrorism are some of the unintended consequences of this "flat" world (according to the book "The World is Flat" by Thomas Friedman). Based on theories and observations about "Social Movement Organizations," our NSF-funded Dark Web project aims to create an open-source, longitudinal research testbed of extremist-generated content on the web, including web sites, forums, and various multimedia documents. Such a collection will bring significant research value to political, social, and international relations scientists in understanding some of the root causes of conflict. In this talk I will review the spidering, archiving, and analysis methodology and techniques developed for the Dark Web project. Over the past four years we have generated one of the world's largest such collections, with thousands of extremist web sites (millions of web pages), hundreds of high-quality forums (several hundred thousand participants, threads, and postings), and millions of multimedia documents (images and videos). We have also used our collection to perform social network analysis, content analysis, and web metrics analysis of the extremist web sites; authorship and sentiment analysis of the extremist forums; and content analysis of the extremist videos. Lessons learned and future directions will be discussed during the talk.

 

Biography:

Dr. Hsinchun Chen is McClelland Professor of Management Information Systems at the University of Arizona and Andersen Consulting Professor of the Year (1999). He received the B.S. degree from the National Chiao-Tung University in Taiwan, the MBA degree from SUNY Buffalo, and the Ph.D. degree in Information Systems from New York University. He is author/editor of 13 books, 17 book chapters, and more than 130 SCI journal articles covering intelligence analysis, biomedical informatics, data/text/web mining, digital libraries, knowledge management, and Web computing. His recent books include "Medical Informatics: Knowledge Management and Data Mining in Biomedicine" and "Intelligence and Security Informatics for International Security: Information Sharing and Data Mining," both published by Springer. Dr. Chen was ranked #8 in publication productivity in Information Systems (CAIS 2005) and #1 in Digital Library research (IP&M 2005) in two recent bibliometric studies. He serves on ten editorial boards, including ACM Transactions on Information Systems, ACM Journal on Educational Resources in Computing, IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Systems, Man, and Cybernetics, Journal of the American Society for Information Science and Technology, Decision Support Systems, and International Journal on Digital Libraries.

Dr. Chen has served as a Scientific Counselor/Advisor of the National Library of Medicine (USA), Academia Sinica (Taiwan), and National Library of China (China). He has been an advisor for major NSF, DOJ, NLM, DOD, DHS, and other international research programs in digital library, digital government, medical informatics, and national security research. Dr. Chen is founding director of the Artificial Intelligence Lab and the Hoffman E-Commerce Lab. The UA Artificial Intelligence Lab, which houses 40+ researchers, has received more than $20M in research funding from NSF, NIH, NLM, DOD, DOJ, CIA, DHS, and other agencies over the past 17 years. The Hoffman E-Commerce Lab, which has been funded mostly by major IT industry partners, features one of the most advanced e-commerce hardware and software environments in the College of Management.

Dr. Chen was conference co-chair of the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2004 and has served as the conference/program co-chair for the past eight International Conferences of Asian Digital Libraries (ICADL), the premier digital library meeting in Asia, which he helped develop. Dr. Chen is also (founding) conference co-chair of the IEEE International Conferences on Intelligence and Security Informatics (ISI) 2003-2007. The ISI conference, which has been sponsored by NSF, CIA, DHS, and NIJ, has become the premier meeting for international and homeland security IT research.

Dr. Chen's COPLINK system, which has been cited as a national model for public safety information sharing and analysis, has been adopted by more than 150 law enforcement and intelligence agencies in 20 states. The COPLINK research has been featured in the New York Times, Newsweek, the Los Angeles Times, the Washington Post, and the Boston Globe, among others. The COPLINK project was selected as a finalist for the prestigious International Association of Chiefs of Police (IACP)/Motorola Webber Seavey Award for Quality in Law Enforcement in 2003. COPLINK research has recently been expanded to border protection (BorderSafe), disease and bioagent surveillance (BioPortal), and terrorism informatics research (Dark Web), funded by NSF, CIA, and DHS. Dr. Chen is the founder of the Knowledge Computing Corporation, a university spin-off company that is a market leader in law enforcement and intelligence information sharing and data mining.

Dr. Chen has also received numerous awards in information technology and knowledge management education and research, including the AT&T Foundation Award, the SAP Award, the Andersen Consulting Professor of the Year Award, the University of Arizona Technology Innovation Award, and the National Chiao-Tung University Distinguished Alumnus Award. Dr. Chen is an IEEE Fellow.