Previous [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7] [ 8] [ 9] [ 10] [ 11] [ 12] [ 13] [ 14] [ 15] [ 16] [ 17] [ 18] [ 19] [ 20] [ 21]

@

Journal of Information Science and Engineering, Vol. 30 No. 3, pp. 787-817 (May 2014)


Method for Extracting Valuable Common Structures from Heterogeneous Rooted and Labeled Tree Data*


JURYON PAIK1, JUNGHYUN NAM2, UNG MO KIM1 AND DONGHO WON1,+
1Department of Computer Engineering
Sungkyunkwan University
Suwon, 440-746 Korea
2Department of Computer Engineering
Konkuk University
Chungju, 380-701 Korea

The most commonly adopted approach to find valuable information from tree data is to extract frequently occurring subtree patterns. Because mining frequent tree patterns has a wide range of applications such as XML mining, web usage mining, bioinformatics, and network multicast routing, many algorithms have been recently proposed to find the patterns. However, existing tree mining algorithms suffer from several serious pitfalls in finding frequent tree patterns from massive tree datasets. Some of the major problems are due to (1) the computationally high cost of the candidate maintenance, (2) the repetitious input dataset scans, and (3) the high memory dependency. These problems stem from the fact that most of the algorithms are based on the well-known apriori algorithm and have used anti-monotone property for candidate generation and frequency counting in them. To solve the problems, we apply the pattern-growth approach instead of the apriori approach, and choose to extract maximal frequent subtree patterns rather than frequent subtree patterns. We present several new definitions and evaluate the effectiveness of the proposed algorithm.

Keywords: tree mining, subtree pattern, maximal frequent subtree, pattern-growth approach, label projection

Full Text () Retrieve PDF document (201405_15.pdf)

Received February 7, 2013; revised May 19 & July 13, 2013; accepted August 19, 2013.
Communicated by Vincent S. Tseng.
* This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2010-0020210).
+ Corresponding author.