Journal of Information Science and Engineering, Vol. 25 No. 4, pp. 1121-1133 (July 2009)

Mining Top-K Path Traversal Patterns over Streaming Web Click-Sequences*

HUA-FU LI1,2 AND SUH-YIN LEE2
1Department of Computer Science
Kainan University
Taoyuan, 338 Taiwan
E-mail: hfli@mail.knu.edu.tw
2Department of Computer Science
National Chiao Tung University
Hsinchu, 300 Taiwan
E-mail: {hfli; sylee}@csie.nctu.edu.tw

Online, one-pass mining of Web click streams poses several computational challenges: the streaming data are unbounded in length, the arrival rate may be very fast, and previously arrived Web click-sequences can be scanned only once. In this paper, we propose a new single-pass algorithm, called DSM-TKP (Data Stream Mining for Top-K Path traversal patterns), for mining the set of top-k path traversal patterns, where k is the desired number of path traversal patterns to be mined. An effective summary data structure, called TKP-forest (a forest of Top-K Path traversal patterns), is used to maintain the essential information about the top-k path traversal patterns generated so far. Experimental studies show that the proposed DSM-TKP algorithm maintains stable memory usage while making only one pass over the streaming Web click-sequences.
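To make the problem statement concrete, the following is a minimal baseline sketch of one-pass top-k path pattern counting. It is not the DSM-TKP algorithm and does not use a TKP-forest, neither of which is specified in this abstract. It assumes that a path traversal pattern is a contiguous run of at least two pages within one click-sequence and that support is counted once per sequence. The function names and the `max_len` parameter are hypothetical. Unlike the memory behaviour reported for DSM-TKP, this naive counter grows with the number of distinct patterns seen.

```python
from collections import Counter


def path_patterns(clicks, max_len):
    """Yield every contiguous sub-path of `clicks` with 2..max_len pages.

    Assumption: a path traversal pattern is a contiguous run of pages.
    """
    n = len(clicks)
    for start in range(n):
        for end in range(start + 2, min(start + max_len, n) + 1):
            yield tuple(clicks[start:end])


def top_k_paths(stream, k, max_len=3):
    """Return the k most frequent path patterns after one pass over `stream`.

    Naive baseline: keeps an exact count for every pattern seen, so memory
    is NOT bounded (this is not the summary structure from the paper).
    """
    counts = Counter()
    for clicks in stream:
        # Each click-sequence is read exactly once; a pattern is counted
        # at most once per sequence (hypothetical support definition).
        counts.update(set(path_patterns(clicks, max_len)))
    # Sort by descending count, breaking ties by pattern for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    sessions = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
    for pattern, support in top_k_paths(sessions, k=2):
        print(" -> ".join(pattern), support)
```

Because `stream` is consumed by a single `for` loop, any iterable works here, including a generator that reads click-sequences as they arrive.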

Keywords: web usage mining, data streams, path traversal patterns, top-k pattern mining, single-pass mining


Received August 23, 2007; revised April 21, 2008; accepted June 26, 2008.
Communicated by Makoto Takizawa.
* A preliminary version appeared in the Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, France, September 19-22, 2005. The research was supported in part by the National Science Council of Taiwan, R.O.C., under Project No. 95-2221-E-009-069-MY3 and NSC 96-2218-E-424-001.