TR-IIS-05-015

Design and Implementation of Domain-Based Proxy Prefetching

Ray-I Chang, Jan-Ming Ho


Users are usually interested in a few specific domains while surfing the Internet. Based on this domain-preferential browsing behavior, the Domain-Top (DT) proxy prefetching method was proposed. DT uses the popular pages of a popular domain to model users' future demands: when any page in a popular domain is requested, the popular pages in the same domain are treated as likely future requests and are prefetched. DT prefetching rests on the hypothesis that browsing behavior is always domain-preferential. However, clients may also explore the Internet aimlessly and access many different domains within a short time. Analyzing proxy logs without accounting for such diverse browsing behavior can lead to wrong predictions in prefetching. This paper proposes the DTC (DT prefetching with Classification) method, which improves DT prefetching by removing unreliable logs. DTC adopts the concept of entropy to discriminate between browsing behavior in "domain mode" and in "exploratory mode". Only access logs in domain mode are considered when calculating the popular domains. Unlike DT, which considers a constant number of popular pages per domain in prefetching, DTC assigns each domain a suitable number of popular pages. Experiments on real traces show that the proposed DTC method achieves a higher hit ratio than the DT method. Because DTC uses only historical logs to decide the popular pages and popular domains offline, only a few function modules of an existing proxy need to be revised. It imposes a small burden and can be easily implemented in Squid, a widely used open-source proxy server.
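The classification and prefetching steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; the abstract does not give the formulas, so the following are assumptions made only for illustration: Shannon entropy over the domains visited in one client session, a fixed entropy threshold separating domain mode from exploratory mode, and a per-domain page count proportional to the domain's share of hits. All function names and parameters (`domain_entropy`, `classify_session`, `build_prefetch_table`, `prefetch_candidates`, `threshold`, `page_budget`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def domain_entropy(domains):
    """Shannon entropy (in bits) of the domain distribution of one session."""
    counts = Counter(domains)
    total = len(domains)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def classify_session(domains, threshold=1.0):
    """Assumed rule: low entropy means domain mode, high means exploratory."""
    return "domain" if domain_entropy(domains) <= threshold else "exploratory"


def build_prefetch_table(sessions, threshold=1.0, top_domains=2, page_budget=3):
    """Offline step: pick popular domains and their popular pages.

    sessions: list of sessions, each a list of (domain, page) requests.
    Only sessions classified as domain mode contribute to the counts.
    """
    domain_hits = Counter()
    page_hits = defaultdict(Counter)
    for session in sessions:
        domains = [d for d, _ in session]
        if not domains or classify_session(domains, threshold) != "domain":
            continue  # drop unreliable (exploratory) logs
        for d, p in session:
            domain_hits[d] += 1
            page_hits[d][p] += 1

    popular = domain_hits.most_common(top_domains)
    total = sum(hits for _, hits in popular)
    table = {}
    for d, hits in popular:
        # Assumed allocation: pages per domain proportional to hit share.
        n_pages = max(1, round(page_budget * hits / total))
        table[d] = [p for p, _ in page_hits[d].most_common(n_pages)]
    return table


def prefetch_candidates(table, domain, page):
    """Online step: pages to prefetch when (domain, page) is requested."""
    return [p for p in table.get(domain, []) if p != page]
```

In this sketch the table is built once from historical logs, and the online step is a dictionary lookup, which mirrors the abstract's point that the popular domains and pages are decided offline and the running proxy needs few changes.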


Keywords: proxy caching, web prefetching, open source software