Estimation of False Drops in Set-valued Objects Retrieval with Signature Files

Hiroyuki Kitagawa (Institute of Information Sciences and Electronics, University of Tsukuba),
Yoshiaki Fukishima (Master's Degree Program in Sciences and Engineering, University of Tsukuba),
Yoshiharu Ishikawa (Doctoral Degree Program in Engineering, University of Tsukuba),
Nobuo Ohbo (Institute of Information Sciences and Electronics, University of Tsukuba)

PS file (18 pages, 528,524 bytes)


Advanced database systems have to support complex data structures as treated in object-oriented data models and nested relational data models. In particular, efficient processing of set-valued object retrieval (simply, set retrieval) is indispensable for such systems. In the previous paper, we proposed the use of signature files as efficient set retrieval facilities and showed their potential capabilities based on a disk page access cost model. Retrieval with signature files is always accompanied by mismatches called false drops, and it is very important in designing signature files to properly control the false drops.

In this paper, we present an in-depth study of false drops in set retrieval with signature files. We derive formulas estimating false drops in four types of set retrieval based on the ``has-subset,'' ``is-subset,'' ``has-intersection,'' and ``is-equal'' relationships. Then we evaluate their validity by computer simulations. Simulation study is also done to investigate false drops in practically probable more complex situations.