An Efficient Semantic-Based Organization and Similarity Search Method for Internet Data Resources
Abstract
A large number of data resources of different types are appearing on the Internet as information technology develops, and some harmful ones have damaged society and its citizens. To preserve social harmony, it is important to discover these harmful resources among the massive, heterogeneous data resources in cyberspace, and Internet resource discovery has therefore attracted increasing attention. In this paper, we present iHash, a semantic-based organization and similarity search method for Internet data resources. First, iHash normalizes Internet data objects into a high-dimensional feature space, solving the “feature explosion” problem of the feature space. Second, we partition the high-dimensional data in the feature space with a clustering method, transform the data clusters into regular shapes, and organize the high-dimensional data with a Pyramid-like method. Third, we implement range and kNN queries on top of this organization. Finally, we evaluate the performance of iHash and find that it supports efficient similarity search.
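To make the Pyramid-based organization step more concrete, the following minimal sketch shows the mapping used by the classic Pyramid technique, which reduces a point in the unit hypercube to a single sortable value. It is an illustration only: the abstract does not specify how the iHash variant differs from the classic technique, and the function name `pyramid_value` is hypothetical.

```python
from typing import Sequence


def pyramid_value(point: Sequence[float]) -> float:
    """Map a point in [0, 1]^d to a one-dimensional pyramid value.

    Classic Pyramid technique, not necessarily the iHash variant
    (assumption: coordinates are already normalized to [0, 1]).
    """
    d = len(point)
    # Dimension in which the point deviates most from the centre 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    # Height of the point inside its pyramid, in the range [0, 0.5].
    height = abs(0.5 - point[j_max])
    # Pyramids 0..d-1 lie below the centre, pyramids d..2d-1 above it.
    pyramid = j_max if point[j_max] < 0.5 else j_max + d
    return pyramid + height
```

Because every point in pyramid `i` receives a value in `[i, i + 0.5]`, the values can be stored in an ordinary one-dimensional index such as a B+-tree, and a range query in the original space becomes a small set of interval lookups on these values.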
Origin | Files produced by the author(s) |
---|