An Efficient Data Indexing Approach on Hadoop Using Java Persistence API

Yang Lai; Shi Zhongzhi

doi:10.1007/978-3-642-16327-2_27

Conference paper, 2010

Yang Lai

The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology

UCAS - Graduate University of Chinese Academy of Sciences [Beijing]

Shi Zhongzhi

The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology

Abstract

Data indexing is common in data mining when working with high-dimensional, large-scale data sets. Hadoop, a cloud computing project that implements the MapReduce framework in Java, has attracted significant interest in distributed data mining. To resolve problems of globalization, random write, and duration in Hadoop, a data indexing approach using the Java Persistence API (JPA) is elaborated through the implementation of a KD-tree algorithm on Hadoop. An improved intersection algorithm for distributed data indexing on Hadoop is also proposed; it has complexity O(M + log N) and is suited to cases that require multiple intersections. We compare the data indexing algorithms on an open dataset and a synthetic dataset in a modest cloud environment. The results show that the algorithms are feasible for large-scale data mining.

Domains

Digital Libraries [cs.DL]

Main file

An_Efficient_Data_Indexing_Approach_on_Hadoop_using_Java_Persistence_API.pdf (454.03 KB)

Origin : Files produced by the author(s)
Licence : CC BY 4.0 - Attribution

https://inria.hal.science/hal-01055056

Submitted on : Monday, August 11, 2014, 1:10:02 PM

Last modification on : Tuesday, November 12, 2024, 11:34:03 AM

Long-term archiving on : Wednesday, November 26, 2014, 9:55:58 PM

Dates and versions

hal-01055056, version 1 (11-08-2014)

Identifiers

HAL Id : hal-01055056, version 1
DOI : 10.1007/978-3-642-16327-2_27

Cite

Yang Lai, Shi Zhongzhi. An Efficient Data Indexing Approach on Hadoop Using Java Persistence API. 6th IFIP TC 12 International Conference on Intelligent Information Processing (IIP), Oct 2010, Manchester, United Kingdom. pp.213-224, ⟨10.1007/978-3-642-16327-2_27⟩. ⟨hal-01055056⟩
