%0 Conference Proceedings
%T PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce
%+ Graduate School of the Chinese Academy of Sciences (GSCAS)
%+ Key Laboratory of Intelligent Information Processing, Institute of Computing Technology [Beijing]
%A Li, Ning
%A Zhuang, Fuzhen
%A He, Qing
%A Shi, Zhongzhi
%Z Part 2: Machine Learning
%< peer-reviewed
%( IFIP Advances in Information and Communication Technology
%B 7th International Conference on Intelligent Information Processing (IIP)
%C Guilin, China
%Y Zhongzhi Shi
%Y David Leake
%Y Sunil Vadera
%I Springer
%3 Intelligent Information Processing VI
%V AICT-385
%P 40-49
%8 2012-10-12
%D 2012
%R 10.1007/978-3-642-32891-6_8
%K Probabilistic Latent Semantic Analysis
%K MapReduce
%K EM
%K Parallel
%Z Computer Science [cs]; Conference papers
%X PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. Due to the increasing prevalence of large datasets, there is a need to improve the scalability of computation in PLSA. In this paper, we propose a parallel PLSA algorithm called PPLSA to accommodate large corpus collections in the MapReduce framework. Our solution efficiently distributes computation and is relatively simple to implement.
%G English
%Z TC 12
%2 https://inria.hal.science/hal-01524958/document
%2 https://inria.hal.science/hal-01524958/file/978-3-642-32891-6_8_Chapter.pdf
%L hal-01524958
%U https://inria.hal.science/hal-01524958
%~ IFIP
%~ IFIP-AICT
%~ IFIP-TC
%~ IFIP-TC12
%~ IFIP-IIP
%~ IFIP-AICT-385