PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce
Abstract
PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. As large datasets become increasingly prevalent, the computation in PLSA needs to scale better. In this paper, we propose a parallel PLSA algorithm, called PPLSA, that accommodates large corpora in the MapReduce framework. Our solution distributes computation efficiently and is relatively simple to implement.
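To make the idea of running PLSA under MapReduce concrete, the following is a minimal single-process sketch of one common way to split the standard PLSA EM iteration into a map step (per-document E-step) and a reduce step (aggregating topic-word statistics). It is an illustration under stated assumptions, not the PPLSA algorithm itself, whose details are not given in this abstract. All names (`map_document`, `reduce_topic_word`, `plsa`, `log_likelihood`) are hypothetical, and the map and reduce calls are simulated with plain Python loops rather than a real MapReduce runtime.

```python
import math
import random
from collections import defaultdict


def map_document(counts, p_z_d, p_w_z, num_topics):
    """Map step (assumed design): E-step for one document.

    Emits ((topic, word), expected_count) pairs and returns the
    document's updated topic distribution P(z|d).
    """
    pairs = []
    doc_topic = [0.0] * num_topics
    for word, n in counts.items():
        # Posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = [p_z_d[z] * p_w_z[z][word] for z in range(num_topics)]
        total = sum(post)
        if total == 0.0:
            continue
        for z in range(num_topics):
            r = n * post[z] / total
            pairs.append(((z, word), r))
            doc_topic[z] += r
    length = sum(doc_topic)
    if length == 0.0:
        return pairs, [1.0 / num_topics] * num_topics
    return pairs, [v / length for v in doc_topic]


def reduce_topic_word(pairs, vocab, num_topics):
    """Reduce step (assumed design): sum expected counts, normalize to P(w|z)."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    p_w_z = []
    for z in range(num_topics):
        norm = sum(sums[(z, w)] for w in vocab)
        if norm == 0.0:
            p_w_z.append({w: 1.0 / len(vocab) for w in vocab})
        else:
            p_w_z.append({w: sums[(z, w)] / norm for w in vocab})
    return p_w_z


def normalized(values):
    total = sum(values)
    return [v / total for v in values]


def plsa(docs, num_topics, iterations, seed=0):
    """Run EM, simulating the map and reduce phases sequentially."""
    rng = random.Random(seed)
    vocab = sorted({w for counts in docs.values() for w in counts})
    p_w_z = []
    for _ in range(num_topics):
        probs = normalized([rng.random() + 0.01 for _ in vocab])
        p_w_z.append(dict(zip(vocab, probs)))
    p_z_d = {d: normalized([rng.random() + 0.01 for _ in range(num_topics)])
             for d in docs}
    for _ in range(iterations):
        all_pairs = []
        new_p_z_d = {}
        for d, counts in docs.items():  # each call could run on a separate mapper
            pairs, new_p_z_d[d] = map_document(counts, p_z_d[d], p_w_z, num_topics)
            all_pairs.extend(pairs)
        p_w_z = reduce_topic_word(all_pairs, vocab, num_topics)
        p_z_d = new_p_z_d
    return p_z_d, p_w_z


def log_likelihood(docs, p_z_d, p_w_z):
    """Sum over d, w of n(d,w) * log(sum over z of P(z|d) * P(w|z))."""
    total = 0.0
    for d, counts in docs.items():
        for word, n in counts.items():
            mix = sum(p_z_d[d][z] * p_w_z[z][word] for z in range(len(p_w_z)))
            total += n * math.log(mix)
    return total
```

In this sketch each document is given as a dictionary of word counts, for example `plsa({"d1": {"apple": 3, "pear": 2}, "d2": {"cpu": 4}}, num_topics=2, iterations=20)`. Partitioning by document is only one possible choice: it keeps P(z|d) local to a mapper, but every mapper then needs a copy of the full P(w|z) table, which may or may not match the design used by PPLSA.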
Domains
Computer Science [cs]
Origin
Files produced by the author(s)