%0 Conference Proceedings
%T D-JB: An Online Join Method for Skewed and Varied Data Streams
%+ CAS Institute of Computing Technology (ICT)
%+ China Reinsurance (Group) Corporation [Beijing] (CHINA RE)
%A Wang, Chunkai
%A Feng, Jian
%A Shi, Zhongzhi
%Z Part 3: Data Intelligence
%< peer-reviewed
%( IFIP Advances in Information and Communication Technology
%B 2nd International Conference on Intelligence Science (ICIS)
%C Beijing, China
%Y Zhongzhi Shi
%Y Cyriel Pennartz
%Y Tiejun Huang
%I Springer International Publishing
%3 Intelligence Science II
%V AICT-539
%P 115-125
%8 2018-11-02
%D 2018
%R 10.1007/978-3-030-01313-4_12
%K Distributed data stream management system
%K Online join
%K State migration
%K Bipartite graph-based model
%Z Computer Science [cs]; Conference papers
%X Scalable distributed join processing in a parallel environment requires a partitioning policy to transfer data. Online theta-joins over data streams are more computationally expensive and impose higher memory requirements in distributed data stream management systems (DDSMS) than in database management systems (DBMS). The complete bipartite graph-based model can support distributed stream joins and is memory-efficient, elastic, and scalable. However, because stream rates are unstable and attribute value distributions are imbalanced, online theta-joins over skewed and varied streams lead to load imbalance across the cluster. In this paper, we present a framework, D-JB (Dynamic Join Biclique), for handling skewed and varied streams, enhancing the adaptability of the join model and minimizing the system cost under varying workloads. Our proposal includes a mixed key-based and tuple-based partitioning scheme to handle skewed data on each side of the bipartite graph-based model, a strategy for redistributing query nodes between the two sides of this model, and a migration algorithm that preserves state consistency to support full-history joins.
Experiments show that our method can effectively handle skewed and varied data streams and improve the throughput of DDSMS.
%G English
%Z TC 12
%2 https://inria.hal.science/hal-02118799/document
%2 https://inria.hal.science/hal-02118799/file/474230_1_En_12_Chapter.pdf
%L hal-02118799
%U https://inria.hal.science/hal-02118799
%~ IFIP
%~ IFIP-AICT
%~ IFIP-TC
%~ IFIP-TC12
%~ IFIP-ICIS
%~ IFIP-AICT-539