Yacheng Wang, Hanhua Chen, Hao Chen, Hai Jin
Abstract

A graph stream models graph data that evolves over time and can be represented as a sequence of timestamped edges. To effectively manage an ultra-large-scale graph stream, existing designs usually rely on summarization structures built on compressed matrices to support approximate storage and querying of graph streams. However, the state-of-the-art structures are based on compressed matrices of limited size. When handling dynamic graph stream data, they either use an extra adjacency list outside the compressed matrix to store left-over edges whose expected buckets in the matrix are already occupied by previously inserted edges, or allocate new compressed-matrix building blocks to provide more capacity. Such designs suffer from linear lookup time caused by the adjacency list, or from long system blocking time caused by data movement during structure scaling. Moreover, in graph stream applications with dynamically growing data sizes, recent data commonly carries greater significance and value. Existing designs fail to store the time information of graph stream items in a space-efficient way, leaving recent-data management over graph streams an unsolved problem. To handle dynamically expanding graph streams while accentuating the importance of recent data, in this work we propose Sliding-ITeM, a novel adaptive-size graph stream summarization structure with a sliding-window model. Two factors contribute to the efficiency of Sliding-ITeM. First, Sliding-ITeM introduces a novel fingerprint suffix index tree (FSIT) structure to efficiently manage the items assigned to the same bucket of a compressed matrix in a fine-grained and scalable way. It thus achieves time and space efficiency for graph stream management while avoiding costly blocking during structure extension.
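To make the compressed-matrix idea concrete, the sketch below hashes each edge to a bucket by node fingerprints and distinguishes colliding edges inside the bucket by their fingerprint pair. This is a minimal illustration of the general summarization scheme the abstract describes; the class, the fingerprint width, and the per-bucket dict (standing in for the paper's FSIT) are all hypothetical simplifications, not the actual ITeM layout.

```python
import hashlib
from collections import defaultdict

def fingerprint(node: str, bits: int = 16) -> int:
    """Hash a node label to a short fingerprint (hypothetical scheme)."""
    h = int(hashlib.md5(node.encode()).hexdigest(), 16)
    return h & ((1 << bits) - 1)

class CompressedMatrix:
    """Toy d x d compressed matrix for edge-frequency summarization.

    Each edge (u, v) maps to bucket (fp(u) % d, fp(v) % d); edges that
    collide in a bucket are told apart by their fingerprint pair. The
    per-bucket dict is a stand-in for a fine-grained per-bucket index,
    avoiding the single global adjacency list (and its linear lookups)
    used by older designs.
    """
    def __init__(self, d: int = 64):
        self.d = d
        self.buckets = defaultdict(dict)  # cell -> {fingerprint pair: count}

    def insert(self, u: str, v: str, w: int = 1) -> None:
        fu, fv = fingerprint(u), fingerprint(v)
        cell = (fu % self.d, fv % self.d)
        key = (fu, fv)
        self.buckets[cell][key] = self.buckets[cell].get(key, 0) + w

    def query(self, u: str, v: str) -> int:
        fu, fv = fingerprint(u), fingerprint(v)
        return self.buckets[(fu % self.d, fv % self.d)].get((fu, fv), 0)
```

As in any fingerprint-based summary, answers are approximate: two distinct edges whose endpoints share fingerprints are merged, so queries may overestimate.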
Second, Sliding-ITeM divides continuous time into discrete time slices and stores items belonging to different time slices in separate ITeM compressed matrices. Sliding-ITeM organizes these compressed matrices into a chronological chain, enabling efficient extraction of value from recent data as well as removal of expired data under a sliding-window model. We conduct comprehensive experiments over large-scale graph stream data collected from real-world systems to evaluate the performance of Sliding-ITeM. Experimental results show that it reduces operation latency by more than 67% for sliding-window queries compared to state-of-the-art designs, while reducing system blocking duration by three orders of magnitude.
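The time-slice chain can be sketched as follows: each slice owns its own summary (a plain dict here, standing in for an ITeM matrix), and expiring a slice just drops the head of the chain, with no data movement. The class name, slicing parameters, and dict-based summaries are illustrative assumptions, not the paper's actual structure.

```python
from collections import deque

class SlidingWindowSummary:
    """Chain of per-slice summaries under a sliding-window model.

    Continuous time is cut into fixed-length slices; items land in the
    summary of their slice. Slices older than the window are removed
    from the head of the chain in O(1), so no blocking data movement
    is needed when the window advances.
    """
    def __init__(self, slice_len: int, window_slices: int):
        self.slice_len = slice_len
        self.window_slices = window_slices
        self.chain = deque()  # (slice_id, summary dict), oldest first

    def insert(self, u, v, t, w=1):
        sid = t // self.slice_len
        if not self.chain or self.chain[-1][0] != sid:
            self.chain.append((sid, {}))  # open a summary for a new slice
        self._expire(sid)
        summ = self.chain[-1][1]
        summ[(u, v)] = summ.get((u, v), 0) + w

    def _expire(self, current_sid):
        # Drop whole slices that have slid out of the window.
        while self.chain and self.chain[0][0] <= current_sid - self.window_slices:
            self.chain.popleft()

    def window_query(self, u, v):
        """Aggregate edge weight over the slices still in the window."""
        return sum(s.get((u, v), 0) for _, s in self.chain)
```

For example, with ten-time-unit slices and a two-slice window, an edge inserted at t=0 stops contributing to window queries once an insertion at t=25 advances the window past its slice.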