State Spill Policies for State Intensive Continuous Query Plan Evaluation
Modern applications such as network monitoring, telecommunications data management, web applications, and remote medical monitoring need near-real-time results over continuous data streams. This need has spurred the development of a new class of data management systems called Data Stream Management Systems (DSMSs). Unlike traditional database systems, which answer one-time user queries only after a finite data set has been captured on disk, DSMSs answer user queries on the fly as data arrives at varying rates in the form of continuous, potentially infinite streams of tuples. To meet the timeliness requirements of applications, DSMSs aim to keep all data in main memory. Queries with multiple stateful operators therefore put a major strain on memory.
Existing adaptation techniques designed to address this issue are ineffective when faced with continuous bursts of high data rates. When system load exceeds system capacity, a DSMS has three options: 1) discard some new data; 2) crash; or 3) spill data to disk. Only the third option allows it to produce delayed, yet accurate and complete, query results. However, this option incurs disk access overhead and changes the natural order of tuples flowing through the query plan tree. Because not all stream operators can correctly process out-of-order tuples, data spilling may degrade the quality of the final results. Moreover, since the operators in a query plan are interconnected, changes in the order of tuple flows inevitably affect the execution stages of downstream operators, for example data purging. Data purging is necessary for processing continuous queries composed of stateful operators. The state of such operators is divided into finite, non-overlapping sets of tuples called windows. Once all the tuples of a window have been processed and all results have been output, these tuples can be discarded to free memory for new data.
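The window-based purging described above can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis: the class name `WindowedState`, the fixed window size, and the `watermark` argument (assumed to mean that no tuple with a smaller timestamp will arrive any more) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class WindowedState:
    """Operator state split into finite, non-overlapping windows (hypothetical sketch)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.windows = defaultdict(list)  # window id -> list of (timestamp, value)

    def insert(self, timestamp, value):
        # Each tuple belongs to exactly one window, chosen by its timestamp.
        wid = timestamp // self.window_size
        self.windows[wid].append((timestamp, value))
        return wid

    def purge_before(self, watermark):
        """Discard every window that ends at or before `watermark`.

        Assumption: all tuples with timestamp < watermark have already been
        processed and their results output, so their windows are complete.
        """
        done = [w for w in self.windows if (w + 1) * self.window_size <= watermark]
        freed = 0
        for w in done:
            freed += len(self.windows.pop(w))
        return freed


state = WindowedState(window_size=10)
for ts in (1, 4, 9, 12, 15, 23):
    state.insert(ts, f"t{ts}")

freed = state.purge_before(20)      # windows [0, 10) and [10, 20) are complete
remaining = sorted(state.windows)   # only the window holding timestamp 23 is left
```

If spilling delays some tuples of a window, the point at which that window counts as complete moves later, which is why changes in tuple order affect when downstream operators can purge.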
To address these issues, we first redesign the state structure of continuous operators into smaller, finite, non-overlapping sets of tuples, called partitioned window groups, which incur less disk access overhead. Second, we enable continuous operators to correctly process out-of-order tuples using punctuation pointers. Third, we design methods for downstream operators to synchronize their processing stages with those of upstream operators to achieve optimized query plan throughput. Putting these techniques together, we have designed a consolidated spilling adaptation strategy that considers all aspects of operator interconnections in a query plan when making adaptation decisions.
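As a rough illustration of spilling state at the granularity of partition groups, the following Python sketch keeps operator state in a fixed number of groups and writes one group to disk when a memory budget is exceeded. Everything here is an assumption made for the example: the class `SpillableState`, integer join keys, the tuple-count budget, the pickle file layout, and in particular the "spill the largest group" rule, which is a placeholder and not the policy proposed in the thesis.

```python
import os
import pickle
import tempfile


class SpillableState:
    """Partitioned operator state with a simple spill-to-disk step (hypothetical sketch)."""

    def __init__(self, num_groups, max_tuples, spill_dir):
        self.num_groups = num_groups
        self.max_tuples = max_tuples      # memory budget, counted in tuples
        self.spill_dir = spill_dir
        self.in_memory = {g: [] for g in range(num_groups)}
        self.spilled = set()              # ids of groups that have a segment on disk

    def _path(self, group):
        return os.path.join(self.spill_dir, f"group_{group}.pkl")

    def memory_use(self):
        return sum(len(tuples) for tuples in self.in_memory.values())

    def _read(self, group):
        if group not in self.spilled:
            return []
        with open(self._path(group), "rb") as f:
            return pickle.load(f)

    def _spill_largest(self):
        # Placeholder victim choice: the group holding the most tuples.
        victim = max(self.in_memory, key=lambda g: len(self.in_memory[g]))
        # Merge with any earlier segment so repeated spills keep all tuples.
        merged = self._read(victim) + self.in_memory[victim]
        with open(self._path(victim), "wb") as f:
            pickle.dump(merged, f)
        self.in_memory[victim] = []
        self.spilled.add(victim)

    def insert(self, key, value):
        group = key % self.num_groups     # assumes integer join keys
        self.in_memory[group].append((key, value))
        if self.memory_use() > self.max_tuples:
            self._spill_largest()

    def all_tuples(self, group):
        """Disk-resident tuples first, then memory-resident ones."""
        return self._read(group) + self.in_memory[group]


with tempfile.TemporaryDirectory() as spill_dir:
    state = SpillableState(num_groups=2, max_tuples=3, spill_dir=spill_dir)
    for k in (0, 2, 4, 1, 6):
        state.insert(k, f"v{k}")
    spilled = sorted(state.spilled)
    group0_keys = [k for k, _ in state.all_tuples(0)]
    in_mem = state.memory_use()
```

Spilling a whole group keeps each disk write small and self-contained. Reading a spilled group back, however, yields its tuples later than they originally arrived, which is the ordering problem that the punctuation-based techniques above are meant to handle.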
We tested the effectiveness of our integrated approach empirically in a comparative evaluation against several alternative spilling adaptation strategies. We conducted our experiments on CAPE, a DSMS developed at WPI, using different types of query plans composed of multiple partitioned window join operators. Our experiments show that, despite the higher overhead of a more synchronized adaptation approach, our consolidated strategy provides better query plan performance and higher plan throughput during periods of continuous bursts of high data rates.