In this paper, we study the effects, on both efficiency and accuracy, of various stream approximations such as ignoring correlations, ignoring low-probability states, or retaining only the single most likely sequence of events. Through experiments on a real-world RFID data set, we identify conditions under which these approximations can improve performance by several orders of magnitude, with only minimal effects on query results. We also identify cases in which the full, rich semantics are necessary. This study is the first to evaluate the cost vs. quality trade-off of imprecise stream models.
We perform this study using Lahar, a prototype Markovian stream warehouse. A secondary contribution of this paper is the development of query semantics and algorithms for processing aggregation queries on the output of pattern queries. We develop these queries in order to more fully understand the effects of approximation on a wider set of imprecise stream queries.