We exploit superpixels to obtain semantic motion regions, which determine the spatial co-occurrence domains.
To capture co-occurrence statistics at multiple temporal scales and model the relationships among them, we build a tree-structured model recursively.
Each higher-layer node is generated by fusing the associated lower-layer nodes that are connected through patch matching.
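The recursive construction can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several hypothetical choices. Each leaf `Node` stands for one motion region in one temporal segment and carries a `descriptor` vector and a `cooc` counter of co-occurrence statistics. `patch_distance` is a simple stand-in for patch matching that uses squared Euclidean distance between descriptors. Only temporally adjacent nodes are compared. Fusion averages the descriptors and sums the co-occurrence counts. The matching threshold is multiplied by `growth` at each layer so that coarser temporal scales merge more readily.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    descriptor: List[float]   # hypothetical appearance descriptor of the region's patch
    cooc: Counter             # co-occurrence counts collected over this node's temporal span
    span: Tuple[int, int]     # (first segment, last segment) covered by the node
    children: List["Node"] = field(default_factory=list)


def patch_distance(a: Node, b: Node) -> float:
    # Stand-in for patch matching (assumption): squared Euclidean descriptor distance.
    return sum((x - y) ** 2 for x, y in zip(a.descriptor, b.descriptor))


def fuse(group: List[Node]) -> Node:
    # Parent node: averaged descriptor, summed co-occurrence counts, union of spans.
    n = len(group)
    dim = len(group[0].descriptor)
    desc = [sum(g.descriptor[i] for g in group) / n for i in range(dim)]
    cooc = Counter()
    for g in group:
        cooc.update(g.cooc)
    return Node(desc, cooc, (group[0].span[0], group[-1].span[1]), list(group))


def build_layer(nodes: List[Node], threshold: float) -> List[Node]:
    # Link temporally adjacent nodes whose patches match, then fuse each linked run.
    groups = [[nodes[0]]]
    for prev, cur in zip(nodes, nodes[1:]):
        if patch_distance(prev, cur) <= threshold:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return [fuse(g) for g in groups]


def build_tree(leaves: List[Node], threshold: float, growth: float = 2.0) -> List[List[Node]]:
    # Recursively fuse layers until a single root remains or no nodes match.
    layers = [leaves]
    while len(layers[-1]) > 1:
        level = len(layers) - 1
        nxt = build_layer(layers[-1], threshold * growth ** level)
        if len(nxt) == len(layers[-1]):
            break  # nothing was fused at this scale, so stop the recursion
        layers.append(nxt)
    return layers
```

For example, with four leaves whose first two and last two descriptors are close, `build_tree(leaves, threshold=0.05, growth=100.0)` yields layers of sizes 4, 2 and 1: neighbouring leaves fuse first, and the two resulting nodes fuse only at the coarser scale. Restricting matching to temporal neighbours keeps each node's span contiguous, which is what lets every layer correspond to a single temporal scale.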