Abstract
We describe each pixel by its local spatial-frequency patterns and local extremum patterns. We propose a soft spatial binning scheme that adaptively aggregates the resulting feature set. We represent each keypoint by the second-order statistics of its features, which capture feature correlations. The resulting descriptor is rotation-invariant, highly discriminative, and robust to noise.
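As a rough illustration of how second-order statistics can be combined with soft spatial weighting, the sketch below computes a weighted covariance matrix over per-pixel feature vectors. It is not the method described here: the function name `soft_weighted_covariance`, the Gaussian weighting by distance from the keypoint, the bandwidth of 0.5, and the random stand-in features are all assumptions made for the example.

```python
import numpy as np

def soft_weighted_covariance(features, weights):
    """Weighted covariance (second-order statistics) of per-pixel features.

    features: (n, d) array, one d-dimensional feature vector per pixel.
    weights:  (n,) array of non-negative soft weights (hypothetical binning).
    """
    w = weights / weights.sum()           # normalize weights to sum to 1
    mean = w @ features                   # weighted mean, shape (d,)
    centered = features - mean            # center each feature vector
    return (centered * w[:, None]).T @ centered  # (d, d) covariance

# Stand-in data: 200 pixels around a keypoint, 4 features per pixel.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
coords = rng.uniform(-1.0, 1.0, size=(200, 2))  # offsets from the keypoint

# Assumed soft weighting: Gaussian falloff with distance from the keypoint.
weights = np.exp(-np.sum(coords**2, axis=1) / (2 * 0.5**2))

cov = soft_weighted_covariance(features, weights)
print(cov.shape)
```

The off-diagonal entries of the resulting matrix are what encode correlations between feature channels. With uniform weights the function reduces to the ordinary (biased) sample covariance.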