Parallel External Memory Suffix Sorting
详细信息    查看全文
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2015
  • 出版时间:2015
  • 年:2015
  • 卷:9133
  • 期:1
  • 页码:329-342
  • 全文大小:335 KB
  • 参考文献:1.Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2(1), 53鈥?6 (2004)MATH MathSciNet View Article
    2.Andersson, A., Hagerup, T., H氓stad, J., Petersson, O.: Tight bounds for searching a sorted array of strings. SIAM J. Comput. 30(5), 1552鈥?578 (2000)MATH MathSciNet View Article
    3.Apostolico, A., Iliopoulos, C.S., Landau, G.M., Schieber, B., Vishkin, U.: Parallel construction of a suffix tree with applications. Algorithmica 3, 347鈥?65 (1988)MATH MathSciNet View Article
    4.Apostolico, A., Lonardi, S.: Off-line compression by greedy textual substitution. Proc. IEEE 88(11), 1733鈥?744 (2000)View Article
    5.Bingmann, T., Fischer, J., Osipov, V.: Inducing suffix and LCP arrays in external memory. In: Sanders, P., Zeh, N. (eds.) ALENEX 2013. pp. 88鈥?02. SIAM (2013)
    6.Blelloch, G.E., Shun, J.: A simple parallel cartesian tree algorithm and its application to suffix tree construction. In: M眉ller-Hannemann, M., Werneck, R.F.F. (eds.) ALENEX 2011, pp. 48鈥?8. SIAM (2011)
    7.Crauser, A., Ferragina, P.: A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica 32(1), 1鈥?5 (2002)MATH MathSciNet View Article
    8.Dementiev, R., Kettner, L., Sanders, P.: STXXL: standard template library for XXL data sets. Softw. Pract. Exper. 38(6), 589鈥?37 (2008)View Article
    9.Deo, M., Keely, S.: Parallel suffix array and least common prefix for the GPU. In: Nicolau, A., Shen, X., Amarasinghe, S.P., Vuduc, R.W. (eds.) PPoPP 2013, pp. 197鈥?06. ACM (2013)
    10.Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987鈥?011 (2000)MATH MathSciNet View Article
    11.Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552鈥?81 (2005)MathSciNet View Article
    12.Ferragina, P., Gagie, T., Manzini, G.: Lightweight data indexing and compression in external memory. Algorithmica 63(3), 707鈥?30 (2012)MATH MathSciNet View Article
    13.Gonnet, G.H., Baeza-Yates, R.A., Snider, T.: New indices for text: pat trees and pat arrays. In: Frakes, W.B., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures and Algorithms, pp. 66鈥?2. Prentice-Hall, Englewood Cliffs (1992)
    14.Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378鈥?07 (2005)MATH MathSciNet View Article
    15.K盲rkk盲inen, J., Kempa, D.: Engineering a lightweight external memory suffix array construction algorithm. In: Iliopoulos, C.S., Langiu, A. (eds.) ICABD 2014, CEUR Workshop Proceedings, vol. 1146, pp. 53鈥?0 (2014). CEUR-WS.鈥媜rg
    16. K盲rkk盲inen, J., Kempa, D., Puglisi, S.J.: Linear time Lempel-Ziv factorization: simple, fast, small. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 189鈥?00. Springer, Heidelberg (2013) View Article
    17.K盲rkk盲inen, J., Kempa, D., Puglisi, S.J.: Lempel-Ziv parsing in external memory. In: Bilgin, A., Marcellin, M.W., Serra-Sagrist脿, J., Storer, J.A. (eds.) DCC 2014, pp. 153鈥?62. IEEE (2014)
    18. K盲rkk盲inen, J., Kempa, D., Puglisi, S.J.: String range matching. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 232鈥?41. Springer, Heidelberg (2014)
    19.K盲rkk盲inen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM 53(6), 918鈥?36 (2006)MathSciNet View Article
    20.Kulla, F., Sanders, P.: Scalable parallel suffix array construction. Parallel Comput. 33(9), 605鈥?12 (2007)View Article
    21.Langmead, B., Trapnell, C., Pop, M., Salzberg, S.L.: Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol. 10(3), R25 (2009)View Article
    22.Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14), 1754鈥?760 (2009)View Article
    23. Louza, F.A., Telles, G.P., Ciferri, C.D.D.A.: External memory generalized suffix and LCP arrays construction. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 201鈥?10. Springer, Heidelberg (2013) View Article
    24.Manber, U., Myers, G.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935鈥?48 (1993)MATH MathSciNet View Article
    25.Mori, Y.: libdivsufsort, a C library for suffix array construction. http://鈥媍ode.鈥媑oogle.鈥媍om/鈥媝/鈥媗ibdivsufsort/鈥?/span>
    26.Nakamura, R., Inenaga, S., Bannai, H., Funamoto, T., Takeda, M., Shinohara, A.: Linear-time text compression by longest-first substitution. Algorithms 2(4), 1429鈥?448 (2009)MathSciNet View Article
    27.Nong, G., Chan, W.H., Zhang, S., Guan, X.F.: Suffix array construction in external memory using d-critical substrings. ACM Trans. Inf. Syst. 32(1), 1 (2014)View Article
    28. Osipov, V.: Parallel suffix array construction for shared memory architectures. In: Calder贸n-Benavides, L., Gonz谩lez-Caro, C., Ch谩vez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 379鈥?84. Springer, Heidelberg (2012) View Article
    29.Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. 39(2), 31 (2007). Article 4View Article
    30.Simpson, J.T., Durbin, R.: Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22(3), 549鈥?56 (2012)View Article
    31.Tischler, G.: Faster average case low memory semi-external construction of the Burrows-Wheeler transform. In: Iliopoulos, C.S., Langiu, A. (eds.) ICABD 2014, CEUR Workshop Proceedings, vol. 1146, pp. 61鈥?8 (2014). CEUR-WS.鈥媜rg
    32.Weiner, P.: Linear pattern matching algorithms. In: SWAT 1973, pp. 1鈥?1. IEEE (1973)
    33.Williams, H.E., Zobel, J.: Compressing integers for fast file access. Comput. J. 42(3), 193鈥?01 (1999)View Article
    34.Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23(3), 337鈥?43 (1977)MATH MathSciNet View Article
  • 作者单位:Juha K盲rkk盲inen (16)
    Dominik Kempa (16)
    Simon J. Puglisi (16)

    16. Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
  • 丛书名:Combinatorial Pattern Matching
  • ISBN:978-3-319-19929-0
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
文摘
Suffix sorting (or suffix array construction) is one of the most important tasks in string processing, with dozens of applications, particularly in text indexing and data compression. Some of these applications require the suffix array to be built for large inputs that greatly exceed the size of RAM and so external memory must be used. However, existing approaches for external memory suffix sorting either use debilitatingly large amounts of disk space, or become too slow when the size of the input data is more than a few times bigger than the size of RAM. In this paper we address the latter problem via a non-trivial parallelization of computation. In our experiments, the resulting algorithm is much faster than the best prior external memory algorithms while using very little disk space in addition to what is needed for the input and output. On the way to this result we provide the current fastest (parallel) internal memory algorithm for suffix sorting, which is usually around twice as fast as previous methods, while using around one quarter of the working space.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700