Linux内核中的预取算法
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
计算机系统中最严重的不平衡来自CPU与I/O之间持续扩大的性能差距。磁盘传输带宽的改善速度跟不上CPU计算能力的增长,数据访问延迟的改善更是严重地滞后。预取作为提升磁盘I/O性能的一种重要技术,可以有效地将来自应用程序的同步的小I/O转化为异步的大I/O,从而减少访问延迟,并促成I/O的并行化。
     对顺序文件访问的预读是现代操作系统的一项必备功能。Linux内核在虚拟文件系统层实现了一个通用的文件预读算法。随着Linux的广泛应用和在越来越多样化的负载环境中运行,它的预取算法遇到了很多微妙问题的挑战。例如:当内存压力较大的时候,预读页面会被过早地回收,出现预读抖动;对已经缓存在内存中的页面进行预读操作是毫无意义的;被中断的读请求及其重试读可能使顺序性检测算法陷于混乱。本文设计了一个模块化的按需预读框架。通过引入一对预取触发条件,使一些原本复杂的问题以统一和自然的方式得到了处理。在此基础上,对访问模式的检测和预读逻辑被简化,并可以独立地进行扩展。
     随着多核、多线程处理器成为新的硬件设计方向,软件的并行化也随之成为新的潮流,并带来更多的并发I/O。如何在交织的并发I/O中有效地识别和处理顺序流,已经成为预取算法所面临的一个现实挑战。我们在按需预读框架的基础上,设计并实现了被动和主动两种顺序流的检测和预读算法,引入页面状态作为对不可靠的预读状态变量的必要补充,采用更宽松的顺序性判决条件。它们可以有效地支持:淹没在随机读中的顺序访问;对单个文件实例的并发访问产生的交织访问模式;稀疏的顺序读;局部重排序的NFS顺序读;高密度的随机读区域;对大矩阵的列扫描,等等。
     被动算法在单线程的情况下不会造成额外的开销,而在多线程情况下的时空复杂度与并发度的大小无关。主动算法通过高效的检测逻辑,确保发现顺序流和随机访问的热点区域,自适应的预读大小可以避免预读抖动,因而往往可以获得更好的性能。实验表明本文算法相对传统预读有突出的优势:在发生预读抖动的情况下,网络输出流量可达到传统算法的3~4倍;在随机干扰之下的顺序读性能可提高29%;交织读的性能是传统算法的4~27倍,同时应用程序可见延迟改善了35倍;在NFS文件服务中,性能提高了1.8倍:在lighttpd文件服务中,磁盘占用率降低了26%,网络输出提高了17%;在随机文件加载实验中,主动检测算法可以获得3倍于传统预读的性能。本文所述的预读算法具有重大的应用价值,其中的按需预读基本框架和顺序流的被动预读算法已先后被Linux 2.6.23和2.6.24内核采用。
The ever-increasing gap between CPU and I/O speed has created the most outstanding imbalance in computer systems.The improvement of disk throughput has been unable to keep up with the rapidly speeding-up CPU,and the I/O delay has been lagging behind seriously.Prefetching is a vital technique for improving disk I/O performance.It can transform small synchronous I/O requests from applications into large asynchronous readahead I/Os,so that I/O delays could be hid from applications,and I/O could be parallelized.
     Sequential prefetching has been a standard function in modern operating systems. The Linux kernel implemented a generic file readahead algorithm in its virtual file system layer.As Linux is widely deployed and runs an increasing variety of workloads, its in-kernel readahead algorithm has been challenged by many unexpected and subtle problems.To name a few:readahead thrashings arise when readahead pages are evicted prematurely under memory pressure;readahead attempts on already cached pages are undesirable;interrupted-then-retried reads can easily fool the sequential detection logic. In this thesis,we present a demand prefetching framework.By introducing a pair of readahead triggers,it can handle most abnormal cases in a natural and uniform way.The modular design simplifies the pattern detection and readahead logics,and makes them independently improvable.
     With the trend towards multi-core and multi-thread processors,software developers multi-thread their applications and bring forward more parallel I/Os.It is a big challenge for the prefetching algorithm to efficiently recognize and handle the interleaved sequential read streams in parallel I/O workloads.On the basis of the demand prefetching framework, we designed and implemented a reactive sequential detection algorithm and a proactive one,which feature the use of page status as a complement to the less reliable readahead states,as well as relaxed sequentiatity criterion.They can support various semi-sequential access patterns:sequential streams mixed in random accesses;interleaved read pattern created by concurrent sequential accesses to one file descriptor;sparse sequential reads; locally reordered sequential NFS reads;high density random read areas;row iterations on huge arrays,etc.
     The reactive algorithm brings no overhead to the classical single thread case;for parallel I/Os the space and time complexities are trivial and irrelevant to the number of concurrent streams.The proactive algorithm is equipped with an efficient sequential detection logic,which guarantees the detection of sequential streams and hot random read areas.Its adaptive readahead size can prevent readahead thrashing and hence achieve better performance.Experiments show that the proposed algorithms perform much better than legacy Linux readahead:Network throughput is 3~4 times better in the case of thrashing; Sequential reads intermixed with random ones are faster by 29%;I/O throughput of interleaved streams could be 4~27 times better,while the application visible I/O latencies are improved by up to 35 times;On serving large files via NFS,throughput could be 1.8 times better;On serving large files with lighttpd,the disk utilization is decreased by 26%while providing 17%more network throughput;On randomly populating a file,the proactive algorithm performs 3 time better than the legacy one.Our prefetching algorithms have real-world relevance,among them the demand prefetching framework and reactive sequential prefetching algorithm has been adopted by Linux kernel 2.6.23 and 2.6.24.
引文
Greg Banks.Nfs tuning secrets.Technical report,sgi,2008.URL http://mirror.linux.org.au/pub/linux.conf.au/2 0 0 8/slides/130-lca2008-nfs-tuning-secrets-d7.odp.
    Suparna Bhattacharya,John Tran,Mike Sullivan,and Chris Mason.Linux aio performance and robustness for enterprise workloads.In Proceedings of the Linux Symposium,volume 1,pages 63-78,Ottawa,Ontario,Canada,July 2004.
    Daniel P.Bovet and Marco Cesati.Understanding the Linux Kernel,3rd Edition.O'Reilly,2005.ISBN 0-596-00565-2.
    Angela Demke Brown,Todd C.Mowry,and Orran Krieger.Compiler-based i/o prefetching for out-of-core applications.ACM Transactions on Computer Systems,19:111-170,2001.doi:10.1145/377769.377774.URL http://portal,acm.org/citation.cfm?id=377774.
    Randal E.Bryant and David R.O'Hallaron.Computer Systems:A Programmer's Perspective.Prentice Hall,2003.ISBN 0-13-034074-X.
    Ali R.Butt,Chris Gniady,and Y.Charlie Hu.The performance impact of kernel prefetching on buffer cache replacement algorithms.IEEE Transactions on Computers,56(7):889—908,July 2007.ISSN 0018-9340.doi:10.1109/TC.2007.1029.
    Pei Cao,Edward W.Felten,Anna R.Karlin,and Kai Li.A study of integrated prefetching and caching strategies.In Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems,pages 188-197,Ottawa,Ontario,Canada,1995.ACM.ISBN 0-89791-695-6.doi:10.1145/223587.223608.URL http://portal.acm.org/ citation.cfm?id=223608.
    Pei Cao,Edward W.Felten,Anna R.Karlin,and Kai Li.Implementation and performance of integrated application-controlled file caching,prefetching,and disk scheduling.ACM Transactions on Computer Systems,14:311-343,1996.doi:10.1145/235543.235544.URL http://portal.acm.org/citation.cfm?id=235544.
    Fay W.Chang and Garth A.Gibson.Automatic i/o hint generation through speculative execution.In Operating Systems Design and Implementation,pages 1-14,1999.URL http://citeseer.ist.psu.edu/chang99automatic.html.
    P0eter M.Chen,Edward K.Lee,Garth A.Gibson,Randy H.Katz,and David A.Patterson.Raid:high-performance,reliable secondary storage.ACM Computing Surveys,26(2):145-185,1994.ISSN 0360-0300.doi:http://doi.acm.org/10.1145/176979.176981.
    Gianluca Dini,Giuseppe Lettieri,and Lanfranco Lopriore.Caching and prefetching algorithms for programs with looping reference patterns.The Computer Journal,49:42-61,2006.doi:10.1093/comjnl/bxh140.URL http://comjnl.oxfordjournals.org/cgi/content/abstract/49/1/42.
    Douglas Dumitru.Understanding flash ssd performance.Technical report,EasyCo LLC,2007.URL http://managedflash.com/news/papers/easyco-flashperformance-art.pdf.
    Daniel Ellard and Margo Seltzer.Nfs tricks and benchmarking traps.In Proceedings of the FREENIX 2003 Technical Conference,pages 101-114,San Antonio,TX,June 2003.URL http://citeseer.ist.psu.edu/ellard03nfs.html.
    Daniel Ellard,Jonathan Ledlie,Pia Malkani,and Margo Seltzer.Passive NFS tracing of email and research workloads.In Proceedings of the Second USENIX Conference on File and Storage Technologies (FAST'03),pages 203-216,San Francisco,CA,March 2003.URL http://citeseer.ist.psu.edu/ellard03passive.html.
    Behdad Esfahbod.Preload-An Adaptive Prefetching Daemon.PhD thesis,Graduate Department of Computer Science,University of Toronto,2006.
    Wu-Chun Feng.The importance of being low power in high-performance computing.Technical report,Los Alamos National Laboratory,August 2005.
    Keir Fraser and Fay Chang.Operating system i/o speculation:How two invocations are faster than one.In Proceedings of the General Track:2003 USENIX Annual Technical Conference,San Antonio,Texas,USA,June 2003.URL http://citeseer.ist.psu.edu/674785.html.
    John F.Gantz,David Reinsel,Christopher Chute,Wolfgang Schlichting,John McArthur,Stephen Minton,Irida Xheneti,Anna Toncheva,and Alex Manfrediz.The expanding digital universe-a forecast of worldwide information growth through 2010.Technical report,IDC,March 2007.
    John F.Gantz,Christopher Chute,Alex Manfrediz,Stephen Minton,David Reinsel,Wolfgang Schlichting,and Anna Toncheva.The diverse and exploding digital universe-an updated forecast of worldwide information growth through 2011.Technical report,IDC,March 2008.
    Binny S.Gill and Luis Angel D.Bathen.Optimal multistream sequential prefetching in a shared cache.ACM Transactions on Storage,3(3):10,2007.ISSN 1553-3077.doi:http://doi.acm.org/10.1145/1288783.1288789.
    Binny S.Gill and Dharmendra S.Modha.Sarc:sequential prefetching in adaptive replacement cache.In ATEC'05:Proceedings of the USENIX Annual Technical Conference 2005 on USENIX Annual Technical Conference,pages 33-33,Berkeley,CA,USA,2005.USENIX Association.
    Ed Grochowski.Hdd technology roadmap.Technical report,Hitachi San Jose Research Center,2003.URL http://www.hitachigst.com/hdd/hddpdf/tech/hdd_technology2003.pdf.
    Windsor W.Hsu,Alan Jay Smith,and Honesty C.Young.I/o reference behavior of production database workloads and the tpc benchmarks-an analysis at the logical level.ACM Transactions on Database Systems,26:96-143,2001.doi:10.1145/383734.383737.URL http://portal.acm.org/citation.cfm?id=383737.
    Song Jiang,Feng Chen,and Xiaodong Zhang.Clock-pro:an effective improvement of the clock replacement.In ATEC '05:Proceedings of the annual conference on USENIX Annual Technical Conference,pages 35-35,Berkeley,CA,USA,2005.USENIX Association.
    Greg Kroah-Hartman,Jonathan Corbet,and Amanda McPherson.Linux kernel development:How fast is it going,who is doing it and who is sponsoring it? Technical report,Linux Foundation,2008.URL https://www.linuxfoundation.org/publications/linuxkerneldevelopment.php.
    Tom M.Kroeger and Darrell D.E.Long.Design and implementation of a predictive file prefetching
    ?algorithm.In Proceedings of the General Track:2002 USENIX Annual Technical Conference,pages
    105-118,Berkeley,CA,USA,2001.USENIX Association.ISBN 1-880446-09-X.
    
    Hui Lei and Dan Duchamp.An analytical approach to file prefetching.In 1997 USENIX Annual Technical Conference,Anaheim,California,USA,1997.URL http://citeseer.ist.psu.edu/article/le i97analyticai.html.
    Chuanpeng Li and Kai Shen.Managing prefetch memory for data-intensive online servers.In FAST'05:Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies,pages 19-19,Berkeley,CA,USA,2005.USENIX Association.
    Chuanpeng Li,Kai Shen,and Athanasios E.Papathanasiou.Competitive prefetching for concurrent sequential i/o.In Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2007,pages 189-202,Lisbon,Portugal,2007.ACM.ISBN 978-1-59593-636-3.doi:10.1145/1272996.1273017.URL http://portal.acm.org/citation,cfm?id=1273017.
    Zhenmin Li,Zhifeng Chen,and Yuanyuan Zhou.Mining block correlations to improve storage performance.ACM Transactions on Storage,1:213-245,2005.doi:10.1145/1063786.1063790.URL http://portal,acm.org/citation.cfm?id=106 3790.
    Shuang Liang,Song Jiang,and Xiaodong Zhang.Step:Sequentiality and thrashing detection based prefetching to improve performance of networked storage servers.In ICDCS '07:Proceedings of the 27th International Conference on Distributed Computing Systems,page 64,Washington,DC,USA,2007.IEEE Computer Society.ISBN 0-7695-2837-3.doi:http://dx.doi.org/10.1109/ICDCS.2007.141.
    Thad Mcllroy and Arcadia House.Thriving in the digital age - a guide for printers.Technical report,Arcadia House Electronic Publishers,Inc.,2006.URL http://thefutureofpublishing.com/documents/Presentations/Thriving_in_the_Digital Age,pdf.
    Bill Moore.Zfs—the last word in file systems.Technical report,Sun Microsystems,2006.URL http://opensolaris.org/os/community/zfs/docs/zfs_last.pdf
    Ram Pai,Badari Pulavarty,and Mingming Cao.Linux 2.6 performance improvement through readahead optimization.In Proceedings of the Linux Symposium,volume 2,pages 391-402, Ottawa,Ontario,Canada,July 2004.URL http://www.linuxsymposium.org/2004/ view_abstract.php?content_key=54.
    Athanasios E.Papathanasiou and Michael L.Scott.Aggressive prefetching:An idea whose time has come.In Proceedings of the 10th Workshop on Hot Topics in Operating Systems (HotQS),2005.
    David A.Patterson,Garth Gibson,and Randy H.Katz.A case for redundant arrays of inexpensive disks (raid).SIGMOD Rec,17(3):109-116,1988.ISSN 0163-5808.doi:http://doi.acm.org/10.1145/971701.50214.
    R.H.Patterson,G.A.Gibson,E.Ginting,D.Stodolsky,and J.Zelenka.Informed prefetching and caching.In Proceedings of the fifteenth ACM symposium on Operating systems principles,pages 79-95,Copper Mountain,Colorado,United States,1995.ACM.ISBN 0-89791-715-4.doi:10.1145/224056.224064.URL http://portal.acm.org/citation.cfm?id=224064.
    R.Hugo Patterson,Garth A.Gibson,and M.Satyanarayanan.A status report on research in transparent informed prefetching.ACM SIGOPS Operating Systems Review,27(2):21-34,1993.ISSN 0163-5980.doi:http://doi.acm.org/10.1145/155848.155855.
    Jehan-Francois Paris,Ahmed Amer,and Darrell D.E.Long.A stochastic approach to file access prediction.In Proceedings of the international workshop on Storage network architecture and parallel I/Os,pages 36-40,New Orleans,Louisiana,2003.ACM,doi:10.1145/1162618.1162623.URL http://portal.acm.org/citation.cfm?id=1162623.
    Sonny Rao,Dominique Heger,and Steven Pratt.Examining linux 2.6 page-cache performance.In Proceedings of the Linux Symposium,volume 2,pages 79-90,Ottawa,Ontario,Canada,July 2005.
    Chris Ruemmler and John Wilkes.UNIX disk access patterns.In Usenix Conference,pages 405-420,Winter 1993.URLhttp://citeseer.ist.psu.edu/ruemmler93unix.html.
    Hugo Patterson Ⅲ Russel.Informed Prefetching and Caching.PhD thesis,School of Computer Science,Carnegie Mellon University,Pittsburgh,PA 15213,USA,December 1997.
    Patrick Schmid.15 years of hard drive history:Capacities outran performance,November 2006.URL http://www.tomshardware.com/reviews/15-years-of-hard-drive-history,1368.html.
    Elizabeth Shriver,Christopher Small,and Keith A.Smith.Why does file system prefetching work?In ATEC'99:Proceedings of the Annual Technical Conference on 1999 USENIX Annual Technical Conference,pages 71-84,Berkeley,CA,USA,1999.USENIX Association.
    Evgenia Smirni and Daniel Reed.Lessons from characterizing the input/ output behavior of parallel scientific applications.Performance Evaluation,33:27-44(18),June 1998.doi:10.1016/S0166-5316(98)00009-1.URL http://www.ingentaconnect.com/content/els/01665316/1998/00000033/00000001/art00009.
    Gary A.S.Whittle,Jehan-Francois P(?)ris,Ahmed Amer,Darrell D.E.Long,and Randal Bums.Using multiple predictors to improve the accuracy of file access predictions.In MSS '03:Proceedings of the 20 th IEEE/11 th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03),page 230,Washington,DC,USA,2003.IEEE Computer Society.ISBN 0-7695-1914-8.
    Fengguang Wu,Hongsheng Xi,Jun Li,and Nanhai Zou.Linux readahead:less tricks for more.In Proceedings of the Linux Symposium,volume 2,pages 273-284,Ottawa,Ontario,Canada,June 2007.
    Fengguang Wu,Hongsheng Xi,and Chenfeng Xu.On the design of a new linux readahead framework.ACM SIGOPS Operating Systems Review,42(5):75-84,July 2008.ISSN 0163-5980.
    Jianyong Zhang,Anand Sivasubramaniam,Hubertus Franke,Natarajan Gautam,Yanyong Zhang,and Shailabh Nagar.Synthesizing representative i/o workloads for tpc-h.In Proceedings of the 10th International Symposium on High Performance Computer Architecture,page 142.IEEE Computer Society,2004.ISBN 0-7695-2053-7.URL http://portal.acm.org/citation.cfm?id=1072464.
    周应超.Linux内核的文件cache管理机制介绍.Technical report,中科院计算所,2006.URL http://www.ibm.com/developerworks/cn/linux/1-cache/.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700