Optimizing R with SparkR on a commodity cluster for biomedical research


Abstract

• R is a popular environment for clinical data analysis, but it does not directly support big data workloads.
• Both the Message Passing Interface (MPI) and SparkR allow computationally demanding workloads to be parallelized on clusters.
• SparkR offers elastic resources, even on non-dedicated hardware, and tight integration with Hadoop distributed services.
• SparkR requires only minimal changes to existing R code to enable parallel execution.
• Computation in SparkR scales better than with MPI due to optimized data communication.
