Multiple task transfer learning with small sample sizes
  • Authors: Budhaditya Saha; Sunil Gupta; Dinh Phung; Svetha Venkatesh
  • Keywords: Multi-task; Transfer learning; Optimization; Healthcare; Data mining; Statistical analysis
  • Journal: Knowledge and Information Systems
  • Year: 2016
  • Published: February 2016
  • Volume: 46
  • Issue: 2
  • Pages: 315–342
  • Full text size: 1,257 KB
  • Affiliations: Budhaditya Saha (1)
    Sunil Gupta (1)
    Dinh Phung (1)
    Svetha Venkatesh (1)

    1. Center for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Geelong, Australia
  • Journal category: Computer Science
  • Journal subjects: Information Systems and Communication Service;
    Business Information Systems
  • Publisher: Springer London
  • ISSN: 0219-3116
Abstract
Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification from data with small sample sizes. Conceptually, our solution is a hybrid of multi-task and transfer learning: it employs data samples from source tasks, as in transfer learning, but considers all tasks together, as in multi-task learning. Each task is modelled jointly with its related tasks by directly augmenting its training data with data from those tasks. The degree of augmentation depends on task relatedness and is estimated directly from the data. We apply the model to three diverse real-world data sets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We further extend the model to online multi-task learning, where the model parameters are incrementally updated as new data or new tasks arrive. The novelty of our method lies in offering a hybrid multi-task/transfer learning model that exploits sharing across tasks at the data level in combination with joint parameter learning.
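
The core idea of the abstract (augment a small-sample target task's training set with samples from other tasks, weighted by an estimated task relatedness, then learn the parameters on the combined data) can be illustrated with a minimal sketch. The Python snippet below is our own illustration, not the paper's formulation: the logistic base learner, the cosine-similarity relatedness proxy, and the names fit_single, relatedness and fit_augmented are all assumptions; the paper estimates the degree of augmentation directly from the data within its own optimization.

    # Minimal sketch of data-level task sharing, assuming numpy and scikit-learn.
    # The relatedness estimate (cosine similarity of independently fitted weight
    # vectors, clipped to [0, 1]) is a simple stand-in for the paper's
    # data-driven estimator.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_single(X, y):
        # Ordinary L2-regularized logistic model on one task's own data.
        return LogisticRegression(max_iter=1000).fit(X, y)

    def relatedness(w_s, w_t):
        # Hypothetical proxy: cosine similarity of weight vectors, clipped at 0.
        c = float(w_s @ w_t) / (np.linalg.norm(w_s) * np.linalg.norm(w_t) + 1e-12)
        return max(c, 0.0)

    def fit_augmented(tasks, target):
        # Fit the target task on its own samples (weight 1) plus samples from
        # every other task, down-weighted by the estimated relatedness.
        base = {k: fit_single(X, y).coef_.ravel() for k, (X, y) in tasks.items()}
        X_t, y_t = tasks[target]
        Xs, ys, ws = [X_t], [y_t], [np.ones(len(y_t))]
        for k, (X, y) in tasks.items():
            if k == target:
                continue
            rho = relatedness(base[k], base[target])  # degree of augmentation
            if rho > 0:
                Xs.append(X)
                ys.append(y)
                ws.append(np.full(len(y), rho))
        return LogisticRegression(max_iter=1000).fit(
            np.vstack(Xs), np.concatenate(ys), sample_weight=np.concatenate(ws))

    # Toy usage: two larger source tasks and one 15-sample target task.
    rng = np.random.default_rng(0)

    def make_task(n, flip=0.1):
        X = rng.normal(size=(n, 5))
        y = (X @ np.array([1.0, 1.0, 0.5, 0.0, 0.0]) > 0).astype(int)
        return X, np.where(rng.random(n) < flip, 1 - y, y)

    tasks = {"source_a": make_task(200), "source_b": make_task(200),
             "target": make_task(15)}
    model = fit_augmented(tasks, "target")

Because the sharing happens at the data level, any learner that accepts per-sample weights can be dropped in for the logistic model, which is what makes the hybrid view (transfer of source samples, multi-task-style joint weighting) straightforward to prototype.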
