Outlier Detection Methods for Genetic Data Analysis Based on Mixed Linear Models
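Abstract
Genetic data analysis with mixed linear models is a challenge for statisticians and geneticists alike, because linear, quadratic, and likelihood-based estimation methods are all strongly affected by outliers in the independent or dependent variables. The only way to understand how outliers affect the results of an analysis is through repeated data quality checks and model refinement. With this in mind, this study uses MINQUE (minimum norm quadratic unbiased estimation) and AUP (adjusted unbiased prediction), together denoted Method I, to propose an outlier detection method for genetic data analysis with mixed linear models. The method is compared with an approach based on the EM algorithm and BLUP (best linear unbiased prediction), denoted Method II, and is then validated through the analysis of two real data sets.
     The study first demonstrates the method on a commonly used genetic model (with variety, year, and location) and introduces a set of statistics to assess how strongly outliers affect the results: Cook's distance (CD(β)), the Andrews-Pregibon statistic (AP), the Cook-Weisberg statistic (CW), and the variance ratio (VR) assess the influence of a data point on the fixed effects of the mixed linear model, while Cook's distance (CD(e)) assesses its influence on the random effects. A simulation program was written in C++; simulated data were generated by the Monte Carlo method, a number of outliers were set at random, and the proposed method was applied to detect them, in order to test its effectiveness and reliability. The results show that, using these diagnostic statistics, both Method I and Method II detected the outliers artificially planted in the simulated data, with similar detection ability.
     In addition, Method I and Method II were both applied to data containing no outliers in order to compare their false positive rates. The diagnostic statistics obtained with Method I were more stable than those obtained with Method II, so Method I is the more robust of the two for outlier detection. Outliers were also set for specific combinations of variety, year, and location in the simulated data. In most cases both methods detected these outliers; in some examples Method I had greater detection power, while in others Method II performed better. The main results can be summarized as follows:
     1) The proposed method detects aberrant phenotypic values in mixed linear models well. If the model contains only a few scattered aberrant observations, both Method I and Method II can detect them. If, however, a variety has several outliers at the same location in the same year, these outliers cannot be detected, and correct observations are flagged as outliers instead.
     2) Based on this method, a set of computer programs was written in C++ for genetic data analysis with mixed linear models. The programs detect aberrant observations and rank them by the P-value of the statistical test. They also provide estimates of the variance components and predictions of the random effects in the model.
     3) In the results for the commonly used genetic model, some outliers went undetected because they were masked by other outliers, while some normal observations were mistaken for outliers under the influence of several other outliers.
     4) In the worked example for the commonly used genetic model, outliers could seriously affect the estimation of fixed effects and the prediction of random effects, and removing them could greatly improve the parameter estimates of the model. For QTL mapping data, removing outliers made it possible to detect additional QTLs and improved the estimate of heritability. Both real-data analyses showed that removing outliers improves the parameter estimates, although we cannot arbitrarily conclude that the removed outliers have no biological meaning.
     5) The proposed method can also be extended to more complex genetic models, such as the additive-dominance model and the additive-dominance-maternal effect model, to analyze the effect of outliers on genetic and non-genetic effects. It can likewise be applied to microarray data analysis, to detect aberrant data caused by machine calibration, data entry, or coding during data collection.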
Genetic data analysis with mixed linear models is challenging for statisticians and geneticists alike, because it has traditionally relied on linear, quadratic, and likelihood-based estimation methods, none of which is robust to aberrant cases in the response or in the factor space. Careful inspection, through repeated data quality checks and model specification, is the only way to understand the effect of unusual data points on the results of an analysis. With this in mind, the present study proposes a technique for detecting unusual data points in mixed linear models for genetic data analysis, based on adjusted unbiased prediction (AUP) via the minimum norm quadratic unbiased estimation (MINQUE) method (Method-I). To check its validity, the proposed method was compared with best linear unbiased prediction (BLUP) via the expectation-maximization (EM) algorithm (Method-II). In addition, two real data sets were analyzed to examine the consequences of influential observations and outliers in biological research.
     A general genetic model was used to illustrate the proposed method and to compare it with the existing method across several influence diagnostic statistics. Four statistics, namely the analogue of Cook's distance (CD(β)), the Andrews-Pregibon statistic (AP), the Cook-Weisberg statistic (CW), and the variance ratio (VR), were applied to detect data points influencing the fixed effects of the mixed linear model, while the analogue of Cook's distance (CD(e)) was used to inspect data points affecting its random components. To check the efficacy and reliability of the proposed method, Monte Carlo simulations were conducted with varying settings of aberrant observations in the phenotype data of the general genetic model. All simulations were performed with a program written in C++. It was not rigorously proved that Method-I performs better than Method-II, or vice versa. Under these diagnostic statistics, both methods showed almost the same detection ability and the same trends regarding aberrant observations in the response, for the influence of the i-th data point on both the fixed and the random components of the mixed linear model.
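     As a rough illustration of this simulation design, the following Python sketch plants one aberrant value in simulated data and ranks the observations by a case-deletion Cook's distance. It is a deliberately simplified stand-in: it assumes an ordinary fixed-effects regression rather than the mixed linear model, and it does not implement MINQUE/AUP or EM/BLUP. The function names, sample size, shift, and seed are hypothetical choices made for this example only.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cooks_distances(x, y):
    """Case-deletion Cook's distance for each observation (p = 2 parameters)."""
    n, p = len(x), 2
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    # Residual variance estimated from the full-data fit.
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Refit without observation i, then measure how far all fitted values move.
        ai, bi = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        moved = sum((fi - (ai + bi * xi)) ** 2 for fi, xi in zip(fitted, x))
        distances.append(moved / (p * s2))
    return distances


def simulate(n=20, planted=10, shift=50.0, seed=1):
    """Simulate a straight line with unit-variance noise and one planted outlier."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]
    y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
    y[planted] += shift  # hypothetical perturbation of a single response value
    return x, y


x, y = simulate()
d = cooks_distances(x, y)
flagged = max(range(len(d)), key=d.__getitem__)
print("observation with the largest Cook's distance:", flagged)
```

     With a shift this large relative to the noise, the planted observation dominates the ranking; smaller shifts, or several planted values at once, make the ranking much less clear-cut.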
     In the present study, the two methods were also compared for false positive rate using a clean data set. The values of each influence diagnostic statistic, for both the fixed and the random components of the general genetic model (a mixed linear model), were more tightly clustered under Method-I than under Method-II. This suggests that the proposed method (Method-I) is the more robust of the two and gives some confidence that it will perform better in identifying aberrant observations. In simulations with different perturbations of the phenotype data across various genotypes, locations, and years, Method-I showed the same trends as Method-II and agreed closely with it under a variety of influence diagnostic statistics. In some situations, however, Method-I produced larger magnitudes for some of the influence statistics, and in others Method-II did.
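     The idea of a false positive rate on clean data can be sketched as follows. This is only an illustration under strong assumptions: it uses an ordinary regression with the closed-form Cook's distance, and the cut-off of 4/n is a common rule of thumb adopted here for the example, not the significance test used in the study. All names and settings are hypothetical.

```python
import random


def cooks_closed_form(x, y):
    """Cook's distance from residuals and leverages for y = a + b*x (p = 2)."""
    n, p = len(x), 2
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    # Leverage (diagonal of the hat matrix) for simple linear regression.
    lev = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, lev)]


def false_positive_rate(n=20, replicates=200, seed=7):
    """Share of clean observations flagged across simulated clean data sets."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]
    cutoff = 4.0 / n  # rule-of-thumb cut-off, assumed for this example only
    flagged = 0
    for _ in range(replicates):
        # No outliers are planted, so every flag is a false positive.
        y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
        flagged += sum(d > cutoff for d in cooks_closed_form(x, y))
    return flagged / (n * replicates)


rate = false_positive_rate()
print(f"false positive rate on clean data: {rate:.3f}")
```

     Running the same loop for two competing diagnostics and comparing the resulting rates, or the spread of the statistics themselves, mirrors the kind of comparison described above.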
     The main results from the simulations and the real data sets are summarized as follows:
     1. Our approach was verified to perform well in identifying aberrant observations, where they exist, in the response vector of a mixed linear model. If there is only one aberrant observation in the phenotype data, for any genotype at any location or in any year, it can be detected successfully with any of the influence diagnostic statistics under both methods. If there are multiple influential observations in the phenotype data of a general genetic model, some of them can be detected effectively by both methods, while for others the influence diagnostic statistics show some noise.
     2. A program written in C++ was developed to identify influential observations and outliers in the data analysis of a general genetic experiment within the framework of the mixed linear model. The program also provides estimates of the variance components and predictions of the random effects in the model. In addition, it reports the significance (P-value) of each individual observation in a data set.
     3. The results for the general genetic model, analyzed within the framework of the mixed linear model, showed both masking and swamping effects when multiple unusual data points were present in the phenotype values.
     4. In the worked example (a general genetic experiment), the presence of influential observations and outliers was observed to badly distort the estimates of variance components and the predictions of random effects (breeding values). Removing these data points can change the parameter estimates of a mixed linear model drastically and yield more useful results. In the QTL mapping data, the results demonstrate that a clean data set allows additional QTLs with individual effects to be identified, and that improved estimates of phenotypic variation (heritability), particularly of the residual component, can be obtained in the absence of influential observations and outliers. In general, in both data sets analyzed, removing influential observations and outliers substantially changed the estimates of various parameters of the mixed linear model. However, this does not imply that outliers and influential observations are biologically meaningless data points.
     5. The method can easily be extended to more complex genetic models, e.g. additive-dominance and additive-dominance-maternal models, to study the effect of unusual data points on the various genetic and non-genetic effects in the mixed linear model. It can also be used in microarray data analysis based on the mixed linear model approach, to identify hidden peculiarities that are caused by machine, data entry, or recording errors, or that may reflect differentially expressed (or unexpressed) genes.
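     The masking effect mentioned in the results above can be seen in a very small, fully deterministic example. The sketch below assumes a mean-only model and a deletion (externally studentized) residual, which is far simpler than the mixed-model diagnostics of the study; the data values and the cut-off of 3.0 are hypothetical and chosen only to make the effect visible.

```python
import math


def deletion_t(values, i):
    """Externally studentized residual of values[i] under a mean-only model."""
    rest = values[:i] + values[i + 1:]
    m = len(rest)
    mean_rest = sum(rest) / m
    var_rest = sum((v - mean_rest) ** 2 for v in rest) / (m - 1)
    # The deleted value is compared with the mean and spread of the remaining data.
    return (values[i] - mean_rest) / math.sqrt(var_rest * (1.0 + 1.0 / m))


clean = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
one_outlier = clean + [20.0]
two_outliers = clean + [20.0, 20.0]

CUTOFF = 3.0  # arbitrary illustrative cut-off, not a value from the study

t_single = deletion_t(one_outlier, 6)   # the aberrant value is alone
t_masked = deletion_t(two_outliers, 6)  # a second aberrant value remains in the data

print("single outlier flagged:", abs(t_single) > CUTOFF)
print("same value flagged when a second outlier is present:", abs(t_masked) > CUTOFF)
```

     When the aberrant value is alone, deleting it leaves a tight cluster and the statistic is far above the cut-off. When a second identical value remains in the data, it inflates both the mean and the variance of the remaining observations, and the statistic falls below the cut-off: one outlier masks the other under single-case deletion.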