Research and Practice of an Enterprise Data Rationalization Approach
Abstract
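Enterprise information systems are now in place at most companies, and enterprise data has become a major resource for sustaining normal operations and building competitive advantage. However, constantly changing internal and external environments mean that an enterprise's demand for data keeps growing over the long term, while the complex relationships among organizational units, processes, and locations mean that this demand tends to surface in scattered, partial form. In addition, the phased nature of enterprise informatization accelerates the repeated appearance of data silos. Together, these factors make the enterprise data environment increasingly complex.
     The problems caused by a poor enterprise data environment are evident: on the one hand, business process efficiency declines and information systems become less trustworthy; on the other, business intelligence applications such as data warehousing and data mining face considerable risk and cost. Current research in related areas, in China and abroad, falls mainly into two fields: data quality and data integration. Data quality research emphasizes theoretical methods such as the definition of quality dimensions and means of measurement, and neglects the question of whether data is fit for use. Data integration research emphasizes reconciling existing heterogeneous data semantically and syntactically, and offers a variety of solutions that are static and local in scope. Drawing on problems that enterprises actually face, this thesis aims to propose an enterprise data rationalization…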
Enterprise data is becoming an important resource for business operations and for building corporate competitiveness. Because the business environment keeps changing, the evolution of enterprise data requirements is a long-term, continuing process. In addition, differing data requirements across departments, business processes, and locations lead to distributed data storage and usage. The fact that corporate information systems are largely planned and implemented in phases also introduces heterogeneity into data systems. These three factors make the enterprise data environment increasingly complex. A poorly organized data environment causes many problems, including lower business process efficiency, reduced credibility of information systems, and added cost and risk in data warehousing and data mining projects.
    Current approaches fall into two categories: data quality and data integration. Data quality approaches focus on defining, measuring, and improving quality dimensions, with few methods addressing the "fitness for use" of enterprise data. Approaches proposed in the data integration literature aim to reconcile heterogeneous, autonomous, and distributed data both semantically and syntactically. These approaches solve inconsistency and quality problems to some degree, but often in a static and partial way.
    This thesis proposes a composite approach, the Enterprise Data Rationalization Approach, which meets enterprise data requirements and improves both data integrity and data quality in a dynamic and comprehensive way. The approach separates enterprise core data from operational data, adopts flexible data modeling techniques and a central architecture for core data, and implements data enrichment workflow management together with a metadata repository and services to ensure the quality of core data.
    The proposed approach has three data rationalization enablers: (1) core data separation, a central data repository, and flexible data modeling techniques, which together meet multi-dimensional and changing data requirements; (2) a data enrichment workflow management system, which ensures the quality and traceability of data processing and supports customizable workflow definitions to align with process reengineering; and (3) a metadata repository and services, which store and manage business and technical knowledge and provide quality assurance. The thesis also presents an example implemented by a multinational company based on the proposed approach, illustrating how the three enablers combine to reengineer and rationalize a corporate data environment.
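
The three enablers named in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the design described in the thesis: every class, field, and rule name below (CoreRecord, MetadataRepository, EnrichmentWorkflow, the "country" and "tax_id" attributes) is hypothetical. It shows a core record with open-ended attributes, an ordered enrichment workflow that logs each step for traceability, and a metadata repository whose rules act as a quality check.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

# All names here are hypothetical illustrations; none come from the thesis.


@dataclass
class CoreRecord:
    """A core data record held centrally, apart from operational data."""
    key: str
    # Flexible modeling: attributes are open-ended and can be added later.
    attributes: dict[str, Any] = field(default_factory=dict)
    # Traceability: names of the enrichment steps applied so far.
    history: list[str] = field(default_factory=list)


@dataclass
class MetadataRepository:
    """Holds one quality rule per attribute name."""
    rules: dict[str, Callable[[Any], bool]] = field(default_factory=dict)

    def violations(self, record: CoreRecord) -> list[str]:
        # An attribute fails if it is missing or its rule rejects the value.
        return [
            name
            for name, rule in self.rules.items()
            if name not in record.attributes or not rule(record.attributes[name])
        ]


@dataclass
class EnrichmentWorkflow:
    """An ordered, customizable list of (step name, step function) pairs."""
    steps: list[tuple[str, Callable[[CoreRecord], None]]]

    def run(self, record: CoreRecord, metadata: MetadataRepository) -> list[str]:
        for name, step in self.steps:
            step(record)
            record.history.append(name)  # log which step touched the record
        # Report the attributes that still fail their quality rules.
        return metadata.violations(record)


metadata = MetadataRepository(rules={
    "country": lambda v: isinstance(v, str) and len(v) == 2,
    "tax_id": lambda v: isinstance(v, str) and v != "",
})
workflow = EnrichmentWorkflow(steps=[
    ("sales: add country", lambda r: r.attributes.update(country="DE")),
])
record = CoreRecord(key="customer-001")
remaining = workflow.run(record, metadata)
print(remaining)       # ['tax_id']
print(record.history)  # ['sales: add country']
```

In this sketch, changing a business process means editing the list of steps rather than the record structure, which is one plausible reading of aligning workflow definitions with process reengineering.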
