Extracting class diagram from hidden dependencies in data set

Bogumiła Hnatkowska, Zbigniew Huzar, Lech Tuzinkiewicz


A conceptual model is a high-level, graphical representation of a specific domain, presenting its key concepts and the relationships between them. In particular, such dependencies can be inferred from instances of the concepts that form part of large raw data files. The paper proposes a method for constructing a conceptual model from data frames contained in data files. The result is presented in the form of a class diagram. The method is explained with several examples and verified by a case study in which real data sets are processed. It can also be applied to check the quality of a data set.
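The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of inferring relationships from instances in a flat data file. It is a minimal Python sketch under stated assumptions, not the authors' method. It detects single-column functional dependencies in a CSV data frame, groups mutually dependent columns into candidate classes, and reports one-way dependencies between those classes as many-to-one associations. It also shows how an expected dependency can serve as a data-quality check. The sample data, the function names `determines`, `violations` and `extract_model`, and the grouping heuristic are all hypothetical.

```python
import csv
import io
from itertools import permutations

# Hypothetical flat data file: one row per order line.
SAMPLE = """order_id,customer_id,customer_name,product_id,product_name,quantity
O1,C1,Alice,P1,Pen,2
O1,C1,Alice,P2,Ink,1
O2,C2,Bob,P1,Pen,1
O3,C1,Alice,P2,Ink,3
"""


def determines(rows, a, b):
    """True if column a functionally determines column b in this data set,
    i.e. every value of a co-occurs with exactly one value of b."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def violations(rows, a, b):
    """Values of a that co-occur with more than one value of b (sorted).
    Useful as a data-quality check for a dependency a -> b expected to hold."""
    values = {}
    for row in rows:
        values.setdefault(row[a], set()).add(row[b])
    return {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}


def extract_model(rows, columns):
    """Heuristic sketch (an assumption, not the paper's algorithm): columns
    that determine each other form one candidate class, and a one-way
    dependency between two classes becomes a many-to-one association."""
    deps = {(a, b) for a, b in permutations(columns, 2) if determines(rows, a, b)}
    classes = []
    for col in columns:
        if not any(col in group for group in classes):
            group = [col] + [other for other in columns
                             if (col, other) in deps and (other, col) in deps]
            classes.append(group)
    # Each class is represented by its first column when checking dependencies.
    associations = [(src[0], dst[0]) for src in classes for dst in classes
                    if src is not dst and (src[0], dst[0]) in deps]
    return classes, associations


reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
classes, associations = extract_model(rows, reader.fieldnames)
print(classes)       # [['order_id'], ['customer_id', 'customer_name'], ['product_id', 'product_name'], ['quantity']]
print(associations)  # [('order_id', 'customer_id')]

# Quality check: a customer id that appears with two different names.
dirty = rows + [{"order_id": "O4", "customer_id": "C2", "customer_name": "Robert",
                 "product_id": "P1", "product_name": "Pen", "quantity": "4"}]
print(violations(dirty, "customer_id", "customer_name"))  # {'C2': ['Bob', 'Robert']}
```

The sketch reads the result as an Order class linked many-to-one to a Customer class, with a separate Product class. It deliberately stays simple. Only single-column dependencies are tested, so `quantity`, which depends on the composite (`order_id`, `product_id`), is left as an isolated group. On small samples, dependencies can also hold by accident: `customer_name` determines `customer_id` here only because the names happen to be unique.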


conceptual model; class diagram; UML; data retrieval; raw data; CSV

2019/10/28


DOI: https://doi.org/10.7494/csci.2020.21.2.3483

