A reflection on the impact of model mining from GitHub
Universidad Rey Juan Carlos, Spain.
TU Eindhoven, Netherlands.
RISE Research Institutes of Sweden, Digital Systems, Mobility and Systems. ORCID iD: 0000-0001-5656-9253
Universität Rostock, Germany.
2023 (English). In: Information and Software Technology, ISSN 0950-5849, E-ISSN 1873-6025, Vol. 164, article id 107317. Article in journal (Refereed), Published
Abstract [en]

Context: Since 1998, the ACM/IEEE 25th International Conference on Model Driven Engineering Languages and Systems (MODELS) has been studying all aspects surrounding modeling in software engineering, from languages and methods to tools and applications. To enable empirical studies, the MODELS community needed examples of models, especially models used in real software development projects. Such models may be used for a range of purposes, mostly related to domain analysis and software design (at various levels of abstraction). However, finding such models was very difficult. The most used ones originated in academic books or student projects, which addressed “artificial” applications, i.e., were not based on real-case scenarios. To address this issue, the authors of this reflection paper, members of the modeling and of the mining software repositories fields, came together with the aim of creating a dataset with an abundance of modeling projects by mining GitHub. To scope our effort, we targeted models represented using the UML notation, because this is the lingua franca in practice for software modeling. As a result, almost 100k models from 22k projects were made publicly available, known as the Lindholmen dataset.
Objective: In this paper, we analyze the impact of our research and compare it to what we envisioned in 2016. We draw practical lessons from this effort, reflect on the perils and pitfalls of the dataset, and point out promising avenues of research.
Method: We base our reflection on a systematic analysis of recent research literature, especially those papers citing our dataset and its associated publications.
Results: What we envisioned in the original research when making the dataset available has largely not come true; however, fellow researchers have found alternative uses of the dataset.
Conclusions: By understanding the possibilities and shortcomings of the current dataset, we aim to i) offer the research community future research avenues for how the data can be used; and ii) raise awareness of its limitations, not only to point out threats to the validity of research, but also to encourage fellow researchers to find ideas to overcome them. Our reflections can also be helpful to researchers who want to perform similar mining efforts.

Place, publisher, year, edition, pages
Elsevier, 2023. Vol. 164, article id 107317
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:ri:diva-66688
DOI: 10.1016/j.infsof.2023.107317
OAI: oai:DiVA.org:ri-66688
DiVA, id: diva2:1794327
Note

The work of G. Robles has been supported in part by the Spanish Ministry of Science and Innovation (PID2022-139551NB-I0).

Available from: 2023-09-05. Created: 2023-09-05. Last updated: 2024-03-21. Bibliographically approved

Authority records

Jolak, Rodi