Domain-Specific Ontology Construction and LLMs Fine-tuning for Procurement Knowledge
Abstract
This paper presents a novel semi-automated approach for creating high-quality datasets through ontology-guided knowledge extraction for domain-specific large language model (LLM) fine-tuning. We address the challenge of sparse knowledge graphs (KGs) generated by traditional triplet extraction methods by developing a hierarchical ontology construction framework and applying it to procurement domain data. Our methodology begins with procurement-specific filtering of FineWeb data using keyword-based selection, which reduces the dataset size by 80%. We used Llama-3.2-3B for data annotation, obtaining 3,000 positive and negative samples from 44,000 processed samples, and then trained a BERT-based classifier that reaches an F1 score of 75%. We introduce a semi-manual ontology development approach that combines the structured Resource Description Framework (RDF) with targeted LLM prompting for focused graph node expansion. The process involves clustering the extracted nodes to reduce complexity and enable topic-specific investigation. With validation by procurement experts, we generated a dataset of 140 question-answer pairs covering key ontology nodes, while the remaining 460 samples were generated automatically using an ontology-based prompt. Our ontology achieves a Weighted Composite Score (WCS) of 76.42%, indicating high topic coverage across the procurement domain graph. Fine-tuning experiments on the Llama-3.2-1B and Llama-3.2-3B models demonstrate improvements validated through blind A/B testing using the DeepEval framework: the fine-tuned Llama-3.2-1B model was preferred over the base model in 78.15% of comparisons for answer relevancy, 77.87% for faithfulness, and 77.95% for factual consistency rate (FCR). The fine-tuned Llama-3.2-3B model showed more moderate gains, winning 68.35% of comparisons for answer relevancy, 72.29% for faithfulness, and 72.36% for FCR.
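The keyword-based selection step mentioned in the abstract can be illustrated with a minimal sketch. The keyword list, the function names, and the `min_hits` threshold below are all hypothetical, since the abstract does not state the keywords or matching rules actually used; the sketch only shows the general idea of keeping documents that mention domain terms and measuring how much the corpus shrinks.

```python
import re
from typing import Iterable, List

# Hypothetical keyword list: the abstract does not give the real one.
PROCUREMENT_KEYWORDS = [
    "procurement",
    "tender",
    "supplier",
    "purchase order",
    "request for proposal",
]


def build_pattern(keywords: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive, whole-word pattern for all keywords."""
    # Longer phrases first so multi-word keywords are tried before shorter ones.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_documents(
    docs: Iterable[str],
    keywords: Iterable[str] = PROCUREMENT_KEYWORDS,
    min_hits: int = 1,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Keep documents containing at least `min_hits` keyword matches."""
    pattern = build_pattern(keywords)
    return [d for d in docs if len(pattern.findall(d)) >= min_hits]


def reduction_rate(n_before: int, n_after: int) -> float:
    """Fraction of the corpus removed by filtering."""
    return 1.0 - n_after / n_before if n_before else 0.0
```

Whole-word matching keeps the filter cheap and predictable, but it misses inflected forms such as plurals unless they are listed explicitly. A simple filter of this kind is typically followed by a stricter relevance step, such as the LLM annotation and BERT-based classifier described in the abstract.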
This work is licensed under a Creative Commons Attribution 4.0 International License.