
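SISTEM KLASIFIKASI DAN REKOMENDASI DOKUMEN KARYA ILMIAH BERBASIS FINE-TUNING DAN REPRESENTASI VEKTOR KONTEKSTUAL BERT [Scientific Document Classification and Recommendation System Based on Fine-Tuning and BERT Contextual Vector Representations]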

ANTARIKSA, Muhammad Deagama Surya and Sugiharto, Aris and Surarso, Bayu (2025) SISTEM KLASIFIKASI DAN REKOMENDASI DOKUMEN KARYA ILMIAH BERBASIS FINE-TUNING DAN REPRESENTASI VEKTOR KONTEKSTUAL BERT. Masters thesis, UNIVERSITAS DIPONEGORO.

Files:
1. Cover-1.pdf (155kB)
1. Cover.pdf (1MB, restricted to repository staff only)
2. BAB I.pdf (343kB)
3. BAB II.pdf (1MB)
4. BAB III.pdf (2MB, restricted to repository staff only)
5. BAB IV.pdf (2MB, restricted to repository staff only)
6. BAB V.pdf (312kB, restricted to repository staff only)
7. Daftar Pustaka.pdf (306kB)
8. Lampiran.pdf (7MB, restricted to repository staff only)

Abstract

The rapid growth of scientific publications makes information management harder: researchers and students often struggle to search the literature as the number of publications keeps rising. Most existing classification and recommendation systems still rely on traditional methods that match on word similarity without capturing semantic meaning. This study examines the effectiveness of traditional methods relative to a BERT transformer approach, with the aim of developing a scientific document classification and recommendation system based on fine-tuned BERT and comparing its performance with traditional methods. The dataset consists of 2,000 documents in computer science and informatics, split into 70% training, 10% validation, and 20% testing. The bert-base-uncased model was used with hyperparameter tuning. Classification was evaluated with accuracy, precision, recall, and F1-score, while recommendations, ranked by cosine similarity, were assessed with Precision@K and Recall@K. The best configuration (learning rate 3e-5, batch size 32, AdamW optimizer) trained stably without overfitting and reached 91% accuracy and F1-score on the test data. In relevance testing, Precision@5 was 93.13% and stayed above 92% up to top-200, then declined gradually to 34.9% over the full collection, while Recall@K rose steadily from 0.70% at top-5 to 100% over the full collection. These findings show that fine-tuned BERT maintains the precision of top recommendations while reaching nearly all relevant documents, giving better classification and recommendation results than traditional methods.
Keywords: BERT, Fine-tuning, Text Classification, Recommendation System, Cosine Similarity
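
The recommendation evaluation described in the abstract (ranking by cosine similarity, then scoring with Precision@K and Recall@K) can be illustrated with a minimal sketch. This is not the thesis code: the function names, the toy two-dimensional vectors, and the rule that a document counts as relevant when it appears in a given set (for example, because it shares the query's class label) are all assumptions made for illustration. In the thesis the vectors would come from the fine-tuned BERT model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return document ids sorted by descending similarity to the query."""
    scores = {doc_id: cosine_similarity(query_vec, vec)
              for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision@K = hits / K; Recall@K = hits / number of relevant documents."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Toy data (hypothetical): stand-ins for document embeddings.
doc_vecs = {
    "d1": [1.0, 0.0],
    "d2": [0.9, 0.1],
    "d3": [0.0, 1.0],
    "d4": [0.7, 0.7],
}
query = [1.0, 0.0]
relevant = {"d1", "d2"}  # assumed ground truth for this query

ranked = rank_by_similarity(query, doc_vecs)
for k in (1, 2, 4):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"K={k}: Precision@K={p:.2f}, Recall@K={r:.2f}")
```

On the toy data, precision falls from 1.00 to 0.50 and recall rises from 0.50 to 1.00 as K grows from 1 to the whole collection, the same direction of change the abstract reports for its Precision@K and Recall@K results.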

Item Type: Thesis (Masters)
Uncontrolled Keywords: BERT, Fine-tuning, Text Classification, Recommendation System, Cosine Similarity
Subjects: Sciences and Mathematics
Divisions: Postgraduate Program > Master Program in Information System
Depositing User: ekana listianawati
Date Deposited: 10 Dec 2025 08:37
Last Modified: 10 Dec 2025 08:37
URI: https://eprints2.undip.ac.id/id/eprint/42036
