GSDMM OPTIMIZATION FOR TOPIC ANALYSIS OF SHORT-TEXT CUSTOMER OPINIONS WITH INDOBERTWEET EMBEDDING SEMANTIC EVALUATION

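ARIFUDIN, Asa and Warsito, Budi and Suseno, Jatmiko Endro (2026) OPTIMASI GSDMM UNTUK ANALISIS TOPIK TEKS PENDEK OPINI PELANGGAN DENGAN EVALUASI SEMANTIK INDOBERTWEET EMBEDDING [GSDMM Optimization for Topic Analysis of Short-Text Customer Opinions with IndoBERTweet Embedding Semantic Evaluation]. Masters thesis, UNIVERSITAS DIPONEGORO.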

1. cover awal.pdf (Text, 128kB)
2. Cover lengkap.pdf (Text, 2MB, restricted to repository staff only)
3. BAB I.pdf (Text, 247kB)
4. BAB II.pdf (Text, 441kB)
5. BAB III.pdf (Text, 349kB, restricted to repository staff only)
6. BAB IV.pdf (Text, 841kB, restricted to repository staff only)
7. BAB V.pdf (Text, 234kB, restricted to repository staff only)
8. Daftar Pustaka.pdf (Text, 196kB)

Abstract

Analyzing customer opinions on the social media platform X is challenging because the texts are short, sparse, and informal. This study proposes a Two-Phase Optimization strategy that applies the Gibbs Sampling Dirichlet Mixture Model (GSDMM) algorithm with IndoBERTweet-based semantic evaluation. In this framework, GSDMM serves as the topic modeling method, while IndoBERTweet vector representations serve as a semantic validator to overcome the bias of conventional statistical metrics. Using a dataset of 35,899 tweets related to the official accounts of mobile operator services, the optimization was carried out in two systematic stages. The first phase applied statistical selection, based on perplexity and the number of active topics, to 27 combinations of the hyperparameters K, alpha, and beta in order to eliminate unstable models. The second phase applied semantic selection using a Cost Function (CF) that balances cluster compactness (Within-Cluster Variation) against inter-cluster separation (Between-Cluster Variation) in the vector space. The results show that the optimal configuration (K=20, alpha=0.5, beta=0.5) produced 11 active topics whose structure is markedly more coherent within topics and more distinct between topics than that of the baseline model. Qualitatively, the method mapped strategic issues ranging from application services, network quality, and package pricing to latent issues of data security. The study demonstrates that IndoBERTweet semantic evaluation effectively improves topic quality for short-text data.
Keywords: GSDMM, IndoBERTweet, topic modeling, short text, social media, customer opinion
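
The two-phase selection described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract gives neither the grid values nor the exact form of the Cost Function, so the grid below is hypothetical (only K=20, alpha=0.5, beta=0.5 appear in the abstract), and CF is assumed here to be the ratio WCV / BCV, with lower values indicating tighter and better-separated topics. The function names and the toy 2-D vectors are likewise illustrative; in practice the vectors would be IndoBERTweet embeddings of the tweets and the labels would be GSDMM topic assignments.

```python
from itertools import product
from math import fsum

# Hypothetical 3 x 3 x 3 grid. Only K=20, alpha=0.5, beta=0.5 are taken
# from the abstract; every other value is an assumption for illustration.
K_VALUES = (10, 20, 30)
ALPHA_VALUES = (0.1, 0.5, 1.0)
BETA_VALUES = (0.1, 0.5, 1.0)
GRID = list(product(K_VALUES, ALPHA_VALUES, BETA_VALUES))  # 27 combinations


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [fsum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def cost_function(embeddings, labels):
    """Return (wcv, bcv, cf), where cf = wcv / bcv is an ASSUMED form.

    wcv: mean squared distance of each document to its topic centroid.
    bcv: mean squared distance between all pairs of topic centroids.
    """
    clusters = {}
    for vector, label in zip(embeddings, labels):
        clusters.setdefault(label, []).append(vector)
    if len(clusters) < 2:
        raise ValueError("at least two active topics are required")

    centers = {label: centroid(vs) for label, vs in clusters.items()}
    wcv = fsum(
        squared_distance(vector, centers[label])
        for vector, label in zip(embeddings, labels)
    ) / len(embeddings)

    names = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    bcv = fsum(squared_distance(centers[a], centers[b]) for a, b in pairs) / len(pairs)

    # Identical centroids mean no separation at all; treat as worst case.
    cf = wcv / bcv if bcv > 0 else float("inf")
    return wcv, bcv, cf


# Toy 2-D "embeddings": two tight groups far apart.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = cost_function(points, [0, 0, 1, 1])  # labels follow the groups
bad = cost_function(points, [0, 1, 0, 1])   # labels cut across the groups
print(len(GRID), good, bad)
```

Under this assumed definition, the labeling that follows the two groups scores a much lower CF than the labeling that cuts across them, which is the behavior a semantic selection step would rely on when ranking the models that survive the statistical phase.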

Item Type: Thesis (Masters)
Uncontrolled Keywords: GSDMM, IndoBERTweet, topic modeling, short text, social media, customer opinion
Subjects: Sciences and Mathematics
Divisions: Postgraduate Program > Master Program in Information System
Depositing User: ekana listianawati
Date Deposited: 05 Mar 2026 07:40
Last Modified: 05 Mar 2026 07:40
URI: https://eprints2.undip.ac.id/id/eprint/46699
