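EVALUASI KINERJA ALGORITMA C5.0, SUPPORT VECTOR MACHINE, RANDOM FOREST, DAN SELEKSI FITUR CHI-SQUARE DALAM ANALISIS SENTIMEN BERITA ANCAMAN PRIVASI DATA
(English: Performance Evaluation of the C5.0, Support Vector Machine, and Random Forest Algorithms and Chi-Square Feature Selection in Sentiment Analysis of Data Privacy Threat News)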

SAMI'UN, Defitroh Chen and Sugiharto, Aris and Jie, Ferry (2024) EVALUASI KINERJA ALGORITMA C5.0, SUPPORT VECTOR MACHINE, RANDOM FOREST, DAN SELEKSI FITUR CHI-SQUARE DALAM ANALISIS SENTIMEN BERITA ANCAMAN PRIVASI DATA. Masters thesis, UNIVERSITAS DIPONEGORO.

Abstract

Data security and privacy are pressing concerns in the digital era: technology advances rapidly, but so do the risks of data breaches and inadequate protection of personal information. In 2022 the issue became a major topic of public discussion in Indonesia and drew many responses on social media. YouTube, one of the news sources most widely accessed by the public, hosts user comments that this study uses as its main data source for sentiment analysis of public responses to data security and privacy issues. The analysis begins with data preprocessing: cleaning, case folding, tokenizing, slang-word correction, stemming, and stopword removal. TF-IDF is then used to weight the importance of words within documents, and Chi-Square feature selection is applied to improve the performance of the classification models. Three classification algorithms are used: C5.0, Random Forest (RF), and Support Vector Machine (SVM). C5.0 splits the dataset into subsets based on relevant features, RF combines multiple decision trees to improve accuracy, and SVM finds the hyperplane that separates the classes with the largest margin. The study reports that combining these algorithms with Chi-Square feature selection improved accuracy. RF achieved the highest accuracy (85.58%), followed by SVM (85.02%) and C5.0 (81.72% in the English abstract; the Indonesian abstract gives 81.82%). The YouTube comments were predominantly negative in sentiment, reflecting public concern about potential data breaches by the government. These findings offer policymakers insight for responding more effectively by strengthening data security and transparency, with the aim of building a safer and more trustworthy digital environment.
Keywords: Sentiment analysis, Chi-Square feature selection, C5.0, RF, SVM
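
The preprocessing, TF-IDF, and Chi-Square steps named in the abstract can be illustrated with a minimal standard-library sketch. This is not the thesis's implementation: the slang dictionary, stopword list, toy comments, labels, and all function names below are hypothetical, stemming is skipped, the Chi-Square score is the textbook 2x2 term-presence statistic, and the classification step (C5.0, RF, SVM) is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources; the thesis's actual lists are not given on this page.
SLANG = {"gk": "tidak", "yg": "yang"}
STOPWORDS = {"yang", "di", "dan", "ini"}


def preprocess(text):
    """Cleaning, case folding, tokenizing, slang correction, stopword removal."""
    text = text.lower()                          # case folding
    text = re.sub(r"[^a-z\s]", " ", text)        # cleaning: drop digits and punctuation
    tokens = text.split()                        # tokenizing on whitespace
    tokens = [SLANG.get(t, t) for t in tokens]   # slang-word correction
    # Stemming is skipped in this sketch; a stemmer would be applied here.
    return [t for t in tokens if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document (tf = count / doc length)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def chi_square(docs, labels, term, target):
    """Chi-Square statistic of the 2x2 table: term present/absent vs. target class."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term, in_class = term in doc, label == target
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k, target):
    """Keep the k terms with the highest Chi-Square score (ties: alphabetical)."""
    vocab = sorted({t for doc in docs for t in doc})
    return sorted(vocab, key=lambda t: -chi_square(docs, labels, t, target))[:k]


# Hypothetical comments and labels, for illustration only.
comments = [
    "Data bocor lagi, gk aman!",
    "Data pribadi bocor, pemerintah lalai",
    "Aplikasi ini aman dan bagus",
    "Layanan bagus, data aman",
]
labels = ["negative", "negative", "positive", "positive"]

docs = [preprocess(c) for c in comments]
weights = tf_idf(docs)
top_terms = select_features(docs, labels, 2, "negative")
```

In a fuller pipeline, the TF-IDF vectors restricted to the selected terms would be passed to each classifier, and accuracy would be compared with and without the selection step.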

Item Type: Thesis (Masters)
Uncontrolled Keywords: Sentiment analysis, Chi-Square feature selection, C5.0, RF, SVM
Subjects: Sciences and Mathematics
Divisions: Postgraduate Program > Master Program in Information System
Depositing User: ekana listianawati
Date Deposited: 30 Apr 2025 07:33
Last Modified: 30 Apr 2025 07:33
URI: https://eprints2.undip.ac.id/id/eprint/31803
