Article
Title : Weighted Inverse Document Frequency and Vector Space Model for Hadith Search Engine
Abstract : Hadith is the second source of Islamic law after the Qur’an, and there are many types of hadith and many hadith collections that need to be studied. Few Muslims are familiar with the hadiths, and many have difficulty studying them. This research aims to build a hadith search engine over a reliable source using Information Retrieval techniques. The text is represented as a Bag of Words (single terms), and each term is weighted with the Weighted Inverse Document Frequency (WIDF) method according to its frequency of occurrence before the documents are converted into vectors with the Vector Space Model (VSM). In experiments on 380 hadith texts, WIDF with VSM achieved a recall of 96% but a precision of only about 35.46%. Precision is low because any hadith whose text contains the query keywords is still returned, even when its meaning or content is not relevant to the query.
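As a minimal sketch of the retrieval pipeline described in the abstract, the Python below weights each term with the usual WIDF definition, WIDF(d, t) = TF(d, t) / (sum of TF(i, t) over all documents i), represents each document as a VSM vector of those weights, and ranks documents by cosine similarity to the query. The tokenizer (lowercase, no stemming or stop-word removal), the raw term-frequency query vector, the helper names, and the toy documents are all assumptions for illustration, not the author's implementation or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Bag of words with single terms; assumption: no stemming or stop-word removal."""
    return re.findall(r"[a-z]+", text.lower())


def widf_vectors(docs):
    """Return one {term: weight} dict per document.

    WIDF(d, t) = TF(d, t) / sum over all documents i of TF(i, t).
    """
    tfs = [Counter(tokenize(d)) for d in docs]
    totals = Counter()
    for tf in tfs:
        totals.update(tf)  # collection-wide frequency of each term
    return [{t: c / totals[t] for t, c in tf.items()} for tf in tfs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(docs, query):
    """Rank documents by cosine score; any shared term gives a score above 0."""
    vecs = widf_vectors(docs)
    q = Counter(tokenize(query))  # assumption: raw term frequencies for the query
    scored = [(i, cosine(v, q)) for i, v in enumerate(vecs)]
    hits = [(i, s) for i, s in scored if s > 0]  # keep every keyword match
    return sorted(hits, key=lambda h: h[1], reverse=True)


def precision_recall(retrieved, relevant):
    """Precision = |retrieved & relevant| / |retrieved|; recall divides by |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy collection (English stand-ins, not the 380 hadith texts).
    docs = [
        "prayer is the pillar of religion",
        "whoever performs prayer on time",
        "the market traders and their prayer mats",  # keyword match, off-topic
        "fasting in ramadan",
    ]
    hits = search(docs, "prayer")
    print([i for i, _ in hits])  # → [1, 0, 2]
    p, r = precision_recall([i for i, _ in hits], relevant=[0, 1])
    print(f"precision={p:.3f} recall={r:.3f}")  # → precision=0.667 recall=1.000
```

In the toy run, the third document mentions the keyword but is off-topic, yet it is still retrieved: recall stays at 1.0 while precision drops to about 0.67. This is the same keyword-matching effect the abstract gives as the reason for the low precision.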
Author : Septya Egho Pratama
Keywords : information retrieval, search engine, vector space model, weighted inverse document frequency