
Boosting discrimination information based document clustering using consensus and classification

Abstract
Effective document clustering depends on an adequate choice of term discrimination information measure (DIM). Making the right choice is empirical in nature, and experts rely on the characteristics of the data in the documents to guess at a viable solution. Thus, no single DIM can be assumed to perform consistently, and the information measure must be selected intelligently. In this work, we propose an automated consensus-building method based on a text classifier. Two distinct DIMs construct basic partitions of the documents and form base clusters. The consensus-building method uses the cluster information to find concordant documents, which constitute a dataset for training the text classifier. The classifier then predicts labels for the documents that were discordant in the earlier clustering stage and forms new clusters. Experiments are performed on eight standard data sets to test the efficacy of the proposed technique. The proposed consensus clustering improves on the individual base clustering results. Relative Risk (RR) and Measurement of Discrimination Information (MDI) are the two discrimination information measures used to obtain the base clustering solutions in our experiments. © 2013 IEEE.
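The consensus step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two base clusterings already use aligned cluster labels, it takes the base labels as given rather than computing them with RR or MDI, and it substitutes a simple nearest-centroid classifier over term counts for the text classifier. The function name `consensus_cluster` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
import math


def consensus_cluster(docs, labels_a, labels_b):
    """Combine two base clusterings of tokenized documents.

    docs: list of token lists.
    labels_a, labels_b: cluster labels from two base clusterings
    (assumed to be aligned, i.e. label 0 means the same cluster in both).
    """
    indices = range(len(docs))
    # Concordant documents: both base clusterings agree on the label.
    concordant = [i for i in indices if labels_a[i] == labels_b[i]]
    discordant = [i for i in indices if labels_a[i] != labels_b[i]]

    # "Training": one term-count centroid per agreed cluster
    # (a stand-in for the text classifier, chosen for brevity).
    centroids = defaultdict(Counter)
    for i in concordant:
        centroids[labels_a[i]].update(docs[i])

    def cosine(u, v):
        dot = sum(u[t] * v[t] for t in u)
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    final = {i: labels_a[i] for i in concordant}
    for i in discordant:
        if not centroids:
            # No agreement at all: fall back to the first base clustering.
            final[i] = labels_a[i]
            continue
        vec = Counter(docs[i])
        # Predict the label of the most similar centroid.
        final[i] = max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))
    return [final[i] for i in indices]


# Toy example: documents 4 and 5 are labeled differently by the two base clusterings.
docs = [["cat", "dog"], ["cat", "pet"], ["stock", "market"],
        ["stock", "bond"], ["dog", "pet"], ["bond", "market"]]
labels_a = [0, 0, 1, 1, 0, 0]
labels_b = [0, 0, 1, 1, 1, 1]
print(consensus_cluster(docs, labels_a, labels_b))
```

In this toy run the four documents on which both clusterings agree define the centroids, and the two disputed documents are assigned to whichever centroid shares their terms.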
Author(s)
Sheri, Ahmad Muqeem; Rafique, Muhammad Aasim; Hassan, Malik Tahir; Junejo, Khurum Nazir; Jeon, Moongu
Issued Date
2019-06
Type
Article
DOI
10.1109/ACCESS.2019.2923462
URI
https://scholar.gist.ac.kr/handle/local/12660
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
IEEE Access, v.7, pp.78954 - 78962
ISSN
2169-3536
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.