DOCUMENT SIMILARITY EVALUATION USING A FUZZY CLUSTERING APPROACH
Keywords:
Abstract
The Vector Space Model and other approaches to document clustering rely on single-term analysis of the document set. To categorise documents more accurately, it is therefore important to use more informative features, such as phrases and their weights. Document clustering is particularly useful in applications such as building document taxonomies, automated document categorisation, and grouping search engine results. For such tasks, fuzzy clustering, in which a document may belong to several clusters with different degrees of membership, is better suited to producing the intended results. This paper presents the key ideas behind efficient fuzzy document clustering. The first element, the Document Index Graph, is a document index design that supports incremental construction of the index for the document set while remaining efficient, rather than relying solely on single-term indexes. It provides efficient phrase matching, which can be used to measure the similarity between two documents. The model is flexible: if phrases are not indexed, it reduces to a compact form of the vector space model. Two computational techniques are applied in both phases: the Gaussian Mixture Model (GMM) and the Expectation-Maximization (EM) algorithm. Together, these elements form a robust and reliable document similarity model, which produces substantially better Web document clustering results than previous methods.
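The abstract does not specify the internal layout of the Document Index Graph, so the following Python sketch is only a simplified stand-in under stated assumptions: the class `PhraseGraphIndex`, its methods, the whitespace tokeniser, the coverage-based phrase score and the blending weight `alpha` are all hypothetical. It illustrates two claims from the abstract: documents can be added to the index one at a time, and shared phrases found by following matching word-pair edges yield a similarity score that falls back to a plain term-frequency cosine (a vector-space comparison) when phrases are not used.

```python
from collections import Counter, defaultdict
import math


class PhraseGraphIndex:
    """Hypothetical simplified phrase index (not the paper's exact DIG design).

    Each distinct pair of consecutive words is an edge that records every
    (doc_id, position) where it occurs, so documents can be added one at a
    time and shared phrases found by following matching edges.
    """

    def __init__(self):
        self.edges = defaultdict(set)  # (word, next_word) -> {(doc_id, pos)}
        self.docs = {}                 # doc_id -> list of lower-cased words

    def add_document(self, doc_id, text):
        words = text.lower().split()
        self.docs[doc_id] = words
        for pos in range(len(words) - 1):
            self.edges[(words[pos], words[pos + 1])].add((doc_id, pos))

    def matches(self, a, b):
        """Yield (i, j, length) with words_b[i:i+length] == words_a[j:j+length]
        for every maximal shared phrase of at least two words."""
        wa, wb = self.docs[a], self.docs[b]
        for i in range(len(wb) - 1):
            for doc_id, j in self.edges.get((wb[i], wb[i + 1]), ()):
                if doc_id != a:
                    continue
                # Skip matches that merely continue a longer phrase to the left.
                if i > 0 and j > 0 and wb[i - 1] == wa[j - 1]:
                    continue
                length = 2
                while (i + length < len(wb) and j + length < len(wa)
                       and wb[i + length] == wa[j + length]):
                    length += 1
                yield i, j, length

    def similarity(self, a, b, alpha=0.5, use_phrases=True):
        """Blend phrase coverage with single-term cosine similarity.

        With use_phrases=False only the term cosine is returned, i.e. the
        index falls back to a plain vector-space comparison.
        """
        wa, wb = self.docs[a], self.docs[b]
        term_sim = term_cosine(wa, wb)
        if not use_phrases:
            return term_sim
        covered_a, covered_b = set(), set()
        for i, j, length in self.matches(a, b):
            covered_b.update(range(i, i + length))
            covered_a.update(range(j, j + length))
        total = len(wa) + len(wb)
        phrase_sim = (len(covered_a) + len(covered_b)) / total if total else 0.0
        return alpha * phrase_sim + (1 - alpha) * term_sim


def term_cosine(wa, wb):
    """Cosine similarity of raw term-frequency vectors."""
    ca, cb = Counter(wa), Counter(wb)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


index = PhraseGraphIndex()
index.add_document("a", "dog bites man")
index.add_document("b", "man bites dog")
print(index.similarity("a", "b", use_phrases=False))  # term-only score
print(index.similarity("a", "b"))                     # phrase-aware score
```

With `use_phrases=False`, "dog bites man" and "man bites dog" look identical because they share every term; the phrase component tells them apart because they share no consecutive word pair.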
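The abstract names the Gaussian Mixture Model and Expectation-Maximization but not how documents are turned into numeric features. The Python sketch below therefore assumes each document has already been mapped to a short feature vector (the two-dimensional points are invented for illustration) and fits a diagonal-covariance GMM with EM. The function names, the farthest-first initialisation and the variance floor `var_floor` are assumptions, not the paper's settings. The returned memberships are the fuzzy part: each row gives a document's degree of membership in every cluster and sums to 1.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def e_step(points, weights, means, variances):
    """Posterior membership of every point in every component (rows sum to 1)."""
    memberships = []
    for x in points:
        logs = [math.log(w) + log_gauss(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # subtract the max before exp() to avoid underflow
        probs = [math.exp(l - top) for l in logs]
        total = sum(probs)
        memberships.append([p / total for p in probs])
    return memberships


def fit_gmm_em(points, k, iterations=50, var_floor=1e-3):
    """Fit a k-component diagonal Gaussian mixture by Expectation-Maximization.

    Returns (weights, means, variances, memberships); memberships[i][c] is the
    soft (fuzzy) degree to which point i belongs to cluster c.
    """
    n, d = len(points), len(points[0])
    # Farthest-first initialisation keeps the sketch deterministic.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    centre = [sum(p[j] for p in points) / n for j in range(d)]
    spread = [max(var_floor, sum((p[j] - centre[j]) ** 2 for p in points) / n)
              for j in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(iterations):
        r = e_step(points, weights, means, variances)  # E-step
        for c in range(k):                              # M-step
            nc = sum(row[c] for row in r) + 1e-10       # guard against empty clusters
            weights[c] = nc / n
            means[c] = [sum(row[c] * p[j] for row, p in zip(r, points)) / nc
                        for j in range(d)]
            variances[c] = [max(var_floor,
                                sum(row[c] * (p[j] - means[c][j]) ** 2
                                    for row, p in zip(r, points)) / nc)
                            for j in range(d)]
    return weights, means, variances, e_step(points, weights, means, variances)


# Hypothetical 2-D document feature vectors forming two loose groups.
docs = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.1, 0.1),
        (0.9, 1.0), (1.0, 0.8), (0.8, 0.9), (0.9, 0.9)]
weights, means, variances, u = fit_gmm_em(docs, k=2)
print([[round(p, 3) for p in row] for row in u])
```

Each row of `u` is a soft assignment rather than a single cluster label; a document lying between the groups can receive intermediate values in both clusters instead of being forced into one, which is what distinguishes this from a hard partition.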