Abstract:
Most existing document retrieval models learn semantics inefficiently and cannot capture document similarity at the topic level. To address this, a topic-based document retrieval model (TDRM) is proposed. TDRM provides a common topic space for all documents, represents each document as a vector in that space, defines document similarity as the cosine of the angle between document vectors, and uses Latent Dirichlet Allocation to learn the topic distribution of each document. Experimental results show that, compared with a document similarity model based on TextTiling and optimal matching of bipartite graphs, TDRM achieves higher average precision and recall in retrieving similar documents, and the harmonic mean of its average precision and recall is 44% greater than that of the reference model.
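The similarity measure described above can be illustrated with a minimal sketch. It assumes each document has already been mapped to a topic distribution (for example by an LDA implementation) and only shows the cosine comparison in the common topic space. The function names `cosine_similarity` and `rank_by_similarity`, the three-topic vectors, and the document labels are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic distributions over three topics (illustrative values only).
doc_vecs = {
    "doc_a": [0.70, 0.20, 0.10],
    "doc_b": [0.10, 0.10, 0.80],
    "doc_c": [0.60, 0.30, 0.10],
}
query_vec = [0.65, 0.25, 0.10]

ranking = rank_by_similarity(query_vec, doc_vecs)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Because every document lives in the same topic space, vectors are directly comparable regardless of document length or vocabulary overlap, which is what makes a single cosine score usable as the retrieval criterion.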