String alignment for automated document versioning
Knowledge and Information Systems
Notes for this article
Also see his site for Snitch, a package to visualize text alignment: http://www.must.edu.my/~wlwoon/snitch/
This is described in a paper I can't locate:
Chun Kit See, Wei Lee Woon, Kuok-Shoong Wong: Simple Forward Method for Plagiarism Detection. in Gabriele Kotsis, David Taniar, Eric Pardede, Ismail Khalil Ibrahim (Eds.): MoMM'2007 - The Fifth International Conference on Advances in Mobile Computing and Multimedia, 3-5 December 2007, Jakarta, Indonesia. books@ocg.at 230 Austrian Computer Society 2007
Abstract
The automated analysis of documents is an important task given the rapid increase in availability of digital texts. Automatic text processing systems often encode documents as vectors of term occurrence frequencies, a representation which facilitates the classification and clustering of documents. Historically, this approach derives from the related field of data mining, where database entries are commonly represented as points in a vector space. While this lineage has certainly contributed to the development of text processing, there are situations where document collections do not conform to this clustered structure, and where the vector representation may be unsuitable for text analysis. As a proof-of-concept, we had previously presented a framework where the optimal alignments of documents could be used for visualising the relationships within small sets of documents. In this paper we develop this approach further by using it to automatically generate the version histories of various document collections. For comparison, version histories generated using conventional methods of document representation are also produced. To facilitate this comparison, a simple procedure for evaluating the accuracy of the version histories thus generated is proposed.
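The contrast the abstract draws between term-frequency vectors and alignment can be illustrated with a minimal Python sketch. This is not the authors' method: the function names and sample sentences are hypothetical, and the standard library's `difflib.SequenceMatcher` (a matching-block heuristic, not an optimal alignment) is assumed here as a rough stand-in for an alignment score.

```python
from collections import Counter
from difflib import SequenceMatcher
import math


def term_vector(text):
    """Encode a document as a bag of lowercase term counts (word order is lost)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def alignment_similarity(x, y):
    """Order-sensitive similarity over word tokens.

    Assumption: difflib's matching-block ratio stands in for the
    optimal-alignment score described in the abstract.
    """
    tokens_x = x.lower().split()
    tokens_y = y.lower().split()
    return SequenceMatcher(None, tokens_x, tokens_y, autojunk=False).ratio()


# Hypothetical sample documents: a light revision and a reordering.
original = "the quick brown fox jumps over the lazy dog"
revised = "the quick brown fox leaps over the lazy dog"
shuffled = "dog lazy the over jumps fox brown quick the"

for name, other in [("revised", revised), ("shuffled", shuffled)]:
    cos = cosine_similarity(term_vector(original), term_vector(other))
    ali = alignment_similarity(original, other)
    print(f"{name}: cosine={cos:.3f} alignment={ali:.3f}")
```

In this toy case the shuffled text has the same term counts as the original, so the vector representation cannot tell them apart, while the order-sensitive score ranks the one-word revision as much closer. That difference matters when the goal is to recover which document is a revision of which.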