
String alignment for automated document versioning

by: Wei Woon, Kuok-Shoong Wong
Knowledge and Information Systems





Notes for this article

This group has 0 private notes and 1 public note for this article.

Also see his site for Snitch, a package to visualize text alignment: http://www.must.edu.my/~wlwoon/snitch/

This is described in a paper I can't locate:

Chun Kit See, Wei Lee Woon, Kuok-Shoong Wong: Simple Forward Method for Plagiarism Detection. In: Gabriele Kotsis, David Taniar, Eric Pardede, Ismail Khalil Ibrahim (Eds.): MoMM'2007 - The Fifth International Conference on Advances in Mobile Computing and Multimedia, 3-5 December 2007, Jakarta, Indonesia. books@ocg.at 230, Austrian Computer Society, 2007.


markymaypo (public) - 2008-06-25 22:09:05


Abstract

The automated analysis of documents is an important task given the rapid increase in availability of digital texts. Automatic text processing systems often encode documents as vectors of term occurrence frequencies, a representation which facilitates the classification and clustering of documents. Historically, this approach derives from the related field of data mining, where database entries are commonly represented as points in a vector space. While this lineage has certainly contributed to the development of text processing, there are situations where document collections do not conform to this clustered structure, and where the vector representation may be unsuitable for text analysis. As a proof-of-concept, we had previously presented a framework where the optimal alignments of documents could be used for visualising the relationships within small sets of documents. In this paper we develop this approach further by using it to automatically generate the version histories of various document collections. For comparison, version histories generated using conventional methods of document representation are also produced. To facilitate this comparison, a simple procedure for evaluating the accuracy of the version histories thus generated is proposed.
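
The contrast the abstract draws between term-frequency vectors and alignment-based comparison can be illustrated with a small sketch. This is not the authors' method: the function names are hypothetical, and Python's `difflib.SequenceMatcher` is used only as a convenient stand-in for an optimal string alignment. It shows why a bag-of-words vector cannot tell apart two document versions that differ only in word order, while an order-sensitive comparison can.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def term_frequency_cosine(a: str, b: str) -> float:
    """Cosine similarity between term-occurrence-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def alignment_similarity(a: str, b: str) -> float:
    """Order-sensitive similarity over word sequences.

    Assumption: difflib's matching-block ratio stands in for an
    optimal alignment score; it is not the algorithm from the paper.
    """
    matcher = SequenceMatcher(None, a.lower().split(), b.lower().split(), autojunk=False)
    return matcher.ratio()


# Two hypothetical document versions with identical words in a different order.
original = "the cat chased the dog across the yard"
shuffled = "the dog chased the cat across the yard"

print(term_frequency_cosine(original, shuffled))
print(alignment_similarity(original, shuffled))
```

For this pair the term-frequency score is 1 up to floating-point rounding, because both versions contain exactly the same words, whereas the alignment-based score falls below 1 because the word order differs. Pairwise scores of the second kind are the sort of input from which a version history might be inferred.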

