In accordance with the disclosure, there is provided a method for identifying duplicate documents comprising drafting a first document and creating a near unique representative string (NRS) based on the document content. The method further comprises searching for other documents with the same NRS and selectively...

http://www.google.com.hk/patents/US20080243842?utm_source=gb-gplus-share

Patent US20080243842 - Optimizing the performance of duplicate identification by content
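
The two steps named in the abstract, deriving an NRS from a document's content and then searching for other documents with the same NRS, can be illustrated with a minimal Python sketch. The abstract does not say how the NRS is computed, so the SHA-256 digest of whitespace- and case-normalized text used here is purely an assumption, and the names `near_unique_representative_string` and `DocumentIndex` are hypothetical. The sketch stops at returning candidate duplicates, because the abstract is truncated before it describes the selective step that follows.

```python
import hashlib
import re
from collections import defaultdict


def near_unique_representative_string(content: str) -> str:
    """Hypothetical NRS: SHA-256 hex digest of normalized content.

    Assumption: the patent abstract does not specify the NRS function;
    a cryptographic hash is used here only as a stand-in.
    """
    # Collapse whitespace runs and ignore case so trivial formatting
    # differences do not change the representative string.
    normalized = re.sub(r"\s+", " ", content).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DocumentIndex:
    """Hypothetical in-memory index mapping each NRS to document ids."""

    def __init__(self) -> None:
        self._ids_by_nrs: dict[str, list[str]] = defaultdict(list)

    def add(self, doc_id: str, content: str) -> list[str]:
        """Index a document; return ids of earlier documents with the same NRS."""
        nrs = near_unique_representative_string(content)
        candidates = list(self._ids_by_nrs[nrs])  # copy before mutating
        self._ids_by_nrs[nrs].append(doc_id)
        return candidates


index = DocumentIndex()
index.add("doc-a", "Hello   World")
duplicates_of_b = index.add("doc-b", "hello world\n")  # same NRS as doc-a
```

Because the string is only "near unique", two different documents could in principle share an NRS, so a match is best treated as a candidate duplicate rather than a confirmed one.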