Channel: how to check uniqueness (non duplication) of a post in an rss feed - Stack Overflow

Answer by Brian for how to check uniqueness (non duplication) of a post in an...

Some RSS feeds have a guid element as an identifier. Posts with a shared guid are probably duplicates. Some RSS feeds just stuff the URL in there to indicate that a post's uniqueness is tied to its...
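A minimal sketch of the guid idea, assuming a plain RSS 2.0 document parsed with the standard library. The function names `item_key` and `unique_items` are hypothetical, and falling back to the item's `link` when no `guid` is present is an assumption made for illustration, not something the answer prescribes.

```python
import xml.etree.ElementTree as ET


def item_key(item):
    """Return a de-duplication key for an RSS <item>.

    Prefers <guid>; falls back to <link> (an assumption) when guid is missing.
    Returns an empty string when neither element is present.
    """
    guid = (item.findtext("guid") or "").strip()
    if guid:
        return guid
    return (item.findtext("link") or "").strip()


def unique_items(rss_xml):
    """Return the <item> elements of a feed, skipping repeated keys."""
    root = ET.fromstring(rss_xml)
    seen = set()
    result = []
    for item in root.iter("item"):
        key = item_key(item)
        if key:
            if key in seen:
                continue  # same guid/link already seen: probably a duplicate
            seen.add(key)
        # Items with no usable key are kept, since nothing proves them duplicates.
        result.append(item)
    return result
```

In a real cache the `seen` set would be replaced by a lookup against the stored keys, for example a unique column in the database.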


Answer by Jagira for how to check uniqueness (non duplication) of a post in...

Take a look at the clustering algorithms used by Google News. Your requirements are not as demanding, but they are loosely related to what Google News does: it clusters stories about the same event from...
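As a much simpler stand-in for real clustering (this is not how Google News works), near-duplicate posts can be flagged by comparing titles with a string-similarity ratio. The function name `is_near_duplicate` and the 0.8 threshold below are assumptions chosen for illustration and would need tuning on real data.

```python
from difflib import SequenceMatcher


def is_near_duplicate(title_a, title_b, threshold=0.8):
    """Return True when two titles are similar enough to be treated as one post.

    The threshold is an arbitrary assumption, not a recommended value.
    """
    ratio = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
    return ratio >= threshold
```

This only catches small edits such as fixed typos. Grouping different articles about the same event needs a proper text-clustering approach.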

Answer by Tim Carter for how to check uniqueness (non duplication) of a post...

The URL would be a good start. As for different versions when people make changes, that would depend on implementation details. If pubDate is used in the item element of the feed, it would be useful to...
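One possible reading of the URL-plus-pubDate idea, sketched under assumptions: the URL is the identity of a post, and a later `pubDate` for a URL already cached marks a revised version. The function name `classify` and the in-memory dict used as a cache are hypothetical, and the sketch assumes the feed writes `pubDate` in the RFC 822 format that RSS 2.0 specifies, with an explicit time zone.

```python
from email.utils import parsedate_to_datetime


def classify(cache, link, pub_date):
    """Classify a feed item as 'new', 'updated', or 'unchanged'.

    cache maps a post URL to the most recent pubDate seen for it (a datetime).
    pub_date is the raw RFC 822 string from the item's <pubDate> element.
    """
    published = parsedate_to_datetime(pub_date)
    previous = cache.get(link)
    if previous is None:
        cache[link] = published
        return "new"
    if published > previous:
        # Same URL with a later date: treat it as a revised version of the post.
        cache[link] = published
        return "updated"
    return "unchanged"
```

Not every feed bumps `pubDate` when a post is edited, so this check can miss silent changes.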

how to check uniqueness (non duplication) of a post in an rss feed

When retrieving and caching/saving (in a database) some posts from an RSS feed, how do you determine that it is the same post (for example, when some typos are fixed in the feed or if the title changes, the...
