How do I archive an RSS feed?

I need to take several RSS feeds and archive every item that is added to them. I have never consumed or created RSS before, but I know XML, so the format seems pretty intuitive.

I know how to parse a feed: How can I get started creating a C# RSS reader?

I know that I cannot rely on the feed server to provide a complete history: Is it possible to get an RSS archive?

I know that I will need some custom logic around duplicates: How to check the uniqueness (non-duplication) of a post in an RSS feed

My question is: how can I ensure that I don't miss any items? My initial plan is to write a parser that, for each item in the feed, 1) checks whether it is already in the archive database and 2) if not, adds it to the database. If I schedule it to run once a day, can I be sure that I won't miss any items?
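The two-step plan above (check the archive, insert if missing) could be sketched roughly as follows. This is a minimal illustration, not a complete archiver: it assumes RSS 2.0 `<item>` elements, uses `<guid>` as the identity key with `<link>` and `<title>` as fallbacks, and stores rows in SQLite. The table layout and the `item_key` and `archive_feed` names are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def item_key(item):
    """Pick an identifier for an item: <guid>, else <link>, else <title>."""
    for tag in ("guid", "link", "title"):
        text = item.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None


def archive_feed(conn, feed_url, feed_xml):
    """Insert items that are not archived yet; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "feed_url TEXT, item_key TEXT, title TEXT, pub_date TEXT, "
        "PRIMARY KEY (feed_url, item_key))"
    )
    root = ET.fromstring(feed_xml)
    added = 0
    for item in root.iter("item"):
        key = item_key(item)
        if key is None:
            continue  # nothing usable to identify this item by
        # The primary key makes "already archived?" and "insert" one step:
        # a duplicate row is silently ignored and rowcount stays 0.
        cur = conn.execute(
            "INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?)",
            (feed_url, key, item.findtext("title"), item.findtext("pubDate")),
        )
        added += cur.rowcount
    conn.commit()
    return added
```

Running `archive_feed` twice on the same document adds nothing the second time, so the job can be scheduled more often than once a day without creating duplicates.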

3 answers

It depends on the feed. Some sites publish frequently and may configure their RSS feed to show only the 10 most recent articles; other sites do the opposite.
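One rough way to judge whether a daily run is frequent enough is to look at how much time the items currently in the feed span. The sketch below assumes RSS 2.0 items with RFC 822 `pubDate` values; `feed_window` is a made-up helper name, and undated or unparsable items are skipped.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def feed_window(feed_xml):
    """Return the time between the oldest and newest dated item, or None."""
    root = ET.fromstring(feed_xml)
    dates = []
    for item in root.iter("item"):
        text = item.findtext("pubDate")
        if not text:
            continue
        try:
            parsed = parsedate_to_datetime(text.strip())
        except (TypeError, ValueError):
            continue  # unparsable date, ignore this item
        if parsed.tzinfo is None:
            # Assumption for this sketch: treat zone-less dates as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        dates.append(parsed)
    if len(dates) < 2:
        return None
    return max(dates) - min(dates)
```

If the window is comfortably longer than the polling interval, a run is unlikely to miss items; if it is shorter, items may scroll out of the feed between runs. This is only a snapshot, so a burst of posts can shrink the window without warning.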


For comparison, this is how the Windows RSS Platform processes a feed:

    Connect to the Web site, and download the XML source of the feed. The Feed Download Engine downloads feeds and enclosures via HTTP or Secure Hypertext Transfer Protocol (HTTPS) protocols only.

    Transform the feed source into the Windows RSS Platform native format, which is based on RSS 2.0 with additional namespace extensions. (The native format is essentially a superset of all supported formats.) To do this, the Windows RSS Platform requires Microsoft XML (MSXML) 3.0 SP5 or later.

    Merge new feed items with existing feed items in the feed store.
    Purge older items from the feed store when the predetermined maximum number of items have been received.

    Optionally, schedule downloads of enclosures with Background Intelligent Transfer Service (BITS). 

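On the HTTP side:

To limit load on servers, the Feed Download Engine uses conditional HTTP GET combined with Delta Encoding in HTTP (RFC 3229). It also supports HTTP gzip compression through Microsoft Win32 Internet (WinInet).

In other words, when nothing has changed the server can reply with HTTP 304, based on the conditional GET HTTP headers (If-Modified-Since, If-None-Match, ETag, etc.).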
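The HTTP 304 and If-Modified-Since / If-None-Match / ETag mechanism mentioned above can be tried from a script as well. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and it assumes the server sends `ETag` or `Last-Modified` headers at all, which not every feed does.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators saved from the last poll."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on HTTP 304."""
    req = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return (
                resp.read(),
                resp.headers.get("ETag"),
                resp.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep the validators we already have.
            return (None, etag, last_modified)
        raise
```

Storing the returned `etag` and `last_modified` next to each feed URL and passing them back on the next poll makes frequent polling cheap for both sides, which helps if the feed only exposes a short window of items.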

On how many items are kept:

The following properties directly affect the number of items that remain after a synchronization operation.

    PubDate—used to determine the "age" of items. If PubDate is not set, LastDownloadTime is used. If the feed is a list, the order of items is predetermined and PubDate (if present) is ignored.

    MaxItemCount—a per-feed setting that limits the number of archived items. The feed ItemCount will never exceed the maximum, even if there are more items that could be downloaded from the feed.

    ItemCountLimit — the upper limit of items for any one feed, normally defined as 2500. The value of MaxItemCount may not exceed this limit. Set MaxItemCount to ItemCountLimit to retain the highest possible number of items.
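To make the interaction of these properties concrete, here is a small sketch of the purge rule as described in the quoted text. It is not the Windows RSS Platform API: the `purge` function, the dict-based items, and the `pub_date` key are invented for illustration, and only the 2500 figure comes from the quote.

```python
ITEM_COUNT_LIMIT = 2500  # upper limit quoted in the text above


def purge(items, max_item_count, last_download_time):
    """Keep the newest items, at most min(max_item_count, ITEM_COUNT_LIMIT).

    `items` is a list of dicts with an optional "pub_date" datetime.
    Items without a date are aged by `last_download_time` instead.
    """
    keep = min(max_item_count, ITEM_COUNT_LIMIT)

    def age_key(item):
        return item.get("pub_date") or last_download_time

    newest_first = sorted(items, key=age_key, reverse=True)
    return newest_first[:keep]
```

A store with a cap like this acts as a rolling window and not as an archive, so for archiving you would either raise the cap as far as it goes or copy items into your own database before they are purged.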
