Ah, it’s nice (in a “it’s not really nice” kinda way) to see that scale has finally brought many of the challenges we faced at PointCast back in ’96-’97 to the RSS aggregator world. For a while there, it seemed everyone thought RSS (remember, it stands for Really Simple Syndication, right?), Atom, etc. were truly viable solutions as currently envisioned and implemented (and as desired by users, i.e., full-text feeds).
Unfortunately, life isn’t that simple; if it were, PointCast might still be around and thriving today (well, that’s a bit of a stretch)…
RSS has several benefits for users and publishers. Personally, I see these as:
1. Timely, structured, aggregated updates of content that interest me (user)
2. Anonymous content pulls, in a relationship structure that I can terminate at will (user)
3. Distributed, standardized content structure and serving (publisher/aggregator client author)
Unfortunately, it is these very things that impact scalability. To wit:
w.r.t. #1
Until the perceived difference between displaying a full-text post/article in my aggregator client and loading it in real time is non-existent, and until broadband access is so ubiquitous that “whacking” pages (remember that web term?) before jumping on the train, plane, etc. holds no value, users will desire full-text, pre-fetched pages over headline-only feeds, substantially increasing bandwidth requirements.
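To make “substantially” concrete, here’s a back-of-the-envelope sketch in Python. Every figure in it (subscriber count, polling interval, feed sizes) is an assumption I picked for illustration, not a measured number from PointCast or any real publisher:

```python
# Back-of-the-envelope: daily feed bandwidth, full text vs. headlines only.
# All figures below are illustrative assumptions, not measured data.

SUBSCRIBERS = 50_000       # assumed aggregator clients polling the feed
POLLS_PER_DAY = 48         # assumed 30-minute polling interval
HEADLINE_FEED_KB = 10      # assumed headline/summary-only feed size
FULLTEXT_FEED_KB = 250     # assumed full-text feed size

def daily_gb(feed_kb: float) -> float:
    """Total transfer per day in GB if every poll pulls the whole feed."""
    return SUBSCRIBERS * POLLS_PER_DAY * feed_kb / 1_048_576  # KB -> GB

print(f"headlines only: {daily_gb(HEADLINE_FEED_KB):6.1f} GB/day")  # ~22.9
print(f"full text:      {daily_gb(FULLTEXT_FEED_KB):6.1f} GB/day")  # ~572.2
```

Same subscriber base, same polling schedule, a 25x difference purely from what each item carries.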
w.r.t. #2
If the server doesn’t know the client, it must trust the client to “do the right thing”, e.g., respect a feed-specific polling limit. This inherently opens the door to abuse, be it intentional (e.g., denial of service) or simple user overzealousness.
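The usual good-citizen mitigation on the client side is an HTTP conditional GET plus a self-imposed polling floor. Here’s a minimal sketch, standard library only; the feed URL and the 30-minute floor are placeholder assumptions, and a real aggregator would also honor the feed’s own <ttl>/skipHours hints:

```python
import time
import urllib.error
import urllib.request

FEED_URL = "https://example.com/feed.xml"  # hypothetical feed URL
MIN_POLL_SECONDS = 30 * 60                 # assumed self-imposed floor

def poll(url, etag=None, last_modified=None):
    """One conditional GET; returns (status, body, etag, last_modified).

    Sending If-None-Match / If-Modified-Since lets the server answer
    304 with no body when nothing has changed: the polite-client bargain.
    """
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            return (resp.getcode(), resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:                # unchanged; server sent no body
            return 304, None, etag, last_modified
        raise

etag = last_mod = None
for _ in range(3):                         # a real client would loop forever
    status, body, etag, last_mod = poll(FEED_URL, etag, last_mod)
    if status == 200 and body:
        pass                               # parse feed; honor <ttl>/skipHours
    time.sleep(MIN_POLL_SECONDS)           # never poll faster than the floor
```

Which is exactly the point: nothing forces an anonymous client to send those headers or respect that floor. The server is simply trusting it to.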
w.r.t. #3
PointCast had a central NOC; we pulled content from all of our providers and centrally served up only the new/updated articles to known clients that could tell us their current client-side state. The content we pulled was “generally” in a standardized format, but of course it was easier for the folks working with giant media companies to request custom feed work than to force said media co’s to comply with our standard way back when. And it was certainly easier (business-wise), though substantially more expensive and more technically challenging, for us to serve the content than for the media co’s to do it.
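For contrast, here’s roughly what the known-client model buys you, sketched in Python. The Article shape and the “high-water mark” check-in are my own simplification for illustration, not PointCast’s actual protocol or wire format:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Article:
    article_id: str
    updated: datetime
    body: str

# Hypothetical central store at the NOC, every provider normalized into it.
STORE = [
    Article("a1", datetime(2004, 9, 1, 8, 0, tzinfo=timezone.utc), "..."),
    Article("a2", datetime(2004, 9, 1, 9, 30, tzinfo=timezone.utc), "..."),
]

def delta_since(client_high_water: datetime) -> list[Article]:
    """Because the client reports its state, ship only what it lacks;
    no re-sending the whole feed, no guessing what it already has."""
    return [a for a in STORE if a.updated > client_high_water]

# A known client checks in with the timestamp of the newest article it holds:
fresh = delta_since(datetime(2004, 9, 1, 8, 30, tzinfo=timezone.utc))
print([a.article_id for a in fresh])  # -> ['a2']
```

Anonymous RSS pulls give up exactly this lever: with no client identity and no reported state, the best a server can hope for is the conditional GET above.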
Of course, things changed over time. Central servers and known clients fetching from a known state at first… then came “point servers” (local caching servers that all clients behind a given firewall would point to)… then CDF (the precursor to RSS, IMHO) when we wanted to distribute serving load for smaller publishers… then experiments with multicasting (we actually implemented rudimentary multicast for very limited data feeds [e.g., weather temps] in either the 2.6 or 2.7 client, I forget which)… then contemplation of P2P… and finally a new (3.0) architecture focused on headlines only! Full circle!
Unfortunately, users want what they want. And as I mentioned above, that’s full text, available instantaneously, whether they’re connected or not. Compression, P2P, multicast, known state, and similar concepts will all be needed if the goal is truly to provide what users want while minimizing resource utilization. RSS? No. RCS – Really Complex Syndication… but it’ll be worth it!