Algroveon-Parser – RSS and Atom Parser without external dependencies
Slim feed parser in pure Python for RSS 2.0, RSS 0.91, and Atom 1.0.
This library parses RSS and Atom feeds using only the Python standard library. It was written as a direct replacement for an external parser, because I wanted to see how much I could achieve on my own while keeping external dependencies to an absolute minimum.
What the Parser Does
- Formats: RSS 2.0, RSS 0.91, Atom 1.0, RDF-based RSS 1.0 feeds
- Output: Typed `Feed` and `Entry` dataclasses, ready for immediate use
- HTML Sanitizer: Allowlist-based, XSS-secure, produces cleaned HTML and plain text
- Image Extraction: `media:thumbnail` → `media:content` → first `<img>` from content → summary
- Date Normalization: RFC 2822 and ISO 8601, always timezone-aware
- Encoding Fallback: Tolerates feeds that declare incorrect encoding
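
The date normalization listed above can be sketched with the standard library alone. This is a minimal illustration, not the code in `date.py`: the function name `normalize_date`, the choice to convert everything to UTC, and the treatment of naive timestamps as UTC are all assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def normalize_date(value: str) -> Optional[datetime]:
    """Return a timezone-aware UTC datetime, or None if the value cannot be parsed."""
    text = value.strip()
    if not text:
        return None
    parsed = None
    try:
        # RFC 2822, e.g. "Tue, 03 Mar 2026 14:30:00 +0100" (RSS pubDate).
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        pass
    if parsed is None:
        try:
            # ISO 8601, e.g. "2026-03-03T14:30:00Z" (Atom updated, dc:date).
            # A trailing "Z" is rewritten because fromisoformat() only
            # accepts it from Python 3.11 on.
            iso = (text[:-1] + "+00:00") if text.endswith(("Z", "z")) else text
            parsed = datetime.fromisoformat(iso)
        except ValueError:
            return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Returning `None` instead of raising keeps a single bad timestamp from failing the whole feed; whether the real module does the same is not stated here.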
Modules
| Module | Task |
|---|---|
| `parser.py` | Format detection, dispatcher, encoding fallback |
| `rss2.py` | RSS 2.0 / 0.91 parser including `content:encoded`, `dc:creator`, `media:*` |
| `atom.py` | Atom 1.0 parser including `<link rel="alternate">` |
| `sanitize.py` | HTML sanitizer + plain-text extraction |
| `images.py` | Image URL extraction from XML elements and HTML content |
| `date.py` | RFC-2822 and ISO-8601 date normalization |
| `models.py` | `Feed` and `Entry` dataclasses |
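
Format detection, the first task listed for `parser.py`, could work by inspecting the root element of the document. The sketch below shows the idea with `xml.etree.ElementTree`; the function name `detect_format` and the returned labels are assumptions, not the module's actual API.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_format(raw: bytes) -> str:
    """Classify a feed by its root element (hypothetical helper)."""
    root = ET.fromstring(raw)
    # ElementTree writes namespaced tags as "{namespace-uri}localname".
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rss1"
    if root.tag == "rss":
        # RSS 2.0 and 0.91 share the <rss> root; the version attribute differs.
        return "rss2"
    raise ValueError(f"unrecognized feed root element: {root.tag!r}")
```

A dispatcher can then map each label to the matching parser module, for example `"rss2"` to `rss2.py` and `"atom"` to `atom.py`.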
Namespaces and Real-World Feeds
Developed and tested against 16 real feeds (as of March 2026): Tagesschau, Spiegel, Süddeutsche, Zeit, Heise, The Verge, Handelsblatt, WiWo, Postillon, and others. Supported namespaces: `content:encoded`, `dc:creator`, `dc:date`, `media:thumbnail`, `media:content`. The set of test feeds will be expanded significantly, but it already serves as good training: the aim is a parser that works as universally as possible across technologies and feed variants in the long term.
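
To illustrate how the namespaced elements come into play, here is a rough sketch of an image lookup that tries `media:thumbnail`, then `media:content`, then the first `<img>` in the content, then the summary. The function name `first_image`, the use of `description` as the summary element, and the regular expression are assumptions; a regex is a simplification, and the real `images.py` may parse the HTML properly.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Prefix-to-URI map for the namespaces used in lookups.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "media": "http://search.yahoo.com/mrss/",
}

# Simplified pattern for the src attribute of the first <img> tag.
IMG_SRC = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def first_image(item: ET.Element) -> Optional[str]:
    """Return the first image URL found for an RSS <item>, or None."""
    # 1. and 2.: dedicated media elements carry the URL in a "url" attribute.
    for tag in ("media:thumbnail", "media:content"):
        node = item.find(tag, NS)
        if node is not None and node.get("url"):
            return node.get("url")
    # 3. and 4.: fall back to the first <img> in the full content, then the summary.
    for tag in ("content:encoded", "description"):
        html = item.findtext(tag, default="", namespaces=NS)
        match = IMG_SRC.search(html)
        if match:
            return match.group(1)
    return None
```

The sketch does not check whether a `media:content` element actually refers to an image rather than video or audio; a production version would likely inspect its `medium` or `type` attribute.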
No HTTP client included – intentionally. The parser accepts raw bytes, making it independent of the transport layer.
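
Because the input is raw bytes, decoding is the parser's job, and that is where a wrong encoding declaration has to be caught. The following is one possible fallback strategy, sketched under assumptions: the name `parse_bytes`, the list of candidate encodings, and their order are illustrative and not taken from `parser.py`.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as <?xml version="1.0" encoding="utf-8"?>.
XML_DECL = re.compile(rb"^\s*<\?xml[^>]*\?>")
# Assumed candidate order; latin-1 comes last because it decodes any byte sequence.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def parse_bytes(raw: bytes) -> ET.Element:
    """Parse feed bytes, retrying with other encodings if the declared one is wrong."""
    try:
        # Happy path: trust the declaration and let the XML parser decode.
        return ET.fromstring(raw)
    except ET.ParseError:
        pass
    # Drop the (apparently wrong) declaration and decode manually instead.
    body = XML_DECL.sub(b"", raw, count=1)
    for encoding in FALLBACK_ENCODINGS:
        try:
            text = body.decode(encoding)
        except UnicodeDecodeError:
            continue
        try:
            return ET.fromstring(text)
        except ET.ParseError:
            continue
    raise ValueError("feed could not be parsed with any fallback encoding")
```

The caller decides where the bytes come from, for example `urllib.request.urlopen(url).read()` or `pathlib.Path(name).read_bytes()`, so the parsing code is unaffected by the choice of transport.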
The parser currently runs embedded within the Algroveon news infrastructure. Packaging as a standalone, cleanly distributable module is not yet complete.