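RSS (Rich Site Summary) is a format for delivering regularly changing web content. Many news sites, blogs and other online publishers syndicate their content as an RSS feed to anyone who wants it. In Python, we can read and process these feeds with the feedparser package, which is installed as follows: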
pip install feedparser
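Under the hood an RSS feed is just an XML document: a channel element with its own title, followed by one item element per post. feedparser hides this detail, but a minimal sketch using only the standard library's xml.etree.ElementTree makes the structure visible. The sample feed string and the item_titles function are hypothetical, written for illustration only. This is not meant to reproduce how feedparser works internally.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (hypothetical sample data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.com/</link>
    <description>Sample feed for illustration</description>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>"""


def item_titles(rss_text):
    """Return (channel title, list of item titles) from an RSS 2.0 string."""
    root = ET.fromstring(rss_text)       # the <rss> root element
    channel = root.find("channel")       # RSS 2.0 wraps everything in one <channel>
    channel_title = channel.findtext("title")
    titles = [item.findtext("title") for item in channel.findall("item")]
    return channel_title, titles


print(item_titles(SAMPLE_RSS))
# ('Example News', ['First post', 'Second post'])
```

feedparser does this kind of walking for you and exposes each item as an entry in NewsFeed.entries, as the examples below show.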
1. Feed Structure
In the example below, we fetch the structure of a feed entry so that we can decide which parts of the feed to process further.
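import feedparser
# Download and parse the Times of India top-stories feed
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
# Take the second entry (index 1) and list the keys it provides
entry = NewsFeed.entries[1]
print(list(entry.keys()))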
Running the example code above gives the following result:
['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
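Several of these keys are not the raw RSS element names. feedparser normalizes them, so an RSS 2.0 description element appears as summary, pubDate as published, and guid as id, while keys such as title_detail, summary_detail, links and published_parsed are extra data that feedparser derives. The standard-library sketch below lists the raw child tags of an item and translates the common ones. The sample feed is hypothetical, and the FEEDPARSER_NAMES table is an illustrative assumption that covers only these few elements, not feedparser's complete mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-item feed, written to mirror the keys shown above.
SAMPLE_ITEM_RSS = """<rss version="2.0"><channel><title>Example</title>
<item>
  <title>Sample headline</title>
  <description>Short summary text.</description>
  <link>https://example.com/story</link>
  <guid>https://example.com/story</guid>
  <pubDate>Fri, 18 May 2018 20:13:13 GMT</pubDate>
</item>
</channel></rss>"""

# Partial, illustrative table: raw RSS 2.0 element name -> feedparser entry key.
FEEDPARSER_NAMES = {
    "title": "title",
    "link": "link",
    "description": "summary",
    "pubDate": "published",
    "guid": "id",
}


def raw_and_mapped_tags(rss_text):
    """For the first <item>, return (raw child tags, corresponding entry keys)."""
    item = ET.fromstring(rss_text).find("channel/item")
    raw = [child.tag for child in item]
    # In this sketch, tags missing from the table are passed through unchanged.
    mapped = [FEEDPARSER_NAMES.get(tag, tag) for tag in raw]
    return raw, mapped


raw, mapped = raw_and_mapped_tags(SAMPLE_ITEM_RSS)
print(raw)     # ['title', 'description', 'link', 'guid', 'pubDate']
print(mapped)  # ['title', 'summary', 'link', 'id', 'published']
```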
2. Feed Title and Posts
In the example below, we count the posts in the RSS feed and read the title of one of them.
import feedparser
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
# Each post in the feed is one element of NewsFeed.entries
print('Number of RSS posts :', len(NewsFeed.entries))
entry = NewsFeed.entries[1]
print('Post Title :', entry.title)
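Running the example code above gave the following result (the live feed changes over time, so your output will differ):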
Number of RSS posts : 5
Post Title : Cong-JD(S) in SC over choice of pro tem speaker
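The example reads entries[1], i.e. the second post, and would raise an IndexError if the feed had fewer than two entries. A minimal sketch of walking every post instead is shown below. It assumes only that each entry supports dictionary-style .get(), which feedparser's entries do; plain dicts stand in for them here so the sketch runs without network access, and the numbered_titles name is hypothetical.

```python
def numbered_titles(entries):
    """Return one 'N. title' line per entry, tolerating entries with no title."""
    return [
        f"{index}. {entry.get('title', '(no title)')}"
        for index, entry in enumerate(entries, start=1)
    ]


# Plain dicts standing in for feedparser entries (hypothetical data).
sample_entries = [{"title": "First story"}, {"title": "Second story"}, {}]
print("Number of RSS posts :", len(sample_entries))
for line in numbered_titles(sample_entries):
    print(line)
```

With a real feed, pass NewsFeed.entries instead of sample_entries; an empty feed simply produces no lines rather than an error.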
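3. Feed Details
Based on the entry structure above, we can write a Python program that pulls the required details out of the feed, as shown below. Because each entry behaves like a dictionary, its keys give us the values we need; feedparser also lets us read them as attributes, such as entry.published.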
import feedparser
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
entry = NewsFeed.entries[1]
# Publication date, summary text and article link of the second post
print(entry.published)
print("******")
print(entry.summary)
print("------News Link--------")
print(entry.link)
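Running the example code above gave the following result (the live feed changes over time, so your output will differ):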
Fri, 18 May 2018 20:13:13 GMT
******
Controversy erupted on Friday over the appointment of BJP MLA K G Bopaiah as pro tem speaker for the assembly, with Congress and JD(S) claiming the move went against convention that the post should go to the most senior member of the House. The combine approached the SC to challenge the appointment. Hearing is scheduled for 10:30 am today.
------News Link--------
https://timesofindia.indiatimes.com/india/congress-jds-in-sc-over-bjp-mla-made-pro-tem-speaker-hearing-at-1030-am/articleshow/64228740.cms
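Not every feed fills in every field, and entry.published is a plain string. The sketch below gathers the three fields used above with .get() defaults and turns the RFC 822-style date string into a timezone-aware datetime using the standard library's email.utils.parsedate_to_datetime. It also shows converting feedparser's published_parsed value, which feedparser provides as a UTC time.struct_time. The entry_details and parsed_to_datetime helpers and the sample dictionary are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def entry_details(entry):
    """Return (published datetime or None, summary, link) for a dict-like entry."""
    published = entry.get("published")
    # RSS pubDate strings use the RFC 822 format, e.g. "Fri, 18 May 2018 20:13:13 GMT".
    when = parsedate_to_datetime(published) if published else None
    return when, entry.get("summary", ""), entry.get("link", "")


def parsed_to_datetime(parsed):
    """Convert a UTC time tuple (such as feedparser's published_parsed) to a datetime."""
    return datetime(*parsed[:6], tzinfo=timezone.utc)


# A plain dict standing in for a feedparser entry; it has no summary on purpose.
sample = {
    "published": "Fri, 18 May 2018 20:13:13 GMT",
    "link": "https://example.com/story",
}
when, summary, link = entry_details(sample)
print(when.isoformat())  # 2018-05-18T20:13:13+00:00
print(repr(summary))     # ''
```

With a real feed you would call entry_details(entry), or parsed_to_datetime(entry.published_parsed) when the feed supplies a date.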