
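NLP, Topic Modeling, LDA: Using LDA to quickly identify a document's topics, by training an LDA model on the Hillary Clinton email dataset and classifying new text by topic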

 处女座的程序猿 2021-09-28



Output


Design approach

Core code

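import gensim  # `corpus` and `dictionary` must already be built from the preprocessed emails
# Train an LDA model with 20 topics
lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=20)
print('Topic 10:', lda.print_topic(10, topn=5))
print('All topics:', lda.print_topics(num_topics=20, num_words=5))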
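print_topic and print_topics return each topic as a formatted string, which is fine for reading but awkward for further processing. A minimal sketch of pulling the top words out as plain data, assuming `lda` is the model trained above; `top_words_per_topic` is a hypothetical helper name, and it relies on gensim's `show_topic`, which returns (word, weight) pairs:

```python
def top_words_per_topic(lda, topn=5):
    """Return {topic_id: [word, ...]} with the `topn` highest-weighted words per topic.

    `lda` is assumed to be a trained gensim LdaModel (hypothetical helper, not from the post).
    """
    return {
        topic_id: [word for word, _weight in lda.show_topic(topic_id, topn=topn)]
        for topic_id in range(lda.num_topics)
    }
```

Calling `top_words_per_topic(lda)` on the 20-topic model gives a dictionary with keys 0 through 19, which is easier to scan or export than the printed strings.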

 

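Training dataset

Download link: Hillary Clinton email dataset

Applying the LDA model

Using the trained LDA model, determine which topic each of the following sentences belongs to: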

Already voted? That's great! Now help Hillary win by signing up to make calls now
It's Election Day! Millions of Americans have cast their votes for Hillary—join them and confirm where you vote
We don’t want to shrink the vision of this country. We want to keep expanding it
We have a chance to elect a 45th president who will build on our progress, who will finish the job
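One way to assign each sentence to a topic is to convert it to a bag-of-words vector with the training dictionary and ask the model for its topic distribution. The sketch below assumes `lda` and `dictionary` are the objects from the core code. The post does not show its preprocessing, so `tokenize`, `STOPWORDS`, and `most_likely_topic` are hypothetical stand-ins; in practice the new text should be cleaned exactly as the training emails were.

```python
import re

# Tiny illustrative stop-word list (an assumption; reuse the list applied at training time).
STOPWORDS = {"and", "are", "for", "have", "our", "that", "the", "them",
             "this", "who", "will", "you"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of 3+ letters that are not stop words."""
    return [w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOPWORDS]


def most_likely_topic(lda, dictionary, text):
    """Return (topic_id, probability) for the highest-probability topic of `text`."""
    # doc2bow ignores words that were not in the training vocabulary
    bow = dictionary.doc2bow(tokenize(text))
    topics = lda.get_document_topics(bow, minimum_probability=0.0)
    return max(topics, key=lambda pair: pair[1])
```

For example, `topic_id, prob = most_likely_topic(lda, dictionary, "It's Election Day! ...")` followed by `lda.print_topic(topic_id, topn=5)` shows the words of the topic the sentence was assigned to. Because LDA training is randomized, the topic numbers will differ between runs unless `random_state` is fixed when the model is created.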
 
