Christmas is just a week away. As a programmer, ……
OK, that's enough preamble (haha, fill in the rest yourself). On to the code:
PS: all the source material and mask images are bundled together; grab them at the end of the post.
The program uses four libraries: wordcloud, PIL (Pillow), numpy, and jieba. Install whichever ones you're missing:
```shell
pip install wordcloud
pip install pillow    # PIL lives on PyPI as "pillow", but is still imported as PIL
pip install numpy
pip install jieba
```
Once everything is installed, you can start generating word clouds. The examples below all use the story of Little Red Riding Hood. The Chinese version of the story was obtained directly from Baidu Translate, so it contains a few typos, but they have little effect on the final result.
Generating an English word cloud is straightforward: no extra word segmentation or stopword handling is needed. The code is as follows:
```python
from wordcloud import WordCloud
import PIL.Image as image
import numpy as np

# Settings: adjust to your own setup
encoding_type = "utf-8"                  # text encoding
background_color = "white"               # background color of the generated image
txt_path = "little red-cap.txt"          # path to the text file
mask_path = "mask.png"                   # path to the mask image that shapes the cloud
img_path = "red-cap_wordcloud.png"       # output path for the word cloud image
max_words = 200                          # maximum number of words to display

# Read the text file
def get_txt(txtpath):
    with open(txtpath, encoding=encoding_type) as f:
        text = f.read()
    return text

# Generate the word cloud
def generate_wordcloud(wordlist, maskpath, backgroundcolor, maxwords):
    mask = np.array(image.open(maskpath))    # shape mask
    wordcloud = WordCloud(
        background_color=backgroundcolor,    # background color
        mask=mask,                           # mask
        max_words=maxwords                   # maximum number of words
    ).generate(wordlist)
    return wordcloud

text = get_txt(txt_path)                     # read the text
word_cloud = generate_wordcloud(text,        # generate the cloud
                                mask_path,
                                background_color,
                                max_words)
image_file = word_cloud.to_image()           # render to an image
image_file.show()                            # display it
word_cloud.to_file(img_path)                 # save to disk
```
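A note on the mask: wordcloud treats pure-white pixels (value 255) as masked out and only draws words in the non-white regions, so your mask image should have a white background with the desired shape in a darker color. Here is a minimal sketch using a synthetic numpy array in place of mask.png (the 200×200 size and square shape are just illustrative):

```python
import numpy as np

# Build a synthetic mask: a white 200x200 canvas (masked out everywhere)
# with a black 100x100 square in the middle (words drawn only there).
mask = np.full((200, 200), 255, dtype=np.uint8)  # all white
mask[50:150, 50:150] = 0                         # black square

# wordcloud keeps only the non-white pixels as the drawable area
drawable = np.count_nonzero(mask != 255)
print(drawable)  # → 10000 (the 100x100 square)
```

An array like this can be passed directly as the `mask=` argument to `WordCloud`, just like the array loaded from mask.png above.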
The generated result:
Generating a Chinese word cloud is more involved: you have to segment the text into words yourself (using jieba), set a Chinese font yourself (here SimHei), and remove stopwords yourself. Stopwords are words with little standalone meaning, such as "的", "和", and "或". English doesn't need this step because wordcloud ships with a default English stopword list and filters those words automatically. The code is as follows:
```python
from wordcloud import WordCloud
import PIL.Image as image
import numpy as np
import jieba

# Settings: adjust to your own setup
encoding_type = "utf-8"                      # text encoding
background_color = "white"                   # background color of the generated image
txt_path = "小红帽.txt"                      # path to the text file
font_path = "simhei.ttf"                     # path to a Chinese font (SimHei)
mask_path = "mask.png"                       # path to the mask image that shapes the cloud
stopwords_path = "chinese_stopwords.txt"     # path to the stopword list
img_path = "小红帽词云.png"                  # output path for the word cloud image
max_words = 200                              # maximum number of words to display

# Read the text file
def get_txt(txtpath):
    with open(txtpath, encoding=encoding_type) as f:
        text = f.read()
    return text

# Segment the text into words
def cut_words(text):
    words = " ".join(jieba.cut(text, cut_all=False))  # cut_all=False is accurate mode
    wordslist = words.split(" ")                      # split on spaces
    return wordslist

# Load the stopword list, one word per line
def get_stopwordslist(stopwordspath):
    with open(stopwordspath, encoding=encoding_type) as f:
        stopwords = [line.strip() for line in f]
    return stopwords

# Remove stopwords and rejoin the remaining words with spaces
def refine_words(wordlist, stopwordlist):
    refined = []
    for word in wordlist:
        if word not in stopwordlist and word != "\t":
            refined.append(word)
    return " ".join(refined)

# Generate the word cloud
def generate_wordcloud(wordlist, maskpath, fontpath, backgroundcolor, maxwords):
    mask = np.array(image.open(maskpath))    # shape mask
    wordcloud = WordCloud(
        font_path=fontpath,                  # font (required for Chinese text)
        background_color=backgroundcolor,    # background color
        mask=mask,                           # mask
        max_words=maxwords                   # maximum number of words
    ).generate(wordlist)
    return wordcloud

text = get_txt(txt_path)                              # read the text
words_list = cut_words(text)                          # segment into words
stop_words = get_stopwordslist(stopwords_path)        # load the stopword list
refined_text = refine_words(words_list, stop_words)   # drop the stopwords
word_cloud = generate_wordcloud(refined_text,         # generate the cloud
                                mask_path,
                                font_path,
                                background_color,
                                max_words)
image_file = word_cloud.to_image()                    # render to an image
image_file.show()                                     # display it
word_cloud.to_file(img_path)                          # save to disk
```
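The stopword-removal step boils down to a simple membership test on each token. Here is a minimal self-contained sketch of that step, with a hand-made token list standing in for jieba's output (the tokens and stopwords below are hypothetical sample data, not taken from chinese_stopwords.txt):

```python
# Pretend these came from chinese_stopwords.txt (a set makes lookups fast)
stopwords = {"的", "和", "了"}

# Pretend this is jieba's segmentation of a sentence from the story
tokens = ["小红帽", "的", "外婆", "和", "大灰狼", "了"]

# Keep only the tokens that are not stopwords, then rejoin with spaces
refined = " ".join(t for t in tokens if t not in stopwords and t != "\t")
print(refined)  # → 小红帽 外婆 大灰狼
```

Using a `set` instead of a `list` for the stopwords is a small but worthwhile change for long texts, since `in` on a set is constant time.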
The generated result:
The contents of the download package are shown below:
Download link: wordcloud_source.zip