1. Install Spark
Check the JDK and run a Spark test
2. Python programming exercise: word frequency count for English text
Source code:
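import string

# Read the whole file and lowercase it so counting is case-insensitive
with open('test.txt', 'r', encoding='UTF-8') as f:
    txt = f.read().lower()

# Remove punctuation
for ch in string.punctuation:
    txt = txt.replace(ch, '')

# Count each word; split() with no argument splits on any whitespace,
# including newlines, and never yields empty strings
counts = {}
for word in txt.split():
    counts[word] = counts.get(word, 0) + 1

# Sort by frequency, highest first
ranked = sorted(counts.items(), key=lambda d: d[1], reverse=True)

# Write one "word--count" line per word
with open('res.txt', 'w', encoding='UTF-8') as f:
    for word, count in ranked:
        f.write('{}--{}\n'.format(word, count))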
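As a possible alternative, the same counting can be sketched with the standard library's collections.Counter. This is only an illustration, not the method used in the exercise: the function name word_frequencies and the sample sentence are made up for the example, and it works on a string in memory rather than on test.txt.

```python
from collections import Counter
import string


def word_frequencies(text):
    """Return (word, count) pairs sorted by descending count.

    Hypothetical helper for illustration; not part of the original exercise.
    """
    # Build a translation table that deletes every punctuation character
    table = str.maketrans('', '', string.punctuation)
    # Lowercase, strip punctuation, then split on any whitespace
    words = text.lower().translate(table).split()
    # most_common() with no argument returns all items, most frequent first
    return Counter(words).most_common()


# Made-up sample text, used only to show the output format
sample = "The cat saw the dog. The dog ran!"
for word, count in word_frequencies(sample):
    print('{}--{}'.format(word, count))
```

Counter handles the tallying and sorting in one step, so the manual dictionary loop and the sorted() call are not needed in this version.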
Run result: