在自然语言处理领域,会遇到两个或多个单词具有共同根的情况。 例如,三个词 - agreed
, agreeing
和 agreeable
具有相同的词根同意。 涉及任何这些词的搜索应将它们视为同一个词,即根词。 因此,将所有单词链接到根词中变得至关重要。 NLTK
库具有执行此链接的方法,并提供显示根词的输出。
nltk
中有三种最常用的词干算法。它们的结果略有不同。 以下示例显示了所有三种词干算法及其结果的使用。
import nltk from nltk.stem.porter import PorterStemmer from nltk.stem.lancaster import LancasterStemmer from nltk.stem import SnowballStemmer porter_stemmer = PorterStemmer() lanca_stemmer = LancasterStemmer() sb_stemmer = SnowballStemmer("english",) word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns" # First Word tokenization nltk_tokens = nltk.word_tokenize(word_data) #Next find the roots of the word print '***PorterStemmer****\n' for w_port in nltk_tokens: print "Actual: %s || Stem: %s" % (w_port,porter_stemmer.stem(w_port)) print '\n***LancasterStemmer****\n' for w_lanca in nltk_tokens: print "Actual: %s || Stem: %s" % (w_lanca,lanca_stemmer.stem(w_lanca)) print '\n***SnowballStemmer****\n' for w_snow in nltk_tokens: print "Actual: %s || Stem: %s" % (w_snow,sb_stemmer.stem(w_snow))
当运行上面的程序时,我们得到以下输出 -
***PorterStemmer**** Actual: Aging || Stem: age Actual: head || Stem: head Actual: of || Stem: of Actual: famous || Stem: famou Actual: crime || Stem: crime Actual: family || Stem: famili Actual: decides || Stem: decid Actual: to || Stem: to Actual: transfer || Stem: transfer Actual: his || Stem: hi Actual: position || Stem: posit Actual: to || Stem: to Actual: one || Stem: one Actual: of || Stem: of Actual: his || Stem: hi Actual: subalterns || Stem: subaltern ***LancasterStemmer**** Actual: Aging || Stem: ag Actual: head || Stem: head Actual: of || Stem: of Actual: famous || Stem: fam Actual: crime || Stem: crim Actual: family || Stem: famy Actual: decides || Stem: decid Actual: to || Stem: to Actual: transfer || Stem: transf Actual: his || Stem: his Actual: position || Stem: posit Actual: to || Stem: to Actual: one || Stem: on Actual: of || Stem: of Actual: his || Stem: his Actual: subalterns || Stem: subaltern ***SnowballStemmer**** Actual: Aging || Stem: age Actual: head || Stem: head Actual: of || Stem: of Actual: famous || Stem: famous Actual: crime || Stem: crime Actual: family || Stem: famili Actual: decides || Stem: decid Actual: to || Stem: to Actual: transfer || Stem: transfer Actual: his || Stem: his Actual: position || Stem: posit Actual: to || Stem: to Actual: one || Stem: one Actual: of || Stem: of Actual: his || Stem: his Actual: subalterns || Stem: subaltern