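HVite is HTK's decoding tool: given a speech signal as input, together with a dictionary, acoustic models, and a language model, it outputs the corresponding transcription.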
First, the dictionary (Vocab) is structured as follows:
typedef struct {
   int nwords;          /* total number of words */
   int nprons;          /* total number of prons */
   Word nullWord;       /* dummy null word/node */
   Word subLatWord;     /* special word for HNet subLats */
   Word *wtab;          /* hash table for DictEntry's */
   MemHeap heap;        /* storage for dictionary */
   MemHeap wordHeap;    /* for DictEntry structs */
   MemHeap pronHeap;    /* for WordPron structs */
   MemHeap phonesHeap;  /* for arrays of phones */
} Vocab;
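It holds the number of words, the number of pronunciations, and a hash table of dictionary entries (DictEntry). Each slot of the table is a Word, that is, a pointer to a DictEntry. The DictEntry structure is as follows: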
typedef struct _DictEntry {
   LabId wordName;  /* word identifier */
   Pron pron;       /* first pronunciation */
   int nprons;      /* number of prons for this word */
   Word next;       /* next word in hash table chain */
   void *aux;       /* hook used by HTK library modules for temp info */
} DictEntry;
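It records the word's name, its first pronunciation, how many pronunciations it has (nprons greater than 1 means the word has several), and the next entry in the same hash chain. After the dictionary is loaded, HVite initializes the HMMSet, loads the model parameters, and so on; those steps were analyzed for the earlier tools and are not repeated here.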
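To make the chained hash table concrete, here is a minimal sketch of how a word could be looked up or inserted through a table of entry pointers. It is not HTK code: the types `MiniEntry` and `MiniVocab`, the functions `mini_hash`, `mini_get_word`, and `mini_add_pron`, and the table size are all hypothetical simplifications, and the word name is a plain C string rather than a `LabId`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 31   /* arbitrary small size, chosen only for illustration */

/* Hypothetical stand-in for DictEntry: name, pron count, chain link. */
typedef struct MiniEntry {
    char *wordName;           /* word identifier (HTK stores a LabId) */
    int nprons;               /* number of pronunciations for this word */
    struct MiniEntry *next;   /* next entry in the same hash chain */
} MiniEntry;

/* Hypothetical stand-in for Vocab: counters plus the hash table. */
typedef struct {
    int nwords;                    /* total number of words */
    int nprons;                    /* total number of pronunciations */
    MiniEntry *wtab[TABLE_SIZE];   /* each slot heads a chain of entries */
} MiniVocab;

/* Simple multiplicative string hash (an assumption, not HTK's hash). */
static unsigned mini_hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

static void mini_init(MiniVocab *v)
{
    int i;
    assert(v != NULL);
    v->nwords = 0;
    v->nprons = 0;
    for (i = 0; i < TABLE_SIZE; i++) v->wtab[i] = NULL;
}

/* Find a word; if it is missing and insert is nonzero, create it.
   Returns NULL when the word is absent and insert is zero. */
static MiniEntry *mini_get_word(MiniVocab *v, const char *name, int insert)
{
    unsigned slot = mini_hash(name);
    MiniEntry *e;

    for (e = v->wtab[slot]; e != NULL; e = e->next)
        if (strcmp(e->wordName, name) == 0) return e;
    if (!insert) return NULL;

    e = malloc(sizeof *e);
    assert(e != NULL);
    e->wordName = malloc(strlen(name) + 1);
    assert(e->wordName != NULL);
    strcpy(e->wordName, name);
    e->nprons = 0;
    e->next = v->wtab[slot];   /* prepend to the chain for this slot */
    v->wtab[slot] = e;
    v->nwords++;
    return e;
}

/* Register one more pronunciation for a word, updating both counters. */
static void mini_add_pron(MiniVocab *v, MiniEntry *w)
{
    w->nprons++;
    v->nprons++;
}
```

Adding the word "read" with two pronunciations and "cat" with one would leave `nwords` at 2 and `nprons` at 3, which mirrors how the per-word and dictionary-wide counters in the real structures relate to each other.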
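During decoding, the most important data structure not covered so far is the Lattice. Once you understand what a Lattice holds and what it is used for, most of the Viterbi algorithm falls into place; the Lattice can be regarded as the key to understanding Viterbi decoding.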
typedef struct lattice {
   MemHeap *heap;          /* Heap lattice uses */
   LatFormat format;       /* indicate which fields are valid */
   Vocab *voc;             /* Dictionary lattice based on */
   int nn;                 /* Number of nodes */
   int na;                 /* Number of arcs */
   LNode *lnodes;          /* Array of lattice nodes */
   LArc *larcs;            /* Array of lattice arcs */
   LabId subLatId;         /* Lattice Identifier (for SubLats only) */
   SubLatDef *subList;     /* List of sublats in this lattice level */
   SubLatDef *refList;     /* List of all SubLats referring to this lat */
   struct lattice *chain;  /* Linked list used for various jobs */
   char *utterance;        /* Utterance file name (NULL==unknown) */
   char *vocab;            /* Dictionary file name (NULL==unknown) */
   char *hmms;             /* MMF file name (NULL==unknown) */
   char *net;              /* Network file name (NULL==unknown) */
   float acscale;          /* Acoustic scale factor */
   float lmscale;          /* LM scale factor */
   LogFloat wdpenalty;     /* Word insertion penalty */
   float prscale;          /* Pronunciation scale factor */
   HTime framedur;         /* Frame duration in 100ns units */
   float logbase;          /* base of logarithm for likelihoods in lattice files (1.0 = default (e), 0.0 = no logs) */
   float tscale;           /* time scale factor (default: 1, i.e. seconds) */
   Ptr hook;               /* User definable hook */
} Lattice;
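Let's first get an overall picture of what a Lattice is and then refine it step by step. The Lattice structure records the number of nodes (nn) and arcs (na), the arrays that hold those nodes and arcs (lnodes, larcs), the sub-lattices used at this level (subList) together with the list of sub-lattices that refer to this lattice (refList), and a set of scale factors and penalties (acscale, lmscale, wdpenalty, prscale, and so on).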