Building an inverted index

Suppose there are two documents:

doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.

Tokenize the text and build a first version of the inverted index:

word	doc1	doc2
I	*	*
really	*
liked	*	*
my	*	*
small	*
dogs	*	*
and	*
think	*
mom	*	*
also	*
them	*
He		*
never		*
any		*
so		*
hope		*
that		*
will		*
not		*
expect		*
me		*
to		*
him		*

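This demonstrates the simplest possible way to build an inverted index: every distinct word is recorded together with the documents it appears in.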
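The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the regular-expression tokenizer and the names `tokenize` and `build_index` are assumptions made for illustration, not how a real search engine implements analysis.

```python
import re
from collections import defaultdict

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

def tokenize(text):
    # Assumed tokenizer: keep runs of letters, drop punctuation.
    # No lowercasing or stemming is applied at this stage.
    return re.findall(r"[A-Za-z]+", text)

def build_index(docs):
    # Map every distinct word to the set of documents that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

index = build_index(docs)
print(sorted(index["liked"]))  # ['doc1', 'doc2']
print(sorted(index["small"]))  # ['doc1']
print(len(index))              # 23 distinct words, one per table row
```

A set per word is used so that a word occurring twice in the same document (such as liked in doc1) is still recorded only once.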

Running a search

mother like little dog

Tokenize the query:

mother
like
little
dog

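Matching these terms against the inverted index returns no results, because none of the four terms appears in the index in exactly that form.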

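This is not the search result we want. To a human reader, mother and mom are synonyms that both mean mother. like and liked mean the same thing; one is present tense and the other is past tense. little and small are synonyms that both mean small. dog and dogs are the same animal; one is singular and the other is plural.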
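The failed lookup can be reproduced with an exact-match search over the unnormalized index. The sketch below is illustrative only; the `search` helper and the any-term (OR) matching rule are assumptions, not part of the original walkthrough.

```python
import re

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Exact-match index: word -> set of document ids, with no normalization.
index = {}
for doc_id, text in docs.items():
    for word in re.findall(r"[A-Za-z]+", text):
        index.setdefault(word, set()).add(doc_id)

def search(query):
    # Assumed matching rule: a document is a hit if it contains any query term.
    hits = set()
    for term in query.split():
        hits |= index.get(term, set())
    return sorted(hits)

print(search("mother like little dog"))  # []
print(search("mom liked small dogs"))    # ['doc1', 'doc2']
```

The second query succeeds only because it happens to use the exact surface forms stored in the index, which is the gap normalization is meant to close.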

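When the index is built, normalization is performed in addition to creating the inverted index

normalization: each token produced by tokenization is processed so that later searches are more likely to find the related documents

Examples include tense conversion, singular/plural conversion, synonym conversion, and case conversion: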

mother --> mom
liked --> like
small --> little
dogs --> dog
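A normalization step of this kind can be sketched as a chain of small lookups. The tables `STEMS` and `SYNONYMS` below are hypothetical and cover only the words in this example; real analyzers typically rely on stemming algorithms and configurable synonym lists instead. The mother to mom direction follows the normalized index table in this section.

```python
# Hypothetical lookup tables, limited to the words used in this example.
STEMS = {"liked": "like", "dogs": "dog"}          # tense and plural forms
SYNONYMS = {"mother": "mom", "small": "little"}   # synonym groups

def normalize(token):
    token = token.lower()                # case conversion
    token = STEMS.get(token, token)      # tense / singular-plural conversion
    token = SYNONYMS.get(token, token)   # synonym conversion
    return token

print([normalize(t) for t in ["mother", "liked", "small", "dogs", "He"]])
# ['mom', 'like', 'little', 'dog', 'he']
```

The important property is that the same function is applied to document text and to query text, so both sides end up with identical terms.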

Rebuild the inverted index with normalization applied. Searching for mother like little dog again now finds the documents:

word	doc1	doc2	normalization
I	*	*
really	*
like	*	*	liked --> like
my	*	*
little	*		small --> little
dog	*	*	dogs --> dog
and	*
think	*
mom	*	*	mother --> mom
also	*
them	*
He		*
never		*
any		*
so		*
hope		*
that		*
will		*
not		*
expect		*
me		*
to		*
him		*

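The query mother like little dog is tokenized and passed through the same normalization as the documents,

so query terms and index terms are unified to the same forms:

mother	--> mom
liked	--> like
small	--> little
dogs	--> dog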

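As a result, both doc1 and doc2 are returned: doc1 matches mom, like, little and dog, while doc2 matches mom, like and dog.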

doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
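Putting the pieces together, the whole flow (tokenize, normalize, index, search) can be sketched end to end. The `NORMALIZE` table, the `analyze` helper, and the any-term matching rule are assumptions chosen to mirror the example, not a definitive implementation.

```python
import re
from collections import defaultdict

DOCS = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Hypothetical normalization table covering only this example.
NORMALIZE = {"mother": "mom", "liked": "like", "small": "little", "dogs": "dog"}

def analyze(text):
    # Tokenize, lowercase, then map each token to its normalized form.
    tokens = re.findall(r"[A-Za-z]+", text)
    return [NORMALIZE.get(t.lower(), t.lower()) for t in tokens]

def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    # Return, per document, the normalized query terms it matched.
    matched = defaultdict(list)
    for term in analyze(query):
        for doc_id in sorted(index.get(term, ())):
            matched[doc_id].append(term)
    return dict(matched)

index = build_index(DOCS)
result = search(index, "mother like little dog")
print(result)
# {'doc1': ['mom', 'like', 'little', 'dog'], 'doc2': ['mom', 'like', 'dog']}
```

doc1 matches all four terms and doc2 matches three, which is why both documents come back once the query and the index share the same normalized vocabulary.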
