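This post looks at how to work with RedisSearch from Java.
RedisSearch is a search tool. At query time it first tokenizes the text being searched for, and it tokenizes documents in the same way when building the index. For English, tokenization is fairly simple: splitting on spaces and punctuation is basically enough. Chinese is harder, because words are not separated by spaces.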
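To make the difference concrete, here is a minimal sketch in plain Java. It is not how RedisSearch or friso tokenizes internally, and the class name NaiveTokenizer is made up for illustration. Splitting on whitespace and ASCII punctuation gives sensible English tokens but leaves an unsegmented Chinese string as one single token, which is why a dictionary-based tokenizer is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of RedisSearch or friso.
class NaiveTokenizer {

    // Splits text on runs of whitespace and ASCII punctuation.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[\\s\\p{Punct}]+")) {
            if (!part.isEmpty()) { // split() can yield a leading empty string
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // English: five separate words.
        System.out.println(tokenize("Hello, world! This is Redis."));
        // Chinese: no spaces or ASCII punctuation, so one undivided token.
        System.out.println(tokenize("河北省张家口市崇礼县"));
    }
}
```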
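There are many Chinese tokenizers, such as jieba and IK, as well as friso, the tokenizer RedisSearch uses.
friso can be found on gitee: https://gitee.com/lionsoul/friso
For details on how to use friso, see the introduction on its gitee page.
Before using it, I ran a quick comparison of friso and jieba and found that friso's segmentation is slightly worse than jieba's (no disrespect to the author intended). friso is also in maintenance mode only, with no new release in five years. Its main author now maintains a newer tokenizer, which is worth a look if you are interested: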
https://gitee.com/lionsoul/jcseg#jcseg%E6%98%AF%E4%BB%80%E4%B9%88
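Because I need Chinese tokenization and friso's default dictionary does not suit my data, I need a custom dictionary.
RedisSearch bundles friso directly, so the only way to customize it is to change friso's initialization configuration.
First, take a look at the default configuration: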
127.0.0.1:6379> FT.CONFIG get *
 1) 1) EXTLOAD
    2) (nil)
 2) 1) SAFEMODE
    2) true
 3) 1) CONCURRENT_WRITE_MODE
    2) false
 4) 1) NOGC
    2) false
 5) 1) MINPREFIX
    2) 2
 6) 1) FORKGC_SLEEP_BEFORE_EXIT
    2) 0
 7) 1) MAXDOCTABLESIZE
    2) 1000000
 8) 1) MAXSEARCHRESULTS
    2) 1000000
 9) 1) MAXAGGREGATERESULTS
    2) unlimited
10) 1) MAXEXPANSIONS
    2) 200
11) 1) MAXPREFIXEXPANSIONS
    2) 200
12) 1) TIMEOUT
    2) 500
13) 1) INDEX_THREADS
    2) 8
14) 1) SEARCH_THREADS
    2) 20
15) 1) FRISOINI
    2) nil
16) 1) ON_TIMEOUT
    2) return
17) 1) GCSCANSIZE
    2) 100
18) 1) MIN_PHONETIC_TERM_LEN
    2) 3
19) 1) GC_POLICY
    2) fork
20) 1) FORK_GC_RUN_INTERVAL
    2) 30
21) 1) FORK_GC_CLEAN_THRESHOLD
    2) 100
22) 1) FORK_GC_RETRY_INTERVAL
    2) 5
23) 1) _MAX_RESULTS_TO_UNSORTED_MODE
    2) 1000
24) 1) UNION_ITERATOR_HEAP
    2) 20
25) 1) CURSOR_MAX_IDLE
    2) 300000
26) 1) NO_MEM_POOLS
    2) false
27) 1) PARTIAL_INDEXED_DOCS
    2) false
28) 1) UPGRADE_INDEX
    2) Upgrade config for upgrading
29) 1) _NUMERIC_COMPRESS
    2) false
30) 1) _PRINT_PROFILE_CLOCK
    2) true
31) 1) RAW_DOCID_ENCODING
    2) false
32) 1) _NUMERIC_RANGES_PARENTS
    2) 0
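Item 15 is the friso configuration, which is empty by default.
I have found two ways to change it so far, but have only tried the first.
The first is to pass the option as a module argument when starting redis: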
redis-server --loadmodule /usr/lib/redis/modules/redisearch.so FRISOINI /home/friso.ini
This command can go into a Dockerfile, which also copies friso's initialization file and dictionaries into the image:
FROM redislabs/redisearch:latest
MAINTAINER qzh "qiaozh2006@126.com"
WORKDIR /opt/
ADD friso.ini /home/
ADD friso_dict /home/
EXPOSE 6379
ENTRYPOINT ["redis-server", "--loadmodule", "/usr/lib/redis/modules/redisearch.so","FRISOINI", "/home/friso.ini"]
friso.ini can be downloaded from gitee; only the dictionary path in it needs to be changed:
friso.lex_dir = /home/vendors/dict/UTF-8/
The friso_dict folder is laid out as follows:
friso_dict
-vendors
--Makefile.am
--dict
---Makefile.am
---GBK
---UTF-8
----friso.lex.ini
----lex-placename.lex
lex-placename.lex is the custom dictionary, which mainly contains place names. Register it in friso.lex.ini:
__LEX_CJK_WORDS__ :[
lex-main.lex;
lex-admin.lex;
lex-chars.lex;
lex-cn-mz.lex;
lex-cn-place.lex;
lex-company.lex;
lex-festival.lex;
lex-flname.lex;
lex-food.lex;
lex-lang.lex;
lex-nation.lex;
lex-net.lex;
lex-org.lex;
lex-touris.lex;
lex-placename.lex;
# add more here
]
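Before building the image, it can be worth checking that the custom dictionary really is listed. The following is a rough sketch that relies only on the fragment shown above (entries ending in a semicolon between ":[" and "]", with # comments); the class name LexIniCheck is hypothetical, and the real friso.lex.ini may contain other constructs this does not handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker, based only on the friso.lex.ini fragment shown above.
class LexIniCheck {

    // Returns the lexicon file names listed between ":[" and "]".
    static List<String> listedLexicons(String iniText) {
        List<String> files = new ArrayList<>();
        boolean inBlock = false;
        for (String raw : iniText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.endsWith(":[")) {
                inBlock = true;  // e.g. "__LEX_CJK_WORDS__ :["
                continue;
            }
            if (line.startsWith("]")) {
                inBlock = false;
                continue;
            }
            if (inBlock && line.endsWith(";")) {
                files.add(line.substring(0, line.length() - 1).trim());
            }
        }
        return files;
    }
}
```

Calling listedLexicons on the file contents and checking that the result contains "lex-placename.lex" catches a forgotten entry before the container is started.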
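The second way is to load the module and its options through the redis configuration file. I only found this approach online and have not tried it; I will when I have time.
With the custom dictionary configured, build the image, run it, and check the friso setting in RedisSearch again:
15) 1) FRISOINI
    2) /home/friso.ini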
I am still using jedis with the same configuration, which can be found in the first article of this series: RedisJson和RedisSearch探究(一) (六狗回来的博客, CSDN).
Define Province:
package com.redisStream.pojo.address;

import lombok.Getter;
import lombok.Setter;

import java.util.List;

@Getter
@Setter
public class Province {
    private String provinceName;
    private String provincePinyin;
    private List<City> cityList;
}
Define City:
package com.redisStream.pojo.address;

import com.fasterxml.jackson.annotation.JsonFormat;
import lombok.Getter;
import lombok.Setter;

import java.util.List;

@Getter
@Setter
public class City {
    private String cityName;
    private List<County> countyList;
    private String cityPinyin;
    //"geoinfo":-122.064228,37.377658
    private String geoinfo;
}
Define County:
package com.redisStream.pojo.address;

import com.fasterxml.jackson.annotation.JsonFormat;
import lombok.Getter;
import lombok.Setter;

import java.util.List;

@Getter
@Setter
public class County {
    private String countyName;
    private String countyPinyin;
    private List<String> attributes;
}
package com.redisStream.utils;

import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import redis.clients.jedis.UnifiedJedis;
import redis.clients.jedis.search.FieldName;
import redis.clients.jedis.search.IndexDefinition;
import redis.clients.jedis.search.IndexOptions;
import redis.clients.jedis.search.Schema;

import javax.annotation.PostConstruct;
import java.lang.reflect.Field;
import java.util.Map;

@Component
public class RedisSearchUtils {

    private static final Logger log = LoggerFactory.getLogger(RedisSearchUtils.class);

    @Autowired
    private UnifiedJedis jedis;

    // every field is addressed as a JSONPath rooted at the document
    private String prefix = "$.";

    @PostConstruct
    private void init() {
        createIndex("place-index", "place:", new String[]{"provinceName", "cityList[*].cityName", "cityList[*].geoinfo", "cityList[*].countyList[*].countyName"});
    }

    public boolean createIndex(String indexName, String key, String... fields) {
        try {
            // drop the index first if it already exists
            try {
                Map<String, Object> map = jedis.ftInfo(indexName);
                log.info("index configuration:{}", map);
                jedis.ftDropIndex(indexName);
            } catch (Exception e) {
                log.error("the index does not exist", e);
            }

            Schema schema = new Schema();
            float weight = 1.0f;
            for (String field : fields) {
                String attribute;
                if (StringUtils.isNoneBlank(field)) {
                    // the alias is the last segment of the path
                    if (field.indexOf(".") == -1) {
                        attribute = field;
                    } else {
                        String[] fieldSplit = field.split("\\.");
                        attribute = fieldSplit[fieldSplit.length - 1];
                    }

                    if (attribute.toLowerCase().startsWith("geo")) {
                        // fields whose alias starts with "geo" become GEO fields
                        Schema.Field field1 = new Schema.Field(FieldName.of(prefix + field).as(attribute), Schema.FieldType.GEO);
                        //schema.addGeoField(prefix + field);
                        schema.addField(field1);
                        continue;
                    } else {
                        // everything else is a TEXT field; each later field gets three times the weight
                        Schema.TextField textField = new Schema.TextField(FieldName.of(prefix + field).as(attribute), weight, false, false, false, null);
                        schema.addField(textField);
                        weight *= 3;
                        continue;
                    }
                }
            }

            IndexDefinition rule = new IndexDefinition(IndexDefinition.Type.JSON)
                    .setLanguage("chinese")
                    .setPrefixes(new String[]{key});
            jedis.ftCreate(indexName, IndexOptions.defaultOptions().setDefinition(rule), schema);
            return true;
        } catch (Exception e) {
            log.error("create redis search index failed", e);
            return false;
        }
    }
}
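To see what createIndex amounts to on the Redis side, the alias, type and weight rules can be replayed without jedis. The sketch below is a hypothetical helper (IndexCommandSketch is not part of the project) that renders the same rules as FT.CREATE arguments. The exact arguments jedis sends may differ slightly, for example in whether a default weight is written out, so treat the output as an approximation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: mirrors the alias/type/weight rules of createIndex
// and renders them as FT.CREATE arguments instead of calling jedis.
class IndexCommandSketch {

    // The alias is the last path segment, e.g. "cityList[*].cityName" -> "cityName".
    static String aliasOf(String field) {
        int dot = field.lastIndexOf('.');
        return dot == -1 ? field : field.substring(dot + 1);
    }

    static List<String> ftCreateArgs(String indexName, String keyPrefix, String... fields) {
        List<String> args = new ArrayList<>(Arrays.asList(
                "FT.CREATE", indexName, "ON", "JSON",
                "PREFIX", "1", keyPrefix,
                "LANGUAGE", "chinese", "SCHEMA"));
        float weight = 1.0f;
        for (String field : fields) {
            if (field == null || field.trim().isEmpty()) {
                continue; // same effect as the isNoneBlank check
            }
            String alias = aliasOf(field);
            args.addAll(Arrays.asList("$." + field, "AS", alias));
            if (alias.toLowerCase(Locale.ROOT).startsWith("geo")) {
                args.add("GEO");
            } else {
                args.addAll(Arrays.asList("TEXT", "WEIGHT", String.valueOf(weight)));
                weight *= 3; // each later text field weighs three times more
            }
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", ftCreateArgs("place-index", "place:",
                "provinceName", "cityList[*].cityName", "cityList[*].geoinfo")));
    }
}
```

Pasting the printed command into redis-cli is a quick way to experiment with schema variations without restarting the Spring application.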
package com.redisStream.controller;

import com.alibaba.fastjson.JSON;
import com.redisStream.pojo.address.Province;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import redis.clients.jedis.UnifiedJedis;
import redis.clients.jedis.json.Path2;
import redis.clients.jedis.search.Document;
import redis.clients.jedis.search.Query;
import redis.clients.jedis.search.SearchResult;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

@RestController
public class PlaceController {

    private static final Logger log = LoggerFactory.getLogger(PlaceController.class);

    @Autowired
    private UnifiedJedis jedis;

    private String key_prefix = "place:";

    // Stores the province as a JSON document under "place:<provinceName>" and returns what was saved.
    @PostMapping("/addforProvince")
    public String addProvince(@RequestBody Province newKeyInfo) throws Exception {
        jedis.jsonSet(key_prefix + newKeyInfo.getProvinceName(), JSON.toJSONString(newKeyInfo));
        return JSON.toJSONString(jedis.jsonGet(key_prefix + newKeyInfo.getProvinceName()));
    }

    // The query and add endpoints shown later in this post are further methods of this class.
}
To test it, send a request to add some data:
POST http://localhost:8081/addforProvince
{
"provinceName": "河北省",
"provincePinyin": "hebeisheng",
"cityList": [{
"cityName": "张家口市",
"cityPinyin": "zhangjiakoushi",
"geoinfo": "115.408848,40.970239",
"countyList": [{
"countyName": "崇礼县",
"countyPinyin": "chonglixian",
"attributes": ["滑雪场","高山"]
}]
}]}
Response:
{"cityList":[{"cityName":"张家口市","cityPinyin":"zhangjiakoushi","countyList":[{"attributes":["滑雪场","高山"],"countyName":"崇礼县","countyPinyin":"chonglixian"}],"geoinfo":"115.408848,40.970239"}],"provinceName":"河北省","provincePinyin":"hebeisheng"}
    @PostMapping("/queryforProvince")
    public Map<String, String> queryProvince(@RequestBody String keyword) throws Exception {
        // restrict the search to the provinceName field
        Query q = new Query("@provinceName:" + keyword);
        SearchResult result = jedis.ftSearch("place-index", q);
        List<Document> docs = result.getDocuments();
        Map<String, String> map = new HashMap<>();
        for (Document doc : docs) {
            doc.getProperties().forEach(a -> map.put(doc.getId(), a.toString()));
        }
        return map;
    }
Send the request: POST http://localhost:8081/queryforProvince
河北省
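The response comes back empty.
Why is it empty? I tried several things and eventually found that it works once the geo field is removed from the index definition.
Can a geo field not be created in the same index? I checked the official site and found nothing about it. I suspected that my JSONPath was wrong, but after rereading "JSONPath - XPath for JSON" and trying several variations, I found no alternative.
So I had no choice but to split the index into two: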
createIndex("place-index","place:", new String[]{"provinceGeoInfo", "provinceName","cityList.cityName","cityList.countyList.countyName"});
createIndex("place-geo-index","place:", new String[]{"cityList[*].geoinfo"});
If anyone knows how to do this properly, please leave a reply.
After changing the index definitions, I tried again and the query worked.
createIndex("place-index","place:", new String[]{"provinceName","cityList[*].cityName","cityList[*].countyList[*].countyName"});
createIndex("place-geo-index","place:", new String[]{"provinceName","cityList[*].geoinfo"});
    @PostMapping("/queryforCity")
    public Map<String, String> queryCity(@RequestBody String keyword) throws Exception {
        // restrict the search to the cityName field
        Query q = new Query("@cityName:" + keyword);
        SearchResult result = jedis.ftSearch("place-index", q);
        List<Document> docs = result.getDocuments();
        Map<String, String> map = new HashMap<>();
        for (Document doc : docs) {
            doc.getProperties().forEach(a -> map.put(doc.getId(), a.toString()));
        }
        return map;
    }
This is the same as the province query; only the field named in the Query differs. Try it:
POST http://localhost:8081/queryforCity
{张家口市}
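Since the province and city endpoints differ only in the field name, the query string could be built by one small helper. The sketch below is hypothetical (FieldQuerySketch is not in the project) and also backslash-escapes punctuation in the keyword; the set of characters escaped is my assumption about which characters RediSearch treats as separators, not something confirmed in this post.

```java
// Hypothetical helper for building "@field:keyword" query strings.
class FieldQuerySketch {

    // Assumption: these characters are treated as separators or operators
    // by the query parser and can be escaped with a backslash.
    private static final String SPECIAL = ",.<>{}[]\"':;!@#$%^&*()-+=~|\\ ";

    static String escape(String keyword) {
        StringBuilder sb = new StringBuilder();
        for (char c : keyword.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a query limited to one field, e.g. "@cityName:张家口市".
    static String fieldQuery(String field, String keyword) {
        return "@" + field + ":" + escape(keyword.trim());
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("provinceName", "河北省"));
        System.out.println(fieldQuery("cityName", "张家口市"));
    }
}
```

The resulting string would be passed to new Query(...) exactly as in the endpoints above.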
    @PostMapping("/queryforAddrall")
    public Map<String, String> queryAddrALl(@RequestBody String keyword) throws Exception {
        // no field prefix: the keyword is matched against every text field in the index
        Query q = new Query(keyword);
        SearchResult result = jedis.ftSearch("place-index", q);
        List<Document> docs = result.getDocuments();
        Map<String, String> map = new HashMap<>();
        for (Document doc : docs) {
            doc.getProperties().forEach(a -> map.put(doc.getId(), a.toString()));
        }
        return map;
    }
Now this endpoint finds the document whether you search for 河北省 or 张家口市.
    @PostMapping("/queryforgeo")
    public Map<String, String> queryGeo(@RequestBody GEOQueryBody body) throws Exception {
        Query q = new Query(body.getName());
        if (StringUtils.isNoneBlank(body.getGeoinfo())) {
            // geoinfo is "longitude,latitude,radius"
            String[] geo = body.getGeoinfo().split(",");
            q.addFilter(new Query.GeoFilter("geoinfo",
                    Double.parseDouble(geo[0].trim()),
                    Double.parseDouble(geo[1].trim()),
                    Double.parseDouble(geo[2].trim()),
                    Query.GeoFilter.KILOMETERS));
        }
        SearchResult result = jedis.ftSearch("place-geo-index", q);
        List<Document> docs = result.getDocuments();
        Map<String, String> map = new HashMap<>();
        for (Document doc : docs) {
            doc.getProperties().forEach(a -> map.put(doc.getId(), a.toString()));
        }
        return map;
    }
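GEOQueryBody is also simple:
import lombok.Getter;
import lombok.Setter;

// Lombok generates getName() and getGeoinfo(), which queryGeo calls.
@Getter
@Setter
public class GEOQueryBody {
    private String name;    // full-text query
    private String geoinfo; // "longitude,latitude,radius"
}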
Note that the first argument of GeoFilter, property, only needs the field name, that is:
q.addFilter(new Query.GeoFilter("geoinfo", Double.parseDouble(geo[0].trim()), Double.parseDouble(geo[1].trim()), Double.parseDouble(geo[2].trim()), Query.GeoFilter.KILOMETERS));
If the index was defined without an alias via as, the JSONPath has to be used instead, for example:
q.addFilter(new Query.GeoFilter("$.geoinfo", Double.parseDouble(geo[0]), Double.parseDouble(geo[1]), Double.parseDouble(geo[2]), Query.GeoFilter.KILOMETERS));
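The filter code above assumes geoinfo always contains three comma-separated numbers; a body with only longitude and latitude would fail with an ArrayIndexOutOfBoundsException. A minimal sketch of stricter parsing follows. GeoFilterSketch is a hypothetical class, the range checks are generic sanity bounds rather than Redis's exact limits, and toQueryClause shows the geo filter in RediSearch query-string form as I understand it.

```java
// Hypothetical value class for a "longitude,latitude,radius" string.
class GeoFilterSketch {
    final double lon;
    final double lat;
    final double radiusKm;

    private GeoFilterSketch(double lon, double lat, double radiusKm) {
        this.lon = lon;
        this.lat = lat;
        this.radiusKm = radiusKm;
    }

    // Parses and validates the three parts; throws IllegalArgumentException on bad input.
    static GeoFilterSketch parse(String geoinfo) {
        String[] parts = geoinfo.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected longitude,latitude,radius");
        }
        double lon = Double.parseDouble(parts[0].trim());
        double lat = Double.parseDouble(parts[1].trim());
        double radius = Double.parseDouble(parts[2].trim());
        // Generic sanity bounds (assumption); Redis may enforce tighter latitude limits.
        if (lon < -180 || lon > 180 || lat < -90 || lat > 90 || radius <= 0) {
            throw new IllegalArgumentException("coordinates or radius out of range");
        }
        return new GeoFilterSketch(lon, lat, radius);
    }

    // Query-string form of the filter: "@field:[lon lat radius km]".
    String toQueryClause(String field) {
        return "@" + field + ":[" + lon + " " + lat + " " + radiusKm + " km]";
    }
}
```

In queryGeo, the parsed lon, lat and radiusKm values would be passed to Query.GeoFilter in place of the three Double.parseDouble calls.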
Send a request to try it out.
If the coordinates are changed, for example to:
{
"name":"河北省",
"geoinfo":"125.408848,40.970239,20"
}
the search result is empty.
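The empty result is expected: shifting the longitude by 10 degrees moves the search center far outside a 20 km radius around the stored point. A quick haversine calculation makes this visible. This is a sketch using a mean Earth radius of 6371 km, which is my assumption; Redis uses its own constant, so its distances will differ slightly.

```java
// Great-circle distance between two longitude/latitude points.
class HaversineSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean radius; an assumption

    static double distanceKm(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // stored point for 张家口市 vs. the shifted search center
        double d = distanceKm(115.408848, 40.970239, 125.408848, 40.970239);
        System.out.println(d + " km, inside 20 km radius: " + (d <= 20));
    }
}
```

The two points are roughly 840 km apart, so the document cannot match a 20 km filter.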
    @PostMapping("/addforCity")
    public String addCity(@RequestBody Province newKeyInfo) throws Exception {
        // append the first city of the request to the province's cityList array
        Path2 path = new Path2("$.cityList");
        jedis.jsonArrAppend(key_prefix + newKeyInfo.getProvinceName(), path, JSON.toJSONString(newKeyInfo.getCityList().get(0)));
        return JSON.toJSONString(jedis.jsonGet(key_prefix + newKeyInfo.getProvinceName()));
    }
Add a city:
{
"provinceName": "河北省",
"geoinfoprovince": "125.1111,33.2222",
"provincePinyin": "hebeisheng",
"cityList": [{
"cityName": "石家庄市",
"cityPinyin": "shijiazhuangshi",
"geoinfo": "125.408848,41.970239",
"countyList": [{
"countyName": "正定县",
"countyPinyin": "chonglixian",
"attributes": ["正定","历史名城"]
}]
}]}
The result is:
{"cityList":[{"cityName":"张家口市","cityPinyin":"zhangjiakoushi","countyList":[{"attributes":["滑雪场","高山"],"countyName":"崇礼县","countyPinyin":"chonglixian"}],"geoinfo":"115.408848,40.970239"},{"cityName":"石家庄市","cityPinyin":"shijiazhuangshi","countyList":[{"attributes":["正定","历史名城"],"countyName":"正定县","countyPinyin":"chonglixian"}],"geoinfo":"125.408848,41.970239"}],"geoinfoprovince":"125.1111,33.2222","provinceName":"河北省","provincePinyin":"hebeisheng"}
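There are parts I have not fully figured out, but so far I could always find a workaround. However, when I indexed a JSON array and then added more than one element to the list, the search stopped working no matter what I tried. In the example above, once 石家庄市 is added as a second city, the same search returns an empty result. After a lot of fiddling, I found this sentence on the official site: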
JSON arrays can only be indexed in a TAG field.
In other words, a TEXT field cannot index a JSON array.
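If the array fields were declared as TAG instead, matching would be exact rather than tokenized full-text, and queries would use tag syntax. The sketch below only builds such a query string; TagQuerySketch is a hypothetical helper, the escaping rule is a simplification, and I have not run this against the index from this post.

```java
// Hypothetical builder for RediSearch tag queries such as "@cityName:{a|b}".
class TagQuerySketch {

    // Simplified escaping (assumption): backslash-escape anything that is
    // not a letter or digit, so spaces and punctuation stay part of the tag.
    static String escapeTag(String tag) {
        StringBuilder sb = new StringBuilder();
        for (char c : tag.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Matches documents whose tag field contains any of the given tags.
    static String tagQuery(String field, String... tags) {
        if (tags.length == 0) {
            throw new IllegalArgumentException("at least one tag is required");
        }
        StringBuilder sb = new StringBuilder("@").append(field).append(":{");
        for (int i = 0; i < tags.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(escapeTag(tags[i]));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(tagQuery("cityName", "张家口市", "石家庄市"));
    }
}
```

The trade-off is that a TAG field only matches whole values such as 张家口市, so the custom friso dictionary would no longer play a role for those fields.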
That stings. I hope I misread it.