Scenario:
Count how many times each value occurs in the category field of the data below.
Data source:
es_ip10000.json
{"_index":"order","_type":"service","_id":"107.151.83.180:22","_score":1,"_source":{"ip":"107.151.83.180","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.151.84.167:22","_score":1,"_source":{"ip":"107.151.84.167","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.151.84.177:22","_score":1,"_source":{"ip":"107.151.84.177","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.152.188.252:1723","_score":1,"_source":{"ip":"107.152.188.252","parent_category":["网络产品"],"category":["路由器"]}}
{"_index":"order","_type":"service","_id":"107.151.89.125:1025","_score":1,"_source":{"ip":"107.151.89.125"}}
{"_index":"order","_type":"service","_id":"107.152.58.217:22","_score":1,"_source":{"ip":"107.152.58.217","parent_category":["支撑系统"],"category":["服务"]}}
{"_index":"order","_type":"subdomain","_id":"107.15.221.83:443","_score":1,"_source":{"ip":"107.15.221.83","parent_category":["办公外设","系统软件"],"category":["打印机","操作系统"]}}
Extract the category field under _source:
cat es_ip10000.json | jq ._source.category > category.txt
Output:
[
  "其他支撑系统"
]
[
  "其他支撑系统"
]
[
  "其他支撑系统"
]
[
  "路由器"
]
null
[
  "服务"
]
[
  "打印机",
  "操作系统"
]
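As a possible shortcut, jq can unpack the arrays itself, so that the values arrive one per line without brackets, commas or quotes. The sketch below is an assumption about how this could be done, not part of the original steps: -r prints raw strings, // ["null"] substitutes a placeholder for records that have no category, and .[] emits each array element separately. The file names sample.json and category_flat.txt are made up for the example.

```shell
# Two sample records in the same shape as es_ip10000.json (hypothetical file name).
cat > sample.json <<'EOF'
{"_source":{"ip":"107.151.89.125"}}
{"_source":{"ip":"107.15.221.83","category":["打印机","操作系统"]}}
EOF

# -r            raw output, strings are printed without quotes
# // ["null"]   placeholder array for records where category is missing
# .[]           one array element per output line
jq -r '._source.category // ["null"] | .[]' sample.json > category_flat.txt
cat category_flat.txt
```

With this form the values are unquoted, so the counts produced later would list 打印机 rather than "打印机".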
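Open category.txt in an editor and delete every comma (,), opening bracket ([) and closing bracket (]).
Result after editing: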
"其他支撑系统"
"其他支撑系统"
"其他支撑系统"
"路由器"
null
"服务"
"打印机"
"操作系统"
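The manual editing step could probably be scripted with sed instead. This is a minimal sketch under the assumption that jq wrote its default multi-line output and that no category value itself contains a comma or bracket. The file names category_sample.txt and category_clean.txt are hypothetical.

```shell
# A small stand-in for category.txt as jq would write it (hypothetical file name).
cat > category_sample.txt <<'EOF'
[
  "其他支撑系统"
]
null
[
  "打印机",
  "操作系统"
]
EOF

# 1) delete every ], [ and ,   (this would also strip commas inside a value)
# 2) strip leading and trailing whitespace
# 3) drop the lines that are now empty
sed -e 's/[][,]//g' \
    -e 's/^[[:space:]]*//' \
    -e 's/[[:space:]]*$//' \
    -e '/^$/d' category_sample.txt > category_clean.txt
cat category_clean.txt
```

Dropping the empty lines matters for the counting step, since each blank line would otherwise be counted as a value of its own.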
Sort -> deduplicate and count -> sort again by count:
cat category.txt | sort | uniq -c | sort -n >category_count.txt
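Notes:
uniq -c # collapse adjacent duplicate lines and prefix each with its count (this is why the input is sorted first)
sort -n # sort numerically in ascending order
Output: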
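1 null
1 "操作系统"
1 "打印机"
1 "服务"
1 "路由器"
3 "其他支撑系统"
12
The last line, 12, has no label. It is most likely the number of blank lines left in category.txt where lines holding only [ or ] were emptied (6 arrays, 2 bracket lines each).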
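To see why the first sort in the pipeline is needed, the following small sketch compares uniq -c on unsorted and on sorted input. The input values and the file names unsorted_counts.txt and sorted_counts.txt are invented for the illustration.

```shell
# uniq only merges lines that are directly next to each other.
# Unsorted: the two "b" lines are not adjacent, so "b" is reported twice.
printf 'b\na\nb\n' | uniq -c > unsorted_counts.txt

# Sorted first: equal lines become adjacent, so "b" is reported once with count 2.
printf 'b\na\nb\n' | sort | uniq -c > sorted_counts.txt

cat unsorted_counts.txt
cat sorted_counts.txt
```

If the most frequent category should come first, sort -rn can be used in place of sort -n for the final step.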