Unlike its image based counterpart, point cloud based retrieval for place recognition has remained as an unexplored and unsolved problem. This is largely due to the difficulty in extracting local feature descriptors from a point cloud that can subsequently be encoded into a global descriptor for the retrieval task. In this paper, we propose the PointNetVLAD where we leverage on (make use of) the recent success of deep networks to solve point cloud based retrieval for place recognition. Specifically, our PointNetVLAD is a combination/modification of the existing PointNet and NetVLAD, which allows end-to-end training and inference to extract the global descriptor from a given 3D point cloud. Furthermore, we propose the “lazy triplet and quadruplet” loss functions that can achieve more discriminative and generalizable global descriptors to tackle (handle, solve) the retrieval task. We create benchmark datasets for point cloud based retrieval for place recognition, and the experimental results on these datasets show the feasibility of our PointNetVLAD. Our code and datasets are publicly available on the project website.
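The hard part of point cloud retrieval is extracting local feature descriptors that can then be encoded into a global descriptor for the retrieval task (the sentence is a bit convoluted).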
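What does this paper propose?
The lazy triplet and quadruplet loss functions.
- LiDAR (Light Detection and Ranging): laser-based range sensing
- SfM (Structure-from-Motion): recovering 3D structure from camera motion
- circumvent: to avoid, to get around
- benchmark datasets: standard datasets used as a common basis for comparison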
\(\mathcal{M}\): the 3D database (reference map), defined with respect to a fixed reference frame.
AOC: area of coverage.
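The map \(\mathcal{M}\) is further divided into \(M\) submaps.
Then \(\mathcal{M} = \bigcup_{i=1}^{M} m_i\), with \(\text{AOC}(m_i) \approx \text{AOC}(m_j)\) for all \(i, j\).
We also want each submap \(m_i\) to be small, i.e. \(|m_i| \ll |\mathcal{M}|\).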
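\(\mathcal{G}(\cdot)\): a downsampling filter, although in practice the authors apply it ahead of time as preprocessing. After downsampling, every submap point cloud has the same number of points.
\(f(\cdot)\): maps a given point cloud \(\bar{p}\) to a fixed-size global descriptor vector.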
Given a query 3D point cloud denoted as \(q\), where \(\text{AOC}(q) \approx \text{AOC}(m_i)\) and \(|\mathcal{G}(q)| = |\mathcal{G}(m_i)|\), our goal is to retrieve the submap \(m_*\) from the database \(\mathcal{M}\) that is structurally most similar to \(q\).
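In other words, each point cloud is mapped to some \(m\)-dimensional vector, and retrieval is then just a nearest-neighbor (\(\text{KNN}\)) search over those vectors.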
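To make the retrieval step concrete, here is a minimal sketch of the nearest-neighbor lookup, assuming the global descriptors have already been computed. The function name `knn_retrieve`, the squared Euclidean distance, and the toy numbers are my own assumptions for illustration, not something taken from the paper's code.

```python
import numpy as np

def knn_retrieve(query_desc, db_descs, k=1):
    """Return the indices of the k database descriptors closest to the query."""
    # Squared Euclidean distance from the query to every database descriptor.
    dists = np.sum((db_descs - query_desc) ** 2, axis=1)
    # Sort ascending and keep the k smallest distances.
    return np.argsort(dists)[:k]

# Toy database: 4 submap descriptors in 3 dimensions (made-up numbers).
db = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.6, 0.8, 0.0],
])
query = np.array([0.0, 0.9, 0.1])

top2 = knn_retrieve(query, db, k=2)
print(top2)  # indices of the two most similar submaps
```

A brute-force scan like this is fine for small databases; for a large map one would probably switch to a KD-tree or an approximate nearest-neighbor index.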
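We skip this part for now, since PointNet has its own dedicated post. All we need to know is that PointNet produces the local descriptors.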
A lot of background still needs to be filled in, for example:
- NetVLAD
- Hinge Loss and SVM (this one probably requires reading a textbook)
- triplet loss
✅ Some reference links:
- NetVLAD notes
- NetVLAD (Zhihu)
- How to understand the various losses ✅ (this one is very well written)
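As a note to self on how the hinge loss and the triplet loss listed above fit together: the triplet loss is a hinge term \([\alpha + d_{pos} - d_{neg}]_+\) on descriptor distances. The sketch below compares the standard form (sum over negatives) with the lazy variant as I understand it from the paper (keep only the largest hinge term, i.e. the hardest negative). The function names, the squared Euclidean distance, and the margin value are assumptions for illustration.

```python
import numpy as np

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors (an assumed choice)."""
    return float(np.sum((a - b) ** 2))

def triplet_loss(anchor, positive, negatives, margin=0.5):
    """Standard triplet loss: sum of hinge terms over all negatives."""
    d_pos = sq_dist(anchor, positive)
    return sum(max(0.0, margin + d_pos - sq_dist(anchor, n)) for n in negatives)

def lazy_triplet_loss(anchor, positive, negatives, margin=0.5):
    """Lazy variant: only the hardest negative (largest hinge term) counts."""
    d_pos = sq_dist(anchor, positive)
    return max(max(0.0, margin + d_pos - sq_dist(anchor, n)) for n in negatives)

# Toy descriptors (made-up numbers).
anchor = np.array([0.0, 0.0])
positive = np.array([0.5, 0.0])
negatives = [np.array([1.0, 0.0]), np.array([0.6, 0.0]), np.array([0.0, 0.7])]

standard = triplet_loss(anchor, positive, negatives)
lazy = lazy_triplet_loss(anchor, positive, negatives)
print(standard, lazy)
```

The first negative is already farther away than the margin requires, so its hinge term is zero; the lazy loss then reduces to the single worst violation instead of the total.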
In short, NetVLAD converts the local descriptors produced by PointNet into a single \(D\times K\) global vector.
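A minimal NumPy sketch of that aggregation, following the usual NetVLAD recipe as I understand it: soft-assign each local descriptor to \(K\) cluster centers, accumulate the weighted residuals, normalize per cluster, then flatten and normalize again. In the real network the centers, weights, and biases are learned; here they are random placeholders, and all names are hypothetical.

```python
import numpy as np

def netvlad_aggregate(local_desc, centers, weights, biases):
    """Aggregate N local descriptors of size D into one (D*K,) vector.

    local_desc: (N, D), centers: (K, D), weights: (K, D), biases: (K,)
    """
    # Soft-assignment of every descriptor to every cluster: softmax over K.
    logits = local_desc @ weights.T + biases          # (N, K)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # Residual of every descriptor with respect to every cluster center.
    residuals = local_desc[:, None, :] - centers[None, :, :]   # (N, K, D)
    # Weighted sum of residuals over all N descriptors.
    vlad = (assign[:, :, None] * residuals).sum(axis=0)        # (K, D)
    # Intra-normalization: L2-normalize each cluster's residual vector.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    # Flatten to D*K values and L2-normalize the whole vector.
    flat = vlad.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-12)

# Placeholder sizes and random parameters (assumptions, not the paper's values).
rng = np.random.default_rng(0)
N, D, K = 100, 8, 4
local_desc = rng.normal(size=(N, D))
centers = rng.normal(size=(K, D))
weights = rng.normal(size=(K, D))
biases = np.zeros(K)

desc = netvlad_aggregate(local_desc, centers, weights, biases)
print(desc.shape)  # (32,), i.e. D * K
```

Note that the output size depends only on \(D\) and \(K\), not on the number of points \(N\), which is what makes it usable as a fixed-size global descriptor.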
One open question: why do we need this conversion in the first place?
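However, \(D\times K\) is too high-dimensional, so a fully connected layer is used to reduce the dimension, followed by L2 normalization to produce the final global descriptor \(f(P) \in \mathbb{R}^{\mathcal{O}}\), where \(\mathcal{O} \ll (D\times K)\).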
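A minimal sketch of this last step, assuming the fully connected layer is a plain affine map: multiplying by an \(\mathcal{O} \times (D \cdot K)\) weight matrix is what shrinks the vector, and dividing by the L2 norm puts the result on the unit sphere. The sizes and the random weights below are placeholders (in the network they would be learned), and `compress_descriptor` is a hypothetical name.

```python
import numpy as np

def compress_descriptor(vlad_vec, W, b):
    """Fully connected layer followed by L2 normalization.

    vlad_vec: (D*K,), W: (O, D*K), b: (O,)  ->  returns (O,)
    """
    out = W @ vlad_vec + b                       # affine map down to O dimensions
    return out / (np.linalg.norm(out) + 1e-12)   # scale to unit L2 norm

# Placeholder sizes: D*K = 32 compressed to O = 16 (assumed, for illustration).
rng = np.random.default_rng(0)
DK, O = 32, 16
vlad_vec = rng.normal(size=DK)
W = rng.normal(size=(O, DK))
b = np.zeros(O)

final_desc = compress_descriptor(vlad_vec, W, b)
print(final_desc.shape)  # (16,)
```

Because every output has unit length, the squared Euclidean distance between two descriptors equals \(2 - 2\cos\theta\), so comparing them reduces to comparing directions.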
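A few questions remain, though:
- Why can a fully connected layer reduce dimensionality, and how is that implemented? (Probably a matter of reading the paper.)
- What is L2 normalization, and is there an L1 norm as well? Why is it used here, and what problem does it solve?

Links about L2 normalization:
- Kaggle: not very well written, but the comparison is very detailed
- Zhihu: very well written
- One with a bit of code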
The \(l_2\) norm \(\Vert v \Vert_2\) is simply the Euclidean norm:
\[ \Vert v \Vert_2 = \sqrt{\sum_i v_i^2} \]
Compared with L1, it is less robust and does not produce sparse outputs.
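A small worked example of the two norms on the same vector, to keep the definitions straight. The vector is made up; the only library calls are basic NumPy operations.

```python
import numpy as np

v = np.array([3.0, -4.0])

l1_norm = np.sum(np.abs(v))        # |3| + |-4| = 7
l2_norm = np.sqrt(np.sum(v ** 2))  # sqrt(9 + 16) = 5

v_l1 = v / l1_norm  # entries' absolute values sum to 1
v_l2 = v / l2_norm  # vector has unit Euclidean length

print(l1_norm, l2_norm)
print(v_l1, v_l2)
```

L1 normalization rescales the vector so its absolute values sum to one, while L2 normalization rescales it to unit length; neither changes the direction of the vector.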
What is metric learning?
Zhihu