Acta Scientiarum Naturalium Universitatis Sunyatseni ›› 2019, Vol. 58 ›› Issue (3): 79-85. doi: 10.13471/j.cnki.acta.snus.2019.03.010

• Article

AKNN-Qalsh: an approximate KNN search extension for high-dimensional data in PostgreSQL

ZHANG Chuhan, ZHANG Jiaqiao, FENG Jianlin

  1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China
  • Received: 2018-10-11 Online: 2019-05-25 Published: 2019-05-25
  • Corresponding author: FENG Jianlin (born 1970), male; research interests: databases, data mining; E-mail: fengjlin@mail.sysu.edu.cn

Abstract:

Complex data objects such as images and text are usually represented as high-dimensional feature vectors. KNN-Gist, the existing nearest neighbor search method in PostgreSQL, is built on a tree-structured index and cannot efficiently support nearest neighbor search over high-dimensional data. This paper introduces AKNN-Qalsh, a PostgreSQL extension for approximate nearest neighbor search in high-dimensional spaces. It is based on the Locality-Sensitive Hashing scheme and supports approximate nearest neighbor search over large-scale, high-dimensional data objects. Extensive experiments on five real data sets demonstrate the effectiveness of the extension.

Key words: high-dimensional data, feature vector, nearest neighbor search, Locality-Sensitive Hashing, PostgreSQL extension
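To make the idea of hashing-based approximate nearest neighbor search concrete, the following is a minimal Python sketch of a query-aware Locality-Sensitive Hashing lookup. It is not the AKNN-Qalsh implementation, which the abstract does not describe in detail. The function names `build_index` and `approx_knn`, the parameters `width` and `min_collisions`, and the use of linear scans over the projections are all assumptions made for illustration. The sketch projects points onto random Gaussian directions, keeps the points that fall close to the query on enough projections, widens the search radius if too few candidates are found, and ranks the candidates by true Euclidean distance.

```python
import math
import random


def build_index(points, num_hashes, seed=0):
    """Project every point onto num_hashes random Gaussian directions."""
    rng = random.Random(seed)
    dim = len(points[0])
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
    # projections[j][i] is the projection of point i on direction j.
    projections = [[sum(a * x for a, x in zip(d, p)) for p in points]
                   for d in directions]
    return directions, projections


def approx_knn(points, index, query, k, width=1.0, min_collisions=None):
    """Return indices of up to k approximate nearest neighbors of query."""
    directions, projections = index
    if min_collisions is None:
        # Illustrative threshold: collide on at least half of the projections.
        min_collisions = max(1, len(directions) // 2)
    k = min(k, len(points))
    q_proj = [sum(a * x for a, x in zip(d, query)) for d in directions]

    radius = width / 2.0
    while True:
        candidates = []
        for i in range(len(points)):
            # A "collision" means the point lies in the interval centred
            # on the query's projection (the query-aware bucket).
            hits = sum(1 for j in range(len(directions))
                       if abs(projections[j][i] - q_proj[j]) <= radius)
            if hits >= min_collisions:
                candidates.append(i)
        if len(candidates) >= k:
            break
        radius *= 2.0  # too few candidates: widen the buckets and retry

    # Rank the surviving candidates by exact Euclidean distance.
    candidates.sort(key=lambda i: math.dist(points[i], query))
    return candidates[:k]


if __name__ == "__main__":
    data = [[float(i), float(i * 2 % 7), float(i * 3 % 5)] for i in range(20)]
    idx = build_index(data, num_hashes=8, seed=42)
    print(approx_knn(data, idx, data[3], k=3))
```

The linear scan over all points keeps the sketch short. A practical index would store each projection in a sorted structure so that only the points near the query's projection are visited.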

CLC number: