Baumann, Philipp; Hochbaum, Dorit S.; Spaen, Quico (11 December 2017). High-performance geometric algorithms for sparse computation in big data analytics. In: Proceedings of the 2017 IEEE International Conference on Big Data. Boston. 11.-14.12.2017. 10.1109/BigData.2017.8257970
Text
High-Performance Geometric Algorithms for Sparse Computation in Big Data Analytics.pdf - Published Version Restricted to registered users only Available under License Publisher holds Copyright. Download (573kB) |
Several leading supervised and unsupervised machine learning algorithms require as input similarities between objects in a data set. Since the number of pairwise similarities grows quadratically with the size of the data set, it is computationally prohibitive to compute all pairwise similarities for large-scale data sets. The recently introduced methodology of “sparse computation” resolves this issue by computing only the relevant similarities instead of all pairwise similarities. To identify the relevant similarities, sparse computation efficiently projects the data onto a low-dimensional space where a similarity is considered relevant if the corresponding objects are close in this space. The relevant similarities are then computed in the original space. Sparse computation identifies close pairs by partitioning the low-dimensional space into grid blocks, and considering objects close if they fall in the same or adjacent grid blocks. This guarantees that all pairs of objects that are within a specified L∞ distance are identified as well as some pairs that are within twice this distance. For very large data sets, sparse computation can have high runtime due to the enumeration of pairs of adjacent blocks. We propose here new geometric algorithms that eliminate the need to enumerate adjacent blocks. Our empirical results on data sets with up to 10 million objects show that the new algorithms achieve a significant reduction in runtime. The algorithms have applications in large-scale computational geometry and (approximate) nearest neighbor search. Python implementations of the proposed algorithms are publicly available.
Item Type: |
Conference or Workshop Item (Paper) |
---|---|
Division/Institute: |
03 Faculty of Business, Economics and Social Sciences > Department of Business Management > Institute of Financial Management > Professorship for Quantitative Methods in Business Administration |
UniBE Contributor: |
Baumann, Philipp |
Subjects: |
600 Technology > 650 Management & public relations |
Language: |
English |
Submitter: |
Juliana Kathrin Moser-Zurbrügg |
Date Deposited: |
04 Apr 2018 08:04 |
Last Modified: |
05 Dec 2022 15:11 |
Publisher DOI: |
10.1109/BigData.2017.8257970 |
BORIS DOI: |
10.7892/boris.111909 |
URI: |
https://boris.unibe.ch/id/eprint/111909 |