Open Source Data Storage Optimization
AIStore is a lightweight object storage system with the capability to linearly scale out with each added storage node and a special focus on petascale deep learning.
License: MIT License
A virtual distributed storage system that bridges the gap between computation frameworks and storage systems.
License: Apache License 2.0
In-memory columnar representation of data compatible with Pandas, Hadoop-based systems, etc.
License: Unknown
A high-performance real-time analytics database. See this article for an introduction.
License: Apache License 2.0
A memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads, delivering in-memory speeds at petabyte scale. (Demo)
License: Apache License 2.0
On-disk columnar representation of data compatible with Pandas, Hadoop-based systems, etc.
License: Unknown
A real-time distributed OLAP datastore. A comparison of the open-source OLAP systems for big data (ClickHouse, Druid, and Pinot) is found here.
License: Apache License 2.0
A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself. (Video)
License: Apache License 2.0
Chroma is an AI-native embedding database.
License: Apache License 2.0
ClickHouse is an open-source, column-oriented database management system.
License: Apache License 2.0
Delta Lake is a storage layer that brings scalable, ACID transactions to Apache Spark and other big-data engines.
License: Apache License 2.0
NoSQL interface for Postgres that allows object-style interaction with the stored data.
License: Apache License 2.0
GPTCache is a library for creating a semantic cache for large language model queries.
License: MIT License
HDFS-compatible file system with scale-out, strongly consistent metadata.
License: Apache License 2.0
A low-latency vector search engine (GraphQL, RESTful) with out-of-the-box support for different media types. Modules include Semantic Search, Q&A, Classification, Customizable Models (PyTorch/TensorFlow/Keras), and more.
License: BSD 3-Clause "New" or "Revised" License
Python implementation of chunked, compressed, N-dimensional arrays designed for use in parallel computing.
License: MIT License
Last Updated: Dec 26, 2023