
8 Metadata Management Frameworks for Data Governance and Machine Learning Pipelines

Explore open source metadata management frameworks that support documentation and lineage tracking for data governance and ML pipelines.

Open Source Metadata Management Frameworks

  • Amundsen

    Amundsen is a metadata-driven application that improves the productivity of data analysts, data scientists, and engineers when they work with data.

    License: Apache License 2.0

  • ArangoML Pipeline

    ArangoML Pipeline is a common, extensible metadata layer for machine learning pipelines that lets data scientists and DataOps teams manage all information related to their ML pipelines in one place.

    License: No License

  • Apache Atlas

    Apache Atlas is an extensible set of core foundational governance services that enables enterprises to meet their compliance requirements within Hadoop effectively and efficiently, and that integrates with the wider enterprise data ecosystem.

    License: Apache License 2.0

  • DataHub

    DataHub is LinkedIn’s generalized metadata search and discovery tool.

    License: Apache License 2.0

  • Marquez

    Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata.

    License: Apache License 2.0

  • Metacat

    Metacat is a unified metadata exploration API service. It focuses on three problems: 1) federated views of metadata systems; 2) arbitrary metadata storage about datasets; 3) metadata discovery.

    License: Apache License 2.0

  • ML Metadata

    ML Metadata is a library for recording and retrieving metadata associated with ML developer and data scientist workflows.

    License: Apache License 2.0

  • Model Card Toolkit

    Model Card Toolkit streamlines and automates the generation of Model Cards.

    License: Apache License 2.0
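
To make "lineage tracking" concrete, here is a minimal sketch of the idea the tools above share: each dataset records its direct inputs, and a catalog walks those links to list every upstream ancestor. The `Dataset` and `Catalog` names and the sample dataset names are hypothetical and chosen for illustration; this is not the API of any framework listed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """Metadata record for one dataset (hypothetical shape)."""
    name: str
    description: str = ""
    upstream: List[str] = field(default_factory=list)  # names of direct inputs


class Catalog:
    """Toy in-memory metadata catalog, for illustration only."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}

    def register(self, dataset: Dataset) -> None:
        self._datasets[dataset.name] = dataset

    def lineage(self, name: str) -> List[str]:
        """Return every upstream ancestor of `name`, nearest first."""
        seen: List[str] = []
        queue = list(self._datasets[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue  # already visited via another path
            seen.append(current)
            if current in self._datasets:
                queue.extend(self._datasets[current].upstream)
        return seen


catalog = Catalog()
catalog.register(Dataset("raw_events", "Unprocessed click events"))
catalog.register(Dataset("clean_events", "Deduplicated events", upstream=["raw_events"]))
catalog.register(Dataset("training_set", "Features for the model", upstream=["clean_events"]))

ancestors = catalog.lineage("training_set")
print(ancestors)  # ['clean_events', 'raw_events']
```

The frameworks listed above typically persist this kind of graph in a database and expose it through search and a UI, but the underlying record (an entity plus links to its inputs) is similar in spirit.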

Last Updated: Dec 26, 2023