Open Source NLP Frameworks
Built atop Zalando Research’s Flair and Hugging Face’s Transformers library, AdaptNLP provides machine learning researchers and scientists a modular and adaptive approach to a variety of NLP tasks, with an easy API for training, inference, and deploying NLP-based microservices.
License: Unknown
Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales’ research lab, ICLR&D.
License: Apache License 2.0
Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models.
License: Mozilla Public License 2.0
A Conditional Transformer Language Model for Controllable Generation, released by Salesforce.
License: BSD 3-Clause "New" or "Revised" License
Dust assists in the design and deployment of large language model apps.
License: MIT License
ESPnet is an end-to-end speech processing toolkit.
License: Apache License 2.0
Original PyTorch implementation of Cross-lingual Language Model Pretraining, which includes BERT, XLM, NMT, XNLI, PKM, etc.
License: Other
A simple framework for state-of-the-art NLP, developed by Zalando, that builds directly on PyTorch.
License: Other
GluonNLP is a toolkit that enables easy text preprocessing, dataset loading, and neural model building to help you speed up your Natural Language Processing (NLP) research.
License: Apache License 2.0
Gretel Synthetics is a synthetic data generator for structured and unstructured text, featuring differentially private learning.
License: Other
Guardrails is a package that lets a user add structure, type and quality guarantees to the outputs of large language models.
License: Apache License 2.0
h2oGPT is open-source generative AI that gives organizations like yours the power to own large language models while preserving your data ownership.
License: Apache License 2.0
Haystack is an open-source NLP framework for interacting with your data using Transformer models and LLMs (GPT-3 and the like). Haystack offers production-ready tools to quickly build ChatGPT-like question answering, semantic search, text generation, and more.
License: Apache License 2.0
ICE is a Python library and trace visualizer for language model programs.
License: MIT License
Kashgari is a simple and powerful NLP transfer learning framework that lets you build a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.
License: Apache License 2.0
LangChain assists in building applications with LLMs through composability.
License: MIT License
LlamaIndex (GPT Index) is a data framework for your LLM application.
License: MIT License
LMFlow is an extensible, convenient, and efficient toolbox for finetuning large machine learning models.
License: Apache License 2.0
Megatron-LM is a highly optimized and efficient library for training large language models.
License: Other
MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases.
License: Apache License 2.0
Get up and running with large language models, locally.
License: MIT License
A PyTorch library for training and using sense2vec models, which use the same approach as word2vec but also leverage part-of-speech attributes for each token, making them “meaning-aware”.
License: MIT License
Sentence Transformers provides an easy method to compute dense vector representations for sentences, paragraphs, and images.
License: Apache License 2.0
Industrial-strength natural language processing library built with Python and Cython by the explosion.ai team.
License: MIT License
A framework for building neural networks in TensorFlow, particularly sequence models.
License: Apache License 2.0
TensorFlow Text provides a collection of text-related classes and ops ready to use with TensorFlow 2.0.
License: Apache License 2.0
Hugging Face’s library of state-of-the-art pretrained models for Natural Language Processing (NLP).
License: Apache License 2.0
Large Language Model Text Generation Inference, released under the HFOIL license.
License: Other
YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE).
License: MIT License
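For readers unfamiliar with the Byte Pair Encoding (BPE) technique mentioned in the entry above, here is a minimal, unoptimized sketch of the general idea: repeatedly find the most frequent adjacent symbol pair in a corpus and merge it into a new symbol. This is not YouTokenToMe's implementation or API; the function names (`merge_pair`, `learn_bpe`, `encode`) and the toy corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (illustrative only)."""
    # Each distinct word starts as a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the frequency-weighted vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_corpus = ["low", "low", "lower", "newest", "newest", "newest"]
    rules = learn_bpe(toy_corpus, 4)
    print(rules)
    print(encode("lowest", rules))
```

Production tokenizers typically add word-boundary markers, byte-level fallbacks, and much faster pair counting; this sketch omits all of that for readability.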
Last Updated: Dec 26, 2023