37 Large Language Model (LLM) Frameworks and Tools for Natural Language Processing (NLP)

AdaptNLP

Built atop Zalando Research’s Flair and Hugging Face’s Transformers library, AdaptNLP provides machine learning researchers and scientists with a modular and adaptive approach to a variety of NLP tasks, with an easy API for training, inference, and deploying NLP-based microservices.

License: Unknown

GitHub
Website: Unknown

Blackstone

Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales’ research lab, ICLR&D.

License: Apache License 2.0

GitHub
Website: https://research.iclr.co.uk

Coqui STT

Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models.

License: Mozilla Public License 2.0

GitHub
Website: https://coqui.ai

CTRL

A Conditional Transformer Language Model for Controllable Generation, released by Salesforce.

License: BSD 3-Clause "New" or "Revised" License

GitHub
Website: https://arxiv.org/abs/1909.05858

DSPy

A framework for programming with foundation models.

License: MIT License

GitHub

Dust

Dust assists in the design and deployment of large language model apps.

License: MIT License

GitHub
Website: https://dust.tt

ESPnet

ESPnet is an end-to-end speech processing toolkit.

License: Apache License 2.0

GitHub
Website: https://espnet.github.io/espnet/

Facebook's XLM

Original PyTorch implementation of Cross-lingual Language Model Pretraining, which includes BERT, XLM, NMT, XNLI, PKM, etc.

License: Other

GitHub

FastChat

FastChat is an open platform for training, serving, and evaluating chatbots based on large language models.

License: Apache License 2.0

GitHub

Flair

A simple framework for state-of-the-art NLP, developed by Zalando, that builds directly on PyTorch.

License: Other

GitHub
Website: https://flairnlp.github.io/flair/

FlexGen

FlexGen is a high-throughput generation engine for running large language models with limited GPU memory.

License: Apache License 2.0

GitHub

GluonNLP

GluonNLP is a toolkit that enables easy text preprocessing, dataset loading, and neural model building to help you speed up your Natural Language Processing (NLP) research.

License: Apache License 2.0

GitHub
Website: https://nlp.gluon.ai/

Gretel Synthetics

Gretel Synthetics is a synthetic data generator for structured and unstructured text, featuring differentially private learning.

License: Other

GitHub
Website: https://gretel.ai/platform/synthetics

Grover

Grover is a model for neural fake news, covering both generation and detection. It can probably also be used for other generation tasks.

License: Apache License 2.0

GitHub

Guardrails

Guardrails is a package that lets a user add structure, type and quality guarantees to the outputs of large language models.

License: Apache License 2.0

GitHub
Website: https://docs.guardrailsai.com/
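
The idea of structure and type guarantees on model output can be illustrated without the package itself. The following is a minimal sketch using only the Python standard library, assuming the model returns a JSON string. The schema, the function name `validate_llm_output`, and the sample outputs are all hypothetical, and this is not the Guardrails API.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"name": str, "age": int}


def validate_llm_output(raw, schema=SCHEMA):
    """Parse raw model output as JSON and check each field's type.

    Returns (data, errors). data is None whenever validation fails.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be an object"]

    errors = []
    for field, expected in schema.items():
        if field not in data:
            errors.append(f"{field}: missing")
        elif type(data[field]) is not expected:
            # Exact type check, so that True is not accepted as an int.
            got = type(data[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {got}")
    return (data if not errors else None), errors


# Two made-up model outputs: one well-formed, one with a wrongly typed field.
good, good_errors = validate_llm_output('{"name": "Ada", "age": 36}')
bad, bad_errors = validate_llm_output('{"name": "Ada", "age": "thirty-six"}')
```

In a real application, a failed check would typically trigger a retry or a corrective re-prompt rather than simply returning the errors.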

h2oGPT

h2oGPT is an open-source generative AI project that gives organizations the power to own large language models while preserving ownership of their data.

License: Apache License 2.0

GitHub
Website: http://h2o.ai

Haystack

Haystack is an open-source NLP framework for interacting with your data using Transformer models and LLMs (GPT-3 and the like). Haystack offers production-ready tools to quickly build ChatGPT-like question answering, semantic search, text generation, and more.

License: Apache License 2.0

GitHub
Website: https://haystack.deepset.ai

Interactive Composition Explorer

Interactive Composition Explorer (ICE) is a Python library and trace visualizer for language model programs.

License: MIT License

GitHub
Website: https://ice.ought.org

Kashgari

Kashgari is a simple and powerful NLP transfer learning framework for building a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.

License: Apache License 2.0

GitHub
Website: http://kashgari.readthedocs.io/

Lamini

Lamini is an LLM engine for rapidly customizing models.

License: No License

GitHub

LangChain

LangChain assists in building applications with LLMs through composability.

License: MIT License

GitHub
Website: https://python.langchain.com

LlamaIndex

LlamaIndex (formerly GPT Index) is a data framework for your LLM application.

License: MIT License

GitHub
Website: https://docs.llamaindex.ai

LLaMA

This repository is intended as a minimal, hackable, and readable example for loading LLaMA (arXiv) models and running inference.

License: Other

GitHub

LMFlow

LMFlow is an extensible, convenient, and efficient toolbox for fine-tuning large machine learning models.

License: Apache License 2.0

GitHub
Website: https://optimalscale.github.io/LMFlow/

Megatron-LM

Megatron-LM is a highly optimized and efficient library for training large language models.

License: Other

GitHub

MLC LLM

MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases.

License: Apache License 2.0

GitHub
Website: https://llm.mlc.ai/docs

Ollama

Get up and running with large language models, locally.

License: MIT License

GitHub
Website: https://ollama.ai

sense2vec

A Python library for training and using sense2vec models. These models take the same approach as word2vec but also use part-of-speech attributes for each token, which makes them “meaning-aware”.

License: MIT License

GitHub
Website: https://explosion.ai/blog/sense2vec-reloaded

Sentence Transformers

Sentence Transformers provides an easy method to compute dense vector representations for sentences, paragraphs, and images.

License: Apache License 2.0

GitHub
Website: https://www.SBERT.net
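
Dense vector representations are typically compared with cosine similarity. The following is a minimal sketch using only the Python standard library. The three-dimensional vectors are made-up stand-ins for real embeddings, which would come from a trained model and have far more dimensions; the function name `cosine_similarity` is defined here for illustration and is not taken from the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: similar sentences get nearby vectors.
embeddings = {
    "A cat sits on the mat.": [0.9, 0.1, 0.0],
    "A kitten rests on a rug.": [0.8, 0.2, 0.1],
    "Stock prices fell sharply.": [0.0, 0.1, 0.9],
}

query = "A cat sits on the mat."

# Rank the other sentences by similarity to the query, most similar first.
ranked = sorted(
    (s for s in embeddings if s != query),
    key=lambda s: cosine_similarity(embeddings[query], embeddings[s]),
    reverse=True,
)
```

With these toy vectors, the sentence about the kitten ranks above the one about stock prices, which is the behavior semantic search relies on.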

spaCy

Industrial-strength natural language processing library built with Python and Cython by the explosion.ai team.

License: MIT License

GitHub
Website: https://spacy.io

StableLM

Stability AI language models.

License: Unknown

GitHub
Website: Unknown

TensorFlow Lingvo

A framework for building neural networks in TensorFlow, particularly sequence models.

License: Apache License 2.0

GitHub

TensorFlow Text

TensorFlow Text provides a collection of text-related classes and ops ready to use with TensorFlow 2.0.

License: Apache License 2.0

GitHub
Website: https://www.tensorflow.org/beta/tutorials/tensorflow_text/intro

Transformers

Hugging Face’s library of state-of-the-art pretrained models for Natural Language Processing (NLP).

License: Apache License 2.0

GitHub
Website: https://huggingface.co/transformers

text-generation-inference

Large Language Model Text Generation Inference, released under the HFOIL license.

License: Other

GitHub
Website: http://hf.co/docs/text-generation-inference

trlX

trlX is a distributed training framework designed from the ground up to focus on fine-tuning large language models with reinforcement learning using either a provided reward function or a reward-labeled dataset.

License: MIT License

GitHub

YouTokenToMe

YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE).

License: MIT License

GitHub
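
For readers unfamiliar with Byte Pair Encoding, the core idea is to repeatedly merge the most frequent adjacent pair of symbols. The following is a minimal, unoptimized sketch in pure Python. It is not YouTokenToMe's implementation or API; the function names, the tie-breaking rule, and the tiny corpus are assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Every word starts as a tuple of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # so the result is deterministic (an arbitrary choice made here).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus.
corpus = ["low", "low", "lower", "lowest", "newer", "newer"]
merges = learn_bpe(corpus, num_merges=3)
tokens = encode("lowest", merges)
```

Each learning step rescans the whole vocabulary, which is fine for a demonstration but slow on real corpora; avoiding that cost is the kind of problem an efficiency-focused tokenizer addresses.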
