37 Large Language Model (LLM) Frameworks and Tools for Natural Language Processing (NLP)

Open Source NLP Frameworks

AdaptNLP
Built atop Zalando Research’s Flair and Hugging Face’s Transformers library, AdaptNLP provides machine learning researchers and scientists with a modular and adaptive approach to a variety of NLP tasks, with an easy API for training, inference, and deploying NLP-based microservices.
License: Unknown
GitHub
Website: Unknown
Blackstone
Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales’ research lab, ICLR&D.
License: Apache License 2.0
GitHub
Website: https://research.iclr.co.uk
Coqui STT
Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models.
License: Mozilla Public License 2.0
GitHub
Website: https://coqui.ai
CTRL
A Conditional Transformer Language Model for Controllable Generation, released by Salesforce.
License: BSD 3-Clause "New" or "Revised" License
GitHub
Website: https://arxiv.org/abs/1909.05858
DSPy
A framework for programming with foundation models.
License: MIT License
GitHub
Dust
Dust assists in the design and deployment of large language model apps.
License: MIT License
GitHub
Website: https://dust.tt
ESPnet
ESPnet is an end-to-end speech processing toolkit.
License: Apache License 2.0
GitHub
Website: https://espnet.github.io/espnet/
Facebook's XLM
The original PyTorch implementation of Cross-lingual Language Model Pretraining, which includes BERT, XLM, NMT, XNLI, PKM, and more.
License: Other
GitHub
FastChat
FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
License: Apache License 2.0
GitHub
Flair
A simple framework for state-of-the-art NLP, developed by Zalando, that builds directly on PyTorch.
License: Other
GitHub
Website: https://flairnlp.github.io/flair/
FlexGen
FlexGen is a high-throughput generation engine for running large language models with limited GPU memory.
License: Apache License 2.0
GitHub
GluonNLP
GluonNLP is a toolkit that enables easy text preprocessing, dataset loading, and neural model building to help you speed up your Natural Language Processing (NLP) research.
License: Apache License 2.0
GitHub
Website: https://nlp.gluon.ai/
Gretel Synthetics
Gretel Synthetics is a synthetic data generator for structured and unstructured text, featuring differentially private learning.
License: Other
GitHub
Website: https://gretel.ai/platform/synthetics
Grover
Grover is a model for neural fake news, covering both generation and detection. It can probably also be used for other generation tasks.
License: Apache License 2.0
GitHub
Guardrails
Guardrails is a package that lets a user add structure, type and quality guarantees to the outputs of large language models.
License: Apache License 2.0
GitHub
Website: https://docs.guardrailsai.com/
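To make the idea of structure and type guarantees concrete, here is a minimal sketch in plain Python of checking a model's JSON output against an expected shape. It does not use the Guardrails API; the `SCHEMA` dictionary and the `validate_output` function are hypothetical names introduced only for illustration.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"name": str, "age": int}


def validate_output(raw, schema=SCHEMA):
    """Parse raw model output as JSON and check field presence and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for field, expected_type in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(
                f"field {field!r} should be of type {expected_type.__name__}"
            )
    return data


# A well-formed response passes; a malformed one raises ValueError.
record = validate_output('{"name": "Ada", "age": 36}')
print(record["name"])
```

In practice a caller would catch the `ValueError` and either re-prompt the model or fall back to a default, which is the kind of loop a dedicated package automates.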
h2oGPT
h2oGPT is an open-source generative AI project that gives organizations the power to own large language models while preserving their data ownership.
License: Apache License 2.0
GitHub
Website: http://h2o.ai
Haystack
Haystack is an open-source NLP framework for interacting with your data using Transformer models and LLMs (GPT-3 and the like). Haystack offers production-ready tools to quickly build ChatGPT-like question answering, semantic search, text generation, and more.
License: Apache License 2.0
GitHub
Website: https://haystack.deepset.ai
Interactive Composition Explorer
ICE is a Python library and trace visualizer for language model programs.
License: MIT License
GitHub
Website: https://ice.ought.org
Kashgari
Kashgari is a simple and powerful NLP transfer learning framework for building state-of-the-art models in 5 minutes for named entity recognition (NER), part-of-speech (PoS) tagging, and text classification tasks.
License: Apache License 2.0
GitHub
Website: http://kashgari.readthedocs.io/
Lamini
Lamini is an LLM engine for rapidly customizing models.
License: No License
GitHub
LangChain
LangChain assists in building applications with LLMs through composability.
License: MIT License
GitHub
Website: https://python.langchain.com
LlamaIndex
LlamaIndex (formerly GPT Index) is a data framework for your LLM application.
License: MIT License
GitHub
Website: https://docs.llamaindex.ai
LLaMA
This repository is intended as a minimal, hackable, and readable example of loading LLaMA models and running inference.
License: Other
GitHub
LMFlow
LMFlow is an extensible, convenient, and efficient toolbox for finetuning large machine learning models.
License: Apache License 2.0
GitHub
Website: https://optimalscale.github.io/LMFlow/
Megatron-LM
Megatron-LM is a highly optimized and efficient library for training large language models.
License: Other
GitHub
MLC LLM
MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases.
License: Apache License 2.0
GitHub
Website: https://llm.mlc.ai/docs
Ollama
Get up and running with large language models, locally.
License: MIT License
GitHub
Website: https://ollama.ai
sense2vec
A PyTorch library for training and using sense2vec models. These models take the same approach as word2vec but also use a part-of-speech attribute for each token, which allows them to be “meaning-aware”.
License: MIT License
GitHub
Website: https://explosion.ai/blog/sense2vec-reloaded
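The part-of-speech idea can be sketched without any library: each token is combined with its tag into a single key, so the same surface form with different tags becomes a different vocabulary entry. The `make_key` helper and the hand-tagged sentence below are illustrative assumptions, not part of the sense2vec package; a real pipeline would get the tags from a tagger.

```python
def make_key(word, tag):
    """Join a word and its part-of-speech tag into one vocabulary key."""
    # Spaces become underscores so multi-word phrases stay a single key.
    return word.replace(" ", "_") + "|" + tag


# Hand-tagged tokens (hypothetical example data, normally produced by a tagger).
tagged = [
    ("the", "DET"), ("duck", "NOUN"), ("swims", "VERB"),
    ("players", "NOUN"), ("duck", "VERB"), ("quickly", "ADV"),
]

keys = [make_key(word, tag) for word, tag in tagged]
print(keys)

# "duck" as a noun and "duck" as a verb are separate entries, so an
# embedding model trained on these keys learns one vector for each sense.
vocab = set(keys)
print("duck|NOUN" in vocab, "duck|VERB" in vocab)
```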
Sentence Transformers
Sentence Transformers provides an easy method to compute dense vector representations for sentences, paragraphs, and images.
License: Apache License 2.0
GitHub
Website: https://www.SBERT.net
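Dense vectors like these are typically compared with cosine similarity. The sketch below shows that comparison in plain Python; the three-dimensional vectors are made-up illustrative values rather than real model output, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return the candidate sentence whose vector is closest to the query."""
    return max(candidates, key=lambda s: cosine_similarity(query, candidates[s]))


# Toy embeddings (invented values; a real model produces hundreds of dimensions).
embeddings = {
    "a cat sits": [0.9, 0.1, 0.0],
    "a kitten rests": [0.8, 0.2, 0.1],
    "stocks fell": [0.0, 0.1, 0.9],
}

query = embeddings["a cat sits"]
others = {s: v for s, v in embeddings.items() if s != "a cat sits"}
print(most_similar(query, others))
```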
spaCy
An industrial-strength natural language processing library built with Python and Cython by the Explosion team.
License: MIT License
GitHub
Website: https://spacy.io
StableLM
Stability AI language models.
License: Unknown
GitHub
Website: Unknown
TensorFlow Lingvo
A framework for building neural networks in TensorFlow, particularly sequence models.
License: Apache License 2.0
GitHub
TensorFlow Text
TensorFlow Text provides a collection of text-related classes and ops ready to use with TensorFlow 2.0.
License: Apache License 2.0
GitHub
Website: https://www.tensorflow.org/beta/tutorials/tensorflow_text/intro
Transformers
Hugging Face’s library of state-of-the-art pretrained models for Natural Language Processing (NLP).
License: Apache License 2.0
GitHub
Website: https://huggingface.co/transformers
text-generation-inference
Large Language Model Text Generation Inference, released under the HFOIL license.
License: Other
GitHub
Website: http://hf.co/docs/text-generation-inference
trlX
trlX is a distributed training framework designed from the ground up to focus on fine-tuning large language models with reinforcement learning using either a provided reward function or a reward-labeled dataset.
License: MIT License
GitHub
YouTokenToMe
YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE).
License: MIT License
GitHub
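For readers unfamiliar with Byte Pair Encoding, the following is a minimal, unoptimized sketch of the algorithm in plain Python: repeatedly find the most frequent adjacent symbol pair and merge it into a new symbol. It is not YouTokenToMe's implementation or API; `learn_bpe`, `merge_pair`, and `encode` are hypothetical names, and the tie-breaking rule is an arbitrary choice made here for determinism.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of pair in symbols with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to num_merges merge rules from a list of training words."""
    # Each distinct word starts as a tuple of characters with its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(encode("lowest", merges))
```

This version rescans the whole vocabulary on every merge, which is easy to follow but slow on large corpora; production tokenizers avoid that repeated work.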