37 Large Language Model (LLM) Frameworks and Tools for Natural Language Processing (NLP)

Explore open source large language model frameworks and tools for natural language processing (NLP).

Open Source NLP Frameworks

  • AdaptNLP

    Built atop Zalando Research’s Flair and Hugging Face’s Transformers library, AdaptNLP gives machine learning researchers and scientists a modular, adaptive approach to a variety of NLP tasks, with an easy API for training, inference, and deploying NLP-based microservices.

    License: Unknown

  • Blackstone

    Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales’ research lab, ICLR&D.

    License: Apache License 2.0

  • Coqui STT

    Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models.

    License: Mozilla Public License 2.0

  • CTRL

    A Conditional Transformer Language Model for Controllable Generation, released by Salesforce.

    License: BSD 3-Clause "New" or "Revised" License

  • dspy

    A framework for programming with foundation models.

    License: MIT License

  • Dust

    Dust assists in the design and deployment of large language model apps.

    License: MIT License

  • ESPnet

    ESPnet is an end-to-end speech processing toolkit.

    License: Apache License 2.0

  • Facebook's XLM

    The original PyTorch implementation of Cross-lingual Language Model Pretraining, which includes BERT, XLM, NMT, XNLI, PKM, and more.

    License: Other

  • FastChat

    FastChat is an open platform for training, serving, and evaluating large language model based chatbots.

    License: Apache License 2.0

  • Flair

    A simple framework for state-of-the-art NLP, developed by Zalando, that builds directly on PyTorch.

    License: Other

  • FlexGen

    FlexGen is a high-throughput generation engine for running large language models with limited GPU memory.

    License: Apache License 2.0

  • GluonNLP

    GluonNLP is a toolkit that enables easy text preprocessing, dataset loading, and neural model building to help you speed up your Natural Language Processing (NLP) research.

    License: Apache License 2.0

  • Gretel Synthetics

    Gretel Synthetics is a synthetic data generator for structured and unstructured text, featuring differentially private learning.

    License: Other

  • Grover

    Grover is a model for Neural Fake News – both generation and detection. However, it can probably also be used for other generation tasks.

    License: Apache License 2.0

  • Guardrails

    Guardrails is a package that lets a user add structure, type and quality guarantees to the outputs of large language models.

    License: Apache License 2.0

  • h2oGPT

    h2oGPT is an open source generative AI project that gives organizations the power to own large language models while preserving ownership of their data.

    License: Apache License 2.0

  • Haystack

    Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-3 and alike). Haystack offers production-ready tools to quickly build ChatGPT-like question answering, semantic search, text generation, and more.

    License: Apache License 2.0

  • Interactive Composition Explorer

    ICE is a Python library and trace visualizer for language model programs.

    License: MIT License

  • Kashgari

    Kashgari is a simple and powerful NLP transfer learning framework for building a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.

    License: Apache License 2.0

  • Lamini

    Lamini is an LLM engine for rapidly customizing models.

    License: No License

  • LangChain

    LangChain assists in building applications with LLMs through composability.

    License: MIT License

  • LlamaIndex

    LlamaIndex (formerly GPT Index) is a data framework for your LLM application.

    License: MIT License

  • LLaMA

    This repository is intended as a minimal, hackable, and readable example for loading LLaMA models and running inference.

    License: Other

  • LMFlow

    LMFlow is an extensible, convenient, and efficient toolbox for finetuning large machine learning models.

    License: Apache License 2.0

  • Megatron-LM

    Megatron-LM is a highly optimized and efficient library for training large language models.

    License: Other

  • MLC LLM

    MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases.

    License: Apache License 2.0

  • Ollama

    Get up and running with large language models, locally.

    License: MIT License

  • sense2vec

    A PyTorch library for training and using sense2vec models, which take the same approach as word2vec but also use part-of-speech attributes for each token, making them “meaning-aware”.

    License: MIT License

  • Sentence Transformers

    Sentence Transformers provides an easy method to compute dense vector representations for sentences, paragraphs, and images.

    License: Apache License 2.0

  • spaCy

    Industrial-strength natural language processing library built with Python and Cython by the explosion.ai team.

    License: MIT License

  • StableLM

    Stability AI language models.

    License: Unknown

  • TensorFlow Lingvo

    A framework for building neural networks in TensorFlow, particularly sequence models.

    License: Apache License 2.0

  • TensorFlow Text

    TensorFlow Text provides a collection of text-related classes and ops ready to use with TensorFlow 2.0.

    License: Apache License 2.0

  • Transformers

    Hugging Face’s library of state-of-the-art pretrained models for Natural Language Processing (NLP).

    License: Apache License 2.0

  • text-generation-inference

    Large Language Model Text Generation Inference, released under the HFOIL license.

    License: Other

  • trlX

    trlX is a distributed training framework designed from the ground up to focus on fine-tuning large language models with reinforcement learning using either a provided reward function or a reward-labeled dataset.

    License: MIT License

  • YouTokenToMe

    YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE).

    License: MIT License
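
As a rough illustration of the Byte Pair Encoding (BPE) technique named in the YouTokenToMe entry, the sketch below learns merge rules in plain Python. It does not use YouTokenToMe's API and makes no attempt to match its optimized implementation; `learn_bpe` and its parameters are hypothetical names chosen for this example, and the tie-breaking rule is an arbitrary choice made only to keep the output deterministic.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words.

    Illustrative helper only; not part of any library listed above.
    """
    # Represent each word as a tuple of symbols, weighted by its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # Every word is a single symbol; nothing left to merge.
        # Most frequent pair wins; ties go to the lexicographically larger pair.
        best = max(pairs, key=lambda pair: (pairs[pair], pair))
        merges.append(best)
        # Rewrite each word, replacing occurrences of the chosen pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(learn_bpe(["low", "low", "lower", "lowest"], 3))
```

Each returned pair is one merge rule, listed in the order it was learned. A tokenizer would apply the rules in that same order to split unseen text into subword units.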

Last Updated: Dec 26, 2023