Repository activity:
Stars: 3,249
Forks: 254
Open Issues: 0
Last commit: 1 month ago
License: Apache-2.0
Languages: Python, Dockerfile, Batchfile
Towhee is an open-source machine learning pipeline that helps you encode your unstructured data into embeddings. It is dedicated to making neural data processing pipelines simple and fast, allowing you to focus on your core tasks without worrying about the complexities of data processing.
- Easy to Use: You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments.
- Various Modalities: From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities.
- Blazing Fast: We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding to model inference, which can make pipeline execution up to 10x faster.
- SOTA Models: We provide 700+ pre-trained embedding models spanning 5 fields (CV, NLP, Multimodal, Audio, Medical), 15 tasks, and 140+ model architectures, including BERT, CLIP, ViT, SwinTransformer, and data2vec.
- Fully Integrated with Ecosystems: Towhee provides out-of-the-box integration with your favorite libraries, tools, and frameworks, making development quick and easy.
- Pythonic API: Towhee includes a Pythonic method-chaining API for describing custom data processing pipelines. It also supports schemas, which makes processing unstructured data as easy as handling tabular data.
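The method-chaining and schema ideas in the list above can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: `Pipeline`, its `map`/`filter`/`output` methods, and `toy_text_embedding` are illustrative names, not Towhee's API, and the letter-frequency "embedding" is only a stand-in for a real pre-trained model. Each step reads and writes named columns of a row, which is what makes a chained pipeline feel like working with tabular data.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Tuple

# A row maps column names (the "schema") to values.
Row = Dict[str, Any]


class Pipeline:
    """Hypothetical method-chaining pipeline over named columns (not Towhee's API)."""

    def __init__(self, *input_cols: str) -> None:
        self.input_cols = input_cols
        self.steps: List[Callable[[Row], Optional[Row]]] = []
        self.output_cols: Tuple[str, ...] = input_cols

    def map(self, in_col: str, out_col: str, fn: Callable[[Any], Any]) -> "Pipeline":
        # Read one column, apply fn, and write the result to a new column.
        def step(row: Row) -> Optional[Row]:
            new_row = dict(row)  # copy so earlier columns stay untouched
            new_row[out_col] = fn(row[in_col])
            return new_row

        self.steps.append(step)
        return self  # returning self enables method chaining

    def filter(self, col: str, pred: Callable[[Any], bool]) -> "Pipeline":
        # Drop the row (return None) when the predicate fails.
        def step(row: Row) -> Optional[Row]:
            return row if pred(row[col]) else None

        self.steps.append(step)
        return self

    def output(self, *cols: str) -> "Pipeline":
        # Choose which columns the pipeline returns.
        self.output_cols = cols
        return self

    def __call__(self, *values: Any) -> Optional[Tuple[Any, ...]]:
        row: Optional[Row] = dict(zip(self.input_cols, values))
        for step in self.steps:
            row = step(row)
            if row is None:  # filtered out
                return None
        return tuple(row[c] for c in self.output_cols)


def toy_text_embedding(text: str) -> List[float]:
    """Hypothetical stand-in for a model: L2-normalized a-z letter counts."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


p = (
    Pipeline("text")
    .map("text", "clean", str.strip)
    .filter("clean", lambda s: len(s) > 0)
    .map("clean", "vec", toy_text_embedding)
    .output("clean", "vec")
)

result = p("  hello  ")
print(result[0])       # hello
print(len(result[1]))  # 26
print(p("   "))        # None (filtered out)
```

Because each step names only the columns it reads and writes, steps can be reordered or swapped (for example, replacing the toy embedding with a real model) without touching the rest of the chain.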
Towhee is all you need to efficiently process and encode your unstructured data into useful embeddings, leveraging state-of-the-art models and seamless integration with existing tools.