python-tokenizers

Provides an implementation of today's most used tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
* Train new vocabularies and tokenize, using today's most used tokenizers.
* Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
* Easy to use, but also extremely versatile.
* Designed for research and production.
* Normalization comes with alignments tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
* Does all the pre-processing: Truncate, Pad, add the special tokens your model needs.
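The features listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming the package is installed and importable as `tokenizers`; the tiny corpus, the vocabulary size, the `[UNK]`/`[PAD]` token names and the `MAX_LEN` value are hypothetical choices made for this example, not package defaults.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Tiny illustrative corpus (hypothetical data, only for this sketch).
corpus = [
    "Tokenizers split text into tokens.",
    "Training a vocabulary is fast.",
    "Padding and truncation are handled for you.",
]

# Build a BPE tokenizer that splits on whitespace and punctuation first.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train a new (very small) vocabulary from the in-memory corpus.
trainer = BpeTrainer(
    vocab_size=100,
    special_tokens=["[UNK]", "[PAD]"],
    show_progress=False,
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Pre-processing: truncate and pad every encoding to a fixed length.
MAX_LEN = 8  # assumed length, chosen only for this example
tokenizer.enable_truncation(max_length=MAX_LEN)
tokenizer.enable_padding(
    pad_id=tokenizer.token_to_id("[PAD]"),
    pad_token="[PAD]",
    length=MAX_LEN,
)

text = "Training tokens is fast."
encoding = tokenizer.encode(text)

# Alignment tracking: each real token carries (start, end) offsets
# that point back into the original string.
for token, (start, end), mask in zip(
    encoding.tokens, encoding.offsets, encoding.attention_mask
):
    if mask == 1:  # skip padding positions
        print(f"{token!r:12} <- {text[start:end]!r}")
```

The exact tokens printed depend on the merges learned from the corpus, so no output is shown here. Padding positions have an attention mask of 0 and are skipped in the loop.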

There is no official package available for openSUSE Leap 16.0

Distributions

openSUSE Tumbleweed

devel:languages:python:backports Experimental
0.21.4
science:machinelearning Experimental
0.21.4
system:homeautomation:home-assistant Experimental
0.19.1

openSUSE Leap 16.0

devel:languages:python:backports Experimental
0.21.4
science:machinelearning Experimental
0.21.4
home:mslacken:ml Community
0.21.4

openSUSE Leap 15.6

devel:languages:python:backports Experimental
0.21.4
science:machinelearning Experimental
0.21.4

openSUSE Factory RISCV

science:machinelearning Experimental
0.21.4

SLFO 1.2

openSUSE Backports for SLE 15 SP7

devel:languages:python:backports Experimental
0.21.4

openSUSE Backports for SLE 15 SP4

devel:languages:python:backports Experimental
0.21.4

Unsupported distributions

The following distributions are not officially supported. Use these packages at your own risk.