piecemaker: Tools for Preparing Text for Tokenizers
Tokenizers break text into pieces that are more usable by
machine learning models. Many tokenizers share some preparation steps.
This package provides those shared steps, along with a simple
tokenizer.
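As a rough illustration of what shared preparation steps can look like, the sketch below cleans a string and then splits it on spaces. It is written in Python using only the standard library and is not piecemaker's API or implementation. The function names `sketch_prepare` and `sketch_tokenize` are hypothetical, and the choice of steps (dropping control characters, spacing out punctuation, collapsing whitespace) is an assumption about typical tokenizer preparation.

```python
import re
import unicodedata


def sketch_prepare(text):
    """Hypothetical cleanup pass; not piecemaker's actual implementation."""
    # Drop control and format characters (Unicode categories Cc and Cf),
    # but keep anything Python treats as whitespace (tabs, newlines).
    text = "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Surround each punctuation or symbol character with spaces so it
    # becomes its own piece. Underscores count as word characters here.
    text = re.sub(r"([^\w\s])", r" \1 ", text)
    # Collapse runs of whitespace to single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def sketch_tokenize(text):
    """Hypothetical simple tokenizer: split prepared text on single spaces."""
    prepared = sketch_prepare(text)
    return prepared.split(" ") if prepared else []


if __name__ == "__main__":
    print(sketch_tokenize("Hello,\tworld!\x00  Fine."))
```

Keeping preparation separate from splitting means the same cleanup can be reused in front of a different tokenizer, which is the kind of sharing the description refers to.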
Version: 1.0.2
Depends: R (≥ 2.10)
Imports: cli, glue, rlang (≥ 0.4.2), stringi, stringr
Suggests: covr, testthat (≥ 3.0.0)
Published: 2023-06-02
DOI: 10.32614/CRAN.package.piecemaker
Author: Jon Harmon [aut, cre], Jonathan Bratt [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer: Jon Harmon <jonthegeek at gmail.com>
BugReports: https://github.com/macmillancontentscience/piecemaker/issues
License: Apache License (≥ 2)
URL: https://github.com/macmillancontentscience/piecemaker, https://macmillancontentscience.github.io/piecemaker/
NeedsCompilation: no
Materials: README, NEWS
CRAN checks: piecemaker results
Linking:
Please use the canonical form
https://CRAN.R-project.org/package=piecemaker
to link to this page.