Maciej Pacula

AutoCorpus 1.0.1 released!

I just released AutoCorpus 1.0.1, a set of utilities for automatically extracting language corpora and language models from publicly available datasets such as Wikipedia. It consists of fast native binaries that can process tens of gigabytes of text in little time.

AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.

Homepage: http://mpacula.com/autocorpus
GitHub: https://github.com/mpacula/AutoCorpus

November 26th, 2011
