German Wikipedia LMs (GWLMs)
We present language models (BERT, BERT with Token Dropping, TEAMS, T5) pretrained on German Wikipedia.
This is an ongoing project!
German Wikipedia Corpus
We use a recent Wikipedia dump, which can be accessed here. Additionally, a sentence-segmented version (produced with NLTK) is available here.
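The exact segmentation script is not part of this README, so the following is only a minimal sketch of how sentence segmentation with NLTK could look. The function name `segment_text` and the paragraph-wise processing are assumptions for illustration. To stay runnable without downloading any data, it falls back to an untrained `PunktSentenceTokenizer`; for real German text, a tokenizer with NLTK's pretrained German Punkt parameters (used for example by `nltk.sent_tokenize(text, language="german")` after downloading the Punkt resources) will likely handle abbreviations better.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer


def segment_text(text, tokenizer=None):
    """Split raw article text into a flat list of sentences.

    `tokenizer` is any object with a `tokenize(str) -> list[str]` method.
    If none is given, an untrained Punkt tokenizer is used (an assumption
    made here so that the sketch needs no downloaded model data).
    """
    tokenizer = tokenizer or PunktSentenceTokenizer()
    sentences = []
    # Process paragraph by paragraph so that sentences never span a line break.
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip blank lines between paragraphs
        sentences.extend(s.strip() for s in tokenizer.tokenize(paragraph))
    return sentences


if __name__ == "__main__":
    sample = (
        "Berlin ist die Hauptstadt von Deutschland. Die Stadt liegt an der Spree.\n"
        "\n"
        "Wikipedia ist eine Enzyklopädie."
    )
    # Write one sentence per line, a common layout for pretraining corpora.
    for sentence in segment_text(sample):
        print(sentence)
```

Passing the tokenizer as an argument keeps the sketch independent of a particular NLTK model, so the pretrained German tokenizer can be swapped in without changing the function.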
Fine-tuned Models
We fine-tuned NER models using the SpanMarker library on the GermEval 2014 NER dataset and uploaded the best models:
Acknowledgements
Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC).
Many thanks for providing access to the TPUs ❤️