Data description:

Penn Treebank Corpus
- should be free for research purposes
- the same processing of data as used in many LM papers, including "Empirical Evaluation and Combination of Advanced Language Modeling Techniques"
- ptb.train.txt: train set
- ptb.valid.txt: development set (should be used just for tuning hyper-parameters, but not for training)
- ptb.test.txt: test set for reporting perplexity
- ptb.char.*: the same data, just rewritten as sequences of characters, with spaces rewritten as '_'; useful for training character-based models, as is shown in example 9
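
To illustrate the relationship between the word-level files and the ptb.char.* files, here is a minimal Python sketch of the rewriting described above. It is not the script used to produce the distributed files. The function names words_to_chars and chars_to_words are hypothetical, and the sketch assumes that characters in the character-level files are separated by single spaces and that the word-level text contains no '_' of its own.

```python
def words_to_chars(line: str) -> str:
    """Rewrite a word-level line as a sequence of characters.

    Assumption: output symbols are separated by single spaces, and the
    spaces between words are rewritten as '_'.
    """
    words = line.split()
    # Join the words with '_' so word boundaries stay visible, then put
    # one space between every symbol of the resulting string.
    return " ".join("_".join(words))


def chars_to_words(line: str) -> str:
    """Invert words_to_chars (assumes the original text had no '_')."""
    # Drop the separating spaces, then turn '_' back into word spaces.
    return "".join(line.split()).replace("_", " ")


if __name__ == "__main__":
    print(words_to_chars("no it was n't"))  # n o _ i t _ w a s _ n ' t
```

With this representation a character-level model sees '_' as an ordinary symbol, so word boundaries are predicted like any other character.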