Data description:

Penn Treebank Corpus
    - should be free to use for research purposes
    - preprocessed the same way as in many language modeling (LM) papers, including "Empirical Evaluation and Combination of Advanced Language Modeling Techniques"
    - ptb.train.txt: training set
    - ptb.valid.txt: development set (use it only for tuning hyper-parameters, not for training)
    - ptb.test.txt: test set for reporting perplexity
    
    - ptb.char.*: the same data rewritten as sequences of characters, with spaces replaced by '_'; useful for training character-based models, as shown in example 9
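
The character-level rewriting described for ptb.char.* can be sketched roughly as follows. This is a minimal illustration, not the script used to produce the files: the function names words_to_chars and chars_to_words are hypothetical, and the handling of whitespace at line boundaries and of special tokens (each character is simply split out) is an assumption.

```python
def words_to_chars(line):
    """Rewrite a word-level line as space-separated characters.

    Spaces between words become '_' (assumed convention, per the
    description of ptb.char.* above).
    """
    tokens = line.split()      # split on any whitespace, dropping the newline
    text = "_".join(tokens)    # mark each word boundary with '_'
    return " ".join(text)      # put a space between every character


def chars_to_words(line):
    """Invert words_to_chars; assumes the original text contains no '_'."""
    return "".join(line.split()).replace("_", " ")


if __name__ == "__main__":
    sample = "no it was not black monday"
    as_chars = words_to_chars(sample)
    print(as_chars)
    print(chars_to_words(as_chars))
```

Applying words_to_chars to each line of a word-level file should give a character-level file in the same spirit as ptb.char.*, though the result may not match the shipped files byte for byte.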