Data Noising as Smoothing in Neural Network Language Models

Dependencies

Overview

Based on the TensorFlow implementation here, which is in turn based on the PTB LSTM implementation here.

Implements noising for neural language modeling as described in this paper.

@inproceedings{noising2017,
  title={Data Noising as Smoothing in Neural Network Language Models},
  author={Xie, Ziang and Wang, Sida I. and Li, Jiwei and L{\'e}vy, Daniel and Nie, Aiming and Jurafsky, Dan and Ng, Andrew Y.},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2017}
}

The noising code can be found in loader.py and utils.py.
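As a rough illustration of two of the noising schemes described in the paper, blank noising and unigram noising, here is a minimal standalone sketch. It is not the repository's actual loader.py/utils.py code; the BLANK token name and helper functions are assumptions made for the example.

# Illustrative sketch only -- not the repository's actual noising code.
import random
from collections import Counter

BLANK = "_"  # placeholder token for blank noising (token name is an assumption)

def blank_noise(tokens, gamma):
    # Replace each token with the blank token with probability gamma.
    return [BLANK if random.random() < gamma else tok for tok in tokens]

def unigram_noise(tokens, gamma, counts):
    # Replace each token, with probability gamma, with a sample drawn from
    # the unigram distribution estimated from training counts.
    vocab = list(counts)
    weights = [counts[w] for w in vocab]
    return [random.choices(vocab, weights=weights)[0]
            if random.random() < gamma else tok
            for tok in tokens]

# Toy usage:
counts = Counter(["the", "the", "cat", "sat", "dog", "ran"])
print(blank_noise(["the", "cat", "sat"], gamma=0.2))
print(unigram_noise(["the", "cat", "sat"], gamma=0.2, counts=counts))

The paper's ngram-based schemes further adapt the per-token noising probability using count statistics; the real implementation of all schemes lives in loader.py and utils.py.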

How to run

First download the PTB data from here and put it in the data directory. Make sure to update the paths in cfg.py to point to the data. Alternatively, you can grab the Text8 data here, then run the script data/text8/makedata-text8.sh.
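The exact contents of cfg.py aren't reproduced here; purely as a hypothetical illustration (the variable names and directory layout below are assumptions, not the repository's), the path configuration might look something like:

# cfg.py -- hypothetical sketch; use the repository's actual variable names.
PTB_DATA_DIR = "data/ptb"      # directory holding the downloaded PTB files
TEXT8_DATA_DIR = "data/text8"  # output of data/text8/makedata-text8.sh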

Then run lm.py. Here's an example setting:

python lm.py --run_dir /tmp/lm_1500_kn  --hidden_dim 1500 --drop_prob 0.65 --gamma 0.2 --scheme ngram --ngram_scheme kn --absolute_discounting
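In this setting, --gamma corresponds to the noising probability from the paper, --drop_prob is presumably the dropout probability, and --scheme ngram with --ngram_scheme kn and --absolute_discounting selects the Kneser-Ney-inspired noising scheme with absolute discounting described in the paper.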
