Efficient n-gram, Skipgram and Flexgram Modelling with Colibri Core

Open Access | Aug 2016

DOI: https://doi.org/10.5334/jors.105 | Journal eISSN: 2049-9647
Language: English
Submitted on: Nov 9, 2015
Accepted on: Jul 1, 2016
Published on: Aug 2, 2016
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2016 Maarten van Gompel, Antal van den Bosch, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.