
Efficient n-gram, Skipgram and Flexgram Modelling with Colibri Core
DOI: https://doi.org/10.5334/jors.105 | Journal eISSN: 2049-9647
Language: English
Submitted on: Nov 9, 2015
Accepted on: Jul 1, 2016
Published on: Aug 2, 2016
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year
© 2016 Maarten van Gompel, Antal van den Bosch, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.