7th International Conference on Natural Language Processing, Hyderabad
Tutorial 3 at ICON 2009, to be held Monday, 14 December 2009, at IIIT Hyderabad, is proposed by Prof Adam Kilgarriff, LexMaster Class, UK. The duration is a full day, 0930–1730 hours.
Abstract
The Sketch Engine is a leading corpus query tool, used for dictionary projects and language research around the world. It is available on a website already loaded with large corpora for twelve of the world's major languages (with more to follow). People can therefore use corpora over the web without having to install software or to find or build the corpora themselves. Our slogan is "corpora for all": we want to facilitate corpus use across the language industries and in language teaching and research by making high-quality corpora very easy to access, with user-friendly and powerful tools. The website also includes a tool for uploading users' own corpora and installing them in the Sketch Engine. A second tool, WebBootCaT, builds "instant corpora" from the web, which can then be explored both in the Sketch Engine and outside it.
The Sketch Engine is the only corpus query tool that provides summaries of a word's behaviour ('word sketches'), bringing together grammatical and collocational analysis.
The tool is regularly extended in response to user requests and to developments in corpus linguistics, lexicography and computational linguistics. Recent additions include:
1. GDEX (good dictionary example finding), which scores sentences according to how likely they are to be useful as dictionary examples or for learners of the language. Concordances can then be sorted by GDEX score, so that good, clear examples tend to appear among the top hits.
2. FindX, a general-purpose list-making facility for "finding the words which are most X", where X might be "inclined to occur in the passive" or "found in the 'verb NP verbing' construction".
3. Re-engineering that removes the former corpus size limit of 2 billion words. (We are currently developing a 10-billion-word corpus of web English.)
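The announcement does not describe how GDEX or FindX work internally. As a rough illustration only, the Python sketch below scores concordance lines with a few simple heuristics and sorts them best-first, then ranks words by the share of their occurrences that have some property X. All function names, the 10-to-25-token length range, the penalty weights and the common-word list are hypothetical assumptions, not the actual GDEX features or the FindX implementation.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable


def example_score(sentence: str, common_words: set[str]) -> float:
    """Return a heuristic 'good example' score between 0.0 and 1.0.

    The features and weights are illustrative assumptions, not GDEX's own.
    """
    tokens = re.findall(r"[A-Za-z']+", sentence)
    if not tokens:
        return 0.0
    score = 1.0
    # Prefer moderate length (assumed ideal: 10 to 25 word tokens).
    if not 10 <= len(tokens) <= 25:
        score -= 0.3
    # Prefer complete sentences: initial capital and final . ? or !
    text = sentence.strip()
    if not (text[:1].isupper() and text[-1:] in ".?!"):
        score -= 0.2
    # Penalise the share of tokens missing from the (hypothetical) common-word list.
    rare = sum(1 for t in tokens if t.lower() not in common_words)
    score -= 0.5 * rare / len(tokens)
    return max(score, 0.0)


def sort_concordance(lines: list[str], common_words: set[str]) -> list[str]:
    """Sort concordance lines so the highest-scoring examples come first."""
    return sorted(lines, key=lambda s: example_score(s, common_words), reverse=True)


def find_most_x(
    occurrences: Iterable[tuple[str, bool]], min_freq: int = 5
) -> list[tuple[str, float]]:
    """Rank words by the share of their occurrences that have property X.

    `occurrences` holds (word, has_x) pairs, e.g. (verb lemma, is_passive).
    Words seen fewer than `min_freq` times are skipped as unreliable.
    """
    total: Counter[str] = Counter()
    with_x: Counter[str] = Counter()
    for word, has_x in occurrences:
        total[word] += 1
        if has_x:
            with_x[word] += 1
    ranked = [(w, with_x[w] / n) for w, n in total.items() if n >= min_freq]
    # Highest proportion first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

For example, with occurrences given as (verb lemma, is_passive) pairs, `find_most_x` lists the verbs most inclined to occur in the passive. Using a proportion with a minimum-frequency cut-off is one simple design choice: it keeps very rare words, whose proportions are unreliable, out of the top of the list.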
The tutorial will be 50% lecture and 50% hands-on. The lecture will:
1. introduce the Sketch Engine, demonstrating its core functions
2. discuss and demonstrate WebBootCaT, GDEX and FindX
3. briefly discuss the corpora that are already loaded, and others in the pipeline.
The practical session will give participants an opportunity to use the tool and to explore its functions, with a guide to hand.
23 November 2009
The technical schedule has been released.
23 November 2009
Details of the panel discussion are available.
ICON-2009 Secretariat
Language Technologies Research Centre
International Institute of Information Technology
Gachibowli
Hyderabad - 500032, India
Tel: +91-40.23001412
Fax: +91-40.66531413
icon2009@iiit.ac.in