ICON 2009

7th International Conference on Natural Language Processing, Hyderabad

the sketch engine

Tutorial 3 at ICON 2009, to be held on Monday, 14 December 2009, at IIIT Hyderabad, is proposed by Prof Adam Kilgarriff, LexMaster Class, UK. It runs for a full day, 0930–1730 hours.

Abstract

The Sketch Engine is a leading corpus query tool, used for dictionary projects and language research around the world. It is available on a website, already loaded with large corpora for twelve of the world’s major languages (with more to follow), so people can use corpora over the web without installing software or finding or building the corpora themselves. Our slogan is "corpora for all": we want to facilitate corpus use across the language industries and in language teaching and research, by making it very easy to access high-quality corpora with user-friendly and powerful tools. The website also includes a tool for uploading and installing the user’s own corpora into the Sketch Engine, and another, WebBootCaT, for building "instant corpora" from the web and exploring them in the Sketch Engine and outside.

The Sketch Engine is the only corpus query tool that provides summaries of a word’s behaviour (‘word sketches’), bringing together grammatical and collocational analysis.
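To make the idea of a word sketch concrete, here is a minimal sketch in Python. It is not the Sketch Engine's implementation: the triples are hand-made, the relation names (`modifier`, `object_of`) and the function `toy_word_sketch` are hypothetical, and collocates are ranked by raw count, whereas a real tool would more likely use a statistical association score over a parsed corpus.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made (headword, grammatical relation, collocate) triples.
# A real system would extract these from a grammatically analysed corpus.
TRIPLES = [
    ("tea", "modifier", "green"),
    ("tea", "modifier", "green"),
    ("tea", "modifier", "hot"),
    ("tea", "object_of", "drink"),
    ("tea", "object_of", "drink"),
    ("tea", "object_of", "drink"),
    ("tea", "object_of", "pour"),
    ("coffee", "object_of", "drink"),
]


def toy_word_sketch(headword, triples):
    """Group a headword's collocates by grammatical relation, most frequent first."""
    by_relation = defaultdict(Counter)
    for word, relation, collocate in triples:
        if word == headword:
            by_relation[relation][collocate] += 1
    # One ranked collocate list per grammatical relation.
    return {rel: counts.most_common() for rel, counts in by_relation.items()}


sketch = toy_word_sketch("tea", TRIPLES)
for relation, collocates in sketch.items():
    print(relation, collocates)
```

The result is a one-page summary per headword: one ranked list of collocates for each grammatical relation, which is the sense in which grammatical and collocational analysis are brought together.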

The tool is regularly extended in response to user requests and to developments in corpus linguistics, lexicography and computational linguistics. Recent additions include:

1. GDEX (good dictionary example finding), which scores sentences according to whether they are likely to be useful as dictionary examples or for learners of the language. Concordances can then be sorted by GDEX score, so users will tend to see good, clear examples in the top hits.
2. FindX: a general-purpose list-making facility, for "finding the words which are most X", where X might be "inclined to occur in the passive" or "found in the 'verb NP verbing' construction".
3. Re-engineering so there is no longer a corpus size limit of 2 billion words. (We are currently developing a 10-billion-word corpus of web English.)
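The GDEX idea of scoring sentences and then sorting a concordance by score can be illustrated with a toy example. Everything here is an assumption made for illustration: the actual GDEX criteria and weights are not described in this abstract, and the word list `COMMON_WORDS`, the length limits and the function names are hypothetical.

```python
# Hypothetical stand-in for a list of common, learner-friendly words.
COMMON_WORDS = {
    "the", "a", "is", "of", "and", "she", "drank",
    "tea", "in", "garden", "hot", "cup",
}


def toy_example_score(sentence, min_len=4, max_len=12):
    """Score a sentence between 0 and 1; higher means a more usable example.

    Assumed heuristics (not the real GDEX ones): prefer sentences made of
    common words, and penalise very short or very long sentences.
    """
    tokens = sentence.lower().rstrip(".!?").split()
    if not tokens:
        return 0.0
    # Share of tokens found in the common-word list.
    familiar = sum(t in COMMON_WORDS for t in tokens) / len(tokens)
    # Halve the score when the sentence length is outside the preferred range.
    length_factor = 1.0 if min_len <= len(tokens) <= max_len else 0.5
    return familiar * length_factor


def sort_concordance(lines):
    """Return concordance lines with the highest-scoring examples first."""
    return sorted(lines, key=toy_example_score, reverse=True)


LINES = [
    "Tea.",
    "She drank a cup of hot tea in the garden.",
    "Oolong tea polyphenols inhibit pancreatic lipase.",
]
ranked = sort_concordance(LINES)
for line in ranked:
    print(round(toy_example_score(line), 2), line)
```

With these made-up heuristics, the plain full sentence is ranked first, the one-word fragment second, and the jargon-heavy sentence last, which mirrors the stated goal of showing good, clear examples in the top hits.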

The tutorial will be 50% lecture and 50% hands-on. The lecture will:

1. introduce the Sketch Engine, demonstrating its core functions
2. discuss and demonstrate WebBootCaT, GDEX and FindX
3. briefly discuss the corpora that are already loaded, and others in the pipeline.

The practical session will give participants an opportunity to use the tool and to explore its functions, with a guide to hand.

latest updates

23rd nov 2009

The technical schedule has been released.

23rd nov 2009

Details of the panel discussion are now available.

contact us

ICON-2009 Secretariat

Language Technologies Research Centre
International Institute of Information Technology
Gachibowli
Hyderabad - 500032, India
Tel: +91-40.23001412
Fax: +91-40.66531413
icon2009@iiit.ac.in