Indian Languages Corpora Initiative (ILCI) Project
 

The Indian Languages Corpora Initiative (ILCI) is a central Government project funded by the Department of Information Technology (DIT), Government of India, to provide a common language platform by creating parallel annotated corpora in the tourism and health domains in 11 Indian languages, with Hindi as the source language. The main objective of the project is to build annotated parallel corpora (Hindi to the other Indian languages and English), with standards, for 12 languages: 8 Indo-Aryan languages (Hindi, Urdu, Punjabi, Bangla, Oriya, Gujarati, Marathi and Konkani), 3 Dravidian languages (Tamil, Telugu and Malayalam) and English, in the domains of tourism and health.

ILCI is a consortium project with 11 members. IIITM-K is responsible for the Malayalam language; the other 10 centres involved include JNU (overall coordination), ISI Kolkata, Utkal University-Orissa, IIT Mumbai, Gujarat University-Ahmedabad, Goa University, Dravidian University-AP, Tamil University-TN and Punjab University. The work covers corpora collection, corpora annotation, and the development of standards and tools.


AIM AND SCOPE OF THE PROJECT  

The main objective of the project is to build annotated parallel corpora, with standards and tools, for 12 languages: 8 Indo-Aryan languages (Hindi, Urdu, Punjabi, Bangla, Oriya, Gujarati, Marathi and Konkani), 3 Dravidian languages (Tamil, Telugu and Malayalam) and English, in the domains of tourism and health.
The major aims of the project are:

  • Draft Standards
  • Corpora collection
  • Corpora annotation
  • Tools development