Some PLANS for the FORUM
Posted: January 26th, 2009, 2:19 am
I would like to introduce some of the plans that we have for the Lakota Language Forum.
1) Lemmatizer
This weird-sounding word represents a very useful tool for language learners and language users. To describe a lemmatizer, we first have to explain the term lemma. The English words goes, going, went and gone are all forms of one word: go. This base form is the citation form that you will find as an entry in a dictionary, but you will not find goes, going, went or gone as separate entries in most dictionaries. The base form (e.g. go) is called the lemma.
Because English has very simple morphology, each lemma has very few forms: most verbs have only four forms (e.g. walk, walks, walking, walked), a small number of irregular verbs have five (go, goes, going, went, gone), and nouns have two (singular and plural).
Lakota, on the other hand, is a highly inflectional language with complex morphology, which results in a large number of word forms for every lemma. Most stative and active verbs have at least 7 forms, but those with changeable A (ablaut) have 21 forms. Transitive verbs have 30 forms, and if they allow ablaut, they have 90 forms. These numbers are further multiplied by the number of non-personal prefixes and suffixes that each verb takes. Because there are so many, the forms of each Lakota verb cannot all be listed in a dictionary. Moreover, some word forms look very different from the base form; for example, iwíčhauŋkičupi is a form of the verb ičú. Such forms are difficult for beginners, who cannot associate them with their base forms until they become familiar with all the personal affixes.
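The form counts above follow a simple pattern. Assuming (our reading, not spelled out in the post) that the changeable A has three grades, a, e and iŋ, ablaut triples each count. The 7 forms of an intransitive verb are the 1st, 2nd and 3rd person singular, the 1st person dual, and the 1st, 2nd and 3rd person plural:

```latex
\begin{aligned}
N_{\text{intr}} &= 7, & N_{\text{intr}}^{\text{ablaut}} &= 3 \times 7 = 21,\\
N_{\text{tr}}   &= 30, & N_{\text{tr}}^{\text{ablaut}}   &= 3 \times 30 = 90.
\end{aligned}
```

These totals still leave out the non-personal prefixes and suffixes, which multiply them further.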
This is where a lemmatizer can help. It is a software tool that recognizes any word form and associates it with the appropriate lemma, or base form. For instance, if you type in owíčhabluspe, the lemmatizer will tell you that it is a form of the verb oyúspA, and you can then look that word up in the dictionary.
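One way to picture such a tool is as a reverse index from word forms to lemmas. The Python below is a minimal sketch under that assumption, not the forum's actual lemmatizer. The names `add_forms` and `lemmatize` are hypothetical, and the index holds only the forms quoted in this post. Input is Unicode-normalized (NFC) so that an accented letter typed as a base letter plus a combining accent still matches the stored form.

```python
import unicodedata


def normalize(word: str) -> str:
    """Compose accents (e.g. a + combining acute -> á) so typed input matches
    stored forms regardless of how the keyboard encoded the characters."""
    return unicodedata.normalize("NFC", word.strip())


# Hypothetical form -> lemma index. A real lemmatizer would generate these
# entries from each verb's paradigm; here they are entered by hand, using
# only the examples quoted in the post.
FORM_INDEX: dict[str, set[str]] = {}


def add_forms(lemma: str, forms: list[str]) -> None:
    """Register every inflected form (and the lemma itself) under its lemma."""
    for form in [lemma, *forms]:
        FORM_INDEX.setdefault(normalize(form), set()).add(normalize(lemma))


def lemmatize(word: str) -> list[str]:
    """Return the lemma(s) a word form belongs to, sorted; [] if unknown."""
    return sorted(FORM_INDEX.get(normalize(word), set()))


add_forms("oyúspA", ["owíčhabluspe"])
add_forms("ičú", ["iwíčhauŋkičupi"])
add_forms("máni", ["mawáni", "mayáni", "maúŋni", "maúŋnipi", "mayánipi", "mánipi"])

print(lemmatize("owíčhabluspe"))  # ['oyúspA']
```

Because one spelling could belong to more than one verb, the sketch returns a list of lemmas rather than a single one.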
The lemmatizer will also be able to generate all the word forms of each verb, which can be put to several uses, described below.
2) Find the paradigm for any verb
The list of word forms created by the lemmatizer will enable us to create a conjugation paradigm for each verb and a table of subject-object combinations for each transitive verb. For example, if you type máni in the lemmatizer, you will see the following:
| person | singular | dual   | plural   |
| 1st    | mawáni   | maúŋni | maúŋnipi |
| 2nd    | mayáni   |        | mayánipi |
| 3rd    | máni     |        | mánipi   |
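For the simplest verbs, a paradigm like the máni table could be generated by inserting a person marker into the stem and adding plural -pi. The sketch below assumes exactly that, using only the markers visible in the table (wá, yá, úŋ). The function name `paradigm` and the hand-supplied split point are hypothetical; verbs with other person markers, ablaut or objects would need considerably more machinery.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize to composed form so generated strings compare reliably."""
    return unicodedata.normalize("NFC", s)


# Hypothetical marker set: the accented person markers seen in the máni
# paradigm (wá = 1st singular, yá = 2nd singular, úŋ = 1st dual).
MARKERS = {"1s": "wá", "2s": "yá", "1d": "úŋ"}


def paradigm(lemma: str, before: str, after: str) -> dict[str, str]:
    """Build the 7-cell paradigm of an active intransitive verb.

    `before` + `after` is the lemma without its accent, split where the
    person marker goes (for máni: "ma" + "ni"). In this sketch the marker
    carries the stress, and the split point is supplied by hand.
    """
    forms = {person: before + marker + after for person, marker in MARKERS.items()}
    forms["3s"] = lemma  # 3rd person singular is the unmarked base form
    # Plurals add -pi to the 1st dual, 2nd singular and 3rd singular forms.
    forms["1p"] = forms["1d"] + "pi"
    forms["2p"] = forms["2s"] + "pi"
    forms["3p"] = forms["3s"] + "pi"
    return {person: nfc(form) for person, form in forms.items()}


for person, form in paradigm("máni", "ma", "ni").items():
    print(person, form)
```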
If you type ičú, a transitive verb, in the lemmatizer, you will see a table of its subject-object combinations.
3) Spellchecker
Having all possible word forms available will enable us to create a spellchecker. This will further help people type Lakota with consistent spelling, and it will be a great tool for learners and language users at all levels.
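With a complete list of generated forms, the core of a spellchecker is a membership test plus a way to suggest near misses. The sketch below assumes a small hypothetical vocabulary (only the máni forms from this post) and uses Python's standard `difflib.get_close_matches` as a stand-in ranking method; the names `check` and `suggest` are illustrative, and a real Lakota spellchecker might rank suggestions differently.

```python
import difflib
import unicodedata


def nfc(s: str) -> str:
    """Normalize to composed form so accented letters compare reliably."""
    return unicodedata.normalize("NFC", s)


# Hypothetical vocabulary: in the planned tool this would be every form the
# lemmatizer generates; here it is only the máni paradigm from the post.
KNOWN_FORMS = {nfc(w) for w in [
    "máni", "mawáni", "mayáni", "maúŋni", "maúŋnipi", "mayánipi", "mánipi",
]}


def check(word: str) -> bool:
    """A word counts as correctly spelled if it is one of the known forms."""
    return nfc(word) in KNOWN_FORMS


def suggest(word: str, n: int = 3) -> list[str]:
    """Return up to n known forms, ranked by difflib's similarity ratio."""
    return difflib.get_close_matches(nfc(word), sorted(KNOWN_FORMS), n=n, cutoff=0.6)


print(check("mayáni"))       # True
print(suggest("mawani")[0])  # mawáni
```

A missing accent, as in mawani, is the kind of slip this catches: the word fails the membership test, and the closest generated form is offered instead.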
4) Online dictionary
The lemmatizer will also be an integral part of an online Lakota dictionary. In the future, we hope to make the dictionary database available online with advanced search options. The online dictionary will be in multimedia format, so it will include not only words but also pictures for nouns and sound recordings of each word's pronunciation.