AraVeLex: Modern Standard Arabic Verbal lexicon

This is a human-readable rendition of a JSON file defining a frictionless package. It was generated automatically.

  • name: aravelex
  • licenses:
  • keywords: arabic, verbs, paradigms, paralex
  • profile data-package
  • contributors
  • [1]
    • title Sacha Beniamine
    • role author
  • version 1.0.1
  • languages_iso639 ['arb']
  • related_identifiers
  • relation isDerivedFrom
  • identifier https://github.com/unimorph/ara/blob/master/ara
  • paralex-version 2.0.7

This package describes the following tables:

cells

Paradigm cells - This table is located in std_modern_arabic_cells.csv.

  • The identifier column (or primaryKey) is ['cell_id']

Columns defined by cells-schema:

  • cell_id (string): Cell identifier. The set of feature values as would appear in a gloss, separated by dots, e.g. prs.ind.1sg or f.pl

    • constraints: a cell_id is obligatory; it must be unique; it must match the regular expression (imperf|pass|sbjv|juss|act|ind|prf|imp|1|2|3|d|f|m|p|s)(\.(imperf|pass|sbjv|juss|act|ind|prf|imp|1|2|3|d|f|m|p|s))*.
  • unimorph_cell_old (any)

  • unimorph_cell (any)
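
The cell_id constraint above can be checked with a short script. This is a minimal sketch, assuming the pattern has to match the whole value; the helper names and the sample identifiers are hypothetical and are not taken from std_modern_arabic_cells.csv.

```python
import re

# Alternation copied from the cell_id constraint above.
VALUE = r"(imperf|pass|sbjv|juss|act|ind|prf|imp|1|2|3|d|f|m|p|s)"
CELL_ID = re.compile(rf"{VALUE}(\.{VALUE})*")


def is_valid_cell_id(cell_id: str) -> bool:
    """Return True if the whole string matches the cell_id pattern."""
    return CELL_ID.fullmatch(cell_id) is not None


def split_cell_id(cell_id: str) -> list[str]:
    """Split a cell identifier into its dot-separated feature values."""
    return cell_id.split(".")


# Hypothetical identifiers, chosen only to exercise the pattern.
print(is_valid_cell_id("prf.act.3.m.s"))  # True
print(is_valid_cell_id("prs.ind.1sg"))    # False: "prs" and "1sg" are not in this value set
print(split_cell_id("prf.act.3.m.s"))     # ['prf', 'act', '3', 'm', 's']
```

Note that the generic examples in the column description (prs.ind.1sg, f.pl) illustrate the notation only; under the assumption above they would not pass this package's own pattern.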

forms

Inflected forms - This table is located in std_modern_arabic_paradigms.csv.

  • The identifier column (or primaryKey) is ['form_id']

  • Formal relations (foreignKeys) with other tables:

    Each value in column    Must refer to
    ['cell']                ['cell_id'] in table cells
    ['lexeme']              ['lexeme_id'] in table lexemes
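
The foreign key relations above mean that every value of cell in the forms table should also appear as a cell_id in the cells table (and likewise for lexeme and lexeme_id). A minimal sketch of such a check with the standard csv module follows; the function name is hypothetical and the rows are invented in-memory stand-ins, not content of the actual CSV files.

```python
import csv
import io


def missing_references(rows, column, valid_ids):
    """Return the values of `column` that do not occur in `valid_ids`."""
    return sorted({row[column] for row in rows} - set(valid_ids))


# Invented stand-ins for the cells and forms tables.
cells_csv = "cell_id\nprf.act.3.m.s\n"
forms_csv = (
    "form_id,lexeme,cell\n"
    "f1,lex1,prf.act.3.m.s\n"
    "f2,lex1,imp.2.m.s\n"
)

cells = list(csv.DictReader(io.StringIO(cells_csv)))
forms = list(csv.DictReader(io.StringIO(forms_csv)))
cell_ids = [row["cell_id"] for row in cells]

# The second form points at a cell that the cells table does not define.
print(missing_references(forms, "cell", cell_ids))  # ['imp.2.m.s']
```

For the real data, the io.StringIO objects would be replaced by the opened CSV files named in this document.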

Columns defined by forms-schema:

  • form_id (string): Form table row identifiers. These identifiers are specific to form, lexeme, cell triples.

    • constraints: a form_id is obligatory; it must be unique.
  • cell (string): Reference to a cell identifier. The set of feature values as would appear in a gloss, separated by dots, e.g. prs.ind.1sg or f.pl

  • orth_roman_form (any)

  • orth_form (string): Inflected form (orthographic). The form, given orthographically

  • lexeme (string): Reference to a lexeme identifier. Lexeme identifiers must be unique to paradigms.

  • phon_form (string): Inflected form (phonemic or phonetic). The form, given in phonemic or phonetic notation, with sounds separated by spaces

    • constraints: a phon_form must match the regular expression (dˤ|sˤ|tˤ|ðˤ|aː|iː|uː|h|j|w|b|d|f|k|l|m|n|q|r|s|t|x|z|ð|ħ|ɣ|ʃ|ʔ|ʕ|ʤ|θ|a|i|u)( (dˤ|sˤ|tˤ|ðˤ|aː|iː|uː|h|j|w|b|d|f|k|l|m|n|q|r|s|t|x|z|ð|ħ|ɣ|ʃ|ʔ|ʕ|ʤ|θ|a|i|u))*.

    • rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#phon_form

    • missingValues: #DEF#
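
The phon_form constraints above (space-separated sounds from a fixed inventory, with #DEF# treated as a missing value) can be sketched as a small parser. This assumes the pattern must match the whole value; parse_phon_form is a hypothetical helper, and the sample strings are illustrative rather than rows from std_modern_arabic_paradigms.csv.

```python
import re

# Sound inventory copied from the phon_form constraint above.
SOUNDS = (
    "dˤ|sˤ|tˤ|ðˤ|aː|iː|uː|h|j|w|b|d|f|k|l|m|n|q|r|s|t|x|z|"
    "ð|ħ|ɣ|ʃ|ʔ|ʕ|ʤ|θ|a|i|u"
)
PHON_FORM = re.compile(rf"({SOUNDS})( ({SOUNDS}))*")
MISSING = "#DEF#"  # listed under missingValues for this column


def parse_phon_form(value):
    """Return the list of sounds, or None if the value is the missing marker.

    Raises ValueError when the value does not match the pattern.
    """
    if value == MISSING:
        return None
    if PHON_FORM.fullmatch(value) is None:
        raise ValueError(f"invalid phon_form: {value!r}")
    return value.split(" ")


print(parse_phon_form("k a t a b a"))  # ['k', 'a', 't', 'a', 'b', 'a']
print(parse_phon_form("#DEF#"))        # None
```

A value such as "kataba", without separating spaces, would raise ValueError, because each sound must be its own space-delimited token.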

lexemes

Lexemes - This table is located in std_modern_arabic_lexemes.csv.

  • The identifier column (or primaryKey) is ['lexeme_id']

Columns defined by lexemes-schema:

  • lexeme_id (string): Identifier for the lexeme. Lexeme identifiers are often identical to the label (lemma). However, they must be unique to paradigms, distinguishing homonyms with different inflection. For example, the animal mouse/mice and the computer peripheral mouse/mouses would both have the label 'mouse' but could be identified by the lexeme identifiers mouse_1 and mouse_2.

    • constraints: a lexeme_id is obligatory; it must be unique.
  • lemma (any)

  • lemma_roman (any)

  • POS (string): Part of Speech. The relevant part of speech for this item. This must refer to a PartOfSpeech entity from the lexinfo (https://lexinfo.net/) ontology.

    • constraints: a POS must be one of the values: verb, numeral, conjunction, noun, adposition, determiner, article, adverb, pronoun, fusedPreposition, adjective, symbol, particle, conditionalParticle, demonstrativePronoun, interjection, semiColon, diminutiveNoun, possessivePronoun, prepositionalAdverb, compoundPreposition, interrogativeRelativePronoun, possessiveParticle, plainVerb, letter, interrogativeDeterminer, relativePronoun, postposition, fusedPronounAuxiliary, interrogativeOrdinalNumeral, indefiniteOrdinalNumeral, strongPersonalPronoun, possessiveRelativePronoun, ordinalAdjective, collectivePronoun, commonNoun, infinitiveParticle, comparativeParticle, partitiveArticle, invertedComma, lightVerb, emphaticPronoun, distinctiveParticle, genericNumeral, possessiveAdjective, reflexivePossessivePronoun, colon, coordinationParticle, presentParticipleAdjective, fusedPrepositionPronoun, cardinalNumeral, indefiniteDeterminer, numeralFraction, questionMark, generalAdverb, superlativeParticle, point, indefiniteMultiplicativeNumeral, comma, closeParenthesis, futureParticle, personalPronoun, reflexivePersonalPronoun, adverbialPronoun, reciprocalPronoun, openParenthesis, pastParticipleAdjective, negativePronoun, relativeDeterminer, existentialPronoun, pronominalAdverb, relativeParticle, exclamativeDeterminer, multiplicativeNumeral, reflexiveDeterminer, modal, unclassifiedParticle, properNoun, allusivePronoun, interrogativeCardinalNumeral, bullet, subordinatingConjunction, irreflexivePersonalPronoun, possessiveDeterminer, negativeParticle, indefinitePronoun, generalizationWord, coordinatingConjunction, deficientVerb, adjective-i, impersonalPronoun, indefiniteCardinalNumeral, adjective-na, qualifierAdjective, affirmativeParticle, mainVerb, fusedPrepositionDeterminer, indefiniteArticle, weakPersonalPronoun, suspensionPoints, interrogativeMultiplicativeNumeral, affixedPersonalPronoun, auxiliary, circumposition, copula, demonstrativeDeterminer, participleAdjective, 
exclamativePoint, interrogativePronoun, presentativePronoun, punctuation, definiteArticle, slash, exclamativePronoun, preposition, conditionalPronoun, relationNoun, interrogativeParticle.

    • rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#POS
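
The uniqueness constraint on lexeme_id can be illustrated with the mouse example from the column description. This is a sketch only: duplicated_ids is a hypothetical helper, and the identifiers are the English illustrations used above, not lexemes from this package.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return the identifiers that occur more than once, sorted."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)


# Two homonyms sharing the label 'mouse' get distinct identifiers.
print(duplicated_ids(["mouse_1", "mouse_2"]))  # []

# Reusing the bare label as the identifier would violate the constraint.
print(duplicated_ids(["mouse", "mouse"]))      # ['mouse']
```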

sounds

Sound inventory with distinctive features - This table is located in std_modern_arabic_sounds.csv.

  • The identifier column (or primaryKey) is ['sound_id']
  • missingValues: `` (the empty string)

Columns defined by sounds-schema:

  • sound_id (string): Sound representation. These identifiers are specific to sounds.

    • constraints: a sound_id is obligatory; it must be unique.
  • tier (any)

  • syllabic (any)

  • long (any)

  • consonantal (any)

  • sonorant (any)

  • continuant (any)

  • delayed release (any)

  • approximant (any)

  • trill (any)

  • nasal (any)

  • voice (any)

  • spread gl (any)

  • constr gl (any)

  • LABIAL (any)

  • round (any)

  • labiodental (any)

  • CORONAL (any)

  • anterior (any)

  • distributed (any)

  • strident (any)

  • lateral (any)

  • DORSAL (any)

  • high (any)

  • low (any)

  • front (any)

  • back (any)

  • tense (any)

features-values

Grammatical features values - This table is located in std_modern_arabic_features.csv.

  • The identifier column (or primaryKey) is ['value_id']

Columns defined by features-values-schema:

  • value_id (string): Grammatical Feature value identifier. Identifier for the grammatical feature value (as found in the cell)

    • constraints: a value_id is obligatory; it must be unique.
  • value_label (any)

  • feature (string): feature. The name of the dimension of this feature, e.g. case, tense, modality, voice, force, gender, evidentiality, person, number, polarity...

  • POS (string): Part of Speech. The relevant part of speech for this item. This must refer to a PartOfSpeech entity from the lexinfo (https://lexinfo.net/) ontology.

    • constraints: a POS must be one of the values: verb, numeral, conjunction, noun, adposition, determiner, article, adverb, pronoun, fusedPreposition, adjective, symbol, particle, conditionalParticle, demonstrativePronoun, interjection, semiColon, diminutiveNoun, possessivePronoun, prepositionalAdverb, compoundPreposition, interrogativeRelativePronoun, possessiveParticle, plainVerb, letter, interrogativeDeterminer, relativePronoun, postposition, fusedPronounAuxiliary, interrogativeOrdinalNumeral, indefiniteOrdinalNumeral, strongPersonalPronoun, possessiveRelativePronoun, ordinalAdjective, collectivePronoun, commonNoun, infinitiveParticle, comparativeParticle, partitiveArticle, invertedComma, lightVerb, emphaticPronoun, distinctiveParticle, genericNumeral, possessiveAdjective, reflexivePossessivePronoun, colon, coordinationParticle, presentParticipleAdjective, fusedPrepositionPronoun, cardinalNumeral, indefiniteDeterminer, numeralFraction, questionMark, generalAdverb, superlativeParticle, point, indefiniteMultiplicativeNumeral, comma, closeParenthesis, futureParticle, personalPronoun, reflexivePersonalPronoun, adverbialPronoun, reciprocalPronoun, openParenthesis, pastParticipleAdjective, negativePronoun, relativeDeterminer, existentialPronoun, pronominalAdverb, relativeParticle, exclamativeDeterminer, multiplicativeNumeral, reflexiveDeterminer, modal, unclassifiedParticle, properNoun, allusivePronoun, interrogativeCardinalNumeral, bullet, subordinatingConjunction, irreflexivePersonalPronoun, possessiveDeterminer, negativeParticle, indefinitePronoun, generalizationWord, coordinatingConjunction, deficientVerb, adjective-i, impersonalPronoun, indefiniteCardinalNumeral, adjective-na, qualifierAdjective, affirmativeParticle, mainVerb, fusedPrepositionDeterminer, indefiniteArticle, weakPersonalPronoun, suspensionPoints, interrogativeMultiplicativeNumeral, affixedPersonalPronoun, auxiliary, circumposition, copula, demonstrativeDeterminer, participleAdjective, 
exclamativePoint, interrogativePronoun, presentativePronoun, punctuation, definiteArticle, slash, exclamativePronoun, preposition, conditionalPronoun, relationNoun, interrogativeParticle.

    • rdfProperty: https://www.paralex-standard.org/paralex_ontology.xml#POS

  • comment (string): Comment. Human-readable comment.

  • canonical_order (integer): Sorting order for visual presentation. The order in which items are canonically presented. Use integers to represent relative order; the order is applied per item.
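
As an illustration of how canonical_order might be used for display, the sketch below sorts feature values by feature and then by their integer order. The function name is hypothetical, and the rows (including the assignment of s, d and p to a "number" feature and their order values) are invented for the example; the actual values are in std_modern_arabic_features.csv.

```python
def order_values(rows):
    """Sort rows by feature, then by canonical_order within each feature."""
    return sorted(rows, key=lambda r: (r["feature"], int(r["canonical_order"])))


# Invented rows, given as strings the way a CSV reader would return them.
rows = [
    {"value_id": "p", "feature": "number", "canonical_order": "3"},
    {"value_id": "s", "feature": "number", "canonical_order": "1"},
    {"value_id": "d", "feature": "number", "canonical_order": "2"},
]

print([r["value_id"] for r in order_values(rows)])  # ['s', 'd', 'p']
```

Converting canonical_order with int() matters here: sorting the raw strings would place "10" before "2".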