pattern.fr
The pattern.fr module contains a fast part-of-speech tagger for French (identifies nouns, adjectives, verbs, etc. in a sentence), sentiment analysis, and tools for French verb conjugation and noun singularization & pluralization.
It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.
Documentation
The functions in this module take the same parameters and return the same values as their counterparts in pattern.en. Refer to the documentation there for more details.
Noun singularization & pluralization
For French nouns there is singularize() and pluralize(). The implementation uses a statistical approach with 93% accuracy for singularization and 92% for pluralization.
>>> from pattern.fr import singularize, pluralize >>> >>> print singularize('chats') >>> print pluralize('chat') chat chats
Verb conjugation
For French verbs there is conjugate(), lemma(), lexeme() and tenses(). The lexicon for verb conjugation contains about 1,750 common French verbs (constructed with Bob Salita's verb conjugation rules). For unknown verbs it will fall back to regular expressions with an accuracy of about 83%.
French verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the FUTURE tense, the IMPERATIVE, CONDITIONAL and SUBJUNCTIVE mood and the PERFECTIVE aspect:
>>> from pattern.fr import conjugate >>> from pattern.fr import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE >>> >>> print conjugate('suis', INFINITIVE) >>> print conjugate('suis', PRESENT, 1, SG, mood=SUBJUNCTIVE) >>> print conjugate('suis', PAST, 3, SG) >>> print conjugate('suis', PAST, 3, SG, aspect=PERFECTIVE) être sois était fut
For PAST tense + PERFECTIVE aspect we can also use PRETERITE (passé simple). For PAST tense + IMPERFECTIVE aspect we can also use IMPERFECT (imparfait):
>>> from pattern.fr import conjugate >>> from pattern.fr import IMPERFECT, PRETERITE >>> >>> print conjugate('suis', IMPERFECT, 3, SG) >>> print conjugate('suis', PRETERITE, 3, SG) était fut
The conjugate() function takes the following optional parameters:
Tense | Person | Number | Mood | Aspect | Alias | Example |
INFINITVE | None | None | None | None | "inf" | être |
PRESENT | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sg" | je suis |
PRESENT | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sg" | tu es |
PRESENT | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sg" | il est |
PRESENT | 1 | PL | INDICATIVE | IMPERFECTIVE | "1pl" | nous sommes |
PRESENT | 2 | PL | INDICATIVE | IMPERFECTIVE | "2pl" | vous êtes |
PRESENT | 3 | PL | INDICATIVE | IMPERFECTIVE | "3pl" | ils sont |
PRESENT | None | None | INDICATIVE | PROGRESSIVE | "part" | étant |
PRESENT | 2 | SG | IMPERATIVE | IMPERFECTIVE | "2sg!" | sois |
PRESENT | 1 | PL | IMPERATIVE | IMPERFECTIVE | "1pl!" | soyons |
PRESENT | 2 | PL | IMPERATIVE | IMPERFECTIVE | "2pl!" | soyez |
PRESENT | 1 | SG | CONDITIONAL | IMPERFECTIVE | "1sg->" | je serais |
PRESENT | 2 | SG | CONDITIONAL | IMPERFECTIVE | "2sg->" | tu serais |
PRESENT | 3 | SG | CONDITIONAL | IMPERFECTIVE | "3sg->" | il serait |
PRESENT | 1 | PL | CONDITIONAL | IMPERFECTIVE | "1pl->" | nous serions |
PRESENT | 2 | PL | CONDITIONAL | IMPERFECTIVE | "2pl->" | vous seriez |
PRESENT | 3 | PL | CONDITIONAL | IMPERFECTIVE | "3pl->" | ils seraient |
PRESENT | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sg?" | je sois |
PRESENT | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sg?" | tu sois |
PRESENT | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sg?" | il soit |
PRESENT | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1pl?" | nous soyons |
PRESENT | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2pl?" | vous soyez |
PRESENT | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3pl?" | ils soient |
PAST | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgp" | j' étais |
PAST | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgp" | tu étais |
PAST | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgp" | il était |
PAST | 1 | PL | INDICATIVE | IMPERFECTIVE | "1ppl" | nous étions |
PAST | 2 | PL | INDICATIVE | IMPERFECTIVE | "2ppl" | vous étiez |
PAST | 3 | PL | INDICATIVE | IMPERFECTIVE | "3ppl" | ils étaient |
PAST | None | None | INDICATIVE | PROGRESSIVE | "ppart" | été |
PAST | 1 | SG | INDICATIVE | PERFECTIVE | "1sgp+" | je fus |
PAST | 2 | SG | INDICATIVE | PERFECTIVE | "2sgp+" | tu fus |
PAST | 3 | SG | INDICATIVE | PERFECTIVE | "3sgp+" | il fut |
PAST | 1 | PL | INDICATIVE | PERFECTIVE | "1ppl+" | nous fûmes |
PAST | 2 | PL | INDICATIVE | PERFECTIVE | "2ppl+" | vous fûtes |
PAST | 3 | PL | INDICATIVE | PERFECTIVE | "3ppl+" | ils furent |
PAST | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sgp?" | je fusse |
PAST | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sgp?" | tu fusses |
PAST | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sgp?" | il fût |
PAST | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1ppl?" | nous fussions |
PAST | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2ppl?" | vous fussiez |
PAST | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3ppl?" | ils fussent |
FUTURE | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgf" | je serai |
FUTURE | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgf" | tu seras |
FUTURE | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgf" | il sera |
FUTURE | 1 | PL | INDICATIVE | IMPERFECTIVE | "1plf" | nous serons |
FUTURE | 2 | PL | INDICATIVE | IMPERFECTIVE | "2plf" | vous serez |
FUTURE | 3 | PL | INDICATIVE | IMPERFECTIVE | "3plf" | ils seron |
Instead of optional parameters, a single short alias, or PARTICIPLE or PAST+PARTICIPLE can also be given. With no parameters, the infinitive form of the verb is returned.
Reference: Salita, B. (2011). French Verb Conjugation Rules. Retrieved from: http://fvcr.sourceforge.net.
Attributive & predicative adjectives
French adjectives inflect with an -e, -s or -es suffix depending on gender. There are many irregular cases (e.g., curieux → une fille curieuse). You can get the base form with the predicative() function. A statistical approach is used with an accuracy of 95%.
>>> from pattern.fr import predicative >>> print predicative('curieuse') curieux
Sentiment analysis
For opinion mining there is sentiment(), which returns a (polarity, subjectivity)-tuple, based on a lexicon of adjectives. Polarity is a value between -1.0 and +1.0, subjectivity between 0.0 and 1.0. The accuracy is around 74% (P 0.77, R 0.73) for book reviews:
>>> from pattern.fr import sentiment >>> print sentiment('Un livre magnifique!') (1.0, 1.0)
Parser
For parsing there is parse(), parsetree() and split(). The parse() function annotates words in the given string with their part-of-speech tags (e.g., NN for nouns and VB for verbs). The parsetree() function takes a string and returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation (here) how to manipulate Text objects.
>>> from pattern.fr import parse, split >>> >>> s = parse(u"Le chat noir s'était assis sur le tapis.") >>> for sentence in split(s): >>> print sentence Sentence('Le/DT/B-NP/O chat/NN/I-NP/O noir/JJ/I-NP/O' "s'/PRP/B-NP/O était/VB/B-VP/O assis/VBN/I-VP/O" 'sur/IN/B-PP/B-PNP le/DT/B-NP/I-PNP tapis/NN/I-NP/I-PNP ././O/O')
The parser is based on Lefff. For words in Lefff that can have multiple part-of-speech tags, we used Lexique to find the most frequent POS-tag.
References:
Sagot, B. (2010). The Lefff, a freely available and large-coverage morphological and syntantic lexicon for French. Proceedings of LREC'10.
New, B., Pallier, C., Ferrand, L. & Matos, R. (2001). A lexical database for contemporary french: LEXIQUE. L'année Psychologique.