training common sense - day 3


adjectives



../training-common-sense/sources/pictures/good-bad.png

from: Using WordNet to Measure Semantic Orientations of Adjectives --> http://www.lrec-conf.org/proceedings/lrec2004/pdf/734.pdf


../training-common-sense/sources/pictures/sentiWordNet.png

http://sentiwordnet.isti.cnr.it/



adjectives in WordNet


references
Using WordNet to Measure Semantic Orientations of Adjectives --> http://www.lrec-conf.org/proceedings/lrec2004/pdf/734.pdf
book about WordNet --> ../training-common-sense/sources/texts/wordnet-an-electronic-lexical-database-language-speech-and-communication.9780262061971.33119.pdf

history of WordNet 
George A. Miller is the initiator of WordNet;
coming from two different fields, he worked on a computational lexicon:


work @ WordNet on adjectives



Philip N. Johnson-Laird

(born 12 October 1936) is a professor at Princeton University's Department of Psychology and author of several notable books on human cognition and the psychology of reasoning.
— from: https://en.wikipedia.org/wiki/Philip_Johnson-Laird

profile: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3250179/
portrait:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3250179/bin/pnas.1117174108unfig01.gif


When complex technology starts spiraling out of hand, this abundance of information hinders our ability to make reliable decisions.
“Eventually,” Johnson-Laird says, “the computational demands overwhelm them, and this often culminates in catastrophes.”
— from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3250179/

He worked on the grouping of adjectives according to classes of nouns.
Can we find this file?



adjectives in Pattern


references
"Vreselijk  mooi!" (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives --> http://www.clips.ua.ac.be/sites/default/files/desmedt-subjectivity.pdf


commodification of difference


Looking at 'the cyclic commodification of difference', and entry points for a critique of text-mining (thank you Seda Guerses)

Alison Adam, Gay men, Gaydar and the commodification of difference (2008)
Alison Adam, A Feminist Critique of Artificial Intelligence http://ejw.sagepub.com/content/2/3/355.refs (1995)

to compute the semantic similarity/relatedness/mathematical-relation between adjectives


departing from the article "Vreselijk  mooi!" --> http://www.clips.ua.ac.be/sites/default/files/desmedt-subjectivity.pdf


then, we looked at: Van de Cruys 2010, called Mining for Meaning, chapter 2 (page 31) --> https://www.rug.nl/research/portal/files/14692283/14complete.pdf

matrix of adjectives
a1 [ n1 n2 . . . . . . ny]
a2 [ n1 n2 . . . . . . ny]
a3 [ n1 n2 . . . . . . ny]
a4 [ n1 n2 . . . . . . ny]
a5 [ n1 n2 . . . . . . ny]
a6 [ n1 n2 . . . . . . ny]
ax [ n1 n2 . . . . . . ny]

example: micro/minimal context of adjectives

a1 n1 --> "amazing book" occurs 5 times
a2 n1 --> "awesome book" occurs 10 times
a3 n1 --> "beautiful book" occurs 15 times
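
A minimal sketch of how such a matrix could be counted from observed (adjective, noun) pairs. The three "book" counts come from the example above; the other counts, and the name cooccurrence_matrix, are invented here for illustration and are not taken from Pattern or from the article.

```python
from collections import Counter

# Toy (adjective, noun) observations.
pairs = (
    [("amazing", "book")] * 5
    + [("awesome", "book")] * 10
    + [("beautiful", "book")] * 15
    + [("awesome", "person")] * 2      # invented count, for illustration only
    + [("beautiful", "flower")] * 4    # invented count, for illustration only
)


def cooccurrence_matrix(pairs):
    """Return (adjectives, nouns, matrix), where matrix[i][j] counts how
    often adjectives[i] was observed together with nouns[j]."""
    counts = Counter(pairs)
    adjectives = sorted({adjective for adjective, _ in pairs})
    nouns = sorted({noun for _, noun in pairs})
    matrix = [[counts[(a, n)] for n in nouns] for a in adjectives]
    return adjectives, nouns, matrix


adjectives, nouns, matrix = cooccurrence_matrix(pairs)
print("nouns:", nouns)
for adjective, row in zip(adjectives, matrix):
    print(adjective, row)
```

Each row is one adjective vector (a1, a2, ...) and each column is one noun feature (n1, n2, ...).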


a1 n1 --> amazing book
a2 n2 --> awesome person
a3 n3 --> beautiful flower

bright, brilliant, apocalyptic ... future
brilliant, well-written, interesting ... book

mathematical 'relation' or 'similarity' vs. semantic 'relation' or 'similarity'

../training-common-sense/sources/pictures/similarity-of-adjectives-related-to-nouns_from_mining-for-meaning-2.png


red (often), yellow (sometimes), tasty (very often), fast (hardly ever) ... apple
red (hardly ever), yellow (very often), tasty (often), fast (hardly ever)... banana
red (sometimes), yellow (sometimes), tasty (hardly ever), fast (often) ... car
red (very often), yellow (sometimes), tasty (hardly ever), fast (sometimes) ... truck


Algorithm used = "cosine similarity measure"
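
A small sketch of the cosine similarity measure, applied to the apple / banana / car / truck example. The mapping from words to numbers (hardly ever = 0, sometimes = 1, often = 2, very often = 3) is our own assumption, made only to have something to compute with; real vectors would hold co-occurrence counts from a corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention used in this sketch for empty vectors
    return dot / (norm_u * norm_v)


# Features, in order: red, yellow, tasty, fast.
# Assumed scale: hardly ever = 0, sometimes = 1, often = 2, very often = 3.
vectors = {
    "apple":  [2, 1, 3, 0],
    "banana": [0, 3, 2, 0],
    "car":    [1, 1, 0, 2],
    "truck":  [3, 1, 0, 1],
}

for first, second in [("apple", "banana"), ("car", "truck"), ("apple", "car")]:
    score = cosine_similarity(vectors[first], vectors[second])
    print(first, second, round(score, 2))
```

The measure only compares the direction of two vectors, not their length. This is a mathematical similarity; whether it is also a semantic one is exactly the question raised above.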


then... (the text continues)

We applied this approach to automatically annotate new adjectives, based on their semantic relatedness to gold1000 [= hand-annotated] adjectives. From the TWNC [Twente Nieuws Corpus (TwNC)] (Ordelman et al., 2002), we analyzed 3,000,000 words and selected the top 2,500 most frequent nouns. For each adjective in TWNC that is also in the CORNETTO [the Dutch counterpart of WordNet] database, we counted the number of times it directly precedes one or more of these top nouns, resulting in 5,784 adjective vectors with 2,500 vector features. For each gold1000 [= hand-annotated] adjective we then used cosine similarity [= algorithm described by Tim Van de Cruys] to retrieve the top 20 most similar nearest neighbors [what is neighbouring what?]. For fantastisch (fantastic) the top five nearest neighbors are: geweldig (great, 70%), prachtig (beautiful, 51%), uitstekend (excellent, 50%), prima (fine, 50%), mooi (nice, 49%) and goed (good, 47%).
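
One possible answer to "what is neighbouring what?": each adjective is a vector of noun counts, and its nearest neighbours are the other adjectives whose vectors score highest on cosine similarity. The sketch below is a toy version under that reading. The adjectives, the three noun features and all counts are hypothetical, and nearest_neighbours is a name we made up, not a function from Pattern.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical adjective vectors: counts over the nouns (book, person, future).
vectors = {
    "fantastic":   [8, 6, 1],
    "great":       [9, 7, 2],
    "beautiful":   [6, 3, 0],
    "apocalyptic": [0, 0, 5],
}


def nearest_neighbours(target, vectors, k=20):
    """Return the k adjectives most similar to target, best first,
    as (adjective, similarity) pairs."""
    scores = [
        (other, cosine_similarity(vectors[target], vector))
        for other, vector in vectors.items()
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


for adjective, score in nearest_neighbours("fantastic", vectors, k=2):
    print(adjective, round(score, 2))
```

In the article the same retrieval is done with 2,500 noun features per adjective and k = 20; the toy vectors here only show the mechanics.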


beautiful - book: x times
beautiful - person: z times
beautiful - apple: y times
---
adjective vector
features: book, person, apple (ny)

translated into a mathematical matrix, it will look like: 

horrible - book: y times

example try: 

en-sentiment.xml


Pattern's sentiment analysis uses a list of English adjectives in an XML file (in pattern-2.6.zip, it is located in pattern-2.6/pattern/text/en/). This file is also used by other projects that mention Pattern, found after a very quick search:
   TextBlob (Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more) https://github.com/sloria/TextBlob (this software uses two other sentiment analyzers) 
    Emotional (Subjectivity and sentiment/polarity analysis library for Node.js) https://github.com/ticup/emotional
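
A hedged sketch of how such a lexicon file could be read with Python's standard library. The sample below only imitates the file: the attribute names (form, pos, polarity, subjectivity, intensity) are what we remember seeing in en-sentiment.xml and should be checked against the real file, and the scores are invented. Averaging over the senses of a word is a choice made in this sketch, not a claim about what Pattern does internally.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Imitation of the lexicon format; attribute names are an assumption,
# the scores are invented for illustration.
SAMPLE = """<sentiment language="en">
  <word form="good" pos="JJ" polarity="0.7" subjectivity="0.6" intensity="1.0" />
  <word form="good" pos="JJ" polarity="0.5" subjectivity="0.4" intensity="1.0" />
  <word form="horrible" pos="JJ" polarity="-1.0" subjectivity="1.0" intensity="1.0" />
</sentiment>"""


def load_lexicon(xml_text):
    """Map each word form to (polarity, subjectivity), averaged over
    all entries (senses) listed for that form."""
    root = ET.fromstring(xml_text)
    senses = defaultdict(list)
    for word in root.iter("word"):
        scores = (float(word.get("polarity")), float(word.get("subjectivity")))
        senses[word.get("form")].append(scores)
    lexicon = {}
    for form, scores in senses.items():
        polarity = sum(p for p, _ in scores) / len(scores)
        subjectivity = sum(s for _, s in scores) / len(scores)
        lexicon[form] = (polarity, subjectivity)
    return lexicon


lexicon = load_lexicon(SAMPLE)
for form, (polarity, subjectivity) in lexicon.items():
    print(form, polarity, subjectivity)
```

To try it on the real file, read the file's contents into a string and pass that to load_lexicon instead of SAMPLE.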

fr-sentiment.xml


Pattern's sentiment analysis for French uses a list of French adjectives in an XML file (in pattern-2.6.zip, it is located in pattern-2.6/pattern/text/fr/). This file is also used by other projects that mention Pattern, found after a very quick search:
   TextBlob (Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more) https://github.com/sloria/TextBlob (it seems that it is not the same version as in Pattern: version 1.0 instead of 1.1 in Pattern 2.6)

how about going forwards?


text-mining methods, text-mining culture

- document the steps that have been put on the table during this edition of Relearn, and what has been produced since Cqrrelations

- how would you make it (a text-mining-process) better?

- what would an algorithm do?

- let's make a critical fork of Pattern (what are we going to call it? Pattern the critical edition, PatternPlus, FlattenPlus...)


Google Group question: sentiment_score()

answered by Tom De Smedt

../training-common-sense/sources/pictures/if-happy-is-0.0-something-is-wrong.png

https://groups.google.com/forum/#!topic/pattern-for-python/FTeqb0p5eFM (this question)


Pattern on the web

Pattern's Github --> https://github.com/clips/pattern
Pattern's Google Group --> https://groups.google.com/forum/#!forum/pattern-for-python 

backing-up: Cqrrelations


Cqrrelations is a work session lasting from 12 till 23 January 2015, in which a wide range of forms of life coming from different backgrounds in arts, literature, science and computing will gather to reflect on the influence of models and digital traces over our daily reality and language. The work session is a space for theories and practices, for experiments, discussions, prototypes… Human and non-human (in-)expertise will facilitate the exploration of the topic. Parts of the process will be documented on this platform, possibly to end up in some forms of publication later on.
Organised by Constant with the support of deBuren. Public events are organised in collaboration with deBuren, CPDP and Recyclart.
www.cqrrelations.constantvzw.org/

pictures of the week: http://gallery3.constantvzw.org/index.php/Cqrrelations