Random notes from Sunday

Seda preparing a session on Machine Learning: http://pad.constantvzw.org/public_pad/touchingCorrelations

Weka: http://www.cs.waikato.ac.nz/~ml/index.html (FLOSS Java package for machine learning)

"2 features are redundant if they are highly correlated with each other"
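
A rough sketch of what "highly correlated" could mean in practice: compute the Pearson correlation between two feature columns and call them redundant above some cut-off. The feature values and the 0.95 threshold are made up for illustration; they do not come from the session or from Weka.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

THRESHOLD = 0.95  # assumed cut-off, not a standard value

def redundant(a, b):
    """True if the two features are (positively or negatively) highly correlated."""
    return abs(pearson(a, b)) > THRESHOLD

# Hypothetical features: the same length in cm and in inches, plus an unrelated one.
length_cm = [10.0, 20.0, 30.0, 40.0]
length_inch = [x / 2.54 for x in length_cm]
unrelated = [3.0, 1.0, 4.0, 1.0]

print(redundant(length_cm, length_inch))
print(redundant(length_cm, unrelated))
```

The cm and inch columns carry the same information, so one of them could be dropped; where to put the threshold is a choice, not a given.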

"occam's razor: simpler models are usually better"

Hamid Ekbia, Bonnie Nardi, Heteromation and its (dis)contents: The invisible division of labor between humans and machines

Alison Adam, Artificial Knowing: Gender and the Thinking Machine (1998)

where to go today (& tomorrow & ... )

* What would be a sentence that would have a sentiment polarity of 0.0?
popular polarity analysis
* A proposed exercise:
we can change the en.sentiment.xml file in Pattern, with our own adjectives. --> which adjectives?
file used in other programs, elements of pattern?
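
A minimal sketch of the exercise, using only Python's standard library: append a `<word>` entry with our own adjective to a lexicon in the style of en.sentiment.xml. The root element name, the example adjective and its scores are assumptions for illustration; the attribute names are the ones seen in the file. Working on a copy of the real file would be safer than editing it in place.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for the lexicon; the root element name is an assumption.
LEXICON = """<sentiment>
<word form="grotesque" pos="JJ" polarity="-1.0" subjectivity="1.0" intensity="1.0" confidence="0.8" />
</sentiment>"""

def add_adjective(root, form, polarity, subjectivity, intensity=1.0, confidence=1.0):
    """Append a new <word> entry for an adjective (pos="JJ") to the lexicon root."""
    word = ET.SubElement(root, "word")
    word.set("form", form)
    word.set("pos", "JJ")
    word.set("polarity", str(polarity))
    word.set("subjectivity", str(subjectivity))
    word.set("intensity", str(intensity))
    word.set("confidence", str(confidence))
    return word

root = ET.fromstring(LEXICON)
add_adjective(root, "relearned", 0.5, 0.9)  # hypothetical adjective and scores

forms = [w.get("form") for w in root.iter("word")]
print(forms)
print(ET.tostring(root, encoding="unicode"))
```

The open question from the exercise stays: which adjectives, and who decides their scores?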

* A map of Pattern structure
listing the Pattern-2.6.zip structure --> [[training-common-sense-Pattern-2.6-structure]]
and adding notes
looking at genealogy; a mapping of where things come from or where else they appear

* Introduction: pattern-readme-plus.md
introduction to the forked Pattern-2.6 [[training-common-sense-pattern-readme-plus.md]]

* A note added to ... 
how to name our comments/notes

* An explanation of vector-projections --> examples:
    Principal Component Analysis (PCA) with Animation: https://www.youtube.com/watch?v=9DPiXrN2pEg
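
A small sketch of a vector projection in the PCA sense, for 2D points only and without external libraries: find the direction in which the points vary most, then describe each point by a single number, its position along that direction. The points are invented, and the closed-form angle below only works for two dimensions; real PCA implementations use an eigen- or singular value decomposition.

```python
import math

def first_principal_axis(points):
    """Return the mean and the unit vector of largest variance for 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Angle of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def project(points, mean, axis):
    """Project centered points onto the axis: one coordinate per point."""
    return [(x - mean[0]) * axis[0] + (y - mean[1]) * axis[1] for x, y in points]

# Hypothetical data lying exactly on the diagonal y = x.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
mean, axis = first_principal_axis(points)
coords = project(points, mean, axis)

print(axis)
print(coords)
```

Here two numbers per point are reduced to one without losing anything, because the points already lie on a line; with noisier data the projection throws information away.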

* Some quotes, references to critical resources

* a reflection on the data-mining culture --> where?

an option is ....

or ....

* how to create a 'critical-fork-method', a critical-issue-tracker

* create alternative type of tutorials of Pattern 
next to the comments and files of the critical fork

* write the license for the critical fork (as "relearn" ? something else ?)

* Use criticisms of semantics 
(with the focus on words as they relate to each other in a sentence) and pragmatics (with the focus on how the meaning of words is always produced in a certain context) to think about the particular failure of assigning numerical values to, e.g., adjectives

Notes from Pattern lexicon en.sentiment.xml

<word form="grotesque" cornetto_synset_id="n_a-503484" wordnet_id="a-00221627" pos="JJ" sense="distorted and unnatural in shape or size" polarity="-1.0" subjectivity="1.0" intensity="1.0" confidence="0.8" />
<word form="grotesque" cornetto_synset_id="n_a-535905" wordnet_id="a-00221627" pos="JJ" sense="distorted and unnatural in shape or size" polarity="-0.1" subjectivity="1.0" intensity="1.0" confidence="0.8" />

 for the same word - grotesque - there is the same definition (and the same wordnet_id), but differing polarity values: "-1.0" and "-0.1"

note on averaging a polarity-rate
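
A sketch of what averaging could look like for the two "grotesque" entries above: collect the polarity values per word form and take the mean. This is an illustration with the standard library, assuming a plain arithmetic mean per form; these notes do not confirm that Pattern's own loader does exactly this. The wrapping root element is also an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two entries copied from en.sentiment.xml, wrapped in an assumed root element.
ENTRIES = """<sentiment>
<word form="grotesque" cornetto_synset_id="n_a-503484" wordnet_id="a-00221627" pos="JJ" sense="distorted and unnatural in shape or size" polarity="-1.0" subjectivity="1.0" intensity="1.0" confidence="0.8" />
<word form="grotesque" cornetto_synset_id="n_a-535905" wordnet_id="a-00221627" pos="JJ" sense="distorted and unnatural in shape or size" polarity="-0.1" subjectivity="1.0" intensity="1.0" confidence="0.8" />
</sentiment>"""

def average_polarity(xml_text):
    """Map each word form to the mean polarity over all its entries."""
    scores = defaultdict(list)
    for word in ET.fromstring(xml_text).iter("word"):
        scores[word.get("form")].append(float(word.get("polarity")))
    return {form: sum(values) / len(values) for form, values in scores.items()}

averages = average_polarity(ENTRIES)
print(averages["grotesque"])
```

The mean of -1.0 and -0.1 is -0.55, a value that appears in neither entry; the difference between the two senses disappears in the averaged score.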

comment written in the example file (pattern-2.6/examples/03-en), that directly uses the en.sentiment.xml file: 

notes on the construction of sentiment() & en.sentiment.xml

- an issue on the Github page of Pattern asks about the method behind the xx.sentiment.xml file. (see --> https://github.com/clips/pattern/issues/85 )
- Tom De Smedt explains how the sentiment lexicon is constructed --> http://www.jmlr.org/papers/volume13/desmedt12a/desmedt12a.pdf :

- the en.sentiment.xml file is then extended by using the adjectives used in the Pang & Lee dataset v2. This dataset is based on 1000 positive & 1000 negative movie reviews.

Pang & Lee, movie review dataset
developed at Cornell University, NLP department --> https://confluence.cornell.edu/display/NLP/Home
* profiles:

* links: