training-common-sense day 2


[[quotes]] : "on the circularity of classification"


ways to go

- going through Pattern, following the KDD steps
- attempt to re-do the facebook-messages data-mining
- look at types of visualisation
- two ways of speaking about a KDD process

* [excuse me video] --> contradiction of desired + undesired result
* which is also present in all the KDD steps --> common sense
* interesting --> laughter --> when do we laugh?
* so many layers of signification were present to create such a word cloud ...
* difference between a truth and a cultural pattern that appears in the language in a certain situation
* we know there are patterns in language use, we notice them ... science tries to create models to make predictions ... but what would be the solution to these patterns ...
* if we would create such a word cloud for, for example, race classification ... it could produce a more rigorous result ... but yes, it's the data, right?
* will there always be a biased result of gender classification? even if you perform an adequate research process
* difference between:
* there are analyses of age, gender, health, personality, ... --> global topics that can be connected
* statistics is a tool that is the result of an abstraction
* philosopher, biologist and ethologist (comparative study of animal behaviour) Konrad Lorenz (died in 1989): "you have to swim in observations, before you can start a statistical experiment"
* history of parole (the choice whether a prisoner could go on parole); the decision-making process for this is highly influenced by data-mining prediction tools
* there are recommendation algorithms for dating/products/etc.; one loses one's ability to make a decision
* how did the facebook users react to the results of gender-mining? Look at qualitative elements of the research
* what is the aim, interest of facebook to focus on gender-mining?


going through Pattern, following the KDD steps:


- download Pattern, options: 

- examples: 


----------------------------------------------

amazing polarity average
amazing appears two times in the en-sentiment.xml file:

In the example that Pattern provides (pattern-2.6/examples/03-en/07-sentiment.py), the word amazing gives a polarity of 0.66666:



which is described as the mathematical average of the polarity value 0.8 of the first sense of amazing and the polarity value 0.4 of the second sense. (Note that a plain mean of 0.8 and 0.4 is 0.6, so the 0.66666 presumably involves a weighting or additional entries.)

meaning is mathematically averaged...... (???????)
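A minimal sketch of such sense averaging (the plain-mean mechanism is an assumption here, not Pattern's verified internals):

```python
# Sketch: averaging the sense-level polarity scores of one word.
# The values 0.8 and 0.4 are the two senses of "amazing" quoted above;
# the plain-mean scheme is an assumption, not Pattern's actual code.

def average_polarity(sense_polarities):
    """Arithmetic mean of the polarity values of a word's senses."""
    return sum(sense_polarities) / len(sense_polarities)

amazing_senses = [0.8, 0.4]
print(round(average_polarity(amazing_senses), 4))  # -> 0.6
```

If the word-level sentiment is read as such a mean, meaning is quite literally mathematically averaged.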

Looking through sentiment examples:

# It contains adjectives that occur frequently in customer reviews
i.e. sentiments related to consuming, but consuming what?

en-sentiment.xml

"The reliability specifies if an adjective was hand-tagged (1.0) or inferred (0.7)."
Inferred = decided by the algorithm? Sentiment according to the machine?


it appears only a reliability of 0.9 is to be found; not every term has a reliability score.
Some adjectives have no wordnet id. Is the file a mash-up?
Sometimes the term "confidence" is used (reliability and confidence are never found on the same word, and confidence is either rated 0.8 or 0.9). Could these be the same measure?
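The reliability/confidence observation can be checked mechanically; a sketch with ElementTree, using two illustrative entries modeled on the file (the "nice" entry is invented for the example):

```python
import xml.etree.ElementTree as ET

# Two illustrative <word> entries modeled on en-sentiment.xml;
# the "nice" entry is hypothetical.
excerpt = """
<sentiment>
  <word form="nice" pos="JJ" polarity="0.6" subjectivity="1.0" reliability="0.9"/>
  <word form="affluent" pos="JJ" polarity="0.6" subjectivity="1.0" confidence="0.8"/>
</sentiment>
"""

root = ET.fromstring(excerpt)
for word in root.iter("word"):
    has_rel = "reliability" in word.attrib
    has_conf = "confidence" in word.attrib
    print(word.attrib["form"], "reliability" if has_rel else "confidence")
    # per the observation above: never both on the same word
    assert not (has_rel and has_conf)
```

Run with `ET.parse(path).getroot()` on the real en-sentiment.xml, the same loop would confirm (or refute) the observation for the whole file.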

Comparing en-sentiment.xml to SentiWordNet 3.0

<word form="affluent" cornetto_synset_id="n_a-526762" wordnet_id="a-02022167" pos="JJ" sense="having an abundant supply of money or possessions of value" polarity="0.6" subjectivity="1.0" intensity="1.0" confidence="0.8"/>

02022167        0        0.25        wealthy#1 moneyed#2 loaded#4 flush#2 affluent#1        having an abundant supply of money or possessions of value; "an affluent banker"; "a speculator flush with cash"; "not merely rich but loaded"; "moneyed aristocrats"; "wealthy corporations"

<word form="afloat" cornetto_synset_id="n_a-533320" wordnet_id="a-00076921" pos="JJ" sense="borne on the water" polarity="0.0" subjectivity="0.1" intensity="1.0" confidence="0.8"/>

00076921        0        0        afloat#2        borne on the water; floating

From the SentiWordNet annotation:

"objectivity = 1 - (PosScore + NegScore)"

affluent: objectivity = 1 - (0 + 0.25) = 0.75
afloat: objectivity = 1 - (0 + 0) = 1

http://sentiwordnet.isti.cnr.it/
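The two calculations above, as code (scores taken from the quoted SentiWordNet 3.0 rows):

```python
# SentiWordNet's own formula: objectivity = 1 - (PosScore + NegScore)
def objectivity(pos_score, neg_score):
    return 1.0 - (pos_score + neg_score)

print(objectivity(0.0, 0.25))  # affluent -> 0.75
print(objectivity(0.0, 0.0))   # afloat   -> 1.0
```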

----------------------------------------------

Presentation Hans Lammerant


[[training-common-sense]] -> context
rewritten version by FJ --> ../training-common-sense/Hans_presentation1.html

----------------------------------------------------------------
----------------------------------------------------------------

construction of a certain visibility
text gets simplified, a construction
getting data out of text, to make it 'treatable' with math

step 3A: turning text into numbers

example --> source: French and English versions of Shakespeare. Bag of words: each word becomes its own dimension in a mathematical space.

bag-of-words:

Each word is an axis in a multidimensional space

Now: bag-of-letters, 'only' 26 axes; i.e. 'only' 26 dimensions; the vector has 26 coordinates
It is hard to imagine ... if it were only a, b, c, you would have a 3D space

22 points (i.e. texts) with 26 dimensions: how many e's, how many b's
each text is a single point in the 26 dimensions

every text is a point in this multi-dimensional space, having 26 coordinates for the 26 dimensions
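The bag-of-letters mapping can be sketched in a few lines (an illustration, not the code used in the session):

```python
import string

def letter_vector(text):
    """Map a text to one point in 26-dimensional space:
    one coordinate per letter a-z, counting occurrences."""
    text = text.lower()
    return [text.count(letter) for letter in string.ascii_lowercase]

point = letter_vector("To be, or not to be")
print(len(point))   # 26 coordinates: one per dimension
print(point[string.ascii_lowercase.index("o")])  # -> 4 o's
```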

"why does it have to be dimensions (why do we use the term dimensions)?" --> why this metaphorical form?

if you reduce dimensions, it gets simpler, you lose info

n-dimensions -- a math idea, not related to our 3D space
one of the basic tools in mathematics is thinking in dimensions
we do not talk about physical dimensions, we talk about mathematical dimensions

datamining: 'trying to get some meaning out of this'

now: translating the texts into a mathematical space. For this you need to simplify: forget about word order (if you would keep it, the number of dimensions would explode)
the points are a very simplified model of the text, which got rid of the meaning. It is common practice.

an ordered or a shuffled text of Hamlet makes no difference to the machine
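That indifference to order is easy to demonstrate (a minimal illustration):

```python
from collections import Counter

# A bag-of-words model discards word order: an ordered text and a
# shuffled one map to exactly the same bag, i.e. the same point.
ordered = "to be or not to be".split()
shuffled = "be not to be or to".split()

print(Counter(ordered) == Counter(shuffled))  # -> True
```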

your label/annotation is another column/dimension/coordinate in your dataset, which changes the location of a point within the 26 (now 27) dimensions


> what does the algorithm see?

Multidimensional scaling
 reveals clusters, in this case: language difference.
(example: from a lot of colours in a picture to grayscale, and so on down to two dimensions)
you rotate the axes in different ways until you see the biggest difference.
Metric MDS = metric multidimensional scaling

--> the act of rotating the axes, in order to find an axis on which you can make a better differentiation between points
MDS = a program, a mathematical tool, which helps you find the highest contrast
this step helps you find a view, and a way to reduce dimensions; the rest you can throw out
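A minimal classical (metric) MDS can be sketched with numpy; this is the textbook double-centering + eigendecomposition version, assumed here for illustration, not the actual tool used in the session:

```python
import numpy as np

def classical_mds(points, k=2):
    """Classical (metric) MDS: embed n points into k dimensions while
    preserving pairwise Euclidean distances as well as possible."""
    X = np.asarray(points, dtype=float)
    n = X.shape[0]
    # squared pairwise distances
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # double-centering: B = -1/2 * J @ D2 @ J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # the top-k eigenpairs of B give the k-dimensional embedding
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# toy stand-in for 22 texts as points in 26-dimensional letter space
rng = np.random.default_rng(0)
texts = rng.random((22, 26))
embedding = classical_mds(texts, k=2)
print(embedding.shape)  # -> (22, 2)
```

Plotting `embedding` is what reveals the clusters; the two new axes themselves carry no meaning.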

two dimensions --> a plane
three dimensions --> projecting points onto a plane
this is also possible with 26 dimensions
and you look for a plane which covers your points ...

in order to place one text as a point in the 2-dimensional graph, in this example the coordinates of all 26 letters are summed up and divided by 26 --> an average number
these numbers don't have any meaning anymore; they are a way to differentiate between texts

making a bag-of-letters model is a very simplified model of the texts, at such a level that you can read another type of information

this step reduces information, not by reducing dimensions


(if you're looking for a model ... you start by making a dataset, with a certain number of dimensions ... but in order to get a working model ... you need to be able to recognize your expectations ...)

modeling the line
from now on, you can throw all the data-points away, because you have a model 
this is the moment of truth construction

"and you hope that it has something to do with the reality you want to apply it to"

"the big leap is to check if your model is able to predict something later"


knowledge discovery implies it can find clusters by itself
what if you find a contrast in a vector space, but you don't know what it means?
> how do you know what it means?
< the only way is to check it against other data
> so you check each regularity, even if you don't know what kind of correspondence with reality it has
< a kind of myth that is present in data-mining is that an algorithm can discover this correspondence

the question is whether the X's & O's come before the line or after the line. when is the data labelled? is that a process?
(is the data supervised?)

a kind of flip-flop process where you look for a differentiation; then there is a moment where you can create your model, and then your differentiation is *fixed*, and from then on the model is applied to other data

a hypothesis is always present in creating a model 

in traditional statistics, there is always a hypothesis to get what you want
in data-mining, there still seems to be a hypothesis present
the point is: when do you formulate your hypothesis?


validation phase 
--> a validation method is comparing your results with another text

overfitting --> when very specific points (noise) get included in your model ...

there are standard validation procedures for reaching the moment of 'it works'
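One standard procedure, hold-out validation, sketched with numpy on invented data; it also makes overfitting visible:

```python
import numpy as np

# Hold out part of the data; compare a simple model against one
# flexible enough to chase very specific (noisy) training points.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + rng.normal(0, 0.3, size=20)

x_train, y_train = x[::2], y[::2]   # even indices: training
x_test, y_test = x[1::2], y[1::2]   # odd indices: held out

def mse(model, xs, ys):
    return float(np.mean((model(xs) - ys) ** 2))

line = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
overfit = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

print("line:    train", mse(line, x_train, y_train), "test", mse(line, x_test, y_test))
print("overfit: train", mse(overfit, x_train, y_train), "test", mse(overfit, x_test, y_test))
# the degree-9 model fits its 10 training points (almost) exactly,
# but tends to do worse than the line on the held-out points
```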

Q: What is the name of the dimension to rotate around? 
A: The rotating has nothing to do with meaning ... it is just what makes it possible to represent it in 2D at some point.
A: The average is one point in the process. Relative distribution of letters ... we normalize. Afterwards: an algorithm helps you find the plane with the most extreme difference.
Q: Can you see the process without, i.e. before, the normalization?
A: If I would not normalize, ..

Q: is there a way of looking at the process of the algorithm? to look at the moment between

outliers
> your algorithm gets better if you take your outliers out
< is the outlier a spelling mistake? or really an outlier?
> is removing outliers a standard practice?
< well, checking them is. and checking whether it is a mistake or not

> does the number of dimensions influence the efficiency of the model? is there an interest in the number of dimensions?
< between 1,000 and 1,000,000 there is a difference ... and there is also a difference in the type of dimensions ...
> i'm trying to understand the economy of the dimensions ...
>> but the economy is in the results of the model?