training common sense

[[training-common-sense-day-2]] (friday)
[[training-common-sense-day-3]] (saturday)
[[training-common-sense-day-4]] (monday)
[[training-common-sense-day-5]] (tuesday)

---------------------------------------------------------

links

earlier notes:

http://pad.constantvzw.org/p/commonsense

the Annotater @ Cqrrelations, Jan. 2015

---------------------------------------------------------

git & the critical-fork-folder
to clone the Pattern 2.6 critical-fork folder

git clone USERNAME@10.9.8.7:/home/bare_git_repositories/pattern-2.6-critical-fork/

(there is a hook activated on this bare-repository,
that will automatically sync the git branch at
../training-common-sense/sources/software/pattern-2.6-critical-fork/
with the bare-repository)

---------------------------------------------------------

* shift -->

before, traditionally : thinking in categories, searching for the truth through questionnaires with predefined categories
now, big data : thinking in raw data, natural 'truths', the truth is out there (in the raw data) just undiscovered

* but... categorisation is still present in data-mining, only more hidden, and therefore harder to grasp
* there are still categories, but they are often disguised as math

* co-intelligence between humans and machines
* there is a lot of un-rawness to this data

* repeat the model until the model is true

- example: the history of western classical music doesn't involve women, a comment was: but i looked, there are no
- example: alternative scientific truth searching methods --> work on (non) common sense valid truths

* [how]? [you]? [write]?

../images/gender-female-most-common-words_natural-result.png
from: the article Toward personality insights from language exploration in social media

by paying out

../images/article7.png
machine categorisation of positive / negative words
http://www.wwbp.org/data.html#refinement , from the article: Personality, gender, and age in the language of social media: The Open-Vocabulary Approach

looking at applications

Facebook messages Gender/Age profiles

- Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach (2013) --> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783449/
- Word Well-Being Project --> http://www.wwbp.org/data.html#refinement
- The Big Five Personality test --> http://www.outofservice.com/bigfive/
- - I'm a O80-C74-E18-A32-N14 Big Five!! --> http://www.outofservice.com/bigfive/results/?oR=0.85&cR=0.722&eR=0.375&aR=0.583&nR=0.312
- - http://www.thebigfiveproject.com/results/?ocean=775,555,718,694,25,7c&dem=1970,f,dd

Hedonometer --> http://hedonometer.org/

going to Knowledge Discovery in Data steps, by looking at Pattern

step 1 --> data collection
step 2 --> data preparation
step 3 --> data mining
step 4 --> interpretation
step 5 --> determine actions
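
a minimal sketch of the five steps above as a chain of functions, to have something concrete to annotate. this is not Pattern's API: it uses only the Python standard library, and the posts, the labels "a" / "b" and all function names are invented for illustration. note how each step already takes decisions (what counts as a word, what counts as "most typical") before any 'truth' comes out.

```python
import re
from collections import Counter

# Hypothetical toy data, invented for this sketch: (text, label) pairs.
RAW_POSTS = [
    ("love, love, LOVE my new shoes!!", "a"),
    ("work work work", "b"),
    ("this game is ok", "b"),
]

def collect():
    """Step 1, data collection: here simply a hard-coded list."""
    return list(RAW_POSTS)

def prepare(posts):
    """Step 2, data preparation: lowercase, keep alphabetic tokens only."""
    return [(re.findall(r"[a-z]+", text.lower()), label) for text, label in posts]

def mine(prepared):
    """Step 3, data mining: count word frequencies per label."""
    counts = {}
    for tokens, label in prepared:
        counts.setdefault(label, Counter()).update(tokens)
    return counts

def interpret(counts):
    """Step 4, interpretation: call the most frequent word 'typical' for a label."""
    return {label: c.most_common(1)[0][0] for label, c in counts.items()}

def decide(interpretation):
    """Step 5, determine actions: here just a readable report per label."""
    return ["%s --> %s" % (label, word) for label, word in sorted(interpretation.items())]

if __name__ == "__main__":
    for line in decide(interpret(mine(prepare(collect())))):
        print(line)
```

the 'result' (label a is about "love", label b is about "work") follows directly from three invented posts and a counting rule, which is a small version of the crudeness discussed in these notes.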

Pattern
created at CLiPS, a research project at the University of Antwerp
http://www.clips.ua.ac.be/pages/pattern

annotating the presentation/

dealing with large amounts of data and trying to make sense of that

big data: the idea of looking at knowledge through categories
but it's also raw data, which machines can 'read' through without thinking in relation to these categories (looking for truth)

the crudeness of the presented results: what does that mean? (gender differences in social media)
what does it mean?

we want to look at:

- data collection (its technological as well as social context)
- data preparation (look at the ~~rough~~ raw data and its next 'use')
- data mining (looking into mathematical models, graphs; the translation from linguistic models to mathematical clusters; looking at different projections, what are the differences? the 'suitability' of the models in relation to the outcomes)

what we want to do:
- look in actual text mining programs, look and discuss what happens
- look at documentation on the collection/collecting of this data (excuses, apologies about the political/social problems expressed in the 'truthfulness' of the mathematical results) - revealing subtleties
- the research on the data mining is very close to the text generation project

example: smileys --> not recognized as meaningful
meaningfulness of cleaning data (we know it is meaningful, like cleaning toilets, but we don't do it)
use facebook data to train next mining process - 'self-fulfilling prophecy' of common sense
data-mining --> "the" step, but actually just a small part of the process
mathematical model --> truth finding moment --> the machine performance
with enough time, with enough data...
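
a small sketch of how the smileys example can happen in the cleaning step. the two cleaning functions and the emoticon pattern are assumptions made for illustration, not taken from Pattern or from any of the articles: the first keeps "words only", a common default, and the second also keeps a handful of emoticons.

```python
import re

# Assumption: a deliberately small emoticon pattern, e.g. :) ;-) :( =D :P
EMOTICON = r"[:;=]-?[()DPp]"

def clean_words_only(text):
    """Typical 'cleaning': lowercase and keep letters/apostrophes only."""
    return re.findall(r"[a-z']+", text.lower())

def clean_keep_emoticons(text):
    """Same cleaning, but emoticons survive as tokens of their own."""
    tokens = re.findall(EMOTICON + r"|[A-Za-z']+", text)
    # Lowercase the words, leave emoticons untouched (":D" is not ":d").
    return [t if t[0] in ":;=" else t.lower() for t in tokens]

if __name__ == "__main__":
    for post in ("Great job :)", "Great job :("):
        print(clean_words_only(post), clean_keep_emoticons(post))
```

with words-only cleaning, "Great job :)" and "Great job :(" become the same list of tokens, so whatever model comes next cannot tell them apart.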

compare first results with standard, most of the time human-produced data
e.g. libraries of positivity/negativity online
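
a sketch of what such a comparison could look like: a tiny positive/negative word list scores each sentence, and the outcome is compared with labels given by a human. the word list, the sentences and the human labels are all invented for this example. real lexicons are far larger, but the mechanism of summing word scores is a common baseline.

```python
# Hypothetical mini-lexicon: word --> polarity (+1 positive, -1 negative).
LEXICON = {"good": 1, "happy": 1, "love": 1, "bad": -1, "sad": -1, "hate": -1}

def score(text):
    """Sum the polarity of every known word; unknown words count as 0."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())

def label(text):
    """Turn the summed score into a category."""
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

# Invented 'standard': sentences with a label a human reader might give.
HUMAN_LABELLED = [
    ("i love this", "positive"),
    ("i hate mondays", "negative"),
    ("not bad at all", "positive"),      # the human reads the negation
    ("the train is late", "negative"),   # no lexicon word in this sentence
]

def agreement(samples):
    """Fraction of samples where the machine label equals the human label."""
    hits = sum(1 for text, human in samples if label(text) == human)
    return hits / float(len(samples))

if __name__ == "__main__":
    for text, human in HUMAN_LABELLED:
        print("%-20s machine=%-8s human=%s" % (text, label(text), human))
    print("agreement:", agreement(HUMAN_LABELLED))
```

on these four sentences the machine agrees with the human on two. the two misses (a negation, and a sentence without any listed word) show where the categories hidden in the word list do the deciding.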

The best material model of a cat is another, or preferably the same, cat.
Philosophy of Science 1945. (Wiener)

Project IBM: you are what you write on Twitter: http://venturebeat.com/2013/10/08/ibm-researcher-can-decipher-your-personality-in-200-tweets/

Suggested references
An article that poses questions and reflections on the methodology applied in research. They obtain "valid scientific" results that have no common sense at all. The abstract is quite inspiring
http://www0.cs.ucl.ac.uk/staff/m.slater/Papers/colourful.pdf

ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually leave unstated: http://conceptnet5.media.mit.edu/
Unfortunately, common sense is not yet within reach for this 4-year old: http://www.extremetech.com/extreme/161383-artificial-intelligence-has-the-verbal-skills-of-a-four-year-old-still-no-common-sense