training-common-sense day 5

[[training-common-sense-pattern-readme-plus.md]]

reference
[[training-common-sense-day-5]]
ConceptNet --> http://conceptnet5.media.mit.edu/

- still active
- the project has been in development since 2003
A good introductory document on the project (with context):
BT Technology Journal • Vol 22 No 4 • October 2004 • p. 211
ConceptNet — a practical commonsense reasoning tool-kit
http://web.media.mit.edu/~push/ConceptNet-BTTJ.pdf
"ConceptNet is a freely available commonsense knowledge base and natural-language-processing tool-kit which supports many practical
textual-reasoning tasks over real-world documents including topic-gisting, analogy-making, and other context oriented inferences. The
knowledge base is a semantic network presently consisting of over 1.6 million assertions of commonsense knowledge encompassing
the spatial, physical, social, temporal, and psychological aspects of everyday life. ConceptNet is generated automatically from the 700 000
sentences of the Open Mind Common Sense Project — a World Wide Web based collaboration with over 14 000 authors"
page 2:
"The size and scope of ConceptNet make it comparable to,
what are in our opinion, the two other most notable large-scale
semantic knowledge bases in the literature: Cyc and
WordNet. However, there are key differences, and these will
be spelled out in the following section. While WordNet is
optimised for lexical categorisation and word-similarity
determination, and Cyc is optimised for formalised logical
reasoning, ConceptNet is optimised for making practical
context-based inferences over real-world texts."
...
"ConceptNet is also unique from Cyc and WordNet for its
dedication to contextual reasoning."

Certifying and removing disparate impact

What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender) and an explicit description of the process. When computers are involved, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the process, we propose making inferences based on the data it uses. We present four contributions. First, we link disparate impact to a measure of classification accuracy that while known, has received relatively little attention. Second, we propose a test for disparate impact based on how well the protected class can be predicted from the other attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.

http://arxiv.org/pdf/1412.3756v3.pdf
http://spectrum.ieee.org/tech-talk/computing/software/computer-scientists-find-bias-in-algorithms
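
To make "widely different outcomes for different groups" concrete, here is a minimal sketch that compares selection rates between two groups. It is not the test proposed in the paper (which looks at how well the protected class can be predicted from the other attributes). The 0.8 threshold is the commonly cited "four-fifths" guideline and is an assumption here, since the abstract above does not state a number. The group names and records are made-up toy data.

```python
from collections import defaultdict


def selection_rates(records):
    """Return {group: share of positive outcomes} for (group, selected) pairs."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        if selected:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(records, protected, reference):
    """Selection rate of the protected group divided by that of the reference group."""
    rates = selection_rates(records)
    return rates[protected] / rates[reference]


# Toy data (invented): group "a" is selected 6 times out of 10, group "b" 3 out of 10.
records = (
    [("a", True)] * 6 + [("a", False)] * 4
    + [("b", True)] * 3 + [("b", False)] * 7
)

ratio = disparate_impact_ratio(records, protected="b", reference="a")
THRESHOLD = 0.8  # assumption: the "four-fifths" guideline, not taken from the abstract

print("selection-rate ratio:", round(ratio, 2))
print("possible disparate impact" if ratio < THRESHOLD else "no flag at this threshold")
```

A ratio well below 1 only says that outcomes differ between the groups; it says nothing about why they differ.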

Training camp: "How to scream back at algorithms" (will it need to be a course? therapy?)

web interface for WordNet

http://www.visualthesaurus.com/
http://10.9.8.58:8000/cgi-bin/wordnet.cgi
http://10.9.8.58:8000/cgi-bin/wordnet-hypernyms.cgi

you can also install WordNet via apt

sudo apt-get install wordnet

wn YOURWORD -hypen -treen
--> returns the hierarchy of the word towards the root of wordnet: 'entity'
wn YOURWORD -g
--> returns a gloss
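
The same `wn` calls can be scripted, for example to collect hierarchies for a whole word list. This is a minimal sketch that assumes the `wn` binary from the apt package is on the PATH. The helper names `wn_args` and `run_wn` and the example word "tree" are made up for illustration. The sketch only captures the printed text and does not interpret the exit status.

```python
import shutil
import subprocess


def wn_args(word, *options):
    """Build the argument list for the `wn` command-line tool."""
    return ["wn", word, *options]


def run_wn(word, *options):
    """Run `wn` and return its text output, or None if `wn` is not installed."""
    if shutil.which("wn") is None:
        return None
    # The exit status is deliberately not checked; only the printed text is used.
    result = subprocess.run(wn_args(word, *options), capture_output=True, text=True)
    return result.stdout


# Same options as in the notes above: hierarchy towards the root, then a gloss.
for options in (("-hypen", "-treen"), ("-g",)):
    output = run_wn("tree", *options)
    if output is None:
        print("wn not found, try: sudo apt-get install wordnet")
    else:
        print(output)
```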

Thinking about the collaborative agreement of meaning - Wiktionary could be used as a database for linking words & meaning. But! Does Wiktionary initially draw on WordNet as a starting point for the dictionary references?
Taking our old friend "amazing":
From the WordNet adj file in pattern (data.adj):
01282510 00 s 05 amazing 0 awe-inspiring 0 awesome 0 awful 0 awing 0 001 & 01282014 a 0000 | inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden soul beneath"- Melville; "Westminster Hall's awing majesty, so vast, so high, so silent"
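
A line like this can be taken apart with a few lines of Python. This is a minimal sketch based on the layout of WordNet data files (synset offset, lexicographer file number, synset type, hexadecimal word count, word / lex_id pairs, decimal pointer count, pointers of four fields each, then the gloss after "|"). The function name `parse_data_line` is made up, the sample line is the one above with the gloss shortened, and extras such as verb frames or adjective markers like "(p)" are not handled.

```python
def parse_data_line(line):
    """Split one WordNet data.* line into its main fields."""
    fields, _, gloss = line.partition("|")
    tokens = fields.split()

    offset, lex_filenum, ss_type = tokens[0], tokens[1], tokens[2]
    w_cnt = int(tokens[3], 16)  # word count is written in hexadecimal

    # Words come as (word, lex_id) pairs; underscores stand for spaces.
    words = [tokens[4 + 2 * i].replace("_", " ") for i in range(w_cnt)]

    pos = 4 + 2 * w_cnt
    p_cnt = int(tokens[pos])  # pointer count is written in decimal
    pointers = []
    for i in range(p_cnt):
        start = pos + 1 + 4 * i
        symbol, target_offset, target_pos, source_target = tokens[start:start + 4]
        pointers.append(
            {"symbol": symbol, "offset": target_offset, "pos": target_pos}
        )

    parts = [part.strip() for part in gloss.split(";")]
    return {
        "offset": offset,
        "ss_type": ss_type,
        "words": words,
        "pointers": pointers,
        "definition": parts[0],
        "examples": parts[1:],
    }


# The "amazing" line from data.adj, gloss shortened to the first example.
LINE = (
    '01282510 00 s 05 amazing 0 awe-inspiring 0 awesome 0 awful 0 awing 0 '
    '001 & 01282014 a 0000 | inspiring awe or admiration or wonder; '
    '"New York is an amazing city"'
)

parsed = parse_data_line(LINE)
print(parsed["words"])
print(parsed["definition"])
print(parsed["pointers"])
```

The synset type "s" marks an adjective satellite, and the "&" pointer links it to the similar head synset 01282014.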

Match with internet search:

comment style:

#!PATTERN+ : ..........................

[[training-common-sense-comment-corpora]]

(data-mining-culture) World Well Being Project

article titles

- Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
- Using Twitter to measure public discussion of diseases: A case study
- Extracting human temporal orientation from Facebook language
- An analysis of the user occupational class through Twitter content
- Data-driven content analysis of social media: A systematic overview of automated methods
- The role of personality, age and gender in tweeting about mental illnesses
- Mental illness detection at the World Well-Being Project for the CLPsych 2015 Shared Task
- Psychological language on Twitter predicts county-level heart disease mortality
- Automatic personality assessment through social media language
- Developing age and gender predictive lexica over social media
- Towards assessing changes in degree of depression through Facebook
- The online social self: An open vocabulary approach to personality
- From "sooo excited!!!" to "so proud": Using language to study development
- Personality, gender, and age in the language of social media: The Open-Vocabulary Approach
- Characterizing geographic variation in well-being using tweets.
- Toward personality insights from language exploration in social media
- Choosing the right words: Characterizing and reducing error of the Word Count Approach.
---------------------------------------------------------------------------------------------------------------------------------
17 articles

13 of these articles are based on language used on social media, such as Facebook and Twitter