training-common-sense day 5
[[training-common-sense-pattern-readme-plus.md]]
reference
[[training-common-sense-day-5]]
ConceptNet --> http://conceptnet5.media.mit.edu/
- - still active
- - project has been in development since 2003
- A good introductory document on the project (with context) :
- BT Technology Journal • Vol 22 No 4 • October 2004, p. 211
- ConceptNet — a practical commonsense reasoning tool-kit
- http://web.media.mit.edu/~push/ConceptNet-BTTJ.pdf
- "ConceptNet is a freely available commonsense knowledge base and natural-language-processing tool-kit which supports many practical
- textual-reasoning tasks over real-world documents including topic-gisting, analogy-making, and other context oriented inferences. The
- knowledge base is a semantic network presently consisting of over 1.6 million assertions of commonsense knowledge encompassing
- the spatial, physical, social, temporal, and psychological aspects of everyday life. ConceptNet is generated automatically from the 700 000
- sentences of the Open Mind Common Sense Project — a World Wide Web based collaboration with over 14 000 authors"
- page 2 :
- "The size and scope of ConceptNet make it comparable to, what are in our opinion, the two other most notable large-scale semantic knowledge bases in the literature: Cyc and WordNet. However, there are key differences, and these will be spelled out in the following section. While WordNet is optimised for lexical categorisation and word-similarity determination, and Cyc is optimised for formalised logical reasoning, ConceptNet is optimised for making practical context-based inferences over real-world texts."
- ...
- "ConceptNet is also unique from Cyc and WordNet for its
- dedication to contextual reasoning."
Certifying and removing disparate impact?
What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender) and an explicit description of the process. When computers are involved, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the process, we propose making inferences based on the data it uses. We present four contributions. First, we link disparate impact to a measure of classification accuracy that while known, has received relatively little attention. Second, we propose a test for disparate impact based on how well the protected class can be predicted from the other attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
http://arxiv.org/pdf/1412.3756v3.pdf
http://spectrum.ieee.org/tech-talk/computing/software/computer-scientists-find-bias-in-algorithms
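To make the abstract a bit more concrete, here is a rough sketch (ours, not the authors' code) of two quantities it alludes to. The first is the ratio of selection rates between the protected group and everyone else; U.S. guidelines commonly read a ratio below 0.8 (the "four-fifths rule") as a sign of disparate impact, a threshold the abstract itself does not state. The second is the balanced error rate of a classifier that tries to guess the protected class from the other attributes, which is our reading of the "measure of classification accuracy" mentioned above. The function names and the 0.8 constant are our assumptions.

```python
from collections import Counter

# Assumption: the "four-fifths rule" threshold; not stated in the abstract.
FOUR_FIFTHS = 0.8


def disparate_impact_ratio(records, protected_value):
    """Ratio of selection rates: protected group vs. everyone else.

    `records` is an iterable of (group, selected) pairs, where
    `selected` is True when the person got the positive outcome.
    """
    total = Counter()
    selected = Counter()
    for group, was_selected in records:
        key = "protected" if group == protected_value else "other"
        total[key] += 1
        if was_selected:
            selected[key] += 1
    if total["protected"] == 0 or total["other"] == 0:
        raise ValueError("both groups must be present in the data")
    rate_other = selected["other"] / total["other"]
    if rate_other == 0:
        raise ValueError("nobody outside the protected group was selected")
    rate_protected = selected["protected"] / total["protected"]
    return rate_protected / rate_other


def balanced_error_rate(true_groups, predicted_groups, protected_value):
    """Mean of the error rates on the protected and the other group.

    A low value means the protected class is easy to predict from the
    remaining attributes, i.e. the data still leaks group membership.
    """
    missed = false_alarms = n_protected = n_other = 0
    for truth, guess in zip(true_groups, predicted_groups):
        if truth == protected_value:
            n_protected += 1
            if guess != protected_value:
                missed += 1
        else:
            n_other += 1
            if guess == protected_value:
                false_alarms += 1
    if n_protected == 0 or n_other == 0:
        raise ValueError("both groups must be present in the data")
    return 0.5 * (missed / n_protected + false_alarms / n_other)


# Hypothetical toy data: 3 of 10 selected in group "A", 6 of 10 in group "B".
toy_records = ([("A", True)] * 3 + [("A", False)] * 7
               + [("B", True)] * 6 + [("B", False)] * 4)
ratio = disparate_impact_ratio(toy_records, protected_value="A")
print(ratio, "-> below four-fifths threshold:", ratio < FOUR_FIFTHS)
```

The toy records are invented for illustration only; with real data the group labels and outcomes would come from the selection process under scrutiny.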
Training camp: "How to scream back at algorithms" (will it need to be a course? therapy?)
web interface for WordNet
http://www.visualthesaurus.com/
http://10.9.8.58:8000/cgi-bin/wordnet.cgi
http://10.9.8.58:8000/cgi-bin/wordnet-hypernyms.cgi
you can also install WordNet via apt
- sudo apt-get install wordnet
- wn YOURWORD -hypen -treen
- --> returns the hierarchy of the word towards the root of wordnet: 'entity'
- wn YOURWORD -g
- --> returns a gloss
Thinking about the collaborative agreement of meaning: Wiktionary could be used as a database for linking words & meaning. But! Did Wiktionary initially draw on WordNet as a starting point for its dictionary references?
Taking our old friend amazing:
From the WordNet adjective file in Pattern (data.adj):
01282510 00 s 05 amazing 0 awe-inspiring 0 awesome 0 awful 0 awing 0 001 & 01282014 a 0000 | inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden soul beneath"- Melville; "Westminster Hall's awing majesty, so vast, so high, so silent"
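The line above can be pulled apart field by field. A minimal sketch, assuming the standard WordNet database layout (synset offset, lexicographer file number, synset type, a hexadecimal word count, word/lex_id pairs, a decimal pointer count, four fields per pointer, then the gloss after the "|"). The function name and the dictionary keys are ours, and the sample gloss is shortened to its first example sentence.

```python
# Shortened copy of the data.adj line for "amazing" (gloss cut after the
# first example sentence to keep the sample readable).
SAMPLE_LINE = (
    '01282510 00 s 05 amazing 0 awe-inspiring 0 awesome 0 awful 0 awing 0 '
    '001 & 01282014 a 0000 | inspiring awe or admiration or wonder; '
    '"New York is an amazing city"'
)


def parse_wordnet_data_line(line):
    """Split one WordNet data.* line into its fields (hypothetical helper)."""
    fields, _, gloss = line.partition("|")
    tokens = fields.split()

    synset_offset, lex_filenum, ss_type = tokens[0:3]
    word_count = int(tokens[3], 16)  # the word count is written in hex

    position = 4
    words = []
    for _ in range(word_count):
        words.append(tokens[position])  # the word itself
        position += 2                   # skip the lex_id that follows it

    pointer_count = int(tokens[position])  # the pointer count is decimal
    position += 1
    pointers = []
    for _ in range(pointer_count):
        # pointer symbol, target synset offset, target part of speech,
        # source/target word numbers
        pointers.append(tuple(tokens[position:position + 4]))
        position += 4

    return {
        "synset_offset": synset_offset,
        "lex_filenum": lex_filenum,
        "ss_type": ss_type,  # "s" marks an adjective satellite
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


entry = parse_wordnet_data_line(SAMPLE_LINE)
print(entry["words"])
print(entry["pointers"])
print(entry["gloss"])
```

Read this way, "amazing" shares one synset with awe-inspiring, awesome, awful and awing, and the single "&" pointer links that synset to the adjective synset at offset 01282014.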
- Match with internet search:
comment style:
#!PATTERN+ : ..........................
[[training-common-sense-comment-corpora]]
(data-mining-culture) World Well Being Project
article titles
- Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
- Using Twitter to measure public discussion of diseases: A case study
- Extracting human temporal orientation from Facebook language
- An analysis of the user occupational class through Twitter content
- Data-driven content analysis of social media: A systematic overview of automated methods
- The role of personality, age and gender in tweeting about mental illnesses
- Mental illness detection at the World Well-Being Project for the CLPsych 2015 Shared Task
- Psychological language on Twitter predicts county-level heart disease mortality
- Automatic personality assessment through social media language
- Developing age and gender predictive lexica over social media
- Towards assessing changes in degree of depression through Facebook
- The online social self: An open vocabulary approach to personality
- From "sooo excited!!!" to "so proud": Using language to study development
- Personality, gender, and age in the language of social media: The Open-Vocabulary Approach
- Characterizing geographic variation in well-being using tweets.
- Toward personality insights from language exploration in social media
- Choosing the right words: Characterizing and reducing error of the Word Count Approach.
---------------------------------------------------------------------------------------------------------------------------------
17 articles
13 articles are based on language used on social media, such as Facebook and Twitter