training-common-sense day 5



[[training-common-sense-pattern-readme-plus.md]]


reference
[[training-common-sense-day-5]]
ConceptNet --> http://conceptnet5.media.mit.edu/
what are in our opinion, the two other most notable large-



Certifying and removing disparate impact?

What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender) and an explicit description of the process. When computers are involved, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the process, we propose making inferences based on the data it uses. We present four contributions. First, we link disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on how well the protected class can be predicted from the other attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.

http://arxiv.org/pdf/1412.3756v3.pdf
http://spectrum.ieee.org/tech-talk/computing/software/computer-scientists-find-bias-in-algorithms
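
To make the two ideas in the abstract a bit more tangible, here is a minimal sketch. The abstract names neither the accuracy measure nor a threshold, so this assumes the measure is the balanced error rate and uses the "four-fifths" (80%) rule of thumb as the cut-off for disparate impact. All function names and the toy data are hypothetical.

```python
# Toy sketch only: function names, threshold and data are assumptions,
# not taken from the paper's code.

def selection_rate(selected, in_group):
    """Fraction of the group's members who were selected."""
    members = [s for s, g in zip(selected, in_group) if g]
    return sum(members) / len(members)

def disparate_impact_ratio(selected, protected):
    """Selection rate of the protected group divided by that of everyone else."""
    others = [not p for p in protected]
    return selection_rate(selected, protected) / selection_rate(selected, others)

def balanced_error_rate(actual, predicted):
    """Average of the error rates computed separately on each class."""
    pos = [a != p for a, p in zip(actual, predicted) if a]
    neg = [a != p for a, p in zip(actual, predicted) if not a]
    return 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))

# 10 hypothetical applicants: 1 = protected class / selected, 0 = not.
protected = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
selected  = [1, 0, 0, 0, 1, 1, 1, 0, 0, 0]

ratio = disparate_impact_ratio(selected, protected)
print("disparate impact ratio:", ratio)
print("below the assumed 0.8 threshold:", ratio < 0.8)

# Hypothetical guesses of the protected attribute made from the *other*
# attributes; a low balanced error rate means the class leaks through them.
guessed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print("balanced error rate:", balanced_error_rate(protected, guessed))
```

The second part mirrors the test described in the abstract: if the protected class can be predicted well from the remaining attributes, removing the protected column alone does not make the data neutral.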

Training camp: "How to scream back at algorithms" (will it need to be a course? therapy?)

web interface for WordNet

http://www.visualthesaurus.com/
http://10.9.8.58:8000/cgi-bin/wordnet.cgi
http://10.9.8.58:8000/cgi-bin/wordnet-hypernyms.cgi

you can also download wordnet via apt




Thinking about the collaborative agreement of meaning: Wiktionary could be used as a database for linking words and meaning. But! Does Wiktionary initially draw on WordNet as a starting point for its dictionary references?
Taking our old friend "amazing":
    From the WordNet adjective file in Pattern (data.adj):
01282510 00 s 05 amazing 0 awe-inspiring 0 awesome 0 awful 0 awing 0 001 & 01282014 a 0000 | inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden soul beneath"- Melville; "Westminster Hall's awing majesty, so vast, so high, so silent"  
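
To read a line like the one above, here is a small sketch of a parser. It assumes the field layout described in the WordNet database file documentation (synset offset, lexicographer file number, synset type, hexadecimal word count, word/lex_id pairs, decimal pointer count, four-field pointers, then the gloss after `|`). The function name and the returned dictionary keys are made up for illustration, and adjective data lines are assumed to carry no verb frames.

```python
def parse_wordnet_data_line(line):
    """Split one WordNet data.* line into its fields (sketch, not Pattern's own parser)."""
    fields, _, gloss = line.partition(" | ")
    tokens = fields.split()

    offset, lex_filenum, ss_type = tokens[0:3]
    word_count = int(tokens[3], 16)          # word count is written in hexadecimal

    # Words alternate with their lex_id: word, lex_id, word, lex_id, ...
    words = [tokens[4 + 2 * i] for i in range(word_count)]

    pos = 4 + 2 * word_count
    pointer_count = int(tokens[pos])         # pointer count is written in decimal
    pos += 1

    pointers = []
    for _ in range(pointer_count):
        symbol, target_offset, target_pos, source_target = tokens[pos:pos + 4]
        pointers.append({
            "symbol": symbol,                # e.g. "&" marks "similar to" for adjectives
            "offset": target_offset,
            "pos": target_pos,
            "source_target": source_target,
        })
        pos += 4

    return {
        "offset": offset,
        "lex_filenum": lex_filenum,
        "ss_type": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }

# The "amazing" entry from the notes, with the gloss shortened to its first example.
line = ('01282510 00 s 05 amazing 0 awe-inspiring 0 awesome 0 awful 0 awing 0 '
        '001 & 01282014 a 0000 | inspiring awe or admiration or wonder; '
        '"New York is an amazing city"')

entry = parse_wordnet_data_line(line)
print(entry["words"])
print(entry["pointers"])
print(entry["gloss"])
```

Read this way, the line says: five words share one synset of type `s` (adjective satellite), and that synset has a single pointer to the adjective synset 01282014.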

comment style:

#!PATTERN+ : ..........................

[[training-common-sense-comment-corpora]]


(data-mining-culture) World Well-Being Project

article titles

- Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
- Using Twitter to measure public discussion of diseases: A case study
- Extracting human temporal orientation from Facebook language
- An analysis of the user occupational class through Twitter content
- Data-driven content analysis of social media: A systematic overview of automated methods
- The role of personality, age and gender in tweeting about mental illnesses
- Mental illness detection at the World Well-Being Project for the CLPsych 2015 Shared Task
- Psychological language on Twitter predicts county-level heart disease mortality
- Automatic personality assessment through social media language
- Developing age and gender predictive lexica over social media
- Towards assessing changes in degree of depression through Facebook
- The online social self: An open vocabulary approach to personality
- From "sooo excited!!!" to "so proud": Using language to study development
- Personality, gender, and age in the language of social media: The Open-Vocabulary Approach
- Characterizing geographic variation in well-being using tweets
- Toward personality insights from language exploration in social media
- Choosing the right words: Characterizing and reducing error of the Word Count Approach
---------------------------------------------------------------------------------------------------------------------------------
17 articles

13 articles are based on language used on social media, such as Facebook and Twitter.