Notes:
    
A name? Predictive Text Space Adventure 3000

How does it work?
- Dasher (software) is used as an interface to navigate through letters and write/generate text;
- Different sets of texts (corpora): you can feed Dasher your own texts; the software will analyse the sentence composition, frequent words, frequent word chains... and will present these to you.
  We gave Dasher the following types of texts to see how it would react: Portuguese Fado music lyrics; MP interventions from the Portuguese parliamentary debates; Linux kernel code; Relearn 2015 etherpad dump.
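
To get a feel for what "learning from a corpus" means here, this is a minimal sketch of a character-level model that counts which character tends to follow a short context. It is a deliberate simplification and not Dasher's own code (Dasher's language model is more elaborate); the function names `train_counts` and `next_char_probs` and the default `order=2` are our own assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_counts(corpus, order=2):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(order, len(corpus)):
        context = corpus[i - order:i]
        counts[context][corpus[i]] += 1
    return counts


def next_char_probs(counts, context):
    """Return {char: probability} for characters seen after `context`.

    Returns an empty dict when the context never occurred in the corpus.
    """
    seen = counts.get(context)
    if not seen:
        return {}
    total = sum(seen.values())
    return {ch: n / total for ch, n in seen.items()}
```

Calling `next_char_probs(train_counts(text), "fa")` on any of the corpora above would give the characters most likely to follow "fa" in that corpus, which is the kind of prediction that ends up being presented to you.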

Game metaphor to show the paths that probabilistic predictions can take
Space side-scrollers
Path through language model. Up-down: different linguistic choices. Left-right: Generation speed.
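
A rough sketch of the up-down axis, assuming we already have next-character probabilities as a plain dict: each candidate character gets a vertical band whose height is proportional to its probability, and the ship's height decides which character is chosen. The names `layout_choices`, `pick` and the 0.0 to 1.0 screen range are hypothetical, chosen only for this illustration.

```python
def layout_choices(probs, top=0.0, bottom=1.0):
    """Split the vertical span [top, bottom) into one band per character.

    Bands are stacked in alphabetical order and each band's height is
    proportional to the character's probability.
    """
    bands = []
    y = top
    height = bottom - top
    for ch in sorted(probs):
        size = probs[ch] * height
        bands.append((ch, y, y + size))
        y += size
    return bands


def pick(bands, ship_y):
    """Return the character whose band contains ship_y, or None if outside."""
    for ch, low, high in bands:
        if low <= ship_y < high:
            return ch
    return None
```

Likely characters get tall bands and are easy to fly into; unlikely ones get thin bands and need more precise steering.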


----------------------------

Manual etherpad dump: [[AllEtherText]]

Hi!
On this pad, a computer is using Dasher to generate text.
This is mostly random but we want to understand how things come about using this generation method.

Tools:

Generated texts from day 2: ../text-generation/text-and-drive/generated_texts

How to feed Dasher with any corpus

Using the "Import Training Text" option did not work for us: it hangs at 100% progress. We found another method, though.

  1. Ensure that you have a ~/.dasher directory.
  2. Copy over an alphabet file from /usr/share/dasher to ~/.dasher . There are alphabet files available for many languages, just copy the one you will be working with (e.g. alphabet.french.xml).
  3. Also copy your corpus to ~/.dasher . Your corpus file is just a text or collection of texts that Dasher will learn from. The file name is your choice.
  4. Edit your alphabet file (and maybe rename it, since you'll be using one for each corpus) and change the following things:
    1. alphabet name: The name that will appear on Dasher's language selection interface
    2. train: The filename for your corpus file
    3. groups and characters: these are the characters that will appear in Dasher. Feel free to remove numbers, punctuation and other characters that you won't need. Remove the uppercase group to get only lowercase output. Accented characters show up as numeric HTML entities; see here for entity codes: http://www.w3schools.com/html/html_charset.asp
  5. Now run Dasher, you should find your new alphabet in the language selection dialog.
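
Steps 1 to 4 can be sketched as a small script. This is an untested-against-Dasher sketch under several assumptions: that the alphabet file is UTF-8, that the name lives in an `<alphabet name="...">` attribute and the corpus in a `<train>...</train>` element, and that naming the copy `alphabet.<corpus name>.xml` is acceptable. Check these against your own alphabet file before relying on it. The helper names are ours, and trimming groups and characters (step 4.3) is still left to do by hand.

```python
import re
import shutil
from pathlib import Path
from xml.sax.saxutils import escape


def numeric_entity(ch):
    """Return the decimal numeric entity for one character."""
    return f"&#{ord(ch)};"


def retarget_alphabet(xml_text, new_name, corpus_filename):
    """Return xml_text with the first alphabet name and train file replaced."""
    safe_name = escape(new_name, {'"': "&quot;"})
    safe_file = escape(corpus_filename)
    # Assumption: the displayed name is stored as <alphabet name="...">.
    xml_text = re.sub(
        r'(<alphabet\s+name=")[^"]*(")',
        lambda m: m.group(1) + safe_name + m.group(2),
        xml_text,
        count=1,
    )
    # Assumption: the corpus file is referenced as <train>file</train>.
    xml_text = re.sub(
        r"(<train>)[^<]*(</train>)",
        lambda m: m.group(1) + safe_file + m.group(2),
        xml_text,
        count=1,
    )
    return xml_text


def install_corpus(alphabet_src, corpus_src, new_name, dasher_dir=None):
    """Copy a corpus and a retargeted alphabet file into the Dasher directory."""
    dasher_dir = Path(dasher_dir) if dasher_dir else Path.home() / ".dasher"
    dasher_dir.mkdir(parents=True, exist_ok=True)            # step 1
    corpus_dst = dasher_dir / Path(corpus_src).name
    shutil.copy(corpus_src, corpus_dst)                      # step 3
    xml_text = Path(alphabet_src).read_text(encoding="utf-8")
    # Hypothetical naming scheme: one alphabet file per corpus (steps 2 and 4).
    alphabet_dst = dasher_dir / f"alphabet.{corpus_dst.stem}.xml"
    alphabet_dst.write_text(
        retarget_alphabet(xml_text, new_name, corpus_dst.name),
        encoding="utf-8",
    )
    return alphabet_dst
```

For example, `install_corpus("/usr/share/dasher/alphabet.french.xml", "fado.txt", "Fado lyrics")` would copy a hypothetical `fado.txt` corpus and write `~/.dasher/alphabet.fado.xml`. `numeric_entity("é")` gives the entity to use for an accented character in the alphabet file.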

It is not necessary to pre-process your corpus file before use, but there are some details on the inner workings of Dasher here: http://www.inference.phy.cam.ac.uk/dasher/Training.html

Setting up the OS for Spacedasher


Spaceship sprite source:
http://orig15.deviantart.net/6a58/f/2010/318/2/b/spaceship_sprites_by_pavanz-d32tpys.png