336 image files
Image treatment in book:
Numbering system: page number, image credit number: each page restart numbering
image credits set in sans serif font
Photography credits:
Lithography? Looks like photographic reproductions (not digital scans). Why -> check with PierreH.
Probably lithographic work done by/at printer Snoeck-Ducaju & Zoon (Ghent)
so: piece -> photography -> lithography -> scan? film (rasterization, enlargement) -> plate -> paper -> scan -> ? :-)
or: direct to plate?
Alexia: not a digital camera; scanned negatives
Would there be any films left in the bins of Snoeck?
Design: Bureau Piet Gerards, Heerlen
Is process of binding interesting?
TODO
- Look at the paper object with Pierre Huyghebaert and interrogate printing, reproduction
- Take a loupe and look at the raster
Scanning process
No specific treatment for images; images not treated separately or: are treated/scanned as text
but:
nn98-02 = img 3 on pg 98?
nn98-01 = img 2
page numbers in the file names are not zero-padded, so the files sort alphabetically rather than in page order:
nnb119-000.jpg
nnb120-003.jpg
nnb12-000.jpg
nnb124-000.jpg
nnb128-000.jpg
nnb128-001.jpg
nnb128-002.jpg
nnb129-000.jpg
nnb14-000.jpg
nnb14-001.jpg
nnb143-000.jpg
nnb152-000.jpg
nnb152-001.jpg
nnb152-002.jpg
nnb152-003.jpg
nnb154-000.jpg
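A minimal sketch of how the listing above could be put back in page order. It assumes every file name follows the pattern prefix + page number + dash + crop index + ".jpg", which holds for the names listed here but has not been checked against all 336 files. The helper names (`parse`, `page_order`, `padded`) are made up for illustration.

```python
import re

# Assumed pattern: lowercase prefix, page number, dash, crop index, ".jpg".
NAME = re.compile(r"^([a-z]+)(\d+)-(\d+)\.jpg$")

def parse(name):
    """Split 'nnb128-001.jpg' into ('nnb', 128, 1)."""
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    prefix, page, index = m.groups()
    return prefix, int(page), int(index)

def page_order(names):
    """Sort by numeric page, then crop index, instead of character by character."""
    return sorted(names, key=lambda n: parse(n)[1:])

def padded(name, width=3):
    """Hypothetical zero-padded name, so plain alphabetical sorting also works."""
    prefix, page, index = parse(name)
    return f"{prefix}{page:0{width}d}-{index:03d}.jpg"

names = ["nnb119-000.jpg", "nnb120-003.jpg", "nnb12-000.jpg",
         "nnb14-001.jpg", "nnb14-000.jpg"]
print(page_order(names))
print(padded("nnb12-000.jpg"))  # nnb012-000.jpg
```

Sorting on the parsed numbers leaves the files untouched, which keeps the "history" of the different script runs; `padded` only proposes a name and renames nothing.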
Where are the images on page 12? Are we looking at a different edition? Freya goes up to check the 2nd copy (Delft University Library). The two physical copies are 'identical'.
Page numbers seem to be more or less correct; shifted by one page now and then
indeed the numbers are all result of the "multicrop" script.. and the references are temporarily lost..
can you give me a link to multicrop script? where does it live?
Check multicrop on this image: http://192.168.1.222/new_babylon/img/nb72-000.jpg
it is also the gluing together of output from the script, run several times with different settings, because weird stuff would often happen..
and some images are just the full page copied (full spread). so there are like 3 different numberings.. very neat. i thought renaming is just one command, but the "history" of the different scripts used was fun to keep for now.
- multicrop -u 1 -f 10 file.jpg outfile.jpg , multicrop -u 3 -f 20 file.jpg outfile.jpg
the second one for example doesn't auto-rotate the images..
it's really a mess indeed for now... and metadata would be something to look into..
we want to make the pdf searchable at some point (tesseract) so that might make the process easier
TODO
- Look at the pdf and see what we can discover about the actual scanning (scanner/device)
- Look for multicrop artefacts
File analysis
Interesting to 'read back' the process in these actual files.
- identify -verbose * | grep "exif:"
- hehe i see some corporate stuff there :P
Image: nb19-003.jpg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Class: PseudoClass
Geometry: 969x1002+0+0
Resolution: 72x72
Print size: 13.4583x13.9167
Units: PixelsPerInch
Type: Grayscale
Base type: Grayscale
Endianess: Undefined
Colorspace: Gray
Depth: 8-bit
Channel depth:
gray: 8-bit
Channel statistics:
Gray:
min: 33 (0.129412)
max: 255 (1)
mean: 177.834 (0.697388)
standard deviation: 59.5622 (0.233577)
kurtosis: -0.908599
skewness: -0.808636
Colors: 222
Histogram:
1: ( 33, 33, 33) #212121 gray(33,33,33)
2: ( 35, 35, 35) #232323 gray(35,35,35)
2: ( 36, 36, 36) #242424 gray(36,36,36)
3: ( 37, 37, 37) #252525 gray(37,37,37)
4: ( 38, 38, 38) #262626 gray(38,38,38)
14: ( 39, 39, 39) #272727 gray(39,39,39)
Colormap: 256
0: ( 0, 0, 0) #000000 gray(0,0,0)
1: ( 1, 1, 1) #010101 gray(1,1,1)
2: ( 2, 2, 2) #020202 gray(2,2,2)
3: ( 3, 3, 3) #030303 gray(3,3,3)
4: ( 4, 4, 4) #040404 gray(4,4,4)
5: ( 5, 5, 5) #050505 gray(5,5,5)
Rendering intent: Undefined
Gamma: 1
Interlace: None
Background color: gray(255,255,255)
Border color: gray(223,223,223)
Matte color: gray(189,189,189)
Transparent color: gray(0,0,0)
Compose: Over
Page geometry: 969x1002+0+0
Dispose: Undefined
Iterations: 0
Compression: JPEG
Quality: 95
Orientation: TopLeft
Properties:
date:create: 2014-07-08T10:43:49+02:00
date:modify: 2014-07-08T10:43:49+02:00
exif:ExifImageLength: 1002
exif:ExifImageWidth: 969
exif:ExifOffset: 90
exif:Orientation: 1
exif:ResolutionUnit: 2
exif:XResolution: 72/1
exif:YResolution: 72/1
jpeg:colorspace: 1
jpeg:sampling-factor: 1x1
signature: d500cec9dd4d1f1adbb0bc1e5ab12486acc6ac4b1f01b56420ad8006d7588a20
Profiles:
Profile-exif: 126 bytes
Artifacts:
filename: nb19-003.jpg
verbose: true
Tainted: False
Filesize: 569KB
Number pixels: 971K
Pixels per second: 16.18MB
User time: 0.060u
Elapsed time: 0:01.059
Version: ImageMagick 6.7.7-10 2013-09-10 Q16 http://www.imagemagick.org
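One way to 'read back' such a dump, sketched under assumptions: the text is the output of identify -verbose as pasted above, the resolution unit is pixels per inch (as the Units line says), and the function names `print_size` and `exif_properties` are invented here. The print size is simply the pixel geometry divided by the resolution, and the exif lines are the ones the grep above would keep.

```python
import re

def print_size(dump):
    """Return (width, height) in inches from the Geometry and Resolution lines."""
    # "Page geometry:" has a lowercase g, so it is not matched here.
    geometry = re.search(r"Geometry:\s*(\d+)x(\d+)", dump)
    resolution = re.search(r"Resolution:\s*(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)", dump)
    if geometry is None or resolution is None:
        raise ValueError("dump has no Geometry or Resolution line")
    width, height = (int(v) for v in geometry.groups())
    xres, yres = (float(v) for v in resolution.groups())
    return width / xres, height / yres

def exif_properties(dump):
    """Collect the 'exif:Key: value' lines into a dict."""
    props = {}
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("exif:"):
            key, _, value = line[len("exif:"):].partition(":")
            props[key.strip()] = value.strip()
    return props

# A few lines copied from the nb19-003.jpg dump above.
sample = """Geometry: 969x1002+0+0
Resolution: 72x72
Print size: 13.4583x13.9167
Units: PixelsPerInch
exif:ExifImageLength: 1002
exif:ExifImageWidth: 969
exif:XResolution: 72/1
"""
w, h = print_size(sample)
print(f"{w:.4f}x{h:.4f}")  # 13.4583x13.9167
print(exif_properties(sample))
```

So the "Print size" line is just 969/72 by 1002/72. Since 72 is a common default value for this field, it may say little about the resolution the page was actually scanned at.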
http://en.wikipedia.org/wiki/Kurtosis
"any measure of the 'peakedness' of the probability distribution of a real-valued random variable. In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution"
wow.
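To make the two shape numbers less mysterious, a small sketch that computes them from a histogram of gray values (value -> pixel count), like the Histogram section of the dump. It assumes the population definitions and reports excess kurtosis (minus 3, so a normal distribution gives 0). Whether ImageMagick computes them exactly this way has not been verified here, and `shape_stats` is a made-up name.

```python
from math import sqrt

def shape_stats(histogram):
    """histogram maps gray value -> pixel count.

    Returns (mean, standard deviation, skewness, excess kurtosis).
    """
    n = sum(histogram.values())
    mean = sum(v * c for v, c in histogram.items()) / n

    def moment(k):
        # k-th central moment, weighted by pixel count
        return sum(c * (v - mean) ** k for v, c in histogram.items()) / n

    std = sqrt(moment(2))
    skewness = moment(3) / std ** 3
    kurtosis = moment(4) / std ** 4 - 3
    return mean, std, skewness, kurtosis

# Toy histogram (not taken from the scans): mostly light pixels, a few dark ones.
toy = {33: 1, 128: 2, 255: 7}
mean, std, skewness, kurtosis = shape_stats(toy)
print(mean, std, skewness, kurtosis)
```

Read this way, the negative skewness of nb19-003.jpg (-0.808636) suggests a tail towards the dark values with most pixels on the light side (mean 177.834 out of 255), and the negative kurtosis (-0.908599) a flatter-than-normal spread.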
Maria: There's a randomness to the images.
No need to reconstruct the book (ie order-relation between text and images)
Constant: starting a painting from the edges. to know about the container
Range, dimensions, formats
$ tesseract nnb98-002.jpg -l eng -psm 1 outfile
Tesseract Open Source OCR Engine v3.02.01 with Leptonica
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
Test blob assigned to row at (-817.5,-67.5) on pass 0
Test blob y=(-885,0), row=(-1072.500000,-322.500000), overlap=562.500000
Test blob assigned to row at (-1072.5,-322.5) on pass 4
Test blob y=(-885,0), row=(-1072.500000,-322.500000), overlap=562.500000
Test blob assigned to row at (-1072.5,-322.5) on pass 1
create gifs:
Ideas
- Reconstructing relations between physical diversity, printed reproduction, scans
- Scale, tilt, orientation
- 2D-3D-2D-3D
- resurrect images from being treated as text
- histogram, historiography, hysteria
- http://en.wikipedia.org/wiki/Steganography
- reducing of range: size, texture, dimension = reviving range
- make one image of all of them; is it actually only one image?
- look at original scans (pre-multi-crop)
- overlap images, for example maps, graphs, photographs etc.
- create gifs
- extract a color table from an image