Thursday, January 15, 2015

gimp - How do I convert images of a page of text to pure Black & White and sharpen the text?


I'm a research student working with large numbers of archival documents. For speed when at the archives, I photograph the documents (with a decent but not fancy digital camera), producing colour .jpg files. For ease of reading, and to improve detail when printing some of the documents, I would like to convert the images to pure Black and White - not Greyscale - and enhance / sharpen the text. Ideally, I would like the finished image to look more like a photocopied page, with the background, any shadows, etc. faded as much as possible to improve contrast with the text. I've tried various things but can't quite get it there. Apologies if there's an obvious method I've missed - I'm a GIMP novice. See the sample image for what I'm working with.



Answer



Adjusting brightness/contrast is not easy due to the uneven lighting.


First, to avoid color fringes, work on a grayscale version of the image, using either Image>Mode>Grayscale or Colors>Desaturate. If you are using GIMP 2.10, you can also set Image>Precision to 32-bit floating point/linear.



Then apply the following technique(*) to even out the lighting:



  • duplicate the image layer

  • apply a Gaussian blur that is sufficient to make the text disappear completely (around 50px on your image)

  • set the top layer to Grain extract mode

  • create a new layer with the result: Layer>New from visible
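The steps above can be sketched in code. This is only an illustration on a single synthetic row of pixels, not what GIMP runs internally: it assumes the 8-bit form of Grain extract (lower layer minus upper layer plus 128, clamped to 0..255), and the names `blur_row`, `grain_extract` and the `sigma` value are hypothetical choices made for this small example.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel covering about +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_row(row, sigma):
    """Gaussian-blur one row of pixel values, repeating the edge pixels."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(row) - 1
    blurred = []
    for x in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            src = min(max(x + k - radius, 0), last)  # clamp at the borders
            acc += w * row[src]
        blurred.append(acc)
    return blurred

def grain_extract(lower, upper):
    """Assumed 8-bit grain extract: lower - upper + 128, clamped to 0..255."""
    return [min(255, max(0, round(l - u + 128))) for l, u in zip(lower, upper)]

# Synthetic scanline: the background fades from 220 to 140 (uneven lighting),
# with three dark "text" pixels 80 levels darker than their surroundings.
width = 200
row = [220 - 80 * x / (width - 1) for x in range(width)]
text_positions = (50, 100, 150)
for x in text_positions:
    row[x] -= 80

# Original row is the lower layer, blurred copy is the upper layer.
flattened = grain_extract(row, blur_row(row, sigma=15))
```

In `flattened`, the background should sit within a few levels of 128 across the whole row, while the three text pixels stay clearly darker. The blur has to be wide compared with the text strokes, otherwise the text leaks into the blurred copy and is partly subtracted from itself.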


In the resulting layer, the background is still gray (around 50%), but it is a much more uniform gray. It shows up as the big spike in the histogram. You can then use the Levels tool comfortably to optimize the result. In the "input" settings:



  • right handle slightly left of the middle of the big spike (anything to its right becomes completely white)


  • left handle to where the histogram seems to cease (anything to its left becomes completely black)

  • middle handle adjusted to optimize contrast
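As a rough sketch of what the three input handles do to a single pixel value: the function below assumes the usual levels mapping (linear stretch between the black and white points, then a gamma curve for the middle handle). The name `apply_levels` and the sample handle positions are hypothetical, chosen only for illustration.

```python
def apply_levels(value, black, white, gamma=1.0):
    """Map one 0..255 value through input levels.

    black: left handle, values at or below it become 0.
    white: right handle, values at or above it become 255.
    gamma: middle handle, values above 1.0 lighten the midtones.
    """
    if white <= black:
        raise ValueError("white point must be above black point")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    t = (value - black) / (white - black)
    t = min(1.0, max(0.0, t))  # clip to the black and white points
    return round(255 * t ** (1.0 / gamma))

# Hypothetical handle positions: white point slightly left of a background
# spike at 128, black point where the histogram seems to cease.
black_point, white_point = 40, 120

background_out = apply_levels(128, black_point, white_point)  # background pixel
text_out = apply_levels(30, black_point, white_point)         # dark text pixel
mid_plain = apply_levels(80, black_point, white_point)
mid_lighter = apply_levels(80, black_point, white_point, gamma=2.0)
```

With these sample settings, the background pixel is pushed to pure white and the dark text pixel to pure black, and raising `gamma` lightens the in-between values.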




The text on the other side of the page shows through, which somewhat limits how far you can stretch the contrast. Next time you take these pictures, bring a dark sheet of paper (ideally black) and insert it under the page you are shooting.


(*) To explain a bit:



  • With the Gaussian blur, each pixel value is replaced by the average lightness of the area around it (the blur is assumed to be wide enough to make the influence of local details such as text negligible)

  • The grain extract, which is basically a subtraction, subtracts that average lightness of the area from the pixels in the initial image:


    • for background pixels, the average value of the background around them is removed, so whatever the initial background lightness, the result is close to zero (actually, 50% gray, since Grain extract adds a bias to the result),

    • for subject pixels (which are normally quite different from the background) the difference is far from 0 and they remain visible.



