Pictures to Text
As a fun little weekend project, I created a program that converts a picture into text. To do this I used a Jupyter notebook, found below. The code is available on GitHub.
The notebook contains 3 parts:
- Image - Input image to be converted. It’s chopped up into little bits.
- Font - Input font. The characters are extracted as images.
- Calc - The image pieces and font characters are compared to find the best match.
The first step is to import the image. As an example I use a picture of a finch. Most of the following code converts the PIL image into a 2D NumPy ndarray and displays it with matplotlib's imshow function.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def show_image(path):
    image = Image.open(path)
    image = image.convert('L')  # greyscale
    d = list(image.getdata())
    image_2D = np.reshape(d, (image.height, image.width))
    plt.imshow(image_2D, cmap='gray')

show_image('download.png')
Output:
The next step is to cut the imported image into character-sized blocks. The heart of the following functions is view_as_blocks from scikit-image. It fails if the block size does not divide the image dimensions evenly, so trim_image removes the excess rows and columns before the image is passed to view_as_blocks. The result can be seen below.
from skimage.util import view_as_blocks

def trim_image(matrix, size_x, size_y):
    # Drop the rows and columns that would not fill a whole block.
    height, width = matrix.shape
    trim_height = height % size_y
    trim_width = width % size_x
    if trim_height > 0:
        matrix = matrix[:-trim_height, ...]
    if trim_width > 0:
        matrix = matrix[..., :-trim_width]
    return matrix

def get_image_slices(image_path, char_width, char_height):
    image = Image.open(image_path)
    image = image.convert('L')  # greyscale
    data = list(image.getdata())
    image_2D = np.reshape(data, (image.height, image.width))
    image_2D = trim_image(image_2D, char_width, char_height)
    image_slices = view_as_blocks(image_2D, block_shape=(char_height, char_width))
    return image_slices

char_width = 10
char_height = 18
image_slices = get_image_slices('download.png', char_width, char_height)
# Flatten the (rows, columns) grid of blocks into a single list of blocks.
images = image_slices.reshape(image_slices.shape[0] * image_slices.shape[1],
                              image_slices.shape[2], image_slices.shape[3])
nb_down = image_slices.shape[0]    # rows of blocks
nb_across = image_slices.shape[1]  # columns of blocks
plt.figure()
for i, img in enumerate(images):
    plt.subplot(nb_down, nb_across, i + 1)
    plt.imshow(img, cmap='gray', vmin=0, vmax=255)
Output:
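To make the trimming and block-splitting step concrete, here is a minimal NumPy-only sketch of the same idea. It is not the notebook's code: trim_to_multiple and split_into_blocks are hypothetical helper names, and the reshape trick stands in for view_as_blocks on the assumption that the input is a 2D array whose dimensions divide evenly after trimming.

```python
import numpy as np

def trim_to_multiple(matrix, block_h, block_w):
    """Drop trailing rows/columns so both dimensions divide evenly."""
    height, width = matrix.shape
    return matrix[:height - height % block_h, :width - width % block_w]

def split_into_blocks(matrix, block_h, block_w):
    """Return an array of shape (rows, cols, block_h, block_w)."""
    rows = matrix.shape[0] // block_h
    cols = matrix.shape[1] // block_w
    # Split each axis into (block index, offset inside block), then
    # move the two block indices to the front.
    return matrix.reshape(rows, block_h, cols, block_w).swapaxes(1, 2)

# Hypothetical 5x7 "image" cut into blocks 2 rows high and 3 columns wide.
demo = np.arange(35).reshape(5, 7)
trimmed = trim_to_multiple(demo, 2, 3)     # one row and one column are dropped
blocks = split_into_blocks(trimmed, 2, 3)

print(trimmed.shape)  # (4, 6)
print(blocks.shape)   # (2, 2, 2, 3)
print(blocks[0, 1])   # top-right block: [[3 4 5], [10 11 12]]
```

The first two axes of the result index the block's row and column in the grid, which is the layout the later reshape into a flat list of blocks relies on.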
Fonts
Now that we have the image blocks, we can match characters to them. First we need images of the font's characters to compare against. We will use the ASCII character set, which is found in the string.printable constant.
import string
from PIL import Image, ImageDraw, ImageFont

def font_images():
    fnt = ImageFont.truetype('fonts/SFMono-Regular.otf', 15)
    letters = string.printable
    results = []
    for letter in letters:
        # Draw each character in black on a white 10x18 tile.
        img = Image.new('RGB', (10, 18), color=(255, 255, 255))
        draw = ImageDraw.Draw(img)
        draw.text((0, 0), letter, font=fnt, fill=(0, 0, 0))
        img = img.convert('L')  # greyscale
        data = list(img.getdata())
        image_2D = np.reshape(data, (img.height, img.width))
        results.append(image_2D)
    return (letters, np.stack(results, axis=0))

letters_idx, letters = font_images()
plt.figure(figsize=(60, 60))
for i, img in enumerate(letters):
    plt.subplot(20, 20, i + 1)
    plt.imshow(img, cmap='gray', vmin=0, vmax=255)
Output:
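One detail worth knowing about string.printable is that it also contains whitespace characters such as tab and newline. The sketch below is an optional variation, not something the notebook does: it builds a character set without them, on the assumption that only a plain space is wanted for blank regions. The name charset is hypothetical.

```python
import string

# Keep every printable ASCII character that is not whitespace, then add back
# a plain space so blank regions of the image still have a match.
charset = "".join(c for c in string.printable if not c.isspace()) + " "

print(len(string.printable))  # 100
print(len(charset))           # 95
```

Passing a set like this to the font rendering step would keep characters such as '\n' out of the final text, at the cost of deviating from the original notebook.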
Calc
Now that we have sections of the input image and character images, we just need to pick the best match. In this case I used the sum of absolute differences:
$$\mathrm{score}(A, B) = \sum_{i,j} \left| A_{ij} - B_{ij} \right|$$
where A and B are matrices of the same size. We want the lowest score, as that means the smallest difference. I could have inverted it so that we looked for the maximum, or simply called it a loss function.
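The score used in the notebook code, np.sum(np.absolute(letter - section)), is a sum of absolute differences. As a small worked example, the sketch below computes it for made-up 2x2 matrices. The function name sad_score and the sample values are illustrative assumptions, and the cast to a signed type is a precaution for uint8 inputs rather than something the notebook needs (its arrays are built from Python lists and are already signed integers).

```python
import numpy as np

def sad_score(a, b):
    """Sum of absolute differences between two equally sized matrices."""
    # Cast to a signed type so uint8 inputs cannot wrap around on subtraction.
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

section = np.array([[0, 255], [255, 0]], dtype=np.uint8)
close = np.array([[10, 250], [255, 0]], dtype=np.uint8)
far = np.array([[255, 0], [0, 255]], dtype=np.uint8)

print(sad_score(section, close))  # 15   (10 + 5 + 0 + 0)
print(sad_score(section, far))    # 1020 (4 * 255)
```

The lower score for close is what marks it as the better match.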
def solve_section(section, letters):
    # Return the index and image of the letter with the lowest score.
    min_score = float('inf')
    min_id = None
    min_letter = None
    for letter_id, letter in enumerate(letters):
        score = np.sum(np.absolute(letter - section))
        if score < min_score:
            min_score = score
            min_letter = letter
            min_id = letter_id
    return (min_id, min_letter)

def solve(sections, letters):
    results = []
    ids = []
    for s in sections:
        min_id, min_letter = solve_section(s, letters)
        results.append(min_letter)
        ids.append(min_id)
    return (ids, np.stack(results, axis=0))
# `images` is the flattened array of image sections built earlier.
ids, result = solve(images, letters)
fig = plt.figure(figsize=(20, 20))
for i in range(len(result)):
    sub = fig.add_subplot(image_slices.shape[0], image_slices.shape[1], i + 1)
    sub.imshow(result[i], cmap='gray', vmin=0, vmax=255)
Output:
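The two nested loops in solve_section and solve can also be written with NumPy broadcasting, which computes every section-versus-letter score in one step. This is a sketch of an alternative rather than the notebook's method. match_all is a hypothetical name, and it assumes both inputs are arrays of shape (count, height, width) with matching height and width.

```python
import numpy as np

def match_all(sections, glyphs):
    """Return, for each section, the index of the glyph with the lowest score.

    sections: array of shape (n_sections, h, w)
    glyphs:   array of shape (n_glyphs, h, w)
    """
    s = sections.astype(np.int64)[:, None, :, :]  # (n_sections, 1, h, w)
    g = glyphs.astype(np.int64)[None, :, :, :]    # (1, n_glyphs, h, w)
    scores = np.abs(s - g).sum(axis=(2, 3))       # (n_sections, n_glyphs)
    # argmin returns the first minimum, matching the strict `<` in the loop.
    return scores.argmin(axis=1)

# Made-up data: two 2x2 "glyphs" (all black, all white) and three sections.
glyphs = np.array([np.zeros((2, 2)), np.full((2, 2), 255)])
sections = np.array([np.full((2, 2), 20), np.full((2, 2), 240), np.zeros((2, 2))])
ids = match_all(sections, glyphs)
print(ids.tolist())  # [0, 1, 0]
```

The intermediate array holds n_sections * n_glyphs * h * w values, so for large images it may be preferable to process the sections in batches.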
Now we can finish by collecting the characters and printing them:
result_string = ""
for i, letter_id in enumerate(ids, start=1):
    result_string += letters_idx[letter_id]
    # Start a new line after the last block of each row.
    if i % image_slices.shape[1] == 0:
        result_string += '\n'
print(result_string)
Output:
.gMBy
WgM_
M@DMg_
$BMGDMg
`MgZMMg_
_<V _MMK_
_1P"` "gg
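The final assembly step (turning the flat list of ids back into rows of text) can also be isolated in a small helper, sketched below under the assumption that the ids are stored row by row. ids_to_text and the three-character charset are hypothetical and only serve the example.

```python
def ids_to_text(ids, charset, columns):
    """Turn a flat, row-major list of glyph indices into lines of text."""
    chars = [charset[i] for i in ids]
    rows = [chars[start:start + columns] for start in range(0, len(chars), columns)]
    return "\n".join("".join(row) for row in rows)

# Hypothetical 2x3 grid using a three-character charset.
text = ids_to_text([0, 1, 2, 2, 1, 0], charset=" .#", columns=3)
print(text)
```

In the notebook's terms, charset would be letters_idx and columns would be image_slices.shape[1].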