Archives

All posts for the month January, 2016

OCR OC-gain

Posted by admin on January 20, 2016

Posted in: Data Science. Tagged: data science, OCR.

My PHYS341 students were interested to see how the OCR routine processed their attendance sheet, so I applied it, as shown below.

The left panel shows the original, the right the transformed version. The routine did a reasonable job of un-distorting the page (although it wasn’t too bad to begin with).

And here’s what the routine returns as text:

zolto J:)o<-&\
Qﬁck ${‘bt\L . .
1ZWrW(\ DQVCS
Onras Tkomag
Jam; I-Em!
De»-UV1, ?I‘L\M‘ovV\QJ3€,v'
Ia./I B,a,C/IHMC .
V.o\3 \3<<I°\Ser&eck
jengifcr Brigga
}VK°'('3E\V1c-rad LULAA
Mby Ouersfreci’ '
Tm (jivws ‘
gj)/VIOI/I  ‘
$030-4 “10u\J ‘
{NC /I/1a.V,,,,',q '
O”AKe So/ares
Skwm \<reyc}~e '

…not great.

I’m not totally sure what went wrong. Maybe I should have them write their student numbers instead.

UPDATE: 2016 Feb 5

Here’s another go with a different attendance sheet. Not much better.

I‘/\0,;/{¢r\(~  VETEK BROWN
E ? RICH/W} WLMC/(
“$2114!/~, lZoAr.‘%o Pratt.‘
D“""‘ 8'3"’ ' g/"It; /Vlar,//'/1
 Lolpef‘
kpdkl/n\‘f 2011/(IE
ANN <5©<J\M) Vfxit %'L0V\€z &}"-:5 _)_La/\/MAS _ Karm I>q'v‘-5
\Tou—o0l Hand _
'Dz\m/L ?\c\n'ksvvuu‘ev*
Ian I3/¢«'-Ckﬁa/If ¢
94% \5<<k*?l6<:x\.cLl¢ Jennifer BH995 N\o+‘\'V\c\.6 Luv\0\ . \/\0\\)y 0\,ers+reeJr ’\".m C"\\/Ens _ 51 m cm E 1: Y Jason May A ZPM > M
PIJW1 ﬁgu//
jam 5°0W'/J
34681 \4'€y¢}~¢>/

Uniform Circular Motion Animation in Python

Posted by admin on January 6, 2016

Posted in: Teaching, Uncategorized. Tagged: classical mechanics, ipython notebook, python.

I’m prepping for my classical mechanics course, scheduled to start next week. One of the first things we discuss is uniform circular motion and how it looks projected along the x- and y-axes, so I thought it would be useful to have an animation showing that. I found a few animations online, but none really showed the x and y projections I was looking for.

So I decided to create my own using my go-to language, Python. Fortunately, Python guru Jake Vanderplas has created a very nice animation module usable in IPython Notebooks.

Based on his example here, I put together the following code to generate the desired animation:
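The listing itself is not reproduced on this page. As a rough stand-in, here is a minimal sketch of the idea, not the notebook's actual code. It assumes a particle moving at constant angular speed, and the function names (circular_motion, animate) are made up for illustration. The first function computes the position and its x and y projections; the second feeds those frames to matplotlib's FuncAnimation.

```python
import math

def circular_motion(radius=1.0, period=2.0, n_frames=60):
    """Return (t, x, y) samples over one period of uniform circular motion."""
    omega = 2.0 * math.pi / period          # angular speed in rad per unit time
    frames = []
    for i in range(n_frames):
        t = period * i / n_frames
        x = radius * math.cos(omega * t)    # projection onto the x-axis
        y = radius * math.sin(omega * t)    # projection onto the y-axis
        frames.append((t, x, y))
    return frames

def animate(frames, radius=1.0, interval_ms=50):
    """Animate the particle and its two projections (requires matplotlib)."""
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    fig, ax = plt.subplots()
    ax.set_aspect("equal")
    lim = 1.2 * radius
    ax.set_xlim(-lim, lim)
    ax.set_ylim(-lim, lim)
    particle, = ax.plot([], [], "ko")       # the orbiting particle
    x_proj, = ax.plot([], [], "rs")         # its shadow on the x-axis
    y_proj, = ax.plot([], [], "b^")         # its shadow on the y-axis

    def update(frame):
        _, x, y = frame
        particle.set_data([x], [y])
        x_proj.set_data([x], [0.0])
        y_proj.set_data([0.0], [y])
        return particle, x_proj, y_proj

    return FuncAnimation(fig, update, frames=frames,
                         interval=interval_ms, blit=True)
```

Calling animate(circular_motion()) in a notebook and keeping a reference to the returned object should show the two shadows oscillating sinusoidally, a quarter period out of step with each other.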

Scanning and OCR-ing a Paper Receipt

Posted by admin on January 2, 2016

Posted in: Data Science.

I spent the morning kludging together a Python script to convert a grocery receipt into a spreadsheet as part of one of my New Year’s resolutions. There seem to be a few options out there for scanning and recording receipts, but it’s not clear that they apply an OCR technique to automatically convert them to a spreadsheet.

Here’s the receipt I used:

This website provided some Python source code that detects edges in the image, finds the outline of the receipt, and transforms out any foreshortening or other viewing distortion:

(left) Edge detection, (center) Outline detection, (right) Scanned version.

To detect edges, the code converts the color image to grayscale and applies the Canny edge detection scheme, which involves applying a Gaussian blur to suppress noise, calculating image derivatives, and looking for large values. The result is shown on the left in the image above; more details on the algorithm are here.
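To make the "derivatives, then look for large values" step concrete, here is a deliberately simplified sketch. It is not the full Canny scheme: it skips the Gaussian blur, non-maximum suppression, and hysteresis, and a real pipeline would more likely call OpenCV's cv2.Canny. The function name and the list-of-rows image format are assumptions made for illustration.

```python
def gradient_edges(gray, threshold):
    """Mark pixels whose central-difference gradient magnitude exceeds threshold.

    gray is a list of rows of brightness values; border pixels stay unmarked.
    """
    rows, cols = len(gray), len(gray[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (gray[r][c + 1] - gray[r][c - 1]) / 2.0   # horizontal derivative
            gy = (gray[r + 1][c] - gray[r - 1][c]) / 2.0   # vertical derivative
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = True
    return edges
```

On an image that is dark on the left and bright on the right, only the pixels straddling the boundary are flagged, which is the behavior the edge panel illustrates.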

Next, the code finds the outline of the receipt using OpenCV’s findContours, sorts the contours by area, and picks the largest contour that has four vertices.
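The selection logic can be sketched without OpenCV. The version below is an illustration, not the linked code. It assumes each contour has already been reduced to a list of (x, y) corner points (OpenCV's cv2.approxPolyDP is one way to do that), and the function names are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def largest_quadrilateral(contours):
    """Return the largest-area contour with exactly four vertices, or None."""
    for contour in sorted(contours, key=polygon_area, reverse=True):
        if len(contour) == 4:
            return contour
    return None
```

Sorting by area first and then taking the first four-cornered shape means a large non-rectangular blob (a shadow, say) is passed over in favor of the receipt.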

The code then applies a four-point transformation to warp the receipt to give it a rectangular shape and finally thresholds the grayscale to enhance the contrast. The rightmost panel in the above image shows the final result.
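A four-point transformation needs the corners in a known order and a target rectangle size. The sketch below covers only that bookkeeping, under the assumption of image coordinates (y increasing downward) and a receipt that is not rotated too far. The names are hypothetical, and the warp itself would be left to something like OpenCV's cv2.getPerspectiveTransform and cv2.warpPerspective.

```python
import math

def order_corners(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left.

    Assumes image coordinates (y increases downward) and a modest tilt.
    """
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]

def target_size(ordered):
    """Width and height of the rectangle the quadrilateral is warped onto."""
    tl, tr, br, bl = ordered
    width = max(math.dist(tl, tr), math.dist(bl, br))    # longer horizontal edge
    height = max(math.dist(tl, bl), math.dist(tr, br))   # longer vertical edge
    return int(round(width)), int(round(height))
```

Taking the longer of each pair of opposite edges keeps the warped image from squeezing the side of the receipt that was closer to the camera.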

To convert the image to a table of text, I used PyTesseract, a Python wrapper for the Tesseract OCR engine. I installed Tesseract itself using Homebrew: “brew install tesseract”.

Then I just grabbed the code from this website to convert the final result into a text table:

st = pytesseract.image_to_string(Image.open(save_filename), config="-psm 6")

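The “-psm 6” option, which tells Tesseract to treat the image as a single uniform block of text, was required to return the text properly. (Newer Tesseract releases spell the flag “--psm 6”.)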

Unfortunately, the OCR analysis wasn’t perfect. For example, one line of the receipt was converted to
‘*CRESgENT R01 1 1800000401 4.82 IF

The prices on all lines came back fine, but the description was often distorted. I decided I cared more about the price anyway. Fortunately, the WinCo receipt had “TF” or “TX” at the rightmost side, so I performed a regex search to find the beginning of that string and grabbed the characters to the left of that.
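A minimal sketch of that kind of search follows; it is not the posted script. The pattern and function name are assumptions, and it expects a price with two decimal places immediately before a trailing “TF” or “TX”.

```python
import re

# Hypothetical pattern: a price (digits.digits) followed by the TF/TX flag at line end.
PRICE_BEFORE_FLAG = re.compile(r"(\d+\.\d{2})\s+T[FX]\s*$")

def split_receipt_line(line):
    """Return (description, price), or None if no flagged price is found."""
    match = PRICE_BEFORE_FLAG.search(line)
    if match is None:
        return None
    description = line[:match.start()].strip()   # everything left of the price
    return description, float(match.group(1))
```

Note that the sample OCR line above ends in “IF”, presumably a misread “TF”, which this strict pattern would skip; how far to loosen the flag match is a judgment call.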

Finally, I converted the strings into a list of comma-separated values to load into Excel or Google Sheets, leaving a blank field between the corrupted description and the price so I could enter my own description, giving
CRESgENTR0111, , 4.82
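One way to produce rows of that shape is Python's csv module. This is a sketch with a hypothetical function name, not the posted script, and the csv writer omits the padding spaces shown in the line above.

```python
import csv
import io

def to_csv(items):
    """Write (ocr_description, price) pairs as CSV rows with a blank middle column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for ocr_description, price in items:
        # The blank field is reserved for a hand-typed description.
        writer.writerow([ocr_description, "", f"{price:.2f}"])
    return buffer.getvalue()
```

Using csv.writer rather than joining strings by hand means a stray comma in a garbled description gets quoted instead of shifting the price into the wrong column.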

On the off-chance it will be useful to someone else, I’ve posted the code here. Using my script will also require the source code for pyimagesearch, which requires submitting an e-mail address.

Brian Jackson

Professor of Physics at Boise State University
