Monday, June 17, 2019

How to Detect Text in Images

Images are a great way to communicate without text, but images are often used (and abused) to spread text in social media posts and advertisements. Text in images also presents an accessibility issue. For any number of reasons, it’s important to be able to detect text in image files. The open source tool that makes detecting text in images possible is Tesseract OCR!



I recommend using Homebrew to install tesseract:



brew install tesseract


To read text from an image with tesseract, run the following from the command line:



tesseract ~/Downloads/MyImage.png ~/Downloads/MyImage -l eng



The command above extracts detected text in the English language (-l eng) into a text file. The second argument is an output base name rather than a file name: tesseract appends .txt to it, so the result lands in MyImage.txt. The process is very quick and more than 100 languages are supported.
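The -l flag only helps if the requested language data is installed locally, so it can be worth checking first. Below is a minimal Python sketch that asks tesseract for its installed languages through the --list-langs option. The function names parse_language_list and installed_languages are made up for illustration, and the parsing assumes the first line of output is a header followed by one language code per line, which may differ between tesseract versions.

```python
import subprocess


def parse_language_list(raw_output):
    """Extract language codes from the text printed by `tesseract --list-langs`.

    Assumption: the first line is a header (something like
    "List of available languages (3):") and every following
    non-empty line is a single language code.
    """
    lines = [line.strip() for line in raw_output.splitlines()]
    return [line for line in lines[1:] if line]


def installed_languages():
    """Ask the local tesseract install which codes can be passed to -l."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Fall back to stderr in case this build prints the list there.
    return parse_language_list(result.stdout or result.stderr)
```

With tesseract on your PATH, something like `"eng" in installed_languages()` tells you whether the English data is available before you run the OCR command.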



Let’s look at the following example:



The following text is detected:




International
‘Champions
Cup ~- TOUR SQUAD #AFCTour2018 CECH MUSTAFI GUENDOUZI oziL
LENO SOKRATIS NELSON IWOBI
MARTINEZ MAVROPANOS SMITHROWE = NKETIAH
BELLERIN OSEI-TUTU WILLOCK PEREZ
KOLASINAC ELNENY RAMSEY LACAZETTE
CHAMBERS MAITLAND-NILES MKHITARYAN AUBAMEYANG
HOLDING


Utilities in a number of programming languages plug into tesseract’s functionality, but it’s important to know the underlying tool! If you need an open source utility for detecting text in an image, tesseract is well worth taking advantage of.
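To give a rough idea of how such a utility can plug into the command line tool, here is a minimal Python sketch that shells out to tesseract and captures the detected text. It is not how any particular library works. The names build_command and image_to_text are hypothetical, and the sketch relies on tesseract accepting the word stdout in place of an output base to print the text instead of writing a file.

```python
import subprocess
from pathlib import Path


def build_command(image_path, lang="eng"):
    """Build the tesseract argument list for one image (illustrative helper)."""
    image = Path(image_path).expanduser()  # resolve a leading ~ like the shell would
    # Passing "stdout" as the output base makes tesseract print the text
    # rather than write <outputbase>.txt.
    return ["tesseract", str(image), "stdout", "-l", lang]


def image_to_text(image_path, lang="eng"):
    """Run tesseract on one image and return whatever text it detects."""
    result = subprocess.run(
        build_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout
```

Calling `image_to_text("~/Downloads/MyImage.png")` would then return the same text the command line run writes to its output file, assuming tesseract is installed and on your PATH.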


