[SwiftUI] Vocamera


1. Requirements

Project Objectives

While studying English, I found it tedious to write down every English word I didn't know in a book. I thought it would be useful to have an application that turns a page of a book into a vocabulary list. I searched the App Store, but there was no such application, so I made it myself!

Specification

version 1.0.0

  • Vocabulary
    • Create
    • Edit
    • Delete
  • Vocabulary Note
    • Create
    • Edit
    • Delete
  • Get Image from
    • Library
    • Camera
  • Convert Image to Words
  • Extract only the vocabulary words from all recognized words



2. Structure/Function

version 1.0.0

Roughly, Vocamera version 1.0.0 consists of the following:

  • 2 Models: Voca, VocaNote Entities - Core Data Classes & Properties
  • 4 Views: Voca, VocaNote, AddVocaNote, ImageRecognize
  • Image Process structs: Library, Scanner
  • TextRecognizer Class
  • TextProcess Class
  • Others: Utilities, Persistence…


(1) Models
Each entity has a Class and Properties.
Figure: Core Data model
(2) Views
ImageRecognizeView can access the photo library or the camera.
Figure: VocaNoteView (left), AddVocaNoteView (right)
Figure: VocaView (left), ImageRecognizeView (right)
(3) Image Process Structs
Each struct has its own Coordinator class and loads either the photo library or the scanner.
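A struct of this kind is typically a UIViewControllerRepresentable whose nested Coordinator class acts as the UIKit delegate. The following is a minimal sketch of the scanner side, not the app's actual source: the names ScannerView and onScan are hypothetical, and it assumes VisionKit's document camera is the scanner in question.

```swift
import SwiftUI
import VisionKit

// Hypothetical wrapper; ScannerView and onScan are illustrative names only.
struct ScannerView: UIViewControllerRepresentable {
    // Called with the finished scan, or nil if the user cancelled or scanning failed.
    var onScan: (VNDocumentCameraScan?) -> Void

    func makeCoordinator() -> Coordinator {
        Coordinator(onScan: onScan)
    }

    func makeUIViewController(context: Context) -> VNDocumentCameraViewController {
        let controller = VNDocumentCameraViewController()
        controller.delegate = context.coordinator   // the Coordinator receives the UIKit callbacks
        return controller
    }

    func updateUIViewController(_ uiViewController: VNDocumentCameraViewController,
                                context: Context) {}

    final class Coordinator: NSObject, VNDocumentCameraViewControllerDelegate {
        private let onScan: (VNDocumentCameraScan?) -> Void

        init(onScan: @escaping (VNDocumentCameraScan?) -> Void) {
            self.onScan = onScan
        }

        func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                          didFinishWith scan: VNDocumentCameraScan) {
            onScan(scan)
        }

        func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
            onScan(nil)
        }

        func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                          didFailWithError error: Error) {
            print("Scanning failed: ", error)
            onScan(nil)
        }
    }
}
```

The document camera does not dismiss itself, so the presenting view would normally close its sheet or fullScreenCover inside the onScan callback.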
(4) TextRecognizer Class, TextProcess Class
The TextRecognizer class takes UIImages or CGImages and transforms them into text. The TextProcess class removes whitespace and unnecessary words.
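To illustrate the kind of cleanup described here, the sketch below splits raw recognized text into words and drops whitespace, punctuation, tokens with digits, single letters, and stop words. It is only an approximation under stated assumptions: the function name cleanedWords and the stopWords list are hypothetical, and the app's real TextProcess class may work differently.

```swift
import Foundation

// Hypothetical stop-word list; the app's actual list is not shown in this post.
let stopWords: Set<String> = ["is", "are", "was", "have", "has", "the"]

/// Splits raw OCR text into lowercase words, dropping empty tokens, punctuation,
/// tokens that contain digits, single-character tokens, and stop words.
func cleanedWords(from text: String) -> [String] {
    let separators = CharacterSet.whitespacesAndNewlines.union(.punctuationCharacters)
    return text
        .components(separatedBy: separators)
        .map { $0.lowercased() }
        .filter { word in
            word.count > 1
                && word.rangeOfCharacter(from: .decimalDigits) == nil
                && !stopWords.contains(word)
        }
}

let sample = "The  cat has 2 kittens,\nand chapter3 is short."
print(cleanedWords(from: sample))
```

A plain stop-word list cannot tell a conjunction from a noun, which is one reason a lexical tagger is useful for the later validation step.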
(5) Others
UtilityViews, UtilityExtensions, Persistence



3. Implementation: Focus on the Problem-Solving

: This section summarizes, together with the source, the parts of the implementation that took a long time or still have room for improvement.

version 1.0.0

In version 1.0.0, the following were the main features that were difficult to implement or that I implemented for the first time.

➀ Use CoreData
➁ Use various views (NavigationView, fullScreenCover, sheet…) and a personal EditMode
➂ Take pictures or import existing images and convert them into text.
➃ Trim the text extracted from the image.
⑤ Determine the valid words in the trimmed text.

The code below shows the main functions needed to implement these features.

Main problems

  • Transform Image to Texts
  • Handling text

Functions

➀ View: Switch Screen, Personal EditMode

: NavigationView, fullScreenCover, sheet…

  • Personal EditMode
@Environment(\.editMode) private var editMode
@State private var mode: EditMode = .inactive
...
...
...
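.onChange(of: mode, perform: { newMode in
    if newMode == .active {
        // Entering edit mode: snapshot the note so the edit can be reverted.
        editNote = EditNote(vocaNote: parentVocaNote, viewContext: viewContext)
    } else if newMode == .inactive {
        // Leaving edit mode: reject an empty title, restore the old one,
        // show an alert, and stay in edit mode.
        if parentVocaNote.title.isEmpty {
            parentVocaNote.title = editNote.beforeNoteTitle
            showNoteTitleAlert = true
            self.mode = .active
        }
    }
})
.navigationBarItems(leading: mode == .active ? cancelButton : nil)
// Inject the local state so EditButton and child views share this edit mode.
.environment(\.editMode, $mode)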
...
...
...
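Because the fragment above elides its surrounding view, here is a self-contained sketch of the same pattern. It is not the app's code: NoteTitleEditor and all of its properties are hypothetical names, and it assumes the iOS 14/15-era onChange(of:perform:) and navigationBarItems APIs that the fragment itself uses.

```swift
import SwiftUI

// Hypothetical view; NoteTitleEditor and its properties are illustrative names.
struct NoteTitleEditor: View {
    @State private var mode: EditMode = .inactive
    @State private var title = "My Note"
    @State private var titleBeforeEditing = ""
    @State private var showEmptyTitleAlert = false

    var body: some View {
        NavigationView {
            Form {
                if mode == .active {
                    TextField("Title", text: $title)
                } else {
                    Text(title)
                }
            }
            .navigationBarItems(trailing: EditButton())
            .onChange(of: mode) { newMode in
                if newMode == .active {
                    // Remember the title so an invalid edit can be reverted.
                    titleBeforeEditing = title
                } else if title.isEmpty {
                    // Reject an empty title and return to edit mode.
                    title = titleBeforeEditing
                    showEmptyTitleAlert = true
                    mode = .active
                }
            }
            .alert(isPresented: $showEmptyTitleAlert) {
                Alert(title: Text("The title cannot be empty"))
            }
            // Share the local edit mode with EditButton and child views.
            .environment(\.editMode, $mode)
        }
    }
}
```

Keeping the edit mode in local @State, rather than only reading it from the environment, is what lets the view switch itself back to .active when validation fails.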

➁ Transform Images to text

  • Input: UIImages or CGImages (In case of UIImage, transform into CGImage)
  • recognizeText function: Transform CGImage into text using DispatchQueue and completionHandler.
    // Requires: import UIKit, CoreImage, Vision, VisionKit
    func recognizeText(withCompletionHandler completionHandler: @escaping ([String]) -> Void) {
        queue.async {
            var cgImages = [CGImage]()
            if let uiImages = self.uiImages {
                // Photo-library input: convert each UIImage to a CGImage, skipping failures
                // instead of force-unwrapping. One CIContext is reused for all images.
                let ciContext = CIContext(options: nil)
                cgImages = uiImages.compactMap { uiImage -> CGImage? in
                    guard let ciImage = CIImage(image: uiImage) else { return nil }
                    return ciContext.createCGImage(ciImage, from: ciImage.extent)
                }
            } else if let cameraScan = self.cameraScan {
                // Scanner input: take the CGImage of every scanned page.
                cgImages = (0..<cameraScan.pageCount).compactMap {
                    cameraScan.imageOfPage(at: $0).cgImage
                }
            }
            // Run one text-recognition request per page and join its lines.
            let textPerPage = cgImages.map { image -> String in
                let request = VNRecognizeTextRequest()
                let handler = VNImageRequestHandler(cgImage: image, options: [:])
                do {
                    try handler.perform([request])
                    guard let observations = request.results else { return "" }
                    return observations
                        .compactMap { $0.topCandidates(1).first?.string }
                        .joined(separator: "\n")
                } catch {
                    print("Failed to recognize the text: ", error)
                    return ""
                }
            }
            // Deliver the result on the main queue so the caller can update the UI.
            DispatchQueue.main.async {
                completionHandler(textPerPage)
            }
        }
    }
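The function above creates each VNRecognizeTextRequest with its default settings. As a possible refinement, and not something this post states the app does, the request can be configured for English text. The helper name makeEnglishTextRequest is hypothetical; the three properties it sets are part of the Vision framework.

```swift
import Vision

/// Builds a text-recognition request configured for English text.
/// (Hypothetical helper; not part of the original source.)
func makeEnglishTextRequest() -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate       // favor accuracy over speed
    request.recognitionLanguages = ["en-US"]   // restrict recognition to English
    request.usesLanguageCorrection = true      // apply language-based correction
    return request
}
```

Language correction can help with ordinary prose, but it may also alter rare or technical words, so it is worth checking against real book pages.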

➂ Trim text & Validation

  • Trim: Using NLTagger's options & a personal NLTag extension.
extension NLTag {
    // True if the word is a be/have verb or another word listed for removal.
    func isBeHaveVerb(word: String) -> Bool {
        return NLTagConstants.beVerb.contains(word)
            || NLTagConstants.haveVerb.contains(word)
            || NLTagConstants.otherEraseWords.contains(word)
    }
    // True if the word contains at least one decimal digit.
    func containsNumbers(word: String) -> Bool {
        return word.rangeOfCharacter(from: CharacterSet.decimalDigits) != nil
    }
    ...
    ...
    ...
}
  • Validation: NLTagger's enumerateTags
...
...
...
tagger.enumerateTags(in: sentences.startIndex..<sentences.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    let word = "\(sentences[tokenRange])"
    // Skip short tokens, tokens with digits, be/have verbs, and unwanted lexical classes.
    guard let tag = tag, word.count > 1,
          !tag.containsNumbers(word: word), !tag.isBeHaveVerb(word: word),
          tag != .conjunction, tag != .number, tag != .pronoun, tag != .preposition
    else { return true }
    words.append(word)
    return true
}
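Since the tagger setup is elided above, the following self-contained sketch shows one way the whole validation step could fit together. It rests on assumptions: the function name vocabularyWords is hypothetical, the options [.omitPunctuation, .omitWhitespace] are a guess at the elided options value, and the be/have-verb and digit checks from the extension are left out for brevity.

```swift
import NaturalLanguage

/// Returns the words in `text` that are longer than one character and whose
/// lexical class is not a conjunction, number, pronoun, or preposition.
func vocabularyWords(in text: String) -> [String] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    // Assumed options; the original post does not show them.
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
    let excluded: Set<NLTag> = [.conjunction, .number, .pronoun, .preposition]
    var words = [String]()
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: options) { tag, tokenRange in
        let word = String(text[tokenRange])
        if let tag = tag, word.count > 1, !excluded.contains(tag) {
            words.append(word)
        }
        return true   // keep enumerating
    }
    return words
}

print(vocabularyWords(in: "She bought three apples and a bicycle."))
```

The exact words returned depend on the tags the system language model assigns, so the result may vary between OS versions.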



4. Result

version 1.0.0

  • Period: May 15, 2022 ~ July 6, 2022

  • App Store

Vocamera 1.0.0 on the App Store

  • Vocamera 1.0.0 (video)
