1. Description
We need to build a template-based document flow that automates template creation and template-based text extraction.
The process for a first-time document is (steps marked "(you)" are your responsibility):
1. Read the file and extract all text (you)
2. Look up the values in the table and compare them with the extracted text (you)
3. If no match is found, send the document to template creation, setting a value to 0 instead of 1 (you)
5. Manual labeling stores the coordinates for every label
6. Based on the coordinates, extract the text strings inside the boxes (with regex, for example) and store the values in the template table (you)
7. Read the document again, extract text based on the coordinates, compare it with the media_template table, and store the results in the media table (you)
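To make steps 6 and 7 concrete, here is a minimal Python sketch of coordinate-based extraction. It assumes OCR has already produced word boxes with left, top, width and height (the kind of per-word data Tesseract can output) and that each manual label is stored as a rectangle plus a regex. The names Word, LabelBox, words_in_box and extract_field are hypothetical and not part of any existing code. Database reads and writes are left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR word with its bounding box in page pixels (assumed input shape)."""
    text: str
    left: int
    top: int
    width: int
    height: int


@dataclass(frozen=True)
class LabelBox:
    """One manually labeled region plus the regex expected inside it (hypothetical)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int
    pattern: str


def words_in_box(words, box):
    """Return the words whose centre point falls inside the box."""
    inside = []
    for w in words:
        cx = w.left + w.width / 2
        cy = w.top + w.height / 2
        if box.left <= cx <= box.right and box.top <= cy <= box.bottom:
            inside.append(w)
    # Simplified reading order: top to bottom, then left to right.
    # Real OCR output may need line grouping with a tolerance on `top`.
    return sorted(inside, key=lambda w: (w.top, w.left))


def extract_field(words, box):
    """Join the words inside the box and return the first regex match, or None."""
    text = " ".join(w.text for w in words_in_box(words, box))
    match = re.search(box.pattern, text)
    return match.group(0) if match else None


# Made-up example data, for illustration only.
words = [
    Word("Invoice", 10, 10, 60, 12),
    Word("No:", 75, 10, 25, 12),
    Word("INV-2041", 105, 10, 70, 12),
    Word("Total", 10, 200, 40, 12),
]
box = LabelBox("invoice_number", 0, 0, 200, 30, r"INV-\d+")
print(extract_field(words, box))  # INV-2041
```

Testing the centre point rather than requiring full containment keeps a word in the box when the manual label is drawn slightly too tight. Whether that tolerance is wanted here is an open design choice.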
When a document arrives for the second or later time:
1. Upload the document (already done)
2. Detect whether a template already exists. Extract the text strings with regex using the coordinates, and store the strings in the media table
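The template check could look roughly like the Python sketch below. The posting does not say how a template is recognised, so this assumes each template stores a few identifying strings that must all appear in the extracted text. The function names, the has_template flag and the in-memory dict standing in for the template table are hypothetical. In the real flow the data would come from MySQL.

```python
def normalize(text):
    """Lower-case and collapse whitespace so OCR line breaks do not block a match."""
    return " ".join(text.lower().split())


def detect_template(page_text, templates):
    """Return the id of the first template whose anchor strings all occur in the text.

    `templates` maps a template id to a list of anchor strings (assumed layout).
    Returns None when no template matches.
    """
    haystack = normalize(page_text)
    for template_id, anchors in templates.items():
        if anchors and all(normalize(a) in haystack for a in anchors):
            return template_id
    return None


def route_document(page_text, templates):
    """Decide between extraction (flag 1) and template creation (flag 0)."""
    template_id = detect_template(page_text, templates)
    return {
        "template_id": template_id,
        "has_template": 1 if template_id is not None else 0,
    }


# Made-up example data, for illustration only.
templates = {7: ["ACME Corp", "Invoice No"]}
known = route_document("ACME  Corp\nInvoice No: 1", templates)
unknown = route_document("Some other letter", templates)
print(known)    # {'template_id': 7, 'has_template': 1}
print(unknown)  # {'template_id': None, 'has_template': 0}
```

A document routed with flag 0 would go to manual labeling. One routed with flag 1 would go straight to coordinate-based extraction.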
Your job is to implement the steps above quickly.
2. Skills
Python
MySQL
5+ years of experience
You must have done something similar before, know what regex and Tesseract are, and have used both several times. You have worked with computer vision, ML, DL, or NN.