Type: Optical character recognition (OCR) [citation needed]; ICR [citation needed]; handwriting recognition; redaction
Captricity is a data capture software program, and the company that sells it, that combines machine learning with human verification to perform OCR [citation needed] data capture from hand-filled forms.
Captricity was developed in the Code for America incubator program and is used by government agencies, health clinics, global health practitioners, and researchers such as NYU's Center for Technology and Economic Development [citation needed].
Captricity was founded in 2011 by Kuang Chen and former Harvey Danger musician Jeff J. Lin. The idea for Captricity came from Chen’s PhD dissertation at UC Berkeley. His research focused on data-centric approaches to increasing the efficiency of low-resource organizations so that they could better serve disadvantaged clients.
Captricity is currently headquartered in downtown Oakland, CA,[1] and, according to its LinkedIn profile, has 51-200 employees.[2]
Captricity relies on crowdsourcing, parceling out OCR verification tasks to human operators.[3] Captricity claims that its technology achieves 99.9% accuracy.[4] Captricity’s machine learning components combine OCR, intelligent character recognition (ICR) and optical mark recognition (OMR) [citation needed].
Captricity captures handwritten information from forms and uses it to populate searchable spreadsheets, such as .csv files that can be opened in Excel. Captricity does not support unstructured data.
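As an illustration of this kind of structured output, the following minimal Python sketch writes captured field values to CSV text using the standard `csv` module. The record layout, the field names, and the `records_to_csv` helper are hypothetical and are not taken from Captricity's software.

```python
import csv
import io


def records_to_csv(records, field_names):
    """Serialize captured form records (one dict per form) as CSV text.

    Hypothetical helper for illustration; not part of any Captricity API.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names)
    writer.writeheader()
    for record in records:
        # Fields missing from a form are written as empty cells.
        writer.writerow({name: record.get(name, "") for name in field_names})
    return buffer.getvalue()


# Made-up example data standing in for values captured from two forms.
captured = [
    {"form_id": "001", "name": "A. Example", "age": "34"},
    {"form_id": "002", "name": "B. Sample"},
]
csv_text = records_to_csv(captured, ["form_id", "name", "age"])
```

Each form becomes one row and each form field one column, which is why a fixed, structured template is needed for this approach.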
To maintain the privacy of the information in the forms, each form is “shredded” into distinct fields, and each field is verified by one or more different people.[5] Captricity claims that because no one person can see more than one field from a document, privacy is maintained. Captricity uses Amazon Mechanical Turk to perform this human verification step.[6] For example, a worker may see a stream of 4-digit numbers without knowing that they are the last four digits of a collection of US Social Security numbers.
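The shredding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `shred_forms` and `assign_snippets` functions, the form layout, and the rule of routing each field name to a single worker are all hypothetical, and they do not describe Captricity's actual implementation or the Mechanical Turk API.

```python
from collections import defaultdict


def shred_forms(forms):
    """Split each form into independent (form_id, field_name, value) snippets."""
    return [
        (form_id, field_name, value)
        for form_id, fields in forms.items()
        for field_name, value in fields.items()
    ]


def assign_snippets(snippets, workers):
    """Route every field name to exactly one worker.

    Because a worker only ever receives one kind of field, no worker sees
    more than one field from any single form. This routing rule is an
    assumption made for illustration, not Captricity's documented method.
    """
    field_names = sorted({field_name for _, field_name, _ in snippets})
    if len(field_names) > len(workers):
        raise ValueError("need at least one worker per distinct field")
    worker_for_field = dict(zip(field_names, workers))

    queues = defaultdict(list)
    for form_id, field_name, value in snippets:
        # form_id is kept for reassembly; a worker would be shown only the value.
        queues[worker_for_field[field_name]].append((form_id, value))
    return dict(queues)


# Made-up example forms.
forms = {
    "form-1": {"name": "A. Example", "ssn_last4": "1234"},
    "form-2": {"name": "B. Sample", "ssn_last4": "5678"},
}
queues = assign_snippets(shred_forms(forms), ["worker-a", "worker-b"])
```

In this sketch the worker handling `ssn_last4` receives only a stream of 4-digit values, with nothing linking them to the names handled by the other worker.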
Captricity performs redaction in addition to OCR. Redaction is a service in which any field or collection of fields can be “blacked out” in the document template.[7] Any information contained in those fields will not be read by the system. For example, if a courthouse wants to release its records to the public but keep the arresting officer’s name private, the field containing this information can be redacted.
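One way to picture template-based redaction is as blacking out rectangular regions of a page image before any recognition runs. The Python sketch below assumes a grayscale image stored as a list of pixel rows and a template that maps field names to bounding boxes; the `redact` function, the template format, and the field names are hypothetical and are not drawn from Captricity's product.

```python
def redact(image, template, redacted_fields, fill=0):
    """Return a copy of a grayscale page image with the listed fields blacked out.

    image: list of rows of pixel values (0 = black, 255 = white).
    template: maps field name -> (top, left, bottom, right), with bottom and
        right exclusive. This layout is a hypothetical format.
    """
    result = [row[:] for row in image]  # copy so the original page is untouched
    for name in redacted_fields:
        top, left, bottom, right = template[name]
        for y in range(top, bottom):
            for x in range(left, right):
                result[y][x] = fill
    return result


# A tiny 4x6 all-white "page" with two hypothetical fields.
page = [[255] * 6 for _ in range(4)]
template = {"officer_name": (0, 0, 2, 3), "case_number": (2, 3, 4, 6)}
redacted_page = redact(page, template, ["officer_name"])
```

Because the region is overwritten before any later step sees the page, nothing downstream can read the redacted field, while the `case_number` region is left intact.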
Non-profit and academic researchers often use survey research for the monitoring and evaluation of their programs or projects. The Center for Effective Global Action (CEGA), which is affiliated with UC Berkeley, announced a partnership with Captricity in August 2012.[8] Captricity donates digitization services to non-profits [citation needed] via its Data for Communities program, and offers discounts to non-profit organizations such as CEGA members.