UiPath can handle normal, structured documents like forms and editable PDFs, but data cannot be extracted from some unstructured documents with normal instructions. Non-standard documents are the most challenging for us. For example, let's say you have a hard copy of a receipt from a store, and you want to keep a record of it. You need to extract specific types of data and store them in Excel. Document Understanding can deal with these documents. You can configure it to scan the receipt and identify the data you're interested in.
We can provide a set of optimizations, classifications, and preconfigurations before we process the document. We created a predefined taxonomy that these kinds of documents conform to, which also serves our security purposes. Using the taxonomy, Document Understanding can first classify the type of document, then identify the arguments or variables we want to use and the data we need to extract or store. It can also scan a written document and identify whether a signature is present.
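To make the idea of a taxonomy concrete, here is a minimal Python sketch of document types and the fields expected for each. It is not UiPath's actual taxonomy format; the class names, field names, and data types are illustrative assumptions only.

```python
from dataclasses import dataclass, field


# Hypothetical structures for illustration; not UiPath's taxonomy format.
@dataclass
class FieldDef:
    name: str
    data_type: str  # e.g. "text", "date", "number", "boolean"


@dataclass
class DocumentType:
    name: str
    fields: list[FieldDef] = field(default_factory=list)

    def field_names(self) -> list[str]:
        return [f.name for f in self.fields]


# Assumed example taxonomy: one entry per document type we expect to receive.
TAXONOMY = {
    "receipt": DocumentType("receipt", [
        FieldDef("store_name", "text"),
        FieldDef("purchase_date", "date"),
        FieldDef("total", "number"),
    ]),
    "contract": DocumentType("contract", [
        FieldDef("buyer", "text"),
        FieldDef("seller", "text"),
        FieldDef("property_address", "text"),
        FieldDef("signature_present", "boolean"),
    ]),
}


def fields_to_extract(doc_type: str) -> list[str]:
    """Return the field names the taxonomy defines for a classified document type."""
    return TAXONOMY[doc_type].field_names()
```

Once a document has been classified as, say, a receipt, `fields_to_extract("receipt")` tells the rest of the process which values to look for.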
We keep a person in the loop because we can't rely 100 percent on the extraction. Document Understanding uses OCR, which sometimes struggles with handwritten material. For example, it might mistake a six for a five. There must be a human in the loop to ensure quality. The device will send the document to the Validation Station on your mobile phone. The bot will learn from the choices you make, and it will be more accurate the next time.
Document Understanding helps us reduce human error. It can reduce the time staff spend on some tasks, but the amount of time saved depends on a few factors. We still need to validate the data before proceeding because we sometimes collect and share sensitive data for our clients. We need a validation step in between to check the data before we send it.
One benefit of Document Understanding is machine learning. As we process more data, we train Document Understanding to classify information more accurately. Document Understanding can extract and interpret information in a way similar to how a human can. A human can read a paragraph and distinguish between types of information, but our UiPath bots can't do that on their own. Document Understanding integrates with artificial intelligence to interpret the information within a document.
The newer versions of Document Understanding can integrate with ChatGPT or other generative AI tools so that they can better interpret the information autonomously, and we don't need to create the taxonomy or classify the documents. We only need to give a prompt and input the document.
It will read documents similar to the way a human would. Let's use a contract as an example. You want to extract data like the buyer, seller, property address, etc. It will take the information from the document and give it to you. It can also scan for checkboxes and identify which ones are checked, but there are some limitations.
It uses a document object model to map which data is on which page of the document. For example, let's say the data you are interested in is on the third page of the document. The model knows where the data is, so it jumps directly to that particular page and extracts the information. The mapping is very accurate.
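The page-mapping idea can be pictured with a small Python sketch. This is not UiPath's document object model; it is a simplified stand-in in which each page is a dictionary of already-recognized fields, and the function names are hypothetical.

```python
def build_page_index(pages: list[dict[str, str]]) -> dict[str, int]:
    """Map each field name to the 1-based number of the first page it appears on."""
    index: dict[str, int] = {}
    for page_number, page_fields in enumerate(pages, start=1):
        for name in page_fields:
            index.setdefault(name, page_number)  # keep the first occurrence only
    return index


def lookup(pages: list[dict[str, str]], index: dict[str, int], name: str) -> tuple[int, str]:
    """Jump straight to the page that holds the field instead of scanning every page."""
    page_number = index[name]
    return page_number, pages[page_number - 1][name]


# Assumed example: a three-page contract with one field of interest per page.
pages = [
    {"buyer": "A. Rao"},
    {"seller": "B. Sen"},
    {"property_address": "12 Main St"},
]
index = build_page_index(pages)
page, value = lookup(pages, index, "property_address")
```

Here the lookup goes directly to page three for the property address, which mirrors the behavior described above.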
We always use attended processes because it's a good practice. The bot can do it without a human in the loop, but I would only do that if I were certain about which information I wanted to extract. If you're working with a handwritten document or signatures, you need a human in the loop to validate the data and help the machine learning component learn the difference between correct and incorrect information.
The time required for the validation process varies depending on the number of fields. For a small number, it only takes two or three minutes. When you have more fields, it may take a little longer to create and configure the Document Understanding model. You need to create the taxonomy, classifications, and model.
The validation process is easy. The Validation Station shows you the extracted data on one side and the document on the other, so you can easily scroll down and check if the data is accurate. If it is, you just need to click a checkbox. If it isn't, you have the option to add an exception. Based on that exception, you can create multiple conditions for how to address the same issue if it happens again.
Document Understanding is about 75-100 percent accurate depending on the type of document, and the accuracy increases as you train the model.
I would like to see more integration of artificial intelligence. That's being implemented, and it would be a massive improvement to the solution's document processing. If UiPath achieves intelligent document processing, it will be far better than anything on the market. There are currently some limitations with the fields that could be addressed using a GPT engine. With an integrated AI model, you wouldn't need to create your taxonomy. You would only need to provide some prompts, such as "What is the property name?" It would store the answer as a variable.
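As a rough illustration of prompt-based extraction, the Python sketch below maps each prompt to a variable name and stores the answer under it. Everything here is hypothetical: `ask` stands in for whatever generative AI call would be used, and `fake_ask` is only a toy substitute that reads "Label: value" lines so the example runs without any external service.

```python
from typing import Callable

# Assumed prompts; the first comes from the example above, the others are illustrative.
PROMPTS = {
    "property_name": "What is the property name?",
    "buyer": "Who is the buyer?",
    "seller": "Who is the seller?",
}


def extract_with_prompts(document_text: str, ask: Callable[[str, str], str]) -> dict[str, str]:
    """Ask one question per variable and store each answer under the variable's name."""
    return {variable: ask(document_text, question) for variable, question in PROMPTS.items()}


def fake_ask(document_text: str, question: str) -> str:
    """Toy stand-in for a model call: returns the value of a matching 'Label: value' line."""
    for line in document_text.splitlines():
        label, separator, value = line.partition(":")
        label = label.strip().lower()
        if separator and label and label in question.lower():
            return value.strip()
    return ""


document = "Property name: Oak Villa\nBuyer: A. Rao\nSeller: B. Sen"
variables = extract_with_prompts(document, fake_ask)
```

With a real model behind `ask`, the prompts would replace the taxonomy as the description of what to extract.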
I started using Document Understanding six months ago.
In the community version, there is a limit on data extraction using a form-based extractor. There are also limitations on digitization in the community version. You can do only 50 or so in one hour. The enterprise version can handle a larger volume of data, but we aren't dealing with huge amounts of data. We can still use multiple types of extractors in the same document, which allows you to scale. If I'm confident in how the model is processing a particular field, it can be adopted into the regular business structure and reused.
I was involved in the deployment only as a developer. We created the taxonomy and the model for Document Understanding, then tested multiple cases with multiple documents. We checked which extractor would be the best fit for a particular value. We can classify it according to the values we want, and we can also set an accuracy level. We can set a confidence level for each variable, so the confidence is different for a regular extractor versus a complex one. I set the confidence level high on the regular extractor.
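The confidence levels can be thought of as a simple routing rule: values at or above the threshold pass through, and everything else goes to a person for validation. The Python sketch below shows that logic under stated assumptions. The threshold numbers, extractor labels, and names are hypothetical and are not UiPath settings.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extractor
    extractor: str     # "regular" or "complex" (assumed labels)


# Assumed thresholds: set high for the regular extractor, lower for the complex one.
THRESHOLDS = {"regular": 0.90, "complex": 0.70}


def route(fields: list[ExtractedField]) -> tuple[list[ExtractedField], list[ExtractedField]]:
    """Split fields into auto-accepted ones and ones that need human validation."""
    accepted, needs_review = [], []
    for f in fields:
        if f.confidence >= THRESHOLDS[f.extractor]:
            accepted.append(f)
        else:
            needs_review.append(f)
    return accepted, needs_review


fields = [
    ExtractedField("total", "42.50", 0.95, "regular"),
    ExtractedField("purchase_date", "2024-01-05", 0.85, "regular"),
    ExtractedField("signature_present", "yes", 0.65, "complex"),
]
accepted, needs_review = route(fields)
```

In this example only the total is accepted automatically; the date and the signature check would be sent for validation.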
Initially, the deployment is somewhat complicated for a developer. However, it gets easier once you understand everything. We didn't need a consultant. I could complete the job by myself. It isn't rocket science. UiPath Academy has a free course on Document Understanding. Anyone can use it for free.
We use the free community version. Anybody can use it, but it has some subtle limitations. The enterprise license gives you far better results without limitations.
Document Understanding can handle handwriting and signatures in most cases. The community version limits handwritten document processing, but it's enough for our needs and gives us the correct data every time.
I haven't worked with any other document processing solution besides UiPath. I researched some tools, but Document Understanding seemed like the best fit for me, so I used it.
I rate UiPath eight out of ten. I deduct two points because creating the configurations can be time-consuming.
Thank you for your valuable review, Biswajeet.