PPDM Association Events are
specifically designed to facilitate collaborative idea
sharing, discussion, and networking. Learn from your
colleagues’ experiences, ask questions, build
relationships and make key face-to-face connections. Engage
in dynamic sessions and attend an information-packed
Neil Constantine with DataCo will present a talk.
Improved technology accessibility, coupled with an E&P industry under pressure to reduce costs and shorten cycle times, has led to increased interest in data analytics and machine learning. These techniques have been successfully applied in other industries, yet E&P has been slow to adopt them, in part due to our large volumes of unstructured data. Estimates predict exponential growth in unstructured data by the next decade, whilst data analytics has been highlighted as a top area of focus for oil and gas companies, with the potential to automate up to 30% of existing activities. Despite the many challenges, this paper describes a real process for leveraging information from unstructured data, which contain nuance and context not seen in traditional repositories.
Combining E&P experience with lessons from outside our industry, this process uses taxonomic analysis, optical character recognition (OCR) and natural language processing (NLP) to ‘pull’ information from subsurface documents and place it at users’ fingertips. The value is twofold: high-performance computing accelerates this process from weeks to hours, and the additional information at an early stage helps manage subsurface uncertainty and focus interpreter effort.
The process applies taxonomic analysis to file structures and metadata to rapidly identify which data are relevant to the interpreter. These may be machine-readable, e.g. Word documents, or may need additional effort, e.g. scanned legacy reports, in which case an OCR process is applied to convert the scanned images into text. Pre-processing techniques, including document straightening and image manipulation, improve the OCR, whilst post-processing, including pattern-based substitution and spell-checking using domain-specific lexicons, minimises errors within the processed text. Additionally, these lexicons can be expanded through the use of text analytics and iterative post-processing to further improve content readability.
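The post-processing step described above can be illustrated with a minimal sketch. It is not the presenters' implementation: the lexicon, the substitution rules, the similarity cutoff and the function names are all assumptions made for illustration, and the spell-check uses fuzzy matching from Python's standard library rather than any specific tool named in the abstract.

```python
import difflib
import re

# Hypothetical domain lexicon; a real one would be far larger.
LEXICON = ["sandstone", "mudstone", "porosity", "permeability", "formation"]

# Hypothetical pattern-based substitutions for common OCR confusions in numbers.
SUBSTITUTIONS = [
    (re.compile(r"(?<=\d)[Oo](?=\d)"), "0"),  # letter O between digits -> zero
    (re.compile(r"(?<=\d)[lI](?=\d)"), "1"),  # letter l or I between digits -> one
]


def correct_word(word, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon term, or the word unchanged if none is close."""
    if not word.isalpha() or word.lower() in lexicon:
        return word
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def clean_ocr_text(text):
    """Apply the substitutions, then lexicon-based correction word by word."""
    for pattern, replacement in SUBSTITUTIONS:
        text = pattern.sub(replacement, text)
    # Split on runs of non-word characters, keeping them so spacing is preserved.
    tokens = re.split(r"(\W+)", text)
    return "".join(correct_word(token) for token in tokens)


print(clean_ocr_text("sandstcne porosity at 1O5 m"))
```

Corrected words come back in lower case in this sketch, and the cutoff trades missed corrections against false ones, so the value would need tuning against real documents.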
NLP techniques are then applied to both OCR’d documents and other relevant machine-readable files to identify key themes, e.g. formation tops. These themes may be defined by an interpreter, or they may emerge from meaningful occurrences of particular text. They can then be fed into searches against external information stores to report additional information, such as analogues.
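As a rough illustration of an interpreter-defined theme, the sketch below pulls formation tops out of free text using a rule-based pattern. This is a simple stand-in for the NLP step, not the method used in the talk: the phrasing it expects, the function name and the sample sentence (with invented formation names and depths) are all assumptions.

```python
import re

# Assumed phrasing: "top [of] [the] <Name> Formation ... <depth> m".
TOP_PATTERN = re.compile(
    r"[Tt]op\s+(?:of\s+)?(?:the\s+)?"
    r"(?P<name>[A-Z][a-z]+(?:\s[A-Z][a-z]+)*?)\s+Formation"
    r"\D{0,40}?(?P<depth>\d+(?:\.\d+)?)\s*m\b"
)


def find_formation_tops(text):
    """Return (formation name, depth in metres) pairs found in the text."""
    return [(m.group("name"), float(m.group("depth")))
            for m in TOP_PATTERN.finditer(text)]


# Invented sample text, for illustration only.
sample = ("The top of the Alpha Formation was encountered at 312.5 m. "
          "Top Beta Formation at 1204 m MD.")
print(find_formation_tops(sample))
```

A pattern like this only finds phrasings it was written for; themes that are revealed from the text itself would need statistical or model-based techniques instead.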
Finally, the process identifies tabular data in source documents and extracts them to application-specific load sheets for review by the interpreter, having applied logical QC to identify errors in the source document or the OCR process. Note that the fidelity of these load sheets depends on both source document quality and the limitations of the OCR process, so we would not advocate use of these data without interpreter review.
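The kind of logical QC mentioned above might look like the following sketch, which flags rows of an extracted interval table for interpreter review. The abstract does not specify the checks, so the table layout, the field names and the three rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One row of a hypothetical extracted table of formation intervals."""
    name: str
    top_m: float
    base_m: float


def qc_intervals(rows):
    """Return (row index, message) flags for rows that fail simple logical checks."""
    flags = []
    previous_base = None
    for i, row in enumerate(rows):
        if row.top_m < 0 or row.base_m < 0:
            flags.append((i, "negative depth"))
        if row.top_m >= row.base_m:
            flags.append((i, "top is not shallower than base"))
        if previous_base is not None and row.top_m < previous_base:
            flags.append((i, "overlaps previous interval"))
        previous_base = row.base_m
    return flags


# Invented rows: the second base depth mimics a digit transposition.
rows = [
    Interval("A", 100.0, 250.0),
    Interval("B", 250.0, 205.0),
    Interval("C", 400.0, 520.0),
]
print(qc_intervals(rows))
```

Checks like these only flag rows; they cannot tell whether the error came from the source document or from the OCR, which is why the flagged rows still go to the interpreter.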
This example has demonstrated a combination of E&P experience with automation and analytics, using accessible technology to support the geoscientist in better understanding what is happening in a basin or well. It overcomes historical limitations of time and capacity, allowing large volumes of data to be assimilated in a shortened timeframe, and, using an affordable architecture, delivers value to the E&P industry.
Location and Timing
This luncheon will take place at the Shell offices - please go to the 30th floor of 275 George Street, where you will be accompanied to the Annex Building, Level 3 Mudstone Meeting Room.
Registration will take place from 12:00 to 12:15 outside
the meeting room. Lunch will be served at 11:30. If you
have any food allergies please let us know during the
registration process. If you find you are no longer able to
attend, please advise us of your cancellation as soon as
possible to free up a space for another attendee.
Thanks to all of you who
support these events, which are aimed at enhancing your
ability to deliver better data management practices within
your company and at building a data management community in the Brisbane area. Special thanks to our
sponsors for hosting this event. Without the support of
sponsors, events of this nature would not be possible.
Thank you to our Event Sponsors
If you would like more information on the PPDM
Association events or opportunities for sponsorship, please
feel free to contact us.