Best Practices for Managing Data, from Acquisition to Archive

Article Written By Kristy DeMarco, Director of PLM and Vertical Markets for Lyve Mobile Services, Seagate Technology and Tammy M. Weir, President at Weir Consulting Services LLC

Data-intensive workflows at the field level have always presented challenges in the Energy Industry. As data sets collected in the field have grown exponentially, data management challenges have both evolved and compounded. This article will address these pain points and discuss strategies and solutions that should be implemented to ensure data compliance and entitlement and promote ease of accessibility, both internally and externally.

I. Introduction

For every segment of the Energy Industry (upstream, midstream, and downstream), aggregating, transporting, and analyzing critical data faster is one element that gives companies a competitive advantage. For upstream applications specifically, time is of the essence when it comes to data. Whether the data is used to assess subsurface geologic features or to identify potential risks that may exist downhole or in surface facilities, it is crucial to capture and process large digital datasets with the highest possible speed and integrity.

The following sections of this paper discuss the various phases involved in upstream data management, including:

  • Pre-Survey Leasing, Permitting, and Contract Management
  • Data Acquisition in the Field
  • The Challenges Presented by Raw, Unstructured Mass Data Sets
  • Optimizing Legacy Data

After addressing the key issues that data management operators typically encounter in each of these phases, this paper introduces best practices designed to resolve them.


II. Workflow Phase 1: Seismic Data Acquisition

Every aspect of exploration and production company operations—from geophysical surveys to drilling, hydraulic fracturing, logging and coring, and well testing—depends on geophysical data. Managing that data, however, has become increasingly complex, as the evolution of data acquisition technologies has outpaced approaches to data management in the field. In order to build a focused and sustainable data management process, it is important to implement a logical, systematic approach to data management processes, beginning at the seismic data acquisition phase.

Pre-Survey Leasing, Permitting, and Contract Management

Before a survey can even begin, leasing and permitting require efficiently reviewing, negotiating, finalizing, and executing seismic data acquisition contracts, all of which depends on a streamlined approach to managing legacy exploration and production data. On top of this, operators recognize the importance of geophysical and geological data entitlement, which makes accounting for potential legal ramifications and determining proactive data management solutions a top priority. Siloed workflows at this stage can restrict productivity and limit business opportunities, sometimes preventing a project from getting off the ground in the first place.

Data Acquisition in the Field

When it comes to the actual execution of a seismic shoot, whether on land or at sea, it is equally essential to adopt and maintain a continuous, iterative approach to data consolidation, copy, transfer, and ingest. Before energy companies can transform data into a deliverable, they must first establish a physical data storage strategy that provides straightforward and affordable scalability in the field, frictionless physical data transfer from edge to data center to cloud, and coordinated, efficient data management along the way.

Key issues:

  • Inability to scale (as data collection requirements increase)
  • Limited storage in the field for applications
  • Limited options in data transfer due to large data volumes
  • Data organization
  • Traceability throughout the data lifecycle


Data Management Best Practices

When it comes to managing data in the field, instead of recording seismic, survey, and surveillance data on hundreds of tapes, upstream operations can significantly reduce their hardware footprint by recording data to storage arrays, substantially increasing disk space while reducing IT support requirements at the edge and maximizing edge data storage resiliency.

To achieve these goals, upstream companies should use high-capacity, ruggedized, and securely encrypted storage arrays to create secure copies of raw data, so that one copy of the raw data is securely archived before any data is manipulated or moved. High-performance storage arrays not only enable operators to keep up with simultaneous recording, backup, and processing workflows; they also keep the data storage infrastructure footprint small, which makes physically shuttling data from field to ingest logistically straightforward. Securely encrypted drives that implement a user key management service (KMS) layer protect data both in the field and in transit against cyber and physical threats.
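
The practice of archiving one untouched copy of the raw data before anything else happens can be illustrated with a minimal Python sketch. It assumes raw files are ordinary files on a mounted volume; the function names `sha256_of` and `archive_raw_copy` are hypothetical and introduced here for illustration only. The sketch covers copy verification, not encryption or key management, which the storage device itself would handle.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_raw_copy(source: Path, archive_dir: Path) -> str:
    """Copy a raw file into archive_dir and verify the copy by checksum.

    Returns the checksum so it can be recorded in a transfer manifest.
    Refuses to overwrite an existing archived file (hypothetical policy).
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite archived file: {target}")
    source_digest = sha256_of(source)
    shutil.copy2(source, target)  # copies contents and file metadata
    if sha256_of(target) != source_digest:
        raise IOError(f"checksum mismatch after copying {source}")
    return source_digest
```

Recording the returned checksum alongside each file gives every later stage (transport, ingest, processing) a simple way to confirm that the data it received is the data that was recorded.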

Additionally, these operations should take advantage of a physical data transport service to simplify data management logistics. With a cost-effective data transfer services model, operators benefit from just-in-time device delivery to and from any location, which reduces logistical overhead and stretches existing budgets by ensuring individual projects pay only for the hardware they need. By using high-capacity, portable devices to aggregate mass datasets from the edge, operators can expect optimized performance in remote environments, robust data security, and scalable, efficient in-field storage for business insights. Implementing well-defined, consistent, and documented procedures to organize data management between parties also ensures data isn’t misplaced along the way.

III. Workflow Phase 2: Data Processing

There are generally two key issues that stymie post-acquisition data processing and visualization:

  • The first involves the type of data aggregated in the field;
  • The second involves the volume of data needing to be stored, moved, interpreted, and analyzed.


The Challenges Presented by Raw, Unstructured Mass Data Sets

Field data generally arrives unstructured (e.g., well logs, written drilling reports, CAD drawings) or semi-structured (e.g., ocean-floor models and simulations) and is typically collected in raw formats, using industry standards (SEGD, in the case of field seismic). This data, however, is of little use until it has been promptly extracted, processed on high-performance computing (HPC) systems to produce various products for geophysical and geological interpretation, delivered, and refined.

Processing at the edge has now advanced to a point that various stacking and fast-track products can be produced before the raw data reaches an offsite data center. However, the bulk of pre-stack migration, angle stacks, and depth-converted products are still produced in the onshore data centers before shipping to the end client. This necessitates moving data from the edge to the core, efficiently and securely, as it isn’t until then that product data (typically SEGY and SEGY gathers), often in the range of 20-30 product types, can be delivered to an energy company for reformatting for use within the energy industry’s preferred software suite.

Whether as a result of limited edge storage infrastructure or limited transmission bandwidth, many exploration and production efforts are stopped in their tracks at this stage. To accommodate the physical transfer of large and ever-growing data sets, geoscience solutions require rugged, scalable, and affordable edge storage infrastructure, built both for mass-capacity, vendor-agnostic storage at the edge and for frictionless physical transfer to any cloud service.

Key issues:

  • Enterprise data recording and management in the field
  • Difficulty transporting large data sets to data centers/the cloud
  • Data sets too large to be moved over satellite or 5G
  • Lags in time to data and time to insights related to mass data sets
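
A rough calculation shows why data sets of this size are hard to move over a network link. The Python sketch below is illustrative only: the function name `transfer_days` is hypothetical, and the dataset size and link speeds in the usage note are assumed figures, not measurements from any particular survey or carrier.

```python
def transfer_days(dataset_terabytes: float,
                  link_megabits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Estimate the days needed to move a dataset over a network link.

    dataset_terabytes uses decimal terabytes (1 TB = 10**12 bytes).
    efficiency is the assumed fraction of the nominal link rate that is
    actually sustained (1.0 means the full rate, which is optimistic).
    """
    if link_megabits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    dataset_bits = dataset_terabytes * 1e12 * 8
    sustained_bits_per_second = link_megabits_per_second * 1e6 * efficiency
    seconds = dataset_bits / sustained_bits_per_second
    return seconds / 86_400  # seconds per day
```

Under these assumptions, `transfer_days(100, 100)` comes to roughly 92.6 days for a 100 TB dataset on a fully sustained 100 Mbit/s link, and `transfer_days(100, 1000)` to roughly 9.3 days at 1 Gbit/s. Any real link would sustain less than its nominal rate, which lengthens both figures.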


Data Management Best Practices

In order to deliver immediate analysis, informed decision making, and enhanced potential for improved business outcomes, upstream companies should partner with a data storage expert that offers a high-capacity data transfer solution, designed to keep up with the demands of acquisition, processing, and visualization workflows. To do this, it is most efficient to utilize one device to both capture and transport raw field data to processing.

On the level of individual files, it is essential that upstream companies implement a versioning file system that allows individual files to exist in several versions at the same time, ensuring copies of seismic data are both sharable and protected. Utilizing devices that simplify multi-location and multi-node data consolidation also cuts the time it takes to move data from field to headquarters for ingest and analysis.
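
The idea of keeping several versions of one file side by side can be sketched at the application level in Python. This is a simplified illustration, not how any particular versioning file system works internally; the class name `VersionedStore` and the `v1`, `v2`, ... naming scheme are assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import List, Optional


class VersionedStore:
    """Keep every saved version of a file instead of overwriting it."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def versions(self, name: str) -> List[Path]:
        """Return the stored versions of a file, oldest first."""
        folder = self.root / name
        if not folder.is_dir():
            return []
        return sorted(folder.glob("v*"), key=lambda p: int(p.name[1:]))

    def save(self, source: Path) -> Path:
        """Store source as the next version and mark that copy read-only."""
        folder = self.root / source.name
        folder.mkdir(exist_ok=True)
        next_number = len(self.versions(source.name)) + 1
        target = folder / f"v{next_number}"
        shutil.copy2(source, target)
        target.chmod(0o444)  # protect the stored version from edits
        return target

    def latest(self, name: str) -> Optional[Path]:
        """Return the newest stored version, or None if there is none."""
        stored = self.versions(name)
        return stored[-1] if stored else None
```

Each call to `save` adds a new version and leaves earlier ones untouched, so one team can share the latest product while the original remains available for comparison or recovery.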

In the end, bypassing limited edge infrastructure, bandwidth limitations, and data management silos by organizing all data consolidation, transportation, and processing using one device not only improves time to data but also gives upstream companies a competitive advantage, delivering essential information to stakeholders quickly and maintaining data infrastructure scalability, accessibility, and security throughout the process.

IV. Workflow Phase 3: Legacy Data Migration and Data Management

Finally, it is imperative that upstream energy operations implement sustainable and organized strategies for managing legacy data. Though new acquisition data drives new business, legacy data ensures geophysical insights remain rich and historically informative. Given that exploration and production companies often collect generations of data across decades of evolving formats and media types, with each data format and media type requiring specific software solutions and procedures, it is not enough to optimize data acquisition and processing at the edge.

Optimizing Legacy Data

Overcoming limitations, inefficiencies, and silos from the edge to the cloud to expose the full value of your data requires establishing systematic procedures, including:

  • A complete inventory review of physical assets
  • A full investigation of the initial dataset
  • Warranty tasks to ensure the complete shot point range of a line or survey is covered
  • An effective data management system to maintain a well-organized data repository
  • Data standards and guidelines for cataloging data
  • Ongoing awareness of data entitlement
  • Continuous review of processes and workflows, to make sure they’re optimized for all parties involved
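
One of the checks listed above, confirming that the complete shot point range of a line is covered, lends itself to a short Python sketch. It assumes shot points are integers that increase by one along the line and that each delivered file or tape reports an inclusive (first, last) range; the function name `find_shot_point_gaps` is hypothetical. Lines with other numbering schemes would need an adapted check.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) shot point numbers


def find_shot_point_gaps(expected: Range, delivered: List[Range]) -> List[Range]:
    """Return the inclusive shot point ranges missing from the delivery.

    expected is the full range the line should cover; delivered holds
    the ranges found on each file or tape, in any order, and may overlap.
    """
    first, last = expected
    gaps: List[Range] = []
    next_needed = first
    for start, end in sorted(delivered):
        if end < next_needed:
            continue  # range lies entirely before the point still needed
        if start > last:
            break  # range lies entirely beyond the expected line
        if start > next_needed:
            gaps.append((next_needed, start - 1))
        next_needed = max(next_needed, end + 1)
        if next_needed > last:
            break
    if next_needed <= last:
        gaps.append((next_needed, last))
    return gaps
```

For example, with an expected range of (101, 500) and delivered ranges (101, 250), (240, 300), and (351, 500), the function reports a single gap, (301, 350), which tells the archivist which media or files still need to be located.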


From an administrative perspective, putting a system of archiving in place that is both methodical and flexible allows a company to accommodate changes to prescribed standards—as developed by APEGGA, SEG, and others—and ensure data management methods advance at the same pace as technology. Recognizing data as an asset, furthermore, keeps legacy data relevant and accessible.

From a hardware perspective, investing in cost-optimized data storage hardware, software, and services enables users to move petabytes of data from one location to the next for immediate processing and analysis, as well as long-term data archival, accessibility, and security. Working with a data transfer partner whose services help teams quickly and easily transfer their data from any endpoint, edge, or core location to the cloud of their choice also helps operators digitalize and thereby optimize their legacy data—saving costs on the front end and better organizing data archives on the back end.

Key issues:

  • Lack of clear ownership
  • Disorganized data governance and entitlement
  • Data islands (created by mergers and acquisitions)
  • Backing up to cloud without total awareness of fees


Data Management Optimal Strategies

In order to optimize large-scale data movement, such as in seismic mapping applications, upstream exploration and production companies should select devices that lift data out of the field and rapidly transport it to a data center or headquarters for redundancy and backup without loss, corruption, or breach.

Establishing methods to ensure data processes are always up to date, compliant, and flexible is imperative to accommodating change in internal processes, people, and technology. Long-term considerations for data management, storage, and archival processes also ensure geophysicists can collect data, perform observations and analysis, and deliver essential information utilizing insights from both on-prem and cloud products.