Data archive vault in big data platform

Fecha de publicación: 09/02/2017
Fuente: Wipo "BigData"
Embodiments relate to data archiving utilizing an existing big data platform (e.g., HADOOP) as a cost-effective target infrastructure for storage. Particular embodiments construct a logical structure (hereafter, “vault”) in the big data platform so that a source, type, and context of the data is maintained, and metadata can be added to aid searching for snapshots according to a given time, version, and other considerations. A vaulting process transforms relationally stored data in an object view to allow for object-based retrieval or object-wise operations (such as destruction due to legal data privacy reasons), and provide references to also store unstructured data (e.g., sensor data, documents, streams) as attachments. A legacy archive extractor provides extraction services for existing archives, so that extracted information is stored in the same vault. This allows for cross queries over legacy data and data from other sources, facilitating the application of new analysis techniques by data scientists.