Caching framework for big-data engines in the cloud

Fecha de publicación: 07/12/2017
Fuente: Wipo "BigData"
The present invention is generally directed to a caching framework that provides a common abstraction across one or more big data engines, comprising a cache filesystem including a cache filesystem interface used by applications to access cloud storage through a cache subsystem, the cache filesystem interface in communication with a big data engine extension and a cache manager; the big data engine extension, providing cluster information to the cache filesystem and working with the cache filesystem interface to determine which nodes cache which part of a file; and a cache manager for maintaining metadata about the cache, the metadata comprising the status of blocks for each file. The invention may provide common abstraction across big data engines that does not require changes to the setup of infrastructure or user workloads, allows sharing of cached data and caching only the parts of files that are required, can process columnar format.