De-identification method for big data

Publication date: 07/07/2022

A de-identification method for big data is provided. The method anonymizes big data so that it can be freely distributed to an external system without concern about personal information leakage, while keeping statistical values calculated from the distributed data as close as possible to those of the original data, thereby securing the reliability of statistical analysis. Records whose abstraction reference fields all hold the same values, but which number N or fewer, are not excluded from abstraction; instead, they are grouped separately. Each abstracted record is then assigned a connection-type attribute value that includes the occurrence rate of each category attribute value within its group. This minimizes the data lost during abstraction, so that statistical values calculated from the distributed data stay as close as possible to those of the original data.
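The grouping step described above can be illustrated with a minimal Python sketch. It is only one possible reading of the abstract, not the patented procedure. The function names (`deidentify`, `connected_value`), the field names, the "value:rate|value:rate" string layout, the `count` field, and the choice to pool all small groups into a single residual group with masked reference fields are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def connected_value(values):
    """Join each distinct category value with its occurrence rate in the group.

    The "value:rate|value:rate" layout is an assumed format for illustration.
    """
    counts = Counter(values)
    total = len(values)
    return "|".join(
        f"{value}:{count / total:.2f}" for value, count in sorted(counts.items())
    )


def deidentify(records, reference_fields, category_field, n):
    """Abstract records per reference-field group, pooling groups of size <= n."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in reference_fields)
        groups[key].append(record)

    abstracted = []
    residual = []  # members of groups too small to publish on their own
    for key, members in groups.items():
        if len(members) <= n:
            residual.extend(members)  # kept for separate grouping, not dropped
            continue
        row = dict(zip(reference_fields, key))
        row[category_field] = connected_value([m[category_field] for m in members])
        row["count"] = len(members)
        abstracted.append(row)

    if residual:
        # Assumption: the pooled small groups form one record with masked
        # reference fields. The pool itself may still hold n or fewer records;
        # this sketch does not handle that case.
        row = {field: "*" for field in reference_fields}
        row[category_field] = connected_value([m[category_field] for m in residual])
        row["count"] = len(residual)
        abstracted.append(row)
    return abstracted


if __name__ == "__main__":
    sample = [
        {"age_band": "30s", "region": "Seoul", "diagnosis": "flu"},
        {"age_band": "30s", "region": "Seoul", "diagnosis": "flu"},
        {"age_band": "30s", "region": "Seoul", "diagnosis": "cold"},
        {"age_band": "40s", "region": "Busan", "diagnosis": "flu"},
        {"age_band": "50s", "region": "Daegu", "diagnosis": "asthma"},
    ]
    for row in deidentify(sample, ["age_band", "region"], "diagnosis", n=2):
        print(row)
```

Because the small groups are pooled rather than discarded, the record counts in the output sum to the size of the input, which is what lets category frequencies estimated from the abstracted records stay close to the original ones (up to the rounding of the rates).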
