K-Anonymization Implementation Using Apache Spark
Abstract
This experiment anonymizes data that can reveal a person's identity using the k-anonymity principle, under which each released record is indistinguishable from at least k-1 other records. The goal follows the problem statement: "Given person-specific field-structured data, produce a release of the data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful." Privacy-preserving techniques aim to keep large datasets meaningful while protecting sensitive values, easing concerns about how individuals' delicate data is handled. In this paper, we study the k-anonymity principle in the context of big data and introduce top-down k-anonymization, l-diversity, and t-closeness solutions for Apache Spark using Java. In an era of large data volumes, more scalable and efficient methods are needed to prevent data leakage: records such as public health data and diagnoses, together with attributes such as name, ZIP code, race, and education, can leak information and violate an individual's privacy.