[DP#3] Database anonymization – randomized response

Randomized Response Explained

Did you ever cheat on your spouse?
Are you using any illegal drugs?
Have you committed any crimes in your past?

Randomized Response

S.L. Warner, 1965
– I never said that!

How it works

Scenario: respondents are asked a sensitive yes-or-no (binary) question
– Have you ever used illegal drugs?

Instead of answering directly, respondents are asked to flip a coin in private.
If the coin comes up heads, they answer truthfully.
If it comes up tails, they flip the coin again and answer yes if it lands heads and no if it lands tails.
In practice, respondents flip the coin twice in both cases, so an observer cannot tell which branch was taken.
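
The protocol above can be sketched in a few lines of Python. This is only a minimal simulation, assuming a fair coin; the function name `randomized_response` and the `rng` parameter are illustrative choices, not part of the original material.

```python
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Return the answer a respondent reports under the two-coin protocol."""
    first_flip_heads = rng.random() < 0.5
    second_flip_heads = rng.random() < 0.5  # always flipped, so both branches look alike
    if first_flip_heads:
        return truth            # heads: answer truthfully
    return second_flip_heads    # tails: the second coin decides (heads = yes, tails = no)

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
reports = [randomized_response(True, rng) for _ in range(10_000)]
share_yes = sum(reports) / len(reports)  # should land close to 0.75
```

Every simulated respondent here truly answers "yes", so the share of reported "yes" answers should settle near three quarters.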

Data Collection

If the real answer is “Yes”, the reported answer is “Yes” with probability 1/2 + 1/2 · 1/2 = 3/4 = 75%.
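
The same reasoning extends to a whole population: if a share p of respondents truly would answer "yes", a "yes" is reported with probability 1/2 · p + 1/4, which can be inverted to estimate p. The sketch below assumes fair coins; the function name `estimate_true_share` and the counts in the example are hypothetical.

```python
def estimate_true_share(reported_yes: int, total: int) -> float:
    """Invert P(report yes) = 1/4 + p/2 to estimate the true "yes" share p."""
    observed = reported_yes / total
    estimate = 2 * observed - 0.5
    # Sampling noise can push the raw estimate outside [0, 1], so clamp it.
    return min(1.0, max(0.0, estimate))

# Hypothetical survey: 400 of 1000 respondents report "yes".
example = estimate_true_share(400, 1000)  # about 0.3
```

No individual answer is revealed by this calculation; only the aggregate share is recovered.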


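Increased Truthfulness: encourages respondents to give more truthful answers to sensitive questions, because the actual response is masked by the randomness
Privacy Protection: protects respondents' privacy even if someone gains access to the survey data
Statistical Analysis: statistical analysis of the responses yields an estimate of the prevalence of sensitive behaviors or characteristics in a population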


Randomization Device Choice
Sample Size
Assumption of Randomness
Ethical Concerns


Substance abuse
Criminal behavior
Sexual preferences
Local Differential Privacy
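
The link to local differential privacy can be made concrete. With fair coins, a "yes" is reported with probability 3/4 when the true answer is yes and 1/4 when it is no, so the worst-case likelihood ratio is 3 and the mechanism satisfies ε-local differential privacy with ε = ln 3. The helper below is a small sketch of that calculation; its name and signature are illustrative.

```python
import math

def ldp_epsilon(p_yes_given_yes: float, p_yes_given_no: float) -> float:
    """Worst-case log-likelihood ratio of a binary randomized-response mechanism."""
    ratios = [
        p_yes_given_yes / p_yes_given_no,              # likelihood ratio when "yes" is reported
        (1 - p_yes_given_no) / (1 - p_yes_given_yes),  # likelihood ratio when "no" is reported
    ]
    return max(abs(math.log(r)) for r in ratios)

epsilon = ldp_epsilon(0.75, 0.25)  # ln(3), roughly 1.10
```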


Collecting sensitive information
Preserving privacy
Enhancing the quality of the collected data


Name: Bob | Age: 17 | Sex: M | ZIP: 12345 | Disease: Heart disease
Identifiers: Name – removed
Quasi Identifiers: Age, Sex, ZIP
Sensitive Attribute: Disease
Cherries, Apples, Grapes, Tea, Coffee, Diapers | Cherries, Apples, Grapes, Tea, Coffee, Diapers
Flour, Coffee, Bread, Butter | Cherries, Apples, Grapes, Bread, Butter
Milk, Yoghurt, Brie, Stilton | Milk, Yoghurt, Brie, Stilton
Jam, Gouda, Ham, Bacon, Pepperoni | Cherries, Apples, Grapes, Gouda, Ham, Bacon, Pepperoni
Tomatoes, Brokkoli, Potatoes, Pasta, Rice | Cherries, Apples, Grapes, Brokkoli, Potatoes, Pasta, Rice
Soft Drink, Beer, Fish, Meat, Eggs | Soft Drink, Beer, Fish, Meat, Eggs
\(k^3\)-anonymity: the same m items (m = 3: Cherries, Apples, Grapes)
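
One way to check the "same m items" idea is to count how many transactions contain a given itemset. The sketch below does this for the second transaction of each row in the listing above; the function name `support` is an illustrative choice, and the check covers only this one itemset, not every combination of three items.

```python
def support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))

# Second transaction of each row in the listing above.
published = [
    ["Cherries", "Apples", "Grapes", "Tea", "Coffee", "Diapers"],
    ["Cherries", "Apples", "Grapes", "Bread", "Butter"],
    ["Milk", "Yoghurt", "Brie", "Stilton"],
    ["Cherries", "Apples", "Grapes", "Gouda", "Ham", "Bacon", "Pepperoni"],
    ["Cherries", "Apples", "Grapes", "Brokkoli", "Potatoes", "Pasta", "Rice"],
    ["Soft Drink", "Beer", "Fish", "Meat", "Eggs"],
]

shared = support(published, ["Cherries", "Apples", "Grapes"])  # 4 transactions
```

An attacker who knows only that a customer bought cherries, apples and grapes therefore cannot narrow the record down to fewer than four transactions.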


I have cherries, apples and grapes which can be generalized to Fruit

Cherries, Apples, Grapes, Tea, Coffee, Diapers | Fruits, Beverages, Household Items
Cherries, Apples, Grapes, Bread, Butter | Fruits, Bread, Dairy
Milk, Yoghurt, Brie, Stilton | Dairy
Cherries, Apples, Grapes, Gouda, Ham, Bacon, Pepperoni | Fruits, Dairy, Sausages
Cherries, Apples, Grapes, Brokkoli, Potatoes, Pasta, Rice | Fruits, Vegetables, Pasta
Soft Drink, Beer, Fish, Meat, Eggs | Beverages, Meats
trade-off between Accuracy and Privacy
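
The generalization step can be sketched as a lookup from item to category. The mapping below is read off the listing above; the names `CATEGORY` and `generalize` are illustrative, and mapping "Rice" to "Pasta" is an assumption, since the listing shows only Fruits, Vegetables and Pasta for that row.

```python
# Item -> category mapping read off the listing above.
# "Rice" -> "Pasta" is an assumption; the source does not say where Rice belongs.
CATEGORY = {
    "Cherries": "Fruits", "Apples": "Fruits", "Grapes": "Fruits",
    "Tea": "Beverages", "Coffee": "Beverages",
    "Soft Drink": "Beverages", "Beer": "Beverages",
    "Diapers": "Household Items",
    "Bread": "Bread",
    "Butter": "Dairy", "Milk": "Dairy", "Yoghurt": "Dairy",
    "Brie": "Dairy", "Stilton": "Dairy", "Gouda": "Dairy",
    "Ham": "Sausages", "Bacon": "Sausages", "Pepperoni": "Sausages",
    "Brokkoli": "Vegetables", "Potatoes": "Vegetables",
    "Pasta": "Pasta", "Rice": "Pasta",
    "Fish": "Meats", "Meat": "Meats", "Eggs": "Meats",
}

def generalize(transaction):
    """Replace each item by its category, keeping first-seen order without duplicates."""
    return list(dict.fromkeys(CATEGORY[item] for item in transaction))

basket = generalize(["Cherries", "Apples", "Grapes", "Tea", "Coffee", "Diapers"])
```

The coarser the categories, the more transactions look alike, which is exactly where accuracy is traded for privacy.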

Step-By-Step Guide on Implementing Multi-Dimensional Mondrian to Achieve k-anonymity

python mondrian_youtube.py

import pandas as pd

data = pd.read_csv('data.csv')
k = 3

qis = ['Age', 'ZIP', 'Gender']

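def summarized(partition, dim):
    # Replace every quasi-identifier in this partition by its [min - max] range.
    # Work on a copy so the caller's slice is not modified in place.
    partition = partition.copy()
    for qi in qis:
        partition = partition.sort_values(by=qi)

        if partition[qi].iloc[0] != partition[qi].iloc[-1]:
            s = f"[{partition[qi].iloc[0]} - {partition[qi].iloc[-1]}]"
            partition[qi] = [s] * partition[qi].size
    return partition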

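def anonymize(partition, ranks):
    # Split on the quasi-identifier with the most distinct values.
    dim = ranks[0][0]

    partition = partition.sort_values(by=dim)
    si = partition[dim].count()
    mid = si // 2

    # Cut at the median position.
    lhs = partition[:mid]
    rhs = partition[mid:]

    # Keep splitting only while both halves still hold at least k rows.
    if len(lhs) >= k and len(rhs) >= k:
        return pd.concat([anonymize(lhs, ranks), anonymize(rhs, ranks)])
    return summarized(partition, dim)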

def mondrian(partition):
    # Rank each quasi-identifier by its number of distinct values.
    ranks = {}

    for qi in qis:
        ranks[qi] = len(set(partition[qi]))

    # sort ranks, most distinct values first
    ranks = sorted(ranks.items(), key=lambda t: t[1], reverse=True)

    return anonymize(partition, ranks)

result = mondrian(data)

result.to_csv('anon_youtube.csv',  index=False)
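
A quick sanity check on the output is to confirm that every combination of quasi-identifier values occurs at least k times. The sketch below uses only the standard library and works on a list of row dictionaries, which pandas can produce with `result.to_dict(orient="records")`. The function name `is_k_anonymous` and the sample rows are hypothetical, not output of the script above.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    counts = Counter(tuple(row[qi] for qi in quasi_identifiers) for row in rows)
    return all(n >= k for n in counts.values())

# Made-up rows in the shape the anonymized CSV might have.
rows = [
    {"Age": "[20 - 29]", "ZIP": "[12000 - 12999]", "Gender": "F", "Disease": "Flu"},
    {"Age": "[20 - 29]", "ZIP": "[12000 - 12999]", "Gender": "F", "Disease": "Asthma"},
    {"Age": "[20 - 29]", "ZIP": "[12000 - 12999]", "Gender": "F", "Disease": "Heart disease"},
]

ok = is_k_anonymous(rows, ["Age", "ZIP", "Gender"], k=3)
```

The sensitive attribute is deliberately left out of the grouping key: k-anonymity constrains only the quasi-identifiers.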

