SERC Reading Group: Differential Privacy in the U.S. Census

During the Spring 2022 term, Kevin Mills and I ran a first-of-its-kind reading group with MIT’s Social and Ethical Responsibilities of Computing (SERC) Scholars. The readings and discussions centered on formal privacy in public data, and in particular on the decisions, designs, and implementations of differentially private algorithms in the 2020 Decennial Census. The group is preparing a document to synthesize and present the readings, research, and discussions.

Below are the reading list and discussion topics for the term. Where possible, I’ll include links to the readings and their authors.

Meeting 1

Reading:

  • [Differential Privacy and the 2020 U.S. Census] by Simson Garfinkel. In MIT Case Studies in Social and Ethical Responsibilities of Computing, Winter 2022.

Discussion points:

  • What do we know about differential privacy and the decennial census at the outset?
  • Based on the case study, what do we see as being sticking points in the implementation and deployment timeline that we should dig further into?
  • What does “data privacy” mean (informally) in this day and age and what is the value of privacy in data generated by the public and collected and published by the government?

Meeting 2

Reading:

Discussion points:

  • Technically, what exactly is differential privacy and what does it accomplish?
  • How does differential privacy align with common or colloquial understandings of “privacy”?
  • What might the privacy concerns be in publishing U.S. census data, specifically? Can we enumerate some potential misuses or harms that would result from insufficient privacy protection?
  • What does the privacy parameter epsilon mean in this context? How can we think about translating from tuning epsilon to protecting individuals’ privacy in meaningful ways?
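
To make the epsilon question concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single count. It is an illustration only and not the Census Bureau's algorithm: the function names, the block population of 120, and the epsilon values are all hypothetical, and the sketch assumes a counting query where adding or removing one person changes the answer by at most 1.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return true_count + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    true_count = 120  # hypothetical population of one census block
    for epsilon in (0.1, 1.0, 10.0):
        draws = [noisy_count(true_count, epsilon, rng) for _ in range(5)]
        print(f"epsilon={epsilon}: {[round(d, 1) for d in draws]}")
```

Smaller epsilon means a larger noise scale, so the published count says less about any one person but is also less accurate. The expected absolute error of a single noisy count in this sketch is sensitivity / epsilon.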

Further reading:

Meeting 3

Reading:

Discussion points:

  • Given that the data published from the decennial census is relatively mundane, to what extent should privacy be a first-order concern in this process? For example, someone’s age might be reasonably inferred (to some confidence) by an observer on the street, so what is the value in the Census Bureau actively obscuring that in the data publication process?
  • What kinds of challenges does the Census Bureau anticipate contending with as evidenced by the algorithm design?
  • Data in the census is a subset of richer and often much scarier datasets sold by data brokers and similar agents. What is the purpose of privacy in the census’ publications given that the same data and much more invasive details are readily available to anyone willing to pay the right price?
  • What role do trust and transparency play in the data collection, analysis, and publication process here?

Meeting 4

Reading:

Discussion points:

  • Does contextual integrity provide a convincing framework for how to think about privacy?
  • Where are the gaps between contextual integrity and the way that differential privacy asks us to imagine privacy loss?
  • Do we think that the text of Title 13 and in particular its language about the Bureau not publishing reidentifiable data demands the use of differential privacy?

Meeting 5

Reading:

Discussion questions:

  • Did you find any of the presentations in the LADOT amicus brief surprising?
  • Was the Court’s dismissal in the LADOT case (Sanchez) surprising?
  • How do the arguments of the privacy experts in the amicus brief in the Alabama case match up against the concerns we’ve found in previous discussions?
  • The seriousness of the reconstruction attack that can be carried out on the 2010 publications is a frequent point of contention in the discourse. Where do we land on that?
  • The costs and benefits of privacy are not distributed equally across the population. Minorities and outliers with respect to general patterns in the data pay more in terms of accuracy for privacy, but may simultaneously be exactly the people who benefit most from the protections of privacy-preserving techniques. How do we consider that tension, and how do we meaningfully engage with stakeholders on this topic?

Further reading: