How do you build a system that can ask people very sensitive questions (for instance, about their work), keep the answers confidential at the database level, and yet allow each person to answer only once (and, if necessary, edit their answers)?
This is my proposed method.
- The system itself consists of a database, a set of PHP pages, and a configuration file.
- The PHP pages identify the user and assign a unique identifier ku, say, the person’s name, SSN, or a similar identity. ku is stored only in the current session. Referential constraints can be added to ensure that the identifier is valid, i.e. that the user specifies a real name, e.g. through a lookup in a user table.
- The system configuration file contains a system key, ks: a unique, random key that is kept secret.
- The database has three tables.
- Two tables hold the questions: one for the questionnaires and one for the questions, with each question linked to its questionnaire. Each questionnaire is given a unique, random key kq, and each question has an auto-incrementing ID.
- The third table contains the answers each user has given. Each answer is uniquely identified by the question ID together with a cryptographically secure hash h = H(ks + ku + kq), where + denotes concatenation.
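A minimal sketch of the storage side, written in Python with the standard-library `sqlite3` module rather than PHP so that it is self-contained. Several things here are assumptions, not part of the scheme above: the table and column names (`questionnaire`, `question`, `answer`, `body`, `value`), generating ks and kq with the `secrets` module, and using HMAC-SHA256 keyed with ks as H in place of hashing the plain concatenation ks + ku + kq (HMAC is the standard keyed-hash construction). The composite primary key (question_id, h) is what limits each user to one answer per question, and `INSERT OR REPLACE` turns a second submission into an edit.

```python
import hashlib
import hmac
import secrets
import sqlite3

# Hypothetical system key ks. In the described system it lives only in the
# secret configuration file, never in the database.
KS = secrets.token_bytes(32)


def derive_h(ks: bytes, ku: str, kq: str) -> str:
    """Return the pseudonym h for user ku on questionnaire kq.

    Assumption: HMAC-SHA256 keyed with ks stands in for H(ks + ku + kq);
    a NUL byte separates kq from ku so the input is unambiguous.
    """
    msg = kq.encode("utf-8") + b"\x00" + ku.encode("utf-8")
    return hmac.new(ks, msg, hashlib.sha256).hexdigest()


SCHEMA = """
CREATE TABLE questionnaire (
    kq    TEXT PRIMARY KEY,           -- random per-questionnaire key
    title TEXT NOT NULL
);
CREATE TABLE question (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    kq   TEXT NOT NULL REFERENCES questionnaire(kq),
    body TEXT NOT NULL
);
CREATE TABLE answer (
    question_id INTEGER NOT NULL REFERENCES question(id),
    h           TEXT NOT NULL,        -- pseudonym only; no user identity stored
    value       TEXT NOT NULL,
    PRIMARY KEY (question_id, h)      -- at most one answer per user and question
);
"""


def save_answer(db: sqlite3.Connection, ks: bytes, ku: str, kq: str,
                question_id: int, value: str) -> None:
    """Store a user's answer; answering again replaces (edits) the old row."""
    h = derive_h(ks, ku, kq)
    db.execute(
        "INSERT OR REPLACE INTO answer (question_id, h, value) VALUES (?, ?, ?)",
        (question_id, h, value),
    )


def load_answers(db: sqlite3.Connection, ks: bytes, ku: str, kq: str) -> dict:
    """Fetch this user's previous answers for one questionnaire, keyed by question ID."""
    h = derive_h(ks, ku, kq)
    rows = db.execute(
        "SELECT a.question_id, a.value FROM answer a "
        "JOIN question q ON q.id = a.question_id "
        "WHERE a.h = ? AND q.kq = ?",
        (h, kq),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    kq = secrets.token_hex(16)
    db.execute("INSERT INTO questionnaire (kq, title) VALUES (?, ?)", (kq, "Work survey"))
    qid = db.execute("INSERT INTO question (kq, body) VALUES (?, ?)",
                     (kq, "How is your workload?")).lastrowid
    save_answer(db, KS, "alice", kq, qid, "too high")
    save_answer(db, KS, "alice", kq, qid, "manageable")  # edit: same h, row replaced
    print(load_answers(db, KS, "alice", kq))  # {1: 'manageable'}
```

Only h, never ku, reaches the database; ku is needed just long enough in the session to recompute h.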
As long as ks is kept secret, it is computationally infeasible to determine which user gave any given answer in the answers table. And because kq differs for each questionnaire, users cannot be tracked across questionnaires either. h will, however, be the same for a given questionnaire and user.
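The three properties above (h is stable per user and questionnaire, unlinkable across questionnaires, and not reproducible without ks) can be checked with a short Python sketch. It assumes HMAC-SHA256 keyed with ks as the hash H, substituted here for hashing the plain concatenation; the questionnaire keys are random hex strings and the user name is made up.

```python
import hashlib
import hmac
import secrets


def derive_h(ks: bytes, ku: str, kq: str) -> str:
    # Assumed H: HMAC-SHA256 keyed with the secret ks over kq, a NUL separator, and ku
    return hmac.new(ks, kq.encode() + b"\x00" + ku.encode(), hashlib.sha256).hexdigest()


ks = secrets.token_bytes(32)                               # secret system key
kq_a, kq_b = secrets.token_hex(16), secrets.token_hex(16)  # two questionnaires

# Same user, same questionnaire: h is stable, so the answer row can be found again.
stable = derive_h(ks, "alice", kq_a) == derive_h(ks, "alice", kq_a)
# Same user, different questionnaire: h differs, so answers cannot be linked.
linkable = derive_h(ks, "alice", kq_a) == derive_h(ks, "alice", kq_b)
# Without the real ks, a guessed key does not reproduce h.
forged = derive_h(secrets.token_bytes(32), "alice", kq_a) == derive_h(ks, "alice", kq_a)

print(stable, linkable, forged)  # True False False
```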
The system can therefore store a given user's answers and retrieve that user's previous answers; it is also possible for a user to retrieve answers, filter them, and generate statistics without ever having access to the identity behind any answer. Even if the system is compromised, the identity behind an answer cannot be learned without a brute-force (or table lookup) attack.
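Because the answers table holds only question IDs, pseudonyms and values, filtering and statistics never need an identity. A minimal sketch in Python with `sqlite3`; the table layout, the integer-valued answers, and the sample rows are all hypothetical, and the `h` strings are placeholders for real hash values.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE answer (
        question_id INTEGER NOT NULL,
        h           TEXT NOT NULL,     -- pseudonym; not needed for statistics
        value       INTEGER NOT NULL,
        PRIMARY KEY (question_id, h)
    )
    """
)
# Hypothetical sample rows; "h1".."h3" stand in for real hash values.
db.executemany(
    "INSERT INTO answer (question_id, h, value) VALUES (?, ?, ?)",
    [(1, "h1", 3), (1, "h2", 5), (1, "h3", 4), (2, "h1", 1), (2, "h2", 2)],
)

# Per-question count and mean, filtered to values >= 2; only question_id and value are read.
stats = db.execute(
    """
    SELECT question_id, COUNT(*) AS n, AVG(value) AS mean
    FROM answer
    WHERE value >= 2
    GROUP BY question_id
    ORDER BY question_id
    """
).fetchall()
print(stats)  # [(1, 3, 4.0), (2, 1, 2.0)]
```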
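The table-lookup caveat becomes concrete once ks is known: if an attacker obtains ks (for example from the configuration file) together with the database, and the candidate values of ku are known (for example from a user table like the one used to validate identifiers), each answer can be re-identified with one hash per candidate user. The sketch assumes HMAC-SHA256 keyed with ks as H and uses made-up user names.

```python
import hashlib
import hmac
import secrets


def derive_h(ks: bytes, ku: str, kq: str) -> str:
    # Assumed H: HMAC-SHA256 keyed with ks over kq, a NUL separator, and ku
    return hmac.new(ks, kq.encode() + b"\x00" + ku.encode(), hashlib.sha256).hexdigest()


ks = secrets.token_bytes(32)        # leaked system key (assumption for the demo)
kq = secrets.token_hex(16)          # stored in the questionnaire table
users = ["alice", "bob", "carol"]   # hypothetical contents of the user table

# Pseudonyms as they would appear in the answers table
stored_hs = {derive_h(ks, u, kq) for u in ["bob", "carol"]}

# Table lookup: recompute h for every known user and match against stored values
lookup = {derive_h(ks, u, kq): u for u in users}
recovered = sorted(lookup[h] for h in stored_hs if h in lookup)
print(recovered)  # ['bob', 'carol']
```

Without ks, the same lookup cannot be run from a database dump alone, which is why the scheme's confidentiality rests on keeping ks secret.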
Does this sound ok?