The How and the Why of Collecting Identity Data: Part 3 Designing the Survey

Fast Facts:

When designing a survey to collect identity data, it is vital to word the questions carefully. You want to ensure that the data you collect match the goals for how the data will be used.

Carefully select the terminology used in your survey to ensure that your meaning is perfectly understood. It is also essential to realize that some terminology can be offensive to some groups.

Be sure to have the survey reviewed by legal and ethics teams before distributing it.

Continuing our series on considerations when collecting and reporting identity data, this post looks at how to design the survey. First, let’s talk about diversity.


According to the working paper “Diversity statistics in the OECD [Organisation for Economic Co-operation and Development]: How do OECD countries collect data on ethnic, racial and indigenous identity?” by Carlotta Balestra and Lara Fleischer (2018), the term “diversity” is an umbrella concept (p. 6). Though people are similar in many ways, there are also many differences, such as gender, age, race, ethnicity, physical abilities, religion, beliefs, etc. Diversity might be cultural, biological, or defined by the person (Balestra & Fleischer, 2018, p. 6). A person’s identity has many components and may change over time. Balestra and Fleischer noted that the racial and ethnic make-up of our societies is in flux due to migration. Additionally, intermarriage blurs boundaries between racial and ethnic groups as a growing number of persons are of mixed ancestry. To further complicate the situation, terms such as “race”, “ethnicity” and “indigenous identity” may mean different things depending on the country in which you reside. It is important to consider the terminology used in your survey to be sure that participants clearly understand what they are being asked to provide (Balestra & Fleischer, 2018, p. 6).

There is a growing interest in scholarly publishing in collecting identity data such as race, ethnicity, gender, age, country, educational level, position held, and institution. Through submission systems, journals have been collecting some of this information for years; however, many of these data were rigidly defined, and the people answering the questions had very few options from which to select their response.

As you design your survey to collect identity data, it is a good time to look closely at how you are requesting the information. Take the submitting author’s country as an example: do you want to know the country where the author was born or the country they call home? If they were temporarily at an institution in a different country for work-related activities, such as a post-doc or a short-term work assignment, do you want to know the country where they did their research? Be very clear about how you define the information you are trying to collect.

For some time, you may have been asking authors to provide their academic degrees when submitting their manuscripts; but consider now what you are actually trying to understand. For example, do you want an understanding of where the authors are in their careers (e.g., early, late career)? In addition to your standard questions of the highest degree they have achieved, their current position, and institution, you may want to add a question about the year that they obtained their degree or expect to obtain their degree. This helps you to understand whether the person is more junior, with possibly more time but less experience, or more senior with a great deal of experience but possibly limited time to commit to peer review or editorial activities.
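As a rough illustration of how a degree-year question could feed a career-stage estimate, here is a minimal Python sketch. The function names and the 10-year cutoff are hypothetical choices made for this example; they are not a standard definition of "early career," and each journal would need to set its own threshold.

```python
from datetime import date
from typing import Optional


def years_since_degree(degree_year: int, today: Optional[date] = None) -> int:
    """Years elapsed since the degree; negative if the degree is still expected."""
    current_year = (today or date.today()).year
    return current_year - degree_year


def career_stage(degree_year: int, early_cutoff: int = 10,
                 today: Optional[date] = None) -> str:
    """Label a respondent by time since degree.

    early_cutoff is an assumed threshold for this sketch, not an
    established convention.
    """
    years = years_since_degree(degree_year, today)
    if years < 0:
        return "degree expected"
    if years <= early_cutoff:
        return "early career"
    return "later career"
```

Asking for the year rather than a self-reported career stage keeps the stored answer stable: the label can be recalculated each time the data are analyzed.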

Current best practice is to provide more response choices with your multiple-choice questions to allow the person completing the survey to more fully represent their identity. For example, in past years, a person might be asked to qualify themselves as male or female. Now, when asked to describe their gender, they might have the options of male/man, female/woman, non-binary, two-spirit, outside of the gender binary, agender, transgender, etc. You may want to include questions about both gender and sexuality, which allows the person to more fully qualify how they perceive themselves.

Race and ethnicity can be complicated data to collect accurately because they are difficult to define in a way that will mean the same thing to everyone who might be participating in the survey. These terms also “carry heavy intellectual and political baggage”, according to Balestra & Fleischer, and are often “contested within countries and across groups” (2018, p. 8). Additionally, the understanding of these racial and ethnic terms changes dramatically over time. Sometimes race is related to a concept, such as Hispanic, or can be related to “racial” features, such as skin tone. As individual DNA analysis becomes more prevalent, people gain a deeper understanding of their race/ethnicity mix, which changes how they report themselves. In some regions, it is common to use religious affiliation, such as Jewish or Yiddish, to classify oneself (Balestra & Fleischer, 2018, pp. 33-34). Terms being used by some journals to try to capture race/ethnicity are: African/Black, African American/Black, American Indian/Alaska Native/Indigenous, Arab American/Middle Eastern/North African, Latino/Hispanic/Spanish Origin, Caribbean, Native Hawaiian/Other Pacific Islander, East Asian, South Asian, Southeast Asian, White/European American, and Bi/Multi-racial.

Another identity datum is a person’s disability status. This information might be especially relevant depending on the topics covered by your journal. Questions may inquire about both physical and mental impairments, whether the disability has long- or short-term effects, and whether these conditions were present at birth or occurred later in life. Journals that are looking for reviewers who live with certain disabilities may legitimately want to request these data.

Questions relating to disability status, gender, sexuality, race, ethnicity, etc., can have emotional and cultural connotations; therefore, it is especially important to word your questions carefully so all people are treated with respect. Piloting your survey questions with small, diverse groups can help you identify questions that could be interpreted as hurtful. Additionally, to limit confusion due to terminology, you can define terms somewhere within the survey; this might be done through hover-over info boxes, popup info boxes, lists of defined terms at the end of the survey, or by including a link to a web page where the terms are defined.

Both Anna Jester and Dr. Teodoro Pulvirenti, in their ISMTE 2021 Global Virtual Event session, “Using Data to Promote DEI: You Can’t Improve What You Don’t Measure,” noted the importance of how questions are presented within a survey. Some identity surveys use checkboxes that allow a person to select multiple responses from a list so they can more fully express their identity, rather than limiting them to just a single response per question. Some surveys also include an “Other” option, with a text box where the person can write in a response if they do not feel that the list of available choices adequately represents them. Importantly, both presenters encouraged survey designers to include an option for people who prefer not to answer a particular question. This can be done by including “Prefer not to answer” as a response option in all set-response lists.
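For teams building such a survey themselves, the three presentation choices above (multiple selections, an “Other” write-in, and a “Prefer not to answer” option on every list) could be modeled roughly as in the following Python sketch. The `Question` class, the `validate` function, and the rule that the opt-out cannot be combined with other answers are assumptions made for this illustration; they do not describe any particular submission system or the presenters’ own tooling.

```python
from __future__ import annotations

from dataclasses import dataclass

PREFER_NOT = "Prefer not to answer"
OTHER = "Other"


@dataclass
class Question:
    """One set-response survey question (hypothetical model)."""
    prompt: str
    options: list[str]
    multi_select: bool = True   # checkboxes rather than a single choice
    allow_other: bool = True    # adds "Other" with a write-in text box

    def choices(self) -> list[str]:
        """Listed options, then "Other" if enabled, then the opt-out."""
        extra = [OTHER] if self.allow_other else []
        return [*self.options, *extra, PREFER_NOT]


def validate(question: Question, selected: list[str],
             other_text: str = "") -> list[str]:
    """Return a list of problems with a response; empty means acceptable."""
    problems = []
    unknown = [s for s in selected if s not in question.choices()]
    if unknown:
        problems.append(f"unknown option(s): {unknown}")
    if len(selected) > 1 and not question.multi_select:
        problems.append("only one selection allowed")
    # Assumed rule: opting out excludes every other answer.
    if PREFER_NOT in selected and len(selected) > 1:
        problems.append("opt-out cannot be combined with other answers")
    if OTHER in selected and not other_text.strip():
        problems.append("write-in text missing for 'Other'")
    return problems
```

Because `choices()` always appends the opt-out, a designer cannot forget to include it on a new question, which is the point both presenters stressed.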

As you build your survey, assume that you will reissue it at some predetermined point in the future, such as every two years for a standalone survey, or continuously if the survey is built into the submission system. Some journals include text boxes at the end to allow participants to comment on the survey. The hope is that feedback from participants will help the journal to improve the questions and the responses over time.

Finally, once you have a complete list of questions that you believe will allow you to collect the needed data, you should have the questions reviewed. For example, the American Psychological Association (APA) uses internal groups, such as their Committee on Ethnic Minority Affairs, the Committee on Sexual Orientation and Gender Diversity, and the Committee on Disability Issues in Psychology, to review their questions before they launch their surveys. Other organizations may choose to hire a diversity consultant and/or have their questions reviewed by a legal team. It is imperative that all people in positions of authority within your organization (e.g., society, publisher) are made aware of what identity information will be collected and have the opportunity to review the exact questions that will be used to collect this important and sensitive information.

In our next post, we will focus on the logistics of collecting identity data. We would love to hear your thoughts and questions on these topics.


Balestra, C., & Fleischer, L. (2018). Diversity statistics in the OECD: How do OECD countries collect data on ethnic, racial and indigenous identity? OECD Statistics Working Papers 2018/09.

