# Experimental design

## Steps to conduct a statistical study

First, we need to state our hypothesis. For example, we may say ‘Air Pollution Causes Asthma in kids in urban settings’. This is the hypothesis that we wish to prove correct or incorrect as part of our statistical study.

From this, we can state the individuals of interest – who needs to be part of this study? In the above case, it would be children living in an urban area, so these are the individuals that we should include in our study.

Next, we need to specify the variables that we need to measure. If we were surveying children who live in an urban area, our questions would probably include the ones below. In addition to these questions, we would measure air pollution & find some way to quantify asthma severity.

• Where do you live?
• How long have you lived there?
• Do you suffer from asthma?
• To what degree do you suffer with asthma?

Now that we know what we are trying to find out, who we need to find it out from & what we’re going to measure, we need to decide whether we’re going to use a population or a sample dataset. If we are going to use sampling, we also need to decide what our sampling method is (as per our last article).
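To make the sampling decision concrete, here is a minimal sketch of one possible method, a simple random sample. The list of child IDs, the sample size of 50 and the function name are all made up for illustration; a real study would draw from its actual list of eligible children.

```python
import random

def draw_simple_random_sample(population, sample_size, seed=None):
    """Return sample_size distinct members of population, chosen uniformly at random."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical sampling frame: IDs for 500 children living in an urban area.
children = [f"child_{i:03d}" for i in range(500)]

sample = draw_simple_random_sample(children, sample_size=50, seed=42)
print(f"Selected {len(sample)} of {len(children)} children")
```

Every child in the list has the same chance of being picked, which is what makes this a simple random sample. Other sampling methods would need a different selection step.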

Next, we need to address any ethical concerns about the project. If you’re going to do any medical checks on the children, you will need their (and their parents’) consent. You also need to decide whether you need to ask any sensitive questions as part of the study.

We then get to the fun bit of collecting data! Go go go!

Now you’ve got your data, you need to use descriptive or inferential statistics to test your hypothesis & then write up your findings.
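As a rough idea of what the descriptive part might look like, the sketch below summarises asthma severity for two groups of children. The scores are invented purely for illustration (a 1-5 severity scale is an assumption, not something from a real study), and a real analysis would follow this with an inferential test.

```python
import statistics

# Invented severity scores (1 = mild, 5 = severe) for illustration only.
high_pollution_scores = [3, 4, 2, 5, 4, 3]
low_pollution_scores = [2, 1, 3, 2, 2, 2]

def describe(scores):
    """Return the mean, median and sample standard deviation of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),
    }

high_summary = describe(high_pollution_scores)
low_summary = describe(low_pollution_scores)
mean_difference = high_summary["mean"] - low_summary["mean"]

print("High pollution:", high_summary)
print("Low pollution: ", low_summary)
print("Difference in means:", mean_difference)
```

A difference in means on its own does not tell us whether the gap could be down to chance, which is where inferential statistics come in.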

## Types of study

There are two types of study we can use. The first is an experiment. This is where we study the effect of changing a particular variable. For example, if we gave a new medication to 10 people with asthma and withheld it from another 10, we could compare the changes in asthma severity between the two groups. An experiment requires that we do something (change something in the environment, change the device collecting the stats, administer some medication, etc.).
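The example above does not say how the two groups of 10 are chosen. One common approach is to assign people at random, so that neither we nor the participants pick who gets the medication. The sketch below assumes 20 hypothetical patients; the names and the `assign_groups` helper are illustrative only.

```python
import random

def assign_groups(participants, seed=None):
    """Shuffle participants and split them into two groups of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original order is left alone
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participants: 20 people with asthma.
participants = [f"patient_{i}" for i in range(1, 21)]

treatment, control = assign_groups(participants, seed=1)
print("Gets the new medication:", sorted(treatment))
print("Does not get it:        ", sorted(control))
```

Because the split is random, any difference we later see between the groups is less likely to come from how we happened to choose them.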

We then have observational studies, which are just about observing. A survey is an example of an observational study, but observational studies can also include taking measurements (e.g. blood pressure).

An important consideration when designing your study is that it must be rigorous enough to be replicated – for example, if you were studying a particular school & proved your hypothesis, you would need to be able to follow exactly the same process at other schools. This requires good sampling (no undercoverage, i.e. no part of the population of interest left out of the group you sample from).

## Bias in surveys

Just because you invite someone to participate in your study doesn’t mean that they will. So even if you have used an appropriate sampling method, it doesn’t mean that everybody you’ve chosen will choose you. As such, surveys are typically subject to heavy bias. Think about it: if you go to a restaurant and have a good experience, you’ll be indifferent to filling out the survey, whereas if you have a terrible experience, you’ll be very willing to provide your thoughts.

Non-responses clearly lead to bias – if you ask 1,000,000 people and only 1% reply, they will be the 1% that have a bone to pick!
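We can get a feel for how strong this effect can be with a small simulation. Everything in it is an assumption chosen for illustration: 100,000 customers (rather than a million, to keep it quick), 20% of whom had a bad experience, with unhappy customers replying 4% of the time and happy ones 0.25% of the time. Those numbers give an overall response rate of roughly 1%.

```python
import random

def simulate_survey(n_customers=100_000, seed=0):
    """Return (true share unhappy, share unhappy among respondents, response rate)."""
    rng = random.Random(seed)
    unhappy_total = 0
    responses = []  # one entry per respondent: True if they were unhappy

    for _ in range(n_customers):
        # Assumption: 20% of customers had a bad experience.
        unhappy = rng.random() < 0.20
        unhappy_total += unhappy

        # Assumption: unhappy customers are far more likely to reply.
        reply_probability = 0.04 if unhappy else 0.0025
        if rng.random() < reply_probability:
            responses.append(unhappy)

    true_share = unhappy_total / n_customers
    observed_share = sum(responses) / len(responses)
    response_rate = len(responses) / n_customers
    return true_share, observed_share, response_rate

true_share, observed_share, response_rate = simulate_survey()
print(f"Response rate:               {response_rate:.1%}")
print(f"Truly unhappy customers:     {true_share:.1%}")
print(f"Unhappy among those replying: {observed_share:.1%}")
```

Under these assumptions, the people who reply are mostly the unhappy ones, even though they are a minority of all customers. The survey result would badly overstate how many people had a bad experience.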

We then have people who lie because the question is too personal or too hard. For example, if you ask someone how much they earn, they might inflate their salary; if you ask them what % their mortgage rate is, they might feel embarrassed that they don’t know & make something up.

We also have recall bias. Think about when you talk to your grandparents. Do you ever notice that everything was 100 times better when they grew up than it is now? That is recall bias. They’re looking back at their childhood through rose-tinted glasses and forgetting about the post-war rations etc.

We can reduce bias by watching out for a few things when designing the survey:

• The wording of questions can be biased towards a particular opinion. We need to reduce this as much as possible.
• The order of questions can lead a survey respondent down a particular train of thought & influence their responses.
• A scale of 1-5 may not fit for all questions & so respondents just pick the one they think is closest.
• We need to avoid vague wording (e.g. ‘did you wait a long time?’). In this scenario, we should define what ‘long’ is.