Doing Research With NLP
Scientific Research Basics
This article is based on the manual for our training on NLP Research. It is published here in the hope that it may help to simplify or at least summarise the task of creating a research study, for NLP Practitioners who have no research background. It makes no attempt to be comprehensive, and successful research is still best done with guidance from a research specialist at a University or Medical School. However, if NLP is to cease being insulted online as a "pseudoscience", we need to have a better understanding of what research is, and indeed what science is.
Science is "the study of phenomena through systematic observation and evaluation". This involves developing theories as explanations of events, creating a testable hypothesis to check the theory, collecting data under precise conditions (e.g. by experiment) and evaluating the data to check the hypothesis. Theory formation is guided most of all by "Occam's razor" (simpler ideas are better). In Psychology, we are very aware that a) perception is always inaccurate ("The map is not the territory") and b) memory is never fully reliable. In fact, we deliberately alter both perception and memory to achieve clinical results for clients. So, we know that just being convinced we have "seen" the benefits of a treatment is not really "proof". That's why we do research.
In quantitative research (the most commonly used type of research for testing effectiveness of treatments), we alter something (called the "independent variable") and check the effect on something else (called the "dependent variable"). For example, we alter whether or not the people we study get our new NLP process - the independent variable, because we can change it - and we see what happens to their emotional state - the dependent variable. A research study needs to have "internal and external validity". Internal validity means that we have ruled out any other variables that may be creating the result we observe in the dependent variable (such as the extra attention that subjects get by being in the research). External validity means we can generalise the results of this research to other groups of people or other circumstances. We can increase this validity, for example, by making sure our sample is similar to the group we want to generalise to, or by repeating the experiment elsewhere. A control group can improve both types of validity.
In designing your research, it is important to note that even if the two variables you study are correlated (a specific change in the dependent variable occurs when we change the independent variable), that only proves correlation, not causation. One 2013 study of people diagnosed schizophrenic found that 40% were left handed (as compared to the usual 10%). Does left handedness cause schizophrenia? Probably not. For example, a higher percentage of left handed people are born premature - actually about 50% of premature babies are left handed. Premature birth may be a "confounding variable" - a third variable that increases the risk of later schizophrenia diagnosis as well as of left handedness. We need to check.
Another important type of validity is called "Data Evaluation Validity". A little more jargon: we have an "Alternative hypothesis" (the idea that when we alter the independent variable it really does affect the dependent variable) and a "Null hypothesis" (the hypothesis that there is no significant difference between the two conditions of the independent variable, so the experiment has no effect). Unfortunately, sometimes there is a difference but it doesn't really prove much in the real world, or there is no difference and yet in the real world something important is going on. The best example of this is the risk that the experiment has "Low Statistical Power". For example, the herb Echinacea may help people recover from colds, but if we only do research on 20 people, there may not be enough people for us to see the effect. Missing a real effect like this is called a "Type 2 error". The opposite mistake - finding that Echinacea seems to work in our study, when it really has no effect in the real world - is a "Type 1 error". To keep the risk of both types of error low, we use "statistical analysis". This is not the most fun part of research for most Psychology researchers, but it is kind of important. Universities often have software that does all this stuff for them. You just need to be able to make sense of the results.
You can do a perfectly good research study without mastering all of this statistics, by handing the job over to someone else, or by accepting that you do not know the statistical significance of the results you got, even though you know the magnitude of change within the group studied (the effect size).
Standard Deviation From The Mean:
The mean is the average result of a test, survey, or experiment. This is a basic thing to know for comparing two sets of data to check for effect size or statistical significance. Sometimes the average tells you a lot because most results are near it. Sometimes results are very widely spread out. If I told you the "average" income in India is US$616 a year, you might be very surprised to discover that India has a lot of millionaires. Standard deviations are measures of how far from the mean the results are. In case you want to know, the steps in calculating the standard deviation are as follows:
- For each value, find its distance to the mean
- For each value, find the square of this distance
- Find the sum of these squared values
- Divide the sum by the number of values in the data set
- Find the square root of this
For normally distributed data:
- 99.7% of the data are within 3 standard deviations of the mean (however far that is)
- 95% are within 2 standard deviations
- 68% are within 1 standard deviation
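As a sketch, the five calculation steps above can be written directly in Python. Note that the steps describe the "population" standard deviation (dividing by the number of values); the "sample" standard deviation, often used in research, divides by the number of values minus one instead:

```python
import math

def std_dev(values):
    """Population standard deviation, following the five steps above."""
    mean = sum(values) / len(values)                 # the mean
    squared = [(v - mean) ** 2 for v in values]      # squared distance to the mean
    return math.sqrt(sum(squared) / len(values))     # average them, then square root

scores = [4, 8, 6, 5, 7]       # hypothetical questionnaire scores
print(round(std_dev(scores), 3))  # 1.414
```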
P Value Measures Statistical Significance:
We want our results to be statistically significant. Statistical significance tells you whether a result may just be accidental or not. The end result of a statistical significance test is a p value, which represents the probability that random fluctuations alone could have generated results that differed from the null hypothesis (H0), in the direction of the alternate hypothesis (HAlt), by at least as much as what you observed in your data. If this probability is too small, then H0 can no longer explain your results, and you're justified in rejecting it and accepting HAlt, which says that some real effect is present. You can say that the effect seen in your data is statistically significant. If you adopt the criterion that p must be less than or equal to 0.05 to declare significance, then you'll keep the chance of making a Type I error to no more than 5 percent.
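For illustration, here is one simple way to estimate a p-value without statistics software: a permutation test, which repeatedly shuffles the group labels to simulate the null hypothesis (that group membership makes no difference). The function name and the example scores are our own, not taken from any particular study:

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Under the null hypothesis, group labels are interchangeable, so we
    shuffle the pooled scores many times and count how often a random
    split produces a difference at least as large as the one observed.
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / len(group_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

treated = [14, 12, 15, 13, 16]   # hypothetical post-treatment scores
control = [9, 10, 8, 11, 10]
p = permutation_p_value(treated, control)  # small p: unlikely under the null
```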
In research on NLP processes the most persuasive way of proving that a change has statistical significance is to test for diagnostic criteria such as the PTSD criteria. To show, as research on the NLP Reconsolidation technique has done, that subjects "no longer meet diagnostic criteria" is extremely persuasive evidence for the treatment -- basically it implies that a medical condition was cured. The larger the study size, of course, the less likely it is that your results are due to chance. Research in the military often involves thousands of subjects. That said, most psychology studies involve groups of 10-50 people, so these are still worth doing.
Correlation is the degree to which two factors appear to be related. The r-value is the way that correlation is reported statistically. It's a number between -1 and +1. If r = 0, there is no linear correlation between the two variables; the closer r is to +1, the stronger the positive correlation, and the closer it is to -1, the stronger the negative correlation. Generally, r-values should be above about 0.3 before reporting a meaningful positive correlation (whether the correlation is also statistically significant depends on the sample size).
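As a sketch, the r-value (Pearson's correlation coefficient) can be computed from two lists of paired scores like this:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # co-variation
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))           # spread of xs
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))           # spread of ys
    return cov / (sx * sy)

# Perfectly related pairs give r = 1; perfectly opposed pairs give r = -1.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # 1.0
```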
The effect size tells us how big the difference between the control and the research situation is, within our study. Again, it takes a bit of maths to calculate it. Means are calculated by adding up the total scores and dividing the result by the number of subjects. The standard deviation can be calculated using the steps above. The effect size is:
Effect Size = [Mean of experimental group] minus [Mean of control group] Divided by [Standard Deviation].
For example, an effect size of 0.8 means that the score of the average person in the experimental group is 0.8 standard deviations above the average person in the control group, and hence exceeds the scores of 79% of the control group.
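The effect size formula above can be sketched in code. One assumption to flag: the formula leaves open which standard deviation to divide by; this version uses the pooled standard deviation of the two groups, the usual choice for what is known as Cohen's d:

```python
import math

def effect_size(experimental, control):
    """Cohen's d: difference in group means divided by the pooled
    standard deviation of the two groups."""
    def mean(v):
        return sum(v) / len(v)
    def sum_sq(v, m):
        return sum((x - m) ** 2 for x in v)
    m_e, m_c = mean(experimental), mean(control)
    pooled_sd = math.sqrt((sum_sq(experimental, m_e) + sum_sq(control, m_c))
                          / (len(experimental) + len(control) - 2))
    return (m_e - m_c) / pooled_sd

d = effect_size([5, 6, 7], [3, 4, 5])  # 2.0: a very large effect
```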
Creating a Research Study
At the beginning of this article I used the term "quantitative research" to refer to the kind of research design that can be numerically analysed like this, where we vary an independent variable and observe the effect on the dependent variable. There is also such a thing as "qualitative research", which involves interviewing people, asking them to write narrative accounts, or to record accounts, and then analysing those. This is also very useful, as long as the research has a system which does not impose the researchers' expectations and cognitive biases on the material. In NLP, for example, we have the system of "clean language" (developed by David Grove, and modelled by James Lawley and Penny Tompkins) which allows us to precisely interview people using only their own internal metaphors and cognitive style. For example, it has been possible using computer analysis of social media posts to predict which people are depressed, based on their word use. The researchers did not change a specific variable and study the result - they simply looked for differences in the narratives available. They found that depressed individuals refer to themselves much more (with words like "I", "me"), use negative emotional words more ("sad", "lonely" etc), and, most significantly, use what NLP calls universal quantifiers more (words like "all", "never", "no-one").
In qualitative research we still need to check for Internal and external validity. We can increase validity by "triangulation" (collating results from different investigators, different data collection methods etc.).
The gold standard for quantitative research is the Randomized Double-Blind Placebo-Controlled (RDBPC) study. This provides the most secure evidence that the independent variable (e.g. the treatment) is causative of the change in the dependent variable (e.g. the emotional state of the participants). The people must be either randomly assigned to the two groups, or randomly assigned with balancing of some other relevant variable (e.g. the severity of the problem - when we want to make sure that although the groups are random, they are equivalent in terms of this other variable). Another solution to these extra variables is to exclude from the study anyone who has a different condition (e.g. excluding people diagnosed psychotic, bipolar, or with dementia from research on PTSD).
"Double blind" means that neither the research subject (say the person who has the technique done on them) nor the treatment deliverer know who is in the control group. In reality this is rather difficult to do with NLP, where the Practitioner needs to be trained in NLP to deliver the treatment, and hence is likely to easily identify which group the client is in. In that case we have at best a single blind. Triple blind, by the way, would mean that even the data entry and analysis people do not know who is in which group.
Often, because it is rather unethical to let half of the study group get no treatment at all, the control group is a "wait-list control group". They don't yet know it, but they are going to be offered a guaranteed opportunity to experience the real treatment afterwards (after the follow-up assessments, which may be, say, 6 months later!). If the dependent variable is likely to fluctuate due to other variables (anxiety may vary due to random exposure to disturbing news events, for example) then a baseline of several data points may be useful (for example, having participants fill in a questionnaire every day at a specified time for a week).
Another important keyword to understand is placebo controlled. A placebo is an "inert" substitute for a treatment or intervention (e.g. giving the control group sleep advice or exercise advice). "Inert" means the process has no known activity that would be expected to affect the outcome. A placebo effect is a psychosomatic effect brought about by relief of fear, anxiety or stress because of study participation. A component of every specific treatment effect can be attributed to the placebo response. Just being in the research study (getting attention etc) causes some degree of change in most psychological variables. It is important to note that NO treatment is NOT the same as placebo treatment. If you have a group that has NO treatment, they have no change in the main independent variable, but also they have no change in this other variable (placebo effect). To determine if improvement in the treated group is due to the specific treatment rather than the act of being treated, a placebo must be used.
We also need to check the correlation between the variable we are measuring and the tool we are using to measure it. For example there are standard questionnaires such as the MMPI-2 (The Minnesota Multiphasic Personality Inventory for personality studies), the PHQ-9 (for depression), the GAD-7 (for anxiety) and the PCL-5 (for PTSD), that have been proven in multiple research studies to correlate well with the symptoms they are checking for. Sometimes more than one measure can be used, to give "convergent validity". There are also measures of psychobiology such as fMRI (Functional Magnetic Resonance Imaging -- measuring blood oxygenation in the brain).
The evidential weight of a study can be dramatically enhanced by replicating the study with another group, using different researchers.
Doing research on human beings obviously raises a number of ethical concerns. You are gaining access to huge amounts of personal information, which must be kept confidential (encrypted) under privacy laws. You need written consent to use potentially identifiable data (e.g. photographs, case studies). If your interventions are able to produce a positive effect in someone's life, they may also be able to produce a negative effect if used incorrectly, so some monitoring for adverse effects and a plan to get support may be important. The researcher, or a third element such as a computer record, inevitably has access to information that the research subjects do not know (such as who is in the control group) and so there is an element of deception involved which needs to be explained and agreed to beforehand. Generally this is resolved as an ethical issue by ensuring that participants sign an informed consent form that explains the entire protocol of the research. Medical schools and Universities often have an ethics committee that has the job of checking through all research studies done by members of their community and approving them before the start. There are also state laws that limit what research can be done (for example, sometimes it is forbidden to ask children about religion or sex in research).
Here are the American Psychological Association's comments on informed consent. APA's Ethics Code mandates that psychologists who conduct research should inform participants about:
- The purpose of the research, expected duration and procedures.
- Participants' rights to decline to participate and to withdraw from the research once it has started, as well as the anticipated consequences of doing so.
- Reasonably foreseeable factors that may influence their willingness to participate, such as potential risks, discomfort or adverse effects.
- Any prospective research benefits.
- Limits of confidentiality, such as data coding, disposal, sharing and archiving, and when confidentiality must be broken.
- Incentives for participation.
- Who participants can contact with questions.
Experts also suggest covering the likelihood, magnitude and duration of harm or benefit of participation, emphasizing that their involvement is voluntary and discussing treatment alternatives, if relevant to the research.
If you are working in a group, then you also have ethical responsibilities to that group, for example to acknowledge contributions, often by naming contributors as authors in the research report. This should be discussed before you begin work. You have a responsibility to the scientific community to preserve the integrity of data (not to fake results) and to avoid plagiarism. You need to declare any other benefits you may make as a result of the research conclusions (do these results promote your business?), and any payments made to participants (If they get paid that also alters their motivation of course).
How to Measure Results and Write The Report
Following are three examples of NLP research studies from the real world, and then the American Psychological Association's detailed instructions on writing up your research study. Two useful reference texts:
- Kazdin, A. E. (2017) Research Design in Clinical Psychology, Fifth Edition. Pearson, Boston.
- Cooper, H. M. (2019) Reporting Quantitative Research in Psychology. American Psychological Association. Kindle Edition.
Three Examples of NLP Research
Example 1: New Zealand Research on NLP Based Treatment for Depression.
A study of Des Shinnick's NLP based "Shinnick Method: Rapid Depression Treatment" was undertaken in 2013 under the supervision of medical Professor Bruce Arroll.
Some notes about the research, and the measures used:
Inclusion criteria: Patients in primary medical care aged 16 and over, with a PHQ-9 depression inventory between 10 and 20.
Exclusion criteria: Currently on major tranquillizer medication, Diagnosed Psychotic or Bipolar disorder, dementia, terminal illness or current intoxication.
Control group: usual medical care and physical exercise and sleep advice.
Validation: At 6 weeks and at 13 weeks after treatment, a third party will deliver the assessment with no prior knowledge of who was treated.
In-session scaling: Have the person circle a number from 1 to 10 (where 1 = Feeling unhappy, and 10 = Feeling great) before and after each session.
Medical Practitioner notification: Patients will be advised that their medical practitioner will be notified if they have any adverse response to the treatment, or in case of suicidal thoughts.
Randomisation: by a random number generator, and each patient is then assigned a practice identifier as well as a patient identifier.
Data safety: Data will be stored in encrypted form.
Patient Health Questionnaire -- 9 Items (Depression)
Ask the patient: how often have they been bothered by the following over the past 2 weeks?
Score: Not at all = 0, Several days = 1, More than half the days = 2, Nearly every day = 3
- Little interest or pleasure in doing things?
- Feeling down, depressed, or hopeless?
- Trouble falling or staying asleep, or sleeping too much?
- Feeling tired or having little energy?
- Poor appetite or overeating?
- Feeling bad about yourself -- or that you are a failure or have let yourself or your family down?
- Trouble concentrating on things, such as reading the newspaper or watching television?
- Moving or speaking so slowly that other people could have noticed? Or so fidgety or restless that you have been moving a lot more than usual?
- Thoughts that you would be better off dead, or thoughts of hurting yourself in some way?
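As a sketch, scoring a PHQ-9 response set is simple to automate. The severity bands used below (5 = mild, 10 = moderate, 15 = moderately severe, 20 = severe) are the standard published PHQ-9 cutoffs; the function name and return format are our own:

```python
def score_phq9(answers):
    """Sum the nine answers (each 0-3) and return the total with a severity band."""
    assert len(answers) == 9 and all(a in (0, 1, 2, 3) for a in answers)
    total = sum(answers)
    if total >= 20:
        band = "severe"
    elif total >= 15:
        band = "moderately severe"
    elif total >= 10:
        band = "moderate"
    elif total >= 5:
        band = "mild"
    else:
        band = "minimal"
    return total, band

# A patient answering "Several days" (1) to every item scores 9: mild.
print(score_phq9([1, 1, 1, 1, 1, 1, 1, 1, 1]))  # (9, 'mild')
```

Recall that the study above used a total between 10 and 20 (i.e. at least "moderate") as its inclusion criterion.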
Generalised Anxiety Disorder -- 7 Items (Anxiety)
Ask the patient: how often have they been bothered by the following over the past 2 weeks?
Score: Not at all = 0, Several days = 1, More than half the days = 2, Nearly every day = 3
- Feeling nervous, anxious, or on edge
- Not being able to stop or control worrying
- Worrying too much about different things
- Trouble relaxing
- Being so restless that it's hard to sit still
- Becoming easily annoyed or irritable
- Feeling afraid as if something awful might happen
If you checked off any problems, how difficult have these made it for you to do your work, take care of things at home, or get along with other people? ... Not difficult at all, Somewhat difficult, Very difficult, Extremely difficult.
Example 2: Non Treatment Research Example: Effect of Eye Accessing Cues on Spelling
Loiselle, F. (1985) The effect of eye placement on orthographic memorisation. Ph.D. Thesis, Faculté des Sciences Sociales, Université de Moncton, New Brunswick, Canada
F. Loiselle at the University of Moncton in New Brunswick, Canada (1985) selected 44 average spellers, as determined by their pretest on memorising nonsense words. Instructions in the experiment, where the 44 were required to memorise another set of nonsense words, were given on a computer screen. The 44 were divided into four subgroups for the experiment.
- Group One were told to visualise each word in the test, while looking up to the left.
- Group Two were told to visualise each word while looking down to the right.
- Group Three were told to visualise each word (no reference to eye position).
- Group Four were simply told to study the word in order to learn it.
The results on testing immediately after were that Group One (who did actually look up left more than the others, but took the same amount of time) increased their success in spelling by 25%, Group Two worsened their spelling by 15%, Group Three increased their success by 10%, and Group Four scored the same as previously. This strongly suggests that looking up left (Visual Recall in NLP terms) enhances spelling, and is twice as effective as simply teaching students to picture the words. Furthermore, looking down right (Kinesthetic in NLP terms) damages the ability to visualise the words. Interestingly, in a final test some time later (testing retention), the scores of Group One remained constant, while the scores of the control group, Group Four, plummeted a further 15%, a drop consistent with standard learning studies. The resultant difference in memory of the words for these two groups was 61%.
This study was replicated by Fatemeh Moussavi (2009) at Islamic Azad University, Tehran, using 140 subjects in the same 4 groups ("Learning through NLP: The Impact of Neuro-Linguistic Programming on Orthographic Memorization (Spelling)"). "The results showed about 49% increase in the correct spellings of the students who had looked up into left and visualized (VUL), about 8% increase in the correct spellings in the ones who had been instructed to visualize only (VIS); the students who were instructed to study the words (S-STU) – thus using their previous learning strategy – stayed roughly the same, as one would expect; but the scores of the students who were instructed to look down and to the right (VDR) while visualizing (i.e., to an inappropriate accessing cue) worsened by about 14%. Furthermore, the results of the final test showed the VUL group had close to 100% retention of the words memorized by them, the VIS group had a drop off in their scores of about 10% in the final test, the STU group showed a decline in their scores of about 15%, and the VDR group, however, actually showed an improvement of about 17%."
Example 3. Research and Recognition Project PTSD study
First 4 Trials: Results
This has been NLP's most successful research project. The NLP Fast Rewind Movie process ("Phobia cure") was rebranded as The Reconsolidation of Traumatic Memories (RTM) process. Using the PCL-5 checklist (below), research results were dramatic.
- Pilot Study: Journal of Military, Veteran, and Family Health. 25 of 26 (96%) no longer test as having PTSD. (Gray and Bourke, 2015)
- First Replication Study. 28 of 30 (94%) no longer test as having PTSD. (Tylee, Gray et alia, 2016a)
- Second Replication Study (women). 29 of 30 (96%) no longer test as having PTSD. (Tylee, Gray et alia, 2016b)
- Third Replication Study. 68 of 75 (90%) no longer test as having PTSD. (Steenkamp, Litz et alia, 2016)
The Posttraumatic Stress Disorder (PTSD) Checklist (PCL-5)
The Posttraumatic Stress Disorder (PTSD) Checklist, known as the PCL, is a self-screening tool to help in the diagnosis of PTSD. The PCL gives a probable diagnosis of PTSD; a definitive diagnosis can only be given by an appropriately qualified clinician. The checklist has four different versions; which is most suitable depends on the psychiatric manual being used by the clinician, or on the type of stressful experience that has (or may have) caused the problems. The versions are:
- PCL-5: for diagnosis using the DSM-5 psychiatric manual (released 2013)
- PCL-C: for Civilians, for diagnosis using the DSM-IV psychiatric manual
- PCL-M: for Military veterans or service personnel, for diagnosis using the DSM-IV psychiatric manual
- PCL-S: for non-military use, based on a Specific very stressful event rather than multiple events, for diagnosis using the DSM-IV psychiatric manual
PCL-5: Posttraumatic Checklist for DSM-5. Instructions: There is one question about the stressful experience or event, followed by 20 multiple-choice questions below. These questions have been designed for adults and apply to all types of stressful experiences. This self-assessment tool is not a substitute for clinical diagnosis or advice.
A person who has had an extremely stressful experience may have a range of different problems as a result. Some people have had more than one extremely stressful experience. For each of the questions below, keep your worst experience or event in mind; please read each problem carefully and then select one response to indicate how much you have been bothered by that problem in the past month. Description of the specific, worst stressful experience you are holding in mind (not scored): this box can be used by a clinician and compared against the types of "qualifying event" that are known to be possible causes of PTSD.
In the past month, how much were you bothered by:
1. Repeated, disturbing, and unwanted memories of the stressful experience?
2. Repeated, disturbing dreams of the stressful experience?
3. Suddenly feeling or acting as if the stressful experience were actually happening again (as if you were actually back there reliving it)?
4. Feeling very upset when something reminded you of the stressful experience?
5. Having strong physical reactions when something reminded you of the stressful experience (for example, heart pounding, trouble breathing, sweating)?
6. Avoiding memories, thoughts, or feelings related to the stressful experience?
7. Avoiding external reminders of the stressful experience (for example, people, places, conversations, activities, objects, or situations)?
8. Trouble remembering important parts of the stressful experience?
9. Having strong negative beliefs about yourself, other people, or the world (for example, having thoughts such as: l am bad, there is something seriously wrong with me, no one can be trusted, the world is completely dangerous)?
10. Blaming yourself or someone else for the stressful experience or what happened after it?
11. Having strong negative feelings such as fear, horror, anger, guilt, or shame?
12. Loss of interest in activities that you used to enjoy?
13. Feeling distant or cut off from other people?
14. Trouble experiencing positive feelings (for example, being unable to feel happiness or have loving feelings for people close to you)?
15. Irritable behavior, angry outbursts, or acting aggressively?
16. Taking too many risks or doing things that could cause you harm?
17. Being "superalert" or watchful or on guard?
18. Feeling jumpy, or easily startled?
19. Having difficulty concentrating?
20. Trouble falling or staying asleep?
Rate each as:
0 = Not at all
1 = A little bit
2 = Moderately
3 = Quite a bit
4 = Extremely
PCL-5: Weathers, F. W., Litz, B. T., Keane, T. M., Palmieri, P. A., Marx, B. P., & Schnurr, P. P. (2013).
Scoring: The score is worked out by adding up the number of points for each answer. The minimum score is 0, the maximum is 80.
Result Explained: There are several different ways of interpreting the scores given by the PTSD Checklist-5. For a probable diagnosis of PTSD, sufficient criteria must be at least moderately met in each of the four symptom groups: one or more symptoms from questions 1 to 5, either question 6 or 7, two or more from questions 8 to 14, and two or more from questions 15 to 20, each of which must be rated moderately, quite a bit, or extremely. In addition, a total score of 38 or higher indicates probable PTSD in veterans; the cutoff may be set higher or lower for civilians (no agreement has been reached yet, since the PCL-5 was only developed after the DSM-5 was published in 2013). A lower cutoff may be used for initial screening rather than probable diagnosis. For those people already diagnosed, the checklist can be used to measure improvement. A definitive diagnosis can only be given by a clinician, and depends on the details of the extremely stressful experience described at the top of the form and its effect on the individual; a clinician would also need to ask questions about the problems, to check the person's understanding of each item and that the PTSD criteria are fully met. PCL-5 scores are not comparable with scores from the PCL-C, PCL-M or PCL-S, because the number of questions and points per question differ.
Measuring improvements in PTSD: The PCL-5 can be used multiple times after diagnosis to assess the change in PTSD symptoms over time. A reduction of 5 points has been suggested to reflect a reliable reduction in symptoms, meaning the change is unlikely to be due to chance; this can be used to check whether an individual's symptoms are responding to treatment. A 10-20 point reduction reflects clinically significant change.
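The cluster rule and cutoff described under "Result Explained" above can be sketched as a function. The input is the twenty 0-4 ratings in questionnaire order; an item counts as a symptom when rated 2 ("Moderately") or higher. The function name and return format are our own:

```python
def pcl5_probable_ptsd(answers, cutoff=38):
    """Apply the DSM-5 cluster rule to 20 PCL-5 answers (each 0-4)."""
    assert len(answers) == 20 and all(a in (0, 1, 2, 3, 4) for a in answers)
    sym = [a >= 2 for a in answers]          # "moderately" or worse counts
    clusters_met = (sum(sym[0:5]) >= 1 and   # questions 1-5: one or more
                    sum(sym[5:7]) >= 1 and   # questions 6-7: either one
                    sum(sym[7:14]) >= 2 and  # questions 8-14: two or more
                    sum(sym[14:20]) >= 2)    # questions 15-20: two or more
    total = sum(answers)
    return {"total": total,
            "clusters_met": clusters_met,
            "above_cutoff": total >= cutoff}

# A respondent rating every item "Moderately" (2) scores 40 and meets all clusters.
result = pcl5_probable_ptsd([2] * 20)
```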
Appendix: Writing a Research Report: APA Style JARS
In 2006 the American Psychological Association (APA) formed the Journal Article Reporting Standards Working Group (JARS Working Group), and following here are their recommended standards for Quantitative (numbers measuring) research. JARS--Quant Table 1. Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design.
Title and Title Page
- Identify main variables and theoretical issues under investigation and the relationships between them.
- Identify the populations studied.
- Provide acknowledgment and explanation of any special circumstances, including
- registration information if the study has been registered
- use of data also appearing in previous publications
- prior reporting of the fundamental data in dissertations or conference papers
- sources of funding or other support
- relationships or affiliations that may be perceived as conflicts of interest
- previous (or current) affiliation of authors if different from location where the study was conducted
- contact information for the corresponding author
- additional information of importance to the reader that may not be appropriately included in other sections of the paper
Abstract
- State the problem under investigation, including main hypotheses.
- Describe participants (human research), specifying their pertinent characteristics for the study; in animal research, include genus and species. Participants are described in greater detail in the body of the paper.
- Describe the study method, including
- research design (e.g., experiment, observational study)
- sample size
- materials used (e.g., instruments, apparatus)
- outcome measures
- data-gathering procedures, including a brief description of the source of any secondary data. If the study is a secondary data analysis, so indicate.
- Report findings, including effect sizes and confidence intervals or statistical significance levels.
- State conclusions, beyond just results, and report the implications or applications.
Importance of the Problem
- State the importance of the problem, including theoretical or practical implications.
Review of Relevant Scholarship
- Provide a succinct review of relevant scholarship, including
- relation to previous work
- differences between the current report and earlier reports if some aspects of this study have been reported on previously
Hypothesis, Aims, and Objectives
- State specific hypotheses, aims, and objectives, including
- theories or other means used to derive hypotheses
- primary and secondary hypotheses
- other planned analyses
- State how hypotheses and research design relate to one another.
Inclusion and Exclusion
- Report inclusion and exclusion criteria, including any restrictions based on demographic characteristics.
- Report major demographic characteristics (e.g., age, sex, ethnicity, socioeconomic status) and important topic-specific characteristics (e.g., achievement level in studies of educational interventions).
- Describe procedures for selecting participants, including
- sampling method if a systematic sampling plan was implemented
- percentage of sample approached that actually participated
- whether self-selection into the study occurred (either by individuals or by units, such as schools or clinics)
- Describe settings and locations where data were collected as well as dates of data collection.
- Describe agreements and payments made to participants.
- Describe institutional review board agreements, ethical standards met, and safety monitoring.
Sample Size, Power, and Precision
- Describe the sample size, power, and precision, including
- intended sample size
- achieved sample size, if different from the intended sample size
- determination of sample size, including
- power analysis, or methods used to determine precision of parameter estimates
- explanation of any interim analyses and stopping rules employed
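The "power analysis" used to determine sample size can be sketched with the standard normal-approximation formula for a two-sample t-test. This is an approximation (exact calculations use the noncentral t distribution, as in dedicated tools such as G*Power), and the effect size, alpha, and power values below are illustrative conventions, not recommendations from the JARS text:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample
    t-test, via the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    where d is the expected standardized effect size (Cohen's d)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium effect (d = 0.5) at alpha = .05 and 80% power:
print(n_per_group(0.5))  # 63 participants per group
```

Reporting this calculation (intended effect size, alpha, power) in the manuscript is exactly what the "determination of sample size" item above asks for.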
Measures and Covariates
- Define all primary and secondary measures and covariates, including measures collected but not included in the report.
- Describe methods used to collect data.
Quality of Measurements
- Describe methods used to enhance the quality of measurements, including
- training and reliability of data collectors
- use of multiple observations
- Provide information on validated instruments, or on ad hoc instruments created for individual studies (e.g., psychometric and biometric properties).
- Report whether participants, those administering the experimental manipulations, and those assessing the outcomes were aware of condition assignments.
- If masking took place, provide a statement regarding how it was accomplished and whether its success was evaluated.
- Estimate and report values of reliability coefficients for the scores analyzed (i.e., the researcher's sample), if possible. Provide estimates of convergent and discriminant validity where relevant.
- Report estimates related to the reliability of measures, including
- interrater reliability for subjectively scored measures and ratings
- test-retest coefficients in longitudinal studies in which the retest interval corresponds to the measurement schedule used in the study
- internal consistency coefficients for composite scales in which these indices are appropriate for understanding the nature of the instruments being used in the study
- Report the basic demographic characteristics of other samples if reporting reliability or validity coefficients from those samples, such as those described in test manuals or in norming information for the instrument.
Conditions and Design
- State whether conditions were manipulated or naturally observed. Report the type of design as per the JARS-Quant tables (2-6):
- experimental manipulation with participants randomized
- experimental manipulation without randomization
- clinical trial with randomization
- clinical trial without randomization
- nonexperimental design (i.e., no experimental manipulation): observational design, epidemiological design, natural history, and so forth (single-group designs or multiple-group comparisons)
- longitudinal design
- N-of-1 studies
- Report the common name given to designs not currently covered in JARS-Quant.
- Describe planned data diagnostics, including
- criteria for post-data-collection exclusion of participants, if any
- criteria for deciding when to infer missing data and methods used for imputation of missing data
- definition and processing of statistical outliers
- analyses of data distributions
- data transformations to be used, if any
- Describe the analytic strategy for inferential statistics and protection against experiment-wise error for
- primary hypotheses
- secondary hypotheses
- exploratory hypotheses
- Report the flow of participants, including
- total number of participants in each group at each stage of the study
- flow of participants through each stage of the study (include figure depicting flow, when possible; see the JARS-Quant Participant Flowchart)
- Provide dates defining the periods of recruitment and repeated measures or follow-up.
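One common, pre-registrable "definition and processing of statistical outliers" (an item in the planned data diagnostics above) is Tukey's 1.5 × IQR fence. The sketch below, in plain Python with illustrative data, shows how such a criterion can be stated precisely before data collection rather than chosen after seeing the results:

```python
from statistics import quantiles

def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule).
    k = 1.5 is the conventional fence and should be fixed in advance."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# One score stands well apart from the rest:
scores = [12, 14, 15, 15, 16, 17, 18, 19, 45]
print(iqr_outliers(scores))  # [45]
```

Whatever rule is chosen, the report should also state how flagged cases were processed (excluded, winsorized, or retained with sensitivity analyses).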
Statistics and Data Analysis
- Provide information detailing the statistical and data-analytic methods used, including
- missing data
- frequency or percentages of missing data
- empirical evidence and/or theoretical arguments for the causes of data that are missing, for example: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR)
- methods actually used for addressing missing data, if any
- descriptions of each primary and secondary outcome, including the total sample and each subgroup, that includes the number of cases, cell means, standard deviations, and other measures that characterize the data used
- inferential statistics, including
- results of all inferential tests conducted, including exact p values if null hypothesis significance testing (NHST) methods were used, and reporting the minimally sufficient set of statistics (e.g., dfs, mean square [MS] effect, MS error) needed to construct the tests
- effect-size estimates and confidence intervals on estimates that correspond to each inferential test conducted, when possible
- clear differentiation between primary hypotheses and their tests and estimates, secondary hypotheses and their tests and estimates, and exploratory hypotheses and their tests and estimates
- complex data analyses, for example, structural equation modeling analyses (see also Table 7), hierarchical linear models, factor analysis, multivariate analyses, and so forth, including
- details of the models estimated
- associated variance-covariance (or correlation) matrix or matrices
- identification of the statistical software used to run the analyses (e.g., SAS PROC GLM or the particular R package)
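As a sketch of reporting an effect size with a confidence interval alongside an inferential test, the following computes Cohen's d with an approximate normal-theory interval using only the Python standard library. Exact p values need a t-distribution CDF (e.g. `scipy.stats.ttest_ind`), the standard-error formula used here is a common approximation, and the data are made up:

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def cohens_d_with_ci(a: list[float], b: list[float], conf: float = 0.95):
    """Cohen's d (pooled-SD version) between two independent groups,
    with an approximate confidence interval based on the standard error
        SE = sqrt((n1 + n2)/(n1 * n2) + d**2 / (2 * (n1 + n2)))."""
    n1, n2 = len(a), len(b)
    # Pooled standard deviation from the two sample variances.
    sp = sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))
    d = (mean(a) - mean(b)) / sp
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return d, (d - z * se, d + z * se)

# Illustrative post-treatment scores for two small groups:
treatment = [22, 18, 25, 20, 23, 19, 24, 21]
control = [15, 17, 14, 18, 16, 13, 17, 15]
d, ci = cohens_d_with_ci(treatment, control)
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Reporting the interval as well as the point estimate, as the items above require, makes the precision of the effect visible rather than just its direction.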
Support of Original Hypotheses
- Provide a statement of support or nonsupport for all hypotheses, whether primary or secondary, including
- distinction by primary and secondary hypotheses
- discussion of the implications of exploratory analyses in terms of both substantive findings and error rates that may be uncontrolled
Similarity of Results
- Discuss similarities and differences between reported results and work of others.
- Provide an interpretation of the results, taking into account
- sources of potential bias and threats to internal and statistical validity
- imprecision of measurement protocols
- overall number of tests or overlap among tests
- adequacy of sample sizes and sampling validity
- Discuss generalizability (external validity) of the findings, taking into account
- target population (sampling validity)
- other contextual issues (setting, measurement, time; ecological validity)
- Discuss implications for future research, program, or policy.