Dr Richard Bolstad is Transformations Principal Trainer

Doing Research With NLP

Richard Bolstad

Scientific Research Basics

This article is based on the manual for our training on NLP Research. It is published here in the hope that it may help to simplify or at least summarise the task of creating a research study, for NLP Practitioners who have no research background. It makes no attempt to be comprehensive, and successful research is still best done with guidance from a research specialist at a University or Medical School. However, if NLP is to cease being insulted online as a "pseudoscience", we need to have a better understanding of what research is, and indeed what science is.

Science is "the study of phenomena through systematic observation and evaluation". This involves developing theories as explanations of events, creating a testable hypothesis to check the theory, collecting data under precise conditions (e.g. by experiment) and evaluating the data to check the hypothesis. Theory formation is guided most of all by "Occam's razor" (simpler ideas are better). In Psychology, we are very aware that a) perception is always inaccurate ("The map is not the territory") and b) memory is never fully reliable. In fact, we deliberately alter both perception and memory to achieve clinical results for clients. So, we know that just being convinced we have "seen" the benefits of a treatment is not really "proof". That's why we do research.

In quantitative research (the most commonly used type of research for testing effectiveness of treatments), we alter something (called the "independent variable") and check the effect on something else (called the "dependent variable"). For example, we alter whether or not the people we study get our new NLP process (the independent variable, because we can change it), and we see what happens to their emotional state (the dependent variable). A research study needs to have "internal and external validity". Internal validity means that we have ruled out any other variables that may be creating the result we observe in the dependent variable (such as the extra attention that subjects get by being in the research). External validity means we can generalise the results of this research to other groups of people or other circumstances. We can increase this validity, for example, by making sure our sample is similar to the group we want to generalise to, or by repeating the experiment elsewhere. A control group can improve both types of validity.

In designing your research, it is important to note that even if the two variables you study are correlated (a specific change in the dependent variable occurs when we change the independent variable), that only proves correlation, not causation. One 2013 study of people diagnosed schizophrenic found that 40% were left handed (as compared to the usual 10%). Does left handedness cause schizophrenia? Probably not. For example, a higher percentage of left handed people are born premature -- actually about 50% of premature babies are left handed. Premature birth may be a "moderator variable" (statisticians more often call this kind of third factor a confounding variable) -- it may increase the risk of later schizophrenia diagnosis as well as of left handedness. We need to check.

Another important type of validity is called "Data Evaluation Validity". A little more jargon: we have an "Alternative hypothesis" (which is the idea that when we alter the independent variable it really does affect the dependent variable). And we have the "Null hypothesis", which is the hypothesis that there is no significant difference between the two conditions of the independent variable (so the experiment has no effect). Unfortunately, sometimes there is a difference but it doesn't really prove much in the real world, or there is no difference, and yet in the real world something important is going on. The best example of this is the risk that the experiment has "Low Statistical Power". For example, the herb Echinacea may help people recover from colds, but if we only do research on 20 people, there may not be enough people for us to see the effect. To avoid these types of error (called "Type 2 errors") as well as the opposite type where we find that Echinacea seems to work, but then it really has no effect in the real world ("Type 1 errors") we use "statistical analysis". This is not the most fun part of research for most Psychology researchers, but it is kind of important. Universities often have software that does all this stuff for them. You just need to be able to make sense of the results.
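To get a feel for statistical power before recruiting, a rough sample-size estimate can help. The following is only a sketch, not a substitute for advice from a statistician: it uses the common normal-approximation formula for comparing the means of two groups. The function name, the 5% significance level and the 80% power target are assumptions chosen for illustration, not figures from this article.

```python
from math import ceil
from statistics import NormalDist


def participants_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate group size needed to detect a given effect size.

    Normal approximation for a two-sided comparison of two group means.
    effect_size is the expected difference, measured in standard deviations.
    """
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # cut-off for the significance test
    z_power = z.inv_cdf(power)           # cut-off for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Small, medium and large expected effects (illustrative values)
for d in (0.2, 0.5, 0.8):
    print(d, participants_per_group(d))
```

Under these assumptions a medium effect (0.5) needs roughly 63 people in each group, which is why a study of 20 people in total can easily miss an effect that is really there.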

Statistics Concepts

You can do a perfectly good research study without knowing all this statistics, by handing the job over to someone else, or by accepting that you do not know the statistical significance of the results you got, although you know the magnitude of change within the group studied (the effect size).

Standard Deviation From The Mean:

The mean is the average result of a test, survey, or experiment. This is a basic thing to know for comparing two sets of data to check for effect size or statistical significance. Sometimes the average tells you a lot because most results are near it. Sometimes results are very widely spread out. If I told you the "average" income in India is US$616 a year, you might be very surprised to discover that India has a lot of millionaires. Standard deviations are measures of how far from the mean the results are. In case you want to know, the steps in calculating the standard deviation are as follows:

Work out the mean of the results
Subtract the mean from each result, and square each difference
Add up the squared differences and divide by the number of results (or by one less than the number of results, when your data are a sample from a larger group)
Take the square root of that figure

When results follow the familiar bell-shaped (normal) distribution:

99.7% of the data are within 3 standard deviations from the mean (however far that is)
95% are within 2 standard deviations
68% are within 1 standard deviation
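If you prefer to see the standard deviation worked through in code, here is a minimal Python sketch. The scores are made up for illustration, and the helper name share_within is my own. Note that with only eight results the percentages will not match the bell-curve figures exactly.

```python
from statistics import mean, stdev

# Hypothetical scores from eight participants
scores = [2, 4, 4, 4, 5, 5, 7, 9]


def share_within(data, k):
    """Fraction of results lying within k standard deviations of the mean."""
    m = mean(data)
    sd = stdev(data)  # sample standard deviation (divides by n - 1)
    return sum(1 for x in data if abs(x - m) <= k * sd) / len(data)


print("mean:", mean(scores))
print("standard deviation:", round(stdev(scores), 2))
for k in (1, 2, 3):
    print("within", k, "SD:", share_within(scores, k))
```

Here the mean is 5 and the standard deviation is about 2.14, so six of the eight scores (75%) fall within one standard deviation of the mean.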

P Value Measures Statistical Significance:

We want our results to be statistically significant. Statistical significance tells you whether a result may just be accidental or not. The end result of a statistical significance test is a p value, which represents the probability that random fluctuations alone could have generated results that differed from the null hypothesis (H0), in the direction of the alternate hypothesis (HAlt), by at least as much as what you observed in your data. If this probability is too small, then H0 can no longer explain your results, and you're justified in rejecting it and accepting HAlt, which says that some real effect is present. You can say that the effect seen in your data is statistically significant. If you adopt the criterion that p must be less than or equal to 0.05 to declare significance, then you'll keep the chance of making a Type I error to no more than 5 percent.

In research on NLP processes, the most persuasive way of showing that a change is clinically meaningful, and not just statistically significant, is to test for diagnostic criteria such as the PTSD criteria. To show, as research on the NLP-based RTM technique has done, that subjects "no longer meet diagnostic criteria" is extremely persuasive evidence for the treatment -- basically it implies that a medical condition was cured. The larger the study size, of course, the less likely it is that your results are due to chance. Research in the military often involves thousands of subjects. That said, most psychology studies involve groups of 10-50 people, so these are still worth doing.

Online Statistical Significance calculator here

R Value:

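Correlation is the degree to which two factors appear to be related. The r-value is the way that correlation is reported statistically. It's a number between -1 and +1. If r is close to 0, there is little or no correlation between the two variables. The closer r is to +1, the stronger the positive correlation (as one variable rises, so does the other); the closer it is to -1, the stronger the negative correlation (as one rises, the other falls). As a rough rule of thumb, r-values should be above .3 before you report a meaningful positive correlation; whether a given r is statistically significant also depends on the sample size.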
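For those who want to see where an r-value comes from, here is a minimal sketch of the usual (Pearson) calculation in Python. The data and the function name are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far the two variables move together, and how far each moves alone
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)


# Hypothetical data: number of sessions attended and a wellbeing score
sessions = [1, 2, 3, 4, 5]
wellbeing = [2, 4, 5, 4, 5]
print(round(pearson_r(sessions, wellbeing), 2))
```

For this made-up data r is about 0.77, a strong positive correlation, although with only five people it tells us very little on its own.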

Effect Size:

The effect size tells us how big the difference between the control and the research situation is, within our study. Again, it takes a bit of maths to calculate it. Means are calculated by adding up the total scores and dividing the result by the number of subjects. The standard deviation can be calculated using the online calculator above. The effect size is:

Effect Size = ([Mean of experimental group] minus [Mean of control group]) divided by [Standard Deviation].

For example, an effect size of 0.8 means that the score of the average person in the experimental group is 0.8 standard deviations above the average person in the control group, and hence exceeds the scores of 79% of the control group.
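The effect size calculation can be sketched in Python as follows. The article does not say which standard deviation to divide by, so this sketch assumes the pooled standard deviation of the two groups (the usual choice for "Cohen's d"). The scores and function names are hypothetical, and the percentage conversion assumes the scores are normally distributed.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def effect_size(experimental, control):
    """Difference in means divided by the pooled standard deviation."""
    n_e, n_c = len(experimental), len(control)
    pooled_variance = (
        (n_e - 1) * variance(experimental) + (n_c - 1) * variance(control)
    ) / (n_e + n_c - 2)
    return (mean(experimental) - mean(control)) / sqrt(pooled_variance)


def share_of_controls_exceeded(d):
    """Share of the control group scoring below the average treated person."""
    return NormalDist().cdf(d)


# Hypothetical post-treatment scores
experimental = [14, 16, 18, 20, 22]
control = [10, 12, 14, 16, 18]
print(round(effect_size(experimental, control), 2))
print(round(share_of_controls_exceeded(0.8), 2))
```

The second line printed is 0.79, which is where the 79% figure for an effect size of 0.8 comes from.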

Creating a Research Study

At the beginning of this article I used the term "quantitative research" to refer to the kind of research design that can be numerically analysed like this, where we vary an independent variable and observe the effect on the dependent variable. There is also such a thing as "qualitative research" which involves interviewing people, asking them to write narrative accounts, or to record accounts, and then analysing those. This is also very useful, as long as the research has a system which does not impose the researchers' expectations and cognitive biases on the material. In NLP, for example, we have the system of "clean language" (developed by David Grove, and modelled by James Lawley and Penny Tompkins) which allows us to precisely interview people using only their own internal metaphors and cognitive style. As another example of qualitative analysis, it has been possible using computer analysis of social media posts to predict which people are depressed, based on their word use. The researchers did not change a specific variable and study the result -- they simply looked for differences in the narratives available. They found that depressed individuals refer to themselves much more (with words like "I", "me"), use negative emotional words more ("sad", "lonely" etc), and, most significantly, use what NLP calls universal quantifiers more (words like "all", "never", "no-one").

In qualitative research we still need to check for Internal and external validity. We can increase validity by "triangulation" (collating results from different investigators, different data collection methods etc.).

The gold standard for quantitative research is the Randomized double blind placebo control (RDBPC) study. This provides the most secure evidence that the independent variable (e.g. the treatment) is causative of the change in the dependent variable (e.g. the emotional state of the participants). The people must be either randomly selected for the two groups, or randomly selected with balancing of some moderator variable (e.g. the severity of the problem -- when we want to make sure that although the groups are random they are equivalent in terms of this other variable). Another solution to these moderator variables is to exclude from the study anyone who also has a different condition (e.g. excluding people diagnosed psychotic, bipolar, or with dementia from research on PTSD).
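Random selection with balancing can be sketched in a few lines of Python. This is one simple approach (shuffle within each severity level, then alternate between groups), offered as an illustration rather than as the procedure any particular study used. The participant codes, severity labels and function name are all hypothetical.

```python
import random
from collections import defaultdict


def balanced_random_assignment(participants, seed=None):
    """Randomly assign people to groups, balanced within each stratum.

    participants is a list of (participant_id, stratum) pairs, where the
    stratum is the variable we want balanced (here, problem severity).
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation repeatable
    by_stratum = defaultdict(list)
    for participant_id, stratum in participants:
        by_stratum[stratum].append(participant_id)

    assignment = {}
    for ids in by_stratum.values():
        rng.shuffle(ids)  # random order within the stratum
        for position, participant_id in enumerate(ids):
            # Alternate down the shuffled list; with an odd number of people
            # in a stratum, the treatment group gets the extra person
            group = "treatment" if position % 2 == 0 else "control"
            assignment[participant_id] = group
    return assignment


people = [
    ("P1", "severe"), ("P2", "severe"), ("P3", "severe"), ("P4", "severe"),
    ("P5", "moderate"), ("P6", "moderate"), ("P7", "moderate"), ("P8", "moderate"),
]
groups = balanced_random_assignment(people, seed=2024)
for participant_id, group in sorted(groups.items()):
    print(participant_id, group)
```

Each severity level ends up with equal numbers in treatment and control, while who goes where within a level is still decided by chance.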

"Double blind" means that neither the research subject (say the person who has the technique done on them) nor the treatment deliverer know who is in the control group. In reality this is rather difficult to do with NLP, where the Practitioner needs to be trained in NLP to deliver the treatment, and hence is likely to easily identify which group the client is in. In that case we have at best a single blind. Triple blind, by the way, would mean that even the data entry and analysis people do not know who is in which group.

Often, because it is rather unethical to let half of the study group get no treatment at all, the control group is a "wait-list control group". They don't yet know it, but they are going to be offered a second (certain) opportunity to experience the real treatment afterwards (after the follow-up assessments, which may be say 6 months later!). If the dependent variable is likely to fluctuate due to other variables (anxiety may vary due to random exposure to disturbing news events, for example) then a baseline of several data points may be useful (for example having participants fill in a questionnaire every day at a specified time for a week).

Another important keyword to understand is placebo controlled. A placebo is an "inert" substitute for a treatment or intervention (e.g. giving the control group sleep advice or exercise advice). "Inert" means the process has no known activity that would be expected to affect the outcome. A placebo effect is a psychosomatic effect brought about by relief of fear, anxiety or stress because of study participation. A component of every specific treatment effect can be attributed to the placebo response. Just being in the research study (getting attention etc) causes some degree of change in most psychological variables. It is important to note that NO treatment is NOT the same as placebo treatment. If you have a group that has NO treatment, they have no change in the main independent variable, but also they have no change in this other variable (placebo effect). To determine if improvement in the treated group is due to the specific treatment rather than the act of being treated, a placebo must be used.

We also need to check the correlation between the variable we are measuring and the tool we are using to measure it. For example there are standard questionnaires such as the MMPI-2 (The Minnesota Multiphasic Personality Inventory for personality studies), the PHQ-9 (for depression), the GAD-7 (for anxiety) and the PCL-5 (for PTSD), that have been proven in multiple research studies to correlate well with the symptoms they are checking for. Sometimes more than one measure can be used, to give "convergent validity". There are also measures of psychobiology such as fMRI (Functional Magnetic Resonance Imaging -- measuring blood oxygenation in the brain).

The strength of the evidence from a study can be dramatically enhanced by replicating the study with another group, using different researchers.


Doing research on human beings obviously raises a number of ethical concerns. You are gaining access to huge amounts of personal information, which must be kept confidential (encrypted) under privacy laws. You need written consent to use potentially identifiable data (e.g. photographs, case studies). If your interventions are able to produce a positive effect in someone's life, they may also be able to produce a negative effect if used incorrectly, so some monitoring for adverse effects and a plan to get support may be important. The researcher, or a third element such as a computer record, inevitably has access to information that the research subjects do not know (such as who is in the control group) and so there is an element of deception involved, which needs to be explained and agreed to beforehand. Generally this is resolved as an ethical issue by ensuring that participants sign an informed consent form that explains the entire protocol of the research. Medical schools and Universities often have an ethics committee that has the job of checking through all research studies done by members of their community and approving them before the start. There are also state laws that limit what research can be done (for example, sometimes it is forbidden to ask children about religion or sex in research).

Here are the American Psychological Association's comments on informed consent: APA's Ethics Code mandates that psychologists who conduct research should inform participants about:

Experts also suggest covering the likelihood, magnitude and duration of harm or benefit of participation, emphasizing that their involvement is voluntary and discussing treatment alternatives, if relevant to the research.

If you are working in a group, then you also have ethical responsibilities to that group, for example to acknowledge contributions, often by naming contributors as authors in the research report. This should be discussed before you begin work. You have a responsibility to the scientific community to preserve the integrity of data (not to fake results) and to avoid plagiarism. You need to declare any other benefits you may make as a result of the research conclusions (do these results promote your business?), and any payments made to participants (If they get paid that also alters their motivation of course).

How to Measure Results and Write The Report

Following are three examples of NLP research studies from the real world, and then the American Psychological Association detailed instructions on writing up your research study.


Three Examples of NLP Research

Example 1: New Zealand Research on NLP Based Treatment for Depression.

A study of Des Shinnick's NLP based "Shinnick Method: Rapid Depression Treatment" was undertaken with supervision of medical Professor Bruce Arroll in 2013.

News article here

Some notes about the research, and the measures used:
Inclusion criteria: Patients in primary medical care aged 16 and over, with a PHQ-9 depression inventory score between 10 and 20.
Exclusion criteria: Currently on major tranquillizer medication, Diagnosed Psychotic or Bipolar disorder, dementia, terminal illness or current intoxication.
Control group: usual medical care and physical exercise and sleep advice.
Validation: At 6 weeks and at 13 weeks after treatment, a third party will deliver the assessment with no prior knowledge of who was treated.
In-session scaling: Have the person circle a number from 1-10 (where 1 = Feeling unhappy, and 10 = Feeling great) before and after each session.
Medical Practitioner notification: Patients will be advised that their medical practitioner will be notified if they have any adverse response to the treatment, or in case of suicidal thoughts.
Randomisation: by a random number generator, and each patient is then assigned a practice identifier as well as a patient identifier.
Data safety: Data will be stored in encrypted form.

Patient Health Questionnaire -- 9 Items (Depression)

Ask the patient: how often have they been bothered by the following over the past 2 weeks?
Score: Not at all = 0, Several days = 1, More than half the days = 2, Nearly every day = 3
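Scoring the questionnaire is simply a matter of adding up the nine answers. Here is a minimal Python sketch that also checks the total against the inclusion range given in the study notes above (a score between 10 and 20). Treating that range as inclusive at both ends is my assumption, and the function names and sample answers are hypothetical.

```python
def phq9_total(answers):
    """Add up nine PHQ-9 answers, each scored from 0 to 3."""
    if len(answers) != 9 or any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("expected 9 answers, each scored 0, 1, 2 or 3")
    return sum(answers)


def meets_inclusion_range(total, lowest=10, highest=20):
    """Check the total against the study's inclusion range (assumed inclusive)."""
    return lowest <= total <= highest


# Hypothetical answers from one patient
answers = [1, 2, 1, 2, 1, 2, 1, 2, 1]
total = phq9_total(answers)
print(total, meets_inclusion_range(total))
```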

Online test here

Generalised Anxiety Disorder -- 7 Items (Anxiety)

Ask the patient: how often have they been bothered by the following over the past 2 weeks?
Score: Not at all = 0, Several days = 1, More than half the days = 2, Nearly every day = 3

If you checked off any problems, how difficult have these made it for you to do your work, take care of things at home, or get along with other people? ... Not difficult at all, Somewhat difficult, Very difficult, Extremely difficult.

Online test here

Example 2: Non Treatment Research Example: Effect of Eye Accessing Cues on Spelling

Loiselle, F. (1985) The effect of eye placement on orthographic memorisation, Ph.D. Thesis, Faculte des Sciences Sociales, Universite de Moncton, New Brunswick, Canada

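F. Loiselle at the University of Moncton in New Brunswick, Canada (1985) selected 44 average spellers, as determined by their pretest on memorising nonsense words. Instructions in the experiment, where the 44 were required to memorise another set of nonsense words, were given on a computer screen. The 44 were divided into four subgroups for the experiment. Group One were told to visualise each word while looking up and to the left; Group Two to visualise each word while looking down and to the right; Group Three simply to visualise each word; and Group Four (the control group) simply to study the words in order to learn them.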

The results on testing immediately after were that Group One (who did actually look up left more than the others, but took the same amount of time) increased their success in spelling by 25%, Group Two worsened their spelling by 15%, Group Three increased their success by 10%, and Group Four scored the same as previously. This strongly suggests that looking up left (Visual Recall in NLP terms) enhances spelling, and is twice as effective as simply teaching students to picture the words. Furthermore, looking down right (Kinesthetic in NLP terms) damages the ability to visualise the words. Interestingly, in a final test some time later (testing retention), the scores of Group One remained constant, while the scores of the control group, Group Four, plummeted a further 15%, a drop which was consistent with standard learning studies. The resultant difference in memory of the words for these two groups was 61%.

This study was replicated by Fatemeh Moussavi (2009) at Islamic Azad University, Tehran, using 140 subjects in the same 4 groups, and reported as "Learning through NLP: The Impact of Neuro-Linguistic Programming on Orthographic Memorization (Spelling)". The report states: "The results showed about 49% increase in the correct spellings of the students who had looked up into left and visualized (VUL), about 8% increase in the correct spellings in the ones who had been instructed to visualize only (VIS); the students who were instructed to study the words (S-STU) – thus using their previous learning strategy – stayed roughly the same, as one would expect; but the scores of the students who were instructed to look down and to the right (VDR) while visualizing (i.e., to an inappropriate accessing cue) worsened by about 14%. Furthermore, the results of final test showed VUL group had close to 100% retention of the words memorized by them, VIS group had a drop off in their scores of about 10% in final test, STU group showed a decline in their scores about 15%, and VDR group, however, actually showed an improvement about 17%."

Example 3: Research and Recognition Project PTSD study

First 4 Trials: Results

This has been the most successful research project related to NLP. The NLP Fast Rewind Movie process ("Phobia cure") was reformatted, precisely defined, and outfitted with a neuroscientific base as The Reconsolidation of Traumatic Memories (RTM) process. Using the PCL-5 checklist (below), research results were dramatic.

See sample study here

The Posttraumatic Stress Disorder (PTSD) Checklist (PCL-5)

The Posttraumatic Stress Disorder (PTSD) Checklist, known as the PCL, is a self-screening tool to help in the diagnosis of PTSD. The PCL gives only a probable diagnosis of PTSD; a definitive diagnosis can only be given by an appropriately qualified clinician. The checklist has four different versions. Which version is most suitable depends on either the psychiatric manual being used by the clinician, or the type of stressful experience that has caused, or may have caused, the problems you experience. The different PCL versions are:
PCL-5: for PTSD diagnosis using the new DSM-5 psychiatric manual (released 2013)
PCL-C: for Civilians, for diagnosis using the DSM-IV psychiatric manual
PCL-M: for Military veterans or service personnel, for diagnosis using the DSM-IV psychiatric manual
PCL-S: for non-military use, based on a Specific very stressful event rather than multiple events, for diagnosis using the DSM-IV psychiatric manual

PCL-5: Posttraumatic Checklist for DSM-5

Instructions: There is one question about the stressful experience or event, followed by 20 multiple-choice questions below. These questions have been designed for adults. The questions below are from the PCL-5, which applies to all types of stressful experiences.

Disclaimer: This self-assessment tool is not a substitute for clinical diagnosis or advice. By using the tool you agree to accept that the distributors and contributors are not responsible or liable for the outcome of the tool, the accuracy of the calculations, or any decisions or events which result from using it. This source does not provide medical advice.

A person who has had an extremely stressful experience may have a range of different problems as a result of the stressful experience. Some people have had more than one extremely stressful experience. For each of the questions below, keep your worst experience or event in mind. Please read each problem carefully and then select one response to indicate how much you have been bothered by that problem in the past month. Description of the specific, worst stressful experience you are holding in mind (not scored): This box can be used by a clinician and compared against the types of "qualifying event" that are known to be possible causes of PTSD.

In the past month, how much were you bothered by:
1. Repeated, disturbing, and unwanted memories of the stressful experience?
2. Repeated, disturbing dreams of the stressful experience?
3. Suddenly feeling or acting as if the stressful experience were actually happening again (as if you were actually back there reliving it)?
4. Feeling very upset when something reminded you of the stressful experience?
5. Having strong physical reactions when something reminded you of the stressful experience (for example, heart pounding, trouble breathing, sweating)?
6. Avoiding memories, thoughts, or feelings related to the stressful experience?
7. Avoiding external reminders of the stressful experience (for example, people, places, conversations, activities, objects, or situations)?
8. Trouble remembering important parts of the stressful experience?
9. Having strong negative beliefs about yourself, other people, or the world (for example, having thoughts such as: I am bad, there is something seriously wrong with me, no one can be trusted, the world is completely dangerous)?
10. Blaming yourself or someone else for the stressful experience or what happened after it?
11. Having strong negative feelings such as fear, horror, anger, guilt, or shame?
12. Loss of interest in activities that you used to enjoy?
13. Feeling distant or cut off from other people?
14. Trouble experiencing positive feelings (for example, being unable to feel happiness or have loving feelings for people close to you)?
15. Irritable behavior, angry outbursts, or acting aggressively?
16. Taking too many risks or doing things that could cause you harm?
17. Being "superalert" or watchful or on guard?
18. Feeling jumpy, or easily startled?
19. Having difficulty concentrating?
20. Trouble falling or staying asleep?

Rate each as:
0 = Not at all
1 = A little bit
2 = Moderately
3 = Quite a bit
4 = Extremely

PCL-5: Weathers, F. W., Litz, B. T., Keane, T. M., Palmieri, P. A., Marx, B. P., & Schnurr, P. P. (2013).

Scoring: The score is worked out by adding up the number of points for each answer. The minimum score is 0, the maximum is 80.

Online questionnaire here

Result Explained: There are several different ways of interpreting the scores given by the PTSD Checklist-5. For a person to have a probable diagnosis of PTSD, sufficient criteria must be at least moderately met in each of the four symptom groups. This means you need to have one or more symptoms from questions 1 to 5, either question 6 or 7, two or more from questions 8 to 14, and two or more from questions 15 to 20, each of which must be met moderately, quite a bit or extremely. In addition, a score of 38 or higher indicates probable PTSD in veterans. The score may be set higher or lower for civilians; no agreement has been reached yet, since the checklist was only developed after the DSM-5 was published in 2013. A lower cut-off may be used for initial screening rather than probable diagnosis. The maximum score is 80. For those people already diagnosed, it can be used to measure improvement. A definite diagnosis can only be given by a clinician, and depends on the details of the extremely stressful experience described at the top of the form and the effect on the individual. A clinician would also need to ask questions about the problems to check the person's understanding of each question and that the PTSD criteria are fully met. The PCL-5 scores are not comparable with scores from the PCL-C, PCL-M or PCL-S because the number of questions and points per question differ.
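The scoring rules just described can be sketched in Python as follows. This is an illustration of the arithmetic only, not a diagnostic tool. The symptom groups and the "moderately or above" threshold are taken from the paragraph above; the function names and the sample ratings are hypothetical.

```python
# Symptom groups from the text: (first question, last question, minimum needed)
SYMPTOM_GROUPS = [(1, 5, 1), (6, 7, 1), (8, 14, 2), (15, 20, 2)]
MODERATE = 2  # a rating of "Moderately" or above counts as a symptom being met


def pcl5_total(ratings):
    """Add up the 20 PCL-5 ratings, each from 0 to 4 (total range 0 to 80)."""
    if len(ratings) != 20 or any(r not in range(5) for r in ratings):
        raise ValueError("expected 20 ratings, each from 0 to 4")
    return sum(ratings)


def meets_symptom_pattern(ratings):
    """Check that enough symptoms are at least moderately met in every group."""
    pcl5_total(ratings)  # reuse the validation above
    for first, last, needed in SYMPTOM_GROUPS:
        met = sum(1 for r in ratings[first - 1:last] if r >= MODERATE)
        if met < needed:
            return False
    return True


# Hypothetical ratings for questions 1 to 20
ratings = [3, 2, 0, 2, 1,  2, 0,  3, 2, 0, 1, 2, 0, 1,  2, 0, 3, 2, 1, 2]
print(pcl5_total(ratings), meets_symptom_pattern(ratings))
```

In this made-up example the person meets the symptom pattern in all four groups with a total of 29, which is below the cut-off of 38 quoted above for veterans. That shows why the two ways of interpreting the checklist can give different answers.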

Measuring improvements in PTSD: The PCL-5 tool can be used multiple times after diagnosis to assess the change in PTSD symptoms over time. A reduction of 5 points has been suggested to reflect a reliable reduction in symptoms, meaning the change is not caused by chance. This can be used to check if an individual's symptoms are responding to treatment. A 10-20 point reduction reflects clinically significant change.
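As a small illustration, the thresholds just mentioned can be applied to a before and after score like this. The function name and the wording of the labels are my own, and the cut-off points are simply the figures given in the paragraph above.

```python
def describe_pcl5_change(score_before, score_after):
    """Label a change in PCL-5 total score using the thresholds in the text."""
    reduction = score_before - score_after
    if reduction >= 10:
        return "clinically significant reduction"
    if reduction >= 5:
        return "reliable reduction"
    return "no reliable reduction"


# Hypothetical scores before treatment and at follow-up
print(describe_pcl5_change(50, 38))
print(describe_pcl5_change(50, 44))
print(describe_pcl5_change(50, 47))
```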

Appendix: Writing a Research Report: APA Style JARS

In 2006 the American Psychological Association (APA) formed the Journal Article Reporting Standards Working Group (JARS Working Group), and following here are their recommended standards for Quantitative (numbers measuring) research.

JARS-Quant Table 1. Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design.

Title and Title Page
Author Note
Study Method
Review of Relevant Scholarship
Hypothesis, Aims, and Objectives
Inclusion and Exclusion
Participant Characteristics
Sampling Procedures
Sample Size, Power, and Precision
Measures and Covariates
Data Collection
Quality of Measurements
Conditions and Design
Data Diagnostics
Analytic Strategy
Participant Flow
Statistics and Data Analysis
Support of Original Hypotheses
Similarity of Results