21 Data Analysis

21.1 Determine the most suitable analyses for your experimental design.

Data analysis is a pivotal stage in any experimental study. It does more than extract insights from your collected data; it is the means by which you scrutinise your research hypothesis and determine whether the evidence supports or refutes it.

In this era of open and transparent science, the role of data analysis has evolved: it is no longer merely a procedure conducted after data collection. Today, it is integrated into every stage of experimental research, starting from the design phase itself. Formulating a robust analysis plan during the design phase significantly improves the quality and validity of your experiment, ensuring that the planned analyses align with your research question and study design. This proactive approach also helps you identify potential roadblocks before they arise.

ChatGPT can be an instrumental aid in this process. It can help you develop an analysis plan, generate appropriate scripts for data analysis, and provide feedback on the planned design, thereby fostering transparency and reproducibility. With ChatGPT’s capabilities, researchers can preview potential outcomes and foresee challenges that could arise in the actual experiment.

One key principle in conducting rigorous research is pre-registration, where researchers declare their hypotheses, experimental designs, and data analysis plans before data collection. Pre-registration helps prevent practices such as p-hacking and HARKing (Hypothesising After the Results are Known), which can lead to false-positive results and undermine the reliability of scientific findings.

In line with pre-registration, researchers are strongly encouraged to draft their data analysis scripts even before they commence data collection. This practice allows researchers to visualise the entire process from the outset, fine-tune their approach as needed, and reduce the potential for bias. Using AI models like ChatGPT to assist in drafting these analysis scripts can significantly speed up this process and help ensure that the final scripts align with best practices in the field.
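
For example, a draft script can be written against simulated data and later re-run, unchanged, on the real dataset. The skeleton below is a minimal sketch of that workflow in R; the file path and variable names are hypothetical placeholders, not part of any actual study.

library(dplyr)

# Point the script at simulated data while designing the study;
# after data collection, only this path needs to change.
data_file <- "data/simulated_responses.csv"  # hypothetical path
responses <- read.csv(data_file)

# Planned descriptive summary (placeholder variable names)
responses %>%
  group_by(condition) %>%
  summarise(mean_accuracy = mean(correct), n = n())

# The pre-registered inferential model goes here, written and debugged
# against the simulated data before any real data exist.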

Given the capabilities of AI models like ChatGPT, researchers can develop and execute their analysis plans more effectively and transparently, fostering a deeper understanding of their experiments and promoting a culture of open science. Remember, the best time to plan and prepare your analysis scripts is right at the beginning, during the design and simulation phase, when you still have the opportunity to make changes. This forward-thinking approach ensures that your experiment is well-planned, well-executed, and well-analysed, leading to valuable, reliable, and replicable scientific findings.

The analysis process thus demands careful planning and execution, particularly when handling complex experimental designs like the truthiness experiment extending Eryn Newman’s work that we’ve been discussing.

21.1.1 Prompt

We're about to run an experiment that builds on Eryn Newman's 'truthiness' work. She presented a bunch of claims, like "Giraffes are the only mammals that can't jump", and asked people to decide whether each claim was true or false. Again and again, she found that when a claim was accompanied by a photograph, people were more likely to believe it. We're going to extend this effect by using video, photos, or quotes of people making these claims rather than stock footage or photos. We'll also test the generalisability of the truthiness effect across two domains: trivia and COVID-19 claims. We'll compare the following three conditions:

  • Video: Video footage of an expert making the claim.

  • Photo: A still image of an expert making the claim alongside a quote.

  • Quote: A quote from an expert making the claim.

We'll present each participant with 60 different claims (30 per domain, half true and half false), where one-third of the claims will be presented by an alleged expert during a mock news report (video), one-third will be presented as a quote next to a still frame from this report (photo), and one-third will be presented as a quote without any media (quote). In this repeated measures design, the same participant sees all three levels of this 'Presentation' variable (video, photo, and quote) in a different random order within both the trivia and COVID-19 domains. The domain order will be counterbalanced to control for order effects.

To avoid potential confounding effects where the claim itself might inherently seem more or less plausible, we're randomising not only the order of presentation within each domain but also the association between claims (both true and false) and the type of presentation. This means that for different participants, the same claim might be presented as a quote, photo, or video.

Here's the structure of the dataframe from two simulated participants:

------------------

[INSERT DATAFRAME]

------------------

Given the above experimental details and dataframe, help us determine the most suitable statistical analyses for our experimental design and the order in which we should conduct these analyses.
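
As an aside, the simulated dataframe this prompt refers to can be generated with a few lines of R. The sketch below shows one way to do it; the column names, the ten-claims-per-format split, and the coin-flip response model are illustrative assumptions, not the actual simulation behind the prompt.

library(dplyr)

set.seed(123)  # reproducible simulation

# 60 claims: 30 per domain, half true and half false within each domain
claims <- tibble(
  claim_id = 1:60,
  domain   = rep(c("trivia", "covid"), each = 30),
  is_true  = rep(c(TRUE, FALSE), times = 30)
)

simulate_participant <- function(pid) {
  claims %>%
    group_by(domain) %>%
    # Randomise which claims appear as video, photo, or quote:
    # ten of each format within each domain, per participant
    mutate(presentation = sample(rep(c("video", "photo", "quote"), each = 10))) %>%
    ungroup() %>%
    # Shuffle trial order (the full design would also counterbalance domain order)
    slice_sample(prop = 1) %>%
    mutate(
      participant = pid,
      # Placeholder response model: a fair coin flip per claim
      judged_true = rbinom(n(), 1, 0.5) == 1
    )
}

sim_data <- bind_rows(lapply(1:2, simulate_participant))

Building the simulation yourself, rather than leaving it implicit, makes it easier to spot mismatches between the prompt and the intended design before you send it.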

--------------------------------

ChatGPT Analysis Recommendations

--------------------------------

Great! Now start at the beginning and help me write an analysis script using R Markdown.

Interactive ChatGPT Example
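
The interactive example is not reproduced here, but for binary truth judgements nested within participants and claims, one analysis such a script might reasonably centre on is a mixed-effects logistic regression. Below is a minimal sketch using lme4, reusing the hypothetical column names from the simulation sketch above.

library(lme4)

# Truth judgements as a function of presentation format and domain,
# with crossed random intercepts for participants and claims.
# With only two simulated participants this model will not estimate
# meaningfully; it is a structural sketch, not a result.
model <- glmer(
  judged_true ~ presentation * domain +
    (1 | participant) + (1 | claim_id),
  data   = sim_data,
  family = binomial
)
summary(model)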