
Sunday 23 December 2018

Contrast Analysis with R: Tutorial for obtaining contrast estimates in a 2-way factorial design

In this post, I want to show how contrast estimates can be obtained with R. In particular, I want to show how to replicate, with R, a contrast analysis of an interaction contrast in a 2 x 4 between-subjects design.

Our example is from Haans (2018; see also this post). It considers the effect of students' seating distance from the teacher on their educational performance: the closer to the teacher a student is seated, the higher the performance. A "theory" explaining the effect is that it is mainly caused by the teacher having less eye contact with the students sitting farther to the back of the lecture hall.

To test that theory, an experiment was conducted with N = 72 participants attending a lecture. The lecture was given to two independent groups of 36 participants each. The first group attended the lecture while the teacher was wearing dark sunglasses; the second group attended the lecture while the teacher was not wearing sunglasses. All participants were randomly assigned to 1 of 4 possible rows, with row 1 being closest to the teacher and row 4 the furthest away. The dependent variable was the score on a 10-item questionnaire about the contents of the lecture. So, we have a 2 by 4 factorial design, with n = 9 participants in each combination of the factor levels.

Here we focus on obtaining an interaction contrast: we will estimate the extent to which the difference between the mean retention score of the participants on the first row and those on the other rows differs between the conditions with and without sunglasses. 
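To give a preview of the computation (the sketch below uses simulated, hypothetical data, not the data from Haans, 2018), such an interaction contrast can be obtained from the cell means and the Mean Square Error along these lines:

# sketch with simulated (hypothetical) data: 2 (sunglasses) x 4 (row) design, n = 9 per cell
set.seed(123)
Sunglasses <- factor(rep(c("no", "yes"), each = 36))
Row        <- factor(rep(rep(1:4, each = 9), times = 2))
score      <- rnorm(72, mean = 5 + (Sunglasses == "no") * (Row == "1"), sd = 1.5)

w_row      <- c(1, -1/3, -1/3, -1/3)       # first row versus the other rows
cell_means <- tapply(score, list(Sunglasses, Row), mean)

# interaction contrast: (first row vs rest | no sunglasses) minus (first row vs rest | sunglasses)
psi_hat <- sum(w_row * cell_means["no", ]) - sum(w_row * cell_means["yes", ])

MSE    <- sum(resid(aov(score ~ Sunglasses * Row))^2) / (72 - 8)
se_psi <- sqrt(2 * sum(w_row^2) * MSE / 9) # sum of squared weights over all 8 cells, n = 9 per cell
t_crit <- qt(.975, 72 - 8)
c(estimate = psi_hat,
  lower    = psi_hat - t_crit * se_psi,
  upper    = psi_hat + t_crit * se_psi)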

The interaction contrast with SPSS


Saturday 22 December 2018

Planning for precise contrast estimates in between subjects designs

Here I would like to explain the procedure for sample size planning for one-way and two-way (factorial) between-subjects designs. We will consider examples based on those described in Haans (2018).


The first example: one-way design


The first example considers the effect of students' seating location on their educational performance. Seating location is defined as distance from the teacher and operationalized in terms of the row the student is seated in, with the first row being the closest to the teacher and the fourth row being the furthest away. Twenty students are randomly assigned to one of the four possible rows, so N = 20, n = 5. The dependent variable is the course grade of the student. (Note: the data and study are hypothetical.)

As Haans (2018) explains, one psychological theory explaining the effect of seating position on educational performance is based on social influence. This theory posits that due to the social influence of the teacher, the students who are seated closest to the teacher find themselves in a state of undivided attention. This undivided attention causes their educational performance to be better than that of the students who are seated further away.

In operational terms, then, we may expect that first row students will have a better average grade than students seated on the other rows. So, the quantitative research question we are interested in is:

"How much do the average grades differ between students seated first row and the students seated on other rows?"

We can estimate this quantity with a Helmert contrast, in which we assign a contrast weight of 1 to the mean of the first-row grades and weights of -1/3 to the means of the grades in the other rows.
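As a sketch of this computation in R (with simulated, hypothetical data rather than the data from the example), the contrast estimate, its confidence interval, and the standardized estimate can be obtained from the row means and the Mean Square Error:

# sketch with simulated (hypothetical) data: 4 rows, n = 5 students per row
set.seed(456)
Row   <- factor(rep(1:4, each = 5))
grade <- rnorm(20, mean = rep(c(7.5, 6.5, 6, 5.5), each = 5), sd = 1)

w         <- c(1, -1/3, -1/3, -1/3)        # Helmert-type contrast weights
row_means <- tapply(grade, Row, mean)
psi_hat   <- sum(w * row_means)            # contrast estimate

MSE    <- sum(resid(aov(grade ~ Row))^2) / (20 - 4)
se_psi <- sqrt(sum(w^2) * MSE / 5)         # standard error of the contrast
t_crit <- qt(.975, 20 - 4)
c(estimate = psi_hat,
  lower    = psi_hat - t_crit * se_psi,
  upper    = psi_hat + t_crit * se_psi)
psi_hat / sqrt(MSE)                        # standardized estimate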

Haans (2018) gives us the following results. The contrast estimate equals 2.00, 95% CI [0.27, 3.73]. In order to interpret this more easily, we divide the estimate by the square root of the Mean Square Error to obtain the standardized estimate and standardized confidence interval (not to be confused with the confidence interval of the standardized estimate, but that's a different story). The result is 1.26, 95% CI [0.17, 2.36].

To answer the research question: the estimated difference equals 1.26 standard deviations, which according to rules of thumb frequently used in psychology is a large difference. The CI shows the enormous amount of uncertainty in this estimate: population values between 0.17 (small) and 2.36 (very large) are also consistent with the observed data and our statistical assumptions. So, it seems safe to conclude that there is a positive effect of seating position, but the wide CI makes it clear that the data do not tell us enough about the size of the effect; the precision is simply too low.

The precision is f = 1.09, which according to my rules of thumb is very imprecise (I consider f = 0.65 to be barely tolerable).

So, let's plan for a replication study with a reasonably precise estimate of f = 0.40, with 80% assurance. (Note: for some advice on setting target MoE, see Planning with assurance, with assurance.) I've used the app https://gmulder.shinyapps.io/PlanningFactorialContrasts/ with the default values for a single-factor between-subjects design with 4 conditions. According to the app, we need n = 36 participants per condition (making a total of N = 144).
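As a rough check of this figure (my own sketch, not the app's code; it assumes that the MoE of the contrast equals the critical t-value times the square root of the sum of squared contrast weights times MSE/n, with df = 4n - 4, and that MSE/σ² behaves as a chi-square(df)/df variable):

# rough check of the n = 36 figure (sketch, not the app's code)
w <- c(1, -1/3, -1/3, -1/3)          # first row versus the other rows
moe80 <- function(n) {
  # standardized MoE that is not exceeded in 80% of samples
  df <- 4 * n - 4
  qt(.975, df) * sqrt(sum(w^2) / n) * sqrt(qchisq(.80, df) / df)
}
moe80(35)   # approximately 0.405: just misses target f = .40
moe80(36)   # approximately 0.399: meets target f = .40 with 80% assurance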

(For more detailed information about sample size planning for contrast analysis, see https://the-small-s-scientist.blogspot.com/2019/04/sample-size-planning-for-contrast-estimates.html; for some guidelines on setting target MoE, see https://the-small-s-scientist.blogspot.com/2019/03/planning-with-assurance-with-assurance.html.)



Saturday 17 November 2018

Planning for precise contrasts in two-way factorial designs: a Tutorial

I've created a first version of a Shiny app for sample size planning for precise contrast estimates in one-way and two-way designs. So, if you want to plan for interaction contrasts in two-way designs, take a look here: https://gmulder.shinyapps.io/PlanningFactorialContrasts/

Important: this is a first version that contains no error checking (but you know what you're doing, so that's not a problem). Also, I have not tested the results of the app with simulation studies. As soon as I have done that, I will share the code of the app.


Update: the values for the factorial two-way within-subjects designs have been checked with simulation studies (for f = .40, assurance = .80, and correlation rho = .50).

Note: if you have a single factor design, you may also consider looking here: https://the-small-s-scientist.blogspot.com/2018/11/contrast-tutorial.html

Note: if you have a two-groups independent design, go here for an introduction to sample size planning and its relation to the power of the t-test: https://the-small-s-scientist.blogspot.com/2018/03/sample-size-planning-shiny-app.html

For more detailed information about sample size planning for single-factor and factorial designs (between, within, and mixed), see https://the-small-s-scientist.blogspot.com/2019/04/sample-size-planning-for-contrast-estimates.html; for guidelines on setting target MoE, see https://the-small-s-scientist.blogspot.com/2019/03/planning-with-assurance-with-assurance.html

How does the app work?

1. Specifying Target MoE and Assurance 

Target MoE should be specified as a number of standard deviations (usually a fraction; for details see Cumming, 2012; Cumming & Calin-Jageman, 2017). The symbol f will be used to refer to this standardized MoE. Target MoE (f) must be larger than zero (f will automatically be set to .05 if you accidentally fill in the value 0).

I suggest using the following guidelines for target MoE (f): 


Description           f
Extremely Precise     .05
Very Precise          .10
Precise               .25
Reasonably Precise    .40
Borderline Precise    .65

You should only use these guidelines if you lack the information you need for specifying a reasonable value for Target MoE.

Assurance is the probability that the obtained MoE will be no larger than the target MoE. I suggest setting Assurance minimally at .80.

Friday 2 November 2018

Planning for Precise Contrasts: Tutorial for single factor designs

This is a tutorial for planning for precision of contrast estimates. The application is here: https://gmulder.shinyapps.io/PlanningContrasts/.

NOTE: For a (beta) version of planning for factorial designs, see: https://the-small-s-scientist.blogspot.com/2018/11/contrasts-factorial-tutorial.html

NOTE: I've updated the app with a few corrections, so there is a new version. (The November version has corrected degrees of freedom for the 3- and 4-condition within-subjects designs.)

If you would like to run the app in R, install the shiny and devtools packages and run the following:

library(shiny)
library(devtools)
source_url("https://git.io/fpI1R")
shinyApp(ui = ui, server = server)


Specifying Target MoE and Assurance 

Target MoE should be specified as a number of standard deviations (usually a fraction; for details see Cumming, 2012; Cumming & Calin-Jageman, 2017). The symbol f will be used to refer to this standardized MoE. Target MoE (f) must be larger than zero (f will automatically be set to .05 if you accidentally fill in the value 0).

I suggest using the following guidelines for target MoE (f): 


Description           f
Extremely Precise     .05
Very Precise          .10
Precise               .25
Reasonably Precise    .40
Borderline Precise    .65

You should only use these guidelines if you lack the information you need for specifying a reasonable value for Target MoE.

Assurance is the probability that the obtained MoE will be no larger than the target MoE. I suggest setting Assurance minimally at .80.

Specifying the Design

The app works with independent and dependent designs with 2, 3, or 4 conditions. With 2 conditions, the analysis is equivalent to the independent or dependent t-test; with more than two conditions, the analysis is equivalent to one-way independent or dependent ANOVA.


Specifying the Cross-Condition correlation
If you choose the dependent design, you also need to specify a value for the cross-condition correlation. This value should be larger than zero. One of the assumptions underlying the app is that there is only 1 observation per participant (or any other unit of analysis) per condition. That is why I like to think of this correlation as (conceptually related to) the reliability of the participant scores (averaged over conditions). From that perspective, a correlation around .60 would be borderline acceptable and one around .80 would be considered good enough. So, for worst-case scenarios use a correlation smaller than .60, and for optimistic scenarios use correlations of .80 or larger.

Note: for technical reasons a correlation of 1 will be automatically changed to .99.

For independent designs the correlation should equal 0. (The above story about reliability no longer makes sense then, but we do not need it either.)


Friday 19 October 2018

Custom contrasts for the one-way repeated measures design using lmer

Here is some code for doing a one-way repeated measures analysis with lme4 and custom contrasts. We will use a repeated measures design with three conditions of the factor Treat and 20 participants. The contrasts are Helmert contrasts, but they differ from the built-in Helmert contrasts in that the sum of the absolute values of the contrast weights equals 2 for each contrast.
The standard error of each contrast equals the square root of the product of the sum of the squared contrast weights and the residual variance, divided by the number of participants n.

$$\sigma_{\hat{\psi}} = \sqrt{\sum{w_i^2}\sigma^2_e/n}$$

The residual variance equals the within-treatment variance times 1 minus the correlation between conditions (which equals the within-treatment variance minus the covariance $\rho\sigma^2_{within}$).

$$\sigma^2_e = \sigma^2_{within}(1 - \rho)$$

In the example below, the within treatment variance equals 1 and the covariance 0.5 (so the value of the correlation is .50 as well). The residual variance is therefore equal to .50.

For the first contrast, the weights are equal to {-1, 1, 0}, so the standard error of the contrast should be equal to the square root of 2*.50/20 = 0.2236.
 

library(MASS)
library(lme4)

# setting up treatment and participants factors
nTreat = 3
nPP = 20
Treat <- factor(rep(1:nTreat, each=nPP))
PP <- factor(rep(1:nPP, nTreat))

# generate some random data
# specify means

means = c(0, .20, .50)

# create variance-covariance matrix
# var = 1, cov = .5
Sigma = matrix(rep(.5, 9), 3, 3)
diag(Sigma) <- 1

# generate the data; using empirical = TRUE
# so that variance and covariance are known
# set to FALSE for "real" random data

sco = as.vector(mvrnorm(nPP, means, Sigma, empirical = TRUE))

#setting up custom contrasts for Treatment factor 
myContrasts <- rbind(c(-1, 1, 0), c(-.5, -.5, 1))
contrasts(Treat) <- ginv(myContrasts)
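# (the generalized inverse turns the contrast/hypothesis matrix into the coding
# matrix R expects, so that the fixed-effect estimates equal the contrast estimates)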

#fit linear mixed effects model: 
myModel <- lmer(sco ~ Treat + (1|PP))

summary(myModel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: sco ~ Treat + (1 | PP)
## 
## REML criterion at convergence: 157.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6676 -0.4869  0.1056  0.6053  1.9529 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  PP       (Intercept) 0.5      0.7071  
##  Residual             0.5      0.7071  
## Number of obs: 60, groups:  PP, 20
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   0.2333     0.1826   1.278
## Treat1        0.2000     0.2236   0.894
## Treat2        0.4000     0.1936   2.066
## 
## Correlation of Fixed Effects:
##        (Intr) Treat1
## Treat1 0.000        
## Treat2 0.000  0.000

Thursday 4 October 2018

Distribution of the difference between two binomial variables

For a project I'm working on, I needed to work with the probabilities associated with the difference between two binomial variables. I thought I'd share the code for four functions for calculating the probability mass and the cumulative probabilities.

The functions d.diff.bin (probability mass) and p.diff.bin (cumulative distribution) calculate the distribution of the differences for equal sample sizes N and equal success probabilities. The arguments of the functions are the quantile k of the distribution of the differences, the number of trials N, and the probability of success p.


The functions d.diff.bin.un (probability mass) and p.diff.bin.un (cumulative distribution) calculate the distribution of the differences for sample sizes and success probabilities that may (or may not) differ between the two populations. The arguments of the functions are the quantile k of the distribution of the differences, the numbers of trials N.1 and N.2 for the two groups, and the probabilities of success p.1 and p.2 for each group.



#equal N and p:
#probability mass:
d.diff.bin = function(k, N, p) {
  diff = outer(0:N, 0:N, "-")
  prob = outer(dbinom(0:N, N, p, log=TRUE),
               dbinom(0:N, N, p, log=TRUE), "+")
  p = sum(exp(prob[diff == k]))
  return(p)
}
#cumulative probability:
p.diff.bin = function(k, N, p) {
  diff = outer(0:N, 0:N, "-")
  prob = outer(dbinom(0:N, N, p, log=TRUE),
               dbinom(0:N, N, p, log=TRUE), "+")
  p = sum(exp(prob[diff <= k]))
  return(p)
}
#examples
d.diff.bin(0, 30, .50)
## [1] 0.1025782
p.diff.bin(0, 30, .50)
## [1] 0.5512891
#mean of distribution:
N = 30
m = sum(sapply(-N:N, d.diff.bin, N = 30, p= .50)*(-N:N))
m
## [1] -3.084642e-20
all.equal(0, m)
## [1] TRUE
#(un)equal N and p:
#probability mass:
d.diff.bin.un = function(k, N.1, N.2, p.1, p.2) {
  diff = outer(0:N.1, 0:N.2, "-")
  prob = outer(dbinom(0:N.1, N.1, p.1, log=TRUE),
               dbinom(0:N.2, N.2, p.2, log=TRUE), "+")
  p = sum(exp(prob[diff == k]))
  return(p)
}

#cumulative distribution:
p.diff.bin.un = function(k, N.1, N.2, p.1, p.2) {
  diff = outer(0:N.1, 0:N.2, "-")
  prob = outer(dbinom(0:N.1, N.1, p.1, log=TRUE),
               dbinom(0:N.2, N.2, p.2, log=TRUE), "+")
  p = sum(exp(prob[diff <= k]))
  return(p)
}

#examples
d.diff.bin.un(0, 30, 20, .60, .70)
## [1] 0.05896135
p.diff.bin.un(0, 30, 20, .60, .70)
## [1] 0.1497043
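#sanity check: the probability masses over all possible differences
#(which run from -N.2 to N.1) should sum to 1, up to floating point error
sum(sapply(-20:30, d.diff.bin.un, N.1 = 30, N.2 = 20, p.1 = .60, p.2 = .70))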

Thursday 21 June 2018

A rule of thumb for setting target MOE

One of the most difficult aspects of sample size planning for precision is the specification of a target Margin of Error (MoE). Here, I would like to introduce a simple rule of thumb, in the hope that it helps you in determining a reasonable target MoE.
The rule of thumb is applied here to estimating the difference between two independent group means, where the two populations are normally distributed with equal variances.

Goal 1: Assessing the direction of an effect

Sample size planning starts with formulating a goal for the research. A very common goal is to try to determine the direction of an effect. For the goal of assessing the direction of an effect, it helps if the confidence interval of the difference contains only positive or only negative values. That is, you want a confidence interval that excludes the value 0, for if that value is included, you would probably conclude that the estimate is consistent with both positive and negative effects. Thus, our first goal is to obtain a confidence interval of the mean difference that excludes the value 0.

Now, a confidence interval excludes 0 if the obtained MOE is at most equal to the obtained effect size estimate. Suppose the estimate equals the true effect of, say, 0.50; then we want MOE to be at most (very close to) 0.50, otherwise 0 will be included in the interval. But if our estimate underestimates the true effect, say the estimate equals 0.30, we want MOE to be at most (very close to) 0.30. Likewise, if we overestimate the effect, MOE can be larger than 0.50.

This means that we cannot say, for instance: we expect that the true effect is .50, so let's plan for a target MOE that with 80% assurance is at most .50, because this target MOE may be too large when the true effect is underestimated, depending on the extent of the underestimation. So, in specifying target MOE, we should take into account that underestimates of the effect size occur. (Actually, these underestimates occur with a relative frequency of 50% in a huge collection of direct replications.) We can say that we do not only want to exclude zero from the interval, but also that we want this to occur in a large proportion of direct replications. This will be our second goal. I will call the probability associated with our second goal the probability of exclusion (PE).

The rule of thumb is that if we want 80% probability that a random confidence interval excludes zero, we should plan for an expected MOE equal to $f = d / \sqrt{2}$, that is, the effect size divided by the square root of 2. Since there is 50% probability that obtained MOE will be larger than expected MOE, this is equal to planning for target MOE $f = d / \sqrt{2}$ with 50% assurance, or simply without assurance. You can do this in the ESCI software, but also with the R functions provided below.
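A rough way to see where the $\sqrt{2}$ comes from (a back-of-the-envelope approximation of my own, which treats obtained MOE as fixed at its expected value $f = t \cdot SE$, with $SE$ the standard error of the standardized difference and $t$ the critical value): the interval excludes zero when the estimate exceeds $f$, so

$$PE \approx P(\hat{d} > f) = \Phi\left(\frac{d - f}{SE}\right) = \Phi\left(t\left(\frac{d}{f} - 1\right)\right) = \Phi\left(t(\sqrt{2} - 1)\right) \approx \Phi(0.82) \approx .79,$$

using $t \approx 1.98$ for the degrees of freedom of the first example below. This agrees with the value of 0.7951 computed with the noncentral t-distribution in the code.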

The first example in the code below is an illustration of planning for assessing the direction of the effect, with true effect size d = .50. If we want an 80% probability of having only positive values in our confidence interval, we should plan for a target MoE = expected MoE = f = d / √2 = 0.3535. Using the sampleSize function below, this gives a sample size n = 63 per group, or a total sample size N = 2*63 = 126. The probability that the confidence interval excludes 0 equals approximately 80% (p = 0.7951). So, the rule of thumb of planning for d / √2 seems to work pretty well.

Goal 2: distinguishing between effect sizes

If your research goal is to estimate the value of the effect size instead of its direction, the rule of thumb can be used as follows. Suppose we do not know the true effect size, but want a high probability (80% PE) of being able to distinguish between small (d = .20) and large (d = .80) effects. That is, if the true effect is .20 we want the value .80 to be excluded from the confidence interval, and if the true effect is .80, we want the value .20 to be excluded from the confidence interval.

We can proceed as follows: the difference between the effect sizes is .80 - .20 = .60. We use this value to determine target MOE. Thus, if we now plan for a target MoE = expected MoE = .60 / √2, we should have approximately 80% PE that the obtained confidence interval will exclude 0.80 if the true effect is 0.20, and vice versa. The functions below give sample size n = 44, and the probability of exclusion equals .7947. So, our rule of thumb seems to work pretty well again. See example 2 in the code below.

Alternatively, we could take the region of practical equivalence (ROPE) into account. Suppose our equivalence range equals .10 sigma. If we want to have enough precision to distinguish large from small effects, we should plan as follows. We take the difference between a large effect and the upper equivalence value of a small effect or, equivalently, the difference between a small effect and the lower equivalence value of a large effect, i.e. .50, and plan for f = .50 / √2. If the effect is large, we expect a confidence interval that excludes the equivalence range of the small effect (and vice versa), with 80% probability of exclusion.

But we could also take the difference between the lower equivalence value of a large effect and the upper equivalence value of a small effect, i.e. .40, and plan for f = .40 / √2. (See the third example in the code below.) This will give us 80% PE that, for any true value within the ROPE of the one effect, the CI will exclude the values in the ROPE of the other. For example, if the true effect is .70 and expected MOE equals .40/√2 = .2828, there is approximately 80% probability that the 95% CI excludes .30, which is in the ROPE of a small effect. The expected CI will be .70 +/- .2828 = [0.4172, 0.9828]. Note that the lower limit is larger than the upper limit of the ROPE for d = .20, as we want it to be. Note, however, that if the true effect is small (d = .20), the CI will exclude effects equivalent to large effects, which is consistent with our research goal, but it will not exclude the value 0 or effects equivalent to a medium effect. Indeed, the expected CI will be [-0.0828, 0.4828]. (This is not a problem, of course, since this was not the purpose of our research.)

As a final example, suppose we want sufficient precision to distinguish small from medium effects (or large from medium effects). If we take the ROPE perspective, with an equivalence range of +/- .10 sigma, the lower equivalence value of the medium effect equals .50 - .10 = .40 and the upper equivalence value of the small effect equals .30. If we want 80% PE that the CI will be small enough to distinguish small from medium effects, we should plan for expected MOE f = (.40 - .30)/√2 = 0.0707. Using the functions below, this requires a sample size n = 1538. (See the final example in the code below.)

Setting target MOE: conclusion

In summary, the rule of thumb is to divide the effect size d by √2 and plan for an expected MoE equal to this value. This will give you a sample size with approximately 80% probability (PE) that the CI will not contain 0. In the case of distinguishing effect sizes, one option is to divide the difference between the lower equivalence value of the larger effect and the upper equivalence value of the smaller effect by the square root of 2 and plan for an expected MoE equal to this value. This will give you a sample size with approximately 80% PE that the CI of the estimated true value of one effect excludes the values in the ROPE of the other effect.

Do you want at least 90% PE? Use the square root of three, instead of the square root of two, in determining target MoE.

# expected MoE (in standard deviations) for two independent groups of size n
eMoe = function(n) {
  eMoe = qt(.975, 2*(n - 1))*sqrt(2/n)
  return(eMoe)
}

# squared difference between expected MoE and target MoE (to be minimized)
cost <- function(n, tMoe) {
  (eMoe(n) - tMoe)^2
}

# sample size per group for which expected MoE equals target MoE
sampleSize <- function(tMoe) {
  optimize(cost, interval=c(10, 5000), tMoe = tMoe)$minimum
}

# FIRST EXAMPLE
# plan for 80% assurance of excluding 0
# i.e. estimate the direction if true effect
# equals .50 

d = .50

#application of rule of thumb:
f = .50 / sqrt(2)

#sampleSize (uses ceiling() to round up): 
n = ceiling(sampleSize(f))
n
## [1] 63
# Probability of Exclusion (here taken to be equivalent to
# power for two-sided t-test (since true direction is unknown))
df = 2*(n - 1)
ncp = f / sqrt(1/n) #or ncp = d / sqrt(2/n)

pt(qt(.025, df), df, ncp) + 1 - pt(qt(.975, df), df, ncp)
## [1] 0.7951683
# SECOND EXAMPLE: 
# distinguish between small and large effect sizes: 
d = .80 - .20
f = d / sqrt(2)

n = ceiling(sampleSize(f))
n
## [1] 44
df = 2*(n - 1)
ncp = f / sqrt(1/n) #or ncp = d / sqrt(2/n)

#PE: 

pt(qt(.025, df), df, ncp) + 1 - pt(qt(.975, df), df, ncp)
## [1] 0.79467
# EXAMPLE 3: distinguish small and large with ROPE
# ROPE small and large: 
rope.small = c(.10, .30)
rope.large = c(.70, .90)

d = rope.large[1] - rope.small[2]
f = d / sqrt(2)

n = ceiling(sampleSize(f))

n
## [1] 98
df = 2*(n - 1)
ncp = f / sqrt(1/n) #or ncp = d / sqrt(2/n)

#PE: 

pt(qt(.025, df), df, ncp) + 1 - pt(qt(.975, df), df, ncp)
## [1] 0.7956414
# Example 4: distinguish medium from small 
# or medium from large with ROPE

rope.medium = c(.40, .60)
d = rope.medium[1] - rope.small[2]
f = d / sqrt(2)

n = ceiling(sampleSize(f))

n
## [1] 1538
df = 2*(n - 1)
ncp = f / sqrt(1/n) #or ncp = d / sqrt(2/n)

#PE:

pt(qt(.025, df), df, ncp) + 1 - pt(qt(.975, df), df, ncp)
## [1] 0.7916783

Friday 16 March 2018

Sample size planning for precision: the basics

In this post, I will introduce some of the ideas underlying sample size planning for precision. The ideas are illustrated with a Shiny application which can be found here: https://gmulder.shinyapps.io/PlanningApp/. The app illustrates the basic theory of sample size planning for two independent groups. (If the app is no longer available (my allotted active monthly hours are limited on shinyapps.io), contact me and I'll send you the code.)

The basic idea

The basic idea is that we are planning an experiment to estimate the difference in population means of an experimental and a control group. We want to know how many observations per group we have to make in order to estimate the difference between the means with a given target precision. 

Our measure of precision is the Margin of Error (MOE).  In the app, we specify our target MOE as a fraction (f) of the population standard deviation. However, we do not only specify our target MOE, but also our desired level of assurance. The assurance is the probability that our obtained MOE will not exceed our target MOE. Thus, if the assurance is .80 and our target MOE is f = .50, we have a probability of 80% that our obtained MOE will not exceed f = .50. 

The only part of the app you need for sample size planning is the "Sample size planning"-form. Specify f, and the assurance, and the app will give you the desired sample size. 

If you do that with the default values f = .50 and assurance = .80, the app will give you the following results on the Planning Results tab: Sample Size: 36.2175, Expected MOE (f): 0.46. This tells you that you need to sample 37 participants (for instance) per group; the Expected MOE (the MOE you will get on average) will then equal 0.46 (or even a little less, since you sample more than 36.2175 participants).
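As a quick check of these figures (my own sketch, not the app's code; it assumes that the expected MoE for two independent groups equals the critical t-value times √(2/n), and that with σ = 1 the pooled variance estimate MSE behaves as a chi-square(df)/df variable):

# expected MoE (in units of sigma) for two independent groups of n participants each
eMoe <- function(n) qt(.975, 2 * (n - 1)) * sqrt(2 / n)
eMoe(37)                              # approximately 0.46, in line with the app

# simulated assurance: proportion of samples in which obtained MoE <= target MoE
set.seed(1)
n   <- 37
df  <- 2 * (n - 1)
MSE <- rchisq(10000, df) / df         # pooled variance estimates when sigma = 1
moe <- qt(.975, df) * sqrt(2 * MSE / n)
mean(moe <= .50)                      # above the requested .80, since 37 > 36.2175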

The Planning Results tab also gives you a figure for the power of the t-test of the NHST nil-hypothesis for the effect size (Cohen's d) specified in the "Set population values" form. Note that this form, like the rest of the app, provides details that are not necessary for sample size planning for precision, but that make the theoretical concepts clear. So, let's turn to those details.


The population

Even though it is not at all necessary to specify the population values in detail, considering the population helps to realize the following. The sample size calculations and the figures for expected MOE and power are based on the assumption that we are dealing with random samples from normal populations with equal variances (standard deviations).

From these three assumptions, all the results follow deductively. The following is important to realize: if these assumptions do not obtain, the truth of the (statistical) conclusions we derive by deduction is no longer guaranteed. (Maybe you have never before realized that sample size planning involves deductive reasoning; deductive reasoning is also required for the calculation of p-values and to prove that 95% confidence intervals contain the value of the population parameter in 95% of the cases; without these assumptions it is uncertain what the true p-value is and whether or not the 95% confidence interval is in fact a 95% confidence interval.)

In general, then, you should try to show (to others, if not to yourself) that it is reasonable to assume normally distributed populations, with equal variances and random sampling, before you decide that the p-value of your t-test, the width of your confidence interval, and the results of sample size calculations are believable.

The populations in the app are normal distributions. By default, the app shows two such distributions. One of the distributions, the one I like to think about as corresponding to the control condition, has μ = 0; the other one has μ = 0.5. Both distributions have a standard deviation of σ = 1. The standardized difference between the means is therefore equal to δ = 0.50.

The default populations are presented in Figure 1 below.