PLoS One. 2014; 9(12): e114872.
A Common Control Group - Optimising the Experiment Design to Maximise Sensitivity
Simon Bate
1 Statistical Science Europe, GlaxoSmithKline Pharmaceuticals, Stevenage, United Kingdom
Natasha A. Karp
2 Mouse Informatics Group, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Shyamal D. Peddada, Editor
Received 2014 Aug 6; Accepted 2014 Nov 14.
Data Availability Statement
The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper.
Abstract
Methods for choosing an appropriate sample size in animal experiments have received much attention in the statistical and biological literature. Due to ethical constraints the number of animals used is always reduced where possible. However, as the number of animals decreases, the risk of obtaining inconclusive results increases. By using a more efficient experimental design we can, for a given number of animals, reduce this risk. In this paper two popular cases are considered: where planned comparisons are made to compare treatments back to control, and where researchers plan to make all pairwise comparisons. Using theoretical and empirical techniques we show that for studies where all pairwise comparisons are made, the traditional balanced design, as suggested in the literature, maximises sensitivity. For studies that involve planned comparisons of the treatment groups back to the control group, which are inherently more sensitive due to the reduced multiple testing burden, the sensitivity is maximised by increasing the number of animals in the control group while decreasing the number in the treated groups.
Introduction
The 3Rs (Replacement, Reduction and Refinement), introduced as a framework for achieving the most humane treatment of experimental animals, have been widely accepted as a prerequisite for a successful animal experiment [1]. Attention on the refinement element of the framework has been growing in recent years. Refinement refers to improvements to scientific procedures and husbandry which minimise actual or potential pain, suffering, distress or lasting harm and/or improve animal welfare in situations where the use of animals is unavoidable. In 2009, Kilkenny et al. published a systematic review of published papers involving in vivo experiments and highlighted that many published experiments did not use the most appropriate experimental design. For example, in the survey only 62% of experiments that should have employed a factorial design had in fact done so [2]. Experimental design and statistical analysis fall under the refinement element of the 3Rs as they reduce further experimentation and ensure that the animals used fulfil the goals of the experiment. This has led to the publication of the Animal Research: Reporting In Vivo Experiments (ARRIVE) guidelines [3], a checklist that aims to embed good practice in the experimentation process. The impact of poor experimental design can be profound, as shown by a systematic study that found a lack of concordance between animal experiments and clinical trials [4]. The authors concluded that the majority of the animal studies were of poor methodological quality. In practice, though, poor design and analysis is not restricted to animal experimentation and is thought to be endemic throughout scientific research [5], [6].
In science and statistics, validity is the extent to which a conclusion or measurement is reliable and corresponds accurately to the real world. The validity of an experiment can be evaluated in many ways. For example, the conclusion validity is the degree to which conclusions we reach about our data are reasonable [7] and relates to the experiment's ability to assess the relationship. The majority of studies involving animals use statistical hypothesis testing, where a p-value is calculated to assess whether the null hypothesis (of no effect) can be rejected and hence the alternative hypothesis (the effect you are trying to prove) accepted. With the use of inferential hypothesis testing, there is potential to conclude there is an effect when in fact there is none: a false positive (type I error). Conversely, there is potential to conclude there is not an effect when in fact there is one (type II error). When considering the type II error rate it is often more useful to consider the statistical power (1 − β, where β is the probability of a type II error occurring). The statistical power is the probability (or chance) of achieving a statistically significant result when conducting a study given that, in reality, there is a real effect [8]. In practice a power in excess of 80% is usually considered acceptable. With experiments involving animals it is critical to ensure that the experiment has sufficient power so that not only are real effects detected, but also that the experiment is not over-resourced such that animals are wasted [9].
Frequently, animal researchers conduct experiments that involve multiple treatments and a common control. For example, a survey of recent PLoS ONE papers identified an R&D drug study involving multiple different treatments versus a vehicle control [10], a study comparing high cholesterol diets to a low cholesterol diet [11] and a study comparing responses at later time points to a baseline group [12]. This type of study design is also commonly used in toxicology and safety assessment, where studies are typically performed so that increasing doses of a treatment can be compared back to a control group. For example, Lee et al. [13] describe a repeated oral dose toxicity study in rats to compare three doses of KMS88009 back to a vehicle control. In these experiments comparisons back to the control will be the only comparisons that are of interest, regardless of the experimental results. It is important to note that the researcher plans which comparisons they wish to make in advance; these are examples of so-called planned comparisons [14], as opposed to general 'post hoc testing', which involves making all pairwise comparisons. Planned comparisons are beneficial for two reasons. Firstly, the decision regarding which tests to perform is made before the data are collected and hence is not influenced by the observed results. In theory, this should reduce the risk of inadvertently finding false positive results in a 'data-trawling' exercise. Secondly, planned comparisons increase the sensitivity of the experiment as they reduce the multiple testing burden. The multiple testing burden arises because the chance of finding a false positive, for a given significance threshold, accumulates with each statistical test conducted. If all pairwise comparisons are performed, for example using an LSD (least significant difference) test [8], then there is an increased risk of finding false positives. To manage this risk a more stringent threshold is applied, by making a multiple comparison adjustment to the LSD p-values. Consider the scenario with one control group and three treatments. If all groups are compared then the post-hoc testing would involve 6 separate pairwise statistical comparisons. However, if planned comparisons of treatments back to control are performed then this corresponds to only 3 separate statistical comparisons and the threshold adjustment would be less severe. In this paper, we shall consider the implications for the choice of design when the researcher knows in advance which comparisons they wish to make.
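As a concrete illustration of this bookkeeping (a sketch of ours, not code from the paper), the snippet below counts the tests required under the two strategies and applies a simple Bonferroni-style correction purely to show how the threshold tightens as tests accumulate; the analyses reported later in this paper are deliberately unadjusted.

```python
# Illustrative sketch: number of statistical tests under all-pairwise versus
# planned treatment-vs-control comparisons, with a Bonferroni adjustment shown
# only to indicate how the significance threshold tightens as tests accumulate.
from math import comb

def comparison_burden(n_treatments, alpha=0.05):
    groups = n_treatments + 1                    # t treatment groups plus one control
    all_pairs = comb(groups, 2)                  # every pairwise comparison
    planned = n_treatments                       # each treatment compared to control only
    return {
        "all pairwise tests": all_pairs,
        "planned tests": planned,
        "adjusted threshold (all pairwise)": alpha / all_pairs,
        "adjusted threshold (planned)": alpha / planned,
    }

# One control and three treatments: 6 pairwise tests versus 3 planned tests.
print(comparison_burden(3))
```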
When constructing experimental designs that involve a number of treatment groups and a control group, interest rightly focuses on the sample size that is required in each of the experimental (treatment and control) groups. It appears to be standard practice to assign the same number of animals to each of the experimental groups (the so-called 'balanced' designs). Such practice is perhaps encouraged by sample size calculation software, where typically one sample size is recommended across all groups [15], [16]. The statistical test applied also influences the sample size required. A common approach used to analyse data generated from these experiments, assuming the parametric assumptions hold, is to compare the treatment group means to the control group mean, using either t-tests, Analysis of Variance followed by Dunnett's test or applying a multiple comparison adjustment to the LSD p-values. It is therefore common practice to perform a sample size calculation under the assumption that the statistical analysis will be performed using one of these tests [17].
In this newspaper, we shall employ optimal design theory to investigate the effects of varying the replication of the experimental groups. Nosotros shall assume that the information will be analysed using either multiple t-tests or Analysis of Variance followed by a suitable multiple comparison procedure. Crucially we differentiate between the experimental situations where the researcher but plans to compare the treatments back to control and when they programme to make all pairwise comparisons. We will focus on the old case, and highlight how dissimilar experimental designs result in different levels of statistical power.
Methods
Two approaches are considered in this paper to investigate the effect of varying the control group replication: a theoretical investigation and a power comparison.
Theoretical approach to maximising sensitivity
For the theoretical investigation, we need to make a few assumptions. While restrictive, many animal experiments satisfy these assumptions. We assume that:
- The researcher conducts an experiment to either (a) compare treatments to a single control or (b) make all possible pairwise comparisons between the experimental groups. The experimental design therefore involves t + 1 experimental groups (t treatment groups and a control group).
- A total of N animals are used in the experiment and they are allocated at random to the experimental groups.
- The replication in the treatment groups is the same, n_T say.
- The replication in the control group is n_C.
- The variability of the responses is the same across all experimental groups. In practice the response may require a transformation in order to satisfy this condition.
- The parametric assumptions hold (for example, the responses are numerical, independent, continuous and the residuals are normally distributed) and hence a parametric test, such as the t-test or Analysis of Variance followed by pairwise comparisons, will be used to compare the experimental groups.
By considering the predicted standard error of the estimates of the comparisons of interest, when using a given experimental design, it is possible to compare and contrast different designs. The more efficient the design, the smaller the predicted standard errors will be and hence the more sensitive the statistical tests. For a given total number of animals, we use mathematical arguments (see S1 Derivations for more details) to investigate how varying the replication of the control group influences these standard errors.
Power analysis assessment
To highlight the practical implications of using different experimental designs, we investigate the statistical power that can be achieved when comparing all treatments back to a single common control using planned comparisons. The tests within this manuscript are not adjusted for multiplicity, as the adjustment needed varies between the analysis scenarios and adds complication to the analysis. The absolute level of statistical power is not of direct interest; rather we are interested in investigating how varying the experimental design (control group replication) influences the statistical power.
For a given level of variability σ², a difference between the two group means δ, a significance level of α = 5% and sample sizes n_T and n_C, the power of a two-sided test that is not adjusted for multiplicity is given by

\[
\text{power} = 1 - T_{\nu}\!\left(t_{1-\alpha/2,\,\nu} - \frac{\delta}{s\sqrt{1/n_T + 1/n_C}}\right) + T_{\nu}\!\left(-t_{1-\alpha/2,\,\nu} - \frac{\delta}{s\sqrt{1/n_T + 1/n_C}}\right) \tag{1}
\]

where T_ν is the cumulative distribution function (CDF) of the t distribution with ν degrees of freedom, t_{1−α/2,ν} is the corresponding two-sided critical value and s² is an estimate of the variance [14]. The derivation of this formula is given in S2 Derivations.
Using (1) we investigate the power that can be achieved in various real-life situations. For convenience the total number of animals N included in each situation is selected so that the treatment group replication n_T implied by the design (where t is the number of treatment groups) is approximately an integer.
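The following sketch (ours, not the authors' code) implements this power calculation under two stated assumptions: the pooled standard deviation s is estimated from all t + 1 groups, giving N − (t + 1) residual degrees of freedom, and the shifted central t distribution is used as the usual approximation to the non-central t.

```python
# A minimal sketch of the unadjusted two-sided power in equation (1); it is not
# the authors' code. Assumptions: the pooled standard deviation is estimated
# from all t + 1 groups, so nu = N - (t + 1) residual degrees of freedom, and
# the shifted central t distribution stands in for the non-central t.
import math
from scipy import stats

def power_two_sided(delta, sd, n_t, n_c, n_groups, n_total, alpha=0.05):
    """Power for comparing one treatment group (n_t animals) with the control
    group (n_c animals) at a two-sided significance level alpha."""
    df = n_total - n_groups                           # residual df from the one-way ANOVA
    se = sd * math.sqrt(1.0 / n_t + 1.0 / n_c)        # standard error of the difference
    shift = delta / se                                # standardised true difference
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)       # two-sided critical value
    return (1.0 - stats.t.cdf(t_crit - shift, df)) + stats.t.cdf(-t_crit - shift, df)

# Example: 4 treatments vs control, delta = 2, sd = 1.5 (variance 2.25),
# 10 animals per treatment group and 20 controls (N = 60).
print(round(power_two_sided(2.0, 1.5, 10, 20, 5, 60), 3))
```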
Results
Theoretical approach to maximising sensitivity
Using the mathematical arguments given in S1 Derivations we can, for a variety of scenarios, assess the optimal replication in the control group to achieve maximum sensitivity. We assume that the researcher is running an experiment that satisfies the conditions discussed in the Methods.
Scenario 1
Assume that the only comparisons the researcher plans to make involve comparing the treatment groups back to the control group. For a given total number of animals N, if there are n_T animals in each of the t treatment groups and n_C animals in the control group, then let the control group contain r times as many animals as each treated group, i.e.

n_C = r × n_T and N = t × n_T + n_C.

Note if r > 1 then there are more animals in the control group compared to the treated groups and if r < 1 then there are fewer animals in the control group compared to the treated groups.

The estimates of the pairwise comparisons of interest are as precise as possible if:

r = √t, i.e. n_C = √t × n_T.

In other words, the number of animals in the control group should be √t times the number in the treatment groups. So in an experiment comparing four treatment groups back to a control group, twice as many animals should be allocated to the control group as are allocated to each treatment group.
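As an illustration (our own sketch, not taken from the paper), the √t rule can be turned into whole-animal group sizes for a fixed total N; the rounding step is a practical assumption of ours, since group sizes must be integers.

```python
# Illustrative sketch of the sqrt(t) allocation rule from Scenario 1: split a
# fixed total of N animals across t treatment groups and one control group.
import math

def sqrt_t_allocation(n_total, n_treatments):
    """Return (n_control, n_per_treatment) approximating n_C = sqrt(t) * n_T."""
    t = n_treatments
    n_treat = max(1, round(n_total / (t + math.sqrt(t))))  # ideal treatment group size, rounded
    n_control = n_total - t * n_treat                      # remaining animals go to the control
    return n_control, n_treat

# Example: 48 animals and 4 treatment groups gives 16 controls and 8 animals per
# treatment group, i.e. twice as many controls as animals in each treated group.
print(sqrt_t_allocation(48, 4))
```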
Scenario 2
Assume that the researcher is interested in making all possible pairwise comparisons between the experimental groups. It turns out that these comparisons are estimated as precisely as possible if:

n_C = n_T.
In other words, as expected by symmetry, as all groups are involved in the same number of comparisons, the same number of animals should be allocated to each of the experimental groups (treatment groups and the control group).
From consideration of these two scenarios, we can see the optimal design depends on the goal of the experiment. With the defined planned comparisons in Scenario 1, an unbalanced design with more animals allocated to the control group results in comparisons that are estimated more precisely. This gain in sensitivity is at the expense of treatment comparisons that the researcher does not plan to make. In other words, the pairwise comparisons of interest are more precise, everything else being equal, if one design is employed when compared to another. From a less mathematical point of view, these results make sense as the control group is used more often than the other treatment means, and hence it is important to have a precise estimate of the control group mean.
Power analysis assessment for Scenario 1
We shall now consider Scenario 1 in more detail. The previous analysis identified the optimal design to maximise sensitivity and we now focus on quantifying the impact of the various designs on the statistical power.
Using equation (1), the statistical power of various levels of replication of the control group was investigated by assessing various designs when the size of the biological effect increases for a defined amount of biological variability (Table 1 and Fig. 1) and when the size of the biological effect is fixed but the biological variation increases (Table 2 and Fig. 2). Three control group replication strategies are considered: when the replication in the control group is (i) more than the treated groups (the theoretically optimal solution), (ii) equal to the replication in the treated groups and (iii) less than the replication in the treated groups. While (i) has been shown to be the optimal solution, (ii) and (iii) are commonly applied in practice and hence it is of interest to consider how these designs compare to the theoretically optimal design. To give context, the biological effect being tested for each calculation has been presented as a standardised effect size (Cohen's d or Z statistic) where the biological effect is scaled relative to the biological variability (i.e. 1 equals a difference equivalent to one unit of variability) [18].
Fig. 1. Statistical power of various levels of replication of the control group as the biological effect increases.
The variability of the responses is fixed at 2.25. Three strategies for selecting the size of the control group were considered: (i) Optimal, according to the theoretical derivation, (ii) Equal to the treatment groups and (iii) Less than, where the control group replication is less than the treatment groups.
Fig. 2. Statistical power of various levels of replication of the control group as the variability increases.
The difference between the treatment and control groups is fixed at 2. Three strategies for selecting the size of the control group were considered: (i) Optimal, according to the theoretical derivation, (ii) Equal to the treatment groups and (iii) Less than, where the control group replication is less than the treatment groups.
Tabular array ane
Tabular array 2
From Tables 1 and 2 it can be seen that, in the situations considered, a gain in statistical power of between 0.16% and 7.02% (with a median of 3.12%) can be achieved when using the mathematically optimal replication of controls, compared to replicating all groups equally. This benefit is reduced if the statistical power obtained when using both designs approaches 100%. While such improvements are perhaps of marginal practical importance, especially in suitably powered experiments, it is nevertheless the case that a slight change to the experimental design can result in more sensitive statistical tests without increasing the total number of animals used.
Perhaps more strikingly, from Tables 1 and 2 it can be seen that there is a significant drop in statistical power if the number of animals in the control group is less than the number in the treatment groups. For example, if a design is required to compare four treatments with a control, the size of the biologically relevant effect is 2 and the variability of the responses is 2.25, then a 30% increase in power can be achieved if the optimal replication of animals is used, when compared to a design where there are fewer animals in the control group than in the treated groups.
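To give a feel for the size of these differences, the sketch below evaluates the power function from the earlier sketch for the three replication strategies at a fixed total of 60 animals; the group sizes and effect size are illustrative choices of ours, not the exact designs behind Tables 1 and 2.

```python
# Illustrative comparison of the three control-replication strategies for a
# study with t = 4 treatment groups, a total of 60 animals, delta = 2 and
# sd = 1.5 (variance 2.25). Reuses power_two_sided() from the earlier sketch.
t, n_total = 4, 60
delta, sd = 2.0, 1.5

designs = {
    "optimal (n_C = 2 x n_T)": dict(n_t=10, n_c=20),
    "balanced (n_C = n_T)":    dict(n_t=12, n_c=12),
    "fewer controls":          dict(n_t=13, n_c=8),
}
for name, d in designs.items():
    p = power_two_sided(delta, sd, d["n_t"], d["n_c"], t + 1, n_total)
    print(f"{name}: power = {p:.2f}")
```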
Conclusions
A review of the literature seems to imply the researcher should employ a balanced design with the same number of animals allocated to each experimental group. For example, Ruxton and Colegrave [19] state "Always aim to balance your experiment, unless you have a very good reason not to". In many statistical texts the sample size calculation is performed when the experimental design consists of only two groups. In this case the balanced design is usually a sensible option. Unfortunately designs used in practice are rarely so straightforward, and hence the orthodox strategy may not always be appropriate.
In this paper we have shown the benefit that can be gained from using an experimental design that has been constructed to favour the comparisons that the researcher plans to make. Focusing on the comparisons of interest increases sensitivity as it reduces the adjustment that is required to manage the multiple testing burden (i.e. reduce the false positive risk). In the case considered in this paper, where the researcher wishes to compare t treatments to a control, the design should be selected so that the number of animals in the control group is √t times the number in the treated groups. It has been shown that such a design performs better than the commonly used strategy of equal replication across the treatment and control groups. While beyond the scope of this paper, a similar approach can be used for more complicated experimental designs. For example, the choice of block or cross-over design can influence the reliability of the treatment comparisons.
Another strategy that researchers may follow when designing their experiments is to reduce the number of animals in the control group compared to the treatment groups. This approach is usually taken because the researcher has access to historical control data and feels that this knowledge implies fewer concurrent controls are needed. There has been much written about the benefit of using historical control data when assessing the effect of treatments [20], though it does not replace concurrent controls. Prior knowledge, perhaps obtained from a historical control database, can also be incorporated into the statistical analysis using a Bayesian analysis paradigm. This approach has been successfully applied in clinical research, although such studies usually involve comparing a single treatment to a control or placebo [21]. While there are certain benefits to comparing multiple treatments to a historical control group, this work highlights that a study with both concurrent and historic controls does not necessarily imply that fewer animals can be included in the concurrent control. When treatments are compared using the popular statistical analysis approaches discussed in the Methods section, we have demonstrated that having more animals in the treatment groups, compared to the control group, can lead to a significant reduction in statistical power regardless of the benefits of using historical control data.
In this paper we have assumed that the variability is the same in all experimental groups. In practice this assumption may not hold. For example, in biological responses it is common for the variability to increase with the size of the response. Furthermore, responses that are bounded (e.g. percentages, which cannot go below 0 or above 100) tend to be less variable as a boundary is approached. In such cases, there are statistical analysis strategies that can be applied, but they are beyond the scope of this paper. An alternative strategy, commonly recommended [8], [14], [22], is to investigate the use of non-linear data transformations to "correct" the data, which then allows application of the methods discussed within this paper. For example, the arcsine transformation for percentage data, the square root transformation for count data and the log transformation if the variability increases as the response increases.
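The short sketch below illustrates these three transformations with numpy; the data values are invented and serve only to show the form of each transformation.

```python
# Illustrative variance-stabilising transformations; the data are invented.
import numpy as np

percentages = np.array([2.0, 15.0, 40.0, 85.0, 98.0])   # bounded responses (0-100%)
counts      = np.array([0, 3, 7, 12, 40])                # count data
responses   = np.array([1.2, 3.5, 9.8, 27.0, 81.0])      # spread grows with the mean

arcsine_scale = np.arcsin(np.sqrt(percentages / 100.0))  # arcsine transformation for percentages
sqrt_scale    = np.sqrt(counts)                          # square root transformation for counts
log_scale     = np.log(responses)                        # log transformation when variability
                                                         # increases with the response
```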
The arguments presented in this paper also assume that any attrition due to experimental procedures is expected to be the same across all groups. If the researcher believes, for example, that they are likely to lose more animals in the treated groups, then they may wish to adjust the initial sample sizes so that the sample sizes achieved at the end of the study result in a design that is close to the optimal design.
In practice, if the experiment is to be successful, many considerations should be taken into account when constructing a design. Issues such as practical constraints on the experimental material, financial pressures and ethical issues should be taken into account alongside optimal statistical design theory. In this paper we have aimed to highlight what the theoretically optimal experimental design, all things being equal, would be. The researcher should apply this knowledge, in conjunction with other constraints, when planning their experiment.
Supporting Information
S1 Derivations
Determining the optimum control group replication.
(DOCX)
S2 Derivations
Determining the statistical power.
(DOCX)
Acknowledgments
We would like to dedicate this paper to the late Robert Kempson who brought this problem and solution to our attention.
Funding Statement
This work was supported by the National Institutes of Health (www.nih.gov), grant number 1 U54 HG006370-01 (Natasha Karp). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability
The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper.
References
1. Russell WMS, Burch RL (1959) The Principles of Humane Experimental Technique. London: Methuen & Co. Ltd.
2. Kilkenny C, Parsons N, Kadyszewski E, Festing MFW, Cuthill IC, et al. (2009) Survey of the Quality of Experimental Design, Statistical Analysis and Reporting of Research Using Animals. PLoS ONE 4(11):e7824. 10.1371/journal.pone.0007824
3. Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG (2010) Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research. PLoS Biol 8(6):e1000412. 10.1371/journal.pbio.1000412
4. Perel P, Roberts I, Sena E, Wheble P, Briscoe C, et al. (2007) Comparison of treatment effects between animal experiments and clinical trials: systematic review. British Medical Journal 334:197–200.
5. Sun TT (2004) Excessive trust in authorities and its influence on experimental design. Nature Reviews Molecular Cell Biology 5:577–581.
7. García-Pérez MA (2012) Statistical conclusion validity: some common threats and simple remedies. Frontiers in Psychology 3:325.
8. Bate ST, Clark RA (2014) The Design and Statistical Analysis of Animal Experiments. Cambridge: Cambridge University Press.
9. Thomas L, Juanes F (1996) The importance of statistical power analysis: an example from Animal Behaviour. Animal Behaviour 52:856–859.
10. Rozza AL, Meira de Faria F, Souza Brito AR, Pellizzon CH (2014) The Gastroprotective Effect of Menthol: Involvement of Anti-Apoptotic, Antioxidant and Anti-Inflammatory Activities. PLoS ONE 9(1):e86686. 10.1371/journal.pone.0086686
11. Nekohashi M, Ogawa M, Ogihara T, Nakazawa K, Kato H, et al. (2014) Luteolin and Quercetin Affect the Cholesterol Absorption Mediated by Epithelial Cholesterol Transporter Niemann–Pick C1-Like 1 in Caco-2 Cells and Rats. PLoS ONE 9(5):e97901. 10.1371/journal.pone.0097901
12. Zhou K, Jiang M, Liu Y, Qu Y, Shi G, et al. (2014) Effect of Bile Pigments on the Compromised Gut Barrier Function in a Rat Model of Bile Duct Ligation. PLoS ONE 9(6):e98905. 10.1371/journal.pone.0098905
13. Lee S-H, Kim Y, Kim HY, Kim YH, Kim MS, et al. (2014) Aminostyrylbenzofuran Directly Reduces Oligomeric Amyloid-β and Reverses Cognitive Deficits in Alzheimer Transgenic Mice. PLoS ONE 9(4):e95733. 10.1371/journal.pone.0095733
14. Snedecor GW, Cochran WG (1989) Statistical Methods, 8th edition. Iowa: Iowa State University Press.
15. Lenth RV (2001) Some practical guidelines for effective sample size determination. American Statistician 55:187–193.
16. Faul F, Erdfelder E, Lang AG, Buchner A (2007) G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods 39:175–191.
17. Clark RA, Shoaib M, Hewitt KN, Stanford SC, Bate ST (2012) A comparison of InVivoStat with other statistical software packages for analysis of data generated from animal experiments. Journal of Psychopharmacology 26:1136–1142.
18. Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edition. New Jersey: Erlbaum.
19. Ruxton GD, Colegrave NC (2006) Experimental Design for the Life Sciences. Oxford: Oxford University Press.
20. Greim H, Gelbke HP, Reuter U, Thielmann HW, Edler L (2003) Evaluation of historical control data in carcinogenicity studies. Human and Experimental Toxicology 22:541–549.
21. Viele K, Berry S, Neuenschwander B, Amzal B, Chen F, et al. (2014) Use of historical control data for assessing treatment effects in clinical trials. Pharmaceutical Statistics 13:41–54.
22. Zar JH (1998) Biostatistical Analysis, 4th edition. New Jersey: Prentice Hall.