Tuesday, January 13, 2015

Guidelines for Planning and Evidence for Assessing a Well-Designed Experiment

This paper is full of great guidance for planning a campaign of experimentation, or assessing the sufficiency of a plan that already exists. The authors break up the effort into four phases:
  1. Plan a Series of Experiments to Accelerate Discovery
  2a. Design Alternatives to Span the Factor Space
  2b. Decide on a Design Strategy to Control the Risk of Wrong Conclusions
  3. Execute the Test
  4. Analyze the Experimental Design
They give a handy checklist for each phase (reproduced below). The checklists are comprehensive (significantly more than my little list of questions), and I think they stand alone, but the whole paper is well worth a read. Design of experiments is more than just math; as this paper stresses, it is a strategy for discovery.

Phase I. Plan a Series of Experiments to Accelerate Discovery
  • Agree upon a problem statement with a synopsis of historical research
    • Documentation of research and references
    • Problem statement that is clear, concise, and agreed to by the entire test team
    • Clear, concise, comprehensive objectives for each stage of testing
    • Objectives that are specific, measurable, achievable, relevant, and timely (SMART)
    • Clearly defined evaluation criteria for determining success
  • List the output performance measures and specify the anticipated precision, emphasizing continuous responses
    • Table of largely continuous numeric responses, with associated details
    • Estimates of anticipated ranges of response values
    • Specific sources of measurement error and estimates of error variability
    • Description of amount of change to the response that is of scientific or operational interest; justification of such a shift
  • Brainstorm all known system factors, especially control factors to vary, those to hold constant, and those allowed to vary (nuisance), as well as control factor levels and the test point collection strategy
    • Fishbone diagrams, tables of all factors considered, separated by type
    • Control factors provided with min and max, as well as desired low and high values
    • Hold constant factors provided with reason for limitations and scope reductions
    • Nuisance factors provided with ways to potentially reduce the variability contribution to each
  • Determine baseline estimate of resources (test assets available, operators, experiment location scheduling, etc.) needed for testing, including limitations
    • Resource, budgeting, and scheduling restrictions clearly articulated, with the initial estimate based on the most restrictive case
    • Risks associated with test execution estimated, including safety, ability to secure test resources, issues with interrupting normal operations, test staffing, etc.
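
To make the "anticipated precision" and "amount of change of interest" items concrete, here is a minimal Python sketch that turns a handful of pilot repeats into an estimate of sigma and a signal-to-noise ratio delta/sigma. The pilot numbers and the value of delta are made up for illustration; the sample standard deviation of repeats at fixed conditions is just one way to estimate sigma (historical data or expert judgment are others).

```python
import statistics

# Hypothetical pilot data (illustrative numbers only): repeated measurements
# of one continuous response, e.g. miss distance in meters, all taken at a
# single fixed set of conditions so the spread reflects error variability.
pilot_repeats = [10.2, 9.8, 10.5, 10.1, 9.6, 10.3]

# Estimate of sigma (error standard deviation) from the repeats.
sigma_hat = statistics.stdev(pilot_repeats)

# Delta: the smallest change in the response that is of scientific or
# operational interest. This value is an assumption; in practice it comes
# from experts and decision makers and should be justified.
delta = 1.0

# Signal-to-noise ratio delta/sigma, the quantity that later drives power.
snr = delta / sigma_hat

print(f"sigma_hat = {sigma_hat:.3f}")
print(f"delta/sigma = {snr:.2f}")
```

The ratio delta/sigma, rather than either number alone, is what the power calculations in Phase IIb depend on.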

Phase IIa. Design Alternatives to Span the Factor Space
  • Refine and make ready for experimental design the list of candidate control factors
    • Detailed description of control factors, including a discussion of the challenges in setting factor levels
    • Unit of measurement (real, physical values preferred over labels)
    • Range of physical levels; levels chosen for experimental design
    • Estimated priority of factor in describing system performance
    • Expense/difficulty of controlling level: easy-, hard-, very hard-to-change
    • Proposed strategy of experimental control for each factor: constant, matrix variable, or noise (covariate, randomized, or random effect if noise)
  • State the anticipated statistical model polynomial order based on existing knowledge of the system and test objective
    • Understanding of capability and need for first, second, and possibly third order polynomials (screening and characterization objectives typically require first order plus interaction models, while mapping and optimization objectives are at least second order)
    • Guidance from system experts regarding likely potential interactions
    • Probability of and interest in nonlinear relationships for all numeric factors
  • Provide details of the initial alternative design strategies for the planned model, power, and factor types
    • Type of statistical design considered (e.g., factorial, fractional factorial, response surface, optimal) to accommodate the model of interest
    • Dimension of the design, or the number of factors and levels planned
    • Design constraints due to factor setting physical limitations (e.g. A+B < 10), disallowed factor combinations, safety, resource restrictions
    • Strategy for hard-to-change factors, blocking, and replication
  • Plan to test-analyze-test in a sequential fashion to maximize knowledge and efficiency
    • Consideration of likely re-design strategies following initial exploration (e.g. augmentation for decoupling of higher order models, replication, or validation)
    • Estimated number of test phases, purpose of the phase (e.g. model augmentation to estimate interactions), number of runs for each phase, and total number of resources required
    • Strategy for sequences of tests based on test periods scheduled
  • Append management reserve based on anticipated risks of not completing test runs
    • Documentation that quantifies or details risks of interruptions, aborts, bad data points, unacceptable outside influences, and technological maturity
    • Consideration of a resource reserve of 10-30% (experience across wide variety of tests suggests this is typically adequate)
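
A fractional factorial is one of the design types listed above. As an illustration only, the sketch below builds a 2^(4-1) half fraction from the generator D = ABC (defining relation I = ABCD, a resolution IV design) in coded -1/+1 units, then pads the run count with a management reserve. The 20% reserve is an assumed value picked from the 10-30% range in the checklist, and the function names are hypothetical.

```python
import itertools
import math

def full_factorial(k):
    """All 2**k runs of a two-level design in coded units (-1, +1)."""
    return [list(run) for run in itertools.product((-1, 1), repeat=k)]

def half_fraction_4_factors():
    """2^(4-1) fractional factorial: full factorial in A, B, C with the
    generator D = ABC (defining relation I = ABCD, resolution IV)."""
    runs = []
    for a, b, c in full_factorial(3):
        runs.append([a, b, c, a * b * c])
    return runs

design = half_fraction_4_factors()
planned_runs = len(design)  # 8 runs instead of 16 for the full 2^4

# Management reserve: 20% is an assumed choice within the suggested 10-30%.
reserve_fraction = 0.20
runs_with_reserve = math.ceil(planned_runs * (1 + reserve_fraction))

for run in design:
    print(run)
print(f"planned runs = {planned_runs}, budget with reserve = {runs_with_reserve}")
```

Halving the run count is paid for with aliasing: in this fraction each two-factor interaction is confounded with another (e.g. AB with CD), which is exactly what the sequential test-analyze-test items above plan to resolve if needed.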

Phase IIb. Decide on a Design Strategy to Control the Risk of Wrong Conclusions
  • Report statistical power and type I error probability of incorrectly declaring a factor significant for proposed model effects and key performance measures (e.g. distance, accuracy, task time)
    • Type I error level (alpha) is appropriate for risk mitigation and justified for given testing
    • Values for delta clearly derived and justified from expert and decision maker input
    • Estimates of sigma provided from relevant historical data, pilot test, or expert judgment
    • Power values reported by factor type if mixed-level design
  • Report metrics per statistical design, weight metrics, and report final design scores
    • Design approaches specified and justified
    • Power (designs for screening) or prediction variance estimation (response surface designs)
    • Alignment between model order and design capability (i.e. adequately estimate terms plus additional points to guard against model misspecification or lack of fit)
    • Sufficient replication to estimate pure error
    • Validation strategy
    • Flexibility in testing sequentially
    • Ability to predict new observations well
  • Decide on final set of designs to accomplish all test objectives
    • Confirmation of limitations and restrictions, including hard-to-change factors
    • Comparison of measurement metrics across designs, with metrics weighting if appropriate
    • Designs chosen that grade highest across multiple criteria when compared with the alternative designs
  • Decide the final sequential strategy for experimentation across all design phases, and the test schedule
    • Estimates of test entries and time per event
    • Priorities set to obtain essential test points (e.g. complete fractional factorial design)
    • Replicates as blocks, or blocking effects separate from model effects
    • Addition of test events only as necessary to build on existing knowledge
    • Strategy for validation points and model checking
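
The power item can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a balanced two-level design with N runs and defines delta as the difference between the mean response at the high and low levels of a factor, so the effect estimate has standard error 2σ/√N. It uses a normal (z) approximation; an exact calculation would use the noncentral t (or F) distribution, so this overstates power somewhat, most noticeably for small designs. Treat it as a rough guide, not a substitute for a proper power analysis.

```python
from statistics import NormalDist

def approx_power_two_level(n_runs, delta, sigma, alpha=0.05):
    """Normal-approximation power to detect a main effect of size delta
    (difference between the high- and low-level means) in a balanced
    two-level design with n_runs total runs and error std. dev. sigma.

    The effect estimate (mean at +1 minus mean at -1) has standard error
    2*sigma/sqrt(n_runs). Using z instead of t ignores the error degrees
    of freedom, so this overstates power for small designs.
    """
    z = NormalDist()
    se = 2 * sigma / n_runs ** 0.5
    shift = delta / se
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability that the two-sided test rejects when the true effect is delta.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (8, 16, 32):
    p = approx_power_two_level(n, delta=1.0, sigma=1.0, alpha=0.05)
    print(f"N = {n:2d}: approximate power = {p:.2f}")
```

With δ/σ = 1 and α = 0.05 the approximation gives roughly 0.29, 0.52, and 0.81 for 8, 16, and 32 runs, which is why the delta and sigma estimates need to be justified before the design is sized.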

Phase III. Execute the Test
  • Maximize efficiency in test execution and carefully define a test event
    • Test team definition of start and finish of a test event; standard operating procedures to consistently repeat setup
    • Efficient strategy to transition from one test event to the next
    • Methods to collect multiple responses per test event
    • Clearly defined circumstances for 'go or no-go' decisions prior to run execution
  • Name the execution order plan to suppress background change influence (e.g. randomized) and justify the method
    • Default is completely randomized
    • Randomization with local control of error or blocks as appropriate
    • Restricted randomization with split plot designs and analysis
    • Analysis of covariance with observable, but uncontrollable variables
    • Reduction of measurement error with repeated measures or subsampling
  • Describe specific procedure to control background variability
    • Hold-constant and nuisance variables readdressed
    • Process flow diagrams revisited to ensure standard operating procedures established
    • Practice runs to minimize error and lessen impact of learning curve
    • Procedures in place to reduce set point error (deviation between intended and actual input values)
  • Provide sequential approach to testing details
    • Rough approach: screen many factors, decouple or improve model, add higher order effects, validate
    • "Must have" test points identified in case of shortened or interrupted testing
  • Describe approach to ensure independence of successive observations (e.g. batching observations, resetting input levels)
    • Test input conditions reset after every test point
    • Decisions for combining multiple observations per test event (e.g. averaging 2 sec of 20Hz data)
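
The execution-order items can be sketched in a few lines of Python: a completely randomized order (the default) versus a restricted randomization for a split-plot design, where runs sharing one setting of a hard-to-change factor are grouped into whole plots. The layout (A hard to change; B and C easy to change; four whole plots with A set twice at each level) and the function names are assumptions for illustration, and an order produced this way must be analyzed as a split plot.

```python
import itertools
import random

def completely_randomized(runs, seed=None):
    """Default execution order: every run shuffled independently."""
    rng = random.Random(seed)
    order = list(runs)
    rng.shuffle(order)
    return order

def split_plot_order(whole_plots, seed=None):
    """Restricted randomization: whole plots (groups of runs that share one
    setting of the hard-to-change factor) are run in random order, and the
    runs inside each whole plot are shuffled."""
    rng = random.Random(seed)
    plots = [list(plot) for plot in whole_plots]  # copy; inputs untouched
    rng.shuffle(plots)
    order = []
    for plot in plots:
        rng.shuffle(plot)
        order.extend(plot)
    return order

# Hypothetical layout: A is hard to change, B and C are easy to change.
easy_settings = list(itertools.product((-1, 1), repeat=2))
# Four whole plots: A set twice at each level, each covering all B, C combos.
whole_plots = [[(a, b, c) for b, c in easy_settings] for a in (-1, -1, 1, 1)]
all_runs = [run for plot in whole_plots for run in plot]

print("completely randomized:", completely_randomized(all_runs, seed=1))
print("split-plot order:     ", split_plot_order(whole_plots, seed=1))
```

Fixing the seed makes the order reproducible for the test plan. Consecutive whole plots that happen to share a level of A should still be reset between plots; otherwise they behave like a single whole plot and the whole-plot error estimate suffers.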

Phase IV. Analyze the Experimental Design
  • Ensure test objectives align with the analysis objectives--screen, characterize, compare, predict, optimize, or map
    • Objectives have SMART responses that directly address stated goals
    • Designs are suited to the analysis objectives (e.g., a second order design for an optimization objective)
  • Assess the ability of the design to statistically analyze and model the responses
    • Explanation of modeling strategy to include alternative analysis techniques
    • Intent to determine factor significance, quantify uncertainty, display models graphically, and provide intervals for estimation and prediction
    • Diagnostics of model quality (VIFs, prediction variance, regression coefficient variance, coefficient of determination)
  • Compare the design strategy to the intended general model
    • General model intended for the design--linear, interaction, etc.
    • Strategy to enable fitting the general model, estimating error, and fitting a model more complex than assumed (lack of fit)
    • Confounding effects description--e.g. resolution or correlation of effects of interest
  • Detail the sequential model-building strategy and validation phase outlined
    • Strategy for initial design augmentation to resolve confounding--foldover, predict-confirm, etc.
    • Estimate of number of augmenting runs required
    • Strategy for alternating augmentation test phases based on analysis of earlier phases
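
The foldover item can be illustrated with the smallest possible example. Below, a 2^(3-1) resolution III design with generator C = AB (so A is completely aliased with the BC interaction) is augmented with its full foldover, i.e. the same runs with every factor sign reversed. The inner product of two contrast columns measures their aliasing: ±N means completely confounded, 0 means orthogonal. This is an illustrative sketch with hypothetical helper names, not a general augmentation tool.

```python
import itertools
import math

def effect_column(design, factors):
    """Contrast column for an effect given by factor indices,
    e.g. (0,) for A or (1, 2) for the BC interaction."""
    return [math.prod(run[i] for i in factors) for run in design]

def alias_strength(design, effect1, effect2):
    """Inner product of two contrast columns: 0 means orthogonal,
    +/- len(design) means completely aliased (confounded)."""
    col1 = effect_column(design, effect1)
    col2 = effect_column(design, effect2)
    return sum(x * y for x, y in zip(col1, col2))

# Resolution III half fraction: full factorial in A, B with C = AB.
initial = [(a, b, a * b) for a, b in itertools.product((-1, 1), repeat=2)]

# Full foldover: the same runs with every factor sign reversed.
foldover = [tuple(-x for x in run) for run in initial]
combined = initial + foldover

A, B, C = (0,), (1,), (2,)
BC, AC, AB = (1, 2), (0, 2), (0, 1)
for label, design in (("initial (4 runs)", initial), ("with foldover (8 runs)", combined)):
    print(label,
          "A~BC:", alias_strength(design, A, BC),
          "B~AC:", alias_strength(design, B, AC),
          "C~AB:", alias_strength(design, C, AB))
```

In this tiny case the combined eight runs are simply the full 2^3 factorial; for larger resolution III designs, a full foldover likewise separates every main effect from the two-factor interactions, at the cost of doubling the run count.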
