I will discuss some examples from my own research. This includes, in ascending order of controversy, graduated licensing schemes, claims made by Australian politicians, gun control, and bicycle helmet laws. I will also discuss some methodological challenges in evaluating interventions including regression to the mean for Poisson processes.
He is originally from New Orleans and spent many years in Mississippi for graduate school and early academic appointments. His research interests are cycling safety, the analysis of categorical data and methods for evaluating public health interventions.

This talk explores some work that I have done, and some topics that you may find interesting, with respect to statistics in sport. I plan on discussing a number of problems with almost no discussion of technical details. The sports include hockey, cricket, highland dance, soccer and golf.
A prediction divergence criterion for model selection and classification in high dimensional settings.
A new class of model selection criteria is proposed, suited to stepwise approaches and usable as selection criteria in penalized estimation methods. This new class, called the d-class of error measures, generalizes Efron's q-class. It contains classical criteria such as Mallows' Cp and the AIC, but also enables the definition of new, more general criteria. Within this class, we propose a model selection criterion based on a prediction divergence between the predictions of two nested models, which we call the Prediction Divergence Criterion (PDC).
The PDC provides a different measure of prediction error than approaches that attach a criterion to each potential model in a sequence and base the selection decision on the sign of differences between criteria: the PDC directly measures the prediction divergence between two nested models. As examples, we consider linear regression models and supervised classification. We show that a selection procedure based on the PDC, compared to the Cp in the linear case, has a smaller probability of overfitting, hence leading to more parsimonious models with the same out-of-sample prediction error.
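As a point of reference for the Cp comparison above, here is a minimal sketch of the classical alternative the abstract contrasts with — stepwise selection over nested linear models by Mallows' Cp. This is not the PDC itself; the data and variable names are invented for illustration.

```python
import numpy as np

def mallows_cp(y, X, sigma2):
    """Mallows' Cp for an OLS fit: RSS / sigma2 - n + 2p."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return rss / sigma2 - n + 2 * p

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 5))
y = X[:, 0] * 2.0 + rng.standard_normal(n)   # only the first column matters

# Estimate sigma^2 from the full model, as is standard for Cp.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ beta_full) ** 2) / (n - X.shape[1])

# Each nested model M_1 c M_2 c ... gets its own criterion value; the
# selection decision rests on the signs of the differences between them.
cps = [mallows_cp(y, X[:, :k], sigma2) for k in range(1, 6)]
best = 1 + int(np.argmin(cps))
```

The PDC replaces this per-model bookkeeping with a single divergence between the predictions of two nested models.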
The PDC is particularly well suited to high-dimensional and sparse situations, and also under small model misspecifications. Examples on a malnutrition study and on acute leukemia classification will be presented.

In this presentation we will look at long memory and Gegenbauer long memory processes, and methods for estimating the parameters of these models. The method essentially finds parameters of the spectral density that most closely match the smoothed periodogram.
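The periodogram-matching idea can be illustrated with a generic sketch that is not the authors' estimator: fit an AR(1) spectral density to a smoothed periodogram by least squares on the log scale. For a long-memory or Gegenbauer model one would swap in the corresponding spectral density; everything here is simulated for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Simulate an AR(1) series; its spectral density (unit innovation variance)
# is f(w; phi) = 1 / (2*pi * |1 - phi*exp(-i*w)|^2).
phi_true, n = 0.6, 2048
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.standard_normal()

# Periodogram at the Fourier frequencies, then a simple moving-average smooth.
freqs = 2 * np.pi * np.arange(1, n // 2) / n
per = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2 * np.pi * n)
smoothed = np.convolve(per, np.ones(11) / 11, mode="same")

def spec(w, phi):
    return 1.0 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * w)) ** 2)

# Choose phi so the parametric spectrum best matches the smoothed periodogram.
def loss(phi):
    return np.mean((np.log(spec(freqs, phi)) - np.log(smoothed)) ** 2)

res = minimize_scalar(loss, bounds=(-0.99, 0.99), method="bounded")
phi_hat = res.x
```

No likelihood is evaluated, which is what makes this style of estimator fast and light on distributional assumptions.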
Simulations indicate that the new method has a similar level of accuracy to existing methods (Whittle, conditional sum-of-squares) but can be evaluated considerably faster, whilst making few distributional assumptions on the data.

A spatio-temporal mixture model for Australian daily rainfall: modelling daily rainfall over the Australian continent.
Daily precipitation has an enormous impact on human activity, and the study of how it varies over time and space, and what global indicators influence it, is of paramount importance to Australian agriculture.
The topic is complex and would benefit from a common and publicly available statistical framework that scales to large data sets. We propose a general Bayesian spatio-temporal mixture model accommodating mixed discrete-continuous data. Our analysis uses over million daily rainfall measurements since , spanning 17, rainfall measurement sites. The size of the data calls for a parsimonious yet flexible model as well as computationally efficient methods for performing the statistical inference.
Parsimony is achieved by encoding spatial, temporal and climatic variation entirely within a mixture model whose mixing weights depend on covariates. Computational efficiency is achieved by constructing a Markov chain Monte Carlo sampler that runs in parallel in a distributed computing framework. We present examples of posterior inference on short-term daily component classification, monthly intensity levels, offsite prediction of the effects of climate drivers and long-term rainfall trends across the entire continent.
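A toy version of such a mixed discrete-continuous rainfall likelihood — a point mass at zero (dry day) whose mixing weight depends on a covariate through a logistic link, and a Gamma density for wet-day amounts — might look like the following. All names and parameter values are illustrative, and this is far simpler than the model in the talk.

```python
import numpy as np
from scipy.special import expit, gammaln

def rain_loglik(y, z, theta):
    """Log-likelihood of a mixed discrete-continuous rainfall model:
    P(dry) = logistic(a + b*z), and positive amounts ~ Gamma(shape, rate)."""
    a, b, shape, rate = theta
    p_dry = expit(a + b * z)              # covariate-dependent mixing weight
    dry = y == 0
    ll = np.sum(np.log(p_dry[dry]))       # dry-day point mass
    yw, pw = y[~dry], p_dry[~dry]
    ll += np.sum(np.log1p(-pw)            # wet-day weight ...
                 + shape * np.log(rate) - gammaln(shape)
                 + (shape - 1) * np.log(yw) - rate * yw)  # ... times Gamma density
    return ll

rng = np.random.default_rng(2)
z = rng.standard_normal(500)              # e.g. a climate-driver index
p = expit(0.3 - 0.8 * z)
y = np.where(rng.random(500) < p, 0.0, rng.gamma(2.0, 1.0 / 1.5, 500))

ll = rain_loglik(y, z, (0.3, -0.8, 2.0, 1.5))
```

Encoding spatial, temporal and climatic variation through the mixing weights, as the abstract describes, keeps the component densities themselves simple.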
Computer code implementing the methods proposed in this paper is available as an R package. Do you identify as a member of the ggplot cohort of statisticians? Did you or your students learn statistics in the era of visualisation tools such as R's ggplot package? Would it have made a difference to how you engaged with statistical theory? In this talk, I'll reflect on learning statistics at the same time as visualisation, at the half-way point in my doctoral studies. I'll share how we solved some counterintuitive coverage probability simulation results through visualisation.
I see this as an opportunity to generate discussion and learn from you: questions, comments, and a generally rowdy atmosphere are most welcome.

Recent breakthroughs and advances in culture-independent techniques, such as shotgun metagenomics and 16S rRNA amplicon sequencing, have dramatically changed the way we can examine microbial communities.
There are many hurdles to tackle before we are able to identify and compare the bacteria driving changes in their ecosystem. In addition to the bioinformatics challenges, current statistical methods are limited in their ability to make sense of these complex data, which are inherently sparse, compositional and multivariate. I will discuss some of the topical challenges in 16S data analysis, including the presence of confounding variables and batch effects, as well as some experimental design considerations, and share my own personal story of how a team of rogue statisticians conducted their own mice microbiome experiment, leading to somewhat surprising results!
I will also present our latest analyses to identify multivariate microbial signatures in immune-mediated diseases, and discuss the next analytical challenges I envision. This presentation will combine the results of exciting and highly collaborative work between a team of eager data analysts, immunologists and microbiologists. For once, the speaker will abstain from talking about data integration, or mixOmics (oops!).

She was hired as a researcher and consultant at QFAB Bioinformatics, where she developed a multidisciplinary approach to her research.
Between - she led a computational biostatistics group at the UQ Diamantina Institute, a biomedical research institute.

Motivated by the analysis of batch cytometric data, we consider the problem of jointly modelling and clustering multiple heterogeneous data samples. Traditional mixture models cannot be applied directly to these data, and intuitive approaches such as pooling and post-hoc cluster matching fail to account for the variation between samples. In this talk, we consider a hierarchical mixture model approach to handling inter-sample variation. The adoption of a skew mixture model with random-effects terms for the location parameter allows for the simultaneous clustering and matching of clusters across samples.
In the case where data from multiple classes of objects are available, this approach can be further extended to perform classification of new samples into one of the predefined classes. Examples with real cytometry data will be given to illustrate this approach. Outlier detection for a complex linear mixed model: an application to plant breeding trials.
Outlier detection is an important preliminary step in data analysis, often conducted through some form of residual analysis. Complex data, such as those analysed by linear mixed models, give rise to distinct levels of residuals and thus pose additional challenges for the development of an outlier detection method. Plant breeding trials are routinely conducted over years and multiple locations with the aim of selecting the best genotypes as parents or for commercial release.
These so-called multi-environment trials (MET) are commonly analysed using linear mixed models, which may include cubic splines and autoregressive processes to account for spatial trends. We present a simulation study based on a set of real wheat yield trials.

Unfortunately, and fortunately, no. Colocalization is in fact a supremely powerful technique for scientists who want to take full advantage of what optical microscopy has to offer: quantitative, correlative information together with spatial resolution. Yet methods for colocalization have been put into doubt now that images are no longer considered simple visual representations.
Colocalization studies have notoriously been subject to misinterpretation due to difficulties in robust quantification and, more importantly, reproducibility, which are a constant source of confusion, frustration, and error. In this talk, I will share some of our efforts and progress in easing these challenges using novel statistical and computational tools.

He received his Ph.D. His main research interests lie in the theory, methods and applications of data mining and statistical learning.

In this talk the interest is in robust procedures to select variables in a multiple linear regression modeling context.
Throughout the talk the focus is on how to adapt the nonnegative garrote selection method to obtain a robust variable selection method. We establish estimation and variable selection consistency properties of the developed method, and discuss robustness properties such as the breakdown point and the influence function. In the second part of the talk the focus is on heteroscedastic linear regression models, in which one also wants to select the variables that influence the variance part. Methods for robust estimation and variable selection are discussed, and illustrations of their influence functions are provided.
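For concreteness, here is a sketch of the classical (non-robust) nonnegative garrote that the talk sets out to robustify: each OLS coefficient is shrunk by a nonnegative factor, with the factors chosen under an L1-type penalty. The simulated data and the penalty value are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def nonnegative_garrote(X, y, lam):
    """Breiman's nonnegative garrote: shrink OLS coefficients beta_j by
    factors c_j >= 0 minimizing ||y - X diag(beta_ols) c||^2 + lam * sum(c)."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    Z = X * beta_ols                       # column j scaled by beta_ols[j]

    def obj(c):
        r = y - Z @ c
        return r @ r + lam * c.sum()

    def grad(c):
        return -2 * Z.T @ (y - Z @ c) + lam

    p = X.shape[1]
    res = minimize(obj, np.ones(p), jac=grad,
                   bounds=[(0, None)] * p, method="L-BFGS-B")
    return res.x * beta_ols                # garrote estimate of beta

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 6))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(100)
beta = nonnegative_garrote(X, y, lam=20.0)
```

Because the garrote starts from the OLS fit, outliers corrupt both the initial coefficients and the shrinkage step — which is precisely where robust variants intervene.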
Throughout the talk examples are given to illustrate the practical use of the methods.
Semester 2
Cointegration analysis is used to estimate the long-run equilibrium relations between several time series. The coefficients of these long-run equilibrium relations are the cointegrating vectors. We provide a sparse estimator of the cointegrating vectors. Sparsity means that some elements of the cointegrating vectors are estimated as exactly zero.
The sparse estimator is applicable in high-dimensional settings, where the time series length is short relative to the number of time series. We use the sparse method for interest rate growth forecasting and consumption growth forecasting. We show that forecast performance can be improved by sparsely estimating the cointegrating vectors.
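The flavour of a sparse cointegrating vector can be sketched with a plain lasso applied to the static regression of one series on the others. This is only an illustration on simulated random walks, not the proposed estimator; the penalty value is arbitrary.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Plain coordinate-descent lasso: argmin 0.5*||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # partial residual excluding j
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_ss[j]
    return b

rng = np.random.default_rng(4)
T, p = 400, 4
X = np.cumsum(rng.standard_normal((T, p)), axis=0)  # p independent random walks
y = 2 * X[:, 0] - X[:, 1] + rng.standard_normal(T)  # cointegrating relation

b = lasso_cd(X, y, lam=500.0)
```

Soft-thresholding drives the coefficients of the irrelevant series toward exactly zero, which is what "sparsely estimating the cointegrating vectors" buys in the forecasting applications above.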
The glue that binds statistical inference, tidy data, grammar of graphics, data visualisation and visual inference. Buja et al. and Majumder et al. established and validated protocols that place data plots into the statistical inference framework. This, combined with the conceptual grammar of graphics initiated by Wilkinson, refined and made popular in the R package ggplot2 (Wickham), builds plots using a functional language.
The tidy data concepts made popular by the R packages tidyr (Wickham) and dplyr (Wickham and Francois) complete the mapping from random variables to plot elements. Visualisation plays a large role in data science today. It is important for exploring data and detecting unanticipated structure. Visual inference provides the opportunity to assess discovered structure rigorously, using p-values computed by crowd-sourcing lineups of plots.
Visualisation is also important for communicating results, and we often agonise over different choices in plot design to arrive at a final display. Treating plots as statistics, we can make power calculations to objectively determine the best design. This talk will be interactive: email your favourite plot to dicook monash. We will work in groups to break the plot down in terms of the grammar, relate this to random variables using tidy data concepts, determine the intended null hypothesis underlying the visualisation, and hence structure it as a hypothesis test.
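The lineup protocol behind this exercise can be sketched without any plotting: the observed data panel is hidden among null panels generated under the null hypothesis (here, independence, enforced by permutation). The function name and data are invented for illustration.

```python
import numpy as np

def lineup(x, y, m=20, seed=0):
    """Data-only sketch of a lineup: the observed (x, y) pairing is hidden
    among m-1 'null' panels made by permuting y, which breaks any x-y
    association (the null hypothesis of independence)."""
    rng = np.random.default_rng(seed)
    pos = int(rng.integers(m))            # where the real data is hidden
    panels = []
    for i in range(m):
        yy = y if i == pos else rng.permutation(y)
        panels.append((x, yy))
    return panels, pos

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(0, 0.2, 50)
panels, pos = lineup(x, y)
```

If viewers reliably pick panel `pos` out of the m panels, the p-value follows from the probability of doing so by chance, which is how a plot becomes a test statistic.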
Bring your laptop, so we can collaboratively do this exercise. Heavy-tailed inter-arrival times are a signature of "bursty" dynamics, and have been observed in financial time series, earthquakes, solar flares and neuron spike trains. We propose to model extremes of such time series via a "Max-Renewal process" aka "Continuous Time Random Maxima process". Due to geometric sum-stability, the inter-arrival times between extremes are attracted to a Mittag-Leffler distribution: As the threshold height increases, the Mittag-Leffler shape parameter stays constant, while the scale parameter grows like a power-law.
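The basic object in this framework — the inter-arrival times between threshold exceedances, at a sequence of rising thresholds — is easy to extract. The sketch below uses i.i.d. Pareto noise purely to show the mechanics; for i.i.d. data the limiting gap distribution is exponential rather than Mittag-Leffler, which arises under the bursty renewal structure the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(5)
# Heavy-tailed (Pareto) i.i.d. series as a stand-in for "bursty" data.
x = rng.pareto(1.5, size=200_000)

def inter_exceedance_times(series, threshold):
    """Times between successive exceedances of a threshold."""
    idx = np.flatnonzero(series > threshold)
    return np.diff(idx)

# As the threshold rises, the typical gap between extremes grows:
gaps_lo = inter_exceedance_times(x, np.quantile(x, 0.95))
gaps_hi = inter_exceedance_times(x, np.quantile(x, 0.999))
```

In the Max-Renewal setting, fitting a Mittag-Leffler distribution to such gaps at several thresholds lets one check the prediction that the shape parameter stays constant while the scale grows like a power law.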
Although the renewal assumption is debatable, this theoretical result is observed for many datasets. We discuss approaches to fitting model parameters and to assessing the uncertainty due to threshold selection.

Speaker: Botond Szabo (Leiden University)

An asymptotic analysis of nonparametric distributed methods. In recent years, in certain applications, datasets have become so large that it is unfeasible, or computationally undesirable, to carry out the analysis on a single machine. In such cases the data are divided over several local machines and the computations are carried out in parallel; the outcomes of these local computations are then aggregated into a global result on a central machine.
Over the years various divide-and-conquer algorithms have been proposed, many of them with limited theoretical underpinning. First, we compare the theoretical properties of a (non-exhaustive) list of proposed methods on the benchmark nonparametric signal-in-white-noise model. Most of the investigated algorithms use information on aspects of the underlying true signal (for instance, its regularity), which is usually not available in practice. A central question is whether one can tune the algorithms in a data-driven way, without using any additional knowledge about the signal.
We show that a list of standard data-driven techniques (both Bayesian and frequentist) cannot recover the underlying signal at the minimax rate. This, however, does not imply the non-existence of an adaptive distributed method. To address the theoretical limitations of data-driven divide-and-conquer algorithms, we consider a setting where the amount of information sent between the local and central machines is expensive and limited.
We show that it is not possible to construct data-driven methods that adapt to the unknown regularity of the underlying signal and at the same time communicate the optimal amount of information between the machines. This is joint work with Harry van Zanten.

Botond received his PhD in Mathematical Statistics from the Eindhoven University of Technology, the Netherlands, under the supervision of Prof.
Harry van Zanten and Prof. Aad van der Vaart. He is an Associate Editor of Bayesian Analysis.

Speaker: John Ormerod (Sydney University)

Bayesian hypothesis tests with diffuse priors: can we have our cake and eat it too? We introduce a new class of priors for Bayesian hypothesis testing, which we name "cake priors". These priors circumvent Bartlett's paradox (also called the Jeffreys-Lindley paradox): the problem of diffuse priors leading to nonsensical statistical inferences.
Cake priors allow the use of diffuse priors (having one's cake) while achieving theoretically justified inferences (eating it too). Lindley's paradox will also be discussed. A novel construct involving a hypothetical data-model pair will be used to extend cake priors to handle the case where there are zero free parameters under the null hypothesis. The resulting test statistics take the form of a penalized likelihood ratio test statistic.
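A toy illustration of why a penalized likelihood ratio statistic can behave so differently from a fixed-level classical test: if the rejection threshold grows with the sample size, both error probabilities can vanish. This is a generic sketch for a normal mean, not the cake-prior construction itself; the threshold constant is arbitrary.

```python
import numpy as np

def penalized_lrt_reject(x, c=1.0):
    """Toy test of H0: mu = 0 vs H1: mu != 0 for N(mu, 1) data.
    The likelihood ratio statistic is n * xbar^2; instead of a fixed
    chi-square cutoff, the threshold c*log(n) grows with n, so both the
    type I and type II error probabilities tend to zero (Chernoff
    consistency), unlike a classical fixed-level test."""
    n = len(x)
    lrt = n * np.mean(x) ** 2
    return lrt > c * np.log(n)

rng = np.random.default_rng(6)
n, reps = 10_000, 200
type1 = np.mean([penalized_lrt_reject(rng.standard_normal(n))
                 for _ in range(reps)])
power = np.mean([penalized_lrt_reject(rng.standard_normal(n) + 0.1)
                 for _ in range(reps)])
```

A classical test calibrated to level 0.05 would keep rejecting 5% of the time under the null no matter how large n gets; the growing threshold is what trades that fixed level for consistency under both hypotheses.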
By considering the sampling distribution under the null and alternative hypotheses, we show, under certain assumptions, that these Bayesian hypothesis tests are strongly Chernoff-consistent, i.e., both the type I and type II error probabilities tend to zero as the sample size grows. This sharply contrasts with classical tests, where the level of the test is held constant and which are therefore not Chernoff-consistent.

Traditionally, a real-life random sample is often treated as measurements resulting from an i.i.d. model. In many situations, however, this standard modelling approach fails to address the complexity of real-life random data. We argue that it is necessary to take into account the uncertainty hidden inside random sequences that are observed in practice.
To deal with this issue, we introduce a robust nonlinear expectation to quantitatively measure and calculate this type of uncertainty.