Te Tari Pāngarau me te Tatauranga
Department of Mathematics & Statistics

Archived seminars in Statistics

Seminars 301 to 350
The tales of two R packages: MCMCglmm and rptR

Dr. Shinichi Nakagawa

Department of Zoology

Date: Thursday 24 September 2009

I will talk about two R packages in which I have been involved (MCMCglmm is written entirely by my collaborator J. Hadfield at the University of Edinburgh, whereas rptR is written with H. Schielzeth at Uppsala University). The first package, MCMCglmm, is a powerful alternative to one of the most widely used packages, lme4, for generalised linear mixed modelling (GLMM). I will introduce various capabilities of this package, including the aspect I was involved in: phylogenetic mixed models and phylogenetic meta-analysis. The second package, rptR, contains various functions for calculating repeatability, or intra-class correlation (ICC). What distinguishes rptR from other packages for calculating ICC is that it provides repeatability functions for non-Gaussian data via GLMM. I will explain how calculating accurate repeatability for such non-normal data has recently become important in the context of animal personality research (a burgeoning field in evolutionary biology). I will also discuss some difficulties and problems we encountered in the course of writing this package.
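For a flavour of the two interfaces, here is a minimal sketch in R; the data, variable names and settings are invented for illustration, and rptR's interface has changed across versions (this uses the current rpt() wrapper):

    library(MCMCglmm)
    library(rptR)
    set.seed(1)

    ## Invented example: repeated call counts on 50 individuals
    dat <- data.frame(id = factor(rep(1:50, each = 4)))
    u <- rnorm(50, sd = 0.5)                      # between-individual variation
    dat$ncalls <- rpois(200, exp(1 + u[dat$id]))

    ## Bayesian GLMM via MCMCglmm
    m <- MCMCglmm(ncalls ~ 1, random = ~ id, family = "poisson",
                  data = dat, verbose = FALSE)
    summary(m)

    ## Repeatability (ICC) for non-Gaussian data via GLMM in rptR
    r <- rpt(ncalls ~ (1 | id), grname = "id", data = dat,
             datatype = "Poisson", nboot = 100)
    print(r)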
Current statistical challenges in high-throughput genomics

Dr Mik Black

Department of Biochemistry

Date: Thursday 20 August 2009

Just over ten years have passed since microarray technology sparked intense interest in the analysis of genomic data within the field of statistics. Rather than being a passing trend, the area of "statistical genomics" has become a major component of our discipline, with ever-increasing numbers of statistically minded researchers now involved in the analysis of data from genomics projects. In such a crowded arena, it is often difficult to see where new contributions can be made. This talk will attempt to provide an overview of the current state of statistical genomics, including coverage of the statistical challenges currently faced by genomics researchers.
Modeling seed longevity to predict when to regenerate a germplasm collection

Dr Philip Dixon

Department of Statistics Iowa State University

Date: Thursday 30 July 2009

Various countries maintain germplasm collections, whose intent is to preserve living genetic material for the indefinite future. For example, the Ames, Iowa, collection preserves over 6,000 variants of maize, 1,700 variants of mustard, 172 variants of cucumbers, 130 variants of Echinacea, as well as many other species. These are preserved as dry seeds stored at controlled temperature and humidity. However, seeds die even in optimal storage conditions, so curators periodically test seed germination. Seed lots of maize are tested approximately every 8 years. When germination falls below a species-dependent threshold, e.g. 50% or 85%, the seed lot is regenerated by growing out plants and collecting their seeds. The current schedule of seed testing has two problems:
long-lived seed lots may be tested repeatedly until the threshold is reached
short-lived seed lots may not be tested until long after the threshold is reached.

We were asked if we could use available germination data to develop a better decision rule for when to test seeds and to detect threshold crossing. Doing this involved:
modeling expected seed germination over time (selecting an appropriate mean function)
choosing whether to pool data across variants (random effects or fixed effects)
choosing a reasonable error structure (linear mixed model or GLMM)
estimating quantiles of predicted distributions (bootstrap resampling for a LMM)
assessing alternative decision rules (receiver operating characteristic curves)

Each of these issues will be illustrated and discussed.
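A minimal sketch of the first two steps in R, assuming a binomial GLMM with a logit-linear decline in germination over lot age (the data and the 50% threshold below are invented; the actual mean function and error structure considered in the talk may differ):

    library(lme4)
    set.seed(1)

    ## Simulated germination tests: 30 lots, each tested at several ages (years)
    tests <- expand.grid(lot = factor(1:30), age = c(2, 6, 10, 14))
    tests$n <- 50                                  # seeds per test
    u <- rnorm(30, sd = 0.4)                       # lot-to-lot variation
    p <- plogis(3 - 0.25 * tests$age + u[tests$lot])
    tests$germ <- rbinom(nrow(tests), tests$n, p)

    ## Binomial GLMM, pooling across lots via a random intercept
    fit <- glmer(cbind(germ, n - germ) ~ age + (1 | lot),
                 family = binomial, data = tests)

    ## Age at which expected germination crosses a 50% threshold
    b <- fixef(fit)
    as.numeric(-b[1] / b[2])                       # logit(0.5) = 0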
Estimating species richness and comparing species similarity using various models

Dr Austina Clark

Department of Mathematics & Statistics

Date: Thursday 23 July 2009

A large data set of plant species was collected in New Zealand from 1990 to 1993 at 10 different sites, with sampling carried out using quadrats.

The 10 sites were categorized into three area types according to their vegetation growth. Each site was also divided into three subsites according to the animals present or absent. Naturally, biologists are very interested in the species richness of each area, and in the similarity/diversity among areas and treatments. They are also interested in predicting unseen species should future surveys be undertaken.

We apply the homogeneous model, the homogeneous MLE, the Chao (2005) model, and ACE (the Abundance-based Coverage Estimator; Chao and Lee, 1992) to estimate species richness.
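For reference, the abundance-based Chao estimator is a lower bound built from the numbers f_1 and f_2 of species observed exactly once and exactly twice (standard notation, not necessarily the exact variant used in the talk):

    \hat{S}_{\mathrm{Chao}} = S_{\mathrm{obs}} + \frac{f_1^2}{2 f_2}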

Next we use a multiple-community diversity measure (Chao et al., 2008) to compare similarity/diversity among the areas as well as among the three treatments.

The data were collected over four years. However, year could not be used as an independent factor because sampling was carried out on the same quadrats each year.
Adjusting Age at First Breeding of Gibson’s Albatrosses for Emigration and Study Duration Using Bayesian State-Space CMR Models

Mr Peter Dillingham

Mathematics and Statistics

Date: Thursday 16 July 2009

The age at first breeding is an important demographic parameter in determining maximum growth rate, population size, and generation time, and is a key parameter in calculating the potential biological removal of birds. Albatrosses (and other petrels) do not begin breeding for many years, with some first breeding in their teens. This means that even long-term studies of birds banded as chicks may not last long enough to observe the entire process of recruitment to breeding. Standard estimation methods adjust for imperfect observation, but not for emigration and study length. Modelling approaches may be used to estimate the mean age at first breeding, along with age-specific breeding probabilities, but these must be used cautiously as they require untestable assumptions. The potential impact of these assumptions will be demonstrated by comparing estimates from different recruitment models for a long-term dataset of Gibson's albatrosses.
StatChat: Rejecting Statistical Hypothesis Testing

Chris Fonnesbeck

Department of Mathematics & Statistics

Date: Thursday 28 May 2009

The use of statistical hypothesis tests has been the subject of criticism for several decades (e.g., Rozeboom 1960, Carver 1978, Cohen 1994, Lukacs et al. 2007). Yet such methods continue as the primary tool for statistical inference, and are widely taught throughout undergraduate statistics curricula. In the view of many statisticians, statistical hypothesis testing is inherently and fundamentally flawed, and its limitations often go unrecognised, frequently resulting in misuse by practitioners. Given this, a suite of alternative approaches has been promoted to provide more effective inference. This StatChat will comprise a discussion of the major issues concerning hypothesis testing methods, and whether their continued use as the default method of statistical inference is warranted.
PyMC 2.0: Building Bayesian Statistical Models Using Python

Christopher Fonnesbeck

Department of Mathematics and Statistics

Date: Thursday 21 May 2009

The lack of robust and easy-to-use software has frequently been cited as an explanation for the slow uptake of Bayesian methods among potential practitioners and students in a variety of fields. In particular, Markov chain Monte Carlo (MCMC) methods are non-trivial to implement for the vast majority of potential users. The release of WinBUGS (Spiegelhalter et al. 2000) made MCMC modeling accessible to a much wider audience. PyMC, alternatively, is a module for the Python programming language that implements Bayesian statistical models and fitting algorithms, including MCMC. Version 2.0 of PyMC features a new flexible object model and syntax that makes model specification easy. This flexibility makes it applicable to a large suite of problems, and its core functionality is easily extensible. In addition, PyMC includes methods for summarizing output, plotting, and goodness-of-fit and convergence diagnostics. Unlike most competing software, PyMC is a modular component of a larger programming language, which allows greater scope in the design of applications and the development of extensions. I will outline the major features of the software and present a few examples of the range of models that can be fit.
Replicating the Hockey Stick in a Toy Model

Peter Green

Department of Mathematics & Statistics

Date: Thursday 7 May 2009

The “Hockey Stick” is an attempt to reconstruct past temperatures by extrapolating recent correlations between temperature and various proxies into the last 1000 years.

Applying the method to simulated data from a climate model allows us to compare simulated reconstructions with known underlying model temperatures.

This provides us with a measure of the method's reconstructive skill, which will eventually allow us to compare different approaches, hopefully giving us a more accurate picture than current “spaghetti” graphs.
The Trouble with Normal: Ecological Diagnostics for the Great Barrier Reef

Aaron MacNeil

Australian Institute of Marine Science

Date: Thursday 23 April 2009

Coral reefs are diverse and complex marine ecosystems, rich in marine life yet under threat from anthropogenic disturbance that may lead to precipitous declines in reef extent and productivity. As Australia's iconic natural asset, the Great Barrier Reef provides critical ecosystem services to the Australian economy in terms of fisheries and tourism revenue. Maintaining the ecological function of the GBR under a changing climate is made still more difficult by the fact that so little is known about what constitutes a healthy and resilient reef; by the complexity of sampling and studying reef ecosystems; by the difficulty of understanding how reef ecosystems function through disturbance and recovery; and by the scale of the climate problem. Using a series of ecologically based Bayesian hierarchical models and a unique set of long-term survey data, I attempt to estimate multiple meaningful benchmarks for understanding what constitutes 'normal' conditions within the GBR ecosystem and to contextualize future disturbance. While many reefs are below optimally functional levels of hard coral, many locations have recovered from extensive disturbance in the 1990s and are experiencing reasonable levels of recovery. Should rates of anthropogenic disturbance increase substantially, rates of coral reef recovery may be insufficient to ensure the long-term stability of the GBR.
Probabilistic modelling of runoff in a semiarid banded vegetation system

Laimonis Kavalieris

Department of Mathematics & Statistics

Date: Thursday 9 April 2009

Banded vegetation systems occur naturally in many semiarid environments. They typically comprise alternate runoff and run-on zones, oriented across a hill slope and replicated down slope. Bare, crusted zones with low infiltration capacity generate overland runoff, which then cascades onto vegetated zones of high infiltration capacity. In this way water supply to the vegetation is increased above what is delivered by rainfall alone. The banded mulga (Acacia aneura) systems in Australia comprise three zones: bare soil; grass; and trees, with infiltration and water storage capacities increasing in that order. This triplet is repeated over large areas. It has been proposed that agroforestry systems can be designed to make use of this passive method of water redistribution, allowing bands of productive trees to be grown in environments with relatively low rainfall. Here we discuss a stochastic model for this system in which daily rainfall is modelled as a marked Poisson process; soil water as a finite store defined by soil porosity and plant root depth; and evapotranspiration as a loss function dependent on the level of soil water storage. We derive stationary distributions for soil water storage and surface runoff and discuss the response of such systems to changing rainfall patterns.
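In outline, the storage dynamics described above can be sketched as a daily balance for a store of capacity S_max (a schematic, not the authors' exact formulation): with rainfall input R_t and storage-dependent loss E(S_t),

    S_{t+1} = \min( S_t + R_t - E(S_t),\; S_{\max} ), \qquad
    \mathrm{runoff}_t = \max( S_t + R_t - E(S_t) - S_{\max},\; 0 ).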
Design of Experiments for Bivariate Binary Responses

John Eccleston

Department of Mathematics, University of Queensland

Date: Thursday 2 April 2009

Optimal design for generalized linear models has primarily focused on univariate data. Often, experiments are performed that have multiple dependent responses modelled by regression-type models, and it is of interest and value to design the experiment for all of these responses. This requires a multivariate distribution underlying a pre-chosen model for the data. In this talk we consider the design of experiments for bivariate binary data which are dependent. We explore copula functions, which provide a rich and flexible class of structures for deriving joint distributions for bivariate binary data. Methods are presented for deriving optimal experimental designs for dependent bivariate binary data using copulas, and we demonstrate that including the dependence between responses in the design process yields more efficient model parameter estimates than the usual practice of designing for one variable only. The approach is demonstrated through an example on drug efficacy and toxicity.
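Schematically, a copula C couples the marginal success probabilities p_1 and p_2 into a joint distribution for the binary pair (Y_1, Y_2): writing the margins as P(Y_i = 0) = 1 - p_i,

    P(Y_1 = 0,\, Y_2 = 0) = C(1 - p_1,\, 1 - p_2),

and the remaining three cell probabilities follow from the margins. (This is the standard construction; the specific copula families considered in the talk are not stated in the abstract.)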
No-think MCMC

Colin Fox

Department of Physics

Date: Thursday 26 March 2009

Implementing an MCMC sampler always requires some kind of problem-specific work, making a hurdle for the newcomer and a chore for the practitioner. The newcomer must first learn how to build and tune proposals for reasonable convergence times, while more sophisticated applications require Jacobians in the reversible-jump formalism. These tasks can be both onerous and subtle in applications of MCMC to inverse problems, because representations usually live in high-dimensional spaces and the ill-posedness causes the posterior distribution to have correlation coefficients close to 1. The t-walk is a general-purpose, black-box sampler that can adequately sample many problems, including inverse problems, and lets scientists get on with other jobs.
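For contrast, here is a minimal random-walk Metropolis sampler in R, the kind of hand-tuned machinery the t-walk aims to replace (the target and step size are invented for illustration; this is not the t-walk itself):

    ## Random-walk Metropolis: 'step' must be tuned by hand for each problem
    log_target <- function(x) -0.5 * sum(x^2)      # standard normal, say

    metropolis <- function(n, x0, step) {
      d <- length(x0)
      out <- matrix(NA_real_, n, d)
      x <- x0
      lp <- log_target(x)
      for (i in seq_len(n)) {
        prop <- x + step * rnorm(d)                # propose a local move
        lp_prop <- log_target(prop)
        if (log(runif(1)) < lp_prop - lp) {        # Metropolis accept/reject
          x <- prop
          lp <- lp_prop
        }
        out[i, ] <- x
      }
      out
    }

    draws <- metropolis(5000, x0 = c(0, 0), step = 0.5)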
Small Sample Properties of Longitudinal Poisson Models, or What I Did on My Sabbatical

Melanie Bell

Department of Preventive & Social Medicine

Date: Thursday 19 March 2009

Statistical methods for longitudinal data are, in general, asymptotically based. However, these methods regularly get used when sample size is small.

The small sample properties for normal and binary data are fairly well known, but this is not the case when data are Poisson. I have performed a simulation study investigating type I error rate, bias and power for longitudinal Poisson models, and have found that these models are unbiased even for very small sample sizes, unlike logistic models, which have been shown to be biased. In particular, REPL estimation does not yield biased results for Poisson data. We conclude that statisticians should not generalize the results of logistic models for binary data to all non-normal data. We have shown that generalized linear mixed models yield a type I error rate that is too low, whereas generalized estimating equations using uncorrected sandwich SEs have one that is too high.
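A sketch of the kind of comparison involved, in R with invented data (the REPL fits in the talk would need other software; here a Poisson GLMM is set against GEE with sandwich standard errors):

    library(lme4)
    library(geepack)
    set.seed(1)

    ## Small-sample longitudinal Poisson data: 10 subjects, 4 visits each
    n <- 10; t <- 4
    d <- data.frame(id = factor(rep(1:n, each = t)), visit = rep(0:(t - 1), n))
    u <- rnorm(n, sd = 0.5)                        # subject-level heterogeneity
    d$y <- rpois(n * t, exp(0.5 + 0.2 * d$visit + u[d$id]))

    ## Generalized linear mixed model
    glmm <- glmer(y ~ visit + (1 | id), family = poisson, data = d)

    ## Generalized estimating equations with sandwich standard errors
    gee <- geeglm(y ~ visit, id = id, family = poisson, data = d,
                  corstr = "exchangeable")
    summary(glmm)
    summary(gee)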
Collaborative Research with Undergraduates: Kiwi Accents and Burden of Disease

Julie Legler

St. Olaf College, Northfield, Minnesota, US

Date: Thursday 12 March 2009

Two projects involving undergraduates and collaborators are described briefly. The projects come from two different approaches to statistics education. On campus, an interdisciplinary research team modeled challenging data provided by a linguist to assess the effect of predictors on quantifying accents. This project occurred under the auspices of our Center for Interdisciplinary Research (CIR). The CIR has been supported for 5 years by a $1.3 million grant from the National Science Foundation (NSF). Over 30 collaborative projects have been completed in this time. The CIR has attracted a large number of students to the study of statistics, and over the past 4 years has led to 40 students, from a school of 3000, going on to graduate study in statistics or a closely related field. Many of our statistics students are also involved in a second, relatively unique program which provides students the opportunity to collaborate with World Health Organization (WHO) researchers in Geneva, Switzerland. A primary charge of WHO is to evaluate the global burden of disease (GBD) for an extensive list of diseases. Students on this project examined the effect of model choices on estimating GBD, with some surprising results.
A simple Gibbs sampler for calculating posterior model probabilities using MCMC output from independently fitted models

Richard Barker

Department of Mathematics & Statistics

Date: Thursday 5 March 2009

An explanation of reversible-jump MCMC is provided in terms of a ‘palette’ of parameters ψ and K bijections that map ψ to model-specific parameters ψk, k = 1, …, K, where K is the number of models considered. One possible advantage of this formulation is that it is specified in terms of K bijections instead of the K(K−1)/2 pairwise maps needed in the usual formulation. This representation also makes it clear how a Gibbs sampler can be constructed that exploits MCMC output generated independently for each of the K models.
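Schematically, with g_k denoting the bijection taking the palette ψ to the model-k parameters (a sketch consistent with the description above, not the paper's exact notation), the sampler alternates

    \psi \mid k, y \;\sim\; p(\psi \mid k, y), \qquad
    P(k \mid \psi, y) \;\propto\; p\big(y \mid g_k(\psi), k\big)\, p(\psi \mid k)\, p(k),

and the posterior model probabilities are estimated by the relative frequencies of each k in the resulting chain.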
Operator algebras associated to dynamical systems

Astrid an Huef

School of Mathematics & Statistics, University of New South Wales

Date: Thursday 29 January 2009

There are many mathematical formulations of dynamics. I will explain how the dynamical systems studied by operator algebraists arise from a more classical notion of dynamical system as a differential equation. The result will be a continuous action of a group G on a space X, and hence an action of G on the algebra C0(X) of continuous functions on X vanishing at infinity. I will survey the interplay of the dynamics of the pair (G, X) and the representation theory of an operator algebra built from the action of G on C0(X).
Mixed Nonhomogeneous Poisson Process Spline Models for the Analysis of Recurrent Event Panel Data

Charmaine Dean

Simon Fraser University

Date: Friday 12 December 2008

A flexible semiparametric model for analyzing longitudinal panel count data is presented. Panel count data refers here to count data on recurrent events collected as the number of events which have occurred within specific follow-up periods. The model assumes that the counts for each subject are generated by a nonhomogeneous Poisson process with a smooth intensity function. Such smooth intensities are modeled with adaptive splines. Both random and discrete mixtures of intensities are considered, to account for the complex correlation structures, heterogeneity and hidden subpopulations common to this type of data. An estimating equation approach to inference requiring only low moment assumptions is developed, and the method is illustrated on several data sets.
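In symbols (standard notation for this setting): if subject i is observed over follow-up periods (t_{i,j-1}, t_{i,j}], the panel counts are modelled as

    N_{ij} \sim \mathrm{Poisson}\!\left( \int_{t_{i,j-1}}^{t_{i,j}} \lambda_i(u)\, du \right),

with the smooth intensity \lambda_i(\cdot) represented by adaptive splines.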
What does the Wilcoxon test mean?

Thomas Lumley

University of Washington

Date: Friday 7 November 2008

It is notoriously difficult to explain simply and accurately what the Mann-Whitney/Wilcoxon two-sample test does (and many textbooks just settle for 'simply'). I will describe some of the properties that I used to attribute to the Wilcoxon test, and why they were an attractive delusion. These problems extend to essentially all rank tests, which are inconsistent with any ordering of all distributions, for reasons that were basically known before statistics was invented.
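For reference, the quantity the test actually addresses, in standard notation, is

    \theta = P(X > Y) + \tfrac{1}{2} P(X = Y), \qquad H_0 : \theta = \tfrac{1}{2},

for X drawn from the first distribution and Y independently from the second. Because \theta is not transitive across three or more distributions, no single ordering of all distributions can be consistent with the test, which is the inconsistency referred to above.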
Recent Developments in Statistical Ecology

Byron Morgan

University of Kent, Canterbury, UK.

Date: Wednesday 29 October 2008

Early probability models in statistical ecology were simple, and used for the description of relatively short data sets. Current models can be far more complex, supported by long-term data sets. We describe some of the early models and then focus on a range of recent developments, motivated by the need to describe detailed data on grey herons, Ardea cinerea, Soay sheep, Ovis aries, Great cormorants, Phalacrocorax carbo sinensis, and the lizard orchid, Himantoglossum hircinum. We shall present state-space models for census data, fitted by means of a Kalman filter, capture-recapture models for estimating the survival of marked wild animals, and new ways of including appropriate covariates, for both survival and reproduction; the covariates include time-varying individual measures with missing values, such as weight, and global measures such as winter weather and population density. Climate change can result in habitat fragmentation, and populations that are distributed over several sites, with animals moving between those sites. We present a simple method for discriminating between highly parameterised multi-site models using score tests. Prediction is made through population projection matrices, and we present an exact method for perturbation analysis based on implicit plotting using symbolic algebra. The different parts of this talk describe joint work with Takis Besbeas, Rachel Borysiewicz, Ted Catchpole, Jean-Dominique Lebreton, David Miller, Martin Ridout, and Peter Rothery, as well as ecologists Thomas Bregnballe, Peter Carey, Tim Coulson and Giacomo Tavecchia.
New Dog, Old Tricks

Darryl MacKenzie

Proteus Wildlife Research Consultants

Date: Thursday 9 October 2008

In many fields of science, from archaeology and palaeontology to epidemiology, botany and zoology, when seeking to establish the presence or absence of some item of interest, situations arise where absence of evidence does not equate to evidence of absence. Whether it be establishing the distribution of a species (living or extinct), a disease or an ancient civilisation, or the number of species present at a location or patients on a list with a particular disease, the potential for false absences has long been recognised. False absences result from failing to detect the necessary evidence with a certain level of search or sampling effort, given that the item is truly present, and can lead to misleading conclusions. Recently, methods have been developed in the ecological/wildlife fields to account for false absences, and these could well be applied in other areas of science. While they still do not enable one to infer true absence, they do allow a probability to be estimated for it.

In this talk I shall briefly review these methods, provide examples of their use, and discuss ideas for how they could be applied in different situations.
An Introduction to Temperature Reconstruction

Peter Green

Department of Mathematics and Statistics

Date: Thursday 2 October 2008

A brief look at the data and methods behind Northern hemisphere mean temperature reconstructions for the past millennium.
Battle of the Bayes

Chris Fonnesbeck

Department of Mathematics & Statistics

Date: Thursday 25 September 2008

In place of our usual statistics seminar at 11 a.m. on September 25, I will lead a discussion on “Objections to Bayesian Analysis”. This was motivated by a series of papers published in the current issue of the journal “Bayesian Analysis”, in which a set of contrived, but at times convincing, objections to the widespread use of Bayesian statistics was offered by Andrew Gelman. These assertions were commented upon by a range of statisticians, and Gelman himself provided a rejoinder. The discussion will address many issues related to Bayesian inference raised by applied statisticians.

For those interested in participating, I have posted the relevant papers here:

http://drop.io/Bayesian

If asked for a password, please use "2phast".

The articles are all very short, so please have a look before the StatChat (at least, read the original Gelman paper). They can either be downloaded, or read in-place from your browser.

Hope you’ll join us!
Statistics 4th Year Presentations

Statistics 4th Year Presentations

Department of Mathematics & Statistics

Date: Friday 19 September 2008

1. Dorothee Hodapp
Estimation of Abundance and Occupancy of Bottlenose Dolphins using Bayesian Inference

2. Ella Iosua
Maori Population Stratification in the Genetic Study of Gout

3. Philippa Smale
Yellow Eyed Penguin Population Dynamics

4. Jimmy Zeng
Estimating Luminescence Lifetime

Refreshments will be provided
Dehydrating athletes: Good, bad, or both?

Jim Cotter

School of Physical Education

Date: Thursday 11 September 2008

With athletes sweating through the heat and humidity of the Beijing games, it’s timely to critique what we do and don’t know about hydration, athletic performance and health. I’ll present some of our hydration research, showing that the scientific community knows a lot but also relatively little, whereas – perhaps arguably – the media and thus public know nothing of use. Our research is common to most of the exercise physiology literature in using small samples (often <10 per group, single blinded) and thus repeated measures designs analysed with simple univariate statistics. I suspect that many of us rely on the statistics we’ve always known, and that we have little more stats knowledge than in our student days. It’s convenient to believe that such mediocrity of design is due to the high resource requirement of experimenting on humans under stress, whereas mediocrity of analysis is due to underpowered designs and the fact that it is ‘appropriate’ for what we want to know.
Model Selection and Time Series Asymptotics

Laimonis Kavalieris

Department of Mathematics and Statistics

Date: Thursday 4 September 2008

In any realistic modeling scenario, candidate models are chosen to capture an interesting aspect of the data. We cannot assume the existence of a “true” model, or even a finitely parameterized data-generating process. Thus the model selection criterion should be tuned to the purpose of the statistical analysis, and a mathematical analysis of the criterion must allow model complexity to increase with sample size. I will discuss this model selection framework in the time series context and reflect on the asymptotic theory of penalty function criteria (such as AIC, amongst others) for model selection. I will not shy away from elegant mathematical formulae!
Identifying calving dates in farmed red deer hinds monitored using GPS collars

Roger Littlejohn

AgResearch, Invermay Agricultural Centre

Date: Thursday 21 August 2008

I have recently been working on data from a trial where 8 farmed red deer hinds were monitored using GPS collars in a large herd in a large paddock. Recording was carried out between November and March. Calving was expected to occur during November. It was expected that immediately prior to calving, hinds would roam more widely than usual, and that immediately after calving they would not roam at all, but remain close to the same place. No direct observations were made on the calves. An initial inspection of the data confirmed this behaviour in most cases. Distance travelled between each observation (30 minute intervals) was analysed using a hidden Markov model, which identified one previously mysterious calving event, and confirmed most of the others. The model was also successful at identifying the start of the roaming period, but less so for the end of the “nesting period”, for which more appropriate models are being developed. My talk will have lots of graphs and not many equations.
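A sketch of this style of analysis in R, using the depmixS4 package on invented data (the abstract does not say which software or parameterisation the trial used):

    library(depmixS4)
    set.seed(1)

    ## Invented step lengths (km per 30 min): roaming, nesting, normal phases
    state <- rep(1:3, times = c(200, 100, 300))
    d <- data.frame(logdist = log(rgamma(600, shape = 2,
                                         rate = c(0.5, 8, 2)[state])))

    ## Three-state Gaussian hidden Markov model on log distance
    m  <- depmix(logdist ~ 1, data = d, nstates = 3, family = gaussian())
    fm <- fit(m)
    head(posterior(fm))   # decoded states: calving shows as roaming -> nesting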
Bayesian hierarchical model for evaluating the risk of vessel strikes on North Atlantic right whales in the southeastern United States

Chris Fonnesbeck

Department of Mathematics and Statistics

Date: Thursday 14 August 2008

A primary factor threatening the recovery of the North Atlantic right whale is the ongoing risk of collision with large ocean-going vessels. Hence, any viable conservation strategy must include mitigation of this risk. In particular, the critical wintering habitat off the Atlantic shores of the southeastern United States overlaps with the shipping routes of some of the region’s busiest ports. As a first step in the process of ship strike risk mitigation for this region, we estimated the risk associated with current patterns of shipping traffic, and compared this with estimates of risk for a set of hypothetical alternative routes. As a measure of risk, we selected the co-occurrence of whales and vessels within cells of a 4km grid. We performed parametric estimation of whale encounter rate and associated risk within a Bayesian hierarchical model, using data from aerial surveys and the Mandatory Ship Reporting System of the SE United States, along with a selection of environmental covariates. Importantly, we were able to account for annual and monthly variation in encounters in our estimates. All alternative routes provided reduced overall risk, ranging from 27% to 44% reduction, relative to the estimated risk of observed traffic. The largest marginal gains in risk reduction were attained by restricting traffic associated with the busiest port, Jacksonville, but restrictions on all ports achieved the highest reduction. We emphasize the importance of accounting for temporal as well as spatial variation in whale encounter rates, given the migratory behavior of the species.
Modelling count and growth data with many zeros

Austina Clark

Department of Mathematics & Statistics

Date: Thursday 31 July 2008

We discuss the problem of modelling survival/mortality and growth data that are skewed with excess zeros. This type of data is a common occurrence in biological and environmental studies. The method presented here allows us to utilize both the survival/mortality and growth data when both data sets contain a large proportion of zeros. A case study of survival and growth of blue mussels and ribbed mussels translocated from their natural distribution to different depths and sites along the axis of Doubtful Sound is used for illustration.
Why Most Weighted Regressions Are Wrong

David Fletcher

Department of Mathematics & Statistics

Date: Thursday 24 July 2008


In many regression-type settings, we know that the error in the response variable differs from one observation to another. In ecology, for example, we might have several estimates of the annual breeding rate of an animal in each of several years, with each estimate having a different level of precision. The aim of the analysis might be to relate the breeding rate to one or more environmental variables measured in the same years.

A modern and useful approach to the analysis of this type of data is to fit a hierarchical model using Markov chain Monte Carlo methods. In a consulting context, however, it may be useful to provide the client with an analysis that is simpler for them to implement, i.e. weighted regression.

I will consider problems involved in implementing weighted regression correctly and how one might determine when a standard (unweighted) regression is sufficient.
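A sketch of the basic setting in R (data invented): each response is itself an estimate with a known standard error, and the naive weights 1/se^2 ignore any between-year process variance, which is one way such regressions can go wrong:

    set.seed(1)
    env <- rnorm(20)                               # environmental covariate
    se  <- runif(20, 0.05, 0.3)                    # known estimation SEs
    y   <- 0.8 + 0.4 * env +
           rnorm(20, sd = 0.15) +                  # process (between-year) variance
           rnorm(20, sd = se)                      # estimation error

    fit_naive <- lm(y ~ env, weights = 1 / se^2)   # ignores process variance
    fit_unwtd <- lm(y ~ env)                       # ignores differing precision
    rbind(naive = coef(summary(fit_naive))[2, ],
          unwtd = coef(summary(fit_unwtd))[2, ])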
Management for Conservation: Controlling an Emergent Disease

William Probert

Spatial Ecology Lab, University of Queensland

Date: Thursday 17 July 2008

We illustrate an innovative approach to environmental management by investigating the management of the facial tumour disease impacting populations of the Tasmanian devil, Sarcophilus harrisii, in Australia. The novelty of the disease has led to multiple hypotheses regarding disease latency; thus there is not only a long-term objective to maintain devil populations but also a pressing need to understand which of these hypotheses is correct, so that an appropriate course of management can be implemented.
Using two management sites, we need both to maintain a population of devils and to learn about how the facial tumour disease functions. More generally, this work provides a simple protocol for examining the trade-off between learning and management where we have multiple hypotheses about how our system may function.
Open Population Capture-Recapture Models and Diabetes in Otago – the Continuing Saga

Claire Cameron

Department of Mathematics & Statistics

Date: Thursday 29 May 2008

The standard way of estimating diabetes prevalence from lists is through closed-population capture-recapture methods. These may use the simple two-list case or the more complicated multi-list case. Prescribed capture-recapture models can be used for the multi-list analysis or, alternatively, loglinear models, which allow the modelling of specific dependencies between lists. The lists typically describe a particular point in time.
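In the simple two-list case this is the classical Lincoln-Petersen estimator: with n_1 and n_2 people on the two lists and m appearing on both, the population size is estimated by

    \hat{N} = \frac{n_1 n_2}{m}.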

My research involves the development of open population capture-recapture models to estimate prevalence and incidence. These models allow the tracking of lists over time, and they also allow for people joining and leaving lists (an open population). I have used four lists of people in Otago who have been diagnosed with diabetes to develop a model that incorporates information on year of diagnosis and date of death, as well as presence or absence on each list over time.

The results of my work in progress will be presented and issues and problems encountered to date will be discussed.
110% confidence tricks: The misuse of statistics in the “debate” about climate change

Doug Mackie

Department of Chemistry

Date: Thursday 22 May 2008

Did global warming really stop in 2002? Does the IPCC suppress data? Few issues have generated such public interest in the physical sciences as climate change. People with no relevant experience or formal expertise make a great many claims denying human-caused climate change. But is lack of expertise reason to doubt their claims? What about the claims of people who do have experience and expertise? I will discuss the concept of evidence against a scientific idea and then present some of the arguments made against human-caused climate change by members of the New Zealand Climate Science Coalition. I will give particular attention to an article published in April 2008 in The Listener, written by Bryan Leyland and Chris de Freitas. I will show that the article is deeply flawed, not least from a statistical point of view, and that its arguments have been refuted so many times that, inevitably, questions about how it came to be written, much less published, must arise.
Four presentations

Statistics Honours presentations

Department of Mathematics & Statistics

Date: Friday 16 May 2008


1. Jimmy Zeng
Statistical methods for luminescence lifetime estimation.

2. Dorothee Hodapp
Estimation of abundance and impacts on the bottlenose dolphin population in Doubtful Sound.

3. Ella Iosua
Maori population stratification in the genetic study of gout.

4. Philippa Smale
Yellow eyed penguin population development.


Refreshments will be provided
Developing probabilistic statistical thinking

Professor Helen MacGillivray

Queensland University of Technology

Date: Thursday 15 May 2008


In the focus over the past decade on data-driven, realistic approaches to building statistical literacy and data analysis curricula, the explicit development of probability reasoning beyond coins and dice has received less attention. There are two aspects of probability at the introductory tertiary level: its use in introductory data analysis; and as a foundation for further study in statistical modelling and applications, and increasingly in areas such as information technology, engineering, finance and health. This paper advocates a minimalist, objective-oriented approach to the former, and a constructivist, collaborative and data-linked approach to the latter. The latter is the main focus here; I discuss learning and assessment strategies, and analyse student and tutor feedback and student performance. Objectives of the course include helping students to unpack, analyse and extend what they have brought with them to tertiary study, develop problem-solving skills, link with data and real investigations and processes, and consolidate and synthesize foundation mathematical skills.
Methods and applications of statistical calibration using longitudinal data

Professor Geoff Jones

Institute of Fundamental Sciences, Massey University

Date: Thursday 8 May 2008

Statistical calibration, or inverse regression, uses the estimated relationship between a response Y and a covariate x to infer the values of unknown xs from their observed Ys. I will give a brief review of some basic theory, starting with the simple linear regression model and progressing to more complex nonlinear and multivariate situations. I will then consider the problems involved in extending these methods to the case of longitudinal data, i.e. where the training data consist of groups of observations (Yij, xij), j = 1, …, ni, on distinct individuals i = 1, …, I. A Bayesian analysis using MCMC is shown to give a flexible framework for solving these problems, but for the reluctant Bayesian other approaches are sometimes feasible using standard software. Much of the material will be presented graphically, using examples from my own work on the development of multiple immunoassays for environmental monitoring and on growth-curve modelling for brown kiwi (Apteryx mantelli) and black-fronted tern (Sterna albostriata) chicks.
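In the simple linear case, the classical estimator just inverts the fitted line: given the training fit \hat{Y} = \hat\beta_0 + \hat\beta_1 x and a new observation Y_0, the unknown covariate is estimated by

    \hat{x}_0 = \frac{Y_0 - \hat\beta_0}{\hat\beta_1}.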
Bayesian models in archeology: Settlement of Tawhiti Rahi

Peter Dillingham & James Robinson

Department of Mathematics and Statistics, Department of Anthropology, Gender, and Sociology

Date: Thursday 1 May 2008

The offshore islands along the temperate north coast of New Zealand all contain material culture from pre-European Maori settlement. The common discovery of Mayor Island obsidian in early mainland archaeological sites, and a recent series of dated pollen cores from Great Barrier Island, suggest that the date of first settlement of (at least some of) these islands was contemporaneous with the first settlement of New Zealand, approximately 700 years ago (BP).

However, recent multidisciplinary studies involving archaeology, history and environmental sciences on Tawhiti Rahi (Poor Knights) Islands have led to the hypothesis that Maori settlement here occurred much later, sometime after 450 BP and possibly as late as 300 BP, in the ‘Classic’ period of Maori culture. This may have been a response to a rapidly expanding population, the need to maximize horticultural food production, and increasing levels of conflict in Maori society, which saw the defensive benefits of these islands outweighing the problems associated with difficult access.

Samples of cultural material for radiocarbon dating were archaeologically excavated from a stratigraphic sequence dug in a cave on Tawhiti Rahi. Radiocarbon dates were calibrated using the southern hemisphere calibration curve, and combined with additional information of the archeological site using Bayesian models. In this talk, we will discuss the archeology of the cave site, provide background on radiocarbon dating and calibration, and explain why integrating additional knowledge sources is important to New Zealand archeology.
Measuring New Zealand's Progress: An Integrated Approach to Official Statistics

Dr Geoff Bascand

Chief Executive & Government Statistician, Statistics New Zealand

Date: Thursday 24 April 2008

Official statistics have provided an important lens on New Zealand’s development for over 150 years. From counts of European New Zealanders and tons of commodity trade, interest in monitoring the country’s development has extended much more widely across many aspects of the economy, society and the environment. Keeping our statistical reporting and measurement in tune with our rapidly changing world and evolving perceptions of national progress is a perennial challenge. In response, Statistics New Zealand is advancing a more integrated approach across domains of interest, conceptual frameworks, statistical architecture, and the institutional system that circumscribes official statistics.
The RNA World Scenario in the Context of Mathematical and Statistical Analysis of DNA and Protein Sequences

Jose A. Garcia

Department of Preventive and Social Medicine

Date: Thursday 17 April 2008

The current studies relate to the origin of life in general and to the origin of bacterial chromosomes in particular. I will start by presenting the biological background required to follow the presentation. Then I will introduce some "mathematical properties" that have been found in DNA sequences, and discuss how these properties may or may not be inherited by the corresponding protein sequences. A main result is the relative closeness between the standard genetic code and one proposed for the RNA World. Finally, some comparisons will be shown regarding whether or not putative RNA sequences from the RNA World preserve the mathematical properties observed today.
Developments in spatially explicit capture–recapture

Murray Efford

Department of Zoology

Date: Thursday 10 April 2008

Two aspects of most animal studies are ignored in conventional capture–recapture models: (i) sampling uses an array of detectors (traps), and (ii) animals occupy home ranges and hence do not mix uniformly between samples. Spatially explicit capture–recapture models incorporate these facts. Their primary purpose is to estimate population density (the intensity parameter of the spatial point process for home range centres) free from bias due to edge effects and spatially induced heterogeneity. Modelling an array of detectors turns out to have some interesting subtleties. If the detectors are traps then they ‘compete’ for animals and each element in the encounter history is one trap location; otherwise, an animal may appear at several ‘proximity’ detectors per sample, and each element in the encounter history is a vector of locations. Software has been developed to fit a range of spatially explicit models (www.otago.ac.nz/density). Current work aims to extend the methods to handle new sorts of data (e.g. sounds recorded by an array of microphones) and to develop the spatial equivalent of open-population models. This talk will cover general principles and a particular acoustic example: ovenbirds Seiurus aurocapillus in a Maryland forest.
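For reference, a common choice of detection model in this framework (a standard default in the software mentioned, though the abstract does not commit to a particular form) is the half-normal function

    g(d) = g_0 \exp\!\left( -\frac{d^2}{2\sigma^2} \right),

the probability that a detector at distance d from an animal's home-range centre records it on one occasion; density enters as the intensity of the point process of range centres.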
Sudoku, mathematics and statistics

Peter Cameron, Forder Lecturer

Queen Mary, University of London

Date: Monday 7 April 2008

A statistician introduced to Sudoku for the first time might recognise it as a combination of two ideas from experimental design: gerechte designs and critical sets.

Suppose that a certain number of treatments (such as fertilisers) are being tested on plots in a square field. The standard arrangement, allowing for the fact that fertility might vary in a systematic way across the field, is to use a Latin square, in which each treatment occurs once in each row and column. If in addition there are certain regions of the field which are different in their properties (stony or boggy, for example), then we also require that each treatment occurs once in each such region. This is a gerechte design, introduced by W. U. Behrens.

J. A. Nelder invented the idea of a critical set in a Latin square, a set of entries from which the whole Latin square can be recovered uniquely. The notion immediately extends to gerechte designs, and we see that a Sudoku puzzle is a critical set for a gerechte design where the Latin square is 9 by 9 and the regions are the 3 by 3 subsquares.

There are many other connections between gerechte designs and other areas of mathematics such as finite geometry, perfect codes and orthogonal Latin squares, and many questions about gerechte designs themselves, for example: given a partition of a square n by n grid into n regions with n cells in each, how do we decide whether a gerechte design exists?
If a Manatee Surfaces in the Ocean and Nobody Is There to See It, Does It Get Counted?

Dr Christopher J Fonnesbeck

Department of Mathematics & Statistics

Date: Thursday 3 April 2008


The Florida manatee (Trichechus manatus) is an iconic marine mammal that is listed as endangered under both state and federal jurisdictions in Florida. By law, the state’s conservation commission is mandated to census manatees annually, presumably in order to monitor changes in population size. The current census is comprised of fixed-wing aircraft surveys over an arbitrary subset of the animal’s range, and resulting counts are not adjusted for detection bias. Though originally thought to be an index of true manatee abundance, these counts can vary according to a suite of factors that are independent of population size. We have designed and tested a new survey that accounts for major confounding variables and, once implemented, will include the entire state’s manatee population as its sampling frame. Population size is modelled in a Bayesian hierarchical framework, and estimated via Markov chain Monte Carlo.
Measurement Error in Auxiliary Information

Raymond Chambers

Wollongong University

Date: Thursday 27 March 2008

Auxiliary information is information about the target population of a sample survey over and above that contained in the actual data obtained from the sampled population units. The availability of this type of information represents a key distinction between sample survey inference and more mainstream inference scenarios. In particular, modern methods of sampling inference (both model-assisted and model-based) depend on the availability of auxiliary information to improve efficiency in survey estimation. However, such information is not always of high quality, and typically contains errors. In this talk I focus on some survey-based situations where auxiliary information is crucial, but where this information is not precise. Estimation methods that allow for this imprecision will be described. In doing so I will address not only the types of inference of concern to sampling statisticians (e.g. prediction of population quantities), but also inference for parameters of statistical models for surveyed populations.


A cover charge of $5 will apply to cover drinks and nibbles before the seminar. Ray will speak at 6.30 p.m.

For those interested we plan to go out for a late dinner afterwards.
Coming Out: Confessions of a Closet Frequentist

Professor Richard Barker

Department of Mathematics & Statistics

Date: Thursday 20 March 2008


StatChat

There has been a recent explosion of interest in Bayesian inference methods owing to the development of methods such as Markov chain Monte Carlo (MCMC) and their implementation in the computer package WinBUGS. The recent uptake of Bayesian methods has been such that many statisticians would now regard themselves as pragmatically Bayesian: happy to adopt Bayesian inference methods for fitting complex models that are not amenable to the methods of frequentist inference.

While the lot of the pragmatist is a happy one, it is disconcerting that there are two competing theories of statistics and that one can pick and choose between them according to circumstance. Moreover, the two approaches can lead to distinct answers in some common problems. In this talk I discuss ways in which the two approaches can be reconciled, in particular the idea of calibrated Bayes in the sense of this quote from Donald B. Rubin (1984):

The applied statistician should be Bayesian in principle and calibrated to the real world in practice–appropriate frequency calculations help to define such a tie … frequency calculations are useful for making Bayesian statements scientific, scientific in the sense of capable of being shown wrong by empirical test; here the technique is the calibration of Bayesian probabilities to the frequencies of actual events.
True models in model selection

Professor David Anderson

Colorado State University

Date: Thursday 13 March 2008

William Evans Fellow

Models have played a central role in the statistical sciences for many decades. Models are approximations by definition. Models in the life and social sciences are often very crude approximations. In the past 30 years there has been increasing use of the notion of “true models” – models that are a perfect reflection of full reality. Much theory has been developed based on the assumed existence of a true model and on the assumption that this model is in the set of candidate models. Often it is further assumed that the actual data come from a model. I intend to raise awareness of these approaches and to ask questions about their unintended consequences. I will leave more questions than answers.
Much ado about nothing: methods and software implementations to estimate incomplete data regression models

Associate Professor Nicholas Horton

Smith College, USA

Date: Thursday 29 November 2007

Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The development of statistical methods to address missingness has been actively pursued in recent years, including imputation, likelihood and weighting approaches (Ibrahim et al., JASA, 2005; Horton and Kleinman, TAS, 2007). Each approach is considerably more complicated when there are many patterns of missing values and both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available, though not commonly used. We review these methods in the context of a motivating example from a large health services research dataset. Some discussion of the feasibility of sensitivity analyses to the missing at random assumption will also be provided. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible and scientifically desirable to incorporate partially observed values as well as undertake sensitivity analyses to modelling and missingness assumptions.
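As one concrete example of the imputation route in R, using the mice package and its bundled nhanes example data (not the health-services data discussed in the talk):

    library(mice)

    ## Impute m = 5 completed datasets, fit the model to each, pool the results
    imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
    fit <- with(imp, lm(chl ~ bmi + age))
    summary(pool(fit))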
From Estimation of Traffic Flows to Deconvolution of Densities: Some Statistical Linear Inverse Problems

Professor Martin Hazelton

Massey University

Date: Thursday 11 October 2007

During my doctorate I looked at some problems in nonparametric density estimation. This has remained an area of interest to me ever since, with my current focus being on density deconvolution. After completing my doctorate I worked as a research assistant in transportation science. This too has remained an area of interest to me, particularly problems of estimating origin-destination traffic flow. I spent over 10 years thinking of nonparametric smoothing and traffic flow estimation as quite separate research interests, but have more recently come to appreciate some deep links in terms of positive linear inverse problems. This talk will explore some of these links, before looking in more detail at a novel semiparametric method of addressing the density deconvolution problem.
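In symbols, the deconvolution problem is: one observes Y = X + Z, where Z is measurement error with known density h, so the observed density is the convolution

    g(y) = \int f(x)\, h(y - x)\, dx = (f \ast h)(y),

and the task is to recover the target density f from an estimate of g. Like origin-destination flow estimation, this is a positive linear inverse problem.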
Profile Likelihood Intervals for Predictions from a Zero-Inflated Model

Assoc Prof David Fletcher

Department of Mathematics and Statistics

Date: Thursday 4 October 2007

In many research settings, the response variable is skewed and contains a large proportion of zeros. My experience with this kind of data has been in ecological settings, where one wants to estimate the expected abundance of a species corresponding to a set of values for one or more predictor variables. One approach is to use a two-part model in which one separately models presence and abundance given presence. As well as potentially providing a better fit to the data, use of such a model can lead to a more complete understanding of how the predictor variables influence abundance. The main focus of this talk is on the calculation of profile likelihood confidence intervals for expected abundance. This is joint work with Malcolm Faddy from QUT in Brisbane (see JABES (2007) 12: 315–324).
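A minimal sketch of a two-part analysis in R (data invented; the models in the talk, from the JABES paper cited, may differ in both parts):

    set.seed(1)
    x <- runif(200)
    present <- rbinom(200, 1, plogis(-1 + 2 * x))
    abund <- ifelse(present == 1, rlnorm(200, meanlog = 1 + x, sdlog = 0.5), 0)
    d <- data.frame(x, present, abund)

    m1 <- glm(present ~ x, family = binomial, data = d)       # presence
    m2 <- lm(log(abund) ~ x, data = subset(d, present == 1))  # abundance | presence

    ## Expected abundance combines the two parts:
    ## E[abundance] = P(present) * E[abundance | present]
    p  <- predict(m1, newdata = data.frame(x = 0.5), type = "response")
    mu <- exp(predict(m2, newdata = data.frame(x = 0.5)) + summary(m2)$sigma^2 / 2)
    p * mu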
Assessment of Bayesian Hierarchical Models for Over-dispersed and Zero-inflated Count Data

Associate Professor Russell Millar

The University of Auckland

Date: Thursday 20 September 2007

The deviance information criterion (DIC) was found to be a potentially dangerous tool for comparison between hierarchical models of count data. It was useful only when the likelihood was expressed at the subject level. Hence it can be used for comparison between Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial models. However, DIC was not reliable for likelihoods expressed at the replicate level. Consequently it cannot be used to distinguish between Poisson-gamma (i.e., the negative binomial implemented at the replicate level) and Poisson-lognormal models, or to assess whether these models require zero-inflation. For example, when fitting Poisson-gamma and Poisson-lognormal models to simulated Poisson-lognormal data, the Poisson-gamma model always had lower DIC. Bayesian predictive checks (BPCs) were extremely conservative. For example, under 100 simulations of the Poisson model fitted to Poisson data, the lower 5% quantile of the BPC p-value for goodness of fit was approximately 0.3. Nonetheless, BPCs were a useful aid in model comparison and confirmation of DIC.
Adaptive Resource Management Optimization Using Reinforcement Learning

Dr Christopher Fonnesbeck

Fish and Wildlife Research Institute, Florida and University of Georgia, Athens

Date: Tuesday 18 September 2007

An important technical component of adaptive resource management is optimization, which is used to select the most appropriate management strategy, given a system and a set of candidate decisions. For dynamic resource systems, dynamic programming has been the de facto standard for deriving optimal state-specific management strategies. Though effective for small-dimension problems, dynamic programming is incapable of providing solutions to suitably realistic problems, even with modern computing technology. Reinforcement learning is an alternative, related procedure for deriving optimal management strategies, based on stochastic approximation. Applications of reinforcement learning in the field of artificial intelligence have illustrated its ability to yield near-optimal strategies for very complex model systems, highlighting the potential utility of this method for ecological and natural resource management problems.
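A toy illustration of the idea in R: tabular Q-learning on an invented two-state, two-action system (nothing here is specific to the talk's management applications):

    set.seed(1)
    n_s <- 2; n_a <- 2
    Q <- matrix(0, n_s, n_a)                       # state-action values
    reward <- function(s, a) ifelse(s == a, 1, 0)  # toy reward
    transition <- function(s, a) sample(n_s, 1)    # toy stochastic dynamics
    alpha <- 0.1; gam <- 0.9; eps <- 0.1           # learning rate, discount, exploration

    s <- 1
    for (i in 1:5000) {
      a  <- if (runif(1) < eps) sample(n_a, 1) else which.max(Q[s, ])
      s2 <- transition(s, a)
      ## Stochastic-approximation update toward the Bellman target
      Q[s, a] <- Q[s, a] + alpha * (reward(s, a) + gam * max(Q[s2, ]) - Q[s, a])
      s <- s2
    }
    Q   # greedy action in each row approximates the optimal strategy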
An application of MCMC methods for nonlinear hierarchical modelling in clinical toxicology

Professor Stephen Duffull

School of Pharmacy

Date: Thursday 16 August 2007

Clinical toxicology is an area of medicine characterised by high levels of uncertainty and heterogeneity. A particularly challenging problem in clinical toxicology is the management of patients who have taken a deliberate drug overdose. Drug overdose may be characterised by serious sequelae, including rhythm disturbances of the heart and sudden death. Models to describe and predict the likely time course of events are essential for the management of these patients. We describe the application of Markov chain Monte Carlo methods, using WinBUGS, to nonlinear hierarchical models with two response levels. Practical application of these methods to citalopram (an antidepressant) overdose, model selection methods, choice of prior, and sensitivity to prior selection are discussed. The model developed from this analysis has been evaluated in a clinical trial and is being used clinically for the management of citalopram overdose.