Exploiting occurrence times in likelihood inference for componentwise maxima
Dr. Alec Stephenson
National University of Singapore
Date: Wednesday 15 August 2007
Multivariate extreme value distributions arise as the limiting distributions of normalised componentwise maxima. They are often used to model multivariate data that can be regarded as the componentwise maxima of some unobserved underlying multivariate process. In many applications we have extra information. We often know the locations of the maxima within the underlying process. If the process is temporal this knowledge is frequently available through the dates on which the maxima are recorded. We show how to incorporate this extra information into maximum likelihood procedures.
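As a toy illustration of the setting (my own construction, not taken from the talk), the following Python sketch simulates an underlying bivariate daily process, records the componentwise annual maxima, and keeps the days on which each maximum occurred; coincident occurrence days are exactly the extra information a modified likelihood can exploit.

import numpy as np

rng = np.random.default_rng(1)

n_years, n_days = 30, 365
# Hypothetical underlying bivariate daily process with dependent components.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]],
                            size=(n_years, n_days))

# Componentwise annual maxima and the days ("occurrence times") on which they occur.
comp_max = z.max(axis=1)      # shape (n_years, 2)
occ_day = z.argmax(axis=1)    # shape (n_years, 2)

# When the two maxima fall on the same day they came from the same underlying
# event, which is the kind of information occurrence times carry.
same_event = occ_day[:, 0] == occ_day[:, 1]
print(comp_max[:3], occ_day[:3], same_event.mean())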
070809141647
Can you see the wood for the trees? Surveys, sampling, estimation and approximation
Susan Starking
London South Bank University; 2006 Schools Lecturer for the Royal Statistical Society
Date: Thursday 2 August 2007
Woodland and wood play an important part in our daily lives and in this talk we look at how the Forestry Commission, the Tree Council and the Arboricultural Association assess the value of trees using statistics.
Practical examples involving both mathematics and statistics are used to show how we can assign a value to our trees. The audience is involved in the decision making process on how to assign a value to trees using surveys, sampling, estimation and approximation.
070730144351
BeSTGRID: Broadband enabled Science and Technology GRID
Paul Bonnington
The University of Auckland
Date: Thursday 2 August 2007
As our research projects grow in scale, we recognize that the complex IT and communications systems required to support them are beyond the capacity of any single project to sustain. A new model of IT support for research has emerged that focuses on defining layered services that can be used to support multiple projects, all dependent on advanced research and education networks. This model is inherently collaborative, requiring a level of transparency not previously considered. These layered services fall into three themes: data management, collaboration tools, and high-performance computing.
With the arrival of KAREN in December 2006, New Zealand has quickly moved to embrace this new approach, led by the TEC IDF-funded BeSTGRID National Grid programme. Within BeSTGRID we are evolving a nationally coordinated, collaborative approach to research modeled on international best practice but adapted to the unique scope of New Zealand research, science and technology. This means establishing resources and supporting capability easily accessible from any NZ tertiary institution or CRI. While this programme is initially funded to establish this capability at Canterbury, Massey, and Auckland Universities, the model can be more widely applied. The model describes how investment can best be apportioned into national, institutional, and project-specific resources and capabilities. Paul will describe the BeSTGRID programme and demonstrate its mechanisms and tools, live on the emerging national research grid, BeSTGRID.
070726113106
A Bayes New World?
Richard Barker
Department of Mathematics & Statistics
Date: Thursday 26 July 2007
The Italian probabilist and Bayesian statistician Bruno de Finetti predicted that by 2020 we would all be Bayesian. This StatChat will be in the form of a forum in which we examine how far we have progressed toward de Finetti’s vision of Utopia.
070719151720
Integrated Genomics For Health And Disease
Mik Black
Department of Biochemistry
Date: Thursday 19 July 2007
The REANNZ funded project, Integrated Genomics for Health and Disease, will use KAREN (the Kiwi Advanced Research and Education Network) to provide New Zealand’s genomics researchers with the ability to store, share, and analyse large quantities of genomic data. A major component of this project involves establishing an online database that will house both public and private microarray data, and will include tools to allow extensive bioinformatic analysis across multiple data sets, a capability which is currently out of reach of many New Zealand research groups. The goal is to provide an online environment that both simplifies and standardizes the process of analyzing data from gene expression microarray experiments, while at the same time moving the burden of data storage and integration away from individual investigators. This talk will provide an overview of the initiative, along with examples of the meta-analysis style of data interrogation that will be possible with this powerful new tool.
070716140047
Correction for Radio Tracking Detection Bias in Estimation of Resource Selection Functions
Lyman L. McDonald
University of Canterbury Visiting Erskine Fellow and West, Inc., USA
Date: Tuesday 5 June 2007
The objective of this research is correction for detection bias in estimation of a resource selection function for an animal, or functions for a sample of a population of animals, whose positions within a study area are recorded periodically by radio tracking methods. It is well known that radio tracking methods occasionally fail to record locations of tagged animals in some resource units, for example, there is failure to triangulate on the radio signal or a Global Positioning System (GPS) tag loses satellite connections in units with dense canopy cover or rugged terrain. These missing observations lead to a potentially biased sample of ‘used’ units during the study period. Biased samples lead to bias in predictions of the absolute or relative probability of use of certain resource units as a function of covariates in analysis of either used versus unused units or used versus available units. Analysis of location data with a substantial amount of missing observations on used units can lead to erroneous inferences if differences in detection rates among resource units are not accounted for. I investigate the properties of a stepwise discrete choice resource selection function which has been modified for estimation of detection probability in radio tracking data and provide unbiased estimates of the probability of use of resource units. The method is illustrated with GPS tracking data of mule deer (Odocoileus hemionus) in Central Wyoming, USA, and properties of the method are investigated by Monte Carlo computer simulation.
070601153255
The Past, Present, and Future of Resource Selection Functions
Lyman L McDonald
University of Canterbury Visiting Erskine Fellow and West, Inc., USA
Date: Tuesday 5 June 2007
Past resource selection functions (RSFs) can be based on weighted distribution theory, f_u(x) = w(x)f_a(x)/c, where x is a vector of predictor variables, the weighting function w(x) (the RSF) is related to the Kullback-Leibler directed distance from f_u(x), the distribution of 'used' units, to f_a(x), the distribution of all units in the study area, and the constant c is usually unknown. The weighting function gives the relative probability with which one would select units in the study area to produce the distribution of used units. Unfortunately, RSFs got off to a bad start with undue concern over the definition of 'availability' of units. The models depend on data collected in a study area and time period in the same way that all models depend on observed data. It is a subjective judgment to use any model, including RSFs, for extrapolation of predictions beyond the data.
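As a rough illustration of how the exponential form w(x) = exp(x'beta) is estimated in practice (a sketch on simulated data, not the speaker's analysis), used units can be contrasted with a sample of available units by logistic regression; the slope estimates beta while the intercept absorbs the unknown constant c.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical study area: one covariate per unit (e.g. canopy cover).
x_avail = rng.normal(size=(5000, 1))                      # available units
w = np.exp(1.5 * x_avail[:, 0])                           # true RSF, w(x) = exp(beta * x)
p_use = w / w.sum()
x_used = x_avail[rng.choice(5000, size=1000, p=p_use)]    # units used in proportion to w(x)

# Used-versus-available logistic regression; the log density ratio is linear in x,
# so the slope is a consistent estimate of beta.
X = np.vstack([x_used, x_avail])
y = np.concatenate([np.ones(len(x_used)), np.zeros(len(x_avail))])
fit = LogisticRegression().fit(X, y)
print("estimated beta:", fit.coef_[0][0])                 # should be near 1.5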
Presently, RSF methods estimate relative probability of “use and detection of use by the study protocol.” If an unbiased sample of used units is available, then useful models are obtained. I give a brief review of methods and respond to some issues raised in the provocative paper published by Ken Keating and Steve Cherry in the Journal of Wildlife Management, Number 4, 2004.
The future lies with breaking the confounding between the probability of “use” and probability of “detection given use,” e.g., the patch occupancy design recently proposed by Darryl MacKenzie, Dunedin, NZ, and others at the Patuxent Wildlife Research Center in Maryland, USA. Results are needed for additional study designs, e.g., for analysis of repeated relocations of animals tracked by Global Positioning Systems. If there are missing data in the relocations, i.e., probability of detection is not 100% in all habitat, then analysis methods are needed for estimating probability of “use”.
070524144521
Two presentations
Statistics Honours Presentations
Department of Mathematics & Statistics
Date: Friday 1 June 2007
1. Maryann Pirie
Reconstructing past climates
The use of Kalman Filter and Smoother in estimating the climate influence in tree-rings
2. Aaron Bryant
Hidden Markov models in mark-recapture studies
A look at how Hidden Markov models can be used to estimate temporary emigration of seabirds using mark-recapture data
Refreshments will be provided
070524142953
Bayesian Population Modelling of Hector’s dolphins: From assessment to prognosis
Andrew Gormley
Department of Zoology and Mathematics & Statistics
Date: Thursday 31 May 2007
This talk will report some results of a Bayesian approach to population modelling, using the population of Hector’s dolphins at Banks Peninsula as an example. I will begin by discussing the uses of population modelling in conservation biology. This will be followed by a look at the demographic parameters related to Hector’s dolphin survival and reproduction, and how they were estimated. I then describe an appropriate demographic model and illustrate some of the relationships between population growth and the demographic parameters. Finally the results of a range of population simulations will be presented leading to a prognosis of this population.
070524144352
Statistics and Mathematics
Assoc Prof David Fletcher
Department of Mathematics and Statistics
Date: Thursday 10 May 2007
In 1998, the Journal of the Royal Statistical Society published a number of papers on the role of mathematics in statistics (The Statistician 47: 239-290). I will use these as a starting point for a discussion on this topic. A quote from the abstract of one of the papers provides a clue as to the kind of issues that I will touch on: “Statistics is about solving real problems. An undue emphasis on its mathematical foundations is detrimental to the discipline …. Although mathematics lies at its core, statistics as a discipline involves several essential components beyond mathematics …. It is important that the discipline of statistics should not let itself be marginalised by an apparent obsession with mathematical niceties.”
070503090150
Markov chains – Mixing and Coupling
Jeff Hunter
Massey University
Date: Wednesday 2 May 2007
A cover charge of $5 will apply to cover drinks and nibbles before the seminar. Jeff will speak at 6.30 p.m.
For those interested we plan to go out for a late dinner afterwards.
The time to stationarity in a Markov chain is an important concept, especially in the application of Markov chain Monte Carlo methods. The time to stationarity can be defined in a variety of ways. In this talk we explore two possibilities – the “time to mixing” (as given by the presenter in a paper on “Mixing times with applications to perturbed Markov chains” in Linear Algebra Appl. 417, 108-123, (2006)) and the “time to coupling”. Both these related concepts are explored through the presentation of some general results, without detailed proofs, for expected times to mixing and coupling in finite state space Markov chains. A collection of special cases is explored in order to illustrate some general comparisons between the two expectations. For those not overly familiar with Markov chains, an overview of the basic essential concepts will be included in the presentation.
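As a small illustration (my construction, not the paper's definitions), one version of the expected time to coupling can be estimated by running two copies of a chain from different starting states, independently, until they first meet:

import numpy as np

rng = np.random.default_rng(0)

# A small 3-state transition matrix (arbitrary example).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

def coupling_time(i, j, P, rng):
    # Steps until two independent copies started in states i and j first meet.
    t = 0
    while i != j:
        i = rng.choice(3, p=P[i])
        j = rng.choice(3, p=P[j])
        t += 1
    return t

# Monte Carlo estimate of the expected coupling time from starting states 0 and 2.
times = [coupling_time(0, 2, P, rng) for _ in range(20000)]
print("estimated expected coupling time:", np.mean(times))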
070420164701
Where do they come from? Where did they go? Trace element composition & brown trout migration
Gery Closs
Department of Zoology
Date: Thursday 19 April 2007
Brown trout are known to exhibit considerable diversity in life history, with resident and anadromous fish often occurring within a catchment. Brown trout from mixed stocks were introduced to New Zealand in the 1860s. Today both resident and anadromous fish are known to occur, but little is known of the frequency or distribution of different life history strategies, or of the extent of the migrations occurring within catchments. In the past, various tagging methods have been used to track fish migration. However, whilst these approaches have revealed interesting individual results, gaining information at a population level can be logistically challenging. Recent advances in the analysis of the trace element composition of soft and hard tissues have provided significant opportunities for tracking migration. In various studies we have used otolith microchemistry to track adult fish back to the streams in which they were spawned, to determine whether they have spent time in estuarine reaches of rivers, and to track whole-of-lifetime movements within river catchments. Within the Taieri River, results indicate that whilst fish recruit widely around the system, some streams are far more important than others as a source of recruits. We have also used the trace element composition of spawned eggs dug from redds to determine the upstream extent of spawning migrations by adult trout. Results from that study indicate that upstream migration by adult trout can be extensive, although resident non-migratory populations exist in many streams. Our results indicate considerable flexibility and a diversity of life history strategies amongst brown trout in large lowland river systems in New Zealand. Clearly defined resident and anadromous populations do not exist; instead, trace element signatures indicate a continuum of habitat use by adult fish, from freshwater residency through to estuarine and marine migratory fish.
070417081637
Statistical Science in the 21st Century
Assoc Prof Richard Barker
Department of Mathematics & Statistics
Date: Tuesday 3 April 2007
Statistical thinking and practice in the 20th century were heavily influenced by the work of R A Fisher. The ideas developed by Fisher, and their subsequent extensions, form the basis of most statistics courses today, especially those taught to non-majors. Since the death of R A Fisher in 1962 there have been three major developments in statistics: the extension of linear model theory to non-normal data; recognition of the need for methods of multi-model inference; and a renewed interest in Bayesian inference. With the possible exception of models for non-normal data, these ideas have been slow to permeate through a statistics curriculum that still tends to emphasise methods in use in the 1960s rather than the ideas behind contemporary analyses of data. In this talk I consider each of these developments and discuss the implications they have for the statistics curriculum today. I emphasise:
1. The role that the development of modelling skills should play in the training of statisticians and scientists
2. The importance of accounting for all uncertainties when making inference
3. The pragmatic advantages of the Bayesian framework for complex problems.
These ideas are illustrated with examples.
070402095140
Decision making under uncertainty and adaptive management: essential concepts and methods, with application to conservation problems
Michael J. Conroy
USGS & University of Georgia, Athens
Date: Thursday 22 March 2007
Decision problems in general, and conservation decisions in particular, involve the assessment of which of several alternative decisions is most likely to meet the objectives of the decision maker. Decision making under uncertainty must consider both the probability of particular outcomes following a decision, and the value of these outcomes to the decision maker. Decision theory allows for the explicit consideration of both uncertainty and value, and leads to decisions that are coherent (Lindley 1985).
Most conservation problems involve sequential decisions in dynamic, stochastic systems. Decision makers must account both for the immediate consequences of actions and the potential consequence of today's actions on tomorrow's decision opportunities. Thus, in harvest management the objective function includes both current harvest yield and future, anticipated harvest; strategies that fulfill this objective are by definition sustainable. The Principle of Optimality (Bellman 1957) leads to a globally optimal solution to a broad class of Markov decision problems via dynamic programming (DP). In contrast to 'open loop' procedures such as simulation-gaming and genetic algorithms, DP is 'closed loop', explicitly providing for feedback of anticipated future states to present decisions.
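To make the closed-loop idea concrete, here is a minimal backward-induction sketch for a hypothetical two-state, two-action harvest problem of my own devising; it shows only the Bellman recursion, not any real management model.

import numpy as np

# Hypothetical Markov decision problem: states = (low, high) population,
# actions = (no harvest, harvest). P[a][s, s'] are transition probabilities,
# R[a][s] is the immediate reward (harvest yield).
P = {0: np.array([[0.6, 0.4], [0.2, 0.8]]),
     1: np.array([[0.9, 0.1], [0.5, 0.5]])}
R = {0: np.array([0.0, 0.0]),
     1: np.array([1.0, 3.0])}

T, n_states = 20, 2
V = np.zeros(n_states)                 # terminal value
policy = np.zeros((T, n_states), dtype=int)

# Backward induction: at each step choose the action maximising
# immediate reward plus expected future value (Bellman's principle).
for t in reversed(range(T)):
    Q = np.stack([R[a] + P[a] @ V for a in (0, 1)])   # action values, shape (2, n_states)
    policy[t] = Q.argmax(axis=0)
    V = Q.max(axis=0)

print("optimal first-period action by state:", policy[0])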
In the above, uncertainty exists (e.g., due to demographic or environmental stochasticity) but under a single model of the underlying biological and physical processes. By contrast, there often is profound uncertainty about the processes relating candidate decisions to outcomes. This uncertainty degrades the ability to make an optimal decision. Under sequential decision making and monitoring uncertainty can potentially be reduced, via adaptation. Decision making not only changes the natural system state; in doing so, it influences the relative belief in alternative models; this is known as dual control. Thus, a key premise of adaptive management, which depends on dual control, is that decision making and 'learning' are synergistic, not competitive processes.
Challenges remain for the implementation of adaptive management under dual control. Some of these are technical, including limitations on the dimensionality of decision models under DP, and the most effective ways of incorporating information into decision making. Advances in computing efficiency and algorithms, and Bayesian methods, have reduced or eliminated the technical limitation to adaptive management in conservation. The remaining challenges are principally social or political, and include disagreement over fundamental objectives, confusion of values issues with technical issues, and definitions of adaptive management that confuse rather than clarify.
070315135121
Applications of the Bidomain Model for Studying Problems in Electrocardiology
Assoc Prof Peter Johnston
School of Science, Griffith University
Date: Tuesday 20 March 2007
Predictive modelling of the electrical activity of the heart can provide insights and explanations for both the normal and abnormal time traces observed in the electrocardiogram (ECG). Ideally, such modelling would be initiated at the cellular level (including both intracellular and extracellular spaces) but, even with modern computing, such a model is still some way off. As a consequence, the bidomain model has been introduced as a continuum approximation to account for the cellular structure of cardiac tissue. The bidomain model assumes the spatial coexistence of both intracellular and extracellular domains throughout the tissue and is based on Ohm’s law. The model allows for different conductivities in the intracellular and extracellular spaces, both along and across the fibres of cells which make up the cardiac tissue.
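For reference, a standard statement of the bidomain equations (in notation of my choosing, which may differ from the speaker's) couples the transmembrane potential v and the extracellular potential u_e through the intracellular and extracellular conductivity tensors M_i and M_e:

\nabla \cdot (M_i \nabla v) + \nabla \cdot (M_i \nabla u_e) = \chi \Big( C_m \frac{\partial v}{\partial t} + I_{\mathrm{ion}}(v) \Big),
\qquad
\nabla \cdot (M_i \nabla v) + \nabla \cdot \big( (M_i + M_e) \nabla u_e \big) = 0,

where \chi is the membrane surface-to-volume ratio, C_m the membrane capacitance and I_{\mathrm{ion}} the ionic current.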
This talk will present two applications of the bidomain model. Firstly, in a simplified geometry, we will study the effect oxygen deprived (ischaemic) cardiac tissue has on the heart surface electric potential distribution. We will consider various approximate relationships between the conductivity values which make the bidomain model more tractable. All solutions are compared to experimental data.
A second application centres on using an array of micro-electrodes to determine the four bidomain conductivity values as well as the total fibre rotation through the wall of the heart. This application results in a non-linear inverse problem which uses the bidomain model as the forward problem. The methodology is described in detail and an example will be considered.
070315140306
Hidden and Not-So-Hidden Markov Models: Implications for Environmental Data Analysis
Richard W. Katz
National Center for Atmospheric Research, Boulder, CO, USA
Date: Thursday 30 November 2006
A hidden Markov model includes as one component an underlying Markov chain whose states are assumed unobserved. A “not-so-hidden” Markov model has the same probabilistic structure, but the states of the Markov chain are either fully, or at least partially, observed. The experience in applying not-so-hidden Markov models in the environmental and geophysical sciences is reviewed. It is argued that this experience should provide some hints about the circumstances in which the use of hidden Markov models would be beneficial in environmental data analysis.
One example of a not-so-hidden Markov model is a chain-dependent process, a model commonly fitted to time series of daily weather (sometimes termed a “weather generator”). This model involves a stochastic process (e.g., precipitation intensity, temperature) defined conditional on the state of an observed Markov chain (e.g., precipitation occurrence). In its most general form, the temporal dependence in the observed variable is directly modeled, not simply induced from the hidden Markov component.
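A minimal simulation of such a chain-dependent process, with toy parameters of my own choosing: a two-state Markov chain governs precipitation occurrence, and intensity is drawn conditional on the observed wet state.

import numpy as np

rng = np.random.default_rng(0)

# Two-state occurrence chain: 0 = dry, 1 = wet (toy transition probabilities).
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

n_days = 365
state = np.zeros(n_days, dtype=int)
precip = np.zeros(n_days)
for t in range(1, n_days):
    state[t] = rng.choice(2, p=P[state[t - 1]])
    if state[t] == 1:
        # Intensity conditional on the observed (not hidden) chain state.
        precip[t] = rng.gamma(shape=0.8, scale=8.0)

print("wet-day frequency:", state.mean(),
      "mean wet-day amount:", precip[state == 1].mean())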
Much debate in the environmental and geophysical sciences revolves around the question of whether “regimes” exist. Conceptual models, with close resemblance to hidden Markov models, have been proposed as probabilistic representations of such regimes. A particular variant of a hidden Markov model, used to quantify climate predictability as consistent with regimes, involves a Markov chain whose states are observed, but whose transition probabilities shift depending on a hidden state. Examples of geophysical variables for which regime changes would have important environmental implications are numerous. They include indices of large-scale atmospheric/oceanic circulation, such as the North Atlantic Oscillation, as well as palaeoclimate reconstructions (i.e., evidence for “abrupt” climate change).
The role of hidden Markov models in environmental and geophysical data analysis is not yet clear. Besides the use of hidden Markov models as an empirical device for more flexible modeling, attempts have been made to attach physical interpretations to hidden state variables. So far, such interpretations seem to have verified what is already known rather than led to new discoveries.
*The National Center for Atmospheric Research is sponsored by the National Science Foundation.
061130085610
The ability of New Zealand seabirds to sustain bycatch-related mortalities
Peter Dillingham
Department of Mathematics and Statistics
Date: Wednesday 15 November 2006
Seabirds are characterized by long life, delayed maturity, and low birth rates, with limited potential for growth even under optimal circumstances. Tens of thousands are killed annually in longline fisheries and the ability of seabird populations to sustain this additional mortality is an area of ecological interest. The data available for most species of seabirds is limited in scope and quality, making full population modelling difficult for many species. A simple rule developed for estimating allowable bycatch rates is combined with an estimate for the maximum annual growth rate of birds. This provides an initial framework for calculating the ability of New Zealand seabirds to sustain bycatch-related mortalities and requires only an estimate of the age at first breeding, adult survival, and population size. This approach is used to provide estimates of the level of human-induced mortality that may be sustained for several New Zealand seabirds.
061113081406
Confidentiality of interim data in clinical trials
Katrina Sharples
Department of Preventive & Social Medicine
Date: Thursday 12 October 2006
In Phase III trials intended to provide a definitive evaluation of the benefit to risk profile of a therapeutic strategy it has become standard practice to have an independent Data Monitoring Committee (DMC). The primary role of this committee is to safeguard the interests of the trial participants, but it also fulfills an important role in preserving the integrity and credibility of the trial. The committee makes recommendations to the trial steering committee regarding early termination of the trial as well as recommendations relating to trial conduct. The membership of DMCs is multidisciplinary, with a major role being played by biostatisticians.
Most commonly the accumulating data on the relative safety and efficacy of the treatment under evaluation is kept confidential to the independent DMC. However recent experiences on a DMC have highlighted some differing international and disciplinary perspectives on the appropriateness of this confidentiality, particularly in regard to publication of short term outcome data. This talk will discuss the issues surrounding the release of interim data including the unreliability of early data, the role of statistical stopping guidelines, ethics and the implications for trial integrity.
061006141031
Final presentations of honours statistics projects:
The students and topics are:
Mathematics & Statistics
Date: Thursday 5 October 2006
Samuel Brilleman:
Limits to Human Running Performance
Zijia Jiang (Carrie):
Gender Difference In Number of Sexual Partners
Hee Mong Wong (Levin):
Financial Time Series
Refreshments will be provided.
061003165112
Estimating fecundity of Hector’s dolphins when calves are not always detected and Eavesdropping on sperm whales - passive acoustic localization
Andrew Gormley and Brian Miller
Dept of Zoology and Maths & Stats and Dept of Marine Science
Date: Thursday 28 September 2006
Hector’s dolphin calves are born during summer and remain with their mothers for a number of months. Given perfect detection of calves, a simple estimate of fecundity is the proportion of mature females seen with a calf. A problem arises from the fact that a calf is not always detected, meaning that a sighting of a mature female without a calf does not necessarily equate to no calf. We therefore essentially have a problem of zero-inflated data.
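One plausible formalisation (a sketch of a zero-inflated binomial likelihood, my own simplification rather than the speaker's model): a mature female has a calf with probability theta, and if she has one it is detected on each of her k sightings with probability p, so females never seen with a calf mix 'no calf' with 'calf never detected'.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

def neg_log_lik(par, y, k):
    # Zero-inflated binomial: y calf detections out of k sightings per female.
    theta, p = 1 / (1 + np.exp(-par))              # logit scale -> (0, 1)
    lik = theta * binom.pmf(y, k, p) + (1 - theta) * (y == 0)
    return -np.sum(np.log(lik))

rng = np.random.default_rng(0)
k = np.full(200, 5)                                # 5 sightings of each of 200 females
has_calf = rng.random(200) < 0.35                  # true fecundity 0.35
y = rng.binomial(k, 0.6) * has_calf                # detection probability 0.6

fit = minimize(neg_log_lik, x0=[0.0, 0.0], args=(y, k))
print("estimated (fecundity, detection):", 1 / (1 + np.exp(-fit.x)))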
and
Diving behaviour of sperm whales at Kaikoura has remained largely a mystery due in part to the extraordinary dive durations and depths typical of sperm whales. Passive acoustic technology and signal processing techniques can provide some insight by taking advantage of the fact that sperm whales click almost continuously while underwater. Clicks and echoes are recorded using a simple hydrophone array, and dive profiles are reconstructed using differences in sound arrival time between different hydrophones and measurements of how the speed of sound changes with depth.
060925135338
A Semi-Markov Model For Biting Time Series
Roger Littlejohn
AgResearch Ltd., Invermay
Date: Thursday 21 September 2006
Biting time series consist of the vertical force exerted by animals on swards of grass over a series of bites. The vertical force exerted is generally in the range of 50-150 N. Before each bite (pull) there is a push as the animal selects the mouthful with her tongue. After each bite there is a quiescent period, possibly followed by a small tear on residual blades of grass with a vertical force of 5-20 N.
The data are modeled using a semi-Markov model with 4 states. We are interested in differences in parameter estimates between experimental treatments, and also in the dependence of the total pull-force during a bite on the total push-force during the previous interval and the length of that interval.
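For intuition, here is a minimal simulation of a four-state semi-Markov process with arbitrary states and sojourn distributions of my own choosing (not the fitted model): the state sequence follows an embedded Markov chain while the time spent in each state has a state-specific distribution.

import numpy as np

rng = np.random.default_rng(0)

states = ["push", "pull", "quiescent", "tear"]
# Toy embedded transition matrix (rows sum to 1, no self-transitions).
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.7, 0.0, 0.0, 0.3],
              [1.0, 0.0, 0.0, 0.0]])
# State-specific mean sojourn times (seconds) for exponential holding times.
mean_sojourn = np.array([0.3, 0.5, 1.0, 0.4])

s, t, path = 0, 0.0, []
while t < 30.0:
    dwell = rng.exponential(mean_sojourn[s])   # semi-Markov: duration depends on the state
    path.append((states[s], round(t, 2), round(dwell, 2)))
    t += dwell
    s = rng.choice(4, p=P[s])

print(path[:8])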
060919084320
StatChat: Model selection and model averaging
Richard Barker
Mathematics & Statistics
Date: Thursday 14 September 2006
When formulating models for data, statisticians rarely have just one model in mind. Making inference from more than one model has been an important development in statistical thinking during the past 20 years. Much of this thinking has been popularised by Ken Burnham and David Anderson through two editions of their book “Model Selection and Inference – A Practical Information-theoretic Approach”. In this StatChat we cover a brief history of model selection and AIC, including its roots in information theory, and consider whether the Bayesian paradigm offers a broader framework for multi-model inference, one in which AIC-based model weights can be naturally evaluated.
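For readers unfamiliar with AIC-based model weights, the standard Akaike weights are w_i = exp(-Delta_i/2) / sum_j exp(-Delta_j/2), where Delta_i is a model's AIC minus the smallest AIC in the set; a quick illustration with made-up AIC values:

import numpy as np

aic = np.array([210.4, 212.1, 215.8])          # hypothetical AIC values for three models
delta = aic - aic.min()
weights = np.exp(-delta / 2) / np.exp(-delta / 2).sum()
print(dict(zip(["M1", "M2", "M3"], weights.round(3))))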
060908085928
Estimation and Testing with Interval-Censored Data
Jon A. Wellner
University of Washington, Seattle, WA (currently visiting Victoria University of Wellington)
Date: Thursday 24 August 2006
Suppose that X is a random variable (a “survival time”) with distribution function F and Y is an independent random variable (an “observation time”) with distribution function G. Suppose that we can only observe (Y, \Delta), where \Delta = 1_{[X \le Y]}, and our goal is to estimate the distribution function F of the random variable X. The Nonparametric Maximum Likelihood Estimator (NPMLE) \hat{F}_n of F was described in 1955 in papers by H. D. Brunk (and four co-authors) and by C. van Eeden.
A further problem involves inference about the function F at a fixed point, say t_0. If we consider testing H : F(t_0) = \theta_0, then one interesting test statistic is the likelihood ratio statistic
\lambda_n = \frac{\sup_F L_n(F)}{\sup_{F : F(t_0) = \theta_0} L_n(F)} = \frac{L_n(\hat{F}_n)}{L_n(\hat{F}^0_n)}.
This involves the additional problem of constrained estimation: we need to find the NPMLE \hat{F}^0_n of F subject to the constraint F(t_0) = \theta_0. Inversion of the likelihood ratio tests leads to natural confidence intervals for F(t_0).
Even though the problem of estimating F is non-regular, with associated rate of convergence n^{-1/3} rather than the usual n^{-1/2}, the likelihood ratio statistic \lambda_n has a limiting distribution analogous to the usual \chi_1^2 distribution for regular problems which is free of all nuisance parameters in the problem, and this leads to especially appealing tests and confidence intervals for F(t_0).
In this talk I will describe the estimator \hat{F}_n and its constrained counterpart \hat{F}^0_n, discuss the asymptotic behavior of these estimators and the log-likelihood ratio statistic 2 \log \lambda_n, and briefly describe the inversion of the tests to obtain confidence intervals. Some open problems concerned with generalizations in several directions will also be mentioned.
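For current status data of this kind, the unconstrained NPMLE \hat{F}_n is the isotonic regression of the indicators \Delta ordered by the observation times Y, which the pool-adjacent-violators algorithm computes directly; a small illustrative sketch:

import numpy as np

def pava(y):
    # Pool-adjacent-violators: nondecreasing fit to y (equal weights).
    blocks = [[v, 1] for v in y]                  # [block mean, block size]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:       # violation: pool the two blocks
            total = blocks[i][0] * blocks[i][1] + blocks[i + 1][0] * blocks[i + 1][1]
            n = blocks[i][1] + blocks[i + 1][1]
            blocks[i] = [total / n, n]
            del blocks[i + 1]
            i = max(i - 1, 0)
        else:
            i += 1
    return np.concatenate([[b[0]] * b[1] for b in blocks])

rng = np.random.default_rng(0)
x = rng.exponential(1.0, 300)                     # unobserved survival times X
y_obs = rng.exponential(1.0, 300)                 # observation times Y
delta = (x <= y_obs).astype(float)                # current status indicator

order = np.argsort(y_obs)
F_hat = pava(delta[order])                        # NPMLE of F at the ordered observation times
print(np.column_stack([np.sort(y_obs), F_hat])[:5])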
060817143416
The role of statisticians in the analysis of data from high dimensional genetic, genomic and proteomic technologies.
Mik Black
Department of Biochemistry
Date: Thursday 17 August 2006
The past decade has seen major advances in the technologies that biologists use to examine the intricacies of life at the molecular level. Although many differences exist, a feature shared by these technologies is the increasingly large amount of data they generate. Statisticians, therefore, are able to play an important role in the design, analysis and interpretation phases of experiments based on these approaches, and as a result have helped shape the rapidly evolving field of Bioinformatics. This talk will examine the role statisticians can play in studies which utilize these technologies, with a focus on statistical techniques that can be applied to the analysis of data from gene expression microarray experiments.
060809135229
Minimum Variance Importance Sampling Via Population Monte Carlo
Christian P. Robert
Université Paris Dauphine and Erskine Visiting Professor
Date: Friday 21 July 2006
In the design of efficient simulation algorithms, one is often beset with a poor choice of proposal distributions. Although the performance of a given kernel can clarify how adequate it is for the problem at hand, a permanent on-line modification of kernels raises concerns about the validity of the resulting algorithm. While the issue is quite complex and most often intractable for MCMC algorithms, the equivalent version for importance sampling algorithms can be validated quite precisely. We derive sufficient convergence conditions for a wide class of population Monte Carlo algorithms and show that Rao-Blackwellized versions asymptotically achieve an optimum in terms of a Kullback divergence criterion, while more rudimentary versions simply do not benefit from repeated updating. In particular, since variance reduction has always been a central issue in Monte Carlo experiments, we show that population Monte Carlo can be used to this effect, in that a mixture of importance functions, called a D-kernel, can be iteratively optimised to achieve the minimum asymptotic variance for a function of interest among all possible mixtures. The implementation of this iterative scheme is illustrated for the computation of the price of a European option in the Cox-Ingersoll-Ross model.
Novel implementations of the population Monte Carlo method centered around clustering and tempering will also be presented.
[This is joint work with R. Douc, A. Guillin, J.M. Marin & A. Mira]
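A minimal sketch of the D-kernel idea on a toy one-dimensional target (my own simplification, not the paper's implementation): importance weights use the full mixture density, and the mixture weights are updated from the Rao-Blackwellised component responsibilities.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def target(x):                                    # toy target: mixture of two normals
    return 0.3 * norm.pdf(x, -2, 0.5) + 0.7 * norm.pdf(x, 3, 1.0)

scales = np.array([0.5, 2.0, 5.0])                # three fixed proposal kernels N(0, s^2)
alpha = np.ones(3) / 3                            # mixture weights to be adapted
N = 5000

for it in range(10):
    d = rng.choice(3, size=N, p=alpha)            # pick a kernel for each particle
    x = rng.normal(0.0, scales[d])
    q_comp = norm.pdf(x[:, None], 0.0, scales)    # each component's density, shape (N, 3)
    q_mix = q_comp @ alpha                        # full mixture proposal density
    w = target(x) / q_mix                         # importance weights
    w /= w.sum()
    rho = (q_comp * alpha) / q_mix[:, None]       # Rao-Blackwellised responsibilities
    alpha = w @ rho                               # updated mixture weights (sum to 1)

print("adapted kernel weights:", alpha.round(3))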
060712121631
Clustering using Product Partition Models
Vicki Livingstone
Department of Preventive and Social Medicine
Date: Thursday 8 June 2006
Product partition models (PPMs) allow us to partition a set of objects into k sets. PPMs are a special case of Bayesian Partition models. They use partially exchangeable priors where given a partition of the objects into k sets, the objects in the same set are exchangeable and the objects belonging to distinct sets are independent. PPMs specify prior probabilities for a random partition and update these into posterior distributions of the same form. They provide a convenient way of allowing the data to weight the partitions likely to hold. Posterior estimates of the parameter of interest are obtained by conditioning on the partition and summing over all generated partitions. Markov chain Monte Carlo (MCMC) techniques are used to generate partitions of the data. PPMs can be applied to many diverse estimation problems and in this talk I outline some areas where they are useful.
060602102333
Preliminary presentations of honours statistics projects:
The students and topics are:
Department of Mathematics & Statistics
Date: Friday 26 May 2006
Hee Mong Wong (Levin):
Heavy Tails in Financial Data
Zijia Jiang (Carrie):
Reliability of Reporting of the Number of Sexual Partners
Samuel Brilleman:
Ultimate Running Performance of Man
060523135852
Steller sea lions: Population dynamics with a stellar budget
Peter Dillingham
Department of Mathematics & Statistics
Date: Thursday 18 May 2006
Between the late 1970s and the dawn of the new millennium, the population of Steller sea lions (Eumetopias jubatus) declined by approximately 75%. Into the fray between environmentalists, the fishing industry, and the U.S. National Marine Fisheries Service entered Senator Ted Stevens and US$120 million (NZ$180 million), allocated to scientists to determine the cause of the Steller sea lion’s decline.
One of the primary controversies surrounds the role of walleye pollock (Theragra chalcogramma) in the decline. One theory suggests over-fishing of pollock is to blame for the decline. A competing theory suggests that pollock are “junk food” and that the decline is a result of an oceanic regime shift in the 1970s that led to pollock, Pacific cod (Gadus macrocephalus), and flatfishes dominating the ecosystem. Methods and results from an observational analysis addressing this question will be presented, as will an overview of some of the other hypotheses surrounding the sea lion decline. A discussion of what data were available, and what were not, can be used to gain insight into better ways to spend $120 million.
060509132506
The concept and measurement of glycaemic index for food
Sheila Williams
Department of Preventive and Social Medicine
Date: Thursday 11 May 2006
The glycaemic index (GI) is used to rank carbohydrates according to their effect on blood glucose. This seminar will describe the problems, statistical and other, associated with the estimation of GI.
060505100622
Stochastic Models in Climate and Hydrology
Dr Anna Panorska
Department of Mathematics and Statistics, University of Nevada at Reno
Date: Tuesday 9 May 2006
Stochastic models provide a natural framework to describe and quantify randomness in Nature. They enable us to compute the likelihood of potentially disastrous phenomena such as a drought longer than all observed droughts, a flood with peak discharge exceeding previously measured discharges, or an El Niño episode with magnitude greater than all past ones. Stochastic models also provide rigorous decision theory useful for deciding, for example, whether two drought episodes are significantly different from one another. Such information holds great value for water resources managers, risk assessment, civil engineering projects, and the insurance industry.
We present a new stochastic approach to modeling duration, magnitude and maximum of hydroclimatic events such as drought or flood. Our approach rests on the theory of random sums and their limits. We focus on a bivariate model for duration and magnitude of hydroclimatic events and decision theory for magnitudes of different events. We also discuss the many fascinating statistical problems that stem from the hydroclimatic research questions. All stochastic theory is presented in the context of the “Dust Bowl” drought of the 1930s in the USA.
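As a minimal sketch, using my own simplified definitions rather than the speaker's model, duration and magnitude pairs can be read off a series by treating each maximal run below a threshold as one event, with duration equal to the run length and magnitude equal to the accumulated deficit:

import numpy as np

def events(series, threshold):
    # Return (duration, magnitude) for each maximal run below the threshold.
    out, dur, mag = [], 0, 0.0
    for v in series:
        if v < threshold:
            dur += 1
            mag += threshold - v                  # accumulated deficit
        elif dur > 0:
            out.append((dur, mag))
            dur, mag = 0, 0.0
    if dur > 0:
        out.append((dur, mag))
    return out

rng = np.random.default_rng(0)
flow = rng.gamma(shape=2.0, scale=1.0, size=1000)  # toy flow series
print(events(flow, threshold=1.0)[:5])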
060502145953
Random effects modelling in the analysis of hospital performance
Dr David Ohlssen
MRC Biostatistics Unit, Cambridge, UK
Date: Monday 1 May 2006
Over the past 15 years there has been a large increase in the use of routine data to examine the performance of health services providers in the UK. Examples include the Bristol Royal Infirmary inquiry and the performance indicators produced each year by the UK Healthcare Commission. Various methods have been used to identify unusual performance including simple standardization methods, comparisons using ranking and comparisons using model diagnostics. The talk will provide background motivation based on a number of high profile examples and also review the statistical literature in this area. The methodological developments are divided into two separate parts:
1) A hierarchical modelling framework to identify unusual performance using hypothesis testing.
2) A completely flexible random effects model to estimate hospital effects using ideas from Bayesian nonparametrics.
The methods will be exemplified by a comparison of heart surgeons and a large NHS data set examining hospital survival rates. Throughout I will emphasise the use of freely available software and alternative applications for the methodology.
060427135515
The Impact of Paid Employment on Academic Achievement
Phil Morrison
Otago Polytechnic
Date: Thursday 27 April 2006
This pre-NCEA study investigated the impact of student employment on academic achievement within the senior high school (N=223). External national examination results for all year 11, year 12, and year 13 students were analyzed against a survey of the students' working status. The incidence of student employment during the school term (82-88%) far exceeded official statistics (25-30%). While total hours worked during the school term were negatively related to the corresponding subject examination outcomes, the effect appeared more detrimental for year 12 and year 13 students, and in particular in Mathematics.
060303104133
Item Response Theory (IRT): Main models and applications in Education and other areas
Dalton F. Andrade
Department of Informatics and Statistics, Federal University of Santa Catarina, Brazil
Date: Thursday 20 April 2006
In this talk I shall present the main models in item response theory, with applications to large-scale educational assessment and other areas. I shall discuss in detail the Brazilian National Assessment of Basic Education, including the equating of the results of students in the 4th, 8th and 11th grades. Briefly, I shall also talk about some other statistical methods, such as sampling, planning of experiments and hierarchical modelling, that are also used in this type of assessment.
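For orientation, the three-parameter logistic model, one of the main IRT models, gives the probability of a correct response as a function of ability theta, item discrimination a, difficulty b and guessing parameter c; a minimal sketch:

import numpy as np

def p_correct(theta, a, b, c):
    # Three-parameter logistic (3PL) item response function.
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
print(p_correct(theta, a=1.2, b=0.5, c=0.2).round(3))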
060303103914
Building houses on straw polls
Warren Palmer
Department of Mathematics & Statistics
Date: Thursday 13 April 2006
In this seminar we consider how New Zealand journalists report political polls. Two recent newspaper articles are featured. Perhaps not surprisingly we have detected a tendency for journalists to focus on sample size, to misunderstand the concept of margins of error, and to have little idea as to whether a result is generalisable. We also consider the importance of non-respondents. We wonder if journalists question the validity of survey results they have been given. We ask the question:
Could a non-random convenience survey have as much validity as a more formal survey conducted by a specialist research company?
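For concreteness, the margin of error usually quoted with a poll is only the sampling half-width for a simple random sample, roughly 1.96*sqrt(p(1-p)/n); a quick computation (my own illustration):

import math

def margin_of_error(p, n, z=1.96):
    # Approximate 95% margin of error for a simple random sample proportion.
    return z * math.sqrt(p * (1 - p) / n)

# A typical poll of 1000 respondents with reported support of 45%:
# about three percentage points either way.
print(round(100 * margin_of_error(0.45, 1000), 1), "percentage points")
# Note: this quantifies sampling error only; it says nothing about
# non-response or a non-random convenience sample.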
060403105601
From scarfie to Her Majesty’s Chief Statistician
Len Cook
Former National Statistician and Registrar General for England and Wales
Date: Friday 7 April 2006
Population censuses provide a unique case study in the classification and enumeration of populations. Censuses such as the one we have just had in New Zealand have become less able to meet our needs to measure the population as the very nature of our population and its structure has become more complex. What makes censuses more difficult generally affects all sources of information about the population, yet the needs of public policy continue to generate high expectations of census takers. The talk will be about how the experiences in the last British census of 2001 put a strong spotlight on these problems, and how we can work through them. The experiences will be compared with how these issues have been handled in New Zealand.
060330102200
An Introduction and Overview of Mixed Models
Melanie Bell
Department of Preventive and Social Medicine
Date: Thursday 30 March 2006
An assumption for many classic statistical methods is that the data are independent. However, in applied sciences data are often correlated. This correlation may be due to repeated measures, longitudinal data, clustering (for example, in multicenter trials or surveys) or spatial proximity. Mixed models are a powerful and flexible approach to handling correlated data, and their use is becoming more and more widespread, particularly in the health sciences. For example, one can use mixed models for crossover trials, multicenter trials, meta analysis, missing data problems, and growth curves. This talk will give an introduction and overview of some of the applications of mixed models, particularly in the field of biostatistics.
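As a small illustrative sketch (simulated data, using the statsmodels interface as one possible tool; not code from the talk), a random-intercept mixed model for multicentre data looks like this:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated multicentre data: 20 centres, 30 patients each, centre-level intercepts.
n_centres, n_per = 20, 30
centre = np.repeat(np.arange(n_centres), n_per)
u = rng.normal(0, 1.0, n_centres)                  # random centre effects
x = rng.normal(size=n_centres * n_per)
y = 2.0 + 0.5 * x + u[centre] + rng.normal(0, 1.0, size=n_centres * n_per)

df = pd.DataFrame({"y": y, "x": x, "centre": centre})
fit = smf.mixedlm("y ~ x", df, groups=df["centre"]).fit()
print(fit.params[["Intercept", "x"]])
print("estimated centre variance:", float(fit.cov_re.iloc[0, 0]))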
060323151208
Improving Middle-Level Mathematics Teaching and Learning: A Wyoming Statewide Initiative
Linda Hutchison
University of Wyoming
Date: Wednesday 29 March 2006
An NSF-funded project designed to develop, test, and implement a coordinated set of university courses specifically to prepare out-of-field teachers for teaching mathematics at the middle-school level (NZ Years 6-9) will be presented. Mathematicians, mathematics educators, statisticians and practicing teachers collaborated to complete project objectives. Results of the project completed in 2005 will be discussed.
060309100819
Drugs in Sport - Cheating and the Cheats
David Gerrard
Dunedin School of Medicine
Date: Thursday 23 March 2006
Sport is an international phenomenon attracting well-paid, high-performing athletes. We are buoyed by the success of our national teams at major events like the Commonwealth Games. But unfortunately sport has its "dark side" - a perspective best illustrated by the intrusion of performance-enhancing drugs and various methods employed to avoid their detection.
This seminar will highlight something of the history of drug misuse in sport, describe the current international opinion on standards and sanctions and look at the work of the New Zealand Sports Drug Agency. It will serve to remind us that the vast majority of our athletes are drug-free.
060320111133
A Unified Capture-Recapture Model
Matthew Schofield
Department of Mathematics & Statistics
Date: Thursday 16 March 2006
Capture-recapture models provide information about demographic parameters. Despite capture-recapture methods having been around for 40 years, there are many different models used for the birth process. Most models use parameters chosen for computational convenience. These parameters depend on attributes of both the study population and study design. An analysis of the same population with a different design will lead to different parameters. Some models use a per-capita birth rate index, a more natural parameter that depends on the expected population size at each period. Although this is a step in the right direction, a demographer would prefer a parameter that depends on the population size, not its expected value. This is not possible in the models above, because population size is a derived quantity and does not appear explicitly in the model. A related issue is density dependence. Method-of-moments estimators are available, but because population size is not explicitly in the model, density dependence is difficult to model in capture-recapture studies. Using a missing data framework, the times of birth and death for each individual in the population are included explicitly in the model. Summaries of these variables include population size and the number of births and deaths each period. These variables may be used to give per-capita growth rates, or used to model relationships between parameters, such as density dependence, where survival depends on population size at the beginning of the period.
060309101247
Incorporating genotype uncertainty into mark-recapture-type models for estimating abundance using DNA samples
Janine Wright and Richard Barker
Department of Mathematics & Statistics
Date: Thursday 9 March 2006
The use of genetic tags (DNA derived from material collected non-invasively, such as hair or faeces) to identify individual animals is increasingly common in wildlife studies. The method has huge potential, but while it is possible to generate significant amounts of data from these non-invasive sources of DNA, the biggest challenge in the application of the approach is overcoming errors inherent in samples collected.
Genotyping errors arise when the poor quality or insufficient quantity of DNA leads to heterozygotes being scored as homozygotes (termed 'allelic drop-out'). These error rates will be specific to a species, and will depend on the source of samples. If errors go undetected and the genotypes are naively used in mark-recapture models, significant overestimates of population size can occur. Using data from brush-tailed possums in New Zealand, we describe a method based on Bayesian imputation that allows us to model data from samples that include uncertain genotypes.
060302135635