Fisheries marine mammal and bird bycatch in Alaska and Hawaii: observer sampling and the analysis of data
Bryan Manly
Western EcoSystems Technology, Inc., Cheyenne, Wyoming
Date: Thursday 4 August 2011
The talk will cover some or all of the following topics, depending on the time available: the Alaska Marine Mammal Observer Program (AMMOP) and the surveys carried out in Kodiak in 2002 and 2005 and in Yakutat in 2007 and 2008; the analyses I proposed in 2009 for the Hawaiian deep-set longline fishery to estimate take rates and numbers for marine mammals, based on data from observers on about 20% of the fishing vessels, where a "take" is an interaction between a marine mammal and a fishing vessel; and the planning process for the sampling of the Southeast Alaska driftnet fishery starting in 2012.
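The abstract does not specify the estimators used; as a rough orientation to the kind of expansion estimate involved in observer-based bycatch programmes, the sketch below uses a simple ratio estimator with entirely hypothetical numbers. It is not the AMMOP methodology.

```r
# Hypothetical illustration: ratio-estimator expansion of observed bycatch.
# Not the estimator used in the talk; all numbers are invented.
observed_sets  <- 420        # fishing sets carried by observers (~20% coverage)
observed_takes <- 7          # marine mammal takes recorded by observers
total_sets     <- 2100       # total sets in the fishery

take_rate <- observed_takes / observed_sets        # takes per observed set
estimated_total_takes <- take_rate * total_sets    # expanded to the whole fishery

# A crude standard error treating observed takes as Poisson (assumption for illustration)
se_total <- sqrt(observed_takes) / observed_sets * total_sets
c(estimate = estimated_total_takes, se = se_total)
```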
Continuous space-time modelling in epidemiology
Tilman Davies
Massey University
Date: Thursday 28 July 2011
It is natural to expect the incidence of disease in a given population to vary over the space-time continuum. Much of the spatio-temporal modelling carried out to date has lacked the flexibility required to cope adequately with the complex correlation structures inherent in these kinds of data. Recently, a highly promising class of semiparametric models was introduced into the literature to address this issue. Although these models can be constructed from nonparametric, fully parametric and stochastic components, a number of problems, both theoretical and practical, remain in their implementation. A brief coverage of these models is given, and current developments, including the first R-language implementation, are discussed.
Reconstruction of a Demographic Expansion from Multiple Sources of Evidence
Dr Steven Miller
Computing & Mathematical Sciences, University of Waikato
Date: Thursday 14 July 2011
The Neolithic transition through Europe was a key event in the development of human technology. The Neolithic was the last subdivision of the Stone Age, and heralded the arrival of agriculture.
Echoes of the expansion into Europe might still be found in data from the diverse fields of archaeology, genetics and linguistics. However, conclusions concerning the time and the direction of the expansion vary, particularly when considering data from disparate sources.
Our intention is to use methods from the area of indirect inference to reconstruct the Neolithic transition. By combining information from the three sources of data, along with a population diffusion simulation model, we aim to quantify the uncertainty about demographic parameters of interest.
This is still a work in progress, but we illustrate our proposed approach with a selection of trivial toy examples.
Multimodel inference in ecology and evolution: Navigating AIC, GLMM and inbreeding
Dr Catherine Grueber
Department of Zoology, University of Otago
Date: Thursday 16 June 2011
Multimodel inference methods based on information theory and model averaging (IT-MA) are increasing in popularity, but these approaches can be difficult to apply to realistic, complex models that typify many ecological and evolutionary analyses. We aimed to apply IT-MA methods to the analysis of inbreeding in the highly endangered takahe, where inbreeding was expected to have only a weak effect on the fitness response relative to other measured predictors (such as population density and inter-site effects). We used generalised linear mixed-effects modelling to account for repeated observations from annually breeding pairs of birds. During our analysis, we encountered a number of practical obstacles to averaging such complex models where parameters of interest are weak and random effects are included. In this talk I will describe our analysis process, and identify some of the challenges we met along the way. In doing so, I will present solutions where they are available, and identify areas where future research would be helpful, particularly to those researchers without a formal background in information theory. We hope that an overview of these issues will help increase the accessibility of IT-MA methods for investigating systems in which multiple variables impact an evolutionary or ecological response.
Modelling the data we wish we had: a non-Bayesian approach
Dr Darryl MacKenzie
Proteus Wildlife Research Consultants
Date: Thursday 2 June 2011
In this talk I shall discuss my (very) recent exploration of the expectation-maximisation (EM) algorithm as a means to fit occupancy models (models for species presence-absence data with false absences). Such an approach requires us to consider the data that we wish we had available for modelling, by defining the complete-data likelihood. Use of the EM algorithm offers a number of advantages over both typical likelihood-based and Bayesian hierarchical frameworks. A number of examples will be presented to highlight the usefulness of this alternative approach. While some of the jargon sounds scary, it's actually quite intuitive and (relatively) easy to follow. There's bound to be plenty of hand-waving as I discuss other areas where the EM algorithm may prove to be useful.
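As a toy illustration of the complete-data idea (my own sketch, not the speaker's code), the EM iteration below fits the simplest occupancy model, constant occupancy psi and constant detection p, to detection counts from J repeat visits per site; the "data we wish we had" are the latent occupancy indicators for the all-zero sites.

```r
# Minimal EM for a constant-psi, constant-p occupancy model (illustrative sketch).
# y: number of detections at each site out of J visits; sites with y > 0 are
# certainly occupied, sites with y = 0 may be occupied-but-missed or unoccupied.
em_occupancy <- function(y, J, psi = 0.5, p = 0.5, n_iter = 200) {
  for (iter in seq_len(n_iter)) {
    # E-step: expected occupancy indicator for all-zero sites
    z <- ifelse(y > 0, 1, psi * (1 - p)^J / (psi * (1 - p)^J + (1 - psi)))
    # M-step: update psi and p from the "completed" data
    psi <- mean(z)
    p   <- sum(z * y) / (J * sum(z))
  }
  c(psi = psi, p = p)
}

# Simulated example
set.seed(1)
n <- 200; J <- 5; psi_true <- 0.6; p_true <- 0.3
z_true <- rbinom(n, 1, psi_true)
y <- rbinom(n, J, z_true * p_true)
em_occupancy(y, J)
```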
4th year Project Presentations
400-level Maths students
Department of Mathematics and Statistics
Date: Friday 27 May 2011
Eman Alhassan
Dedekind Domains
Boris Daszuta
Spectral Methods, Wave Equations and the 2-Sphere
Richard McNamara
Parseval Frames
Sam Primrose
Leavitt Path Algebras
Where did you get that rat? Using genetics to study the origins and swimming patterns of invasive pests
Associate Professor Rachel Fewster
Statistics Department, University of Auckland
Date: Thursday 26 May 2011
Every week, islands around New Zealand are subject to a barrage of invasions by four-legged creatures with sharp teeth and big appetites. These invaders are mammal pests, including rats, stoats, and mice, and they have plenty of tricks up their furry sleeves for reintroducing themselves to conservation sanctuaries. They are excellent and eager swimmers, hitch rides on boats, abound in resourcefulness, and can cost tens of thousands of dollars - each - to track down and remove when discovered on a sanctuary island.
Understanding where mammal invaders are coming from is pivotal to the long-term protection of sanctuaries. I will describe the statistics behind genetic assignment methods to estimate the origin of individuals, emphasising both the usefulness and the limitations of the techniques. To make best use of these tools, we need to coordinate efforts on a national scale. I will give a demonstration of some map-linked database software I am developing with programmer Sunil Patel, to coordinate data management from the initial trapping stages to the final data analysis.
The talk is intended to be accessible to biologists as well as of interest to statisticians and mathematicians.
Bootstrapped model-averaged confidence intervals
Jimmy Zeng
Department of Mathematics and Statistics
Date: Thursday 19 May 2011
Model averaging is commonly used to make allowance for model uncertainty in parameter estimation. In the frequentist setting, a model-averaged estimate of a parameter is a weighted mean of the estimates from the individual models, with the weights based on an information criterion or on bootstrapping. In this talk, I will review current methods for calculating model-averaged confidence intervals and propose a new bootstrap-based method that appears to provide better coverage rates.
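For readers unfamiliar with the mechanics, here is a small sketch (my own illustration, not the method proposed in the talk) of an AIC-weighted model-averaged estimate of a regression coefficient; a bootstrap-weight variant would replace the Akaike weights with the proportion of resamples in which each model is selected.

```r
# AIC-weighted model averaging of a coefficient across two candidate models.
set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)

m1 <- lm(y ~ x1)          # candidate model 1
m2 <- lm(y ~ x1 + x2)     # candidate model 2

aic <- c(AIC(m1), AIC(m2))
w   <- exp(-0.5 * (aic - min(aic)))
w   <- w / sum(w)                     # Akaike weights

beta1 <- c(coef(m1)["x1"], coef(m2)["x1"])
beta_avg <- sum(w * beta1)            # model-averaged estimate of the x1 effect
c(weights = w, estimate = beta_avg)
```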
Kernel density estimates to diagnose severe sepsis in critical care patients
Jacquelyn Parente
University of Canterbury
Date: Wednesday 18 May 2011
Severe sepsis is a serious, common, costly, and often deadly medical condition characterized by organ failure and whole-body inflammation due to infection. As current diagnostic methods do not meet existing early management guidelines, there remains a need for an accurate and timely severe sepsis diagnostic. Kernel density estimates were used to develop joint probability density profiles of hourly bedside clinical data for severe sepsis and non-severe sepsis patients, and to classify new observations. This method provides an effective real-time diagnostic for severe sepsis with high accuracy. Currently, a 2-D kernel density estimation method is being developed to construct a stochastic model describing the hourly transitions between patient sepsis states. A prospective validation trial has been completed at Christchurch Hospital.
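A stripped-down sketch of the general idea (illustrative only: simulated data and a single clinical variable, rather than the multivariate hourly profiles used in the study): estimate a kernel density for each class and classify a new value by the larger prior-weighted density.

```r
# Kernel-density classifier for two classes (sketch with simulated 1-D data).
set.seed(1)
sepsis     <- rnorm(300, mean = 100, sd = 15)   # e.g. heart rate, severe sepsis
non_sepsis <- rnorm(700, mean = 80,  sd = 12)   # e.g. heart rate, non-severe sepsis

d_sep <- density(sepsis)
d_non <- density(non_sepsis)

classify <- function(x, prior_sep = 0.3) {
  f_sep <- approx(d_sep$x, d_sep$y, x, rule = 2)$y   # interpolate the KDE at x
  f_non <- approx(d_non$x, d_non$y, x, rule = 2)$y
  ifelse(prior_sep * f_sep > (1 - prior_sep) * f_non, "severe sepsis", "not severe")
}

classify(c(70, 95, 120))
```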
Analysis of count data: implications for meta-analysis
Professor Peter Herbison
Preventive and Social Medicine, Dunedin School of Medicine, University of Otago
Date: Thursday 12 May 2011
In randomised trials with count outcomes (such as the number of falls or exacerbations of asthma) it is possible to analyse the results in many different ways. Possible approaches include rate ratios, Poisson regression, negative binomial regression, dichotomising the data (with and without the event), different forms of survival analysis, comparison of means and comparison of medians. It is not known whether it is possible to combine the results of these different methods of analysis in a meta-analysis. This talk will present the results of a simulation study with count data with a range of means and differing amounts of overdispersion, to see if the different methods give the same answers.
A new method for estimating overdispersion in count data
David Fletcher
Department of Mathematics and Statistics
Date: Thursday 5 May 2011
When analysing count data, it is natural to initially consider use of a Poisson model. In practice, the data are often overdispersed relative to the Poisson distribution, with the variance being greater than the mean. This phenomenon of overdispersion is well known. Failure to allow for it can lead to underestimation of standard errors and misleading model comparisons. One approach to dealing with overdispersion is to avoid assuming a distribution for the response variable, and to assume only that the variance is proportional to the mean, the constant of proportionality being the dispersion parameter. An estimate of this parameter is used to calculate appropriate standard errors and make more reliable model comparisons. A well-established method for estimating the dispersion parameter is to divide Pearson's lack-of-fit statistic by the residual degrees of freedom. We propose an alternative estimator, based on quadratic estimating equations, that appears to be more reliable, especially when the mean is small.
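To make the established estimator concrete (the new quadratic-estimating-equation estimator is not reproduced here), the sketch below computes the Pearson-based dispersion estimate, Pearson's X² divided by the residual degrees of freedom, for a Poisson fit to simulated overdispersed counts.

```r
# Pearson estimate of the dispersion parameter for overdispersed count data.
set.seed(1)
n  <- 100
x  <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)
y  <- rnbinom(n, mu = mu, size = 2)   # negative binomial counts => overdispersed

fit <- glm(y ~ x, family = poisson)

pearson_X2 <- sum(residuals(fit, type = "pearson")^2)
phi_hat    <- pearson_X2 / df.residual(fit)   # values > 1 indicate overdispersion
phi_hat

# The same quantity is reported as the dispersion of a quasipoisson fit:
summary(glm(y ~ x, family = quasipoisson))$dispersion
```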
Likelihood-based estimation of population density and the case of the horny toad
Dr Murray Efford
Department of Zoology, University of Otago
Date: Thursday 21 April 2011
Searches of an area for animals or their sign yield data from which population density can be estimated. Estimates will be biased, perhaps strongly so, if allowance is not made for movement across the boundary and for incomplete detection. A Bayesian method using data augmentation has been proposed for area-search data and applied to a dataset on the horny toad (actually a desert lizard) in Arizona. I develop a likelihood-based alternative. Comparison of maximum likelihood and Bayesian estimates, both for this dataset and for simulated data, revealed errors in the Bayesian implementation. Performance of the MLE was generally superior to that of the corrected Bayesian method with respect to bias, interval coverage and speed. The likelihood-based method has been implemented for a very wide range of scenarios, including multiple irregular polygons, competing polygons, and walked transects.
The R Project: A brief history and thoughts about the future
Associate Professor Ross Ihaka
University of Auckland and the R Foundation
Date: Wednesday 20 April 2011
The R project began as a small academic research project and has evolved into a grass roots movement which has produced one of the most fully featured pieces of statistical software available. In this talk I'll examine how the project evolved and discuss the lessons which can be learned from it. I'll also discuss a new project which is underway and which hopes to follow a similar path.
Fuzzy ecological communities: clustering and pattern detection using mixtures
Professor Shirley Pledger
School of Mathematics, Statistics, and Operations Research, Victoria University of Wellington
Date: Thursday 14 April 2011
In studies of ecological communities, patterns of occurrence or abundance of different species over different samples are traditionally found by multivariate methods such as multidimensional scaling, cluster analysis and correspondence analysis. These give graphical and descriptive results, but usually not statistical conclusions. By introducing statistical mixture models, we may switch to fuzzy clustering, in which species and/or samples are allocated to groups probabilistically. This provides statistical inference in addition to graphical pattern analyses.
Modeling a marine host-parasite system: sea lice and salmon population dynamics
Dr Martin Krkosek
Department of Zoology, University of Otago
Date: Wednesday 13 April 2011
Coastal marine ecosystems have experienced major changes in fish abundance due to fisheries and aquaculture. Pathogen transmission between wild and farmed fish populations has affected aquaculture and fishery productivity as well as biodiversity conservation. Focusing on salmon and parasitic copepods, I will describe the modeling framework we have developed to understand the dynamics of parasite outbreaks in wild and farmed fish populations. This quantitative framework for salmon-lice population dynamics combines models and data from fisheries, ecology, and epidemiology. Application of the framework in Canada yields quantitative insights into fundamental ecology as well as successful and not-so-successful coastal management.
Robust Climate Reconstruction
Peter Green
Department of Mathematics and Statistics
Date: Thursday 7 April 2011
High resolution palaeoclimate reconstruction is an attempt to estimate past temperatures from natural proxies by calibrating them against the recent instrumental record.
The proxies used in these reconstructions tend to be very noisy, with the desired climate signal variously masked by process outliers (atypical local climate events), measurement outliers (idiosyncratic non-climatic local phenomena), and divergences (unknown factors that result in persistent biases in the proxy record).
The nature and degree of noise in the proxy records make them difficult to calibrate against an unfortunately short instrumental record.
Robust statistics allows us to calibrate despite the presence of outliers and, equally importantly, to identify and isolate outliers so that they can be modelled separately.
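As a generic illustration (not the specific reconstruction method of the talk), the sketch below compares ordinary least squares with a robust M-estimator on a proxy-temperature calibration contaminated by a few gross outliers; the robust weights also flag which observations were down-weighted.

```r
# Robust versus ordinary least-squares calibration with contaminated observations.
library(MASS)                       # for rlm()
set.seed(1)
n <- 60
temp  <- rnorm(n)                   # instrumental temperature (standardised)
proxy <- 0.8 * temp + rnorm(n, sd = 0.3)
proxy[1:5] <- proxy[1:5] + 3        # five gross outliers (e.g. non-climatic events)

fit_ols <- lm(proxy ~ temp)
fit_rob <- rlm(proxy ~ temp)        # Huber M-estimation

coef(fit_ols); coef(fit_rob)        # robust slope stays close to 0.8
which(fit_rob$w < 0.5)              # observations heavily down-weighted
```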
Correlating diverse data sets: How season, gender, diet and location impact on the sensory quality of sea urchin roe and other stories
Professor Phil Bremer
Department of Food Science, University of Otago
Date: Thursday 31 March 2011
In Food Science we are interested in determining the critical factors influencing the quality of food and beverages. This approach involves correlating diverse factors such as the variables involved in growing, harvesting / capturing, processing and storing a product, with how consumers perceive its sensory (appearance, aroma, taste, texture) characteristics.
An example of this approach is a recent research project on kina roe (sea urchin, Evechinus chloroticus). Using a multidisciplinary team of food scientists, sensory scientists, biochemists, marine biologists, and kina fishers and processors, we studied the wild-caught fishery and ran a series of laboratory- and field-based feeding trials. Our research examined the relationship between season, urchin gender, diet and roe yield, composition and sensory properties. Sensory properties were estimated using descriptive sensory analysis with a trained panel. Biochemical composition and physical properties were assessed using standard analytical techniques, and the volatile profile of the roe was measured using proton transfer reaction mass spectrometry. Differences, and the identification of attributes that significantly discriminated between variables, were examined using General Linear Model (GLM) univariate analysis of variance (ANOVA, SPSS Inc.) and principal components analysis (PCA, Unscrambler CAMO).
This multidisciplinary approach enabled us to identify a previously unreported relationship between kina flavour and gender. Female roe was found to be seasonally very bitter compared to male gonads. Further, while seasonal and gender effects were generally more significant than diet effects, dietary protein levels enhanced roe yield but resulted in an increased bitterness of the roe. Kina harvested from Southern NZ were found to differ in sensory properties and volatile profile from kina harvested from Northern NZ waters.
A current project in Food Science involves determining the impact of viticulture and oenology practices on the characteristics of wine varieties. I am interested in discussing possible collaborations and new ways to analyse the data sets obtained.
Inverse problems: where mathematics, computation and statistics meet
Associate Professor John Bardsley
Department of Mathematical Sciences, The University of Montana
Date: Wednesday 30 March 2011
The mathematical field of Inverse Problems is well-developed, with several scholarly journals dedicated to its study. The primary focus of scholarship within the field has been at its interface with analysis, numerical analysis, and scientific computing. However, in applications, inverse problems involve noisy data, suggesting that statistical techniques are also worth studying. Within the past decade, or so, a number of researchers have conducted pioneering work at the interface between inverse problems and statistics (e.g., University of Otago's Colin Fox), and it is now well-known that Bayesian statistical methods are particularly well-suited for use on inverse problems. In this talk, I will provide a general introduction to inverse problems, including issues that arise in their solution, and then discuss their connection with Bayesian statistics. Finally, I will present a MCMC method for computing samples from the posterior distribution of the unknown parameters in a linear inverse problem, which will allow us to compute estimates, as well as to quantify uncertainty in those estimates. Numerical examples will include image deblurring and computed tomography.
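A very small sketch of the Bayesian view of a linear inverse problem (my illustration, not the MCMC method presented in the talk): with a Gaussian likelihood y = Ax + e and a Gaussian prior on x, the posterior is itself Gaussian, so it can be sampled directly and the samples used for estimates and uncertainty quantification.

```r
# Gaussian posterior for a linear inverse problem y = A x + noise (sketch).
set.seed(1)
n <- 50; m <- 40
A <- matrix(rnorm(n * m), n, m)          # toy forward operator
x_true <- sin(seq(0, 2 * pi, length.out = m))
sigma  <- 0.1
y <- as.vector(A %*% x_true) + rnorm(n, sd = sigma)

lambda <- 10                                     # prior precision (regularisation)
Q <- crossprod(A) / sigma^2 + lambda * diag(m)   # posterior precision matrix
b <- crossprod(A, y) / sigma^2
x_mean <- as.vector(solve(Q, b))                 # posterior mean (MAP estimate)

# Draw posterior samples: with Q = R'R (Cholesky), x = mean + R^{-1} z, z ~ N(0, I)
R <- chol(Q)
samples <- replicate(1000, x_mean + backsolve(R, rnorm(m)))
x_sd <- apply(samples, 1, sd)                    # pointwise posterior uncertainty
```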
Identifying cliques of convergent characters: an example from the evolution of cormorants and shags
Professor Hamish Spencer
Department of Zoology, University of Otago
Date: Thursday 17 March 2011
A phylogenetic tree comprising clades with high bootstrap values or other strong measures of statistical support is usually interpreted as providing a good estimate of the true phylogeny. Convergent evolution acting on groups of characters in concert, however, can lead to highly supported but erroneous phylogenies. Identifying such groups of phylogenetically misleading characters is obviously desirable. I will present a procedure, developed in conjunction with Barbara Holland, Martyn Kennedy and Trevor Worthy, that uses an independent data source to identify sets of characters that have undergone concerted convergent evolution, illustrated using the problematic case of the cormorants and shags, for which trees constructed using osteological and molecular characters both have strong statistical support and yet are fundamentally incongruent. I will show that the osteological characters can be separated into those that fit the phylogenetic history implied by the molecular data set and those that do not. Moreover, these latter nonfitting osteological characters are internally consistent and form groups of mutually compatible characters or “cliques,” which are significantly larger than cliques of shuffled characters. I suggest, therefore, that these cliques of characters are the result of similar selective pressures and are a signature of concerted convergence.
Good practice in testing for an association in contingency tables
Prof Dr Markus Neuhäuser
RheinAhrCampus, Remagen, Germany
Date: Thursday 3 February 2011
The testing for an association between two categorical variables using count data is commonplace in statistical practice. Here, we present evidence that influential biostatistical textbooks give contradictory and incomplete advice on good practice in the analysis of such contingency table data. We survey the statistical literature and offer guidance on such analyses. Specifically, we call for greater use of exact testing rather than tests which use an asymptotic chi-squared distribution. That is, we suggest that researchers take a conservative approach and only perform asymptotic testing where there is little doubt that it is appropriate. We recommend a specific criterion for such decision-making. Where asymptotic testing is appropriate, we recommend chi-squared over the G-test and recommend against the implementation of Yates (or any other) correction. We also provide advice on the effective use of exact testing for associations in contingency tables. Lastly, we highlight issues that need to be considered when using the commonly recommended Fisher's exact test. Reference: Ruxton GD & Neuhäuser M (2010): Good practice in testing for an association in contingency tables. Behav Ecol Sociobiol 64, 1505-1513.
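For a concrete R illustration of the choice discussed (my example table, not data from the paper): chisq.test() gives the asymptotic test, here without Yates' correction as the paper recommends, and fisher.test() gives the exact test, which the authors suggest preferring when expected counts are small.

```r
# Asymptotic chi-squared test versus Fisher's exact test on a 2x2 table.
tab <- matrix(c(12, 3,
                 5, 10), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))

chisq.test(tab, correct = FALSE)   # asymptotic test, no Yates correction
fisher.test(tab)                   # exact test, preferable with small expected counts
```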
Semi-parametric models for coral reef dynamics: investigating the evidence for alternative stable states
Dr Matthew Spencer
Department of Environmental Sciences, University of Liverpool
Date: Wednesday 19 January 2011
Coral reefs are sometimes dominated by corals and sometimes (especially in areas affected by human activities) by macroalgae. Many coral reef biologists believe that the coral-dominated and algal-dominated states represent alternative attractors in a dynamical system. However, there is little empirical evidence to support this idea. Instead, this belief is based largely on the behaviour of analytical and simulation models of coral reef ecosystems, which often have two attractors for a single set of parameter values. However, these models have rarely been fitted to empirical data, because most available time series are very short, and we have little knowledge of the underlying mechanisms.
We have many short time series of the proportional cover of corals, macroalgae, and other components at annual intervals on reefs from three regions (Caribbean, Kenya, Great Barrier Reef). We combine information within regions, assuming that each time series is a realization of the same stochastic process. We then use local linear estimation to construct a probability distribution for the state of a reef in a year's time, given its current state, without making mechanistic assumptions. We can then write down an integral equation that updates the distribution of reef states each year. The eigenfunction of this equation associated with the largest eigenvalue tells us about the long-term equilibrium of the system. We show that models for the three regions have very different long-term equilibria, but in each case, the equilibrium has only one mode. This suggests that coral- and algal-dominated coral reefs represent differences in environmental conditions, rather than alternative stable states for the same environmental conditions.
Finally, we construct a stochastic differential equation model which does have two attractors for some parameter values. Using data simulated under this model, we show that our method can detect alternative stable states when they exist.
Phylogenetic diversity and its application to biodiversity conservation
Dr Steffen Klaere
Department of Mathematics and Statistics
Date: Tuesday 18 January 2011
In the early 1990s a group of conservation biologists proposed that the diversity of a geographic region should not be restricted to counting the species present in the region, but should also incorporate the genetic information of those species. This led to the introduction of phylogenetic diversity. Though the measure was well received, its use was limited by the lack of sufficient genetic data and proper software.
In recent years, both limitations have been addressed. With the advent of next generation sequencing, generating massive amounts of data for a geographic region has become feasible. Further, bioinformaticians have provided several packages for computing the phylogenetic diversity for a set of species from a phylogeny.
Here, I will present such a tool which employs linear programming to compute the phylogenetic diversity for a geographic region based on one or more genes for a set of species considered. I will demonstrate the power of the method on a data set for 700 floral species from the Cape of South Africa.
Hyperconvexity and Tight Span Theory for Diversities
Dr David Bryant
Department of Mathematics and Statistics
Date: Thursday 2 December 2010
The tight span, or injective envelope, is an elegant and useful construction that takes a metric space and returns the smallest hyperconvex space into which it can be embedded. The concept has stimulated a large body of theory and has applications in combinatorial optimisation and data visualisation (which was where I first got interested in it). In this talk I'll introduce diversities, which are similar to metrics except that they are defined on all subsets of a set instead of just pairs. A diversity (X,δ) satisfies, for all subsets A,B,C of X:
(D1) δ(A)≥ 0, and δ(A) = 0 if and only if |A| ≤ 1.
(D2) If B is non-empty then δ(A ∪ C) ≤ δ(A ∪ B) + δ(B ∪ C).
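A standard concrete example, not stated in the abstract but perhaps helpful: any metric space (X,d) gives rise to the diameter diversity δ(A) = max{d(a,b) : a,b ∈ A} (with δ(A) = 0 when |A| ≤ 1); axiom (D2) follows by routing any pair of points of A ∪ C through an arbitrary point of the non-empty set B and applying the triangle inequality.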
I will show how the rich theory associated to metric tight spans and hyperconvexity extends to a seemingly richer theory of diversity tight spans and diversity hyperconvexity.
This is joint work with Paul Tupper at Simon Fraser University.
Model-averaged profile likelihood confidence intervals
DOUBLE BILL - Daniel Turek
Department of Mathematics and Statistics
Date: Thursday 2 December 2010
Profile likelihood confidence intervals, based on the signed likelihood ratio statistic, are a well-known alternative to Wald intervals, which assume normality of the parameter estimates and produce symmetric confidence intervals. Further, the technique of model-averaging allows consideration of multiple candidate models, and prediction of shared model parameters. In this presentation I will motivate and present a method for the construction of model-averaged profile likelihood confidence intervals. I study these intervals through simulation, and compare their coverage rate and interval length properties against other model-averaged confidence intervals. Profile likelihood intervals are seen to have increased length and favourable coverage properties.
Robust Temperature Reconstructions from Divergent Proxies
DOUBLE BILL - Peter Green
Department of Mathematics and Statistics
Date: Thursday 2 December 2010
Global temperatures for the last thousand years can be estimated from natural proxies like tree rings. But natural variables are rarely as well-behaved as we would like, and the response at some observations may be dominated by unknown confounding factors.
Robust methods let us estimate a relationship despite the presence of outliers, and can help us identify which observations are outliers and which are informative.
Some palaeoclimate proxies have diverged from temperature in the latter part of the last century, and there is some concern that similar divergences may have occurred in the past. The cause of the recent divergence remains unknown, and robust methods may shed some light on the properties of this divergence.
Computational analysis of metagenomes
Professor Daniel Huson
Centre for Bioinformatics, Tübingen University, Germany
Date: Tuesday 9 November 2010
Continuing improvements in the throughput and cost efficiency of next generation sequencing technologies are fuelling a rapid increase in the number and scope of metagenomics projects, which aim at studying consortia of microbes by sequencing, in environments as diverse as soil, water, gut, ancient bones and the human biome. The first three computational challenges in metagenomics are: (1) Taxonomic analysis, who is out there? (2) Functional analysis, what are they doing? (3) How do different metagenomes compare, and are their differences correlated to environmental differences? This talk will give an introduction to metagenomics and will discuss some of the methods that have been developed to address these three challenges. Metagenomics is a rapidly growing field and much work needs to be done to support the analysis of metagenomic datasets.
Biography: Daniel Huson is widely known for his work in computational biology. He is the author of many extensively used software packages including tools for phylogenetic analysis, data visualisation, meta-genomic analysis, and evolutionary music composition. After completing a PhD in mathematics in 1990, Daniel held various research positions in Germany and elsewhere including a two-year postdoctoral fellowship at Princeton working with Tandy Warnow and John Conway. Daniel was a senior scientist at Celera during the first sequencing of the human genome and has been professor of Bioinformatics at the University of Tübingen since 2002.
A JOINT SEMINAR WITH THE DEPARTMENT OF BIOCHEMISTRY
New ways of visualising official statistics
Professor Sharleen Forbes
School of Government, Victoria University, Wellington and Statistics New Zealand
Date: Thursday 21 October 2010
Official statistics provide the evidence base for much of government policy but these have traditionally been released in simple and standard tables and graphs. The ability to harness the power of the internet together with new graphical techniques has led to a burst of creativity in a number of national statistics offices. New static and dynamic graphs and maps, combined interactive graphs and tables and graphs and maps that allow users to interrogate and interact with data in new ways will be demonstrated. Examples given include multidimensional scatterplots, cartograms, a CPI kaleidoscope, interactive maps, dynamic population pyramids and commuter flows and Hans Rosling’s Gapminder. A word or two of warning on the possible limitations of data visualisation will also be given.
Students' presentations x 2
4th year Statistics students
Department of Mathematics and Statistics
Date: Thursday 7 October 2010
Darren Alexander
Correlated Binary Data: A methodological approach
Pharmaceutical studies often produce datasets in binary form. One such example is a 10-year cohort study in which overdoses of the drug venlafaxine were observed. The patients were monitored to see whether a seizure occurred after a particular overdose; this provides the motivation for looking at binary data, with an emphasis on correlation and extreme proportions of binary successes.
Ross Haines
Anticipating Penalty Kicks - Bayesian Modelling of Eye-Tracking Data
When the time-dependence of athletes’ gaze pattern data is utilised in analysis, useful information is extracted, and subtle behavioural characteristics that are not apparent with discrete summary statistics are revealed. To illustrate this, Markov models are fitted to gaze-behaviour data from a football penalty kick scenario.
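As a minimal sketch of the fitting step (my illustration; the gaze states and sequence are invented), the maximum-likelihood transition matrix of a first-order Markov chain is simply the table of observed transitions with its rows normalised.

```r
# Estimate a first-order Markov transition matrix from a sequence of gaze states.
gaze <- c("ball", "keeper", "ball", "ball", "goal", "keeper",
          "ball", "goal", "goal", "ball", "keeper", "ball")   # invented sequence

transitions <- table(from = head(gaze, -1), to = tail(gaze, -1))
P <- prop.table(transitions, margin = 1)   # row-normalised MLE of transition probabilities
round(P, 2)
```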
Potential biological removal of albatrosses and petrels with minimal demographic information
Dr Peter Dillingham
Department of Mathematics and Statistics
Date: Thursday 23 September 2010
Seabirds such as albatrosses and petrels are frequently caught in longline and trawl fisheries, but limited demographic data for many species creates management challenges. A method for estimating the potential biological removal (the PBR method) for birds requires knowledge of adult survival, age at first breeding, a conservation goal, and the lower limit of a 60% confidence interval for the population size. For seabirds, usually only the number of breeding pairs is known, rather than the actual population size. This requires estimating the population size from the number of breeding pairs when important demographic variables, such as breeding success, juvenile survival, and the proportion of the adult population that engages in breeding, are unknown. In order to do this, a simple population model was built where some demographic parameters were known while others were constrained by considering plausible asymptotic estimates of the growth rate. While the median posterior population estimates are sensitive to the assumed population growth rate, the 20th percentile estimates are not. This allows the calculation of a modified PBR that is based on the number of breeding pairs instead of the population size. For threatened albatross species, this suggests that human-caused mortalities should not exceed 1.5% of the number of breeding pairs, while for threatened petrel species, mortalities should be kept below 1.2% of the number of breeding pairs. The method is applied to 22 species and sub-species of albatrosses and petrels in New Zealand that are of management concern.
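The talk's modified rule is expressed directly in terms of breeding pairs; for orientation, the standard PBR formula that it builds on (Wade 1998) is PBR = ½ · Rmax · Nmin · f, sketched below with entirely hypothetical inputs.

```r
# Standard potential biological removal (PBR) calculation with hypothetical inputs.
Rmax <- 0.06      # assumed maximum population growth rate for an albatross
Nmin <- 20000     # lower limit of a 60% CI for population size (hypothetical)
f    <- 0.1       # recovery factor reflecting the conservation goal

PBR <- 0.5 * Rmax * Nmin * f
PBR               # allowable human-caused mortalities per year under these inputs
```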
A semi-statistical sojourn into the 1000 Genomes Project
Dr Mik Black
Department of Biochemistry
Date: Thursday 16 September 2010
The extensive resequencing of the human genome being undertaken as part of the 1000 Genomes Project provides a unique opportunity to investigate human genomic variation in high resolution (i.e., single nucleotide), at the population level.
While the scale of this project presents a number of analytic challenges, rapid progress has been made on the development of computational tools for processing and manipulating these large amounts of sequence data. In this talk I will provide an overview of the data currently available through the 1000 Genomes Project, and of the tools available for processing, visualization, and analysis. Use of these tools will be demonstrated in the context of a study of copy number variation at the FCGR locus currently being undertaken in the Department of Biochemistry at the University of Otago.
An Integrated Approach to Modelling Constant Effort Bird-Ringing Data
Dr Vanessa Cave
AgResearch, Hamilton
Date: Thursday 26 August 2010
The United Kingdom government is committed to a range of international agreements concerning the protection of wild bird populations. Long-term monitoring of wild bird populations is essential if they are to be managed and conserved effectively. The British Trust for Ornithology’s Constant Effort Sites (CES) scheme, an annual programme of standardised bird-ringing across a large number of sites, provides long-term data for the monitoring of Britain’s common songbirds.
Data from the CES scheme are routinely used to index abundance and productivity, and to a lesser extent estimate adult survival rates. However, of increasing interest is the relationship between abundance and the underlying demographic rates, an understanding of which facilitates effective conservation. CES data are particularly amenable to an integrated approach to population modelling, providing a combination of demographic information from a single source. Such an integrated approach is developed here, employing Bayesian methodology and a simple population model to unite abundance, productivity and survival within a consistent framework. The method is illustrated using data for Acrocephalus warblers. We discuss some of the features of the CES data requiring particular consideration, such as the presence in the catch of 'transient' birds not associated with the local population, the need for separate data on the survival of juvenile birds, and the accommodation of sporadic failure in the constancy of effort.
Thorough Evaluation of Tests by Exact Approximate Bayesian Computation
Dr Jessica Leigh
Department of Mathematics and Statistics
Date: Thursday 19 August 2010
Statistical methods applied to many areas of the sciences are often assessed and marketed using simulation-based performance evaluation. This sort of framework can involve repeated simulation over a large number of combinations of values for relevant parameters, or the selection of a few “pet” parameter values. While the former approach to parameter selection can be inefficient and unwieldy, the second is far from objective and potentially dishonest. We have developed a Markov chain Monte Carlo sampling method to identify regions of parameter space where methods perform either well or poorly. Our method is similar to Approximate Bayesian Computation in that it does not involve the calculation of likelihoods, but samples from the probability distribution of interest, rather than an approximation thereof. In addition to describing our method, I will present results from its application to such diverse subject areas as population genetics and public health.
Some adaptive MCMC algorithms
Assoc Professor Renate Meyer
Department of Statistics, University of Auckland
Date: Thursday 12 August 2010
Different strategies have been proposed to improve mixing of Markov Chain Monte Carlo algorithms. These are mainly concerned with customizing the proposal density in the Metropolis-Hastings algorithm to the specific target density. Various Monte Carlo algorithms have been suggested that make use of previously sampled states in defining a proposal density and adapt as they run, hence called 'adaptive' Monte Carlo.
In the first part of this talk, we look at the crucial problem in applications of the Gibbs sampler: sampling efficiently from an arbitrary univariate full conditional distribution. We propose an alternative algorithm, called ARMS2, to the widely used adaptive rejection sampling technique ARS by Gilks and Wild (1992, JRSSC 42, 337-48) for generating a sample from univariate log-concave densities. Whereas ARS is based on sampling from piecewise exponentials, the new algorithm uses truncated normal distributions and makes use of a clever auxiliary variable technique (Damien and Walker, 2001, JCGS 10, 206-15).
Next we propose a general class of adaptive Metropolis-Hastings algorithms based on Metropolis-Hastings-within-Gibbs sampling. For the case of a one-dimensional target distribution, we present two novel algorithms using mixtures of triangular and trapezoidal densities. These can also be seen as improved versions of the all-purpose adaptive rejection Metropolis sampling algorithm (Gilks et al., 1995, JRSSC 44, 455-72) for sampling from non-log-concave univariate densities. Using various different examples, we demonstrate their properties and efficiencies and point out their advantages over ARMS and other adaptive alternatives such as the Normal Kernel Coupler.
Estimating Trends
Dr Laimonis Kavalieris
Department of Mathematics and Statistics
Date: Thursday 5 August 2010
Since its earliest days, the time series literature has given justified attention to basic concepts such as a “trend” and “trend estimation”. This work, which gives justification to much of the activity of government agencies such as Statistics NZ, has continued without ever being able to say what a “trend” actually is. In some subject areas such as climate change, controversy rages where it is well known, to at least one Professor, that global warming ended in 1998.
In this talk I want to discuss the meaning of a trend. My real interest is to develop theory for the estimation of a trend that recognises the slippery nature of a trend, so I will take time to explain some of the mathematical aspects of my approach.
Audience participation and opinions welcome. Prizes for correct answers.
General animal movement and migration models using multi-state random walks
Dr Brett McClintock
Centre for Research into Ecological and Environmental Modelling (CREEM), University of St Andrews, Scotland
Date: Wednesday 28 July 2010
Recent developments in animal tracking technology have permitted the collection of detailed movement paths from individuals of many species. Despite this rapidly increasing wealth of information, model development for the analysis of complex movement data has not kept pace with these technological advancements. This is largely because of the complicated structure found in time-series of animal location data, which often include considerable observation error in both time and space. Sophisticated statistical models of the underlying movement process are therefore required to facilitate reliable inference. To better understand complicated animal movements in heterogeneous landscapes, we propose that movement paths can be dissected into a few general movement strategies among which animals transition as they are affected by changes in the internal and external environment. We develop a suite of state-space models of animal movement based on mixtures of biased and correlated random walks that include different behavioural states for oriented, exploratory, and area-restricted movements. Models may then be “custom-built” for a wide variety of species applications, thereby allowing the simultaneous estimation of latent movement behaviour states, state transition probabilities, locations of foraging or resident centres of attraction, and the strength of attraction to specific locations. The inclusion of memory or covariate information in the modelling of state transition probabilities permits further investigation of specific factors related to different types of movement. Using reversible jump Markov chain Monte Carlo methods to facilitate Bayesian inference, we apply the proposed methodology to grey seal movements among haul-out and foraging locations in the North Sea.
Math and Stat 400 level student presentations
Mathematics and Statistics Department
Date: Thursday 27 May 2010
STAT 480 Preliminary Presentations
Darren Alexander
Analysis of correlated binary data
Yahya Aljohani
"Statistics Anxiety": measuring its level and impact
on choice of statistical software
Ross Haines
Bayesian modelling of eye-tracking data
Crystal Symes
Multivariate analysis used to estimate stature of Prehistoric Thai people from Ban Non Wat
MATH 480 Final Presentation
Padarn Wilson
Constructing a Brownian motion
Calibrating pseudoproxies
Peter Green
Department of Mathematics and Statistics, University of Otago
Date: Thursday 13 May 2010
The performance of a palaeoclimate reconstruction method can be tested in a pseudoproxy experiment. The method is used to reconstruct a model climate from simulated proxies, and the result can then be compared to the known target. The results of these experiments depend on the properties of the pseudoproxies: a higher signal-to-noise ratio will result in a better score for the reconstruction, while a more complicated noise structure will result in a lower score. In order to get an accurate assessment of the relative strengths and weaknesses of the various methods it is important that the properties of the pseudoproxies are as realistic as possible. But to facilitate interpretation the proxy model should also be as simple as possible. Many pseudoproxy models add random errors – often either independent normal or AR(1) errors – to a gridbox temperature series. A proxy may record temperature information via a number of climate variables, and so the total climate signal in a proxy record may be significantly underestimated if we limit the pseudoproxy's climate signal to a single gridbox temperature. In fact, 'temperature plus noise' pseudoproxies with realistic correlations between the proxy and the local temperature produce pseudo-reconstructions with unrealistically low calibration and validation performance. This suggests that more climate information needs to be included in pseudoproxy models.
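A minimal sketch of the 'temperature plus noise' construction discussed above (simulated stand-in temperature series, AR(1) noise, and a signal-to-noise ratio chosen by me purely for illustration):

```r
# Generate a simple 'temperature plus AR(1) noise' pseudoproxy (sketch).
set.seed(1)
n_years <- 150
temp <- arima.sim(model = list(ar = 0.5), n = n_years)  # stand-in gridbox temperature
temp <- as.numeric(scale(temp))

snr   <- 0.5                                  # target signal-to-noise ratio (by sd)
noise <- arima.sim(model = list(ar = 0.3), n = n_years)
noise <- as.numeric(scale(noise)) / snr       # noise sd is 1/snr times the signal sd

proxy <- temp + noise
cor(proxy, temp)                              # realised proxy-temperature correlation
```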
The use of the Chi-square test when observations are dependent
Austina Clark
Department of Mathematics and Statistics, University of Otago
Date: Thursday 6 May 2010
When the Chi-square test is applied to test the association between two multinomial distributions, each with k cells, we usually assume that the cell observations are independent. If some of the cells are dependent we would like to investigate (1) how to implement the Chi-square test and (2) how to find the test statistic and the associated degrees of freedom. The test statistic and degrees of freedom are developed from results by Geisser S & Greenhouse S W (1958, AMS, 885-891) and Huynh H & Feldt L S (1976, JEBS, 69-82). We will use an example of influenza symptoms in two groups of patients from Chang Y (2010, MJA, 1-4) to illustrate the method. One group of patients suffered from H1N1 influenza 09 and the other from seasonal influenza. Twelve symptoms were collected for each patient, and these symptoms were not totally independent.
Integrated analysis of a mark-recapture data set for native fish in the Murray River
Tomas Bird
Department of Mathematics and Statistics
Date: Thursday 29 April 2010
Capture-Mark-Recapture (CMR) studies have progressed significantly since their use by Laplace in estimating the population of France. Now, hierarchical formulations of CMR studies are frequently used to estimate demographic rates in closed and open populations in situations where data may be missing and where supplementary data are used as an aid to the estimation of model parameters. However, many CMR studies are still vulnerable to bias in cases where the probability of detection is variable or where migration is uncertain. Given the facility with which additional data and explanatory models can be incorporated into hierarchical CMR studies, a new breed of CMR study is emerging that explicitly includes data collection tools which can be used to address issues of detection probability and migration. We present an example of a CMR dataset from an open population of native fish in the Murray River, Australia. A large-scale experimental manipulation was undertaken in a 100 km section of the river to determine whether population growth rates of four native fish species would increase with the addition of suitable habitat. The study utilizes a standard CMR approach as a basis for the estimation of population size and demographic rates. Two distinct radio-tagging datasets are used as a means to estimate detection probability and migration rates. In addition, size data are incorporated into the study to help account for size-based differences in survivorship and recapture probabilities. We present a conceptual approach to an integrated analysis of this data set.
State-space model for abundance, survival, and recruitment from molecular parentage data
Dr. Jamie Sanderlin
Department of Mathematics and Statistics, Otago University
Date: Thursday 22 April 2010
Common analyses with molecular parentage data include determining reproductive success, mating strategies, and relatedness among individuals in a population. The effective population size or breeding population size is also of interest, which has implications for the conservation biology and management of small populations. Data collected for molecular parentage analysis can also be used to assess total population size, survival, and recruitment of individuals in the same analysis. These additional demographic parameters would assist with conservation decisions about the proportion of breeders in a population and population viability. We describe a Bayesian state-space approach to track survival, recruitment, and abundance of individuals through time using open population mark-recapture models. Data consist of multilocus genotype samples of offspring and potential parental genotypes over multiple years. Our method relies on data augmentation, since parents of sampled offspring may not be sampled and genetic samples may contain genotyping error.
Coverage Properties of Model-Averaged Confidence Intervals
Assoc. Prof. David Fletcher and Dr Peter Dillingham
Department of Mathematics and Statistics
Date: Thursday 15 April 2010
Model-averaging is becoming increasingly common in data analysis, as there are clear benefits in using it to allow for model uncertainty. In situations where its use is appropriate, however, little work has been carried out to assess the properties of the confidence interval associated with a model-averaged estimate. We use simulation to assess the coverage properties of such intervals in a factorial ANOVA setting, for different methods of calculating the interval and for different methods of assigning model weights.
Accelerated Gibbs Sampling of Gaussian Distributions
Prof. Colin Fox
Department of Physics
Date: Thursday 1 April 2010
The Metropolis (Hastings-Green) algorithm for MCMC was invented in the 1950s and has changed little since. In contrast, algorithms for linear algebra that were also invented in the 1950s have seen dramatic improvement, as is evident from robust and efficient numerical packages such as LAPACK, EISPACK, and others. Our goal is to steal the developments in linear algebra and apply them to the sampling of probability distributions. Some important sampling algorithms, such as Gibbs sampling of Gaussian distributions, correspond exactly to stationary iterative methods used for solving linear systems. We state and prove a theorem to that effect. Further, the convergence rates are exactly the same. This means that novel and efficient Gibbs samplers can be produced, with convergence established, by looking in an introductory numerical analysis text. For large linear problems, stationary iterative methods are considered to be very slow, with the state of the art being Krylov-space methods with polynomial accelerators. We show how polynomial acceleration can be applied to give very fast Gibbs sampling algorithms, setting a new state of the art for samplers.
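To make the correspondence concrete (a toy sketch of my own, not the accelerated samplers of the talk): component-wise Gibbs sampling of a bivariate Gaussian sweeps through the coordinates in exactly the way Gauss-Seidel sweeps through the equations of the precision system, and its mixing slows as the correlation grows, mirroring the Gauss-Seidel convergence rate.

```r
# Component-wise Gibbs sampler for a zero-mean bivariate Gaussian with correlation rho.
set.seed(1)
rho <- 0.95
n_samples <- 5000
x <- c(0, 0)
out <- matrix(NA, n_samples, 2)

for (i in seq_len(n_samples)) {
  # Full conditionals of a standard bivariate normal: x1 | x2 ~ N(rho * x2, 1 - rho^2)
  x[1] <- rnorm(1, rho * x[2], sqrt(1 - rho^2))
  x[2] <- rnorm(1, rho * x[1], sqrt(1 - rho^2))
  out[i, ] <- x
}

cor(out)   # close to rho; mixing (and Gauss-Seidel convergence) degrades as rho -> 1
```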
Effects of hyperparameters in a normal model with conjugate prior
Dr. Susan Alber
Department of Preventive and Social Medicine
Date: Thursday 25 March 2010
For a normal model with a conjugate prior, we investigate the effects of the hyperparameters on the long-run frequentist properties of posterior point and interval estimates. Under an assumed sampling model for the data-generating mechanism, we develop two types of hyperparameter optimality. Mean squared error (MSE) optimal hyperparameters minimize the MSE of posterior point estimates. Credible interval optimal hyperparameters result in credible intervals that have minimum length while still retaining nominal coverage. We give an example to demonstrate how to use our results to guide the selection of hyperparameters.
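A small illustration of the kind of frequentist calculation involved (my own sketch, with the variance assumed known, which is simpler than the full conjugate model of the talk): the posterior mean under a conjugate normal prior is a shrinkage estimator, and its MSE can be written down and minimised over the prior "sample size".

```r
# Frequentist MSE of the conjugate-normal posterior mean with known sigma (sketch).
# Prior: mu ~ N(mu0, sigma^2 / kappa0); posterior mean = (kappa0*mu0 + n*xbar) / (kappa0 + n).
mse_posterior_mean <- function(kappa0, mu_true, mu0, sigma, n) {
  bias <- kappa0 * (mu0 - mu_true) / (kappa0 + n)
  var  <- n * sigma^2 / (kappa0 + n)^2
  var + bias^2
}

kappa0 <- seq(0, 20, by = 0.1)
mse <- mse_posterior_mean(kappa0, mu_true = 1, mu0 = 0, sigma = 1, n = 10)
kappa0[which.min(mse)]    # MSE-optimal prior weight for this particular true mean
```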
The calibration of two antimicrobial susceptibility tests using interval censored data with measurement error
Prof. Bruce Craig
Purdue University
Date: Thursday 18 March 2010
Drug dilution (MIC) and disk diffusion (DIA) are two common tests used by clinicians to determine pathogen susceptibility to antibiotics. For each of these tests, two drug-specific breakpoints classify the unknown pathogen as either susceptible, intermediate, or resistant to the drug. While MIC breakpoints are largely based on the pharmacokinetics and pharmacodynamics of the drug, comparable DIA breakpoints are not as straightforward to calculate. Current Clinical and Laboratory Standards Institute (CLSI) guidelines require a scattergram of test results for numerous pathogens, and the DIA breakpoints are based on limiting the classification discrepancies. This approach, however, does not account for certain test properties and experimental errors. I will discuss this as an errors-in-variables problem and then describe a hierarchical model, which factors in the uncertainty of both tests, the drug-specific relationship between the two tests, as well as the underlying distribution of pathogens. For the drug-specific relationship between these two tests, I propose both a parametric and a nonparametric approach. A loss function is then used to determine the DIA breakpoints. This is joint work with the CLSI Subcommittee on Antimicrobial Susceptibility Testing.
Minimum Description Length (MDL)
Dr. Laimonis Kavalieris
Department of Mathematics and Statistics
Date: Thursday 11 March 2010
Since Akaike introduced his familiar information criterion some 40 years ago, many variations, including a Bayesian analogue, have appeared. All are based on large sample theory applied in sometimes dubious ways. Their popularity appears to be based on data modelling success rather than a sound theoretical basis. About the same time, communications engineers began investigating questions of model selection, basing their work on coding theory and Kolmogorov complexity. The essential idea is that a data set is to be transmitted in a code that is constructed using a statistical model. In order that the message can be decoded, it is necessary that the model parameters are also included in the transmission. Then the model achieving the minimum code length may be understood as a best model. When the model set contains a fixed (and known) number of parameters, such a procedure reduces to maximum likelihood. But it extends maximum likelihood and allows model selection when the number of parameters is not known.
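A rough numerical sketch of the two-part idea (hedged: this uses the common (k/2)·log n approximation for the parameter cost rather than a full coding argument): the total description length is approximately the parameter cost plus the negative log-likelihood of the data under the fitted model, and the model minimising it is preferred.

```r
# Two-part MDL-style comparison of two nested regression models (approximation).
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

description_length <- function(fit) {
  k <- length(coef(fit)) + 1                         # parameters incl. error variance
  (k / 2) * log(nobs(fit)) - as.numeric(logLik(fit)) # parameter cost + data cost (nats)
}

m1 <- lm(y ~ x)
m2 <- lm(y ~ poly(x, 5))
c(linear = description_length(m1), quintic = description_length(m2))
```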
Four 4th Year Students
Talks by 4th-year Statistics Honours Students
Department of Mathematics & Statistics
Date: Friday 23 October 2009
Freya Broughton-Ansin
Using Hidden Markov Models to Estimate Breeding Success in Sooty Shearwater
Michelle Feyen
Comparison of Mark-Recapture Methods for Estimating Yellow-eyed Penguin Survival
Tim Jowett
Golf Handicapping on the Fringe of Normality
Ruby Morgan
Matrix Population Modelling and Simplifications to the Leslie Matrix
Spatially explicit capture–recapture models in R
Dr. Murray Efford
Department of Zoology
Date: Thursday 15 October 2009
Spatially explicit capture–recapture (SECR) is a set of tools for estimating the density of animal populations sampled with an array of detectors (traps, microphones etc.). SECR models incorporate conventional sources of variation in detection probability along with many new spatial effects, including spatial and temporal variation in density itself. The variety of possible SECR models is a serious obstacle to their effective use and communication. In this talk I will describe a new R package 'secr' that provides a simple but highly flexible formula-driven interface to SECR models. Tools for model selection in secr will be illustrated with data on ovenbirds (Seiurus aurocapilla) captured in a Maryland forest over five breeding seasons.
Hierarchical state-space model for estimating black bear abundance in central Georgia, USA
Dr. Jamie Sanderlin
Department of Mathematics and Statistics
Date: Thursday 15 October 2009
Estimating demographic parameters of cryptic species is often difficult because of the elusive behaviour of the animals. Combining multiple data sources may be useful in these wildlife studies with incomplete detection. We present a Bayesian hierarchical state-space abundance model for black bears (Ursus americanus) which accounts for both measurement and process error. We sampled the population using two non-invasive techniques (DNA hair snares and digital cameras) and monitored animal presence during sampling periods with telemetry. Our abundance model from multiple data sources also accounts for genotyping error using calibration samples and repeat genotyping. We present results from the model using field data from five sampling periods (2004-2006) in central Georgia, USA. This collaborative research at the University of Georgia is from a portion of my Ph.D. thesis.
Mad Math: Beyond the Presence-dome
Dr. Darryl MacKenzie
Proteus Wildlife Research Consultants
Date: Thursday 8 October 2009
The last 10 years have seen the development of new statistical models, occupancy models, which enable more reliable inferences about the presence or absence of a single species while accounting for imperfect detection of the species (i.e., false absences). Oftentimes, however, simple presence/absence measures may be insufficient. A species may be relatively common or widely distributed, and simply measuring “presence” on the landscape may be insensitive to changes in the population; rather, an index of relative abundance may be required. Another situation is that some additional characteristic of the population is also of interest, e.g., not only whether the species is present at a location, but whether breeding is also occurring there. This finer-resolution information will typically be plagued by a form of imperfect detection: the evidence required to correctly classify a location into its true state will not always be observed. For example, not seeing evidence of reproduction during a survey of a location does not preclude the possibility that breeding is actually occurring there. In this talk I shall describe the recently developed multi-state occupancy models that can be used in these situations. They enable a wider set of biologically interesting questions to be addressed at a landscape scale without detailed individual-level information (e.g., marked individuals). Examples of their application will be given, and it will be shown how the general framework can be applied in many situations (e.g., species co-occurrence and joint habitat-occupancy dynamics models).
Despite the title of this talk, the number of equations will be limited.
An Adapted Robust Design for Multiple List Studies
Ms. Claire Cameron
Department of Mathematics and Statistics
Date: Thursday 1 October 2009
I'm feeling a little bit festive: my PhD is in its final stages. The work I did for my thesis consisted of adapting an existing capture-recapture model for use in the context of multiple list studies in epidemiology. The application of this new model was to local data (four patient lists) on people in Otago with diabetes. In this talk I will give an overview of the new model and discuss some of the results obtained. There will be scones, there will be music. There probably won't be dancing or formulae.