16 Aviation Accidents and Incidents
This lesson uses studies of aviation events to demonstrate techniques for data where both the predictor and response variables take categorical values. Performance shaping factors affecting aircraft accidents (serious events) and incidents (minor events) should be investigated in order to reduce the possibility of future accidents. This study, presented by David O’Hare, investigates human behaviours which may be involved in such accidents and incidents.
Data
There are 2 files associated with this presentation. The first contains the data you will need to complete the lesson tasks, and the second contains descriptions of the variables included in the data file.
Video
Objectives
Tasks
0. Read and Format data
0a. Read in the data
First make sure you have installed the package readxl
(see Section 2.6) and set the working directory (see Section 2.1).
Load the data into R.
The code has been hidden initially, so you can try to load the data yourself first before checking the solutions.
Code
#loads readxl package
library(readxl)
#loads the data file and names it aircraft
<-read_xls("AircraftData.xls")
aircraft
#view beginning of data frame
head(aircraft)
Code
#loads readxl package
library(readxl)
Warning: package 'readxl' was built under R version 4.2.2
Code
#loads the data file and names it aircraft
<-read_xls("AircraftData.xls")
aircraft
#view beginning of data frame
head(aircraft)
# A tibble: 6 × 14
age hrstot event_type failu…¹ p_dis…² othr_…³ workl…⁴ inter…⁵ nontrn wrngtrn
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 60 1600 1 1 2 2 1 2 2 2
2 26 80 0 NA 0 0 0 0 0 0
3 34 82 0 NA 0 0 0 0 0 0
4 59 230 0 NA 0 0 0 0 0 0
5 41 11000 0 NA 0 0 0 0 0 0
6 45 286 0 NA 0 0 0 0 0 0
# … with 4 more variables: incapac <dbl>, mentals <dbl>, int_physical <dbl>,
# ext_physical <dbl>, and abbreviated variable names ¹failure_mode,
# ²p_distract, ³othr_distract, ⁴workload, ⁵interfnc
# ℹ Use `colnames()` to see all variable names
This opens the “AircraftData.xls”
dataset. This dataset contains information on performance shaping factors and the resulting events in 1144 reports from aircraft pilots.
0b. Format the data
Many of the variables in this data set are binary (categorical with 2 levels). They are currently coded as 1 (yes) and 2 (no), but it is easier to work with binary data coded as 1 (yes) and 0 (no).
Recode the relevant categorical variables.
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$p_distract[aircraft$p_distract==2]<-0 aircraft
Repeat for other binary variables.
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$p_distract[aircraft$p_distract==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$othr_distract[aircraft$othr_distract==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$workload[aircraft$workload==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$interfnc[aircraft$interfnc==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$nontrn[aircraft$nontrn==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$wrngtrn[aircraft$wrngtrn==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$incapac[aircraft$incapac==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$mentals[aircraft$mentals==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$int_physical[aircraft$int_physical==2]<-0 aircraft
Code
#overwrite entries equal to 2 (no) with 0 (binary no)
$ext_physical[aircraft$ext_physical==2]<-0 aircraft
Code
#check new data frame
head(aircraft)
# A tibble: 6 × 14
age hrstot event_type failu…¹ p_dis…² othr_…³ workl…⁴ inter…⁵ nontrn wrngtrn
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 60 1600 1 1 0 0 1 0 0 0
2 26 80 0 NA 0 0 0 0 0 0
3 34 82 0 NA 0 0 0 0 0 0
4 59 230 0 NA 0 0 0 0 0 0
5 41 11000 0 NA 0 0 0 0 0 0
6 45 286 0 NA 0 0 0 0 0 0
# … with 4 more variables: incapac <dbl>, mentals <dbl>, int_physical <dbl>,
# ext_physical <dbl>, and abbreviated variable names ¹failure_mode,
# ²p_distract, ³othr_distract, ⁴workload, ⁵interfnc
# ℹ Use `colnames()` to see all variable names
This dataset contains performance shaping factors for incidents, accidents,and situations where no event occurred. Some of the analysis carried out in the video (that we will replicate in this lesson), is interested only in the comparison between incidents and accidents (disregarding non-events).
Create a second data frame that only includes observations where an incident or accident occurred.
Code
#subset observations where event type is 1 (incident) or 2 (accident)
<-aircraft[which(aircraft$event_type=="1"|aircraft$event_type=="2"),]
aircraftFailure
#check new data frame
head(aircraftFailure)
Code
#subset observations where event type is 1 (incident) or 2 (accident)
<-aircraft[which(aircraft$event_type=="1"|aircraft$event_type=="2"),]
aircraftFailure
#check new data frame
head(aircraftFailure)
# A tibble: 6 × 14
age hrstot event_type failu…¹ p_dis…² othr_…³ workl…⁴ inter…⁵ nontrn wrngtrn
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 60 1600 1 1 0 0 1 0 0 0
2 33 90 1 1 0 0 0 0 0 0
3 41 158 1 3 0 0 0 0 1 0
4 44 3500 1 2 0 0 0 0 0 0
5 29 300 1 5 1 0 0 0 0 0
6 56 239 1 7 0 0 0 0 0 0
# … with 4 more variables: incapac <dbl>, mentals <dbl>, int_physical <dbl>,
# ext_physical <dbl>, and abbreviated variable names ¹failure_mode,
# ²p_distract, ³othr_distract, ⁴workload, ⁵interfnc
# ℹ Use `colnames()` to see all variable names
1. Cross-Tabulation, Bar Plot (response by predictor)
1a. Cross-Tabulation
Using the aircraftFailure data frame, construct a cross tabulation of event_type against each of the 7 categories in failure_mode.
Calculate these values as proportions, that sum to 1 for each event type.
Code
#using prop.table on a table object converts the counts to proportions
#margin=1 sums over rows
prop.table(table(aircraftFailure$event_type,aircraftFailure$failure_mode),margin=1)
Code
#using prop.table on a table object converts the counts to proportions
#margin=1 sums over rows
prop.table(table(aircraftFailure$event_type,aircraftFailure$failure_mode),margin=1)
1 2 3 4 5 6
1 0.39914163 0.22317597 0.10729614 0.11587983 0.03862661 0.02145923
2 0.31372549 0.17647059 0.07843137 0.25490196 0.07843137 0.05882353
7
1 0.09442060
2 0.03921569
1b. Bar plot
It is often easier to compare values over many categories using visualisation. Use your table from 1a. to create a bar plot.
What does your bar plot show?
Code
#store proportion values in a matrix
<-prop.table(table(aircraftFailure$event_type,aircraftFailure$failure_mode),margin=1) AIMatrix
Code
#barplot using matrix
barplot(AIMatrix,beside=TRUE,names.arg=c("Mechanical","Information","Diagnosis","Goal","Strategy","Procedure","Action"),col=c("darkorange1","red3"),ylab="Proportion",xlab="Failure Mode",main="Event type by Failure mode",ylim=c(0,0.5),cex.names=0.82)
legend("topright",c("Incident","Accident"),fill=c("darkorange1","red3"))
Code
#store proportion values in a matrix
<-prop.table(table(aircraftFailure$event_type,aircraftFailure$failure_mode),margin=1) AIMatrix
Code
#barplot
barplot(AIMatrix,beside=TRUE,names.arg=c("Mechanical","Information","Diagnosis","Goal","Strategy","Procedure","Action"),col=c("darkorange1","red3"),ylab="Proportion",xlab="Failure Mode",main="Event type by Failure mode",ylim=c(0,0.5),cex.names=0.82)
legend("topright",c("Incident","Accident"),fill=c("darkorange1","red3"))
The mechanical, information, diagnosis and action failure modes are associated with a higher proportion of incidents than accidents, while the reverse is true for the goal, strategy and procedure failure modes. The goal mode has the largest difference in proportion of associated accidents and incidents.
Create cross-tabulations of the proportions of event_type associated with each of the 10 performance shaping factors (e.g. p_distract, workload, ext_physical etc.).
Following this, you can construct a matrix of proportion values of accidents and incidents associated with each performance shaping factor and replicate the bar graph shown in the video (12.58). Note - you should not plot all the values in your table so you must first make a matrix of the relevant ones, rather than provide the table directly to the barplot()
function.
2. Limitations of Self-report Data
The data used for this analysis came from self-reports of pilots who responded to a survey request.
How might these factors (participant response to survey and self-report) affect the data that was collected, and therefore the conclusions that can be made?
Although pilots who responded to the survey did not differ from those that declined in terms of demographics (as addressed in the video), the characteristics of their crashes may not be the same.For example, pilots who had an accident that was attributed to their actions (e.g. diagnosis, strategy) may feel uncomfortable about publicising this information. As a result, the association between these failure modes and accidents or incidents would be underestimated.
Self-report of survey answers can also affect the quality of data. Pilots may have forgotten the specifics of accidents that happened a long time ago, may be self-conscious about the true causes, and may be more willing to attribute a crash to external factors such as distractions.
3. New Variable, Cross-Tabulation, \(\chi^2\) Tests
3a. New Variable, Cross-Tabulation, \(\chi^2\) Tests
Using the aircraftFailure data frame, create a new binary variable that takes value 1 when the failure mode is mechanical (failure_mode=1) and 0 otherwise.
Use this new variable to produce a count table of event_type according to whether the failure was mechanical.
Perform a \(\chi^2\) test for a significant association between event_type and mechanical failure. Interpret the results.
New variable
Code
$failure_mech<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_mech[aircraftFailure$failure_mode==1]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_mech)
\(\chi^2\) test
Code
chisq.test(table(aircraftMech$event_type,aircraftMech$failure_mode))
New variable
Code
$failure_mech<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_mech[aircraftFailure$failure_mode==1]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_mech)
0 1
1 175 93
2 42 16
\(\chi^2\) test
Code
chisq.test(table(aircraftFailure$event_type,aircraftFailure$failure_mech))
Pearson's Chi-squared test with Yates' continuity correction
data: table(aircraftFailure$event_type, aircraftFailure$failure_mech)
X-squared = 0.78848, df = 1, p-value = 0.3746
The p-value is 0.3746, which is greater than the significance level of 0.05. This indicates no significant association between event type and the mechanical failure mode.
3b. Practice: New Variable, Cross-Tabulation, \(\chi^2\) Tests
Repeat Task 3a. for the remaining failure modes (information, diagnosis, goal, strategy, procedure). Interpret each \(\chi^2\) test, are any of the associations significant?
INFORMATION FAILURE
New variable
Code
$failure_inf<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_inf[aircraftFailure$failure_mode==2]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_inf)
\(\chi^2\) test
Code
chisq.test(table(aircraftFailure$event_type,aircraftFailure$failure_inf))
Repeat for other failure modes.
INFORMATION FAILURE
New variable
Code
$failure_inf<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_inf[aircraftFailure$failure_mode==2]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_inf)
0 1
1 216 52
2 49 9
\(\chi^2\) test
Code
chisq.test(table(aircraftFailure$event_type,aircraftFailure$failure_inf))
Pearson's Chi-squared test with Yates' continuity correction
data: table(aircraftFailure$event_type, aircraftFailure$failure_inf)
X-squared = 0.25232, df = 1, p-value = 0.6154
p-value greater than 0.05, no significant association.
DIAGNOSIS FAILURE
New variable
Code
$failure_diag<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_diag[aircraftFailure$failure_mode==3]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_diag)
0 1
1 243 25
2 54 4
\(\chi^2\) test
Code
chisq.test(table(aircraftFailure$event_type,aircraftFailure$failure_diag))
Pearson's Chi-squared test with Yates' continuity correction
data: table(aircraftFailure$event_type, aircraftFailure$failure_diag)
X-squared = 0.11256, df = 1, p-value = 0.7373
p-value greater than 0.05, no significant association.
GOAL FAILURE
New variable
Code
$failure_goal<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_goal[aircraftFailure$failure_mode==4]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_goal)
0 1
1 241 27
2 45 13
\(\chi^2\) test
Code
chisq.test(table(aircraftFailure$event_type,aircraftFailure$failure_goal))
Pearson's Chi-squared test with Yates' continuity correction
data: table(aircraftFailure$event_type, aircraftFailure$failure_goal)
X-squared = 5.6465, df = 1, p-value = 0.01749
This p-value is less than 0.05, providing evidence of a significant association between event type and goal failure mode.
STRATEGY FAILURE
New variable
Code
$failure_stra<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_stra[aircraftFailure$failure_mode==5]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_stra)
0 1
1 259 9
2 54 4
\(\chi^2\) test
Code
chisq.test(table(aircraftFailure$event_type,aircraftFailure$failure_stra))
Warning in chisq.test(table(aircraftFailure$event_type,
aircraftFailure$failure_stra)): Chi-squared approximation may be incorrect
Pearson's Chi-squared test with Yates' continuity correction
data: table(aircraftFailure$event_type, aircraftFailure$failure_stra)
X-squared = 0.77195, df = 1, p-value = 0.3796
p-value greater than 0.05, no significant association.
PROCEDURE FAILURE
New variable
Code
$failure_proc<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_proc[aircraftFailure$failure_mode==6]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_proc)
0 1
1 263 5
2 55 3
\(\chi^2\) test
Code
chisq.test(table(aircraftFailure$event_type,aircraftFailure$failure_proc))
Warning in chisq.test(table(aircraftFailure$event_type,
aircraftFailure$failure_proc)): Chi-squared approximation may be incorrect
Pearson's Chi-squared test with Yates' continuity correction
data: table(aircraftFailure$event_type, aircraftFailure$failure_proc)
X-squared = 1.0157, df = 1, p-value = 0.3135
p-value greater than 0.05, no significant association.
ACTION FAILURE
New variable
Code
$failure_act<-rep(0,nrow(aircraftFailure))
aircraftFailure$failure_act[aircraftFailure$failure_mode==7]<-1 aircraftFailure
Table of counts
Code
table(aircraftFailure$event_type,aircraftFailure$failure_act)
0 1
1 246 22
2 56 2
\(\chi^2\) test
Code
chisq.test(table(aircraftFailure$event_type,aircraftFailure$failure_act))
Warning in chisq.test(table(aircraftFailure$event_type,
aircraftFailure$failure_act)): Chi-squared approximation may be incorrect
Pearson's Chi-squared test with Yates' continuity correction
data: table(aircraftFailure$event_type, aircraftFailure$failure_act)
X-squared = 0.96336, df = 1, p-value = 0.3263
p-value greater than 0.05, no significant association.
4. Multiple Logistic Regression
Using the full aircraft data frame, carry out a multinomial logistic regression with event_type (0 (no event), 1 (incident), 2 (accident)) as the outcome variable.
There are many potential predictor variables to choose from (aside from failure_mode, as this is only relevant when an incident or accident has already occurred), so you can fit several different models to find which ones are significant. Note - if you get a warning system is computationally singular
, reduce the number of predictor variables in the model.
Report your conclusions from the fitted model. The fitted models for multinomial logistic regression may be written out and interpreted using methods analogous to logistic regression. However, because there are now more than 2 response categories, we must construct several fitted equations for the comparison of each set of log-odds.
Code
library(mlogit)
<-mlogit(event_type~0|age+hrstot+p_distract+othr_distract+workload+interfnc,data=aircraft,shape="wide")
multiAir2summary(multiAir2)
Distractions (people and other), excessive workload, and interference of one task with another all have significant effects on the odds of incidents, accidents or non-events. Age and total flying hours do not.
Code
<-mlogit(event_type~0|nontrn+wrngtrn+incapac+mentals+int_physical+ext_physical,data=aircraft,shape="wide")
multiAir3summary(multiAir3)
Significant effects are also found for lack of training, cockpit and external environments. Significant effects are not detected for faulty training, physical incapacitation, or affected mental state. However it should not be concluded that these effects do not exist, rather that we were not able to detect their presence in this study. There are very limited instances of physical incapacitation (only 4) in our data set, for example.
Code
<-mlogit(event_type~0|nontrn+p_distract+othr_distract+workload+interfnc+int_physical+ext_physical,data=aircraft,shape="wide")
multiAir4summary(multiAir4)
Full model with all significant effects can then be interpreted.
Full model with all significant effects to be interpreted.
Code
library(mlogit)
Loading required package: dfidx
Attaching package: 'dfidx'
The following object is masked from 'package:stats':
filter
Code
<-mlogit(event_type~0|nontrn+p_distract+othr_distract+workload+interfnc+int_physical+ext_physical,data=aircraft,shape="wide")
multiAir4summary(multiAir4)
Call:
mlogit(formula = event_type ~ 0 | nontrn + p_distract + othr_distract +
workload + interfnc + int_physical + ext_physical, data = aircraft,
shape = "wide", method = "nr")
Frequencies of alternatives:choice
0 1 2
0.715035 0.234266 0.050699
nr method
7 iterations, 0h:0m:0s
g'(-H)^-1g = 6.91E-07
gradient close to zero
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
(Intercept):1 -2.25778 0.11139 -20.2694 < 2.2e-16 ***
(Intercept):2 -3.91695 0.21340 -18.3546 < 2.2e-16 ***
nontrn:1 5.14552 1.02874 5.0018 5.680e-07 ***
nontrn:2 5.28755 1.06356 4.9715 6.642e-07 ***
p_distract:1 3.34149 0.80064 4.1735 2.999e-05 ***
p_distract:2 3.31788 0.86553 3.8334 0.0001264 ***
othr_distract:1 2.55873 0.86173 2.9693 0.0029849 **
othr_distract:2 2.51929 0.92154 2.7338 0.0062612 **
workload:1 1.97424 0.95155 2.0748 0.0380078 *
workload:2 1.86404 1.01358 1.8391 0.0659063 .
interfnc:1 3.60906 0.64340 5.6094 2.031e-08 ***
interfnc:2 3.99639 0.69670 5.7362 9.684e-09 ***
int_physical:1 3.70952 0.77975 4.7574 1.961e-06 ***
int_physical:2 3.26526 0.86273 3.7848 0.0001538 ***
ext_physical:1 4.03363 0.62547 6.4489 1.126e-10 ***
ext_physical:2 4.27428 0.67592 6.3236 2.555e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood: -503.51
McFadden R^2: 0.39792
Likelihood ratio test : chisq = 665.53 (p.value = < 2.22e-16)
The event_type variable has 3 levels : 0=No event, 1=Incident, 2=Accident. The reference category in the multinomial model above is 0=No event. The fitted equation for the log-odds of an incident (minor event) relative to no aircraft event occurring is written using the model parameters with “:1” after them.
\[\log\left(\frac{Incident}{No event}\right) = -2.25778 + 5.14552_{\text{nontrn}} +3.34149 _{\text{pdistract}} + 2.55873_{\text{othrdistract}} + 1.97424_{\text{workload}} + 3.60906_{\text{interfnc}} + 3.70952 _{\text{intphysical}}+ 4.03363 _{\text{extphysical}} \] Write out the fitted equation for the log-odds of an accident relative to no aircraft event, using the model parameters with “:2” after them.
\[\log\left(\frac{Accident}{No event}\right) = -3.91695 + 5.28755_{\text{nontrn}} + 3.31788 _{\text{pdistract}} + 2.51929 _{\text{othrdistract}} +1.86404 _{\text{workload}} +3.99639_{\text{interfnc}} + 3.26526_{\text{intphysical}}+4.27428_{\text{extphysical}}\]
\[\log\left(\frac{Incident}{Accident}\right) = \log\left(\frac{Incident/No event}{Accident/No event}\right) = \log\left(\frac{Incident}{No event}\right) - \log\left(\frac{Accident}{Noevent}\right)\] Use this derivation to calculate the fitted equation for the log-odds of not supporting the stadium relative to being undecided.
\[\log\left(\frac{Incident}{Accident}\right) = -6.17473 - 0.14203_{\text{nontrn}} + 0.02361_{\text{pdistract}}+ 0.03944_{\text{othrdistract}} + 0.1102 _{\text{workload}} - 0.38733_{\text{interfnc}} + 0.44426_{\text{intphysical}}-0.24065_{\text{extphysical}}\]