1
00:00:05,300 --> 00:00:11,500
[Music]

2
00:00:12,480 --> 00:00:16,320
right now a special pdns presenter

3
00:00:14,960 --> 00:00:18,640
antony green

4
00:00:16,320 --> 00:00:20,320
antony is a veritable national treasure

5
00:00:18,640 --> 00:00:22,640
he's best known as chief election

6
00:00:20,320 --> 00:00:24,480
analyst with the australian broadcasting

7
00:00:22,640 --> 00:00:26,160
corporation and is the face of

8
00:00:24,480 --> 00:00:27,359
television election coverages in

9
00:00:26,160 --> 00:00:29,119
australia

10
00:00:27,359 --> 00:00:31,359
during the pdns antony will be

11
00:00:29,119 --> 00:00:32,559
presenting election night analysis art

12
00:00:31,359 --> 00:00:34,239
or science

13
00:00:32,559 --> 00:00:36,960
and if we have antony there i would

14
00:00:34,239 --> 00:00:38,800
like to throw over to you

15
00:00:36,960 --> 00:00:40,640
thank you miles now just make sure

16
00:00:38,800 --> 00:00:41,520
everyone's hearing me good

17
00:00:40,640 --> 00:00:43,600
um

18
00:00:41,520 --> 00:00:45,840
election night analysis art or science

19
00:00:43,600 --> 00:00:47,920
well the answer i'll answer

20
00:00:45,840 --> 00:00:50,800
right at start the answer is science

21
00:00:47,920 --> 00:00:52,480
um it's not guesswork it's not you know

22
00:00:50,800 --> 00:00:55,039
hunches and feelings

23
00:00:52,480 --> 00:00:55,920
it's hardcore maths and science

24
00:00:55,039 --> 00:00:57,920
and

25
00:00:55,920 --> 00:00:59,280
you know at that terrible moment when

26
00:00:57,920 --> 00:01:00,239
it's looking really close and you're not

27
00:00:59,280 --> 00:01:02,160
sure what's going to happen there's a

28
00:01:00,239 --> 00:01:05,199
bit of experience and i've been here

29
00:01:02,160 --> 00:01:07,040
before type thing the art the feeling

30
00:01:05,199 --> 00:01:08,799
um but you know when it comes down to it

31
00:01:07,040 --> 00:01:10,560
it's maths now

32
00:01:08,799 --> 00:01:13,200
i'll just start with

33
00:01:10,560 --> 00:01:14,560
some credits first um

34
00:01:13,200 --> 00:01:16,080
pictures in this presentation are all

35
00:01:14,560 --> 00:01:18,320
from the australian electoral commission

36
00:01:16,080 --> 00:01:20,479
who have a very useful flickr site

37
00:01:18,320 --> 00:01:22,159
the graphs have been prepared by me

38
00:01:20,479 --> 00:01:24,159
a statistical credit for dr ross

39
00:01:22,159 --> 00:01:26,720
cunningham who's an adjunct professor at

40
00:01:24,159 --> 00:01:29,520
anu background in statistical analysis

41
00:01:26,720 --> 00:01:31,200
and he developed many of the statistical

42
00:01:29,520 --> 00:01:33,200
methods we used to analyze elections in

43
00:01:31,200 --> 00:01:35,200
australia back in the 1980s when there

44
00:01:33,200 --> 00:01:37,520
was a lot less computer power and a lot

45
00:01:35,200 --> 00:01:39,119
less data my thanks to the abc who

46
00:01:37,520 --> 00:01:41,200
originally hired me for six months as a

47
00:01:39,119 --> 00:01:43,119
researcher my background was working as a

48
00:01:41,200 --> 00:01:44,640
computer programmer back in the 80s and

49
00:01:43,119 --> 00:01:46,720
then a background in political science

50
00:01:44,640 --> 00:01:48,320
from study and so i was the right

51
00:01:46,720 --> 00:01:51,119
combination of different skills when the

52
00:01:48,320 --> 00:01:52,960
abc was looking for someone in 1989 and

53
00:01:51,119 --> 00:01:54,880
30 years later they think i'm still

54
00:01:52,960 --> 00:01:55,840
doing useful work

55
00:01:54,880 --> 00:01:57,200
so

56
00:01:55,840 --> 00:02:00,000
election night

57
00:01:57,200 --> 00:02:01,520
it's about trying to work out

58
00:02:00,000 --> 00:02:03,119
take what these people are writing on

59
00:02:01,520 --> 00:02:04,719
their bits of paper

60
00:02:03,119 --> 00:02:06,159
and turning it into who's going to run

61
00:02:04,719 --> 00:02:07,360
government for the next three years

62
00:02:06,159 --> 00:02:09,759
that's what we're doing on election

63
00:02:07,360 --> 00:02:13,440
night we're reporting all those bits of

64
00:02:09,759 --> 00:02:15,120
paper added summed up sent through to us

65
00:02:13,440 --> 00:02:16,720
when can we work out

66
00:02:15,120 --> 00:02:18,959
who's won the election

67
00:02:16,720 --> 00:02:21,440
now there's a long history of reporting

68
00:02:18,959 --> 00:02:23,520
election nights in australia here's one

69
00:02:21,440 --> 00:02:25,599
from wa um something that western

70
00:02:23,520 --> 00:02:28,000
australia thinks was a terrible mistake

71
00:02:25,599 --> 00:02:29,599
which was joining federation in those

72
00:02:28,000 --> 00:02:31,840
days the media used to run these big

73
00:02:29,599 --> 00:02:33,280
tally boards and they would report the

74
00:02:31,840 --> 00:02:35,040
results like this people would come in

75
00:02:33,280 --> 00:02:37,519
to watch results going up

76
00:02:35,040 --> 00:02:38,959
that process eventually i mean in those

77
00:02:37,519 --> 00:02:40,959
days

78
00:02:38,959 --> 00:02:42,800
newspapers tended to have big

79
00:02:40,959 --> 00:02:43,920
telephony services more than the

80
00:02:42,800 --> 00:02:46,000
government did

81
00:02:43,920 --> 00:02:47,120
so that's why the newspapers used to

82
00:02:46,000 --> 00:02:49,360
do this sort of stuff they could get the

83
00:02:47,120 --> 00:02:51,200
telegrams in quicker

84
00:02:49,360 --> 00:02:54,000
than the government

85
00:02:51,200 --> 00:02:55,440
by the 1990s this was the 1990 tally

86
00:02:54,000 --> 00:02:57,599
room in canberra

87
00:02:55,440 --> 00:02:59,440
huge great board with numbers on it

88
00:02:57,599 --> 00:03:01,120
which people used to read the old

89
00:02:59,440 --> 00:03:02,800
hardcore who had been doing it for decades

90
00:03:01,120 --> 00:03:04,640
would be down front looking at the

91
00:03:02,800 --> 00:03:06,239
numbers and they could tell from four

92
00:03:04,640 --> 00:03:07,920
digit numbers who was winning or not

93
00:03:06,239 --> 00:03:09,200
something i've never been able to do i'm

94
00:03:07,920 --> 00:03:11,680
afraid i have to use statistical

95
00:03:09,200 --> 00:03:11,680
analysis

96
00:03:11,840 --> 00:03:15,760
this is the back of the tally board in

97
00:03:13,280 --> 00:03:17,360
2001 there's a bunch of names of the

98
00:03:15,760 --> 00:03:18,800
candidates the same as on the front

99
00:03:17,360 --> 00:03:20,319
these little boards have all got little

100
00:03:18,800 --> 00:03:22,159
pins and they used to put little numbers

101
00:03:20,319 --> 00:03:23,360
on them and then flip the board around

102
00:03:22,159 --> 00:03:25,519
and that's what people used to read now

103
00:03:23,360 --> 00:03:27,519
when i came into television we'd stopped

104
00:03:25,519 --> 00:03:29,680
shooting the board it had just become a

105
00:03:27,519 --> 00:03:31,840
big backdrop but

106
00:03:29,680 --> 00:03:33,760
for another two decades after 1990 they

107
00:03:31,840 --> 00:03:36,239
were still doing this board no one was

108
00:03:33,760 --> 00:03:38,000
shooting it it was of much less use it was

109
00:03:36,239 --> 00:03:40,080
entirely done as a backdrop for

110
00:03:38,000 --> 00:03:42,799
television

111
00:03:40,080 --> 00:03:44,560
now australian federal elections

112
00:03:42,799 --> 00:03:46,720
for those who don't know the election

113
00:03:44,560 --> 00:03:47,920
process in australia a very quick

114
00:03:46,720 --> 00:03:49,599
run through that

115
00:03:47,920 --> 00:03:51,360
um you have two chambers elected on the

116
00:03:49,599 --> 00:03:54,239
same day the house of representatives and

117
00:03:51,360 --> 00:03:56,400
the senate the house is elected from 151

118
00:03:54,239 --> 00:03:58,159
single member districts the senate is

119
00:03:56,400 --> 00:04:00,640
elected by proportional representation

120
00:03:58,159 --> 00:04:02,640
from six states and two territories i

121
00:04:00,640 --> 00:04:04,239
won't do anything on the senate tonight

122
00:04:02,640 --> 00:04:07,040
each house division has about a hundred

123
00:04:04,239 --> 00:04:09,439
thousand voters voting is compulsory

124
00:04:07,040 --> 00:04:10,720
turnout is usually above ninety percent

125
00:04:09,439 --> 00:04:13,360
around the country there's seven

126
00:04:10,720 --> 00:04:16,479
thousand polling places plus five plus

127
00:04:13,360 --> 00:04:17,759
pre-poll centers 400 plus mobile

128
00:04:16,479 --> 00:04:19,680
voting

129
00:04:17,759 --> 00:04:22,000
teams not trends and i've got my first

130
00:04:19,680 --> 00:04:23,919
spelling error and all voting centers

131
00:04:22,000 --> 00:04:26,240
are counted on election night postals

132
00:04:23,919 --> 00:04:28,720
and absent votes are

133
00:04:26,240 --> 00:04:29,840
added after election day

134
00:04:28,720 --> 00:04:31,680
australia uses what's called

135
00:04:29,840 --> 00:04:33,520
preferential voting or rank ordering

136
00:04:31,680 --> 00:04:35,680
voting there's a ballot paper on the

137
00:04:33,520 --> 00:04:37,680
right for the electorate of higgins i think

138
00:04:35,680 --> 00:04:39,680
that's 2016.

139
00:04:37,680 --> 00:04:41,919
voters must number all squares on that

140
00:04:39,680 --> 00:04:42,880
ballot paper in a consecutive sequence of

141
00:04:41,919 --> 00:04:44,639
numbers

142
00:04:42,880 --> 00:04:46,080
to win a candidate must receive fifty

143
00:04:44,639 --> 00:04:47,759
percent of the vote

144
00:04:46,080 --> 00:04:49,840
if no candidate receives more than fifty

145
00:04:47,759 --> 00:04:51,440
percent of the vote then the lowest

146
00:04:49,840 --> 00:04:53,040
candidate is excluded their ballot

147
00:04:51,440 --> 00:04:54,960
papers re-examined for the next

148
00:04:53,040 --> 00:04:56,880
available preference and the tally of

149
00:04:54,960 --> 00:04:58,560
those preferences are transferred to

150
00:04:56,880 --> 00:04:59,440
another candidate in

151
00:04:59,440 --> 00:05:01,759
the count

152
00:04:59,440 --> 00:05:04,000
all the preferences that are counted are

153
00:05:01,759 --> 00:05:05,759
what is written on the ballot papers

154
00:05:04,000 --> 00:05:07,680
there's no party control no candidate

155
00:05:05,759 --> 00:05:09,520
control over that it's what voters write

156
00:05:07,680 --> 00:05:10,880
on the ballot paper
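The exclude-and-transfer count described here can be sketched in a few lines of python. This is an illustrative sketch only, not the electoral commission's procedure: the function name, the ballot representation (each ballot is a full ranking, most preferred first) and the alphabetical tie-break are all assumptions, and the loop keeps excluding until two candidates remain.

```python
from collections import Counter


def distribute_preferences(ballots):
    """Return the tally after each round of a preferential count.

    ballots: list of rankings, each a list of every candidate's name,
    most preferred first (hypothetical representation).
    """
    remaining = {name for ranking in ballots for name in ranking}
    rounds = []
    while True:
        # Each ballot counts for its highest-ranked candidate still in the count.
        tally = Counter({name: 0 for name in remaining})
        for ranking in ballots:
            top = next(name for name in ranking if name in remaining)
            tally[top] += 1
        rounds.append(dict(tally))
        if len(remaining) <= 2:
            return rounds
        # Exclude the lowest candidate; ties broken by name (an assumption).
        lowest = min(remaining, key=lambda name: (tally[name], name))
        remaining.remove(lowest)


ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
rounds = distribute_preferences(ballots)
print(rounds[0])   # first preferences
print(rounds[-1])  # final two candidates
```

In this made-up example candidate C is excluded after the first round and both of C's ballots go to B, so B overtakes A.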

157
00:05:09,520 --> 00:05:13,360
the process of excluding and

158
00:05:10,880 --> 00:05:14,639
transferring continues until only two

159
00:05:13,360 --> 00:05:17,199
candidates remain and one of the

160
00:05:14,639 --> 00:05:19,600
advantages in australia is we do come

161
00:05:17,199 --> 00:05:21,759
down to only two candidates which makes

162
00:05:19,600 --> 00:05:24,000
all the mathematics of the modeling much

163
00:05:21,759 --> 00:05:26,000
easier than in many other countries and

164
00:05:24,000 --> 00:05:28,000
the fact the full distribution of

165
00:05:26,000 --> 00:05:30,800
preferences doesn't take place till

166
00:05:28,000 --> 00:05:32,160
two weeks after the election so up until

167
00:05:30,800 --> 00:05:34,400
two weeks after the election when they

168
00:05:32,160 --> 00:05:36,320
formally declare winners we're working off

169
00:05:34,400 --> 00:05:38,720
preliminary figures especially on

170
00:05:36,320 --> 00:05:41,120
election night now the process of

171
00:05:38,720 --> 00:05:43,360
counting begins with this they tip the

172
00:05:41,120 --> 00:05:45,039
ballot papers out of the boxes now

173
00:05:43,360 --> 00:05:46,400
scrutineers are able to observe the

174
00:05:45,039 --> 00:05:47,440
closing and the opening of the ballot

175
00:05:46,400 --> 00:05:49,440
boxes

176
00:05:47,440 --> 00:05:51,680
uh they're able to observe all this

177
00:05:49,440 --> 00:05:54,400
process then they just tip these ballot

178
00:05:51,680 --> 00:05:55,919
boxes onto the table and that's when

179
00:05:54,400 --> 00:05:57,360
the counting starts

180
00:05:55,919 --> 00:05:59,360
the first thing they do on election

181
00:05:57,360 --> 00:06:00,720
night is they count

182
00:05:59,360 --> 00:06:02,000
each of them is counted on the night

183
00:06:00,720 --> 00:06:03,840
after 6 pm

184
00:06:02,000 --> 00:06:06,319
the tallying of first preferences by

185
00:06:03,840 --> 00:06:07,840
candidate is done by hand it is not

186
00:06:06,319 --> 00:06:10,000
scanned it is not the american

187
00:06:07,840 --> 00:06:12,400
electronic voting it is not scanning of

188
00:06:10,000 --> 00:06:13,919
ballot papers it is done by hand

189
00:06:12,400 --> 00:06:15,680
candidates appoint scrutineers to

190
00:06:13,919 --> 00:06:17,120
observe the count they are able to

191
00:06:15,680 --> 00:06:18,800
observe the sealing and the opening of the

192
00:06:17,120 --> 00:06:21,120
ballot boxes and they are able to

193
00:06:18,800 --> 00:06:22,560
observe the count at all times and see

194
00:06:21,120 --> 00:06:24,560
the ballot papers they're not allowed to

195
00:06:22,560 --> 00:06:26,560
touch the ballot papers but they are

196
00:06:24,560 --> 00:06:28,479
able to challenge votes though on

197
00:06:26,560 --> 00:06:30,479
election night they're more trying to

198
00:06:28,479 --> 00:06:32,000
look at preference flows

199
00:06:30,479 --> 00:06:35,520
and get the tallies for their own

200
00:06:32,000 --> 00:06:37,039
internal purposes as a political party

201
00:06:35,520 --> 00:06:39,919
the number of ballot papers is also

202
00:06:37,039 --> 00:06:39,919
verified against the number of

203
00:06:39,919 --> 00:06:43,840
ballot papers issued and then once this

204
00:06:42,160 --> 00:06:44,639
is all done the numbers are phoned

205
00:06:43,840 --> 00:06:46,800
through

206
00:06:44,639 --> 00:06:48,479
to wherever the data entry operates for

207
00:06:46,800 --> 00:06:50,720
the electoral commission and the

208
00:06:48,479 --> 00:06:52,960
tallies for that polling place are

209
00:06:50,720 --> 00:06:56,160
entered so this is how the process

210
00:06:52,960 --> 00:06:58,639
occurs it's done the old-fashioned way

211
00:06:56,160 --> 00:07:00,639
bits of ballot paper put onto piles

212
00:06:58,639 --> 00:07:02,560
and at the end of that process

213
00:07:00,639 --> 00:07:04,479
they have a whole bunch of bundles of

214
00:07:02,560 --> 00:07:05,599
votes usually all bundled up into lumps

215
00:07:04,479 --> 00:07:07,120
of 100

216
00:07:05,599 --> 00:07:08,880
so they've got a tally of first

217
00:07:07,120 --> 00:07:11,680
preferences in every polling place and

218
00:07:08,880 --> 00:07:13,199
that's phoned through

219
00:07:11,680 --> 00:07:15,919
what they've done now in australia since

220
00:07:13,199 --> 00:07:16,960
1993 and thankfully i didn't work before

221
00:07:15,919 --> 00:07:18,560
then because this would have been much

222
00:07:16,960 --> 00:07:20,319
harder

223
00:07:18,560 --> 00:07:22,000
is they do what's called an indicative

224
00:07:20,319 --> 00:07:23,520
preference count

225
00:07:22,000 --> 00:07:24,720
beforehand the electoral commission

226
00:07:23,520 --> 00:07:25,759
nominates

227
00:07:24,720 --> 00:07:27,919
um

228
00:07:25,759 --> 00:07:29,919
two candidates in every contest who will

229
00:07:27,919 --> 00:07:31,520
be the final pairing so they don't do

230
00:07:29,919 --> 00:07:33,199
the full distribution of preferences

231
00:07:31,520 --> 00:07:35,280
what they do is they nominate two

232
00:07:33,199 --> 00:07:36,880
candidates at the start

233
00:07:35,280 --> 00:07:38,960
they're in an envelope they open them so

234
00:07:36,880 --> 00:07:41,440
they know this after 6pm this is you

235
00:07:38,960 --> 00:07:43,280
can't know this information before 6pm

236
00:07:41,440 --> 00:07:46,560
it's only an indicative count so it's

237
00:07:43,280 --> 00:07:48,240
not important for anybody else to know

238
00:07:46,560 --> 00:07:50,560
then they examine all the bundles of

239
00:07:48,240 --> 00:07:52,240
ballot papers for the other candidates

240
00:07:50,560 --> 00:07:54,319
they go through all those and they work

241
00:07:52,240 --> 00:07:57,120
out from that list of candidates which

242
00:07:54,319 --> 00:07:58,879
of the final two gets the

243
00:07:57,120 --> 00:08:01,280
lower numbered preference on that ballot

244
00:07:58,879 --> 00:08:03,199
paper which candidate receives the

245
00:08:01,280 --> 00:08:04,960
preference from that ballot paper they

246
00:08:03,199 --> 00:08:06,879
tally all those preference flows for

247
00:08:04,960 --> 00:08:08,800
each candidate they add them to the

248
00:08:06,879 --> 00:08:10,960
first preference votes for the two final

249
00:08:08,800 --> 00:08:13,440
two candidates and then they phone those

250
00:08:10,960 --> 00:08:14,879
numbers through so from each polling

251
00:08:13,440 --> 00:08:17,599
place and each pre-poll center on

252
00:08:14,879 --> 00:08:21,039
election night we get two totals the

253
00:08:17,599 --> 00:08:22,479
first one is that first preference tally

254
00:08:21,039 --> 00:08:24,879
and the second one

255
00:08:22,479 --> 00:08:26,960
is the two-party preferred or two

256
00:08:24,879 --> 00:08:28,639
candidate preferred two-party preferred

257
00:08:26,960 --> 00:08:30,639
tends to refer to labor versus the

258
00:08:28,639 --> 00:08:32,560
coalition the two major parties

259
00:08:30,639 --> 00:08:34,080
sometimes a major party is excluded and

260
00:08:32,560 --> 00:08:36,159
we have what's called two candidate

261
00:08:34,080 --> 00:08:38,240
preferred in the end it all comes down

262
00:08:36,159 --> 00:08:39,760
to two candidates in every seat and it's

263
00:08:38,240 --> 00:08:41,599
worked out beforehand who are the most

264
00:08:39,760 --> 00:08:42,959
likely final two for this count if they

265
00:08:41,599 --> 00:08:44,880
get the count wrong they do it again after

266
00:08:42,959 --> 00:08:47,040
the election but this is just done for

267
00:08:44,880 --> 00:08:48,959
extra information to help us understand

268
00:08:47,040 --> 00:08:50,720
this came about after the 1990 federal

269
00:08:48,959 --> 00:08:52,240
election when there's a huge vote for

270
00:08:50,720 --> 00:08:54,560
the democrats nobody had preference

271
00:08:52,240 --> 00:08:56,160
counts and it was unclear who'd won so

272
00:08:54,560 --> 00:08:58,399
they introduced this process for the

273
00:08:56,160 --> 00:08:59,760
future and as the proportion of vote

274
00:08:58,399 --> 00:09:01,519
that's gone to minor parties has

275
00:08:59,760 --> 00:09:03,279
increased this has become ever more

276
00:09:01,519 --> 00:09:04,880
important
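A minimal python sketch of the indicative preference count described above: every ballot is credited to whichever of the two nominated candidates is ranked higher on it. The function name and the ballot representation (a full ranking, most preferred first) are assumptions for illustration, not the electoral commission's own system.

```python
def two_candidate_preferred(ballots, pair):
    """Count each ballot for whichever of the nominated pair it ranks higher.

    ballots: list of full rankings, most preferred first (hypothetical format).
    pair: the two candidates nominated as the likely final pairing.
    """
    first, second = pair
    totals = {first: 0, second: 0}
    for ranking in ballots:
        # A lower position in the ranking means a lower preference number.
        if ranking.index(first) < ranking.index(second):
            totals[first] += 1
        else:
            totals[second] += 1
    return totals


ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
totals = two_candidate_preferred(ballots, ("A", "B"))
print(totals)
```

Ballots whose first preference is already one of the pair simply stay with that candidate, so the result is the first preference tally for the pair plus the preference flows from everyone else.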

277
00:09:03,279 --> 00:09:06,000
what happens then is these numbers are

278
00:09:04,880 --> 00:09:08,480
phoned through

279
00:09:06,000 --> 00:09:10,080
data entered into the aec's computer

280
00:09:08,480 --> 00:09:13,120
system transmitted to their central

281
00:09:10,080 --> 00:09:15,120
server put in a database and that runs

282
00:09:13,120 --> 00:09:16,720
the aec's virtual tally room if you've

283
00:09:15,120 --> 00:09:18,880
used it online

284
00:09:16,720 --> 00:09:21,279
there's also an xml strip published

285
00:09:18,880 --> 00:09:24,080
every two minutes of all the data

286
00:09:21,279 --> 00:09:25,279
so it's a total for every candidate in

287
00:09:24,080 --> 00:09:26,959
every electorate

288
00:09:25,279 --> 00:09:29,040
both first preference and two candidate

289
00:09:26,959 --> 00:09:31,040
preferred and those first preferences

290
00:09:29,040 --> 00:09:33,120
and two candidate are also reported by

291
00:09:31,040 --> 00:09:34,880
polling place so we have this whacking

292
00:09:33,120 --> 00:09:36,640
great file with all that information in

293
00:09:34,880 --> 00:09:38,480
there there's various versions of it

294
00:09:36,640 --> 00:09:40,080
there's what's called the verbose versions

295
00:09:38,480 --> 00:09:42,240
which have all the strings like names

296
00:09:40,080 --> 00:09:43,920
and there's other ones that just have the

297
00:09:42,240 --> 00:09:46,640
votes

298
00:09:43,920 --> 00:09:48,240
the files also contain historical data

299
00:09:46,640 --> 00:09:50,720
for each polling place and for each

300
00:09:48,240 --> 00:09:52,959
candidate and that is made available to

301
00:09:50,720 --> 00:09:54,959
the media as well beforehand but

302
00:09:52,959 --> 00:09:57,040
you can use it directly from the xml

303
00:09:54,959 --> 00:09:59,200
file if you don't have a database but

304
00:09:57,040 --> 00:10:00,880
for various reasons we in the media

305
00:09:59,200 --> 00:10:05,519
prefer to have the data in our own

306
00:10:00,880 --> 00:10:08,000
system and do our own calculations

307
00:10:05,519 --> 00:10:10,160
the abc computer is pre-loaded with all

308
00:10:08,000 --> 00:10:12,560
the electorates and candidate details

309
00:10:10,160 --> 00:10:14,320
polling place details including history

310
00:10:12,560 --> 00:10:16,640
estimated preference formulas that can

311
00:10:14,320 --> 00:10:17,680
be used until actual preference counts

312
00:10:16,640 --> 00:10:19,279
are received

313
00:10:17,680 --> 00:10:21,680
and where we judge the aec's picked the

314
00:10:19,279 --> 00:10:23,200
wrong pairing of candidates

315
00:10:21,680 --> 00:10:24,959
as well as calculation information the

316
00:10:23,200 --> 00:10:27,600
database includes attributes

317
00:10:24,959 --> 00:10:30,079
that determine party colors ordering

318
00:10:27,600 --> 00:10:32,160
details for graphics for online and for

319
00:10:30,079 --> 00:10:34,240
picture names the abc's database

320
00:10:32,160 --> 00:10:36,000
provides the structure through which

321
00:10:34,240 --> 00:10:38,079
we're able to analyze the data we're

322
00:10:36,000 --> 00:10:39,920
just getting raw data we have to have

323
00:10:38,079 --> 00:10:42,880
our own structure which we use to

324
00:10:39,920 --> 00:10:46,000
analyze the data and aggregate the data

325
00:10:42,880 --> 00:10:48,000
now the abc parses the aec data file

326
00:10:46,000 --> 00:10:50,000
strips out the results by polling place

327
00:10:48,000 --> 00:10:52,480
and electorate and stores them in our system

328
00:10:50,000 --> 00:10:54,240
we check it for obvious calculation

329
00:10:52,480 --> 00:10:56,160
errors and then our abc computer

330
00:10:54,240 --> 00:10:58,399
performs its predictive calculations on

331
00:10:56,160 --> 00:11:01,200
all seats after every

332
00:10:58,399 --> 00:11:03,600
update and after any internal parameters

333
00:11:01,200 --> 00:11:05,120
change the computer also generates json

334
00:11:03,600 --> 00:11:07,200
output which is used for television

335
00:11:05,120 --> 00:11:08,880
graphics and for publishing our abc

336
00:11:07,200 --> 00:11:10,320
online system now i'm going to turn our

337
00:11:08,880 --> 00:11:12,240
camera off because we've been um we're

338
00:11:10,320 --> 00:11:16,320
having some camera problems but i'll

339
00:11:12,240 --> 00:11:18,240
continue with the slide presentation

340
00:11:16,320 --> 00:11:19,279
um

341
00:11:18,240 --> 00:11:21,440
now

342
00:11:19,279 --> 00:11:22,320
let's call this camera

343
00:11:21,440 --> 00:11:24,560
now

344
00:11:22,320 --> 00:11:26,320
post election counting um

345
00:11:24,560 --> 00:11:28,880
this is

346
00:11:26,320 --> 00:11:30,399
after election night so this is i'm

347
00:11:28,880 --> 00:11:31,680
raising these issues just because people

348
00:11:30,399 --> 00:11:33,600
always have doubts of how the election

349
00:11:31,680 --> 00:11:35,519
counting works and stuff this is just

350
00:11:33,600 --> 00:11:37,600
to tell you that the system has lots of

351
00:11:35,519 --> 00:11:39,200
other checks in all votes counted on

352
00:11:37,600 --> 00:11:40,640
election night are transferred overnight

353
00:11:39,200 --> 00:11:41,760
to the returning officer for the

354
00:11:40,640 --> 00:11:43,600
district

355
00:11:41,760 --> 00:11:45,279
all first preference and two candidate

356
00:11:43,600 --> 00:11:47,440
tallies are check counted over several

357
00:11:45,279 --> 00:11:49,920
days the count is conducted by different

358
00:11:47,440 --> 00:11:51,360
staff and again watched by scrutineers

359
00:11:49,920 --> 00:11:53,200
not necessarily the same ones that

360
00:11:51,360 --> 00:11:55,519
watched on the night the indicative

361
00:11:53,200 --> 00:11:57,519
preference counts are redone

362
00:11:55,519 --> 00:12:00,000
and they're redone entirely if the wrong

363
00:11:57,519 --> 00:12:02,000
candidates were chosen postals absents

364
00:12:00,000 --> 00:12:04,240
and provisionals are added over

365
00:12:02,000 --> 00:12:06,079
a fortnight after the election

366
00:12:04,240 --> 00:12:08,480
after a fortnight a full distribution of

367
00:12:06,079 --> 00:12:11,760
preferences is done which is effectively

368
00:12:08,480 --> 00:12:14,160
a third count of all

369
00:12:11,760 --> 00:12:16,800
votes for minor parties and if not

370
00:12:14,160 --> 00:12:18,320
already done a formal winner is declared

371
00:12:16,800 --> 00:12:19,120
at this point

372
00:12:18,320 --> 00:12:20,959
now

373
00:12:19,120 --> 00:12:23,120
just to explain how we've got to this

374
00:12:20,959 --> 00:12:26,000
point over the years

375
00:12:23,120 --> 00:12:28,160
the aec's first computers uh started to

376
00:12:26,000 --> 00:12:29,839
use them in 1980s we had to be in the

377
00:12:28,160 --> 00:12:31,120
tally room to get the data by an xon

378
00:12:29,839 --> 00:12:33,600
xoff feed

379
00:12:31,120 --> 00:12:36,160
um they began to add historical data to

380
00:12:33,600 --> 00:12:39,519
the feed in 1990 um preference counts

381
00:12:36,160 --> 00:12:41,200
were added in 1993 they switched to xml

382
00:12:39,519 --> 00:12:42,160
as the export format rather than the old

383
00:12:41,200 --> 00:12:44,880
feed

384
00:12:42,160 --> 00:12:47,440
published to an ftp site

385
00:12:44,880 --> 00:12:50,160
in the mid 2000s the last physical

386
00:12:47,440 --> 00:12:53,040
tally room was in 2010

387
00:12:50,160 --> 00:12:54,959
the abc had originally a pdp-11 system

388
00:12:53,040 --> 00:12:57,680
when i started it was turned into a pc

389
00:12:54,959 --> 00:12:59,440
network running unix later linux

390
00:12:57,680 --> 00:13:01,519
um

391
00:12:59,440 --> 00:13:04,480
it was a memory map it you know there

392
00:13:01,519 --> 00:13:07,839
was um written in c first gui interface

393
00:13:04,480 --> 00:13:10,000
was 2008 uh it was rewritten in dot net

394
00:13:07,839 --> 00:13:11,440
c plus plus in about 2010

395
00:13:10,000 --> 00:13:14,160
and most recently last year was

396
00:13:11,440 --> 00:13:16,399
rewritten to run on an amazon server um

397
00:13:14,160 --> 00:13:18,639
with various other bits and pieces

398
00:13:16,399 --> 00:13:20,480
so it's been a long history of changes

399
00:13:18,639 --> 00:13:22,880
which has been sort of mirroring the way

400
00:13:20,480 --> 00:13:23,760
the internet has changed largely

401
00:13:22,880 --> 00:13:25,600
now

402
00:13:23,760 --> 00:13:27,279
election night the statistical problem

403
00:13:25,600 --> 00:13:29,120
let me start with a basic statistics

404
00:13:27,279 --> 00:13:30,560
question if you have a large bag full of

405
00:13:29,120 --> 00:13:32,320
a thousand black and white ping pong

406
00:13:30,560 --> 00:13:34,480
balls the balls have been thoroughly

407
00:13:32,320 --> 00:13:35,920
mixed what size sample do you need to

408
00:13:34,480 --> 00:13:38,480
draw to be certain of the ratio of

409
00:13:35,920 --> 00:13:40,720
black and white balls now the election

410
00:13:38,480 --> 00:13:42,399
night question is similar at 6 pm on

411
00:13:40,720 --> 00:13:44,800
election night each electorate has a hundred

412
00:13:42,399 --> 00:13:46,720
thousand ballot papers sealed in ballot boxes

413
00:13:44,800 --> 00:13:48,480
and envelopes how many have to be

414
00:13:46,720 --> 00:13:51,360
counted and reported before we can be

415
00:13:48,480 --> 00:13:54,079
certain of the ratio of labor to liberal

416
00:13:51,360 --> 00:13:56,079
votes now the two problems are the same

417
00:13:54,079 --> 00:13:58,000
except with election night we have to

418
00:13:56,079 --> 00:13:59,839
take steps to account for the samples

419
00:13:58,000 --> 00:14:01,279
not being random

420
00:13:59,839 --> 00:14:03,519
we also have the advantage that the

421
00:14:01,279 --> 00:14:05,920
figures are progressive so each update

422
00:14:03,519 --> 00:14:07,839
is a larger sample effectively but

423
00:14:05,920 --> 00:14:08,720
it's a cluster sample not a random

424
00:14:07,839 --> 00:14:10,320
sample

425
00:14:08,720 --> 00:14:12,959
but we do have the advantage of these

426
00:14:10,320 --> 00:14:15,040
clusters we know what the voting was in

427
00:14:12,959 --> 00:14:17,920
them last time so it's the same

428
00:14:15,040 --> 00:14:19,839
basic statistics with a whole bunch of

429
00:14:17,920 --> 00:14:20,800
statistical methods to try and overcome

430
00:14:19,839 --> 00:14:22,560
the fact

431
00:14:20,800 --> 00:14:25,440
that um you have to account for the data

432
00:14:22,560 --> 00:14:27,680
not being random
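For the ping pong ball version of the question, where the sample really is random, the uncertainty in the observed ratio can be sketched with the textbook standard error of a proportion plus a finite population correction. This is a sketch under that random-sampling assumption only; the function name is hypothetical, and as the talk stresses, polling place results are a cluster sample, so this formula does not carry over to election night unchanged.

```python
import math


def proportion_std_error(p_hat, sample_size, population_size):
    """Standard error of a sample proportion drawn without replacement.

    Assumes a simple random sample, which holds for well mixed balls in a
    bag but not for votes reported polling place by polling place.
    """
    if sample_size >= population_size:
        return 0.0  # everything has been counted, nothing left to estimate
    base_variance = p_hat * (1.0 - p_hat) / sample_size
    correction = (population_size - sample_size) / (population_size - 1)
    return math.sqrt(base_variance * correction)


# 1000 balls in the bag, 100 drawn, half of them black.
print(proportion_std_error(0.5, 100, 1000))
```

The error shrinks as more balls are drawn and reaches zero once the whole bag has been counted, so certainty in the strict sense only comes with a full count.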

433
00:14:25,440 --> 00:14:29,519
now predicting every electorate

434
00:14:27,680 --> 00:14:31,440
um

435
00:14:29,519 --> 00:14:33,360
every every electorate will finish with

436
00:14:31,440 --> 00:14:35,920
a final value capital p which is the winning

437
00:14:33,360 --> 00:14:37,519
candidate's proportion of all votes we

438
00:14:35,920 --> 00:14:39,600
often call this two-party preferred or

439
00:14:37,519 --> 00:14:41,519
two candidate preferred and we express

440
00:14:39,600 --> 00:14:43,680
it as a percentage 55 percent

441
00:14:41,519 --> 00:14:45,279
two candidate preferred at all times in

442
00:14:43,680 --> 00:14:47,519
the count we have a point estimate a

443
00:14:45,279 --> 00:14:49,279
little p and that's our current estimate

444
00:14:47,519 --> 00:14:51,680
of the final result

445
00:14:49,279 --> 00:14:53,600
now our techniques are to minimize

446
00:14:51,680 --> 00:14:55,680
statistical bias

447
00:14:53,600 --> 00:14:57,440
that is things that make p an unreliable

448
00:14:55,680 --> 00:14:59,120
estimator

449
00:14:57,440 --> 00:15:00,800
and we also want to use methods that

450
00:14:59,120 --> 00:15:02,880
minimize the amount of variance the

451
00:15:00,800 --> 00:15:04,320
amount of variability the amount of

452
00:15:02,880 --> 00:15:05,279
moving up and down that the graphs will

453
00:15:04,320 --> 00:15:07,360
do

454
00:15:05,279 --> 00:15:08,639
and at all

455
00:15:07,360 --> 00:15:10,639
points in the count we construct a

456
00:15:08,639 --> 00:15:12,560
confidence interval for p having

457
00:15:10,639 --> 00:15:14,480
minimized bias and variance

458
00:15:12,560 --> 00:15:16,320
and calculate the probability that p is

459
00:15:14,480 --> 00:15:18,560
greater than 50 percent and if it's

460
00:15:16,320 --> 00:15:20,800
greater than 50 percent with a 99 percent

461
00:15:18,560 --> 00:15:22,720
probability we make the binary decision

462
00:15:20,800 --> 00:15:24,480
to give it one side or the other

463
00:15:22,720 --> 00:15:25,680
and the overall result of the election

464
00:15:24,480 --> 00:15:27,760
is just simply the sum of the

465
00:15:25,680 --> 00:15:30,320
probabilities in individual seats with

466
00:15:27,760 --> 00:15:32,160
an error margin so this means

467
00:15:30,320 --> 00:15:33,680
all the effort goes into the individual

468
00:15:32,160 --> 00:15:35,519
seat results not into the overall

469
00:15:33,680 --> 00:15:37,600
prediction the overall prediction just

470
00:15:35,519 --> 00:15:39,839
falls out of the mathematics in each

471
00:15:37,600 --> 00:15:41,360
seat
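
The idea that the overall result falls out of the individual seats can be sketched as a sum of per-seat win probabilities. The error margin below treats seats as independent, which is a simplifying assumption made here and not something stated in the talk; the function name and the probabilities are hypothetical.

```python
import math


def seat_projection(win_probs):
    """Expected seats for one party, plus a one standard deviation margin.

    win_probs holds the per-seat probability that the party wins.
    Seats are treated as independent (an assumption for illustration),
    so the variance is the sum of p * (1 - p) over all seats.
    """
    expected = sum(win_probs)
    variance = sum(p * (1.0 - p) for p in win_probs)
    return expected, math.sqrt(variance)


# Hypothetical chamber: two certain wins, one coin toss, one certain loss.
expected, margin = seat_projection([1.0, 1.0, 0.5, 0.0])
print(expected, margin)
```

Seats that are already certain add nothing to the margin, so all of the uncertainty comes from the seats still in doubt.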

472
00:15:39,839 --> 00:15:42,880
now we have a number of underlying

473
00:15:41,360 --> 00:15:44,480
assumptions one is that it tends to be a

474
00:15:42,880 --> 00:15:46,720
uniform swing

475
00:15:44,480 --> 00:15:48,720
from polling place to polling place and

476
00:15:46,720 --> 00:15:50,560
as i'll show in a while there is there

477
00:15:48,720 --> 00:15:51,920
you can make that assumption

478
00:15:50,560 --> 00:15:53,600
and um

479
00:15:51,920 --> 00:15:55,279
we also assume most people vote at the same

480
00:15:53,600 --> 00:15:57,360
location as last time though that's

481
00:15:55,279 --> 00:15:59,440
begun to change in recent years with the

482
00:15:57,360 --> 00:16:00,639
increase in pre-poll voting

483
00:15:59,440 --> 00:16:02,320
just to comment

484
00:16:00,639 --> 00:16:04,800
comparing with the us we have much

485
00:16:02,320 --> 00:16:06,160
higher quality data here we have a much

486
00:16:04,800 --> 00:16:08,160
simpler issue because we have a two

487
00:16:06,160 --> 00:16:10,560
candidate contest in the u.s you have to

488
00:16:08,160 --> 00:16:12,959
try and predict when the gap between the

489
00:16:10,560 --> 00:16:15,519
two candidates stabilizes we can work on

490
00:16:12,959 --> 00:16:16,560
this figure of being over 50

491
00:16:15,519 --> 00:16:19,360
um

492
00:16:16,560 --> 00:16:21,759
now what are the sources of error

493
00:16:19,360 --> 00:16:24,639
that we've got here we've got bias

494
00:16:21,759 --> 00:16:27,279
in elections in that many rural and

495
00:16:24,639 --> 00:16:29,199
especially mixed rural urban seats

496
00:16:27,279 --> 00:16:31,120
display a strong positive correlation

497
00:16:29,199 --> 00:16:33,279
between booth size and labor two-party

498
00:16:31,120 --> 00:16:35,519
preferred vote small rural polling

499
00:16:33,279 --> 00:16:37,839
places record lower labor vote than

500
00:16:35,519 --> 00:16:39,600
large urban ones now this matters

501
00:16:37,839 --> 00:16:42,160
because small polling places are quicker

502
00:16:39,600 --> 00:16:44,959
to count and first to report and

503
00:16:42,160 --> 00:16:47,199
therefore there's a resultant labor vote

504
00:16:44,959 --> 00:16:50,160
low labor vote early in the evening

505
00:16:47,199 --> 00:16:51,759
which is statistical bias in in terms of

506
00:16:50,160 --> 00:16:53,440
our point estimator one thing i'll say

507
00:16:51,759 --> 00:16:55,519
about this was um if you're familiar

508
00:16:53,440 --> 00:16:57,680
with the play don's party that all

509
00:16:55,519 --> 00:17:00,000
began with labor vote high and falling

510
00:16:57,680 --> 00:17:03,839
through the night that used to happen

511
00:17:00,000 --> 00:17:05,919
before 1987 in 1987 they introduced the

512
00:17:03,839 --> 00:17:08,000
change so that all polling places

513
00:17:05,919 --> 00:17:09,439
counted at the start of the night before

514
00:17:08,000 --> 00:17:11,439
then because of lack of phones and the

515
00:17:09,439 --> 00:17:12,799
things a lot of country polling places

516
00:17:11,439 --> 00:17:14,559
were brought to town before they were

517
00:17:12,799 --> 00:17:17,120
counted and that meant the labor vote

518
00:17:14,559 --> 00:17:18,559
was high and fell through the night 1987

519
00:17:17,120 --> 00:17:20,480
was the first election where that trend

520
00:17:18,559 --> 00:17:21,760
reversed and that caused a lot of

521
00:17:20,480 --> 00:17:23,839
embarrassment for a lot of people who

522
00:17:21,760 --> 00:17:26,000
went on past trends and said that john

523
00:17:23,839 --> 00:17:27,520
howard had won the 87 election

524
00:17:26,000 --> 00:17:29,280
which he didn't

525
00:17:27,520 --> 00:17:31,440
now the other thing is variance past

526
00:17:29,280 --> 00:17:33,280
results can be used to calculate a range

527
00:17:31,440 --> 00:17:35,840
of variance of polling place results in

528
00:17:33,280 --> 00:17:38,080
each seat seats with lower variance can be

529
00:17:35,840 --> 00:17:40,720
given away more quickly

530
00:17:38,080 --> 00:17:42,559
and and we also aim to adopt a

531
00:17:40,720 --> 00:17:43,760
statistical method

532
00:17:42,559 --> 00:17:47,280
where

533
00:17:43,760 --> 00:17:48,480
the variance of the vote the variance is

534
00:17:47,280 --> 00:17:50,640
smaller

535
00:17:48,480 --> 00:17:52,960
and that happens to be with swing

536
00:17:50,640 --> 00:17:55,200
rather than vote so rather than using

537
00:17:52,960 --> 00:17:56,240
the vote the current vote to predict the

538
00:17:55,200 --> 00:17:58,160
result

539
00:17:56,240 --> 00:17:59,840
we use the change in vote

540
00:17:58,160 --> 00:18:01,520
from the same polling place as last time

541
00:17:59,840 --> 00:18:03,360
as a way i'll explain in a moment the

542
00:18:01,520 --> 00:18:05,360
americans always do this as statics they

543
00:18:03,360 --> 00:18:07,919
start they report the numbers report the

544
00:18:05,360 --> 00:18:09,840
numbers we in australia report the swing

545
00:18:07,919 --> 00:18:11,760
and show the projected value and it's

546
00:18:09,840 --> 00:18:13,679
much more reliable and it's why

547
00:18:11,760 --> 00:18:15,280
um you'll have that bloke running the

548
00:18:13,679 --> 00:18:17,280
screens in the united states talking

549
00:18:15,280 --> 00:18:19,360
about well the democrats are ahead but

550
00:18:17,280 --> 00:18:21,440
when this area comes in the panhandle

551
00:18:19,360 --> 00:18:23,200
of florida comes in the republicans will

552
00:18:21,440 --> 00:18:25,280
lead we don't have to talk about numbers

553
00:18:23,200 --> 00:18:27,679
here because we do that projection in

554
00:18:25,280 --> 00:18:30,080
how we present our data now to explain

555
00:18:27,679 --> 00:18:32,160
how all this works i'll move on to the

556
00:18:30,080 --> 00:18:33,200
example of an electorate

557
00:18:32,160 --> 00:18:35,120
and

558
00:18:33,200 --> 00:18:36,640
this is the electorate of braddon

559
00:18:35,120 --> 00:18:38,080
in tasmania

560
00:18:36,640 --> 00:18:39,520
just let me find my notes because i had

561
00:18:38,080 --> 00:18:41,360
a bit of scurrying around there for a

562
00:18:39,520 --> 00:18:42,559
few minutes

563
00:18:41,360 --> 00:18:44,320
braddon's up in the northwest of

564
00:18:42,559 --> 00:18:47,039
tasmania you can see it's a it's got a

565
00:18:44,320 --> 00:18:49,679
lot of blue dots a few red dots

566
00:18:47,039 --> 00:18:51,520
it's a traditional swing seat it's

567
00:18:49,679 --> 00:18:53,440
changed parties at six of the last

568
00:18:51,520 --> 00:18:55,039
eight elections a lot of big urban

569
00:18:53,440 --> 00:18:57,520
booths a lot of small

570
00:18:55,039 --> 00:19:00,240
country booths which have strong liberal

571
00:18:57,520 --> 00:19:02,320
votes and a smaller number of small

572
00:19:00,240 --> 00:19:05,120
labor booths on the west coast former

573
00:19:02,320 --> 00:19:06,400
mining towns so this is a uh this is an

574
00:19:05,120 --> 00:19:09,120
electorate where you want to know what

575
00:19:06,400 --> 00:19:11,840
the data is like and where it's from

576
00:19:09,120 --> 00:19:14,799
this is a scatter plot from the 2019

577
00:19:11,840 --> 00:19:16,720
election the the grey dots are the

578
00:19:14,799 --> 00:19:19,600
two-party preferreds

579
00:19:16,720 --> 00:19:22,640
um for each uh

580
00:19:19,600 --> 00:19:25,360
each polling place now the yellow area

581
00:19:22,640 --> 00:19:26,720
is the two standard

582
00:19:25,360 --> 00:19:28,559
standard error confidence interval

583
00:19:26,720 --> 00:19:31,039
ninety-five percent confidence interval

584
00:19:28,559 --> 00:19:33,280
the variance on these polling places is

585
00:19:31,039 --> 00:19:35,280
eight point nine percent so two standard

586
00:19:33,280 --> 00:19:37,200
errors there's a seventeen percent range

587
00:19:35,280 --> 00:19:39,120
of results to have ninety-five percent

588
00:19:37,200 --> 00:19:40,880
of the polling places now i'll show you

589
00:19:39,120 --> 00:19:42,320
the two attributes with my mouse the

590
00:19:40,880 --> 00:19:44,559
first is you'll notice there's a bigger

591
00:19:42,320 --> 00:19:46,880
variance at the start and that's because

592
00:19:44,559 --> 00:19:48,480
a lot more small polling places small

593
00:19:46,880 --> 00:19:50,720
polling places are usually from small

594
00:19:48,480 --> 00:19:52,480
towns which are more homogenous than

595
00:19:50,720 --> 00:19:54,000
larger urban centers so there's a bit

596
00:19:52,480 --> 00:19:55,679
more variance particularly in a seat

597
00:19:54,000 --> 00:19:57,600
like this you'll also notice there's

598
00:19:55,679 --> 00:20:01,520
some bias there's a lot more of these

599
00:19:57,600 --> 00:20:03,840
dots above the 53.1 which was the final

600
00:20:01,520 --> 00:20:06,320
result of the election so on those early

601
00:20:03,840 --> 00:20:08,000
figures if the small booths come in first

602
00:20:06,320 --> 00:20:09,360
you're going to have a lot more

603
00:20:08,000 --> 00:20:10,640
variability

604
00:20:09,360 --> 00:20:12,240
you're also getting a lot of bias

605
00:20:10,640 --> 00:20:14,720
because there's a lot more liberal votes

606
00:20:12,240 --> 00:20:16,960
there in those early figures so if i go

607
00:20:14,720 --> 00:20:18,880
on to the next graph this graph is

608
00:20:16,960 --> 00:20:21,159
the same dots it's a bigger scale but

609
00:20:18,880 --> 00:20:22,799
it's the same dots and over it i've

610
00:20:21,159 --> 00:20:25,600
superimposed

611
00:20:22,799 --> 00:20:27,600
the progressive two candidate preferred

612
00:20:25,600 --> 00:20:29,440
two-party preferred from those polling

613
00:20:27,600 --> 00:20:30,320
places as they come in and as you can

614
00:20:29,440 --> 00:20:32,000
see

615
00:20:30,320 --> 00:20:35,039
in this area here there's a lot of

616
00:20:32,000 --> 00:20:36,720
variability it bounces around it's also

617
00:20:35,039 --> 00:20:38,559
biased towards the liberal party which

618
00:20:36,720 --> 00:20:41,120
is the area above 50

619
00:20:38,559 --> 00:20:43,600
so for quite a long while and these

620
00:20:41,120 --> 00:20:45,520
these graphs are in

621
00:20:43,600 --> 00:20:47,679
in polling place size order i've

622
00:20:45,520 --> 00:20:50,080
arranged them from smallest to largest

623
00:20:47,679 --> 00:20:51,919
which is not how it comes in i'll show

624
00:20:50,080 --> 00:20:54,400
you another example later but this is

625
00:20:51,919 --> 00:20:57,679
the worst case scenario and it takes a

626
00:20:54,400 --> 00:20:59,440
long time for this figure to settle down
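
The early bias in the progressive total can be reproduced with a toy example. In the sketch below the booths and their vote counts are invented, and they report in size order from smallest to largest, which is the worst case described above; the function name is hypothetical.

```python
def progressive_2cp(booths):
    """Running two candidate preferred percentage as booths report.

    booths is a list of (votes_for_party, total_votes) tuples in the
    order the booths report. Returns the cumulative percentage after
    each booth has been added.
    """
    running_for = 0
    running_total = 0
    trail = []
    for votes_for, total in booths:
        running_for += votes_for
        running_total += total
        trail.append(100.0 * running_for / running_total)
    return trail


# Invented booths, smallest first: small country booths lean strongly
# one way, the large urban booth that reports last is close to even.
booths = [(70, 100), (130, 200), (420, 800), (980, 2000)]
trail = progressive_2cp(booths)
for step, value in enumerate(trail, start=1):
    print(step, round(value, 1))
```

With these numbers the running figure starts well above the final result and only drifts down to it once the large booth is included, which is the same shape as the black line on the graph.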

627
00:20:57,679 --> 00:21:00,640
uh just let me consult my notes if

628
00:20:59,440 --> 00:21:01,679
there's anything else i've got to say

629
00:21:00,640 --> 00:21:03,919
here

630
00:21:01,679 --> 00:21:06,320
no that's why that's the key thing that

631
00:21:03,919 --> 00:21:08,640
you've always got to watch for is this

632
00:21:06,320 --> 00:21:10,320
early figure now

633
00:21:08,640 --> 00:21:12,559
this next graph actually i'll stay on

634
00:21:10,320 --> 00:21:14,240
this graph um

635
00:21:12,559 --> 00:21:16,080
dr ross cunningham back in the 80s

636
00:21:14,240 --> 00:21:18,400
worked out a method to correct for this

637
00:21:16,080 --> 00:21:20,080
bias if you looked at the every one of

638
00:21:18,400 --> 00:21:22,080
these electorates has a characteristic

639
00:21:20,080 --> 00:21:24,000
curve like that

640
00:21:22,080 --> 00:21:26,240
he went away and he worked out what's

641
00:21:24,000 --> 00:21:28,480
called a bias corrective method he he

642
00:21:26,240 --> 00:21:30,960
plotted what that characteristic curve

643
00:21:28,480 --> 00:21:33,840
normally looked like and corrected

644
00:21:30,960 --> 00:21:35,520
for it he then built in for many seats

645
00:21:33,840 --> 00:21:37,200
and built in an overall regression model

646
00:21:35,520 --> 00:21:39,280
to pick the winner he was less concerned

647
00:21:37,200 --> 00:21:40,960
about picking individual seats than

648
00:21:39,280 --> 00:21:42,320
correcting for bias for an overall

649
00:21:40,960 --> 00:21:44,400
regression model to pick the winner of

650
00:21:42,320 --> 00:21:46,000
the election we've we've adapted his

651
00:21:44,400 --> 00:21:48,480
model to be more about picking

652
00:21:46,000 --> 00:21:50,159
individual seats and for the newer

653
00:21:48,480 --> 00:21:51,600
methods which are now available we don't

654
00:21:50,159 --> 00:21:53,280
need to do we're not as reliant on

655
00:21:51,600 --> 00:21:55,600
regression now

656
00:21:53,280 --> 00:21:58,159
this is the same data from 2019 and

657
00:21:55,600 --> 00:21:59,120
against it i've plotted the progressive

658
00:21:58,159 --> 00:22:01,520
numbers

659
00:21:59,120 --> 00:22:02,559
for 2016 which is the green line at the

660
00:22:01,520 --> 00:22:04,480
bottom

661
00:22:02,559 --> 00:22:05,679
it's the same data

662
00:22:04,480 --> 00:22:07,679
now

663
00:22:05,679 --> 00:22:09,039
what you can see here is if you can know

664
00:22:07,679 --> 00:22:11,200
what that graph is going to look like

665
00:22:09,039 --> 00:22:13,280
from last time you can plot where this

666
00:22:11,200 --> 00:22:15,360
progression's going where this figure is

667
00:22:13,280 --> 00:22:18,320
going to end up there's a very good

668
00:22:15,360 --> 00:22:19,679
match between those two numbers uh that

669
00:22:18,320 --> 00:22:22,400
we're seeing here

670
00:22:19,679 --> 00:22:24,159
let me get my graphs yes um if you look

671
00:22:22,400 --> 00:22:26,480
at the gap between the two lines you can

672
00:22:24,159 --> 00:22:28,080
see that this this gap is stable

673
00:22:26,480 --> 00:22:29,760
if you know where this green line is

674
00:22:28,080 --> 00:22:32,240
going you know where the black line is

675
00:22:29,760 --> 00:22:34,400
going and that's that's

676
00:22:32,240 --> 00:22:36,000
um that's the method we use to call the

677
00:22:34,400 --> 00:22:37,440
election now the other thing that's to

678
00:22:36,000 --> 00:22:39,600
say

679
00:22:37,440 --> 00:22:40,720
is the gap between the two lines is the

680
00:22:39,600 --> 00:22:43,679
swing

681
00:22:40,720 --> 00:22:46,240
it's the change of vote at every point

682
00:22:43,679 --> 00:22:47,840
on that graph you've got a current total

683
00:22:46,240 --> 00:22:50,320
and you've got a historical total of the

684
00:22:47,840 --> 00:22:52,559
same figures that difference in between

685
00:22:50,320 --> 00:22:55,039
the two numbers is the swing and this

686
00:22:52,559 --> 00:22:57,679
graph shows the gap between those two

687
00:22:55,039 --> 00:23:00,000
lines doesn't have a lot of variability

688
00:22:57,679 --> 00:23:01,200
it has a lot less variability

689
00:23:00,000 --> 00:23:03,600
than the

690
00:23:01,200 --> 00:23:04,880
the the first preference line and that

691
00:23:03,600 --> 00:23:06,320
can be seen

692
00:23:04,880 --> 00:23:09,679
let me look now

693
00:23:06,320 --> 00:23:11,919
so so let me i'll come back to how we

694
00:23:09,679 --> 00:23:13,200
use this but the key point to make is

695
00:23:11,919 --> 00:23:15,919
that gap

696
00:23:13,200 --> 00:23:17,440
is the swing and if you can rely on

697
00:23:15,919 --> 00:23:20,880
the swing you've got something really

698
00:23:17,440 --> 00:23:24,320
useful to use now this next graph

699
00:23:20,880 --> 00:23:27,760
this is the um graph of the swings by

700
00:23:24,320 --> 00:23:30,320
polling place again arranged by um

701
00:23:27,760 --> 00:23:32,000
by polling place size but just let me go

702
00:23:30,320 --> 00:23:33,919
back this has got the same range on the

703
00:23:32,000 --> 00:23:36,080
y-axis of 60

704
00:23:33,919 --> 00:23:37,840
if i go back to that previous one

705
00:23:36,080 --> 00:23:41,120
there's a huge

706
00:23:37,840 --> 00:23:43,919
um 34 point range in the variance and

707
00:23:41,120 --> 00:23:45,760
the the dots are all over the place

708
00:23:43,919 --> 00:23:48,159
the swing from polling place to polling

709
00:23:45,760 --> 00:23:50,240
place it doesn't have this it has a

710
00:23:48,159 --> 00:23:52,960
little bit of a cluster at the start but

711
00:23:50,240 --> 00:23:55,279
it's not biased it's not above or below

712
00:23:52,960 --> 00:23:57,039
in any particular order so we've got

713
00:23:55,279 --> 00:23:59,440
here something which doesn't have an

714
00:23:57,039 --> 00:24:01,840
early bias and has only got half of the

715
00:23:59,440 --> 00:24:04,400
standard deviation so you've got a more

716
00:24:01,840 --> 00:24:06,559
reliable estimator to use if you can

717
00:24:04,400 --> 00:24:08,720
operate on the swing and that's what we

718
00:24:06,559 --> 00:24:11,360
do the simple two candidate preferred

719
00:24:08,720 --> 00:24:14,000
which was that first black line

720
00:24:11,360 --> 00:24:15,919
is just the current total and the method

721
00:24:14,000 --> 00:24:18,799
has the problem that the bias and the

722
00:24:15,919 --> 00:24:21,440
large variance is built into that number

723
00:24:18,799 --> 00:24:24,880
the simple swing is the current 2cp

724
00:24:21,440 --> 00:24:28,480
minus the final 2cp from last time

725
00:24:24,880 --> 00:24:30,720
but because the 2cp you're using is biased

726
00:24:28,480 --> 00:24:32,640
both the swing and the 2cp are going to

727
00:24:30,720 --> 00:24:34,720
be biased and have variance problems in

728
00:24:32,640 --> 00:24:36,720
the same manner what we do is what's

729
00:24:34,720 --> 00:24:39,760
called a matched two candidate preferred

730
00:24:36,720 --> 00:24:41,840
analysis which uses the unbiased and low

731
00:24:39,760 --> 00:24:44,640
variance polling place swings

732
00:24:41,840 --> 00:24:46,799
at every point on the count

733
00:24:44,640 --> 00:24:50,720
our current count compares the current

734
00:24:46,799 --> 00:24:53,360
2cp to this historical 2cp subtracts one

735
00:24:50,720 --> 00:24:55,440
from the other and gets a matched swing

736
00:24:53,360 --> 00:24:57,120
which that matched swing is the gap

737
00:24:55,440 --> 00:24:59,120
between those two graphs i showed you a

738
00:24:57,120 --> 00:25:00,480
moment ago and then what you do with

739
00:24:59,120 --> 00:25:02,799
that matched swing

740
00:25:00,480 --> 00:25:04,080
is you add that swing to the two

741
00:25:02,799 --> 00:25:05,200
candidate preferred for the last

742
00:25:04,080 --> 00:25:07,360
election

743
00:25:05,200 --> 00:25:09,360
and what you get on this next graph the

744
00:25:07,360 --> 00:25:12,559
black graph is the same as on the

745
00:25:09,360 --> 00:25:15,200
previous chart but the red graph is the

746
00:25:12,559 --> 00:25:16,320
prediction based on the

747
00:25:15,200 --> 00:25:18,320
matched swing

748
00:25:16,320 --> 00:25:20,559
and this is what you get there's that

749
00:25:18,320 --> 00:25:22,400
black line which takes to about 30 percent to

750
00:25:20,559 --> 00:25:24,880
settle down this is the red line this is

751
00:25:22,400 --> 00:25:27,360
the matched figure this is stable this

752
00:25:24,880 --> 00:25:29,360
isn't bouncing around this is like 10 percent of

753
00:25:27,360 --> 00:25:30,799
the vote counted and within one percent of

754
00:25:29,360 --> 00:25:34,080
the final result

755
00:25:30,799 --> 00:25:36,640
the low variance and the lack of bias

756
00:25:34,080 --> 00:25:37,919
means that this is an accurate predictor

757
00:25:36,640 --> 00:25:40,559
and it's why

758
00:25:37,919 --> 00:25:42,799
we that's the method we use we use this

759
00:25:40,559 --> 00:25:44,640
comparative swing and that removes

760
00:25:42,799 --> 00:25:46,960
nearly all the bias and a lot of

761
00:25:44,640 --> 00:25:48,559
variance from the early figures
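
A minimal sketch of the matched swing calculation as it is described here: compare the current two candidate preferred in the booths that have reported with the result in those same booths last time, then add that swing to last election's final figure. The booth names, vote counts and function name are invented for illustration, and this is not presented as the production system.

```python
def matched_projection(current, previous, previous_final):
    """Project the final 2CP from the swing in matched booths.

    current:  {booth: (votes_for, total)} for booths reported so far
    previous: {booth: (votes_for, total)} for every booth last election
    previous_final: last election's final 2CP for the seat, in percent
    """
    matched = [booth for booth in current if booth in previous]
    if not matched:
        raise ValueError("no reporting booth can be matched to last time")
    cur_for = sum(current[b][0] for b in matched)
    cur_tot = sum(current[b][1] for b in matched)
    prev_for = sum(previous[b][0] for b in matched)
    prev_tot = sum(previous[b][1] for b in matched)
    # Swing is measured only across booths present in both counts.
    swing = 100.0 * cur_for / cur_tot - 100.0 * prev_for / prev_tot
    return previous_final + swing


# Invented data: three booths last time, one small booth reported so far.
previous = {"A": (70, 100), "B": (130, 200), "C": (980, 2000)}
previous_final = 100.0 * (70 + 130 + 980) / (100 + 200 + 2000)
current = {"A": (67, 100)}

raw_2cp = 100.0 * current["A"][0] / current["A"][1]
projected = matched_projection(current, previous, previous_final)
print(round(raw_2cp, 1), round(projected, 1))
```

In this toy case the raw figure from the one small booth is far from where the seat finished last time, while the projection moves last election's final figure by the swing seen in that booth.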

762
00:25:46,960 --> 00:25:51,440
now

763
00:25:48,559 --> 00:25:52,880
this is actually the same graph

764
00:25:51,440 --> 00:25:55,120
but what i've done here is i've actually

765
00:25:52,880 --> 00:25:57,600
used the time stamped data from the last

766
00:25:55,120 --> 00:26:00,559
election so this isn't on the order of

767
00:25:57,600 --> 00:26:03,360
polling places it's on timestamp now

768
00:26:00,559 --> 00:26:05,520
this performs slightly better so the the

769
00:26:03,360 --> 00:26:07,600
black line took to about 30 percent to

770
00:26:05,520 --> 00:26:09,520
settle down on the other graph using

771
00:26:07,600 --> 00:26:12,480
real-life data it's settling down about

772
00:26:09,520 --> 00:26:14,720
20 percent but the key point is the the

773
00:26:12,480 --> 00:26:16,799
red line is still it's just way more

774
00:26:14,720 --> 00:26:20,400
stable and this is the same in every

775
00:26:16,799 --> 00:26:23,039
election the swing is always more stable

776
00:26:20,400 --> 00:26:24,799
than the two candidate preferred essentially

777
00:26:23,039 --> 00:26:26,320
there is a wide range of results from

778
00:26:24,799 --> 00:26:28,240
polling place to polling place and if

779
00:26:26,320 --> 00:26:30,480
you want to use the raw numbers

780
00:26:28,240 --> 00:26:32,960
you've got all that wide range

781
00:26:30,480 --> 00:26:35,520
if you use the swing you're measuring

782
00:26:32,960 --> 00:26:37,679
change from the last election and that

783
00:26:35,520 --> 00:26:39,120
change will always be less than the two

784
00:26:37,679 --> 00:26:41,520
than the two candidate preferred

785
00:26:39,120 --> 00:26:43,120
variability in an individual electorate

786
00:26:41,520 --> 00:26:45,840
two candidate preferred right results

787
00:26:43,120 --> 00:26:47,919
can range from 20 to 80 percent you know

788
00:26:45,840 --> 00:26:50,080
they have a huge range

789
00:26:47,919 --> 00:26:51,440
swings will be clustered around the

790
00:26:50,080 --> 00:26:53,520
swing and they're going to be a lot

791
00:26:51,440 --> 00:26:55,440
smaller and the variance on average is

792
00:26:53,520 --> 00:26:58,080
between one-third just through the

793
00:26:55,440 --> 00:27:00,000
standard deviation between one-third and

794
00:26:58,080 --> 00:27:02,720
half and that's why we operate on the

795
00:27:00,000 --> 00:27:04,640
swing now

796
00:27:02,720 --> 00:27:07,200
on election night this is the way the

797
00:27:04,640 --> 00:27:08,480
data comes to us the blue line is first

798
00:27:07,200 --> 00:27:10,480
preferences

799
00:27:08,480 --> 00:27:12,080
and the second line is the two-party

800
00:27:10,480 --> 00:27:14,000
preferred which comes in later or two

801
00:27:12,080 --> 00:27:15,760
candidate preferred you can see that

802
00:27:14,000 --> 00:27:17,279
one lags the other and then they catch

803
00:27:15,760 --> 00:27:20,799
up later in the evening the key thing to

804
00:27:17,279 --> 00:27:23,279
watch for is 7 p.m i've only got 10

805
00:27:20,799 --> 00:27:25,120
less than 10 percent of the first preference

806
00:27:23,279 --> 00:27:27,440
vote only about three percent of the

807
00:27:25,120 --> 00:27:30,480
two-party preferred by 7 30 i've got to

808
00:27:27,440 --> 00:27:32,960
10 percent two-party preferred and

809
00:27:30,480 --> 00:27:35,440
remember i mean this this is 10 this but

810
00:27:32,960 --> 00:27:37,440
we haven't got the data in that order

811
00:27:35,440 --> 00:27:38,960
we've got it in time order so this is

812
00:27:37,440 --> 00:27:41,360
what the next graph is what the graph

813
00:27:38,960 --> 00:27:43,360
looks like if i plot this by time

814
00:27:41,360 --> 00:27:45,039
and you've got the graph is much further

815
00:27:43,360 --> 00:27:46,960
over to the right we're spending a lot

816
00:27:45,039 --> 00:27:48,720
more time earlier in the evening at 7 30

817
00:27:46,960 --> 00:27:50,799
in the evening talking about early

818
00:27:48,720 --> 00:27:52,320
figures because we haven't got a lot 10

819
00:27:50,799 --> 00:27:53,840
percent of the two-party preferred

820
00:27:52,320 --> 00:27:56,080
candidate but

821
00:27:53,840 --> 00:27:57,520
the key point to make there is this is

822
00:27:56,080 --> 00:27:59,840
more stable

823
00:27:57,520 --> 00:28:01,279
than the the number up here and that's

824
00:27:59,840 --> 00:28:03,919
what i want at 7 30 i want to know what

825
00:28:01,279 --> 00:28:05,600
the numbers are and i always say at 7 30

826
00:28:03,919 --> 00:28:07,360
on the night i usually know the result

827
00:28:05,600 --> 00:28:09,039
of the election if it's clear

828
00:28:07,360 --> 00:28:11,279
if i don't know the result i know we

829
00:28:09,039 --> 00:28:12,799
have to wait for more data you know if

830
00:28:11,279 --> 00:28:14,559
you don't know the result by 7 30 you

831
00:28:12,799 --> 00:28:16,240
might know it later if it's a really

832
00:28:14,559 --> 00:28:17,760
close election you won't know the

833
00:28:16,240 --> 00:28:19,279
results on the night

834
00:28:17,760 --> 00:28:20,799
but that's

835
00:28:19,279 --> 00:28:22,320
that's essentially what i'm doing on

836
00:28:20,799 --> 00:28:25,120
election night so

837
00:28:22,320 --> 00:28:27,840
what do we do next we've got that curve

838
00:28:25,120 --> 00:28:29,919
what do i do it's this red curve that's

839
00:28:27,840 --> 00:28:32,559
the line i showed earlier the dotted

840
00:28:29,919 --> 00:28:34,799
lines on either side that's the 99 percent

841
00:28:32,559 --> 00:28:36,559
confidence interval once that confidence

842
00:28:34,799 --> 00:28:39,039
interval is above 50

843
00:28:36,559 --> 00:28:40,480
i'm confident that my prediction

844
00:28:39,039 --> 00:28:41,840
is over 50

845
00:28:40,480 --> 00:28:44,480
it is not going to fall back onto the

846
00:28:41,840 --> 00:28:46,640
other side it might bob back and

847
00:28:44,480 --> 00:28:49,440
forwards but it's not going to disappear

848
00:28:46,640 --> 00:28:50,960
this is the figure that i want to use i

849
00:28:49,440 --> 00:28:52,960
want something which gives me a stable

850
00:28:50,960 --> 00:28:54,720
prediction very early and so that's the

851
00:28:52,960 --> 00:28:56,080
method to use now i've got two vertical

852
00:28:54,720 --> 00:28:58,640
lines there

853
00:28:56,080 --> 00:29:01,440
because you do get variability in early

854
00:28:58,640 --> 00:29:03,279
figures we have two cutoffs

855
00:29:01,440 --> 00:29:05,919
three percent is we have a bottom of

856
00:29:03,279 --> 00:29:07,440
frame total on television no seat is

857
00:29:05,919 --> 00:29:09,279
included in that figure till i've got

858
00:29:07,440 --> 00:29:11,279
three percent counted that just

859
00:29:09,279 --> 00:29:13,760
minimizes the number of times the seats

860
00:29:11,279 --> 00:29:15,760
go down and that always makes people

861
00:29:13,760 --> 00:29:17,120
worried when the tally goes down now

862
00:29:15,760 --> 00:29:18,799
they say how can you have given the seat

863
00:29:17,120 --> 00:29:20,240
away and then take it back

864
00:29:18,799 --> 00:29:22,240
well if you're doing it by hand in

865
00:29:20,240 --> 00:29:25,279
absolute confidence you would be more

866
00:29:22,240 --> 00:29:27,760
cautious we have automated this so it

867
00:29:25,279 --> 00:29:29,679
will sometimes go down

868
00:29:27,760 --> 00:29:31,919
but the alternative is you do it

869
00:29:29,679 --> 00:29:33,600
manually and you get updates every two

870
00:29:31,919 --> 00:29:35,600
minutes and you're constantly behind if

871
00:29:33,600 --> 00:29:37,360
you adopt this statistical method you

872
00:29:35,600 --> 00:29:39,279
will always be up with the data and so

873
00:29:37,360 --> 00:29:41,279
that's what we choose to do now just a

874
00:29:39,279 --> 00:29:44,399
hint on this next graph because i'm not

875
00:29:41,279 --> 00:29:45,919
showing giving away too many secrets um

876
00:29:44,399 --> 00:29:47,919
this explains how this two-party

877
00:29:45,919 --> 00:29:50,480
preferred looks like if you plot the

878
00:29:47,919 --> 00:29:52,080
confidence interval um basically and

879
00:29:50,480 --> 00:29:54,080
this is not braddon this is an entirely

880
00:29:52,080 --> 00:29:55,679
different electorate and i've removed

881
00:29:54,080 --> 00:29:57,919
all the numbers so you can't read it but

882
00:29:55,679 --> 00:30:01,279
basically that confidence interval

883
00:29:57,919 --> 00:30:03,200
turns into a downward sloping

884
00:30:01,279 --> 00:30:05,279
downward curving line

885
00:30:03,200 --> 00:30:07,440
which approaches

886
00:30:05,279 --> 00:30:09,600
the the it's asymptotic the line

887
00:30:07,440 --> 00:30:13,840
approaches the 50 line

888
00:30:09,600 --> 00:30:15,840
when you get to um all the votes counted

889
00:30:13,840 --> 00:30:18,799
once that red line crosses that we give

890
00:30:15,840 --> 00:30:20,240
the seat away now the seat might drop

891
00:30:18,799 --> 00:30:22,000
back

892
00:30:20,240 --> 00:30:23,279
it generally drops back into leaning

893
00:30:22,000 --> 00:30:25,200
that way as you can see that that never

894
00:30:23,279 --> 00:30:27,440
dropped below 50 percent

895
00:30:25,200 --> 00:30:29,679
there's no way that that's once it gets

896
00:30:27,440 --> 00:30:32,480
close to that line it is not suddenly

897
00:30:29,679 --> 00:30:35,120
going to revert to the other side of 50

898
00:30:32,480 --> 00:30:37,120
and you've got a a reflective curve for

899
00:30:35,120 --> 00:30:39,120
the other candidate in the final race so

900
00:30:37,120 --> 00:30:40,960
that's that's what that confidence

901
00:30:39,120 --> 00:30:42,640
interval looks like i've not shown you

902
00:30:40,960 --> 00:30:45,279
because otherwise people sit and figure

903
00:30:42,640 --> 00:30:46,880
out you know a pseudo version of what we

904
00:30:45,279 --> 00:30:49,039
do and i'm just not going to do that but

905
00:30:46,880 --> 00:30:50,720
that's what you end up with and out of

906
00:30:49,039 --> 00:30:52,799
all this comes this box which is from

907
00:30:50,720 --> 00:30:53,760
the south australian election this gives

908
00:30:52,799 --> 00:30:55,279
me

909
00:30:53,760 --> 00:30:57,200
the um

910
00:30:55,279 --> 00:30:59,760
the number of seats won by each party

911
00:30:57,200 --> 00:31:01,840
and that's the whole game that we're

912
00:30:59,760 --> 00:31:03,600
doing here and on these numbers from

913
00:31:01,840 --> 00:31:05,840
quite early on we've got the liberal

914
00:31:03,600 --> 00:31:07,760
party on 25 seats

915
00:31:05,840 --> 00:31:09,919
and i have an error margin plus or minus

916
00:31:07,760 --> 00:31:12,000
three seats you need 23 seats for

917
00:31:09,919 --> 00:31:13,600
majority if i'm looking at these numbers

918
00:31:12,000 --> 00:31:16,000
on the night that's not close enough to

919
00:31:13,600 --> 00:31:18,000
call but this looks like magic this

920
00:31:16,000 --> 00:31:20,080
looks like this is all under so you know

921
00:31:18,000 --> 00:31:22,080
i'm sort of making a guess or something

922
00:31:20,080 --> 00:31:24,000
all the mathematics in each of those

923
00:31:22,080 --> 00:31:25,919
seats is being done constantly and it's

924
00:31:24,000 --> 00:31:27,840
being checked and we have alternatives

925
00:31:25,919 --> 00:31:29,760
for if the preference formulas are wrong we

926
00:31:27,840 --> 00:31:31,360
have different ways we can if we think

927
00:31:29,760 --> 00:31:33,919
the formula is over predicting we can

928
00:31:31,360 --> 00:31:36,080
pull the seat back into doubt um i don't

929
00:31:33,919 --> 00:31:38,480
manually give seats away i will manually

930
00:31:36,080 --> 00:31:40,320
push them into in doubt if i want to but

931
00:31:38,480 --> 00:31:42,000
that all then produces this total at the

932
00:31:40,320 --> 00:31:44,320
end so everyone thinks i'm you know

933
00:31:42,000 --> 00:31:46,640
making some guess or not i'm not it's a

934
00:31:44,320 --> 00:31:49,200
science this is all science all

935
00:31:46,640 --> 00:31:51,760
mathematics and that's the way it's done
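
[Editor's sketch] The seat-total box described above can be illustrated with a small check. This is a minimal sketch, not the ABC system: the names SeatTally and can_call_majority are hypothetical, and the rule "call only when the low end of the range still reaches a majority" is an assumption about how the error margin is read. The figures are the ones quoted in the talk.

```python
from dataclasses import dataclass


@dataclass
class SeatTally:
    """Predicted seats for one party, with a plus-or-minus margin in seats."""
    party: str
    predicted_seats: int
    error_margin: int


def can_call_majority(tally: SeatTally, majority: int) -> bool:
    # Assumed rule: only call when even the pessimistic end of the
    # range (prediction minus margin) still reaches a majority.
    return tally.predicted_seats - tally.error_margin >= majority


# Figures quoted in the talk: 25 seats, plus or minus 3, 23 for a majority.
liberal = SeatTally("Liberal", predicted_seats=25, error_margin=3)
print(can_call_majority(liberal, majority=23))  # low end is 22, so no call yet
```

With these numbers the low end of the range is 22 seats, one short of the 23 quoted as a majority, which fits the remark that the result could not yet be called.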

936
00:31:49,200 --> 00:31:53,760
so um that's my little presentation but

937
00:31:51,760 --> 00:31:56,399
uh as i said

938
00:31:53,760 --> 00:31:59,600
if i go back to this graph

939
00:31:56,399 --> 00:32:00,640
um this is the magic if you use the

940
00:31:59,600 --> 00:32:02,720
swing

941
00:32:00,640 --> 00:32:05,200
you get a red line prediction like that

942
00:32:02,720 --> 00:32:06,960
which is stable from very early on if

943
00:32:05,200 --> 00:32:08,159
you use the black line

944
00:32:06,960 --> 00:32:09,600
um

945
00:32:08,159 --> 00:32:11,360
you're all over the place waiting for

946
00:32:09,600 --> 00:32:12,799
the figures to stabilize and i'll say

947
00:32:11,360 --> 00:32:14,240
one further thing

948
00:32:12,799 --> 00:32:16,240
it is getting slightly harder at the

949
00:32:14,240 --> 00:32:18,480
moment the rise in pre-poll voting and

950
00:32:16,240 --> 00:32:20,480
postals the assumption that people vote

951
00:32:18,480 --> 00:32:22,559
in the same place as last time is

952
00:32:20,480 --> 00:32:23,760
starting to be undermined and we've i

953
00:32:22,559 --> 00:32:25,440
know we did the queensland election

954
00:32:23,760 --> 00:32:27,279
there was a huge increase in pre-poll

955
00:32:25,440 --> 00:32:29,519
and postal voting it was a closer

956
00:32:27,279 --> 00:32:31,760
election we just had to wait longer and

957
00:32:29,519 --> 00:32:33,760
in fact in recent years we've begun to

958
00:32:31,760 --> 00:32:35,600
sort of wind out the variance formula so

959
00:32:33,760 --> 00:32:37,919
that the system gives a little bit more

960
00:32:35,600 --> 00:32:42,000
wiggle room in the predictions
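
[Editor's sketch] The contrast between the red line (swing) and the black line (raw count) can be sketched as follows. This is an illustration under stated assumptions, not the actual formula: the booth names and vote figures are invented, and raw_share and swing_projection are hypothetical helpers. It also relies on the assumption mentioned in the talk that people vote in the same place as last time.

```python
def raw_share(booths):
    """Party share (%) of the votes in the given booths (the 'black line')."""
    party_votes = sum(b["party"] for b in booths)
    total_votes = sum(b["total"] for b in booths)
    return 100 * party_votes / total_votes


def swing_projection(counted, previous, previous_final_share):
    """Apply the swing in matched booths to last time's final share (the 'red line')."""
    now = raw_share(counted)
    # Compare against the same booths at the previous election only.
    then = raw_share([previous[b["name"]] for b in counted])
    return previous_final_share + (now - then)


# Invented example: booth A is a strong booth for the party and reports first.
previous = {
    "A": {"party": 700, "total": 1000},
    "B": {"party": 400, "total": 1000},
    "C": {"party": 400, "total": 1000},
}
previous_final = raw_share(list(previous.values()))  # 50.0

counted = [{"name": "A", "party": 650, "total": 1000}]
print(raw_share(counted))                                    # 65.0
print(swing_projection(counted, previous, previous_final))   # 45.0
```

In this invented case the raw count shows the party on 65 percent because only its strongest booth has reported, while the matched-booth swing of minus 5 points projects 45 percent overall. A rise in pre-poll and postal voting weakens the booth matching this sketch depends on.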

961
00:32:37,919 --> 00:32:43,519
so anyway that's that's my presentation

962
00:32:42,000 --> 00:32:44,799
thank you for that anthony that was really

963
00:32:43,519 --> 00:32:46,320
really interesting i do like the

964
00:32:44,799 --> 00:32:48,320
statistical magic you've pulled off

965
00:32:46,320 --> 00:32:50,480
there it is quite incredible

966
00:32:48,320 --> 00:32:52,000
thank you very much

967
00:32:50,480 --> 00:32:54,080
so hopefully you'll be you'll be quite

968
00:32:52,000 --> 00:32:56,559
busy for the um the first half of this

969
00:32:54,080 --> 00:32:58,880
year i imagine anthony you'll be yes i

970
00:32:56,559 --> 00:33:00,640
have a um south australian election on

971
00:32:58,880 --> 00:33:02,320
the 19th of march

972
00:33:00,640 --> 00:33:03,919
and it's looking pretty clear that the

973
00:33:02,320 --> 00:33:06,080
federal election will be in may not

974
00:33:03,919 --> 00:33:07,360
march at the moment so that's my working

975
00:33:06,080 --> 00:33:10,399
assumption of course it could be wrong

976
00:33:07,360 --> 00:33:12,000
but that's my working assumption

977
00:33:10,399 --> 00:33:12,720
yes yes so you know who can predict

978
00:33:12,000 --> 00:33:16,919
these

979
00:33:12,720 --> 00:33:16,919
these politicians what would they do

