1
00:00:06,320 --> 00:00:11,499
[Music]

2
00:00:15,679 --> 00:00:19,920
hello everyone welcome back to kaya

3
00:00:17,840 --> 00:00:24,240
theta where we are in the middle of a

4
00:00:19,920 --> 00:00:27,039
little kernel talk party um so next up

5
00:00:24,240 --> 00:00:29,400
we have keith packard uh keith packard

6
00:00:27,039 --> 00:00:32,480
has been developing free software since

7
00:00:29,400 --> 00:00:34,800
1986. he is currently a senior principal

8
00:00:32,480 --> 00:00:36,960
engineer with amazon's device os group

9
00:00:34,800 --> 00:00:38,960
he has received a usenix lifetime

10
00:00:36,960 --> 00:00:42,000
achievement award an o'reilly open

11
00:00:38,960 --> 00:00:44,160
source award and sits on the x.org

12
00:00:42,000 --> 00:00:47,280
foundation and amateur radio digital

13
00:00:44,160 --> 00:00:48,960
communications boards um

14
00:00:47,280 --> 00:00:51,120
i think keith is probably a pretty

15
00:00:48,960 --> 00:00:52,559
familiar face um at

16
00:00:51,120 --> 00:00:53,840
linux confs

17
00:00:52,559 --> 00:00:56,320
so

18
00:00:53,840 --> 00:00:59,520
i think i think he knows the drill as do

19
00:00:56,320 --> 00:01:01,440
many of you so uh keith will be taking

20
00:00:59,520 --> 00:01:03,680
questions after the talk so if you have

21
00:01:01,440 --> 00:01:05,519
any questions for keith please put them

22
00:01:03,680 --> 00:01:07,920
in the little questions tab above the

23
00:01:05,519 --> 00:01:10,320
chat in venueless and we'll pass them on

24
00:01:07,920 --> 00:01:13,040
you can also upvote questions that you

25
00:01:10,320 --> 00:01:15,439
think are great and want to be asked

26
00:01:13,040 --> 00:01:17,680
okay all over to you keith

27
00:01:15,439 --> 00:01:20,000
thank you so much betsy uh thank you

28
00:01:17,680 --> 00:01:21,759
again for welcoming me to another

29
00:01:20,000 --> 00:01:24,080
glorious lca conference i wish i could

30
00:01:21,759 --> 00:01:25,280
be with you all uh maybe next year in

31
00:01:24,080 --> 00:01:27,360
canberra

32
00:01:25,280 --> 00:01:30,560
i'm going to be talking today about some

33
00:01:27,360 --> 00:01:32,079
work that i started um and somebody else

34
00:01:30,560 --> 00:01:34,240
has taken over the reins and is is

35
00:01:32,079 --> 00:01:36,079
working on much more than i am now uh

36
00:01:34,240 --> 00:01:37,840
talking about kernel hardening uh for

37
00:01:36,079 --> 00:01:40,720
arm 32

38
00:01:37,840 --> 00:01:45,439
working on some stuff that's that uh

39
00:01:40,720 --> 00:01:47,119
some bugs that got filed in uh 2019

40
00:01:45,439 --> 00:01:49,439
as betsy said i'm working in the device

41
00:01:47,119 --> 00:01:51,040
os group at amazon we're the group

42
00:01:49,439 --> 00:01:54,560
responsible for building operating

43
00:01:51,040 --> 00:01:59,520
systems for all of amazon's fun devices

44
00:01:54,560 --> 00:01:59,520
from tablets to tvs to echo devices

45
00:02:01,570 --> 00:02:06,000
[Music]

46
00:02:02,719 --> 00:02:08,959
okay the kernel self-protection project

47
00:02:06,000 --> 00:02:10,479
i asked kees um when he started this

48
00:02:08,959 --> 00:02:12,160
project and he said he actually sent me

49
00:02:10,479 --> 00:02:15,760
a link to the email message that he sent

50
00:02:12,160 --> 00:02:17,040
out on the 5th of november in 2015.

51
00:02:15,760 --> 00:02:18,400
if you were all here for the last

52
00:02:17,040 --> 00:02:20,080
session you'll know that kees cook has

53
00:02:18,400 --> 00:02:23,360
been doing kernel security for a very

54
00:02:20,080 --> 00:02:25,200
long time um and i'm really i i keep

55
00:02:23,360 --> 00:02:26,879
being awed by the amount of work and the

56
00:02:25,200 --> 00:02:29,360
amount of progress that he's made in in

57
00:02:26,879 --> 00:02:31,840
making our our favorite operating system

58
00:02:29,360 --> 00:02:34,000
secure uh even even though the language

59
00:02:31,840 --> 00:02:35,920
that it's written in is uh is not the

60
00:02:34,000 --> 00:02:38,400
best in the world

61
00:02:35,920 --> 00:02:41,120
the kernel self-protection project

62
00:02:38,400 --> 00:02:42,800
is all about defense in depth for linux

63
00:02:41,120 --> 00:02:44,480
you heard kees talking about one of the

64
00:02:42,800 --> 00:02:46,959
one of the newer projects i'm here to

65
00:02:44,480 --> 00:02:49,120
talk about one of the oldest uh set of

66
00:02:46,959 --> 00:02:51,840
bugs in the in that that are filed in

67
00:02:49,120 --> 00:02:53,760
that project um the part the in fact the

68
00:02:51,840 --> 00:02:55,519
bug that i'm uh mostly talking about

69
00:02:53,760 --> 00:02:57,280
today is bug number one in the kernel

70
00:02:55,519 --> 00:03:00,080
self-protection project

71
00:02:57,280 --> 00:03:01,599
uh kspp is is all about eliminating

72
00:03:00,080 --> 00:03:04,560
classes of bugs

73
00:03:01,599 --> 00:03:07,040
uh you can you can hear

74
00:03:04,560 --> 00:03:08,239
his talk about fixing overflows in

75
00:03:08,239 --> 00:03:10,000
memcpy

76
00:03:08,239 --> 00:03:12,400
it's worked he's worked on eliminating

77
00:03:10,000 --> 00:03:15,040
variable length arrays in the kernel

78
00:03:12,400 --> 00:03:17,200
static array overflows um and also

79
00:03:15,040 --> 00:03:20,319
eliminating methods of exploitation of

80
00:03:17,200 --> 00:03:22,159
bugs so the method that a lot of a lot

81
00:03:20,319 --> 00:03:25,040
of exploits use is return oriented

82
00:03:22,159 --> 00:03:26,720
programming and kspp has been working on

83
00:03:25,040 --> 00:03:29,120
mitigation techniques for things like

84
00:03:26,720 --> 00:03:31,120
that so both fixing the source code

85
00:03:29,120 --> 00:03:33,760
classes of bugs and then making the

86
00:03:31,120 --> 00:03:36,640
kernel harder uh harder to exploit once

87
00:03:33,760 --> 00:03:38,000
you've actually found a way in

88
00:03:36,640 --> 00:03:40,239
okay so

89
00:03:38,000 --> 00:03:42,799
i'm talking about the 32-bit arm

90
00:03:40,239 --> 00:03:44,959
architecture and you might be asking me

91
00:03:42,799 --> 00:03:47,120
are you actually building devices

92
00:03:44,959 --> 00:03:48,000
with 32-bit arm processor the answer is

93
00:03:47,120 --> 00:03:49,200
well

94
00:03:48,000 --> 00:03:51,040
not really

95
00:03:49,200 --> 00:03:53,599
essentially all of the devices that we

96
00:03:51,040 --> 00:03:56,560
build that run linux are actually modern

97
00:03:53,599 --> 00:03:59,040
arm chips that could run 64-bit code

98
00:03:56,560 --> 00:04:00,959
so why are we still running 32-bits and

99
00:03:59,040 --> 00:04:02,720
this graph is designed to give you an

100
00:04:00,959 --> 00:04:04,000
indication of why we might still be

101
00:04:02,720 --> 00:04:05,439
doing that

102
00:04:04,000 --> 00:04:06,239
so starting in

103
00:04:05,439 --> 00:04:09,200
in

104
00:04:06,239 --> 00:04:10,879
this data actually comes from jcmit.net

105
00:04:09,200 --> 00:04:12,400
who has historical data back to the

106
00:04:10,879 --> 00:04:14,000
1950s

107
00:04:12,400 --> 00:04:15,680
but i truncated this graph to just

108
00:04:14,000 --> 00:04:16,639
starting in 2000 to show you kind of the

109
00:04:15,680 --> 00:04:19,120
last

110
00:04:16,639 --> 00:04:21,120
of the last 15 years of that exponential

111
00:04:19,120 --> 00:04:23,120
decline in memory prices

112
00:04:21,120 --> 00:04:25,440
that kind of stopped somewhere between

113
00:04:23,120 --> 00:04:27,280
2010 and 2015 we stopped being able to

114
00:04:25,440 --> 00:04:28,960
reliably expect that memory would get

115
00:04:27,280 --> 00:04:30,560
cheaper every year

116
00:04:28,960 --> 00:04:32,240
and for the past 20 years or so we've

117
00:04:30,560 --> 00:04:34,720
had to expect that memory prices are

118
00:04:32,240 --> 00:04:36,800
pretty constant somewhere between five

119
00:04:34,720 --> 00:04:39,840
and ten dollars a gigabyte uh depending

120
00:04:36,800 --> 00:04:40,800
upon when you actually make your orders

121
00:04:39,840 --> 00:04:42,400
and so

122
00:04:40,800 --> 00:04:43,280
saving memory

123
00:04:42,400 --> 00:04:45,600
is

124
00:04:43,280 --> 00:04:48,479
you you can no longer expect that just

125
00:04:45,600 --> 00:04:49,280
delaying your product by a year uh and

126
00:04:48,479 --> 00:04:51,280
and

127
00:04:49,280 --> 00:04:52,800
and will will make the memory cheap

128
00:04:51,280 --> 00:04:54,639
enough for you to be able to afford to

129
00:04:52,800 --> 00:04:56,400
build your product you really need to

130
00:04:54,639 --> 00:04:59,040
start thinking about memory as a fixed

131
00:04:56,400 --> 00:05:00,960
cost instead of an ever decreasing cost

132
00:04:59,040 --> 00:05:02,960
and so by using a 32-bit arm

133
00:05:00,960 --> 00:05:05,280
architecture we're able to save not a

134
00:05:02,960 --> 00:05:06,720
lot of memory but a bit of memory and

135
00:05:05,280 --> 00:05:08,320
every bit of memory

136
00:05:06,720 --> 00:05:11,440
every bit of memory saved is a bit of

137
00:05:08,320 --> 00:05:12,720
memory i can do more fun fun features in

138
00:05:11,440 --> 00:05:15,919
the products with

139
00:05:12,720 --> 00:05:17,680
and so we're really using the 32-bit arm

140
00:05:15,919 --> 00:05:19,840
kernel right now in order to save the

141
00:05:17,680 --> 00:05:21,759
memory that we can

142
00:05:19,840 --> 00:05:24,960
this may change in the future as arm

143
00:05:21,759 --> 00:05:26,479
tries to push 32-bit architectures off

144
00:05:24,960 --> 00:05:28,800
of their roadmaps

145
00:05:26,479 --> 00:05:30,639
and vendors stop selling parts that can

146
00:05:28,800 --> 00:05:32,479
run 32-bit code

147
00:05:30,639 --> 00:05:34,000
we may have to do something different

148
00:05:32,479 --> 00:05:35,600
but for now we're still doing a lot of

149
00:05:34,000 --> 00:05:39,120
32-bit work

150
00:05:35,600 --> 00:05:41,919
and all for all for saving memory

151
00:05:39,120 --> 00:05:43,199
alas uh arm 32 world kind of feels left

152
00:05:41,919 --> 00:05:44,639
out of the kernel self protection

153
00:05:43,199 --> 00:05:47,199
project

154
00:05:44,639 --> 00:05:49,199
a lot of the um kernel self protection

155
00:05:47,199 --> 00:05:50,720
project fixes require architecture

156
00:05:49,199 --> 00:05:53,520
specific changes

157
00:05:50,720 --> 00:05:56,479
um and a lot of the people working in

158
00:05:53,520 --> 00:05:58,319
that area uh are working are focused on

159
00:05:56,479 --> 00:06:00,160
newer and higher end architectures that

160
00:05:58,319 --> 00:06:03,759
are more more interesting and fun to

161
00:06:00,160 --> 00:06:06,479
play with like you know x86 and arm 64

162
00:06:03,759 --> 00:06:08,160
powerpc risc-v

163
00:06:06,479 --> 00:06:11,120
the kinds of places where it's actually

164
00:06:08,160 --> 00:06:12,880
easier to do a lot of this work

165
00:06:11,120 --> 00:06:14,800
older architectures

166
00:06:12,880 --> 00:06:17,840
especially especially smaller more

167
00:06:14,800 --> 00:06:20,000
limited devices like arm

168
00:06:17,840 --> 00:06:21,840
and mips have have unique challenges

169
00:06:20,000 --> 00:06:24,240
that we'll get into in the process of

170
00:06:21,840 --> 00:06:26,560
the pre of this presentation the problem

171
00:06:24,240 --> 00:06:29,280
is that some of the most critical fixes

172
00:06:26,560 --> 00:06:31,280
are still not available uh for kind of

173
00:06:29,280 --> 00:06:34,479
these two older architectures that are

174
00:06:31,280 --> 00:06:36,240
really common in consumer devices arm 32

175
00:06:34,479 --> 00:06:38,080
and mips

176
00:06:36,240 --> 00:06:40,160
so what are these first

177
00:06:38,080 --> 00:06:42,240
first bugs you ask

178
00:06:40,160 --> 00:06:44,479
the first four of these bugs

179
00:06:42,240 --> 00:06:44,479
are

180
00:06:44,639 --> 00:06:49,199
the thread info in the kernel stack and

181
00:06:46,960 --> 00:06:51,680
that's the one we're working on today

182
00:06:49,199 --> 00:06:54,800
that's where the the the a significant

183
00:06:51,680 --> 00:06:58,240
amount of the per process information

184
00:06:54,800 --> 00:07:01,120
um is actually stored in the same pages

185
00:06:58,240 --> 00:07:02,720
in memory uh as the kernel stack

186
00:07:01,120 --> 00:07:05,440
and we'll we'll find out why that's a

187
00:07:02,720 --> 00:07:08,000
really terrible idea

188
00:07:05,440 --> 00:07:10,720
the second second bug that we want to

189
00:07:08,000 --> 00:07:13,120
resolve from kspp number two is that the

190
00:07:10,720 --> 00:07:15,280
kernel stack should be protected so that

191
00:07:13,120 --> 00:07:16,960
if you overflow the kernel stack instead

192
00:07:15,280 --> 00:07:18,639
of smashing memory adjacent to the

193
00:07:16,960 --> 00:07:19,759
kernel stack that it should probably

194
00:07:18,639 --> 00:07:21,360
trap

195
00:07:19,759 --> 00:07:22,800
so that you know that a kernel stack is

196
00:07:21,360 --> 00:07:24,560
overflowed

197
00:07:22,800 --> 00:07:26,240
one of the problems with the c language is

198
00:07:24,560 --> 00:07:28,160
that it doesn't really have any guards

199
00:07:26,240 --> 00:07:30,479
against stack overflow

200
00:07:28,160 --> 00:07:32,160
you can kind of allocate stack memory

201
00:07:30,479 --> 00:07:33,919
however you like and there's no there's

202
00:07:32,160 --> 00:07:36,400
no easy way to

203
00:07:33,919 --> 00:07:38,240
uh to detect in the c code that you've

204
00:07:36,400 --> 00:07:40,479
overflowed the stack

205
00:07:38,240 --> 00:07:43,360
so we're using a hardware protection

206
00:07:40,479 --> 00:07:45,120
here uh to try to catch that

207
00:07:43,360 --> 00:07:46,560
instead of instead of relying on

208
00:07:45,120 --> 00:07:48,560
software

209
00:07:46,560 --> 00:07:50,960
and that's with these guard pages the

210
00:07:48,560 --> 00:07:51,919
way that this is done is you have

211
00:07:50,960 --> 00:07:53,120
pages

212
00:07:51,919 --> 00:07:55,440
you have

213
00:07:53,120 --> 00:07:57,759
the kernel stack pages and surrounding

214
00:07:55,440 --> 00:07:58,879
the kernel stack are are unmapped pages

215
00:07:57,759 --> 00:08:00,479
in memory

216
00:07:58,879 --> 00:08:02,240
and so that if you try to access those

217
00:08:00,479 --> 00:08:03,759
there's no memory there

218
00:08:02,240 --> 00:08:05,599
and so the kernel actually takes a

219
00:08:03,759 --> 00:08:07,919
memory protection fault

220
00:08:05,599 --> 00:08:10,960
in hardware and so that lets you let you

221
00:08:07,919 --> 00:08:13,039
trap the kernel stack overflow
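
The guard-page idea can be tried out in user space. The following is a minimal sketch and only an analogy: the names guarded_stack, guarded_stack_alloc and guarded_stack_free are made up for illustration, and the kernel's own stack allocation works differently. It maps a region with no access at all, then opens up only the middle pages, so that running off either end of the usable area faults in hardware instead of silently corrupting a neighbour.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for a stack surrounded by unmapped guard pages. */
struct guarded_stack {
    unsigned char *mapping;      /* start of the whole mapping (lower guard) */
    unsigned char *usable;       /* first usable byte, one page in */
    size_t usable_len;           /* bytes that may be read and written */
    size_t total_len;            /* usable bytes plus the two guard pages */
};

/* Reserve pages + 2 pages with no access, then enable only the middle ones. */
int guarded_stack_alloc(struct guarded_stack *gs, size_t pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = (pages + 2) * page;
    unsigned char *m = mmap(NULL, total, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (m == MAP_FAILED)
        return -1;
    if (mprotect(m + page, pages * page, PROT_READ | PROT_WRITE) != 0) {
        munmap(m, total);
        return -1;
    }
    gs->mapping = m;
    gs->usable = m + page;
    gs->usable_len = pages * page;
    gs->total_len = total;
    return 0;
}

/* Release the whole mapping, guard pages included. */
void guarded_stack_free(struct guarded_stack *gs)
{
    assert(gs != NULL);
    munmap(gs->mapping, gs->total_len);
}
```

Any access to the page just below usable or just past usable + usable_len raises a fault (SIGSEGV in a user process), which is the trap behaviour described for the kernel stack.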

222
00:08:10,960 --> 00:08:14,400
number bug uh bug number three and bug

223
00:08:13,039 --> 00:08:16,560
number four are things we're not gonna

224
00:08:14,400 --> 00:08:19,280
be working on today uh but i'm hoping to

225
00:08:16,560 --> 00:08:20,960
get started on those in in the future um

226
00:08:19,280 --> 00:08:24,080
and those are those are addressing some

227
00:08:20,960 --> 00:08:26,560
more common uh common problems uh that

228
00:08:24,080 --> 00:08:29,039
would be good to fix uh on the on the

229
00:08:26,560 --> 00:08:31,360
arm architecture uh the kernel uh base

230
00:08:29,039 --> 00:08:34,080
address offset randomization uh would

231
00:08:31,360 --> 00:08:36,399
make it more difficult for uh attacks to

232
00:08:34,080 --> 00:08:37,599
know where data and and code is in the

233
00:08:36,399 --> 00:08:40,080
kernel

234
00:08:37,599 --> 00:08:41,839
by uh by making locations of stuff in

235
00:08:40,080 --> 00:08:44,560
memory random and undetectable

236
00:08:41,839 --> 00:08:46,640
undiscoverable by applications

237
00:08:44,560 --> 00:08:49,040
and then turning on some more

238
00:08:46,640 --> 00:08:51,360
mandatory kernel memory protections

239
00:08:49,040 --> 00:08:54,160
right now in the arm environment we just

240
00:08:51,360 --> 00:08:56,000
don't have enough memory address space

241
00:08:54,160 --> 00:08:57,440
uh to really enable a lot of the kernel

242
00:08:56,000 --> 00:08:58,720
memory protections that we've used in

243
00:08:57,440 --> 00:09:00,880
other environments

244
00:08:58,720 --> 00:09:03,600
um so we're hoping to be able to do some

245
00:09:00,880 --> 00:09:06,080
of these uh and and improve the security

246
00:09:03,600 --> 00:09:08,959
for arm 32 and mips devices

247
00:09:06,080 --> 00:09:12,160
uh why did i get started in this um so

248
00:09:08,959 --> 00:09:13,920
in last january i was talking uh talking

249
00:09:12,160 --> 00:09:16,240
to you from a different company

250
00:09:13,920 --> 00:09:17,279
and in may i started a new job uh at

251
00:09:16,240 --> 00:09:18,240
amazon

252
00:09:17,279 --> 00:09:19,839
um

253
00:09:18,240 --> 00:09:23,279
and i'm a senior principal engineer

254
00:09:19,839 --> 00:09:25,920
which is a senior a senior uh individual

255
00:09:23,279 --> 00:09:28,560
contributor but i spend most of my time

256
00:09:25,920 --> 00:09:31,040
uh mentoring other engineers uh doing

257
00:09:28,560 --> 00:09:33,600
project management uh and doing and

258
00:09:31,040 --> 00:09:36,080
doing uh high higher scale technical

259
00:09:33,600 --> 00:09:37,760
activities and i really miss uh the

260
00:09:36,080 --> 00:09:40,959
opportunity to get engaged in a

261
00:09:37,760 --> 00:09:41,760
low-level uh seriously technical project

262
00:09:40,959 --> 00:09:43,200
um

263
00:09:41,760 --> 00:09:45,120
it's so i

264
00:09:43,200 --> 00:09:47,360
so because i have so many outside

265
00:09:45,120 --> 00:09:49,600
commitments i was looking for kind of a

266
00:09:47,360 --> 00:09:52,000
side project related to my amazon work

267
00:09:49,600 --> 00:09:54,959
that was technical uh clearly relevant

268
00:09:52,000 --> 00:09:57,440
to amazon and as we use arm 32 this is

269
00:09:54,959 --> 00:09:59,360
this clearly uh relates to that um and

270
00:09:57,440 --> 00:10:01,440
something that is super important but is

271
00:09:59,360 --> 00:10:03,200
not being worked on by other people so

272
00:10:01,440 --> 00:10:05,040
that i could kind of contribute as i had

273
00:10:03,200 --> 00:10:06,959
time

274
00:10:05,040 --> 00:10:08,880
two other side goals of course were

275
00:10:06,959 --> 00:10:10,720
playing with my friend kees he and i

276
00:10:08,880 --> 00:10:13,040
both live in portland um and get

277
00:10:10,720 --> 00:10:15,360
together talk about the linux kernel and

278
00:10:13,040 --> 00:10:17,920
play board games and so having another

279
00:10:15,360 --> 00:10:20,160
topic to chat with kees about meant that

280
00:10:17,920 --> 00:10:21,680
i might get to play more board games

281
00:10:20,160 --> 00:10:23,120
and of course learning another area of

282
00:10:21,680 --> 00:10:25,600
the linux kernel

283
00:10:23,120 --> 00:10:28,320
and in my last job i was

284
00:10:25,600 --> 00:10:30,720
starting to get involved on in kernel

285
00:10:28,320 --> 00:10:33,120
initialization uh for risc-v

286
00:10:30,720 --> 00:10:36,000
processors and i really got excited

287
00:10:33,120 --> 00:10:37,440
about the super low level uh details

288
00:10:36,000 --> 00:10:39,440
about how the kernel ran on an

289
00:10:37,440 --> 00:10:41,279
individual processor and so the

290
00:10:39,440 --> 00:10:43,920
opportunity to figure out how the how

291
00:10:41,279 --> 00:10:45,279
linux runs on arm 32 is is really really

292
00:10:43,920 --> 00:10:46,560
exciting to me

293
00:10:45,279 --> 00:10:49,040
it's something i haven't spent a lot of

294
00:10:46,560 --> 00:10:52,000
time working on i've done a lot of stuff

295
00:10:49,040 --> 00:10:54,480
on device drivers and outside that in

296
00:10:52,000 --> 00:10:56,480
kind of memory memory management

297
00:10:54,480 --> 00:10:58,399
graphics and that kind of thing

298
00:10:56,480 --> 00:11:00,079
and this uh this is really a very

299
00:10:58,399 --> 00:11:02,240
different area and so it's always fun to

300
00:11:00,079 --> 00:11:05,519
learn something new

301
00:11:02,240 --> 00:11:07,680
okay so i want to get started uh

302
00:11:05,519 --> 00:11:08,720
fixing these kspp

303
00:11:07,680 --> 00:11:10,320
issues

304
00:11:08,720 --> 00:11:13,120
so we'll start on bug number one it's

305
00:11:10,320 --> 00:11:15,440
always good to start at the beginning

306
00:11:13,120 --> 00:11:17,360
so this is talking the bug number one

307
00:11:15,440 --> 00:11:19,040
says that we want to get the thread info

308
00:11:17,360 --> 00:11:21,680
out of the kernel stack

309
00:11:19,040 --> 00:11:23,680
uh the thread info is used it contains

310
00:11:21,680 --> 00:11:25,519
information that is uh that is

311
00:11:23,680 --> 00:11:28,800
architecture specific

312
00:11:25,519 --> 00:11:31,519
um and it's used in early cis early it's

313
00:11:28,800 --> 00:11:33,360
used early in syscall operations so when

314
00:11:31,519 --> 00:11:35,040
you jump into the kernel

315
00:11:33,360 --> 00:11:36,880
and you're just and you're going to do a

316
00:11:35,040 --> 00:11:39,120
syscall there's some data in the thread

317
00:11:36,880 --> 00:11:41,839
info that's needed

318
00:11:39,120 --> 00:11:44,079
super early in that process mostly to do

319
00:11:41,839 --> 00:11:46,079
memory bounds checking

320
00:11:44,079 --> 00:11:48,320
it's super vulnerable to kernel stack

321
00:11:46,079 --> 00:11:50,000
overflow it's literally sitting in the

322
00:11:48,320 --> 00:11:51,920
same memory pages

323
00:11:50,000 --> 00:11:53,920
and just below the kernel stack so if

324
00:11:51,920 --> 00:11:55,760
you manage to overflow the kernel stack

325
00:11:53,920 --> 00:11:57,600
you can smash

326
00:11:55,760 --> 00:11:59,440
you can actually smash the thread info

327
00:11:57,600 --> 00:12:01,839
and there's absolutely no detection of

328
00:11:59,440 --> 00:12:04,320
this at all um in particular there's

329
00:12:01,839 --> 00:12:06,959
some critical security bits in here that

330
00:12:04,320 --> 00:12:10,560
that that limit how much that limit

331
00:12:06,959 --> 00:12:12,480
which addresses the application are

332
00:12:10,560 --> 00:12:14,959
considered valid for the application to

333
00:12:12,480 --> 00:12:16,720
use in the syscall interface and if you

334
00:12:14,959 --> 00:12:19,360
can smash that you can actually get the

335
00:12:16,720 --> 00:12:21,600
kernel to access arbitrary kernel memory

336
00:12:19,360 --> 00:12:23,440
uh through the syscall interface

337
00:12:21,600 --> 00:12:24,959
the details about that are kind of a

338
00:12:23,440 --> 00:12:27,680
little more convoluted than we have time

339
00:12:24,959 --> 00:12:30,079
to go into here but it's super super

340
00:12:27,680 --> 00:12:32,959
important to protect this uh protect

341
00:12:30,079 --> 00:12:34,480
these elements uh from applications uh

342
00:12:32,959 --> 00:12:36,959
destroying them

343
00:12:34,480 --> 00:12:40,160
and the number of kernel stack overflow

344
00:12:36,959 --> 00:12:42,079
exploits used to be super big because

345
00:12:40,160 --> 00:12:44,399
everybody used to do this thread info

346
00:12:42,079 --> 00:12:45,920
was always stored in the kernel stack

347
00:12:44,399 --> 00:12:48,639
and when all the other architectures

348
00:12:45,920 --> 00:12:50,399
moved it out all those exploits appeared

349
00:12:48,639 --> 00:12:53,200
to go away because they weren't present

350
00:12:50,399 --> 00:12:55,360
on arm64 or x86

351
00:12:53,200 --> 00:12:58,639
but guess what you can still use all

352
00:12:55,360 --> 00:13:00,639
those same techniques on an arm32 kernel

353
00:12:58,639 --> 00:13:02,880
and the way that this is solved is by

354
00:13:00,639 --> 00:13:05,040
merging this thread info

355
00:13:02,880 --> 00:13:06,959
into another per task data structure

356
00:13:05,040 --> 00:13:08,800
called the task struct

357
00:13:06,959 --> 00:13:11,200
right now the task struct is allocated

358
00:13:08,800 --> 00:13:12,959
just in regular kernel memory

359
00:13:11,200 --> 00:13:14,720
and the thread info is in this magic

360
00:13:12,959 --> 00:13:16,320
spot in the kernel stack

361
00:13:14,720 --> 00:13:18,320
and you can just merge those together

362
00:13:16,320 --> 00:13:20,639
it's a little tricky

363
00:13:18,320 --> 00:13:22,320
for reasons we'll go into later uh but

364
00:13:20,639 --> 00:13:24,320
essentially every other architecture

365
00:13:22,320 --> 00:13:27,040
other than mips has already done this

366
00:13:24,320 --> 00:13:28,079
work and so there's a lot of a lot of uh

367
00:13:27,040 --> 00:13:30,240
a lot of

368
00:13:28,079 --> 00:13:31,839
kind of a well-trodden path

369
00:13:30,240 --> 00:13:33,600
which meant that as i was learning how

370
00:13:31,839 --> 00:13:34,959
this system worked i could go back and

371
00:13:33,600 --> 00:13:37,120
review the patches from other

372
00:13:34,959 --> 00:13:38,959
architectures and figure out how those

373
00:13:37,120 --> 00:13:40,639
how this was done there

374
00:13:38,959 --> 00:13:42,880
and that made it super easy for me to

375
00:13:40,639 --> 00:13:46,000
kind of follow along and figure out

376
00:13:42,880 --> 00:13:48,560
how this work was uh needed to get done

377
00:13:46,000 --> 00:13:49,839
so what does this work really mean uh

378
00:13:48,560 --> 00:13:53,040
what is what are we going to be doing

379
00:13:49,839 --> 00:13:54,959
here uh so in the current arm 32 kernel

380
00:13:53,040 --> 00:13:57,920
we have these two structures we have the

381
00:13:54,959 --> 00:13:59,839
thread info and we have the task struct

382
00:13:57,920 --> 00:14:02,000
um and the thread info has a pointer to

383
00:13:59,839 --> 00:14:04,000
the task and the task struct has a

384
00:14:02,000 --> 00:14:05,680
pointer to the stack segment and of

385
00:14:04,000 --> 00:14:07,680
course the very first thing in the stack

386
00:14:05,680 --> 00:14:09,360
segment is the thread info so they kind

387
00:14:07,680 --> 00:14:11,760
of reference one another so if you have

388
00:14:09,360 --> 00:14:13,120
a task struct you can get a thread info

389
00:14:11,760 --> 00:14:14,639
and if you have a thread info you can

390
00:14:13,120 --> 00:14:16,720
get a task struct

391
00:14:14,639 --> 00:14:18,880
the goal here is to just smash these

392
00:14:16,720 --> 00:14:21,120
together and stick the thread info at

393
00:14:18,880 --> 00:14:23,040
the top of the task struct and the

394
00:14:21,120 --> 00:14:25,680
reason it goes at the top of the task

395
00:14:23,040 --> 00:14:26,880
struct is super complicated um and

396
00:14:25,680 --> 00:14:29,519
there's that i actually have a slide

397
00:14:26,880 --> 00:14:31,920
about the the convolutions uh within the

398
00:14:29,519 --> 00:14:34,000
linux kernel uh that require it to live at

399
00:14:31,920 --> 00:14:36,240
the top of this later on in the talk

400
00:14:34,000 --> 00:14:38,240
that was kind of a surprise to me

401
00:14:36,240 --> 00:14:39,680
welcome to the c programming language

402
00:14:38,240 --> 00:14:41,199
again

403
00:14:39,680 --> 00:14:42,880
so that's the goal

404
00:14:41,199 --> 00:14:44,480
is to take these two data structures and

405
00:14:42,880 --> 00:14:47,600
smash them together
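
A minimal sketch of what the merged layout looks like, assuming heavily cut-down structures: the field names here are illustrative and the real task struct has many more members. With the thread info as the first member, it sits at offset zero, so a pointer to one is also a valid pointer to the other and neither structure needs a back pointer. The helper names task_to_thread_info and thread_info_to_task are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the architecture-specific per-thread data. */
struct thread_info {
    unsigned long flags;
    unsigned long addr_limit;    /* assumed field, for illustration only */
};

/* Cut-down task structure with the thread info embedded at the top. */
struct task_struct {
    struct thread_info thread_info;  /* must stay the first member */
    int pid;
    void *stack;                     /* the kernel stack now lives elsewhere */
};

/* The embedded structure sits at offset zero of its container. */
static_assert(offsetof(struct task_struct, thread_info) == 0,
              "thread_info must be the first member of task_struct");

/* Task to thread info: just take the address of the member. */
struct thread_info *task_to_thread_info(struct task_struct *task)
{
    return &task->thread_info;
}

/* Thread info to task: same address, so a cast is enough. */
struct task_struct *thread_info_to_task(struct thread_info *ti)
{
    return (struct task_struct *)ti;
}
```

Because the thread info is no longer in the same pages as the kernel stack, overflowing the stack no longer overwrites it directly.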

406
00:14:44,480 --> 00:14:49,519
okay so why is the kernel struct uh why

407
00:14:47,600 --> 00:14:51,199
is the thread info in the in the kernel

408
00:14:49,519 --> 00:14:54,480
stack right now

409
00:14:51,199 --> 00:14:56,320
and and the number one reason on the arm

410
00:14:54,480 --> 00:14:57,600
processors is that you want to be able

411
00:14:56,320 --> 00:15:00,079
to find

412
00:14:57,600 --> 00:15:01,040
the thread info from your kernel stack

413
00:15:00,079 --> 00:15:01,920
pointer

414
00:15:01,040 --> 00:15:03,760
um

415
00:15:01,920 --> 00:15:04,959
and uh and

416
00:15:03,760 --> 00:15:06,639
it

417
00:15:04,959 --> 00:15:08,320
when you enter the sys when you enter uh

418
00:15:06,639 --> 00:15:10,480
from a system call

419
00:15:08,320 --> 00:15:12,079
all you've got is the cpu registers and

420
00:15:10,480 --> 00:15:13,519
you need to be able to find something

421
00:15:12,079 --> 00:15:14,800
that lets you know what task is

422
00:15:13,519 --> 00:15:16,720
currently running

423
00:15:14,800 --> 00:15:18,160
um and so on arm the way that we do that

424
00:15:16,720 --> 00:15:21,360
is we just take the current stack

425
00:15:18,160 --> 00:15:22,959
pointer uh mask off all the high bits

426
00:15:21,360 --> 00:15:24,639
and then voila because we're now

427
00:15:22,959 --> 00:15:26,560
pointing to the base of the kernel stack

428
00:15:24,639 --> 00:15:28,560
segment we have a pointer to the threads

429
00:15:26,560 --> 00:15:31,199
to the thread info structure

430
00:15:28,560 --> 00:15:33,120
so it's architecture independent

431
00:15:31,199 --> 00:15:34,959
it doesn't depend upon any other state

432
00:15:33,120 --> 00:15:36,880
in the processor or in the system so

433
00:15:34,959 --> 00:15:39,199
it's atomic with respect to thread

434
00:15:36,880 --> 00:15:41,199
switching i mean it's super fast all you

435
00:15:39,199 --> 00:15:43,279
have to do is take the stack pointer

436
00:15:41,199 --> 00:15:44,399
and do a simple arithmetic operation on

437
00:15:43,279 --> 00:15:46,880
it
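
A minimal sketch of the stack-pointer trick, assuming an 8 KiB stack region aligned to its own size and a cut-down thread info structure. THREAD_SIZE is an illustrative value here and alloc_stack_region is a hypothetical helper, not a kernel function. Keeping only the high bits of the stack pointer, that is, clearing the low-order bits, gives the base of the region, which is where the thread info sits in this layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed for illustration: 8 KiB stack region, aligned to its own size. */
#define THREAD_SIZE 8192u

struct task_struct;                  /* opaque in this sketch */

/* Simplified stand-in; the real structure has many more fields. */
struct thread_info {
    unsigned long flags;
    struct task_struct *task;        /* back pointer to the owning task */
};

/* Clear the low-order bits of any address inside the stack region to get
 * the base of the region, where thread_info is stored. */
struct thread_info *thread_info_from_sp(uintptr_t sp)
{
    return (struct thread_info *)(sp & ~(uintptr_t)(THREAD_SIZE - 1));
}

/* Hypothetical helper: allocate a region aligned to THREAD_SIZE. */
void *alloc_stack_region(void)
{
    void *region = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    assert(region != NULL);
    return region;
}
```

The lookup is a single AND on a register, with no memory access and no dependence on which cpu is running, which is why it is both fast and safe across thread switches. It also shows the weakness: anything that writes past the bottom of the stack lands on the structure found this way.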

438
00:15:44,399 --> 00:15:48,880
a bunch of other per task information is

439
00:15:46,880 --> 00:15:51,040
in the task struct

440
00:15:48,880 --> 00:15:52,800
so we'll need to be able to find that

441
00:15:51,040 --> 00:15:54,240
but fortunately the thread info has a

442
00:15:52,800 --> 00:15:55,759
pointer to that and so we can go get

443
00:15:54,240 --> 00:15:57,519
that when we need it

444
00:15:55,759 --> 00:15:59,600
but the key here is that we need a way

445
00:15:57,519 --> 00:16:02,079
to we need kind of that ground truth

446
00:15:59,600 --> 00:16:03,680
once you once you are just sitting here

447
00:16:02,079 --> 00:16:05,199
happily running along the kernel and you

448
00:16:03,680 --> 00:16:07,920
need to find out what your thread info

449
00:16:05,199 --> 00:16:09,519
is all you have is the cpu registers and

450
00:16:07,920 --> 00:16:12,000
so you need to be able to take those cpu

451
00:16:09,519 --> 00:16:14,480
registers and compute

452
00:16:12,000 --> 00:16:16,079
thread info from them

453
00:16:14,480 --> 00:16:18,480
so that's one of the main reasons

454
00:16:16,079 --> 00:16:20,480
another reason is just historical

455
00:16:18,480 --> 00:16:23,680
in old v7 linux

456
00:16:20,480 --> 00:16:26,880
not linux in old v7 unix

457
00:16:23,680 --> 00:16:30,639
most of the uh per process information

458
00:16:26,880 --> 00:16:33,040
was stored um in in the uh in the

459
00:16:30,639 --> 00:16:35,360
in the in the thread stack as well in

460
00:16:33,040 --> 00:16:36,560
the kernel stack as well

461
00:16:35,360 --> 00:16:38,079
just because

462
00:16:36,560 --> 00:16:40,320
because there wasn't enough memory to

463
00:16:38,079 --> 00:16:43,360
keep it in regular kernel memory um and

464
00:16:40,320 --> 00:16:45,120
so the v7 unix uh saved a bunch of

465
00:16:43,360 --> 00:16:46,240
memory by putting all this per thread

466
00:16:45,120 --> 00:16:49,600
information

467
00:16:46,240 --> 00:16:51,360
uh in the in the in the task itself and

468
00:16:49,600 --> 00:16:52,320
so that when the task got swapped out to

469
00:16:51,360 --> 00:16:54,399
disk

470
00:16:52,320 --> 00:16:56,560
it could it could use that memory for

471
00:16:54,399 --> 00:16:58,800
other things so that's where this kind

472
00:16:56,560 --> 00:17:01,120
of behavior came from

473
00:16:58,800 --> 00:17:04,640
kind of a classic a classic hack of

474
00:17:01,120 --> 00:17:05,520
saving memory and address space

475
00:17:04,640 --> 00:17:08,400
okay

476
00:17:05,520 --> 00:17:11,199
so because we are about we're trying to

477
00:17:08,400 --> 00:17:13,679
smash the thread info and task struct

478
00:17:11,199 --> 00:17:15,919
together those are no longer going to be

479
00:17:13,679 --> 00:17:19,360
in the kernel stack and that means we

480
00:17:15,919 --> 00:17:22,880
need to find another way to get a hold

481
00:17:19,360 --> 00:17:25,439
of the thread info and the task struct

482
00:17:22,880 --> 00:17:27,120
from an arbitrary thread context that

483
00:17:25,439 --> 00:17:28,319
doesn't depend upon the stack pointer

484
00:17:27,120 --> 00:17:30,000
anymore

485
00:17:28,319 --> 00:17:30,880
so we need to we need to find another

486
00:17:30,000 --> 00:17:32,559
way

487
00:17:30,880 --> 00:17:34,880
and the basic problem is is that when

488
00:17:32,559 --> 00:17:37,039
the task enters the kernel we need to

489
00:17:34,880 --> 00:17:39,039
get these two pointers and all we have

490
00:17:37,039 --> 00:17:41,679
are the cpu registers

491
00:17:39,039 --> 00:17:43,840
it can be interrupted at any point

492
00:17:41,679 --> 00:17:45,360
so the the a lot of the places that we

493
00:17:43,840 --> 00:17:47,360
need to get this pointer aren't in an

494
00:17:45,360 --> 00:17:49,600
atomic context which means that between

495
00:17:47,360 --> 00:17:51,840
any two instructions we could switch

496
00:17:49,600 --> 00:17:54,960
which cpu we're running on so we can't

497
00:17:51,840 --> 00:17:57,280
depend upon which cpu we're running on

498
00:17:54,960 --> 00:18:01,360
we need to depend upon context in the

499
00:17:57,280 --> 00:18:02,960
cpu that is that is uh thread specific

500
00:18:01,360 --> 00:18:05,280
and that's why the stack pointer is so

501
00:18:02,960 --> 00:18:07,440
tempting because that is by by its very

502
00:18:05,280 --> 00:18:09,760
nature thread specific it points into

503
00:18:07,440 --> 00:18:11,679
the thread's kernel stack

504
00:18:09,760 --> 00:18:13,600
uh it turns out this is a lot harder

505
00:18:11,679 --> 00:18:15,200
than i thought uh because i really

506
00:18:13,600 --> 00:18:17,520
didn't have a good understanding of what

507
00:18:15,200 --> 00:18:19,840
was required here

508
00:18:17,520 --> 00:18:21,840
uh so i did the dumb thing uh well not

509
00:18:19,840 --> 00:18:23,280
really dumb this the the kind of the

510
00:18:21,840 --> 00:18:26,160
obvious thing

511
00:18:23,280 --> 00:18:28,720
i tried to create a per cpu variable

512
00:18:26,160 --> 00:18:30,559
that would point at the current uh the

513
00:18:28,720 --> 00:18:33,280
current thread info

514
00:18:30,559 --> 00:18:35,120
um per cpu variables are kind of a magic

515
00:18:33,280 --> 00:18:36,559
part of the kernel uh that i got to

516
00:18:35,120 --> 00:18:38,240
learn about when i did this which was

517
00:18:36,559 --> 00:18:40,880
cool uh learning about new stuff is

518
00:18:38,240 --> 00:18:43,760
always fun uh they're allocated uh

519
00:18:40,880 --> 00:18:44,480
they're kind of allocated

520
00:18:43,760 --> 00:18:47,280
at

521
00:18:44,480 --> 00:18:49,919
boot time through magic

522
00:18:47,280 --> 00:18:51,440
every per cpu variable has a magic

523
00:18:49,919 --> 00:18:53,840
offset value

524
00:18:51,440 --> 00:18:57,200
and a base address and you can find your

525
00:18:53,840 --> 00:18:59,520
per cpu value by adding the per cpu

526
00:18:57,200 --> 00:19:02,080
offset to the base address of the

527
00:18:59,520 --> 00:19:04,400
variable it's kind of funky but it means

528
00:19:02,080 --> 00:19:07,200
that you can uh that if you know your

529
00:19:04,400 --> 00:19:10,000
cpu number uh or if you know your your

530
00:19:07,200 --> 00:19:11,520
cpu offset you can go get these per cpu

531
00:19:10,000 --> 00:19:13,760
variables

532
00:19:11,520 --> 00:19:14,960
the problem is on newer arms this isn't

533
00:19:13,760 --> 00:19:17,200
atomic

534
00:19:14,960 --> 00:19:19,360
because you have to load the per cpu

535
00:19:17,200 --> 00:19:21,840
offset from this magic register

536
00:19:19,360 --> 00:19:23,840
that's holding that value on new arms

537
00:19:21,840 --> 00:19:25,360
and then you have to fetch the per cpu

538
00:19:23,840 --> 00:19:28,480
value from memory

539
00:19:25,360 --> 00:19:31,039
the problem is as was pointed out to me

540
00:19:28,480 --> 00:19:32,559
if you if you switch processors between

541
00:19:31,039 --> 00:19:34,880
these two steps

542
00:19:32,559 --> 00:19:36,799
then you have the wrong per cpu value in

543
00:19:34,880 --> 00:19:38,400
your register and you're going to load

544
00:19:36,799 --> 00:19:40,799
the wrong cpu

545
00:19:38,400 --> 00:19:43,200
the wrong uh per cpu value from

546
00:19:40,799 --> 00:19:44,880
memory um and in fact you're going to go

547
00:19:43,200 --> 00:19:46,640
talk about some other thread running in

548
00:19:44,880 --> 00:19:48,640
the system which is probably not going

549
00:19:46,640 --> 00:19:52,240
to work out very well

550
00:19:48,640 --> 00:19:54,480
and older arm 32 is even more problematic

551
00:19:52,240 --> 00:19:56,960
the per cpu offset on these older arm

552
00:19:54,480 --> 00:19:59,039
processors are stored in memory

553
00:19:56,960 --> 00:20:00,240
and it's fetched using the cpu as an

554
00:19:59,039 --> 00:20:03,120
index

555
00:20:00,240 --> 00:20:05,440
oh and the only place the system stores

556
00:20:03,120 --> 00:20:07,840
the cpu index for the current thread is

557
00:20:05,440 --> 00:20:10,000
oh right in thread info

558
00:20:07,840 --> 00:20:11,039
so that means if i want to find the per

559
00:20:10,000 --> 00:20:12,960
thread

560
00:20:11,039 --> 00:20:15,919
data structure this thread info data

561
00:20:12,960 --> 00:20:17,919
structure i need to go get go get the

562
00:20:15,919 --> 00:20:20,559
per cpu offset which is stored in an

563
00:20:17,919 --> 00:20:23,120
array which is indexed by the cpu

564
00:20:20,559 --> 00:20:24,799
which is in the thread info so i can't

565
00:20:23,120 --> 00:20:26,799
do this at all

566
00:20:24,799 --> 00:20:28,720
fortunately around the same time

567
00:20:26,799 --> 00:20:29,440
kees decided to

568
00:20:28,720 --> 00:20:31,520
to

569
00:20:29,440 --> 00:20:33,280
bring another member kernel developer

570
00:20:31,520 --> 00:20:36,960
into his team

571
00:20:33,280 --> 00:20:37,760
and that that person is is uh named ard

572
00:20:36,960 --> 00:20:38,720
um

573
00:20:37,760 --> 00:20:41,200
uh

574
00:20:38,720 --> 00:20:43,039
biesheuvel

575
00:20:41,200 --> 00:20:46,080
and i asked permission to pronounce his

576
00:20:43,039 --> 00:20:48,720
name in public uh and i apologize uh for

577
00:20:46,080 --> 00:20:49,600
not getting it quite right um he and

578
00:20:48,720 --> 00:20:51,600
kees

579
00:20:49,600 --> 00:20:52,640
are he's actually

580
00:20:51,600 --> 00:20:54,640
dutch

581
00:20:52,640 --> 00:20:56,960
and kees tried to give me some pointers

582
00:20:54,640 --> 00:20:59,200
on how to pronounce his name uh and so i

583
00:20:56,960 --> 00:21:01,440
hope i did okay uh kees brought ard into

584
00:20:59,200 --> 00:21:02,960
his google team uh probably about four

585
00:21:01,440 --> 00:21:04,559
or five months ago

586
00:21:02,960 --> 00:21:06,320
um and one of the first things he

587
00:21:04,559 --> 00:21:09,039
started to do was reviewing the patches

588
00:21:06,320 --> 00:21:10,880
that i'd provided um which was awesome

589
00:21:09,039 --> 00:21:12,960
you know the the best part about a free

590
00:21:10,880 --> 00:21:15,120
software world is that when you submit

591
00:21:12,960 --> 00:21:17,120
code out you get comments back

592
00:21:15,120 --> 00:21:20,000
and ard's comments were super helpful

593
00:21:17,120 --> 00:21:21,600
and really positive um and so one of the

594
00:21:20,000 --> 00:21:23,280
comments that he made was yeah that per

595
00:21:21,600 --> 00:21:25,200
cpu variable thing is probably not going

596
00:21:23,280 --> 00:21:27,360
to work out very well for you

597
00:21:25,200 --> 00:21:29,280
so we'll have to go fix uh go fix the

598
00:21:27,360 --> 00:21:30,880
whole how do we find the thread info

599
00:21:29,280 --> 00:21:32,559
thing

600
00:21:30,880 --> 00:21:34,320
okay so we decided to make a second

601
00:21:32,559 --> 00:21:36,720
attempt

602
00:21:34,320 --> 00:21:38,080
that only worked on some arm 32

603
00:21:36,720 --> 00:21:41,120
processors

604
00:21:38,080 --> 00:21:43,520
newer arm 32 processors have have a

605
00:21:41,120 --> 00:21:46,799
bunch of extra registers

606
00:21:43,520 --> 00:21:49,440
and two of them are these tpidrprw and

607
00:21:46,799 --> 00:21:51,840
tpidruro

608
00:21:49,440 --> 00:21:54,080
i have no idea what those initialisms

609
00:21:51,840 --> 00:21:56,159
are supposed to stand for

610
00:21:54,080 --> 00:22:00,400
but i knew i do know that the

611
00:21:56,159 --> 00:22:03,039
tpidrprw register was already used

612
00:22:00,400 --> 00:22:04,400
as i as we as we saw before that's

613
00:22:03,039 --> 00:22:06,559
already being used in the kernel for

614
00:22:04,400 --> 00:22:10,880
these per cpu offsets

615
00:22:06,559 --> 00:22:12,799
uh and the tpidruro

616
00:22:10,880 --> 00:22:15,440
is already being used

617
00:22:12,799 --> 00:22:17,919
in the gcc abi

618
00:22:15,440 --> 00:22:20,640
for the tls base register so when when

619
00:22:17,919 --> 00:22:22,640
you're up running in user space you're

620
00:22:20,640 --> 00:22:24,880
using that register to find the base

621
00:22:22,640 --> 00:22:27,280
of your thread local storage uh

622
00:22:24,880 --> 00:22:29,360
data in your in your application

623
00:22:27,280 --> 00:22:31,120
and and so those two registers which i

624
00:22:29,360 --> 00:22:32,720
had available to me were both already

625
00:22:31,120 --> 00:22:34,799
being used

626
00:22:32,720 --> 00:22:37,760
so i put together a patch that switched

627
00:22:34,799 --> 00:22:39,760
the tpidrprw

628
00:22:37,760 --> 00:22:40,880
from the per cpu offset to the thread

629
00:22:39,760 --> 00:22:42,559
info

630
00:22:40,880 --> 00:22:45,200
because i knew that once i had the

631
00:22:42,559 --> 00:22:47,600
thread info i could go get the cpu out

632
00:22:45,200 --> 00:22:50,000
of the thread info and use that to find

633
00:22:47,600 --> 00:22:51,760
the the per cpu offset

634
00:22:50,000 --> 00:22:54,240
using the global array just like it does

635
00:22:51,760 --> 00:22:55,760
on uniprocessor on on older arm

636
00:22:54,240 --> 00:22:57,280
processors

637
00:22:55,760 --> 00:22:58,799
so that seemed like a pretty easy thing

638
00:22:57,280 --> 00:23:00,400
to change

639
00:22:58,799 --> 00:23:03,280
and i put together a patch and i got it

640
00:23:00,400 --> 00:23:06,000
all working and i submitted it and

641
00:23:03,280 --> 00:23:08,240
i got back a bunch of comments uh mostly

642
00:23:06,000 --> 00:23:11,120
about um that's a performance problem we

643
00:23:08,240 --> 00:23:12,640
use this per cpu offset a bunch

644
00:23:11,120 --> 00:23:14,720
so now you're making that a lot more

645
00:23:12,640 --> 00:23:16,480
expensive to get you're adding several

646
00:23:14,720 --> 00:23:18,240
memory fetches in fact

647
00:23:16,480 --> 00:23:21,120
to go get that data you have to go get

648
00:23:18,240 --> 00:23:22,880
the cpu index out of the thread info

649
00:23:21,120 --> 00:23:25,520
and then you have to go get the per cpu

650
00:23:22,880 --> 00:23:27,440
offset out of the array

651
00:23:25,520 --> 00:23:31,600
and the other thing is is that gcc

652
00:23:27,440 --> 00:23:33,760
already knows about tpidruro

653
00:23:31,600 --> 00:23:36,159
because it uses it in user space for the

654
00:23:33,760 --> 00:23:39,280
thread local storage pointer so we can

655
00:23:36,159 --> 00:23:41,039
actually use this magic gcc built-in

656
00:23:39,280 --> 00:23:43,279
function that knows all kinds of

657
00:23:41,039 --> 00:23:45,120
semantics about __builtin_thread_pointer

658
00:23:43,279 --> 00:23:47,600
which means that when i optimize the

659
00:23:45,120 --> 00:23:50,480
code gcc knows how that function behaves

660
00:23:47,600 --> 00:23:51,600
it knows that it doesn't change a change

661
00:23:50,480 --> 00:23:54,000
between

662
00:23:51,600 --> 00:23:56,559
function calls it's always the same

663
00:23:54,000 --> 00:23:59,679
and so gcc can actually hoist

664
00:23:56,559 --> 00:24:02,080
that operation out of inner loops it can

665
00:23:59,679 --> 00:24:04,000
share the share the value between

666
00:24:02,080 --> 00:24:07,279
multiple statements that use this that

667
00:24:04,000 --> 00:24:10,799
use the the um the thread info pointer

668
00:24:07,279 --> 00:24:14,000
and so using tpidruro was super

669
00:24:10,799 --> 00:24:16,159
tempting uh for that reason as well

670
00:24:14,000 --> 00:24:17,120
so ard suggests that we try using that

671
00:24:16,159 --> 00:24:19,039
instead

672
00:24:17,120 --> 00:24:21,840
um the com the hard part there is that

673
00:24:19,039 --> 00:24:23,520
now we need to actually save and restore

674
00:24:21,840 --> 00:24:24,880
uh that value whenever we go back to

675
00:24:23,520 --> 00:24:26,720
user space

676
00:24:24,880 --> 00:24:28,640
but that didn't turn out to be too bad

677
00:24:26,720 --> 00:24:31,120
um in the kernel and we already save and

678
00:24:28,640 --> 00:24:33,360
restore a bunch of registers across that

679
00:24:31,120 --> 00:24:35,279
across that operation anyhow

680
00:24:33,360 --> 00:24:37,840
and we often times have to restore the

681
00:24:35,279 --> 00:24:39,760
tpidruro register going back to user

682
00:24:37,840 --> 00:24:40,720
space because we need to make sure that

683
00:24:39,760 --> 00:24:42,640
the right

684
00:24:40,720 --> 00:24:44,799
thread local storage pointer

685
00:24:42,640 --> 00:24:47,360
is stored for user space so we just

686
00:24:44,799 --> 00:24:49,919
needed to add a couple more checks uh

687
00:24:47,360 --> 00:24:50,960
to make that happen

688
00:24:49,919 --> 00:24:54,400
okay

689
00:24:50,960 --> 00:24:56,640
so now we are uh at slide 15

690
00:24:54,400 --> 00:24:59,279
and all that we've managed to do is get

691
00:24:56,640 --> 00:25:01,440
a pointer set uh in the kernel

692
00:24:59,279 --> 00:25:03,279
we haven't actually changed anything yet

693
00:25:01,440 --> 00:25:04,559
uh the data structures are still all in

694
00:25:03,279 --> 00:25:06,799
the same place

695
00:25:04,559 --> 00:25:09,200
but now that we have this pointer we can

696
00:25:06,799 --> 00:25:10,799
finally move the thread info

697
00:25:09,200 --> 00:25:12,640
and so that turned out to be kind of the

698
00:25:10,799 --> 00:25:15,039
easiest part of the easiest part of this

699
00:25:12,640 --> 00:25:17,600
process because the kernel already knows

700
00:25:15,039 --> 00:25:19,360
what to do here it already has all this

701
00:25:17,600 --> 00:25:20,799
configuration infrastructure and code

702
00:25:19,360 --> 00:25:23,600
that supports

703
00:25:20,799 --> 00:25:24,640
thread this config option thread info

704
00:25:23,600 --> 00:25:27,679
in task

705
00:25:24,640 --> 00:25:29,440
and so it was super easy to enable that

706
00:25:27,679 --> 00:25:30,799
and all of a sudden things were working

707
00:25:29,440 --> 00:25:33,440
again

708
00:25:30,799 --> 00:25:35,520
it took a few arm specific changes

709
00:25:33,440 --> 00:25:37,919
uh to actually use the register in the

710
00:25:35,520 --> 00:25:40,640
appropriate places um and then to kind

711
00:25:37,919 --> 00:25:42,960
of clean up the the thread info to get

712
00:25:40,640 --> 00:25:44,080
rid of the dregs of data that were stuck

713
00:25:42,960 --> 00:25:45,840
in there

714
00:25:44,080 --> 00:25:47,200
the good news is that this piece of the

715
00:25:45,840 --> 00:25:49,840
patch has actually

716
00:25:47,200 --> 00:25:50,799
landed in 5.16 kernel

717
00:25:49,840 --> 00:25:53,520
and so

718
00:25:50,799 --> 00:25:57,919
we've actually fixed bug number one

719
00:25:53,520 --> 00:25:59,600
for some arm architectures in 5.16 and

720
00:25:57,919 --> 00:26:00,640
we'll talk about the remaining work to

721
00:25:59,600 --> 00:26:02,240
be done

722
00:26:00,640 --> 00:26:03,679
later on

723
00:26:02,240 --> 00:26:05,919
okay so

724
00:26:03,679 --> 00:26:07,760
we managed to get that fixed

725
00:26:05,919 --> 00:26:10,480
but in the process of fixing that there

726
00:26:07,760 --> 00:26:12,000
was an unexpected new problem of course

727
00:26:10,480 --> 00:26:14,559
always right

728
00:26:12,000 --> 00:26:16,880
the thread info in task struct

729
00:26:14,559 --> 00:26:18,799
i don't know why but somebody decided

730
00:26:16,880 --> 00:26:21,120
that when you enabled that it should

731
00:26:18,799 --> 00:26:23,760
move where the cpu field remember that

732
00:26:21,120 --> 00:26:25,279
cpu field we're using to index the per

733
00:26:23,760 --> 00:26:28,080
thread information

734
00:26:25,279 --> 00:26:30,480
i mean the the per cpu information it's

735
00:26:28,080 --> 00:26:31,279
used for other things as well

736
00:26:30,480 --> 00:26:33,039
and it

737
00:26:31,279 --> 00:26:34,400
and for some reason when they move the

738
00:26:33,039 --> 00:26:36,720
thread info

739
00:26:34,400 --> 00:26:38,559
and merged it into the task struct

740
00:26:36,720 --> 00:26:42,640
all of those patches also assume that

741
00:26:38,559 --> 00:26:44,559
the cpu field is now in the task struct

742
00:26:42,640 --> 00:26:45,840
on all architectures

743
00:26:44,559 --> 00:26:47,840
so you don't get a choice about where

744
00:26:45,840 --> 00:26:49,760
that lives you didn't get a choice about

745
00:26:47,840 --> 00:26:53,039
where that lives

746
00:26:49,760 --> 00:26:56,640
the problem is is the task struct is

747
00:26:53,039 --> 00:26:58,880
super super huge and it has

748
00:26:56,640 --> 00:27:01,679
hundreds of fields

749
00:26:58,880 --> 00:27:04,240
of data types from all over the kernel

750
00:27:01,679 --> 00:27:07,760
architecture specific architect

751
00:27:04,240 --> 00:27:09,279
architecture independent um and so the

752
00:27:07,760 --> 00:27:12,559
problem is is that

753
00:27:09,279 --> 00:27:14,159
i can't include the files that reference

754
00:27:12,559 --> 00:27:17,039
the task struct

755
00:27:14,159 --> 00:27:20,320
every place that i need to fetch the cpu

756
00:27:17,039 --> 00:27:23,039
field out of the thread info

757
00:27:20,320 --> 00:27:26,320
the circular it became a circular

758
00:27:23,039 --> 00:27:29,120
include reference uh kind of disaster

759
00:27:26,320 --> 00:27:30,320
um you you'd try to include that file

760
00:27:29,120 --> 00:27:32,960
and then all of a sudden you had to

761
00:27:30,320 --> 00:27:34,720
include 50 other files before it

762
00:27:32,960 --> 00:27:37,520
and the patch were the patch that i

763
00:27:34,720 --> 00:27:39,279
actually put together to to fix this

764
00:27:37,520 --> 00:27:41,120
problem touched

765
00:27:39,279 --> 00:27:45,279
three or four thousand files in the

766
00:27:41,120 --> 00:27:47,840
kernel um and was super invasive um and

767
00:27:45,279 --> 00:27:50,080
kind of super scary in terms of

768
00:27:47,840 --> 00:27:52,320
changing how files were getting included

769
00:27:50,080 --> 00:27:53,840
across large swaths of the kernel which

770
00:27:52,320 --> 00:27:56,480
meant that it was really difficult for

771
00:27:53,840 --> 00:27:57,600
me to validate that it was correct

772
00:27:56,480 --> 00:27:59,200
and so

773
00:27:57,600 --> 00:28:00,480
i looked at that and decided to kind of

774
00:27:59,200 --> 00:28:02,559
walk away

775
00:28:00,480 --> 00:28:04,320
and not do that fix

776
00:28:02,559 --> 00:28:05,520
and so the first patch that i put

777
00:28:04,320 --> 00:28:08,320
together

778
00:28:05,520 --> 00:28:11,039
adopted a patch uh a terrible hack that

779
00:28:08,320 --> 00:28:13,840
power had uh had implemented

780
00:28:11,039 --> 00:28:16,480
where it computed the offset of the cpu

781
00:28:13,840 --> 00:28:19,120
field within the task struct before the

782
00:28:16,480 --> 00:28:20,960
kernel was compiled and then has this

783
00:28:19,120 --> 00:28:23,760
magic function that would add that

784
00:28:20,960 --> 00:28:26,320
offset to the base address of the of the

785
00:28:23,760 --> 00:28:28,399
thread info pointer and know that the

786
00:28:26,320 --> 00:28:31,200
thread info pointer was always embedded

787
00:28:28,399 --> 00:28:34,480
in a task struct and go pull the cpu

788
00:28:31,200 --> 00:28:36,320
field out of the enclosing task struct

789
00:28:34,480 --> 00:28:37,840
that was really awful

790
00:28:36,320 --> 00:28:39,600
but it did get rid of the circular

791
00:28:37,840 --> 00:28:41,760
reference problem

792
00:28:39,600 --> 00:28:44,320
and so that was kind of a piece of the

793
00:28:41,760 --> 00:28:47,200
first patch that i put together

794
00:28:44,320 --> 00:28:49,120
and ard suggested a better solution a

795
00:28:47,200 --> 00:28:50,799
clearly better solution

796
00:28:49,120 --> 00:28:52,799
instead of

797
00:28:50,799 --> 00:28:54,000
using this terrible kludge we should go

798
00:28:52,799 --> 00:28:56,799
evaluate

799
00:28:54,000 --> 00:28:58,960
why the cpu field was in the task struct

800
00:28:56,799 --> 00:29:01,679
and could we just move it back so

801
00:28:58,960 --> 00:29:04,000
instead of being in the task struct it

802
00:29:01,679 --> 00:29:06,240
could be in the thread info again

803
00:29:04,000 --> 00:29:08,240
and from a memory perspective it doesn't

804
00:29:06,240 --> 00:29:10,320
matter at all these the remember the

805
00:29:08,240 --> 00:29:13,200
thread info is embedded in the task

806
00:29:10,320 --> 00:29:15,360
struct so placing the cpu field in the

807
00:29:13,200 --> 00:29:16,240
thread info didn't change the allocation

808
00:29:15,360 --> 00:29:17,440
at all

809
00:29:16,240 --> 00:29:20,320
we were still going to have it in the

810
00:29:17,440 --> 00:29:22,799
same piece of the same allocation and in

811
00:29:20,320 --> 00:29:25,039
fact when i looked at x86 it turned out

812
00:29:22,799 --> 00:29:27,600
that moving the cpu field

813
00:29:25,039 --> 00:29:30,399
from where it was located in the task

814
00:29:27,600 --> 00:29:32,320
struct back into the thread info was

815
00:29:30,399 --> 00:29:33,279
going to get rid of a couple of padding

816
00:29:32,320 --> 00:29:35,279
fields

817
00:29:33,279 --> 00:29:36,960
that got inserted into the task struct

818
00:29:35,279 --> 00:29:38,720
and shrink it so that was kind of pretty

819
00:29:36,960 --> 00:29:40,960
cool

820
00:29:38,720 --> 00:29:43,600
and so we actually worked with all the

821
00:29:40,960 --> 00:29:45,600
other architecture well we by we i mean

822
00:29:43,600 --> 00:29:48,080
ard uh worked with all of the other

823
00:29:45,600 --> 00:29:50,159
architecture teams uh to get these

824
00:29:48,080 --> 00:29:52,880
patches landed and he managed to land

825
00:29:50,159 --> 00:29:54,960
patches that moved the cpu field back

826
00:29:52,880 --> 00:29:56,640
into the thread info

827
00:29:54,960 --> 00:30:00,159
one of the results of that was the power

828
00:29:56,640 --> 00:30:03,039
pc uh terrible hack got removed and so

829
00:30:00,159 --> 00:30:05,440
now power gets a cleaner implementation

830
00:30:03,039 --> 00:30:08,720
uh for fetching the cpu field

831
00:30:05,440 --> 00:30:11,600
uh x86 saves a little bit of memory um

832
00:30:08,720 --> 00:30:13,600
and we are our fine arm patches can move

833
00:30:11,600 --> 00:30:15,679
forward

834
00:30:13,600 --> 00:30:17,919
okay so we fixed it on

835
00:30:15,679 --> 00:30:20,880
these arm chips that have these magic new

836
00:30:17,919 --> 00:30:24,880
registers uh so those are

837
00:30:20,880 --> 00:30:28,000
armv7 and armv6k

838
00:30:24,880 --> 00:30:29,840
it turns out that the only smp parts

839
00:30:28,000 --> 00:30:32,960
supported in the kernel right now which

840
00:30:29,840 --> 00:30:36,720
is to say the only kernel the only arm

841
00:30:32,960 --> 00:30:40,159
chips you can run a multi-core kernel on

842
00:30:36,720 --> 00:30:42,960
are either v7 or v6k and both of those

843
00:30:40,159 --> 00:30:45,520
include this magic new register and all

844
00:30:42,960 --> 00:30:47,360
the other arm chips that we run linux on

845
00:30:45,520 --> 00:30:49,600
are uniprocessor

846
00:30:47,360 --> 00:30:51,360
well it turns out that registers and

847
00:30:49,600 --> 00:30:52,559
memory are really similar on

848
00:30:51,360 --> 00:30:53,919
uniprocessor

849
00:30:52,559 --> 00:30:56,399
because

850
00:30:53,919 --> 00:30:58,000
the the there is only one set of

851
00:30:56,399 --> 00:31:00,640
registers and there's only one set of

852
00:30:58,000 --> 00:31:02,960
memory so on a uniprocessor part we

853
00:31:00,640 --> 00:31:05,760
can actually use a global variable

854
00:31:02,960 --> 00:31:07,039
for the current thread pointer

855
00:31:05,760 --> 00:31:09,600
um and so

856
00:31:07,039 --> 00:31:10,960
we don't need to use the old thread info

857
00:31:09,600 --> 00:31:13,679
in stack

858
00:31:10,960 --> 00:31:15,840
code anymore because we can just use the

859
00:31:13,679 --> 00:31:17,919
we can you we can put the thread info in

860
00:31:15,840 --> 00:31:19,600
the task struct and then either

861
00:31:17,919 --> 00:31:22,399
reference the thread info the current

862
00:31:19,600 --> 00:31:24,559
thread info using the magic cpu register

863
00:31:22,399 --> 00:31:26,720
or if you're on a uniprocessor part

864
00:31:24,559 --> 00:31:28,240
you can use a global variable

865
00:31:26,720 --> 00:31:30,720
so that meant that we could get rid of

866
00:31:28,240 --> 00:31:32,720
the old code paths in in the arm code

867
00:31:30,720 --> 00:31:34,880
and that was really the kind of the the

868
00:31:32,720 --> 00:31:37,440
big concern that we had was that if we

869
00:31:34,880 --> 00:31:40,399
had to leave around support for thread

870
00:31:37,440 --> 00:31:41,279
info in the kernel stack then all of the

871
00:31:40,399 --> 00:31:43,039
arm

872
00:31:41,279 --> 00:31:45,760
assembly code and all of the arm

873
00:31:43,039 --> 00:31:49,120
specific architecture code would have to

874
00:31:45,760 --> 00:31:51,039
have to retain this these old code paths

875
00:31:49,120 --> 00:31:52,960
and because it didn't get run very often

876
00:31:51,039 --> 00:31:54,559
for testing and certainly i wasn't ever

877
00:31:52,960 --> 00:31:56,399
running it and ard wasn't ever running

878
00:31:54,559 --> 00:31:58,080
it we were concerned that that code

879
00:31:56,399 --> 00:32:00,559
would eventually stop working correctly

880
00:31:58,080 --> 00:32:02,880
in some cases and so getting to a single

881
00:32:00,559 --> 00:32:05,440
code path was super important to make

882
00:32:02,880 --> 00:32:07,679
these patches viable and so figuring out

883
00:32:05,440 --> 00:32:09,519
a figuring out that there were no parts

884
00:32:07,679 --> 00:32:12,399
that we needed to use that old code path

885
00:32:09,519 --> 00:32:13,760
on was really really helpful

886
00:32:12,399 --> 00:32:16,320
and now we just had to figure out what

887
00:32:13,760 --> 00:32:20,080
we wanted to do uh to solve the

888
00:32:16,320 --> 00:32:22,880
uniprocessor versus uh smp parts

889
00:32:20,080 --> 00:32:24,799
um and so uh i i actually did wasn't

890
00:32:22,880 --> 00:32:26,880
part of this development at all

891
00:32:24,799 --> 00:32:28,960
this happened in november and december

892
00:32:26,880 --> 00:32:30,240
when ard went back and figured out that

893
00:32:28,960 --> 00:32:33,919
what he could do

894
00:32:30,240 --> 00:32:36,399
was patch the linux kernel at boot time

895
00:32:33,919 --> 00:32:39,200
to switch between these two modes so you

896
00:32:36,399 --> 00:32:41,679
can actually compile the kernel and say

897
00:32:39,200 --> 00:32:44,000
if you run on a uniprocessor part that

898
00:32:41,679 --> 00:32:47,200
doesn't have this magic register

899
00:32:44,000 --> 00:32:50,000
then rewrite all the code that that

900
00:32:47,200 --> 00:32:51,840
fetches the current thread info pointer

901
00:32:50,000 --> 00:32:54,559
from a register rewrite that with code

902
00:32:51,840 --> 00:32:56,559
that fetches it from a global variable

903
00:32:54,559 --> 00:32:57,600
it took some clever code to make sure

904
00:32:56,559 --> 00:33:01,440
that the

905
00:32:57,600 --> 00:33:03,600
uh code sequences that fetch data from

906
00:33:01,440 --> 00:33:06,000
memory didn't need any temporary

907
00:33:03,600 --> 00:33:10,320
registers because we didn't have any uh

908
00:33:06,000 --> 00:33:12,480
in the in the uh to to use at that point

909
00:33:10,320 --> 00:33:14,880
but ard found a nice clever mechanism

910
00:33:12,480 --> 00:33:17,039
using a couple of a couple of magic arm

911
00:33:14,880 --> 00:33:19,360
instructions in thumb mode that managed

912
00:33:17,039 --> 00:33:22,559
to do that computation

913
00:33:19,360 --> 00:33:24,960
so now we can compile the kernel

914
00:33:22,559 --> 00:33:28,080
uh a single kernel can get compiled that

915
00:33:24,960 --> 00:33:30,080
will run on all arm systems uh

916
00:33:28,080 --> 00:33:32,080
uniprocessor or the magic new

917
00:33:30,080 --> 00:33:34,000
multiprocessor parts with the

918
00:33:32,080 --> 00:33:36,720
magic registers

919
00:33:34,000 --> 00:33:38,480
these patches haven't quite landed

920
00:33:36,720 --> 00:33:39,279
these and a couple of other things are

921
00:33:38,480 --> 00:33:42,720
are

922
00:33:39,279 --> 00:33:44,240
are hoping to land in 5.18

923
00:33:42,720 --> 00:33:45,279
um but it's going to be super cool

924
00:33:44,240 --> 00:33:47,360
because all of a sudden we're going to

925
00:33:45,279 --> 00:33:49,840
have a single kernel that runs on uh

926
00:33:47,360 --> 00:33:53,440
that runs on all arm parts um and fixes

927
00:33:49,840 --> 00:33:55,519
this particular bug okay so after we

928
00:33:53,440 --> 00:33:58,240
found after we got uh bug number one

929
00:33:55,519 --> 00:34:00,399
fixed uh ard has started to go off uh

930
00:33:58,240 --> 00:34:02,000
and work on bug number two

931
00:34:00,399 --> 00:34:03,919
we hope it doesn't take six months to

932
00:34:02,000 --> 00:34:05,840
fix that one as well

933
00:34:03,919 --> 00:34:08,240
and that was talking about guard pages

934
00:34:05,840 --> 00:34:10,000
around the kernel stack um

935
00:34:08,240 --> 00:34:12,320
we're going to put missing pages around

936
00:34:10,000 --> 00:34:14,480
the kernel stack to cause a hardware

937
00:34:12,320 --> 00:34:16,639
fault and that'll that'll catch both

938
00:34:14,480 --> 00:34:18,480
underflow and underflow overflow and

939
00:34:16,639 --> 00:34:20,399
underflow of the stack

940
00:34:18,480 --> 00:34:22,399
and it means that applicate exploits

941
00:34:20,399 --> 00:34:24,240
can't write read or write data adjacent

942
00:34:22,399 --> 00:34:26,000
to the stack allocation

943
00:34:24,240 --> 00:34:28,079
remember the current the thread info

944
00:34:26,000 --> 00:34:30,320
used to be right in the stack frame and

945
00:34:28,079 --> 00:34:32,560
so a stack overflow could smash that

946
00:34:30,320 --> 00:34:34,879
but if the stack is allocated in regular

947
00:34:32,560 --> 00:34:36,960
kernel memory then whatever is allocated

948
00:34:34,879 --> 00:34:39,040
right next to the kernel stack is still

949
00:34:36,960 --> 00:34:40,480
subject to overflow

950
00:34:39,040 --> 00:34:42,079
and so we want to protect the kernel

951
00:34:40,480 --> 00:34:44,480
against that as well

952
00:34:42,079 --> 00:34:46,079
it requires a virtually mapped stack and

953
00:34:44,480 --> 00:34:48,960
that means that the

954
00:34:46,079 --> 00:34:50,320
the stack is no longer going to be part

955
00:34:48,960 --> 00:34:51,839
of the

956
00:34:50,320 --> 00:34:54,399
part of the kernel's linear address

957
00:34:51,839 --> 00:34:57,599
space and that has a huge number of

958
00:34:54,399 --> 00:35:00,800
implications across the entire kernel

959
00:34:57,599 --> 00:35:02,720
uh especially for low-level boot code

960
00:35:00,800 --> 00:35:05,760
and especially on old

961
00:35:02,720 --> 00:35:07,359
architectures like arm32

962
00:35:05,760 --> 00:35:09,280
um so

963
00:35:07,359 --> 00:35:11,359
the the first thing we tried was just

964
00:35:09,280 --> 00:35:13,440
turn it on and see what happens

965
00:35:11,359 --> 00:35:14,960
there's a CONFIG_VMAP_STACK option and

966
00:35:13,440 --> 00:35:16,960
you just turn it on and suddenly you get

967
00:35:14,960 --> 00:35:20,560
virtually mapped stacks

968
00:35:16,960 --> 00:35:22,400
well on arm 32 yeah not so much

969
00:35:20,560 --> 00:35:24,800
there's a huge amount of code in the arm

970
00:35:22,400 --> 00:35:27,040
32 architecture specific stuff that

971
00:35:24,800 --> 00:35:28,640
assumes the kernel stacks are all in

972
00:35:27,040 --> 00:35:32,160
this linear map

973
00:35:28,640 --> 00:35:34,240
and not mapped in virtual address space

974
00:35:32,160 --> 00:35:35,920
the suspend and resume code assumes that

975
00:35:34,240 --> 00:35:37,920
the physical addresses will match the

976
00:35:35,920 --> 00:35:40,240
kernel addresses

977
00:35:37,920 --> 00:35:42,640
we also have to deal with the fact that

978
00:35:40,240 --> 00:35:44,400
if you can overflow the stack

979
00:35:42,640 --> 00:35:46,320
you want to be able to recover safely

980
00:35:44,400 --> 00:35:47,839
from the stack overflow

981
00:35:46,320 --> 00:35:50,720
so in order to do that you need to

982
00:35:47,839 --> 00:35:53,359
allocate a new temporary stack in case

983
00:35:50,720 --> 00:35:55,359
you overflow the current kernel stack

984
00:35:53,359 --> 00:35:57,040
and so for every core in the

985
00:35:55,359 --> 00:35:59,040
system there's an overflow stack

986
00:35:57,040 --> 00:36:00,480
allocated a single page

987
00:35:59,040 --> 00:36:01,599
that we can use when we're dealing with

988
00:36:00,480 --> 00:36:03,599
a problem

989
00:36:01,599 --> 00:36:06,400
but now all of a sudden the kernel stack

990
00:36:03,599 --> 00:36:08,640
might not be contiguous in memory

991
00:36:06,400 --> 00:36:11,359
and that causes all kinds of havoc

992
00:36:08,640 --> 00:36:13,440
especially with kernel stack traces so

993
00:36:11,359 --> 00:36:15,040
when you get a kernel stack overflow you

994
00:36:13,440 --> 00:36:16,480
really want to find out where that

995
00:36:15,040 --> 00:36:18,800
happened you really want to get a stack

996
00:36:16,480 --> 00:36:20,160
trace printed out that's reliable to the

997
00:36:18,800 --> 00:36:23,280
to the console

998
00:36:20,160 --> 00:36:25,280
so you can figure out where the bug was

999
00:36:23,280 --> 00:36:26,880
and that means that ard had

1000
00:36:25,280 --> 00:36:29,520
to go back and

1001
00:36:26,880 --> 00:36:31,200
re-engineer a ton of the

1002
00:36:29,520 --> 00:36:34,000
stack trace code to deal with the fact

1003
00:36:31,200 --> 00:36:36,720
that now we have this overflow stack and

1004
00:36:34,000 --> 00:36:39,520
it has pointer references back up into

1005
00:36:36,720 --> 00:36:42,079
the main stack and those are

1006
00:36:39,520 --> 00:36:44,000
not in contiguous parts of memory

1007
00:36:42,079 --> 00:36:45,200
and those patches still have not landed

1008
00:36:44,000 --> 00:36:48,000
upstream

1009
00:36:45,200 --> 00:36:50,640
uh but we're getting closer

1010
00:36:48,000 --> 00:36:52,240
okay so what is the current status

1011
00:36:50,640 --> 00:36:54,960
as i said above

1012
00:36:52,240 --> 00:36:58,079
the bug number one has been fixed uh for

1013
00:36:54,960 --> 00:36:59,280
for the modern arm 32 processors v6k and

1014
00:36:58,079 --> 00:37:02,240
v7

1015
00:36:59,280 --> 00:37:04,480
um we're waiting for the fix for

1016
00:37:02,240 --> 00:37:04,480
number one to get merged um we

1017
00:37:04,480 --> 00:37:12,160
hope to get that merged in in 5.18

1018
00:37:08,560 --> 00:37:14,880
uh number two uh the vmap stack fix uh

1019
00:37:12,160 --> 00:37:17,839
the fixes for bug number two uh uh got

1020
00:37:14,880 --> 00:37:19,359
dropped from the 5.17 merge window

1021
00:37:17,839 --> 00:37:21,680
hoping to get

1022
00:37:19,359 --> 00:37:24,560
them into the 5.18 merge window so

1023
00:37:21,680 --> 00:37:26,800
things are making progress uh and i'm

1024
00:37:24,560 --> 00:37:29,280
hoping to help ard get these

1025
00:37:26,800 --> 00:37:31,200
merged in the next couple of months

1026
00:37:29,280 --> 00:37:33,119
uh and with that that's the end of my

1027
00:37:31,200 --> 00:37:35,520
presentation if we have questions we've

1028
00:37:33,119 --> 00:37:37,040
got uh about eight minutes seven and a

1029
00:37:35,520 --> 00:37:40,880
half minutes for questions thanks very

1030
00:37:37,040 --> 00:37:40,880
much for letting me come back to lca

1031
00:37:40,960 --> 00:37:46,079
thank you so much keith um for sharing

1032
00:37:44,160 --> 00:37:46,960
that story with us

1033
00:37:46,079 --> 00:37:48,240
um

1034
00:37:46,960 --> 00:37:51,599
it's

1035
00:37:48,240 --> 00:37:54,079
yeah it's fascinating hearing stories of

1036
00:37:51,599 --> 00:37:56,480
bug fixes and investigations and things

1037
00:37:54,079 --> 00:37:58,640
we do have a few questions for you

1038
00:37:56,480 --> 00:38:01,119
and we've got quite a bit of time

1039
00:37:58,640 --> 00:38:02,880
um and the rest of the questions are a

1040
00:38:01,119 --> 00:38:05,920
bit more technical i promise but the

1041
00:38:02,880 --> 00:38:07,760
most upvoted question here so far is

1042
00:38:05,920 --> 00:38:11,280
which are the best board games to play

1043
00:38:07,760 --> 00:38:11,280
with kees while chatting kernel

1044
00:38:11,440 --> 00:38:15,680
kees has a huge collection of board

1045
00:38:13,839 --> 00:38:17,680
games which is awesome

1046
00:38:15,680 --> 00:38:18,960
and so one of the ones that that he

1047
00:38:17,680 --> 00:38:21,119
showed me that we've been playing quite

1048
00:38:18,960 --> 00:38:23,440
a bit is called patchwork

1049
00:38:21,119 --> 00:38:26,079
it's a quilt-themed two-player board

1050
00:38:23,440 --> 00:38:28,000
game uh we're trying to piece together

1051
00:38:26,079 --> 00:38:29,839
uh so it's not really a quilt themed

1052
00:38:28,000 --> 00:38:31,760
it's more of a patchwork theme

1053
00:38:29,839 --> 00:38:34,079
where you're trying to patchwork together

1054
00:38:31,760 --> 00:38:35,839
a quilt on your game board and scoring

1055
00:38:34,079 --> 00:38:39,200
points as a result of that and we've

1056
00:38:35,839 --> 00:38:41,440
been having a lot of fun with that one

1057
00:38:39,200 --> 00:38:43,520
that sounds lovely that sounds

1058
00:38:41,440 --> 00:38:44,880
absolutely lovely

1059
00:38:43,520 --> 00:38:47,040
okay

1060
00:38:44,880 --> 00:38:49,359
next question

1061
00:38:47,040 --> 00:38:52,480
does the current status mean for someone

1062
00:38:49,359 --> 00:38:55,200
making a product with um an arm 64 chip

1063
00:38:52,480 --> 00:38:58,000
there is a choice between more security uh

1064
00:38:55,200 --> 00:39:01,040
running a 64-bit kernel or less memory

1065
00:38:58,000 --> 00:39:03,839
usage running a 32-bit kernel

1066
00:39:01,040 --> 00:39:06,560
yes exactly oh and so right now because

1067
00:39:03,839 --> 00:39:09,680
there are still so many unfixed kspp

1068
00:39:06,560 --> 00:39:12,320
problems uh arm32 kernels are vulnerable

1069
00:39:09,680 --> 00:39:13,920
to exploits that arm 64 kernels simply

1070
00:39:12,320 --> 00:39:15,599
are not

1071
00:39:13,920 --> 00:39:16,320
and

1072
00:39:15,599 --> 00:39:17,920
in

1073
00:39:16,320 --> 00:39:20,000
and we're going to try to fix those as

1074
00:39:17,920 --> 00:39:23,119
rapidly as we can to try to

1075
00:39:20,000 --> 00:39:25,200
get back to parity um even still arm 64

1076
00:39:23,119 --> 00:39:27,440
is likely to be more secure because it

1077
00:39:25,200 --> 00:39:29,760
has a bigger memory address space

1078
00:39:27,440 --> 00:39:31,839
um and so a lot of the things like uh

1079
00:39:29,760 --> 00:39:34,720
address space randomization are kind of

1080
00:39:31,839 --> 00:39:36,800
more effective in the 64-bit world

1081
00:39:34,720 --> 00:39:38,880
but i'm hoping that arm32 kernels will

1082
00:39:36,800 --> 00:39:41,359
will at least be

1083
00:39:38,880 --> 00:39:45,520
kind of to parity with 32-bit x86

1084
00:39:41,359 --> 00:39:45,520
kernels within the next year or so

1085
00:39:46,720 --> 00:39:49,839
your next question

1086
00:39:48,880 --> 00:39:51,839
is

1087
00:39:49,839 --> 00:39:54,560
when will amazon be able to use these

1088
00:39:51,839 --> 00:39:57,280
new features in their products

1089
00:39:54,560 --> 00:39:59,119
i'm hoping super soon um we're we're

1090
00:39:57,280 --> 00:40:00,000
constantly updating which kernels we're

1091
00:39:59,119 --> 00:40:02,320
using

1092
00:40:00,000 --> 00:40:04,480
um one of the one of the uh challenges

1093
00:40:02,320 --> 00:40:06,880
with any with any linux based products

1094
00:40:04,480 --> 00:40:09,599
is that you you get uh you work with an

1095
00:40:06,880 --> 00:40:11,920
soc vendor to get linux kernels

1096
00:40:09,599 --> 00:40:14,400
tuned for a particular soc and ready to

1097
00:40:11,920 --> 00:40:16,720
go and integrate into your project

1098
00:40:14,400 --> 00:40:17,760
and so we need to work with our soc

1099
00:40:16,720 --> 00:40:19,040
vendors

1100
00:40:17,760 --> 00:40:20,800
to figure out when they're going to be

1101
00:40:19,040 --> 00:40:23,200
ready to switch to a kernel that has the

1102
00:40:20,800 --> 00:40:25,599
stuff enabled

1103
00:40:23,200 --> 00:40:27,119
i would i really can't uh give any kind

1104
00:40:25,599 --> 00:40:27,870
of dates for that because i don't have

1105
00:40:27,119 --> 00:40:31,140
any idea

1106
00:40:27,870 --> 00:40:31,140
[Laughter]

1107
00:40:31,920 --> 00:40:36,240
timelines but we do have quite a bit of

1108
00:40:33,760 --> 00:40:37,839
leverage with soc vendors uh and and

1109
00:40:36,240 --> 00:40:40,240
getting them getting them to run more

1110
00:40:37,839 --> 00:40:43,119
recent kernels is definitely one of our

1111
00:40:40,240 --> 00:40:45,200
one of our big big issues

1112
00:40:43,119 --> 00:40:46,560
that makes sense

1113
00:40:45,200 --> 00:40:48,640
is

1114
00:40:46,560 --> 00:40:49,920
anyone working on similar issues and i'm

1115
00:40:48,640 --> 00:40:51,680
sorry i'm not sure if this is supposed

1116
00:40:49,920 --> 00:40:53,839
to be pronounced letter by letter or as

1117
00:40:51,680 --> 00:40:56,000
a whole mips

1118
00:40:53,839 --> 00:40:57,839
on mips i don't know of anybody working

1119
00:40:56,000 --> 00:40:58,880
on this on mips which is kind of

1120
00:40:57,839 --> 00:40:59,680
interesting

1121
00:40:58,880 --> 00:41:01,839
um

1122
00:40:59,680 --> 00:41:03,599
there are still a lot of products based

1123
00:41:01,839 --> 00:41:06,560
on mips especially in the television

1124
00:41:03,599 --> 00:41:08,319
space i mean it would be awesome for uh

1125
00:41:06,560 --> 00:41:10,960
for people who are who are working with

1126
00:41:08,319 --> 00:41:12,720
those parts uh to actually dig in

1127
00:41:10,960 --> 00:41:15,040
and figure out how to how to how to do

1128
00:41:12,720 --> 00:41:17,680
something similar um

1129
00:41:15,040 --> 00:41:19,359
uh i don't know of anything that that i

1130
00:41:17,680 --> 00:41:20,960
don't think there are any amazon

1131
00:41:19,359 --> 00:41:23,280
products that are using mips chips at

1132
00:41:20,960 --> 00:41:25,599
this point uh they used to be used a lot

1133
00:41:23,280 --> 00:41:28,079
in routers so um

1134
00:41:25,599 --> 00:41:30,960
maybe the maybe the uh openwrt uh

1135
00:41:28,079 --> 00:41:32,720
crowd uh wants to dig in and find uh

1136
00:41:30,960 --> 00:41:36,720
find some some people interested in

1137
00:41:32,720 --> 00:41:36,720
working on these patches for mips

1138
00:41:37,599 --> 00:41:42,720
are there any plans to also apply the

1139
00:41:40,240 --> 00:41:44,640
thread info task struct change to power

1140
00:41:42,720 --> 00:41:47,040
pc

1141
00:41:44,640 --> 00:41:50,160
uh those are already all in power pc all

1142
00:41:47,040 --> 00:41:52,560
of this stuff oh i'm sorry the power pc

1143
00:41:50,160 --> 00:41:55,200
hack that i was talking about was was

1144
00:41:52,560 --> 00:41:57,280
actually a patch required because they

1145
00:41:55,200 --> 00:41:59,920
moved to this mechanism

1146
00:41:57,280 --> 00:42:04,079
so in power pc they needed to go find

1147
00:41:59,920 --> 00:42:05,200
that cpu value given only a thread info

1148
00:42:04,079 --> 00:42:06,880
pointer

1149
00:42:05,200 --> 00:42:08,960
and the place where they needed to use

1150
00:42:06,880 --> 00:42:11,440
that they couldn't include the entire

1151
00:42:08,960 --> 00:42:13,280
task_struct include file um and so

1152
00:42:11,440 --> 00:42:15,440
that's the reason they used this magic

1153
00:42:13,280 --> 00:42:17,760
kludge they used the kludge because they've

1154
00:42:15,440 --> 00:42:20,319
already done this and all powerpc uses

1155
00:42:17,760 --> 00:42:20,319
this already

1156
00:42:22,720 --> 00:42:25,680
um

1157
00:42:24,160 --> 00:42:29,440
there's a there's a bunch of questions

1158
00:42:25,680 --> 00:42:31,520
that are tied for votes here um

1159
00:42:29,440 --> 00:42:33,520
with the thread info being architecture

1160
00:42:31,520 --> 00:42:37,119
specific are many attacks seen in the

1161
00:42:33,520 --> 00:42:40,000
wild yet against arm 32

1162
00:42:37,119 --> 00:42:40,000
i don't know

1163
00:42:41,440 --> 00:42:45,520
i would love i would love to get some

1164
00:42:43,200 --> 00:42:48,480
information about that but um

1165
00:42:45,520 --> 00:42:51,440
i'm i'm i haven't really looked to see

1166
00:42:48,480 --> 00:42:54,240
what kind of cves are being reported

1167
00:42:51,440 --> 00:42:56,240
against arm 32 products

1168
00:42:54,240 --> 00:42:57,760
i'm kind of terrified to go look because

1169
00:42:56,240 --> 00:43:00,319
i know that they're vulnerable and

1170
00:42:57,760 --> 00:43:03,119
surely somebody must be using these to

1171
00:43:00,319 --> 00:43:05,040
exploit vulnerabilities in arm32 based

1172
00:43:03,119 --> 00:43:06,720
products but i

1173
00:43:05,040 --> 00:43:08,960
i can't really see them separately

1174
00:43:06,720 --> 00:43:10,880
because the cves that i get that i have

1175
00:43:08,960 --> 00:43:14,079
visibility to are mostly on the server

1176
00:43:10,880 --> 00:43:15,920
side and so those are the arm 64 and x86

1177
00:43:14,079 --> 00:43:18,160
64 cves

1178
00:43:15,920 --> 00:43:20,240
um and so maybe i'll get some visibility

1179
00:43:18,160 --> 00:43:21,119
into arm32 cves

1180
00:43:20,240 --> 00:43:23,359
um

1181
00:43:21,119 --> 00:43:25,520
but they're they're much

1182
00:43:23,359 --> 00:43:28,240
there's kind of a different community of

1183
00:43:25,520 --> 00:43:30,079
of people working on those chips

1184
00:43:28,240 --> 00:43:31,599
and so i don't know how the security

1185
00:43:30,079 --> 00:43:32,800
vulnerabilities are reported in that

1186
00:43:31,599 --> 00:43:34,400
environment

1187
00:43:32,800 --> 00:43:37,760
i should go find out thank you that's a

1188
00:43:34,400 --> 00:43:37,760
good good suggestion

1189
00:43:38,000 --> 00:43:41,920
was most of this work performed under

1190
00:43:40,000 --> 00:43:44,160
emulation

1191
00:43:41,920 --> 00:43:46,560
yes uh essentially all the development

1192
00:43:44,160 --> 00:43:49,839
work was performed under emulation

1193
00:43:46,560 --> 00:43:51,119
um because the emulator lets you run gdb

1194
00:43:49,839 --> 00:43:52,560
on the target

1195
00:43:51,119 --> 00:43:53,760
and so you get a full debugging

1196
00:43:52,560 --> 00:43:56,960
environment

1197
00:43:53,760 --> 00:43:58,319
but not to worry um kees

1198
00:43:56,960 --> 00:44:00,000
and i have a friend who lives in portland

1199
00:43:58,319 --> 00:44:02,480
vagrant cascadian

1200
00:44:00,000 --> 00:44:03,680
who had a couple of spare raspberry pi

1201
00:44:02,480 --> 00:44:06,720
boards

1202
00:44:03,680 --> 00:44:08,640
with suitable processors and so let me

1203
00:44:06,720 --> 00:44:10,640
see if i can watch it here

1204
00:44:08,640 --> 00:44:11,359
i actually have a board that he shipped

1205
00:44:10,640 --> 00:44:13,839
me

1206
00:44:11,359 --> 00:44:15,520
uh that i've been doing uh the actual

1207
00:44:13,839 --> 00:44:16,720
validation of the patches on real

1208
00:44:15,520 --> 00:44:18,400
hardware

1209
00:44:16,720 --> 00:44:20,000
because it's nice to know that they work

1210
00:44:18,400 --> 00:44:21,440
in emulation but you really need to

1211
00:44:20,000 --> 00:44:23,680
validate that it works on hardware

1212
00:44:21,440 --> 00:44:26,720
before you're sure um i need to get a

1213
00:44:23,680 --> 00:44:28,720
couple more boards running uh now that

1214
00:44:26,720 --> 00:44:31,040
we have the uniprocessor stuff going

1215
00:44:28,720 --> 00:44:33,599
uh to make sure that also works on those

1216
00:44:31,040 --> 00:44:35,200
so yes emulation is awesome everybody

1217
00:44:33,599 --> 00:44:37,680
should do all kernel development in

1218
00:44:35,200 --> 00:44:39,760
emulation

1219
00:44:37,680 --> 00:44:42,160
that makes sense if only that were

1220
00:44:39,760 --> 00:44:42,160
possible

1221
00:44:43,359 --> 00:44:48,560
okay we are out of time there's still a

1222
00:44:45,440 --> 00:44:51,440
couple more questions um so we will move

1223
00:44:48,560 --> 00:44:54,960
those transfer those questions over to

1224
00:44:51,440 --> 00:44:57,280
the post talk chat kaya theater channel

1225
00:44:54,960 --> 00:45:00,640
which is in venueless you may

1226
00:44:57,280 --> 00:45:02,560
have to go into the browse channels um

1227
00:45:00,640 --> 00:45:04,720
button to find that channel if it's not

1228
00:45:02,560 --> 00:45:07,280
appearing in your list of channels

1229
00:45:04,720 --> 00:45:08,640
and keith will be there to have a bit of

1230
00:45:07,280 --> 00:45:10,960
a chat after

1231
00:45:08,640 --> 00:45:13,119
so we're out of time it is now lunch

1232
00:45:10,960 --> 00:45:15,920
time enjoy a little bit of a break

1233
00:45:13,119 --> 00:45:19,040
everyone um and a reminder that the

1234
00:45:15,920 --> 00:45:21,839
linux australia agm is happening now so

1235
00:45:19,040 --> 00:45:24,000
if you're going to that ah don't miss it

1236
00:45:21,839 --> 00:45:28,119
okay thanks again keith and enjoy your

1237
00:45:24,000 --> 00:45:28,119
lunch everyone yep

