1
00:00:06,320 --> 00:00:11,499
[Music]

2
00:00:15,440 --> 00:00:18,720
hello everyone and welcome back from

3
00:00:17,039 --> 00:00:20,800
your lunch i hope you had a great time

4
00:00:18,720 --> 00:00:23,279
uh and hopefully had some good food

5
00:00:20,800 --> 00:00:26,400
uh next up in the kia ora theater at

6
00:00:23,279 --> 00:00:28,720
linux.conf.au 2022 is mike cohen a

7
00:00:26,400 --> 00:00:30,160
renowned digital forensic engineer and

8
00:00:28,720 --> 00:00:32,559
senior software engineer who described

9
00:00:30,160 --> 00:00:34,239
himself as a digital paleontologist

10
00:00:32,559 --> 00:00:36,160
mike is the founder and creator of

11
00:00:34,239 --> 00:00:37,760
velociraptor which is an advanced open

12
00:00:36,160 --> 00:00:40,399
source digital forensic and incident

13
00:00:37,760 --> 00:00:43,200
response framework supporting

14
00:00:40,399 --> 00:00:45,200
linux mac os and windows there's so many

15
00:00:43,200 --> 00:00:46,640
thousand digital forensics

16
00:00:45,200 --> 00:00:47,920
uh

17
00:00:46,640 --> 00:00:49,600
today mike is talking through hunting

18
00:00:47,920 --> 00:00:52,480
for threats on a linux host using

19
00:00:49,600 --> 00:00:53,440
velociraptor and its query language vql

20
00:00:52,480 --> 00:00:56,399
take it away

21
00:00:53,440 --> 00:00:58,399
thanks very much thank you so um i'm

22
00:00:56,399 --> 00:01:01,280
really glad to be here today and talk to

23
00:00:58,399 --> 00:01:03,520
you about velociraptor uh and today

24
00:01:01,280 --> 00:01:06,080
we're going to cover some of the

25
00:01:03,520 --> 00:01:07,680
linux aspects of typical investigation

26
00:01:06,080 --> 00:01:10,000
that we see

27
00:01:07,680 --> 00:01:10,720
when we're doing linux incident response

28
00:01:10,000 --> 00:01:13,040
or

29
00:01:10,720 --> 00:01:14,880
forensic investigations to give you guys

30
00:01:13,040 --> 00:01:18,240
a little bit of a taste to

31
00:01:14,880 --> 00:01:21,920
how to do forensic response at scale and

32
00:01:18,240 --> 00:01:23,360
how velociraptor can make that easier so

33
00:01:21,920 --> 00:01:26,080
specifically

34
00:01:23,360 --> 00:01:28,479
velociraptor has a lot of capabilities

35
00:01:26,080 --> 00:01:30,799
we're really not going to even touch on

36
00:01:28,479 --> 00:01:33,119
many of the capabilities that it has

37
00:01:30,799 --> 00:01:35,680
uh and we're going to cover a lot of the

38
00:01:33,119 --> 00:01:38,000
the things very briefly but because

39
00:01:35,680 --> 00:01:39,360
this is a linux conference um then we're

40
00:01:38,000 --> 00:01:41,520
going to really look at

41
00:01:39,360 --> 00:01:42,399
typical linux use cases

42
00:01:41,520 --> 00:01:44,640
and

43
00:01:42,399 --> 00:01:46,720
since it's an open source conference so

44
00:01:44,640 --> 00:01:49,360
i'm hoping to give you guys a bit of an

45
00:01:46,720 --> 00:01:52,799
idea as to how to join the open source

46
00:01:49,360 --> 00:01:55,280
uh project and contribute and uh and and

47
00:01:52,799 --> 00:01:57,520
use that so um

48
00:01:55,280 --> 00:02:00,799
what is velociraptor and i've spoken

49
00:01:57,520 --> 00:02:02,960
about uh velociraptor in linux conf um i

50
00:02:00,799 --> 00:02:05,119
think last year there was um

51
00:02:02,960 --> 00:02:08,000
a workshop about it so you know we we

52
00:02:05,119 --> 00:02:10,879
covered it in a lot of depth uh but it

53
00:02:08,000 --> 00:02:13,920
is an open source tool uh that's really

54
00:02:10,879 --> 00:02:15,440
designed for um for digital forensic and

55
00:02:13,920 --> 00:02:18,080
incident response

56
00:02:15,440 --> 00:02:20,160
and uh and also alerting and detection

57
00:02:18,080 --> 00:02:23,840
and essentially it's a way of making it

58
00:02:20,160 --> 00:02:25,760
easy for us to manage uh or investigate

59
00:02:23,840 --> 00:02:26,800
uh at scale

60
00:02:25,760 --> 00:02:28,480
so

61
00:02:26,800 --> 00:02:30,720
the the thing that makes velociraptor

62
00:02:28,480 --> 00:02:33,680
really cool is that it has a query

63
00:02:30,720 --> 00:02:35,519
language called vql and vql is really

64
00:02:33,680 --> 00:02:36,400
kind of at the core of velociraptor it

65
00:02:35,519 --> 00:02:37,840
makes it

66
00:02:36,400 --> 00:02:39,519
do everything and we're going to cover

67
00:02:37,840 --> 00:02:40,640
some of the um

68
00:02:39,519 --> 00:02:42,800
some of the things that we can do with

69
00:02:40,640 --> 00:02:45,040
vql today and how we can use that in the

70
00:02:42,800 --> 00:02:47,840
real world um so

71
00:02:45,040 --> 00:02:49,360
you know it's it's just a bit of a taste

72
00:02:47,840 --> 00:02:50,879
um oops

73
00:02:49,360 --> 00:02:53,200
all right so let's have a look at

74
00:02:50,879 --> 00:02:55,840
generally what velociraptor looks like

75
00:02:53,200 --> 00:02:57,920
so we have a velociraptor server usually

76
00:02:55,840 --> 00:03:01,120
we deploy it in the cloud

77
00:02:57,920 --> 00:03:03,440
and it basically it connects with uh

78
00:03:01,120 --> 00:03:06,720
assets which could be laptops or

79
00:03:03,440 --> 00:03:09,200
servers or essentially any kind of

80
00:03:06,720 --> 00:03:11,760
system that runs the agent so we have

81
00:03:09,200 --> 00:03:12,640
support for windows mac os and linux

82
00:03:11,760 --> 00:03:14,959
agents

83
00:03:12,640 --> 00:03:18,000
today we'll talk about linux

84
00:03:14,959 --> 00:03:19,920
but the agents are connected

85
00:03:18,000 --> 00:03:22,400
persistently to the server so that means

86
00:03:19,920 --> 00:03:23,519
that we can investigate each of these

87
00:03:22,400 --> 00:03:25,280
agents

88
00:03:23,519 --> 00:03:26,640
with you know within seconds we don't

89
00:03:25,280 --> 00:03:27,920
need to wait for them to poll or

90
00:03:26,640 --> 00:03:30,159
anything like that we can immediately

91
00:03:27,920 --> 00:03:32,400
get results from them and then we have

92
00:03:30,159 --> 00:03:34,720
the admin ui which

93
00:03:32,400 --> 00:03:37,519
we use that to manage the deployment so

94
00:03:34,720 --> 00:03:40,000
i actually have a bit of a demo today so

95
00:03:37,519 --> 00:03:42,879
i'm just going to show you guys what the

96
00:03:40,000 --> 00:03:43,920
admin ui looks like and as you can see

97
00:03:42,879 --> 00:03:46,720
um

98
00:03:43,920 --> 00:03:49,280
we have just the the welcome screen

99
00:03:46,720 --> 00:03:51,519
there is a dashboard over here that uh

100
00:03:49,280 --> 00:03:53,680
you know just tells us uh some

101
00:03:51,519 --> 00:03:55,519
information about this deployment

102
00:03:53,680 --> 00:03:58,239
uh like how much disk space there is and

103
00:03:55,519 --> 00:04:01,840
things like that and uh today in this

104
00:03:58,239 --> 00:04:04,319
demonstration i have um i have about a

105
00:04:01,840 --> 00:04:06,480
thousand clients connected so a thousand

106
00:04:04,319 --> 00:04:08,480
endpoints connected and uh and the

107
00:04:06,480 --> 00:04:10,799
server is you know kind of waiting for

108
00:04:08,480 --> 00:04:12,959
us we're gonna do some some interesting

109
00:04:10,799 --> 00:04:14,560
work on that so if i just um

110
00:04:12,959 --> 00:04:16,479
search for my clients i can see these

111
00:04:14,560 --> 00:04:17,759
are all my clients here

112
00:04:16,479 --> 00:04:19,440
and

113
00:04:17,759 --> 00:04:21,680
you know i can look at each of them

114
00:04:19,440 --> 00:04:25,440
randomly and see some information about

115
00:04:21,680 --> 00:04:28,240
it including um collecting some telemetry

116
00:04:25,440 --> 00:04:30,880
you know about like how how much cpu and

117
00:04:28,240 --> 00:04:33,120
usage it's you know that each client is

118
00:04:30,880 --> 00:04:34,800
taking each endpoint is taking

119
00:04:33,120 --> 00:04:37,919
but uh that's just

120
00:04:34,800 --> 00:04:38,800
um so just showing you how i can control

121
00:04:37,919 --> 00:04:42,160
each

122
00:04:38,800 --> 00:04:44,720
client each uh we call clients the um

123
00:04:42,160 --> 00:04:47,840
the assets right so they are the clients

124
00:04:44,720 --> 00:04:49,520
okay so typically um we it's very

125
00:04:47,840 --> 00:04:51,840
efficient it's really fast designed to

126
00:04:49,520 --> 00:04:52,880
collect a lot of data real quickly

127
00:04:51,840 --> 00:04:56,240
um

128
00:04:52,880 --> 00:04:58,479
because most of the work is done by

129
00:04:56,240 --> 00:05:00,400
using the query language which runs on

130
00:04:58,479 --> 00:05:02,639
the endpoint so you'll see that later

131
00:05:00,400 --> 00:05:04,720
when we're going to do some pretty heavy

132
00:05:02,639 --> 00:05:06,560
lifting and you'll see the endpoints are

133
00:05:04,720 --> 00:05:08,320
doing a lot of work so even if we hunt

134
00:05:06,560 --> 00:05:10,880
for it with

135
00:05:08,320 --> 00:05:12,000
many many endpoints then we will we will

136
00:05:10,880 --> 00:05:13,680
be able to

137
00:05:12,000 --> 00:05:14,880
uh very quickly

138
00:05:13,680 --> 00:05:16,880
um

139
00:05:14,880 --> 00:05:19,199
see um

140
00:05:16,880 --> 00:05:22,000
we're going to very quickly uh see that

141
00:05:19,199 --> 00:05:24,880
it you know they'll scale really quickly

142
00:05:22,000 --> 00:05:27,039
all right so um

143
00:05:24,880 --> 00:05:29,919
the the idea behind vql is instead of

144
00:05:27,039 --> 00:05:33,440
having specific analysis modules

145
00:05:29,919 --> 00:05:37,039
um we have generic what we call vql

146
00:05:33,440 --> 00:05:40,080
plugins and those plugins uh perform

147
00:05:37,039 --> 00:05:42,080
some low-level forensic analysis such as

148
00:05:40,080 --> 00:05:44,240
uh parsing files

149
00:05:42,080 --> 00:05:46,960
um you know in in the windows world we

150
00:05:44,240 --> 00:05:49,360
have you know ntfs parsing mft and so on

151
00:05:46,960 --> 00:05:52,240
uh in the linux world we have parsing

152
00:05:49,360 --> 00:05:53,440
using grok sqlite and so on uh and

153
00:05:52,240 --> 00:05:55,680
binary parsing we're going to look at

154
00:05:53,440 --> 00:05:58,639
some of those today but instead of just

155
00:05:55,680 --> 00:06:00,400
having like a module that just does you

156
00:05:58,639 --> 00:06:02,240
know we're going to look at

157
00:06:00,400 --> 00:06:03,919
you know browser history

158
00:06:02,240 --> 00:06:07,360
we have generic parsers and then the

159
00:06:03,919 --> 00:06:09,440
query uses that to uh to build a more

160
00:06:07,360 --> 00:06:10,880
complicated query

161
00:06:09,440 --> 00:06:13,039
parser out of that

162
00:06:10,880 --> 00:06:15,199
so so this is the point of having the

163
00:06:13,039 --> 00:06:17,520
query language we can string together

164
00:06:15,199 --> 00:06:20,400
different basic building blocks to

165
00:06:17,520 --> 00:06:21,360
create a more complex and capable

166
00:06:20,400 --> 00:06:22,560
um

167
00:06:21,360 --> 00:06:23,360
capability

168
00:06:22,560 --> 00:06:25,520
so

169
00:06:23,360 --> 00:06:27,199
because this is all about open source

170
00:06:25,520 --> 00:06:29,199
and this conference you know really

171
00:06:27,199 --> 00:06:31,759
focuses on a lot of the open source

172
00:06:29,199 --> 00:06:33,520
aspects as well uh because it's an open

173
00:06:31,759 --> 00:06:36,639
source we have a vibrant community of

174
00:06:33,520 --> 00:06:38,319
people who write these vql queries for

175
00:06:36,639 --> 00:06:40,160
us so

176
00:06:38,319 --> 00:06:42,000
if you just wanted to know

177
00:06:40,160 --> 00:06:44,160
how to do a particular forensic analysis

178
00:06:42,000 --> 00:06:46,720
or particular or look for particular

179
00:06:44,160 --> 00:06:48,880
threat then probably there's going to be

180
00:06:46,720 --> 00:06:51,280
someone that had written a vql query

181
00:06:48,880 --> 00:06:53,360
that they would share with the world and

182
00:06:51,280 --> 00:06:55,199
that that allows us to kind of

183
00:06:53,360 --> 00:06:57,599
crowdsource these capabilities so we

184
00:06:55,199 --> 00:06:59,280
have on our website let me just

185
00:06:57,599 --> 00:07:00,880
quickly point out

186
00:06:59,280 --> 00:07:03,039
uh so this is our

187
00:07:00,880 --> 00:07:04,479
website docs.velociraptor.app

188
00:07:03,039 --> 00:07:06,479
there's going to be links at the you

189
00:07:04,479 --> 00:07:09,680
know at the end to it but we have this

190
00:07:06,479 --> 00:07:12,560
thing called uh the artifact exchange

191
00:07:09,680 --> 00:07:13,840
and this is where people share

192
00:07:12,560 --> 00:07:15,039
their different artifacts so you can see

193
00:07:13,840 --> 00:07:16,639
there's a whole bunch of different

194
00:07:15,039 --> 00:07:18,800
artifacts here

195
00:07:16,639 --> 00:07:20,080
um you know for example

196
00:07:18,800 --> 00:07:22,880
log4j

197
00:07:20,080 --> 00:07:24,160
uh detection you know someone has a

198
00:07:22,880 --> 00:07:25,520
contributed

199
00:07:24,160 --> 00:07:28,560
log4j

200
00:07:25,520 --> 00:07:31,759
artifact and this is the vql that runs

201
00:07:28,560 --> 00:07:33,919
so we can simply share those uh easily

202
00:07:31,759 --> 00:07:37,199
so let me just show you quickly

203
00:07:33,919 --> 00:07:39,120
uh in velociraptor what we call artifacts are

204
00:07:37,199 --> 00:07:41,520
those vql libraries

205
00:07:39,120 --> 00:07:44,000
that contains those queries and so these

206
00:07:41,520 --> 00:07:45,919
are the ones that come you know uh built

207
00:07:44,000 --> 00:07:47,440
in and you can see that you know this is

208
00:07:45,919 --> 00:07:49,520
the query here

209
00:07:47,440 --> 00:07:51,919
and these are all built in

210
00:07:49,520 --> 00:07:55,680
right and we can actually

211
00:07:51,919 --> 00:07:59,039
leverage that uh artifact exchange to

212
00:07:55,680 --> 00:08:01,520
obtain all of that community sourced

213
00:07:59,039 --> 00:08:03,120
artifacts so these are built-in and i

214
00:08:01,520 --> 00:08:04,800
can simply

215
00:08:03,120 --> 00:08:06,800
import

216
00:08:04,800 --> 00:08:08,879
those artifacts

217
00:08:06,800 --> 00:08:13,520
so i just

218
00:08:08,879 --> 00:08:14,639
choose to run a server collection uh and

219
00:08:13,520 --> 00:08:16,000
search for

220
00:08:14,639 --> 00:08:17,440
import

221
00:08:16,000 --> 00:08:18,560
in the artifact

222
00:08:17,440 --> 00:08:20,080
and

223
00:08:18,560 --> 00:08:22,240
select this

224
00:08:20,080 --> 00:08:24,879
uh this artifact so it's it's like

225
00:08:22,240 --> 00:08:27,280
there's a built-in artifact that a

226
00:08:24,879 --> 00:08:29,599
built-in query that populates the server

227
00:08:27,280 --> 00:08:31,520
with the community queries basically so

228
00:08:29,599 --> 00:08:32,479
when we when we collect that from the

229
00:08:31,520 --> 00:08:36,000
server

230
00:08:32,479 --> 00:08:38,880
um then it will go off and fetch

231
00:08:36,000 --> 00:08:41,120
uh you know and fetch the um

232
00:08:38,880 --> 00:08:42,959
all the other artifacts and insert them

233
00:08:41,120 --> 00:08:44,800
into the server we're going to use those

234
00:08:42,959 --> 00:08:46,080
today so that's why i need to do that

235
00:08:44,800 --> 00:08:47,600
first

236
00:08:46,080 --> 00:08:49,839
and you can see that

237
00:08:47,600 --> 00:08:52,000
uh now when i look at all of our so

238
00:08:49,839 --> 00:08:54,560
these are like our artifacts which are

239
00:08:52,000 --> 00:08:56,320
saved queries essentially there are the

240
00:08:54,560 --> 00:08:58,000
built-in ones from before but then there

241
00:08:56,320 --> 00:08:59,360
are ones with the

242
00:08:58,000 --> 00:09:01,200
little user icon these are the

243
00:08:59,360 --> 00:09:03,600
contributed artifacts that came from the

244
00:09:01,200 --> 00:09:04,880
artifact exchange so we can see these

245
00:09:03,600 --> 00:09:07,120
are all the ones and we're going to use

246
00:09:04,880 --> 00:09:09,920
some of those today so so now we have

247
00:09:07,120 --> 00:09:11,600
those loaded so we can use them

248
00:09:09,920 --> 00:09:13,839
um so let's

249
00:09:11,600 --> 00:09:17,839
uh so the artifact exchange again is a

250
00:09:13,839 --> 00:09:21,120
place for exchanging uh your uh these

251
00:09:17,839 --> 00:09:24,160
community contributed artifacts queries

252
00:09:21,120 --> 00:09:26,320
uh and we just imported it from the um

253
00:09:24,160 --> 00:09:28,240
artifact exchange by just going to new

254
00:09:26,320 --> 00:09:29,040
collection from the server

255
00:09:28,240 --> 00:09:30,720
so

256
00:09:29,040 --> 00:09:32,480
um so let's have a look at some actual

257
00:09:30,720 --> 00:09:34,480
example like how do we how can we

258
00:09:32,480 --> 00:09:35,200
actually use this vql

259
00:09:34,480 --> 00:09:38,800
to

260
00:09:35,200 --> 00:09:41,279
um to create some actual

261
00:09:38,800 --> 00:09:42,240
um something useful right

262
00:09:41,279 --> 00:09:44,399
so

263
00:09:42,240 --> 00:09:47,040
let's uh and i'm gonna go through a bit

264
00:09:44,399 --> 00:09:48,080
of the process of creating your own

265
00:09:47,040 --> 00:09:50,399
content

266
00:09:48,080 --> 00:09:51,440
to try and give you guys the idea

267
00:09:50,399 --> 00:09:54,320
of

268
00:09:51,440 --> 00:09:56,399
uh how you can use vql creatively to

269
00:09:54,320 --> 00:09:59,360
make some new content to to make new

270
00:09:56,399 --> 00:10:00,480
detections and new new ideas

271
00:09:59,360 --> 00:10:01,839
so

272
00:10:00,480 --> 00:10:02,640
the first example we're going to look

273
00:10:01,839 --> 00:10:05,040
for

274
00:10:02,640 --> 00:10:07,600
is detecting ssh login events and

275
00:10:05,040 --> 00:10:09,600
because you know linux typically

276
00:10:07,600 --> 00:10:13,200
uh a lot of the investigations that you

277
00:10:09,600 --> 00:10:14,560
know we do on linux are around ssh

278
00:10:13,200 --> 00:10:16,959
compromise

279
00:10:14,560 --> 00:10:19,600
lateral movement happens by

280
00:10:16,959 --> 00:10:21,440
compromising ssh keys

281
00:10:19,600 --> 00:10:24,320
and you know and then sometimes we have

282
00:10:21,440 --> 00:10:25,200
to go through and recover um

283
00:10:24,320 --> 00:10:27,200
you know

284
00:10:25,200 --> 00:10:29,200
who who logged into this machine where

285
00:10:27,200 --> 00:10:32,399
did they come from these kind of things

286
00:10:29,200 --> 00:10:35,279
so ssh is a big part of linux

287
00:10:32,399 --> 00:10:36,320
investigations not the only part but

288
00:10:35,279 --> 00:10:38,959
we're going to we're going to look at

289
00:10:36,320 --> 00:10:40,640
that as an example today

290
00:10:38,959 --> 00:10:41,680
so we're going to look at

291
00:10:40,640 --> 00:10:45,440
how do we

292
00:10:41,680 --> 00:10:47,760
leverage ssh logs to try and understand

293
00:10:45,440 --> 00:10:49,920
how this kind of attack chain

294
00:10:47,760 --> 00:10:52,640
occurs

295
00:10:49,920 --> 00:10:55,600
so let's take a look at

296
00:10:52,640 --> 00:10:58,399
what does an ssh log look like and

297
00:10:55,600 --> 00:11:01,040
you've all seen i'm sure

298
00:10:58,399 --> 00:11:02,640
ssh logs uh typically they are logged

299
00:11:01,040 --> 00:11:05,440
through syslog

300
00:11:02,640 --> 00:11:07,920
uh and there's a file in syslog /var/log/

301
00:11:05,440 --> 00:11:09,200
auth.log and it contains or on

302
00:11:07,920 --> 00:11:12,240
different systems it's in a different

303
00:11:09,200 --> 00:11:15,519
location perhaps uh but essentially

304
00:11:12,240 --> 00:11:17,680
syslog is uh is the de facto the default

305
00:11:15,519 --> 00:11:20,240
logging system on linux so i think

306
00:11:17,680 --> 00:11:22,399
pretty much all linux systems use syslog

307
00:11:20,240 --> 00:11:24,160
but syslog is not

308
00:11:22,399 --> 00:11:26,560
uh especially

309
00:11:24,160 --> 00:11:28,640
easy to work with the

310
00:11:26,560 --> 00:11:31,200
the difficulty with syslog is that it

311
00:11:28,640 --> 00:11:33,440
it consists of line based unstructured

312
00:11:31,200 --> 00:11:35,200
logs so it's essentially just

313
00:11:33,440 --> 00:11:37,920
you know like a print you know statement

314
00:11:35,200 --> 00:11:40,720
essentially you're printing a a line

315
00:11:37,920 --> 00:11:44,320
and that means something right but from

316
00:11:40,720 --> 00:11:46,720
a um a dfir perspective or you know an

317
00:11:44,320 --> 00:11:49,120
investigation or forensics it's it's

318
00:11:46,720 --> 00:11:52,639
unstructured so it's very hard to

319
00:11:49,120 --> 00:11:54,720
to uh to associate it with anything you

320
00:11:52,639 --> 00:11:56,720
know like to make queries on it because

321
00:11:54,720 --> 00:11:58,320
it's uh it's unstructured

322
00:11:56,720 --> 00:12:00,639
so typically

323
00:11:58,320 --> 00:12:01,519
this is what it looks like uh this is a

324
00:12:00,639 --> 00:12:03,360
line

325
00:12:01,519 --> 00:12:05,680
and it has all the key pieces of

326
00:12:03,360 --> 00:12:07,120
information in it that we want but

327
00:12:05,680 --> 00:12:09,120
they're kind of like all over the place

328
00:12:07,120 --> 00:12:10,880
right so it has the date and as you can

329
00:12:09,120 --> 00:12:12,480
see in syslog even it doesn't have the

330
00:12:10,880 --> 00:12:13,920
year which is

331
00:12:12,480 --> 00:12:15,920
terrible

332
00:12:13,920 --> 00:12:18,399
and then it has the host name it has the

333
00:12:15,920 --> 00:12:20,959
servers the service and then and then it

334
00:12:18,399 --> 00:12:22,720
has some key pieces of information like

335
00:12:20,959 --> 00:12:24,639
whether the key was accepted the

336
00:12:22,720 --> 00:12:27,120
connection was accepted or rejected so

337
00:12:24,639 --> 00:12:28,480
we have the word accepted here

338
00:12:27,120 --> 00:12:30,800
and then we have

339
00:12:28,480 --> 00:12:32,639
what kind of authentication it was from

340
00:12:30,800 --> 00:12:34,720
here and then who's the user

341
00:12:32,639 --> 00:12:37,519
and ip addresses and so on

342
00:12:34,720 --> 00:12:39,760
and this is really bad this is really

343
00:12:37,519 --> 00:12:42,079
hard to to

344
00:12:39,760 --> 00:12:43,920
query against right so when we do an

345
00:12:42,079 --> 00:12:47,040
investigation usually what we need to do

346
00:12:43,920 --> 00:12:49,360
is convert these unstructured

347
00:12:47,040 --> 00:12:52,000
you know essentially text soup i would

348
00:12:49,360 --> 00:12:54,399
say into structured logs that we can

349
00:12:52,000 --> 00:12:57,120
query you know in a proper way and

350
00:12:54,399 --> 00:12:58,639
usually the way this works um well you

351
00:12:57,120 --> 00:13:00,560
know i mean you can write like regular

352
00:12:58,639 --> 00:13:02,800
expressions to try and find little bits

353
00:13:00,560 --> 00:13:03,920
and pieces from that and you know

354
00:13:02,800 --> 00:13:06,160
essentially

355
00:13:03,920 --> 00:13:08,240
the way uh that the industry is kind of

356
00:13:06,160 --> 00:13:09,760
settled on solving this problem is using

357
00:13:08,240 --> 00:13:12,959
something called grok

358
00:13:09,760 --> 00:13:14,880
uh grok is just like a way of

359
00:13:12,959 --> 00:13:17,519
expressing very complicated regular

360
00:13:14,880 --> 00:13:20,320
expressions in a little bit simpler way

361
00:13:17,519 --> 00:13:22,240
so these end up essentially being very

362
00:13:20,320 --> 00:13:24,000
large regular expressions still

363
00:13:22,240 --> 00:13:26,000
and you're kind of matching that against

364
00:13:24,000 --> 00:13:28,320
what the log is supposed to look like and

365
00:13:26,000 --> 00:13:29,279
sometimes it sort of works
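
[editor's example] A minimal sketch of the idea described here: a grok expression is a shorthand that expands into one large regular expression with named groups. This is not velociraptor's or any grok library's implementation. The pattern table, the function names and the sample log line are all made up for illustration, and real grok libraries ship far larger pattern sets.

```python
import re

# Hypothetical, heavily simplified pattern library (assumption, not a real grok set).
PATTERNS = {
    "TIMESTAMP": r"[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}",
    "HOST": r"\S+",
    "WORD": r"\w+",
    "USER": r"[\w.-]+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "NUMBER": r"\d+",
}


def expand(grok: str) -> str:
    """Replace every %{NAME:field} with a named regex group."""
    def repl(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", repl, grok)


# Grok-style description of a successful or failed sshd login line.
SSH_GROK = (
    "%{TIMESTAMP:timestamp} %{HOST:host} sshd\\[%{NUMBER:pid}\\]: "
    "%{WORD:event} %{WORD:method} for %{USER:user} "
    "from %{IP:ip} port %{NUMBER:port} ssh2"
)
SSH_REGEX = re.compile(expand(SSH_GROK))


def parse_ssh_line(line: str):
    """Return a dict of named fields, or None when the line does not match."""
    match = SSH_REGEX.search(line)
    return match.groupdict() if match else None


# Made-up sample line in the usual auth.log layout.
sample = ("Jan 17 03:12:45 web01 sshd[2211]: Accepted publickey "
          "for mike from 192.168.1.20 port 51234 ssh2")
print(parse_ssh_line(sample))
```

Lines that deviate from the expected layout (for example "invalid user" failures) simply return None in this sketch, which is the "sometimes it sort of works" part.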

366
00:13:28,320 --> 00:13:31,519
so

367
00:13:29,279 --> 00:13:32,839
that's kind of i guess that's that's the

368
00:13:31,519 --> 00:13:35,760
state

369
00:13:32,839 --> 00:13:38,560
of that's the state of

370
00:13:35,760 --> 00:13:40,560
of logging on linux is not great so um

371
00:13:38,560 --> 00:13:43,279
so this is the best we can do so let's

372
00:13:40,560 --> 00:13:44,079
just have a look at how we can use vql

373
00:13:43,279 --> 00:13:46,000
to

374
00:13:44,079 --> 00:13:47,839
get some structured information from

375
00:13:46,000 --> 00:13:51,120
these syslogs and i'm going to show you

376
00:13:47,839 --> 00:13:52,959
how to quickly write a vql query so the

377
00:13:51,120 --> 00:13:54,959
first thing that we do is we have this

378
00:13:52,959 --> 00:13:56,480
thing called a notebook and a notebook

379
00:13:54,959 --> 00:13:59,120
is like something

380
00:13:56,480 --> 00:14:01,040
that we can use to build up uh to build

381
00:13:59,120 --> 00:14:02,079
vql and run it interactively sort of

382
00:14:01,040 --> 00:14:04,800
like

383
00:14:02,079 --> 00:14:06,880
if you've ever used a jupyter notebook

384
00:14:04,800 --> 00:14:08,320
so it's sort of similar to that so i'm

385
00:14:06,880 --> 00:14:09,920
going to open this notebook here that

386
00:14:08,320 --> 00:14:10,959
i've prepared earlier just for the sake

387
00:14:09,920 --> 00:14:13,120
of time

388
00:14:10,959 --> 00:14:15,839
uh and going through the example of

389
00:14:13,120 --> 00:14:17,600
parsing ssh logs and so i'm going to

390
00:14:15,839 --> 00:14:20,320
give some i'm going to talk about some

391
00:14:17,600 --> 00:14:23,199
of the vql and point out how it's used

392
00:14:20,320 --> 00:14:25,199
to parse these logs so just for uh to

393
00:14:23,199 --> 00:14:27,199
get better real estate on the screen i'm

394
00:14:25,199 --> 00:14:29,279
just going to change it into full screen

395
00:14:27,199 --> 00:14:31,040
so it's a little bit easier to see so a

396
00:14:29,279 --> 00:14:32,480
notebook consists of

397
00:14:31,040 --> 00:14:34,399
these are called cells and they're kind

398
00:14:32,480 --> 00:14:37,120
of invisible initially but if you click

399
00:14:34,399 --> 00:14:39,040
on it then you know they become obvious

400
00:14:37,120 --> 00:14:41,040
and then we can edit each cell so each

401
00:14:39,040 --> 00:14:43,279
cell is like it's kind of quick it has a

402
00:14:41,040 --> 00:14:46,320
query and then it runs that query so you

403
00:14:43,279 --> 00:14:47,680
can see here this is the vql query

404
00:14:46,320 --> 00:14:49,600
uh here

405
00:14:47,680 --> 00:14:52,240
and so the first query we're just going

406
00:14:49,600 --> 00:14:53,440
to grab the files out of the auth logs

407
00:14:52,240 --> 00:14:56,160
right so

408
00:14:53,440 --> 00:14:58,240
sorry grab the lines out of the auth log so

409
00:14:56,160 --> 00:15:00,720
you know as i said syslog is just a line

410
00:14:58,240 --> 00:15:02,639
based format so it's just they're just

411
00:15:00,720 --> 00:15:04,560
straightforward lines and so you can see

412
00:15:02,639 --> 00:15:06,000
that this query what it does

413
00:15:04,560 --> 00:15:07,279
uh there is this thing called a plugin

414
00:15:06,000 --> 00:15:08,399
called parse

415
00:15:07,279 --> 00:15:10,480
uh

416
00:15:08,399 --> 00:15:12,560
uh parse_lines and then parse_lines

417
00:15:10,480 --> 00:15:14,320
basically grabs each line and puts it

418
00:15:12,560 --> 00:15:17,040
out into a variable called line so this

419
00:15:14,320 --> 00:15:18,720
is a query and as a

420
00:15:17,040 --> 00:15:19,920
because it's a query it returns a series

421
00:15:18,720 --> 00:15:22,320
of rows

422
00:15:19,920 --> 00:15:24,320
and columns right so queries always

423
00:15:22,320 --> 00:15:26,720
always return rows and columns and so we

424
00:15:24,320 --> 00:15:28,240
can have this is actually a

425
00:15:26,720 --> 00:15:30,639
whole bunch of rows

426
00:15:28,240 --> 00:15:31,680
and that's the column called line right
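
[editor's example] A rough python stand-in for the behaviour described here: each line of the log becomes one row with a single column called Line, and the result is capped at a fixed number of rows. The function names reuse the plugin name from the talk only for readability. This is an assumption-laden sketch, not how velociraptor implements the plugin.

```python
import io
from itertools import islice


def parse_lines(stream):
    """Yield one row per input line; each row is a dict with a 'Line' column."""
    for raw in stream:
        yield {"Line": raw.rstrip("\n")}


def first_rows(stream, limit=50):
    """Collect at most `limit` rows, similar in spirit to a LIMIT clause."""
    return list(islice(parse_lines(stream), limit))


# In-memory stand-in for a log file so the example runs anywhere.
fake_log = io.StringIO("first line\nsecond line\nthird line\n")
for row in first_rows(fake_log, limit=2):
    print(row)
```

Because the rows are produced lazily, stopping after the limit means the rest of the file is never read.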

427
00:15:30,639 --> 00:15:34,000
so

428
00:15:31,680 --> 00:15:36,240
so this is how we would uh now in vql in

429
00:15:34,000 --> 00:15:37,600
in here uh you know we can we can use

430
00:15:36,240 --> 00:15:39,759
command line completion and things like

431
00:15:37,600 --> 00:15:42,240
that so we can see like what parameters

432
00:15:39,759 --> 00:15:44,320
does you know this plugin use

433
00:15:42,240 --> 00:15:45,680
uh you know or we could do like you know

434
00:15:44,320 --> 00:15:47,680
select

435
00:15:45,680 --> 00:15:48,880
star from

436
00:15:47,680 --> 00:15:50,880
and then we can see these are all the

437
00:15:48,880 --> 00:15:51,759
plugins that we could use

438
00:15:50,880 --> 00:15:54,160
um

439
00:15:51,759 --> 00:15:57,040
you know parse and we can search for it

440
00:15:54,160 --> 00:15:59,040
so so this is the uh preferred interface

441
00:15:57,040 --> 00:16:01,040
to write your query because it really

442
00:15:59,040 --> 00:16:02,800
helps you with writing the query once

443
00:16:01,040 --> 00:16:04,959
you write the query and you click save

444
00:16:02,800 --> 00:16:06,800
then it recalculates it so in this

445
00:16:04,959 --> 00:16:08,560
particular case we are pulling 50 lines

446
00:16:06,800 --> 00:16:10,240
out of the first log so that's the first

447
00:16:08,560 --> 00:16:11,759
step is to just get the lines out but

448
00:16:10,240 --> 00:16:14,079
again they're not structured at this

449
00:16:11,759 --> 00:16:16,240
point so what i want to do is i want to

450
00:16:14,079 --> 00:16:18,720
convert them into something structured

451
00:16:16,240 --> 00:16:20,560
and i use the grok expression that's the

452
00:16:18,720 --> 00:16:22,720
big expression that you saw before and

453
00:16:20,560 --> 00:16:23,600
these things are available on the on the

454
00:16:22,720 --> 00:16:25,440
net

455
00:16:23,600 --> 00:16:26,880
um and there's libraries of them so it's

456
00:16:25,440 --> 00:16:30,320
not like you have to come up with them

457
00:16:26,880 --> 00:16:32,480
yourself but it basically expands into

458
00:16:30,320 --> 00:16:34,480
a big regular expression that matches

459
00:16:32,480 --> 00:16:37,440
that line like i mentioned before and it

460
00:16:34,480 --> 00:16:38,800
converts it into a structured uh format

461
00:16:37,440 --> 00:16:40,959
and you can see that's that's the

462
00:16:38,800 --> 00:16:43,600
structured format it's like it creates

463
00:16:40,959 --> 00:16:45,759
uh uh the whole thing basically uh

464
00:16:43,600 --> 00:16:47,759
splits into a dictionary and then you

465
00:16:45,759 --> 00:16:49,600
know it has these different fields so it

466
00:16:47,759 --> 00:16:51,680
pulls out specific things and you'll see

467
00:16:49,600 --> 00:16:54,320
that the the sad thing is that the time

468
00:16:51,680 --> 00:16:56,480
stamp again has no year in it so it's

469
00:16:54,320 --> 00:16:58,320
like it's not easy to parse
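
[editor's example] One common workaround for the missing year, sketched under stated assumptions: borrow the year from a reference time (for instance the collection time or the file's modification time) and step back one year if the result would lie in the future. The function name and the choice of reference are hypothetical. The talk does not say how velociraptor resolves this.

```python
from datetime import datetime

SYSLOG_FORMAT = "%Y %b %d %H:%M:%S"


def parse_syslog_timestamp(text: str, reference: datetime) -> datetime:
    """Parse a year-less syslog timestamp such as 'Jan 17 03:12:45'.

    The year is taken from `reference`. If that puts the event after the
    reference time, the previous year is assumed instead. Dates that do not
    exist in the chosen year (Feb 29) raise ValueError.
    """
    parsed = datetime.strptime(f"{reference.year} {text}", SYSLOG_FORMAT)
    if parsed > reference:
        parsed = datetime.strptime(f"{reference.year - 1} {text}", SYSLOG_FORMAT)
    return parsed


# Reference time chosen arbitrarily for the example.
ref = datetime(2022, 1, 20, 12, 0, 0)
print(parse_syslog_timestamp("Jan 17 03:12:45", ref))
print(parse_syslog_timestamp("Dec 31 23:59:00", ref))
```

The guess is only as good as the reference time, so logs that span more than a year remain ambiguous.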

470
00:16:56,480 --> 00:16:59,759
um but you know

471
00:16:58,320 --> 00:17:02,079
we've got all the key pieces of

472
00:16:59,759 --> 00:17:03,920
information whether it was accepted you

473
00:17:02,079 --> 00:17:06,959
know what kind of thing it was public

474
00:17:03,920 --> 00:17:09,439
key private key etc program etc so so we

475
00:17:06,959 --> 00:17:11,280
can use that to essentially pull out

476
00:17:09,439 --> 00:17:12,480
this structured information so let me

477
00:17:11,280 --> 00:17:15,360
just um

478
00:17:12,480 --> 00:17:16,799
so now once once we have that we want to

479
00:17:15,360 --> 00:17:18,319
actually create something called an

480
00:17:16,799 --> 00:17:19,760
artifact because we don't want people to

481
00:17:18,319 --> 00:17:22,079
have to type

482
00:17:19,760 --> 00:17:24,000
all of this vql into the gui each time

483
00:17:22,079 --> 00:17:26,480
right it's kind of a pain and error

484
00:17:24,000 --> 00:17:28,319
prone so what we want to do is have have

485
00:17:26,480 --> 00:17:29,919
it somehow encapsulated so we can publish

486
00:17:28,319 --> 00:17:31,440
it in an artifact

487
00:17:29,919 --> 00:17:33,360
so luckily

488
00:17:31,440 --> 00:17:36,160
uh that let me just get out of full

489
00:17:33,360 --> 00:17:37,039
screen mode and go back to our

490
00:17:36,160 --> 00:17:39,520
uh

491
00:17:37,039 --> 00:17:43,520
artifact library here and if i search

492
00:17:39,520 --> 00:17:45,919
for ssh then luckily uh oh no this is

493
00:17:43,520 --> 00:17:47,760
this is this uh there is a built-in one

494
00:17:45,919 --> 00:17:48,720
which is actually exactly the same as

495
00:17:47,760 --> 00:17:49,520
before

496
00:17:48,720 --> 00:17:51,600
uh

497
00:17:49,520 --> 00:17:53,760
it's just now it's just kind of like

498
00:17:51,600 --> 00:17:56,000
encapsulated inside of this thing called

499
00:17:53,760 --> 00:17:57,600
an artifact which we can just use

500
00:17:56,000 --> 00:18:00,320
so we don't need to type any of these

501
00:17:57,600 --> 00:18:02,400
queries in we could just use them and

502
00:18:00,320 --> 00:18:04,400
you can edit it and customize it you

503
00:18:02,400 --> 00:18:05,280
know so this is the query that we've had

504
00:18:04,400 --> 00:18:06,799
before it's a little bit more

505
00:18:05,280 --> 00:18:08,480
complicated now because it's going to

506
00:18:06,799 --> 00:18:10,000
look for different files in different

507
00:18:08,480 --> 00:18:12,480
places because it could be a number of

508
00:18:10,000 --> 00:18:15,120
auth logs and it could be zipped up and

509
00:18:12,480 --> 00:18:16,240
etc right so but you know this is a very

510
00:18:15,120 --> 00:18:18,640
simple thing

511
00:18:16,240 --> 00:18:20,320
um and uh let's just uh let's just find

512
00:18:18,640 --> 00:18:24,080
my favorite machine

513
00:18:20,320 --> 00:18:25,919
uh let's uh pick up uh this one

514
00:18:24,080 --> 00:18:28,640
one of my recent hosts

515
00:18:25,919 --> 00:18:30,640
and uh and this is this is uh i've got a

516
00:18:28,640 --> 00:18:32,720
tag on it called mike right so i've got

517
00:18:30,640 --> 00:18:34,480
a label on that machine so i can go to

518
00:18:32,720 --> 00:18:35,919
it straight away quickly

519
00:18:34,480 --> 00:18:37,840
uh and let's have a look at all the

520
00:18:35,919 --> 00:18:39,919
artifacts that we've collected before so

521
00:18:37,840 --> 00:18:42,559
i've collected some other ones before

522
00:18:39,919 --> 00:18:45,200
right for example i grabbed like uh

523
00:18:42,559 --> 00:18:46,960
different files and so on uh but let me

524
00:18:45,200 --> 00:18:49,679
just add uh

525
00:18:46,960 --> 00:18:50,640
let me just search for this ssh

526
00:18:49,679 --> 00:18:53,280
login

527
00:18:50,640 --> 00:18:56,559
okay so uh in this case what i want to

528
00:18:53,280 --> 00:18:58,480
do is search for that uh ssh login again

529
00:18:56,559 --> 00:19:00,320
it just it just goes through and it

530
00:18:58,480 --> 00:19:02,960
takes parameters here

531
00:19:00,320 --> 00:19:04,720
uh so this is just the defaults

532
00:19:02,960 --> 00:19:07,039
so then the next step i'll configure the

533
00:19:04,720 --> 00:19:09,120
parameters for this artifact

534
00:19:07,039 --> 00:19:10,640
uh and you'll notice that i mean the the

535
00:19:09,120 --> 00:19:12,160
vql is in there but i don't really

536
00:19:10,640 --> 00:19:13,520
need to know anything about it so i

537
00:19:12,160 --> 00:19:15,919
don't need to really parse it or

538
00:19:13,520 --> 00:19:17,919
anything um i've got some defaults that

539
00:19:15,919 --> 00:19:19,760
i can change like maybe if my logs are

540
00:19:17,919 --> 00:19:21,600
in a different place i can look for them

541
00:19:19,760 --> 00:19:23,120
uh and this is the grok expression that

542
00:19:21,600 --> 00:19:25,919
i can maybe tweak a little bit maybe

543
00:19:23,120 --> 00:19:27,600
it's a non-conventional version of ssh

544
00:19:25,919 --> 00:19:29,120
and the logs are a little bit different

545
00:19:27,600 --> 00:19:31,520
yeah that happens

546
00:19:29,120 --> 00:19:34,000
um but anyway the defaults are usually

547
00:19:31,520 --> 00:19:35,840
fine we'll just launch it and uh and go

548
00:19:34,000 --> 00:19:37,919
off and collect it and you'll see that

549
00:19:35,840 --> 00:19:40,240
it's you know it's finished in 0.15

550
00:19:37,919 --> 00:19:42,320
seconds it just got essentially as soon

551
00:19:40,240 --> 00:19:44,160
as i tasked this endpoint it went off

552
00:19:42,320 --> 00:19:46,080
and collected that thing and parsed it

553
00:19:44,160 --> 00:19:48,640
right and then if i look at my results

554
00:19:46,080 --> 00:19:50,080
then i've got all the ssh logins all the

555
00:19:48,640 --> 00:19:52,720
ssh logs

556
00:19:50,080 --> 00:19:54,640
um and you can see clearly that this is

557
00:19:52,720 --> 00:19:55,919
a problematic machine right right away

558
00:19:54,640 --> 00:19:59,120
why because we're seeing all these

559
00:19:55,919 --> 00:20:01,440
failed password logins so somehow this

560
00:19:59,120 --> 00:20:02,960
machine is getting uh if you've ever run

561
00:20:01,440 --> 00:20:04,159
you know linux machines on the internet

562
00:20:02,960 --> 00:20:07,280
of course they're going to get brute

563
00:20:04,159 --> 00:20:09,039
forced all the time so these failed

564
00:20:07,280 --> 00:20:11,120
passwords you know you're going to see

565
00:20:09,039 --> 00:20:12,559
them a lot if your machine is available

566
00:20:11,120 --> 00:20:14,480
on the internet

567
00:20:12,559 --> 00:20:16,720
you'll see the accepted public key which

568
00:20:14,480 --> 00:20:17,520
is the legitimate users using keys and

569
00:20:16,720 --> 00:20:19,919
that's

570
00:20:17,520 --> 00:20:21,200
that's fine probably

571
00:20:19,919 --> 00:20:23,120
but then you have a whole bunch of

572
00:20:21,200 --> 00:20:24,960
passwords here

573
00:20:23,120 --> 00:20:27,039
but what would be really bad what would

574
00:20:24,960 --> 00:20:29,120
be really bad is if this machine had a

575
00:20:27,039 --> 00:20:31,440
successful attempt with a password

576
00:20:29,120 --> 00:20:33,520
because that is not good right like so

577
00:20:31,440 --> 00:20:35,120
normally you're supposed to use keys and

578
00:20:33,520 --> 00:20:38,480
if someone's brute forcing the password

579
00:20:35,120 --> 00:20:40,799
and got in then you know then that's not

580
00:20:38,480 --> 00:20:42,640
good right so what we can do is we can

581
00:20:40,799 --> 00:20:44,320
do this thing called post processing of

582
00:20:42,640 --> 00:20:46,559
the data so we've collected all the data

583
00:20:44,320 --> 00:20:49,039
with the artifact from this machine

584
00:20:46,559 --> 00:20:51,360
and i can open up a notebook just to

585
00:20:49,039 --> 00:20:53,120
post-process that one collection

586
00:20:51,360 --> 00:20:54,640
and uh and it's the same thing but what

587
00:20:53,120 --> 00:20:56,799
i'm going to do is i'm going to change

588
00:20:54,640 --> 00:20:58,720
this query and i'm just going to add

589
00:20:56,799 --> 00:21:00,720
conditions so it returns all these rows

590
00:20:58,720 --> 00:21:02,480
but i just want to see the rows

591
00:21:00,720 --> 00:21:04,240
where the result

592
00:21:02,480 --> 00:21:05,760
right matches

593
00:21:04,240 --> 00:21:08,480
accepted

594
00:21:05,760 --> 00:21:11,520
all right because so that uh equal tilde

595
00:21:08,480 --> 00:21:13,120
is the regular expression match operator

596
00:21:11,520 --> 00:21:14,559
uh and

597
00:21:13,120 --> 00:21:15,679
uh method

598
00:21:14,559 --> 00:21:19,039
matches

599
00:21:15,679 --> 00:21:20,720
password okay so if someone got in with

600
00:21:19,039 --> 00:21:22,000
a password you know that would be super

601
00:21:20,720 --> 00:21:24,240
bad right

602
00:21:22,000 --> 00:21:26,240
and so immediately that thing pops up to

603
00:21:24,240 --> 00:21:28,159
me it's like hey that is not cool right

604
00:21:26,240 --> 00:21:30,559
someone used the password now it could

605
00:21:28,159 --> 00:21:33,840
well be configured to do that maybe it's

606
00:21:30,559 --> 00:21:35,280
okay but usually um that requires

607
00:21:33,840 --> 00:21:36,640
further

608
00:21:35,280 --> 00:21:37,360
inspection

609
00:21:36,640 --> 00:21:40,000
so

610
00:21:37,360 --> 00:21:42,240
so let's just go back to the slides

611
00:21:40,000 --> 00:21:46,240
and recap so we don't go too far ahead

612
00:21:42,240 --> 00:21:49,280
of the of the slides so we used vql we

613
00:21:46,240 --> 00:21:50,880
could use vql to parse each line out of

614
00:21:49,280 --> 00:21:54,159
the file

615
00:21:50,880 --> 00:21:58,240
and then we applied a grok expression to

616
00:21:54,159 --> 00:22:01,360
create a structure out of the

617
00:21:58,240 --> 00:22:02,960
text soup of the syslog right and then

618
00:22:01,360 --> 00:22:05,280
um and then we

619
00:22:02,960 --> 00:22:07,840
wrapped it in something called an

620
00:22:05,280 --> 00:22:11,360
artifact which basically is a yaml file

621
00:22:07,840 --> 00:22:14,559
with metadata so it has a name and it

622
00:22:11,360 --> 00:22:16,640
has parameters that are declared as part

623
00:22:14,559 --> 00:22:20,159
of the the artifact

624
00:22:16,640 --> 00:22:22,559
and you can see that here if i um

625
00:22:20,159 --> 00:22:22,559
simply

626
00:22:22,960 --> 00:22:27,200
find it again so this it's built in

627
00:22:24,640 --> 00:22:29,200
right but in this case but um

628
00:22:27,200 --> 00:22:31,440
you can click edit and then you can see

629
00:22:29,200 --> 00:22:33,600
this is what an artifact looks like

630
00:22:31,440 --> 00:22:36,000
right it has different parts the name

631
00:22:33,600 --> 00:22:37,919
description references and then it has

632
00:22:36,000 --> 00:22:39,600
this parameters section and those are

633
00:22:37,919 --> 00:22:42,400
the things that we can change you know

634
00:22:39,600 --> 00:22:44,159
when we run it so essentially that query

635
00:22:42,400 --> 00:22:45,600
you don't need to really kind of i mean

636
00:22:44,159 --> 00:22:47,919
you can look at it right but you don't

637
00:22:45,600 --> 00:22:51,120
need to really type it each time once

638
00:22:47,919 --> 00:22:53,440
that artifact is created then it's just

639
00:22:51,120 --> 00:22:55,440
ready to be used by anyone right

640
00:22:53,440 --> 00:22:57,039
um and so it can be easily discovered we

641
00:22:55,440 --> 00:23:00,000
just searched for it and ran it and it

642
00:22:57,039 --> 00:23:02,400
was done right so all we did is we can

643
00:23:00,000 --> 00:23:04,960
search for it in our artifact library

644
00:23:02,400 --> 00:23:06,960
which is that um third

645
00:23:04,960 --> 00:23:08,480
thing here

646
00:23:06,960 --> 00:23:10,080
you can

647
00:23:08,480 --> 00:23:12,240
you can

648
00:23:10,080 --> 00:23:14,240
view artifact screen right

649
00:23:12,240 --> 00:23:15,919
uh and then uh and then we selected it

650
00:23:14,240 --> 00:23:18,559
we can look at it we can customize it

651
00:23:15,919 --> 00:23:21,039
and we can collect it so we've collected

652
00:23:18,559 --> 00:23:23,280
it on a system this was the system that

653
00:23:21,039 --> 00:23:25,440
we were looking at so the one that shows

654
00:23:23,280 --> 00:23:27,840
up up the top here so you know in this

655
00:23:25,440 --> 00:23:29,520
case it was this machine here

656
00:23:27,840 --> 00:23:31,520
right and then we went to collected

657
00:23:29,520 --> 00:23:32,799
artifacts and we collected that artifact

658
00:23:31,520 --> 00:23:34,640
and you know

659
00:23:32,799 --> 00:23:36,559
um so that's what we're doing with that

660
00:23:34,640 --> 00:23:38,799
machine just specifically

661
00:23:36,559 --> 00:23:41,039
but now uh we would really like to be

662
00:23:38,799 --> 00:23:42,640
able to do it like everywhere like you

663
00:23:41,039 --> 00:23:44,799
know look we have a thousand machines

664
00:23:42,640 --> 00:23:47,360
right so we want to know

665
00:23:44,799 --> 00:23:49,840
you know did anyone you know

666
00:23:47,360 --> 00:23:49,840
brute force our password in any of our you

667
00:23:49,840 --> 00:23:54,559
know machines so going from

668
00:23:52,240 --> 00:23:57,120
investigating one machine to

669
00:23:54,559 --> 00:23:59,039
investigating a thousand machines is easy

670
00:23:57,120 --> 00:24:01,760
it's called a hunt so we just go and hit

671
00:23:59,039 --> 00:24:02,840
hunt manager create a new hunt

672
00:24:01,760 --> 00:24:06,960
look

673
00:24:02,840 --> 00:24:08,559
for password logins description

674
00:24:06,960 --> 00:24:10,320
uh and this tells me like how many

675
00:24:08,559 --> 00:24:12,240
machines it's expecting that it will

676
00:24:10,320 --> 00:24:14,159
apply to and if i say run everywhere

677
00:24:12,240 --> 00:24:16,720
that's all my deployment

678
00:24:14,159 --> 00:24:18,240
i can match it by label and you know

679
00:24:16,720 --> 00:24:20,240
there's only one machine that has that

680
00:24:18,240 --> 00:24:23,840
label so i can just target it just with

681
00:24:20,240 --> 00:24:25,919
that uh label or if i just match it by

682
00:24:23,840 --> 00:24:28,880
um all my linux machines in this case

683
00:24:25,919 --> 00:24:30,559
they're all linux anyway so so that you

684
00:24:28,880 --> 00:24:33,360
know that's all of them

685
00:24:30,559 --> 00:24:35,520
um and then if i just simply click uh

686
00:24:33,360 --> 00:24:38,159
search for ssh

687
00:24:35,520 --> 00:24:39,840
uh we're gonna do the same thing but on

688
00:24:38,159 --> 00:24:41,279
all our thousand machines and and then

689
00:24:39,840 --> 00:24:42,320
just go right so

690
00:24:41,279 --> 00:24:43,840
uh

691
00:24:42,320 --> 00:24:46,400
when we you know see how it's in the

692
00:24:43,840 --> 00:24:47,919
paused state so we can just start it

693
00:24:46,400 --> 00:24:49,279
all right and you can see that as soon

694
00:24:47,919 --> 00:24:50,320
as i click start it's starting to

695
00:24:49,279 --> 00:24:51,919
schedule it

696
00:24:50,320 --> 00:24:54,159
and it goes off

697
00:24:51,919 --> 00:24:56,559
uh you know scheduling it for all the

698
00:24:54,159 --> 00:24:58,400
machines right 200 300 it's going to go

699
00:24:56,559 --> 00:25:00,240
off and collect the results from every

700
00:24:58,400 --> 00:25:01,840
single one now because each one of them

701
00:25:00,240 --> 00:25:04,320
is doing it in parallel they are all

702
00:25:01,840 --> 00:25:05,679
coming back pretty quick um and so you

703
00:25:04,320 --> 00:25:08,640
know we can

704
00:25:05,679 --> 00:25:10,720
we can see the um the results as they

705
00:25:08,640 --> 00:25:12,240
come in so let me just uh let me just

706
00:25:10,720 --> 00:25:14,000
see where are we

707
00:25:12,240 --> 00:25:16,000
okay so that's hunting and processing so

708
00:25:14,000 --> 00:25:17,600
the same thing is happening but on

709
00:25:16,000 --> 00:25:19,200
multiple machines

710
00:25:17,600 --> 00:25:21,520
and you can do the post processing on

711
00:25:19,200 --> 00:25:24,159
the hunt as well and

712
00:25:21,520 --> 00:25:26,640
and we've seen that let me just move on

713
00:25:24,159 --> 00:25:28,720
to the next example real quick

714
00:25:26,640 --> 00:25:31,120
um and so in this example we're talking

715
00:25:28,720 --> 00:25:34,720
about unsecured ssh keys so again ssh

716
00:25:31,120 --> 00:25:35,679
is our theme today so let's look at um

717
00:25:34,720 --> 00:25:37,440
how

718
00:25:35,679 --> 00:25:39,679
ssh keys should be protected now we all

719
00:25:37,440 --> 00:25:41,200
know that we need to protect our ssh

720
00:25:39,679 --> 00:25:44,000
keys with at least the passphrase

721
00:25:41,200 --> 00:25:45,600
because if we don't then someone

722
00:25:44,000 --> 00:25:47,520
that can break into that machine they

723
00:25:45,600 --> 00:25:51,039
can just use those ssh keys the

724
00:25:47,520 --> 00:25:53,200
unprotected ones to uh laterally move

725
00:25:51,039 --> 00:25:54,559
from that machine to all the other

726
00:25:53,200 --> 00:25:57,200
machines on the environment right

727
00:25:54,559 --> 00:25:58,720
without having any impediments right so

728
00:25:57,200 --> 00:26:01,919
essentially that key

729
00:25:58,720 --> 00:26:04,159
becomes you know a liability so we need

730
00:26:01,919 --> 00:26:06,240
to uh protect them with a password but

731
00:26:04,159 --> 00:26:08,799
you know like on many environments for

732
00:26:06,240 --> 00:26:10,960
instance in aws when you get a key pair

733
00:26:08,799 --> 00:26:12,240
they're not encrypted or not protected

734
00:26:10,960 --> 00:26:13,679
and a lot of people just look at them

735
00:26:12,240 --> 00:26:15,200
and they're like okay cool and they use

736
00:26:13,679 --> 00:26:16,960
them but they don't realize they need to

737
00:26:15,200 --> 00:26:19,360
go through that extra step of actually

738
00:26:16,960 --> 00:26:21,520
encrypting them uh and so they have a

739
00:26:19,360 --> 00:26:23,520
lot of these keys lying around the

740
00:26:21,520 --> 00:26:24,880
environment that are not protected so

741
00:26:23,520 --> 00:26:26,400
what we want to do here is we want to

742
00:26:24,880 --> 00:26:27,840
find you know all the keys in the

743
00:26:26,400 --> 00:26:30,400
environments that are not properly

744
00:26:27,840 --> 00:26:33,120
encrypted so let's take a look at how we

745
00:26:30,400 --> 00:26:35,760
can parse these these files this private

746
00:26:33,120 --> 00:26:39,200
key format so let me just show you like

747
00:26:35,760 --> 00:26:42,000
how you come up with this idea this query

748
00:26:39,200 --> 00:26:43,760
for something quite new so again i've

749
00:26:42,000 --> 00:26:45,440
got an example here

750
00:26:43,760 --> 00:26:47,600
of um

751
00:26:45,440 --> 00:26:49,679
of you know a new notebook i'll just

752
00:26:47,600 --> 00:26:52,400
make it full screen again

753
00:26:49,679 --> 00:26:54,080
and we can see that the first query the

754
00:26:52,400 --> 00:26:56,159
first thing i'm going to do is i'm just

755
00:26:54,080 --> 00:26:57,520
going to read that file right this is my

756
00:26:56,159 --> 00:26:59,520
private key

757
00:26:57,520 --> 00:27:02,640
i'm just going to read it and

758
00:26:59,520 --> 00:27:04,720
i have a read file function in vql and i

759
00:27:02,640 --> 00:27:07,279
can see that it looks like you know it

760
00:27:04,720 --> 00:27:09,279
has this uh header a private key and

761
00:27:07,279 --> 00:27:12,159
then it has a whole bunch of what looks

762
00:27:09,279 --> 00:27:14,480
to be base64 encoded something and then

763
00:27:12,159 --> 00:27:16,880
there's a tail on the end so that's

764
00:27:14,480 --> 00:27:19,440
that's cool so then clearly the thing in

765
00:27:16,880 --> 00:27:22,720
between the the thing between here and

766
00:27:19,440 --> 00:27:25,600
there looks to be base64

767
00:27:22,720 --> 00:27:27,679
uh encrypted uh encoded right so let's

768
00:27:25,600 --> 00:27:29,440
um let's take a look at

769
00:27:27,679 --> 00:27:31,120
how we decode it

770
00:27:29,440 --> 00:27:32,480
and um

771
00:27:31,120 --> 00:27:34,559
what i'm going to do the first thing is

772
00:27:32,480 --> 00:27:35,919
i'm going to use a regular expression to

773
00:27:34,559 --> 00:27:37,840
pull out

774
00:27:35,919 --> 00:27:40,799
the data between the key

775
00:27:37,840 --> 00:27:43,520
the start the the header and the end

776
00:27:40,799 --> 00:27:46,240
right and that will give me the first

777
00:27:43,520 --> 00:27:48,559
part which is just that base64 bit

778
00:27:46,240 --> 00:27:52,399
right and then in the second part i'm

779
00:27:48,559 --> 00:27:54,320
going to decode it with base64 decoding

780
00:27:52,399 --> 00:27:55,600
and so i can see straight away this is

781
00:27:54,320 --> 00:27:58,399
what you know so there's a whole bunch

782
00:27:55,600 --> 00:28:00,640
of binary data but straight away you can

783
00:27:58,399 --> 00:28:02,640
see that there is something here right

784
00:28:00,640 --> 00:28:05,440
it says openssh key

785
00:28:02,640 --> 00:28:06,720
and it has none and none and if you look

786
00:28:05,440 --> 00:28:08,240
and there's a whole bunch of information

787
00:28:06,720 --> 00:28:10,880
here that could be useful as well like

788
00:28:08,240 --> 00:28:12,159
inside the key so that where it was made

789
00:28:10,880 --> 00:28:14,799
and things like that

790
00:28:12,159 --> 00:28:16,880
and you know the the type of key that it

791
00:28:14,799 --> 00:28:20,480
is and so on so there's some information

792
00:28:16,880 --> 00:28:23,039
here that is quite useful but uh but all

793
00:28:20,480 --> 00:28:24,480
this thing is you know binary so now we

794
00:28:23,039 --> 00:28:27,120
have this problem of like okay we have

795
00:28:24,480 --> 00:28:29,520
all this binary data and i can see stuff

796
00:28:27,120 --> 00:28:30,640
in there but i have no idea how to parse

797
00:28:29,520 --> 00:28:31,840
it out

798
00:28:30,640 --> 00:28:33,200
so let's uh

799
00:28:31,840 --> 00:28:35,120
so let's

800
00:28:33,200 --> 00:28:38,640
do some research on google what is this

801
00:28:35,120 --> 00:28:40,960
binary blob uh what is the structure

802
00:28:38,640 --> 00:28:43,520
so we have this someone has

803
00:28:40,960 --> 00:28:46,240
done some research which is great

804
00:28:43,520 --> 00:28:49,279
and describing the uh the format so

805
00:28:46,240 --> 00:28:50,720
let's go to that site here

806
00:28:49,279 --> 00:28:52,799
okay so this is just a page on the

807
00:28:50,720 --> 00:28:55,200
internet that explains the format and

808
00:28:52,799 --> 00:28:57,600
you can see here uh the format is a

809
00:28:55,200 --> 00:28:59,919
binary format and there is like there is

810
00:28:57,600 --> 00:29:02,799
a description of you know so there's

811
00:28:59,919 --> 00:29:05,679
like the length and and some um

812
00:29:02,799 --> 00:29:07,840
uh it explains how the binary data is

813
00:29:05,679 --> 00:29:09,520
you know structured right so we can take

814
00:29:07,840 --> 00:29:12,720
this information and we can build a

815
00:29:09,520 --> 00:29:15,840
binary parser to extract this info this

816
00:29:12,720 --> 00:29:17,679
uh fields out of the binary data

817
00:29:15,840 --> 00:29:19,440
so uh like you know you can write a

818
00:29:17,679 --> 00:29:20,159
python script or something to get it out

819
00:29:19,440 --> 00:29:22,880
but

820
00:29:20,159 --> 00:29:24,000
in vql we actually have a built-in binary

821
00:29:22,880 --> 00:29:25,760
parser

822
00:29:24,000 --> 00:29:28,000
and i just want to quickly show you that

823
00:29:25,760 --> 00:29:29,679
i'm not going to go into details about

824
00:29:28,000 --> 00:29:32,640
the parser

825
00:29:29,679 --> 00:29:34,320
uh but in this private keys one

826
00:29:32,640 --> 00:29:36,159
i'm just going to show you

827
00:29:34,320 --> 00:29:38,159
what the parser looks like so it's kind of

828
00:29:36,159 --> 00:29:41,279
like a descriptive thing there is a

829
00:29:38,159 --> 00:29:43,200
profile that we can use to describe

830
00:29:41,279 --> 00:29:45,360
how each field is laid out so you know

831
00:29:43,200 --> 00:29:47,360
that's that's the header there's a magic

832
00:29:45,360 --> 00:29:50,159
string at offset zero and then the

833
00:29:47,360 --> 00:29:52,960
length of the cipher is at offset 15 and

834
00:29:50,159 --> 00:29:55,279
it's a uint32 big endian

835
00:29:52,960 --> 00:29:56,960
and then the cipher itself which is the

836
00:29:55,279 --> 00:29:59,440
string that describes you know what

837
00:29:56,960 --> 00:30:01,679
cipher is used to encrypt it uh is at

838
00:29:59,440 --> 00:30:04,240
offset 19 and you know the length is

839
00:30:01,679 --> 00:30:06,720
given by you know that other field so so

840
00:30:04,240 --> 00:30:10,000
we can have this so this is this is this

841
00:30:06,720 --> 00:30:12,159
is a description of uh of the binary

842
00:30:10,000 --> 00:30:14,240
format and we can use that to parse the

843
00:30:12,159 --> 00:30:16,320
keys out so again the same thing we're

844
00:30:14,240 --> 00:30:17,919
going to go out to our machine here

845
00:30:16,320 --> 00:30:20,320
and we're going to add another

846
00:30:17,919 --> 00:30:22,399
collection and let's look for our

847
00:30:20,320 --> 00:30:24,559
private keys

848
00:30:22,399 --> 00:30:25,840
okay so here's our private key on a real

849
00:30:24,559 --> 00:30:27,520
machine

850
00:30:25,840 --> 00:30:29,760
and we're going to collect this artifact

851
00:30:27,520 --> 00:30:31,600
again we don't need to necessarily

852
00:30:29,760 --> 00:30:34,000
understand the vql we just need to know

853
00:30:31,600 --> 00:30:36,720
how to use it so if we go over here we

854
00:30:34,000 --> 00:30:38,799
can change the parameters and this one

855
00:30:36,720 --> 00:30:39,840
basically there's a bit more complexity

856
00:30:38,799 --> 00:30:41,679
in the

857
00:30:39,840 --> 00:30:43,279
uh in this artifact because it can

858
00:30:41,679 --> 00:30:44,640
search for keys everywhere and we want

859
00:30:43,279 --> 00:30:46,880
to make sure that it doesn't go into

860
00:30:44,640 --> 00:30:49,520
proc and you know and then get lost in

861
00:30:46,880 --> 00:30:51,520
there right so uh so we can so there's a

862
00:30:49,520 --> 00:30:53,360
few more pieces of functionality than

863
00:30:51,520 --> 00:30:55,520
you know we've just described uh but

864
00:30:53,360 --> 00:30:56,320
basically we go off and and collect this

865
00:30:55,520 --> 00:30:58,559
thing

866
00:30:56,320 --> 00:31:00,399
and it comes back in you know a matter

867
00:30:58,559 --> 00:31:02,720
of seconds and we can see oh look you

868
00:31:00,399 --> 00:31:04,640
know this user has a key and it's

869
00:31:02,720 --> 00:31:06,880
protected so that's great so so this is

870
00:31:04,640 --> 00:31:07,840
good right we checked it and and that's

871
00:31:06,880 --> 00:31:11,840
good

872
00:31:07,840 --> 00:31:14,159
so um but you know maybe that key has

873
00:31:11,840 --> 00:31:16,640
other that user has other keys lying

874
00:31:14,159 --> 00:31:18,480
around right so we only by default

875
00:31:16,640 --> 00:31:21,200
search for the keys if you look at the

876
00:31:18,480 --> 00:31:24,320
parameter uh or the the default

877
00:31:21,200 --> 00:31:26,880
parameter uh only uses it in slash home

878
00:31:24,320 --> 00:31:28,880
slash ubuntu dot ssh which is the location

879
00:31:26,880 --> 00:31:32,080
where normally the keys sit right but

880
00:31:28,880 --> 00:31:34,399
let's uh let's um search we copy that

881
00:31:32,080 --> 00:31:35,919
artifact and we we can tell it to search

882
00:31:34,399 --> 00:31:36,720
you know everywhere

883
00:31:35,919 --> 00:31:38,640
so

884
00:31:36,720 --> 00:31:40,320
that's the default search pattern which

885
00:31:38,640 --> 00:31:42,399
is wildcard

886
00:31:40,320 --> 00:31:44,559
right and we can just make it if if we

887
00:31:42,399 --> 00:31:47,600
do star star that's like a recursive

888
00:31:44,559 --> 00:31:50,320
search of the system we look for

889
00:31:47,600 --> 00:31:52,480
pem id_rsa or id_dsa these are the three

890
00:31:50,320 --> 00:31:54,640
types of names that we're gonna search

891
00:31:52,480 --> 00:31:57,360
for and uh and you know we go ahead and

892
00:31:54,640 --> 00:31:58,559
we do that and uh and it's gonna take a

893
00:31:57,360 --> 00:31:59,840
little bit longer because it's gonna

894
00:31:58,559 --> 00:32:01,039
search through the whole system so i'm

895
00:31:59,840 --> 00:32:03,440
going to leave it for a couple of

896
00:32:01,039 --> 00:32:06,559
seconds and we'll come back to it later

897
00:32:03,440 --> 00:32:08,320
but you can see that just recapping uh

898
00:32:06,559 --> 00:32:10,720
we read the file we basically base64

899
00:32:08,320 --> 00:32:10,720
decoded it we noticed some binary data

900
00:32:10,720 --> 00:32:14,960
and we created a parser for it now

901
00:32:12,960 --> 00:32:17,039
because the parser is in vql we don't

902
00:32:14,960 --> 00:32:20,080
really need to rebuild or recompile or

903
00:32:17,039 --> 00:32:22,000
redeploy anything right we just we just

904
00:32:20,080 --> 00:32:24,559
you know write the vql it's descriptive

905
00:32:22,000 --> 00:32:26,880
and the vql can go ahead and uh

906
00:32:24,559 --> 00:32:29,279
dissect that data out of the endpoint

907
00:32:26,880 --> 00:32:31,519
right so so then you know we wrote it

908
00:32:29,279 --> 00:32:32,240
into an artifact and then we collected

909
00:32:31,519 --> 00:32:34,320
it

910
00:32:32,240 --> 00:32:36,640
from the artifact uh

911
00:32:34,320 --> 00:32:40,000
um repository here

912
00:32:36,640 --> 00:32:42,320
and uh let me just see if it's finished

913
00:32:40,000 --> 00:32:44,640
yeah it's it's it's only taken uh 16

914
00:32:42,320 --> 00:32:46,640
seconds to go over the file system it's

915
00:32:44,640 --> 00:32:48,480
only a cloud vm so it's quite small it

916
00:32:46,640 --> 00:32:49,519
could take longer on other systems but

917
00:32:48,480 --> 00:32:52,799
we can see

918
00:32:49,519 --> 00:32:55,360
uh that this user has some aws keys

919
00:32:52,799 --> 00:32:56,799
here and they don't have any ciphers so

920
00:32:55,360 --> 00:32:59,519
this immediately because that's the

921
00:32:56,799 --> 00:33:00,880
default way that aws creates those keys

922
00:32:59,519 --> 00:33:02,399
this user didn't

923
00:33:00,880 --> 00:33:03,919
you know go to the extra step of

924
00:33:02,399 --> 00:33:06,960
re-securing their keys after they

925
00:33:03,919 --> 00:33:09,039
downloaded them from the aws console so

926
00:33:06,960 --> 00:33:10,960
this is this is really problematic and

927
00:33:09,039 --> 00:33:12,640
this is a really big deal we can see

928
00:33:10,960 --> 00:33:14,399
lateral movement through these keys all

929
00:33:12,640 --> 00:33:15,120
the time right because people don't do

930
00:33:14,399 --> 00:33:18,399
that

931
00:33:15,120 --> 00:33:20,640
so this is now we can go ahead and uh

932
00:33:18,399 --> 00:33:22,399
and you know tell the user hey you know

933
00:33:20,640 --> 00:33:24,000
you've done the wrong thing let's fix it

934
00:33:22,399 --> 00:33:26,559
but let's just think about what actually

935
00:33:24,000 --> 00:33:28,320
happened here um is that we could

936
00:33:26,559 --> 00:33:31,279
actually do that we could actually do

937
00:33:28,320 --> 00:33:33,120
that as a hunt on all the systems um

938
00:33:31,279 --> 00:33:34,799
maybe we can do that real quick

939
00:33:33,120 --> 00:33:36,399
um you know

940
00:33:34,799 --> 00:33:39,519
i've shown you how to do that before but

941
00:33:36,399 --> 00:33:41,600
like you know uh search

942
00:33:39,519 --> 00:33:45,360
search or tam

943
00:33:41,600 --> 00:33:47,600
uh and again we do the same thing

944
00:33:45,360 --> 00:33:49,519
for the private key so it's the same

945
00:33:47,600 --> 00:33:51,760
process right but we're just gonna do it

946
00:33:49,519 --> 00:33:53,200
you know everywhere instead of on one

947
00:33:51,760 --> 00:33:55,279
machine

948
00:33:53,200 --> 00:33:57,120
and then go for it and what's going to

949
00:33:55,279 --> 00:33:58,720
happen now is that all of our machines

950
00:33:57,120 --> 00:34:00,320
are going to go like all thousands of

951
00:33:58,720 --> 00:34:02,080
them and it could be more right they're

952
00:34:00,320 --> 00:34:03,919
going to go and search for that on their

953
00:34:02,080 --> 00:34:05,919
own system but because each one is doing

954
00:34:03,919 --> 00:34:08,079
it sort of in parallel then it still

955
00:34:05,919 --> 00:34:10,399
doesn't take very long to uh to do that

956
00:34:08,079 --> 00:34:12,320
so they'll all come back and we'll be getting the

957
00:34:10,399 --> 00:34:15,119
the results you know so like just like

958
00:34:12,320 --> 00:34:16,960
before uh it's very calculated so before

959
00:34:15,119 --> 00:34:18,639
we found all the logins right the same

960
00:34:16,960 --> 00:34:20,000
thing so they all came back from all the

961
00:34:18,639 --> 00:34:22,399
machines right then we could still do

962
00:34:20,000 --> 00:34:25,119
the post processing then uh in this case

963
00:34:22,399 --> 00:34:26,800
we're doing the same thing uh

964
00:34:25,119 --> 00:34:29,359
see if the results are here yet no

965
00:34:26,800 --> 00:34:31,679
they're still coming um and

966
00:34:29,359 --> 00:34:34,480
uh and then we can we can find that oh

967
00:34:31,679 --> 00:34:36,079
here we go so we've got some data there

968
00:34:34,480 --> 00:34:38,000
and uh

969
00:34:36,079 --> 00:34:40,480
yeah so we can we can then see you know

970
00:34:38,000 --> 00:34:41,919
everybody's you know keys and someone

971
00:34:40,480 --> 00:34:42,720
that that are

972
00:34:41,919 --> 00:34:44,720
uh

973
00:34:42,720 --> 00:34:46,560
all of these thousand

974
00:34:44,720 --> 00:34:47,359
machines are kind of virtual all the

975
00:34:46,560 --> 00:34:48,720
same

976
00:34:47,359 --> 00:34:50,079
machines so we're going to get the same

977
00:34:48,720 --> 00:34:52,800
data but

978
00:34:50,079 --> 00:34:54,320
you get the idea of hunting so this is

979
00:34:52,800 --> 00:34:56,399
cool the other thing that's cool about

980
00:34:54,320 --> 00:34:58,240
it so this is how we created a new hunt

981
00:34:56,399 --> 00:35:00,640
we configured it

982
00:34:58,240 --> 00:35:03,040
and then we ran it now the interesting

983
00:35:00,640 --> 00:35:05,520
thing about it is that we haven't

984
00:35:03,040 --> 00:35:07,520
actually downloaded anyone's keys right

985
00:35:05,520 --> 00:35:09,280
so it's not like we went out

986
00:35:07,520 --> 00:35:11,119
grabbed all the keys and ran a python

987
00:35:09,280 --> 00:35:12,560
script locally to check are they

988
00:35:11,119 --> 00:35:14,240
encrypted because obviously that would

989
00:35:12,560 --> 00:35:15,040
be like really bad right because we

990
00:35:14,240 --> 00:35:17,520
don't

991
00:35:15,040 --> 00:35:20,960
copy everybody's private keys right so

992
00:35:17,520 --> 00:35:22,720
having it done by the endpoint means

993
00:35:20,960 --> 00:35:25,599
that we don't have to

994
00:35:22,720 --> 00:35:27,599
get the data essentially all right last
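
The point above is that the endpoint answers "is this key passphrase-protected?" itself, so the key material never has to leave the machine. The sketch below shows one way such a local check could look in Python; it is not the VQL the hunt actually ran, and the function name `private_key_is_encrypted` is hypothetical. It assumes PEM-style key files and returns only a boolean.

```python
import base64
import struct

# Magic string at the start of the decoded body of a new-format OpenSSH key.
OPENSSH_MAGIC = b"openssh-key-v1\x00"


def private_key_is_encrypted(pem_text: str) -> bool:
    """Return True if a PEM-style private key looks passphrase-protected.

    Only the verdict is returned, so a caller can report it without
    shipping the key itself anywhere.
    """
    lines = [ln.strip() for ln in pem_text.strip().splitlines()]
    if not lines or not lines[0].startswith("-----BEGIN "):
        raise ValueError("not a PEM-style key")
    header = lines[0]

    # PKCS#8 encrypted keys say so in the BEGIN line.
    if "ENCRYPTED" in header:
        return True

    # Legacy PEM keys carry a "Proc-Type: 4,ENCRYPTED" header line.
    if any(ln.startswith("Proc-Type:") and "ENCRYPTED" in ln for ln in lines[1:]):
        return True

    # New-format OpenSSH keys store the cipher name right after the magic;
    # the name is "none" when no passphrase is set.
    if "OPENSSH PRIVATE KEY" in header:
        body = "".join(ln for ln in lines[1:] if not ln.startswith("-----"))
        blob = base64.b64decode(body)
        if not blob.startswith(OPENSSH_MAGIC):
            raise ValueError("unexpected OpenSSH key magic")
        offset = len(OPENSSH_MAGIC)
        (name_len,) = struct.unpack(">I", blob[offset:offset + 4])
        cipher = blob[offset + 4:offset + 4 + name_len]
        return cipher != b"none"

    return False
```

Run on each endpoint, a check like this would send back one row per key file (path plus verdict) rather than the file contents.
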

995
00:35:25,599 --> 00:35:30,640
example: recovering deleted logs

996
00:35:27,599 --> 00:35:32,240
so we looked at uh how you know the logs

997
00:35:30,640 --> 00:35:34,079
look at syslog

998
00:35:32,240 --> 00:35:37,280
but let's say in a lot of cases you know

999
00:35:34,079 --> 00:35:39,359
people delete the logs or the compromise

1000
00:35:37,280 --> 00:35:42,000
happened so long ago that the logs got

1001
00:35:39,359 --> 00:35:43,760
rotated uh maybe a few weeks you know

1002
00:35:42,000 --> 00:35:45,520
before normally it's like four weeks

1003
00:35:43,760 --> 00:35:47,760
after and then they get rotated out

1004
00:35:45,520 --> 00:35:50,560
depending on the rotation policy uh

1005
00:35:47,760 --> 00:35:52,960
those logs can be aggressively rotated

1006
00:35:50,560 --> 00:35:54,720
so in that case we really we really need

1007
00:35:52,960 --> 00:35:56,560
to go back in time and try and find

1008
00:35:54,720 --> 00:35:59,359
forensic evidence of

1009
00:35:56,560 --> 00:36:01,760
these compromises from the logs uh and

1010
00:35:59,359 --> 00:36:03,599
we try to recover deleted logs now if we

1011
00:36:01,760 --> 00:36:05,280
if we are back to that you know this is

1012
00:36:03,599 --> 00:36:07,040
not good right it's not a good

1013
00:36:05,280 --> 00:36:07,839
outcome it's better to have the logs right

1014
00:36:07,040 --> 00:36:09,680
but

1015
00:36:07,839 --> 00:36:11,200
if you are struggling

1016
00:36:09,680 --> 00:36:13,200
and we don't have logs then we can use a

1017
00:36:11,200 --> 00:36:14,400
technique called carving and carving is

1018
00:36:13,200 --> 00:36:16,160
a very simple technique where we

1019
00:36:14,400 --> 00:36:17,760
basically look for patterns in

1020
00:36:16,160 --> 00:36:19,280
unstructured data

1021
00:36:17,760 --> 00:36:21,760
and the idea is that when someone

1022
00:36:19,280 --> 00:36:23,760
deletes those logs then the data is

1023
00:36:21,760 --> 00:36:26,480
still on the disk so we we might be able

1024
00:36:23,760 --> 00:36:29,119
to find it you know from just the disk

1025
00:36:26,480 --> 00:36:31,280
unstructured data so let me just show

1026
00:36:29,119 --> 00:36:32,560
you an example of how

1027
00:36:31,280 --> 00:36:34,079
we do that

1028
00:36:32,560 --> 00:36:35,839
so in this

1029
00:36:34,079 --> 00:36:37,680
particular example let me just make it

1030
00:36:35,839 --> 00:36:38,960
full screen again so the idea is to try

1031
00:36:37,680 --> 00:36:41,520
and look for

1032
00:36:38,960 --> 00:36:43,760
uh patterns that look like a syslog

1033
00:36:41,520 --> 00:36:46,320
message right so uh we've seen before

1034
00:36:43,760 --> 00:36:48,240
the syslog starts with uh jan feb

1035
00:36:46,320 --> 00:36:50,960
mar, so like a month name, and then it

1036
00:36:48,240 --> 00:36:53,200
has the day of the month and then it

1037
00:36:50,960 --> 00:36:55,760
has you know the the time and then we

1038
00:36:53,200 --> 00:36:57,520
know that you know it's a line so we're

1039
00:36:55,760 --> 00:37:00,320
going to take all the characters until

1040
00:36:57,520 --> 00:37:02,880
the next new line so that's one line so

1041
00:37:00,320 --> 00:37:04,400
when we do this we can so we're going to

1042
00:37:02,880 --> 00:37:07,280
do this query here

1043
00:37:04,400 --> 00:37:09,440
we use a tool called yara which is uh

1044
00:37:07,280 --> 00:37:11,920
used for essentially applying regular

1045
00:37:09,440 --> 00:37:14,880
expressions at scale it's very fast and

1046
00:37:11,920 --> 00:37:16,079
efficient and so on uh and so we can do

1047
00:37:14,880 --> 00:37:17,760
that on

1048
00:37:16,079 --> 00:37:19,680
ideally we want to do it on a device in

1049
00:37:17,760 --> 00:37:21,599
the end but for testing we're just going

1050
00:37:19,680 --> 00:37:22,800
to do it on the real file to find the

1051
00:37:21,599 --> 00:37:25,040
right

1052
00:37:22,800 --> 00:37:27,520
regular expression and so on and so you

1053
00:37:25,040 --> 00:37:29,359
know we go ahead we we grab that file

1054
00:37:27,520 --> 00:37:30,720
and we run the yara expression on it and

1055
00:37:29,359 --> 00:37:33,280
it's supposed to

1056
00:37:30,720 --> 00:37:35,599
uh hit on all of the

1057
00:37:33,280 --> 00:37:37,839
all of the lines that sort of look like

1058
00:37:35,599 --> 00:37:40,480
that right that sort of look like maybe

1059
00:37:37,839 --> 00:37:42,880
a syslog line right so when i run

1060
00:37:40,480 --> 00:37:44,720
this the first query so that's this one

1061
00:37:42,880 --> 00:37:47,359
uh you can see that there is you know

1062
00:37:44,720 --> 00:37:49,040
some hex data here and it matches you

1063
00:37:47,359 --> 00:37:51,440
know that kind of pattern right so we

1064
00:37:49,040 --> 00:37:54,880
got the jan 16 and then it goes all the

1065
00:37:51,440 --> 00:37:56,640
way to and again no no year right but um

1066
00:37:54,880 --> 00:37:58,720
you know so this basically pulls out our

1067
00:37:56,640 --> 00:38:01,359
log lines or things that look sort of

1068
00:37:58,720 --> 00:38:04,640
like a log line so you know uh then what

1069
00:38:01,359 --> 00:38:07,680
we're going to do is we're going to uh

1070
00:38:04,640 --> 00:38:10,079
extract the actual hit and look for only

1071
00:38:07,680 --> 00:38:12,240
things that sort of look like maybe ssh

1072
00:38:10,079 --> 00:38:13,760
logins right so it has to have the word

1073
00:38:12,240 --> 00:38:16,000
either accepted or failed that's really

1074
00:38:13,760 --> 00:38:18,240
all we care about in this case and so

1075
00:38:16,000 --> 00:38:20,160
you know that's the second query here
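
The two steps described above (find anything shaped like a syslog line in unstructured bytes, then keep only hits containing "Accepted" or "Failed") can be sketched in Python. This swaps Python's `re` module in for the YARA rule used in the talk; the pattern, the 1024-byte cap on line length, and the function name `carve_ssh_logins` are assumptions for illustration, not the talk's actual rule.

```python
import re

# Step 1: month name, day of month, HH:MM:SS, then everything up to the
# next newline (or NUL byte, which is common in raw disk data).
SYSLOG_LINE = re.compile(
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    rb" [ \d]\d \d{2}:\d{2}:\d{2} [^\n\x00]{1,1024}"
)

# Step 2: keep only hits that look like SSH login results.
SSH_LOGIN = re.compile(rb"Accepted|Failed")


def carve_ssh_logins(raw: bytes):
    """Yield (offset, line) for syslog-shaped runs that mention a login."""
    for match in SYSLOG_LINE.finditer(raw):
        line = match.group(0)
        if SSH_LOGIN.search(line):
            yield match.start(), line.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = (
        b"\x00\xffjunk Jan 16 10:11:12 host sshd[123]: "
        b"Accepted publickey for mike from 10.0.0.1 port 22 ssh2\n"
        b"\x00\x00Feb  3 01:02:03 host cron[1]: job ran\n"
    )
    for offset, line in carve_ssh_logins(sample):
        print(offset, line)
```

Against a real device the buffer would be read in overlapping chunks so that a line straddling a chunk boundary is not missed; the sketch works on a single in-memory buffer to keep the idea visible.
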

1076
00:38:18,240 --> 00:38:22,720
and so you can see the hit

1077
00:38:20,160 --> 00:38:25,599
is uh essentially

1078
00:38:22,720 --> 00:38:27,839
you know uh the ssh

1079
00:38:25,599 --> 00:38:29,920
keys right so it's accepted a login

1080
00:38:27,839 --> 00:38:31,440
accepted login and so on right so so

1081
00:38:29,920 --> 00:38:33,440
basically all we do is we write this

1082
00:38:31,440 --> 00:38:34,880
thing and then we just

1083
00:38:33,440 --> 00:38:37,359
and then again we do the same thing with

1084
00:38:34,880 --> 00:38:39,760
grok and all the rest right and so we

1085
00:38:37,359 --> 00:38:41,920
can apply that and get

1086
00:38:39,760 --> 00:38:43,280
uh and then carve out

1087
00:38:41,920 --> 00:38:44,800
uh

1088
00:38:43,280 --> 00:38:46,800
the

1089
00:38:44,800 --> 00:38:48,800
ssh logs that could be deleted so let's

1090
00:38:46,800 --> 00:38:50,800
have a look at this one comes from the

1091
00:38:48,800 --> 00:38:54,240
exchange so again it's probably

1092
00:38:50,800 --> 00:38:56,560
contributed content and again

1093
00:38:54,240 --> 00:38:59,119
this is the query this is the regular

1094
00:38:56,560 --> 00:39:02,240
expression for the yara rule

1095
00:38:59,119 --> 00:39:06,000
and then this is the grok expression

1096
00:39:02,240 --> 00:39:07,760
okay so let's go over here and carve it

1097
00:39:06,000 --> 00:39:10,640
now carving takes a long time because

1098
00:39:07,760 --> 00:39:12,720
you are really looking through the raw disk

1099
00:39:10,640 --> 00:39:13,520
right so if we actually look at

1100
00:39:12,720 --> 00:39:15,520
uh

1101
00:39:13,520 --> 00:39:17,359
it's looking at the raw device

1102
00:39:15,520 --> 00:39:20,400
and then just looking for patterns that

1103
00:39:17,359 --> 00:39:22,640
sort of look like ssh um

1104
00:39:20,400 --> 00:39:24,320
messages right so it could take a while

1105
00:39:22,640 --> 00:39:26,480
to do, it's going to scan the whole disk

1106
00:39:24,320 --> 00:39:28,400
this is a cloud machine so it's not that

1107
00:39:26,480 --> 00:39:29,680
big but it could take a while so let's

1108
00:39:28,400 --> 00:39:32,400
just leave it

1109
00:39:29,680 --> 00:39:35,119
and we'll come back to that

1110
00:39:32,400 --> 00:39:36,720
okay so just recapping

1111
00:39:35,119 --> 00:39:38,640
and then we're going to see some hits

1112
00:39:36,720 --> 00:39:41,200
over here um we're going to see that

1113
00:39:38,640 --> 00:39:43,359
later so in the last five minutes i just

1114
00:39:41,200 --> 00:39:45,920
want to show you guys another cool

1115
00:39:43,359 --> 00:39:48,640
feature in velociraptor which is about

1116
00:39:45,920 --> 00:39:50,160
monitoring events from the endpoints so

1117
00:39:48,640 --> 00:39:52,400
normally when we run a query we talked

1118
00:39:50,160 --> 00:39:54,839
about vql and you can see it's really

1119
00:39:52,400 --> 00:39:58,240
quick and it finishes and gives you

1120
00:39:54,839 --> 00:40:01,040
a table but it doesn't have to finish

1121
00:39:58,240 --> 00:40:03,119
right so the query actually returns data

1122
00:40:01,040 --> 00:40:05,280
as soon as it's available so that means

1123
00:40:03,119 --> 00:40:07,280
that if we can write a query that runs

1124
00:40:05,280 --> 00:40:09,200
sort of forever right

1125
00:40:07,280 --> 00:40:12,480
then as soon as something happens it

1126
00:40:09,200 --> 00:40:14,800
will stream data back so vql can

1127
00:40:12,480 --> 00:40:18,000
support streaming queries and that is

1128
00:40:14,800 --> 00:40:19,760
where event queries

1129
00:40:18,000 --> 00:40:22,400
come in so you can write

1130
00:40:19,760 --> 00:40:24,720
a query that is running all the time but

1131
00:40:22,400 --> 00:40:26,160
it's constantly streaming back

1132
00:40:24,720 --> 00:40:28,400
uh

1133
00:40:26,160 --> 00:40:30,480
you know rows and then those

1134
00:40:28,400 --> 00:40:32,560
rows can simply be forwarded to the

1135
00:40:30,480 --> 00:40:34,720
server and then we are collecting those

1136
00:40:32,560 --> 00:40:35,920
as events so we can use that for a

1137
00:40:34,720 --> 00:40:38,319
number of things we can use it for

1138
00:40:35,920 --> 00:40:41,839
monitoring and also we can use it for

1139
00:40:38,319 --> 00:40:43,760
response for creating uh for automating

1140
00:40:41,839 --> 00:40:46,079
response so we can go and do stuff based

1141
00:40:43,760 --> 00:40:48,160
on those those queries so here's an

1142
00:40:46,079 --> 00:40:51,520
example of i just wanted to show you

1143
00:40:48,160 --> 00:40:53,680
guys the query here how do we turn that

1144
00:40:51,520 --> 00:40:56,000
other query that we did before which was

1145
00:40:53,680 --> 00:40:58,160
remember we had

1146
00:40:56,000 --> 00:41:01,520
parse_lines which just goes off and

1147
00:40:58,160 --> 00:41:04,000
reads the lines but there is a similar

1148
00:41:01,520 --> 00:41:07,200
event version of that query called

1149
00:41:04,000 --> 00:41:07,200
watch_syslog and that is watching the lines

1150
00:41:07,200 --> 00:41:11,680
so it's essentially like a tail uh a

1151
00:41:09,839 --> 00:41:13,680
tail -f or you know something

1152
00:41:11,680 --> 00:41:15,200
like that um

1153
00:41:13,680 --> 00:41:17,359
or or a

1154
00:41:15,200 --> 00:41:19,760
less with a tail following or whatever

1155
00:41:17,359 --> 00:41:22,240
right so it looks at the end of the file

1156
00:41:19,760 --> 00:41:24,240
it watches for new lines to appear

1157
00:41:22,240 --> 00:41:26,480
and then it releases each line into the

1158
00:41:24,240 --> 00:41:29,359
query so it never terminates but once we

1159
00:41:26,480 --> 00:41:31,680
run this query it will always work and

1160
00:41:29,359 --> 00:41:34,240
and then just grok the lines as they

1161
00:41:31,680 --> 00:41:36,480
come and then filter them out and if

1162
00:41:34,240 --> 00:41:38,160
they're ssh then it will say hey that's

1163
00:41:36,480 --> 00:41:40,240
that's you know an interesting one and

1164
00:41:38,160 --> 00:41:44,079
it will pass it on and so we can use

1165
00:41:40,240 --> 00:41:46,240
that to um to monitor for ssh logins so
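
The event-query idea above (start at the end of the file, wait for new lines, release each one into a filter, never terminate on its own) can be illustrated with Python generators. This is a sketch of the concept only, not how `watch_syslog` is implemented; `follow`, `ssh_login_events`, and the `stop` callback are hypothetical names, and the polling interval is an arbitrary choice.

```python
import os
import time


def follow(path, poll_interval=0.5, stop=lambda: False):
    """Return a generator of lines appended to `path` from now on, like `tail -f`.

    The file is opened and positioned at its end immediately, so only
    lines written after this call are produced. `stop` is polled between
    reads so a caller can end the otherwise endless stream.
    """
    handle = open(path, "r", errors="replace")
    handle.seek(0, os.SEEK_END)

    def _lines():
        pending = ""
        try:
            while not stop():
                chunk = handle.readline()
                if not chunk:
                    # Nothing new yet: wait and look again.
                    time.sleep(poll_interval)
                    continue
                pending += chunk
                # Only release complete lines; a partial write stays buffered.
                if pending.endswith("\n"):
                    yield pending.rstrip("\n")
                    pending = ""
        finally:
            handle.close()

    return _lines()


def ssh_login_events(lines):
    """Pass through only the lines that look like SSH login results."""
    for line in lines:
        if "sshd" in line and ("Accepted" in line or "Failed" in line):
            yield line
```

Chaining the two, as in `for event in ssh_login_events(follow("/var/log/auth.log")): ...`, gives a loop that blocks until a matching line appears and then hands over just that line, which mirrors the filtering-on-the-endpoint behaviour described in the talk. The log path here is an assumption; it varies by distribution.
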

1166
00:41:44,079 --> 00:41:48,079
there is this artifact here which is

1167
00:41:46,240 --> 00:41:49,920
Linux.Events.SSHLogin i'll just

1168
00:41:48,079 --> 00:41:52,079
quickly show you that

1169
00:41:49,920 --> 00:41:54,640
so again we're going to look for ssh and

1170
00:41:52,079 --> 00:41:56,240
this one is an event version of that and

1171
00:41:54,640 --> 00:41:58,560
we can see that it's a client event it's a

1172
00:41:56,240 --> 00:42:00,240
slightly different type but it's still

1173
00:41:58,560 --> 00:42:03,440
the same kind of general structure it's

1174
00:42:00,240 --> 00:42:06,079
still an artifact but to install it

1175
00:42:03,440 --> 00:42:07,599
we have to go into this screen here

1176
00:42:06,079 --> 00:42:10,880
which shows us

1177
00:42:07,599 --> 00:42:14,079
uh the event monitoring on the client so

1178
00:42:10,880 --> 00:42:16,560
we can target it specifically to a label

1179
00:42:14,079 --> 00:42:18,720
group say mike

1180
00:42:16,560 --> 00:42:21,760
and you know and then otherwise it's

1181
00:42:18,720 --> 00:42:24,079
kind of the same um ui right we just

1182
00:42:21,760 --> 00:42:26,400
select which ones we want and then we

1183
00:42:24,079 --> 00:42:27,280
can configure them and so on right

1184
00:42:26,400 --> 00:42:29,200
and

1185
00:42:27,280 --> 00:42:31,119
and then once we do that then the events

1186
00:42:31,119 --> 00:42:33,280
start streaming in

1187
00:42:31,119 --> 00:42:35,440
so we can see that so in this case for

1188
00:42:33,280 --> 00:42:37,200
instance uh you can see that there is

1189
00:42:35,440 --> 00:42:39,119
one event that came in

1190
00:42:37,200 --> 00:42:40,720
i had them on before so i can show you

1191
00:42:39,119 --> 00:42:42,800
what it looks like

1192
00:42:40,720 --> 00:42:44,880
when someone logs in then immediately

1193
00:42:42,800 --> 00:42:46,960
that event is streamed to the server so

1194
00:42:44,880 --> 00:42:49,119
it's not log forwarding it's not just

1195
00:42:46,960 --> 00:42:51,119
forwarding all the logs indiscriminately

1196
00:42:49,119 --> 00:42:53,280
it's doing the querying and processing

1197
00:42:51,119 --> 00:42:54,960
on the endpoint directly and then just

1198
00:42:53,280 --> 00:42:56,319
forwarding back

1199
00:42:54,960 --> 00:42:57,680
uh just

1200
00:42:56,319 --> 00:42:59,520
those ones that are relevant to the

1201
00:42:57,680 --> 00:43:00,400
query right so we can do we can do both

1202
00:42:59,520 --> 00:43:02,319
we can

1203
00:43:00,400 --> 00:43:05,200
uh forward all the events or we can do

1204
00:43:02,319 --> 00:43:07,119
the pre-filtering

1205
00:43:05,200 --> 00:43:08,960
and the processing on the endpoint and

1206
00:43:07,119 --> 00:43:10,880
just forward back those really

1207
00:43:08,960 --> 00:43:13,119
high-value ones i mean this is a really

1208
00:43:10,880 --> 00:43:15,119
high-value event uh you know and it

1209
00:43:13,119 --> 00:43:16,800
could be sitting between thousands of

1210
00:43:15,119 --> 00:43:18,720
syslog lines right we don't care about

1211
00:43:16,800 --> 00:43:21,280
those we just care about this one so

1212
00:43:18,720 --> 00:43:22,240
they all go in the same place right uh

1213
00:43:21,280 --> 00:43:24,560
then the

1214
00:43:22,240 --> 00:43:26,720
let me just finally the last uh thing

1215
00:43:24,560 --> 00:43:28,480
that i wanted to show you guys

1216
00:43:26,720 --> 00:43:30,480
uh so this is how we collect the events

1217
00:43:28,480 --> 00:43:32,400
right we just run and we see those uh

1218
00:43:30,480 --> 00:43:35,119
things the last thing that i wanted to

1219
00:43:32,400 --> 00:43:37,520
show uh to talk about is sysmon and

1220
00:43:35,119 --> 00:43:40,880
sysmon is really exciting, sysmon is

1221
00:43:37,520 --> 00:43:42,960
like the default um i guess kernel

1222
00:43:40,880 --> 00:43:44,480
events monitoring tool for windows so

1223
00:43:42,960 --> 00:43:47,119
it's been around for a long time on

1224
00:43:44,480 --> 00:43:50,240
windows and just recently they've

1225
00:43:47,119 --> 00:43:51,839
released a sysmon for linux based on

1226
00:43:50,240 --> 00:43:54,000
ebpf

1227
00:43:51,839 --> 00:43:55,440
and we've talked a lot about ebpf in

1228
00:43:54,000 --> 00:43:58,800
this conference especially in the kernel

1229
00:43:55,440 --> 00:44:01,520
hacking miniconf early on

1230
00:43:58,800 --> 00:44:03,680
in the conference so ebpf is a method for us

1231
00:44:01,520 --> 00:44:05,599
to be able to get information from the

1232
00:44:03,680 --> 00:44:08,000
kernel about things like process

1233
00:44:05,599 --> 00:44:09,599
execution network connections all that

1234
00:44:08,000 --> 00:44:11,119
really good stuff from the detection

1235
00:44:09,599 --> 00:44:14,000
perspective

1236
00:44:11,119 --> 00:44:16,960
and sysmon is now a nice easy way of

1237
00:44:14,000 --> 00:44:19,200
getting into that um it's still immature

1238
00:44:16,960 --> 00:44:20,480
you know it's still a little bit buggy

1239
00:44:19,200 --> 00:44:22,800
but it has a lot of interest from the

1240
00:44:20,480 --> 00:44:26,240
community everybody's excited about it

1241
00:44:22,800 --> 00:44:28,240
um it still has some shortfalls

1242
00:44:26,240 --> 00:44:30,319
but i'll just show you how how it looks

1243
00:44:28,240 --> 00:44:32,160
like that this is the sysmon uh the

1244
00:44:30,319 --> 00:44:33,680
sysmon version itself just writes to

1245
00:44:32,160 --> 00:44:35,119
syslog which is

1246
00:44:33,680 --> 00:44:36,720
terrible because then you have to apply

1247
00:44:35,119 --> 00:44:37,920
these regular expressions to get the

1248
00:44:36,720 --> 00:44:39,599
data out

1249
00:44:37,920 --> 00:44:41,839
um so i've

1250
00:44:39,599 --> 00:44:43,440
written a patch to fix it to

1251
00:44:41,839 --> 00:44:46,079
write it to a unix domain socket so it's a

1252
00:44:43,440 --> 00:44:48,319
lot more efficient and json encoded

1253
00:44:46,079 --> 00:44:50,640
and so we can use this plugin called

1254
00:44:48,319 --> 00:44:53,920
netcat which connects to the unix domain

1255
00:44:50,640 --> 00:44:55,599
socket and reads all the lines out uh

1256
00:44:53,920 --> 00:44:57,839
but otherwise it's exactly the same

1257
00:44:55,599 --> 00:44:59,839
parse that json and then show it so

1258
00:44:57,839 --> 00:45:01,040
let me just quickly show you what that

1259
00:44:59,839 --> 00:45:03,520
looks like
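
As a rough picture of what "connect to the unix domain socket, read the lines out, parse the json" involves, here is a minimal Python sketch. It assumes the patched sysmon emits one JSON object per line, which the talk implies but does not spell out; the socket path is left to the caller, and the field names `Image` and `ParentImage` are borrowed from Sysmon's process-creation events as an assumption about the JSON layout. `events_from_stream`, `read_json_events`, and `spawned_by` are hypothetical helper names.

```python
import json
import socket


def events_from_stream(lines):
    """Turn an iterable of newline-delimited JSON strings into dicts."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_json_events(socket_path):
    """Connect to a Unix domain socket and yield one dict per JSON line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("r", encoding="utf-8") as stream:
            yield from events_from_stream(stream)


def spawned_by(events, parent_suffix):
    """Keep process events whose parent image path ends with `parent_suffix`.

    'ParentImage' is an assumed field name; adjust to the real schema.
    """
    for event in events:
        if event.get("ParentImage", "").endswith(parent_suffix):
            yield event
```

A detection in this style is just another filter over the event stream, for example `spawned_by(read_json_events(path), "sshd")` to look at processes started directly under an SSH daemon.
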

1260
00:45:01,040 --> 00:45:05,119
uh so all we do is we just collect that

1261
00:45:03,520 --> 00:45:08,160
from our endpoint

1262
00:45:05,119 --> 00:45:10,079
uh and you can see that it's basically

1263
00:45:08,160 --> 00:45:12,079
uh it's it's giving us this structured

1264
00:45:10,079 --> 00:45:14,160
information about

1265
00:45:12,079 --> 00:45:16,240
you know process execution like here's a

1266
00:45:14,160 --> 00:45:18,640
ps command line that ran

1267
00:45:16,240 --> 00:45:20,240
uh you know where it ran from and so on

1268
00:45:18,640 --> 00:45:22,720
a lot of these fields kind of

1269
00:45:20,240 --> 00:45:24,960
only make sense on windows and maybe

1270
00:45:22,720 --> 00:45:26,800
there isn't really an equivalent

1271
00:45:24,960 --> 00:45:28,319
you know data source for it on

1272
00:45:26,800 --> 00:45:30,319
linux but

1273
00:45:28,319 --> 00:45:32,000
um but you can use them to just like

1274
00:45:30,319 --> 00:45:33,920
filter and say oh you know when this

1275
00:45:32,000 --> 00:45:35,760
process ran what was the parent process

1276
00:45:33,920 --> 00:45:38,319
what did it do and then you can write

1277
00:45:35,760 --> 00:45:40,079
detections based on that so again

1278
00:45:38,319 --> 00:45:42,000
thanks so much mike we've run up against time

1279
00:45:40,079 --> 00:45:45,520
and we need to keep on schedule no

1280
00:45:42,000 --> 00:45:49,440
worries uh well so just last last slide

1281
00:45:45,520 --> 00:45:51,599
um just references check out the uh the

1282
00:45:49,440 --> 00:45:53,839
the website the github uh and the

1283
00:45:51,599 --> 00:45:54,720
discord and um thank you very much for

1284
00:45:53,839 --> 00:45:56,079
your time

1285
00:45:54,720 --> 00:45:58,400
thank you very much um i hope that you

1286
00:45:56,079 --> 00:46:00,560
can drop some of those links and answer

1287
00:45:58,400 --> 00:46:01,680
the questions in the text chat in

1288
00:46:00,560 --> 00:46:06,640
venueless

1289
00:46:01,680 --> 00:46:07,440
uh up next at uh 2:25 pm aedt

1290
00:46:06,640 --> 00:46:10,240
is j

1291
00:46:07,440 --> 00:46:11,680
rosenbaum with roll for initiative

1292
00:46:10,240 --> 00:46:13,119
how to make the world of ai a more

1293
00:46:11,680 --> 00:46:14,400
ethical place

1294
00:46:13,119 --> 00:46:17,400
thank you very much mike

1295
00:46:14,400 --> 00:46:17,400
thanks

