1
00:00:06,320 --> 00:00:11,499
[Music]

2
00:00:15,360 --> 00:00:20,480
i'd like to introduce claire

3
00:00:17,520 --> 00:00:22,160
um claire is an urban planner

4
00:00:20,480 --> 00:00:24,480
and programmer and is currently

5
00:00:22,160 --> 00:00:26,000
undertaking a phd at the intersection of

6
00:00:24,480 --> 00:00:28,240
these two fields

7
00:00:26,000 --> 00:00:30,320
they find themselves writing scripts

8
00:00:28,240 --> 00:00:32,399
frequently during their research to help

9
00:00:30,320 --> 00:00:35,120
them record and make sense of the large

10
00:00:32,399 --> 00:00:38,000
amounts of qualitative data necessary to

11
00:00:35,120 --> 00:00:39,680
answer questions about the adoption of

12
00:00:38,000 --> 00:00:41,200
digital technology and the future of

13
00:00:39,680 --> 00:00:43,280
urban planning work

14
00:00:41,200 --> 00:00:45,440
outside of their phd claire works with

15
00:00:43,280 --> 00:00:47,760
others in the

16
00:00:45,440 --> 00:00:49,680
urban planning profession to advocate

17
00:00:47,760 --> 00:00:51,920
for the use of open technology and

18
00:00:49,680 --> 00:00:54,640
standards to ensure good governance

19
00:00:51,920 --> 00:00:57,280
in our cities and regions so over to

20
00:00:54,640 --> 00:00:59,440
you claire

21
00:00:57,280 --> 00:01:03,440
hi thanks for that

22
00:00:59,440 --> 00:01:05,119
great looks like it's all working

23
00:01:03,440 --> 00:01:06,560
um

24
00:01:05,119 --> 00:01:08,560
let's get started

25
00:01:06,560 --> 00:01:11,119
so before we begin i would like to

26
00:01:08,560 --> 00:01:12,880
acknowledge the wonga people who are the

27
00:01:11,119 --> 00:01:14,400
traditional owners of the land from

28
00:01:12,880 --> 00:01:16,400
which i am speaking

29
00:01:14,400 --> 00:01:19,360
to you today and acknowledge their

30
00:01:16,400 --> 00:01:21,840
elders past present and emerging and to

31
00:01:19,360 --> 00:01:24,080
also acknowledge that sovereignty was

32
00:01:21,840 --> 00:01:26,640
never ceded

33
00:01:24,080 --> 00:01:28,720
so hi my name is claire daniel and i'm

34
00:01:26,640 --> 00:01:31,200
an urban planner and data science type

35
00:01:28,720 --> 00:01:34,560
person currently undertaking a phd at

36
00:01:31,200 --> 00:01:36,960
unsw and now as part of this phd i

37
00:01:34,560 --> 00:01:39,600
conducted a citation network analysis of

38
00:01:36,960 --> 00:01:41,759
the planning support systems literature

39
00:01:39,600 --> 00:01:43,040
now as fascinating as planning support

40
00:01:41,759 --> 00:01:46,320
systems are

41
00:01:43,040 --> 00:01:48,560
and as interesting as this niche area of

42
00:01:46,320 --> 00:01:50,799
academic endeavor is

43
00:01:48,560 --> 00:01:52,799
i've made the call that it is probably not

44
00:01:50,799 --> 00:01:55,600
relevant to most of you watching today

45
00:01:52,799 --> 00:01:58,159
however the process of undertaking a

46
00:01:55,600 --> 00:02:00,560
citation network analysis may well be

47
00:01:58,159 --> 00:02:02,159
quite relevant for people within the

48
00:02:00,560 --> 00:02:04,399
galleries libraries and museums

49
00:02:02,159 --> 00:02:07,680
community so it is that which i thought

50
00:02:04,399 --> 00:02:10,239
i'd talk to you about today

51
00:02:07,680 --> 00:02:13,120
so right from the start what is citation

52
00:02:10,239 --> 00:02:15,120
network analysis well it is a type of

53
00:02:13,120 --> 00:02:17,200
systematic literature review which is

54
00:02:15,120 --> 00:02:19,920
essentially conceptually quite simple

55
00:02:17,200 --> 00:02:22,959
but can be computationally taxing

56
00:02:19,920 --> 00:02:24,879
and um as we know in academic literature

57
00:02:22,959 --> 00:02:26,640
there are very strict conventions

58
00:02:24,879 --> 00:02:28,959
saying if there's any information or

59
00:02:26,640 --> 00:02:31,280
ideas that have been published elsewhere

60
00:02:28,959 --> 00:02:33,440
that those are referenced within

61
00:02:31,280 --> 00:02:37,440
scholarly publications and there is a

62
00:02:33,440 --> 00:02:41,680
huge body of standards and strict

63
00:02:37,440 --> 00:02:43,680
syntax governing the process of doing so

64
00:02:41,680 --> 00:02:46,080
um and a citation network analysis

65
00:02:43,680 --> 00:02:48,319
simply assumes that if one document has

66
00:02:46,080 --> 00:02:51,040
cited another document that these two

67
00:02:48,319 --> 00:02:53,200
documents are somehow related and what

68
00:02:51,040 --> 00:02:55,840
you can then do is you can map out all

69
00:02:53,200 --> 00:02:57,200
of these citation relationships using a

70
00:02:55,840 --> 00:02:59,599
formal mathematical network

71
00:02:57,200 --> 00:03:02,159
representation or mathematical graph

72
00:02:59,599 --> 00:03:04,560
upon which you can then do

73
00:03:02,159 --> 00:03:06,000
various quantitative statistics which

74
00:03:04,560 --> 00:03:09,280
will give you insights into the

75
00:03:06,000 --> 00:03:11,760
structure of your research field

76
00:03:09,280 --> 00:03:14,720
so next up in our program how to do

77
00:03:11,760 --> 00:03:17,760
citation network analysis well according

78
00:03:14,720 --> 00:03:19,360
to zhao and strotmann 2015 there are five

79
00:03:17,760 --> 00:03:21,040
steps which i will be going through in

80
00:03:19,360 --> 00:03:23,840
more detail

81
00:03:21,040 --> 00:03:27,040
step one delineation of your research

82
00:03:23,840 --> 00:03:29,599
field now this is difficult you do have

83
00:03:27,040 --> 00:03:31,440
to put a boundary around the research

84
00:03:29,599 --> 00:03:32,480
papers that you want to include you

85
00:03:31,440 --> 00:03:35,840
cannot

86
00:03:32,480 --> 00:03:37,840
feasibly do an analysis of everything so

87
00:03:35,840 --> 00:03:39,920
like most forms of systematic review

88
00:03:37,840 --> 00:03:43,440
this delineation is usually done on a

89
00:03:39,920 --> 00:03:44,480
keyword search in a citation database
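
A minimal sketch of the delineation idea, assuming the records have already been exported from a citation database as Python dicts with hypothetical `title` and `abstract` fields. A real database keyword search runs server-side; this local filter only illustrates drawing a keyword boundary around the corpus.

```python
def delineate(records, keywords):
    """Keep records whose title or abstract contains any keyword.

    Case-insensitive substring match, so "planning support system"
    also matches the plural "planning support systems".
    """
    wanted = [k.lower() for k in keywords]
    corpus = []
    for rec in records:
        text = f"{rec.get('title', '')} {rec.get('abstract', '')}".lower()
        if any(k in text for k in wanted):
            corpus.append(rec)
    return corpus


# Hypothetical records, not real papers.
records = [
    {"id": "p1", "title": "Planning support systems in practice", "abstract": ""},
    {"id": "p2", "title": "Urban heat", "abstract": "A planning support system for cooling."},
    {"id": "p3", "title": "Transit ridership models", "abstract": "Regression on ridership."},
]
corpus = delineate(records, ["planning support system"])
print([rec["id"] for rec in corpus])  # ['p1', 'p2']
```

Substring matching is deliberately loose; a stricter boundary (whole words, Boolean operators) would normally be expressed in the database's own query syntax.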

90
00:03:43,440 --> 00:03:46,400
um

91
00:03:44,480 --> 00:03:48,239
and so what are your options for

92
00:03:46,400 --> 00:03:50,799
databases

93
00:03:48,239 --> 00:03:53,200
well traditionally there have been only

94
00:03:50,799 --> 00:03:56,080
two databases with decent coverage of

95
00:03:53,200 --> 00:03:58,799
published papers that is scopus and

96
00:03:56,080 --> 00:04:00,400
web of science of course google scholar

97
00:03:58,799 --> 00:04:02,239
has for

98
00:04:00,400 --> 00:04:04,000
a long period of time now provided a

99
00:04:02,239 --> 00:04:06,319
free means of searching the academic

100
00:04:04,000 --> 00:04:07,519
literature but currently provides no

101
00:04:06,319 --> 00:04:09,920
easy way

102
00:04:07,519 --> 00:04:11,920
to download data from that system in

103
00:04:09,920 --> 00:04:13,680
bulk and provides no api to its

104
00:04:11,920 --> 00:04:15,439
databases

105
00:04:13,680 --> 00:04:17,440
so traditionally web of science and

106
00:04:15,439 --> 00:04:20,320
scopus have been the only way to do

107
00:04:17,440 --> 00:04:22,079
broad scale citation analysis

108
00:04:20,320 --> 00:04:24,880
now my university subscribes to

109
00:04:22,079 --> 00:04:26,720
scopus and for my analysis i used the

110
00:04:24,880 --> 00:04:28,400
scopus api

111
00:04:26,720 --> 00:04:30,320
um but the fact that i was advised to

112
00:04:28,400 --> 00:04:31,440
use the proprietary api for this

113
00:04:30,320 --> 00:04:34,160
analysis

114
00:04:31,440 --> 00:04:35,759
kind of seemed like another artifact of

115
00:04:34,160 --> 00:04:38,560
the structural problems that we have in

116
00:04:35,759 --> 00:04:41,440
academia with unjust academic paywalls

117
00:04:38,560 --> 00:04:43,840
and citation data is really important so

118
00:04:41,440 --> 00:04:47,120
the amount of academic literature is

119
00:04:43,840 --> 00:04:50,080
growing exponentially and without access

120
00:04:47,120 --> 00:04:51,440
to the citation data we kind of create

121
00:04:50,080 --> 00:04:53,680
the risk of

122
00:04:51,440 --> 00:04:55,520
duplication of effort creating even more

123
00:04:53,680 --> 00:04:58,240
research silos

124
00:04:55,520 --> 00:05:00,479
and even more importantly

125
00:04:58,240 --> 00:05:03,039
we need to avoid

126
00:05:00,479 --> 00:05:03,840
preventing access to academic research

127
00:05:03,039 --> 00:05:06,000
that

128
00:05:03,840 --> 00:05:08,240
is often publicly funded

129
00:05:06,000 --> 00:05:10,160
and making it harder for people who

130
00:05:08,240 --> 00:05:12,400
need to actually use that research to

131
00:05:10,160 --> 00:05:13,360
access it

132
00:05:12,400 --> 00:05:15,280
so

133
00:05:13,360 --> 00:05:17,440
in preparation for this talk i have

134
00:05:15,280 --> 00:05:19,919
actually done a little bit of reading

135
00:05:17,440 --> 00:05:23,039
about some of the new initiatives that

136
00:05:19,919 --> 00:05:25,039
are changing this status quo um and to

137
00:05:23,039 --> 00:05:26,479
qualify this information i'm not

138
00:05:25,039 --> 00:05:30,320
personally involved in any of the

139
00:05:26,479 --> 00:05:31,600
projects that i am about to mention

140
00:05:30,320 --> 00:05:33,440
so

141
00:05:31,600 --> 00:05:36,320
it seems that the initiative for open

142
00:05:33,440 --> 00:05:38,800
citations has been a big influence in

143
00:05:36,320 --> 00:05:39,919
recent years to make citation data more

144
00:05:38,800 --> 00:05:42,160
accessible

145
00:05:39,919 --> 00:05:44,800
crossref of course has existed since

146
00:05:42,160 --> 00:05:48,240
2000 and crossref is a registration

147
00:05:44,800 --> 00:05:50,639
agency for digital object identifiers

148
00:05:48,240 --> 00:05:53,440
for scholarly work and it maintains an

149
00:05:50,639 --> 00:05:54,960
open infrastructure an open database

150
00:05:53,440 --> 00:05:57,759
to which

151
00:05:54,960 --> 00:05:59,840
various publishers submit

152
00:05:57,759 --> 00:06:02,720
details of their publications and it

153
00:05:59,840 --> 00:06:06,800
maintains an open api

154
00:06:02,720 --> 00:06:09,039
um however in 2017 just one percent of

155
00:06:06,800 --> 00:06:11,919
the eligible papers

156
00:06:09,039 --> 00:06:14,960
that were listed in crossref

157
00:06:11,919 --> 00:06:17,120
uh contained open citation data and this

158
00:06:14,960 --> 00:06:20,080
is where the initiative for

159
00:06:17,120 --> 00:06:22,000
open citations comes in so this

160
00:06:20,080 --> 00:06:24,400
initiative was supporting publishers to

161
00:06:22,000 --> 00:06:26,479
open this citation data to

162
00:06:24,400 --> 00:06:28,880
allow it to be openly available in

163
00:06:26,479 --> 00:06:30,080
crossref and there's been massive

164
00:06:28,880 --> 00:06:32,880
success

165
00:06:30,080 --> 00:06:34,080
because as of october 2021 the

166
00:06:32,880 --> 00:06:36,240
percentage of

167
00:06:34,080 --> 00:06:39,520
relevant articles that now provide this

168
00:06:36,240 --> 00:06:42,000
data openly has gone up to a whopping

169
00:06:39,520 --> 00:06:42,000
88 percent

170
00:06:42,400 --> 00:06:46,639
separate to this um there was an

171
00:06:44,720 --> 00:06:48,880
initiative by microsoft called microsoft

172
00:06:46,639 --> 00:06:50,880
academic and this was doing something

173
00:06:48,880 --> 00:06:52,960
similar to the way that google

174
00:06:50,880 --> 00:06:55,199
automatically compiles its records in

175
00:06:52,960 --> 00:06:57,440
google scholar but unlike google

176
00:06:55,199 --> 00:06:59,599
microsoft academic made

177
00:06:57,440 --> 00:07:02,560
its data available

178
00:06:59,599 --> 00:07:04,800
under a license like an open attribution

179
00:07:02,560 --> 00:07:06,960
type of license

180
00:07:04,800 --> 00:07:08,960
um and so from these initiatives you

181
00:07:06,960 --> 00:07:10,720
start to see a number of open projects

182
00:07:08,960 --> 00:07:13,520
for searching the literature kind of

183
00:07:10,720 --> 00:07:15,360
starting to bloom um that rely on one or

184
00:07:13,520 --> 00:07:17,599
the other of these two databases or a

185
00:07:15,360 --> 00:07:19,840
combination of both and this is quite

186
00:07:17,599 --> 00:07:22,400
good because the search functions built

187
00:07:19,840 --> 00:07:24,960
into the crossref infrastructure

188
00:07:22,400 --> 00:07:27,039
are very rudimentary essentially it's

189
00:07:24,960 --> 00:07:29,919
designed to return the details of an

190
00:07:27,039 --> 00:07:33,120
individual paper if you feed it uh

191
00:07:29,919 --> 00:07:36,080
individual identification information um

192
00:07:33,120 --> 00:07:37,840
like the doi or the

193
00:07:36,080 --> 00:07:39,199
title and other details and so on and so

194
00:07:37,840 --> 00:07:41,520
forth

195
00:07:39,199 --> 00:07:43,440
um however

196
00:07:41,520 --> 00:07:45,520
microsoft academic has actually been

197
00:07:43,440 --> 00:07:47,599
retired as of about two weeks ago

198
00:07:45,520 --> 00:07:49,440
december 2021

199
00:07:47,599 --> 00:07:52,319
and i'm not quite sure what

200
00:07:49,440 --> 00:07:54,879
downstream impact this might have on

201
00:07:52,319 --> 00:07:56,639
these projects

202
00:07:54,879 --> 00:07:58,560
luckily however when we look at the

203
00:07:56,639 --> 00:08:01,360
overall coverage of each of these

204
00:07:58,560 --> 00:08:04,400
foundation databases there's been some

205
00:08:01,360 --> 00:08:07,520
analysis done by martín-martín et al

206
00:08:04,400 --> 00:08:10,479
and crossref has really shot up so

207
00:08:07,520 --> 00:08:12,639
in mid 2021 they finally convinced

208
00:08:10,479 --> 00:08:14,879
elsevier who i hear is one of the

209
00:08:12,639 --> 00:08:16,720
largest academic publishers in the world

210
00:08:14,879 --> 00:08:19,280
finally convinced them to make their

211
00:08:16,720 --> 00:08:20,560
citation data openly available in

212
00:08:19,280 --> 00:08:22,720
crossref

213
00:08:20,560 --> 00:08:24,639
um and according to the study it is

214
00:08:22,720 --> 00:08:26,319
looking like the

215
00:08:24,639 --> 00:08:28,960
citation databases that have been built

216
00:08:26,319 --> 00:08:31,840
from that data are now rivaling

217
00:08:28,960 --> 00:08:34,000
the proprietary databases like

218
00:08:31,840 --> 00:08:35,360
scopus dimensions and web of science

219
00:08:34,000 --> 00:08:38,000
in terms of

220
00:08:35,360 --> 00:08:40,399
the scope of their coverage so that is

221
00:08:38,000 --> 00:08:41,360
good news

222
00:08:40,399 --> 00:08:44,080
right

223
00:08:41,360 --> 00:08:46,399
so once we have chosen our database

224
00:08:44,080 --> 00:08:48,399
there will be various python packages

225
00:08:46,399 --> 00:08:51,680
that will assist with extracting data

226
00:08:48,399 --> 00:08:54,399
from some of the more well-known apis

227
00:08:51,680 --> 00:08:57,040
and most of these apis will return to

228
00:08:54,399 --> 00:08:59,040
you a long list of metadata about

229
00:08:57,040 --> 00:09:01,360
every individual paper the title the

230
00:08:59,040 --> 00:09:03,600
abstract details journal details

231
00:09:01,360 --> 00:09:07,279
institutional details and of course a

232
00:09:03,600 --> 00:09:09,440
list of digital ids for their reference

233
00:09:07,279 --> 00:09:10,880
list
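
As one open option, the Crossref REST API serves a JSON record per DOI at `https://api.crossref.org/works/{DOI}`, and deposited references appear under a `reference` list whose entries carry a `DOI` key when Crossref has matched them. The sketch below assumes that response shape; the helper names are hypothetical, and only the parsing function is exercised offline. For bulk harvesting, Crossref asks clients to identify themselves (for example with a contact email), which is left out here.

```python
import json
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi):
    """Fetch one work's metadata from the Crossref REST API (needs network access)."""
    with urllib.request.urlopen(CROSSREF_WORKS + doi, timeout=30) as resp:
        return json.load(resp)["message"]


def citation_pairs(work):
    """Return (citing DOI, cited DOI) pairs from one Crossref work record.

    References without a matched "DOI" key (e.g. free-text ones) are skipped.
    DOIs are lower-cased because DOI matching is case-insensitive.
    """
    citing = work["DOI"].lower()
    return [(citing, ref["DOI"].lower())
            for ref in work.get("reference", [])
            if ref.get("DOI")]


# Offline example: a hand-made record in the assumed response shape.
work = {
    "DOI": "10.1000/EXAMPLE.A",
    "reference": [
        {"key": "ref1", "DOI": "10.1000/example.b"},
        {"key": "ref2", "unstructured": "A book chapter with no DOI"},
    ],
}
print(citation_pairs(work))  # [('10.1000/example.a', '10.1000/example.b')]
```

Collecting these pairs across the whole corpus gives the edge list that the network-construction step starts from.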

234
00:09:09,440 --> 00:09:14,000
on to step two

235
00:09:10,880 --> 00:09:16,720
construction of the network

236
00:09:14,000 --> 00:09:18,560
so first to construct a raw citation

237
00:09:16,720 --> 00:09:21,680
network for

238
00:09:18,560 --> 00:09:24,399
our purposes we need to construct a

239
00:09:21,680 --> 00:09:27,440
raw adjacency matrix so this is a large

240
00:09:24,399 --> 00:09:29,920
and sparse matrix of ones and zeros

241
00:09:27,440 --> 00:09:32,399
where one axis represents the citing

242
00:09:29,920 --> 00:09:34,959
papers and the other axis represents

243
00:09:32,399 --> 00:09:38,480
the cited papers and so you can see on

244
00:09:34,959 --> 00:09:40,640
the slide if paper a cites paper b

245
00:09:38,480 --> 00:09:44,720
you then put a one in the corresponding

246
00:09:40,640 --> 00:09:46,880
cell it is as simple as that
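
A minimal NumPy sketch of that rule, using hypothetical papers A to D. It assumes rows are citing papers and columns are cited papers, one of the two orientations the talk describes. For thousands of papers a sparse matrix type (for example from `scipy.sparse`) would be the practical choice instead of a dense array.

```python
import numpy as np

# Hypothetical citation pairs: (citing paper, cited paper).
pairs = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("C", "B")]

# Give every paper a fixed row/column index.
papers = sorted({p for pair in pairs for p in pair})
index = {paper: i for i, paper in enumerate(papers)}

# Rows = citing papers, columns = cited papers; a 1 marks "row cites column".
A = np.zeros((len(papers), len(papers)), dtype=int)
for citing, cited in pairs:
    A[index[citing], index[cited]] = 1

print(papers)  # ['A', 'B', 'C', 'D']
print(A)
```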

247
00:09:44,720 --> 00:09:48,800
and so for those of you who are working

248
00:09:46,880 --> 00:09:50,640
with people perhaps who aren't python

249
00:09:48,800 --> 00:09:52,880
users or aren't programmers

250
00:09:50,640 --> 00:09:54,880
the good news is there is plenty of free

251
00:09:52,880 --> 00:09:56,399
and open source software that has

252
00:09:54,880 --> 00:09:58,880
graphical user interfaces that will

253
00:09:56,399 --> 00:10:01,200
construct these networks automatically

254
00:09:58,880 --> 00:10:02,800
from data that is downloaded manually

255
00:10:01,200 --> 00:10:06,079
from the websites of these various

256
00:10:02,800 --> 00:10:08,720
citation databases and some of the

257
00:10:06,079 --> 00:10:10,399
more useful ones i have put up on the

258
00:10:08,720 --> 00:10:12,160
screen

259
00:10:10,399 --> 00:10:14,399
for those that are python users there

260
00:10:12,160 --> 00:10:16,880
may be packages that do a lot of this

261
00:10:14,399 --> 00:10:19,040
but for my part i constructed a list of

262
00:10:16,880 --> 00:10:21,120
the citation pairs

263
00:10:19,040 --> 00:10:23,600
and then i gave this to the networkx

264
00:10:21,120 --> 00:10:27,279
package to construct a graph and then

265
00:10:23,600 --> 00:10:29,600
generate the raw adjacency matrix for me

266
00:10:27,279 --> 00:10:32,320
which was a fairly straightforward

267
00:10:29,600 --> 00:10:33,360
process
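
A sketch of that pairs-to-graph-to-matrix route with NetworkX, assuming the citation pairs are already in a Python list (the pairs here are hypothetical). `nx.to_numpy_array` returns a dense array; `nx.to_scipy_sparse_array` is the sparse alternative for large corpora.

```python
import networkx as nx

# Hypothetical citation pairs: (citing paper, cited paper).
pairs = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("C", "B")]

G = nx.DiGraph()
G.add_edges_from(pairs)  # each edge points from the citing to the cited paper

nodes = sorted(G.nodes())
# Row i, column j is 1 when nodes[i] cites nodes[j].
A = nx.to_numpy_array(G, nodelist=nodes, dtype=int)

print(nodes)
print(A)
print(dict(G.in_degree()))  # times each paper is cited within the corpus
```

Passing an explicit `nodelist` keeps the row and column order stable, which matters when the matrix is later matched back to paper metadata.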

268
00:10:32,320 --> 00:10:35,920
so

269
00:10:33,360 --> 00:10:38,720
we have our adjacency matrix we've got

270
00:10:35,920 --> 00:10:40,839
our raw network set up and the second

271
00:10:38,720 --> 00:10:43,200
half of step two comes with a major

272
00:10:40,839 --> 00:10:45,040
methodological consideration

273
00:10:43,200 --> 00:10:47,519
which is which

274
00:10:45,040 --> 00:10:50,000
connectedness measure to use

275
00:10:47,519 --> 00:10:52,320
so essentially the raw citation network

276
00:10:50,000 --> 00:10:55,120
is very sparse uh it contains a lot of

277
00:10:52,320 --> 00:10:57,200
zeros it contains a lot of blank space

278
00:10:55,120 --> 00:11:00,320
um and therefore it's very hard to

279
00:10:57,200 --> 00:11:02,240
calculate any meaningful statistics with

280
00:11:00,320 --> 00:11:03,360
this

281
00:11:02,240 --> 00:11:05,519
matrix

282
00:11:03,360 --> 00:11:07,600
so instead what we do is we calculate

283
00:11:05,519 --> 00:11:09,920
connectedness measures to solidify this

284
00:11:07,600 --> 00:11:12,800
matrix and we use the raw adjacency

285
00:11:09,920 --> 00:11:15,440
matrix to calculate either a co-citation

286
00:11:12,800 --> 00:11:18,320
matrix or a bibliographic coupling

287
00:11:15,440 --> 00:11:19,680
matrix these things are both useful

288
00:11:18,320 --> 00:11:20,880
but they measure slightly different

289
00:11:19,680 --> 00:11:22,320
things so it's important to know the

290
00:11:20,880 --> 00:11:24,800
difference between these two things when

291
00:11:22,320 --> 00:11:27,360
you interpret your results

292
00:11:24,800 --> 00:11:30,720
so to try to explain a co-citation

293
00:11:27,360 --> 00:11:34,160
matrix represents how many times paper a

294
00:11:30,720 --> 00:11:37,360
and paper b appear in the same reference

295
00:11:34,160 --> 00:11:40,480
list of another article

296
00:11:37,360 --> 00:11:43,040
um and a bibliographic coupling matrix

297
00:11:40,480 --> 00:11:45,839
on the other hand measures how many

298
00:11:43,040 --> 00:11:47,040
references paper a and paper b have in

299
00:11:45,839 --> 00:11:48,160
common
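
In matrix terms, writing $A$ for the raw adjacency matrix with $A_{ki} = 1$ when paper $k$ cites paper $i$, both measures follow directly from the definitions above:

```latex
\begin{aligned}
C &= A^{\top} A, & C_{ij} &= \sum_{k} A_{ki}\, A_{kj} \quad \text{(papers citing both } i \text{ and } j\text{)} \\
B &= A A^{\top}, & B_{ij} &= \sum_{k} A_{ik}\, A_{jk} \quad \text{(references shared by } i \text{ and } j\text{)}
\end{aligned}
```

The diagonals carry information too: $C_{ii}$ is the number of times paper $i$ is cited, and $B_{ii}$ is the length of its reference list.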

300
00:11:47,040 --> 00:11:50,880
and so

301
00:11:48,160 --> 00:11:53,600
as you can imagine the co-citation matrix is

302
00:11:50,880 --> 00:11:56,079
most useful in the identification of

303
00:11:53,600 --> 00:11:58,320
groups of influential papers important

304
00:11:56,079 --> 00:12:00,639
in defining the past direction and

305
00:11:58,320 --> 00:12:02,880
structure of your research field while

306
00:12:00,639 --> 00:12:04,000
the bibliographic coupling matrix is a

307
00:12:02,880 --> 00:12:06,079
little more useful for the

308
00:12:04,000 --> 00:12:08,800
identification of clusters of papers that

309
00:12:06,079 --> 00:12:10,800
draw on similar ideas and it's a

310
00:12:08,800 --> 00:12:13,120
slightly more useful measure for

311
00:12:10,800 --> 00:12:16,399
classifying recent papers that have yet

312
00:12:13,120 --> 00:12:16,399
to be cited by others

313
00:12:16,880 --> 00:12:21,360
so in python all you need to do here is

314
00:12:19,279 --> 00:12:24,560
transpose the raw adjacency matrix and

315
00:12:21,360 --> 00:12:27,040
multiply it by the original adjacency

316
00:12:24,560 --> 00:12:30,480
matrix and to further solidify it i've applied

317
00:12:27,040 --> 00:12:32,639
a cosine similarity function
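
A NumPy sketch of that step on a hypothetical four-paper matrix. The exact cosine function used in the talk isn't specified, so this assumes Salton's cosine on the co-citation counts, $C_{ij}/\sqrt{C_{ii}C_{jj}}$, which equals the cosine similarity between columns of a binary adjacency matrix. With rows as citing papers, transpose-times-original gives co-citation; the bibliographic coupling product is included for comparison.

```python
import numpy as np

# Hypothetical raw adjacency matrix for papers A to D: rows cite columns.
A = np.array([
    [0, 1, 1, 0],  # A cites B and C
    [0, 0, 0, 0],  # B cites nothing in the corpus
    [0, 1, 0, 0],  # C cites B
    [0, 1, 1, 0],  # D cites B and C
])

cocitation = A.T @ A  # [i, j]: number of papers citing both i and j
coupling = A @ A.T    # [i, j]: number of references i and j share


def cosine_normalise(M):
    """Salton's cosine: M[i, j] / sqrt(M[i, i] * M[j, j]); 0 where a diagonal is 0."""
    norms = np.sqrt(np.diag(M).astype(float))
    denom = np.outer(norms, norms)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, M / denom, 0.0)


similarity = cosine_normalise(cocitation)
print(cocitation)
print(np.round(similarity, 3))
```

Normalising this way stops heavily cited papers from dominating purely because of their citation counts.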

318
00:12:30,480 --> 00:12:34,639
on to step three which is multivariate

319
00:12:32,639 --> 00:12:36,639
statistical analysis

320
00:12:34,639 --> 00:12:39,279
so the standard method of citation

321
00:12:36,639 --> 00:12:41,839
analysis is then to perform multivariate

322
00:12:39,279 --> 00:12:44,800
statistical analysis or factor analysis

323
00:12:41,839 --> 00:12:46,800
on our matrix this is a linear

324
00:12:44,800 --> 00:12:49,839
statistical method and it's a great

325
00:12:46,800 --> 00:12:53,120
way to identify a smaller number of

326
00:12:49,839 --> 00:12:54,959
factors underlying

327
00:12:53,120 --> 00:12:58,160
the relationships between a

328
00:12:54,959 --> 00:12:59,920
large number of variables and

329
00:12:58,160 --> 00:13:00,959
in our analysis we have a lot of

330
00:12:59,920 --> 00:13:02,720
variables

331
00:13:00,959 --> 00:13:04,959
because every

332
00:13:02,720 --> 00:13:07,360
individual paper in the group of papers

333
00:13:04,959 --> 00:13:10,959
that we are studying our research corpus

334
00:13:07,360 --> 00:13:12,800
is an individual variable so

335
00:13:10,959 --> 00:13:14,560
what we really need to do is we need to

336
00:13:12,800 --> 00:13:17,279
group these papers together we need to

337
00:13:14,560 --> 00:13:18,720
identify a few underlying research

338
00:13:17,279 --> 00:13:21,200
themes

339
00:13:18,720 --> 00:13:23,839
and these underlying

340
00:13:21,200 --> 00:13:25,680
factors or

341
00:13:23,839 --> 00:13:27,440
research themes might represent

342
00:13:25,680 --> 00:13:30,000
different topics or different lines of

343
00:13:27,440 --> 00:13:32,079
inquiry within your research field

344
00:13:30,000 --> 00:13:36,079
um and this type of statistical analysis

345
00:13:32,079 --> 00:13:38,480
has proved to be quite a robust way of

346
00:13:36,079 --> 00:13:41,040
doing this kind of characterization of

347
00:13:38,480 --> 00:13:43,120
your research field

348
00:13:41,040 --> 00:13:45,760
so principal component analysis is the

349
00:13:43,120 --> 00:13:47,760
most common type of factor analysis used

350
00:13:45,760 --> 00:13:49,920
and there are various statistical

351
00:13:47,760 --> 00:13:52,959
packages in python that will

352
00:13:49,920 --> 00:13:54,800
help you do this
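
A bare-bones NumPy sketch of principal component extraction on a similarity matrix such as the cosine-normalised co-citation matrix, treating it like a correlation matrix among papers. Loadings are eigenvectors scaled by the square root of their eigenvalues. A statistics package would add factor rotation (varimax is commonly used in co-citation studies) and diagnostics, which are left out here; the block-structured matrix is a made-up example.

```python
import numpy as np


def pca_loadings(S, n_factors):
    """Principal-component loadings of a symmetric similarity matrix S.

    Columns are components ordered by eigenvalue (largest first); each
    column's sign is flipped so its largest-magnitude loading is positive.
    """
    eigvals, eigvecs = np.linalg.eigh(S)           # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
    vals = np.clip(eigvals[order], 0.0, None)      # guard tiny negative round-off
    loadings = eigvecs[:, order] * np.sqrt(vals)
    peak_rows = np.abs(loadings).argmax(axis=0)
    signs = np.sign(loadings[peak_rows, np.arange(loadings.shape[1])])
    signs[signs == 0] = 1.0
    return loadings * signs


# Hypothetical similarity matrix: papers 0 and 1, and papers 2 and 3, form two groups.
S = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])
print(np.round(pca_loadings(S, n_factors=2), 3))
```

Each column is one candidate research theme, and each row gives one paper's loading on each theme.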

353
00:13:52,959 --> 00:13:57,199
so step four network analysis and

354
00:13:54,800 --> 00:13:59,199
visualization so in addition to the

355
00:13:57,199 --> 00:14:01,199
factor analysis there are various other

356
00:13:59,199 --> 00:14:03,760
quantitative measures you can use to

357
00:14:01,199 --> 00:14:05,279
analyze your network um you have your

358
00:14:03,760 --> 00:14:07,839
algorithms that will do your network

359
00:14:05,279 --> 00:14:09,839
partitioning although that's not as good

360
00:14:07,839 --> 00:14:12,160
as your factor analysis because network

361
00:14:09,839 --> 00:14:14,800
partitioning algorithms will force you

362
00:14:12,160 --> 00:14:16,320
to classify each paper in one theme or the

363
00:14:14,800 --> 00:14:18,480
other when in reality they could be

364
00:14:16,320 --> 00:14:19,680
related to more than one different

365
00:14:18,480 --> 00:14:20,880
theme

366
00:14:19,680 --> 00:14:21,920
um

367
00:14:20,880 --> 00:14:23,519
but

368
00:14:21,920 --> 00:14:25,120
the other useful thing that you can do

369
00:14:23,519 --> 00:14:27,519
with your network is to calculate

370
00:14:25,120 --> 00:14:29,360
various measures of centrality so you

371
00:14:27,519 --> 00:14:31,600
have your things like your degree

372
00:14:29,360 --> 00:14:33,760
centrality which is the number of times

373
00:14:31,600 --> 00:14:35,839
a paper is cited by others in that

374
00:14:33,760 --> 00:14:37,760
network um all the way through to

375
00:14:35,839 --> 00:14:40,000
something like betweenness centrality

376
00:14:37,760 --> 00:14:42,880
which means that your paper kind of

377
00:14:40,000 --> 00:14:45,680
lies on many of the shortest

378
00:14:42,880 --> 00:14:48,079
paths that would be

379
00:14:45,680 --> 00:14:49,600
used if you were to run an algorithm

380
00:14:48,079 --> 00:14:51,680
going from one

381
00:14:49,600 --> 00:14:53,760
side of your network to the other for

382
00:14:51,680 --> 00:14:55,600
instance and a high

383
00:14:53,760 --> 00:14:57,120
betweenness centrality could indicate that

384
00:14:55,600 --> 00:15:00,240
that paper has some kind of boundary

385
00:14:57,120 --> 00:15:01,760
spanning properties and is

386
00:15:00,240 --> 00:15:06,079
bridging different research

387
00:15:01,760 --> 00:15:06,079
ideas or different research silos
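
A small NetworkX sketch of both centrality measures on a hypothetical network of two clusters joined by one paper, `X`. Degree centrality is taken here as in-degree (times cited within the corpus). Betweenness is computed on the undirected version of the graph, an assumption made so that paths can run from one side of the network to the other regardless of citation direction.

```python
import networkx as nx

# Hypothetical citations (citing, cited): two clusters, bridged by paper "X".
pairs = [
    ("a1", "a2"), ("a3", "a2"), ("a3", "a1"),
    ("b1", "b2"), ("b3", "b2"), ("b3", "b1"),
    ("X", "a2"), ("X", "b2"),
]
G = nx.DiGraph(pairs)

times_cited = dict(G.in_degree())                           # in-degree (raw counts)
betweenness = nx.betweenness_centrality(G.to_undirected())  # normalised by default

broker = max(betweenness, key=betweenness.get)
print(times_cited)
print(broker, round(betweenness[broker], 3))
```

Here `X` is cited by nobody yet has the highest betweenness, because every shortest path between the two clusters passes through it: a boundary-spanning paper that citation counts alone would miss.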

388
00:15:06,320 --> 00:15:10,160
there are lots of python packages out

389
00:15:08,320 --> 00:15:12,639
there that will help you do network

390
00:15:10,160 --> 00:15:14,639
analysis i used igraph because it is

391
00:15:12,639 --> 00:15:17,440
what i'm most familiar with but there's

392
00:15:14,639 --> 00:15:20,320
the aforementioned networkx and at

392
00:15:14,639 --> 00:15:20,320
pycon this year i went to a talk by

394
00:15:20,320 --> 00:15:25,040
someone who is

395
00:15:22,079 --> 00:15:27,199
helping out with kglab which is a

396
00:15:25,040 --> 00:15:28,240
python project kind of looking to

397
00:15:27,199 --> 00:15:30,160
integrate

398
00:15:28,240 --> 00:15:33,120
all of these things

399
00:15:30,160 --> 00:15:35,120
um and whilst these python packages will

400
00:15:33,120 --> 00:15:39,040
draw decent visual

401
00:15:35,120 --> 00:15:40,959
representations of the raw networks i do

402
00:15:39,040 --> 00:15:43,440
recommend using software with a graphical

403
00:15:40,959 --> 00:15:45,920
user interface because that will make it

404
00:15:43,440 --> 00:15:48,000
easier for you to play with your visual

405
00:15:45,920 --> 00:15:49,120
properties if you are doing your network

406
00:15:48,000 --> 00:15:51,920
diagrams

407
00:15:49,120 --> 00:15:54,000
um doing graphics programmatically can

408
00:15:51,920 --> 00:15:54,720
be a faff

409
00:15:54,000 --> 00:15:58,000
so

410
00:15:54,720 --> 00:16:00,000
step five interpretation of results

411
00:15:58,000 --> 00:16:02,880
and validation

412
00:16:00,000 --> 00:16:05,600
um so the important thing here is to go

413
00:16:02,880 --> 00:16:07,440
back and look at each of the subgroups

414
00:16:05,600 --> 00:16:09,360
of papers that you have identified so

415
00:16:07,440 --> 00:16:12,959
your factor analysis will give every

416
00:16:09,360 --> 00:16:15,279
individual paper a score for each factor

417
00:16:12,959 --> 00:16:18,079
and you look at the groups of papers

418
00:16:15,279 --> 00:16:20,880
which uh have a high score within each

419
00:16:18,079 --> 00:16:23,600
factor usually above 0.3 or 0.5

420
00:16:20,880 --> 00:16:24,880
depending on what tolerance you choose

421
00:16:23,600 --> 00:16:27,600
and so you look at all those high

422
00:16:24,880 --> 00:16:28,720
scoring papers under each factor

423
00:16:27,600 --> 00:16:30,399
as a group
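
A minimal sketch of that grouping rule, using hypothetical loadings for four papers on two factors and the 0.5 and 0.3 cut-offs mentioned above. Because each factor is thresholded independently, a paper can land in more than one group, which is the advantage over hard network partitioning.

```python
import numpy as np


def factor_groups(loadings, paper_ids, threshold=0.5):
    """Map each factor index to the papers whose loading is at least `threshold`."""
    groups = {}
    for f in range(loadings.shape[1]):
        members = np.flatnonzero(loadings[:, f] >= threshold)
        groups[f] = [paper_ids[i] for i in members]
    return groups


# Hypothetical loadings: rows are papers, columns are factors.
loadings = np.array([
    [0.80, 0.10],
    [0.60, 0.55],  # loads on both factors
    [0.20, 0.90],
    [0.35, 0.40],
])
ids = ["p1", "p2", "p3", "p4"]
print(factor_groups(loadings, ids, threshold=0.5))  # {0: ['p1', 'p2'], 1: ['p2', 'p3']}
print(factor_groups(loadings, ids, threshold=0.3))  # {0: ['p1', 'p2', 'p4'], 1: ['p2', 'p3', 'p4']}
```

This uses signed loadings; strongly negative loadings, if they occur, would need a separate decision about how to interpret them.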

424
00:16:28,720 --> 00:16:32,320
um and you can do things like

425
00:16:30,399 --> 00:16:34,160
descriptive statistics looking at the

426
00:16:32,320 --> 00:16:35,920
universities and the

427
00:16:34,160 --> 00:16:38,720
countries that are contributing research

428
00:16:35,920 --> 00:16:41,199
in that subgroup um but most importantly

429
00:16:38,720 --> 00:16:43,440
is to do your qualitative analysis and

430
00:16:41,199 --> 00:16:45,199
now while this might seem quite daunting

431
00:16:43,440 --> 00:16:48,560
when you have thousands of research

432
00:16:45,199 --> 00:16:50,560
papers it is actually quite gratifying

433
00:16:48,560 --> 00:16:52,399
at least in my experience

434
00:16:50,560 --> 00:16:54,240
when even just skimming through the

435
00:16:52,399 --> 00:16:57,040
titles of those subgroups just how

436
00:16:54,240 --> 00:17:00,079
quickly those research themes emerge and

437
00:16:57,040 --> 00:17:02,839
how easy it is to delineate those

438
00:17:00,079 --> 00:17:05,600
different groups of papers within your

439
00:17:02,839 --> 00:17:07,280
network um and to give you a better idea

440
00:17:05,600 --> 00:17:09,600
of the type of insights that are

441
00:17:07,280 --> 00:17:11,919
possible from citation network

442
00:17:09,600 --> 00:17:13,600
analysis i would like to run through

443
00:17:11,919 --> 00:17:16,799
some of the key figures

444
00:17:13,600 --> 00:17:18,880
from my upcoming paper

445
00:17:16,799 --> 00:17:21,039
um so this is just a raw citation

446
00:17:18,880 --> 00:17:24,000
network showing a number of different

447
00:17:21,039 --> 00:17:27,600
keyword searches

448
00:17:24,000 --> 00:17:29,600
this shows the kind of research output

449
00:17:27,600 --> 00:17:31,760
that was um

450
00:17:29,600 --> 00:17:33,840
downloaded for each of those keyword

451
00:17:31,760 --> 00:17:36,559
searches over time and you can see the

452
00:17:33,840 --> 00:17:37,760
trends in those different fields

453
00:17:36,559 --> 00:17:39,039
over time

454
00:17:37,760 --> 00:17:41,600
with my

455
00:17:39,039 --> 00:17:43,840
field's kind of modest

455
00:17:41,600 --> 00:17:46,320
increase being dwarfed

457
00:17:43,840 --> 00:17:48,160
by exponential increases in the interest

458
00:17:46,320 --> 00:17:50,720
in things like smart cities and urban

459
00:17:48,160 --> 00:17:52,320
science and urban analytics

460
00:17:50,720 --> 00:17:55,200
um

461
00:17:52,320 --> 00:17:57,679
this is a diagram that shows kind of the

462
00:17:55,200 --> 00:18:01,760
first order relationships between

463
00:17:57,679 --> 00:18:03,919
various different keyword searches

464
00:18:01,760 --> 00:18:06,160
and how closely those

465
00:18:03,919 --> 00:18:08,799
different areas are related

466
00:18:06,160 --> 00:18:10,640
and this is my

467
00:18:08,799 --> 00:18:13,039
research corpus itself so this is the

468
00:18:10,640 --> 00:18:15,679
planning support systems literature and

469
00:18:13,039 --> 00:18:18,400
it's been colored by the highest scoring

470
00:18:15,679 --> 00:18:20,400
factor for each paper and gratifyingly

471
00:18:18,400 --> 00:18:21,200
you can already see those clusters kind

472
00:18:20,400 --> 00:18:22,320
of

473
00:18:21,200 --> 00:18:23,200
within the

474
00:18:22,320 --> 00:18:27,120
raw

475
00:18:23,200 --> 00:18:29,840
um citation network as well

476
00:18:27,120 --> 00:18:31,600
and the size of each of the little

477
00:18:29,840 --> 00:18:33,520
bubbles corresponds to the number of

478
00:18:31,600 --> 00:18:34,960
citations

479
00:18:33,520 --> 00:18:35,760
and like

480
00:18:34,960 --> 00:18:39,120
you

481
00:18:35,760 --> 00:18:40,880
and similar to the having those large

482
00:18:39,120 --> 00:18:42,400
those few papers with a large number of

483
00:18:40,880 --> 00:18:44,320
citations that's quite

484
00:18:42,400 --> 00:18:46,000
quite normal in the citation network so

485
00:18:44,320 --> 00:18:47,919
you'll have

486
00:18:46,000 --> 00:18:50,559
um

487
00:18:47,919 --> 00:18:52,640
a it's kind of like any social network

488
00:18:50,559 --> 00:18:55,679
it follows a power law you'll find that

489
00:18:52,640 --> 00:18:57,919
there are papers that act as hubs within

490
00:18:55,679 --> 00:19:00,080
the networks which have exponentially

491
00:18:57,919 --> 00:19:01,679
more citations

492
00:19:00,080 --> 00:19:03,520
um than

493
00:19:01,679 --> 00:19:05,600
the majority of

494
00:19:03,520 --> 00:19:08,400
papers within your network which might

495
00:19:05,600 --> 00:19:10,240
only have one or two

496
00:19:08,400 --> 00:19:12,080
uh you can do these kinds of descriptive

497
00:19:10,240 --> 00:19:14,880
statistics on the metadata like the

498
00:19:12,080 --> 00:19:17,440
countries the institutions um the most

499
00:19:14,880 --> 00:19:20,400
prolific authors

500
00:19:17,440 --> 00:19:23,200
here's another representation that i did

501
00:19:20,400 --> 00:19:27,200
of my different research streams kind of

502
00:19:23,200 --> 00:19:29,120
showing the overlap between the papers

503
00:19:27,200 --> 00:19:30,000
within those different research streams

504
00:19:29,120 --> 00:19:31,919
and again those bubbles are

505
00:19:30,000 --> 00:19:35,679
proportionate to the number

506
00:19:31,919 --> 00:19:38,240
of papers within the research streams

507
00:19:35,679 --> 00:19:40,160
uh here's another showing each of my

508
00:19:38,240 --> 00:19:42,080
individual factors or research streams

509
00:19:40,160 --> 00:19:45,120
and the change in those different

510
00:19:42,080 --> 00:19:45,120
groups over time

511
00:19:45,520 --> 00:19:48,799
so

512
00:19:46,559 --> 00:19:51,440
um that's some of the things that you

513
00:19:48,799 --> 00:19:53,360
can do with it and in conclusion

514
00:19:51,440 --> 00:19:55,039
performing a citation network analysis

515
00:19:53,360 --> 00:19:56,480
of course doesn't replace a traditional

516
00:19:55,039 --> 00:19:58,000
literature review

517
00:19:56,480 --> 00:19:59,440
doing a citation network analysis

518
00:19:58,000 --> 00:20:01,200
doesn't tell you

519
00:19:59,440 --> 00:20:02,960
um anything about the quality of the

520
00:20:01,200 --> 00:20:05,840
research itself or even that much about

521
00:20:02,960 --> 00:20:08,320
its findings but it is a really good

522
00:20:05,840 --> 00:20:10,000
shortcut to kind of that overall view of

523
00:20:08,320 --> 00:20:11,600
your research field and identifying

524
00:20:10,000 --> 00:20:14,720
those key players

525
00:20:11,600 --> 00:20:17,679
and those key lines of inquiry

526
00:20:14,720 --> 00:20:19,280
um one word of warning for anyone who is

527
00:20:17,679 --> 00:20:21,440
going to attempt this or

528
00:20:19,280 --> 00:20:23,039
advising anyone to attempt this that

529
00:20:21,440 --> 00:20:25,039
there are hundreds of different

530
00:20:23,039 --> 00:20:27,600
statistics that can be calculated from

531
00:20:25,039 --> 00:20:29,760
these citation data sets uh which was a

532
00:20:27,600 --> 00:20:31,200
little overwhelming and therefore

533
00:20:29,760 --> 00:20:32,880
and then it's kind of a challenge for

534
00:20:31,200 --> 00:20:35,760
the researcher to condense all of these

535
00:20:32,880 --> 00:20:37,919
statistics into some kind of meaningful

536
00:20:35,760 --> 00:20:40,240
and useful story

537
00:20:37,919 --> 00:20:42,960
finally though to reiterate academic

538
00:20:40,240 --> 00:20:45,440
literature is growing exponentially and

539
00:20:42,960 --> 00:20:48,080
opening this data and these tools will

540
00:20:45,440 --> 00:20:50,799
become more and more important both as a

541
00:20:48,080 --> 00:20:53,120
means of just retaining

542
00:20:50,799 --> 00:20:54,240
the ability to keep track of research

543
00:20:53,120 --> 00:20:56,400
findings

544
00:20:54,240 --> 00:20:59,280
um which will prevent duplication of

545
00:20:56,400 --> 00:21:02,960
effort but also make research more

546
00:20:59,280 --> 00:21:06,159
accessible to everyone

547
00:21:02,960 --> 00:21:08,320
here is the main how-to text that i use

548
00:21:06,159 --> 00:21:11,200
when i perform my own analysis and it's

549
00:21:08,320 --> 00:21:13,120
a really useful textbook that is free to

550
00:21:11,200 --> 00:21:14,080
download

551
00:21:13,120 --> 00:21:15,919
um

552
00:21:14,080 --> 00:21:17,919
also if you're in a specific field i

553
00:21:15,919 --> 00:21:19,919
would recommend that you look for

554
00:21:17,919 --> 00:21:21,520
previously published papers that utilize

555
00:21:19,919 --> 00:21:22,640
this method to get an

556
00:21:21,520 --> 00:21:24,960
idea

557
00:21:22,640 --> 00:21:26,720
of how how it could be used in your own

558
00:21:24,960 --> 00:21:30,480
speciality

559
00:21:26,720 --> 00:21:32,640
um the full paper of my analysis will be

560
00:21:30,480 --> 00:21:35,039
published very shortly it's been

561
00:21:32,640 --> 00:21:37,919
accepted and i've submitted the final

562
00:21:35,039 --> 00:21:40,159
documents when it is published i will

563
00:21:37,919 --> 00:21:42,480
make sure the pre-print will go up on my

564
00:21:40,159 --> 00:21:46,320
personal website and the code will be

565
00:21:42,480 --> 00:21:47,360
available on my github account

566
00:21:46,320 --> 00:21:50,080
so

567
00:21:47,360 --> 00:21:52,799
thank you everyone and

568
00:21:50,080 --> 00:21:54,799
happy to take any discussion or any

569
00:21:52,799 --> 00:21:57,200
questions

570
00:21:54,799 --> 00:22:00,400
wonderful thank you so much claire

571
00:21:57,200 --> 00:22:03,039
um we currently have two questions uh

572
00:22:00,400 --> 00:22:05,200
first one is i'm on the periphery of the

573
00:22:03,039 --> 00:22:07,679
research world this seems to be useful

574
00:22:05,200 --> 00:22:10,080
to use uh connections between papers

575
00:22:07,679 --> 00:22:13,600
rather than the content of the paper is

576
00:22:10,080 --> 00:22:15,600
that another area of research

577
00:22:13,600 --> 00:22:18,080
connections between papers rather than

578
00:22:15,600 --> 00:22:20,080
content so

579
00:22:18,080 --> 00:22:21,120
okay so yes

580
00:22:20,080 --> 00:22:23,679
um

581
00:22:21,120 --> 00:22:26,000
the citation network analysis is good

582
00:22:23,679 --> 00:22:27,520
for looking at getting that overall

583
00:22:26,000 --> 00:22:30,400
structure

584
00:22:27,520 --> 00:22:32,000
of the field um

585
00:22:30,400 --> 00:22:33,520
and those identifying those

586
00:22:32,000 --> 00:22:35,200
relationships and those different

587
00:22:33,520 --> 00:22:37,760
research streams

588
00:22:35,200 --> 00:22:39,360
you might identify kind of silos of

589
00:22:37,760 --> 00:22:41,520
thought or or

590
00:22:39,360 --> 00:22:43,039
kind of different groups of universities

591
00:22:41,520 --> 00:22:44,559
that might be collaborating really

592
00:22:43,039 --> 00:22:45,360
closely together

593
00:22:44,559 --> 00:22:47,840
um

594
00:22:45,360 --> 00:22:49,520
it's a really good way of making sure

595
00:22:47,840 --> 00:22:53,200
you don't miss

596
00:22:49,520 --> 00:22:55,200
those really key pieces of research that

597
00:22:53,200 --> 00:22:57,440
have been done before that have been

598
00:22:55,200 --> 00:22:58,159
cited thousands of times and because

599
00:22:57,440 --> 00:22:59,840
they

600
00:22:58,159 --> 00:23:02,159
i know i've done that before and i've

601
00:22:59,840 --> 00:23:04,000
always felt stupid when i'm like oh i

602
00:23:02,159 --> 00:23:05,280
should have known about this

603
00:23:04,000 --> 00:23:08,320
um

604
00:23:05,280 --> 00:23:11,039
but yes it doesn't it doesn't help you

605
00:23:08,320 --> 00:23:13,600
really evaluate the quality of the

606
00:23:11,039 --> 00:23:16,080
research as such so you still need to do

607
00:23:13,600 --> 00:23:17,919
your traditional literature review or

608
00:23:16,080 --> 00:23:20,400
you can do another type of systematic

609
00:23:17,919 --> 00:23:23,679
literature review where you do

610
00:23:20,400 --> 00:23:26,799
go in and read the papers exhaustively

611
00:23:23,679 --> 00:23:28,240
and that kind of thing um yeah so this

612
00:23:26,799 --> 00:23:29,919
is just another

613
00:23:28,240 --> 00:23:31,840
another useful tool and those tools are

614
00:23:29,919 --> 00:23:34,080
being um

615
00:23:31,840 --> 00:23:36,320
with the with this data becoming so much

616
00:23:34,080 --> 00:23:38,559
more open and a number of those open

617
00:23:36,320 --> 00:23:40,159
source projects both with the apis but

618
00:23:38,559 --> 00:23:43,600
also um

619
00:23:40,159 --> 00:23:46,640
the free open source software which

620
00:23:43,600 --> 00:23:48,480
anyone can can download and use

621
00:23:46,640 --> 00:23:51,279
without necessarily

622
00:23:48,480 --> 00:23:53,679
a statistics degree or a computer

623
00:23:51,279 --> 00:23:55,440
programming degree um

624
00:23:53,679 --> 00:23:57,200
it's kind of something that i would

625
00:23:55,440 --> 00:24:00,320
recommend most people give a go before

626
00:23:57,200 --> 00:24:02,799
they start a major research project

627
00:24:00,320 --> 00:24:06,240
yeah interestingly um because i work in

628
00:24:02,799 --> 00:24:07,520
the library sector we um try and link up

629
00:24:06,240 --> 00:24:10,080
between universities and their

630
00:24:07,520 --> 00:24:12,080
repositories as to who's got the same

631
00:24:10,080 --> 00:24:13,600
papers and if there's duplicates do we

632
00:24:12,080 --> 00:24:14,880
really need to store the duplicates so

633
00:24:13,600 --> 00:24:15,840
that would be an interesting study as

634
00:24:14,880 --> 00:24:18,159
well

635
00:24:15,840 --> 00:24:20,720
um another question for you did claire

636
00:24:18,159 --> 00:24:23,600
consider creating an intermediary mind

637
00:24:20,720 --> 00:24:26,080
map of what she had found or taxonomy so

638
00:24:23,600 --> 00:24:29,120
adding a halfway step that is more an

639
00:24:26,080 --> 00:24:32,120
art than science

640
00:24:29,120 --> 00:24:32,120
right

641
00:24:32,240 --> 00:24:37,200
um

642
00:24:34,640 --> 00:24:40,080
i suppose

643
00:24:37,200 --> 00:24:41,440
what i did to make sense of it so when

644
00:24:40,080 --> 00:24:44,640
you get your

645
00:24:41,440 --> 00:24:46,880
results from your factor analysis

646
00:24:44,640 --> 00:24:49,279
basically it will have

647
00:24:46,880 --> 00:24:51,840
it will give you an eigenvalue

648
00:24:49,279 --> 00:24:54,080
for each factor and that kind of

649
00:24:51,840 --> 00:24:56,320
represents

650
00:24:54,080 --> 00:24:57,440
it kind of gives each factor a score

651
00:24:56,320 --> 00:24:58,840
and

652
00:24:57,440 --> 00:25:02,720
which tells you

653
00:24:58,840 --> 00:25:04,880
the um how well that factor

654
00:25:02,720 --> 00:25:06,960
is describing what percentage of

655
00:25:04,880 --> 00:25:09,840
variation in your network that factor is

656
00:25:06,960 --> 00:25:11,360
describing how important is it to the

657
00:25:09,840 --> 00:25:13,440
entire structure of the network and it

658
00:25:11,360 --> 00:25:15,520
will you'll usually have a few that have

659
00:25:13,440 --> 00:25:17,120
really high and then it will

660
00:25:15,520 --> 00:25:18,559
will drop off

661
00:25:17,120 --> 00:25:20,400
kind of like that

662
00:25:18,559 --> 00:25:23,630
and you kind of go one two three four

663
00:25:20,400 --> 00:25:25,440
five so the numbers on my um

664
00:25:23,630 --> 00:25:29,840
[Music]

665
00:25:25,440 --> 00:25:29,840
let me see if i can bring up my

666
00:25:30,080 --> 00:25:37,200
screen again

667
00:25:32,720 --> 00:25:38,960
am i still showing my screen yeah oops

668
00:25:37,200 --> 00:25:42,320
let's see how we go

669
00:25:38,960 --> 00:25:45,200
so if i skip through

670
00:25:42,320 --> 00:25:45,200
and look at

671
00:25:47,120 --> 00:25:52,799
that's not that's not helping

672
00:25:50,159 --> 00:25:54,400
stop sharing let's try again share share

673
00:25:52,799 --> 00:25:59,279
screen

674
00:25:54,400 --> 00:26:02,880
share screen uh screen two

675
00:25:59,279 --> 00:26:04,480
and then people can probably see that

676
00:26:02,880 --> 00:26:06,720
i'll just bring out my powerpoint that

677
00:26:04,480 --> 00:26:10,159
might be easier okay

678
00:26:06,720 --> 00:26:11,679
sorry back on track

679
00:26:10,159 --> 00:26:15,200
all right so each of those numbers

680
00:26:11,679 --> 00:26:17,840
represents the kind of ranked order of

681
00:26:15,200 --> 00:26:21,279
those factors and what i've done

682
00:26:17,840 --> 00:26:23,360
is kind of grouped them

683
00:26:21,279 --> 00:26:25,919
in um

684
00:26:23,360 --> 00:26:28,159
four broader things

685
00:26:25,919 --> 00:26:30,559
um and i've done that by analyzing how

686
00:26:28,159 --> 00:26:31,600
much they overlap in a more quantitative

687
00:26:30,559 --> 00:26:33,840
way

688
00:26:31,600 --> 00:26:36,960
um but there's also a little bit of

689
00:26:33,840 --> 00:26:36,960
qualitative kind of

690
00:26:37,279 --> 00:26:41,279
yeah assigning one

691
00:26:39,360 --> 00:26:44,159
assigning a bubble to a category

692
00:26:41,279 --> 00:26:47,120
particularly on those edges um was

693
00:26:44,159 --> 00:26:48,159
was a qualitative decision so

694
00:26:47,120 --> 00:26:49,360
yeah

695
00:26:48,159 --> 00:26:51,440
wonderful

696
00:26:49,360 --> 00:26:53,520
next question as you say there are many

697
00:26:51,440 --> 00:26:55,279
different analysis options for your

698
00:26:53,520 --> 00:26:57,120
network as well as

699
00:26:55,279 --> 00:26:59,279
all the choices for how you construct

700
00:26:57,120 --> 00:27:01,279
your network can you tell us more about

701
00:26:59,279 --> 00:27:03,840
how you determined what story you wanted

702
00:27:01,279 --> 00:27:06,799
to tell and which technical choices

703
00:27:03,840 --> 00:27:07,840
would be useful for that

704
00:27:06,799 --> 00:27:09,760
right

705
00:27:07,840 --> 00:27:12,080
okay so the first major decision was

706
00:27:09,760 --> 00:27:14,240
co-citation or bibliographic coupling

707
00:27:12,080 --> 00:27:16,799
um i went with bibliographic coupling

708
00:27:14,240 --> 00:27:19,200
because i knew there would be a lot of

709
00:27:16,799 --> 00:27:20,960
useful papers that probably haven't had

710
00:27:19,200 --> 00:27:23,440
many citations

711
00:27:20,960 --> 00:27:25,679
um it's still limited citation network

712
00:27:23,440 --> 00:27:27,120
analysis is better at looking at what's

713
00:27:25,679 --> 00:27:28,960
happened in the past and what is

714
00:27:27,120 --> 00:27:30,640
happening right now than what might happen

715
00:27:28,960 --> 00:27:33,919
in the future

716
00:27:30,640 --> 00:27:36,159
um so i went bibliographic coupling and

717
00:27:33,919 --> 00:27:36,159
then

718
00:27:36,320 --> 00:27:41,520
then yeah it took it took a while i just

719
00:27:39,200 --> 00:27:42,559
generated big dashboards for every

720
00:27:41,520 --> 00:27:44,640
single

721
00:27:42,559 --> 00:27:46,720
factor and you just go through it and

722
00:27:44,640 --> 00:27:49,120
you're like okay this this this thing is

723
00:27:46,720 --> 00:27:51,919
different about this factor suddenly

724
00:27:49,120 --> 00:27:54,720
twice as many researchers from china are

725
00:27:51,919 --> 00:27:58,240
kind of in this bubble why is that

726
00:27:54,720 --> 00:28:00,880
um so going through those like you can

727
00:27:58,240 --> 00:28:03,760
get python to print out however many

728
00:28:00,880 --> 00:28:05,679
charts of your most frequent words or

729
00:28:03,760 --> 00:28:06,960
your institutions and countries and all

730
00:28:05,679 --> 00:28:07,919
that kind of thing

731
00:28:06,960 --> 00:28:09,440
um

732
00:28:07,919 --> 00:28:11,760
yeah and just combing through and being

733
00:28:09,440 --> 00:28:13,760
like okay that's that's different that's

734
00:28:11,760 --> 00:28:16,080
different why is that different

735
00:28:13,760 --> 00:28:19,600
um but it is it is a time consuming

736
00:28:16,080 --> 00:28:22,000
process to make it

737
00:28:19,600 --> 00:28:24,799
to to kind of go through all of that

738
00:28:22,000 --> 00:28:26,399
information even if it is technically

739
00:28:24,799 --> 00:28:28,159
especially if you may be using a piece

740
00:28:26,399 --> 00:28:30,480
of software you might be able to bring

741
00:28:28,159 --> 00:28:32,320
up your network in a matter of minutes

742
00:28:30,480 --> 00:28:35,919
but um yeah

743
00:28:32,320 --> 00:28:37,440
making sense of it can take a while

744
00:28:35,919 --> 00:28:40,320
uh i recommend looking for those

745
00:28:37,440 --> 00:28:40,320
outliers yeah

746
00:28:40,399 --> 00:28:43,120
one last question before we go to the

747
00:28:42,000 --> 00:28:44,960
break

748
00:28:43,120 --> 00:28:47,440
have you been able to determine if a

749
00:28:44,960 --> 00:28:50,799
citation provider has an impact on

750
00:28:47,440 --> 00:28:52,559
articles that are unearthed

751
00:28:50,799 --> 00:28:54,720
has an impact on articles that are

752
00:28:52,559 --> 00:28:57,360
unearthed i have not personally done that

753
00:28:54,720 --> 00:28:58,799
research i am certain

754
00:28:57,360 --> 00:29:01,760
that there are

755
00:28:58,799 --> 00:29:03,200
papers out there with people who have

756
00:29:01,760 --> 00:29:05,120
looked at that

757
00:29:03,200 --> 00:29:08,080
um so

758
00:29:05,120 --> 00:29:11,039
the person who

759
00:29:08,080 --> 00:29:13,679
did this kind of analysis

760
00:29:11,039 --> 00:29:15,440
is probably a good place to start

761
00:29:13,679 --> 00:29:17,679
in terms of

762
00:29:15,440 --> 00:29:19,520
that kind of research so they've been

763
00:29:17,679 --> 00:29:21,919
really interested in the coverage of all

764
00:29:19,520 --> 00:29:23,840
these different citation databases

765
00:29:21,919 --> 00:29:27,039
um there was another study at flight

766
00:29:23,840 --> 00:29:29,360
past that looked at the quality of the

767
00:29:27,039 --> 00:29:31,120
search functions um and there's probably

768
00:29:29,360 --> 00:29:33,600
various people who could do statistical

769
00:29:31,120 --> 00:29:37,360
analysis looking at

770
00:29:33,600 --> 00:29:40,399
um the correlation between citations and

771
00:29:37,360 --> 00:29:43,520
its source within a database or

772
00:29:40,399 --> 00:29:46,480
its accessibility within any of these

773
00:29:43,520 --> 00:29:46,480
these kind of tools

774
00:29:46,720 --> 00:29:50,559
wonderful well thank you so much if

775
00:29:48,799 --> 00:29:53,840
anyone has any other questions for

776
00:29:50,559 --> 00:29:56,559
claire claire will be in the

777
00:29:53,840 --> 00:29:58,799
go glam community chat

778
00:29:56,559 --> 00:30:02,720
channel and

779
00:29:58,799 --> 00:30:07,000
we will be back with bonnie at 11:30 a.m.

780
00:30:02,720 --> 00:30:07,000
and we'll see you there thanks

