The last example asks the following question: which characteristics of universities, and which characteristics of their environment, influence the quality of universities? There are about 2500 higher education institutions in Europe. A few of those are the outstanding, top-ranked universities, but most of them are much more ‘normal’. Why some universities have always been the outstanding ones is one question; what influences the performance of the large majority is another. One may think of characteristics of the HE institutions themselves, such as size, number of undergraduate and graduate students, student-staff ratio, amount of externally funded research, and so on.
But contextual variables may also have an effect: degree of urbanization, other (higher) education institutions in the vicinity, presence of R&D performing companies or public research institutes, and other variables representing the social, economic, and demographic characteristics of the region. Why would these factors be relevant? Several theories could be used, but the least one may say is that these social and other factors may affect the attractiveness of the university and its environment for potential students and academic staff. And the more attractive these are, the better the staff and students one may be able to select. Another factor may be that the presence of a variety of research, development, and innovation related activities in the vicinity of an HE institution results in more exchange of ideas, more (interdisciplinary) collaboration, and more funding possibilities. How would we be able to answer these questions? We will focus on the role of the latter factor.
The SMS datastore contains for the moment one dataset with performance data at the university level: the Leiden Ranking. This set contains data for the better HE institutions, but by no means for all of them. Furthermore, the Leiden Ranking only reflects research performance, whereas other rankings also take into account teaching or external funding (e.g., from industry). In the near future SMS may add some other rankings to increase the scope and the size of the coverage.
The SMS datastore contains several datasets with information about HE institutions, such as ETER, OrgRef, GRID, OrgReg, etcetera. From those we may extract the relevant properties of the HE institutions we are interested in. However, in this example we focus on the contextual factors. How would we retrieve those from the SMS datastore?
The whole process consists of several steps, from linking data, via geo-localization and finding the relevant geo-boundaries, to identifying the other R&D intensive organizations within these geo-boundaries. Then we can measure the number, kind, and variety of R&D organizations in the environment of the university as a measure of the quality of the context. Finally, we can do some statistics to answer the question: does the number, kind, and variety of nearby R&D organizations influence the ranking of universities?
Step 1: Link the organization names across the relevant datasets, as described earlier in this report. In this case, four datasets are involved. Once this is done, we have a variety of variables for all HE institutions, among them the geo-coordinates.
Fig 42. Linking the relevant datasets
Fig 43. Detecting the other relevant institutions within the environment of an HE institution
Step 2: The geo-coordinates are used to define the boundaries of the environment, which is needed to find the other R&D intensive organizations within those boundaries.
Step 3: For that we again use the OrgRef dataset, as this contains a large number of those organizations, all with their geo-coordinates. For each HE institution, we can now determine which R&D organizations are nearby. As OrgRef also has information about the type of organization, we know not only the number, but also the types and the variety.
Step 4: These variables, together with the characteristics derived from ETER, can then be used to explain the ranking of HE institutions. Figure 44 shows a part of the dataset that can be analyzed in a statistical package like SPSS, SAS, or R. The ‘english_name’, ‘country’, ‘category’, ‘total_expenditure’, ‘third_party_funding’ and ‘Academic staff size’ are all retrieved from ETER. The performance score ‘PP_top10’ comes from the Leiden Ranking, the ‘longitude’ and ‘latitude’ come from GRID, and the ‘geo-boundary’ is produced in the SMS platform. The geo-boundary and GRID are used to calculate the ‘Number of R&D intensive organizations’.
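The counting of nearby organizations in Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the SMS platform's implementation: it assumes the geo-boundary is a circle of fixed radius around the institution (the report does not specify how boundaries are drawn), and the record layout, the function names `haversine_km` and `context_measures`, the 25 km radius, and the sample coordinates are all hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def context_measures(institution, organizations, radius_km=25.0):
    """Number, types, and variety of organizations within radius_km.

    Assumption: the geo-boundary is a circle around the institution.
    'variety' is taken here as the number of distinct organization types.
    """
    nearby = [
        org for org in organizations
        if haversine_km(institution["lat"], institution["lon"],
                        org["lat"], org["lon"]) <= radius_km
    ]
    type_counts = Counter(org["type"] for org in nearby)
    return {
        "number": len(nearby),
        "types": dict(type_counts),
        "variety": len(type_counts),
    }


# Made-up sample records, for illustration only (not taken from OrgRef or GRID).
university = {"name": "Example University", "lat": 52.00, "lon": 4.50}
rd_organizations = [
    {"name": "Org A", "type": "company", "lat": 52.05, "lon": 4.55},
    {"name": "Org B", "type": "research institute", "lat": 52.10, "lon": 4.40},
    {"name": "Org C", "type": "company", "lat": 52.02, "lon": 4.48},
    {"name": "Org D", "type": "company", "lat": 48.85, "lon": 2.35},  # far away
]

print(context_measures(university, rd_organizations))
```

A circular boundary is only one option; an administrative region or a travel-time zone would need a point-in-polygon test instead of a distance threshold, and variety could equally be measured with a diversity index rather than a count of distinct types.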
To what extent these variables indeed predict the ranking remains to be answered, but the correlation between the two yellow columns (for the Dutch universities only) is 0.58.
Fig 44. Part of the resulting dataset (Dutch universities only, and a few of the variables)
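A correlation such as the one reported above can be computed in a few lines. The sketch below implements the Pearson correlation coefficient directly. Two things are assumptions: that the reported 0.58 is a Pearson coefficient, and that the two columns compared are the performance score and the number of R&D intensive organizations. The values in the two lists are made-up placeholders, not the data from Fig 44.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Placeholder values for illustration only (hypothetical, not from the dataset).
n_rd_organizations = [12, 45, 8, 30, 22]
pp_top10 = [0.10, 0.15, 0.09, 0.12, 0.14]

print(round(pearson_r(n_rd_organizations, pp_top10), 2))
```

With only a handful of universities, such a coefficient is sensitive to single observations, so it is better read as a first indication than as a test of the relationship.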