INVENTORS DYNAMICS IN BALKANIC AREA: EVIDENCES BY A NETWORK ANALYSIS

Patent data are a key source of information for innovation economists. In recent decades their use has spread considerably, thanks mainly to the digitization of archives and to the greater openness of patent authorities about the granting procedure. Furthermore, the use of this information has not been limited to simple statistics on patents and their classification, but has extended to the analysis of applicants, inventors, citations, and much more. In this paper we analyze patent data for a selection of Balkan countries chosen among the most dynamic in innovation and patent production: Croatia, Serbia, and Bosnia and Herzegovina. As explained in the paper, this selection was not accidental: the aim was to represent the evolution of these countries, in terms of patent internationalization, depending on their "link" with the European Union, since not all Western Balkan countries are part of it. Croatia, an official EU member since 2013, was chosen as the representative state of European influence. Some interesting results were obtained with a novel approach based on social network analysis techniques.


INTRODUCTION AND LITERATURE REVIEW
For several years, the attention of the scientific community has focused on the study and analysis of bibliometric indicators, which are useful for relating the research activities of universities around the world. Many researchers have focused on data measuring the impact of scientific collaboration networks between universities, because this type of network not only describes the academic community but also traces a clear path for learning about the structure, diffusion and evolution of knowledge in an open innovation community [1,2]. Furthermore, the literature contains various studies on the use of publication and citation analysis in the evaluation of scientific activities, and on some of the basic statistical properties of scientific literature, in particular the asymmetry of the distributions of publications and citations, reference time frames, and various anomalies in citation patterns from one country to another. For several years, many scholars have devoted a large part of their energies to developing a similar research base and infrastructure for patent bibliometrics, that is, for the use of patents and patent citations in the evaluation of technological assets. There are striking similarities between literature bibliometrics and patent bibliometrics, and both are applicable to the same wide range of problems.
Narin has shown that there are striking similarities between the literature and patent distributions of national productivity, inventor productivity, reference cycles, citation impact and within-country citation preferences [3] (see also [12] for patent bibliometrics).
Following the idea that research can be improved through public or private financial investment, Gao et al. studied the evaluation of research in China with respect to public spending [4]. Government funding is a fundamental resource for scientific research and has made a concrete contribution to the scientific and technological development of the world. But these funds come from ordinary taxpayers, so their effectiveness needs to be evaluated. In general, policymakers use peer review for making assessments. To make up for the shortcomings of peer review, the authors propose a benchmarking assessment method, guided mainly by scientometric indicators, for evaluating publication results and the use of research grants. One of the topics that has aroused our interest concerns academic relations in the Balkan area. Numerous studies have analyzed the scientific production of the countries of the Western Balkan area, such as Albania, Croatia, Bulgaria, Bosnia and Herzegovina, etc. There are many articles focusing on scientific disciplines, institutions and journals from these countries. For this reason, this paper intends to give the Western Balkans the prominence it deserves by studying its research productivity using a bibliometric approach. Rabkin & Inhaber were among the first scholars to analyze the scientific interactions of Argentina, Brazil and Norway in terms of citations and references to the scientific literature [5]. They show that these three nations cite the publications of the central nations far more heavily than those of their own country.
Another interesting and systematic study of science and technology policy in the peripheral areas of the Third World is that of Moravcsik [11].
Starting from this research, Pravdic et al. studied the academic report of a peripheral country such as (former) Yugoslavia using three different factors: the possibilities and limits of the evaluation of scientific activity; the problem of the form and dimensions of science as a human activity in general; and the specificity of communication systems in science [6]. Crescenzi et al. examine the characteristics of collaborations between inventors in the United Kingdom (UK) by observing which types of proximity (geographic, organizational, cognitive, social and cultural-ethnic) among inventors are prevalent in the partnerships that ultimately lead to technological progress [7]. Using a new group of British inventors, the authors provide an analysis of the associations between these "proximities" and co-patenting. The results show that while collaboration within companies, research centers and universities remains crucial, external networks of inventors are a key feature of innovation teams. Furthermore, the analysis shows that external networks are highly dependent on previous social connections, but are generally not constrained by cultural or cognitive factors. On the basis of these findings, the authors suggest that innovation policies should facilitate the formation of open and diverse inventor networks rather than focus on spatial clustering. Hiring inventors has long been recognized as a learning method used by innovative companies. Palomeras and Melero state that the characteristics of the knowledge accumulated by inventors in their current employment determine what hiring firms can learn from them [8]. The implication is that some inventors are more likely to be hired than their peers. The authors carried out a study on the relation between the type of knowledge embodied by inventors working at IBM and their probability of moving.
Relying on patent data to track the movement of inventors between companies and to characterize the type of know-how they hold, they identified various factors of inventor mobility, such as the quality of their work; the complementarity of their knowledge with that of other inventors; and, to a lesser extent, their experience in key areas of the firm where the firm is not a dominant player. The results confirm the role of knowledge characteristics underpinning R&D staff mobility and suggest that learning is a relevant force in the market for inventors. Knowledge networks made up of links between elements of knowledge and social networks made up of interactions between inventors both play a key role in innovation.
Brennecke and Rank, using a multilevel network approach, integrate research on the two types of networks and investigate how a firm's knowledge network affects work-related interactions between its inventors [9]. To this end, they associate inventors with specific knowledge elements in the company's knowledge network and examine how this association affects the popularity and activity of inventors in a job-related consulting network. The analysis was conducted on 135 inventors working in a German high-tech company, with information derived from the company's 1031 patents. The results obtained from multilevel exponential random graph models (ERGMs) show that different dimensions of knowledge derived from the firm's knowledge network shape the transfer of advice between inventors in unique ways. Their study therefore demonstrates how the structural characteristics of the firm's knowledge stock influence the interpersonal interactions between its inventors, thus influencing the intra-organizational diffusion of knowledge and the recombinant possibilities of the firm. The adoption of stricter patent laws and the composition of patent rights vary from country to country according to the level of economic development [10]. A patent is a contract between an inventor and a state; it guarantees an exclusive right, granted for an invention, that is, a product or a process that makes a new way of doing something accessible or offers a new technical solution to a problem. It is a technical-legal document containing a detailed technical description of the object of the patent and the related claims for protection. It must also contain a summary of the prior art, that is, the technology known at the time of filing.
In each country there is a national office to which it is possible to apply for a patent; by way of example, in Italy there is the Italian Patent and Trademark Office (UIBM), based in Rome and part of the Ministry of Economic Development. Nowadays, however, companies operate in an international context and need to protect the value of innovation beyond the local level. It is therefore possible to extend one's right to other countries, or directly to an entire continent and beyond, by forwarding the request to various bodies, including the European Patent Office (EPO).

RESEARCH AIM AND MOTIVATIONS
Economics of Science (ЭКОНОМИКА НАУКИ), 2021, Vol. 7, No. 4
This research work aims to understand how the internationalization strategy of the inventive activity of the Balkan countries has changed over the years. In particular, we extracted from the database the information relating to 3 of the 13 countries of the western peninsula: Croatia, Serbia and Bosnia and Herzegovina. They were chosen deliberately in order to analyze how their link with the EU (not all Western Balkan countries are EU members) has affected their evolution in terms of patent internationalization. Croatia, an official EU member since 2013, was chosen as the representative state of European influence. In order to underline the differences, Serbia, an official EU candidate country since 2012, and Bosnia and Herzegovina, a potential candidate since 2003, were then analyzed. [...] use by foreign companies, developing relationships external to the company and benefiting from an alternative mode of access to foreign markets through collaboration with other companies. The European Union knows well how important the internationalization of inventive activity is, and that is why it promotes its development, investing time and resources in the creation of incentives. The time frame chosen to evaluate the internationalization process, that is, to measure the degree of geographic heterogeneity of inventors working in the same research group, based on the country of residence (through the country codes), runs from 2010 to 2017. To achieve this goal we used the Internationalization Index of Research Groups (IGI), first used by a group of researchers from the Complutense University of Madrid. The IGI is an innovative indicator that measures the degree of heterogeneity of research groups.
It arises from the normalization of the Herfindahl-Hirschman concentration index (HHI), used above all to measure the degree of competition in a given market. Homogeneity is often referred to as the opposite concept of concentration. Therefore, saying that a concentration index such as the HHI measures homogeneity might raise some criticism. However, in this context the concentration of a research group, in terms of the country of residence of its members, corresponds to the homogeneity of its inventors, so the index is suitable for measuring its degree. The concept opposite to homogeneity is heterogeneity. To measure the degree of heterogeneity we use a score ranging from 0 to 1. In our case this value must be minimum (0) if every inventor of the group comes from the same country, and maximum (1) if each of them is from a different country. To measure the degree of heterogeneity, we first use the HHI function:
\[ HHI = \sum_{i=1}^{n} q_i^2 \]
where n is the number of countries of residence of the inventors in the observed research group and $q_i$ is the share of the group's inventors resident in country i.
For example, for a research group made up of three inventors, two of them Spanish and one Italian, we would have $n = 2$, with $q_{Spain} = 2/3 \approx 0.67$ and $q_{Italy} = 1/3 \approx 0.33$. The resulting HHI would be the sum of the squares of these two numbers ($5/9 \approx 0.56$). The index as structured would measure the geographic homogeneity of the inventors and would lie between $1/n$ and 1; however, it would not give a measure in the proper sense and, furthermore, groups with different n could not be compared using it. Consider for example two research groups: in the first we have two inventors, from Italy and China respectively, and in the second three inventors, from Argentina, Germany and Spain respectively. The extent of concentration in the two groups should be the same (i.e., the minimum), since in both each inventor is from a different country. However, the HHI for the first research group is equal to 0.5, while for the second it is equal to 0.33. This is because, as mentioned, the lower limit of the HHI is inversely related to n, so passing from one value of n to another changes the minimum value of the HHI. Moreover, for the same difference between two values of n, the induced difference in the minimum HHI is greater the smaller the two values of n are. This means that the comparability problem is more serious for small values of n; to overcome it, a normalized HHI is adopted. The formula for calculating the normalized HHI is as follows:
\[ HHI^{*} = \frac{HHI - 1/N}{1 - 1/N} \]
where N is the number of inventors in the research group and HHI is the simple Herfindahl-Hirschman index calculated with the previous equation. This index ranges from 0 to 1, regardless of n. Going back to the example above, both research groups would now show an $HHI^{*}$ equal to 0. The index described by this equation is a standardized indicator of the geographical homogeneity of research groups.
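As a worked illustration, the three-inventor group (two Spanish, one Italian) can be carried through the normalization step. This is only a sketch: it assumes the normalized index takes the usual form $(HHI - 1/N)/(1 - 1/N)$, with $N$ the number of inventors in the group, and the symbol $HHI^{*}$ is our own shorthand for the normalized value.

```latex
\[
HHI = \left(\tfrac{2}{3}\right)^{2} + \left(\tfrac{1}{3}\right)^{2} = \tfrac{5}{9} \approx 0.56,
\qquad
HHI^{*} = \frac{\tfrac{5}{9} - \tfrac{1}{3}}{1 - \tfrac{1}{3}}
        = \frac{\tfrac{2}{9}}{\tfrac{2}{3}} = \tfrac{1}{3}.
\]
```

Under the same assumption, the two fully international groups give $HHI^{*} = (1/2 - 1/2)/(1 - 1/2) = 0$ and $HHI^{*} = (1/3 - 1/3)/(1 - 1/3) = 0$, so the two groups become directly comparable.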
To get our heterogeneity indicator, the IGI, we subtract the normalized HHI, $HHI^{*}$, from 1:
\[ IGI = 1 - HHI^{*} \]
The IGI measures the internationalization of a research group over a range from 0 to 1, being, as mentioned above, 0 when all inventors reside in the same country and 1 when each of them resides in a different one. Furthermore, being a standardized measure, it allows the comparison of groups with a different number of inventors. This means that, in a patent dataset such as the one used, each patent has its own IGI score and is comparable to all other patents on the basis of that score. Then, once the IGI of each patent has been calculated, the scores are summed by priority year and divided by the number of patents in that year, thus obtaining the average IGI for that year. In this way it is possible to describe the general trend towards internationalization of research groups in PATSTAT Global from 1992 to 2018.
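The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the study: the function names (hhi, igi, yearly_average_igi) and the toy data are hypothetical, the normalization is assumed to be $(HHI - 1/N)/(1 - 1/N)$ with N the number of inventors, and treating a single-inventor patent as IGI = 0 is our own assumption, since the normalized index is undefined for N = 1.

```python
from collections import Counter, defaultdict


def hhi(countries):
    """Herfindahl-Hirschman index of the inventors' countries of residence."""
    total = len(countries)
    counts = Counter(countries)
    return sum((c / total) ** 2 for c in counts.values())


def igi(countries):
    """IGI = 1 - normalized HHI for one research group (one patent)."""
    n_inventors = len(countries)
    if n_inventors < 2:
        # Assumption: a single inventor is treated as a fully homogeneous group.
        return 0.0
    h_norm = (hhi(countries) - 1 / n_inventors) / (1 - 1 / n_inventors)
    return 1 - h_norm


def yearly_average_igi(patents):
    """Average IGI per year from (priority_year, [country codes]) pairs."""
    by_year = defaultdict(list)
    for year, countries in patents:
        by_year[year].append(igi(countries))
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}


# Hypothetical toy data: three patents over two priority years.
sample = [
    (2010, ["HR", "HR", "HR"]),  # all inventors in one country
    (2010, ["HR", "DE"]),        # every inventor in a different country
    (2011, ["ES", "ES", "IT"]),  # the mixed example from the text
]
averages = yearly_average_igi(sample)
```

With this toy input, the first patent scores 0 and the second scores 1, so the 2010 average sits halfway between the two extremes.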

EPO'S PATSTAT
The main source of patent information is the PATSTAT database. It is a database with global coverage that contains bibliographic information on almost all the patents currently in use. PATSTAT consists of two separate products:
PATSTAT Global: contains bibliographic data relating to over 100 million patent documents from major industrialized and developing countries. It also includes legal event data from more than 40 patent authorities contained in the EPO World Legal Event Data (INPADOC), and bibliographic information on applications and publications;
PATSTAT EP Register: contains detailed bibliographic, procedural and legal event information for EP patents (Euro-PCT published).
It is a valuable tool for the research community because it contains raw data collected transparently. This rich database promises to dramatically improve the quality of empirical research in the field.
The database consists of a set of tables following a relational database schema, in which the tables can be linked to each other using a relevant key. The table on patent applications, called tls201_appln, is the central element of PATSTAT. The other tables contain information on each patent application, for example inventors and owners, technology fields, titles and abstracts, publications and citations.
Figure 2 shows the schema of the database.
To drastically reduce the calculation time, we ran our queries on the data contained in PATSTAT Global, Autumn Edition 2018 (hereinafter referred to as "PG light"), a database provided by the EPO itself as sample data of PATSTAT Global, ranging from 1992 to 2018. PG light has the same fields as the original database but a much smaller number of records, which allows simulations to be performed in a reasonable time but provides less accurate data.

LIBRARIES INVOLVED IN THE WORK
Libraries are sets of pre-written routines and functions that perform a specific task and can be called up as needed. We can consider a library as a set of modules stored within packages. Each module contains simple instructions and definitions, and the combination of various modules constitutes a library. Often the modules have already been written by other developers, so there is no need to start over each time. One of the first libraries imported into Jupyter Lab was Pandas. Once installed, it must be imported into the Python environment. Pandas, and in particular its objects such as Series and DataFrame, which are based on the array structure, provides efficient support for the data processing activity that occupies much of a data scientist's time, supplying the tools for data analysis in the Python language. The DataFrame, the fundamental structure in Pandas, can be thought of as a generalization of a matrix, or as a sequence of aligned Series objects. The package is open source and comes with different data structures that can be used for different data manipulation tasks. Pandas is a very popular library for retrieving and preparing data for later use with other libraries. It also allows data to be retrieved easily from different sources, for example SQL databases and text, CSV, Excel and JSON files. In this way the table tls201_appln was read into memory in tabular form. Once the data is in memory, there are dozens of different operations to parse, transform, fill in missing values and clean up the dataset, as well as SQL-like operations and a set of statistical functions to perform even a simple analysis. The next step was to import a second library, NumPy.

NumPy stands for Numerical Python and is the fundamental package for scientific computing with Python. It is one of the largest scientific and mathematical computation libraries for Python. One of the most important features of NumPy is its array interface, which can be used to express images, sound waves, or other raw binary streams as arrays of real numbers in N dimensions. Once tls201_appln has been imported, we create an array containing all the keys of the appln_id column and run a search query, in this case for all the keys associated with the country code HR (Croatia).
This series of steps made it possible to isolate the first column of the table tls201_appln, namely appln_id, and to extract only the patents registered in Croatia (HR) by 2020, using the appln_auth key.
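The selection step just described might look roughly like the following. This is a hedged sketch rather than the authors' code: the toy DataFrame stands in for the real tls201_appln table (which could be loaded from a CSV export, whose file name would depend on the local setup), and appln_filing_year is assumed to be the year column of interest.

```python
import pandas as pd

# Toy stand-in for tls201_appln; the values are hypothetical.
tls201_appln = pd.DataFrame({
    "appln_id": [1, 2, 3, 4, 5],
    "appln_auth": ["HR", "RS", "HR", "BA", "EP"],
    "appln_filing_year": [2011, 2012, 2013, 2013, 2014],
})

# Boolean mask selecting the applications filed with the Croatian office (HR).
is_hr = tls201_appln["appln_auth"] == "HR"
hr_patents = tls201_appln[is_hr]

# The appln_id keys of the selected applications, as a NumPy array.
hr_ids = hr_patents["appln_id"].to_numpy()
```

Replacing "HR" with "RS" or "BA" would give the corresponding selections for Serbia and for Bosnia and Herzegovina.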
At this point a new library was needed in order to create a graph representing the results obtained. This is where Matplotlib comes to the rescue. It is a standard Python library used for creating 2D charts and graphs. It is quite low-level, which means that it requires more commands to generate graphs and figures than some more advanced libraries. However, its main advantage is flexibility. With enough commands, you can create virtually any type of graph you want (from histograms and scatter plots to graphs with non-Cartesian coordinates). Matplotlib can be complemented by an additional Python library, Seaborn, which enhances the data visualization tools of the Matplotlib module. In this way we obtained a first graph of all the patents filed in Croatia from 1992 to 2018. The same procedure was used for the other Balkan countries, Serbia and Bosnia and Herzegovina, obtaining in each case the number of patents filed in that country from 1992 to 2018.
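A chart of patents per year of the kind described here could be produced along these lines. The snippet is an illustrative sketch under stated assumptions: the data are made up, appln_filing_year is assumed to be the year column, and the Agg backend is selected only so that the code also runs without a display (in Jupyter the figure would be rendered inline).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical subset of tls201_appln already filtered to Croatian filings.
hr_patents = pd.DataFrame({
    "appln_id": [1, 2, 3, 4, 5],
    "appln_filing_year": [2011, 2013, 2013, 2013, 2014],
})

# Number of applications per filing year.
per_year = hr_patents.groupby("appln_filing_year")["appln_id"].count()

# Bar chart of the yearly counts.
fig, ax = plt.subplots()
bars = ax.bar(per_year.index, per_year.values)
ax.set_xlabel("Filing year")
ax.set_ylabel("Number of patents")
ax.set_title("Patents filed in Croatia (toy data)")
```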
Once these first steps have been completed, it is possible to move on to the second step of the proposed analysis. To search for the inventors of each patent and their nationality, a new database table, tls207_pers_appln, had to be imported.
From the table we note that the tls207_pers_appln has the appln_id column in common with the tls201_appln; the first column, on the other hand, person_id, contains the references of those who participated in the patent.
For each extracted patent we then check how many person_id values are associated with it, and we do this by grouping the patents by year (e.g. 2010), so as to obtain results that indicate the internationalization strategies of that country for that year. Having obtained the person_id values, we use the tls206_person table to find the country code of each inventor, thus discovering their nationality and name. Thanks to this work of manipulation and search in the database, it was possible to obtain the information necessary for the calculation of the IGI.
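The linkage between the three tables can be sketched with Pandas as follows. This is an assumption-laden illustration, not the original notebook: the toy rows are invented, the column names invt_seq_nr and person_ctry_code follow our understanding of the PATSTAT schema, and we assume that invt_seq_nr greater than 0 marks a person as an inventor rather than an applicant.

```python
import pandas as pd

# Toy stand-ins for the PATSTAT link and person tables (hypothetical values).
tls207_pers_appln = pd.DataFrame({
    "person_id": [10, 11, 12, 13, 10],
    "appln_id": [1, 1, 1, 3, 3],
    "invt_seq_nr": [1, 2, 3, 1, 2],
})
tls206_person = pd.DataFrame({
    "person_id": [10, 11, 12, 13],
    "person_ctry_code": ["HR", "HR", "DE", "IT"],
})

# Keep inventors only (assumption: invt_seq_nr > 0 identifies inventors).
inventors = tls207_pers_appln[tls207_pers_appln["invt_seq_nr"] > 0]

# Attach each inventor's country code through the shared person_id key.
inventors = inventors.merge(tls206_person, on="person_id", how="left")

# Number of inventors and list of country codes for each application.
team_size = inventors.groupby("appln_id")["person_id"].nunique()
countries = inventors.groupby("appln_id")["person_ctry_code"].apply(list)
```

The resulting per-application lists of country codes are the input that a heterogeneity index such as the IGI needs.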

RESULTS: SOME EMPIRICAL EVIDENCE
The first results obtained concern the number of patents available from 1992 to 2018 in PATSTAT Global, Autumn Edition 2018. Out of a total of 279,881 patents, 150 were registered in Croatia, 68 in Serbia and only 9 in Bosnia and Herzegovina.
Starting from Figure 3 on the left, it is possible to notice a significant increase in the number of patents produced in Croatia in 2013. The significance of the data can be appreciated by placing it in the historical context of the reference period. The entry into the EU of Croatia [...] European Council confirms Serbia as a candidate country. Bosnia and Herzegovina, on the other hand, despite showing a peak in the same year, initiated a high-level dialogue on the accession process. It is therefore clear that, although the relations of the three countries with the EU are of a different nature, they show similarities in terms of patent production. Subsequently, for the purposes of our analysis of the internationalization of research groups, only the patents registered from 2010 to 2017 were taken into consideration. Over this period, the patents registered in Croatia fall from 150 to 100, those in Serbia from 68 to 56 and those in Bosnia and Herzegovina from 9 to 6. Figures 6-8 provide information about the degree of internationalization of the countries considered. The common factor of the three countries is the constant growth of the IGI in the years 2011-2012. As already mentioned, the historical context of that period is characterized by the relationship established with the EU, whose influence is visibly reflected in the inventive activity of each country. The IGI curve for Croatia is the only one to show constant growth over the entire time period considered; by contrast, the curves for Serbia and for Bosnia and Herzegovina show discontinuities or significant jumps. The IGI values for Croatia reflect a higher degree of heterogeneity: the curve lies within the range 0.4 to 1 (the maximum degree).
By contrast, the IGI of Serbia never reaches the value of 0.9 and shows a significant decline after 2012. There are many possible reasons why the IGI tends towards values close to 0. One hypothesis is a lack of collaboration with EU countries, explained by advantages that could not be obtained in the negotiation situation that characterized the years shown in the graph. The advantages in question could lie in market opportunities that can hardly be achieved except through facilitated partnerships between the countries of the European community. Bosnia and Herzegovina presents a completely different picture from Serbia, owing to the negligible number of patents available. Figure 8 in fact shows a rapid surge of the IGI curve, which reaches maximum values in a short time. We are aware of the countless variables that could affect the different collaboration strategies between EU and non-EU countries, such that inferring a fully reliable result is very difficult. On the other hand, the negotiations show a non-negligible degree of correlation with the internationalization index, which makes it possible to attribute, although not entirely, an effect of the negotiations on the inventive activity of the countries considered. Finally, returning to the EU area, we considered the degree of collaboration between Germany and Croatia. Figure 9 shows how noteworthy the level of German influence on Croatian inventive activity is. Although the proportions are not continuous, settling on values alternately close to or below 50, the German influence on the patents produced in Croatia remains a constant that accompanies inventive production throughout the time range considered. The maximum value was recorded in the sixth year, exceeding half of the total value.

CONCLUSIONS AND FINAL REMARKS: NEW INSIGHTS FROM BIG DATA?
The analysis carried out on the PATSTAT Global database, Autumn 2018 edition, made it possible to undertake a search on EU and non-EU patents by drawing on a sufficient amount of available data. The combined use of the Python programming language and the database made it possible to grasp the potential opportunities and the suitability of the system for providing computational support. In particular, the calculation of the IGI internationalization index made it possible to quantify numerically the heterogeneity of the nationalities of inventors on European and non-European patents.
The management of the database in Jupyter enabled some cross-cutting research, resulting in an insightful analysis based on a combination of data that mirrors the different historical backgrounds concerned. In fact, internationalization is also an expression of the negotiations that take place in the community and of the internal socio-political dynamics between countries. Data, or rather Big Data, contain phenomena that might not be evident at first glance; in this specific case, only the combined use of data made it possible to arrive at considerations on the collaboration between the countries of the Balkan area and European countries. Although the results offer a generic picture of the degree of internationalization in the Balkan area, the quality of the data available should have been higher. In fact, the level of cleanliness of the PATSTAT database was such that it was necessary to work on a light database, thus handling the critical issues on a more modest amount of data. However, the shortage of relevant information paradoxically emphasizes the value of the results obtained, which can in any case be deemed acceptable grounds for drawing reliable conclusions.