Knowledge economy classification in African countries: A model-based clustering approach

ABSTRACT The knowledge economy (KE) has been a central issue in the political-economic literature of advanced economies, but little research has focused on the transition towards a KE in Africa. Using a latent profile analysis, six clusters of the KE were found in the region. The clusters range from very prepared, with good performance in all KE dimensions (institutional, education, and innovation output), to very unprepared, with low performance in each KE dimension. Lastly, we offer policy recommendations that shed some light on national and international economic policies towards a more knowledge-oriented environment. One such recommendation is that effective policies should consider both the similarities and dissimilarities of African knowledge economies. How precisely this can be done is one direction future research can take.


Introduction
Knowledge-based economies (henceforth, KBE) have received central attention in key policy reports from international organizations such as the Organisation for Economic Co-operation and Development (OECD, 1996) and the World Bank (2007). 1 Popularized by Drucker (1969), the concept of the knowledge economy (henceforth, KE), which forms the basis of the KBE, was primarily introduced by Machlup (1962), who classified knowledge depending on its application to areas of economic activity. Before that, Stigler (1961) viewed knowledge as an economic category, with an emphasis on information-searching costs. Other scholars consider the KE an economic system in which knowledge is a key factor (or resource) of production and economic growth (see, for instance, Kochetkov & Vlasov, 2016). The fundamental determinants of the KE include ICT infrastructure, an enabling institutional regime, education, and innovation; yet formal empirical work on African knowledge economies remains scarce, and in one study covering 120 countries with indicators of technological knowledge, African countries were excluded from the sample. Lastly, there is only one study that applies the k-means clustering approach to categorize countries according to their level of KE based on the World Bank's KE procedure (Paz-Marín et al., 2018). Besides the fact that neither African nor emerging countries are included in the sample, this study has two inherent problems. The first is that the selection of clusters is not based on any statistical criterion, and the second is that the study does not allow countries to change from one cluster to another over time. Thus, as Rao and McNaughton (2019) demonstrate, the value of knowledge is diminished when the dynamic dimension and the panel nature of the design are neglected. In addition, by focusing on a sample of OECD countries, the study largely ignores that the conditions for, and transition towards, a KE differ considerably across countries.
Our main research question is: Are African countries technologically diverse or homogeneous in their transition towards a KE through its different dimensions? This question can be unbundled into the following sub-questions:
(1) What is African countries' potential to enable a KE society through the development of ICT?
(2) What institutional framework is needed to facilitate the transformation of national economies into capable KEs?
(3) What human capital is required to absorb the full potential of the innovations that support a KE?
(4) How have innovation outputs become a key driver of further advances towards a KE?
We argue that this question (and its sub-questions) justifies the investigation, because the current lack and deficiency of classification models in the KE literature may undermine the usefulness, for policy and future research, of composite indicators for quantifying progress towards a KE at the national level, especially in developing countries. We also argue that, given the importance policy makers place on African countries' progress towards a KE, a richer and deeper approach is needed as an alternative to simple composite indices for capturing the level of KE across countries. This study demonstrates the use of a refined model-based clustering approach to distinguish underlying homogeneous groups, or latent classes, of countries at similar levels of KE. The main idea is to create a taxonomy of countries that allows us to make comparisons and generate a new classification according to their respective KE levels. For that purpose, we employ a set of KE dimensions: education, institutional factors, innovation, and ICT infrastructure. Instead of picking KE dimensions arbitrarily, we take the Knowledge Assessment Methodology (henceforth, KAM) as a benchmark (see World Bank, 2008; Chen & Dahlman, 2006), and we apply a Gaussian Mixture Model (henceforth, GMM) clustering approach because of its advantages over traditional clustering techniques (such as k-means), which we describe later; in particular, GMMs provide an objective statistical criterion for determining the number of clusters present (Fraley & Raftery, 2002).
The relevance of a GMM, and our novel application of it, is that it permits us to determine endogenously how the different components of the KE play a role in the development context and how similar African countries are with respect to their levels of KE, which in turn allows us to generate a ranking or classification of countries that is dynamic and non-arbitrary (see, for a similar application, Abad-González & Martínez, 2017). We also contribute to the modeling strand of the literature in that our approach has both exploratory and confirmatory elements. However, since the existing literature is too thin to permit a categorical choice between exploratory and confirmatory analysis, and we are therefore unable to construct testable hypotheses in the conventional way, we favor a GMM approach over alternatives like k-means. The approach complements the judgement of stakeholders in identifying the real stage each country is in and how it is evolving or can move forward. This is a significant contribution because, even though composite indices like the KEI exist, it is not clear how they classify African countries. If countries are misclassified, then the measured effects of ICT or IT on development will be incorrect. To stress the point, in regression models ICT, IT, or collectively the KEI is often represented by an Africa dummy variable, which assumes that African countries are technologically homogeneous. Our classification model and analysis bring clarity to these matters and, by extension, to the nexus between development and ICT or IT.
Moreover, previous research also reveals the strengths and weaknesses of each African country over the different KE dimensions. Hence, the study enhances understanding of how African countries may improve their technological competitiveness and, consequently, the quality of life of their citizens. This is the overarching goal, and to accomplish it we borrow from the KAM initiative published by the World Bank (2012) and apply a GMM to classify 50 African countries over the period 1996-2017 according to their progress towards a KE. Our results show four representative clusters of KE levels: Very prepared, Prepared, Unprepared, and Very unprepared. Thus, African countries are technologically diverse, and statistical tests confirm the presence of distinct clusters in the data. One key conclusion we draw is that, even if they do exist, technology clubs are not static but dynamic, and they can coexist with technology gaps. Once the data is labeled, a classifier is applied. In this way, we provide a ranking for assessing the degree of competitiveness of African countries in the context of the KBE. Another conclusion is that different countries can belong to different clusters at different times for different reasons, which questions the relevance of static classifications. Thus, our study makes three significant contributions to the KE literature. First, most studies compare KE across a relatively small number of countries, mainly high-income ones. Our study focuses on a sample of 50 African countries. This is significant because the Africa region has been largely neglected in formal studies on KE. Indeed, the choice of sample might also lead to different outcomes in terms of KE convergence. Most of the empirical literature has focused on the macro determinants of the KE in African countries (Andrés et al., 2015; Andrés et al., 2017; Asongu et al., 2019; Asongu et al., 2018).
These studies have employed ad hoc statistical techniques to identify causality using longitudinal data. In particular, they employ generalized-method-of-moments estimators to deal with endogeneity issues, although many of the empirical outcomes may not be valid, as this method is quite sensitive to the choice of valid external or internal instruments. Our approach is to determine endogenously how similar African countries are with respect to their levels of KE, in a way that allows us to generate a dynamic ranking or classification of countries that is endogenous and non-arbitrary, which adds to the formal literature on KE.
The second contribution of the paper is that it also evaluates the state of the KE in African countries and explores the differences across the different KE dimensions. Hence, it fuels the debate over ways to measure the KE through composite indicators, as elaborated by several international organizations (the World Bank, the European Commission, and the World Economic Forum, among others). Finally, the study contributes to the existing literature by seeking to formalize the diverse typologies of KE in African countries within the economics literature using a model-based clustering approach rather than other cluster algorithms (for instance, k-means), thereby enabling us to create classificatory typologies of African countries' KE levels based on strong statistical criteria.
The remainder of the paper is as follows: In Section 2, we review the current literature in two subsections. The first subsection overviews the KE assessment methodologies, whereas the second describes clustering to justify why we selected GMM to study KE. Section 3 describes the variables and data we utilize, and the proposed methodology: the GMM technique. The results are presented and discussed in Section 4, while Section 5 concludes the paper with the implications for policy and further research.
2 Literature review

KE assessment methodologies
Conceptually, knowledge is a source of competitive advantage in the twenty-first century. This is not a new idea; economic theory has long appreciated the importance of knowledge in economic performance and human welfare (Dodgson & Gann, 2018, pp. 12-32). Schumpeter (2005), for example, saw technology as a key driver of economic growth and productivity that could both create and destroy jobs, and often does both simultaneously (Becker, 2005; Dodgson & Gann, 2018, pp. 12-32). This is Schumpeter's well-known "gale of creative destruction," and it is why the endogenous growth approach emphasizes knowledge as both an input and an output (Grossman & Helpman, 1991; Lucas, 1988; Rebelo, 1990; Romer, 1986). As an input, knowledge allows technological advancement and associated innovations to drive long-run growth. Nevertheless, on the empirical side, it is hard to net out the contribution of knowledge from total factor productivity because of the measurement problem (at either the micro or the macro level). Krugman (2013), for example, cast doubt on the empirical verification of the theory by pointing out that there are plenty of assumptions about how unmeasurable things affect other unmeasurable things. The doubt is understandable because knowledge also involves combinations of factors that interact in intangible ways. According to Kaplinsky (2005), there are several types of knowledge rent: technological, human-resource, organizational, and marketing and design.
Leaving aside the conceptual issues associated with the multidimensional concept of KE, the literature on measuring knowledge in developing countries has also been limited, despite efforts such as those by Samoilenko and Osei-Bryson (2008), who argue for a "context-specific micro-economic" assessment of national ICT capabilities within the DEA analytical framework. As Carter (1996) has pointed out, knowledge is difficult to measure not only at the firm level but also at the country level. The difficulty of quantifying knowledge and innovation has obstructed research and policy, leading to the assumption of homogeneous regional dummies as representations of innovation and technology, like the so-called Africa dummy now common in many growth regressions, in which countries are arbitrarily grouped according to their income levels (Azam et al., 2002; Barro, 1991; Barro & Lee, 1993; Burnside & Dollar, 1997; Collier, 2007; Collier & Gunning, 1999a, 1999b; Easterly, 2001; Easterly & Levine, 1997; Englebert, 2000; Jerven, 2011; Knedlik & Reinowski, 2008; Mauro, 1995). The results of these studies have clearly extended the Solow-Swan tradition, but just as clearly, they continue to leave the growth effects of technology and technological change unexplained. These gaps notwithstanding, there has been little research on these issues in African countries.
In this paper we emphasize the measurement of KE at the macro level. In this respect, a variety of composite indicators have been proposed that acknowledge the multidimensional aspect of the KE concept. Nevertheless, there is no clear consensus yet about the indicators employed to measure a KBE. One of the main criticisms is that existing composite indicators tend to be data-driven, meaning that they use only the information available across countries; countries for which data do not exist, or contain missing values, are simply left out (Shapira et al., 2006). For example, Archibugi et al. (2009) compare aggregate indicators of technological capabilities and conclude 2 that the country-level rankings show consistently significant discrepancies for some nations (World Bank, 2016). Moreover, these indicators are less suitable for capturing changes in technological knowledge over time. In many situations, the choice of indicators is restricted by data availability and the mutual interdependencies between combinations of input and output indicators. Lastly, these composite indicators rely on the quality and accuracy of the national statistical institutions, a "statistical tragedy" in Africa's case (Devarajan, 2013).
In this paper, we take a more holistic approach to examining the determinants of KE, assuming that their relative performance can be assessed through a benchmarking methodology. 3 This is an old concept in the context of organizational comparisons, but one that has nevertheless also been commonly used in country comparisons to identify and compare the degree of competitiveness of countries (see, for instance, Dolowitz & Marsh, 2000). Some examples of the benchmarking methodology are: first, the OECD's Going for Growth exercises, which identify five productivity-related policy priorities for each OECD member (OECD, 2005); second, the European Commission's Internal Market Scoreboard (European Commission, 2020), which ranks member countries' performance in implementing the legislation required for internal market convergence; and lastly, the KAM, which measures countries' capacity to compete in what the World Bank has named the knowledge economy (World Bank, 2007).
The formal literature leads us to employ the KAM (World Bank, 2007) framework for selecting the input and output indicators that measure countries' capacities to compete within the same KE (see Parcero & Ryan, 2017; Širá et al., 2020). The KAM is a reasonable starting point because World Bank researchers constructed the knowledge economy index (KEI) as an element of the general knowledge index (KI) in any national economy, in response to the need we outlined above. In addition, using data for the last available year for each indicator, Archibugi et al. (2009) show that this index has positive and high correlations, ranging from 0.47 to 0.92, with other composite measures of technological knowledge. Nonetheless, various composite indicators might measure different things; relatively high correlations simply lend support to the choice of indicators. The KAM is reasonable also because it is the most inclusive methodology for comparing and assessing the level of KE across countries, and it recognizes that the conditions leading to a KBE should include an institutional regime offering the right incentives, an educated and skilled labor force, a modern information infrastructure, and an effective innovation system. In short, the methodology involves four pillars or dimensions of KE and 148 indicators for 146 countries in the world, which are described briefly below (see Chen & Dahlman, 2006; World Bank, 2012):

Pillar 1 (Economic and institutional regime): It provides incentives for the creation, dissemination, and use of existing knowledge. It covers a diversity of issues and policy areas, ranging from aspects of the business environment, finance and banking, and the macroeconomic framework to regulations, governance, and institutional quality. The importance of institutions and their impact on economic growth has been widely recognized in the formal literature (Landes, 1998; North, 1990).
Although inadequate drivers on their own, institutions of governance have been shown to promote economic growth (Blackburn & Forgues-Puccio, 2010), and so too have the incentives that economic and financial institutions offer (Ryan & Shinnick, 2011; Tchamyou, 2016; Andrés et al., 2015; Kauffman et al., 2010; Chen & Dahlman, 2006). The selected proxy variables for this pillar are the corruption control index, regulatory quality, and the rule of law.

Pillar 2 (Education): Human capital is one of the most relevant pillars of the KE: education is a critical factor in the creation and dissemination of knowledge, and in its effective use. In addition, most new ideas and inventions are generated in knowledge clusters where scientific skills are required (Buesa et al., 2010; Marrocu et al., 2013). The selected indicators for the education pillar are gross primary, secondary, and tertiary enrollment rates.

Pillar 3 (Information and communication infrastructure, ICT): It facilitates the effective communication, processing, and dissemination of information. ICT can be defined as a combination of hardware, software, and communication networks that enable electronic information capture, storage, processing, and transfer. ICT and supporting technologies work in synergy in sustaining business activities and socioeconomic development (Borgmann, 2006). The economics literature has paid attention to the effects of ICT on innovation and socioeconomic development. The term ICT is largely used as an extension of, or interchangeably with, information technologies (IT).
Research suggests that total telephones and mobile phones positively influence innovation (see Carayannis et al., 2013). Moreover, empirical research has also documented the impact of ICT on economic growth (Driouchi et al., 2006; Thompson & Walsham, 2010; Tripathi & Inani, 2020; Datta & Agarwal, 2004). Furthermore, Chavula (2013) and Qureshi (2013), among others, find that telecommunications infrastructure plays an important role in promoting economic growth, while for Grant and Yeo (2018), ICT affects the economy through technology investment and financing in the manufacturing and service industries. ICT indicators such as total telephones per person, internet users per person, and fixed broadband internet are used as proxies for the ICT pillar.
Pillar 4 (Innovation system): This dimension is more concerned with innovation outputs. A good innovation system consists of an interconnected array of universities, research centers, firms, consultants, and other organizations that generate, assimilate, and adapt knowledge. Previous research has paid attention to the characteristics of national innovation systems and their relevance for economic growth and competitiveness (Lundvall et al., 2009). In terms of intellectual protection, it has been argued that stronger IPR protection leads to more innovation (Arrow, 1962). It is also clear that IPR systems are not well developed in many countries, as evidenced by the low levels of intellectual property creation measured by patents per capita. The same applies to universities as a source of new knowledge: African universities focus on teaching and lag behind in research, mainly because of inadequate computers and network systems, unstable power supplies, and limited capacity to pay for subscription content (see Marfo et al., 2011; Mitchell et al., 2020). Proxies such as scientific journal articles and patent applications in per capita terms are used to capture innovation.

Notice that countries should keep a proper balance among the four pillars to create, disseminate, and use knowledge efficiently. Clearly, all these pillars are interrelated and connected (see Figure 1). From theory, we know that knowledge and technology can contribute to a country's wealth, because the generation of wealth at both the country and firm level can be represented by a conventional production function in the following log-log framework:

log Q = β0 + β1 log K + β2 log L + β3 log KEI + ε,

where, at the country level, Q is a measure of development, the conventional factors K and L denote capital and labor, and KEI includes ICT and, so logically, IT.
In that sense, ICT is a relevant factor in the generation of wealth at the country level, as Torero and von Braun (2006) have illustrated at both the national and firm levels.
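The log-log framework just described can be sketched numerically. The following is an illustrative simulation, not the paper's actual estimation: we generate synthetic country data from an assumed production function with capital, labor, and a KEI-style index, then recover the coefficients by ordinary least squares. All variable names and coefficient values are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # synthetic country-year observations (illustrative only)

# Synthetic inputs: capital K, labor L, and a KEI-style index
K = rng.lognormal(mean=10, sigma=1, size=n)
L = rng.lognormal(mean=12, sigma=1, size=n)
KEI = rng.uniform(1, 10, size=n)

# Generate output Q from the assumed log-log production function
beta = np.array([0.5, 0.3, 0.4, 0.1])  # [const, K, L, KEI] -- assumed values
eps = rng.normal(0, 0.05, size=n)
logQ = beta[0] + beta[1]*np.log(K) + beta[2]*np.log(L) + beta[3]*np.log(KEI) + eps

# Estimate the coefficients by ordinary least squares
X = np.column_stack([np.ones(n), np.log(K), np.log(L), np.log(KEI)])
beta_hat, *_ = np.linalg.lstsq(X, logQ, rcond=None)
print(np.round(beta_hat, 2))  # should be close to the assumed betas
```

In this framework the estimated coefficient on log KEI would capture the elasticity of the development measure with respect to the knowledge index, holding conventional factors fixed.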
In addition, this framework allows us to understand fully a country's strengths and weaknesses relative to other countries, and it is therefore useful from the policy dimension, as it can reveal the problems and opportunities for which policy makers can design national strategies towards a KE. Lastly, we explicitly acknowledge that there are inputs and outputs in our conceptual framework: patent data is more an innovation output, while education is more a necessary input for acquiring technology or new inventions. Nevertheless, we do not explore the efficiency of economies on their way to a KE (see, for instance, Samoilenko & Osei-Bryson, 2008; van Bijon & Osei-Bryson, 2020) but rather how the dynamics towards a KE have changed along these four dimensions at the national level.
Each of the pillars outlined above has several indicators as proxy variables, each rescaled from zero to ten, so that a higher index means a higher KE level. Based on data from 2012 (the latest year), Sweden tops the list with a KEI score of 9.43, followed by Finland with 9.33. For comparison, the United States ranks 12th with a score of 8.77. Among the African countries in our sample, Mauritius is 61st with a score of 5.52, followed by South Africa with 5.21, Botswana with 4.31, and Namibia with 4.10. The African countries with the lowest KEI scores in our sample are Angola and Sierra Leone, with 1.08 and 0.87, respectively. Comparing African KEIs to the rest of the world, most Sub-Saharan African (SSA) countries are still in KE infancy, with only a few, such as Mauritius and South Africa, close to the transitional path towards a viable KE (see Agyapong & Oseifuah, 2015). As pointed out earlier, some scholars have in fact noted a decline in Sub-Saharan African countries' development towards a knowledge economy between 2000 and 2012, not only in the total KEI score but also in three of the KE pillars (education, ICT infrastructure, and institutional quality); see, e.g., Anyanwu (2012) and Asongu et al. (2018). Indeed, the education and ICT pillars are weak in comparison with the innovation dimension, which is a matter of profound concern and deserves research attention, because human capital and ICT infrastructure are among the main facilitators of a KE.
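The zero-to-ten rescaling behind these KEI scores is rank-based in the KAM. The following is a minimal sketch of such a normalization, assuming a simple "best rank gets the highest score" rule with ties broken arbitrarily; consult the original KAM documentation for the exact treatment of ties and missing data.

```python
import numpy as np

def kam_normalize(values):
    """Rank-based 0-10 normalization in the spirit of the World Bank KAM.

    The best-performing country gets the highest score and the worst gets 0.
    This is our reading of the KAM rules; ties are broken arbitrarily here.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    # rank 1 = highest raw value
    order = (-values).argsort()
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    return 10.0 * (n - ranks) / n

# Toy example: patent applications per million people for five countries
raw = [120.0, 35.0, 7.0, 0.5, 60.0]
print(np.round(kam_normalize(raw), 1))  # -> [8. 4. 2. 0. 6.]
```

Because the normalization is relative, a country's score depends on which other countries report data in a given year, which is one more reason rankings built on such indices can shift for reasons unrelated to domestic performance.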

Clustering overview: GMM vs. k-means
Clustering involves partitioning a set of objects into mutually exclusive subsets (clusters) such that the similarity between observations within each cluster is high, while the similarity between observations from different clusters is low (see Mardia et al., 1979). Before proceeding with an overview of clustering techniques and further justification of the choice made in our analysis, we discuss the reasons for clustering.
One reason is to find a set of natural groups and a corresponding description of each group (see, for instance, Samoilenko & Osei-Bryson, 2008). Moreover, this approach allows researchers to generate a classification of groups that is endogenous and non-arbitrary (rather than based on ad hoc cut-off points and weights). Hence, the use of cluster analysis assumes that there are natural groupings in the data. Secondly, cluster analysis is a powerful tool because it allows researchers to explore socio-economic phenomena arising from the interaction of organizations, technology, and people (Balijepally et al., 2011). Xiong et al. (2014) point out that cluster analysis should be used in combination with other research methods, for example in determining the number of clusters, validating clusters, and handling multicollinearity among variables.
Cluster algorithms can be categorized in various ways: partitioning (e.g., k-means, k-median), hierarchical, fuzzy, density-based, and model-based clustering (Hair et al., 2006). This paper utilizes the last of these, as its title indicates. We now provide a general overview of the model-based approach and highlight its differences from, and advantages over, classical approaches (e.g., k-means).
Model-based clustering advances earlier clustering methods, such as hierarchical and k-means clustering, that are heuristic and less formal, so much so that different runs of the same k-means algorithm can generate different results even when the user specifies the optimal number of clusters. Model-based clustering overcomes these weaknesses by treating the data as coming from a mixture of two or more component distributions, with each cluster described by its own probability distribution (Fraley & Raftery, 2002; Fraley et al., 2012). Theoretically, both k-means and GMM are partitioning clustering techniques: they divide the feature space into regions and represent each region by a prototype or centroid. The objective of this prototype is to be as representative as possible of the instances that fall in the region of space associated with it. In the case of k-means, the prototype is the mean vector of the feature vectors of the instances that have fallen in the region of the feature space associated with the prototype. These regions are bounded by linear edges and are called Voronoi regions. By contrast, in GMM techniques, Gaussian probability distributions ("models") are used as prototypes representing each region of space. Multi-dimensional Gaussians are characterized by a mean vector (the only representation that k-means uses for its prototypes) and a covariance matrix. In a way, the main advantage of using GMM over k-means is similar to the advantage of characterizing a population by a probability distribution (GMM) rather than only by the mean value of the population (k-means). There is no doubt that a probability distribution better characterizes the population.
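The difference in prototype richness can be seen directly in code. Below is a small scikit-learn sketch on synthetic data (our illustration, not the paper's implementation), showing that a fitted k-means model stores only a mean vector per cluster, while a fitted GMM stores a mean vector plus a full covariance matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Two elongated, tilted Gaussian blobs -- a shape k-means handles poorly
cov = np.array([[3.0, 2.0], [2.0, 2.0]])
X = np.vstack([
    rng.multivariate_normal([0, 0], cov, size=300),
    rng.multivariate_normal([8, 0], cov, size=300),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X)

print(km.cluster_centers_.shape)  # (2, 2): mean vectors only
print(gmm.means_.shape)           # (2, 2): mean vectors ...
print(gmm.covariances_.shape)     # (2, 2, 2): ... plus full covariances
```

The extra covariance parameters are exactly what lets the GMM prototypes stretch and rotate with the data, rather than being restricted to the spherical regions implied by k-means.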
Some of the consequences of this difference are: (1) k-means generates clusters with a spherical shape, while GMM provides more flexible representations by allowing the Gaussians to have different variation in different directions, as well as different orientations in space. Of course, in the case of using a Gaussian oriented with the axes of the feature space, and with the same variation in all dimensions, we have a sphere. In other words: everything that can be represented by a k-means prototype can be represented by a GMM prototype. The opposite is not true.
(2) In k-means, cluster membership is total: an instance either belongs or does not belong to a cluster. The real world is often more complex; for example, there may be countries in transition between two states, without fully presenting the characteristics of either. In GMM, each instance has an associated probability of belonging to each cluster. This probability could, of course, be 1 or 0, the cases in which membership is clearest and which are equivalent to the representational capability of k-means. But it can also take any value in between, so an instance can belong, for example, to two different clusters representing two different states. This endows GMM with a greater ability to characterize transitions between states. Moreover, the probability of belonging to a cluster is a measure of the confidence that the sample actually belongs to that cluster. In k-means, all samples that belong to a cluster belong to it with probability 1; in GMM we have a continuous probability in [0, 1], i.e., more detail about the certainty of membership. (3) The evaluation of k-means results is usually carried out with ad hoc geometric criteria such as the Davies-Bouldin index, Dunn's index (Havens et al., 2008; Xiao et al., 2017), the Levine-Domany index (Levine & Domany, 2001), the silhouette coefficient (Zhou & Gao, 2014), and other similar criteria (Bezdek & Pal, 1998). In these criteria, the quality of a clustering configuration rests on two basic ideas: (1) the instances that belong to a cluster should be as similar as possible to each other, and (2) they should be as different as possible from the instances of the other clusters.
These indexes try to quantitatively formalize both criteria using ad hoc strategies that seem reasonable to human intuition, but for which there is no mathematical proof of leading to an optimum configuration beyond satisfying the ad hoc criterion itself. Consequently, there is no guarantee that these different indexes will select the same clustering configuration when different criteria are applied to the same configurations. Furthermore, there is no mathematical proof that, if one of the tested clustering configurations corresponds to the true underlying data structure, the index in question would prefer that configuration over the others. It is possible to use all these indices to evaluate a clustering configuration obtained with GMM. However, this is typically not done; the Bayesian Information Criterion (henceforth, BIC) is preferred instead (Chen & Chen, 2008). The BIC penalizes complexity and rewards parsimony when comparing models that differ in the number of extracted clusters, while at the same time trying to maximize the likelihood of the observed data. BIC is not an ad hoc criterion but is grounded in a powerful mathematical formalism: Bayesian theory. It can be proven mathematically that, if BIC is used to find the model (here, the clustering configuration) that best fits a set of observed data among a set of candidate models, and the real model that generated the data (the "true model") is present in that set, BIC will always choose the true model, provided the data set is large enough to allow its adequate estimation (Neath & Cavanaugh, 2012). Note that BIC depends on the existence of mathematical models (probability distributions) for its computation, and therefore it cannot be applied to k-means.
The authors are unaware of any reason to prefer ad hoc geometric criteria (such as the silhouette coefficient) over BIC to evaluate a GMM cluster configuration; indeed, in the machine learning literature, BIC is by far the most common measure used when evaluating GMMs. (4) k-means is more sensitive to the initialization of centroids and has a greater tendency to get stuck at local minima than GMM. In this sense, we must highlight that the GMM implementation used in the paper uses a deterministic initialization of the Gaussians based on hierarchical clustering, which guarantees that the same results will always be obtained when the algorithm is executed on the same data.
In short, k-means is computationally more efficient than GMM, i.e. it requires less execution time. From the authors' point of view, this is the only advantage k-means has over GMM. In every other respect, GMM is superior to k-means: greater flexibility in the shape of the clusters (the approach does not bias the clusters towards a specific structure as k-means does), a more powerful description of them (probability distributions instead of mean values; the possibility that a sample belongs to several clusters with different probabilities instead of belonging to a single cluster), a robust mechanism for selecting the number of clusters (BIC), and less tendency to fall into local minima. This is to be expected given that the k-means algorithm is now more than 60 years old (Jain, 2010), and there has obviously been additional progress in clustering research over those 60 years. That said, for small to medium data sets (a few hundred or a few thousand data points, each represented by one or two dozen variables), GMM typically runs in less than a second, so this should not be a problem in practice. For large data sets (millions or tens of millions of data points), the GMM run time can be considerably longer, and k-means may be preferred as a less powerful but faster alternative (see, for instance, Márquez et al., 2019).
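To make the contrast concrete, the difference between hard k-means assignments and soft GMM memberships can be sketched with scikit-learn on synthetic data (an illustrative toy example, not the MCLUST pipeline used in the paper):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two overlapping synthetic groups in a two-feature space
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(3.0, 1.0, (100, 2))])

# k-means: each instance belongs to exactly one cluster (hard membership)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
hard_labels = km.labels_                 # one integer label per instance

# GMM: each instance has a posterior probability for every cluster
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft_probs = gmm.predict_proba(X)        # shape (200, 2); rows sum to 1

# A point midway between the groups gets a genuinely mixed membership,
# representing a "transition" case that k-means cannot express
transition_probs = gmm.predict_proba(np.array([[1.5, 1.5]]))
```

The soft memberships are exactly the continuous probabilities discussed above; any row of `soft_probs` close to (0.5, 0.5) flags a borderline instance.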

Overview of the dataset
We gathered the data from the World Development Indicators (see https://data.worldbank.org/indicator). Observations on the key variables were selected based on data availability; institutional variables from the World Bank, for instance, are available only since 1996. For that reason, our study covers the years 1996-2017. Table 1 displays the World Bank's classification of the 50 selected economies according to 2019 Gross National Income (GNI) per capita. 4 There is only one African country in the group of high-income countries: Seychelles. In the middle are countries classified as "upper middle income" economies (14%), while most African countries are concentrated in the lower middle income (40%) and low-income (44%) groups.
Variable definitions and data sources are displayed in Table 1(A). Table 2(A) presents descriptive statistics, while Figure 1(A) shows the correlation matrix of all variables employed in the empirical analysis. The data were analyzed using the R statistical package (R Core Team, 2019). Before implementing the clustering technique, we carried out a thorough descriptive analysis of the variables selected for our cluster analysis. We end up with 23% missing values. Table 2 shows the distribution of missing values across variables in our sample; most of them correspond to the fixed broadband variable for internet subscribers. Value imputation strategies are detailed in Table 3(A) in the Appendix.
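As a toy illustration of the missing-data bookkeeping (the paper's actual imputation strategy is the one detailed in Table 3(A); the per-variable median fill below is only a generic stand-in):

```python
import numpy as np
import pandas as pd

# Toy panel with missing values; in the real data about 23% of cells
# are missing, mostly in the fixed broadband variable
df = pd.DataFrame({
    "fixed_broadband": [0.1, np.nan, np.nan, 0.4, np.nan],
    "internet_users":  [2.0, 3.0, np.nan, 5.0, 6.0],
})

overall_missing = df.isna().to_numpy().mean()   # share of missing cells
per_variable_missing = df.isna().mean()         # share per variable

# A generic per-variable median imputation (stand-in for Table 3(A))
imputed = df.fillna(df.median())
```

Inspecting `per_variable_missing` first, as the paper does with Table 2, shows which variables (here `fixed_broadband`) drive the overall missingness before any imputation is chosen.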

Description of the methodology
We shall now describe the main elements of our clustering approach. It is crucial to describe the GMM in some detail since it is not as widely used as traditional methods in economics. 5 Here, the data generation process (DGP) is assumed to be given by a finite mixture of probability distributions f(X | θ), where X = (x_1, x_2, ..., x_i, ..., x_n) is an n×m matrix of n instances, each comprising m features, i.e. x_i = (x_i^1, ..., x_i^m); x_i represents one of the African countries, and each of the m features is one of the variables in Table 1(A); and θ = (θ_1, θ_2, ..., θ_g, ..., θ_K) are the parameters of the K Gaussian probability distributions that form the mixture (Verbeek et al., 2003), i.e. θ_g = {μ_g, Σ_g}, with μ_g the mean of the g-th Gaussian and Σ_g its covariance matrix. The likelihood of an instance x_i having been generated by the mixture of Gaussians is the weighted sum of the likelihoods that it was generated by each of the Gaussians. This implies that the density of x_i is given by a finite mixture of the form

f(x_i | θ) = Σ_{g=1}^{K} w_g · φ(x_i | μ_g, Σ_g),   (1)

where K is the number of Gaussians, φ(· | μ_g, Σ_g) is the multivariate normal density, and w_g acts as a weight that permits modeling the fact that different groups (clusters) may contain a different number of instances (Ahlquist & Breunig, 2012). Since there are n instances in total, the likelihood that the full set X was generated by the mixture is the product of the likelihoods of the individual instances x_i:

L(θ, w | X) = Π_{i=1}^{n} f(x_i | θ) = Π_{i=1}^{n} Σ_{g=1}^{K} w_g · φ(x_i | μ_g, Σ_g).   (2)
In practice, the data X are known, but the parameters of the mixture components θ = (θ_1, θ_2, ..., θ_g, ..., θ_K), as well as their weights w = (w_1, w_2, ..., w_g, ..., w_K), are not.
In model-based clustering, the parameters that maximize the likelihood of observing the complete data set are sought by means of the two-step Expectation-Maximization (EM) algorithm (Jung et al., 2014). In the first step (the E-step), the probability that each instance belongs to each of the mixture components is calculated. In the second step (the M-step), the parameters of the components and their weights are updated so as to maximize the overall likelihood. These two steps are repeated until the likelihood does not change, or until the changes in likelihood are negligible. Figure 2 below lays out the entire process implied by Equations (1) and (2).
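The two-step loop can be sketched, for the univariate case, with a minimal hand-rolled EM (a didactic toy, not the MCLUST implementation used in the paper; the quantile-based initialization is our own simplification):

```python
import numpy as np

def em_gmm_1d(x, K=2, iters=200):
    """Minimal EM for a univariate K-component Gaussian mixture."""
    n = len(x)
    w = np.full(K, 1.0 / K)                        # mixing weights w_g
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    var = np.full(K, x.var())                      # initial variances
    for _ in range(iters):
        # E-step: posterior probability that each point came from each component
        dens = (w / np.sqrt(2 * np.pi * var)
                * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from the posteriors
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4, 1, 500), rng.normal(4, 1, 500)])
w, mu, var = em_gmm_1d(x)   # recovered means should sit near -4 and 4
```

Each pass first computes soft memberships (E-step) and then re-fits the Gaussians against those memberships (M-step), exactly the alternation described above.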
In GMM algorithms, there is a robust statistical criterion that assists the analyst in selecting the optimal value of K: the classical BIC, defined as

BIC = −2 · ln L(θ̂) + p · ln(n),   (3)

where L(θ̂) is the maximized likelihood given by Equation (2), n is the number of observations, and p is the number of parameters of the model (the number of parameters of all Gaussians in the case that concerns us). The smaller the value of the BIC, the stronger the evidence in favor of the corresponding model; i.e. BIC prefers simple models (a smaller number of parameters p implies a lower value of p · ln(n), given that n is constant) that have high likelihood (the term −2 · ln L(θ̂) decreases as the likelihood increases). The number of clusters is not considered an independent parameter for the purposes of computing the BIC. By calculating the BIC for different values of K and looking for the one that minimizes Equation (3), we can find the optimal number of clusters according to this criterion. Although Equations (1)-(3) may appear complicated, the steps involved in our modeling approach can be easily summarized, as Figure 3 below illustrates.
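The K-selection loop based on Equation (3) can be sketched with scikit-learn, whose GaussianMixture.bic follows the same smaller-is-better convention (a toy example on synthetic data, not the paper's MCLUST run):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data with three well-separated groups
X = np.vstack([rng.normal(m, 0.5, (150, 2)) for m in (0.0, 4.0, 8.0)])

# Fit a GMM for each candidate K and record its BIC
bic = {}
for k in range(1, 11):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic[k] = gm.bic(X)           # Equation (3): smaller is better

best_k = min(bic, key=bic.get)   # BIC recovers the true number of groups
```

With clearly separated groups, the likelihood gain from extra components beyond the true K is small, so the p · ln(n) penalty dominates and the minimum-BIC model has the correct number of clusters.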
For the problem under study, we use the MCLUST package originally developed by Fraley and Raftery (1998) and implemented for the R language by Scrucca et al. (2016) for clustering based on GMMs. We perform clustering with K = 1 up to K = 10 and evaluate the quality of each clustering configuration using the BIC criterion. Our focus here is to describe the KE according to the four dimensions outlined previously. The unit of observation is the country (i)-year (t) pair. In this context, countries may stay in the same cluster over the entire period or move up or down to other clusters. We present and discuss the results of the analysis next.

Findings and discussion of results
After imputation, we ran the GMM-based clustering over the data, trying models with 1 to 10 clusters. The best clustering solution, based on the BIC criterion (see Equation (3)), was obtained for 6 clusters. A summary of the selected mixture model is displayed in Table 3.
The cluster sizes are reasonably balanced. The first cluster contains 273 observations, the second 131, the third 143, the fourth 326, the fifth 164, and the sixth 113; 28% of the data points belong to cluster 4, and only 10% to cluster 6. Based on this statistical criterion, we select six clusters. Recall that MCLUST reports BIC with the opposite sign to Equation (3), so we look for the model that maximizes the reported value (see Table 3; the best model is VEV = ellipsoidal, equal shape) with 6 clusters. In the parameterization of the covariance matrix, VEV means variable volume, equal shape, and variable orientation. In our application, we obtain a BIC value of −14,513.05. The Integrated Completed Likelihood criterion 6 (ICL = −14,624.42) is nearly identical to the BIC, implying that the E-step and M-step generated stable probabilities. Figure 4 displays the cluster plots for the initial classification into six clusters. The figure results from a principal components analysis (PCA) that projects our data onto the first two principal components (linear combinations of our original variables). The two dimensions (Dim2 on the y-axis and Dim1 on the x-axis) capture most of the variation in our data (around 58%): the first component explains 44% of the variation and the second accounts for 14%. This may sound too technical for our non-expert readers, but the main point is simple: there are clusters in the data.
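The projection behind Figure 4 can be sketched as follows (synthetic correlated data standing in for the KE indicators; the 58%/44%/14% figures above come from the real data, not this toy):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Six correlated synthetic indicators driven by one latent factor
latent = rng.normal(size=(300, 1))
X = np.hstack([latent + 0.3 * rng.normal(size=(300, 1)) for _ in range(6)])

pca = PCA(n_components=2).fit(X)
coords = pca.transform(X)                  # (Dim1, Dim2) used for plotting
explained = pca.explained_variance_ratio_  # variance share per component
```

Because KE indicators are strongly correlated, a two-component projection such as `coords` preserves most of the variance, which is what makes the cluster plot meaningful.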
Examining the entropy values for the K clusters suggests merging from 6 to 4 clusters, since the decrease in entropy is large. Note that entropy is only an exploratory tool that can help us separate the clusters, rather than a formal inference tool (see, for instance, Baudry et al., 2010). 7 A lower entropy means a better-separated clustering. Moreover, from a simple inspection of the radar plots (see Figure 5(a,b)), we can see that Cluster 3 and Cluster 4 are quite similar. Further analysis of these two groups reveals that Cluster 4 performs badly in the education dimension (in particular, secondary and tertiary school enrollment). Both clusters are similar in the ICT dimension and perform equally in terms of scientific output. If we look at the institutional variables, Group 3 is slightly better than Group 4, but Group 4 is better in terms of trade openness. Lastly, we also merge cluster 1 with the previously merged clusters 3 and 4, resulting in four clusters. Table 4 provides the cluster means.
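The entropy criterion can be sketched as follows (a toy computation following the idea in Baudry et al. (2010), using scikit-learn posteriors rather than the paper's MCLUST output):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def clustering_entropy(post):
    """Entropy of a soft clustering: -sum_i sum_k z_ik * ln(z_ik).
    Lower values indicate better-separated clusters."""
    z = np.clip(post, 1e-12, 1.0)    # avoid log(0)
    return float(-(z * np.log(z)).sum())

rng = np.random.default_rng(0)
separated = np.vstack([rng.normal(0, 0.3, (100, 2)),
                       rng.normal(5, 0.3, (100, 2))])
overlapping = np.vstack([rng.normal(0, 1.5, (100, 2)),
                         rng.normal(1, 1.5, (100, 2))])

e_sep = clustering_entropy(
    GaussianMixture(2, random_state=0).fit(separated).predict_proba(separated))
e_ovl = clustering_entropy(
    GaussianMixture(2, random_state=0).fit(overlapping).predict_proba(overlapping))
# Overlapping groups yield much higher entropy than well-separated ones
```

Fuzzy clusters such as the paper's Clusters 3 and 4 contribute large per-instance entropy terms, which is why merging them produces a large entropy drop.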
Having grouped our six initial clusters into four, we now perform statistical tests to check whether the clusters are statistically different. We apply a Kruskal-Wallis test (henceforth, KKW; see Kruskal & Wallis, 1952) 8 to each numerical variable in our dataset. The results of the KKW test are statistically significant (p-value < 0.01). Finally, for pairwise comparisons across clusters, we carry out a Wilcoxon test (see Bauer, 1972). The results also show statistically significant differences between paired clusters (1-2, 2-3, 3-4, 1-3, 1-4, and 2-4), with p-values < 0.01.
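The two tests can be sketched with SciPy on hypothetical cluster data (the per-cluster values below are made up purely for illustration):

```python
import numpy as np
from itertools import combinations
from scipy.stats import kruskal, ranksums

rng = np.random.default_rng(0)
# Hypothetical per-cluster values of one KE variable
clusters = {
    1: rng.normal(6.0, 1.0, 80),
    2: rng.normal(4.0, 1.0, 80),
    3: rng.normal(2.0, 1.0, 80),
    4: rng.normal(0.5, 1.0, 80),
}

# Global test: do the four clusters differ on this variable?
h_stat, p_global = kruskal(*clusters.values())

# Pairwise Wilcoxon rank-sum tests between every pair of clusters
pairwise = {(a, b): ranksums(clusters[a], clusters[b]).pvalue
            for a, b in combinations(clusters, 2)}
```

Both tests are rank-based, so they do not assume normality of the KE indicators, which is why they suit these skewed country-level variables.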
Since two of the six initial clusters are fuzzy (nearly similar), we can categorize countries into four groups according to their transition towards a KE. We assign labels to these clusters rather than relying solely on descriptions to explain each profile. Let us describe each of the clusters:

Cluster 1 (Very prepared): This cluster contains the country-year observations that perform best in each of the KE dimensions. They are national economies with high-quality institutional frameworks. They perform slightly better in terms of innovation output (patents and scientific articles), perform better in terms of educational variables, and are also strong in terms of ICT indicators.

Cluster 2 (Prepared): This cluster follows the country-year observations in Cluster 1; there is a clear hierarchy according to the mean of the features of each KE dimension. For instance, there is a significant difference in the education indicators of Cluster 2 compared with those of Cluster 1. The innovation variables are similar in terms of patents per capita, but there is a difference in performance related to scientific articles.

Cluster 3 (Unprepared): This cluster shows low performance in education (low primary, secondary, and tertiary school enrollment) and in innovation (low numbers of patents and scientific articles per capita). On the institutional variables, its values are on average quite similar to those of Cluster 2. The countries classified in this cluster show low values of ICT (internet users and broadband internet penetration).

Cluster 4 (Very unprepared): This cluster groups observations with low performance in every KE dimension.
Whereas Table 4(A) in the Appendix presents a complete country-year picture of the composition of the four clusters, Table 5 compares Algeria and Botswana as an example. The table tells us that different economies are differently prepared for different reasons and at different times, i.e. clusters are not static. Algeria started off unprepared for the transition to the KE, held back by weakness in all the KE dimensions. Improvements in the education and innovation dimensions allowed the transition to Cluster 2, but it was not completed until the quality of institutions improved, made possible by the end of civil strife and national reconciliation (Hamdy, 2007). Botswana, on the other hand, took a different path: the country's stable, good-quality institutions permitted it to jump from being unprepared (Cluster 3) to being very prepared (Cluster 1). However, because the education, innovation, and ICT dimensions of KE in Botswana remained weak, the quality of institutions has been the primary driving force. It makes good sense, then, that Botswana's KE is sensitive to shocks (real or perceived) to the general economy, so that in 2009, 2011, and 2015 its KE slipped back to Cluster 2. These findings are consistent with what we discern from UNESCO's (online) reports and from Hamdy (2007), Isaacs (2007), and Ouedraogo et al. (2021).
Lastly, Figure 6 displays the classification tree representation for African countries. Classification trees are easy to interpret and can give us an indication of which variable is most relevant within each cluster (see Samoilenko & Osei-Bryson, 2008). This is by no means a way to validate our cluster formation; it is a way of providing plausible, easily interpretable explanations of which variables can explain to which cluster each country belongs. The subsets created by the splits are called nodes, and the subsets that are not split are called terminal nodes. Each terminal node is assigned one of our two labels (prepared and unprepared). At the top is the root node with the TELEP3 variable. If FIXBI is lower than 6.915, we move down to another node, where we ask whether the level of patents per capita is lower than 0.004725; if so, the observations are classified as unprepared, and otherwise as prepared. For the sake of interpretation, this final node tells us that 897 observations fall in the unprepared class, and 14 are incorrectly classified (Table 6). According to our decision tree, 180 observations in the prepared cluster can be explained with ICT variables, and only 22 with the innovation indicators (patents and scientific publications per capita).
We also computed the overall accuracy of our model: 97%. 9 Table 6 displays the confusion matrix for our classifier once all clusters are merged into two categories: 899 observations were predicted to be in the unprepared class and actually were, and similarly 225 observations were predicted to be in the prepared class and actually were. Seven observations are false positives, and 19 are false negatives. This classification tree also has remarkably interesting policy implications; it gives us relevant information on how countries end up at a similar stage of KE via different paths. These results clearly demonstrate that one-size-fits-all policies are mistaken: African KEs are similar, but not identical, and hence require specific policies targeted at the dimensions in which their weaknesses lie. The findings also show that the observed differences hold irrespective of the level of development shown in Table 1, political institutions, or region.
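The tree-plus-confusion-matrix workflow can be sketched with scikit-learn's DecisionTreeClassifier standing in for rpart (the thresholds 6.915 and 0.004725 are taken from the text, but the data and labels below are entirely synthetic):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(0)
n = 400
# Hypothetical features: fixed broadband (FIXBI) and patents per capita
fixbi = rng.uniform(0, 20, n)
patents = rng.uniform(0, 0.01, n)
X = np.column_stack([fixbi, patents])
# Hypothetical labels: "prepared" (1) if either indicator clears its threshold
y = ((fixbi > 6.915) | (patents > 0.004725)).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
pred = tree.predict(X)
acc = accuracy_score(y, pred)
cm = confusion_matrix(y, pred)   # rows: true class, columns: predicted class
```

Because the synthetic labels are an OR of two threshold rules, a depth-2 tree can represent them exactly; on the real data, accuracy and the confusion matrix are computed in the same way from the fitted rpart tree.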

Conclusions
The aim of this paper is to apply a GMM methodology as an alternative to simple composite indices for capturing the level of the KE across countries over time. To accomplish this goal, we perform GMM clustering. Countries were grouped by the classical dimensions that characterize the KE at the country level according to the World Bank's KAM, which consists of four dimensions: education, economic and institutional regimes, ICT, and innovation. The method we employ is a promising technique that better aligns an empirical approach to understanding the KE with recent theoretical frameworks. The clustering analysis yielded four groups: very prepared (Cluster 1), prepared (Cluster 2), unprepared (Cluster 3), and very unprepared (Cluster 4). Further analysis traced the evolution of each country over time, as a country can belong to different clusters in different years for different reasons. The results are consistent with the literature on both technology gaps and technology clubs. A simple interpretation of the latter is that technology clubs are dynamic, not static; the former suggests that technology gaps can exist among countries belonging to the same club.
The results clearly indicate that nations, irrespective of their levels of economic development, face different issues in their KE pillars. Blanket policies may be necessary, but they are inadequate promoters of KE. In light of the findings, we conclude that not all African countries are KEI-challenged; countries in Cluster 1 did well at least in some years. However, the evolution of the KEI even in this cluster is not static; it depends on changes in the features of the pillars. If this result holds, i.e. the KEI is generally dynamic and countries move in and out of clusters, then the World Bank's ranking of countries by their KE levels is severely misleading, as some developing countries may well have stronger KEs than developed countries in some years. This is an area that needs further inquiry, because our findings do not agree with previous studies concluding that technology clubs are dynamic only for high-income countries and static only for low-income countries. An important implication of these findings is that pooling countries into one class overlooks the heterogeneity in the KE process and leads to incorrect conclusions about the KE in African countries. Moreover, the findings are not optimistic: most African countries are not prepared for progress towards a KE.
Our research methodology helps clarify what we know of IT for development: the use of an advanced clustering technique, together with robust imputation of missing values, allowed us to find meaningful clusters of countries and thereby better understand the KE phenomenon in Africa. In particular, we now understand more clearly the dimensions, or forces, behind the transition of African countries towards a KE, and which dimensions (institutional, education, ICT, and innovation) each African country should reinforce to facilitate its transition towards a stronger KE and, hence, sustainable social and economic development.
Two extensions from a methodological point of view can be considered. First, the choice of variables in each dimension can be somewhat arbitrary and introduce selection bias into the clustering, because they are only proxy variables for each dimension. Even so, we see two directions for further examination. One, future research can explore additional or alternative ways of measuring each dimension of the KE at the national level. In this respect, the methodology presented by Fop and Murphy (2018) can be employed for variable selection within model-based clustering: they compare models with different variables based on the BIC criterion, in an approach similar to stepwise regression, employing forward/backward and backward/forward searches. While we are aware of the potential benefits of alternative methodologies, we set them aside in favor of the obvious advantage offered by the KAM dimensions. Even with this weakness, our approach is novel, the problem we addressed is relevant, and the results we obtained are robust and informative for both policy and future research.
Secondly, another extension of the current work is the use of latent profile analysis in a panel data context. This approach has greater complexity, accounting for both the time and cross-sectional dimensions and capturing the variability in each KE indicator over time. Such models have been investigated extensively in other fields, such as applied statistics and biostatistics; nevertheless, little attention has been paid to socio-economic applications. Some exceptions are Fruhwirth-Schnatter and Kaufmann (2008) and Fruhwirth-Schnatter (2011). As more data on the variables under study become available for African countries, additional studies should be conducted to improve the robustness of these findings.
The potential implications of our paper in terms of ICT for development, as discussed earlier, include the fact that ICT is a key driver of economic growth and wealth generation. Our results show that only a small group of African countries perform well in the ICT dimension; this group is experiencing rapid growth in the adoption of ICT. We also show that broadband is not a relevant variable for this group at this stage, whereas the total number of mobile subscribers would be a more important dimension for both policy and further research to stress. It is also important to note that prepaid mobile card subscriptions are quite important, owing to weak ICT infrastructure and the lack of optical cables in these countries. Future research should take this into consideration because of its implications for development in those countries.
Finally, this methodology can be valuable from a managerial point of view as an additional tool in the decision-making process: it allows managers to identify a country's current stage, its evolution over time, and what needs to be done to facilitate its progress. It is a benchmarking methodology that allows us to compare results with reference countries belonging to the same group and to learn from their best practices. This analytical approach can also help policy makers, in a similar vein to the composite indices of KE constructed by international organizations, but with the added benefit of a robust classification even in the presence of missing data, as is often the case in developing, and especially African, countries. Indeed, we can also simulate different scenarios that could help anticipate a country's group according to the values of the variables employed to empirically measure the KE.

Notes
1. The OECD defines KBEs at a very general level as economies that are directly based on the production, distribution, and use of knowledge and information (OECD, 1996, p. 7).
2. They employ the Technological Readiness Index and the technological innovation index from the World Economic Forum, the Technological Advanced index edited by UNIDO (United Nations Industrial Development Organization), the Global summary index from the European Commission, the Technological Activity Index (TAI) from UNCTAD, and ArCO (Archibugi & Coco, 2004).
3. Benchmarking can be defined as a sequence of activities that involves process and assessment (see Watson, 1993).
4. Recent applications of mixture models in different contexts can be found in Csereklyei et al. (2017), Sulkowski and White (2016), Alfo et al. (2008), Seo and Thorson (2016), Kumar (2019), and Clement (2020).
5. Available at https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups
6. See Biernacki et al. (2000) for further details.
7. Details available upon request.
8. This is a non-parametric statistical test that assesses differences among three or more independently sampled groups on a single, non-normally distributed continuous variable.
9. R code is available upon request. The classification tree was generated with the rpart package (Therneau & Atkinson, 2019); rpart stands for recursive partitioning and regression trees. In our context, since the outcome variable is a factor, rpart fits a classification tree. By default, the rpart() function uses the Gini impurity measure to split nodes; the higher the Gini impurity, the more mixed the instances within a node.