Open access peer-reviewed chapter

Logistic Regression: Risk Question for Disabled People

Written By

Paulo Tadeu Meira e Silva de Oliveira

Submitted: 04 February 2022 Reviewed: 30 June 2022 Published: 26 August 2022

DOI: 10.5772/intechopen.106212

From the Edited Volume

Recent Advances in Medical Statistics

Edited by Cruz Vargas-De-León


Abstract

All over the world, since ancient times, disabled people have had worse health, less education, lower economic participation, and higher poverty rates than non-disabled people. For disabled people to achieve better and more lasting prospects, they must be empowered, and the barriers that prevent them from participating and being included in the community, having access to quality education, finding decent work, and having their voices heard must be eliminated. In statistical terms, a useful tool to support and monitor public policies in this area is a risk index for disabled people (long-term physical, hearing, intellectual, or sensory disability), proposed for continuous use. The index evaluates which factors are associated with this risk, as well as the intensity and direction of each factor, and generates a final score that can be ordered or classified according to the probability of a non-disabled person becoming disabled. For the Brazilian case, we propose the use of binary and ordinal logistic regression to select the most significant factors using criteria such as AIC and BIC and to calculate the risk probability for different disabilities (visual, hearing, physical, and intellectual). The sample comprises 20,800,804 respondents to the Complete Questionnaire of the 2010 IBGE Census.

Keywords

  • disabled people
  • disability risk
  • variable selection
  • model selection
  • stereotype ordinal logistic regression

1. Introduction

According to the World Health Organization (WHO), in 2010 more than one billion people, about 15% of the world population, were estimated to live with some type of disability. In Brazil, according to the Brazilian Institute of Geography and Statistics (IBGE), an estimated 45.6 million people, approximately 23.9% of the Brazilian population, lived with some type of disability in 2010. In general, disabled people have worse health prospects, lower educational attainment, lower economic participation, and a higher rate of poverty than non-disabled people.

Disabled people make up a group of excluded people who have always aroused feelings that range from repulsion to extreme pity and have even been considered less human or lacking in humanity. Currently, within the scope of social and educational inclusion policies, they have become the target of affirmative actions, which seek to guarantee their rights in various aspects of life in society [1, 2].

It is believed that the poor working conditions of disabled people are due to factors such as difficulty in accessing education, inadequate infrastructure, prejudice, limited knowledge, and a lack of adequate accessibility in schools and companies. These factors leave disabled people with less education, which makes it difficult for them to enter the formal job market [3].

In order for disabled people to achieve better and more lasting prospects, it is necessary to empower them and remove barriers that prevent them from participating in the community, accessing quality education, finding decent work, and having their voices heard [4].

To better assess the needs of disabled people, it is necessary to describe this group and answer questions such as: How many are there? Where do they live? How do they live? What implications does disability have for these people's autonomous and comprehensive access to the different human services? In short, how can disability influence the quality of life of these people?

In statistical terms, few formal studies exist. Among them, those based on census data stand out, since they allow questions such as: How are disabled people distributed across the country? How can the access of disabled people to the different services mentioned earlier be assessed? How has the disabled population evolved in comparison with the population without disabilities? Which variables contribute most to cases of disability? How do disabled people compare to people without disabilities? Answering these and other questions can contribute to better support for these people, so that they can be better assisted and so that resources can be better managed and optimized by public policy actions in this area.

Statistically, a useful tool to assist in the monitoring of public policies in this area is the risk index, which evaluates which factors have the greatest impact on this risk, as well as their intensity and direction, and generates a score that can be ordered or classified according to the probability of people becoming disabled. In this work, we propose the use of binary and ordinal logistic regression to select the most significant factors by applying criteria such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Deviance Information Criterion (DIC), and to calculate the risk probability for the different disabilities (vision, hearing, movement, and intellectual) by state, region, and country, using the sample dataset composed of 20,800,804 respondents to the Complete Questionnaire of the 2010 IBGE Census.

In a previous work [1], we considered as response variables the different disabilities and the existence of at least one disability as a binary variable, that is, whether or not a given individual is a disabled person. In this work, we consider the different disabilities together with their degrees of severity, and the number of disabilities, as ordinal response variables, which provides more information and a better model fit.

In Section 2, we introduce the problem; establish and characterize the variables to be used; present the stereotype ordinal logistic model, the selection of variables using the Wald test, and the selection of models using the AIC, BIC, and DIC criteria; and define the risk of disability for the degrees of severity “cannot at all,” “can, but with great difficulty,” and “can, but with a little difficulty” for visual, hearing, and physical disability. For intellectual disability, the risk is defined for the levels “has” or “does not have” intellectual disability. In Section 3, we present results and discussions; and, in Section 4, we present conclusions and suggestions for future work.


2. Materials and methods

2.1 Motivation

For better inclusion of disabled people, it is important to know which factors most affect their conditions. In this work, we propose fitting stereotype ordinal logistic models that incorporate the most significant factors, using AIC, BIC, and DIC as selection criteria, and determining the risk of disability for the sample of respondents to the Complete Questionnaire of the 2010 IBGE Census.

2.2 Data description

The variables were obtained directly from the Complete Questionnaire applied to the census sample. The data can be found on the website www.ibge.gov.br under the 2010 Census, sample, and microdata; a more detailed description is given in Oliveira [1].

2.3 Ordinal logistic regression

A good number of the variables used in the social sciences and humanities are ordinal. Often, the dependent variable takes discrete values, or sortable categories, but the distance between them is neither known nor constant. For example, the level of severity of visual, hearing, or physical disability set out in the 2010 Demographic Census Sample Questionnaire can be classified as “cannot at all,” “can, but with great difficulty,” “can, but with a little difficulty,” and, finally, “no problem” to hear, see, or get around. Intellectual disability is divided into “has” or “does not have.”

Among the possible models for ordinal logistic regression, the following stand out: the proportional odds model, best suited for interpretation when the response variable is continuous and has been categorized; the continuation ratio model, suitable when there is specific interest in a particular category of the response variable; the partial proportional odds model, an extension of the proportional odds model that keeps the proportional odds assumption for the covariates that satisfy it and includes category-specific parameters for the covariates that do not; and, finally, the stereotype model proposed in [5, 6, 7], used when the response variable is ordinal but is not a discrete version of some continuous variable, which is the model considered in this research.

For this work, the response variables are visual, hearing, physical, and intellectual disabilities, which are ordinal variables. In view of this, we adopted the stereotype model.

2.3.1 Stereotype model specification

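Suppose that the dependent variable has J categories (m = 1, …, J) and consider k predictors (j = 1, …, k). The stereotype ordinal model starts from the multinomial regression model, to which the condition $\beta_{m\mid J} = \phi_m \beta$ is added, where J is the reference category. The multinomial regression model is given by:

$$\Pr(y = m \mid x) = \frac{\exp(\beta_{m\mid J}\, x)}{\sum_{j=1}^{J} \exp(\beta_{j\mid J}\, x)}, \quad \text{with } m = 1, \ldots, J. \tag{1}$$

Replacing $\beta_{m\mid J} = \phi_m \beta$ in Eq. (1) results in the stereotype model, which can be written as:

$$\Pr(y = m \mid x) = \frac{\exp(\phi_m \beta x)}{\sum_{j=1}^{J} \exp(\phi_j \beta x)} = \frac{\exp(\phi_m \beta_0 + \phi_m \beta_1 x_1 + \cdots + \phi_m \beta_k x_k)}{\sum_{j=1}^{J} \exp(\phi_j \beta_0 + \phi_j \beta_1 x_1 + \cdots + \phi_j \beta_k x_k)}, \quad \text{with } m = 1, \ldots, J. \tag{2}$$

Because some parameters of Eq. (2) are not identifiable, we reparameterize the intercept terms as $\theta_m \equiv \phi_m \beta_0$ (m = 1, …, J) and adopt the sign convention in which the slope terms $\phi_m \beta_j$ (m = 1, …, J and j = 1, …, k) enter with a negative sign, imposing the constraints $\theta_J = 0$, $\phi_J = 0$, and $\phi_1 = 1$. Thus, from Eq. (2), the stereotype model can be written as follows:

$$\Pr(y = m \mid x) = \frac{\exp(\theta_m - \phi_m \beta x)}{\sum_{j=1}^{J} \exp(\theta_j - \phi_j \beta x)}, \tag{3}$$

with m = 1, …, J, where $\theta_J = 0$, $\phi_J = 0$, and $\phi_1 = 1$.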
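As an illustration of Eq. (3), the following minimal Python sketch evaluates the category probabilities for one covariate vector. It assumes the form exp(θ_m − φ_m βx) in the numerator; the function name and all parameter values are hypothetical and are not estimates from the census data.

```python
import math

def stereotype_probs(theta, phi, beta, x):
    """Category probabilities exp(theta_m - phi_m * (beta . x)) / sum over all categories."""
    eta = sum(b * xi for b, xi in zip(beta, x))         # linear predictor beta . x
    scores = [t - p * eta for t, p in zip(theta, phi)]  # theta_m - phi_m * eta
    top = max(scores)                                   # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters for J = 4 categories and k = 2 predictors.
theta = [1.5, 1.0, 0.5, 0.0]   # theta_J = 0 (identification constraint)
phi = [1.0, 0.6, 0.3, 0.0]     # phi_1 = 1 and phi_J = 0 (identification constraints)
beta = [0.8, -0.4]
x = [1.0, 0.5]

probs = stereotype_probs(theta, phi, beta, x)
print(probs)
```

The probabilities sum to one by construction, and the last category acts as the reference because both of its parameters are fixed at zero.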

2.3.2 Interpretation of estimated coefficients

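Taking the logarithm of the ratio of expression (3) for any two categories q and r, we get:

$$\log\frac{P(Y = q \mid x)}{P(Y = r \mid x)} = (\theta_q - \theta_r) - (\phi_q - \phi_r)\,\beta x. \tag{4}$$

Applying the exponential function to Eq. (4), it follows that

$$\Omega_{q\mid r}(x) = \frac{P(Y = q \mid x)}{P(Y = r \mid x)} = \exp\{(\theta_q - \theta_r) - (\phi_q - \phi_r)\,\beta x\}. \tag{5}$$

Eq. (5) allows us to evaluate the ratio of the odds after and before we add a unit to the variable $x_k$, that is,

$$\frac{\Omega_{q\mid r}(x, x_k + 1)}{\Omega_{q\mid r}(x, x_k)} = \exp\{(\phi_r - \phi_q)\,\beta_k\}. \tag{6}$$

The value obtained in expression (6) can be interpreted as follows: when a unit is added to the variable $x_k$, the odds of category q versus category r are multiplied by $\exp\{(\phi_r - \phi_q)\,\beta_k\}$, keeping all other variables constant.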
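The interpretation in expression (6) can be checked numerically. The sketch below, with hypothetical parameter values and assuming the log-odds form (θ_q − θ_r) − (φ_q − φ_r)βx, compares the odds before and after adding one unit to a single predictor.

```python
import math

def log_odds(theta, phi, beta, x, q, r):
    """Log-odds of category q versus r: (theta_q - theta_r) - (phi_q - phi_r) * (beta . x)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return (theta[q] - theta[r]) - (phi[q] - phi[r]) * eta

def unit_change_odds_ratio(phi, beta, q, r, k):
    """Closed form for a one-unit increase in x_k: exp((phi_r - phi_q) * beta_k)."""
    return math.exp((phi[r] - phi[q]) * beta[k])

# Hypothetical parameters (zero-based category and predictor indices).
theta = [1.5, 1.0, 0.5, 0.0]
phi = [1.0, 0.6, 0.3, 0.0]
beta = [0.8, -0.4]
x_before = [1.0, 0.5]
x_after = [2.0, 0.5]           # one unit added to the first predictor

ratio_direct = math.exp(log_odds(theta, phi, beta, x_after, 0, 3)
                        - log_odds(theta, phi, beta, x_before, 0, 3))
ratio_closed = unit_change_odds_ratio(phi, beta, 0, 3, 0)
print(ratio_direct, ratio_closed)
```

Both quantities agree, and the ratio does not depend on the starting value of the covariates, which is what makes the single coefficient β_k interpretable across category pairs.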

2.3.3 Estimation of the coefficients

The parameters of the stereotype model are estimated by the maximum likelihood method, in which the contribution of each observation is given in (7) as follows:

$$p_i = \begin{cases} \Pr(y_i = 1 \mid x_i, \beta, \phi, \theta) & \text{if } y_i = 1 \\ \quad\vdots \\ \Pr(y_i = m \mid x_i, \beta, \phi, \theta) & \text{if } y_i = m \\ \quad\vdots \\ \Pr(y_i = J \mid x_i, \beta, \phi, \theta) & \text{if } y_i = J \end{cases} \tag{7}$$

where $p_i$ is the probability of observing the value of y recorded for individual i, and $\Pr(y_i = m \mid x_i, \beta, \phi, \theta)$ was defined in expression (3). Assuming that the sample is independent and identically distributed, the likelihood function is given by expression (8):

$$L(\beta, \phi, \theta \mid y, x) = \prod_{i=1}^{N} p_i = \prod_{m=1}^{J} \prod_{y_i = m} \Pr(y_i = m \mid x_i, \beta, \phi, \theta) \tag{8}$$

where $\prod_{y_i = m}$ denotes the product over all cases where y = m (m = 1, …, J). Applying the logarithm to the likelihood function in (8), we obtain the log-likelihood function given in (9):

$$\log L(\beta, \phi, \theta \mid y, x) = \sum_{m=1}^{J} \sum_{y_i = m} \log \Pr(y_i = m \mid x_i, \beta, \phi, \theta). \tag{9}$$

The parameters φ and θ of Eq. (9) are estimated by the Newton–Raphson method.
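The log-likelihood in Eq. (9) can be evaluated directly once the category probabilities are available. The following sketch only evaluates the function for given parameter values; it does not implement the Newton–Raphson iterations, and the tiny dataset and parameter values are hypothetical.

```python
import math

def category_log_probs(theta, phi, beta, x):
    """Log of the stereotype-model probabilities, assuming scores theta_m - phi_m * (beta . x)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    scores = [t - p * eta for t, p in zip(theta, phi)]
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))  # log-sum-exp
    return [s - log_norm for s in scores]

def log_likelihood(theta, phi, beta, X, y):
    """Sum over observations of the log-probability of the observed category."""
    return sum(category_log_probs(theta, phi, beta, x)[m] for x, m in zip(X, y))

# Hypothetical data: three individuals, two predictors, J = 3 categories (zero-based labels).
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 2, 1]

ll = log_likelihood([0.4, 0.2, 0.0], [1.0, 0.5, 0.0], [0.3, -0.2], X, y)
ll_null = log_likelihood([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], X, y)
print(ll, ll_null)
```

With all parameters set to zero every category has probability 1/J, so the second value equals N log(1/J); a numerical optimizer would search for the parameter values that make the first value as large as possible.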

The odds ratio formed will have an upward trend, as the weights can be produced by sorting. Thus, the effect of covariates on the first odds ratio is smaller than the effect on the second, and so on.

These weights can be set a priori, estimated from a pilot study, or taken from a set of properly chosen values.

In the case of this work, the number of disabilities that a person may have can vary from 0 to 4, and there may be five response options.

To assess the goodness of fit of ordinal models, tests such as Pearson's chi-square or the deviance can be used. These tests involve creating a contingency table in which the rows consist of all possible configurations of the model's covariates and the columns are the ordinal response categories [8]. The expected counts $E_{lj}$ from this table are given by $E_{lj} = \sum_{i=1}^{N_l} \hat{p}_{ij}$, where $N_l$ is the total number of individuals classified in row l and $\hat{p}_{ij}$ is the probability, calculated from the adopted model, that individual i in row l has response j.

Pearson's test to assess the adequacy of fit compares these expected counts with the observed counts $O_{lj}$ by the formula:

$$\chi^2 = \sum_{l=1}^{L} \sum_{j=1}^{k} \frac{(O_{lj} - E_{lj})^2}{E_{lj}} \tag{10}$$

The deviance statistic also compares observed ($O_{lj}$) and expected counts, but using the formula:

$$D^2 = 2 \sum_{l=1}^{L} \sum_{j=1}^{k} O_{lj} \log\frac{O_{lj}}{E_{lj}} \tag{11}$$

The tests to assess the goodness of fit of the model are based on the approximation of statistics (10) and (11) by a chi-square distribution with (L – 1)(k – 1)p degrees of freedom, where L and k are as defined earlier and p is the number of model covariates. Significant differences lead to the conclusion that the model does not fit the data studied.
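Statistics (10) and (11) are simple to compute from a table of observed and expected counts. The sketch below uses a small hypothetical table (rows are covariate configurations, columns are response categories); cells with a zero observed count are taken to contribute zero to the deviance, which is the usual convention.

```python
import math

def pearson_chi2(observed, expected):
    """Sum over all cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(observed, expected)
               for o, e in zip(row_o, row_e))

def deviance(observed, expected):
    """Twice the sum over all cells of O * log(O / E); empty cells contribute zero."""
    return 2.0 * sum(o * math.log(o / e)
                     for row_o, row_e in zip(observed, expected)
                     for o, e in zip(row_o, row_e)
                     if o > 0)

# Hypothetical counts, not census results.
observed = [[10, 20], [30, 40]]
expected = [[12.0, 18.0], [28.0, 42.0]]

chi2 = pearson_chi2(observed, expected)
dev = deviance(observed, expected)
print(chi2, dev)
```

Both statistics are zero when the observed and expected counts coincide and grow as the fit deteriorates.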

As an alternative, we will use the Wald test, which is given by:

$$W = (\hat{p} - \hat{p}_0)^{\top}\, \hat{V}_p^{-1}\, (\hat{p} - \hat{p}_0) \tag{12}$$

where $\hat{V}_p$ is the consistent estimator of the variance-covariance matrix of the estimator $\hat{p}$ of the proportion vector p. An estimator $\hat{V}_p$ can be obtained by the linearization method.

2.3.4 Significance test for the model

The Wald test for the parameters considered individually can be obtained by comparing the maximum likelihood estimate of a given coefficient $\hat{\beta}_j$ with the estimate of its standard error (based on the asymptotic distribution of the maximum likelihood estimators). Thus, the null hypothesis and the alternative hypothesis of the test are, respectively:

$$H_0: \hat{\beta}_j = \beta_j \quad \text{vs} \quad H_1: \hat{\beta}_j \neq \beta_j, \quad j = 2, \ldots, k, \tag{13}$$

with the respective statistic under the null hypothesis:

$$T = \frac{\hat{\beta}_j - \beta_j}{\sqrt{\operatorname{var}(\hat{\beta}_j)}} \sim N(0, 1) \tag{14}$$

By rejecting H0 at significance level α, we conclude that the estimated parameter is statistically different from $\beta_j$. Generally, $\beta_j = 0$ is used, in which case rejection leads to the conclusion that the parameter is relevant to explain the behavior of the dependent variable.
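The statistic in (14) and the corresponding 95% interval can be computed as in the sketch below. As an illustration it uses the rounded estimate and standard error reported for the first region level in the results table for the number of disabilities (0.250 and 0.013); because these inputs are rounded, the squared statistic will not reproduce the Wald value printed in the table exactly. The function names are hypothetical.

```python
import math

def wald_z_test(estimate, std_error, null_value=0.0):
    """Return the statistic T of Eq. (14) and its two-sided normal p-value."""
    z = (estimate - null_value) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # P(|Z| > |z|) for Z ~ N(0, 1)
    return z, p_value

def wald_interval(estimate, std_error, z_crit=1.96):
    """Approximate 95% confidence interval: estimate +/- 1.96 standard errors."""
    half_width = z_crit * std_error
    return estimate - half_width, estimate + half_width

z, p_value = wald_z_test(0.250, 0.013)
lower, upper = wald_interval(0.250, 0.013)
print(z, z ** 2, p_value, lower, upper)
```

The interval limits agree with the tabulated limits (.225 and .276) up to rounding, which suggests that the reported intervals are of this Wald type.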

2.3.5 Selection of variables

Selecting variables means choosing a subset that retains the most important predictor variables, avoids problems such as multicollinearity, and fits as well as the complete model.

Among the different procedures that can be used to select variables, we highlight forward stepwise and backward stepwise. Forward stepwise starts with the constant β0 and sequentially adds to the model the predictor Xi most correlated with Y, so that it improves the fit as evaluated by the F statistic. The procedure stops introducing variables when no candidate produces an F statistic greater than the 90th or 95th percentile of the F distribution with 1 and N – k – 2 degrees of freedom, where N is the sample size and k is the number of variables.

On the other hand, the backward stepwise selection strategy starts with the model containing all independent variables and sequentially excludes variables, using the F statistic to choose the predictors to be eliminated. The predictor with the smallest F statistic is eliminated, and the process stops when each predictor remaining in the model has an F value greater than the 90th or 95th percentile of the F distribution with 1 and N – k – 2 degrees of freedom. For this work, forward backward and the Wald statistic were chosen.

In ordinal logistic regression, the likelihood ratio test (LRT) assesses the significance of the fit. Thus, at each stage of the process, the most important variable, in statistical terms, is the one that produces the greatest change in the log-likelihood in relation to the model without the variable [9].

After estimating the parameters, the next step is to verify whether the covariates used for modeling are statistically significant for the modeled event, for example, an individual becoming a disabled person.

To test the significance of the coefficient of a covariate, it is sufficient to compare the observed values of the response variable with the predicted values obtained by the models with and without the variable of interest [10].

The comparison between observed and predicted values is made using the likelihood ratio test, which is widely applicable under maximum likelihood estimation.

To test $H_0: \theta \in \Theta_0$ versus $H_a: \theta \in \Theta_0^{c}$, we calculate the statistic [11]:

$$\lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid x)}{\sup_{\theta \in \Theta} L(\theta \mid x)}. \quad \text{For } n \to \infty, \; -2 \ln \lambda(x) \to \chi^2_{\nu}, \tag{15}$$

where ν is obtained through the difference between the number of parameters existing in the tested model and the number of parameters existing in the saturated model [12].
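For two nested models, the statistic −2 ln λ(x) in (15) is twice the difference between their maximized log-likelihoods. The sketch below uses hypothetical log-likelihood values and, to stay within the Python standard library, computes the p-value only for the special case ν = 1, where the chi-square tail probability reduces to a normal tail.

```python
import math

def lr_statistic(loglik_restricted, loglik_full):
    """-2 ln(lambda) = 2 * (log-likelihood of full model - log-likelihood of restricted model)."""
    return 2.0 * (loglik_full - loglik_restricted)

def chi2_pvalue_df1(statistic):
    """Upper tail of the chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical maximized log-likelihoods of a model without and with one extra covariate.
stat = lr_statistic(-100.0, -98.0)
p_value = chi2_pvalue_df1(stat)
print(stat, p_value)
```

For more than one degree of freedom a general chi-square tail function from a statistical library would be needed in place of `chi2_pvalue_df1`.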

To verify the quality of the fitted model, it is sufficient to compare the observed and predicted values for the response variable (in this case, one of the different disabilities already mentioned).

Choosing a particular model involves a trade-off: we want to include as many independent variables as possible to improve the prediction and, simultaneously, we want to include a smaller number of variables for reasons of cost and simplicity [10].

According to Draper and Smith [13], selecting the best model means reconciling two objectives: incorporating a number of variables that can improve the predictability of the model and, at the same time, discarding variables that are not significant, as a way of simplifying the model and reducing costs. This selection involves a dose of subjectivity, and the result may be different if the procedure used for selection changes.

2.3.6 Model selection

Selecting a model means, after the formulation and fitting of different plausible models, choosing the model that “best” fits the data of a certain experiment according to the criterion adopted [14].

In statistics, there is a vast literature on model selection [15, 16, 17]. One alternative is to use methods based on the likelihood function, which provide several statistical measures that help compare different models. The most common of these measures are: the Akaike Information Criterion (AIC), discussed by Paulino et al. [18] and Sakamoto et al. [19], whose penalty is twice the number of model parameters; the Bayesian Information Criterion (BIC), discussed by Paulino et al. [18], whose penalty is the number of parameters multiplied by the natural logarithm of the sample size; and, finally, the Deviance Information Criterion (DIC), also discussed by Paulino et al. [18], whose penalty is based on the effective number of model parameters.

In this text, for each of the AIC, BIC, and DIC criteria, the model with the lowest value for each one of them is chosen.
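A minimal sketch of this comparison for AIC and BIC is given below, using the standard definitions AIC = −2 log L + 2p and BIC = −2 log L + p ln(n). The model names, log-likelihoods, and parameter counts are hypothetical; only the sample size is taken from the text. DIC is not shown because it requires posterior simulation output.

```python
import math

def aic(loglik, n_params):
    """Akaike Information Criterion: -2 log L + 2 p."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + p ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical candidate models: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "block_identification": (-1250.0, 6),
    "block_education": (-1245.0, 9),
    "block_work": (-1244.5, 14),
}
n_obs = 20800804   # sample size reported in the text

best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
best_by_bic = min(candidates, key=lambda name: bic(*candidates[name], n_obs))
print(best_by_aic, best_by_bic)
```

With a sample this large, ln(n) is close to 17, so BIC penalizes each additional parameter far more heavily than AIC and the two criteria can select different models.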

2.4 Epidemiology

According to the International Epidemiological Association (IEA), epidemiology is defined as the study of the different factors involved in the spread and propagation of diseases, their frequency, their mode of distribution, their evolution, and the provision of the means necessary for their prevention in human communities.

According to Suser [20], epidemiology is essentially a population science, which draws on the social sciences to understand social structure and dynamics; on mathematics for notions of statistics, probability, inference, and estimation; and on the biological sciences for knowledge of the organic substrate in which the observed manifestations find individual expression.

A single and precise definition of epidemiology as a scientific field ends up not being possible due to the increasing complexity and scope of its current practice:

Science that studies the health-disease process in society, analyzing population distribution and determining factors of risk, diseases, injuries, and events associated with health, proposing specific measures for the prevention, control, or eradication of diseases, damages, or health and protection problems, promotion or recovery of individual and collective health, producing information and knowledge to support decision-making in the planning, administration, and evaluation of health systems, programs, services, and actions [21].

Epidemiology is a basic discipline of public health aimed at understanding the health-disease process within populations. This distinguishes it from clinical practice, which studies the same process in individual terms. Epidemiology studies the different factors that intervene in the spread and propagation of diseases, their frequency, their mode of distribution, their evolution, and the provision of the means necessary for their prevention.

In scientific terms, epidemiology is based on causal reasoning; as a public health discipline, it focuses on developing a sequence of actions aimed at protecting and promoting the health of the community.

Epidemiology is also an important tool for policy development in the health sector. In this case, its application must take into account the available knowledge, adapting it to local realities.

Among the possible applications of epidemiology, we highlight: analyzing the health situation; identifying profiles and risk factors; carrying out epidemiological assessment of services; studying and understanding the causality of health problems; describing the clinical spectrum of diseases and their natural history; assessing the performance of health services in responding to the problems and needs of populations; testing the efficacy, effectiveness, and impact of intervention strategies, as well as the quality, access, and availability of health services to control, prevent, and treat health problems in the community; identifying risk factors for a disease and groups of individuals who are at greater risk of being affected by a particular disease; defining modes of transmission; identifying and explaining patterns of geographic distribution of diseases; establishing methods and strategies to control health problems; establishing preventive measures; assisting in the planning and development of health services; and, finally, establishing criteria for health surveillance.

In discussions about disabilities, research has drawn on epidemiological views, which consider social aspects, accessibility, and assistive technology, among others, and on medical views, which take the perspective of prevention, treatment, and control.

2.5 Disability risk

According to the WHO:

  • The prevalence of disabled people is high;

  • The number of disabled people is increasing due to the aging of the population and the global increase in chronic health conditions associated with disability, such as diabetes, cardiovascular disease, and mental illness;

  • The experience of disability, which results from the interaction between health conditions, personal factors, and environmental factors, varies widely; and finally,

  • Factors such as prevalence, purchasing power, working conditions, and education are considered risks for people to become disabled. Causes such as these can aggravate the situation in vulnerable populations.

Given this scenario, which justifies the need to assess the well-being and quality of life of disabled people, we propose the creation of a risk index for disabled people. The index weights the responses to the variables obtained from the microdata of the IBGE Census that were selected as significant by the backward stepwise methodology in a stereotype ordinal logistic regression fitted for each disability studied. This methodology evolved gradually from simpler techniques to more complex multivariate techniques such as factor analysis.

2.6 Epidemiological risk

In the area of health, several studies on risk are located in the epidemiological area. Briefly, epidemiological risk can be summarized as the probability of the occurrence of a health-related event, estimated from occurrences of that event in the recent past. This risk can be computed as the number of times the event occurred divided by the potential number of events that could have happened. Thus, the risk of becoming a disabled person in a given population or group of people is the number of people who became disabled in the previous period divided by the number of people existing in that period, since any person can potentially become a disabled person.
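The ratio described above amounts to a single division. The sketch below uses hypothetical counts, not census figures, and a hypothetical function name.

```python
def epidemiological_risk(n_events, n_at_risk):
    """Number of events observed in the period divided by the number of people who could have had the event."""
    if n_at_risk <= 0:
        raise ValueError("the population at risk must be positive")
    return n_events / n_at_risk

# Hypothetical group: 1,250 people became disabled among 50,000 people in the period.
risk = epidemiological_risk(1250, 50000)
print(risk)   # 0.025, that is, 2.5%
```

In the modeling approach of this chapter, this crude ratio is replaced by an individual probability that depends on health and social covariates.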

The definition of the epidemiological risk concept and the method incorporated by the medical area end up defining lifestyles producing meanings that guide behaviors; thus, a form of individual surveillance is articulated in a pulverized, internalized, and less visible way, translated into self-control [22].

In this work, we consider the risk of a given person becoming a disabled person, taking into account a set of health and social factors.


3. Results and discussions

For this work, we used ordinal logistic regression analysis for each of the following response variables:

  • Disabilities, which represents the number of disabilities that each person has and can take a value between 0 and 4;

  • Disability to see, hear, and move, considering the categories: 0, “for those who cannot at all,” 1, “for those who can, but with great difficulty,” 2, “for those who can, but with a little difficulty,” and 3, “for those who do not have a problem”; and finally,

  • Intellectual disability, considering the categories “has” or “does not have.”

For statistical analysis, the following programs were used: SPSS, Statistica, R, and Excel.

For this study, the variables were divided into blocks such as identification of respondents, education, family, and work. For each of these blocks, the models were fitted with the variables considered significant, applying the following steps:

  1. Selection of variables using the backward stepwise procedure, excluding variables that are not significant by the Wald test at each step;

  2. Repeat step 1 until there are no more variables to be deleted;

  3. For each of these adjustments, calculate the AIC, BIC, and DIC model selection criteria;

  4. Select the best model among the different final models for each of the different disabilities and for the number of disabilities according to the criteria AIC, BIC, and DIC; and finally,

  5. Calculate for each individual the risk of being a disabled person for the different degrees of severity, disabilities, and number of disabilities.
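Steps 1 and 2 of the procedure above can be sketched as a simple loop. In this sketch the callable `fit` is a hypothetical stand-in for refitting the stereotype model and returning the Wald p-value of each variable; the stub used in the example returns fixed p-values, whereas a real fit would change them at every step. The 5% threshold is also an assumption.

```python
def backward_wald_selection(variables, fit, alpha=0.05):
    """Drop the least significant variable (largest Wald p-value) until all remaining ones pass."""
    selected = list(variables)
    while selected:
        p_values = fit(selected)                           # hypothetical: {variable: Wald p-value}
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:                       # every remaining variable is significant
            break
        selected.remove(worst)
    return selected

# Stub standing in for a real model fit; the p-values are made up for illustration.
FIXED_P_VALUES = {"region": 0.001, "sex": 0.030, "age": 0.0001, "race": 0.400, "income": 0.120}

def stub_fit(names):
    return {name: FIXED_P_VALUES[name] for name in names}

final_variables = backward_wald_selection(list(FIXED_P_VALUES), stub_fit)
print(final_variables)   # ['region', 'sex', 'age']
```

Steps 3 and 4 would then compute the selection criteria for each final model and keep the one with the lowest value.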

Figures 1–8 present, in item (a), the risk graphs of being a person with one (represented by p1 in blue dots), two (represented by p2 in red dots), three (represented by p3 in green dots), or four disabilities (represented by p4 in purple dots), and with at least one disability (represented by pt in black dots); and, in item (b), the risk graphs of being a visually disabled person for each degree of severity: “total blind” (represented by p1 in blue dots); “low vision” (represented by p2 in red dots); “lighter visual” (represented by p3 in green dots); and, finally, “visually disabled person” (represented by pt in purple dots). The graphs are shown for the variables region in Figure 1, sex in Figure 2, age in Figure 3, race in Figure 4, education in Figure 5, main job in Figure 6, categorized income in Figure 7, and number of children in Figure 8.

Figure 1.

Graphs of probability of occurrence: (a) of a certain number of disabilities and (b) of visual disability according to their degrees of severity for variable region.

Figure 2.

Graphs of probability of occurrence: (a) of a certain number of disabilities and (b) of visual disability according to their degrees of severity for variable sex.

Figure 3.

Graphs of probability of occurrence (a) of a certain number of disabilities and (b) of visual disability according to their degrees of severity for age variable.

Figure 4.

Graphs of probability of occurrence (a) of a certain number of disabilities and (b) of visual disability according to their severity degrees for race variable.

Figure 5.

Graphs of probability of occurrence (a) of a certain number of disabilities and (b) of visual disability according to their severity degrees for education.

Figure 6.

Graphs of probability of occurrence (a) of a certain number of disabilities and (b) of visual disability according to their severity degree for main work.

Figure 7.

Graphs of probability of occurrence (a) of a certain number of disabilities and (b) of visual disability according to their degrees of severity for income.

Figure 8.

Graphs of probability of occurrence (a) of a certain number of disabilities and (b) of visual disability according to their degrees of severity for number of children.

In Figure 1, the following regions were considered: 1 – “north,” 2 – “northeast,” 3 – “southeast,” 4 – “south,” and 5 – “central west.”

Starting from the graphs in Figure 1 for the region, we see that the highest risks of (a) disability and (b) visual disability are found in the northeast region, for every number of disabilities and every degree of severity. In contrast, the lowest risk for (a) number of disabilities is found in the central west region, and the lowest risk for (b) visual disability is found in the south region.

Figure 2 shows (a) the risks of being a disabled person, and (b) the risk of incidence of visually disabled person considering genders 1 – male and 2 – female.

From the graphs in Figure 2, it can be seen that in all cases the risk of (a) disability and (b) visual disability is higher for females.

On the other hand, Figure 3 presents the risks of incidence of: (a) disability and (b) visual disability as a function of age.

In Figure 3, it is possible to notice that the risks of disability in (a) and visual disability in (b) increase as the age of the people interviewed increases.

It is also noted in Figure 3 that, from about 80 years of age, the points become more scattered; this is believed to be due to the smaller number of people in these older age groups.

For Figure 4, the races were defined as: 1 – White, 2 – Black, 3 – Yellow, 4 – Brown, and 5 – Indigenous.

As for the results of Figure 4, we note that the highest probability of occurrence of disability and visual disability is found in the Yellow race and the lowest in the Indigenous race.

Next, for Figure 5, we considered for education: 1 – “between no education and incomplete elementary,” 2 – “between complete elementary and incomplete high school,” 3 – “between complete high school and incomplete higher education,” and, finally, 4 – “complete higher or more.”

Examining Figure 5, we found that the highest risk of disability and of visual disability is found in 1, “between no education and incomplete elementary,” while the lowest is found in 3, “between complete high school and incomplete higher education,” in all situations.

For the main job in Figure 6, we consider the following levels: 1 – “employees with a formal contract,” 2 – “military and statutory civil servants,” 3 – “employees without a formal contract,” 4 – “own account,” 5 – “employers,” 6 – “unpaid,” 7 – “workers in production for their own consumption,” and, finally, 8 – “total.”

Observing the graphs in Figure 6 for the type of main job, we see that the highest risk of disability and of visual disability is found in 6, “workers in production for own consumption,” and the lowest risk in both cases is found in 2, “employees with a formal contract.”

Continuing in Figure 7 with income, we adopted the following criterion: 1 – “between 0 and 1 minimum wage,” 2 – “between 1 and 3 minimum wages,” 3 – “between 3 and 7 minimum wages,” 4 – “between 7 and 15 minimum wages,” and, finally, 5 – “15 minimum wages or more.”

From the graphs in Figure 7, we can see that the highest risk of incidence of disability and visual disability is found in 1, “between 0 and 1 minimum wage,” and that this risk decreases as the income of the person interviewed increases.

Finally, in Figure 8, a scatter plot was made for the risk of incidence of disability and visual disability as a function of the number of children.

As for Figure 8, it is possible to verify that the risk of disability and visual impairment increases as the number of children increases.

This result may reflect situations in which a greater number of children means a greater number of accidents and less parental attention to each child in social and economic terms.

Tables 1–5 show the results of the stereotype ordinal logistic regression analyses, with point and interval estimates of the parameters for each of the following response variables: number of disabilities (Table 1), visual (Table 2), hearing (Table 3), physical (Table 4), and intellectual (Table 5). Each table lists the response variable and the explanatory variables included in the final model, that is, the variables considered significant according to the backward stepwise method. The AIC, BIC, and DIC model selection criteria are reported for each adjustment.

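| Variable | Level | Estimate | Standard error | Wald | df | p-value | 95% CI lower limit | 95% CI upper limit |
|---|---|---|---|---|---|---|---|---|
| Disabilities | 0 | −.210 | .075 | 7.786 | 1 | .005 | −.358 | −.063 |
| | 1 | 1.922 | .075 | 650.417 | 1 | .000 | 1.774 | 2.070 |
| | 2 | 3.938 | .077 | 2636.890 | 1 | .000 | 3.787 | 4.088 |
| | 3 | 7.034 | .104 | 4562.783 | 1 | .000 | 6.830 | 7.238 |
| Region | 1 | .250 | .013 | 365.872 | 1 | .000 | .225 | .276 |
| | 2 | .295 | .011 | 714.216 | 1 | .000 | .273 | .317 |
| | 3 | −.071 | .010 | 51.179 | 1 | .000 | −.090 | −.051 |
| | 4 | −.181 | .011 | 281.285 | 1 | .000 | −.202 | −.160 |
| | 5 | 0 | | | 0 | | | |
| Naturalness | 1 | −.060 | .006 | 97.127 | 1 | .000 | −.072 | −.048 |
| | 2 | .076 | .014 | 31.279 | 1 | .000 | .049 | .103 |
| | 3 | 0 | | | 0 | | | |
| Read and write | 1 | −.428 | .015 | 869.893 | 1 | .000 | −.456 | −.399 |
| | 2 | 0 | | | 0 | | | |
| Childcare | 1 | −.022 | .023 | .953 | 1 | .329 | −.066 | .022 |
| | 2 | −.114 | .024 | 21.574 | 1 | .000 | −.162 | −.066 |
| | 3 | −.012 | .019 | .412 | 1 | .521 | −.050 | .025 |
| | 4 | 0 | | | 0 | | | |
| Occupation condition | 1 | .086 | .059 | 2.130 | 1 | .144 | −.029 | .201 |
| | 2 | −.205 | .059 | 12.091 | 1 | .001 | −.320 | −.089 |
| | 3 | −.405 | .059 | 47.319 | 1 | .000 | −.520 | −.289 |
| | 4 | −.486 | .059 | 67.149 | 1 | .000 | −.602 | −.370 |
| | 5 | 0 | | | 0 | | | |
| Instruction level | 1 | .037 | .014 | 7.143 | 1 | .008 | .010 | .063 |
| | 2 | −.060 | .014 | 17.007 | 1 | .000 | −.088 | −.031 |
| | 3 | −.006 | .017 | .118 | 1 | .731 | −.039 | .028 |
| | 4 | 0 | | | 0 | | | |
| Union nature | 1 | .297 | .014 | 473.400 | 1 | .000 | .270 | .324 |
| | 2 | .480 | .024 | 411.506 | 1 | .000 | .434 | .527 |
| | 3 | .561 | .016 | 1209.135 | 1 | .000 | .529 | .592 |
| | 4 | .819 | .023 | 1244.424 | 1 | .000 | .773 | .864 |
| | 5 | 0 | | | 0 | | | |
| Marital status | 1 | −1.055 | .015 | 5009.257 | 1 | .000 | −1.084 | −1.026 |
| | 2 | −.909 | .013 | 5024.643 | 1 | .000 | −.934 | −.884 |
| | 3 | −.509 | .013 | 1621.905 | 1 | .000 | −.534 | −.485 |
| | 4 | 0 | | | 0 | | | |
| Income | 1 | .304 | .032 | 89.934 | 1 | .000 | .241 | .367 |
| | 2 | .171 | .032 | 29.101 | 1 | .000 | .109 | .233 |
| | 3 | .170 | .032 | 28.047 | 1 | .000 | .107 | .233 |
| | 4 | .133 | .035 | 14.378 | 1 | .000 | .064 | .202 |
| | 5 | 0 | | | 0 | | | |
| Return | 1 | −.221 | .017 | 174.562 | 1 | .000 | −.254 | −.188 |
| | 2 | 0 | | | 0 | | | |
| Main job | 1 | −.209 | .023 | 79.698 | 1 | .000 | −.255 | −.163 |
| | 2 | −.460 | .135 | 11.602 | 1 | .001 | −.724 | −.195 |
| | 3 | .066 | .025 | 7.006 | 1 | .008 | .017 | .114 |
| | 4 | −.131 | .023 | 31.478 | 1 | .000 | −.176 | −.085 |
| | 5 | −.066 | .024 | 7.620 | 1 | .006 | −.113 | −.019 |
| | 6 | −.428 | .032 | 180.577 | 1 | .000 | −.491 | −.366 |
| | 7 | 0 | | | 0 | | | |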

Table 1.

Point and interval estimates of the parameters of the logistic model considering the number of disabilities as the response variable.

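| Variable | Level | Estimate | Standard error | Wald | df | p-value | 95% CI lower limit | 95% CI upper limit |
|---|---|---|---|---|---|---|---|---|
| Visual disability | 1 | −5.190 | .072 | 5147.165 | 1 | .000 | −5.332 | −5.048 |
| | 2 | −2.012 | .066 | 935.571 | 1 | .000 | −2.141 | −1.883 |
| | 3 | .177 | .066 | 7.310 | 1 | .007 | .049 | .306 |
| | 4 | 10.966 | .140 | 6174.537 | 1 | .000 | 10.692 | 11.239 |
| Region | 1 | −.272 | .013 | 421.476 | 1 | .000 | −.298 | −.246 |
| | 2 | −.262 | .011 | 544.896 | 1 | .000 | −.284 | −.240 |
| | 3 | .141 | .010 | 193.114 | 1 | .000 | .121 | .161 |
| | 4 | .275 | .011 | 609.290 | 1 | .000 | .253 | .297 |
| | 5 | 0 | | | 0 | | | |
| Naturalness | 1 | .051 | .006 | 67.199 | 1 | .000 | .039 | .064 |
| | 2 | −.077 | .014 | 30.191 | 1 | .000 | −.105 | −.050 |
| | 3 | 0 | | | 0 | | | |
| Read and write | 1 | .369 | .014 | 710.297 | 1 | .000 | .342 | .396 |
| | 2 | 0 | | | 0 | | | |
| Childcare | 1 | −.041 | .022 | 3.940 | 1 | .049 | −.084 | .002 |
| | 2 | .043 | .024 | 3.098 | 1 | .078 | −.005 | .091 |
| | 3 | −.018 | .019 | .931 | 1 | .335 | −.054 | .018 |
| | 4 | 0 | | | 0 | | | |
| Instruction level | 1 | −.062 | .060 | 1.064 | 1 | .302 | −.181 | .056 |
| | 2 | .226 | .060 | 14.015 | 1 | .000 | .108 | .345 |
| | 3 | .425 | .060 | 49.529 | 1 | .000 | .307 | .544 |
| | 4 | .460 | .061 | 57.577 | 1 | .000 | .341 | .579 |
| | 5 | 0 | | | 0 | | | |
| Union nature | 1 | −.223 | .007 | 1108.625 | 1 | .000 | −.236 | −.210 |
| | 2 | −.135 | .008 | 260.276 | 1 | .000 | −.151 | −.118 |
| | 3 | −.067 | .016 | 17.005 | 1 | .000 | −.099 | −.035 |
| | 4 | 0 | | | 0 | | | |
| Children | 1 | 1.101 | .015 | 5656.802 | 1 | .000 | 1.072 | 1.130 |
| | 2 | .922 | .012 | 5599.102 | 1 | .000 | .898 | .946 |
| | 3 | .498 | .012 | 1696.917 | 1 | .000 | .474 | .522 |
| | 4 | 0 | | | 0 | | | |
| Return | 1 | .213 | .016 | 166.427 | 1 | .000 | .180 | .245 |
| | 2 | 0 | | | 0 | | | |
| Condition | 1 | 0 | | | 0 | | | |
| Situation | 1 | 0 | | | 0 | | | |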

Table 2.

Point and interval estimates of the logistic model parameters considering visual disability as the response variable.

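| Variable | Level | Estimate | Standard error | Wald | df | p-value | 95% CI lower limit | 95% CI upper limit |
|---|---|---|---|---|---|---|---|---|
| Hearing disability | 1 | −6.251 | .081 | 5885.079 | 1 | .000 | −6.410 | −6.091 |
| | 2 | −4.350 | .079 | 3058.665 | 1 | .000 | −4.504 | −4.196 |
| | 3 | −2.495 | .078 | 1017.060 | 1 | .000 | −2.648 | −2.342 |
| | 4 | 12.767 | .288 | 1962.330 | 1 | .000 | 12.202 | 13.332 |
| Region | 1 | −.098 | .018 | 29.749 | 1 | .000 | −.133 | −.063 |
| | 2 | −.299 | .015 | 401.518 | 1 | .000 | −.329 | −.270 |
| | 3 | −.015 | .014 | 1.204 | 1 | .273 | −.043 | .012 |
| | 4 | .005 | .015 | .131 | 1 | .718 | −.024 | .035 |
| | 5 | 0 | | | 0 | | | |
| Naturalness | 1 | .039 | .008 | 22.219 | 1 | .000 | .023 | .055 |
| | 2 | −.110 | .018 | 38.663 | 1 | .000 | −.145 | −.075 |
| | 3 | 0 | | | 0 | | | |
| Read and write | 1 | .449 | .012 | 1302.286 | 1 | .000 | .424 | .473 |
| | 2 | 0 | | | 0 | | | |
| Instruction level | 1 | −.445 | .075 | 35.178 | 1 | .000 | −.592 | −.298 |
| | 2 | −.119 | .075 | 2.475 | 1 | .116 | −.266 | .029 |
| | 3 | .123 | .075 | 2.688 | 1 | .101 | −.024 | .271 |
| | 4 | .292 | .076 | 14.760 | 1 | .000 | .143 | .441 |
| | 5 | 0 | | | 0 | | | |
| Marital status | 1 | −.122 | .009 | 177.545 | 1 | .000 | −.139 | −.104 |
| | 2 | −.395 | .021 | 352.262 | 1 | .000 | −.436 | −.354 |
| | 3 | −.468 | .016 | 824.993 | 1 | .000 | −.500 | −.436 |
| | 4 | −.816 | .014 | 3433.926 | 1 | .000 | −.843 | −.789 |
| | 5 | 0 | | | 0 | | | |
| Children | 1 | .826 | .016 | 2645.864 | 1 | .000 | .795 | .858 |
| | 2 | .709 | .013 | 2870.222 | 1 | .000 | .683 | .735 |
| | 3 | .432 | .012 | 1207.655 | 1 | .000 | .407 | .456 |
| | 4 | 0 | | | 0 | | | |
| Condition | 1 | .042 | .014 | 9.356 | 1 | .002 | .015 | .068 |
| | 2 | 0 | | | 0 | | | |
| Situation | 1 | 0 | | | 0 | | | |
| | 2 | 0 | | | 0 | | | |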

Table 3.

Point and interval estimates of the parameters of the logistic model considering hearing disability as the response variable.

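| Variable | Level | Estimate | Standard error | Wald | df | p-value | 95% CI lower limit | 95% CI upper limit |
|---|---|---|---|---|---|---|---|---|
| Walk disability | 1 | −5.591 | .125 | 1987.299 | 1 | .000 | −5.837 | −5.345 |
| | 2 | −2.726 | .120 | 512.061 | 1 | .000 | −2.962 | −2.490 |
| | 3 | −1.146 | .120 | 90.832 | 1 | .000 | −1.382 | −.911 |
| | 4 | 13.027 | .234 | 3110.980 | 1 | .000 | 12.569 | 13.485 |
| Region | 1 | −.182 | .022 | 71.172 | 1 | .000 | −.224 | −.139 |
| | 2 | −.370 | .018 | 436.802 | 1 | .000 | −.404 | −.335 |
| | 3 | −.086 | .016 | 28.375 | 1 | .000 | −.118 | −.055 |
| | 4 | −.047 | .018 | 7.112 | 1 | .008 | −.082 | −.013 |
| | 5 | 0 | | | 0 | | | |
| Naturalness | 1 | .012 | .010 | 1.576 | 1 | .209 | −.007 | .031 |
| | 2 | −.157 | .020 | 60.058 | 1 | .000 | −.197 | −.117 |
| | 3 | 0 | | | 0 | | | |
| Read and write | 1 | .577 | .017 | 1171.179 | 1 | .000 | .544 | .610 |
| | 2 | 0 | | | 0 | | | |
| Childcare | 1 | .333 | .029 | 133.621 | 1 | .000 | .277 | .390 |
| | 2 | .586 | .039 | 220.779 | 1 | .000 | .509 | .664 |
| | 3 | −.025 | .022 | 1.311 | 1 | .252 | −.068 | .018 |
| | 4 | 0 | | | 0 | | | |
| Instruction level | 1 | −.569 | .109 | 27.157 | 1 | .000 | −.783 | −.355 |
| | 2 | −.104 | .109 | .908 | 1 | .341 | −.319 | .110 |
| | 3 | .267 | .109 | 5.963 | 1 | .015 | .053 | .482 |
| | 4 | .538 | .110 | 23.881 | 1 | .000 | .322 | .754 |
| | 5 | 0 | | | 0 | | | |
| Marital status | 1 | −.125 | .010 | 142.810 | 1 | .000 | −.146 | −.105 |
| | 2 | −.530 | .022 | 585.604 | 1 | .000 | −.572 | −.487 |
| | 3 | −.621 | .017 | 1326.137 | 1 | .000 | −.654 | −.588 |
| | 4 | −.933 | .016 | 3244.635 | 1 | .000 | −.965 | −.901 |
| | 5 | 0 | | | 0 | | | |
| Children | 1 | 1.009 | .020 | 2554.328 | 1 | .000 | .970 | 1.049 |
| | 2 | .794 | .016 | 2471.085 | 1 | .000 | .763 | .825 |
| | 3 | .382 | .015 | 639.566 | 1 | .000 | .352 | .411 |
| | 4 | 0 | | | 0 | | | |
| Return | 1 | 0 | | | 0 | | | |
| Time | 1 | .691 | .031 | 506.529 | 1 | .000 | .631 | .751 |
| | 2 | .673 | .029 | 524.780 | 1 | .000 | .615 | .730 |
| | 3 | .508 | .030 | 283.211 | 1 | .000 | .449 | .568 |
| | 4 | .284 | .032 | 80.361 | 1 | .000 | .222 | .346 |
| | 5 | 0 | | | 0 | | | |
| Condition | 1 | 0 | | | 0 | | | |
| Situation | 1 | 0 | | | 0 | | | |
| Main job | 1 | .542 | .033 | 266.774 | 1 | .000 | .477 | .607 |
| | 2 | .497 | .241 | 4.229 | 1 | .040 | .023 | .970 |
| | 3 | −.010 | .036 | .076 | 1 | .783 | −.080 | .060 |
| | 4 | .415 | .033 | 158.503 | 1 | .000 | .351 | .480 |
| | 5 | .216 | .034 | 40.930 | 1 | .000 | .150 | .283 |
| | 6 | .623 | .053 | 139.555 | 1 | .000 | .520 | .727 |
| | 7 | 0 | | | 0 | | | |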

Table 4.

Point and interval estimates of the parameters of the logistic model considering physical disability as the response variable.

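| Variable | Level | Estimate | Standard error | Wald | df | p-value | 95% CI lower limit | 95% CI upper limit |
|---|---|---|---|---|---|---|---|---|
| Intellectual disability | 1 | −3.795 | .087 | 1923.902 | 1 | .000 | −3.964 | −3.625 |
| | 2 | 10.498 | .096 | 11945.597 | 1 | .000 | 10.310 | 10.686 |
| Sex | 1 | −.103 | .006 | 284.656 | 1 | .000 | −.115 | −.091 |
| | 2 | 0 | | | 0 | | | |
| Age | 1 | .664 | .013 | 2719.203 | 1 | .000 | .639 | .689 |
| | 2 | .073 | .008 | 84.007 | 1 | .000 | .057 | .089 |
| | 3 | 0 | | | 0 | | | |
| Naturalness | 1 | −.139 | .007 | 454.576 | 1 | .000 | −.152 | −.126 |
| | 2 | −.251 | .015 | 274.087 | 1 | .000 | −.281 | −.221 |
| | 3 | 0 | | | 0 | | | |
| Read and write | 1 | 1.486 | .007 | 46829.311 | 1 | .000 | 1.473 | 1.500 |
| | 2 | 0 | | | 0 | | | |
| Instruction level | 1 | −.943 | .086 | 120.203 | 1 | .000 | −1.112 | −.775 |
| | 2 | −.303 | .087 | 12.254 | 1 | .000 | −.473 | −.133 |
| | 3 | .116 | .087 | 1.802 | 1 | .179 | −.054 | .286 |
| | 4 | .362 | .089 | 16.647 | 1 | .000 | .188 | .535 |
| | 5 | 0 | | | 0 | | | |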

Table 5.

Point and interval estimates of the parameters of the logistic model considering intellectual disability as the response variable.
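The Wald statistic, p-value, and confidence limits in Tables 1–5 can be related to each estimate and its standard error. The sketch below assumes the usual large-sample Wald formulas for a single parameter (the chapter does not state how the software computed them), and the function name is hypothetical. It uses the estimate .250 and standard error .013 reported for level 1 of Region in Table 1. Because the published inputs are rounded to three decimals, the output only approximately matches the published Wald value (365.872) and limits (.225 and .276).

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_summary(estimate, std_error):
    """Wald chi-square (1 df), its p-value, and a 95% confidence interval."""
    wald = (estimate / std_error) ** 2
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    lower = estimate - Z_95 * std_error
    upper = estimate + Z_95 * std_error
    return wald, p_value, lower, upper


# Region, level 1, Table 1 (inputs rounded as published).
wald, p_value, lower, upper = wald_summary(0.250, 0.013)
print(f"Wald = {wald:.3f}, p = {p_value:.3g}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

A row whose interval contains zero, such as level 1 of Childcare in Table 1, corresponds to a p-value above .05 under the same formulas.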

For the number of disabilities, the following predictor variables were significant in the adjustment for each block:

Identification: domicile, categorized age, birthplace, nationality, and region; Education: reading and writing, day care, other graduation, and education; Family: union nature, marital status, and number of children; Work: income, secondary work, main work, travel, and return time; and, finally, Combined model (Table 1, made up of all predictor variables considered significant in each of the blocks): region, place of birth, reading and writing, day care, employment status, education, union nature, marital status, number of children, income, return, and main job. For model selection, we get −7232 for AIC, −8791.418 for BIC, and −6917.953 for DIC.

As for visual disability, the following variables were selected: Identification: region, domicile, sex, birthplace, and nationality; Education: reading and writing, day care, other graduation, and education; Family: union nature, marital status, and number of children; Work: income, time, condition, situation, and secondary work; and, finally, Combined model (Table 2), initialized with all explanatory variables considered significant in each of the blocks, from which the following were selected: region, birthplace, reading and writing, day care, education, union nature, number of children, return, condition, and situation. For model selection, we get −2549.708 for AIC, −3291.833 for BIC, and −2399.707 for DIC.

Next, for hearing disability, the following variables were selected for each of the different blocks: Identification: region, domicile, sex, race, and birthplace; Education: reading and writing, day care, other graduation, and education; Family: union nature, marital status, and number of children; Work: income, time, condition, situation, main work, and secondary work; and finally, Joint model (Table 3): region, birthplace, reading and writing, education, marital status, number of children, condition, and situation. For model selection, we get −2921.348 for AIC, −3331.401 for BIC, and −2865.348 for DIC.

For physical disability, the following variables were selected: Identification: region, age, and birthplace; Education: reading and writing, day care, other graduation, and education; Family: union nature, marital status, and number of children; Work: income, return, time, condition, situation, main work, and secondary work; and finally, Joint model (Table 4): region, birthplace, reading and writing, day care, education, marital status, number of children, return, time, condition, situation, and main job. For model selection, we get AIC = −1258.613, BIC = −2119.480, and DIC = −1084.013.

Finally, in the case of Table 5, the following variables were selected as significant: gender, age, birthplace, knowing how to read and write, and education, for a total of five variables.

For intellectual disability, the following variables were selected: Identification: region, sex, age, race, and birthplace; Education: reading and writing, day care, other graduation, and education; Family: union nature, marital status, and number of children; Work: income, time, return, condition, situation, and secondary work; and finally, Joint model: gender, age, birthplace, reading and writing, and education. For model selection, we get AIC = −14,548, BIC = −14,711, and DIC = −14,515.
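The way AIC and BIC trade fit against model size can be sketched from their textbook definitions. This is only an illustration: the log-likelihood value and the function names are hypothetical, the parameter counts 5 and 12 are arbitrary, and the sample size is the one reported in the abstract. DIC is left out because it requires posterior samples from a Bayesian fit.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical log-likelihood, held fixed to isolate the penalty term.
log_lik = -1500.0
n_obs = 20_800_804  # sample size reported in the abstract

for k in (5, 12):
    print(f"k = {k:2d}: AIC = {aic(log_lik, k):.1f}, BIC = {bic(log_lik, k, n_obs):.1f}")
```

For a sample this large, ln n is close to 17, so each extra parameter costs far more under BIC than under AIC, and BIC tends to favor smaller models.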

Comparing the models given in Tables 1–5, we noticed that the model with the smallest number of variables was the logistic model adjusted for intellectual disability, while the model that required the largest number of independent variables was the one for the number of disabilities.
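The backward stepwise method used to arrive at these final models can be sketched generically. The version below is an assumption, not the author's procedure: it removes one variable at a time as long as a criterion such as AIC improves, whereas the chapter may have used significance tests instead. The function names, variable names, and toy criterion are hypothetical.

```python
def backward_eliminate(variables, criterion):
    """Drop one variable at a time while the criterion (lower is better) improves."""
    selected = list(variables)
    best = criterion(selected)
    while len(selected) > 1:
        # Score every model obtained by removing a single variable.
        candidates = [
            (criterion([v for v in selected if v != drop]), drop)
            for drop in selected
        ]
        score, drop = min(candidates)
        if score >= best:
            break  # no removal improves the criterion
        best = score
        selected.remove(drop)
    return selected, best


# Toy criterion: a fixed fit term reduced by useful variables, plus 2 per variable.
USEFUL = {"age": 50.0, "income": 30.0}


def toy_criterion(variables):
    return 100.0 - sum(USEFUL.get(v, 0.0) for v in variables) + 2.0 * len(variables)


final_vars, final_score = backward_eliminate(
    ["age", "income", "noise1", "noise2"], toy_criterion
)
print(final_vars, final_score)
```

In practice the criterion would refit the regression for each candidate subset, which is the costly step with a sample of this size.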

The adjustment by stereotype ordinal logistic regression was compared with binary logistic regression [1] and multinomial logistic regression [23], and visual, hearing, physical, intellectual, and multiple disabilities were considered.

It was found that, for all the different disabilities, binary logistic regression yielded the highest number of significant independent variables, followed by stereotype ordinal logistic regression. This may be explained by the following facts:

To enable the use of dummy variables, the response variable had to be transformed to indicate whether or not the person has a disability, which increased the sensitivity of the analysis and made differences easier to detect.

Stereotype logistic regression performed better than multinomial logistic regression because it took into account that the response categories are ordinal, which the multinomial logistic regression model does not. This probably explains why multinomial logistic regression showed little sensitivity and selected a smaller number of variables for its models [24].

Among the advantages of using multinomial logistic regression, we can mention that it makes no assumptions about the probabilistic behavior of the independent variables, allows the significance of a large number of independent variables to be tested, and, finally, allows direct estimation of the probability of an observation belonging to a certain class [25, 26].
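The difference between the stereotype and multinomial models can be made concrete with a small sketch. It assumes a common formulation of the stereotype model, in which the log-odds of category j against the last category are alpha_j + phi_j * (beta · x), with a single coefficient vector beta shared across categories and scaled by ordered scores phi_j. A multinomial model would instead use a separate coefficient vector for each category. The function name and all parameter values are hypothetical and are not taken from Tables 1–5.

```python
import math


def stereotype_probs(alphas, phis, beta, x):
    """Category probabilities under a stereotype ordinal logistic model.

    The last category is the reference, so its alpha and phi are 0.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))  # shared linear predictor
    scores = [a + p * eta for a, p in zip(alphas, phis)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-category response with ordered scores 1 >= 0.4 >= 0.
alphas = [0.5, 0.2, 0.0]
phis = [1.0, 0.4, 0.0]
beta = [0.3, -0.2]
x = [1.0, 2.0]

probs = stereotype_probs(alphas, phis, beta, x)
print([round(p, 4) for p in probs])
```

Because every category shares the same beta, the model has fewer parameters than the multinomial one while still respecting the ordering of the response through the scores phi_j.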


4. Conclusions

The adjusted model with the lowest number of explanatory variables was the one for intellectual disability, with 5 variables, while the one that needed the highest number was the one for the number of disabilities, with 13 variables.

In this work, the stereotype ordinal logistic model improved the quality of the fit when compared with the binary logistic model proposed in Oliveira [1]. With the ordinal response, the disability risk was modeled for different degrees of severity and for the number of disabilities.

The different disabilities are not homogeneous with respect to the predictor variables selected.

The risks of being a person with a disability and of being a person with a visual disability are probably greater in situations such as residing in the northeast region, female gender, age over 80 years, Yellow race, incomplete elementary education, working in production for own consumption, and a high number of children.

The lower incidence risks are observed in situations such as residing in the southern region, male gender, age of 15 years or less, Indigenous race, schooling between complete high school and incomplete higher education, working with a formal contract, and having no children.

Next, for Figures 1–8, we propose possible justifications and suggest work or research that could confirm or refute the hypotheses considered.

  • Figure 1: These results may be due to the low effective investment in health and infrastructure, which is smaller in the northeast and north regions and larger in regions such as the southeast and south.

To evaluate this hypothesis, an alternative is to carry out a survey of the effective volume spent on health, accessibility, and infrastructure that favor disabled people in the different regions, counting the number of people who effectively benefited and making a comparative assessment between the regions;

  • Figure 2: These results most likely reflect women’s greater exposure to domestic accidents and the double shift of modern women, who work outside the home and also take care of the home.

To better assess this point, we propose a comparative sample study of the time that men and women spend working at home and outside the home;

  • Figure 3: These results show that, as the population ages and life expectancy increases, people become more subject to diseases of advanced age and more likely to become disabled.

In this case, it is possible to suggest studies that simultaneously prove the increase in life expectancy of the population and the emergence of diseases that occur at more advanced ages. This point can be easily confirmed by the data from the 2010 IBGE Census Sample;

  • Figure 4: These results may reflect cultural and dietary conditions of Eastern and Indigenous peoples.

For a better understanding of this result, we suggest a research study on the life habits of people of the Yellow and Indigenous races, considering their chances of becoming disabled;

  • Figure 5: It is believed that a low education can mean less knowledge of information, low purchasing power, and greater dependence on government aid.

In order to prove it, research can be carried out that can establish relationships between level of education and income;

  • Figure 6: Most likely, the different types of professions reflect the education obtained by the workers. Being military or a statutory civil servant depends on passing a public examination that requires better preparation and study, while work for own consumption or without pay is, in general, done by people who work in the countryside, are unemployed, and have lower purchasing power.

In order to better evaluate this possibility, the proposal is to carry out research that can establish the average remuneration for different professions by disability, sex, education, and other demographic variables;

  • Figure 7: The risk of incidence tends to be higher when the population’s purchasing power or income is lower.

In this case, we suggest a survey of people with and without disabilities, and then of people with and without visual disability, comparing the different income levels; and finally;

  • Figure 8: This result may reflect situations in which a greater number of children means more accidents and less attention paid to each child by the parents in social and economic terms.

In this case, we suggest a survey that compares quality of life among families with different numbers of children and evaluates their respective risks.

For Figures 1–8, the results were similar for the number of disabilities and for visual disability.

The conclusions of this work also point to the importance of further studies and analyses, because there are several methods to assess risk, such as regression coefficients, regression analysis, factor scores, or a weighting of the disability risk based on the risk associated with each explanatory variable. For example, disability risk is known to increase with age and with the number of children. Among the alternatives for future work, we can mention the following:

  1. Apply the beta regression model, factor analysis, structural equation modeling, and the BART algorithm as ways to improve the goodness of fit and its reliability in determining the disability index.

  2. Repeat the analysis including variables related to housing conditions and possession of other assets.

  3. Among the questions that need to be answered are how disabled people live and what situation they find themselves in when compared with people without disabilities.

  4. In situations like this, a risk index with good reliability and adjustment quality is useful for monitoring, in the same way as the Human Development Index (HDI), which is a more general index and still does not take disabled people into account.

  5. Evaluate the accessibility of the surroundings of the houses of disabled people, considering the locations in a georeferenced way, evaluating the conditions of the infrastructure, and proposing a geostatistical model.

  6. The difficulty in estimating the risk index is to obtain a method that is efficient and reliable and that reduces its discrepancy. This problem is of interest to researchers and motivates the use of several methods to estimate the risk considered and evaluated in this study.

  7. The advantage of having a comparable index is that it can be used as a parameter to see whether its value has increased or decreased. A higher index should reflect a greater need for intervention by public authorities to reduce the existing barriers in terms of access to different human rights and accessibility around the homes of disabled people.

  8. Propose improvements in the census questionnaire. For example, if a respondent answers that he or she has a disability, also ask at what age it occurred because, according to the existing literature [3], the older people are when they become disabled, the greater the difficulties they have in adapting.

  9. In statistical terms, improve national statistics on disability, using an efficient and low-cost approach to obtain more comprehensive data: add disability questions to allow monitoring, cross-reference different datasets, collect longitudinal data, improve data comparability, develop adequate tools, fill gaps between investigations, and, finally, strengthen and support the different investigations by creating instruments that can measure and monitor the quality of life and well-being of these people on a continuous and periodic basis.

  10. Also include issues related to health conditions, housing, work, education, accessibility, and leisure.

  11. Repeat the analysis by region, state, and municipalities.

  12. It is hoped that results such as this research can contribute to the action of public managers with better support in meeting the needs of disabled people.


Acknowledgments

The author thanks IBGE for access to the microdata of the households selected to compose the sample and Professor Julia Maria Pavan Soler for suggesting the topic.

References

  1. Oliveira PTMS. Pessoas com deficiência: análise dos resultados do Censo 2010 e a sua evolução. In: 58 RBRAS/15 SEAGRO, in the period from July 22 to 26 2013. Brazil: Campina Grande – PB; 2013
  2. Silva OM. A epopeia ignorada. São Paulo-SP, Brazil: CEDAS; 1987
  3. Garcia VG. Disabled People and the Labor Market. Brazil: Economic Institute – UNICAMP; 2010
  4. Figueira E. Caminhando em silêncio. São Paulo: Giz editorial e Livraria Ltda; 2008
  5. Agresti A. An Introduction to Categorical Data Analysis. Florida, USA: Wiley & Sons; 2019
  6. Anderson JA. Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B. 1984;46:1-30
  7. Paulino CD, Singer JM. Análise de dados categorizados. São Paulo: Editora Edgard Blucher; 2006
  8. Abreu MNS. Uso de modelos de regressão logística ordinal em epidemiologia: um exemplo usando a qualidade de vida. Belo: Public Health College; 2007
  9. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer; 2009
  10. Oliveira PTMS. Application of the Genetic Algorithm in the Mapping of Epistatic Genes in Controlled Crosses. São Paulo, Brazil: IME-USP; 2008
  11. Casella G, Berger RL. Statistical Inference. California, USA: Brooks/Cole; 1990
  12. Oliveira PTMS. Estimation and Hypothesis Testing in Comparative Calibration. São Paulo: IME-USP; 2001
  13. Draper NR, Smith H. Applied Regression Analysis. New York: John Wiley; 1998
  14. Camarinha-Filho JA. Mixed Linear Models: Estimates of Variance and Covariance Matrices and Model Selection. Brazil: ESALQ-USP; 2008
  15. Broman K. Identifying Quantitative Trait Loci in Experimental Crosses. Berkeley: University of California; 1997
  16. Burnham KP, Anderson DR. Model Selection and Multimodel Inference. New York: Springer; 2002
  17. Burnham KP, Anderson DR. Model Selection and Inference. New York: Springer; 1998
  18. Paulino CD, Turkman AA, Murteira BJF. Estatística Bayesiana. Portugal: Fundação Calouste Gulbenkian; 2003
  19. Sakamoto Y, Ishiguro M, Kitagawa G. Akaike Information Criterion Statistics. Japan: KTK Scientific Publishers; 1986
  20. Susser M. Epidemiology, Health & Society – Selected Papers. New York: Oxford University Press; 1987
  21. Almeida Filho N, Rouquayrol MZ. Introdução a epidemiologia moderna. Rio de Janeiro: Guanabara Koogan; 2006
  22. Luiz OC, Cohn A. Sociedade de risco e risco epidemiológico. Rio de Janeiro: Cadernos Saúde Pública; 2006
  23. Oliveira PTMS. Disabilities people: some analysis of the 2010 Census results and its evolution. In: 58 RBRAS/15 SEAGRO, 2013, from July 22 to 26. Campina Grande-PB; 2014
  24. Abreu MNS, Siqueira AL, Caiaffa WT. Ordinal logistic regression in epidemiological studies. Rev. Saúde Pública. 2009;43:1
  25. Oliveira PTMS. Pessoas com deficiência: questão de risco sob aplicação de regressão logística politômica e sob visão epidemiológica. In: XIV School Regression Models, 2015, from March 2 to 5, Convention Center, UNICAMP, Campinas-SP, Program and Abstracts. Brazilian Statistical Association; 2015
  26. Oliveira PTMS. Disabled People in Brazil: Risk Question. 2018. Available from: https://www.preprints.org/manuscript/201802.0171/v1
