Statistical notes for clinical researchers: logistic regression

Restor Dent Endod : Restorative Dentistry & Endodontics

Open Lecture on Statistics Statistical notes for clinical researchers: logistic regression
Hae-Young Kim
Restorative Dentistry & Endodontics 2017;42(4):342-348.
DOI: https://doi.org/10.5395/rde.2017.42.4.342
Published online: September 12, 2017

Department of Health Policy and Management, College of Health Science, and Department of Public Health Science, Graduate School, Korea University, Seoul, Korea.

Correspondence to Hae-Young Kim, DDS, PhD. Professor, Department of Health Policy and Management, College of Health Science, and Department of Public Health Science, Graduate School, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Korea. Tel: +82-2-3290-5667, Fax: +82-2-940-2879, kimhaey@korea.ac.kr

Copyright © 2017. The Korean Academy of Conservative Dentistry

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Logistic regression is a regression model where the dependent variable is categorical and corresponding independent variables can be categorical or continuous. This article covers the case of a binary dependent variable such as an event occurring coded 1 = ‘event’ and 0 = ‘no event’. Frequent outcomes are pass/fail, win/lose, disease/no disease, etc. The logistic regression model estimates the probability that an event occurs versus the probability that the event does not occur.
Let's say that an institution performed an assessment procedure to determine pass and fail of the participants, considering exam scores, interview results, and reputation among colleagues. Table 1 shows data with 2 variables, exam score and pass state (1 = pass, 0 = fail). There is a trend that persons with lower scores are more likely to fail, while persons with higher scores tend to pass. When we plot the data as in Figure 1A, we can see that persons with value 1 (pass) have scores shifted to the right side, while persons with value 0 (fail) have scores shifted to the left side. Persons with the same score may not have the same outcome (e.g., the cases of score = 799) because the assessment procedure comprises other factors. At least we can postulate that the probability of pass is higher if the score is higher. What is the best-fit line for these data? A usual straight regression line ranging from minus infinity to infinity does not make sense here, because a probability must lie between zero and one. Instead of ordinary linear regression, logistic regression can fit the probability more adequately. In Figure 1B, the probability estimated by logistic regression is presented. The probability estimated by the logistic regression model (red dots and line) seems reasonable because it reflects the observed reality that the probability of pass decreases toward zero with very low scores, while it increases toward one with very high scores.
Table 1

Scores of applicants who passed the final assessments

Score 755 755 763 781 783 788 792 793 798 799 799 802 813 824 845
Pass 0 0 0 0 1 1 0 1 0 0 1 1 1 1 1
Figure 1
Scatterplot of pass (1 = pass, 0 = fail) and score: (A) pass and score, and (B) estimated probability (P) of pass added.
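The claim that a straight regression line does not suit these data can be checked with a short sketch. It fits an ordinary least squares line to the Table 1 data and evaluates it at a few scores. The scores 700 and 900 are hypothetical values outside the observed range, chosen only for illustration, and the function name `linear_prediction` is an assumption of this sketch.

```python
# Table 1 data: exam scores and pass state (1 = pass, 0 = fail).
scores = [755, 755, 763, 781, 783, 788, 792, 793, 798, 799, 799, 802, 813, 824, 845]
passed = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(passed) / n

# Ordinary least squares slope and intercept for pass ~ score.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, passed))
sxx = sum((x - mean_x) ** 2 for x in scores)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def linear_prediction(score):
    """Fitted value of the straight line; nothing constrains it to [0, 1]."""
    return intercept + slope * score


# 700 and 900 are hypothetical scores outside the observed range.
for s in (700, 800, 900):
    print(s, round(linear_prediction(s), 3))
```

The fitted line gives a negative value at 700 and a value above one at 900, neither of which can be read as a probability.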
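In the previous sections, risk, odds, and odds ratio were defined by the following formulas:
$$\text{Probability or risk: } p = \frac{\text{number of events}}{\text{number of all observations}}$$
$$\text{Odds} = \frac{p(\text{event})}{p(\text{non-event})} = \frac{p}{1-p}$$
$$\text{Odds ratio} = \frac{\text{odds}_1}{\text{odds}_0} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}$$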
Let's consider an example of flipping of fair coins vs. loaded coins.
The odds ratio is important in interpreting logistic regression because it represents how much the odds change with a 1 unit increase in a predictor variable while all other variables are held constant.
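As a small illustration of the definitions above, the following sketch computes odds and an odds ratio for a fair coin versus a loaded coin. The loaded-coin probability of 0.7 is a hypothetical value chosen only for this example, not a figure from the article.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(p1, p0):
    """Ratio of the odds at probability p1 to the odds at probability p0."""
    return odds(p1) / odds(p0)


p_fair = 0.5    # fair coin: heads and tails equally likely
p_loaded = 0.7  # hypothetical loaded coin, assumed for illustration only

print("odds, fair coin:  ", odds(p_fair))
print("odds, loaded coin:", round(odds(p_loaded), 3))
print("odds ratio:       ", round(odds_ratio(p_loaded, p_fair), 3))
```

With these assumed values the fair coin has odds of one, so the odds ratio equals the odds of the loaded coin.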
1. Logit link function
Logistic regression uses the logit link function to relate the unknown probability of the outcome (p) to a linear combination of predictor variables. The probability itself ranges only from zero to one, so it cannot be matched directly with a linear combination of predictor variables, which ranges from minus infinity to infinity [1].
$$\operatorname{logit}(p) = \ln(\text{odds}) = \log_e\left(\frac{p}{1-p}\right)$$
where $\log_e x = \ln(x)$ and e = Euler's number, approximately 2.71828.
The logit link function accommodates p ranging from zero to one. It reconciles the incongruity by transforming the range of the dependent variable, p, into minus infinity to infinity. As seen in Table 2, the resulting logit (p) values run from negative to positive values.
Table 2

Logit transformation from probability (p)

P 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.99
1−p 0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.01
odds 0.01 0.11 0.25 0.43 0.67 1.00 1.50 2.33 4.00 9.00 99.00
Logit (p) = ln (odds) −4.60 −2.20 −1.39 −0.85 −0.41 0.00 0.41 0.85 1.39 2.20 4.60
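The rows of Table 2 can be reproduced with a few lines of code. This is a minimal sketch using only the definition logit (p) = ln [p / (1 − p)]; the function name `logit` and the printed layout are choices made for this example.

```python
import math


def logit(p):
    """Natural log of the odds; defined only for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("logit(p) is defined only for 0 < p < 1")
    return math.log(p / (1 - p))


probabilities = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]

print("    p   1-p    odds   logit")
for p in probabilities:
    print(f"{p:5.2f} {1 - p:5.2f} {p / (1 - p):7.2f} {logit(p):7.2f}")
```

The printed odds and logit columns should agree with Table 2 after rounding to two decimals.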
2. Property of logit and inverse logit
As shown in Figure 2A, the logit function has an s-shaped curve. Logit (p) is undefined at p = 0 and p = 1. As p approaches zero, logit (p) goes toward minus infinity, and as p approaches one, it goes toward infinity. Note that logit (p) equals zero at p = 0.5.
Figure 2B shows the inverse logit graph. The inverse logit returns the probability of the event, ranging from zero to one. Figure 1B and Figure 2B have similar shapes because both represent an estimated probability. The inverse logit formula is as follows:
$$\text{Inverse logit: } \operatorname{logit}^{-1}(\alpha) = \frac{1}{1+e^{-\alpha}} = \frac{e^{\alpha}}{1+e^{\alpha}} = p$$
where α = some number, here α = logit (p) = ln [p / (1 − p)].
Figure 2
Probability and logit transformation: (A) natural log of odds (logit [p]), (B) inverse logit (p).
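The relationship between the logit and its inverse can be sketched as below. The two-branch form of `inv_logit` is an implementation choice of this example: both branches are algebraically the same expression, and the branching only avoids overflow for large negative inputs.

```python
import math


def logit(p):
    """Natural log of the odds, for 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(alpha):
    """Inverse logit: maps any real number alpha to a probability in (0, 1)."""
    if alpha >= 0:
        return 1 / (1 + math.exp(-alpha))
    e = math.exp(alpha)  # same formula written as e^alpha / (1 + e^alpha)
    return e / (1 + e)


# Applying the inverse logit to logit(p) should give back p.
for p in (0.01, 0.3, 0.5, 0.9):
    print(p, round(logit(p), 2), round(inv_logit(logit(p)), 2))
```

Large negative inputs map close to zero and large positive inputs map close to one, which is the s-shape described for Figure 2B.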
3. Estimation of logistic regression equation
Simple logistic regression expresses logit (p) as a linear combination of the predictor variable, as below.
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$
Using fictitious data based on the example above, logistic regression was performed and the output is provided (pages 6–7). The observations (n = 15) were multiplied by 100 to artificially provide enough power to obtain significant estimates. The dependent variable was the binary variable pass, and score was the predictor variable. The SPSS (IBM Corp., Armonk, NY, USA) output (e) below gives the coefficients as follows.
The estimated logistic equation is:
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x = -73.578140 + 0.093115 \times (\text{Score})$$
where p = probability of ‘pass’.
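For readers without SPSS, the estimation step can be sketched in plain Python. This is not the SPSS procedure: it is a hand-written Newton-Raphson maximum likelihood fit, and it assumes that the fictitious data are the 15 observations of Table 1. Replicating every observation 100 times, as done in the article, changes the standard errors but not the point estimates, so the coefficients should land near those reported above if that assumption holds. The function names are hypothetical.

```python
import math

# Table 1 data, assumed here to be the data behind the reported fit.
scores = [755, 755, 763, 781, 783, 788, 792, 793, 798, 799, 799, 802, 813, 824, 845]
passed = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1]


def sigmoid(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def fit_simple_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centre x for numerical stability
    a, b = 0.0, 0.0                 # intercept on the centred scale, slope
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in xc]
        w = [pi * (1 - pi) for pi in p]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, xc))
        # Entries of the 2 x 2 information matrix.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xc))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system in closed form.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a - b * mean_x, b        # intercept back on the original scale


b0, b1 = fit_simple_logistic(scores, passed)
print("intercept:", round(b0, 4))
print("slope:    ", round(b1, 6))
```

In practice a statistical package would also report standard errors and confidence intervals, which this sketch leaves out.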
Here exp (β1) represents the odds ratio, that is, the multiplicative change in odds with a 1 unit increase in the predictor variable. The odds ratio is exp (β1) = $e^{0.093115}$ = 1.097588. Therefore, as the score increases by 1 point, the odds of pass are estimated to increase by 9.8%. The 95% confidence interval of the odds ratio was [1.086, 1.109], which does not include one. An odds ratio of one means that a 1 unit increase in the predictor variable makes no difference in the odds. Therefore, to claim statistical significance, it is important to confirm that the 95% confidence interval of the odds ratio does not include one.

1) Estimated probability

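After some algebra, the inverse logit gives the estimated probability as a function of the predictor variable:
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$
$$\frac{p}{1-p} = e^{\beta_0+\beta_1 x}$$
$$p = e^{\beta_0+\beta_1 x}(1-p)$$
$$p\left(1+e^{\beta_0+\beta_1 x}\right) = e^{\beta_0+\beta_1 x}$$
$$\hat{p} = \frac{e^{\beta_0+\beta_1 x}}{1+e^{\beta_0+\beta_1 x}}$$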
To get the probability of pass at a score of 781, we can use the estimated probability function. If the score increases by one point to 782, the estimated probability can be calculated in the same way, as shown in Table 3. For the score 781, the estimated probability of pass in the assessment is 0.30, or 30%. The odds ratio between the two scores is 1.098, the same value as exp (β1) from the SPSS output, representing a 9.8% increase in odds for a 1 point increase in score.
Table 3

Estimated probability and odds ratio based on logistic regression model

Score = 781
$$\hat{p} = \frac{e^{\beta_0+\beta_1 x}}{1+e^{\beta_0+\beta_1 x}} = \frac{e^{-73.578140+0.093115\times 781}}{1+e^{-73.578140+0.093115\times 781}} = \frac{e^{-0.85533}}{1+e^{-0.85533}} = \frac{0.425145}{1+0.425145} = 0.298317$$
$$\text{odds} = \frac{p}{1-p} = \frac{0.298317}{1-0.298317} = 0.425145$$
Score = 782
$$\hat{p} = \frac{e^{\beta_0+\beta_1 x}}{1+e^{\beta_0+\beta_1 x}} = \frac{e^{-73.578140+0.093115\times 782}}{1+e^{-73.578140+0.093115\times 782}} = \frac{e^{-0.76221}}{1+e^{-0.76221}} = \frac{0.466634}{1+0.466634} = 0.318167$$
$$\text{odds} = \frac{p}{1-p} = \frac{0.318167}{1-0.318167} = 0.466634$$
Odds ratio for a 1 point increase in score:
$$\frac{\text{odds at 782}}{\text{odds at 781}} = \frac{0.466634}{0.425145} = 1.097588$$
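The Table 3 arithmetic can be repeated with the reported coefficients. The sketch below plugs β0 = −73.578140 and β1 = 0.093115 into the estimated probability function; the function and variable names are choices made for this example.

```python
import math

B0 = -73.578140  # intercept reported in the article
B1 = 0.093115    # slope for score reported in the article


def estimated_probability(score):
    """Estimated probability of pass: e^z / (1 + e^z) with z = B0 + B1 * score."""
    z = B0 + B1 * score
    return math.exp(z) / (1 + math.exp(z))


def odds(p):
    """Odds corresponding to probability p."""
    return p / (1 - p)


p_781 = estimated_probability(781)
p_782 = estimated_probability(782)
ratio = odds(p_782) / odds(p_781)

print("p at 781:  ", round(p_781, 6))
print("p at 782:  ", round(p_782, 6))
print("odds ratio:", round(ratio, 6))
print("exp(B1):   ", round(math.exp(B1), 6))
```

The ratio of the two odds equals exp (β1) whatever the starting score, which is why a single odds ratio summarizes the effect of a 1 point increase.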
Estimated probabilities for other score values are shown in the SPSS output (f) below under ‘PRE_1’. Using these, we can calculate the odds at each score and the odds ratio between 2 specific scores. For example, suppose my present score is 781 and I'd like to know how much the odds increase if I raise my score by 11 points to 792. The odds ratio follows directly from the estimated odds in Table 4, and the calculation gives an increase of 179% in odds when I raise my score by 11 points.
$$\frac{\text{odds at 792}}{\text{odds at 781}} = \frac{1.18}{0.43} = 2.79$$
The value 2.79 comes from the unrounded odds and equals exp (11 × 0.093115); dividing the rounded values 1.18 and 0.43 alone would give about 2.74.
Table 4

Scores, estimated probabilities, and odds ratios based on logistic regression model

Score 755 755 763 781 783 788 792 793 798 799 799 802 813 824 845
P 0.04 0.04 0.07 0.30 0.34 0.45 0.54 0.57 0.67 0.69 0.69 0.75 0.89 0.96 0.99
Odds 0.04 0.04 0.08 0.43 0.51 0.82 1.18 1.30 2.07 2.27 2.27 3.01 8.37 23.31 164.84
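The Table 4 entries and the 11 point comparison can be recomputed from the reported coefficients, as in the sketch below. It uses the coefficients as printed to six decimals, so the last digits of the largest odds may differ slightly from the SPSS values; the function names are hypothetical.

```python
import math

B0 = -73.578140  # intercept reported in the article
B1 = 0.093115    # slope for score reported in the article

scores = [755, 755, 763, 781, 783, 788, 792, 793, 798, 799, 799, 802, 813, 824, 845]


def estimated_odds(score):
    """Estimated odds of pass: exp(B0 + B1 * score)."""
    return math.exp(B0 + B1 * score)


def estimated_probability(score):
    """Estimated probability of pass: odds / (1 + odds)."""
    o = estimated_odds(score)
    return o / (1 + o)


print("score     p    odds")
for s in scores:
    print(f"{s:5d} {estimated_probability(s):5.2f} {estimated_odds(s):7.2f}")

# Odds ratio for raising the score from 781 to 792 (11 points).
odds_ratio_11 = estimated_odds(792) / estimated_odds(781)
print("odds ratio, 781 to 792:", round(odds_ratio_11, 3))
```

Because the model is linear on the logit scale, the odds ratio for an 11 point increase is exp (11 × β1), regardless of the starting score.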
1. Allison PD. Logistic regression using SAS: theory and application. 2nd ed. Cary (NC, USA): SAS Institute Inc.; 2012. p. 19-26.
Appendix 1

Procedure of logistic regression using IBM SPSS.

The procedure of logistic regression using IBM SPSS Statistics for Windows Version 23.0 (IBM Corp.) is as follows.
rde-42-342-a001.jpg
*In this fictitious data, the ‘freq’ variable was used to multiply the number of observations to get sufficient power.
