Dirk Enzmann - Statistical Software (Some Useful Things)

Below you will find some small executables, SPSS macros and scripts, Excel templates, R functions (see: http://www.r-project.org/), and Stata ado-files that I wrote for special calculations in statistical analyses. The executable programs are written in Pascal 7.0 and run under 16- and 32-bit Windows (3.x, 9x, NT4, XP). The files may be downloaded and distributed without further permission, provided that they remain unchanged. They have been tested and found to be virus-free. The author is not liable for any damages caused by their use. Suggestions for improvements are welcome.

Name | Description | Application | Download |

BetaDiff | For calculating confidence intervals and testing the significance of the difference of two beta-coefficients from independent samples (description). | Executable | BetaDiff.zip |

Center | For centering a set of variables (with listwise deletion of missing cases); useful for computing products of variables for interaction terms in regression analyses. | SPSS | center.sps |

clstop_lbt | Stata module to determine via -cluster stop, rule(lbt)- the number of kmeans clusters (or to determine whether there is more than one kmeans cluster) according to the lower bound technique presented in Steinley & Brusco (2011). (To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\c\" folder - the recommended method, however, is to enter ssc install clstop_lbt in Stata's command window.) | Stata | clstop_lbt.ado clstop_lbt.sthlp |

CorrTot | For computing pooled means, standard deviations and a pooled correlation matrix from means, standard deviations and correlation matrices of two independent samples (description). | R Executable | corrtot.r CorrTot.zip |

CovMat | For writing a covariance matrix of a set of variables (with listwise deletion of missing cases) to a text file. | SPSS | covmat.sps |

Crosstabs | R function to simulate the SPSS procedure CROSSTABS. | R | crosstabs.r |

DivCat | Stata module to calculate five measures of diversity for multiple categories: generalized variance (GV), entropy (H), their normalized counterparts (NGV, NH) (see Budescu & Budescu, 2012), and polarization (RQ) (see Montalvo & Reynal-Querol, 2008). (To install you may copy the contents of the .zip-file into your "\ado\plus\d\" folder - the recommended method, however, is to enter ssc install divcat in Stata's command window.) | Stata | divcat.zip |

dta2sav | Stata module to create SPSS syntax and a Stata data file for converting Stata data into SPSS data. Labeled extended missing values are recoded into "numeric" values, which are then defined as missing by the SPSS syntax that -dta2sav- creates. This makes it possible to preserve the labels of missing values as defined in Stata for subsequent use in SPSS. (To install you may copy the .ado- and the .sthlp-file into your "\ado\plus\d\" folder - the recommended method, however, is to enter ssc install dta2sav in Stata's command window.) | Stata | dta2sav.do dta2sav.sthlp |

DumCode | For creating dummy variables (indicator coding) of a nominal variable. Useful for regression analyses with independent variables that are categorical. | SPSS | dumcode.sps |

Fa.promax | To compute maximum likelihood factor analysis with varimax and promax rotation; allows specification of promax power and sorting of loadings; output includes correlation matrix of factors and (optionally) matrices of factor scores | R | fa.promax.r |

Freq | R function to simulate the SPSS procedure FREQUENCIES. | R | freq.r |

Hist.kdnc | To plot a histogram overlaid by a kernel density and a normal curve. | R | hist.kdnc.r |

IntGraph | Template for drawing interaction plots of a regression equation with interaction term (description). | Excel | intgraph.zip |

Kurtosis | To compute the unbiased population estimate or biased sample statistic of kurtosis. | R | kurtosis.r |

LogRegR2 | To calculate Chi² model fit and R² analogs (pseudo R²: McFadden's R², Cox & Snell index, Nagelkerke index, McKelvey & Zavoina's R²) of a logistic regression model obtained by glm(..., family = 'binomial'). | R | LogRegR2.r |

MeanSD | For computing interactively the mean and standard deviation of a combined sample from up to 50 independent samples. | Executable | meansd.zip |

MeanSDF | Same as MeanSD, but for up to 1000 samples and with the data read from an input file (description). | Executable | meansdf.zip |

Median | For calculating the median and quartiles of a variable (optionally for all values of a break variable) according to one of six different methods (description). | SPSS | median.sps |

MEResc | To rescale the results of mixed (multilevel) nonlinear probability models such as xtmelogit, xtlogit, or xtprobit to the same scale as the intercept-only model. This makes it possible to compare regression coefficients or variance components across hierarchically nested models [see: Hox, J. J. (2010). Multilevel Analysis: Techniques and Applications (2nd ed., Chapter 6.5, pp. 133-139). New York: Routledge]. (To install you may copy the .ado-, .mo- and .sthlp-files into your "\ado\plus\m\" folder - the recommended method, however, is to enter ssc install meresc in Stata's command window.) | Stata | meresc.zip |

Miss2Sys | Script to recode all missing values of all numeric variables to system missing values (useful if you want to import an SPSS data file with different missing values into R) (description). | SPSS | Miss2Sys.sbs |

Moments2 | To calculate the mean, standard deviation, and different types of skewness and kurtosis (according to Joanes & Gill, 1998) of a list of variables. The defaults are the estimates of skewness and kurtosis used in SAS and SPSS. (To install you may copy the .ado- and the .hlp-file into your "\ado\plus\m\" folder - the recommended method, however, is to enter ssc install moments2 in Stata's command window.) | Stata | moments2.ado moments2.hlp |

nb_adjust | For identifying and adjusting (or removing) outliers of a variable assumed to have a negative binomial distribution. (Requires Stata version 12.1 or higher. To install you may copy all files of the .zip-file starting with "n" into the "\ado\plus\n\" folder and all files starting with "r" into the "\ado\plus\r\" folder.) | Stata | nb_adjust.zip |

Part_tst | For testing the difference between two standardized regression coefficients of the same equation (one sample) (description). | SPSS | part_tst.zip |

PCA | To compute a principal components "factor" analysis (PCA) with varimax and promax rotation; different options for the number of components (factors): direct specification, parallel test criteria (random eigenvalues), or minimum eigenvalue; optional specification of promax power, sorting of loadings, and matrices of factor scores (see also: RanEigen and Fa.promax). | R | pca.r |

Plot.fitPNB | To plot the proportion of the observed counts and the fitted (expected) probabilities of Poisson and negative binomial distributed counts of a variable. | R | plot.fitPoisNegb.r |

Plot.kdnc | To plot a kernel density curve overlaid by a normal curve. | R | plot.kdnc.r |

Plot.power | To calculate and plot power of a one sample z-test of a sample mean. | R | plot.power.r |

Plot_Power | Create graph to demonstrate power analysis (one-sample z-test of a mean) - see demonstration in pow_demo.do. | Stata | plot_power.do pow_demo.do |

ProfSim | For calculating different measures of profile similarity based on two sets of variables (description: see comments at the end of the macro). | SPSS | profsim.sps |

prop.CI | To calculate the confidence interval of a single proportion according to one of eleven methods (see: Brown, Cai, & DasGupta, 2001; Newcombe, 1998) (default: likelihood ratio method) (description: see comments of source file). | R | prop.CI.r ex_prop.CI.r |

R2_mz | To compute McKelvey & Zavoina's Pseudo-R² for multilevel logistic regression, random effects, and fixed effects logit and probit models (see Windmeijer, 1995). (To install you may copy the .ado-, .mo- and .sthlp-files into your "\ado\plus\r\" folder - the recommended method, however, is to enter ssc install r2_mz in Stata's command window.) | Stata | r2_mz.zip |

RanEigen | For determining the number of components (factors) to retain in a principal component analysis (PCA) by using random eigenvalues (parallel analysis) (APM article describing version 1.0) (how to install RanEigen?). | Executable R | pacrit.zip RanEigen.r |

Rel_Clust | Stata module to compute indices of relative clusterability of a set of variables according to Steinley & Brusco (2008) and to transform a set of variables to z-standardized, range standardized, or to variance-to-range ratio weighted variables for use in (K-means) cluster analysis. (To install you may copy the .ado- and the .hlp-file into your "\ado\plus\r\" folder - the recommended method, however, is to enter ssc install rel_clust in Stata's command window.) | Stata | rel_clust.ado rel_clust.sthlp |

RelDiff | For computing the reliability of a difference score (gain score) according to Zimmerman & Williams (1982). | Executable | reldiff.zip |

Reliability | R function to simulate the SPSS procedure RELIABILITY. | R | reliability.r |

r_bis | For computing a biserial correlation coefficient and its significance. | SPSS | r_bis.sps examp_r.sps |

R_Prob | For calculating the significance, 95%-confidence interval, and Fisher's Z value of a Pearson correlation coefficient r (given sample size n). | Executable | r_prob.zip |

r_tetra | For computing a tetrachoric correlation coefficient and its significance (see also: TetCorr). | SPSS | r_tetra.sps examp_r.sps |

scores (R) | To create scores (min, max, sum, sd, or mean) of variables. The user can specify the minimum number of valid values necessary for the score to be valid. If mean scores are requested it is possible to center them at the overall mean, to transform them to z-scores, or to transform them to POMP (percent of maximum possible) scores. | R | scores.r test_sc.r |

scores (Stata) | To create scores (row-wise) of a set of variables. The user can specify the minimum number of valid values necessary for the score to be valid. The scores created can be: minimum, maximum, total (sum), median, percentile, standard deviation, or mean. If mean scores are requested, it is possible to center them at the overall mean or to transform them to z-scores, POMP (percent of maximum possible) scores, the proportion of maximum possible scores, or the shrunken proportion of maximum possible scores. (To install you may copy the .ado- and the .hlp-file into your "\ado\plus\s\" folder - the recommended method, however, is to enter ssc install scores in Stata's command window.) | Stata | scores.ado scores.hlp |

sim_BE | To simulate series of Bernoulli experiments and plot the cumulative sequence of success rates (optionally including confidence intervals). | Stata | sim_be.do be_demo.do |

sim_CI | To demonstrate the concept of confidence intervals (CIs) by simulation. The program creates (animated) plots of confidence intervals (employing either t- or normal-distribution) by drawing a user specified number of samples of user specified size from the normal distribution with user specified mu and sigma. Optional output contains sample statistics and coverage rate of confidence intervals. | R Stata | sim_CI.r CI_demo.r sim_ci.do ci_demo.do |

Skewness | To compute the unbiased population estimate or biased sample statistic of skewness. | R | skewness.r |

SortL | To sort rotated factor loadings (pattern matrix) or components previously created by the postestimation command -rotate-. Sorting of loadings or components by size facilitates the interpretation of a factor solution. (To install you may copy the .ado- and the .hlp-file into your "\ado\plus\s\" folder - the recommended method, however, is to enter ssc install sortl in Stata's command window.) | Stata | sortl.ado sortl.hlp |

SPSS2Stata | Script for converting an SPSS data file (.sav) into a Stata/SE data file (.dta). The script now supports variable names longer than 8 characters. You may also find the Stata ado -usespss- useful (to install, enter ssc install usespss in Stata's command window). However, unlike this script and like StatTransfer, -usespss- ignores value labels of missing values (description). | SPSS | spss2stata.sbs |

t-Test | For testing the difference in means between two independent samples (given means, standard deviations and sample sizes of both samples) (description). | Executable | t_test.zip |

TabNotes | To convert .not-files created by the data entry software EpiData (see: http://www.epidata.dk/index.htm) containing data entry notes into a tabulator-delimited file (for example, to export the notes into an Excel file) (description). | Executable | TabNotes.zip |

TetCorr | DOS program and source code (Pascal) for computing a matrix of tetrachoric correlation coefficients of up to 50 variables and a maximum of 8,000 cases (see also: r_tetra) (description). | Executable | tetcorr.zip |

TetVNPos | To determine which variables are responsible for a matrix of tetrachoric correlations not being positive definite (dependencies: packages -psych- and -mvtnorm-) | R | TetVNPos.r |

TRd | For computing the Satorra-Bentler scaled chi-square difference test (TRd) based on the MLM estimators obtained by MPlus, see: http://www.statmodel.com/chidiff.html. | Executable | trd.zip |

VDef2SPS | Script for creating SPSS syntax to define the variables (variable labels, value labels, and missing values) according to the definitions of a specific SPSS data file (*.sav) (description). | SPSS | VDef2SPS.sbs |
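
Several of the executables listed above (MeanSD, t-Test, R_Prob) work from summary statistics rather than raw data. The following is a minimal Python sketch of the textbook formulas behind such calculations. It is an illustration only, not the author's Pascal code: the function names are hypothetical, standard deviations are assumed to use n - 1 in the denominator, the t-test assumes equal variances, and the programs themselves may differ in details. Looking up p-values is omitted.

```python
import math
from statistics import NormalDist


def combine_mean_sd(samples):
    """Combine (n, mean, sd) triples of independent samples into one (n, mean, sd)."""
    total_n = sum(n for n, _, _ in samples)
    grand_mean = sum(n * m for n, m, _ in samples) / total_n
    # Sum of squares within the samples plus sum of squares between sample means.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in samples)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in samples)
    combined_sd = math.sqrt((ss_within + ss_between) / (total_n - 1))
    return total_n, grand_mean, combined_sd


def pooled_t(n1, m1, sd1, n2, m2, sd2):
    """Student's t for two independent samples (equal variances assumed) and its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def fisher_z_ci(r, n, level=0.95):
    """Fisher's Z of a correlation r and its confidence interval on the r scale."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return z, math.tanh(z - crit * se), math.tanh(z + crit * se)


if __name__ == "__main__":
    # Two small samples described only by n, mean, and SD.
    a = (4, 2.5, math.sqrt(5 / 3))  # summary of the values 1, 2, 3, 4
    b = (3, 7.0, 1.0)               # summary of the values 6, 7, 8
    print(combine_mean_sd([a, b]))
    print(pooled_t(*a, *b))
    print(fisher_z_ci(0.5, 103))
```

The combined standard deviation adds the between-sample sum of squares to the within-sample sums of squares, so it reproduces the standard deviation that would be obtained from the pooled raw data.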

Some other useful things:

- A very useful utility is the "real-time codebook" ViewSav, written by Karel Asselberghs, which lets you view the variables of SPSS and Stata data files, including labels and basic statistics; see: http://www.asselberghs.nl/stuff.htm
- For an extremely useful source of SPSS macros see: http://www.spsstools.net

(last update: January 26, 2015)