Programming

A new version of my Stata guide is now available and freely accessible via GitHub (including code and data examples). Please see https://github.com/dveenman/stataguide or here to go directly to the PDF.

The objective of this guide is to assist BSc/MSc students, PhD students, and junior researchers in using Stata for empirical archival research. Stata is a powerful program that can be used to analyze many different research questions in the fields of accounting, finance, economics, and beyond. While many statistical packages exist, a major advantage of Stata is that it allows you to carefully manage your data and research process, compute key variables needed in empirical research (e.g., fitted or residual values from a prediction model), and easily merge large sets of data (e.g., combining financial statement data from Compustat with stock market data from CRSP and analyst forecast data from IBES). The purpose of this guide is to give students a head start in using Stata in empirical accounting and finance research settings.

Version history:

  • Version 5.0: September 2023
  • Version 4.1: May 2019
  • Version 4.0: January 2019
  • Version 3.0: December 2013
  • Version 2.0: December 2011
  • Version 1.0: December 2010

New in version 5:

  • Update to Stata 18;
  • Publication of code on GitHub;
  • Improved and expanded section on standard errors;
  • Improved section on fixed effects estimation;
  • Improved section on matching and added entropy balancing;
  • Added section on robust regression estimators in Chapter 4;
  • Added section on difference-in-differences estimators in Chapter 4;
  • Added section on creating and formatting graphs in Chapter 4;
  • Added a separate chapter on simulations and programming (Chapter 5);
  • Added Appendix on the use of Stata for downloading data from WRDS;
  • Improved Appendix on implied cost of capital estimation.

New version: Introductory Guide to Using Stata in Empirical Financial Accounting Research

DAVID VEENMAN
2 min read

Create engaging online content using OBS Studio

Covid-19 and lockdowns have accelerated the gradual move to online and hybrid teaching from an evolution to a revolution – during 2020 and the first half of 2021, educators were confronted with digital challenges and new pedagogical approaches that only seem to be the online equivalent of traditional classroom teaching. Still an undiscovered land for many, video conferencing platforms like Microsoft Teams, Zoom, or Cisco Webex Meetings put restrictions on what the educator could do…
Anastasia Kopita
2 min read

Better TLCF data: more accurate research results

The possibility to compensate taxable profits with taxable losses from prior years is important in explaining firms’ tax incentives, tax planning, and tax aggressiveness. The tax loss carryforward (TLCF) is the total amount of taxable losses from the past that can be used to offset future taxable income. Data on this TLCF are, however, often missing in Compustat or not available at all for several countries. In our article “Estimating and imputing missing tax loss…
Jacco Wielhouwer
3 min read

Creating a collaborative virtual classroom: the case of Discord

In this article I will describe my experience with using Discord, which was initially designed as a communication platform for video gamers, as a virtual classroom to enable highly collaborative team work in a remote context. But first some backstory: I give a yearly Python course for European PhD students organized through the Limperg Institute. This year I was going to give one in June at Tilburg University; however, due to the COVID-19 circumstances that…
Ties de Kok
6 min read

How to create eye-candy graphs with Python

Having attended many presentations over the years, I started noticing that tables are such a drag. Unfortunately, we are used to a presentation format that makes us want to show tables. Why? Quite often presenters only reveal the parts of a table that support their story. So, when was the last time you saw a row with low R-squares? And the coefficients? Right, these often join stars. Presenters often rely on animated circles and boxes…
MARTIEN LUBBERINK
2 min read

Getting started with Python for Accounting Research

The Python programming language is a very powerful tool to have in your toolkit as an Accounting researcher. Python is the data science equivalent of a Swiss army knife as it can be used to solve a wide variety of problems: data gathering, web scraping, data processing/cleaning, natural language processing, data analysis (e.g. machine learning), and data visualization. While other tools like R, Stata, and SAS might outshine Python in specific applications (e.g. statistical analysis)…
Ties de Kok
3 min read

Textual Analysis for Accounting Research using Python

During the 2018 EAA PhD Forum in Milan I gave a break-out session on NLP / Textual Analysis for Accounting research using Python. In my talk I provided a bird’s-eye view of the various NLP techniques that are relevant for Accounting research. The full set of slides can be viewed online on my personal website: http://www.tiesdekok.com/EAA_2018_NLP/ (My talk was recorded on video, available here) I have also created a Jupyter Notebook with code examples for…
Ties de Kok
< 1 min read

Retrieving data from WRDS directly using Python, R, and Stata

Downloading data from the WRDS website is convenient but not the most transparent or replicable approach, as it requires a workflow along the lines of:

  1. Create a list of identifiers using your program (e.g. Python)
  2. Load the identifiers into the WRDS web interface + make your query
  3. Download the resulting data from WRDS into a file
  4. Load the WRDS data file back into your program

The problem is that step 2 and step 3 happen outside…
Ties de Kok
4 min read

Use Python to calculate the facial width to height ratio (fWHR)

There is a rise of papers that calculate the Facial Width-to-Height ratio (fWHR) as a proxy for the personal/physical traits (“facial masculinity”) of executives such as CEOs. There is no perfect definition on what the fWHR captures, but most papers interpret the fWHR to be associated with traits such as aggression, risk-seeking, and egocentrism. For a more comprehensive discussion I recommend reading Lefevre, et al. (2013) and Jia, van Lent, and Zeng (2014), available here:…
Ties de Kok
5 min read

Getting started with the Jupyter Notebook

During the last EAA meeting (2016, Maastricht) I was asked to give a short talk during the PhD Forum on the topic of using a tool called the Jupyter Notebook to increase the replicability and transparency of our research. The slides are available here: https://goo.gl/kjcwfv

Understand the Jupyter Notebook

There are three components to project Jupyter:

  • The Jupyter Notebook, which is accessed and used through your browser
  • The Jupyter Server that is run on…
Ties de Kok
3 min read