Downloading data from the WRDS website is convenient but not the most transparent and replicable as it requires a workflow along the lines of:
The problem is that step 2 and step 3 happen outside of your code. This essentially leaves a gap in the steps required to run your code, which can leave other people (or your future-self) confused as to what you exactly did on the WRDS website to extract the required data.
For a long time WRDS only offered to programmatically interface with WRDS data through SAS files. Non-SAS clients (think Python and R) could therefore only interact with WRDS data through SAS-bindings. This approach technically worked but was definitely a substantial hassle for many applications. Fortunately, WRDS decided to also make all their data available through a series of PostgreSQL databases. This change happened somewhere in June / July of 2017 (at least for the WRDS Python package). In other words, we can now finally directly interface with WRDS data using tools like Python, R, Stata, and MATLAB!
The WRDS support documents extensively describe how interface with WRDS using the different programs. I will include all links below, but for the sake of completeness will discuss a couple of tips and tricks + some Python code examples to illustrate how one can use it.
WRDSlibrary can be installed from the command line using
pip install wrds
db = wrds.Connection()
Enter your WRDS username [TiesPC]: wrds_user Enter your password:········ Loading library list... Done
libraries = db.list_libraries() libraries[:4]
['aha_sample', 'ahasamp', 'audit', 'audit_audit_comp']
tables = db.list_tables('wrdssec') tables[:4]
['wrds_forms_reg', 'wrds_13f_link', 'wrds_13f_summary', 'wrds_forms']
col_headers = db.describe_table(library='wrdssec', table='wrds_nlpsa')
Approximately 45332100 rows in wrdssec.wrds_nlpsa.
There are two ways:
For a table with not too many rows (like some of the linktables) you can use
data_5rows = db.get_table(library='wrdssec', table='wrds_nlpsa', columns = ['gvkey', 'cik', 'filename', 'finterms_litigious_count'], obs=5)
array(['0000000003', '0000000003', '0000000003', '0000000003', '0000000003'], dtype=object)
sql_query = """ SELECT gvkey, cik, filename, finterms_litigious_count FROM wrdssec.wrds_nlpsa WHERE cik IN ('0000000003', '0000000003', '0000000003', '0000000003', '0000000003') """
data_query = db.raw_sql(sql_query)