Blog

My Data Science Tool Box

This post describes the tools I currently use for working with data. People often ask me to recommend specific tools, and I always hesitate, because so much boils down to personal preference. I recently added a workshop to the DSS lineup that provides an overview of popular tools for working with data. The core idea is that researchers have many options when choosing tools to implement a reproducible workflow. For example, it doesn't really...


Update: Stata v14.1 and 15 Advisory: xtreg, fe does what you expect; manual is incorrect

UPDATE: Sunday, March 4

Yesterday, I received the following message from David Drukker, the Executive Director of Econometrics at Stata:

"The xtreg-fe command in Stata produces consistent point estimates and standard errors for all the model parameters. There is a typo on page 27 of https://www.stata.com/manuals/xtxtreg.pdf . The formula for bar(bar(y)) should be the grand mean instead of the average of the panel-level means.

William Gould explicitly derived the grand mean as the term to add back in to recover the...
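To make the distinction concrete, here is a minimal Python sketch (not Stata's code) of the within transformation behind the fixed-effects estimator: each value minus its panel mean, plus a term added back. It assumes a single regressor, plain OLS on the transformed data, and a small hypothetical unbalanced panel. Adding back the grand mean over all observations makes the estimated intercept equal the grand mean of y minus the slope times the grand mean of x, so the fitted line passes through the overall sample means; adding back the unweighted average of the panel-level means gives a different intercept. The slope is the same either way, and the two choices coincide when the panel is balanced.

```python
from collections import defaultdict

# Hypothetical unbalanced panel: (panel_id, x, y). Panel "a" has 3 obs, "b" has 2.
PANEL = [("a", 1.0, 2.0), ("a", 2.0, 3.0), ("a", 3.0, 7.0),
         ("b", 4.0, 10.0), ("b", 6.0, 11.0)]


def mean(values):
    return sum(values) / len(values)


def panel_means(rows, col):
    """Mean of column `col` within each panel."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[col])
    return {pid: mean(vals) for pid, vals in groups.items()}


def fe_fit(rows, use_grand_mean=True):
    """OLS of y on x after transforming each value to v_it - panel mean + add_back.

    add_back is the grand mean over all observations (use_grand_mean=True)
    or the unweighted average of the panel-level means (use_grand_mean=False).
    Returns (intercept, slope).
    """
    transformed = {}
    for col in (1, 2):  # column 1 is x, column 2 is y
        pmeans = panel_means(rows, col)
        if use_grand_mean:
            add_back = mean([row[col] for row in rows])
        else:
            add_back = mean(list(pmeans.values()))
        transformed[col] = [row[col] - pmeans[row[0]] + add_back for row in rows]
    xs, ys = transformed[1], transformed[2]
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope


print(fe_fit(PANEL, use_grand_mean=True))   # approximately (1.8, 1.5)
print(fe_fit(PANEL, use_grand_mean=False))  # approximately (2.0, 1.5)
```

In this toy panel the grand mean of y is 6.6 while the average of the two panel means is 7.25, which is exactly the gap the manual's typo papered over.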


Extracting content from .pdf files

One common question I get as a data science consultant involves extracting content from .pdf files. In the best-case scenario, the content can be extracted to consistently formatted text files and parsed from there into a usable form. In the worst case, the file must be run through an optical character recognition (OCR) program to extract the text.
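As a rough sketch of that best-case/worst-case triage, the Python below assumes the pdftotext command-line tool (distributed with Poppler and Xpdf) is installed and on the PATH; passing "-" as the output file sends the text to standard output, and -layout keeps the page's physical layout. The "too little text means it needs OCR" check, its threshold, and the "Label: value" line format are hypothetical placeholders; real documents need their own parsing rules, and the OCR step itself is left to whichever engine you use.

```python
import re
import subprocess


def pdf_to_text(pdf_path: str) -> str:
    """Extract the text layer with pdftotext (assumed to be on PATH)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" writes text to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def probably_needs_ocr(text: str, min_chars: int = 20) -> bool:
    """Hypothetical heuristic: scanned PDFs usually yield little or no text."""
    return len(re.sub(r"\s", "", text)) < min_chars


# Hypothetical line format: "Label: value", with optional surrounding spaces.
FIELD = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$")


def parse_fields(text: str) -> dict:
    """Collect "Label: value" lines into a dict; all other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields
```

A typical flow would call pdf_to_text on a file, send it to OCR if probably_needs_ocr returns True, and otherwise hand the text to parse_fields (or a parser written for the actual layout).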

Overview of available tools

For years pdftotext from...


Escaping from character encoding hell in R on Windows

Note: the title of this post was inspired by this question on stackoverflow.

This section gives the basic facts and recommendations for importing files with arbitrary encoding on Windows. The issues described here by and large do not apply on Mac or Linux; they are specific to running R on Windows.

If you are on a deadline and just need to get the job done, this section should be all you need....
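The recommendations in the post are specific to R, but the underlying failure is language-neutral: bytes written in one encoding are read back as another. As an illustration in Python rather than R, the sketch below writes UTF-8 text to a temporary file, then reads it once as Windows-1252 (cp1252, the legacy code page on many English-language Windows installations) and once as UTF-8. The sample string is arbitrary.

```python
import os
import tempfile

text = "caf\u00e9 na\u00efve"  # "café naïve"

# Simulate a UTF-8 file produced on another system.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.write(text.encode("utf-8"))
    path = f.name

try:
    # Wrong: decoding UTF-8 bytes as cp1252 turns each accented letter
    # into two characters (mojibake).
    with open(path, encoding="cp1252") as fh:
        wrong = fh.read()
    # Right: state the file's actual encoding explicitly when reading.
    with open(path, encoding="utf-8") as fh:
        right = fh.read()
finally:
    os.remove(path)

# wrong == "cafÃ© naÃ¯ve"
# right == "café naïve"
```

The general lesson carries over to R: never rely on the platform default; tell the import function what encoding the file actually uses.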


Create Choropleth Maps in R

Choropleth maps are a means of visualizing spatial data by shading or patterning areas of a map in proportion to the values of a variable. This kind of map can provide insight into how a variable is distributed across a geographic area or how much it varies within a region.
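The shading step needs a rule that turns each area's value into a shade. A minimal Python sketch, assuming equal-interval classes (one common scheme; quantile breaks are another) and a hypothetical four-shade light-to-dark blue palette, might look like this. The region names and values are made up, and drawing the map itself is left to a mapping library.

```python
# Hypothetical palette, ordered from light (low values) to dark (high values).
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]


def equal_interval_classes(values, k):
    """Assign each value to one of k equal-width classes (0 = lowest)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]  # all values equal: one shade for everything
    # The maximum value lands exactly on the upper edge, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def shade_regions(region_values, palette=PALETTE):
    """Map each region name to a fill color based on its value."""
    names = list(region_values)
    classes = equal_interval_classes([region_values[n] for n in names], len(palette))
    return {name: palette[c] for name, c in zip(names, classes)}


print(shade_regions({"A": 10, "B": 25, "C": 40, "D": 55}))
```

Equal intervals keep the legend easy to read, but with skewed data most areas can end up in one class, which is when quantile breaks are often preferred.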
