This post describes the tools I currently use for working with data.
People often ask me to recommend specific tools, and I always
hesitate, because so much boils down to personal preference. I
recently added a workshop to the DSS lineup providing an overview of
popular tools for working with data. The core idea is that researchers
have many options when choosing tools to
implement a reproducible workflow. For example, it doesn't really...
Yesterday, I received the following message from David Drukker, the Executive Director of Econometrics at Stata:
"The xtreg-fe command in Stata produces consistent point estimates and standard errors for all the model parameters. There is a typo on page 27 of https://www.stata.com/manuals/xtxtreg.pdf . The formula for bar(bar(y)) should be the grand mean instead of the average of the panel-level means.
William Gould explicitly derived the grand mean as the term to add back in to...
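The distinction drawn in the message above matters only when panels have unequal numbers of observations. The following is a minimal Python sketch, not Stata's implementation: the panel data and the function names `grand_mean` and `mean_of_panel_means` are made up for illustration. It shows that the grand mean (the average over all observations) and the average of the panel-level means can differ in an unbalanced panel.

```python
from statistics import mean

# Hypothetical unbalanced panel: panel id -> observed y values.
# Panel "a" has four observations, panel "b" has only one.
panels = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0],
}


def grand_mean(panels):
    """Average over every observation, regardless of panel."""
    values = [y for ys in panels.values() for y in ys]
    return mean(values)


def mean_of_panel_means(panels):
    """Average of the per-panel means; each panel gets equal weight."""
    return mean(mean(ys) for ys in panels.values())


print("grand mean:          ", grand_mean(panels))
print("mean of panel means: ", mean_of_panel_means(panels))
```

With balanced panels (the same number of observations in each) the two quantities coincide, which is why the difference is easy to overlook.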
A list of resources for learning R in preparation for CS109 this Spring.
A wealth of R resources are available, and I'm sure I've missed some really good ones. If you have a favorite tutorial or resource that is not listed here, please email me or submit a bug report or pull request to http://github.com/izahn/blog...
One common question I get as a data science consultant involves extracting content from .pdf files. In the best-case scenario the content can be extracted to consistently formatted text files and parsed from there into a usable form. In the worst case the file will need to be run through an optical character recognition (OCR) program to extract the text.
This section gives the basic facts and recommendations for importing files with arbitrary encoding on Windows. The issues described here by and large do not apply on Mac or Linux; they are specific to running R on Windows.
If you are on a deadline and just need to get the job done, this section should be all you need....
Choropleth maps are a means of visualizing spatial data by shading or patterning areas of a map in proportion to the values of a variable. This kind of map can provide insight into how a variable is distributed across a geographic area or the level of variability within a region.
Is your data not in the format of your favorite statistical package? Multiple files need to be merged and reformatted? You've been doing these tedious tasks by hand? There is a better way! Read about an example using a time series of Indian rainfall data.
The selection of a sampling method and sample size is a critical step in the applied research process. The choice depends on factors such as the focus and characteristics of the study, the research population and the effect sizes of interest.
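To illustrate how the effect size of interest drives sample size, here is a rough Python sketch. It rests on assumptions that are not in the post: a two-sided comparison of two group means, a standardized effect size (Cohen's d), and the usual normal approximation. The function name `n_per_group` and the default values for `alpha` and `power` are chosen purely for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sample comparison.

    Uses the normal approximation
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (assumed, not from the post).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects require much larger samples.
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```

Because the required sample grows with the inverse square of the effect size, halving the effect of interest roughly quadruples the sample needed.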
Interactive visualizations of graph data can be valuable tools for understanding relationships across a variety of domains. But how do you convert 4000-year-old letters into a format that can be graphed? And then, how can you graph this data and share it with people?