A common question I get as a data science consultant involves extracting content from .pdf files. In the best-case scenario, the content can be extracted to consistently formatted text files and parsed from there into a usable form. In the worst case, the file will need to be run through an optical character recognition (OCR) program to extract the text.
This section gives the basic facts and recommendations for importing files with arbitrary encoding on Windows. The issues described here by and large do not apply on Mac or Linux; they are specific to running R on Windows.
If you are on a deadline and just need to get the job done, this section should be all you need.
Choropleth maps are a means of visualizing spatial data by shading or patterning areas of a map in proportion to the values of a variable. This kind of map can provide insight into how a variable is distributed across a geographic area or the level of variability within a region.
Is your data not in the format of your favorite statistical package? Multiple files need to be merged and reformatted? You've been doing these tedious tasks by hand? There is a better way! Read about an example using a time series of Indian rainfall data.
The selection of a sampling method and sample size is a critical step in the applied research process. The choice depends on factors such as the focus and characteristics of the study, the research population and the effect sizes of interest.
Interactive visualizations of graph data can be valuable tools for understanding relationships across a variety of domains. But how do you convert 4,000-year-old letters into a format that can be graphed? And then, how can you graph this data and share it with people?