Our team of PhD data science specialists serves the Harvard social science community. We perform free in-person consulting on publication research issues as a part of the Institute for Quantitative Social Science. Our services are available to faculty, postdocs, graduate students, staff, and undergraduates writing a senior thesis. Specifically, we aim to provide advice on:

  • Data analysis and programming
  • Organization, secure storage, and sharing of data
  • Research project planning
  • Training in the use of both established software packages and emerging tools

To schedule a consultation or request help, please send a detailed description of your specific research problem, your affiliation (School / Dept.), and position (e.g., grad student) to help@iq.harvard.edu or use our contact form.

Our team also offers additional data science services. If you have a data science problem and you wish to speak to us, please contact us at help@iq.harvard.edu.


Introduction to R

2017 Sep 15

Introduction to R (9:30 -- 12:00)

Date: 

Friday, September 15, 2017, 9:30am to 12:00pm

Location: 

K018, CGIS Knafel building, concourse level

Get an introduction to R, the open-source system for statistical computation and graphics. 

With hands-on exercises, learn how to import and manage datasets, create R objects, install and load R packages, conduct basic statistical analyses, and create common graphical displays. This workshop is appropriate for those with little or no prior experience with R.

More details including workshop materials are available at http://dss.iq.harvard.edu/workshop-materials#widget-2.

This workshop is free for Harvard and MIT affiliates.

2017 Sep 15

Introduction to R (1:00 -- 3:30)

Date: 

Friday, September 15, 2017, 1:00pm to 3:30pm

Location: 

K018, CGIS Knafel building, concourse level

Get an introduction to R, the open-source system for statistical computation and graphics. 

With hands-on exercises, learn how to import and manage datasets, create R objects, install and load R packages, conduct basic statistical analyses, and create common graphical displays. This workshop is appropriate for those with little or no prior experience with R.

 

More details including workshop materials are available at http://dss.iq.harvard.edu/workshop-materials#widget-2.

This workshop is free for Harvard and MIT affiliates.

 

Regression Models in R

Introduction to R Graphics with ggplot2

Basic R Programming for Data Analysis

Introduction to Stata

2017 Sep 08

Introduction to Stata

9:30am to 12:00pm

Location: 

K018, CGIS Knafel building, concourse level

This class will provide a hands-on introduction to Stata. You will learn how to navigate Stata’s graphical user interface, import data, calculate descriptive statistics, and manage data and value labels. This workshop is designed for individuals who have little or no experience using Stata software.


Data Management in Stata

2016 Apr 01

Data Management in Stata

9:00am to 12:00pm

Location: 

Rm K018, 1737 Cambridge St (CGIS Knafel Building)

This class will introduce common data management techniques in Stata. Topics covered include basic data manipulation tasks such as recoding variables, creating new variables, working with missing data, and generating variables based on complex selection criteria. Participants will be introduced to strategies for merging datasets (adding both variables and observations) and collapsing datasets. This workshop is intended for users who have an introductory level of knowledge of Stata software.

This workshop is free for Harvard and MIT affiliates.


Regression and Graphing in Stata

2017 Sep 08

Regression and Graphing in Stata

1:00pm to 3:30pm

Location: 

K018, CGIS Knafel building, concourse level

This hands-on class will provide a comprehensive introduction to graphics in Stata.  Topics for the class include graphing principles, descriptive graphs, linear regression, factor variables, and post-estimation graphs.  This is an introductory workshop appropriate for those with only basic familiarity with Stata.


Introduction to Python

2016 Apr 28

Introduction to Python

9:30am to 12:30pm

Location: 

Rm K018, 1737 Cambridge St (CGIS Knafel Building)

Have you always wanted to learn a programming language but weren't sure how to get started? This workshop teaches the basic grammar of the Python programming language, a powerful but easy-to-use tool for getting more out of your computer. Little to no knowledge of Python or programming is assumed.

This workshop is free for Harvard and MIT affiliates.


2017 Mar 22

Introduction to Python for Data Analysis 1

6:00pm to 8:00pm

Location: 

1737 Cambridge St. Cambridge, CGIS Knafel Building, K018

 

The intended audience of this course is data scientists and other people involved in data analysis rather than career programmers. There will be some light discussion of good programming principles, but only as they pertain to good data science practices. The goal is to take participants from a low level of familiarity with Python to performing simple data manipulations. At the end of the workshop, you should leave with a solid foundation to continue your own learning and experimenting with the language.

Workshop Preparation

In preparation for the

2017 Mar 23

Introduction to Python for Data Analysis 2

6:00pm to 8:00pm

Location: 

1737 Cambridge St. Cambridge, CGIS Knafel Building, K018

The modules for manipulating tabular data in Python are different enough that working with them sometimes feels like using a different language from basic Python. Building on the foundation from Introduction to Python for Data Analysis 1, we will explore this aspect of Python together, loading simple datasets into Python.

Prerequisites: Introduction to Python for Data Analysis 1

Workshop Preparation

In preparation for the workshop please install the Anaconda distribution of Python 3.5.  It can be found here:
https://www.continuum.io/downloads


Intermediate Python

2016 Apr 29

Intermediate Python

9:30am to 12:30pm

Location: 

Rm K018, 1737 Cambridge St (CGIS Knafel Building)

This course is a survey of advanced features of the Python programming language that are relevant to data analysis. This includes exposure to some of the most powerful features of Python, such as functional and object-oriented programming. In addition, we will learn how to use inspection to learn about the undocumented features of new modules and data structures.

This workshop is free for Harvard and MIT affiliates.


2016 Nov 03

Visualization in Python

6:00pm to 8:00pm

Location: 

1737 Cambridge St. Cambridge, CGIS Knafel Building, K018

In this course, we explore what it takes to create beautiful visualizations in Python using the matplotlib and seaborn packages.

Prerequisites: Introduction to Python for Data Analysis 1; Introduction to Python for Data Analysis 2

 

Workshop Preparation

In preparation for the workshop please install the Anaconda distribution of Python 3.5.  It can be found here:

https://www.continuum.io/downloads

This is the only version of Python that will be supported. If you are having trouble installing this

2017 Mar 29

Text Analysis in Python

6:00pm to 8:00pm

Location: 

1737 Cambridge St. Cambridge, CGIS Knafel Building, K018

Python is an extremely powerful tool for text analysis. We will explore the use of TextBlob, nltk and scipy for text analysis.


Prerequisites: This is an advanced Python workshop. To get the most out of the material, comfort with base Python is recommended, along with some familiarity with numpy and scipy, and some exposure to pandas.

 

Workshop Preparation

In preparation for the workshop please install the Anaconda distribution of Python 3.5.  It can be found here:
https://www.continuum.io/downloads


 

Extracting content from .pdf files

One common question I get as a data science consultant involves extracting content from .pdf files. In the best-case scenario, the content can be extracted to consistently formatted text files and parsed from there into a usable form. In the worst case, the file will need to be run through an optical character recognition (OCR) program to extract the text.
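
To make the best-case scenario concrete, here is a minimal sketch of the parsing step only, assuming the text has already been extracted from the .pdf file. The sample text, its column layout, and the parse_rows helper are all hypothetical and stand in for whatever your own files contain.

```python
import re

# Hypothetical sample: stands in for consistently formatted text
# that has already been extracted from a .pdf file.
SAMPLE_TEXT = """\
Name            Year   Amount
Alice Smith     2015   1,200.50
Bob Jones       2016     980.00
"""

# Assumed layout: a name, at least two spaces, a four-digit year,
# then an amount with optional thousands separators.
ROW_PATTERN = re.compile(
    r"^(?P<name>.+?)\s{2,}(?P<year>\d{4})\s+(?P<amount>[\d,]+\.\d{2})$"
)


def parse_rows(text):
    """Return one dict per line that matches the assumed row layout."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # skip the header and any line that does not fit
        rows.append({
            "name": match.group("name"),
            "year": int(match.group("year")),
            "amount": float(match.group("amount").replace(",", "")),
        })
    return rows


records = parse_rows(SAMPLE_TEXT)
```

Lines that do not fit the pattern are skipped rather than raising an error, which is convenient for headers and page furniture but can also hide rows that were extracted badly, so it is worth counting the skipped lines on real data.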

Overview of available tools

For years pdftotext from


Escaping from character encoding hell in R on Windows

Note: the title of this post was inspired by this question on Stack Overflow.

This section gives the basic facts and recommendations for importing files with arbitrary encoding on Windows. The issues described here by and large do not apply on Mac or Linux; they are specific to running R on Windows.

If you are on a deadline and just need to get the job done, this section should be all you need.
