Literate Programming & Dynamic Document Options in Stata


I’ve spent the last few months attempting to incorporate different literate programming and reproducible research options with the Stata statistical software. This post provides a quick overview of my goal, a brief “how-to” on each option, and my thoughts on realistically introducing them in a workflow.

What is literate programming?

Literate programming is a term coined by Donald E. Knuth in 1984. The general idea is to combine human-readable text with machine-readable commands in the same document, manual, or website. At the beginning of his paper, Knuth writes,

“Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.”

Although Knuth is describing the process of writing computer programs, his concept applies to any scenario where a series of commands is used to get a computer to perform a particular set of functions.

The data analysis process includes a successive set of commands used to manipulate the rawest form of the data, transform it into summaries or visualizations, and then model it for predictions and inferences. Each step in the process is based on the preceding step, so tracking the entire method in a systematic way is essential for remaining organized and being able to reproduce your work.

The yardstick I was using to evaluate each Stata literate programming option was how well each method provided a relatively seamless transition between the Stata commands and the human-readable explanations.

A Typical Stata Workflow

Below is an example workflow in Stata.


While working in Stata, commands go into a .do file, which is used to create output in either .gph (image) or .smcl (text) files. The text results then get written up in a .docx, and the tables and figures are created using .xlsx. These documents are then sent off to a scientific journal somewhere, where the finished paper will most likely end up as a .pdf or .html on the web.

So to recap, the general process is:

.do >> .gph >> .smcl >> .docx >> .xlsx >> .pdf >> .html 

This workflow involves seven different file types to create a single finished manuscript (that will usually contain only text and images).

Why should I care?

If you’ve read this far, the answer should be obvious: the process outlined above is inefficient. The analysis process is split across Stata, MS Word, MS Excel, Adobe/Preview, and whatever web browser you’re using. This makes working in Stata tedious.

Solution: Digital Notebooks

I recently came across a white paper that discusses the benefits of using Notebooks, and I’ve summarized the main points below:

  • “…notebooks speed up the process of trying out data models and frameworks and testing hypotheses, enabling everyone to work quickly, iteratively and collaboratively”
  • “…can be used to perform an analysis, generate visualizations, and add media and annotations within a single document.”
  • “…can be used to annotate the analyses for future reference and keep track of each step in the data discovery process so that the notebook becomes a source of project communication and documentation, as well as a learning resource for other data scientists.”

Although the paper is referring specifically to the Jupyter Notebooks, RStudio recently introduced the R Notebooks. Both methods combine similar sections for markdown formatted text, data analysis commands, output tables and/or figures, and other relevant portions of the results. These digital notebooks closely resemble paper laboratory notebooks (see below).

As you can see from this example, some of the text and calculations have been handwritten, while others have been calculated outside of the notebook, printed, and then pasted back inside the lab notebook. I commend the authors for their transparency, but this doesn’t seem like the most efficient method of keeping track of your work.

Does Stata have an equivalent option?

Sort of. Below I review my experience using three Stata options that collectively provide similar abilities to the notebooks provided by Python and R.

#1 markdoc

markdoc is a package for creating dynamic documents in Stata. It is very similar to the knitr package in R. The package was written by E. F. Haghish and is available on GitHub. To run markdoc, you’ll need to install Pandoc, which in turn needs a TeX/LaTeX distribution to produce PDFs (I used MiKTeX on my PC and Homebrew on my Mac).

After installing Pandoc and MiKTeX, you’ll also need to download and install wkhtmltopdf.

Installing markdoc

You can install markdoc with the ssc command

ssc install markdoc

You should see:

checking markdoc consistency and verifying not already installed...
installing into c:\ado\plus\...
installation complete.

You’ll also need to install weaver and statax

ssc install weaver
ssc install statax

markdoc also has a handy dialog box, which you can open with the following command:

db markdoc

The Output Files

Haghish provides example .do files for getting started with markdoc. I recommend working through each of them, but it shouldn’t be too difficult if you’re used to commenting in your .do files or writing in markdown. The .docx and .pdf output files are clean, organized, and formatted.



markdoc is ideal for producing high-quality documents directly from the Stata IDE. After you understand the markdoc syntax, you will be able to perform the majority of your work in the .do file. The only downside I encountered in markdoc was a somewhat buggy installation; it worked better for me on the Mac. But the package is incredibly well maintained by the author, and I was eventually able to find answers to my questions on his GitHub page.

#2 weave

Germán Rodríguez at Princeton created weave, and I consider it a markdoc-light Stata package.

Installing weave

Installation is easy. Just type the following command into the Stata IDE.

net from

And there’s an example .do file on his website.

weave essentially uses markdown/html tags for inline images and headers that are written directly into your .do files. The results are inserted into the output as plain text, so there is no need to tweak their formatting. When you’re finished with your analyses, you just type the following command directly into the IDE.

weave using sample

The beginning of the .do file contains a command that logs everything to a .usl file. The .usl file is then ‘weaved’ into an .html output file, which will automatically open in your web browser.
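The general idea of weaving (annotations and plain-text results rendered together into one HTML page) can be sketched in a few lines of Python. This is only a toy illustration and not how weave itself is implemented: the `//# ` heading marker, the `weave_to_html` function, and the sample log are all hypothetical.

```python
import html


def weave_to_html(log_text):
    """Turn a plain-text log into a minimal HTML page.

    Hypothetical convention (NOT weave's real syntax): a line starting
    with '//# ' is a heading; every other non-empty line is treated as
    output and shown verbatim inside a <pre> block.
    """
    parts = ["<html><body>"]
    output_lines = []

    def flush_output():
        # Emit any buffered result lines as one preformatted block.
        if output_lines:
            escaped = html.escape("\n".join(output_lines))
            parts.append("<pre>" + escaped + "</pre>")
            output_lines.clear()

    for line in log_text.splitlines():
        if line.startswith("//# "):
            flush_output()
            parts.append("<h2>" + html.escape(line[4:].strip()) + "</h2>")
        elif line.strip():
            output_lines.append(line)

    flush_output()
    parts.append("</body></html>")
    return "\n".join(parts)


# Made-up log content, for illustration only.
sample_log = """//# Summary statistics
. summarize price
(results appear here as plain text)
"""

page = weave_to_html(sample_log)
print(page)
```

Because the results stay inside a preformatted block, they keep the spacing Stata gave them, which is why no extra formatting is needed.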

The Output Files

You can just print the .html file to a .pdf like you would any web page. Chrome seems to create the best-looking .pdfs.


*TIP: use minimal lines in your .do file to create cleaner looking output. I’ve created a detailed example here.

I use weave whenever I’m using Stata on my Mac. It’s easy to use, quick to format, and only requires me to have Stata open with a .do file. I’ll use markdoc if I am creating a more professional-looking report, but the bugginess of markdoc doesn’t make it very user-friendly.

#3 ipystata

The Jupyter Notebooks (previously IPython Notebooks) can be configured to work with Stata commands. Unfortunately, the package works best with Windows/PC.  The setup isn’t too complicated but has a few steps that can trip you up.

Download Anaconda from Continuum

You can download the most recent version of Anaconda from Continuum. The installation includes several applications, but the only one I will be covering in this post is the Anaconda Prompt.

Changing the Jupyter Notebook working directory

The first thing you will want to do is set up your Jupyter Notebook in an appropriate working directory. You can do this by right-clicking on the Anaconda Prompt and running it as an administrator (I’ve moved the application to the taskbar).


When the prompt is displayed (it should say Administrator), copy and paste the path of the folder you want the Jupyter Notebook to open in. In the Anaconda Prompt, type

cd C:\Users\Google Drive\...\ipystata\notebooks

This will change your working directory. After you’re in the correct working directory, start up the Jupyter Notebooks by typing the following command in your Anaconda Prompt,

jupyter notebook

This should open a new tab in your default web browser.
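The two manual steps above (change into the folder, then start Jupyter) can also be scripted. A minimal sketch, assuming `jupyter` is on the PATH of the environment you run it from; the `build_launch` and `launch_notebook` helpers and the `notebooks` folder name are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_launch(folder):
    """Return the arguments for starting Jupyter in `folder`.

    Mirrors the two manual steps: `cd <folder>`, then `jupyter notebook`.
    """
    return {"args": ["jupyter", "notebook"], "cwd": str(Path(folder))}


def launch_notebook(folder):
    """Start the notebook server; blocks until the server is stopped."""
    if not Path(folder).is_dir():
        raise FileNotFoundError(f"no such folder: {folder}")
    launch = build_launch(folder)
    # Passing cwd avoids any quoting issues with spaces in the path.
    return subprocess.run(launch["args"], cwd=launch["cwd"], check=True)


# Hypothetical folder name; nothing is launched here, the plan is only printed.
plan = build_launch("notebooks")
print(plan)
```

Calling `launch_notebook` with your own folder would then start the server from that directory.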


You can open a new notebook using the tab on the far right of the screen by selecting “New” >> “Python [default]”.

Registering Stata

You will need to open a Command Prompt window as an administrator by right-clicking on the application and selecting “Run as administrator” (*you can search for this application in the Windows search bar by typing “cmd”).


From here, you need to navigate to your Stata application in your Program Files (usually on the C:\ drive).

Copy and paste the file location and enter it into the Command Prompt window, preceded by cd:

cd C:\Program Files (x86)\Stata14


From this location, register the Stata application by typing the name of the .exe file followed by a space and /Register.


*No news is good news on this command. 

Installing ipystata and pandas

Now go back to your Jupyter Notebook and load pandas, which is already included with Anaconda. Pandas is an open-source data analysis package for Python. Read about it here.

In the first line of your notebook type:

import pandas as pd

To install ipystata, you’ll need to open a Windows PowerShell window (as an administrator) and enter the following command:

pip install ipystata


After the package has been installed, enter the following command in your Jupyter Notebook:

import ipystata

To test whether it worked, run a simple display command preceded by a line containing %%stata. The result should appear directly below the cell.


Using the %%stata Magic Commands

Now that you are up and running, the Jupyter Notebook basically replaces your .do file. You will just need to precede the Stata commands with a line containing the %%stata magic command.

Start by loading a native dataset

sysuse auto, clear

You can get a quick overview of these data by using codebook, compact or describe, short


Including Graphs in the Output

To include graphs in your output, simply add the -gr option on the same line as your %%stata command.
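To make the structure of such a cell explicit: the first line carries the magic name and any options, and everything after it is Stata code. The snippet below is a toy parser written purely for illustration; `split_cell` is a hypothetical helper, not part of ipystata or IPython (IPython does the real parsing), and the cell contents are a made-up example using the auto dataset.

```python
def split_cell(cell_text):
    """Split a notebook cell into (magic_name, options, body).

    Toy illustration only: IPython itself handles cell magics.
    """
    first_line, _, body = cell_text.partition("\n")
    if not first_line.startswith("%%"):
        raise ValueError("not a cell magic: first line must start with %%")
    tokens = first_line[2:].split()
    if not tokens:
        raise ValueError("missing magic name after %%")
    return tokens[0], tokens[1:], body


# A cell as you might type it in the notebook: magic line first, Stata below.
cell = "%%stata -gr\nsysuse auto, clear\nscatter price mpg"

name, options, body = split_cell(cell)
print(name)     # stata
print(options)  # ['-gr']
print(body)
```

The point is simply that the option belongs on the %%stata line itself, not among the Stata commands below it.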


matrix graph


scatter plot

Sharing Your Output Online

In my opinion, the best part of using Jupyter Notebooks is the ability to share your work online. You can publish your notebook using the cloud+arrow icon on the toolbar (register your account first).


In fact, this notebook and a complete example of the ipystata package are available online. I think this feature makes the Jupyter Notebook the best option for literate programming and reproducible research in Stata. The complicated setup is definitely worth the time investment because you’ll be able to have an ongoing stream of commands, formatted text summaries, and output all in one place.





  1. Ties de Kok · December 4, 2016

    Interesting article, good work! If you have feature suggestions, feel free to let me know and I will try to improve iPyStata further.

    • newsandnumbers · December 10, 2016

      Thank you! I love the package. It’s great for sharing your thought process/analysis steps.

