Last updated 10/26/2017


This site was set up to host workshop content and is meant for students for reference/self-study. It is split into the following sections:

  • Introduction & Basics
  • Data Visualization (coming soon)
  • Machine Learning (coming soon)

Stay tuned as we add additional sections!

 

Introduction to Python

What is Python?

Python is a high-level programming language. It’s a Swiss Army knife-like tool that is quick to deploy and can be used for all sorts of applications.

Just a quick laundry list of things it can be used for at work:

  • Automating menial tasks like mail-merging, reporting, and making updates in files
  • Making pretty charts
  • And of course, analytics!
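To make the "automating menial tasks" idea a bit more concrete, here is a minimal sketch of a mail merge. The recipient names, report names, and message template are all made up for illustration; in real life they would probably come from a file.

```python
# Hypothetical recipient data, invented for this example.
recipients = [
    {"name": "Alex", "report": "Q3 sales"},
    {"name": "Sam", "report": "Q3 inventory"},
]

# The {name} and {report} placeholders get filled in for each person.
template = "Hi {name}, your {report} report is ready."

# Build one message per recipient.
messages = [template.format(**person) for person in recipients]

for message in messages:
    print(message)
```

Two recipients or two thousand, the code stays the same, which is the whole point of automating it.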

 

Why Python vs. Excel/Tableau/SQL?

Python is FREE. That’s numero uno.

Second, have you ever tried to walk through someone else’s Excel workbook? That’s at best mildly annoying, and at worst, an absolute nightmare. Once you learn them, Python and other text-based languages allow for clean and logical information sharing.

Lastly, Python can do WAY more once you learn how to use it. Once you hit Excel’s limit on rows (1,048,576 per worksheet in current versions) you’ll need to transition to a more powerful tool, cough Python. Further, unlike Excel/PowerPoint, new functionality is added all the time.

Should you learn R or Python? It doesn’t matter for analytics. Once you learn one you can pick the other up pretty easily. However, Python is broadly a more universal tool that can do more things. If you plan to do stuff outside of analytics, you should learn Python.

 

Installation

Step 1, install Python.

We will use version 2.7. There is a newer 3.x version available, which changes some syntax and adds new features, but many existing applications still use Python 2.7.

You can download Python 2.7 @ https://www.python.org/downloads/release/python-2714/

Scroll down until you hit the ‘Files’ section. If you have Windows, download the Windows x86-64 MSI installer package. If you have a Mac, you have Python installed by default.
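Once Python is installed, a quick way to check that it works (and which version you got) is to ask Python itself. This is just a small sanity-check sketch, not an official installation step; the variable names are our own.

```python
import sys

# sys.version_info describes the Python that is running this code.
major, minor = sys.version_info[0], sys.version_info[1]

print("Running Python {}.{}".format(major, minor))
```

If you followed the steps above, the first number should be 2 and the second should be 7. You can also type python --version at a command prompt to get the same information.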

 

What do I need to know to start?

First off, you’ll need to know some terminology.

  • shell: sometimes called the terminal. It’s kind of like a DOS prompt for Python; notice the “>>>” in the window. This window will give you error messages when something goes wrong, but other than that, you won’t be in it very much.

 

  • library: a pre-programmed software add-in that adds functionality to Python (allows it to make pretty graphs, run analytical models, or manipulate data more easily). Below are a few examples:
    • numpy: allows Python to do fast math on arrays of numbers
    • ggplot: lets Python create Tableau-like graphics
    • seaborn: another super popular visualization package for statistics
    • pandas: allows Python to store data and do Excel-like manipulations on it

 

  • text editor: what you’ll be writing your code in (front window, the shell is in the back)
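To see what "adding functionality" with a library looks like in practice, here is a minimal sketch. It uses math, a library that ships with Python, so it runs without installing anything extra. The libraries listed above (numpy, pandas, and friends) have to be installed first, but are brought in with the same import keyword. The radius value is made up for the example.

```python
# 'math' comes with Python, so there is nothing extra to install.
import math

radius = 2.0                   # made-up value for illustration
area = math.pi * radius ** 2   # math.pi is a constant provided by the library
root = math.sqrt(16)           # math.sqrt is a function provided by the library

print(area)
print(root)
```

Everything after "math." is something the library gives you for free instead of making you write it yourself.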

 

Second, the installation allows your computer to understand Python code and will set you up with a crappy text editor called IDLE. For our purposes here, we’ll use IDLE. The next section will detail out a couple of better ones you can try :)

Lastly, if you ever run into an error, there are a TON of resources on the web for help. Your best bet is to google “what does (insert error message here) mean Python?” and you should find an answer. Feel free to also hit up KDA office hours!
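For a feel of what an error message looks like, here is a small sketch that triggers one on purpose. The variable name price is hypothetical and deliberately never defined; the try/except is only there so the example can print the message instead of stopping.

```python
try:
    total = price * 2   # 'price' was never defined, so Python raises a NameError
except NameError as error:
    # Combine the error type and its message into one line of text.
    error_text = "{}: {}".format(type(error).__name__, error)
    print(error_text)
```

The printed text names the error type (NameError) and says which name Python could not find. That line is exactly the kind of thing to paste into a search engine.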

 

Text Editors

Text editors built for programming are what most people write code in. They highlight certain parts of code (SUPER useful), handle smart indenting, and basically you should totally be using one and NOT Notepad/Textpad. They also contain super helpful menu commands for changing settings and running code previews.

There are a few out there that you can try:

 

Basics through DataCamp

We highly recommend going through the DataCamp course: Introduction to Python for Data Science to learn the basics of programming and Python. You can find a link to it under the Resources tab up above. It’s super intuitive, does a good job teaching the basics, and gives you a follow along environment in your browser that’s really helpful.

In it, you’ll go over a lot of programming basics that you’ll have to know to use more advanced functions in Python:

  • operators: how math functions work
  • variables: how to assign stuff
  • data types: what types of data you can store in variables
  • lists: how to store MULTIPLE data points
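As a small preview of the four topics above, here is a rough sketch that touches each one. It is not taken from the DataCamp course; the variable names and values are invented for illustration, and the code is written to behave the same in Python 2.7 and 3.

```python
# Operators: math works much like a calculator.
total = 3 + 4 * 2        # multiplication happens before addition, so this is 11
half = 7 / 2.0           # dividing by a decimal number gives 3.5

# Variables: a name on the left, a value on the right.
course = "Intro to Python"
students = 25

# Data types: type() reports what kind of value a variable holds.
print(type(course))      # text (a string)
print(type(students))    # a whole number (an integer)
print(type(half))        # a decimal number (a float)

# Lists: several values stored under one name, numbered starting from 0.
scores = [88, 92, 79]
scores.append(95)                            # add a value to the end
first_score = scores[0]                      # 88
average = sum(scores) / float(len(scores))   # 88.5

print("First score: {}, average: {}".format(first_score, average))
```

Note that lists count from 0, so scores[0] is the first item. That trips up nearly everyone coming from Excel.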