I’m going to walk through how to develop a basic spider using the Scrapy package. The purpose of this tutorial will be to introduce the basic concepts needed to understand and use the Scrapy package, and give you the tools to develop custom spiders for your data collection needs. Scrapy does have very good documentation, so I encourage you to look through the docs to dig a little deeper into what Scrapy has to offer.

Why use Scrapy?

If you’ve ever developed a web scraping script before, you’re well aware that there are many menial tasks that need to be coded from scratch…

Similar to regular expressions, XPath can be thought of as a language for finding information in an XML/HTML document. It has many uses, but personally I use it most for developing web crawlers and grabbing information from websites. We’re going to go over the basics of the language, and how to grab the content you need from a document. In order to follow along with this tutorial, you can use the console in your Chrome Developer Tools (any browser developer tools will do) or you can use your favorite web scraping framework. …
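To make the idea concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath. The document, the class names, and the links are all invented for illustration and do not come from the tutorial itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed document made up for this example.
DOC = """
<html>
  <body>
    <div class="post">
      <h2>First post</h2>
      <a href="/first">Read more</a>
    </div>
    <div class="post">
      <h2>Second post</h2>
      <a href="/second">Read more</a>
    </div>
    <div class="sidebar">
      <a href="/about">About</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# ".//div[@class='post']" selects every <div> at any depth
# whose class attribute equals "post".
posts = root.findall(".//div[@class='post']")

# Pull the heading text out of each matching <div>.
titles = [post.find("h2").text for post in posts]

# A trailing "/a" step selects <a> children of the matching <div> elements,
# so the sidebar link is left out.
links = [a.get("href") for a in root.findall(".//div[@class='post']/a")]

print(titles)
print(links)
```

Real-world HTML is often not well-formed XML, so a scraping framework or the browser console is usually the more forgiving place to try expressions like these.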

I like to use at least two editors: a full-featured editor such as Atom for most of my everyday coding needs (when I’m not in a Jupyter Notebook), and a second, lightweight editor that I can use to make fast edits. For my lightweight editor, I use Vim. Vim is a command-line-based code editor. Depending on how deep you dive into it, Vim is either a lightweight editor for small edits or a full-featured code editor with the ability to drastically increase your productivity. …

Data Mining in the Medicare/Medicaid system was made legal in 2013. How do authorities use this data to catch bad actors?

In the United States, Medicare is the national health insurance plan made available primarily to citizens aged 65 and older. This program is expensive, partly due to America’s aging population and skyrocketing healthcare costs. It is financed through general government revenues (43%), payroll taxes (36%), and beneficiary premiums (15%). There are three main parts to the Medicare program:

  1. Medicare Part A: covers mostly inpatient hospital and hospice care.
  2. Medicare Part B: covers mostly hospital outpatient services, and prescriptions administered by a healthcare worker while in the hospital.
  3. Medicare Part C (aka Medicare Advantage): private plans…

My go-to tool for data collection is the SelectorLib library. It is an easy-to-use, quick alternative to setting up a scraping solution from scratch. There are many ways to implement the library, and I will share my workflow. I encourage anyone interested to also take a look at the documentation on the website, because it does a good job of providing tutorials and guides that spell things out clearly.


In order to use this module you need to install the Python package and download the Chrome extension.

pip install selectorlib

You can think of the process of using…

As you probably know, Tor is an anonymity network which utilizes a unique system of Onion Routing (explained later) to keep users of the network anonymous. In this article, I’m going to touch on some of the fundamental principles used by Tor, and how you might find it useful. Started in the mid-1990s at the U.S. Naval Research Laboratory, the project was developed to foster secure communications between spies and other government agents involved in covert investigations. In 2002 the software was released under a free public-use license. Control was handed to the Electronic Frontier Foundation, who in turn handed…

When I was first introduced to the command line on my computer, it seemed confusing and low-tech. It brought me back to my days as a teenager, punching codes into an old-school POS system, or checking inventory in an outdated company intranet. Though these technologies looked dated, they were functional. As I became more comfortable with the terminal, I quickly realized that it was far more than functional: it exposed a host of small programs that you could use to interact with the computer, solve problems, and even chain together in complex ways. …

Often neglected in the implementations of the most popular machine learning and statistical analysis frameworks is survival analysis. Simply put, survival analysis models the time it takes for an event of interest to occur. Although that seems pretty straightforward, the reality is a little more complicated. In this article, we will go through some of the high-level concepts necessary to understand when conducting survival analysis, or deciding if it is the right tool for your problem.
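One reason the reality is more complicated is censoring: some subjects leave the study before the event is ever observed. As a rough illustration, here is a pure-Python sketch of the Kaplan-Meier estimator, a standard technique that this excerpt does not name, so treat it as my assumption about where the article is headed. The durations and event flags are made-up toy data.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] for each time with an observed event.

    durations: time until the event or until the subject was censored
    observed:  True if the event happened, False if the subject was censored
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    leaving = Counter(durations)   # subjects leaving the risk set at each time
    at_risk = len(durations)       # subjects still being followed
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects that survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both events and censored subjects drop out of the risk set after t.
        at_risk -= leaving[t]
    return curve


# Toy data (hypothetical): five subjects, two of them censored.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

With this toy data the estimated survival probability steps down to roughly 0.8, 0.6, and 0.3 at times 2, 3, and 5. The censored subjects never trigger a drop, but they do shrink the number at risk for later steps.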

What problems does survival analysis solve?

As you may have guessed by the name, survival analysis has historically been employed by the medical research community to measure the survival…

In part 1 of this series, I went over what PostgreSQL is, how it works, and how you can get started adding databases and tables to create a relational database for storing your information. In this article I’m going to assume that you have everything downloaded, and have databases and tables to work with. I’m going to go through a few operations that I used to find myself constantly googling. If you find this information useful, it would be a good addition to your bookmarks. Let’s begin!

How to import a CSV.

Let’s say you have a CSV file that you want to import into a Postgres database using…

PostgreSQL is an open-source Relational Database Management System (RDBMS) that’s popular for a number of reasons: it’s free, it’s secure, it supports custom functions, it has an object-relational architecture, and it allows unlimited rows per table. Check out this article for a more in-depth breakdown. PostgreSQL is also used by many major companies, including NASA, Twitch, Apple, and Reddit. In this article we are going to touch on the basics of PostgreSQL so you can get up and running fast.

Downloading PostgreSQL and pgAdmin4.

On a Mac, the process of downloading postgres is simplified thanks to the installation package. …

Brendan Ferris

Turning over rocks and seeing what crawls out.

