You may find yourself in a situation where you have defined a function, but you need to make sure that the data that is passed to the function is of a certain data type. There are many ways to implement this functionality, but we will be going over two of the methods I find most useful in my day-to-day.

This becomes an issue in Python because Python is a dynamically typed language: data types are not declared explicitly as they are in languages like C and Java. …
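As a rough illustration of the problem, here is a minimal sketch of one common way to enforce an argument's type at runtime, using the built-in isinstance. It is not necessarily one of the two methods the article goes on to cover, and the function name average is a hypothetical example.

```python
def average(values):
    """Return the mean of a list of numbers, rejecting any other input type."""
    # Python will not stop a caller from passing a string or a dict here,
    # so the check has to be written by hand.
    if not isinstance(values, list):
        raise TypeError(f"expected a list, got {type(values).__name__}")
    if not values:
        raise ValueError("expected a non-empty list")
    return sum(values) / len(values)


result = average([1, 2, 3])
```

Calling average("abc") raises a TypeError instead of failing later with a less helpful message inside sum.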

It goes without saying that we live in a world where data is more valuable than oil. We like to think that most of this data is locked in the secure servers of the many private companies that collect it. Anyone who hasn’t been sleeping for the past decade also realizes that those servers are not immune from attack. We’ve seen big breaches, including Target, Home Depot, and Equifax, among others. We also ‘give’ our data away so that advertisers can target their ads on the platforms that have become a part of our daily life. With the rise…

Recently, I completed a small project that required me to make suggestions for optimizing an elevator configuration within a theoretical high-rise in New York City. The building is set up as follows: entrants must first swipe their badges through a security system; then they can push the elevator call button and wait for an elevator to arrive. I was asked to answer five main questions, and to spend only a couple of hours completing the task. The main questions were:

Provide an overview of the current state of elevator wait times.

Figure out the overall average wait time.

How to save Python objects persistently using the built-in shelve module

I’ve had my share of experience writing classes and utilizing OOP in Python, but I have only developed objects that are used by the currently running script and held in memory. Those in-memory objects are not available when you run your program in the future.

My first thought when wondering how I could store these objects was to pickle each one and save the .pkl files in their own directory, but I came across the shelve module, which is part of the standard library and was built for exactly this purpose: object persistence.
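A minimal sketch of the idea, assuming a throwaway temporary directory and a hypothetical shelf name and key: an object is written to a shelf, the shelf is closed, and the object is read back after reopening it.

```python
import os
import shelve
import tempfile


def save_and_reload(key, obj):
    """Store obj under key in a shelf on disk, then reopen the shelf and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # "objects" is a hypothetical shelf name; shelve decides the real file name(s).
        shelf_path = os.path.join(tmp_dir, "objects")
        with shelve.open(shelf_path) as shelf:
            shelf[key] = obj  # the value is pickled behind the scenes
        # Reopening simulates a later run of the program.
        with shelve.open(shelf_path) as shelf:
            return shelf[key]


restored = save_and_reload("settings", {"retries": 3, "hosts": ["a", "b"]})
```

In a real program the shelf would live at a permanent path rather than in a temporary directory, so that the stored objects survive between runs.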

How does it work?

The shelve module uses both the…

Open-source intelligence, or OSINT, consists of using publicly available information to collect data on a subject. Used primarily by investigators, law enforcement, and penetration testers, this information could include anything from public records to social media data. Several freely available tools exist to collect open-source information in Python, and I will present a few of the most popular.

Social Mapper

Social Mapper uses facial recognition to search for social media accounts related to either an individual or a company. As input, the program needs a CSV file with a person’s name in one column and the path to a picture in another. Social Mapper will then search…

Sometimes, it’s necessary to add some visual output to Python scripts in order to provide users with feedback as to the progress of a program.

In this article, we are going to touch on two ways you can incorporate this feedback: progress bars and ANSI color codes. Using these two methods will allow you to build programs that look better, and provide a better user experience. I’m going to assume that you have a script that is run in the terminal with a call similar to:


However, keep in mind that some of these tools will also work…
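As a rough illustration of the two techniques named above, the sketch below builds a text progress bar and wraps text in an ANSI color code using only the standard library. The function names, bar width, and choice of green are assumptions made for the example, and the colors only render in terminals that support ANSI escape sequences.

```python
import sys
import time

GREEN = "\033[92m"  # ANSI escape sequence for bright green text
RESET = "\033[0m"   # ANSI escape sequence that restores the default style


def colorize(text, color_code):
    """Wrap text in an ANSI color code and reset the style afterwards."""
    return f"{color_code}{text}{RESET}"


def render_bar(done, total, width=20):
    """Return a text progress bar such as [#####-----] 5/10."""
    filled = int(width * done / total)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{total}"


def run_demo(steps=10, delay=0.1):
    """Redraw the bar in place with a carriage return, then print a green message."""
    for step in range(1, steps + 1):
        time.sleep(delay)  # stand-in for real work
        sys.stdout.write("\r" + render_bar(step, steps))
        sys.stdout.flush()
    sys.stdout.write("\n" + colorize("Done", GREEN) + "\n")
```

Calling run_demo() from a script run in the terminal shows the bar filling up on a single line.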

I was recently involved in a project to build an automated tool that collects a large number of documents. In order to save these documents, it was necessary to create files and folders on the local filesystem where the rest of the script could access the information. In this article I’m going to walk through some of the basics of file management in Python using mostly the os module and the shutil module, which are part of Python’s standard library (you don’t need to download anything).
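Here is a minimal sketch of the kind of operations involved, using only os, shutil, and tempfile. The folder and file names are hypothetical, and everything happens inside a temporary directory that is removed at the end.

```python
import os
import shutil
import tempfile


def save_document(base_dir, folder_name, file_name, text):
    """Create the folder if it does not exist, write the file, and return its path."""
    folder = os.path.join(base_dir, folder_name)
    os.makedirs(folder, exist_ok=True)  # no error if the folder is already there
    path = os.path.join(folder, file_name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


base = tempfile.mkdtemp()  # scratch directory for the example
saved = save_document(base, "reports", "doc1.txt", "hello")
copied = shutil.copy(saved, os.path.join(base, "doc1_backup.txt"))
copy_existed = os.path.exists(copied)
shutil.rmtree(base)  # remove the scratch directory and everything in it
```

os handles paths and folder creation, while shutil covers higher-level operations such as copying a file or deleting a whole directory tree.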

The first issue I came across was how would we…

I was recently involved with a project that required parsing a PDF in order to identify the regions of a page and return the text from those regions. The text regions would then be fed to a Q/A model (farm-haystack), which would return extracted data from the PDF. Essentially, we wanted the computer to read PDFs for us and tell us what it found. Currently, there are a few popular modules that perform this task with varying effectiveness, namely pdfminer and PyPDF2. The problem is that table data is very hard to parse/detect. The solution? …

I’m going to walk through how to develop a basic spider using the Scrapy package. The purpose of this tutorial will be to introduce the basic concepts needed to understand and use the Scrapy package, and give you the tools to develop custom spiders for your data collection needs. Scrapy does have very good documentation, so I encourage you to look through the docs to dig a little deeper into what Scrapy has to offer.

Why use Scrapy?

If you’ve ever developed a web scraping script before, you’re well aware that there are many menial tasks that need to be coded from scratch…

Similar to regular expressions, XPath can be thought of as a language for finding information in an XML/HTML document. It has many uses, but personally I use it most for developing web crawlers and grabbing information from websites. We’re going to go over the basics of the language, and how to grab the content you need from a document. In order to follow along with this tutorial, you can use the console in your Chrome Developer Tools (any browser developer tools will do) or you can use your favorite web scraping framework. …
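For a quick taste outside the browser, the sketch below runs an XPath-style query with Python’s built-in xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document and class names are made up for the example, and a full scraping framework will accept richer expressions.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample document (hypothetical content).
DOCUMENT = """
<html>
  <body>
    <div class="post"><a href="/first">First</a></div>
    <div class="post"><a href="/second">Second</a></div>
    <div class="ad"><a href="/buy">Buy</a></div>
  </body>
</html>
""".strip()

root = ET.fromstring(DOCUMENT)

# Select every <a> that is a direct child of a <div> whose class is "post".
post_links = root.findall(".//div[@class='post']/a")
hrefs = [link.get("href") for link in post_links]
titles = [link.text for link in post_links]
```

The expression reads left to right: search anywhere below the root for matching div elements, then step down to their a children.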

