How to save Python objects persistently using the built-in shelve module

Photo by Leonie Krickhuhn on Unsplash

I’ve had my share of experience writing classes and utilizing OOP in Python, but I have only developed objects that are used by the currently running script and held in memory. Those in-memory objects are not available when you run your program in the future.

My first thought when wondering how to store these objects was to pickle each one and save the .pkl files in their own directory. Then I came across the shelve module, which is part of the standard library and was built for exactly this purpose: object persistence.

The shelve module uses both the…
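As a rough illustration of the idea, here is a minimal sketch of storing an object with shelve and reading it back later. The record contents, the key name, and the temporary file location are all made up for the example; any picklable Python object, including instances of your own classes, can be stored the same way.

```python
import os
import shelve
import tempfile

# Hypothetical record to persist; the field names are illustrative only.
record = {"title": "Dune", "pages": 412, "tags": ["sci-fi", "classic"]}

# Use a throwaway directory so the sketch does not clutter the working directory.
shelf_path = os.path.join(tempfile.mkdtemp(), "library")

# First "run": store the object under a string key.
with shelve.open(shelf_path) as shelf:
    shelf["favorite"] = record

# Later "run": reopen the shelf and read the object back.
with shelve.open(shelf_path) as shelf:
    restored = shelf["favorite"]
    stored_keys = sorted(shelf.keys())

print(restored)
print(stored_keys)
```

A shelf behaves much like a dictionary whose keys are strings, so the usual dictionary operations (lookup, assignment, `in`, `del`) are the main interface.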

Photo by Chris Yang on Unsplash

Open-source intelligence, or OSINT, consists of using publicly available information to collect data on a subject. Used primarily by investigators, law enforcement, and penetration testers, this information could include anything from public records to social media data. Several freely available tools exist to collect open-source information in Python, and I will present a few of the most popular.

Social Mapper uses facial recognition to search for social media accounts related to either an individual or a company. As input, the program needs a CSV file with a person's name in one column and the path to a picture in another. Social Mapper will then search…

Photo by Amy Shamblen on Unsplash

Sometimes, it’s necessary to add some visual output to Python scripts in order to provide users with feedback as to the progress of a program.

In this article, we are going to touch on two ways you can incorporate this feedback: progress bars and ANSI color codes. Using these two methods will allow you to build programs that look better, and provide a better user experience. I’m going to assume that you have a script that is run in the terminal with a call similar to:


However, keep in mind that some of these tools will also work…
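To make the two ideas concrete, here is a small sketch that uses only the standard library. The function names `render_bar` and `run`, the bar width, and the choice of green for the final message are assumptions made for this example, not the approach the full article necessarily takes.

```python
import sys
import time

# ANSI escape sequences: switch the foreground to green, then reset styling.
GREEN = "\033[32m"
RESET = "\033[0m"


def render_bar(done, total, width=20):
    """Return a text progress bar such as [#####-----] 5/10."""
    filled = int(width * done / total)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {done}/{total}"


def run(total=5, delay=0.0):
    """Redraw the bar in place for each step, then print a colored message."""
    for step in range(1, total + 1):
        time.sleep(delay)  # stand-in for real work
        # "\r" moves the cursor back to the start of the line.
        sys.stdout.write("\r" + render_bar(step, total))
        sys.stdout.flush()
    sys.stdout.write("\n" + GREEN + "Done" + RESET + "\n")


run()
```

The carriage return is what makes the bar appear to fill in place rather than printing a new line for every step. Color codes are only interpreted by terminals that support ANSI escape sequences.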

Photo by Mr Cup / Fabien Barral on Unsplash

I recently worked on a project that involved creating an automated tool that collects a large number of documents. In order to save these documents, it was necessary to create files and folders on the local filesystem where the rest of the script could access the information. In this article I’m going to walk through some of the basics of file management in Python using mostly the os module and the shutil module, which are part of Python’s standard library (you don’t need to download anything).

The first issue I came across was how would we…
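For a sense of what this kind of file management looks like, here is a short sketch using os and shutil. The directory names, the file name, and the use of a temporary base directory are assumptions chosen for the example.

```python
import os
import shutil
import tempfile

# Work inside a temporary directory so the example leaves nothing behind.
base_dir = tempfile.mkdtemp()

# Create nested folders in one call; exist_ok avoids an error on reruns.
docs_dir = os.path.join(base_dir, "documents", "batch_1")
os.makedirs(docs_dir, exist_ok=True)

# Write a placeholder document into the new folder.
report_path = os.path.join(docs_dir, "report.txt")
with open(report_path, "w") as handle:
    handle.write("collected text\n")

# Copy the file into a second folder; shutil.copy returns the new path.
archive_dir = os.path.join(base_dir, "archive")
os.makedirs(archive_dir, exist_ok=True)
copied_path = shutil.copy(report_path, archive_dir)

listing = os.listdir(docs_dir)
copy_exists = os.path.exists(copied_path)
print(listing, copy_exists)

# Remove the whole tree, including every file and folder under it.
shutil.rmtree(base_dir)
```

Building paths with `os.path.join` rather than string concatenation keeps the script portable across operating systems with different path separators.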

I recently was involved with a project that required parsing a PDF in order to identify the regions of a page and return the text from those regions. The text regions would then be fed to a Q/A model (farm-haystack), which would return extracted data from the PDF. Essentially, we wanted the computer to read PDFs for us and tell us what it found. Currently, there are a few popular modules that perform this task with varying effectiveness, namely pdfminer and PyPDF2. The problem is that table data is very hard to parse/detect. The solution? …

Photo by Vidar Nordli-Mathisen on Unsplash

I’m going to walk through how to develop a basic spider using the Scrapy package. The purpose of this tutorial will be to introduce the basic concepts needed to understand and use the Scrapy package, and give you the tools to develop custom spiders for your data collection needs. Scrapy does have very good documentation, so I encourage you to look through the docs to dig a little deeper into what Scrapy has to offer.

If you’ve ever developed a web scraping script before, you’re well aware that there are many menial tasks that need to be coded from scratch…

Photo by Lili Popper on Unsplash

Similar to regular expressions, XPath can be thought of as a language for finding information in an XML/HTML document. It has many uses, but personally I use it most for developing web crawlers and grabbing information from websites. We’re going to go over the basics of the language, and how to grab the content you need from a document. In order to follow along with this tutorial, you can use the console in your Chrome Developer Tools (any browser developer tools will do) or you can use your favorite web scraping framework. …
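If you would rather experiment in Python, the sketch below runs a couple of XPath expressions with the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document, its class names, and its links are invented for the example.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample document (made up for this example).
DOCUMENT = """
<html>
  <body>
    <div class="post">
      <h2>First post</h2>
      <a href="/first">Read more</a>
    </div>
    <div class="post">
      <h2>Second post</h2>
      <a href="/second">Read more</a>
    </div>
    <div class="footer">
      <a href="/about">About</a>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(DOCUMENT)

# ".//div" matches div elements at any depth; [@class='post'] filters by attribute.
titles = [h2.text for h2 in root.findall(".//div[@class='post']/h2")]
links = [a.get("href") for a in root.findall(".//div[@class='post']/a")]

print(titles)
print(links)
```

The footer link is skipped because its parent div does not carry the `post` class. Real-world HTML is often not well-formed XML, so a dedicated HTML parser or scraping framework is usually a better fit outside of small experiments.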

I like to use at least two editors: a full-featured editor such as Atom for most of my everyday coding needs (when I’m not in a Jupyter Notebook), and a second, lightweight editor that I can use to make fast edits. For my lightweight editor, I use Vim. Vim is a command-line-based code editor. Depending on how deep you dive into it, it is either a lightweight editor for small edits or a full-featured code editor with the ability to drastically increase your productivity. …

Data Mining in the Medicare/Medicaid system was made legal in 2013. How do authorities use this data to catch bad actors?

Photo by National Cancer Institute on Unsplash

In the United States, Medicare is the national health insurance plan made primarily available to citizens aged 65 and older. This program is expensive, partly due to America’s aging population and skyrocketing healthcare costs. It is financed through general government revenues (43%), payroll taxes (36%), and beneficiary premiums (15%). There are three main parts to the Medicare program:

  1. Medicare Part A: covers mostly inpatient hospital and hospice care.
  2. Medicare Part B: covers mostly hospital outpatient services, and prescriptions administered by a healthcare worker while in the hospital.
  3. Medicare Part C (aka Medicare Advantage): private plans…

Photo by roberto bernardi on Unsplash

My go-to tool for data collection is the SelectorLib library. It is an easy-to-use, quick alternative to setting up a scraping solution from scratch. There are many ways to implement the library, and I will share my workflow. I encourage anyone interested to also take a look at the documentation on the website, because it does a good job of providing tutorials and guides that spell things out clearly.

In order to use this module, you need to install the Python package and download the Chrome extension.

pip install selectorlib

You can think of the process of using…

Brendan Ferris

Turning over rocks and seeing what crawls out.
