I was recently involved in a project to build an automated tool that collects a large number of documents. To save these documents, we needed to create files and folders on the local filesystem where the rest of the script could access them. In this article I’m going to walk through some of the basics of file management in Python, mostly using the os and shutil modules, which are part of Python’s standard library (you don’t need to download anything).
The first issue I came across was how to name the files. In our case they were PDFs, but the names sometimes contained spaces and special characters. I decided that, to retain the information, it would be a good idea to convert the original filenames into Base64 and then rename the files with the Base64 string. This way, if we ever needed to see the original file name, all we had to do was decode the Base64 string. No information loss!
The following methods can be used to achieve this result:
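As a minimal sketch of the idea, one way this could look is shown here. The helper names encode_filename and decode_filename are hypothetical, and the sketch assumes the URL-safe Base64 alphabet, because standard Base64 can produce a / character, which is not allowed inside a file name.

```python
import base64


def encode_filename(original_name: str, extension: str = ".pdf") -> str:
    """Return a filesystem-safe name that encodes the original name (hypothetical helper)."""
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/',
    # so the result never contains a path separator.
    encoded = base64.urlsafe_b64encode(original_name.encode("utf-8")).decode("ascii")
    return encoded + extension


def decode_filename(encoded_name: str, extension: str = ".pdf") -> str:
    """Recover the original name from a name produced by encode_filename."""
    # Strip the extension that encode_filename appended, if present.
    if extension and encoded_name.endswith(extension):
        encoded_name = encoded_name[: -len(extension)]
    return base64.urlsafe_b64decode(encoded_name.encode("ascii")).decode("utf-8")


original = "Q3 report (final) & notes.pdf"
safe_name = encode_filename(original)
# Decoding the safe name gives back exactly the original name.
assert decode_filename(safe_name) == original
```

Keep in mind that Base64 makes names roughly a third longer, so very long original names may run into the filesystem's file name length limit.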
Now let’s move on to some basic operations. If we want to list the files in our current directory, we can run:
>>> import os
>>> os.listdir()
['.DS_Store', 'requirements.txt', 'Models', 'geckodriver.log', '.gitignore', 'awscli-bundle.zip', '.ipynb_checkpoints', '.git', 'main.py', 'Notebooks']
To make a new file:
>>> newFile = open('mynewfile.txt', 'w+')
The above command will create a new file if it does not already exist in your folder, but it will also overwrite an existing file with the same name. Notice the w+ mode: there are many options for opening files, such as appending to the end of a file (a) or reading the file contents in binary (rb). Please see the documentation for Python’s built-in open() function for an explanation of all of these options.
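To make the difference between these modes concrete, here is a small sketch that runs in a temporary directory (an assumption made so the example does not touch any real files). It uses the with statement so each file is closed automatically.

```python
import os
import tempfile

# Work in a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mynewfile.txt")

    # 'w+' creates the file, or empties it if it already exists.
    with open(path, "w+") as f:
        f.write("first line\n")

    # 'a' keeps the existing contents and writes at the end.
    with open(path, "a") as f:
        f.write("second line\n")

    # 'rb' reads raw bytes instead of decoded text.
    with open(path, "rb") as f:
        raw = f.read()

    # Opening with 'w+' again discards everything written so far.
    with open(path, "w+") as f:
        after_truncate = f.read()

print(raw.splitlines())
print(repr(after_truncate))
```

The last step is the one to watch out for: reopening an existing file with w+ silently throws away its previous contents.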