Chapter 11 – Files

Sections 11.1 - 11.4

\(\boxdot\) Working with Data Files
  • Data files - used when a program needs a large amount of input data.

  • A text file (usually denoted by the suffix .txt) has data organized in a line-by-line fashion.

  • Use the open function to open a file.

    data_file = open(filename, 'r')
    • filename is the name of the file you are accessing

    • the ‘r’ parameter indicates it is being opened for reading the data

    • data_file is a variable name you choose; the open function returns a file object for the file named filename, and data_file refers to that object

    • if ‘w’ is used in place of ‘r’, then the file is opened for writing; an existing file with that name is overwritten

  • Use the close method to close the file when file use (either reading or writing) is complete.
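
The open, use, close pattern above can be sketched as follows. This is a minimal illustration, not a required form: the file name sample.txt and the helper count_lines are hypothetical, and the file is created in a temporary folder so the sketch runs on any machine. The with statement at the end is an alternative not covered above that closes the file automatically.

```python
import os
import tempfile


def count_lines(filename):
    """Open filename for reading, count its lines, then close it."""
    data_file = open(filename, 'r')
    count = 0
    for line in data_file:
        count += 1
    data_file.close()
    return count


# Build a small sample file (hypothetical name) in a temporary folder.
folder = tempfile.mkdtemp()
sample_name = os.path.join(folder, 'sample.txt')

out_file = open(sample_name, 'w')   # 'w' creates the file or erases an existing one
out_file.write('first line\n')
out_file.write('second line\n')
out_file.close()

print(count_lines(sample_name))

# A with statement closes the file automatically when the block ends.
with open(sample_name, 'r') as data_file:
    first = data_file.readline()
print(data_file.closed)
```

Forgetting close() after writing is the more costly mistake, since text that has not yet been flushed to disk may be lost.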

\(\boxdot\) Finding a File on the Computer
  • the file name and path give the full name of the file

  • Mac - /Users/yourname/filename.txt

  • sub-folders : /Users/yourname/PyCharmProjects/project_01/filename.txt

  • Windows - C:\Users\yourname\My Documents\sub-folder1\...\filename.txt

    • however, in a Python program, use / to separate the folder names in a path when using a Windows machine as well
  • if your Python file and data file are in the same folder, then no path name is required

  • suppose you have a Python program in one sub-folder called “programs” and a data file in another folder called “data”

  • absolute address /Users/yourname/data/filename.txt

  • relative address ../data/filename.txt
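
As a rough sketch of the absolute versus relative addresses above, the code below builds a throwaway "programs" and "data" folder pair inside a temporary folder and opens the same file both ways. The folder layout mirrors the example in the notes; the temporary root folder and the use of os.chdir to imitate running from the programs folder are assumptions made only so the sketch is self-contained. A relative address is resolved against the current working folder, which is usually the folder containing the program when it is run from PyCharm.

```python
import os
import tempfile

# Build a throwaway layout: <root>/programs and <root>/data
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'programs'))
os.mkdir(os.path.join(root, 'data'))

# Absolute address of the data file; write one line into it.
absolute_name = os.path.join(root, 'data', 'filename.txt')
data_file = open(absolute_name, 'w')
data_file.write('hello\n')
data_file.close()

# Pretend the program is running from the programs folder.
start_folder = os.getcwd()
os.chdir(os.path.join(root, 'programs'))

# Relative address: up one folder (..), then down into data.
relative_name = '../data/filename.txt'
data_file = open(relative_name, 'r')
text = data_file.read()
data_file.close()

# Both names refer to the same file on disk.
same_file = os.path.samefile(relative_name, absolute_name)
print(text.strip(), same_file)

os.chdir(start_folder)   # return to the folder we started in
```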

\(\boxdot\) Processing the File
  • the open function returns a reference to the file object; looping over that object yields the contents of the file one line at a time

  • a for loop can be used to access each line

  • the split() method splits a line into a list of items, breaking it wherever the text has spaces (or other whitespace)

  • Activity: Store the following data in a text file within PyCharm.

10056 01012017 c Walmart 22.71
10056 01062017 c Fairway 22.24
10056 01112017 c T_Bocks 27.71
10056 01122017 c Koreana 22.63
10289 01032017 c Koreana 14.34
10289 01092017 c Roscoes 50.53
10289 01192017 c Amazon 40.03
10289 01202017 c Walmart 187.13
10289 01222017 c T_Bocks 14.52
10289 01222017 c Pulpit_Rock 28.58
10289 01292017 c Fairway 75.30
19542 01082017 c Pulpit_Rock 21.26
19542 01092017 c Walmart 157.77
19542 01172017 c Amazon 40.10
19542 01192017 c Koreana 31.24
19542 01212017 c Pulpit_Rock 27.71
19542 01302017 c Roscoes 65.82
19542 01302017 c Toppling_Goliath 53.68
10998 01012017 c Fairway 104.21
10998 01032017 c Toppling_Goliath 35.23
10998 01042017 c Oneota_Market 7.28
10998 01042017 c Pulpit_Rock 17.18
10998 01072017 c T_Bocks 43.95
10998 01072017 c Koreana 49.49
10998 01082017 c Pulpit_Rock 51.41
10998 01112017 c Walmart 72.98
10998 01182017 c Koreana 17.43
10998 01202017 c Amazon 7.74
10998 01262017 c Pulpit_Rock 34.47
10998 01292017 c Walmart 15.38
10998 01292017 c Toppling_Goliath 35.66
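
One way the for loop and split() might be combined on this data is sketched below. It assumes, based only on the layout of the lines above, that the first item is an account number and the last item is a purchase amount; the helper name total_for_account is hypothetical, and only three of the lines are used so the sketch stays short.

```python
import os
import tempfile

# Three lines copied from the activity data.
sample = (
    "10056 01012017 c Walmart 22.71\n"
    "10056 01062017 c Fairway 22.24\n"
    "10289 01032017 c Koreana 14.34\n"
)

# Store the sample in a temporary file so the sketch runs anywhere.
folder = tempfile.mkdtemp()
file_name = os.path.join(folder, 'CC_data.txt')
f_out = open(file_name, 'w')
f_out.write(sample)
f_out.close()


def total_for_account(filename, account):
    """Add up the last item on every line whose first item equals account."""
    total = 0.0
    f_in = open(filename, 'r')
    for line in f_in:
        items = line.split()          # e.g. ['10056', '01012017', 'c', 'Walmart', '22.71']
        if len(items) == 5 and items[0] == account:
            total += float(items[4])  # items are strings, so convert before adding
    f_in.close()
    return total


print(round(total_for_account(file_name, '10056'), 2))
print(round(total_for_account(file_name, '10289'), 2))
```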

Sections 11.5 - 11.7

\(\boxdot\) File Reading Variations
  • The code below shows four different ways a text file may be read. Create a new Python file in PyCharm called file_reading.py. Copy and paste the code shown below into the new file and explore the various ways file reading and printing may be accomplished.

    def read_v1(f_in):  
        print("Reading a file - Version 1 \n")  
        for line in f_in:  
            print(line)  
    
    
    def read_v2(f_in):  
        print("Reading a file - Version 2 \n")  
        lines = f_in.readlines()  
        for line in lines:  
            print(line)  
        print(lines[-1])  
    
    
    def read_v3(f_in):  
        print("Reading a file - Version 3 \n")  
        line = f_in.readline()  
        while line:  
            print(line)  
            line = f_in.readline()  
    
    
    def read_v4(f_in):  
        print("Reading a file - Version 4 \n")  
        all_lines = f_in.read()  
        print(all_lines)  
    
    
    def main():  
        f_in = open('../data/CC_data.txt', 'r')  
        read_v1(f_in)  
        f_in.close()
    
    
    main()  
  • Summary of the four approaches:

    • f_in = open("file_in_name.txt", 'r') returns a reference to the file object. A for loop over f_in steps through the file sequentially, one line at a time, so that each line can be accessed and/or processed.

    • the f_in.readlines() method returns a list of all the lines, which is assigned to a name chosen by the user. The list of lines may be accessed sequentially, using a for loop, or an individual line may be accessed using an appropriate index value. Index 0 is the first line in the list, the largest valid index is len(lines) - 1, and index -1 is another way of accessing the last line. Using f_in.readlines(n) reads roughly the first n characters: whole lines are read until the total read so far exceeds n, so the process always stops at the end of a line, never partway through one.

    • the f_in.readline() method reads the next single line, including the newline character \n. If f_in.readline(n) is used, at most n characters of the next line are read; when the line is longer than n characters, the following call continues from where this one stopped.

    • the f_in.read() method reads the entire file as a single str object (including the \n characters).

  • Which of these is used may depend on the purpose of the task. For example, readlines() allows the programmer to go directly to a specific line number without looping through all preceding lines. The read() method allows the entire file to be searched as a single object in order to find certain characters, or character patterns.
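
The differences among read(), readlines(), readlines(n), readline() and readline(n) can be seen on a tiny file. The sketch below is an illustration under stated assumptions: the file three_lines.txt and its contents are made up, and the file is reopened before each method so every call starts from the top of the file.

```python
import os
import tempfile

# A made-up three-line file in a temporary folder.
folder = tempfile.mkdtemp()
file_name = os.path.join(folder, 'three_lines.txt')
f_out = open(file_name, 'w')
f_out.write('alpha\nbeta\ngamma\n')
f_out.close()

# read(): the whole file as one string, \n characters included
f_in = open(file_name, 'r')
all_text = f_in.read()
f_in.close()

# readlines(): a list with one string per line
f_in = open(file_name, 'r')
lines = f_in.readlines()
f_in.close()

# readlines(8): whole lines are read until the total exceeds 8 characters
f_in = open(file_name, 'r')
some_lines = f_in.readlines(8)
f_in.close()

# readline() and readline(n)
f_in = open(file_name, 'r')
first = f_in.readline()     # the whole first line
part = f_in.readline(3)     # at most 3 characters of the second line
rest = f_in.readline()      # what is left of the second line
f_in.close()

print(repr(all_text))
print(lines)
print(some_lines)
print(repr(first), repr(part), repr(rest))
print('gamma' in all_text)            # searching the file as a single string
print(lines[-1] == lines[len(lines) - 1])
```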

  • The strip() method – stripping the newline (\n) character (and any other leading or trailing whitespace) off the line

    print(line.strip())  
  • The split() method – splitting the line on spaces

  • Specify the item using an index

    line_in = f_in.readline().split()   # list of the items on the next line
    
    for item in range(len(line_in)):    # item is an index: 0, 1, 2, ...
        print(line_in[item])
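
A short sketch of strip(), split() and indexing applied to one line of the credit card data. No file is needed here; the line is written out as a string, and the names merchant and amount are hypothetical labels for the fourth and last items.

```python
# One line as it would arrive from the file, newline included.
line = "10056 01012017 c Walmart 22.71\n"

clean = line.strip()       # removes the trailing \n (and any outer spaces)
items = clean.split()      # splits on spaces into a list of strings

merchant = items[3]        # index 3 is the fourth item
amount = float(items[-1])  # index -1 is the last item; convert the string to a number

print(items)
print(merchant, amount)
```

Calling split() with no argument already discards the trailing newline, so strip() matters mainly when the whole line is printed or written as a single string.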
\(\boxdot\) Writing to Files
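  • Open a text file for writing, write to it with the write method, and close it when finished

    file_o = open("file_out_name.txt", 'w')
    
    # write() does not add a newline, so put one back after stripping
    file_o.write(line.strip() + "\n")
    file_o.close()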
  • Activity: Use PyCharm to write a program that reads in the credit card data, and writes all of the data for a given account to a separate file for each account.
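
As a warm-up for the activity, here is a sketch of reading one file and writing selected lines to another. It is not the activity solution: it copies only the lines whose last item is at least a chosen amount. The helper name copy_large_purchases, the output file name large.txt and the threshold of 40.0 are all hypothetical.

```python
import os
import tempfile


def copy_large_purchases(in_name, out_name, minimum):
    """Copy lines whose last item is at least minimum; return how many were written."""
    f_in = open(in_name, 'r')
    f_out = open(out_name, 'w')
    written = 0
    for line in f_in:
        items = line.split()
        if items and float(items[-1]) >= minimum:
            f_out.write(line.strip() + '\n')   # write() adds no newline of its own
            written += 1
    f_in.close()
    f_out.close()
    return written


# Three lines from the credit card data, stored in a temporary folder.
folder = tempfile.mkdtemp()
in_name = os.path.join(folder, 'CC_data.txt')
out_name = os.path.join(folder, 'large.txt')

f = open(in_name, 'w')
f.write("10289 01192017 c Amazon 40.03\n")
f.write("10289 01202017 c Walmart 187.13\n")
f.write("10289 01222017 c T_Bocks 14.52\n")
f.close()

count = copy_large_purchases(in_name, out_name, 40.0)

# Read the output file back to check what was written.
f = open(out_name, 'r')
result = f.read()
f.close()
print(count)
print(result)
```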

\(\boxdot\) Further Activities
  • Activity: