Data files - for large amounts of data input.
A text file (usually denoted by the suffix .txt) has data organized in a line-by-line fashion.
Use the open() function to open a file.
data_file = open(filename, 'r')
filename is the name of the file you are accessing
the 'r' parameter indicates the file is being opened for reading
data_file is a name you choose; the open function returns a reference to the file object for filename, and that reference is stored in data_file
if 'w' is used in place of 'r', then filename is opened for writing (any existing contents of the file are erased)
Use the close() method to close the file when file use (either reading or writing) is complete.
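As a minimal sketch of the open, use, close sequence described above, the example below writes a small file and then reads it back. The file name sample.txt, the temporary folder, and the two lines of text are assumptions made only for illustration.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, used only for this illustration.
filename = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Open for writing ('w'), write two lines, then close.
data_file = open(filename, 'w')
data_file.write("first line\n")
data_file.write("second line\n")
data_file.close()

# Open the same file for reading ('r'), read everything, then close.
data_file = open(filename, 'r')
contents = data_file.read()
data_file.close()

print(contents)
```

Python also provides the with statement (with open(filename, 'r') as data_file:), which closes the file automatically when the indented block ends.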
the file name and path give the full name of the file
Mac - /Users/yourname/filename.txt
sub-folders : /Users/yourname/PyCharmProjects/project_01/filename.txt
Windows - C:\Users\yourname\My Documents\sub-folder1\...\filename.txt
Use / to separate the folder names in a path when using a Windows machine as well.
If your Python file and data file are in the same folder, then no path name is required.
suppose you have a Python program in one sub-folder called “programs” and a data file in another folder called “data”
absolute address /Users/yourname/data/filename.txt
relative address ../data/filename.txt
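The relationship between the relative and absolute addresses can be checked with a short sketch. It assumes the program sits in /Users/yourname/programs, as in the example above; posixpath is the forward-slash version of os.path (what os.path uses on a Mac), chosen here so the result looks the same on any machine.

```python
import posixpath

# Hypothetical folder layout taken from the example above:
#   /Users/yourname/programs/   (the Python program is here)
#   /Users/yourname/data/       (the data file is here)
program_folder = "/Users/yourname/programs"
relative_address = "../data/filename.txt"

# Join the program's folder with the relative address, then
# simplify the ".." step (go up one folder) out of the result.
joined = posixpath.join(program_folder, relative_address)
absolute_address = posixpath.normpath(joined)

print(joined)
print(absolute_address)
```

The ".." in the relative address means "go up one folder", which is why both addresses end up pointing at the same file.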
the open function returns a reference to the file named filename, which can then be read one line at a time
a for loop can be used to access each line
the split() method splits a line into a list of items wherever the text contains whitespace
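The loop-and-split pattern described above might look like the sketch below. To keep it self-contained, io.StringIO stands in for a file returned by open(filename, 'r'); that stand-in and the name rows are assumptions for illustration.

```python
import io

# io.StringIO behaves like an opened text file, so it can be
# looped over line by line in the same way as a real file.
f_in = io.StringIO("10056 01012017 c Walmart 22.71\n"
                   "10056 01062017 c Fairway 22.24\n")

rows = []
for line in f_in:
    items = line.split()   # split on whitespace; the trailing \n is dropped too
    rows.append(items)
    print(items)

f_in.close()
```

Each line becomes a list of strings, so the numbers in the data are still text at this point.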
Activity: Store the following data in a text file within PyCharm.
10056 01012017 c Walmart 22.71
10056 01062017 c Fairway 22.24
10056 01112017 c T_Bocks 27.71
10056 01122017 c Koreana 22.63
10289 01032017 c Koreana 14.34
10289 01092017 c Roscoes 50.53
10289 01192017 c Amazon 40.03
10289 01202017 c Walmart 187.13
10289 01222017 c T_Bocks 14.52
10289 01222017 c Pulpit_Rock 28.58
10289 01292017 c Fairway 75.30
19542 01082017 c Pulpit_Rock 21.26
19542 01092017 c Walmart 157.77
19542 01172017 c Amazon 40.10
19542 01192017 c Koreana 31.24
19542 01212017 c Pulpit_Rock 27.71
19542 01302017 c Roscoes 65.82
19542 01302017 c Toppling_Goliath 53.68
10998 01012017 c Fairway 104.21
10998 01032017 c Toppling_Goliath 35.23
10998 01042017 c Oneota_Market 7.28
10998 01042017 c Pulpit_Rock 17.18
10998 01072017 c T_Bocks 43.95
10998 01072017 c Koreana 49.49
10998 01082017 c Pulpit_Rock 51.41
10998 01112017 c Walmart 72.98
10998 01182017 c Koreana 17.43
10998 01202017 c Amazon 7.74
10998 01262017 c Pulpit_Rock 34.47
10998 01292017 c Walmart 15.38
10998 01292017 c Toppling_Goliath 35.66
The code below shows four different ways a text file may be read. Create a new Python file in PyCharm called file_reading.py. Copy and paste the code shown below into the new file and explore the various ways file reading and printing may be accomplished.
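def read_v1(f_in):
    # Version 1: loop directly over the file object, one line at a time.
    print("Reading a file - Version 1 \n")
    for line in f_in:
        print(line)


def read_v2(f_in):
    # Version 2: readlines() returns a list containing every line.
    print("Reading a file - Version 2 \n")
    lines = f_in.readlines()
    for line in lines:
        print(line)
    print(lines[-1])  # index -1 is the last line in the list


def read_v3(f_in):
    # Version 3: readline() reads one line per call and returns ''
    # at the end of the file, which stops the while loop.
    print("Reading a file - Version 3 \n")
    line = f_in.readline()
    while line:
        print(line)
        line = f_in.readline()


def read_v4(f_in):
    # Version 4: read() returns the whole file as a single string.
    print("Reading a file - Version 4 \n")
    all_lines = f_in.read()
    print(all_lines)


def main():
    f_in = open('../data/CC_data.txt', 'r')
    read_v1(f_in)  # swap in read_v2, read_v3, or read_v4 to try the others
    f_in.close()


main()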
Summary of ways to read a file:
the open function, as in f_in = open("file_in_name.txt", 'r'), returns a reference to the file. That reference can be used in a for loop to step through the file one line at a time and access or process each line.
the f_in.readlines() method returns a list of all the lines, stored under a name chosen by the user (for example, lines). The list of lines may be accessed sequentially, using a for loop, or an individual line may be accessed using an appropriate index value. The maximum positive integer index is len(lines) - 1. Index 0 is the first line in the list, and index -1 is another way of accessing the last line. Using f_in.readlines(n) reads whole lines until the total number of characters read exceeds n, so the process always finishes an entire line, not a partial line, before it stops.
the f_in.readline() method reads the single next line, including the new line character \n. If f_in.readline(n) is used, then at most n characters of the next line are read, so only the first n characters are returned if the next line is more than n characters long.
the f_in.read() method reads the entire file as a single str object (including the \n characters).
Which one of these is used may depend on the purpose of the task. For example, readlines() allows the programmer to go directly to a specific line number without looping through all preceding lines. The read() method allows the entire file to be searched as a single object in order to find certain characters, or character patterns.
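The differences between these methods can be seen in a short sketch. It uses io.StringIO as a stand-in for an opened text file, and the three-line sample text and variable names are assumptions made for illustration.

```python
import io

text = "alpha\nbeta\ngamma\n"

# readlines(): a list of every line, accessible by index.
f_in = io.StringIO(text)
lines = f_in.readlines()
first = lines[0]
last = lines[len(lines) - 1]       # same line as lines[-1]

# readlines(n): whole lines are read until more than n characters
# have been read. "alpha\n" is 6 characters, so n = 7 stops partway
# through the second line, which is still read completely.
f_in = io.StringIO(text)
some_lines = f_in.readlines(7)

# readline(n): at most n characters of the next line.
f_in = io.StringIO(text)
part = f_in.readline(3)            # first 3 characters of line 1
rest = f_in.readline()             # remainder of line 1, with its \n

# read(): the whole file as one string, which can then be searched.
f_in = io.StringIO(text)
whole = f_in.read()
found = "beta" in whole

print(lines, some_lines, part, rest, found)
```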
The strip() method - strips whitespace, including the new line (\n) character, off both ends of the line
print(line.strip())
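A small sketch of what strip() removes is shown below. The sample strings are assumptions for illustration, and repr() is used only so that the \n character is visible when printed.

```python
line = "10056 01012017 c Walmart 22.71\n"

stripped = line.strip()            # removes whitespace from both ends
print(repr(line))                  # the \n is visible at the end
print(repr(stripped))              # the \n is gone

# strip() also removes leading and trailing spaces; rstrip("\n")
# removes only new line characters at the end of the string.
padded = "  indented text \n"
both_ends = padded.strip()
newline_only = padded.rstrip("\n")
print(repr(both_ends))
print(repr(newline_only))
```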
The split() method - splitting the line on whitespace
Specify an item using its index
line_in = file_in.readline().split()
for item in range(len(line_in)):
    print(line_in[item])
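Once a line has been split, individual items can be picked out by index and converted to numbers. The sketch below is one way this might look; the meaning attached to each column (account number, merchant, amount) is an assumption based on how the sample data appears, and the variable names are hypothetical.

```python
# Two lines from the sample data, as they would be read from the file.
lines = [
    "10289 01032017 c Koreana 14.34\n",
    "10289 01092017 c Roscoes 50.53\n",
]

total = 0.0
for line in lines:
    items = line.split()
    account = items[0]           # first item (assumed to be the account number)
    merchant = items[3]          # fourth item (assumed to be the merchant)
    amount = float(items[4])     # last item, converted from text to a number
    total = total + amount
    print(account, merchant, amount)

print(round(total, 2))
```

Items produced by split() are always strings, so the float() conversion is needed before any arithmetic.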
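Open a text file for writing using
file_o = open("file_out_name.txt", 'w')
The write() method does not add a new line character, so add one back after stripping
file_o.write(line.strip() + "\n")
file_o.close()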
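Because write() does not add a new line character on its own, stripped lines written one after another would run together on a single line. The sketch below adds "\n" back to each line and then reads the file again to check the result; the temporary folder and the names lines, out_name, and file_check are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical output location in a temporary folder.
out_name = os.path.join(tempfile.mkdtemp(), "file_out_name.txt")

lines = [
    "10056 01012017 c Walmart 22.71\n",
    "10056 01062017 c Fairway 22.24\n",
]

file_o = open(out_name, 'w')
for line in lines:
    # strip() removes the old \n, so one is added back explicitly.
    file_o.write(line.strip() + "\n")
file_o.close()

# Read the file back to confirm each record is on its own line.
file_check = open(out_name, 'r')
written = file_check.read()
file_check.close()

print(written)
```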
Activity: Use PyCharm to write a program that reads in the credit card data, and writes all of the data for a given account to a separate file for each account.