Reading a Text File Into a Hashmap and Printing It

Reading and Writing Text Files

Overview

Teaching: 60 min
Exercises: 30 min

Questions

  • How can I read in data that is stored in a file or write data out to a file?

Objectives

  • Be able to open a file and read in the data stored in that file

  • Understand the difference between the file name, the opened file object, and the data read in from the file

  • Be able to write output to a text file with simple formatting

Why do we want to read and write files?

Being able to open and read in files allows us to work with larger data sets, where it wouldn't be possible to type in each and every value and store them one at a time as variables. Writing files allows us to process our data and then save the output to a file so we can look at it later.

Right now, we will practice working with a comma-delimited text file (.csv) that contains several columns of data. However, what you learn in this lesson can be applied to any general text file. In the next lesson, you will learn another way to read and process .csv data.

Paths to files

In order to open a file, we need to tell Python exactly where the file is located, relative to where Python is currently working (the working directory). In Spyder, we can do this by setting our current working directory to the folder where the file is located. Or, when we provide the file name, we can give a complete path to the file.
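The two options can be sketched as follows. This is only an illustration that assumes Python's standard os module, which the lesson itself does not use; the variable names working_directory and full_path are made up for this sketch.

```python
import os

# Ask Python which folder it is currently working in.
working_directory = os.getcwd()
print(working_directory)

# Option 1: a bare file name is looked up in the working directory.
filename = 'Plates_output_simple.csv'

# Option 2: build a complete path to the file.
# os.path.join inserts the correct separator for your operating system.
full_path = os.path.join(working_directory, filename)
print(full_path)

# os.path.exists reports whether a file is really at that location.
print(os.path.exists(full_path))
```

If the last line prints False, the file is not in the working directory, and opening it by its bare name would fail.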

Lesson Setup

We will work with the practice file Plates_output_simple.csv.

  1. Locate the file Plates_output_simple.csv in the directory home/Desktop/workshops/bash-git-python.
  2. Copy the file to your working directory, home/Desktop/workshops/YourName.
  3. Make sure that your working directory is also set to the folder home/Desktop/workshops/YourName.
  4. As you are working, make sure that you save your file opening script(s) to this directory.

The File Setup

Let's open and examine the structure of the file Plates_output_simple.csv. If you open the file in a text editor, you will see that the file contains several lines of text.

DataFileRaw

However, this is fairly difficult to read. If you open the file in a spreadsheet program such as LibreOffice Calc or Excel, you can see that the file is organized into columns, with each column separated by the commas in the image above (hence the file extension .csv, which stands for comma-separated values).

DataFileColumns

The file contains one header row, followed by eight rows of data. Each row represents a single plate image. If we look at the column headings, we can see that we have collected the following data for each plate:

  • The name of the image from which the data was collected
  • The plate number (there were four plates, with each plate imaged at two different time points)
  • The growth condition (either control or experimental)
  • The observation timepoint (either 24 or 48 hours)
  • Colony count for the plate
  • The average colony size for the plate
  • The percentage of the plate covered by bacterial colonies

We will read in this data file and then work to analyze the data.

Opening and reading files is a three-step process

We will open and read the file in three steps.

  1. We will create a variable to hold the name of the file that we want to open.
  2. We will call a function to open the file.
  3. We will call a function to actually read the data in the file and store it in a variable so that we can process it.

And then, there's one more step to do!

  • When we are done, we should remember to close the file!

You can think of these three steps as being similar to checking out a book from the library. First, you have to go to the catalog or database to find out which book you need (the filename). Then, you have to go and get it off the shelf and open the book up (the open function). Finally, to gain any information from the book, you have to read the words (the read function)!

Here is an example of opening, reading, and closing a file.

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'   #This is simply a string of text

    #Open the file
    infile = open(filename, 'r')   # 'r' says we are opening the file to read, infile is the opened file object that we will read from

    #Store the data from the file in a variable
    data = infile.read()

    #Print the data in the file
    print(data)

    #close the file
    infile.close()

Once we have read the data in the file into our variable data, we can treat it like any other variable in our code.
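To make "like any other variable" concrete, here is a small sketch. So that it runs on its own, it first writes a tiny stand-in file; the file name sample_plates.csv and the header names are made up for illustration and are not the real practice file (writing files is covered later in this lesson).

```python
# Write a tiny stand-in file so this sketch runs on its own.
# The file name and header names here are made up for illustration.
filename = 'sample_plates.csv'
outfile = open(filename, 'w')
outfile.write('Image,Plate,Condition,Time,ColonyCount\n')
outfile.write('colonies02.tif,2,exp,24,84\n')
outfile.close()

# Open, read, and close, following the three steps above.
infile = open(filename, 'r')
data = infile.read()
infile.close()

# data is an ordinary string, so the usual string tools work on it.
print(type(data))          # the str type
print(len(data))           # number of characters that were read
print(data.count('\n'))    # number of newline characters
print('exp' in data)       # check whether some text appears in the file
```

Notice that the file is already closed by the time we work with data: the variable keeps its own copy of the text.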

Use consistent names to make your code clearer

It is a good idea to develop some consistent habits about the way you open and read files. Using the same (or similar!) variable names each time will make it easier for you to keep track of which variable is the name of the file, which variable is the opened file object, and which variable contains the read-in data.

In these examples, we will use filename for the text string containing the file name, infile for the open file object from which we can read in data, and data for the variable holding the contents of the file.

Commands for reading in files

There are a variety of commands that allow us to read in data from files.
infile.read() will read in the entire file as a single string of text.
infile.readline() will read in one line at a time (each time you call this command, it reads in the next line).
infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list.
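The sketch below shows what each of the three commands returns. It uses a made-up stand-in file (sample_lines.txt, not part of the lesson materials) and reopens the file before each command so that every read starts from the top.

```python
# Stand-in file with made-up contents so the sketch is self-contained.
filename = 'sample_lines.txt'
outfile = open(filename, 'w')
outfile.write('first line\nsecond line\nthird line\n')
outfile.close()

# read() returns the whole file as one string.
infile = open(filename, 'r')
whole_text = infile.read()
infile.close()

# readline() returns only the next line, as a string.
infile = open(filename, 'r')
first_line = infile.readline()
infile.close()

# readlines() returns a list with one string per line.
infile = open(filename, 'r')
all_lines = infile.readlines()
infile.close()

# repr() makes the newline characters visible as \n.
print(repr(whole_text))
print(repr(first_line))
print(all_lines)
```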

Mixing these commands can have some unexpected results.

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    #Print the first two lines of the file
    print(infile.readline())
    print(infile.readline())

    #call infile.read()
    print(infile.read())

    #close the file
    infile.close()

Notice that the infile.read() command started at the third line of the file, where the first two infile.readline() commands left off.

Think of it like this: when the file is opened, a pointer is placed at the top left corner of the file at the beginning of the first line. Any time a read function is called, the cursor or pointer advances from where it already is. The first infile.readline() started at the beginning of the file and advanced to the end of the first line. Now, the pointer is positioned at the beginning of the second line. The second infile.readline() advanced to the end of the second line of the file, and left the pointer positioned at the beginning of the third line. infile.read() began from this position, and advanced through to the end of the file.

In general, if you want to switch between the different kinds of read commands, you should close the file and then open it again to start over.
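A small sketch of the pointer running out and of starting over by reopening. The file sample_pointer.txt and the names first_read and second_read are made up for this illustration.

```python
# Stand-in file with made-up contents so the sketch is self-contained.
filename = 'sample_pointer.txt'
outfile = open(filename, 'w')
outfile.write('line one\nline two\n')
outfile.close()

infile = open(filename, 'r')
first_read = infile.read()     # the pointer moves to the end of the file
second_read = infile.read()    # nothing is left to read from here
infile.close()

infile = open(filename, 'r')   # reopening puts the pointer back at the start
lines = infile.readlines()
infile.close()

print(repr(second_read))   # prints ''
print(len(lines))          # prints 2
```

The second read() does not raise an error; it quietly returns an empty string, which is why this kind of mistake can be hard to spot.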

Reading all of the lines of a file into a list

infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list. This is extremely useful, because once we have read the file in this way, we can loop through each line of the file and process it. This approach works well on data files where the data is organized into columns similar to a spreadsheet, because it is likely that we will want to handle each line in the same way.

The example below demonstrates this approach:

    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()

    for line in lines:   #lines is a list with each item representing a line of the file
        if 'control' in line:
            print(line)   #print lines for control condition

    infile.close()   #close the file when you're done!

Using .split() to separate "columns"

Since our data is in a .csv file, we can use the split command to separate each line of the file into a list. This can be useful if we want to access specific columns of the file.

    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()

    for line in lines:
        sline = line.split(',')   # separates line into a list of items.  ',' tells it to split the lines at the commas
        print(sline)   #each line is now a list

    infile.close()   #Always close the file!

Consistent names, again

At first glance, the variable name sline in the example above may not make much sense. In fact, we chose it to be an abbreviation for "split line", which exactly describes the contents of the variable.

You don't have to use this naming convention if you don't want to, but you should work to use consistent variable names across your code for common operations like this. It will make it much easier to open an old script and quickly understand exactly what it is doing.

Converting text to numbers

When we called the readlines() command in the previous code, Python read in the contents of the file as text strings. If we want our code to recognize something in the file as a number, we need to tell it this!

For example, float('5.0') will tell Python to treat the text string '5.0' as the number 5.0. int(sline[4]) will tell our code to treat the text string stored in the 5th position of the list sline as an integer (non-decimal) number.
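The difference between text and numbers is easy to see with the + operator. The sketch below uses one row of values typed in by hand as a list (with the trailing newline left off), rather than reading it from the file; the names count_text, count_number, and size_number are made up for this illustration.

```python
# One split line, typed in by hand for illustration.
sline = ['colonies02.tif', '2', 'exp', '24', '84', '3.2', '22']

count_text = sline[4]           # '84' is still text
count_number = int(sline[4])    # 84 is now an integer
size_number = float(sline[5])   # 3.2 is now a decimal number

print(count_text + count_text)       # text is joined end to end: 8484
print(count_number + count_number)   # numbers are added: 168
print(size_number)                   # 3.2
```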

For each line in the file, the ColonyCount is stored in the fifth column (index 4 with our 0-based counting).
Change the code above to print the line only if the ColonyCount is greater than 30.

Solution

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    ##Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()

    for line in lines[1:]:   #skip the first line, which is the header
        sline = line.split(',')   # separates line into a list of items.  ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])   #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #close the file
    infile.close()

Writing data out to a file

Often, we will want to write data to a new file. This is especially useful if we have done a lot of computations or data processing and we want to be able to save it and come back to it later.

Writing a file is the same multi-step procedure

Just like reading a file, we will open and write the file in multiple steps.

  1. Create a variable to hold the name of the file that we want to open. Often, this will be a new file that doesn't yet exist.
  2. Call a function to open the file. This time, we will specify that we are opening the file to write into it!
  3. Write the data into the file. This requires some careful attention to formatting.
  4. When we are done, we should remember to close the file!

The code below gives an example of writing to a file:

    filename = "output.txt"

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file")
    outfile.write("This is the second line of the file")

    outfile.close()   #Close the file when we're done!

Where did my file end up?

Any time you open a new file and write to it, the file will be saved in your current working directory, unless you specified a different path in the variable filename.
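If you are not sure where a file ended up, Python can tell you. The sketch below assumes the standard os module, which the lesson itself does not use, and a made-up file name.

```python
import os

filename = 'output_location_demo.txt'   # made-up name for this sketch
outfile = open(filename, 'w')
outfile.write('Where am I?\n')
outfile.close()

# The current working directory, where a bare file name is saved.
print(os.getcwd())

# The complete path that the bare file name refers to.
full_path = os.path.abspath(filename)
print(full_path)
```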

Newline characters

When you examine the file you just wrote, you will see that all of the text is on the same line! This is because we must tell Python when to start on a new line by using the special string character '\n'. This newline character will tell Python exactly where to start each new line.

The example below demonstrates how to use newline characters:

    filename = 'output_newlines.txt'

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file\n")
    outfile.write("This is the second line of the file\n")

    outfile.close()   #Close the file when we're done!

Go open the file you just wrote and check that the lines are spaced correctly.

Dealing with newline characters when you read a file

You may have noticed in the last file reading example that the printed output included newline characters at the end of each line of the file:

['colonies02.tif', '2', 'exp', '24', '84', '3.2', '22\n']
['colonies03.tif', '3', 'exp', '24', '792', '3', '78\n']
['colonies06.tif', '2', 'exp', '48', '85', '5.2', '46\n']

We can get rid of these newlines by using the .strip() function, which removes whitespace, including newline characters, from both ends of a string:

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    ##Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()

    for line in lines[1:]:   #skip the first line, which is the header
        sline = line.strip()   #get rid of trailing newline characters at the end of the line
        sline = sline.split(',')   # separates line into a list of items.  ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])   #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #close the file
    infile.close()
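To see exactly what .strip() removes, it helps to try it on a single string. In this sketch the line is typed in by hand rather than read from the file, and repr() is used so that the newline shows up as \n when printed.

```python
# One line of text, typed in by hand, with its trailing newline.
line = 'colonies02.tif,2,exp,24,84,3.2,22\n'

stripped = line.strip()                # removes whitespace from both ends
padded = '  padded text \n'.strip()    # spaces at either end are removed too
only_newline = line.rstrip('\n')       # removes only trailing newline characters

print(repr(line))
print(repr(stripped))
print(repr(padded))
print(repr(only_newline))
```

.strip() never touches characters in the middle of the string, so the commas and values are left alone.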

Writing numbers to files

Just like Python automatically reads files in as strings, the write() function expects to only write strings. If we want to write numbers to a file, we will need to "cast" them as strings using the function str().

The code below shows an example of this:

    numbers = range(0, 10)

    filename = "output_numbers.txt"

    #w tells python we are opening the file to write into it
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number))

    outfile.close()   #Close the file when we're done!
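If we forget the call to str(), Python stops with a TypeError. The sketch below triggers that error on purpose and catches it with try/except (which is not otherwise covered in this lesson) so that the rest of the script keeps running; the file name is made up.

```python
filename = 'output_typeerror_demo.txt'   # made-up name for this sketch
outfile = open(filename, 'w')

try:
    outfile.write(5)   # 5 is a number, not a string, so this fails
except TypeError as error:
    message = str(error)
    print('Could not write the number directly:', message)

outfile.write(str(5))   # casting to a string first works
outfile.close()

# Read the file back to confirm what was written.
infile = open(filename, 'r')
written = infile.read()
infile.close()
print(written)
```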

Writing new lines and numbers

Go open and examine the file you just wrote. You will see that all of the numbers are written on the same line.

Modify the code to write each number on its own line.

Solution

    numbers = range(0, 10)

    #Create the range of numbers
    filename = "output_numbers.txt"   #provide the file name

    #open the file in 'write' mode
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number) + '\n')

    outfile.close()   #Close the file when we're done!

The file you just wrote should be saved in your working directory. Open the file and check that the output is correctly formatted with one number on each line.

Opening files in different 'modes'

When we have opened files to read or write data, we have used the function parameter 'r' or 'w' to specify which "mode" to open the file in.
'r' indicates we are opening the file to read data from it.
'w' indicates we are opening the file to write data into it.

Be very, very careful when opening an existing file in 'w' mode.
'w' will overwrite any data that is already in the file! The overwritten data will be lost!

If you want to add on to what is already in the file (instead of erasing and overwriting it), you can open the file in append mode by using the 'a' parameter instead.
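The sketch below contrasts the two modes on a throwaway file. The file name and the variables after_append and after_overwrite are made up for this illustration.

```python
filename = 'output_modes_demo.txt'   # made-up name for this sketch

# 'w' creates the file (or erases it if it already exists).
outfile = open(filename, 'w')
outfile.write('first run\n')
outfile.close()

# 'a' keeps what is already there and adds to the end.
outfile = open(filename, 'a')
outfile.write('second run\n')
outfile.close()

infile = open(filename, 'r')
after_append = infile.read()
infile.close()
print(after_append)      # both lines are present

# Opening in 'w' mode again wipes out both earlier lines.
outfile = open(filename, 'w')
outfile.write('third run\n')
outfile.close()

infile = open(filename, 'r')
after_overwrite = infile.read()
infile.close()
print(after_overwrite)   # only the newest line is left
```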

Pulling it all together

Read in the data from the file Plates_output_simple.csv that we have been working with. Write a new csv-formatted file that contains only the rows for control plates.
You will need to do the following steps:

  1. Open the file.
  2. Use .readlines() to create a list of lines in the file. Then close the file!
  3. Open a file to write your output into.
  4. Write the header line of the output file.
  5. Use a for loop to allow you to loop through each line in the list of lines from the input file.
  6. For each line, check if the growth condition was experimental or control.
  7. For the control lines, write the line of data to the output file.
  8. Close the output file when you're done!

Solution

Here's one way to do it:

    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    ##Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()   #We will process the lines of the file later

    #close the input file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData.txt'
    outfile = open(filename, 'w')

    outfile.write(lines[0])   #This will write the header line of the file

    for line in lines[1:]:   #skip the first line, which is the header
        sline = line.split(',')   # separates line into a list of items.  ',' tells it to split the lines at the commas
        condition = sline[2]   #store the condition for the line as a string
        if condition == "control":
            outfile.write(line)   #The variable line is already formatted correctly!

    outfile.close()   #Close the file when we're done!

Challenge Problem

Open and read in the data from Plates_output_simple.csv. Write a new csv-formatted file that contains only the rows for the control condition and includes only the columns for Time, colonyCount, avgColonySize, and percentColonyArea. Hint: you can use the .join() function to join a list of items into a string.

    names = ['Erin', 'Mark', 'Tessa']
    nameString = ', '.join(names)   #the ', ' tells Python to join the list with each item separated by a comma + space
    print(nameString)

Erin, Mark, Tessa
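One thing to watch for: .join() only works on a list of strings. If the list holds numbers, each one has to be cast with str() first, just as with write(). A small sketch with made-up values and variable names:

```python
sizes = [3.2, 3, 5.2]   # made-up numbers for illustration

# Build a list of strings, one for each number.
sizeStrings = []
for size in sizes:
    sizeStrings.append(str(size))

sizeString = ','.join(sizeStrings)   # join the strings with commas
print(sizeString)                    # prints 3.2,3,5.2

# .split(',') undoes the join and gives back the list of strings.
print(sizeString.split(','))
```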

Solution

    #Create a variable for the input file name
    filename = 'Plates_output_simple.csv'

    ##Open the file
    infile = open(filename, 'r')

    lines = infile.readlines()   #We will process the lines of the file later

    #close the file
    infile.close()

    # Create the file we will write to
    filename = 'ControlPlatesData_Reduced.txt'
    outfile = open(filename, 'w')

    #Write the header line
    headerList = lines[0].split(',')[3:]   #This will return the list of column headers from 'time' on
    headerString = ','.join(headerList)   #join the items in the list with commas
    outfile.write(headerString)   #There is already a newline at the end, so no need to add one

    #Write the remaining lines
    for line in lines[1:]:   #skip the first line, which is the header
        sline = line.split(',')   # separates line into a list of items.  ',' tells it to split the lines at the commas
        condition = sline[2]   #store the condition for the line as a string
        if condition == "control":
            dataList = sline[3:]
            dataString = ','.join(dataList)
            outfile.write(dataString)   #The last item still ends with a newline, so no need to add one

    outfile.close()   #Close the file when we're done!

Key Points

  • Opening and reading a file is a multistep process: defining the filename, opening the file, and reading the data

  • Data stored in files can be read in using a variety of commands

  • Writing data to a file requires attention to data types and formatting that isn't necessary with a print() statement

pledgercersecove.blogspot.com

Source: https://eldoyle.github.io/PythonIntro/08-ReadingandWritingTextFiles/
