SAF(NET) = STEPHEN A. FUQUA operating on the Web since 1995

Stephen is a web developer, Bahá'í, and interfaith activist in St. Paul, Minnesota. He likes to write about religion, social justice, sustainability, science, programming, &c.

July 23, 2007

Performance #6: Reading Directly Into the Parser

This article is part of the series An Exercise in Performance Tuning in C#.Net.

As I look at the code I now have, I wonder if the fileLines variable is an unnecessary intermediate step. Can I rewrite the code so that the result of stream.ReadLine() is passed directly into the parsing? If I do so, I'll be leaving the file open longer, but since no other application should be attempting to access the file, I'm okay with that. This means moving the code that opens the file into MyClass.ProcessFile().

This is what we left off with:

// Read input file into a list of lines
List<string> fileLines = new List<string>();
using (StreamReader stream = new StreamReader(inputFileName, Encoding.ASCII, true, 800))
{
     string line;
     while ((line = stream.ReadLine()) != null)
          fileLines.Add(line);
}

// Parse the input file
MyClass upload = new MyClass(fileLines);
upload.ProcessFile();


ProcessFile:

// perform various tasks on each line

And here is the new code:

using (StreamReader stream = new StreamReader(inputFileName, Encoding.ASCII, true, 800))
{
     while (stream.Peek() >= 0)
     {
          processSingleLine(stream.ReadLine());
     }
}

I actually made two changes here. The first is the switch to stream.Peek() as the loop test, and it is possible that this helped speed things up as well. The loop needed a new end-of-file test once the line variable was gone, so this change came along with the second one: passing the result of stream.ReadLine() directly to processSingleLine(). This bit of code is now embedded in the ProcessFile() function rather than in the parent application.
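To separate the two changes, here is a minimal sketch that runs both loop forms over the same input. The class name LoopComparison and the sample text are made up for illustration, and it reads from a StringReader rather than a file so that it is self-contained. It only shows that the two loops return the same lines; it says nothing about which one is faster.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class, not part of the original application.
public static class LoopComparison
{
    // Loop form from the new code: test Peek() before each ReadLine().
    public static List<string> ReadWithPeek(TextReader reader)
    {
        var lines = new List<string>();
        while (reader.Peek() >= 0)
            lines.Add(reader.ReadLine());
        return lines;
    }

    // Loop form from the earlier code: stop when ReadLine() returns null.
    public static List<string> ReadUntilNull(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
            lines.Add(line);
        return lines;
    }

    public static void Main()
    {
        const string sample = "first\nsecond\nthird";
        List<string> a = ReadWithPeek(new StringReader(sample));
        List<string> b = ReadUntilNull(new StringReader(sample));

        // Both loops should see the same number of lines.
        Console.WriteLine(a.Count == b.Count);
    }
}
```

Either loop could feed processSingleLine() one line at a time, so the Peek() test is a matter of style here rather than a requirement for streaming.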

Result: 82% improvement in processing time! That's incredible. Maybe I should explain just a bit more. The processSingleLine() routine is parsing out the input file's data into various objects, and doing some manipulation along the way. In the original version, the file was being opened by the application, and its lines added to a List<string>. This list was then handed to MyClass, whose ProcessFile() function basically just called processSingleLine() on each line.

By calling processSingleLine() directly, I end up leaving the file open for as long as the parsing takes. In the original situation I wanted to "open late and close early": keep the file open only long enough to read its contents into memory. But here is a crucial point: in this situation, no other application will be trying to read the file, so it does not really matter if the file remains open. By keeping it open and passing the output from ReadLine() into processSingleLine(), I eliminated the creation of a large list of strings, and building that list carried considerable overhead (moving data in and out of memory).
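Put together, the restructured class might look something like the sketch below. MyClass, ProcessFile() and processSingleLine() are the names used above, but everything else is an assumption for illustration: the constructor taking the file name, the LinesProcessed counter, and the body of processSingleLine(), which here only counts lines instead of doing the real parsing.

```csharp
using System;
using System.IO;
using System.Text;

public class MyClass
{
    private readonly string inputFileName;

    // Hypothetical counter, only so the sketch has something observable.
    public int LinesProcessed { get; private set; }

    // Assumption: the class now receives the file name instead of a List<string>.
    public MyClass(string inputFileName)
    {
        this.inputFileName = inputFileName;
    }

    public void ProcessFile()
    {
        // The file stays open for the whole parse; no intermediate list is built.
        using (StreamReader stream = new StreamReader(inputFileName, Encoding.ASCII, true, 800))
        {
            while (stream.Peek() >= 0)
            {
                processSingleLine(stream.ReadLine());
            }
        }
    }

    // Placeholder for the real parsing work, which is not shown in this article.
    private void processSingleLine(string line)
    {
        LinesProcessed++;
    }
}

public static class Program
{
    public static void Main()
    {
        // Write a small throwaway input file so the sketch can run on its own.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "a,1", "b,2", "c,3" });

            MyClass upload = new MyClass(path);
            upload.ProcessFile();
            Console.WriteLine(upload.LinesProcessed);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The trade-off is the one described above: only one line is held in memory at a time, at the cost of keeping the file handle open until parsing finishes.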
