paxad.blogg.se - Mbox email format

I'm working on a MIME & mbox parser in C# called MimeKit. It's based on earlier MIME & mbox parsers I've written (such as GMime) which were insanely fast (they could parse every message in a 1.2GB mbox file in about 1 second). I haven't tested MimeKit for performance yet, but I am using many of the same techniques in C# that I used in C. I suspect it'll be slower than my C implementation, but since the bottleneck is I/O and MimeKit is written to do optimal (4k) reads like GMime is, they should be pretty close.

Your current approach (StreamReader.ReadLine(), combining the text, then passing it off to SharpMimeTools) is slow for the following reasons:

StreamReader.ReadLine() is not a very optimal way of reading data from a file. While I'm sure StreamReader does internal buffering, it needs to do the following steps:

A) Convert the block of bytes read from the file into Unicode (this requires iterating over the bytes in the byte[] read from disk to convert them into a Unicode char[]).

B) Then it needs to iterate over its internal char[], copying each char into a StringBuilder until it finds a '\n'.

So right there, with just reading lines, you have at least 2 passes over your mbox input stream. Not to mention all of the memory allocations going on.

Then you combine all of the lines you've read into a single mega-string. This requires another pass over your input (copying every char from each string read from ReadLine() into a StringBuilder, presumably?).

We are now up to 3 iterations over the input text and no parsing has even happened yet.

Now you hand off your mega-string to SharpMimeTools, which uses a SharpMimeMessageStream which... (/facepalm) is a ReadLine()-based parser that sits on top of another StreamReader that does charset conversion, repeating steps A and B. That makes 5 iterations before anything at all is even parsed.
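For contrast with the line-by-line approach, here is a minimal sketch (in C, the language GMime is written in) of single-pass, block-buffered reading. It is not MimeKit's or GMime's actual code; the function name mbox_scan and the 4096-byte block size are illustrative assumptions. It reads the mbox in fixed 4k blocks, never converts bytes to Unicode or builds per-line strings, and examines each byte exactly once to find the "From " lines that separate messages, including a separator that straddles two blocks.

```c
#include <stdio.h>

/* Illustrative block size (assumption): 4k reads, as discussed above. */
#define MBOX_BLOCK_SIZE 4096

/* Hypothetical helper: scan an mbox stream once, in fixed-size blocks, and
 * record the byte offset of every "From " separator line. offsets may be
 * NULL; at most max_offsets entries are written. Returns the number of
 * separators found, or -1 on a read error. */
long mbox_scan(FILE *fp, long *offsets, long max_offsets)
{
    static const char marker[] = "From ";
    const int marker_len = (int)(sizeof marker - 1);
    unsigned char buf[MBOX_BLOCK_SIZE];
    long count = 0;
    long pos = 0;     /* absolute file offset of buf[0] */
    int matched = 0;  /* chars of "From " matched at the current line start;
                         -1 means we are not in a candidate separator line.
                         Starts at 0 because the file begins at a line start. */
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            unsigned char c = buf[i];

            if (matched >= 0) {
                if (c == (unsigned char)marker[matched]) {
                    /* State survives across blocks, so a split "From " still matches. */
                    if (++matched == marker_len) {
                        if (offsets != NULL && count < max_offsets)
                            offsets[count] = pos + (long)i - (marker_len - 1);
                        count++;
                        matched = -1;
                    }
                } else {
                    matched = -1;
                }
            }

            /* The byte after a newline is the start of a new line. */
            if (c == '\n')
                matched = 0;
        }
        pos += (long)n;
    }

    return ferror(fp) ? -1 : count;
}
```

Because the sketch only checks for "From " at the start of a line, it relies on body lines that begin with "From " having been escaped (for example as ">From "). A real parser would go on to parse the headers and bodies of each message; this only shows how message boundaries can be located in one pass without per-line allocations.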