Building Really Long Strings
14.10.2013 1 comment
Dynamically building strings with Gupta is actually pretty fast, at least as long as the resulting string stays below 1,000,000 characters. There is also little need to compose strings beyond 32 KB: SQL statements are usually limited to 32 KB in length, otherwise the input buffer complains. The multiline text box is limited to this length as well. Row-oriented file exports rarely consist of extremely long rows of data.
Concatenation can take some time
But when serializing data in XML or JSON format, you may have to build very long strings of about 1 to 10 MB if you cannot write parts to disk in between. What you do is concatenate short string parts (for example, an XML tag with content and end tag) into one large output string. Beyond a certain length, the time required to append additional strings becomes extreme. Building a string of up to 500,000 characters takes about two seconds on an average desktop computer. A million characters take up to 40 seconds. For ten million characters, I ran out of patience. That had to be optimized, because Gupta itself has no problem with very large strings: you can read files tens of megabytes in size as a BLOB with SalFileRead(…). But frequently concatenating short strings can become very time-consuming.
Analysis
For a quantitative analysis I wrote a small program. In a loop, it appends short strings of length 10 to sOut. Every 100 passes, the time is measured and exported to a CSV file. The result is plotted in the following diagram:
You can see a sharp bend at a string length of 500,000 characters, beyond which adding further string parts takes much longer (in debug mode the bend is located at 100,000 characters).
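The measurement loop described above could be structured roughly as follows. This is a minimal sketch in Python, not the original Gupta program: the names measure_appends and write_csv are hypothetical, and because Python handles strings differently from Gupta, the timings it produces will not reproduce the bend. It only illustrates the approach of appending a 10-character part and sampling the elapsed time every 100 passes.

```python
import csv
import io
import time


def measure_appends(total_appends, part="0123456789", sample_every=100):
    """Append `part` repeatedly and sample the elapsed time.

    Returns a list of (string_length, elapsed_seconds) tuples,
    one entry per `sample_every` appends.
    """
    s_out = ""
    samples = []
    start = time.perf_counter()
    for i in range(1, total_appends + 1):
        s_out += part  # the naive concatenation under test
        if i % sample_every == 0:
            samples.append((len(s_out), time.perf_counter() - start))
    return samples


def write_csv(samples, stream):
    """Write a header row followed by one CSV row per sample."""
    writer = csv.writer(stream)
    writer.writerow(["length", "seconds"])
    writer.writerows(samples)


if __name__ == "__main__":
    # Small run for demonstration; a real test would use far more appends
    # and write to a file instead of an in-memory buffer.
    buffer = io.StringIO()
    write_csv(measure_appends(1000), buffer)
    print(buffer.getvalue())
```

Plotting the second column against the first gives the kind of curve discussed in this section.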
Accelerator
You can speed up the process as a whole by adding string parts to a temporary string first and concatenating that temporary string to the output string only when it reaches a length of N characters. The large output string is then extended only once per N characters instead of on every append. I determined the optimal value for N through a series of tests. It lies between 10,000 and 20,000, as can be seen in the following graphic. Creating a 10 MB string from a total of one million appends then takes only 13 seconds.
StringBuilder
I packed the behavior described above into a class StringBuilder with the following methods: Append(…) appends a string part, ToString() returns the output string, Length() returns the current length of the output string, and clear() resets it. You can download the source code of StringBuilder including the performance tests.
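To make the idea concrete, here is a minimal sketch of such a class in Python. It is not the downloadable Gupta source: the method names are Python-style equivalents of the ones listed above, and the parameter flush_at (standing in for N) and its default of 15,000 are assumptions, chosen from the middle of the 10,000 to 20,000 range reported in the tests.

```python
class StringBuilder:
    """Two-stage string builder: short parts are collected in a small
    temporary string, which is moved to the large output string only
    once it reaches `flush_at` characters."""

    def __init__(self, flush_at=15000):
        # flush_at plays the role of N; the default is an assumption.
        self._out = ""   # large output string, extended rarely
        self._tmp = ""   # small temporary string, extended on every append
        self._flush_at = flush_at

    def append(self, part):
        """Append a string part; flush when the temporary string is full."""
        self._tmp += part
        if len(self._tmp) >= self._flush_at:
            self._flush()

    def _flush(self):
        """Move the temporary string to the end of the output string."""
        self._out += self._tmp
        self._tmp = ""

    def to_string(self):
        """Return the complete output string, including pending parts."""
        self._flush()
        return self._out

    def length(self):
        """Return the current total length, including pending parts."""
        return len(self._out) + len(self._tmp)

    def clear(self):
        """Reset the builder to an empty string."""
        self._out = ""
        self._tmp = ""
```

A typical use would be to call append(...) once per XML tag or JSON fragment and to_string() once at the end, so that the expensive concatenation onto the large string happens only every flush_at characters.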
Download
StringBuilderPerfomanceTests.zip
Happy coding
Hello Thomas, great post on StringBuilder! I would like to investigate your findings further, but your link to 'StringBuilderPerfomanceTests.zip' doesn't seem to work.