Testing write/read speed

Hi there…

I need to store some data, mainly mesh vertex coordinates, and I need SPEED in the reading/writing process too, so I experimented a bit. My test data was a list of the coords of the 32 verts of one test mesh, which looks like this:

[13.861026, 15.697771, 13.036913](vector)
[16.298828, 12.333653, 14.428299](vector)
[18.110275, 8.495561, 15.116545](vector)
[19.225758, 4.330988, 15.075292](vector)
[19.602404, -0.000020, 14.306086](vector)
[19.225742, -4.331016, 16.440475](vector)
[18.110250, -8.495571, 14.330877](vector)
[16.298794, -12.333649, 14.152359](vector)
[13.860991, -15.697751, 12.133544](vector)
[10.890518, -18.458599, 12.268692](vector)
[7.501533, -20.510096, 8.596981](vector)
[3.824270, -21.773401, 8.463569](vector)
[0.000044, -22.199970, 4.807958](vector)
[-3.824183, -21.773411, 2.666348](vector)
[-7.501449, -20.510111, 1.891712](vector)
[-10.890436, -18.458622, -0.720490](vector)
[-13.860913, -15.697780, -2.761461](vector)
[-16.298723, -12.333682, -4.152846](vector)
[-18.110186, -8.495609, -4.841144](vector)
[-19.225685, -4.331056, -4.799891](vector)
[-19.602352, -0.000061, -4.030696](vector)
[-19.225714, 4.330938, -6.165120](vector)
[-18.110247, 8.495502, -4.055552](vector)
[-16.298813, 12.333592, -4.987048](vector)
[-13.861027, 15.697713, -1.858239](vector)
[-10.890571, 18.458582, -1.993388](vector)
[-7.501597, 20.510101, 1.678313](vector)
[-3.824341, 21.773434, 1.811715](vector)
[-0.000117, 22.200031, 5.467336](vector)
[3.824112, 21.773497, 6.594309](vector)
[7.501384, 20.510225, 8.383614](vector)
[10.890386, 18.458759, 10.995815](vector)

I’ve performed some testing with the following read/write procs in three versions (only part of the code is shown, after importing ALL the required modules):

def Save_coords_A(lst):
    file_spec = "C:\\Blender_Input\\Model_coords_2.txt"
    fw = open(file_spec, "w")
    for elem in lst:
        fw.write(str(elem) + "\n")  # one coords string per line
    fw.close()
    st = "Coords data - in " + file_spec
    print st
    print
    
def Save_coords_B(lst):
    file_spec = "C:\\Blender_Input\\Model_coords_3.txt"
    fw = open(file_spec, "wb")
    for elem in lst:
        print str(elem)
        pickle.dump(list(elem), fw)  # vectors cannot be pickled - convert to a list first
    fw.close()
    st = "Coords data - in " + file_spec
    print st
    print
    
def Save_coords_C(lst):
    file_spec = "C:\\Blender_Input\\Model_coords_4.txt"
    fw = open(file_spec, "wb")
    lst_2 = []
    for elem in lst:
        print str(elem)
        lst_2.append(list(elem))
    pickle.dump(lst_2, fw)  # pickle the whole list in one go
    fw.close()
    st = "Coords data - in " + file_spec
    print st
    print
    
def ExtractXYZ(st):
    li = st.split(", ")
    value = (float(li[0]),float(li[1]),float(li[2]))
    return value  # Data for one mesh vertex

def Read_coords_A():
    file_spec = "C:\\Blender_Input\\Model_coords_2.txt"
    fr = open(file_spec, "r")
    li = fr.readlines()
    fr.close()
    lst = []
    kl = len(li)
    for i in range(kl):
        # Convert each line into e.g. "1.000000, -1.500000, 0.000000"
        st = li[i][1:-2]  # strip the enclosing brackets and the newline
        elem = ExtractXYZ(st)
        lst.append(list(elem))
    return lst
    
def Read_coords_B():
    file_spec = "C:\\Blender_Input\\Model_coords_3.txt"
    fr = open(file_spec, "rb")
    lst = []
    while True:
        try:
            elem = pickle.load(fr)  # one pickled [x, y, z] list per call
            lst.append(elem)
        except EOFError:
            fr.close()
            return lst
        
def Read_coords_C():
    file_spec = "C:\\Blender_Input\\Model_coords_4.txt"
    fr = open(file_spec, "rb")
    lst = []
    while True:
        try:
            lst = pickle.load(fr)  # the file holds one pickled list of lists
        except EOFError:  # the second load attempt hits the end of the file
            fr.close()
            return lst
    
##############################################################################

def Test_reading_speed(lst):
    Save_coords_A(lst)
    Save_coords_B(lst)
    Save_coords_C(lst)
    
    # Testing the speed of reading different formats of the same data....
    
    k = 100  # number of testing cycles
    
    # Old style - data in strings, converted back into 3 floats....
    # (sys.time() is presumably Blender.sys.time() from the Blender 2.4x API)
    t1 = sys.time()
    for i in range(k):
        lst = Read_coords_A()
    t2 = sys.time()
    print k,"* Read_coords_A() timing = ",1000*(t2-t1),"milliseconds"
    print
    print "lst = ",lst
    print
    
    # New style - data directly unpickled into 3-float lists....
    t1 = sys.time()
    for i in range(k):
        lst = Read_coords_B()
    t2 = sys.time()
    print k,"* Read_coords_B() timing = ",1000*(t2-t1),"milliseconds"
    print
    print "lst = ",lst
    print
    
    # New style (2) - the whole data directly unpickled as one list of 3-float lists....
    t1 = sys.time()
    for i in range(k):
        lst = Read_coords_C()
    t2 = sys.time()
    print k,"* Read_coords_C() timing = ",1000*(t2-t1),"milliseconds"
    print
    print "lst = ",lst
    print

......................
Test_reading_speed(lst)
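
For context, lst above holds the coords of my test mesh’s verts. In the Blender 2.4x API it is built roughly like this - just a sketch, the mesh name is an example:

# A sketch, assuming the Blender 2.4x API; "TestMesh" is just an example name
import Blender

mesh = Blender.Mesh.Get("TestMesh")
lst = [v.co for v in mesh.verts]  # a list of vectors, one per vertex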

All three reading procedures produce essentially the same list of data, read from the corresponding file.

The first set of procedures (A) is very much customised: it converts the coords into a string before storing the data, and converts the string back into a list of floats while reading. I expected this to take more time, so I was looking for a speedier solution…

The set of procs (B) relies on the pickle functionality of one of Python’s modules. It stores any type of data in a more general way, i.e. by using just one built-in proc. Presumably, this would be speedier… But unfortunately, Python says it cannot pickle vectors, so I needed to convert the vectors into lists before pickling.
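
To illustrate the vector problem - a minimal sketch, assuming Blender’s 2.4x Mathutils module:

import pickle
from Blender import Mathutils

v = Mathutils.Vector(1.0, 2.0, 3.0)
# pickle.dumps(v)              # fails - pickle cannot handle Vector objects
data = pickle.dumps(list(v))   # works - a plain list of floats pickles fine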

Then, figuring that not being able to pickle vectors costs processing time anyway, I came up with the idea of pickling the whole initial list I have in one go. That is what the procs of set (C) implement.

As you can see, I ran each reading proc (only) 100 times for a better measurement… I made the tests on two different comps - Comp_A, which is a bit old already, and Comp_B, a rather NEW one with an Intel Dual Core at 2.0 GHz. The timing results turned out to be a bit surprising though:

          Test #    Proc_A    Proc_B    Proc_C
Comp_A    Test_1     125       756       450
          Test_2     123       749       453
Comp_B    Test_1      34       127        92
          Test_2      43       132        98

(figures in milliseconds per 100 testing cycles)

So, surprisingly, set A proves to be more effective even though it does more conversions along the way… I don’t know how pickle’s procs work internally though.

Let me mention that I know the procs are not fully optimised - for example, the file specifications could be taken out of the procs - but that is NOT the goal of my testing… I think I ensured comparable conditions for ALL versions of the procs.

Just to mention that the size of the storage files produced with the random data (see the beginning) is about 20% larger in cases B and C compared with case A, i.e. case A is AGAIN more effective… Even more so when working with more rounded data like:

[0.000001, -1.000000, 5.000000](vector)
[-0.195089, -0.980786, 5.000000](vector)
[-0.382682, -0.923880, 5.000000](vector)

… => cases B and C produce storage files that are 50% larger than in case A! Test_1 on Comp_B shows the effect on the timings when working with such figures (more rounded ones, like 5.0, 1.5, 2.0, etc.).
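
The size difference looks like an effect of pickle’s default text protocol, which stores each float as its ASCII repr() - a quick sketch, Python 2.x assumed:

import pickle

coords = [0.000001, -1.0, 5.0]
# Protocol 0 (the default) stores each float as its ASCII repr(); before
# Python 2.7, repr() of a float can take up to 17 digits, inflating the file.
print len(pickle.dumps(coords, 0))
# Protocol 2 stores each float as a fixed 8 bytes.
print len(pickle.dumps(coords, 2))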

I am working on a project where I’d need a lot of writing/reading of stored data, and where the user should be able to store their own data (that is the main kind of flexibility I’d like to achieve). The above is just an example of my expected needs for the most used procedures, since saving/reading coordinate data will be essential, but I suspect that in the course of this project I’d need to store/read other types of data too (not only 3-float lists/tuples)… Therefore, I’d like some advice on the above tactics (proc sets A, B and C) - which one would you recommend? And should you have a suggestion for a different, presumably more effective (speedier) solution - please share…

hi,

what type of pickle did you use? In Python 2.6 there is still a difference between pickle and cPickle and you have to choose (in 3.x this is no longer true). The speedup can be dramatic (but you have to check, of course, what works for you). Also, there are different versions of the protocol to consider: if backward compatibility is no issue, use the newest. Finally, pickle is quite versatile but not necessarily fast. For simple dumping/reading of arrays of floats (or vectors) you might consider the struct module: it ‘converts’ data to strings - packing is a better word. It basically allows you to write binary data as a string to disk (like pickle’s newer binary protocols).
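
To make that concrete, here are two minimal sketches (Python 2.x assumed; the function names are just examples):

# cPickle is a C implementation with the same interface as pickle;
# fall back to plain pickle where cPickle is unavailable.
try:
    import cPickle as pickle
except ImportError:
    import pickle

def save_coords_pickled(lst, file_spec):
    fw = open(file_spec, "wb")
    # one dump of the whole list, using the newest (binary) protocol
    pickle.dump([list(v) for v in lst], fw, pickle.HIGHEST_PROTOCOL)
    fw.close()

And the struct variant, packing each vertex as three little-endian doubles ("<3d", 24 bytes per vertex) behind a simple vertex-count header:

import struct

def save_coords_struct(lst, file_spec):
    fw = open(file_spec, "wb")
    fw.write(struct.pack("<i", len(lst)))  # header: number of vertices
    for v in lst:
        fw.write(struct.pack("<3d", v[0], v[1], v[2]))
    fw.close()

def read_coords_struct(file_spec):
    fr = open(file_spec, "rb")
    (n,) = struct.unpack("<i", fr.read(4))
    lst = [list(struct.unpack("<3d", fr.read(24))) for i in range(n)]
    fr.close()
    return lst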

also, another detail I think is relevant here:

if you include any print inside your loop, this may slow the whole loop down too!
but that is not related to the speed of writing to the disk as such, I think

on Windows, any print is limited in speed by the operating system,
and I’ve noticed that a loop with prints can become a lot slower than one without any print command

good luck

@ Ricky - that is why I compare only the reading procs, where there are NO print commands

@ varkenvarken - So far I use the pickle module - that is quite visible from the script. In fact, I was not aware of cPickle. I was a bit astonished by the results, because with just one relatively simple extra proc (ExtractXYZ in my case), significantly faster performance is reached, plus the storage file stays readable. On the contrary, when using any of the pickle(s), the stored data is not that readable, and this may be important during the development phase, i.e. being able to quickly see what data is in your storage file without needing to convert (unpack) it with a proc/program.
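
A quick illustration of what pickle actually writes with the default protocol (Python 2.x assumed) - the floats are still visible, just wrapped in pickle opcodes:

import pickle
print repr(pickle.dumps([1.0, -1.5, 0.0], 0))
# prints roughly: '(lp0\nF1.0\naF-1.5\naF0.0\na.'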