Advice on string problem please

My problem is this. I have to extract some data from a .las file (a drill log from an oil field). The file has a structure where after the header there are sets of 31 numbers delimited by four (4) spaces except for the first one that has three spaces in front of it.

Here’s an example:
4725.25000 5085.47949 0.19664 0.19059 0.20837 0.23082 0.29800 0.28800 0.23096 0.20655 0.20140 0.19814 0.18736 0.18889 0.20527 0.22971 0.29054 50.56643 129.59918 0.35363 128.97762 1008.37579 2400.50000 -999.25000 -999.25000 -27.49278 -999.25000 -999.25000 12.18924 -999.25000 -0.03994
A typical file has many thousands of these, and the model that I will produce uses LOTS of these files.

So let’s say for example I need the fifth and ninth number in each set. How would I ask Python to find those?
Any ideas?
-cc-

The PHP text editor is removing the spaces between the numbers, so you will just have to trust me: there are four spaces between each number.

Hi, with the following code you can extract the values from the BIG string:


st = "4725.25000 5085.47949 0.19664 0.19059 0.20837 0.23082 0.29800 0.28800 0.23096 0.20655 0.20140 0.19814 0.18736 0.18889 0.20527 0.22971 0.29054 50.56643 129.59918 0.35363 128.97762 1008.37579 2400.50000 -999.25000 -999.25000 -27.49278 -999.25000 -999.25000 12.18924 -999.25000 -0.03994"

def ExtractValues(st):
    ls = st.split(" ")  # list of strings, split on single spaces
    print ls
    print
    lf = [float(elem) for elem in ls]  # list of floats
    print lf
    print
    n = 1
    print lf[n-1]  # the first element/value
    n = 2
    print lf[n-1]  # the second element/value
    n = 7
    print lf[n-1]  # the 7th element/value
    print

ExtractValues(st)

After extracting the strings, the procedure converts them into floats and prints elements #1, #2 and #7.

The output looks like this:

['4725.25000', '5085.47949', '0.19664', '0.19059', '0.20837', '0.23082', '0.2980
0', '0.28800', '0.23096', '0.20655', '0.20140', '0.19814', '0.18736', '0.18889',
 '0.20527', '0.22971', '0.29054', '50.56643', '129.59918', '0.35363', '128.97762
', '1008.37579', '2400.50000', '-999.25000', '-999.25000', '-27.49278', '-999.25
000', '-999.25000', '12.18924', '-999.25000', '-0.03994']

[4725.25, 5085.4794899999997, 0.19664000000000001, 0.19059000000000001, 0.20837,
 0.23082, 0.29799999999999999, 0.28799999999999998, 0.23096, 0.20655000000000001
, 0.2014, 0.19814000000000001, 0.18736, 0.18889, 0.20527000000000001, 0.22971, 0
.29054000000000002, 50.566429999999997, 129.59917999999999, 0.35363, 128.97762,
1008.3757900000001, 2400.5, -999.25, -999.25, -27.49278, -999.25, -999.25, 12.18
924, -999.25, -0.039940000000000003]

4725.25
5085.47949
0.298

Therefore, depending on your needs, you may wish to round the values stored in the lf list :wink:
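The long tails in the printed list come from how the floats are displayed, since most decimal fractions have no exact binary floating-point form. A minimal sketch of formatting the values to five decimal places for display, written for Python 3; `format_values` and `lf_sample` are illustrative names, not part of the code above:

```python
def format_values(values, places=5):
    """Format each float with a fixed number of decimal places for display."""
    return ["%.*f" % (places, v) for v in values]

# A few values taken from the sample line (lf_sample is a shortened stand-in for lf)
lf_sample = [5085.47949, 0.19664, 0.298, -0.03994]
formatted = format_values(lf_sample)
print(formatted)
```

This only changes how the numbers are shown; the stored floats are left untouched.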

Regards,

Thanks for the quick response Abidos.

I should have put a finer point on the “four spaces between numbers” bit in my post. (The text editor here on the forum culled out those 3 spaces as useless whitespace)
When that condition exists Python returns:

ValueError: empty string for float()

So I guess the real problem that I am struggling with here is first massaging that string to remove those extra spaces between the numbers.
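A small sketch of where that error comes from, written for Python 3 and assuming the layout described earlier (three leading spaces, four spaces between values). The sample line is built with explicit space counts so the spacing cannot get lost, and the variable names are only illustrative:

```python
# Build a short sample line: three leading spaces, four spaces between values
sample = " " * 3 + (" " * 4).join(["0.19664", "0.19059", "0.20837"])

# Splitting on ONE space yields an empty string for every extra space
pieces = sample.split(" ")
print(pieces)

# float() cannot convert an empty string, which is what raises the ValueError
try:
    float(pieces[0])
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(conversion_failed)
```

Only three of the twelve pieces are actual numbers here; the rest are empty strings left behind by the extra spaces.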

Maybe you could use the string.replace() function for this?



import string
newString = string.replace(st, "    ", " ")  # replace each run of four spaces with a single space

Unfortunately, this is deprecated. However, there should be an alternative to use in Python 3.0 that I’m not familiar with.

You can just split on "    " (four spaces) instead of just " " (one).
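As a sketch of that idea applied to the original question (the fifth and ninth number in each set), written for Python 3. It uses a closely related variation: calling `split()` with no argument, which splits on any run of whitespace and so also copes with the three leading spaces. The name `pick_columns` and the shortened sample line are assumptions for illustration only:

```python
def pick_columns(line, positions=(5, 9)):
    """Return the floats at the given 1-based positions of one data line."""
    fields = line.split()  # no argument: split on any run of whitespace
    return [float(fields[p - 1]) for p in positions]

# First ten values of the sample line, rebuilt with the spacing described:
# three leading spaces, then four spaces between values
values = ["4725.25000", "5085.47949", "0.19664", "0.19059", "0.20837",
          "0.23082", "0.29800", "0.28800", "0.23096", "0.20655"]
line = " " * 3 + (" " * 4).join(values)

print(pick_columns(line))  # fifth and ninth values of the line

# Splitting on exactly four spaces, after stripping the leading spaces,
# gives back the same fields
four_space_fields = line.strip().split(" " * 4)
print(four_space_fields == values)
```

For a whole file, the same function could be called once per data line after the header has been skipped.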

Martin

Thanks FunkyWyrm:

This is what worked for me
st = st.replace('    ', ' ')
as opposed to the three arguments in your example

I like Theeth's solution the best in this case tho
Thank you everyone!

-cc-