Insert str into larger str in the most pythonic way


I want to insert a small str into a larger one in the most pythonic way.

Maybe I missed a useful str method or function? The function that inserts the small str into the larger one is insert_tag.

I have tried something with reduce but it didn't work and it was too complex.

Code:

from random import choice, randrange
from typing import List


def insert_tag(text: str, offsets: List[int], tags: List[str]) -> str:
    text: str = text
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'  # solution 1
        text = text[:offset] + tag + text[offset:]  # solution 2
    return text


if __name__ == '__main__':
    tag_nbr = 30
    # Text example
    base_text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do ' \
                'eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut ' \
                'enim ad minim veniam, quis nostrud exercitation ullamco laboris ' \
                'nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor ' \
                'in reprehenderit in voluptate velit esse cillum dolore eu fugiat ' \
                'nulla pariatur. Excepteur sint occaecat cupidatat non proident, ' \
                'sunt in culpa qui officia deserunt mollit anim id est laborum.' \
        .replace(',', '').replace('.', '')
    # Random list of tag only for testing purpose
    tag_li: List[str] = [f'<{choice(base_text.split())}>' for _ in range(tag_nbr)]
    # Ordered list of random offset only for testing purpose
    offset_li: List[int] = [randrange(len(base_text)) for _ in range(tag_nbr)]
    offset_li.sort(reverse=True)
    # Debug
    print(tag_li, offset_li, sep='\n')

    # Review
    new_text: str = insert_tag(base_text, offset_li, tag_li)
    # End Review

    # Debug
    print(new_text)

Output:

"Lorem ips<consectetur>u<Lorem>m dolo<Lorem>r sit amet consecte<Lorem>tur <nostrud>adipiscing eli<aute>t sed d<anim>o eiusmod tempo<voluptate>r <exercitation>incid<anim>i<non>dunt ut l<quis>abor<do>e et dol<reprehenderit>ore magna aliqua Ut enim ad minim veniam quis nostrud exercitatio<est>n ullamco laboris nisi ut aliquip ex<nulla> ea <consectetur>commodo consequat Duis aute irure dolor in repreh<deserunt>enderit in voluptate vel<sint>it esse cillum dolore eu f<qui>ugiat nulla pariatur <magna>Excepteur sin<ex>t occaecat cup<labore>i<ut>datat non proident sun<est>t<in> in cul<fugiat>pa qui officia<enim> deserunt mollit anim i<in><anim>d est laborum"

PS: The rest of the code is not my primary concern, but if it can be improved, tell me, thanks.

Tags: python, strings

asked 17 hours ago by Dorian Turba (new contributor), edited 13 hours ago by Sᴀᴍ Onᴇᴌᴀ

Please see What to do when someone answers. I have rolled back Rev 5 → 3.
– Sᴀᴍ Onᴇᴌᴀ, 13 hours ago

"Do not add an improved version of the code after receiving an answer. Including revised versions of the code makes the question confusing, especially if someone later reviews the newer code." I didn't see that, thank you.
– Dorian Turba, 13 hours ago


3 Answers

The core code looks fine (I'd prefer the f-string if you can guarantee Python 3.6+; otherwise use the addition or "{}{}{}".format(...) if you need to support older Python versions). However, you overwrite text on each loop iteration but continue to use the originally provided offsets, which could be very confusing, since the offsets become incorrect after the first tag insertion. The code doesn't crash, but I'd argue it doesn't "work", certainly not as I'd intuitively expect:

In [1]: insert_tag('abc', [1, 2], ['x', 'y'])
Out[1]: 'axyyxbc'

I'd expect this to produce 'axbyc'. If that output looks correct to you, ignore the rest of this answer.
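To illustrate why the order of the offsets matters, here is a minimal sketch. It assumes a single insertion per loop iteration, that is, only one of the two "solution" lines from the question (with both lines active, as in Out[1], every tag is inserted twice). The name insert_once_per_tag is hypothetical and used only for this illustration.

```python
def insert_once_per_tag(text, offsets, tags):
    """Insert each tag at its offset, one insertion per (offset, tag) pair."""
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


# Ascending offsets: the first insertion shifts every later position,
# so 'y' lands right after 'x' instead of after 'b'.
print(insert_once_per_tag('abc', [1, 2], ['x', 'y']))  # axybc

# Descending offsets: insertions happen right to left, so the positions
# that are still to be processed are not shifted.
print(insert_once_per_tag('abc', [2, 1], ['y', 'x']))  # axbyc
```

So the loop only gives the intuitive result when the caller passes the offsets in descending order, which is an implicit requirement worth documenting or enforcing.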

It would be better to break the original string apart first, then perform the insertions. As is typical in Python, we want to avoid manual looping if possible, and as is typical for string generation, we want to create as few new strings as possible. I'd try something like the following:

In [2]: def insert_tags(text: str, offsets: [int], tags: [str]):
   ...:     offsets = [0] + list(offsets)
   ...:     chunks = [(text[prev_offset:cur_offset], tag) for prev_offset, cur_offset, tag in zip(offsets[:-1], offsets[1:], tags)]
   ...:     chunks.append([text[offsets[-1]:]])
   ...:     return ''.join(piece for chunk_texts in chunks for piece in chunk_texts)

In [3]: insert_tags('abc', [1, 2], ['x', 'y'])
Out[3]: 'axbyc'

Other stylistic considerations:

Your function supports the insertion of multiple tags, so it should probably be called "insert_tags" (plural) for clarity.

Type annotations are generally good (though recall that they lock you in to newer versions of Python 3.X), but getting them right can be a nuisance. E.g., in this case I'd suggest using Sequence instead of List, since you really just need to be able to loop on the offsets and tags (e.g., they could also be tuples, sets, or even dicts with interesting keys). I also find them to often be more clutter than useful, so consider whether adding type annotations really makes this code clearer, or whether something simpler, like a succinct docstring, might be more helpful.
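As a rough sketch of these two style points, the function could be annotated with Sequence and given a short docstring that states its assumptions. The loop body below is one possible way to write it, not the only one, and the ascending-order requirement is an assumption of this sketch.

```python
from typing import Sequence


def insert_tags(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Return text with tags[i] inserted before text[offsets[i]].

    Assumes offsets are sorted in ascending order and refer to
    positions in the original, unmodified text.
    """
    pieces = []
    previous = 0
    for offset, tag in zip(offsets, tags):
        pieces.append(text[previous:offset])  # untouched text before the tag
        pieces.append(tag)
        previous = offset
    pieces.append(text[previous:])  # remainder after the last tag
    return ''.join(pieces)


# Tuples are accepted as well as lists.
print(insert_tags('abc', (1, 2), ('x', 'y')))  # axbyc
```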

answered 14 hours ago by scnerd

The offsets are still valid since I do offset_li.sort(reverse=True); I updated the output to show it.
– Dorian Turba, 14 hours ago

That should either be built into the function (something like offsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))) or made crystal clear in the function's documentation. The function shouldn't fail horribly like I show because the user expects the function to behave the same regardless of the order of the offsets.
– scnerd, 13 hours ago

OK, I updated the question with your improvement.
– Dorian Turba, 13 hours ago

To answer the question in your answer: with insert_tag('abc', [1, 2], ['x', 'y']) you will get 'axybc'.
– Dorian Turba, 13 hours ago

The fact is that in the real program, the tag and the offset are linked together because they are in the same object, so the definition of insert_tag will look like this: def insert_tag(tags: List[Tuple[int, str]])
– Dorian Turba, 13 hours ago
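The two ideas from the comments above, sorting inside the function and passing each offset together with its tag, could be combined roughly as follows. This is only a sketch: insert_tag_pairs is a hypothetical name, the explicit text parameter is an assumption, and it sorts in ascending order and joins once instead of using the descending in-place loop from the question.

```python
from typing import Iterable, Tuple


def insert_tag_pairs(text: str, pairs: Iterable[Tuple[int, str]]) -> str:
    """Insert each tag at its offset in the original text, in any input order."""
    pieces = []
    previous = 0
    # sorted() is stable, so tags that share an offset keep their input order.
    for offset, tag in sorted(pairs, key=lambda pair: pair[0]):
        pieces.append(text[previous:offset])
        pieces.append(tag)
        previous = offset
    pieces.append(text[previous:])
    return ''.join(pieces)


# The result no longer depends on the order in which the pairs are given.
print(insert_tag_pairs('abc', [(2, 'y'), (1, 'x')]))  # axbyc
print(insert_tag_pairs('abc', [(1, 'x'), (2, 'y')]))  # axbyc
```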

Besides your approach there are a few other alternatives to solve this problem: (1) collect segments of the input text (using values in offsets) and the tags into a list, then call ''.join(<list>) at the end to generate the output; (2) split text into segments, create a format string using '{}'.join(segments), then finally apply the format string to tags to generate the output.

Be aware that your approach consumes more memory and has a theoretical time complexity of $O(n^2)$ to join $n$ strings, because memory must be reallocated for each intermediate concatenated string. It may be more efficient in some cases for small $n$ but will be less efficient as $n$ grows. Therefore, this is not recommended by PEP 8:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco,
and such).

For example, do not rely on CPython's efficient implementation of
in-place string concatenation for statements in the form a += b or a =
a + b. This optimization is fragile even in CPython (it only works for
some types) and isn't present at all in implementations that don't use
refcounting. In performance sensitive parts of the library, the
''.join() form should be used instead. This will ensure that
concatenation occurs in linear time across various implementations.
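The two alternatives described at the start of this answer might be sketched as below. The function names are hypothetical, the offsets are assumed to be in ascending order with one tag per offset, and the brace escaping in the second variant is an addition of this sketch so that literal braces in the text survive str.format().

```python
def split_segments(text, offsets):
    """Split text at the ascending offsets into len(offsets) + 1 segments."""
    bounds = [0, *offsets, len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


def insert_with_join(text, offsets, tags):
    """Alternative (1): interleave segments and tags, then join once."""
    segments = split_segments(text, offsets)
    pieces = []
    for segment, tag in zip(segments, tags):
        pieces.append(segment)
        pieces.append(tag)
    pieces.append(segments[-1])  # the segment after the last tag
    return ''.join(pieces)


def insert_with_format(text, offsets, tags):
    """Alternative (2): build a format string from the segments, then fill in the tags."""
    segments = split_segments(text, offsets)
    # Escape literal braces so str.format() treats them as plain text.
    escaped = [s.replace('{', '{{').replace('}', '}}') for s in segments]
    return '{}'.join(escaped).format(*tags)


print(insert_with_join('abc', [1, 2], ['x', 'y']))    # axbyc
print(insert_with_format('abc', [1, 2], ['x', 'y']))  # axbyc
```

Both variants build the result in a single pass over the pieces, which is the linear-time behaviour the PEP 8 passage refers to.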

answered 14 hours ago by GZ0 (new contributor), edited 13 hours ago

The offsets are still valid since I do offset_li.sort(reverse=True).
– Dorian Turba, 14 hours ago

Sorry, I overlooked that part. I have corrected my answer.
– GZ0, 13 hours ago

For some measurements of time efficiency (memory efficiency ignored), refer to codereview.stackexchange.com/a/228822/25834
– Reinderien, 2 hours ago

Ignoring for now the various gotchas that @scnerd describes about offsets, and making some broad assumptions about the monotonicity and order of the input, let's look at the performance of various implementations. Happily, your implementations are close to the most efficient given these options; only the shown "naive" method is faster on my machine:

#!/usr/bin/env python3

from io import StringIO
from timeit import timeit


# Assume that 'offsets' decreases monotonically


def soln1(text: str, offsets: tuple, tags: tuple) -> str:
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'
    return text


def soln2(text: str, offsets: tuple, tags: tuple) -> str:
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def gen_join(text: str, offsets: tuple, tags: tuple) -> str:
    offsets += (0,)
    return ''.join(
        text[offsets[i+1]:offsets[i]] + tag
        for i, tag in reversed(tuple(enumerate(tags)))
    ) + text[offsets[0]:]


def naive(text: str, offsets: tuple, tags: tuple) -> str:
    output = text[:offsets[-1]]
    for i in range(len(tags)-1,-1,-1):
        output += tags[i] + text[offsets[i]:offsets[i-1]]
    return output + text[offsets[0]:]


def strio(text: str, offsets: tuple, tags: tuple) -> str:
    output = StringIO()
    output.write(text[:offsets[-1]])
    for i in range(len(tags)-1,-1,-1):
        output.write(tags[i])
        output.write(text[offsets[i]:offsets[i-1]])
    output.write(text[offsets[0]:])
    return output.getvalue()


def ranges(text: str, offsets: tuple, tags: tuple) -> str:
    final_len = len(text) + sum(len(t) for t in tags)
    output = [None]*final_len
    offsets += (0,)

    begin_text = 0
    for i in range(len(tags)-1,-1,-1):
        o1, o2 = offsets[i+1], offsets[i]
        end_text = begin_text + o2 - o1
        output[begin_text: end_text] = text[o1: o2]

        tag = tags[i]
        end_tag = end_text + len(tag)
        output[end_text: end_tag] = tag

        begin_text = end_tag

    output[begin_text:] = text[offsets[0]:]
    return ''.join(output)


def insertion(text: str, offsets: tuple, tags: tuple) -> str:
    output = []
    # Use the end of the text as the first boundary
    # (equal to 1+len(tags) for the inputs used below)
    offsets = (len(text),) + offsets
    for i in range(len(tags)):
        output.insert(0, text[offsets[i+1]: offsets[i]])
        output.insert(0, tags[i])
    output.insert(0, text[:offsets[-1]])
    return ''.join(output)


def test(fun):
    actual = fun('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A'))
    expected = 'aAbBcCdDe'
    assert expected == actual


def main():
    funs = (soln1, soln2, gen_join, naive, strio, ranges, insertion)
    for fun in funs:
        test(fun)

    N = 5_000  # number of averaging repetitions
    L = 150  # input length
    text = 'abcde' * (L//5)
    offsets = tuple(range(L-1, 0, -1))
    tags = text[-1:0:-1].upper()

    print(f'{"Name":10} {"time/call, us":15}')
    for fun in funs:
        def call():
            return fun(text, offsets, tags)
        dur = timeit(stmt=call, number=N)
        print(f'{fun.__name__:10} {dur/N*1e6:.1f}')


main()

This yields:

Name       time/call, us
soln1      134.2
soln2      133.2
gen_join   289.7
naive      116.8
strio      159.6
ranges     513.9
insertion  204.7

answered 2 hours ago by Reinderien

One problem with the tests is that they assume that offsets is in descending order, which adds unnecessary overhead to all the str.join implementations here.
– GZ0, 1 hour ago

@GZ0 The alternatives - probably adding a sort() somewhere, to be more robust - are probably going to incur more cost than assuming descending order. Iterating in reverse order is efficient, especially with indexes. Only one of the methods calls reversed, and it's difficult to say whether that's the limiting factor in the speed of that method.
– Reinderien, 1 hour ago

Two points: (1) even if you do not assume the input is in ascending order (which might slightly favor the str.join implementations), it would be fairer to explicitly call sort() within all the methods, which would incur the same cost for all methods (so you could actually assume ascending order just for the str.join-related methods); (2) the tuple call is redundant anyway, and reversed is efficient only if the input is already a sequence like a list or tuple. For an arbitrary iterable like enumerate(...), reversing it requires collecting all the elements first.
– GZ0, 56 mins ago

Meanwhile, the slowness of insertion is due to all the insertions happening at the beginning of the list. The implementation would not be so awkward if not for the descending order.
– GZ0, 49 mins ago
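For reference, a str.join variant written for ascending offsets, as discussed in the comments above, might look like the sketch below. The names join_ascending and join_descending are hypothetical, no timing claims are made for them, and the descending wrapper simply reverses its inputs with slices (which copies them).

```python
def join_ascending(text, offsets, tags):
    """str.join variant for offsets given in ascending order (no reversal needed)."""
    starts = (0,) + tuple(offsets)
    return ''.join(
        text[start:end] + tag
        for start, end, tag in zip(starts, offsets, tags)
    ) + text[starts[-1]:]


def join_descending(text, offsets, tags):
    """Same result for descending offsets, by reversing the inputs directly."""
    return join_ascending(text, offsets[::-1], tags[::-1])


print(join_ascending('abcde', (1, 2, 3, 4), ('A', 'B', 'C', 'D')))   # aAbBcCdDe
print(join_descending('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A')))  # aAbBcCdDe

# reversed() needs a sequence (or an object defining __reversed__);
# an enumerate object is neither, so this raises TypeError.
try:
    reversed(enumerate('ab'))
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```

This is why gen_join in the answer has to materialize enumerate(tags) before reversing it, whereas an ascending-order implementation can zip over the inputs as they are.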

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

Dorian Turba is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f227778%2finsert-str-into-larger-str-in-the-most-pythonic-way%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

In [1]: insert_tag('abc', [1, 2], ['x', 'y'])

Out[1]: 'axyyxbc'

I'd expect this to produce 'axbyc'. If that output looks correct to you, ignore the rest of this answer.

In [2]: def insert_tags(text: str, offsets: [int], tags: [str]):

   ...:     offsets = [0] + list(offsets)

   ...:     chunks = [(text[prev_offset:cur_offset], tag) for prev_offset, cur_offset, tag in zip(offsets[:-1], offset

   ...: s[1:], tags)]

   ...:     chunks.append([text[offsets[-1]:]])

   ...:     return ''.join(piece for chunk_texts in chunks for piece in chunk_texts)





In [3]: insert_tags('abc', [1, 2], ['x', 'y'])

Out[3]: 'axbyc'

Other stylistic considerations:

Your function supports the insertion of multiple tags, so it should probably be called "insert_tags" (plural) for clarity.

Type annotation are generally good (though recall that they lock you in to newer versions of Python 3.X), but getting them right can be a nuisance. E.g., in this case I'd suggest using Sequence instead of List, since you really just need to be able to loop on the offsets and tags (e.g., they could also be tuples, sets, or even dicts with interesting keys). I also find them to often be more clutter than useful, so consider if adding type annotations really makes this code clearer, or if something simpler, like a succinct docstring, might be more helpful.

answered 14 hours ago

scnerd

1,0801 silver badge8 bronze badges

$begingroup$
The offset still valid since I do offset_li.sort(reverse=True), I update the output to show it.
$endgroup$
– Dorian Turba
14 hours ago

3

$begingroup$
That should either be built into the function (something like offsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))) or made crystal clear in the function's documentation. The function shouldn't fail horribly like I show because the user expects the function to behave the same regardless of the order of the offsets.
$endgroup$
– scnerd
13 hours ago

$begingroup$
Ok, I update the question with your improvement.
$endgroup$
– Dorian Turba
13 hours ago

$begingroup$
To answer the question in your answer : with insert_tag('abc', [1, 2], ['x', 'y']) you will get 'axybc'
$endgroup$
– Dorian Turba
13 hours ago

$begingroup$
The fact is that in the real program, tag and offset is linked together because they are in the same object, so def of insert_tag will look like this : def insert_tag(tags: List[Tuple(int, str)]
$endgroup$
– Dorian Turba
13 hours ago

|
show 3 more comments

In [1]: insert_tag('abc', [1, 2], ['x', 'y'])

Out[1]: 'axyyxbc'

I'd expect this to produce 'axbyc'. If that output looks correct to you, ignore the rest of this answer.

In [2]: def insert_tags(text: str, offsets: [int], tags: [str]):

   ...:     offsets = [0] + list(offsets)

   ...:     chunks = [(text[prev_offset:cur_offset], tag) for prev_offset, cur_offset, tag in zip(offsets[:-1], offset

   ...: s[1:], tags)]

   ...:     chunks.append([text[offsets[-1]:]])

   ...:     return ''.join(piece for chunk_texts in chunks for piece in chunk_texts)





In [3]: insert_tags('abc', [1, 2], ['x', 'y'])

Out[3]: 'axbyc'

Other stylistic considerations:

Your function supports the insertion of multiple tags, so it should probably be called "insert_tags" (plural) for clarity.

Type annotation are generally good (though recall that they lock you in to newer versions of Python 3.X), but getting them right can be a nuisance. E.g., in this case I'd suggest using Sequence instead of List, since you really just need to be able to loop on the offsets and tags (e.g., they could also be tuples, sets, or even dicts with interesting keys). I also find them to often be more clutter than useful, so consider if adding type annotations really makes this code clearer, or if something simpler, like a succinct docstring, might be more helpful.

answered 14 hours ago by scnerd (1,080 reputation; 1 silver badge, 8 bronze badges)


The offsets are still valid since I call offset_li.sort(reverse=True); I updated the output to show it.

– Dorian Turba
14 hours ago

That should either be built into the function (something like offsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))) or made crystal clear in the function's documentation. The function shouldn't fail horribly like I show because the user expects the function to behave the same regardless of the order of the offsets.

– scnerd
13 hours ago

OK, I updated the question with your improvement.

– Dorian Turba
13 hours ago

To answer the question in your answer: with insert_tag('abc', [1, 2], ['x', 'y']) you will get 'axybc'.

– Dorian Turba
13 hours ago

In the real program, each tag and its offset are linked because they live in the same object, so the definition of insert_tag will look like this: def insert_tag(tags: List[Tuple[int, str]])

– Dorian Turba
13 hours ago
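A minimal sketch of what a pair-based signature might look like, assuming each element is an (offset, tag) tuple. The name insert_tag_pairs and the text parameter are assumptions for illustration; the real program's signature is only partially shown above.

```python
from typing import List, Tuple


def insert_tag_pairs(text: str, tags: List[Tuple[int, str]]) -> str:
    """Insert each (offset, tag) pair into text."""
    # Work from the highest offset down so earlier insertions
    # do not shift the positions of later ones.
    for offset, tag in sorted(tags, key=lambda pair: pair[0], reverse=True):
        text = text[:offset] + tag + text[offset:]
    return text


print(insert_tag_pairs('abc', [(1, 'x'), (2, 'y')]))  # axbyc
```

Because offset and tag travel together, they cannot drift out of step when the list is sorted.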


Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
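The quoted recommendation can be illustrated with a small side-by-side sketch. The function names are made up for this example, and no timing claim is made for any particular interpreter.

```python
def build_with_concat(parts):
    # Repeated += may copy the growing string on every step on
    # implementations without CPython's in-place optimization.
    result = ''
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    # str.join receives all the pieces at once, so it can build
    # the result in a single pass.
    return ''.join(parts)


parts = ['a', 'x', 'b', 'y', 'c']
print(build_with_concat(parts))  # axbyc
print(build_with_join(parts))    # axbyc
```

Both produce the same string; the difference is only in how the work scales as the number of pieces grows.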

answered 14 hours ago, edited 13 hours ago, by GZ0 (466 reputation; 4 bronze badges; new contributor)


The offsets are still valid since I call offset_li.sort(reverse=True).

– Dorian Turba
14 hours ago

Sorry I overlooked that part. I have corrected my answer.

– GZ0
13 hours ago

For some measurements of time efficiency (memory efficiency ignored), refer to codereview.stackexchange.com/a/228822/25834

– Reinderien
2 hours ago


#!/usr/bin/env python3

from io import StringIO
from timeit import timeit


# Assume that 'offsets' decreases monotonically


def soln1(text: str, offsets: tuple, tags: tuple) -> str:
    # Rebuild the whole string with an f-string for every tag
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'
    return text


def soln2(text: str, offsets: tuple, tags: tuple) -> str:
    # Rebuild the whole string with + for every tag
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def gen_join(text: str, offsets: tuple, tags: tuple) -> str:
    # Append a 0 sentinel so the first chunk starts at the beginning of text
    offsets += (0,)
    return ''.join(
        text[offsets[i+1]:offsets[i]] + tag
        for i, tag in reversed(tuple(enumerate(tags)))
    ) + text[offsets[0]:]


def naive(text: str, offsets: tuple, tags: tuple) -> str:
    output = text[:offsets[-1]]
    for i in range(len(tags)-1, -1, -1):
        # For i == 0, offsets[i-1] wraps around to the smallest offset,
        # so the slice is empty and the tail is added by the return below
        output += tags[i] + text[offsets[i]:offsets[i-1]]
    return output + text[offsets[0]:]


def strio(text: str, offsets: tuple, tags: tuple) -> str:
    output = StringIO()
    output.write(text[:offsets[-1]])
    for i in range(len(tags)-1, -1, -1):
        output.write(tags[i])
        output.write(text[offsets[i]:offsets[i-1]])
    output.write(text[offsets[0]:])
    return output.getvalue()


def ranges(text: str, offsets: tuple, tags: tuple) -> str:
    # Preallocate the output and fill it by slice assignment
    final_len = len(text) + sum(len(t) for t in tags)
    output = [None]*final_len
    offsets += (0,)

    begin_text = 0
    for i in range(len(tags)-1, -1, -1):
        o1, o2 = offsets[i+1], offsets[i]
        end_text = begin_text + o2 - o1
        output[begin_text: end_text] = text[o1: o2]

        tag = tags[i]
        end_tag = end_text + len(tag)
        output[end_text: end_tag] = tag

        begin_text = end_tag

    output[begin_text:] = text[offsets[0]:]
    return ''.join(output)


def insertion(text: str, offsets: tuple, tags: tuple) -> str:
    output = []
    # End sentinel: len(text) rather than 1+len(tags), which only
    # coincides with the text length for the inputs used below
    offsets = (len(text),) + offsets
    for i in range(len(tags)):
        output.insert(0, text[offsets[i+1]: offsets[i]])
        output.insert(0, tags[i])
    output.insert(0, text[:offsets[-1]])
    return ''.join(output)


def test(fun):
    actual = fun('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A'))
    expected = 'aAbBcCdDe'
    assert expected == actual


def main():
    funs = (soln1, soln2, gen_join, naive, strio, ranges, insertion)
    for fun in funs:
        test(fun)

    N = 5_000  # number of averaging repetitions
    L = 150  # input length
    text = 'abcde' * (L//5)
    offsets = tuple(range(L-1, 0, -1))
    tags = text[-1:0:-1].upper()  # a str of one-character tags

    print(f'{"Name":10} {"time/call, us":15}')
    for fun in funs:
        def call():
            return fun(text, offsets, tags)
        dur = timeit(stmt=call, number=N)
        print(f'{fun.__name__:10} {dur/N*1e6:.1f}')


main()

This yields:

Name       time/call, us
soln1      134.2
soln2      133.2
gen_join   289.7
naive      116.8
strio      159.6
ranges     513.9
insertion  204.7

answered 2 hours ago by Reinderien (8,072 reputation; 11 silver badges, 37 bronze badges)


One problem with the tests is that they assume offsets is in descending order, which adds unnecessary overhead to all the str.join implementations here.

– GZ0
1 hour ago

@GZ0 The alternatives (probably adding a sort() somewhere, to be more robust) would likely cost more than assuming descending order. Iterating in reverse order is efficient, especially with indexes. Only one of the methods calls reversed, and it's difficult to say whether that's the limiting factor in that method's speed.

– Reinderien
1 hour ago

Two points: (1) even if you do not assume the input is in ascending order (which might slightly favor the str.join implementations), it would be fairer to explicitly call sort() within all the methods, which would incur the same cost for all of them (so you could actually assume ascending order just for the str.join-related methods); (2) the tuple call is redundant anyway, and reversed is efficient only if the input is already a sequence like a list or tuple. For an arbitrary iterable like enumerate(...), reversing it requires collecting all the elements first.

– GZ0
56 mins ago
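The point about reversed and enumerate can be checked directly. The sketch below uses the test tags from the benchmark; via_tuple and via_zip are names invented for this illustration.

```python
tags = ('D', 'C', 'B', 'A')

# enumerate() returns a one-shot iterator, which reversed() cannot consume.
try:
    reversed(enumerate(tags))
except TypeError as exc:
    print(f'TypeError: {exc}')

# Materialising it first works, at the cost of building a temporary tuple.
via_tuple = list(reversed(tuple(enumerate(tags))))

# Reversing the index range and the sequence separately avoids that copy,
# because both range objects and tuples support reversed() directly.
via_zip = list(zip(reversed(range(len(tags))), reversed(tags)))

print(via_tuple == via_zip)  # True
print(via_zip)  # [(3, 'A'), (2, 'B'), (1, 'C'), (0, 'D')]
```

Whether the temporary tuple matters for the timings above is not something this sketch measures.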

Meanwhile, the slowness of insertion is due to all the insertions happening at the beginning of the list. The implementation would not be so awkward were it not for the descending order.

– GZ0
49 mins ago
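One way to keep the front-insertion idea without paying for list.insert(0, ...), which shifts every existing element, is collections.deque, whose appendleft is constant time. This is a sketch under the benchmark's assumption of descending offsets; insertion_deque is a hypothetical name and was not part of the measured set.

```python
from collections import deque


def insertion_deque(text, offsets, tags):
    """Front-insertion variant using O(1) deque.appendleft."""
    output = deque()
    end = len(text)
    # offsets are assumed to be in descending order, as in the benchmark.
    for offset, tag in zip(offsets, tags):
        output.appendleft(text[offset:end])  # text between this tag and the next
        output.appendleft(tag)
        end = offset
    output.appendleft(text[:end])  # text before the first tag
    return ''.join(output)


print(insertion_deque('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A')))  # aAbBcCdDe
```

No timing is claimed here; at an input length of 150 the difference may well be small.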
