


Insert str into larger str in the most pythonic way









5












$begingroup$


I want to insert a small str into a larger one in the most pythonic way.



Maybe I missed a useful str method or function? The function that inserts the small str into the larger one is insert_tag.



I have tried something with reduce but it didn't work and it was too complex.



Code:



from random import choice, randrange
from typing import List


def insert_tag(text: str, offsets: List[int], tags: List[str]) -> str:
    text: str = text
    for offset, tag in zip(offsets, tags):
        # Two equivalent alternatives under review; only one should be active at a time
        text = f'{text[:offset]}{tag}{text[offset:]}'  # solution 1
        text = text[:offset] + tag + text[offset:]  # solution 2
    return text


if __name__ == '__main__':
    tag_nbr = 30
    # Text example
    base_text = (
        'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do '
        'eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut '
        'enim ad minim veniam, quis nostrud exercitation ullamco laboris '
        'nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor '
        'in reprehenderit in voluptate velit esse cillum dolore eu fugiat '
        'nulla pariatur. Excepteur sint occaecat cupidatat non proident, '
        'sunt in culpa qui officia deserunt mollit anim id est laborum.'
    ).replace(',', '').replace('.', '')
    # Random list of tags, only for testing purposes
    tag_li: List[str] = [f'<{choice(base_text.split())}>' for _ in range(tag_nbr)]
    # Ordered list of random offsets, only for testing purposes
    offset_li: List[int] = [randrange(len(base_text)) for _ in range(tag_nbr)]
    offset_li.sort(reverse=True)
    # Debug
    print(tag_li, offset_li, sep='\n')

    # Review
    new_text: str = insert_tag(base_text, offset_li, tag_li)
    # End Review

    # Debug
    print(new_text)


Output:



"Lorem ips<consectetur>u<Lorem>m dolo<Lorem>r sit amet consecte<Lorem>tur <nostrud>adipiscing eli<aute>t sed d<anim>o eiusmod tempo<voluptate>r <exercitation>incid<anim>i<non>dunt ut l<quis>abor<do>e et dol<reprehenderit>ore magna aliqua Ut enim ad minim veniam quis nostrud exercitatio<est>n ullamco laboris nisi ut aliquip ex<nulla> ea <consectetur>commodo consequat Duis aute irure dolor in repreh<deserunt>enderit in voluptate vel<sint>it esse cillum dolore eu f<qui>ugiat nulla pariatur <magna>Excepteur sin<ex>t occaecat cup<labore>i<ut>datat non proident sun<est>t<in> in cul<fugiat>pa qui officia<enim> deserunt mollit anim i<in><anim>d est laborum"


PS: The rest of the code is not my primary concern, but if it can be improved, tell me, thanks.
















$endgroup$










  • 2




    $begingroup$
    Please see What to do when someone answers. I have rolled back Rev 5 → 3
    $endgroup$
    – Sᴀᴍ Onᴇᴌᴀ
    13 hours ago






  • 2




    $begingroup$
    "Do not add an improved version of the code after receiving an answer. Including revised versions of the code makes the question confusing, especially if someone later reviews the newer code." I didn't see that, thank you.
    $endgroup$
    – Dorian Turba
    13 hours ago




















python strings






edited 13 hours ago









Sᴀᴍ Onᴇᴌᴀ

asked 17 hours ago









Dorian Turba













3 Answers


















8














$begingroup$

The core code looks fine (I'd prefer the f-string if you can guarantee Python 3.6+; otherwise use the addition or "{}{}{}".format(...) if you need to support older Python versions). However, you overwrite text on each loop iteration but continue to use the originally provided offsets, which could be very confusing since the offsets become incorrect after the first tag insertion. The code doesn't crash, but I'd argue it doesn't "work", certainly not as I'd intuitively expect:



In [1]: insert_tag('abc', [1, 2], ['x', 'y'])
Out[1]: 'axyyxbc'


I'd expect this to produce 'axbyc'. If that output looks correct to you, ignore the rest of this answer.
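To make the ordering issue concrete, here is a minimal sketch. It assumes a single insertion per loop iteration, i.e. only one of the two "solution" lines is active; insert_one_by_one is a hypothetical name used for illustration, not the function from the question. (With both lines active as posted, each tag is inserted twice, which is consistent with the 'axyyxbc' shown above.)

```python
from typing import Sequence


def insert_one_by_one(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Insert each tag at its offset, in the order given (hypothetical helper)."""
    for offset, tag in zip(offsets, tags):
        # Every insertion shifts all characters to the right of `offset`.
        text = text[:offset] + tag + text[offset:]
    return text


# Ascending offsets: the second offset no longer points at the original position.
ascending = insert_one_by_one('abc', [1, 2], ['x', 'y'])

# Descending offsets: positions to the left are untouched by each insertion.
descending = insert_one_by_one('abc', [2, 1], ['y', 'x'])

print(ascending)   # axybc
print(descending)  # axbyc
```

So the loop only gives the intuitive result when the caller passes the offsets from highest to lowest.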



It would be better to break the original string apart first, then perform the insertions. As is typical in Python, we want to avoid manual looping if possible, and as is typical for string generation, we want to create as few new strings as possible. I'd try something like the following:



In [2]: def insert_tags(text: str, offsets: [int], tags: [str]):
   ...:     offsets = [0] + list(offsets)
   ...:     chunks = [(text[prev_offset:cur_offset], tag)
   ...:               for prev_offset, cur_offset, tag in zip(offsets[:-1], offsets[1:], tags)]
   ...:     chunks.append([text[offsets[-1]:]])
   ...:     return ''.join(piece for chunk_texts in chunks for piece in chunk_texts)


In [3]: insert_tags('abc', [1, 2], ['x', 'y'])
Out[3]: 'axbyc'


Other stylistic considerations:




  • Your function supports the insertion of multiple tags, so it should probably be called "insert_tags" (plural) for clarity.


  • Type annotations are generally good (though recall that they lock you in to newer versions of Python 3.X), but getting them right can be a nuisance. E.g., in this case I'd suggest using Sequence (or the even more general Iterable) instead of List, since you really just need to be able to loop over the offsets and tags (e.g., they could also be tuples; with Iterable, even sets or dicts with interesting keys would qualify). I also find them to often be more clutter than useful, so consider whether adding type annotations really makes this code clearer, or whether something simpler, like a succinct docstring, might be more helpful.
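As a rough illustration of the last two points, the sketch below combines the plural name, a Sequence annotation, and a short docstring. The loop body is just one possible way to write it and assumes ascending offsets that refer to the original text:

```python
from typing import Sequence


def insert_tags(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Return `text` with each tag inserted at the matching offset.

    Assumption for this sketch: `offsets` is sorted in ascending order and
    every offset refers to a position in the original, unmodified `text`.
    """
    pieces = []
    previous = 0
    for offset, tag in zip(offsets, tags):
        pieces.append(text[previous:offset])  # untouched text before the tag
        pieces.append(tag)
        previous = offset
    pieces.append(text[previous:])  # remainder after the last tag
    return ''.join(pieces)


# Tuples work just as well as lists with the Sequence annotation.
result = insert_tags('abc', (1, 2), ('x', 'y'))
print(result)  # axbyc
```

Here the docstring carries the ordering requirement, which the type annotations alone cannot express.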
















$endgroup$















  • $begingroup$
    The offsets are still valid since I do offset_li.sort(reverse=True); I updated the output to show it.
    $endgroup$
    – Dorian Turba
    14 hours ago






  • 3




    $begingroup$
    That should either be built into the function (something like offsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))) or made crystal clear in the function's documentation. The function shouldn't fail horribly like I show because the user expects the function to behave the same regardless of the order of the offsets.
    $endgroup$
    – scnerd
    13 hours ago










  • $begingroup$
    Ok, I update the question with your improvement.
    $endgroup$
    – Dorian Turba
    13 hours ago










  • $begingroup$
    To answer the question in your answer : with insert_tag('abc', [1, 2], ['x', 'y']) you will get 'axybc'
    $endgroup$
    – Dorian Turba
    13 hours ago










  • $begingroup$
    The fact is that in the real program, each tag and its offset are linked because they live in the same object, so the definition of insert_tag will look like this: def insert_tag(tags: List[Tuple[int, str]])
    $endgroup$
    – Dorian Turba
    13 hours ago
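
Following up on the comments above, here is a minimal sketch of what building the sort into the function could look like when tags arrive as (offset, tag) pairs. The name insert_tag_pairs and the extra text parameter are assumptions for illustration, not the asker's real signature:

```python
from typing import Iterable, Tuple


def insert_tag_pairs(text: str, pairs: Iterable[Tuple[int, str]]) -> str:
    """Insert tags given as (offset, tag) pairs, in any order (hypothetical helper)."""
    # Sort by offset only, highest first, so that each insertion leaves the
    # positions of the not-yet-processed (smaller) offsets unchanged.
    for offset, tag in sorted(pairs, key=lambda pair: pair[0], reverse=True):
        text = text[:offset] + tag + text[offset:]
    return text


shuffled = insert_tag_pairs('abc', [(1, 'x'), (2, 'y')])
already_sorted = insert_tag_pairs('abc', [(2, 'y'), (1, 'x')])
print(shuffled, already_sorted)  # axbyc axbyc
```

With the sort inside, the result no longer depends on the order in which the caller supplies the pairs. Pairs that share the same offset are not treated specially in this sketch.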



















5














$begingroup$

Besides your approach there are a few other alternatives to solve this problem: (1) collect segments of the input text (using values in offsets) and the tags into a list, then call ''.join(<list>) at the end to generate the output; (2) split text into segments, create a format string using '{}'.join(segments), then finally apply the format string to tags to generate the output.
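
A minimal sketch of both alternatives, assuming offsets are in ascending order and there is exactly one tag per offset. The function names are made up for illustration:

```python
from typing import List, Sequence


def split_segments(text: str, offsets: Sequence[int]) -> List[str]:
    """Cut `text` at each offset; assumes `offsets` is in ascending order."""
    bounds = [0, *offsets, len(text)]
    return [text[start:end] for start, end in zip(bounds[:-1], bounds[1:])]


def insert_by_join(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Alternative (1): interleave segments and tags, then join once."""
    segments = split_segments(text, offsets)
    pieces = [segments[0]]
    for tag, segment in zip(tags, segments[1:]):
        pieces.append(tag)
        pieces.append(segment)
    return ''.join(pieces)


def insert_by_format(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Alternative (2): build a format string with one '{}' per cut point."""
    # Caveat: literal '{' or '}' in `text` would be misread as format fields.
    template = '{}'.join(split_segments(text, offsets))
    return template.format(*tags)


print(insert_by_join('abc', [1, 2], ['x', 'y']))    # axbyc
print(insert_by_format('abc', [1, 2], ['x', 'y']))  # axbyc
```

The format-string variant is shorter, but it is only safe when the text is known not to contain curly braces.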



Be aware that your approach consumes more memory and has a theoretical time complexity of $O(n^2)$ to join $n$ strings, because a new string must be allocated and copied for each intermediate concatenation. It could be more efficient in some cases for small $n$ but will be less efficient as $n$ grows. For this reason, PEP 8 recommends against relying on it:





  • Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

    For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.












$endgroup$















  • $begingroup$
    The offsets are still valid since I do offset_li.sort(reverse=True)
    $endgroup$
    – Dorian Turba
    14 hours ago






  • 1




    $begingroup$
    Sorry I overlooked that part. I have corrected my answer.
    $endgroup$
    – GZ0
    13 hours ago










  • $begingroup$
    For some measurements of time efficiency (memory efficiency ignored), refer to codereview.stackexchange.com/a/228822/25834
    $endgroup$
    – Reinderien
    2 hours ago



















0














$begingroup$

Ignoring for now the various gotchas that @scnerd describes regarding offsets, and making some broad assumptions about the monotonicity and order of the input, let's look at the performance of various implementations. Happily, your implementations are close to the most efficient of the options tested; only the "naive" method shown below is faster on my machine:



#!/usr/bin/env python3

from io import StringIO
from timeit import timeit


# Assume that 'offsets' decreases monotonically


def soln1(text: str, offsets: tuple, tags: tuple) -> str:
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'
    return text


def soln2(text: str, offsets: tuple, tags: tuple) -> str:
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def gen_join(text: str, offsets: tuple, tags: tuple) -> str:
    offsets += (0,)
    return ''.join(
        text[offsets[i+1]:offsets[i]] + tag
        for i, tag in reversed(tuple(enumerate(tags)))
    ) + text[offsets[0]:]


def naive(text: str, offsets: tuple, tags: tuple) -> str:
    output = text[:offsets[-1]]
    for i in range(len(tags)-1, -1, -1):
        # For i == 0 the slice wraps to offsets[-1] and is empty for descending offsets
        output += tags[i] + text[offsets[i]:offsets[i-1]]
    return output + text[offsets[0]:]


def strio(text: str, offsets: tuple, tags: tuple) -> str:
    output = StringIO()
    output.write(text[:offsets[-1]])
    for i in range(len(tags)-1, -1, -1):
        output.write(tags[i])
        output.write(text[offsets[i]:offsets[i-1]])
    output.write(text[offsets[0]:])
    return output.getvalue()


def ranges(text: str, offsets: tuple, tags: tuple) -> str:
    final_len = len(text) + sum(len(t) for t in tags)
    output = [None]*final_len
    offsets += (0,)

    begin_text = 0
    for i in range(len(tags)-1, -1, -1):
        o1, o2 = offsets[i+1], offsets[i]
        end_text = begin_text + o2 - o1
        output[begin_text: end_text] = text[o1: o2]

        tag = tags[i]
        end_tag = end_text + len(tag)
        output[end_text: end_tag] = tag

        begin_text = end_tag

    output[begin_text:] = text[offsets[0]:]
    return ''.join(output)


def insertion(text: str, offsets: tuple, tags: tuple) -> str:
    output = []
    # Sentinel marking the end of the text (the original used 1+len(tags),
    # which equals len(text) only for the inputs used below)
    offsets = (len(text),) + offsets
    for i in range(len(tags)):
        output.insert(0, text[offsets[i+1]: offsets[i]])
        output.insert(0, tags[i])
    output.insert(0, text[:offsets[-1]])
    return ''.join(output)


def test(fun):
    actual = fun('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A'))
    expected = 'aAbBcCdDe'
    assert expected == actual


def main():
    funs = (soln1, soln2, gen_join, naive, strio, ranges, insertion)
    for fun in funs:
        test(fun)

    N = 5_000  # number of averaging repetitions
    L = 150  # input length
    text = 'abcde' * (L//5)
    offsets = tuple(range(L-1, 0, -1))
    tags = text[-1:0:-1].upper()

    print(f'{"Name":10} {"time/call, us":15}')
    for fun in funs:
        def call():
            return fun(text, offsets, tags)
        dur = timeit(stmt=call, number=N)
        print(f'{fun.__name__:10} {dur/N*1e6:.1f}')


main()


This yields:



Name       time/call, us
soln1      134.2
soln2      133.2
gen_join   289.7
naive      116.8
strio      159.6
ranges     513.9
insertion  204.7














$endgroup$















  • $begingroup$
    One problem with the tests is that they assume that offsets is in descending order, which adds unnecessary overhead to all the str.join implementations here.
    $endgroup$
    – GZ0
    1 hour ago










  • $begingroup$
    @GZ0 The alternatives - probably adding a sort() somewhere, to be more robust - are probably going to incur more cost than assuming descending order. Iterating in reverse order is efficient, especially with indexes. Only one of the methods calls reversed, and it's difficult to say whether that's the limiting factor in speed of that method.
    $endgroup$
    – Reinderien
    1 hour ago










  • $begingroup$
    Two points: (1) even if you do not assume the input is in ascending order (which might slightly favor the str.join implementations), it would be fairer to explicitly call sort() within all the methods, which would incur the same cost for all methods (so you could actually assume ascending order just for the str.join-related methods); (2) the tuple call is redundant anyway, and reversed is efficient only if the input is already a sequence like a list or tuple. For an arbitrary iterable like enumerate(...), reversing it requires collecting all the elements first.
    $endgroup$
    – GZ0
    56 mins ago












  • $begingroup$
    Meanwhile, the slowness of insertion is due to all the insertions happening at the beginning of the list. The implementation would not be so awkward if it were not for the descending order.
    $endgroup$
    – GZ0
    49 mins ago
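
On the reversed/enumerate point raised in the comments above, the following sketch may help. It shows that reversed() cannot be applied to enumerate(...) directly, and one way to walk the indexes backwards without building an intermediate tuple. gen_join_no_copy is a hypothetical variant of gen_join, and no claim is made here about its speed:

```python
def gen_join_no_copy(text: str, offsets: tuple, tags: tuple) -> str:
    """Variant of gen_join that walks the indexes backwards instead of
    materializing enumerate(tags); assumes `offsets` is in descending order."""
    offsets += (0,)
    return ''.join(
        text[offsets[i + 1]:offsets[i]] + tags[i]
        for i in reversed(range(len(tags)))  # range objects support reversed()
    ) + text[offsets[0]:]


try:
    reversed(enumerate('ab'))  # enumerate is a one-shot iterator, not a sequence
except TypeError as error:
    print(f'reversed(enumerate(...)) fails: {error}')

print(gen_join_no_copy('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A')))  # aAbBcCdDe
```

Whether this changes the timings would have to be measured with the benchmark above.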



















































answered 14 hours ago by scnerd (1,080 reputation, 1 silver badge, 8 bronze badges)















  • The offsets are still valid since I do offset_li.sort(reverse=True); I have updated the output to show it. – Dorian Turba, 14 hours ago

  • That should either be built into the function (something like offsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))) or made crystal clear in the function's documentation. The function shouldn't fail horribly, as in my example, when the user expects it to behave the same regardless of the order of the offsets. – scnerd, 13 hours ago

  • OK, I have updated the question with your improvement. – Dorian Turba, 13 hours ago

  • To answer the question in your answer: with insert_tag('abc', [1, 2], ['x', 'y']) you will get 'axybc'. – Dorian Turba, 13 hours ago

  • In the real program, the tag and the offset are linked because they are in the same object, so the definition of insert_tag will look like this: def insert_tag(tags: List[Tuple[int, str]]) – Dorian Turba, 13 hours ago
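The two suggestions in these comments (sorting inside the function, and passing offsets and tags as linked pairs) can be combined. The following is a sketch under my own assumptions: the names insert_pairs and insert_tag are illustrative, and sorting by offset only (rather than by the whole tuple) is a choice I made so that tags sharing an offset are not reordered by tag text.

```python
from typing import Iterable, Tuple


def insert_pairs(text: str, pairs: Iterable[Tuple[int, str]]) -> str:
    """Insert each tag at its offset, whatever order the pairs arrive in."""
    # Highest offset first: an insertion never shifts the smaller offsets
    # that are still waiting to be processed.
    for offset, tag in sorted(pairs, key=lambda pair: pair[0], reverse=True):
        text = f'{text[:offset]}{tag}{text[offset:]}'
    return text


def insert_tag(text, offsets, tags):
    """Same behaviour for callers that keep offsets and tags in two lists."""
    return insert_pairs(text, zip(offsets, tags))
```

With this version, insert_tag('abc', [1, 2], ['x', 'y']) and insert_tag('abc', [2, 1], ['y', 'x']) should give the same result.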































5














$begingroup$

Besides your approach there are a few other alternatives to solve this problem: (1) collect segments of the input text (using values in offsets) and the tags into a list, then call ''.join(<list>) at the end to generate the output; (2) split text into segments, create a format string using '{}'.join(segments), then finally apply the format string to tags to generate the output.
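Both alternatives can be sketched roughly as follows. The function names are hypothetical, the offsets are assumed to be ascending, and the brace escaping in the second variant is an extra precaution of mine, since str.format would otherwise treat any literal { or } in the text as a placeholder.

```python
def split_segments(text, offsets):
    """Cut text at the ascending offsets; returns len(offsets) + 1 segments."""
    bounds = [0, *offsets, len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


def insert_by_join(text, offsets, tags):
    # Alternative (1): interleave segments and tags in one list, join once.
    segments = split_segments(text, offsets)
    pieces = [segments[0]]
    for tag, segment in zip(tags, segments[1:]):
        pieces.append(tag)
        pieces.append(segment)
    return ''.join(pieces)


def insert_by_format(text, offsets, tags):
    # Alternative (2): build a format string with one '{}' slot per tag.
    # Literal braces in the text are doubled so that format() keeps them.
    segments = [
        segment.replace('{', '{{').replace('}', '}}')
        for segment in split_segments(text, offsets)
    ]
    return '{}'.join(segments).format(*tags)
```

Both variants build the result in a single pass over the segments instead of re-slicing the growing string for every tag.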



Be aware that your approach consumes more memory and has a theoretical time complexity of $O(n^2)$ to join $n$ strings, because each intermediate concatenation allocates a new string and copies the accumulated text into it. It can be more efficient in some cases for small $n$ but will be less efficient as $n$ grows. For this reason, PEP 8 recommends against it:





  • Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco,
    and such).


For example, do not rely on CPython's efficient implementation of
in-place string concatenation for statements in the form a += b or a =
a + b. This optimization is fragile even in CPython (it only works for
some types) and isn't present at all in implementations that don't use
refcounting. In performance sensitive parts of the library, the
''.join() form should be used instead. This will ensure that
concatenation occurs in linear time across various implementations.
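To see how the two forms compare on a particular interpreter, a small timing harness along these lines can help. It is only a sketch: the function names, the input size and the repetition count are arbitrary choices of mine, and on CPython the difference may be small precisely because of the optimization the quote mentions.

```python
from timeit import timeit


def concat(parts):
    out = ''
    for part in parts:
        out += part  # may copy the accumulated string on every iteration
    return out


def join_once(parts):
    return ''.join(parts)  # a single pass over all the parts


parts = ['ab'] * 10_000
assert concat(parts) == join_once(parts)

for fun in (concat, join_once):
    seconds = timeit(lambda: fun(parts), number=20)
    print(f'{fun.__name__:10} {seconds:.4f} s for 20 calls')
```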












$endgroup$























answered 14 hours ago, edited 13 hours ago, by GZ0 (466 reputation, 4 bronze badges; new contributor)

















  • The offsets are still valid since I do offset_li.sort(reverse=True). – Dorian Turba, 14 hours ago

  • Sorry, I overlooked that part. I have corrected my answer. – GZ0, 13 hours ago

  • For some measurements of time efficiency (memory efficiency ignored), refer to codereview.stackexchange.com/a/228822/25834 – Reinderien, 2 hours ago





























0














$begingroup$

Ignoring for now the various gotchas that @scnerd describes about offsets, and making some broad assumptions about the monotonicity and order of the input, let's look at the performance of various implementations. Happily, your implementations are close to the most efficient given these options; only the shown "naive" method is faster on my machine:



#!/usr/bin/env python3

from io import StringIO
from timeit import timeit


# Assume that 'offsets' decreases monotonically


def soln1(text: str, offsets: tuple, tags: tuple) -> str:
    # f-string version of the original approach
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'
    return text


def soln2(text: str, offsets: tuple, tags: tuple) -> str:
    # concatenation version of the original approach
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def gen_join(text: str, offsets: tuple, tags: tuple) -> str:
    offsets += (0,)  # sentinel: start of text
    return ''.join(
        text[offsets[i+1]:offsets[i]] + tag
        for i, tag in reversed(tuple(enumerate(tags)))
    ) + text[offsets[0]:]


def naive(text: str, offsets: tuple, tags: tuple) -> str:
    output = text[:offsets[-1]]
    for i in range(len(tags) - 1, -1, -1):
        # At i == 0 the slice runs from offsets[0] to offsets[-1] and is empty
        output += tags[i] + text[offsets[i]:offsets[i-1]]
    return output + text[offsets[0]:]


def strio(text: str, offsets: tuple, tags: tuple) -> str:
    output = StringIO()
    output.write(text[:offsets[-1]])
    for i in range(len(tags) - 1, -1, -1):
        output.write(tags[i])
        output.write(text[offsets[i]:offsets[i-1]])
    output.write(text[offsets[0]:])
    return output.getvalue()


def ranges(text: str, offsets: tuple, tags: tuple) -> str:
    # Preallocate the output and fill it by slice assignment
    final_len = len(text) + sum(len(t) for t in tags)
    output = [None]*final_len
    offsets += (0,)

    begin_text = 0
    for i in range(len(tags) - 1, -1, -1):
        o1, o2 = offsets[i+1], offsets[i]
        end_text = begin_text + o2 - o1
        output[begin_text: end_text] = text[o1: o2]

        tag = tags[i]
        end_tag = end_text + len(tag)
        output[end_text: end_tag] = tag

        begin_text = end_tag

    output[begin_text:] = text[offsets[0]:]
    return ''.join(output)


def insertion(text: str, offsets: tuple, tags: tuple) -> str:
    output = []
    offsets = (len(text),) + offsets  # sentinel: end of text
    for i in range(len(tags)):
        output.insert(0, text[offsets[i+1]: offsets[i]])
        output.insert(0, tags[i])
    output.insert(0, text[:offsets[-1]])
    return ''.join(output)


def test(fun):
    actual = fun('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A'))
    expected = 'aAbBcCdDe'
    assert expected == actual


def main():
    funs = (soln1, soln2, gen_join, naive, strio, ranges, insertion)
    for fun in funs:
        test(fun)

    N = 5_000  # number of averaging repetitions
    L = 150  # input length
    text = 'abcde' * (L//5)
    offsets = tuple(range(L-1, 0, -1))
    tags = text[-1:0:-1].upper()  # a str of one-character tags, indexed like a tuple

    print(f'{"Name":10} {"time/call, us":15}')
    for fun in funs:
        def call():
            return fun(text, offsets, tags)
        dur = timeit(stmt=call, number=N)
        print(f'{fun.__name__:10} {dur/N*1e6:.1f}')


main()


This yields:



Name       time/call, us
soln1      134.2
soln2      133.2
gen_join   289.7
naive      116.8
strio      159.6
ranges     513.9
insertion  204.7














$endgroup$















  • One problem with the tests is that they assume offsets is in descending order, which adds unnecessary overhead to all the str.join implementations here. – GZ0, 1 hour ago

  • @GZ0 The alternatives (probably adding a sort() somewhere, to be more robust) are probably going to incur more cost than assuming descending order. Iterating in reverse order is efficient, especially with indexes. Only one of the methods calls reversed, and it's difficult to say whether that's the limiting factor in the speed of that method. – Reinderien, 1 hour ago

  • Two points: (1) even if you do not assume the input is in ascending order (which might slightly favor the str.join implementations), it would be fairer to explicitly call sort() within all the methods, which would incur the same cost for all methods (so you could actually assume ascending order just for the str.join-related methods); (2) the tuple call is redundant anyway, and reversed is efficient only if the input is already a sequence like a list or tuple. For an arbitrary iterable like enumerate(...), reversing it requires collecting all the elements first. – GZ0, 56 mins ago

  • Meanwhile, the slowness of insertion is due to all the insertions happening at the beginning of the list. The implementation would not be so awkward were it not for the descending order. – GZ0, 49 mins ago
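For comparison, a str.join variant that sorts inside the function and then works in ascending order, as these comments suggest, might look like the sketch below. The name join_sorted is hypothetical, and this variant was not part of the timings above, so no speed claim is made for it.

```python
def join_sorted(text, offsets, tags):
    """Insert tags at offsets given in any order, joining the pieces once."""
    # Sort by offset only; the sort is stable, so tags that share an
    # offset keep their input order.
    pairs = sorted(zip(offsets, tags), key=lambda pair: pair[0])
    pieces = []
    prev = 0
    for offset, tag in pairs:
        pieces.append(text[prev:offset])  # text between two insertion points
        pieces.append(tag)
        prev = offset
    pieces.append(text[prev:])  # whatever follows the last insertion point
    return ''.join(pieces)
```

Every piece is appended at the end of the list, which avoids the repeated insert(0, ...) calls that the last comment identifies as the cost of the insertion method.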

























$begingroup$

Ignoring for now the various gotchas that @scnerd describes about offsets, and making some broad assumptions about the monotonicity and order of the input, let's look at the performance of various implementations. Happily, your implementations are close to the most efficient given these options; only the shown "naive" method is faster on my machine:



#!/usr/bin/env python3

from io import StringIO
from timeit import timeit


# Assume that 'offsets' decreases monotonically


def soln1(text: str, offsets: tuple, tags: tuple) -> str:
    # Insert from the highest offset down so earlier offsets stay valid
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'
    return text


def soln2(text: str, offsets: tuple, tags: tuple) -> str:
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def gen_join(text: str, offsets: tuple, tags: tuple) -> str:
    # Append 0 as the lower bound of the first slice
    offsets += (0,)
    return ''.join(
        text[offsets[i+1]:offsets[i]] + tag
        for i, tag in reversed(tuple(enumerate(tags)))
    ) + text[offsets[0]:]


def naive(text: str, offsets: tuple, tags: tuple) -> str:
    output = text[:offsets[-1]]
    for i in range(len(tags)-1, -1, -1):
        # For i == 0, offsets[i-1] wraps to offsets[-1], giving an empty slice
        output += tags[i] + text[offsets[i]:offsets[i-1]]
    return output + text[offsets[0]:]


def strio(text: str, offsets: tuple, tags: tuple) -> str:
    output = StringIO()
    output.write(text[:offsets[-1]])
    for i in range(len(tags)-1, -1, -1):
        output.write(tags[i])
        output.write(text[offsets[i]:offsets[i-1]])
    output.write(text[offsets[0]:])
    return output.getvalue()


def ranges(text: str, offsets: tuple, tags: tuple) -> str:
    # Preallocate one list slot per output character
    final_len = len(text) + sum(len(t) for t in tags)
    output = [None]*final_len
    offsets += (0,)

    begin_text = 0
    for i in range(len(tags)-1, -1, -1):
        o1, o2 = offsets[i+1], offsets[i]
        end_text = begin_text + o2 - o1
        output[begin_text: end_text] = text[o1: o2]

        tag = tags[i]
        end_tag = end_text + len(tag)
        output[end_text: end_tag] = tag

        begin_text = end_tag

    output[begin_text:] = text[offsets[0]:]
    return ''.join(output)


def insertion(text: str, offsets: tuple, tags: tuple) -> str:
    output = []
    # Prepend len(text) as the upper bound of the first slice
    offsets = (len(text),) + offsets
    for i in range(len(tags)):
        output.insert(0, text[offsets[i+1]: offsets[i]])
        output.insert(0, tags[i])
    output.insert(0, text[:offsets[-1]])
    return ''.join(output)


def test(fun):
    actual = fun('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A'))
    expected = 'aAbBcCdDe'
    assert expected == actual


def main():
    funs = (soln1, soln2, gen_join, naive, strio, ranges, insertion)
    for fun in funs:
        test(fun)

    N = 5_000  # number of averaging repetitions
    L = 150    # input length
    text = 'abcde' * (L//5)
    offsets = tuple(range(L-1, 0, -1))
    # A str rather than a tuple; indexing and zip yield one-character tags
    tags = text[-1:0:-1].upper()

    print(f'{"Name":10} {"time/call, us":15}')
    for fun in funs:
        def call():
            return fun(text, offsets, tags)
        dur = timeit(stmt=call, number=N)
        print(f'{fun.__name__:10} {dur/N*1e6:.1f}')


main()


This yields:



Name       time/call, us
soln1      134.2
soln2      133.2
gen_join   289.7
naive      116.8
strio      159.6
ranges     513.9
insertion  204.7
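All of the functions benchmarked here rely on the offsets being in descending order. As a minimal sketch of how that assumption could be lifted, the hypothetical helper below (`insert_tags` is an illustrative name, not one of the benchmarked functions) sorts the offset/tag pairs first and then builds the result with a single `str.join`. It has not been timed, and its handling of repeated offsets (tags keep their input order) is an assumption rather than something the question specifies.

```python
def insert_tags(text: str, offsets, tags) -> str:
    """Insert each tag before text[offset]; offsets may be in any order."""
    # sorted() is stable, so tags that share an offset keep their input order.
    pairs = sorted(zip(offsets, tags), key=lambda pair: pair[0])
    pieces = []
    prev = 0
    for offset, tag in pairs:
        pieces.append(text[prev:offset])  # untouched text before this tag
        pieces.append(tag)
        prev = offset
    pieces.append(text[prev:])  # remainder after the last tag
    return ''.join(pieces)


print(insert_tags('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A')))  # aAbBcCdDe
```

The sort costs O(n log n) on top of the join, so for input that is already ordered this would likely be slower than the variants that skip it.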














$endgroup$













answered 2 hours ago









Reinderien

8,072 reputation, 11 silver badges, 37 bronze badges















  • $begingroup$
    One problem with the tests is that they assume offsets is in descending order, which adds unnecessary overhead to all the str.join implementations here.
    $endgroup$
    – GZ0
    1 hour ago










  • $begingroup$
    @GZ0 The alternatives - probably adding a sort() somewhere, to be more robust - are probably going to incur more cost than assuming descending order. Iterating in reverse order is efficient, especially with indexes. Only one of the methods calls reversed, and it's difficult to say whether that's the limiting factor in speed of that method.
    $endgroup$
    – Reinderien
    1 hour ago










  • $begingroup$
    Two points: (1) even if you do not assume the input is in ascending order (which might slightly favor the str.join implementations), it would be fairer to explicitly call sort() within all the methods, which would incur the same cost for all of them (so you could actually assume ascending order just for the str.join-related methods); (2) the tuple call is redundant anyway, and reversed is efficient only if the input is already a sequence such as a list or tuple. For an arbitrary iterable like enumerate(...), reversing it requires collecting all the elements first.
    $endgroup$
    – GZ0
    56 mins ago
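To illustrate point (2), here is a small sketch of how `reversed` behaves on a sequence, on a `range`, and on an `enumerate` object. The variable names are illustrative only; the last expression is one possible way to iterate indexes in reverse without building an intermediate tuple, not necessarily what the commenter had in mind.

```python
tags = ('D', 'C', 'B', 'A')

# reversed() works lazily on sequences (they define __len__ and __getitem__)
# and on objects that define __reversed__, such as range.
print(list(reversed(tags)))              # ['A', 'B', 'C', 'D']
print(list(reversed(range(len(tags)))))  # [3, 2, 1, 0]

# enumerate() returns an iterator, not a sequence, so reversed() rejects it.
try:
    reversed(enumerate(tags))
except TypeError as exc:
    print(f'TypeError: {exc}')

# Index-based reverse iteration without materializing an intermediate tuple.
pairs = [(i, tags[i]) for i in reversed(range(len(tags)))]
print(pairs)  # [(3, 'A'), (2, 'B'), (1, 'C'), (0, 'D')]
```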












  • $begingroup$
    Meanwhile, the slowness of insertion is due to all the insertions happening at the beginning of the list. The implementation would not be so awkward were it not for the descending order.
    $endgroup$
    – GZ0
    49 mins ago
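One way to avoid inserting at the front of the list, sketched under the same descending-offsets assumption as the benchmark: append the pieces in the order they are encountered and reverse the list once at the end. `insertion_appended` is a hypothetical name and the function has not been benchmarked, so whether it actually closes the gap would need measuring.

```python
def insertion_appended(text: str, offsets: tuple, tags: tuple) -> str:
    """Same contract as the benchmark functions: offsets must be decreasing."""
    pieces = []
    end = len(text)
    for offset, tag in zip(offsets, tags):
        pieces.append(text[offset:end])  # text that follows this tag
        pieces.append(tag)
        end = offset
    pieces.append(text[:end])  # text before the lowest offset
    pieces.reverse()           # one reversal instead of repeated insert(0, ...)
    return ''.join(pieces)


print(insertion_appended('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A')))  # aAbBcCdDe
```

Each `list.insert(0, ...)` shifts every existing element, whereas `append` is amortized constant time and `reverse` is a single linear pass.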
































