Insert str into larger str in the most pythonic way
I want to insert a small str into a larger one in the most Pythonic way.
Maybe I missed a useful str method or function? The function that inserts the small str into the larger one is insert_tag.
I tried something with reduce, but it didn't work and it was too complex.
Code:
from random import choice, randrange
from typing import List


def insert_tag(text: str, offsets: List[int], tags: List[str]) -> str:
    text: str = text
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'  # solution 1
        text = text[:offset] + tag + text[offset:]  # solution 2
    return text


if __name__ == '__main__':
    tag_nbr = 30
    # Text example
    base_text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do ' \
                'eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut ' \
                'enim ad minim veniam, quis nostrud exercitation ullamco laboris ' \
                'nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor ' \
                'in reprehenderit in voluptate velit esse cillum dolore eu fugiat ' \
                'nulla pariatur. Excepteur sint occaecat cupidatat non proident, ' \
                'sunt in culpa qui officia deserunt mollit anim id est laborum.' \
        .replace(',', '').replace('.', '')
    # Random list of tags, only for testing purposes
    tag_li: List[str] = [f'<{choice(base_text.split())}>' for _ in range(tag_nbr)]
    # Ordered list of random offsets, only for testing purposes
    offset_li: List[int] = [randrange(len(base_text)) for _ in range(tag_nbr)]
    offset_li.sort(reverse=True)
    # Debug
    print(tag_li, offset_li, sep='\n')
    # Review
    new_text: str = insert_tag(base_text, offset_li, tag_li)
    # End Review
    # Debug
    print(new_text)

Output:
"Lorem ips<consectetur>u<Lorem>m dolo<Lorem>r sit amet consecte<Lorem>tur <nostrud>adipiscing eli<aute>t sed d<anim>o eiusmod tempo<voluptate>r <exercitation>incid<anim>i<non>dunt ut l<quis>abor<do>e et dol<reprehenderit>ore magna aliqua Ut enim ad minim veniam quis nostrud exercitatio<est>n ullamco laboris nisi ut aliquip ex<nulla> ea <consectetur>commodo consequat Duis aute irure dolor in repreh<deserunt>enderit in voluptate vel<sint>it esse cillum dolore eu f<qui>ugiat nulla pariatur <magna>Excepteur sin<ex>t occaecat cup<labore>i<ut>datat non proident sun<est>t<in> in cul<fugiat>pa qui officia<enim> deserunt mollit anim i<in><anim>d est laborum"
PS: The rest of the code is not my primary concern, but if it can be improved, please tell me. Thanks.
python strings
Please see What to do when someone answers. I have rolled back Rev 5 → 3.
– Sᴀᴍ Onᴇᴌᴀ, 13 hours ago
"Do not add an improved version of the code after receiving an answer. Including revised versions of the code makes the question confusing, especially if someone later reviews the newer code." I didn't see that, thank you.
– Dorian Turba, 13 hours ago
edited 13 hours ago by Sᴀᴍ Onᴇᴌᴀ (13.5k reputation; 6 gold, 25 silver, 89 bronze badges)
asked 17 hours ago by Dorian Turba (new contributor; 126 reputation; 6 bronze badges)
3 Answers
The core code looks fine (I'd prefer the f-string if you can guarantee Python 3.6+, else the addition or "{}{}{}".format(...) if you need to support older Python versions). However, you overwrite text on each loop iteration but continue to use the originally provided offsets, which could be very confusing, since the offsets become incorrect after the first tag insertion. The code doesn't crash, but I'd argue it doesn't "work", certainly not as I'd intuitively expect:
In [1]: insert_tag('abc', [1, 2], ['x', 'y'])
Out[1]: 'axyyxbc'
I'd expect this to produce 'axbyc'. If that output looks correct to you, ignore the rest of this answer.
It would be better to break the original string apart first, then perform the insertions. As is typical in Python, we want to avoid manual looping if possible, and as is typical for string generation, we want to create as few new strings as possible. I'd try something like the following:
In [2]: def insert_tags(text: str, offsets: [int], tags: [str]):
   ...:     offsets = [0] + list(offsets)
   ...:     chunks = [(text[prev_offset:cur_offset], tag) for prev_offset, cur_offset, tag in zip(offsets[:-1], offsets[1:], tags)]
   ...:     chunks.append([text[offsets[-1]:]])
   ...:     return ''.join(piece for chunk_texts in chunks for piece in chunk_texts)
In [3]: insert_tags('abc', [1, 2], ['x', 'y'])
Out[3]: 'axbyc'
Other stylistic considerations:
- Your function supports the insertion of multiple tags, so it should probably be called "insert_tags" (plural) for clarity.
- Type annotations are generally good (though recall that they lock you in to newer versions of Python 3.x), but getting them right can be a nuisance. E.g., in this case I'd suggest using Sequence instead of List, since you really just need to be able to loop over the offsets and tags (e.g., they could also be tuples, sets, or even dicts with interesting keys). I also often find them to be more clutter than useful, so consider whether adding type annotations really makes this code clearer, or whether something simpler, like a succinct docstring, might be more helpful.
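To make the ordering issue concrete, here is a minimal sketch of the question's loop under the assumption that only one of the two "solution" lines is meant to be active (the name insert_tag_single is hypothetical, not from the original post). It shows that the result depends on whether the offsets arrive in ascending or descending order:

```python
from typing import Sequence


def insert_tag_single(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """The question's loop, keeping only the concatenation line (an assumption)."""
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


# Ascending offsets: the second offset is applied to the already modified string.
ascending = insert_tag_single('abc', [1, 2], ['x', 'y'])

# Descending offsets: each insertion leaves the earlier positions untouched.
descending = insert_tag_single('abc', [2, 1], ['y', 'x'])

print(ascending)   # axybc
print(descending)  # axbyc
```

With descending offsets the loop gives the intuitive result, which is why the caller-side sort(reverse=True) in the question matters.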
The offsets are still valid since I do offset_li.sort(reverse=True); I have updated the output to show it.
– Dorian Turba, 14 hours ago
That should either be built into the function (something like offsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))) or made crystal clear in the function's documentation. The function shouldn't fail horribly like I show, because the user expects the function to behave the same regardless of the order of the offsets.
– scnerd, 13 hours ago
OK, I have updated the question with your improvement.
– Dorian Turba, 13 hours ago
To answer the question in your answer: with insert_tag('abc', [1, 2], ['x', 'y']) you will get 'axybc'.
– Dorian Turba, 13 hours ago
The fact is that in the real program, tag and offset are linked together because they are in the same object, so the definition of insert_tag will look like this: def insert_tag(tags: List[Tuple[int, str]])
– Dorian Turba, 13 hours ago
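Pulling this comment thread together, one possible shape for the function is sketched below. It assumes the (offset, tag) pairs mentioned in the last comment, sorts them inside the function so the caller's ordering no longer matters, and joins the pieces once. The name insert_tags_from_pairs and the explicit text parameter are illustrative assumptions, not the asker's real signature:

```python
from typing import Iterable, List, Tuple


def insert_tags_from_pairs(text: str, pairs: Iterable[Tuple[int, str]]) -> str:
    """Insert each tag at its offset in the original text, in any input order."""
    pieces: List[str] = []
    previous = 0
    # Sort by offset only, so tags sharing an offset keep their input order.
    for offset, tag in sorted(pairs, key=lambda pair: pair[0]):
        pieces.append(text[previous:offset])
        pieces.append(tag)
        previous = offset
    pieces.append(text[previous:])
    return ''.join(pieces)


print(insert_tags_from_pairs('abc', [(2, 'y'), (1, 'x')]))  # axbyc
print(insert_tags_from_pairs('abc', [(1, 'x'), (2, 'y')]))  # axbyc
```

Sorting costs O(n log n), which is a trade-off against requiring pre-sorted input; whether that is acceptable depends on how the real program produces its pairs.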
Besides your approach, there are a few other alternatives to solve this problem: (1) collect segments of the input text (using values in offsets) and the tags into a list, then call ''.join(<list>) at the end to generate the output; (2) split text into segments, create a format string using '{}'.join(segments), then finally apply the format string to tags to generate the output.
Be aware that your approach consumes more memory and has a theoretical time complexity of $O(n^2)$ to join $n$ strings, because memory reallocation is needed for each intermediate concatenated string. It could be more efficient in some cases for small $n$ but will be less efficient as $n$ becomes larger. Therefore, this is not recommended by PEP 8:
- Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
  For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
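The two alternatives listed at the start of this answer could be sketched roughly as follows. This is only an illustration: the helper names (split_at, insert_by_join, insert_by_format) are hypothetical, and the offsets are assumed to be in ascending order and to refer to positions in the original text.

```python
from typing import List, Sequence


def split_at(text: str, offsets: Sequence[int]) -> List[str]:
    """Cut text at each offset; offsets are assumed to be in ascending order."""
    bounds = [0, *offsets, len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


def insert_by_join(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Alternative (1): interleave segments and tags in a list, then join once."""
    segments = split_at(text, offsets)
    pieces = [segments[0]]
    for tag, segment in zip(tags, segments[1:]):
        pieces.append(tag)
        pieces.append(segment)
    return ''.join(pieces)


def insert_by_format(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Alternative (2): build a format string from the segments, then fill in the tags."""
    # Literal braces in the text must be doubled so str.format leaves them alone.
    segments = [s.replace('{', '{{').replace('}', '}}') for s in split_at(text, offsets)]
    return '{}'.join(segments).format(*tags)


print(insert_by_join('abc', [1, 2], ['x', 'y']))      # axbyc
print(insert_by_format('a{b}c', [1, 4], ['x', 'y']))  # ax{b}yc
```

The brace escaping in the format-string variant is an addition of this sketch; without it, any "{" or "}" in the input text would be interpreted as a replacement field.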
The offsets are still valid since I do offset_li.sort(reverse=True).
– Dorian Turba, 14 hours ago
Sorry, I overlooked that part. I have corrected my answer.
– GZ0, 13 hours ago
For some measurements of time efficiency (memory efficiency ignored), refer to codereview.stackexchange.com/a/228822/25834
– Reinderien, 2 hours ago
Ignoring for now the various gotchas that @scnerd describes about offsets, and making some broad assumptions about the monotonicity and order of the input, let's look at the performance of various implementations. Happily, your implementations are close to the most efficient given these options; only the "naive" method shown below is faster on my machine:
#!/usr/bin/env python3

from io import StringIO
from timeit import timeit


# Assume that 'offsets' decreases monotonically

def soln1(text: str, offsets: tuple, tags: tuple) -> str:
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'
    return text


def soln2(text: str, offsets: tuple, tags: tuple) -> str:
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def gen_join(text: str, offsets: tuple, tags: tuple) -> str:
    offsets += (0,)
    return ''.join(
        text[offsets[i+1]:offsets[i]] + tag
        for i, tag in reversed(tuple(enumerate(tags)))
    ) + text[offsets[0]:]


def naive(text: str, offsets: tuple, tags: tuple) -> str:
    output = text[:offsets[-1]]
    for i in range(len(tags)-1, -1, -1):
        output += tags[i] + text[offsets[i]:offsets[i-1]]
    return output + text[offsets[0]:]


def strio(text: str, offsets: tuple, tags: tuple) -> str:
    output = StringIO()
    output.write(text[:offsets[-1]])
    for i in range(len(tags)-1, -1, -1):
        output.write(tags[i])
        output.write(text[offsets[i]:offsets[i-1]])
    output.write(text[offsets[0]:])
    return output.getvalue()


def ranges(text: str, offsets: tuple, tags: tuple) -> str:
    final_len = len(text) + sum(len(t) for t in tags)
    output = [None]*final_len
    offsets += (0,)

    begin_text = 0
    for i in range(len(tags)-1, -1, -1):
        o1, o2 = offsets[i+1], offsets[i]
        end_text = begin_text + o2 - o1
        output[begin_text: end_text] = text[o1: o2]

        tag = tags[i]
        end_tag = end_text + len(tag)
        output[end_text: end_tag] = tag

        begin_text = end_tag

    output[begin_text:] = text[offsets[0]:]
    return ''.join(output)


def insertion(text: str, offsets: tuple, tags: tuple) -> str:
    output = []
    # Sentinel end offset: the end of the text (was 1+len(tags), which only
    # equals len(text) for the inputs used in this script)
    offsets = (len(text),) + offsets
    for i in range(len(tags)):
        output.insert(0, text[offsets[i+1]: offsets[i]])
        output.insert(0, tags[i])
    output.insert(0, text[:offsets[-1]])
    return ''.join(output)


def test(fun):
    actual = fun('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A'))
    expected = 'aAbBcCdDe'
    assert expected == actual


def main():
    funs = (soln1, soln2, gen_join, naive, strio, ranges, insertion)
    for fun in funs:
        test(fun)

    N = 5_000  # number of averaging repetitions
    L = 150    # input length
    text = 'abcde' * (L//5)
    offsets = tuple(range(L-1, 0, -1))
    tags = text[-1:0:-1].upper()

    print(f'{"Name":10} {"time/call, us":15}')
    for fun in funs:
        def call():
            return fun(text, offsets, tags)
        dur = timeit(stmt=call, number=N)
        print(f'{fun.__name__:10} {dur/N*1e6:.1f}')


main()

This yields:
Name       time/call, us
soln1      134.2
soln2      133.2
gen_join   289.7
naive      116.8
strio      159.6
ranges     513.9
insertion  204.7
One problem with the tests is that they assume that offsets is in descending order, which adds unnecessary overhead to all the str.join implementations here.
– GZ0, 1 hour ago
@GZ0 The alternatives (probably adding a sort() somewhere, to be more robust) are probably going to incur more cost than assuming descending order. Iterating in reverse order is efficient, especially with indexes. Only one of the methods calls reversed, and it's difficult to say whether that's the limiting factor in the speed of that method.
– Reinderien, 1 hour ago
Two points: (1) even if you do not assume the input is in ascending order (which might slightly favor the str.join implementations), it would be fairer to explicitly call sort() within all the methods, which would incur the same cost for all methods (so you could actually assume ascending order just for the str.join-related methods); (2) the tuple call is redundant anyway, and reversed is efficient only if the input is already a sequence like a list or tuple. For an arbitrary iterable like enumerate(...), reversing it requires collecting all the elements first.
– GZ0, 56 mins ago
Meanwhile, the slowness of insertion is due to all the insertions happening at the beginning of the list. The implementation would not be so awkward if it were not for the descending order.
– GZ0, 49 mins ago
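Two of the points raised in these comments can be illustrated with a small sketch. It is not part of the benchmark above and makes no timing claims: insertion_deque is a hypothetical variant of insertion that prepends to a collections.deque (whose appendleft is a constant-time operation) instead of calling list.insert(0, ...), still assuming descending offsets. The second part shows that reversed() rejects an enumerate object outright, so some materialisation step, such as the tuple(...) call in gen_join, is needed before reversing it.

```python
from collections import deque
from typing import Sequence


def insertion_deque(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Like the benchmark's `insertion`, but prepending to a deque.

    Offsets are assumed to be in descending order, as in the benchmark.
    """
    output = deque()
    end = len(text)
    for offset, tag in zip(offsets, tags):
        output.appendleft(text[offset:end])  # text that follows this tag
        output.appendleft(tag)
        end = offset
    output.appendleft(text[:end])  # text before the first tag
    return ''.join(output)


# reversed() needs a sequence (or an object defining __reversed__);
# an enumerate object is a plain iterator, so it cannot be reversed directly.
try:
    reversed(enumerate('ab'))
    reversed_enumerate_ok = True
except TypeError:
    reversed_enumerate_ok = False

print(insertion_deque('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A')))  # aAbBcCdDe
print(reversed_enumerate_ok)  # False
```

Whether the deque variant is actually faster for inputs of this size would need to be measured with the same timeit harness.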
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Dorian Turba is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f227778%2finsert-str-into-larger-str-in-the-most-pythonic-way%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
The core code (I'd prefer the f-string if you can guarantee Python 3.6+, else the addition or "{}{}{}".format(...)
if you need to support older Python versions) looks fine. However, you overwrite text
each loop, but continue to use the originally provided offset
's, which could be very confusing since the offsets will become incorrect after the first tag insertion. The code doesn't crash, but I'd argue it doesn't "work", certainly not as I'd intuitively expect:
In [1]: insert_tag('abc', [1, 2], ['x', 'y'])
Out[1]: 'axyyxbc'
I'd expect this to produce 'axbyc'
. If that output looks correct to you, ignore the rest of this answer.
It would be better to break the original string apart first, then perform the insertions. As is typical in Python, we want to avoid manual looping if possible, and as is typical for string generation, we want to create as few new strings as possible. I'd try something like the following:
In [2]: def insert_tags(text: str, offsets: [int], tags: [str]):
...: offsets = [0] + list(offsets)
...: chunks = [(text[prev_offset:cur_offset], tag) for prev_offset, cur_offset, tag in zip(offsets[:-1], offset
...: s[1:], tags)]
...: chunks.append([text[offsets[-1]:]])
...: return ''.join(piece for chunk_texts in chunks for piece in chunk_texts)
In [3]: insert_tags('abc', [1, 2], ['x', 'y'])
Out[3]: 'axbyc'
Other stylistic considerations:
Your function supports the insertion of multiple tags, so it should probably be called "insert_tags" (plural) for clarity.
Type annotation are generally good (though recall that they lock you in to newer versions of Python 3.X), but getting them right can be a nuisance. E.g., in this case I'd suggest using
Sequence
instead ofList
, since you really just need to be able to loop on the offsets and tags (e.g., they could also be tuples, sets, or even dicts with interesting keys). I also find them to often be more clutter than useful, so consider if adding type annotations really makes this code clearer, or if something simpler, like a succinct docstring, might be more helpful.
$endgroup$
$begingroup$
The offset still valid since I do offset_li.sort(reverse=True), I update the output to show it.
$endgroup$
– Dorian Turba
14 hours ago
3
$begingroup$
That should either be built into the function (something likeoffsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))
) or made crystal clear in the function's documentation. The function shouldn't fail horribly like I show because the user expects the function to behave the same regardless of the order of the offsets.
$endgroup$
– scnerd
13 hours ago
$begingroup$
Ok, I update the question with your improvement.
$endgroup$
– Dorian Turba
13 hours ago
$begingroup$
To answer the question in your answer : withinsert_tag('abc', [1, 2], ['x', 'y'])
you will get'axybc'
$endgroup$
– Dorian Turba
13 hours ago
$begingroup$
The fact is that in the real program, tag and offset is linked together because they are in the same object, so def ofinsert_tag
will look like this :def insert_tag(tags: List[Tuple(int, str)]
$endgroup$
– Dorian Turba
13 hours ago
|
show 3 more comments
$begingroup$
The core code (I'd prefer the f-string if you can guarantee Python 3.6+, else the addition or "{}{}{}".format(...)
if you need to support older Python versions) looks fine. However, you overwrite text
each loop, but continue to use the originally provided offset
's, which could be very confusing since the offsets will become incorrect after the first tag insertion. The code doesn't crash, but I'd argue it doesn't "work", certainly not as I'd intuitively expect:
In [1]: insert_tag('abc', [1, 2], ['x', 'y'])
Out[1]: 'axyyxbc'
I'd expect this to produce 'axbyc'
. If that output looks correct to you, ignore the rest of this answer.
It would be better to break the original string apart first, then perform the insertions. As is typical in Python, we want to avoid manual looping if possible, and as is typical for string generation, we want to create as few new strings as possible. I'd try something like the following:
In [2]: def insert_tags(text: str, offsets: [int], tags: [str]):
...: offsets = [0] + list(offsets)
...: chunks = [(text[prev_offset:cur_offset], tag) for prev_offset, cur_offset, tag in zip(offsets[:-1], offset
...: s[1:], tags)]
...: chunks.append([text[offsets[-1]:]])
...: return ''.join(piece for chunk_texts in chunks for piece in chunk_texts)
In [3]: insert_tags('abc', [1, 2], ['x', 'y'])
Out[3]: 'axbyc'
Other stylistic considerations:
Your function supports the insertion of multiple tags, so it should probably be called "insert_tags" (plural) for clarity.
Type annotation are generally good (though recall that they lock you in to newer versions of Python 3.X), but getting them right can be a nuisance. E.g., in this case I'd suggest using
Sequence
instead ofList
, since you really just need to be able to loop on the offsets and tags (e.g., they could also be tuples, sets, or even dicts with interesting keys). I also find them to often be more clutter than useful, so consider if adding type annotations really makes this code clearer, or if something simpler, like a succinct docstring, might be more helpful.
$endgroup$
answered 14 hours ago
scnerd
The offsets are still valid since I do offset_li.sort(reverse=True). I have updated the output to show it.
– Dorian Turba 14 hours ago
That should either be built into the function (something like offsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))) or made crystal clear in the function's documentation. The function shouldn't fail horribly like I show, because the user expects it to behave the same regardless of the order of the offsets.
– scnerd 13 hours ago
OK, I have updated the question with your improvement.
– Dorian Turba 13 hours ago
To answer the question in your answer: with insert_tag('abc', [1, 2], ['x', 'y']) you will get 'axybc'.
– Dorian Turba 13 hours ago
The fact is that in the real program, each tag and its offset are linked together because they are in the same object, so the definition of insert_tag will look like this: def insert_tag(tags: List[Tuple[int, str]])
– Dorian Turba 13 hours ago
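The two suggestions in these comments can be sketched together. The first function below normalizes parallel offsets and tags with the zip(*sorted(zip(...), reverse=True)) idiom before running a back-to-front insertion loop; the second takes a list of (offset, tag) pairs, as described for the real program. Both bodies are assumptions for illustration, since the original insert_tag is not reproduced in this thread, and insert_tag_pairs is a hypothetical name.

```python
from typing import List, Tuple


def insert_tag(text: str, offsets: List[int], tags: List[str]) -> str:
    """Insert tags at offsets, working from the end of the text backwards."""
    if not offsets:
        return text  # zip(*...) cannot unpack an empty sequence
    # Sort the (offset, tag) pairs together, highest offset first, so that
    # earlier insertions never shift positions that are still to be used.
    offsets, tags = zip(*sorted(zip(offsets, tags), reverse=True))
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def insert_tag_pairs(text: str, tags: List[Tuple[int, str]]) -> str:
    """Same idea when each tag already travels with its offset."""
    for offset, tag in sorted(tags, reverse=True):
        text = text[:offset] + tag + text[offset:]
    return text


print(insert_tag('abc', [1, 2], ['x', 'y']))          # axbyc
print(insert_tag_pairs('abc', [(2, 'y'), (1, 'x')]))  # axbyc
```

Sorting the pairs as a unit is the important part: sorting only the offsets would leave the tags in their original order and attach them to the wrong positions.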
Besides your approach, there are a few other ways to solve this problem: (1) collect segments of the input text (using the values in offsets) and the tags into a list, then call ''.join(<list>) at the end to generate the output; (2) split text into segments, create a format string using '{}'.join(segments), then apply the format string to tags to generate the output.
Be aware that your approach consumes more memory and has a theoretical time complexity of $O(n^2)$ to join $n$ strings, because every intermediate concatenation allocates a new string and copies everything built so far. It can be faster for small $n$ but becomes less efficient as $n$ grows. For this reason, PEP 8 recommends against it:
- Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
  For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
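The two alternatives described above could be sketched roughly as follows. Both functions assume that offsets is sorted in ascending order and refers to positions in the original text; the function names are hypothetical. The brace escaping in the second function is an addition not mentioned in the answer: without it, str.format would misinterpret any literal { or } in the input text.

```python
from typing import Sequence


def insert_with_join(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Alternative (1): gather text segments and tags in a list, join once."""
    pieces = []
    prev = 0
    for offset, tag in zip(offsets, tags):
        pieces.append(text[prev:offset])
        pieces.append(tag)
        prev = offset
    pieces.append(text[prev:])
    return ''.join(pieces)


def insert_with_format(text: str, offsets: Sequence[int], tags: Sequence[str]) -> str:
    """Alternative (2): build a '{}'-separated template, then fill in the tags."""
    bounds = [0, *offsets, len(text)]
    segments = [text[start:end] for start, end in zip(bounds, bounds[1:])]
    # Escape literal braces so str.format leaves them alone.
    segments = [s.replace('{', '{{').replace('}', '}}') for s in segments]
    return '{}'.join(segments).format(*tags)


print(insert_with_join('abc', [1, 2], ['x', 'y']))    # axbyc
print(insert_with_format('abc', [1, 2], ['x', 'y']))  # axbyc
```

Each function builds the result in a single pass over the segments, so the amount of copying grows linearly with the size of the output rather than quadratically.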
edited 13 hours ago
answered 14 hours ago
GZ0
The offsets are still valid since I do offset_li.sort(reverse=True).
– Dorian Turba 14 hours ago
Sorry, I overlooked that part. I have corrected my answer.
– GZ0 13 hours ago
For some measurements of time efficiency (memory efficiency ignored), refer to codereview.stackexchange.com/a/228822/25834
– Reinderien 2 hours ago
Ignoring for now the various gotchas that @scnerd describes about offsets, and assuming that the offsets are given in strictly decreasing order, let's look at the performance of various implementations. Happily, your implementations are close to the most efficient of these options; only the "naive" method shown below is faster on my machine:

#!/usr/bin/env python3

from io import StringIO
from timeit import timeit


# Assume that 'offsets' decreases monotonically

def soln1(text: str, offsets: tuple, tags: tuple) -> str:
    # Original approach: rebuild the string with an f-string per tag
    for offset, tag in zip(offsets, tags):
        text = f'{text[:offset]}{tag}{text[offset:]}'
    return text


def soln2(text: str, offsets: tuple, tags: tuple) -> str:
    # Original approach: rebuild the string with + per tag
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def gen_join(text: str, offsets: tuple, tags: tuple) -> str:
    # Join (segment + tag) pairs produced by a generator, front to back
    offsets += (0,)
    return ''.join(
        text[offsets[i+1]:offsets[i]] + tag
        for i, tag in reversed(tuple(enumerate(tags)))
    ) + text[offsets[0]:]


def naive(text: str, offsets: tuple, tags: tuple) -> str:
    # Plain += concatenation, front to back
    output = text[:offsets[-1]]
    for i in range(len(tags)-1, -1, -1):
        # At i == 0 the slice runs from offsets[0] to offsets[-1] and is
        # empty, because the offsets are descending
        output += tags[i] + text[offsets[i]:offsets[i-1]]
    return output + text[offsets[0]:]


def strio(text: str, offsets: tuple, tags: tuple) -> str:
    # Same traversal as naive(), writing into a StringIO buffer
    output = StringIO()
    output.write(text[:offsets[-1]])
    for i in range(len(tags)-1, -1, -1):
        output.write(tags[i])
        output.write(text[offsets[i]:offsets[i-1]])
    output.write(text[offsets[0]:])
    return output.getvalue()


def ranges(text: str, offsets: tuple, tags: tuple) -> str:
    # Preallocate a list of characters and fill it by slice assignment
    final_len = len(text) + sum(len(t) for t in tags)
    output = [None]*final_len
    offsets += (0,)

    begin_text = 0
    for i in range(len(tags)-1, -1, -1):
        o1, o2 = offsets[i+1], offsets[i]
        end_text = begin_text + o2 - o1
        output[begin_text: end_text] = text[o1: o2]

        tag = tags[i]
        end_tag = end_text + len(tag)
        output[end_text: end_tag] = tag

        begin_text = end_tag

    output[begin_text:] = text[offsets[0]:]
    return ''.join(output)


def insertion(text: str, offsets: tuple, tags: tuple) -> str:
    # Build the list back to front by inserting at index 0
    output = []
    # The last segment ends at len(text); 1+len(tags) only equals that
    # when there is exactly one tag between every pair of characters
    offsets = (len(text),) + offsets
    for i in range(len(tags)):
        output.insert(0, text[offsets[i+1]: offsets[i]])
        output.insert(0, tags[i])
    output.insert(0, text[:offsets[-1]])
    return ''.join(output)


def test(fun):
    actual = fun('abcde', (4, 3, 2, 1), ('D', 'C', 'B', 'A'))
    expected = 'aAbBcCdDe'
    assert expected == actual


def main():
    funs = (soln1, soln2, gen_join, naive, strio, ranges, insertion)
    for fun in funs:
        test(fun)

    N = 5_000  # number of averaging repetitions
    L = 150    # input length
    text = 'abcde' * (L//5)
    offsets = tuple(range(L-1, 0, -1))
    tags = text[-1:0:-1].upper()  # a str, iterated one character per tag

    print(f'{"Name":10} {"time/call, us":15}')
    for fun in funs:
        def call():
            return fun(text, offsets, tags)
        dur = timeit(stmt=call, number=N)
        print(f'{fun.__name__:10} {dur/N*1e6:.1f}')


main()

This yields:

Name       time/call, us
soln1      134.2
soln2      133.2
gen_join   289.7
naive      116.8
strio      159.6
ranges     513.9
insertion  204.7
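The timings above were taken at a single input length (L = 150). To see how the ranking changes as the input grows, a small harness along the following lines could be used. It is only a sketch: concat_insert, join_insert, and time_at are hypothetical names, the chosen lengths and repetition count are arbitrary, and no results are claimed here because they depend on the machine and the Python implementation.

```python
from timeit import timeit


def concat_insert(text, offsets, tags):
    """Repeated slicing and concatenation; offsets must be descending."""
    for offset, tag in zip(offsets, tags):
        text = text[:offset] + tag + text[offset:]
    return text


def join_insert(text, offsets, tags):
    """Single join over segments; offsets must be descending and non-empty."""
    pieces = [text[offsets[0]:]]  # tail after the highest offset
    for i, tag in enumerate(tags):
        start = offsets[i + 1] if i + 1 < len(offsets) else 0
        pieces.append(tag)
        pieces.append(text[start:offsets[i]])
    pieces.reverse()  # pieces were collected back to front
    return ''.join(pieces)


def time_at(length, repeats=50):
    """Return per-call times in microseconds for both functions."""
    text = 'abcde' * (length // 5)
    offsets = tuple(range(length - 1, 0, -1))
    tags = tuple(text[-1:0:-1].upper())
    # Make sure both implementations agree before timing them.
    assert concat_insert(text, offsets, tags) == join_insert(text, offsets, tags)
    return {
        fun.__name__: timeit(lambda: fun(text, offsets, tags), number=repeats) / repeats * 1e6
        for fun in (concat_insert, join_insert)
    }


if __name__ == '__main__':
    for length in (150, 600, 2400):
        print(length, time_at(length))
```

Comparing the ratio between the two columns at each length gives a rough sense of whether the concatenation approach is growing faster than the join approach on a given interpreter.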
answered 2 hours ago
Reinderien
$begingroup$
One problem of the tests is that it assumes thatoffsets
is in descending order, which adds unnecessary overhead to all thestr.join
implementations here.
$endgroup$
– GZ0
1 hour ago
$begingroup$
@GZ0 The alternatives - probably adding asort()
somewhere, to be more robust - are probably going to incur more cost than assuming descending order. Iterating in reverse order is efficient, especially with indexes. Only one of the methods callsreversed
, and it's difficult to say whether that's the limiting factor in speed of that method.
$endgroup$
– Reinderien
1 hour ago
Two points: (1) even if you do not assume the input is in ascending order (which might slightly favor the str.join implementations), it would be fairer to explicitly call sort() within all the methods, which would incur the same cost for all of them (so you could actually assume ascending order just for the str.join-related methods); (2) the tuple call is redundant anyway, and reversed is efficient only if the input is already a sequence such as a list or tuple. For an arbitrary iterable like enumerate(...), reversing it requires collecting all the elements first.
– GZ0
56 mins ago
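Point (2) can be checked with a small stand-alone snippet. It is only an illustration and is not taken from any of the benchmarked functions: reversed() accepts a sequence directly, but rejects an enumerate object until its elements have been collected.

```python
error = None
try:
    # enumerate objects are plain iterators, not sequences
    reversed(enumerate('abc'))
except TypeError as exc:
    error = exc

# Collecting the elements into a list first makes them reversible.
collected = list(reversed(list(enumerate('abc'))))

# A tuple is already a sequence, so no intermediate copy is needed.
from_tuple = list(reversed((1, 2, 3)))

print(type(error).__name__, collected, from_tuple)
```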
Meanwhile, the slowness of insertion is due to all the insertions happening at the beginning of the list. The implementation would not be so awkward if it were not for the descending order.
– GZ0
49 mins ago
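To illustrate the cost of inserting at the front: each list.insert(0, x) has to shift every element already in the list, so building n items this way takes quadratic time overall. The sketch below uses hypothetical helper names and is not the answer's insertion function. It shows two common alternatives that produce the same result, collections.deque.appendleft and append followed by a single reverse().

```python
from collections import deque


def build_with_insert(items):
    out = []
    for item in items:
        out.insert(0, item)  # shifts all existing elements each time
    return out


def build_with_deque(items):
    out = deque()
    for item in items:
        out.appendleft(item)  # constant time at the left end
    return list(out)


def build_with_append(items):
    out = []
    for item in items:
        out.append(item)  # amortised constant time at the right end
    out.reverse()  # one linear pass at the end
    return out


print(build_with_insert('abcd'), build_with_deque('abcd'), build_with_append('abcd'))
```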
Please see What to do when someone answers. I have rolled back Rev 5 → 3.
– Sᴀᴍ Onᴇᴌᴀ
13 hours ago
"Do not add an improved version of the code after receiving an answer. Including revised versions of the code makes the question confusing, especially if someone later reviews the newer code."
I didn't see that, thank you.
– Dorian Turba
13 hours ago