bistr

class bistring.bistr(original: Union[str, bistring._bistr.bistr], modified: Optional[str] = None, alignment: Optional[bistring._alignment.Alignment] = None)

Bases: object

A bidirectionally transformed string.

A bistr can be constructed from only a single string, which will give it identical original and modified strings and an identity alignment:

>>> s = bistr('test')
>>> s.original
'test'
>>> s.modified
'test'
>>> s.alignment
Alignment.identity(4)

You can also explicitly specify both the original and modified strings. The inferred alignment will be as coarse as possible:

>>> s = bistr('TEST', 'test')
>>> s.original
'TEST'
>>> s.modified
'test'
>>> s.alignment
Alignment([(0, 0), (4, 4)])

Finally, you can specify the alignment explicitly too, if you know it:

>>> s = bistr('TEST', 'test', Alignment.identity(4))
>>> s[1:3]
bistr('ES', 'es', Alignment.identity(2))
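To make the difference between a coarse and a fine alignment concrete, here is a minimal sketch that does not use bistring at all. It assumes an alignment can be modeled as a sorted list of (original, modified) position pairs, and the helper name original_bounds is hypothetical. It only illustrates the idea of mapping a slice of the modified string back to the original; it is not the library's implementation.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def original_bounds(pairs: List[Tuple[int, int]], start: int, end: int) -> Tuple[int, int]:
    """Map the modified span [start, end) to the smallest original span that covers it.

    Hypothetical helper: `pairs` is assumed to be a sorted list of
    (original_position, modified_position) anchors.
    """
    modified = [m for _, m in pairs]
    lo = bisect_right(modified, start) - 1  # last anchor at or before `start`
    hi = bisect_left(modified, end)         # first anchor at or after `end`
    return pairs[lo][0], pairs[hi][0]


identity = [(i, i) for i in range(5)]  # models Alignment.identity(4)
coarse = [(0, 0), (4, 4)]              # models Alignment([(0, 0), (4, 4)])

print(original_bounds(identity, 1, 3))  # (1, 3)
print(original_bounds(coarse, 1, 3))    # (0, 4)
```

Under this model, the identity alignment traces the slice 1:3 back to exactly the same positions in the original, while the coarse alignment can only trace it back to the whole original string.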

original: str: The original string, before any modifications.

modified: str: The current value of the string, after all modifications.

alignment: bistring._alignment.Alignment: The sequence alignment between original and modified.

classmethod infer(original, modified, cost_fn=None)

Create a bistr, automatically inferring an alignment between the original and modified strings.

This method can be useful if the modified string was produced by some method out of your control. If at all possible, you should start with bistr(original) and perform non-destructive operations to get to the modified string instead.

>>> s = bistr.infer('color', 'colour')
>>> print(s[0:3])
⮎'col'⮌
>>> print(s[3:5])
('o' ⇋ 'ou')
>>> print(s[5:6])
⮎'r'⮌

infer() tries to be intelligent about certain aspects of Unicode, which enables it to guess good alignments between strings that have been case-mapped, normalized, etc.:

>>> s = bistr.infer(
...     '🅃🄷🄴 🅀🅄🄸🄲🄺, 🄱🅁🄾🅆🄽 🦊 🄹🅄🄼🄿🅂 🄾🅅🄴🅁 🅃🄷🄴 🄻🄰🅉🅈 🐶',
...     'the quick brown fox jumps over the lazy dog',
... )
>>> print(s[0:3])
('🅃🄷🄴' ⇋ 'the')
>>> print(s[4:9])
('🅀🅄🄸🄲🄺' ⇋ 'quick')
>>> print(s[10:15])
('🄱🅁🄾🅆🄽' ⇋ 'brown')
>>> print(s[16:19])
('🦊' ⇋ 'fox')

Warning: this operation has time complexity O(N*M), where N and M are the lengths of the original and modified strings, and so should only be used for relatively short strings.

Parameters

original (str) – The original string.
modified (str) – The modified string.
cost_fn (Optional[Callable[[Optional[str], Optional[str]], Union[int, float]]]) – A function returning the cost of performing an edit (see Alignment.infer()).

Return type

bistr

Returns

A bistr with the inferred alignment.
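The O(N*M) cost mentioned in the warning is typical of edit-distance style dynamic programming, which fills one table cell per pair of positions. The sketch below is a generic edit-cost computation, not bistring's actual algorithm. The names unit_cost and edit_cost are hypothetical, and the convention that None stands for "no character" (an insertion or deletion) is an assumption based on the cost_fn type signature; see Alignment.infer() for the real contract.

```python
from typing import Callable, Optional, Union

Cost = Union[int, float]


def unit_cost(a: Optional[str], b: Optional[str]) -> Cost:
    """Hypothetical cost function: 0 for a match, 1 for any edit.

    None is assumed to mean "no character", i.e. an insertion or deletion.
    """
    return 0 if a == b else 1


def edit_cost(
    original: str,
    modified: str,
    cost_fn: Callable[[Optional[str], Optional[str]], Cost] = unit_cost,
) -> Cost:
    n, m = len(original), len(modified)
    # table[i][j] = cheapest way to turn original[:i] into modified[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = table[i - 1][0] + cost_fn(original[i - 1], None)
    for j in range(1, m + 1):
        table[0][j] = table[0][j - 1] + cost_fn(None, modified[j - 1])
    for i in range(1, n + 1):      # N rows ...
        for j in range(1, m + 1):  # ... times M columns
            table[i][j] = min(
                table[i - 1][j] + cost_fn(original[i - 1], None),        # delete
                table[i][j - 1] + cost_fn(None, modified[j - 1]),        # insert
                table[i - 1][j - 1] + cost_fn(original[i - 1], modified[j - 1]),
            )
    return table[n][m]


print(edit_cost('color', 'colour'))  # 1
```

Because every one of the N*M cells is visited once, doubling both string lengths roughly quadruples the work, which is why inference is best kept to short strings.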

__getitem__(index: int) → str

__getitem__(index: slice) → bistring._bistr.bistr

Indexing a bistr returns the nth character of the modified string:

>>> s = bistr('TEST').lower()
>>> s[1]
'e'

Slicing a bistr extracts a substring, complete with the matching part of the original string:

>>> s = bistr('TEST').lower()
>>> s[1:3]
bistr('ES', 'es', Alignment.identity(2))

Return type: Union[str, bistr]

inverse()

Return type

bistr

Returns

The inverse of this string, swapping the original and modified strings.

>>> s = bistr('HELLO WORLD').lower()
>>> s
bistr('HELLO WORLD', 'hello world', Alignment.identity(11))
>>> s.inverse()
bistr('hello world', 'HELLO WORLD', Alignment.identity(11))

chunks()

Return type: Iterable[bistr]
Returns: All the chunks of associated text in this string.

count(sub, start=None, end=None)

Like str.count(), counts the occurrences of sub in the string.

Return type: int

find(sub, start=None, end=None)

Like str.find(), finds the position of sub in the string.

Return type: int

find_bounds(sub, start=None, end=None)

Like find(), but returns both the start and end bounds for convenience.

Return type: Tuple[int, int]
Returns: The first i, j within [start, end) such that self[i:j] == sub, or (-1, -1) if not found.
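The return convention can be illustrated on a plain str. The following sketch is a stand-in written for this example only (the free function find_bounds below is hypothetical and does not use bistring); it shows the documented (i, j) and (-1, -1) results.

```python
from typing import Optional, Tuple


def find_bounds(text: str, sub: str,
                start: Optional[int] = None,
                end: Optional[int] = None) -> Tuple[int, int]:
    """Plain-str analogue: bounds of the first match, or (-1, -1)."""
    i = text.find(sub, start, end)
    if i < 0:
        return (-1, -1)
    return (i, i + len(sub))


print(find_bounds('hello world', 'o'))     # (4, 5)
print(find_bounds('hello world', 'o', 5))  # (7, 8)
print(find_bounds('hello world', 'z'))     # (-1, -1)
```

Returning both bounds means the result can be used directly as a slice, for example text[i:j].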

rfind(sub, start=None, end=None)

Like str.rfind(), finds the last position of sub in the string, searching backwards from the end.

Return type: int

rfind_bounds(sub, start=None, end=None)

Like rfind(), but returns both the start and end bounds for convenience.

Return type: Tuple[int, int]
Returns: The last i, j within [start, end) such that self[i:j] == sub, or (-1, -1) if not found.

index(sub, start=None, end=None)

Like str.index(), finds the first position of sub in the string, otherwise raising a ValueError.

Return type: int

index_bounds(sub, start=None, end=None)

Like index(), but returns both the start and end bounds for convenience. If the substring is not found, a ValueError is raised.

Return type: Tuple[int, int]
Returns: The first i, j within [start, end) such that self[i:j] == sub.
Raises: ValueError if the substring is not found.

rindex(sub, start=None, end=None)

Like str.rindex(), finds the last position of sub in the string, otherwise raising a ValueError.

Return type: int

rindex_bounds(sub, start=None, end=None)

Like rindex(), but returns both the start and end bounds for convenience. If the substring is not found, a ValueError is raised.

Return type: Tuple[int, int]
Returns: The last i, j within [start, end) such that self[i:j] == sub.
Raises: ValueError if the substring is not found.
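The difference from the find family is the failure mode: an exception instead of a sentinel value. A minimal plain-str sketch of that behavior follows; the free function rindex_bounds is a hypothetical stand-in for illustration and does not use bistring.

```python
from typing import Optional, Tuple


def rindex_bounds(text: str, sub: str,
                  start: Optional[int] = None,
                  end: Optional[int] = None) -> Tuple[int, int]:
    """Plain-str analogue: bounds of the last match, or ValueError."""
    i = text.rindex(sub, start, end)  # str.rindex raises ValueError if absent
    return (i, i + len(sub))


print(rindex_bounds('hello world', 'o'))  # (7, 8)

try:
    rindex_bounds('hello world', 'z')
except ValueError:
    print('not found')
```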

startswith(prefix, start=None, end=None)

Like str.startswith(), checks if the string starts with the given prefix.

Return type: bool

endswith(suffix, start=None, end=None)

Like str.endswith(), checks if the string ends with the given suffix.

Return type: bool

join(iterable)

Like str.join(), concatenates many (bi)strings together.

Return type: bistr

split(sep=None, maxsplit=-1)

Like str.split(), splits this string on a separator.

Return type: List[bistr]
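As a reminder of the str.split() behavior this method is modeled on, the following uses plain str only (not bistring). The bistr version returns bistr pieces instead, per the return type above.

```python
# sep=None splits on runs of whitespace and drops empty pieces.
print('a b  c'.split())          # ['a', 'b', 'c']

# An explicit separator keeps empty pieces between adjacent separators.
print('a,b,,c'.split(','))       # ['a', 'b', '', 'c']

# maxsplit limits the number of splits performed.
print('a,b,c'.split(',', 1))     # ['a', 'b,c']
```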

partition(sep)

Like str.partition(), splits this string into three chunks on a separator.

Return type: Tuple[bistr, bistr, bistr]

rpartition(sep)

Like str.rpartition(), splits this string into three chunks on a separator, searching from the end.

Return type: Tuple[bistr, bistr, bistr]
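The difference between partition() and rpartition() is easiest to see when the separator occurs more than once. This example uses plain str only, as a sketch of the semantics the bistr methods are documented to mirror:

```python
text = 'key=value=x'

# partition() splits at the first separator.
print(text.partition('='))   # ('key', '=', 'value=x')

# rpartition() splits at the last separator.
print(text.rpartition('='))  # ('key=value', '=', 'x')

# When the separator is missing, the whole string lands on opposite ends.
print('abc'.partition('='))   # ('abc', '', '')
print('abc'.rpartition('='))  # ('', '', 'abc')
```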

center(width, fillchar=' ')

Like str.center(), pads the start and end of the string to center it.

Return type: bistr

ljust(width, fillchar=' ')

Like str.ljust(), pads the end of the string to a fixed width.

Return type: bistr

rjust(width, fillchar=' ')

Like str.rjust(), pads the start of the string to a fixed width.

Return type: bistr
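For comparison, here is how the three padding methods behave on a plain str (this does not use bistring; the bistr versions return a bistr instead):

```python
word = 'ab'

print(word.center(6, '*'))  # **ab**
print(word.ljust(5, '.'))   # ab...
print(word.rjust(5, '.'))   # ...ab

# A width no larger than the string leaves it unchanged.
print(word.center(1))       # ab
```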

casefold()

Computes the case-folded form of this string. Case folding is used for case-insensitive operations, and the result may not be suitable for displaying to a user. For example:

>>> s = bistr('straße').casefold()
>>> s.modified
'strasse'
>>> s[4:6]
bistr('ß', 'ss')

Return type: bistr
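The reason case folding exists alongside lowercasing can be shown with plain str (not bistring): lowercasing alone does not make 'ß' and 'ss' compare equal, while case folding does.

```python
a = 'Straße'
b = 'STRASSE'

# Lowercasing keeps the sharp s, so the two spellings still differ.
print(a.lower() == b.lower())        # False

# Case folding maps 'ß' to 'ss', so the comparison succeeds.
print(a.casefold() == b.casefold())  # True
```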

lower(locale=None)

Converts this string to lowercase. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('HELLO WORLD').lower()
bistr('HELLO WORLD', 'hello world', Alignment.identity(11))
>>> bistr('I').lower('en_US')
bistr('I', 'i')
>>> bistr('I').lower('tr_TR')
bistr('I', 'ı')

Return type: bistr

upper(locale=None)

Converts this string to uppercase. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('hello world').upper()
bistr('hello world', 'HELLO WORLD', Alignment.identity(11))
>>> bistr('i').upper('en_US')
bistr('i', 'I')
>>> bistr('i').upper('tr_TR')
bistr('i', 'İ')

Return type: bistr

title(locale=None)

Converts this string to title case. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('hello world').title()
bistr('hello world', 'Hello World', Alignment.identity(11))
>>> bistr('istanbul').title('en_US')
bistr('istanbul', 'Istanbul', Alignment.identity(8))
>>> bistr('istanbul').title('tr_TR')
bistr('istanbul', 'İstanbul', Alignment.identity(8))

Return type: bistr

capitalize(locale=None)

Capitalize the first character of this string, and lowercase the rest. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('hello WORLD').capitalize()
bistr('hello WORLD', 'Hello world', Alignment.identity(11))
>>> bistr('ἴΣ').capitalize('el_GR')
bistr('ἴΣ', 'Ἴς', Alignment.identity(2))

Return type: bistr

swapcase(locale=None)

Swap the case of every letter in this string. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('hello WORLD').swapcase()
bistr('hello WORLD', 'HELLO world', Alignment.identity(11))

Some Unicode characters, such as title-case ligatures and digraphs, don’t have a case-swapped equivalent:

>>> bistr('ǈepòta').swapcase('hr_HR')
bistr('ǈepòta', 'ǈEPÒTA', Alignment.identity(6))

In these cases, compatibility normalization may help:

>>> s = bistr('ǈepòta')
>>> s = s.normalize('NFKC')
>>> s = s.swapcase('hr_HR')
>>> print(s)
('ǈepòta' ⇋ 'lJEPÒTA')

Return type: bistr
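The effect of compatibility normalization on the title-case digraph can be reproduced with the standard library alone. This sketch uses unicodedata and plain str rather than bistring, and no locale is involved; escape sequences are used so the exact code points are unambiguous.

```python
import unicodedata

word = '\u01c8ep\u00f2ta'  # U+01C8 is the single-character title-case digraph 'ǈ'

# NFKC replaces the digraph with the two separate letters 'L' and 'j'.
normalized = unicodedata.normalize('NFKC', word)
print(len(word), len(normalized))  # 6 7

# With separate letters, each one has a well-defined swapped case.
swapped = normalized.swapcase()
print(swapped == 'lJEP\u00d2TA')   # True
```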

expandtabs(tabsize=8)

Like str.expandtabs(), replaces tab (\t) characters with spaces to align on multiples of tabsize.

Return type: bistr
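Tab expansion pads to the next column that is a multiple of tabsize, so the number of spaces inserted depends on where each tab falls. A plain-str illustration (not using bistring):

```python
text = 'a\tbc\td'

# 'a' ends at column 1, so the first tab adds 3 spaces (up to column 4).
# 'bc' ends at column 6, so the second tab adds 2 spaces (up to column 8).
print(repr(text.expandtabs(4)))  # 'a   bc  d'
```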

replace(old, new, count=None)

Like str.replace(), replaces occurrences of old with new.

Return type: bistr

sub(regex, repl)

Like re.sub(), replaces all matches of regex with the replacement repl.

Parameters

regex (Union[str, Pattern[str]]) – The regex to match. Can be a string pattern or a compiled regex.
repl (Union[str, Callable[[Match[str]], str]]) – The replacement to use. Can be a string, which is interpreted as in re.Match.expand(), or a callable, which will receive each match and return the replacement string.

Return type

bistr
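The two kinds of repl argument behave as they do in the standard re module. The following sketch uses re.sub() on plain str (not bistring) to show a template string with backreferences and a callable replacement:

```python
import re

# String replacement: backreferences such as \1 and \2 are expanded.
print(re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'user@host'))  # host at user

# Callable replacement: called once per match, returns the new text.
def double(match: 're.Match[str]') -> str:
    return str(int(match.group()) * 2)

print(re.sub(r'\d+', double, 'a1 b22'))  # a2 b44

# A compiled pattern works the same way as a string pattern.
digits = re.compile(r'\d+')
print(digits.sub('#', 'a1 b22'))         # a# b#
```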

strip(chars=None)

Like str.strip(), removes leading and trailing characters (whitespace by default).

Return type: bistr

lstrip(chars=None)

Like str.lstrip(), removes leading characters (whitespace by default).

Return type: bistr

rstrip(chars=None)

Like str.rstrip(), removes trailing characters (whitespace by default).

Return type: bistr

normalize(form)

Like unicodedata.normalize(), applies a Unicode normalization form. The choices for form are:

'NFC': Canonical Composition
'NFKC': Compatibility Composition
'NFD': Canonical Decomposition
'NFKD': Compatibility Decomposition

Return type: bistr
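The four forms differ along two axes: composed versus decomposed, and canonical versus compatibility. A short standard-library illustration using unicodedata.normalize() on plain str (not bistring):

```python
import unicodedata

e_acute = '\u00e9'  # 'é' as a single precomposed code point
ligature = '\ufb01'  # 'ﬁ' ligature, a compatibility character

# Canonical decomposition splits 'é' into 'e' plus a combining accent.
nfd = unicodedata.normalize('NFD', e_acute)
print(len(nfd))                                          # 2

# Canonical composition puts the pieces back together.
print(unicodedata.normalize('NFC', nfd) == e_acute)      # True

# Canonical forms leave the ligature alone ...
print(unicodedata.normalize('NFC', ligature) == ligature)  # True

# ... while compatibility forms replace it with plain letters.
print(unicodedata.normalize('NFKC', ligature))           # fi
print(unicodedata.normalize('NFKD', ligature))           # fi
```

Compatibility forms are lossy in the sense that the ligature cannot be recovered from the plain string, which is one situation where keeping the original alongside the modified text is useful.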