bistr

class bistring.bistr(original: Union[str, bistring._bistr.bistr], modified: Optional[str] = None, alignment: Optional[bistring._alignment.Alignment] = None)

Bases: object

A bidirectionally transformed string.

A bistr can be constructed from only a single string, which will give it identical original and modified strings and an identity alignment:

>>> s = bistr('test')
>>> s.original
'test'
>>> s.modified
'test'
>>> s.alignment
Alignment.identity(4)

You can also explicitly specify both the original and modified strings. The inferred alignment will be as coarse as possible:

>>> s = bistr('TEST', 'test')
>>> s.original
'TEST'
>>> s.modified
'test'
>>> s.alignment
Alignment([(0, 0), (4, 4)])

Finally, you can specify the alignment explicitly too, if you know it:

>>> s = bistr('TEST', 'test', Alignment.identity(4))
>>> s[1:3]
bistr('ES', 'es', Alignment.identity(2))
original: str

The original string, before any modifications.

modified: str

The current value of the string, after all modifications.

alignment: bistring._alignment.Alignment

The sequence alignment between original and modified.
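
The practical difference between a coarse and an identity alignment shows up when a range of the modified string is mapped back to the original. The following is a simplified sketch, not bistring's implementation: it assumes an alignment can be modeled as a sorted list of (original, modified) position pairs, and the helper name map_to_original is made up for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Pairs = List[Tuple[int, int]]  # assumed model: sorted (original, modified) positions


def map_to_original(pairs: Pairs, start: int, end: int) -> Tuple[int, int]:
    """Map the range [start, end) of the modified text onto the original text."""
    modified_positions = [m for _, m in pairs]
    # Last pair whose modified position is <= start.
    first = bisect_right(modified_positions, start) - 1
    # First pair whose modified position is >= end.
    last = bisect_left(modified_positions, end)
    return pairs[first][0], pairs[last][0]


identity = [(i, i) for i in range(5)]  # models Alignment.identity(4)
coarse = [(0, 0), (4, 4)]              # models Alignment([(0, 0), (4, 4)])

print(map_to_original(identity, 1, 3))  # (1, 3)
print(map_to_original(coarse, 1, 3))    # (0, 4)
```

With the identity alignment, the slice [1:3] maps to exactly two original characters. With the coarse alignment, the same slice can only be traced back to the whole original string.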

classmethod infer(original, modified, cost_fn=None)

Create a bistr, automatically inferring an alignment between the original and modified strings.

This method can be useful if the modified string was produced by some method out of your control. If at all possible, you should start with bistr(original) and perform non-destructive operations to get to the modified string instead.

>>> s = bistr.infer('color', 'colour')
>>> print(s[0:3])
⮎'col'⮌
>>> print(s[3:5])
('o' ⇋ 'ou')
>>> print(s[5:6])
⮎'r'⮌

infer() tries to be intelligent about certain aspects of Unicode, which enables it to guess good alignments between strings that have been case-mapped, normalized, etc.:

>>> s = bistr.infer(
...     '🅃🄷🄴 🅀🅄🄸🄲🄺, 🄱🅁🄾🅆🄽 🦊 🄹🅄🄼🄿🅂 🄾🅅🄴🅁 🅃🄷🄴 🄻🄰🅉🅈 🐶',
...     'the quick brown fox jumps over the lazy dog',
... )
>>> print(s[0:3])
('🅃🄷🄴' ⇋ 'the')
>>> print(s[4:9])
('🅀🅄🄸🄲🄺' ⇋ 'quick')
>>> print(s[10:15])
('🄱🅁🄾🅆🄽' ⇋ 'brown')
>>> print(s[16:19])
('🦊' ⇋ 'fox')

Warning: this operation has time complexity O(N*M), where N and M are the lengths of the original and modified strings, and so should only be used for relatively short strings.

Parameters
Return type

bistr

Returns

A bistr with the inferred alignment.
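
To see where an O(N*M) cost typically comes from, here is a minimal sketch of a classic edit-distance table. It is an illustration only and is not claimed to be the algorithm infer() uses; the function name edit_distance and the unit costs are assumptions.

```python
def edit_distance(original: str, modified: str) -> int:
    """Levenshtein distance computed with a full (N+1) x (M+1) table."""
    n, m = len(original), len(modified)
    # cost[i][j] = edits needed to turn original[:i] into modified[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i  # delete every character of original[:i]
    for j in range(m + 1):
        cost[0][j] = j  # insert every character of modified[:j]
    # Every one of the N*M inner cells is visited once.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = original[i - 1] == modified[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + 1,                      # deletion
                cost[i][j - 1] + 1,                      # insertion
                cost[i - 1][j - 1] + (0 if same else 1)  # match or substitution
            )
    return cost[n][m]


print(edit_distance('color', 'colour'))  # 1
```

Both time and memory grow with the product of the two lengths in this sketch, which is why such approaches are best kept to short strings.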

__getitem__(index: int) -> str
__getitem__(index: slice) -> bistring._bistr.bistr

Indexing a bistr returns the nth character of the modified string:

>>> s = bistr('TEST').lower()
>>> s[1]
'e'

Slicing a bistr extracts a substring, complete with the matching part of the original string:

>>> s = bistr('TEST').lower()
>>> s[1:3]
bistr('ES', 'es', Alignment.identity(2))
Return type

Union[str, bistr]

inverse()
Return type

bistr

Returns

The inverse of this string, swapping the original and modified strings.

>>> s = bistr('HELLO WORLD').lower()
>>> s
bistr('HELLO WORLD', 'hello world', Alignment.identity(11))
>>> s.inverse()
bistr('hello world', 'HELLO WORLD', Alignment.identity(11))

chunks()
Return type

Iterable[bistr]

Returns

All the chunks of associated text in this string.

count(sub, start=None, end=None)

Like str.count(), counts the occurrences of sub in the string.

Return type

int

find(sub, start=None, end=None)

Like str.find(), finds the first position of sub in the string, or returns -1 if it is not found.

Return type

int

find_bounds(sub, start=None, end=None)

Like find(), but returns both the start and end bounds for convenience.

Return type

Tuple[int, int]

Returns

The first i, j within [start, end) such that self[i:j] == sub, or (-1, -1) if not found.
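
The bounds contract described above can be sketched on a plain str. This is a hypothetical stand-alone helper that mirrors the documented return values, not the library's code:

```python
from typing import Optional, Tuple


def find_bounds(
    text: str, sub: str, start: Optional[int] = None, end: Optional[int] = None
) -> Tuple[int, int]:
    """Return (i, j) such that text[i:j] == sub, or (-1, -1) if absent."""
    i = text.find(sub, start, end)  # str.find accepts None for both bounds
    if i < 0:
        return (-1, -1)
    return (i, i + len(sub))


text = 'hello world'
print(find_bounds(text, 'o'))     # (4, 5)
print(find_bounds(text, 'o', 5))  # (7, 8)
print(find_bounds(text, 'xyz'))   # (-1, -1)
```

Returning both bounds saves the caller from recomputing i + len(sub) before slicing.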

rfind(sub, start=None, end=None)

Like str.rfind(), finds the last position of sub in the string, or returns -1 if it is not found.

Return type

int

rfind_bounds(sub, start=None, end=None)

Like rfind(), but returns both the start and end bounds for convenience.

Return type

Tuple[int, int]

Returns

The last i, j within [start, end) such that self[i:j] == sub, or (-1, -1) if not found.

index(sub, start=None, end=None)

Like str.index(), finds the first position of sub in the string, otherwise raising a ValueError.

Return type

int

index_bounds(sub, start=None, end=None)

Like index(), but returns both the start and end bounds for convenience. If the substring is not found, a ValueError is raised.

Return type

Tuple[int, int]

Returns

The first i, j within [start, end) such that self[i:j] == sub.

Raises

ValueError if the substring is not found.

rindex(sub, start=None, end=None)

Like str.rindex(), finds the last position of sub in the string, otherwise raising a ValueError.

Return type

int

rindex_bounds(sub, start=None, end=None)

Like rindex(), but returns both the start and end bounds for convenience. If the substring is not found, a ValueError is raised.

Return type

Tuple[int, int]

Returns

The last i, j within [start, end) such that self[i:j] == sub.

Raises

ValueError if the substring is not found.

startswith(prefix, start=None, end=None)

Like str.startswith(), checks if the string starts with the given prefix.

Return type

bool

endswith(suffix, start=None, end=None)

Like str.endswith(), checks if the string ends with the given suffix.

Return type

bool

join(iterable)

Like str.join(), concatenates many (bi)strings together.

Return type

bistr

split(sep=None, maxsplit=-1)

Like str.split(), splits this string on a separator.

Return type

List[bistr]
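
As a reminder of the str.split() semantics this method is modeled on, the sketch below uses plain str only. That bistr matches str in every edge case is an assumption here.

```python
# sep=None splits on runs of whitespace and drops empty strings.
words = 'a  b c'.split()

# An explicit separator keeps the empty strings between adjacent separators.
fields = 'a,,b'.split(',')

# maxsplit limits the number of splits; the remainder stays in the last item.
head_and_rest = 'a b c'.split(maxsplit=1)

print(words)          # ['a', 'b', 'c']
print(fields)         # ['a', '', 'b']
print(head_and_rest)  # ['a', 'b c']
```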

partition(sep)

Like str.partition(), splits this string into three chunks on a separator.

Return type

Tuple[bistr, bistr, bistr]

rpartition(sep)

Like str.rpartition(), splits this string into three chunks on a separator, searching from the end.

Return type

Tuple[bistr, bistr, bistr]
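
The difference between the two partition methods is easiest to see on a plain str, shown here as a sketch of the str behavior they are documented to follow:

```python
path = 'a.b.c'

# partition() splits at the first separator.
first = path.partition('.')

# rpartition() splits at the last separator.
last = path.rpartition('.')

# When the separator is missing, the whole string lands in the first slot
# for partition() and in the last slot for rpartition().
missing_first = 'abc'.partition('.')
missing_last = 'abc'.rpartition('.')

print(first)          # ('a', '.', 'b.c')
print(last)           # ('a.b', '.', 'c')
print(missing_first)  # ('abc', '', '')
print(missing_last)   # ('', '', 'abc')
```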

center(width, fillchar=' ')

Like str.center(), pads the start and end of the string to center it.

Return type

bistr

ljust(width, fillchar=' ')

Like str.ljust(), pads the end of the string to a fixed width.

Return type

bistr

rjust(width, fillchar=' ')

Like str.rjust(), pads the start of the string to a fixed width.

Return type

bistr
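
The three padding methods follow their str counterparts, whose behavior is sketched below on plain strings:

```python
centered = 'ab'.center(6, '*')  # pads both sides
left = 'ab'.ljust(5, '.')       # pads the end
right = 'ab'.rjust(5, '.')      # pads the start

# A width no larger than the string's length leaves it unchanged.
unchanged = 'abcdef'.ljust(3)

print(centered)   # **ab**
print(left)       # ab...
print(right)      # ...ab
print(unchanged)  # abcdef
```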

casefold()

Computes the case folded form of this string. Case folding is used for case-insensitive operations, and the result may not be suitable for displaying to a user. For example:

>>> s = bistr('straße').casefold()
>>> s.modified
'strasse'
>>> s[4:6]
bistr('ß', 'ss')
Return type

bistr
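
Why case folding rather than lowercasing is used for case-insensitive comparison can be shown with plain str, which provides the same two operations:

```python
a = 'straße'
b = 'STRASSE'

# Lowercasing keeps 'ß', so the two spellings still differ.
lower_equal = a.lower() == b.lower()

# Case folding expands 'ß' to 'ss', so they compare equal.
folded_equal = a.casefold() == b.casefold()

print(a.casefold())  # strasse
print(lower_equal)   # False
print(folded_equal)  # True
```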

lower(locale=None)

Converts this string to lowercase. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('HELLO WORLD').lower()
bistr('HELLO WORLD', 'hello world', Alignment.identity(11))
>>> bistr('I').lower('en_US')
bistr('I', 'i')
>>> bistr('I').lower('tr_TR')
bistr('I', 'ı')
Return type

bistr

upper(locale=None)

Converts this string to uppercase. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('hello world').upper()
bistr('hello world', 'HELLO WORLD', Alignment.identity(11))
>>> bistr('i').upper('en_US')
bistr('i', 'I')
>>> bistr('i').upper('tr_TR')
bistr('i', 'İ')
Return type

bistr

title(locale=None)

Converts this string to title case. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('hello world').title()
bistr('hello world', 'Hello World', Alignment.identity(11))
>>> bistr('istanbul').title('en_US')
bistr('istanbul', 'Istanbul', Alignment.identity(8))
>>> bistr('istanbul').title('tr_TR')
bistr('istanbul', 'İstanbul', Alignment.identity(8))
Return type

bistr

capitalize(locale=None)

Capitalizes the first character of this string and lowercases the rest. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('hello WORLD').capitalize()
bistr('hello WORLD', 'Hello world', Alignment.identity(11))
>>> bistr('ἴΣ').capitalize('el_GR')
bistr('ἴΣ', 'Ἴς', Alignment.identity(2))
Return type

bistr

swapcase(locale=None)

Swaps the case of every letter in this string. Unless you specify the locale parameter, the current system locale will be used.

>>> bistr('hello WORLD').swapcase()
bistr('hello WORLD', 'HELLO world', Alignment.identity(11))

Some Unicode characters, such as title-case ligatures and digraphs, don’t have a case-swapped equivalent:

>>> bistr('ǈepòta').swapcase('hr_HR')
bistr('ǈepòta', 'ǈEPÒTA', Alignment.identity(6))

In these cases, compatibility normalization may help:

>>> s = bistr('ǈepòta')
>>> s = s.normalize('NFKC')
>>> s = s.swapcase('hr_HR')
>>> print(s)
('ǈepòta' ⇋ 'lJEPÒTA')
Return type

bistr

expandtabs(tabsize=8)

Like str.expandtabs(), replaces tab (\t) characters with spaces to align on multiples of tabsize.

Return type

bistr
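
Tabs expand to the next column that is a multiple of tabsize, not to a fixed number of spaces. A short sketch with plain str.expandtabs():

```python
line = 'a\tbc\td'

# With tabsize=4: 'a' is followed by 3 spaces (to column 4),
# and 'bc' by 2 spaces (to column 8).
narrow = line.expandtabs(4)

# With the default tabsize=8: 7 spaces after 'a', 6 after 'bc'.
wide = line.expandtabs()

print(repr(narrow))  # 'a   bc  d'
print(len(wide))     # 17
```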

replace(old, new, count=None)

Like str.replace(), replaces occurrences of old with new.

Return type

bistr

sub(regex, repl)

Like re.sub(), replaces all matches of regex with the replacement repl.

Parameters
  • regex (Union[str, Pattern[str]]) – The regex to match. Can be a string pattern or a compiled regex.

  • repl (Union[str, Callable[[Match[str]], str]]) – The replacement to use. Can be a string, which is interpreted as in re.Match.expand(), or a callable, which will receive each match and return the replacement string.

Return type

bistr
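
The accepted forms of regex and repl mirror those of re.sub(). The sketch below uses the standard re module on plain strings to show each form; the sample text is made up for illustration.

```python
import re

text = 'width=10 height=20'

# String replacement with backreferences, expanded as in re.Match.expand().
swapped = re.sub(r'(\w+)=(\d+)', r'\2:\1', text)

# Callable replacement: receives each match and returns the new text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)

# A compiled pattern can be passed in place of a string pattern.
whitespace = re.compile(r'\s+')
compact = re.sub(whitespace, ' ', 'a   b\t\tc')

print(swapped)  # 10:width 20:height
print(doubled)  # width=20 height=40
print(compact)  # a b c
```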

strip(chars=None)

Like str.strip(), removes leading and trailing characters (whitespace by default).

Return type

bistr

lstrip(chars=None)

Like str.lstrip(), removes leading characters (whitespace by default).

Return type

bistr

rstrip(chars=None)

Like str.rstrip(), removes trailing characters (whitespace by default).

Return type

bistr
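
In str.strip() and its variants, chars is treated as a set of characters to remove, not as a prefix or suffix. A sketch with plain str, on the assumption that the bistr methods follow the same rule:

```python
# Default: whitespace is removed from both ends.
trimmed = '  hi  '.strip()

# Every leading or trailing character found in chars is removed,
# regardless of the order in which chars lists them.
domain = 'www.example.com'.strip('w.moc')

# lstrip() and rstrip() only touch one end.
left_only = 'xxhixx'.lstrip('x')
right_only = 'xxhixx'.rstrip('x')

print(trimmed)     # hi
print(domain)      # example
print(left_only)   # hixx
print(right_only)  # xxhi
```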

normalize(form)

Like unicodedata.normalize(), applies a Unicode normalization form. The choices for form are:

  • 'NFC': Canonical Composition

  • 'NFKC': Compatibility Composition

  • 'NFD': Canonical Decomposition

  • 'NFKD': Compatibility Decomposition

Return type

bistr
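
The four forms differ in whether they compose or decompose, and in whether compatibility characters are rewritten. A sketch using unicodedata.normalize() directly on plain strings:

```python
import unicodedata

composed = '\u00e9'     # 'é' as a single code point
decomposed = 'e\u0301'  # 'e' followed by a combining acute accent

# Canonical forms convert between the two spellings of 'é'.
nfc = unicodedata.normalize('NFC', decomposed)
nfd = unicodedata.normalize('NFD', composed)

# Compatibility forms also replace characters such as ligatures.
ligature = '\ufb01'  # LATIN SMALL LIGATURE FI
nfc_ligature = unicodedata.normalize('NFC', ligature)    # left unchanged
nfkc_ligature = unicodedata.normalize('NFKC', ligature)  # becomes 'fi'
nfkd_digraph = unicodedata.normalize('NFKD', '\u01c8')   # 'ǈ' becomes 'Lj'

print(nfc == composed)    # True
print(nfd == decomposed)  # True
print(nfkc_ligature)      # fi
print(nfkd_digraph)       # Lj
```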