bistr
- class bistring.bistr(original: Union[str, bistring._bistr.bistr], modified: Optional[str] = None, alignment: Optional[bistring._alignment.Alignment] = None)
Bases: object
A bidirectionally transformed string.
A bistr can be constructed from only a single string, which will give it identical original and modified strings and an identity alignment:
>>> s = bistr('test')
>>> s.original
'test'
>>> s.modified
'test'
>>> s.alignment
Alignment.identity(4)
You can also explicitly specify both the original and modified strings. The inferred alignment will be as coarse as possible:
>>> s = bistr('TEST', 'test')
>>> s.original
'TEST'
>>> s.modified
'test'
>>> s.alignment
Alignment([(0, 0), (4, 4)])
Finally, you can specify the alignment explicitly too, if you know it:
>>> s = bistr('TEST', 'test', Alignment.identity(4))
>>> s[1:3]
bistr('ES', 'es', Alignment.identity(2))
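One way to read the alignments shown above is as a sorted list of (original position, modified position) pairs. The following is a minimal sketch of that reading, not the library's implementation: the helper name `original_bounds` and the list-of-tuples representation are assumptions made for illustration. Under this model, an identity alignment maps a slice of the modified string to a tight slice of the original, while a coarse alignment can only map it back to the whole original string.

```python
from bisect import bisect_left, bisect_right


def original_bounds(alignment, start, end):
    """Map the modified-string slice [start, end) to original-string bounds.

    `alignment` is a hypothetical stand-in for an Alignment: a sorted list
    of (original_position, modified_position) pairs.
    """
    mod_positions = [m for _, m in alignment]
    # Last alignment point at or before the start of the slice.
    lo = bisect_right(mod_positions, start) - 1
    # First alignment point at or after the end of the slice.
    hi = bisect_left(mod_positions, end)
    return alignment[lo][0], alignment[hi][0]


identity = [(i, i) for i in range(5)]  # models Alignment.identity(4)
coarse = [(0, 0), (4, 4)]              # models Alignment([(0, 0), (4, 4)])

print(original_bounds(identity, 1, 3))  # (1, 3)
print(original_bounds(coarse, 1, 3))    # (0, 4)
```

This is why supplying a finer alignment, when you know it, lets slices carry a more precise piece of the original string.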
- alignment: bistring._alignment.Alignment
- classmethod infer(original, modified, cost_fn=None)
Create a bistr, automatically inferring an alignment between the original and modified strings.
This method can be useful if the modified string was produced by some method out of your control. If at all possible, you should start with bistr(original) and perform non-destructive operations to get to the modified string instead.
>>> s = bistr.infer('color', 'colour')
>>> print(s[0:3])
⮎'col'⮌
>>> print(s[3:5])
('o' ⇋ 'ou')
>>> print(s[5:6])
⮎'r'⮌
infer() tries to be intelligent about certain aspects of Unicode, which enables it to guess good alignments between strings that have been case-mapped, normalized, etc.:
>>> s = bistr.infer(
...     '🅃🄷🄴 🅀🅄🄸🄲🄺, 🄱🅁🄾🅆🄽 🦊 🄹🅄🄼🄿🅂 🄾🅅🄴🅁 🅃🄷🄴 🄻🄰🅉🅈 🐶',
...     'the quick brown fox jumps over the lazy dog',
... )
>>> print(s[0:3])
('🅃🄷🄴' ⇋ 'the')
>>> print(s[4:9])
('🅀🅄🄸🄲🄺' ⇋ 'quick')
>>> print(s[10:15])
('🄱🅁🄾🅆🄽' ⇋ 'brown')
>>> print(s[16:19])
('🦊' ⇋ 'fox')
Warning: this operation has time complexity O(N*M), where N and M are the lengths of the original and modified strings, and so should only be used for relatively short strings.
- Parameters
- Return type
- Returns
A bistr with the inferred alignment.
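To see where the O(N*M) cost comes from, consider a classic edit-distance computation, which fills one table cell per pair of positions in the two strings. This is only a sketch of the general technique under the assumption of unit edit costs; the function name `edit_distance` is hypothetical, and the cost function and algorithm that infer() actually uses are not described here.

```python
def edit_distance(original, modified):
    """Levenshtein distance with unit costs, computed row by row.

    The nested loops visit len(original) * len(modified) cells, which is
    the quadratic behaviour the warning above refers to.
    """
    n, m = len(original), len(modified)
    prev = list(range(m + 1))  # distances from the empty prefix
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if original[i - 1] == modified[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete from original
                curr[j - 1] + 1,     # insert into original
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[m]


print(edit_distance('color', 'colour'))  # 1
```

Doubling the length of both strings roughly quadruples the work, so long inputs are better handled by starting from bistr(original) and applying tracked operations.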
- __getitem__(index: int) → str
- __getitem__(index: slice) → bistring._bistr.bistr
Indexing a bistr returns the nth character of the modified string:
>>> s = bistr('TEST').lower()
>>> s[1]
'e'
Slicing a bistr extracts a substring, complete with the matching part of the original string:
>>> s = bistr('TEST').lower()
>>> s[1:3]
bistr('ES', 'es', Alignment.identity(2))
- inverse()
- Return type
- Returns
The inverse of this string, swapping the original and modified strings.
>>> s = bistr('HELLO WORLD').lower()
>>> s
bistr('HELLO WORLD', 'hello world', Alignment.identity(11))
>>> s.inverse()
bistr('hello world', 'HELLO WORLD', Alignment.identity(11))
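If an alignment is modelled as a list of (original position, modified position) pairs, inversion amounts to swapping each pair. The sketch below uses that simplified model; the `invert` helper and the list-of-tuples representation are assumptions for illustration, not the library's data structures.

```python
def invert(alignment):
    """Swap the roles of original and modified positions in each pair."""
    return [(modified, original) for original, modified in alignment]


# Hypothetical alignment for 'color' -> 'colour': 'o' maps to 'ou'.
alignment = [(0, 0), (3, 3), (4, 5), (5, 6)]

print(invert(alignment))  # [(0, 0), (3, 3), (5, 4), (6, 5)]

# Inverting twice gets back to the starting alignment.
print(invert(invert(alignment)) == alignment)  # True
```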
- count(sub, start=None, end=None)
Like str.count(), counts the occurrences of sub in the string.
- Return type
- find(sub, start=None, end=None)
Like str.find(), finds the position of sub in the string.
- Return type
- find_bounds(sub, start=None, end=None)
Like find(), but returns both the start and end bounds for convenience.
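The idea of returning both bounds can be sketched on a plain str. This is an illustrative helper, not the library's code: the name `find_bounds` is reused here for a free function, and returning (-1, -1) when nothing is found is an assumption made by analogy with str.find().

```python
def find_bounds(text, sub, start=None, end=None):
    """Return (i, j) such that text[i:j] == sub, or (-1, -1) if absent."""
    i = text.find(sub, start, end)
    if i == -1:
        return (-1, -1)  # assumed "not found" convention
    return (i, i + len(sub))


text = 'hello world'
i, j = find_bounds(text, 'world')
print((i, j))      # (6, 11)
print(text[i:j])   # world
print(find_bounds(text, 'xyz'))  # (-1, -1)
```

Having both bounds means the result can be used directly as a slice, without recomputing the length of the match.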
- rfind(sub, start=None, end=None)
Like str.rfind(), finds the last position of sub in the string, searching backwards.
- Return type
- rfind_bounds(sub, start=None, end=None)
Like rfind(), but returns both the start and end bounds for convenience.
- index(sub, start=None, end=None)
Like str.index(), finds the first position of sub in the string, otherwise raising a ValueError.
- Return type
- index_bounds(sub, start=None, end=None)
Like index(), but returns both the start and end bounds for convenience. If the substring is not found, a ValueError is raised.
- Return type
- Returns
The first i, j within [start, end) such that self[i:j] == sub.
- Raises
ValueError if the substring is not found.
- rindex(sub, start=None, end=None)
Like str.rindex(), finds the last position of sub in the string, otherwise raising a ValueError.
- Return type
- rindex_bounds(sub, start=None, end=None)
Like rindex(), but returns both the start and end bounds for convenience. If the substring is not found, a ValueError is raised.
- Return type
- Returns
The last i, j within [start, end) such that self[i:j] == sub.
- Raises
ValueError if the substring is not found.
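The contracts of index_bounds() and rindex_bounds() can be illustrated with plain-str equivalents. These free functions are hypothetical stand-ins written for this example; they rely only on str.index() and str.rindex(), which raise ValueError when the substring is absent.

```python
def index_bounds(text, sub, start=None, end=None):
    """First (i, j) with text[i:j] == sub; raises ValueError if absent."""
    i = text.index(sub, start, end)
    return i, i + len(sub)


def rindex_bounds(text, sub, start=None, end=None):
    """Last (i, j) with text[i:j] == sub; raises ValueError if absent."""
    i = text.rindex(sub, start, end)
    return i, i + len(sub)


text = 'the cat sat on the mat'
print(index_bounds(text, 'the'))   # (0, 3)
print(rindex_bounds(text, 'the'))  # (15, 18)

try:
    index_bounds(text, 'dog')
except ValueError:
    print('not found')
```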
- startswith(prefix, start=None, end=None)
Like str.startswith(), checks if the string starts with the given prefix.
- Return type
- endswith(suffix, start=None, end=None)
Like str.endswith(), checks if the string ends with the given suffix.
- Return type
- join(iterable)
Like str.join(), concatenates many (bi)strings together.
- Return type
- split(sep=None, maxsplit=-1)
Like str.split(), splits this string on a separator.
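Because each piece of a split bistr is still a substring of the whole, it can remember where it came from. The sketch below shows the same idea on a plain str by returning each whitespace-separated token together with its offsets. The helper `split_with_spans` is hypothetical, and the `\S+` pattern only approximates the whitespace rules of str.split().

```python
import re


def split_with_spans(text):
    """Return (start, end, token) for each run of non-whitespace."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r'\S+', text)]


text = 'the quick  fox'
for start, end, token in split_with_spans(text):
    # Each token can be recovered by slicing the unsplit text.
    assert text[start:end] == token
    print(start, end, token)
```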
- partition(sep)
Like str.partition(), splits this string into three chunks on a separator.
- rpartition(sep)
Like str.rpartition(), splits this string into three chunks on a separator, searching from the end.
- center(width, fillchar=' ')
Like str.center(), pads the start and end of the string to center it.
- Return type
- ljust(width, fillchar=' ')
Like str.ljust(), pads the end of the string to a fixed width.
- Return type
- rjust(width, fillchar=' ')
Like str.rjust(), pads the start of the string to a fixed width.
- Return type
- casefold()
Computes the case folded form of this string. Case folding is used for case-insensitive operations, and the result may not be suitable for displaying to a user. For example:
>>> s = bistr('straße').casefold()
>>> s.modified
'strasse'
>>> s[4:6]
bistr('ß', 'ss')
- Return type
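The built-in str.casefold() shows why case folding is useful for comparisons and why it needs alignment tracking: the folded text can differ in length from the input. This sketch uses only the standard library and does not involve bistr itself.

```python
original = 'straße'
folded = original.casefold()

print(folded)                       # strasse
print(len(original), len(folded))   # 6 7

# Folding makes case-insensitive comparison work where lower() does not.
print('STRASSE'.lower() == original.lower())        # False
print('STRASSE'.casefold() == original.casefold())  # True
```

With a plain str, the extra character shifts every later offset; keeping an alignment is what lets a slice such as s[4:6] map back to the single 'ß'.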
- lower(locale=None)
Converts this string to lowercase. Unless you specify the locale parameter, the current system locale will be used.
>>> bistr('HELLO WORLD').lower()
bistr('HELLO WORLD', 'hello world', Alignment.identity(11))
>>> bistr('I').lower('en_US')
bistr('I', 'i')
>>> bistr('I').lower('tr_TR')
bistr('I', 'ı')
- Return type
- upper(locale=None)
Converts this string to uppercase. Unless you specify the locale parameter, the current system locale will be used.
>>> bistr('hello world').upper()
bistr('hello world', 'HELLO WORLD', Alignment.identity(11))
>>> bistr('i').upper('en_US')
bistr('i', 'I')
>>> bistr('i').upper('tr_TR')
bistr('i', 'İ')
- Return type
- title(locale=None)
Converts this string to title case. Unless you specify the locale parameter, the current system locale will be used.
>>> bistr('hello world').title()
bistr('hello world', 'Hello World', Alignment.identity(11))
>>> bistr('istanbul').title('en_US')
bistr('istanbul', 'Istanbul', Alignment.identity(8))
>>> bistr('istanbul').title('tr_TR')
bistr('istanbul', 'İstanbul', Alignment.identity(8))
- Return type
- capitalize(locale=None)
Capitalizes the first character of this string, and lowercases the rest. Unless you specify the locale parameter, the current system locale will be used.
>>> bistr('hello WORLD').capitalize()
bistr('hello WORLD', 'Hello world', Alignment.identity(11))
>>> bistr('ἴΣ').capitalize('el_GR')
bistr('ἴΣ', 'Ἴς', Alignment.identity(2))
- Return type
- swapcase(locale=None)
Swaps the case of every letter in this string. Unless you specify the locale parameter, the current system locale will be used.
>>> bistr('hello WORLD').swapcase()
bistr('hello WORLD', 'HELLO world', Alignment.identity(11))
Some Unicode characters, such as title-case ligatures and digraphs, don’t have a case-swapped equivalent:
>>> bistr('ǈepòta').swapcase('hr_HR')
bistr('ǈepòta', 'ǈEPÒTA', Alignment.identity(6))
In these cases, compatibility normalization may help:
>>> s = bistr('ǈepòta')
>>> s = s.normalize('NFKC')
>>> s = s.swapcase('hr_HR')
>>> print(s)
('ǈepòta' ⇋ 'lJEPÒTA')
- Return type
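The effect of compatibility normalization on a title-case digraph can be checked with the standard library alone. This sketch uses unicodedata and the locale-independent str.swapcase(), so it only approximates the locale-aware behaviour described above; the escape U+01C8 is the single-character title-case digraph 'ǈ'.

```python
import unicodedata

word = '\u01c8ep\u00f2ta'  # title-case digraph followed by 'epòta'
normalized = unicodedata.normalize('NFKC', word)

# NFKC replaces the digraph with two separate letters, 'L' and 'j'.
print(len(word), len(normalized))  # 6 7
print(normalized[:2])              # Lj

# With separate letters, each one has a case-swapped counterpart.
print(normalized.swapcase())       # lJEPÒTA
```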
- expandtabs(tabsize=8)
Like str.expandtabs(), replaces tab (\t) characters with spaces to align on multiples of tabsize.
- Return type
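Each tab expands to a different number of spaces depending on the column where it occurs, which is why the replacement cannot be a fixed string. The helper below is a hypothetical sketch that computes those widths for a plain str, assuming tabsize is positive; it is checked against the built-in str.expandtabs().

```python
def tab_widths(text, tabsize=8):
    """Return the number of spaces each tab in `text` expands to."""
    widths = []
    column = 0
    for ch in text:
        if ch == '\t':
            # Pad up to the next multiple of tabsize.
            width = tabsize - column % tabsize
            widths.append(width)
            column += width
        elif ch in '\r\n':
            column = 0  # a new line restarts the column count
        else:
            column += 1
    return widths


text = 'a\tbc\td'
print(tab_widths(text, 4))        # [3, 2]
print(repr(text.expandtabs(4)))   # 'a   bc  d'
```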
- replace(old, new, count=None)
Like str.replace(), replaces occurrences of old with new.
- Return type
- sub(regex, repl)
Like re.sub(), replaces all matches of regex with the replacement repl.
- Parameters
regex (Union[str, Pattern[str]]) – The regex to match. Can be a string pattern or a compiled regex.
repl (Union[str, Callable[[Match[str]], str]]) – The replacement to use. Can be a string, which is interpreted as in re.Match.expand(), or a callable, which will receive each match and return the replacement string.
- Return type
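The two forms of repl mirror those of the standard library's re.sub(). The sketch below demonstrates both on a plain str, so it shows the replacement semantics only and not the alignment tracking that bistr adds; the helper name `shout` is made up for the example.

```python
import re

# String form: backreferences are expanded as in re.Match.expand().
pattern = re.compile(r'(\w+)@(\w+)')
print(pattern.sub(r'\2 at \1', 'user@example'))  # example at user

match = pattern.search('user@example')
print(match.expand(r'\2 at \1'))                 # example at user


# Callable form: receives each match and returns the replacement text.
def shout(m):
    return m.group(0).upper()


print(re.sub(r'\b\w{5}\b', shout, 'hello big world'))  # HELLO big WORLD
```

A compiled pattern and a string pattern behave the same way here; compiling is mainly useful when the same regex is applied repeatedly.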
- strip(chars=None)
Like str.strip(), removes leading and trailing characters (whitespace by default).
- Return type
- lstrip(chars=None)
Like str.lstrip(), removes leading characters (whitespace by default).
- Return type
- rstrip(chars=None)
Like str.rstrip(), removes trailing characters (whitespace by default).
- Return type
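Stripping can be thought of as choosing a pair of bounds and slicing, which is what makes it easy to keep track of what was removed. The following plain-str sketch computes those bounds; `strip_bounds` is a hypothetical helper, not part of the library.

```python
def strip_bounds(text, chars=None):
    """Return (start, end) such that text[start:end] == text.strip(chars)."""
    start = len(text) - len(text.lstrip(chars))
    end = len(text.rstrip(chars))
    # If everything is stripped, collapse to an empty range.
    return start, max(start, end)


text = '  hi  '
start, end = strip_bounds(text)
print((start, end))            # (2, 4)
print(repr(text[start:end]))   # 'hi'
print(strip_bounds('xxhixx', 'x'))  # (2, 4)
```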
- normalize(form)
Like unicodedata.normalize(), applies a Unicode normalization form. The choices for form are:
'NFC': Canonical Composition
'NFKC': Compatibility Composition
'NFD': Canonical Decomposition
'NFKD': Compatibility Decomposition
- Return type
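The difference between the four forms can be seen with unicodedata.normalize() on plain strings. This is a standard-library sketch that does not use bistr; it shows that canonical forms only compose or decompose equivalent sequences, while compatibility forms also replace characters such as ligatures.

```python
import unicodedata

decomposed = 'e\u0301'  # 'e' followed by a combining acute accent
composed = '\u00e9'     # precomposed 'é'
ligature = '\ufb01'     # the 'fi' ligature

# Canonical composition and decomposition are inverses for this pair.
print(unicodedata.normalize('NFC', decomposed) == composed)  # True
print(unicodedata.normalize('NFD', composed) == decomposed)  # True

# Canonical forms leave the ligature alone; compatibility forms expand it.
print(unicodedata.normalize('NFC', ligature) == ligature)    # True
print(unicodedata.normalize('NFKC', ligature))               # fi
print(unicodedata.normalize('NFKD', ligature))               # fi
```

Since normalization can change the number of characters, tracking the alignment is what allows slices of the normalized text to map back to the original.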