Alignment
- class bistring.Alignment(values)
Bases:
object
An alignment between two related sequences.
Consider this alignment between two strings:
|it's| |aligned!| | \ \ | |it is| |aligned|
An alignment stores all the indices that are known to correspond between the original and modified sequences. For the above example, it would be
>>> a = Alignment([ ... (0, 0), ... (4, 5), ... (5, 6), ... (13, 13), ... ])
Alignments can be used to answer questions like, “what’s the smallest range of the original sequence that is guaranteed to contain this part of the modified sequence?” For example, the range
(0, 5)
(“it is”) is known to match the range(0, 4)
(“it’s”) of the original sequence:>>> a.original_bounds(0, 5) (0, 4)
Results may be imprecise if the alignment is too course to match the exact inputs:
>>> a.original_bounds(0, 2) (0, 4)
A more granular alignment like this:
|i|t|'s| |a|l|i|g|n|e|d|!| | | | \ \ \ \ \ \ \ \ \ / |i|t| is| |a|l|i|g|n|e|d|
>>> a = Alignment([ ... (0, 0), (1, 1), (2, 2), (4, 5), (5, 6), (6, 7), (7, 8), ... (8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 13), ... ])
Can be more precise:
>>> a.original_bounds(0, 2) (0, 2)
- Parameters
values (
Iterable
[Tuple
[int
,int
]]) – The sequence of aligned indices. Each element should be a tuple(x, y)
, where x is the original sequence position and y is the modified sequence position.
- classmethod identity(__length: int) bistring._alignment.Alignment
- classmethod identity(__start: int, __stop: int) bistring._alignment.Alignment
- classmethod identity(__bounds: Union[range, slice, Tuple[int, int]]) bistring._alignment.Alignment
Create an identity alignment, which maps all intervals to themselves. You can pass the size of the sequence:
>>> Alignment.identity(5) Alignment.identity(5)
or the start and end positions:
>>> Alignment.identity(1, 5) Alignment.identity(1, 5)
or a range-like object (
range
,slice
, orTuple[int, int]
):>>> Alignment.identity(range(1, 5)) Alignment.identity(1, 5)
- Return type
- classmethod infer(original, modified, cost_fn=None)
Infer the alignment between two sequences with the lowest edit distance.
>>> Alignment.infer('color', 'color') Alignment.identity(5) >>> a = Alignment.infer('color', 'colour') >>> # 'ou' -> 'o' >>> a.original_bounds(3, 5) (3, 4)
Warning: this operation has time complexity
O(N*M)
, where N and M are the lengths of the original and modified sequences, and so should only be used for relatively short sequences.- Parameters
cost_fn (
Optional
[Callable
[[Optional
[TypeVar
(T
)],Optional
[TypeVar
(U
)]],Union
[int
,float
]]]) – A function returning the cost of performing an edit.cost_fn(a, b)
returns the cost of replacing a with b.cost_fn(a, None)
returns the cost of deleting a, andcost_fn(None, b)
returns the cost of inserting b. By default, all operations have cost 1 except replacing identical elements, which has cost 0.
- Return type
- Returns
The inferred alignment.
- __getitem__(index: int) Tuple[int, int]
- __getitem__(index: slice) bistring._alignment.Alignment
Indexing an alignment returns the nth pair of aligned positions:
>>> a = Alignment.identity(5) >>> a[3] (3, 3)
Slicing an alignment returns a new alignment with a subrange of its values:
>>> a[1:5] Alignment.identity(1, 4)
- shift(delta_o, delta_m)
Shift this alignment.
- original_bounds(*args)
Maps a subrange of the modified sequence to the original sequence. Can be called with either two arguments:
>>> a = Alignment.identity(5).shift(1, 0) >>> a.original_bounds(1, 3) (2, 4)
or with a range-like object:
>>> a.original_bounds(range(1, 3)) (2, 4)
With no arguments, returns the bounds of the entire original sequence:
>>> a.original_bounds() (1, 6)
- original_range(*args)
Like
original_bounds()
, but returns arange
.- Return type
- original_slice(*args)
Like
original_bounds()
, but returns aslice
.- Return type
- modified_bounds(*args)
Maps a subrange of the original sequence to the modified sequence. Can be called with either two arguments:
>>> a = Alignment.identity(5).shift(1, 0) >>> a.modified_bounds(2, 4) (1, 3)
or with a range-like object:
>>> a.modified_bounds(range(2, 4)) (1, 3)
With no arguments, returns the bounds of the entire modified sequence:
>>> a.modified_bounds() (0, 5)
- modified_range(*args)
Like
modified_bounds()
, but returns arange
.- Return type
- modified_slice(*args)
Like
modified_bounds()
, but returns arange
.- Return type
- slice_by_original(*args)
Slice this alignment by a span of the original sequence.
>>> a = Alignment.identity(5).shift(1, 0) >>> a.slice_by_original(2, 4) Alignment([(2, 1), (3, 2), (4, 3)])
- Return type
- Returns
The slice of this alignment that corresponds with the given span of the original sequence.
- slice_by_modified(*args)
Slice this alignment by a span of the modified sequence.
>>> a = Alignment.identity(5).shift(1, 0) >>> a.slice_by_modified(1, 3) Alignment([(2, 1), (3, 2), (4, 3)])
- Return type
- Returns
The slice of this alignment that corresponds with the given span of the modified sequence.
- compose(other)
- Return type
- Returns
A new alignment equivalent to applying this one first, then the other.