Alignment

class bistring.Alignment(values)

Bases: object

An alignment between two related sequences.

Consider this alignments between two strings:

|it's| |aligned!|
|    \ \        |
|it is| |aligned|

An alignment stores all the indices that are known to correspond between the original and modified sequences. For the above example, it would be

>>> a = Alignment([
...     (0, 0),
...     (4, 5),
...     (5, 6),
...     (13, 13),
... ])

Alignments can be used to answer questions like, “what’s the smallest range of the original sequence that is guaranteed to contain this part of the modified sequence?” For example, the range (0, 5) (“it is”) is known to match the range (0, 4) (“it’s”) of the original sequence:

>>> a.original_bounds(0, 5)
(0, 4)

Results may be imprecise if the alignment is too course to match the exact inputs:

>>> a.original_bounds(0, 2)
(0, 4)

A more granular alignment like this:

|i|t|'s| |a|l|i|g|n|e|d|!|
| | |  \ \ \ \ \ \ \ \ \ /
|i|t| is| |a|l|i|g|n|e|d|
>>> a = Alignment([
...     (0, 0), (1, 1), (2, 2), (4, 5), (5, 6), (6, 7), (7, 8),
...     (8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 13),
... ])

Can be more precise:

>>> a.original_bounds(0, 2)
(0, 2)
Parameters

values (Iterable[Tuple[int, int]]) – The sequence of aligned indices. Each element should be a tuple (x, y), where x is the original sequence position and y is the modified sequence position.

classmethod identity(*args)

Create an identity alignment, which maps all intervals to themselves. You can pass the size of the sequence:

>>> Alignment.identity(5)
Alignment.identity(5)

or the start and end positions:

>>> Alignment.identity(1, 5)
Alignment.identity(1, 5)

or a range-like object (range, slice, or Tuple[int, int]):

>>> Alignment.identity(range(1, 5))
Alignment.identity(1, 5)
Return type

Alignment

classmethod infer(original, modified, cost_fn=None)

Infer the alignment between two sequences with the lowest edit distance.

>>> Alignment.infer('color', 'color')
Alignment.identity(5)
>>> a = Alignment.infer('color', 'colour')
>>> # 'ou' -> 'o'
>>> a.original_bounds(3, 5)
(3, 4)

Warning: this operation has time complexity O(N*M), where N and M are the lengths of the original and modified sequences, and so should only be used for relatively short sequences.

Parameters
  • original (Sequence[~T]) – The original sequence.

  • modified (Sequence[~U]) – The modified sequence.

  • cost_fn (Optional[Callable[[Optional[~T], Optional[~U]], Union[int, float]]]) – A function returning the cost of performing an edit. cost_fn(a, b) returns the cost of replacing a with b. cost_fn(a, None) returns the cost of deleting a, and cost_fn(None, b) returns the cost of inserting b. By default, all operations have cost 1 except replacing identical elements, which has cost 0.

Return type

Alignment

Returns

The inferred alignment.

__getitem__(index)

Indexing an alignment returns the nth pair of aligned positions:

>>> a = Alignment.identity(5)
>>> a[3]
(3, 3)

Slicing an alignment returns a new alignment with a subrange of its values:

>>> a[1:5]
Alignment.identity(1, 4)
Return type

Union[Tuple[int, int], Alignment]

shift(delta_o, delta_m)

Shift this alignment.

Parameters
  • delta_o (int) – The distance to shift the original sequence.

  • delta_m (int) – The distance to shift the modified sequence.

Return type

Alignment

Returns

An alignment with all the positions shifted by the given amounts.

original_bounds(*args)

Maps a subrange of the modified sequence to the original sequence. Can be called with either two arguments:

>>> a = Alignment.identity(5).shift(1, 0)
>>> a.original_bounds(1, 3)
(2, 4)

or with a range-like object:

>>> a.original_bounds(range(1, 3))
(2, 4)

With no arguments, returns the bounds of the entire original sequence:

>>> a.original_bounds()
(1, 6)
Return type

Tuple[int, int]

Returns

The corresponding bounds in the original sequence.

original_range(*args)

Like original_bounds(), but returns a range.

Return type

range

original_slice(*args)

Like original_bounds(), but returns a slice.

Return type

slice

modified_bounds(*args)

Maps a subrange of the original sequence to the modified sequence. Can be called with either two arguments:

>>> a = Alignment.identity(5).shift(1, 0)
>>> a.modified_bounds(2, 4)
(1, 3)

or with a range-like object:

>>> a.modified_bounds(range(2, 4))
(1, 3)

With no arguments, returns the bounds of the entire modified sequence:

>>> a.modified_bounds()
(0, 5)
Return type

Tuple[int, int]

Returns

The corresponding bounds in the modified sequence.

modified_range(*args)

Like modified_bounds(), but returns a range.

Return type

range

modified_slice(*args)

Like modified_bounds(), but returns a range.

Return type

slice

slice_by_original(*args)

Slice this alignment by a span of the original sequence.

>>> a = Alignment.identity(5).shift(1, 0)
>>> a.slice_by_original(2, 4)
Alignment([(2, 1), (3, 2), (4, 3)])
Return type

Alignment

Returns

The slice of this alignment that corresponds with the given span of the original sequence.

slice_by_modified(*args)

Slice this alignment by a span of the modified sequence.

>>> a = Alignment.identity(5).shift(1, 0)
>>> a.slice_by_modified(1, 3)
Alignment([(2, 1), (3, 2), (4, 3)])
Return type

Alignment

Returns

The slice of this alignment that corresponds with the given span of the modified sequence.

compose(other)
Return type

Alignment

Returns

A new alignment equivalent to applying this one first, then the other.

inverse()
Return type

Alignment

Returns

The inverse of this alignment, from the modified to the original sequence.