Mainly Tech projects on Python and Electronic Design Automation.

Saturday, August 19, 2023

OEISify

Best read on larger than a phone screen

The title is a rather mean pun on ossify, but it stuck.

 

I had been creating series of integers from prior work, and had been looking them up, manually, on

Work progressed and I was creating a table of related sequences and looked to automate the OEIS searches.
I found a description of the textual internal format here which seemed steraight-forward and "awkable" - meaning that the format is written in a way that a short awk program could parse. I did find a couple of old python libraries that said they did this, but I wanted to flex my parsing skills and this seemed an enticing.

Mile high looking down

Checking that internal format, the example result seem to be of the form:

%<TAG> A<number> <value>

To show search results in the internal format you need to make a slight change to a url:

https://oeis.org/search?fmt=text&q=3%2C11%2C26%2C50%2C85 (test)

Reading the internal format in more detail 

I saw that the %S %T and %U lines are all giving members of the sequence - there seems to be no reason to keep a distinction, I could parse and return all the values in order.
 
The %p, %t, %o lines when you concatenate all of the same type, give program sources in three different languages. They can be treated similarly.
 
the %k keywords are best returned as a list of strings.
 
The other keywords, if they appear more than once in a sequence definition then subsequent values can append as new lines, accumulating a final value.

Extra Info given

When you look at a search result in text format you get a Search: and Showing line I might want to return this data and maybe also derive a Next entry from the details of Showing that is what is needed to generate the next page of results for the given search.

The Regexp

Yep, I started with a regexp, so put a lot of sample text into regex101 as TEST STRING then generated the regexp used in the program. 

The Output

A Python dict with all string keys. Each found A-series will be a nested dict of keys corresponding to the %-letter tags.
Keys Search, Showing and Next are about the search and so appear in the top dict.
 

The Code: oeis_request.py

 
#!/bin/env python3
# -*- coding: utf-8 -*-
"""

Module to search and return sequences from:
    The On-Line Encyclopedia of Integer Sequences® (OEIS®)
    https://oeis.org/
    https://oeis.org/eishelp1.html

Standalone Usage:
    oeis_request.py <query> <start>
    oeis_request.py --help
    oeis_request.py --test


Created on Sat Aug 12 07:57:53 2023

@author: paddy3118, paddy3118.blogspot.com

"""

from pprint import pp
import re
import ssl
import sys
from typing import Any
import urllib.request
import urllib.parse
import json  # pylint: disable=W0611


URL = 'https://oeis.org/search'

DBG = not True

finditer = re.compile(r"""
    # https://regex101.com/r/iNL0ve/2

    (?: Search: \s+ (?P<SEARCH> .* ))
    | (?: (?P<SHOWING> ^Showing\s.*))
    | (?: ^% (?P<_TAG>  (?P<SEQ>  [STU] ) |  (?P<PROG>  [pto] ) |  (?P<REST> [A-Za-z]) ) \s+
             (?P<ANUMBER> A\d+) \s+ (?P<VALUE> .*) )
    """, re.MULTILINE | re.VERBOSE).finditer

OEISTYPE = dict[str, str | dict[str, str | list[int | str]]]

def oeis_request(query: str, start: int=0) -> OEISTYPE:
    """
    Search OEIS for query returning a page of results from position start as a Python dict


    Parameters
    ----------
    query : str
        The OEIS query in a format consistent with https://oeis.org/hints.html.
    start : int, optional
        page offset when multiple pages of results are available. The default is 0.

    Returns
    -------
    dict[str,                   # key0, top level keys: SEARCH, SHOWING, NEXT
         str                    # string value,
         | dict[                # Or a nested dict for each sequence definition:
                str,                # key1, sequence level string keys,
                str |               # And either a string value,
                list[int            # A list for the int Sequence,
                     | str]]]       # or a list of the str Keywords; as value.

        Returned Python datastructure representing the (page of) sequences found.
        Top level keys are the 'SEARCH' and what results are 'SHOWING', together
        with the sequence numbers of returned sequences like 'A123456' - which
        all have one sub-dict as its value. Top level key 'NEXT', if present,
        has as its value the query and start value pre-computed to retrieve the
        next page of results from oeis_request.

        All possible keys of a sequence sub-dict are the values of global variable
        RECORD_MAP
    """

    # https://stackoverflow.com/a/60671292/10562
    # Security risk as ssl cert checking is skipped for this function
    # pylint: disable=protected-access
    tmp, ssl._create_default_https_context = ssl._create_default_https_context, \
                                             ssl._create_unverified_context
    # pylint: enable=protected-access

    data = {}
    data['fmt'] = 'text'
    data['q'] = query
    if start:
        data['start'] = start
    url_values = urllib.parse.urlencode(data)
    url = URL + '?' + url_values
    if DBG:
        sys.stderr.write(url+'\n')
        sys.stderr.flush()
    with urllib.request.urlopen(url) as req:
        txt =req.read().decode('utf8')

    # Restored:
    ssl._create_default_https_context = tmp    # pylint: disable=protected-access

    return oeis_parser(txt)

def oeis_parser(text: str) -> dict[str, Any]:  # pylint: disable=too-many-branches
    """
    Parse text formatted like https://oeis.org/eishelp1.html

    Parameters
    ----------
    text : str
        text returned from OEIS search.

    Returns
    -------
    dict[str, Any]
        parsed data

    """
    data = {}

    for matchobj in finditer(text):
        group = matchobj.groupdict()
        try:
            anumber = group['ANUMBER']
            if anumber:  # Skip None
                adict = data.setdefault(anumber, {})
        except KeyError:
            anumber = ''
        try:
            value = group['VALUE']
        except KeyError:
            value = ''

        if val:=group[key0:='SEARCH']:
            data[key0] = val
        elif val:=group[key0:='SHOWING']:
            data[key0] = val

        elif val:=group[key1:='SEQ']:
            # Concat all %S $T and %U lines as list[str]
            adict.setdefault(key1, []).extend(int(v) for v in
                                             value.strip().strip(',').split(','))
        elif val:=group['PROG']:
            # Concat all individual %p %t %o lines as list[str]
            adict.setdefault(RECORD_MAP[val], []).append(value.strip())
        elif (val:=group['REST']) and val == 'K':
            # Split and concat Keywords, %K lines as list[str]
            adict.setdefault(RECORD_MAP[val], []).extend(value.strip().split(','))
        elif val:=group['REST']:
            # Concat other %_ lines as list[str]
            adict.setdefault(RECORD_MAP[val], []).append(value.strip())
        else:
            # Should never arrive here from the regexp!
            assert False, f"Got {group = }"

    # fixup
    for key, subd in data.items():
        if key[0] == 'A' and issubclass(type(subd), dict):
            for record, value in subd.items():
                if record != 'Keywords' and issubclass(type(value), list) \
                   and issubclass(type(value[0]), str):
                    subd[record] = '\n'.join(value)

    next_from_showing(data)

    return data

def next_from_showing(parsed: OEISTYPE)-> None:
    """
    Parse "SHOWING" key to generate "NEXT" value that would request the next page

    If already at the end then no next key is in data
    parsed is altered in-place

    Parameters
    ----------
    parsed : OEISTYPE
        Parsed data to insert a NEXT key/value.

    Returns
    -------
    None.
        argument updated in-place

    """
    matchobj = re.match(r"^Showing\s+(\d+)-(\d+)\s+of\s+(\d+)\s*$", parsed.get('SHOWING', ''))
    if matchobj:
        _start, stop, end = [int(x) for x in matchobj.groups()]
        if stop < end:
            search = parsed['SEARCH'].split(':', 1)[-1]
            parsed['NEXT'] = [search, stop]
            return

    if 'NEXT' in parsed:
        del parsed['NEXT']


record_info = """
%I A000001 Identification line (required)
%S A000001 First line of sequence (required)
%T A000001 2nd line of sequence.
%U A000001 3rd line of sequence.
%N A000001 Name (required)
%D A000001 Reference Detailed reference line.
%H A000001 Link to other site.
%F A000001 Formula .
%Y A000001 Cross-references to other sequences.
%A A000001 Author (required)
%O A000001 Offset (required)
%E A000001 Etc Extensions, errors, etc.
%e A000001 Examples examples to illustrate initial terms.
%p A000001 Maple program.
%t A000001 Mathematica program.
%o A000001 OtherProgram in another language.
%K A000001 Keywords (required)
%C A000001 Comments.
""".strip().splitlines()

RECORD_MAP = {words[0][1]: words[2]
              for line in record_info
              for words in [line.split()]}
# pp(RECORD_MAP, sort_dicts=False)

HELP = f"""\
{__doc__}

function oeis_request
=====================

{oeis_request.__doc__}
"""

if __name__ == '__main__':
    if '--help' in sys.argv:
        print(HELP)
        sys.exit(0)

    if '--test' in sys.argv[1:]:
        _req, _start = '1,2,3,4,5,6,6,7,7,8', 0
    elif len(sys.argv) == 3:
        _req, _start = sys.argv[1:]  # pylint: disable=W0632
    else:
        print(HELP)
        sys.exit(1)

    _data = oeis_request(_req, _start)  # pylint: disable=E0601
    pp(_data, width=512, sort_dicts=False)
    #print(json.dumps(_data, indent=2))


 

Test output

$ ./oeis_request.py 3,11,26,50,85 0
{'SEARCH': 'seq:3,11,26,50,85',
 'SHOWING': 'Showing 1-1 of 1',
 'A051925': {'Identification': '#84 Jun 26 2022 03:06:23',
             'SEQ': [0, 0, 3, 11, 26, 50, 85, 133, 196, 276, 375, 495, 638, 806, 1001, 1225, 1480, 1768, 2091, 2451, 2850, 3290, 3773, 4301, 4876, 5500, 6175, 6903, 7686, 8526, 9425, 10385, 11408, 12496, 13651, 14875, 16170, 17538, 18981, 20501, 22100, 23780],
             'Name': 'a(n) = n*(2*n+5)*(n-1)/6.',
             'Comments.': 'Related to variance of number of inversions of a random permutation of n letters.\n'
                          'Zero followed by partial sums of A005563. - _Klaus Brockhaus_, Oct 17 2008\n'
                          'a(n)/12 is the variance of the number of inversions of a random permutation of n letters. See evidence in Mathematica code below. - _Geoffrey Critzer_, May 15 2010\n'
                          'The sequence is related to A033487 by A033487(n-1) = n*a(n) - Sum_{i=0..n-1} a(i) = n*(n+1)*(n+2)*(n+3)/4. - _Bruno Berselli_, Apr 04 2012\n'
                          "Deleting the two 0's leaves row 2 of the convolution array A213750. - _Clark Kimberling_, Jun 20 2012\n"
                          'For n>=4, a(n-2) is the number of permutations of 1,2...,n with the distribution of up (1) - down (0) elements 0...0110 (the first n-4 zeros), or, the same, a(n-2) is up-down coefficient {n,6} (see comment in A060351). - _Vladimir Shevelev_, Feb 15 2014',
             'Reference': 'V. N. Sachkov, Probabilistic Methods in Combinatorial Analysis, Cambridge, 1997.',
             'Link': 'Vincenzo Librandi, <a href="/A051925/b051925.txt">Table of n, a(n) for n = 0..1000</a>\nJ. Wang and H. Li, <a href="http://dx.doi.org/10.1016/S0012-365X(01)00301-6">The upper bound of essential chromatic numbers of hypergraphs</a>, Discr. Math. 254 (2002), 555-564.\n<a href="/index/Rec#order_04">Index entries for linear recurrences with constant coefficients</a>, signature (4,-6,4,-1).',
             'Formula': 'a(n) = A000330(n) - n. - _Andrey Kostenko_, Nov 30 2008\nG.f.: x^2*(3-x)/(1-x)^4. - _Colin Barker_, Apr 04 2012\na(n) = 4*a(n-1) - 6*a(n-2) + 4*a(n-3) - a(n-4). - _Vincenzo Librandi_, Apr 27 2012\nE.g.f.: (x^2/6)*(2*x + 9)*exp(x). - _G. C. Greubel_, Jul 19 2017',
             'Mathematica': 'f[{x_, y_}] := 2 y - x^2; Table[f[Coefficient[ Series[Product[Sum[Exp[i t], {i, 0, m}], {m, 1, n - 1}]/n!, {t, 0, 2}], t, {1, 2}]], {n, 0, 41}]*12 (* _Geoffrey Critzer_, May 15 2010 *)\nCoefficientList[Series[x^2*(3-x)/(1-x)^4,{x,0,50}],x] (* _Vincenzo Librandi_, Apr 27 2012 *)',
             'OtherProgram': '(PARI) {print1(a=0, ","); for(n=0, 42, print1(a=a+(n+1)^2-1, ","))} \\\\ _Klaus Brockhaus_, Oct 17 2008\n(Magma) I:=[0, 0, 3, 11]; [n le 4 select I[n] else 4*Self(n-1)-6*Self(n-2)+4*Self(n-3)-Self(n-4): n in [1..50]]; // _Vincenzo Librandi_, Apr 27 2012',
             'Cross-references': 'Cf. A000330, A005563, A033487.',
             'Keywords': ['nonn', 'easy'],
             'Offset': '0,3',
             'Author': '_N. J. A. Sloane_, Dec 19 1999'}}
$


A test with paged results

Highlighting the NEXT value of what is needed to get the next page of results

$ ./oeis_request.py 1,2,3,4,5 0
{'SEARCH': 'seq:1,2,3,4,5',
 'SHOWING': 'Showing 1-10 of 7513',
 'A000027': {'Identification': 'M0472 N0173 #637 Aug 14 2023 15:10:40',
             ...
 'A007953': {'Identification': '#280 Jun 18 2023 11:41:19',
             ...
 'A000961':
 ...
 'A002260': {'Identification': '#205 Feb 03 2023 18:43:52',
             ...
 'NEXT': ['1,2,3,4,5', 10]}
$

No comments:

Post a Comment

Followers

Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

whos.amung.us

Blog Archive