Best read on larger than a phone screen
The title is a rather mean pun on ossify, but it stuck.
I had been creating series of integers from prior work, and had been looking them up, manually, on
Work progressed and I was creating a table of related sequences and looked to automate the OEIS searches.
I found a description of the textual internal format here which seemed steraight-forward and "awkable" - meaning that the format is written in a way that a short awk program could parse. I did find a couple of old python libraries that said they did this, but I wanted to flex my parsing skills and this seemed an enticing.
Mile high looking down
Checking that internal format, the example result seem to be of the form:
%<TAG> A<number> <value>
To show search results in the internal format you need to make a slight change to a url:
https://oeis.org/search?fmt=text&q=3%2C11%2C26%2C50%2C85 (test)
Reading the internal format in more detail
I saw that the %S %T and %U lines are all giving members of the sequence - there seems to be no reason to keep a distinction, I could parse and return all the values in order.
The %p, %t, %o lines when you concatenate all of the same type, give program sources in three different languages. They can be treated similarly.
the %k keywords are best returned as a list of strings.
The other keywords, if they appear more than once in a sequence definition then subsequent values can append as new lines, accumulating a final value.
Extra Info given
When you look at a search result in text format you get a Search: and Showing line I might want to return this data and maybe also derive a Next entry from the details of Showing that is what is needed to generate the next page of results for the given search.
The Regexp
Yep, I started with a regexp, so put a lot of sample text into regex101 as TEST STRING then generated the regexp used in the program.
The Output
A Python dict with all string keys. Each found A-series will be a nested dict of keys corresponding to the %-letter tags.
Keys Search, Showing and Next are about the search and so appear in the top dict.
The Code: oeis_request.py
#!/bin/env python3
# -*- coding: utf-8 -*-
"""
Module to search and return sequences from:
The On-Line Encyclopedia of Integer Sequences® (OEIS®)
https://oeis.org/
https://oeis.org/eishelp1.html
Standalone Usage:
oeis_request.py <query> <start>
oeis_request.py --help
oeis_request.py --test
Created on Sat Aug 12 07:57:53 2023
@author: paddy3118, paddy3118.blogspot.com
"""
from pprint import pp
import re
import ssl
import sys
from typing import Any
import urllib.request
import urllib.parse
import json # pylint: disable=W0611
URL = 'https://oeis.org/search'
DBG = not True
finditer = re.compile(r"""
# https://regex101.com/r/iNL0ve/2
(?: Search: \s+ (?P<SEARCH> .* ))
| (?: (?P<SHOWING> ^Showing\s.*))
| (?: ^% (?P<_TAG> (?P<SEQ> [STU] ) | (?P<PROG> [pto] ) | (?P<REST> [A-Za-z]) ) \s+
(?P<ANUMBER> A\d+) \s+ (?P<VALUE> .*) )
""", re.MULTILINE | re.VERBOSE).finditer
OEISTYPE = dict[str, str | dict[str, str | list[int | str]]]
def oeis_request(query: str, start: int=0) -> OEISTYPE:
"""
Search OEIS for query returning a page of results from position start as a Python dict
Parameters
----------
query : str
The OEIS query in a format consistent with https://oeis.org/hints.html.
start : int, optional
page offset when multiple pages of results are available. The default is 0.
Returns
-------
dict[str, # key0, top level keys: SEARCH, SHOWING, NEXT
str # string value,
| dict[ # Or a nested dict for each sequence definition:
str, # key1, sequence level string keys,
str | # And either a string value,
list[int # A list for the int Sequence,
| str]]] # or a list of the str Keywords; as value.
Returned Python datastructure representing the (page of) sequences found.
Top level keys are the 'SEARCH' and what results are 'SHOWING', together
with the sequence numbers of returned sequences like 'A123456' - which
all have one sub-dict as its value. Top level key 'NEXT', if present,
has as its value the query and start value pre-computed to retrieve the
next page of results from oeis_request.
All possible keys of a sequence sub-dict are the values of global variable
RECORD_MAP
"""
# https://stackoverflow.com/a/60671292/10562
# Security risk as ssl cert checking is skipped for this function
# pylint: disable=protected-access
tmp, ssl._create_default_https_context = ssl._create_default_https_context, \
ssl._create_unverified_context
# pylint: enable=protected-access
data = {}
data['fmt'] = 'text'
data['q'] = query
if start:
data['start'] = start
url_values = urllib.parse.urlencode(data)
url = URL + '?' + url_values
if DBG:
sys.stderr.write(url+'\n')
sys.stderr.flush()
with urllib.request.urlopen(url) as req:
txt =req.read().decode('utf8')
# Restored:
ssl._create_default_https_context = tmp # pylint: disable=protected-access
return oeis_parser(txt)
def oeis_parser(text: str) -> dict[str, Any]: # pylint: disable=too-many-branches
"""
Parse text formatted like https://oeis.org/eishelp1.html
Parameters
----------
text : str
text returned from OEIS search.
Returns
-------
dict[str, Any]
parsed data
"""
data = {}
for matchobj in finditer(text):
group = matchobj.groupdict()
try:
anumber = group['ANUMBER']
if anumber: # Skip None
adict = data.setdefault(anumber, {})
except KeyError:
anumber = ''
try:
value = group['VALUE']
except KeyError:
value = ''
if val:=group[key0:='SEARCH']:
data[key0] = val
elif val:=group[key0:='SHOWING']:
data[key0] = val
elif val:=group[key1:='SEQ']:
# Concat all %S $T and %U lines as list[str]
adict.setdefault(key1, []).extend(int(v) for v in
value.strip().strip(',').split(','))
elif val:=group['PROG']:
# Concat all individual %p %t %o lines as list[str]
adict.setdefault(RECORD_MAP[val], []).append(value.strip())
elif (val:=group['REST']) and val == 'K':
# Split and concat Keywords, %K lines as list[str]
adict.setdefault(RECORD_MAP[val], []).extend(value.strip().split(','))
elif val:=group['REST']:
# Concat other %_ lines as list[str]
adict.setdefault(RECORD_MAP[val], []).append(value.strip())
else:
# Should never arrive here from the regexp!
assert False, f"Got {group = }"
# fixup
for key, subd in data.items():
if key[0] == 'A' and issubclass(type(subd), dict):
for record, value in subd.items():
if record != 'Keywords' and issubclass(type(value), list) \
and issubclass(type(value[0]), str):
subd[record] = '\n'.join(value)
next_from_showing(data)
return data
def next_from_showing(parsed: OEISTYPE)-> None:
"""
Parse "SHOWING" key to generate "NEXT" value that would request the next page
If already at the end then no next key is in data
parsed is altered in-place
Parameters
----------
parsed : OEISTYPE
Parsed data to insert a NEXT key/value.
Returns
-------
None.
argument updated in-place
"""
matchobj = re.match(r"^Showing\s+(\d+)-(\d+)\s+of\s+(\d+)\s*$", parsed.get('SHOWING', ''))
if matchobj:
_start, stop, end = [int(x) for x in matchobj.groups()]
if stop < end:
search = parsed['SEARCH'].split(':', 1)[-1]
parsed['NEXT'] = [search, stop]
return
if 'NEXT' in parsed:
del parsed['NEXT']
record_info = """
%I A000001 Identification line (required)
%S A000001 First line of sequence (required)
%T A000001 2nd line of sequence.
%U A000001 3rd line of sequence.
%N A000001 Name (required)
%D A000001 Reference Detailed reference line.
%H A000001 Link to other site.
%F A000001 Formula .
%Y A000001 Cross-references to other sequences.
%A A000001 Author (required)
%O A000001 Offset (required)
%E A000001 Etc Extensions, errors, etc.
%e A000001 Examples examples to illustrate initial terms.
%p A000001 Maple program.
%t A000001 Mathematica program.
%o A000001 OtherProgram in another language.
%K A000001 Keywords (required)
%C A000001 Comments.
""".strip().splitlines()
RECORD_MAP = {words[0][1]: words[2]
for line in record_info
for words in [line.split()]}
# pp(RECORD_MAP, sort_dicts=False)
HELP = f"""\
{__doc__}
function oeis_request
=====================
{oeis_request.__doc__}
"""
if __name__ == '__main__':
if '--help' in sys.argv:
print(HELP)
sys.exit(0)
if '--test' in sys.argv[1:]:
_req, _start = '1,2,3,4,5,6,6,7,7,8', 0
elif len(sys.argv) == 3:
_req, _start = sys.argv[1:] # pylint: disable=W0632
else:
print(HELP)
sys.exit(1)
_data = oeis_request(_req, _start) # pylint: disable=E0601
pp(_data, width=512, sort_dicts=False)
#print(json.dumps(_data, indent=2))
Test output
$ ./oeis_request.py 3,11,26,50,85 0
{'SEARCH': 'seq:3,11,26,50,85',
'SHOWING': 'Showing 1-1 of 1',
'A051925': {'Identification': '#84 Jun 26 2022 03:06:23',
'SEQ': [0, 0, 3, 11, 26, 50, 85, 133, 196, 276, 375, 495, 638, 806, 1001, 1225, 1480, 1768, 2091, 2451, 2850, 3290, 3773, 4301, 4876, 5500, 6175, 6903, 7686, 8526, 9425, 10385, 11408, 12496, 13651, 14875, 16170, 17538, 18981, 20501, 22100, 23780],
'Name': 'a(n) = n*(2*n+5)*(n-1)/6.',
'Comments.': 'Related to variance of number of inversions of a random permutation of n letters.\n'
'Zero followed by partial sums of A005563. - _Klaus Brockhaus_, Oct 17 2008\n'
'a(n)/12 is the variance of the number of inversions of a random permutation of n letters. See evidence in Mathematica code below. - _Geoffrey Critzer_, May 15 2010\n'
'The sequence is related to A033487 by A033487(n-1) = n*a(n) - Sum_{i=0..n-1} a(i) = n*(n+1)*(n+2)*(n+3)/4. - _Bruno Berselli_, Apr 04 2012\n'
"Deleting the two 0's leaves row 2 of the convolution array A213750. - _Clark Kimberling_, Jun 20 2012\n"
'For n>=4, a(n-2) is the number of permutations of 1,2...,n with the distribution of up (1) - down (0) elements 0...0110 (the first n-4 zeros), or, the same, a(n-2) is up-down coefficient {n,6} (see comment in A060351). - _Vladimir Shevelev_, Feb 15 2014',
'Reference': 'V. N. Sachkov, Probabilistic Methods in Combinatorial Analysis, Cambridge, 1997.',
'Link': 'Vincenzo Librandi, <a href="/A051925/b051925.txt">Table of n, a(n) for n = 0..1000</a>\nJ. Wang and H. Li, <a href="http://dx.doi.org/10.1016/S0012-365X(01)00301-6">The upper bound of essential chromatic numbers of hypergraphs</a>, Discr. Math. 254 (2002), 555-564.\n<a href="/index/Rec#order_04">Index entries for linear recurrences with constant coefficients</a>, signature (4,-6,4,-1).',
'Formula': 'a(n) = A000330(n) - n. - _Andrey Kostenko_, Nov 30 2008\nG.f.: x^2*(3-x)/(1-x)^4. - _Colin Barker_, Apr 04 2012\na(n) = 4*a(n-1) - 6*a(n-2) + 4*a(n-3) - a(n-4). - _Vincenzo Librandi_, Apr 27 2012\nE.g.f.: (x^2/6)*(2*x + 9)*exp(x). - _G. C. Greubel_, Jul 19 2017',
'Mathematica': 'f[{x_, y_}] := 2 y - x^2; Table[f[Coefficient[ Series[Product[Sum[Exp[i t], {i, 0, m}], {m, 1, n - 1}]/n!, {t, 0, 2}], t, {1, 2}]], {n, 0, 41}]*12 (* _Geoffrey Critzer_, May 15 2010 *)\nCoefficientList[Series[x^2*(3-x)/(1-x)^4,{x,0,50}],x] (* _Vincenzo Librandi_, Apr 27 2012 *)',
'OtherProgram': '(PARI) {print1(a=0, ","); for(n=0, 42, print1(a=a+(n+1)^2-1, ","))} \\\\ _Klaus Brockhaus_, Oct 17 2008\n(Magma) I:=[0, 0, 3, 11]; [n le 4 select I[n] else 4*Self(n-1)-6*Self(n-2)+4*Self(n-3)-Self(n-4): n in [1..50]]; // _Vincenzo Librandi_, Apr 27 2012',
'Cross-references': 'Cf. A000330, A005563, A033487.',
'Keywords': ['nonn', 'easy'],
'Offset': '0,3',
'Author': '_N. J. A. Sloane_, Dec 19 1999'}}
$
A test with paged results
Highlighting the NEXT value of what is needed to get the next page of results
$ ./oeis_request.py 1,2,3,4,5 0
{'SEARCH': 'seq:1,2,3,4,5',
'SHOWING': 'Showing 1-10 of 7513',
'A000027': {'Identification': 'M0472 N0173 #637 Aug 14 2023 15:10:40',
...
'A007953': {'Identification': '#280 Jun 18 2023 11:41:19',
...
'A000961':
...
'A002260': {'Identification': '#205 Feb 03 2023 18:43:52',
...
'NEXT': ['1,2,3,4,5', 10]}
$
No comments:
Post a Comment