----------------------------------------------------------------------------------
@MSGID: <20230913231055.828@kylheku.com> 646f482e
@REPLY: <20230913201834.46@kylheku.com> 8bed65d5
@REPLYADDR Kaz Kylheku <864-117-4973@kylheku.com>
@REPLYTO 2:5075/128 Kaz Kylheku
@CHRS: CP866 2
@RFC: 1 0
@RFC-Message-ID: <20230913231055.828@kylheku.com>
@RFC-References: <20230911202309.171@kylheku.com>
<20230911234401.851@kylheku.com> <20230912101332.94@kylheku.com> <rq5atj-sh92.ln1@wilbur.25thandClement.com>
<20230913201834.46@kylheku.com>
@TZUTC: -0000
@PID: slrn/pre1.0.4-9 (Linux)
@TID: FIDOGATE-5.12-ge4e8b94
On 2023-09-14, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
> On 2023-09-14, William Ahern <william@25thandClement.com> wrote:
>> Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>>>> The real function should handle patterns starting with "**/" and also
>>>>> ending in "/**", as well as when "**" is the entire pattern.
>>>>
>>>> I fixed this in the prototype.
>>>
>>> Issues:
>>
>>>
>>> 2. Escaping
>>>
>>> The interior /**/ pattern could occur in a class like [abc/**/def]
>>> in which case it must not be recognized.
>>
>> FWIW, OpenBSD sh seems not to tolerate slashes in bracket expressions:
>
> Yes; and it doesn't make sense. Or not in glob, anyway.
>
> Nevertheless, a star glob preprocessor above glob should heed the class
> syntax and not treat a /**/ inside a class specially; just pass that
> through to glob and let it fail.
>
> Matching slashes with class syntax makes sense in situations when
> we are not matching paths, or are matching paths more freely.
> Obviously it's allowed in a POSIX shell case statement, in fnmatch()
> (in the absence of FNM_PATHNAME), and so on.
>
>> At first I was wondering why you thought you could get away with merely
>> scanning for slash+double-star and double-star+slash--bracket expressions
>> obviously require stateful parsing.
>
> I was initially after the behavior: proof-of-concept. When things have
> driven around the block, then we tighten the lugnuts.
>
> In the current version I have it look for ** being the whole thing,
> starting with **/, ending with /** or containing /**/ where the
> leading / is not escaped or in a character class.
>
> I think I may have a bug: when detecting the trailing /**, we should
> avoid interpreting it if it follows a backslash; let glob deal
> with it.
>
>> But I guess none of that is helpful if you're trying to match some
>> sophisticated Bash behavior.
>
> I used this to cobble together a function called glob* that is now
> integrated in TXR Lisp.
>
> The current glob function is a wrapper for glob written in C,
> which now calls superglob if the extension flag GLOB_XSTAR is present.
>
> glob accepts a list of patterns, not just a single pattern; the results
> from multiple patterns are catenated.
>
> glob* is written in Lisp, and performs its own brace expansion to
> generate a list of patterns passed to glob with GLOB_XSTAR.

I should mention I fixed the brace expansion sorting issue,
and the general sorting issue.

The individual super_globs calls beneath the glob wrapper do the
sorting: GLOB_NOSORT is passed to glob and then the array is sorted
using qsort.

The brace expansion is processed in glob*, which appends the individual
results together in order, so there is no global sort messing things up.
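
A minimal sketch of that arrangement, assuming only the POSIX glob() API: request unsorted results with GLOB_NOSORT, then sort the gl_pathv array with qsort. The names glob_then_sort, path_cmp_fn and bytewise_cmp are invented for illustration and are not the TXR code; the strcmp-based comparator is just a placeholder for whatever ordering is wanted.

```c
#include <assert.h>
#include <glob.h>
#include <stdlib.h>
#include <string.h>

typedef int (*path_cmp_fn)(const void *, const void *);

/* Placeholder ordering: plain byte comparison via strcmp. */
static int bytewise_cmp(const void *ls, const void *rs)
{
  return strcmp(*(char * const *) ls, *(char * const *) rs);
}

/* Hypothetical helper: run glob() with its own sorting disabled, then
   sort the results with the caller's comparison function.
   Returns whatever glob() returned; on 0, the caller eventually
   calls globfree(out). */
static int glob_then_sort(const char *pattern, glob_t *out, path_cmp_fn cmp)
{
  int res;

  assert(pattern != NULL && out != NULL && cmp != NULL);

  res = glob(pattern, GLOB_NOSORT, NULL, out);

  if (res == 0)
    qsort(out->gl_pathv, out->gl_pathc, sizeof out->gl_pathv[0], cmp);

  return res;
}
```

Because each pattern's results are sorted on their own, a caller that appends the results of several patterns keeps the pattern order intact.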

For qsort, I am using the comparison function below.

In contrast, Bash's internal glob function does stupid things,
like use strcoll or strcmp under different circumstances.
strcoll falls victim to locale; strcmp is silly for paths.

The following function just compares byte by byte without regard for
character set. However, it collates the / character before any other.

So for instance these two entries are in sorted order:

  test/
  test-dir/

whereas under strcmp(), test-dir/ would come first because -
is before / in ASCII.

convert is a casting macro; just pretend convert(T, E) is ((T) (E)).

static int glob_path_cmp(const void *ls, const void *rs)
{
  const unsigned char *lstr = *convert(const unsigned char * const *, ls);
  const unsigned char *rstr = *convert(const unsigned char * const *, rs);

  for (; *lstr && *rstr; lstr++, rstr++)
  {
    if (*lstr == *rstr)
      continue;
    /* '/' collates before every other byte */
    if (*lstr == '/')
      return -1;
    if (*rstr == '/')
      return 1;
    if (*lstr < *rstr)
      return -1;
    if (*lstr > *rstr)
      return 1;
  }

  /* At least one string has ended here. A proper prefix sorts first;
     identical strings must compare equal for qsort's sake. */
  if (*rstr)
    return -1;
  if (*lstr)
    return 1;
  return 0;
}

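
For anyone who wants to try the ordering outside of TXR, here is a small self-contained sketch. It spells out the cast instead of using convert, and the names slash_first_cmp, strcmp_cmp and sort_paths are made up for this example; treat it as an illustration of the idea rather than the code in the tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Byte-wise comparison in which '/' collates before every other byte.
   A proper prefix sorts first; identical strings compare equal. */
static int slash_first_cmp(const void *ls, const void *rs)
{
  const unsigned char *l = *(const unsigned char * const *) ls;
  const unsigned char *r = *(const unsigned char * const *) rs;

  for (; *l && *r; l++, r++) {
    if (*l == *r)
      continue;
    if (*l == '/')
      return -1;
    if (*r == '/')
      return 1;
    return (*l < *r) ? -1 : 1;
  }

  if (*r)
    return -1;
  if (*l)
    return 1;
  return 0;
}

/* Plain strcmp ordering, for comparison. */
static int strcmp_cmp(const void *ls, const void *rs)
{
  return strcmp(*(const char * const *) ls, *(const char * const *) rs);
}

/* Hypothetical helper: sort an array of path strings in place, using
   the slash-first ordering if slash_first is nonzero, else strcmp. */
static void sort_paths(const char **paths, size_t n, int slash_first)
{
  assert(paths != NULL || n == 0);
  qsort(paths, n, sizeof *paths, slash_first ? slash_first_cmp : strcmp_cmp);
}
```

Sorting the pair test-dir/ and test/ with slash_first set puts test/ first; with plain strcmp the same pair comes out with test-dir/ first.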
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.