Hi Ian,
I had another search question I was hoping you could help me with.
Is there a way to search within a specific folder without searching through any of the folders it contains?
For example, I search for a bam file in /iplant/home/nowlanf, which contains several bam files, but also has a folder (let's say Human) that contains bam files. I only want to return the bam files at the level of /iplant/home/nowlanf and exclude those at /iplant/home/nowlanf/Human.
Any help would be appreciated!
Best,
Nowlan
Hello! If you know the folder structure in advance then I think it's
possible, but nothing immediately comes to mind (with what's currently
implemented in our search DSL) if you don't know the structure in
advance.
So for your example, knowing the structure in advance, what you could do
is something like:
{"query": {"all": [<clause for a path prefix of /iplant/home/nowlanf/>,
<any other query stuff, like filtering down to bam files, users,
metadata, etc>], "none": [<clause for a path prefix of
/iplant/home/nowlanf/Human/>]}}
that is, you can explicitly exclude the subfolders if you know of their
existence when constructing the query.
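As a rough sketch of how such a query could be assembled when the subfolders are known in advance: the "all"/"none" layout follows the example above, but the shape of the individual path-prefix clause (the "type"/"args" keys) and the helper names are assumptions made for illustration, not a confirmed part of the search DSL.

```python
import json


def path_prefix_clause(prefix):
    # Hypothetical clause shape: the thread only describes this as
    # "a clause for a path prefix", so the keys here are assumed.
    return {"type": "path", "args": {"prefix": prefix}}


def build_query(folder, known_subfolders, extra_clauses=()):
    """Match items under `folder` while excluding the named subfolders."""
    folder = folder.rstrip("/") + "/"
    excluded = [
        path_prefix_clause(folder + name.strip("/") + "/")
        for name in known_subfolders
    ]
    return {
        "query": {
            "all": [path_prefix_clause(folder), *extra_clauses],
            "none": excluded,
        }
    }


query = build_query("/iplant/home/nowlanf", ["Human"])
print(json.dumps(query, indent=2))
```

Every subfolder has to be listed explicitly, so this only works when the folder structure is known before the query is built.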
I can't think of a way with the current search DSL to do an arbitrary
search for "items in X folder but not a subfolder", however. For our
needs, we'd only implemented a "path prefix" clause, and I think you'd
need more complex filtering on the path to filter out all subfolders.
If you need the more generalized version, I can think of two options for
how to change things:
- lower performance, faster implementation, possibly more flexible:
allow passing a regular expression to the 'path' clause (in addition
to/instead of a prefix), and then you'd be able to filter out paths
that have any additional slashes after the folder name you care about
(or even more complex things, like including some subfolders but not
others, etc. etc.)
- higher performance, but needs more changes on our end, more
tailor-made for this use case (so less flexible): add an indexed field
for just the containing folder (rather than full path), plus either
a clause specifically for this field or an amendment of the path
clause
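To illustrate the two ideas, here is a minimal Python sketch run client-side on a list of paths. It is not how the search service implements anything: the file names are made up, and both helper functions are hypothetical. The first builds a regular expression that rejects any path with an additional slash after the folder name; the second derives the containing folder, which is the value the proposed indexed field would hold.

```python
import posixpath
import re


def direct_child_pattern(folder):
    # Matches paths exactly one level below `folder`: the folder name,
    # one slash, then a final segment containing no further slashes.
    base = re.escape(folder.rstrip("/"))
    return re.compile(rf"^{base}/[^/]+$")


def containing_folder(path):
    # The parent folder of a path, i.e. what a dedicated
    # "containing folder" field would store for each item.
    return posixpath.dirname(path.rstrip("/"))


# Illustrative paths only; the .bam file names are invented.
paths = [
    "/iplant/home/nowlanf/a.bam",
    "/iplant/home/nowlanf/b.bam",
    "/iplant/home/nowlanf/Human",
    "/iplant/home/nowlanf/Human/c.bam",
]

pattern = direct_child_pattern("/iplant/home/nowlanf")
by_regex = [p for p in paths if pattern.match(p)]
by_parent = [p for p in paths if containing_folder(p) == "/iplant/home/nowlanf"]

print(by_regex)
print(by_parent)
```

Both filters keep the two top-level files and the Human folder itself, and drop the file inside Human. The regex version is the more flexible of the two, while the parent-folder comparison is a plain equality test, which is what makes it a good fit for an indexed field.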
In either case, we'd of course want to push it through our planning and
scheduling process, so I've cc:ed in Sriram, our boss.
Hope this helps answer your question!
–
Ian
Hi Ian,
Thank you, this is very helpful!
Let me explain a little more about the issue we're running into and what I was hoping to accomplish with the search endpoint.
The web app we are building using Terrain relies on us knowing whether a file has been shared with the user "anonymous", and thus is available through dav-anon. To list files/folders we are currently using the paged-directory endpoint. This gives us everything we need except the user permissions. For user permissions, we are using the user-permissions endpoint, though it requires us to make an API call for every file or folder. This works well enough for the user's own data, but we ran into an issue with shared/community data. The user-permissions endpoint will not work on shared/community data, as it throws an error (user is not owner). However, if we use the search endpoint on that same file/folder we can get the file permissions even if the user is not the owner (so a bit of a workaround).
Ultimately we just need to be able to list everything within a folder but not subfolder, include the user permissions, and it would work for the user's data as well as shared/community data. I think either of the options you mentioned for enhancing the search endpoint would work for us. Or another way to look at it would be if there was a way to include the user-permissions for all files/folders in the paged-directory endpoint. Either would get us what we need.
Thank you!
Nowlan
Ah, I see. We're discussing among ourselves which way of solving that we
like best. One question that might inform the plan: do you want to do
the folder listing as though you're the anonymous user, i.e. showing
only things that anonymous can see, or do you want to do it as the
authenticated user, but only allow certain actions when it's anonymously
accessible?
That is, imagine this structure:
/folder: accessible to 'foo', 'bar', and anonymous
/folder/file1: accessible to 'foo' only
/folder/file2: accessible to 'bar' and 'anonymous'
/folder/file3: accessible to 'foo' and 'anonymous'
Imagine a request by the user 'foo'. Should they
a.) see file1 and file3 (those accessible to user foo), with some
special indicator for file3 (since it's anonymously accessible),
or b.) see file2 and file3 (those accessible to anonymous,
excluding those accessible only to 'foo')?
Or, perhaps, c.) should it show all three, since they're all accessible
to either 'foo' or 'anonymous'?
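The three behaviours can be made concrete with a small Python model of the structure above. This is only an illustration of the question being asked: the ACCESS table and the function names are hypothetical and say nothing about how permissions are actually stored.

```python
# Who can access each file in the example structure.
ACCESS = {
    "/folder/file1": {"foo"},
    "/folder/file2": {"bar", "anonymous"},
    "/folder/file3": {"foo", "anonymous"},
}


def option_a(user):
    # Files the user can see, each with an "anonymously accessible" indicator.
    return {
        path: "anonymous" in users
        for path, users in ACCESS.items()
        if user in users
    }


def option_b():
    # Files anonymous can see, regardless of who is authenticated.
    return [path for path, users in ACCESS.items() if "anonymous" in users]


def option_c(user):
    # Files visible to either the user or anonymous.
    return [
        path
        for path, users in ACCESS.items()
        if user in users or "anonymous" in users
    ]


print(option_a("foo"))
print(option_b())
print(option_c("foo"))
```

For user 'foo', option a yields file1 and file3 with only file3 flagged, option b yields file2 and file3, and option c yields all three.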
Some options we're thinking about:
- add full permissions to the listing (allows all 3 options, but it's
the most work)
- add 'isPublic' and 'isAnonymous' flags or similar to the listing, that
indicate if 'public' and 'anonymous' can see files which otherwise
appear in the listing (this would enable option a, and also could be
useful in the DE for us)
- allow passing some sort of flag that does the listing as the anonymous
user regardless of who's actually authenticated (this would enable
option b)
We're also thinking about amending search and such, but we wanted to
make sure that we know which of those sorts of things you're looking to
do specifically!
Thanks,
Ian
Hi Ian,
We are interested in listing the files/folders as the authenticated user and allowing the user to do certain actions if the file/folder is anonymously accessible. We do not need the user to see files/folders that they do not have access to, even if those have been shared with anonymous.
Our general flow is: the user signs in to our web app with their CyVerse credentials using the CAS single sign-on. We use their access token to request the list of files/folders in their home directory. We would then like to enable/disable a button in the UI based on whether a file/folder has been shared with anonymous. Our software (IGB) requires the data to be shared with anonymous so that we can make byte range requests. Right now the button is enabled for all files by default, which makes for a bad user experience. In addition, this breaks down with the shared/community data, as there is no way to visually indicate which files have been shared with anonymous without making a lot of API calls.
So in short, I think example (a) best describes our situation.
Either of the first two options would give us what we need. And I think just having the isAnonymous flag returned in the listing would be all we would need.
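As a sketch of how we would consume such a flag on our side: the listing entries below are hypothetical (the "path" key and the file names are made up, and the isAnonymous flag does not exist in the listing yet), but they show that a single listing response would be enough to decide the button state for every item.

```python
def button_states(listing):
    # Enable the button only for entries flagged as shared with anonymous.
    # A missing flag is treated as "not shared", so the button stays disabled.
    return {
        entry["path"]: bool(entry.get("isAnonymous", False))
        for entry in listing
    }


# Hypothetical listing entries; the real response shape may differ.
listing = [
    {"path": "/iplant/home/nowlanf/a.bam", "isAnonymous": True},
    {"path": "/iplant/home/nowlanf/b.bam", "isAnonymous": False},
    {"path": "/iplant/home/nowlanf/Human"},
]

states = button_states(listing)
print(states)
```

This would replace the per-file user-permissions calls with one pass over the listing, for the user's own data as well as shared/community data.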
Thank you so much!
Nowlan
Closing pending CyVerse implementing the history.