Designing a REST API by URI vs query string

  • Let's say I have three resources that are related like so:

    Grandparent (collection) -> Parent (collection) -> Child (collection)
    

    The above depicts the relationship among these resources like so: Each grandparent can map to one or several parents. Each parent can map to one or several children. I want the ability to support searching against the child resource but with the filter criteria:

    If my clients pass me an id reference to a grandparent, I want to only search against children who are direct descendants of that grandparent.

    If my clients pass me an id reference to a parent, I want to only search against children who are direct descendants of my parent.

    I have thought of something like so:

    GET /myservice/api/v1/grandparents/{grandparentID}/parents/children?search={text}
    

    and

    GET /myservice/api/v1/parents/{parentID}/children?search={text}
    

    for the above requirements, respectively.

    But I could also do something like this:

    GET /myservice/api/v1/children?search={text}&grandparentID={id}&parentID={id}
    

    In this design, I could allow my client to pass me one or the other in the query string: either grandparentID or parentID, but not both.
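    A minimal sketch of the "one or the other, but not both" rule for this third design, assuming the query string arrives as a raw string. The function name `parse_child_search` and the shape of the returned dictionary are hypothetical; only the parameter names `search`, `grandparentID`, and `parentID` come from the URI above.

```python
from urllib.parse import parse_qs


def parse_child_search(query_string: str) -> dict:
    """Validate the query string of the hypothetical GET /children endpoint.

    Exactly one of grandparentID or parentID must be supplied.
    """
    params = parse_qs(query_string)
    # Collect whichever scoping parameters the client actually sent.
    scopes = [name for name in ("grandparentID", "parentID") if name in params]
    if len(scopes) != 1:
        raise ValueError("pass exactly one of grandparentID or parentID")
    scope = scopes[0]
    return {
        "search": params.get("search", [""])[0],
        "scope": scope,
        "id": params[scope][0],
    }
```

    With this sketch, `parse_child_search("search=bob&parentID=42")` yields a `parentID` scope, while a query carrying both identifiers (or neither) raises `ValueError`, which a server would likely map to a 400 response.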

    My questions are:

    1) Which API design is more RESTful, and why? Semantically, they mean and behave the same way. The last resource in the URI is "children", effectively implying that the client is operating on the children resource.

    2) What are the pros and cons to each in terms of understandability from a client's perspective, and maintainability from the designer's perspective.

    3) What are query strings really used for, besides "filtering" on your resource? If you go with the first approach, the filter parameter is embedded in the URI itself as a path parameter instead of a query string parameter.

    Thanks!

    The title of your question should be extremely confusing to anyone viewing this. The valid segments of a URI are defined as <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>/#<fragment> (although <password> is deprecated). A "query string" is a valid component of a URI, so your "vs" in the title is crazy talk.

    Do you mean `I want to only search against children who are INdirect descendants of that grandparent.` ? According to your structure, Grandparent has no direct children.

    What is the difference between a child and a parent? Is a parent a parent if he doesn't have children? Smells of a design fault.

    re: `potential design flaw` and if you have information about a person but no information on their parents, do they qualify as a `child`? (e.g., Adam and Eve ) :)

  • First

    As per RFC 3986 §3.4 (Uniform Resource Identifier: Syntax Components, Query):

    3.4 Query

    The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI's scheme and naming authority (if any).

    Query components are for retrieval of non-hierarchical data; there are few things more hierarchical in nature than a family tree! Ergo, regardless of whether you think it is "REST-y" or not, in order to conform to the formats, protocols, and frameworks of and for developing systems on the internet, you must not use the query string to identify this information.

    REST has nothing to do with this definition.

    Before addressing your specific questions, your query parameter of "search" is poorly named. Better would be to treat your query segment as a dictionary of key-value pairs.

    Your query string could be more appropriately defined as

    ?first_name={firstName}&last_name={lastName}&birth_date={birthDate} etc.
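    To illustrate treating the query segment as a dictionary of key-value pairs, here is a small sketch using Python's standard `urllib.parse`. The helper name `query_to_dict` and the sample values are made up for illustration; only the key names come from the query string above.

```python
from urllib.parse import parse_qsl, urlencode


def query_to_dict(query: str) -> dict:
    """Turn a query string (with or without the leading '?') into a dict."""
    return dict(parse_qsl(query.lstrip("?")))


# Hypothetical attribute values for the person being looked up.
criteria = {"first_name": "Joe", "last_name": "Smith", "birth_date": "1980-01-31"}

# Client side: build the query string from the dictionary.
query = urlencode(criteria)

# Server side: recover the same dictionary from the query string.
assert query_to_dict("?" + query) == criteria
```

    Each key names an attribute of the resource being identified, rather than funnelling everything through a single catch-all `search` parameter.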

    To answer your specific questions

    1) Which API design is more RESTful, and why? Semantically, they mean and behave the same way. The last resource in the URI is "children", effectively implying that the client is operating on the children resource.

    I don't think this is as clear cut as you seem to believe.

    None of these resource interfaces are RESTful. The major precondition for the RESTful architectural style is that Application State transitions must be communicated from the server as hypermedia. People have labored over the structure of URIs to make them somehow "RESTful URIs" but the formal literature regarding REST actually has very little to say about this. My personal opinion is that much of the meta-misinformation about REST was published with the intent of breaking old, bad habits. (Building a truly "RESTful" system is actually quite a bit of work. The industry glommed on to "REST" and back-filled some orthogonal concerns with nonsensical qualifications and restrictions.)

    What the REST literature does say is that if you are going to use HTTP as your application protocol, you must adhere to the formal requirements of the protocol's specifications and you cannot "make http up as you go and still declare that you are using http"; if you are going to use URIs for identifying your resources, you must adhere to the formal requirements of the specifications regarding URI/URLs.

    Your question is addressed directly by RFC3986 §3.4, which I have linked above. The bottom line on this matter is that even though a conforming URI is insufficient to consider an API "RESTful", if you want your system to actually be "RESTful" and you are using HTTP and URIs, then you cannot identify hierarchical data through the query string because:

    3.4 Query

    The query component contains non-hierarchical data

    ...it's as simple as that.

    2) What are the pros and cons to each in terms of understandability from a client's perspective, and maintainability from the designer's perspective.

    The "pro" of the first two is that they are on the right path. The "con" of the third one is that it appears to be flat out wrong.

    As far as your understandability and maintainability concerns, those are definitely subjective and depend on the comprehension level of the client developer and the design chops of the designer. The URI specification is the definitive answer as to how URIs are supposed to be formatted. Hierarchical data is supposed to be represented on the path and with path parameters. Non-hierarchical data is supposed to be represented in the query. The fragment is more complicated, because its semantics depend specifically upon the media type of the representation being requested. So to address the "understandability" component of your question, I will attempt to translate exactly what your first two URIs are actually saying. Then, I will attempt to represent what you say you are trying to accomplish with valid URIs.

    Translation of your verbatim URIs to their semantic meaning

    /myservice/api/v1/grandparents/{grandparentID}/parents/children?search={text}

    This says: for the parents of grandparents, find their child having search={text}. What you said with your URI is only coherent if searching for a grandparent's siblings. With your "grandparents, parents, children" you found a "grandparent", went up a generation to their parents, and then came back down to the "grandparent" generation by looking at the parents' children.

    /myservice/api/v1/parents/{parentID}/children?search={text}

    This says: for the parent identified by {parentID}, find their child having ?search={text}. This is closer to what you want, and represents a parent->child relationship that can likely be used to model your entire API. To model it this way, the burden is placed upon the client to recognize that if they have a "grandparentId", there is a layer of indirection between the ID they have and the portion of the family graph they wish to see. To find a "child" by "grandparentId", you can call your /parents/{parentID}/children service and then, for each child that is returned, search their children for your person identifier.
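    That two-step, client-side traversal can be sketched roughly as follows. This is only an illustration under stated assumptions: `get_children` is a hypothetical in-memory stand-in for a call to /parents/{parentID}/children?search={text}, the identifiers and names are invented, and "search" is assumed to mean a case-insensitive substring match on a name.

```python
# Hypothetical data standing in for the service's storage.
CHILDREN_OF = {
    "gp1": ["p1", "p2"],   # a grandparent is just a parent one level up
    "p1": ["c1", "c2"],
    "p2": ["c3"],
}
NAMES = {"c1": "Alice", "c2": "Bob", "c3": "Alicia"}


def get_children(person_id, search=None):
    """Stand-in for GET /parents/{person_id}/children?search={search}."""
    ids = CHILDREN_OF.get(person_id, [])
    if search is None:
        return list(ids)
    return [i for i in ids if search.lower() in NAMES.get(i, "").lower()]


def find_grandchildren(grandparent_id, search):
    """Walk one level down, then search each parent's children."""
    matches = []
    for parent_id in get_children(grandparent_id):
        matches.extend(get_children(parent_id, search))
    return matches
```

    The cost of this approach is one extra request per intermediate parent, which is the "layer of indirection" the client has to carry.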

    Implementation of your requirements as URIs

    If you want to model a more extensible resource identifier that can walk the tree, I can think of several ways you can accomplish that.

    1) The first one, I've already alluded to. Represent the graph of "People" as a composite structure. Each person has a reference to the generation above it through its Parents path and to a generation below it through its Children path.

    /Persons/Joe/Parents/Mother/Parents would be a way to grab Joe's maternal grandparents.

    /Persons/Joe/Parents/Parents would be a way to grab all of Joe's grandparents.

    /Persons/Joe/Parents/Parents?id={Joe.GrandparentID} would grab Joe's grandparent having the identifier you have in hand.

    These would all make sense (note that there could be a performance penalty here, depending on the task, because the lack of branch identification in the "Parents/Parents/Parents" pattern forces a DFS on the server). You also benefit from having the ability to support any arbitrary number of generations. If, for some reason, you desire to look up 8 generations, you could represent this as

    /Persons/Joe/Parents/Parents/Parents/Parents/Parents/Parents/Parents/Parents?id={Joe.NotableAncestor}

    but this leads into the second dominant option for representing this data: through a path parameter.
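    As a rough sketch of how a server might resolve the composite paths from option 1, the function below walks a small in-memory graph one "Parents" segment at a time. Everything here is an assumption for illustration: the `PARENTS` data, the `resolve` function, and the convention that an optional "Mother" or "Father" segment narrows the preceding "Parents" step.

```python
# Hypothetical family data: person -> {role: parent name}.
PARENTS = {
    "Joe": {"Mother": "Mary", "Father": "Frank"},
    "Mary": {"Mother": "Grace", "Father": "George"},
    "Frank": {"Mother": "Helen", "Father": "Henry"},
}


def resolve(path: str) -> set:
    """Resolve a path such as /Persons/Joe/Parents/Mother/Parents."""
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2 or segments[0] != "Persons":
        raise ValueError("path must start with /Persons/{name}")
    current = {segments[1]}
    i = 2
    while i < len(segments):
        if segments[i] != "Parents":
            raise ValueError(f"unexpected segment: {segments[i]}")
        # An optional role segment narrows this generation step.
        role = None
        if i + 1 < len(segments) and segments[i + 1] in ("Mother", "Father"):
            role = segments[i + 1]
            i += 1
        next_gen = set()
        for person in current:
            parents = PARENTS.get(person, {})
            if role is None:
                next_gen.update(parents.values())
            elif role in parents:
                next_gen.add(parents[role])
        current = next_gen
        i += 1
    return current
```

    Without a role segment the set of candidates fans out at every generation, which is where the server-side search cost mentioned above comes from.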


    2) Use path parameters to "query the hierarchy"

    You could develop the following structure to help ease the burden on consumers and still have an API that makes sense.

    To look back 147 generations, representing this resource identifier with path parameters allows you to do

    /Persons/Joe/Parents;generations=147?id={Joe.NotableAncestor}

    To locate Joe from his Great Grandparent, you could look down the graph a known number of generations for Joe's Id.

    /Persons/JoesGreatGrandparent/Children;generations=3?id={Joe.Id}

    The major thing of note with these approaches is that without further information in the identifier and request, you should expect that the first URI is retrieving a Person 147 generations up from Joe with the identifier of Joe.NotableAncestor. You should expect the second one to retrieve Joe. Assume that what you actually want is for your calling client to be able to retrieve the entire set of nodes and their relationships between the root Person and the final context of your URI. You could do that with the same URI (with some additional decoration) and setting an Accept of text/vnd.graphviz on your request, which is the IANA registered media type for the .dot graph representation. With that, change the URI to

    /Persons/Joe/Parents;generations=147?id={Joe.NotableAncestor.Id}#directed

    with an HTTP Request Header Accept: text/vnd.graphviz and you can have clients fairly clearly communicate that they want the directed graph of the generational hierarchy between Joe and 147 generations prior where that 147th ancestral generation contains a person identified as Joe's "Notable Ancestor."
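    For readers who want to see how such an identifier decomposes, here is a small sketch that splits it into path segments (each with its own path parameters), the query, and the fragment. The function `parse_uri` and the sample id value are hypothetical; `urlsplit` and `parse_qsl` are from Python's standard library, and the path parameters are split by hand on ";" and "=".

```python
from urllib.parse import parse_qsl, urlsplit


def parse_uri(uri: str) -> dict:
    """Split a URI reference into path segments, query, and fragment."""
    parts = urlsplit(uri)
    segments = []
    for raw in parts.path.strip("/").split("/"):
        # "Parents;generations=147" -> name "Parents", params {"generations": "147"}
        name, *raw_params = raw.split(";")
        params = dict(p.split("=", 1) for p in raw_params if "=" in p)
        segments.append((name, params))
    return {
        "segments": segments,
        "query": dict(parse_qsl(parts.query)),
        "fragment": parts.fragment,
    }


parsed = parse_uri("/Persons/Joe/Parents;generations=147?id=ancestor-1#directed")
```

    In this sketch the hierarchical part (Persons, Joe, Parents, and how many generations to walk) stays in the path, while the non-hierarchical identifier of the ancestor stays in the query.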

    I'm unsure if text/vnd.graphviz has any pre-defined semantics for its fragment; I could find none in a search for instruction. If that media type actually does have pre-defined fragment information, then its semantics should be followed to create a conforming URI. But, if those semantics are not pre-defined, the URI specification states that the semantics of the fragment identifier are unconstrained and instead defined by the server, making this usage valid.


    3) What are query strings really used for, besides "filtering" on your resource? If you go with the first approach, the filter parameter is embedded in the URI itself as a path parameter instead of a query string parameter.

    I believe I have already thoroughly beaten this to death, but query strings are not for "filtering" resources. They are for identifying your resource from non-hierarchical data. If you have drilled down your hierarchy with your path by going /person/{id}/children/ and you are wishing to identify a specific child or a specific set of children, you would use some attribute that applies to the set you are identifying and include it inside the query.

    The RFC is only concerned with hierarchy insofar as it defines a syntax and algorithm for resolving relative URI references. Could you elaborate or cite some sources explaining why the examples in the original post are not conforming?

    @user2313838 (comment probably not best medium for elaboration) No. The global restriction you seem to have placed upon RFC3986 for specifying only the resolution of relative URI references is not present. The mentioning in the RFC of "relative URIs" is permissive, not exclusive of other approaches. The hierarchical nature of the path described in the general syntax imports no context of defining references relative to a separately defined context. The hierarchical nature of the path is to identify hierarchically organized data. If your data is hierarchical, thou shalt use the path.

    Isn't a family tree really a graph, not a tree, and not at all hierarchical, considering multiple parents, divorce and re-marriage, etc.?

    @Myster Yes and no. A "tree" is technically known as an arborescent graph (directed and acyclic), but this is a hierarchical structure. I did make a mistake in my final point where I used the "undirected" fragment; I meant to say "directed" and I hadn't noticed that I had reversed it. I probably wouldn't have caught it if you had not made your comment.

    @K.AlanBates Would the fact that the query string doesn't contain "filters" but attributes that identify the resource imply a 404 returned if the result of the search is the empty set? That would sound counterintuitive for me.

    @RobertoAloi It seems counterintuitive to me to communicate your own "No Items Found" interface through an empty set when HTTP already has a definition for that. The basic principle is that you are asking the server to return "thing(s)" and if there are no "thing(s)" to return, the server communicates that with "404 - Not Found". What's counterintuitive about that?

    I always believed 404 to indicate that the root URL of the resource was not found, i.e. the collection as a whole. So if you queried /books?author=Jim and there were no books by Jim, you'd receive an empty set []. But if you queried /articles?author=Jim and the articles resource didn't even exist as a collection, 404 would help to indicate that there's no use in looking for any articles at all.

    @adjenks You didn't learn that from formal specifications. Structurally, the constituents of a url can be thought of as containing a "root" if that helps you reason about the purposes for the constituent parts, but ultimately the query string is not a display filter against a resource identified via the path. The query string is a first class citizen of a url. When you find no resources on the server which match your url (including the query string) 404 is the means defined within http to communicate this. The interaction model you pose introduces a distinction without a difference.

    @adjenks (continued) ...you are perfectly within reason to define your own client-server interaction model which has deviations from specified, expected behavior. But when you do that, you are unequivocally building a system which is NOT RESTful and the context of this question and answer centers around building a RESTful system. Your amendment to switch system interactions with urls and http from being 'resource location based' to being 'search result set' based is inconsistent with the RESTful architectural style. (continued=>)

    (continued) REST requires that if you are using "URL"s and HTTP/1.1, you must adhere to RFC 3986 for the requirements of URIs and RFCs 7230, 7231, 7232, 7233, 7234, and 7235 for the requirements of HTTP/1.1. If you wish to declare your system "RESTful" and declare support for URLs and HTTP/1.1, your amendment would be forbidden. REST is intentionally an exceptionally strict architectural style. (end)

    @adjenks regarding your use case for providing an empty array instead of a 404: that is building a curated protocol. You are encoding meaning behind the type of response returned when resources are not found and the structure involved is very particular to your system and to its goals. Ultimately, your gain is that you avoid having to make a secondary call to check existence of your secondary entity (author). You can constrain that use case entirely within the client, so your simplification actually complicates matters rather than clarifying them. Make the second call if you care why you got a 404.

    It's good to see someone following RFC 3986 for URIs, and *just* that. Lots of answers on Stack Overflow state that the query component should be used as a filter (for no reason), or that the path component should not have a hierarchy beyond 1 level (e.g., `/artists/{artist}`) or 2 levels (e.g., `/artists/{artist}/albums/`), arbitrarily prohibiting lower levels (e.g., `/artists/{artist}/albums/{album}/songs/{song}`). So I need your expertise on my recent post (I haven't accepted the given answer yet as something feels wrong): https://softwareengineering.stackexchange.com/q/391242/184279

    … Basically I am told to give up the hierarchical structure in my URIs for a flat structure. It seems to go against the role of the path component as defined in RFC 3986. The only advantage of this that I see is for querying: with a single level path component (`/artists/{artist_uuid}`, `/albums/{album_uuid}` and `/songs/{song_uuid}`), one can easily build queries such as `/songs/?artist={artist_uuid}`. I cannot think of a way to achieve that in one request with a multi-level path component (`/artists/{artist_name}/albums/{album_name}/songs/{song_name}`). Any suggestion?

    @Maggyero I can take a look at your question and see if there is any additional perspective that I can add. I can't answer your question directly through this comment because I don't know what tools you have available to your service client.

    @Maggyero I see why you have concerns with the guidance provided to you so far. I should be able to help you find your way to where you are trying to go. I'll put a response together for you tonight.

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM