ISE Review of Internet Draft
draft-rprice-ups-management-protocol.03

Reply to ISE Comments

> Date: Mon, 14 Jun 2021 05:24:00 -0700
> Subject: ISE review of draft-rprice-ups-management-protocol
> From: "RFC ISE (Adrian Farrel)" <rfc-ise@rfc-editor.org>
> To: draft-rprice-ups-management-protocol@ietf.org

> Hi,

> I have (finally!) done my initial review as Independent Submissions
> Editor. The purpose of the review is a sanity check to make sure that
> the document is in a good state to send out for wider review.

> Below you'll find a long list of minor issues and nits.

> Obviously, you also have a change to the IANA stuff held in your buffer
> and we should get a revision with that change before asking for external
> review (otherwise we'll just end up having the same conversations all
> over again!).

> Adrian Farrel (ISE),
> rfc-ise@rfc-editor.org

Hello Adrian,

Thanks very much for all the comments. Never when I worked in the ISO did I ever get comments like these!

It is clear that version 03 was insufficiently explained and motivated. There were also sections which were very confusing and difficult to understand.

I have made many changes and I have added sections to provide context, motivation and explanation. The NUT project has accepted to change command names which are obviously confusing such as LOGIN.

I hope that the result in version 04 will be clearer to someone who is not a NUT project specialist.

The following paragraphs reply to each of your comments.

Thanks again for such helpful comments which have led to a major improvement in the document.

Roger

Email: draft-rprice-ups-management-protocol@ietf.org


  1. Section 1 Introduction

    Can you please confirm that any associated IPR of which you are aware has been disclosed according to the IETF's IPR policies?

    2021-06-17: I confirm that neither I nor any of the project mailing list subscribers are aware of any associated IPR that has to disclosed according to the IETF's IPR policies.


  2. Section Abstract

    Current practice leads to weak security and this is addressed in the Security and IANA Considerations.

    I read the IANA Considerations section, and I don't see how it is relevant to security except for 7.2.2. But (see my later comments on Section 7), the IANA Considerations section needs to be restricted only to instructions to IANA (not justification, not background). So I think you can rephrase as...

    Current practice leads to weak security, and this is addressed in the Security Considerations section of this document.

    2021-06-17: Rephrased abstract as proposed.


  3. Section 2.2 Driver (now section 2.3)

    The protocol between the driver and the attachment daemon may use [RFC1628].

    Bearing in mind that 1628 documents a MIB module, I think you need a bit more detail in this statement. Something like...

    The protocol between the driver and the attachment daemon may encode data using the UPS Management Information Base module [RFC1628] and possibly directly use SNMPv3 [RFC3411].

    2021-06-17: After reflection it seemed to me that discussion of the traffic between the driver and the attachement daemon is out of the scope of this document. I will remove all reference to RFC1628


  4. Section 2.3 Event (now section 2.4)

    An Event is a change in UPS status (2.10) detected by the Management Daemon (2.5)

    This is really up to you: I'm not asking for a change unless you think one is needed. Normally, a state transition is triggered by an event, but is not an event of itself.

    In fact, Section 5.2 has this right when it says

    A Management Daemon (2.5) deduces the occurrence of a UPS Event from a change in the UPS status (2.10) received from the Attachment Daemon (2.1).

    2021-06-17: Section 5.2 is correct. Section 2.3 (now 2.4) is sloppy. I will clarify the definition to distinguish between the status change in the Attachment Daemon and the Event observed by the Management Daemon.


  5. Section 3 Protocol Overview

    I understand the configuration in Figure 4, but I wonder what happens if the primary goes off line for some reason. Do all of the other processors (the secondaries) lose power protection, or does one of the secondaries take over as primary? If so, how?

    2021-06-17: added explanatory note below figure 4.


  6. Section 4 Protocol specification

    Most responses received by the Management Daemon (2.5) are sufficient in themselves, and at most require knowledge of the previous response to that command.

    This begs the question about the other response.

    2021-06-17: Sloppy writing! Rewritten.


  7. Section 4.2.1 FSD (now section 4.2.3)

    Current practice uses the symbol to tell each Secondary (2.8) system to shut down.

    It would help to say which component does this. I think it is the Attachment Daemon in the primary.

    2021-06-18: Clarified.


  8. Section 4.2.1 FSD (now section 4.2.3)

    Current practice requires that an application introduce security controls in its session management to defend against abusive use of this command. The details are outside the scope of this text.

    Actually, I don't think they are out of scope. You could...

    OLD
    The details are outside the scope of this text.
    NEW
    Further discussion of this issue can be found in the Security Considerations (6).
    END

    2021-06-18: I beefed up the whole treatment of administrative users with a definition of Administrative User in the new terminology section 2.2, improved discussion in newly numbered section 4.2.13 USERNAME, section 4.2.8 PASSWORD, and a new sub-section 6.5 "Administrative Security" in the Security Considerations. See also items 4.2.6 and 4.2.8 below.


  9. Section 4.2.2.3 NUMLOGINS (now section 4.2.4.3 NUMATTACH)

    where <value> is the number of clients which have succeeded in doing a LOGIN to this UPS.

    This is ambiguous. Is it the total number of clients that have ever logged in, the total number of successful logins, or the number of clients currently logged in?

    From the subsequent text...

    This information may be needed by the Management Daemon (2.5) to determine how many clients are still connected when starting the system shutdown process.

    ...I assume you mean "currently logged in".

    2021-06-19: The whole subject of LOGIN/LOGOUT/NUMLOGINS is completely confused by improper terminology. The NUT project has decided to change the terms used as follows

    Version 03 Version 04
    Section Term Section Term
    4.2.6 LOGIN 4.2.1 ATTACH
    4.2.7 LOGOUT 4.2.2 DETACH
    4.2.2.3 NUMLOGINS 4.2.4.3 NUMATTACH

    The section LOGIN/ATTACH has been rewritten and expanded to make clear what is happening: the Attachment Daemon is counting the number of secondaries.

    The sections LOGOUT/DETACH and NUMLOGINS/NUMATTACH have also been rewritten to explain the technique of counting secondaries.


  10. Section 4.2.5.3 ENUM (now section 4.2.7.3)

       ENUM su700 input.transfer.low "103"
       ENUM su700 input.transfer.low "100"

    I assume you have deliberately given an example of the enumeration values not in increasing numeric order, but this implies that the enumeration can be listed as a random ordering which might be odd. It would probably help to clarify in the text what is and isn't allowed.

    2021-06-19: Clarified that in current practice the output is an unordered list.


  11. Section 4.2.5.4 RANGE (now section 4.2.7.4)

    The example shows multiple valid ranges being returned on a single query. The implication is that the range might be discontinuous and that this is a way of returning the full set of valid values.

    However, the text in this section uses the singular when talking about "the interval in which valid values of a UPS variable lie." I think you need to use "intervals" in a couple of places. And I think you need to explicitly call out how multiple ranges are reported and why.

    2021-06-20: The example given is misleading. It refers to an earlier exotic UPS still supported by the code base. One generally expects a single range and this is what the I-D should present: the example shown has been simplified.


  12. Section 4.2.5.5 RW (now section 4.2.7.5)

    An economical UPS will support few variables

    I think s/will/may/ (Cheap does not need to mean rubbish!)

    Action 2021-06-19: True! Change made.


  13. Section 4.2.6 LOGIN (now section 4.2.1 ATTACH)

    The response is "OK" if the login is successful.

    It would be helpful to indicate what happens if the login is not successful. (Yes, I know one specific failure is described later in the section.) I think a pointer to 4.3 would be enough.

    Action 2021-06-19: This section has been thoroughly rewritten. I added a pointer to the error responses for many commands.


  14. Section 4.2.6 LOGIN and 4.2.8 PASSWORD (now 4.2.1 and 4.2.8)

    I cannot tell from the text how I am supposed to login when a password is required. Do I enter LOGIN followed by PASSWORD? How do I know that a password is required?

    Or, in fact, is the PASSWORD command associated with the USERNAME command of 4.2.13?

    2021-06-20: The section is just plain wrong and incomprehensible. To sort this out, the NUT project has changed LOGIN -> ATTACH and LOGOUT -> DETACH.

    I have reworked and beefed up the whole treatment of administrative users. I added a definition of Administrative User in new section 2.2 Administrative User, improved discussion in section 4.2.13 USERNAME and 4.2.8 PASSWORD, and I wrote a new sub-section 6.5 "Administrative Security" in the Security Considerations showing how Administrative users get "logged in".


  15. Section 4.2.8 PASSWORD

    Another case where a pointer to 4.3 is needed to explain the error cases.

    2021-06-23: Added pointer to the error responses for many commands.


  16. Section 4.2.9 PRIMARY

    It seems to me from the text and figures that commands are issued by the Management Daemon and answered by the Attachment Daemon. But in this section you have...

    The Attachment Daemon (2.1) uses this command within a Session (2.9) to claim that it is a Primary (2.7) and has the required authority to perform such critical actions as setting status symbol "FSD".

    That seems to be a command flowing in the other direction.

    I *think* that this command does actually flow from the Management Daemon and is answered by the Attachment Daemon, but it is a query. That is, the Management Daemon is asking the Attachment Daemon whether it is the Primary.

    Incidentally, I don't see a way to ask an Attachment Daemon to assume the Primary role.

    2021-06-23: Added new section 6.5.1 "Management of Administrative Users" to explain how the Attachment Daemon is told that an Administrative User is Primary or Secondary. This information is stored in the Attachment Daemon, so the Management Daemon has to get confirmation that it is Primary. It's the OK response for the PRIMARY command which says "Yes, you are a Primary".

    Rewrote section 4.2.9 Primary which now uses 6.5.1 to explain the role of the PRIMARY command.


  17. Section 4.2.10 PROTVER

          |  Note: Historically, this command was known as "NETVER" and
          |  current practice is to use "NETVER" instead of "PROTVER"

    I think that this text (which should be normal text) is intended to give important implementation advice. That is, a new Daemon implementation should expect to be able to support both NETVER and PROTVER.

    Action 2021-06-23: Yes, this should be normal text.


  18. Section 4.2.10 PROTVER

    Just checking that the version number returned is not enclosed in quotes

    2021-06-23: I added a note to sections 4.2.10 PROTVER and 4.2.14 VER to clarify that there are no quotation marks.


  19. Section 4.2.11 SET

    Another case where a pointer to 4.3 is needed.

    Action 2021-06-23: I added a reference to the error messages 4.3 to this and other commands.
  20. Section 4.2.12 STARTTLS

    This seems to be the only command where success doesn't just return "OK" but "OK <command name>" Again, a pointer to 4.3 is needed for the error case.

    The replies in case of success are:

    Response if command succeeds
    03 Section Command OK reply
    4.2.1 FSD OK FSD-SET
    4.2.2 GET Sub command specific
    4.2.3 HELP Specific
    4.2.4 INSTCMD OK
    4.2.5 LIST Sub command specific4
    4.2.6 LOGIN OK
    4.2.7 LOGOUT OK Goodbye
    4.2.8 PASSWORD OK
    4.2.9 PRIMARY OK
    4.2.10 PROTVER Specific
    4.2.11 SET OK
    4.2.12 STARTTLS OK STARTTLS
    4.2.13 USERNAME OK
    4.2.14 VER Specific

    Action 2021-06-23: I added in a reference to the error messages 4.3 for this and other commands. I will put this table in section 4.3.


  21. Section 4.2.12 STARTTLS

    Should you give some mention of TLS versions?

    Action 2021-06-23: I assume the system administrators have their own site policy, so I'll say that the TLS version "is a matter for site security policy and is not specified in this document".
  22. Section 4.2.13 USERNAME

    You are describing the protocol as it is, so I probably shouldn't pick at it, but this command appears to be allowing a user who does not have authority to set the fact that they do have authority.

    2021-06-23: 4.2.13 USERNAME has been rewritten based on the new section 6.5.1 Management of Administrative Users.


  23. Section 4.3 Error responses

    While you can do nothing about the text the protocol uses to report conditions that are not valid (e.g., "INVALID-ARGUMENT") it would be good if you could use "not valid" rather than "invalid" in the descriptive text. (There is also one occurrence of "invalid" in 4.2.2.4 that could change to "not valid".)

    2021-06-23: Changed all use of the word "invalid" in the descriptive text to "not valid" or "non valid".


  24. Section 5.2 Events

    In current practice, the variable "ups.status" is retrieved every 5 seconds.

    This is, presumably, an implementation choice for the Management Daemon rather than a feature of the protocol. So "current practice" may be the wrong thing to say. Actually, I think you need to advise implementations how to behave. Something like...

    Action 2021-06-23: I stole your words!

    The Management Daemon should retrieve the variable "ups.status" from the Attachment Daemon at regular intervals. If the interval is too short, valuable resources will be wasted, but if the interval is too large, the Management Daemon will not have up-to-date information about the UPS status. A default value of 5 seconds is recommended, but an implementation may make this value configurable.


  25. Section 6 Security considerations

    A functioning power supply is vital to a computing system. The Management Daemon (2.5) is able to shut down a working system and it's power supply

    Technically, I think, the Management Daemon can shut down the power supply and that may bring down the working system. However, that will not be an orderly shut down of the working system.

    You might want to reword as...

    A functioning power supply is vital to a computing system. The Management Daemon (2.5) is able to shut down the power supply to a working system and so bring down the working system

    2021-06-24: In normal operation, the system is shut down before the power supply. Since the precise sequence of operations is not intuitive, I have added new appendix B "The Shutdown Story" to explain exactly what happens. I added links to this appendix in 1.2 Current Practice, and 6 Security Considerations. However a malicious client acting as a Management Daemon could turn off the UPS power outlets, thus bringing down a working system.

    I will add this to the introduction to section 6.


  26. Section 6.2 Encryption

    The protocol provides command "STARTTLS" which calls on the Attachment Daemon (2.1) to support TLS encryption of the communication. If this command is accepted, the Management Daemon (2.5) must also encrypt.

    At present the command "STARTTLS" is too frequently refused, and traffic proceeds unencrypted, with for example plain text transmission of passwords and status values.

    There are several serious concerns here.

    (1) Can a Management Daemon send STARTTLS before LOGIN and PASSWORD? This is not clear from 4.2.12. But, obviously, if LOGIN and PASSWORD are used before TLS, then the details are sent in the clear and can be intercepted.

    2021-06-25: This raises three problems in NUT:

    1. The misuse of the term LOGIN for something that is not a login,

      I have rewritten the section 4.2.6 LOGIN (now 4.2.1 ATTACH) to explain what is really happening, and added section 2.2 Administrative user, with examples in section 6.5 Administrative Security.

      My use of LOGIN in section 6.1 Agent Verification is wrong and misleading. I should have written USERNAME. My apologies for that one. I have corrected and clarified section 6.1.

    2. The use of the term USERNAME when LOGIN would have been more appropriate

      We have to live with this choice. I have tried to clarify section 4.2.13 USERNAME

    3. The general security weakness. In current NUT practice, the use of TLS is optional and to enforce it the system administrator must set declarations FORCESSL to 1 and CERTVERIFY to 1 in the Management Daemon configuration file. Secure practice would require enforcement by the Attachment Daemon.

      I have extended the description of current practice as insecure, since use of TLS is not enforced by the server. It's currently up to the system administrator to evaluate risk and mitigation in each installation. For example some installations use stunnel for secure communication of commands and responses.

      Enforced secure communication will require tightening up the Attachment Daemon to enforce the use of command STARTTLS, and to refuse non-TLS communication. A simple way of doing this is to retain port nut/3493 for the current weakly secure traffic, and to use some other port, perhaps ups/TBD1, for enforced secure communication, much in the manner of http and https. I have added new section 6.4.3 Long Term: Enforced Secure Communication to explain this.


  27. Section 6.2 Encryption continued

    (2) I think you need to give some advice to implementations. Firstly, you need to strongly recommend that Management Daemons always request TLS. Secondly, you should recommend Attachment Daemons to support TLS and accept the request.

    2021-06-25: Agreed. I added section 6.4.3 "Long Term: Enforced Secure Communication". ... MUST encrypt ...


  28. Section 6.2 Encryption continued

    (3) You're quite right to flag up the frequent refusal the way you do. I think you might explain the consequences and risks a bit more. Perhaps you could also give advice to a Management Daemon about what it should do if TLS is not available.

    2021-06-30: I have reinforced the introduction to section 6 with reference to, and short quotation from, IEC TS 62351-1 Power systems management and associated information exchange -- Data and communications security. Part 1: Communication network and system security -- Introduction to security issues

    The TLS shims described in 6.4.1 are a somewhat abstracted suggestion. Current practice uses stunnel, and I have included mention of stunnel in new subsection 6.2.1 "Secure tunnels", and in new subsection 6.4.2 "SSH tunnels".


  29. Section 6.2 Encryption continued

    (4) Is there a risk of "downgrade attacks"? I think so. In this attack, the STARTTLS message is intercepted by an attacker, and a spoof rejection is returned. Thus, the Management Daemon believes that the Attachment Daemon is unwilling/unable to use encryption and will send all messages in the clear.

    2021-07-03: Yes, the risk exists with current practice. For the long term, I have added section 6.4.3 "Long Term: Enforced Secure Communication". ... MUST encrypt ...


  30. Section 7.1 Namespaces

    ... it's great that you're not burdening IANA with managing this namespace. I think, therefore, you need to move it out of the IANA section into a separate section (otherwise IANA will be confused as to what they have to do).

    2021-07-03: Subsection 7.1 is now in a section entitled "Codepoint Management" and is now outside section 8 IANA Considerations.


  31. Section 7 IANA Considerations (now section 8)

    I believe you have some edits planned for Section 7.2 to clarify exactly what action you want IANA to take.

    Section 7 "IANA Considerations" (now section 8) has been redrafted to say exactly what IANA is asked to do, and nothing else.


  32. Section 8 Implementation status (now section 9)

    Section 8 is useful.

    According to RFC 7942 (which I know quite well ;-) the intention is that Implementation Status sections are deleted before publication as an RFC. The reasoning is that implementation status gets pretty out of date as soon the RFC is published. That makes the main purpose of the section to convince people that publication is OK - well, I'm convinced already!

    There are two options for you:

    1. Delete the section having decided that there is no need of its material any more.
    2. Remove the discussion of 7942, and reword the section to make it a snap-shot report. That means removing any "current status" and all use of the present tense.

    2021-07-03: I prefer the second option: an historical snap-shot report. I have reworked the section to this effect.


  33. ==Nits== Title case

    Could you make all the section titles in title case. That is, more upper case at the start of words.

    2021-07-03: Done


  34. General s/e.g./e.g.,/

    2021-07-03: Done. Also did s/i.e./i.e.,/


  35. Abstract

    s/This text/This document/

    s/Considerations/Considerations sections of this document/

    2021-07-03: Done


  36. Section 1.1 How to read this text

    You can remove the editor note ... // Ed: To be removed.

    2021-07-03: Done


  37. Section 1.2 Current Practice

    Why is the text "Historically, the previous number 3305..." indented and marked with a vertical bar? If it is commentary, you should present it as normal text. If it is a quote, you need to provide a reference.

    2021-07-03: The <aside> is a historical comment and is irrelevant to the protocol, so I removed it.


  38. Section 1.3 Requirements Language

    I don't think you use any BCP 14 language in this document. That's OK, and you can delete the section.

    2021-07-03: The new section 6.4.3 Long Term: Enforced Secure Communication uses SHOULD, MUST and SHALL, so I kept this section.


  39. Section 1.4 Comments

    Once published as an RFC, I think you move beyond comments and editorial matters. Any editorial issues will be raised through the RFC errata reporting system. I think you can re-write this as...

    OLD
    1.4. Comments

    The editor welcomes comments. Technical matters should be addressed to the NUT Project (2.6)'s mailing list [mailinglist]. Editorial matters may be addressed directly to the editor, email: "ietf@rogerprice.org" .

    NEW
    1.4. Additional Information

    Further information about this protocol and related technical matters may be addressed to the mailing list of the NUT Project (2.6) [mailinglist].

    END

    2021-07-03: Agreed, done.


  40. Section 2.1 Attachment Daemon

    "hardware statuses 2.10"

    I think s/2.10/(2.10)

    2021-07-03: Done. Fixed. Looks like an xml2rfc curiosity.


  41. Section 2.1 Attachment Daemon

    An Attachment Daemon runs as a detached software service for a dedicated user, often called "nut".

    It's not clear whether it is the dedicated user or the Attachement Daemon that is often called "nut".

    2021-07-03: I clarified that the daemon is launched as system user "root" and drops to a dedicated non-privileged system user. I removed reference to "nut", a commonly used non-privileged system user, since there is no technical requirement to use this name.


  42. Section 2.10 UPS Status

    Statuses...

    A number of the statuses are abbreviations (OB, OL, LB, DAL, CHRG, ...). It would be good to expand them somewhere. 2.10 seems the obvious place. But, of course, it is usual to expand abbreviations at the time they are first used within the document.

    Since Section 5.1 has all of this information, I think a way to handle this would be to add some text to section 1.1. Something like...

    The statuses and commands used in this document are named with short text-form names, some of which are abbreviations. A full list of these statuses can be found in (5.1) while the commands are listed in (5.2).

    2021-07-03: I added to following text to section 1.1 How to Read this Document:

    The statuses and events appearing in this document are named with short text-form names, some of which are abbreviations. A full list of the statuses can be found in section Status Symbols (5.1) while the events are listed in section Events (5.2).


  43. Section 2.10 UPS Status

    See the appendix (5.1) which specifies each of these symbols.

    I don't think Section 5.1 is an appendix

    2021-07-03: My error, corrected.


  44. Section 4.1 Notation used in this Specification

    I'm not sure the double double quotes is right in ""UPS on fire""

    2021-07-03: This is xml2rfc weirdness. The XML markup is <tt>"UPS on fire"</tt> which is rendered correctly in a fixed-width font in the HTML output. But the rendering for .txt output mysteriously adds double quotation marks. I recommended in section 1.1 How to Read this Document that readers should prefer the HTML version, which renders the tag <tt> correctly without quotes.

    I have removed the <tt> and </tt> tags from the words "UPS on fire".


  45. Section 4.1 Notation used in this Specification

    White space within commands and responses is reduced to one U+0020 (SP) SPACE.

    For consistency with the previous paragraph, you should have "SPACE (SP)"

    2021-07-03: Yes, looks better, corrected.


  46. Section 4.2 Commands

    The indented text with vertical bars looks to me like regular text I should read.

    2021-07-03: There is no need for that note to be an <aside>. I removed the <aside> tags. The vertical bars will disappear.


  47. Section 4.2.1 FSD (and a number of other sections)

       Command: "FSD <upsname>"           The response is: "OK FSD-SET" if
       the command is successful.

    Maybe format on separate lines like you do in other sections.

    2021-07-03: Yes, that will be more consistent, and look better. Done.


  48. Section 4.2.2 GET (now 4.2.4)

    Retrieve a single response from the server.

    This is the first mention of "server". Is that the term you intend to use?

    2021-07-03: I was thinking of the technical implementation. It would be better in a standards specification to say Attachment Daemon.


  49. Section 4.2.2.1 CMDDESC (now 4.2.4.1)

    OLD
    This is like "DESC"
    NEW
    This is like "DESC" (4.2.2.2)
    END

    2021-07-03: Done. Reference added.


  50. Section 4.2.2.4 TYPE (now 4.2.4.4)

           | RW           | This variable may be set to another value  |
           |              | with command "SET"                         |

    It might be clearer to note that this is a read/write variable. (I assume that's what RW means.)

    2021-07-03: I was picking up text as-is from the project's documentation. I rewrote with some much needed clarification that RW = read/write.


  51. Section 4.2.6 LOGIN (now 4.2.1 ATTACH)

    The indented text with vertical bars looks to me like regular text I should read.

    2021-07-04: This section has been completely rewritten, and the <aside> has disappeared.


  52. Section 5.2 Events

    Might be nice to renumber the Notes in Table 4 so that Note 3 is not empty.

    2021-07-04: A bad habit from my ISO days. Change made.


  53. Section 7 IANA Considerations

    You can remove the comment about RFC 8126.

    2021-07-04: Done. RFC 8126 also removed from the References.


This file: http://rogerprice.org/NUT/ISE-comments-2021-06-14.reply.html
Last change was on Tue Jul 6 15:16:40 CEST 2021