Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(utils): add build context env for author email (#446) #452

Merged
merged 3 commits into from Sep 22, 2020

Conversation

gpaciga
Copy link
Contributor

@gpaciga gpaciga commented Sep 18, 2020

Adds LHCI_BUILD_CONTEXT__AUTHOR_EMAIL environment variable to override getting the commit author's email from git, and uses this to get the gravatar URL when LHCI_BUILD_CONTEXT__AVATAR_URL is not specified.

Resolves #446

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks very much @gpaciga! I had one extra thought, but this LGTM too if you don't want to add that one :)

@@ -245,12 +245,28 @@ function getAvatarUrl(hash = 'HEAD') {
});
if (result.status !== 0) {
throw new Error(
"Unable to determine commit author's avatar URL because `git log --format=%aE -n 1` failed to provide author's email. " +
'This can be overridden with setting LHCI_BUILD_CONTEXT__AVATAR_URL env.'
'Unable to determine commit author email with `git log --format=%aE -n 1`. ' +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think we should first fallback to getAuthor(hash) and try to extract the email from the <user@example.com> bit before falling back to git log again, WDYT are you willing to add that here? :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar thought but hesitated, mainly because introducing a regex to extract the email will be more fallible than asking git for it, and without the build context vars falling back to getAuthor would still call git log twice. (unless the author is stored somewhere we can pull from?)

As a compromise, getAuthorEmail could look for the AUTHOR env again, and extract from that if present, but otherwise still use git. And if the AUTHOR env var exists but doesn't match, say, /^.* <(.+)>$/, throw an error. How does that sound?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool yeah something like that sounds good, I just put up my suggestion. I don't think we want to have to manage the author vars in two places so calling it worst case 3 times so we can dedupe our management of overrides seems fine . it's not a live, super time-sensitive CLI call 🤷

const envHash = getEnvVarIfSet([
// Manual override
'LHCI_BUILD_CONTEXT__AVATAR_URL',
'LHCI_BUILD_CONTEXT__AUTHOR_EMAIL',
]);
if (envHash) return envHash;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this?

Suggested change
if (envHash) return envHash;
if (envHash) return envHash;
// Next try to parse the email from the complete author.
const author = getAuthor(hash);
const emailMatch = author.match(/ <(\S+@\S+)>$/);
if (emailMatch) return emailMatch[1];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with something along these lines, but with an error check on the match, and keeping the AVATAR_URL override option available. Might have been overzealous on the tests but I like to be sure I know what a regex is actually doing.

const emailRegex = new RegExp(/ <(\S+@\S+)>$/);
const emailMatch = author.match(emailRegex);
if (!emailMatch) {
throw new Error(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if there was a miscommunication @gpaciga but I don't think we want to throw do we? The email can be invalid today and we should just fallback to the existing behavior in that case.

Was there a concern you had with the suggestion that I added to the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a difference - using a regex means there's new expectations on the format of the author build context var or the email in git that didn't exist before. Now, if there's not an AUTHOR build context, we might use the email from git or we might not. If there is an AUTHOR build context var, but formatted oddly, we risk creating a garbage url without the clear cause and effect that inputing a garbage AVATAR_URL build context has. I think it's work being clear about what we're doing.

Something closer to the existing behaviour would be to throw in getAvatarUrl in the case that no email is found, the difference being now that the error is with a regex rather than git. There are actually several possible remedies now - use the AVATAR_URL build context, or fix the AUTHOR build context, or fix the git email.

If we really want to maintain the existing behaviour of accept anything in git no matter what, I think the only choices are either (1) only use the regex to parse the AUTHOR build context or (2) go back to an explicit AUTHOR_EMAIL var. In the first case, we don't need to throw if you prefer, but I would add a comment to the configuration docs making clear what the connection between AUTHOR and AVATAR_URL is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want to maintain the existing behaviour of accept anything in git no matter what

Since I don't want to force a major version bump just for this, yes, this is a requirement of the impl here.

I think the only choices are either (1) only use the regex to parse the AUTHOR build context

I think that was exactly my suggestion in #452 (comment), right? :)

In the first case, we don't need to throw if you prefer, but I would add a comment to the configuration docs making clear what the connection between AUTHOR and AVATAR_URL is.

That sounds great to me 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was exactly my suggestion in #452 (comment), right? :)

Not quite, the suggestion uses the regex regardless of how the author is retrieved. It also appears to drop support for AVATAR_URL. Unless I'm misreading something?

Just to confirm though before I take another stab at it, since this differs slightly from the original comment, what we want is:

  • if LHCI_BUILD_CONTEXT__AVATAR_URL is set, use that (existing behaviour)
  • Otherwise, if LHCI_BUILD_CONTEXT__AUTHOR is set, extract email from that with regex
    • use whatever extraction provides as email
    • if no match is found, fall back to what? empty string? and rely on docs to flag this case.
  • Otherwise, if neither are set, get email from git log and throw if that fails (existing behaviour)

This would still change the behaviour for someone setting AUTHOR but not AVATAR_URL.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, the suggestion uses the regex regardless of how the author is retrieved.

Correct, I'm sorry but I'm still not understanding why we would want to restrict it to just LHCI_BUILD_CONTEXT__AUTHOR 😅 we'll get the same answer either way, just with a regex instead?

It also appears to drop support for AVATAR_URL. Unless I'm misreading something?

No it doesn't. Maybe it would help if I applied my suggestion and showed the full code I am talking about

  const envHash = getEnvVarIfSet([
    // Manual override
    'LHCI_BUILD_CONTEXT__AVATAR_URL',
  ]);  
  if (envHash) return envHash;
  
  // Next try to parse the email from the complete author.
  const author = getAuthor(hash);
  const emailMatch = author.match(/ <(\S+@\S+)>$/);
  if (emailMatch) return emailMatch[1];

  // Finally fallback to git again if we couldn't parse the email out of the author.
  const result = childProcess.spawnSync('git', ['log', '--format=%aE', '-n', '1', hash], {
    encoding: 'utf8',
  });
  if (result.status !== 0) {
    throw new Error(
      "Unable to determine commit author's avatar URL because `git log --format=%aE -n 1` failed to provide author's email. " +
        'This can be overridden with setting LHCI_BUILD_CONTEXT__AVATAR_URL env.'
    );
  }

  return getGravatarUrlFromEmail(result.stdout);

This solves every concern I can think of right now. It uses an email from LHCI_BUILD_CONTEXT__AUTHOR if an email exists in it; if one isn't being set, or it's missing an email in the manual override, or our regex is wrong and can't parse the git output for some reason, it gracefully falls back to the behavior today.

If we flag that the avatar URL will be inferred from AUTHOR email in the docs, I believe this also addresses all the concerns you have mentioned 👍

This would still change the behaviour for someone setting AUTHOR but not AVATAR_URL.

To clarify, I'm not concerned about any behavior changing. Automatically picking up an email from the AUTHOR env var is a sensible bug fix. I'm concerned about behavior that results in a fatal error where previously there was no fatal error (as would happen if we threw on LHCI_BUILD_CONTEXT__AUTHOR missing an email which has never been a requirement)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have to be if (emailMatch) return getGravatarUrlFromEmail(emailMatch[1]); but otherwise ok, we're looking at the same code now :)

There are values for %aN and %aE that can find a match in %aN <%aE> that do not equal %aE, which changes the resulting gravatar url someone would get. Are we ok with that case? It would require %aE not being a valid email in the first place, so presumably those cases would have had the 'wrong' gravatar url anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah oops, sorry yes return getGravatarUrlFromEmail(emailMatch[1]); :)

Are we ok with that case? It would require %aE not being a valid email in the first place, so presumably those cases would have had the 'wrong' gravatar url anyway.

Yeah that seems fine to me. I'm primarily concerned with changes that result in exceptions and non-zero exit codes that fail users' CI pipeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think we're on the same page now, new commit in.

I kept the separate regex function for the sake of simpler/more direct tests, but let me know if you'd prefer to move it inline in getAvatarUrl like you had and I'll adjust.

expect(() => buildContext.getEmailFromAuthor('Patrick Hulce <patrick@ >')).toThrow();
});

it('should ignore angle brackets other than the last pair', () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are amazing thank you @gpaciga ! 😃 🎉

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect 👌 thanks very much @gpaciga!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add build context env for author email for deriving gravatar urls
2 participants