Skip to content

Conversation

@sandyarmstrong
Copy link
Member

In VS 18.4, the editor supports a new ISpanAnnotator API to allow assigning arbitrary kinds to text spans. This is used by Find in Files, and populates a new "Kind" column in the results table. This allows to sort/filter by these kinds, fulfilling the user request of being able to exclude comments from text search.

This PR provides an initial implementation of this API for C# that identifies comments and strings.

image

@sandyarmstrong sandyarmstrong requested review from a team as code owners January 26, 2026 20:34
@dotnet-policy-service dotnet-policy-service bot added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label Jan 26, 2026
private const int LargeObjectHeapLimitInChars = 40 * 1024;

public IEnumerable<SpanAndKind> GetAnnotatedSpans(
ITextSnapshot snapshot,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have an open file, will this give the itextsnapshot for that open file? if so, we can map that right to a preparsed roslyn document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should. But there's no guarantee that this is an open file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i def get that. but we can easily try to get the corresponding doc if it is open, and fall back to this approach if it is not.

that said, if it's a closed file, i would far prefer we get passed a path. That way we can see if we already have the doc, and the parsed tree, for it.

we ccan always fall back to this. but i would prefer we do everything we can to avoid parsing for files we have already.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like it'd be incorrect in some cases - what guarantees you'll be using the same text as the snapshot we're passing to you?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting with Kayle, we've decided not to change the API. Instead, if you find that you need the file path and can use it safely, we can add it into the snapshot's buffer's property bag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolving this since I saw this after replying to the other comment: why aren't we getting an ITextDocument then or something to better pass along the file name? Also should the snapshot be an ITextImage?

public IEnumerable<SpanAndKind> GetAnnotatedSpans(
ITextSnapshot snapshot,
IEnumerable<Span> spans,
CancellationToken cancellationToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question about this api in a general sense. does the UI block on getting this information? or is it added lazily to the UI once computed?

i'm very wary about slowing down the UI with something potentially expensive.

Copy link
Member Author

@sandyarmstrong sandyarmstrong Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Find in Files is implemented today, this is part of a pipeline that blocks on a file-by-file basis.

EDIT: Nope, Kayle is right. For the Find case, it is in fact happening on the threadpool. For the Replace All case, though, it is currently on the UI thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that, i would consider even going simpler here and looking at the line and ignoring it if the line doesn't start with <spaces>//<spaces> or <spaces>*<spaces>. Yes, this is slightly inaccurate. But it's a good filter to avoid unnecessary work.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's multiple tasks running on the threadpool for this - it won't block the UI thread

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kayle is right, edited my comment for accuracy.

}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipped to this point.

@CyrusNajmabadi
Copy link
Contributor

Done with pass.

@davidwengier
Copy link
Member

FYI @sandyarmstrong, I'm going to be creating a release/dev18.4 branch in a few hours so this will probably need retargeting.

{
public ISpanAnnotator GetAnnotator(string fileExtension) => new CSharpSpanAnnotator();

private class CSharpSpanAnnotator : ISpanAnnotator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a design perspective, i would like us to have the roslyn side not be c# specific. this should work for VB as well, and it would not be hard to make that work, given how little of this is lang specific. i can help with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was mostly hoping to just dump this prototype in the Roslyn team's lap and let them do whatever they want with it.

The feedback I'm mostly looking for is on the span annotator API itself. This is based on discussions we had a few months ago, and the outcome of design review, but I'm still open to tweaks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that works for me.

i think we should pass the filename. it just seems like we should let the LS use information it has if it has it. and only pay the tokenization cost if the file really isn't in memory, and the ITextSnapshot is what we should be using.

try
{
tokensEnumerator = SyntaxFactory.ParseTokens(
snapshot.GetText(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the more concerning call. Can we parse tokens from a snapshot directly instead of allocating a giant string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at computer. But if this is creating a string, that's bad. On the other hand, we have helpers to wrap an ITextSnapshot with a SourceText, which is waht we really want to do here.

Note though that i strongly think that we should not just tokenize the entire file to find the matching token for a particular span in teh file. We should use heuristics first to only do this work if we have a strong indication that we're in a comment. In the majority of cases, it would likely be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that's fine. I'm more concerned with false positives than false negatives here (that is, I think users will be more annoyed if we erroneously mark a result as being a comment/string, rather than the other way around).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code path only taken for all files found in Find in Files, or only after we've already determined we don't have the information another way? Because it's quite unfortunate that we have lost file name information here, since it's entirely possible we already have this tree parsed and ready to go if you're using find in files while a solution is open.

<!-- CodeStyleAnalyzerVersion should we updated together with version of dotnet-format in dotnet-tools.json -->
<CodeStyleAnalyzerVersion>4.8.0-3.final</CodeStyleAnalyzerVersion>
<VisualStudioEditorPackagesVersion>18.0.404-preview</VisualStudioEditorPackagesVersion>
<VisualStudioEditorPackagesVersion>18.4.43</VisualStudioEditorPackagesVersion>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to sync with @dibarbet since this will have some impact to our integration tests. At the moment while I write this they're not running, but this would delay their re-enabling further. I'm not taking any position on what we want to do here, just want to call out the potential conflict.

[Export(typeof(ISpanAnnotatorProvider))]
[Name(nameof(CSharpSpanAnnotatorProvider))]
[SupportsFileExtension(".cs")]
[Obsolete] // ISpanAnnotatorProvider is still experimental and subject to change
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General FYI we have an experimental attribute we might start to be able to use for this.


[Export(typeof(ISpanAnnotatorProvider))]
[Name(nameof(CSharpSpanAnnotatorProvider))]
[SupportsFileExtension(".cs")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No concept of content types being used here? Not saying that's bad, it's just another model (unless we have other prior art and I missed the memo).

try
{
tokensEnumerator = SyntaxFactory.ParseTokens(
snapshot.GetText(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code path only taken for all files found in Find in Files, or only after we've already determined we don't have the information another way? Because it's quite unfortunate that we have lost file name information here, since it's entirely possible we already have this tree parsed and ready to go if you're using find in files while a solution is open.

private const int LargeObjectHeapLimitInChars = 40 * 1024;

public IEnumerable<SpanAndKind> GetAnnotatedSpans(
ITextSnapshot snapshot,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolving this since I saw this after replying to the other comment: why aren't we getting an ITextDocument then or something to better pass along the file name? Also should the snapshot be an ITextImage?

@sandyarmstrong sandyarmstrong changed the title [18.4] Add support for labeling Find results as comments or strings [18.5] Add support for labeling Find results as comments or strings Jan 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-IDE Community The pull request was submitted by a contributor who is not a Microsoft employee.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants