Update dependency org.jsoup:jsoup to v1.21.2 #44

2025-04-29T09:00:26+02:00

renovate commented

2025-04-29 09:00:26 +02:00

This PR contains the following updates:

Package	Type	Update	Change
org.jsoup:jsoup (source)	compile	minor	`1.19.1` -> `1.21.2`

Release Notes

jhy/jsoup (org.jsoup:jsoup)

`v1.21.2`

Changes

Deprecated internal (yet visible) methods Normalizer#normalize(String, boolean) and Attribute#shouldCollapseAttribute(Document.OutputSettings). These will be removed in a future version.
Deprecated Connection#sslSocketFactory(SSLSocketFactory) in favor of the new Connection#sslContext(SSLContext). Using sslSocketFactory will force the use of the legacy HttpUrlConnection implementation, which does not support HTTP/2. #2370

Improvements

When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. #2349.
Updated Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #2356
Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #2369
Added Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttpUrlConnection implementations. #2370
Performance optimizations for DOM manipulation methods, including when repeatedly removing an element's first child (element.child(0).remove()), and when using Parser#parseBodyFragment() to parse a large number of direct children. #2373.
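
The move from sslSocketFactory to sslContext means callers now supply a javax.net.ssl.SSLContext. The following is a minimal sketch, using only the JDK, of building such a context with the JVM's default key and trust material. The class and method names are hypothetical, and the jsoup call appears only in a comment, based on the signature named in the notes above.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;

final class SslContextSketch {
    /** Builds a TLS context backed by the JVM's default key and trust managers. */
    static SSLContext defaultTlsContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            // Passing null selects the default KeyManagers, TrustManagers and SecureRandom.
            context.init(null, null, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS is not available on this JVM", e);
        }
    }

    public static void main(String[] args) {
        SSLContext context = defaultTlsContext();
        System.out.println(context.getProtocol());
        // Assumption: with jsoup 1.21.2 on the classpath, the context would be passed
        // through the method named in the release notes, for example:
        //   Jsoup.connect(url).sslContext(context).get();
    }
}
```

A project that needs a custom trust store would pass its own TrustManager array to init() instead of null.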

Bug Fixes

When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. #2353.
In NodeTraversor, if a last child element was removed during the head() call, the parent would be visited twice. #2355.
Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for Attributes#size() and Attributes#isEmpty(). #2356
In a multithreaded application where multiple threads are calling Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #2366
When parsing HTML with svg:script elements in SVG elements, don't enter the Text insertion mode, but continue to parse as foreign content. Otherwise, misnested HTML could then cause an IndexOutOfBoundsException. #2374
Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency. #2377.
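
The InputStream fix above concerns a multibyte character split across two buffer fills. As an illustration of that class of problem (not jsoup's actual implementation), the sketch below decodes UTF-8 in fixed-size chunks with the JDK's CharsetDecoder and carries any incomplete trailing bytes over to the next chunk. The class name and the chunk size are assumptions chosen for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ChunkedDecodeSketch {
    /** Decodes UTF-8 input chunk by chunk, keeping partial characters for the next round. */
    static String decodeInChunks(byte[] input, int chunkSize) {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        if (input.length == 0) return "";
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Up to 3 bytes of an unfinished UTF-8 sequence may be carried over.
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer chars = CharBuffer.allocate(chunkSize + 4);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length) {
            int n = Math.min(chunkSize, input.length - pos);
            bytes.put(input, pos, n);
            pos += n;
            bytes.flip();
            // endOfInput is only true for the last chunk, so a split character is not an error.
            decoder.decode(bytes, chars, pos == input.length);
            chars.flip();
            out.append(chars);
            chars.clear();
            // Keep undecoded trailing bytes at the start of the buffer for the next chunk.
            bytes.compact();
        }
        decoder.flush(chars);
        chars.flip();
        out.append(chars);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo \u20ac";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // A chunk size of 1 forces every multibyte character to straddle a boundary.
        System.out.println(decodeInChunks(utf8, 1).equals(text));
    }
}
```

The key step is compact(): dropping the leftover bytes, or treating them as end of input, is the kind of mistake that leaves a stream only partially read.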

`v1.21.1`

Changes

Removed previously deprecated methods. #2317
Deprecated the :matchText pseudo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class type) method instead. #2343
Deprecated Connection.Response#bufferUp() in lieu of Connection.Response#readFully() which can throw a checked IOException.
Deprecated internal methods Validate#ensureNotNull (replaced by typed Validate#expectNotNull); protected HTML appenders from Attribute and Node.
If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.

Improvements

Enhanced the Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a comment containing "prices". Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class nodeType) for direct node selection. #2324
Added TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace).
Made TokenQueue and CharacterReader autocloseable, to ensure that they will release their buffers back to the buffer pool, for later reuse.
Added Selector#evaluatorOf(String css), as a clearer way to obtain an Evaluator from a CSS query. An alias of QueryParser.parse(String css).
Custom tags (defined via the TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
Added NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
Updated the default user-agent string to improve compatibility. #2341
The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) #2326.
Added Connection#readFully() as a replacement for Connection#bufferUp() with an explicit IOException. Similarly, added Connection#readBody() over Connection#body(). Deprecated Connection#bufferUp(). #2327
When serializing HTML, the < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #2337
Changed Connection to prefer using the JDK's HttpClient over HttpUrlConnection, if available, to enable HTTP/2 support by default. Users can disable via -Djsoup.useHttpClient=false. #2340
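
The attribute-escaping change above can be pictured with a small standalone sketch. This is not jsoup's serializer; it is a hypothetical helper that shows why escaping < and > inside attribute values matters: markup hidden in a value can no longer reappear as a tag if the serialized output is parsed again.

```java
final class AttributeEscapeSketch {
    /** Escapes the characters that could end an attribute value or start a tag. */
    static String escapeAttributeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '<': sb.append("&lt;"); break;   // escaped in attributes as of 1.21.1, per the notes
                case '>': sb.append("&gt;"); break;   // escaped in attributes as of 1.21.1, per the notes
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hostile = "</style><img src=x>";
        System.out.println("<div title=\"" + escapeAttributeValue(hostile) + "\"></div>");
    }
}
```

Code in this project that compares serialized HTML against stored strings may see different output for attributes containing these characters after the upgrade.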

Bug Fixes

The contents of a script in a svg foreign context should be parsed as script data, not text. #2320
Tag#isFormSubmittable() was updating the Tag's options. #2323
The HTML pretty-printer would incorrectly trim whitespace when text followed an inline element in a block element. #2325
Custom tags with hyphens or other non-letter characters in their names now work correctly as Data or RcData tags. Their closing tags are now tokenized properly. #2332
When cloning an Element, the clone would retain the source's cached child Element list (if any), which could lead to incorrect results when modifying the clone's child elements. #2334

`v1.20.1`

Changes

To better follow the HTML5 spec and current browsers, the HTML parser no longer allows self-closing tags (<foo />) to close HTML elements by default. Foreign content (SVG, MathML), and content parsed with the XML parser, still supports self-closing tags. If you need specific HTML tags to support self-closing, you can register a custom tag via the TagSet configured in Parser.tagSet(), using Tag#set(Tag.SelfClose). Standard void tags (such as <img>, <br>, etc.) continue to behave as usual and are not affected by this change. #2300.
The following internal components have been deprecated. If you do happen to be using any of these, please take the opportunity now to migrate away from them, as they will be removed in jsoup 1.21.1.
- ChangeNotifyingArrayList, Document.updateMetaCharsetElement(), Document.updateMetaCharsetElement(boolean), HtmlTreeBuilder.isContentForTagData(String), Parser.isContentForTagData(String), Parser.setTreeBuilder(TreeBuilder), Tag.formatAsBlock(), Tag.isFormListed(), TokenQueue.addFirst(String), TokenQueue.chompTo(String), TokenQueue.chompToIgnoreCase(String), TokenQueue.consumeToIgnoreCase(String), TokenQueue.consumeWord(), TokenQueue.matchesAny(String...)

Functional Improvements

Rebuilt the HTML pretty-printer, to simplify and consolidate the implementation, improve consistency, support custom Tags, and provide a cleaner path for ongoing improvements. The specific HTML produced by the pretty-printer may be different from previous versions. #2286.
Added the ability to define custom tags, and to modify properties of known tags, via the TagSet tag collection. Their properties can impact both the parse and how content is serialized (output as HTML or XML). #2285.
Element.cssSelector() will prefer to return shorter selectors by using ancestor IDs when available and unique. E.g. #id > div > p instead of html > body > div > div > p #2283.
Added Elements.deselect(int index), Elements.deselect(Object o), and Elements.deselectAll() methods to remove elements from the Elements list without removing them from the underlying DOM. Also added Elements.asList() method to get a modifiable list of elements without affecting the DOM. (Individual Elements remain linked to the DOM.) #2100.
Added support for sending a request body from an InputStream with Connection.requestBodyStream(InputStream stream). #1122.
The XML parser now supports scoped xmlns: prefix namespace declarations, and applies the correct namespace to Tags and Attributes. Also, added Tag#prefix(), Tag#localName(), Attribute#prefix(), Attribute#localName(), and Attribute#namespace() to retrieve these. #2299.
CSS identifiers are now escaped and unescaped correctly to the CSS spec. Element#cssSelector() will emit appropriately escaped selectors, and the QueryParser supports those. Added Selector.escapeCssIdentifier() and Selector.unescapeCssIdentifier(). #2297, #2305
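
To make the CSS identifier escaping item concrete, here is a minimal sketch of the "serialize an identifier" rules from the CSSOM specification, written against the JDK only. Whether Selector.escapeCssIdentifier() follows these rules in every detail is an assumption; the class and method names below are hypothetical.

```java
final class CssEscapeSketch {
    /** Escapes a string so it can be used as a CSS identifier (CSSOM serialization rules). */
    static String escapeIdentifier(String ident) {
        int[] cps = ident.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            int cp = cps[i];
            if (cp == 0) {
                out.append('\uFFFD');                      // NULL becomes the replacement character
            } else if ((cp >= 0x1 && cp <= 0x1F) || cp == 0x7F) {
                appendHexEscape(out, cp);                  // control characters
            } else if (isDigit(cp) && (i == 0 || (i == 1 && cps[0] == '-'))) {
                appendHexEscape(out, cp);                  // leading digit, or digit after a leading '-'
            } else if (cp == '-' && i == 0 && cps.length == 1) {
                out.append("\\-");                         // a lone hyphen
            } else if (cp >= 0x80 || cp == '-' || cp == '_' || isDigit(cp) || isAsciiLetter(cp)) {
                out.appendCodePoint(cp);                   // safe as-is
            } else {
                out.append('\\').appendCodePoint(cp);      // any other ASCII character
            }
        }
        return out.toString();
    }

    private static void appendHexEscape(StringBuilder out, int cp) {
        // Hex escapes are terminated by a single space.
        out.append('\\').append(Integer.toHexString(cp)).append(' ');
    }

    private static boolean isDigit(int cp) {
        return cp >= '0' && cp <= '9';
    }

    private static boolean isAsciiLetter(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println("#" + escapeIdentifier("1abc"));
        System.out.println("." + escapeIdentifier("a.b"));
    }
}
```

IDs or class names that start with a digit or contain punctuation are the cases where Element#cssSelector() output is most likely to differ from earlier versions.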

Structure and Performance Improvements

Refactored the CSS QueryParser into a clearer recursive descent parser. #2310.
CSS selectors with consecutive combinators (e.g. div >> p) will throw an explicit parse exception. #2311.
Performance: reduced the shallow size of an Element from 40 to 32 bytes, and the NodeList from 32 to 24. #2307.
Performance: reduced GC load of new StringBuilders when tokenizing input HTML. #2304.
Made Parser instances threadsafe, so that inadvertent use of the same instance across threads will not lead to errors. For actual concurrency, use Parser#newInstance() per thread. #2314.

Bug Fixes

Element names containing characters invalid in XML are now normalized to valid XML names when serializing. #1496.
When serializing to XML, characters that are invalid in XML 1.0 should be removed (not encoded). #1743.
When converting a Document to the W3C DOM in W3CDom, elements with an attribute in an undeclared namespace now get a declaration of xmlns:prefix="undefined". This allows subsequent serialization to XML via W3CDom.asString() to succeed. #2087.
The StreamParser could emit the final elements of a document twice, due to how onNodeCompleted was fired when closing out the stack. #2295.
When parsing with the XML parser and error tracking enabled, the trailing ? in <?xml version="1.0"?> would incorrectly emit an error. #2298.
Calling Element#cssSelector() on an element with combining characters in the class or ID now produces the correct output. #1984.
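
The XML serialization fix (#1743) removes characters that XML 1.0 does not permit. As a rough illustration, and not jsoup's own code, the sketch below filters a string against the Char production of the XML 1.0 specification. The class and method names are hypothetical.

```java
final class XmlCharFilterSketch {
    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that XML 1.0 forbids, rather than encoding it. */
    static String stripInvalidXml10Chars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
                .filter(XmlCharFilterSketch::isValidXml10Char)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // NUL and BACKSPACE are not valid in XML 1.0, while a newline is.
        System.out.println(stripInvalidXml10Chars("a\u0000b\u0008c\n").equals("abc\n"));
    }
}
```

Removal is used instead of a numeric character reference because a reference to a forbidden character is itself not well-formed in XML 1.0.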

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot.

renovate added 1 commit 2025-04-29 09:00:27 +02:00

Update dependency org.jsoup:jsoup to v1.20.1

continuous-integration/drone/pr Build is failing

0da2b82707

renovate changed title from ~~Update dependency org.jsoup:jsoup to v1.20.1~~ to Update dependency org.jsoup:jsoup to v1.21.1

2025-06-23 07:00:32 +02:00

renovate force-pushed renovate/org.jsoup-jsoup-1.x from 0da2b82707 to 2887ff6123

2025-06-23 07:00:33 +02:00

renovate changed title from ~~Update dependency org.jsoup:jsoup to v1.21.1~~ to Update dependency org.jsoup:jsoup to v1.21.2

2025-11-06 14:44:16 +01:00

renovate force-pushed renovate/org.jsoup-jsoup-1.x from 2887ff6123 to 2df821b28b

2025-11-06 14:44:16 +01:00

bea merged commit 1227fff1b1 into main

2025-11-06 16:13:57 +01:00

bea deleted branch renovate/org.jsoup-jsoup-1.x

2025-11-06 16:13:58 +01:00

bea referenced this issue from a commit

2025-11-06 16:13:59 +01:00

Merge pull request 'Update dependency org.jsoup:jsoup to v1.21.2' (#44) from renovate/org.jsoup-jsoup-1.x into main

Reference: bea/HidekoBot#44