Discussion:
[jira] [Created] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
Ben Fortuna (JIRA)
2016-08-19 05:07:20 UTC
Ben Fortuna created COCOON-2352:
-----------------------------------

Summary: XMLEncoder doesn't support Unicode surrogate pairs
Key: COCOON-2352
URL: https://issues.apache.org/jira/browse/COCOON-2352
Project: Cocoon
Issue Type: Bug
Components: * Cocoon Core
Reporter: Ben Fortuna


Whilst investigating an issue with emoji support in the Sling project, I noticed that the XMLEncoder used by HTMLSerializer doesn't support Unicode surrogate pairs, which represent higher-order Unicode characters (code points above U+FFFF).

A simple unit test that demonstrates this issue is here:

https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy

More background info here also: SLING-5973

This seems to have been identified/addressed in other Apache projects also:

https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22
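
For context, here is a minimal Java sketch of the problem being reported. It is not Cocoon's XMLEncoder; the class and method names are hypothetical. A character outside the Basic Multilingual Plane, such as the emoji U+1F600, occupies two `char` values in a Java string, so an encoder that writes one numeric character reference per `char` ends up referencing two surrogate code points, which are not legal XML characters.

```java
// Hypothetical illustration; not taken from Cocoon's XMLEncoder.
class SurrogatePairDemo {

    /** Naive approach: one numeric character reference per UTF-16 code unit. */
    static String encodePerChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            out.append("&#x")
               .append(Integer.toHexString(text.charAt(i)).toUpperCase())
               .append(';');
        }
        return out.toString();
    }

    /** Code-point-aware approach: one reference per Unicode code point. */
    static String encodePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            out.append("&#x")
               .append(Integer.toHexString(cp).toUpperCase())
               .append(';');
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(emoji.length());            // 2
        System.out.println(encodePerChar(emoji));      // &#xD83D;&#xDE00;
        System.out.println(encodePerCodePoint(emoji)); // &#x1F600;
    }
}
```

The second method gives the output a surrogate-aware encoder would be expected to produce.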





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Ben Fortuna (JIRA)
2016-09-15 03:55:21 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492282#comment-15492282 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

So I've looked at XMLEncoder, and it seems that the fix will require a change to the method signature - specifically XMLEncoder.encode(char c):

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java#L88

Unfortunately this also means the Encoder interface needs to change, so we will first need to identify what else implements this interface. The proposed change would be something like:

public char[] Encoder.encode(char[] chars)

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/Encoder.java#L36

I'm happy to implement a fix and submit a pull request, just looking for some acknowledgement of the issue before proceeding.
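
A rough sketch of what an array-based signature like the one proposed above could look like. The interface and class names (`ArrayEncoder`, `NumericRefEncoder`) are hypothetical and are not Cocoon's actual Encoder API; escaping of markup characters such as `<` and `&` is left out to keep the example short.

```java
// Hypothetical sketch of an array-based encoder; not Cocoon's Encoder interface.
interface ArrayEncoder {
    char[] encode(char[] chars);
}

class NumericRefEncoder implements ArrayEncoder {
    public char[] encode(char[] chars) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < chars.length; ) {
            // Reads a full code point, combining a surrogate pair when present.
            int cp = Character.codePointAt(chars, i);
            if (cp < 0x80) {
                out.append((char) cp); // ASCII passes through unchanged
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString().toCharArray();
    }
}
```

Because the whole array is visible to one call, both halves of a pair can be encoded together without keeping state between calls. The cost is that every implementation of the interface would have to change.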
Ben Fortuna (JIRA)
2016-09-15 04:16:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492310#comment-15492310 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

A possibly less intrusive approach would be to leave the method signatures as they are but, when a surrogate char is detected, record it and return an empty char array. The second surrogate in the pair is then expected to be encoded next, at which point the correct char array result is returned (if the next char isn't the second surrogate in the pair, throw an encoding exception).
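
The idea described above can be sketched as follows. This is an illustration under assumptions, not the code that was later merged: the class name is hypothetical, `IllegalStateException` stands in for whatever encoding exception the real code would throw, and every character is written as a numeric reference to keep the sketch short.

```java
// Hypothetical sketch of the stateful approach; not the code merged into Cocoon.
class StatefulSurrogateEncoder {
    private static final char[] EMPTY = new char[0];

    // Remembers the first half of a pair between calls; 0 means nothing pending.
    private char pendingHigh;

    char[] encode(char c) {
        if (pendingHigh != 0) {
            char high = pendingHigh;
            pendingHigh = 0;
            if (!Character.isLowSurrogate(c)) {
                throw new IllegalStateException("High surrogate not followed by a low surrogate");
            }
            // Both halves are known now, so emit a single reference.
            return reference(Character.toCodePoint(high, c));
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;
            return EMPTY; // nothing to write until the second half arrives
        }
        if (Character.isLowSurrogate(c)) {
            throw new IllegalStateException("Low surrogate without a preceding high surrogate");
        }
        return reference(c);
    }

    private static char[] reference(int codePoint) {
        return ("&#x" + Integer.toHexString(codePoint).toUpperCase() + ";").toCharArray();
    }
}
```

The signature stays `encode(char)`, but the encoder now carries state from one call to the next, so a single instance must not be shared by callers that interleave their input.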
Francesco Chicchiriccò (JIRA)
2016-09-15 07:26:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492633#comment-15492633 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

Hi Ben, thanks for reporting.

Just to confirm: was this bug identified against Cocoon 2.1? Does it also occur with the latest development version available at [1] (svn checkout from [2])?

Are you willing to provide a patch (possibly including a unit test)?

[1] http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
[2] http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/
Ben Fortuna (JIRA)
2016-09-16 01:37:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495081#comment-15495081 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Hi Francesco,

The JAR I am using is org.apache.cocoon:cocoon-serializers-charsets:1.0.2, which appears to have been built in 2012. It looks like it came from the BRANCH_2_1_X branch, but I can't be certain.

I will try to make a patch - the easiest for me would be a pull request on GitHub, but if you prefer a patch file I can do that also.

I am looking at the unit tests in the project and it is a little difficult to get my head around. Would you prefer that I write a unit test using htmlunit, or junit, or no preference? It appears tests haven't been updated for a number of years. Many thanks.
Francesco Chicchiriccò (JIRA)
2016-09-16 06:57:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Chicchiriccò updated COCOON-2352:
-------------------------------------------
Component/s: Blocks: Serializers
Francesco Chicchiriccò (JIRA)
2016-09-16 07:08:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495597#comment-15495597 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

XMLEncoder (for Cocoon 2.1) is at

http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java

while the Encoder interface is at

https://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/encoding/Encoder.java

As you say above, there are several implementations of this interface.

Also, have you already taken a look at

https://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/util/EncodingSerializer.java

?
Ben Fortuna (JIRA)
2016-09-16 07:23:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Fortuna updated COCOON-2352:
--------------------------------
Comment: was deleted

(was: Ok, I'll first create a unit test to demonstrate the issue. I'd prefer not to change the Encoder interface so I'll see if it's possible to just update XMLEncoder.

I have looked at the EncodingSerializer, however I think a surrogate pair needs to be encoded "together", so the logic really needs to be in the delegate encoder (i.e. XMLEncoder).
)
Ben Fortuna (JIRA)
2016-09-16 07:23:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495623#comment-15495623 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Ok, I'll first create a unit test to demonstrate the issue. I'd prefer not to change the Encoder interface so I'll see if it's possible to just update XMLEncoder.

I have looked at the EncodingSerializer, however I think a surrogate pair needs to be encoded "together", so the logic really needs to be in the delegate encoder (i.e. XMLEncoder).
Francesco Chicchiriccò (JIRA)
2016-09-16 07:28:22 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495631#comment-15495631 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

Understood, thanks for working on this.
ASF GitHub Bot (JIRA)
2016-10-10 01:31:22 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560981#comment-15560981 ]

ASF GitHub Bot commented on COCOON-2352:
----------------------------------------

GitHub user benfortuna opened a pull request:

https://github.com/apache/cocoon/pull/1

Support for Unicode surrogate pairs

This PR adds support for encoding surrogate pairs as a single character in the XMLEncoder implementation. See [COCOON-2352](https://issues.apache.org/jira/browse/COCOON-2352) for further details.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/benfortuna/cocoon BRANCH_2_1_X

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cocoon/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1

----
commit 4975a555b8330446089c81e17e8bfaaaee669600
Author: Ben Fortuna <***@gmail.com>
Date: 2016-10-10T00:11:32Z

Added required folder for build

commit cf2d9b65eb55b9d19a0b0c179e90fe7c7b70b6e6
Author: Ben Fortuna <***@gmail.com>
Date: 2016-10-10T00:11:58Z

Added support for decoding surrogate pairs

commit cc68b0040c5afc6286dc767810ea2ec7abd58340
Author: Ben Fortuna <***@gmail.com>
Date: 2016-10-10T01:26:20Z

Added unit test for encoding unicode surrogate pairs

----
Ben Fortuna (JIRA)
2016-10-10 01:36:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560988#comment-15560988 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

I've just created a pull request in github to add support for surrogate pairs.

https://github.com/apache/cocoon/pull/1

Summary of changes:

* Added an instance variable to XMLEncoder to record the first surrogate of the pair - NOTE: this means the XMLEncoder is no longer thread-safe. This may have implications I'm not aware of (e.g. if an instance is used from multiple threads).
* Added unit test to demonstrate the behaviour - NOTE: I needed to add the serializers project to the test classpath, not sure if there is a better way to do this with the ant config.
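
On the thread-safety note above: one general way to keep a stateful encoder safe is to give each thread its own instance. The sketch below uses `ThreadLocal` for that. It is only an illustration of the technique and not something the pull request does; `PerThreadEncoder` and `PairEncoder` are hypothetical names, and unpaired surrogates are not validated here.

```java
// Hypothetical sketch: confining a stateful encoder to one thread with ThreadLocal.
class PerThreadEncoder {

    /** Minimal stand-in for an encoder that remembers a high surrogate between calls. */
    static class PairEncoder {
        private char pendingHigh; // 0 means nothing pending

        String encode(char c) {
            if (Character.isHighSurrogate(c)) {
                pendingHigh = c;
                return "";
            }
            int cp = c;
            if (pendingHigh != 0 && Character.isLowSurrogate(c)) {
                cp = Character.toCodePoint(pendingHigh, c);
            }
            pendingHigh = 0;
            return "&#x" + Integer.toHexString(cp).toUpperCase() + ";";
        }
    }

    // Each thread lazily gets its own PairEncoder, so pending state is never shared.
    private static final ThreadLocal<PairEncoder> ENCODER = new ThreadLocal<PairEncoder>() {
        @Override
        protected PairEncoder initialValue() {
            return new PairEncoder();
        }
    };

    static String encode(char c) {
        return ENCODER.get().encode(c);
    }
}
```

With a single shared instance, a character encoded by a second thread between the two halves of a pair would discard the pending surrogate; with per-thread instances the two streams do not interfere.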

I look forward to any feedback or comments.

regards,
ben
ASF GitHub Bot (JIRA)
2016-10-10 07:29:21 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561538#comment-15561538 ]

ASF GitHub Bot commented on COCOON-2352:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/cocoon/pull/1
Francesco Chicchiriccò (JIRA)
2016-10-10 07:33:21 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561543#comment-15561543 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

Hi [~fortuna], thanks for your PR (which is also the very first coming from github, wow...)!

As you can see from [1] (I had to download the PR as diff, then rework it a bit to make it compatible with Cocoon 2.1 JUnit tests [2]), your changes are now incorporated.
I have also added [3] to properly handle XMLEncoder#highSurrogate re-initialization.

Shall we close this issue, then?

[1] http://svn.apache.org/viewvc?view=revision&revision=1764023
[2] http://cocoon.apache.org/2.1/installing/tests.html
[3] http://svn.apache.org/viewvc/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/EncodingSerializer.java?r1=1764023&r2=1764022&pathrev=1764023
Francesco Chicchiriccò (JIRA)
2016-10-10 07:39:22 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Chicchiriccò reassigned COCOON-2352:
----------------------------------------------

Assignee: Francesco Chicchiriccò
Francesco Chicchiriccò (JIRA)
2016-10-10 07:40:21 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Chicchiriccò updated COCOON-2352:
-------------------------------------------
Fix Version/s: 2.1.13
Francesco Chicchiriccò (JIRA)
2016-10-10 07:40:21 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Chicchiriccò updated COCOON-2352:
-------------------------------------------
Affects Version/s: 2.1.12
Hudson (JIRA)
2016-10-10 08:19:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561631#comment-15561631 ]

Hudson commented on COCOON-2352:
--------------------------------

FAILURE: Integrated in Jenkins build Cocoon 2.1.X #111 (See [https://builds.apache.org/job/Cocoon%202.1.X/111/])
[COCOON-2352] Support for Unicode surrogate pairs - This closes #1 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1764023])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/EncodingSerializer.java
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* (add) BRANCH_2_1_X/src/blocks/serializers/test
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java
Ben Fortuna (JIRA)
2016-10-10 22:26:20 UTC
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563730#comment-15563730 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Hmm, I guess from that failed build that you are still maintaining compatibility with Java 1.4 (Character.isLowSurrogate() was introduced in 1.5). I guess we can work around that although I'm not sure anyone is using Java 1.4 anymore.. ;)
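
For reference, had Java 1.4 compatibility been required, the surrogate checks could have been written as plain range comparisons instead of the `Character` methods added in Java 5. The helper class below is a hypothetical sketch of such a workaround and not part of the Cocoon change.

```java
// Hypothetical helpers that avoid the Java 5 Character surrogate methods.
class SurrogateCompat {

    /** High (leading) surrogates occupy U+D800..U+DBFF. */
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    /** Low (trailing) surrogates occupy U+DC00..U+DFFF. */
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    /** Combines a valid pair into a supplementary code point (U+10000 and above). */
    static int toCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }
}
```

For example, `SurrogateCompat.toCodePoint('\uD83D', '\uDE00')` yields `0x1F600`, the same value `Character.toCodePoint` returns on Java 5 and later.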
Francesco Chicchiriccò (JIRA)
2016-10-11 07:01:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564709#comment-15564709 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

We have just decided to upgrade to 1.5 compatibility in COCOON-2356, so I am happy to keep your contribution.
A subsequent build [1] succeeded, in fact.

Can we close this issue, then?

[1] https://builds.apache.org/job/Cocoon%202.1.X/112
Ben Fortuna (JIRA)
2016-10-12 23:15:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570179#comment-15570179 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

[~ilgrosso] I am happy to have this issue closed, however it would be good if there was a snapshot JAR available to verify the functionality. Specifically I am hoping this change will make it into this artefact:

http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.cocoon%22%20AND%20a%3A%22cocoon-serializers-charsets%22

Will a new version be produced with the next release? Many thanks for your efforts.
Ben Fortuna (JIRA)
2016-10-12 23:16:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570179#comment-15570179 ]

Ben Fortuna edited comment on COCOON-2352 at 10/12/16 11:15 PM:
----------------------------------------------------------------

[~ilgrosso] I am happy to have this issue closed, however it would be good if there was a snapshot JAR available to verify the functionality. Specifically I am hoping this change will make it into this artefact:

http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.cocoon%22%20AND%20a%3A%22cocoon-serializers-charsets%22

Will a new version be produced with the next release? Many thanks for your efforts.
Francesco Chicchiriccò (JIRA)
2016-10-13 06:50:21 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571065#comment-15571065 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

I have reworked your patch to be also applied to the org.apache.cocoon:cocoon-serializers-charsets Maven artifact (used by Cocoon 2.2 and Cocoon 3.0).

I don't know when we will be able to officially release your fix there; in the meanwhile, however, you could use the SNAPSHOT artifact by setting the following dependency:

<dependency>
  <groupId>org.apache.cocoon</groupId>
  <artifactId>cocoon-serializers-charsets</artifactId>
  <version>1.0.3-SNAPSHOT</version>
</dependency>

and adding the following repository to your pom:

<repository>
  <id>apache.snapshots</id>
  <name>Apache Snapshot Repository</name>
  <url>http://repository.apache.org/snapshots</url>
  <releases>
    <enabled>false</enabled>
  </releases>
</repository>

Alternatively, you can download the updated SNAPSHOT artifact from

https://repository.apache.org/content/groups/snapshots/org/apache/cocoon/cocoon-serializers-charsets/1.0.3-SNAPSHOT/cocoon-serializers-charsets-1.0.3-20161013.064604-1.jar
Francesco Chicchiriccò (JIRA)
2016-10-13 06:51:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Chicchiriccò closed COCOON-2352.
------------------------------------------
Resolution: Fixed
Ben Fortuna (JIRA)
2016-10-14 04:56:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574193#comment-15574193 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

[~ilgrosso] Fantastic, thanks. I've used the snapshot dependency to test the fix in my project, and I did notice one more thing: whilst it does create the Unicode character correctly from the surrogate pair, it doesn't actually HTML-encode the character.

In order to fix this I've created another pull request, which simply encodes the unicode character created from the surrogate pair:

https://github.com/apache/cocoon/pull/2/files#diff-2b4ac8dab4cdcce4c7ffd948c2490b52R101

I hope it isn't too much trouble to apply this change also; I'm confident this is the last change required. Many thanks.
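
To illustrate what HTML-encoding the combined character can look like, here is a minimal sketch that joins a surrogate pair into a code point and renders it as a numeric character reference. The class name NumericReference and the hexadecimal "&#x...;" output format are assumptions made for this example; the actual change is in the pull request linked above and may differ.

```java
// Hypothetical sketch, not the Cocoon implementation.
final class NumericReference {
    private NumericReference() {}

    // Combine a high/low surrogate pair into one code point (Java 1.5+).
    static int combine(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("Not a surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    // Render a code point as a hexadecimal numeric character reference.
    // The "&#x...;" format is an assumption made for this example.
    static char[] encode(int codePoint) {
        String hex = Integer.toHexString(codePoint).toUpperCase();
        return ("&#x" + hex + ";").toCharArray();
    }

    public static void main(String[] args) {
        int codePoint = combine('\uD83C', '\uDF40');
        System.out.println(new String(encode(codePoint)));
    }
}
```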
Francesco Chicchiriccò (JIRA)
2016-10-14 06:48:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574444#comment-15574444 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

Ben, I have applied your further PR in [1] but I have unfortunately noticed later that the test is failing in this assertion:

assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));

Unfortunately, I have noticed this *after* committing to COCOON_2_1_X, but I have stopped myself right before deploying the updated SNAPSHOT artifact (thanks Maven and the surefire plugin!).

Does your test case need to be updated as well?

[1] http://svn.apache.org/viewvc?rev=1764819&view=rev
Hudson (JIRA)
2016-10-14 08:08:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574611#comment-15574611 ]

Hudson commented on COCOON-2352:
--------------------------------

SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #115 (See [https://builds.apache.org/job/Cocoon%202.1.X/115/])
[COCOON-2352] This closes #2 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1764819])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
Ben Fortuna (JIRA)
2016-10-16 23:15:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580714#comment-15580714 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Yes sorry, I forgot to mention I had updated the unit test also. See the same PR for the changes.
Ben Fortuna (JIRA)
2016-10-16 23:39:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580714#comment-15580714 ]

Ben Fortuna edited comment on COCOON-2352 at 10/16/16 11:39 PM:
----------------------------------------------------------------

Yes sorry, I forgot to mention I had updated the unit test also. See the same PR for the changes (3 lines in the test method).

https://github.com/apache/cocoon/pull/2/files#diff-4f5d5b9cb8b320832b3f0dfb8183a1b9R28




was (Author: fortuna):
Yes sorry, I forgot to mention I had updated the unit test also. See the same PR for the changes.
Francesco Chicchiriccò (JIRA)
2016-10-17 07:07:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581450#comment-15581450 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

With the new test code, I receive

java.lang.IllegalArgumentException: Expected low surrogate char
at org.apache.cocoon.components.serializers.encoding.XMLEncoder.encode(XMLEncoder.java:97)
at org.apache.cocoon.components.serializers.encoding.XMLEncoderTestCase.testEncodingSurrogatePairs(XMLEncoderTestCase.java:42)

when running the test.
Ben Fortuna (JIRA)
2016-10-17 12:45:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582147#comment-15582147 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}
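
The reason the two calls must come in sequence is that a char-at-a-time encoder has to remember the high surrogate until the low one arrives. A minimal sketch of that idea follows, assuming a hypothetical PairBufferingEncoder class and a hexadecimal "&#x...;" output format; it is not the actual XMLEncoder code, and it passes ordinary characters through unchanged for brevity.

```java
// Hypothetical sketch of a stateful char-at-a-time encoder; not Cocoon code.
final class PairBufferingEncoder {
    // Pending high surrogate, or 0 when nothing is buffered.
    private char pendingHigh;

    char[] encode(char c) {
        if (Character.isHighSurrogate(c)) {
            // First half of a pair: remember it and emit nothing yet.
            pendingHigh = c;
            return new char[0];
        }
        if (pendingHigh != 0) {
            char high = pendingHigh;
            pendingHigh = 0;
            if (!Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Expected low surrogate char");
            }
            // Second half: combine both halves and emit a single reference.
            return reference(Character.toCodePoint(high, c));
        }
        if (Character.isLowSurrogate(c)) {
            throw new IllegalArgumentException("Unexpected low surrogate char");
        }
        // Simplification: every other char is returned as-is.
        return new char[] { c };
    }

    // Assumed output format: hexadecimal numeric character reference.
    private static char[] reference(int codePoint) {
        String hex = Integer.toHexString(codePoint).toUpperCase();
        return ("&#x" + hex + ";").toCharArray();
    }

    public static void main(String[] args) {
        PairBufferingEncoder encoder = new PairBufferingEncoder();
        System.out.println(encoder.encode('\uD83C').length);
        System.out.println(new String(encoder.encode('\uDF40')));
    }
}
```

With this design, calling encode on the low surrogate alone fails, which is consistent with a test that only passes when both halves are fed to the same encoder instance in order.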
Ben Fortuna (JIRA)
2016-10-17 12:45:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582147#comment-15582147 ]

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:45 PM:
----------------------------------------------------------------

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

```
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
```



was (Author: fortuna):
Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}
Ben Fortuna (JIRA)
2016-10-17 12:46:59 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582147#comment-15582147 ]

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:46 PM:
----------------------------------------------------------------

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));




was (Author: fortuna):
Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

```
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
```
Francesco Chicchiriccò (JIRA)
2016-10-17 13:01:02 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582183#comment-15582183 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

My bad: problem solved, committed to (Cocoon 2.1)

* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java

and (Maven artifact, with SNAPSHOT already redeployed):

* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/test/java/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.java
Hudson (JIRA)
2016-10-17 13:08:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582202#comment-15582202 ]

Hudson commented on COCOON-2352:
--------------------------------

SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #116 (See [https://builds.apache.org/job/Cocoon%202.1.X/116/])
[COCOON-2352] Applying further changes to better deal with HTML encoding (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1765265])
* (edit) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java
Ben Fortuna (JIRA)
2016-10-17 13:56:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582333#comment-15582333 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Great! I've tested the snapshot against my code and it looks good. Many thanks for your assistance. :-)
Francesco Chicchiriccò (JIRA)
2016-10-17 13:58:59 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582339#comment-15582339 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

You're welcome ;-)
Ben Fortuna (JIRA)
2016-10-20 01:25:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590441#comment-15590441 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

My sincerest apologies, but I discovered a bug in the patch I submitted. I had assumed we could cast an int to a char to encode the higher-order Unicode characters, but of course this isn't possible: a char holds only 16 bits, which is why Unicode surrogate pairs exist in the first place.

So I had to make a slight change to the code (again). I have updated two files, XMLEncoder and XMLEncoderTestCase, to ensure that after combining a surrogate pair into a code point we then correctly encode the int value as an HTML-compatible string.
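
For readers following along, here is a minimal sketch of the idea described above. It is not the code from the pull request: the class name SurrogatePairSketch and the method encode(String) are hypothetical, and emitting a hexadecimal numeric character reference for every non-ASCII character is an assumption made for illustration. It only uses standard java.lang.Character methods to show why the cast loses information and how a pair can be combined before encoding.

```java
// Illustrative only: hypothetical names, not the Cocoon XMLEncoder API.
class SurrogatePairSketch {

    // Encodes non-ASCII characters as numeric character references,
    // combining surrogate pairs into a single code point first.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                // Two chars form one supplementary code point (above U+FFFF).
                int codePoint = Character.toCodePoint(c, text.charAt(i + 1));
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
                i += 2;
            } else if (c > 0x7F) {
                // Assumption: every other non-ASCII char gets a reference too.
                // A lone surrogate also lands here; a real encoder would have
                // to decide how to treat it.
                out.append("&#x")
                   .append(Integer.toHexString(c).toUpperCase())
                   .append(';');
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        int codePoint = emoji.codePointAt(0);
        // Casting to char keeps only the low 16 bits, so the value changes.
        System.out.println((int) (char) codePoint == codePoint); // false
        System.out.println(encode("a" + emoji)); // a&#x1F600;
    }
}
```

The key point is that the code point stays an int all the way to the string conversion; it is never narrowed back to a char.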

https://github.com/apache/cocoon/pull/3/files

Thanks again, and fingers crossed there are no more changes required. :-)
Francesco Chicchiriccò (JIRA)
2016-10-20 11:49:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591607#comment-15591607 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

Further changes committed with [1] (Cocoon 2.1) and [2] (Cocoon XML Serializers); 1.0.3-SNAPSHOT redeployed to Maven repo.

Thanks again.

[1] http://svn.apache.org/viewvc?rev=1765804&view=rev
[2] http://svn.apache.org/viewvc?rev=1765807&view=rev
Hudson (JIRA)
2016-10-20 12:28:58 UTC
Permalink
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591691#comment-15591691 ]

Hudson commented on COCOON-2352:
--------------------------------

SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #117 (See [https://builds.apache.org/job/Cocoon%202.1.X/117/])
[COCOON-2352] Third PR applied - This closes #3 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1765804])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* (edit) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java
