Cannot store newlines in text read from URL

Question

I am reading text from a URL using Jsoup. The following linked question gives some tips for preserving newlines when converting the body to text: "How do I save line breaks when using jsoup to convert html to plain text?"

I use the following lines to convert the tags:

  String prettyPrintedBodyFragment = Jsoup.clean(body, "", Whitelist
            .none().addTags("br", "p",  "h1"), new OutputSettings()
            .prettyPrint(true));
  System.out.println(prettyPrintedBodyFragment);

I still get the whole body content on one line. Any tips, please?

EDIT: Here is the complete source code. The output still appears on a single line.

    public static void main(String[] args) throws Exception {

        Connection conn = Jsoup.connect("http://finance.yahoo.com/");
        Document doc = conn.get();

        String body = doc.body().text();

        String prettyPrintedBodyFragment = Jsoup.clean(body, "", Whitelist
                .none().addTags("br", "p", "h1"), new OutputSettings()
                .prettyPrint(true));

        System.out.println(prettyPrintedBodyFragment);
    }

java jsoup

kashili kashili Feb 10 '14 at 1:40

1 answer

Popofibo · Accepted Answer · 2014-02-10T16:31:19+0000

Change:

String body = doc.body().text();

To:

String body = doc.body().html();

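text() has already stripped every tag from the body, so by the time the string reaches Jsoup.clean there are no br, p, or h1 tags left for your Whitelist to keep, and the pretty printer has nothing to break lines on. Passing the HTML instead leaves those tags in place.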
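To make the difference concrete, here is a minimal sketch that runs the same clean call on both text() and html(). It parses an inline HTML string instead of fetching finance.yahoo.com, and the class name NewlineDemo, the helper cleanBody, and the sample markup are made up for illustration. It also assumes a recent jsoup release, where Whitelist has been renamed Safelist; with an older jar, swap the name back to Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Document.OutputSettings;
import org.jsoup.safety.Safelist;

public class NewlineDemo {

    // Same clean call as in the question; Safelist is the newer name for
    // Whitelist (assumption: a recent jsoup version is on the classpath).
    public static String cleanBody(String body) {
        return Jsoup.clean(body, "",
                Safelist.none().addTags("br", "p", "h1"),
                new OutputSettings().prettyPrint(true));
    }

    public static void main(String[] args) {
        // Hypothetical sample page, parsed locally so no network is needed.
        Document doc = Jsoup.parse("<body><h1>Title</h1><p>one<br>two</p></body>");

        // text() drops all tags first, so clean() only sees plain text
        // and returns it on a single line.
        String fromText = cleanBody(doc.body().text());

        // html() keeps the tags, so the allowed block tags survive clean()
        // and the pretty printer can put them on separate lines.
        String fromHtml = cleanBody(doc.body().html());

        System.out.println(fromText);
        System.out.println("----");
        System.out.println(fromHtml);
    }
}
```

Note that the html() version still prints the kept tags (h1, p, br) along with the line breaks. If you want plain text with newlines and no tags, you would need a further step such as the one described in the linked question.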
