Cannot preserve newlines in text read from URL

I am reading text from a URL using Jsoup. The following question has some tips for preserving newlines when converting the body to text: "How do I preserve line breaks when using jsoup to convert html to plain text?"

I use the following lines to keep the relevant tags:

  String prettyPrintedBodyFragment = Jsoup.clean(body, "", Whitelist
            .none().addTags("br", "p",  "h1"), new OutputSettings()
            .prettyPrint(true));
  System.out.println(prettyPrintedBodyFragment);

I still get the whole body content on one line. Any tips, please?

EDIT: Here is the complete source code. I still see the output on only one line.

    public static void main(String[] args) throws Exception {

        Connection conn = Jsoup.connect("http://finance.yahoo.com/");
        Document doc = conn.get();

        String body = doc.body().text();

        String prettyPrintedBodyFragment = Jsoup.clean(body, "", Whitelist
                .none().addTags("br", "p", "h1"), new OutputSettings()
                .prettyPrint(true));

        System.out.println(prettyPrintedBodyFragment);
    }
1 answer

Change:

String body = doc.body().text();

To:

String body = doc.body().html();

Since text() has already stripped all the tags, your Whitelist has nothing left to keep when the text is formatted.
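
With html(), the cleaned fragment still contains the br, p and h1 tags as markup. If plain text with real line breaks is what you want in the end, those tags have to be mapped to newline characters afterwards. Below is a minimal sketch in plain Java. The class TagsToNewlines and the method toPlainText are hypothetical names, not part of Jsoup, and the regular expressions assume the simple, attribute-free tags that Whitelist.none().addTags(...) leaves behind. HTML entities such as &amp; are not decoded here.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Jsoup API: turns the few tags kept by the
// whitelist (br, p, h1) into newline characters and drops everything else.
public class TagsToNewlines {
    // <br>, <br/> or <br /> become a single line break.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");
    // A closing </p> or </h1> ends a block, so it becomes a line break too.
    private static final Pattern BLOCK_END = Pattern.compile("(?i)</(p|h1)>");
    // Any tag still left (for example an opening <p> or <h1>) is removed.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    public static String toPlainText(String cleanedHtml) {
        // Pretty-printed output contains layout newlines and indentation;
        // collapse all whitespace first so only tags decide where lines break.
        String s = cleanedHtml.replaceAll("\\s+", " ");
        s = BR.matcher(s).replaceAll("\n");
        s = BLOCK_END.matcher(s).replaceAll("\n");
        s = ANY_TAG.matcher(s).replaceAll("");
        // Remove stray spaces around each line break.
        s = s.replaceAll(" *\n *", "\n");
        return s.trim();
    }

    public static void main(String[] args) {
        // Sample input assumed to resemble what Jsoup.clean(...) returns.
        String cleaned = "<h1>Markets</h1>\n<p>Stocks rose<br>Bonds fell</p>";
        System.out.println(toPlainText(cleaned));
    }
}
```

You would pass prettyPrintedBodyFragment to toPlainText(...) after the Jsoup.clean(...) call. A regex pass like this is only reasonable because the whitelist has already reduced the markup to three known tags; it is not a general HTML-to-text converter.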
