website/node_modules/truncate-html/readme.md

chirp: 'Twitter' Networks Analyser

truncate-html

Truncate an HTML string while keeping its tags intact. You can customize the ellipsis sign, ignore unwanted elements, and truncate HTML by words.

Notice: This is a Node.js module that depends on cheerio, so it can only run on Node.js. If you need a browser version, consider truncate or nodejs-html-truncate.

const truncate = require('truncate-html')
truncate('<p><img src="xxx.jpg">Hello from earth!</p>', 2, { byWords: true })
// => <p><img src="xxx.jpg">Hello from ...</p>

npm install truncate-html or yarn add truncate-html

Click https://npm.runkit.com/truncate-html to try.

/**
 * truncate html
 * @method truncate(html, [length], [options])
 * @param  {String|CheerioStatic}         html    HTML string to truncate, or an existing cheerio instance (aka cheerio $)
 * @param  {Object|number}  length how many letters (words if `byWords` is true) to keep
 * @param  {Object|null}    options
 * @param  {Boolean}        [options.stripTags] remove all tags, default false
 * @param  {String}         [options.ellipsis] ellipsis sign, default '...'
 * @param  {Boolean}        [options.decodeEntities] decode HTML entities (e.g. convert `&amp;` to `&`) before
 *                                                   counting length, default false
 * @param  {String|Array}   [options.excludes] selector(s) of the elements to ignore
 * @param  {Number}         [options.length] how many letters (words if `byWords` is true)
 *                                           to keep
 * @param  {Boolean}        [options.byWords] if true, length means how many words to keep
 * @param  {Boolean|Number} [options.reserveLastWord] what to do when the cut falls in the middle of a word
 *                                1. by default, just cut at that position.
 *                                2. set it to true to let the result run over by up to 10 letters so the last word is kept
 *                                3. set it to a positive number to choose how many extra letters are allowed to keep the last word
 *                                4. set it to a negative number to remove the last word if the cut falls in the middle of it.
 * @param  {Boolean}        [options.keepWhitespaces] keep whitespace; by default, consecutive
 *                                spaces are replaced with one space.
 *                                Set it to true to keep them; consecutive spaces still count as one
 * @return {String}
 */
truncate(html, [length], [options])
// and truncate.setup to change default options
truncate.setup(options)
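
The four `reserveLastWord` rules in the comment above can be easier to follow on plain text. The sketch below is only an illustration under stated assumptions: `cutIndex` is a hypothetical helper, not part of truncate-html, it ignores tags entirely, and its fallback (cut in place when the word needs more extra letters than allowed) is an assumption, since the API comment does not say what happens in that case.

```javascript
// Hypothetical helper, NOT the library's implementation: applies the
// documented reserveLastWord rules to a plain-text string and returns
// the index at which the text would be cut.
function cutIndex(text, length, reserveLastWord = false) {
  if (text.length <= length) return text.length // nothing to cut

  // The cut lands mid-word only when both neighbours are non-whitespace.
  const midWord = /\S/.test(text[length - 1]) && /\S/.test(text[length])
  if (!reserveLastWord || !midWord) return length // rule 1: cut in place

  if (reserveLastWord < 0) {
    // Rule 4: back up to the start of the word so it is dropped.
    let start = length
    while (start > 0 && /\S/.test(text[start - 1])) start--
    return start
  }

  // Rules 2 and 3: allow up to `allowance` extra letters to finish the word.
  const allowance = reserveLastWord === true ? 10 : reserveLastWord
  let end = length
  while (end < text.length && /\S/.test(text[end])) end++
  // Assumption: if the word does not fit in the allowance, cut in place.
  return end - length <= allowance ? end : length
}

const text = 'Hello from earth!'
console.log(text.slice(0, cutIndex(text, 8)))       // 'Hello fr'
console.log(text.slice(0, cutIndex(text, 8, true))) // 'Hello from'
console.log(text.slice(0, cutIndex(text, 8, -1)))   // 'Hello '
```
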

{
  byWords: false,
  stripTags: false,
  ellipsis: '...',
  decodeEntities: false,
  keepWhitespaces: false,
  excludes: '',
  reserveLastWord: false
}
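
With `keepWhitespaces` left at `false`, the API comment says consecutive spaces are replaced with one space. A minimal sketch of that idea on plain text follows; `collapseWhitespace` is a hypothetical helper, and treating tabs and newlines like spaces (the `\s` class) is an assumption, not something the README states.

```javascript
// Hypothetical helper, not part of truncate-html: collapses every run of
// whitespace into a single space, as the default behaviour is described.
function collapseWhitespace(text) {
  // Assumption: tabs and newlines are treated the same as spaces.
  return text.replace(/\s+/g, ' ')
}

const messy = 'This   is \n  a    string'
console.log(collapseWhitespace(messy))        // 'This is a string'
console.log(collapseWhitespace(messy).length) // 16
```
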

You can change the default options with truncate.setup, for example:

truncate.setup({ stripTags: true, length: 10 })
truncate('<p><img src="xxx.jpg">Hello from earth!</p>')
// => Hello from
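
One way to picture how `truncate.setup` interacts with per-call options is a simple merge. This is a sketch under assumptions, not the library's internals: `setup`, `resolveOptions`, `builtInDefaults`, and `currentDefaults` are illustrative names, and the precedence shown (per-call options over setup defaults over built-in defaults) is assumed rather than confirmed by the README.

```javascript
// Hypothetical stand-ins for the library's default handling.
const builtInDefaults = {
  byWords: false,
  stripTags: false,
  ellipsis: '...',
  decodeEntities: false,
  keepWhitespaces: false,
  excludes: '',
  reserveLastWord: false
}
let currentDefaults = { ...builtInDefaults }

// Mirrors the idea of truncate.setup(options): later calls see new defaults.
function setup(options) {
  currentDefaults = { ...currentDefaults, ...options }
}

// Resolves the options for one call, accepting either
// (length, options) or (options) as in the documented signature.
function resolveOptions(length, options) {
  if (typeof length === 'object' && length !== null) {
    options = length
    length = undefined
  }
  const merged = { ...currentDefaults, ...options }
  if (typeof length === 'number') merged.length = length
  return merged
}

setup({ stripTags: true, length: 10 })
console.log(resolveOptions().length)                     // 10
console.log(resolveOptions(5, { ellipsis: '~' }).length) // 5
```
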

Or use an existing cheerio instance:

import * as cheerio from 'cheerio'
truncate.setup({ stripTags: true, length: 10 })
// truncate option `decodeEntities` will not work
//    you should config it in cheerio options by yourself
const $ = cheerio.load('<p><img src="xxx.jpg">Hello from earth!</p>', {
  /** set decodeEntities if you need it */
  decodeEntities: true
  /* any cheerio instance options*/
})
truncate($)
// => Hello from

If the length of the HTML string's content is shorter than options.length, no ellipsis is appended to the final HTML string. If it is longer, the length of the final string is options.length plus the length of options.ellipsis. If you set reserveLastWord to true or to a non-zero number, the final length will vary.
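
The length rule above can be checked with a plain-text sketch. `truncateText` is a hypothetical helper that ignores tags and `reserveLastWord`; treating content exactly as long as `length` the same as shorter content is an assumption.

```javascript
// Hypothetical plain-text helper, not the library's implementation.
function truncateText(text, length, ellipsis = '...') {
  // Assumption: content that is exactly `length` long is left untouched.
  if (text.length <= length) return text
  return text.slice(0, length) + ellipsis
}

const shortResult = truncateText('This is', 10)
const longResult = truncateText('This is a string', 10)

console.log(shortResult)        // 'This is' (no ellipsis appended)
console.log(longResult)         // 'This is a ...'
console.log(longResult.length)  // 13, i.e. 10 + '...'.length
```
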

All HTML comments (<!-- ... -->) will be removed.

Non-alphabetic languages, such as Chinese, Japanese, and Korean, do not separate words with whitespace, so the options byWords and reserveLastWord only work well with alphabetic languages.

In addition, cheerio, the only dependency of this project, has an issue when dealing with non-alphabetic languages; see Known Issues for details.
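
To see why word-based options struggle with CJK text, consider a whitespace-based word count. `countWords` is a hypothetical helper, and splitting on whitespace is an assumption about how words are detected, made only to illustrate the limitation described above.

```javascript
// Hypothetical helper: counts words by splitting on whitespace.
function countWords(text) {
  const trimmed = text.trim()
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length
}

console.log(countWords('Hello from earth!'))      // 3
// A whole Chinese sentence has no whitespace, so it counts as one "word".
console.log(countWords('你好，来自地球的问候！')) // 1
```
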

If you pass an existing cheerio instance, the truncate option decodeEntities has no effect. Set it in your own cheerio instance instead:

const $ = cheerio.load(`${html}`, {
  decodeEntities: true
  /** other cheerio options */
})

var truncate = require('truncate-html')

// truncate html
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 10)
// returns: <p><img src="abc.png">This is a ...</p>

// with options, remove all tags
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 10, { stripTags: true })
// returns: This is a ...

// with options, truncate by words.
//  if you try to truncate a non-alphabetic language (like CJK)
//      it will not behave as you expect
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, 3, { byWords: true })
// returns: <p><img src="abc.png">This is a ...</p>

// with options, keep whitespaces
var html = '<p>         <img src="abc.png">This is a string</p> for test.'
truncate(html, 10, { keepWhitespaces: true })
// returns: <p>         <img src="abc.png">This is a ...</p>

// combine length and options
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, {
  length: 10,
  stripTags: true
})
// returns: This is a ...

// custom ellipsis sign
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, {
  length: 10,
  ellipsis: '~'
})
// returns: <p><img src="abc.png">This is a ~</p>

// exclude some special elements(by selector), they will be removed before counting content's length
var html = '<p><img src="abc.png">This is a string</p> for test.'
truncate(html, {
  length: 10,
  ellipsis: '~',
  excludes: 'img'
})
// returns: <p>This is a ~</p>

// exclude more than one category elements
var html =
  '<p><img src="abc.png">This is a string</p><div class="something-unwanted"> unwanted string inserted ( ´•̥̥̥ω•̥̥̥` ）</div> for test.'
truncate(html, {
  length: 20,
  stripTags: true,
  ellipsis: '~',
  excludes: ['img', '.something-unwanted']
})
// returns: This is a string for~

// handling encoded characters
var html = '<p>&nbsp;test for &lt;p&gt; encoded string</p>'
truncate(html, {
  length: 20,
  decodeEntities: true
})
// returns: <p> test for &lt;p&gt; encode...</p>

// when set decodeEntities false
var html = '<p>&nbsp;test for &lt;p&gt; encoded string</p>'
truncate(html, {
  length: 20,
  decodeEntities: false // this is the default value
})
// returns: <p>&nbsp;test for &lt;p...</p>

// and there may be a surprise when setting `decodeEntities` to true while handling CJK characters
var html = '<p>&nbsp;test for &lt;p&gt; 中文 string</p>'
truncate(html, {
  length: 20,
  decodeEntities: true
})
// returns: <p> test for &lt;p&gt; &#x4E2D;&#x6587; str...</p>
// to fix this, see below for instructions

For more usage examples, check truncate.spec.ts.

Known issues with handling CJK (Chinese/Japanese/Korean) characters when the option decodeEntities is set to true.

You have seen the option decodeEntities. When it is true, encoded HTML entities are decoded automatically, so &amp; is treated as a single character. This is probably what you want. However, if the HTML string contains CJK characters, they are replaced in the final HTML by numeric entities such as &#x4E2D; (each still counts as one character when truncating), which is confusing.

To fix this, you have two choices:

1. Keep the option decodeEntities false; &amp; will then be treated as five characters.
2. Modify cheerio's source code: find the function getInverse in the file ./node_modules/cheerio/node_modules/entities/lib/decode.js and comment out the last line, .replace(re_nonASCII, singleCharReplacer);.
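
The trade-off in the first choice comes down to how entities are counted: an undecoded `&amp;` is five characters, a decoded one is a single character. The sketch below shows the difference with a deliberately tiny decoder. `decodeBasicEntities` and `NAMED` are hypothetical and cover only a few named entities; this is not how cheerio decodes.

```javascript
// Hypothetical, deliberately small entity table for illustration only.
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', nbsp: '\u00a0' }

// Decodes just the named entities listed in NAMED.
function decodeBasicEntities(text) {
  return text.replace(/&(amp|lt|gt|quot|nbsp);/g, (_, name) => NAMED[name])
}

console.log('&amp;'.length)                      // 5
console.log(decodeBasicEntities('&amp;').length) // 1

// The text from the encoded-characters example in this README:
const raw = '&nbsp;test for &lt;p&gt;'
console.log(raw.length)                      // 24 when counted undecoded
console.log(decodeBasicEntities(raw).length) // 13 when counted decoded
```
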

Thanks to:

@calebeno for ES6 support and unit tests

JohnCoene/chirp documentation built on May 25, 2021, 6:33 p.m.
