The goal of dadosCVM is to obtain and adjust data of investments funds registered on Comissão de Valores Mobiliários (Security and Exchange Commissions from Brazil). Data is collected from files available at http://dados.cvm.gov.br/dados/.
Main goals/ideas are descripted below:
You can install the development version of dadosCVM from GitHub with:
# install.packages("devtools")
devtools::install_github("lucasan93/dadosCVM")
The function cad_fi() downloads the latest registration data available on CVM.
library(dadosCVM)
library(dplyr)
library(kableExtra)
Below the top-10 funds in operation are extracted, sorted by their equity and identified by their CNPJ (Brazil National Registry of Legal Entities number). Note that the data.frame contains 40 columns and more than 60 thousand observations.
dados_cadastrais <- cad_fi() %>%
filter(situacao == 'EM FUNCIONAMENTO NORMAL') %>%
select(cnpj,
classe,
tipo,
inicio_atv,
pl_data,
pl) %>%
arrange(desc(pl)) %>%
slice_head(n = 10) %>%
mutate(pl = format(pl, big.mark = ','))
kable(dados_cadastrais)
cnpj
classe
tipo
inicio\_atv
pl\_data
pl
01.608.573/0001-65
Fundo de Renda Fixa
FI
1997-01-02
2022-03-23
159,651,227,089
27.146.328/0001-77
Fundo de Renda Fixa
FI
2017-05-03
2022-03-23
137,429,549,362
07.593.972/0001-86
Fundo de Renda Fixa
FI
2005-09-21
2022-03-23
122,607,768,840
22.985.157/0001-56
FIP Multi
FIP
2015-09-01
2021-12-31
107,568,311,434
01.597.187/0001-15
Fundo de Renda Fixa
FI
1996-12-30
2022-03-23
95,925,774,092
00.822.055/0001-87
Fundo de Renda Fixa
FI
1995-10-02
2022-03-23
94,753,353,219
42.592.302/0001-46
Fundo de Renda Fixa
FI
2021-09-30
2022-03-23
72,417,562,362
42.592.315/0001-15
Fundo de Renda Fixa
FI
2021-09-30
2022-03-23
72,412,668,571
04.288.966/0001-27
Fundo de Renda Fixa
FI
2002-07-15
2022-03-23
59,114,466,417
03.737.219/0001-66
Fundo de Renda Fixa
FI
2005-08-24
2022-03-23
55,208,583,595
The function dados_diarios() downloads daily data (CNPJ, portfolio value, share value, equity, withdraws, fund raisings, and number of investors) for specifics fund (identified by theis CNPJs) between two given dates. Note that daily data is only available for funds of type ‘FI’ and that initial date must be greater than 2005-01-01.
library(dadosCVM)
library(dplyr)
library(ggplot2)
library(scales)
Below we obtain data from funds displayed in the previous example and plot their equity value over time:
dados_diarios(cnpj = dados_cadastrais$cnpj,
start = as.Date('2005-01-01'),
end = as.Date('2022-03-01')) %>%
select(data, cnpj, pl) %>%
ggplot() +
aes(x = data, y = pl, color = cnpj) +
geom_line() +
scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
xlab('Date') +
ylab('Equity (BRL)') +
theme_minimal()
#> [1] "Obtaining data between 2021-01-01 and 2022-03-01"
#> [1] "Obtaining data between 2005-01-01 and 2022-03-01"
The function dados_fidc() downloads monthly data for a group of funds (identified by theis CNPJs) between two given dates and for a specified database (identified in CVM’s FIDC’s META file) out of 16 different databases. Initial date must be greater than 2013-01-01.
This function is currently under development. I’m working on simplifying column names and adding the opting to pivot the data.frame format from wide to long.
library(dadosCVM)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(scales)
library(kableExtra)
First, let’s select the top-10 FIDCs in operation with the highest equity value and extract their CNPJs and the operations’ start date using the cad_fi function:
infos <- cad_fi() %>%
filter(situacao == 'EM FUNCIONAMENTO NORMAL',
tipo == 'FIDC') %>%
arrange(desc(pl)) %>%
slice_head(n = 10) %>%
select(cnpj,
inicio_atv)
Now we obtain the database which provides us with information about the top-10 funds’ aggregated portfolio. CVM has a ‘META’ file describing the content of each FIDC database. A quick look at it and we find that the one we are interested in is the ‘II’ table. Soon the package will provide a reference database so this search can be done quicker. Note that although the oldest fund started in 2009, the function automatically set the start date to 2013-01-01 since there’s no data prior to that date. Note that the function pivots the CVM databases to longer and provides two identification columns (‘segment’ and ‘item’), which greatly facilitates analysis and plots.
dados_fidc(cnpj = infos$cnpj,
start = min(infos$inicio_atv),
end = as.Date('2022-02-01'),
table = 'II') %>%
replace(is.na(.), 0) %>%
filter(value > 0) %>%
group_by(data, segment) %>%
summarise(value = sum(value)) %>%
ggplot() +
aes(x = data, y = value, fill = segment) +
geom_bar(position = 'stack', stat = 'identity') +
scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
xlab('Date') +
ylab('Value (BRL)') +
theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-01"
#> [1] "Obtaining data between 2013-01-01 and 2018-12-31"
If we are interested in analyzing the items within a specific segment, let’s say ‘financeiro’, we can plot the following:
dados_fidc(cnpj = infos$cnpj,
start = min(infos$inicio_atv),
end = as.Date('2022-02-01'),
table = 'II') %>%
replace(is.na(.), 0) %>%
filter(value > 0, segment == 'financeiro') %>%
group_by(data, item) %>%
summarise(value = sum(value)) %>%
ggplot() +
aes(x = data, y = value, fill = item) +
geom_bar(position = 'stack', stat = 'identity') +
scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
xlab('Date') +
ylab('Value (BRL)') +
theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-01"
#> [1] "Obtaining data between 2013-01-01 and 2018-12-31"
And in order to verify how the segments and its items are organized based on their CVM categories, take a look at the defs_fidc database within the package. Note that the observations in which item == ‘total’ are excluded when running the dados_fidc() function in order to avoid double counting.
defs_fidcs %>%
filter(str_detect(category, '^TAB_II_')) %>%
kable(align = c('l', 'l')) %>%
kable_styling(latex_options = 'striped')
category
table
id
base
segment
class
item
TAB\_II\_VL\_CARTEIRA
II
II
carteira
carteira
total
total
TAB\_II\_A\_VL\_INDUST
II
II.A
carteira
industrial
industrial
industrial
TAB\_II\_B\_VL\_IMOBIL
II
II.B
carteira
imobiliario
imobiliario
imobiliario
TAB\_II\_C\_VL\_COMERC
II
II.C
carteira
comercial
total
total
TAB\_II\_C1\_VL\_COMERC
II
II.C.1
carteira
comercial
comercial
comercial
TAB\_II\_C2\_VL\_VAREJO
II
II.C.2
carteira
comercial
varejo
varejo
TAB\_II\_C3\_VL\_ARREND
II
II.C.3
carteira
comercial
arrendamento mercantil
arrendamento mercantil
TAB\_II\_D\_VL\_SERV
II
II.D
carteira
servicos
total
total
TAB\_II\_D1\_VL\_SERV
II
II.D.1
carteira
servicos
servicos
servicos
TAB\_II\_D2\_VL\_SERV\_PUBLICO
II
II.D.2
carteira
servicos
servicos publicos
servicos publicos
TAB\_II\_D3\_VL\_SERV\_EDUC
II
II.D.3
carteira
servicos
servicos educacionais
servicos educacionais
TAB\_II\_D4\_VL\_ENTRET
II
II.D.4
carteira
servicos
servicos entretenimento
servicos entretenimento
TAB\_II\_E\_VL\_AGRONEG
II
II.E
carteira
agronegocio
agronegocio
agronegocio
TAB\_II\_F\_VL\_FINANC
II
II.F
carteira
financeiro
total
total
TAB\_II\_F1\_VL\_CRED\_PESSOA
II
II.F.1
carteira
financeiro
credito pessoal
credito pessoal
TAB\_II\_F2\_VL\_CRED\_PESSOA\_CONSIG
II
II.F.2
carteira
financeiro
credito pessoal consignado
credito pessoal consignado
TAB\_II\_F3\_VL\_CRED\_CORP
II
II.F.3
carteira
financeiro
credito corporativo
credito corporativo
TAB\_II\_F4\_VL\_MIDMARKET
II
II.F.4
carteira
financeiro
middle market
middle market
TAB\_II\_F5\_VL\_VEICULO
II
II.F.5
carteira
financeiro
veiculos
veiculos
TAB\_II\_F6\_VL\_IMOBIL\_EMPRESA
II
II.F.6
carteira
financeiro
carteira imobiliaria empresarial
carteira imobiliaria empresarial
TAB\_II\_F7\_VL\_IMOBIL\_RESID
II
II.F.7
carteira
financeiro
carteira imobiliaria residencial
carteira imobiliaria residencial
TAB\_II\_F8\_VL\_OUTRO
II
II.F.8
carteira
financeiro
outros
outros
TAB\_II\_G\_VL\_CREDITO
II
II.G
carteira
cartao de credito
cartao de credito
cartao de credito
TAB\_II\_H\_VL\_FACTOR
II
II.H
carteira
factoring
total
total
TAB\_II\_H1\_VL\_PESSOA
II
II.H.1
carteira
factoring
pessoal
pessoal
TAB\_II\_H2\_VL\_CORP
II
II.H.2
carteira
factoring
corporativo
corporativo
TAB\_II\_I\_VL\_SETOR\_PUBLICO
II
II.I
carteira
setor publico
total
total
TAB\_II\_I1\_VL\_PRECAT
II
II.I.1
carteira
setor publico
precatorios
precatorios
TAB\_II\_I2\_VL\_TRIBUT
II
II.I.2
carteira
setor publico
creditos tributarios
creditos tributarios
TAB\_II\_I3\_VL\_ROYALTIES
II
II.I.3
carteira
setor publico
royalties
royalties
TAB\_II\_I4\_VL\_OUTRO
II
II.I.4
carteira
setor publico
outros
outros
TAB\_II\_J\_VL\_JUDICIAL
II
II.J
carteira
acoes judiciais
acoes judiciais
acoes judiciais
TAB\_II\_K\_VL\_MARCA
II
II.K
carteira
prop intelectual
prop intelectual
prop intelectual
Now let’s start developing a fund’s balance sheet. We can see through the tabs_fidc database available in the dadosCVM package that the assets’ data are located in the first table:
tabs_fidcs %>%
kable(align = c('l', 'l')) %>%
kable_styling(latex_options = 'striped')
tabela
conteudo
I\_TODOS
informacoes cadastrais dos fundos e ativos
II
classificacoes da carteira
III
passivos
IV
patrimonio liquido
V
direitos creditorios classificados por prazo de vencimento inadimplencia
e pagamentos antecipados
VI
direitos creditorios classificados por prazo de vencimento inadimplencia
e pagamentos antecipados
VII
aquisicoes substituicoes e recompras
IX
taxas de desconto e de juros das compras e vendas
X\_1
informacoes sobre cotistas por classe e serie
X\_1\_1
classificacoes dos cotistas
X\_2
informacoes sobre as cotas
X\_3
rentabilidade
X\_4
transacoes de cotas do fundo
X\_5
ativos classificados pela liquidez
X\_6
taxas de desempenho real e esperada
X\_7
garantias
Let’s then gather the largest FIDC’s assets data and plot its main segments:
dados_fidc(cnpj = infos$cnpj[1],
start = infos$inicio_atv[1],
end = as.Date('2022-03-31'),
table = 'I') %>%
filter(base == 'ativo', value != 0) %>%
group_by(data, segment) %>%
summarise(valor = sum(value)) %>%
ggplot() +
aes(x = data, y = valor, fill = segment) +
geom_bar(position = 'stack', stat = 'identity') +
scale_y_continuous(labels = unit_format(unit = "Bi", scale = 1e-9)) +
xlab('Date') +
ylab('Value (BRL)') +
theme_minimal()
#> [1] "Obtaining data between 2019-01-01 and 2022-02-28"
#> [1] "Obtaining data between 2016-04-20 and 2018-12-31"
We can also analyze the items within each asset segment:
dados_fidc(cnpj = infos$cnpj[1],
start = infos$inicio_atv[1],
end = as.Date('2022-03-31'),
table = 'I') %>%
filter(base == 'ativo', value != 0) %>%
group_by(data, segment, item) %>%
summarise(valor = sum(value)) %>%
ggplot() +
aes(x = data, y = valor, fill = item) +
geom_bar(position = 'stack', stat = 'identity') +
scale_y_continuous(labels = unit_format(unit = "MM", scale = 1e-6)) +
xlab('Date') +
ylab('Value (BRL)') +
theme_minimal() +
facet_wrap(~segment, scales = 'free_y')
#> [1] "Obtaining data between 2019-01-01 and 2022-02-28"
#> [1] "Obtaining data between 2016-04-20 and 2018-12-31"
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.